University of Toronto Mississauga

Sociology

SOC222H5

John Kervin

Lecture

# Lec 4


Fall

SOC 222 -- MEASURING the SOCIAL WORLD
Session #4 -- CORRELATION & REGRESSION
TODAY’S OBJECTIVES
1. Know how to find the equation for a regression line
2. Know how to use that equation to make predictions
3. Understand covariance
4. Know how to obtain a correlation coefficient
5. Understand outliers
6. Know how to find PVE statistics for regressions, correlations, and comparing means
Terms to Know
effect size
linear (equation)
parameter
constant “a”
slope “b”
variance
covariance
definition formula
computational formula
correlation coefficient “r”
scatter
outlier
proportion of variation explained (PVE)
coefficient of determination R²
eta η
LINEAR EQUATIONS
RQ: Are your marks higher if you spend more time on campus?
Linear equation: used to measure effect size. Two questions:
1. Is there a relationship?
2. How big is the effect? (What is the effect size?)
1. The first step:
Find the regression line that fits the data best.
• “best fit” means: comes the closest to all the cases
• “Linear” means: a straight line
The equation for a straight line is: Y = a + b (X)
• “Y” stands for values of the dependent variable
• “X” stands for values of the independent variable
• Linneman, pp. 213-215
Marks = a + b (Days), using the equation of the regression line
• The two variables are X and Y
• Days and Marks
• The two parameters are a and b
• a is called the constant
• This is where the regression line crosses the vertical Y axis
• b is called the slope of the line
• every time the value of X increases by 1, this is how much
the value of Y changes (up if b is positive, down if b is negative)
• i.e., how much the dependent variable changes for each one-unit
change in the independent variable
• Linneman, p. 214
Prediction:
Income (in $000) = 40 + 2 × Hours
No studying:
Hours = 0
Predicted income = 40 + 2 × 0 = 40
10 hours studying:
Hours = 10
Predicted income = 40 + 2 × 10 = 40 + 20 = 60
30 hours studying:
Hours = 30
Predicted income = 40 + 2 × 30 = 40 + 60 = 100
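A minimal Python sketch of the prediction arithmetic above, assuming the example equation Income (in $000) = 40 + 2 × Hours. The helper name `predict` is hypothetical, chosen only for illustration.

```python
def predict(a: float, b: float, x: float) -> float:
    """Predicted Y from a straight line Y = a + b * X (hypothetical helper)."""
    return a + b * x


# Example equation from the notes: Income (in $000) = 40 + 2 × Hours
A, B = 40, 2
for hours in (0, 10, 30):
    print(f"{hours} hours studying -> predicted income {predict(A, B, hours)} ($000)")
```

The printed values match the hand calculations: 40, 60 and 100 ($000).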
Finding the Regression Line
Covariance: how two variables vary together, i.e., whether they change together in a systematic way
Covariance
Variance is the “variation” or “dispersion” or “spread” of one variable
Formula:
$$ \text{variance} = s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} $$
Take each value and subtract the mean: $(x - \bar{x})$
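The variance definition formula can be sketched step by step in Python. The function name `variance` is a hypothetical helper; the data are the values 1, 3, 5, 7, 9, 11 that these notes use later as an example.

```python
def variance(values: list[float]) -> float:
    """Sample variance by the definition formula: sum of squared deviations / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    # Take each value, subtract the mean, and square the difference
    squared_deviations = [(x - mean) ** 2 for x in values]
    # Add up the squared differences and divide by n - 1
    return sum(squared_deviations) / (n - 1)


values = [1, 3, 5, 7, 9, 11]
print(variance(values))  # 70 / 5 = 14.0
```

Python's built-in `statistics.variance` uses the same n − 1 divisor, so it could serve as a check on hand calculations.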
For covariance, take this difference on both X and Y and multiply: $(x - \bar{x})(y - \bar{y})$
Step 1:
• Pick a case
• Find its value on variable X
• Subtract the mean of X = result #1
Step 2:
• Same case
• Find its value on variable Y
• Subtract the mean of Y = result #2
Step 3:
• Multiply these two results = result #3
• A product result for each case
Step 4:
• Add up the product results:
$\sum (x - \bar{x})(y - \bar{y})$
Step 5:
• Divide it all by N - 1
$$ \text{Covariance} = \frac{\sum (x - \bar{x})(y - \bar{y})}{N - 1} $$
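Steps 1 to 5 can be written out as a small Python function, with one comment per step. `covariance` is a hypothetical helper name; the data are the EG #1 values (X = 1, 3, 5; Y = 2, 4, 6) from these notes.

```python
def covariance(xs: list[float], ys: list[float]) -> float:
    """Sample covariance, following Steps 1-5 in the notes (hypothetical helper)."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have one value per case")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    products = []
    for x, y in zip(xs, ys):
        result_1 = x - mean_x                 # Step 1: X value minus the mean of X
        result_2 = y - mean_y                 # Step 2: Y value minus the mean of Y
        products.append(result_1 * result_2)  # Step 3: product result for this case
    total = sum(products)                     # Step 4: add up the product results
    return total / (n - 1)                    # Step 5: divide by N - 1


print(covariance([1, 3, 5], [2, 4, 6]))  # 8 / 2 = 4.0
```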
NOTE:
• After summing (∑), we almost always divide by N - 1
• Small and big numbers:
• Small: 2, 3
• Big: 11, 12
• A small number times a small number is a small number
• 2 * 3 = 6
• A small number times a big number is a big number
• 2 * 11 = 22
• A big number times a big number is a very big number
• 11 * 12 = 132
If a case is close to its mean, then the difference will be small
• But if it’s far from the mean, the difference will be larger
• Values: 1, 3, 5, 7, 9, 11
• Mean = 6
• For the value “5”, the difference is just “1”
• For the value “1”, the difference is “5”
Return to X and Y:
• If a case is close to the mean on both X and Y, then the product result will
be small
• Because each of the differences is small
• That case won’t add much to the covariance
• If a case is far from the mean on both X and Y, then the product result will
be very large
• That case will contribute much more to the covariance
EG #1:
X Y Differences from mean for X, Y (mean of X = 3, mean of Y = 4)
1 2 -2, -2
3 4 0, 0
5 6 2, 2
• Product results: 4, 0, 4
• Sum: 8
EG #2:
X Y Differences from mean for X, Y (mean of X = 3, mean of Y = 4)
1 2 -2, -2
3 6 0, 2
5 4 2, 0
• Product results: 4, 0, 0
• Sum: 4
• When two variables are more strongly related, the sum of the products is larger
(greater covariance)
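A short sketch that reproduces the product results and their sums for EG #1 and EG #2 above; `product_results` is a hypothetical helper name.

```python
def product_results(xs: list[float], ys: list[float]) -> list[float]:
    """Per-case products of deviations from the mean (hypothetical helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]


# Same X values in both examples; only the Y values differ
for label, ys in (("EG #1", [2, 4, 6]), ("EG #2", [2, 6, 4])):
    products = product_results([1, 3, 5], ys)
    print(label, products, "sum =", sum(products))
```

EG #1 gives the larger sum (8 versus 4) because two of its cases sit far from the mean on X and on Y at the same time.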
Slope “b”
slope = parameter “b”
The next step: find a “conversion” term that turns covariance into slope
Steps:
1. Select the independent variable X
2. Calculate for each case the difference between the case value and the mean
of X: $(x - \bar{x})$
3. Square these differences: $(x - \bar{x})^2$
4. Add the squared differences up and divide by N - 1
$$ \frac{\sum (x - \bar{x})^2}{N - 1} $$
• Kranzler, p. 20, Rule #6
• Result is the variance of X: the “conversion” term
5. Divide the covariance by the variance of X
• Result is the parameter “b”
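$$ b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\text{covariance of } X \text{ and } Y}{\text{variance of } X} $$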
Linneman, pp. 218-219
Not Kranzler’s explanation
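The two forms of b (sum of products over sum of squares, or covariance over variance of X) give the same answer, because the N − 1 in each cancels. A minimal sketch using the sums form; `slope` is a hypothetical helper name, and the data are EG #1 and EG #2 from these notes.

```python
def slope(xs: list[float], ys: list[float]) -> float:
    """Slope b = sum of cross-products / sum of squared X deviations.

    The N - 1 in the covariance and in the variance of X cancel out,
    so dividing covariance by variance of X gives the same result.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    squares = sum((x - mean_x) ** 2 for x in xs)
    if squares == 0:
        raise ValueError("X has no variation, so the slope is undefined")
    return cross / squares


print(slope([1, 3, 5], [2, 4, 6]))  # EG #1: 8 / 8 = 1.0
print(slope([1, 3, 5], [2, 6, 4]))  # EG #2: 4 / 8 = 0.5
```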
NOTE:
• “definition” formulas: show step by step what we do and
explain why
• computational formulas: ignore these
• See Kranzler p. 85
• shortcut formulas: built from elements of other formulas (also ignore)
• Kranzler p. 91
• Don’t memorize any formula
• Understand it
Constant “a”
• Linneman
• bottom of p. 219, continuing to p. 220 (read this yourself; it was not explained in lecture)
Y = 68.56 + 1.39 * X
“a” = 68.56
“b” = 1.39
Predicted Y when X = 4:
= 68.56 + 1.39 * 4
= 68.56 + 5.56
= 74.12%
• in the absence of information, the best prediction is the average (mean)
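The constant is explained in Linneman's pages; as an assumption about what they cover, this sketch uses the usual least-squares formula a = mean of Y − b × mean of X, which makes the line pass through both means. `intercept` is a hypothetical helper name. Setting b = 0 shows the "no information" case: the prediction is just the mean of Y.

```python
def intercept(xs: list[float], ys: list[float], b: float) -> float:
    """Constant a = mean of Y - b * mean of X (hypothetical helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return mean_y - b * mean_x


# EG #1 data with slope b = 1.0: a = 4 - 1 * 3
print(intercept([1, 3, 5], [2, 4, 6], 1.0))  # 1.0

# No information about X (b = 0): the constant is the mean of Y
print(intercept([1, 3, 5], [2, 4, 6], 0.0))  # 4.0

# Prediction with the notes' equation Y = 68.56 + 1.39 * X at X = 4
print(round(68.56 + 1.39 * 4, 2))  # 74.12
```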
=============================================================
SPSS: Regression Coefficients
Purpose: to get regression coefficients for a bivariate linear regression line
1. Analyze, Regression, Linear
• Opens a box called “Linear Regression”
• List of variables on left
• Choice buttons on right
• In the middle: five working areas:
• Dependent and independent – use these
• Other three -- ignore for now
• Bottom: the same five action buttons
2. Move dependent and independent variables into their working areas
3. OK
• Opens your output file
• Output is four tables:
• First: Tells you what your independent and dependent variables are
• Dependent is in a footnote
• Second: “Model Summary”
• Two important numbers:
• R
• R square
• Third table: ANOVA
• Ignore for now
• Fourth table: Coefficients
• Top row is the constant
• Second row is the slope for your IV
• Slope is one measure of effect size: the strength of the
relationship