Department

Sociology

Course Code

SOC222H5

Professor

John Kervin

SOC222 MEASURING the SOCIAL WORLD: Session #4 CORRELATION &
REGRESSION
TODAY’S OBJECTIVES
1. Know how to find the equation for a regression line
2. Know how to use that equation to make predictions
3. Understand covariance
4. Know how to obtain a correlation coefficient
5. Understand outliers
6. Know how to find PVE statistics for regressions, correlations, and comparing means
Terms to Know
effect size, linear (equation), parameter, constant “a”, slope “b”, variance, covariance,
definition formula, computational formula, correlation coefficient “r”, scatter, outlier,
proportion of variation explained (PVE), coefficient of determination R², eta η (Greek
letter)
Talking about ratio variables today. Some of the terms we have looked at before.
LINEAR EQUATIONS
This is what happens when you have 2 ratio variables.
RQ: Are your marks higher if you spend more time on campus?
Is there a relationship between the 2 variables and, if there is, what is the
effect size: how big/strong is the relationship?
Days on campus ranges from 0-6; most students are there almost every day, while 3 are never on campus
The relationship is slightly positive
1. The first step:
Find the regression line that fits the data best.
• “best fit” means: it comes the closest to all the points (cases)
• “Linear” means: a straight line
• We want a straight line that fits these data points the best
Any time we have a straight line, we can find an equation for it
• The equation for a straight line is:
Y = a + b (X)
• “Y” stands for values of the dependent variable
• “X” stands for the independent variable
• Linneman, pp. 213-215
The regression line is: marks equals a plus b times the number of days you are on
campus
Marks = a + b (Days)
Can use the regression equation to make predictions
• The two variables are X and Y
• Days and Marks
Finding the equation is the same as finding the parameters
• The two parameters are a and b
• a is called the constant
• This is where the regression line crosses the vertical Y axis
• b is called the slope of the line
• every time the value of X increases by 1, this is how much
the value of Y goes up
• Linneman, p. 214
Prediction: your income is 40 plus 2 times the hours you spend studying
Income (in $000) = 40 + 2 × Hours
No studying:
Hours = 0
Predicted income = 40 + 2 × 0 = 40
Making $40,000
10 hours studying:
Hours = 10
Predicted income = 40 + 2 × 10 = 40 + 20 = 60
30 hours studying:
Hours = 30
Predicted income = 40 + 2 × 30 = 40 + 60 = 100
We can use the regression line to predict what someone’s DV value would be based on their IV value
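The income predictions above can be checked with a short Python sketch. The function name `predict_income` and its default arguments are illustrative choices, not something from the lecture; the constant 40 and slope 2 are the lecture's example values.

```python
def predict_income(hours, a=40, b=2):
    """Predicted income in $000 from the example line: Income = a + b * Hours."""
    return a + b * hours


# The three cases worked through in the notes: 0, 10, and 30 hours of studying.
for hours in (0, 10, 30):
    print(f"Hours = {hours}: predicted income = {predict_income(hours)} ($000)")
```

Changing `a` moves the whole line up or down; changing `b` changes how much each extra hour adds to the prediction.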
Finding the Regression Line
Covariance: the heart of the formula
Covariance
Variance is the “variation” or “dispersion” or “spread” of one variable
Formula:
variance = $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Take each value and subtract the average: $(x - \bar{x})$
This expression will pop up almost everywhere in statistics
Covariance is similar: how two things change together. Do they change
systematically? One goes up and the other goes up or down too, OR one goes up and the other
doesn’t change much (how they co-vary together)
For covariance, take a case and subtract its mean, but do this for x AND y, then multiply:
$(x - \bar{x})(y - \bar{y})$
Step 1:
• Pick a case
• Find its value on variable X
• Subtract the mean of X = result #1
Step 2:
• Same case
• Find its value on variable Y
• Subtract the mean of Y = result #2
Step 3:
• Multiply these two results = result #3
• A product result for each case
Step 4:
• Add up the product results:
$\sum (x - \bar{x})(y - \bar{y})$
Step 5:
• Divide it all by N - 1
Covariance = $\frac{\sum (x - \bar{x})(y - \bar{y})}{N - 1}$
NOTE:
• ∑: add things up; almost every time you see this sign, you will divide the sum by
n-1
• N - 1
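The five steps can be sketched in Python as follows. This is a minimal illustration, not the SPSS procedure; the function name `covariance` is an assumption, and the sample data are the three strongly related cases (X = 1, 3, 5; Y = 2, 4, 6) that these notes use as EG #1.

```python
def covariance(xs, ys):
    """Sample covariance, following the five steps in the notes."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    products = []
    for x, y in zip(xs, ys):
        result_1 = x - mean_x                  # Step 1: X value minus the mean of X
        result_2 = y - mean_y                  # Step 2: Y value minus the mean of Y
        products.append(result_1 * result_2)   # Step 3: product result for this case

    total = sum(products)                      # Step 4: add up the product results
    return total / (n - 1)                     # Step 5: divide by N - 1


print(covariance([1, 3, 5], [2, 4, 6]))  # sum of products is 8, divided by 3 - 1
```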
Logic behind covariance – why did we multiply the two things? Consider the following:
• Small: 2, 3
• Big: 11, 12
• A small number times a small number is a small number
• 2 * 3 = 6
• A small number times a big number is a big number
• 2 * 11 = 22
• A big number times a big number is a very big number
• 11 * 12 = 132
If a case is close to its mean, then the difference will be small (x minus x bar will be
small)
• But if it’s far from the mean, the difference will be larger
• Values: 1, 3, 5, 7, 9, 11
• Mean = 6
• For the value “5”, the difference is just “1”
• For the value “1”, the difference is “5”
Return to X and Y:
• If a case is close to the mean on both X and Y, then the product result will
be small (small x small = small)
• Because each of the differences is small
• That case won’t add much to the covariance
• If a case is far from the mean on both X and Y, then the product result will
be very large (big × big = very big)
• That case will contribute much more to the covariance
If the sample has many cases that are far from the mean on both X and Y, then you get a lot of covariance
EG #1: 3 cases strongly related (mean of X = 3, mean of Y = 4)
X    Y    Difference from mean of X    Difference from mean of Y
1    2    -2                           -2
3    4     0                            0
5    6     2                            2
• Product results: 4, 0, 4
• Sum: 8
EG #2: 3 cases not very strongly related, so the product results will be smaller (mean of X = 3, mean of Y = 4)
X    Y    Difference from mean of X    Difference from mean of Y
1    2    -2                           -2
3    6     0                            2
5    4     2                            0
• Product results: 4, 0, 0
• Sum: 4
When variables are more strongly related, the sum of the product results is larger, and this means the covariance is larger
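The two examples can be reproduced with a small Python sketch that lists the product result for each case. The helper name `product_results` is hypothetical; the data are the X and Y values from EG #1 and EG #2.

```python
def product_results(xs, ys):
    """Per-case products (x - mean of X) * (y - mean of Y)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]


eg1 = product_results([1, 3, 5], [2, 4, 6])  # strongly related cases
eg2 = product_results([1, 3, 5], [2, 6, 4])  # less strongly related cases

print("EG #1:", eg1, "sum =", sum(eg1))
print("EG #2:", eg2, "sum =", sum(eg2))
```

The X values are identical in both examples, so the difference in the sums comes only from how Y lines up with X.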
Slope “b” slope = parameter “b”
The step: find a “conversion” term. (Take covariance and turn it into the slope)
Steps:
1. Select the independent variable X
2. Calculate for each case the difference between the case value and the mean
of x: $(x - \bar{x})$
3. Square these differences: $(x - \bar{x})^2$
4. Add the squared differences up and divide by N - 1: $\frac{\sum (x - \bar{x})^2}{N - 1}$
• Kranzler, p. 20, Rule #6
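To divide two fractions, turn one upside down and multiply. Here the covariance is divided by the variance of X, so the N - 1 terms cancel
• Result is the parameter “b”
$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
What is on top (the numerator) is what matters; it is the heart of the equation:
$(x - \bar{x})(y - \bar{y})$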
Linneman, pp. 218-219 NOT Kranzler’s explanation
SPSS does this for us
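For readers who want to see the calculation outside SPSS, here is a minimal Python sketch of the definition formula for the slope. The function name `slope` is an assumption, and the data are the three-case examples from earlier in these notes, not the marks and days-on-campus data.

```python
def slope(xs, ys):
    """Slope b: sum of product results divided by the sum of squared X differences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


print(slope([1, 3, 5], [2, 4, 6]))  # EG #1 data: 8 / 8
print(slope([1, 3, 5], [2, 6, 4]))  # EG #2 data: 4 / 8
```

Both the covariance and the variance of X are divided by N - 1, so that term cancels and does not appear in the code.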
NOTE: going through different texts, you will find different formulas. There are 3 kinds:
• “definition” formulas: show us all the pieces, step by step, and why we do it
(rationale). These are what we use
• computational formulas: used back in the days of mechanical calculators
• See Kranzler p. 85
• shortcut formulas: symbols which represent other kinds of formulas
• Kranzler p. 91
• Don’t memorize any formula. Understand it
Constant “a” (the second parameter): read Linneman, bottom of p. 219, continuing to p. 220
Regression line’s formula: Y = 68.56 + 1.39 * X
If a certain student, say Susan, spends 4 days on campus, we can make a prediction:
“a” = 68.56 “b” = 1.39
Y = 68.56 + 1.39 * 4 = 74.12%
If we didn’t know anything about Susan, we could use the average of everybody. If we
know only this one piece of information, this is our best possible
prediction. HOWEVER, Susan’s mark is likely to differ from it
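The notes leave the calculation of the constant to Linneman. As a hedged sketch, the standard least-squares result is a = (mean of Y) - b × (mean of X), which the Python below uses; that formula is not stated in these notes. The function names `fit_line` and `predict` are hypothetical, and the fitting data are the EG #1 cases, since the marks data set is not reproduced here.

```python
def fit_line(xs, ys):
    """Return (a, b) for the line Y = a + b * X."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x  # assumption: standard formula, not given in the notes
    return a, b


def predict(a, b, x):
    """Predicted Y for a given X."""
    return a + b * x


# Fitting the EG #1 cases gives a line through all three points.
a, b = fit_line([1, 3, 5], [2, 4, 6])
print(a, b)

# Susan's prediction, using the parameters reported in the lecture.
susan = predict(68.56, 1.39, 4)
print(round(susan, 2))
```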
=============================================================
SPSS: Regression Coefficients
Purpose: to get regression coefficients for a bivariate linear regression line
1. Analyze, Regression, Linear
• Opens a box
