222-04 Outline.docx - Lecture 4

11 Pages

John Kervin

SOC222 MEASURING the SOCIAL WORLD: Session #4
CORRELATION & REGRESSION

TODAY’S OBJECTIVES
1. Know how to find the equation for a regression line
2. Know how to use that equation to make predictions
3. Understand covariance
4. Know how to obtain a correlation coefficient
5. Understand outliers
6. Know how to find PVE statistics for regressions, correlations, and comparing means

Terms to Know
effect size, linear (equation), parameter, constant “a”, slope “b”, variance, covariance, definition formula, computational formula, correlation coefficient “r”, scatter, outlier, proportion of variation explained (PVE), coefficient of determination R², eta η (Greek letter)

We are talking about ratio variables today. We have looked at some of these terms before.

LINEAR EQUATIONS
This is what happens when you have 2 ratio variables.

RQ: Are your marks higher if you spend more time on campus?
• Is there a relationship between the 2 variables?
• If there is, what is the effect size: how big/strong is the relationship?

Days on campus ranges from 0 to 6; most students are there almost every day, while 3 are never on campus.
The relationship is slightly positive.

1. The first step: find the regression line that fits the data best.
• “Best fit” means: it comes the closest to all the points (cases)
• “Linear” means: a straight line
• We want the straight line that fits these data points the best

Any time we have a straight line, we can find an equation for it
• The equation for a straight line is: Y = a + b (X)
• “Y” stands for values of the dependent variable
• “X” stands for values of the independent variable
• Linneman, pp. 213-215

The regression line here is: marks equals a plus b times the number of days you are on campus
Marks = a + b (Days)

We can use the regression equation to make predictions
• The two variables are X and Y
• Here: Days and Marks

Finding the equation is the same as finding the parameters
• The two parameters are a and b
• a is called the constant
• This is where the regression line crosses the vertical Y axis
• b is called the slope of the line
• Every time the value of X increases by 1, this is how much the value of Y goes up
• Linneman, p. 214

Prediction: your income is 40 plus 2 times the hours you spend studying
Income (in $000) = 40 + 2 × Hours
• No studying: Hours = 0
  Predicted income = 40 + 2 × 0 = 40 (making $40,000)
• 10 hours studying: Hours = 10
  Predicted income = 40 + 2 × 10 = 40 + 20 = 60
• 30 hours studying: Hours = 30
  Predicted income = 40 + 2 × 30 = 40 + 60 = 100

We can use the regression line to predict someone’s value on the DV from their value on the IV.

Finding the Regression Line
Covariance: the heart of the formula

Covariance
Variance is the “variation” or “dispersion” or “spread” of one variable
• Take each value and subtract the average
• This expression (a value minus the mean) will pop up almost everywhere in statistics
Covariance is similar: it is about how two things change together. Do they change systematically?
One goes up and the other goes up or down too, OR one goes up and the other doesn’t change much. That is how they co-vary.

Formula:
$$ \text{variance} = s^2 = \frac{\sum (x - \bar{x})^2}{N - 1} $$
The building block for variance is $(x - \bar{x})$; for covariance it is $(x - \bar{x})(y - \bar{y})$.
Take a case and subtract its mean, but do this for x AND y, then multiply.

Step 1:
• Pick a case
• Find its value on variable X
• Subtract the mean of X = result #1
Step 2:
• Same case
• Find its value on variable Y
• Subtract the mean of Y = result #2
Step 3:
• Multiply these two results = result #3
• A product result for each case
Step 4:
• Add up the product results: $\sum (x - \bar{x})(y - \bar{y})$
Step 5:
• Divide it all by N - 1
$$ \text{covariance} = \frac{\sum (x - \bar{x})(y - \bar{y})}{N - 1} $$

NOTE:
• ∑ means: add things up
• Almost every time you see this sign, you will divide the sum by N - 1

Logic behind covariance: why did we multiply the two things?
Consider the following:
• Small: 2, 3
• Big: 11, 12
• A small number times a small number is a small number
• 2 * 3 = 6
• A small number times a big number is a big number
• 2 * 11 = 22
• A big number times a big number is a very big number
• 11 * 12 = 132

If a case is close to its mean, then the difference will be small (x minus x bar will be small)
• But if it’s far from the mean, the difference will be larger
• Values: 1, 3, 5, 7, 9, 11
• Mean = 6
• For the value “5”, the difference is just “1”
• For the value “1”, the difference is “5”

Return to X and Y:
• If a case is close to the mean on both X and Y, then the product result will be small (small x small = small)
• Because each of the differences is small
• That case won’t add much to the covariance
• If a case is far from the mean on both X and Y, then the product result will be very large (big x big = very big)
• That case will contribute much more to the covariance
If many cases in the sample have big differences on both variables, then you get a lot of covariance.

EG #1: 3 cases strongly related
X   Y   Differences from mean for X, Y
1   2   -2, -2
3   4    0,  0
5   6    2,  2
• Product results: 4, 0, 4
• Sum: 8

EG #2: 3 cases not very strongly related, so the sum of the product results will be smaller
X   Y   Differences from mean for X, Y
1   2   -2, -2
3   6    0,  2
5   4    2,  0
• Product results: 4, 0, 0
• Sum: 4

When variables are more strongly related, the sum of the product results is larger, and this means the covariance is larger.

Slope “b”
slope = parameter “b”
The step: find a “conversion” term. (Take the covariance and turn it into the slope.)
Steps:
1. Select the independent variable X
2. Calculate for each case the difference between the case value and the mean of X: $(x - \bar{x})$
3. Square these differences: $(x - \bar{x})^2$
4. Add the squared differences up and divide by N - 1: $\frac{\sum (x - \bar{x})^2}{N - 1}$
• This is the variance of X. The slope is the covariance divided by this variance.
• Kranzler, p. 20, Rule #6: to divide two fractions, turn one upside down and multiply
• The N - 1 terms cancel, and the result is the parameter “b”
$$ b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} $$
What is on top (the numerator) is what matters. It is the heart of the equation: $(x - \bar{x})(y - \bar{y})$
Linneman, pp. 218-219, NOT Kranzler’s explanation
SPSS does this for us

NOTE: going through different texts, you will find different formulas. There are 3 kinds:
• “Definition” formulas: show us all the pieces, step by step, and why we do it (the rationale). These are what we use
• Computational formulas: from back in the days of mechanical calculators
• See Kranzler p. 85
• Shortcut formulas: symbols which represent other kinds of formulas
• Kranzler p. 91
• Don’t memorize any formula. Understand it

Constant “a”: the second parameter. Read Linneman: bottom of p. 219, continuing to p. 220

Regression line’s formula: Y = 68.56 + 1.39 * X
If a certain student, say Susan, spends 4 days on campus, we can make a prediction:
“a” = 68.56
“b” = 1.39
Y = 68.56 + 1.39 * 4 = 74.12%
If we didn’t know anything about Susan, we could only use the average of everybody as our prediction. Knowing just this one piece of information (her days on campus), the regression line gives our best possible prediction.
HOWEVER, Susan’s actual mark is likely to differ from this.

=============================================================
SPSS: Regression Coefficients
Purpose: to get regression coefficients for a bivariate linear regression line
1. Analyze, Regression, Linear
• Opens a box
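
Outside SPSS, the hand calculation described in these notes can be checked with a short script. The following is only a minimal sketch in Python, not part of the course materials. The function names are made up for illustration. The notes defer the formula for the constant to Linneman, so the sketch assumes the standard least-squares result, a = (mean of Y) - b * (mean of X). The data are the two three-case examples (EG #1 and EG #2) plus the Susan and income predictions from the notes.

```python
def mean(values):
    """Average of a list of numbers."""
    return sum(values) / len(values)


def sum_of_products(xs, ys):
    """Steps 1-4: for each case, multiply (x - mean of X) by (y - mean of Y), then add up."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))


def covariance(xs, ys):
    """Step 5: divide the sum of the product results by N - 1."""
    return sum_of_products(xs, ys) / (len(xs) - 1)


def variance(xs):
    """Variance is the covariance of a variable with itself."""
    return covariance(xs, xs)


def slope(xs, ys):
    """Parameter b: covariance divided by the variance of X (the N - 1 terms cancel)."""
    return covariance(xs, ys) / variance(xs)


def constant(xs, ys):
    """Parameter a, ASSUMING the standard formula a = mean(Y) - b * mean(X)."""
    return mean(ys) - slope(xs, ys) * mean(xs)


def predict(a, b, x):
    """Predicted Y from the regression line Y = a + b * X."""
    return a + b * x


# EG #1 (strongly related) and EG #2 (less strongly related) from the notes
x = [1, 3, 5]
y_eg1 = [2, 4, 6]
y_eg2 = [2, 6, 4]

print(sum_of_products(x, y_eg1), covariance(x, y_eg1), slope(x, y_eg1))
print(sum_of_products(x, y_eg2), covariance(x, y_eg2), slope(x, y_eg2))

# Predictions using parameters quoted in the notes
print(predict(40, 2, 10))                  # income example, 10 hours of studying
print(round(predict(68.56, 1.39, 4), 2))   # Susan, 4 days on campus
```

The two examples share the same X values, so they have the same variance of X. The only thing that changes the slope between them is the sum of the product results, which is why the notes call the numerator the heart of the equation.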