Lecture 9


John Kervin

SOC 222 -- MEASURING the SOCIAL WORLD
Session #9 -- INFERENTIAL STATISTICS for REGRESSION

Readings:
Linneman ch. 8
Kranzler ch. 9:94-96

Today’s Objectives: Know…
• Use of regression for prediction
• Definition of positive and negative residual
• How to locate outliers, and their effects
• How to eliminate the effects of outliers using SPSS Select Cases
• Roles of residuals and least squares in getting a regression line
• How to get inferential statistics for slopes, regression model, and correlation
• How to test for the linearity and the equality of variance assumptions

Terms to Know
model, linear equation, ŷ, R², dispersion, degree of association, coefficient of determination, multiple regression, residual, prediction errors, outlier, least squares, linearity, equality of variance

REVIEW: HOW REGRESSION WORKS

RQ: Do immigrants tend to choose provinces or territories with more urban populations?
• IV: urbanization – percent of residents living in cities
• DV: percent of residents who are immigrants

Scatterplot
** SPSS: Graphs -> Chart Builder -> scatter plot -> pick one and drag it up. The DV goes on the Y-axis (and the IV on the X-axis). Then hit OK and it gives you the scatterplot.

Eye-balling the plot suggests three things:
1. There seems to be a positive relationship: as one variable goes up, the other goes up
2. The relationship is not a straight line
3.
There seem to be some extreme cases: the isolated first case and the isolated last ones

Regression Line & Linear Equation
Model – a mathematical expression that simplifies and represents what we think is going on in the world
Linear equation – a model of the form: y = a + b*x
y   the value on the vertical axis
a   the constant in the equation
    • the value of y when x is zero: y = a + b*0, so y = a
b   the slope of the straight line
x   the value on the horizontal axis

SPSS: Add Regression Line and Regression Equation to Scatterplot
ŷ = 6.59 + 0.26*x

Review: Regression Effect Sizes

Slope
Slope is interpreted in terms of the effect of one unit of change in the IV
b = 0.26: a one-unit change in the IV goes with a 0.26 change in the percentage of immigrants (the DV)
NOTE:
• The slope effect size is expressed in units of measure of the DV
** Go to Chart Builder -> add linear line with left click -> choose one that adds a fit line

Fit
Symbol: R² = 0.485?? (coefficient of determination)
multiple regression – more than one independent variable:
y = a + b1*x1 + b2*x2 + b3*x3 + … + bn*xn
• In bivariate regression, R² equals the square of the correlation r
• A good fit is .80 or more
Reasons the fit can fall short:
1. The model is incomplete: it is missing important independent variables
2. The relationship is slightly ….. rather than linear

Correlation
Correlation is an effect size that depends on both slope and spread (dispersion): how spread out the points are in the scatterplot
[Figure: two scatterplots labelled "High correlation" and "Low correlation": same slope, different correlations]
** Although the slope stays the same, the correlation is lower when the points are more spread out

Regression and Prediction
• We predict y from x
• We replace y with ŷ: ŷ = a + b*x

RESIDUALS
residual: the vertical distance between an actual case and the regression line
Each case has a residual:
• a positive residual (case is above the line)
• a negative residual (case is below the line)

Residuals and Prediction: residuals are equivalent to the errors made by the regression line
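The ideas reviewed above (least squares line, prediction, residuals, and fit) can be sketched in a few lines of Python. This is only an illustration outside SPSS: the six data points are made up for the example and are not the lecture's provincial data, and the function names (`least_squares`, `predict`, `pearson_r`) are hypothetical helpers, not SPSS procedures. The only numbers taken from the notes are the constant 6.59 and slope 0.26.

```python
from math import sqrt

# Hypothetical illustration data (NOT the lecture's data set):
# x = percent urban, y = percent immigrant for six made-up regions.
urban = [45.0, 55.0, 60.0, 70.0, 80.0, 85.0]
immigrant = [4.0, 7.0, 6.0, 11.0, 14.0, 19.0]


def least_squares(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x with the smallest sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # constant: value of y-hat when x is zero
    return a, b


def predict(a, b, x):
    """Predicted value y-hat for a given x."""
    return a + b * x


def pearson_r(xs, ys):
    """Correlation r between x and y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


a, b = least_squares(urban, immigrant)
predicted = [predict(a, b, x) for x in urban]

# residual = actual y minus predicted y-hat
residuals = [y - y_hat for y, y_hat in zip(immigrant, predicted)]
signs = ["positive (above line)" if e > 0 else
         "negative (below line)" if e < 0 else "on the line"
         for e in residuals]

# Fit: R-squared = 1 - (sum of squared residuals / total variation in y)
mean_y = sum(immigrant) / len(immigrant)
ss_res = sum(e ** 2 for e in residuals)
ss_tot = sum((y - mean_y) ** 2 for y in immigrant)
r_squared = 1 - ss_res / ss_tot
r = pearson_r(urban, immigrant)

# Prediction with the equation from the notes, for a hypothetical x of 50% urban
lecture_prediction = predict(6.59, 0.26, 50.0)  # about 19.59

for x, y, e, s in zip(urban, immigrant, residuals, signs):
    print(f"x={x:5.1f}  y={y:5.1f}  residual={e:6.2f}  {s}")
print(f"slope b={b:.3f}  R-squared={r_squared:.3f}  r={r:.3f}")
```

With one IV, the printed R-squared should match r squared, and the residuals of a least squares line sum to (approximately) zero.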
Prediction errors: residual = y − ŷ
• ŷ is the predicted value
• The ŷ predictions lie on the straight line
• The actual values of y are the points above and below the regression line

Outliers
Outliers are cases with large residuals
• Can change the slope b
• Can reduce the fit R²
• Can reduce the correlation r
Typically extreme on one variable but not the other

Outliers and Variables
[Figure: scatterplot with axes Y and X showing an outlier on X but not Y, and an outlier on Y but not X]
This case wouldn’t affect fit,

Checking Outliers
Five Steps:
1. Check scatterplot for any obvious outliers.
   • If there are, continue.
2. Compute slope and R-squared and correlation with all cases
3. Identify the outlier
4. Re-compute with the outlier excluded
5. Compare results

Step #1: Check for obvious outliers

Step #2: Compute statistics

Step #3: Identify outlier
• This step has four sub-steps:
3-A: Note which variable has the outlier, and the approximate value of the outlier
3-B: Find variable in data set: click on the row number for the variable
     • This takes you to the “Data View” with that variable in the first column
3-C: Find the row with the extreme value
3-D: Click row number to get the case ID (usually the ID is the first variable)

Step #4: Re-compute
To drop the outlier
• In this example, select those provinces with more than 10% in Cities
• This will exclude Nunavut
** In Data View, we are interested in the value 0: click on the row and it tells us that the extreme case is Nunavut

=================================================================
SPSS: Select Cases
This procedure temporarily omits one or more cases from your analysis.
This is called “filtering” – it filters out cases.
It does this by creating a variable called “Filter”
1. Menu Bar, Data, Select Cases (near bottom)
   • Opens the “Select Cases” box
   • On left: list of variables
   • On right: two working areas
     • Top: “Select” area
     • Underneath: “Output” area
   • Bottom: the five action buttons
2.
In the “Select” area
   • Click on “If condition is satisfied”
   • This highlights the “If” button
   • Click on it
   • Opens a sub-box called “Select Cases: If”
   • On left: List of variables
   • Top: Empty working area; put in >= 10
   • Underneath: a keypad, two “Function” working areas, and a blank ar
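The outlier check in these notes (compute with all cases, filter, re-compute, compare) can be sketched outside SPSS in Python. This is a rough illustration under stated assumptions: the seven cases are made-up values with one case placed at 0% urban to stand in for an outlier on X, the 0/1 `filter_flag` list only mimics the idea of a filter variable, and `regression_stats` is a hypothetical helper. The `>= 10` condition is the one typed in the notes.

```python
from math import sqrt

# Hypothetical cases (made-up values, NOT the lecture's data):
# (case id, percent urban, percent immigrant). Case "A" is extreme on X only.
cases = [
    ("A", 0.0, 15.0),
    ("B", 45.0, 4.0),
    ("C", 55.0, 7.0),
    ("D", 60.0, 6.0),
    ("E", 70.0, 11.0),
    ("F", 80.0, 14.0),
    ("G", 85.0, 19.0),
]


def regression_stats(rows):
    """Slope, constant, correlation, and R-squared for a one-IV regression."""
    xs = [x for _, x, _ in rows]
    ys = [y for _, _, y in rows]
    n = len(rows)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    r = sxy / sqrt(sxx * syy)
    return {
        "n": n,
        "slope": slope,
        "constant": mean_y - slope * mean_x,
        "r": r,
        "r_squared": r * r,  # with one IV, R-squared is r squared
    }


# Step 2: compute statistics with all cases
all_stats = regression_stats(cases)

# Step 4: filter. 1 = case is selected, 0 = case is filtered out.
# The original list is left untouched, so the omission is temporary.
filter_flag = [1 if urban >= 10 else 0 for _, urban, _ in cases]
selected = [row for row, keep in zip(cases, filter_flag) if keep == 1]
excluded_ids = [row[0] for row, keep in zip(cases, filter_flag) if keep == 0]
filtered_stats = regression_stats(selected)

# Step 5: compare results
print("Excluded cases:", excluded_ids)
for name in ("slope", "r", "r_squared"):
    print(f"{name}: all cases = {all_stats[name]:.3f}, "
          f"outlier excluded = {filtered_stats[name]:.3f}")
```

In this made-up example the single filtered case pulls the slope, the correlation, and the fit down, so all three rise once it is excluded. Real data may behave differently, which is why the comparison step matters.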