University of Toronto Mississauga
# lecture 9

Sociology

SOC222H5

John Kervin

Winter

SOC 222 -- MEASURING the SOCIAL WORLD
Session #9 -- INFERENTIAL STATISTICS for REGRESSION
Readings:
Linneman ch. 8
Kranzler ch. 9:94-96
Today’s Objectives: Know…
Use of regression for prediction
Definition of positive and negative residual
How to locate outliers, and their effects
How to eliminate the effects of outliers using SPSS Select Cases
Roles of residuals and least squares in getting a regression line
How to get inferential statistics for slopes, regression model, and correlation
How to test for the linearity and the equality of variance assumptions.
Terms to Know
model
linear equation
ŷ
R²
dispersion
degree of association
coefficient of determination
multiple regression
residual
prediction errors
outlier
least squares
linearity
equality of variance
REVIEW: HOW REGRESSION WORKS
RQ: Do immigrants tend to choose provinces or territories with more urban
populations?
• IV: urbanization – percent of residents living in cities
• DV: percent of residents who are immigrants
Scatterplot
** Graphs -> Chart Builder -> Scatter/Dot -> pick a scatterplot and drag it up. The DV goes on the Y-axis and the IV on the X-axis.
Then click OK and SPSS gives you the scatterplot.
Eye-balling the plot suggests three things:
1. There seems to be a positive relationship, because as one variable goes up the other goes up
2. The relationship is not a straight line
3. There seem to be some extreme cases: the isolated first point and the isolated last ones
Regression Line & Linear Equation
Model – a mathematical expression that simplifies and represents what we think is
going on in the world
Linear equation – a model of the form: y = a + b*x
y the value on the vertical axis
a the constant in the equation
• the value of y when x is zero
y = a + b*0
y = a
b the slope of the straight line
x the value on the horizontal axis
SPSS: Add Regression Line and Regression Equation to Scatterplot
ŷ = 6.59 + 0.26*x
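As a rough illustration of how the fitted equation is used, the sketch below plugs values of x into the line from the notes (constant 6.59, slope 0.26). The function name `predict` and the three x values are made up for this example; they are not from the lecture.

```python
# Fitted line from the notes: y-hat = 6.59 + 0.26 * x
A = 6.59   # constant: the predicted y when x is zero
B = 0.26   # slope: change in predicted y for a one-unit change in x

def predict(x):
    """Return the predicted DV value (y-hat) for a given IV value x."""
    return A + B * x

# Hypothetical urbanization values (percent of residents living in cities)
for x in (0, 50, 80):
    print(f"x = {x:>3}  ->  y-hat = {predict(x):.2f}")
```

When x is 0 the prediction is just the constant, and moving x up by one unit moves the prediction up by the slope.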
Review: Regression Effect Sizes
Slope
Slope is interpreted as the effect on the DV of a one-unit change in the IV
b: the change in the dependent variable for a one-unit change in the independent variable
b = 0.26: a one-point increase in percent living in cities goes with a 0.26-point increase in the percentage of immigrants
NOTE:
• The slope effect size is expressed in units of measure of the DV
** Go to Chart Builder -> add a linear line with a left click -> choose the one that adds a fit line
Fit
Symbol: R² = 0.485??
coefficient of determination
multiple regression: more than one independent variable
y = a + b1*x1 + b2*x2 + b3*x3 + … + bn*xn
• R² is equivalent to r² (the squared correlation) when there is only one IV
• A good fit is .80 or more
When the fit is lower than that, possible reasons:
1. The model is incomplete: it is missing important independent variables
2. The relationship is slightly ….. rather than linear
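To make the fit statistic concrete, here is a minimal sketch that computes the least-squares line, R², and r by hand for one IV and checks that R² matches r². The five data points are invented for illustration and are not the provincial data used in class.

```python
from math import sqrt

# Made-up data for illustration only
x = [10, 20, 30, 40, 50]
y = [12, 15, 25, 24, 34]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products around the means
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = sxy / sxx              # least-squares slope
a = mean_y - b * mean_x    # constant

# R-squared: share of the variation in y accounted for by the line
predicted = [a + b * xi for xi in x]
ss_resid = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
r_squared = 1 - ss_resid / syy

# Correlation between x and y
r = sxy / sqrt(sxx * syy)

print(f"slope b = {b:.3f}, constant a = {a:.3f}")
print(f"R-squared = {r_squared:.4f}, r squared = {r ** 2:.4f}")
```

The two printed values agree because, with a single independent variable, the coefficient of determination is the square of the correlation.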
Correlation
Correlation is an effect size which depends on both slope and spread.
Spread (dispersion): how spread out the points are in the scatterplot
[Figure: two scatterplots with the same slope, one labelled "High correlation" and one labelled "Low correlation"]
** Although the slope stays the same, the correlation is lower when the points are more spread out
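The point about same slope, different correlations can be checked numerically. In this sketch the data are invented, and the `noise` values were chosen on purpose so that they add spread around the line without changing the least-squares slope; `slope_and_r` is a helper written just for this example.

```python
from math import sqrt

def slope_and_r(x, y):
    """Return the least-squares slope b and the correlation r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
# Sums to zero and is uncorrelated with x, so it leaves the slope alone
noise = [1, -2, 2, -2, 1]

y_tight = [2 * xi + 0.5 * e for xi, e in zip(x, noise)]  # small spread
y_loose = [2 * xi + 3.0 * e for xi, e in zip(x, noise)]  # large spread

b_tight, r_tight = slope_and_r(x, y_tight)
b_loose, r_loose = slope_and_r(x, y_loose)

print(f"tight: slope = {b_tight:.2f}, r = {r_tight:.3f}")
print(f"loose: slope = {b_loose:.2f}, r = {r_loose:.3f}")
```

Both data sets give the same slope, but the more dispersed one has the lower correlation.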
Regression and Prediction
• We predict y from x
• We replace y with ŷ (the predicted value of y)
ŷ = a + b*x
RESIDUALS
residual: The vertical distance between an actual case and the regression line
Each case has a residual
a positive residual (case is above the line)
a negative residual (case is below the line)
Residuals and Prediction
Residuals are equivalent to the prediction errors made by the regression line
residual = y − ŷ
• ŷ is the predicted value
• The ŷ predictions are the straight line
• The actual values of y are the points above and below the regression line
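A small sketch of the residual calculation, using the fitted line from the notes (constant 6.59, slope 0.26). The two cases and their values are hypothetical, invented only to show one positive and one negative residual.

```python
# Fitted line from the notes: y-hat = 6.59 + 0.26 * x
A, B = 6.59, 0.26

def residual(x, y):
    """Actual y minus predicted y (y-hat) for one case."""
    y_hat = A + B * x
    return y - y_hat

# Hypothetical cases: (label, x = percent in cities, y = percent immigrants)
cases = [("Case A", 80, 30.0), ("Case B", 50, 15.0)]

for label, x, y in cases:
    res = residual(x, y)
    position = "above" if res > 0 else "below"
    print(f"{label}: residual = {res:+.2f} ({position} the line)")
```

A case above the line is under-predicted (positive residual); a case below the line is over-predicted (negative residual).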
Outliers
Outliers are cases with large residuals
• Can change the slope b
• Can reduce the fit R²
• Can reduce the correlation r
Typically extreme on one variable but not the other
Outliers and Variables
[Figure: scatterplots of Y against X]
• Outlier on X, but not Y
• Outlier on Y, but not X
This case wouldn’t affect fit,
Checking Outliers
Five Steps:
1. Check scatterplot for any obvious outliers.
• If there are, continue.
2. Compute slope, R-squared, and correlation with all cases
3. Identify the outlier
4. Re-compute with the outlier excluded
5. Compare results
Step #1: Check for obvious outliers
Step #2: Compute statistics
Step #3: Identify outlier
• This step has four sub-steps:
3-A: Note which variable has the outlier, and the approximate value of the outlier
3-B: Find variable in data set: click on the row number for the variable
• This takes you to the “Data View” with that variable in the first column
3-C: Find the row with the extreme value
3-D: Click row number to get the case ID (usually the ID is the first variable)
Step #4: Re-compute
To drop the outlier
• In this example, select those provinces with more than 10% in Cities
• This will exclude Nunavut
In Data View, we are interested in the value 0 -> click on the row and it tells us that the extreme case is Nunavut.
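The re-compute and compare steps can be sketched outside SPSS as follows. The six data points are invented (they are not the real provincial figures), with the first case standing in for an outlier at 0% in cities; `fit` is a helper written for this example, and it uses the fact that R² equals r² when there is one IV.

```python
from math import sqrt

def fit(x, y):
    """Return slope b, R-squared, and correlation r for one IV."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / sqrt(sxx * syy)
    return sxy / sxx, r ** 2, r

# Hypothetical data; the first case is the outlier on x
pct_cities = [0, 45, 55, 65, 75, 85]
pct_immigrants = [14, 6, 9, 13, 16, 20]

# Step 2: statistics with all cases
b_all, r2_all, r_all = fit(pct_cities, pct_immigrants)

# Step 4: keep only cases with more than 10% in cities, then re-compute
kept = [(x, y) for x, y in zip(pct_cities, pct_immigrants) if x > 10]
kept_x = [x for x, _ in kept]
kept_y = [y for _, y in kept]
b_sel, r2_sel, r_sel = fit(kept_x, kept_y)

# Step 5: compare results
print(f"all cases:    b = {b_all:.3f}, R2 = {r2_all:.3f}, r = {r_all:.3f}")
print(f"outlier out:  b = {b_sel:.3f}, R2 = {r2_sel:.3f}, r = {r_sel:.3f}")
```

In this made-up data set, dropping the single extreme case changes the slope and raises both the fit and the correlation, which is the pattern the comparison step is meant to reveal.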
SPSS Select Cases
=================================================================
This procedure temporarily omits one or more cases from your analysis.
This is called “filtering” – it filters out cases.
It does this by creating a filter variable (SPSS names it filter_$)
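The idea of a filter variable can be pictured with a short sketch: each case gets a flag of 1 (keep) or 0 (filter out), and the analysis only uses the flagged cases while the data set itself is left unchanged. The case IDs and values here are hypothetical, and the condition of at least 10% in cities follows the example in these notes.

```python
# Hypothetical cases: (case ID, percent of residents living in cities)
cases = [("P1", 0.0), ("P2", 45.0), ("P3", 82.0)]

# Filter variable: 1 = case is selected, 0 = case is filtered out
filter_flags = [1 if pct >= 10 else 0 for _, pct in cases]

# Only the selected cases go into the analysis; nothing is deleted
selected = [case for case, flag in zip(cases, filter_flags) if flag == 1]

print("filter flags:", filter_flags)
print("cases in data set:", len(cases), "| cases analysed:", len(selected))
```

Because the cases are only flagged, not removed, turning the filter off brings every case back into the analysis.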
1. Menu Bar, Data, Select Cases (near bottom)
• Opens the “Select Cases” box
• On left: list of variables
• On right: two working areas
• Top: “Select” area
• Underneath: “Output” area
• Bottom: the five action buttons
2. In the “Select” area
• Click on “If condition is satisfied”
• This highlights the “If” button
• Click on it
• Opens a sub-box called “Select Cases: If”
• On left: List of variables
• Top: Empty working area; put in the condition here (>= 10)
• Underneath: a keypad, two “Function” working areas, and a blank
ar
