
# SOC350H5 Lecture 8


School: University of Toronto Mississauga
Department: Sociology
Course: SOC350H5
Professor: David Pettinicchio
Semester: Winter

Description
## Index Variable

- An index variable is count data, which does not follow the distribution OLS assumes.
- What an index variable is:
  - People call it an index or a scale; it is a count measure.
  - When you ask SPSS to create the scale, it counts whatever you have categorized as 1.
  - If the victimization items are labelled inconsistently, recode the data in the same direction so that a 1 on one variable means the same as a 1 on the others.
- Why OLS is problematic:
  - With too few variables added together you won't get statistical significance.
  - The index produces a small range, and there is not enough variation in that range because there are fewer cases.
  - Only go with an additive index if you have five possible outcomes.
  - You have to have enough variables, and the things you add have to be the same.
- How to create the index:
  - Start with clean and consistent variables.
  - If one variable is coded 1 and 2, SPSS will just add the 2.
  - Variables that are added together have to be coded exactly the same; otherwise you are adding things that are not comparable.
  - Compute: put the variables you are going to add into a total variable and use the operations to add them.
  - You will end up with a range. If you have 15 possibilities, the maximum you can have is 15, and the index must not exceed that maximum.
  - Also look at the distribution, and always double check.
- For OLS it is a problem when there is little variation in that spread.
- Caveat: you shouldn't be doing OLS with count data, because it is not normally distributed. It can be fixed to get significance.
- People who build an index use Cronbach's alpha. Anything less than .7 means you have an unreliable scale.

## Outliers

- Normally distributed data are symmetric, with no influential outliers.
- The line of fit is based on a model that attempts to minimize error.
- Outliers have undue influence on the model (the line of fit).
- Outliers have large errors, and that affects the slope.
- Example model: mother's and father's occupational prestige and education determining the respondent's occupational prestige.
- In an SPSS scatter plot most of the data are clustered. Everything that starts to move away could be problematic.

## Standardized residuals

- Just because a case seems to be an outlier doesn't mean it is.
- Use the standardized residual tool: we want to compare errors, so we want everything standardized.
- What gets standardized is the distance of the points from the line.
- This lets us talk about outliers in terms of standard scores and use the normal distribution to see whether they fall beyond certain z scores.
- Cases beyond 2.5 standard deviations are definitely cause for concern.
- After creating the new variable in SPSS, you can make a histogram of it.
  - This is a distribution of the errors, not of the cases.
  - 0 is the average error and the tails are standard deviations away from it.
- Class example: this one has large outliers on the positive end.
  - Mean residual was 44.84
  - Lowest error: -35.68
  - Highest error: 56.901
  - Z scores range from -2.942 to 4.689
  - There is more of a problem on the right: the case with the highest error is 4.689 standard deviations away from the mean.
  - Cases of homicide - **

## Cook's D

- Cook's D is a more systematic tool for spotting outliers.
- It is a measure of both distance and leverage: how much influence the case is having on the model.
- The higher the Cook's D value, the more influence that case has on the slope and on statistical significance.
- A value greater than 1 is a problem.
- Look at the structure of the data: using these tools, are there cases that really stand out and might be influencing your model?
- Cook's D is an estimation diagnostic. The goal is to diagnose, with cases ranked in order.

## DFbeta

- DFbeta looks at the difference in the regression coefficient if you were to drop those outliers.
- It gives an indicator of how much influence those cases have, measured by what changes when you drop them.
- Cutoff: beyond 2/√n in absolute value (n is the sample size), or below -1 or above 1.
- Look at it comprehensively, together with all the other tests.
- In the class example the cutoff of 0.04 was established by 2/√n.

## Solutions

- Delete the observation that is the outlier.
- Delete the variable if it has a lot of outliers.
- Transform a v
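The index-building steps from the Index Variable notes (recode every item in the same direction, add the items, check the range, then check reliability) can be sketched outside SPSS. This is a minimal Python sketch rather than the SPSS procedure used in class. The item names, the helper functions, and the six respondents are made up for illustration, and Cronbach's alpha is computed with the standard formula k/(k-1) × (1 − sum of item variances / variance of the total).

```python
from statistics import variance


def recode_same_direction(values, yes_code=1):
    """Recode an item so yes_code becomes 1 and every other code becomes 0."""
    return [1 if v == yes_code else 0 for v in values]


def additive_index(items):
    """Add 0/1 items case by case; items is a list of equal-length columns."""
    return [sum(row) for row in zip(*items)]


def cronbach_alpha(items):
    """Standard alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(items)
    total = additive_index(items)
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(total))


# Hypothetical victimization items for six respondents (made-up data).
assault = [1, 0, 1, 0, 1, 0]       # already coded 1 = yes, 0 = no
theft = [1, 0, 1, 0, 0, 0]         # already coded 1 = yes, 0 = no
burglary_raw = [1, 2, 1, 2, 1, 1]  # coded 1 = yes, 2 = no, so recode first

burglary = recode_same_direction(burglary_raw, yes_code=1)
items = [assault, theft, burglary]

index = additive_index(items)
alpha = cronbach_alpha(items)

# Double check: the index can never exceed the number of items added.
assert max(index) <= len(items)

print("index:", index)
print("alpha:", round(alpha, 3))
```

Adding `burglary_raw` without recoding would add 2 for every "no", which is the comparability problem the notes warn about. Alpha is then compared against the .7 threshold from the notes.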
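The three outlier tools in these notes (standardized residuals, Cook's D, and DFbeta) can also be illustrated for a regression with one predictor. This is a rough Python sketch, not the SPSS output discussed in class. The data are made up (thirteen cases exactly on the line y = 2x plus one case pulled far above it), the function names are hypothetical, and the flags use the 2.5 cutoff from the notes together with the conventional rule of thumb of Cook's D above 1. The slope change reported here is unstandardized; the 2/√n cutoff is usually applied to the standardized version.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def diagnostics(x, y):
    """Per-case standardized residual, Cook's D, and slope change on deletion."""
    n, p = len(x), 2  # p = number of estimated coefficients
    intercept, slope = fit_line(x, y)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    rows = []
    for i, (xi, e) in enumerate(zip(x, resid)):
        leverage = 1 / n + (xi - mx) ** 2 / sxx
        z_resid = e / sqrt(mse)  # residual in standard-deviation units
        cooks_d = e ** 2 / (p * mse) * leverage / (1 - leverage) ** 2
        # Refit without case i to see how far the slope moves.
        _, slope_dropped = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        rows.append({
            "case": i,
            "z_resid": z_resid,
            "cooks_d": cooks_d,
            "dfbeta_slope": slope - slope_dropped,
        })
    return rows


# Made-up data: y = 2x for every case except the last, which sits far above.
x = list(range(1, 15))
y = [2 * v for v in x]
y[-1] = 60

rows = diagnostics(x, y)
flagged = [r["case"] for r in rows
           if abs(r["z_resid"]) > 2.5 or r["cooks_d"] > 1]
worst = max(rows, key=lambda r: r["cooks_d"])

print("flagged cases:", flagged)
print("most influential case:", worst)
```

Looking at all three columns together follows the advice in the notes to judge outliers comprehensively rather than by a single test.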