
2Z03 Lecture 18 "Quantitative Data Analysis".docx


McMaster University
Gerald Bierling

Fox 1 Lecture 18
SOCIOL 2Z03
Wednesday March 26, 2014

Quantitative Data Analysis

Lecture Outline:
• Quantitative Data
• Univariate Analysis
• Bivariate Analysis
• Multivariate Analysis

Quantitative Data:
• Numerical representations of our observations (assigning numerical values referred to as variables)
• 2 main purposes/tasks of quantitative analysis:
  1) Describing variables on their own (descriptive); univariate (ex=examining income on its own)
  2) Examining relationships between variables (explanatory); bivariate (2 variables together) or multivariate (more than 2 variables at once) analysis (ex=office testing)/This can only be done after we finish with univariate analysis
• How we describe and examine depends on the level of measurement
  o More ‘powerful’ tools are available at higher levels of measurement
  o Analysis tools provide more detailed information at higher levels of measurement

Quantitative Data: Numerical Representation:
• Age:
  o 1=1
  o 2=2
  o 3=3
  o 4=4
  o 5=5
  o Etc.
• Sex:
  o Male=1
  o Female=2 (female could be 1 and male could be 2: doesn’t matter)
• Political Affiliation:
  o Liberal=1
  o Conservative=2
  o NDP=3
  o Other=4
• Civil Unrest:
  o Low=1
  o Medium=2
  o High=3

Univariate Analysis: Frequency Distributions:
• What is your main source of stress?
  (handout)
• Value labels indicate what the value categories were
• Nominal level variable (renumbering the categories wouldn’t change the meaning of the variable; don’t let frequencies and percentages influence what level of measurement you think it might be; try mixing up the categories)
• Valid responses/categories: one group of responses
• Missing responses/categories: people who don’t want to answer the question, and these kinds of responses (Refuse/Don’t know/weren’t asked = missing cases or values). We only want to examine the valid responses
• Total: of all valid categories
• Frequency: how many people gave each type of answer
• The frequency and percent columns are not the same (768/3840=20%)
• 1/3 of respondents feel that work is the main source of stress in their lives
• Make basic descriptive conclusions without reporting every value in the table (summarize key findings from the table)

Univariate Analysis: Frequency Distributions
• Chart: univariate analysis can’t always….problem (missed it)
• 2 solutions:
  1) Recode the variable into a smaller number of categories (total household income)
  2) Graph (histograms = show clusters)
• Histograms will typically take one of 3 shapes (skewness)
  i) Normal curve (SHAPE 1): doesn’t occur a lot with the type of variables we’re looking at
  ii) Positive skew (SHAPE 2): e.g. house prices, income
  iii) Negative skew (SHAPE 3): e.g. life expectancy

2 main questions:
• Central tendency: what is typical/normal?
  o Mode, median and mean
• Variability/Dispersion: what is atypical/different? (what countries have different life expectancy values?)
  o Percentages, range, standard deviation

                   Nominal               Ordinal               Interval
Central tendency   Mode                  Mode                  Mode
                                         Median                Median
                                                               Mean
Variability        Compare percentages   Compare percentages   Range
                                         Range                 Standard deviation

Univariate Analysis: Mode
• Most frequently occurring value
• Least useful measure of central tendency
• Handout: main source of stress in people’s lives is a nominal level variable; the mode (the value that occurs the most) is work

Univariate Analysis: Median
• Divides the distribution into 2 equal halves: the middlemost value
• But values must be in order, i.e. the variable must be at least ordinal
• Chart: Median=2 (very good). Why? The cumulative percent column adds up the valid percents of the valid categories (ex: excellent=19.2% plus very good=36.8% gives 56%), so the halfway point (50%) falls in the "very good" category: 1/2 of people are very good or better, and 1/2 are very good or worse

Univariate Analysis: Mean
• Arithmetic average
• E.g. FT: Attitudes towards gays and lesbians
  o Mode=50
  o Median=50
  o Mean=38.8
  o Average is lower than mode and median
• If the distribution is skewed, the median might be a better measure of central tendency than the mean, even if there is interval level data
  o Why: the mean is being (artificially) inflated or deflated by a few extremely high or low scores

Canada, employment income, 2010 ($)
          Mean     Median
Male      49,000   37,000
Female    34,000   27,000
• Income takes the shape of a skewed distribution
• The median is much lower than the mean because a few people have very high incomes compared to everyone else, which inflates the average income values

Univariate Analysis: Dispersion:
• For nominal data, compare percentages to determine variation
• FT Chart: if we want to talk about variability, compare valid percentages
• Work is the main source of stress for many more people than any other source of stress (twice as many people as family) = how people are dispersed
• Range: difference between highest and lowest value (least useful measure of variability because it ignores everything in between)
• Standard Deviation:
  o Can only be used for interval variables
  o Makes use of all values/cases in the distribution
  o More useful than range
  o Can be used to compare distributions measured on the same scale: the group with the highest standard deviation = most variation in responses
• Rule of thumb: double the value of the standard deviation; if it encompasses twice the range of values, dispersion/variability is high
• Example=Feeling thermometer: Rate Health Care System (0-100)
  o Mean=78.7
  o Median=75
  o Mode=75
  o Range=100 (
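
The univariate measures in these notes (frequency percentages, mode, median, mean, range, standard deviation) can be tried out with a short Python sketch. The incomes and survey answers below are hypothetical values made up for illustration, not data from the lecture or handout; only the 768/3840 percentage comes from the notes. The sketch uses the population standard deviation (`pstdev`), which is an assumption, since the notes do not say which formula the course uses.

```python
from collections import Counter
from statistics import mean, median, mode, pstdev

# Percent = frequency / total of valid responses (figures from the notes).
percent = 100 * 768 / 3840

# Hypothetical nominal variable: main source of stress.
answers = ["work", "work", "family", "money", "work", "family"]
freq = Counter(answers)
valid_percent = {label: 100 * n / len(answers) for label, n in freq.items()}
nominal_mode = freq.most_common(1)[0][0]  # most frequently occurring category

# Hypothetical interval variable: incomes in $ thousands.
# One very high value gives the distribution a positive skew.
incomes = [20, 25, 25, 30, 35, 40, 175]

central = {
    "mode": mode(incomes),
    "median": median(incomes),  # middlemost value once sorted
    "mean": mean(incomes),      # pulled upward by the single high income
}
spread = {
    "range": max(incomes) - min(incomes),  # uses only the two extreme values
    "std_dev": pstdev(incomes),            # uses every value in the distribution
}

print(percent, valid_percent, nominal_mode)
print(central)
print(spread)
```

With this made-up income list the mean sits well above the median, which is the pattern the notes describe for positively skewed variables such as income.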