
# 2Z03 Lecture 18: "Quantitative Data Analysis"

School: McMaster University
Department: Sociology
Course: SOCIOL 2Z03
Professor: Gerald Bierling
Semester: Winter

Fox 1 Lecture 18 SOCIOL 2Z03 Wednesday March 26, 2014

Quantitative Data Analysis

Lecture Outline:
• Quantitative Data
• Univariate Analysis
• Bivariate Analysis
• Multivariate Analysis

Quantitative Data:
• Numerical representations of our observations (assigning numerical values to what we observe; the characteristics being measured are referred to as variables)
• 2 main purposes/tasks of quantitative analysis:
  1) Describing variables on their own (descriptive): univariate analysis (e.g., examining income on its own)
  2) Examining relationships between variables (explanatory): bivariate (2 variables together) or multivariate (more than 2 variables at once) analysis (e.g., office testing); this can only be done after we finish the univariate analysis
• How we describe and examine variables depends on the level of measurement
  o More "powerful" tools are available at higher levels of measurement
  o Analysis tools provide more detailed information at higher levels of measurement

Quantitative Data: Numerical Representation:
• Age:
  o 1=1
  o 2=2
  o 3=3
  o 4=4
  o 5=5
  o Etc.
• Sex:
  o Male=1
  o Female=2 (female could be 1 and male could be 2; it doesn't matter)
• Political Affiliation:
  o Liberal=1
  o Conservative=2
  o NDP=3
  o Other=4
• Civil Unrest:
  o Low=1
  o Medium=2
  o High=3

Univariate Analysis: Frequency Distributions:
• What is your main source of stress?
(handout)
• Value labels indicate what the value categories were
• Nominal-level variable (renumbering the categories wouldn't change the meaning of the variable; don't let the frequencies and percents influence what level of measurement you think it is; try mixing up the categories)
• Valid responses/categories: one group of responses
• Missing responses/categories: people who don't answer the question, and responses like these (refused/don't know/weren't asked = missing cases or values); we only want to examine the valid responses
• Total: sum of all valid categories
• Frequency: how many people gave each type of answer
• The frequency and percent columns are not the same: percent = frequency ÷ total (e.g., 768/3840 = 20%)
• 1/3 of respondents feel that work is the main source of stress in their lives
• Make basic descriptive conclusions without reporting every value in the table (summarize the key findings from the table)

Univariate Analysis: Frequency Distributions:
• Chart: univariate analysis can't always… problem (missed it)
• 2 solutions:
  1) Recode the variable into a smaller number of categories (e.g., total household income)
  2) Graph it (histograms show clusters)
• Histograms will typically take one of 3 shapes (skewness):
  i) Normal curve (shape 1): doesn't occur often with the types of variables we're looking at
  ii) Positive skew (shape 2), e.g., house prices, income
  iii) Negative skew (shape 3), e.g., life expectancy

2 main questions:
• Central tendency: what is typical/normal?
  o Mode, median and mean
• Variability/Dispersion: what is atypical/different? (e.g., which countries have different life expectancy values?)
  o Percentages, range, standard deviation

| | Nominal | Ordinal | Interval |
|---|---|---|---|
| Central tendency | Mode | Mode, Median | Mode, Median, Mean |
| Variability | Compare percentages | Compare percentages, Range | Range, Standard deviation |

Univariate Analysis: Mode
• Most frequently occurring value
• Least useful measure of central tendency
• Handout (main source of stress in people's lives; nominal-level variable): the mode, the value that occurs most often, is work

Univariate Analysis: Median
• Divides the distribution into 2 equal halves: the middlemost value
• The values must be in order, i.e., the variable must be at least ordinal
• Chart: Median = 2 (very good). Why? The cumulative percent column adds up the valid percents category by category: excellent (19.2%) + very good (36.8%) = 56%, so the 50% point falls in the "very good" category; about half of people answered very good or better, and half answered very good or worse

Univariate Analysis: Mean
• Arithmetic average
• E.g., FT (feeling thermometer): attitudes towards gays and lesbians
  o Mode = 50
  o Median = 50
  o Mean = 38.8
  o The average is lower than the mode and the median
• If the distribution is skewed, the median might be a better measure of central tendency than the mean, even with interval-level data
  o Why: the mean is being (artificially) inflated or deflated by a few extremely high or low scores

Canada, employment income, 2010 ($):

| | Mean | Median |
|---|---|---|
| Male | 49,000 | 37,000 |
| Female | 34,000 | 27,000 |

• Income takes the shape of a (positively) skewed distribution
• The median is much lower than the mean because a few people have very high incomes compared to everyone else, which inflates the average income

Univariate Analysis: Dispersion:
• For nominal data, compare percentages to determine variation
• FT Chart: if we want to talk about variability, compare the valid percentages
• Work is the main source of stress for many more people (twice as many as family) than any other source of stress; this is how people are dispersed
• Range: the difference between the highest and lowest values (the least useful measure of variability because it ignores everything in between)
• Standard Deviation:
  o Can only be used for interval variables
  o Makes use of all values/cases in the distribution
  o More useful than the range
  o Can be used to compare distributions measured on the same scale: the group with the highest standard deviation has the most variation in responses
• Rule of thumb: double the value of the standard deviation; if the result spans a large part of the range of values, dispersion/variability is high
• Example: Feeling thermometer: Rate Health Care System (0-100)
  o Mean = 78.7
  o Median = 75
  o Mode = 75
  o Range = 100 (
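The numerical-representation and recoding ideas in these notes (codes that are only labels for a nominal variable, ordered codes for an ordinal one, and collapsing a variable like total household income into fewer categories) can be sketched in Python. This is a minimal illustration under stated assumptions, not the course's software: the income cut points, the category labels, and the helper `recode_income` are hypothetical.

```python
import bisect
from collections import Counter

# Codes from the lecture. For a nominal variable the numbers are only labels:
# swapping them (e.g., NDP=1, Liberal=3) would not change the meaning.
PARTY_CODES = {"Liberal": 1, "Conservative": 2, "NDP": 3, "Other": 4}
# For an ordinal variable the codes must preserve the order of the categories.
UNREST_CODES = {"Low": 1, "Medium": 2, "High": 3}

# Hypothetical cut points (dollars) for recoding total household income into
# a small number of categories; these boundaries are illustrative only.
INCOME_CUTS = [40_000, 80_000]
INCOME_LABELS = ["Under $40,000", "$40,000 to $79,999", "$80,000 or more"]

def recode_income(amount):
    """Map an exact income to one of the INCOME_LABELS categories."""
    # bisect_right counts how many cut points are <= amount.
    return INCOME_LABELS[bisect.bisect_right(INCOME_CUTS, amount)]

# Hypothetical incomes: many distinct values become three readable categories.
incomes = [18_500, 39_999, 40_000, 52_300, 61_000, 79_999, 80_000, 145_000]
print(Counter(recode_income(x) for x in incomes))
print([PARTY_CODES[p] for p in ["NDP", "Liberal", "NDP"]])  # coded responses
print(sorted(UNREST_CODES, key=UNREST_CODES.get))  # categories in ranked order
```

Swapping the party codes would not change any conclusion, which is why the notes list only the mode as the measure of central tendency for a nominal variable.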
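The frequency-distribution reasoning in these notes (percent based on all cases, valid percent based only on non-missing responses, cumulative percent, the mode as the most frequent category, and the median as the category where the cumulative valid percent first reaches 50%) can be sketched as follows. It is a minimal Python sketch assuming responses arrive as text labels; the missing-value labels, the helper names `frequency_table`, `mode_category` and `median_category`, and the response data are all hypothetical, not taken from the handout.

```python
from collections import Counter

# Hypothetical labels for non-answers; these are treated as missing values.
MISSING = {"Refused", "Don't know", "Not asked"}

def frequency_table(responses, order):
    """Build (category, frequency, percent, valid %, cumulative valid %) rows.

    percent uses all cases as the base; valid % and cumulative % use only
    the valid (non-missing) cases. For an ordinal variable, `order` must
    list the categories in their ranked order.
    """
    counts = Counter(responses)
    total = len(responses)
    valid_total = sum(n for cat, n in counts.items() if cat not in MISSING)
    rows, cumulative = [], 0.0
    for cat in order:
        freq = counts[cat]  # Counter returns 0 for categories nobody chose
        valid_pct = 100 * freq / valid_total
        cumulative += valid_pct
        rows.append((cat, freq, 100 * freq / total, valid_pct, cumulative))
    return rows

def mode_category(rows):
    """Most frequent valid category (the first one listed if there is a tie)."""
    return max(rows, key=lambda row: row[1])[0]

def median_category(rows):
    """First category at which the cumulative valid percent reaches 50%."""
    for cat, _freq, _pct, _valid_pct, cumulative in rows:
        if cumulative >= 50:
            return cat
    return None

# Hypothetical self-rated answers, including two refusals (missing).
responses = (["Excellent"] * 2 + ["Very good"] * 4 + ["Good"] * 3
             + ["Fair"] * 1 + ["Refused"] * 2)
rows = frequency_table(responses, ["Excellent", "Very good", "Good", "Fair", "Poor"])

for cat, freq, pct, valid_pct, cum in rows:
    print(f"{cat:<10} {freq:>3} {pct:6.1f} {valid_pct:6.1f} {cum:6.1f}")
print("mode:  ", mode_category(rows))
print("median:", median_category(rows))
```

With these made-up data the two refusals are excluded from the valid percents, so "Excellent" is 20% of valid answers but only about 16.7% of all 12 cases; the percent column is the same frequency-over-total arithmetic as the lecture's 768/3840 = 20%. The median lands on "Very good" because the cumulative valid percent first passes 50% there, the same logic as 19.2% + 36.8% = 56% in the notes.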
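The point about skewed distributions (a few extreme scores pull the mean but barely move the median, as with the employment-income figures) can be checked numerically with Python's standard `statistics` module. The income values are hypothetical, and `skew_hint` is a hypothetical helper that uses the mean-versus-median comparison only as a rough indicator, not a formal skewness measure.

```python
import statistics

def skew_hint(values):
    """Rough heuristic only: compare the mean with the median.

    mean > median suggests a positive (right) skew, mean < median a
    negative (left) skew. This is not a formal skewness statistic.
    """
    mean, median = statistics.mean(values), statistics.median(values)
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "no clear skew"

# Hypothetical annual incomes (dollars): most are modest, one is very high.
incomes = [25_000, 28_000, 30_000, 32_000, 35_000, 40_000, 250_000]

print("mean:  ", round(statistics.mean(incomes)))  # pulled up by the 250,000 case
print("median:", statistics.median(incomes))       # middle (4th of 7) value
print("skew:  ", skew_hint(incomes))

# Dropping the single extreme case moves the mean far more than the median.
trimmed = incomes[:-1]
print("mean without top earner:  ", round(statistics.mean(trimmed)))
print("median without top earner:", statistics.median(trimmed))
```

Removing one extreme case shifts the mean by tens of thousands of dollars while the median moves only slightly, which is why the notes recommend the median for skewed interval-level data.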
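The claim that the range is the least useful measure of variability because it ignores everything in between, while the standard deviation uses every case, can be illustrated with two hypothetical sets of 0-100 feeling-thermometer scores that share the same range. This sketch uses Python's `statistics` module: `pstdev` treats the scores as a whole population and `stdev` treats them as a sample (dividing by n - 1). The notes do not say which version the course uses, so both are shown. The scores and the `value_range` helper are hypothetical.

```python
import statistics

def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)

# Hypothetical 0-100 feeling-thermometer scores for two groups.
# Both groups have the same range, but their in-between values differ.
group_c = [0, 50, 50, 50, 100]  # most scores bunched at the middle
group_d = [0, 0, 50, 100, 100]  # scores pushed out to the extremes

for name, scores in [("C", group_c), ("D", group_d)]:
    print(
        f"Group {name}: range = {value_range(scores)}, "
        f"SD (population) = {statistics.pstdev(scores):.2f}, "
        f"SD (sample) = {statistics.stdev(scores):.2f}"
    )
# Both ranges are 100; group D has the larger standard deviation.
```

Because both groups are measured on the same 0-100 scale, their standard deviations can be compared directly: group D has more variation in responses even though the range cannot tell the two groups apart.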
