McMaster University

Sociology

SOCIOL 2Z03

Gerald Bierling

Winter


Fox 1 Lecture 18
SOCIOL 2Z03
Wednesday March 26, 2014
Quantitative Data Analysis
Lecture Outline:
• Quantitative Data
• Univariate Analysis
• Bivariate Analysis
• Multivariate Analysis
Quantitative Data:
• Numerical representations of our observations (we assign numerical values to
what we observe; the resulting measures are referred to as variables)
• 2 main purposes/tasks of quantitative analysis:
1) Describing variables on their own (descriptive): univariate analysis
(e.g. examining income on its own)
2) Examining relationships between variables (explanatory): bivariate (2
variables together) or multivariate (more than 2 variables at once) analysis
(e.g. hypothesis testing). This can only be done after we finish with
univariate analysis
• How we describe and examine depends on the level of measurement
o More ‘powerful’ tools are available at higher levels of measurement
o Analysis tools provide more detailed information at higher levels of
measurement
Quantitative Data: Numerical Representation:
• Age:
o 1=1
o 2=2
o 3=3
o 4=4
o 5=5
o Etc.
• Sex:
o Male=1
o Female=2 (female could be 1 and male could be 2; the numbering is arbitrary)
• Political Affiliation:
o Liberal=1
o Conservative=2
o NDP=3
o Other=4
• Civil Unrest:
o Low=1
o Medium=2
o High=3
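A minimal sketch of the coding step above, using the codes listed in these notes. The record and the `encode` helper are invented for illustration and are not from the lecture.

```python
# Coding schemes copied from the notes; the numbers are just labels.
SEX_CODES = {"Male": 1, "Female": 2}
PARTY_CODES = {"Liberal": 1, "Conservative": 2, "NDP": 3, "Other": 4}
UNREST_CODES = {"Low": 1, "Medium": 2, "High": 3}


def encode(person):
    """Turn one respondent's answers into numeric codes."""
    return {
        "age": person["age"],  # age is already a number (1=1, 2=2, ...)
        "sex": SEX_CODES[person["sex"]],
        "party": PARTY_CODES[person["party"]],
    }


# Hypothetical respondent, invented for illustration.
person = {"age": 34, "sex": "Female", "party": "NDP"}
coded = encode(person)
print(coded)

# For an ordered variable such as civil unrest, comparing codes makes sense.
# For sex or party, the codes could be shuffled without changing the meaning.
print(UNREST_CODES["Low"] < UNREST_CODES["High"])
```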
Univariate Analysis: Frequency Distributions:
• What is your main source of stress? (handout)
• Value labels indicate what the value categories were
• Nominal level variable (renumbering the categories wouldn’t change the meaning
of the variable; don’t let frequencies and percentages influence what level of
measurement you think it might be. Try mixing up the categories)
• Valid responses/categories: one group of responses
• Missing responses/categories: people who don’t want to answer the question
and similar responses (Refuse/Don’t know/weren’t asked = missing cases or
values). We only want to examine the valid responses
• Total: the sum of all valid categories
• Frequency: how many people gave each type of answer
• The frequency and percent columns are not the same: percent = frequency ÷ total (768/3840 = 20%)
• 1/3 of respondents feel that work is the main source of stress in their lives
• Make basic descriptive conclusions without reporting every value in the table
(summarize key findings from the table)
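The frequency table described above could be built roughly like this. The answers are invented for illustration (only the 768/3840 figure comes from the notes), and `None` is an assumed way of marking missing cases.

```python
from collections import Counter

# Hypothetical survey answers; None marks a missing case
# (refused, don't know, or not asked).
responses = ["work", "family", "work", None, "finances",
             "work", "health", None, "family", "work",
             "finances", "finances"]

# Drop missing cases so that percentages are based on valid responses only.
valid = [r for r in responses if r is not None]
frequencies = Counter(valid)
total_valid = len(valid)

valid_percent = {
    category: 100 * count / total_valid
    for category, count in frequencies.items()
}

# Print categories from most to least frequent, then the valid total.
for category, count in frequencies.most_common():
    print(f"{category:<10} {count:>3} {valid_percent[category]:>6.1f}%")
print(f"{'Total':<10} {total_valid:>3}")

# The figure from the notes: 768 out of 3840 valid cases.
handout_percent = 100 * 768 / 3840
print(handout_percent)
```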
Univariate Analysis: Frequency Distributions
• Chart: univariate analysis can’t always….problem (missed it)
• 2 solutions:
1) Recode variable into smaller number of categories (total household income)
2) Graph (histograms=show clusters)
• Histograms will typically take one of 3 shapes (skewness)
i) Normal curve (SHAPE 1): doesn’t occur a lot with the type of
variables we’re looking at
ii) Positive skew (SHAPE 2): e.g. house prices, income
iii) Negative skew (SHAPE 3): e.g. life expectancy
2 main questions:
• Central tendency: what is typical/normal?
o Mode, median and mean
• Variability/Dispersion: what is atypical/different? (what countries have different life
expectancy values?)
o Percentages, range, standard deviation
                    Nominal               Ordinal                      Interval
Central tendency    Mode                  Mode, Median                 Mode, Median, Mean
Variability         Compare percentages   Compare percentages, Range   Range, Standard deviation
Univariate Analysis: Mode
• Most frequently occurring value
• Least useful measure of central tendency
• Handout: main source of stress in people’s lives is a nominal level variable;
the mode (the value that occurs the most) is work
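As a quick illustration, the mode of a nominal variable can be found with Python's standard library. The answers below are invented and are not the handout data.

```python
from statistics import mode

# Hypothetical answers to "What is your main source of stress?"
responses = ["work", "family", "work", "finances", "work", "health", "family"]

# mode() works on category labels as well as numbers.
most_common = mode(responses)
print(most_common)  # work
```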
Univariate Analysis: Median
• Divides distribution into 2 equal halves: middlemost value
• But values must be in order, i.e. at least an ordinal variable
• Chart: Median=2 (very good). Why? The cumulative percent column adds up the
valid percents category by category (e.g. excellent=19.2% plus very good=36.8%
gives 56%). The 50% mark falls inside "very good", so that category splits the
distribution into two halves
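The cumulative percent logic above can be sketched as follows. Only the first two percentages (19.2 and 36.8) come from the notes; the remaining categories and their values are assumptions added so that the column sums to 100.

```python
# Valid percents per category, listed in rank order (best to worst).
valid_percent = [
    ("excellent", 19.2),   # from the notes
    ("very good", 36.8),   # from the notes
    ("good", 30.0),        # hypothetical
    ("fair", 10.0),        # hypothetical
    ("poor", 4.0),         # hypothetical
]

cumulative = 0.0
median_category = None
for category, percent in valid_percent:
    cumulative += percent
    print(f"{category:<10} {percent:>5.1f} {cumulative:>6.1f}")
    # The median is the first category whose cumulative percent reaches 50.
    if median_category is None and cumulative >= 50:
        median_category = category

print("Median category:", median_category)
```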
Univariate Analysis: Mean
• Arithmetic average
• E.g. FT (feeling thermometer): Attitudes towards gays and lesbians
o Mode=50
o Median=50
o Mean=38.8
o The average (mean) is lower than the mode and median
• If distribution is skewed, the median might be a better measure of central
tendency than is the mean, even if there is interval level data
o Why: the mean is being (artificially) inflated or deflated by a few extremely
high or low scores
Canada, employment income, 2010 ($)
          Mean     Median
Male      49,000   37,000
Female    34,000   27,000
• Income takes the shape of a skewed distribution
• Median is much lower than the mean because a few people have very high
incomes compared to everyone else, which inflates the average income values
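A small sketch of why a skewed distribution pulls the mean away from the median. The incomes are made up for illustration and are not the 2010 Canadian figures.

```python
from statistics import mean, median

# Hypothetical incomes: six ordinary earners and one very high earner.
incomes = [20_000, 25_000, 27_000, 30_000, 35_000, 40_000, 250_000]

mean_income = mean(incomes)
median_income = median(incomes)
print("mean:", mean_income)
print("median:", median_income)

# Dropping the single extreme value moves the mean a lot but the median little.
without_top = incomes[:-1]
print("mean without top earner:", mean(without_top))
print("median without top earner:", median(without_top))
```

With positively skewed data like this, the mean sits well above the median, which is the pattern in the income table above.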
Univariate Analysis: Dispersion:
• For nominal data, compare percentages to determine variation
• FT Chart: If we want to talk about variability, compare valid percentages
• Work is the main source of stress for many more people than any other source
of stress (about twice as many as family); this is how people are
dispersed
• Range: difference between highest and lowest value (least useful measure of
variability because there’s nothing in-between)
• Standard Deviation:
o Can only be used for interval variables
o Makes use of all values/cases in the distribution
o More useful than range
o Can be used to compare distributions measured on the same scale: the group
with the highest standard deviation has the most variation in responses
• Rule of thumb: double the value of the standard deviation, if it encompasses
twice the range of values, dispersion/variability is high
• Example=Feeling thermometer: Rate Health Care System (0-100)
o Mean=78.7
o Median=75
o Mode=75
o Range=100 (
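To illustrate the range and standard deviation bullets in this section, here is a rough sketch comparing two groups measured on the same 0-100 scale. The ratings are invented, and `pstdev` (the population standard deviation) is one assumed choice; `statistics.stdev` would give the sample version.

```python
from statistics import pstdev

# Hypothetical 0-100 ratings from two groups, invented for illustration.
group_a = [70, 75, 75, 80, 80]
group_b = [0, 50, 75, 100, 100]


def value_range(values):
    """Difference between the highest and lowest value."""
    return max(values) - min(values)


for name, scores in (("A", group_a), ("B", group_b)):
    print(name, "range:", value_range(scores), "sd:", round(pstdev(scores), 1))
```

The range only looks at the two extreme scores, while the standard deviation uses every case, so it is the better basis for saying group B varies more than group A.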
