# Soc222Lec#3 (Jan23rd).docx (lin. and Kranzler)


University of Toronto Mississauga

Sociology

SOC222H5

John Kervin

Winter


SOC 222 -- MEASURING the SOCIAL WORLD
Session #3 -- MEANS & VARIATION
Sep 2013
Agenda:
Announcements
Where we are
Today’s Objectives: Know …
Terms to Know
Ratio (Quantitative) Variables
Frequency Distributions for Rat Vars
Frequency Distribution Histograms in SPSS
1. Skewed Distributions
2. Outliers
Central Tendency
Mode
Median
Mean
Comparing Measures of Central Tendency
SPSS: Central Tendency
Variation
1. Range
2. Variance
Standard Deviation (SD)
3. Inter-quartile range
Individual Uniqueness
Z scores
Normal Curve Distributions
CatRat Relationships
Comparing Means
Other Stuff in the Text
Tutorial Vote
Today’s Objectives: Know
1. How to get a histogram frequency distribution
2. Skewness and outliers
3. Three measures of central tendency, and pros and cons of each
4. How to get central tendency and variation measures with SPSS
5. Variation as distance from the mean
6. Three measures of variation
7. How to assess the uniqueness of a specific case with Z scores and percentiles
8. Difference between experimental and non-experimental designs
9. Comparing means as effect size for cat rat relationships
Terms to Know
quantitative variable
valid percent
normal curve
histogram
skewness
positive and negative skewness
outlier
mode, median, mean
bimodal distribution
variance, range, standard deviation, inter-quartile range
percentile
Z score
true experiment, quasi-experiment
RATIO (QUANTITATIVE) VARIABLES
• Counts
o Could be the total number of students in the class
• Amounts
o For university classes, the amount could be the total cost of the text for each
class
FREQUENCY DISTRIBUTIONS for RAT VARS
- Frequency distribution tables are not very useful for quantitative variables: they
end up as huge tables, which make it hard to find the median and mode
- Reminder: note the difference between “Percent” and “Valid Percent”
- Valid Percent is based on the number of cases that actually answered. Percent is
based on the total number of cases, including the missing ones.
- Because the tables are too long, we look instead at the shape of the distribution.
Shape matters because it affects what we do with the data. Data analysis is easier
with a normal distribution [bell-shaped curve]
- We don’t want the tables, we want the graphics. This graphic is called a histogram:
it gives us the shape of the distribution.
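The difference between “Percent” and “Valid Percent” can be sketched in a few lines of Python. This is only an illustration, not SPSS output: the responses are made up, and `None` is an assumed stand-in for a missing answer.

```python
from collections import Counter

# Hypothetical responses to one question; None marks a missing answer (assumption).
responses = [1, 2, 2, 3, None, 2, None, 1, 3, 2]

def percent_table(values):
    """Return {value: (count, percent, valid_percent)} for the non-missing values."""
    total = len(values)                           # all cases, including missing
    valid = [v for v in values if v is not None]  # cases that actually answered
    counts = Counter(valid)
    return {
        value: (count, 100 * count / total, 100 * count / len(valid))
        for value, count in sorted(counts.items())
    }

table = percent_table(responses)
for value, (count, pct, valid_pct) in table.items():
    print(f"{value}: n={count}  percent={pct:.1f}  valid percent={valid_pct:.1f}")
```

With 10 cases and 2 missing, the value 2 (4 cases) is 40% of all cases but 50% of the valid cases.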
• Distribution shape is important because
- Lets us know what we are analyzing
- Gives us an idea of the data
- Makes data analysis easier
• normal curve
• This is a bell-shaped curve
Frequency Distribution Histograms in SPSS
The graphic of a frequency distribution is called a histogram
Revised SPSS Guide is posted:
SPSS Frequency Distributions
Open data set
• Click on “Analyze” on menu bar
• Choose “Descriptive statistics” from dropdown menu
• Choose “Frequencies”
- Uncheck “Display frequency tables” if you do not want the table to show
This opens a box called “Frequencies”
Expand the box to read the variable labels
• List of variables on the left
• In the middle: an empty working area called “Variable(s)”
• Option buttons on the right
• Action buttons on the bottom
Click on the variable you want
• Click on the arrow to move it to the Variables area
For a Frequency Distribution Table
• Click on OK
• This opens your output window with a frequency distribution table.
For a Frequency Distribution Histogram
• Click on “Charts…” option button
• Select Histogram, Normal curve
• Continue
• In main Frequencies box
• Uncheck “Display frequency tables”
• OK
• This opens your output window with a Histogram
The variable has approximately a normal curve shape
• Missing some cases in the middle range
• Big gap at 80; if these cases were moved over, the distribution would be normal
1. Skewed Distributions
These are distributions which stretch out in one direction.
We say such a distribution is skewed.
• Positively skewed: most of the data is on the left, and the tail stretches out to the right
• Negatively skewed: most of the data is on the right, and the tail stretches out to the left
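One way to check the direction of skew numerically is a moment-based skewness coefficient. The sketch below is an assumption for illustration: the lecture only reads skew off the histogram, the two data sets are made up, and SPSS may report a slightly different, sample-adjusted value.

```python
def skewness(values):
    """Simple (population) skewness: mean cubed deviation over SD cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance (population form)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets, invented for this example.
right_tail = [1, 1, 2, 2, 3, 10]  # most data on the left, tail to the right
left_tail = [1, 8, 9, 9, 10, 10]  # most data on the right, tail to the left

print(skewness(right_tail) > 0)  # True: positively skewed
print(skewness(left_tail) < 0)   # True: negatively skewed
```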
2. Outliers
• Outliers are extreme values
• They are noticeably different from the rest of the values of the variable
• They can distort the conclusions you come up with.
• The example shows an outlier at around 5.
• This distribution is clearly skewed, but it also has one outlier: only one case had a
homicide rate of around 5.
CENTRAL TENDENCY
The underlying question:
• What’s the “typical value” of a variable?
• Three common measures: mode, median, mean
• Today we will talk about the mean; this is the one that matters (K:43-47)
Texts:
• Linneman: 76-84
• Kranzler: 43-47
Mode
• Mode: The value with the most cases
EG: Ages: 21, 21, 24, 25, 27, 28, 29
• The value “21” occurs twice
• All other ages occur just once
• The mode is “21”, since it occurs most often
NOTE:
• A distribution can have two modes
• bimodal: when we have two modes
EG: 21, 21, 24, 25, 27, 27, 29
• This distribution is bimodal
• We have 2 modes: “21” and “27”.
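The two mode examples above can be checked with Python's standard library. This is a minimal sketch using `statistics.multimode` (Python 3.8+), which returns every value tied for the most cases; the variable names are just for illustration.

```python
from statistics import multimode

ages_one_mode = [21, 21, 24, 25, 27, 28, 29]   # first example: only 21 repeats
ages_two_modes = [21, 21, 24, 25, 27, 27, 29]  # second example: 21 and 27 both repeat

print(multimode(ages_one_mode))   # [21]
print(multimode(ages_two_modes))  # [21, 27] -> bimodal
```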
Median
• Median: The value of the “middle” case
• When all cases are sorted in rank order
• If you put all your cases in rank order, the one that’s in the middle is the
median.
EG: 21, 24, 24, 26, 27
• median is “24”
• it’s in the middle
• applies to ordinal and ratio variables; you cannot have a median for a nominal
category variable
NOTES:
1. Only applies to ordinal and ratio variables
2. What if you have an even # of cases?
EG: 21, 21, 24, 25, 27, 28
• the two middle cases are “24” and “25”
• median is the mid-point between these two
• median here is 24.5
• when you have two cases in the middle you take the midpoint: the sum of
those two values divided by 2.
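The rule for odd and even numbers of cases can be written out as a short Python sketch. The function name `middle_value` is made up for this example; the data are the two examples from the notes.

```python
def middle_value(values):
    """Median: sort the cases, then take the middle one (or the midpoint of the middle two)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of cases: one case sits exactly in the middle.
        return ordered[mid]
    # Even number of cases: midpoint of the two middle cases.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(middle_value([21, 24, 24, 26, 27]))      # 24
print(middle_value([21, 21, 24, 25, 27, 28]))  # 24.5
```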
Mean
• Mean: The average value
EG: the distribution (of ages in a grad student seminar) is:
21, 21, 24, 25, 27, 27, 29
$x_i$ means the value of the $i$th case.
EG: $x_3$ is 24.
$\sum$ (“sigma”) means sum
EG: $\sum x_i$ is 174
$\bar{x} = \frac{\sum x_i}{n}$
Interpretation:
• The x with the bar over it means the mean
• The equal sign tells us how to get the mean
• The sigma sign tells us to add something up
• The numerator (top of the fraction)
• sigma followed by x with the subscript i tells us to sum up all the values of x
• The denominator (bottom)
• n, the number of cases, tells us what to divide that sum by
EG: mean of this set of values: 3, 5, 7, 9
$\bar{x} = \frac{3 + 5 + 7 + 9}{4} = \frac{24}{4} = 6$
NOTES:
- add up all the numbers and divide by the number of cases.
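The same calculation as a minimal Python sketch: sum the values, then divide by the number of cases. The function name `mean_of` is made up for this example. Note that Python counts positions from 0, so the third case is `ages[2]`.

```python
def mean_of(values):
    """Mean: the sum of all values divided by the number of cases (n)."""
    return sum(values) / len(values)

ages = [21, 21, 24, 25, 27, 27, 29]  # the grad seminar example

print(ages[2])                # 24, the value of the third case
print(sum(ages))              # 174, the sum of all cases
print(mean_of([3, 5, 7, 9]))  # 6.0
```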
Comparing Measures of Central Tendency
Nominal category:
- can only use the mode
Ordinal category:
- can use the mode and the median
Quantitative:
- can use any one of the three, but the mode is not very useful.
- We want the central value, not the most frequent value.
- That leaves us with the mean and the median
• Mode:
o The value that occurs most often
• Median
o When you need the typical case, the one in the middle
• Advantage of the median:
• Not affected by outliers.
• outliers (Linneman, p. 77)
EG: 21, 21, 24, 25, 27, 28, 29
• median is “25”
EG: 21, 21, 24, 25, 27, 28, 92
• median is still “25”
• Disadvantage of the median:
• Doesn’t take each score into account equally
• The scores at the ends of the distribution don’t count as much
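The outlier example can be checked with Python's `statistics` module. The comparison with the mean is added here for illustration; it uses the same two lists as the median example.

```python
from statistics import mean, median

without_outlier = [21, 21, 24, 25, 27, 28, 29]
with_outlier = [21, 21, 24, 25, 27, 28, 92]  # last case replaced by an outlier

# The median ignores how extreme the end scores are.
print(median(without_outlier), median(with_outlier))  # 25 25

# The mean uses every score, so the single outlier pulls it upward.
print(mean(without_outlier), mean(with_outlier))      # 25 34
```

Changing one score from 29 to 92 leaves the median at 25 but moves the mean from 25 to 34.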
• Mean
• Advantages of the mean:
1. It takes all the values into account
