University of Toronto Mississauga
Sociology
SOC222H5
John Kervin
Lecture 8 (Winter)

SOC 222 -- MEASURING the SOCIAL WORLD
Session #8 -- CATEGORY & RATIO: INFERENTIAL
Readings:
Linneman: ch. 6 (sampling distribution of differences in means, t-test, ANOVA)
Kranzler: ch. 11 (t-test, one- and two-tailed tests)
ch. 12: 125-133 (ANOVA, degrees of freedom, omega squared)
===============================================================
Today’s Objectives: Know …
1. The role of eta-squared as a PVE measure of effect size
2. How eta is related to the F statistic
3. The relationship between a statistic and a sampling distribution
4. How to calculate degrees of freedom for a t-test
5. The difference between one- and two-tailed tests
6. Know how to run and interpret an ANOVA test
7. How to use SPSS to get eta, t-tests, and ANOVA F-tests
8. The difference between true experiments, quasi-experiments, and non-experimental designs
Terms to Know
t distribution
F distribution
eta-squared
proportion of variance explained (PVE)
one-tailed test
two-tailed test
t-test
critical values
ANOVA
degrees of freedom
between-categories variance
within-categories variance
total variance
=============================================================
EFFECT SIZE in CAT-->RAT RELATIONSHIPS
Difference of Means
Eta Squared
Problem with the difference of means: it is easy to interpret, but you can't tell whether the effect is big, small, or somewhere in between
- Better to have a PVE measure
PVE measure
• PVE = Proportion of variation explained
How much variation in the DV is being explained by the IV
• IE, think of it as covariance
• Recall overlapping circles:
• PVE varies from .0 to 1.0
• Less than .02 trivial. Can treat as no relationship
• .02 to .04 weak
• .05 to .08 moderate
• .09 and higher strong
• eta-squared
• Symbol: η² (correlation, by contrast, is written with a small r)
• Two advantages:
• An exact measure of how important the relationship is
• Can be compared with other PVE measures
SPSS: Means (eta, ANOVA)
Report: Receiving EI per Thousand (DV)

Sex Predominance | Mean    | N  | Std. Deviation
More Males       | 17.3900 | 17 | 6.28505
More Females     | 13.8699 | 17 | 3.69568
Total            | 15.6299 | 34 | 5.38202
• Interpretation: in male-predominant cities, about 3.5 more residents per thousand are receiving EI than in female-predominant cities
Measures of Association

                                             | Eta  | Eta Squared
Receiving EI per Thousand * Sex Predominance | .332 | .110
• Less than .02 trivial. Can treat as no relationship
• .02 to .04 weak
• .05 to .08 moderate
• .09 and higher strong
• Here eta-squared = .110, so this is a strong relationship
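Eta-squared can be reproduced by hand from the summary table above: it is the between-categories sum of squares divided by the total sum of squares. A minimal Python sketch (standard library only; the variable names are my own), using the published means, Ns, and standard deviations:

```python
# Summary statistics from the SPSS "Report" table above: (mean, n, sd)
groups = [
    (17.3900, 17, 6.28505),  # More Males
    (13.8699, 17, 3.69568),  # More Females
]

n_total = sum(n for _, n, _ in groups)
grand_mean = sum(mean * n for mean, n, _ in groups) / n_total

# Between-categories sum of squares: n_k * (mean_k - grand_mean)^2
ss_between = sum(n * (mean - grand_mean) ** 2 for mean, n, _ in groups)
# Within-categories sum of squares: (n_k - 1) * sd_k^2
ss_within = sum((n - 1) * sd ** 2 for _, n, sd in groups)
ss_total = ss_between + ss_within

eta_squared = ss_between / ss_total
print(round(eta_squared, 3))  # -> 0.11, matching the SPSS output above
```

The result matches the .110 in the Measures of Association table, which by the conventions above counts as a strong relationship.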
==============================================================
SPSS: Means (eta, ANOVA)
For: category → ratio relationships
Finds:
Category means and effect size eta
ANOVA F-test and significance level
• Menu bar, Analyze, Compare Means, Means
• This opens a box called “Means”
• Variable list on the left
• Two working areas in the middle
• Three option buttons on right
• Five action buttons at bottom
• Move DV into “Dependent List”
• Move IV into “Independent List”
• In Options: click for “Anova table and eta”
• Continue
• OK
• This opens your output window with four tables
• First is “Case Processing Summary”. Ignore
• Second gives the category means and numbers of cases
• Third is the ANOVA Table
• It gives value of F and significance level
• Note: there are better SPSS procedures for ANOVA F-tests
• Fourth gives eta-squared
• For ANOVA: Determine which one of the five conventional significance levels is
closest to and greater than the SPSS output significance level
• EG: If output level is .055
• Then conventional level is .10
• NOTE: The F-test is always a one-tailed test:
• The null hypothesis is that the between-categories variance is zero
• The F ratio can never be negative
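The rule above for reporting significance can be sketched as a tiny helper. I'm assuming the five conventional levels are .10, .05, .01, .001, and .0001 (the notes don't list the exact set), and the function name is my own:

```python
# Assumed set of the five conventional significance levels
CONVENTIONAL_LEVELS = [0.0001, 0.001, 0.01, 0.05, 0.10]

def conventional_level(spss_sig):
    """Return the conventional level closest to and not below the SPSS output level."""
    for level in CONVENTIONAL_LEVELS:
        if level >= spss_sig:
            return level
    return None  # not significant at any conventional level

print(conventional_level(0.055))  # -> 0.1, as in the example above
```

So an SPSS output level of .055 would be reported against the .10 conventional level.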
==============================================================
Last week: Inference for Sample Proportions
Last week we covered inferential statistics for categorical → categorical relationships.
This is a relationship between a categorical variable (Cause) and a categorical variable (Effect).
Research question, e.g.: Are women more politically liberal than men?
IV: Gender (Cause) → DV: Vote for Liberal party (yes/no) (Effect)
To test this relationship, we use a chi-square test of significance
Sampling distribution = chi-square distribution
This week: Inference for sample means
• This week we cover inferential statistics for categorical → ratio relationships.
• This is a relationship between a categorical variable (Cause) and a ratio variable (Effect).
Research e.g.: Is a city's sex ratio related to its employment insurance rate?
IV: City's sex ratio (Cause) → DV: EI rate (Effect)
2 inferential statistics tests for this kind of relationship:
t-test and F-Test (ANOVA)
Each test has its own sampling distribution:
t-distribution for t-Tests
F-distribution for F-Test
Reminder:
• We used t-distribution before
In session #6: to find confidence intervals for sample means
The F-test is new
NOTES:
• Actually, sociologists very rarely use t-tests & ANOVA to analyze category → ratio relationships.
• Instead sociologists use regression (what we learned at the start of the course)
So why bother with t-test and ANOVA?
• Because the sampling distributions of T and F are very important.
In regression:
• The t-distribution is important in significance tests of regression slopes.
• The F-distribution is important in significance tests of regression models.
t-test
The idea: Comparing two categories (always IV)
Comparing on some ratio measure (always the DV)
Focus: The difference of means
TERMINOLOGY NOTE: The text talks about "two samples" and "two populations."
1) This can be confusing. The terminology comes from psychology, where it distinguishes independent from matched samples.
i) We almost never use matched samples in sociology.
2) Instead of “two samples”, think of two different groups in one sample
o I.E.: Think “categories” instead of “samples” or “populations”
o E.G.: In a sample of 100 UTM students
Two categories: male & female
Four categories: academic year
Three categories: academic area
Hypothesis testing with category means: T-Tests
• What does a T-test do?
Disadvantage of t-test: only compares two category means
F-test advantage: can compare two or more category means
E.g.: Is there a significant difference in average earnings between immigrants and native-born Canadians?
IV: Immigrant (yes or no)
DV: Earnings
• Why use a t-test?
• When to use a T-Test?
• When is it incorrect to use a T-Test?
• What is a disadvantage of t-test in comparison to F-test:
The t Statistic and Sampling Distribution
• First we want to convert difference of means to a test statistic.
o A value called “t”
• This is also a test statistic
• We begin with the difference of means:
(x̄A − x̄B)
• This is our unadjusted sample statistic.
• First term: the sample mean for category “A”
• Second term: the sample mean for category “B”
How do we get “t”?
• Using a formula, we “adjust” the difference of means to give us a “t” value
• Once we have the value of t, we can use the t sampling distribution
t = (x̄A − x̄B) / √(s²A/nA + s²B/nB)
Remember: denominators are often just “adjustments” to make the numerators behave
• Same thing here
• Don’t memorize this
• Instead, recognize:
t = (x̄A − x̄B) / √(s²A/nA + s²B/nB)
• Numerator is the difference of means
• This is the meat of the formula
• Denominator just adjusts this to standard error units
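As a check, this formula can be applied to the EI example from the SPSS tables earlier (means 17.3900 and 13.8699, each category with n = 17). A minimal Python sketch using only the standard library; the variable names are mine:

```python
import math

# Summary statistics from the earlier SPSS "Report" table
mean_a, sd_a, n_a = 17.3900, 6.28505, 17  # More Males
mean_b, sd_b, n_b = 13.8699, 3.69568, 17  # More Females

# Numerator: the difference of means (the "meat" of the formula)
diff = mean_a - mean_b

# Denominator: adjusts the difference into standard error units
se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)

t = diff / se
df = n_a + n_b - 2  # each category mean "uses up" one degree of freedom

print(round(t, 2), df)  # -> 1.99 32
```

The numerator does the substantive work; the denominator only rescales it, which is why the t value can be compared against the t sampling distribution.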
Sampling distribution for a t-statistic
We have a sampling distribution for this statistic: the t distribution
1. The sampling distribution is bell-shaped, like the normal distribution
2. The shape varies with degrees of freedom
3. This is just like the chi-square test
4. The shape of the chi-square curve also depends on degrees of freedom
We assume the mean of the t sampling distribution is zero
o This is the same as a null hypothesis:
Null: There is no difference in the category means
• You can see immediately we’re testing statistical significance
• Notice how the t-distribution becomes flatter with a smaller n.
• Why? This is because the "t" distribution changes shape with the degrees of freedom
Review
• What’s our sampling distribution?
o The t distribution
• What’s our test statistic?
o A value called “t”
• How do we get “t” value from our sample?
o We "adjust" the difference of means to give us a "t" value
• Once we have the t-value, how do we find out how far it is from the mean of 0?
o We compare this value with critical values of t
o These are found in a table (EG, p. 170 in Kranzler)
• Table shows how far from the mean (of zero) the t value is
• With this distance from the mean, and our degrees of freedom, we can find the
area under the curve from our t value and beyond.
o This area represents samples that would give a t value equal to or greater than the one we found.
o This area gives us the probability of a Type I error
The chance of getting a sample statistic t in this area when the
population value of t is really zero
This probability is our level of statistical significance
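The comparison against critical values can be made concrete. Below is a sketch using a few standard two-tailed .05 critical values from a t table, applied to the t of about 1.99 with df = 32 from the EI example; the helper function and the fallback-row choice are my own illustration:

```python
# Two-tailed .05 critical values from a standard t table
CRITICAL_T_05 = {5: 2.571, 10: 2.228, 20: 2.086, 30: 2.042, 60: 2.000}

def critical_value(df):
    """Use the largest tabled df not exceeding ours (a conservative choice)."""
    usable = [d for d in CRITICAL_T_05 if d <= df]
    return CRITICAL_T_05[max(usable)]

t_value, df = 1.99, 32           # from the EI example
crit = critical_value(df)        # df = 32 falls back to the df = 30 row

print(crit, abs(t_value) >= crit)  # -> 2.042 False: not significant at .05
```

Note how the critical values shrink as df grows: with a small n the t distribution is flatter, with heavier tails, so a larger t is needed to reach the same significance level.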
Degrees of Freedom for a t Test
Reminder: the “t” distribution changes shape with the degrees of freedom
What are the degrees of freedom?
o Add the number of cases in each category: nA + nB
o EG: 25 males and 25 females = 50
• Subtract for the two means in the denominator
o Each "uses up" a degree of freedom
o EG: 50 - 2 = 48
This is our degrees of freedom
df = nA + nB - 2
See:
Linneman 190-195
Kranzler 112-119
Summary of T-Tests
1) Have we met the test assumptions? Is our DV a ratio variable? Does our IV
have two categories? (IE: are we comparing means across two categories?)
2) State the null hypothesis: there is no significant difference between the means of categories A and B.
3) Calculate degrees of freedom (SPSS already calculates this)
