SOC202H1 Lecture Notes - Null Hypothesis, Sampling Distribution, Frequency Distribution


Department: Sociology
Course Code: SOC202H1
Professor: Scott Schieman

Feb 27 2012
SOC202
Two group difference of means test:
A lot of people in Canada, compared to [...], are neither believers nor
non-believers.
Non-believers and believers: do these two groups have the same or
different levels of distress?
Imagine another table that shows believers' mean level of distress
next to non-believers'. If the "relaxed God" hypothesis is true,
believers should have lower distress.
Null hypothesis: in the population from which this sample was
drawn, there is no difference in mean level of distress.
Operationalization of distress: the distress index that is used in
the research analysis paper.
Cases drop off: people who didn't answer the distress items are not
included.
People forget that these are all estimates. Nothing is precise. The things
to keep in mind are comparison and estimation.
Look for evidence to refute the null hypothesis.
A two-tailed test means that, rather than suggesting a direction, we just
say there is a difference. It could be this way, or it could be the
other way.
Means are just one part of the story. Another part of the story is
variance.
Ideally, you want believers and non-believers to be far apart.
You don't want a lot of spread within groups. If you have believers
here and non-believers on the other side of an index, you want them far
apart, with uniformity within each group.
Imagine, almost hypothetically, scooping up sample after sample. If the
null hypothesis is true, each difference should be close to zero; just
plot each one on a frequency distribution.
Just like the dice example in the book, if you scoop up a sample and
it gives you a big difference, something is wrong, or the null hypothesis
is false.
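The "sample after sample" idea above can be sketched as a small simulation. This is only an illustration, not the procedure used in the course: the population mean of 10, the SD of 3, the group size of 200 and the 2,000 repetitions are all made-up assumptions.

```python
import random
import statistics

def simulate_null_differences(n_per_group=200, n_samples=2000, seed=1):
    """Draw two groups from the SAME population many times and
    record the difference in their sample means each time."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_samples):
        # Hypothetical distress scores: both groups share mean 10, SD 3,
        # so the null hypothesis (no difference) is true by construction.
        believers = [rng.gauss(10, 3) for _ in range(n_per_group)]
        non_believers = [rng.gauss(10, 3) for _ in range(n_per_group)]
        diffs.append(statistics.mean(believers) - statistics.mean(non_believers))
    return diffs

diffs = simulate_null_differences()
print("mean of differences:", round(statistics.mean(diffs), 3))
print("SD of differences:", round(statistics.stdev(diffs), 3))

# Crude text histogram: the frequency distribution of the differences.
for lo in (-0.9, -0.6, -0.3, 0.0, 0.3, 0.6):
    count = sum(1 for d in diffs if lo <= d < lo + 0.3)
    print(f"from {lo:+.1f}: {'#' * (count // 20)}")
```

Under these assumptions the differences pile up around zero, so a single sample showing a large difference would be unusual if the null hypothesis were true.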
Feb 27 2012
SOC202 Quantitative Analysis in Social Sciences
SP.TG.
Page 1
It's like taking that dice example and applying it to the example the prof
gave. How likely is it to get a 2 or a 12 rolling 2 dice? Not very.
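The dice question can be worked out exactly by listing all 36 outcomes. A minimal sketch, assuming two fair six-sided dice (the variable names are just for illustration):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two fair dice.
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))
probabilities = {total: Fraction(count, 36) for total, count in totals.items()}

for total in sorted(probabilities):
    print(total, probabilities[total], "#" * totals[total])

# Extreme totals are rare; totals near the middle are common.
p_extreme = probabilities[2] + probabilities[12]
print("P(2 or 12) =", p_extreme)   # 1/36 + 1/36 = 1/18
print("P(7) =", probabilities[7])  # 6/36 = 1/6
```

The totals bunch up in the middle and thin out at the ends, which is the same shape as the sampling distribution of mean differences when the null hypothesis is true.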
Ideal: bigger difference between means.
Believers differ a bit, but not much. But the difference will be
statistically significant because the sample is large.
If, instead of 60,000, we did a study of 500 Canadians, with these
kinds of SDs it would be hard to reach a statistically significant effect.
Larger samples give you more flexibility to have larger variation.
In smaller samples, by contrast, there are greater fluctuations around the
average.
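The effect of sample size can be sketched with a few lines of arithmetic. The numbers here are hypothetical (a mean difference of 0.3 on an index with SD 5, groups of equal size); only the sample sizes of 500 and 60,000 come from the notes. The 1.96 cutoff is the usual large-sample, two-tailed .05 value.

```python
import math

def t_for_equal_groups(mean_diff, sd, n_per_group):
    """t-statistic when both groups have the same SD and the same size."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    return mean_diff / standard_error

# Hypothetical numbers: a small difference of 0.3 on an index with SD 5.
for total_n in (500, 60_000):
    t = t_for_equal_groups(mean_diff=0.3, sd=5.0, n_per_group=total_n // 2)
    verdict = "significant" if abs(t) > 1.96 else "not significant"
    print(f"n = {total_n}: t = {t:.2f} ({verdict} at .05, two-tailed)")
```

The same small difference with the same spread falls short of the cutoff at n = 500 but clears it easily at n = 60,000, because the standard error shrinks as the sample grows.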
Prof wants to be able to talk about the bigger difference between the
means (e.g. this group is different from the other group, and there is
small variance; you want the SDs to be somewhat similar).
T-statistic: here's the centre. The t-statistic that you would get if the
null hypothesis is true is zero. You're building a case against the null
hypothesis.
The numerator (test effect) in the case against the null is the
difference between the two means.
The denominator is the standard error, which is based on the pooled
variance estimate. It reflects sample-to-sample variability.
The whole idea here is that the t-statistic is 'test effect divided by
standard error'.
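The pieces just described (test effect on top, standard error built from the pooled variance on the bottom) can be put together as follows. This is a sketch of the standard pooled-variance formula, not necessarily the exact form on the slides, and the scores are made up for illustration.

```python
import math
import statistics

def pooled_t(group1, group2):
    """Two-group difference of means t-statistic with a pooled variance."""
    n1, n2 = len(group1), len(group2)
    test_effect = statistics.mean(group1) - statistics.mean(group2)  # numerator
    # Pooled variance: each group's sample variance weighted by its df.
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # denominator
    return test_effect / standard_error, n1 + n2 - 2

# Made-up distress scores, for illustration only.
believers = [4, 6, 5, 7, 8]
non_believers = [3, 4, 2, 5, 6]
t, df = pooled_t(believers, non_believers)
print(f"t = {t:.3f}, df = {df}")  # t = 2.000, df = 8
```

With these toy numbers the means differ by 2, the pooled variance is 2.5 and the standard error is 1, so the t-statistic is 2.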
The p-value cutoff the test will use is .05. We'll say it is a two-tailed
test (believers could be better off, or not better off).
Null hypothesis is in the middle.
The interpretation:
On average, people who agree or strongly agree with the statement
"The challenges I face in life are God's way of testing me" report a higher
level of distress compared to study participants who reject or deny this
statement.
There is a real difference in the levels of distress between believers and
non-believers.
* It is highly unlikely that the observed difference of .618 resulted from
normal, expected sampling error. Statistically speaking, it is
different enough from what is expected to give us confidence that this
isn't a fluke.
* It doesn't say whether the difference is big enough to make a big deal
about.
Repeated sampling: establish each group's level of distress and take their
difference.
In repeated sampling, the differences will cluster around zero.
Down the side of the t table, the t-statistic is worked out for different
degrees of freedom. You can see the difference between 20 and infinity is
not that big.
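The point about degrees of freedom can be checked numerically. This is a rough sketch using only the Python standard library: it integrates the t density with Simpson's rule and finds the two-tailed .05 critical value by bisection. The function names are made up for this example, and df = 100,000 stands in for "infinity".

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail_area(x, df, steps=2000):
    """P(T > x) for x >= 0, via Simpson's rule on the interval [0, x]."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def critical_t(df, alpha=0.05):
    """Two-tailed critical value: the t with alpha/2 in each tail."""
    lo, hi = 0.0, 50.0
    for _ in range(60):  # bisection on the tail area
        mid = (lo + hi) / 2
        if upper_tail_area(mid, df) > alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (5, 20, 120, 100_000):
    print(f"df = {df:>6}: critical t = {critical_t(df):.3f}")
```

Standard t tables list about 2.086 for 20 degrees of freedom and 1.960 in the limit, so past a modest sample size the cutoff barely moves.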
If the null is true, it will be near zero.
Pulling out a sample far from zero is not likely to happen if the true
difference is zero. How likely? That is the p-value.
The sampling distribution of a large number of sample mean
differences is symmetrical and centers on zero.
This is consistent: attending regularly seems beneficial to people.
It gives them support. They have lower levels of distress.
You don’t have to memorize the Standard Error of Difference between
two means formula, but know key points from slides.
Look at the mean for one group and the mean for the second, and see how
big the spread is.
Other idea: “heteroscedasticity” is about unequal variances, e.g. group
2 = low variance (you want equal variances).
You don’t want group 1 to have a large variance and group 2 to have a
small one.
Degrees of freedom: this basically takes n into account and basically
says the bigger the sample, the better – that’s the basic idea. The standard
error will be lower that way.
You want a large difference in the means (the case against the null) – the
t-statistic standardizes the difference between the means.
“What happens to me in the future mostly depends on God”: look at
the size of the test effect, and the standard error is larger.
The t-statistic is 1.111 (much smaller) since the numerator is smaller and
the denominator is larger.
The p-value is .227 (greater than .05, so we fail to reject the null
hypothesis).
By hand, if someone gave you a t-stat, how do you know its p-value or
whether it rejects the null hypothesis? -> 1.96 is the critical value
(two-tailed, .05 level, large sample)!!
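The by-hand rule can be written as a short check. A minimal sketch, assuming a large sample so that the normal critical value applies; `rejects_null` is a made-up helper name, and 2.5 is a made-up t-statistic used only for contrast with the 1.111 from the notes.

```python
from statistics import NormalDist

def rejects_null(t_stat, alpha=0.05):
    """Two-tailed decision using the large-sample (normal) critical value."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    return abs(t_stat) > critical

print(round(NormalDist().inv_cdf(0.975), 2))  # 1.96
print(rejects_null(1.111))  # False: fail to reject the null
print(rejects_null(2.5))    # True (2.5 is a made-up t-statistic)
```

Any t-statistic whose absolute value is above 1.96 rejects the null at the .05 level in a two-tailed test; anything between -1.96 and 1.96 does not.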