
# STAT2040 Lecture 17: Week 6 - Sampling Distributions, completed lecture guideline

Department: Statistics
Course Code: STAT 2040
Professor: Jeremy Balka
Lecture: 17

Supporting Videos For This Chapter
8msl videos (these are also given at appropriate places in this chapter):
Sampling Distributions: Introduction to the Concept (7:52) (http://youtu.be/Zbw-YvELsaM)
The Sampling Distribution of the Sample Mean (11:40) (http://youtu.be/q50GpTdFYyI)
Introduction to the Central Limit Theorem (13:14) (http://youtu.be/Pujol1yC1_A)
Other supporting videos for this chapter (not given elsewhere in this chapter):
Deriving the Mean and Variance of the Sample Mean (5:07) (http://youtu.be/7mYDHbrLEQo)
Proof that the Sample Variance is an Unbiased Estimator of the Population Variance (6:58) (http://youtu.be/D1hgiAla3KI) **Not needed in STAT*2040**
NOTE: This is the lecture guideline as given, with the blanks filled in. Some of the shorthand I use when typing might differ from yours (e.g. sqrt = square root), and I type out the name of a symbol when I can't type the symbol itself (e.g. sigma, mu). If you missed the lecture and just want the notes, this will be helpful, but if you're having difficulty understanding the material and need clarification, don't download this note.
ALSO NOTE: The original lecture guideline that I’ve edited is not my intellectual property.

Sampling Distributions
The concept of the sampling distribution of a statistic is fundamental to much of statistical inference.
Recall that the population is the entire group of individuals or items that we want information about, and the sample is the subset of the population that we actually examine.
We will soon use sample statistics to estimate and make inferences about population parameters.
Problem: We don’t know the value of the parameter, so how can we possibly say how close
the statistic is to the parameter?
We use arguments based on the sampling distribution of the statistic to state how close the
estimate is likely to be to the parameter.
The sampling distribution of a statistic is the probability distribution of that statistic (the
distribution of the statistic in all possible samples of the same size).
This is often phrased in terms of repeated sampling: the sampling distribution of a statistic is
the probability distribution of that statistic if samples of the same size were to be repeatedly
drawn from the population.
Example 0.1 A professor thinks that they could prepare more appropriate course materials
if they knew the average age of the 16 students in their class. The view from the professor’s
perspective and the underlying reality of the situation:
(a) From the professor's viewpoint: 16 students, numbered 1 to 16, with unknown ages.
(b) The ages, in months, of the students. This underlying reality is unknown to the professor.
Students 1-4:   234 241 233 227
Students 5-8:   251 227 242 239
Students 9-12:  241 238 230 246
Students 13-16: 231 243 238 276
The professor asks the university for the students’ ages.
The university is wary of violating a privacy policy, and agrees only to supply the sample
mean age of 3 randomly selected students from the class.
Note: These are the fundamentals of statistical inference.
Note: The sampling distribution is a theoretical thing. We take only one sample, but we always keep the notion that we could theoretically take another sample and get a different value of our statistic.
Note: If we could calculate the mean for these 16 individuals, who make up the entire population, we would get the population mean, mu = 239.8 months.

(a) Individuals 3, 6, and 15 were selected in this sample (ages 233, 227, and 238). x̄ = 232.67.
(b) Individuals 5, 15, and 16 were selected in this sample (ages 251, 238, and 276). x̄ = 255.0.
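The two sample means above, and the population mean the professor cannot see, can be checked with a short sketch. The ages are the ones listed in Example 0.1; the names `ages` and `sample_mean` are illustrative choices, not part of the lecture guideline.

```python
# Ages in months for students 1..16, as listed in Example 0.1.
ages = [234, 241, 233, 227,
        251, 227, 242, 239,
        241, 238, 230, 246,
        231, 243, 238, 276]

def sample_mean(student_numbers):
    """Mean age of the students with the given 1-based numbers."""
    values = [ages[i - 1] for i in student_numbers]
    return sum(values) / len(values)

population_mean = sum(ages) / len(ages)   # mu, unknown to the professor
mean_a = sample_mean([3, 6, 15])          # sample (a)
mean_b = sample_mean([5, 15, 16])         # sample (b)

print(round(population_mean, 1), round(mean_a, 2), round(mean_b, 1))
```

Both samples come from the same class, yet their means differ from each other and from mu. That sample-to-sample variation is what the sampling distribution describes.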
[Figure 1 placeholder: relative frequency histogram. Horizontal axis: Values of the Sample Mean. Vertical axis: Relative Frequency. The population mean µ = 239.8 is marked.]
Figure 1: A relative frequency histogram of 100,000 sample means for Example 0.1. This is (approximately) the sampling distribution of the sample mean for n = 3.
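A simulation like the one behind Figure 1 can be sketched as follows. This is only an illustration of repeated sampling, not the code used to produce the figure; the seed and variable names are arbitrary assumptions, and each sample is drawn without replacement, matching "3 randomly selected students".

```python
import random

# Ages in months for the 16 students in Example 0.1.
ages = [234, 241, 233, 227,
        251, 227, 242, 239,
        241, 238, 230, 246,
        231, 243, 238, 276]

random.seed(2040)      # arbitrary seed, only so reruns give the same numbers
n = 3                  # sample size
num_samples = 100_000  # number of repeated samples

# Draw n distinct students each time and record the sample mean.
sample_means = [sum(random.sample(ages, n)) / n for _ in range(num_samples)]

mu = sum(ages) / len(ages)
avg_of_means = sum(sample_means) / num_samples

print(f"population mean: {mu:.1f}")
print(f"average of the simulated sample means: {avg_of_means:.1f}")
print(f"range of sample means: {min(sample_means):.2f} to {max(sample_means):.2f}")
```

A histogram of `sample_means` should resemble Figure 1, and the average of the simulated means should land very close to mu.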
Repeated sampling is an underlying concept, and not something that we actually carry out
in practice!
Mathematical arguments based on the sampling distribution will allow us to make statements like: “We can be 95% confident that the population mean age of students in this class lies between 219 and 246 months.”
In many situations we can determine a statistic’s sampling distribution using mathematical
arguments, instead of actually sampling repeatedly.
The sampling distributions of X̄ and S² if we are sampling 10 values from a normally distributed population with µ = 50 and σ² = 4:
(a) The sampling distribution of X̄.
(b) The sampling distribution of S².
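These two sampling distributions can be approximated by simulation. The sketch below assumes the setup stated above (n = 10, µ = 50, σ² = 4, so σ = 2); the seed, the number of repetitions, and the helper `sample_variance` are illustrative choices. For a normal population, theory gives X̄ a mean of µ = 50 and a variance of σ²/n = 0.4, and S² has mean σ² = 4.

```python
import random
from statistics import fmean

def sample_variance(values):
    """Sample variance S^2, using the n - 1 divisor."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

random.seed(17)            # arbitrary seed for reproducibility
mu, sigma, n = 50, 2, 10   # sigma = 2 corresponds to sigma^2 = 4
reps = 20_000              # number of repeated samples (an assumption)

xbars, s2s = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbars.append(fmean(sample))
    s2s.append(sample_variance(sample))

print(f"mean of the sample means:     {fmean(xbars):.3f}")          # theory: 50
print(f"variance of the sample means: {sample_variance(xbars):.3f}")  # theory: 0.4
print(f"mean of the sample variances: {fmean(s2s):.3f}")            # theory: 4
```

A histogram of `xbars` should look symmetric about 50, while a histogram of `s2s` should be right-skewed with its mean near 4.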
Note: You can use a sample mean like this as an estimate of the average age of the population. If you took another sample, the value of the sample mean would change. You wouldn't normally take another sample, but we do it here to show the value changing.
Note: The sample mean is distributed about the population mean.
Note: There are (16 choose 3) = 560 possible samples, so we could have worked out the exact sampling distribution of the sample mean. You could find the sample mean for each possible sample, count exactly how many times each value shows up, and use those counts to build the distribution, but it would look very similar to the simulated one.
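Since only 560 samples are possible, the exact sampling distribution can be enumerated directly. The following is a minimal sketch using the ages from Example 0.1; names such as `exact_dist` are illustrative, and exact fractions are used so that no rounding creeps in.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Ages in months for the 16 students in Example 0.1.
ages = [234, 241, 233, 227,
        251, 227, 242, 239,
        241, 238, 230, 246,
        231, 243, 238, 276]

# combinations() works by position, so students who share an age
# still count as different individuals.
samples = list(combinations(ages, 3))

# Count how often each sample total occurs; the sample mean is total / 3.
counts = Counter(sum(s) for s in samples)
exact_dist = {Fraction(total, 3): Fraction(count, len(samples))
              for total, count in counts.items()}

# Expected value of the sample mean under the exact distribution.
expected_value = sum(m * p for m, p in exact_dist.items())

print(len(samples))            # number of possible samples
print(float(expected_value))   # equals the population mean
```

The expected value of the sample mean over all 560 equally likely samples equals the population mean exactly, which is the sense in which the sample mean is distributed about mu.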
Note: Confidence intervals come later in the course.
Note on (a): The sampling distribution of X̄ is centered at mu. It is distributed about the parameter it estimates.
Note on (b): S², the sample variance, estimates the population variance and is distributed about the actual variance.