COMM 291 Lecture Notes - Lecture 11: Binomial Distribution, Probability Distribution, Standard Deviation


Department: Commerce
Course Code: COMM 291
Professor: Jonathan Berkowitz
Lecture: 11

COMMERCE 291 – Lecture Notes 2016 – © Jonathan Berkowitz
Not to be copied, used, or revised without explicit written permission from the copyright owner.
Summary of Lectures 11 and 12
The Normal Model
The Normal Model or Normal Distribution is by far the most widely used continuous
probability distribution! It is commonly known as the bell-shaped curve. Another name is
the Gaussian distribution.
The normal curve is very easy to visualize and draw (see page 230 of the text for a
graph), but it is difficult to handle mathematically. The equation of the normal curve is
complex:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\}$$
where μ is the mean of the distribution and σ is the standard deviation of the distribution.
Area under the curve corresponds to probability, so the total area under the curve is
100% (i.e. total probability of 1).
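As a rough numerical check of the statement that the total area is 1, the sketch below evaluates the normal curve directly from its equation and adds up thin rectangles under it. The function names and the values μ = 100, σ = 15 are illustrative assumptions, not taken from the text.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def area_under_curve(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """Approximate the area under the normal curve on [a, b] (midpoint rule)."""
    width = (b - a) / steps
    heights = (normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width

# Illustrative parameters (an assumption, not from the notes).
mu, sigma = 100.0, 15.0

# Eight standard deviations on each side captures essentially all of the area.
total = area_under_curve(mu - 8 * sigma, mu + 8 * sigma, mu, sigma)
print(f"total area is approximately {total:.6f}")
```

The curve never quite touches the horizontal axis, so the sum is taken over a wide but finite interval; the area left out is negligible.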
An extremely useful guide to area under the normal curve:
The 68-95-99.7 Rule:
68% of the area under the normal curve lies within μ ± σ
95% of the area under the normal curve lies within μ ± 2σ
99.7% of the area under the normal curve lies within μ ± 3σ
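The three percentages above can be reproduced with a short calculation. This sketch relies on the standard identity that the area within k standard deviations of the mean equals erf(k/√2); the helper name `area_within` is made up for illustration.

```python
import math

def area_within(k):
    """Area under any normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within mu ± {k} sigma: {area_within(k):.4f}")
# within mu ± 1 sigma: 0.6827
# within mu ± 2 sigma: 0.9545
# within mu ± 3 sigma: 0.9973
```

The rule's 68, 95 and 99.7 are rounded versions of these values, and the result does not depend on the particular μ and σ.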
The data version of the 68-95-99.7 Rule is called The Empirical Rule.
For a symmetric, bell-shaped (i.e. normal) distribution:
68% of the data values are within x̄ ± s
95% of the data values are within x̄ ± 2s
99.7% of the data values are within x̄ ± 3s
This helps to explain why we say that the standard deviation represents the "typical"
distance from the mean; when we say "typical" distance, we mean that it applies to about
two-thirds (68%) of the data values. That is, about two-thirds of the data values are no
more than one standard deviation away from the mean.
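To see The Empirical Rule at work on data, the following sketch simulates a bell-shaped data set and counts how many values fall within one, two and three sample standard deviations of the sample mean. The data are simulated (mean 50, standard deviation 8, n = 10,000 are arbitrary assumptions), so the fractions will be close to, but not exactly, 68%, 95% and 99.7%.

```python
import random
import statistics

random.seed(291)  # arbitrary seed so the run is repeatable

# Simulated bell-shaped data; the parameters are illustrative assumptions.
data = [random.gauss(50.0, 8.0) for _ in range(10_000)]

x_bar = statistics.mean(data)   # sample mean
s = statistics.stdev(data)      # sample standard deviation

def fraction_within(values, centre, spread, k):
    """Fraction of values no more than k * spread away from centre."""
    return sum(abs(v - centre) <= k * spread for v in values) / len(values)

fractions = {k: fraction_within(data, x_bar, s, k) for k in (1, 2, 3)}
for k, frac in fractions.items():
    print(f"within x-bar ± {k}s: {frac:.1%}")
```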
From The Empirical Rule we can derive an extremely useful Rule of Thumb for getting a
rough approximation to the value of the standard deviation:
s ≈ Range/6 (for bell-shaped distributions and large n)
s ≈ Range/4 (for bell-shaped distributions and small n, approx. 20)
Why does this work?
The minimum to the maximum contains all the data. The distance from the minimum to
the maximum is called the range. The interval x̄ ± 3s contains (almost) all the data.
The width of the interval is 6s. If we ignore the word "almost" which accounts for only
0.3% of the data, then the range must have an approximate width of 6s. Thus Range ≈
6s. Remember that "≈" means "approximately equal to". Do not rely on this rule of thumb
for exact calculations of the standard deviation.
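A small simulation gives a feel for how rough this rule of thumb is. The sketch below draws many bell-shaped samples and averages the ratio Range/s, once for n = 20 and once for n = 500. The sample sizes, number of trials and seed are assumptions chosen for illustration.

```python
import random
import statistics

def average_range_over_s(n, trials, seed=291):
    """Average of Range / s over many simulated normal samples of size n."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sample_range = max(sample) - min(sample)
        ratios.append(sample_range / statistics.stdev(sample))
    return statistics.mean(ratios)

small_n_ratio = average_range_over_s(n=20, trials=2000)
large_n_ratio = average_range_over_s(n=500, trials=400)

print(f"n = 20:  Range/s averages about {small_n_ratio:.2f}")
print(f"n = 500: Range/s averages about {large_n_ratio:.2f}")
```

In runs like this the ratio comes out near 4 for the small samples and near 6 for the larger ones. The range keeps growing slowly as n increases, which is one more reason to treat the rule as a rough approximation only.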
* * *
Just as a histogram represents the data values of a quantitative variable, the normal
curve represents the possible values of a random variable. As usual, we will use X or Y
to represent random variables.
If X has a normal distribution with mean μ and standard deviation σ, we write X is N(μ,σ).
To compute areas under the normal curve we use a linear transformation to standardize
to a standard normal curve.
If X is N(μ,σ), then standardize as follows:
$$Z = \frac{X - \mu}{\sigma}$$
Then Z is called the "standard normal" and has mean 0 and standard deviation 1.
Note that "standardization" means "subtract off the mean and divide by the standard
deviation."
The 68-95-99.7 Rule (or The Empirical Rule) for Z now becomes:
68% of the Z-values lie between –1 and 1
95% of the Z-values lie between –2 and 2
99.7% of the Z-values lie between –3 and 3.
Remember that Z has no units. It is a "pure" number that measures how many standard
deviations away from the mean a value lies.
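Standardizing can be sketched in a few lines. The example below converts a value of X to a Z-value and confirms that the area to the left of x under N(μ,σ) matches the area to the left of z under the standard normal curve. It uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); the values μ = 500, σ = 100 and x = 650 are made up for illustration.

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """Subtract off the mean and divide by the standard deviation."""
    return (x - mu) / sigma

# Hypothetical values, not from the notes.
mu, sigma = 500.0, 100.0
x = 650.0

z = standardize(x, mu, sigma)               # 1.5 standard deviations above the mean
area_via_z = NormalDist().cdf(z)            # area to the left of z under N(0, 1)
area_direct = NormalDist(mu, sigma).cdf(x)  # area to the left of x under N(mu, sigma)

print(f"z = {z}")
print(f"area via Z: {area_via_z:.4f}, area computed directly: {area_direct:.4f}")
```

Both areas agree (about 0.9332), which is why a single table or function for the standard normal curve is enough for every normal distribution.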
Note: We introduced the idea of "standardization" and a "z-score" in Chapter 5, but we used the actual data,
not the hypothetical distribution. In that situation,
$$z = \frac{y - \bar{y}}{s}$$
The interpretation is the same.
Question: I can hear you scratching your head already; why have we introduced new
notation, namely, μ for the mean and σ for the standard deviation?
Isn’t that what x̄ and s represented? For now the answer is that we are making the
leap from real observed empirical data to a hypothetical collection of possible values.
You can think of the normal curve as a “stylized” description of your real data. And since
it is “stylized” it needs its own notation for mean and standard deviation.
Here is an illustration of how we will use the normal curve. Suppose you are interested in
the behaviour of a quantitative variable such as the height of a population. A histogram
of your collected data shows a symmetric bell-shape so that you think it appropriate to
summarize the shape with a normal curve. Because the properties of the normal curve
are known, you will now be able to compute the chance that the height of a randomly
chosen member of your population exceeds a certain limit, or falls within a particular interval.
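As a sketch of this kind of calculation, suppose (purely as an assumption for illustration) that heights are well described by a normal curve with mean 170 cm and standard deviation 10 cm. The chance of exceeding a limit is the area to the right of that limit, and the chance of falling in an interval is the area between its endpoints.

```python
from statistics import NormalDist

# Hypothetical model for heights in cm; these numbers are not from the notes.
heights = NormalDist(mu=170.0, sigma=10.0)

# Chance that a height exceeds 185 cm: area to the right of 185.
p_exceeds = 1.0 - heights.cdf(185.0)

# Chance that a height falls between 160 cm and 180 cm: area between the endpoints.
p_between = heights.cdf(180.0) - heights.cdf(160.0)

print(f"P(height > 185) is about {p_exceeds:.4f}")
print(f"P(160 <= height <= 180) is about {p_between:.4f}")
```

The interval 160 to 180 is exactly μ ± σ in this hypothetical model, so the second probability comes out near 68%, in line with the 68-95-99.7 Rule.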