# COMM 291 Lecture Notes - Lecture 11: Binomial Distribution, Probability Distribution, Standard Deviation

by OC390871

Department: Commerce

Course Code: COMM 291

Professor: Jonathan Berkowitz

Lecture: 11

COMMERCE 291 – Lecture Notes 2016 – © Jonathan Berkowitz

Not to be copied, used, or revised without explicit written permission from the copyright owner.

Summary of Lectures 11 and 12

The Normal Model

The Normal Model or Normal Distribution is by far the most widely used continuous

probability distribution! It is commonly known as the bell-shaped curve. Another name is

the Gaussian distribution.

The normal curve is very easy to visualize and draw (see page 230 of the text for a

graph), but it is difficult to handle mathematically. The equation of the normal curve is

complex:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\}$$

where μ is the mean of the distribution and σ is the standard deviation of the distribution.

Area under the curve corresponds to probability, so the total area under the curve is

100% (i.e. total probability of 1).
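
The claim that the total area equals 1 can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names `normal_pdf` and `area_under_curve`, the values μ = 100 and σ = 15, and the choice to integrate over μ ± 10σ with the trapezoidal rule are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x, for mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def area_under_curve(mu, sigma, lo, hi, steps=100_000):
    """Trapezoidal approximation of the area under the normal curve on [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * width, mu, sigma)
    return total * width


# Hypothetical parameters, chosen only for illustration.
mu, sigma = 100.0, 15.0

# The curve extends forever in both directions; mu +/- 10 sigma captures
# essentially all of the area.
total_area = area_under_curve(mu, sigma, mu - 10 * sigma, mu + 10 * sigma)
print(f"Approximate total area: {total_area:.6f}")
```

Changing `mu` or `sigma` moves or stretches the curve, but the computed area should stay very close to 1.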

An extremely useful guide to area under the normal curve:

The 68-95-99.7 Rule:

68% of the area under the normal curve lies within μ ± σ

95% of the area under the normal curve lies within μ ± 2σ

99.7% of the area under the normal curve lies within μ ± 3σ
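
The three percentages in the rule are rounded values. As a quick check, the sketch below computes the areas with `NormalDist` from Python's standard `statistics` module (available from Python 3.8); the helper name `area_within` is an assumption introduced here. Because the areas depend only on the number of standard deviations, the standard normal curve (mean 0, standard deviation 1) is enough.

```python
from statistics import NormalDist

# Mean 0 and standard deviation 1; the areas within k standard deviations
# are the same for any normal curve.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```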

The data version of the 68-95-99.7 Rule is called The Empirical Rule.

For a symmetric, bell-shaped (i.e. normal) distribution:

68% of the data values are within x̄ ± s

95% of the data values are within x̄ ± 2s

99.7% of the data values are within x̄ ± 3s

This helps to explain why we say that the standard deviation represents the "typical"

distance from the mean; when we say "typical" distance, we mean that it applies to two-

thirds of the data values. That is, two-thirds of the data values are no more than one

standard deviation away from the mean.

From The Empirical Rule we can derive an extremely useful Rule of Thumb for getting a

rough approximation to the value of the standard deviation:

s ≈ Range/6 (for bell-shaped distributions and large n)

s ≈ Range/4 (for bell-shaped distributions and small n, approx. 20)

Why does this work?

The minimum to the maximum contains all the data. The distance from the minimum to

the maximum is called the range. The interval x̄ ± 3s contains (almost) all the data.

The width of the interval is 6s. If we ignore the word "almost" which accounts for only

0.3% of the data, then the range must have an approximate width of 6s. Thus Range ≈

6s. Remember that "≈" means "approximately equal to". Do not rely on this rule of thumb

for exact calculations of the standard deviation.
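
One way to see the rule of thumb in action is a small simulation. This is a rough sketch under stated assumptions, not a procedure from the notes: the function name `average_range_over_s`, the sample sizes 20 and 500, the 200 replicates, and the seed are all arbitrary choices. It draws bell-shaped samples, computes Range/s for each one, and averages the results.

```python
import random
import statistics


def average_range_over_s(n, replicates=200, seed=291):
    """Average of Range/s over many simulated normal samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    ratios = []
    for _ in range(replicates):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sample_range = max(sample) - min(sample)
        ratios.append(sample_range / statistics.stdev(sample))
    return statistics.mean(ratios)


small = average_range_over_s(20)   # small n: compare with Range/4
large = average_range_over_s(500)  # large n: compare with Range/6
print(f"n = 20:  average Range/s = {small:.2f}")
print(f"n = 500: average Range/s = {large:.2f}")
```

The averages should land roughly near 4 for the small samples and near 6 for the large ones. The ratio keeps growing slowly as n increases, which is one more reason to treat the rule as a rough approximation only.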

* * *

Just as a histogram represents the data values of a quantitative variable, the normal

curve represents the possible values of a random variable. As usual, we will use X or Y

to represent random variables.

If X has a normal distribution with mean μ and standard deviation σ, we write X is N(μ,σ).

To compute areas under the normal curve we use a linear transformation to standardize

to a standard normal curve.

If X is N(μ,σ), then standardize as follows:

$$Z = \frac{X - \mu}{\sigma}$$

Then Z is called the "standard normal" and has mean 0 and standard deviation 1.

Note that "standardization" means "subtract off the mean and divide by the standard

deviation."

The 68-95-99.7 Rule (or The Empirical Rule) for Z now becomes:

68% of the Z-values lie between –1 and 1

95% of the Z-values lie between –2 and 2

99.7% of the Z-values lie between –3 and 3.

Remember that Z has no units. It is a "pure" number that measures how many standard

deviations away from the mean a value lies.
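
Standardization is a one-line calculation. The sketch below is illustrative only: the function name `z_score` and the values μ = 50 and σ = 8 are assumptions, not figures from the notes.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical normal model, chosen only for illustration.
mu, sigma = 50.0, 8.0

for x in (42.0, 50.0, 66.0):
    print(f"x = {x}: z = {z_score(x, mu, sigma)}")
# x = 42.0: z = -1.0
# x = 50.0: z = 0.0
# x = 66.0: z = 2.0
```

A value of 66 is two standard deviations above the mean, so by the rule for Z it sits at the upper edge of the central 95% of values.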

Note: We introduced the idea of "standardization" and a "z-score" in Chapter 5, but we used the actual data, not the hypothetical distribution. In that situation, z = (y − ȳ)/s. The interpretation is the same.

Question: I can hear you scratching your head already; why have we introduced new

notation, namely, μ for the mean and σ for the standard deviation?

Isn’t that what x̄ and s represented? For now the answer is that we are making the

leap from real observed empirical data to a hypothetical collection of possible values.

You can think of the normal curve as a “stylized” description of your real data. And since

it is “stylized” it needs its own notation for mean and standard deviation.

Here is an illustration of how we will use the normal curve. Suppose you are interested in

the behaviour of a quantitative variable such as the height of a population. A histogram

of your collected data shows a symmetric bell-shape so that you think it appropriate to

summarize the shape with a normal curve. Because the properties of the normal curve

are known, you will now be able to compute the chance that the height of a randomly

selected member of your population exceeds a certain limit, or falls within a particular interval.
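
To make the illustration concrete, here is a minimal sketch using `NormalDist` from Python's standard `statistics` module. The mean of 170 cm, the standard deviation of 10 cm, and the limits 185, 160 and 180 are hypothetical numbers chosen for the example; they do not come from the notes or from any data set.

```python
from statistics import NormalDist

# Hypothetical normal model for height in centimetres.
heights = NormalDist(mu=170.0, sigma=10.0)

# Chance that a height exceeds a limit: area to the right of 185.
p_above_185 = 1.0 - heights.cdf(185.0)

# Chance that a height falls in an interval: area between 160 and 180.
p_160_to_180 = heights.cdf(180.0) - heights.cdf(160.0)

# The same tail area via standardization: z = (185 - 170) / 10 = 1.5.
z = (185.0 - heights.mean) / heights.stdev
p_above_via_z = 1.0 - NormalDist().cdf(z)

print(f"P(height > 185)       = {p_above_185:.4f}")
print(f"P(160 < height < 180) = {p_160_to_180:.4f}")
print(f"Same tail area from Z = {p_above_via_z:.4f}")
```

The interval from 160 to 180 is μ ± σ under these assumed values, so its probability should agree with the 68% figure from the 68-95-99.7 Rule.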
