# STAB22-LEC06-(6,7).docx

University of Toronto Scarborough

Statistics

STAB22H3

Ken Butler

Fall

STAB22 LEC06
(Covers the remaining part of Chapter 6 and start of Chapter 7)
---------------[CHAPTER6]-----------------
[59]
EXAMPLE OF NPP
Given data, how do we know whether normal distribution works or not?
- can easily tell by making & looking at NPP (normal probability plot)
(ex) Potassium data for breakfast cereals
- skewed to right
- this is indicated by a bunch of high val's that are spread out, while a bunch of low val's are
clumped together
- for normal distribution, the pts have to be pretty close to, or on the black line
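As a rough sketch of what an NPP plots: each sorted data value is paired with the z-score that a normal distribution would put at that position, and normal data should then fall close to a straight line. The function name `normal_plot_points`, the plotting position (i - 0.5)/n, and the sample values are assumptions made for illustration; they are not the cereal potassium data.

```python
from statistics import NormalDist

def normal_plot_points(values):
    """Return (theoretical z-score, observed value) pairs for a normal probability plot."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # (i - 0.5) / n is one common plotting-position convention (an assumption here).
    return [
        (std_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(ordered, start=1)
    ]

# Made-up, right-skewed illustrative values (NOT the cereal data from the lecture).
sample = [20, 25, 30, 30, 35, 40, 45, 60, 95, 160, 280, 330]
points = normal_plot_points(sample)
```

With right-skewed values like these, the largest observations sit far above where a straight line through the low values would put them.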
[60]
EXAMPLE OF NPP
-> actual normal data - realistic data typically does not follow blk line precisely
- it looks OK to form a normal curve for this because val's are pretty close to blk line
[61]
EXAMPLE OF NPP
-> cereal calorie data
- notice that it curves down, and then curves up
- forms an S-shaped distribution
- this one is symmetrical, but too many outliers for it to be normal
[62]
EXAMPLE OF NPP
-> cereal sugars
- although it "wiggles" (meaning it curves), overall the pts are not far away from line
- OK for normal distribution
[63]
EXAMPLE OF NPP
-> cereal sugars histogram
- it has a hole in the middle, sth that NPP could not tell you
- b/c of this, it is not really normal, b/c shape is not as symmetric, even tho. there aren't any
outliers
[64]
68-95-99.7 RULE
- applies to normal distribution only
- says that
- within +/- 1 SD away from mean, there is about 68% of the data
- within +/- 2 SD away from mean, there is about 95% of the data
- within +/- 3 SD away from mean, there is about 99.7% of the data
- this rule gives us prop's for certain SD's without having to use z-table.
Rule (con.)
- its name tells you what % of the curve lies within 1, 2 and 3 SD's of the mean
- does not matter what the original mean and SD of the data were; this rule still applies, as long as the
distribution of the data is normal, or roughly normal.
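The three percentages can be checked numerically. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `proportion_within` and the second mean/SD pair (100 and 15) are assumptions chosen only to show that the result does not depend on the mean or SD.

```python
from statistics import NormalDist

def proportion_within(k, mean=0.0, sd=1.0):
    """Proportion of a normal distribution lying within k SDs of its mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)

# Standard normal first, then an arbitrary mean and SD for comparison.
standard = [round(proportion_within(k), 4) for k in (1, 2, 3)]
shifted = [round(proportion_within(k, mean=100.0, sd=15.0), 4) for k in (1, 2, 3)]
# Both lists come out as [0.6827, 0.9545, 0.9973], i.e. about 68%, 95%, 99.7%.
```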
(ex)
Roma Tomatoes
- mean = 74g
- SD = 2.5g
======
Question1: What interval of weights will be covered by 95%?
Solution:
- 95% => +/- 2 SD away from mean
- to get lower value for interval,
- val = 74 - 2(2.5) = 69g
- to get upper val. for interval,
- val = 74 + 2(2.5) = 79g
=> 95% of the weights will be between 69g and 79g.
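The interval arithmetic for Question 1 can be sketched as follows, taking the 95% band of the 68-95-99.7 Rule to be 2 SD either side of the mean. The helper name `rule_interval` is hypothetical.

```python
MEAN_G = 74.0  # mean tomato weight in grams (from the example)
SD_G = 2.5     # standard deviation in grams (from the example)

def rule_interval(mean, sd, k):
    """Interval reaching k SDs below and above the mean."""
    return mean - k * sd, mean + k * sd

# k = 2 corresponds to the 95% band of the 68-95-99.7 Rule.
lower, upper = rule_interval(MEAN_G, SD_G, k=2)  # (69.0, 79.0)
```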
Question2: What proportion of the weights will be between 71.5g and 76.5g?
Solution
- get z-scores for these val's
- For 71.5g, z = (71.5 - 74)/2.5 = -1.00
- For 76.5g, z = (76.5 - 74)/2.5 = 1.00
You want to find what % is b/ween -1.00 SD and 1.00 SD away from mean, which is about 68%, by the 68-95-99.7
Rule.
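A short sketch of the z-score step for Question 2, with the proportion from `statistics.NormalDist` shown next to the rule's 68%. The helper name `z_score` is an assumption.

```python
from statistics import NormalDist

MEAN_G, SD_G = 74.0, 2.5

def z_score(x, mean=MEAN_G, sd=SD_G):
    """Number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_low = z_score(71.5)   # -1.0
z_high = z_score(76.5)  # 1.0

# Proportion between the two z-scores under a normal model; the Rule rounds this to 68%.
between = NormalDist().cdf(z_high) - NormalDist().cdf(z_low)
```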
[65]
Roma Tomatoes
- mean = 74g
- SD = 2.5g
======
Question: Approximately what fraction of weights will be greater than 79g?
Solution:
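- Get z-score: z = (79 - 74)/2.5 = 2.00
- corresponding prop = 0.9772 => 97.72% below
=> ~2.3% above
Portions of the curve (using the 68-95-99.7 Rule instead of the table):
- 79g is 2 SD above the mean; shaded parts both together (within +/- 2 SD) make up 95%
- on the upper end, that leaves 2.5% (unshaded part), and there is one on the lower end that is also 2.5%
=> about 2.5% of the weights are bigger than 79g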
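The two answers here (about 2.3% from the table, 2.5% from the Rule) differ only because the Rule rounds its percentages. A minimal sketch comparing them, using `statistics.NormalDist` as a stand-in for the z-table:

```python
from statistics import NormalDist

tomatoes = NormalDist(mu=74.0, sigma=2.5)

z = (79.0 - 74.0) / 2.5       # 2.0
below = tomatoes.cdf(79.0)    # about 0.9772, matching the table value
above = 1.0 - below           # about 0.0228, i.e. ~2.3%

# Rule-based estimate: 95% lies within 2 SD, so half of the remaining 5% is above.
rule_estimate = (1.0 - 0.95) / 2  # about 0.025, i.e. 2.5%
```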
[66]
Roma Tomatoes
- mean = 74g
- SD = 2.5g
======
Question: About what proportion will be between 74 and 81.5g?
Solution
Method 1
- Get z-scores
- For 74, z = (74 - 74)/2.5 = 0.00
=> 50% of the data is below this
- For 81.5, z = (81.5 - 74)/2.5 = 3.00
=> about 99.87% of the data is below this
- So to find what is between these two val's, we subtract 50% from 99.87%
=> 99.87 - 50.00 = 49.87%
Method 2
- 74g is the mean
- The two points that are 3SD's away from the mean are:
- val 3SD below mean = 74 - 3(2.5) = 66.5
- val 3SD above mean = 74 + 3(2.5) = 81.5
- together, the region within 3 SD's either side of the mean covers about 99.7%
- so this is from 66.5 to 81.5
- we only want 74 to 81.5
- by symmetry, the proportion between 74 and 81.5 is half of 99.7%, which is 49.85%
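Both methods can be compared side by side. This sketch uses `statistics.NormalDist` in place of the z-table for Method 1; the variable names are illustrative.

```python
from statistics import NormalDist

tomatoes = NormalDist(mu=74.0, sigma=2.5)

# Method 1: proportion below 81.5 minus proportion below 74 (the mean).
method_1 = tomatoes.cdf(81.5) - tomatoes.cdf(74.0)  # close to 0.4987

# Method 2: half of the 99.7% that lies within 3 SD of the mean.
method_2 = 0.997 / 2  # 0.4985
```

The two results agree to about three decimal places; the small gap comes from the Rule rounding to 99.7%.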
[68]
WHEN YOU DON'T HAVE NORMAL TABLE
Can use the following formula:
About the formula
- only accurate to 2 d.p.
Note about exam
- will have NORMAL TABLE to work with in exams
(ex) Roma tomatoes
- mean = 74
- SD = 2.5
- prop. less than 77.4: z = (77.4 - 74)/2.5 = 1.36
- to work with the formula you need a positive z-value, so for a negative z you must first find the
positive z-value that gives the equivalent proportion (by symmetry).
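The symmetry step can be sketched as below. Since the lecture's approximation formula is not reproduced in these notes, `positive_z_table` is a hypothetical stand-in that simply uses `statistics.NormalDist` and refuses negative z; `proportion_below` shows how a negative z is converted to its positive counterpart.

```python
from statistics import NormalDist

def positive_z_table(z):
    """Stand-in for any lookup or formula that only handles z >= 0 (an assumption)."""
    if z < 0:
        raise ValueError("only defined for z >= 0")
    return NormalDist().cdf(z)

def proportion_below(z):
    """Proportion below z, using symmetry to avoid passing a negative z to the lookup."""
    if z >= 0:
        return positive_z_table(z)
    # By symmetry, the proportion below -z equals the proportion above +z.
    return 1.0 - positive_z_table(-z)

z = (77.4 - 74.0) / 2.5       # 1.36, up to floating-point rounding
p = proportion_below(z)       # about 0.9131
p_neg = proportion_below(-z)  # about 0.0869
```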
----------------------------
---------------[CHAPTER7]-----------------
[69]
SCATTERPLOTS, ASSOCIATION & CORRELATION
- now we're looking at 2 quan. var's together
- first tool introduced: scatterplot
- plot of val's of one q.var against val's of another q.var.
[70]
EXAMPLE OF SCATTERPLOT
-> airport in Oakland recorded #passengers leaving each month from 1990 to 2006
-> this scatterplot has:
a) time (years since 1990) for x-axis
b) passengers (#) for y-axis
About this scatterplot
- can see a general upward trend
=> as time goes by, number of passengers seems to incr.
- each blue dot rep. one month's worth of data
- ex. a month in 2001 where there were about 7000 passengers is rep. by 1 unique point
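To make "one point per month" concrete, here is a sketch of how the x-coordinates could be built. It assumes both 1990 and 2006 are fully included and that each month is encoded as a fraction of a year; the function name `years_since_1990` is hypothetical, and the passenger counts (the y-values) are not reproduced in these notes.

```python
def years_since_1990(year, month):
    """x-coordinate for a given month, with January 1990 at 0.0 (assumed encoding)."""
    return (year - 1990) + (month - 1) / 12

# One x-value per month from January 1990 through December 2006.
x_values = [
    years_since_1990(year, month)
    for year in range(1990, 2007)
    for month in range(1, 13)
]
# Each x-value would be paired with that month's passenger count to give one dot.
```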
