University of Toronto Scarborough, Canada

Lecture
Department: Statistics
Course Code: STAB22H3
Professor: Ken Butler
Description: inferences about means (chapter 23)
STUDENT'S t
-> ASSUMPTIONS & CONDITIONS
1. INDEPENDENCE ASSUMPTION
- randomization condition
- 10% condition
2. NORMAL POPULATION ASSUMPTION
- nearly normal condition
- when we have to make a decision based on data, use a hypothesis test
-----------------------------------------
WHERE ARE WE GOING?
- CHP 23:
1. making CIs
2. testing hypotheses
... for the mean of a quantitative variable (qvar)
main text, p617
[1]
WHO - 2ndry school students
WHAT - Time it takes to get to school
UNITS - min.
WHEN - 2007-2008
WHERE - Ontario
WHY - part of the CensusAtSchool project
- want a sense of the avg time it takes all Ontario 2ndry students to get to school
- n = 40 Ontario 2ndry school students, SRS
HOW DO DATA ABOUT MEANS DIFFER FROM PROPORTIONS?
- *impt difference: props are typically reported as summaries
- each individual response is either "success" or "failure"
- quantitative data (qdata) report a numerical val. for each subj.
- summarized w/ means & SDs
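A small sketch of the difference: a proportion summarizes success/failure responses in one number, while quantitative responses are summarized by a mean and an SD. Both data sets below are hypothetical and only for illustration.

```python
import statistics

# Hypothetical categorical responses: each subject is a "success" or a "failure".
responses = ["success", "failure", "success", "success", "failure"]
p_hat = responses.count("success") / len(responses)  # proportion summary

# Hypothetical quantitative responses: travel times (min) for five students.
times = [10, 15, 20, 25, 30]
y_bar = statistics.mean(times)  # sample mean
s = statistics.stdev(times)     # sample SD (divides by n - 1)

print(f"p-hat = {p_hat:.2f}")
print(f"y-bar = {y_bar:.1f} min, s = {s:.2f} min")
```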
GETTING STARTED
[1]
CLT SUMMARY - regardless of the popn the random sample is drawn from, the shape of the sampling distrib of the mean is approx. Normal, given n sufficiently large
- the larger n is, the more closely the Normal model approximates the sampling distrib of the mean
- SD(y-bar) = sigma/sqrt(n): this formula requires that we know the true popn SD (sigma)
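A quick simulation can check the CLT formula SD(y-bar) = sigma/sqrt(n). This is a minimal sketch assuming a hypothetical, right-skewed exponential population with sigma = 10 and samples of size n = 25; the spread of many simulated sample means should land close to sigma/sqrt(n) = 2.

```python
import math
import random
import statistics

random.seed(1)

sigma = 10.0  # hypothetical population SD (exponential: mean = SD = sigma)
n = 25        # sample size
reps = 5000   # number of simulated samples

def sample_mean(n):
    """Mean of one random sample of size n from the exponential population."""
    return sum(random.expovariate(1 / sigma) for _ in range(n)) / n

means = [sample_mean(n) for _ in range(reps)]

theory = sigma / math.sqrt(n)       # CLT: SD(y-bar) = sigma / sqrt(n)
simulated = statistics.stdev(means) # spread of the simulated sample means
print(f"theory: {theory:.2f}, simulated: {simulated:.2f}")
```

Even though the population is skewed, the sample means vary by about sigma/sqrt(n), as the CLT predicts.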
p619
CLT PROBLEM
- to model the sampling distrib. of the mean from a random sample of qdata, we need the true popn SD
- for proportions, p determines the SD (sqrt(pq/n)); for means, knowing the sample mean tells us nothing about the SD of the mean
- we know n, but sigma could be anything
- resol'n: estimate the popn parameter sigma with s = sample SD (based on the data)
GREEN BOX:
- b/c the SD of the sampling distrib model is estimated from the data, it is called the SE (standard error)
- SE = estimated SD of the sampling distrib model for means: SE(y-bar) = s/sqrt(n)
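The SE formula translates directly into code. A minimal sketch; `standard_error` is a hypothetical helper name and the travel times are made-up values, not the textbook's data.

```python
import math
import statistics

def standard_error(data):
    """SE(y-bar) = s / sqrt(n), where s is the sample SD."""
    n = len(data)
    s = statistics.stdev(data)  # sample SD, n - 1 in the denominator
    return s / math.sqrt(n)

# Hypothetical travel times (min).
times = [10, 15, 20, 25, 30]
print(f"SE = {standard_error(times):.3f} min")
```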
main text (cont.; p619)
STANDARD ERROR WITH NORMAL MODEL
- this worked for larger sample sizes
- problems w/ smaller samples
- estimating sigma with s adds extra variability that the Normal model does not account for, so it gave wrong P-values & margins of error (too small)
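The problem with plugging s into a Normal model can be seen by simulation. This sketch assumes a hypothetical Normal population (mean 17, SD 8) and tiny samples (n = 5); it builds "90%" intervals as y-bar +/- 1.645 * s/sqrt(n) and counts how often they catch the true mean. The actual capture rate comes out well below 90%, i.e. the margins of error are too small.

```python
import math
import random
import statistics

random.seed(2)

mu, sigma = 17.0, 8.0  # hypothetical Normal population
n = 5                  # small sample
z_star = 1.645         # Normal critical value for 90% confidence
reps = 10000

hits = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    y_bar = sum(sample) / n
    se = statistics.stdev(sample) / math.sqrt(n)  # sigma replaced by s
    if y_bar - z_star * se <= mu <= y_bar + z_star * se:
        hits += 1

coverage = hits / reps
print(f"nominal 90%, actual coverage with z* and s: {coverage:.3f}")
```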
GOSSET
- the Normal model will not work; instead we need a new sampling distrib model that allows for the extra variability, giving larger margins of error and P-values
- need a whole new family of models, dep. on n
- unimodal, symm, bell-shaped
- the smaller n is, the more the tails of the model have to be stretched
GOSSET'S T
[1]
STUDENT'S t
- bell-shaped model, but its features change dep. on n
- forms an entire family of related distribs that dep. on a parameter = DEGREES OF FREEDOM (df)
- written t_df; for one-sample t methods on a mean, df = n - 1
INDEPENDENCE ASSUMPTION
- data val's should be indep
- check whether this is reasonable
- cannot tell just by looking at the sample whether the data are indep.
Randomization condition
- data come from random sample/randomized expt
- ideal case: data from an SRS
- but if they come from more complex sampling methods (ex. cluster, multistage), SEs will likely be bigger than the formula suggests
10% Condition
- sample is no greater than 10% of popn
- matters when sampling without replacement: if a large fraction of the popn is sampled, the selections are no longer indep.
- typically, this condition is not even mentioned for means
- b/c samples are gen. small, so the indep. issue only arises when sampling from a small popn
- if the data come from a randomized expt, there is no sampling from a popn at all
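The 10% condition is a one-line check. A minimal sketch; `ten_percent_condition` is a hypothetical helper, and the population sizes below are made-up numbers (the notes do not give the size of the Ontario student popn).

```python
def ten_percent_condition(n, population_size):
    """True if the sample is no more than 10% of the population.

    Written as 10 * n <= N to avoid floating-point rounding at the boundary.
    """
    return 10 * n <= population_size

# Hypothetical numbers: 40 students sampled from a popn of 1,000 vs. 300.
print(ten_percent_condition(40, 1000))  # True
print(ten_percent_condition(40, 300))   # False
```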
WE DON'T WANT TO STOP
- we check conditions so that we can make a meaningful analysis of the data
- conditions are disqualifiers
- ie. when they fail, the data cannot support a meaningful analysis
- but keep going unless there is a serious failure
- if minor issue, then make note of it and express caution wrt results
- if the sample is not an SRS but is rep'ive of some popn, limit conclusions to that popn
- if outliers in data, then do the analysis both w/ and w/out them
- if sample bimodal, then attempt to analyze subgroups sep'ly
- when there are major issues
- ex. sample signif. skewed
- ex. sample obviously non-rep'ive
... then cannot proceed
p623 -> NORMAL POPULATION ASSUMPTION
[1]
Student's t-model
- will not work if the data are signif. skewed
- no way to check for certain that the POPN follows a Normal model, so we just assume it does
[2]
ASSOCIATED CONDITION
- NEARLY-NORMAL CONDITION
- the data come from a distrib. that is UNIMODAL & SYMMETRIC (check that the sample looks this way)
- it is sufficient to check this condition and then proceed when working w/ small samples
[4-5]
- can be checked by examining histogram or NPP
- how closely the data must look Normal for Student's t dep. on sample size
[6]
- for v.small samples (n < 15), data should follow a Normal model pretty closely
- if outliers or signif. skewness found, then do not proceed
[7]
- for moderate sample sizes (15 <= n <= 40), t methods will work as long as the data are unimodal and not signif. skewed
- make histogram to check
[8]
- when n > 40 or 50
- unless the data are extremely skewed, use t methods
- make histogram to check
- if outliers found, perform one analysis w/ the outlier and another w/out it
- could reveal additional info about the data that requires special attention
- if multiple modes found, look for diff. groups in the data that should be analyzed & understood sep'ly
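The sample-size guidelines above can be written as a small decision helper, together with a rough text histogram for the "make histogram to check" step. Both functions are hypothetical sketches; the cutoff between "moderate" and "large" is taken as n > 40, one of the two values (40 or 50) mentioned in the notes, and the histogram does not replace a proper plot or NPP.

```python
def nearly_normal_guideline(n):
    """Rule of thumb from these notes for how Normal the data must look."""
    if n < 15:
        return "data should follow a Normal model closely; stop if outliers or strong skew"
    if n <= 40:
        return "t methods OK if data are unimodal and not strongly skewed"
    return "t methods OK unless data are extremely skewed"

def text_histogram(data, bins=5):
    """Print a crude text histogram and return the bin counts."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins or 1  # avoid zero width if all values are equal
    counts = [0] * bins
    for x in data:
        i = min(int((x - lo) / width), bins - 1)  # the max goes in the last bin
        counts[i] += 1
    for i, c in enumerate(counts):
        print(f"{lo + i * width:6.1f} | {'*' * c}")
    return counts

# Hypothetical data: a small, roughly symmetric, unimodal sample.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print(nearly_normal_guideline(len(data)))
text_histogram(data)
```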
GUINNESS STOUT MAY BE HEARTY, BUT THE t-PROCEDURE IS ROBUST!
(BLUE BOX)
- robust statistical test - still produces accurate results even when an assumption is violated
(ex) one-sample t-test
- robust wrt Normality assumption
- aka robust against violations of Normality
- if procedure can tolerate bigger violations, then it is more robust
- robustness incr. w/ n
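Robustness increasing with n can be illustrated by simulation. This sketch assumes a hypothetical, strongly right-skewed exponential population (true mean 1) and uses 95% critical values copied from a standard t table (df = 4, 29, 99). The actual capture rate of the nominal 95% t-interval is noticeably low for n = 5 and gets closer to 95% as n grows.

```python
import math
import random
import statistics

random.seed(3)

TRUE_MEAN = 1.0  # hypothetical exponential population with mean 1
# 95% critical values t*_{n-1} from a standard t table, keyed by n.
T_STAR = {5: 2.776, 30: 2.045, 100: 1.984}

def coverage(n, reps=4000):
    """Fraction of simulated 95% t-intervals that contain the true mean."""
    hits = 0
    for _ in range(reps):
        sample = [random.expovariate(1.0) for _ in range(n)]
        y_bar = sum(sample) / n
        me = T_STAR[n] * statistics.stdev(sample) / math.sqrt(n)
        hits += (y_bar - me <= TRUE_MEAN <= y_bar + me)
    return hits / reps

results = {n: coverage(n) for n in T_STAR}
for n, c in results.items():
    print(f"n = {n:3d}: coverage of nominal 95% t-interval = {c:.3f}")
```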
p626 - get the t* value from the t table (for a 90% CI: the col. w/ two-tail prob 0.10 / one-tail prob 0.05)
- locate the row of the table corresponding to the closest df & the col. corresponding to the probability we want
(ex) 90% CI
- leaves 5% of val's in each tail of the distrib
=> look for one-tail probability of 0.05 at top of col. or 90% at bottom
- which col. we use on the t table dep. on the confidence level we want
- if the precise df is not in the table, be conservative and use the next smaller df (which gives the bigger t*), or use software
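Python's standard library has no t distribution, so this sketch finds t* numerically instead of from the table: it integrates the Student's t density with Simpson's rule and solves for the critical value by bisection, then builds the one-sample interval y-bar +/- t*_{n-1} * SE(y-bar). The function names are hypothetical; if SciPy is available, `scipy.stats.t.ppf(q, df)` gives the same critical values. The bisection range [0, 100] is an assumption that covers ordinary confidence levels.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_star(confidence, df):
    """Critical value leaving (1 - confidence)/2 in each tail, by bisection."""
    target = 1 - (1 - confidence) / 2  # e.g. 0.95 for a 90% CI
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(data, confidence=0.90):
    """One-sample t-interval: y-bar +/- t*_{n-1} * s / sqrt(n)."""
    n = len(data)
    y_bar = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    me = t_star(confidence, n - 1) * se
    return y_bar - me, y_bar + me

print(f"t* for a 90% CI with df = 39: {t_star(0.90, 39):.3f}")
print("90% CI for hypothetical times:", t_interval([10, 15, 20, 25, 30]))
```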
p627
MORE CAUTIONS ABOUT INTERPRETING CONFIDENCE INTERVALS
[1]
THE FOLLOWING INTERPRETATIONS ABOUT CONFIDENCE INTERVALS FOR MEANS
ARE INCORRECT
A) "90% OF ALL OSS STUDENTS TAKE BETWEEN 14.4 AND 19.6 MINUTES TO GET
TO SCHOOL"
- CI is about mean travel time, not about times of individual students
B) "90% CONFIDENT THAT A RANDOMLY SELECTED STUDENT WILL TAKE BETWEEN
14.4 AND 19.6 MIN TO GET TO SCHOOL"
- CI is about mean travel time, not about times of individual students
correct: "90% CONFIDENT THAT THE MEAN TRAVEL TIME OF ALL 2NDRY STUDENTS IS
B/WEEN 14.4 AND 19.6 MIN."
C) "MEAN STUDENT TRAVEL TIME IS 17.0 MIN, 90% OF THE TIME"
- this implies that the true mean varies, when it is really the CI that differs from one sample to another; the true mean is a fixed val.
D) "90% OF ALL SAMPLES WILL HAVE MEAN TRAVEL TIMES BETWEEN 14.4 AND 19.6
MINUTES"
- suggests that this CI sets the standard for every other intv
- but th
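A simulation can illustrate points C and D: the true mean stays fixed while the interval changes from sample to sample, and about 90% of the intervals capture the true mean. This sketch assumes a hypothetical Normal population of travel times (mean 17, SD 8), samples of n = 40 as in the example, and t*_{39} = 1.685 for 90% confidence taken from a t table.

```python
import math
import random
import statistics

random.seed(4)

MU, SIGMA = 17.0, 8.0  # hypothetical population of travel times (min)
N = 40                 # sample size, as in the example
T_STAR = 1.685         # t*_{39} for 90% confidence, from a t table

def one_interval():
    """90% t-interval from one random sample of size N."""
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    y_bar = sum(sample) / N
    me = T_STAR * statistics.stdev(sample) / math.sqrt(N)
    return y_bar - me, y_bar + me

intervals = [one_interval() for _ in range(4000)]
captured = sum(lo <= MU <= hi for lo, hi in intervals) / len(intervals)

print("first three intervals:", [(round(lo, 1), round(hi, 1)) for lo, hi in intervals[:3]])
print(f"fraction capturing the true mean: {captured:.3f}")
```

Each run prints different intervals, but the true mean of 17 never moves; the "90%" describes the long-run capture rate of the method, not any single interval.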
