
***** STAB22 - Highly Detailed - Chapter 23 - 1st Canadian edition Textbook Notes *****


Course Code: STAB22H3
Ken Butler

inferences about means (chapter 23)

STUDENT'S T -> ASSUMPTIONS & CONDITIONS
1. INDEPENDENCE ASSUMPTION
   - randomization condition
   - 10% condition
2. NORMALITY
   - nearly normal condition
- when have to make decision based on data, use hypo test
-----------------------------------------
WHERE ARE WE GOING?
- CHP23:
  1. making CI
  2. testing hypotheses
  ... for mean of qvar

main text, p617
[1]
WHO - 2ndry school students
WHAT - time it takes to get to school
UNITS - min.
WHEN - 2007-2008
WHERE - Ontario
WHY - part of CensusAtSchool Project
- want to get a feel for the avg time it takes all Ontario students to get to school
- n = 40 Ontario 2ndry school students, SRS

HOW DOES DATA REGARDING MEANS DIFFER FROM PROPORTIONS
- *impt way: prop's typically reported as summaries
  - individual response is either "success"/"failure"
- qdata typically reports numerical val. for each subj.
  - summarized w/ means & SD's

GETTING STARTED
[1] CLT SUMMARY
- regardless of what popn the random sample is drawn from, shape of sampling distrib. of the mean is approx. Normal, given n sufficiently large
- the larger the n, the more closely the Normal approximates the sampling distrib. of the mean
- SD of sampling distrib. of the mean: SD(ybar) = sigma / sqrt(n)
- this formula req. that we know true popn SD (sigma)

p619
CLT Problem
- to model sampling distrib. of mean from random sample of qdata, need true popn SD
- for means, knowing the sample mean does not tell us anything about the SD of the mean
- know n, but sigma could be anything
- resol'n: est.
popn parameter sigma with s = sample SD (based on data)

GREEN BOX:
- b/c SD of sampling distrib. model is being estimated from data, SD is called SE (standard error)
- SE = estimated SD of sampling distrib. model for means: SE(ybar) = s / sqrt(n)

main text (con.; p619)
STANDARD ERROR WITH NORMAL MODEL
- this worked for larger sample sizes
- prob's w/ smaller samples
  - too much variation in data to fit with Normal model properly
  - was giving wrong calculations for P-values & margins of error

GOSSET
- Normal model will not work; need new sampling distrib. model that allows for the extra variation, w/ larger margins of error and P-values
- need whole new family of models, dep. on n
  - unimodal, symm., bell-shaped
  - the smaller n is, the more the tails of the model have to be stretched

GOSSET'S T
[1] STUDENT'S t
- bell-shaped model, but features change dep. on what n is
- forms entire family of related distrib's that dep. on a parameter = DEGREES OF FREEDOM (df)
- written t_df

INDEPENDENCE ASSUMPTION
- data val's should be indep.
- check whether this is reasonable
- cannot check, merely by looking at sample, whether data exhibits indep.

Randomization condition
- data come from random sample/randomized expt
- ideal case: randomly sampled data from SRS
- but if they come from more complex sampling methods (ex. cluster, multistage), then will likely have SEs bigger than formula suggests

10% Condition
- sample is no greater than 10% of popn
- check this when sampling without replacement and the sample is not a negligible part of popn
- indep. of selections would be compromised if large fraction of popn was sampled
- typically, this condition is not even mentioned for means
  - b/c samples gen. smaller, and so indep. issue only arises if sampling from small popn
- if data come from randomized expt, then there is no sampling at all

WE DON'T WANT TO STOP
- check conditions with the aim of making meaningful analysis of data
- conditions are disqualifiers
  - ie. when they fail, the data cannot support a meaningful analysis
- but cont.
proceeding unless there is some serious failure
- if minor issue, then make note of it and express caution wrt results
  - limit conclusions if sample not SRS, but is rep'ive of some popn
  - if outliers in data, then do analysis w/ and w/out them
  - if sample bimodal, then attempt to analyze subgroups sep'ly
- when there are major issues
  - ex. sample signif. skewed
  - ex. sample obviously non-rep'ive
  ... then cannot proceed

p623 -> NORMAL POPULATION ASSUMPTION
[1] Student's t-model
- will not work if data is signif. skewed
- no way to check for certain that data from POPN follows Normal model, so we assume it does and check the associated condition
[2] ASSOCIATED CONDITION - NEARLY-NORMAL CONDITION
- data originates from distrib. that is UNIMODAL & SYMMETRIC (ie. the sample has this distrib.)
- sufficient just to check this condition and proceed when working w/ small samples
[4-5]
- can be checked by examining histogram or NPP (Normal probability plot)
- Normality for Student's t dep. on sample size
[6]
- for v. small samples (n < 15), data should follow Normal model pretty closely
  - if outliers or signif. skewness found, then do not proceed
[7]
- for moderate sample sizes (15 <= n <= 40)
  - t methods will work given that data is unimodal and not signif. skewed
  - make histogram to check
[8]
- when n > 40 or 50
  - unless data is v. extremely skewed, use t methods
  - make histogram to check
  - if outliers found, perform analysis w/ outlier, and another w/out
    - could reveal additional info about data that req. special at'n
  - if multiple modes found, then investigate for diff. groups in data that should be sep'ly analyzed & understood

GUINNESS STOUT MAY BE HEARTY, BUT THE t-PROCEDURE IS ROBUST! (BLUE BOX)
- robust statistical test
  - can still prod. accurate results even when an assumption is violated
(ex) one-sample t-test
- robust wrt Normality assumption
  - aka robust against violations of Normality
- if procedure can tolerate bigger violations, then it is more robust
- robustness incr.
with n

p626
- get t* value from column (two-tail 0.10 / one-tail 0.05)
- locate row of table corresponding to closest df & col. corresponding to probability we want
(ex) 90% CI
- leaves 5% of val's on either side of distrib. => look for one-tail probability of 0.05 at top of col. or 90% at bottom
- which col. of the t table we use dep. on the confidence level we want
- if precise df not given, then can go with the bigger t* val. (the next smaller df listed), or use software

p627
MORE CAUTIONS ABOUT INTERPRETING CONFIDENCE INTERVALS
[1] THE FOLLOWING INTERPRETATIONS ABOUT CONFIDENCE INTERVALS FOR MEANS ARE INCORRECT
A) "90% OF ALL OSS STUDENTS TAKE BETWEEN 14.4 AND 19.6 MINUTES TO GET TO SCHOOL"
- CI is about mean travel time, not about times of individual students
B) "90% CONFIDENT THAT A RANDOMLY SELECTED STUDENT WILL TAKE BETWEEN 14.4 AND 19.6 MIN TO GET TO SCHOOL"
- CI is about mean travel time, not about times of individual students
correct: "90% CONFIDENT THAT MEAN TRAVEL TIME OF ALL 2NDRY STUDENTS IS B/WEEN 14.4 AND 19.6 MIN."
C) "MEAN STUDENT TRAVEL TIME IS 17.0 MIN, 90% OF THE TIME"
- this implies that the true mean varies, when it is really the CI that differs from one sample to another
- true mean is a fixed val.
D) "90% OF ALL SAMPLES WILL HAVE MEAN TRAVEL TIMES BETWEEN 14.4 AND 19.6 MINUTES"
- suggesting that this CI sets standard for every other intv
- but th
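
A rough sketch of how the pieces in these notes fit together for a 90% one-sample t-interval: estimate the SE from s, look up t* in the one-tail 0.05 column, then build the interval as ybar +/- t* x SE with df = n - 1. Everything named here is an assumption for illustration only: the table `T_STAR_90` is an abbreviated copy of a few standard t-table rows, the helper names `t_star_90` and `t_interval_90` are hypothetical, and the travel times are made up (they are not the textbook's CensusAtSchool data).

```python
import math
import statistics

# A few rows of the one-tail 0.05 column of a standard t table
# (i.e. 90% two-sided confidence). A full table has many more rows.
T_STAR_90 = {10: 1.812, 15: 1.753, 20: 1.725, 25: 1.708,
             30: 1.697, 40: 1.684, 60: 1.671}


def t_star_90(df):
    """Look up t* for 90% confidence.

    If the exact df is not in the table, fall back to the next smaller
    df listed, which gives a slightly bigger (more cautious) t*.
    """
    usable = [row for row in T_STAR_90 if row <= df]
    if not usable:
        raise ValueError("df is below the smallest row in this abbreviated table")
    return T_STAR_90[max(usable)]


def t_interval_90(sample):
    """Return (lower, upper) of a 90% one-sample t-interval for the mean."""
    n = len(sample)
    ybar = statistics.mean(sample)
    s = statistics.stdev(sample)      # sample SD (divides by n - 1)
    se = s / math.sqrt(n)             # SE(ybar) = s / sqrt(n)
    margin = t_star_90(n - 1) * se    # margin of error = t* x SE, df = n - 1
    return ybar - margin, ybar + margin


# Hypothetical travel times in minutes (made up, not the textbook sample).
times = [5, 10, 12, 15, 15, 18, 20, 22, 25, 30, 35]
low, high = t_interval_90(times)
print(f"90% CI for the mean: {low:.1f} to {high:.1f} min")
```

With n = 40 as in the travel-time example, df = 39 is not a row in this abbreviated table, so the lookup falls back to df = 30 (t* = 1.697). That makes the interval a little wider than software would give, which is the cautious direction.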