STAB22-LEC04-(5,6).docx

Department
Statistics
Course
STAB22H3
Professor
Ken Butler
Semester
Fall

Description
STAB22 LEC04 (covers all of chapter 5 and the beginning of chapter 6)

---------------[CHAPTER5]-----------------
Note - near the end of the lecture, he discusses data about potassium. He is getting that data from: http://www.statcrunch.com/5.0/viewreport.php?reportid=25026
---------------------------------

[36] CHAPTER 5: UNDERSTANDING AND COMPARING DATA

Data val's - 10,11,14,15,17,19,21,28,35
Find:
a) median = 17
b) Q1 = 14
c) Q3 = 21
d) IQR = Q3 - Q1 = 21 - 14 = 7
e) max = 35
f) min = 10

5-Number Summary
- gives min, Q1, median, Q3 and max: => 10, 14, 17, 21, 35
- summarizes the extremes and the percentiles (25th, 50th, 75th)
=> capable of summarizing a large quantity of #'s w/ a small quantity of summary val's (only 5)
- Note - 50th percentile aka median

[37-38] BOXPLOT
- visual display built from the following statistical summaries:
  - median, Q3, Q1, min, max, upper fence, lower fence, and IQR
=> incorporates 5-number summary in a visual display
- scale on left

Calculating the "fences"
- these are imaginary horizontal lines at particular values
- upper fence - UF = Q3 + (1.5)(IQR)
- lower fence - LF = Q1 - (1.5)(IQR)
- Note - he used "R" for (1.5)(IQR). So, we can alternatively write the UF, LF formulae as follows:
  - UF = Q3 + R
  - LF = Q1 - R
- draw vertical lines protruding from the box of the boxplot to the most extreme value WITHIN THE FENCES
- if there is a val. that is OUTSIDE the fences, plot it individually
  - these are SUSPECT OUTLIERS

P37 BOXPLOT R
Recall:
-----------
- UF = Q3 + R
- LF = Q1 - R, where R = (1.5)*(IQR)
-----------
- R determines whether a value is too big or too small, and thus should be considered an outlier
- the explanation for why we multiply IQR by 1.5 - the professor says "B/c it is."
- if we picked a multiplier greater than 1.5, then a value would have to be very, very big to be considered "suspicious"
- anything above the upper fence or below the lower fence is suspiciously large or suspiciously small

Data val's - 10,11,14,15,17,19,21,28,35
- largest: 35, which is bigger than the upper fence (21 + 10.5 = 31.5)
- smallest: 10, which isn't smaller than the lower fence, which was 3.5 (14 - 10.5)

Further notes on Boxplot
- all that matters is the vertical axis
=> horizontal scale (aka x-axis) doesn't mean anything at all
=> how wide the boxplot is does not matter, and does not mean anything

With respect to our example
- dot - largest val - out, beyond the fence (in this ex., it is beyond UF)
- we draw attention to it b/c it is unusually large

Advantages of Boxplot
- upon seeing it, you get an immediate picture of centre and spread of data
- centre instantly (often times) from line in middle, which is median
- height of box is Q3 - Q1, which is IQR
- the taller the box is, the more spread out data val's are
  - if box taller, IQR bigger b/c Q3 and Q1 farther apart
- unlike histogram or stem plot, it gives you centre and spread directly, and tells you about unusually high or low val's

Example: 35
- high val = 35
- do further investigation questioning if 35 is correct
- an outlier

-------------------------
MyStatCrunch instructions
- Graphics -> Boxplot -> select column
- "use fences to identify outliers"
  - otherwise it will show whiskers all the way to top, all the way to bottom
-------------------------

[39] COMPARING DISTRIBUTIONS WITH BOXPLOTS
- ex - classify cereals by which shelf they are on, and ask whether they differ in the amt of sugar they have per serving
- compare these three distributions
  - commonality: each measures sugar per serving of cereals
  - difference: the cases are on a particular shelf
    - ie. for "1", only counts cases (ie. cereals) on top shelf
    - ie. for "2", only on middle shelf
    - ie. for "3", only on bottom shelf

[40] Comparing the 3 distributions with Histograms
- hard to look at and compare
- have to also decide where the middle of each distribution is, which is not easily found using this display

[41] Comparing the 3 distributions with Boxplots
- instead do side-by-side boxplots => clearer story from that
- used to compare var's, or var's grouped up in diff. ways

Shelf 1
- median at around 4 (line in middle of box)
- whiskers - bottom one short, top one long => appears to be skewed to right
- no outliers, b/c no pts plotted by themselves
- Q1 to bottom of data - not very far
- Q3 to top of data - data quite spread out

Shelf 3
- around 7 for median
- whiskers about same length top and bottom => distribution is roughly symmetric

Shelf 2
- where is median?
- there is no line across the box somewhere in the middle => median must be same as Q1 or Q3
- longer whisker at bottom => distribution appears to be skewed to left

Advantages of Boxplot
- gives quick comparison of distributions if you want to know what a certain statistical summary is (given that it is displayed by the boxplot)
  - ex. could easily compare medians for sugar count b/ween the three distributions
- comparing diff. groups for something is done best by comparing boxplots side-by-side
  - ex. cereals on diff. shelves
- compare length of upper whisker to bottom whisker within one distribution, and b/ween distributions, to get an idea as to direction of skewness, if any
  - both similar/same lengths => symmetric
  - upper whisker longer => skewed to right
  - lower whisker longer => skewed to left

[42] Where is median for shelf 2?
- median = Q3
- likely b/c there are a lot of cereals that have sugars exactly 12
- b/c so many
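
Going back to the worked example from [36]-[38] (data val's 10, 11, 14, 15, 17, 19, 21, 28, 35), the 5-number summary, fences, whisker ends and suspect outliers can be checked with a short Python sketch. This is only an illustration, not the StatCrunch procedure from lecture. The function name `boxplot_summary` and the dictionary keys are made up for this example. The notes do not say which quartile convention was used, so the sketch assumes the `inclusive` method of Python's `statistics.quantiles`, chosen because it reproduces Q1 = 14 and Q3 = 21 for this data; other conventions can give different quartiles. The lower fence is taken as Q1 - R, which matches the value 3.5 quoted in the notes.

```python
from statistics import quantiles


def boxplot_summary(values, k=1.5):
    """Return the 5-number summary, fences, whisker ends and suspect outliers.

    `k` is the multiplier applied to the IQR (1.5 in the lecture).
    """
    data = sorted(values)

    # Assumption: 'inclusive' quartile convention (not stated in the notes).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")

    iqr = q3 - q1
    r = k * iqr                      # the "R" from the notes
    lower_fence = q1 - r             # LF = Q1 - R
    upper_fence = q3 + r             # UF = Q3 + R

    # Whiskers extend to the most extreme values that are within the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    # Anything outside the fences is plotted individually as a suspect outlier.
    outliers = [x for x in data if x < lower_fence or x > upper_fence]

    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


data = [10, 11, 14, 15, 17, 19, 21, 28, 35]
summary = boxplot_summary(data)

print(summary["lower_fence"], summary["upper_fence"])  # 3.5 31.5
print(summary["outliers"])                             # [35]
```

With these assumptions, 35 is the only value outside the fences, so the upper whisker stops at 28 and 35 is drawn as a separate dot, which agrees with the boxplot described above.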