University of Toronto Scarborough

Statistics

STAB22H3

Ken Butler

Fall

STAB22 LEC04
(Covers all of chapter 5 and the beginning of chapter 6)
---------------[CHAPTER5]-----------------
Note - near the end of the lecture, he discusses data about potassium. He is getting that data
from: http://www.statcrunch.com/5.0/viewreport.php?reportid=25026
---------------------------------
[36]
CHAPTER 5: UNDERSTANDING AND COMPARING DATA
Data val's
- 10,11,14,15,17,19,21,28,35 Find:
a) median = 17
b) Q1= 14
c) Q3 = 21
d) IQR = Q3 - Q1 = 21 - 14 = 7
e) max = 35
f) min = 10
5-Number Summary
- gives min, Q1, median, Q3 and max:
=> 10, 14, 17, 21, 35
- summarizes the extremes, and the percentiles (25th, 75th, 50th) => capable of summarizing
large quantity of #'s w/ small quantity of summary val's (only 5)
- Note - 50th percentile aka median
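The 5-number summary above can be checked with a short Python sketch. This is only an illustration: the helper name `five_number_summary` is made up here, and the choice of `method="inclusive"` for the quartiles is an assumption (different software uses different quartile rules); it happens to reproduce the lecture's Q1 = 14 and Q3 = 21 for this data.

```python
from statistics import median, quantiles

# Data values from the lecture example
data = [10, 11, 14, 15, 17, 19, 21, 28, 35]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(values)
    # Assumption: the "inclusive" quartile rule; other rules can give
    # slightly different Q1 and Q3 for the same data.
    q1, _, q3 = quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median(ordered), q3, ordered[-1])

summary = five_number_summary(data)
print(summary)
```

For this data the five values agree with the ones listed above: 10, 14, 17, 21, 35.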
[37-38]
BOXPLOT
- visual display built from the following statistical summaries:
- median, Q3, Q1, min, max, upper fence, lower fence, and IQR
=> incorporates 5-number summary in a visual display
- scale on left
Calculating the "fences" - these are imaginary horizontal lines at particular values
upper fence
- UF = Q3 + (1.5)(IQR)
lower fence
- LF = Q1 - (1.5)(IQR)
- Note - he defined "R" as (1.5)(IQR). So, we can alternatively write the UF, LF
formulae as follows:
- UF = Q3 + R
- LF = Q1 - R
- draw vertical lines (whiskers) extending from the box of the boxplot to the most extreme value WITHIN THE
FENCES
- if there is a val. that is OUTSIDE the fences, plot it individually
- these are SUSPECT OUTLIERS (P37)
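As a rough sketch of the fence and whisker rules, the snippet below works through the lecture's data set. It takes Q1 = 14 and Q3 = 21 as given, and uses the lower fence as Q1 - 1.5*IQR, which gives the 3.5 quoted for this data in the lecture. The variable names are chosen here for illustration only.

```python
# Data values and quartiles from the lecture example
data = [10, 11, 14, 15, 17, 19, 21, 28, 35]
q1, q3 = 14, 21

iqr = q3 - q1            # 7
r = 1.5 * iqr            # the "R" used in the notes
lower_fence = q1 - r
upper_fence = q3 + r

# Values inside the fences decide where the whiskers end;
# values outside the fences are plotted individually.
inside = [x for x in data if lower_fence <= x <= upper_fence]
outliers = [x for x in data if x < lower_fence or x > upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print("fences:", lower_fence, upper_fence)
print("whiskers end at:", whisker_low, whisker_high)
print("suspect outliers:", outliers)
```

Note that the whiskers stop at actual data values (here 10 and 28), not at the fences themselves.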
BOXPLOT
R
Recall:
-----------
- UF = Q3 + R
- LF = Q1 - R
, where R = (1.5)*(IQR)
-----------
- R determines whether a value is too big or too small, and thus should be
considered an outlier
- the explanation for why we multiply IQR by 1.5
- the professor says "B/c it is."
- if we picked a multiplier greater than 1.5, then a value would have to be very, very big to be considered
"suspicious"
- anything above the upper fence or below the lower fence is suspiciously large or suspiciously small
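To get a feel for what the multiplier does, here is a small sketch that flags suspect outliers for different multipliers. The function name `suspect_outliers` and the alternative multiplier 3.0 are assumptions made for illustration; only 1.5 comes from the lecture.

```python
def suspect_outliers(values, q1, q3, multiplier=1.5):
    """Return the values lying outside the fences Q1 - R and Q3 + R,
    where R = multiplier * IQR."""
    r = multiplier * (q3 - q1)
    return [x for x in values if x < q1 - r or x > q3 + r]

# Data values and quartiles from the lecture example
data = [10, 11, 14, 15, 17, 19, 21, 28, 35]

flagged_standard = suspect_outliers(data, 14, 21)       # multiplier 1.5
flagged_strict = suspect_outliers(data, 14, 21, 3.0)    # hypothetical larger multiplier

print(flagged_standard)
print(flagged_strict)
```

With 1.5 the value 35 is flagged; with the larger multiplier the upper fence moves out to 42 and nothing is flagged, which is the professor's point about a bigger multiplier demanding a much more extreme value.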
Data val's
- 10,11,14,15,17,19,21,28,35
- largest: 35, bigger than upper fence, which is 21 + 10.5 = 31.5
- smallest: 10, which isn't smaller than lower fence, which was 3.5
Further notes on Boxplot
- all that matters is the v.axis
=> horizontal scale (aka x-axis) doesn't mean anything at all
=> how wide the boxplot is does not matter, and does not mean anything
With respect to our example
- dot - largest val
- out, beyond the fence (in this ex., it is beyond UF)
- we draw attention to it b/c it is unusually large
Advantages of Boxplot
- upon seeing it, you get immediate picture of centre and spread of data
- centre instantly (often times) from line in middle, which is median
- height of box is Q3 - Q1, which is IQR
- the taller the box is, the more spread out data val's are
- if box taller, IQR bigger b/c Q3 and Q1 farther apart
- unlike histogram or stem plot, it gives you centre and spread directly, and tells you about
unusually high or low val's
Example: 35
- high val = 35
- do further investigation questioning if 35 is correct
- an outlier
-------------------------
MyStatCrunch instructions
- Graphics -> Boxplot -> select column
- "use fences to identify outliers"
- Otherwise it will show whiskers all the way to top, all the way to bottom
-------------------------
[39]
COMPARING DISTRIBUTIONS WITH BOXPLOTS
- ex - classify cereals by which shelf they are on, and ask whether they differ in the amt of sugar they have
per serving
- compare these three distributions
- commonality: each measures sugar per serving of cereals
- difference: the cases in each are on a particular shelf
- ie. for "1", only counts cases (ie. cereals) on top shelf
- ie. for "2", only on middle shelf
- ie. for "3", only on bottom shelf
[40]
Comparing the 3 distributions with Histograms
- hard to look at and compare
- have to also decide where middle of each distribution is, which is not easily found using this
display
[41]
Comparing the 3 distributions with Boxplots
- instead do side-by-side box plots
=> clearer story from that
- used to compare var's or var's grouped up in diff. ways
Shelf 1
- median at around 4 (line on mid of box)
- whiskers - bottom one short, top one long
=> appear to be skewed to right
- no outliers, b/c no pts plotted by themselves
- Q1 to bottom of data - not v.far
- Q3 to top of data - data quite spread out
Shelf 3
- around 7 for median
- whiskers about same length top and bottom
=> distribution is roughly symmetric
Shelf 2
- where is median?
- There is no line across the box somewhere in the middle
=> median must be same as Q1 or Q3
- longer whisker at bottom
=> distribution appears to be skewed to left
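The whisker reasoning used for the three shelves (longer top whisker suggests right skew, longer bottom whisker suggests left skew, similar lengths suggest symmetry) can be sketched as a small helper. The function name `skew_from_whiskers` and the 25% `tolerance` for "about the same length" are assumptions made here, not part of the lecture. The shelf data itself is not in these notes, so the usage line falls back on the chapter's earlier data set.

```python
def skew_from_whiskers(lower_len, upper_len, tolerance=0.25):
    """Rough read of skew direction from the two whisker lengths.

    tolerance is a hypothetical cutoff: lengths that differ by at most
    25% of the longer whisker are treated as "about the same".
    """
    longer = max(lower_len, upper_len)
    if longer == 0 or abs(upper_len - lower_len) <= tolerance * longer:
        return "roughly symmetric"
    if upper_len > lower_len:
        return "skewed to right"
    return "skewed to left"

# Earlier data set: lower whisker runs from 10 up to Q1 = 14,
# upper whisker runs from Q3 = 21 up to 28 (35 is outside the fence).
direction = skew_from_whiskers(14 - 10, 28 - 21)
print(direction)
```

This is only a reading aid; whisker lengths hint at the direction of skew but do not measure it.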
Advantages of Boxplot
- gives quick comparison of distributions if want to know what certain statistical summary is
(given that it is displayed by boxplot)
- ex. could easily compare medians for sugar count b/ween the three distributions
- comparing diff. groups for sth is done best by comparing box plots side-by-side
- ex. cereals on diff. shelves
- comparing length of upper whisker to bottom whisker within one distribution, and b/ween
distributions, gives an idea as to direction of skewness, if any
- both similar/same lengths => symmetric
- upper whisker longer => skewed to right
- lower whisker longer => skewed to left
[42]
Where is median for shelf 2?
- median = Q3
- likely b/c there are a lot of cereals that have sugars exactly 12
- b/c so many
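A quick way to see how repeated values can put the median on top of Q3 is to try it on a small made-up data set. The sugar values below are hypothetical (they are NOT the lecture's cereal data), and the "inclusive" quartile rule is again an assumption.

```python
from statistics import quantiles

# Hypothetical sugar values with many cereals at exactly 12
sugars = [2, 5, 8, 11, 12, 12, 12, 12, 15]

# Assumption: "inclusive" quartile rule
q1, med, q3 = quantiles(sorted(sugars), n=4, method="inclusive")

print("Q1 =", q1, "median =", med, "Q3 =", q3)
print("median equals Q3:", med == q3)
```

In a boxplot of data like this, the median line would be drawn on the top edge of the box, so no separate line appears inside the box.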
