# STAB22-LEC02-(3,4).docx

University of Toronto Scarborough

Statistics

STAB22H3

Ken Butler

Lecture

Fall

STAB22 LEC02
(Covers chapters 3 and 4)
---------------[CHAPTER3]-----------------
[5]
DISPLAYING AND DESCRIBING CATEGORICAL DATA
- Example with family survey
Why is this categorical data?
- each person answering the survey gives exactly one of the 5 available answers
- i.e. each case falls into one category
Jump to [10]
How did attitudes change from 1991 to 2001?
- best seen with a side-by-side bar chart, rather than a single bar chart or a pie chart
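Sketch (Python, not from the lecture): to compare 1991 vs 2001 side by side, it helps to use percentages rather than raw counts, since the two years may not have the same # of respondents. The answer labels "A" to "E" and the responses below are made up for illustration; they are NOT the survey data from the slides.

```python
from collections import Counter


def relative_frequencies(responses, categories):
    """Return the fraction of responses in each category, in the given category order."""
    counts = Counter(responses)
    total = len(responses)
    return {cat: counts[cat] / total for cat in categories}


# Hypothetical answer labels and responses (NOT the lecture's survey)
CATEGORIES = ["A", "B", "C", "D", "E"]
responses_1991 = ["A", "B", "B", "C", "E", "A", "B", "D"]
responses_2001 = ["B", "C", "C", "D", "E", "E", "C", "A", "D", "C"]

for year, responses in [(1991, responses_1991), (2001, responses_2001)]:
    freqs = relative_frequencies(responses, CATEGORIES)
    # One row per year: each category's share can be compared across the two years
    print(year, {cat: round(p, 2) for cat, p in freqs.items()})
```

Each year's fractions add up to 1, so the bars for the two years are on the same scale even though 2001 has more responses here.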
NOTE
- If you look at the current lecture notes, slides [11] - [22] are missing for this lecture. This is b/c he added these slides in the following lecture, after he realized that he had "rushed" Chapter 3.
(see LEC03 notes)
Now moving on to Chapter 4.
---------------[CHAPTER4]-----------------
Displaying & summarizing QUANTITATIVE data
[22]
(EX) Breakfast cereal data
Why is it quantitative?
- variables have val's that are quantities (#'s of stuff), and have units
- ex. calories for that cereal (in Cal)
- ex. amt of nutrient X in that cereal (in g)
-------------
Note - a var. can still be quantitative if it has no real units but uses arbitrary units instead
- ex. a score of 500 (the qvar "Score" has the arbitrary unit "points")
-------------
What would be an example of a categorical variable?
- opinion with regards to a particular cereal
- so it could have categories (OK, Awesome, Average etc.)
-------------
[!] CANNOT USE PIE CHART OR BAR CHART FOR QUANTITATIVE DATA.
------------
[23]
HISTOGRAM - QVAR DISPLAY #1
- one type of display used for qvar's
- it's like the bar chart version for qvar's
(ex) histogram for calories per serving
- x-axis shows the quan var. calories, with units being Cal
>- i.e. whatever is being measured
- big tall bar in the middle (100-120)
>- cereals with a calorie total anywhere between 100 and 120 Cal fall into that section of the histogram
>- there are about 40 of those 100-120 calorie cereals
-------------
Note - each of these "sections" of the histogram that have bars coming out of them are called bins
-------------
Bar count
- have a decent # of bars (with corresponding bins) to show what is going on
- not too many, b/c then you get too many small bars
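Sketch (Python, hypothetical data): a histogram's bars are just counts of values falling into each bin. This assumes half-open bins, i.e. a value sitting exactly on a boundary (like 120) goes into the higher bin; the calorie values are made up, not the real cereal data. Changing `width` shows the bar-count trade-off: wider bins give fewer, taller bars; narrower bins give many small bars, lots of them empty.

```python
def bin_counts(values, start, width):
    """Count values into half-open bins [start, start+width), [start+width, start+2*width), ...

    Empty bins between the smallest and largest value are kept (count 0).
    Assumes width > 0 and start <= min(values).
    """
    if width <= 0 or start > min(values):
        raise ValueError("need width > 0 and start <= smallest value")
    n_bins = int((max(values) - start) // width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // width)] += 1
    # One (low edge, high edge, count) triple per bin
    return [(start + i * width, start + (i + 1) * width, c) for i, c in enumerate(counts)]


# Hypothetical calories per serving (NOT the lecture's cereal data)
calories = [50, 70, 90, 100, 100, 110, 110, 110, 120, 120, 140, 160]

for lo, hi, c in bin_counts(calories, start=40, width=20):
    print(f"{lo}-{hi}: {c}")

# Narrower bins give many more (mostly tiny or empty) bars
print(len(bin_counts(calories, start=40, width=5)), "bins at width 5")
```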
Other observations from histogram
- a few cereals have a lot more calories (160+), and a few are down in the 50 calorie bin
Nature of histogram
- shape = symmetric
- rises to a peak in the middle
- comes down pretty much the same way on both sides
- this is a unimodal distribution (one peak)
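Sketch (Python): "unimodal" = one peak. One simple way to count peaks from a list of bin counts is to count bins that are strictly taller than both neighbours. This rule is an assumption for illustration only: it misses flat-topped peaks and counts every small bump, so in practice you still judge the main humps by eye. The bin counts below are made up.

```python
def count_peaks(counts):
    """Count bins whose count is strictly greater than both neighbours.

    Bins beyond either end are treated as count 0. A flat-topped peak
    (two equal tallest bins side by side) is not counted by this rule.
    """
    padded = [0] + list(counts) + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


symmetric_counts = [1, 2, 4, 8, 4, 2, 1]  # made-up: rises to a peak, comes back down
two_humps = [1, 5, 2, 1, 6, 2]            # made-up: two separate humps

print(count_peaks(symmetric_counts))  # 1 -> unimodal
print(count_peaks(two_humps))         # 2 -> bimodal
```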
[24]
STEMPLOT - QVAR DISPLAY #2
- this represents the same data as the previous histogram
- it has the same shape as the histogram, which you can see by turning the stem plot 90 degrees
- can actually see the individual values
How to read this stem plot
- on the left-hand-side (LHS), you have stem
- on the right-hand-side (RHS), you have leaves
- to get a value, match the stem with the leaf:
- ex. 7 : 8 = 78 Cal
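Sketch (Python): building a stem-and-leaf display by splitting each value into stem = everything but the last digit and leaf = last digit, so 78 -> stem 7, leaf 8 as in the example above. Assumes whole-number, non-negative values; the data are made up. Empty stems are printed too, so gaps show up the same way empty bins do in a histogram.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: sorted leaves} for non-negative integers (leaf = last digit)."""
    if not values:
        raise ValueError("need at least one value")
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 78 -> (7, 8)
        plot[stem].append(leaf)
    # Keep every stem from smallest to largest, even ones with no leaves
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}


def print_stem_and_leaf(values):
    for stem, leaves in stem_and_leaf(values).items():
        print(f"{stem:>3} | {''.join(str(leaf) for leaf in leaves)}")


# Hypothetical calorie values (NOT the lecture's cereal data)
print_stem_and_leaf([70, 78, 90, 100, 110, 110, 120])
```

If every leaf came out as 0, that would tell you the values were only recorded to the nearest 10, which is the kind of detail a histogram hides.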
Stem plot
- we know the calorie counts more accurately, b/c we can now see each individual val.
- can also see the precision they were measured to
- ex. found that the calorie counts were only measured to the nearest 10 Cal, sth that we could not easily tell from the histogram
- all that the big bar in the histogram tells you is that there are val's b/ween 100-120 (it doesn't tell you exactly where they are in that range, but the stem plot does)
How can this data be represented categorically?
Ex. we could have the categorical variable "Calories", with the categories:
a) Small (70 and below)
b) Moderate (above 70, up to 120)
c) High (130+)
=> a 3-bar bar chart, or a 3-slice pie chart
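Sketch (Python): turning the quantitative calories into 3 categories and then counting each category gives the heights for the 3-bar bar chart (or the slice sizes for the pie chart). The cut-offs here are an assumption: Small = 70 and below, Moderate = above 70 up to 120, High = above 120. The calorie values are hypothetical.

```python
from collections import Counter


def calorie_category(cal):
    """Classify a calorie count using the assumed cut-offs 70 and 120."""
    if cal <= 70:
        return "Small"
    if cal <= 120:
        return "Moderate"
    return "High"


# Hypothetical calories per serving (NOT the lecture's cereal data)
calories = [50, 70, 90, 100, 100, 110, 110, 110, 120, 120, 140, 160]
counts = Counter(calorie_category(c) for c in calories)

for category in ["Small", "Moderate", "High"]:
    print(category, counts[category])  # one bar (or slice) per category
```

Note the information lost: once a cereal is "Moderate", you no longer know whether it had 80 or 120 Cal.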
[25]
BAR CHART OF CALORIES (treated categorically)
- THIS IS A BAR CHART, NOT A HISTOGRAM
- the spaces between the bars of a bar chart are just there to separate one category from another
- in a histogram, those spaces are real
- i.e. a gap means there were no data val's in that bin
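Sketch (Python): a quick text histogram where every bin gets a row, even empty ones. An empty bin prints as a row with no stars, which is the "real" gap described above; a bar chart, by contrast, only has a bar for each category. Bins are assumed half-open [low, high) and the values are made up.

```python
def text_histogram(values, start, width):
    """Print one row per bin with one '*' per value; return the list of bin counts.

    Bins are half-open [low, low + width). Assumes width > 0 and start <= min(values).
    """
    if width <= 0 or start > min(values):
        raise ValueError("need width > 0 and start <= smallest value")
    n_bins = int((max(values) - start) // width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // width)] += 1
    for i, c in enumerate(counts):
        low = start + i * width
        print(f"{low:>4}-{low + width:<4}| {'*' * c}")  # empty bin -> blank bar
    return counts


# Hypothetical values with nothing between 80 and 100
text_histogram([50, 60, 100, 110, 110, 120], start=40, width=20)
```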
(ex) Bar chart (left), histogram (right)
[26]
EXAMPLE OF SKEWED DISTRIBUTION
CEREAL POTASSIUM DATA HISTOGRAM
- We were previously looking at cereal calories, but for the following graphs we are looking at the amount of potassium (K) in each cereal.
Observing the potassium data histogram
- peak on left
- long tail of val's that runs towards the right
=> has a right-skewed shape
- "skewed" => not the same on both sides (the opp. of symmetric)
=> most cereals have little K; only a few have a lot
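Sketch (Python): a common rule of thumb (not from this lecture's slides) is that a long right tail pulls the mean above the median, and a long left tail pulls it below. It is only a rough check that can mislead (e.g. with several peaks), so looking at the histogram is still the real test. The potassium values below are made up.

```python
from statistics import mean, median


def skew_direction(values):
    """Rough rule of thumb: compare the mean with the median."""
    m, md = mean(values), median(values)
    if m > md:
        return "right-skewed (long tail to the right)"
    if m < md:
        return "left-skewed (long tail to the left)"
    return "roughly symmetric"


# Hypothetical potassium amounts: most small, a few large -> tail to the right
potassium = [20, 25, 30, 30, 35, 40, 45, 60, 90, 170, 300]
print(mean(potassium), median(potassium))  # the few big values pull the mean up
print(skew_direction(potassium))
```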
left-skewed: the mirror image (peak on the right, tail running towards the left)
[27]
EXAMPLE OF SKEWED DISTRIBUTION
POTASSIUM STEM-AND-LEAF
- the tall bars of the histogram correspond to the top of the stem plot, and the straggle (i.e. the "tail") is at the bottom of the stem plot
- this stem-and-leaf display (aka stem-and-leaf plot) reads like
