
University of Toronto Scarborough
Ken Butler

STAB22 LEC02 (Covers chapters 3 and 4)

---------------[CHAPTER3]-----------------

(5) DISPLAYING AND DESCRIBING CATEGORICAL DATA
- Example with family survey

Why is this categorical data?
- it is categorical data in the sense that a single person answering this survey can give one of the 5 answers available
- ie. each case falls under one category

Jump to [10]
How do attitudes change from 1991 to 2001?
- can be viewed best using a side-by-side bar chart, rather than a single bar chart or a pie chart

NOTE
- If you look at the current lecture notes, we are missing [11] - [22] for this lecture. This is b/c he added these slides the following lecture, after he realized that he "rushed" Chapter 3. (see LEC03 notes)

Now moving onto Chapter 4,

---------------[CHAPTER4]-----------------

Displaying & summarizing QUANTITATIVE data

[22] (EX) Breakfast cereal data
Why is it quantitative?
- variables have val's that are quantities (#'s of stuff), and have units
- ex. calories for that cereal (in Cal)
- ex. amt of nutrient X in that cereal (in g)
-------------
Note
- a var. can still be quantitative if it does not have units, but instead uses arbitrary units
- ex. score of 500 (this qvar "Score" really has arbitrary units being "points")
-------------
What would be an example of a categorical variable?
- opinion with regards to a particular cereal
- so it could have categories (OK, Awesome, Average etc.)
-------------
[!] CANNOT USE PIE CHART OR BAR CHART FOR QUANTITATIVE DATA.
------------
[23] HISTOGRAM - QVAR DISPLAY #1
- one type of display used for qvar's
- it's like the bar chart version for qvar's
(ex) histogram for calories per serving
- x-axis shows quan var. calories, with units being Cal.
>- # whatever is being measured
- big tall bar in the middle (100-120)
>- those cereals that have calories totalling anywhere between 100 and 120 will fall in that section of the histogram
>- there are about 40 of those 100-120 calorie cereals
-------------
Note
- each of these "sections" of the histogram which have bars coming out of them are called bins
-------------
Bar count
- have a decent # of bars (with corresponding bins) to show what is going on
- not too many, b/c then you get too many small bars

Other observations from Histogram
- few cereals have 160+ calories or are in the 50 calorie bin

Nature of histogram
- shape = symmetric
- goes to a peak in the middle
- comes down pretty much the same on both sides
- this is a unimodal distribution

[24] STEMPLOT - QVAR DISPLAY #2
- this is representing the same data as the previous histogram
- this is the same shape as the histogram, which you can see by turning the stem plot 90 degrees
- can actually see the individual values

How to read this stem plot
- on the left-hand-side (LHS), you have the stem
- on the right-hand-side (RHS), you have the leaves
- to get a value, match the stem with the leaf:
- ex. 7 : 8 = 78 Cal

Stem plot
- know more accurately what the calorie count was, b/c now know the #'s of each individual val.
- can also see to what decimal place they were measured
- ex. found that calorie count was only measured to nearest 10th, sth that we could not easily tell from the histogram
- all that big bar in the histogram tells you is that there are val's b/ween 100-120 (doesn't tell you exactly where they are in that range; but the stem plot does)

How can this data be represented categorically?
Ex.
we could have a categorical variable being "Calories", and the categories being:
a) Small (70 and below)
b) Moderate (70 to 120)
c) High (130+)
=> 3 barred bar chart, or 3 slice pie chart

[25] BAR CHART OF CALORIES (treated categorically)
- THIS IS A BAR CHART, NOT A HISTOGRAM
- those "spaces" in b/ween the bars of the bar chart are just to separate one category from another
- in a histogram, those spaces are real
- ie. it means that there were no data val's in that bin.
(ex) Bar chart (left), histogram (right)

[26] EXAMPLE OF SKEWED DISTRIBUTION - CEREAL POTASSIUM DATA HISTOGRAM
- We were previously looking at the cereal calories, but for the subsequent graphs we are looking at the amount of potassium in each of the cereal cases.

Observing the Potassium data histogram
- peak on left
- long tail of val's that runs towards the right
=> has right-skewed shape
- the opp. of symmetric is skewed
- "skewed" => not the same on both sides
- by contrast, a tail of a few cereals with little amts of K (tail running towards the left) would be left-skewed

[27] EXAMPLE OF SKEWED DISTRIBUTION - POTASSIUM STEM-AND-LEAF
- tall bars on the histogram are at the top of the stem plot, and the straggle (ie. the parts of the "tail") is at the bottom of the stem plot
- this stem-and-leaf display (aka stem-and-leaf plot) reads like
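
Sketch (not from lecture): the two qvar displays in these notes, histogram bins and stem/leaf splitting, can be tried out in a few lines of Python. The calorie values below are made up for illustration and are NOT the actual cereal data. The function names (histogram_counts, stem_and_leaf), the bin start of 40 and the bin width of 20 are also assumptions picked so that a 100-120 bin shows up.

```python
from collections import Counter, defaultdict

# Hypothetical calorie values (NOT the actual cereal data from lecture).
calories = [50, 70, 78, 90, 100, 100, 110, 110, 110, 120, 130, 150, 160]

def histogram_counts(values, start, width):
    """Count how many values fall in each bin [lo, lo + width)."""
    counts = Counter()
    for v in values:
        # Lower edge of the bin that v belongs to.
        lo = start + ((v - start) // width) * width
        counts[lo] += 1
    return dict(sorted(counts.items()))

def stem_and_leaf(values):
    """Split each value into a stem (all but last digit) and a leaf (last digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

# Text histogram: one star per case in the bin.
bins = histogram_counts(calories, start=40, width=20)
for lo, n in bins.items():
    print(f"{lo:>3}-{lo + 20:<3} | {'*' * n}")

# Stem plot: stem on the LHS, leaves on the RHS (7 : 08 reads as 70 and 78).
plot = stem_and_leaf(calories)
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(d) for d in plot.get(stem, []))
    print(f"{stem:>2} : {leaves}")
```

Note that the histogram only reports how many val's land in each bin, while the stem plot keeps every individual val. This sketch puts a val sitting exactly on a bin edge (ex. 120) in the bin to its right, which is one common convention and not necessarily the one used in lecture.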