STA215H5 Lecture Notes - Lecture 3: High School Dropouts, Frequency Distribution, Unimodality
STA215 Lecture 3; Chapter 3
SIMPSON’S PARADOX
● Relationships among proportions taken within different groups of subsets can appear
to contradict relationships among the overall population
● Ex. wage decline
○ Median wage has risen about 1% in the United States since 2000
○ But median wage for high school dropouts, high school graduates with no
college education, people with some college education, and people with
Bachelor’s or higher degrees have all decreased
○ ***Overall wages rose, group wages fell
○ Median wage has been increasing because there are now many more college
graduates (who get higher-paying jobs) than there were in 2000, but wages
for college graduates collectively have fallen at a much slower rate (down
1.2%) than for those of lower educational attainment (whose wages have
fallen precipitously, down 7.9% for high school dropouts). The growth in the
proportion of college graduates swamps the wage decline for specic groups
Displaying Quantitative Data;
● No categories in quantitative data so we slice possible values into bins and then
count the number of cases that fall into each of the bins created: THIS IS A
HISTOGRAM
● Summarized in either a frequency, relative frequency, or percent frequency
distribution (Y-AXIS)
● Variables of interest (X-AXIS)
●EX: HISTOGRAM
○ 14 21 23 21 16 19 22 25 16 16
○ 24 24 25 19 16 19 18 19 21 12
○ 16 17 18 23 25 20 23 16 20 19
○ 24 26 15 22 24 20 22 24 22 20
12-14
15-17
18-20
21-23
24-26
Total
Frequency
2
8
11
10
9
40
Relative Frequency
2/40
8/40
11/40
10/40
9/40
40
Percent Frequency
5%
20%
27.5%
25%
22.5%
100%
Distribution;
● Distribution is symmetric if the right and left sides of the histogram are approx. mirror
images of each other
● Distribution is skewed to the right if the right side of the histogram extends much
farther out than the left side
● Distribution is skewed to the left if the left side of the histogram extends much farther
out than the right side
●
● Mode = observation that occurs most frequently
● Modal class: class with the largest number of observations; bin has the highest
number of counts
● Unimodal histogram: one with a single peak
● Bimodal histogram: one with two peaks, not necessarily equal in height (different
heights too)
○ Look at overall pattern and for noticeable deviations from the pattern
○ You can describe the overall pattern by shape, center, spread, and number of
modal classes
○ Outlier point that departs from the body of the distribution
○ Use difference between smallest and largest observations to measure spread
Stem and Leaf Plots
● Histograms don’t show you what the exact values are, and so stem and leaf plots
help in doing so
○ (1) Separate each observation into a stem, consisting of all but the nal
(rightmost) digit, and a leaf, the nal digit
■ ***Stems may have as many digits as needed, but each leaf contains
only a single digit
○ (2) Write the stems in a vertical column with the smallest at the top, and draw
a vertical line at the right of this column
○ (3) Write each leaf in the row to the right of its stem, in increasing order out
from the stem
● Example;
○ 70 72 75 64 58 83 80 82 76 75 68 65 57 78 85 72
■ (1) 57, 58, 64, 65, 68, 70, 72, 72, 75, 75, 76, 78, 80, 82, 83, 85
■ (2)
5
7 8
6
4 5 8
7
0 2 2 5 5 6 8
8
0 2 3 5
Dot Plots;
● Like a stem-and-leaf display but with dots instead of digits for all of its leaves
Document Summary
Relationships among proportions taken within di erent groups of subsets can appear to contradict relationships among the overall population. Median wage has risen about 1% in the united states since 2000. But median wage for high school dropouts, high school graduates with no college education, people with some college education, and people with. Median wage has been increasing because there are now many more college graduates (who get higher-paying jobs) than there were in 2000, but wages for college graduates collectively have fallen at a much slower rate (down. 1. 2%) than for those of lower educational attainment (whose wages have fallen precipitously, down 7. 9% for high school dropouts). The growth in the proportion of college graduates swamps the wage decline for speci c groups. No categories in quantitative data so we slice possible values into bins and then count the number of cases that fall into each of the bins created: this is a.