University of Alberta
Statistics
STAT151
Paul Cartledge
Lecture

# Ch2.pdf

Fall

2.1 Types of Data
Def’n: A variable is any characteristic that is recorded for subjects in a study.
- Qualitative (categorical): cannot take a numerical value, but can be classified into 2 or more non-numeric categories. e.g. gender, smell, letter grades
- Quantitative (numerical): measured numerically.
  - Discrete: can take only certain values, with no intermediate values. e.g. integer counts, numeric grades
  - Continuous: can take any numerical value over a certain interval or intervals. e.g. GPA, gas prices
Def’n: A frequency table (for qualitative data) is a listing of possible values for a
variable, together with the # of observations for each value.
Major      Frequency (f)   Relative frequency   Percentage (%)
Science
Arts
Business
Nursing
Other

Relative frequency $= \dfrac{f}{\sum f}$
Percentage (%) $=$ (Relative frequency) $\times 100$
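The two formulas above can be checked with a short Python sketch. The actual class counts were filled in during lecture and are not recorded in these notes, so the counts below are hypothetical placeholders used only to show the arithmetic.

```python
# Hypothetical counts: the real class data are not recorded in these notes.
counts = {"Science": 12, "Arts": 8, "Business": 6, "Nursing": 3, "Other": 1}

total = sum(counts.values())  # sum of f

# Relative frequency = f / sum(f); percentage = relative frequency x 100
table = {
    major: (f, f / total, 100 * f / total)
    for major, f in counts.items()
}

for major, (f, rel, pct) in table.items():
    print(f"{major:<10}{f:>4}{rel:>8.3f}{pct:>8.1f}%")
```

The relative frequencies always add up to 1 (and the percentages to 100%), apart from rounding in the printed output.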
2.2 Graphical Summaries
Def’n: A pie chart is a circle divided into slices whose sizes represent the relative frequencies
of the different categories. e.g. (above table used in class)
Look for: categories that form large and small proportions of the data set.
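In a pie chart, each category's slice takes up its relative frequency times 360°. A minimal Python sketch, again using hypothetical counts in place of the class data:

```python
# Hypothetical counts (the class data are not recorded in these notes).
counts = {"Science": 12, "Arts": 8, "Business": 6, "Nursing": 3, "Other": 1}
total = sum(counts.values())

# Each slice's angle is its relative frequency times the full 360 degrees.
angles = {major: 360 * f / total for major, f in counts.items()}

for major, angle in angles.items():
    print(f"{major:<10}{angle:6.1f} degrees")
```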
Def’n: A bar graph displays vertical bars whose heights represent the frequencies of the
respective categories. e.g. (above table used in class)
Look for: frequently and infrequently occurring categories.
Graphs for quantitative variables:
Def’n: A stem-and-leaf plot has each value divided into two portions: a stem (the leading digit) and a leaf (the rest),
e.g. 25.8 → stem 2, leaf 5.8. The leaves for each stem are listed beside it in the display. (Values should be ranked first.)
Look for: - typical values and corresponding spread
- gaps in the data or outliers
- presence of symmetry in the distribution
- number and location of peaks
Ex2.1) U.S. Box Office for weekend of January 2, 2011
25.8 24.4 18.8 12.4 10.3 10.0 9.8 9.3 8.9 7.8

0 | 7.8 8.9 9.3 9.8
1 | 0.0 0.3 2.4 8.8
2 | 4.4 5.8

A comparative S-and-L plot has a common stem to compare two related distributions:

9.8 8.4 7.3 7.0 6.8 5.2 4.8 | 0 | 7.8 8.9 9.3 9.8
                5.0 3.8 0.7 | 1 | 0.0 0.3 2.4 8.8
                            | 2 | 4.4 5.8
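The single Ex2.1 stem-and-leaf plot can be reproduced with a short Python sketch. It assumes each value has one decimal place and uses the tens digit as the stem, as in the plot above; `stem_and_leaf` is a hypothetical helper name, not a library function.

```python
from collections import defaultdict

# Ex2.1 data: U.S. box office, weekend of January 2, 2011
box_office = [25.8, 24.4, 18.8, 12.4, 10.3, 10.0, 9.8, 9.3, 8.9, 7.8]

def stem_and_leaf(values, stem_unit=10):
    """Return the plot as a list of 'stem | leaves' strings.

    Assumes values carry one decimal place: the stem is the tens digit
    and the leaf is the remainder (e.g. 25.8 -> stem 2, leaf 5.8).
    """
    groups = defaultdict(list)
    for v in sorted(values):  # rank the values first
        stem = int(v // stem_unit)
        groups[stem].append(round(v - stem * stem_unit, 1))
    lines = []
    # Include empty stems so gaps in the data stay visible
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(f"{leaf:.1f}" for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

for line in stem_and_leaf(box_office):
    print(line)
```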
Note: Dot plots also exist (see p. 33 in textbook); they work similarly but replace each value with a dot.
Def’n: A histogram, like a bar graph, graphically shows a frequency distribution. Here, however,
the data are quantitative.
Look for: - central or typical value and corresponding spread
- gaps in the data or outliers
- presence of symmetry in the distribution
- number and location of peaks
The data are divided into intervals (normally of equal width).
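Splitting data into equal-width intervals amounts to counting how many values fall in each interval. A minimal Python sketch using the Ex2.1 box office values; the half-open intervals [start, start + width) and the helper name `bin_counts` are assumptions made for illustration (the table below labels its intervals as whole-number ranges such as 200 to 599 instead).

```python
# Ex2.1 data: U.S. box office, weekend of January 2, 2011
box_office = [25.8, 24.4, 18.8, 12.4, 10.3, 10.0, 9.8, 9.3, 8.9, 7.8]

def bin_counts(values, start, width, n_bins):
    """Count values in n_bins intervals [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)  # index of the interval containing v
        if not 0 <= k < n_bins:
            raise ValueError(f"{v} lies outside the chosen intervals")
        counts[k] += 1
    return counts

START, WIDTH, N_BINS = 5, 5, 5
counts = bin_counts(box_office, START, WIDTH, N_BINS)
for k, f in enumerate(counts):
    lo = START + WIDTH * k
    # One '*' per observation gives a sideways text histogram
    print(f"{lo} to under {lo + WIDTH}: {'*' * f}")
```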
Cumulative relative frequency $= \dfrac{\text{cumulative frequency of an interval}}{\text{total number of observations in the data set}}$
Total earnings as of Jan. 9, 2011 (421 films)
Worldwide box office   Number of    Relative    Cumulative
(in millions)          movies (f)   frequency   rel. freq.
200 to 599             365          0.867       0.867
600 to 999             49           0.116       0.983
1000 to 1399           5            0.012       0.995
1400 to 1799           0            0           0.995
1800 to 2199           1            0.002       0.998
2200 to 2599           0            0           0.998
2600 to 3000           1            0.002       1.000
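The relative and cumulative relative frequency columns can be recomputed from the counts with a short Python sketch; `itertools.accumulate` supplies the running totals (cumulative frequencies).

```python
from itertools import accumulate

# Worldwide box office (in millions), total earnings as of Jan. 9, 2011
intervals = ["200 to 599", "600 to 999", "1000 to 1399", "1400 to 1799",
             "1800 to 2199", "2200 to 2599", "2600 to 3000"]
freqs = [365, 49, 5, 0, 1, 0, 1]

n = sum(freqs)  # total number of observations
rel_freq = [f / n for f in freqs]
# Cumulative relative frequency = cumulative frequency / total,
# computed from the exact running counts.
cum_rel_freq = [c / n for c in accumulate(freqs)]

for label, f, rf, crf in zip(intervals, freqs, rel_freq, cum_rel_freq):
    print(f"{label:<14}{f:>5}{rf:>8.3f}{crf:>8.3f}")
```

Computing each cumulative value from the running count, rather than by adding up already-rounded relative frequencies, keeps rounding error from building up, so the last interval comes out at exactly 1.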
Ex2.2) (drawn in class using above data)
NOTE: Dot and S-and-L plots are good for small data sets because the individual data values are
retained. Histograms are better for large data sets because they condense the data.
Histogram traits: (corresponding figures drawn in class)
1. Modes (unimodal, bimodal, multimodal, uniform)
2. Skewness (symmetric, left-skewed & right-skewed) → the direction refers to the longer “TAIL” (e.g. right-skewed = long right tail)
3. Tail weight (normal, heavy-tailed, light-tailed)
Def’n: A time plot is a graph of data collected over time (or a time series).
Look for: - a trend over time, denoting a decrease or increase.
- a pattern repeating at regular intervals (a cycle or seasonal variation)
Ex2.3) (drawn in class)

2.3 Measures of Center
Def’n: An outlier is an obs’n that falls well above or below the overall bulk of the data.
Population mean: $\mu = \dfrac{\sum x_i}{N}$
Sample mean: $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{\sum x_i}{n}$
The median is the middle value of a data set that has been ranked in order (increasing or
decreasing). If the data set has an even # of observations, use the average of the middle 2 values.
Note: the median is resistant to outliers; the mean uses all observations, so outliers pull it toward them.
Table 2X0 – Estimated provincial populations circa Oct. 2007 (in millions)
ON      PQ      BC      AB      MB      SK      NS      NB      NFL     PEI
12.851  7.720   4.403   3.487   1.190   1.003   0.935   0.751   0.507   0.139
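Ex2.4) Avg. pop’n of all provinces:
$\mu = \dfrac{\sum x_i}{N} = \dfrac{12.851 + \cdots + 0.139}{10} = 3.299$
Avg. pop’n from sample of 3 provinces (BC, AB, MB):
$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{4.403 + 3.487 + 1.190}{3} = 3.027$
Median (10 values, so average the middle 2, MB and SK):
$\text{median} = \dfrac{1.190 + 1.003}{2} = 1.097$
Outlier effect? (remove Ontario & Quebec)
$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{4.403 + \cdots + 0.139}{8} = 1.552$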
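The Ex2.4 calculations can be checked with Python's standard `statistics` module. This sketch uses the Table 2X0 values; it also recomputes the median after removing Ontario and Quebec, which the notes do not do, to show the median's resistance to outliers.

```python
from statistics import mean, median

# Table 2X0: estimated provincial populations circa Oct. 2007 (in millions)
pops = {
    "ON": 12.851, "PQ": 7.720, "BC": 4.403, "AB": 3.487, "MB": 1.190,
    "SK": 1.003, "NS": 0.935, "NB": 0.751, "NFL": 0.507, "PEI": 0.139,
}
values = list(pops.values())

mu = mean(values)                                   # population mean, N = 10
x_bar = mean([pops["BC"], pops["AB"], pops["MB"]])  # sample mean, n = 3
med = median(values)                                # even N: average of middle 2

# Outlier effect: drop Ontario and Quebec, then recompute both measures
rest = [v for prov, v in pops.items() if prov not in ("ON", "PQ")]
mean_rest = mean(rest)
median_rest = median(rest)

print(f"mu = {mu:.3f}, x_bar = {x_bar:.3f}, median = {med:.4f}")
print(f"without ON & PQ: mean = {mean_rest:.3f}, median = {median_rest:.4f}")
```

The mean drops from about 3.299 to 1.552 once the two largest provinces are removed, while the median moves only from about 1.097 to 0.969.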
Comparing Mean and Median: (corresponding figures drawn in class)
1. Symmetric curve & histogram
- the 2 are equal and lie at the center of the distribution
2. Right-skewed: Median < Mean
3. Left-skewed: Mean < Median
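As a quick check of the right-skew rule, here is a Python sketch comparing the mean and median of the Ex2.1 box office values, whose stem-and-leaf plot has its long tail toward the larger values. Comparing mean and median is only a rough indicator of skewness, not a formal test.

```python
from statistics import mean, median

# Ex2.1 data: U.S. box office, weekend of January 2, 2011
box_office = [25.8, 24.4, 18.8, 12.4, 10.3, 10.0, 9.8, 9.3, 8.9, 7.8]

m, med = mean(box_office), median(box_office)
print(f"mean = {m:.2f}, median = {med:.2f}")

# A long right tail pulls the mean above the median.
if med < m:
    print("median < mean: consistent with right skew")
elif m < med:
    print("mean < median: consistent with left skew")
else:
    print("mean = median: consistent with symmetry")
```

For these ten values the mean is 13.75 and the median is 10.15.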
Def’n: The mode is the most frequent value in a data set.
Ex2.5) → no mode     Movies → 200 – 499
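A Python sketch of the mode, written as a hypothetical helper `modes`. It treats a data set in which no value repeats as having no mode. The standard library's `statistics.multimode` instead returns every value in that case, so the explicit check is needed.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s) as a list.

    If every value occurs exactly once, the data set is treated as
    having no mode and an empty list is returned.
    """
    counts = Counter(values)
    if not counts:
        raise ValueError("modes() requires at least one value")
    top = max(counts.values())
    if top == 1:
        return []
    return [v for v, c in counts.items() if c == top]

# Table 2X0 populations (in millions): all 10 values are distinct
populations = [12.851, 7.720, 4.403, 3.487, 1.190,
               1.003, 0.935, 0.751, 0.507, 0.139]
print(modes(populations))      # → []
print(modes([1, 2, 2, 3, 3]))  # hypothetical data with two modes → [2, 3]
```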
2.4 Measures of Spread
Def’n: Range = largest value – smallest value = max – min
Ex2.6) (from Table 2X0) range = 12.851 – 0.139 = 12.712
Deviations from the Mean:
Ex2.7) 1, 2, 4, 3
xi
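The Ex2.7 values can be turned into deviations $x_i - \bar{x}$ with a short Python sketch. It also checks a standard property: deviations from the mean always sum to zero, since $\sum (x_i - \bar{x}) = \sum x_i - n\bar{x} = 0$.

```python
from statistics import mean

# Ex2.7 data
data = [1, 2, 4, 3]
x_bar = mean(data)  # (1 + 2 + 4 + 3) / 4 = 2.5

# Deviation of each observation from the mean: x_i - x_bar
deviations = [x - x_bar for x in data]
for x, d in zip(data, deviations):
    print(f"x_i = {x}, x_i - x_bar = {d:+.1f}")

# Deviations from the mean always sum to zero
print("sum of deviations =", sum(deviations))
```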
