Chapter 1
Def’n: Statistics:
1) are commonly known as numerical facts
2) is a field of discipline or study
Statistics is about variation.
3 main aspects of statistics:
1) Design (“Think”): Planning how to obtain data to answer questions.
2) Description (“Show”): Summarizing the obtained data.
3) Inference (“Tell”): Making decisions and predictions based on data.
Chapter 2 - Data
Def’n: A population consists of all elements whose characteristics are being studied.
Ex2.1) All UofA students
A sample is a portion of the population selected for study.
Ex2.2) UofA students in this section
A parameter is a summary measure calculated for population data.
A statistic is a summary measure calculated for sample data.
Types of statistics:
Descriptive: methods to view a given dataset.
→ averages, histograms
Inferential: methods using sample results to infer conclusions about a larger population.
→ t-tests, simple linear regression
Def’n: A variable is any characteristic that is recorded for subjects in a study.
- Qualitative (categorical): cannot assume a numerical value but classifiable into 2 or
more non-numeric categories. → gender, smell, grades
- Quantitative (numerical): measured numerically.
  - Discrete: only certain values with no intermediate values. → integers, grades
  - Continuous: any numerical value over a certain interval or intervals.
    → GPA, gas prices
Chapter 3 – Categorical Data Graphs
Def’n: A frequency table (for qualitative data) is a listing of possible values for a
variable, together with the # of observations for each value.
Major        Frequency (f)   Relative frequency   Percentage (%)
Science
Arts
Business
Nursing
Other
Relative frequency = f / ∑f
Percentage = Relative frequency × 100%
Graphical Summaries
Def’n: A bar chart is a graph of bars whose heights represent the (relative) frequencies of
respective categories. Ex3.1) (preceding table used in class)
Look for: frequently and infrequently occurring categories.
A pie chart is a circle divided into portions that represent (relative) frequency
belonging to different categories. Ex3.2) (preceding table used in class)
Look for: categories that form large and small proportions of the data set.
A segmented bar chart uses a rectangular bar divided into segments that represent
frequency or relative freq. of different categories. Ex3.3) (preceding table used in class)
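All three charts plot the same (relative) frequencies, so it can help to see how those numbers come from raw category labels. The Python sketch below is illustrative only: the list `majors` is made-up data (the counts from the in-class table are not recorded in these notes), and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical responses; NOT the counts from the in-class table.
majors = ["Science", "Arts", "Science", "Business", "Nursing",
          "Science", "Arts", "Other", "Business", "Science"]


def frequency_table(values):
    """Return rows of (category, f, relative frequency, percentage)."""
    counts = Counter(values)
    total = sum(counts.values())      # sum of f over all categories
    rows = []
    for category, f in counts.most_common():
        rel = f / total               # relative frequency = f / sum of f
        rows.append((category, f, rel, rel * 100))
    return rows


table = frequency_table(majors)
for category, f, rel, pct in table:
    print(f"{category:<10}{f:>3}{rel:>8.2f}{pct:>7.1f}%")
```

The relative frequencies always sum to 1, which is why a pie chart or a segmented bar can show them as parts of a whole.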
Chapter 4 – Numerical Variable Graphs
Def’n: A stem-and-leaf display has each value divided into two portions: a stem and a
leaf. The leaves for each stem are shown separately. (Values should be ranked.)
Look for: - typical values and corresponding spread
- gaps in the data or outliers
- presence of symmetry in the distribution
- number and location of peaks
Ex4.1) U.S. Box Office for weekend of Dec. 27 – 29, 2013
29.0 28.6 19.7 18.7 18.4 13.5 12.8 10.1 9.9 7.3
0 | 7.3 9.9
1 | 0.1 2.8 3.5 8.4 8.7 9.7
2 | 8.6 9.0
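One way to build such a display by program, as a rough sketch: rank the values, use the tens digit as the stem and the remainder as the leaf. The function name `stem_and_leaf` is an assumption made for illustration; the data are the Ex4.1 values.

```python
from collections import defaultdict

# Values from Ex4.1, as listed in the notes.
grosses = [29.0, 28.6, 19.7, 18.7, 18.4, 13.5, 12.8, 10.1, 9.9, 7.3]


def stem_and_leaf(values):
    """Group ranked values by tens digit (stem); the leaf is the remainder."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem = int(v // 10)
        leaf = v - 10 * stem
        groups[stem].append(f"{leaf:.1f}")
    # Include stems with no leaves so that gaps in the data stay visible.
    stems = range(min(groups), max(groups) + 1)
    return [f"{s} | {' '.join(groups[s])}".rstrip() for s in stems]


display = stem_and_leaf(grosses)
print("\n".join(display))
```

Printing empty stems is a design choice: it keeps gaps and outliers visible, which are among the things to look for.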
Note: Dotplots also exist (see p. 52 in textbook), but “replace” the values with dots.
Def’n: A histogram, like a bar graph, graphically shows a frequency distribution. The
data here, however, is quantitative.
Look for: - central or typical value and corresponding spread
- gaps in the data or outliers
- presence of symmetry in the distribution
- number and location of peaks
The data divide into intervals (normally of equal width).
Cumulative Relative Frequency = (Cumul. freq. of a class) / (Total obs’ns in dataset)
Table 4X0 – Total earnings as of Jan. 7/2014 (551 films)
Worldwide Box Office   Number of movies   Relative    Cumulative
(in millions)          f                  frequency   rel. freq.
200 to 599             466                0.8457      0.8457
600 to 999             68                 0.1234      0.9691
1000 to 1399           14                 0.0254      0.9946
1400 to 1799           1                  0.0018      0.9964
1800 to 2199           1                  0.0018      0.9982
2200 to 2599           0                  0.0000      0.9982
2600 to 3000           1                  0.0018      1.0000
Ex4.2) (drawn in class using above data)
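The relative and cumulative relative frequency columns can be recomputed from the counts alone. A minimal Python sketch, using the class counts from Table 4X0; the variable names are illustrative.

```python
from itertools import accumulate

# Class labels and counts copied from Table 4X0.
classes = ["200 to 599", "600 to 999", "1000 to 1399", "1400 to 1799",
           "1800 to 2199", "2200 to 2599", "2600 to 3000"]
counts = [466, 68, 14, 1, 1, 0, 1]

total = sum(counts)  # total observations in the dataset
rel_freq = [round(f / total, 4) for f in counts]
# Divide cumulative counts by the total rather than summing rounded values,
# so rounding error does not build up from class to class.
cum_rel_freq = [round(c / total, 4) for c in accumulate(counts)]

for label, f, r, c in zip(classes, counts, rel_freq, cum_rel_freq):
    print(f"{label:<14}{f:>5}{r:>9.4f}{c:>9.4f}")
```

The last cumulative relative frequency is always 1, which is a quick check that no class was dropped.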
NOTE: Dot and S-and-L plots are good for small data sets because data values are
retained. Histograms are better for large data sets to condense the data.
Histogram shapes/traits: (corresponding figures drawn in class)
1. Modes (unimodal, bimodal, multimodal, uniform)
2. Skewness (symmetric, left-skewed & right-skewed) → term refers to “TAIL”
3. Tail weight (normal, heavy-tailed, light-tailed)
Def’n: A timeplot is a graph of data collected over time (or a time series).
Look for: - a trend over time, denoting a decrease or increase.
- a pattern repeating at regular intervals (a cycle or seasonal variation)
Ex4.3) (drawn in class)
Chapters 4/5 – Summary measures (and one more graph)
Measures of Center
Def’n: An outlier is an obs’n that falls well above or below the overall bulk of the data.
Population mean: $\mu = \frac{\sum y_i}{N}$
Sample mean: $\bar{y} = \frac{y_1 + y_2 + \dots + y_n}{n} = \frac{\sum y_i}{n}$
The median is the value of the midpoint of a data set that has been ranked in
order, increasing or decreasing. If dataset has an even # of observations, use the average
of the middle 2 values.
Note: median resistant to outliers, mean uses all observations.
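To see the median rule and the note on outliers in action, here is a small Python sketch. The functions `mean` and `median` are hand-written for illustration (Python's `statistics` module provides equivalents), and the data are made up.

```python
def mean(values):
    """Sum of the observations divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ranked data; average the middle 2 if n is even."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


# Made-up data, chosen only to illustrate the effect of one outlier.
data = [2, 3, 5, 7, 8]
with_outlier = data + [100]

print(mean(data), median(data))
print(mean(with_outlier), median(with_outlier))
```

Adding the single large value pulls the mean far more than the median, because the mean uses every observation while the median depends only on the middle of the ranked list.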
Table 5X0 – Estimated provincial populations circa Jul. 2011 (in millions)
ON      PQ     BC     AB     MB     SK     NS     NB     NL     PEI
13.373  7.980  4.573  3.779  1.250  1.058  0.945  0.756  0.511  0.146
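Ex5.1) Avg. pop’n of all provinces: $\mu = \frac{\sum y_i}{N} = \frac{13.373 + \dots + 0.146}{10} = 3.437$
Avg. pop’n from sample of 3 provinces: $\bar{y} = \frac{\sum y_i}{n} = \frac{4.573 + 3.779 + 1.250}{3} = 3.201$
Outlier effect? (remove Ontario & Quebec)
Median (1.250 and 1.058 are the middle 2 of all 10 values): $\frac{1.250 + 1.058}{2} = 1.154$
Mean of the remaining 8 provinces: $\bar{y} = \frac{\sum y_i}{n} = \frac{4.573 + \dots + 0.146}{8} = 1.627$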
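The arithmetic in Ex5.1 can be checked with Python's standard `statistics` module. This is only a sketch: the dictionary `pop` and the other variable names are illustrative, and the median of the 8 remaining provinces is an extra quantity that the notes do not compute.

```python
from statistics import mean, median

# Populations in millions, from Table 5X0.
pop = {"ON": 13.373, "PQ": 7.980, "BC": 4.573, "AB": 3.779, "MB": 1.250,
       "SK": 1.058, "NS": 0.945, "NB": 0.756, "NL": 0.511, "PEI": 0.146}

all_values = list(pop.values())
mu = mean(all_values)                                     # all 10 provinces
ybar_sample = mean([pop[p] for p in ("BC", "AB", "MB")])  # the sample of 3
median_all = median(all_values)

# Drop the two largest provinces to see the effect of outliers.
without = [v for p, v in pop.items() if p not in ("ON", "PQ")]
ybar_without = mean(without)
median_without = median(without)  # extra check, not in the notes

print(f"mean, all 10:         {mu:.3f}")
print(f"mean, sample of 3:    {ybar_sample:.3f}")
print(f"median, all 10:       {median_all:.3f}")
print(f"mean, without ON/PQ:  {ybar_without:.3f}")
print(f"median, without both: {median_without:.4f}")
```

Comparing the two pairs of results shows the mean dropping by more than half once Ontario and Quebec are removed, while the median changes only slightly.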
Comparing Mean and Median: (corresponding figures drawn in class)
1. Symmetric curve & histogram
- the 2 are identical, lie at center of distribution
