Population vs. sample → parameter vs. statistic
Statistics: Descriptive vs. inferential
Types of variables
Quantitative vs. Qualitative
Tables, charts & graphs
- frequency tables
- qualitative: bar graph/pie chart
- stem-and-leaf plot/dot plot
- time plot
- histogram (modality)
- traits: # of modes, tail weight, overall shape (symmetry, skewness)
- identify skewness by TAIL
- boxplot (skewness)
- outliers, overall shape (symmetry, skewness)
- identify skewness inside box or entire graph
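Below is a small Python sketch of a frequency table and a text stem-and-leaf plot. It assumes non-negative integer data, with the stem taken as every digit except the last and the leaf as the last digit. The `scores` list is made-up example data, and the function names are illustrative only.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, relative frequency) rows, sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(v, c, c / n) for v, c in sorted(counts.items())]

def stem_and_leaf(values):
    """Text stem-and-leaf plot for non-negative integers.

    Stem = value // 10 (all but the last digit), leaf = value % 10.
    Empty stems between the smallest and largest stem are still shown.
    """
    stems = {}
    for v in sorted(values):
        stems.setdefault(v // 10, []).append(v % 10)
    return [
        f"{stem:>2} | " + "".join(str(leaf) for leaf in stems.get(stem, []))
        for stem in range(min(stems), max(stems) + 1)
    ]

scores = [62, 67, 71, 74, 74, 78, 81, 85, 85, 85, 90, 98]  # made-up data
for row in frequency_table(scores):
    print(row)
print("\n".join(stem_and_leaf(scores)))
```

Turning the plot on its side gives a rough histogram, so the same output can be read for modality, tail weight and skewness.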
Measures of center/spread/position
- center: mean, median, mode
→ Outlier effect? Skewness effect?
- spread: range, variance, standard deviation, IQR
→ Why use squared deviations and divide by (n – 1)? Can variance ever be negative? Empirical Rule?
- position: min, max, percentiles (quartiles)
→ recall that we INCLUDE the median when determining quartiles
→ 5-number summary, boxplot, types of outliers
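The measures above can be sketched in Python. The quartile function follows the "include the median" rule as read from the notes: when n is odd, the median belongs to both the lower and the upper half. Library routines such as `statistics.quantiles` interpolate differently and may give other values. The 1.5 × IQR fence is the common outlier rule and is an assumption here. The data are made up.

```python
import statistics

def median_of_sorted(s):
    """Median of an already-sorted list."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def five_number_summary(data):
    """(min, Q1, median, Q3, max); Q1 and Q3 are medians of the lower and
    upper halves, and the median is INCLUDED in both halves when n is odd."""
    s = sorted(data)
    n = len(s)
    half = (n + 1) // 2  # half size, counting the middle value when n is odd
    q1 = median_of_sorted(s[:half])
    q3 = median_of_sorted(s[n - half:])
    return s[0], q1, median_of_sorted(s), q3, s[-1]

def iqr_outliers(data, k=1.5):
    """Values below Q1 - k*IQR or above Q3 + k*IQR (k = 1.5 assumed)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    return [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]

data = [2, 4, 4, 5, 7, 9, 10, 12, 40]  # made-up data with one large value
print(statistics.mean(data), statistics.median(data), statistics.mode(data))
print(statistics.variance(data), statistics.stdev(data))  # sample versions: divide by n - 1
print(five_number_summary(data))  # (2, 4, 7, 10, 40)
print(iqr_outliers(data))         # [40]
```

The single value 40 pulls the mean (about 10.3) well above the median (7), while the median and IQR barely move. This is the outlier and skewness effect the notes ask about.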
Displaying bivariate data
- scatterplot: visual aid to see form/strength/direction of relationship
and/or outliers (large residual, high leverage, influential)
- correlation: numerical aid to see strength/direction of relationship (range?)
→ Warning: assumes linearity, sensitive to outliers
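A minimal Python sketch of the correlation coefficient follows, written as the average product of z-scores with an (n – 1) divisor. The two made-up examples illustrate both warnings. A perfect curved relationship can give r near 0, and a single outlier can flip the sign of r.

```python
import math

def correlation(x, y):
    """Pearson correlation r = sum(z_x * z_y) / (n - 1), using sample SDs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, n >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))
    return sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / (n - 1)

# r only measures LINEAR association: y = x^2 is a perfect curve, yet r is about 0.
x = [-2, -1, 0, 1, 2]
print(correlation(x, [xi ** 2 for xi in x]))

# One outlier can change r dramatically.
x2, y2 = [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]
print(correlation(x2, y2))                # about 1.0
print(correlation(x2 + [6], y2 + [-10]))  # about -0.44
```

By construction r always lies between −1 and 1.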
Simple linear regression analysis
- regression line: $\hat{y} = b_0 + b_1 x$
- least-squares estimation gives $b_1 = r\left(\frac{s_y}{s_x}\right)$ and $b_0 = \bar{y} - b_1\bar{x}$
- estimation: interpolation vs. extrapolation (BAD!)
- R-squared: $r^2$ = proportion of variation in y explained by x
- causation: association does NOT imply causation
- residual plots: observed vs. theoretical appearance
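The least-squares formulas can be sketched in Python as follows. The code computes $b_1 = r(s_y/s_x)$, $b_0 = \bar{y} - b_1\bar{x}$, $r^2$, and the residuals (observed minus predicted). The `predict` helper is a hypothetical convenience that refuses to extrapolate, in line with the "interpolation vs. extrapolation" warning. The data are made up.

```python
import math

def least_squares(x, y):
    """Fit y-hat = b0 + b1*x with b1 = r * (s_y / s_x) and b0 = y-bar - b1 * x-bar.

    Returns (b0, b1, r_squared, residuals).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    b1 = r * math.sqrt(syy / sxx)  # equals r * (s_y / s_x): the (n - 1) divisors cancel
    b0 = my - b1 * mx
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # observed - predicted
    return b0, b1, r ** 2, residuals

def predict(b0, b1, x_new, x_observed):
    """Hypothetical helper: predict y-hat only inside the observed x range."""
    if not min(x_observed) <= x_new <= max(x_observed):
        raise ValueError("x_new is outside the observed x range (extrapolation)")
    return b0 + b1 * x_new

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]  # made-up data
b0, b1, r2, res = least_squares(x, y)
print(b0, b1, r2)               # b0 about 0.09, b1 about 1.97
print(predict(b0, b1, 3.5, x))  # interpolation
print(res)                      # plot these against x to check for leftover pattern
```

Least-squares residuals always sum to (essentially) zero. The residual plot is useful for its shape: curvature or a fan pattern suggests trying a transformation.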
- transformation of a variable can help improve linearity
Chapter 11-13:
- observational/retrospective/prospective study, experiment/controlled clinical trial
→ population and causal inferences (what needs to be present for each?)
- types of bias (response, undercoverage, nonresponse)
- types of sampling: with/without replacement, SRS/stratified/cluster/
- controlling factors: randomization, blocking, direct control, replication
- more experiment design definitions
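The sampling schemes above can be sketched with Python's `random` module. `random.sample` draws without replacement and `random.choices` draws with replacement. The stratified and cluster functions are hypothetical helpers written for illustration, and the population, strata and clusters are invented unit IDs.

```python
import random

rng = random.Random(2024)         # fixed seed so results are repeatable
population = list(range(1, 101))  # hypothetical unit IDs 1..100

srs = rng.sample(population, 10)                  # SRS without replacement: no unit twice
with_replacement = rng.choices(population, k=10)  # with replacement: repeats possible

def stratified_sample(strata, n_per_stratum, rng):
    """Take a separate SRS of n_per_stratum units inside every stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters, then keep every unit in the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

strata = {"first half": population[:50], "second half": population[50:]}
clusters = {f"block {i}": population[10 * i:10 * (i + 1)] for i in range(10)}
print(srs, with_replacement)
print(stratified_sample(strata, 5, rng))
print(cluster_sample(clusters, 3, rng))
```

The key contrast: stratified sampling randomizes *within* every group, while cluster sampling randomizes *which* groups are taken and then keeps each chosen group whole.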
- types of events: marginal, conditional, union, intersection, complement,
- What common words identify them?
- relating events: dependent vs. disjoint vs. independent
- Do these relations affect the rules below? If so, how?
- Do they allow certain rules to be easily extended?
- probability laws:
- conditional probability: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
- complement rule: $P(A^C) = 1 - P(A)$
- multiplication rule: P(A ∩ B) = P(A and B) = P(A) × P(B | A) = P(B) × P(A | B)
- addition rule: P(A or B) = P(A) + P(B) – P(A and B)
- total probability rule: $P(A) = P(A \cap B) + P(A \cap B^C)$
- recall examples where we combined a few of these together
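As one worked example of combining these laws, the Python sketch below checks each rule by counting outcomes for two fair dice. The events are invented for illustration. A = "sum is 7", B = "first die shows 3", and C = "sum is 2". Representing events as sets makes ∩, ∪ and the complement the set operations `&`, `|` and `-`.

```python
from fractions import Fraction
from itertools import product

# Sample space: 36 equally likely ordered outcomes of rolling two fair dice.
S = set(product(range(1, 7), repeat=2))

def P(event):
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(len(event), len(S))

A = {s for s in S if sum(s) == 7}  # "the sum is 7"
B = {s for s in S if s[0] == 3}    # "the first die shows 3"
C = {s for s in S if sum(s) == 2}  # "the sum is 2"
B_c = S - B                        # complement of B

p_A_given_B = P(A & B) / P(B)                # conditional probability
print(P(B_c) == 1 - P(B))                    # complement rule
print(P(A & B) == P(B) * p_A_given_B)        # multiplication rule
print(P(A | B) == P(A) + P(B) - P(A & B))    # addition rule
print(P(A) == P(A & B) + P(A & B_c))         # total probability rule
print(p_A_given_B == P(A))                   # True: A and B are independent
print(A & C == set(), P(A & C) == P(A) * P(C))  # True False: disjoint, so NOT independent
```

The last line shows why disjoint and independent are different relations. Two disjoint events with positive probabilities are always dependent, because knowing one occurred rules the other out.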
- discrete (exact probability or intervals) vs. continuous (only intervals)
Discrete: If P(X = a) > 0, then P(X ≤ a) ≠ P(X < a)
Continuous: If P(X = a) = P(X = b) = 0, then P(a ≤ X ≤ b) = P(a < X < b)
- discrete distributions:
- determine probability distribution (values of X and corresponding probabilities)
- mean: $\mu = \sum x_i P(X = x_i)$
- variance: $\sigma^2 = \sum (x_i - \mu)^2 P(X = x_i) = \sum x_i^2 P(X = x_i) - \mu^2$
- permutations/combinations: ${}_nP_r = \frac{n!}{(n-r)!}$ and ${}_nC_r = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}$
- binomial dist’n: indep. trials, two outcomes/trial, constant p, X = # of successes
$f(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n$
$\mu = E(X) = np$ and $\sigma = \sqrt{V(X)} = \sqrt{np(1-p)}$
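The discrete-distribution formulas can be checked numerically in Python with `math.perm` and `math.comb` (Python 3.8+). The sketch computes µ and σ² both by definition and by the shortcut formula. It then builds a binomial distribution from its pmf and confirms µ = np and σ = √(np(1 − p)). The small distribution and the values n = 10, p = 0.3 are made up.

```python
import math

def mean_and_variance(dist):
    """dist maps each value x_i to P(X = x_i).

    Returns (mu, sigma^2 by definition, sigma^2 by the shortcut formula).
    """
    if not math.isclose(sum(dist.values()), 1.0):
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in dist.items())
    var_def = sum((x - mu) ** 2 * p for x, p in dist.items())
    var_shortcut = sum(x ** 2 * p for x, p in dist.items()) - mu ** 2
    return mu, var_def, var_shortcut

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

print(math.perm(5, 2), math.comb(5, 2))  # 20 10

small = {0: 0.2, 1: 0.5, 2: 0.3}         # made-up distribution
print(mean_and_variance(small))          # mu = 1.1, sigma^2 = 0.49 (up to rounding)

n, p = 10, 0.3
binom = {x: binomial_pmf(x, n, p) for x in range(n + 1)}
mu, var, _ = mean_and_variance(binom)
print(mu, n * p)                                   # both about 3.0
print(math.sqrt(var), math.sqrt(n * p * (1 - p)))  # both about 1.449
```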
- continuous distributions:
- uniform distribution: finding an area of a rectangle (with a twist!)
- normal distribution: symmetric, 2 parameters: µ and σ, other properties
Standard Normal Distribution (and its applications)
- µ = 0 and σ = 1
- Table Z gives only areas to the left of a z value, so x values must first be converted to z values
→ use diagrams, complements, symmetry, etc.
- standardizing: $P(X \le x) = P\left(\frac{X - \mu}{\sigma} \le \frac{x - \mu}{\sigma}\right) = P(Z \le z)$
- identifying values for a given probability: x = µ + zσ
- normal approximation to binomial: If X ~ B(n, p), np ≥ 10 and n(1 – p) ≥ 10, then
$P(X \le x) \approx P\left(Z \le \frac{x - np}{\sqrt{np(1-p)}}\right)$
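The continuous-distribution steps can be sketched with Python's `statistics.NormalDist` (Python 3.8+). Here `cdf` plays the role of Table Z (area to the left) and `inv_cdf` finds z for a given probability. The uniform helper covers only the basic rectangle area, clipped to [a, b]; the notes do not say what the "twist" is, so it is not modeled. Following the formula in the notes, the normal approximation to the binomial uses no continuity correction. The parameters µ = 100, σ = 15 and n = 100, p = 0.5 are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

def uniform_prob(c, d, a, b):
    """P(c <= X <= d) for X ~ Uniform(a, b): overlap length / (b - a)."""
    overlap = min(d, b) - max(c, a)
    return max(overlap, 0) / (b - a)

# Standardizing, with hypothetical parameters mu = 100, sigma = 15
mu, sigma, x = 100, 15, 130
z = (x - mu) / sigma               # 2.0
left = Z.cdf(z)                    # area to the LEFT, as a Z table reports it
right = 1 - left                   # right tail via the complement rule
within_1sd = Z.cdf(1) - Z.cdf(-1)  # about 0.683, as in the Empirical Rule

# Value for a given probability: x = mu + z * sigma (90th percentile here)
x_90 = mu + Z.inv_cdf(0.90) * sigma  # about 119.2

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def binomial_cdf_normal(k, n, p):
    """Approximate P(X <= k) by P(Z <= (k - np) / sqrt(np(1 - p))), no continuity correction."""
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("need np >= 10 and n(1 - p) >= 10")
    return Z.cdf((k - n * p) / math.sqrt(n * p * (1 - p)))

print(uniform_prob(2, 5, 0, 10), left, right, within_1sd, x_90)
print(binomial_cdf(55, 100, 0.5), binomial_cdf_normal(55, 100, 0.5))  # about 0.864 vs 0.841
```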
Combinations and Functions of Random Variables
For any constants a and b,
1. $E(a) = a$ and $V(a) = 0$
2. $E(aX) = aE(X)$ and $V(aX) = a^2 V(X)$
3. $E(aX + b) = aE(X) + b$ and $V(aX + b) = a^2 V(X)$
4. $E(aX \pm bY) = aE(X) \pm bE(Y)$ and $V(aX \pm bY) = a^2 V(X) + b^2 V(Y) \pm 2ab\,\mathrm{cov}(X, Y)$
For $Y = a_1X_1 + a_2X_2 + \cdots + a_nX_n + b$: $E(Y) = a_1E(X_1) + a_2E(X_2) + \cdots + a_nE(X_n) + b$
If $X_1, X_2, \ldots, X_n$ are independent: $V(Y) = a_1^2V(X_1) + a_2^2V(X_2) + \cdots + a_n^2V(X_n)$
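Rules 3 and 4 can be confirmed by exact enumeration in Python. The sketch builds the distribution of W = aX + bY + c for two independent discrete random variables and compares E(W) and V(W) against the formulas. Exact fractions avoid rounding. The distributions of X and Y and the constants are hypothetical, and b < 0 exercises the "minus" case.

```python
from fractions import Fraction
from itertools import product

def E(dist):
    """Expected value of a discrete distribution {value: probability}."""
    return sum(v * p for v, p in dist.items())

def V(dist):
    """Variance: E[(X - mu)^2]."""
    mu = E(dist)
    return sum((v - mu) ** 2 * p for v, p in dist.items())

def combine_independent(f, X, Y):
    """Distribution of f(X, Y) when X and Y are independent: P(x, y) = P(x) P(y)."""
    out = {}
    for (x, px), (y, py) in product(X.items(), Y.items()):
        value = f(x, y)
        out[value] = out.get(value, 0) + px * py
    return out

# Hypothetical independent random variables
X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
Y = {1: Fraction(1, 3), 4: Fraction(2, 3)}
a, b, c = 3, -2, 5  # b < 0 covers the "minus" case of rule 4

W = combine_independent(lambda x, y: a * x + b * y + c, X, Y)
print(E(W), a * E(X) + b * E(Y) + c)        # 2 2
print(V(W), a ** 2 * V(X) + b ** 2 * V(Y))  # 25/2 25/2 (the constant c drops out)
```

The variance has no "−" term even though b is negative, because b² > 0 and cov(X, Y) = 0 under independence.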
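- sample proportion $\hat{p}$:
Rule 1: $\mu_{\hat{p}} = p$. Rule 2: $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{pq}{n}}$.
Rule 3: If np and n(1 – p) are both ≥ 15, then $\hat{p}$ has an approx. normal dist’n.
All 3 rules → if Rule 3 holds, $Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \overset{\text{approx}}{\sim} N(0, 1)$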
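The three rules for $\hat{p}$ can be put together in a short Python sketch. It returns the mean and SD of $\hat{p}$, checks the ≥ 15 condition, and converts a sample proportion to a z value to get an approximate tail probability. The function names and the values p = 0.4, n = 150 are hypothetical.

```python
import math
from statistics import NormalDist

def phat_sampling_distribution(p, n, min_count=15):
    """Mean and SD of p-hat (Rules 1 and 2) plus the Rule 3 check."""
    mean = p
    sd = math.sqrt(p * (1 - p) / n)
    approx_normal = n * p >= min_count and n * (1 - p) >= min_count
    return mean, sd, approx_normal

def prob_phat_at_least(value, p, n):
    """Approximate P(p-hat >= value) using Z = (p-hat - p) / sigma_phat."""
    mean, sd, ok = phat_sampling_distribution(p, n)
    if not ok:
        raise ValueError("np and n(1 - p) must both be >= 15")
    z = (value - mean) / sd
    return 1 - NormalDist().cdf(z)

# Hypothetical: true p = 0.4, sample size n = 150
print(phat_sampling_distribution(0.4, 150))  # (0.4, about 0.04, True)
print(prob_phat_at_least(0.46, 0.4, 150))    # about 0.067 (z = 1.5)
```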
- sample mean $\bar{y}$:
Rule 1: $\mu_{\bar{y}} = \mu$. Rule 2: $\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}$.
Rule 3: When the population distribution is normal, the sampling distribution of
$\bar{y}$ is also normal for any sample size n.
Rule 4 (CLT): When n > 30, the sampling distribution of $\bar{y}$ is well approximated
by a normal curve, even when the population distribution is not itself normal.
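A small simulation makes Rules 1, 2 and 4 concrete. The Python sketch below assumes a strongly right-skewed Exponential(rate = 1) population, which has µ = 1 and σ = 1. It draws 5000 samples of size n = 40 and looks at the resulting sample means. The population choice, sample size, repetition count and seed are all arbitrary.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, seed=1):
    """Draw `reps` samples of size n from an Exponential(rate=1) population
    (right-skewed, mu = 1, sigma = 1) and return the sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

n = 40
means = simulate_sample_means(n, reps=5000)
print(statistics.fmean(means), 1.0)                 # Rule 1: center near mu
print(statistics.stdev(means), 1.0 / math.sqrt(n))  # Rule 2: spread near sigma / sqrt(n)

# Rule 4 (CLT): about 95% of sample means fall within 1.96 standard errors of mu,
# as a normal curve predicts, even though the population is far from normal.
se = 1.0 / math.sqrt(n)
share = sum(abs(m - 1.0) <= 1.96 * se for m in means) / len(means)
print(share)
```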
All 4 rules → If n is large OR the population is normal, $Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \overset{\text{approx}}{\sim} N(0, 1)$
Chapter 19:
- how to interpret CI?
- generic CI: point estimate ± (critical value) × (standard error)
→ as the confidence level increases, the margin of error (ME) increases
→ as n increases, ME decreases
- sample proportion:
Assumptions: random sample, $n\hat{p} \ge 15$ and $n(1 - \hat{p}) \ge 15$.
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
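The one-proportion interval follows the generic form "point estimate ± (critical value) × (standard error)" and can be sketched in Python. `NormalDist().inv_cdf` supplies $z_{\alpha/2}$. The counts are made up. The last two lines check the ME behavior listed earlier: higher confidence widens the interval, and a larger n narrows it.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """p-hat +/- z_{alpha/2} * sqrt(p-hat (1 - p-hat) / n)."""
    phat = successes / n
    if n * phat < 15 or n * (1 - phat) < 15:
        raise ValueError("need n * phat >= 15 and n * (1 - phat) >= 15")
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)    # critical value z_{alpha/2}
    me = z * math.sqrt(phat * (1 - phat) / n)  # margin of error
    return phat - me, phat + me

def width(ci):
    """Width of an interval (= 2 * ME)."""
    return ci[1] - ci[0]

lo, hi = proportion_ci(120, 300)  # made-up counts: phat = 0.4
print(lo, hi)                     # about 0.345 to 0.455
print(width(proportion_ci(120, 300, 0.99)) > width((lo, hi)))  # True: higher confidence, larger ME
print(width(proportion_ci(480, 1200)) < width((lo, hi)))       # True: larger n, smaller ME
```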
- choosing n:
n ≈ p (1