STAT141

chp. 24, 25, 26,27 & 28.docx


University of Alberta
Oksana Kotovych

Chapter 24 - Comparing Means

Ex: Are the mean scores on an exam significantly different for men than for women?
Ex: Do students in morning STAT 141 classes perform better on exams than students in afternoon STAT 141 classes?

Compare 2 independent samples.
Independent: the selection of individuals for one sample doesn't influence the selection of individuals for the other sample.

Population:
                      Mean   SD
  Pop 1               µ1     σ1
  Pop 2               µ2     σ2

Sample:
                      Sample size   Mean   SD
  Sample from pop 1   n1            ȳ1     s1
  Sample from pop 2   n2            ȳ2     s2

n1 and n2 don't have to be equal!

Focus on the difference: µ1 - µ2
Estimate: ȳ1 - ȳ2

The sampling distribution of ȳ1 - ȳ2
1) Take a sample of size n1 from pop 1. Compute ȳ1.
2) Take a sample of size n2 from pop 2. Compute ȳ2.
3) Compute ȳ1 - ȳ2.
4) Repeat steps 1-3 many times.
5) Construct a histogram of all the values of (ȳ1 - ȳ2). Approximate it with a curve. What is the center, shape, spread?

General Properties
1) E(ȳ1 - ȳ2) = E(ȳ1) - E(ȳ2) = µ1 - µ2 = µ(ȳ1 - ȳ2)
   This is the mean of all the values of (ȳ1 - ȳ2).
2) Var(ȳ1 - ȳ2) = Var(ȳ1) + Var(ȳ2). The variances add because the two samples are independent.
   SD(ȳ1 - ȳ2) = √(Var(ȳ1) + Var(ȳ2)) = σ(ȳ1 - ȳ2) = SD of all the values of (ȳ1 - ȳ2).
   We approximate this with the standard error: SE(ȳ1 - ȳ2) = √(s1²/n1 + s2²/n2)
3) In each group, n should be reasonably large or the population distribution should be roughly normal. In that case the sampling distribution has a normal shape.

Chapter 24 - continued

Note: when ∆ₒ = 0, then Hₒ: µ1 = µ2 or µ1 - µ2 = 0. This says: no difference between the means.

Ex: To compare the age at first marriage of females in 2 ethnic groups, A and B, a random sample of 100 ever-married females was taken from each group and the ages at first marriage were recorded. Results: Figure 1
a) Is the distribution of y-values normal for each population?
b) Construct a 95% CI for µA - µB.
c) Do these data provide strong evidence that the mean age at first marriage is different for the 2 groups? Alpha = 0.05.
d) Compare the results from b and c. Explain why the results are consistent with each other.
Solution:
µA = true mean age at first marriage for group A, µB = true mean age at first marriage for group B. (It doesn't matter which is pop 1; just be consistent throughout.)

Compute df.
vA = sA²/nA = 6.3²/100 = 0.3969
vB = sB²/nB = 5.8²/100 = 0.3364
df = (0.3969 + 0.3364)² / (0.3969²/99 + 0.3364²/99) = 196.7, round down to 196

a) Distribution not normal. Why? The y-values are ages at first marriage. ȳ - s and ȳ - 2s give very low ages, which is not logical here, so the 68-95-99.7% rule is violated.
b) CI: T table, row: df = 180 (closest), column: C = 0.95. Figure 2. We're 95% confident that µA - µB lies between these 2 numbers.
c) Hₒ: µA = µB or µA - µB = 0
   Ha: µA ≠ µB or µA - µB ≠ 0
   Fig 3. Note: this is not the same t that's used for the CI. Fig 4.
   Compare the p-value to alpha. Since the p-value is less than 0.02, it's < 0.05, so reject Hₒ. These 2 samples provide enough evidence that the mean ages at first marriage are different for the 2 groups.
d) ∆ₒ = 0 is not in the interval, so reject Hₒ, i.e. zero is not a plausible value of the true µA - µB. This is consistent with part c. (95% CI, alpha = 0.05)

Note: the t computed for hypothesis testing is not the same t used for CIs. Hypothesis testing: t is computed from the sample data. CIs: t is a fixed value from the table.
Note: if we interchanged pop A and pop B, the conclusions would be the same. The t would be negative, and the interval (a, b) would become (-b, -a).
*Skip the pooled t test in this chapter.

Chapter 25 - Paired Samples and Blocks

Recall: we have 2 independent samples when 2 unrelated sets of individuals are measured, one sample from each population. Here, the sample sizes might be different.
Now: we have paired or matched samples when we know in advance that an observation in one data set is directly related to a specific observation in the other data set. Here, n1 = n2 always.

Layout: Figure 1

Ex: "Before" and "after" scenario: a number of patients have their blood pressures recorded before and after taking a certain pill.
Figure 2

We reduce such situations to a 1-sample problem, i.e. a paired t test is just a 1-sample t test for the mean of the differences.

Let µd = µ1 - µ2 = mean value of the difference population
σd = SD of the difference population

Sample:
d̄ = mean of the pairwise differences
n = number of pairs (n = n1 = n2)
sd = SD of the differences
SE(d̄) = sd/√n

CI for µd: Fig 3.

Assumptions:
- Paired data
- Differences independent of each other
- Differences are a random sample
- n large or differences follow a normal distribution
*Same procedure as for 1-sample problems, but with different symbols.

Ex: Consider the following paired data: Fig 4.
a) Compute d̄ and sd.
b) Do a test of Hₒ: µd = 0, Ha: µd ≠ 0. Use alpha = 0.05.
c) Construct a 95% CI for µd.

Sol'n: Fig 5. These sample differences don't provide enough evidence that µd is not equal to 0. Here, d̄ is not equal to ∆ₒ, i.e. 0.25 is not equal to 0, but the difference isn't significant.
Note: if you had computed (y - x) instead of (x - y), then you'd get t = -0.225.
Fig 6. We're 95% confident that µd lies between these 2 values.
Note: if you had done y - x instead of x - y, then you'd get -3.2779 in place of 3.2779, i.e. the signs of the CI numbers would change.

Ex: A researcher measured the corneal thickness of 8 patients who have glaucoma in one eye but not in the other. The data on corneal thickness, in microns:

  Patient   Normal   Glaucoma   Difference (N-G)
  1         484      488        -4
  2         478      478         0
  3         492      480        12
  4         444      426        18
  5         436      440        -4
  6         398      410       -12
  7         464      458         6
  8         476      460        16

Do the data provide enough evidence to conclude that the true mean corneal thickness is greater in normal eyes than in eyes with glaucoma? Alpha = 0.10

Sol'n: Fig 7. Fail to reject Hₒ. These sample differences don't provide sufficient evidence that µd > 0.
Note: d̄ = 4.00, ∆ₒ = 0. d̄ > ∆ₒ, but the results aren't statistically significant. d̄ is not "bigger enough" than ∆ₒ for us to reject Hₒ.
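Since Fig 7 is not reproduced here, the corneal thickness test can be re-traced with a short Python sketch. It uses only the table values and the formula SE(d̄) = sd/√n from these notes; the critical value 1.415 (T table, df = 7, one-tail probability 0.10) and all variable names are assumptions added for illustration, not part of the original solution.

```python
import math
import statistics

# Corneal thickness in microns, copied from the table above
normal = [484, 478, 492, 444, 436, 398, 464, 476]
glaucoma = [488, 478, 480, 426, 440, 410, 458, 460]

# Reduce the paired data to one sample of differences (N - G)
diffs = [n - g for n, g in zip(normal, glaucoma)]
n_pairs = len(diffs)

d_bar = statistics.mean(diffs)      # mean of the differences
s_d = statistics.stdev(diffs)       # sample SD of the differences (divides by n - 1)
se = s_d / math.sqrt(n_pairs)       # SE(d-bar), the "ruler"

# Upper-tailed test: H0: mu_d = 0 versus Ha: mu_d > 0
t_stat = (d_bar - 0) / se
df = n_pairs - 1

# Assumed table value: t* for df = 7 with upper-tail area 0.10
T_CRIT = 1.415
reject = t_stat > T_CRIT

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.3f}, SE = {se:.3f}")
print(f"t = {t_stat:.3f}, df = {df}, reject H0 at alpha = 0.10: {reject}")
```

Because the computed t falls below the table value, the p-value is larger than 0.10, which agrees with the "fail to reject" conclusion above.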
Reminder: we are using SE(d̄) as a ruler here.

RECAP: 1 & 2 sample procedures

Statistics: ȳ, s, p̂, ȳ1 - ȳ2, p̂1 - p̂2, d̄
Parameters: µ, σ, p, µ1 - µ2, p1 - p2, µd

Proportions: z-scores only.
Means: use z only if σ is known (chp. 18 only); otherwise switch to t, since only s is known.

Proportions:
- one sample
- 2 samples (independent): pool proportions only when hypothesis testing. Here, pₒ = 0 ONLY!
Means:
- one sample
- two samples: 1) independent case 2) matched pairs case
Difference: matched pairs need evidence of matching or pairing. Hint: see the examples in chapters 24 and 25 to see the difference.

Finding z or t
- CI (z or t): T table, row: correct df (use infinity for z-scores), column: correct confidence level. For both, report the z or t given there.
- Hypothesis testing: compute z or t by using the sample data. P-value: use the z table when using z, approximations from the T table when using t.
The z, t for CIs is not the same as the z, t for hypothesis testing!

1) Determine if you're working with means or proportions. Means: ȳ and s are given in the question. Proportions: either the values of p̂ and n are given, or the question is phrased as, e.g., "out of 2000 students, 1500 said yes".
2) Determine how many samples you have. Ex: are you just given ȳ, or do you have ȳ1 and ȳ2?
3) To determine which Ha to use, look for key words. "...evidence of increase..." = upper-tailed test, "...decrease..." = lower-tailed test, "...some difference..." = two-tailed test.

P-value regions (z or t): 1) Upper-tailed 2) Lower-tailed 3) 2-tailed test

Hₒ: what we assume to be true initially. Ha: what we're trying to prove.
CIs: statistic ± margin of error. We want to keep the ME small; the best way is to increase n.
P-values: the larger the absolute value of t or z, the smaller the p-value.

Chapter 26 - Comparing Counts

This chapter deals with multiple proportions.
χ²: new test statistic. Fig 1.
4) For this chapter, the P-value is the area to the right of the computed χ² value.
5) A large χ² corresponds to a small P-value.

Table: χ² table
For χ² values not on the table, do an approximation (because of space constraints, only some values are listed).
Ex: df = 9, computed χ² = 15.3. Select row df = 9 and you'll find 14.684 < 15.3 < 16.919, where 14.684 has right-tail probability 0.10 and 16.919 has right-tail probability 0.05. So P(χ² ≥ 15.3) is between 0.05 and 0.10. Fig 2.

Goodness-of-fit test: a test of whether the distribution of counts in one categorical variable matches the distribution predicted by a model.
Recall: categorical variable = non-numerical variable

Ex: Here is the frequency distribution of the blood types of 96 people:

  Blood type   O    A    B    AB   Total
  Freq.        38   43   10   5    96

Do these data contradict the belief that the 4 blood types are equally likely to occur?

This is a one-way frequency table with k = 4 categories (the old edition uses n = 4 categories).
Observed counts: 38, 43, 10, 5
Cell: a spot where we write in an observed count.

We test hypotheses about the proportion of the population that falls in each category.
Our example: Is the long-run proportion of people falling into each of the 4 categories equal to 0.25?

Notation:
k = number of categories
p1 = true proportion for category 1, p2 = true proportion for category 2, ..., pk = true proportion for category k
Note: p1 + p2 + ... + pk = 1
Denote the hypothesized values by (pₒ)1, (pₒ)2, ..., (pₒ)k. Then
Hₒ: p1 = (pₒ)1, p2 = (pₒ)2, ..., pk = (pₒ)k
Ha: Hₒ is not true. At least one proportion differs from its hypothesized value.
Note: Ha is many-sided because there are many ways that Hₒ can be wrong. Anything except Hₒ fits into Ha.

Our example: p1 = true proportion of people with blood type O, p2 = blood type A, p3 = blood type B, p4 = blood type AB
Hₒ: p1 = p2 = p3 = p4 = 0.25, i.e. the long-run proportion of people falling into each of the 4 categories is the same.
Ha: At least one value of p is not equal to 0.25, i.e. the proportion of people falling into each of the 4 categories is not the same.

Compute the expected counts assuming Hₒ is true.
Expected count: (total number of observations) × (hypothesized value)
Our example: each expected count = (96)(0.25) = 24

                    O    A    B    AB
  Expected counts   24   24   24   24
  Observed counts   38   43   10   5

Test statistic: χ² = sum of (observed - expected)²/expected = sum of (O - E)²/E, df = k - 1
Observed counts: the numbers in the random sample
Expected counts: (total)(hypothesized value)
p-value = P(χ² ≥ computed χ²); use the χ² table.

Assumptions
- Data are counts for the categories of a categorical variable
- The counts in the cells are independent of each other
- Counts come from a random sample
- Each expected count is ≥ 5

Ex: alpha = 0.05. χ² = (38 - 24)²/24 for type O, plus the same calculation for types A, B, and AB; the sum is 46.4167. df = 4 - 1 = 3, and the assumptions hold.
χ² table, row: df = 3. The highest χ² there is 12.838, with P(χ² ≥ 12.838) = 0.005. Since 46.4167 > 12.838, P(χ² ≥ 46.4167) < 0.005. Fig 3.
Compare the p-value to alpha. Since p-value < 0.005, it's
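
The arithmetic of the blood-type example can be checked with a small Python sketch. It uses only the observed counts, the hypothesized proportion 0.25, and the table value 12.838 (df = 3, right-tail probability 0.005) quoted in these notes; the variable names are assumptions chosen for illustration.

```python
# Observed blood-type counts from the one-way table
observed = {"O": 38, "A": 43, "B": 10, "AB": 5}

# H0: all four types are equally likely
hypothesized = {blood_type: 0.25 for blood_type in observed}

total = sum(observed.values())

# Expected count = (total number of observations) x (hypothesized value)
expected = {bt: total * p for bt, p in hypothesized.items()}

# Assumption check: each expected count should be at least 5
expected_ok = all(e >= 5 for e in expected.values())

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi2 = sum((observed[bt] - expected[bt]) ** 2 / expected[bt] for bt in observed)
df = len(observed) - 1

# Largest value in the df = 3 row of the table, right-tail probability 0.005
CRIT_0005 = 12.838
p_below_0005 = chi2 > CRIT_0005

print(f"expected = {expected}, assumption holds: {expected_ok}")
print(f"chi2 = {chi2:.4f}, df = {df}, p-value < 0.005: {p_below_0005}")
```

The sketch only brackets the p-value the way the table does; it does not compute an exact tail probability.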