
# 28, 30.pdf

23 Pages

School
Department
Statistics
Course
STAT151
Professor
Susan Kamp
Semester
Fall

Description
## 28 ANOVA (ANalysis Of VAriance)

### Are the Means of Several Groups Equal?

- We already know how to test whether two groups have equal means (Ch 24).
- When we want to test whether more than 2 groups have equal means, we could compare each pair of groups with a t-test.
- However, we'd wind up increasing the probability of a Type I error, since each test would bring with it its own α.
- Fortunately, there is a test that generalizes the t-test to any number of treatment groups.
- For comparing several means, there is yet another sampling distribution model, called the F-model.

Here, we introduce the tool, ANOVA, for comparing the means of two or more populations. It lets us draw a general conclusion with a single overall error probability α. There are 2 possible conclusions:

1. There are some differences among the means at significance level α.
2. No differences among the means are detected at significance level α.

Example: We want to compare three brands of gasoline.

Scenario 1

| | Brand1 | Brand2 | Brand3 |
|---|---|---|---|
| | 15 | 19 | 22 |
| | 19 | 17 | 17 |
| | 14 | 16 | 19 |
| | 16 | 20 | 18 |
| Average | 16 | 18 | 19 |
| St. Dev. | 2.16 | 1.83 | 2.16 |

- It looks like these observations could have come from treatments with the same means.

Let's consider another scenario:

Scenario 2

| | Brand1 | Brand2 | Brand3 |
|---|---|---|---|
| | 15.2 | 18.5 | 19.6 |
| | 16.1 | 17.5 | 19.3 |
| | 16.8 | 18.2 | 18.4 |
| | 15.9 | 17.8 | 18.7 |
| Average | 16 | 18 | 19 |
| St. Dev. | 0.66 | 0.44 | 0.55 |

- It's easy to see that the means in the second set differ.
  - It's hard to imagine that the means could be that far apart from natural sampling variability alone.

NOTE: the three mean travel distances are the same in both scenarios (16, 18, and 19, respectively). Then why do the figures look so different?

- To assess whether the sample means differ because their population means actually are different, we need to take variability into account.
- Two types of variability:
  1. variability within (inside) the groups
  2. variability between the groups (i.e., the differences between the means of the groups)
- In the first scenario, the differences among the means look as though they could have arisen just from natural sampling variability from groups with equal means, so there's not enough evidence to reject $H_0$.
- In the second scenario, the variation within each group is so small that the differences between the means stand out.
- This is the central idea of the F-test:

$$F = \frac{\text{variation between the groups}}{\text{variation within the groups}}$$

  - We compare the differences between the means of the groups with the variation within the groups.
  - When the differences between means are large compared with the variation within the groups, we reject the null hypothesis and conclude that the means are (probably) not equal.
- We have an estimate from the variation within groups.
  - Traditionally, it's called the Error Mean Square (or sometimes Within Mean Square) and written $MS_E$.
  - It's just the variance of the residuals.
  - Because it's a pooled variance, we write it $s_p^2$.
- We've got a separate estimate from the variation between the groups.
  - We call this quantity the Treatment Mean Square (or sometimes Between Mean Square), denoted by $MS_T$.
  - We expect it to estimate $\sigma^2$ if we assume the null hypothesis is true.

### The F-Statistic

- When the null hypothesis is true and the treatment means are equal, both $MS_E$ and $MS_T$ estimate $\sigma^2$, and their ratio should be close to 1.
- We can use their ratio, the F-statistic $F = MS_T / MS_E$, to test the null hypothesis.
- Just like Student's t, the F-models are a family of distributions. However, since we have 2 variance estimates, we have 2 degrees-of-freedom parameters:
  - $MS_T$ estimates the variance of the treatment means and has $df_1 = k - 1$ when there are k groups.
  - $MS_E$ is the pooled estimate of the variance within groups. If there are a total of N observations and there are k groups, $MS_E$ has $df_2 = N - k$.
- You'll often see the Mean Squares and other info put into a table called the ANOVA table.
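The two mean squares and their ratio can be checked directly on the gasoline data. The following is a minimal sketch, not a reference implementation; the function name `one_way_anova` and the dictionary keys are illustrative choices, not something taken from the notes.

```python
def one_way_anova(groups):
    """Compute one-way ANOVA quantities for a list of samples (one list per group)."""
    k = len(groups)                            # number of groups
    n_total = sum(len(g) for g in groups)      # total number of observations N
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group (treatment) sum of squares: each group's size times the
    # squared distance of its mean from the grand mean.
    ss_t = sum(len(g) * (m - grand_mean) ** 2
               for g, m in zip(groups, group_means))
    # Within-group (error) sum of squares: squared residuals inside each group.
    ss_e = sum((x - m) ** 2
               for g, m in zip(groups, group_means) for x in g)

    df1, df2 = k - 1, n_total - k
    ms_t, ms_e = ss_t / df1, ss_e / df2
    return {"df1": df1, "df2": df2, "SS_T": ss_t, "SS_E": ss_e,
            "MS_T": ms_t, "MS_E": ms_e, "F": ms_t / ms_e}


# Gasoline data from the two scenarios (one inner list per brand).
scenario1 = [[15, 19, 14, 16], [19, 17, 16, 20], [22, 17, 19, 18]]
scenario2 = [[15.2, 16.1, 16.8, 15.9],
             [18.5, 17.5, 18.2, 17.8],
             [19.6, 19.3, 18.4, 18.7]]

result1 = one_way_anova(scenario1)
result2 = one_way_anova(scenario2)
for name, res in (("Scenario 1", result1), ("Scenario 2", result2)):
    print(f"{name}: MS_T = {res['MS_T']:.4f}, "
          f"MS_E = {res['MS_E']:.4f}, F = {res['F']:.4f}")
```

Both scenarios share the same $MS_T$ because the group means are identical; only $MS_E$ changes, which moves F from about 2.2 in Scenario 1 to about 30.2 in Scenario 2.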
- For the soap example in the book, the ANOVA table is:

  [ANOVA table for the soap example]

- The ANOVA table was originally designed to organize the calculations.
- With advances in technology, we get all of this info, but we only need to look at the F-ratio and the p-value.
- Usually, you'll get the p-value for the F-statistic from technology. Any software program performing an ANOVA will automatically "look up" the appropriate one-sided p-value for the F-statistic.
- If you want to do it yourself, you'll need an F-table.

### ANOVA assumptions

1. We have k independent random samples, one from each of the k populations.
2. The data within each treatment group must be independent.
3. The i-th population has a normal distribution with unknown mean $\mu_i$, where i = 1, …, k. The means may be different. (Check with a histogram or a Normal probability plot of all the residuals together; check for outliers in each treatment group.)
4. All the populations have the same standard deviation σ, whose value is unknown.

The hypotheses in ANOVA are:

$$H_0: \mu_1 = \mu_2 = \dots = \mu_k$$

$$H_a: \text{the means are not all equal}$$

The F-statistic is:

$$F = \frac{MS_T}{MS_E} = \frac{SS_T / (k - 1)}{SS_E / (N - k)}$$

### The F distributions

1. are a family of distributions with two parameters: the degrees of freedom in the numerator and in the denominator;
2. change when the two degrees of freedom are interchanged;
3. are right-skewed;
4. have no probability below 0;
5. have a density curve whose peak is near 1.

### P-value

p-value = $P(F > F_0)$ with $df_1 = k - 1$ and $df_2 = N - k$ (i.e., the p-value for $F_0$ is the probability of obtaining a sample with an F-statistic greater than the observed one if $H_0$ is true).

NOTE: the alternative in the ANOVA is always the same. It is not one-sided or two-sided, but multi-sided. P-values are always and only calculated as the probability in the right tail of an F-distribution.
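The right-tail probability described above can be approximated without an F-table. The sketch below uses only the Python standard library: it integrates the F density with Simpson's rule and subtracts the result from 1. The helper names `f_density` and `f_right_tail` are illustrative, and the sketch assumes $df_1 \ge 2$ so that the density is finite at 0. The two F values plugged in are the ones reported for the gasoline data (Scenario 1) and for the recording-tape example in these notes.

```python
import math


def f_density(x, df1, df2):
    """Density of the F distribution with (df1, df2) degrees of freedom, x >= 0."""
    log_const = (math.lgamma((df1 + df2) / 2)
                 - math.lgamma(df1 / 2)
                 - math.lgamma(df2 / 2)
                 + (df1 / 2) * math.log(df1 / df2))
    return (math.exp(log_const)
            * x ** (df1 / 2 - 1)
            * (1 + df1 * x / df2) ** (-(df1 + df2) / 2))


def f_right_tail(f0, df1, df2, steps=20000):
    """Approximate P(F > f0) as 1 minus a Simpson's-rule integral over [0, f0]."""
    if df1 < 2:
        # Assumption of this sketch: for df1 < 2 the density is unbounded at 0.
        raise ValueError("this sketch assumes df1 >= 2")
    if steps % 2:
        steps += 1                      # Simpson's rule needs an even step count
    h = f0 / steps
    total = f_density(0.0, df1, df2) + f_density(f0, df1, df2)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2      # alternating Simpson weights
        total += weight * f_density(i * h, df1, df2)
    return 1 - total * h / 3


# Gasoline Scenario 1: k = 3 groups, N = 12 observations.
p_gas = f_right_tail(2.210526, df1=2, df2=9)
# Recording-tape example: k = 4 groups, N = 22 observations.
p_tape = f_right_tail(4.340426, df1=3, df2=18)
print(f"gasoline p-value: {p_gas:.4f}")
print(f"tape p-value:     {p_tape:.4f}")
```

In practice a statistics package would do this lookup; the point here is only that the p-value is the area to the right of the observed F under the density with the matching pair of degrees of freedom.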
### ANOVA Table for k Independent Random Samples

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Treatments | k − 1 | $SS_T$ | $MS_T = SS_T/(k-1)$ | $MS_T/MS_E$ |
| Error | N − k | $SS_E$ | $MS_E = SS_E/(N-k)$ | |
| Total | N − 1 | Total SS | | |

Example (Continued): Using the data from Scenario 1, we obtain the following ANOVA summary:

| Source of Variation | df | SS | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 2 | 18.66667 | 9.333333 | 2.210526 | 0.165597 | 4.256492 |
| Within Groups | 9 | 38 | 4.222222 | | | |
| Total | 11 | 56.66667 | | | | |

Example: In an effort to improve the quality of recording tapes, the effects of four kinds of coatings A, B, C, D on the reproducing quality of sound are compared. The following distortion values are obtained:

| | A | B | C | D |
|---|---|---|---|---|
| | 10 | 14 | 17 | 12 |
| | 15 | 18 | 16 | 15 |
| | 8 | 21 | 14 | 17 |
| | 12 | 15 | 15 | 16 |
| | 15 | | 17 | 15 |
| | | | 15 | 15 |
| | | | 18 | |
| Average | 12 | 17 | 16 | 15 |
| Variance | 9.5 | 10 | 2 | 2.8 |

The following ANOVA output is obtained:

| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 68 | 3 | 22.66667 | 4.340426 | 0.018136 | 3.159911 |
| Within Groups | 94 | 18 | 5.222222 | | | |
| Total | 162 | 21 | | | | |

Using this sample, we want to decide whether the four coatings result in different mean distortions at α = 0.05. Since the p-value 0.018 is below 0.05 (equivalently, F = 4.34 exceeds F crit = 3.16), we reject $H_0$ and conclude that the mean distortions are not all equal.

## 26 Comparing Counts

This chapter will cover 2 types of tests:

1. Tests of hypotheses for experiments with more than 2 categories, called goodness-of-fit tests.
2. Tests of hypotheses about contingency tables, called independence and homogeneity tests.

### Goodness-of-Fit Test or Univariate χ² Test

This section introduces the test that compares the sample frequency distribution of a categorical variable with more than 2 categories against a given model probability distribution.

Example: A company filling grass seed bags wants to evaluate its filling machine. The following distribution is advertised on the bags:

| Kinds of seeds | Proportion (expected frequency) |
|---|---|
| K1 | 0.5 |
| K2 | 0.25 |
| K3 | 0.15 |
| K4 | 0.05 |
| K5 | 0.05 |

The company wants to check whether the seed distribution in the bags fits the advertised distribution.
They take a sample of size 1000 and find the following summarized data:

| Kinds of seeds | Count |
|---|---|
| K1 | 480 |
| K2 | 233 |
| K3 | 160 |
| K4 | 63 |
| K5 | 64 |

### The χ² Goodness-of-Fit Test

Given a distribut
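The notes break off before the test itself is stated, so the following is only a sketch under stated assumptions: it uses the standard Pearson statistic, the sum over categories of (observed − expected)² / expected, with degrees of freedom equal to the number of categories minus 1. Neither formula appears in the extract above. The helper `chi2_right_tail_even_df` is a hypothetical name and relies on a closed form for the right-tail probability that holds only when the degrees of freedom are even.

```python
import math

# Advertised proportions and observed counts from the grass seed example.
proportions = {"K1": 0.50, "K2": 0.25, "K3": 0.15, "K4": 0.05, "K5": 0.05}
observed = {"K1": 480, "K2": 233, "K3": 160, "K4": 63, "K5": 64}

n = sum(observed.values())                        # sample size
# Expected counts if the advertised distribution is correct.
expected = {kind: n * p for kind, p in proportions.items()}

# Assumed statistic (standard Pearson form, not shown in the notes).
chi2 = sum((observed[kind] - expected[kind]) ** 2 / expected[kind]
           for kind in observed)
df = len(observed) - 1                            # assumed: categories - 1


def chi2_right_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); closed form for even df only."""
    if df % 2:
        raise ValueError("closed form used here requires even df")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j)
                                 for j in range(df // 2))


p_value = chi2_right_tail_even_df(chi2, df)
for kind in observed:
    print(f"{kind}: observed {observed[kind]}, expected {expected[kind]:.0f}")
print(f"chi-square = {chi2:.4f}, df = {df}, p-value = {p_value:.4f}")
```

Under these assumptions the statistic is about 9.92 on 4 degrees of freedom, with a right-tail probability of about 0.042; the largest contributions come from K4 and K5, where the observed counts exceed the 50 expected.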