Lecture

18.pdf


Department
Statistics
Course
STAT151
Professor
Susan Kamp
Semester
Fall

Description
18 Sampling Distribution Models

Definition:
1) A population parameter is a numerical measure, such as the mean, median, mode, range, variance, or standard deviation, calculated for population data; it is written with Greek letters, e.g. µ and σ.
- Usually unknown and constant
2) A sample statistic is a summary measure calculated for a sample data set; it is written with Latin letters, e.g. $\bar{y}$ and $s$.
- Regarded as random before the sample is selected
- Observed after the sample is selected
3) The value of a statistic varies from sample to sample; this is called sampling variability.
4) The distribution of all the values of a statistic is called its sampling distribution.

Population and Sample Proportions

Suppose we are interested in just one characteristic in the population of interest. For convenience, we call the outcome we are looking for a "Success" (S). The population proportion, p, is the ratio of the number of successes in the population to the total number of elements in the population.

Example for population proportion:
- Check how many of N students are "nonresidents".
- Check how many of N patients survived at least five years after a specific cancer treatment.

Recall: N is the population size.

Looking at an SRS of size n from a large population, the proportion p can be estimated by calculating the sample proportion (relative frequency) of Successes, $\hat{p}$. That is,

$$\hat{p} = \frac{\text{number of Successes (S) in the sample}}{\text{sample size}}$$

Example for sample proportion:
- Flip n coins and observe whether "Tail" was tossed.
- Look at n random persons and survey how many have an IQ above 120.
- Look at n random students and survey how many have more than two siblings.

Example: Suppose a hospital has a total of 10,000 patients, and 7,000 of them like to play basketball. A sample of 200 patients is selected from this hospital, and 128 of them like to play basketball.
Find the proportion of patients who like to play basketball in the population and in the sample. Find the sampling error for this case, assuming that the sample is random and no nonsampling error has been made.

Sampling error =

The Sampling Distribution of a Sample Proportion ($\hat{p}$)

- Consider two different samples from a population, which you want to use for estimating the proportion of people with more than 2 siblings in the population. Use the statistic "proportion" for both samples. Are the outcomes the same?
- Most likely not! This is known as sampling variability.
- Imagine we draw many samples and look at the sample proportions for these samples.
- The histogram we'd get if we could see all the proportions from all possible samples is called the sampling distribution of the proportions.
- What would the histogram of all the sample proportions look like?
- We would expect the histogram of the sample proportions to center at the true proportion, p, in the population.
- RULE 1: The mean of the sampling distribution of $\hat{p}$, $\mu_{\hat{p}}$, equals p:

$$\mu_{\hat{p}} = p$$

Notation: $\mu_{\hat{p}}$ is also denoted as $\mu(\hat{p})$.
- RULE 2: The standard deviation of the sampling distribution of $\hat{p}$, $\sigma_{\hat{p}}$, is:

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$

Notation: $\sigma_{\hat{p}}$ is also denoted as $SD(\hat{p})$.
NOTE: The standard deviation of a sampling distribution is called a standard error. Thus, $SD(\hat{p})$ is called a standard error.
- As far as the shape of the histogram goes, we can simulate a bunch of random samples that we didn't really draw.
- It turns out that the histogram is unimodal, symmetric, and centered at p.
- More specifically, the sample proportion $\hat{p}$ is approximately normally distributed for large n. (Central Limit Theorem)

RULE 3: A rule of thumb states that the sample size is considered to be sufficiently large if:
np > 10 and n(1 – p) > 10
NOTE: When n is large and p is not too close to 0 or 1, the sampling distribution of $\hat{p}$ is approximately normal.
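The three rules can be checked with a small simulation. The following is only a sketch: it reuses p = 0.7 and n = 200 from the hospital example, while the number of simulated samples (5000), the seed, and the function names are illustrative assumptions. Each patient is drawn independently with success probability p, which treats the population as large relative to the sample.

```python
import math
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Simulate num_samples samples of size n; return each sample's proportion of Successes."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Each draw is a Success with probability p, independently of the others.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


def standard_error(p, n):
    """RULE 2: standard deviation of the sampling distribution of p-hat."""
    return math.sqrt(p * (1 - p) / n)


def large_enough(p, n):
    """RULE 3: rule-of-thumb check that n is large enough for the normal approximation."""
    return n * p > 10 and n * (1 - p) > 10


p, n = 0.7, 200  # values taken from the hospital example
p_hats = simulate_sample_proportions(p, n, num_samples=5000)

print("mean of simulated p-hats:", round(statistics.mean(p_hats), 4))   # RULE 1: close to p
print("SD of simulated p-hats:  ", round(statistics.pstdev(p_hats), 4))
print("RULE 2 standard error:   ", round(standard_error(p, n), 4))
print("RULE 3 satisfied:        ", large_enough(p, n))
```

With these settings the simulated mean should land close to p and the simulated standard deviation close to the RULE 2 value. A histogram of `p_hats` would show the unimodal, roughly symmetric shape described above.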
The further p is from 0.5, the larger n must be for the normal approximation of $\hat{p}$ to be accurate.
- A sampling distribution model for how a sample proportion varies from sample to sample allows us to quantify that variation and how likely it is that we'd observe a sample proportion in any particular interval.
- So, the distribution of the sample proportions is modeled with the probability model

$$N\left(p, \sqrt{\frac{pq}{n}}\right), \quad \text{where } q = 1 - p$$

- Because we have a Normal model, we know, for example, that 95% of Normally distributed values are within two standard deviations of the mean. So we should not be surprised if 95% of various polls gave results that were near the mean but varied above and below it by no more than two standard deviations.

Example: Which Brand of Pizza Do You Prefer?
• Two Choices: A or D.
• Assume that half of the population prefers Brand A and half prefers Brand D.
• Take a random sample of n = 3 tasters. Find the sampling distribution for the sample proportion. Find the mean and standard deviation.

Sample     No. Prefer Pizza A   Proportion
(A,A,A)    3                    1
(A,A,D)    2                    2/3
(A,D,A)    2                    2/3
(D,A,A)    2                    2/3
(A,D,D)    1                    1/3
(D,A,D)    1                    1/3
(D,D,A)    1                    1/3
(D,D,D)    0                    0

Sample Proportion   Probability
0                   1/8
1/3                 3/8
2/3                 3/8
1                   1/8

Assumptions and Conditions

- Most models are useful only when specific assumptions are true.
- There are two assumptions in the case of the model for the distribution of sample proportions:
1. The Independence Assumption: The sampled values must be independent of each other.
2. The Sample Size Assumption: The sample size, n, must be large enough.
- Assumptions are hard, often impossible, to check. That's why we assume them.
- Still, we need to check whether the assumptions are reasonable by checking conditions that provide information about the assumptions.
- The corresponding conditions to check before using the Normal model for the distribution of sample proportions:
• Randomization Condition: The sample should be a simple random sample of the population.
• 10% Condition: If sampling has not been made with replacement, then the sample size, n, must be no larger than 10% of the population.
• Success/Failure Condition: The sample size has to be big enough so that both np and nq are at least 10.

With these sampling distributions, we can apply standardization to find probabilities for the proportion of Successes. The z value for a value of $\hat{p}$ is:

$$z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$$

whose distribution is well approximated by the standard normal distribution.

Example: A study showed that the proportion of people in the 20 to 34 age group with an IQ (on the Wechsler Intelligence Scale) of over 120 is about 0.35. Let $\hat{p}$ = proportion of the sample with an IQ of at least 120.
a) Find the mean and standard deviation of the sample proportion.
b) What can you say about the distribution of the sample proportion?
c) Find the probability that in a sample of 50 there are more than 30 people with an IQ of at least 120.

Example: In an experiment, 32 subjects made a total of 60,000 guesses on a set of 5 symbol cards. Pure chance would give around 12,000 correct guesses, but the subjects had a total of 12,489 correct guesses.
a) Find the mean and standard deviation of the sample proportion of correct guesses.
b) What can you say about the distribution of the sample proportion of correct guesses?
c) Could this excess of 489 good guesses just be good luck? In other words, calculate the probability that among the total guesses there are more than 12,489 correct guesses.

Example: (Please try it on your own) Suppose that the true proportion of people who have failed a professional exam is 0.87. A sample consisting of 158 people is randomly drawn.
a) Find the mean and standard deviation of the sample proportion of people who failed the professional exam.
b) What can you say about the distribution of the sample proportion?
c) Find the probability that the sample proportion of people who failed the professional exam exceeds 0.94.

What about Quantitative Data?

- Proportions summarize categorical variables.
- The Normal sampling distribution model looks like it will be very useful.
- Can we do something similar with quantitative data?
- We can indeed. Even more remarkable, not only can we use all of the same concepts, but almost the same model.

Example: The population is a class of 5 Stat 151 students. Let µ be the population mean of their weight. Select a random sample of size 3 and observe the average weight $\bar{x}$. We must be careful to distinguish this number $\bar{x}$ from µ. How can $\bar{x}$, based on a sample of a small percentage of the class, be an accurate estimate of µ? After all, a second sample would give a different value of $\bar{x}$. This basic fact is called sampling variability (the value of a statistic varies in repeated random sampling).

Now suppose you look at every possible random sample from this Stat 151 population and the corresponding sample mean. From these numbers, you can create the sampling distribution.

Population Data

Student   Weight
A         70
B         75
C         75
D         75
E         80

All possible samples of size 3

Possible Samples   Weight in the sample   $\bar{x}$
ABC                70, 75, 75             73.3333
ABD                70, 75, 75             73.3333
ABE                70, 75, 80             75
ACD                70, 75, 75             73.3333
ACE                70, 75, 80             75
ADE                70, 75, 80             75
BCD                75, 75, 75             75
BCE                75, 75, 80             76.6667
BDE                75, 75, 80             76.6667
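The enumeration of samples for the five-student class can be reproduced programmatically. The sketch below takes the weights from the Population Data table; the variable names are illustrative assumptions. It lists every combination of three students (there are 10), computes each sample mean exactly with fractions, and tabulates the resulting sampling distribution of $\bar{x}$.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Weights from the Population Data table.
weights = {"A": 70, "B": 75, "C": 75, "D": 75, "E": 80}

# Population mean, kept as an exact fraction.
mu = Fraction(sum(weights.values()), len(weights))

# Sample mean for every possible sample of size 3 (order does not matter).
sample_means = {}
for students in combinations(sorted(weights), 3):
    total = sum(weights[s] for s in students)
    sample_means["".join(students)] = Fraction(total, 3)

# Sampling distribution: each sample is equally likely under simple random sampling.
distribution = Counter(sample_means.values())
num_samples = len(sample_means)
for mean, count in sorted(distribution.items()):
    print(f"x-bar = {float(mean):.4f}  probability = {Fraction(count, num_samples)}")

# The mean of the sampling distribution equals the population mean.
mean_of_means = sum(sample_means.values()) / num_samples
print("population mean:", mu, " mean of sample means:", mean_of_means)
```

Individual sample means differ from µ, but their average over all possible samples equals µ exactly, which is the quantitative-data counterpart of RULE 1 for proportions.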