# BSB123 - Data Analysis Final Exam Notes (Summary of Lectures 7 - 12)

Queensland University of Technology

Management and Human Resources

BSB123

All

Spring

Data Analysis Final Exam Notes
Lecture 7: Sampling Distributions
Statistical inference: When a sample is selected to draw conclusions about a population
Sampling distribution: Distribution of all the possible values a sample statistic may take, spread around the
population parameter of interest
Sampling error: Different samples of the same size from the same population will yield different sample
means
Standard error of the mean: $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
As n increases, the standard error decreases (the sample mean becomes a more precise estimate of the population mean)
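To see how the standard error shrinks as n grows, here is a minimal Python sketch of the formula above. The function name `standard_error` and the value sigma = 10 are illustrative assumptions, not part of the lecture.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


# sigma = 10 is an assumed value chosen only for illustration.
errors = {n: standard_error(10, n) for n in (4, 25, 100)}
for n, se in errors.items():
    print(f"n = {n:>3}: standard error = {se}")
```

Quadrupling the sample size halves the standard error, because n sits under a square root.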
If the Population is Normal
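If the population is normal with mean $\mu$ and standard deviation $\sigma$,
the sampling distribution of $\bar{X}$ is also normal, with $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
Z Value for Sampling Distribution of the Mean
$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$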
Example: A soft drink manufacturer sells one of its popular flavours in a 600mL bottle. Fill of the soft
drink is normally distributed with a mean fill of 600mL and a standard deviation of fill of 10mL.
How likely is it that a sample of 25 bottles would have a mean fill of 598mL or less?
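$\mu_{\bar{X}} = \mu = 600$
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{10}{\sqrt{25}} = 2$
$P(\bar{X} \le 598) = P\left(Z \le \dfrac{598 - 600}{2}\right) = P(Z \le -1) = 0.1587$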
There is a 15.87% chance a sample of 25 bottles would produce a sample mean fill of 598mL or less.
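The soft drink calculation can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library (3.8+) in place of a printed Z table; the variable names are my own.

```python
import math
from statistics import NormalDist

mu, sigma, n = 600, 10, 25      # population mean, population sd, sample size
se = sigma / math.sqrt(n)       # standard error of the mean
z = (598 - mu) / se             # Z value for a sample mean of 598
prob = NormalDist().cdf(z)      # P(Z <= z) under the standard normal

print(f"z = {z}, P(sample mean <= 598) = {prob:.4f}")
```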
Christina Meyers BSB 123 Data Analysis
If the Population is not Normal
Sample size $n \ge 30$: Central Limit Theorem applies
Central Limit Theorem: Regardless of the shape of the distribution of individual values in the population, as
long as the sample size is large enough the sampling distribution of $\bar{X}$ will be approximately
normally distributed with $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
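Example: For a non-normal population with $\mu = 8$ and $\sigma = 3$, what is the probability that the sample
mean is between 7.8 and 8.2 if a sample of size n = 36 is selected?
The central limit theorem can be used since $n \ge 30$
The sampling distribution of $\bar{X}$ is approximately normal with a mean, $\mu_{\bar{X}}$, of 8 and standard error,
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{3}{\sqrt{36}} = 0.5$
$P(7.8 \le \bar{X} \le 8.2) = P\left(\dfrac{7.8 - 8}{3/\sqrt{36}} \le Z \le \dfrac{8.2 - 8}{3/\sqrt{36}}\right)$
$= P(-0.4 \le Z \le 0.4)$
$= 0.3108$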
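A sketch comparing the CLT answer with a simulation. The notes do not say which non-normal population is involved, so the gamma distribution below (chosen to have mean 8 and standard deviation 3) is purely an assumption for illustration, as are the seed and the number of simulated samples.

```python
import math
import random
from statistics import NormalDist, fmean

mu, sigma, n = 8, 3, 36
se = sigma / math.sqrt(n)                            # 3 / 6 = 0.5

# CLT approximation: the sample mean is roughly Normal(mu, se).
sampling_dist = NormalDist(mu, se)
clt_prob = sampling_dist.cdf(8.2) - sampling_dist.cdf(7.8)

# Simulation from an ASSUMED skewed population with the same mean and sd.
shape, scale = (mu / sigma) ** 2, sigma ** 2 / mu    # gamma: mean 8, sd 3
rng = random.Random(123)                             # arbitrary seed
trials = 10_000
hits = 0
for _ in range(trials):
    sample_mean = fmean(rng.gammavariate(shape, scale) for _ in range(n))
    if 7.8 <= sample_mean <= 8.2:
        hits += 1
sim_prob = hits / trials

print(f"CLT: {clt_prob:.4f}, simulated: {sim_prob:.4f}")
```

The simulated proportion should land close to the CLT figure even though the individual values are skewed, which is the point of the theorem.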
Sampling Distribution of the Proportion – Z Distribution
$\pi$ is the population proportion
$p$ is the sample proportion
$p = \dfrac{X}{n} = \dfrac{\text{no. of items in the sample having the characteristic of interest}}{\text{sample size}}$
The sampling distribution of $p$ is binomial
It can be approximated by the normal distribution if $n\pi \ge 5$ and $n(1 - \pi) \ge 5$
With resulting mean equal to: $\mu_p = \pi$
Standard error: $\sigma_p = \sqrt{\dfrac{\pi(1 - \pi)}{n}}$
Z Value for Sampling Distribution of Proportion
$Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$
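Example: If the true proportion of voters who support the proposition is $\pi = 0.4$, what is the
probability that a sample of size 200 yields a sample proportion between 0.40 and 0.45?
$\mu_p = \pi = 0.4$
$\sigma_p = \sqrt{\dfrac{\pi(1 - \pi)}{n}} = \sqrt{\dfrac{0.4(1 - 0.4)}{200}} = 0.03464$
$P(0.40 \le p \le 0.45) = P\left(\dfrac{0.40 - 0.40}{0.03464} \le Z \le \dfrac{0.45 - 0.40}{0.03464}\right)$
$= P(0 \le Z \le 1.44)$
$= 0.4251$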
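The voter example can be reproduced as a sketch with `statistics.NormalDist`. Rounding Z to two decimals imitates a printed table lookup; the unrounded Z gives a slightly larger probability. Variable names are my own.

```python
import math
from statistics import NormalDist

pi_true, n = 0.4, 200
se = math.sqrt(pi_true * (1 - pi_true) / n)     # standard error of the proportion
z_low = (0.40 - pi_true) / se
z_high = (0.45 - pi_true) / se

std_normal = NormalDist()
prob_exact = std_normal.cdf(z_high) - std_normal.cdf(z_low)
# Rounding Z to two decimals mimics looking the value up in a printed table.
prob_table = std_normal.cdf(round(z_high, 2)) - std_normal.cdf(round(z_low, 2))

print(f"se = {se:.5f}, z_high = {z_high:.4f}")
print(f"table-style: {prob_table:.4f}, unrounded: {prob_exact:.4f}")
```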
Lecture 8: Estimation
Point estimate: value of a single sample statistic (“best number guess”) used to estimate an
unknown population parameter
Confidence interval: range of values around point estimate
Interpretation, e.g. 95% CI – If we construct 100 confidence intervals, we expect about 95 of them to contain the true
population parameter (a long-run frequency interpretation)
Point estimate for the mean $\mu$: $\bar{x}$
General Formula: Point Estimate ± (Critical Value)*(Standard Error)
Confidence Level: $1 - \alpha$
$\dfrac{\alpha}{2}$ in each tail
Confidence interval
For $\mu$ ($\sigma$ Known) – Z Distribution: $\bar{x} \pm Z \dfrac{\sigma}{\sqrt{n}}$
Where:
$\bar{x}$ is the point estimate
$Z$ is the normal distribution critical value for a probability of $\alpha/2$ in each tail
$\dfrac{\sigma}{\sqrt{n}}$ is the standard error
Common Levels of Confidence
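The table that belonged under this heading did not survive extraction. As a sketch, two-sided critical Z values can be regenerated from the standard normal distribution. The choice of 90%, 95% and 99% as the levels to show is my assumption, not taken from the notes, and `z_critical` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided critical value: leaves alpha/2 in each tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


# Assumed set of confidence levels (the original table is missing).
critical_values = {pct: z_critical(pct / 100) for pct in (90, 95, 99)}
for pct, z in critical_values.items():
    print(f"{pct}% confidence: Z = {z:.3f}")
```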
Example: A sample of 11 circuits from a large normal population has a mean resistance of 2.20
ohms. We know from past testing that the population standard deviation is 0.35 ohms.
Determine a 95% confidence interval for the true mean resistance of the population.
$\bar{x} \pm Z \dfrac{\sigma}{\sqrt{n}}$
$= 2.20 \pm 1.96 \dfrac{0.35}{\sqrt{11}}$
$= 2.20 \pm 0.2068$
$1.9932 \le \mu \le 2.4068$
We are 95% confident that the true mean resistance is between 1.9932 and 2.4068. Although the
true mean may or may not be in this interval, 95% of intervals formed in this manner (in repeated
samples) will contain the true mean.
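A minimal sketch of the sigma-known interval, applied to the circuit example. The helper name `z_interval` and its parameter names are illustrative.

```python
import math


def z_interval(x_bar, sigma, n, z):
    """Confidence interval for the mean when sigma is known."""
    margin = z * sigma / math.sqrt(n)   # critical value * standard error
    return x_bar - margin, x_bar + margin


lower, upper = z_interval(x_bar=2.20, sigma=0.35, n=11, z=1.96)
print(f"{lower:.4f} <= mu <= {upper:.4f}")
```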
Determining Sample Size for the Mean
Sample size: $n = \dfrac{Z^2 \sigma^2}{e^2}$
Example:
If $\sigma$ = 45, what sample size is needed to estimate the mean within ±5 with 90% confidence?
$n = \dfrac{Z^2 \sigma^2}{e^2} = \dfrac{(1.645)^2 (45)^2}{5^2} = 219.19$
The required sample size is n = 220 (ALWAYS ROUND UP)
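The round-up rule is easy to get wrong by hand, so here is a small sketch using `math.ceil`. The function name `sample_size_for_mean` is hypothetical.

```python
import math


def sample_size_for_mean(z, sigma, e):
    """Smallest whole n for which z * sigma / sqrt(n) is within e."""
    return math.ceil((z * sigma / e) ** 2)   # always round up


n_required = sample_size_for_mean(z=1.645, sigma=45, e=5)
print(n_required)
```

Rounding down to 219 would leave the margin of error slightly above 5, which is why the result is always rounded up.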
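For $\mu$ ($\sigma$ Unknown) – t Distribution: $\bar{x} \pm t_{n-1} \dfrac{S}{\sqrt{n}}$
Where t has df = n - 1
Example: A random sample of n = 25 has $\bar{x}$ = 50 & S = 8. Form a 95% confidence interval for $\mu$:
df = n – 1 = 25 – 1 = 24
$t_{\alpha/2,\, n-1} = t_{0.025,\, 24} = 2.0639$
The confidence interval is
$\bar{x} \pm t_{\alpha/2,\, n-1} \dfrac{S}{\sqrt{n}} = 50 \pm (2.0639) \dfrac{8}{\sqrt{25}} = 50 \pm 3.3022$
$46.6978 \le \mu \le 53.3022$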
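A sketch of the t interval for this example. The Python standard library has no Student t distribution, so the tabulated critical value 2.0639 from the notes is passed in as a plain number. `t_interval` is a hypothetical helper name.

```python
import math


def t_interval(x_bar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit must be the tabulated t value with n - 1 degrees of freedom.
    """
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


lower, upper = t_interval(x_bar=50, s=8, n=25, t_crit=2.0639)
print(f"{lower:.4f} <= mu <= {upper:.4f}")
```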
Sample size: $n = \dfrac{Z^2 \sigma^2}{e^2}$
Proportion point estimate for $\pi$: $p \pm Z \sqrt{\dfrac{p(1 - p)}{n}}$
Where:
Z is the standard normal value for the level of confidence required
p is the sample proportion
n is the sample size
Example: A random sample of 100 people shows that 25 are left handed. Form a 95% confidence
interval for the true proportion of left-handers.
$p \pm Z \sqrt{\dfrac{p(1 - p)}{n}}$
$= \dfrac{25}{100} \pm 1.96 \sqrt{\dfrac{0.25(0.75)}{100}}$
$= 0.25 \pm 1.96(0.0433)$
$0.1651 \le \pi \le 0.3349$
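The left-hander interval as a short sketch. The function and parameter names are my own.

```python
import math


def proportion_interval(successes, n, z):
    """Confidence interval for a population proportion."""
    p = successes / n                            # sample proportion
    margin = z * math.sqrt(p * (1 - p) / n)      # critical value * standard error
    return p - margin, p + margin


lower, upper = proportion_interval(successes=25, n=100, z=1.96)
print(f"{lower:.4f} <= pi <= {upper:.4f}")
```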
Estimating Sample Size for $\pi$
Sample size: $n = \dfrac{Z^2 \pi (1 - \pi)}{e^2}$
NB: If $\pi$ is completely unknown, use 0.5 as the default
How large a sample would be necessary to estimate the true proportion defective in a large
population within ±3% with 95% confidence? (Assume a pilot sample yields p = 0.12)
$n = \dfrac{Z^2 \pi (1 - \pi)}{e^2} = \dfrac{(1.96)^2 (0.12)(1 - 0.12)}{(0.03)^2} = 450.75$
Therefore required sample size is n = 451.
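A sketch of the sample size formula for a proportion, with 0.5 as the fallback when no pilot estimate exists. The function name and the default argument are illustrative.

```python
import math


def sample_size_for_proportion(z, e, pi=0.5):
    """Required n for a proportion; pi defaults to 0.5 when unknown."""
    return math.ceil(z ** 2 * pi * (1 - pi) / e ** 2)   # always round up


n_pilot = sample_size_for_proportion(z=1.96, e=0.03, pi=0.12)
n_default = sample_size_for_proportion(z=1.96, e=0.03)
print(n_pilot, n_default)
```

Using 0.5 gives the largest possible value of pi(1 - pi), so the default is the most conservative sample size.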
Lecture 9: Hypothesis Testing – One Sample Tests
Hypothesis: Statement about a population parameter. It uses the sample statistic as evidence to
back up the claim about the population parameter. It always conta
Steps in Hypothesis Testing
1. State the null and alternative hypothesis
H0: Null hypothesis – status quo
H1: Alternative hypothesis – hypothesis of change
2. Choose the level of significance, $\alpha$, and sample size, n
Level of significance $\alpha$: total area of the rejection region
3. Determine the appropriate test statistic and sampling distribution
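Means, $\sigma$ Known – Z Distribution
Test statistic:
$Z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Means, $\sigma$ Unknown – t Distribution
Test statistic:
$t = \dfrac{\bar{x} - \mu}{S / \sqrt{n}}$
Proportions – Z Distribution
$Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$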
4. Determine the critical values that divide the rejection and non-rejection regions
Decision diagram technique
One tail test – Upper tail – Direction implied (‘increase’ or ‘decrease’) – 1 reject region
$H_0: \mu \le x$
$H_1: \mu > x$
Is an upper-tail test if $H_1$ is focused on above the mean
One tail test – Lower tail – Direction implied (‘increase’ or ‘decrease’) – 1 reject region
$H_0: \mu \ge x$
$H_1: \mu < x$
Is a lower-tail test if $H_1$ is focused on below the mean
Two tail test – No direction implied – 2 reject regions
$H_0: \mu = x$
$H_1: \mu \ne x$
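Steps 3 and 4 can be sketched together for the sigma-known Z test. The function names, the `tail` labels and the hypothesised mean are illustrative assumptions. The sample figures reuse the soft drink numbers from Lecture 7, framed here as a hypothetical lower-tail test at a significance level of 0.05.

```python
import math
from statistics import NormalDist


def z_statistic(x_bar, mu_0, sigma, n):
    """Z test statistic for a mean when sigma is known."""
    return (x_bar - mu_0) / (sigma / math.sqrt(n))


def reject_h0(z, alpha, tail):
    """Compare z with the critical value(s); tail is 'upper', 'lower' or 'two'."""
    std_normal = NormalDist()
    if tail == "upper":
        return z > std_normal.inv_cdf(1 - alpha)
    if tail == "lower":
        return z < std_normal.inv_cdf(alpha)
    if tail == "two":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'upper', 'lower' or 'two'")


# Hypothetical test: H0: mu >= 600 against H1: mu < 600, alpha = 0.05.
z = z_statistic(x_bar=598, mu_0=600, sigma=10, n=25)
decision = reject_h0(z, alpha=0.05, tail="lower")
print(f"z = {z}, reject H0: {decision}")
```

A one-tail test puts all of alpha in a single rejection region, while the two-tail test splits it into alpha/2 per tail, which is why the two-tail critical value is further from zero.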
p-Value technique
p-Value: Probability of obtaining a test statistic more extreme than the observed value given $H_0$ is true
If p-Value
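A sketch of the p-value calculation for a Z test statistic. The notes break off before stating the decision rule, so the comparison used below (reject $H_0$ when the p-value is less than alpha) is the usual textbook convention and an assumption on my part. The function name, `tail` labels and the value z = -1 are also illustrative.

```python
from statistics import NormalDist


def p_value(z, tail):
    """p-value of a Z statistic; tail is 'upper', 'lower' or 'two'."""
    std_normal = NormalDist()
    if tail == "upper":
        return 1 - std_normal.cdf(z)
    if tail == "lower":
        return std_normal.cdf(z)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'upper', 'lower' or 'two'")


alpha = 0.05
p_lower = p_value(-1.0, "lower")
p_two = p_value(-1.0, "two")

# ASSUMED decision rule (not stated in the surviving notes):
# reject H0 when the p-value is less than alpha.
reject = p_lower < alpha
print(f"lower-tail p = {p_lower:.4f}, two-tail p = {p_two:.4f}, reject H0: {reject}")
```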
