# f2010.pdf

University of Toronto Scarborough

Statistics

STAB22H3

Ken Butler

Fall

University of Toronto Scarborough
STAB22 Final Examination
December 2010
For this examination, you are allowed two handwritten letter-sized sheets of notes
(both sides) prepared by you, a non-programmable, non-communicating calculator,
and writing implements.
This question paper has 31 numbered pages, with statistical tables at the back.
Before you start, check to see that you have all the pages. You should also have a
Scantron sheet on which to enter your answers. If any of this is missing, speak to
an invigilator.
This examination is multiple choice. Each question has equal weight, and there is
no penalty for guessing. To ensure that you receive credit for your work on the
exam, fill in the bubbles on the Scantron sheet for your correct student number
(under “Identification”), your last name, and as much of your first name as fits.
Mark in each case the best answer out of the alternatives given (which means the
numerically closest answer if the answer is a number and the answer you obtained
is not given.)
If you need paper for rough work, use the back of the sheets of this question paper.
Before you begin, two more things:
• Check that the colour printed on your Scantron sheet matches the colour of
your question paper. If it does not, get a new Scantron from an invigilator.
• Complete the signature sheet, but sign it only when the invigilator collects it.
The signature sheet shows that you were present at the exam.
At the end of the exam, you must hand in your Scantron sheet (or you will receive a
mark of zero for the examination). You will be graded only on what appears on the
Scantron sheet. You may take away the question paper after the exam, but whether
you do or not, anything written on the question paper will not be considered in your
grade.
1. A study was carried out to determine whether a new diet is effective in reducing
cholesterol levels. Twenty subjects were recruited. For each subject, the cholesterol
level was measured initially. Each subject was placed on the new diet for one month,
and then that subject’s cholesterol level was measured again. Which of the following
methods of analysis is most appropriate?
(a) * One-sided matched pairs t test.
(b) Two-sided two-sample z-test
(c) Two-sided two-sample t test.
(d) One-sided two-sample t test.
(e) Two-sided matched pair t test.
2. A medical treatment has a success probability of 0.7. Three patients will be treated
with this treatment. Assuming the results are independent for the three patients, what
is the probability that at least one of them will be successfully cured?
1-0.3^3=0.973 or use Table C n=3, p=0.7
(a) 0.70
(b) 0.50
(c) * 0.97
(d) 0.99
(e) 0.21
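The complement-rule calculation in Question 2 can be checked with a short Python sketch. The variable names are illustrative and not part of the exam.

```python
# Complement rule: P(at least one success) = 1 - P(no successes).
p_success = 0.7   # success probability for one patient (from the question)
n_patients = 3    # independent patients

p_none = (1 - p_success) ** n_patients   # all three treatments fail
p_at_least_one = 1 - p_none

print(round(p_at_least_one, 3))
```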
3. The distribution of the weight of chocolate bars produced by a certain machine has
mean of 8.1 oz and a standard deviation of 0.1 oz. The quality control manager plans
to take a simple random sample from the production line. How big should the sample
size n be so that the sampling distribution of the sample mean has standard deviation
0.02 oz?
sigma/sqrt(n)=0.02=0.1/sqrt(n), so sqrt(n)=5 and n=25.
(a) * 25
(b) 100
(c) 5
(d) Cannot be determined unless we know the population follows a Normal distribution.
(e) 10
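As a quick check of Question 3, the relation sigma/sqrt(n) = 0.02 can be solved for n directly. This is a minimal sketch with illustrative variable names.

```python
import math

sigma = 0.1        # population standard deviation in oz (from the question)
target_sd = 0.02   # desired standard deviation of the sample mean in oz

# sigma / sqrt(n) = target_sd  =>  n = (sigma / target_sd) ** 2
n = round((sigma / target_sd) ** 2)

# Confirm that this n reproduces the target standard deviation.
check_sd = sigma / math.sqrt(n)
print(n, check_sd)
```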
4. A study was conducted to examine the quality of fish after seven days in ice storage.
In the output below, y denotes a measurement of fish quality (on a 10-point scale)
and x denotes the time (in hours) after being caught that the fish were placed in ice
packing. The scatterplot of y versus x showed a linear relationship. Some output is
shown below.
Regression Analysis: y versus x
The regression equation is
y = 8.43 - 0.144 x
Predictor      Coef  SE Coef       T      P
Constant     8.4318   0.1018   82.86  0.000
x          -0.14402  0.01391  -10.36  0.000
S = 0.187052   R-Sq = 92.3%   R-Sq(adj) = 91.4%
Analysis of Variance
Source          DF      SS      MS       F      P
Regression       1  3.7524  3.7524  107.25  0.000
Residual Error   9  0.3149  0.0350
Total           10  4.0673
What is the value of the correlation between x and y?
sqrt(0.923) = 0.96072889 and the correlation is negative since the
slope is negative.
(a) -0.923
(b) 0.923
(c) 0.961
(d) * -0.961
(e) 9.61
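The correlation in Question 4 can also be recovered from the ANOVA table, since R-Sq is the regression sum of squares divided by the total sum of squares. A small Python sketch, using only numbers printed in the output above:

```python
import math

ss_regression = 3.7524   # from the Analysis of Variance table
ss_total = 4.0673
slope = -0.14402         # coefficient of x

r_squared = ss_regression / ss_total
# The correlation takes the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)

print(round(r_squared, 3), round(r, 3))
```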
5. One of the fish specimens in Question 4 above was placed in ice packing 3 hours after
being caught and had a fish quality rating of 6. What is the residual for this fish
specimen?
predicted quality is 8.43-(0.144)(3)=7.998 (or 8 using more accurate
figures). Residual is 6-8=-2.
(a) * -2
(b) 8
(c) 2
(d) 6
(e) -8
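The residual in Question 5 is observed minus predicted. The sketch below uses the more accurate coefficients from the regression output, so the prediction comes out very close to 8.

```python
intercept = 8.4318   # "Constant" in the regression output
slope = -0.14402     # coefficient of x
x = 3                # hours before the fish was placed in ice
observed = 6         # observed quality rating

predicted = intercept + slope * x
residual = observed - predicted   # residual = observed - predicted

print(round(predicted, 3), round(residual, 3))
```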
6. In the analysis in Question 4 above, a plot was also made of residuals against x. What
would you expect this residual plot to look like?
(a) Has a downward non-linear trend.
(b) Has a “fanning-out” pattern.
(c) Has a linear trend.
(d) Has a curved pattern.
(e) * Has no pattern.
7. Two alloys, A and B, are used in the manufacture of steel bars. A study was carried
out to compare the load capacities of bars made from the two alloys. (Load capacity
is measured in tons.) The researchers collected data from simple random samples of 9
bars made from alloy A and 13 bars made from alloy B. Let μ_A denote the population
mean load capacity of bars made from alloy A, and let μ_B denote the population mean
load capacity of bars made from alloy B. You may assume that load capacities in the
samples have normal distributions. Some output from an analysis is shown below.
Sample 1 is alloy A, and sample 2 is alloy B.
Two-Sample T-Test and CI
Sample   N   Mean  StDev  SE Mean
1        9  28.50   2.49     0.83
2       13  26.20   1.80     0.50
Difference = mu (1) - mu (2)
Estimate for difference: 2.30000
95% CI for difference: (0.20135, 4.39865)
T-Test of difference = 0 (vs not =): T-Value = 2.37 P-Value = 0.034 DF = 13
Following are three statements based on this analysis. Each statement is either true
or false.
I: The margin of error of the 95 percent confidence interval for μ_A − μ_B
is greater than 2.2.
II: The P-value for the test of the null hypothesis μ_A = μ_B versus the
alternative hypothesis μ_A > μ_B is 0.034.
III: The value 4.0 will be in the 99% confidence interval for μ_A − μ_B based
on this data.
Which of statements I, II and III is (are) true?
I: False. ME = (4.39865 - 0.20135)/2 = 2.09865, which is not greater than 2.2.
II: False. The one-sided P-value is 0.034/2 = 0.017.
III: True. 4.0 is in the 95% CI, and the 99% CI is wider, so 4.0 will be in the 99% CI too.
Ans: (c) only statement III is true.
(a) only statement I is true.
(b) more than one of the three statements I, II and III is true.
(c) * only statement III is true
(d) none of the three statements I, II and III is true.
(e) only statement II is true
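The checks for statements I and II in Question 7 can be sketched in Python from the printed output. The standard error below is rebuilt from the rounded SE Mean column, so it matches the Minitab T-Value only approximately.

```python
import math

# 95% CI for mu(1) - mu(2), from the output.
ci_lower, ci_upper = 0.20135, 4.39865
margin_of_error = (ci_upper - ci_lower) / 2   # half the interval width

# Unpooled standard error of the difference, from the rounded SE Means.
se_diff = math.sqrt(0.83 ** 2 + 0.50 ** 2)
t_value = 2.30 / se_diff

# T is positive and the alternative is mu_A > mu_B,
# so the one-sided P-value is half the two-sided one.
two_sided_p = 0.034
one_sided_p = two_sided_p / 2

print(round(margin_of_error, 5), round(t_value, 2), one_sided_p)
```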
8. Using the information in Question 7 above, what can we say about the P-value of the
test of the null hypothesis μ_A = 26 versus the alternative hypothesis μ_A > 26?
t-value = (28.5-26)/(2.49/sqrt(9)) = 3.012048193 with d.f. 8.
p-value for the test is between 0.005 and 0.01
(a) between 0.01 and 0.02
(b) between 0.0025 and 0.005
(c) less than 0.0025
(d) greater than 0.02
(e) * between 0.005 and 0.01
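For Question 8, the t statistic can be computed directly and then bracketed between table values. The two critical values below are assumed to be the usual upper-tail entries for 8 degrees of freedom in a standard t table; they are not printed in this paper.

```python
import math

n, xbar, s = 9, 28.50, 2.49   # alloy A sample, from the Question 7 output
mu_0 = 26                     # null hypothesis value

t_stat = (xbar - mu_0) / (s / math.sqrt(n))
df = n - 1

# Assumed t-table critical values for df = 8 (upper-tail probabilities).
t_crit_upper_01 = 2.896    # upper-tail probability 0.01
t_crit_upper_005 = 3.355   # upper-tail probability 0.005

p_between_005_and_01 = t_crit_upper_01 < t_stat < t_crit_upper_005
print(round(t_stat, 3), df, p_between_005_and_01)
```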
9. Every morning, John tosses a fair coin. If the coin comes up heads, he goes jogging
that day, and if it comes up tails, he does not go jogging. What is the probability that
he goes jogging on exactly two days in the next week (7 days)?
bin(7,0.5): table C, k=2: p=0.1641
(a) 0.01
(b) 0.23
(c) 0.06
(d) * 0.16
(e) 0.50
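The Table C lookup in Question 9 corresponds to one binomial probability, which can be computed exactly as a sketch:

```python
from math import comb

n_days = 7     # days in the week
p_heads = 0.5  # fair coin
k = 2          # exactly two jogging days

# Binomial probability: C(n, k) * p^k * (1 - p)^(n - k)
p_two_days = comb(n_days, k) * p_heads ** k * (1 - p_heads) ** (n_days - k)

print(round(p_two_days, 4))
```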
10. The random variable X has a binomial distribution with mean 0.8. The standard
deviation of X is also 0.8. What is the probability that X ≤ 2?
mean = np=0.8, variance np(1-p)=0.8^2=0.64. Divide to get
(1-p)=0.64/0.8=0.8, so p=0.2. Divide eqn for mean by this to get
n=4. Use Table C with n=4, p=0.2: prob=0.4096+0.4096+0.1536=0.9728.
The normal approximation is not appropriate here (n is small), but it gives
z=(2-0.8)/0.8=1.5 and prob~=0.933, which leads to the wrong choice (0.95).
(a) 0.50
(b) 0.95
(c) * 0.97
(d) 0.90
(e) 0.21
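The two steps in Question 10 (recover n and p from the mean and standard deviation, then sum binomial probabilities) can be sketched as follows. The helper `binom_pmf` is a hypothetical name introduced only for this illustration.

```python
from math import comb

mean, sd = 0.8, 0.8

# mean = n*p and variance = n*p*(1-p), so variance / mean = 1 - p.
p = 1 - sd ** 2 / mean
n = round(mean / p)

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# P(X <= 2) = P(0) + P(1) + P(2)
prob_at_most_2 = sum(binom_pmf(k, n, p) for k in range(3))

print(n, round(p, 2), round(prob_at_most_2, 4))
```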
11. The scores on an exam have a Normal distribution with mean 65. The third quartile
of the distribution of scores is 75. Which of the following numbers is closest to the
standard deviation of the distribution of scores?
Q3 = mean + 0.67 SD = 75 and so SD = (75-65)/0.67 = 14.92537313
(a) 19
(b) 17
(c) 21
(d) 12
(e) * 15
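The value 0.67 used in Question 11 is the rounded z-score of the third quartile. A sketch using Python's standard library gives a slightly more precise figure, which still rounds to the same answer choice.

```python
from statistics import NormalDist

mean = 65
q3 = 75

# z-score with 75% of the standard Normal distribution below it.
z_q3 = NormalDist().inv_cdf(0.75)

# Q3 = mean + z * SD  =>  SD = (Q3 - mean) / z
sd = (q3 - mean) / z_q3

print(round(z_q3, 4), round(sd, 2))
```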
12. An advertisement for a cold relief claims an 80% success rate. Eight patients selected
at random from a large population of patients are given this cold relief. Assume that
the claimed success rate is correct. Find the probability that this treatment will be
successful for seven or more of these eight patients.
number of successes is Bin(8,0.80). Can’t use Table C directly (p > 0.5), so
convert to failures: p becomes 1-0.80=0.20, and 7 or 8 successes becomes
8-(7,8)=1 or 0 failures: 0.3355+0.1678=0.5033.
(a) 0.80
(b) * 0.50
(c) 0.20
(d) 0.35
(e) 0.10
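The switch from successes to failures in Question 12 can be verified numerically: both routes give the same probability. This is a sketch, with `binom_pmf` again a hypothetical helper name.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n = 8
p_success = 0.80
p_failure = 1 - p_success

# Direct route: 7 or 8 successes.
direct = binom_pmf(7, n, p_success) + binom_pmf(8, n, p_success)

# Table C route: 1 or 0 failures, with failure probability 0.20.
via_failures = binom_pmf(1, n, p_failure) + binom_pmf(0, n, p_failure)

print(round(direct, 4), round(via_failures, 4))
```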
13. Researchers are studying the yield of a crop in two locations. The researchers are going
to compute independent 90% confidence intervals for the mean yield at each location.
What is the probability that neither of the researchers’ intervals will contain the true
mean yields at their location?
just 0.10*0.10=0.01
(a) 0.90
(b) * 0.01
(c) 0.99
(d) 0.05
(e) 0.10
14. The histograms of two data sets (labelled A and B) are shown below:
Descriptive statistics for data set A are also given below:
Descriptive Statistics: Data Set A
Variable     N   Mean  SE Mean  StDev     Q1     Q3
Data Set A  60  78.43     2.57  19.92  62.95  86.58
Which of the following could be the mean of data set B?
B is A shifted 12 units to the right, so mean is 12 bigger
(a) 56.43
(b) 58.43
(c) 78.43
(d) 98.43
(e) * 90.43
15. In Data Set A described in Question 14, how many observations are outliers according
to the “1.5 times IQR” rule?
1.5 IQR = 1.5(86.58-62.95)=35.445; Q1-35.445=27.505 and
Q3+35.445=122.025. There are no observations below 27.505, but
there are 1+3=4 observations in the last 2 bars (above 124), plus
possibly some from the next bar down.
(a) cannot tell from the information given
(b) no outliers
(c) exactly 2 outliers
(d) exactly 1 outlier
(e) * more than 2 outliers
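The outlier fences in Question 15 follow directly from the quartiles in the descriptive statistics. A minimal sketch:

```python
q1, q3 = 62.95, 86.58   # quartiles of Data Set A, from the output

iqr = q3 - q1
step = 1.5 * iqr

lower_fence = q1 - step   # observations below this are outliers
upper_fence = q3 + step   # observations above this are outliers

print(round(step, 3), round(lower_fence, 3), round(upper_fence, 3))
```

Counting the observations beyond the fences still requires the histogram, which the numbers alone do not provide.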
16. A community is considering building a new skateboard park, and wants to survey
people’s opinions. Younger people might have a different opinion about the skateboard
park than older people, so, to ensure that each age group is properly represented,
separate simple random samples are taken from the different age groups, and the
results are combined. What is this sampling method called?
(a) quota sampling
(b) multistage sampling
(c) systematic sampling
(d) simple random sampling
(e) * stratified sampling
17. Surveys conducted over the telephone often have problems with nonresponse. Why
does this happen?
(a) Not all households have phones, which causes bias.
(b) Some people have unlisted numbers and so cannot be part of the survey.
(c) * In many homes, when the survey company calls, no one is available to answer
the phone.
(d) This is a form of voluntary-response sampling, and randomization should be used
instead.
18. A weather forecaster says that the probability of rain on Monday, Tuesday and
Wednesday of next week is 0.2, 0.6 and 0.3 respectively. Assuming that the weather forecaster
is correct, what is the probability that it rains on at least one of those three days?
P(no rain)=(1-0.2) x (1-0.6) x (1-0.3)=0.224, at least one day is 1-0.224=0.776
(a) * 0.8
(b) 0.6
(c) 0.9
(d) 0.7
(e) 0.5
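The calculation in Question 18 multiplies the three "no rain" probabilities, which, like the solution note, treats the days as independent. A short sketch:

```python
import math

p_rain = [0.2, 0.6, 0.3]   # Monday, Tuesday, Wednesday

# Assuming independence between days, multiply the "no rain" probabilities.
p_no_rain_at_all = math.prod(1 - p for p in p_rain)
p_rain_at_least_once = 1 - p_no_rain_at_all

print(round(p_no_rain_at_all, 3), round(p_rain_at_least_once, 3))
```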
19. In a casino game, you have probability 0.1 of winning. If you play the game 20 times
(independently), what is the probability that you win 2 times or fewer?
from Table C, n=20, p=0.1: .1216+.2702+.2852=0.6770
(a) * 0.65
(b) 0.30
(c) 0.45
(d) 0.05
(e) 0.20
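The three Table C entries for Question 19 can be reproduced from the binomial formula. A sketch, with `binom_pmf` as a hypothetical helper name:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n_plays, p_win = 20, 0.1

# P(0 wins), P(1 win), P(2 wins)
terms = [binom_pmf(k, n_plays, p_win) for k in range(3)]
p_two_or_fewer = sum(terms)

print([round(t, 4) for t in terms], round(p_two_or_fewer, 4))
```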
20. In the casino game described in Question 19 above, suppose that if you win, you win
$8, and if you lose, you lose $1. You play the game 15 times (independently). What is
the probability that you lose more money than you win?
win 1 or 0 times: table C, n=15, p=0.1: .2059+.3432=0.5491
(a) * 0.55
(b) 0.80
(c) 0.40
(d) 0.25
(e) 0.10
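The step "win 1 or 0 times" in Question 20 comes from comparing total winnings with total losses for each possible number of wins. The sketch below enumerates those cases rather than assuming them.

```python
from math import comb

n_plays, p_win = 15, 0.1
win_amount, loss_amount = 8, 1   # dollars won per win, lost per loss

# Numbers of wins k for which total losses exceed total winnings.
losing_counts = [
    k for k in range(n_plays + 1)
    if loss_amount * (n_plays - k) > win_amount * k
]

p_net_loss = sum(
    comb(n_plays, k) * p_win ** k * (1 - p_win) ** (n_plays - k)
    for k in losing_counts
)

print(losing_counts, round(p_net_loss, 4))
```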
21. A multiple-choice examination has two parts: Part A and Part B. Part A has 8
questions, each with five choices. Part B has 10 questions, each with four choices. In each
question, only one of the choices is the correct answer.
A student has not prepared for the examination at all, so answers each question by
picking a choice at random. Let X denote the total number of questions the student
gets correct. What is the standard deviation of X?
X is sum of independent Y=B(8,1/5) and
Z=B(10,1/4). Var(Y)=8(1/5)(4/5)=32/25=1.28 and
Var(Z)=10(1/4)(3/4)=30/16=1.875. So X has variance 1.28+1.875=3.155
and SD approx 1.8.
(a) 2.9
(b) * 1.8
(c) 1.4
(d) 2.2
(e) 1.0
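In Question 21 the variances of the two independent binomial counts add, and the standard deviation is the square root of the sum. As a sketch:

```python
import math

# Part A: 8 questions, 5 choices each. Part B: 10 questions, 4 choices each.
var_part_a = 8 * (1 / 5) * (4 / 5)    # n * p * (1 - p)
var_part_b = 10 * (1 / 4) * (3 / 4)

# Variances of independent random variables add; standard deviations do not.
var_total = var_part_a + var_part_b
sd_total = math.sqrt(var_total)

print(round(var_total, 3), round(sd_total, 3))
```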
22. Aldrin is a highly toxic organic compound that can cause various cancers. Ten water
specimens were taken from random locations in Wolf River, downstream from a toxic
waste site. The sample mean concentration was 5.109 nanograms per litre. Assuming
that the population standard deviation of concentrations is 0.9 nanograms per litre,
calculate a 95% confidence interval for the population mean concentration. What is
the lower limit of this confidence interval?
From Minitab:
One-Sample Z: conc
The assumed standard deviation = 0.9
Variable   N     Mean    StDev  SE Mean  95% CI
conc      10  5.01900  1.10440  0.28460  (4.46118, 5.57682)
lower limit of corresponding t interval is 4.23
(a) 4.2
(b) * 4.5
(c) 5.8
(d) 5.1
(e) 5.6
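The z interval in Question 22 can be reproduced with the standard library. Note that the question text gives a sample mean of 5.109 while the Minitab output shows 5.01900; the sketch computes the lower limit for both, and either way the closest choice is 4.5. The function name `z_lower_limit` is hypothetical.

```python
import math
from statistics import NormalDist

def z_lower_limit(xbar, sigma, n, level=0.95):
    """Lower limit of a z confidence interval for a mean with known sigma."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return xbar - z_star * sigma / math.sqrt(n)

sigma, n = 0.9, 10

lower_from_output = z_lower_limit(5.019, sigma, n)     # mean in Minitab output
lower_from_question = z_lower_limit(5.109, sigma, n)   # mean in question text

print(round(lower_from_output, 3), round(lower_from_question, 3))
```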
23. A simple random sample was taken of 11 students in a class. The sampled students
reported the number of hours per week they studied. The sample mean was 10.26
hours and the sample standard deviation was 6.22 hours. Calculate a 99% confidence
interval for the mean study time of all the students in the class. What is the upper
limit of the confidence interval?
this should be t
Minitab gives (for t and z):
One-Sample T
 N     Mean   StDev  SE Mean  99% CI
11  10.2600  6.2200   1.8754  (4.3163, 16.2037)
One-Sample Z
The assumed standard deviation = 6.22
 N     Mean  SE Mean  99% CI
11  10.2600   1.8754  (5.4293, 15.0907)
(a) 12.8
(b) 15.1
(c) 10.3
(d) * 16.2
(e) 4.3
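The t interval in Question 23 can be sketched by hand. The critical value 3.169 is assumed to be the standard t-table entry for 10 degrees of freedom at 99% confidence; it is not printed in this paper, and Minitab uses a more precise value, so the last digits differ slightly.

```python
import math

n, xbar, s = 11, 10.26, 6.22   # from the question

t_star = 3.169                 # assumed t-table value, df = 10, 99% confidence
se_mean = s / math.sqrt(n)     # standard error of the mean

lower = xbar - t_star * se_mean
upper = xbar + t_star * se_mean

print(round(se_mean, 4), round(lower, 2), round(upper, 2))
```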
24. When calculating a 95% confidence interval for a population mean, which is the most
important situation where use of the t distribution would be better than use of the
normal distribution?
(a) When the sample size is large.
(b) When the sample standard deviation is unknown.
(c) When the population standard deviation is known.
(d) When the sample size is small.
(e) * When the population standard deviation is unknown.
25. Some communities have installed red-light cameras at intersections. Drivers who fail
to stop at a red light are photographed and receive a ticket in the mail. Red-light
cameras do cut down on the number of right-angle collisions, but there are more
rear-end collisions as drivers brake suddenly to avoid a ticket. A highway agency collected
data on rear-end collisions at intersections before and after red-light cameras were
installed; in particular only those collisions where there was injury. In each case, the
type of injury was recorded, as shown below.
Type of injury     Before camera   After camera
Death/disabling               61             27
Evident injury               210            136
Possible injury             1659            845
Total                       1930           1008
What is the marginal proportion of evident injuries?
Grand total = 1930+1008 = 2938
total of evident = 210+136 = 346
346/2938 = 0.118
(a) 0.34
(b) 0.03
(c) 0.66
(d) * 0.12
(e) 0.85
26. Look again at the data in Question 25. For those accidents that happened after the
red-light cameras were installed, what is the conditional proportion of them that resulted
in death or disabling injury?
27/1008=0.0268
(a) * 0.029
(b) 0.109
(c) 0.135
(d) 0.032
(e) 0.268
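The marginal proportion in Question 25 and the conditional proportion in Question 26 both come from the same two-way table. A sketch that rebuilds them from the counts:

```python
# Counts of rear-end injury collisions, from the table in Question 25.
before = {"Death/disabling": 61, "Evident injury": 210, "Possible injury": 1659}
after = {"Death/disabling": 27, "Evident injury": 136, "Possible injury": 845}

total_before = sum(before.values())
total_after = sum(after.values())
grand_total = total_before + total_after

# Marginal proportion: evident injuries out of all collisions.
marginal_evident = (before["Evident injury"] + after["Evident injury"]) / grand_total

# Conditional proportion: death/disabling among "after camera" collisions only.
conditional_death_after = after["Death/disabling"] / total_after

print(grand_total, round(marginal_evident, 3), round(conditional_death_after, 4))
```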
27. Refer again to the situation described in Question 25. The conditional distributions
of injury types are very similar before and after the red-light cameras were installed.
What would you conclude from this?
(a) There was not much difference between the numbers of rear-end collisions before
and after the red-light cameras were installed.
(b) There was a difference in the distribution of types of injury suffered before and
after the installation of red-light cameras.
(c) Type of injury and before/after are positively associated.
(d) Type of injury and before/after are negatively associated.
(e) * Red-light cameras did not make much difference to the type of injury suffered
in rear-end collisions.
28. The correlation coefficient is useful to decide whether:
(a) A normal distribution adequately describes the data.
(b) * There is a straight-line relationship between two variables.
(c) There is any relationship between two variables, possibly a curved one.
(d) One variable is the cause of another.
(e) The intercept of the best-fitting straight line is large or small.
29. A normal probability plot for some data is shown below.
What do you conclude about the shape of the data distribution from this plot?
(a) approximately normal
(b) * skewed to the left
(c) symmetric but not normal
(d) skewed to the right
(e) not linear
30. Researchers called medical specialists’ offices posing as new patients and requesting
appointments for non-urgent problems. The waiting time, in days, was recorded for
each request. Boxplots for two different samples of requests, labelled A and B, are
shown below.
The samples are the same size, and the distributions have symmetric shapes, so that
the sample mean is very close to the sample median in each case. Which sample, A or
B, offers more compelling evidence that the population mean waiting time exceeds 30
days?
(a) It is impossible to tell from the boxplots.
(b) The two samples offer about the same evidence
(c) * Sample B
(d) Sample A
31. The random variable Y has a mean of 5 and this probability distribution:
Value 4 8
Probability 0.75 0.25
What is the standard deviation of Y ?
mean is 4(0.75)+8(0.25)=3+2=5
variance is (4-5)^2(0.75)+(8-5)^2(0.25)=0.75+2.25=3
sd is sqrt(3)=1.73
or: 4*Bernoulli(0.25)+4, so sd=4*sqrt{(0.25)(0.75)}=sqrt
