# STAB22 Exam Fall 2007.pdf

University of Toronto Scarborough

Statistics

STAB22H3

Olga Chilina

Fall

University of Toronto Scarborough
STAB22 Final Examination
December 2007
This examination is multiple choice. Ensure that you have a Scantron answer sheet and a #2 pencil, and complete the Scantron sheet according to the instructions (otherwise your exam may not be marked).
For this examination, you are allowed two letter-sized sheets of notes (both sides), handwritten and prepared by you, a non-programmable, non-communicating calculator, and writing implements.
If your answer is not included in the alternatives given, mark the answer that is most nearly correct from those alternatives.
If you need paper for rough work, use the back of the sheets of this question paper. The question paper will be collected at the end of the examination, but any writing on it will not be read or marked.
This examination has 23 numbered pages; before you start, check to see that you have all the pages.
1. One part of the stock market is called the "over-the-counter market". One way of
measuring activity in a stock market is by the percentage of outstanding shares traded.
On a particular day, the results for 40 shares were as shown in the stemplot below:
Stem-and-leaf of C1 N = 40
Leaf Unit = 1.0
(22) 0 2222222222333333333333
18 0 444444455555
6 0 66777
1 0
1 1
1 1
1 1
1 1
1 1
1 2
1 2 2
What is the median of this distribution?
(a) * 3
(b) 0.3
(c) 30
(d) 2.2
(e) 6
2. In the data of Question 1, how do the mean and median compare?
(a) * The mean is bigger than the median
(b) The median is bigger than the mean
(c) The mean and median are about the same
(d) It is impossible to compare the mean and median from this information.
3. The pie chart below displays the distribution of grades in a statistics course. Note that
in this course, the performance of the students is graded into one of the five grades, A,
B, C, D and F.
One (and only one) of the five bar charts below was constructed from the same data set
used to create the above pie chart. Which of the following bar charts represents the
same data as the pie chart above?
(a) This bar chart:
(b) * This bar chart:
(c) This bar chart:
(d) This bar chart:
(e) This bar chart:
4. The MINITAB output below gives the stemplot of the GPAs of a group of 78 students
in a class.
Stem-and-Leaf Display: GPA
Stem-and-leaf of GPA N = 78
Leaf Unit = 0.10
1 0 5
2 1 7
3 2 4
7 3 4689
11 4 0678
15 5 0259
22 6 0001249
(22) 7 1122344555566668888999
34 8 001111223378899
19 9 011133445555679
4 10 1577
This class actually had 82 students, but the GPAs of four of them were not available at
the time the above stemplot was constructed. Later, it was found that the GPAs
of these four students are 5.9, 7.1, 10.5, and 9.1.
Which of the following numbers is the closest to the interquartile range (i.e. IQR) of
the GPAs of all 82 students?
(a) 2.0
(b) 2.3
(c) 2.6
(d) * 2.9
(e) 3.2
5. There are five children aged 3, 3, 4, 5 and 5 years in a room. If another 4-year-old child
enters the room, what will happen to the mean and standard deviation of the ages of
the children in the room?
(a) The mean will stay the same but the standard deviation will increase.
(b) * The mean will stay the same but the standard deviation will decrease.
(c) The mean and standard deviation will both stay the same.
(d) The mean and standard deviation will both decrease.
(e) The mean and standard deviation will both increase.
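As a check on Question 5, here is a minimal Python sketch using only the standard library. It assumes the sample standard deviation (divisor n - 1) is intended; the variable names are illustrative and not part of the exam.

```python
from statistics import mean, stdev

ages_before = [3, 3, 4, 5, 5]
ages_after = ages_before + [4]  # a 4-year-old enters the room

mean_before, mean_after = mean(ages_before), mean(ages_after)
# stdev() is the sample standard deviation (divisor n - 1)
sd_before, sd_after = stdev(ages_before), stdev(ages_after)

print(mean_before, mean_after)
print(round(sd_before, 3), round(sd_after, 3))
```

The new value equals the mean, so the sum of squared deviations is unchanged while the divisor grows, which is why the standard deviation shrinks.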
6. The histogram given below shows the distribution of survival times in days of 89 guinea
pigs after they were injected with an experimental substance in a medical experiment.
Which of the following statements regarding the median survival time is true?
(a) The median survival time is less than 50 days.
(b) The median survival time is greater than 50 days but less than (or equal to) 95
days
(c) * The median survival time is greater than 95 days but less than (or equal to)
155 days
(d) The median survival time is greater than 155 days but less than (or equal to) 300
days
(e) The median survival time is greater than 300 days
7. A supermarket chain studied the times required to service customers. The values, in
minutes, are shown in the boxplot below.
What is the inter-quartile range of service times?
(a) * 1.1 minutes
(b) 0.4 minutes
(c) 0.7 minutes
(d) 3 minutes
(e) greater than 5 minutes
8. Gasoline use for compact cars sold in the United States has a normal distribution with
mean 32 miles per gallon, and SD 5 miles per gallon. What proportion of compact
cars obtain 40 miles per gallon or higher?
(a) * 0.05
(b) 0.95
(c) 0.15
(d) 0.85
(e) 0.30
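The proportion in Question 8 can be checked with a short Python sketch based on `statistics.NormalDist` from the standard library. The variable names are illustrative assumptions, not part of the exam.

```python
from statistics import NormalDist

# Normal model from the question: mean 32 mpg, SD 5 mpg
mpg = NormalDist(mu=32, sigma=5)

# P(X >= 40) is the upper tail beyond z = (40 - 32) / 5 = 1.6
p_at_least_40 = 1 - mpg.cdf(40)
print(round(p_at_least_40, 4))
```

The result is close to 0.05, which matches the marked alternative.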
9. Consider again the situation of Question 8. When gasoline is scarce, there is a competitive
advantage in developing a car that has better (higher) miles per gallon than 90%
of the current compact car market. What must the gasoline consumption (in miles per
gallon) be for this new car?
(a) * 38.4
(b) 25.6
(c) 32.0
(d) 35.2
(e) 28.8
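For Question 9, the required value is the 90th percentile of the same normal model. A minimal sketch, again with illustrative names and the standard library's `NormalDist.inv_cdf`:

```python
from statistics import NormalDist

mpg = NormalDist(mu=32, sigma=5)

# Value that exceeds 90% of the current market: the 0.90 quantile
cutoff = mpg.inv_cdf(0.90)
print(round(cutoff, 1))
```

By hand this is 32 + 1.28 × 5, using the z-value that cuts off the top 10% of a standard normal distribution.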
10. The time it takes (for any student) to complete a STAB22 final exam is a random
variable having a normal distribution with mean 160 minutes and standard deviation
15 minutes. Anne, Bob, Clara and Dave are four friends writing this exam. What
is the probability that at least one of them will complete the exam in less than 145
minutes? (Assume that their completion times are independent.)
(a) Less than 0.01
(b) Between 0.01 and 0.35
(c) Between 0.35 and 0.45
(d) * Between 0.45 and 0.55
(e) Greater than 0.55
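Question 10 combines a normal probability with the complement rule for independent events. The following sketch (names are illustrative) shows the two steps:

```python
from statistics import NormalDist

completion = NormalDist(mu=160, sigma=15)

# Probability that one student finishes in under 145 minutes (z = -1)
p_one = completion.cdf(145)

# "At least one of four" is the complement of "none of the four",
# and independence lets us multiply the four "not under 145" probabilities
p_at_least_one = 1 - (1 - p_one) ** 4
print(round(p_one, 4), round(p_at_least_one, 4))
```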
11. The "running economy" of a runner is the oxygen consumption when that runner runs
at a standardized speed. It is believed that a runner's finishing time in a 10 km race
will be related to the running economy. A scatterplot of running economy and 10 km
finishing time is shown below.
It is proposed to fit a straight line to these data. Which comment below is most
appropriate?
(a) * The relationship is not very strong, but it appears roughly linear.
(b) A straight line describes this relationship very well, and the correlation will be
very high.
(c) The relationship is clearly curved, and so fitting a straight line is a bad idea.
(d) This residual plot clearly does not show a random pattern, and therefore a straight
line should not be used for these data.
(e) We cannot use regression techniques because of the outliers.
12. In the situation described in Question 11, you are asked to predict the finishing time for
a runner with running economy 40. What is your reaction to this request?
(a) * A running economy value of 40 is very atypical for these data, and so the
finishing time may not be predicted accurately.
(b) Because a straight line describes the data, it is appropriate to use the regression
equation to do this prediction.
(c) The average finishing time is 33.8 minutes, and so this is our best guess at the
finishing time for this runner.
(d) A straight line is not appropriate here, so we cannot produce a prediction at all.
13. A study was made of the infestation of a certain type of lobster by two different types
of barnacle, "tridens" and "lowei". It is believed that these two types of barnacle
compete for space on the surface of the lobster. A sample of 10 lobsters is taken, and
the number of barnacles of each type on each lobster is counted. A scatterplot of the
results is shown below.
What can you say about the correlation between the number of tridens and the number
of lowei?
(a) * It is definitely negative.
(b) It is definitely positive.
(c) It is close to zero.
(d) It is close to +1.
(e) It is close to -1.
14. Wood scientists are interested in replacing solid-wood building material by less expensive
products made from wood flakes. The following Minitab outputs were obtained
from a study of the relationship between the length (in inches) and the strength (in
pounds per square inch) of beams made from wood flakes.
Descriptive Statistics: Length, Strength
Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median      Q3
Length    10   0  9.500    0.957  3.028    5.000  6.750   9.500  12.250
Strength  10   0  291.3     22.5   71.2    234.0  242.8   251.5   343.3
Regression Analysis: Strength versus Length
The regression equation is
Strength = 488 - 20.7 Length
Predictor     Coef  SE Coef      T      P
Constant    488.38    38.85  12.57  0.000
Length     -20.745    3.914  -5.30  0.001
Which of the following numbers is the closest to the value of the correlation between
the length and strength?
(a) 0.8
(b) -0.6
(c) -0.7
(d) -0.8
(e) * -0.9
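The correlation in Question 14 is not printed in the output, but it can be recovered from the slope and the two standard deviations through the identity r = b × s_x / s_y. A minimal sketch with the values read from the Minitab output (variable names are illustrative):

```python
# Values read from the Minitab output in the question
slope = -20.745       # regression slope of Strength on Length
sd_length = 3.028     # StDev of Length (the explanatory variable)
sd_strength = 71.2    # StDev of Strength (the response)

# Least-squares slope b = r * s_y / s_x, so r = b * s_x / s_y
r = slope * sd_length / sd_strength
print(round(r, 3))
```

The sign of r follows the sign of the slope, which rules out the positive alternative immediately.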
15. In the study of the relationship between the length and the strength in Question 14
above, if we change the units of length to centimeters and the units of strength to
kilograms per square centimeter, what will be the slope of the least squares regression
line for predicting strength (in kilograms per square centimeter) from length (in cm)?
Choose the closest answer from the alternatives below.
(Assume 1 inch = 2.54 cm and 1 pound per square inch = 0.07 kg per square centimeter.)
(a) * -0.5
(b) -1.5
(c) -8.0
(d) -20.7
(e) -53.2
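For Question 15, the slope has units of strength per unit of length, so each unit change rescales it. The sketch below applies the conversion factors stated in the question; the names are illustrative.

```python
slope_psi_per_inch = -20.745   # slope from the Minitab output in Question 14

KG_PER_CM2_PER_PSI = 0.07      # conversion factor given in the question
CM_PER_INCH = 2.54             # conversion factor given in the question

# Numerator (strength) is multiplied by 0.07;
# denominator (length) is multiplied by 2.54
slope_metric = slope_psi_per_inch * KG_PER_CM2_PER_PSI / CM_PER_INCH
print(round(slope_metric, 3))
```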
16. A study was carried out to assess the effects of sleep deprivation on subjects' ability to
perform simple tasks. Each subject was deprived of a pre-determined amount of sleep
between 8 and 24 hours, and then given a standard set of addition problems to solve;
the number of errors was recorded. A regression was carried out to predict the number
of errors from the number of hours of sleep deprivation, as shown below.
The regression equation is
errors = 3.00 + 0.475 hours
Predictor    Coef  SE Coef     T      P
Constant    3.000    2.127  1.41  0.196
hours      0.4750   0.1253  3.79  0.005
What is the predicted number of errors for a subject who is deprived of 20 hours of
sleep?
(a) * 12.5 errors
(b) 3.0 errors
(c) 20.0 errors
(d) 8.0 errors
(e) cannot predict because we are extrapolating
17. The following MINITAB output was obtained from a study of the relationship between
the salary (in thousands of dollars) and length of service (in years) based on a random
sample of 25 employees from a large firm.
Regression Analysis: Salary versus Length
The regression equation is
Salary = 20.3 + 0.915 Length
Predictor    Coef  SE Coef      T      P
Constant   20.332    1.851  10.98  0.000
Length     0.9153   0.1767   5.18  0.000
S = 3.44985 R-Sq = 53.8% R-Sq(adj) = 51.8%
Based on the above, which one of the following statements is true?
(a) The above regression model explains more than 75% of the variability in salaries
of these employees.
(b) The distribution of residuals is left skewed.
(c) The residual plots above clearly indicate the need for a higher-order model.
(d) * The absolute value of the residual for an observation in this sample with Length
= 16 and Salary = 41 is greater than 5.0
(e) The correlation between Length and Salary is 0.538
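Two of the alternatives in Question 17 can be checked numerically from the output: the residual for the stated observation, and the correlation implied by R-Sq. A minimal sketch with illustrative names:

```python
from math import sqrt

intercept, slope = 20.332, 0.9153   # coefficients from the Minitab output
r_squared = 0.538                   # R-Sq = 53.8%

# Residual = observed - predicted, for Length = 16 and Salary = 41
predicted = intercept + slope * 16
residual = 41 - predicted

# The slope is positive, so r is the positive square root of R-Sq
r = sqrt(r_squared)
print(round(residual, 2), round(r, 3))
```

The correlation is the square root of R-Sq rather than R-Sq itself, which is why 0.538 is not the correlation.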
18. Using the information in Question 17 above, if the average length of service of the 25
employees is 9.72 years, what is the average salary of these 25 employees?
(a) it will be less than $21 000
(b) * it will be greater than $21 000 but less than $31 000
(c) it will be greater than $31 000 but less than $41 000
(d) it will be greater than $41 000
(e) it cannot be determined from the information given.
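Question 18 relies on the fact that a least-squares line passes through the point of means, so the mean salary is the prediction at the mean length of service. A short sketch (names are illustrative):

```python
intercept, slope = 20.332, 0.9153   # coefficients from Question 17
mean_length = 9.72                  # years, given in the question

# The fitted line passes through (mean of x, mean of y)
mean_salary = intercept + slope * mean_length   # in thousands of dollars
print(round(mean_salary, 2))
```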
19. The regression analysis in Question 17 above is based on 25 observations. The residuals
for these 25 observations were calculated (not shown in the output). The sum of 20 of
these values (i.e. 20 residuals) was -6.08. What will be the sum of the remaining 5
residuals? (You may assume that the answer is not exactly equal to any of -10, -5, 0, 5
or 10.)
(a) between -10 and -5
(b) between -5 and 0
(c) between 0 and 5
(d) * between 5 and 10
(e) none of the above intervals contains this value
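Question 19 uses the property that the residuals from a least-squares line with an intercept sum to zero. The sketch below first illustrates the property on a small hypothetical data set (the `xs` and `ys` values are made up for illustration and are not from the exam), then applies it to the sum given in the question.

```python
# Hypothetical data, used only to illustrate the zero-sum property
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Least-squares slope and intercept from the usual formulas
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(round(sum(residuals), 10))   # zero up to floating-point rounding

# Applying the property to the question: all 25 residuals sum to zero
sum_of_20 = -6.08
sum_of_remaining_5 = 0 - sum_of_20
print(sum_of_remaining_5)
```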
20. A plot is made of the residuals from a regression against the explanatory variables. In
order for you to conclude that the regression is satisfactory, what are you looking for
on the residual plot?
(a) * no pattern at all
(b) a straight-line pattern
(c) some pattern, but not necessarily a straight line
(d) no outliers
21. A residual plot was made as described in Question 20. In the residual plot, which of
the following would indicate that a straight-line relationship is not satisfactory in the
regression?
(a) * a curved pattern
(b) no pattern at all
(c) a "fan-out" pattern with larger values of the explanatory variable going with large
positive or large negative residuals
(d) some pattern not given above
22. A researcher believes that a certain meditation technique lowers people's anxiety level.
The researcher collects a random sample of subjects and divides them, at random, into
two groups. The subjects in group 1 are taught how to meditate, and are given frequent
practice in meditating. The subjects in group 2 spend an equivalent amount of time in
quiet relaxation. At the end of the study, all the subjects are given a standard test of
their anxiety, and the subjects in group 1 have a statistically significantly lower anxiety
level. What do you conclude from this study?
(a) * This experiment provides convincing evidence that meditation causes lower
anxiety.
(b) This observational study suggests that meditation is associated with lower anxiety,
but does not offer convincing evidence.
(c) This study offers anecdotal evidence only, and so is no proof of anything.
(d) Group 2 in this study was unnecessary, because quiet relaxation is obviously
associated with lower anxiety.
23. A survey is taken on a healthcare issue, where opinion is likely to be different between
young and old people. A simple random sample is taken of young people and a separate
simple random sample is taken of older people, and the responses to the survey question
are combined. It is desired to make a confidence interval for the proportion of all people
who agree with the statement made in the survey. Is it possible to use the methods of
this course to construct the confidence interval?
(a) * No, because this is a stratified sample, and our methods only apply to simple
random samples.
(b) Yes, because random samples were taken.
(c) Yes, because the samples don’t have to be random for our methods to be used.
(d) No, because a systematic sample is used here, and our methods only apply to
simple random samples.
(e) No, because simple random samples were used here, and our methods apply to
some other kind of sampling.
24. A multiple-choice exam consists of 30 multiple-choice questions. Each question has
five possible responses, of which only one is the correct answer. Each correct response
carries 5 marks and each incorrect response loses 1 mark (scores a negative mark). A
question that is left unanswered automatically scores 0 marks. (Subtracting marks for
incorrect responses is known as a "correction for guessing" and is designed to discourage
test takers from choosing answers at random.) A totally unprepared student answers
all 30 questions by just selecting one of the five answers at random. Find the mean of
his total score on this exam. Choose the closest answer from the options below.
(a) * 6
(b) 0
(c) 3
(d) 12
(e) -3
(KB: I did some serious editing here. (i) having integer numbers of points for correct
and wrong answers makes the calculation a lot less prone to error; (ii) the question
is difficult enough just figuring out the mean, and with two numbers, "closest" isn't
meaningful.)
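The mean score in Question 24 follows from the expected mark per question multiplied by the number of questions. The sketch below uses exact fractions to avoid floating-point rounding; the names are illustrative.

```python
from fractions import Fraction

n_questions = 30
p_correct = Fraction(1, 5)        # one correct answer among five choices
marks_correct, marks_wrong = 5, -1

# Expected mark on a single randomly answered question
expected_per_question = p_correct * marks_correct + (1 - p_correct) * marks_wrong

# Expectation is additive, so multiply by the number of questions
expected_total = n_questions * expected_per_question
print(expected_per_question, expected_total)
```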
25. What is the probability that the observed value of a binomial random variable with 5
trials and success probability 0.8 will be 4 or more?
(a) * 0.74
(b) 0.26
(c) 0.007
(d) 0.90
(e) 0.41
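Question 25 can be checked by summing the binomial probabilities for 4 and 5 successes. A minimal sketch using `math.comb` from the standard library (names are illustrative):

```python
from math import comb

n, p = 5, 0.8

def binom_pmf(k: int) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# P(X >= 4) = P(X = 4) + P(X = 5)
p_four_or_more = binom_pmf(4) + binom_pmf(5)
print(round(p_four_or_more, 4))
```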
26. What is the approximate probability that the observed value of a binomial random
