University of Toronto Scarborough
Ken Butler

University of Toronto Scarborough STAB22 Final Examination December 2010

For this examination, you are allowed two handwritten letter-sized sheets of notes (both sides) prepared by you, a non-programmable, non-communicating calculator, and writing implements.

This question paper has 31 numbered pages, with statistical tables at the back. Before you start, check to see that you have all the pages. You should also have a Scantron sheet on which to enter your answers. If any of this is missing, speak to an invigilator.

This examination is multiple choice. Each question has equal weight, and there is no penalty for guessing. To ensure that you receive credit for your work on the exam, fill in the bubbles on the Scantron sheet for your correct student number (under "Identification"), your last name, and as much of your first name as fits.

Mark in each case the best answer out of the alternatives given (which means the numerically closest answer if the answer is a number and the answer you obtained is not given). If you need paper for rough work, use the back of the sheets of this question paper.

Before you begin, two more things:

- Check that the colour printed on your Scantron sheet matches the colour of your question paper. If it does not, get a new Scantron from an invigilator.
- Complete the signature sheet, but sign it only when the invigilator collects it. The signature sheet shows that you were present at the exam.

At the end of the exam, you must hand in your Scantron sheet (or you will receive a mark of zero for the examination). You will be graded only on what appears on the Scantron sheet. You may take away the question paper after the exam, but whether you do or not, anything written on the question paper will not be considered in your grade.

1. A study was carried out to determine whether a new diet is effective in reducing cholesterol levels. Twenty subjects were recruited. For each subject, the cholesterol level was measured initially.
Each subject was placed on the new diet for one month, and then that subject’s cholesterol level was measured again. Which of the following methods of analysis is most appropriate?
(a) * One-sided matched pairs t test.
(b) Two-sided two-sample z test.
(c) Two-sided two-sample t test.
(d) One-sided two-sample t test.
(e) Two-sided matched pairs t test.

2. A medical treatment has a success probability of 0.7. Three patients will be treated with this treatment. Assuming the results are independent for the three patients, what is the probability that at least one of them will be successfully cured?
1 - 0.3^3 = 0.973, or use Table C with n = 3, p = 0.7.
(a) 0.70
(b) 0.50
(c) * 0.97
(d) 0.99
(e) 0.21

3. The distribution of the weight of chocolate bars produced by a certain machine has a mean of 8.1 oz and a standard deviation of 0.1 oz. The quality control manager plans to take a simple random sample from the production line. How big should the sample size n be so that the sampling distribution of the sample mean has standard deviation 0.02 oz?
sigma/sqrt(n) = 0.02 = 0.1/sqrt(n), so sqrt(n) = 5 and n = 25.
(a) * 25
(b) 100
(c) 5
(d) Cannot be determined unless we know the population follows a Normal distribution.
(e) 10

4. A study was conducted to examine the quality of fish after seven days in ice storage. In the output below, y denotes a measurement of fish quality (on a 10-point scale) and x denotes the time (in hours) after being caught that the fish were placed in ice packing. The scatterplot of y versus x showed a linear relationship. Some output is shown below.

Regression Analysis: y versus x

The regression equation is
y = 8.43 - 0.144 x

Predictor      Coef  SE Coef       T      P
Constant     8.4318   0.1018   82.86  0.000
x          -0.14402  0.01391  -10.36  0.000

S = 0.187052   R-Sq = 92.3%   R-Sq(adj) = 91.4%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       1  3.7524  3.7524  107.25  0.000
Residual Error   9  0.3149  0.0350
Total           10  4.0673

What is the value of the correlation between x and y?
sqrt(0.923) = 0.96072889, and the correlation is negative since the slope is negative.
(a) -0.923
(b) 0.923
(c) 0.961
(d) * -0.961
(e) 9.61

5. One of the fish specimens in Question 4 above was placed in ice packing 3 hours after being caught and had a fish quality rating of 6. What is the residual for this fish specimen?
Predicted quality is 8.43 - (0.144)(3) = 7.998 (or 8 using more accurate figures). Residual is 6 - 8 = -2.
(a) * -2
(b) 8
(c) 2
(d) 6
(e) -8

6. In the analysis in Question 4 above, a plot was also made of residuals against x. What would you expect this residual plot to look like?
(a) Has a downward non-linear trend.
(b) Has a "fanning-out" pattern.
(c) Has a linear trend.
(d) Has a curved pattern.
(e) * Has no pattern.

7. Two alloys, A and B, are used in the manufacture of steel bars. A study was carried out to compare the load capacities of bars made from the two alloys. (Load capacity is measured in tons.) The researchers collected data from simple random samples of 9 bars made from alloy A and 13 bars made from alloy B. Let μ_A denote the population mean load capacity of bars made from alloy A, and let μ_B denote the population mean load capacity of bars made from alloy B. You may assume that load capacities in the samples have normal distributions. Some output from an analysis is shown below. Sample 1 is alloy A, and sample 2 is alloy B.

Two-Sample T-Test and CI

Sample   N   Mean  StDev  SE Mean
1        9  28.50   2.49     0.83
2       13  26.20   1.80     0.50

Difference = mu (1) - mu (2)
Estimate for difference: 2.30000
95% CI for difference: (0.20135, 4.39865)
T-Test of difference = 0 (vs not =): T-Value = 2.37  P-Value = 0.034  DF = 13

Following are three statements based on this analysis. Each statement is either true or false.
I: The margin of error of the 95 percent confidence interval for μ_A - μ_B is greater than 2.2.
II: The P-value for the test of the null hypothesis μ_A = μ_B versus the alternative hypothesis μ_A > μ_B is 0.034.
III: The value 4.0 will be in the 99% confidence interval for μ_A - μ_B based on this data.
Which of statements I, II and III is (are) true?
I is false: ME = (4.39865 - 0.20135)/2 = 2.09865, which is not greater than 2.2. II is false: the one-sided P-value is 0.034/2 = 0.017. III is true: 4.0 is in the 95% CI and so will be in the 99% CI. Answer: (c), only statement III is true.
(a) only statement I is true.
(b) more than one of the three statements I, II and III is true.
(c) * only statement III is true.
(d) none of the three statements I, II and III is true.
(e) only statement II is true.

8. Using the information in Question 7 above, what can we say about the P-value of the test of the null hypothesis μ_A = 26 versus the alternative hypothesis μ_A > 26?
t = (28.5 - 26)/(2.49/sqrt(9)) = 3.012 with 8 d.f. The P-value for the test is between 0.005 and 0.01.
(a) between 0.01 and 0.02
(b) between 0.0025 and 0.005
(c) less than 0.0025
(d) greater than 0.02
(e) * between 0.005 and 0.01

9. Every morning, John tosses a fair coin. If the coin comes up heads, he goes jogging that day, and if it comes up tails, he does not go jogging. What is the probability that he goes jogging on exactly two days in the next week (7 days)?
Bin(7, 0.5): Table C, k = 2: p = 0.1641.
(a) 0.01
(b) 0.23
(c) 0.06
(d) * 0.16
(e) 0.50

10. The random variable X has a binomial distribution with mean 0.8. The standard deviation of X is also 0.8. What is the probability that X ≤ 2?
Mean = np = 0.8, variance = np(1-p) = 0.8^2 = 0.64. Divide to get 1-p = 0.64/0.8 = 0.8, so p = 0.2. Divide the equation for the mean by this to get n = 4. Use Table C with n = 4, p = 0.2: prob = 0.4096 + 0.4096 + 0.1536 = 0.9728. The normal approximation is no good here, but it gives z = (2 - 0.8)/0.8 = 1.5 and prob ≈ 0.933, which points to a wrong choice.
(a) 0.50
(b) 0.95
(c) * 0.97
(d) 0.90
(e) 0.21

11. The scores on an exam have a Normal distribution with mean 65. The third quartile of the distribution of scores is 75. Which of the following numbers is closest to the standard deviation of the distribution of scores?
Q3 = mean + 0.67 SD = 75, and so SD = (75 - 65)/0.67 = 14.93.
(a) 19
(b) 17
(c) 21
(d) 12
(e) * 15

12. An advertisement for a cold relief claims an 80% success rate. Eight patients selected at random from a large population of patients are given this cold relief. Assume that the claimed success rate is correct. Find the probability that this treatment will be successful for seven or more of these eight patients.
The number of successes is Bin(8, 0.80). Table C cannot be used with p = 0.80, so convert to failures: p becomes 1 - 0.80 = 0.20, and 7 or 8 successes becomes 1 or 0 failures: 0.1678 + 0.3355 = 0.5033.
(a) 0.80
(b) * 0.50
(c) 0.20
(d) 0.35
(e) 0.10

13. Researchers are studying the yield of a crop in two locations. The researchers are going to compute independent 90% confidence intervals for the mean yield at each location. What is the probability that neither of the researchers’ intervals will contain the true mean yields at their location?
Just 0.10 x 0.10 = 0.01.
(a) 0.90
(b) * 0.01
(c) 0.99
(d) 0.05
(e) 0.10

14. The histograms of two data sets (labelled A and B) are shown below:

[Figure: histograms of data sets A and B]

Descriptive statistics for data set A are also given below:

Descriptive Statistics: Data Set A

Variable     N   Mean  SE Mean  StDev     Q1     Q3
Data Set A  60  78.43     2.57  19.92  62.95  86.58

Which of the following could be the mean of data set B?
B is A shifted 12 units to the right, so the mean is 12 bigger.
(a) 56.43
(b) 58.43
(c) 78.43
(d) 98.43
(e) * 90.43

15. In Data Set A described in Question 14, how many observations are outliers according to the "1.5 times IQR" rule?
1.5 IQR = 1.5(86.58 - 62.95) = 35.445; Q1 - 35.445 = 27.505 and Q3 + 35.445 = 122.025. There are no observations below 27.505, but there are 1 + 3 = 4 observations in the last 2 bars (above 124), plus possibly some from the next bar down.
(a) cannot tell from the information given
(b) no outliers
(c) exactly 2 outliers
(d) exactly 1 outlier
(e) * more than 2 outliers

16. A community is considering building a new skateboard park, and wants to survey people’s opinions.
Younger people might have a different opinion about the skateboard park than older people, so, to ensure that each age group is properly represented, separate simple random samples are taken from the different age groups, and the results are combined. What is this sampling method called?
(a) quota sampling
(b) multistage sampling
(c) systematic sampling
(d) simple random sampling
(e) * stratified sampling

17. Surveys conducted over the telephone often have problems with nonresponse. Why does this happen?
(a) Not all households have phones, which causes bias.
(b) Some people have unlisted numbers and so cannot be part of the survey.
(c) * In many homes, when the survey company calls, no one is available to answer the phone.
(d) This is a form of voluntary-response sampling, and randomization should be used instead.

18. A weather forecaster says that the probability of rain on Monday, Tuesday and Wednesday of next week is 0.2, 0.6 and 0.3 respectively. Assuming that the weather forecaster is correct, what is the probability that it rains on at least one of those three days?
P(no rain) = (1 - 0.2) x (1 - 0.6) x (1 - 0.3) = 0.224, so P(rain on at least one day) = 1 - 0.224 = 0.776.
(a) * 0.8
(b) 0.6
(c) 0.9
(d) 0.7
(e) 0.5

19. In a casino game, you have probability 0.1 of winning. If you play the game 20 times (independently), what is the probability that you win 2 times or fewer?
From Table C, n = 20, p = 0.1: 0.1216 + 0.2702 + 0.2852 = 0.6770.
(a) * 0.65
(b) 0.30
(c) 0.45
(d) 0.05
(e) 0.20

20. In the casino game described in Question 19 above, suppose that if you win, you win $8, and if you lose, you lose $1. You play the game 15 times (independently). What is the probability that you lose more money than you win?
You must win 1 or 0 times: Table C, n = 15, p = 0.1: 0.2059 + 0.3432 = 0.5491.
(a) * 0.55
(b) 0.80
(c) 0.40
(d) 0.25
(e) 0.10

21. A multiple-choice examination has two parts: Part A and Part B. Part A has 8 questions, each with five choices. Part B has 10 questions, each with four choices.
In each question, only one of the choices is the correct answer. A student has not prepared for the examination at all, so answers each question by picking a choice at random. Let X denote the total number of questions the student gets correct. What is the standard deviation of X?
X is the sum of independent Y ~ Bin(8, 1/5) and Z ~ Bin(10, 1/4). Var(Y) = 8(1/5)(4/5) = 32/25 = 1.28 and Var(Z) = 10(1/4)(3/4) = 30/16 = 1.875. So X has variance 1.28 + 1.875 = 3.155 and SD approx 1.8.
(a) 2.9
(b) * 1.8
(c) 1.4
(d) 2.2
(e) 1.0

22. Aldrin is a highly toxic organic compound that can cause various cancers. Ten water specimens were taken from random locations in Wolf River, downstream from a toxic waste site. The sample mean concentration was 5.109 nanograms per litre. Assuming that the population standard deviation of concentrations is 0.9 nanograms per litre, calculate a 95% confidence interval for the population mean concentration. What is the lower limit of this confidence interval?
From Minitab:

One-Sample Z: conc
The assumed standard deviation = 0.9

Variable   N     Mean    StDev  SE Mean  95% CI
conc      10  5.01900  1.10440  0.28460  (4.46118, 5.57682)

The lower limit of the corresponding t interval is 4.23.
(a) 4.2
(b) * 4.5
(c) 5.8
(d) 5.1
(e) 5.6

23. A simple random sample was taken of 11 students in a class. The sampled students reported the number of hours per week they studied. The sample mean was 10.26 hours and the sample standard deviation was 6.22 hours. Calculate a 99% confidence interval for the mean study time of all the students in the class. What is the upper limit of the confidence interval?
This should be a t interval. Minitab gives (for t and z):

One-Sample T
 N     Mean   StDev  SE Mean  99% CI
11  10.2600  6.2200   1.8754  (4.3163, 16.2037)

One-Sample Z
The assumed standard deviation = 6.22
 N     Mean  SE Mean  99% CI
11  10.2600   1.8754  (5.4293, 15.0907)

(a) 12.8
(b) 15.1
(c) 10.3
(d) * 16.2
(e) 4.3

24.
When calculating a 95% confidence interval for a population mean, which is the most important situation where use of the t distribution would be better than use of the normal distribution?
(a) When the sample size is large.
(b) When the sample standard deviation is unknown.
(c) When the population standard deviation is known.
(d) When the sample size is small.
(e) * When the population standard deviation is unknown.

25. Some communities have installed red-light cameras at intersections. Drivers who fail to stop at a red light are photographed and receive a ticket in the mail. Red-light cameras do cut down on the number of right-angle collisions, but there are more rear-end collisions as drivers brake suddenly to avoid a ticket. A highway agency collected data on rear-end collisions at intersections before and after red-light cameras were installed; in particular only those collisions where there was injury. In each case, the type of injury was recorded, as shown below.

Type of injury    Before camera  After camera
Death/disabling              61            27
Evident injury              210           136
Possible injury            1659           845
Total                      1930          1008

What is the marginal proportion of evident injuries?
Grand total = 1930 + 1008 = 2938; total of evident injuries = 210 + 136 = 346; 346/2938 = 0.118.
(a) 0.34
(b) 0.03
(c) 0.66
(d) * 0.12
(e) 0.85

26. Look again at the data in Question 25. For those accidents that happened after the red-light cameras were installed, what is the conditional proportion of them that resulted in death or disabling injury?
27/1008 = 0.0268.
(a) * 0.029
(b) 0.109
(c) 0.135
(d) 0.032
(e) 0.268

27. Refer again to the situation described in Question 25. The conditional distributions of injury types are very similar before and after the red-light cameras were installed. What would you conclude from this?
(a) There was not much difference between the numbers of rear-end collisions before and after the red-light cameras were installed.
(b) There was a difference in the distribution of types of injury suffered before and after the installation of red-light cameras.
(c) Type of injury and before/after are positively associated.
(d) Type of injury and before/after are negatively associated.
(e) * Red-light cameras did not make much difference to the type of injury suffered in rear-end collisions.

28. The correlation coefficient is useful to decide whether:
(a) A normal distribution adequately describes the data.
(b) * There is a straight-line relationship between two variables.
(c) There is any relationship between two variables, possibly a curved one.
(d) One variable is the cause of another.
(e) The intercept of the best-fitting straight line is large or small.

29. A normal probability plot for some data is shown below. What do you conclude about the shape of the data distribution from this plot?

[Figure: normal probability plot]

(a) approximately normal
(b) * skewed to the left
(c) symmetric but not normal
(d) skewed to the right
(e) not linear

30. Researchers called medical specialists’ offices posing as new patients and requesting appointments for non-urgent problems. The waiting time, in days, was recorded for each request. Boxplots for two different samples of requests, labelled A and B, are shown below. The samples are the same size, and the distributions have symmetric shapes, so that the sample mean is very close to the sample median in each case. Which sample, A or B, offers more compelling evidence that the population mean waiting time exceeds 30 days?

[Figure: boxplots of samples A and B]

(a) It is impossible to tell from the boxplots.
(b) The two samples offer about the same evidence.
(c) * Sample B
(d) Sample A

31. The random variable Y has a mean of 5 and this probability distribution:

Value           4     8
Probability  0.75  0.25

What is the standard deviation of Y?
Mean is 4(0.75) + 8(0.25) = 3 + 2 = 5; variance is (4-5)^2(0.75) + (8-5)^2(0.25) = 0.75 + 2.25 = 3; SD is sqrt(3) = 1.73. Or: Y = 4*Bernoulli(0.25) + 4, so SD = 4*sqrt((0.25)(0.75)) = sqrt
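
Several of the binomial and standard-deviation answers worked above can be checked without Table C. The following is a minimal sketch, assuming Python 3.8 or later (for math.comb); the helper names binom_pmf, binom_cdf and discrete_sd, and the checks dictionary, are illustrative and are not part of the exam or its solutions.

```python
from math import comb, sqrt


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def discrete_sd(values, probs):
    """Standard deviation of a discrete random variable."""
    mean = sum(v * q for v, q in zip(values, probs))
    var = sum((v - mean) ** 2 * q for v, q in zip(values, probs))
    return sqrt(var)


# Hypothetical lookup keyed by question number (labels are ours).
checks = {
    "Q2": 1 - binom_pmf(0, 3, 0.7),       # at least one success in 3
    "Q9": binom_pmf(2, 7, 0.5),           # exactly 2 heads in 7 tosses
    "Q10": binom_cdf(2, 4, 0.2),          # X <= 2 with n = 4, p = 0.2
    "Q12": 1 - binom_cdf(6, 8, 0.8),      # 7 or more successes in 8
    "Q19": binom_cdf(2, 20, 0.1),         # 2 or fewer wins in 20 plays
    "Q20": binom_cdf(1, 15, 0.1),         # 0 or 1 wins in 15 plays
    "Q21": sqrt(8 * 0.2 * 0.8 + 10 * 0.25 * 0.75),  # SD of Y + Z
    "Q31": discrete_sd([4, 8], [0.75, 0.25]),
}

for question, value in checks.items():
    print(f"{question}: {value:.4f}")
```

Computing the probabilities directly also sidesteps the restriction noted in Question 12, where the table lookup had to be rewritten in terms of failures.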