
University of Toronto St. George
Statistical Sciences
Hadas Moshonov

UNIVERSITY OF TORONTO
Faculty of Arts and Science
DECEMBER EXAMINATIONS 2010
STA 302 H1F / STA 1001 HF
Duration - 3 hours
Aids Allowed: Calculator

LAST NAME: SOLUTIONS
FIRST NAME:
STUDENT NUMBER:

• There are 19 pages including this page.
• The last page is a table of formulae that may be useful. For all questions you can assume that the results on the formula page are known unless the question states otherwise.
• Pages 14 through 18 contain output from SAS that you will need to answer Question 5.
• Total marks: 85

Marking grid: 1 | 2ab | 2cd | 3 | 4 | 5a | 5b | 5c | 5d(i-iii) | 5d(iv-vi) | 6 | 7, 8

1. (10 marks) Beside each description, write the letter of the term from the list below that provides the best match.

(I) [I] What to include when the effect of $X_1$ on $Y$ is different for different values of $X_2$.
(II) [C] The proportion of variation explained by the regression line.
(III) [F] An observed response minus its estimated mean according to some model.
(IV) [Q] A measure of how influential a particular observation is.
(V) [N] A method for estimating regression coefficients.
(VI) [B] A test comparing a model of interest to the model with only an intercept.
(VII) [S] A statistic used to identify problems of multicollinearity.
(VIII) [any of D, Z, T] A statistic for comparing models with different sets of explanatory variables.
(IX) [T] Another name for the estimate of $\sigma^2$ in regression analysis.
(X) [R] A measure of how unusual the x-values are for a particular observation.

(A) Analysis of Variance
(B) Analysis of Variance F-test
(C) R-squared
(D) Adjusted R-squared
(E) t-test
(F) Residual
(G) Standardized residual
(H) Fitted value
(I) Interaction
(J) Indicator
(K) Explanatory variable
(L) Response variable
(M) Outlier
(N) Least Squares
(O) Correlation
(P) Degrees of freedom
(Q) Cook's Distance
(R) Leverage
(S) Variance Inflation Factor
(T) Residual mean square
(U) Mean square of regression
(V) Residual sum of squares
(W) Regression sum of squares
(X) Total sum of squares
(Y) Variance
(Z) Extra sum of squares

2.
Suppose that we believe that a response variable $Y$ is related to a non-random explanatory variable $x$ by the model $Y_i = \beta x_i + e_i$, $i = 1, \ldots, n$. That is, we believe that it is appropriate to use a model that goes through the origin. Assume that the following conditions hold:

• The errors $e_1, \ldots, e_n$ have expectation 0.
• The errors have common variance $\sigma^2$.
• The errors are uncorrelated.

(a) (3 marks) Show that the least squares estimator of $\beta$ is

$\hat\beta = \sum_{i=1}^{n} x_i Y_i \Big/ \sum_{i=1}^{n} x_i^2$

The least squares estimate minimizes the residual sum of squares:

$RSS = \sum (Y_i - \beta x_i)^2$

$\frac{\partial RSS}{\partial \beta} = -2 \sum x_i (Y_i - \beta x_i)$

Setting this equal to 0 and solving gives

$\hat\beta = \frac{\sum x_i Y_i}{\sum x_i^2}$

(b) (3 marks) Assuming that the model is correct, show that $\hat\beta$ is an unbiased estimator of $\beta$.

$E(\hat\beta) = \frac{\sum x_i E(Y_i)}{\sum x_i^2} = \frac{\sum x_i (\beta x_i)}{\sum x_i^2} = \beta$

(Question 2 continued.)

(c) (2 marks) Find $\mathrm{Var}(\hat\beta)$.

$\mathrm{Var}(\hat\beta) = \left( \frac{1}{\sum x_i^2} \right)^2 \sum x_i^2 \, \mathrm{Var}(Y_i)$

(since the $Y_i$'s are assumed uncorrelated because the $e_i$'s are assumed uncorrelated)

$= \frac{\sigma^2}{\sum x_i^2}$

(d) (2 marks) Suppose that the model $Y_i = \beta x_i + e_i$ is correct, but the model $Y_i = \beta_0 + \beta_1 x_i + e_i$ is used. Show that $\mathrm{Var}(\hat\beta_1) \ge \mathrm{Var}(\hat\beta)$.

$\mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum x_i^2 - n\bar{x}^2} \ge \frac{\sigma^2}{\sum x_i^2}$

since $\sum x_i^2 - n\bar{x}^2 \le \sum x_i^2$.

3. A multiple linear regression model with dependent variable $Y$ and 3 explanatory variables was fit to 15 observations. The residual sum of squares was found to be 22.0 and it was also found that

$(X'X)^{-1} = \begin{pmatrix} 0.5 & 0.3 & 0.2 & 0.6 \\ 0.3 & 6.0 & 0.5 & 0.4 \\ 0.2 & 0.5 & 0.2 & 0.7 \\ 0.6 & 0.4 & 0.7 & 3.0 \end{pmatrix}$

(a) (1 mark) What degrees of freedom would be used when finding a confidence interval for $\beta_1$?

$n - (p + 1) = 15 - 4 = 11$

(b) (1 mark) What is the estimate of the error variance?

$22/11 = 2$

(c) (1 mark) What is the estimated variance of the estimator of $\beta_2$?

$2(0.2) = 0.4$

4. Consider the multiple regression model

$Y = X\beta + e, \quad e \sim N(0, \sigma^2 I)$

(a) (3 marks) Show that $\hat{e} = (I - H)e$.

$\hat{e} = (I - H)Y = (I - H)(X\beta + e) = X\beta + e - X(X'X)^{-1}X'X\beta - He = (I - H)e$

(b) (1 mark) Why is $E(ee') = \mathrm{Var}(e)$?
$\mathrm{Var}(e) = E(ee') - E(e)(E(e))'$ (from the formula sheet) and $E(e) = 0$.

(c) (4 marks) Show that $I - H$ is idempotent and symmetric.

Idempotent:

$(I - H)^2 = I - 2H + H^2$
$= I - 2H + \left( X(X'X)^{-1}X' \right)\left( X(X'X)^{-1}X' \right)$
$= I - 2H + X(X'X)^{-1}X'$
$= I - 2H + H$
$= I - H$

Symmetric:

$(I - H)' = I' - H'$
$= I - \left( X(X'X)^{-1}X' \right)'$
$= I - X(X'X)^{-1}X'$
$= I - H$

(d) (3 marks) Show that $\mathrm{Var}(\hat{e} \mid X) = \sigma^2 (I - H)$.

$\mathrm{Var}(\hat{e} \mid X) = \mathrm{Var}((I - H)e \mid X)$ from (a)
$= (I - H)\,\mathrm{Var}(e)\,(I - H)'$
$= (I - H)\,\sigma^2 I\,(I - H)'$
$= \sigma^2 (I - H)$ using (c)

5. The data considered in this question are the same data considered in Assignment 1, taken from a 2007 Wall Street Journal article on the decline of U.S. house prices. The data are indicators of the real-estate market in 28 U.S. cities. The variables considered in this question are:

Response variable:
• PriceChange: The percent change in average price of a home from one year ago.

Explanatory variables:
• LoansOverdue: The percentage of mortgage loans that are 30 days or more overdue.
• InventoryChange: The percent change in housing inventory from one year ago. A positive value indicates that more houses are on the market.
• EmployOutlook: A character variable that classifies the projected growth in the number of jobs as one of Strong, Average, or Weak. (An observation that had an employment outlook of Very Weak in the original data has been re-classified as Weak.)
• iEmployOutIsWeak: An indicator variable that is 1 if EmployOutlook is Weak and 0 otherwise.
• iEmployOutIsAverage: An indicator variable that is 1 if EmployOutlook is Average and 0 otherwise.
• iEmpWeak_LoansOD: The product of iEmployOutIsWeak and LoansOverdue.
• iEmpAvg_LoansOD: The product of iEmployOutIsAverage and LoansOverdue.

On pages 14 through 18 there is SAS output for the analysis of these data. The questions below relate to the SAS output.

(a) ANALYSIS 1 (page 14) was carried out using only observations having EmployOutlook either Strong or Weak.
(That is, cities with Average employment outlook were removed from the data for this analysis only.) The questions in part (a) relate to ANALYSIS 1.

i. (2 marks) What is the estimated difference in the mean of percent change in average price of a home between cities with Strong and cities with Weak employment outlook?

5.645%, with cities with Weak outlook having the smaller (negative) mean percent change.

ii. (2 marks) Can you conclude that there is a difference in the mean of percent change in average price of a home between cities with Strong a
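
The algebra in Questions 2 and 3 can be checked numerically. The sketch below is only an illustration: the x-values, the true slope of 3 and the error variance of 2 used for Question 2 are made up and are not exam data, and the function names are hypothetical. The Question 3 figures (15 observations, 3 explanatory variables, residual sum of squares 22.0, diagonal entry 0.2) come from the question itself.

```python
# Illustrative check of Questions 2 and 3; the Question 2 data are invented.

def beta_hat_origin(x, y):
    """Least squares slope for the through-origin model Y_i = beta * x_i + e_i."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def var_beta_hat_origin(x, sigma2):
    """Var(beta_hat) = sigma^2 / sum(x_i^2), the result of 2(c)."""
    return sigma2 / sum(xi * xi for xi in x)

def var_beta1_hat_intercept(x, sigma2):
    """Var(beta1_hat) = sigma^2 / (sum(x_i^2) - n * xbar^2), used in 2(d)."""
    n = len(x)
    xbar = sum(x) / n
    return sigma2 / (sum(xi * xi for xi in x) - n * xbar ** 2)

# Hypothetical x-values and error variance (assumptions, not from the exam)
x = [1.0, 2.0, 3.0, 4.0]
sigma2 = 2.0

# With noise-free responses y_i = 3 * x_i the estimator returns the slope
y = [3.0 * xi for xi in x]
slope = beta_hat_origin(x, y)

# 2(d): fitting an unnecessary intercept cannot reduce the slope variance
v_origin = var_beta_hat_origin(x, sigma2)
v_intercept = var_beta1_hat_intercept(x, sigma2)

# Question 3: degrees of freedom, error variance, Var of beta_2 estimator
n, p, rss = 15, 3, 22.0
df = n - (p + 1)
sigma2_hat = rss / df
c22 = 0.2  # diagonal entry of (X'X)^{-1} corresponding to beta_2
var_beta2_hat = sigma2_hat * c22

print(slope, v_origin, v_intercept, df, sigma2_hat, var_beta2_hat)
```

Pure Python is used so the check has no dependencies. The comparison `v_intercept >= v_origin` holds for any x-values with non-zero spread, because subtracting $n\bar{x}^2$ can only shrink the denominator.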