# exsolf10.pdf

University of Toronto St. George

Statistical Sciences

STA302H1

Hadas Moshonov

Fall

UNIVERSITY OF TORONTO
Faculty of Arts and Science
DECEMBER EXAMINATIONS 2010
STA 302 H1F / STA 1001 HF
Duration - 3 hours
Aids Allowed: Calculator
LAST NAME: SOLUTIONS FIRST NAME:
STUDENT NUMBER:
• There are 19 pages including this page.
• The last page is a table of formulae that may be useful. For all questions you can assume
that the results on the formula page are known unless the question states otherwise.
• Pages 14 through 18 contain output from SAS that you will need to answer Question 5.
• Total marks: 85
1 | 2ab | 2cd | 3 | 4 | 5a | 5b | 5c | 5d(i-iii) | 5d(iv-vi) | 6 | 7, 8
1. (10 marks) Beside each description, write the letter of the term from the list below
that provides the best match.
(I) I What to include when the effect of $X_1$ on $Y$ is different for different values
of $X_2$.
(II) C The proportion of variation explained by the regression line.
(III) F An observed response minus its estimated mean according to some model.
(IV) Q A measure of how influential a particular observation is.
(V) N A method for estimating regression coefficients.
(VI) B A test comparing a model of interest to the model with only an intercept.
(VII) S A statistic used to identify problems of multicollinearity.
(VIII) any of D, Z, T A statistic for comparing models with different sets of explanatory
variables.
(IX) T Another name for the estimate of $\sigma^2$ in regression analysis.
(X) R A measure of how unusual the x-values are for a particular observation.
(A) Analysis of Variance (N) Least Squares
(B) Analysis of Variance F-test (O) Correlation
(C) R-squared (P) Degrees of freedom
(D) Adjusted R-squared (Q) Cook’s Distance
(E) t-test (R) Leverage
(F) Residual (S) Variance Inflation Factor
(G) Standardized residual (T) Residual mean square
(H) Fitted value (U) Mean square of regression
(I) Interaction (V) Residual sum of squares
(J) Indicator (W) Regression sum of squares
(K) Explanatory variable (X) Total sum of squares
(L) Response variable (Y) Variance
(M) Outlier (Z) Extra sum of squares
2. Suppose that we believe that a response variable $Y$ is related to a non-random
explanatory variable $x$ by the model $Y_i = \beta x_i + e_i$, $i = 1, \ldots, n$. That is, we believe
that it is appropriate to use a model that goes through the origin. Assume that the
following conditions hold:
• The errors $e_1, \ldots, e_n$ have expectation 0.
• The errors have common variance $\sigma^2$.
• The errors are uncorrelated.
(a) (3 marks) Show that the least squares estimator of $\beta$ is
$$\hat\beta = \sum_{i=1}^{n} x_i Y_i \Big/ \sum_{i=1}^{n} x_i^2$$
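The least squares estimate minimizes the residual sum of squares:
$$\mathrm{RSS} = \sum_{i=1}^{n} (Y_i - \beta x_i)^2$$
$$\frac{\partial\,\mathrm{RSS}}{\partial \beta} = -2 \sum_{i=1}^{n} x_i (Y_i - \beta x_i)$$
Setting equal to 0 and solving gives
$$\hat\beta = \frac{\sum x_i Y_i}{\sum x_i^2}$$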
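As a numerical sanity check on the derivation above, the following minimal sketch computes the closed-form estimator and confirms that the derivative of RSS vanishes there. The data values are hypothetical and chosen only for illustration; they do not come from the exam.

```python
# Hypothetical illustrative data (not from the exam).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

def rss(beta):
    """Residual sum of squares for the no-intercept model Y_i = beta * x_i + e_i."""
    return sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))

# Closed-form estimator: sum(x_i * Y_i) / sum(x_i^2).
beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

# The derivative -2 * sum(x_i * (Y_i - beta * x_i)) should vanish at beta_hat.
gradient_at_beta_hat = -2 * sum(xi * (yi - beta_hat * xi) for xi, yi in zip(x, y))

# RSS should be no smaller at nearby values of beta.
is_minimum = all(rss(beta_hat) <= rss(beta_hat + d) for d in (-0.1, -0.01, 0.01, 0.1))

print(beta_hat, gradient_at_beta_hat, is_minimum)
```

Because RSS is a quadratic in $\beta$ with positive leading coefficient $\sum x_i^2$, the stationary point is the unique minimum, which is what the nearby-values check illustrates.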
(b) (3 marks) Assuming that the model is correct, show that $\hat\beta$ is an unbiased
estimator of $\beta$.
$$E(\hat\beta) = \frac{\sum x_i E(Y_i)}{\sum x_i^2} = \frac{\sum x_i (\beta x_i)}{\sum x_i^2} = \beta$$
(c) (2 marks) Find $\mathrm{Var}(\hat\beta)$.
$$\mathrm{Var}(\hat\beta) = \left(\frac{1}{\sum x_i^2}\right)^{2} \sum x_i^2\, \mathrm{Var}(Y_i)$$
(since the $Y_i$'s are uncorrelated because the $e_i$'s are assumed uncorrelated)
$$= \frac{\sigma^2}{\sum x_i^2}$$
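The unbiasedness and variance results for the through-the-origin estimator can be checked exactly on a small example. The sketch below assumes, purely for illustration, that each error is +1 or -1 with equal probability and independently, which satisfies the stated conditions with $\sigma^2 = 1$. The x-values and the true slope are hypothetical.

```python
from itertools import product

# Hypothetical illustrative values (not from the exam).
x = [1.0, 2.0, 3.0]
beta = 2.0

# Assumption for illustration: each error is +1 or -1 with equal probability,
# independently, so E(e_i) = 0, Var(e_i) = sigma^2 = 1, and errors are uncorrelated.
sigma2 = 1.0
sxx = sum(xi ** 2 for xi in x)

estimates = []
for errors in product([-1.0, 1.0], repeat=len(x)):
    y = [beta * xi + ei for xi, ei in zip(x, errors)]
    estimates.append(sum(xi * yi for xi, yi in zip(x, y)) / sxx)

# All 2^n error patterns are equally likely, so plain averages are exact expectations.
mean_beta_hat = sum(estimates) / len(estimates)
var_beta_hat = sum((b - mean_beta_hat) ** 2 for b in estimates) / len(estimates)

print(mean_beta_hat, beta)          # expectation of the estimator vs. true slope
print(var_beta_hat, sigma2 / sxx)   # variance of the estimator vs. sigma^2 / sum(x_i^2)
```

Enumerating every error pattern avoids Monte Carlo noise, so the two printed pairs should agree up to floating-point rounding.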
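(d) (2 marks) Suppose that the model $Y_i = \beta x_i + e_i$ is correct, but the model
$Y_i = \beta_0 + \beta_1 x_i + e_i$ is used. Show that $\mathrm{Var}(\hat\beta_1) \ge \mathrm{Var}(\hat\beta)$.
$$\mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum x_i^2 - n\bar{x}^2} \ge \frac{\sigma^2}{\sum x_i^2}$$
since $\sum x_i^2 - n\bar{x}^2 \le \sum x_i^2$.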
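To make the comparison in part (d) concrete, here is a small sketch that evaluates both variance formulas. The x-values and $\sigma^2$ are hypothetical, chosen only to illustrate the inequality.

```python
# Hypothetical illustrative values (not from the exam).
x = [1.0, 2.0, 3.0, 4.0]
sigma2 = 1.0

n = len(x)
x_bar = sum(x) / n
sum_x2 = sum(xi ** 2 for xi in x)

# Slope variance when an intercept is (unnecessarily) fitted.
var_with_intercept = sigma2 / (sum_x2 - n * x_bar ** 2)
# Slope variance for the regression through the origin.
var_through_origin = sigma2 / sum_x2

print(var_with_intercept, var_through_origin)
```

The two variances coincide only when $\bar{x} = 0$; otherwise fitting the unneeded intercept inflates the variance of the slope estimator.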
3. A multiple linear regression model with dependent variable $Y$ and 3 explanatory
variables was fit to 15 observations. The residual sum of squares was found to be 22.0
and it was also found that
$$(X'X)^{-1} = \begin{bmatrix} 0.5 & 0.3 & 0.2 & 0.6 \\ 0.3 & 6.0 & 0.5 & 0.4 \\ 0.2 & 0.5 & 0.2 & 0.7 \\ 0.6 & 0.4 & 0.7 & 3.0 \end{bmatrix}$$
(a) (1 mark) What degrees of freedom would be used when finding a confidence
interval for $\beta_1$?
$n - (p + 1) = 15 - 4 = 11$
(b) (1 mark) What is the estimate of the error variance?
$22/11 = 2$
(c) (1 mark) What is the estimated variance of the estimator of $\beta_2$?
$2(0.2) = 0.4$
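The arithmetic in Question 3 can be reproduced in a few lines. This sketch assumes the rows and columns of $(X'X)^{-1}$ are ordered $\beta_0, \beta_1, \beta_2, \beta_3$, which is consistent with the solution using the diagonal entry 0.2 for $\beta_2$.

```python
n = 15        # observations
p = 3         # explanatory variables
rss = 22.0    # residual sum of squares

# (X'X)^{-1} as given in the question; assumed ordering beta_0, ..., beta_3.
xtx_inv = [
    [0.5, 0.3, 0.2, 0.6],
    [0.3, 6.0, 0.5, 0.4],
    [0.2, 0.5, 0.2, 0.7],
    [0.6, 0.4, 0.7, 3.0],
]

df = n - (p + 1)                 # residual degrees of freedom
s2 = rss / df                    # estimate of the error variance
var_beta2 = s2 * xtx_inv[2][2]   # estimated variance of the estimator of beta_2

print(df, s2, var_beta2)
```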
4. Consider the multiple regression model
$$Y = X\beta + e, \quad e \sim N(0, \sigma^2 I)$$
(a) (3 marks) Show that $\hat{e} = (I - H)e$.
$$\begin{aligned}
\hat{e} &= (I - H)Y \\
&= (I - H)(X\beta + e) \\
&= X\beta + e - X(X'X)^{-1}X'X\beta - He \\
&= (I - H)e
\end{aligned}$$
(b) (1 mark) Why is $E(ee') = \mathrm{Var}(e)$?
$\mathrm{Var}(e) = E(ee') - E(e)(E(e))'$ (from the formula sheet)
and $E(e) = 0$
(c) (4 marks) Show that $I - H$ is idempotent and symmetric.
Idempotent:
$$\begin{aligned}
(I - H)^2 &= I - 2H + H^2 \\
&= I - 2H + \left(X(X'X)^{-1}X'\right)\left(X(X'X)^{-1}X'\right) \\
&= I - 2H + X(X'X)^{-1}X' \\
&= I - 2H + H = I - H
\end{aligned}$$
Symmetric:
$$\begin{aligned}
(I - H)' &= I' - H' \\
&= I - \left(X(X'X)^{-1}X'\right)' \\
&= I - X(X'X)^{-1}X' \\
&= I - H
\end{aligned}$$
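(d) (3 marks) Show that $\mathrm{Var}(\hat{e} \mid X) = \sigma^2 (I - H)$.
$$\begin{aligned}
\mathrm{Var}(\hat{e} \mid X) &= \mathrm{Var}((I - H)e \mid X) && \text{from (a)} \\
&= (I - H)\,\mathrm{Var}(e)\,(I - H)' \\
&= (I - H)\,\sigma^2 I\,(I - H)' \\
&= \sigma^2 (I - H) && \text{using (c)}
\end{aligned}$$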
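The matrix identities in Question 4 can be verified exactly on a small example. The sketch below uses a hypothetical design matrix (an intercept column plus one explanatory variable) and rational arithmetic, so the equality checks are exact rather than approximate. The helper names `matmul` and `transpose` are introduced here only for illustration.

```python
from fractions import Fraction

# Hypothetical design matrix (intercept column plus one x column), exact arithmetic.
X = [[Fraction(1), Fraction(v)] for v in (1, 2, 4, 7)]
n = len(X)

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Invert the 2x2 matrix X'X with the closed-form adjugate formula.
(a, b), (c, d) = matmul(transpose(X), X)
det = a * d - b * c
xtx_inv = [[d / det, -b / det], [-c / det, a / det]]

# Hat matrix H = X (X'X)^{-1} X' and the residual-maker M = I - H.
H = matmul(matmul(X, xtx_inv), transpose(X))
M = [[Fraction(int(i == j)) - H[i][j] for j in range(n)] for i in range(n)]

is_idempotent = matmul(M, M) == M
is_symmetric = transpose(M) == M
annihilates_X = all(v == 0 for row in matmul(M, X) for v in row)  # (I - H) X = 0

print(is_idempotent, is_symmetric, annihilates_X)
```

The last check, $(I - H)X = 0$, is the step that removes the $X\beta$ term in part (a).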
5. The data considered in this question are the same data considered in Assignment 1,
taken from a 2007 Wall Street Journal article on the decline of U.S. house prices. The
data are indicators of the real-estate market in 28 U.S. cities. The variables considered
in this question are:
Response variable:
• PriceChange: The percent change in average price of a home from one year ago.
Explanatory variables:
• LoansOverdue: The percentage of mortgage loans that are 30 days or more overdue.
• InventoryChange: The percent change in housing inventory from one year ago. A
positive value indicates that more houses are on the market.
• EmployOutlook: A character variable that classifies the projected growth in the
number of jobs as one of Strong, Average, or Weak. (An observation that had an
employment outlook of Very Weak in the original data has been re-classified as Weak.)
• iEmployOutIsWeak: An indicator variable that is 1 if EmployOutlook is Weak and
0 otherwise.
• iEmployOutIsAverage: An indicator variable that is 1 if EmployOutlook is Average
and 0 otherwise.
• iEmpWeak LoansOD: The product of iEmployOutIsWeak and LoansOverdue.
• iEmpAvg LoansOD: The product of iEmployOutIsAverage and LoansOverdue.
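The indicator and product variables defined in the list above could be constructed along the following lines. This is only a sketch: the three rows are hypothetical values made up for illustration and are not the Wall Street Journal data, and the exam's own analysis was done in SAS rather than Python.

```python
# Hypothetical rows for illustration only (not the Wall Street Journal data).
cities = [
    {"EmployOutlook": "Strong", "LoansOverdue": 2.0},
    {"EmployOutlook": "Average", "LoansOverdue": 3.5},
    {"EmployOutlook": "Weak", "LoansOverdue": 5.0},
]

for row in cities:
    # Indicators are 1 when the category matches and 0 otherwise.
    # Strong has no indicator of its own, so it acts as the baseline category.
    row["iEmployOutIsWeak"] = int(row["EmployOutlook"] == "Weak")
    row["iEmployOutIsAverage"] = int(row["EmployOutlook"] == "Average")
    # Product (interaction) terms: indicator times LoansOverdue.
    row["iEmpWeak LoansOD"] = row["iEmployOutIsWeak"] * row["LoansOverdue"]
    row["iEmpAvg LoansOD"] = row["iEmployOutIsAverage"] * row["LoansOverdue"]

for row in cities:
    print(row)
```

With these terms in a regression, the product variables let the slope on LoansOverdue differ between employment-outlook groups, while the indicators let the intercept differ.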
On pages 14 through 18 there is SAS output for the analysis of these data. The
questions below relate to the SAS output.
(a) ANALYSIS 1 (page 14) was carried out using only observations having EmployOutlook
either Strong or Weak. (That is, cities with Average employment outlook were
removed from the data for this analysis only.) The questions in part (a) relate
to ANALYSIS 1.
i. (2 marks) What is the estimated difference in the mean of percent change
in average price of a home between cities with Strong and cities with Weak
employment outlook?
5.645% with cities with weak outlook having the smaller (negative) mean
percent change.
ii. (2 marks) Can you conclude that there is a difference in the mean of percent
change in average price of a home between cities with Strong a
