Department: Statistics
Course: STAT 331
Professor: Kun Liang
Semester: Fall

Description
Stat 331 Tutorial 3: House Price Example
Kun Liang, M3 4201

Model diagnostics: Why?

[Figure: Anscombe (1973) data, four scatterplots of y against x (panels 1 to 4)]

House price example

- The objective of the example was to predict house price based on
  - size of home in square feet (Size)
  - number of bedrooms (Beds)
  - number of bathrooms (Baths)
  - whether the home is new (New: 1 = yes, 0 = no)
  - annual tax bill in dollars (Taxes).
- The data were collected for 100 homes sold in Gainesville, Florida, in fall 2006.
- Consider a multiple regression model of the selling price (y) on three explanatory variables: Size (x1), New (x2) and Taxes (x3).

Read data

    > head(hp)
      case Taxes Beds Baths New  Price Size
    1    1  3104    4     2   0 279900 2048
    2    2  1173    2     1   0 146500  912
    3    3  3076    4     2   0 237700 1654
    4    4  1608    3     2   0 200000 2068
    5    5  1454    3     3   0 159900 1477
    6    6  2997    3     2   1 499900 3153
    > dim(hp)
    [1] 100   7

The data frame hp holds 100 homes and 7 columns. For the plots and models that follow, hp is reduced to the four model variables Price, Size, New and Taxes.

    > plot(hp$Size, hp$Price, xlab="Size", ylab="Price")

[Figure: scatterplot of Price against Size]

Look at data

    > pairs(hp)

[Figure: pairwise scatterplots of Price, Size, New and Taxes]

Even better

    > library(car)
    > scatterplotMatrix(hp)

[Figure: scatterplot matrix of Price, Size, New and Taxes]

Linear model

Consider the model $\text{Price} = \beta_0 + \beta_1\,\text{Size} + \beta_2\,\text{New} + \beta_3\,\text{Taxes} + \varepsilon$.

    > fit <- lm(Price ~ Size + New + Taxes, data = hp)
    > summary(fit)
    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -21353.776  13311.487  -1.604  0.11196
    Size            61.704     12.499   4.937 3.35e-06 ***
    New          46373.703  16459.019   2.818  0.00588 **
    Taxes           37.231      6.735   5.528 2.78e-07 ***

    Residual standard error: 47170 on 96 degrees of freedom
    Multiple R-squared:  0.7896,  Adjusted R-squared:  0.783
    F-statistic: 120.1 on 3 and 96 DF,  p-value: < 2.2e-16

Residual vs fitted

    > plot(fitted(fit), rstudent(fit), xlab="fitted", ylab="residuals")

[Figure: studentized residuals of fit against fitted values]

Transformation?

    > library(MASS)
    > boxcox(fit, lambda = seq(-1, 1, 1/20))

[Figure: Box-Cox profile log-likelihood against λ for λ from -1 to 1, with the 95% confidence interval marked]

Linear model

Taking λ = 1/2 in the Box-Cox family gives the square-root transformation, so consider the model $\sqrt{\text{Price}} = \beta_0 + \beta_1\,\text{Size} + \beta_2\,\text{New} + \beta_3\,\text{Taxes} + \varepsilon$.

    > fit2 <- lm(sqrt(Price) ~ Size + New + Taxes, data = hp)
    > summary(fit2)
    Residuals:
         Min       1Q   Median       3Q      Max
    -155.478  -33.695    4.374   31.780  145.241

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) 1.887e+02  1.450e+01  13.012  < 2e-16 ***
    Size        6.084e-02  1.362e-02   4.468 2.15e-05 ***
    New         4.712e+01  1.793e+01   2.628     0.01 *
    Taxes       4.476e-02  7.337e-03   6.101 2.21e-08 ***

    Residual standard error: 51.39 on 96 degrees of freedom
    Multiple R-squared:  0.791,  Adjusted R-squared:  0.7844
    F-statistic: 121.1 on 3 and 96 DF,  p-value: < 2.2e-16

Residual vs Size

    > plot(hp$Size, rstudent(fit2), xlab="Size", ylab="residuals")

[Figure: studentized residuals of fit2 against Size]

Residual vs Taxes

    > plot(hp$Taxes, rstudent(fit2), xlab="Taxes", ylab="residuals")

[Figure: studentized residuals of fit2 against Taxes]

Residual vs fitted

    > plot(fitted(fit2), rstudent(fit2), xlab="fitted", ylab="residuals")

[Figure: studentized residuals of fit2 against fitted values]
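The Anscombe slide answers "Model diagnostics: Why?" with pictures. As a minimal sketch in R, using the copy of Anscombe's quartet that ships with base R as the data frame anscombe (columns x1 to x4 and y1 to y4), one can fit a simple regression to each of the four x/y pairs. The fitted intercepts, slopes and R-squared values come out nearly identical even though the scatterplots look nothing alike, which is why residuals and plots are checked and not only the summary() output. The names fits and summary_table are illustrative.

```r
# Anscombe's quartet is available in base R (datasets package) as `anscombe`.
data(anscombe)

# Fit y_i ~ x_i separately for each of the four data sets.
fits <- lapply(1:4, function(i) {
  lm(as.formula(paste0("y", i, " ~ x", i)), data = anscombe)
})

# Collect intercept, slope and R-squared for each fit, one row per data set.
summary_table <- t(sapply(fits, function(m) {
  c(intercept = unname(coef(m)[1]),
    slope     = unname(coef(m)[2]),
    r.squared = summary(m)$r.squared)
}))
rownames(summary_table) <- paste("set", 1:4)
print(round(summary_table, 3))

# Plot each data set with its fitted line: the fits agree, the pictures do not.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
  abline(fits[[i]])
}
par(op)
```

All four rows of summary_table show an intercept close to 3, a slope close to 0.5 and an R-squared close to 0.67.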
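As a check on how the columns of the summary(fit) output relate, the sketch below recomputes them from the printed, rounded numbers: each t value is Estimate / Std. Error, Pr(>|t|) is the two-sided tail probability of a t distribution on n - k - 1 = 96 residual degrees of freedom, and the F-statistic and adjusted R-squared follow from the multiple R-squared. Because the inputs are rounded, the results match the printed output only to the displayed precision; the variable names are illustrative.

```r
# Numbers copied from the summary(fit) output (rounded as printed).
est <- c(Intercept = -21353.776, Size = 61.704, New = 46373.703, Taxes = 37.231)
se  <- c(Intercept = 13311.487,  Size = 12.499, New = 16459.019, Taxes = 6.735)
n <- 100             # number of homes
k <- 3               # explanatory variables: Size, New, Taxes
df_res <- n - k - 1  # residual degrees of freedom (96)

# t value = Estimate / Std. Error; two-sided p-value on df_res degrees of freedom.
t_val <- est / se
p_val <- 2 * pt(-abs(t_val), df = df_res)
print(data.frame(t = round(t_val, 3), p = signif(p_val, 3)))

# Overall F test of H0: beta1 = beta2 = beta3 = 0, and adjusted R-squared,
# both computed from the multiple R-squared.
r2 <- 0.7896
f_stat <- (r2 / k) / ((1 - r2) / df_res)
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res
print(c(F = round(f_stat, 1), adj.R2 = round(adj_r2, 4)))
```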
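The boxcox() call plots a profile log-likelihood over λ for the Box-Cox family, which transforms the response to (y^λ - 1)/λ, or log y when λ = 0. The sketch below computes that profile by hand, l(λ) = -n/2 log RSS(λ) + (λ - 1) Σ log y_i up to an additive constant, and cross-checks it against MASS::boxcox(..., plotit = FALSE), which returns the λ grid and log-likelihood as components x and y. Since the house price file is not reproduced here, it assumes R's built-in cars data (stopping distance on speed) as a stand-in; boxcox_profile and fit_cars are hypothetical names. The 95% interval uses the cutoff the boxcox plot marks, max l(λ) - qchisq(0.95, 1)/2.

```r
library(MASS)  # boxcox(), used here only as a cross-check

# Stand-in data (assumption): built-in `cars`; Box-Cox needs a positive response.
fit_cars <- lm(dist ~ speed, data = cars)

y <- cars$dist
qrX <- qr(model.matrix(fit_cars))  # QR decomposition of the design matrix
n <- length(y)
lambda <- seq(-1, 1, 1/20)         # same grid as in the tutorial

# Profile log-likelihood, up to an additive constant:
#   l(lambda) = -n/2 * log(RSS(lambda)) + (lambda - 1) * sum(log(y))
boxcox_profile <- function(la) {
  yt <- if (abs(la) < 1e-12) log(y) else (y^la - 1) / la
  rss <- sum(qr.resid(qrX, yt)^2)
  -n / 2 * log(rss) + (la - 1) * sum(log(y))
}
manual <- sapply(lambda, boxcox_profile)

# MASS::boxcox returns list(x = lambda, y = log-likelihood).
bc <- boxcox(fit_cars, lambda = lambda, plotit = FALSE)

# Grid maximiser and approximate 95% interval for lambda.
lambda_hat <- lambda[which.max(manual)]
ci <- range(lambda[manual > max(manual) - qchisq(0.95, 1) / 2])
print(c(lambda_hat = lambda_hat, lower = ci[1], upper = ci[2]))
```

If the interval contains 1/2, the square-root transformation used for fit2 is a natural, easily interpreted choice within it.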
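The residual plots use rstudent(), the externally studentized residuals r_i = e_i / (s_(i) sqrt(1 - h_ii)), where h_ii is the leverage of case i and s_(i) is the residual standard error with case i left out. A minimal sketch, again assuming the built-in cars data as a stand-in for the house price data, computes them without refitting n models via the identity s_(i)^2 = ((n - p) s^2 - e_i^2 / (1 - h_ii)) / (n - p - 1), then checks the result against rstudent(). The names fit_cars, s2_loo and r_stud are illustrative.

```r
# Stand-in model on built-in data (assumption: the house price file is not included).
fit_cars <- lm(dist ~ speed, data = cars)

e  <- residuals(fit_cars)        # ordinary residuals e_i
h  <- hatvalues(fit_cars)        # leverages h_ii
n  <- nobs(fit_cars)
p  <- length(coef(fit_cars))     # number of regression coefficients
s2 <- sum(e^2) / (n - p)         # usual estimate of sigma^2

# Leave-one-out variance estimate for every case at once, no refitting:
#   s_(i)^2 = ((n - p) s^2 - e_i^2 / (1 - h_ii)) / (n - p - 1)
s2_loo <- ((n - p) * s2 - e^2 / (1 - h)) / (n - p - 1)

# Externally studentized residuals.
r_stud <- e / sqrt(s2_loo * (1 - h))
print(all.equal(unname(r_stud), unname(rstudent(fit_cars))))

# Same style of diagnostic plot as the tutorial's "Residual vs fitted" slides.
plot(fitted(fit_cars), r_stud, xlab = "fitted", ylab = "studentized residuals")
abline(h = 0, lty = 2)
```

Because each r_i uses a variance estimate that excludes case i, a single outlying home cannot hide its own large residual by inflating s.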
