Chapter 6 - Prediction
Matthias Schonlau
Stat 231

Outline
• One sample prediction
• Regression prediction
  – Example: Professor Cotton's Marathon

Prediction of a single obs vs Averages
• We have made inferences for averages, and for regression parameters α and β.
• The purpose of prediction is to make inference for a single new observation.
• Single observations are more variable than averages.

Example: Driving to Toronto
• Give a confidence interval for the average driving time from Waterloo to Toronto
  – If you plan for the average driving time, you will arrive late half the time and early the other half.
• You have tickets for the Toronto fringe theater festival. Late arrivers are not seated. Give a confidence interval for driving to Toronto for this one drive.
• The confidence interval for one particular drive is wider than that for the average.

One sample Prediction
• Assume the one sample model $Y = \mu + R$, $R \sim G(0, \sigma)$
• So far, we have learned how to
  – estimate μ and σ
  – quantify the uncertainty of the estimation (confidence interval)
  – carry out hypothesis tests
• All this refers to already collected data.
• What if we want to predict a new data point?
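To make the Toronto example concrete, here is a minimal sketch comparing the interval for the average drive with the interval for one particular drive. It assumes the standard one-sample formulas $\hat\mu \pm c\,\hat\sigma/\sqrt{n}$ (average) and $\hat\mu \pm c\,\hat\sigma\sqrt{1 + 1/n}$ (single new observation). The driving times are made-up illustration data, not from the course, and the critical value c = 2.262 is the usual table value for a two-sided 95% interval with 9 d.f.

```python
import math
import statistics

# Hypothetical driving times Waterloo -> Toronto, in minutes
# (made-up illustration data, not from the course).
times = [72, 85, 78, 90, 81, 76, 95, 83, 79, 88]

n = len(times)
mu_hat = statistics.mean(times)       # estimate of mu
sigma_hat = statistics.stdev(times)   # estimate of sigma (divisor n - 1)

# Two-sided 95% critical value of the t distribution with n - 1 = 9 d.f.
# (assumed here from a standard t table rather than computed).
c = 2.262

# Half-width of the interval for the AVERAGE driving time.
half_mean = c * sigma_hat / math.sqrt(n)

# Half-width of the interval for ONE NEW drive:
# single-point variability plus uncertainty in the estimate of mu.
half_pred = c * sigma_hat * math.sqrt(1 + 1 / n)

ci_mean = (mu_hat - half_mean, mu_hat + half_mean)
pi_new = (mu_hat - half_pred, mu_hat + half_pred)

print(f"interval for the average drive: {ci_mean[0]:.1f} to {ci_mean[1]:.1f} min")
print(f"interval for one new drive:     {pi_new[0]:.1f} to {pi_new[1]:.1f} min")
print(f"ratio of widths: {half_pred / half_mean:.2f}")
```

Under these assumptions the ratio of the two widths is $\sqrt{n+1}$, whatever the data: collecting more drives shrinks the interval for the average, but the interval for a single drive cannot become narrower than about $\pm c\,\hat\sigma$.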
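One Sample Prediction
• A single new data point drawn from this model has expectation μ and standard deviation σ
• $Y_0$ is the random variable for the new point: $Y_0 \sim G(\mu, \sigma)$
• We don't know μ and σ, we only know their estimates
• We also know the sampling distribution for $\hat\mu$: $\hat\mu \sim G\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$

One Sample Prediction
• The error of the point estimate is given by $Y_0 - \hat\mu$
• Note $Y_0 - \hat\mu = (Y_0 - \mu) + (\mu - \hat\mu) = R_0 + (\mu - \hat\mu)$
• By independence:
$$\mathrm{Var}(Y_0 - \hat\mu) = \mathrm{Var}(Y_0) + \mathrm{Var}(\hat\mu) = \sigma^2 + \frac{\sigma^2}{n} = \sigma^2\left(1 + \frac{1}{n}\right)$$

One Sample Prediction
$$Y_0 - \hat\mu \sim G\left(0,\ \sigma\sqrt{1 + \frac{1}{n}}\right)$$
• There are two sources of variation:
  – uncertainty for a single point
  – uncertainty of the estimate for μ

One sample prediction: Hypothesis Test (T-test)
• The prediction variance is used in the usual ways
• T-test:
$$\frac{Y_0 - \hat\mu}{se(Y_0 - \hat\mu)} = \frac{Y_0 - \hat\mu}{\hat\sigma\sqrt{1 + \frac{1}{n}}} \sim t_{n-1}$$

One sample prediction: Prediction Interval
$$\hat\mu \pm c\,\hat\sigma\sqrt{1 + \frac{1}{n}}$$
where c is the critical value from the t-distribution with (n-1) d.f.

Regression prediction
• Now assume the regression model: $Y = \alpha_1 + \beta x + R$, $R \sim G(0, \sigma)$
• It is useful to rewrite this model, centering on x: $Y = \alpha_2 + \beta(x - \bar{x}) + R$, $R \sim G(0, \sigma)$
• where $\alpha_1 = \alpha_2 - \beta\bar{x}$
• In what follows, α denotes the centred intercept $\alpha_2$

Regression prediction: Centred regression estimators
• For centered data, $\bar{x} = 0$, the model is $Y = \alpha + \beta x + R$ and $\hat\alpha = \bar{y}$
• Estimates for α and β are independent
$$\hat\beta = \frac{S_{xy}}{S_{xx}}$$
• The distributions of the parameters are:
  – This was shown earlier
$$\hat\alpha \sim G\left(\alpha, \frac{\sigma}{\sqrt{n}}\right), \qquad \hat\beta \sim G\left(\beta, \frac{\sigma}{\sqrt{S_{xx}}}\right)$$

Regression Prediction
• The estimator of the mean value at $x_{new}$ is
$$\hat\mu(x_{new}) = \hat\alpha + \hat\beta(x_{new} - \bar{x})$$

Regression Prediction
• By independence,
$$\mathrm{Var}\left(\hat\mu(x_{new})\right) = \mathrm{Var}\left(\hat\alpha + \hat\beta(x_{new} - \bar{x})\right) = \mathrm{Var}(\hat\alpha) + (x_{new} - \bar{x})^2\,\mathrm{Var}(\hat\beta) = \sigma^2\left(\frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{S_{xx}}\right)$$

Regression prediction: Distribution of estimator at $x_{new}$
• By independence, the estimated value at $x_{new}$ has the following distribution
$$\hat\mu(x_{new}) \sim G\left(\alpha + \beta(x_{new} - \bar{x}),\ \sigma\sqrt{\frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{S_{xx}}}\right)$$
• This predicted value is on the regression line.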
• The variance refers to the average

Regression prediction: Distribution of a single point at $x_{new}$
• The distribution of a single random draw at $x_{new}$ is:
$$Y_{new} \sim G\left(\alpha + \beta(x_{new} - \bar{x}),\ \sigma\right)$$
• The expected value of a single obs and the average are the same
• The variance refers to a single observation

Regression Prediction: Combined variability
• By independence between $Y_{new}$ and $\hat\mu(x_{new})$
$$Y_{new} - \hat\mu(x_{new}) \sim G\left(0,\ \sigma\sqrt{1 + \frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{S_{xx}}}\right)$$

Regression Prediction: Hypothesis test
• Because $Y \sim G$ and σ is replaced by its estimate, the sampling distribution is t
• Hypothesis test:
$$\frac{Y_{new} - \hat\mu(x_{new})}{\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{S_{xx}}}} \sim t_{n-2}$$
• Subtracting 2 d.f. because α and β were estimated

Regression Prediction: Confidence interval
$$\hat\mu(x_{new}) \pm c\,\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{S_{xx}}}$$
where c is the critical value from the t distribution with (n-2) d.f.

Regression
