If we have a curved pattern on a scatterplot, it would not make sense to model it linearly,
but that doesn’t mean there is no relationship between X and Y.
One way to model the relationship is to transform the explanatory variable,
for example by using $\log(x)$ or $\sqrt{x}$ in the regression instead of $x$.
Model: $y = \beta_0 + \beta_1 \sqrt{x} + \varepsilon$
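
Once the predictor is transformed, the square-root model is an ordinary linear regression in the new variable. A minimal sketch, assuming NumPy is available; the data are made up (noise-free, generated from y = 2 + 3√x) so the recovered coefficients are easy to check:

```python
import numpy as np

# Hypothetical data that follow y = 2 + 3*sqrt(x) exactly (no noise),
# chosen only so the fitted coefficients are easy to verify.
x = np.array([1.0, 4.0, 9.0, 16.0, 25.0])
y = 2.0 + 3.0 * np.sqrt(x)

# Design matrix: a column of ones (intercept) and the transformed predictor.
X = np.column_stack([np.ones_like(x), np.sqrt(x)])

# Least-squares estimates b0, b1 for y = b0 + b1*sqrt(x).
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1 = b
print(b0, b1)
```

Swapping `np.sqrt` for `np.log` in both places would give the log-transformed version of the same model.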
A more common approach is polynomial regression,
e.g., fit the data to $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$
for a quadratic function.
We can do polynomial regression using $X^k$ for any integer $k \ge 1$, but we want to include all lower-order
terms of $X$ in the model as well.
So if we were to fit a 5th-degree polynomial, we would use
$\hat{y} = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + b_4 x^4 + b_5 x^5$
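
One way to keep every lower-order term in a polynomial fit is to build the design matrix column by column from the powers of x. The sketch below assumes NumPy and uses made-up, noise-free quadratic data; the variable `degree` is an illustrative name and could be set to 5 for the case described here, given enough data points:

```python
import numpy as np

# Hypothetical noise-free data from y = 1 - 2x + 0.5x^2.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 1.0 - 2.0 * x + 0.5 * x**2

degree = 2
# Columns are x^0, x^1, ..., x^degree, so every lower-order term is included.
X = np.vander(x, N=degree + 1, increasing=True)

# Least-squares estimates b0, b1, ..., b_degree.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b  # fitted values
print(b)
```

The model is still linear in the coefficients, which is why the same least-squares routine works even though the fitted curve is not a straight line.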
Sometimes Y is strongly related to several explanatory variables while not having a
strong relationship with any one single explanatory variable.
If we have $k$ predictors, our model is
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2)$,
for integer $k \ge 1$.
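
As a rough illustration of this model, the sketch below simulates data from it and estimates the coefficients by least squares, that is, by choosing the values that minimize the sum of squared residuals. It assumes NumPy; the sample size, coefficient values, and error standard deviation are all made up for the example, and `sse` is a helper name introduced here:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n, k = 50, 2
# Hypothetical predictors and true coefficients, for illustration only.
x = rng.normal(size=(n, k))
beta = np.array([1.0, 2.0, -3.0])    # beta_0, beta_1, beta_2
eps = rng.normal(scale=0.5, size=n)  # errors drawn from N(0, sigma^2), sigma = 0.5

X = np.column_stack([np.ones(n), x])  # intercept column plus the k predictors
y = X @ beta + eps

def sse(b):
    """Sum of squared residuals for candidate coefficients b."""
    return float(np.sum((y - X @ b) ** 2))

# Least-squares estimates b_0, b_1, b_2.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b, sse(b))
```

Evaluating `sse` at any other coefficient vector, including the true `beta`, gives a value at least as large as `sse(b)`.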
We estimate $\beta_0$ by $b_0$, $\beta_1$ by $b_1$, etc., where $b_0, b_1, \ldots, b_k$ are the values that minimize
$\sum [y - (b_0 + b_1 x_1 + \cdots + b_k x_k)]^2 = \sum (y - \hat{y})^2 = \mathrm{SSE}$

Testing the fit of the model
Is the “overall fit” of our model good?
We perform a model utility test.
$H_0$: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_A$: At least one $\beta_i \neq 0$ for $i = 1, \ldots, k$
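
These hypotheses are commonly tested with an F statistic, which the notes do not state at this point; the standard form is F = (SSR / k) / (SSE / (n - k - 1)), where SSR is the variation explained by the model. The sketch below computes it with NumPy on made-up data in which only the first predictor actually affects y:

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible

n, k = 40, 3
# Hypothetical data: three predictors, but only the first one matters.
x = rng.normal(size=(n, k))
X = np.column_stack([np.ones(n), x])
y = 2.0 + 1.5 * x[:, 0] + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

sse = np.sum((y - y_hat) ** 2)      # unexplained variation
ssto = np.sum((y - y.mean()) ** 2)  # total variation
ssr = ssto - sse                    # variation explained by the model

# Model utility F statistic with k and n - k - 1 degrees of freedom.
f_stat = (ssr / k) / (sse / (n - k - 1))
print(f_stat)
```

Under the null hypothesis and the normal-error assumption, this statistic follows an F distribution with k and n - k - 1 degrees of freedom, so large values are evidence that at least one coefficient is nonzero.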