The terms are more spread out on the right side of the chart than on the left-hand side.
Possible transformations to help eliminate or reduce heteroscedasticity are:
$\sqrt{y}$ or $\ln(y)$
Comment on Transformation of Data
There are many possible transformations of the form $Y^k$, where $k$ can be any real exponent. Examples:
$y^{1/2} = \sqrt{y}$; $y^{1/3}$; $y^{-2} = \dfrac{1}{y^2}$; $y^{-1} = \dfrac{1}{y}$; $y^2$; $y^{-3}$; etc.
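As a minimal sketch of this family of transformations, the snippet below applies $y^k$ for a few exponents, along with $\ln(y)$, to a handful of response values. The helper name `power_transform` and the numbers in `y_values` are hypothetical and serve only as illustration; they are not the salary data.

```python
import math

def power_transform(y, k):
    """Return y**k for a positive response value y and real exponent k."""
    if y <= 0:
        raise ValueError("this sketch assumes y > 0")
    return y ** k

# Hypothetical positive response values (not taken from the salary data).
y_values = [1.0, 4.0, 16.0, 64.0]

for k in (0.5, -1, -2, 2):
    transformed = [power_transform(y, k) for y in y_values]
    print(f"k = {k}: {transformed}")

# The natural log is the other transformation suggested for heteroscedasticity.
log_values = [math.log(y) for y in y_values]
print("ln(y):", log_values)
```

Both the square root and the log compress large values more than small ones, which is why they are candidates when the spread grows with the response.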
Salary vs. Years of Experience for a random sample of 50 social workers.
Exp (X) Salary (Y)
Second Order Model
Minitab Output: Second Order Model
Residual Plot: Evidence of Heteroscedasticity
Summary of Second Order Model
Regression equation: SALARY = 20242 + 522EXP + 53.0EXPSQ
$R^2 = 81.6\%$
But the residual plot shows signs of heteroscedasticity.
Second Order Model with Logarithmic Transformation
Minitab Output: Model Including Logarithmic Transformation and Quadratic Term
Residual Plot: No Evidence of Heteroscedasticity
Summary of Second Order Model ln(Salary) vs. EXP and EXPSQ
Regression equation: ln(SALARY) = 9.84 + 0.0497EXP + 0.000009EXPSQ
$R^2 = 86.4\%$
$R_a^2 = 85.8\%$
The residual plot shows that the log transformation has significantly reduced heteroscedasticity.
But the coefficient of EXPSQ is not significant (p-value = 0.98).
First Order Model with Logarithmic Transformation
Note: Same $R^2$ value and higher adjusted $R^2$ ($R_a^2$) value.
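The note can be checked against the usual adjusted $R^2$ formula, $R_a^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}$, where $k$ is the number of predictors. The sketch below is an illustration, not the Minitab output: it uses $n = 50$ and $R^2 = 86.4\%$ from the notes and assumes $R^2$ is unchanged to that precision once EXPSQ is dropped. The function name `adjusted_r2` is hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 50      # sample size stated in the notes
r2 = 0.864  # R^2 of the log model; assumed unchanged after dropping EXPSQ

second_order = adjusted_r2(r2, n, k=2)  # predictors: EXP and EXPSQ
first_order = adjusted_r2(r2, n, k=1)   # predictor: EXP only

print(f"second order: {second_order:.1%}")  # 85.8%, matching the reported value
print(f"first order:  {first_order:.1%}")   # 86.1%
```

With the same $R^2$ and one fewer predictor, the penalty term shrinks, so the adjusted value rises.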
Interpreting the Model
$\ln \hat{y} = 9.84 + 0.05x$
$\hat{y} = e^{9.84 + 0.05x} = e^{9.84} e^{0.05x} = 18769.72\, e^{0.05x}$
Experience    Predicted Salary
10            $30946.04
15            $39735.50
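The back-transformation can be sketched in a few lines of Python. The coefficients 9.84 and 0.05 come from the fitted model in these notes; the function name `predicted_salary` is hypothetical. Because the notes round $e^{9.84}$ to 18769.72 before multiplying, the printed values may differ from the table by a cent or so.

```python
import math

B0 = 9.84  # intercept of the fitted model for ln(salary), from the notes
B1 = 0.05  # slope per year of experience, from the notes

def predicted_salary(years):
    """Back-transform the fitted log model to the salary scale."""
    return math.exp(B0 + B1 * years)

for years in (10, 15):
    print(years, round(predicted_salary(years), 2))

# Each extra year multiplies the predicted salary by exp(B1).
growth_factor = math.exp(B1)
print(f"growth per year: {growth_factor - 1:.2%}")
```

On the original scale, the slope therefore reads as a percentage change: each additional year of experience raises the predicted salary by about 5.1%.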
A Test for Heteroscedasticity
Divide the sample observations into two subgroups based on the value of $\hat{y}$ or, equivalently in this example, the value of $x$
(since for the fitted model $\hat{y}$ increases as $x$ increases).
Examination of the data shows that approximately one-half of the 50 observations fall below 𝑥 = 20.
Testing for Equal Variances
We next calculate the variances of the observations in subgroups 1 and 2 and perform a test of
hypothesis for the ratio of the variances.
$H_0: \sigma_1^2 / \sigma_2^2 = 1$
$H_1: \sigma_1^2 / \sigma_2^2 \neq 1$
Subgroups 1 and 2
Calculating $s_1^2$ and $s_2^2$ for SAL vs. EXP, EXPSQ
$\mathrm{MSE}_1 = s_1^2$
$\mathrm{MSE}_2 = s_2^2$
Testing the Hypothesis of Equal Variances
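$H_0: \sigma_1^2 / \sigma_2^2 = 1$
$H_1: \sigma_1^2 / \sigma_2^2 \neq 1$
TS: $F = \dfrac{s^2_{\text{larger}}}{s^2_{\text{smaller}}} = \dfrac{\mathrm{MSE}_{\text{larger}}}{\mathrm{MSE}_{\text{smaller}}} = \dfrac{94711023}{31576998} = 2.99937$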
Accept $H_0$ if $F \le 2.37$
Reject $H_0$ if $F > 2.37$
Since 𝐹 = 2.99937 is greater than 2.37 we reject the hypothesis of equal variances. Therefore, the
quadratic model has residuals that exhibit heteroscedasticity.
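The arithmetic of this test can be reproduced with a short sketch. The two MSE values and the cutoff 2.37 are taken from these notes; the variable names are hypothetical, and the notes do not state which subgroup produced which MSE, so the code simply takes the larger over the smaller.

```python
# Subgroup MSEs from the notes; which subgroup each belongs to is not stated there.
mse_values = (94711023, 31576998)
F_CRITICAL = 2.37  # rejection cutoff quoted in the notes

# Test statistic: larger sample variance divided by the smaller one.
f_stat = max(mse_values) / min(mse_values)
reject_equal_variances = f_stat > F_CRITICAL

print(f"F = {f_stat:.5f}")
print("reject H0" if reject_equal_variances else "do not reject H0")
```

Putting the larger variance in the numerator guarantees $F \ge 1$, so only the upper tail of the F distribution needs to be consulted.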
Checking the Normality Assumption
Important note
Moderate departures from the normality assumption will generally not invalidate the results of a
regression analysis. We can say that regression analysis is robust with regard to the normality assumption.
If a graphical display of the data (stem-and-leaf plot, histogram, etc.) is not badly skewed, and has one
major central peak, we can be confident in using the model.
Checking for Normality
We will use the model:
ln 𝑆𝑎𝑙𝑎𝑟𝑦 = 9.84 + 0.05𝐸𝑥𝑝
The histogram of the residuals shows that the distribution is mound-shaped and
reasonably symmetric. Therefore, we suspect that the normality assumption is satisfied. However, this
claim is subjective, so we need a more formal approach.
Normal Probability Plot
The normal probability plot graphs the residuals against the expected values of the residuals under the
assumption of normality.
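One way to obtain the plotted pairs is sketched below using only the Python standard library. The residuals are hypothetical (they are not the residuals of the salary model), the function name `expected_normal_values` is an illustration, and the plotting positions $(i - 0.375)/(n + 0.25)$ are one common convention assumed here; statistical packages may use a different one.

```python
from statistics import NormalDist, mean, stdev

def expected_normal_values(residuals):
    """Approximate expected values of the ordered residuals under normality."""
    n = len(residuals)
    m = mean(residuals)
    s = stdev(residuals)
    std_normal = NormalDist()
    # Plotting positions (i - 0.375) / (n + 0.25): an assumed convention.
    return [m + s * std_normal.inv_cdf((i - 0.375) / (n + 0.25))
            for i in range(1, n + 1)]

# Hypothetical residuals for illustration only.
residuals = [-0.21, 0.05, 0.13, -0.08, 0.02, 0.17, -0.11, -0.03, 0.09, -0.03]

ordered = sorted(residuals)
expected = expected_normal_values(residuals)
for r, e in zip(ordered, expected):
    # Each (residual, expected value) pair is one point on the plot.
    print(f"{r:6.2f} {e:6.2f}")
```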
If the assumption of normality is true, then a residual value should approximately equal its expected
value, resulting in a straight-line graph.
The Anderson-Darling Statistic
The AD statistic is used to test the hypothesis.
$H_0$: Distribution is normal
$H_1$: Distribution is not normal
If the p-value for the AD is ≥ 0.05, there is no reason to conclude that the distribution is not