MTH-416, REGRESSION ANALYSIS Lecture Notes - Lecture 4: Indian Institute Of Technology Kanpur, Scatter Plot, Summary Statistics

Regression Analysis | Chapter 4 | Model Adequacy Checking | Shalabh, IIT Kanpur
Chapter 4
Model Adequacy Checking
The fitting of the linear regression model, the estimation of parameters, the testing of hypotheses, and the
properties of the estimators are all based on the following major assumptions:
1. The relationship between the study variable and explanatory variables is linear, at least approximately.
2. The error term has zero mean.
3. The error term has a constant variance.
4. The errors are uncorrelated.
5. The errors are normally distributed.
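In matrix notation, these five assumptions are commonly summarized as below. The symbols are the standard ones and are assumed here rather than defined in the text above: $X$ is the matrix of explanatory variables, $\beta$ the vector of regression coefficients, $\varepsilon$ the vector of errors, $\sigma^2$ the common error variance, and $I_n$ the identity matrix of order $n$ (the sample size).

```latex
y = X\beta + \varepsilon, \qquad
E(\varepsilon) = 0, \qquad
\operatorname{Var}(\varepsilon) = \sigma^2 I_n, \qquad
\varepsilon \sim N(0, \sigma^2 I_n).
```

The first expression states linearity (assumption 1) and the second the zero mean (assumption 2). The covariance matrix $\sigma^2 I_n$ encodes both the constant variance (equal diagonal entries, assumption 3) and the uncorrelatedness (zero off-diagonal entries, assumption 4), and the last expression adds normality (assumption 5).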
The validity of these assumptions is needed for the results to be meaningful. If the assumptions are violated,
the results can be incorrect and may have serious consequences. If the departures are small, the final result
may not change significantly. But if the departures are large, the fitted model may become unstable, in
the sense that a different sample could lead to an entirely different model with opposite conclusions. So the
underlying assumptions have to be verified before attempting regression modeling. Violations of this kind cannot
be detected from summary statistics such as the t-statistic, the F-statistic, or the coefficient of determination.
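A classic illustration of this point is Anscombe's quartet (Anscombe, 1973), which is not part of these notes and is brought in here only as an example. The sketch below fits a straight line by least squares to two of its four data sets; the helper name `fit_line` is hypothetical.

```python
# Anscombe's quartet, data sets I and II (they share the same x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, R^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    r2 = sxy ** 2 / (sxx * syy)  # this shortcut holds for one explanatory variable
    return b0, b1, r2


for label, y in (("I", y1), ("II", y2)):
    b0, b1, r2 = fit_line(x, y)
    print(f"data set {label}: intercept={b0:.2f}, slope={b1:.2f}, R^2={r2:.2f}")
```

Both data sets give an intercept of about 3.00, a slope of about 0.50, and a coefficient of determination of about 0.67, although data set II follows a smooth curve rather than a straight line. The difference only becomes visible when the data are plotted.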
One crucial point to keep in mind is that these assumptions are for the population, and we work only with a
sample. So the main issue is to make a decision about the population on the basis of a sample of data.
Several diagnostic methods for detecting violations of the regression assumptions are based on the study of the
model residuals with the help of various types of graphics.
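As a minimal sketch of what studying the residuals involves, the code below fits a straight line by least squares to a small synthetic data set and lists the residuals, that is, the observed minus the fitted values. The data are made up for illustration (the true relationship is exactly y = x^2), and the function name `residuals_of_line_fit` is hypothetical.

```python
def residuals_of_line_fit(x, y):
    """Fit y = b0 + b1*x by least squares and return the residuals y - y_hat."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Synthetic data (an assumption for illustration): the true relation is quadratic.
x = [0, 1, 2, 3, 4, 5, 6]
y = [xi ** 2 for xi in x]

res = residuals_of_line_fit(x, y)
print(res)  # → [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
```

The residuals are positive at both ends and negative in the middle. A systematic pattern like this, rather than a random scatter around zero, is the kind of signal that residual plots are meant to reveal; here it points to a violation of the linearity assumption.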
Checking of the linear relationship between study and explanatory variables
1. Case of one explanatory variable
If there is only one explanatory variable in the model, then it is easy to check the existence of a linear
relationship between $y$ and $X$ by a scatter diagram of the available data.
If the scatter diagram shows a linear trend, it indicates that the relationship between $y$ and $X$ is linear. If the
pattern is not linear, then it suggests that the relationship between $y$ and $X$ is nonlinear. For example, the
following figure indicates a linear trend,
whereas the following graph suggests a nonlinear trend:
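Scatter diagrams of the two kinds described above can be drawn with a few lines of code. The following is only a sketch: it assumes the third-party matplotlib package is installed, and the data, the function names, and the output file name are all made up for illustration.

```python
import random


def make_data(n=50, seed=0):
    """Synthetic data: one roughly linear and one clearly nonlinear response."""
    rng = random.Random(seed)
    x = [10 * i / (n - 1) for i in range(n)]
    y_linear = [2.0 + 1.5 * xi + rng.gauss(0, 1) for xi in x]
    y_curved = [2.0 + 0.3 * xi ** 2 + rng.gauss(0, 1) for xi in x]
    return x, y_linear, y_curved


def plot_scatter_diagrams(path="scatter_diagrams.png"):
    """Draw y versus X for both data sets side by side and save the figure."""
    # Imported here so the data helper above works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    x, y_linear, y_curved = make_data()
    fig, axes = plt.subplots(1, 2, figsize=(8, 3.5))
    axes[0].scatter(x, y_linear)
    axes[0].set_title("Linear trend")
    axes[1].scatter(x, y_curved)
    axes[1].set_title("Nonlinear trend")
    for ax in axes:
        ax.set_xlabel("X")
        ax.set_ylabel("y")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_scatter_diagrams()` writes both panels to `scatter_diagrams.png`. In the left panel the points scatter around a straight line, while in the right panel they bend upward.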
2. Case of more than one explanatory variable
To check the assumption of linearity between the study variable and the explanatory variables, the scatterplot
matrix of the data can be used. A scatterplot matrix is a two-dimensional array of two-dimensional plots in which
each cell, except those on the diagonal, contains a scatter diagram. Thus, each plot sheds some light on the
relationship between a pair of variables. It gives more information than the correlation coefficient between
each pair of variables because it provides a sense of linearity or nonlinearity of the relationship and some
awareness of how the individual data points are arranged over the region. It consists of the scatter diagrams of
$(y \text{ versus } X_1), (y \text{ versus } X_2), \ldots, (y \text{ versus } X_k)$.
Another option for presenting the scatterplot matrix is to
- display the scatterplots in the upper triangular part of the plot matrix, and
- report the corresponding correlation coefficients in the lower triangular part of the matrix.
Suppose there are only two explanatory variables and the model is $y = X_1 \beta_1 + X_2 \beta_2 + \varepsilon$;
then the scatterplot matrix looks as follows.
Such an arrangement helps in examining each plot together with the corresponding correlation coefficient. The
pairwise correlation coefficient should always be interpreted in conjunction with the corresponding scatter
plot because
- the correlation coefficient measures only the linear relationship, and
- the correlation coefficient is non-robust, i.e., one or two observations in the data can substantially
influence its value.
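Both points can be checked numerically. The sketch below uses two tiny synthetic data sets, made up for illustration, and a hypothetical helper `pearson_r` that computes the sample correlation coefficient from its definition.

```python
def pearson_r(u, v):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


# (a) An exact but nonlinear relationship: y = x^2 on symmetric x values.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]
print(pearson_r(x, y))  # → 0.0

# (b) Four unrelated points, then the same points plus one distant observation.
x_base, y_base = [0, 0, 1, 1], [0, 1, 0, 1]
print(pearson_r(x_base, y_base))  # → 0.0
print(round(pearson_r(x_base + [10], y_base + [10]), 3))  # → 0.986
```

In case (a) the correlation is zero even though y is completely determined by x, because the relationship is not linear. In case (b) a single added observation moves the correlation from 0 to about 0.99. Neither feature can be seen from the coefficient alone, but both are obvious in a scatter plot.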
The presence of linear patterns is reassuring, but the absence of such patterns does not imply that the linear
model is incorrect. Most statistical software provides an option for creating the scatterplot matrix. If all
the plots show approximately linear patterns, this suggests that a multiple linear regression model may provide a
reasonable fit to the data.
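A scatterplot matrix with scatterplots above the diagonal and correlation coefficients below it can also be assembled by hand. The following is only a sketch under stated assumptions: it relies on the third-party matplotlib package, the function names `pearson_r` and `scatterplot_matrix` are hypothetical, and the data are synthetic values generated from a model with two explanatory variables.

```python
import random


def pearson_r(u, v):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def scatterplot_matrix(columns, names, path="scatterplot_matrix.png"):
    """Scatterplots above the diagonal, correlation coefficients below it."""
    # Imported here so pearson_r can be used without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    k = len(columns)
    fig, axes = plt.subplots(k, k, figsize=(2.5 * k, 2.5 * k), squeeze=False)
    for i in range(k):
        for j in range(k):
            ax = axes[i][j]
            if i < j:  # upper triangle: row variable versus column variable
                ax.scatter(columns[j], columns[i], s=8)
            else:  # diagonal: variable name; lower triangle: correlation
                if i == j:
                    text = names[i]
                else:
                    text = f"r = {pearson_r(columns[i], columns[j]):.2f}"
                ax.text(0.5, 0.5, text, ha="center", va="center",
                        transform=ax.transAxes)
                ax.set_xticks([])
                ax.set_yticks([])
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


# Synthetic data from y = 1.0*X1 + 2.0*X2 + error; all numbers are made up.
rng = random.Random(1)
x1 = [rng.uniform(0, 10) for _ in range(60)]
x2 = [rng.uniform(0, 5) for _ in range(60)]
y = [1.0 * a + 2.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
```

Calling `scatterplot_matrix([y, x1, x2], ["y", "X1", "X2"])` writes the figure to `scatterplot_matrix.png`. Putting the coefficient in the cell that mirrors its scatterplot makes it easy to read the two together.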