Simple Linear Regression Review
Simple Linear Regression
Simple linear regression analysis is used to analyze the nature of the relationship between two variables.
The dependent (response) variable is designated by 𝑌 and the independent (predictor) variable is
designated by 𝑋. For a given value of the independent variable, there may be many values of the dependent variable.
The decision regarding which variable to designate 𝑌 and which variable to designate 𝑋 must be based
upon theory, knowledge of the subject matter, and the objectives of the analysis. The relationship
between the two variables is estimated and then used to make predictions for 𝑌.
A scatter diagram is a graph showing the shape and direction of the underlying relationship between the
independent variable 𝑋 and the dependent variable 𝑌.
Observations are plotted in pairs (𝑥,𝑦) with one variable plotted on each axis.
Linear Relationships Between Two Variables
The relationship between the two variables is described by a straight-line model in general form:
True regression line:
𝑌 = 𝛽₀ + 𝛽₁𝑋 + 𝜖 or 𝐸(𝑌) = 𝛽₀ + 𝛽₁𝑋
Estimated regression line:
𝑌̂ = 𝛽̂₀ + 𝛽̂₁𝑋
𝛽̂₀ is a point estimate of 𝛽₀
𝛽̂₁ is a point estimate of 𝛽₁
We note that 𝛽̂₀ and 𝛽̂₁ are variables while 𝛽₀ and 𝛽₁ are constants.
Simple Linear Regression Model Definitions of Terms
𝑋 = the independent (predictor) variable
𝑌 = the dependent (response) variable
𝛽₀ = the true 𝑌-intercept
𝛽₁ = the true slope
𝜖 = the error term
𝛽̂₀ = the estimated 𝑌-intercept
𝛽̂₁ = the estimated slope
Simple Linear Regression Model Assumptions
1. In Simple Linear Regression, we have the assumption of linearity.
2. For each value of 𝑋 the 𝑌 values are normally distributed.
3. For each value of 𝑋 the variance of the 𝑌values is the same (homoscedasticity).
4. Independence (independent samples of 𝑌 are chosen for different values of 𝑋, i.e. the error terms are independent).
Intercept and Slope
The 𝑌-intercept 𝛽₀ is the point on the 𝑌-axis where the true regression line crosses and is the average
value of 𝑌 when 𝑋 = 0.
The slope of the true regression line, 𝛽₁, represents the average change in 𝑌 when 𝑋 is increased by one unit.
𝛽₀ and 𝛽₁ are called the parameters of the regression line.
The difference between an observed value 𝑌ᵢ and its estimated value 𝑌̂ᵢ is called a residual, 𝑒ᵢ = 𝑌ᵢ − 𝑌̂ᵢ.
SSE - Sum of Squares Error
𝑆𝑆𝐸 = ∑𝑒ᵢ² = ∑(𝑦ᵢ − 𝑦̂ᵢ)²
SSE represents the sum of the squared deviations between the observed 𝑌-values in the data set and
the 𝑌-values predicted by the estimated regression line. This is the amount of variation in 𝒀 that is not
explained by the regression line.
Note: The Least Squares regression line is the line with intercept 𝛽̂₀ and slope 𝛽̂₁ that minimize SSE.
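As a quick illustration, SSE can be computed directly from a data set and a candidate line. This is a minimal Python sketch; the data values are hypothetical (advertising and profit in $1000s), not figures from the notes.

```python
# Minimal sketch: SSE for a candidate line y-hat = b0 + b1*x.
# The data below are hypothetical, chosen only for illustration.
xs = [3.0, 4.0, 5.0, 6.0, 7.0]       # advertising (in $1000s)
ys = [10.1, 10.9, 11.4, 12.2, 12.9]  # profit (in $1000s)

def sse(xs, ys, b0, b1):
    """Sum of squared residuals e_i = y_i - (b0 + b1*x_i)."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Evaluate SSE at the intercept/slope of one candidate line.
print(sse(xs, ys, 7.905, 0.715))
```

Any other choice of intercept and slope can be plugged into the same function; the least-squares line is the choice that makes this quantity as small as possible.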
Minimizing SSE
To find the line that best fits the points in the scatter diagram, we minimize the quantity:
𝑆𝑆𝐸 = ∑(𝑌ᵢ − 𝑌̂ᵢ)² = ∑(𝑌ᵢ − 𝛽̂₀ − 𝛽̂₁𝑋ᵢ)²
We note that 𝑆𝑆𝐸 = 𝑓(𝛽̂₀, 𝛽̂₁). We obtain the two partial derivatives 𝜕(𝑆𝑆𝐸)/𝜕𝛽̂₀ and 𝜕(𝑆𝑆𝐸)/𝜕𝛽̂₁
and solve the equations:
𝜕(𝑆𝑆𝐸)/𝜕𝛽̂₀ = 0 and 𝜕(𝑆𝑆𝐸)/𝜕𝛽̂₁ = 0
Least Squares Estimates of Regression Parameters
𝛽̂₁ = (∑𝑥ᵢ𝑦ᵢ − 𝑛𝑥̅𝑦̅) / (∑𝑥ᵢ² − 𝑛𝑥̅²)
𝛽̂₀ = 𝑦̅ − 𝛽̂₁𝑥̅
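The closed-form estimates above are easy to compute step by step. A minimal Python sketch, using made-up data (advertising and profit in $1000s):

```python
# Least-squares estimates via the closed-form formulas:
#   b1 = (sum x_i*y_i - n*x_bar*y_bar) / (sum x_i^2 - n*x_bar^2)
#   b0 = y_bar - b1*x_bar
# Hypothetical data, for illustration only.
xs = [3.0, 4.0, 5.0, 6.0, 7.0]
ys = [10.0, 11.0, 11.5, 12.5, 13.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

b1 = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / (
    sum(x * x for x in xs) - n * x_bar ** 2
)
b0 = y_bar - b1 * x_bar
print(b0, b1)  # fitted intercept and slope
```

For this data the estimates work out to 𝛽̂₀ = 7.85 and 𝛽̂₁ = 0.75.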
Interpretation of Regression Coefficients
The regression equation is:
𝑌̂ = 7.905 + 0.715𝑋
𝛽̂₁ = 0.715 implies that, on average, a one-unit increase in 𝑋 will result in an increase of 0.715 in 𝑌, i.e.
if advertising increases by $1000 then profit will increase, on average, by $715.
𝛽̂₀ = 7.905 implies that the average profit in stores with no advertising budget will be $7905.
The interpretation of 𝛽̂₀ = 7.905 is not reliable.
The problem is that we are making a claim about a value of 𝑋 for which we have no experimental data.
All of our experimental data is for values of 𝑋 in the range $3000 to $7000. Therefore we cannot make a
reliable claim about the relationship between 𝑋 and 𝑌 when 𝑋 = 0.
This is called extrapolation.
Point Estimates
The regression equation can be used to predict a value of 𝑌 based on a given value of 𝑋 by substituting
the 𝑋-value into the regression line.
Estimate the profit of a store which spends $3500 on advertising.
Let 𝑋 = 3.5
Then 𝑌̂ = 7.905 + 0.715(3.5) = 10.4075 ≈ 10.4 (i.e. about $10,400)
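The substitution above is straightforward to express in code; this sketch simply wraps the notes' estimated line in a function (the equation is from the notes, the function name is ours):

```python
# Predict Y-hat for a given X using the estimated line from the notes:
#   Y-hat = 7.905 + 0.715*X, with X in $1000s of advertising.
def predict(x, b0=7.905, b1=0.715):
    return b0 + b1 * x

# A store spending $3500 on advertising: X = 3.5
y_hat = predict(3.5)
print(y_hat)  # 7.905 + 0.715*3.5 = 10.4075, i.e. about $10,400
```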
Caution: Do not use the regression line to predict 𝑌 with values of the independent variable significantly
beyond the range of those represented in the sample. The nature of the relationship outside the range
of 𝑋-values represented in the sample may not be linear, and extrapolation may lead to false conclusions.
Partitioning Total Deviation
Total deviation = 𝑌ᵢ − 𝑌̅
Unexplained deviation = 𝑌ᵢ − 𝑌̂ᵢ
Explained deviation = 𝑌̂ᵢ − 𝑌̅
𝑌ᵢ − 𝑌̅ = (𝑌ᵢ − 𝑌̂ᵢ) + (𝑌̂ᵢ − 𝑌̅)
Total deviation = Unexplained deviation + Explained deviation
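The identity holds term by term for every observation, which a short check makes concrete; the observed and fitted values below are hypothetical:

```python
# Verify: (y_i - y_bar) = (y_i - yhat_i) + (yhat_i - y_bar) for each i.
# Hypothetical observed and fitted values, for illustration only.
ys    = [10.0, 11.0, 11.5, 12.5, 13.0]
yhats = [9.8, 10.9, 11.6, 12.4, 13.3]
y_bar = sum(ys) / len(ys)

for y, yhat in zip(ys, yhats):
    total = y - y_bar        # total deviation
    unexplained = y - yhat   # residual
    explained = yhat - y_bar # deviation explained by the line
    assert abs(total - (unexplained + explained)) < 1e-12
print("identity holds for each observation")
```

Note that while this deviation identity is pure algebra, the corresponding sum-of-squares identity (total = explained + unexplained sums of squares) relies on the fitted values coming from the least-squares line.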
Sum of Squares
We can compute sums of squares in regression analysis and construct an analysis of variance (ANOVA)
table for the regression.
Partitioning Sums of Squares
It can be shown that: