30 Mar 2012

Chapter 15

Describing Relationships: Regression, Prediction, and Causation

Regression lines

A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use it to predict the value of y for a given value of x.

Because we want to predict y from x, we want a line that is close to the points in the vertical direction

The least-squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible

x stands for the explanatory variable and y for the response variable

y = a + bx

The number b is the slope of the line (the amount by which y changes when x increases by one unit)

The number a is the intercept (the value of y when x = 0)
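
In symbols, the least-squares idea can be written out as below. This is a sketch using the standard textbook formulas, not something stated in these notes: n (the number of data points), x_i and y_i (the individual observations), and the means \bar{x} and \bar{y} are notation assumed here for illustration.

```latex
\[
\text{choose } a, b \text{ to minimize} \quad \sum_{i=1}^{n} \bigl(y_i - (a + b x_i)\bigr)^2
\]
\[
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
\]
```

Each term y_i - (a + b x_i) is the vertical distance from a data point to the line, which is why the fit is "close in the vertical direction".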

To use the equation for prediction, substitute your x-value into the equation and calculate the resulting y-value
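
A minimal Python sketch of fitting the line y = a + bx and using it for prediction. The data are made up for illustration, and the function names (least_squares_line, predict, sum_of_squares) are hypothetical helpers, not from any library. It also compares the fitted line with one arbitrary other line to show that the fitted line has the smaller sum of squared vertical distances.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the least-squares line passes through (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, x):
    """Substitute x into y = a + b*x."""
    return a + b * x


def sum_of_squares(a, b, xs, ys):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - predict(a, b, x)) ** 2 for x, y in zip(xs, ys))


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
y_at_3 = predict(a, b, 3)
fitted_ss = sum_of_squares(a, b, xs, ys)
other_ss = sum_of_squares(2.0, 0.7, xs, ys)  # an arbitrary competing line

print(f"fitted line: y = {a:.2f} + {b:.2f}x")
print(f"predicted y at x = 3: {y_at_3:.2f}")
print(f"sum of squares: fitted {fitted_ss:.2f}, other line {other_ss:.2f}")
```

Any other choice of a and b should give a sum of squares at least as large as the fitted line's.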

Understanding prediction

Prediction is based on fitting some “model” to a set of data

Prediction works best when the model fits the data closely

Prediction outside the range of the available data is risky

Beware of extrapolation (prediction outside the range of available data)
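
One way to guard against extrapolation in practice is to check whether the x-value falls inside the range of x-values that were actually observed. The sketch below assumes a line has already been fitted; the values of a and b, the data, and the function name predict_flagging_extrapolation are all hypothetical.

```python
def predict_flagging_extrapolation(a, b, x, observed_xs):
    """Return (predicted y, True if x lies outside the observed x range)."""
    lowest, highest = min(observed_xs), max(observed_xs)
    is_extrapolation = x < lowest or x > highest
    return a + b * x, is_extrapolation


# Hypothetical fitted line and the x-values it was fitted to.
a, b = 2.2, 0.6
observed_xs = [1, 2, 3, 4, 5]

inside = predict_flagging_extrapolation(a, b, 3, observed_xs)
outside = predict_flagging_extrapolation(a, b, 50, observed_xs)

print("x = 3:", inside)    # within the data, flag is False
print("x = 50:", outside)  # far outside the data, flag is True
```

The equation still produces a number for x = 50, but nothing in the data says the straight-line pattern continues that far.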

Correlation and regression

Correlation measures the direction and strength of a straight-line relationship

Regression draws a line to describe the relationship

Correlation does not require choosing an explanatory variable; regression does
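
A small Python sketch of this difference, using made-up data and hypothetical helper names (correlation, slope). Swapping the two variables leaves r unchanged, but the regression of y on x and the regression of x on y are different lines.

```python
from math import sqrt


def _sums(xs, ys):
    """Return (sxx, syy, sxy): sums of squared and cross deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def correlation(xs, ys):
    """Correlation r between two variables."""
    sxx, syy, sxy = _sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def slope(explanatory, response):
    """Slope of the least-squares regression of response on explanatory."""
    sxx, _, sxy = _sums(explanatory, response)
    return sxy / sxx


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_xy = correlation(xs, ys)
r_yx = correlation(ys, xs)
slope_y_on_x = slope(xs, ys)
slope_x_on_y = slope(ys, xs)

print(f"r(x, y) = {r_xy:.4f}, r(y, x) = {r_yx:.4f}")
print(f"slope of y on x = {slope_y_on_x:.2f}")
print(f"slope of x on y = {slope_x_on_y:.2f}")
```

If the two regressions were the same line, the second slope would be the reciprocal of the first, and here it is not.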

Both correlation and regression are strongly affected by outliers
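
To see how much a single outlier can matter, the sketch below computes r and the slope for a small made-up data set, then again after adding one extreme point. The data and the helper name r_and_slope are assumptions for illustration only.

```python
from math import sqrt


def r_and_slope(xs, ys):
    """Return (correlation r, least-squares slope of y on x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy), sxy / sxx


# Made-up data with a moderately strong positive association.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_before, slope_before = r_and_slope(xs, ys)

# The same data plus one point far from the overall pattern.
r_after, slope_after = r_and_slope(xs + [10], ys + [0])

print(f"without outlier: r = {r_before:.3f}, slope = {slope_before:.3f}")
print(f"with outlier:    r = {r_after:.3f}, slope = {slope_after:.3f}")
```

In this example one added point is enough to turn a positive association into a negative one, for both r and the slope.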

The usefulness of the regression line for prediction depends on the strength of the association, that is, on the correlation between the variables

The square of the correlation, r^2, is the proportion of the variation in the values of y that is explained by the least-squares regression of y on x

The idea is that when there is a straight-line relationship, some of the variation in y is accounted for by the fact that as x changes it pulls y along with it

In reporting a regression, it is usual to give r^2 as a measure of how successful the regression was in explaining the response
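
The "proportion of variation explained" reading of r^2 can be checked numerically. The sketch below uses made-up data and hypothetical variable names: it measures the total variation in y, the variation left over around the fitted line, and compares the explained share with the square of the correlation.

```python
from math import sqrt

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Least-squares line y = a + b*x.
b = sxy / sxx
a = mean_y - b * mean_x

# Total variation in y, and variation left unexplained by the line.
total_variation = syy
unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
proportion_explained = 1 - unexplained / total_variation

# Square of the correlation r.
r = sxy / sqrt(sxx * syy)
r_squared = r ** 2

print(f"proportion of variation explained = {proportion_explained:.4f}")
print(f"r^2 = {r_squared:.4f}")
```

For a least-squares line the two printed values agree, which is why r^2 is reported alongside a regression.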

Perfect correlation (r = 1 or r = -1) means the points lie exactly on a line

The question of causation

A strong relationship between two variables does not always mean that changes in one variable cause changes in the other