3 Apr 2015


Chapter 15 – Describing Relationships: Regression, Prediction, and Causation

• Regression line – a straight line that describes how a response variable y changes as an explanatory variable x changes

  o Often used to predict the value of y for a given value of x

• We want to draw a line that is close to the points in the vertical (y) direction

  o We need to find the equation of the line that comes closest to the points in the vertical direction

• Least-squares regression line of y on x – the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible

  o Look at the vertical distances of the points from the regression line, square them, and move the line until the sum of the squares is the smallest it can be for any line

• y = a + bx

  o x is the explanatory variable; y is the response variable

  o b is the slope of the line (the amount by which y changes when x increases by one unit)

  o a is the intercept (the value of y when x = 0)
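The least-squares criterion and the equation y = a + bx can be illustrated with a short Python sketch. It assumes the standard closed-form least-squares formulas, which these notes do not state: slope b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept a = ȳ − b·x̄. The data points and the helper names `least_squares_line` and `sum_of_squares` are hypothetical, chosen only for illustration.

```python
# Sketch: least-squares line y = a + bx via the standard closed-form formulas.
# The data points and helper names are hypothetical, chosen only for illustration.

def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares regression line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx        # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


def sum_of_squares(xs, ys, a, b):
    """Sum of squared vertical distances from the points to the line y = a + bx."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
best = sum_of_squares(xs, ys, a, b)

# Moving the line (changing a or b) in any of these directions makes the sum larger.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sum_of_squares(xs, ys, a + da, b + db) > best

# b is the change in predicted y when x increases by one unit.
assert abs((a + b * 4) - (a + b * 3) - b) < 1e-12

print(f"y = {a:.1f} + {b:.1f}x, minimum sum of squares = {best:.1f}")
# prints: y = 2.2 + 0.6x, minimum sum of squares = 2.4
```

Nudging the line in any direction only increases the sum of squared vertical distances, which is exactly what "least squares" means.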

• The computer makes prediction easy and automatic, but anything done automatically is often done thoughtlessly

  o The computer cannot decide which is the explanatory variable and which is the response variable

    ▪ Regressing y on x and regressing x on y give two different lines, depending on which variable is treated as explanatory
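The point that the choice of explanatory variable changes the line can be checked numerically. The sketch below fits least squares both ways on the same hypothetical data and rewrites the x-on-y line in the form y = intercept + slope·x so the two can be compared. The helper `fit_line` is an illustrative name, not a library function.

```python
# Sketch: regressing y on x versus x on y with the same (hypothetical) data.

def fit_line(explanatory, response):
    """Least-squares (intercept, slope) for predicting response from explanatory."""
    n = len(explanatory)
    expl_bar = sum(explanatory) / n
    resp_bar = sum(response) / n
    s_er = sum((e - expl_bar) * (r - resp_bar) for e, r in zip(explanatory, response))
    s_ee = sum((e - expl_bar) ** 2 for e in explanatory)
    slope = s_er / s_ee
    return resp_bar - slope * expl_bar, slope


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a_yx, b_yx = fit_line(xs, ys)  # y = a_yx + b_yx * x
a_xy, b_xy = fit_line(ys, xs)  # x = a_xy + b_xy * y

# Solve the x-on-y line for y so both lines read y = intercept + slope * x.
slope_2 = 1 / b_xy
intercept_2 = -a_xy / b_xy

print(f"regression of y on x: y = {a_yx:.1f} + {b_yx:.1f}x")          # y = 2.2 + 0.6x
print(f"regression of x on y: y = {intercept_2:.1f} + {slope_2:.1f}x")  # y = 1.0 + 1.0x
```

The two lines agree only when the points lie exactly on a straight line; otherwise the software gives whichever line matches the variable you told it is the response.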

• We often use several explanatory variables to predict a response

• Statistical methods for predicting a response all share some basic properties of least-squares regression lines:

  o Prediction is based on fitting some “model” to a set of data

  o Prediction works best when the model fits the data closely

    ▪ If the data do not have strong patterns, prediction may be very inaccurate

  o Prediction outside the range of the available data is risky

    ▪ This is called extrapolation
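One way to guard against extrapolation is to have the prediction step report whether the new x lies outside the range of the data used to fit the model. This is a minimal sketch under that assumption; `fit_line` and `predict_with_range_check` are hypothetical helpers, and the data are illustrative.

```python
# Sketch: a prediction helper that flags extrapolation.
# fit_line and predict_with_range_check are hypothetical helpers; the data are illustrative.

def fit_line(xs, ys):
    """Least-squares (a, b) for y = a + bx."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


def predict_with_range_check(x_new, xs, a, b):
    """Return (predicted y, True if x_new lies outside the range of the observed x values)."""
    is_extrapolation = not (min(xs) <= x_new <= max(xs))
    return a + b * x_new, is_extrapolation


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)

inside = predict_with_range_check(3.5, xs, a, b)  # within the data: flag is False
outside = predict_with_range_check(50, xs, a, b)  # far outside the data: flag is True
print(inside, outside)
```

At x = 50 the line predicts about 32.2, far above every observed y value (the largest is 5). Nothing in the data shows that the straight-line pattern continues that far, which is why such a prediction deserves suspicion.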

• Correlation – measures the direction and strength of a straight-line relationship; regression – draws a line to describe the relationship

  o The two are closely connected, even though regression requires choosing an explanatory variable and correlation does not

  o Both are strongly affected by outliers
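To see how strongly a single outlier can affect both measures, the sketch below computes the least-squares slope and the correlation r for a small hypothetical data set, then again after adding one point far from the rest. The helper name `slope_and_r` and all the data, including the outlier, are assumptions made for illustration.

```python
import math

# Sketch: how one outlier changes both the least-squares slope and the correlation r.
# All data, including the outlier, are hypothetical.

def slope_and_r(xs, ys):
    """Return (least-squares slope of y on x, correlation r)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / s_xx, s_xy / math.sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope_before, r_before = slope_and_r(xs, ys)

# Add a single point far to the right of the others, with a low y value.
slope_after, r_after = slope_and_r(xs + [10], ys + [0])

print(f"without outlier: slope {slope_before:.2f}, r {r_before:.2f}")  # slope 0.60, r 0.77
print(f"with outlier:    slope {slope_after:.2f}, r {r_after:.2f}")    # slope -0.34, r -0.55
```

One point out of six is enough to flip both the slope and the correlation from positive to negative.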

• The usefulness of a regression line for prediction depends on the correlation between the variables

  o The square of the correlation, r², is the proportion of the variation in the values of y that is explained by the least-squares regression of y on x

  o When there is a straight-line relationship, some of the variation in y is accounted for by the fact that as x changes, it pulls y along with it

  o It is useful to give r² as a measure of how successful the regression was in explaining the response

  o Perfect correlation (r = 1 or r = -1) means the points lie exactly on a straight line, so r² = 1
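The claim that r² is the proportion of variation explained can be checked by computing it two ways: as the square of the correlation, and as 1 minus the ratio of the variation left over after the fit (sum of squared residuals) to the total variation of y about its mean. The sketch assumes that standard definition of "variation"; the helper name `r_squared_two_ways` and both data sets are hypothetical.

```python
import math

# Sketch: r squared computed two ways, as the square of the correlation and as the
# fraction of the variation in y (about its mean) that the regression line accounts for.
# The helper name and data sets are hypothetical.

def r_squared_two_ways(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)  # total variation of y about its mean
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    r = s_xy / math.sqrt(s_xx * s_yy)
    # Variation left over after fitting the line (sum of squared residuals).
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return r ** 2, 1 - unexplained / s_yy


print(r_squared_two_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))    # both values are about 0.6
print(r_squared_two_ways([1, 2, 3, 4, 5], [5, 7, 9, 11, 13]))  # points on a line: both 1.0
```

For the first data set, r² of about 0.6 means roughly 60% of the variation in y is explained by the regression of y on x; when the points lie exactly on a line, nothing is left unexplained.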

• Statistics and causation

  o 1. A strong relationship between two variables does not always mean that changes in one variable cause changes in the other

  o 2. The relationship between two variables is often influenced by other variables lurking in the background

  o 3. The best evidence for causation comes from randomized comparative experiments

• “Nonsense correlations” – correlations that are real, but that would lead to the false conclusion that changing one of the variables causes changes in the other
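How a lurking variable can produce a strong correlation without causation can be shown with a small constructed example. In this hypothetical sketch, a lurking variable z drives both x and y, and y is computed without ever using x. The values, the fixed "noise" terms, and the helper name `correlation` are all assumptions made for illustration.

```python
import math

# Sketch: a lurking variable z produces a strong correlation between x and y,
# even though y is computed without using x at all. All values are hypothetical.

def correlation(xs, ys):
    """Correlation r between two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


z = [1, 2, 3, 4, 5, 6, 7, 8]  # the lurking variable
noise_x = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3]
noise_y = [-0.5, 0.4, 0.2, -0.3, 0.1, 0.5, -0.2, 0.0]

x = [2 * zi + e for zi, e in zip(z, noise_x)]  # x depends only on z
y = [5 * zi + e for zi, e in zip(z, noise_y)]  # y depends only on z, never on x

r_xy = correlation(x, y)
print(f"r between x and y: {r_xy:.3f}")  # close to 1
```

The correlation between x and y is genuine, yet by construction changing x would do nothing to y. Looking only at x and y, the data cannot reveal this; that is the kind of conclusion the notes call nonsense.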