Textbook Notes
University of Toronto St. George
Psychology
PSY201H1
Kristie Dukewich
# Chapter Seven PSY201



Fall


Chapter 7: Linear Regression
– regression and correlation are closely related
+ both involve the relationship between two variables
+ both utilize the same set of basic data: paired scores taken from the same/matched subjects
– correlation is concerned with the magnitude and direction of the relationship
+ regression focuses on using the relationship for prediction
– prediction easy when relationship is perfect
+ if perfect, all the points fall on a straight line and all we need to do is derive the equation of the straight line
and use it for prediction
+ perfect relationship = perfect prediction
– all predicted values are exactly equal to the observed values and prediction error equals zero
+ situation more complicated when relationship is imperfect
– regression: a topic that considers using the relationship between two or more variables for prediction
– regression line: a best fitting line used for prediction
Prediction and Imperfect Relationships:
– ie: in a given scatter plot of data,
+ relationship is imperfect, positive, and linear
+ problem for prediction: how to determine the single straight line that best describes the data
+ solution most often used is to construct the line that minimizes the errors of prediction according to a least-squares
criterion → this line is called the least-squares regression line
• with the least-squares regression line drawn through the data, the vertical distance between each point and the line represents the error in prediction
• if we let Y' = the predicted Y value and Y = the actual value → Y – Y' = the error for each point
• the total error in prediction can't be measured by Σ(Y-Y'), because some of the Y' values will be greater than Y and
some will be less → there'll be both positive and negative error scores, and a simple algebraic sum of these would
cancel
• a similar situation arose when considering measures of average dispersion
+ in deriving the equation for the standard deviation, we squared X-X(bar) to overcome the fact that there were positive
and negative deviation scores that cancelled each other
+ the same solution works here too →
• instead of just summing Y-Y', first compute (Y-Y')^2 for each score → this removes the negative values and
eliminates the cancellation problem
• if we minimize Σ(Y-Y')^2, we minimize the total error of prediction
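The cancellation point can be seen with a quick sketch (the error scores here are made up for illustration, not from the text's data):

```python
# Hypothetical prediction errors (Y - Y') for five scores.
errors = [0.5, -0.25, -0.25, 0.5, -0.5]

# The algebraic sum cancels to zero even though every prediction is off...
print(sum(errors))                      # 0.0 -- positives and negatives cancel

# ...but squaring first removes the cancellation problem.
print(sum(e ** 2 for e in errors))      # 0.875 -- a meaningful total error
```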
– least-squares regression line: prediction line that minimizes the total error of prediction, according to the
least-squares criterion of Σ(Y-Y')^2
– for any linear relationship, there's only one line that'll minimize Σ(Y-Y')^2 → there's only one least-squares
regression line for each linear relationship
– use the least-squares regression line because it gives the greatest overall accuracy in prediction
+ ie: another prediction line drawn in 7.2(b)
+ line picked arbitrarily; it's just one of an infinite number that could've been drawn
+ it does better on some points (A & B) and worse on others (C & D) – considering all the points, it's clear that the line
of (a) fits the points better than the line of (b)
+ the total error in prediction, represented by Σ(Y-Y')^2, is less for the least-squares regression line than for the line in
(b) – in fact, it's less than for any other possible prediction line → the least-squares regression line is used because it
gives greater overall accuracy in prediction than any other possible line
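This minimizing property can be sketched in Python with made-up paired scores (not the text's IQ/GPA data): the line built from the least-squares constants gives a smaller Σ(Y-Y')^2 than any perturbed line.

```python
# Made-up paired scores taken from matched subjects.
X = [1, 2, 3, 4, 5]
Y = [2.0, 2.5, 3.5, 3.0, 4.5]
N = len(X)

def total_squared_error(b, a):
    """Sum of (Y - Y')^2 for the candidate prediction line Y' = b*X + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(X, Y))

# Least-squares regression constants for predicting Y given X.
mean_x, mean_y = sum(X) / N, sum(Y) / N
ss_x = sum((x - mean_x) ** 2 for x in X)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
b_y = sp_xy / ss_x
a_y = mean_y - b_y * mean_x

# Any other line has a larger total squared error of prediction.
best = total_squared_error(b_y, a_y)
assert best < total_squared_error(b_y + 0.1, a_y)
assert best < total_squared_error(b_y, a_y - 0.2)
print(round(best, 3))   # 0.675 for these made-up scores
```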
Constructing the Least-Squares Regression Line: Regression of Y on X
– equation for the least-squares regression line for predicting Y given X is:
Y' = bY(X) + aY
- this is the general equation of a straight line that we've been using all along
+ aY & bY are called regression constants
- line called the regression line of Y on X, or the regression of Y on X → predicting Y given X
– the bY regression constant is equal to
bY = [ΣXY – (ΣX)(ΣY)/N] / SSX, and aY = Y(bar) – bY(X(bar))
- since we need the bY constant to determine the aY constant, we've got to find bY first and then aY
+ once both are found, they're substituted into the regression equation
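The find-bY-then-aY procedure can be sketched as follows, using the summary values listed later in the notes for the IQ (X) and GPA (Y) data; the notes don't list the means, so aY is taken directly from the worked example rather than computed:

```python
# Summary values for the IQ (X) and GPA (Y) example (N = 12 paired scores).
ss_x = 936.25            # SSX
sp_xy = 69.375           # sum(XY) - (sum(X) * sum(Y)) / N

# Step 1: find bY.
b_y = round(sp_xy / ss_x, 3)
print(b_y)               # 0.074

# Step 2: aY = mean(Y) - bY * mean(X); the notes don't list the means,
# but the worked example shows the result, about -7.006.
a_y = -7.006

# Step 3: substitute into the regression equation Y' = bY*X + aY.
iq = 124
predicted_gpa = b_y * iq + a_y
print(round(predicted_gpa, 2))   # 2.17, matching the worked example
```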
+ ie: IQ and GPA (7.2)
- the equation for Y' can be used to predict GPA knowing only the student's IQ score – suppose a student's IQ is 124; what's the predicted GPA?
→ Y' = 0.074X – 7.006
= 0.074(124) – 7.006
= 2.17
Regression of X on Y
– so far, predicting Y scores from X scores
+ derived a regression line that enabled us to predict Y given X → the regression line of Y on X; it's also possible to
predict X given Y
– to predict X given Y, must derive a new regression line
+ cannot use regression equation for predicting Y given X
– ie: involving IQ (X) and GPA (Y)
+ Y' = 0.074X – 7.006
+ cannot use this to predict IQ given GPA → must derive new regression constants because old regression line
derived to minimize errors in Y variable
– minimizing Y' errors and minimizing X' errors will not lead to same regression lines
+ exception occurs when relationship is perfect rather than imperfect – both regression lines coincide, forming the
single line that hits all the points
+ the regression line for predicting X from Y is sometimes called the regression line of X on Y, or the regression of X on Y
– use IQ and GPA again
+ predict IQ (X) from GPA (Y)
– the linear regression equation for predicting X given Y is X' = bX(Y) + aX, with bX = [ΣXY – (ΣX)(ΣY)/N] / SSY and aX = X(bar) – bX(Y(bar)) – this line, along with the line predicting Y given X, can be plotted through the same data
+ two lines are different → expected when relationship is imperfect
– although different equations do exist for computing the second regression line, they are seldom used
+ instead, it is common practice to designate the predicted variable as Y' and the given variable as X
→ if we wanted to predict IQ from GPA, we'd designate IQ as Y' variable and GPA as X variable and then use the
regression equation for predicting Y given X
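A minimal sketch of this convention, with one fitting routine and made-up data (not the text's): to predict X from Y you simply pass the variables in the other order, and the two resulting lines differ because the relationship is imperfect.

```python
def regression_constants(given, predicted):
    """Least-squares constants (b, a) for predicting `predicted` from `given`."""
    n = len(given)
    mg = sum(given) / n
    mp = sum(predicted) / n
    ss_g = sum((g - mg) ** 2 for g in given)
    sp = sum((g - mg) * (p - mp) for g, p in zip(given, predicted))
    b = sp / ss_g
    return b, mp - b * mg

# Made-up imperfect paired scores.
X = [1, 2, 3, 4, 5]
Y = [2.0, 2.5, 3.5, 3.0, 4.5]

b_yx, a_yx = regression_constants(X, Y)   # regression of Y on X
b_xy, a_xy = regression_constants(Y, X)   # variables relabeled: X predicted from Y

# With an imperfect relationship the two lines do not coincide: the Y-on-X
# line rewritten to give X in terms of Y would have slope 1/b_yx instead.
assert abs(b_xy - 1 / b_yx) > 0.01
```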
Measuring Prediction Errors: The Standard Error of Estimate
– regression line represents our best estimate of the Y scores given their corresponding X values
+ unless the relationship between X and Y is perfect, most of the actual Y values will not fall on the regression line –
when the relationship is imperfect, there will necessarily be prediction errors
+ useful to know the magnitude of the errors
– quantifying prediction errors involves computing the standard error of estimate
+ the standard error of estimate is much like the standard deviation: it gives a measure of the average deviation of the
prediction errors about the regression line
+ the regression line itself can be considered an estimate of the mean of the Y values, one that changes with the X values
+ with the standard deviation, the sum of the deviations, Σ(X-X(bar)), equaled zero – we had to square the deviations to
obtain a meaningful average; the situation is the same with the standard error of estimate
+ since the sum of the prediction errors, Σ(Y-Y'), equals 0, we must square them also – the average is then obtained by
summing the squared values, dividing by N-2, and taking the square root of the quotient
– the equation for the standard error of estimate for predicting Y given X is:
standard error of estimate = sqrt[ Σ(Y-Y')^2 / (N-2) ]
which, for computation, can be written as sqrt[ (SSY – [ΣXY – (ΣX)(ΣY)/N]^2 / SSX) / (N-2) ]
+ note that we divide by N-2 rather than N-1 as was done with the standard deviation
– in determining the bY regression coefficient, we already calculated the values for SSX and SSY
- calculate the standard error of estimate for the grade point and IQ data (7.1, 7.2)
+ we'll let GPA be the Y variable and IQ the X variable → calculate the standard error of estimate for predicting grade
point average given IQ
+ SSx = 936.25
SSy = 7.022
ΣXY – (ΣX)(ΣY)/N = 69.375
N = 12
substituting into the equation for the standard error of estimate for predicting Y given X:
→ standard error of estimate = 0.43; this measure has been computed over all the Y scores
+ for it to be meaningful, we must assume that the variability of Y remains constant as we go from one X score to
the next – assumption of homoscedasticity
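The computation just worked through can be sketched with the summary values above:

```python
import math

# Summary values for the IQ (X) and GPA (Y) data from the notes.
ss_x = 936.25
ss_y = 7.022
sp_xy = 69.375          # sum(XY) - (sum(X) * sum(Y)) / N
N = 12

# Standard error of estimate for predicting Y given X,
# dividing by N - 2 rather than N - 1.
s_est = math.sqrt((ss_y - sp_xy ** 2 / ss_x) / (N - 2))
print(round(s_est, 2))   # 0.43, matching the notes
```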
