Correlation and Regression

Published on 28 Sep 2011
School: Simon Fraser University
Department: Political Science
Course: POL 201
A Summary of the Key Points about Correlation and Regression
At the interval level, any relationship can be analyzed in terms of both its nature and its strength.
Nature
we can visualize the shape or nature of the relationship by plotting our cases in a scatter
diagram or scatterplot
the vertical axis represents the dependent variable (Y) and the horizontal axis
represents the independent variable (X), and each case is plotted according to its Y and X
scores
to summarize the nature of the relationship (if any) between X and Y, we can fit a line as
closely as possible to the cluster of points in the scatterplot
this line is called a regression line and is defined by its Y-intercept, a, and a slope, b:
Y = a + bX
‘b’ is known as the regression coefficient
it tells us how much Y changes, on average, for a one-unit change in X
its formula is:
$$b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
‘a’ is the intercept, the value of Y when X = 0
its formula is:
$$a = \bar{Y} - b\bar{X}$$
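The two formulas for the slope and the intercept can be sketched in a few lines of Python. This is only an illustration: the function name `fit_line` is hypothetical, and the data are made up for the example, not taken from the class data set.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator of b: sum of cross-products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator of b: sum of squared deviations of X from its mean.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # The intercept follows from the means and the slope.
    a = mean_y - b * mean_x
    return a, b


# Made-up illustrative data (NOT the class data).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}")  # a = 2.20, b = 0.60
```

With these toy numbers the fitted line is Y = 2.2 + 0.6X: each one-unit increase in X goes with a predicted increase of 0.6 in Y.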
the regression equation also allows us to make predictions of Y by substituting in a value for
X and calculating what Y-value would be produced
for example, the regression equation linking extremist party presence to government
durability in the example used in class is
Y = 33.0 – 0.39X
thus, if a particular country has an extremist vote of 31%, we substitute in the X-score
of 31 and get a predicted average duration of:
Y = 33.0 – 0.39(31) = 20.9 months
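The same substitution can be written as a small function. The coefficients 33.0 and -0.39 come from the regression equation in the notes; the function name `predict_duration` is a hypothetical label used only for this sketch.

```python
def predict_duration(extremist_vote_pct, a=33.0, b=-0.39):
    """Predicted government duration in months, from Y = a + bX.

    The defaults are the intercept and slope quoted in the notes.
    """
    return a + b * extremist_vote_pct


predicted = predict_duration(31)
print(round(predicted, 1))  # 20.9
```

Setting X to 0 returns the intercept, 33.0 months, which is the predicted duration for a country with no extremist vote at all.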
Strength
The correlation coefficient (r) tells us how strong the association is between two variables
and whether it is positive or negative
it ranges from -1 (perfect negative association) through 0 (no association at all) to
+1 (perfect positive association).
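To make the range of r concrete, here is a minimal sketch that computes it with the standard Pearson product-moment formula. That formula is an assumption here, since the notes do not spell it out at this point; the function name `pearson_r` and the data are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative data (NOT the class data).
xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))   # 1.0  (perfect positive)
print(pearson_r(xs, [10, 8, 6, 4, 2]))   # -1.0 (perfect negative)
print(pearson_r(xs, [3, 1, 2, 1, 3]))    # 0.0  (no linear association)
```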
The interpretation of the correlation is based on the principle of ‘proportional reduction in
error’ (PRE)
All PRE measures record the proportion of our errors in guessing Y-scores that are
eliminated if we use our knowledge of the X-scores
they are based on the idea that the more our knowledge of one variable helps us
guess values of the other variable, the more strongly the two variables must be related
with a PRE measure, we always start with an initial guess of each case’s Y-score without
knowledge of its X-score and calculate how much error we have made, then we make a final
guess using knowledge of each case’s X-score and calculate the remaining error; the final
guess should be more accurate than the initial guess (i.e. the error should be reduced) if the
variables are related
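The initial-guess and final-guess logic can be sketched as follows. Two assumptions go beyond this paragraph: error is measured as a sum of squared deviations (the approach the notes turn to for interval data), and the final guesses come from a least-squares line worked out by hand for the made-up data. The function name is hypothetical.

```python
def proportional_reduction_in_error(ys, initial_guesses, final_guesses):
    """Share of the initial error that is eliminated by the final guesses.

    Error is measured here as the sum of squared deviations (an assumption).
    """
    initial_error = sum((y - g) ** 2 for y, g in zip(ys, initial_guesses))
    final_error = sum((y - g) ** 2 for y, g in zip(ys, final_guesses))
    return (initial_error - final_error) / initial_error


# Made-up illustrative data (NOT the class data).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Initial guess: the mean of Y for every case, ignoring X.
mean_y = sum(ys) / len(ys)
initial = [mean_y] * len(ys)

# Final guess: the least-squares line for this toy data, Y = 2.2 + 0.6X.
final = [2.2 + 0.6 * x for x in xs]

pre = proportional_reduction_in_error(ys, initial, final)
print(round(pre, 2))  # 0.6
```

Here knowing X removes 60% of the error made when guessing the mean for every case. If the final guesses were no better than the initial ones, the result would be 0.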
with interval data, our initial guess is simply the mean Y-score for the group of cases, since
the mean lies in the middle of the distribution and no other single guess gives a smaller
total squared error
the error we make for each case is the deviation of its actual Y-score from our guess,
i.e. from the mean of Y
in the example used in class, for instance, Swiss governments lasted an average of
36 months and the mean for all 27 countries was 28.9 months
therefore our initial error for Switzerland is (36 – 28.9) = 7.1 months
we need to sum all these deviations to get the total error, but there is a problem: positive
and negative deviations from the mean cancel out, so the sum is always 0, which isn’t very
informative
so we square the deviations first, then sum them up:
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2$$
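A quick check of why squaring is needed, using made-up durations in months (not the 27-country class data): the raw deviations from the mean cancel out, while the squared deviations add up to a usable total.

```python
# Made-up government durations in months (NOT the class data).
ys = [24, 36, 30, 18, 42]

mean_y = sum(ys) / len(ys)              # 30.0
deviations = [y - mean_y for y in ys]   # [-6.0, 6.0, 0.0, -12.0, 12.0]

# Raw deviations from the mean always cancel out.
sum_dev = sum(deviations)
print(sum_dev)   # 0.0

# Squaring first removes the signs, so the total is informative.
total_ss = sum(d ** 2 for d in deviations)
print(total_ss)  # 360.0
```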