# SOC202H1 Lecture Notes - Linear Regression, Scatter Plot, Linear Equation

School: UTSG
Department: Sociology
Course: SOC202H1
CH14 BIVARIATE CORRELATION AND REGRESSION
CORRELATION: a systematic change in the scores of two interval/ratio
variables
o 2 interval/ratio variables correlated (co-relate) when measurements
of one variable change in tandem with the other consistently from
case to case
o Usually: dependent variable (Y) and independent variable (X)
o Simple Linear Correlation and Regression use the formula for a straight
line to improve best estimates of an interval/ratio dependent variable
(Y) for all values of an interval/ratio independent variable (X)
o Formula for straight line: $Y = a + bX$
Applies for interval and ratio measurements
Used only when there’s a linear relationship between X and Y
Graphical representation of the relationship between two interval/ratio
variables
o Scatterplot:
A two-dimensional grid of coordinates of two interval/ratio
variables, X and Y on the X-axis and Y-axis
o Coordinate:
A point on a scatterplot where the values of X and Y are plotted
for a case
o Linear pattern:
One where the coordinates of the scatterplot fall into a cigar-
shaped pattern that approximates the shape of a straight line
Identifying linear patterns
o On scatterplot: coordinates in an elongated cigar-shaped pattern
indicate a linear relationship
o POSITIVE CORRELATION: an increase in X is related to an increase in
Y (as X increases, Y has a tendency to increase)
o NEGATIVE CORRELATION: an increase in X is related to a decrease in
Y (as X increases, Y has a tendency to decrease)
o NO CORRELATION: an increase in X is unrelated to the score of Y (as
X increases, Y-scores vary randomly)
Using the Linear Regression
REGRESSION LINE: the best-fitting straight line plotted through the
X, Y-coordinates of a scatterplot of two interval/ratio variables
o Used to estimate Y values at any X
e.g., an estimate based on knowing the precise relationship between
height and weight
o $\hat{Y}$ = value of Y predicted by the regression line
knowledgeable estimate of Y based on knowing Y is related to X
o Deviation score of a single case = error in estimating with the mean:
$Y - \bar{Y}$
o Error in estimating a case's Y value with the regression line:
$Y - \hat{Y}$
o When a relationship is found to exist between two interval/ratio
variables, make best estimates by using the regression line rather
than the mean
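
The point about estimating from the regression line rather than the mean can be checked numerically. The sketch below is only an illustration: the five cases and the line coefficients `a` and `b` are hypothetical values chosen for the example, not data from the lecture.

```python
# Hypothetical data: five cases measured on an interval/ratio X and Y.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Assumed regression line for these data (illustrative coefficients).
a, b = 2.2, 0.6

mean_y = sum(ys) / len(ys)

# Error when every case is estimated by the mean of Y (deviation scores).
errors_mean = [y - mean_y for y in ys]
# Error when each case is estimated by the regression line.
errors_line = [y - (a + b * x) for x, y in zip(xs, ys)]

# Squaring keeps positive and negative errors from cancelling out.
sse_mean = sum(e ** 2 for e in errors_mean)
sse_line = sum(e ** 2 for e in errors_line)

print(f"squared error around the mean: {sse_mean:.2f}")
print(f"squared error around the line: {sse_line:.2f}")
```

With these made-up numbers the total squared error is smaller around the line than around the mean, which is the sense in which the line gives better estimates when X and Y are related.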
Pearson’s r Bivariate Correlation Coefficient
PEARSON’S r BIVARIATE CORRELATION COEFFICIENT: measures tightness
of fit of X, Y coordinates around the regression line
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$$
o CO-VARIATION OF X AND Y: sum of the deviation scores of X multiplied
by the deviation scores of Y, a.k.a. sum of cross products (SCP)
$$SCP = \sum (X - \bar{X})(Y - \bar{Y})$$
Characteristics of Pearson’s r Bivariate Correlation Coefficient
o The sign (+ or -) of r indicates the direction of the relationship
o r ranges from -1.0 (perfect negative correlation) to +1.0 (perfect
positive correlation)
The larger the absolute value of r, the tighter the fit of X, Y
coordinates around the regression line
o When r = 0, the regression line is flat: no correlation
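
These characteristics can be seen in a small computation. The sketch below assumes the standard deviation-score form of Pearson's r (sum of cross products divided by the square root of the product of the sums of squares of X and Y). The helper name `pearson_r` and all data values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviation scores: SCP / sqrt(SS_X * SS_Y)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross products of the X and Y deviation scores.
    scp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviation scores for X and for Y.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return scp / math.sqrt(ss_x * ss_y)

xs = [1, 2, 3, 4, 5]                           # hypothetical X scores
r_positive = pearson_r(xs, [2, 4, 5, 4, 5])    # Y tends to rise with X
r_negative = pearson_r(xs, [5, 4, 5, 4, 2])    # Y tends to fall with X
r_perfect = pearson_r(xs, [3, 5, 7, 9, 11])    # Y = 1 + 2X exactly

print(round(r_positive, 3), round(r_negative, 3), round(r_perfect, 3))
```

The sign follows the direction of the relationship, and the perfectly linear case reaches the upper bound of +1.0.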
Understanding the Pearson’s r Formulation
o Divide scatter plot into 4 quadrants by $\bar{X}$ and $\bar{Y}$,
note the relationship
Quadrants 1 & 4 populated = negative relationship; quadrants 2 &
3 populated = positive relationship; no pattern = no relationship
o All elements of r equation involve deviation scores
o r gauges how deviation scores of X and Y fluctuate together (covary)
if positive relationship then expect cases on the positive side of
$\bar{X}$ to also be on the positive side of $\bar{Y}$
denominator gauges how much total error X and Y have relative
to one another
assuming no relationship between X and Y
numerator gauges how well X and Y fluctuate in a pattern
measures correlation effect of the relationship
perfect relationship: numerator will equal denominator (in absolute value)
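
The quadrant reasoning can be sketched by checking the signs of the deviation scores. This is a rough illustration that assumes the quadrants are drawn at the mean of X and the mean of Y. The data are hypothetical.

```python
# Hypothetical data with a positive relationship.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 3, 2, 5, 4, 6]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

same_sign = 0       # both deviations positive, or both negative
opposite_sign = 0   # one deviation positive, the other negative
scp = 0.0           # running sum of cross products

for x, y in zip(xs, ys):
    cross_product = (x - mean_x) * (y - mean_y)
    scp += cross_product
    if cross_product > 0:
        same_sign += 1
    elif cross_product < 0:
        opposite_sign += 1

print(f"same-sign cases: {same_sign}, opposite-sign cases: {opposite_sign}")
print(f"sum of cross products: {scp}")
```

When most cases fall where the X and Y deviations share a sign, the cross products are mostly positive, so the numerator of r is positive. The reverse pattern gives a negative numerator.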
Regression Statistics
Linear equation formula: $\hat{Y} = a + bX$
o First calculate the values of a and b then plug in X and solve for Y
o $\hat{Y}$: the predicted Y
An estimate of the dependent variable Y computed for a given
value of the independent variable X
o b: the Regression Coefficient or Slope
Effect on Y per 1-unit change in X
o a: the Y-intercept or the constant
Anchors the regression line to the Y-axis
Usually hypothetical point b/c there may be no case where X = 0
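
A minimal sketch of how the three terms fit together, assuming the line has the form predicted Y = a + bX. The function name `predict_y` and the coefficient values are hypothetical and chosen only for illustration.

```python
def predict_y(x, a, b):
    """Predicted Y for a given X on the line Y-hat = a + b * X."""
    return a + b * x

# Hypothetical coefficients, not taken from the lecture.
a, b = 2.2, 0.6

y_hat_at_0 = predict_y(0, a, b)   # at X = 0 the prediction is the intercept a
y_hat_at_3 = predict_y(3, a, b)
y_hat_at_4 = predict_y(4, a, b)

# A 1-unit change in X changes the prediction by the slope b.
slope_check = y_hat_at_4 - y_hat_at_3

print(y_hat_at_0, y_hat_at_3, y_hat_at_4, slope_check)
```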
Calculating the Terms of the Regression Line Formula
$$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$
$$a = \bar{Y} - b\bar{X}$$
Total Sum of Squares (TSS): the overall difference between each score and
the mean of Y; i.e. the sum of the squared deviation scores
$$TSS = \sum (Y - \bar{Y})^2$$
Explained Sum of Squares (ESS or MSS): the overall difference between
each predicted value and the mean of Y
$$ESS = \sum (\hat{Y} - \bar{Y})^2$$
Residual Sum of Squares (RSS): the overall difference between each score
and its predicted score
$$RSS = \sum (Y - \hat{Y})^2$$
Thus it follows:
$$TSS = ESS + RSS$$
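
The decomposition of the total sum of squares can be verified on a small example. The sketch below assumes the usual least-squares slope (sum of cross products over the sum of squares of X) and intercept (mean of Y minus slope times mean of X). The data are hypothetical.

```python
# Hypothetical data: five cases.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept (standard formulas, assumed here).
scp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
ss_x = sum((x - mean_x) ** 2 for x in xs)
b = scp / ss_x
a = mean_y - b * mean_x

predicted = [a + b * x for x in xs]

# Total, explained, and residual sums of squares for Y.
tss = sum((y - mean_y) ** 2 for y in ys)
ess = sum((p - mean_y) ** 2 for p in predicted)
rss = sum((y - p) ** 2 for y, p in zip(ys, predicted))

print(f"TSS = {tss:.2f}, ESS = {ess:.2f}, RSS = {rss:.2f}")
print(f"ESS + RSS = {ess + rss:.2f}")
```

The equality holds only when the line is the least-squares line. With arbitrary values of a and b the explained and residual parts would not generally add up to the total.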
Steps of calculating bivariate correlation and regression statistics
o 1. Decide independent vs. dependent variable
o 2. Observe scatterplot, determine relationship
o 3. Calculate mean X and Y
o 4. Calculate TSS for X and Y and SCP
o 5. Calculate Pearson’s r correlation coefficient
o 6. Calculate the regression slope b and the Y-intercept a
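
Steps 3 to 6 can be sketched in a few lines. This is an illustrative outline rather than the course's own worksheet: the data are hypothetical, X is treated as the independent variable, and the formulas are the standard deviation-score forms for r, b, and a.

```python
import math

# Step 1 (assumed): X is the independent variable, Y the dependent variable.
xs = [1, 2, 3, 4, 5]   # hypothetical X scores
ys = [2, 4, 5, 4, 5]   # hypothetical Y scores
n = len(xs)

# Step 3: means of X and Y.
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Step 4: sums of squares for X and Y, and the sum of cross products.
tss_x = sum((x - mean_x) ** 2 for x in xs)
tss_y = sum((y - mean_y) ** 2 for y in ys)
scp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Step 5: Pearson's r correlation coefficient.
r = scp / math.sqrt(tss_x * tss_y)

# Step 6: regression slope b and Y-intercept a.
b = scp / tss_x
a = mean_y - b * mean_x

print(f"mean X = {mean_x}, mean Y = {mean_y}")
print(f"TSS_X = {tss_x}, TSS_Y = {tss_y}, SCP = {scp}")
print(f"r = {r:.3f}, b = {b:.2f}, a = {a:.2f}")
```

Step 2 (inspecting the scatterplot) is not coded here. It still comes first, since these formulas only make sense when the pattern is roughly linear.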
Statistical Follies and Fallacies
Linear equations work only with linear pattern in the scatterplot
Outlier coordinates distort correlation & regression coefficients, cause:
o Attenuation of Correlation: the weakening or reduction of
correlation and regression coefficients
o Outlier coordinates may falsely imply a relationship
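
Both outlier effects can be demonstrated with made-up numbers. The sketch below assumes the standard deviation-score form of Pearson's r. The helper `pearson_r` and every data point are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviation scores: SCP / sqrt(SS_X * SS_Y)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    scp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return scp / math.sqrt(ss_x * ss_y)

# Attenuation: a perfectly linear pattern plus one outlier at (6, 0).
line_x = [1, 2, 3, 4, 5]
line_y = [1, 2, 3, 4, 5]
r_clean = pearson_r(line_x, line_y)
r_attenuated = pearson_r(line_x + [6], line_y + [0])

# False relationship: an unrelated cloud plus one outlier at (10, 10).
cloud_x = [1, 2, 3, 1, 2, 3]
cloud_y = [1, 1, 1, 3, 3, 3]
r_cloud = pearson_r(cloud_x, cloud_y)
r_inflated = pearson_r(cloud_x + [10], cloud_y + [10])

print(f"clean r = {r_clean:.3f}, with outlier r = {r_attenuated:.3f}")
print(f"cloud r = {r_cloud:.3f}, with outlier r = {r_inflated:.3f}")
```

In both cases a single added case changes r substantially, which is why the scatterplot should be inspected before the coefficients are interpreted.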
