# SOC202H1 Lecture Notes - Linear Regression, Scatter Plot, Linear Equation


CH14 – BIVARIATE CORRELATION AND REGRESSION

CORRELATION – a systematic change in the scores of two interval/ratio

variables

o 2 interval/ratio variables correlated (co-relate) when measurements

of one variable change in tandem with the other consistently from

case to case

o Usually: dependent variable (Y) and independent variable (X)

o Simple Linear Correlation and Regression – use formula for a straight

line to improve best estimates of an interval/ratio dependent variable

(Y) for all values of an interval/ratio independent variable (X)

o Formula for straight line: $\hat{Y} = a + bX$

Applies for interval and ratio measurements

Used only when there’s a linear relationship between X and Y

Graphical representation of the relationship between two interval/ratio

variables

o Scatterplot:

A two-dimensional grid of coordinates of two interval/ratio

variables, X and Y on the X-axis and Y-axis

o Coordinate:

A point on a scatterplot where the values of X and Y are plotted

for a case

o Linear pattern:

One where the coordinates of the scatterplot fall into a cigar-

shaped pattern that approximates the shape of a straight line

Identifying linear patterns

o On scatterplot: coordinates in elongated cigar-shaped pattern

indicate a linear relationship

o POSITIVE CORRELATION – an increase in X is related to an increase in

Y (as X increases, Y has a tendency to increase)

o NEGATIVE CORRELATION – an increase in X is related to a decrease in

Y (as X increases, Y has a tendency to decrease)

o NO CORRELATION – an increase in X is unrelated to the score of Y. (As

X increases, Y-scores vary randomly)

Using the Linear Regression

REGRESSION LINE – the best-fitting straight line plotted through the X, Y-

coordinates of a scatterplot of two interval/ratio variables

o Used to estimate Y values at any X

e.g. an estimate based on knowing the precise relationship between

height and weight

o $\hat{Y}$ = value of Y predicted by the regression line

knowledgeable estimate of Y based on knowing Y is related to X

o Deviation score of a single case ($Y - \bar{Y}$) = error in estimating Y from the mean

o Residual ($Y - \hat{Y}$) = error in estimating a case’s Y value from the regression line

o When relationship is found to exist between two interval/ratio

variables make best estimates by using the regression line rather

than the mean
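
The gain from using the regression line instead of the mean can be checked on a small example. The sketch below uses made-up data and assumes the line $\hat{Y} = 2.2 + 0.6X$ has already been fitted to it (it is the least-squares line for these five points); none of these numbers come from the lecture.

```python
# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_y = sum(y) / len(y)

# Assumed regression line for these points: Y-hat = a + b*X
a, b = 2.2, 0.6
y_hat = [a + b * xi for xi in x]

# Errors when every case is estimated with the mean of Y (deviation scores)
errors_from_mean = [yi - mean_y for yi in y]
# Errors when each case is estimated with the regression line
errors_from_line = [yi - yh for yi, yh in zip(y, y_hat)]

# Square the errors so positive and negative errors do not cancel out
sse_mean = sum(e ** 2 for e in errors_from_mean)
sse_line = sum(e ** 2 for e in errors_from_line)

print(f"squared error using the mean: {sse_mean:.2f}")
print(f"squared error using the line: {sse_line:.2f}")
```

The total squared error is smaller for the regression line, which is the sense in which it gives better estimates than the mean when X and Y are related.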

Pearson’s r Bivariate Correlation Coefficient

PEARSON’S r BIVARIATE CORRELATION COEFFICIENT – Measures tightness

of fit of X, Y coordinates around the regression line

o CO-VARIATION OF X AND Y – sum of the deviation scores of X multiplied

by the deviation scores of Y, a.k.a. the sum of cross products: $SCP = \sum (X - \bar{X})(Y - \bar{Y})$
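
As a minimal sketch of the sum of cross products, the snippet below computes the deviation scores for X and Y and multiplies them case by case. The data are hypothetical and the variable names are chosen only for this illustration.

```python
# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Deviation scores: how far each case sits from its mean
dev_x = [xi - mean_x for xi in x]
dev_y = [yi - mean_y for yi in y]

# One cross product per case, then the sum (SCP)
cross_products = [dx * dy for dx, dy in zip(dev_x, dev_y)]
scp = sum(cross_products)

print(f"SCP = {scp:.1f}")
```

A positive SCP means that cases above the mean on X tend to be above the mean on Y as well; a negative SCP means the opposite.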

Characteristics of Pearson’s r Bivariate Correlation Coefficient

o The sign (+ or -) of r indicates the direction of relationship

o r ranges from -1.0 (perfect negative correlation) to +1.0 (perfect

positive correlation)

larger the absolute value of r the tighter the fit of X, Y

coordinates around the regression line

o When r = 0, the regression line is flat: no correlation
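
The sign and range of r can be illustrated with a short sketch. It assumes the standard deviation-score form of Pearson's r (SCP divided by the square root of the product of the two sums of squares); the helper name `pearson_r` and the data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviation scores: SCP / sqrt(SS_x * SS_y)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    scp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return scp / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])  # Y rises exactly in step with X
r_neg = pearson_r(x, [10, 8, 6, 4, 2])  # Y falls exactly as X rises
r_mid = pearson_r(x, [2, 4, 5, 4, 5])   # looser positive pattern

print(f"perfect positive: {r_pos:.2f}")
print(f"perfect negative: {r_neg:.2f}")
print(f"looser positive:  {r_mid:.2f}")
```

The two perfect patterns land on the ends of the range, and the looser pattern falls between 0 and +1.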

Understanding the Pearson’s r Formulation

o Divide scatter plot into 4 quadrants by $\bar{X}$ and $\bar{Y}$, note the relationship

Quadrant 1 & 4 populated = negative relationship; Quadrant 2 &

3 populated = positive relationship; no pattern = no relationship

o All elements of r equation involve deviation scores

o r gauges how deviation scores of X and Y fluctuate together (covary)

if positive relationship then expect cases on the positive side of $\bar{X}$

to also be on the positive side of $\bar{Y}$

denominator gauges how much total error X and Y have relative

to one another

assuming no relationship between X and Y

numerator gauges how well X and Y fluctuate in a pattern

measures correlation effect of the relationship

perfect relationship: numerator will equal denominator
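
The notes do not reproduce the formula itself, so the notation below is an assumption: it is the standard deviation-score form of Pearson's r, with the co-variation of X and Y in the numerator and the combined variation of X and Y in the denominator. The second line sketches why a perfect linear relationship makes the two equal in size: if every Y deviation is an exact multiple b (not zero) of the matching X deviation, the multiple cancels and only its sign is left.

```latex
r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}
         {\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}

% Perfect linear relationship: Y - \bar{Y} = b (X - \bar{X}) for every case
r = \frac{b \sum (X - \bar{X})^2}
         {\sqrt{b^2 \left[ \sum (X - \bar{X})^2 \right]^2}}
  = \frac{b}{|b|} = \pm 1
```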

Regression Statistics

Linear equation formula: $\hat{Y} = a + bX$

o First calculate the values of a and b then plug in X and solve for Y

o $\hat{Y}$: the predicted Y

An estimate of the dependent variable Y computed for a given

value of the independent variable X

o b: the Regression Coefficient or Slope

Effect on Y per 1-unit change in X

o a: The Y-intercept or the constant

Anchors the regression line to the Y-axis

Usually hypothetical point b/c there may be no case where X = 0
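
A minimal sketch of calculating a and b and then solving for the predicted Y. It assumes the usual least-squares formulas, which these notes do not spell out: b is the sum of cross products divided by the sum of squares of X, and a is the mean of Y minus b times the mean of X. The data and the `predict` helper are hypothetical.

```python
# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

scp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)

b = scp / ss_x           # slope: change in predicted Y per 1-unit change in X
a = mean_y - b * mean_x  # Y-intercept: predicted Y when X = 0

def predict(x_value):
    """Plug a value of X into Y-hat = a + b*X."""
    return a + b * x_value

print(f"b = {b:.2f}, a = {a:.2f}")
print(f"predicted Y at X = 6: {predict(6):.2f}")
```

Note that `predict(0)` simply returns a, which matches the point about the intercept anchoring the line to the Y-axis even when no case has X = 0.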

Calculating the Terms of the Regression Line Formula

Total Sum of Squares (TSS) – the overall difference between each score and

the mean of Y; i.e. the sum of the squared deviation scores: $TSS = \sum (Y - \bar{Y})^2$

Explained Sum of Squares (ESS or MSS) – the overall difference between

each predicted value and the mean of Y: $ESS = \sum (\hat{Y} - \bar{Y})^2$

Residual Sum of Squares (RSS) – the difference between each score and

its predicted score: $RSS = \sum (Y - \hat{Y})^2$

Thus it follows: $TSS = ESS + RSS$
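
The three sums of squares can be checked numerically. The sketch below uses hypothetical data and assumes the line is fitted by least squares (slope = SCP divided by the sum of squares of X), which is the condition under which the total splits exactly into an explained part and a residual part.

```python
# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope and intercept (assumed fitting method)
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x
y_hat = [a + b * xi for xi in x]

tss = sum((yi - mean_y) ** 2 for yi in y)               # score vs. mean
ess = sum((yh - mean_y) ** 2 for yh in y_hat)           # prediction vs. mean
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # score vs. prediction

print(f"TSS = {tss:.2f}, ESS = {ess:.2f}, RSS = {rss:.2f}")
print(f"ESS + RSS = {ess + rss:.2f}")
```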

Steps of calculating bivariate correlation and regression statistics

o 1. Decide independent vs. dependent variable

o 2. Observe scatterplot, determine relationship

o 3. Calculate mean X and Y

o 4. Calculate the sums of squares for X and Y and the SCP

o 5. Calculate Pearson’s r correlation coefficient

o 6. Calculate the regression slope b and the Y-intercept a

Statistical Follies and Fallacies

Linear equations work only with linear pattern in the scatterplot

Outlier coordinates distort correlation & regression coefficients, cause:

o Attenuation of Correlation – the weakening or reduction of

correlation and regression coefficients

o Outliers coordinate may falsely imply a relationship
