# Chapter 9: Correlation & Regression

Western University

Biology

Biology 2244A/B

Ben Rubin

Fall

Chapter 9: Correlation and Regression
Overview
– Objective: determine whether association exists btwn 2 variables
○ If exists, want to describe w/equation that can be used for predictions
Correlation
Bivariate data: collection of paired sample data
Correlation: exists btwn 2 variables when one of them is related to the other in some way
Exploring the Data
Begin investigation into association btwn 2 variables by constructing scatterplot
– Graph in which paired sample data plotted w/horizontal x-axis & vertical y-axis
○ Each individual pair plotted as a single point
– Study overall pattern of plotted points
○ Note direction of pattern if exists
Can have non-linear pattern
As pattern of points becomes closer to a straight line, association btwn x and y becomes stronger
○ Note any outliers
○ Conclusion largely subjective based on perception of whether a pattern is present
Linear Correlation Coefficient
– Linear correlation coefficient r
○ Use to detect straight-line patterns
○ Measures strength of linear association btwn paired x- and y-quantitative values in a sample
Calculated using sample data
○ Sample statistic used to measure strength of linear correlation btwn x and y
○ Population parameter: ρ
Requirements
When testing hypotheses or making other inferences about r:
1) Sample of paired data is random sample of quantitative data
2) Visual examination of scatterplot confirms that points approximate a straight-line pattern
3) Outliers must be removed if known to be errors
Requirements 2 & 3 are a simplified way to check the formal requirement:
– Pairs of data must have bivariate normal distribution
○ For any fixed value of x, corresponding values of y have a distribution that is bell-shaped
○ For any fixed value of y, values of x have distribution that is bell-shaped
Formula 9-1
– Can interpret r using Table A-6
○ If absolute value of computed r exceeds value in Table A-6 → significant linear correlation
Otherwise, not enough evidence to support a conclusion of linear correlation
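For reference, a standard computational form of the sample linear correlation coefficient is shown below. It is offered on the assumption that it corresponds to the textbook's Formula 9-1, which is not reproduced in these notes; n is the number of data pairs and each sum runs over all pairs.

```latex
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{n\sum x^2 - \left(\sum x\right)^2}\;\sqrt{n\sum y^2 - \left(\sum y\right)^2}}
```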
Interpreting the Linear Correlation Coefficient
– Value of r must always fall between -1 and +1 inclusive
○ r close to 0 → no significant linear correlation btwn x and y
○ r close to -1 or +1 → significant linear correlation btwn x and y
– Use P-value to determine
○ P-value less than or equal to significance level → significant linear correlation (reject null hypothesis of no correlation)
Properties of the Linear Correlation Coefficient r
1) Value always between -1 and +1 inclusive
2) Value of r does not change if all values of either variable are converted to a different scale
3) Value of r is not affected by the choice of x or y (can interchange the naming of the variables)
4) r measures the strength of a linear association; it cannot measure non-linear association
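The properties above can be checked numerically. The sketch below is only an illustration: the data are made up, the helper name `pearson_r` is hypothetical, and the change of scale is assumed to be a linear unit conversion with a positive multiplier.

```python
import math

def pearson_r(xs, ys):
    """Sample linear correlation coefficient r (standard computational form)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired data, made up for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_xy = pearson_r(x, y)

# Property 2: converting x to a different scale leaves r unchanged
x_rescaled = [2.54 * v + 10 for v in x]
r_rescaled = pearson_r(x_rescaled, y)

# Property 3: interchanging x and y leaves r unchanged
r_yx = pearson_r(y, x)

# Property 4: y = x^2 is a perfect non-linear association, yet r is 0 here
x_sym = [-2, -1, 0, 1, 2]
y_quad = [v * v for v in x_sym]
r_quad = pearson_r(x_sym, y_quad)

print(round(r_xy, 4), round(r_rescaled, 4), round(r_yx, 4), r_quad)
```

The first three printed values agree, and the last is 0 even though y is completely determined by x.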
Interpreting r: Explained Variation
The value of r² is the proportion of the variation in y that is explained by the linear association between x and y
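A small numeric check of this interpretation, using made-up data. The sketch assumes the "linear association" is summarized by the least-squares line (the subject of the Regression section), and compares r² with the ratio of explained variation to total variation in y.

```python
import math

# Hypothetical paired data, made up for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sums of squared deviations and cross-products about the means
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)

# Least-squares line y_hat = b0 + b1 * x (assumed model for the linear association)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

explained = sum((yh - y_bar) ** 2 for yh in y_hat)  # variation explained by the line
total = syy                                          # total variation in y

print(round(r ** 2, 3), round(explained / total, 3))
```

Both printed values are 0.6, i.e. 60% of the variation in y is explained by the line for this data set.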
Common Errors Involving Correlation
Three of the most common sources of errors made in interpreting results involving correlation:
1) Correlation does not equal causation
○ Could be a lurking variable: one that affects the variables being studied but is not included in the study
2) Data based on averages: averages suppress individual variation, may inflate correlation coefficient
3) Property of linearity: association may exist between x and y even when there is no significant linear correlation
○ Could have a non-linear association!
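The point about averages can be illustrated with a deliberately artificial data set (all values below are made up, and `pearson_r` is a hypothetical helper): individuals vary widely around each group's mean, and replacing each group by its average removes that variation.

```python
import math

def pearson_r(xs, ys):
    """Sample linear correlation coefficient r from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical design: 4 groups (x = 1..4), 3 individuals per group,
# each individual's y equals the group's x plus an offset
group_x = [1, 2, 3, 4]
offsets = [-3, 0, 3]

x_individual = [x for x in group_x for _ in offsets]
y_individual = [x + d for x in group_x for d in offsets]
r_individual = pearson_r(x_individual, y_individual)

# Replace each group by its average y: individual variation disappears
y_average = [sum(x + d for d in offsets) / len(offsets) for x in group_x]
r_average = pearson_r(group_x, y_average)

print(round(r_individual, 3), round(r_average, 3))
```

Here r is about 0.415 for the individual data but 1.0 for the group averages.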
Formal Hypothesis Test
Standard deviation of r values can be expressed as $\sqrt{(1 - r^2)/(n - 2)}$, the denominator of the test statistic
Reject null hypothesis of ρ = 0 if absolute value of test statistic exceeds critical values
– Sufficient evidence to support claim of linear correlation btwn the 2 variables
Hypothesis Test for Correlation
H0: ρ = 0 (There is no significant linear correlation.)
H1: ρ ≠ 0 (There is a significant linear correlation.)
Test Statistic: $t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$
Critical Values: Use t-table with n – 2 degrees of freedom
P-Value: Use t-table with n – 2 degrees of freedom
Conclusion:
– If |t| > critical value from t-table, reject null hypothesis
– If |t| ≤ critical value, fail to reject null hypothesis
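A minimal sketch of the two-tailed test, assuming the usual t statistic for H0: ρ = 0, namely t = r / sqrt((1 − r²)/(n − 2)). The sample values r = 0.8 and n = 10 are made up, the function name is hypothetical, and the critical value 2.306 is the standard t-table entry for α = 0.05 (two tails) with 8 degrees of freedom.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

# Hypothetical sample result, made up for illustration
r, n = 0.8, 10
t = correlation_t_statistic(r, n)

# Two-tailed critical value for alpha = 0.05 and n - 2 = 8 degrees of freedom,
# as read from a standard t-table
t_critical = 2.306

reject_h0 = abs(t) > t_critical
print(f"t = {t:.3f}, reject H0: {reject_h0}")
```

With these numbers t is about 3.771, which exceeds 2.306, so the null hypothesis of no linear correlation is rejected.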
One-Tailed Tests:
– Can occur w/claim of a positive linear correlation or a negative linear correlation
Claim of Negative Correlation    Claim of Positive Correlation
(Left-tailed test)               (Right-tailed test)
H0: ρ = 0                        H0: ρ = 0
H1: ρ < 0                        H1: ρ > 0
– Handle hypothesis testing method as explained in earlier chapters
Centroid: The point $(\bar{x}, \bar{y})$, given a collection of paired data
r is based on the sum of the products $(x - \bar{x})(y - \bar{y})$
– If points of scatterplot tend to approximate uphill line (as in figure), individual values of $(x - \bar{x})(y - \bar{y})$ tend to be positive
○ Most of points found in 1st & 3rd quadrants (quadrants formed by vertical & horizontal lines through the centroid)
– If approximate downhill line, points are in 2nd and 4th quadrants, so $(x - \bar{x})(y - \bar{y})$ is negative
– No linear pattern = points scattered among 4 quadrants, value of $\sum (x - \bar{x})(y - \bar{y})$ tends to be close to 0
○ Can therefore use $\sum (x - \bar{x})(y - \bar{y})$ as a measure of how the points are arranged
○ Large, positive sum = points predominantly in 1st & 3rd quadrants → positive linear correlation
○ Large, negative sum = points predominantly in 2nd & 4th quadrants → negative linear correlation
○ Sum near zero = points scattered among quadrants → no linear correlation
However, sum depends on magnitude of the numbers used → standardize by dividing x part by $s_x$ and y part by $s_y$
Further modify by introducing divisor of n – 1 → gives a type of average, instead of a sum that grows b/c have more data
End up with equation for r: $r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$
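The rationale above can be traced step by step in code. This is a sketch with made-up data and hypothetical function names: it shows that the raw sum of products changes when x is rescaled, while the standardized version with divisor n − 1 does not.

```python
import math
import statistics

def sum_of_products(xs, ys):
    """Sum of products of deviations from the centroid (x_bar, y_bar)."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

def r_from_z_scores(xs, ys):
    """Standardize each deviation by s_x or s_y, then divide the sum by n - 1."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample standard deviations
    z_products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)

# Hypothetical paired data, made up for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_big = [100 * v for v in x]   # same data with x in units 100 times smaller

raw_sum = sum_of_products(x, y)           # 6
raw_sum_big = sum_of_products(x_big, y)   # 600: depends on magnitude of the numbers
r = r_from_z_scores(x, y)
r_big = r_from_z_scores(x_big, y)         # unchanged by the rescaling

print(raw_sum, raw_sum_big, round(r, 4), round(r_big, 4))
```

The positive sum reflects points lying mostly in the 1st and 3rd quadrants around the centroid; only the standardized value is comparable across data sets.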
Regression
Objective: describe association btwn 2 variables by finding graph & equation of straight line that represents
as
