# Chapter 9: Correlation & Regression

School: Western University
Department: Biology
Course: Biology 2244A/B
Professor: Ben Rubin
Semester: Fall

Description
## Overview

- Objective: determine whether an association exists btwn 2 variables
  - If it exists, want to describe it w/ an equation that can be used for predictions

## Correlation

- Bivariate data: collection of paired sample data
- Correlation: exists btwn 2 variables when one of them is related to the other in some way

### Exploring the Data

Begin investigation into association btwn 2 variables by constructing a scatterplot.

- Graph in which paired sample data are plotted w/ horizontal x-axis & vertical y-axis
  - Each individual pair plotted as a single point
- Study overall pattern of plotted points
  - Note direction of pattern if one exists
    - Can have a non-linear pattern
    - As pattern of points becomes closer to a straight line → association btwn x and y becomes stronger
  - Note any outliers
  - Conclusion largely subjective → based on perception of whether a pattern is present

### Linear Correlation Coefficient

- Linear correlation coefficient r
  - Use to detect straight-line patterns
  - Measures strength of linear association btwn paired x- and y-quantitative values in a sample
    - Calculated using sample data
  - Sample statistic used to measure strength of linear correlation btwn x and y
  - Population parameter: ρ

### Requirements

When testing hypotheses or making other inferences about r:

1) Sample of paired data is a random sample of quantitative data
2) Visual examination of scatterplot → points approximate a straight-line pattern
3) Outliers must be removed if known to be errors

Requirements 2 & 3 → simplified way to check the formal requirement:

- Pairs of data must have a bivariate normal distribution
  - For any fixed value of x, corresponding values of y have a bell-shaped distribution
  - For any fixed value of y, values of x have a bell-shaped distribution

### Formula 9-1

- Can interpret r using Table A-6
  - If absolute value of computed r exceeds value in Table A-6 → significant linear correlation
    - Otherwise, not enough evidence of a linear correlation

### Interpreting the Linear Correlation Coefficient

- Value of r must always fall between −1 and +1 inclusive
  - r close to 0 → no significant linear correlation btwn x and y
  - r close to −1 or +1 → significant linear correlation btwn x and y
- Use P-value to determine significance
  - P-value less than or equal to significance level → significant linear correlation (reject null hypothesis of no correlation)

### Properties of the Linear Correlation Coefficient r

1) Value always between −1 and +1 inclusive
2) Value of r does not change if all values of either variable are converted to a different scale
3) Value of r is not affected by the choice of x or y (can interchange the naming of the variables)
4) r measures the strength of a linear association → cannot measure non-linear association

### Interpreting r: Explained Variation

The value of $r^2$ is the proportion of the variation in y that is explained by the linear association between x and y.

### Common Errors Involving Correlation

Three of the most common sources of errors made in interpreting results involving correlation:

1) Correlation does not equal causation
   - Could be a lurking variable → affects the variables being studied, but is not included in the study
2) Data based on averages → averages suppress individual variation, may inflate correlation coefficient
3) Property of linearity → association may exist between x and y even when there is no significant linear correlation
   - Could have a non-linear association!

### Formal Hypothesis Test

Standard deviation of r values can be expressed as the denominator of the test statistic. Reject null hypothesis of ρ = 0 if absolute value of test statistic exceeds critical values.

- Sufficient evidence to support claim of linear correlation btwn the 2 variables

### Hypothesis Test for Correlation

- $H_0$: ρ = 0 (There is no significant linear correlation.)
- $H_1$: ρ ≠ 0 (There is a significant linear correlation.)
- Test statistic:

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

- Critical values: use t-table with n − 2 degrees of freedom
- P-value: use t-table with n − 2 degrees of freedom
- Conclusion:
  - If |t| > critical value from t-table, reject null hypothesis
  - If |t| ≤ critical value, fail to reject null hypothesis

One-tailed tests:

- Can occur w/ claim of a positive linear correlation or a negative linear correlation

| Claim of Negative Correlation (Left-tailed test) | Claim of Positive Correlation (Right-tailed test) |
| --- | --- |
| $H_0$: ρ = 0 | $H_0$: ρ = 0 |
| $H_1$: ρ < 0 | $H_1$: ρ > 0 |

- Handle hypothesis testing method as explained in earlier chapters

Centroid: the point $(\bar{x}, \bar{y})$, given a collection of paired data

r is based on the sum of the products $(x - \bar{x})(y - \bar{y})$:

- If points of scatterplot tend to approximate an uphill line, individual values of $(x - \bar{x})(y - \bar{y})$ tend to be positive
  - Most of the points are found in the 1st & 3rd quadrants
- If they approximate a downhill line, points are in the 2nd and 4th quadrants, so $\sum (x - \bar{x})(y - \bar{y})$ is negative
- No linear pattern = points scattered among the 4 quadrants, value of $\sum (x - \bar{x})(y - \bar{y})$ tends to be close to 0
  - Can therefore use $\sum (x - \bar{x})(y - \bar{y})$ as a measure of how the points are arranged
  - Large, positive sum = points predominantly in 1st & 3rd quadrants → positive linear correlation
  - Large, negative sum = points predominantly in 2nd & 4th quadrants → negative linear correlation
  - Sum near zero = points scattered among quadrants → no linear correlation
    - However, sum depends on magnitude of the numbers used → standardize them by dividing the x part by $s_x$ and the y part by $s_y$
    - Further modify by introducing divisor of n − 1 → gives a type of average, instead of a sum that grows b/c we have more data
    - End up with equation for r!

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$$

## Regression

Objective: describe association btwn 2 variables by finding graph & equation of straight line that represents as
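
The correlation notes above build r from products of deviations from the centroid, standardized by the sample standard deviations and divided by n − 1, and then test ρ = 0 with a t statistic on n − 2 degrees of freedom. A minimal Python sketch of both steps follows. The function names `correlation_r` and `t_statistic` and the five data pairs are made up for illustration and do not come from the notes. The t formula used is the standard one, t = r / sqrt((1 − r²)/(n − 2)).

```python
import math
from statistics import mean, stdev


def correlation_r(xs, ys):
    """Linear correlation coefficient r for paired sample data.

    Sum of products of deviations from the centroid, divided by
    (n - 1) * s_x * s_y, as described in the notes.
    """
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least 3 paired values")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)   # centroid of the paired data
    s_x, s_y = stdev(xs), stdev(ys)     # sample standard deviations (n - 1 divisor)
    sum_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sum_products / ((n - 1) * s_x * s_y)


def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r / math.sqrt((1 - r ** 2) / (n - 2))


# Hypothetical paired data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation_r(xs, ys)
t = t_statistic(r, len(xs))
print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}, t = {t:.4f}, df = {len(xs) - 2}")
```

For these made-up values r is about 0.775, so r² is about 0.60, and t is about 2.12 with 3 degrees of freedom. The conclusion still depends on comparing |t| with the critical value from the t-table at the chosen significance level, which the sketch does not look up.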