Ch6 Correlation and Linear Regression
The sign of the correlation tells us the direction of the association.
The magnitude of the correlation tells us the strength of a linear association.
x: explanatory variable
y: response variable
Scatterplot: plots one quantitative variable against another.
• ideal way to picture associations between two quantitative variables.
Looking at a scatterplot:
1. Direction: negative / positive
c. No relationship: X and Y vary independently. Knowing X tells you nothing about Y.
3. Strength: how much scatter?
a. Strength can be seen in how much variation, or scatter, there is around the main form.
- The idea is to use the explanatory variable to predict the response variable. If y can be predicted accurately from x, the relationship is strong; if not, the relationship is weak.
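-- The ratio of the sum of the products z_x z_y for every point in the scatterplot to n - 1 is called the correlation coefficient, r:
r = Σ z_x z_y / (n - 1), where z_x = (x - x-bar) / s_x and z_y = (y - y-bar) / s_y are the z-scores of each point.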
Correlation measures the strength of the linear association between two quantitative variables.
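A minimal sketch of the z-score definition of r, using only Python's standard `statistics` module. The data (hours studied vs. exam score) are made up for illustration, and `z_scores` / `correlation` are hypothetical helper names, not functions from these notes.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize each value: z = (value - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)  # stdev uses the n - 1 denominator
    return [(v - m) / s for v in values]


def correlation(x, y):
    """Correlation coefficient: r = (sum of z_x * z_y over all points) / (n - 1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Hypothetical data: hours studied (x) and exam score (y)
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
r = correlation(hours, score)
print(f"r = {r:.3f}")  # r = 0.993
```

The result is positive and close to +1, which reads as a strong positive linear association.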
• r quantifies the strength and direction of a linear relationship between 2 quantitative variables. It ranges from -1 to 1.
• Strength: how closely the points follow a straight line.
• Direction: whether the x-y relationship is positive or negative.
• Correlation properties:
• The sign of a correlation coefficient gives the direction of the association.
• Correlation is always between -1 and +1.
• r = -1 or r = +1 is rare: it means all data points fall exactly on a single straight line.
• Correlation treats x and y symmetrically.
• correlation of x with y = correlation of y with x
• Correlation has no units.
• Correlation is not affected by changes in the center or scale of either variable.
• Changing the units or baseline of either variable has no effect on the correlation coefficient, because r depends only on z-scores.
• Correlation measures the strength of the linear association between the two variables.
• Correlation is sensitive to unusual observations, i.e., outliers.
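The properties listed above can be checked numerically. This sketch (standard library only; the height/weight numbers are invented for illustration) compares r after swapping x and y, after converting units and shifting a baseline, for an exactly curved relationship, and after adding a single outlier.

```python
from statistics import mean, stdev


def correlation(x, y):
    """r = sum(z_x * z_y) / (n - 1), with z-scores based on the sample standard deviation."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (len(x) - 1)


# Hypothetical data: heights in inches (x) and weights in pounds (y)
height_in = [60, 62, 65, 68, 70, 72]
weight_lb = [115, 120, 140, 155, 160, 180]
r = correlation(height_in, weight_lb)

# Symmetry: correlation of x with y equals correlation of y with x
r_swapped = correlation(weight_lb, height_in)

# No units: inches -> centimeters, and pounds -> kilograms with the baseline shifted by 50
height_cm = [2.54 * h for h in height_in]
weight_kg_minus_50 = [0.4536 * w - 50 for w in weight_lb]
r_rescaled = correlation(height_cm, weight_kg_minus_50)

# Linear only: y = x**2 on symmetric x values is an exact curved relationship, yet r is 0
xs = [-3, -2, -1, 0, 1, 2, 3]
r_curved = correlation(xs, [v ** 2 for v in xs])

# Outliers: one unusual point (a short, very heavy person) changes r drastically
r_outlier = correlation(height_in + [61], weight_lb + [250])

print(f"r = {r:.3f}, swapped: {r_swapped:.3f}, rescaled: {r_rescaled:.3f}")
# abs() only avoids printing "-0.000" from floating-point rounding
print(f"curved: {abs(r_curved):.3f}, with outlier: {r_outlier:.3f}")
```

With these numbers the first three values agree (up to floating-point rounding) at about 0.99, the curved relationship gives r of essentially 0, and the single outlier pulls r below 0.1.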
Before using correlation, check the correlation conditions:
1. Quantitative Variables Condition: Correlation applies only to quantitative variables.
2. Linearity Condition: Correlation measures the strength only of the linear association.
a. No matter how strong the association, r does not describe curved relationships.
3. Outlier Condition: Unusual observations can distort the correlation.
Correlation table
• The upper half is the same as the lower half (because the correlation of x with y equals the correlation of y with x), so by convention only the lower half is shown.
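A small sketch of a correlation table that prints only the lower half plus the diagonal of 1s. The three variables and their values are hypothetical, and `lower_triangle_table` is an illustrative helper name, not a library function.

```python
from statistics import mean, stdev


def correlation(x, y):
    """r = sum(z_x * z_y) / (n - 1)."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (len(x) - 1)


def lower_triangle_table(data):
    """Format a correlation table showing only the lower half and the diagonal.

    `data` maps each variable name to a list of values of equal length.
    """
    names = list(data)
    width = max(len(name) for name in names) + 2
    lines = [" " * width + "".join(name.rjust(width) for name in names)]
    for i, row_name in enumerate(names):
        # Columns 0..i only: cells above the diagonal would just repeat these values
        cells = [
            f"{correlation(data[row_name], data[col_name]):.2f}".rjust(width)
            for col_name in names[: i + 1]
        ]
        lines.append(row_name.ljust(width) + "".join(cells))
    return lines


# Hypothetical measurements on six people
data = {
    "height": [60, 62, 65, 68, 70, 72],
    "weight": [115, 120, 140, 155, 160, 180],
    "age": [30, 25, 40, 35, 28, 45],
}
for line in lower_triangle_table(data):
    print(line)
```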
• There is no way to conclude from a high correlation alone that one variable causes the other.
• There’s always the possibility that some third variable—a lurking variable—is simultaneously
affecting both of the variables you have observed.
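A simulation can make the lurking-variable point concrete. In this sketch (an invented scenario with made-up coefficients), a hypothetical lurking variable, daily temperature, drives both ice cream sales and drownings. Neither of those two is computed from the other, yet they come out strongly correlated.

```python
import random
from statistics import mean, stdev


def correlation(x, y):
    """r = sum(z_x * z_y) / (n - 1)."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (len(x) - 1)


random.seed(1)  # fixed seed so the simulation is reproducible

# Lurking variable: daily temperature (degrees C) on 200 hypothetical days
temperature = [random.uniform(10, 35) for _ in range(200)]

# Each outcome depends on temperature plus independent noise; neither uses the other
ice_cream_sales = [20 * t + random.gauss(0, 40) for t in temperature]
drownings = [0.3 * t + random.gauss(0, 1.5) for t in temperature]

r = correlation(ice_cream_sales, drownings)
print(f"r(ice cream sales, drownings) = {r:.2f}")
```

With these settings r is typically close to 0.8; the exact value depends on the random draws. Concluding from it that ice cream sales cause drownings would be exactly the mistake described above.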
The linear model is just the equation of a straight line through the data.
A linear model can be written in the form y-hat = b0 + b1x, where b0 (the intercept) and b1 (the slope) are numbers estimated from the data and y-hat is the predicted value.
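These notes do not yet give formulas for b0 and b1. This sketch assumes the standard least-squares line, whose slope and intercept can be written as b1 = r * s_y / s_x and b0 = y-bar - b1 * x-bar. The data are made up, and `fit_line` is a hypothetical helper name.

```python
from statistics import mean, stdev


def correlation(x, y):
    """r = sum(z_x * z_y) / (n - 1)."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (len(x) - 1)


def fit_line(x, y):
    """Least-squares estimates (b0, b1) for y-hat = b0 + b1 * x."""
    b1 = correlation(x, y) * stdev(y) / stdev(x)  # slope: r * s_y / s_x
    b0 = mean(y) - b1 * mean(x)                   # intercept: line passes through (x-bar, y-bar)
    return b0, b1


# Hypothetical data: hours studied (x) and exam score (y)
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
b0, b1 = fit_line(hours, score)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")  # y-hat = 46.90 + 4.50x

# Predicted value y-hat for a student who studied 3.5 hours
print(f"{b0 + b1 * 3.5:.2f}")  # 62.65
```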
The difference between the observed value y and the predicted value y-hat is called the residual and is denoted e:
e = y - y-hat
• Points above the regression line have positive residuals.
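A short sketch computing each residual e = y - y-hat for made-up data and labeling whether the point lies above or below the line. For brevity it uses the least-squares slope in the algebraically equivalent form b1 = Sxy / Sxx (sum of cross-deviations over sum of squared x-deviations); that formula and the helper names are assumptions, not taken from these notes.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares line: b1 = Sxy / Sxx, b0 = y-bar - b1 * x-bar."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


# Hypothetical data: hours studied (x) and exam score (y)
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
b0, b1 = fit_line(hours, score)

residuals = []
for x, y in zip(hours, score):
    y_hat = b0 + b1 * x
    e = y - y_hat  # residual: observed minus predicted
    residuals.append(e)
    side = "above" if e > 0 else "below" if e < 0 else "on"
    print(f"x = {x}: y = {y}, y-hat = {y_hat:.2f}, e = {e:+.2f} ({side} the line)")

# For a least-squares line with an intercept, the residuals sum to zero (up to rounding)
print("residuals sum to ~0:", abs(sum(residuals)) < 1e-9)
```

Positive e means the observed y is larger than the prediction, so the point sits above the line; negative e puts it below.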