Correlation and Linear Regression
How can we investigate whether two variables are associated with one another?
• Is there a relationship between cardiac mortality and the consumption of wine?
o Do a study!
• Is there an association between marital relationships and health problems?
• For each individual studied, we record data on two variables.
• We then examine whether there is a relationship between these two variables: Do changes in one variable tend to be associated with specific changes in the other variable?
• A scatterplot is used to display the relationship between two quantitative variables measured on the same individuals.
• Each variable makes up one axis
o Each individual is a point on the graph
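To make "each variable makes up one axis, each individual is a point" concrete, here is a minimal sketch in Python that draws a character-based scatterplot without any plotting library. The function name text_scatter and its width, height, and mark parameters are hypothetical choices for illustration; in practice a scatterplot would be drawn with statistical software.

```python
def text_scatter(xs, ys, width=20, height=10, mark="*"):
    """Return the rows of a character-grid scatterplot (top row = largest y).

    xs[i] and ys[i] are the two measurements on individual i, so each
    individual becomes one mark on the grid.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need equal-length, non-empty x and y lists")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid dividing by zero if all x are equal
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Horizontal position comes from x, vertical position from y.
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y_max - y) / y_span * (height - 1))  # flip so large y is at the top
        grid[row][col] = mark
    return ["".join(r) for r in grid]


if __name__ == "__main__":
    # Made-up measurements on six individuals, used only to show the call.
    xs = [1, 2, 3, 4, 5, 6]
    ys = [3, 4, 4, 6, 7, 8]
    for line in text_scatter(xs, ys, width=12, height=6):
        print("|" + line)
    print("+" + "-" * 12)
```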
Explanatory and response variables
• A response (dependent) variable measures an outcome of a study.
o Example: Weight is dependent on Age
• An explanatory (independent) variable may explain or influence changes in a response variable.
o Example: Gestational age influences birth weight
o When there is an obvious explanatory variable, it is plotted on the x (horizontal) axis of the scatterplot.
How to scale a scatterplot
• Both variables should be given a similar amount of space:
o Plot is roughly square
o Points should occupy all of the plot space (no large blank areas)
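The scaling advice can be sketched as a small helper that sets an axis range from the data's own minimum and maximum, so the points spread over the whole plot instead of bunching in a corner. The name axis_limits and the 5% padding (pad_fraction) are illustrative assumptions, not a standard rule. Applying the same helper to both variables gives each one a comparable share of its axis.

```python
def axis_limits(values, pad_fraction=0.05):
    """Return (low, high) axis limits that hug the data with a small margin.

    pad_fraction is an illustrative choice: 0.05 adds 5% of the data range
    on each side so no point sits exactly on the plot border.
    """
    if not values:
        raise ValueError("need at least one value")
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:  # all values identical: fall back to a nonzero width
        span = abs(lo) or 1.0
    pad = span * pad_fraction
    return lo - pad, hi + pad


if __name__ == "__main__":
    # Made-up values, used only to show the call.
    ages = [20, 35, 50, 65]
    weights = [60, 72, 80, 77]
    print("x-axis limits:", axis_limits(ages))
    print("y-axis limits:", axis_limits(weights))
```

For example, axis_limits([0, 10]) returns (-0.5, 10.5). Using each variable's own range, rather than forcing both axes to start at zero, is what keeps the points filling the plot space.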
Stat 1400
• After plotting two variables on a scatterplot, we describe the overall pattern of the relationship. Specifically, we look for …
o Form: linear, curved, clusters, no pattern, etc.
▪ The example above is linear
o Direction: positive (increasing), negative (decreasing), or no clear direction
▪ The example above is negative
o Strength: how closely the points follow the form (how much scatter there is)
▪ The example above is relatively weak, with considerable scatter
• … and clear deviations from that pattern
o Outliers of the relationship
• Example:
[Figure: scatterplot of manatee deaths from powerboat collisions (y-axis) against powerboats registered, in thousands (x-axis)]
o Form: Linear
o Direction: Positive
o Strength: Moderate
o No outliers
• Another example (scatterplot not shown):
o Form: Linear? Clustered? More data are needed to tell
o Direction: Negative
o Strength: Weak
o Outlier around (17, 120)
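The direction and strength judged by eye in the examples above are often summarized numerically by the correlation coefficient r named in this chapter's title. A minimal sketch, assuming the standard Pearson formula: the sign of r matches the direction, and values of |r| closer to 1 mean the points lie closer to a straight line. The function names pearson_r and direction_of are hypothetical. Because r only measures linear association, it does not replace looking at the scatterplot for curves, clusters, or outliers.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-deviations over sqrt(Sxx * Syy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)


def direction_of(r):
    """Translate the sign of r into the direction vocabulary used above."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "no direction"


if __name__ == "__main__":
    # Made-up data, used only to show the calls.
    xs = [1, 2, 3, 4, 5]
    ys = [2, 3, 5, 4, 6]
    r = pearson_r(xs, ys)
    print(f"r = {r:.3f}, direction: {direction_of(r)}")
```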
Adding categorical variables to scatterplots
▪ Two or more relationships can be compared on a single scatterplot when we use different symbols for groups of points on the graph.
▪ To add a categorical variable, use a different plot color or symbol for each category.
▪ Consider the relationship between mean SAT verbal score and the percent of high-school graduates taking the SAT, for each state.
o Orange dots are southern states and blue dots are northern states.
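Using a different symbol for each category can be sketched by extending a character-based scatterplot so that each category gets its own plotting character, standing in for the orange and blue dots of the SAT example. The function text_scatter_by_group, its default symbol set, and the (x, y, category) tuple layout are assumptions made for illustration. When two points land in the same cell, the later one overwrites the earlier one, which a real plotting tool would handle better.

```python
def text_scatter_by_group(points, width=20, height=10, symbols="ox+#@"):
    """Character-grid scatterplot with one plotting symbol per category.

    points is a list of (x, y, category) tuples. Returns the grid rows
    (top row = largest y) and the category-to-symbol legend.
    """
    if not points:
        raise ValueError("need at least one point")
    categories = sorted({cat for _, _, cat in points})
    if len(categories) > len(symbols):
        raise ValueError("more categories than available symbols")
    legend = dict(zip(categories, symbols))  # first category sorted gets "o", etc.

    xs = [x for x, _, _ in points]
    ys = [y for _, y, _ in points]
    x_min, x_span = min(xs), (max(xs) - min(xs)) or 1
    y_max, y_span = max(ys), (max(ys) - min(ys)) or 1

    grid = [[" "] * width for _ in range(height)]
    for x, y, cat in points:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y_max - y) / y_span * (height - 1))  # large y at the top
        grid[row][col] = legend[cat]  # later points overwrite earlier ones
    return ["".join(r) for r in grid], legend


if __name__ == "__main__":
    # Made-up points, used only to show the call (not real SAT data).
    pts = [(10, 580, "north"), (30, 540, "north"), (60, 500, "south"),
           (70, 480, "south"), (20, 560, "south")]
    rows, legend = text_scatter_by_group(pts, width=15, height=6)
    for line in rows:
        print("|" + line)
    print("legend:", legend)
```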