STAT141 Chapter Notes - Confounding, Scatter Plot, Dependent and Independent Variables

19 Nov 2011
Statistics Part Two – Exploring Relationships
Between Two Variables
Chapter Seven – Scatterplots, Association and Correlation
Scatterplots may be the most common and most effective display for data
In a scatterplot, you can see patterns, trends, relationships, and
even the occasional extraordinary value sitting apart from the others
Scatterplots are the best way to start observing the relationship and
the ideal way to picture associations between two quantitative variables
Roles for Variables
It is important to determine which of the two quantitative variables
goes on the x-axis and which on the y-axis
This determination is made based on the roles played by the variables
When the roles are clear, the explanatory or predictor variable goes on
the x-axis and the response variable goes on the y-axis
The roles that we choose for variables are more about how we think
about them than about the variables themselves
Just placing a variable on the x-axis doesn’t necessarily mean that it
explains or predicts anything. And the variable on the y-axis may not
respond to it in any way
More on Scatterplots
When looking at scatterplots, we will look for direction, form, strength,
and unusual features.
A pattern that runs from the upper left to the lower right is said
to have negative direction
A trend running the other way is said to have positive direction
If there is a straight line (linear) relationship, it will appear as a
cloud or swarm of points stretched out in a generally consistent,
straight form
If the relationship isn’t straight, but curves gently (exponential
growth for example), while still increasing or decreasing
steadily, we can often find ways to make it more nearly straight
If the relationship curves sharply, however, these methods will not help
At one extreme, the points appear to follow a single stream
(whether straight, curved, or bending all over)
At the other extreme, the points appear as a vague cloud with
no discernible trend or pattern
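
The point above about making a gently curving relationship "more nearly straight" can be illustrated with a small sketch. It assumes the curve is exact exponential growth and that the re-expression used is a logarithm; the data and the helper name `steps` are made up for illustration and are not from the notes.

```python
import math

# Hypothetical data: exact exponential growth, y = 3 * 1.5**x.
xs = list(range(8))
ys = [3 * 1.5 ** x for x in xs]

# Re-express y on a log scale.
log_ys = [math.log(y) for y in ys]

def steps(values):
    """Change in value between each pair of consecutive points."""
    return [b - a for a, b in zip(values, values[1:])]

raw_steps = steps(ys)      # keeps growing: the raw plot bends upward
log_steps = steps(log_ys)  # constant: the re-expressed plot is straight

print("raw steps:", [round(s, 2) for s in raw_steps])
print("log steps:", [round(s, 4) for s in log_steps])
```

Equal steps in x produce equal steps in log(y), which is what a straight line looks like. Real data will only be approximately straight after re-expression.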
Unusual Features:
Look for the unexpected
Often the most interesting thing to see in a scatterplot is the
thing you never thought to look for
One example of such a surprise is an outlier standing away from
the overall pattern of the scatterplot
Clusters or subgroups should also raise questions
Correlation: Quantifying the Strength of Linear Association
Data collected from students in stats classes
included their heights and weights
The scatterplot of these data shows a positive association and a fairly
straight form with one high outlier.
So how strong is the association between
weight and height of stats students?
If we had to put a number on the strength,
we would not want it to depend on the units
we used because no matter the units, the pattern is
the same
So since units do not matter, why not remove them?
We could standardize both variables and write the
coordinates of a point as (z_x, z_y)
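
A minimal sketch of why standardizing removes the units, assuming the usual z-score (subtract the mean, divide by the sample standard deviation). The heights and the function name `standardize` are hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev

def standardize(values):
    """Convert values to z-scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical heights, recorded once in inches and once in centimetres.
heights_in = [62, 65, 68, 70, 74]
heights_cm = [h * 2.54 for h in heights_in]

z_in = standardize(heights_in)
z_cm = standardize(heights_cm)

# The z-scores agree (up to rounding), whichever unit was used.
for a, b in zip(z_in, z_cm):
    print(f"{a:+.4f}  {b:+.4f}")
```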
Here is a scatterplot of the standardized weights and heights
Note that the underlying linear pattern seems
steeper in the standardized plot than in the original scatterplot
That’s because we made the scales of the axes
the same
Equal scaling gives a neutral way of drawing
the scatterplot and a fairer impression of the
strength of association
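Some points strengthen the impression of a positive association (z-scores
with the same sign, so their product is positive), others weaken it
(z-scores with opposite signs, so their product is negative) and some
don’t vote either way (a z-score of zero on either variable)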
The correlation coefficient (r) gives us a numerical measurement of the
strength of the linear relationship between the explanatory and the
response variables
$$ r = \frac{\sum z_x z_y}{n - 1} $$
(So the formula means: multiply each z_x by its matching z_y, add up all
those products, then divide by the number of data points minus one)
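
The recipe in words above can be sketched directly in code. The heights and weights are made-up values, not the class data, and the function name `correlation` is only illustrative.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of (z_x * z_y) over all points, divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Hypothetical heights (inches) and weights (pounds) for five students.
heights = [62, 65, 68, 70, 74]
weights = [110, 135, 150, 160, 190]

r = correlation(heights, weights)
print(f"r = {r:.3f}")
```

For these made-up values r comes out close to +1, matching a strong, positive, fairly straight pattern.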
Correlation Conditions
Correlation measures the strength of the linear association between
two quantitative variables
Before you use correlation, you must check several conditions
Quantitative Variables Condition
Correlation applies only to quantitative variables
Don’t apply correlation to categorical data masquerading
as quantitative
Check that you know the variables’ units and what they measure
Straight Enough Condition
You can calculate a correlation coefficient for any pair of variables
But correlation measures the strength only of the linear
association, and will be misleading if the relationship is
not linear
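
A small sketch of how correlation can mislead when the Straight Enough Condition fails. The data are artificial (y is exactly x squared), so y is completely determined by x, yet the correlation is zero. The function name `correlation` is illustrative.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of (z_x * z_y) over all points, divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

# Artificial data: a perfect but curved (U-shaped) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
print(f"r = {r:.3f}")  # essentially zero despite the perfect pattern
```

A scatterplot would show the U shape at once, which is why the plot gets checked before r is computed.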
Outlier Condition
Outliers can distort the correlation dramatically
An outlier can make an otherwise small correlation look
big or hide a large correlation
It can even give an otherwise positive association a
negative correlation coefficient (and vice versa)
When you see an outlier, it’s often a good idea to report
the correlations with and without that point
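
A sketch of reporting the correlation with and without an outlier. The data are made up: five points with almost no association, plus one point far from the rest. The function name `correlation` is illustrative.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of (z_x * z_y) over all points, divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data with little association.
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 2, 3]

# The same data with one added outlier at (15, 12).
xs_with = xs + [15]
ys_with = ys + [12]

r_without = correlation(xs, ys)
r_with = correlation(xs_with, ys_with)
print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```

One added point turns a weak correlation into a strong one, so reporting both values gives the more honest picture.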
Correlation Properties
The sign of a correlation coefficient gives the direction of the association
Correlation is always between -1 and +1
Correlation can be exactly equal to -1 or +1, but these values
are unusual in real data because they mean that all the data
points fall exactly on a single straight line
A correlation near zero corresponds to a weak linear association
Correlation treats x and y symmetrically:
The correlation of x with y is the same as the correlation of y
with x
Correlation has no units
Correlation is not affected by changes in the center or scale of either variable
Correlation depends only on the z-scores, and they are
unaffected by changes in center or scale
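
Several of the properties above can be checked numerically. This is a sketch on made-up heights and weights; the unit conversions (positive rescalings) and the shift of 60 inches are arbitrary choices for illustration, and the function name `correlation` is illustrative.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of (z_x * z_y) over all points, divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical heights (inches) and weights (pounds).
heights_in = [62, 65, 68, 70, 74]
weights_lb = [110, 135, 150, 160, 190]

# Change of scale: convert to centimetres and kilograms.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.4536 for w in weights_lb]

# Change of center: measure height as inches above 60.
heights_shifted = [h - 60 for h in heights_in]

r_original = correlation(heights_in, weights_lb)
r_swapped = correlation(weights_lb, heights_in)       # x and y exchanged
r_metric = correlation(heights_cm, weights_kg)        # different units
r_shifted = correlation(heights_shifted, weights_lb)  # different center

print(r_original, r_swapped, r_metric, r_shifted)
```

All four values agree up to rounding error and lie between -1 and +1.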
Correlation DOES NOT EQUAL Causation
Whenever we have a strong correlation, it is tempting to explain it by
imagining that the predictor variable has caused the response to change
Scatterplots and correlation coefficients never prove causation
A hidden variable that stands behind a relationship and determines it
by simultaneously affecting the other two variables is called a lurking
variable
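
A simulated sketch of a hidden variable producing a correlation between two variables that do not affect each other. Everything here is artificial: the hidden variable and the noise are random normal draws with a fixed seed, and the names `hidden`, `x`, `y`, and `correlation` are illustrative.

```python
import random
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of (z_x * z_y) over all points, divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

rng = random.Random(141)  # fixed seed so the run is repeatable
n = 2000

# The hidden variable drives both x and y; x and y never touch each other.
hidden = [rng.gauss(0, 1) for _ in range(n)]
x = [h + rng.gauss(0, 0.5) for h in hidden]
y = [h + rng.gauss(0, 0.5) for h in hidden]

r_xy = correlation(x, y)

# Remove the hidden variable's contribution; only independent noise is left.
x_rest = [a - h for a, h in zip(x, hidden)]
y_rest = [b - h for b, h in zip(y, hidden)]
r_rest = correlation(x_rest, y_rest)

print(f"r(x, y) = {r_xy:.2f}")
print(f"r after removing the hidden variable = {r_rest:.2f}")
```

The strong correlation between x and y comes entirely from the shared hidden variable and largely disappears once it is accounted for. In real data the hidden variable is usually not measured, so this subtraction is not available.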