STATS 10 Lecture Notes - Lecture 5: Scatter Plot, Dependent And Independent Variables

10 Jun 2018
Chapter 4: Regression Analysis: Exploring Associations Between Variables
Plots to visualize numerical data: dotplots, histograms, stemplots
When describing numerical distribution, consider: shape, center, spread
We will consider questions about the relationship between two numerical variables
Scatterplot: used to plot the relationship between two variables; each point represents one observation, and the location of the point
depends on the values of the two variables of interest (one on the x-axis, the other on the y-axis)
Each observation is a PAIR of values
Horizontal/vertical axes do not have to be on the same scale → just label them properly
Bottom left corner does not have to start at (0,0); we want to zoom in on the relationship between the two
variables and not have a lot of empty space
The Big Three (of analyzing relationship between TWO variables via scatterplot)
Trend
Strength
Shape
Trend: association between two variables (associated if there is a relationship between them)
Trend of association = general tendency of the scatterplot when scanning from left to right
Increasing trend (positive association/trend) → uphill/rising tendency; increases in one variable are
associated with increases in the other variable
Decreasing trend (negative association/trend)→ downhill/falling tendency; increases in one variable are
associated with decreases in the other
Describes general tendency, not all individual behavior between two variables
Do not use absolute terms (e.g., "always," "every") when interpreting trends in context!
Not always clearly positive or negative!
Strength: of an association (trend)=how closely related two variables are
(amount of scatter in the scatterplot)
Refers to spread of points in the vertical direction
Weak association/trend = large amount of scatter, i.e. high
vertical variation; trend harder to visually detect
Strong association/trend = little scatter, i.e. low vertical
variation; trend easier to visually detect
Strength → how well knowing one variable can predict the other
Relative strength (which trend=stronger?) is easier to
distinguish than labelling a trend by itself (subjective!)
Shape: rate of increase/decrease in the trend
Linear: trend always increases/decreases at the same rate; can be
summarized with a straight line (superimposed on the scatterplot)
Nonlinear: rate of increase/decrease changes depending on values of variables
E.g. quadratic, exponential, etc; not really covered in course, more focus on linear
Correlation coefficient (correlation): a number that measures the strength of the LINEAR association between two numerical
variables
Two variables are correlated if they are LINEARLY associated
Only makes sense when trend is LINEAR and both variables are NUMERICAL
Correlation coefficient denoted as r
Always between -1 and 1
Both value and sign are important
Value: strength
If value is close to -1 or +1, association is strong
If value is close to 0, the linear association is weak (r can be close to 0 even when there is a strong nonlinear association, e.g. a U-shaped pattern)
Sign: direction
If sign is +, trend is +
If sign is -, trend is -
CORRELATION DOES NOT IMPLY CAUSATION! (even if the correlation is close to -1 or +1)
Only a measure of linear association
Calculating r by hand:
Find the mean and standard deviation to compute z-scores
Do this for each x and each y
For each observation (scatterplot point), multiply its x and y z-scores
Add all the z-score products up!
Divide this sum by n-1 (sample size - 1)
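The same steps written as a formula, where x̄ and ȳ are the means and s_x and s_y are the sample standard deviations (which also divide by n - 1):

```latex
z_{x_i} = \frac{x_i - \bar{x}}{s_x}, \qquad
z_{y_i} = \frac{y_i - \bar{y}}{s_y}, \qquad
r = \frac{1}{n-1} \sum_{i=1}^{n} z_{x_i} \, z_{y_i}
```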
Points that add to the correlation: values of the 2 variables are BOTH above or BOTH below their means
Points that subtract from the correlation: value of one variable is above its mean and the other is below its mean
Points that do not contribute to correlation have value of at least one variable equal exactly to the
mean
Properties of correlation
The order of variables does not matter
Height vs. hand span is the same as hand span vs. height
Highlights fact that correlation only tells us about strength of association, cannot imply
causality
Unitless
Only depends on z-scores, so the units of measurement for each variable don't affect the correlation
Height (inches) vs. weight (pounds) gives the same correlation as height (cm) vs. weight (kg)
Only linear
Correlation does not tell you whether an association is linear
Does not tell you shape of graph
If the association is linear, THEN the correlation coefficient is a measure of its strength
If the association is nonlinear, then the correlation coefficient does not have much
interpretability
Statistical modeling
One way to measure a trend
Make an assumption that the trend can be summarized by a math equation
Use observed data to estimate the mathematical equation that best describes the trend
Analogous to a physical model, except statistical models have inherent uncertainty and must account for
variation
FOR LINEAR TRENDS, assume the trend can be summarized by a line equation: regression line
Regression line: statistical model that summarizes the linear trend of observed values → also
represents best guess/prediction for any new or future observations
Equation for (straight) line: y=mx+b
m=slope (how steep the line is); change in y for a unit increase in x
b=y-intercept (value of y when x=0)
Statisticians typically write equation of line w/ intercept first: y=a+bx
a=y-intercept
b=slope
Also called the least squares line, as it's chosen so as to minimize the sum of squared (vertical)
distances between the observed and predicted values → BEST FIT line