Lectures 18, 19, 20, 21, and 22
Correlation
- Deals with bivariate (paired) data
- Basic concepts
o Correlation – exists when the values of one variable are associated with the values of the other (a relationship between 2 variables)
o Use a scatter plot to explore the relationship
x and y each represent one of the two variables; each (x, y) pair is plotted as a single point
o Linear correlation coefficient (r), a SAMPLE STATISTIC
Measures the strength of the linear correlation between the paired x and y values in a sample
Requirements
Sample of paired (x, y) data is a random sample
Visual examination of the scatter plot shows a straight-line pattern
OUTLIERS REMOVED
Also, pairs of (x, y) data must have a bivariate normal distribution
For any fixed value of x, the y-values must be bell-shaped, and vice versa
Round r to three decimal places
If |r| EXCEEDS THE CRITICAL VALUE FROM THE TABLE, there is a significant linear correlation
Write as “Since r = ?, its absolute value exceeds ?, so we conclude that there is a significant linear correlation between x and y.”
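A minimal sketch of computing r for paired data with the shortcut formula r = (n·sum(xy) - sum(x)·sum(y)) / (sqrt(n·sum(x^2) - sum(x)^2) · sqrt(n·sum(y^2) - sum(y)^2)), then rounding to three decimals. The function name `linear_correlation` and the data are hypothetical, chosen only for illustration; it assumes the requirements above have already been checked.

```python
from math import sqrt

def linear_correlation(x, y):
    """Hypothetical helper: linear correlation coefficient r for paired data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))  # sum of products x*y
    sum_x2 = sum(a * a for a in x)             # sum of x^2
    sum_y2 = sum(b * b for b in y)             # sum of y^2
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical paired sample data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = round(linear_correlation(x, y), 3)  # round r to three decimal places
print(r)  # → 0.775
```

The computed r is then compared (in absolute value) with the critical value from the table to decide whether the correlation is significant.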
Properties
-1 <= r <= 1
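Value of r does not change if all values of either variable are converted to a different scale
Value of r is not affected by the choice of which variable is x and which is y (interchanging them leaves r unchanged)
r measures the strength of a linear relationship only; it is not designed to measure the strength of a nonlinear relationship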
Interpreting r
The value of r^2 is the proportion of the variation in y that is explained by the linear association between x and y
“We conclude that r^2 (as a percentage) of the variation in y can be explained by the linear association between x and y. This implies that 1 - r^2 (as a percentage) of the variation in y can be explained by factors other than x.”
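Common errors
Correlation DOES NOT IMPLY CAUSALITY
o Lurking variable – a variable that affects the variables being studied but is not included in the study
Using averages suppresses individual variation and can inflate the correlation coefficient
Linearity
o A scatter plot may show a clear (nonlinear) relationship even though r = 0; r = 0 only means there is no LINEAR correlation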
- Beyond the basics
o Hypothesis testing
Null hypothesis H0: rho = 0 (there is no linear correlation; rho is the population linear correlation coefficient)
Alternative hypothesis H1: rho ≠ 0 (there is a linear correlation)
Method 1
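If |t| > critical value, reject the null hypothesis and conclude that there is a significant linear correlation
o Otherwise, fail to reject the null hypothesis and say “there is not sufficient evidence to conclude that there is a significant linear correlation”
o t is the test statistic, t = r / sqrt((1 - r^2) / (n - 2)); the critical value is from Table A3 with n - 2 degrees of freedom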
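A minimal sketch of Method 1, using the test statistic t = r / sqrt((1 - r^2) / (n - 2)) and a critical value that you look up yourself in Table A3 with n - 2 degrees of freedom. The function names are hypothetical; the example uses a hypothetical r = 0.775 from n = 5 pairs and the two-tailed critical value 3.182 (alpha = 0.05, df = 3).

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """Test statistic for H0: rho = 0, t = r / sqrt((1 - r^2) / (n - 2))."""
    if n < 3:
        raise ValueError("need at least 3 pairs (n - 2 degrees of freedom)")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r / sqrt((1 - r ** 2) / (n - 2))

def method_1_decision(r, n, t_critical):
    """Hypothetical helper: compare |t| with a critical value from Table A3."""
    t = correlation_t_statistic(r, n)
    if abs(t) > t_critical:
        return t, "reject H0: there is a significant linear correlation"
    return t, ("fail to reject H0: there is not sufficient evidence to conclude "
               "that there is a significant linear correlation")

# Hypothetical values: r = 0.775, n = 5, critical t = 3.182 (alpha = 0.05, df = 3)
t, decision = method_1_decision(0.775, 5, 3.182)
print(round(t, 3), decision)
```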
Method 2
Test statistic is r; the critical value is from Table A6
If |r| > critical value, reject the null hypothesis
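Method 2 needs no extra computation beyond r itself. A minimal sketch, where `method_2_decision` is a hypothetical name and the critical value is supplied by the reader from Table A6; the example uses 0.878, the value commonly tabulated for n = 5 at alpha = 0.05.

```python
def method_2_decision(r, r_critical):
    """Hypothetical helper: the test statistic is r; compare |r| with Table A6."""
    if abs(r) > r_critical:
        return "reject H0: there is a significant linear correlation"
    return ("fail to reject H0: there is not sufficient evidence to conclude "
            "that there is a significant linear correlation")

# Hypothetical examples with n = 5 pairs (critical value 0.878 at alpha = 0.05)
print(method_2_decision(0.775, 0.878))   # |r| below the critical value
print(method_2_decision(-0.950, 0.878))  # negative r: the absolute value is used
```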
o Centroid
For a collection of paired (x, y) data, the point (x-bar, y-bar) is called the centroid
Regression
- Basic concepts
o Regression equation y-hat = b0 + b1x describes the association between x (independent variable) and y-hat (dependent variable, the predicted value of y)
o Requirements
Sample of paired (x, y) data is a random sample
Scatter plot shows a straight-line pattern
Outliers are removed
o Notes
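For each fixed value of x, the corresponding y-values have a bell-shaped distribution
For different values of x, the distributions of y-values all have the same variance
For different values of x, the distributions of y-values have means that lie along the same straight line
The y-values are INDEPENDENT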
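The notes stop before giving formulas for the regression line. The sketch below assumes the standard least-squares slope b1 = (n·sum(xy) - sum(x)·sum(y)) / (n·sum(x^2) - sum(x)^2) and intercept b0 = y-bar - b1·x-bar, which makes the line pass through the centroid (x-bar, y-bar). The function name `regression_line` and the data are hypothetical.

```python
from statistics import mean

def regression_line(x, y):
    """Hypothetical helper: least-squares line y-hat = b0 + b1*x for paired data."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    x_bar, y_bar = mean(x), mean(y)  # the centroid (x-bar, y-bar)
    b0 = y_bar - b1 * x_bar          # intercept chosen so the line passes through the centroid
    return b0, b1

# Hypothetical paired sample data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = regression_line(x, y)
print(f"y-hat = {b0:.3f} + {b1:.3f}x")  # → y-hat = 2.200 + 0.600x
```

Plugging x-bar into the equation returns y-bar, which is one quick way to check the computed line.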
