
# STAT 1400 Lecture 15: 4.18 Stat Notes (Ch. 9)

Description
Stat 1400 4.18.2017 9:30 am

Correlation and Linear Regression

How can we investigate whether two variables are associated with one another?
• Is there a relationship between cardiac mortality and the consumption of wine?
  o Do a study!
• Is there an association between marital relationships and health problems?
  o Study!

Bivariate data
• For each individual studied, we record data on TWO variables
• We then examine whether there is a relationship between these two variables: Do changes in one variable tend to be associated with specific changes in the other variable?

Scatterplots
• A scatterplot is used to display quantitative bivariate data
• Each variable makes up one axis
  o Each individual is a point on the graph

Explanatory and response variables
• A response (dependent) variable measures an outcome of a study.
  o Example: Weight is dependent on Age
• An explanatory (independent) variable may explain or influence changes in a response variable.
  o Example: Gestational Age influences the Birth Weight
  o When there is an obvious explanatory variable, it is plotted on the x (horizontal) axis of the scatterplot.

How to scale a scatterplot
• Both variables should be given a similar amount of space:
  o Plot is roughly square
  o Points should fill the plot area (little blank space)
    ▪ *colored box is around best graph

Interpreting Scatterplots
• After plotting two variables on a scatterplot, we describe the overall pattern of the relationship. Specifically, we look for …
  o Form: linear, curved, clusters, no pattern, etc.
    ▪ The example above is linear
  o Direction: positive, negative, no direction (increasing or decreasing?)
    ▪ The example above is negative
  o Strength: how closely the points fit the “form” (how scattered it is?)
    ▪ The example above is relatively weak with variation
• … and clear deviations from that pattern
  o Outliers of the relationship
• Example: [Scatterplot: manatee deaths from powerboat collisions (y-axis) vs. powerboats registered, x1,000 (x-axis)]
  o Form: Linear
  o Direction: Positive
  o Strength: Moderate
  o No outliers
• Example:
  o Form: Linear? Cluster? We need more data
  o Direction: Negative
  o Strength: weak
  o Outlier around (17, 120)

Adding categorical variables to scatterplots
▪ Two or more relationships can be compared on a single scatterplot when we use different symbols for groups of points on the graph
▪ To add a categorical variable, use a different plot color or symbol for each category.
▪ Consider the relationship between mean SAT verbal score and percent of high-school grads taking SAT for each state
  o Orange dots are southern states and blue dots are northern states
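
One way to try the categorical-variable idea is a short Python sketch. It is only an illustration, not something from the lecture: the function names (`group_by_category`, `plot_grouped_scatter`), the field names, and the four data points are all made up, and it assumes that percent taking the SAT is treated as the explanatory variable (x-axis) and mean verbal score as the response (y-axis). The plotting function assumes matplotlib is installed and is imported only when that function is called.

```python
from collections import defaultdict


def group_by_category(records, x_key, y_key, cat_key):
    """Split bivariate records into one (xs, ys) series per category."""
    groups = defaultdict(lambda: ([], []))
    for rec in records:
        xs, ys = groups[rec[cat_key]]
        xs.append(rec[x_key])
        ys.append(rec[y_key])
    return dict(groups)


def plot_grouped_scatter(groups, styles, x_label, y_label):
    """Draw one scatter series per category, each with its own color and symbol."""
    # Imported here so the grouping step above works without matplotlib.
    import matplotlib.pyplot as plt

    # A square figure gives both variables a similar amount of space.
    fig, ax = plt.subplots(figsize=(5, 5))
    for category, (xs, ys) in groups.items():
        color, marker = styles[category]
        ax.scatter(xs, ys, color=color, marker=marker, label=category)
    ax.set_xlabel(x_label)  # explanatory variable on the horizontal axis
    ax.set_ylabel(y_label)  # response variable on the vertical axis
    ax.legend()
    return fig, ax


# Made-up values for illustration only; these are NOT the lecture's data.
records = [
    {"pct_taking": 8, "sat_verbal": 560, "region": "south"},
    {"pct_taking": 12, "sat_verbal": 550, "region": "south"},
    {"pct_taking": 65, "sat_verbal": 500, "region": "north"},
    {"pct_taking": 70, "sat_verbal": 495, "region": "north"},
]

# Colors follow the notes (orange = southern, blue = northern); markers are an assumption.
styles = {"south": ("orange", "o"), "north": ("blue", "s")}

groups = group_by_category(records, "pct_taking", "sat_verbal", "region")
```

To draw the figure, call `plot_grouped_scatter(groups, styles, "Percent of grads taking SAT", "Mean SAT verbal score")` and save or show the returned figure. Grouping first and plotting each group separately is what lets every category get its own color, symbol, and legend entry.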
