Lecture 5

MATH 11 Lecture 5: 41217Ch8 and 9

David James Quarfoot
4/12/17 lecture notes (5) Ch8 and 9
Wednesday, April 12, 2017 11:02 AM

Since correlations are involved, we need our three conditions from before:
• Quantitative variables
• Straight enough
• No outliers

A new condition:
• Residual noise: there should not be a pattern in your residual plot.

A new statistic: R^2 (percent variance explained)

If you calculate the correlation from the Old Faithful example, you get r = 0.854.

For a given linear model, R^2 = r^2 is the proportion of the variation in the y-variable that is accounted for (or explained) by the variation in the x-variable.

So R^2 = 0.854^2 = 0.73 = 73%: about 73% of the variation in how long we must wait is accounted for by how long the last eruption lasted. The remaining 27% is left in the residuals.

As another example, the R^2 in the height-weight regression is 0.67, so 67% of the variability in weights is accounted for by differences in height.

Sad reality: as simple as linear regression is, most people use it incorrectly:
1. They fail to look at the residuals and make sure that the model is reasonable
2. They extrapolate without caution
3. They don't consider outliers carefully enough
4. They build a model

----------

Subgroups in your data

Often, you can identify subgroups in your original data or in the residuals. In this case, split your data into different parts and do several linear regressions instead of one clunky regression.

[Figure: Number of passengers at airport through time]
[Figure: Average age at which people got married]

Subgroups can be explained by different outside causes. For example, in 2008 there was an economic crisis, so people stopped flying and traveling as much.
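
The R^2 statistic and the advice about subgroups in these notes can be checked numerically. The sketch below is only an illustration, assuming NumPy is available. The helper name fit_line and the ten data points are hypothetical and are not from the lecture; only the value r = 0.854 comes from the notes. It computes R^2 as 1 - SS_residual / SS_total, confirms that this matches r^2 for a simple linear regression, and compares one regression over all the data with separate regressions on two subgroups.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares line y = b0 + b1*x.

    Returns (b0, b1, r_squared, residuals). Hypothetical helper for
    illustration only; it is not taken from the lecture.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1, b0 = np.polyfit(x, y, 1)           # slope first, then intercept
    residuals = y - (b0 + b1 * x)           # observed minus predicted
    ss_res = np.sum(residuals ** 2)         # variation left unexplained
    ss_tot = np.sum((y - y.mean()) ** 2)    # total variation in y
    return b0, b1, 1.0 - ss_res / ss_tot, residuals


# Old Faithful value from the notes: r = 0.854, so R^2 = r^2.
old_faithful_r2 = 0.854 ** 2
print(round(old_faithful_r2, 2))  # 0.73

# Made-up data with two subgroups that follow different trends.
x = np.arange(10, dtype=float)
group = np.array([0] * 5 + [1] * 5)
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.3, -0.3, 0.2, -0.1])
y = np.where(group == 0, 2.0 * x + 1.0, -1.0 * x + 20.0) + noise

# One regression over everything.
_, _, r2_all, resid_all = fit_line(x, y)

# In simple linear regression, R^2 equals the squared correlation.
r = np.corrcoef(x, y)[0, 1]
print(np.isclose(r2_all, r ** 2))

# Separate regressions for each subgroup.
_, _, r2_a, _ = fit_line(x[group == 0], y[group == 0])
_, _, r2_b, _ = fit_line(x[group == 1], y[group == 1])
print(r2_all, r2_a, r2_b)
```

With this made-up data, the single regression leaves a clear pattern in resid_all (the residuals bend where the trend changes), while each subgroup fit explains more of its own variation. Plotting resid_all against x is the residual check described above.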
