MISY261 - Midterm Exam Guide - Comprehensive Notes for the Exam (13 pages long!)

Document Summary

Overfitting and variable selection: overfitting occurs when too many variables are added to a model, so that it fits noise in the sample rather than the underlying relationship.

Outliers: outliers are cases that differ dramatically from the rest of the data, or data that don't make sense in the context of the problem. Outliers can exist in any type of statistical model.

Interpretation of coefficients for interactions: model: fare ~ 11.728 + 1.67tip 7.12credit + 1.68tip*credit. The interaction term means the effect of tip depends on credit: if credit is a 0/1 indicator, the slope on tip is 1.67 when credit = 0 and 1.67 + 1.68 = 3.35 when credit = 1.

Interpretation of R^2: test predictions using the model with and without interactions and compare how similar the outputs are.

Nonlinear relationships: we can apply transformations to our variables, e.g. tip^2, tip^3, log(tip), sqrt(tip). Stars next to a variable in the summary printout indicate significance.

How to create a decision tree: start by understanding total variation. The best split creates the smallest total variation (S0 + S1). R considers every variable every time it splits a node.

Logistic regression: introduction: linear regression predicts continuous outcomes; binary logistic regression helps with yes/no outcomes.
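
The decision-tree notes above say the best split is the one with the smallest total variation S0 + S1, and that every variable is considered at every split. A minimal sketch of that search is given here in Python for illustration. It is not R's actual tree-fitting code. It assumes S0 and S1 are the sums of squared deviations from the mean within the two child nodes, which the notes do not spell out. The function names `sum_sq` and `best_split` and the toy data are hypothetical.

```python
def sum_sq(values):
    """Variation of a group: sum of squared deviations from the group mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(columns, y):
    """Try every variable and every threshold; keep the smallest S0 + S1.

    columns: dict mapping a variable name to a list of numeric values
    y: list of outcome values, same length as each column
    Returns a tuple (total variation, variable name, threshold).
    """
    best = None
    for name, x in columns.items():
        cuts = sorted(set(x))
        # Candidate thresholds are midpoints between neighbouring distinct values.
        for lo, hi in zip(cuts, cuts[1:]):
            threshold = (lo + hi) / 2
            left = [yi for xi, yi in zip(x, y) if xi <= threshold]
            right = [yi for xi, yi in zip(x, y) if xi > threshold]
            total = sum_sq(left) + sum_sq(right)  # S0 + S1 (assumed definition)
            if best is None or total < best[0]:
                best = (total, name, threshold)
    return best


# Made-up toy data, for illustration only (not from the course).
columns = {
    "tip": [1, 2, 3, 10, 11, 12],
    "credit": [0, 1, 0, 1, 0, 1],
}
fare = [5, 6, 7, 30, 31, 32]

total, variable, threshold = best_split(columns, fare)
print(variable, threshold, total)
```

On this toy data the search picks a split on tip, because separating the low tips from the high tips leaves very little variation in fare inside each group, while splitting on credit leaves both groups mixed.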