# INFO 2020 Lecture 4: Week 4-6 Notes

7 Feb 2017
Part 2 (must be a spreadsheet)
- Organized and easy to understand
Scatter plots: DV on the left (vertical) axis, IV on the bottom (horizontal) axis
Observations about the correlation matrix highlighted and noted
It doesn't matter if the correlations are bad; they just need to be interpreted correctly
Correlations
Between IVs and the DV: good
Between IVs and other IVs: not necessarily good
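A minimal sketch of how a correlation matrix could be built to check both kinds of correlation. The column names and numbers are made up for illustration, and Pearson correlation is assumed since the notes do not name a specific measure.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of column a with b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical data: two IVs and one DV.
data = {
    "price": [10, 12, 14, 16, 18],   # IV
    "ads":   [1, 3, 2, 5, 4],        # IV
    "sales": [20, 25, 27, 35, 36],   # DV
}
matrix = correlation_matrix(data)

# IV-to-DV correlations: high values here are what we want.
for iv in ("price", "ads"):
    print(iv, "vs sales:", round(matrix[iv]["sales"], 2))

# IV-to-IV correlation: a high value here means the IVs overlap.
print("price vs ads:", round(matrix["price"]["ads"], 2))
```

In a spreadsheet the same matrix would be one table with these values highlighted.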
Diagnostics
Standardized residuals should be between -2 and 2 if the model is good
Would like residuals to be independent
Residuals should be identically distributed
o Normally distributed
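A rough sketch of the -2 to 2 check for a one-IV regression. It assumes a simplified standardization (each residual divided by the residual standard error with n - 2 degrees of freedom), so spreadsheet output may differ slightly. The data and function names are hypothetical.

```python
from math import sqrt

def fit_line(x, y):
    """Least-squares intercept and slope for a single IV."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return intercept, slope

def standardized_residuals(x, y):
    """Residuals divided by the residual standard error (simplified form)."""
    intercept, slope = fit_line(x, y)
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    std_error = sqrt(sum(r * r for r in residuals) / (len(x) - 2))
    return [r / std_error for r in residuals]

def flag_outliers(std_resid, limit=2.0):
    """Indexes of observations whose standardized residual is outside +/- limit."""
    return [i for i, r in enumerate(std_resid) if abs(r) > limit]

# Hypothetical data, roughly linear.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
z = standardized_residuals(x, y)
print([round(v, 2) for v in z])
print("outside -2..2:", flag_outliers(z))
```

Independence and normality are usually judged by plotting these residuals, which this sketch does not do.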
What to do with New Data
1. Data cleaning
2. Exploration
o Descriptive stats
o Histograms
o Correlations
o Scatter plots
3. Regression
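The first two steps could be sketched as below, using only Python's standard library. Treating "cleaning" as dropping missing values is an assumption, since the notes do not say what cleaning involves. The sample values and function names are hypothetical.

```python
import statistics

def clean(values):
    """Drop missing entries (None). Real cleaning may involve much more."""
    return [v for v in values if v is not None]

def describe(values):
    """Basic descriptive statistics for one column."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

def histogram(values, bin_width):
    """Count values per bin; keys are the lower edge of each bin."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

raw = [12, 15, None, 22, 27, 31, None, 18]  # hypothetical column
column = clean(raw)
print(describe(column))
print(histogram(column, 10))
```

Correlations and scatter plots come next, and only then the regression.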
Correlation Cut-Offs
What is good? What is bad?
o Depends on industry
o Strong: maybe 0.7 or more
o No correlation: 0.1 or less
o Average: between 0.1 and 0.7
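The cut-offs above can be written as a small helper. The thresholds are the rough ones from the notes and are left adjustable because they depend on industry. Using the absolute value, so that negative correlations are judged by size, is an assumption.

```python
def describe_correlation(r, weak=0.1, strong=0.7):
    """Label a correlation using the rough cut-offs from the notes."""
    size = abs(r)  # assumption: sign does not affect strength
    if size >= strong:
        return "strong"
    if size <= weak:
        return "none"
    return "average"

for r in (0.85, -0.75, 0.4, 0.05):
    print(r, describe_correlation(r))
```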
Coefficient of determination
Predictive power of the model (R-squared)
Example: with R-squared = 0.77, I can predict 77% of the change in my dependent variable based on changes in my independent variable
Model
o 77% is predicted, 23% is error
Is Significance F good? It should be < 0.05 (it is the p-value for the overall model)
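A short sketch of how the predicted and error shares can be computed from actual and predicted values. The numbers are invented and give a different R-squared than the 77% example. Significance F is not computed here because it needs the F distribution, which a spreadsheet regression tool reports directly.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (error variation / total variation)."""
    mean_y = sum(actual) / len(actual)
    ss_total = sum((y - mean_y) ** 2 for y in actual)
    ss_error = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - ss_error / ss_total

# Hypothetical DV values and the model's predictions for them.
actual = [2, 4, 6, 8]
predicted = [2.5, 3.5, 6.5, 7.5]

r2 = r_squared(actual, predicted)
print(f"predicted: {r2:.0%}, error: {1 - r2:.0%}")
```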
