STAT C100 Lecture Notes - Lecture 13: Feature Engineering, Overfitting, Invertible Matrix

13 Oct 2018
Document Summary

Fitting linear models: feature engineering, regularization, and cross-validation.

- Feature engineering turns raw domain data into a feature matrix with entirely quantitative values so that linear regression can be applied (see the one-hot encoding sketch below).
- Note: for the inverse in the least-squares solution to exist, the feature matrix must be full column rank (see the rank-check sketch below).
- Scikit-learn has a wide range of models, and many of them follow a common pattern:

    from sklearn import linear_model
    f = linear_model.LinearRegression(fit_intercept=True)
    f.fit(train_data[["x"]], train_data["y"])

- How can we control overfitting through regularization? One proposal is to set weights to 0 to remove features. L2 regularization does not encourage sparsity (it yields small but non-zero weights); L1 regularization does not have an analytic solution, so numerical methods are needed (see the ridge/lasso sketch below).
- The constrained view of regularization requires complexity(f) <= beta, where beta is the regularization parameter, so that f(x) is not too complicated. This constrained optimization problem can be non-convex and hard to solve.
- There is an equivalent unconstrained formulation (obtained via the Lagrangian and duality): minimize the loss plus lambda times complexity(f).
- Larger lambda means more regularization, more bias, and less variance. Larger beta means less regularization, greater complexity, and a risk of overfitting.
- Overfitting: training error might be small while test error is large, a failure to generalize (see the train/test sketch below).
- A larger training set supports more complex models.
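To make the first point concrete, here is a minimal sketch of turning a categorical column into quantitative indicator features with pandas; the table and column names are hypothetical, not from the lecture:

    import pandas as pd

    # Hypothetical raw data: one quantitative and one categorical column.
    raw = pd.DataFrame({
        "sqft": [1200, 800, 1500],
        "city": ["Berkeley", "Oakland", "Berkeley"],
    })

    # One-hot encoding replaces the categorical column with 0/1 indicator
    # columns, giving a feature matrix with entirely quantitative values.
    X = pd.get_dummies(raw, columns=["city"], dtype=int)
    print(X.columns.tolist())
    # ['sqft', 'city_Berkeley', 'city_Oakland']

Note that one-hot encoding every category alongside an intercept column introduces a linear dependence among columns, which connects directly to the rank condition below.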
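The full-column-rank condition can be checked numerically. A sketch with NumPy, using a made-up matrix whose third column duplicates its first:

    import numpy as np

    # Third column equals the first, so X is NOT full column rank.
    X = np.array([[1.0, 2.0, 1.0],
                  [3.0, 4.0, 3.0],
                  [5.0, 6.0, 5.0]])
    print(np.linalg.matrix_rank(X), "of", X.shape[1])  # 2 of 3

    # X^T X is then singular, so the least-squares solution
    # (X^T X)^{-1} X^T y does not exist; np.linalg.inv(X.T @ X)
    # would raise LinAlgError. Dropping the redundant column fixes it.
    X = X[:, :2]
    print(np.linalg.matrix_rank(X.T @ X) == X.shape[1])  # True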
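A sketch of how the regularization parameter trades bias for variance, using scikit-learn's Ridge (L2) and Lasso (L1), which call the parameter alpha; the synthetic data here is an illustration only:

    import numpy as np
    from sklearn.linear_model import Ridge, Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    # Only the first two features matter; the other three are noise.
    y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

    for alpha in [0.01, 1.0, 100.0]:
        # Larger alpha -> more regularization -> smaller weights
        # (more bias, less variance). Ridge shrinks weights but keeps
        # them non-zero; lasso drives some weights exactly to zero.
        ridge = Ridge(alpha=alpha).fit(X, y)
        lasso = Lasso(alpha=alpha).fit(X, y)
        print(alpha, np.round(ridge.coef_, 2), np.round(lasso.coef_, 2))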
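Finally, a hypothetical sketch of the failure-to-generalize symptom: an overly flexible polynomial fit whose training error is tiny but whose test error is large (the degree and the data are assumptions for illustration):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(1)
    x = rng.uniform(-3, 3, size=40).reshape(-1, 1)
    y = np.sin(x).ravel() + rng.normal(scale=0.3, size=40)
    x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.5,
                                              random_state=0)

    # A degree-15 polynomial has enough freedom to chase training noise.
    model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
    model.fit(x_tr, y_tr)
    print("train MSE:", mean_squared_error(y_tr, model.predict(x_tr)))
    print("test  MSE:", mean_squared_error(y_te, model.predict(x_te)))
    # Typically: near-zero training error, much larger test error.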
