STAT C100 Lecture Notes - Lecture 13: Feature Engineering, Overfitting, Invertible Matrix

13 Oct 2018
Document Summary

Fitting linear models: feature engineering, regularization, and cross-validation.

- Feature engineering turns raw domain data into a feature matrix with entirely quantitative values so that linear regression can be applied (see the one-hot encoding sketch below).
- Note: for the inverse in the least-squares solution to exist, the feature matrix must be full column rank (see the rank-check sketch below).
- Scikit-learn has a wide range of models, and many of them follow a common pattern:

    from sklearn import linear_model
    f = linear_model.LinearRegression(fit_intercept=True)
    f.fit(train_data[["x"]], train_data["y"])

- How can we control overfitting through regularization? One proposal is to set weights to 0 to remove features. L2 regularization does not encourage sparsity (it yields small but non-zero weights); L1 regularization does not have an analytic solution, so numerical methods are needed (see the ridge/lasso sketch below).
- The constrained view of regularization requires complexity(f) <= beta, where beta is the regularization parameter, so that f(x) is not too complicated. This constrained optimization problem can be non-convex and hard to solve.
- There is an equivalent unconstrained formulation (obtained via the Lagrangian and duality): minimize the loss plus lambda times complexity(f).
- Larger lambda means more regularization, more bias, and less variance. Larger beta means less regularization, greater complexity, and a risk of overfitting.
- Overfitting: training error might be small while test error is large, a failure to generalize (see the train/test sketch below).
- A larger training set supports more complex models.
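To make the first point concrete, here is a minimal sketch of turning a categorical column into quantitative indicator features with pandas; the table and column names are hypothetical, not from the lecture:

    import pandas as pd

    # Hypothetical raw data: one quantitative and one categorical column.
    raw = pd.DataFrame({
        "sqft": [1200, 800, 1500],
        "city": ["Berkeley", "Oakland", "Berkeley"],
    })

    # One-hot encoding replaces the categorical column with 0/1 indicator
    # columns, giving a feature matrix with entirely quantitative values.
    X = pd.get_dummies(raw, columns=["city"], dtype=int)
    print(X.columns.tolist())
    # ['sqft', 'city_Berkeley', 'city_Oakland']

Note that one-hot encoding every category alongside an intercept column introduces a linear dependence among columns, which connects directly to the rank condition below.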
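The full-column-rank condition can be checked numerically. A sketch with NumPy, using a made-up matrix whose third column duplicates its first:

    import numpy as np

    # Third column equals the first, so X is NOT full column rank.
    X = np.array([[1.0, 2.0, 1.0],
                  [3.0, 4.0, 3.0],
                  [5.0, 6.0, 5.0]])
    print(np.linalg.matrix_rank(X), "of", X.shape[1])  # 2 of 3

    # X^T X is then singular, so the least-squares solution
    # (X^T X)^{-1} X^T y does not exist; np.linalg.inv(X.T @ X)
    # would raise LinAlgError. Dropping the redundant column fixes it.
    X = X[:, :2]
    print(np.linalg.matrix_rank(X.T @ X) == X.shape[1])  # True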
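A sketch of how the regularization parameter trades bias for variance, using scikit-learn's Ridge (L2) and Lasso (L1), which call the parameter alpha; the synthetic data here is an illustration only:

    import numpy as np
    from sklearn.linear_model import Ridge, Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    # Only the first two features matter; the other three are noise.
    y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

    for alpha in [0.01, 1.0, 100.0]:
        # Larger alpha -> more regularization -> smaller weights
        # (more bias, less variance). Ridge shrinks weights but keeps
        # them non-zero; lasso drives some weights exactly to zero.
        ridge = Ridge(alpha=alpha).fit(X, y)
        lasso = Lasso(alpha=alpha).fit(X, y)
        print(alpha, np.round(ridge.coef_, 2), np.round(lasso.coef_, 2))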
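Finally, a hypothetical sketch of the failure-to-generalize symptom: an overly flexible polynomial fit whose training error is tiny but whose test error is large (the degree and the data are assumptions for illustration):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(1)
    x = rng.uniform(-3, 3, size=40).reshape(-1, 1)
    y = np.sin(x).ravel() + rng.normal(scale=0.3, size=40)
    x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.5,
                                              random_state=0)

    # A degree-15 polynomial has enough freedom to chase training noise.
    model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
    model.fit(x_tr, y_tr)
    print("train MSE:", mean_squared_error(y_tr, model.predict(x_tr)))
    print("test  MSE:", mean_squared_error(y_te, model.predict(x_te)))
    # Typically: near-zero training error, much larger test error.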
