MIS372 Lecture Notes - Lecture 4: Overfitting, Predictive Modelling, Metafunction

Document Summary

The goal is to estimate the performance of the predictive model, especially on new, unseen data.

A simplistic approach is to train (build) the model on the entire data available and evaluate it on that same data. This is not the best solution: it will not give a good estimate of performance when new, unseen data arrives. The estimate will be overly optimistic, sometimes reaching 100% accuracy.

Hold-out validation: randomly select a sample for training (about 70%-80% of the data) and keep the rest for testing. The classifier is trained on the training set, once only, and tested on the testing set, once only. Ideally, the selection preserves the distribution of instances in the selected sample.

Hold-out validation has shortcomings. Each data instance is used only once, either for training or for testing, so no instance contributes to both. The training/testing split may not represent the entire data set or population. Testing performance may vary a lot depending on the random selection of the training and testing partitions: a lucky split may result in a good estimate, while an unfortunate split may result in a poor or misleading estimate.
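The contrast between the simplistic approach and hold-out validation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the lecture: the synthetic one-feature data set, the 1-nearest-neighbour classifier, the per-class (stratified) way of drawing the split, and the helper names make_data, predict_1nn, accuracy and holdout_split. It uses only the Python standard library.

```python
import random

def make_data(n=200, seed=0):
    """Synthetic two-class data with one noisy feature (illustrative only)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Class 1 is shifted to the right; the classes overlap on purpose.
        x = rng.gauss(1.5 * label, 1.0)
        data.append((x, label))
    return data

def predict_1nn(train, x):
    """Return the label of the training instance closest to x."""
    return min(train, key=lambda row: abs(row[0] - x))[1]

def accuracy(train, test):
    """Fraction of test instances whose label is predicted correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)

def holdout_split(data, train_frac=0.7, seed=0):
    """Random split drawn per class, so both parts keep the class proportions."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted({y for _, y in data}):
        rows = [row for row in data if row[1] == label]
        rng.shuffle(rows)
        cut = round(train_frac * len(rows))
        train += rows[:cut]
        test += rows[cut:]
    return train, test

data = make_data()

# Simplistic approach: train and evaluate on the same data.
resub_acc = accuracy(data, data)

# Hold-out validation repeated with ten different random splits.
holdout_accs = [accuracy(*holdout_split(data, seed=s)) for s in range(10)]

print(f"accuracy on the training data itself: {resub_acc:.2f}")
print("hold-out accuracies:", [f"{a:.2f}" for a in holdout_accs])
```

Because a 1-nearest-neighbour classifier finds every training instance as its own nearest neighbour, the first figure is perfect by construction, which is the overly optimistic estimate the notes warn about. The hold-out figures are lower and differ from seed to seed, which shows how much a single split can depend on luck.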
