RSM412H1 Lecture Notes - Lecture 11: Euclidean Distance, Similarity Measure, Data Mining

27 views, 1 page
9 Apr 2020

Document Summary

An object is assigned to the most common class among its k nearest neighbors. To classify a new example, calculate the distance between it and every example in the training set; the k closest examples determine which class the new example belongs to. The distance can be calculated with Euclidean distance or another distance measure.

Distances between neighbors can be dominated by attributes with relatively large numeric values, so it is important to normalize features to eliminate the scale effects of different features. Normalization maps values to numbers between 0 and 1. Feature normalization does not help in high-dimensional spaces if most features are irrelevant: if the number of useful features is smaller than the number of noisy features, Euclidean distance is dominated by noise. Here i indexes the discriminative features and j indexes the noisy features.

Choosing k: in practice, k = 1 is often used for efficiency, but this can be sensitive to noise. A larger k can improve performance, while too large a k destroys locality. The rule of thumb relates k to n, where n is the number of examples.
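The procedure summarized above (normalize features, measure Euclidean distance to every training example, vote among the k closest) can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function names `scale_row`, `min_max_normalize`, `euclidean`, and `knn_predict` are hypothetical, min-max scaling is assumed as the normalization that maps values to between 0 and 1, and the small data set is invented so that the first feature has much larger numbers than the second.

```python
import math
from collections import Counter


def scale_row(row, bounds):
    """Min-max scale one example using per-feature (min, max) bounds."""
    # Constant features (max == min) map to 0.0 to avoid division by zero.
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(row, bounds)]


def min_max_normalize(rows):
    """Rescale every feature of the training set to the range [0, 1]."""
    bounds = [(min(col), max(col)) for col in zip(*rows)]
    return [scale_row(r, bounds) for r in rows], bounds


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k):
    """Return the most common class among the k training examples nearest to query."""
    # Distance from the query to every training example, nearest first.
    nearest = sorted((euclidean(x, query), label)
                     for x, label in zip(train_X, train_y))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


# Invented toy data (an assumption for illustration only): the class depends
# on the second feature, but the first feature has far larger numeric values.
train_X = [[1000, 0.10], [2000, 0.20], [9000, 0.15],
           [1500, 0.90], [8500, 0.80], [9500, 0.95]]
train_y = ["A", "A", "A", "B", "B", "B"]
query = [1200, 0.85]

# Without normalization, the large-valued first feature dominates the distance.
print(knn_predict(train_X, train_y, query, k=3))

# With normalization, both features contribute on the same scale.
scaled_X, bounds = min_max_normalize(train_X)
print(knn_predict(scaled_X, train_y, scale_row(query, bounds), k=3))
```

On this toy data the unnormalized call predicts "A" and the normalized call predicts "B", which illustrates the scale effect described above. The query is scaled with the bounds computed from the training set so that both live in the same feature space.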
