RSM412H1 Lecture Notes - Lecture 10: Unsupervised Learning, Taxicab Geometry, Hierarchical Clustering

9 Apr 2020

Document Summary

March 19, 2020. With unsupervised learning, there is no target: you don't know what you are looking for, so you can't validate with MSE, SSE, etc. The goal is to explain the data and develop insights; it can also be used for data reduction. PCA: main application is image recognition. Clustering divides data into groups so that data points in the same group are similar to one another and dissimilar to data points in other groups. It defines clusters to minimize within-cluster variation and maximize between-cluster variation. Hartigan-Wong algorithm: within-cluster variation is the sum of squared Euclidean distances between items and the corresponding centroid, where each item is a data point belonging to the cluster and the centroid is the mean value of the points assigned to the cluster. The optimal number of clusters is where the bend (elbow) occurs in the plot of k vs. WSS (total within-cluster sum of squares). Dissimilarity matrix: matrix of the distances between each pair of observations. Can give false confidence in compactness of cluster. Also good if features deviate significantly from normality. Calculate the distance and assign each observation to the closest centroid.
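
The assign-to-closest-centroid step, the WSS measure, and the dissimilarity matrix described above can be sketched in a few lines of Python. This is only an illustration: it uses simple Lloyd-style iterations, not the Hartigan-Wong updates named in the notes, and the function names (`kmeans`, `dissimilarity_matrix`, and so on) and the toy data are assumptions made for this example.

```python
import random

def sq_euclidean(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def manhattan(a, b):
    # Taxicab (Manhattan) distance between two points.
    return sum(abs(x - y) for x, y in zip(a, b))

def dissimilarity_matrix(points, dist):
    # Matrix of distances between each pair of observations.
    return [[dist(p, q) for q in points] for p in points]

def centroid(members):
    # Mean value of the points assigned to a cluster.
    n = len(members)
    return tuple(sum(coord) / n for coord in zip(*members))

def kmeans(points, k, n_iter=100, seed=0):
    # Lloyd-style k-means (assumption: NOT Hartigan-Wong).
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k random observations
    labels = None
    for _ in range(n_iter):
        # Assign each observation to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_euclidean(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Recompute each centroid as the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = centroid(members)
    # WSS: sum of squared distances from each point to its own centroid.
    wss = sum(sq_euclidean(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wss

# Toy data (hypothetical): two well-separated groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

labels, centroids, wss = kmeans(points, k=2)

# WSS for several values of k; the "bend" in these values suggests k.
wss_by_k = {k: kmeans(points, k)[2] for k in range(1, 5)}

# Pairwise Manhattan distances between all observations.
D = dissimilarity_matrix(points, manhattan)

print(labels)
print(wss_by_k)
```

On data like this, the WSS should drop sharply from k = 1 to k = 2 and only slightly afterwards, which is the bend the notes refer to. Swapping `manhattan` for `sq_euclidean` in `dissimilarity_matrix` gives the squared Euclidean version of the same matrix.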
