# COMPSCI C8 Lecture 13: Week 13 Study Guide (Lecture & Textbook Notes)

University of California - Berkeley

Computer Science

COMPSCI C8

John DeNero

Spring

Description

## Mon 4/17 [34]: Classifiers

- Reading:
- New Methods Today:
- Lecture:

### Nearest Neighbor Idea

What is classification?

- A classifier is a program that assesses the attributes of an example and returns a predicted label for it.

How do we build the classifier?

- Gather data.
- The classifier might not be accurate, so we use a test set.
- How big should the test set be?

Demo: Google Science Fair

- It is hard to guess based on only 2 attributes (for points that are in the middle).
- Let's use ALL the attributes.
- If you have a row, then `np.array(t.row(0))` evaluates to an array of all the numbers in the row. Now you can do array arithmetic. You can also iterate through rows.
- Distance formula: applies to any number of dimensions/attributes.

### Finding the k Nearest Neighbors

To find the k nearest neighbors:

1. Find the distance between the example and each example in the training set.
2. Augment the training data table with a column containing all the distances.
3. Sort the augmented table in increasing order of distance.
4. Take the top k rows of the sorted table.

Let the top k rows VOTE. Force the classifier to make a guess every time, even if it is not sure.

### Accuracy of Classifier

- Proportion of examples that are labeled correctly.
- Need to compare classifier predictions to true labels.
- If the labeled set is sampled at random from the population, then we can infer accuracy on that population.

### Decision Boundary

## Wed 4/19 [35]: Categorical Association

- Reading:
- New Methods Today:
- Lecture: https://www.youtube.com/watch?v=0vjYj7LbzdE

You can only predict if you have an association.

Demo: Using Bar Chart on Uniformity

### Comparing Two Samples

- What if the bar charts are pretty similar? Is the association due to random sampling, or is it actually there?
- Let's do a hypothesis test to find out if there is an association between two classes and an attribute.

Permutation test: tests whether two samples are drawn randomly from the same distribution.

- Does it matter that I've paired together rows in this way? Or would it be the same if we kept the xs but permuted the ys?
- Let's make our null hypothesis that permuting doesn't make a difference. Any difference is due to chance!
- All rearrangements are equally likely.
- Keep shuffling and keep computing a test statistic. Eventually you will have a distribution of the test statistic.
- We can use the TVD (total variation distance) to measure how different two distributions are, remember? Sum of the absolute differences / 2.
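The nearest-neighbor steps from the classifiers lecture (compute distances, sort, take the top k, vote, then measure accuracy on a test set) can be sketched as below. This is only an illustrative sketch, not the lecture's code: it uses plain Python lists in place of the course's tables and arrays, and the function names and toy data are made up for the example.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance; works for any number of attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(training, example, k):
    """Return the k (distance, label) pairs closest to `example`.

    `training` is a list of (attributes, label) pairs (an assumed layout).
    """
    # "Augment" each training example with its distance to the new example.
    with_distances = [(distance(attrs, example), label)
                      for attrs, label in training]
    # Sort in increasing order of distance and keep the top k.
    with_distances.sort(key=lambda pair: pair[0])
    return with_distances[:k]


def classify(training, example, k):
    """Let the k nearest neighbors vote; always returns some label."""
    votes = Counter(label for _, label in k_nearest(training, example, k))
    return votes.most_common(1)[0][0]


def accuracy(training, test, k):
    """Proportion of test examples whose predicted label is correct."""
    correct = sum(classify(training, attrs, k) == label
                  for attrs, label in test)
    return correct / len(test)


# Hypothetical toy data: three attributes per example, two classes.
training = [
    ((1.0, 1.0, 1.0), "A"), ((1.2, 0.8, 1.1), "A"), ((0.9, 1.1, 0.7), "A"),
    ((5.0, 5.0, 5.0), "B"), ((5.2, 4.9, 5.3), "B"), ((4.8, 5.1, 4.6), "B"),
]
test = [((1.1, 1.0, 0.9), "A"), ((5.1, 5.0, 4.9), "B")]

print(classify(training, (1.0, 1.2, 0.9), k=3))
print(accuracy(training, test, k=3))
```

With two classes, choosing an odd k avoids tied votes. The test examples are kept separate from the training examples so that the accuracy estimate is not inflated.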
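The permutation test with the TVD as its test statistic might look roughly like the following. Again this is a sketch under stated assumptions rather than the course's implementation: it assumes exactly two classes, keeps the class labels fixed while shuffling the attribute values, and uses hypothetical names (`tvd_between_classes`, `permutation_test`) and made-up data.

```python
import random
from collections import Counter


def distribution(values, categories):
    """Proportion of `values` falling in each category, in a fixed order."""
    counts = Counter(values)
    return [counts[c] / len(values) for c in categories]


def tvd(dist1, dist2):
    """Total variation distance: sum of absolute differences, divided by 2."""
    return sum(abs(p - q) for p, q in zip(dist1, dist2)) / 2


def tvd_between_classes(labels, attributes):
    """TVD between the attribute distributions of the two classes."""
    categories = sorted(set(attributes))
    first, second = sorted(set(labels))  # assumes exactly two classes
    group1 = [a for l, a in zip(labels, attributes) if l == first]
    group2 = [a for l, a in zip(labels, attributes) if l == second]
    return tvd(distribution(group1, categories),
               distribution(group2, categories))


def permutation_test(labels, attributes, repetitions=1000, seed=0):
    """Return (observed TVD, empirical p-value) under the null hypothesis."""
    rng = random.Random(seed)
    observed = tvd_between_classes(labels, attributes)
    shuffled = list(attributes)
    simulated = []
    for _ in range(repetitions):
        # Under the null, every rearrangement of the attributes is equally likely.
        rng.shuffle(shuffled)
        simulated.append(tvd_between_classes(labels, shuffled))
    # Share of shuffles at least as extreme as the observed statistic.
    p_value = sum(s >= observed for s in simulated) / repetitions
    return observed, p_value


# Hypothetical data: class "A" always answers "yes", class "B" always "no".
labels = ["A"] * 10 + ["B"] * 10
attributes = ["yes"] * 10 + ["no"] * 10

observed, p_value = permutation_test(labels, attributes)
print(observed, p_value)
```

A small p-value means the observed TVD is unusual among the shuffled versions, which is evidence against the null hypothesis that the pairing of labels and attributes does not matter.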
