COMPSCI C8 Lecture 13: Week 13 Study Guide (Lecture & Textbook Notes)

University of California - Berkeley
Computer Science
John DeNero

Mon 4/17 [34]: Classifiers
Reading:
New Methods Today:
Lecture: Nearest Neighbor Idea

What is classification?
  A classifier is a program that assesses the attributes of an example and returns a predicted label.

How do we build the classifier?
  Gather data.
  The classifier might not be accurate, so we use a test set to measure how well it does.
  How big should the test set be?

Demo: Google Science Fair
  It will be hard to guess based on only 2 attributes (for some points that are in the middle).
  Let's use ALL the attributes.
  If you have a row, then np.array(t.row(0)) evaluates to an array of all the numbers in the row.
  Now you can do array arithmetic. You can also iterate through rows.
  Distance formula: applies to any number of dimensions/attributes.

Finding the k Nearest Neighbors
  To find the k nearest neighbors:
  1. Find the distance between the example and each example in the training set.
  2. Augment the training data table with a column containing all the distances.
  3. Sort the augmented table in increasing order of distance.
  4. Take the top k rows of the sorted table.
  Let the top k rows VOTE. Force the classifier to make a guess every time, even if it is not sure.

Accuracy of Classifier
  The proportion of examples that are labeled correctly.
  Need to compare classifier predictions to true labels.
  If the labeled set is sampled at random from the population, then we can infer accuracy on that population.

Decision Boundary

Wed 4/19 [35]: Categorical Association
Reading:
New Methods Today:
Lecture:
  You can only predict if you have an association.
  Demo: Using a bar chart on Uniformity

Comparing Two Samples
  What if the bar charts are pretty similar? Is the association due to random sampling, or is it actually there?
  Let's do a hypothesis test to find out if there is an association between the two classes and an attribute.

Permutation Test: tests whether two samples are drawn randomly from the same distribution
  Does it matter that I've paired together rows in this way? Or would it be the same if we kept the x's but permuted the y's?
  Let's make our null hypothesis that permuting doesn't make a difference. Any difference is due to chance!
  All rearrangements are equally likely.
  Keep shuffling and keep computing a test statistic. Eventually you will have a distribution of the test statistic.
  We can use the TVD (total variation distance) to measure how different two distributions are, remember? It is the sum of absolute differences, divided by 2.
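
The k-nearest-neighbors steps and the accuracy calculation from the Mon 4/17 lecture can be sketched roughly as follows. This is a minimal illustration, not the code shown in lecture: it assumes Euclidean distance, uses plain NumPy arrays in place of the course's tables, and the function names (distance, classify, accuracy) and the toy data are made up for the example.

```python
import numpy as np

def distance(a, b):
    """Euclidean distance (an assumption); works for any number of attributes."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))

def classify(train_attrs, train_labels, example, k):
    """Predict a label for `example` by majority vote of its k nearest neighbors."""
    train_attrs = np.asarray(train_attrs, dtype=float)
    train_labels = np.asarray(train_labels)
    # Distance between the example and each row of the training set.
    distances = np.array([distance(row, example) for row in train_attrs])
    # Sort by increasing distance and keep the top k rows.
    nearest = np.argsort(distances, kind="stable")[:k]
    # Let the k nearest rows vote. argmax always returns something,
    # so the classifier is forced to guess even when the vote is tied.
    labels, counts = np.unique(train_labels[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def accuracy(train_attrs, train_labels, test_attrs, test_labels, k):
    """Proportion of test examples whose predicted label matches the true label."""
    predictions = np.array(
        [classify(train_attrs, train_labels, row, k) for row in test_attrs]
    )
    return np.mean(predictions == np.asarray(test_labels))

# Hypothetical toy data: two attributes per row, labels 0 and 1.
train_attrs = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
train_labels = [0, 0, 0, 1, 1, 1]
test_attrs = [[0, 0], [10, 10]]
test_labels = [0, 1]

print(classify(train_attrs, train_labels, [2, 2], k=3))
print(accuracy(train_attrs, train_labels, test_attrs, test_labels, k=3))
```

Using an odd k with two classes avoids tied votes. The accuracy is measured on rows held out from training, which matches the point about needing a test set.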
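
The permutation test from the Wed 4/19 lecture, with the TVD as the test statistic, might look something like the sketch below. It assumes exactly two classes and a categorical attribute stored in NumPy arrays. The helper names (tvd, proportions, class_tvd, permutation_test) and the sample data are hypothetical and not taken from the lecture.

```python
import numpy as np

def tvd(dist1, dist2):
    """Total variation distance: sum of absolute differences, divided by 2."""
    return np.sum(np.abs(np.asarray(dist1) - np.asarray(dist2))) / 2

def proportions(values, categories):
    """Proportion of `values` falling in each category."""
    values = np.asarray(values)
    return np.array([np.mean(values == c) for c in categories])

def class_tvd(classes, attribute):
    """TVD between the attribute's distribution in one class and in the other."""
    classes = np.asarray(classes)
    attribute = np.asarray(attribute)
    categories = np.unique(attribute)
    first, second = np.unique(classes)  # assumes exactly two classes
    return tvd(
        proportions(attribute[classes == first], categories),
        proportions(attribute[classes == second], categories),
    )

def permutation_test(classes, attribute, repetitions=1000, seed=0):
    """Keep the classes fixed, shuffle the attribute, and recompute the statistic."""
    rng = np.random.default_rng(seed)
    observed = class_tvd(classes, attribute)
    simulated = np.array(
        [class_tvd(classes, rng.permutation(attribute)) for _ in range(repetitions)]
    )
    # Empirical P-value: share of shuffles at least as extreme as the observed TVD
    # (a small tolerance guards against floating-point rounding).
    p_value = np.mean(simulated >= observed - 1e-12)
    return observed, p_value

# Hypothetical data in which the attribute is perfectly associated with the class.
classes = np.array(["a"] * 10 + ["b"] * 10)
attribute = np.array([1] * 10 + [2] * 10)
observed, p_value = permutation_test(classes, attribute)
print(observed, p_value)
```

Under the null hypothesis the pairing of class and attribute does not matter, so every shuffle is equally likely. A small P-value suggests the observed difference between the two bar charts is not just due to chance.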