Computer Science 4442A/B Lecture Notes - Lecture 23: Bigram, Probability Mass Function, Test Data


Document Summary

Smoothing increases P(unseen event) and correspondingly decreases P(seen event); for example, for P(w | denied the), probability mass is taken from the seen continuations and redistributed to the unseen ones. Overall, smoothing flattens spiky distributions so they generalize better. Add-one smoothing works acceptably only if sparsity is mild, i.e., there are not many missing n-grams; the lecture illustrates this with add-one smoothed bigram counts for "want to eat" with N = 10,000. Example allocation to unseen bigrams: with N = 22,000,000 training bigram tokens and V = 273,266 word types, the number of possible bigrams is B = V^2 = 74,674,306,756, of which roughly 74,671,100,000 are unseen. The add-one probability of an unseen bigram is 1 / (N + B), and the portion of probability mass given to unseen bigrams is (number of unseen bigrams) x P(unseen bigram), which in this example is nearly all of the mass (about 1). For bigrams that occur r times in training, the empirical (averaged) count on held-out data is Tr / Nr, and a good estimator's predicted count should be close to this. Counts on test data: a corpus of 44,000,000 bigram tokens is split into 22,000,000 for training and 22,000,000 for testing, and probabilities are obtained by dividing counts by 22,000,000. Under add-one smoothing, each bigram unseen in the training corpus is given a count of 0.000295.
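The arithmetic in the allocation example and the held-out comparison can be reproduced directly. Below is a minimal Python sketch, assuming the figures quoted above (N = 22,000,000, V = 273,266, about 74,671,100,000 unseen bigrams); the heldout_empirical_counts helper is a hypothetical illustration of the Tr / Nr idea, not code from the lecture.

from collections import Counter

# Add-one smoothing arithmetic from the allocation example above.
N = 22_000_000                      # training bigram tokens
V = 273_266                         # word types in the vocabulary
B = V ** 2                          # possible bigram types = 74,674,306,756
unseen_bigrams = 74_671_100_000     # unseen bigram types (figure quoted in the notes)

p_unseen = 1 / (N + B)                        # add-one probability of one unseen bigram
mass_to_unseen = unseen_bigrams * p_unseen    # total mass handed to unseen bigrams
count_unseen = N * p_unseen                   # add-one "count" of an unseen bigram

print(f"p(unseen bigram)        = {p_unseen:.3e}")
print(f"mass given to unseen    = {mass_to_unseen:.4f}")   # ~0.9997, nearly all the mass
print(f"add-one count of unseen = {count_unseen:.6f}")     # ~0.000295


def heldout_empirical_counts(train_bigrams, test_bigrams):
    """For each training frequency r, return Tr / Nr: the average number of
    times a bigram seen r times in training appears in the held-out (test)
    half of the corpus. A good smoothed count should be close to this.
    (Hypothetical helper illustrating the idea; not code from the lecture.)"""
    train_counts = Counter(train_bigrams)
    test_counts = Counter(test_bigrams)

    N_r = Counter()   # N_r: number of bigram types with training count r
    T_r = Counter()   # T_r: total test-data occurrences of those same types
    for bigram, r in train_counts.items():
        N_r[r] += 1
        T_r[r] += test_counts[bigram]
    return {r: T_r[r] / N_r[r] for r in sorted(N_r)}

Running the first part reproduces the numbers in the summary: almost all of the probability mass goes to unseen bigrams, and each unseen bigram ends up with an add-one count of about 0.000295 on a 22,000,000-token corpus.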
