CS431 Chapter Notes - Chapter 3: Power Law, N-Gram, Mapreduce

Document Summary

Basic idea: limit the history to a fixed number of (n-1) words (the Markov assumption). We need better estimators because MLEs give us a lot of zeros: take probability mass from seen n-grams and redistribute it to unseen n-grams.

Smoothing methods:
- Jelinek-Mercer: mix the higher-order model with lower-order models to defeat sparsity.
- Kneser-Ney: interpolate a discounted model with a special "continuation" n-gram model, based on how many different contexts an n-gram appears in.
- Stupid backoff: score the k-gram if it was seen; otherwise back off to the (k-1)-gram case.

Scaling up (MapReduce):
- Assign IDs based on frequency (better compression using vbyte).
- Partition by bigram for better load balancing.

Applications: machine translation maps words in English to words in other languages; with the interpolation method, choose the tiles that are most likely. Speech is hard to recognize, and autocorrect doesn't always work.

Indexing: documents may refer to web pages, PDFs, PowerPoint slides, etc. The central problem is deciding what represents the same concepts, and how. Treat all the words in a document as index terms.
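The stupid backoff scoring above can be sketched in Python. This is a minimal illustration, not the production implementation; the `train_counts` helper and the commonly cited backoff factor of 0.4 are assumptions here:

```python
from collections import Counter

def train_counts(tokens, max_n=3):
    """Count every n-gram up to max_n in a token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def stupid_backoff(counts, total, ngram, alpha=0.4):
    """Score S(w | history): relative frequency if the n-gram was seen,
    otherwise alpha times the score of the shortened history.
    Note these scores are not normalized probabilities."""
    if len(ngram) == 1:
        return counts[ngram] / total          # unigram base case
    if counts[ngram] > 0:
        return counts[ngram] / counts[ngram[:-1]]
    return alpha * stupid_backoff(counts, total, ngram[1:], alpha)

tokens = "the cat sat on the mat".split()
counts = train_counts(tokens)
score = stupid_backoff(counts, len(tokens), ("the", "cat"))  # seen bigram
```

Because the recursion never needs a discount or normalization, scoring only touches raw counts, which is what makes this method cheap at web scale.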
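The note that frequency-ordered IDs compress better under vbyte can be made concrete with a small sketch of variable-byte coding. This assumes one common convention (7 payload bits per byte, low bits first, stop bit set on the final byte); other sources flip these choices:

```python
def vbyte_encode(n):
    """Encode a non-negative integer: 7 payload bits per byte,
    high bit set only on the final (terminating) byte."""
    out = bytearray()
    while n >= 128:
        out.append(n & 0x7F)    # low 7 bits, continuation byte
        n >>= 7
    out.append(n | 0x80)        # last byte: set the stop bit
    return bytes(out)

def vbyte_decode(data):
    """Decode a stream of vbyte-encoded integers."""
    nums, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:            # stop bit: this integer is complete
            nums.append(n | ((b & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= b << shift
            shift += 7
    return nums
```

Since frequent words get the smallest IDs, most IDs in the data fit in a single byte, which is exactly why frequency-based ID assignment improves compression.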
