LING 15 Lecture Notes - Lecture 24: Bigram, Edit Distance, Phoneme

25 Jul 2018

Document Summary

Once the parser identifies or tags a word, much more information becomes available.

"I want to [wer]": higher probability that [wer] is a verb (wear).
"I want to know [wer]": higher probability that [wer] is a relative pronoun (where).

Every word has some probability of being phrase-initial. Each subsequent word has a probability based on the previous word. For example, "the" increases the probability that the next word is an adjective or noun and reduces the probability that the next word is a verb.

A rich enough database can store the predictive probability of each word relative to every other word. "Categories" emerge as words with similar distributional properties.

Corpus: a large collection of texts and/or speech.
Written corpora: amalgamation of texts (e.g. books, articles, websites).
Spoken corpora: amalgamation of recorded conversations (e.g. phone calls); especially useful if transcribed/annotated.

Programs can scan corpora for a word's frequency: convert the corpus to a list of words, step through the list, and build a table (matrix) to count the frequency of each word.
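
The counting procedure described in the summary (convert the corpus to a list of words, step through the list, build a table) and the idea that each word's probability depends on the previous word can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used in the lecture: the toy corpus, the function names (`tokenize`, `word_frequencies`, `bigram_probabilities`), and the crude punctuation handling are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text, split on whitespace, strip surrounding punctuation."""
    words = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if word:
            words.append(word)
    return words


def word_frequencies(words):
    """Step through the word list and count how often each word occurs."""
    return Counter(words)


def bigram_probabilities(words):
    """Estimate P(next | previous) = count(previous, next) / count(previous, anything)."""
    pair_counts = defaultdict(Counter)
    # Pair each word with the word that follows it.
    # Simplification: sentence boundaries are ignored, so pairs can span sentences.
    for prev, nxt in zip(words, words[1:]):
        pair_counts[prev][nxt] += 1

    probs = {}
    for prev, followers in pair_counts.items():
        total = sum(followers.values())
        probs[prev] = {nxt: n / total for nxt, n in followers.items()}
    return probs


# Toy corpus invented for illustration; a real corpus would be far larger.
corpus = "The dog saw the cat. The cat saw the big dog."
words = tokenize(corpus)
freq = word_frequencies(words)
bigrams = bigram_probabilities(words)

print(freq["the"])      # 4
print(bigrams["the"])   # {'dog': 0.25, 'cat': 0.5, 'big': 0.25}
```

In this toy corpus "the" is followed only by nouns and an adjective, never by a verb, which mirrors the point above about how one word shifts the probabilities for the next. With a large corpus the same table would be much bigger and sparser, and words with similar rows would start to look like categories.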
