POLI 210 Lecture Notes - Lecture 20: Named-Entity Recognition
Document Summary
POLI 210, Lecture 20: Content Analysis II, Prof. Aaron Erlich. Often, we want to discover meaning from texts. The internet and other sources have produced an astronomical amount of text. We want to turn that text into data. Most techniques now use a combination of approaches (supervised, unsupervised, dictionary, regression). It depends on the question you are asking. We are going to look at an application to get a better sense of what this looks like. We want to identify elements of Donald Trump's speech. The problem is that there is also a campaign. We need to pre-process the data to strip out unhelpful information and standardize the text. Many dictionary approaches also have to do some of these steps. Bag of words: get rid of structure and just use the words. N-grams (bi-grams, tri-grams): use combinations of words. Named entity recognition: use powerful machine learning algorithms to extract named entities. Term frequency / term frequency-inverse document frequency (TF-IDF).
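To make the pre-processing, bag-of-words, n-gram, and TF-IDF steps above more concrete, here is a minimal Python sketch using only the standard library. The toy sentences, the stopword list, and the function names (`preprocess`, `ngrams`, `tf_idf`) are illustrative assumptions, not material from the lecture, and the TF-IDF weighting shown (relative term frequency times the natural log of inverse document frequency) is just one common variant. Named entity recognition is left out because it usually relies on a trained model rather than a few lines of code.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus for illustration; not taken from the lecture.
DOCS = [
    "We will build a great wall.",
    "We will win, and we will keep winning!",
    "The wall will be great.",
]

# Illustrative stopword list (an assumption); real lists are much longer.
STOPWORDS = {"a", "and", "the", "be"}


def preprocess(text):
    """Standardize text: lowercase, strip punctuation, drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens (n=2 gives bi-grams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs_tokens):
    """Weight each term by (count / doc length) * ln(N / document frequency)."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


tokenized = [preprocess(doc) for doc in DOCS]
bags = [Counter(tokens) for tokens in tokenized]    # bag of words: order discarded
bigrams = [ngrams(tokens, 2) for tokens in tokenized]
weights = tf_idf(tokenized)

if __name__ == "__main__":
    for bag, grams, w in zip(bags, bigrams, weights):
        print(dict(bag), grams, w, sep="\n", end="\n\n")
```

In this sketch, a word that appears in every document (here, "will") gets a TF-IDF weight of zero, while a word unique to one document (such as "build") is weighted most heavily. That is the intuition behind preferring TF-IDF to raw term frequency when looking for what distinguishes one text from the others.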