CPSC 477 Lecture Notes - Lecture 3: Binary Classification, Parsing, Lexical Analysis

Type = a unique word (or character)
Token = one occurrence of a word; the token count is the total number of words
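
A minimal sketch of the type/token distinction. It assumes lowercasing and whitespace splitting as a stand-in for real tokenization; the function name type_token_counts is made up for illustration.

```python
from collections import Counter

def type_token_counts(text):
    """Return (number of types, number of tokens) for a text."""
    # Assumption: lowercase + whitespace split is good enough for this demo.
    tokens = text.lower().split()
    counts = Counter(tokens)          # one entry per distinct word (type)
    return len(counts), len(tokens)   # types, tokens

n_types, n_tokens = type_token_counts("the cat sat on the mat")
print(n_types, n_tokens)  # "the" appears twice, so there are fewer types than tokens
```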
Tokenization
- Do’t wat to reak up words like New York, ad ho,
- Tokeizig: New York-Los Ageles Flight
o Phrase processing first and then tokenize
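
One way to sketch "phrases first, then tokenize" is a regular expression that tries known multiword phrases before falling back to single words. The phrase list and the function name tokenize are assumptions for illustration; a real system would use a much larger lexicon.

```python
import re

# Hypothetical phrase list, not from the lecture.
PHRASES = ["New York", "Los Angeles", "ad hoc"]

def tokenize(text, phrases=PHRASES):
    """Match known phrases first (longest first), then single words."""
    ordered = sorted(phrases, key=len, reverse=True)
    alternatives = [r"\b" + re.escape(p) + r"\b" for p in ordered]
    # \w+ is the fallback; punctuation such as "-" is simply skipped.
    pattern = re.compile("|".join(alternatives + [r"\w+"]))
    return pattern.findall(text)

print(tokenize("New York-Los Angeles Flight"))
```

With this ordering, "New York" and "Los Angeles" each come out as a single token instead of being split at the space.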
Decision list
- If x then
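
A decision list can be read as an ordered chain of if-then rules where the first rule that fires decides the label. The sketch below applies that idea to a made-up binary question (does this period end a sentence?). The rules, the abbreviation set, and the function name ends_sentence are all hypothetical and not taken from the lecture.

```python
# Hypothetical abbreviation set for the demo.
ABBREVIATIONS = {"dr", "mr", "mrs", "prof"}

# Each rule is (test, label). Order matters: the first matching rule wins.
RULES = [
    (lambda prev, nxt: prev.lower() in ABBREVIATIONS, False),  # "Dr." is not an ending
    (lambda prev, nxt: nxt == "", True),                       # nothing follows
    (lambda prev, nxt: nxt[:1].isupper(), True),               # next word is capitalized
]

def ends_sentence(prev_word, next_word, rules=RULES, default=False):
    """Binary decision: walk the rules in order, return the first match."""
    for test, label in rules:
        if test(prev_word, next_word):
            return label
    return default  # no rule fired

print(ends_sentence("Dr", "Smith"), ends_sentence("home", "She"))
```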
Binary classification problem
Most preprocessing is rule-based
Domain transfer
- Build a machine learning model that is trained on one domain, then make it operate on another domain
NLP Tasks
Part of speech tagging
- How many tags? It depends, but at least 10
Open a dictionary and look up swimmer: it is a noun, so assume it is a noun in this case
Run: can be a noun or a verb, so part-of-speech disambiguation is needed
Final, race: also ambiguous
Even rule-based systems run into ambiguities
Need to use probabilities
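
A minimal sketch of using probabilities for ambiguous words: look the word up, turn tag counts into probabilities, and pick the most likely tag. The counts below are invented for illustration (not real corpus statistics), and this baseline ignores context, which a real tagger would also use.

```python
# Hypothetical tag counts, as if collected from a tagged corpus.
TAG_COUNTS = {
    "swimmer": {"NOUN": 12},
    "run": {"VERB": 70, "NOUN": 30},
    "race": {"NOUN": 55, "VERB": 45},
    "final": {"ADJ": 60, "NOUN": 40},
}

def tag_probabilities(word):
    """Relative frequency of each tag for a word; empty dict if unknown."""
    counts = TAG_COUNTS.get(word.lower(), {})
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}

def most_likely_tag(word, default="NOUN"):
    """Pick the highest-probability tag; fall back to a default for unknown words."""
    probs = tag_probabilities(word)
    return max(probs, key=probs.get) if probs else default

for w in ["swimmer", "run", "race"]:
    print(w, most_likely_tag(w), tag_probabilities(w))
```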
Constituent vs dependency parsing
Constituent parsing
- Context-free grammar or phrase-structure grammar
- Left-hand side: nonterminal symbols
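
A small sketch of how a context-free grammar can be written down: each rule has one nonterminal on the left and a sequence of symbols on the right. The toy grammar and the helper names terminals and generate are assumptions for illustration only.

```python
# Toy grammar (hypothetical): keys are left-hand-side nonterminals,
# values are lists of right-hand-side symbol sequences.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["swimmer"], ["race"]],
    "V": [["won"]],
}

def terminals(grammar):
    """Symbols that appear on a right-hand side but never on a left-hand side."""
    rhs_symbols = {sym for rules in grammar.values() for rhs in rules for sym in rhs}
    return rhs_symbols - set(grammar)

def generate(grammar, symbol="S"):
    """Expand a symbol, always using the first rule of each nonterminal."""
    if symbol not in grammar:        # terminal: an actual word
        return [symbol]
    words = []
    for sym in grammar[symbol][0]:
        words.extend(generate(grammar, sym))
    return words

print(sorted(terminals(GRAMMAR)))
print(" ".join(generate(GRAMMAR)))
```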