Computer Science 4442A/B Lecture Notes - Lecture 23: Bigram, Probability Mass Function, Test Data


Document Summary

Smoothing increases P(unseen event) and correspondingly decreases P(seen event); for example, for P(w | denied the), probability mass is taken from the seen continuations and redistributed to the unseen ones. Overall, smoothing flattens spiky distributions so they generalize better. Add-one smoothing works acceptably only if sparsity is mild, i.e., there are not many missing n-grams; the lecture illustrates this with add-one smoothed bigram counts for "want to eat" with N = 10,000. Example allocation to unseen bigrams: with N = 22,000,000 training bigram tokens and V = 273,266 word types, the number of possible bigrams is B = V^2 = 74,674,306,756, of which roughly 74,671,100,000 are unseen. The add-one probability of an unseen bigram is 1 / (N + B), and the portion of probability mass given to unseen bigrams is (number of unseen bigrams) x P(unseen bigram), which in this example is nearly all of the mass (about 1). For bigrams that occur r times in training, the empirical (averaged) count on held-out data is Tr / Nr, and a good estimator's predicted count should be close to this. Counts on test data: a corpus of 44,000,000 bigram tokens is split into 22,000,000 for training and 22,000,000 for testing, and probabilities are obtained by dividing counts by 22,000,000. Under add-one smoothing, each bigram unseen in the training corpus is given a count of 0.000295.
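The arithmetic in the allocation example and the held-out comparison can be reproduced directly. Below is a minimal Python sketch, assuming the figures quoted above (N = 22,000,000, V = 273,266, about 74,671,100,000 unseen bigrams); the heldout_empirical_counts helper is a hypothetical illustration of the Tr / Nr idea, not code from the lecture.

from collections import Counter

# Add-one smoothing arithmetic from the allocation example above.
N = 22_000_000                      # training bigram tokens
V = 273_266                         # word types in the vocabulary
B = V ** 2                          # possible bigram types = 74,674,306,756
unseen_bigrams = 74_671_100_000     # unseen bigram types (figure quoted in the notes)

p_unseen = 1 / (N + B)                        # add-one probability of one unseen bigram
mass_to_unseen = unseen_bigrams * p_unseen    # total mass handed to unseen bigrams
count_unseen = N * p_unseen                   # add-one "count" of an unseen bigram

print(f"p(unseen bigram)        = {p_unseen:.3e}")
print(f"mass given to unseen    = {mass_to_unseen:.4f}")   # ~0.9997, nearly all the mass
print(f"add-one count of unseen = {count_unseen:.6f}")     # ~0.000295


def heldout_empirical_counts(train_bigrams, test_bigrams):
    """For each training frequency r, return Tr / Nr: the average number of
    times a bigram seen r times in training appears in the held-out (test)
    half of the corpus. A good smoothed count should be close to this.
    (Hypothetical helper illustrating the idea; not code from the lecture.)"""
    train_counts = Counter(train_bigrams)
    test_counts = Counter(test_bigrams)

    N_r = Counter()   # N_r: number of bigram types with training count r
    T_r = Counter()   # T_r: total test-data occurrences of those same types
    for bigram, r in train_counts.items():
        N_r[r] += 1
        T_r[r] += test_counts[bigram]
    return {r: T_r[r] / N_r[r] for r in sorted(N_r)}

Running the first part reproduces the numbers in the summary: almost all of the probability mass goes to unseen bigrams, and each unseen bigram ends up with an add-one count of about 0.000295 on a 22,000,000-token corpus.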
