COMPSCI 61B Lecture Notes - Lecture 38: Trie, Coding Theory, Glossary Of Ancient Roman Religion
Document Summary
In a lossless algorithm, we require that no information is lost. By default, english text is represented by sequences of characters, each 8 bits long. Easy way to compress: use fewer than 8 bits per letter. Decide which bit sequencies go with which letters. More generally, which codewords go with which symbols. A pre x-free code is one in which no codeword is a pre x of another. Used to nd the ideal pre x-free code for a set of words. Count relative frequencies of all characters in a text. Split into left and right halves of roughly equal frequency. Left half gets a leading zero; right half gets a leading one. * assign each symbol to a node with weight = relative frequency. * take the two smallest nodes and merge them into a super node with weight equal to sum of weights. * repeat until everything is part of a tree.