Class Notes (807,517)
Canada (492,691)
CMPT 150 (6)


4 Pages
Unlock Document

Simon Fraser University
Computing Science
CMPT 150
Anthony Dixon

A.H.Dixon CMPT 150: Week 1 (Sept 4 - 9) 1 1 ALPHABETS AND ENCODINGS An alphabet is a ▯nite set of distinct symbols often called \characters". Some, but not necessarily all, sequences of characters are \meaningful". That is, they have been selected to represent an idea or concept. The assignment of meanings to a subset of sentences de▯ned on an alphabet is called an interpretation. The 26 letters of the alphabet can be used to de▯ne sequences more commonly called \words". Some words are meaningful, others are just garbled sequences of letters. The 10 digits, \0" through \9" de▯ne sequences called the non-negative integers. In this case every sentence is meaningful since each de▯nes some integer. By adding the character \-" to the alphabet of digits, we can construct sentences that correspond to the full set of integers, both positive and negative. An encoding is the assignment of a unique sequence of symbols from one alphabet to represent each symbol of the second alphabet. When a symbol (or sequence of symbols) in one alphabet has been assigned a unique sequence of characters from the second alphabet, that sequence is called a \codeword" for the original symbol or sequence. A sequence of symbols in a given alphabet can be encoded in a second alphabet by con- catenating the codewords for each symbol of the given alphabet as expressed in the sec- ond alphabet. The resulting encoded sequence is called a \message". For example, if 8 is encoded as \1000" and 7 is encoded as \0111" then 87 is expressed by the message \10000111". There are two types of encodings: 1. Variable Length Encoding: Assigns codewords that may be sequences of di▯erent length. 2. Fixed Length Encoding: The codewords for all symbols have the same length. Variable length encodings may lead to shorter messages, but may not be uniquely deci- pherable; that is, attempting to interpret the message may lead to more than one possible interpretation. In such cases, the encoding is called \ambiguous". Fixed length encodings are always uniquely decipherable, but the messages may be longer. In the design of electrical circuits an encoding is de▯ned to represent an alphabet of symbols using di▯erent voltage levels. It is desirable to use an alphabet of only two symbols to maximize the voltage level interval that can be associated with each symbol in an encoding of characters by voltage levels. Although more symbols could be used, an alphabet of only two symbols simpli▯es the design of circuits and reduces the chance of error from voltage uctuations. The symbols of the binary alphabet f0, 1g are called \bits" (analgous to \digits" for the ten digit symbols) and a bit refers to one symbol (either 0 or 1) in a binary sequence. The length of a binary sequence is the number of bits in the sequence. 1.1 Common Fixed Length Encodings A binary alphabet does not lend itself to human interpretation readily, and so encodings must be de▯ned for \user-friendly" alphabets using the binary alphabet. Some of the ways A.H.Dixon CMPT 150: Week 1 (Sept 4 - 9) 2 of encoding larger alphabets using the binary alphabet are: 1. Natural binary encoding: Each unsigned integer is represented by its base-2 value. The arithmetic techniques for converting between base-2 and base-10 provide the tools for transforming sentences in one alphabet to sentences in the other. Techniques for converting from decimal to binary and from binary to decimal are described in the textbook by Mano and Kime (custom or 4th ed), pages 13 - 15. 2. Binary Coded Decimal (BCD): Rather than encode every integer into its base-2 equiv- alent, only the symbols \0" through \9" are represented by their corresponding 4-bit base-2 equivalents. The BCD representation of an unsigned integer is obtained by concatenating the corresponding binary codes of each digit together. For example: 9132 would be encoded as 1001 0001 0011 0010. (The spaces between certain bits are not part of the encoding, but are there to improve readability of the binary sequence.) 3. Gray Encoding of the decimal digits. It is not necessary that the codeword corre- spond to the base-2 representation of the 1-digit integer. Any unique assignment of codewords to the decimal digits de▯nes a possible encoding. For example, the follow- ing encoding of the decimal digits is called a \Gray Encoding" because each pair of consecutive codewords di▯ers in only one bit position: SYMBOL CODEWORD 0 0000 1 0001 2 0011 3 0010 4 0110 5 1110 6 1010 7 1011 8 1001 9 1000 4. Hexadecimal: In the BCD encoding, some bit sequences of length 4 do not correspond to any decimal digit. Therefore not every binary sequence can be interpreted as a valid BCD encoding. For example, 1100 0011 does not represent any number in BCD. By assigning unique symbols to those sequences not representing a decimal digit, it is possible to encode an alphabet of 16 symbols. Such an alphabet is called an hexadecimal alphabet. The symbols assigned to the unused codewords of the BCD encoding are as follows: 1010 is represented by A 1011 is represented by B 1100 is represented by C 1101 is represented by D 1110 is represented by E 1111 is represented by F The hexadecimal alphabet is most often used as a \short-hand" way of expressing long binary sequences. For any arbit
More Less

Related notes for CMPT 150

Log In


Don't have an account?

Join OneClass

Access over 10 million pages of study
documents for 1.3 million courses.

Sign up

Join to view


By registering, I agree to the Terms and Privacy Policies
Already have an account?
Just a few more details

So we can recommend you notes for your school.

Reset Password

Please enter below the email address you registered with and we will send you a link to reset your password.

Add your courses

Get notes from the top students in your class.