A.H.Dixon CMPT 150: Week 1 (Sept 4 - 9) 1
1 ALPHABETS AND ENCODINGS
An alphabet is a ▯nite set of distinct symbols often called \characters".
Some, but not necessarily all, sequences of characters are \meaningful". That is, they have
been selected to represent an idea or concept. The assignment of meanings to a subset of
sentences de▯ned on an alphabet is called an interpretation.
The 26 letters of the alphabet can be used to de▯ne sequences more commonly called
\words". Some words are meaningful, others are just garbled sequences of letters.
The 10 digits, \0" through \9" de▯ne sequences called the non-negative integers. In this
case every sentence is meaningful since each de▯nes some integer.
By adding the character \-" to the alphabet of digits, we can construct sentences that
correspond to the full set of integers, both positive and negative.
An encoding is the assignment of a unique sequence of symbols from one alphabet to
represent each symbol of the second alphabet. When a symbol (or sequence of symbols) in
one alphabet has been assigned a unique sequence of characters from the second alphabet,
that sequence is called a \codeword" for the original symbol or sequence.
A sequence of symbols in a given alphabet can be encoded in a second alphabet by con-
catenating the codewords for each symbol of the given alphabet as expressed in the sec-
ond alphabet. The resulting encoded sequence is called a \message". For example, if 8
is encoded as \1000" and 7 is encoded as \0111" then 87 is expressed by the message
There are two types of encodings:
1. Variable Length Encoding: Assigns codewords that may be sequences of di▯erent
2. Fixed Length Encoding: The codewords for all symbols have the same length.
Variable length encodings may lead to shorter messages, but may not be uniquely deci-
pherable; that is, attempting to interpret the message may lead to more than one possible
interpretation. In such cases, the encoding is called \ambiguous".
Fixed length encodings are always uniquely decipherable, but the messages may be longer.
In the design of electrical circuits an encoding is de▯ned to represent an alphabet of symbols
using di▯erent voltage levels. It is desirable to use an alphabet of only two symbols to
maximize the voltage level interval that can be associated with each symbol in an encoding
of characters by voltage levels. Although more symbols could be used, an alphabet of only
two symbols simpli▯es the design of circuits and reduces the chance of error from voltage
The symbols of the binary alphabet f0, 1g are called \bits" (analgous to \digits" for the
ten digit symbols) and a bit refers to one symbol (either 0 or 1) in a binary sequence. The
length of a binary sequence is the number of bits in the sequence.
1.1 Common Fixed Length Encodings
A binary alphabet does not lend itself to human interpretation readily, and so encodings
must be de▯ned for \user-friendly" alphabets using the binary alphabet. Some of the ways A.H.Dixon CMPT 150: Week 1 (Sept 4 - 9) 2
of encoding larger alphabets using the binary alphabet are:
1. Natural binary encoding: Each unsigned integer is represented by its base-2 value. The
arithmetic techniques for converting between base-2 and base-10 provide the tools for
transforming sentences in one alphabet to sentences in the other. Techniques for
converting from decimal to binary and from binary to decimal are described in the
textbook by Mano and Kime (custom or 4th ed), pages 13 - 15.
2. Binary Coded Decimal (BCD): Rather than encode every integer into its base-2 equiv-
alent, only the symbols \0" through \9" are represented by their corresponding 4-bit
base-2 equivalents. The BCD representation of an unsigned integer is obtained by
concatenating the corresponding binary codes of each digit together.
For example: 9132 would be encoded as 1001 0001 0011 0010. (The spaces between
certain bits are not part of the encoding, but are there to improve readability of the
3. Gray Encoding of the decimal digits. It is not necessary that the codeword corre-
spond to the base-2 representation of the 1-digit integer. Any unique assignment of
codewords to the decimal digits de▯nes a possible encoding. For example, the follow-
ing encoding of the decimal digits is called a \Gray Encoding" because each pair of
consecutive codewords di▯ers in only one bit position:
4. Hexadecimal: In the BCD encoding, some bit sequences of length 4 do not correspond
to any decimal digit. Therefore not every binary sequence can be interpreted as a
valid BCD encoding. For example, 1100 0011 does not represent any number in
BCD. By assigning unique symbols to those sequences not representing a decimal
digit, it is possible to encode an alphabet of 16 symbols. Such an alphabet is called
an hexadecimal alphabet. The symbols assigned to the unused codewords of the BCD
encoding are as follows:
1010 is represented by A
1011 is represented by B
1100 is represented by C
1101 is represented by D
1110 is represented by E
1111 is represented by F
The hexadecimal alphabet is most often used as a \short-hand" way of expressing
long binary sequences. For any arbit