Huffman codes are self-synchronizing. A long string of characters
can be represented as a series of bits, and the decoding program can
find each character in the sequence without any special flags to designate
the end of each character. Messages stored as ASCII characters require
at least 7 bits per character, and because of the nature of storage in a
computer, 8 bits are usually assigned to each character. With a Huffman
code, it is possible to reduce the average number of bits per character
for the whole English language to about 4.5. With a limited set of
messages, even fewer bits are sometimes needed.
The programmer does all of the Huffman encoding, and the computer
must decode the messages for display. Huffman decoding requires extensive
bit manipulation, so C is an ideal language for decoding the messages.
A Huffman decoding scheme can be represented by a strictly binary
tree. A binary tree is composed of a collection of nodes. Each node
can have two descendants, often called the right and left descendants
(thus the term binary tree). A terminating node in a tree is called a
leaf. In a strictly binary tree, every nonleaf node has both a right
and a left descendant. If a strictly binary tree has n leaves, it has
a total of 2n – 1 nodes.
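Such a node might be represented in C along the lines of the sketch
below; the structure and field names here are illustrative assumptions,
not a listing from this chapter.

/* One node of a strictly binary Huffman tree.  A nonleaf node has
   both descendants present; a leaf has both pointers NULL and holds
   the decoded character. */
struct node {
    struct node *left;     /* followed on a 0 bit       */
    struct node *right;    /* followed on a 1 bit       */
    char ch;               /* meaningful only in a leaf */
};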
A Huffman code is decoded in the following manner. The sequence
always starts at the root node. The code sequence to be decoded is
examined a bit at a time. If the bit is a 0, move to the left
descendant; if the bit is a 1, move to the right descendant. The tree
is traversed until a leaf is found that contains the desired character.
The sequence is then restarted at the root node. If the tree is not
known, the sequence of bits is bewildering.
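This traversal can be coded compactly in C. The routine below is a
minimal sketch, not a listing from this chapter; it assumes the
struct node type shown above and a code stream packed into bytes with
the most significant bit first.

#include <stddef.h>

/* Decode nbits bits from the packed array bits[], writing each
   character found at a leaf to out.  The walk starts at the root,
   goes left on a 0 and right on a 1, and restarts at the root after
   every leaf, exactly as described above. */
void huff_decode(const struct node *root, const unsigned char *bits,
                 int nbits, char *out)
{
    const struct node *p = root;
    int i;

    for (i = 0; i < nbits; i++) {
        int bit = (bits[i / 8] >> (7 - i % 8)) & 1;  /* MSB first */
        p = bit ? p->right : p->left;
        if (p->left == NULL && p->right == NULL) {   /* reached a leaf */
            *out++ = p->ch;
            p = root;                                /* restart at the root */
        }
    }
    *out = '\0';
}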
Usually the ones and zeros occur with about equal likelihood, so no
simple statistical analysis can be applied to decode the sequence
easily. A disadvantage is that if a single bit in the sequence is lost,
the remainder of the code stream is undecipherable. Of course, the
advantage of the Huffman code is that it takes somewhere between
one-quarter and one-half as many bits to represent a message as the
ASCII code requires. A Huffman code tree is shown in Figure
5-1. The bit sequence 111111100101111010100000 as decoded by
this tree is seen to spell the word “committee.” This sequence requires
23 bits, while the ASCII code needed to represent “committee” is 72
bits long. Of course, this tree was designed specifically for the word
“committee,” but try to encode other words that can be made from these
same letters, such as “tic,” “me,” “come,” “tom,” “mite,” and so on,
and you will find that every word that can be derived from this tree
requires fewer bits than the corresponding ASCII sequence would need.
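The decoder sketched earlier can be exercised on a tree of this shape.
The bit assignments used below (e = 00, t = 01, c = 100, o = 101,
m = 110, i = 111) are assumed for illustration and may not match the
labeling in Figure 5-1, but any tree of this shape spends 2 bits on
each “e” and “t” and 3 bits on each of the other letters, which gives
the 23-bit total quoted above.

#include <stdio.h>

/* Uses the struct node and huff_decode() sketched above.  The tree
   is built statically: e and t sit at depth 2, and c, o, m, and i
   sit at depth 3. */
int main(void)
{
    struct node e = {0, 0, 'e'}, t = {0, 0, 't'};
    struct node c = {0, 0, 'c'}, o = {0, 0, 'o'};
    struct node m = {0, 0, 'm'}, i = {0, 0, 'i'};
    struct node et = {&e, &t, 0};      /* codes 00 and 01   */
    struct node co = {&c, &o, 0};      /* codes 100 and 101 */
    struct node mi = {&m, &i, 0};      /* codes 110 and 111 */
    struct node comi = {&co, &mi, 0};
    struct node root = {&et, &comi, 0};

    /* "committee" = 100 101 110 110 111 01 01 00 00, 23 bits
       packed most significant bit first into three bytes */
    unsigned char bits[3] = { 0x97, 0x6E, 0xA0 };
    char out[16];

    huff_decode(&root, bits, 23, out);
    printf("%s\n", out);               /* prints: committee */
    return 0;
}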