Page 253 - Programming Microcontrollers in C

238    Chapter 5  Programming Large 8-Bit Systems

                              Huffman codes are self-synchronizing. A long string of characters
                          can be represented as a series of bits, and the decoding program can
                          find each character in the sequence without any special flags to designate
                          the end of each character. Messages stored as ASCII characters require
                          at least 7 bits per character, and because of the nature of storage in a
                          computer, 8 bits are usually assigned to each character. With a Huffman
                          code, it is possible to reduce the average number of bits per character
                          for the whole English language to about 4.5 bits per character.
                          Sometimes even fewer bits are needed with limited messages.
                              The programmer does all of the Huffman encoding and the computer
                          must decode the messages for display. Huffman decoding requires extensive
                          bit manipulation, so C is an ideal language to decode the messages.
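To make that concrete, here is one way such bit manipulation might look in C. This is a minimal sketch, assuming the encoded message is packed into a byte array most-significant bit first; the name `get_bit` is illustrative, not taken from the book's listings.

```c
#include <stdint.h>

/* Return bit number i of a packed message, where bit 0 is the
   most significant bit of the first byte.  Each byte holds 8
   code bits, so byte i/8 is selected and shifted so that the
   wanted bit lands in the least significant position.          */
static int get_bit(const uint8_t *msg, unsigned i)
{
    return (msg[i / 8] >> (7 - i % 8)) & 1;
}
```

A decoder would call `get_bit` with an increasing index to feed the tree traversal one bit at a time.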
                               A Huffman decoding scheme can be represented by a strictly binary
                           tree. A binary tree is a collection of nodes; each nonleaf node has
                           two descendants, often called the right and left descendants (thus the
                           term binary tree). A terminating node, one with no descendants, is
                           called a leaf. In a strictly binary tree, every nonleaf node has both a
                           right and a left descendant. If a strictly binary tree has n leaves, it
                           has a total of 2n – 1 nodes.
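Such a tree maps naturally onto a C structure. The layout below is a sketch with illustrative field names, not the book's own listing; `count_nodes` simply counts every node, which for a strictly binary tree with n leaves comes out to 2n – 1.

```c
#include <stddef.h>

/* One node of a strictly binary decoding tree (illustrative). */
struct node {
    char symbol;              /* meaningful only at a leaf */
    struct node *left;        /* branch taken on a 0 bit   */
    struct node *right;       /* branch taken on a 1 bit   */
};

/* Count every node in the tree; a strictly binary tree with
   n leaves yields 2n - 1.                                   */
static int count_nodes(const struct node *n)
{
    if (n == NULL)
        return 0;
    return 1 + count_nodes(n->left) + count_nodes(n->right);
}
```

A node is recognized as a leaf when both of its descendant pointers are NULL.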
                               The Huffman code is decoded in this manner. The sequence always
                           starts at the root node, and the code sequence to be decoded is
                           examined a bit at a time. If the bit is a 0, move to the left descendant;
                           if the bit is a 1, move to the right descendant. The tree is traversed
                           until a leaf is reached; that leaf contains the decoded character. The
                           traversal then restarts at the root node. If the tree is not known, the
                           sequence of bits is bewildering.
                           Usually the ones and zeros in such a stream occur with about equal
                           likelihood, so statistical analysis cannot easily be applied to decode
                           the sequence. A disadvantage is that if a single bit in the sequence is
                           lost, the remainder of the code stream is undecipherable. Of course, the
                          advantage to the Huffman code is that it takes somewhere between
                          one-quarter and one-half the number of bits to represent a message as
                          is required by the ASCII code. A Huffman Code tree is shown in Figure
                          5-1. The bit sequence 111111100101111010100000 as decoded by
                          this tree is seen to spell the word “committee.” This sequence requires
                          23 bits, and the ASCII code needed to represent committee is 72 bits
                          long. Of course, this tree was designed specifically for the word
                          “committee,” but try to encode other words that can be made of these
                          same letters such as “tic,” “me,” “come,” “tom,” “mite,” etc., and you
                          will find that every word that can be derived from this tree requires
                          fewer bits than the corresponding ASCII sequence would need.
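The traversal described above can be sketched in C. Since Figure 5-1 is not reproduced here, the tree below is a hypothetical three-letter code (e = 0, m = 10, t = 11), not the book's "committee" tree, and for clarity the bits arrive as a string of '0' and '1' characters rather than packed bytes.

```c
#include <stddef.h>

/* Node layout as before; names are illustrative. */
struct hnode {
    char symbol;              /* meaningful only at a leaf */
    struct hnode *left;       /* taken on a 0 bit          */
    struct hnode *right;      /* taken on a 1 bit          */
};

/* Hypothetical code: e = 0, m = 10, t = 11 (not Figure 5-1). */
static struct hnode leaf_e = { 'e', NULL, NULL };
static struct hnode leaf_m = { 'm', NULL, NULL };
static struct hnode leaf_t = { 't', NULL, NULL };
static struct hnode inner  = { 0, &leaf_m, &leaf_t };
static struct hnode root   = { 0, &leaf_e, &inner };

/* Walk the tree one bit at a time; on reaching a leaf, emit its
   character and restart the walk at the root.                   */
static void decode(const struct hnode *top, const char *bits, char *out)
{
    const struct hnode *n = top;
    while (*bits != '\0') {
        n = (*bits++ == '0') ? n->left : n->right;
        if (n->left == NULL && n->right == NULL) {   /* leaf */
            *out++ = n->symbol;
            n = top;
        }
    }
    *out = '\0';
}
```

With this toy tree the string "100" decodes to "me" (10 = m, 0 = e) and "1100" decodes to "tee"; the same loop, fed bits unpacked from bytes, decodes any Huffman stream whose tree is known.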