Page 255 - Programming Microcontrollers in C
P. 255

240    Chapter 5  Programming Large 8-Bit Systems

                          code is shown in Table 5-1. Frequently occurring letters like e, i, or a
                          are encoded with only two or three bits. On the other hand, letters that
                          occur very infrequently—like c or v—require 9 bits to encode. Even
                          though some letters require more than 8 bits to encode, the average
                          number of bits per character for this particular message set is only 4.032.
                              Decoding a Huffman code with a program is relatively easy. The
                          program must contain the Huffman tree with all of its nodes and leaves.
                          If we want the entire alphabet and a space, a period, a comma and an
                          end of message, there will be 59 entries in the tree. This or an equivalent
                          tree is an overhead that must be carried for every Huffman program.
                          Assume that the statistical analysis of the messages along with the
                          construction of the Huffman tree is complete. The approach taken to
                          build this tree in memory is to allow one byte for each node. The tree
                          is searched from the root node, which is the lowest address. The message
                          to be decoded is examined. If the bit in the message is a zero, the node
                          pointer is incremented by one and the next bit is examined. If the bit in
                          the message is a one, the node pointer is incremented by the content of
                          the node. All printable characters are entered into the tree as negative
                          numbers. The contents of the node are examined. If the content of the
                          node is positive, the next bit from the message is examined and so
                          forth. Whenever the content of a node is negative, its most significant
                          bit is removed, and the character is sent to the output. At that time, the
                          node pointer is returned to the root node, and the sequence is repeated
                          until an end of message is found.

                              Character     Code         Character  Code
                              E             00           D           110000
                              I             01           ‘ ‘         110001
                              A             100          G           111100
                              ‘\n’          101          U           111101
                              N             11001        L           1111100
                              R             11010        O           1111101
                              T             11011        F           11111100
                              M             11100        H           11111101
                              S             11101        P           11111110
                                                         C           111111110
                                                         V           111111111

                              Table 5-1: Huffman Code
   250   251   252   253   254   255   256   257   258   259   260