Page 254 - Programming Microcontrollers in C
P. 254

Data Compression      239

                              If you look at Figure 5-1, you will note that the letters e, m, and
                          t require only 2 bits each to determine the value. These letters were
                          selected to be represented by 2-bit values because they are the most
                          frequently occurring letters in the message. The remainder of the
                          letters require more bits but they do not occur so often in the message.












                                     e         m       t



                                                           l



                                                                  o              c


                              Figure 5-1: Huffman Code Tree
                              The code must be designed specifically for the messages to be
                          represented. First, all of the characters in all of the messages are
                          collected into a histogram of the occurrence of each character.
                          Remember to include spaces and other such characters in the list.
                          Once you have determined the number of different characters in the
                          list, you can design a tree that will compress the code. If the list
                          contains n different characters, there will be 2n – 1 nodes in the tree.
                          Each level in the tree, starting from the root node at the zeroth level,
                          can contain 2  nodes or leafs. This tree cannot be balanced like a
                                       k
                          binary sort tree because we want some of the leafs to be determined
                          by as few as 2 bits, and we do not care about the number of bits
                          needed to determine a character that occurs infrequently. In Figure
                          5-1, levels 0 and 1 contain only nodes with no leaves. Level 0 contains
                          three leaves and one node. It is through this single node that all of the
                          remainder of the characters must be found.
                              A series of avionics-related items were put together and an analysis
                          of the frequency of characters in these messages was completed. A
                          Huffman code that will encode these data effectively was created. This
   249   250   251   252   253   254   255   256   257   258   259