Page 376 - Programming Microcontrollers in C
P. 376

Coding the Alpha Data     361

                            Char         Frequency          Char         Frequency
                            e            9.2042             d            2.7331
                            <            7.7572             k            2.3312
                            >            7.5563             b            2.2106
                            a            7.5161             y            1.9695
                            r            7.3151             p            1.8891
                            n            6.3505             u            1.7685
                            o            5.6270             g            1.7283
                            i            5.3859             w            1.2862
                            l            5.0241             f            1.2460
                            s            4.5418             j            1.2058
                            t            4.0595             v            0.8039
                            c            3.8585             x            0.1206
                            h            3.2958             z            0.1206
                            m            3.0547             q            0.0402

                   There are 2488 characters
                   The theoretical average bits per character is
                   4.392597

                              Output 7-2: Calculation of Entropy for Phone Book
                              Next, a Huffman code will be created to encode the data from the
                          phone book. A Huffman code is built into a complete binary tree.
                          Such a tree always has two descendents from every node unless the
                          node is a leaf node. As such, whenever a Huffman tree is created to
                          encode n characters, there will be 2n–1 nodes in the tree. Figure 7.1
                          shows an instance of such a tree. This tree encodes the data shown in
                          Output 2 above. As with most trees, analysis, or encoding, starts at
                          the root node at the top of the page. Whenever you traverse to the
                          left, a code value of zero is recorded. When traversing to the right, a
                          code value of 1 is recorded. For example, the character R will be
                          encoded as 0010, and the character M will be 111100. This table is
                          constructed and filled to keep the most frequently occurring letters
                          at the top of the tree and the least frequently occurring letters at the
                          bottom. Therefore, the number of bits for each character is inversely
                          proportional to its frequency of occurrence. This choice for the letter
                          codes requires less than the number of bits one would expect when
                          using the standard 8 bits per character.
   371   372   373   374   375   376   377   378   379   380   381