Page 371 - Programming Microcontrollers in C
P. 371

356    Chapter 7  Advanced Topics

                       while((c=getchar())!=’\n’)
                          a[i++]=c;
                       a[i]=’\n’;
                       encode(a,array,ARRAY_SIZE);
                       decode(array,s);
                       printf(“%s”,s);
                   }
                              Listing 7-4: Numeric Decode Test


            Coding the alpha data

                              For encoding and decoding the name data to be stored in our
                          phone book, we will use a Huffman code. We saw the decoding of a
                          Huffman code in Chapter 5 and the decoding approach used here
                          will be almost the same as was used there. In the discussion in Chap­
                          ter 5 there was no encoding and that feature must be added here. To
                          do justice to the encoding technique, it is necessary to try to build the
                          code to encode the type of text that a phone book represents. There is
                          no reason to suspect that a collection of English names will contain
                          the same character frequency as standard English text. It is necessary
                          to understand the frequency of occurrence of each letter in the text to
                          be encoded. With this understanding, you can write a code that as­
                          signs few bits to frequently occurring letters and more bits to letters
                          that occur less frequently.
                              The program below reads in data, counts the occurrences of let­
                          ters, both upper and lower case, in a document. The occurrence of
                          letters is sorted in order of decreasing occurrence. These data are
                          printed out. The program calculates the average theoretical entropy,
                          bits per character, of each character in the document and displays
                          this number. Also included in the calculations are the space, ‘ ’, char­
                          acter and the new line, ‘\n’, character. These characters cause the
                          output to be distorted, so the character ‘>’ is used to indicate a space
                          character and a ‘<’ indicates a new line. These data will then be used
                          to create a Huffman code used to compress the data prior to storage
                          in the internal EEPROM.
                              This particular program, along with many variations, has been
                          an exercise used for years in classes. It does demonstrate some im­
                          portant considerations. The shell sort was used in Chapters 2 and 5 to
                          sort data, and here we will use it again. In this case, the data to be
   366   367   368   369   370   371   372   373   374   375   376