Page 371 - Programming Microcontrollers in C

P. 371

356 Chapter 7 Advanced Topics

while((c=getchar())!=’\n’)
a[i++]=c;
a[i]=’\n’;
encode(a,array,ARRAY_SIZE);
decode(array,s);
printf(“%s”,s);
}
Listing 7-4: Numeric Decode Test

Coding the alpha data

For encoding and decoding the name data to be stored in our
phone book, we will use a Huffman code. We saw the decoding of a
Huffman code in Chapter 5 and the decoding approach used here
will be almost the same as was used there. In the discussion in Chap
ter 5 there was no encoding and that feature must be added here. To
do justice to the encoding technique, it is necessary to try to build the
code to encode the type of text that a phone book represents. There is
no reason to suspect that a collection of English names will contain
the same character frequency as standard English text. It is necessary
to understand the frequency of occurrence of each letter in the text to
be encoded. With this understanding, you can write a code that as
signs few bits to frequently occurring letters and more bits to letters
that occur less frequently.
The program below reads in data, counts the occurrences of let
ters, both upper and lower case, in a document. The occurrence of
letters is sorted in order of decreasing occurrence. These data are
printed out. The program calculates the average theoretical entropy,
bits per character, of each character in the document and displays
this number. Also included in the calculations are the space, ‘ ’, char
acter and the new line, ‘\n’, character. These characters cause the
output to be distorted, so the character ‘>’ is used to indicate a space
character and a ‘<’ indicates a new line. These data will then be used
to create a Huffman code used to compress the data prior to storage
in the internal EEPROM.
This particular program, along with many variations, has been
an exercise used for years in classes. It does demonstrate some im
portant considerations. The shell sort was used in Chapters 2 and 5 to
sort data, and here we will use it again. In this case, the data to be

366 367 368 369 370 371 372 373 374 375 376