Page 254 - Programming Microcontrollers in C
P. 254
Data Compression 239
If you look at Figure 5-1, you will note that the letters e, m, and
t require only 2 bits each to determine the value. These letters were
selected to be represented by 2-bit values because they are the most
frequently occurring letters in the message. The remainder of the
letters require more bits but they do not occur so often in the message.
e m t
l
o c
Figure 5-1: Huffman Code Tree
The code must be designed specifically for the messages to be
represented. First, all of the characters in all of the messages are
collected into a histogram of the occurrence of each character.
Remember to include spaces and other such characters in the list.
Once you have determined the number of different characters in the
list, you can design a tree that will compress the code. If the list
contains n different characters, there will be 2n – 1 nodes in the tree.
Each level in the tree, starting from the root node at the zeroth level,
can contain 2 nodes or leafs. This tree cannot be balanced like a
k
binary sort tree because we want some of the leafs to be determined
by as few as 2 bits, and we do not care about the number of bits
needed to determine a character that occurs infrequently. In Figure
5-1, levels 0 and 1 contain only nodes with no leaves. Level 0 contains
three leaves and one node. It is through this single node that all of the
remainder of the characters must be found.
A series of avionics-related items were put together and an analysis
of the frequency of characters in these messages was completed. A
Huffman code that will encode these data effectively was created. This