Page 376 - Programming Microcontrollers in C
P. 376
Coding the Alpha Data 361
Char Frequency Char Frequency
e 9.2042 d 2.7331
< 7.7572 k 2.3312
> 7.5563 b 2.2106
a 7.5161 y 1.9695
r 7.3151 p 1.8891
n 6.3505 u 1.7685
o 5.6270 g 1.7283
i 5.3859 w 1.2862
l 5.0241 f 1.2460
s 4.5418 j 1.2058
t 4.0595 v 0.8039
c 3.8585 x 0.1206
h 3.2958 z 0.1206
m 3.0547 q 0.0402
There are 2488 characters
The theoretical average bits per character is
4.392597
Output 7-2: Calculation of Entropy for Phone Book
Next, a Huffman code will be created to encode the data from the
phone book. A Huffman code is built into a complete binary tree.
Such a tree always has two descendents from every node unless the
node is a leaf node. As such, whenever a Huffman tree is created to
encode n characters, there will be 2n–1 nodes in the tree. Figure 7.1
shows an instance of such a tree. This tree encodes the data shown in
Output 2 above. As with most trees, analysis, or encoding, starts at
the root node at the top of the page. Whenever you traverse to the
left, a code value of zero is recorded. When traversing to the right, a
code value of 1 is recorded. For example, the character R will be
encoded as 0010, and the character M will be 111100. This table is
constructed and filled to keep the most frequently occurring letters
at the top of the tree and the least frequently occurring letters at the
bottom. Therefore, the number of bits for each character is inversely
proportional to its frequency of occurrence. This choice for the letter
codes requires less than the number of bits one would expect when
using the standard 8 bits per character.