Page 171 - ARM 64 Bit Assembly Language
6.2 Word frequency counts
Counting the frequency of words in written text has several uses. In digital forensics, it can
be used to provide evidence as to the author of written communications. Different people
have different vocabularies, and use words with differing frequency. Word counts can
also be used to classify documents by type. Scientific articles from different fields contain
words specific to the field, and historical novels will differ from western novels in word
frequency.
Listing 6.4 shows the main function for a simple C program which reads a text file and creates
a list of all of the words contained in the file, along with their frequency of occurrence.
The program has been divided into two parts: the main program, and an abstract data type (ADT).
The ADT is used to keep track of the words and their frequencies, and to print a table of word
frequencies.
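Before reading the listing, it may help to see the counting idea in isolation. The following is only a minimal sketch, not the ADT used by the program: the interface declared in list.h is not shown in this part of the text, so the names wordtable, table_increment, table_count, and count_words, as well as the limits MAX_WORDS and MAX_LEN, are hypothetical. The sketch assumes a small fixed-size table with a linear search, and it does not remove punctuation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORDS 64   /* assumed capacity, for illustration only */
#define MAX_LEN   32   /* assumed maximum word length, including the NUL */

/* Hypothetical stand-in for the word list ADT: a fixed-size table. */
struct wordtable {
    char words[MAX_WORDS][MAX_LEN];
    int  counts[MAX_WORDS];
    int  nwords;
};

/* Record one occurrence of word. Returns the new count for that word,
   or -1 if the table is full. Words longer than MAX_LEN-1 characters
   are truncated, so only the first MAX_LEN-1 characters are compared. */
int table_increment(struct wordtable *t, const char *word)
{
    assert(t != NULL && word != NULL);
    for (int i = 0; i < t->nwords; i++) {
        if (strncmp(t->words[i], word, MAX_LEN - 1) == 0)
            return ++t->counts[i];
    }
    if (t->nwords == MAX_WORDS)
        return -1;
    strncpy(t->words[t->nwords], word, MAX_LEN - 1);
    t->words[t->nwords][MAX_LEN - 1] = '\0';
    t->counts[t->nwords] = 1;
    t->nwords++;
    return 1;
}

/* Look up the count for a word; 0 means the word was never seen. */
int table_count(const struct wordtable *t, const char *word)
{
    assert(t != NULL && word != NULL);
    for (int i = 0; i < t->nwords; i++) {
        if (strncmp(t->words[i], word, MAX_LEN - 1) == 0)
            return t->counts[i];
    }
    return 0;
}

/* Split text on whitespace, convert each word to lower case,
   and count it in the table. */
void count_words(struct wordtable *t, const char *text)
{
    char word[MAX_LEN];
    int len = 0;
    for (;; text++) {
        unsigned char c = (unsigned char)*text;
        if (c != '\0' && !isspace(c)) {
            if (len < MAX_LEN - 1)
                word[len++] = (char)tolower(c);
        } else {
            if (len > 0) {            /* end of a word */
                word[len] = '\0';
                table_increment(t, word);
                len = 0;
            }
            if (c == '\0')
                break;
        }
    }
}
```

Calling count_words on the text "The cat saw the other cat" with a zero-initialized table leaves four entries in the table, with "the" and "cat" each counted twice. A linear search is adequate for a sketch like this, but its cost grows with the number of distinct words, which is one reason to hide the storage behind an ADT whose implementation can be changed later.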
Listing 6.4 C program to compute word frequencies.
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>
#include <list.h>
/***********************************************************/
/* remove_punctuation copies the input string to a new     */
/* string, but omits any punctuation characters. The       */
/* remaining characters are converted to lower case. The   */
/* new string is allocated with malloc.                    */
char *remove_punctuation(char *word)
{
  char *newword = (char *)malloc(strlen(word) + 1);
  char *curdst = newword;
  char *cursrc = word;
  while (*cursrc != 0)
    {
      if (strchr(",.\"!$();:{}\\[]", *cursrc) == NULL)
        { /* Current character is not punctuation */
          /* The cast keeps tolower well defined when char is signed */
          *curdst = tolower((unsigned char)*cursrc);
          curdst++;
        }
      cursrc++;
    }
  *curdst = 0;
  return newword;
}

/***********************************************************/
/* The main function reads whitespace separated words      */
/* from stdin, removes punctuation, and generates a word   */
/* frequency list.                                         */
int main()