operation by observing the output voltage of a microphone over a period of time. This problem reduces to discrimination of waveforms from good and bad machines. On the other hand, recognition of printed English characters corresponds to classification of geometric figures. In order to perform this type of classification, we must first measure the observable characteristics of the sample. The most primitive but assured way to extract all information contained in the sample is to measure the time-sampled values for a waveform, x(t_1), . . . , x(t_n), and the grey levels of pixels for a figure, x(1), . . . , x(n), as shown in Fig. 1-1. These n measurements form a vector X. Even under the normal machine condition, the observed waveforms differ each time the observation is made. Therefore, x(t_i) is a random variable and will be expressed, using boldface, as x(t_i). Likewise, X is called a random vector if its components are random variables, and is expressed as X. Similar arguments hold for characters: the observation x(i) varies from one A to another, and therefore x(i) is a random variable and X is a random vector.
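As a concrete illustration (not from the text), the following minimal Python sketch forms the n-dimensional vector X from time-sampled values of a waveform; the sinusoidal signal, sampling instants, and noise level are all invented for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    def observe_waveform(n=8, noise=0.1):
        # Sample the microphone output at n instants t_1, ..., t_n.
        # The fixed sinusoid stands in for the "normal machine" signal;
        # the additive noise models the observation-to-observation
        # variation that makes each x(t_i) a random variable.
        t = np.linspace(0.0, 1.0, n)
        signal = np.sin(2.0 * np.pi * 5.0 * t)
        return signal + noise * rng.standard_normal(n)  # X = (x(t_1), ..., x(t_n))

    X1 = observe_waveform()
    X2 = observe_waveform()
    print(X1)        # one realization of the random vector X
    print(X1 - X2)   # nonzero: repeated observations differ

Two calls produce different vectors even though the underlying machine condition is the same, which is exactly why X must be treated as a random vector.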
Thus, each waveform or character is expressed by a vector (or a sample) in an n-dimensional space, and many waveforms or characters form a distribution of X in the n-dimensional space. Figure 1-2 shows a simple two-dimensional example of two distributions corresponding to normal and abnormal machine conditions, where points depict the locations of samples and solid lines are the contour lines of the probability density functions. If we know these two distributions of X from past experience, we can set up a boundary between these two distributions, g(x_1, x_2) = 0, which divides the two-dimensional space into two regions. Once the boundary is selected, we can classify a sample without a class label as a normal or abnormal machine, depending on whether g(x_1, x_2) < 0 or g(x_1, x_2) > 0. We call g(x_1, x_2) a discriminant function, and a network which detects the sign of g(x_1, x_2) is called a pattern recognition network, a categorizer, or a classifier. Figure 1-3 shows a block diagram of a classifier in a general n-dimensional space. Thus, in order to design a classifier, we must study the characteristics of the distribution of X for each category and find a proper discriminant function. This process is called learning or training, and samples used to design a classifier are called learning or training samples. The discussion can be easily extended to multi-category cases.
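To make this design procedure concrete, here is a minimal sketch (not from the text; the class means, covariances, and sample sizes are invented) that "learns" a Gaussian density for each class from training samples and uses the log-likelihood ratio as the discriminant function g(x_1, x_2):

    import numpy as np
    from scipy.stats import multivariate_normal

    rng = np.random.default_rng(1)

    # Hypothetical training samples from the two machine conditions.
    normal_train = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=200)
    abnormal_train = rng.multivariate_normal([2.0, 2.0], np.eye(2), size=200)

    # "Learning": estimate each class density from its training samples.
    p_normal = multivariate_normal(normal_train.mean(axis=0), np.cov(normal_train.T))
    p_abnormal = multivariate_normal(abnormal_train.mean(axis=0), np.cov(abnormal_train.T))

    def g(x):
        # Discriminant function: g(x) < 0 -> normal, g(x) > 0 -> abnormal;
        # g(x) = 0 is the boundary between the two distributions.
        return p_abnormal.logpdf(x) - p_normal.logpdf(x)

    x_new = np.array([0.3, -0.2])                    # sample without a class label
    print("abnormal" if g(x_new) > 0 else "normal")  # the classifier detects the sign of g

With equal prior probabilities this sign test is the Bayes decision rule for the estimated densities; any other function with the same zero set would define the same boundary.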
Thus, pattern recognition, or decision-making in a broader sense, may be considered as a problem of estimating density functions in a high-dimensional space and dividing the space into the regions of categories or classes. Because