These are written in a 7x8 grid and their images binarised. From the binary images
the horizontal (H1 to H8) and vertical (V1 to V7) projections are obtained by
counting the dark pixels, as shown in Figure 5.14.
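As a concrete sketch, such projections can be computed by summing along the rows and columns of the binary image array. The following Python fragment is illustrative only: it assumes the character is stored as an 8×7 NumPy array of 0/1 values (8 rows giving H1 to H8, 7 columns giving V1 to V7), with 1 coding a dark pixel; the crude "U" shape is made up for the example.

```python
import numpy as np

def projections(img):
    """Compute the horizontal and vertical projections of a binary
    character image by counting dark pixels (value 1).

    img is assumed to be an 8x7 array: 8 rows -> H1..H8,
    7 columns -> V1..V7 (shape and encoding are illustrative).
    """
    h = img.sum(axis=1)   # H1..H8: dark-pixel count in each row
    v = img.sum(axis=0)   # V1..V7: dark-pixel count in each column
    return h, v

# Example: a crude binarised "U" on the 8x7 grid
u = np.array([
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
])
h, v = projections(u)
print("H:", h)  # [2 2 2 2 2 2 7 0]
print("V:", v)  # [7 1 1 1 1 1 7]
```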
Inspecting these projections for the "prototypes" U and V of Figure 5.14, the
following features seem worth trying (other choices are possible):
Using a separable set of U's and V's (set 1), the perceptron adjusts a linear discriminant until complete separation is achieved, as shown in Figure 5.15. The Perceptron program allows learning to be performed in a pattern-by-pattern fashion, so that the progress of the discriminant adjustment can be observed until convergence.
Using a non-separable set of U's and V's (set 2), the perceptron is unable to converge and oscillates near the border between the U and V clusters. Figure 5.16 shows one of the best solutions obtained.
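The book does not state how such a "best solution" is selected; one common heuristic is a pocket variant of the previous sketch, which lets the ordinary updates oscillate but stores the weight vector with the fewest training errors seen so far. The variant below is an illustrative assumption, not necessarily how Figure 5.16's solution was obtained.

```python
import numpy as np

def train_pocket(X, t, eta=0.1, max_epochs=1000):
    """Perceptron with a 'pocket': plain updates may oscillate on a
    non-separable set, but the weights with the fewest training
    errors seen so far are stored and returned."""
    Xa = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(Xa.shape[1])
    best_w, best_errors = w.copy(), len(Xa) + 1
    for _ in range(max_epochs):
        for x, target in zip(Xa, t):
            if (1 if w @ x >= 0 else -1) != target:
                w += eta * target * x            # plain perceptron update
        errors = int(np.sum(np.where(Xa @ w >= 0, 1, -1) != t))
        if errors < best_errors:                 # pocket the better weights
            best_w, best_errors = w.copy(), errors
        if best_errors == 0:                     # separable after all
            break
    return best_w
```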
The simple type of decision surface that one can achieve with the perceptron is, of course, one of its limitations. Many textbooks illustrate this issue with the classic XOR problem. This consists of separating the two-dimensional patterns shown in Figure 5.17, whose target values correspond to the logical exclusive-or (XOR) of the inputs x1 and x2, coding the logical variables as: 1 = True, 0 = False.
Figure 5.17. The classic XOR problem, often used to illustrate neural classifier
performance.
As the two classes of XOR patterns are not linearly separable, it is customary to say that this problem cannot be solved with a perceptron. However, we must not forget that we may use transformed features as inputs. For instance, we can use a quadratic transformation of the features. As seen in 2.1.1, we would then need to compute (d+2)(d+1)/2 = 6 new features and use a perceptron with 6 weights, one for each of the monomials of degree up to two: 1, x1, x2, x1x2, x1² and x2².
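A sketch of this idea on the XOR patterns (not the book's program; the learning rate, epoch limit and coding of True/False as +1/-1 are choices made here): after the quadratic mapping the four patterns become linearly separable, e.g. by the weight vector (-0.5, 1, 1, -2, 0, 0), so the perceptron rule is guaranteed to converge.

```python
import numpy as np

def quadratic_features(X):
    """Map (x1, x2) to the 6 monomials of degree <= 2:
    [1, x1, x2, x1*x2, x1**2, x2**2]."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1 * x2, x1**2, x2**2])

# XOR patterns and targets (True -> +1, False -> -1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([-1, 1, 1, -1])
Z = quadratic_features(X)

# Plain perceptron rule on the transformed features: 6 weights,
# where the constant monomial plays the role of the bias input.
w, eta = np.zeros(6), 0.5
for _ in range(1000):
    errors = 0
    for z, target in zip(Z, t):
        if (1 if w @ z >= 0 else -1) != target:
            w += eta * target * z
            errors += 1
    if errors == 0:                        # complete separation reached
        break

print(w)                                   # one separating weight vector
print(np.where(Z @ w >= 0, 1, -1))         # [-1  1  1 -1]: XOR solved
```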