Page 153 - Artificial Intelligence in the Age of Neural Networks and Brain Computing
142 CHAPTER 7 Pitfalls and Opportunities in the Development of AI Systems
FIGURE 7.3
Plot showing the heights and weights of random samples of 100 Asian elephants [3] and
100 humans from the United States [4], flanked by a true pachydermal posterior
(Wisconsinart | Dreamstime) and the remarkably similar one of an author. The plot also
shows the heights and weights of basketball player Shaquille O’Neal and the heaviest man
ever recorded (Guinness World Records). The dotted line is a possible decision surface in
feature space separating the two classes. Only extreme obesity (X) causes any overlap
between classes.
“probability”: the decision variable can be any strictly monotonic function of that
probability.
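The point that only the ordering of the decision variable matters can be checked with a minimal sketch. The probabilities below are hypothetical, and the logit and the cube are arbitrary choices of strictly increasing functions: thresholding the probability at p* gives exactly the same decisions as thresholding f(p) at f(p*).

```python
import math


def logit(p: float) -> float:
    """A strictly increasing map from (0, 1) onto the real line."""
    return math.log(p / (1.0 - p))


def decide(scores, threshold):
    """Assign the positive class wherever the decision variable meets the threshold."""
    return [s >= threshold for s in scores]


# Hypothetical class-membership probabilities for five cases.
probs = [0.05, 0.30, 0.50, 0.72, 0.99]
p_star = 0.5  # decision threshold on the probability scale

by_probability = decide(probs, p_star)
by_logit = decide([logit(p) for p in probs], logit(p_star))
by_cube = decide([p ** 3 for p in probs], p_star ** 3)

# A strictly increasing transform preserves the ranking of cases,
# so the decisions are identical once the threshold is mapped as well.
assert by_probability == by_logit == by_cube
print(by_probability)  # [False, False, True, True, True]
```

A strictly decreasing function works equally well if the inequality is reversed. What matters is that the transform never reorders two cases.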
If the locations of the respective classes were to overlap in feature space, as they do for
men and women, much greater ambiguity would arise (Fig. 7.4). Say the task is to
determine which individuals are women. Given an infinite amount of training data, the
requisite probability would be easy to determine: at height H and weight W (or within
some infinitesimal region surrounding that point), how many women are there relative
to the total population in that volume (here, area) of feature space? With finite datasets
the calculation is more ambiguous. Fig. 7.4 shows probability curves for our limited
dataset under a variety of reasonable statistical models. This illustrates the difficulty
of training Hal on real-world problems with limited data. For each model, the decision
surface is shown assuming that we require at least a 50% probability that an individual
is a woman before making that decision.
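The counting argument above can be sketched directly. This is a minimal illustration, not the authors' method: the box-shaped neighborhood, its half-widths `dh` and `dw`, the function names, and the toy (height, weight) data are all hypothetical. Two equally reasonable neighborhood sizes give different probability estimates, and hence different decisions at the 50% threshold, for the same query point.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (height in cm, weight in kg)


def local_fraction(points: Sequence[Point], is_woman: Sequence[bool],
                   h: float, w: float, dh: float, dw: float) -> Optional[float]:
    """Estimate P(woman | H=h, W=w) as the fraction of women among the
    training cases inside the box |height - h| <= dh, |weight - w| <= dw.
    Returns None when the box contains no cases (no estimate possible)."""
    inside: List[bool] = [label for (ph, pw), label in zip(points, is_woman)
                          if abs(ph - h) <= dh and abs(pw - w) <= dw]
    if not inside:
        return None
    return sum(inside) / len(inside)


def decide_woman(p: Optional[float], threshold: float = 0.5) -> Optional[bool]:
    """Call the case a woman if the estimated probability is at least 50%."""
    return None if p is None else p >= threshold


# Hypothetical toy training set (not data from the chapter).
points = [(160, 55), (165, 60), (168, 62), (170, 65), (172, 70), (178, 80), (185, 90)]
is_woman = [True, True, True, True, False, False, False]

query_h, query_w = 174, 72
narrow = local_fraction(points, is_woman, query_h, query_w, dh=3, dw=5)
wide = local_fraction(points, is_woman, query_h, query_w, dh=10, dw=15)
print(narrow, decide_woman(narrow))  # 0.0 False
print(wide, decide_woman(wide))      # 0.6 True
```

As the box shrinks toward an infinitesimal region, the estimate approaches the true conditional probability only if enough cases remain inside it. With a finite sample, the choice of neighborhood plays roughly the role that the choice of statistical model plays in Fig. 7.4.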
2. AI DEVELOPMENT
2.1 OUR DATA ARE CRAP
Our first duty is to validate and characterize our data. The sage aphorism “garbage
in, garbage out” applies to the CI development process. Our algorithm will be
no better than the data used in its development. This is especially important because