Page 153 - Artificial Intelligence in the Age of Neural Networks and Brain Computing

142    CHAPTER 7 Pitfalls and Opportunities in the Development of AI Systems

                         FIGURE 7.3
                         Plot showing the heights and weights of random samples of 100 Asian elephants [3] and
                         100 humans from the United States [4], flanked by a true pachydermal posterior
                         (Wisconsinart | Dreamstime) and the remarkably similar one of an author. The plot also
                         shows the heights and weights of basketball player Shaquille O’Neal and the heaviest man
                         ever recorded (Guinness World Records). The dotted line is a possible decision surface in
                         feature space separating the two classes. Only extreme obesity (X) causes any overlap
                         between classes.


                         “probability”: the decision variable can be any strictly monotonic function of that
                         probability.
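The claim that only the ordering of the decision variable matters can be checked with a short Python sketch. The probabilities below are invented for illustration. Log-odds and cubing are arbitrary examples of strictly increasing transforms. A strictly decreasing transform would also work, but the inequality would have to be reversed.

```python
import math

def logit(p: float) -> float:
    """Log-odds: a strictly increasing function of p on (0, 1)."""
    return math.log(p / (1.0 - p))

def decide(score: float, threshold: float) -> bool:
    """Assign the case to the class when its score reaches the threshold."""
    return score >= threshold

# Hypothetical posterior probabilities for a handful of cases (illustrative only).
probs = [0.05, 0.30, 0.49, 0.50, 0.51, 0.80, 0.99]

# Thresholding the probability itself at 0.5 ...
by_prob = [decide(p, 0.5) for p in probs]
# ... gives the same decisions as thresholding any strictly increasing
# transform of it at the correspondingly transformed threshold.
by_logit = [decide(logit(p), logit(0.5)) for p in probs]  # logit(0.5) == 0
by_cube = [decide(p ** 3, 0.5 ** 3) for p in probs]

print(by_prob == by_logit == by_cube)  # True
```

Because the transform preserves order, every case lands on the same side of the decision surface whichever variable is used.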
                            If the locations of the respective cases overlapped in feature space, as they
                         do for men and women, for example, much greater ambiguity would arise (Fig. 7.4).
                         Say the task is to determine which individuals are women. Given an infinite amount
                         of training data, the requisite probability would be easy to determine: at height H
                         and weight W (or within some infinitesimal region surrounding that point), how many
                         women are there relative to the total population in that volume (here, area) of
                         feature space? With finite datasets the calculation is more ambiguous. Fig. 7.4
                         shows probability curves for our limited dataset under several different, reasonable
                         statistical models. This illustrates the difficulty of training Hal on real-world
                         problems with limited data. For each model, the decision surface is shown assuming
                         that we require at least a 50% chance that an individual is a woman before we
                         classify that individual as a woman.
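The counting estimate described above, and the reason it becomes ambiguous with little data, can be sketched in Python. This is a minimal illustration under stated assumptions:

* The heights and weights are synthetic draws from Gaussian distributions with made-up parameters. They are not the data behind Fig. 7.4.
* Height and weight are drawn independently, which is a simplification.
* The "infinitesimal region" is approximated by a finite box whose half-widths `dh` and `dw` are hypothetical tuning choices.
* All helper names are invented for this sketch.

Varying the box size stands in for choosing among different reasonable models.

```python
import random

def local_fraction(samples, h, w, dh, dw):
    """Estimate P(woman | height h, weight w) as the fraction of women among the
    training cases inside the box |height - h| <= dh, |weight - w| <= dw.
    The box is a finite stand-in for an infinitesimal region around (h, w).
    Returns None when no training case falls inside the box."""
    inside = [is_woman for height, weight, is_woman in samples
              if abs(height - h) <= dh and abs(weight - w) <= dw]
    if not inside:
        return None
    return sum(inside) / len(inside)

def decide_woman(p, threshold=0.5):
    """Call the individual a woman only if the estimated probability is >= 50%."""
    return p is not None and p >= threshold

def synthetic_population(n_per_class, rng):
    """Hypothetical (height cm, weight kg, is_woman) tuples. The Gaussian
    parameters are illustrative guesses, not the data plotted in Fig. 7.4,
    and height and weight are drawn independently for simplicity."""
    data = []
    for _ in range(n_per_class):
        data.append((rng.gauss(176, 7), rng.gauss(85, 13), False))  # men
        data.append((rng.gauss(162, 7), rng.gauss(74, 14), True))   # women
    return data

rng = random.Random(0)
small = synthetic_population(100, rng)      # limited data, as in the text
large = synthetic_population(100_000, rng)  # a rough proxy for "infinite" data

h, w = 168.0, 78.0  # query point in feature space
for dh, dw in [(2, 3), (5, 8), (10, 15)]:
    p_small = local_fraction(small, h, w, dh, dw)
    p_large = local_fraction(large, h, w, dh, dw)
    shown = "empty" if p_small is None else f"{p_small:.2f}"
    print(f"box +/-{dh} cm, +/-{dw} kg: "
          f"n=200 -> p={shown}, woman={decide_woman(p_small)}; "
          f"n=200000 -> p={p_large:.2f}, woman={decide_woman(p_large)}")
```

With only 100 cases per class, the estimate at a given point tends to shift with the window size and may even be undefined when the window is empty. With many more cases, a small window gives a much steadier value. That sensitivity is one concrete face of the finite-data ambiguity described in the text.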



                         2. AI DEVELOPMENT
                         2.1 OUR DATA ARE CRAP
                         Our first duty is to validate and characterize our data. The sage aphorism “garbage
                         in, garbage out” applies to the CI development process: our algorithm will be
                         no better than the data used in its development. This is especially important because