Given the task and the type of data available to us, we have just so much inherent discriminability, even for the ideal observer. If we achieve perfection on our training set, then either we are working on a task not worth doing, or we need to use a less complex algorithm or ease up on the throttle and halt training earlier.
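A minimal sketch of that "halt training earlier" idea, on invented stand-in data (not from this chapter), is to hold out part of the training set and stop when performance on it plateaus, for example with scikit-learn's built-in early stopping:

```python
# Sketch only: early stopping on a held-out validation split.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))
# Labels with deliberately limited discriminability: one informative feature plus noise.
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)

clf = MLPClassifier(
    hidden_layer_sizes=(64,),
    early_stopping=True,        # monitor a held-out validation split ...
    validation_fraction=0.2,    # ... of 20% of the training data
    n_iter_no_change=10,        # stop after 10 epochs without improvement
    max_iter=1000,
    random_state=0,
)
clf.fit(X, y)
print(f"stopped after {clf.n_iter_} epochs; "
      f"best validation accuracy {clf.best_validation_score_:.3f}")
```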
Our algorithm is too simple for the task at hand. Some tasks require complicated classifiers. We may need to acquire more data or, as suggested above, pretrain on some kind of similar data, for example to allow a deep network to learn a feature space pertinent to our problem.
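A hedged sketch of that pretraining idea, using PyTorch on synthetic stand-in data (the network sizes and datasets below are our own illustrative choices, not the chapter's), is:

```python
# Sketch: learn a feature space on plentiful "similar" data, then reuse it
# with a fresh classification head on our small task dataset.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-ins: plentiful similar-domain data, scarce task data, 20 features each.
X_pre,  y_pre  = torch.randn(2000, 20), torch.randint(0, 2, (2000,))
X_task, y_task = torch.randn(200, 20),  torch.randint(0, 2, (200,))

# Shared feature extractor plus two separate classification heads.
features = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 32), nn.ReLU())
pre_head, task_head = nn.Linear(32, 2), nn.Linear(32, 2)

def fit(model, X, y, epochs=50):
    opt = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()

# 1) Pretrain the feature extractor on the similar (surrogate) data.
fit(nn.Sequential(features, pre_head), X_pre, y_pre)

# 2) Freeze the learned feature space, then fit only a new head on the small task set.
for p in features.parameters():
    p.requires_grad = False
fit(nn.Sequential(features, task_head), X_task, y_task)
```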
                            We’ll train our CI algorithm as well as we can. Still, our algorithm is crap: get
                         over it.




                         3. AI EVALUATION
                         Now we reach the heart of our subject. Having worked through the defects of our
                         data and potential problems of our methodology, we have a finished product: Is it
                         any good? If our evaluation is crap, we can’t get over it. Many CI developers
                         seem exhausted by the birthing process and only cursorily treat the assessment
                         of their proud creation. How well does it work? How does it compare with other
                         algorithms for its assigned task? Finally, how well do we understand its behavior?
                         Always remember that no technological result is credible without accompanying
error bars. Open any AI conference proceedings and scan the articles' evaluation sections. This will not take long, because on average there just isn't much to see. We can do better.
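As one concrete way to honor the error-bar rule, we can treat the n test-set decisions as Bernoulli trials and attach a binomial confidence interval to the measured accuracy. The sketch below uses the normal approximation and invented placeholder counts; it is our illustration, not a prescription from this chapter:

```python
import math

def accuracy_with_error_bar(n_correct, n_total, z=1.96):
    """Accuracy plus the half-width of a normal-approximation 95% confidence interval."""
    p = n_correct / n_total
    half_width = z * math.sqrt(p * (1.0 - p) / n_total)  # standard error of a proportion
    return p, half_width

# Invented example numbers: 874 of 1000 test cases classified correctly.
acc, err = accuracy_with_error_bar(874, 1000)
print(f"accuracy = {acc:.3f} +/- {err:.3f} (95% CI)")
```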

                         3.1 USE OF DATA
                         Before creating a CI, we must consider how we will use our limited set of data to
                         train the CI and evaluate its performance after it is trained. There are several
                         ways to do this, and below we discuss using resubstitution, cross-validation, and a
                         separate testing set.
                            Resubstitution: We can use all our data to train our CI, and then apply that CI to
                         the same set of data to test it, as in Method 4 of Table 7.1. This is called resubstitution.
                         It will give an overly optimistic estimate of performance on the general population,
                         particularly if there are a large number of features or a large number of parameters to
be fit in our algorithm. Indeed, this method can give a perfect measure of performance for a CI that achieves only random performance on the overall population, as discussed above. Essentially, resubstitution demonstrates only that our model fits the training-set data.
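To see the pitfall concretely, here is a hedged sketch on synthetic data (ours, not the chapter's): a flexible classifier fit to pure noise attains a perfect resubstitution score while performing at chance on fresh draws from the same population.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 500))      # 100 cases, 500 meaningless features
y_train = rng.integers(0, 2, size=100)     # labels unrelated to the features
X_fresh = rng.normal(size=(100, 500))      # fresh cases from the same population
y_fresh = rng.integers(0, 2, size=100)

clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print("resubstitution accuracy:", clf.score(X_train, y_train))  # 1.0 by construction
print("fresh-data accuracy:   ", clf.score(X_fresh, y_fresh))   # ~0.5, i.e., chance
```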
                            Cross-Validation: In cross-validation we evaluate the performance of our CI by
                         training it on some large fraction of the data, such as 90% of the cases, and testing
                         the CI on the remaining 10% of the cases. We then repeat the process a number of
times, each time selecting a different 90% of the cases and testing on the excluded 10%. The average of the testing performance across the 10% hold-out sets is used