Given the task and the type of data available to us, we have just so much inherent
discriminability, even for the ideal observer. If we have perfection on our training
set, then either we are working on a task not worth doing, or we've got to use a less
complex algorithm or ease up on the throttle and halt training earlier.
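As a concrete illustration of halting training earlier, the sketch below uses scikit-learn's built-in early-stopping option for a small neural network; the toy data and the architecture are placeholders, and any validation-based stopping rule would serve the same purpose.

from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Toy data standing in for our real training set.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Reserve part of the training data as a validation split and stop when the
# validation score stops improving, rather than driving training error to zero.
clf = MLPClassifier(hidden_layer_sizes=(50,),
                    early_stopping=True,      # monitor a held-out validation split
                    validation_fraction=0.1,  # 10% of the training data for monitoring
                    n_iter_no_change=10,      # stop after 10 epochs with no improvement
                    max_iter=500,
                    random_state=0).fit(X, y)
print("training halted after", clf.n_iter_, "epochs")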
Our algorithm is too simple for the task at hand. Some tasks require complicated
classifiers. We may need to acquire more data or, as suggested above, do pretraining
on some kind of similar data, for example to allow a deep network to learn a feature
space pertinent to our problem.
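One common way to do this with image data, for instance, is to reuse a network pretrained on a large, loosely related corpus as a frozen feature extractor and to train only a small task-specific head on our limited data. The sketch below assumes PyTorch/torchvision and an ImageNet-pretrained ResNet-18 purely for illustration; in our own problem the "similar data" would be whatever related corpus we can obtain.

import torch
import torchvision

# Backbone pretrained on a large, loosely related dataset (ImageNet here).
weights = torchvision.models.ResNet18_Weights.DEFAULT
backbone = torchvision.models.resnet18(weights=weights)
backbone.fc = torch.nn.Identity()       # strip the original classification head
for p in backbone.parameters():
    p.requires_grad = False             # freeze the pretrained feature space

# Small task-specific head trained on our limited data (two classes assumed).
head = torch.nn.Linear(512, 2)
model = torch.nn.Sequential(backbone, head)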
We’ll train our CI algorithm as well as we can. Still, our algorithm is crap: get
over it.
3. AI EVALUATION
Now we reach the heart of our subject. Having worked through the defects of our
data and potential problems of our methodology, we have a finished product: Is it
any good? If our evaluation is crap, we can’t get over it. Many CI developers
seem exhausted by the birthing process and only cursorily treat the assessment
of their proud creation. How well does it work? How does it compare with other
algorithms for its assigned task? Finally, how well do we understand its behavior?
Always remember that no technological result is credible without accompanying
error bars. Open any AI conference proceedings and scan the articles’ evaluation
sections. This will not take long because, on average, there just isn't much
to see. We can do better.
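To make that concrete, here is a minimal sketch of the simplest error bar for a classification result: a binomial confidence interval on test-set accuracy, together with a bootstrap alternative. The counts are hypothetical and purely for illustration; the point is that the interval, not the single accuracy number, is the result.

import numpy as np

# Hypothetical result: 870 of 1000 independent test cases classified correctly.
n_correct, n_total = 870, 1000
acc = n_correct / n_total

# Normal-approximation (Wald) 95% confidence interval on the accuracy.
se = np.sqrt(acc * (1.0 - acc) / n_total)
print(f"accuracy = {acc:.3f}, 95% CI ~ [{acc - 1.96*se:.3f}, {acc + 1.96*se:.3f}]")

# Bootstrap alternative: resample the per-case correctness indicators.
rng = np.random.default_rng(0)
correct = np.repeat([1, 0], [n_correct, n_total - n_correct])
boot = [rng.choice(correct, size=n_total, replace=True).mean() for _ in range(2000)]
print("bootstrap 95% CI:", np.percentile(boot, [2.5, 97.5]))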
3.1 USE OF DATA
Before creating a CI, we must consider how we will use our limited set of data to
train the CI and evaluate its performance after it is trained. There are several
ways to do this, and below we discuss using resubstitution, cross-validation, and a
separate testing set.
Resubstitution: We can use all our data to train our CI, and then apply that CI to
the same set of data to test it, as in Method 4 of Table 7.1. This is called resubstitution.
It will give an overly optimistic estimate of performance on the general population,
particularly if there are a large number of features or a large number of parameters to
be fit in our algorithm. Indeed, this method can report perfect performance for a CI
that achieves only chance-level performance on the overall population, as
discussed above. Essentially, resubstitution demonstrates only that our model fits
the training set data.
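The following sketch, assuming scikit-learn, illustrates the point with purely random labels: a flexible classifier (1-nearest-neighbor) scores perfectly under resubstitution even though its true performance on new cases is chance.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 50))      # 200 cases, 50 uninformative features
y_train = rng.integers(0, 2, size=200)    # class labels assigned at random

clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# Resubstitution: test on the very cases used for training.
print("resubstitution accuracy:", clf.score(X_train, y_train))   # 1.0

# A fresh sample from the same (uninformative) population.
X_new, y_new = rng.normal(size=(200, 50)), rng.integers(0, 2, size=200)
print("accuracy on new cases:  ", clf.score(X_new, y_new))       # roughly 0.5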
Cross-Validation: In cross-validation we evaluate the performance of our CI by
training it on some large fraction of the data, such as 90% of the cases, and testing
the CI on the remaining 10% of the cases. We then repeat the process a number of
times, each time selecting a different 90% of the cases and testing on the excluded
10%. The average of the testing performance across the 10% hold-out sets is used