We don’t have enough data. We never do. We need data to develop our CI
classifier, including feature identification, architecture selection, parameter tuning
(including when to stop training), and finally performance evaluation. In particular,
the amount of data directly limits the complexity of the classifier we can use. There is a famous theorem popularized by the eminent information theorist Thomas Cover that bears on this point: unless the number of our training cases is at least twice the number of features we feed our classifier, we are practically guaranteed perfect separation of the training data for any arbitrary assignment of class labels to our cases [8]. Perfect performance under those conditions is meaningless, since the classifier could have fit pure noise equally well. We revisit this point under algorithm development.
One valuable method for overcoming a lack of sufficient data for our task is to utilize “similar” data, either simulated or natural, in a pretraining phase of our algorithm development (as is of course natural for the human brain). Deep learning techniques, in particular, require vast amounts of data, so a boost from training on a related dataset may let us overcome the insufficiency of our own; a sketch of this pretrain-then-fine-tune recipe follows. For example, if we want to identify dogs, we might pretrain our network on the universe of cats out there on the internet. Just don't be surprised if later our CI is much better with Chihuahuas than Doberman pinschers. Inevitably we are still stuck with the limitations of our own particular data, with the same perils of overtraining on any limited set plus the biases introduced by the pretraining.
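The chapter names no framework, so here is a minimal PyTorch/torchvision sketch of the idea: start from weights pretrained on a large related dataset (ImageNet), freeze them, and retrain only a new final layer on our small dataset. The two-class dog task and the train_loader are hypothetical stand-ins.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a network pretrained on a large, related dataset
# (ImageNet), then retrain only the final layer on our small data.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

for param in model.parameters():
    param.requires_grad = False              # freeze pretrained features

num_classes = 2                              # e.g., dog vs. no dog
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# train_loader is assumed: batches of (image, label) from our own
# limited data; only the new head's weights are updated.
# for images, labels in train_loader:
#     optimizer.zero_grad()
#     loss = loss_fn(model(images), labels)
#     loss.backward()
#     optimizer.step()
```

The frozen features carry the pretraining's biases with them, which is exactly the trade the paragraph above describes.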
Life is hard, but we move on. Our data are crap: Get over it.
2.2 OUR ALGORITHM IS CRAP
Our feature selection is wrong. It is always wrong. Our chances (or those of our deep learning machine, no matter how deep) of wringing the ideal features from a massive database are minuscule. That doesn't mean we shouldn't try; it does mean the problem is much tougher than we think it is, and we are never going to (totally) succeed. Chen and Brown [9] show what happens when simulated microarray data are used with 30 known true (discriminatory) features out of 1000 and known noise levels for all features. Of the 30 features selected as “sticking farthest out of the noise,” on average only three were truly discriminatory under low intrinsic class separation, and only nine under high separation; a toy simulation of this selection bias follows.
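The following toy simulation (my own numbers and a simple mean-difference selection rule, not Chen and Brown's actual protocol) reproduces the flavor of the result: the top-ranked features are mostly noise features that happened to stick out.

```python
import numpy as np

rng = np.random.default_rng(1)

n_cases, n_features, n_true, n_selected = 50, 1000, 30, 30

def true_features_recovered(effect_size, n_trials=100):
    """Average count of truly discriminatory features among the
    n_selected features with the largest class-mean differences."""
    hits = []
    for _ in range(n_trials):
        y = np.repeat([0, 1], n_cases // 2)
        X = rng.standard_normal((n_cases, n_features))
        # only the first n_true features carry a real class shift
        X[y == 1, :n_true] += effect_size
        # rank features by absolute difference of class means
        diff = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
        top = np.argsort(diff)[-n_selected:]
        hits.append(np.sum(top < n_true))   # true features are indices 0..29
    return np.mean(hits)

for effect in (0.3, 0.6, 1.0):
    print(f"effect size {effect}: {true_features_recovered(effect):.1f} "
          f"of {n_true} selected features are real")
```

At low separation only a handful of the 30 picks are real; even at high separation many impostors remain.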
This problem is not alleviated when our CI observer is choosing its own features.
Azriel Rosenfeld was a prominent AI researcher at the University of Maryland who
did a lot of interesting work on object recognition (and self-driving vehicles). He
used to tell a story about one of his early successes. He had a contract with the
army to develop an algorithm to identify battle tanks. He had data for scenes with
and without tanks, and was enthusiastic and then puzzled when his CI performed
brilliantly, indeed too brilliantly. Even if only a very small portion of a tank was visible
in a scene, his CI observer would say a tank was there. Upon reflection he saw that
the tankless scenes had been shot on a cloudy day and the with-tank ones on a sunny
day. That was the sole feature his CI had needed. Our CI observer may not be very
smart, but it is smarter than we are, in its own sly way.
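A toy numerical version of the tank story (my own contrived numbers) shows how easily a confound wins: when overall scene brightness tracks the label, a linear model leans on the weather rather than the weak tank cue.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 200

# Confounded "dataset": tank scenes were shot in sunshine, tankless
# scenes under clouds, so overall brightness predicts the label.
has_tank = rng.integers(0, 2, n)
brightness = np.where(has_tank == 1,
                      rng.normal(0.8, 0.05, n),    # sunny, with tank
                      rng.normal(0.4, 0.05, n))    # cloudy, no tank
tank_cue = 0.02 * has_tank + rng.normal(0.0, 0.05, n)  # weak real signal

X = StandardScaler().fit_transform(np.column_stack([brightness, tank_cue]))
clf = LogisticRegression().fit(X, has_tank)
print("standardized weights (brightness, tank cue):", clf.coef_[0])
# The brightness weight dominates: the model has learned the
# weather, not the tank.
```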