FIGURE 8.7
General backpropagation as formulated and proven in 1974.
I argued that the true “neural code” for the highest-level neurons is not just ones
and zeros, but bursts of volleys of continuous intensity, at regular intervals. However,
Minsky stated that he simply could not get away with that, and history has shown
that his political strategy worked better than mine.
At about the same time, I wrote this up as a request for computer time from my
Department at Harvard. Professor Larry Ho, who controlled computer time that year,
rejected the request on grounds that he did not think that this kind of
backpropagation could possibly work. However, when I asked to use this as the basis for my
Harvard Ph.D. thesis [20], the department said that they would allow it, so long
as neural networks were not a major part of the thesis and so long as I could prove
that the new method for calculating derivatives would really work. This was an
excellent piece of guidance from them, which led to my proving the general chain
rule for ordered derivatives illustrated in Fig. 8.7.
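In generic notation (a paraphrase of the rule illustrated in Fig. 8.7, not its exact symbols), the chain rule for ordered derivatives can be written as
\[
\frac{\partial^{+}\,\mathrm{TARGET}}{\partial z_i}
\;=\;
\frac{\partial\,\mathrm{TARGET}}{\partial z_i}
\;+\;
\sum_{j>i}
\frac{\partial^{+}\,\mathrm{TARGET}}{\partial z_j}\,
\frac{\partial z_j}{\partial z_i},
\]
where z_1, ..., z_N are the variables of an ordered system, each computed from earlier ones, and the ordered derivative ∂⁺ accounts for both the direct effect of z_i on the target and all indirect effects routed through later variables z_j. Applying the rule in reverse order of computation yields every gradient in a single backward sweep.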
Note that this method for calculating gradients can be applied to any large sparse
differentiable nonlinear system, and not just the type of ANN illustrated in
Fig. 8.5A. In 1988, I generalized the method for use on implicit, simultaneous-
equation types of model; for a review of the history, and of ways to use this method
not only in neural networks but in other applications, see my paper on automatic
differentiation [21].
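To make the ordered-system idea concrete, here is a minimal Python sketch (the names and the toy example are illustrative only, not code from [20] or [21]): a forward sweep computes each variable in order and records the local partial derivatives of each assignment, and a backward sweep then accumulates the ordered derivatives of the final target in reverse order of computation.

    # Minimal sketch of reverse-sweep differentiation over an ordered system.
    # Illustrative example; not Werbos's original code.
    import math

    def forward(x):
        # Ordered system: each z_i depends only on x and earlier z's.
        z1 = x * x            # dz1/dx  = 2x
        z2 = math.sin(z1)     # dz2/dz1 = cos(z1)
        z3 = z1 * z2          # dz3/dz1 = z2,  dz3/dz2 = z1   (the TARGET)
        local = {
            ("z1", "x"):  2 * x,
            ("z2", "z1"): math.cos(z1),
            ("z3", "z1"): z2,
            ("z3", "z2"): z1,
        }
        return z3, local

    def ordered_derivative_wrt_x(local):
        # Chain rule for ordered derivatives, applied in reverse order:
        # d+TARGET/dz_i = direct effect + sum over later z_j of
        #                 (d+TARGET/dz_j) * (dz_j/dz_i).
        g = {"z3": 1.0, "z2": 0.0, "z1": 0.0, "x": 0.0}
        g["z2"] += g["z3"] * local[("z3", "z2")]
        g["z1"] += g["z3"] * local[("z3", "z1")]
        g["z1"] += g["z2"] * local[("z2", "z1")]
        g["x"]  += g["z1"] * local[("z1", "x")]
        return g["x"]

    x = 1.3
    target, local = forward(x)
    grad = ordered_derivative_wrt_x(local)
    # Check against the analytic derivative of x^2 * sin(x^2):
    print(grad, 2 * x * math.sin(x * x) + 2 * x ** 3 * math.cos(x * x))

The cost of the backward sweep is one multiplication and one addition per local dependency, which is what makes the method practical for large sparse systems.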
2.4 CoNNs, >3 LAYERS, AND AUTOENCODERS: THE THREE MAIN
TOOLS OF TODAY’S DEEP LEARNING
Many people argue that the phrase “deep learning” simply means adding more layers
to an ANN, beyond the traditional popular three you see in Fig. 8.5B. But many have