FIGURE 8.7
General backpropagation as formulated and proven in 1974.


I argued that the true “neural code” for the highest-level neurons is not just ones and zeros, but bursts of volleys of continuous intensity, at regular intervals. However, Minsky stated that he simply could not get away with that, and history has shown that his political strategy worked better than mine.
At about the same time, I wrote this up as a request for computer time from my Department at Harvard. Professor Larry Ho, who controlled computer time that year, rejected the request on the grounds that he did not think that this kind of backpropagation could possibly work. However, when I asked to use this as the basis for my Harvard Ph.D. thesis [20], the department said that they would allow it, so long as neural networks were not a major part of the thesis and so long as I could prove that the new method for calculating derivatives would really work. This was an excellent piece of guidance from them, which led to my proving the general chain rule for ordered derivatives illustrated in Fig. 8.7.
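For readers who want the rule itself, a compact restatement runs as follows (the generic symbols z_1, …, z_n for the quantities computed in order are an assumption of this summary, not necessarily the notation used in Fig. 8.7). For a target quantity z_n,

\[
\frac{\partial^{+} z_n}{\partial z_i} \;=\; \frac{\partial z_n}{\partial z_i} \;+\; \sum_{j=i+1}^{n-1} \frac{\partial^{+} z_n}{\partial z_j}\,\frac{\partial z_j}{\partial z_i},
\]

where ∂ denotes the direct partial derivative through a single equation of the system and ∂⁺ the ordered (total) derivative through everything computed after step i. Evaluating these in reverse order, from j = n − 1 down to 1, yields every gradient of the target in a single backward sweep whose cost is comparable to one forward evaluation of the system.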
Note that this method for calculating gradients can be applied to any large sparse differentiable nonlinear system, and not just the type of ANN illustrated in Fig. 8.5A. In 1988, I generalized the method for use on implicit, simultaneous-equation types of model; for a review of the history, and of ways to use this method not only in neural networks but in other applications, see my paper on automatic differentiation [21].
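As an illustration of what the single backward sweep buys, here is a minimal Python sketch of the recursion above applied to a toy ordered system. The system, the variable names, and the target are all invented for this example; this is not code from [20] or [21], only a sketch of the technique.

    import math

    # Toy ordered system, evaluated forward one equation at a time:
    #   z1 = x,  z2 = sin(z1),  z3 = z1 * z2   (z3 is the target).
    def forward(x):
        z = [None] * 4            # index 0 unused, so z[k] matches z_k
        z[1] = x
        z[2] = math.sin(z[1])
        z[3] = z[1] * z[2]
        return z

    # Backward sweep: F[i] holds the ordered derivative d+z3/dz_i,
    # built from direct partials of the later equations, following the
    # chain rule for ordered derivatives.
    def backward(z):
        F = [0.0] * 4
        F[3] = 1.0                          # d+z3/dz3 = 1
        F[2] = F[3] * z[1]                  # z3 = z1*z2: direct partial wrt z2 is z1
        F[1] = F[3] * z[2] + F[2] * math.cos(z[1])
        # direct path through z3 = z1*z2, plus the indirect path through z2 = sin(z1)
        return F

    z = forward(0.5)
    F = backward(z)
    # Sanity check against the closed form d/dx [x sin x] = sin x + x cos x:
    assert abs(F[1] - (math.sin(0.5) + 0.5 * math.cos(0.5))) < 1e-12

The point of the backward ordering is that one pass recovers the derivative of the target with respect to every earlier quantity at once, which is what makes gradient calculation cheap even for very large sparse systems.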

2.4 CONNs, >3 LAYERS, AND AUTOENCODERS: THE THREE MAIN TOOLS OF TODAY’S DEEP LEARNING
Many people argue that the phrase “deep learning” simply means adding more layers to an ANN, beyond the traditional popular three you see in Fig. 8.5B. But many have