
2. Brief History and Foundations of the Deep Learning Revolution




said that someone should do research to develop more rigorous ways to decide on the number of layers, and so on. They complain that such choices are usually based on trying things out and seeing what happens to error. In fact, such research was done long ago, and many tools exist which implement rational methods. All the standard methods of statistics also apply. People planning to use ANNs in forecasting or time-series modeling applications should be sure to understand the rigorous methods developed in statistics [22], which are also based on trying models out and seeing what happens to error, in a systematic way. Section 3 will say more about numbers of layers.
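The systematic "try models out and measure error" procedure the text refers to can be sketched in a few lines. This is an illustrative assumption, not a method from the book: polynomial degree stands in for a network's size or depth, and held-out validation error is the selection criterion.

```python
import numpy as np

# Hypothetical illustration of systematic model selection: fit models of
# increasing complexity and pick the one with the lowest held-out error.
# Polynomial degree stands in for network depth/size here.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(x.size)

# Random split into training and validation sets.
idx = rng.permutation(x.size)
train, val = idx[:150], idx[150:]

def val_error(degree):
    # Fit on the training split, score on the held-out split.
    coeffs = np.polyfit(x[train], y[train], degree)
    pred = np.polyval(coeffs, x[val])
    return np.mean((pred - y[val]) ** 2)

errors = {d: val_error(d) for d in range(1, 10)}
best = min(errors, key=errors.get)
print(f"chosen complexity: degree {best}")
```

The same loop works unchanged if `val_error` instead trains networks with different numbers of layers; the point is that the choice is made by a systematic, measurable criterion rather than by intuition.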
Widespread use of the Convolutional Neural Network (CoNN) was arguably the most important new direction in the new wave of deep learning which started in 2009–11. The basic CoNN is a variation of the simple feedforward ANN shown in Fig. 8.5B, varied in a way which makes it possible to handle a much larger number of inputs. The key idea is illustrated in Fig. 8.8.
The CoNN addresses the special case where the inputs to the neural network are organized in a regular Euclidean grid, like what we often see in camera images. In a naïve ANN, each of the many, many hidden neurons would take inputs from different regions of the image. That would require estimating many, many parameters to train the network (millions, if there are millions of pixels in the image). From statistical theory, we know that this would require really huge amounts of data, even by the standards of big data, to achieve reasonable accuracy.
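The scale of the problem is easy to see with back-of-the-envelope arithmetic. The numbers below are illustrative assumptions (a 1-megapixel image, 100 hidden units, a 5×5 filter), not figures from the text:

```python
# Illustrative parameter counts: fully connected versus shared-weight
# (convolutional) hidden layers. All numbers are assumed for illustration.
height, width = 1000, 1000          # a 1-megapixel input image
n_hidden = 100                      # hidden units / feature maps
kernel = 5                          # a 5x5 shared filter per feature map

# Fully connected: every hidden unit has its own weight to every pixel.
dense_params = (height * width) * n_hidden

# Convolutional: each feature map reuses one small filter at every location.
conv_params = (kernel * kernel) * n_hidden

print(dense_params)   # 100,000,000 weights to estimate
print(conv_params)    # 2,500 weights to estimate
```

Estimating a hundred million weights reliably demands far more data than estimating a few thousand, which is the statistical point the paragraph makes.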
                     The key idea in CoNNs is to “reuse” the same hidden neuron “in different
                  locations.” Equivalently, we could phrase this as “sharing weights” between all
                  the sibling hidden neurons of the same type handling different parts of the image.
                  In terms of basic mathematics, the CoNN is exploiting the idea of invariance with

                  FIGURE 8.8
                  Schematic of what a CoNN is. See LeCun’s tutorial [23] for more details.
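The weight sharing described above can be sketched directly. This is a minimal assumed illustration, not code from the text: a single small filter is slid across the whole input, so every location is processed by the same parameters.

```python
import numpy as np

# Minimal sketch of the CoNN's key idea: one shared filter reused at
# every location of the input grid.
def conv2d_valid(image, kernel):
    """Slide a single shared kernel over the image ('valid' positions only)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The same `kernel` weights are reused at every (i, j): this is
            # the "sharing of weights" between sibling hidden neurons.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
edge_filter = np.array([[1.0, -1.0]])   # responds to horizontal change
response = conv2d_valid(image, edge_filter)
print(response.shape)   # (6, 5)
```

Only the filter's few weights are trainable parameters, however large the image, which is exactly how the CoNN escapes the parameter explosion of the naïve fully connected design.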