
3. MFE Gradient Descent




   The sigmoid logic follows from the probability normalization of the two states of the BNN (recruit a growing new neuron, or trim/prune an old neuron), dropping the integration constant:

$$\frac{\exp\left(\frac{H_{\mathrm{recruit}}}{k_B T_o}\right)}{\exp\left(\frac{H_{\mathrm{prune}}}{k_B T_o}\right) + \exp\left(\frac{H_{\mathrm{recruit}}}{k_B T_o}\right)} = \frac{1}{\exp(-\Delta H) + 1} = \sigma(\Delta H) = \begin{cases} 1, & \Delta H \to \infty \\ 0, & \Delta H \to -\infty \end{cases} \qquad (3.14)$$

where the dimensionless $\Delta H \equiv H_{\mathrm{recruit}} - H_{\mathrm{prune}}$ is measured in units of $k_B T_o$. Q.E.D.
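   As a numerical check of Eq. (3.14), the following minimal sketch (assuming $k_B T_o = 1$; the function names are illustrative, not from the text) verifies that the two-state Boltzmann normalization reduces to the logistic sigmoid of the dimensionless gap $\Delta H$:

```python
import numpy as np

def recruit_probability(H_recruit, H_prune, kBTo=1.0):
    """Two-state Boltzmann normalization: probability of recruiting a neuron."""
    w_recruit = np.exp(H_recruit / kBTo)
    w_prune = np.exp(H_prune / kBTo)
    return w_recruit / (w_prune + w_recruit)

def sigmoid(dH):
    """Logistic sigmoid of the dimensionless enthalpy gap."""
    return 1.0 / (np.exp(-dH) + 1.0)

dH = np.linspace(-6, 6, 13)
# The two expressions in Eq. (3.14) agree for any split of dH:
assert np.allclose(recruit_probability(dH, 0.0), sigmoid(dH))
```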
   Note that G. Cybenko proved "Approximation by Superpositions of a Sigmoidal Function," Math. Control Signals Systems 2 (1989) 303–314. Similarly, A. N. Kolmogorov, "On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition," Dokl. Akad. Nauk SSSR 114 (1957) 953–956.
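   To make Cybenko's superposition result concrete, here is a minimal sketch (an illustrative construction of my own, not from the text, with made-up parameter choices): a staircase of steep sigmoids approximating a continuous target function, whose maximum error shrinks as the number of terms grows.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

f = np.sin                                  # target continuous function on [0, 2*pi]
knots = np.linspace(0, 2 * np.pi, 40)       # step locations
steps = np.diff(f(knots))                   # jump heights between adjacent knots
k = 200.0                                   # steepness; sharper steps -> better fit

def approx(x):
    """Superposition of sigmoids:
    f(x0) + sum_i [f(x_{i+1}) - f(x_i)] * sigma(k * (x - x_i))."""
    x = np.atleast_1d(x)[:, None]
    return f(knots[0]) + (steps * sigmoid(k * (x - knots[:-1]))).sum(axis=1)

x = np.linspace(0, 2 * np.pi, 500)
print("max error:", np.abs(approx(x) - f(x)).max())  # decreases as len(knots) grows
```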
   The activations of thousands of neurons are denoted as a lower-case column vector

$$\vec{a} = (a_1, a_2, \ldots)^T$$

after squashing by the binary sigmoid logic function, or by the bipolar hyperbolic tangent logic function, within the multilayer DL; the backward error propagation requires the gradient descent derivatives (MPDP), where the superscript $l \in \{1, 2, \ldots\}$ denotes the $l$-th layer. A 1k-by-1k million-pixel image spans a linear vector space of a million orthogonal axes, where the collective values of all neurons' activations $\vec{a}^{[l]}$ of the next $l$-th layer lie in the infinite-dimensional Hilbert space. The slope weight matrix $[W^{[l]}]$ and intercepts $\vec{\theta}^{[l]}$ will be adjusted based on the million inputs $\vec{X}^{[l-1]}$ of the earlier layer. The threshold logic will be (a) within layers, the bipolar hyperbolic tangent, doing away with all Do loops using a one-step MDP algorithm, and (b) at the output layer, the bipolar sigmoid (see Fig. 3.6); a numerical sketch follows the equations below:

$$\vec{a}^{[l]} = \sigma\left([W^{[l]}]\,\vec{X}^{[l-1]} - \vec{\theta}^{[l]}\right),$$

$$[W^{[l]}] = [A^{[l]}]^{-1} = \left[[I] - \left([I] - [A^{[l]}]\right)\right]^{-1} \simeq [I] + \left([I] - [A^{[l]}]\right) + \left([I] - [A^{[l]}]\right)^2 + \cdots$$
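   A minimal sketch of the layer recursion and the Neumann-series expansion above, under assumed toy shapes and random weights (the names and sizes are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, weights, thetas):
    """Layer recursion a^[l] = squash(W^[l] X^[l-1] - theta^[l]):
    bipolar hyperbolic tangent within layers, sigmoid at the output layer."""
    a = X
    for l, (W, theta) in enumerate(zip(weights, thetas)):
        z = W @ a - theta
        a = sigmoid(z) if l == len(weights) - 1 else np.tanh(z)
    return a

# Toy network: 8 inputs -> 4 hidden -> 2 outputs
sizes = [8, 4, 2]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes, sizes[1:])]
thetas = [rng.normal(size=m) for m in sizes[1:]]
print(forward(rng.normal(size=8), weights, thetas))

# Neumann series for the inverse: A^-1 ~= I + (I - A) + (I - A)^2 + ...,
# which converges when the spectral radius of (I - A) is below 1.
A = np.eye(3) + 0.1 * rng.normal(size=(3, 3))   # matrix close to the identity
I = np.eye(3)
approx_inv = sum(np.linalg.matrix_power(I - A, n) for n in range(20))
print("Neumann-series error:", np.abs(approx_inv - np.linalg.inv(A)).max())
```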
   While Frank Rosenblatt developed the ANN, Marvin Minsky challenged it and championed Artificial Intelligence (AI) as the classical rule-based system. Stephen Grossberg and Gail Carpenter of Boston University developed the Adaptive Resonance Theory (ART) model that has three layers folded down to itself as the