Page 83 - Artificial Intelligence in the Age of Neural Networks and Brain Computing
3. MFE Gradient Descent
Of which the sigmoid logic follows as the two-state BNN probability normalization (recruit a new neuron, or trim/prune an old one), dropping the integration constant:

$$
\frac{\exp\left(\frac{H_{\mathrm{recruit}}}{k_B T_0}\right)}
     {\exp\left(\frac{H_{\mathrm{prune}}}{k_B T_0}\right) + \exp\left(\frac{H_{\mathrm{recruit}}}{k_B T_0}\right)}
= \frac{1}{\exp(-\Delta H) + 1}
= \sigma(\Delta H)
= \begin{cases} 1, & \Delta H \to \infty \\ 0, & \Delta H \to -\infty \end{cases}
\tag{3.14}
$$

with the dimensionless $\Delta H = (H_{\mathrm{recruit}} - H_{\mathrm{prune}})/k_B T_0$. Q.E.D.
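As a numerical check of Eq. (3.14), the two-state Boltzmann normalization reduces exactly to the logistic sigmoid of the dimensionless energy gap. (A minimal sketch; the energy values and $k_B T_0 = 1$ are illustrative, not from the text.)

```python
import math

def recruit_probability(h_recruit, h_prune, kB_T0=1.0):
    """Two-state Boltzmann normalization: P(recruit) among {recruit, prune}."""
    z_recruit = math.exp(h_recruit / kB_T0)
    z_prune = math.exp(h_prune / kB_T0)
    return z_recruit / (z_recruit + z_prune)

def sigmoid(dH):
    """Logistic sigmoid: sigma(dH) = 1 / (exp(-dH) + 1)."""
    return 1.0 / (math.exp(-dH) + 1.0)

# The normalized probability equals sigma of dH = (H_recruit - H_prune)/kB*T0,
# saturating at 1 for dH -> +inf and at 0 for dH -> -inf.
dH = (2.0 - (-1.0)) / 1.0
p = recruit_probability(2.0, -1.0, 1.0)
```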
Note that G. Cybenko proved "Approximation by Superpositions of a Sigmoidal Function," Math. Control Signals Syst. 2 (1989) 303–314. Similarly, the Russian mathematician A. N. Kolmogorov, "On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition," Dokl. Akad. Nauk SSSR 114 (1957) 953–956.
The activation column vector $\vec{a}$ of thousands of neurons is denoted in lower case,

$$\vec{a} = (a_1, a_2, \ldots)^T,$$

after the squashing binary sigmoid logic function, or the bipolar hyperbolic tangent logic function, within the multiple-layer DL; the backward error propagation requires gradient descent derivatives (MPDP), where the superscript $l \in (1, 2, \ldots)$ denotes the $l$-th layer. A 1k-by-1k million-pixel image spans a linear vector space of a million orthogonal axes, where the collective values of all neurons' activations $\vec{a}^{[l]}$ of the next $l$-th layer live in the effectively infinite-dimensional Hilbert space. The slope weight matrix $[W^{[l]}]$ and the intercepts $\vec{\theta}^{[l]}$ will be adjusted based on the million inputs $\vec{X}^{[l-1]}$ of the earlier layer. The threshold logic will be (a) within layers, the bipolar hyperbolic tangent, doing away with all Do loops by the one-step MDP algorithm, and (b) at the output layer, the bipolar sigmoid (see Fig. 3.6):
$$
\vec{a}^{[l]} = \sigma\left( \left[W^{[l]}\right] \vec{X}^{[l-1]} - \vec{\theta}^{[l]} \right),
$$

$$
\left[W^{[l]}\right] = \left[A^{[l]}\right]^{-1}
= \left[ I - \left[ I - A^{[l]} \right] \right]^{-1}
\cong I + \left[ I - A^{[l]} \right] + \left[ I - A^{[l]} \right]^2 + \cdots
$$
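The layer map and the Neumann-series expansion of the matrix inverse can be sketched numerically. (A minimal sketch; all shapes and values are illustrative toy choices, and the series check assumes the spectral radius of $I - A$ is below one, as the expansion requires.)

```python
import numpy as np

rng = np.random.default_rng(0)

# (a) One layer: a[l] = sigma(W[l] @ X[l-1] - theta[l]), with the bipolar
# hyperbolic tangent within layers and a bipolar sigmoid at the output.
def hidden_layer(W, x, theta):
    return np.tanh(W @ x - theta)

def bipolar_sigmoid(z):
    # Ranges over (-1, 1); equivalent to tanh(z / 2).
    return 2.0 / (1.0 + np.exp(-z)) - 1.0

x = rng.standard_normal(8)            # toy input (8 features, not 10^6 pixels)
W1, th1 = rng.standard_normal((4, 8)), rng.standard_normal(4)
W2, th2 = rng.standard_normal((1, 4)), rng.standard_normal(1)
y = bipolar_sigmoid(W2 @ hidden_layer(W1, x, th1) - th2)

# (b) Neumann series: A^-1 = [I - (I - A)]^-1 ~ I + (I-A) + (I-A)^2 + ...,
# which converges when the spectral radius of (I - A) is below one.
def neumann_inverse(A, terms=50):
    I = np.eye(A.shape[0])
    B = I - A
    approx, power = I.copy(), I.copy()
    for _ in range(terms - 1):
        power = power @ B
        approx = approx + power
    return approx

A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))  # A close to the identity
err = np.max(np.abs(neumann_inverse(A) - np.linalg.inv(A)))
```

Truncating the series after a few terms gives a cheap approximate inverse when $A$ is close to the identity, which is the regime the expansion above targets.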
While Frank Rosenblatt developed the perceptron ANN, Marvin Minsky challenged it and championed the classical rule-based school of Artificial Intelligence (AI). Stephen Grossberg and Gail Carpenter of Boston University developed the Adaptive Resonance Theory (ART) model, which has three layers folded down onto itself as the