put (the scalar error) and multiple inputs; therefore reverse mode is significantly faster than the forward mode. As shown in [65], under realistic assumptions the error function gradient can be computed in reverse mode at a cost of five function evaluations or less. Also note that in the ANN field the forward and reverse computation modes are usually referred to as forward propagation and backward propagation (or backpropagation).
In the rest of this subsection we present automatic differentiation algorithms for the computation of the gradient, Jacobian, and Hessian of the squared error function (2.58) in the case of a layered feedforward neural network (2.8). All these algorithms rely on the fact that the derivatives of the activation functions are known. For example, the derivatives of the hyperbolic tangent activation functions (2.9) are
\[
\left.
\begin{aligned}
{\varphi_i^l}'(n_i^l) &= 1 - \bigl(\varphi_i^l(n_i^l)\bigr)^2 \\
{\varphi_i^l}''(n_i^l) &= -2\,\varphi_i^l(n_i^l)\,{\varphi_i^l}'(n_i^l)
\end{aligned}
\right\}
\quad
\begin{aligned}
l &= 1,\dots,L-1,\\
i &= 1,\dots,S^l,
\end{aligned}
\tag{2.59}
\]
while the derivatives of the logistic function (2.10) equal
\[
\left.
\begin{aligned}
{\varphi_i^l}'(n_i^l) &= \varphi_i^l(n_i^l)\bigl(1 - \varphi_i^l(n_i^l)\bigr) \\
{\varphi_i^l}''(n_i^l) &= {\varphi_i^l}'(n_i^l)\bigl(1 - 2\,\varphi_i^l(n_i^l)\bigr)
\end{aligned}
\right\}
\quad
\begin{aligned}
l &= 1,\dots,L-1,\\
i &= 1,\dots,S^l.
\end{aligned}
\tag{2.60}
\]
Derivatives of the identity activation functions (2.11) are simply
\[
\left.
\begin{aligned}
{\varphi_i^L}'(n_i^L) &= 1 \\
{\varphi_i^L}''(n_i^L) &= 0
\end{aligned}
\right\}
\quad i = 1,\dots,S^L. \tag{2.61}
\]
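To make equations (2.59)–(2.61) concrete, the following NumPy sketch (the function names and array layout are assumptions for this illustration, not from the text) evaluates each activation together with its first and second derivatives, reusing the activation value itself exactly as the formulas do:

```python
import numpy as np

def tanh_derivs(n):
    # Hyperbolic tangent activation (2.9) and its derivatives (2.59):
    # phi' = 1 - phi^2,  phi'' = -2 * phi * phi'
    phi = np.tanh(n)
    d1 = 1.0 - phi**2
    d2 = -2.0 * phi * d1
    return phi, d1, d2

def logistic_derivs(n):
    # Logistic activation (2.10) and its derivatives (2.60):
    # phi' = phi * (1 - phi),  phi'' = phi' * (1 - 2 * phi)
    phi = 1.0 / (1.0 + np.exp(-n))
    d1 = phi * (1.0 - phi)
    d2 = d1 * (1.0 - 2.0 * phi)
    return phi, d1, d2

def identity_derivs(n):
    # Identity activation (2.11) of the output layer; derivatives (2.61)
    phi = n
    d1 = np.ones_like(n)
    d2 = np.zeros_like(n)
    return phi, d1, d2
```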
Backpropagation algorithm for error function gradient. First, we perform a forward pass to compute the weighted sums $n_i^l$ and activations $a_i^l$ for all neurons $i = 1,\dots,S^l$ of each layer $l = 1,\dots,L$, according to equations (2.8). We define the error function sensitivities with respect to the weighted sums $n_i^l$ to be as follows:
\[
\delta_i^l \triangleq \frac{\partial E}{\partial n_i^l}. \tag{2.62}
\]
Sensitivities for the output layer neurons are obtained directly, i.e.,
\[
\delta_i^L = -\omega_i \bigl(\tilde{y}_i - a_i^L\bigr)\,{\varphi_i^L}'(n_i^L), \tag{2.63}
\]
while sensitivities for the hidden layer neurons are computed during a backward pass:
\[
\delta_i^l = {\varphi_i^l}'(n_i^l) \sum_{j=1}^{S^{l+1}} \delta_j^{l+1} w_{j,i}^{l+1}, \quad l = L-1,\dots,1. \tag{2.64}
\]
Finally, the error function derivatives with respect to the parameters are expressed in terms of the sensitivities, i.e.,
\[
\frac{\partial E}{\partial b_i^l} = \delta_i^l, \qquad
\frac{\partial E}{\partial w_{i,j}^l} = \delta_i^l\, a_j^{l-1}. \tag{2.65}
\]
In a similar manner, we can compute the derivatives with respect to the network inputs, i.e.,
\[
\frac{\partial E}{\partial a_i^0} = \sum_{j=1}^{S^1} \delta_j^1 w_{j,i}^1. \tag{2.66}
\]
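As an illustration of the backward pass (2.62)–(2.66), here is a minimal NumPy sketch for a network with tanh hidden layers and a linear output layer. The data layout (lists of per-layer weight matrices and bias vectors) and the assumption that (2.58) is the weighted squared error $E = \tfrac{1}{2}\sum_i \omega_i(\tilde{y}_i - a_i^L)^2$ are choices made for this example only; the sensitivity recursions themselves follow the equations above:

```python
import numpy as np

def backprop_gradient(W, b, a0, y_target, omega):
    """Gradient of E = 0.5 * sum_i omega_i * (y_i - a_i^L)^2 via (2.62)-(2.66).

    W, b -- lists of per-layer weight matrices and bias vectors;
    hidden layers use tanh, the output layer is linear (identity).
    """
    L = len(W)
    # Forward pass: weighted sums n[l] and activations a[l], cf. (2.8)
    a = [a0]
    n = [None]
    for l in range(1, L + 1):
        n_l = W[l - 1] @ a[l - 1] + b[l - 1]
        n.append(n_l)
        a.append(n_l if l == L else np.tanh(n_l))

    delta = [None] * (L + 1)
    # Output layer sensitivities, eq. (2.63); identity output => phi' = 1
    delta[L] = -omega * (y_target - a[L])
    # Backward pass for hidden layers, eq. (2.64); tanh => phi'(n) = 1 - a^2
    for l in range(L - 1, 0, -1):
        delta[l] = (1.0 - a[l] ** 2) * (W[l].T @ delta[l + 1])

    # Parameter derivatives, eq. (2.65)
    dW = [np.outer(delta[l], a[l - 1]) for l in range(1, L + 1)]
    db = [delta[l] for l in range(1, L + 1)]
    # Derivatives with respect to the network inputs, eq. (2.66)
    da0 = W[0].T @ delta[1]
    return dW, db, da0
```

A single forward sweep followed by a single backward sweep suffices here, consistent with the reverse-mode cost estimate cited at the beginning of this subsection.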
Forward propagation for network outputs Jacobian. We define the pairwise sensitivities of the weighted sums to be as follows:
\[
\nu_{i,j}^{l,m} \triangleq \frac{\partial n_i^l}{\partial n_j^m}. \tag{2.67}
\]
Pairwise sensitivities for neurons of the same layer are obtained directly, i.e.,
\[
\nu_{i,i}^{l,l} = 1, \qquad \nu_{i,j}^{l,l} = 0, \quad i \neq j. \tag{2.68}
\]
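The layer-to-layer recursion for these pairwise sensitivities continues beyond this excerpt; the sketch below only initializes $\nu$ according to (2.67)–(2.68) and then propagates it forward with the chain-rule step implied by (2.8). The vectorized update and the tanh assumption for the hidden layers are illustrative choices, not taken verbatim from the text:

```python
import numpy as np

def pairwise_sensitivities(W, n, m):
    """Propagate nu^{l,m} = d n^l / d n^m forward from layer m, cf. (2.67)-(2.68).

    W -- list of weight matrices; n -- list of weighted sums from a forward
    pass (n[0] unused, 1 <= m <= L); hidden layers assumed tanh.
    """
    L = len(W)
    S_m = n[m].shape[0]
    nu = np.eye(S_m)                       # nu^{m,m}: identity, eq. (2.68)
    for l in range(m, L):
        # Chain rule through a^l = phi(n^l) and n^{l+1} = W^{l+1} a^l + b^{l+1}
        d_phi = 1.0 - np.tanh(n[l]) ** 2   # phi'(n^l) for tanh hidden layers
        nu = W[l] @ (d_phi[:, None] * nu)  # nu^{l+1,m}
    return nu                              # nu^{L,m}, one row per output neuron
```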