Since the activations of the neurons of some layer $m$ do not affect the neurons of the preceding layers $l < m$, the corresponding pairwise sensitivities are identically zero, i.e.,

$$\nu_{i,j}^{l,m} = 0, \quad m > l. \tag{2.69}$$
The remaining pairwise sensitivities are computed during the forward pass, along with the weighted sums $n_i^l$ and the activations $a_i^l$, i.e.,

$$\nu_{i,j}^{l,m} = \sum_{k=1}^{S^{l-1}} w_{i,k}^{l}\,(\varphi^{l-1})'(n_k^{l-1})\,\nu_{k,j}^{l-1,m}, \quad l = 2,\ldots,L. \tag{2.70}$$
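To make this recursion concrete, the following minimal NumPy sketch (all names hypothetical) propagates the pairwise sensitivities of Eqs. (2.69)–(2.70) for a fixed reference layer $m$ alongside the ordinary forward pass. Since Eq. (2.68) is not reproduced on this page, the base case $\nu_{i,j}^{m,m} = \delta_{ij}$ is assumed, consistent with reading $\nu_{i,j}^{l,m}$ as $\partial n_i^l / \partial n_j^m$.

```python
import numpy as np

def forward_with_sensitivities(W, b, phi, dphi, x, m):
    """Forward pass that also propagates the pairwise sensitivities
    nu[l][i, j] ~ d n^l_i / d n^m_j for a fixed reference layer m.

    W and b are dicts keyed by layer index 1..L; phi and dphi are the
    activation function and its first derivative (vectorized); x is the
    network input a^0.
    """
    L = len(W)
    n, a = {}, {0: x}
    nu = {}                      # nu[l] has shape (S^l, S^m)
    for l in range(1, L + 1):
        n[l] = W[l] @ a[l - 1] + b[l]
        a[l] = phi(n[l])
        if l < m:
            nu[l] = np.zeros((W[l].shape[0], W[m].shape[0]))  # Eq. (2.69)
        elif l == m:
            nu[l] = np.eye(W[m].shape[0])  # assumed base case, cf. Eq. (2.68)
        else:
            # Eq. (2.70): nu^{l,m} = W^l diag(phi'(n^{l-1})) nu^{l-1,m}
            nu[l] = W[l] @ (dphi(n[l - 1])[:, None] * nu[l - 1])
    return n, a, nu
```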
Finally, the derivatives of the neural network outputs with respect to the parameters are expressed in terms of the pairwise sensitivities, i.e.,

$$\frac{\partial a_i^L}{\partial b_j^m} = (\varphi^{L})'(n_i^L)\,\nu_{i,j}^{L,m}, \qquad \frac{\partial a_i^L}{\partial w_{j,k}^m} = (\varphi^{L})'(n_i^L)\,\nu_{i,j}^{L,m}\,a_k^{m-1}. \tag{2.71}$$
If we additionally define the sensitivities of the weighted sums with respect to the network inputs,

$$\nu_{i,j}^{l,0} \triangleq \frac{\partial n_i^l}{\partial a_j^0}, \tag{2.72}$$

then we obtain the derivatives of the network outputs with respect to the network inputs. First, we compute these additional sensitivities during the forward pass, i.e.,

$$\nu_{i,j}^{1,0} = w_{i,j}^{1}, \qquad \nu_{i,j}^{l,0} = \sum_{k=1}^{S^{l-1}} w_{i,k}^{l}\,(\varphi^{l-1})'(n_k^{l-1})\,\nu_{k,j}^{l-1,0}, \quad l = 2,\ldots,L. \tag{2.73}$$

Then, the derivatives of the network outputs with respect to the network inputs are expressed in terms of the additional sensitivities, i.e.,

$$\frac{\partial a_i^L}{\partial a_j^0} = (\varphi^{L})'(n_i^L)\,\nu_{i,j}^{L,0}. \tag{2.74}$$
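Eqs. (2.72)–(2.74) admit the same treatment as above; the following sketch (again with hypothetical names) accumulates the input sensitivities in a single forward pass and returns the full input Jacobian:

```python
def input_jacobian(W, b, phi, dphi, x):
    """Jacobian d a^L_i / d a^0_j of the network outputs with respect to
    the network inputs, via Eqs. (2.72)-(2.74)."""
    L = len(W)
    n, a = {}, {0: x}
    for l in range(1, L + 1):
        n[l] = W[l] @ a[l - 1] + b[l]
        a[l] = phi(n[l])
    nu0 = W[1]                                        # Eq. (2.73), base case
    for l in range(2, L + 1):
        nu0 = W[l] @ (dphi(n[l - 1])[:, None] * nu0)  # Eq. (2.73), recurrence
    return dphi(n[L])[:, None] * nu0                  # Eq. (2.74)
```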
Backpropagation algorithm for error gradient and Hessian [66]. First, we perform a forward pass to compute the weighted sums $n_i^l$ and the activations $a_i^l$ according to Eqs. (2.8), and also to compute the pairwise sensitivities $\nu_{i,j}^{l,m}$ according to (2.68)–(2.70).

We define the error function second-order sensitivities with respect to the weighted sums as follows:

$$\delta_{i,j}^{l,m} \triangleq \frac{\partial^2 E}{\partial n_i^l\,\partial n_j^m}. \tag{2.75}$$

Next, during a backward pass we compute the error function sensitivities $\delta_i^l$ as well as the second-order sensitivities $\delta_{i,j}^{l,m}$. According to Schwarz's theorem on the equality of mixed partials, which applies due to the continuity of the second partial derivatives of the error function with respect to the weighted sums, we have $\delta_{i,j}^{l,m} = \delta_{j,i}^{m,l}$. Hence, we need to compute the second-order sensitivities only for the case $m \leq l$.

Second-order sensitivities for the output layer neurons are obtained directly, i.e.,

$$\delta_{i,j}^{L,m} = \omega_i \left[ \bigl((\varphi^{L})'(n_i^L)\bigr)^2 - \bigl(\tilde y_i - a_i^L\bigr)\,(\varphi^{L})''(n_i^L) \right] \nu_{i,j}^{L,m}, \tag{2.76}$$

while second-order sensitivities for the hidden layer neurons are computed during a backward pass, i.e.,

$$\delta_{i,j}^{l,m} = (\varphi^{l})'(n_i^l) \sum_{k=1}^{S^{l+1}} w_{k,i}^{l+1}\,\delta_{k,j}^{l+1,m} + (\varphi^{l})''(n_i^l)\,\nu_{i,j}^{l,m} \sum_{k=1}^{S^{l+1}} w_{k,i}^{l+1}\,\delta_k^{l+1}, \quad l = L-1,\ldots,1. \tag{2.77}$$
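A sketch of this backward pass for a fixed reference layer $m$, with $\nu$ taken from the forward-pass sketch above. It assumes the weighted squared error $E = \tfrac{1}{2}\sum_i \omega_i (\tilde y_i - a_i^L)^2$, which is consistent with the form of Eq. (2.76) but is stated here as an assumption; all function names are illustrative.

```python
def backward_first_and_second_order(W, n, a, nu, dphi, d2phi, y, omega, m):
    """Backward pass computing the first-order sensitivities delta^l and,
    for a fixed reference layer m, the second-order sensitivities
    delta^{l,m} of Eqs. (2.76)-(2.77).

    Assumes E = 1/2 * sum_i omega_i * (y_i - a^L_i)^2, so that
    delta^L_i = -omega_i * (y_i - a^L_i) * phi'^L(n^L_i).
    """
    L = len(W)
    e = y - a[L]                              # output error
    delta = {L: -omega * e * dphi(n[L])}      # first-order, output layer
    # Eq. (2.76): second-order sensitivities at the output layer.
    D = {L: (omega * (dphi(n[L]) ** 2 - e * d2phi(n[L])))[:, None] * nu[L]}
    for l in range(L - 1, 0, -1):
        back = W[l + 1].T @ delta[l + 1]      # sum_k w^{l+1}_{k,i} delta^{l+1}_k
        delta[l] = dphi(n[l]) * back
        # Eq. (2.77): backward recurrence for the second-order sensitivities.
        D[l] = (dphi(n[l])[:, None] * (W[l + 1].T @ D[l + 1])
                + (d2phi(n[l]) * back)[:, None] * nu[l])
    return delta, D
```

The symmetry $\delta_{i,j}^{l,m} = \delta_{j,i}^{m,l}$ noted above means such a pass is needed only for $m \leq l$; the remaining Hessian blocks follow by transposition.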