$$x'_{i+1} = \mathrm{clip}_{\epsilon,x}\!\left(x'_i + \alpha\,\mathrm{sign}\!\left(\nabla_x \mathrm{Loss}(x'_i, l_x)\right)\right)$$
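For illustration, a minimal PyTorch sketch of this iterative update is given below; the classifier model, the loss function loss_fn, and the values of eps, alpha, and steps are placeholders chosen for illustration, not the settings used in this chapter.

import torch

def igsm_attack(model, loss_fn, x, label, eps=0.03, alpha=0.01, steps=10):
    # Repeatedly take gradient-sign steps on the loss, clipping the result
    # back into the eps-ball around the clean image x and into [0, 1].
    x = x.detach()
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), label)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()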
In contrast to FGSM and IGSM, DeepFool (Moosavi-Dezfooli, Fawzi, & Frossard, 2016) attempts to find a perturbed image x' from a normal image x by finding the closest decision boundary and crossing it. In practice, DeepFool relies on a local linearized approximation of the decision boundary.
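A simplified PyTorch sketch of this multiclass linearized iteration for a single image is shown below; num_classes, max_iter, and overshoot are illustrative parameters, and the code is a sketch of the idea rather than the authors' reference implementation.

import torch

def deepfool(model, x, num_classes=10, max_iter=50, overshoot=0.02):
    # Move the image across the closest (linearized) decision boundary.
    x_adv = x.clone().detach()
    orig_label = model(x_adv).argmax(dim=1).item()
    for _ in range(max_iter):
        x_adv.requires_grad_(True)
        logits = model(x_adv)[0]
        if logits.argmax().item() != orig_label:
            break  # the decision boundary has been crossed
        grad_orig = torch.autograd.grad(logits[orig_label], x_adv,
                                        retain_graph=True)[0]
        best_ratio, best_step = None, None
        for k in range(num_classes):
            if k == orig_label:
            	continue
            grad_k = torch.autograd.grad(logits[k], x_adv,
                                         retain_graph=True)[0]
            w_k = grad_k - grad_orig                       # linearized boundary normal
            f_k = (logits[k] - logits[orig_label]).item()  # signed logit gap
            ratio = abs(f_k) / (w_k.norm() + 1e-8)         # distance to boundary k
            if best_ratio is None or ratio < best_ratio:
                best_ratio = ratio
                best_step = (abs(f_k) / (w_k.norm() ** 2 + 1e-8)) * w_k
        # Step just past the closest linearized boundary.
        x_adv = (x_adv + (1 + overshoot) * best_step).detach()
    return x_adv.detach()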
Another attack method that has received a lot of attention is the Carlini attack, which relies on finding a perturbation δ that minimizes both the size of the change and a hinge loss on the logits (the presoftmax vector of classification scores). The attack is generated by solving the following optimization problem:
$$\min_{\delta}\left[\,\|\delta\|_2 + c\,\max\!\left(Z(x')_{l_x} - \max\{Z(x')_i : i \neq l_x\},\; -\kappa\right)\right]$$
where Z denotes the logits, l_x is the ground-truth label, κ is the confidence (raising it forces the search toward larger perturbations), and c is a hyperparameter that balances the perturbation term against the hinge loss.
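As a rough PyTorch sketch, the objective being minimized could be written as follows, where cw_objective, c, and kappa are illustrative names and label is an integer class index; the perturbation δ would then be optimized with a gradient-based optimizer such as Adam.

import torch

def cw_objective(model, x, delta, label, c=1.0, kappa=0.0):
    # L2 size of the perturbation plus the hinge loss on the logits:
    # max( Z(x')_{l_x} - max_{i != l_x} Z(x')_i , -kappa ).
    logits = model(x + delta)[0]      # logits Z(x') for a single image
    true_logit = logits[label]
    other_max = torch.max(torch.cat([logits[:label], logits[label + 1:]]))
    hinge = torch.clamp(true_logit - other_max, min=-kappa)
    return delta.norm(p=2) + c * hinge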
Another attack method is the projected gradient descent (PGD) attack proposed in Madry, Makelov, Schmidt, Tsipras, and Vladu (2017). PGD attempts to solve the following constrained optimization problem:
$$\max_{\|x^{adv} - x\|_\infty \le \epsilon} \mathrm{Loss}(x^{adv}, l_x)$$
where the constraint on the allowed perturbation is usually given as a bound ε on the l_∞ norm of x^{adv} − x, and l_x is the ground-truth label of x. Projected gradient descent is used to solve this constrained optimization problem by restarting PGD from several random points in the l_∞ ball around the data point x. This procedure increases the loss function Loss in a fairly consistent way before reaching a plateau, the final loss values across restarts are fairly well concentrated, and the achieved maximum is considerably higher than the loss at a random point in the dataset. In this chapter, we focus on the PGD attack because it has been shown to be a universal first-order adversary (Madry et al., 2017); that is, developing detection capability or resilience against PGD also implies defense against many other first-order attacks.
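A minimal sketch of this restarted PGD procedure in PyTorch is given below; eps, alpha, steps, and restarts are illustrative placeholders, and the restart that achieves the highest final loss is kept.

import torch

def pgd_attack(model, loss_fn, x, label, eps=0.03, alpha=0.007,
               steps=40, restarts=5):
    # Projected gradient ascent on the loss with random restarts inside
    # the l_inf ball of radius eps around the clean image x.
    x = x.detach()
    best_adv, best_loss = x.clone(), -float('inf')
    for _ in range(restarts):
        # Random starting point inside the eps-ball.
        x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = loss_fn(model(x_adv), label)
            grad = torch.autograd.grad(loss, x_adv)[0]
            # Gradient-sign step, then project back onto the eps-ball.
            x_adv = x_adv.detach() + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
        final_loss = loss_fn(model(x_adv), label).item()
        if final_loss > best_loss:
            best_loss, best_adv = final_loss, x_adv.detach()
    return best_adv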
Defense of neural networks against adversarial examples is more difficult than generating attacks. Madry et al. (2017) propose a generic saddle point formulation, where D is the underlying training data distribution and Loss(θ, x, l_x) is a loss function at data point x with ground-truth label l_x for a model with parameters θ:
$$\min_{\theta}\; \mathbb{E}_{(x,\,l_x)\sim D}\left[\max_{\|x^{adv} - x\|_\infty \le \epsilon} \mathrm{Loss}(\theta, x^{adv}, l_x)\right]$$
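A schematic sketch of this saddle-point training loop is given below, reusing the pgd_attack sketch above for the inner maximization; the optimizer, data_loader, and eps are placeholders, and the outer minimization over θ is an ordinary stochastic gradient step on the loss at the adversarial examples.

def adversarial_training_epoch(model, loss_fn, optimizer, data_loader, eps=0.03):
    # One epoch of adversarial training: inner maximization via PGD,
    # outer minimization over the model parameters theta via the optimizer.
    model.train()
    for x, label in data_loader:
        x_adv = pgd_attack(model, loss_fn, x, label, eps=eps, restarts=1)
        optimizer.zero_grad()
        loss = loss_fn(model(x_adv), label)   # Loss(theta, x_adv, l_x)
        loss.backward()
        optimizer.step()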