2.5 ADVERSARIAL LEARNING IN DNN
Unfortunately, these models have been shown to be very brittle and vulnerable to specially crafted adversarial perturbations to examples: given an input x and any target classification t, it is possible to find a new input x′ that is similar to x but classified as t. These adversarial examples often appear almost indistinguishable from natural data to human perception and yet are incorrectly classified by the neural network. Recent results have shown that the accuracy of neural networks can be reduced from close to 100% to below 5%
using adversarial examples. This creates a significant challenge in deploying
these deep learning models in security-critical domains where adversarial
activity is intrinsic, such as IoBT, cyber networks, and surveillance. The
use of neural networks in computer vision and speech recognition has
brought these models into the center of security-critical systems where
authentication depends on these machine-learned models. How do we
ensure that adversaries in these domains do not exploit the limitations of
ML models to go undetected or trigger an unintended outcome?
Multiple methods have been proposed in the literature to generate adversarial examples as well as to defend against them. Adversarial
example-generation methods include both white-box and black-box attacks
on neural networks (Goodfellow, Shlens, & Szegedy, 2014; Papernot et al.,
2017; Papernot, McDaniel, Jha, et al., 2016; Szegedy et al., 2013), targeting
feed-forward classification networks (Carlini & Wagner, 2016), generative
networks (Kos, Fischer, & Song, 2017), and recurrent neural networks
(Papernot, McDaniel, Swami, & Harang, 2016). These methods leverage
gradient-based optimization for normal examples to discover perturbations
that lead to misprediction—the techniques differ in defining the neighbor-
hood in which perturbation is permitted and the loss function used to guide
the search. For example, one of the earliest attacks (Goodfellow et al., 2014) used the fast gradient sign method (FGSM), which looks for a similar image x′ in the L∞ neighborhood of x. Given a loss function Loss(x, l) specifying the cost of classifying the point x as label l, the adversarial example x′ is calculated as:

x' = x + \epsilon \cdot \mathrm{sign}\left(\nabla_x \mathrm{Loss}(x, l_x)\right)
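As an illustration, the following is a minimal sketch of this attack in PyTorch (not from the chapter): it assumes a pretrained classifier model, a batch of inputs x with true labels label, cross-entropy as Loss, and pixel values in [0, 1]; the names fgsm_attack, model, x, label, and epsilon are ours.

import torch
import torch.nn.functional as F

def fgsm_attack(model, x, label, epsilon):
    # Untargeted FGSM: x' = x + epsilon * sign(grad_x Loss(x, l_x)).
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), label)   # Loss(x, l_x)
    loss.backward()                               # fills x_adv.grad with grad_x Loss
    x_adv = x_adv + epsilon * x_adv.grad.sign()   # one step in the sign of the gradient
    return x_adv.clamp(0.0, 1.0).detach()         # keep pixels in a valid range

Here epsilon plays the role of ε above; a targeted variant would instead subtract epsilon multiplied by the sign of the gradient of the loss for the target label t.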
FGSM was improved to an iterative gradient sign method (IGSM) in Kurakin, Goodfellow, and Bengio (2016) by using a finer iterative optimization strategy, where the attack performs FGSM with a smaller step width α and clips the updated result so that the image stays within the ε-boundary of x. In this approach, the ith iteration computes the following: