

$\ell(w^\top \alpha^*(D; x_t), (x_t, y_t))$ that captures how well the classifier $w$ may predict $y_t$ when given the coding $\alpha(D; x_t)$. Note that $\alpha$ is computed using the dictionary $D$. The task-driven dictionary learning problem is formulated as the joint determination of the dictionary $D \in \mathcal{D}$ and the classifier $w \in \mathcal{W} \subseteq \mathbb{R}^k$ that minimize the cost $\ell(\alpha(D; x_t), w; (x_t, y_t))$ averaged over the training set:

$$
(D^*, w^*) := \operatorname*{argmin}_{D \in \mathcal{D},\, w \in \mathcal{W}} \mathbb{E}_{x,y}\big[\ell\big(w^\top \alpha^*(D; x_t); (x_t, y_t)\big)\big] \qquad (2.6)
$$
In Eq. (2.6) we specify the estimator $\hat{y}(x) = w^\top \alpha^*(D; x)$, which parameterizes the function class $\mathcal{F}$ as the product set $\mathcal{W} \times \mathcal{D}$. For a given dictionary $D$ and signal sample $x_t$, we compute the code $\alpha^*(D; x_t)$ as the solution of, for instance, a lasso regression problem, then predict $y_t$ using $w$, and measure the prediction error with the loss function $\ell(w^\top \alpha^*(D; x_t), (x_t, y_t))$. The optimal pair $(D^*, w^*)$ in Eq. (2.6) is the one that minimizes this cost averaged over the given sample pairs $(x_t, y_t)$. Observe that $\alpha^*(D; x_t)$ is not a variable of the optimization in Eq. (2.6) but a mapping through which the loss depends implicitly on the dictionary $D$. The optimization problem in Eq. (2.6) is not assumed to be convex; convexity would be a restrictive assumption here because the dependence of $\ell$ on $D$ is, in part, through the mapping $\alpha^*(D; x_t)$ defined
              by some sparse-coding procedure. In general, only local minima of Eq. (2.6)
              can be found. This formulation has, nonetheless, been successful in solving
              practical pattern recognition tasks in vision (Mairal et al., 2012) and robotics
              (Koppel, Fink, Warnell, Stump, & Ribeiro, 2016).
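To make the structure of Eq. (2.6) concrete, the sketch below (Python/NumPy; the function names are illustrative, and the squared loss is an assumption made for brevity) computes the code $\alpha^*(D; x_t)$ by solving the lasso problem with ISTA and then takes alternating stochastic gradient steps on $w$ and $D$. The dictionary step shown is a reconstruction-driven simplification that treats the code as fixed; the exact task-driven gradient differentiates through $\alpha^*(D; x_t)$ (Mairal et al., 2012).

```python
import numpy as np

def ista_lasso(D, x, lam=0.1, n_iters=100):
    """Sparse code alpha*(D; x) = argmin_a 0.5*||x - D a||^2 + lam*||a||_1, via ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth part
    a = np.zeros(D.shape[1])
    for _ in range(n_iters):
        grad = D.T @ (D @ a - x)
        z = a - grad / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft thresholding
    return a

def task_driven_dl(X, y, k=32, lam=0.1, eta=1e-2, epochs=5, seed=0):
    """Toy alternating-SGD sketch of Eq. (2.6) with squared loss
    l(w^T a; y) = 0.5*(w^T a - y)^2.  The dictionary step below treats the
    code a as fixed, a simplification of the exact task-driven gradient."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    D = rng.standard_normal((d, k))
    D /= np.linalg.norm(D, axis=0)          # unit-norm atoms: D in the set \mathcal{D}
    w = np.zeros(k)
    for _ in range(epochs):
        for t in rng.permutation(len(X)):
            a = ista_lasso(D, X[t], lam)    # code alpha*(D; x_t)
            err = w @ a - y[t]              # prediction residual
            w -= eta * err * a              # SGD step on the classifier w
            D += eta * np.outer(X[t] - D @ a, a)             # reconstruction-driven step on D
            D /= np.maximum(np.linalg.norm(D, axis=0), 1.0)  # project atoms back to unit ball
    return D, w
```

Constraining the atoms to the unit ball, as in the projection above, is one standard choice for the compact dictionary set $\mathcal{D}$.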
                 The lack of convexity of Eq. (2.6) means that attaining statistical consis-
              tency for supervised dictionary learning methods is much more challenging
than for GLMs. Indeed, the prevalence of nonconvex stochastic programs arising from statistical learning based on nonlinear transformations of the feature space $\mathcal{X}$ has led to renewed interest in nonconvex optimization methods that apply convex techniques to nonconvex settings (Boyd & Vandenberghe, 2004). This constitutes a form of simulated anneal-
              ing (Bertsimas & Tsitsiklis, 1993) with successive convex approximation
              (Facchinei, Scutari, & Sagratella, 2015). A compelling achievement of this
              recent surge is the hybrid convex-annealing approach, which has been
              shown to be capable of finding a global minimizer (Raginsky, Rakhlin, &
Telgarsky, 2017). However, applying these methods to the training of estimators defined by nonconvex stochastic programs requires far more training examples to reach convergence than convex problems do, and their practical merit requires further demonstration.
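For a concrete instance of the annealing idea, the following minimal sketch (Python/NumPy; the function names and the double-well example are illustrative) implements the stochastic gradient Langevin dynamics recursion analyzed by Raginsky et al. (2017), in which each stochastic gradient step is perturbed by Gaussian noise scaled by an inverse temperature $\beta$.

```python
import numpy as np

def sgld(grad_fn, theta0, n_steps=20_000, eta=1e-2, beta=2.0, seed=0):
    """Stochastic gradient Langevin dynamics (SGLD):
    theta <- theta - eta * g + sqrt(2 * eta / beta) * N(0, I),
    where g is a stochastic gradient and beta is the inverse temperature.
    For suitable eta and beta the iterates concentrate near global minimizers
    of the expected loss (Raginsky, Rakhlin, & Telgarsky, 2017)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(n_steps):
        g = grad_fn(theta, rng)                    # noisy gradient estimate
        noise = rng.standard_normal(theta.shape)   # isotropic Gaussian perturbation
        theta += -eta * g + np.sqrt(2.0 * eta / beta) * noise
    return theta

# Illustrative nonconvex objective: the tilted double well
# loss(t) = (t^2 - 1)^2 + 0.3 t, whose deeper minimum lies near t = -1.
def double_well_grad(theta, rng):
    # exact gradient plus noise standing in for minibatch subsampling
    return 4.0 * theta * (theta ** 2 - 1.0) + 0.3 + 0.1 * rng.standard_normal(theta.shape)

theta_final = sgld(double_well_grad, theta0=np.array([1.0]))
print(theta_final)  # the injected noise lets the chain escape toward t = -1
```

Smaller $\beta$ (higher temperature) makes barrier crossings more frequent at the price of a more diffuse stationary distribution, mirroring the annealing trade-off noted above.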