

$\ell(w^\top \alpha^*(D; x_t), (x_t, y_t))$ that captures how well the classifier $w$ may predict $y_t$ when given the coding $\alpha(D; x_t)$. Note that $\alpha$ is computed using the dictionary $D$. The task-driven dictionary learning problem is formulated as the joint determination of the dictionary $D \in \mathcal{D}$ and the classifier $w \in \mathcal{W} \subseteq \mathbb{R}^k$ that minimize the cost $\ell(\alpha(D; x_t), w; (x_t, y_t))$ averaged over the training set:

$$
(D^*, w^*) := \operatorname*{argmin}_{D \in \mathcal{D},\, w \in \mathcal{W}} \mathbb{E}_{x,y}\big[\ell\big(w^\top \alpha^*(D; x_t); (x_t, y_t)\big)\big] \qquad (2.6)
$$
In Eq. (2.6) we specify the estimator $\hat{y}(x) = w^\top \alpha^*(D; x)$, which parameterizes the function class $\mathcal{F}$ as the product set $\mathcal{W} \times \mathcal{D}$. For a given dictionary $D$ and signal sample $x_t$, we compute the code $\alpha^*(D; x_t)$ as the solution of, for instance, a lasso regression problem, then predict $y_t$ using $w$, and measure the prediction error with the loss function $\ell(w^\top \alpha^*(D; x_t), (x_t, y_t))$. The optimal pair $(D^*, w^*)$ in Eq. (2.6) is the one that minimizes this cost averaged over the given sample pairs $(x_t, y_t)$. Observe that $\alpha^*(D; x_t)$ is not a variable of the optimization in Eq. (2.6) but a mapping through which the loss depends implicitly on the dictionary $D$. The optimization problem in Eq. (2.6) is not assumed to be convex; convexity would be a restrictive assumption here because the dependence of $\ell$ on $D$ is, in part, through the mapping $\alpha^*(D; x_t)$ defined
              by some sparse-coding procedure. In general, only local minima of Eq. (2.6)
              can be found. This formulation has, nonetheless, been successful in solving
              practical pattern recognition tasks in vision (Mairal et al., 2012) and robotics
              (Koppel, Fink, Warnell, Stump, & Ribeiro, 2016).
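To make the structure of Eq. (2.6) concrete, the sketch below (Python/NumPy; the function names are illustrative, and the squared loss is an assumption made for brevity) computes the code $\alpha^*(D; x_t)$ by solving the lasso problem with ISTA and then takes alternating stochastic gradient steps on $w$ and $D$. The dictionary step shown is a reconstruction-driven simplification that treats the code as fixed; the exact task-driven gradient differentiates through $\alpha^*(D; x_t)$ (Mairal et al., 2012).

```python
import numpy as np

def ista_lasso(D, x, lam=0.1, n_iters=100):
    """Sparse code alpha*(D; x) = argmin_a 0.5*||x - D a||^2 + lam*||a||_1, via ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth part
    a = np.zeros(D.shape[1])
    for _ in range(n_iters):
        grad = D.T @ (D @ a - x)
        z = a - grad / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft thresholding
    return a

def task_driven_dl(X, y, k=32, lam=0.1, eta=1e-2, epochs=5, seed=0):
    """Toy alternating-SGD sketch of Eq. (2.6) with squared loss
    l(w^T a; y) = 0.5*(w^T a - y)^2.  The dictionary step below treats the
    code a as fixed, a simplification of the exact task-driven gradient."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    D = rng.standard_normal((d, k))
    D /= np.linalg.norm(D, axis=0)          # unit-norm atoms: D in the set \mathcal{D}
    w = np.zeros(k)
    for _ in range(epochs):
        for t in rng.permutation(len(X)):
            a = ista_lasso(D, X[t], lam)    # code alpha*(D; x_t)
            err = w @ a - y[t]              # prediction residual
            w -= eta * err * a              # SGD step on the classifier w
            D += eta * np.outer(X[t] - D @ a, a)             # reconstruction-driven step on D
            D /= np.maximum(np.linalg.norm(D, axis=0), 1.0)  # project atoms back to unit ball
    return D, w
```

Constraining the atoms to the unit ball, as in the projection above, is one standard choice for the compact dictionary set $\mathcal{D}$.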
                 The lack of convexity of Eq. (2.6) means that attaining statistical consis-
              tency for supervised dictionary learning methods is much more challenging
than for GLMs. Indeed, the prevalence of nonconvex stochastic programs arising from statistical learning based on nonlinear transformations of the feature space $\mathcal{X}$ has led to renewed interest in nonconvex optimization methods that apply convex techniques to nonconvex settings (Boyd & Vandenberghe, 2004). This constitutes a form of simulated anneal-
              ing (Bertsimas & Tsitsiklis, 1993) with successive convex approximation
              (Facchinei, Scutari, & Sagratella, 2015). A compelling achievement of this
              recent surge is the hybrid convex-annealing approach, which has been
              shown to be capable of finding a global minimizer (Raginsky, Rakhlin, &
Telgarsky, 2017). However, applying these methods to the training of estimators defined by nonconvex stochastic programs requires far more training examples to reach convergence than convex problems do, and their practical merit requires further demonstration.
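For a concrete instance of the annealing idea, the following minimal sketch (Python/NumPy; the function names and the double-well example are illustrative) implements the stochastic gradient Langevin dynamics recursion analyzed by Raginsky et al. (2017), in which each stochastic gradient step is perturbed by Gaussian noise scaled by an inverse temperature $\beta$.

```python
import numpy as np

def sgld(grad_fn, theta0, n_steps=20_000, eta=1e-2, beta=2.0, seed=0):
    """Stochastic gradient Langevin dynamics (SGLD):
    theta <- theta - eta * g + sqrt(2 * eta / beta) * N(0, I),
    where g is a stochastic gradient and beta is the inverse temperature.
    For suitable eta and beta the iterates concentrate near global minimizers
    of the expected loss (Raginsky, Rakhlin, & Telgarsky, 2017)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(n_steps):
        g = grad_fn(theta, rng)                    # noisy gradient estimate
        noise = rng.standard_normal(theta.shape)   # isotropic Gaussian perturbation
        theta += -eta * g + np.sqrt(2.0 * eta / beta) * noise
    return theta

# Illustrative nonconvex objective: the tilted double well
# loss(t) = (t^2 - 1)^2 + 0.3 t, whose deeper minimum lies near t = -1.
def double_well_grad(theta, rng):
    # exact gradient plus noise standing in for minibatch subsampling
    return 4.0 * theta * (theta ** 2 - 1.0) + 0.3 + 0.1 * rng.standard_normal(theta.shape)

theta_final = sgld(double_well_grad, theta0=np.array([1.0]))
print(theta_final)  # the injected noise lets the chain escape toward t = -1
```

Smaller $\beta$ (higher temperature) makes barrier crossings more frequent at the price of a more diffuse stationary distribution, mirroring the annealing trade-off noted above.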