
FIGURE 15.4
An iconic image from an online magazine captioned by an evolved model. The model provides a suitably detailed description without any unnecessary context.

The generated captions can be added to the "alt" field of images, which screen readers can then read to blind Internet users (Fig. 15.4).
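
As an illustration of how such a pipeline could hook into a web page, the following sketch fills in missing "alt" attributes with model-generated captions. It is not code from this work: generate_caption is a hypothetical stand-in for the evolved captioning network, and BeautifulSoup is used here only as one convenient HTML parser.

    from bs4 import BeautifulSoup

    def generate_caption(image_url):
        """Hypothetical stand-in for running the evolved captioning model."""
        raise NotImplementedError

    def add_alt_captions(html):
        soup = BeautifulSoup(html, "html.parser")
        for img in soup.find_all("img"):
            # Only fill in images that lack meaningful alt text.
            if not img.get("alt"):
                img["alt"] = generate_caption(img.get("src", ""))
        return str(soup)

In practice such a tool could run as a browser extension or a server-side filter; either way, the captioning model itself is the only nontrivial component.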


5.3 IMAGE CAPTIONING RESULTS
With training parallelized across about 100 GPUs, each generation took around 1 h to complete. The fittest architecture was discovered in generation 37 (Fig. 15.5). This architecture performs better than the hand-tuned baseline [49] when trained on the MSCOCO data alone (Table 15.3).
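
The timing above follows from evaluating one candidate per GPU, so a whole generation takes roughly as long as a single training run. The following minimal sketch shows this structure; train_and_score and mutate are hypothetical placeholders for whatever training procedure and variation operators the system actually uses.

    from concurrent.futures import ProcessPoolExecutor
    import random

    def train_and_score(architecture):
        """Train one candidate network and return its validation fitness (placeholder)."""
        raise NotImplementedError

    def mutate(architecture):
        """Produce a varied child architecture (placeholder)."""
        raise NotImplementedError

    def next_generation(population, n_workers=100):
        # One candidate per worker/GPU: the generation finishes in roughly
        # the time of a single training run (~1 h here).
        with ProcessPoolExecutor(max_workers=n_workers) as pool:
            scores = list(pool.map(train_and_score, population))
        ranked = [arch for _, arch in sorted(zip(scores, population),
                                             key=lambda pair: pair[0],
                                             reverse=True)]
        parents = ranked[:max(1, len(ranked) // 2)]   # truncation selection
        return [mutate(random.choice(parents)) for _ in population]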
However, a more important result is the performance of this network on the magazine website. Because no suitable automatic metrics exist for the types of captions collected for the magazine website (and existing metrics are very noisy when there is only one reference caption), the captions generated by the evolved model on all 3100 holdout images were manually rated as correct, mostly correct, mostly incorrect, or incorrect (Fig. 15.6). Fig. 15.7 shows examples of good and bad captions for these images.
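
For reference, the percentages reported below reduce the four-way manual ratings to a single "correct or mostly correct" fraction, as in this small sketch (the ratings list shown is illustrative, not the study's data):

    from collections import Counter

    def summarize(ratings):
        """ratings: one of 'correct', 'mostly correct', 'mostly incorrect',
        or 'incorrect' per holdout image; returns the percentage rated
        correct or mostly correct."""
        counts = Counter(ratings)
        acceptable = counts["correct"] + counts["mostly correct"]
        return 100.0 * acceptable / len(ratings)

    print(summarize(["correct", "mostly correct",
                     "mostly incorrect", "incorrect"]))  # -> 50.0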
The model is not perfect, but the results are promising. The evolved network is correct or mostly correct on 63% of iconic images and 31% of all images. Many known improvements can still be implemented, including ensembling diverse architectures generated by evolution, fine-tuning the ImageNet model, using a more recent ImageNet model, and performing beam search or scheduled sampling during training [54] (preliminary experiments with ensembling alone suggest improvements of about 20%). For this application, it is also important to include methods for automatically evaluating caption quality and for filtering captions that would give an incorrect impression to a blind user. However, even without these additions, the results demonstrate that it is now possible to develop practical applications by evolving DNNs.
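
As one example of the improvements listed above, ensembling several evolved captioners can be sketched as averaging their per-token output distributions at each decoding step. The step and init_state methods below are hypothetical APIs assumed for illustration, not part of the described system.

    import numpy as np

    def ensemble_step(models, states, prev_token):
        """Average next-token distributions across ensemble members."""
        probs, new_states = [], []
        for model, state in zip(models, states):
            p, s = model.step(prev_token, state)   # hypothetical per-model API
            probs.append(p)
            new_states.append(s)
        return np.mean(probs, axis=0), new_states

    def ensemble_decode(models, image, max_len=20, bos=1, eos=0):
        states = [m.init_state(image) for m in models]  # hypothetical
        tokens = [bos]
        for _ in range(max_len):
            p, states = ensemble_step(models, states, tokens[-1])
            tokens.append(int(np.argmax(p)))  # greedy; beam search would extend this
            if tokens[-1] == eos:
                break
        return tokens

Averaging distributions (rather than picking one model's output) lets diverse architectures correct each other's token-level mistakes, which is consistent with the ~20% improvement seen in the preliminary ensembling experiments.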