Page 47 - Handbook of Deep Learning in Biomedical Engineering Techniques and Applications
Chapter 2 Deep convolutional neural network in medical image processing
After 2012, the study of novel models took off, and in recent years
interest has shifted toward deeper architectures. Instead of a single
layer of kernels with a large receptive field, a similar function can
be represented with fewer parameters by stacking smaller kernels.
These deeper designs usually have a lower memory footprint during
inference, which is helpful for deployment on mobile smart computing
devices. Simonyan and Zisserman [43] were among the first to study
much deeper networks and employed small, fixed-size kernels in each
layer. In 2014, their 19-layer network was among the top performers in
the ImageNet challenge and is generally referred to as OxfordNet or
VGG19.
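The parameter saving from stacking small kernels can be checked with a quick back-of-the-envelope computation. The sketch below, in plain Python, assumes an equal number of input and output channels per layer and ignores biases (both are simplifying assumptions, not details from the text):

```python
def conv_params(kernel_size: int, channels: int) -> int:
    """Weight count for one conv layer with `channels` input and
    output feature maps (biases ignored)."""
    return kernel_size * kernel_size * channels * channels

channels = 64

# A single 7x7 kernel has a 7x7 receptive field.
single_large = conv_params(7, channels)

# Three stacked 3x3 kernels cover the same 7x7 receptive field.
stacked_small = 3 * conv_params(3, channels)

print(single_large)   # 49 * 64 * 64 = 200704
print(stacked_small)  # 27 * 64 * 64 = 110592
```

The stacked design needs roughly 45% fewer weights for the same receptive field, which is the memory advantage the text refers to.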
More complex blocks have been added on top of these deeper networks
to improve training performance and reduce the number of parameters.
The authors of Ref. [44] designed a 22-layer architecture named
GoogLeNet, also known as Inception, which used the so-called
inception blocks [45]. He et al. [46] introduced the ResNet
architecture, which won the ImageNet challenge in 2015 and is built
from residual blocks. The main purpose of a residual block is to
learn the residual rather than the full mapping, so each layer is
preconditioned toward learning a mapping close to the identity
function. In this way, even deeper architectures can be trained
effectively.
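The residual idea can be sketched as follows: instead of learning a mapping H(x) directly, the block learns F(x) = H(x) − x and outputs F(x) + x, so when the branch weights are near zero the block is near the identity. A minimal NumPy sketch, where the two-layer transform and its shapes are illustrative assumptions rather than the exact ResNet design:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Return x + F(x), where F is a small two-layer transform.

    With w1 and w2 at zero, F(x) = 0 and the block is exactly the
    identity mapping -- the preconditioning the text describes.
    """
    f = relu(x @ w1) @ w2   # residual branch F(x)
    return x + f            # skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))

# Zero branch weights -> the block reduces to the identity.
w_zero = np.zeros((16, 16))
print(np.allclose(residual_block(x, w_zero, w_zero), x))  # True
```

Because the skip connection passes x through unchanged, gradients also flow directly through it, which is what makes very deep stacks of such blocks trainable.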
Since 2014, performance on the ImageNet benchmark has saturated, and
it is now hard to assess whether small gains in accuracy can be
attributed to more complex and better models. The lower memory
footprint these models provide is typically not that important for
medical image applications. AlexNet and other simple architectures
such as VGG are still popular for clinical data, although the version
of GoogLeNet called Inception v3 has recently been used by many
researchers [47–49].
3.1.2 Multistream architectures
The traditional CNN model can accommodate multiple sources of
information without difficulty, presented as channels at the input
layer. This basic idea can be extended further: channels can be
merged at any point in the network. Considering that different tasks
call for different ways of merging, several multistream models have
been discussed. These architectures, sometimes also termed
dual-pathway models [50], have two major applications that are
appropriate for medical image