Page 279 - Computational Retinal Image Analysis

2  OCT fluid quantification




                     Fully convolutional networks (FCNs) were proposed soon afterward and achieved
                  dense predictions without the need for fully connected layers [16]. Besides being
                  substantially faster than patch-based models, they allow images of arbitrary size
                  to be segmented and enable image-to-image training. Subsequent segmentation work
                  has therefore adopted the FCN paradigm. Popular and successful semantic
                  segmentation CNN architectures consist of two processing components, an encoder
                  and a decoder [17, 18]. The encoder gradually transforms an input image into a
                  low-dimensional embedding, and the decoder gradually expands this abstract image
                  representation into an image of class labels. The two mappings, that of the
                  encoder from raw images to the embedding needed to generate the label image, and
                  that of the decoder from the embedding to a label image at full input resolution,
                  are learned simultaneously, end to end. A pixel-wise cross-entropy or a smoothed
                  Dice coefficient [19] is typically used as the network's loss function, which is
                  optimized during training. At the output, a softmax layer estimates the
                  probability that each pixel belongs to each class, and pixel-wise class labels
                  are obtained by taking the argmax over the class probabilities (Fig. 3). The
                  current state-of-the-art CNN for medical image segmentation is the U-Net [20],
                  which adds skip (shortcut) connections between the encoder and the decoder to
                  help the decoder recover spatial resolution.
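
The output stage and the two loss functions mentioned above can be illustrated with a minimal NumPy sketch. It is not taken from any of the cited works: the function names, the toy image size, and the random scores are hypothetical, and the soft Dice formulation shown (with an additive smoothing term) is only one common variant, which may differ in detail from the one in [19]. The four classes are assumed to correspond to those of Fig. 3, in an arbitrary order.

```python
import numpy as np

def softmax(logits, axis=0):
    """Convert per-pixel class scores of shape (C, H, W) into probabilities."""
    shifted = logits - logits.max(axis=axis, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def one_hot(labels, num_classes):
    """Turn an (H, W) integer label image into a (C, H, W) one-hot array."""
    return (np.arange(num_classes)[:, None, None] == labels[None]).astype(float)

def cross_entropy(probs, labels, eps=1e-12):
    """Mean pixel-wise cross-entropy between probabilities and integer labels."""
    target = one_hot(labels, probs.shape[0])
    return float(-(target * np.log(probs + eps)).sum(axis=0).mean())

def soft_dice_loss(probs, labels, smooth=1.0):
    """One minus the smoothed Dice coefficient, averaged over classes (assumed variant)."""
    target = one_hot(labels, probs.shape[0])
    intersection = (probs * target).sum(axis=(1, 2))
    denom = probs.sum(axis=(1, 2)) + target.sum(axis=(1, 2))
    dice = (2.0 * intersection + smooth) / (denom + smooth)
    return float(1.0 - dice.mean())

# Toy example: 4 classes on a 2 x 3 "image" with random network outputs.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 2, 3))       # stand-in for the decoder output
labels = rng.integers(0, 4, size=(2, 3))  # stand-in for the reference annotation

probs = softmax(logits)       # per-pixel class probabilities
pred = probs.argmax(axis=0)   # pixel-wise class labels
ce = cross_entropy(probs, labels)
dice = soft_dice_loss(probs, labels)
```

In this sketch both losses are close to zero when the predicted probabilities match the one-hot reference exactly, which is the behavior that training aims to approach.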

                  [Figure: an input OCT slice is passed through a neural network consisting of an
                  encoder (convolution blocks) and a decoder (transposed convolution blocks),
                  yielding label probabilities and the final result. Legend: encoder data
                  representation, decoder data representation, convolution layer, max-pooling
                  layer, unpooling layer.]
                  FIG. 3
                  Convolutional neural network with an encoder-decoder architecture to segment intraretinal
                  fluid (green), subretinal fluid (blue), retinal tissue (red), and nonretinal region (yellow).
                     Reproduced from T. Schlegl, S.M. Waldstein, H. Bogunovic, F. Endstraßer, A. Sadeghipour, A.-M. Philip,
                     D. Podkowinski, B.S. Gerendas, G. Langs, U. Schmidt-Erfurth, Fully automated detection and
                     quantification of macular fluid in OCT using deep learning, Ophthalmology 125 (4) (2018) 549–558,
                     10.1016/J.OPHTHA.2017.10.031.
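
To make the resolution changes in such an encoder-decoder concrete, the following NumPy sketch traces a single feature map through one max-pooling step and one unpooling step, and then forms a U-Net-style skip connection by stacking the encoder and decoder maps. It is an illustration under stated assumptions, not the network of Fig. 3: the helper names are hypothetical, there are no learned convolutions, and unpooling by stored maximum positions is only one possible way to restore resolution.

```python
import numpy as np

def max_pool_2x2(x):
    """2 x 2 max pooling on an (H, W) map with even H and W.

    Returns the pooled map and, for each window, the flat position (0..3)
    of its maximum, which the unpooling step can reuse.
    """
    h, w = x.shape
    windows = (x.reshape(h // 2, 2, w // 2, 2)
                .transpose(0, 2, 1, 3)
                .reshape(h // 2, w // 2, 4))
    return windows.max(axis=-1), windows.argmax(axis=-1)

def unpool_2x2(pooled, idx):
    """Put each pooled value back where its maximum came from; zeros elsewhere."""
    h, w = pooled.shape
    windows = np.zeros((h, w, 4), dtype=pooled.dtype)
    np.put_along_axis(windows, idx[..., None], pooled[..., None], axis=-1)
    return (windows.reshape(h, w, 2, 2)
                   .transpose(0, 2, 1, 3)
                   .reshape(2 * h, 2 * w))

# Toy 4 x 4 encoder feature map.
x = np.array([[1., 2., 0., 0.],
              [3., 4., 0., 5.],
              [6., 0., 1., 1.],
              [0., 0., 1., 2.]])

pooled, idx = max_pool_2x2(x)        # encoder: 4 x 4 -> 2 x 2
restored = unpool_2x2(pooled, idx)   # decoder: 2 x 2 -> 4 x 4

# Skip connection: the full-resolution encoder map is stacked with the
# upsampled decoder map, so later layers see both.
skip = np.stack([x, restored])       # shape (2, 4, 4)
```

The restored map is sparse because pooling discards all but the maximum in each window; the skip connection gives the decoder access to the fine detail that was lost.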