machine learning requires a set of handcrafted features that are "learned" with the aid of ground truth (supervised learning), clustered based on intrinsic properties (unsupervised learning), or handled somewhere in between. Given enough training examples, deep networks learn their own set of features through internal convolutions with sets of filters, generally performed at different scales and abstractions of the input data. If large amounts of labeled data are available, along with sufficient computing power for training, one is likely to see performance gains over traditional methods on the same data.
Deep CNNs are best known for their classification abilities: given an input image, predict a level of disease, for example a diabetic retinopathy grade. They can also be used for regression tasks, as here. In this case, the training labels are the OD and fovea center points marked by two graders on the Messidor and Kaggle datasets.
The first step is to preprocess the data. Remarkably, the only preprocessing applied to the images is to convert each image to grayscale, resize it to 256 × 256, and perform contrast-limited adaptive histogram equalization [24].
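As a concrete illustration of this preprocessing, the following is a minimal sketch using OpenCV; the function name and the CLAHE parameters (clip limit and tile size) are our own assumptions, as the text does not specify them.

    import cv2

    def preprocess_fundus(path):
        # Grayscale -> 256 x 256 resize -> CLAHE, as described in the text.
        # The clip limit and tile grid size below are illustrative guesses.
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)   # load as grayscale
        gray = cv2.resize(gray, (256, 256))             # fixed network input size
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        return clahe.apply(gray)                        # contrast-limited equalization

The deep CNN is built from a combination of standard layers: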
Convolutional layer—sets of filters convolved with the input to produce feature maps.
Pooling layer—a form of subsampling of the convolutional feature maps. In this case max-pooling is used, where a window slides over a feature map and the maximum value in that window is selected.
Dropout layer—randomly drops (ignores) the outputs of hidden units with a certain probability, to prevent overfitting during training.
Fully connected layer—each node in this layer is connected to every node in the previous layer. This is usually the final layer of a deep CNN.

All layers except the output layer use a Rectified Linear Unit (ReLU) [25] as an activation function, defined as:

    ϕ: x → max(0, x)                   (10)

So the output of any particular node is 0 if its input is less than 0 and x otherwise. The output layer instead uses a linear function to combine the activations feeding into it. The layers used in this architecture can be visualized in Fig. 6.
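For concreteness, a comparable stack can be assembled in a few lines of Keras. This is a minimal sketch of the layer types just described, not the authors' exact architecture; the filter counts, kernel sizes, and dropout rate are illustrative assumptions.

    from tensorflow.keras import layers, models

    # A small regression CNN in the spirit of Fig. 6: convolution -> max-pooling
    # -> dropout blocks feeding fully connected layers. Hidden layers use ReLU;
    # the 2-unit output layer is linear and predicts an (x, y) center point.
    model = models.Sequential([
        layers.Input(shape=(256, 256, 1)),         # grayscale, CLAHE-equalized input
        layers.Conv2D(32, 3, activation="relu"),   # feature maps from learned filters
        layers.MaxPooling2D(2),                    # max value in each 2 x 2 window
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Dropout(0.5),                       # ignore hidden units while training
        layers.Flatten(),
        layers.Dense(128, activation="relu"),      # fully connected layer
        layers.Dense(2, activation="linear"),      # linear output: predicted (x, y)
    ])
    model.compile(optimizer="adam", loss="mse")    # regress on the center coordinates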
There are two steps to detecting the fovea and OD. The first step runs the preprocessed image through the network, and the predicted locations define regions of interest. These regions are then run through further networks to fine-tune the detected locations (Fig. 6).
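In code, this coarse-to-fine procedure might look like the sketch below; coarse_net and fine_net stand in for the trained models, and the cropping helper and the 64-pixel crop size are our own assumptions rather than details from the study.

    import numpy as np

    def crop_around(img, cx, cy, size=64):
        # Crop a size x size window centered on (cx, cy), clamped to the image;
        # also return the (x, y) offset of the crop's top-left corner.
        h, w = img.shape[:2]
        x0 = int(np.clip(cx - size // 2, 0, w - size))
        y0 = int(np.clip(cy - size // 2, 0, h - size))
        return img[y0:y0 + size, x0:x0 + size], (x0, y0)

    def detect_landmark(image, coarse_net, fine_net):
        # Stage 1: coarse localization on the full preprocessed image.
        cx, cy = coarse_net.predict(image[None, ..., None])[0]
        # Stage 2: refine within a region of interest around the coarse estimate.
        crop, (x0, y0) = crop_around(image, cx, cy)
        dx, dy = fine_net.predict(crop[None, ..., None])[0]
        return x0 + dx, y0 + dy                    # map back to image coordinates

The same refinement would be run once per landmark, with separate fine networks for the OD and the fovea.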
10,000 images from the Kaggle dataset were used for training and testing (7000 for training and validation, 3000 for testing), and 1200 images from the Messidor dataset were also used for testing. Results were reported as the percentage of images with the OD and fovea found within 1 disc radius, 0.5 disc radii, and 0.25 disc radii of the ground truth. Under the 1-disc-radius criterion, the OD and fovea were detected in 97% and 96.6% of images, respectively, on the Messidor dataset, and in 96.7% and 95.6% on the Kaggle dataset. Moreover, although training the model carries considerable time overhead, inference runs almost instantaneously (0.007 s).
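The evaluation criterion itself is simple to state in code: a detection counts as correct when the predicted center lies within a given fraction of the disc radius of the ground-truth center. The function below is our own sketch of that metric, not code from the study.

    import numpy as np

    def detection_rate(pred, truth, disc_radius, fraction=1.0):
        # Percentage of images whose predicted center falls within
        # `fraction` disc radii of the ground-truth center.
        pred = np.asarray(pred, dtype=float)         # shape (n, 2): predicted (x, y)
        truth = np.asarray(truth, dtype=float)       # shape (n, 2): ground truth
        dist = np.linalg.norm(pred - truth, axis=1)  # Euclidean error per image
        return 100.0 * np.mean(dist <= fraction * np.asarray(disc_radius))

Calling detection_rate(preds, truths, radii, fraction=0.5) would then give the stricter 0.5-disc-radius figure.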