Page 67 - Big Data Analytics for Intelligent Healthcare Management


60 CHAPTER 4 TRANSFER LEARNING AND SUPERVISED CLASSIFIER

models, the dimensionality reduction algorithm PCA was applied. Then, to perform classification, three
different classifiers were used: logistic regression (LR), support vector machine (SVM), and K-nearest
neighbor (K-NN). This model could give the pathologist a preliminary idea of the class to which a
breast tissue image belongs, for example, benign or malignant. After that step, it will be easier for the
pathologist to confirm or reject the predicted class when diagnosing the image.
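
The PCA-then-classify stage described above can be sketched roughly as follows. This is a minimal illustration, not the chapter's implementation: it assumes scikit-learn is available, replaces the features extracted from the pretrained models with synthetic random vectors, and uses hypothetical settings (512 features, 20 principal components, K = 5) that are not taken from the chapter.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for extracted image features (assumption: the real
# features would come from the pretrained models, not from random numbers).
rng = np.random.default_rng(0)
n_per_class, n_features = 100, 512
benign = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, n_features))
malignant = rng.normal(loc=0.5, scale=1.0, size=(n_per_class, n_features))
X = np.vstack([benign, malignant])
y = np.array([0] * n_per_class + [1] * n_per_class)  # 0 = benign, 1 = malignant

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The three classifiers named in the text; hyperparameters are illustrative.
classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf"),
    "K-NN": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, clf in classifiers.items():
    # Scale the features, reduce them with PCA, then classify.
    model = make_pipeline(StandardScaler(), PCA(n_components=20, random_state=0), clf)
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Wrapping the scaler, PCA, and classifier in one pipeline means the PCA projection is fitted on the training split only, so the held-out images do not influence the reduced feature space.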

4.2 RELATED WORK
A large body of research addresses the automatic prediction of the presence of breast
cancer using different datasets. In paper [3], using the BreaKHis image dataset to classify the images
into two classes (benign and malignant), the authors used six different feature extractors with four
different classifiers and reported accuracies ranging from 80% to 85%. The authors of paper [4], using
different pretrained ConvNet models trained for thousands of epochs, reported a highest accuracy of
99.8%. In paper [5], the authors reported accuracies ranging from 81% to 90% using DECAF features
with other classifiers and compared their work to other works. In paper [6], the authors presented a
comparative study of different machine learning techniques on breast cancer FNA biopsy data; the
K-NN approach with Euclidean distance achieved a prediction accuracy of 100% with K values
of 5, 10, and 11, and it achieved the same accuracy using cityblock distance with a K value of
13. In paper [7], the authors applied different machine learning algorithms to the Wisconsin breast
cancer dataset and analyzed their performance. They reported accuracies close to 100%, with SVM giving
an accuracy of 100%. In paper [8], the authors used ResNet-50 and VGG16 and reported an accuracy of
89% and 84%, respectively. In paper [9], the author used different machine learning algorithms and
reported accuracies of 98.8% and 96.33%, respectively, using SVM on two different datasets. In
paper [10], the authors reported their best accuracy of 99.038% using MLP on the Wisconsin breast
cancer dataset. Much further work in the area of cancer has applied different supervised and
semisupervised classification, clustering, and feature detection methods to biomedical
images [11–20], as well as optimization and information security techniques to medical data [21–25],
helping to build computer-aided medical systems.

4.3 DATASET AND METHODOLOGIES
Dataset: In this work, we used the BreaKHis [3] breast cancer histopathological image dataset.
This dataset contains 7909 RGB images of benign and malignant tissue at four magnification
factors (40×, 100×, 200×, 400×) (Fig. 4.1, Table 4.1).

4.3.1 CONVOLUTIONAL NEURAL NETWORKS (CNNS/CONVNETS)
Convolutional neural networks are state-of-the-art models for image classification. They are very
similar to ordinary neural networks, except that they have some extra types of layers. They have three
main building blocks: the convolution layer, the pooling layer, and the fully connected layer.
These terms are described further below.
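
As a rough illustration of how the three building blocks fit together, the sketch below pushes a tiny random "image" through one convolution, one max-pooling step, and one fully connected layer using NumPy only. The function names, the 6 × 6 input, the 2 × 2 kernel, and the two output units are hypothetical choices for this example. Real ConvNets use many learned kernels, nonlinear activations, and multi-channel inputs, all omitted here.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image with no padding and stride 1.

    Like most deep-learning libraries, this computes a cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

def fully_connected(x, weights, bias):
    """Dense layer: every output unit sees every input value."""
    return weights @ x + bias

rng = np.random.default_rng(0)
image = rng.random((6, 6))                     # toy single-channel image
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])   # toy 2x2 filter

feature_map = conv2d(image, kernel)            # 6x6 -> 5x5
pooled = max_pool2d(feature_map)               # 5x5 -> 2x2
flat = pooled.ravel()                          # 2x2 -> vector of length 4
weights = rng.normal(size=(2, flat.size))      # two output units (e.g. two classes)
bias = np.zeros(2)
logits = fully_connected(flat, weights, bias)

print(feature_map.shape, pooled.shape, logits.shape)
```

The shapes show the typical pattern: convolution keeps the spatial layout, pooling shrinks it, and the fully connected layer maps the flattened result to one score per class.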
