Page 33 - Vibrational Spectroscopic Imaging for Biomedical Applications
error that would result in using that specific metric for classification.
The metrics are arranged in order of increasing error and employed
to classify tissue. An entire classifier is built using the first metric, the
first two, the first three, and so on. The total number of classifiers is
equal to the total combinations of metrics that are present. We
restricted ourselves to linear combinations or singular measures of
metrics to allow interpretation of results in terms of the underlying
spectral data.
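The ranking-and-nesting step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `nested_metric_sets` and the metric names and error values are hypothetical, and the sketch assumes one classifier per nested subset (first metric, first two, first three, and so on).

```python
from typing import Dict, List


def nested_metric_sets(metric_errors: Dict[str, float]) -> List[List[str]]:
    """Rank metrics by increasing error and return the nested subsets.

    The k-th subset holds the k metrics with the lowest individual error,
    so each subset differs from the previous one by a single added metric.
    """
    ranked = sorted(metric_errors, key=metric_errors.get)
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


# Hypothetical per-metric classification errors (illustrative values only).
errors = {
    "peak_ratio_A": 0.21,
    "peak_ratio_B": 0.08,
    "center_of_gravity": 0.15,
}

subsets = nested_metric_sets(errors)
for subset in subsets:
    print(subset)
```

Each subset would then be passed to the classifier in turn, so that the effect of adding one metric at a time can be evaluated.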
Statistical analysis of classification accuracy is performed by the
application of receiver operating characteristic (ROC) analysis with
quantitative evaluation by calculation of the area under the ROC
curve (AUC). Since each classifier differs from the previous by the
addition of a metric, this process has also been termed the sequential
forward selection process. A plot of the AUC with the addition of
specific metrics reveals those that increase or reduce classification
accuracy. Classification is then optimized by sorting the metrics by
the change in the AUC after the addition of a given metric and subse-
quently iterating the classification procedure. The classification algo-
rithm is based on Bayes’ decision rule which states that
\[
p(c_1 \mid m_i) = \frac{p(m_i \mid c_1)\, p(c_1)}{p(m_i)}
\quad \text{and} \quad
p(c_2 \mid m_i) = \frac{p(m_i \mid c_2)\, p(c_2)}{p(m_i)}
\tag{1.1}
\]
where $c$ is a tissue class and $m$ is a spectral metric. Due to the limited
tissue sampling for determining the distributions of $p(m_i \mid c_1)$ and
$p(m_i \mid c_2)$, it is not possible to find exact values for the prior tissue
class probabilities $p(c_1)$ and $p(c_2)$. Therefore, $p(c_1)$ and $p(c_2)$ are
estimated during the calibration step to determine which values provide the
highest accuracy.
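As a rough illustration of the two-class Bayes rule and of estimating the prior during calibration, consider the sketch below. It rests on several assumptions that are not stated in the text: the class-conditional distributions are modeled as Gaussians, the evidence p(m) is computed by total probability over the two classes (so that the two priors sum to one), the prior is chosen by a simple grid search over candidate values, and all function names and data values are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Likelihood = Callable[[float], float]


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Gaussian density, used here as an assumed model for p(m | c)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior_c1(m: float, lik1: Likelihood, lik2: Likelihood,
                 prior1: float) -> float:
    """p(c1 | m) from Bayes' rule, with p(c2) = 1 - p(c1) (assumption)."""
    numerator = lik1(m) * prior1
    evidence = numerator + lik2(m) * (1.0 - prior1)
    return numerator / evidence


def calibrate_prior(values: Sequence[float], labels: Sequence[int],
                    lik1: Likelihood, lik2: Likelihood,
                    candidates: Sequence[float]) -> Tuple[float, float]:
    """Return the candidate p(c1) with the highest calibration accuracy."""
    best_prior, best_acc = candidates[0], -1.0
    for prior1 in candidates:
        correct = 0
        for m, label in zip(values, labels):
            predicted = 1 if posterior_c1(m, lik1, lik2, prior1) >= 0.5 else 2
            correct += int(predicted == label)
        acc = correct / len(values)
        if acc > best_acc:  # keep the first candidate that reaches the best accuracy
            best_prior, best_acc = prior1, acc
    return best_prior, best_acc


# Hypothetical class-conditional models and calibration data.
lik_c1 = lambda m: gaussian_pdf(m, 0.0, 1.0)
lik_c2 = lambda m: gaussian_pdf(m, 2.0, 1.0)
metric_values: List[float] = [-0.5, 0.2, 0.9, 1.4, 2.5]
true_classes: List[int] = [1, 1, 1, 2, 2]
prior_grid = [i / 10 for i in range(1, 10)]

best_prior, best_acc = calibrate_prior(
    metric_values, true_classes, lik_c1, lik_c2, prior_grid
)
print(best_prior, best_acc)
```

In practice the likelihoods would be estimated from the calibration data, and accuracy could be replaced by another figure of merit such as the AUC.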
1.2 Materials and Methods
Two paraffin-embedded breast TMAs from US Biomax Inc. with
tissue samples from 40 breast cancer patients are analyzed in this
study. The TMAs are fixed on barium fluoride (BaF₂) substrates to
permit data collection over the entire mid-IR spectral region of
interest (720 to 4000 cm⁻¹). The first array contains carcinoma and adjacent
normal tissue from 40 patients (2 with a grade I tumor, 26 with a
grade II tumor, 6 with a grade III tumor, and 6 with an unknown
tumor grade). This array is used as a calibration dataset to develop
algorithms to segment breast histology and pathology as outlined in
Fig. 1.2. These algorithms are then validated on a separate cut of the
same TMA containing different tissue sections from the same patients.
Prior to imaging, paraffin is removed from each TMA by immersing
in hexane for 48 to 72 hours at 40°C while stirring. To ensure continued