Page 33 - Vibrational Spectroscopic Imaging for Biomedical Applications

10    Chapter One


        error that would result in using that specific metric for classification.
        The metrics are arranged in order of increasing error and employed
        to classify tissue. An entire classifier is built using the first metric, the
        first two, the first three, and so on. The total number of classifiers
        equals the number of metric combinations considered. We restricted
        ourselves to individual metrics or linear combinations of metrics so
        that the results can be interpreted in terms of the underlying
        spectral data.
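The ranking of metrics by error and the construction of nested classifiers can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-metric error is assumed to be the best single-threshold misclassification rate, and the function names (`single_metric_error`, `rank_metrics`, `nested_metric_sets`) and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence


def single_metric_error(values: Sequence[float], labels: Sequence[int]) -> float:
    """Lowest misclassification rate reachable by thresholding one metric.

    Assumption for illustration: a metric is used on its own with a single
    threshold, and either direction (class 1 above or below it) is allowed.
    """
    n = len(values)
    best = 1.0
    # Candidate thresholds: every observed value, plus one above the maximum.
    for t in sorted(set(values)) + [max(values) + 1.0]:
        # Errors when predicting class 1 for metric values >= t.
        err = sum((v >= t) != (y == 1) for v, y in zip(values, labels))
        # The reversed rule (class 1 below t) flips every prediction.
        best = min(best, err / n, (n - err) / n)
    return best


def rank_metrics(metrics: Dict[str, Sequence[float]],
                 labels: Sequence[int]) -> List[str]:
    """Metric names ordered by increasing single-metric error."""
    return sorted(metrics, key=lambda name: single_metric_error(metrics[name], labels))


def nested_metric_sets(ranked: List[str]) -> List[List[str]]:
    """First metric, first two, first three, ...: one classifier per prefix."""
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


# Hypothetical toy data: six pixels, two tissue classes (0 and 1).
labels = [0, 0, 0, 1, 1, 1]
metrics = {
    "m_a": [0.1, 0.2, 0.3, 0.7, 0.8, 0.9],  # separates the classes cleanly
    "m_b": [0.1, 0.8, 0.3, 0.7, 0.2, 0.9],  # partly informative
    "m_c": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],  # uninformative
}
ranked = rank_metrics(metrics, labels)
for subset in nested_metric_sets(ranked):
    print(subset)
```

With n metrics this yields n nested metric sets, each of which would then be passed to the classifier and scored.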
            Classification accuracy is evaluated statistically with receiver
        operating characteristic (ROC) analysis and quantified by the area
        under the ROC curve (AUC). Because each classifier differs from the
        previous one by the addition of a single metric, this process is also
        known as sequential forward selection. A plot of the AUC as metrics
        are added reveals which metrics increase or reduce classification
        accuracy. Classification is then optimized by sorting the metrics by
        the change in AUC produced when each one is added and iterating the
        classification procedure. The classification algorithm is based on
        Bayes' decision rule, which states that


        $$p(c_1 \mid m_i) = \frac{p(m_i \mid c_1)\,p(c_1)}{p(m_i)} \quad \text{and} \quad p(c_2 \mid m_i) = \frac{p(m_i \mid c_2)\,p(c_2)}{p(m_i)} \tag{1.1}$$
        where $c_1$ and $c_2$ are tissue classes and $m_i$ is a spectral metric.
        Because tissue sampling is limited when determining the distributions
        $p(m_i \mid c_1)$ and $p(m_i \mid c_2)$, exact values cannot be found
        for the prior tissue class probabilities $p(c_1)$ and $p(c_2)$.
        Therefore, $p(c_1)$ and $p(c_2)$ are estimated during the calibration
        step by choosing the values that give the highest accuracy.
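Equation (1.1) and the calibration of the priors can be sketched for a single metric as below. This is an illustration under loudly stated assumptions, not the authors' code: the class-conditional distributions $p(m_i \mid c_1)$ and $p(m_i \mid c_2)$ are assumed to be Gaussians fitted to calibration data, $p(c_1)$ is swept over a fixed grid with $p(c_2) = 1 - p(c_1)$, the posterior $p(c_1 \mid m_i)$ is assumed to serve as the ROC score, and the AUC is computed with the Mann-Whitney statistic. All function names and the two toy classes are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Optional, Sequence, Tuple

Params = Tuple[float, float]  # (mean, standard deviation) of one class


def fit_class(values: Sequence[float]) -> Params:
    """Assumption: p(m_i | c) is modelled as a 1-D Gaussian fitted to calibration data."""
    return mean(values), max(pstdev(values), 1e-6)  # floor keeps sigma > 0


def gaussian_pdf(x: float, params: Params) -> float:
    mu, sigma = params
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_c1(m: float, params1: Params, params2: Params, prior1: float) -> float:
    """p(c1 | m_i) from Eq. (1.1), with p(m_i) = p(m_i|c1)p(c1) + p(m_i|c2)p(c2)."""
    joint1 = gaussian_pdf(m, params1) * prior1
    joint2 = gaussian_pdf(m, params2) * (1.0 - prior1)
    total = joint1 + joint2
    return 0.5 if total == 0.0 else joint1 / total


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos_scores for q in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def calibrate_prior(c1_values: Sequence[float], c2_values: Sequence[float],
                    grid: Optional[List[float]] = None) -> Tuple[float, float]:
    """Choose p(c1), with p(c2) = 1 - p(c1), that maximises calibration accuracy."""
    params1, params2 = fit_class(c1_values), fit_class(c2_values)
    grid = grid or [k / 20 for k in range(1, 20)]

    def accuracy(prior1: float) -> float:
        # Bayes' decision: assign c1 when p(c1 | m) > p(c2 | m), i.e. > 0.5.
        hits = sum(posterior_c1(m, params1, params2, prior1) > 0.5 for m in c1_values)
        hits += sum(posterior_c1(m, params1, params2, prior1) <= 0.5 for m in c2_values)
        return hits / (len(c1_values) + len(c2_values))

    best = max(grid, key=accuracy)  # first grid value wins ties
    return best, accuracy(best)


# Hypothetical metric values for two tissue classes in a calibration set.
c1_values = [1.0, 1.2, 0.9, 1.1, 1.3]
c2_values = [2.0, 2.2, 1.9, 2.1, 2.3]
prior1, acc = calibrate_prior(c1_values, c2_values)
p1, p2 = fit_class(c1_values), fit_class(c2_values)
roc_auc = auc([posterior_c1(m, p1, p2, prior1) for m in c1_values],
              [posterior_c1(m, p1, p2, prior1) for m in c2_values])
print(f"p(c1) = {prior1:.2f}, accuracy = {acc:.2f}, AUC = {roc_auc:.2f}")
```

Because $p(m_i)$ is the same denominator in both expressions of Eq. (1.1), the decision between $c_1$ and $c_2$ depends only on the numerators; the sketch still normalizes so that the posterior can be used directly as a score for the ROC analysis. Histogram-based estimates of the class-conditional distributions, or a finer prior grid, could be substituted without changing the structure.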
   1.2  Materials and Methods
        Two paraffin-embedded breast TMAs from US Biomax Inc. with
        tissue samples from 40 breast cancer patients are analyzed in this
        study. The TMAs are fixed on barium fluoride (BaF₂) substrates to
        permit data collection over the entire mid-IR spectral region of
        interest (720 to 4000 cm⁻¹). The first array contains carcinoma and adjacent
        normal tissue from 40 patients (2 with a grade I tumor, 26 with a
        grade II tumor, 6 with a grade III tumor, and 6 with an unknown
        tumor grade). This array is used as a calibration dataset to develop
        algorithms to segment breast histology and pathology as outlined in
        Fig. 1.2. These algorithms are then validated on a separate cut of the
        same TMA containing different tissue sections from the same patients.
        Prior to imaging, paraffin is removed from each TMA by immersing it
        in hexane for 48 to 72 hours at 40°C with stirring. To ensure continued