Page 57 - Geochemical Anomaly and Mineral Prospectivity Mapping in GIS


             X_{LIF} = X_{LH} - (1.5 \times IQR)                                (3.2)

             X_{LOF} = X_{LH} - (3 \times IQR)                                  (3.3)

             An upper inner fence (UIF) and an upper outer fence (UOF) are also defined at 1½×IQR
             and 3×IQR, respectively, away from the upper hinge toward the maximum.
             Algebraically, values (X) at the UIF and the UOF can be estimated, respectively, as:

             X_{UIF} = X_{UH} + (1.5 \times IQR)                                (3.4)

             X_{UOF} = X_{UH} + (3 \times IQR)                                  (3.5)

             For log-transformed data, the fences are defined using log-transformed values in either
             equation (3.4) or (3.5).
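The four fences can be sketched in a few lines of Python. This is only an illustration, not the book's procedure: the function names `hinges` and `fences` and the sample data are hypothetical, the hinges are assumed to follow Tukey's rule (the median of each half of the sorted data, with the overall median counted in both halves when n is odd), and the upper fences are measured from the upper hinge, as the text describes.

```python
from statistics import median

def hinges(values):
    """Return (lower hinge, median, upper hinge).

    Assumption: Tukey's rule, i.e. each hinge is the median of one half
    of the sorted data, and the overall median belongs to both halves
    when the number of values is odd.
    """
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2
    return median(xs[:half]), median(xs), median(xs[n - half:])

def fences(values):
    """Return the lower/upper inner and outer fences as a dict."""
    lh, _, uh = hinges(values)
    iqr = uh - lh  # hinge width
    return {
        "LOF": lh - 3.0 * iqr,  # lower outer fence
        "LIF": lh - 1.5 * iqr,  # lower inner fence
        "UIF": uh + 1.5 * iqr,  # upper inner fence
        "UOF": uh + 3.0 * iqr,  # upper outer fence
    }

# Hypothetical sample data: hinges are 5 and 9, so IQR = 4.
data = [2, 4, 5, 6, 7, 8, 9, 11, 30]
f = fences(data)
```

For this sample the inner fences fall at -1 and 15 and the outer fences at -7 and 21. For log-transformed data the same functions would be applied to the logged values.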
                The lower whisker (LW) and the upper whisker (UW) are drawn from each of the
             hinges toward the most extreme data values within the inner fences. Algebraically,
             values (X) of the LW and the UW can be determined, respectively, as:

             X_{LW} = \min([X \mid X > X_{LIF}]), and                           (3.6)

             X_{UW} = \max([X \mid X < X_{UIF}]).                               (3.7)

             where the values in brackets are those within the inner fences and the hinges. For log-
             transformed data, the log values of the inner fences must be anti-logged for use in either
             equation (3.6) or (3.7). Data values beyond the inner fences are considered outliers. Data
             values between the inner and outer fences are considered ‘mild’ outliers, whilst data
             values beyond the outer fences are considered ‘far’ or extreme outliers, i.e., very unusual
             values (Kotz and Johnson, 1985, pp. 136-137). Mild and extreme outliers are marked by
             different symbols (e.g., open circles and asterisks, respectively; Fig. 3-4).
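The whisker and outlier rules above can be sketched as follows. The function name `whiskers_and_outliers` and the sample data are hypothetical, and the hinge values are passed in as given inputs. One assumption goes beyond the text: a value lying exactly on a fence is treated here as a mild outlier, since the whisker definitions use strict inequalities.

```python
def whiskers_and_outliers(values, lh, uh):
    """Return (lower whisker, upper whisker, mild outliers, extreme outliers).

    lh and uh are the lower and upper hinges, supplied by the caller.
    """
    iqr = uh - lh  # hinge width
    lif, uif = lh - 1.5 * iqr, uh + 1.5 * iqr  # inner fences
    lof, uof = lh - 3.0 * iqr, uh + 3.0 * iqr  # outer fences

    # Whiskers end at the most extreme values strictly inside the inner fences.
    inside = [x for x in values if lif < x < uif]
    lw, uw = min(inside), max(inside)

    # Mild outliers: between the inner and outer fences
    # (assumption: values exactly on a fence count as mild).
    mild = [x for x in values if lof <= x <= lif or uif <= x <= uof]
    # Extreme ('far') outliers: beyond the outer fences.
    extreme = [x for x in values if x < lof or x > uof]
    return lw, uw, mild, extreme

# Hypothetical sample data whose hinges are 5 and 9 (IQR = 4).
data = [1, 3, 5, 5, 6, 7, 8, 9, 9, 18, 30]
lw, uw, mild, extreme = whiskers_and_outliers(data, lh=5, uh=9)
```

Here the inner fences are -1 and 15 and the outer fences are -7 and 21, so the whiskers end at 1 and 9, the value 18 is a mild outlier and 30 is an extreme outlier.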
                A boxplot thus defines the 5-number summary statistics (minimum, LH, median, UH
             and maximum) and describes the most important characteristics of a univariate data set,
             namely (Tukey, 1997; Hoaglin et al., 2000): (a) location or central tendency; (b) spread;
             (c) skewness; (d) lengths of tails; and (e) outliers. Because the box represents
             approximately 50% (two quartiles) of a univariate data set, at most 25% of the data can
             be outliers, and these values do not significantly affect the median or the hinges. In
             addition, because the inner fences are defined by the IQR (the hinge width), they are not
             seriously affected by outliers. Together, these properties imply that the boxplot (or
             box-and-whiskers plot) is resistant and robust against extreme outliers in a univariate
             data set.
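A small numerical check illustrates this resistance. The data and the helper name `hinges` are hypothetical, and the hinges are again assumed to follow Tukey's rule (median of each half of the sorted data). Replacing the largest value with an extreme one leaves the median and hinges unchanged, whereas the mean shifts strongly.

```python
from statistics import mean, median

def hinges(values):
    """Return (lower hinge, median, upper hinge), assuming Tukey's rule."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2  # the median is shared by both halves when n is odd
    return median(xs[:half]), median(xs), median(xs[n - half:])

# Hypothetical data: the second set replaces the maximum with an extreme value.
clean = [2, 4, 5, 6, 7, 8, 9, 11, 12]
contaminated = clean[:-1] + [1200]

summary_clean = hinges(clean)
summary_contaminated = hinges(contaminated)
mean_clean = mean(clean)
mean_contaminated = mean(contaminated)
```

Both data sets give a lower hinge of 5, a median of 7 and an upper hinge of 9, so the box and the inner fences are identical, while the mean of the contaminated set is far larger than that of the clean set.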