Page 57 - Geochemical Anomaly and Mineral Prospectivity Mapping in GIS
$X_{LIF} = X_{LH} - (1.5 \times IQR)$ (3.2)
$X_{LOF} = X_{LH} - (3 \times IQR)$ (3.3)
An upper inner fence (UIF) and an upper outer fence (UOF) are also defined at 1½×IQR
and 3×IQR, respectively, away from the upper hinge toward the maximum.
Algebraically, values (X) at the UIF and the UOF can be estimated, respectively, as:
$X_{UIF} = X_{UH} + (1.5 \times IQR)$ (3.4)
$X_{UOF} = X_{UH} + (3 \times IQR)$ (3.5)
For log-transformed data, the fences are defined using log-transformed values in either
equation (3.4) or (3.5).
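The fence definitions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `fences` and `log_fences` and the hinge values are hypothetical, the IQR is taken as the hinge width (upper hinge minus lower hinge), the upper fences are measured from the upper hinge as described in the text, and base-10 logarithms are assumed for the log-transformed case.

```python
import math

def fences(lower_hinge, upper_hinge):
    """Return the four boxplot fences computed from the two hinges."""
    iqr = upper_hinge - lower_hinge  # hinge width, used here as the IQR
    return {
        "LOF": lower_hinge - 3.0 * iqr,  # lower outer fence
        "LIF": lower_hinge - 1.5 * iqr,  # lower inner fence
        "UIF": upper_hinge + 1.5 * iqr,  # upper inner fence
        "UOF": upper_hinge + 3.0 * iqr,  # upper outer fence
    }

def log_fences(lower_hinge, upper_hinge):
    """Compute fences on log10 values, then anti-log them to original units.

    Assumes strictly positive hinges and base-10 logs (an assumption of
    this sketch, not something the text prescribes).
    """
    logged = fences(math.log10(lower_hinge), math.log10(upper_hinge))
    return {name: 10.0 ** value for name, value in logged.items()}

# Hypothetical hinge values, for illustration only.
print(fences(20.0, 40.0))
print(log_fences(10.0, 100.0))
```

With untransformed, positively skewed data the lower fences can fall below zero, as in the first call; computing the fences on log values and anti-logging them keeps all fences positive.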
The lower whisker (LW) and the upper whisker (UW) are drawn from each of the
hinges toward the most extreme data values within the inner fences. Algebraically,
values (X) of the LW and the UW can be determined, respectively, as:
$X_{LW} = \min(X[X > X_{LIF}])$, and (3.6)
$X_{UW} = \max(X[X < X_{UIF}])$ (3.7)
where the values in brackets are those within the inner fences and the hinges. For log-
transformed data, the log values of the inner fences must be anti-logged for use in either
equation (3.6) or (3.7). Data values beyond the inner fences are considered outliers. Data
values between the inner and outer fences are considered ‘mild’ outliers, whilst data
values beyond the outer fences are considered ‘far’ or extreme outliers, i.e., very unusual
values (Kotz and Johnson, 1985, pp. 136-137). Mild and extreme outliers beyond the
inner fences are marked by different symbols (e.g., open circles and asterisks,
respectively (Fig. 3-4)).
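The whisker and outlier rules above can be sketched as follows. The function name `classify_boxplot`, the sample values and the hinges passed in are hypothetical. The text does not say how to treat a value lying exactly on a fence, so this sketch assumes that a value on an inner or outer fence counts as a mild outlier.

```python
def classify_boxplot(values, lower_hinge, upper_hinge):
    """Find whisker ends and split outliers into mild and extreme groups."""
    iqr = upper_hinge - lower_hinge            # hinge width
    lif = lower_hinge - 1.5 * iqr              # lower inner fence
    uif = upper_hinge + 1.5 * iqr              # upper inner fence
    lof = lower_hinge - 3.0 * iqr              # lower outer fence
    uof = upper_hinge + 3.0 * iqr              # upper outer fence

    # Values strictly within the inner fences define the whiskers.
    inside = [x for x in values if lif < x < uif]
    # Assumption: values exactly on a fence are treated as mild outliers.
    mild = [x for x in values if lof <= x <= lif or uif <= x <= uof]
    extreme = [x for x in values if x < lof or x > uof]

    return {
        "LW": min(inside),   # lower whisker end
        "UW": max(inside),   # upper whisker end
        "mild": mild,
        "extreme": extreme,
    }

# Hypothetical data whose lower and upper hinges are 20 and 40.
sample = [2, 18, 20, 25, 30, 35, 38, 40, 75, 120]
result = classify_boxplot(sample, lower_hinge=20, upper_hinge=40)
print(result)
```

Here the inner fences are at -10 and 70 and the outer fences at -40 and 100, so the whiskers end at 2 and 40, the value 75 is a mild outlier and 120 is an extreme outlier.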
A boxplot thus defines the 5-number summary statistics (minimum, LH, median, UH
and maximum) and describes the most important characteristics of a univariate data set,
namely (Tukey, 1977; Hoaglin et al., 2000): (a) location or central tendency; (b) spread;
(c) skewness; (d) lengths of tails; and (e) outliers. As the box represents approximately
50%, or two quartiles, of a univariate data set, at most 25% of the data can be
outliers without these values significantly affecting the median and the hinges. In
addition, because the inner fences are defined by the IQR or the hinge width, they are not
seriously affected by outliers. These properties imply that the boxplot (or box-and-whiskers
plot) is resistant and robust against extreme outliers in a univariate data set.
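A small numerical check can illustrate this resistance. The sketch below is hypothetical throughout: the data are invented, and the helper `hinges` assumes that each hinge is the median of the lower or upper half of the sorted data (with the median included in both halves when the number of values is odd), which is one common convention and may differ from the definition used elsewhere in this chapter.

```python
import statistics

def hinges(values):
    """Return (lower hinge, upper hinge) as medians of the two data halves."""
    ordered = sorted(values)
    half = (len(ordered) + 1) // 2   # each half includes the median if n is odd
    return statistics.median(ordered[:half]), statistics.median(ordered[-half:])

def summary(values):
    """Collect the mean alongside the resistant boxplot statistics."""
    lh, uh = hinges(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "LH": lh,
        "UH": uh,
        "UIF": uh + 1.5 * (uh - lh),  # upper inner fence
    }

# Invented data; the second set replaces the largest value by a gross outlier.
clean = [12, 15, 18, 20, 22, 25, 28, 30, 33, 36]
contaminated = clean[:-1] + [3600]

print(summary(clean))
print(summary(contaminated))
```

In this example the single extreme value shifts the mean substantially, whereas the median, the hinges and the upper inner fence are identical for both data sets.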