Page 16 - Advances In Productive, Safe, and Responsible Coal Mining

Profitable coal mining means being productive, safe, and environmentally responsible

           mine, data for underground operations, surface operations, nonpreparation plant sur-
           face facilities, and preparation plant facilities were considered separately for a total of
           321 sites. A simple linear regression analysis was performed on these data using the
           Python scikit-learn library [8]. It had an R² value of 0.024 and did not indicate the
           trend we were hoping for (see Fig. 1.1).
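
The regression step described above can be sketched as follows. This is a minimal illustration, not the authors' script: the arrays `tons_per_manhour` and `total_ir` are hypothetical names filled with synthetic random values, so the printed slope and R² will not match the 0.024 reported in the text. Only the site count of 321 is taken from the source.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (assumption): these are NOT the mine data
# analyzed in the text, only random values with the same site count.
rng = np.random.default_rng(0)
n_sites = 321
tons_per_manhour = rng.uniform(0.5, 200.0, size=n_sites)
total_ir = rng.exponential(scale=5.0, size=n_sites)

# scikit-learn expects features as a 2-D array (n_samples, n_features).
X = tons_per_manhour.reshape(-1, 1)
y = total_ir

# Fit a simple (one-predictor) linear regression by least squares.
model = LinearRegression().fit(X, y)

# score() returns the coefficient of determination R^2 on the given data.
r_squared = model.score(X, y)

print(f"slope = {model.coef_[0]:.4f}, intercept = {model.intercept_:.4f}")
print(f"R^2 = {r_squared:.3f}")
```

An R² close to zero, as in the text, means the fitted line explains almost none of the variance in injury rate, whatever the sign of the slope.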
              After the data were cleaned to retain only total injury rates above zero (that is,
           null values were dropped), they were scaled and grouped into four clusters using the
           K-means algorithm. This algorithm separates samples into groups of equal variance by
           minimizing a criterion known as the inertia, or within-cluster sum of squares. It is a
           three-step process commonly used with large data sets. In the first step, initial cluster
           centroids are chosen. In the second step, each data sample is assigned to its nearest
           centroid. In the third step, the algorithm creates new centroids from the mean value of
           all samples assigned to each previous centroid. The algorithm then computes the
           difference between the old and new centroids and repeats the second and third steps
           until the centroids no longer move significantly [9, 10]. Clusters are described in
           Table 1.1 and shown in Fig. 1.2.
           Manual clustering could have improved the statistical accuracy of the data (i.e.,
           increased R² values), but this was not performed as it could be considered biased.
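
The cleaning, scaling, and clustering steps described above might look like the sketch below. Several choices here are assumptions, because the text does not state them: the data are synthetic, the two clustered features are taken to be tons per man-hour and total injury rate, and `StandardScaler` stands in for whatever scaling was actually applied. The `n_init` and `random_state` settings are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): not the mine data from the text.
rng = np.random.default_rng(1)
tons_per_manhour = rng.uniform(0.5, 200.0, size=321)
total_ir = rng.exponential(scale=5.0, size=321)
total_ir[rng.random(321) < 0.3] = 0.0  # pretend some sites report no injuries

# Cleaning: keep only sites with a total injury rate above zero.
keep = total_ir > 0
features = np.column_stack([tons_per_manhour[keep], total_ir[keep]])

# Scaling (assumption: standardization to zero mean and unit variance).
scaled = StandardScaler().fit_transform(features)

# K-means with four clusters. fit() chooses initial centroids, then
# alternates assignment and centroid-update steps until convergence.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(scaled)
labels = kmeans.labels_

# Inertia is the within-cluster sum of squared distances to each centroid;
# recompute it by hand to show what the algorithm minimizes.
manual_inertia = sum(
    np.sum((scaled[labels == k] - kmeans.cluster_centers_[k]) ** 2)
    for k in range(4)
)

print("cluster sizes:", np.bincount(labels))
print(f"inertia (sklearn) = {kmeans.inertia_:.3f}")
print(f"inertia (manual)  = {manual_inertia:.3f}")
```

Because K-means depends on the initial centroids, fixing `random_state` keeps the cluster labels reproducible between runs.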
              After clustering, ordinary least squares (OLS) linear regression analyses were per-
           formed separately for Cluster 1 and Clusters 2–4 combined as shown in Figs. 1.3 and
           1.4, respectively. Happily, correlation trends were now in the desired direction;
           however, R² values improved to only 0.066 and 0.275, respectively. The model for
           Clusters 2–4 indicated that an increase in productivity of approximately 1 ton per






           [Figure: Total IR (vertical axis, 0 to 120) versus Tons/ManHr (horizontal axis, 0 to 200)]

           Fig. 1.1 Simple linear regression analysis of total injury frequency rate versus tons per
           man-hour for 321 sites, R² = 0.024.