Page 152 - Intelligent Digital Oil And Gas Fields

Components of Artificial Intelligence and Data Analytics     115


              Apache Hadoop (Ghemawat et al., 2003; Handy, 2015) or NoSQL
              (Pokorny, 2011), distributed on platforms such as Cloudera, Hortonworks
              and MapReduce (Dean and Ghemawat, 2008) or Apache Spark.
                 A large body of literature on Big Data concepts has been published
              recently. Two publications that we recommend are “Harness the Power of
              Big Data” by Zikopoulos et al. (2013) and “Harness Oil and Gas Big Data
              with Analytics: Optimize Exploration and Production with Data Driven
              Models” by Holdaway (2014).



                   4.2 INTELLIGENT DATA ANALYTICS
                       AND VISUALIZATION
                   4.2.1 Data Mining
              Data mining (DM) is the discovery of knowledge from large quantities of data.
              The process derives its name from the similarity between searching for valu-
              able business information in a large database, containing terabytes or even
              petabytes of data, and mining a mountain for a vein of valuable ore. Tech-
              nically, the term refers to the process of extracting useful models and patterns
              that are (Leskovec et al., 2014)
              •  valid (i.e., hold on new data with some certainty),
              •  useful (i.e., add value and enable people to take related actions),
              •  unexpected (i.e., nonobvious and nonintuitive, spurring the “aha!”
                 moment), and
              •  understandable (i.e., humans should be able to interpret and analyze them).
              Data mining as a discipline overlaps with database systems, statistics, and
              ML; the resulting complexity of handling data in data mining applications
              is represented graphically in Fig. 4.6.
                 Data come in a variety of modalities, formats, and ontologies:
              structured or unstructured, static or streaming, descriptive or Boolean.
              Successful data mining therefore requires that the data be properly
              collected, stored, and managed. Ideally, the data operators would perform
              these tasks continuously; in reality, as is frequently the case in the E&P
              industry, the data presented for mining are imperfect, with missing,
              illogical, and nonphysical values that require extensive QA/QC processing,
              including interpolation and imputation of missing data (van Buuren and
              Groothuis-Oudshoorn, 2011).
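                 The QA/QC step described above can be sketched in a few lines of
              Python. This is a minimal illustration under stated assumptions, not the
              method of the cited reference: the readings, the physical bounds, and the
              function names flag_nonphysical and interpolate_gaps are hypothetical, and
              simple linear interpolation stands in for the more rigorous multiple
              imputation of van Buuren and Groothuis-Oudshoorn (2011).

```python
from typing import List, Optional


def flag_nonphysical(values: List[Optional[float]],
                     lower: float, upper: float) -> List[Optional[float]]:
    """Mark missing (None/NaN) or out-of-range readings as None."""
    cleaned: List[Optional[float]] = []
    for v in values:
        # v != v is True only for NaN
        if v is None or v != v or not (lower <= v <= upper):
            cleaned.append(None)
        else:
            cleaned.append(float(v))
    return cleaned


def interpolate_gaps(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps linearly between the nearest valid neighbours.

    Leading and trailing gaps are left as None, because there is no
    valid reading on both sides to interpolate between.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            filled[i] = filled[left] + frac * (filled[right] - filled[left])
    return filled


# Hypothetical pressure readings (psi): one missing value, one negative
# sentinel, one absurdly large spike, and one trailing gap.
raw = [2500.0, 2490.0, None, -999.0, 2460.0, 1.0e6, 2440.0, None]

# Assumed physical bounds for this illustration only.
cleaned = flag_nonphysical(raw, lower=0.0, upper=10000.0)
filled = interpolate_gaps(cleaned)
print(filled)
```

                 Flagging and filling are kept as separate passes so that the flagged
              series can be inspected, or handed to a different imputation method,
              before any values are filled in.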
                 Historically, statisticians were the first to use the term “data mining,”
              ironically as a label for attempts to extract information that was not
              supported by the data. However, with the evolution of statistical