
Chapter 9  Business Intelligence Systems
                4.  If you had the data in your answer to question 3, how would
                    you go about determining how much each of the factors
                    influences the price of the stock? What kinds of BI techniques
                    would you employ?
                5.  Assuming you had used BI to answer question 4 and now had
                    a model of how your 10 factors influence the price of that
                    stock, how would you determine how good your model is? How
                    would you know that the 10 factors you chose were the right
                    10 factors?
                6.  Suppose it is possible to obtain the data needed and to build
                    a model to predict with 51 percent accuracy the price of a
                    stock. Is that a usable model? What do you need to make such
                    a model effective?
                7.  Suppose you've misjudged your model and it predicts with only
                    49 percent accuracy. What is likely to happen?
                8.  Summarize what you have learned from this exercise.






                                            referred to as the Map phase. In Figure 9-23, for example, a data set having the logs of Google
                                            searches is broken into pieces, and each independent processor is instructed to search for and
                                            count search keywords. Figure 9-23, of course, shows just a small portion of the data; here you
                                            can see a portion of the keywords that begin with H.
                                               As the processors finish, their results are combined in what is referred to as the Reduce phase.
                                            The result is a list of all the terms searched for on a given day and the count of each. The process is
                                            considerably more complex than described here, but this is the gist of the idea.
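
The two phases can be illustrated with a minimal Python sketch. It is a single-machine simulation, not how Google or any production system does it: the "processors" are ordinary function calls, the short search log is made up from keywords shown in Figure 9-23, and the function names (`split_into_segments`, `map_phase`, `reduce_phase`) are hypothetical names chosen for this illustration.

```python
from collections import Counter
from functools import reduce

# Hypothetical search log; the keywords are borrowed from Figure 9-23.
search_log = [
    "Halon", "Wolverine", "Abacus", "Poodle", "Fence", "Acura",
    "Healthcare", "Cassandra", "Belltown", "Hadoop", "Geranium",
    "Stonework", "Healthcare", "Honda", "Hadoop", "Congress",
    "Healthcare", "Frigate", "Metric", "Clamp", "Dell", "Salmon",
    "Hadoop", "Picasso", "Abba",
]


def split_into_segments(log, n_segments):
    """Break the log into roughly equal pieces, one per simulated processor."""
    size = -(-len(log) // n_segments)  # ceiling division
    return [log[i:i + size] for i in range(0, len(log), size)]


def map_phase(segment):
    """Each processor independently counts the keywords in its own segment."""
    return Counter(segment)


def reduce_phase(partial_counts):
    """Combine the per-processor counts into one total per keyword."""
    return reduce(lambda total, part: total + part, partial_counts, Counter())


segments = split_into_segments(search_log, n_segments=3)
# In a real cluster these calls would run in parallel on separate machines.
partial_counts = [map_phase(segment) for segment in segments]
totals = reduce_phase(partial_counts)

# Show only the keywords that begin with H, as the figure does.
for keyword in sorted(k for k in totals if k.startswith("H")):
    print(keyword, totals[keyword])
```

Because each segment is counted without reference to any other, the Map step can be spread across as many processors as there are segments; only the Reduce step needs to see all of the partial results.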
                                               By the way, you can visit Google Trends to see an application of MapReduce. There you can
                                            obtain a trend line of the number of searches for a particular term or terms. Figure 9-24 compares
                                            the search trends for the terms Web 2.0 and Hadoop. Go to www.google.com/trends and enter the
                                            terms Big Data, BigData, and data analytics to see why learning about them is a good use of your time.

                                            Hadoop

                                            Hadoop is an open source program supported by the Apache Foundation[16] that implements
                                            MapReduce on potentially thousands of computers. Hadoop could drive the process of finding and
                                            counting the Google search terms, but Google uses its own proprietary version of MapReduce to
                                            do so instead.


                Figure 9-23  MapReduce Processing Summary

                Search log (excerpt, divided into log segments):
                    … Halon; Wolverine; Abacus; Poodle; Fence; Acura; Healthcare;
                    Cassandra; Belltown; Hadoop; Geranium; Stonework; Healthcare;
                    Honda; Hadoop; Congress; Healthcare; Frigate; Metric; Clamp;
                    Dell; Salmon; Hadoop; Picasso; Abba; …

                Map Phase (keyword counts produced by each processor):
                    Processor 1:               … Hadoop 14; Healthcare 85; Hiccup 17; Hurricane 8; …
                    Processor 2:               … Hadoop 3; Healthcare 2; Honda 1; …
                    …
                    Processor 9,555 (+ or –):  … Halon 11; Hotel 175; Honda 87; Hurricane 53; …

                Reduce Phase:
                    Keyword:      Total Count:
                    …
                    Hadoop            10,418
                    Halon              4,788
                    Healthcare    12,487,318
                    Hiccup             7,435
                    Honda            127,489
                    Hotel            237,654
                    Hurricane          2,799
                    …