Chapter 9 Business Intelligence Systems
4. If you had the data in your answer to question 3, how would you go about determining how much each of the factors influences the price of the stock? What kinds of BI techniques would you employ?
5. Assuming you had used BI to answer question 4 and now had a model of how your 10 factors influence the price of that stock, how would you determine how good your model is? How would you know that the 10 factors you chose were the right 10 factors?
6. Suppose it is possible to obtain the data needed and to build a model to predict with 51 percent accuracy the price of a stock. Is that a usable model? What do you need to make such a model effective?
7. Suppose you’ve misjudged your model and it predicts with only 49 percent accuracy. What is likely to happen?
8. Summarize what you have learned from this exercise.
referred to as the Map phase. In Figure 9-23, for example, a data set having the logs of Google
searches is broken into pieces, and each independent processor is instructed to search for and
count search keywords. Figure 9-23, of course, shows just a small portion of the data; here you
can see a portion of the keywords that begin with H.
As the processors finish, their results are combined in what is referred to as the Reduce phase.
The result is a list of all the terms searched for on a given day and the count of each. The process is
considerably more complex than described here, but this is the gist of the idea.
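The two phases can be sketched in a few lines of Python. This is only an illustration of the idea, not how Google or Hadoop implements it: the function names (map_segment, reduce_counts, count_keywords) are hypothetical, the sample segments are made up for the example, and a small thread pool stands in for the independent processors.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_segment(segment: str) -> Counter:
    """Map phase: count the keywords found in one segment of the log."""
    keywords = [k.strip() for k in segment.split(";") if k.strip()]
    return Counter(keywords)


def reduce_counts(partials) -> Counter:
    """Reduce phase: combine the per-segment counts into one total per keyword."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds counts keyword by keyword
    return total


def count_keywords(segments, workers: int = 2) -> Counter:
    """Run the Map phase on each segment independently, then Reduce."""
    # Assumption: threads stand in for the separate processors.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_segment, segments))
    return reduce_counts(partials)


# Illustrative log segments (invented sample data).
segments = [
    "Abacus; Poodle; Fence; Acura; Healthcare; Cassandra; Belltown; Hadoop; Geranium",
    "Stonework; Healthcare; Honda; Hadoop; Congress; Healthcare",
]
totals = count_keywords(segments)
```

Because each segment is counted without reference to any other, the Map work can be spread over as many processors as there are segments; only the Reduce step needs to see all of the partial results.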
By the way, you can visit Google Trends to see an application of MapReduce. There you can
obtain a trend line of the number of searches for a particular term or terms. Figure 9-24 compares
the search trends for the terms Web 2.0 and Hadoop. Go to www.google.com/trends and enter the
terms Big Data, BigData, and data analytics to see why learning about them is a good use of your time.
Hadoop
Hadoop is an open source program supported by the Apache Foundation that implements
MapReduce on potentially thousands of computers. Hadoop could drive the process of finding and
counting the Google search terms, but Google uses its own proprietary version of MapReduce to
do so instead.
[Figure 9-23: MapReduce Processing. A search log is broken into segments. In the Map phase, each processor counts the keywords in its own segment (for example, Processor 1: Hadoop 14, Healthcare 85, Hiccup 17, Hurricane 8). In the Reduce phase, the per-processor counts are combined into a total count for each keyword (for example, Hadoop 10,418; Halon 4,788; Healthcare 12,487,318; Hiccup 7,435; Honda 127,489; Hotel 237,654; Hurricane 2,799).]
Summary

