Chapter 4 Data Warehousing and Online Analytical Processing
(a) Present an example illustrating such a huge and sparse data cube.
(b) Design an implementation method that can elegantly overcome this sparse matrix
problem. Note that you need to explain your data structures in detail and discuss
the space needed, as well as how to retrieve data from your structures.
(c) Modify your design in (b) to handle incremental data updates. Give the reasoning
behind your new design.
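One possible starting point for parts (b) and (c) is sketched below: store only the non-empty cells in a hash map keyed by the cell's coordinates, so space grows with the number of populated cells rather than with the product of the dimension sizes. The class name `SparseCube`, its methods, and the choice of `sum` as the measure are assumptions made for illustration, not a prescribed design.

```python
from collections import defaultdict


class SparseCube:
    """Stores only non-empty cells of a cube, keyed by their coordinates."""

    def __init__(self, dims):
        # Dimension names, e.g. ("time", "location", "item") (hypothetical).
        self.dims = tuple(dims)
        # Coordinate tuple -> aggregated measure (a running sum here).
        self.cells = defaultdict(float)

    def add(self, coords, value):
        """Incremental update: fold one new fact into its cell."""
        if len(coords) != len(self.dims):
            raise ValueError("coordinate arity does not match dimensions")
        self.cells[tuple(coords)] += value

    def get(self, coords):
        """Point lookup; empty cells take no storage and read as 0."""
        return self.cells.get(tuple(coords), 0.0)

    def rollup(self, **fixed):
        """Sum every stored cell that matches the fixed dimension values."""
        idx = {self.dims.index(d): v for d, v in fixed.items()}
        return sum(
            value
            for coords, value in self.cells.items()
            if all(coords[i] == want for i, want in idx.items())
        )


cube = SparseCube(["time", "location", "item"])
cube.add(("Q1", "Vancouver", "TV"), 100.0)
cube.add(("Q1", "Toronto", "TV"), 50.0)
cube.add(("Q1", "Vancouver", "TV"), 25.0)  # incremental update to an existing cell
print(cube.get(("Q1", "Vancouver", "TV")), cube.rollup(item="TV"))
```

Point lookups and updates take expected constant time, while `rollup` scans all stored cells; a fuller answer would likely add per-dimension indexes or chunking to avoid that scan.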
4.9 Regarding the computation of measures in a data cube:
(a) Enumerate three categories of measures, based on the kind of aggregate functions
used in computing a data cube.
(b) For a data cube with the three dimensions time, location, and item, which category
does the function variance belong to? Describe how to compute it if the cube is
partitioned into many chunks.
Hint: The formula for computing variance is $\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$, where $\bar{x}$ is the
average of the $x_i$s.
(c) Suppose the function is “top 10 sales.” Discuss how to efficiently compute this
measure in a data cube.
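As a hedged illustration of parts (b) and (c), the sketch below assumes each chunk reports a small summary instead of its raw values. For variance, the summary is (count, sum, sum of squares), which can be combined through the identity variance = (sum of squares)/N minus the squared mean. For "top 10 sales," each chunk reports its own local top-k list and the lists are merged. The function names are hypothetical, and the sum-of-squares form can lose precision when the mean is large relative to the spread.

```python
import heapq


def chunk_summary(values):
    """Per-chunk summary: (count, sum, sum of squares)."""
    return len(values), sum(values), sum(v * v for v in values)


def combine_variance(summaries):
    """Population variance (1/N form) from per-chunk summaries."""
    n = sum(s[0] for s in summaries)
    if n == 0:
        raise ValueError("no values to aggregate")
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    mean = total / n
    return total_sq / n - mean * mean


def combine_top_k(chunk_tops, k):
    """Global top-k from each chunk's local top-k list."""
    return heapq.nlargest(k, (v for top in chunk_tops for v in top))


chunks = [[2, 4], [4, 4, 5], [5, 7, 9]]
print(combine_variance([chunk_summary(c) for c in chunks]))
print(combine_top_k([heapq.nlargest(2, c) for c in chunks], 2))
```

The merge of local top-k lists is exact only because every chunk keeps at least k candidates; a chunk that kept fewer could hide a value belonging in the global result.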
4.10 Suppose a company wants to design a data warehouse to facilitate the analysis of moving
vehicles in an online analytical processing manner. The company registers huge amounts
of auto movement data in the format of (Auto ID, location, speed, time). Each Auto ID
represents a vehicle associated with information (e.g., vehicle category, driver category),
and each location may be associated with a street in a city. Assume that a street map is
available for the city.
(a) Design such a data warehouse to facilitate effective online analytical processing in
multidimensional space.
(b) The movement data may contain noise. Discuss how you would develop a method
to automatically discover data records that were likely erroneously registered in the
data repository.
(c) The movement data may be sparse. Discuss how you would develop a method that
constructs a reliable data warehouse despite the sparsity of data.
(d) If you want to drive from A to B starting at a particular time, discuss how a system
may use the data in this warehouse to work out a fast route.
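For part (d), one simple approach (an assumption here, not the only possible answer) is to aggregate observed speeds into cells keyed by (street, hour), convert each street segment's length into an expected travel time, and run Dijkstra's algorithm over the street map. The record layout `(street, hour, speed_kmh)`, the edge layout `(from_node, to_node, street, length_km)`, and the `default_speed` fallback for empty cells are all hypothetical. The sketch also uses the departure hour for the whole trip rather than modeling time-dependent speeds.

```python
import heapq
from collections import defaultdict


def average_speeds(records):
    """records: iterable of (street, hour, speed_kmh) -> {(street, hour): mean}."""
    totals = defaultdict(lambda: [0.0, 0])
    for street, hour, speed in records:
        cell = totals[(street, hour)]
        cell[0] += speed
        cell[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


def fastest_route(edges, speeds, start, goal, hour, default_speed=30.0):
    """Dijkstra over one-way edges (from_node, to_node, street, length_km)."""
    graph = defaultdict(list)
    for u, v, street, length in edges:
        speed = speeds.get((street, hour), default_speed)
        if speed <= 0:
            continue  # treat a stopped or invalid cell as impassable
        graph[u].append((v, length / speed))  # travel time in hours
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == goal:
            break
        if t > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            nt = t + w
            if nt < best.get(v, float("inf")):
                best[v] = nt
                prev[v] = u
                heapq.heappush(heap, (nt, v))
    if goal not in best:
        return None, float("inf")
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], best[goal]


speeds = average_speeds([("Main", 8, 20.0), ("Main", 8, 20.0),
                         ("Side", 8, 50.0), ("Side2", 8, 50.0)])
edges = [("A", "B", "Main", 10.0), ("A", "C", "Side", 5.0), ("C", "B", "Side2", 5.0)]
print(fastest_route(edges, speeds, "A", "B", hour=8))
```

In this toy map the direct street is slow at hour 8, so the detour through C is preferred; with sparse data, many cells would fall back to `default_speed`, which is where the smoothing discussed in part (c) would matter.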
4.11 Radio-frequency identification is commonly used to trace object movement and perform
inventory control. An RFID reader can successfully read an RFID tag from
a limited distance at any scheduled time. Suppose a company wants to design a data
warehouse to facilitate the analysis of objects with RFID tags in an online analytical processing
manner. The company registers huge amounts of RFID data in the format of
(RFID, at location, time), and also has some information about the objects carrying the
RFID tag, for example, (RFID, product name, product category, producer, date produced,
price).
(a) Design a data warehouse to facilitate effective registration and online analytical
processing of such data.