                          4.7 Exercises


                               (b) The RFID data may contain lots of redundant information. Discuss a method
                                  that maximally reduces redundancy during data registration in the RFID data
                                  warehouse.
                               (c) The RFID data may contain lots of noise such as missing registration and misread
                                  IDs. Discuss a method that effectively cleans up the noisy data in the RFID data
                                  warehouse.
                               (d) You may want to perform online analytical processing to determine how many TV
                                  sets were shipped from the LA seaport to BestBuy in Champaign, IL, by month,
                                  brand, and price range. Outline how this could be done efficiently if you were to
                                  store such RFID data in the warehouse.
                               (e) If a customer returns a jug of milk and complains that it has spoiled before its
                                  expiration date, discuss how you can investigate such a case in the warehouse to
                                  find out what the problem is, either in shipping or in storage.
                          4.12 In many applications, new data sets are incrementally added to the existing large
                               data sets. Thus, an important consideration is whether a measure can be computed
                               efficiently in an incremental manner. Use count, standard deviation, and median as
                               examples to show that a distributive or algebraic measure facilitates efficient incremental
                               computation, whereas a holistic measure does not.
                          4.13 Suppose that we need to record three measures in a data cube: min(), average(), and
                               median(). Design an efficient computation and storage method for each measure given
                               that the cube allows data to be deleted incrementally (i.e., in small portions at a time)
                               from the cube.
                          4.14 In data warehouse technology, a multidimensional view can be implemented by
                               a relational database technique (ROLAP), by a multidimensional database technique
                               (MOLAP), or by a hybrid database technique (HOLAP).
                               (a) Briefly describe each implementation technique.
                               (b) For each technique, explain how each of the following functions may be
                                  implemented:
                                   i. The generation of a data warehouse (including aggregation)
                                  ii. Roll-up
                                  iii. Drill-down
                                  iv. Incremental updating
                               (c) Which implementation techniques do you prefer, and why?
                          4.15 Suppose that a data warehouse contains 20 dimensions, each with about five levels of
                               granularity.
                               (a) Users are mainly interested in four particular dimensions, each having three
                                  frequently accessed levels for rolling up and drilling down. How would you design
                                  a data cube structure to support this preference efficiently?
                               (b) At times, a user may want to drill through the cube to the raw data for one or two
                                  particular dimensions. How would you support this feature?
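
As a starting point for Exercise 4.12, the contrast between the three measures can be sketched in Python. This is a minimal illustration, not a full answer: it assumes the population standard deviation and small in-memory lists, and the helper names summarize, merge, and std_dev are hypothetical. Count is distributive (partial counts simply add), standard deviation is algebraic (it follows from a fixed-size summary of count, sum, and sum of squares), and median is holistic (no fixed-size summary of the old data suffices in general).

```python
import math
import statistics

def summarize(values):
    """Fixed-size summary of a chunk: (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))

def merge(a, b):
    """Combine two summaries without revisiting the underlying data."""
    return tuple(x + y for x, y in zip(a, b))

def std_dev(summary):
    """Population standard deviation derived from a summary alone."""
    n, total, total_sq = summary
    mean = total / n
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(total_sq / n - mean * mean, 0.0))

existing = [2.0, 4.0, 4.0, 4.0]    # data already in the warehouse
new_batch = [5.0, 5.0, 7.0, 9.0]   # incrementally added data

# Count and standard deviation: only the two summaries are needed.
combined = merge(summarize(existing), summarize(new_batch))
incremental_count = combined[0]
incremental_std = std_dev(combined)

# Median: the medians of the parts (4.0 and 6.0) do not determine the
# median of the whole, so the full data set must be consulted again.
median_of_parts = (statistics.median(existing), statistics.median(new_batch))
true_median = statistics.median(existing + new_batch)

print(incremental_count, incremental_std, median_of_parts, true_median)
```

The sum-of-squares formula is used here only because it makes the algebraic structure obvious; with large or nearly constant values it can lose precision, and a numerically stabler merge of per-chunk means and variances would be preferable in practice.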