                               10-ch03-083-124-9780123814791
                                                                     3:16 Page 107
                                                            2011/6/1
                         HAN
                                                                              3.4 Data Reduction  107



[Figure 3.7 plot: y-axis count (0 to 10); x-axis price ($) (5 to 30)]


                     Figure 3.7 A histogram for price using singleton buckets, where each bucket represents one
                               price–value/frequency pair.
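A singleton-bucket histogram is simply a frequency count of each distinct value. The minimal Python sketch below shows the idea. The function name `singleton_histogram` and the price list are hypothetical and for illustration only; they are not the data plotted in Figure 3.7.

```python
from collections import Counter

def singleton_histogram(values):
    """Return (value, frequency) pairs, one bucket per distinct value, sorted by value."""
    counts = Counter(values)  # tally how often each distinct value occurs
    return sorted(counts.items())

# Hypothetical price list (illustrative only, not the data behind Figure 3.7).
prices = [1, 1, 5, 5, 5, 8, 10, 10, 12, 14, 14, 15]
print(singleton_histogram(prices))
# [(1, 2), (5, 3), (8, 1), (10, 2), (12, 1), (14, 2), (15, 1)]
```

Each pair corresponds to one bar in a singleton-bucket histogram, so the number of buckets grows with the number of distinct values.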


[Figure 3.8 plot: y-axis count (0 to 25); x-axis price ($), buckets 1–10, 11–20, 21–30]


                     Figure 3.8 An equal-width histogram for price, where values are aggregated so that each bucket has a
                               uniform width of $10.
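Aggregating values into equal-width buckets like those in Figure 3.8 can be sketched as follows. This assumes integer prices and buckets labeled 1–10, 11–20, 21–30 (bucket width $10 starting at $1). The function name `equal_width_histogram` and the sample prices are hypothetical, not taken from the figure.

```python
def equal_width_histogram(values, low, width, n_buckets):
    """Count integer values into n_buckets contiguous buckets of equal width.

    Bucket i covers low + i*width through low + (i+1)*width - 1 (inclusive).
    """
    counts = [0] * n_buckets
    for v in values:
        i = (v - low) // width  # index of the bucket that contains v
        if not 0 <= i < n_buckets:
            raise ValueError(f"value {v} falls outside the histogram range")
        counts[i] += 1
    labels = [f"{low + i * width}-{low + (i + 1) * width - 1}" for i in range(n_buckets)]
    return dict(zip(labels, counts))

# Hypothetical price list (illustrative only, not the data behind Figure 3.8).
prices = [1, 1, 5, 5, 5, 8, 10, 10, 12, 14, 14, 15, 18, 20, 21, 25, 28, 30]
print(equal_width_histogram(prices, low=1, width=10, n_buckets=3))
# {'1-10': 8, '11-20': 6, '21-30': 4}
```

Compared with singleton buckets, the number of buckets here is fixed in advance, so the histogram stays small regardless of how many distinct prices occur.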

                                 “How are the buckets determined and the attribute values partitioned?” There are
                               several partitioning rules, including the following:

                                 Equal-width: In an equal-width histogram, the width of each bucket range is
                                 uniform (e.g., the width of $10 for the buckets in Figure 3.8).
                                 Equal-frequency (or equal-depth): In an equal-frequency histogram, the buckets are
                                 created so that, roughly, the frequency of each bucket is constant (i.e., each bucket
                                 contains roughly the same number of contiguous data samples).
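An equal-frequency partition can be sketched by sorting the values and cutting the sorted list into contiguous groups whose sizes differ by at most one. In this minimal Python sketch, the function name `equal_frequency_buckets` and the sample prices are hypothetical. It is also a simplification: it may split identical values across a bucket boundary, which a practical implementation might avoid by shifting boundaries to keep ties together.

```python
def equal_frequency_buckets(values, n_buckets):
    """Split the sorted values into n_buckets contiguous groups of (nearly) equal size."""
    if n_buckets < 1:
        raise ValueError("n_buckets must be at least 1")
    data = sorted(values)
    n = len(data)
    buckets = []
    for i in range(n_buckets):
        # Integer boundaries give sizes that differ by at most one element.
        start = i * n // n_buckets
        end = (i + 1) * n // n_buckets
        buckets.append(data[start:end])
    return buckets

# Hypothetical price list (illustrative only).
prices = [1, 1, 5, 5, 5, 8, 10, 10, 12, 14, 14, 15, 18, 20, 21, 25, 28, 30]
for b in equal_frequency_buckets(prices, 3):
    print(f"{b[0]}-{b[-1]}: {len(b)} values")
# 1-8: 6 values
# 10-15: 6 values
# 18-30: 6 values
```

Unlike the equal-width case, the bucket ranges here (1 to 8, 10 to 15, 18 to 30) have different widths; the counts are what is held roughly constant.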