                                                      3.5 Data Transformation and Data Discretization  119



country (15 distinct values)
    province_or_state (365 distinct values)
        city (3567 distinct values)
            street (674,339 distinct values)

Figure 3.13 Automatic generation of a schema concept hierarchy based on the number of distinct attribute values.


                                 relevant attributes in the hierarchy specification. For example, instead of including
                                 all of the hierarchically relevant attributes for location, the user may have specified
                                 only street and city. To handle such partially specified hierarchies, it is important to
                                 embed data semantics in the database schema so that attributes with tight semantic
                                 connections can be pinned together. In this way, the specification of one attribute
                                 may trigger a whole group of semantically tightly linked attributes to be “dragged in”
                                 to form a complete hierarchy. Users, however, should have the option to override this
                                 feature, as necessary.

Example 3.8 Concept hierarchy generation using prespecified semantic connections. Suppose that
a data mining expert (serving as an administrator) has pinned together the five attributes
number, street, city, province_or_state, and country, because they are closely linked
semantically regarding the notion of location. If a user specifies only the attribute
city for a hierarchy defining location, the system can automatically drag in all five
semantically related attributes to form a hierarchy. The user may then choose to drop any of
these attributes (e.g., number and street) from the hierarchy, keeping city as the lowest
conceptual level.
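
The behavior in Example 3.8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular system: the names PINNED_GROUPS and complete_hierarchy are hypothetical, and each pinned group is assumed to be stored as a list ordered from the lowest conceptual level to the highest.

```python
# Hypothetical registry of attribute groups that an administrator has pinned
# together. Each group is ordered from lowest to highest conceptual level.
PINNED_GROUPS = [
    ["number", "street", "city", "province_or_state", "country"],
]


def complete_hierarchy(specified, pinned_groups=PINNED_GROUPS, drop=()):
    """Expand a partially specified hierarchy to its full pinned group.

    specified: attributes the user listed (possibly only a subset).
    drop:      attributes the user chooses to remove afterward (the override).
    """
    wanted = set(specified)
    for group in pinned_groups:
        # Mentioning any member of a group drags in the whole group.
        if wanted & set(group):
            return [attr for attr in group if attr not in drop]
    # No pinned group applies: keep the user's attributes as given.
    return list(specified)


# The user names only "city"; all five pinned attributes are dragged in.
full = complete_hierarchy(["city"])
print(full)

# The user then drops number and street, so city is the lowest level.
trimmed = complete_hierarchy(["city"], drop={"number", "street"})
print(trimmed)
```

The drop argument models the user's option to override the automatic completion; a real system would presumably keep the pinned groups in the schema metadata rather than in a module-level constant.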

                                 In summary, information at the schema level and on attribute–value counts can be
                               used to generate concept hierarchies for nominal data. Transforming nominal data with
                               the use of concept hierarchies allows higher-level knowledge patterns to be found. It
                               allows mining at multiple levels of abstraction, which is a common requirement for data
                               mining applications.
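
As a minimal sketch of the attribute-value-count heuristic illustrated in Figure 3.13, the following Python fragment orders attributes so that the one with the fewest distinct values sits at the top of the hierarchy. The function name order_by_distinct_count is hypothetical, and the counts are those shown in the figure. Because this is only a heuristic, the resulting order should be treated as a proposal for the user to review rather than a final answer.

```python
def order_by_distinct_count(distinct_counts):
    """Return attribute names from the highest level (fewest distinct
    values) down to the lowest level (most distinct values)."""
    return sorted(distinct_counts, key=distinct_counts.get)


# Distinct-value counts taken from Figure 3.13, given in arbitrary order.
counts = {
    "street": 674339,
    "country": 15,
    "city": 3567,
    "province_or_state": 365,
}

hierarchy = order_by_distinct_count(counts)

# Print the hierarchy from the lowest level to the highest.
print(" < ".join(reversed(hierarchy)))
```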