Page 156 -
P. 156
3:16 Page 119
#37
10-ch03-083-124-9780123814791
2011/6/1
HAN
3.5 Data Transformation and Data Discretization 119
country 15 distinct values
province_or_state 365 distinct values
city 3567 distinct values
street 674,339 distinct values
Figure 3.13 Automatic generation of a schema concept hierarchy based on the number of distinct
attribute values.
relevant attributes in the hierarchy specification. For example, instead of including
all of the hierarchically relevant attributes for location, the user may have specified
only street and city. To handle such partially specified hierarchies, it is important to
embed data semantics in the database schema so that attributes with tight semantic
connections can be pinned together. In this way, the specification of one attribute
may trigger a whole group of semantically tightly linked attributes to be “dragged in”
to form a complete hierarchy. Users, however, should have the option to override this
feature, as necessary.
Example 3.8 Concept hierarchy generation using prespecified semantic connections. Suppose that
a data mining expert (serving as an administrator) has pinned together the five attri-
butes number, street, city, province or state, and country, because they are closely linked
semantically regarding the notion of location. If a user were to specify only the attribute
city for a hierarchy defining location, the system can automatically drag in all five seman-
tically related attributes to form a hierarchy. The user may choose to drop any of
these attributes (e.g., number and street) from the hierarchy, keeping city as the lowest
conceptual level.
In summary, information at the schema level and on attribute–value counts can be
used to generate concept hierarchies for nominal data. Transforming nominal data with
the use of concept hierarchies allows higher-level knowledge patterns to be found. It
allows mining at multiple levels of abstraction, which is a common requirement for data
mining applications.