3.4 Data Reduction
Figure 3.9 Sampling can be used for data reduction. (Panels: cluster sample, stratified sample.)
a cluster. A reduced data representation can be obtained by applying, say, SRSWOR
to the pages, resulting in a cluster sample of the tuples. Other clustering criteria conveying
rich semantics can also be explored. For example, in a spatial database, we
may choose to define clusters geographically, based on how close different areas are
to one another.
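The cluster-sampling idea described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `cluster_sample` and the parameters `page_size`, `s`, and `seed` are hypothetical, and fixed-size "pages" stand in for whatever clustering criterion is actually used. SRSWOR is applied to the pages, and every tuple on a selected page enters the sample.

```python
import random


def cluster_sample(tuples, page_size, s, seed=None):
    """Return a cluster sample: draw s pages by SRSWOR, keep all their tuples.

    Assumption: tuples are grouped into consecutive pages of `page_size`,
    each page acting as one cluster.
    """
    rng = random.Random(seed)
    # Partition the data set D into disjoint pages (clusters).
    pages = [tuples[i:i + page_size] for i in range(0, len(tuples), page_size)]
    if s > len(pages):
        raise ValueError("s cannot exceed the number of clusters")
    # Simple random sample WITHOUT replacement over the clusters.
    chosen = rng.sample(pages, s)
    # Every tuple in a chosen cluster is included in the reduced data set.
    return [t for page in chosen for t in page]


if __name__ == "__main__":
    D = list(range(1000))            # 1000 tuples, i.e. 10 pages of 100
    sample = cluster_sample(D, page_size=100, s=3, seed=0)
    print(len(sample))               # 3 whole pages of 100 tuples each
```

Note that the sampling unit here is the page, not the tuple, so a sample of s pages costs only s page reads. For a geographic criterion, the partitioning step would be replaced by a grouping on region.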
Stratified sample: If D is divided into mutually disjoint parts called strata, a stratified
sample of D is generated by obtaining an SRS at each stratum. This helps ensure a