
HAN 09-ch02-039-082-9780123814791


          80    Chapter 2 Getting to Know Your Data          2011/6/1  3:15  Page 80  #42



                     2.2 Suppose that the data for analysis includes the attribute age. The age values for the data
                         tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
                         33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
                         (a) What is the mean of the data? What is the median?
                         (b) What is the mode of the data? Comment on the data’s modality (i.e., bimodal,
                             trimodal, etc.).
                         (c) What is the midrange of the data?
                         (d) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
                         (e) Give the five-number summary of the data.
                         (f) Show a boxplot of the data.
                         (g) How is a quantile–quantile plot different from a quantile plot?
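The statistics asked for in parts (a) through (f), plus the plotting positions behind a quantile plot, can be checked with Python's standard statistics module. This is a sketch under stated assumptions: the quartiles come from the module's default "exclusive" method (other quartile conventions can give slightly different values), the 1.5 × IQR outlier rule is one common boxplot convention, and f_i = (i - 0.5)/N is one common choice of plotting position. A quantile plot pairs each sorted value with its f_i; a q-q plot instead pairs the quantiles of one distribution with the corresponding quantiles of another.

```python
import statistics

# Age values from Exercise 2.2, already in increasing order.
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

mean = statistics.mean(ages)            # sum of values / number of values
median = statistics.median(ages)        # middle value (N = 27 is odd)
modes = statistics.multimode(ages)      # every value with the highest count
midrange = (min(ages) + max(ages)) / 2  # average of the two extremes

# Quartiles via the default "exclusive" method (an assumed convention).
q1, _, q3 = statistics.quantiles(ages, n=4)
five_number = (min(ages), q1, median, q3, max(ages))

# Boxplot pieces under the common 1.5 x IQR rule for suspected outliers.
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in ages if x < low_fence or x > high_fence]
inside = [x for x in ages if low_fence <= x <= high_fence]
whiskers = (min(inside), max(inside))  # whiskers stop at the most extreme non-outliers

# Quantile plot: pair the i-th smallest value with f_i = (i - 0.5) / N.
n = len(ages)
quantile_plot = [((i - 0.5) / n, x) for i, x in enumerate(sorted(ages), start=1)]

print(f"mean={mean:.2f} median={median} modes={modes} midrange={midrange}")
print(f"five-number summary={five_number} outliers={outliers} whiskers={whiskers}")
```

Two values tie for the highest count here, so multimode returns both of them, which is what a bimodal data set looks like in this output.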
                     2.3 Suppose that the values for a given set of data are grouped into intervals. The intervals
                         and corresponding frequencies are as follows:
                            age       frequency
                            1–5          200
                            6–15         450
                            16–20        300
                            21–50       1500
                            51–80        700
                            81–110        44
                         Compute an approximate median value for the data.
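A common way to approximate the median of grouped data is to find the interval that contains the (N/2)-th value and interpolate linearly inside it: median ≈ L1 + ((N/2 - (Σfreq)_l) / freq_median) × width, where L1 is the lower boundary of the median interval, (Σfreq)_l is the total frequency of all intervals below it, freq_median is its frequency, and width is its width. The sketch below assumes integer ages and therefore places class boundaries half a year outside the printed limits (so 21–50 spans 20.5 to 50.5). Taking L1 = 21 and width = 29 directly from the printed limits is another convention and gives a slightly different estimate.

```python
# Grouped data from Exercise 2.3: (lowest age, highest age, frequency).
intervals = [(1, 5, 200), (6, 15, 450), (16, 20, 300),
             (21, 50, 1500), (51, 80, 700), (81, 110, 44)]


def grouped_median(groups):
    """Approximate the median of grouped data by linear interpolation.

    Assumption: values are integers, so each class boundary lies 0.5
    outside the printed interval limits.
    """
    total = sum(freq for _, _, freq in groups)
    if total == 0:
        raise ValueError("no data")
    half = total / 2
    cumulative = 0  # total frequency of the intervals below the current one
    for low, high, freq in groups:
        if cumulative + freq >= half:  # the (N/2)-th value falls in this interval
            lower_boundary = low - 0.5
            width = (high + 0.5) - lower_boundary
            return lower_boundary + (half - cumulative) / freq * width
        cumulative += freq
    raise ValueError("frequencies do not reach N/2")


print(round(grouped_median(intervals), 2))
```

Here N = 3194, so N/2 = 1597; 950 values lie below the 21–50 interval and 2450 lie at or below its end, which makes 21–50 the median interval.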
                     2.4 Suppose that a hospital tested the age and body fat data for 18 randomly selected adults
                         with the following results:
                            age    23    23     27    27     39     41    47     49    50
                            %fat   9.5   26.5    7.8   17.8  31.4   25.9   27.4  27.2   31.2
                            age    52    54     54    56     57     58    58     60    61
                            %fat   34.6  42.5   28.8   33.4  30.2   34.1   32.9  41.2   35.7


                         (a) Calculate the mean, median, and standard deviation of age and %fat.
                         (b) Draw the boxplots for age and %fat.
                         (c) Draw a scatter plot and a q-q plot based on these two variables.
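For part (a), Python's statistics module gives the mean, median, and standard deviation directly. This sketch assumes the population standard deviation (pstdev, which divides by N); if the sample standard deviation (dividing by N - 1) is intended, statistics.stdev would be used instead. For (b) and (c) it only computes the numbers the plots are drawn from: the five-number summaries behind the boxplots (quartiles via the module's default "exclusive" method, one of several conventions), the (age, %fat) points of the scatter plot, and the q-q pairs. Because both variables have 18 values, the q-q plot can simply pair the i-th smallest age with the i-th smallest %fat.

```python
import statistics

# Data from Exercise 2.4 (18 adults).
age = [23, 23, 27, 27, 39, 41, 47, 49, 50,
       52, 54, 54, 56, 57, 58, 58, 60, 61]
fat = [9.5, 26.5, 7.8, 17.8, 31.4, 25.9, 27.4, 27.2, 31.2,
       34.6, 42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]


def describe(values):
    """Mean, median, and population standard deviation (divides by N)."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.pstdev(values))


def five_number(values):
    """Minimum, Q1, median, Q3, maximum: the numbers a boxplot is drawn from."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, q2, q3, max(values))


age_stats = describe(age)
fat_stats = describe(fat)

# Scatter plot: one point per person, (age_i, fat_i), in the original pairing.
scatter_points = list(zip(age, fat))

# q-q plot: equal sample sizes, so pair the i-th smallest value of each variable.
qq_points = list(zip(sorted(age), sorted(fat)))

print("age: mean=%.2f median=%.1f std=%.2f" % age_stats)
print("fat: mean=%.2f median=%.1f std=%.2f" % fat_stats)
print("age five-number summary:", five_number(age))
print("fat five-number summary:", five_number(fat))
```

The scatter points keep each person's age and %fat together, while the q-q points deliberately break that pairing and compare the two distributions quantile by quantile.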
                     2.5 Briefly outline how to compute the dissimilarity between objects described by the
                         following:
                         (a) Nominal attributes
                         (b) Asymmetric binary attributes