Page 126 - Computational Statistics Handbook with MATLAB

Chapter 5: Exploratory Data Analysis



                             Histograms
                             A histogram is a way to graphically represent the frequency distribution of a
                             data set. Histograms are a good way to
                                • summarize a data set to understand general characteristics of the
                                   distribution such as shape, spread or location,
                                • suggest possible probabilistic models, or
                                • detect unusual behavior.

                             In this chapter, we look only at the simple, basic histogram. Variants and
                             extensions of the histogram are discussed in Chapter 8.
                              A frequency histogram is obtained by creating a set of bins or intervals that
                             cover the range of the data set. It is important that these bins do not overlap
                             and that they have equal width. We then count the number of observations
                             that fall into each bin. To visualize this, we plot the frequency as the height of
                             a bar, with the width of the bar representing the width of the bin. The histo-
                             gram is determined by two parameters, the bin width and the starting point
                             of the first bin. We discuss these issues in greater detail in Chapter 8. A
                             relative frequency histogram is obtained by setting the height of each bar
                             to the relative frequency of the observations that fall into the bin, that
                             is, the bin count divided by the total number of observations.
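
                             The construction just described can be sketched directly, without a
                             built-in histogram function. This is a minimal illustration only: the
                             data vector, the bin width h and the starting point t0 below are
                             hypothetical values chosen for the sketch, not taken from the text.

```matlab
% Hypothetical sample (illustration only, not from the text).
data = [1.2 1.9 2.1 2.4 2.5 3.3 3.7 4.0 4.8 5.6];

% The two parameters that determine the histogram
% (both values are assumptions made for this sketch).
h  = 1;    % bin width
t0 = 1;    % starting point of the first bin; must be <= min(data)

% Number of equal-width bins needed to cover the data.
nbins = floor((max(data) - t0)/h) + 1;

% Index of the bin containing each observation.
% Bin k covers the half-open interval [t0+(k-1)*h, t0+k*h),
% so the bins do not overlap.
k = floor((data - t0)/h) + 1;

% Frequency: number of observations falling into each bin.
n = accumarray(k(:), 1, [nbins 1])';

% Relative frequency: each count divided by the total count.
relfreq = n / numel(data);

% Bin centers, useful as bar positions when plotting.
centers = t0 + h*((1:nbins) - 0.5);
```

                             Calling bar(centers,n,1) or bar(centers,relfreq,1) on these results
                             would draw the frequency or relative frequency histogram with bars
                             that touch. The relative frequencies sum to one by construction.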
                              The basic MATLAB package has a function for calculating and plotting a
                             univariate histogram. This function is illustrated in the example given below.

                             Example 5.1
                             In this example, we look at a histogram of the data in forearm. These data
                             [Hand, et al., 1994; Pearson and Lee, 1903] consist of 140 measurements of the
                             length in inches of the forearm of adult males. We can obtain a simple histo-
                             gram in MATLAB using these commands:
                                load forearm
                                subplot(1,2,1)
                                % The hist function optionally returns the
                                % bin centers and frequencies.
                                [n,x] = hist(forearm);
                                % Plot and use the argument of width=1
                                % to produce bars that touch.
                                bar(x,n,1);
                                axis square
                                title('Frequency Histogram')
                                % Now create a relative frequency histogram.
                                % Divide each bin count by the total number
                                % of points (140 for these data).
                                subplot(1,2,2)
                                bar(x,n/sum(n),1)
                                title('Relative Frequency Histogram')
                                axis square
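
                             As a hedged follow-up, the sketch below shows how the number of bins
                             passed to hist changes the bin width and therefore the counts. The
                             vector y is hypothetical stand-in data so that the commands run on
                             their own; it is not the forearm data set.

```matlab
% Stand-in data (hypothetical values, for illustration only).
% With the book's data file available, one could instead use:
%   load forearm, y = forearm;
y = [17.3 18.4 18.0 19.1 18.8 20.9 19.5 18.6 17.7 19.0 18.2 19.9];

% With no second argument, hist uses 10 equally spaced bins.
[n10, x10] = hist(y);

% A scalar second argument sets the number of bins,
% which in turn changes the bin width.
[n5, x5] = hist(y, 5);

% Relative frequencies, normalizing by the total count
% rather than a hard-coded sample size.
r10 = n10 / sum(n10);
r5  = n5  / sum(n5);

% Width of each bin, recovered from adjacent bin centers.
w10 = x10(2) - x10(1);
w5  = x5(2)  - x5(1);
```

                             Plotting bar(x10,r10,1) and bar(x5,r5,1) side by side would show how
                             the apparent shape of the distribution depends on the bin width.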


                            © 2002 by Chapman & Hall/CRC