$$h f(\xi_k) = \int_{B_k} f(t)\, dt \,; \quad \text{for some } \xi_k \text{ in } B_k. \tag{8.8}$$
This is based on the assumption that the probability density function $f(x)$ is
Lipschitz continuous over the bin interval $B_k$. A function is Lipschitz continuous
if there is a positive constant $\gamma_k$ such that

$$\left| f(x) - f(y) \right| < \gamma_k \left| x - y \right| \,; \quad \text{for all } x, y \text{ in } B_k. \tag{8.9}$$
The first term in Equation 8.7 is an upper bound for the variance of the den-
sity estimate, and the second term is an upper bound for the squared bias of
the density estimate. This upper bound shows what happens to the density
estimate when the bin width h is varied.
We can try to minimize the MSE by varying the bin width h. We could set
h very small to reduce the bias, but this also increases the variance. The
increased variance in our density estimate is evident in Figure 8.1, where we
see more spikes as the bin width gets smaller. Equation 8.7 shows a common
problem in some density estimation methods: the trade-off between variance
and bias as h is changed. Most of the optimal bin widths presented here are
obtained by trying to minimize the squared error.
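To see this trade-off concretely, a minimal MATLAB sketch along the following lines (our own illustration, not code from the text; the sample size and bin widths are arbitrary choices) plots histogram density estimates of the same sample for several values of h. The smallest bin width gives the spikiest, most variable estimate, while the largest smooths away detail.

   % Illustrate the variance-bias trade-off by varying the bin width h.
   % The sample, sample size, and bin widths are arbitrary choices.
   n = 1000;
   x = randn(1,n);                   % standard normal sample
   for h = [0.1 0.5 1.5]             % small, moderate, large bin widths
      edges = min(x):h:max(x)+h;     % bin edges of width h
      counts = histc(x,edges);       % counts per bin
      fhat = counts/(n*h);           % normalize counts to a density estimate
      figure
      bar(edges,fhat,'histc')
      title(['Bin width h = ',num2str(h)])
   end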
A rule for bin width selection that is often presented in introductory statis-
tics texts is called Sturges’ Rule. In reality, it is a rule that provides the number
of bins in the histogram, and is given by the following formula.
STURGES’ RULE (HISTOGRAM)
$$k = 1 + \log_2 n \,.$$
Here k is the number of bins. The bin width h is obtained by taking the range
of the sample data and dividing it into the requisite number of bins, k.
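As a simple illustration (our own sketch, not code from the text; the vector x is assumed to hold the sample), Sturges' Rule can be applied as follows, rounding the number of bins up to an integer:

   % Sturges' Rule: number of bins and the corresponding bin width
   % for data stored in the vector x (hypothetical variable names).
   n = length(x);
   k = ceil(1 + log2(n));        % number of bins, rounded up
   h = (max(x) - min(x))/k;      % bin width from the sample range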
Some improved values for the bin width h can be obtained by assuming the
existence of two derivatives of the probability density function $f(x)$. We
include the following results (without proof), because they are the basis for
many of the univariate bin width rules presented in this chapter. The inter-
ested reader is referred to Scott [1992] for more details. Most of what we
present here follows his treatment of the subject.
Equation 8.7 provides a measure of the squared error at a point x. If we
want to measure the error in our estimate for the entire function, then we can
integrate over all values of x. Let's assume $f(x)$ has an absolutely continuous
and square-integrable first derivative. If we let n get very large $(n \to \infty)$,
then the asymptotic MISE is