It follows that the probability that there is at least one collision, that is, at least two keys are
mapped to the same location, is
$$1 - p_n = 1 - \frac{m-1}{m} \cdot \frac{m-2}{m} \cdots \frac{m-n+1}{m}.$$
Techniques from calculus can be used to find the smallest value of n given a value of m such
that the probability of a collision is greater than a particular threshold. It can be shown that the
smallest integer n such that the probability of a collision is greater than 1/2 is approximately
$n = 1.177\sqrt{m}$. For example, when m = 1,000,000, the smallest integer n such that the probability of a collision is greater than 1/2 is 1178. ▲
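To illustrate, this smallest n can be computed directly from the product formula above. The following Python sketch is our own illustration (the function names are hypothetical, not part of the text):

def collision_probability(n, m):
    # Probability that at least two of n keys are mapped to the same
    # location among m locations, using the product formula above.
    p_no_collision = 1.0
    for k in range(1, n):
        p_no_collision *= (m - k) / m
    return 1 - p_no_collision

def smallest_n_for_collision(m, threshold=0.5):
    # Smallest integer n with collision probability greater than threshold.
    n = 1
    p_no_collision = 1.0
    while 1 - p_no_collision <= threshold:
        p_no_collision *= (m - n) / m
        n += 1
    return n

print(smallest_n_for_collision(1_000_000))     # 1178, matching the text
print(collision_probability(1178, 1_000_000))  # just over 1/2

Rather than calling collision_probability repeatedly, smallest_n_for_collision extends the running product one factor at a time, so the search takes only one pass.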
Monte Carlo Algorithms
The algorithms discussed so far in this book are all deterministic. That is, each algorithm always
proceeds in the same way whenever given the same input. However, there are many situations
where we would like an algorithm to make a random choice at one or more steps. Such a
situation arises when a deterministic algorithm would have to go through a huge number, or even
an unknown number, of possible cases. Algorithms that make random choices at one or more
steps are called probabilistic algorithms. We will discuss a particular class of probabilistic
algorithms in this section, namely, Monte Carlo algorithms, for decision problems. Monte
Carlo algorithms always produce answers to problems, but a small probability remains that
these answers may be incorrect. However, the probability that the answer is incorrect decreases
rapidly when the algorithm carries out sufficient computation. Decision problems have either
“true” or “false” as their answer. The designation “Monte Carlo” is a reference to the famous
casino in Monaco; the use of randomness and the repetitive processes in these algorithms make
them similar to some gambling games. This name was introduced by the inventors of Monte
Carlo methods, including Stan Ulam, Enrico Fermi, and John von Neumann, who invented
these methods to help develop the first nuclear weapons.
A Monte Carlo algorithm for a decision problem uses a sequence of tests. The probability
that the algorithm answers the decision problem correctly increases as more tests are carried
out. At each step of the algorithm, possible responses are “true,” which means that the answer
is “true” and no additional iterations are needed, or “unknown,” which means that the answer
could be either “true” or “false.” After running all the iterations in such an algorithm, the final
answer produced is “true” if at least one iteration yields the answer “true,” and the answer is
“false” if every iteration yields the answer “unknown.” If the correct answer is “false,” then the
algorithm answers “false,” because every iteration will yield “unknown.” However, if the correct
answer is “true,” then the algorithm could answer either “true” or “false,” because it may be
possible that each iteration produced the response “unknown” even though the correct response
was “true.” We will show that this possibility becomes extremely unlikely as the number of tests
increases.
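The overall structure just described can be written down generically. The following Python sketch is only an illustration of this scheme, where run_test is a placeholder for one randomized test of the kind described above:

def monte_carlo_decide(run_test, iterations):
    # run_test() performs one independent randomized test and returns
    # "true" or "unknown", as described above.
    for _ in range(iterations):
        if run_test() == "true":
            return "true"    # a single "true" response settles the answer
    # Every iteration returned "unknown", so the algorithm answers
    # "false"; this is wrong only in the event that the correct answer
    # is "true" yet no test detected it, which becomes extremely
    # unlikely as the number of iterations grows.
    return "false"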
Suppose that p is the probability that the response of a test is “true,” given that the answer
is “true.” It follows that 1−p is the probability that the response is “unknown,” given that the
answer is “true.” Because the algorithm answers “false” when all $n$ iterations yield the answer
“unknown” and the iterations perform independent tests, the probability of error is $(1-p)^n$.
When $p \neq 0$, this probability approaches 0 as the number of tests increases. Consequently, the
probability that the algorithm answers “true” when the answer is “true” approaches 1.
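For a concrete sense of how quickly this error probability shrinks, suppose for illustration that p = 0.1 (a value we choose only for this calculation):

p = 0.1                     # illustrative per-test probability
for n in (10, 100, 1000):
    print(n, (1 - p) ** n)  # probability of an incorrect "false" answer after n tests
# n = 10:   about 0.35
# n = 100:  about 2.7e-05
# n = 1000: about 1.7e-46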
EXAMPLE 15 Quality Control (This example is adapted from [AhUl95].) Suppose that a manufacturer
orders processor chips in batches of size n, where n is a positive integer. The chip maker
has tested only some of these batches to make sure that all the chips in the batch are good
(replacing any bad chips found during testing with good ones). In previously untested batches,
the probability that a particular chip is bad has been observed to be 0.1 when random testing
is done. The PC manufacturer wants to decide whether all the chips in a batch are good. To

