Page 483 - Discrete Mathematics and Its Applications

P. 483

462 7 / Discrete Probability

birthdays of the j − 1 people already in the room given that these j − 1 people have different
birthdays is

366 − (j − 1) 367 − j
= .
366 366
Because we have assumed that the birthdays of the people in the room are independent, we
can conclude that the probability that the n people in the room have different birthdays is

365 364 363 367 − n
p n = ··· .
366 366 366 366
It follows that the probability that among n people there are at least two people with the same
birthday is

365 364 363 367 − n
1 − p n = 1 − ··· .
366 366 366 366
To determine the minimum number of people in the room so that the probability that at
least two of them have the same birthday is greater than 1/2, we use the formula we have found
for 1 − p n to compute it for increasing values of n until it becomes greater than 1/2. (There are
more sophisticated approaches using calculus that can eliminate this computation, but we will
not use them here.) After considerable computation we ﬁnd that for n = 22, 1 − p n ≈ 0.475,
while for n = 23, 1−p n ≈ 0.506. Consequently, the minimum number of people needed so that
the probability that at least two people have the same birthday is greater than 1/2 is 23. ▲

The solution to the birthday problem leads to the solution of the question in Example 14
about hashing functions.
EXAMPLE 14 Probability of a Collision in Hashing Functions Recall from Section 4.5 that a hashing
function h(k) is a mapping of the keys (of the records that are to be stored in a database) to
storage locations. Hashing functions map a large universe of keys (such as the approximately
300 million Social Security numbers in the United States) to a much smaller set of storage
locations. A good hashing function yields few collisions, which are mappings of two different
keys to the same memory location, when relatively few of the records are in play in a given
application. What is the probability that no two keys are mapped to the same location by a
hashing function, or, in other words, that there are no collisions?

Solution: To calculate this probability, we assume that the probability that a randomly selected
key is mapped to a location is 1/m, where m is the number of available locations, that is, the
hashing function distributes keys uniformly. (In practice, hashing functions may not satisfy this
assumption. However, for a good hashing function, this assumption should be close to correct.)
Furthermore, we assume that the keys of the records selected have an equal probability to be
any of the elements of the key universe and that these keys are independently selected.
Suppose that the keys are k 1 ,k 2 ,...,k n . When we add the second record, the probability
that it is mapped to a location different from the location of the ﬁrst record, that h(k 2 ) = h(k 1 ),
is (m − 1)/m because there are m − 1 free locations after the ﬁrst record has been placed. The
probability that the third record is mapped to a free location after the ﬁrst and second records
have been placed without a collision is (m − 2)/m. In general, the probability that the jth record
is mapped to a free location after the ﬁrst j − 1 records have been mapped to locations h(k 1 ),
h(k 2 ),...,h(k j−1 ) without collisions is (m − (j − 1))/m because j − 1ofthe m locations are
taken.
Because the keys are independent, the probability that all n keys are mapped to different
locations is
m − 1 m − 2 m − n + 1
p n = · · ··· · .
m m m

478 479 480 481 482 483 484 485 486 487 488