Page 483 - Discrete Mathematics and Its Applications
P. 483

462  7 / Discrete Probability


                                                birthdays of the j − 1 people already in the room given that these j − 1 people have different
                                                birthdays is

                                                    366 − (j − 1)   367 − j
                                                                 =         .
                                                         366         366
                                                    Because we have assumed that the birthdays of the people in the room are independent, we
                                                can conclude that the probability that the n people in the room have different birthdays is

                                                         365 364 363    367 − n
                                                    p n =            ···       .
                                                         366 366 366      366
                                                It follows that the probability that among n people there are at least two people with the same
                                                birthday is

                                                                365 364 363    367 − n
                                                    1 − p n = 1 −           ···       .
                                                                366 366 366      366
                                                    To determine the minimum number of people in the room so that the probability that at
                                                least two of them have the same birthday is greater than 1/2, we use the formula we have found
                                                for 1 − p n to compute it for increasing values of n until it becomes greater than 1/2. (There are
                                                more sophisticated approaches using calculus that can eliminate this computation, but we will
                                                not use them here.) After considerable computation we find that for n = 22, 1 − p n ≈ 0.475,
                                                while for n = 23, 1−p n ≈ 0.506. Consequently, the minimum number of people needed so that
                                                the probability that at least two people have the same birthday is greater than 1/2 is 23.  ▲

                                                    The solution to the birthday problem leads to the solution of the question in Example 14
                                                about hashing functions.
                                EXAMPLE 14      Probability of a Collision in Hashing Functions  Recall from Section 4.5 that a hashing
                                                function h(k) is a mapping of the keys (of the records that are to be stored in a database) to
                                                storage locations. Hashing functions map a large universe of keys (such as the approximately
                                                300 million Social Security numbers in the United States) to a much smaller set of storage
                                                locations. A good hashing function yields few collisions, which are mappings of two different
                                                keys to the same memory location, when relatively few of the records are in play in a given
                                                application. What is the probability that no two keys are mapped to the same location by a
                                                hashing function, or, in other words, that there are no collisions?

                                                Solution: To calculate this probability, we assume that the probability that a randomly selected
                                                key is mapped to a location is 1/m, where m is the number of available locations, that is, the
                                                hashing function distributes keys uniformly. (In practice, hashing functions may not satisfy this
                                                assumption. However, for a good hashing function, this assumption should be close to correct.)
                                                Furthermore, we assume that the keys of the records selected have an equal probability to be
                                                any of the elements of the key universe and that these keys are independently selected.
                                                    Suppose that the keys are k 1 ,k 2 ,...,k n . When we add the second record, the probability
                                                that it is mapped to a location different from the location of the first record, that h(k 2 )  = h(k 1 ),
                                                is (m − 1)/m because there are m − 1 free locations after the first record has been placed. The
                                                probability that the third record is mapped to a free location after the first and second records
                                                have been placed without a collision is (m − 2)/m. In general, the probability that the jth record
                                                is mapped to a free location after the first j − 1 records have been mapped to locations h(k 1 ),
                                                h(k 2 ),...,h(k j−1 ) without collisions is (m − (j − 1))/m because j − 1ofthe m locations are
                                                taken.
                                                    Because the keys are independent, the probability that all n keys are mapped to different
                                                locations is
                                                         m − 1 m − 2        m − n + 1
                                                    p n =      ·      · ··· ·        .
                                                           m      m             m
   478   479   480   481   482   483   484   485   486   487   488