Page 498 - Discrete Mathematics and Its Applications

P. 498

7.4 Expected Value and Variance 477

∗ 17. Prove Theorem 2, the extended form of Bayes’ theo- messages and 20 messages that are not spam, while the
rem. That is, suppose that E is an event from a sample word “herbal” appears in 800 spam messages and 200
space S and that F 1 ,F 2 ,...,F n are mutually exclusive messages that are not spam. Estimate the probability that
n
events such that F i = S. Assume that p(E) = 0 a received message containing both the words “enhance-
i=1
and p(F i ) = 0 for i = 1, 2,...,n. Show that ment” and “herbal” is spam. Will the message be rejected
as spam if the threshold for rejecting spam is 0.9?
p(E | F j )p(F j )
. 22. Suppose that we have prior information concerning
n
p(F j | E) =
i=1 p(E | F i )p(F i ) whether a random incoming message is spam. In par-
ticular, suppose that over a time period, we ﬁnd that s
n
[Hint: Use the fact that E = (E ∩ F i ).] spam messages arrive and h messages arrive that are
i=1
18. Suppose that a Bayesian spam ﬁlter is trained on a set of not spam.
500 spam messages and 200 messages that are not spam. a) Use this information to estimate p(S), the probabil-
The word “exciting” appears in 40 spam messages and ity that an incoming message is spam, and p(S), the
in 25 messages that are not spam. Would an incom- probability an incoming message is not spam.
ing message be rejected as spam if it contains the word b) Use Bayes’theorem and part (a) to estimate the prob-
“exciting” and the threshold for rejecting spam is 0.9? ability that an incoming message containing the word
19. Suppose that a Bayesian spam ﬁlter is trained on a set w is spam, where p(w) is the probability that w occurs
of 1000 spam messages and 400 messages that are not in a spam message and q(w) is the probability that w
spam. The word “opportunity” appears in 175 spam mes- occurs in a message that is not spam.
sages and 20 messages that are not spam. Would an in- 23. Suppose that E 1 and E 2 are the events that an incoming
coming message be rejected as spam if it contains the mail message contains the words w 1 and w 2 , respectively.
word “opportunity” and the threshold for rejecting a mes- Assuming that E 1 and E 2 are independent events and that
sage is 0.9? E 1 | S and E 2 | S are independent events, where S is the
20. Would we reject a message as spam in Example 4 event that an incoming message is spam, and that we have
a) using just the fact that the word “undervalued” occurs no prior knowledge regarding whether or not the message
in the message? is spam, show that
b) using just the fact that the word “stock” occurs in the
message? p(S | E 1 ∩ E 2 )
21. Suppose that a Bayesian spam ﬁlter is trained on a set
p(E 1 | S)p(E 2 | S)
of 10,000 spam messages and 5000 messages that are not = .
spam. The word “enhancement” appears in 1500 spam p(E 1 | S)p(E 2 | S) + p(E 1 | S)p(E 2 | S)

7.4 Expected Value and Variance

Introduction

The expected value of a random variable is the sum over all elements in a sample space of the
product of the probability of the element and the value of the random variable at this element.
Consequently, the expected value is a weighted average of the values of a random variable. The
expected value of a random variable provides a central point for the distribution of values of
this random variable. We can solve many problems using the notion of the expected value of a
random variable, such as determining who has an advantage in gambling games and computing
the average-case complexity of algorithms. Another useful measure of a random variable is its
variance, which tells us how spread out the values of this random variable are. We can use the
variance of a random variable to help us estimate the probability that a random variable takes
values far removed from its expected value.

Expected Values

Many questions can be formulated in terms of the value we expect a random variable to take,
or more precisely, the average value of a random variable when an experiment is performed a
large number of times. Questions of this kind include: How many heads are expected to appear

493 494 495 496 497 498 499 500 501 502 503