Page 424 - Matrix Analysis & Applied Linear Algebra
P. 424
420 Chapter 5 Norms, Inner Products, and Orthogonality
T
|q A| = | cos θ 1 |, | cos θ 2 |,. . . , | cos θ n | provides the information that allows
the search engine to rank the relevance of each document relative to the query.
However, due to things like variation and ambiguity in the use of vocabulary,
presentation style, and even the indexing process, there is a lot of “noise” in
T
A, so the results in |q A| are nowhere near being an exact measure of how
well query q matches the various documents. To filter out some of this noise,
r T
the techniques of Example 5.12.3 are employed. An SVD A = σ i u i v is
i=1 i
judiciously truncated, and
v T
σ 1 1 k
T . . T
.
A k = U k D k V = u 1 |· · · | u k . . . = σ i u i v i
k
v T i=1
σ k
k
is used in place of A in (5.12.12). In other words, instead of using cos θ j , query
q is compared with document D j by using the magnitude of
T
q A k e j
cos φ j = .
q A k e j
2 2
T
To make this more suitable for computation, set S k = D k V = s 1 | s 2 |· · · | s k ,
k
and use
T
A k e j = U k D k V e j
= U k s j = s j
k
2 2 2 2
to write
T
q U k s j
cos φ j = . (5.12.13)
q s j
2 2
The vectors in U k and S k only need to be computed once (and they can be
determined without computing the entire SVD), so (5.12.13) requires very little
computation to process each new query. Furthermore, we can be generous in the
number of SVD components that are dropped because variation in the use of
vocabulary and the ambiguity of many words produces significant noise in A.
Coupling this with the fact that numerical accuracy is not an important issue
(knowing a cosine to two or three significant digits is sufficient) means that we
are more than happy to replace the SVD of A by alow-rank truncation A k ,
where k is significantly less than r.
Alternate Query Matching Strategy. An alternate way to measuring how
close a given query q is to a document vector d j is to replace the query vector
q in (5.12.12) by the projected query 9 q = P R(A) q, where P R(A) = U r U T r is the
orthogonal projector onto R (A) along R (A) ⊥ (Exercise 5.12.15) to produce
T
9 q Ae j
cos θ j = . (5.12.14)
9
9 q Ae j
2 2