
Cache coherency is an active field of research. This topic is explored further in Part Five.

Line Size
Another design element is the line size. When a block of data is retrieved and placed in the cache, not only the desired word but also some number of adjacent words are retrieved. As the block size increases from very small to larger sizes, the hit ratio will at first increase because of the principle of locality, which states that data in the vicinity of a referenced word are likely to be referenced in the near future. As the block size increases, more useful data are brought into the cache. The hit ratio will begin to decrease, however, as the block becomes even bigger and the probability of using the newly fetched information becomes less than the probability of reusing the information that has to be replaced. Two specific effects come into play:
• Larger blocks reduce the number of blocks that fit into a cache. Because each block fetch overwrites older cache contents, a small number of blocks results in data being overwritten shortly after they are fetched.
• As a block becomes larger, each additional word is farther from the requested word and therefore less likely to be needed in the near future.
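To see the first effect concretely, one can count how many blocks fit in a fixed-capacity cache as the block size grows. The following C sketch assumes a hypothetical 32-KB cache; the capacity and block sizes are illustrative choices, not figures from the studies cited below.

    #include <stdio.h>

    /* Illustrative only: block count for a fixed-capacity cache.
     * The 32-KB capacity and the block sizes are assumed. */
    int main(void)
    {
        const unsigned cache_bytes = 32 * 1024;   /* assumed 32-KB cache */

        for (unsigned block = 8; block <= 256; block *= 2) {
            /* Fewer blocks fit as each block grows, so each miss
             * evicts a larger fraction of the cached working set. */
            printf("block size %3u bytes -> %4u blocks\n",
                   block, cache_bytes / block);
        }
        return 0;
    }

Quadrupling the block size from 32 to 128 bytes cuts the number of resident blocks from 1024 to 256, so during a miss-heavy phase recently fetched data are overwritten that much sooner.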

The relationship between block size and hit ratio is complex, depending on the locality characteristics of a particular program, and no definitive optimum value has been found. A size in the range of 8 to 64 bytes seems reasonably close to optimum [SMIT87, PRZY88, PRZY90, HAND98]. For high-performance computing (HPC) systems, 64- and 128-byte cache line sizes are most frequently used.

Number of Caches
When caches were originally introduced, the typical system had a single cache. More recently, the use of multiple caches has become the norm. Two aspects of this design issue concern the number of levels of caches and the use of unified versus split caches.

MULTILEVEL CACHES As logic density has increased, it has become possible to have a cache on the same chip as the processor: the on-chip cache. Compared with a cache reachable via an external bus, the on-chip cache reduces the processor's external bus activity and therefore speeds up execution and increases overall system performance. When the requested instruction or data is found in the on-chip cache, the bus access is eliminated. Because the data paths internal to the processor are short compared with bus lengths, on-chip cache accesses complete appreciably faster than even zero-wait-state bus cycles. Furthermore, during this period the bus is free to support other transfers.
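To get a rough sense of the gain, one can compute the average access time under assumed parameters. The hit ratio and latencies in the following C sketch are illustrative values, not measurements from the text.

    #include <stdio.h>

    /* Illustrative average-access-time estimate for an on-chip cache.
     * All parameters below are assumed values. */
    int main(void)
    {
        double hit_ratio = 0.95;   /* assumed on-chip cache hit ratio    */
        double t_cache   = 1.0;    /* assumed on-chip access, in cycles  */
        double t_bus     = 10.0;   /* assumed external bus access cycles */

        double avg = hit_ratio * t_cache + (1.0 - hit_ratio) * t_bus;
        printf("average access time = %.2f cycles\n", avg);
        return 0;
    }

With these assumed figures the average access time is 1.45 cycles, against 10 cycles if every reference crossed the external bus, which is why eliminating bus accesses yields most of the speedup.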
The inclusion of an on-chip cache leaves open the question of whether an off-chip, or external, cache is still desirable. Typically, the answer is yes, and most contemporary designs include both on-chip and external caches. The simplest such organization is known as a two-level cache, with the internal cache designated as level 1 (L1) and the external cache designated as level 2 (L2). The reason for including an L2 cache is the following: If there is no L2 cache and the processor makes an access request for a memory location not in the L1 cache, then the processor must access