A cache that stores each value in only one location has an associativity of 1 and is said to be direct mapped. A cache that can store a value in either of two locations has an associativity of 2, and a cache that can store a value in any location is said to be fully associative. More associative caches are less likely to replace values that will be needed in the future, but greater associativity can increase cache delay and power, since an associative cache must look for data in multiple places.
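As a rough sketch of the difference, the C fragment below contrasts a direct-mapped lookup, which probes exactly one entry, with a 2-way set-associative lookup, which must compare the tags of every way in a set. The 32-KB capacity and 64-byte line size are hypothetical values chosen only for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical parameters: a 32-KB cache with 64-byte lines. */
#define LINE_SIZE   64
#define CACHE_SIZE  (32 * 1024)
#define NUM_LINES   (CACHE_SIZE / LINE_SIZE)   /* 512 lines */

typedef struct {
    bool     valid;
    uint64_t tag;
} line_t;

/* Direct mapped (associativity 1): each address can live in
 * exactly one line, so a hit check probes a single entry. */
bool lookup_direct(line_t cache[NUM_LINES], uint64_t addr)
{
    uint64_t line_addr = addr / LINE_SIZE;
    uint64_t index     = line_addr % NUM_LINES;
    uint64_t tag       = line_addr / NUM_LINES;
    return cache[index].valid && cache[index].tag == tag;
}

/* 2-way set associative: the same data may sit in either way of a
 * set, so both tags must be compared. These extra comparisons are
 * the source of the added delay and power. */
#define WAYS 2
#define SETS (NUM_LINES / WAYS)

bool lookup_set_assoc(line_t cache[SETS][WAYS], uint64_t addr)
{
    uint64_t line_addr = addr / LINE_SIZE;
    uint64_t set       = line_addr % SETS;
    uint64_t tag       = line_addr / SETS;
    for (int way = 0; way < WAYS; way++)
        if (cache[set][way].valid && cache[set][way].tag == tag)
            return true;
    return false;
}
```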
Some cold misses are inevitable any time a new program is run; it is the first miss on a location that causes it to be loaded in the first place. Increasing the cache line size, however, will reduce the number of cold misses. When a byte of memory is accessed, it is very likely that nearby bytes will also be needed. By fetching larger blocks of data with each miss, a needed piece of data is more likely to have already been brought into the cache before it is used for the first time.
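A minimal way to see this effect: the sketch below counts cold misses for a purely sequential scan, assuming an idealized cache that never evicts anything, so every distinct line is fetched exactly once. Each doubling of the line size halves the number of first-touch misses. The 1-MB scan length is an arbitrary assumption.

```c
#include <stdio.h>

/* Count cold misses for a sequential scan of N bytes, assuming an
 * infinitely large cache: every line is fetched exactly once, so the
 * miss count is just the number of distinct lines touched. */
static long cold_misses(long bytes_scanned, long line_size)
{
    return (bytes_scanned + line_size - 1) / line_size;
}

int main(void)
{
    long n = 1 << 20;  /* scan 1 MB of data (hypothetical workload) */
    int sizes[] = { 16, 32, 64, 128 };
    for (int i = 0; i < 4; i++)
        printf("line size %3d B -> %ld cold misses\n",
               sizes[i], cold_misses(n, sizes[i]));
    return 0;
}
```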
Large lines reduce the number of cold misses but increase the number of conflict misses. A cache of the same total size using larger lines will have fewer lines, making conflicts between pieces of data mapped to the same line more likely. The byte immediately after one just accessed is very likely to be needed, the next somewhat less so, and the one after that less still. As line size is increased beyond some optimum point, the cache loads more and more data that is unlikely to be needed, and may be replacing data that is more likely to be used. Another limit on line size is that large lines produce more bus traffic. Performance may suffer if the processor has to wait to access the bus while a very large cache line is being loaded.
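The shrinking number of lines is simple arithmetic, illustrated below for a hypothetical 32-KB cache: every doubling of the line size halves the number of lines, so twice as many addresses compete for each one.

```c
#include <stdio.h>

/* For a fixed 32-KB cache (a hypothetical size), show how growing
 * the line size shrinks the number of lines available. */
int main(void)
{
    long cache_size = 32 * 1024;
    for (long line = 32; line <= 512; line *= 2)
        printf("line size %4ld B -> %4ld lines\n",
               line, cache_size / line);
    return 0;
}
```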
In addition to the main data and instruction caches, a modern processor actually contains many other caches with different specialized functions. One example is the translation lookaside buffer (TLB). This is a cache that stores virtual memory page numbers in the tag array and physical memory page numbers in the data array. Whenever a memory access occurs, the TLB is checked to see if it contains the needed virtual-to-physical translation. If so, the access proceeds. If not, the needed translation is fetched from main memory and stored in the TLB. Without a TLB, every load or store would require two memory accesses: one access to read the translation and another to perform the operation. Virtual memory is an important architectural feature, but it is microarchitectural features such as the TLB that allow it to be implemented with reasonable performance.
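A minimal sketch of a TLB lookup, assuming 4-KB pages, a fully associative 64-entry TLB, and a toy stand-in for the page-table walk, might look like the following. On a hit, the translation is found without touching memory; on a miss, the walk supplies the physical page number, which is then cached.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical parameters: 4-KB pages, 64 fully associative entries. */
#define PAGE_SHIFT  12
#define TLB_ENTRIES 64

typedef struct {
    bool     valid;
    uint64_t vpn;   /* virtual page number (the "tag")    */
    uint64_t pfn;   /* physical page number (the "data")  */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Toy page-table walk that identity-maps pages. It stands in for
 * the extra memory access a real walk would require on every
 * reference if there were no TLB. */
static uint64_t walk_page_table(uint64_t vpn) { return vpn; }

uint64_t translate(uint64_t vaddr)
{
    uint64_t vpn    = vaddr >> PAGE_SHIFT;
    uint64_t offset = vaddr & ((1u << PAGE_SHIFT) - 1);

    /* TLB hit: translation found without a memory access. */
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].vpn == vpn)
            return (tlb[i].pfn << PAGE_SHIFT) | offset;

    /* TLB miss: fetch the translation from the page table in main
     * memory and cache it (trivial replacement: always slot 0). */
    uint64_t pfn = walk_page_table(vpn);
    tlb[0] = (tlb_entry_t){ .valid = true, .vpn = vpn, .pfn = pfn };
    return (pfn << PAGE_SHIFT) | offset;
}
```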
Some misses are inevitable for any cache. Even an infinitely large, fully associative cache would still suffer from cold misses. Choosing a finite size adds capacity misses, and making the array less than fully associative adds conflict misses. A line size must be chosen, and any cache can be implemented as a hierarchy of multiple levels. These choices have a large impact on the die size and performance of any microprocessor.