Page 173 - A Practical Guide from Design Planning to Manufacturing

146   Chapter Five

A cache that stores each value in only one location has an associativity of 1 and is said to be direct mapped. A cache that can store a value in either of two locations has an associativity of 2, and a cache that can store a value in any location is said to be fully associative. More associative caches are less likely to replace values that will be needed in the future, but greater associativity can increase cache delay and power, since an associative cache must look for data in multiple places.
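The relationship between associativity and placement can be sketched in a few lines of Python (the function and parameter names here are illustrative, not from the text): the cache's lines are divided into sets of "associativity" ways, and a block may be placed in any way of the one set its address maps to.

```python
def candidate_lines(address, line_size, num_lines, associativity):
    """Return the cache line indices where the block holding `address` may be placed."""
    block = address // line_size            # which memory block the address falls in
    num_sets = num_lines // associativity   # direct mapped: as many sets as lines
    set_index = block % num_sets            # fully associative: a single set
    first = set_index * associativity
    return list(range(first, first + associativity))

# For an 8-line cache with 64-byte lines:
print(candidate_lines(0x1234, 64, 8, 1))   # direct mapped: exactly one location
print(candidate_lines(0x1234, 64, 8, 2))   # 2-way: either of two locations
print(candidate_lines(0x1234, 64, 8, 8))   # fully associative: any location
```

The cost noted above follows directly from this sketch: on every access, the cache must compare its tag against all the candidate lines, so higher associativity means more comparisons, and therefore more delay and power.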
Some cold misses are inevitable any time a new program is run. The first miss on a location is what causes it to be loaded, but increasing the cache line size will reduce the number of cold misses. When a byte of memory is accessed, it is very likely that nearby bytes will also be needed. By fetching larger blocks of data with each miss, it is more likely that a needed piece of data will already have been brought into the cache before it is used for the first time.
Large lines reduce the number of cold misses but increase the number of conflict misses. A cache of the same size using larger lines has fewer lines, which makes conflicts between pieces of data mapped to the same line more likely. The byte immediately after one just accessed is very likely to be needed, the next one less so, and the one after that even less. As line size is increased beyond some optimum point, the cache loads more and more data that is unlikely to be needed and may replace data that is more likely to be used. Another limit on line size is bus traffic: performance may suffer if the processor must wait to access the bus while a very large cache line is being loaded.
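The effect of line size on cold misses can be demonstrated with a toy direct-mapped cache simulator (a minimal sketch with hypothetical names, not a model of any real design). For a purely sequential access pattern, quadrupling the line size cuts the miss count by a factor of four, because each miss now brings in four times as many soon-to-be-needed neighbors.

```python
def count_misses(addresses, line_size, num_lines):
    """Count misses for an address trace on a direct-mapped cache."""
    lines = [None] * num_lines          # each entry holds the block number cached there
    misses = 0
    for addr in addresses:
        block = addr // line_size       # which memory block this address belongs to
        idx = block % num_lines         # direct mapped: one possible line per block
        if lines[idx] != block:
            lines[idx] = block          # load the block on a miss
            misses += 1
    return misses

# Sequential sweep over 256 bytes with an 8-line cache:
trace = list(range(256))
print(count_misses(trace, 16, 8))   # 16-byte lines: 16 cold misses
print(count_misses(trace, 64, 8))   # 64-byte lines: only 4 cold misses
```

The simulator only captures the cold-miss benefit; the conflict-miss penalty of large lines appears with less friendly access patterns, where fewer lines mean more blocks competing for the same slot.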
          In addition to the main data and instruction caches, a modern processor
        actually contains many other caches with different specialized functions.
        One example is the translation lookaside buffer (TLB). This is a cache that
        stores virtual memory page numbers in the tag array and physical memory
        page numbers in the data array. Whenever a memory access occurs, the
        TLB is checked to see if it contains the needed virtual to physical transla-
        tion. If so, the access proceeds. If not, the needed translation is fetched from
        main memory and is stored in the TLB. Without a TLB, every load or store
        would require two memory accesses, one access to read the translation
        and another to perform the operation. Virtual memory is an important
        architectural feature, but it is microarchitectural features such as the TLB
        that allow it to be implemented with reasonable performance.
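The TLB's role can be sketched as a small cache in front of the page table (all names and the eviction policy below are illustrative assumptions, not details from the text). On a hit, the translation costs no extra memory access; on a miss, the page table is consulted and the result is cached for next time.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch

class TLB:
    """Toy fully associative TLB mapping virtual page numbers to physical ones."""

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = {}                       # virtual page -> physical page

    def translate(self, vaddr, page_table):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:                 # TLB hit: no extra memory access
            ppn = self.entries[vpn]
        else:                                   # TLB miss: read the translation
            ppn = page_table[vpn]               # from memory (the second access)
            if len(self.entries) >= self.capacity:
                self.entries.pop(next(iter(self.entries)))  # evict an old entry
            self.entries[vpn] = ppn             # cache it for future accesses
        return ppn * PAGE_SIZE + offset

page_table = {0: 5, 1: 9}                       # virtual page 1 -> physical page 9
tlb = TLB()
print(hex(tlb.translate(0x1234, page_table)))   # miss: walks the table, caches vpn 1
print(hex(tlb.translate(0x1300, page_table)))   # hit: same page, no table access
```

Both calls return addresses in physical page 9 (0x9234 and 0x9300); only the first pays for the page-table lookup.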
Some misses are inevitable for any cache. Even an infinitely large, fully associative cache would still suffer from cold misses. Choosing a finite size adds capacity misses, and making the array less than fully associative adds conflict misses. A line size must be chosen, and any cache can be implemented as a hierarchy of multiple levels. These choices have a large impact on the die size and performance of any microprocessor.
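The three miss types can be made concrete with a small classifier (an illustrative sketch; the definitions follow the common convention, and all names are hypothetical). A miss is cold if the block has never been accessed, a capacity miss if even a fully associative LRU cache of the same size would have evicted it, and a conflict miss otherwise.

```python
from collections import OrderedDict

def classify_misses(blocks, num_lines):
    """Classify each access to a direct-mapped cache as hit, cold, capacity, or conflict."""
    direct = [None] * num_lines          # the direct-mapped cache being measured
    lru = OrderedDict()                  # fully associative LRU cache of equal size
    seen = set()                         # every block ever accessed
    counts = {"hit": 0, "cold": 0, "capacity": 0, "conflict": 0}
    for b in blocks:
        in_full = b in lru               # would a fully associative cache have hit?
        lru[b] = True
        lru.move_to_end(b)
        if len(lru) > num_lines:
            lru.popitem(last=False)      # evict the least recently used block
        idx = b % num_lines
        if direct[idx] == b:
            counts["hit"] += 1
        elif b not in seen:
            counts["cold"] += 1          # first-ever access to this block
        elif not in_full:
            counts["capacity"] += 1      # any same-size cache would have missed
        else:
            counts["conflict"] += 1      # only the mapping restriction caused it
        direct[idx] = b
        seen.add(b)
    return counts

# Blocks 0 and 8 collide in an 8-line direct-mapped cache:
print(classify_misses([0, 8, 0, 8, 1], 8))
```

In this trace the first accesses to blocks 0, 8, and 1 are cold misses, while the repeated accesses to 0 and 8 are conflict misses: a fully associative cache of the same size would have held both.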