140  CHAPTER 4 / CACHE MEMORY

                  performance. The need for the L2 cache to be larger than the L1 cache to affect
                  performance makes sense. If the L2 cache has the same line size and capacity as the L1
                  cache, its contents will more or less mirror those of the L1 cache.
                       With the increasing amount of on-chip area available for cache, most
                  contemporary microprocessors have moved the L2 cache onto the processor chip and
                  added an L3 cache. Originally, the L3 cache was accessible over the external bus.
                  More recently, most microprocessors have incorporated an on-chip L3 cache. In ei-
                  ther case, there appears to be a performance advantage to adding the third level
                  (e.g., see [GHAI98]).
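The performance advantage of a third cache level can be illustrated with an average memory access time (AMAT) calculation. The hit rates and latencies below are illustrative assumptions, not figures from the text or from [GHAI98]:

```python
# Sketch: average memory access time (AMAT) for two- vs. three-level
# cache hierarchies. All hit rates and latencies are assumed values
# chosen only to illustrate the effect of adding an L3 cache.

def amat(levels, memory_latency):
    """levels: list of (hit_rate, latency_cycles), ordered L1..Ln.
    A miss at each level falls through to the next; a miss at the
    last level goes to main memory."""
    time = 0.0
    reach = 1.0  # fraction of accesses that reach this level
    for hit_rate, latency in levels:
        time += reach * latency    # every access reaching a level pays its latency
        reach *= (1.0 - hit_rate)  # only the misses continue downward
    return time + reach * memory_latency

two_level = amat([(0.95, 1), (0.80, 10)], memory_latency=200)
three_level = amat([(0.95, 1), (0.80, 10), (0.60, 30)], memory_latency=200)
print(f"2-level AMAT: {two_level:.2f} cycles")   # 3.50
print(f"3-level AMAT: {three_level:.2f} cycles") # 2.60
```

Even with a modest 60% hit rate, the L3 cache intercepts a portion of the accesses that would otherwise pay the full main-memory latency, lowering the average access time.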
                  UNIFIED VERSUS SPLIT CACHES When the on-chip cache first made an appear-
                  ance, many of the designs consisted of a single cache used to store references to both
                  data and instructions. More recently, it has become common to split the cache into
                  two: one dedicated to instructions and one dedicated to data. These two caches both
                  exist at the same level, typically as two L1 caches. When the processor attempts to
                  fetch an instruction from main memory, it first consults the instruction L1 cache, and
                  when the processor attempts to fetch data from main memory, it first consults the
                  data L1 cache.
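The routing just described can be sketched in a few lines. The SplitL1 class and its methods here are hypothetical illustrations for this discussion, not a real hardware interface:

```python
# Sketch of split L1 lookup: instruction fetches consult the
# instruction cache, data accesses consult the data cache.
# SplitL1 is a hypothetical stand-in, not a real API.
class SplitL1:
    def __init__(self):
        self.icache = {}  # instruction L1: address -> cached contents
        self.dcache = {}  # data L1: address -> cached contents

    def fetch_instruction(self, addr, memory):
        # An instruction fetch consults only the instruction L1 cache.
        if addr not in self.icache:
            self.icache[addr] = memory[addr]  # miss: fill from main memory
        return self.icache[addr]

    def load_data(self, addr, memory):
        # A data access consults only the data L1 cache.
        if addr not in self.dcache:
            self.dcache[addr] = memory[addr]  # miss: fill from main memory
        return self.dcache[addr]

memory = {0x100: "add r1, r2", 0x200: 42}
l1 = SplitL1()
insn = l1.fetch_instruction(0x100, memory)  # fills and returns via I-cache
data = l1.load_data(0x200, memory)          # fills and returns via D-cache
```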
                       There are two potential advantages of a unified cache:

                     • For a given cache size, a unified cache has a higher hit rate than split caches be-
                       cause it balances the load between instruction and data fetches automatically.
                       That is, if an execution pattern involves many more instruction fetches than
                       data fetches, then the cache will tend to fill up with instructions, and if an exe-
                       cution pattern involves relatively more data fetches, the opposite will occur.
                     • Only one cache needs to be designed and implemented.
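The load-balancing advantage in the first bullet can be demonstrated with a toy simulation. The instruction-heavy workload, the cache sizes, and the fully associative LRU model are all illustrative assumptions:

```python
# Toy comparison of a unified cache against an evenly split cache on
# an instruction-heavy reference mix. Workload and sizes are assumed.
from collections import OrderedDict

def run(cache_size, refs):
    """Fully associative LRU cache; refs is a sequence of block ids."""
    cache, hits = OrderedDict(), 0
    for block in refs:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            cache[block] = None
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the LRU block
    return hits / len(refs)

# Instruction-heavy mix: loop over 12 instruction blocks, 4 data blocks,
# interleaved as 3 instruction fetches per data access.
insn = [("I", n % 12) for n in range(600)]
data = [("D", n % 4) for n in range(200)]
refs = []
for i in range(200):
    refs.extend(insn[3 * i:3 * i + 3])
    refs.append(data[i])

unified = run(16, refs)  # one 16-block cache holds the whole working set
split = (run(8, [r for r in refs if r[0] == "I"]) * 0.75   # I-cache thrashes
         + run(8, [r for r in refs if r[0] == "D"]) * 0.25)
print(f"unified 16-block cache hit rate: {unified:.2f}")
print(f"split 8+8 hit rate (weighted):   {split:.2f}")
```

Because the unified cache can devote most of its 16 blocks to the dominant instruction stream, it captures the whole working set; the fixed 8-block instruction cache is too small for the 12-block instruction loop and thrashes.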
                       Despite these advantages, the trend is toward split caches, particularly for
                  superscalar machines such as the Pentium and PowerPC, which emphasize parallel
                  instruction execution and the prefetching of predicted future instructions. The key
                  advantage of the split cache design is that it eliminates contention for the cache
                  between the instruction fetch/decode unit and the execution unit. This is important
                  in any design that
                  relies on the pipelining of instructions. Typically, the processor will fetch instructions
                  ahead of time and fill a buffer, or pipeline, with instructions to be executed. Suppose
                  now that we have a unified instruction/data cache. When the execution unit performs
                  a memory access to load and store data, the request is submitted to the unified cache.
                  If, at the same time, the instruction prefetcher issues a read request to the cache for an
                  instruction, that request will be temporarily blocked so that the cache can service the
                  execution unit first, enabling it to complete the currently executing instruction. This
                  cache contention can degrade performance by interfering with efficient use of the
                  instruction pipeline. The split cache structure overcomes this difficulty.
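A rough back-of-the-envelope model shows the size of the contention penalty. The data-access fraction and the one-stall-cycle cost per conflict are assumptions for illustration only:

```python
# Sketch: pipeline stall penalty from instruction/data contention on a
# single-ported unified cache. The data-access fraction is an assumed,
# illustrative figure, not a measured one.
data_access_fraction = 0.35  # assumed fraction of instructions that load/store

# Unified, single-ported cache: each data access blocks that cycle's
# instruction fetch, costing roughly one stall cycle per conflict.
unified_stall_cpi = data_access_fraction * 1.0

# Split caches: instruction and data requests are served in parallel,
# so this particular structural conflict disappears.
split_stall_cpi = 0.0

print(f"extra CPI from contention, unified cache: {unified_stall_cpi:.2f}")
print(f"extra CPI from contention, split caches:  {split_stall_cpi:.2f}")
```

Under these assumptions, roughly a third of all cycles would suffer a fetch stall with the unified cache, which is why pipelined designs favor the split organization.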


             4.4 PENTIUM 4 CACHE ORGANIZATION


                  The evolution of cache organization is seen clearly in the evolution of Intel micro-
                  processors (Table 4.4). The 80386 does not include an on-chip cache. The 80486
                  includes a single on-chip cache of 8 KBytes, using a line size of 16 bytes and a four-way