performance. The need for the L2 cache to be larger than the L1 cache to affect performance makes sense. If the L2 cache has the same line size and capacity as the L1 cache, its contents will more or less mirror those of the L1 cache.
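To see concretely why identical geometry makes a second level nearly useless, consider the following minimal C sketch. It models two direct-mapped caches with the same (hypothetical) parameters of 64 lines of 16 bytes each, checking L2 only on an L1 miss and filling both on the way back; since every address maps to the same line and tag in each cache, their contents stay identical:

/* Two direct-mapped caches with identical geometry. On every
   reference, check L1; on an L1 miss, check L2; fill both.
   Because the index and tag are computed identically, L2 never
   hits when L1 misses. Parameters are hypothetical. */
#include <stdio.h>
#include <stdlib.h>

#define LINES      64
#define LINE_BYTES 16

typedef struct { unsigned tag; int valid; } Line;

static Line l1[LINES], l2[LINES];

static int lookup(Line *c, unsigned addr) {
    unsigned idx = (addr / LINE_BYTES) % LINES;
    unsigned tag = addr / (LINE_BYTES * LINES);
    if (c[idx].valid && c[idx].tag == tag)
        return 1;                   /* hit */
    c[idx].valid = 1;               /* fill on miss */
    c[idx].tag = tag;
    return 0;
}

int main(void) {
    unsigned l1_misses = 0, l2_hits = 0;
    srand(1);
    for (int i = 0; i < 100000; i++) {
        unsigned addr = (unsigned)(rand() % 65536);
        if (!lookup(l1, addr)) {    /* L1 miss: try L2 */
            l1_misses++;
            l2_hits += lookup(l2, addr);
        }
    }
    printf("L1 misses: %u, of which L2 hits: %u\n",
           l1_misses, l2_hits);
    return 0;
}

Running this sketch reports zero L2 hits: every reference that misses in L1 also misses in L2, so the second level contributes nothing until it is given greater capacity (or a different organization).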
With the increasing availability of on-chip area for cache, most contemporary microprocessors have moved the L2 cache onto the processor chip and added an L3 cache. Originally, the L3 cache was accessible over the external bus. More recently, most microprocessors have incorporated an on-chip L3 cache. In either case, there appears to be a performance advantage to adding the third level (e.g., see [GHAI98]).
UNIFIED VERSUS SPLIT CACHES When the on-chip cache first made an appearance, many of the designs consisted of a single cache used to store references to both data and instructions. More recently, it has become common to split the cache into two: one dedicated to instructions and one dedicated to data. These two caches both exist at the same level, typically as two L1 caches. When the processor attempts to fetch an instruction from main memory, it first consults the instruction L1 cache, and when the processor attempts to fetch data from main memory, it first consults the data L1 cache.
There are two potential advantages of a unified cache:
• For a given cache size, a unified cache has a higher hit rate than split caches because it balances the load between instruction and data fetches automatically. That is, if an execution pattern involves many more instruction fetches than data fetches, then the cache will tend to fill up with instructions, and if an execution pattern involves relatively more data fetches, the opposite will occur. The simulation sketch following this list illustrates the effect.
• Only one cache needs to be designed and implemented.
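The load-balancing claim in the first bullet can be demonstrated with a small simulation. The C sketch below compares a fully associative LRU unified cache against split caches of the same total capacity; the workload parameters (a 48-line instruction loop, an 8-line data set, and a 64-line total cache) are hypothetical, chosen so that the instruction stream overflows a half-size split cache but fits, together with the data, in the unified one:

/* Unified vs. split caches of equal total capacity, fully
   associative with LRU replacement. The 48-line instruction
   working set exceeds the 32-line split I-cache but fits,
   with the data set, in the 64-line unified cache. All sizes
   are illustrative. */
#include <stdio.h>

#define MAX_LINES 64

typedef struct {
    unsigned tag[MAX_LINES];
    unsigned long stamp[MAX_LINES];  /* 0 = invalid line */
    int size;
    unsigned long clock;
} Cache;

static int lookup(Cache *c, unsigned tag) {
    int victim = 0;
    c->clock++;
    for (int i = 0; i < c->size; i++) {
        if (c->stamp[i] && c->tag[i] == tag) {
            c->stamp[i] = c->clock;  /* hit: refresh LRU stamp */
            return 1;
        }
        if (c->stamp[i] < c->stamp[victim]) victim = i;
    }
    c->tag[victim] = tag;            /* miss: fill LRU victim */
    c->stamp[victim] = c->clock;
    return 0;
}

int main(void) {
    Cache unified = { .size = 64 };
    Cache icache  = { .size = 32 }, dcache = { .size = 32 };
    unsigned u_hits = 0, s_hits = 0, refs = 0;
    for (int pass = 0; pass < 100; pass++) {
        for (int i = 0; i < 48; i++) {
            unsigned itag = i;                /* 48-line code loop */
            unsigned dtag = 1000 + (i % 8);   /* 8-line data set */
            u_hits += lookup(&unified, itag);
            u_hits += lookup(&unified, dtag);
            s_hits += lookup(&icache, itag);
            s_hits += lookup(&dcache, dtag);
            refs += 2;
        }
    }
    printf("unified hit rate: %.2f%%\n", 100.0 * u_hits / refs);
    printf("split   hit rate: %.2f%%\n", 100.0 * s_hits / refs);
    return 0;
}

With these parameters, the unified cache approaches a 100 percent hit rate after the first pass, while the 32-line split instruction cache thrashes on the 48-line loop and the overall split hit rate settles near 50 percent. A real workload would show a smaller gap, but the direction of the effect is the point of the bullet.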
Despite these advantages, the trend is toward split caches, particularly for superscalar machines such as the Pentium and PowerPC, which emphasize parallel instruction execution and the prefetching of predicted future instructions. The key advantage of the split cache design is that it eliminates contention for the cache between the instruction fetch/decode unit and the execution unit. This is important in any design that relies on the pipelining of instructions. Typically, the processor will fetch instructions ahead of time and fill a buffer, or pipeline, with instructions to be executed. Suppose now that we have a unified instruction/data cache. When the execution unit performs a memory access to load and store data, the request is submitted to the unified cache. If, at the same time, the instruction prefetcher issues a read request to the cache for an instruction, that request will be temporarily blocked so that the cache can service the execution unit first, enabling it to complete the currently executing instruction. This cache contention can degrade performance by interfering with efficient use of the instruction pipeline. The split cache structure overcomes this difficulty.
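A rough sense of the cost of this contention can be had from a back-of-the-envelope model. The C sketch below assumes a single-ported unified cache that must serialize a data access with an instruction fetch issued in the same cycle, while split caches serve both at once; the assumed figure of 30 percent memory-referencing instructions is illustrative, not a measurement of any particular processor:

/* Back-of-the-envelope contention model. A single-ported
   unified cache serializes a data access with the instruction
   fetch issued in the same cycle; split caches serve both in
   parallel. All figures are illustrative assumptions. */
#include <stdio.h>

int main(void) {
    long instructions = 1000000;
    double mem_fraction = 0.30;  /* loads/stores per instruction */
    /* every instruction needs one fetch cycle; each memory
       instruction adds one data access that collides with a fetch */
    double unified_cycles = instructions * (1.0 + mem_fraction);
    double split_cycles   = instructions * 1.0;
    printf("unified: %.0f cycles, split: %.0f cycles (%.0f%% slowdown)\n",
           unified_cycles, split_cycles,
           100.0 * (unified_cycles / split_cycles - 1.0));
    return 0;
}

Under these assumptions, the unified organization spends 30 percent more cycles on cache access than the split organization; in a real pipeline the actual penalty depends on buffering, port count, and miss behavior.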
4.4 PENTIUM 4 CACHE ORGANIZATION
The evolution of cache organization is seen clearly in the evolution of Intel microprocessors (Table 4.4). The 80386 does not include an on-chip cache. The 80486 includes a single on-chip cache of 8 KBytes, using a line size of 16 bytes and a four-way set-associative organization.

