
4.4 / PENTIUM 4 CACHE ORGANIZATION 141

           Table 4.4 Intel Cache Evolution

Problem:      External memory slower than the system bus.
Solution:     Add external cache using faster memory technology.
First appears: 386

Problem:      Increased processor speed results in external bus becoming a
              bottleneck for cache access.
Solution:     Move external cache on-chip, operating at the same speed as the
              processor.
First appears: 486

Problem:      Internal cache is rather small, due to limited space on chip.
Solution:     Add external L2 cache using faster technology than main memory.
First appears: 486

Problem:      Contention occurs when both the Instruction Prefetcher and the
              Execution Unit simultaneously require access to the cache. In
              that case, the Prefetcher is stalled while the Execution Unit's
              data access takes place.
Solution:     Create separate data and instruction caches.
First appears: Pentium

Problem:      Increased processor speed results in external bus becoming a
              bottleneck for L2 cache access.
Solution:     Create separate back-side bus (BSB) that runs at higher speed
              than the main (front-side) external bus. The BSB is dedicated to
              the L2 cache.
First appears: Pentium Pro
Solution:     Move L2 cache onto the processor chip.
First appears: Pentium II

Problem:      Some applications deal with massive databases and must have
              rapid access to large amounts of data. The on-chip caches are
              too small.
Solution:     Add external L3 cache.
First appears: Pentium III
Solution:     Move L3 cache on-chip.
First appears: Pentium 4
set-associative organization. All of the Pentium processors include two on-chip
L1 caches, one for data and one for instructions. For the Pentium 4, the L1 data
cache is 16 KBytes, using a line size of 64 bytes and a four-way set-associative
organization. The Pentium 4 instruction cache is described subsequently. The
Pentium 4 also includes an L2 cache that feeds both of the L1 caches. The L2
cache is eight-way set associative with a size of 512 KBytes and a line size of
128 bytes. An L3 cache was added for the Pentium III and became on-chip with
high-end versions of the Pentium 4.
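The sketch below makes the set-associative geometry concrete. For the two Pentium 4 caches described above, it computes the number of sets and splits an address into tag, set index, and byte offset. It is a minimal illustration, not Intel's implementation. The class name `CacheGeometry` and the constants `P4_L1D` and `P4_L2` are hypothetical. The sketch assumes 32-bit addresses and power-of-two line sizes and set counts.

```python
from __future__ import annotations

from dataclasses import dataclass

ADDRESS_BITS = 32  # assumption for illustration: 32-bit physical addresses


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical helper describing a set-associative cache (sizes in bytes)."""
    size: int
    line_size: int
    ways: int

    @property
    def num_sets(self) -> int:
        # Each set holds `ways` lines, so sets = size / (line_size * ways).
        return self.size // (self.line_size * self.ways)

    @property
    def offset_bits(self) -> int:
        # Bits selecting a byte within a line (line_size assumed a power of two).
        return self.line_size.bit_length() - 1

    @property
    def set_bits(self) -> int:
        # Bits selecting the set (num_sets assumed a power of two).
        return self.num_sets.bit_length() - 1

    @property
    def tag_bits(self) -> int:
        # Remaining high-order bits form the tag compared against each way.
        return ADDRESS_BITS - self.set_bits - self.offset_bits

    def split(self, addr: int) -> tuple[int, int, int]:
        """Split an address into (tag, set index, byte offset)."""
        offset = addr & (self.line_size - 1)
        set_index = (addr >> self.offset_bits) & (self.num_sets - 1)
        tag = addr >> (self.offset_bits + self.set_bits)
        return tag, set_index, offset


# Sizes, line sizes, and associativities taken from the text above.
P4_L1D = CacheGeometry(size=16 * 1024, line_size=64, ways=4)
P4_L2 = CacheGeometry(size=512 * 1024, line_size=128, ways=8)

for name, cache in (("L1 data", P4_L1D), ("L2", P4_L2)):
    print(f"{name}: {cache.num_sets} sets, offset={cache.offset_bits} bits, "
          f"set={cache.set_bits} bits, tag={cache.tag_bits} bits")

print(P4_L1D.split(0x12345678))  # (74565, 25, 56)
```

For the L1 data cache, 16 KBytes / (64 bytes x 4 ways) gives 64 sets. An address therefore splits into a 6-bit offset, a 6-bit set index, and, under the 32-bit assumption, a 20-bit tag. For the L2 cache, 512 KBytes / (128 bytes x 8 ways) gives 512 sets, which yields a 7-bit offset, a 9-bit set index, and a 16-bit tag.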
     Figure 4.18 provides a simplified view of the Pentium 4 organization,
highlighting the placement of the three caches. The processor core consists of
four major components:
   • Fetch/decode unit: Fetches program instructions in order from the L2 cache,
     decodes these into a series of micro-operations, and stores the results in
     the L1 instruction cache.
   • Out-of-order execution logic: Schedules execution of the micro-operations
     subject to data dependencies and resource availability; thus,
     micro-operations may be scheduled for execution in a different order than
     they were fetched from the instruction stream. As time permits, this unit
     schedules speculative execution of micro-operations that may be required
     in the future.