
into registers. The register file is implemented as a section of transistors at the heart of the microprocessor die. Its small size and physical location directly next to the portion of the die performing calculations are what make its very low latencies possible. The effective cost of this die area is extremely high: increasing the capacity of the register file pushes the other parts of the die farther apart, possibly limiting the maximum processor frequency, and a larger register file also has a longer latency.
Making any memory store larger will always increase its access time, so the register file is typically kept small, allowing it to provide latencies of only a few processor cycles. But for a processor performing billions of calculations per second, it is not long before it needs a piece of data that is not in the register file. The first place the processor looks next is called cache memory.
Cache memory is high-speed memory built into the processor die. It has a higher capacity than the register file but a longer latency. Cache memories reduce the effective memory latency by storing recently used data. If the processor accesses a particular memory location while running a program, it is likely to access it more than once, and nearby memory locations are also likely to be needed.
By loading and storing memory values and their neighboring locations as they are accessed, cache memory will often already contain the data the processor needs. If the needed data is not found in the cache, it must be retrieved from the next level of the memory hierarchy, the computer's main memory. The percentage of accesses that find the needed data in the cache is called the hit rate. A larger cache provides a higher hit rate but takes up more die area, increasing the processor's cost. In addition, the larger the cache capacity, the longer its latency. Table 2-10 shows some of the trade-offs in designing cache memory.
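
To make this concrete, the following minimal Python sketch (illustrative only, not from this chapter; the 64-byte line size, 64-line capacity, and the hit_rate function are assumptions) models a direct-mapped cache that loads a whole line of neighboring bytes on every miss. Walking through memory sequentially hits almost every time because neighboring bytes share a line, while jumping a full line between accesses misses every time:

    LINE_SIZE = 64    # bytes loaded per miss (assumed)
    NUM_LINES = 64    # 64 lines of 64 bytes: a 4-kB cache

    def hit_rate(addresses):
        tags = [None] * NUM_LINES        # one tag per cache line
        hits = 0
        for addr in addresses:
            line = addr // LINE_SIZE     # which memory line holds addr
            index = line % NUM_LINES     # direct-mapped placement
            if tags[index] == line:
                hits += 1                # data already cached: a hit
            else:
                tags[index] = line       # miss: load the whole line
        return hits / len(addresses)

    sequential = range(4096)                         # every byte in order
    strided = range(0, 4096 * LINE_SIZE, LINE_SIZE)  # one access per line
    print(hit_rate(sequential))   # ~0.98: neighbors share a line
    print(hit_rate(strided))      # 0.0: every access is a miss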
All the examples in Table 2-10 assume an average access time to main memory of 50 processor cycles. The first column shows that a processor with no cache always goes to main memory and therefore has an average access time of 50 cycles. The next column shows a 4-kB cache giving a hit rate of 65 percent and a latency of 4 cycles. For each memory access, there is a 65 percent chance the data will be found in the cache (a cache hit) and made available after 4 cycles. On the remaining 35 percent of accesses (cache misses), the data is fetched from main memory only after the 4-cycle cache lookup fails, taking 54 cycles in all, so the average access time is 0.65 × 4 + 0.35 × 54 = 21.5 cycles.



        TABLE 2-10  Effective Memory Latency Example
        Cache size (kB)       0        4        32       128        4/128
        Hit rate             0%       65%       86%      90%      65%/90%
        Latency (cycles)    None       4        10        14        4/14
        Avg access (cycles)  50       21.5      17.0     19.0       10.7
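
The averages in the last row follow a simple recurrence, assuming (consistent with the table's numbers) that a miss at one level pays that level's full latency before the next level is accessed: the effective latency of a level is its own latency plus its miss rate times the effective latency of the level behind it. A short Python sketch (illustrative, not from the book) reproduces each column:

    MAIN_MEMORY_CYCLES = 50  # average main memory access time from the text

    def effective_latency(levels):
        # levels: (hit rate, latency in cycles) pairs, fastest level first
        avg = MAIN_MEMORY_CYCLES
        # Work backward from main memory toward the fastest cache: a miss
        # pays this level's latency plus the average of the level behind it.
        for hit_rate, latency in reversed(levels):
            avg = latency + (1.0 - hit_rate) * avg
        return avg

    print(effective_latency([]))                       # no cache: 50.0
    print(effective_latency([(0.65, 4)]))              # 4 kB:     21.5
    print(effective_latency([(0.86, 10)]))             # 32 kB:    17.0
    print(effective_latency([(0.90, 14)]))             # 128 kB:   19.0
    print(effective_latency([(0.65, 4), (0.90, 14)]))  # 4/128 kB: 10.65 (~10.7)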