Page 170 - A Practical Guide from Design Planning to Manufacturing

Microarchitecture  143

        which instruction reordering or better sharing of resources might elim-
        inate. Where dependencies can often, but not always, be eliminated,
        modern microarchitectures are designed to “guess” by predicting the
        most common behavior, then spending extra time to recover the correct
        result after a wrong guess. Some of the most important microarchitec-
        tural concepts of recent processors are:
          Cache memories
          Cache coherency
          Branch prediction
          Register renaming
          Microinstructions
          Reorder, replay, retire

          All of these ideas seek to improve IPC and are discussed in more
        detail in the following sections.


        Cache memory
        Storing recently used values in cache memories to reduce average latency
        is an idea used over and over again in modern microarchitectures.
        Instructions, data, virtual memory translations, and branch addresses are
        all values commonly stored in caches in modern processors. Chapter 2 dis-
        cussed how multiple levels of cache work together to create a memory hier-
        archy that has lower average latency than any single level of cache could
        achieve. Caches are effective at reducing average latency because pro-
        grams tend to access data and instructions in regular patterns.
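The benefit of a hierarchy can be sketched numerically. The function below computes the average access time of a two-level cache hierarchy; the latencies and hit rates are hypothetical round numbers chosen only for illustration, not figures from any particular processor.

```python
def average_latency(l1_hit_time, l1_hit_rate, l2_hit_time, l2_hit_rate, mem_time):
    """Average memory access time, in cycles, for a two-level hierarchy."""
    # An access that misses the L1 cache pays the L1 lookup time plus an
    # L2 access; an access that also misses L2 pays the full trip to memory.
    l1_miss_rate = 1 - l1_hit_rate
    l2_miss_rate = 1 - l2_hit_rate
    return l1_hit_time + l1_miss_rate * (l2_hit_time + l2_miss_rate * mem_time)

# Hypothetical example: 2-cycle L1 hitting 90% of the time, 12-cycle L2
# hitting 95% of its accesses, and a 200-cycle main memory.
print(average_latency(2, 0.90, 12, 0.95, 200))  # -> 4.2 cycles on average
```

Even though main memory takes 200 cycles in this sketch, the hierarchy's average latency is only a few cycles, which no single cache level of realistic size and speed could achieve on its own.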
          A program accessing a particular memory location is very likely to
        access the same memory location again in the near future. On any given
        day we are much more likely to look in a drawer that we also opened
        the day before than to look in a drawer that has been closed for months.
        If we used a particular knife to make lunch, we are far more likely to
        use the same knife to make dinner than some utensil in a drawer that
        hasn’t been opened for months. Computers act in the same way, being
        more likely to access memory locations recently accessed. This is called
        temporal locality; similar accesses tend to be close together in time.
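Temporal locality is exactly what a cache with least-recently-used (LRU) replacement exploits. The toy model below, a small fully associative cache tracking raw addresses, is a simplified sketch (real caches work on blocks and sets), but it shows how repeated accesses to the same addresses turn into hits.

```python
from collections import OrderedDict

class TinyCache:
    """A tiny fully associative cache with LRU replacement."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # addresses kept in least-recently-used order

    def access(self, address):
        """Return True on a hit, False on a miss (filling the line)."""
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[address] = False
        return False

cache = TinyCache(capacity=4)
trace = [100, 200, 100, 300, 100, 200]  # a trace with repeated addresses
hits = sum(cache.access(a) for a in trace)
print(hits)  # the repeats of 100 and 200 hit: 3 hits out of 6 accesses
```

Because the trace revisits addresses 100 and 200 shortly after first touching them, half the accesses are satisfied by the cache despite its tiny capacity.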
          By holding recently accessed values in caches, processors provide
        reduced latency when the same value is needed repeatedly. In addition,
        nearby values are likely to be needed in the near future. A typical pro-
        gram accessing memory location 100 is extremely likely to need location
        101 as well. This grouping of accesses to nearby locations is called
        spatial locality.
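Caches exploit spatial locality by fetching an entire block of adjacent locations on a miss, not just the single word requested. The sketch below uses a hypothetical 4-word block size; walking sequentially through eight addresses then costs only two block fetches.

```python
class BlockCache:
    """Miss counter for a cache that fills whole blocks on each miss."""
    def __init__(self, block_size=4):
        self.block_size = block_size
        self.resident_blocks = set()
        self.misses = 0

    def access(self, address):
        block = address // self.block_size  # neighbors share a block number
        if block not in self.resident_blocks:
            self.misses += 1
            # Fetching location 100 also brings in 101-103 "for free."
            self.resident_blocks.add(block)

cache = BlockCache(block_size=4)
for addr in range(100, 108):  # sequential walk over locations 100..107
    cache.access(addr)
print(cache.misses)  # only 2 block fetches cover all 8 accesses
```

With one-word blocks every access in this walk would miss; larger blocks convert the program's spatial locality directly into hits.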