          Figure 5-12 shows two processors, each with their own cache, at three
        different moments in time. Normally there would also be a Northbridge
        chip handling communication with memory but for simplicity this has
        been left out. Each cache line has a flag showing that line’s MESI state,
        a tag holding the address of the data stored, and the data itself. To start
        out, processor A’s cache is empty with all lines invalid, and processor B’s
        cache exclusively owns the line from address 1. When processor A reads
        the lines from address 1 and 2, processor B snoops the bus and sees the
        request. Processor B ignores the request for line 2, which is not in its
        cache, but it does have line 1. That cache line's state must be updated to
        shared, because now processor A's cache will hold a copy as well. When
        processor A writes both cache lines, it writes line 2 without a bus
        transaction. Because it is the exclusive owner, it does not need to
        communicate that this write has happened. However, line 1 is shared,
        which means processor A must signal that this line has been written.
        Processor B snoops this write transaction and marks its own copy invalid.
        Processor A updates its copy to modified.
          Only through this careful bookkeeping and communication can caches
        be safely used to improve performance without causing logical errors.
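          The exchange above can be sketched in code as a toy model. The following C
        sketch follows the scenario in the text (processor A reading, then writing,
        lines that processor B may hold); the struct, function names, and single-line
        view of each cache are illustrative assumptions, not a real coherence
        implementation.

/* Minimal sketch of the MESI transitions described above: two caches,
 * each modeled as individual lines with a state and a tag. Names are
 * illustrative only. */
#include <stdio.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } MesiState;

typedef struct {
    MesiState state;
    int       tag;    /* address of the data held in this line */
} CacheLine;

/* A read miss by one processor: the other cache snoops the request and
 * downgrades a matching line to Shared. */
static MesiState snoop_read(CacheLine *other, int addr) {
    if (other->state != INVALID && other->tag == addr) {
        other->state = SHARED;
        return SHARED;        /* requester also installs the line as Shared */
    }
    return EXCLUSIVE;         /* no other copy: requester owns it exclusively */
}

/* A write: an Exclusive line is written silently; a Shared line must
 * broadcast the write so other copies are invalidated. */
static void write_line(CacheLine *self, CacheLine *other) {
    if (self->state == SHARED && other->tag == self->tag)
        other->state = INVALID;   /* snooped invalidation */
    self->state = MODIFIED;
}

int main(void) {
    CacheLine b1 = { EXCLUSIVE, 1 };           /* B exclusively owns line 1 */
    CacheLine b2 = { INVALID,   0 };
    CacheLine a1 = { INVALID,   0 }, a2 = { INVALID, 0 };

    /* Processor A reads addresses 1 and 2 */
    a1.tag = 1; a1.state = snoop_read(&b1, 1); /* both copies become Shared */
    a2.tag = 2; a2.state = snoop_read(&b2, 2); /* A gets line 2 Exclusive   */

    /* Processor A writes both lines */
    write_line(&a2, &b2);                      /* silent write, A2 -> Modified */
    write_line(&a1, &b1);                      /* B's copy of line 1 invalidated */

    printf("A1=%d A2=%d B1=%d\n", a1.state, a2.state, b1.state);
    return 0;
}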


        Branch prediction
        One type of specialized cache used in modern microarchitectures is a
        branch prediction cache. Branches create a special problem for pipelined
        and out-of-order processors. Because they can alter the control flow, all
        the instructions after them depend upon their result. This control
        dependency affects not just the execution of later instructions, but
        whether they should be fetched at all. For many programs, 20 percent
        or more of the instructions are branches.⁶ No pipelined processor could
        hope to achieve any reasonable speedup without some mechanism for
        dealing with branch control dependencies. The most common method is
        branch prediction (Fig. 5-13). The processor simply guesses which
        instruction should be fetched next.




          Figure 5-13 Branch prediction: with no prediction the pipeline waits for
        the branch to resolve before fetching Instr 1; with a correct prediction
        Instr 1 follows the branch immediately; with an incorrect prediction the
        wrongly fetched Instr 50 must be cleared before Instr 1 is fetched.
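          One common form of branch prediction cache is a table of 2-bit saturating
        counters indexed by the branch address. The sketch below is a minimal
        illustration of that scheme; the table size, indexing, and function names
        are assumptions for the example and are not taken from the text.

/* Minimal sketch of a 2-bit saturating-counter branch predictor. */
#include <stdint.h>
#include <stdbool.h>

#define BP_ENTRIES 1024               /* entries in the prediction table */

static uint8_t counters[BP_ENTRIES];  /* 0..3: strongly/weakly not-taken..taken */

/* Hash the branch address into the table; a real predictor might also
 * keep tags or global branch history. */
static unsigned bp_index(uint32_t pc) { return (pc >> 2) % BP_ENTRIES; }

/* Predict: counter values 2 and 3 mean "taken", 0 and 1 mean "not taken". */
bool bp_predict(uint32_t pc) {
    return counters[bp_index(pc)] >= 2;
}

/* Update the counter with the branch's actual outcome once it resolves.
 * On a misprediction, the instructions fetched after the branch must be
 * cleared from the pipeline, as in Fig. 5-13. */
void bp_update(uint32_t pc, bool taken) {
    uint8_t *c = &counters[bp_index(pc)];
    if (taken) { if (*c < 3) (*c)++; }
    else       { if (*c > 0) (*c)--; }
}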

          ⁶ Hennessy and Patterson, Computer Architecture, 109.