Page 190 - A Practical Guide from Design Planning to Manufacturing

Microarchitecture  163

          On a prediction of taken, a new instruction address is provided to the
        instruction prefetch, so that instructions from the most likely program
        path are fetched even before the branch has executed.
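The steering described above can be sketched in a few lines. This is a hypothetical model, not the actual hardware: the BTB contents, addresses, and fetch width are all illustrative.

```python
# A minimal BTB model: maps a branch instruction's address to its
# predicted target address.
btb = {0x1000: 0x2000}  # branch at 0x1000 predicted taken, target 0x2000

def next_prefetch_address(current_addr, fetch_width=16):
    """Return the address the instruction prefetch should fetch next."""
    if current_addr in btb:
        # Predicted taken: redirect the prefetch to the predicted target
        # before the branch itself has executed.
        return btb[current_addr]
    # No BTB hit (or predicted not taken): fall through sequentially.
    return current_addr + fetch_width

print(hex(next_prefetch_address(0x1000)))  # predicted-taken branch -> 0x2000
print(hex(next_prefetch_address(0x3000)))  # no prediction -> 0x3010
```

A misprediction is discovered only when the branch executes, at which point the prefetch must be restarted from the correct address.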


        Trace cache write
        The decoded uops, including any uop branches, are written into the trace
        cache. The contents of the trace cache are not the same as main memory
        because the instructions have been decoded and because they are not nec-
        essarily stored in the same order. Because branch prediction is used to
        direct the instruction prefetch, the order that macroinstructions are decoded
        and written into the trace cache is the expected order of execution, not nec-
        essarily the order the macroinstructions appear in memory.
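The difference between memory order and predicted execution order can be illustrated with a toy example. The instruction names and one-uop-per-macroinstruction decode are simplifications for illustration only.

```python
# Macroinstructions as they appear in memory: a branch (JMP) followed by
# an instruction that is skipped when the branch is taken.
memory_order = ["ADD", "JMP target", "MOV (fall-through)", "SUB (at target)"]

# If the JMP is predicted taken, decode follows the predicted path and
# never sees the fall-through instruction.
predicted_path = ["ADD", "JMP target", "SUB (at target)"]

# The trace cache is filled in predicted execution order, not memory order.
trace_cache = ["uop:" + macro_op for macro_op in predicted_path]

print(trace_cache)
```

Because the trace cache holds decoded uops in this order, later fetches of the same code path avoid repeating the decode work entirely.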
          Writing into the trace cache finishes the front-end pipeline. Uops will
        wait in the trace cache until they are read to enter the execution pipeline.
        All the steps until this point have just been preparing to start the
        performance-critical portion of the pipeline. After being loaded into the
trace cache, a uop may remain there unread for some time. It is part of
        the code of the program currently being run, but program execution has
        not reached it yet. The trace cache can achieve very high hit rates, so that
        most of the time the processor performance is not affected by the latency
        of the front-end pipeline. It is important that the front-end pipeline has
        enough bandwidth to keep the trace cache filled.


        Microbranch prediction
        To begin the execution pipeline, the processor must determine which uop
        should enter the pipeline next. The processor actually maintains two
        instruction pointers. One holds the address of the next macroinstruction to be
        read from the L2 cache by the instruction prefetch. The other holds the
        address of the next uop to be read from the trace cache. If the last uop was
        not a branch, the uop pointer is simply incremented to point to the next
        group of uops in the trace cache. If the last uop fetched was a branch, its
        address is sent to a trace cache BTB for prediction. The trace cache BTB
        performs the same function as the front-end BTB, but it is used to predict
        uop branches rather than macroinstruction branches. The predictions of the
        front-end BTB steer the instruction prefetch to read needed macroin-
        structions from the L2 cache. The predictions of the trace cache BTB steer
        the microinstruction fetch to read needed uops from the trace cache.
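The uop-pointer update rule above can be sketched as follows. The trace cache BTB contents and line numbering are invented for illustration; the real structure is more elaborate.

```python
# Hypothetical trace cache BTB: maps the trace cache line of a uop branch
# to the predicted target line.
trace_cache_btb = {5: 12}

def next_uop_ip(current_line, last_uop_was_branch):
    """Advance the pointer for the next uop group read from the trace cache."""
    if last_uop_was_branch:
        # Microbranch prediction: consult the trace cache BTB.
        # With no entry, the branch is treated as not taken (fall through).
        return trace_cache_btb.get(current_line, current_line + 1)
    # Not a branch: simply step to the next group of uops.
    return current_line + 1

print(next_uop_ip(5, True))   # uop branch, predicted taken -> line 12
print(next_uop_ip(5, False))  # not a branch -> line 6
```

The front-end macroinstruction pointer is updated by the front-end BTB in the same fashion, but its predictions steer reads from the L2 cache rather than the trace cache.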


        Uop fetch and drive
        Using the address determined by microbranch prediction, the trace
        cache is read. If there is a trace cache miss, the needed address is sent
        to the L2 cache. The data read from the L2 cache then flows through the