Page 190 - A Practical Guide from Design Planning to Manufacturing
Microarchitecture 163
On a prediction of taken, a new instruction address is provided to the
instruction prefetch, so that instructions from the most likely program
path are fetched even before the branch has executed.
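This redirection of prefetch can be sketched in code. The following is a minimal, hypothetical model — the `Btb` class, its fields, and the fetch width are illustrative assumptions, not details from the text:

```python
# Hypothetical sketch: a predicted-taken branch redirects instruction
# prefetch; otherwise prefetch continues sequentially.

class Btb:
    """Branch target buffer: maps branch addresses to predictions."""
    def __init__(self):
        self.entries = {}  # branch address -> (predicted_taken, target)

    def predict(self, address):
        return self.entries.get(address, (False, None))

def next_fetch_address(btb, current, fetch_width=16):
    """Choose the next prefetch address: redirect on predicted taken,
    otherwise fall through to the next sequential block."""
    taken, target = btb.predict(current)
    if taken:
        return target              # fetch from the most likely program path
    return current + fetch_width   # sequential prefetch

btb = Btb()
btb.entries[0x1000] = (True, 0x2000)
assert next_fetch_address(btb, 0x1000) == 0x2000  # predicted taken: redirect
assert next_fetch_address(btb, 0x1010) == 0x1020  # no prediction: sequential
```

The key point is that the redirect happens at prefetch time, long before the branch itself executes.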
Trace cache write
The decoded uops, including any uop branches, are written into the trace
cache. The contents of the trace cache are not the same as main memory
because the instructions have been decoded and because they are not nec-
essarily stored in the same order. Because branch prediction is used to
direct the instruction prefetch, the order in which macroinstructions are
decoded and written into the trace cache is the expected order of execution,
not necessarily the order in which the macroinstructions appear in memory.
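This behavior can be modeled with a short sketch that follows the predicted path through memory and appends decoded uops to a trace. Everything here (the `decode` stand-in, the one-word instruction format, the `predictions` map) is an illustrative assumption, not the processor's actual mechanism:

```python
# Hypothetical model of trace-cache filling: uops are written in the
# order branch prediction fetches macroinstructions, not memory order.

def decode(instr):
    # Stand-in decoder: one macroinstruction -> one or more uops.
    return [f"uop({instr})"]

def build_trace(memory, start, predictions, max_lines=4):
    """Follow the predicted path from `start`, decoding each
    macroinstruction into uops appended in expected execution order."""
    trace, addr = [], start
    for _ in range(max_lines):
        instr = memory[addr]
        trace.extend(decode(instr))              # store decoded uops
        if instr.startswith("jmp") and predictions.get(addr, False):
            addr = int(instr.split()[1])         # predicted taken: follow target
        else:
            addr += 1                            # fall through sequentially
    return trace

memory = {0: "add", 1: "jmp 10", 2: "sub", 10: "mul", 11: "ret"}
# With the branch at address 1 predicted taken, the trace skips "sub":
trace = build_trace(memory, 0, {1: True}, max_lines=4)
# trace == ['uop(add)', 'uop(jmp 10)', 'uop(mul)', 'uop(ret)']
```

Note that `sub` (address 2) never enters the trace even though it sits between the branch and its target in memory; the trace records the predicted execution order instead.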
Writing into the trace cache finishes the front-end pipeline. Uops will
wait in the trace cache until they are read to enter the execution pipeline.
All the steps until this point have just been preparing to start the
performance-critical portion of the pipeline. After being loaded into the
trace cache, a uop may remain there unread for some time. It is part of
the code of the program currently being run, but program execution has
not reached it yet. The trace cache can achieve very high hit rates, so that
most of the time the processor performance is not affected by the latency
of the front-end pipeline. It is important that the front-end pipeline has
enough bandwidth to keep the trace cache filled.
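A back-of-the-envelope calculation shows why a high hit rate hides the front-end latency. The latency figures below are made-up illustrative numbers, not values from the text:

```python
# Simple expected-latency model: trace cache hits are cheap, while a
# miss pays the full front-end (fetch and decode) pipeline latency.

def avg_uop_fetch_latency(hit_rate, tc_latency, front_end_latency):
    """Expected cycles to deliver a group of uops."""
    return hit_rate * tc_latency + (1 - hit_rate) * front_end_latency

# With a 98% hit rate, a 2-cycle trace cache, and a 20-cycle front end,
# the average is dominated by the trace cache:
latency = avg_uop_fetch_latency(0.98, 2, 20)  # about 2.36 cycles
```

Even a long front-end pipeline adds little to the average as long as misses stay rare, which is why front-end bandwidth (keeping the trace cache filled) matters more than front-end latency.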
Microbranch prediction
To begin the execution pipeline, the processor must determine which uop
should enter the pipeline next. The processor actually maintains two instruction
pointers. One holds the address of the next macroinstruction to be
read from the L2 cache by the instruction prefetch. The other holds the
address of the next uop to be read from the trace cache. If the last uop was
not a branch, the uop pointer is simply incremented to point to the next
group of uops in the trace cache. If the last uop fetched was a branch, its
address is sent to a trace cache BTB for prediction. The trace cache BTB
performs the same function as the front-end BTB, but it is used to predict
uop branches rather than macroinstruction branches. The predictions of the
front-end BTB steer the instruction prefetch to read needed macroin-
structions from the L2 cache. The predictions of the trace cache BTB steer
the microinstruction fetch to read needed uops from the trace cache.
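The pointer-update logic described above can be sketched as follows. The class and function names are hypothetical; this is a sketch of the selection logic, not the hardware's actual structure:

```python
# Hypothetical sketch of microinstruction-pointer update: sequential
# increment for non-branch uops, trace cache BTB lookup for uop branches.

class TraceCacheBtb:
    def __init__(self, targets):
        self.targets = targets  # uop address -> predicted taken target

    def predict(self, uop_addr):
        return self.targets.get(uop_addr)

def next_uop_pointer(uop_addr, is_branch, tc_btb):
    """Return the trace-cache address of the next uop group to fetch."""
    if is_branch:
        target = tc_btb.predict(uop_addr)
        if target is not None:
            return target      # uop branch predicted taken: jump in the trace
    return uop_addr + 1        # otherwise fall through to the next uop group

tc_btb = TraceCacheBtb({5: 42})
assert next_uop_pointer(3, False, tc_btb) == 4   # not a branch: increment
assert next_uop_pointer(5, True, tc_btb) == 42   # uop branch: BTB target
```

This mirrors the front-end BTB's role one level down: the front-end BTB steers macroinstruction prefetch from the L2 cache, while the trace cache BTB steers uop fetch from the trace cache.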
Uop fetch and drive
Using the address determined by microbranch prediction, the trace
cache is read. If there is a trace cache miss, the needed address is sent
to the L2 cache. The data read from the L2 cache then flows through the