Page 184 - A Practical Guide from Design Planning to Manufacturing
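Figure 5-17 Trace cache. [Diagram of two front ends side by side. Pentium III: macroinstructions, instruction cache, instruction decode, microinstructions, execution, with branches redirecting fetch at the instruction cache. Pentium 4: macroinstructions, instruction decode, microinstructions, trace cache, execution, with branches redirecting fetch at the trace cache.]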
that will be stored in the data cache. These values are forwarded on to
main memory from where they can be displayed to the user.
The microarchitecture of the Pentium 4 limits the performance impact
of having to translate macroinstructions to uops by storing uops in a
trace cache. See Fig. 5-17.
The Pentium III microarchitecture stores macroinstructions in its
instruction cache. These are then translated before execution. If one of
the instructions is a branch, the new program path must be fetched from
the instruction cache and translated before execution can continue. In the
Pentium 4, the instructions are translated to uops before being stored in the
cache. The instruction cache is called a trace cache because it no longer
stores instructions as they exist in main memory, but instead stores
decoded uop translations. When a branch is executed, the already translated
uops are immediately fetched from the trace cache. The instruction
decode delay is no longer part of the branch mispredict penalty, and performance
is improved. The disadvantage is that the decoded instructions
require more storage and the trace cache must be larger in size to achieve
the same hit rate as an instruction cache.
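The difference between the two front ends can be illustrated with a toy model that counts how often the decoder is exercised when a loop body is fetched repeatedly. This is only a sketch under stated assumptions: the class names, the one-uop-per-instruction decode function, and the four-instruction program are all hypothetical and do not describe the actual Pentium III or Pentium 4 hardware.

```python
# Toy model: count decode operations on the fetch path for two front-end designs.
# All names and the trivial "decoder" below are illustrative assumptions.

def decode(macro):
    """Hypothetical decoder: turn one macroinstruction into a list of uops."""
    return [f"uop({macro})"]


class InstructionCacheFrontEnd:
    """Cache holds macroinstructions, so every fetch passes through decode."""

    def __init__(self, program):
        self.program = program  # address -> macroinstruction
        self.decodes = 0

    def fetch(self, addr):
        self.decodes += 1  # decode sits between the cache and execution
        return decode(self.program[addr])


class TraceCacheFrontEnd:
    """Cache holds decoded uops, so decode happens only when an entry is filled."""

    def __init__(self, program):
        self.program = program
        self.trace = {}  # address -> already-decoded uops
        self.decodes = 0

    def fetch(self, addr):
        if addr not in self.trace:  # miss: decode once and store the uops
            self.decodes += 1
            self.trace[addr] = decode(self.program[addr])
        return self.trace[addr]


def run_loop(front_end, body_addrs, iterations):
    """Fetch the same loop body repeatedly, as a taken backward branch would."""
    for _ in range(iterations):
        for addr in body_addrs:
            front_end.fetch(addr)
    return front_end.decodes


program = {0: "mov", 1: "add", 2: "cmp", 3: "jne"}
icache_decodes = run_loop(InstructionCacheFrontEnd(program), [0, 1, 2, 3], 10)
trace_decodes = run_loop(TraceCacheFrontEnd(program), [0, 1, 2, 3], 10)
print(icache_decodes, trace_decodes)  # 40 4
```

In this model the instruction-cache design decodes on all 40 fetches, while the trace-cache design decodes only on the 4 initial fills. The model ignores capacity, so it does not show the storage cost of holding decoded uops.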
In general, the use of uops means that only the translation circuits on the
processor ever see a macroinstruction. The rest of the processor operates only
on uops, which can be changed from one processor design to the next as
suits the microarchitecture while maintaining software compatibility. The
use of uops has allowed processors supporting CISC architectures to use
all of the performance-enhancing features that RISC processors do and has
made the distinction between CISC and RISC processors less important.
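As a rough illustration of the translation step, the sketch below breaks a macroinstruction with a memory operand into simpler load, operate, and store uops, while a register-only instruction maps to a single uop. The tuple format, the temporary register names, and the uop names are assumptions made for this example; they are not the uop encoding of any real processor.

```python
# Hypothetical translation of (op, dest, src) macroinstructions into uops.
# Operands written in square brackets are treated as memory references.

def translate(macro):
    """Return a list of (op, dest, src) uops for one macroinstruction."""
    op, dest, src = macro
    uops = []
    if src.startswith("["):
        # A memory source operand becomes a separate load uop.
        uops.append(("load", "tmp0", src))
        src = "tmp0"
    if dest.startswith("["):
        # Read-modify-write on memory: load, operate, then store.
        uops.append(("load", "tmp1", dest))
        uops.append((op, "tmp1", src))
        uops.append(("store", dest, "tmp1"))
    else:
        uops.append((op, dest, src))
    return uops


reg_form = translate(("add", "eax", "ebx"))
mem_form = translate(("add", "[0x1000]", "ebx"))
print(reg_form)  # one uop
print(mem_form)  # three uops: load, add, store
```

Only the translate function knows about the macroinstruction format. Everything downstream would see just the uop tuples, which is why the uop set could change between designs without affecting software.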
Reorder, retire, and replay
Any processor that allows out-of-order execution must have some
mechanism for putting instructions back in order after execution. Software
is written assuming that instructions will be executed in the order specified
in the program. To produce the expected behavior, interrupts and
exceptions must be handled at the proper times. Branch prediction
means that some instructions will be executed that an in-order processor