Page 181 - A Practical Guide from Design Planning to Manufacturing
order. After these two instructions go through renaming, architectural
registers AX, BX, and CX have been mapped to physical registers R1,
R2, and R3. There is a WAR dependency between the multiply and the
move: the move instruction writes to the same architectural register
that the multiply reads. This dependency is removed by mapping the
architectural register BX to a different physical register for the
move instruction. After a branch, a second move also writes to
register BX. This dependency is likewise removed by mapping BX to yet
another physical register. After renaming, only the single true
dependency remains.
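The renaming step described above can be sketched in a few lines of
Python. This is only an illustrative model, not the book's example or
any real processor's logic: the three-instruction program, the
(op, dest, sources) tuple format, and the rename function are all
assumptions made for the sketch. Only the starting mapping of AX, BX,
and CX to R1, R2, and R3 is taken from the text.

```python
from itertools import count


def rename(instructions, initial_map):
    """Rename hypothetical (op, dest, sources) tuples.

    Sources are looked up in the current architectural-to-physical map;
    every destination write is given a fresh physical register.
    """
    mapping = dict(initial_map)
    # Assumed naming scheme: physical registers are "R<number>".
    highest = max((int(p[1:]) for p in mapping.values()), default=0)
    next_free = count(highest + 1)
    renamed = []
    for op, dest, sources in instructions:
        # Read the sources before remapping the destination, so an
        # instruction that reads and writes the same register sees the old value.
        phys_sources = tuple(mapping[s] for s in sources)
        mapping[dest] = f"R{next(next_free)}"
        renamed.append((op, mapping[dest], phys_sources))
    return renamed


# Hypothetical program (not the book's): the first MOV has a WAR
# dependency on the MUL through BX, and the second MOV writes BX again.
program = [
    ("MUL", "CX", ("BX", "CX")),  # reads BX and CX, writes CX
    ("MOV", "BX", ("AX",)),       # writes BX while MUL still needs old BX
    ("MOV", "BX", ("CX",)),       # writes BX again, reads MUL's result
]
initial = {"AX": "R1", "BX": "R2", "CX": "R3"}

renamed = rename(program, initial)
for op, dest, sources in renamed:
    print(op, dest, ", ".join(sources))
```

In this sketch the multiply still reads R2, while the two moves write
R5 and R6, so neither move has to wait for the multiply to read BX.
The second move reads R4, the multiply's result, which is a true
dependency and is correctly preserved.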
This mapping from architectural to physical registers is very similar
to the mapping of virtual to physical memory addresses performed by
virtual memory. Virtual memory allows multiple programs to run in
parallel without interfering with each other by mapping their virtual
memory addresses to separate physical addresses. Register renaming
allows multiple instructions to run in parallel without interfering with
each other by mapping their architectural registers to separate
physical registers. In both cases, more parallelism is allowed while the results
of each program are unaffected.
Architectures that define a large number of architectural registers
have less need of hardware register renaming since the compiler
can avoid most false dependencies. However, because control flow
of programs varies at run time, false dependencies still appear and
even processors with these architectures can benefit from register
renaming.
Microinstructions and microcode
We can imagine a processor pipeline as a physical pipe, with each
instruction like a ball rolling down the pipe. If some balls roll more
slowly down the pipe, other balls will stack up behind them. The
pipeline works best when all the balls travel the pipeline in the same
length of time. As a result, pipelined processors try to break down
complicated instructions into simpler steps, like replacing a single
slow ball with several fast ones.
RISC architectures achieve this by allowing only simple instructions
of relatively uniform complexity. Processors that support CISC
architectures achieve the same effect by using hardware to translate
their complex machine language instructions into smaller steps. Before
translation the machine language instructions are called
macroinstructions, and the smaller steps after translation are called
microinstructions. These microinstructions typically bear a striking
resemblance to RISC instructions. The following example shows a simple
macroinstruction being
translated into three microinstructions.