Page 186 - A Practical Guide from Design Planning to Manufacturing
shown on the left specifies an add instruction to be followed by a branch
and then a subtract instruction if the branch is not taken. The ROB
maintains a pointer to the oldest entry not yet retired, which starts at
the entry for the add. To start with, only the subtract is ready to retire,
having been executed out of order before the other two instructions.
The ROB also stores the physical destination register for each instruction
and the architectural register to which that physical register has been
mapped. In this example, the add instruction is to write to register 3,
which has been mapped to architectural register AX, and the subtract
is to write to register 6, which has been mapped to architectural BX. The
branch does not write to a register. At the right the RAT maintains a
list of the physical registers holding the most recent committed in-order
results for each architectural register.
Imagine the add instruction now completes execution. As the oldest
instruction, it is allowed to retire. The ROB oldest pointer is incremented
to point to the branch’s entry. The RAT is updated to show that the most
recent committed results for AX are stored in register 3. When the branch
executes, it is discovered to have been mispredicted. The branch is
now the oldest entry in the ROB, so it is allowed to retire. However, the
subtract should never have been executed at all. When it retires, its
results are discarded because the branch that retired before it was mispredicted.
The RAT is not updated and the latest committed results for BX are still
in register 15. The registers used by the subtract will be recycled for use by
other instructions; the result the subtract produced will be overwritten
without ever having been read.
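
The retirement sequence just described can be sketched in a few lines of Python. This is only an illustrative model, not the logic of any particular processor: the class and field names (RobEntry, ReorderBuffer, squashing, free_regs) are invented for the sketch, and the starting mapping of AX to register 12 is an assumed value, since the text gives only the starting mapping for BX. The sketch also omits freeing the previously committed physical register when a new mapping is committed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    name: str
    done: bool = False                # has the instruction finished executing?
    phys_dest: Optional[int] = None   # physical destination register, if any
    arch_dest: Optional[str] = None   # architectural register it is mapped to
    mispredicted: bool = False        # only meaningful for branches


class ReorderBuffer:
    """Hypothetical, simplified ROB that retires entries in program order."""

    def __init__(self, entries, rat):
        self.entries = entries    # entries in program order
        self.rat = rat            # architectural reg -> committed physical reg
        self.oldest = 0           # index of the oldest entry not yet retired
        self.squashing = False    # set once a mispredicted branch has retired
        self.free_regs = []       # physical registers recycled from discarded work

    def retire_ready(self):
        """Retire from the oldest entry, stopping at the first unfinished one."""
        log = []
        while self.oldest < len(self.entries) and self.entries[self.oldest].done:
            entry = self.entries[self.oldest]
            if self.squashing:
                # Wrong-path instruction: leave the RAT alone, recycle its register.
                if entry.phys_dest is not None:
                    self.free_regs.append(entry.phys_dest)
                log.append(f"{entry.name}: discarded")
            else:
                if entry.arch_dest is not None:
                    self.rat[entry.arch_dest] = entry.phys_dest  # commit result
                if entry.mispredicted:
                    self.squashing = True  # everything younger is wrong-path
                log.append(f"{entry.name}: retired")
            self.oldest += 1
        return log


# The example from the text: add -> register 3 (AX), branch, sub -> register 6 (BX).
rat = {"AX": 12, "BX": 15}  # AX's starting register (12) is an assumed value
add = RobEntry("add", phys_dest=3, arch_dest="AX")
branch = RobEntry("branch")
sub = RobEntry("sub", done=True, phys_dest=6, arch_dest="BX")  # ran out of order
rob = ReorderBuffer([add, branch, sub], rat)

print(rob.retire_ready())  # [] : sub is done, but add (the oldest) is not
add.done = True
print(rob.retire_ready())  # ['add: retired']
branch.done = True
branch.mispredicted = True
print(rob.retire_ready())  # ['branch: retired', 'sub: discarded']
print(rat)                 # {'AX': 3, 'BX': 15}
```

The single oldest pointer is what forces in-order commitment: the subtract sits finished in the buffer until both older entries have retired, and by then the mispredicted branch has marked it as wrong-path work, so BX still maps to register 15.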
If instructions were like balls rolling through a pipe, then the in-order
portion of the processor would have the job of carefully labeling each ball
with a number that indicates the order in which it should be retired. The
out-of-order execution of the processor is like the balls being thrown into the
pipe in random order. The ROB has the job of gathering up the balls that
come out of the pipe, and using their labeled numbers to put them back
into order and to discard any balls that should never have come through
the pipe in the first place.
The example in Fig. 5-18 shows instructions retiring successfully and
an instruction being discarded. Another possibility is that an instruction
will need to be replayed. Some processors allow execution of instructions
whose data may not be available yet. The most common case is scheduling
instructions that depend upon a load. Because the load may or may
not hit in the cache, its latency is not known in advance. All the dependent
instructions could be stalled until a hit or miss is determined, but this causes some
delay. Performance can be improved by assuming the load will hit (the most
common case) and scheduling dependent instructions before this is known for certain.
Taking into account the worst possible latency of each type of instruction
would make filling the pipeline much more difficult. Instead, the