Page 1137 - The Mechatronics Handbook


Most modern superscalar microprocessors use dynamic out-of-order scheduling techniques to
increase the number of instructions executed per cycle. Such microprocessors share basically the same
dynamically scheduled pipeline concept: all instructions pass through an issue stage in-order, are executed
out-of-order, and are retired in-order. Several functional elements of this common sequence
have developed into computer architecture concepts. The first functional concept is scoreboarding.
Scoreboarding is a technique for allowing instructions to execute out-of-order when there are available
resources and no data dependences. Scoreboarding originates from the CDC 6600 machine’s issue logic,
named the scoreboard. The overall goal of scoreboarding is to execute every instruction as early as
possible.
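
The scoreboard condition described above (start an instruction when a functional unit is free and none of its source operands is still awaiting a result) can be sketched as follows. This is a minimal illustration, not the CDC 6600 logic: the `Instr` record, the `select_ready` helper, and the register and unit names are all hypothetical, and the write-back interlock a real scoreboard uses for WAR hazards is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str    # label used only for this illustration
    unit: str    # class of functional unit required, e.g. "mul" or "add"
    dest: str    # destination register
    srcs: tuple  # source registers


def select_ready(window, free_units, pending_writes):
    """Return the names of instructions in an in-order window that may start now.

    free_units maps a unit class to the number of idle units; pending_writes is
    the set of registers whose values have not been produced yet.
    """
    free = dict(free_units)
    pending = set(pending_writes)
    started = []
    for instr in window:
        unit_free = free.get(instr.unit, 0) > 0
        operands_ready = all(src not in pending for src in instr.srcs)  # no RAW hazard
        no_waw = instr.dest not in pending                              # no WAW hazard
        if unit_free and operands_ready and no_waw:
            free[instr.unit] -= 1
            started.append(instr.name)
        # Started or still waiting, this instruction's result is not yet
        # available to the younger instructions that follow it.
        pending.add(instr.dest)
    return started


window = [
    Instr("i1", "mul", "r1", ("r2", "r3")),
    Instr("i2", "add", "r4", ("r1", "r5")),  # needs r1 from i1, so it waits
    Instr("i3", "add", "r6", ("r7", "r8")),  # independent, so it may start first
]
started = select_ready(window, {"mul": 1, "add": 1}, set())
```

In this example `i3` starts ahead of the older `i2`, which is the out-of-order behavior the scoreboard is meant to allow.
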
A more advanced approach to dynamic execution is Tomasulo’s approach. This scheme was employed in
the IBM 360/91 processor. Although there are many variations on this scheme, the key concept of avoiding
write-after-read (WAR) and write-after-write (WAW) dependences during dynamic execution is attributed
to Tomasulo. In Tomasulo’s scheme, the functionality of scoreboarding is provided by the reservation
stations. Reservation stations buffer the operands of instructions waiting to issue, capturing each operand
as soon as it becomes available. The concept is to issue new instructions immediately when all source
operands become available instead of accessing such operands through the register file. As such, waiting
instructions designate the reservation station entry that will provide their input operands. This action
removes WAW dependences caused by successive writes to the same register by forcing instructions to be
related by dependences instead of by register specifiers. In general, renaming of register specifiers for
pending operands to the reservation station
entries is called register renaming. Overall, Tomasulo’s scheme combines scoreboarding and register
renaming. An Efficient Algorithm for Exploiting Multiple Arithmetic Units [6] provides the complete details
of Tomasulo’s scheme.
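
The renaming step can be illustrated with a short sketch. It assumes, for simplicity, that every instruction gets its own reservation station entry and that no result has been produced yet; the `rename` function, the `RS` tag format, and the register names are hypothetical and chosen only for this example.

```python
def rename(program):
    """Replace register specifiers for pending operands with producer tags.

    program is a list of (dest, src1, src2) register names in program order.
    Returns a list of (tag, sources), where each source is either the tag of
    the entry that will produce it or, if no producer is pending, the original
    register name (read from the register file).
    """
    latest = {}  # architectural register -> tag of its most recent producer
    renamed = []
    for index, (dest, *srcs) in enumerate(program):
        tag = f"RS{index}"
        sources = tuple(latest.get(src, src) for src in srcs)
        renamed.append((tag, sources))
        latest[dest] = tag  # later readers of dest now wait on this entry
    return renamed


program = [
    ("r1", "r2", "r3"),  # first write to r1
    ("r4", "r1", "r5"),  # reads the first r1
    ("r1", "r6", "r7"),  # second write to r1 (WAW with the first)
    ("r8", "r1", "r4"),  # reads the second r1
]
renamed = rename(program)
```

After renaming, the two writes to `r1` are identified as `RS0` and `RS2`, and each reader names the entry that produces its value, so the instructions are related only by true data dependences.
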

Predicated Execution

Branch instructions are recognized as a major impediment to exploiting instruction-level parallelism (ILP).
Branches force the compiler and hardware to make frequent predictions of branch directions in an attempt
to find sufficient parallelism. Misprediction of these branches can result in severe performance degradation
through the introduction of wasted cycles into the instruction stream. Branch prediction strategies reduce
this problem by allowing the compiler and hardware to continue processing instructions along the predicted
control path, thus avoiding these wasted cycles when the prediction is correct.
Predicated execution support provides an effective means to eliminate branches from an instruction
stream. Predicated execution refers to the conditional execution of an instruction based on the value of
a Boolean source operand, referred to as the predicate of the instruction. This architectural support
allows the compiler to use an if-conversion algorithm to convert conditional branches into predicate
defining instructions, and instructions along alternative paths of each branch into predicated instructions [7].
Predicated instructions are fetched regardless of their predicate value. Instructions whose predicate
value is true are executed normally. Conversely, instructions whose predicate is false are nullified, and
thus are prevented from modifying the processor state. Predicated execution allows the compiler to trade
instruction fetch efficiency for the capability to expose ILP to the hardware along multiple execution paths.
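
As an illustration of if-conversion, the sketch below models a branch-free, predicated version of `if (a > b) x = a - b; else x = b - a;`. The instruction encoding, the `run` interpreter, and the register names `p1`, `p2`, `a`, `b`, and `x` are assumptions made for this example and do not correspond to any particular instruction set.

```python
import operator


def run(program, registers):
    """Execute a straight-line list of predicated instructions.

    Each instruction is (predicate, dest, operation, sources). A predicate of
    None means the instruction is unconditional. Every instruction is fetched,
    but one whose predicate register is false is nullified and leaves the
    register state unchanged.
    """
    regs = dict(registers)
    for predicate, dest, operation, sources in program:
        if predicate is not None and not regs[predicate]:
            continue  # nullified: no modification of processor state
        regs[dest] = operation(*(regs[src] for src in sources))
    return regs


# The conditional branch is replaced by two predicate-defining instructions;
# the instructions from each path of the branch become predicated instructions.
program = [
    (None, "p1", operator.gt, ("a", "b")),  # p1 = (a > b)
    (None, "p2", operator.le, ("a", "b")),  # p2 = (a <= b), the complement
    ("p1", "x", operator.sub, ("a", "b")),  # then path: x = a - b
    ("p2", "x", operator.sub, ("b", "a")),  # else path: x = b - a
]

taken = run(program, {"a": 7, "b": 3})
not_taken = run(program, {"a": 2, "b": 9})
```

Both paths are fetched in every run, which is the fetch-efficiency cost noted above, but no branch remains to be predicted.
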
Predicated execution offers the opportunity to improve branch handling in microprocessors. Eliminating
frequently mispredicted branches may lead to a substantial reduction in branch prediction misses.
As a result, the performance penalties associated with the eliminated branches are removed. Eliminating
branches also reduces the need to handle multiple branches per cycle for wide issue processors. Finally,
predicated execution provides an efficient interface for the compiler to expose multiple execution paths
to the hardware. Without compiler support, the cost of maintaining multiple execution paths in hardware
grows rapidly.
The essence of predicated execution is the ability to suppress the modification of the processor state
based upon some execution condition. Full predication cleanly supports this through a combination of
instruction set and microarchitecture extensions. These extensions can be classified as support for
suppression of execution and expression of condition. The result of the condition, which determines if
