Page 1137 - The Mechatronics Handbook

Most modern superscalar microprocessors use dynamic out-of-order scheduling techniques to increase
the number of instructions executed per cycle. Such microprocessors share essentially the same
dynamically scheduled pipeline concept: all instructions pass through an issue stage in order, are
executed out of order, and are retired in order. Several functional elements of this common sequence
have developed into computer architecture concepts. The first is scoreboarding, a technique that
allows instructions to execute out of order when resources are available and no data dependences
exist. Scoreboarding originates from the issue logic of the CDC 6600 machine, which was named the
scoreboard. The overall goal of scoreboarding is to execute every instruction as early as possible.
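
The readiness test at the heart of scoreboarding can be illustrated with a minimal Python sketch. It is
not the CDC 6600 logic: it assumes a simplified model that checks only functional-unit availability and
read-after-write dependences, and it leaves out the WAR and WAW checks a real scoreboard performs. The
names `Instr` and `pick_ready`, and the register and unit names, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str    # label used only for this illustration
    dest: str    # register written by the instruction
    srcs: tuple  # registers read by the instruction
    unit: str    # functional unit the instruction needs


def pick_ready(window, pending_writes=(), busy_units=()):
    """Return names of instructions that may begin execution this cycle."""
    pending = set(pending_writes)  # registers whose new value is not yet written
    busy = set(busy_units)         # functional units already occupied
    started = []
    for instr in window:           # window is kept in program (issue) order
        unit_free = instr.unit not in busy
        operands_ready = not any(src in pending for src in instr.srcs)
        if unit_free and operands_ready:
            started.append(instr.name)
            busy.add(instr.unit)
        # Started or not, the result is still outstanding, so younger
        # readers of this register must wait for it.
        pending.add(instr.dest)
    return started


window = [
    Instr("div", "f0", ("f2", "f4"), "divider"),
    Instr("add", "f6", ("f0", "f8"), "adder"),        # needs f0 from div
    Instr("mul", "f10", ("f2", "f8"), "multiplier"),  # independent of div
]
started = pick_ready(window)
print(started)  # ['div', 'mul']
```

In this example the multiply starts ahead of the older add, which must wait for the divide result. This
is the out-of-order behavior described above.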
  A more advanced approach to dynamic execution is Tomasulo’s approach, which was employed in the
IBM 360/91 processor. Although there are many variations on this scheme, the key concept of avoiding
write-after-read (WAR) and write-after-write (WAW) dependences during dynamic execution is attributed
to Tomasulo. In Tomasulo’s scheme, the functionality of the scoreboard is provided by the reservation
stations. Reservation stations buffer the operands of instructions waiting to issue, capturing each
operand as soon as it becomes available. The concept is to issue an instruction immediately when all of
its source operands become available, instead of having it access those operands through the register
file. To this end, each waiting instruction designates the reservation station entry that will provide
its input operands. This action removes WAW dependences caused by successive writes to the same
register, because instructions are related by their dependences instead of by register specifiers. In
general, renaming the register specifiers of pending operands to reservation station entries is called
register renaming. Overall, Tomasulo’s scheme combines scoreboarding and register renaming. An Efficient
Algorithm for Exploiting Multiple Arithmetic Units [6] provides the complete details of Tomasulo’s scheme.
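
The renaming step can be sketched in Python as follows. This is a simplified model under stated
assumptions, not Tomasulo’s hardware: it models only the naming of operands, gives every instruction a
fresh tag, and does not model operand values being captured in the stations. The function `rename` and
the tags `RS1`, `RS2`, and so on are hypothetical stand-ins for reservation station entries.

```python
from itertools import count


def rename(program):
    """Give each result a fresh tag; sources follow the latest producer."""
    latest = {}      # architectural register -> tag of its most recent producer
    tags = count(1)  # source of fresh, never-reused tag numbers
    renamed = []
    for op, dest, srcs in program:
        # A source with a pending producer is read from that producer's
        # tag; otherwise it is read from the register file as usual.
        new_srcs = tuple(latest.get(src, src) for src in srcs)
        tag = f"RS{next(tags)}"
        latest[dest] = tag
        renamed.append((op, tag, new_srcs))
    return renamed


program = [
    ("div", "f0", ("f2", "f4")),
    ("add", "f6", ("f0", "f8")),
    ("sub", "f8", ("f10", "f14")),  # WAR with add: writes f8, which add reads
    ("mul", "f6", ("f10", "f8")),   # WAW with add: both write f6
]
renamed = rename(program)
for instr in renamed:
    print(instr)
```

After renaming, no two instructions share a destination name, so the WAW conflict on `f6` and the WAR
conflict on `f8` no longer constrain execution order. Only the true dependences remain: `add` reads the
tag produced by `div`, and `mul` reads the tag produced by `sub`.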

                                 Predicated Execution

Branch instructions are recognized as a major impediment to exploiting instruction-level parallelism
(ILP). Branches force the compiler and hardware to make frequent predictions of branch directions in an
attempt to find sufficient parallelism. Misprediction of these branches can result in severe performance
degradation through the introduction of wasted cycles into the instruction stream. Branch prediction
strategies reduce this problem by allowing the compiler and hardware to continue processing instructions
along the predicted control path, thus eliminating these wasted cycles when the prediction is correct.
  Predicated execution support provides an effective means to eliminate branches from an instruction
stream. Predicated execution refers to the conditional execution of an instruction based on the value of
a Boolean source operand, referred to as the predicate of the instruction. This architectural support
allows the compiler to use an if-conversion algorithm to convert conditional branches into
predicate-defining instructions, and instructions along alternative paths of each branch into predicated
instructions [7]. Predicated instructions are fetched regardless of their predicate value. Instructions
whose predicate value is true are executed normally. Conversely, instructions whose predicate is false
are nullified, and thus are prevented from modifying the processor state. Predicated execution allows the
compiler to trade instruction fetch efficiency for the capability to expose ILP to the hardware along
multiple execution paths.
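
The effect of if-conversion can be illustrated with a small Python model. It is a sketch under stated
assumptions rather than any particular instruction set: the predicate names `p0` (always true), `p1`,
and `p2`, the function `run_predicated`, and the use of two separate predicate-defining instructions
are all hypothetical. The branching source being converted is `if a > b: c = a - b` with
`else: c = b - a`.

```python
def run_predicated(instrs, state):
    """Fetch every instruction; only those with a true predicate update state."""
    fetched = executed = 0
    for guard, dest, compute in instrs:
        fetched += 1                     # fetched regardless of predicate value
        if state[guard]:                 # predicate true: execute normally
            state[dest] = compute(state)
            executed += 1
        # predicate false: nullified, so state is left untouched
    return fetched, executed


# If-converted form: the branch is replaced by predicate definitions,
# and both paths become predicated instructions in one straight line.
if_converted = [
    ("p0", "p1", lambda s: s["a"] > s["b"]),  # predicate-defining instruction
    ("p0", "p2", lambda s: not s["p1"]),      # complementary predicate
    ("p1", "c", lambda s: s["a"] - s["b"]),   # former "then" path
    ("p2", "c", lambda s: s["b"] - s["a"]),   # former "else" path
]

state = {"p0": True, "a": 3, "b": 7}
fetched, executed = run_predicated(if_converted, state)
print(state["c"], fetched, executed)  # 4 4 3
```

All four instructions are fetched but only three modify state, which is the trade of fetch efficiency
for branch-free code described above.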
  Predicated execution offers the opportunity to improve branch handling in microprocessors. Eliminating
frequently mispredicted branches may lead to a substantial reduction in branch mispredictions. As a
result, the performance penalties associated with the eliminated branches are removed. Eliminating
branches also reduces the need to handle multiple branches per cycle in wide-issue processors. Finally,
predicated execution provides an efficient interface for the compiler to expose multiple execution paths
to the hardware. Without compiler support, the cost of maintaining multiple execution paths in hardware
grows rapidly.
  The essence of predicated execution is the ability to suppress the modification of the processor state
based upon some execution condition. Full predication cleanly supports this through a combination of
instruction set and microarchitecture extensions. These extensions can be classified as support for
suppression of execution and support for expression of condition. The result of the condition, which
determines if


                                 ©2002 CRC Press LLC