Page 1139 - The Mechatronics Handbook

FIGURE 42.11  Instruction sequence: (a) program code, (b) traditional execution, (c) predicated execution.

                                 pred_eq instructions. Predicate register p1 is set to indicate if the condition (A = B) is true, and p2 is set
                                 if the condition is false. The “then” part of the if-statement is predicated on p1 and the “else” part is
                                 predicated on p2. The pred_eq instruction simply determines whether the addition or the subtraction
                                 is performed, ensuring that exactly one of the two parts executes. The predicated code has several
                                 performance benefits. First, the microprocessor does not need to make any branch predictions, since all
                                 the branches in the code are eliminated. This removes the penalties associated with mispredicted
                                 branches. More importantly, the predicated instructions can utilize the multiple-instruction execution
                                 capabilities of modern microprocessors.
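
The effect of predication can be sketched in a few lines of Python. This is only an illustrative model, not the code of Fig. 42.11: the function names and the operands `x` and `y` are hypothetical, and the arithmetic select stands in for hardware that nullifies the instruction whose predicate is false.

```python
def pred_eq(a, b):
    """Model of pred_eq: p1 is set when a == b, p2 when a != b."""
    p1 = (a == b)
    p2 = not p1
    return p1, p2


def predicated_add_sub(a, b, x, y):
    """Branch-free version of: if a == b then x + y else x - y.

    Both guarded operations are issued; only the one whose predicate
    is set contributes to the result (hypothetical operands x and y).
    """
    p1, p2 = pred_eq(a, b)
    add_result = x + y   # "then" part, guarded by p1
    sub_result = x - y   # "else" part, guarded by p2
    # Exactly one of p1 and p2 is set, so exactly one term survives.
    return int(p1) * add_result + int(p2) * sub_result
```

Because no control transfer remains, there is nothing for the branch predictor to guess; the cost is that both operations occupy issue slots even though only one result is kept.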

                                 Speculative Execution

                                 The amount of ILP available within basic blocks is extremely limited in non-numeric programs. As such,
                                 processors must optimize and schedule instructions across basic block boundaries to achieve higher
                                 performance. In addition, future processors must contend with both long-latency load operations and
                                 long-latency cache misses. When loaded data is needed by subsequent dependent instructions, the
                                 processor must wait until the cache access is complete.
                                   In these situations, out-of-order machines dynamically reorder the instruction stream to execute
                                 nondependent instructions. Additionally, out-of-order machines have the advantage of executing
                                 instructions that follow correctly predicted branch instructions. However, this approach requires complex
                                 circuitry at the cost of chip die space. Similar performance gains can be achieved using static
                                 compile-time speculation methods without complex out-of-order logic. Speculative execution, in which an
                                 instruction is executed before it is known whether its execution is required, is an important technique
                                 for exploiting ILP in programs and is best known for hiding memory latency. The static methods rely on
                                 instruction set architecture support in the form of special speculative instructions.
                                   A compiler utilizes speculative code motion to achieve higher performance in several ways. First, in
                                 regions of code where insufficient ILP exists to fully utilize the processor resources, useful instructions
                                 may be executed. Second, instructions at the beginning of long dependence chains may be executed early
                                 to reduce the computation’s critical path. Finally, long latency instructions may be initiated early to
                                 overlap their execution with other useful operations. Figure 42.12 illustrates a simple example of code
                                 before and after a speculative compile-time transformation is performed to execute a load instruction
                                 above a conditional branch.
                                   Figure 42.12(a) shows how the branch instruction and its implied control flow define a control
                                 dependence that prevents the load operation from being scheduled earlier in the code. Cache miss
                                 latencies would stall the processor unless out-of-order execution mechanisms were used. However, with
                                 speculation support, the code in Fig. 42.12(b) can be used to hide the latency of the load operation.
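
A rough Python model of this transformation is given here; it is a sketch under stated assumptions, not the code of Fig. 42.12. The `MEMORY` table, the function names, and the use of a fault flag are all hypothetical. In particular, the behavior of `check` (signaling a deferred fault only on the path that actually needs the loaded value) is an assumption about how such a check instruction is typically used.

```python
# Hypothetical address space: any address not listed here is invalid.
MEMORY = {0x10: 42}


def speculative_load(addr):
    """Nonfaulting load: returns (value, fault_flag) instead of raising."""
    if addr in MEMORY:
        return MEMORY[addr], False
    return 0, True  # the fault is recorded silently, not signaled


def check(fault_flag, addr):
    """Assumed check step: signal the deferred fault at the original load site."""
    if fault_flag:
        raise MemoryError(f"deferred fault for address {addr:#x}")


def traditional(cond, addr):
    """Load stays below the branch, so its latency cannot be overlapped."""
    if cond:
        return MEMORY[addr] + 1
    return 0


def speculative(cond, addr):
    """Load is hoisted above the branch; the check stays where the load was."""
    value, fault = speculative_load(addr)  # started early, before the branch
    if cond:
        check(fault, addr)  # a fault matters only if the value is used
        return value + 1
    return 0
```

With a valid address both versions return the same result. With an invalid address and a false condition, the hoisted load does not raise, which is the property that makes moving it above the branch safe in this model.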
                                   The solution requires the load to be speculative or nonfaulting. A speculative load will not signal an
                                 exception for faults such as address alignment or address space access errors. Essentially, the load is
                                 considered silent for these occurrences. The additional check instruction in Fig. 42.12(b) enables these

                                 ©2002 CRC Press LLC