Page 1139 - The Mechatronics Handbook
FIGURE 42.11 Instruction sequence: (a) program code, (b) traditional execution, (c) predicated execution.
pred_eq instructions. Predicate register p1 is set to indicate if the condition (A = B) is true, and p2 is set
if the condition is false. The “then” part of the if-statement is predicated on p1 and the “else” part is
predicated on p2. The pred_eq instruction simply decides whether the addition or the subtraction is
performed and ensures that one of the two parts is not executed. Predicated code has several performance
benefits. First, the microprocessor does not need to make any branch predictions, since all the branches in
the code are eliminated. This removes the penalties associated with mispredicted branches. More importantly,
the predicated instructions can exploit the multiple-instruction execution capabilities of modern
microprocessors while avoiding those penalties.
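The effect of predication can be illustrated with a small interpreter. This is only a sketch under stated assumptions: the tuple-based instruction encoding, the always-true guard p0, and the operand names A, B, X, Y, and Z are hypothetical and are not taken from Figure 42.11. Only pred_eq, p1, p2, and the add/subtract pair come from the text.

```python
# Toy model of predicated execution. The instruction encoding, the guard p0,
# and the operand names are illustrative assumptions.

def run(program, regs):
    """Execute a straight-line program in which every instruction has a guard."""
    regs = dict(regs)
    preds = {"p0": True}  # p0: always-true guard for unconditional instructions
    for guard, op, dst, a, b in program:
        if not preds[guard]:
            # Models hardware nullification: the instruction is fetched and
            # issued, but it has no effect. No program branch is involved.
            continue
        if op == "pred_eq":
            # dst names two predicate registers: (true case, false case)
            p_true, p_false = dst
            preds[p_true] = regs[a] == regs[b]
            preds[p_false] = not preds[p_true]
        elif op == "add":
            regs[dst] = regs[a] + regs[b]
        elif op == "sub":
            regs[dst] = regs[a] - regs[b]
        else:
            raise ValueError(f"unknown op {op!r}")
    return regs


# if (A == B) X = Y + Z; else X = Y - Z;  expressed without a branch
PROGRAM = [
    ("p0", "pred_eq", ("p1", "p2"), "A", "B"),
    ("p1", "add", "X", "Y", "Z"),  # "then" part, predicated on p1
    ("p2", "sub", "X", "Y", "Z"),  # "else" part, predicated on p2
]
```

Calling `run(PROGRAM, {"A": 1, "B": 1, "Y": 5, "Z": 3})` takes the "then" part, while changing B to 2 takes the "else" part. In both cases the same three instructions flow through the machine in order.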
Speculative Execution
The amount of ILP available within basic blocks is extremely limited in non-numeric programs. As such,
processors must optimize and schedule instructions across basic block boundaries to achieve higher
performance. In addition, future processors must contend with both long latency load operations and
long latency cache misses. When load data is needed by subsequent dependent instructions, the processor
must stall until the cache access is complete.
In these situations, out-of-order machines dynamically reorder the instruction stream to execute
independent instructions. Additionally, out-of-order machines have the advantage of executing instructions
that follow correctly predicted branch instructions. However, this approach requires complex circuitry
at the cost of chip die space. Similar performance gains can be achieved using static compile-time
speculation methods without complex out-of-order logic. These methods rely on instruction set
architecture support in the form of special speculative instructions. Speculative execution, a technique
for executing an instruction before knowing whether its execution is required, is an important technique
for exploiting ILP in programs, and it is best known for hiding memory latency.
A compiler utilizes speculative code motion to achieve higher performance in several ways. First, in
regions of code where insufficient ILP exists to fully utilize the processor resources, useful instructions
may be executed. Second, instructions at the beginning of long dependence chains may be executed early
to reduce the computation’s critical path. Finally, long latency instructions may be initiated early to
overlap their execution with other useful operations. Figure 42.12 illustrates a simple example of code
before and after a speculative compile-time transformation is performed to execute a load instruction
above a conditional branch.
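The benefit of initiating a long latency load early can be estimated with a toy timing model. This is a minimal sketch, assuming a single-issue, in-order machine. The latencies, the instruction names, and the two schedules are hypothetical and do not reproduce Figure 42.12. The model also ignores faults and assumes the branch falls through to the load.

```python
# Toy timing model: single-issue, in-order issue, assumed latencies.

LATENCY = {"alu": 1, "branch": 1, "load": 10}  # cycles; illustrative values


def finish_time(schedule, latency):
    """Return the cycle at which the last result becomes available.

    schedule: list of (name, op, sources) in issue order.
    An instruction issues at least one cycle after its predecessor and
    stalls until all of its source values are ready.
    """
    ready = {}  # name -> cycle at which its result is available
    issue = 0
    for name, op, sources in schedule:
        operands_ready = max((ready[s] for s in sources), default=0)
        issue = max(issue + 1, operands_ready)
        ready[name] = issue + latency[op]
    return max(ready.values())


# Load below the branch: nothing overlaps with the load latency.
BEFORE = [
    ("work1", "alu", []),
    ("work2", "alu", []),
    ("cmp", "alu", ["work2"]),
    ("br", "branch", ["cmp"]),
    ("ld", "load", []),
    ("use", "alu", ["ld"]),
]

# Load hoisted above the branch: its latency overlaps with the other work.
AFTER = [
    ("ld", "load", []),
    ("work1", "alu", []),
    ("work2", "alu", []),
    ("cmp", "alu", ["work2"]),
    ("br", "branch", ["cmp"]),
    ("use", "alu", ["ld"]),
]
```

Under these assumed latencies, `finish_time(BEFORE, LATENCY)` is 16 cycles and `finish_time(AFTER, LATENCY)` is 12 cycles. The saving equals the four instructions whose execution now overlaps with the load.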
Figure 42.12(a) shows how the branch instruction and its implied control flow define a control
dependence that restricts the load operation from being scheduled earlier in the code. Cache miss latencies
would stall the processor unless out-of-order execution mechanisms were used. However, with speculation
support, the code in Fig. 42.12(b) can be used to hide the latency of the load operation.
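The semantics of hoisting a load above a branch can be sketched as follows. The load is made silent on faults, and a check is left at the load's original position. This is a hedged illustration, not the handbook's code: the poison marker, the function names, and the recovery policy (redoing the load as an ordinary load) are assumptions chosen to keep the model small.

```python
# Toy model of a speculative (nonfaulting) load plus a later check.
# The poison marker, the names, and the recovery policy are assumptions.

class MemoryFault(Exception):
    """Raised when a load touches an address outside the toy memory."""


POISON = object()  # sentinel standing in for a deferred-fault marker


def load(memory, addr):
    """Ordinary load: faults immediately on a bad address."""
    if addr not in memory:
        raise MemoryFault(f"bad address {addr}")
    return memory[addr]


def load_speculative(memory, addr):
    """Speculative load: stays silent on a fault and returns a marker."""
    return memory.get(addr, POISON)


def check(value, memory, addr):
    """Check placed where the load originally stood.

    If the speculative load was silent about a fault, redo it as an ordinary
    load so the fault is reported only on the path that needs the value.
    """
    if value is POISON:
        return load(memory, addr)
    return value


def original(memory, addr, cond):
    if cond:
        return None  # branch taken: the load is never executed
    return load(memory, addr) + 1


def speculated(memory, addr, cond):
    r = load_speculative(memory, addr)  # hoisted above the branch
    if cond:
        return None  # speculative result is simply discarded
    r = check(r, memory, addr)
    return r + 1
```

With a bad address and the branch taken, `speculated` returns without raising, just as `original` does. With a bad address and the branch not taken, the check surfaces the fault at the point where the original load would have raised it.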
The solution requires the load to be speculative or nonfaulting. A speculative load will not signal an
exception for faults such as address alignment or address space access errors. Essentially, the load is
considered silent for these occurrences. The additional check instruction in Fig. 42.12(b) enables these
©2002 CRC Press LLC

