Page 160 - A Practical Guide from Design Planning to Manufacturing
A single-issue out-of-order pipeline can at best achieve an IPC of 1,
completing one instruction every cycle. A superscalar processor can achieve
an IPC of greater than 1 by allowing multiple instructions to go through
the pipeline in parallel. Superscalar designs are described by their issue
width, the maximum number of instructions that can enter the pipeline
simultaneously. Larger transistor budgets have made microarchitec-
tures with issue widths of 2, 3, or more possible, but very wide issue
designs have difficulty reaching their maximum theoretical performance.
Larger issue widths and longer pipelines mean that more independent
instructions must be found by the scheduler to keep the pipeline full.
A processor that is capable of an IPC of 3 may achieve an IPC of less
than 1 because of numerous pipeline breaks. The added die area and
complexity needed to build ever more sophisticated schedulers may not be
justified by the performance improvements. This is the problem addressed
by architectural solutions that expose more parallelism.
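The gap between issue width and achieved IPC can be illustrated with a toy model. The sketch below is not a model of any real scheduler. It assumes an idealized out-of-order machine that can issue up to `width` ready instructions per cycle, a uniform `latency` for every instruction, and a hypothetical `deps` list giving the earlier instructions each one depends on.

```python
def simulate_ipc(deps, width, latency=1):
    """Return instructions per cycle for a toy out-of-order machine.

    deps[i] is the set of earlier instruction indices that instruction i
    depends on. Every instruction takes `latency` cycles (an assumption).
    """
    finish = {}                       # instruction index -> cycle its result is ready
    pending = list(range(len(deps)))
    cycle = 0
    while pending:
        cycle += 1
        # An instruction is ready once all of its producers finished earlier.
        ready = [i for i in pending
                 if all(d in finish and finish[d] < cycle for d in deps[i])]
        for i in ready[:width]:       # issue at most `width` per cycle
            finish[i] = cycle + latency - 1
            pending.remove(i)
    return len(deps) / max(finish.values())


independent = [set() for _ in range(6)]
chain = [set(), {0}, {1}, {2}, {3}, {4}]   # each depends on the previous one

print(simulate_ipc(independent, width=3))        # 3.0
print(simulate_ipc(chain, width=3, latency=2))   # 0.5
```

With six independent instructions the 3-wide machine reaches its peak IPC of 3. With a dependent chain of two-cycle instructions the same machine falls to an IPC of 0.5, because the extra issue slots find nothing to fill them.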
The EPIC architecture adds features to allow the compiler to perform
most of the work of the scheduler. Encoding instructions with informa-
tion about which can be executed in parallel dramatically simplifies the
task of the scheduler. The compiler is also able to search a much larger
window of instructions looking for independent instructions than would
be possible in hardware. Speculative load and conditional move instruc-
tions allow more reordering by reducing control dependencies.
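As a rough illustration of a compiler marking which instructions can execute in parallel, the sketch below greedily packs instructions, in program order, into groups with no internal dependencies. It is a simplified assumption and not the actual EPIC encoding. The `(dest, sources)` tuple format, the `max_width` limit, and the rule that a group ends on a read-after-write or write-after-write conflict are all hypothetical choices made for this example, and no reordering is attempted.

```python
def group_independent(instrs, max_width=3):
    """Split instrs into consecutive groups that could issue in parallel.

    instrs is a list of (dest, sources) tuples in program order. A group is
    closed when the next instruction reads or writes a register already
    written in the group, or when the group is full (assumed rules).
    """
    groups, current, written = [], [], set()
    for dest, srcs in instrs:
        conflict = dest in written or any(s in written for s in srcs)
        if current and (conflict or len(current) == max_width):
            groups.append(current)
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        groups.append(current)
    return groups


program = [
    ("r1", ("r2", "r3")),
    ("r4", ("r5", "r6")),
    ("r7", ("r1", "r4")),   # needs r1 and r4, so it starts a new group
    ("r8", ("r9",)),
]
for n, group in enumerate(group_independent(program)):
    print(n, [dest for dest, _ in group])
```

Because the grouping is computed ahead of time, the hardware in this sketch only has to respect the group boundaries rather than rediscover the dependencies itself.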
The HyperThreading architectural extensions simplify the sched-
uler’s job by allowing the program to divide itself into separate inde-
pendent threads. Except for special synchronizing instructions, the
scheduler assumes any instruction in one thread is independent of any
instruction in another thread. This allows the scheduler to fill pipeline
breaks created by dependencies in one thread with instructions from the
other thread.
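The effect of filling one thread's pipeline breaks with another thread's instructions can be sketched with a small single-issue model. This is an illustration under stated assumptions, not a description of the actual hardware: each thread is a hypothetical list of instruction latencies, every instruction depends on the previous one in its own thread, instructions in different threads are always independent, and the lowest-numbered ready thread is given priority.

```python
def cycles_to_issue(threads):
    """Return the cycle in which the last instruction issues.

    threads is a list of lists of latencies. At most one instruction issues
    per cycle, and a thread cannot issue again until its previous
    instruction's latency has elapsed (assumed worst-case dependency).
    """
    next_idx = [0] * len(threads)    # next instruction to issue in each thread
    ready_at = [1] * len(threads)    # earliest cycle each thread may issue
    remaining = sum(len(t) for t in threads)
    cycle = 0
    while remaining:
        cycle += 1
        for t, instrs in enumerate(threads):
            if next_idx[t] < len(instrs) and ready_at[t] <= cycle:
                ready_at[t] = cycle + instrs[next_idx[t]]
                next_idx[t] += 1
                remaining -= 1
                break                # single issue: one instruction per cycle
    return cycle


thread1 = [3, 1, 1]   # begins with a long instruction
thread2 = [1, 1]

separate = cycles_to_issue([thread1]) + cycles_to_issue([thread2])
together = cycles_to_issue([thread1, thread2])
print(separate, together)   # 7 5
```

Run one after the other, the two threads need 7 cycles to issue all their instructions. Run together, the second thread's instructions issue during the stall caused by the long instruction in the first thread, so the total drops to 5 cycles.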
In Fig. 5-7, the first instruction executed happens to be a very long one.
If the next instruction depends upon the result, it must wait. However,
[Figure 5-7 HyperThreading. Pipeline timing diagram over cycles 1 to 8 showing Instr 1 through Instr 5, drawn from Thread 1 and Thread 2, passing through the Fetch, Decode, Execute, and Write stages; Instr 2 waits after Decode before it executes.]