Page 161 - A Practical Guide from Design Planning to Manufacturing
while that instruction is waiting, the processor can execute instructions
from the second thread, which do not depend upon the first thread’s
results. The processor spends less time idle and more time performing
useful work.
Out-of-order issue and superscalar issue are purely microarchitectural
changes, which improve performance without any change in software.
HyperThreading is an example of how architecture and microarchitecture
working together can achieve even more performance at the cost of
requiring new software.
Designing for Performance
What we really want from a fast computer is to quickly run the programs
that we care about. Better performance simply means less program run
time.
\[
\text{Performance} \propto \frac{1}{\text{run time}} = \frac{\text{frequency} \times \text{instructions per cycle}}{\text{instruction count}}
\]
To improve performance, we must increase frequency or average instructions
per cycle (IPC), or reduce the number of instructions required. Choices
in architecture affect the IPC and instruction count. Choices in
microarchitecture seek to improve frequency or IPC. Performance is improved by
increasing either, but in general, changes that improve one make the
other worse. The easiest way to improve frequency is by increasing the
pipeline depth, dividing the execution of each instruction into smaller,
faster cycles.
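
As a rough illustration of this tradeoff, the relation between run time, frequency, IPC, and instruction count can be written as a short Python sketch. The function name and all of the numbers below are hypothetical, chosen only to make the arithmetic easy to follow; they do not describe any real processor.

```python
def run_time(instruction_count, frequency_hz, ipc):
    """Run time in seconds: instruction count / (frequency * IPC)."""
    return instruction_count / (frequency_hz * ipc)


# Hypothetical baseline: 1 billion instructions at 2 GHz with an IPC of 1.0.
base_time = run_time(1_000_000_000, 2.0e9, 1.0)

# Hypothetical deeper pipeline: frequency rises to 3 GHz,
# but the average IPC falls to 0.8.
deeper_time = run_time(1_000_000_000, 3.0e9, 0.8)

# Performance is proportional to 1 / run time, so the speedup
# is the ratio of the two run times.
speedup = base_time / deeper_time

print(f"baseline run time: {base_time:.3f} s")
print(f"deeper-pipeline run time: {deeper_time:.3f} s")
print(f"speedup: {speedup:.2f}x")
```

With these assumed values the frequency gain outweighs the IPC loss, so the deeper pipeline still comes out ahead. Had the IPC fallen further, the same change would have made performance worse.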
The examples earlier in this chapter had a pipeline depth of 4, dividing
instructions into 4 cycles. If instructions are balls rolling down a pipe,
this means the time between adding new balls to the pipe is equal to 1/4
the time a ball takes to roll the length of the pipe. Adding new balls
at this rate means there will always be 4 balls in the pipe at one time.
If a pipeline depth of 8 had been chosen instead, the time between adding
new balls would be 1/8 the time to roll the entire length and there would
be 8 balls in the pipe at one time. Doubling the pipeline depth has doubled
the rate at which balls are added by cutting in half the time between new balls.
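
The ball analogy can be checked with a few lines of Python. This is a minimal sketch of the ideal case only: it assumes the time to roll the full length of the pipe is fixed and that deepening the pipeline adds no extra delay. The function name and the 8-unit roll time are hypothetical.

```python
def pipe_stats(roll_time, depth):
    """Ideal pipe: a new ball enters every roll_time / depth time units."""
    interval = roll_time / depth       # time between adding new balls
    rate = 1.0 / interval              # balls added per unit time
    in_flight = roll_time / interval   # balls in the pipe at one time
    return interval, rate, in_flight


ROLL_TIME = 8.0  # hypothetical time for one ball to roll the whole pipe

interval4, rate4, in_flight4 = pipe_stats(ROLL_TIME, 4)
interval8, rate8, in_flight8 = pipe_stats(ROLL_TIME, 8)

print("depth 4:", interval4, rate4, in_flight4)
print("depth 8:", interval8, rate8, in_flight8)
```

Going from a depth of 4 to a depth of 8 halves the interval between balls, doubles the rate at which they are added, and doubles the number of balls in the pipe at one time, matching the description above.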
The time between adding new balls to the pipe is equivalent to a
processor’s cycle time, the inverse of processor frequency. Each instruction
has a certain amount of logic delay determined by the type of
instruction. Increasing the pipeline depth divides this computation
among more cycles, allowing the time for each cycle to be less. Ideally,
doubling the pipeline depth would allow twice the frequency. In reality,
some amount of delay overhead is added with each new pipestage. This