Page 162 - A Practical Guide from Design Planning to Manufacturing
overhead limits how much frequency improves as pipeline depth
increases.
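\[
\text{Frequency} = \frac{1}{T_{\text{cycle}}} = \frac{1}{\dfrac{T_{\text{logic}}}{\text{depth}} + T_{\text{overhead}}} = \frac{\text{depth}}{T_{\text{logic}} + \text{depth} \times T_{\text{overhead}}}
\]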
The above equation shows how frequency is improved by dividing the logic
delay of the instruction among a deeper pipeline. However, the rate of
improvement will slow down as the amount of logic in each cycle approaches
the overhead delay added with each new pipestage. Doubling frequency
requires increasing pipeline depth by more than a factor of 2. Even if frequency
is doubled in this fashion, performance will not double because IPC
drops as pipeline depth increases.
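
As a rough numerical illustration of the frequency equation above, the following Python sketch evaluates frequency at a given depth and inverts the formula to find the depth needed for a target frequency. The function names and the delay values (10 ns of total logic, 0.1 ns of overhead per pipestage) are assumptions chosen for illustration, not figures from the text.

```python
def frequency(depth, t_logic, t_overhead):
    """Frequency = depth / (T_logic + depth * T_overhead)."""
    return depth / (t_logic + depth * t_overhead)


def depth_for_frequency(target, t_logic, t_overhead):
    """Solve the frequency equation for depth.

    Rearranging gives depth = f * T_logic / (1 - f * T_overhead),
    which only has a solution while f < 1 / T_overhead.
    """
    if target * t_overhead >= 1:
        raise ValueError("target is at or above the 1 / T_overhead limit")
    return target * t_logic / (1 - target * t_overhead)


# Hypothetical delays in nanoseconds (assumed, not from the text).
T_LOGIC = 10.0
T_OVERHEAD = 0.1

base_depth = 10
base_freq = frequency(base_depth, T_LOGIC, T_OVERHEAD)
doubled_depth = depth_for_frequency(2 * base_freq, T_LOGIC, T_OVERHEAD)

print(f"frequency at depth {base_depth}: {base_freq:.3f} GHz")
print(f"depth needed to double it: {doubled_depth:.1f}")
```

With these assumed numbers, doubling the frequency takes a depth of about 22 rather than 20, and no depth can push the frequency past 1 / T_overhead.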
A longer pipeline allows more instructions in the pipe at one time.
More instructions in the pipe make data dependencies, control dependencies,
and resource conflicts all more likely. Inevitably, increasing the
pipeline depth increases the number of pipeline stalls per instruction
and reduces the average instructions per cycle. Together the effects on
frequency and IPC let us write an equation for how performance changes
with pipeline depth.
\[
\text{Performance} \propto \text{frequency} \times \text{IPC} = \frac{\text{depth}}{(T_{\text{logic}} + \text{depth} \times T_{\text{overhead}})(1 + \text{stalls per instruction})}
\]
For the ideal case of no delay overhead and no added stalls, doubling
pipeline depth will double performance. The real improvement depends
upon how well circuit design minimizes the delay overhead per pipestage
and how well microarchitectural improvements offset the reduction
in IPC.
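
The performance relation above can be explored numerically. The sketch below is a minimal model under stated assumptions: the linear stall model (`stalls_per_stage * depth`) and all numeric values are hypothetical, since the text does not say how stalls grow with depth.

```python
def performance(depth, t_logic, t_overhead, stalls_per_instruction):
    """Relative performance = frequency * IPC, with IPC = 1 / (1 + stalls)."""
    frequency = depth / (t_logic + depth * t_overhead)
    ipc = 1.0 / (1.0 + stalls_per_instruction)
    return frequency * ipc


def speedup_from_doubling(depth, t_logic, t_overhead, stalls_per_stage):
    """Performance ratio when depth doubles.

    ASSUMPTION: stalls per instruction grow linearly with depth.
    """
    before = performance(depth, t_logic, t_overhead, stalls_per_stage * depth)
    after = performance(2 * depth, t_logic, t_overhead, stalls_per_stage * 2 * depth)
    return after / before


# Ideal case: no overhead, no stalls.
ideal = speedup_from_doubling(10, t_logic=10.0, t_overhead=0.0, stalls_per_stage=0.0)
# Hypothetical non-ideal case: 0.1 ns overhead, 0.01 stalls per stage of depth.
real = speedup_from_doubling(10, t_logic=10.0, t_overhead=0.1, stalls_per_stage=0.01)

print(f"ideal speedup: {ideal:.2f}")
print(f"speedup with overhead and stalls: {real:.2f}")
```

In the ideal case the ratio is exactly 2. With the assumed overhead and stall growth it falls to roughly 1.68.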
Of course, there is not really a single pipeline depth because different
instructions require different amounts of computation. The processor cycle
time will be set by the slowest pipestage. To prevent instructions requiring
more computation from limiting processor frequency, they are designed
to execute over more pipestages. An add instruction might require a total
of 10 cycles whereas a divide might use a total of 40. Having instructions
of different latencies increases resource conflicts. A short instruction can
finish on the same cycle as a longer instruction started earlier and
end up competing for access to write back results. There would be
fewer conflicts if all instructions used the same pipeline depth, but