Page 162 - A Practical Guide from Design Planning to Manufacturing

Microarchitecture  135

        overhead limits the frequency gain obtainable by increasing pipeline
        depth.


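        $$\text{Frequency} = \frac{1}{T_{\text{cycle}}} = \frac{1}{\dfrac{T_{\text{logic}}}{\text{depth}} + T_{\text{overhead}}} = \frac{\text{depth}}{T_{\text{logic}} + \text{depth} \times T_{\text{overhead}}}$$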


          The above equation shows how frequency is improved by dividing the logic
        delay of the instruction among a deeper pipeline. However, the rate of
        improvement slows as the amount of logic in each cycle approaches the
        overhead delay added with each new pipestage. Because this overhead does
        not shrink as stages are added, doubling frequency requires increasing
        pipeline depth by more than a factor of 2. Even if frequency is doubled in
        this fashion, performance will not double because IPC drops as pipeline
        depth increases.
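          The diminishing return can be made concrete by evaluating the frequency
        equation for a few depths. The sketch below is a minimal illustration that
        assumes a total logic delay of 2000 ps and a per-stage overhead of 50 ps;
        both values are hypothetical, and the helper names `frequency_ghz` and
        `depth_to_double` are invented for this example. Solving the equation for
        the depth that doubles the frequency of a base depth d1 gives
        2 × d1 × T_logic / (T_logic - d1 × T_overhead), which is always more than
        2 × d1 and has no finite solution once the logic per stage no longer
        exceeds the overhead.

```python
# Illustrative delays only; these numbers are assumptions, not from the text.
T_LOGIC_PS = 2000.0     # total logic delay of an instruction (ps)
T_OVERHEAD_PS = 50.0    # delay overhead added by each pipestage (ps)


def frequency_ghz(depth, t_logic=T_LOGIC_PS, t_overhead=T_OVERHEAD_PS):
    """Frequency = 1 / (T_logic/depth + T_overhead) = depth / (T_logic + depth*T_overhead)."""
    cycle_ps = t_logic / depth + t_overhead
    return 1000.0 / cycle_ps  # a 1 ps cycle corresponds to 1000 GHz


def depth_to_double(base_depth, t_logic=T_LOGIC_PS, t_overhead=T_OVERHEAD_PS):
    """Depth giving twice the frequency of base_depth, or None if no depth can.

    Solving d2 / (T_logic + d2*T_overhead) = 2*d1 / (T_logic + d1*T_overhead)
    for d2 gives d2 = 2*d1*T_logic / (T_logic - d1*T_overhead). When the logic
    per stage (T_logic/d1) does not exceed the overhead, twice the frequency
    would reach or pass the 1/T_overhead ceiling, so no finite depth works.
    """
    if t_logic <= base_depth * t_overhead:
        return None
    return 2 * base_depth * t_logic / (t_logic - base_depth * t_overhead)


if __name__ == "__main__":
    for depth in (5, 10, 20, 40, 80):
        print(f"depth {depth:3d}: {frequency_ghz(depth):6.2f} GHz")
    print(f"depth needed to double frequency of a 10-stage pipe: {depth_to_double(10):.1f}")
```

        With these made-up delays, going from 10 to 20 stages raises frequency
        only from 4.0 GHz to about 6.67 GHz, and doubling the 10-stage frequency
        would take roughly 26.7 stages, more than twice the original depth.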
          A longer pipeline allows more instructions in the pipe at one time.
        More instructions in the pipe make data dependencies, control depend-
        encies, and resource conflicts all more likely. Inevitably, increasing the
        pipeline depth increases the number of pipeline stalls per instruction
        and reduces the average instructions per cycle. Together the effects on
        frequency and IPC let us write an equation for how performance changes
        with pipeline depth.

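          $$\text{Performance} \propto \text{frequency} \times \text{IPC} = \frac{\text{depth}}{\left(T_{\text{logic}} + \text{depth} \times T_{\text{overhead}}\right)\left(1 + \text{stalls per instruction}\right)}$$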

          For the ideal case of no delay overhead and no added stalls, doubling
        pipeline depth will double performance. The real improvement depends
        upon how well circuit design minimizes the delay overhead per pipestage
        and how much microarchitectural improvements offset the reduction
        in IPC.
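          To see how the two effects interact, the performance relation can be
        evaluated numerically. The sketch below is a toy model, not a result from
        the text: it reuses hypothetical delays of 2000 ps of logic and 50 ps of
        overhead per stage, and it assumes that stalls per instruction grow
        linearly with depth at a made-up rate of 0.02 stalls per stage. IPC is
        taken as 1 / (1 + stalls per instruction), matching the denominator of the
        performance relation. Under these assumptions performance peaks at a
        finite depth instead of growing without bound.

```python
import math

# Illustrative parameters; all three values are assumptions, not from the text.
T_LOGIC = 2000.0          # total logic delay of an instruction (ps)
T_OVERHEAD = 50.0         # delay overhead per pipestage (ps)
STALLS_PER_STAGE = 0.02   # hypothetical: stalls per instruction grow linearly with depth


def relative_performance(depth, t_logic=T_LOGIC, t_overhead=T_OVERHEAD,
                         stalls_per_stage=STALLS_PER_STAGE):
    """Performance ~ frequency * IPC = depth / ((T_logic + depth*T_overhead)(1 + stalls))."""
    frequency = depth / (t_logic + depth * t_overhead)   # cycles per ps
    stalls_per_instruction = stalls_per_stage * depth    # assumed linear stall model
    ipc = 1.0 / (1.0 + stalls_per_instruction)           # 1 instruction/cycle with no stalls
    return frequency * ipc                               # instructions per ps


# Search integer depths for the best-performing pipeline under this model.
best_depth = max(range(1, 201), key=relative_performance)

# With stalls = k * depth, setting the derivative to zero gives
# d* = sqrt(T_logic / (k * T_overhead)).
analytic_optimum = math.sqrt(T_LOGIC / (STALLS_PER_STAGE * T_OVERHEAD))

if __name__ == "__main__":
    print(f"best integer depth: {best_depth}, analytic optimum: {analytic_optimum:.1f}")
    for depth in (10, 20, 45, 90):
        print(f"depth {depth:3d}: {relative_performance(depth) * 1000:.3f} instructions/ns")
```

        Setting `t_overhead=0.0` and `stalls_per_stage=0.0` recovers the ideal
        case, where doubling depth exactly doubles performance. With the assumed
        values, the best depth is 45 stages (close to the analytic optimum of
        about 44.7), and doubling the depth from 45 to 90 actually lowers
        performance.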
          Of course, there is not really a single pipeline depth because different
        instructions require different amounts of computation. The processor cycle
        time will be set by the slowest pipestage. To prevent instructions requir-
        ing more computation from limiting processor frequency, they are designed
        to execute over more pipestages. An add instruction might require a total
        of 10 cycles whereas a divide might use a total of 40. Having instructions
        of different latencies increases resource conflicts. A short instruction can
        finish on the same cycle as a longer instruction started earlier and
        end up competing for access to write back results. There would be
        fewer conflicts if all instructions used the same pipeline depth, but