Page 156 - A Practical Guide from Design Planning to Manufacturing

Microarchitecture  129


         Cycle     1      2       3       4      5      6       7       8
         Instr 1  Fetch  Decode  Execute  Write
         Instr 2                                Fetch  Decode  Execute  Write

        Figure 5-1 Sequential processing.



        defines how software should run, and part of this is the expectation
        that programs will execute instructions one at a time. However, many
        instructions within programs could be executed in parallel or at least
        overlapped. Microarchitectures that take advantage of this can provide
        higher performance, but to do so while preserving software
        compatibility, the illusion of linear execution must be maintained.
        Pipelining provides higher performance by allowing the execution of
        different instructions to overlap.
          The earliest processors did not have sufficient transistors to support
        pipelining. They processed instructions serially, one at a time, exactly
        as the architecture defined. A very simple processor might break down
        each instruction into four steps.

        1. Fetch. The next instruction is retrieved from memory.
        2. Decode. The type of operation required is determined.
        3. Execute. The operation is performed.
        4. Write. The instruction results are stored.
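The four steps above can be sketched as a toy simulation of a sequential processor. The instruction names and the `run_sequential` helper are hypothetical, chosen only for illustration; they do not correspond to any real instruction set.

```python
# Steps every instruction passes through, in order, on our toy processor.
STEPS = ["Fetch", "Decode", "Execute", "Write"]

def run_sequential(program):
    """Return (cycle, step, instruction) tuples, one step per clock cycle.

    Each instruction completes all four steps before the next begins,
    exactly as a non-pipelined processor would operate.
    """
    trace = []
    cycle = 1
    for instr in program:
        for step in STEPS:
            trace.append((cycle, step, instr))
            cycle += 1
    return trace

trace = run_sequential(["ADD", "SUB"])
# Instr 1 occupies cycles 1-4 and Instr 2 cycles 5-8, matching Fig. 5-1.
```

Running two instructions through this model reproduces the eight-cycle schedule shown in Fig. 5-1.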

          All modern processors use clock signals to synchronize their operation
        both internally and when interacting with external components. The
        operation of a simple sequential processor allocating one clock cycle per
        instruction step would appear as shown in Fig. 5-1.
          A pipelined processor improves performance by noting that separate
        parts of the processor are used to carry out each of the instruction
        steps (see Fig. 5-2). With some added control logic, it is possible to
        begin the next instruction as soon as the previous instruction has
        completed the first step.
          We can imagine individual instructions as balls that must roll from
        one end of a pipe to the other to complete. Processing them sequentially
        is like adding a new ball only after the last comes out. By allowing
        multiple balls to be in transit inside the pipe at the same time, the
        rate at which balls are loaded into the pipe improves. This improvement
        happens even though the total time it takes one ball to roll the length
        of the pipe has not changed. In processor terms, the instruction
        bandwidth has improved even though instruction latency has remained
        the same.
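The bandwidth-versus-latency distinction can be made concrete with a small timing model. This is a sketch under ideal assumptions (no stalls, one step per cycle); the function names are our own, not from the text.

```python
# Four pipeline steps, as in the simple processor described above.
STEPS = ["Fetch", "Decode", "Execute", "Write"]

def completion_cycle_sequential(i, depth=len(STEPS)):
    """Cycle in which instruction i (1-based) completes on a
    sequential processor: each instruction takes a full pass."""
    return i * depth

def completion_cycle_pipelined(i, depth=len(STEPS)):
    """Cycle in which instruction i completes on an ideal pipeline:
    it enters at cycle i and exits depth - 1 cycles later."""
    return i + depth - 1

# Latency is unchanged: a single instruction still takes 4 cycles
# either way. Bandwidth differs: over n instructions the sequential
# processor needs 4n cycles, while the ideal pipeline needs n + 3,
# approaching one completed instruction per cycle.
n = 100
seq_cycles = completion_cycle_sequential(n)    # 400 cycles
pipe_cycles = completion_cycle_pipelined(n)    # 103 cycles
```

For 100 instructions the ideal four-step pipeline finishes in 103 cycles rather than 400, even though each individual instruction still spends 4 cycles in flight.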
          Our simple sequential processor completed an instruction only every
        4 cycles, but ideally a pipelined processor could complete an instruction