Page 160 - A Practical Guide from Design Planning to Manufacturing


          A single-issue out-of-order pipeline can at best achieve an IPC of 1,
        completing one instruction every cycle. A superscalar processor can achieve
        an IPC of greater than 1 by allowing multiple instructions to go through
        the pipeline in parallel. Superscalar designs are described by their issue
        width, the maximum number of instructions that can enter the pipeline
        simultaneously. Larger transistor budgets have made microarchitec-
        tures with issue widths of 2, 3, or more possible, but very wide issue
        designs have difficulty reaching their maximum theoretical performance.
          Larger issue widths and longer pipelines mean that more independent
        instructions must be found by the scheduler to keep the pipeline full.
        A processor that is capable of an IPC of 3 may achieve an IPC of less
        than 1 because of numerous pipeline breaks. The added die area and
        complexity needed to build ever more sophisticated schedulers may not be
        justified by the performance improvements. This is the problem addressed
        by architectural solutions that expose more parallelism.
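
          The gap between peak and achieved IPC can be illustrated with a toy
        model. The sketch below is not a description of any real scheduler: the
        function name simulate_ipc, the unlimited scheduling window, and the
        per-instruction latencies are all assumptions made for illustration.
        Each cycle it issues up to `width` instructions whose inputs are ready.

```python
def simulate_ipc(deps, latency, width):
    """Idealized scheduler model (hypothetical, for illustration only).

    deps[i]    -- indices of earlier instructions whose results i needs
    latency[i] -- cycles (>= 1) until the result of i is available
    width      -- issue width: max instructions issued per cycle
    Returns instructions completed per cycle.
    """
    n = len(deps)
    ready_at = [None] * n  # cycle at which each result becomes available
    issued = 0
    cycle = 0
    while issued < n:
        slots = width
        for i in range(n):
            if slots == 0:
                break
            if ready_at[i] is not None:
                continue  # already issued
            # An instruction may issue only when every input is ready.
            if all(ready_at[d] is not None and ready_at[d] <= cycle
                   for d in deps[i]):
                ready_at[i] = cycle + latency[i]
                issued += 1
                slots -= 1
        cycle += 1
    return n / max(ready_at)


# Six independent single-cycle instructions fill a 3-wide machine.
independent = simulate_ipc([[] for _ in range(6)], [1] * 6, width=3)

# Six 3-cycle instructions, each depending on the one before it.
chain = simulate_ipc([[]] + [[i] for i in range(5)], [3] * 6, width=3)
```

        Under these assumptions the independent stream reaches the full IPC of
        3, while the dependent chain falls well below 1 on the same 3-wide
        machine, because no amount of issue width helps when only one
        instruction is ever ready.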
          The EPIC architecture adds features to allow the compiler to perform
        most of the work of the scheduler. Encoding instructions with informa-
        tion about which can be executed in parallel dramatically simplifies the
        task of the scheduler. The compiler can also search a much larger window
        of instructions for independent ones than would be possible in hardware.
        Speculative load and conditional move instructions allow more reordering
        by reducing control dependencies.
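
          The idea of a compiler marking which instructions can run in parallel
        can be sketched as a simple grouping pass. This is a deliberately
        simplified illustration, not the actual EPIC encoding rules: the function
        name group_independent, the (dest, sources) instruction format, and the
        choice to treat only read-after-write and write-after-write conflicts as
        group boundaries are assumptions.

```python
def group_independent(instrs):
    """Split a program into groups of mutually independent instructions.

    instrs -- list of (dest_register, [source_registers]) in program order
    Returns a list of groups, each a list of instruction indices.
    Hypothetical simplification: a new group starts whenever an instruction
    reads or rewrites a register written earlier in the current group.
    """
    groups = []
    current = []
    written = set()  # registers written by the current group
    for idx, (dest, srcs) in enumerate(instrs):
        conflict = dest in written or any(s in written for s in srcs)
        if conflict and current:
            groups.append(current)  # close the group at the dependency
            current, written = [], set()
        current.append(idx)
        written.add(dest)
    if current:
        groups.append(current)
    return groups


program = [
    ("r1", ["r2", "r3"]),  # 0
    ("r4", ["r5"]),        # 1: independent of 0
    ("r6", ["r1", "r4"]),  # 2: needs results of 0 and 1
    ("r7", ["r8"]),        # 3: independent of 2
]
groups = group_independent(program)  # [[0, 1], [2, 3]]
```

        With the grouping done ahead of time, the hardware in this sketch would
        only need to issue one group at a time instead of comparing registers
        itself. A real compiler would also reorder instructions to build larger
        groups, which this pass does not attempt.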
          The HyperThreading architectural extensions simplify the sched-
        uler’s job by allowing the program to divide itself into separate inde-
        pendent threads. Except for special synchronizing instructions, the
        scheduler assumes any instruction in one thread is independent of any
        instruction in another thread. This allows the scheduler to fill pipeline
        breaks created by dependencies in one thread with instructions from the
        other thread.
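
          The effect of filling one thread's stalls with another thread's
        instructions can be modeled in a few lines. The sketch below is a toy
        single-issue model under stated assumptions, not Intel's implementation:
        the function name run, the (latency, depends_on_previous) instruction
        format, and the fixed priority given to the first thread are all
        hypothetical.

```python
def run(threads):
    """Toy single-issue pipeline shared by one or more threads.

    threads -- list of threads; each is a list of
               (latency, depends_on_previous) tuples in program order
    At most one instruction issues per cycle. Instructions in different
    threads are assumed independent. Returns the cycle in which the last
    result becomes available.
    """
    pc = [0] * len(threads)         # next instruction index per thread
    prev_done = [0] * len(threads)  # when each thread's last result is ready
    cycle = 0
    finish = 0
    while any(pc[t] < len(threads[t]) for t in range(len(threads))):
        for t, thread in enumerate(threads):
            if pc[t] == len(thread):
                continue  # this thread has finished
            latency, dependent = thread[pc[t]]
            if dependent and prev_done[t] > cycle:
                continue  # stalled on a dependency; try the next thread
            prev_done[t] = cycle + latency
            finish = max(finish, prev_done[t])
            pc[t] += 1
            break  # only one issue slot per cycle
        cycle += 1
    return finish


thread_a = [(4, False), (1, True)]                 # long op, then a dependent
thread_b = [(1, False), (1, False), (1, False)]    # three independent ops

one_after_another = run([thread_a]) + run([thread_b])  # 5 + 3 = 8 cycles
interleaved = run([thread_a, thread_b])                # 5 cycles
```

        In this model the second thread's three instructions issue during the
        cycles in which the first thread is waiting on its long instruction, so
        the interleaved run takes no longer than the first thread alone.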
          In Fig. 5-7, the first instruction executed happens to be a very long one.
        If the next instruction depends upon the result, it must wait. However,





           Cycle     1      2      3     4      5      6      7     8

           Instr 1  Fetch  Decode         Execute           Write
           Instr 2        Fetch  Decode        Wait        Execute  Write
           Instr 3               Fetch  Decode Execute  Write
                  Thread 1
           Instr 4                      Fetch  Decode Execute Write
                           Thread 2
           Instr 5                             Fetch  Decode Execute Write
                                     Thread 1
        Figure 5-7 HyperThreading.