Page 148 - A Practical Guide from Design Planning to Manufacturing

Computer Architecture  121

        could mark as being able to execute in parallel. Future processors may
        include additional functional units and immediately make use of them
        without recompiling code.
          The EPIC architecture adds other features specifically intended to
        allow the compiler to expose more instructions that can be executed in
        parallel.
          EPIC architecture features are as follows:

          128 64-bit integer registers
          128 82-bit floating-point registers
          64 1-bit predicate registers
          Speculative loads

          Most RISC architectures defined 32 integer and 32 floating-point reg-
        isters. Increased transistor budgets have made implementation of more
        registers possible, so the EPIC architecture defines a full 128 integer and
        128 floating-point registers. The implementation of this many registers
        alone takes several times the total transistor budget of Intel’s original
        8086 processor, but with modern fabrication processes this number of
        registers is no longer unreasonable. With few architectural registers, the
        compiler must create false dependencies between instructions when it
        reuses registers. More registers allow the compiler to create more par-
        allel instructions and perform more reordering.
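
The effect of register reuse described above can be illustrated with a small sketch. It is not taken from the text: the instruction encoding, the register names (`r1`, `r2`, `a`, `b`, and so on), and the helper `false_dependencies` are all hypothetical, and the check is deliberately simplistic (it compares every pair of instructions rather than tracking only the nearest conflicting one). It reports write-after-write (WAW) and write-after-read (WAR) conflicts, which are the false dependencies that arise purely from reusing a register name.

```python
# Each instruction is (destination register, tuple of source registers).
# This encoding and all register names are assumptions made for illustration.
def false_dependencies(instrs):
    """Return (earlier, later, kind) for each WAW or WAR register conflict."""
    found = []
    for later, (dest, _) in enumerate(instrs):
        for earlier in range(later):
            e_dest, e_srcs = instrs[earlier]
            if dest == e_dest:
                # Later instruction overwrites a register written earlier.
                found.append((earlier, later, "WAW"))
            elif dest in e_srcs:
                # Later instruction overwrites a register read earlier.
                found.append((earlier, later, "WAR"))
    return found


# Two independent computations forced to share one temporary register.
few_registers = [
    ("r1", ("a", "b")),   # r1 = a + b
    ("x",  ("r1",)),      # x  = r1 * 2
    ("r1", ("c", "d")),   # r1 = c + d   (reuses r1)
    ("y",  ("r1",)),      # y  = r1 * 2
]

# The same computations with a second temporary register available.
more_registers = [
    ("r1", ("a", "b")),
    ("x",  ("r1",)),
    ("r2", ("c", "d")),   # no reuse, so no false dependency
    ("y",  ("r2",)),
]

conflicts_few = false_dependencies(few_registers)
conflicts_more = false_dependencies(more_registers)
```

With the shared temporary, the third instruction cannot be moved ahead of the first two even though the two computations are logically independent. With a second register, the two pairs of instructions have no conflicts and could be scheduled in parallel.
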
          Branches present a special problem when trying to find parallelism.
        The compiler cannot determine which instructions can be executed in
        parallel because the control flow is determined at run time. Some RISC
        architectures support conditional move instructions, which only move
        a value from one register to another if a condition is met. This has the
        effect of a branch without altering the control flow of the program. The
        EPIC architecture builds upon this idea with predicated execution.
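
As a rough illustration of the conditional move idea, the following sketch models a register file as a dictionary and compares a branching absolute-value routine with a straight-line version that uses a conditional move. The helper `cmov`, the register names, and the choice of absolute value as the example are all assumptions made here, not details from the text. The `if` inside `cmov` stands in for what the hardware does within a single instruction; the modeled program itself contains no branch.

```python
def cmov(regs, dest, src, cond):
    """Conditional move: copy regs[src] to regs[dest] only if regs[cond] is nonzero.

    The `if` here models the hardware's internal selection, not a program branch.
    """
    if regs[cond]:
        regs[dest] = regs[src]


def abs_with_branch(x):
    """Absolute value using a branch whose direction depends on run-time data."""
    if x < 0:
        return -x
    return x


def abs_with_cmov(x):
    """Absolute value as straight-line code: every step always executes."""
    regs = {"x": x, "neg": 0, "is_neg": 0, "result": 0}
    regs["neg"] = -regs["x"]                # computed whether or not it is needed
    regs["is_neg"] = int(regs["x"] < 0)     # compare sets a condition register
    regs["result"] = regs["x"]              # default value
    cmov(regs, "result", "neg", "is_neg")   # overwrite only when the condition holds
    return regs["result"]
```

Because the sequence of modeled instructions is the same for every input, a compiler can schedule them without knowing the outcome of the comparison.
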
          Almost any EPIC instruction can specify one of 64 single-bit predicate
        registers. The instruction is allowed to write back its result only if
        its predicate bit is true. This means that any instruction can be a
        conditional instruction. Compare instructions specify two predicate registers to
        which to write. One is written with the result of the comparison and the
        other with the complement. Executing instructions on both sides of the
        branch with complementary predicate registers removes the branch instruction
        altogether. There is wasted effort because not all the instructions being
        executed are truly needed, but the entire premise of the EPIC archi-
        tecture is that its implementations will have many functional units
        available. For branches that are difficult to predict accurately, execut-
        ing both sides of the branch may give the best performance.
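
Predicated execution as described above might be modeled with the minimal sketch below. The class `PredicatedMachine`, its method names, and the register names are hypothetical and chosen only for illustration. The sketch assumes predicate register 0 always reads as true so that unconditional instructions can name it. A compare writes its result to one predicate register and the complement to another, and every other operation does its work unconditionally but writes back only if its predicate is true.

```python
class PredicatedMachine:
    """Toy model of predicated execution; all names here are hypothetical."""

    def __init__(self):
        self.regs = {}              # general registers, keyed by name
        self.pred = [False] * 64    # 64 single-bit predicate registers
        self.pred[0] = True         # assumption: p0 is always true

    def cmp_lt(self, p_true, p_false, a, b):
        """Compare regs[a] < regs[b]; write the result and its complement."""
        result = self.regs[a] < self.regs[b]
        self.pred[p_true] = result
        self.pred[p_false] = not result

    def op(self, qp, dest, fn, *srcs):
        """Compute fn over the sources; write back only if predicate qp is true."""
        value = fn(*(self.regs[s] for s in srcs))   # the work is always done
        if self.pred[qp]:
            self.regs[dest] = value


def abs_diff(a, b):
    """Branch-free |a - b|: both sides of the 'if' are issued."""
    m = PredicatedMachine()
    m.regs.update({"a": a, "b": b, "d": 0})
    m.cmp_lt(1, 2, "a", "b")                      # p1 = (a < b), p2 = not p1
    m.op(1, "d", lambda x, y: y - x, "a", "b")    # "then" side, guarded by p1
    m.op(2, "d", lambda x, y: x - y, "a", "b")    # "else" side, guarded by p2
    return m.regs["d"]
```

Both subtractions are computed on every call, which is the wasted effort the text mentions; only the one whose predicate is true updates `d`.
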
          Loads also pose a problem for compilers attempting reordering.
        Because loads have a very long latency if they miss in the cache, moving