Page 148 - A Practical Guide from Design Planning to Manufacturing
could mark as being able to execute in parallel. Future processors may
include additional functional units and immediately make use of them
without recompiling code.
The EPIC architecture adds other features specifically intended to
allow the compiler to expose more instructions that can be executed in
parallel.
EPIC architecture features are as follows:
128 64-bit integer registers
128 82-bit floating-point registers
64 1-bit predicate registers
Speculative loads
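As a rough tally of what the register counts above imply, the following Python sketch adds up the architectural storage bits. The figure of 6 transistors per bit is an assumption for illustration only (a simple single-ported storage cell; real multiported register files need more), and 29,000 is the commonly quoted transistor count for the original 8086.

```python
# Architectural register files listed above: (count, width in bits).
REGISTER_FILES = {
    "integer": (128, 64),
    "floating_point": (128, 82),
    "predicate": (64, 1),
}

# Assumption for illustration: 6 transistors per storage bit.
TRANSISTORS_PER_BIT = 6
# Commonly quoted transistor count for the original 8086.
I8086_TRANSISTORS = 29_000


def total_bits(files):
    """Sum count * width over all register files."""
    return sum(count * width for count, width in files.values())


bits = total_bits(REGISTER_FILES)
storage_transistors = bits * TRANSISTORS_PER_BIT
ratio = storage_transistors / I8086_TRANSISTORS

print(f"{bits} bits of architectural register state")
print(f"~{storage_transistors} transistors, about {ratio:.1f}x an 8086")
```

Even under this optimistic per-bit assumption, the storage cells alone come to several times the 8086 figure, before counting ports, decoders, or bypass logic.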
Most RISC architectures defined 32 integer and 32 floating-point
registers. Increased transistor budgets have made implementation of more
registers possible, so the EPIC architecture defines a full 128 integer and
128 floating-point registers. The implementation of this many registers
alone takes several times the total transistor budget of Intel’s original
8086 processor, but with modern fabrication processes this number of
registers is no longer unreasonable. With few architectural registers, the
compiler is forced to reuse registers, which creates false dependencies
between instructions that are otherwise independent. More registers allow
the compiler to expose more parallel instructions and perform more reordering.
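The effect of register reuse can be sketched with a small dependency checker. This is an illustrative model, not a real compiler pass: the `Instr` type, the register names, and the two sample programs are hypothetical. It labels read-after-write (RAW) pairs as true dependencies and write-after-read (WAR) and write-after-write (WAW) pairs as false ones.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str      # register written
    srcs: tuple    # registers read


def dependencies(program):
    """Return (earlier, later, kind) tuples for a straight-line program."""
    deps = []
    last_writer = {}   # register -> index of its most recent write
    readers = {}       # register -> indices that read it since that write
    for j, ins in enumerate(program):
        srcs = sorted(set(ins.srcs))
        for src in srcs:
            if src in last_writer:
                deps.append((last_writer[src], j, "RAW"))   # true dependency
        for i in readers.get(ins.dest, []):
            deps.append((i, j, "WAR"))                      # false: reuse
        if ins.dest in last_writer:
            deps.append((last_writer[ins.dest], j, "WAW"))  # false: reuse
        for src in srcs:
            readers.setdefault(src, []).append(j)
        last_writer[ins.dest] = j
        readers[ins.dest] = []
    return deps


def false_deps(program):
    """Keep only the dependencies caused by register reuse."""
    return [d for d in dependencies(program) if d[2] != "RAW"]


# Two independent computations squeezed into the same register r1.
few_regs = [
    Instr("r1", ("r2", "r3")),
    Instr("r4", ("r1", "r2")),
    Instr("r1", ("r5", "r6")),   # reuses r1
    Instr("r7", ("r1", "r5")),
]

# Same computations, with a fresh register r8 for the second one.
many_regs = [
    Instr("r1", ("r2", "r3")),
    Instr("r4", ("r1", "r2")),
    Instr("r8", ("r5", "r6")),
    Instr("r7", ("r8", "r5")),
]

print(false_deps(few_regs))
print(false_deps(many_regs))
```

In the first program the third instruction cannot move ahead of the first two because it overwrites `r1`. In the second, the two pairs share no dependency and could be marked as parallel.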
Branches present a special problem when trying to find parallelism.
The compiler cannot determine which instructions can be executed in
parallel because the control flow is determined at run time. Some RISC
architectures support conditional move instructions, which only move
a value from one register to another if a condition is met. This has the
effect of a branch without altering the control flow of the program. The
EPIC architecture builds upon this idea with predicated execution.
Almost any EPIC instruction can specify one of 64 single-bit predicate
registers. The instruction is allowed to write back its result only if
its predicate bit is true. This means that almost any instruction can be a
conditional instruction. Compare instructions specify two predicate registers to
which to write. One is written with the result of the comparison and the
other with the complement. Executing the instructions on both sides of a
branch under complementary predicate registers eliminates that branch
altogether. Some effort is wasted because not all the instructions being
executed are truly needed, but the entire premise of the EPIC architecture
is that its implementations will have many functional units
available. For branches that are difficult to predict accurately, executing
both sides of the branch may give the best performance.
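Predicated execution as described above can be modeled in a few lines of Python. This is a minimal sketch, not EPIC assembly: the helper names `compare_lt` and `add_imm`, the register names, and the choice of predicate registers 1 and 2 are all hypothetical. It converts a simple if/else into straight-line code in which both sides execute but only one writes back.

```python
def run_branching(a, b):
    """Reference version with an ordinary branch."""
    if a < b:
        return a + 1
    return b - 1


def run_predicated(a, b):
    """Same computation with the branch replaced by predicates."""
    regs = {"a": a, "b": b, "x": 0}
    pred = [False] * 64   # 64 single-bit predicate registers

    def compare_lt(p_true, p_false, lhs, rhs):
        # A compare writes its result to one predicate register
        # and the complement to another.
        result = regs[lhs] < regs[rhs]
        pred[p_true] = result
        pred[p_false] = not result

    def add_imm(qp, dest, src, imm):
        value = regs[src] + imm   # the work is always performed
        if pred[qp]:              # write back only if the predicate is true
            regs[dest] = value

    compare_lt(1, 2, "a", "b")
    add_imm(1, "x", "a", 1)       # "then" side, guarded by predicate 1
    add_imm(2, "x", "b", -1)      # "else" side, guarded by predicate 2
    return regs["x"]


for a, b in [(3, 5), (5, 3), (4, 4)]:
    assert run_predicated(a, b) == run_branching(a, b)
```

Both `add_imm` calls do their arithmetic on every run, which is the wasted effort mentioned above. In exchange, the instruction stream contains no branch to predict.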
Loads also pose a problem for compilers attempting reordering.
Because loads have a very long latency if they miss in the cache, moving