Page 181 - A Practical Guide from Design Planning to Manufacturing

154   Chapter Five

        order. After these two instructions go through renaming, architectural
        registers AX, BX, and CX have been mapped to physical registers R1,
        R2, and R3. There is a WAR (write-after-read) dependency between the
        multiply and the move: the move writes to the same architectural
        register that the multiply must first read. This dependency is removed
        by mapping the architectural register BX to a different physical
        register for the move instruction. After a branch, a second move also
        writes to register BX. That dependency is likewise removed by mapping
        BX to yet another physical register. After renaming, only the single
        true dependency remains.
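
          The renaming step described above can be sketched in a few lines of
        Python. This is only an illustrative model, not how any particular
        processor implements it: the function name rename, the tuple format
        (op, dest, sources), and the three-instruction program are all
        hypothetical and are not taken from the example in the text. The
        sketch assumes an unlimited supply of physical registers named R1,
        R2, and so on.

```python
from itertools import count


def rename(instructions):
    """Map architectural registers to fresh physical registers.

    Each instruction is a tuple (op, dest, sources). Sources are looked up
    in the current mapping; every destination gets a brand-new physical
    register, which is what removes WAR and WAW (false) dependencies.
    """
    fresh = count(1)   # assumption: physical registers never run out
    mapping = {}       # architectural name -> current physical name
    renamed = []
    for op, dest, sources in instructions:
        phys_sources = []
        for src in sources:
            if src not in mapping:           # first use of a live-in value
                mapping[src] = f"R{next(fresh)}"
            phys_sources.append(mapping[src])
        mapping[dest] = f"R{next(fresh)}"    # new register for every write
        renamed.append((op, mapping[dest], tuple(phys_sources)))
    return renamed


# Hypothetical program: the multiply reads BX, then two moves overwrite BX.
program = [
    ("MUL", "CX", ("AX", "BX")),
    ("MOV", "BX", ()),
    ("MOV", "BX", ()),
]
renamed = rename(program)
for instruction in renamed:
    print(instruction)
```

          In this sketch the multiply reads R1 and R2 and writes R3, while
        the two moves write R4 and R5. Because neither move touches a
        register the multiply reads, all three could execute in any order.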
          This mapping from architectural to physical registers is very similar
        to the mapping of virtual to physical memory addresses performed by
        virtual memory. Virtual memory allows multiple programs to run in
        parallel without interfering with each other by mapping their virtual
        memory addresses to separate physical addresses. Register renaming
        allows multiple instructions to run in parallel without interfering with
        each other by mapping their architectural registers to separate physi-
        cal registers. In both cases, more parallelism is allowed while the results
        of each program are unaffected.
          Architectures that define a large number of architectural registers
        have less need of hardware register renaming since the compiler
        can avoid most false dependencies. However, because control flow
        of programs varies at run time, false dependencies still appear and
        even processors with these architectures can benefit from register
        renaming.



        Microinstructions and microcode
        We can imagine a processor pipeline being a physical pipe with each
        instruction like a ball rolling down the pipe. If some balls roll more
        slowly down the pipe, other balls will stack up behind them. The pipeline
        works best when all the balls travel the pipeline in the same length of
        time. As a result, pipelined processors try to break down complicated
        instructions into simpler steps, like replacing a single slow ball with sev-
        eral fast ones.
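
          The effect of one slow ball can be illustrated with a toy timing
        model. The sketch below is an assumption-laden simplification, not a
        description of a real pipeline: the function name drain_time, the
        four-stage depth, and the cycle counts are all hypothetical. Each
        instruction spends the same number of cycles in every stage, and it
        cannot enter a stage until the instruction ahead has finished there.

```python
def drain_time(stage_cycles, num_stages=4):
    """Cycles until the last instruction leaves a simple in-order pipeline.

    stage_cycles[i] is the number of cycles instruction i spends in each
    stage. An instruction starts a stage only after it has finished its own
    previous stage and the instruction ahead has finished with that stage.
    """
    prev_done = [0] * num_stages   # per-stage finish times of the one ahead
    for cycles in stage_cycles:
        done = []
        ready = 0                  # when this instruction left its last stage
        for stage in range(num_stages):
            start = max(ready, prev_done[stage])
            ready = start + cycles
            done.append(ready)
        prev_done = done
    return prev_done[-1]


# One slow instruction (3 cycles per stage) between two fast ones...
with_slow_ball = drain_time([1, 3, 1])
# ...versus the same work split into three fast steps (five in total).
with_fast_balls = drain_time([1, 1, 1, 1, 1])
print(with_slow_ball, with_fast_balls)  # prints: 14 8
```

          Under these assumptions, the stream containing the slow
        instruction takes 14 cycles to drain, while five uniform one-cycle
        instructions take only 8, even though more instructions pass through
        the pipe.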
          RISC architectures achieve this by allowing only simple instructions
        of relatively uniform complexity. Processors that support CISC
        architectures achieve the same effect by using hardware to translate
        their complex machine language instructions into smaller steps. Before
        translation the machine language instructions are called
        macroinstructions, and the smaller steps after translation are called
        microinstructions. These microinstructions typically bear a striking
        resemblance to RISC instructions. The following example shows a simple
        macroinstruction being translated into three microinstructions.