Page 187 - A Practical Guide from Design Planning to Manufacturing

160   Chapter Five

        processor can assume that each instruction will execute in the shortest pos-
        sible number of cycles. After execution, the instruction is checked to see
        whether this guess was correct. Scheduling on the assumption that data will
        be available is sometimes called data speculation. It is very similar to
        branch prediction. Both methods make predictions during scheduling and
        reduce delay when correct, at the cost of increased delay when incorrect.
        When branch prediction is wrong, instructions are incorrectly executed and
        must be discarded. When data speculation is wrong, the correct instructions
        are executed but with the wrong data. These instructions cannot be dis-
        carded; instead, they must be executed again, this time with the correct
        data. Sending instructions back into the pipe is called replay. If the
        events that cause replay are rare enough, overall performance is improved.
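        The trade-off can be made concrete with a toy model. The sketch below is
        purely illustrative (the function names, latencies, and uop lists are
        invented assumptions, not the Pentium 4's actual mechanism): the scheduler
        assumes the fast cache-hit latency, and on a miss the dependent uops are
        not discarded but executed a second time.

```python
# Hypothetical model of data speculation with replay.
# ASSUMED_LOAD_LATENCY and miss_latency are illustrative numbers only.
ASSUMED_LOAD_LATENCY = 2   # scheduler assumes an L1 cache hit

def run_load(dependent_uops, cache_hit, miss_latency=10):
    """Schedule dependents assuming the fast case; replay them on a miss."""
    executed = list(dependent_uops)            # speculatively executed
    if cache_hit:
        cycles = ASSUMED_LOAD_LATENCY + len(dependent_uops)
    else:
        # Wrong guess: the dependents ran with stale data. They are sent
        # back into the pipe (replay) and executed again, correctly.
        cycles = miss_latency + len(dependent_uops)
        executed += list(dependent_uops)       # first pass + replay pass
    return cycles, executed

fast, _ = run_load(["add", "sub"], cache_hit=True)
slow, replayed = run_load(["add", "sub"], cache_hit=False)
assert fast < slow              # speculation pays off when the guess is right
assert len(replayed) == 4       # on a miss, each dependent uop runs twice
```

        If hits are far more common than misses, the average latency under this
        model is close to the fast case, which is the point of speculating.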

        Life of an Instruction
        The basic steps any microprocessor instruction goes through have changed
        little since the first pipelined processors. An instruction must be fetched.
        It must be decoded to determine what type of instruction it is. The instruc-
        tion is executed and the results stored. What has changed is that steps
        have been added to try to improve performance, such as register renam-
        ing and out-of-order scheduling. The total number of cycles in the pipeline
        has increased to allow higher clock frequency. As an example of how a
        processor microarchitecture works, this section describes in detail what
        actions occur during each step of the original Pentium 4 pipeline.
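        The four classic steps (fetch, decode, execute, store) can be sketched as
        a toy interpreter. Everything here is a made-up mini-ISA for illustration,
        not any real instruction encoding or the Pentium 4's pipeline logic.

```python
# Minimal fetch/decode/execute/store loop over a hypothetical mini-ISA.
REGISTERS = {"r0": 0, "r1": 0}
PROGRAM = [("addi", "r0", 5), ("addi", "r1", 7)]  # illustrative instructions

def fetch(pc):
    """Fetch: read the raw instruction at the program counter."""
    return PROGRAM[pc]

def decode(raw):
    """Decode: determine the instruction type and its operands."""
    op, dest, imm = raw
    return {"op": op, "dest": dest, "imm": imm}

def execute(inst):
    """Execute: compute the result (only 'addi' exists in this toy ISA)."""
    if inst["op"] == "addi":
        return REGISTERS[inst["dest"]] + inst["imm"]
    raise ValueError("unknown op")

def store(inst, result):
    """Store: write the result back to the register file."""
    REGISTERS[inst["dest"]] = result

for pc in range(len(PROGRAM)):
    inst = decode(fetch(pc))
    store(inst, execute(inst))
```

        A real pipeline overlaps these steps across many instructions at once;
        this loop only shows the order each single instruction passes through.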
          The Pentium 4 actually has two separate pipelines, the front-end
        pipeline, which translates macroinstructions into uops, and the execu-
        tion pipeline, which executes uops.
          The front-end pipeline has the responsibility for fetching macroin-
        structions from memory and decoding them to keep a trace cache filled
        with uops. The execution pipeline works only with uops and is respon-
        sible for scheduling, executing, and retiring these instructions. Table 5-1

        TABLE 5-1  Pentium 4 Pipeline
        Front-end pipeline        Microinstruction pipeline (20)
        Instruction prefetch      Microbranch prediction (2)
        L2 cache read             Microinstruction fetch (2)
        Instruction decode        Drive (1)
        Branch prediction         Allocation (1)
        Trace cache write         Register rename (2)
                                  Load instruction queue (1)
                                  Schedule (3)
                                  Dispatch (2)
                                  Register file read (2)
                                  Execute (1)
                                  Calculate flags (1)
                                  Retirement (1)
                                  Drive (1)
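        As a quick consistency check, the per-stage cycle counts in Table 5-1 can
        be summed; the sketch below simply transcribes the table's execution-
        pipeline column (the variable names are my own).

```python
# Execution-pipeline stage depths transcribed from Table 5-1.
execution_pipeline = [
    ("Microbranch prediction", 2),
    ("Microinstruction fetch", 2),
    ("Drive", 1),
    ("Allocation", 1),
    ("Register rename", 2),
    ("Load instruction queue", 1),
    ("Schedule", 3),
    ("Dispatch", 2),
    ("Register file read", 2),
    ("Execute", 1),
    ("Calculate flags", 1),
    ("Retirement", 1),
    ("Drive", 1),
]
total_cycles = sum(cycles for _, cycles in execution_pipeline)
assert total_cycles == 20   # matches the "(20)" in the table heading
```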