Page 1127 - The Mechatronics Handbook

The time required to finish N instructions in a pipeline with K stages can be calculated. Assume a
                                 cycle time of T for the overall instruction completion, and an equal T/K processing delay at each stage.
                                 With a pipeline scheme, the first instruction completes the pipeline after T, and there will be a new
                                 instruction out of the pipeline every stage delay T/K. Therefore, the delays of executing N instructions
                                 without and with pipelining, respectively, are


                                 $$T \cdot N \tag{42.1}$$

                                 $$T + \frac{T}{K}\,(N - 1) \tag{42.2}$$
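
The two delay expressions can be checked numerically. The following is a minimal sketch; the function names and the sample values (T = 10 ns, K = 5 stages, N = 100 instructions) are illustrative assumptions, not figures from the text.

```python
def nonpipelined_time(n, t):
    """Eq. (42.1): each of the N instructions takes the full time T."""
    return t * n


def pipelined_time(n, t, k):
    """Eq. (42.2): the first instruction takes T, then one completes every T/K."""
    return t + (t / k) * (n - 1)


# Hypothetical values: T = 10 ns, K = 5 stages, N = 100 instructions.
T_NS, STAGES, COUNT = 10.0, 5, 100
print(nonpipelined_time(COUNT, T_NS))        # 10 * 100 = 1000.0
print(pipelined_time(COUNT, T_NS, STAGES))   # 10 + 2 * 99 = 208.0
```

With a single instruction (N = 1) both expressions reduce to T, which is the start-up cost the pipeline cannot hide.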
                                   There is an initial delay in the pipeline execution model before each stage has operations to execute.
                                 The initial delay is usually called the pipeline start-up delay (P), and is equal to the total execution time
                                 of one instruction. Measured in stage delays, P therefore equals the number of pipeline stages K. The
                                 speed-up of a pipelined machine relative to a nonpipelined machine is calculated as

                                                                      P * N
                                                                    ----------------------------           (42.3)
                                                                          –
                                                                    P +  ( N 1)
                                   When N is much larger than the number of pipestages P, the ideal speed-up approaches P. This is an
                                 intuitive result since there are P parts of the machine working in parallel, allowing the execution to go
                                 about P times faster in ideal conditions.
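
How the speed-up approaches P as N grows can be seen by evaluating Eq. (42.3) for increasing N. This is a small sketch; the function name and the choice of P = 5 and the listed values of N are assumptions made for illustration.

```python
def speedup(n, p):
    """Eq. (42.3): speed-up for N instructions on a pipeline with P stages."""
    return (p * n) / (p + (n - 1))


# Hypothetical sweep with P = 5: the ratio climbs from 1 toward 5.
for n in (1, 10, 100, 10_000):
    print(f"N = {n:>6}: speed-up = {speedup(n, 5):.3f}")
```

For N = 1 the speed-up is exactly 1, since a single instruction still needs the full start-up delay; only long instruction sequences amortize that delay.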
                                   The overlap of sequential instructions in a processor pipeline is shown in Fig. 42.4(b). The instruction
                                 pipeline becomes full after the pipeline delay of P = 5 cycles. Although the pipeline configuration executes
                                 operations in each stage of the processor, two important mechanisms are constructed to ensure correct
                                 functional operation between dependent instructions in the presence of data hazards. Data hazards occur
                                 when instructions in the pipeline generate results that are necessary for later instructions that are already
                                 started in the pipeline. In the pipeline configuration of Fig. 42.4(a), register operands are initially retrieved
                                 during the decode stage. However, the execute and memory stages can define register operands and contain
                                 the correct current value but are not able to update the register file until the later write-back
                                 stage. Forwarding (or bypassing) is the action of retrieving the correct operand value for an executing
                                 instruction between the initial register file access and any pending instruction’s register file updates.
                                 Interlocking is the action of stalling an operation in the pipeline when conditions cause necessary register
                                 operand results to be delayed. It is necessary to stall early stages of the machine so that the correct results
                                 are used, and the machine does not proceed with incorrect values for source operands. The primary
                                 causes of delay in pipeline execution are instruction fetch delay and memory latency.
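
The interaction of forwarding and interlocking can be illustrated by counting stall cycles for a short dependent sequence. The sketch below is a simplified timing model, not the handbook's design. It assumes a five-stage in-order pipeline (fetch, decode, execute, memory, write-back), that without forwarding a consumer may decode only in the cycle after the producer's write-back, and that with forwarding an ALU result is usable one cycle after the producer's execute stage and a load result two cycles after. The `Instr` type, register names, and sample program are hypothetical.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    dest: Optional[str]      # register written, or None
    srcs: tuple              # registers read
    is_load: bool = False    # loads produce their value in the memory stage


def count_stalls(program, forwarding):
    """Return the interlock stall cycles needed to run `program` in order."""
    last_writer = {}         # register -> (execute cycle of producer, is_load)
    prev_ex = 0
    stalls = 0
    for ins in program:
        earliest = prev_ex + 1           # no hazard: one cycle behind predecessor
        ex = earliest
        for reg in ins.srcs:
            if reg in last_writer:
                prod_ex, prod_is_load = last_writer[reg]
                if forwarding:
                    # Bypass from the end of execute (ALU) or memory (load).
                    ready = prod_ex + (2 if prod_is_load else 1)
                else:
                    # Assumed: decode must follow the producer's write-back.
                    ready = prod_ex + 4
                ex = max(ex, ready)
        stalls += ex - earliest          # cycles the interlock held this instruction
        if ins.dest is not None:
            last_writer[ins.dest] = (ex, ins.is_load)
        prev_ex = ex
    return stalls


# Hypothetical sequence: load r1; add r2 = r1 + r3; sub r4 = r2 - r1.
PROGRAM = [
    Instr("r1", ("r0",), is_load=True),
    Instr("r2", ("r1", "r3")),
    Instr("r4", ("r2", "r1")),
]
print(count_stalls(PROGRAM, forwarding=True))    # 1 (load-use stall only)
print(count_stalls(PROGRAM, forwarding=False))   # 6 (3 stalls per dependence)
```

Under these assumptions, forwarding removes every stall except the one after the load, which the interlock must still enforce because the loaded value does not exist until the memory stage finishes.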
                                 Branch Prediction
                                 Branch instructions pose serious problems for pipelined processors by causing hardware to fetch and
                                 execute instructions, possibly from the wrong path, until the branch instructions are completed. Executing incorrect instructions can
                                 result in severe performance degradation through the introduction of wasted cycles into the instruction
                                 stream.
                                   There are several methods for dealing with pipeline stalls caused by branch instructions. The simplest
                                 scheme handles branches by treating every branch as either taken or not taken. This treatment
                                 can be set for every branch or determined by the branch opcode. The designation allows the pipeline
                                 to continue to fetch instructions as if the branch were a normal instruction. However, the fetched instructions
                                 may need to be discarded and the instruction fetch restarted when the assumed branch outcome is incorrect.
                                 Delayed branching is another scheme which treats the set of sequential instructions following a branch
                                 as delay slots. The delay-slot instructions are executed whether or not the branch instruction is taken.
                                 Delayed branches are limited because compilers and program characteristics often cannot supply enough
                                 instructions that execute independently of the branch direction. Improvements have
                                 been introduced to provide nullifying branches, which include a predicted direction for the branch. When
                                 the prediction is incorrect, the delay-slot instructions are nullified.
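
The cost of a fixed taken/not-taken assumption and the behavior of delay slots can be sketched in a few lines. This is an illustrative model only: the function names, the two-cycle misprediction penalty, and the branch trace are assumptions, not values from the text.

```python
def wasted_cycles(outcomes, predict_taken, penalty):
    """Cycles lost when every branch is treated as taken (or as not taken).

    `outcomes` lists the actual direction of each executed branch;
    `penalty` is the assumed number of discarded fetch cycles per wrong guess.
    """
    mispredicts = sum(1 for taken in outcomes if taken != predict_taken)
    return mispredicts * penalty


def delay_slots_commit(nullifying, predicted_taken, actually_taken):
    """Return True if the delay-slot instructions take effect."""
    if not nullifying:
        return True                            # plain delayed branch: always executed
    return predicted_taken == actually_taken   # nullified when the prediction is wrong


# Hypothetical trace: a loop branch taken nine times, then not taken once.
TRACE = [True] * 9 + [False]
print(wasted_cycles(TRACE, predict_taken=True, penalty=2))    # 1 wrong guess  -> 2
print(wasted_cycles(TRACE, predict_taken=False, penalty=2))   # 9 wrong guesses -> 18
print(delay_slots_commit(nullifying=True, predicted_taken=True, actually_taken=False))  # False
```

The trace shows why the fixed direction matters: for a loop-closing branch, assuming "taken" wastes far fewer cycles than assuming "not taken", which is one reason the designation may be tied to the branch opcode.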

                                 ©2002 CRC Press LLC