Page 187 - A Practical Guide from Design Planning to Manufacturing

160   Chapter Five

        processor can assume that each instruction will execute in the shortest pos-
        sible number of cycles. After execution, the instruction is checked to see
        whether this guess was correct. Scheduling on the assumption that data will
        be available is sometimes called data speculation. It is very similar to
        branch prediction. Both methods make predictions during scheduling and
        reduce delay when correct, at the cost of increased delay when incorrect.
        When branch prediction is wrong, instructions are incorrectly executed and
        must be discarded. When data speculation is wrong, the correct instructions
        are executed but with the wrong data. These instructions cannot be dis-
        carded; instead, they must be executed again, this time with the correct
        data. Sending instructions back into the pipe is called replay. If the
        events that cause replay are rare enough, overall performance is improved.
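        The trade-off can be made concrete with a toy model. The sketch below is
        purely illustrative (the function names, latencies, and uop lists are
        invented assumptions, not the Pentium 4's actual mechanism): the scheduler
        assumes the fast cache-hit latency, and on a miss the dependent uops are
        not discarded but executed a second time.

```python
# Hypothetical model of data speculation with replay.
# ASSUMED_LOAD_LATENCY and miss_latency are illustrative numbers only.
ASSUMED_LOAD_LATENCY = 2   # scheduler assumes an L1 cache hit

def run_load(dependent_uops, cache_hit, miss_latency=10):
    """Schedule dependents assuming the fast case; replay them on a miss."""
    executed = list(dependent_uops)            # speculatively executed
    if cache_hit:
        cycles = ASSUMED_LOAD_LATENCY + len(dependent_uops)
    else:
        # Wrong guess: the dependents ran with stale data. They are sent
        # back into the pipe (replay) and executed again, correctly.
        cycles = miss_latency + len(dependent_uops)
        executed += list(dependent_uops)       # first pass + replay pass
    return cycles, executed

fast, _ = run_load(["add", "sub"], cache_hit=True)
slow, replayed = run_load(["add", "sub"], cache_hit=False)
assert fast < slow              # speculation pays off when the guess is right
assert len(replayed) == 4       # on a miss, each dependent uop runs twice
```

        If hits are far more common than misses, the average latency under this
        model is close to the fast case, which is the point of speculating.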

        Life of an Instruction
        The basic steps any microprocessor instruction goes through have changed
        little since the first pipelined processors. An instruction must be fetched.
        It must be decoded to determine what type of instruction it is. The instruc-
        tion is executed and the results stored. What has changed is that steps
        have been added to try to improve performance, such as register renam-
        ing and out-of-order scheduling. The total number of cycles in the pipeline
        has increased to allow higher clock frequency. As an example of how a
        processor microarchitecture works, this section describes in detail what
        actions occur during each step of the original Pentium 4 pipeline.
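        The four classic steps (fetch, decode, execute, store) can be sketched as
        a toy interpreter. Everything here is a made-up mini-ISA for illustration,
        not any real instruction encoding or the Pentium 4's pipeline logic.

```python
# Minimal fetch/decode/execute/store loop over a hypothetical mini-ISA.
REGISTERS = {"r0": 0, "r1": 0}
PROGRAM = [("addi", "r0", 5), ("addi", "r1", 7)]  # illustrative instructions

def fetch(pc):
    """Fetch: read the raw instruction at the program counter."""
    return PROGRAM[pc]

def decode(raw):
    """Decode: determine the instruction type and its operands."""
    op, dest, imm = raw
    return {"op": op, "dest": dest, "imm": imm}

def execute(inst):
    """Execute: compute the result (only 'addi' exists in this toy ISA)."""
    if inst["op"] == "addi":
        return REGISTERS[inst["dest"]] + inst["imm"]
    raise ValueError("unknown op")

def store(inst, result):
    """Store: write the result back to the register file."""
    REGISTERS[inst["dest"]] = result

for pc in range(len(PROGRAM)):
    inst = decode(fetch(pc))
    store(inst, execute(inst))
```

        A real pipeline overlaps these steps across many instructions at once;
        this loop only shows the order each single instruction passes through.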
          The Pentium 4 actually has two separate pipelines, the front-end
        pipeline, which translates macroinstructions into uops, and the execu-
        tion pipeline, which executes uops.
          The front-end pipeline has the responsibility for fetching macroin-
        structions from memory and decoding them to keep a trace cache filled
        with uops. The execution pipeline works only with uops and is respon-
        sible for scheduling, executing, and retiring these instructions. Table 5-1

        TABLE 5-1  Pentium 4 Pipeline
        Front-end pipeline        Microinstruction pipeline (20)
        Instruction prefetch      Microbranch prediction (2)
        L2 cache read             Microinstruction fetch (2)
        Instruction decode        Drive (1)
        Branch prediction         Allocation (1)
        Trace cache write         Register rename (2)
                                  Load instruction queue (1)
                                  Schedule (3)
                                  Dispatch (2)
                                  Register file read (2)
                                  Execute (1)
                                  Calculate flags (1)
                                  Retirement (1)
                                  Drive (1)
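        As a quick consistency check, the per-stage cycle counts in Table 5-1 can
        be summed; the sketch below simply transcribes the table's execution-
        pipeline column (the variable names are my own).

```python
# Execution-pipeline stage depths transcribed from Table 5-1.
execution_pipeline = [
    ("Microbranch prediction", 2),
    ("Microinstruction fetch", 2),
    ("Drive", 1),
    ("Allocation", 1),
    ("Register rename", 2),
    ("Load instruction queue", 1),
    ("Schedule", 3),
    ("Dispatch", 2),
    ("Register file read", 2),
    ("Execute", 1),
    ("Calculate flags", 1),
    ("Retirement", 1),
    ("Drive", 1),
]
total_cycles = sum(cycles for _, cycles in execution_pipeline)
assert total_cycles == 20   # matches the "(20)" in the table heading
```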