Page 177 - A Practical Guide from Design Planning to Manufacturing
P. 177
150 Chapter Five
Without branch prediction a pipeline break is created, since the proces-
sor must wait until the branch has been resolved before fetching the
proper instruction. With branch prediction, the processor guesses
whether the branch will be taken or not, and based on that prediction it
continues fetching instructions without a break. If the prediction is cor-
rect, time has been saved and performance is improved. If the prediction
is incorrect, several instructions will have incorrectly entered the pipeline
before the mistake is discovered. It will take longer to begin executing
the correct instruction than if no prediction had been made and the
processor had merely waited to fully resolve the branch. These lost cycles
are known as the branch mispredict penalty. For branch prediction to
improve performance, it must be accurate enough for the correct pre-
dictions to save more cycles than are lost to incorrect predictions. As the
pipeline gets longer, the number of cycles from prediction to discovering
an incorrect prediction increases. This increases the branch mispredict
penalty and makes prediction accuracy more important.
Some architectures allow the compiler to include a suggested predic-
tion as part of the encoding for each branch instruction. For architec-
tures that do not include this, the simplest type of hardware prediction
is static prediction, which causes each branch to be predicted the same
way every time. On average, backward branches (those going to earlier
instructions) tend to be taken and forward branches (those going to
later instructions) tend not to be taken. For branches with relative tar-
gets, this makes deciding to guess taken or not taken a simple matter
of looking at the sign bit of the target: for a negative target address guess
taken, and for a positive target address guess not taken.
Static prediction has the advantage of being extremely simple but may
have very poor results for some branches. Prediction accuracy is greatly
improved by using dynamic prediction, which uses past branch behav-
ior to predict future behavior. One of the most commonly used dynamic
prediction schemes is called 2-bit prediction. 7
The processor contains a cache memory holding 2-bit counters for each
of the recently accessed branches. This cache is called a branch predic-
tion buffer or branch history table. Each counter holds the value 0, 1, 2,
or 3. Whenever a branch is taken, its 2-bit counter is incremented if it
is below the maximum value of 3. Whenever a branch is not taken, its
2-bit counter is decremented if it is above the minimum value of 0. When
a branch is fetched, its address is used to lookup the value of the counter
for that branch. If the value is a 0 or 1, the branch is assumed to be
not taken. If the counter is a 2 or 3, the branch is assumed to be taken.
See Fig. 5-14.
7
Smith, “Study of Branch Prediction”.