Page 177 - A Practical Guide from Design Planning to Manufacturing
P. 177

150   Chapter Five

          Without branch prediction a pipeline break is created, since the proces-
        sor must wait until the branch has been resolved before fetching the
        proper instruction. With branch prediction, the processor guesses
        whether the branch will be taken or not, and based on that prediction it
        continues fetching instructions without a break. If the prediction is cor-
        rect, time has been saved and performance is improved. If the prediction
        is incorrect, several instructions will have incorrectly entered the pipeline
        before the mistake is discovered. It will take longer to begin executing
        the correct instruction than if no prediction had been made and the
        processor had merely waited to fully resolve the branch. These lost cycles
        are known as the branch mispredict penalty. For branch prediction to
        improve performance, it must be accurate enough for the correct pre-
        dictions to save more cycles than are lost to incorrect predictions. As the
        pipeline gets longer, the number of cycles from prediction to discovering
        an incorrect prediction increases. This increases the branch mispredict
        penalty and makes prediction accuracy more important.
          Some architectures allow the compiler to include a suggested predic-
        tion as part of the encoding for each branch instruction. For architec-
        tures that do not include this, the simplest type of hardware prediction
        is static prediction, which causes each branch to be predicted the same
        way every time. On average, backward branches (those going to earlier
        instructions) tend to be taken and forward branches (those going to
        later instructions) tend not to be taken. For branches with relative tar-
        gets, this makes deciding to guess taken or not taken a simple matter
        of looking at the sign bit of the target: for a negative target address guess
        taken, and for a positive target address guess not taken.
          Static prediction has the advantage of being extremely simple but may
        have very poor results for some branches. Prediction accuracy is greatly
        improved by using dynamic prediction, which uses past branch behav-
        ior to predict future behavior. One of the most commonly used dynamic
        prediction schemes is called 2-bit prediction. 7
          The processor contains a cache memory holding 2-bit counters for each
        of the recently accessed branches. This cache is called a branch predic-
        tion buffer or branch history table. Each counter holds the value 0, 1, 2,
        or 3. Whenever a branch is taken, its 2-bit counter is incremented if it
        is below the maximum value of 3. Whenever a branch is not taken, its
        2-bit counter is decremented if it is above the minimum value of 0. When
        a branch is fetched, its address is used to lookup the value of the counter
        for that branch. If the value is a 0 or 1, the branch is assumed to be
        not taken. If the counter is a 2 or 3, the branch is assumed to be taken.
        See Fig. 5-14.



          7
          Smith, “Study of Branch Prediction”.
   172   173   174   175   176   177   178   179   180   181   182