Page 189 - A Practical Guide from Design Planning to Manufacturing

162   Chapter Five

          Once the page is in memory and the address translation is complete,
        the processor issues a read address to the L2 cache and begins reading
        instructions.

        L2 cache read
         The Pentium 4 L2 cache is a unified cache, meaning that it stores both
        instructions and data. Reads looking for either are treated the same way.
        Part of the read address is used to select a particular line in the data
        and tag arrays. The most significant address bits are then compared with
        the value read from the tag array. A match indicates a cache hit, and no
        match indicates a cache miss. The Pentium 4 L2 cache is actually 8-way
        associative, which means that 8 separate lines are read from the tag array
        and data array. Comparing all 8 tag lines to the requested address
        determines which, if any, of the 8 data lines is the correct one.
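
        The lookup described above can be sketched in a few lines of Python.
        This is only an illustrative model: the line size and number of sets
        are assumed values chosen for the example, not the actual Pentium 4
        parameters, and the names split_address and SetAssociativeCache are
        hypothetical. Hardware compares all 8 tags at once, whereas the
        sketch checks them one after another.

```python
LINE_BYTES = 64    # assumed line size, for illustration only
NUM_SETS = 1024    # assumed number of sets, for illustration only
NUM_WAYS = 8       # 8-way associative, as described in the text

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # low bits select a byte in the line
INDEX_BITS = NUM_SETS.bit_length() - 1      # middle bits select the set


def split_address(addr):
    """Split an address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)   # most significant bits
    return tag, index, offset


class SetAssociativeCache:
    def __init__(self):
        # tags[index][way] is None while that way is empty.
        self.tags = [[None] * NUM_WAYS for _ in range(NUM_SETS)]
        self.data = [[None] * NUM_WAYS for _ in range(NUM_SETS)]

    def fill(self, addr, way, line):
        """Store a full line in the chosen way of the set addr maps to."""
        tag, index, _ = split_address(addr)
        self.tags[index][way] = tag
        self.data[index][way] = line

    def read(self, addr):
        """Return (hit, byte). Every way of the selected set is checked."""
        tag, index, offset = split_address(addr)
        for way in range(NUM_WAYS):
            if self.tags[index][way] == tag:
                return True, self.data[index][way][offset]
        return False, None   # miss: data must come from main memory
```

        Two addresses that differ only in their most significant bits select
        the same set but carry different tags, so at most one of them can
        match a given way.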
          If the needed data is not in the cache, the bus controller is signaled to
        request the data from main memory. If the data is found in the cache, the
        instructions are at this point treated as nothing more than a stream of
        numbers. The processor does not know what the instructions are or even
        how many have been retrieved. This will be determined during the next
        step, instruction decode.


        Instruction decode
        During the instruction decode step, the actual instructions retrieved from
        the L2 cache are determined. Because the number of bytes used to encode
        a macroinstruction varies for the x86 architecture, the first task is to
        determine how many macroinstructions were read and where their
        starting points are. Each instruction's bits provide an opcode that
        determines the operation as well as bits encoding the operands.
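
        Finding instruction boundaries in a variable-length encoding can be
        illustrated with a toy example. The opcode table below is entirely
        hypothetical and is not real x86; actual x86 length decoding also
        depends on prefixes, addressing-mode bytes, and immediates. The
        sketch assumes that the first byte alone gives the total length.

```python
# Hypothetical opcode -> total instruction length in bytes (NOT real x86).
LENGTHS = {0x01: 1, 0x02: 2, 0x03: 3, 0x05: 5}


def find_instruction_starts(stream):
    """Return the offsets at which complete instructions begin."""
    starts = []
    pos = 0
    while pos < len(stream):
        length = LENGTHS[stream[pos]]
        if pos + length > len(stream):
            break   # instruction is cut off; more bytes must be fetched
        starts.append(pos)
        pos += length   # the next instruction starts right after this one
    return starts
```

        Each starting point can only be known after the length of the
        previous instruction has been determined, which is why this step
        must come before the rest of the decode.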
          Each macroinstruction is decoded into up to 4 uops. If the translation
        requires more than 4 uops, a place keeper instruction is created instead,
        which will cause the full translation to be read from the microcode ROM
        at the proper time. If one of the macroinstructions being decoded is a
        branch, then it is sent for branch prediction.
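
        The 4-uop limit can be sketched as follows. The translation table and
        uop names are invented for illustration and do not reflect the real
        Pentium 4 translations; only the rule itself (up to 4 uops directly,
        otherwise a placeholder that points to the microcode ROM) comes from
        the text.

```python
MAX_INLINE_UOPS = 4   # limit stated in the text

# Hypothetical translations: macroinstruction -> list of uops.
TRANSLATIONS = {
    "add reg, reg": ["add"],
    "add reg, [mem]": ["load", "add"],
    "string copy": ["load", "store", "inc_src", "inc_dst", "dec_count", "loop"],
}


def decode(macro):
    """Return the uops for one macroinstruction, or a ROM placeholder."""
    uops = TRANSLATIONS[macro]
    if len(uops) <= MAX_INLINE_UOPS:
        return list(uops)
    # Too long to translate directly: emit a single placeholder so the
    # full sequence can be read from the microcode ROM later.
    return [("MICROCODE_ROM", macro)]
```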

        Branch prediction
        A prediction of taken or not taken is made for every macroinstruction
        branch. After the decoder finds a branch, the address of the branch is
        used to look up two items in the branch target buffer (BTB): a counter
        showing the branch's past behavior and its target address. A hit
        enables the BTB to make a prediction and provide the target address. A
        miss causes a new entry to be allocated for the branch in the BTB, and
        the branch goes through static prediction instead. In general, static
        prediction treats backward branches as taken and forward branches as
        not taken.
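
        A minimal model of this scheme is sketched below. The text says only
        that a counter records past behavior; the 2-bit saturating counter,
        its initial values, the unbounded table, and the class and method
        names are all assumptions made for the example.

```python
class BranchTargetBuffer:
    def __init__(self):
        # branch address -> [counter, target address]
        # Assumed 2-bit counter: 0-1 predict not taken, 2-3 predict taken.
        self.entries = {}

    def predict(self, branch_addr, target_addr):
        """Return (predicted_taken, target) for a decoded branch."""
        entry = self.entries.get(branch_addr)
        if entry is not None:
            counter, target = entry       # BTB hit: use recorded history
            return counter >= 2, target
        # BTB miss: static prediction. Backward branches are taken,
        # forward branches are not taken.
        taken = target_addr < branch_addr
        self.entries[branch_addr] = [2 if taken else 1, target_addr]
        return taken, target_addr

    def update(self, branch_addr, taken):
        """Record the actual outcome once the branch has executed."""
        entry = self.entries[branch_addr]
        if taken:
            entry[0] = min(3, entry[0] + 1)
        else:
            entry[0] = max(0, entry[0] - 1)
```

        A backward branch usually closes a loop, so predicting it as taken
        on first sight is correct for every iteration except the last.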