Page 204 - ARM 64 Bit Assembly Language

192 Chapter 7


                           Listing 7.1 AArch64 assembly code for adding two 128-bit numbers.

                          adds    x0, x0, x2     // add the low-order double-words,
                                                 // and set flags in PSTATE
                          adc     x1, x1, x3     // add the high-order double-words
                                                 // plus the carry flag



                  adds the two most significant words along with the carry bit. This technique can be extended
                  to add integers with any number of bits.
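
                  The extension to wider integers can be sketched in C. This is only an illustration
                  under stated assumptions, not code from the text: the function name add_words and
                  the least-significant-word-first layout are hypothetical, and the variable carry
                  stands in for the carry flag that adds sets and adc consumes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum = a + b, where each operand is n 64-bit words
 * stored least significant word first. Returns the final carry out (0 or 1). */
unsigned add_words(uint64_t *sum, const uint64_t *a,
                   const uint64_t *b, size_t n)
{
    unsigned carry = 0;             /* plays the role of the carry flag */

    assert(sum != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t t = a[i] + carry;  /* add the incoming carry first */
        unsigned c1 = t < carry;    /* set only if that addition wrapped */
        sum[i] = t + b[i];
        unsigned c2 = sum[i] < t;   /* set only if adding b[i] wrapped */
        carry = c1 | c2;            /* at most one of the two can be set */
    }
    return carry;
}
```

                  With n = 2 this performs the same work as the adds/adc pair in Listing 7.1; each
                  additional word corresponds to one more add-with-carry step.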

                  Example 15. Binary multiplication using Algorithm 1.
                  Assume we wish to multiply two numbers, x = 01101001 and y = 01011010. Applying
                  Algorithm 1 results in the following sequence:


                                 a                    x               y        Next operation
                         0000000000000000     0000000001101001     01011010    shift only
                         0000000000000000     0000000011010010     00101101    add, then shift
                         0000000011010010     0000000110100100     00010110    shift only
                         0000000011010010     0000001101001000     00001011    add, then shift
                         0000010000011010     0000011010010000     00000101    add, then shift
                         0000101010101010     0000110100100000     00000010    shift only
                         0000101010101010     0001101001000000     00000001    add, then shift
                         0010010011101010     0011010010000000     00000000    return result

                                                   105 × 90 = 9450
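
                  The trace above can be reproduced with a short C sketch. It assumes that Algorithm 1
                  is the usual shift-and-add loop suggested by the "Next operation" column; the function
                  name shift_add_mul8 is hypothetical and does not appear in the text.

```c
#include <stdint.h>

/* Sketch of the shift-and-add scheme traced in the table.
 * x is widened to 16 bits so it can be shifted left without losing bits. */
uint16_t shift_add_mul8(uint8_t x8, uint8_t y)
{
    uint16_t a = 0;     /* accumulator, column "a" */
    uint16_t x = x8;    /* multiplicand, column "x" */

    while (y != 0) {
        if (y & 1)      /* low bit of y is 1: "add, then shift" */
            a += x;
        x <<= 1;        /* low bit of y is 0: "shift only" */
        y >>= 1;
    }
    return a;           /* y == 0: "return result" */
}
```

                  Calling shift_add_mul8(105, 90) walks through the same eight rows as the table and
                  returns 9450.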


                  On an AArch64 processor, the algorithm to multiply two 64-bit unsigned integers can be
                  implemented very efficiently. Listing 7.2 shows one way to multiply two 64-bit numbers
                  to obtain a 128-bit result. The code is a straightforward implementation of the
                  algorithm, and some modifications can be made to improve efficiency. For example, if we
                  only want a 64-bit result, we do not need to perform 128-bit addition. This
                  significantly simplifies the code, as shown in Listing 7.3.

                  Listing 7.2 AArch64 assembly code for multiplication with a 128-bit result without umulh or smulh.

                          .section .rodata
                   x:     .8byte  0x57
                   y:     .8byte  0x75
                   msg:   .asciz  "%lx * %lx = %016lx%016lx\n"