Page 326 - ARM 64 Bit Assembly Language


        // precision floating point. It computes sine by summing
        // the first six terms of the Taylor series.
        .global sin_a_f
sin_a_f:                        // register s0 contains x
        // initialize variables
        fmul    s1,s0,s0        // s1 <- x^2
        fmov    s3,s0           // s3 <- x
        ldr     x0,=ctab        // load pointer to coefficients
        mov     x3,#TERMS       // load loop counter
        // loop over table
loop:   fmul    s3,s1,s3        // s3 <- x^(2n+1)
        ldr     s4,[x0],#4      // load coefficient and increment pointer
        subs    x3,x3,#1        // decrement and test loop count
        fmadd   s0,s3,s4,s0     // s0 += next term
        bne     loop            // loop five times
        ret

Listing 9.1 shows a single precision floating point implementation of the sine function, using
the ARM FP/NEON instruction set. It works in a similar way to the previous fixed point code.
There is a table of constants, each of which is the signed reciprocal of one of the factorial
divisors in the Taylor series for sine. The subroutine calculates the odd powers of x one by one,
multiplies each power by the next constant in the table, and sums the results as it goes. Note
that the single precision floating point version uses fewer terms of the Taylor series than the
fixed point version. This is because the IEEE single precision format, with its 24-bit
significand, has fewer bits of precision than the fixed point format used in the previous chapter.
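
The loop in Listing 9.1 can be modeled in C to make the data flow easier to follow. This is only a sketch under stated assumptions: the name sin_taylor_f is hypothetical, TERMS is assumed to be 5 (the comment "loop five times" suggests this), and the table is assumed to hold the single precision values of -1/3!, 1/5!, -1/7!, 1/9! and -1/11!. The assembly uses a fused multiply-add (fmadd); the sketch uses a separate multiply and add, so the last bit of the result may differ.

```c
#define TERMS 5   /* assumed: five table entries, six series terms including x */

/* Assumed table contents: signed reciprocals of the odd factorials. */
static const float ctab[TERMS] = {
    -1.0f / 6.0f,         /* -1/3!  */
     1.0f / 120.0f,       /*  1/5!  */
    -1.0f / 5040.0f,      /* -1/7!  */
     1.0f / 362880.0f,    /*  1/9!  */
    -1.0f / 39916800.0f   /* -1/11! */
};

/* Hypothetical C model of the loop in Listing 9.1. */
float sin_taylor_f(float x)
{
    float x2    = x * x;   /* s1 <- x^2                        */
    float power = x;       /* s3 <- x                          */
    float sum   = x;       /* s0 already holds the first term  */

    for (int i = 0; i < TERMS; i++) {
        power *= x2;               /* s3 <- x^(2n+1)               */
        sum += power * ctab[i];    /* s0 += next term (fmadd in asm) */
    }
    return sum;
}
```

Because each power is obtained from the previous one with a single multiply by x^2, each additional term costs one multiply and one multiply-add. The approximation is most accurate for small |x|; the listing shown here does no argument reduction.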

Listing 9.2 Simple scalar implementation of the sin x function using IEEE double
precision.

//*************************************************************
// Name: sincos_a_f.S
// Author: Larry Pyeatt
// Date: 2/22/2018
//*************************************************************
// This is a version of the sine function that uses double
// precision floating point with the FP/NEON instruction set.
// ---------------------------------------------------------------
        .data
        // The following is a table of constants used in the
        // Taylor series approximation for sine. Each double is
        // written as two 32-bit words, low word first.
        .align  6                       // Align for efficient caching
ctab:   .word   0x55555555, 0xBFC55555  // -1.666666666666667e-01 (-1/3!)
        .word   0x11111111, 0x3F811111  //  8.333333333333333e-03 (1/5!)
        .word   0x1A01A01A, 0xBF2A01A0  // -1.984126984126984e-04 (-1/7!)
        .word   0xA556C734, 0x3EC71DE3  //  2.755731922398589e-06 (1/9!)
        .word   0x67F544E4, 0xBE5AE645  // -2.505210838544172e-08 (-1/11!)
