Page 331 - ARM 64 Bit Assembly Language

320 Chapter 9

• Show two ways to fix the code in AArch64 assembly. Hint: One way is to change
the amount of the increment, which will change the number of times that the loop
executes.
9.6. The fixed point sine function from the previous chapter was not compared directly to
the hand-coded VFP implementation. Based on the information in Table 9.2 and
Table 8.4, would you expect the fixed point sine function from the previous chapter to
beat the hand-coded assembly VFP sine function in this chapter? Why or why not?
9.7. 3-D objects are often stored as an array of points, where each point is a vector (array)
consisting of four values, x, y, z, and the constant 1.0. Rotation, translation, scaling,
and other operations are accomplished by multiplying each point by a 4 × 4
transformation matrix. The following C code shows the data types and the transform
operation:
typedef float point[4];      // Point is an array of floats
typedef float matrix[4][4];  // Matrix is a 2-D array of floats
.
.
.
void xform(matrix *m, point *p)
{
    int i, j;
    point result;
    for (i = 0; i < 4; i++)
    {
        result[i] = 0.0;
        for (j = 0; j < 4; j++)
            // [] binds tighter than unary *, so the dereference is parenthesized
            result[i] += (*m)[j][i] * (*p)[j];
    }
    for (i = 0; i < 4; i++)
        (*p)[i] = result[i];
}
Write the equivalent AArch64 assembly code.
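Before translating the routine, it can help to confirm what it computes. The following is a minimal C sketch, not the book's reference solution. It keeps the loop structure of the exercise but writes the dereferences with explicit parentheses, since [] binds more tightly than unary *. The helper make_translation is a hypothetical name introduced only for this illustration. It assumes that, because the loop accumulates m[j][i] * p[j], the translation amounts belong in row 3 of the matrix.

```c
#include <assert.h>
#include <stddef.h>

typedef float point[4];      /* x, y, z, and the constant 1.0 */
typedef float matrix[4][4];  /* 4 x 4 transformation matrix */

/* Same loop structure as the exercise: result[i] = sum over j of m[j][i] * p[j]. */
void xform(matrix *m, point *p)
{
    int i, j;
    point result;

    assert(m != NULL && p != NULL);
    for (i = 0; i < 4; i++) {
        result[i] = 0.0f;
        for (j = 0; j < 4; j++)
            result[i] += (*m)[j][i] * (*p)[j];
    }
    /* Copy back only after every component has been computed. */
    for (i = 0; i < 4; i++)
        (*p)[i] = result[i];
}

/* Hypothetical helper (not from the text): identity matrix with the
 * translation stored in row 3, matching the m[j][i] indexing above. */
void make_translation(matrix m, float tx, float ty, float tz)
{
    int i, j;

    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++)
            m[i][j] = (i == j) ? 1.0f : 0.0f;
    m[3][0] = tx;
    m[3][1] = ty;
    m[3][2] = tz;
}
```

With these definitions, building a translation by (10, 20, 30) and applying xform to the point (1, 2, 3, 1) leaves (11, 22, 33, 1) in the point. This gives a known result to check an assembly version against.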
9.8. Optimize the AArch64 assembly code you wrote in the previous exercise. Use vector
mode if possible.
9.9. Since the fourth element of the point is always 1.0, there is no need to actually store it.
This will reduce memory requirements by about 25%, and require one fewer multiply.
The C code would look something like this:

typedef float point[3];      // Point is an array of floats
typedef float matrix[4][4];  // Matrix is a 2-D array of floats
.
.
.
void xform(matrix *m, point *p)
