Page 44 - ARM 64 Bit Assembly Language


is 1110xxxx 10xxxxxx 10xxxxxx, where the x characters are replaced with the 16 least-significant
bits of the code point: the first four bits go into the lead byte, and the remaining twelve fill the
two continuation bytes, six bits each. In this case, the code point in binary is 0010000010101100.
Therefore, the UTF-8 encoding for € (U+20AC) is 11100010 10000010 10101100 in binary, or E2 82 AC
in hexadecimal.
In summary, there are three components to modern language support. The ISO/IEC 10646 standard
defines a mapping from code points (numbers) to characters. UTF-8 defines an efficient
variable-length encoding for the code points (text data) of the ISO/IEC 10646 standard. Unicode
adds language-specific properties to the ISO/IEC 10646 character set. Together, these three
elements currently provide support for textual data in almost every human written language, and
they continue to be extended and refined.



1.4 Memory layout of an executing program

Computer memory consists of a number of storage locations, or cells, each of which has a
unique numeric address. Addresses are usually written in hexadecimal. Each storage location
can contain a fixed number of binary digits; the most common size is one byte (eight bits). Most
computers group bytes together into words. A CPU that is capable of accessing a single byte of
memory is said to have byte-addressable memory. Some CPUs can access memory only in
word-sized groups; they are said to have word-addressable memory.
Fig. 1.6A shows a section of memory containing some data. Each byte has a unique address
that is used when data is transferred to or from that memory cell. Most processors can also
move data in word-sized chunks. On a 32-bit system, four bytes are grouped together to form
a word. There are two ways that this grouping can be done. Systems that store the most
significant byte of a word at the smallest address, and the least significant byte at the largest
address, are said to be big-endian. The big-endian interpretation of a region of memory is
shown in Fig. 1.6B. As shown in Fig. 1.6C, little-endian systems store the least significant
byte at the lowest address and the most significant byte at the highest address. For example,
the 32-bit value 0x12345678 occupies the bytes 12 34 56 78 on a big-endian system and
78 56 34 12 on a little-endian system, listed from lowest to highest address. Some processors,
such as the ARM, can be configured as either little-endian or big-endian. The Linux
operating system, by default, configures the ARM processor to run in little-endian mode.
The memory layout for a typical program is shown in Fig. 1.7. The program is divided into
six major memory regions, or sections. The programmer specifies the contents of the .text,
.data, .rodata, and .bss sections. The Stack and Heap sections are defined when the
program is loaded for execution. The Stack and Heap may grow and shrink as the program
executes, while the other sections are set to fixed sizes by the compiler, linker, and loader. The
.text section contains the executable instructions. The .data, .rodata, and .bss sections