Page 44 - ARM 64 Bit Assembly Language
is 1110xxxx 10xxxxxx 10xxxxxx where the x characters are replaced with the 16 least-
significant bits of the code point. In this case, the code point in binary is 0010000010101100.
Therefore, the UTF-8 encoding for € is 11100010 10000010 10101100 in binary, or E2 82
AC in hexadecimal.
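
The bit-packing step described above can be sketched in C. This is a minimal illustration, not a complete UTF-8 encoder: the function name utf8_encode3 is hypothetical, it handles only the three-byte form (code points 0x0800 through 0xFFFF), and it does not reject the surrogate range.

```c
#include <assert.h>
#include <stdint.h>

/* Encode one code point in the range 0x0800..0xFFFF as three UTF-8 bytes.
   utf8_encode3 is a hypothetical helper name, not a library function. */
void utf8_encode3(uint32_t cp, uint8_t out[3])
{
    assert(cp >= 0x0800 && cp <= 0xFFFF);        /* three-byte form only   */
    out[0] = 0xE0 | (uint8_t)(cp >> 12);         /* 1110xxxx: top 4 bits   */
    out[1] = 0x80 | (uint8_t)((cp >> 6) & 0x3F); /* 10xxxxxx: next 6 bits  */
    out[2] = 0x80 | (uint8_t)(cp & 0x3F);        /* 10xxxxxx: low 6 bits   */
}
```

Calling utf8_encode3(0x20AC, buf) fills buf with the bytes E2 82 AC, matching the result worked out by hand above.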
In summary, there are three components to modern language support. The ISO/IEC 10646 standard
defines a mapping from code points (numbers) to glyphs (characters). UTF-8 defines an efficient
variable-length encoding for code points (text data) in the ISO/IEC 10646 standard. Unicode
adds language-specific properties to the ISO/IEC 10646 character set. Together, these three
elements currently provide support for textual data in almost every human written language, and
they continue to be extended and refined.
1.4 Memory layout of an executing program
Computer memory consists of a number of storage locations, or cells, each of which has a
unique numeric address. Addresses are usually written in hexadecimal. Each storage location
can contain a fixed number of binary digits. The most common size is one byte. Most
computers group bytes together into words. A CPU that is capable of accessing a
single byte of memory is said to have byte-addressable memory. Some CPUs are capable
of accessing memory only in word-sized groups. They are said to have word-addressable
memory.
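
On a byte-addressable machine, the relationship between addresses and cell sizes can be observed from C. The following is a small sketch under the assumption of a 32-bit word; the function names byte_cell_stride and word_stride are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of byte addresses between two consecutive one-byte cells. */
ptrdiff_t byte_cell_stride(void)
{
    uint8_t cells[2] = {0, 0};
    return (char *)&cells[1] - (char *)&cells[0];
}

/* Number of byte addresses between two consecutive 32-bit words. */
ptrdiff_t word_stride(void)
{
    uint32_t words[2] = {0, 0};
    return (char *)&words[1] - (char *)&words[0];
}
```

Consecutive bytes sit at addresses that differ by one, while consecutive 32-bit words sit at addresses that differ by four, because each word occupies four byte-sized cells.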
Fig. 1.6A shows a section of memory containing some data. Each byte has a unique address
that is used when data is transferred to or from that memory cell. Most processors can also
move data in word-sized chunks. On a 32-bit system, four bytes are grouped together to form
a word. There are two ways that this grouping can be done. Systems that store the most
significant byte of a word in the lowest address, and the least significant byte in the highest
address, are said to be big-endian. The big-endian interpretation of a region of memory is
shown in Fig. 1.6B. As shown in Fig. 1.6C, little-endian systems store the least significant
byte in the lowest address and the most significant byte in the highest address. Some pro-
cessors, such as the ARM, can be configured as either little-endian or big-endian. The Linux
operating system, by default, configures the ARM processor to run in little-endian mode.
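
The two interpretations can be sketched in C. The example below is illustrative only: the helper names read_be32, read_le32, and host_is_little_endian are hypothetical, and the byte values in the usage note are made up rather than taken from Fig. 1.6. Each reader takes four bytes in order of increasing address and assembles a 32-bit word.

```c
#include <stdint.h>
#include <string.h>

/* Big-endian: the byte at the lowest address is the most significant. */
uint32_t read_be32(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Little-endian: the byte at the lowest address is the least significant. */
uint32_t read_le32(const uint8_t b[4])
{
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}

/* Returns 1 if the machine running this code stores the least
   significant byte of a word at the lowest address, 0 otherwise. */
int host_is_little_endian(void)
{
    uint32_t word = 1;
    uint8_t first;
    memcpy(&first, &word, 1);   /* copy the byte at the lowest address */
    return first == 1;
}
```

For the bytes 12 34 56 78 (hexadecimal, lowest address first), read_be32 yields 0x12345678 and read_le32 yields 0x78563412. On an ARM processor running Linux in its default configuration, host_is_little_endian would be expected to return 1.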
The memory layout for a typical program is shown in Fig. 1.7. The program is divided into
six major memory regions, or sections. The programmer specifies the contents of the .text,
.data, .rodata, and .bss sections. The Stack and Heap sections are defined when the
program is loaded for execution. The Stack and Heap may grow and shrink as the program
executes, while the other sections are set to fixed sizes by the compiler, linker, and loader. The
.text section contains the executable instructions. The .data, .rodata, and .bss sections