Page 44 - ARM 64 Bit Assembly Language


is 1110xxxx 10xxxxxx 10xxxxxx where the x characters are replaced with the 16 least-significant
bits of the code point. In this case the code point, in binary, is 0010000010101100.
Therefore, the UTF-8 encoding for € is 11100010 10000010 10101100 in binary, or E2 82
AC in hexadecimal.
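
The bit manipulation behind the three-byte form can be sketched in a few lines of Python. This is a minimal illustration, not a complete UTF-8 encoder: the function name utf8_encode_3byte is hypothetical, and it handles only code points that need exactly three bytes (U+0800 through U+FFFF, excluding surrogates). It is applied here to the code point 0x20AC worked out above.

```python
def utf8_encode_3byte(code_point):
    """Encode a code point in U+0800..U+FFFF using the three-byte UTF-8 form.

    Hypothetical helper for illustration; one-, two-, and four-byte forms
    are deliberately not handled.
    """
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("this sketch handles only the three-byte form")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not encoded in UTF-8")

    byte1 = 0b11100000 | (code_point >> 12)          # 1110xxxx: top 4 bits
    byte2 = 0b10000000 | ((code_point >> 6) & 0x3F)  # 10xxxxxx: middle 6 bits
    byte3 = 0b10000000 | (code_point & 0x3F)         # 10xxxxxx: low 6 bits
    return bytes([byte1, byte2, byte3])


encoded = utf8_encode_3byte(0x20AC)
print(" ".join(f"{b:08b}" for b in encoded))  # the three bytes in binary
print(" ".join(f"{b:02X}" for b in encoded))  # the three bytes in hexadecimal
```

The result can be checked against Python's built-in encoder by comparing it with "\u20ac".encode("utf-8").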
In summary, there are three components to modern language support. The ISO/IEC 10646 standard
defines a mapping from code points (numbers) to glyphs (characters). UTF-8 defines an efficient
variable-length encoding for code points (text data) in the ISO/IEC 10646 standard. Unicode
adds language-specific properties to the ISO/IEC 10646 character set. Together, these three
elements currently provide support for textual data in almost every human written language, and
they continue to be extended and refined.

1.4 Memory layout of an executing program

Computer memory consists of a number of storage locations, or cells, each of which has a
unique numeric address. Addresses are usually written in hexadecimal. Each storage location
can contain a fixed number of binary digits. The most common size is one byte. Most
computers group bytes together into words. A CPU that is capable of accessing a
single byte of memory is said to have byte-addressable memory. Some CPUs are capable
of accessing memory only in word-sized groups. They are said to have word-addressable
memory.
Fig. 1.6A shows a section of memory containing some data. Each byte has a unique address
that is used when data is transferred to or from that memory cell. Most processors can also
move data in word-sized chunks. On a 32-bit system, four bytes are grouped together to form
a word. There are two ways that this grouping can be done. Systems that store the most
significant byte of a word in the smallest address, and the least significant byte in the largest
address, are said to be big-endian. The big-endian interpretation of a region of memory is
shown in Fig. 1.6B. As shown in Fig. 1.6C, little-endian systems store the least significant
byte in the lowest address and the most significant byte in the highest address. Some
processors, such as the ARM, can be configured as either little-endian or big-endian. The Linux
operating system, by default, configures the ARM processor to run in little-endian mode.
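
The two byte orderings can be made concrete with a short Python sketch. The value 0x12345678, the base address 0x1000, and the function name layout are all hypothetical choices for illustration; they are not taken from Fig. 1.6. The sketch uses the standard struct module to produce the bytes of a 32-bit word in each order and pairs each byte with the address it would occupy.

```python
import struct
import sys


def layout(word, byte_order, base=0x1000):
    """Return (address, byte) pairs for a 32-bit word stored starting at base.

    byte_order is "big" or "little"; base is an arbitrary example address.
    """
    fmt = ">I" if byte_order == "big" else "<I"  # I = unsigned 32-bit integer
    return [(base + i, b) for i, b in enumerate(struct.pack(fmt, word))]


word = 0x12345678  # example value; 0x12 is the most significant byte
for order in ("big", "little"):
    cells = ", ".join(f"{addr:#06x}: {b:02X}" for addr, b in layout(word, order))
    print(f"{order}-endian: {cells}")

# sys.byteorder reports the native byte order of the machine running the script.
print("native byte order:", sys.byteorder)
```

In the big-endian layout the byte 0x12 sits at the lowest address, while in the little-endian layout the lowest address holds 0x78.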
The memory layout for a typical program is shown in Fig. 1.7. The program is divided into
six major memory regions, or sections. The programmer specifies the contents of the .text,
.data, .rodata, and .bss sections. The Stack and Heap sections are defined when the
program is loaded for execution. The Stack and Heap may grow and shrink as the program
executes, while the other sections are set to fixed sizes by the compiler, linker, and loader. The
.text section contains the executable instructions. The .data, .rodata, and .bss sections
