Commit | Line | Data |
---|---|---|
b693d0b3 MCC |
1 | ============================== |
2 | Memory Layout on AArch64 Linux | |
3 | ============================== | |
4 | ||
5 | Author: Catalin Marinas <catalin.marinas@arm.com> | |
6 | ||
7 | This document describes the virtual memory layout used by the AArch64 | |
8 | Linux kernel. The architecture allows up to 4 levels of translation | |
9 | tables with a 4KB page size and up to 3 levels with a 64KB page size. | |
10 | ||
11 | AArch64 Linux uses either 3 levels or 4 levels of translation tables | |
12 | with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit | |
13 | (256TB) virtual addresses, respectively, for both user and kernel. With | |
14 | 64KB pages, only 2 levels of translation tables, allowing 42-bit (4TB) | |
15 | virtual addresses, are used, but the memory layout is the same. | |
16 | ||
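The VA sizes quoted above follow from simple arithmetic, sketched below. The sketch assumes 8-byte translation table descriptors and one table per page, so each level resolves (page shift - 3) bits. The function names are illustrative and are not kernel identifiers.

```c
#include <assert.h>

/*
 * Bits resolved by one translation level, assuming each table occupies
 * exactly one page and each descriptor is 8 bytes (2^3).
 */
static unsigned int bits_per_level(unsigned int page_shift)
{
    assert(page_shift > 3);
    return page_shift - 3;
}

/* Virtual address width: the in-page offset plus the index bits of every level. */
static unsigned int va_bits(unsigned int page_shift, unsigned int levels)
{
    return page_shift + levels * bits_per_level(page_shift);
}
```

With these assumptions, va_bits(12, 3) gives 39, va_bits(12, 4) gives 48 and va_bits(16, 2) gives 42, matching the configurations listed above. A full third level with 64KB pages would give 55 bits, which is why the top level is only partially populated in the 48-bit and 52-bit 64KB layouts.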
d2c68de1 SC |
17 | ARMv8.2 adds optional support for Large Virtual Address space. This is |
18 | only available when running with a 64KB page size and expands the | |
19 | number of descriptors in the first level of translation. | |
20 | ||
b693d0b3 MCC |
21 | User addresses have bits 63:48 set to 0 while the kernel addresses have |
22 | the same bits set to 1. TTBRx selection is given by bit 63 of the | |
23 | virtual address. The swapper_pg_dir contains only kernel (global) | |
24 | mappings while the user pgd contains only user (non-global) mappings. | |
25 | The swapper_pg_dir address is written to TTBR1 and never written to | |
26 | TTBR0. | |
27 | ||
28 | ||
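As a rough illustration of the rule above, the following sketch classifies an address by its upper bits. The paragraph describes the 48-bit case (bits 63:48); taking the VA width as a parameter is an assumption made here so the same check can cover a 52-bit configuration. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum va_kind { VA_USER, VA_KERNEL, VA_INVALID };

/*
 * Classify a virtual address by the bits above the VA width:
 * all zeros means user, all ones means kernel, anything else is
 * not a valid address for that VA width.
 */
static enum va_kind classify_va(uint64_t va, unsigned int va_bits)
{
    assert(va_bits > 0 && va_bits < 64);

    uint64_t top = va >> va_bits;
    uint64_t all_ones = (UINT64_C(1) << (64 - va_bits)) - 1;

    if (top == 0)
        return VA_USER;
    if (top == all_ones)
        return VA_KERNEL;
    return VA_INVALID;
}
```

For example, 000fffffffffffff is a user address with a 52-bit VA width but falls in the invalid hole with a 48-bit width.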
d2c68de1 | 29 | AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit):: |
b693d0b3 MCC |
30 | |
31 | Start End Size Use | |
32 | ----------------------------------------------------------------------- | |
33 | 0000000000000000 0000ffffffffffff 256TB user | |
d2c68de1 | 34 | ffff000000000000 ffff7fffffffffff 128TB kernel logical memory map |
68af6d24 | 35 | [ffff600000000000 ffff7fffffffffff] 32TB [kasan shadow region] |
f4693c27 AB |
36 | ffff800000000000 ffff800007ffffff 128MB bpf jit region |
37 | ffff800008000000 ffff80000fffffff 128MB modules | |
9ad7c6d5 AB |
38 | ffff800010000000 fffffbffefffffff 124TB vmalloc |
39 | fffffbfff0000000 fffffbfffdffffff 224MB fixed mappings (top down) | |
40 | fffffbfffe000000 fffffbfffe7fffff 8MB [guard region] | |
41 | fffffbfffe800000 fffffbffff7fffff 16MB PCI I/O space | |
42 | fffffbffff800000 fffffbffffffffff 8MB [guard region] | |
8c96400d AB |
43 | fffffc0000000000 fffffdffffffffff 2TB vmemmap |
44 | fffffe0000000000 ffffffffffffffff 2TB [guard region] | |
d2c68de1 SC |
45 | |
46 | ||
47 | AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support):: | |
b693d0b3 MCC |
48 | |
49 | Start End Size Use | |
50 | ----------------------------------------------------------------------- | |
d2c68de1 | 51 | 0000000000000000 000fffffffffffff 4PB user |
f4693c27 | 52 | fff0000000000000 ffff7fffffffffff ~4PB kernel logical memory map |
68af6d24 | 53 | [fffd800000000000 ffff7fffffffffff] 512TB [kasan shadow region] |
f4693c27 AB |
54 | ffff800000000000 ffff800007ffffff 128MB bpf jit region |
55 | ffff800008000000 ffff80000fffffff 128MB modules | |
9ad7c6d5 AB |
56 | ffff800010000000 fffffbffefffffff 124TB vmalloc |
57 | fffffbfff0000000 fffffbfffdffffff 224MB fixed mappings (top down) | |
58 | fffffbfffe000000 fffffbfffe7fffff 8MB [guard region] | |
59 | fffffbfffe800000 fffffbffff7fffff 16MB PCI I/O space | |
60 | fffffbffff800000 fffffbffffffffff 8MB [guard region] | |
8c96400d AB |
61 | fffffc0000000000 ffffffdfffffffff ~4TB vmemmap |
62 | ffffffe000000000 ffffffffffffffff 128GB [guard region] | |
b693d0b3 MCC |
63 | |
64 | ||
65 | Translation table lookup with 4KB pages:: | |
66 | ||
67 | +--------+--------+--------+--------+--------+--------+--------+--------+ | |
68 | |63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| | |
69 | +--------+--------+--------+--------+--------+--------+--------+--------+ | |
70 | | | | | | | | |
71 | | | | | | v | |
72 | | | | | | [11:0] in-page offset | |
73 | | | | | +-> [20:12] L3 index | |
74 | | | | +-----------> [29:21] L2 index | |
75 | | | +---------------------> [38:30] L1 index | |
76 | | +-------------------------------> [47:39] L0 index | |
77 | +-------------------------------------------------> [63] TTBR0/1 | |
78 | ||
79 | ||
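The 4KB diagram above can be read as a set of shifts and masks. The sketch below extracts each index using the bit ranges from the diagram. The struct, macro and function names are illustrative only and do not correspond to kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

#define VA_4K_PAGE_SHIFT  12   /* [11:0] in-page offset */
#define VA_4K_LEVEL_BITS  9    /* 512 entries per table */

/* Four 9-bit levels on top of a 12-bit offset cover a 48-bit address. */
static_assert(VA_4K_PAGE_SHIFT + 4 * VA_4K_LEVEL_BITS == 48,
              "4KB pages with 4 levels give 48-bit VAs");

struct va_4k_indices {
    unsigned int l0, l1, l2, l3, offset;
};

/* Split a virtual address into table indices per the 4KB diagram. */
static struct va_4k_indices split_va_4k(uint64_t va)
{
    struct va_4k_indices idx;

    idx.l0 = (unsigned int)((va >> 39) & 0x1ff);  /* [47:39] */
    idx.l1 = (unsigned int)((va >> 30) & 0x1ff);  /* [38:30] */
    idx.l2 = (unsigned int)((va >> 21) & 0x1ff);  /* [29:21] */
    idx.l3 = (unsigned int)((va >> 12) & 0x1ff);  /* [20:12] */
    idx.offset = (unsigned int)(va & 0xfff);      /* [11:0]  */
    return idx;
}
```

Applied to ffff800008000000, the start of the modules region in the 48-bit table, this gives an L0 index of 256, an L2 index of 64, and zero for the remaining fields.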
80 | Translation table lookup with 64KB pages:: | |
81 | ||
82 | +--------+--------+--------+--------+--------+--------+--------+--------+ | |
83 | |63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| | |
84 | +--------+--------+--------+--------+--------+--------+--------+--------+ | |
85 | | | | | | | |
86 | | | | | v | |
87 | | | | | [15:0] in-page offset | |
88 | | | | +----------> [28:16] L3 index | |
89 | | | +--------------------------> [41:29] L2 index | |
d2c68de1 SC |
90 | | +-------------------------------> [47:42] L1 index (48-bit) |
91 | | [51:42] L1 index (52-bit) | |
b693d0b3 MCC |
92 | +-------------------------------------------------> [63] TTBR0/1 |
93 | ||
94 | ||
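With 64KB pages, the only difference between the 48-bit and 52-bit cases in the diagram above is the width of the L1 index. The sketch below makes that explicit. The function names are hypothetical, and the VA width is passed as a plain parameter for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of first-level descriptors: 2^(va_bits - 42), since L1 starts at bit 42. */
static unsigned int l1_entries_64k(unsigned int va_bits)
{
    assert(va_bits == 48 || va_bits == 52);
    return 1u << (va_bits - 42);
}

/* L1 index: bits [47:42] for 48-bit VAs, bits [51:42] for 52-bit VAs. */
static unsigned int l1_index_64k(uint64_t va, unsigned int va_bits)
{
    return (unsigned int)((va >> 42) & (l1_entries_64k(va_bits) - 1));
}
```

Under these assumptions the first level holds 64 descriptors for 48-bit VAs and 1024 for 52-bit VAs, which is the expansion of the first level that the Large Virtual Address support introduces.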
95 | When using KVM without the Virtualization Host Extensions, the | |
96 | hypervisor maps kernel pages in EL2 at a fixed (and potentially | |
97 | random) offset from the linear mapping. See the kern_hyp_va macro and | |
98 | kvm_update_va_mask function for more details. MMIO devices such as | |
99 | GICv2 get mapped next to the HYP idmap page, as do vectors when | |
c4792b6d | 100 | ARM64_SPECTRE_V3A is enabled for particular CPUs. |
b693d0b3 MCC |
101 | |
102 | When using KVM with the Virtualization Host Extensions, no additional | |
103 | mappings are created, since the host kernel runs directly in EL2. | |
d2c68de1 SC |
104 | |
105 | 52-bit VA support in the kernel | |
106 | ------------------------------- | |
107 | If the ARMv8.2-LVA optional feature is present, and we are running | |
108 | with a 64KB page size, then it is possible to use 52 bits of address | |
109 | space for both userspace and kernel addresses. However, any kernel | |
110 | binary that supports 52-bit must also be able to fall back to 48-bit | |
111 | at early boot time if the hardware feature is not present. | |
112 | ||
113 | This fallback mechanism requires the kernel .text to be in the | |
114 | higher addresses so that its location is invariant to 48/52-bit VAs. Due | |
115 | to the kasan shadow being a fraction of the entire kernel VA space, | |
116 | the end of the kasan shadow must also be in the higher half of the | |
117 | kernel VA space for both 48/52-bit. (Switching from 48-bit to 52-bit, | |
118 | the end of the kasan shadow is invariant and dependent on ~0UL, | |
119 | whilst the start address will "grow" towards the lower addresses). | |
120 | ||
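The fixed end and "growing" start of the kasan shadow can be checked against the layout tables. The sketch below assumes the shadow covers one eighth of the kernel VA space, which is inferred from the table sizes (32TB of 256TB, 512TB of 4PB), and takes the exclusive end address ffff800000000000 from the same tables. The macro and function names are illustrative, not the kernel's own definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one shadow byte per 8 bytes of kernel VA space. */
#define SHADOW_SCALE_SHIFT    3
/* Exclusive end of the shadow region in both layout tables. */
#define SHADOW_END_EXCLUSIVE  UINT64_C(0xffff800000000000)

/* Shadow start: the fixed end minus (kernel VA space size >> scale shift). */
static uint64_t kasan_shadow_start(unsigned int va_bits)
{
    assert(va_bits > SHADOW_SCALE_SHIFT && va_bits < 64);
    return SHADOW_END_EXCLUSIVE - (UINT64_C(1) << (va_bits - SHADOW_SCALE_SHIFT));
}
```

With 48-bit VAs this yields ffff600000000000 and with 52-bit VAs fffd800000000000, matching the bracketed kasan rows in the two tables: the end stays put while the start moves down.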
121 | In order to optimise phys_to_virt and virt_to_phys, the PAGE_OFFSET | |
122 | is kept constant at 0xFFF0000000000000 (corresponding to 52-bit); | |
123 | this obviates the need for an extra variable read. The physvirt | |
124 | offset and vmemmap offsets are computed at early boot to enable | |
125 | this logic. | |
126 | ||
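The constant quoted above is simply the lowest kernel address for a 52-bit VA space. The sketch below computes the lowest kernel address for a given VA width, to show how the fixed PAGE_OFFSET relates to the start of the 48-bit kernel range in the layout table. The function name is hypothetical and this is not how the kernel defines its macros.

```c
#include <assert.h>
#include <stdint.h>

/* Lowest kernel virtual address for a VA width: all bits above va_bits set. */
static uint64_t kernel_va_start(unsigned int va_bits)
{
    assert(va_bits > 0 && va_bits < 64);
    return UINT64_C(0) - (UINT64_C(1) << va_bits);
}
```

kernel_va_start(52) evaluates to 0xFFF0000000000000, the fixed PAGE_OFFSET, while kernel_va_start(48) evaluates to 0xFFFF000000000000, where the 48-bit kernel range begins. The gap between the two is what the boot-time physvirt and vmemmap offsets have to account for when the hardware falls back to 48-bit.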
127 | As a single binary will need to support both 48-bit and 52-bit VA | |
128 | spaces, the VMEMMAP must be sized large enough for 52-bit VAs and | |
ce4a64e1 | 129 | also must be sized large enough to accommodate a fixed PAGE_OFFSET. |
d2c68de1 SC |
130 | |
131 | Most code in the kernel should not need to consider the VA_BITS. For | |
132 | code that does need to know the VA size, the variables are | |
133 | defined as follows: | |
134 | ||
135 | VA_BITS constant the *maximum* VA space size | |
136 | ||
137 | VA_BITS_MIN constant the *minimum* VA space size | |
138 | ||
139 | vabits_actual variable the *actual* VA space size | |
140 | ||
141 | ||
142 | Maximum and minimum sizes can be useful to ensure that buffers are | |
143 | sized large enough or that addresses are positioned close enough for | |
144 | the "worst" case. | |
145 | ||
146 | 52-bit userspace VAs | |
147 | -------------------- | |
148 | To maintain compatibility with software that relies on the ARMv8.0 | |
149 | VA space maximum size of 48 bits, the kernel will, by default, | |
150 | return virtual addresses to userspace from a 48-bit range. | |
151 | ||
152 | Software can "opt-in" to receiving VAs from a 52-bit space by | |
153 | specifying an mmap hint parameter that is larger than 48 bits. | |
a2b99dca | 154 | |
d2c68de1 | 155 | For example: |
a2b99dca AZ |
156 | |
157 | .. code-block:: c | |
158 | ||
159 | maybe_high_address = mmap((void *)~0UL, size, prot, flags,...); | |
d2c68de1 SC |
160 | |
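A slightly fuller version of the opt-in call might look like the sketch below. The helper name is hypothetical, and the choice of an anonymous private read/write mapping is an assumption made to keep the example self-contained. Whether the returned address actually lies above the 48-bit range depends on the kernel configuration and hardware, so the caller should only treat the hint as a request.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS under strict C standards */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Request an anonymous mapping while passing a hint above the 48-bit
 * range. Without MAP_FIXED the hint is advisory: the kernel may place
 * the mapping anywhere it considers suitable.
 */
static void *map_with_high_hint(size_t size)
{
    assert(size > 0);
    return mmap((void *)~0UL, size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```

The result should be compared against MAP_FAILED before use and released with munmap() when no longer needed.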
161 | It is also possible to build a debug kernel that returns addresses | |
162 | from a 52-bit space by enabling the following kernel config options: | |
a2b99dca AZ |
163 | |
164 | .. code-block:: sh | |
165 | ||
d2c68de1 SC |
166 | CONFIG_EXPERT=y && CONFIG_ARM64_FORCE_52BIT=y |
167 | ||
168 | Note that this option is only intended for debugging applications | |
169 | and should not be used in production. |