Commit | Line | Data |
---|---|---|
b693d0b3 MCC |
1 | ===================== |
2 | Booting AArch64 Linux | |
3 | ===================== | |
9703d9d7 CM |
4 | |
5 | Author: Will Deacon <will.deacon@arm.com> | |
b693d0b3 | 6 | |
9703d9d7 CM |
7 | Date : 07 September 2012 |
8 | ||
9 | This document is based on the ARM booting document by Russell King and | |
10 | is relevant to all public releases of the AArch64 Linux kernel. | |
11 | ||
12 | The AArch64 exception model is made up of a number of exception levels | |
13 | (EL0 - EL3), with EL0 and EL1 having a secure and a non-secure | |
14 | counterpart. EL2 is the hypervisor level and exists only in non-secure | |
15 | mode. EL3 is the most privileged level and exists only in secure mode. | |
16 | ||
b693d0b3 | 17 | For the purposes of this document, we will use the term `boot loader` |
9703d9d7 CM |
18 | simply to define all software that executes on the CPU(s) before control |
19 | is passed to the Linux kernel. This may include secure monitor and | |
20 | hypervisor code, or it may just be a handful of instructions for | |
21 | preparing a minimal boot environment. | |
22 | ||
23 | Essentially, the boot loader should provide (as a minimum) the | |
24 | following: | |
25 | ||
26 | 1. Set up and initialise the RAM | |
27 | 2. Set up the device tree | |
28 | 3. Decompress the kernel image | |
29 | 4. Call the kernel image | |
30 | ||
31 | ||
32 | 1. Set up and initialise RAM | |
33 | ---------------------------- | |
34 | ||
35 | Requirement: MANDATORY | |
36 | ||
37 | The boot loader is expected to find and initialise all RAM that the | |
38 | kernel will use for volatile data storage in the system. It performs | |
39 | this in a machine dependent manner. (It may use internal algorithms | |
40 | to automatically locate and size all RAM, or it may use knowledge of | |
41 | the RAM in the machine, or any other method the boot loader designer | |
42 | sees fit.) | |
43 | ||
44 | ||
45 | 2. Set up the device tree | |
46 | ------------------------- | |
47 | ||
48 | Requirement: MANDATORY | |
49 | ||
61bd93ce AB |
50 | The device tree blob (dtb) must be placed on an 8-byte boundary and must |
51 | not exceed 2 megabytes in size. Since the dtb will be mapped cacheable | |
52 | using blocks of up to 2 megabytes in size, it must not be placed within | |
53 | any 2M region which must be mapped with any specific attributes. | |
9703d9d7 | 54 | |
61bd93ce AB |
55 | NOTE: versions prior to v4.2 also require that the DTB be placed within |
56 | the 512 MB region starting at text_offset bytes below the kernel Image. | |
9703d9d7 CM |
57 | |
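The placement rules for the dtb can be checked mechanically before kernel entry. The following is a minimal Python sketch, not taken from any boot loader: the function names and the `special_blocks` parameter (a list of 2M-aligned base addresses that the platform must map with specific attributes) are hypothetical and used for illustration only. It does not cover the additional pre-v4.2 placement rule.

```python
DTB_ALIGN = 8                    # the dtb must start on an 8-byte boundary
DTB_MAX_SIZE = 2 * 1024 * 1024   # the dtb must not exceed 2 megabytes
BLOCK_2M = 2 * 1024 * 1024       # granularity of the cacheable dtb mapping


def blocks_2m(addr, size):
    """Return the base address of every 2M region touched by [addr, addr + size)."""
    first = addr & ~(BLOCK_2M - 1)
    last = (addr + size - 1) & ~(BLOCK_2M - 1)
    return list(range(first, last + BLOCK_2M, BLOCK_2M))


def dtb_placement_ok(addr, size, special_blocks=()):
    """Check alignment, size, and overlap with regions needing special attributes.

    special_blocks is a hypothetical, platform-supplied collection of
    2M-aligned base addresses that must be mapped with specific attributes.
    """
    if addr % DTB_ALIGN != 0:
        return False
    if size <= 0 or size > DTB_MAX_SIZE:
        return False
    return not any(base in special_blocks for base in blocks_2m(addr, size))
```

A dtb that straddles a 2M boundary touches two regions, so both must be free of special mapping requirements.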
58 | 3. Decompress the kernel image | |
59 | ------------------------------ | |
60 | ||
61 | Requirement: OPTIONAL | |
62 | ||
63 | The AArch64 kernel does not currently provide a decompressor and | |
64 | therefore requires decompression (gzip etc.) to be performed by the boot | |
65 | loader if a compressed Image target (e.g. Image.gz) is used. For | |
66 | boot loaders that do not implement this requirement, the uncompressed | |
67 | Image target is available instead. | |
68 | ||
69 | ||
70 | 4. Call the kernel image | |
71 | ------------------------ | |
72 | ||
73 | Requirement: MANDATORY | |
74 | ||
b693d0b3 | 75 | The decompressed kernel image contains a 64-byte header as follows:: |
9703d9d7 | 76 | |
4370eec0 RF |
77 | u32 code0; /* Executable code */ |
78 | u32 code1; /* Executable code */ | |
a2c1d73b MR |
79 | u64 text_offset; /* Image load offset, little endian */ |
80 | u64 image_size; /* Effective Image size, little endian */ | |
81 | u64 flags; /* kernel flags, little endian */ | |
9703d9d7 | 82 | u64 res2 = 0; /* reserved */ |
4370eec0 RF |
83 | u64 res3 = 0; /* reserved */ |
84 | u64 res4 = 0; /* reserved */ | |
85 | u32 magic = 0x644d5241; /* Magic number, little endian, "ARM\x64" */ | |
6c020ea8 | 86 | u32 res5; /* reserved (used for PE COFF offset) */ |
4370eec0 RF |
87 | |
88 | ||
89 | Header notes: | |
90 | ||
a2c1d73b MR |
91 | - As of v3.17, all fields are little endian unless stated otherwise. |
92 | ||
4370eec0 | 93 | - code0/code1 are responsible for branching to stext. |
a2c1d73b | 94 | |
cdd78578 MS |
95 | - When booting through EFI, code0/code1 are initially skipped. |
96 | res5 is an offset to the PE header and the PE header has the EFI | |
a2c1d73b | 97 | entry point (efi_stub_entry). When the stub has done its work, it |
cdd78578 | 98 | jumps to code0 to resume the normal boot process. |
9703d9d7 | 99 | |
a2c1d73b MR |
100 | - Prior to v3.17, the endianness of text_offset was not specified. In |
101 | these cases image_size is zero and text_offset is 0x80000 in the | |
102 | endianness of the kernel. Where image_size is non-zero, image_size is | |
103 | little-endian and must be respected. Where image_size is zero, | |
104 | text_offset can be assumed to be 0x80000. | |
105 | ||
106 | - The flags field (introduced in v3.17) is a little-endian 64-bit field | |
107 | composed as follows: | |
b693d0b3 MCC |
108 | |
109 | ============= =============================================================== | |
110 | Bit 0 Kernel endianness. 1 if BE, 0 if LE. | |
111 | Bit 1-2 Kernel Page size. | |
112 | ||
113 | * 0 - Unspecified. | |
114 | * 1 - 4K | |
115 | * 2 - 16K | |
116 | * 3 - 64K | |
117 | Bit 3 Kernel physical placement | |
118 | ||
119 | 0 | |
120 | 2MB aligned base should be as close as possible | |
121 | to the base of DRAM, since memory below it is not | |
122 | accessible via the linear mapping | |
123 | 1 | |
124 | 2MB aligned base may be anywhere in physical | |
125 | memory | |
126 | Bits 4-63 Reserved. | |
127 | ============= =============================================================== | |
a2c1d73b MR |
128 | |
129 | - When image_size is zero, a boot loader should attempt to keep as much | |
130 | memory as possible free for use by the kernel immediately after the | |
131 | end of the kernel image. The amount of space required will vary | |
132 | depending on selected features, and is effectively unbounded. | |
133 | ||
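The header layout and the notes above can be illustrated with a short decoder. This is a minimal Python sketch under stated assumptions, not the kernel's or any boot loader's implementation: `parse_image_header` and the dictionary keys it returns are hypothetical names, and only the fields and flag bits listed above are interpreted.

```python
import struct

# Two u32 code words, six u64 fields, two u32 fields, all little endian.
HEADER_FORMAT = "<2I6Q2I"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)   # 64 bytes
ARM64_MAGIC = 0x644D5241                       # "ARM\x64" in memory order
PAGE_SIZES = {0: "unspecified", 1: "4K", 2: "16K", 3: "64K"}


def parse_image_header(data):
    """Decode the 64-byte Image header from a bytes-like object."""
    if len(data) < HEADER_SIZE:
        raise ValueError("image is shorter than the 64-byte header")
    (_code0, _code1, text_offset, image_size, flags,
     _res2, _res3, _res4, magic, res5) = struct.unpack_from(HEADER_FORMAT, data)
    if magic != ARM64_MAGIC:
        raise ValueError("bad magic number")
    if image_size == 0:
        # Pre-v3.17 image: the endianness of text_offset is unspecified,
        # so the documented default of 0x80000 is assumed instead.
        text_offset = 0x80000
    return {
        "text_offset": text_offset,
        "image_size": image_size,
        "big_endian": bool(flags & 1),              # bit 0
        "page_size": PAGE_SIZES[(flags >> 1) & 3],  # bits 1-2
        "place_anywhere": bool(flags & (1 << 3)),   # bit 3
        "pe_offset": res5,
    }
```

Falling back to 0x80000 when image_size is zero keeps the sketch usable on older images without guessing the byte order of text_offset.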
134 | The Image must be placed text_offset bytes from a 2MB aligned base | |
a7f8de16 AB |
135 | address anywhere in usable system RAM and called there. The region |
136 | between the 2 MB aligned base address and the start of the image has no | |
137 | special significance to the kernel, and may be used for other purposes. | |
a2c1d73b MR |
138 | At least image_size bytes from the start of the image must be free for |
139 | use by the kernel. | |
a7f8de16 AB |
140 | NOTE: versions prior to v4.6 cannot make use of memory below the |
141 | physical offset of the Image so it is recommended that the Image be | |
142 | placed as close as possible to the start of system RAM. | |
a2c1d73b | 143 | |
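The placement arithmetic above is simple enough to sketch. The Python below is illustrative only: the function names are hypothetical, and the choice of the lowest 2MB aligned address at or above the start of DRAM simply follows the recommendation for kernels that cannot use memory below the Image.

```python
SZ_2M = 2 * 1024 * 1024


def lowest_aligned_base(dram_start):
    """Round dram_start up to the next 2MB boundary."""
    return (dram_start + SZ_2M - 1) & ~(SZ_2M - 1)


def image_load_address(base, text_offset):
    """Return the address at which the Image must be placed and called."""
    if base % SZ_2M != 0:
        raise ValueError("base address must be 2MB aligned")
    return base + text_offset


def image_region(base, text_offset, image_size):
    """Return (start, end) of the memory that must be free for the kernel.

    end is exclusive; image_size comes from the Image header.
    """
    start = image_load_address(base, text_offset)
    return start, start + image_size
```

For example, with DRAM starting at 0x40000000 and a text_offset of 0x80000, the Image would be loaded at 0x40080000.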
177e15f0 AB |
144 | If an initrd/initramfs is passed to the kernel at boot, it must reside |
145 | entirely within a 1 GB aligned physical memory window of up to 32 GB in | |
146 | size that fully covers the kernel Image as well. | |
147 | ||
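The initrd constraint can be checked as sketched below. This Python fragment is a hedged illustration with hypothetical names. It assumes that the window in question starts at the 1 GB boundary at or below the lower of the two regions, and that the constraint is met when both regions end within 32 GB of that boundary.

```python
SZ_1G = 1 << 30
WINDOW_MAX = 32 * SZ_1G


def initrd_window_ok(image_start, image_size, initrd_start, initrd_size):
    """Check that Image and initrd fit in one 1 GB aligned window of at most 32 GB."""
    lowest = min(image_start, initrd_start)
    highest = max(image_start + image_size, initrd_start + initrd_size)
    window_base = lowest & ~(SZ_1G - 1)   # round down to a 1 GB boundary
    return highest - window_base <= WINDOW_MAX
```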
6c020ea8 AB |
148 | Any memory described to the kernel (even that below the start of the |
149 | image) which is not marked as reserved from the kernel (e.g., with a | |
a2c1d73b MR |
150 | memreserve region in the device tree) will be considered as available to |
151 | the kernel. | |
9703d9d7 CM |
152 | |
153 | Before jumping into the kernel, the following conditions must be met: | |
154 | ||
155 | - Quiesce all DMA capable devices so that memory does not get | |
156 | corrupted by bogus network packets or disk data. This will save | |
157 | you many hours of debugging. | |
158 | ||
b693d0b3 MCC |
159 | - Primary CPU general-purpose register settings: |
160 | ||
161 | - x0 = physical address of device tree blob (dtb) in system RAM. | |
162 | - x1 = 0 (reserved for future use) | |
163 | - x2 = 0 (reserved for future use) | |
164 | - x3 = 0 (reserved for future use) | |
9703d9d7 CM |
165 | |
166 | - CPU mode | |
b693d0b3 | 167 | |
9703d9d7 CM |
168 | All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError, |
169 | IRQ and FIQ). | |
170 | The CPU must be in either EL2 (RECOMMENDED in order to have access to | |
171 | the virtualisation extensions) or non-secure EL1. | |
172 | ||
173 | - Caches, MMUs | |
b693d0b3 | 174 | |
9703d9d7 | 175 | The MMU must be off. |
877a37d3 | 176 | |
e24e03aa WD |
177 | The instruction cache may be on or off, and must not hold any stale |
178 | entries corresponding to the loaded kernel image. | |
877a37d3 | 179 | |
c218bca7 CM |
180 | The address range corresponding to the loaded kernel image must be |
181 | cleaned to the PoC. In the presence of a system cache or other | |
182 | coherent masters with caches enabled, this will typically require | |
183 | cache maintenance by VA rather than set/way operations. | |
184 | System caches which respect the architected cache maintenance by VA | |
185 | operations must be configured and may be enabled. | |
186 | System caches which do not respect architected cache maintenance by VA | |
187 | operations (not recommended) must be configured and disabled. | |
9703d9d7 CM |
188 | |
189 | - Architected timers | |
b693d0b3 | 190 | |
4fcd6e14 MR |
191 | CNTFRQ must be programmed with the timer frequency and CNTVOFF must |
192 | be programmed with a consistent value on all CPUs. If entering the | |
193 | kernel at EL1, CNTHCTL_EL2 must have EL1PCTEN (bit 0) set where | |
194 | available. | |
9703d9d7 CM |
195 | |
196 | - Coherency | |
b693d0b3 | 197 | |
9703d9d7 CM |
198 | All CPUs to be booted by the kernel must be part of the same coherency |
199 | domain on entry to the kernel. This may require IMPLEMENTATION DEFINED | |
200 | initialisation to enable the receiving of maintenance operations on | |
201 | each CPU. | |
202 | ||
203 | - System registers | |
b693d0b3 | 204 | |
230800cd MB |
205 | All writable architected system registers at or below the exception |
206 | level where the kernel image will be entered must be initialised by | |
207 | software at a higher exception level to prevent execution in an UNKNOWN | |
208 | state. | |
9703d9d7 | 209 | |
d98d0a99 JT |
210 | - SCR_EL3.FIQ must have the same value across all CPUs the kernel is |
211 | executing on. | |
212 | - The value of SCR_EL3.FIQ must be the same as the one present at boot | |
213 | time whenever the kernel is executing. | |
214 | ||
6d32ab2d | 215 | For systems with a GICv3 interrupt controller to be used in v3 mode: |
63f8344c | 216 | - If EL3 is present: |
b693d0b3 MCC |
217 | |
218 | - ICC_SRE_EL3.Enable (bit 3) must be initialised to 0b1. | |
219 | - ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b1. | |
7e3a57fa MZ |
220 | - ICC_CTLR_EL3.PMHE (bit 6) must be set to the same value across |
221 | all CPUs the kernel is executing on, and must stay constant | |
222 | for the lifetime of the kernel. | |
b693d0b3 | 223 | |
63f8344c | 224 | - If the kernel is entered at EL1: |
b693d0b3 MCC |
225 | |
226 | - ICC_SRE_EL2.Enable (bit 3) must be initialised to 0b1. | |
227 | - ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b1. | |
228 | ||
6d32ab2d MZ |
229 | - The DT or ACPI tables must describe a GICv3 interrupt controller. |
230 | ||
231 | For systems with a GICv3 interrupt controller to be used in | |
232 | compatibility (v2) mode: | |
b693d0b3 | 233 | |
6d32ab2d | 234 | - If EL3 is present: |
b693d0b3 MCC |
235 | |
236 | ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b0. | |
237 | ||
6d32ab2d | 238 | - If the kernel is entered at EL1: |
b693d0b3 MCC |
239 | |
240 | ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b0. | |
241 | ||
6d32ab2d | 242 | - The DT or ACPI tables must describe a GICv2 interrupt controller. |
63f8344c | 243 | |
fbedc599 | 244 | For CPUs with pointer authentication functionality: |
877a37d3 | 245 | |
fbedc599 | 246 | - If EL3 is present: |
b693d0b3 MCC |
247 | |
248 | - SCR_EL3.APK (bit 16) must be initialised to 0b1 | |
249 | - SCR_EL3.API (bit 17) must be initialised to 0b1 | |
250 | ||
fbedc599 | 251 | - If the kernel is entered at EL1: |
b693d0b3 MCC |
252 | |
253 | - HCR_EL2.APK (bit 40) must be initialised to 0b1 | |
254 | - HCR_EL2.API (bit 41) must be initialised to 0b1 | |
fbedc599 | 255 | |
6abde908 | 256 | For CPUs with Activity Monitors Unit v1 (AMUv1) extension present: |
877a37d3 | 257 | |
6abde908 | 258 | - If EL3 is present: |
877a37d3 MCC |
259 | |
260 | - CPTR_EL3.TAM (bit 30) must be initialised to 0b0 | |
261 | - CPTR_EL2.TAM (bit 30) must be initialised to 0b0 | |
262 | - AMCNTENSET0_EL0 must be initialised to 0b1111 | |
263 | - AMCNTENSET1_EL0 must be initialised to a platform specific value | |
264 | having 0b1 set for the corresponding bit for each of the auxiliary | |
265 | counters present. | |
266 | ||
6abde908 | 267 | - If the kernel is entered at EL1: |
877a37d3 MCC |
268 | |
269 | - AMCNTENSET0_EL0 must be initialised to 0b1111 | |
270 | - AMCNTENSET1_EL0 must be initialised to a platform specific value | |
271 | having 0b1 set for the corresponding bit for each of the auxiliary | |
272 | counters present. | |
6abde908 | 273 | |
3e237387 MB |
274 | For CPUs with the Fine Grained Traps (FEAT_FGT) extension present: |
275 | ||
276 | - If EL3 is present and the kernel is entered at EL2: | |
277 | ||
278 | - SCR_EL3.FGTEn (bit 27) must be initialised to 0b1. | |
279 | ||
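Several of the system register requirements above reduce to single-bit checks, which can be expressed as a table. The Python sketch below is illustrative only: the feature and condition keys, the `failed_bits` helper, and the idea of passing register values in a dictionary are all assumptions, since reading the real registers requires code running at the relevant exception level. Only a subset of the requirements listed above is encoded.

```python
# (register, bit, required value), copied from the requirements listed above.
REQUIRED_BITS = {
    ("gicv3_v3_mode", "el3_present"): [("ICC_SRE_EL3", 3, 1), ("ICC_SRE_EL3", 0, 1)],
    ("gicv3_v3_mode", "entered_at_el1"): [("ICC_SRE_EL2", 3, 1), ("ICC_SRE_EL2", 0, 1)],
    ("pauth", "el3_present"): [("SCR_EL3", 16, 1), ("SCR_EL3", 17, 1)],
    ("pauth", "entered_at_el1"): [("HCR_EL2", 40, 1), ("HCR_EL2", 41, 1)],
    ("fgt", "el3_present_entered_at_el2"): [("SCR_EL3", 27, 1)],
}


def failed_bits(regs, feature, condition):
    """Return the (register, bit) pairs that do not hold the required value.

    regs maps a register name to its value as an integer; how those values
    are obtained is outside the scope of this sketch.
    """
    failures = []
    for reg, bit, want in REQUIRED_BITS[(feature, condition)]:
        if (regs[reg] >> bit) & 1 != want:
            failures.append((reg, bit))
    return failures
```

An empty result means every encoded requirement for that feature and condition is satisfied.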
4fcd6e14 MR |
280 | The requirements described above for CPU mode, caches, MMUs, architected |
281 | timers, coherency and system registers apply to all CPUs. All CPUs must | |
282 | enter the kernel in the same exception level. | |
283 | ||
9703d9d7 CM |
284 | The boot loader is expected to enter the kernel on each CPU in the |
285 | following manner: | |
286 | ||
287 | - The primary CPU must jump directly to the first instruction of the | |
288 | kernel image. The device tree blob passed by this CPU must contain | |
4fcd6e14 MR |
289 | an 'enable-method' property for each cpu node. The supported |
290 | enable-methods are described below. | |
9703d9d7 CM |
291 | |
292 | It is expected that the boot loader will generate these device tree | |
293 | properties and insert them into the blob prior to kernel entry. | |
294 | ||
4fcd6e14 MR |
295 | - CPUs with a "spin-table" enable-method must have a 'cpu-release-addr' |
296 | property in their cpu node. This property identifies a | |
297 | naturally-aligned 64-bit zero-initialised memory location. | |
298 | ||
299 | These CPUs should spin outside of the kernel in a reserved area of | |
300 | memory (communicated to the kernel by a /memreserve/ region in the | |
9703d9d7 CM |
301 | device tree) polling their cpu-release-addr location, which must be |
302 | contained in the reserved region. A wfe instruction may be inserted | |
303 | to reduce the overhead of the busy-loop and a sev will be issued by | |
304 | the primary CPU. When a read of the location pointed to by the | |
4fcd6e14 MR |
305 | cpu-release-addr returns a non-zero value, the CPU must jump to this |
306 | value. The value will be written as a single 64-bit little-endian | |
307 | value, so CPUs must convert the read value to their native endianness | |
308 | before jumping to it. | |
309 | ||
310 | - CPUs with a "psci" enable-method should remain outside of | |
311 | the kernel (i.e. outside of the regions of memory described to the | |
312 | kernel in the memory node, or in a reserved area of memory described | |
313 | to the kernel by a /memreserve/ region in the device tree). The | |
314 | kernel will issue CPU_ON calls as described in ARM document number ARM | |
315 | DEN 0022A ("Power State Coordination Interface System Software on ARM | |
316 | processors") to bring CPUs into the kernel. | |
317 | ||
318 | The device tree should contain a 'psci' node, as described in | |
5025ef8b | 319 | Documentation/devicetree/bindings/arm/psci.yaml. |
9703d9d7 CM |
320 | |
321 | - Secondary CPU general-purpose register settings | |
877a37d3 MCC |
322 | |
323 | - x0 = 0 (reserved for future use) | |
324 | - x1 = 0 (reserved for future use) | |
325 | - x2 = 0 (reserved for future use) | |
326 | - x3 = 0 (reserved for future use) |
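The spin-table release protocol described earlier in this list can be modelled in a few lines. The Python below is a simulation under stated assumptions, not secondary CPU boot code: memory is represented by a bytearray, the function names are hypothetical, and the poll limit exists only so that the example terminates. A real secondary CPU would loop indefinitely, optionally executing wfe between reads.

```python
import struct


def read_release_addr(memory, cpu_release_addr):
    """Decode the 64-bit little-endian value stored at cpu_release_addr."""
    (value,) = struct.unpack_from("<Q", memory, cpu_release_addr)
    return value


def poll_release(memory, cpu_release_addr, max_polls):
    """Poll the release location; return the entry address, or None if still zero."""
    if cpu_release_addr % 8 != 0:
        raise ValueError("cpu-release-addr must be naturally aligned")
    for _ in range(max_polls):
        entry = read_release_addr(memory, cpu_release_addr)
        if entry != 0:
            return entry   # a real CPU would jump to this address
        # A real secondary CPU may execute wfe here to reduce polling overhead.
    return None
```

Decoding with an explicit little-endian format mirrors the requirement that a CPU must convert the value to its native endianness before jumping to it.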