1 | L1TF - L1 Terminal Fault |
2 | ======================== | |
3 | ||
4 | L1 Terminal Fault is a hardware vulnerability which allows unprivileged | |
5 | speculative access to data which is available in the Level 1 Data Cache | |
6 | when the page table entry controlling the virtual address, which is used | |
7 | for the access, has the Present bit cleared or other reserved bits set. | |
8 | ||
9 | Affected processors | |
10 | ------------------- | |
11 | ||
12 | This vulnerability affects a wide range of Intel processors. The | |
13 | vulnerability is not present on: | |
14 | ||
15 | - Processors from AMD, Centaur and other non-Intel vendors | |
16 | ||
17 | - Older processor models, where the CPU family is < 6 | |
18 | ||
19 | - A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft, | |
20 | Penwell, Pineview, Silvermont, Airmont, Merrifield) | |
21 | ||
22 | - The Intel XEON PHI family |
23 | ||
24 | - Intel processors which have the ARCH_CAP_RDCL_NO bit set in the | |
25 | IA32_ARCH_CAPABILITIES MSR. If the bit is set the CPU is not affected | |
26 | by the Meltdown vulnerability either. These CPUs should become | |
27 | available by end of 2018. | |
28 | ||
29 | Whether a processor is affected or not can be read out from the L1TF | |
30 | vulnerability file in sysfs. See :ref:`l1tf_sys_info`. | |
31 | ||
32 | Related CVEs | |
33 | ------------ | |
34 | ||
35 | The following CVE entries are related to the L1TF vulnerability: | |
36 | ||
37 | ============= ================= ============================== | |
38 | CVE-2018-3615 L1 Terminal Fault SGX related aspects | |
39 | CVE-2018-3620 L1 Terminal Fault OS, SMM related aspects | |
40 | CVE-2018-3646 L1 Terminal Fault Virtualization related aspects | |
41 | ============= ================= ============================== | |
42 | ||
43 | Problem | |
44 | ------- | |
45 | ||
46 | If an instruction accesses a virtual address for which the relevant page | |
47 | table entry (PTE) has the Present bit cleared or other reserved bits set, | |
48 | then speculative execution ignores the invalid PTE and loads the referenced | |
49 | data if it is present in the Level 1 Data Cache, as if the page referenced | |
50 | by the address bits in the PTE was still present and accessible. | |
51 | ||
52 | While this is a purely speculative mechanism and the instruction will raise | |
53 | a page fault when it is retired eventually, the pure act of loading the | |
54 | data and making it available to other speculative instructions opens up the | |
55 | opportunity for side channel attacks to unprivileged malicious code, | |
56 | similar to the Meltdown attack. | |
57 | ||
58 | While Meltdown breaks the user space to kernel space protection, L1TF | |
59 | allows an attacker to target any physical memory address in the system, | |
60 | and the attack works across all protection domains. It allows an attack | |
61 | on SGX and also works from inside virtual machines because the speculation | |
62 | bypasses the extended page table (EPT) protection mechanism. | |
63 | ||
64 | ||
65 | Attack scenarios | |
66 | ---------------- | |
67 | ||
68 | 1. Malicious user space | |
69 | ^^^^^^^^^^^^^^^^^^^^^^^ | |
70 | ||
71 | Operating Systems store arbitrary information in the address bits of a | |
72 | PTE which is marked non present. This allows a malicious user space | |
73 | application to attack the physical memory to which these PTEs resolve. | |
74 | In some cases user-space can maliciously influence the information | |
75 | encoded in the address bits of the PTE, thus making attacks more | |
76 | deterministic and more practical. | |
77 | ||
78 | The Linux kernel contains a mitigation for this attack vector, PTE | |
79 | inversion, which is permanently enabled and has no performance | |
80 | impact. The kernel ensures that the address bits of PTEs, which are not | |
81 | marked present, never point to cacheable physical memory space. | |
82 | ||
83 | A system with an up to date kernel is protected against attacks from | |
84 | malicious user space applications. | |
85 | ||
86 | 2. Malicious guest in a virtual machine | |
87 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
88 | ||
89 | The fact that L1TF breaks all domain protections allows malicious guest | |
90 | OSes, which can control the PTEs directly, and malicious guest user | |
91 | space applications, which run on an unprotected guest kernel lacking the | |
92 | PTE inversion mitigation for L1TF, to attack physical host memory. | |
93 | ||
94 | A special aspect of L1TF in the context of virtualization is symmetric | |
95 | multi threading (SMT). The Intel implementation of SMT is called | |
96 | HyperThreading. The fact that Hyperthreads on the affected processors | |
97 | share the L1 Data Cache (L1D) is important for this. Because the flaw | |
98 | only allows attacking data which is present in L1D, a malicious guest running | |
99 | on one Hyperthread can attack the data which is brought into the L1D by | |
100 | the context which runs on the sibling Hyperthread of the same physical | |
101 | core. This context can be host OS, host user space or a different guest. | |
102 | ||
103 | If the processor does not support Extended Page Tables, the attack is | |
104 | only possible when the hypervisor does not sanitize the content of the | |
105 | effective (shadow) page tables. | |
106 | ||
107 | While solutions exist to mitigate these attack vectors fully, these | |
108 | mitigations are not enabled by default in the Linux kernel because they | |
109 | can affect performance significantly. The kernel provides several | |
110 | mechanisms which can be utilized to address the problem depending on the | |
111 | deployment scenario. The mitigations, their protection scope and impact | |
112 | are described in the next sections. | |
113 | ||
114 | The default mitigations and the rationale for choosing them are explained | |
115 | at the end of this document. See :ref:`default_mitigations`. |
116 | ||
117 | .. _l1tf_sys_info: | |
118 | ||
119 | L1TF system information | |
120 | ----------------------- | |
121 | ||
122 | The Linux kernel provides a sysfs interface to enumerate the current L1TF | |
123 | status of the system: whether the system is vulnerable, and which | |
124 | mitigations are active. The relevant sysfs file is: | |
125 | ||
126 | /sys/devices/system/cpu/vulnerabilities/l1tf | |
127 | ||
128 | The possible values in this file are: | |
129 | ||
130 | =========================== =============================== | |
131 | 'Not affected' The processor is not vulnerable | |
132 | 'Mitigation: PTE Inversion' The host protection is active | |
133 | =========================== =============================== | |
134 | ||
135 | If KVM/VMX is enabled and the processor is vulnerable then the following | |
136 | information is appended to the 'Mitigation: PTE Inversion' part: | |
137 | ||
138 | - SMT status: | |
139 | ||
140 | ===================== ================ | |
141 | 'VMX: SMT vulnerable' SMT is enabled | |
142 | 'VMX: SMT disabled' SMT is disabled | |
143 | ===================== ================ | |
144 | ||
145 | - L1D Flush mode: | |
146 | ||
147 | ================================ ==================================== | |
148 | 'L1D vulnerable' L1D flushing is disabled | |
149 | ||
150 | 'L1D conditional cache flushes' L1D flush is conditionally enabled | |
151 | ||
152 | 'L1D cache flushes' L1D flush is unconditionally enabled | |
153 | ================================ ==================================== | |
154 | ||
155 | The resulting grade of protection is discussed in the following sections. | |
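The status strings listed above can be interpreted programmatically. The following is a minimal sketch, not a kernel-provided tool: the function name `parse_l1tf_status` and the returned keys are hypothetical, and because the exact way the kernel joins the individual parts is not spelled out here, the sketch only searches for the substrings shown in the tables.

```python
def parse_l1tf_status(text):
    """Interpret the content of the l1tf vulnerability file.

    Only the substrings documented in the tables above are matched;
    separators between the parts are deliberately ignored.
    """
    text = text.strip()
    status = {
        "affected": text != "Not affected",
        "pte_inversion": "Mitigation: PTE Inversion" in text,
        "smt": None,        # unknown unless KVM/VMX information is present
        "l1d_flush": None,  # unknown unless KVM/VMX information is present
    }
    if "SMT vulnerable" in text:
        status["smt"] = "enabled"
    elif "SMT disabled" in text:
        status["smt"] = "disabled"
    # Check the conditional variant first: it contains "cache flushes" too.
    if "conditional cache flushes" in text:
        status["l1d_flush"] = "cond"
    elif "cache flushes" in text:
        status["l1d_flush"] = "always"
    elif "L1D vulnerable" in text:
        status["l1d_flush"] = "never"
    return status
```

On a live system the input would typically come from reading /sys/devices/system/cpu/vulnerabilities/l1tf and passing the content to the function.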
156 | ||
157 | ||
158 | Host mitigation mechanism | |
159 | ------------------------- | |
160 | ||
161 | The kernel is unconditionally protected against L1TF attacks from malicious | |
162 | user space running on the host. | |
163 | ||
164 | ||
165 | Guest mitigation mechanisms | |
166 | --------------------------- | |
167 | ||
168 | .. _l1d_flush: | |
169 | ||
170 | 1. L1D flush on VMENTER | |
171 | ^^^^^^^^^^^^^^^^^^^^^^^ | |
172 | ||
173 | To make sure that a guest cannot attack data which is present in the L1D | |
174 | the hypervisor flushes the L1D before entering the guest. | |
175 | ||
176 | Flushing the L1D evicts not only the data which should not be accessed | |
177 | by a potentially malicious guest, it also flushes the guest | |
178 | data. Flushing the L1D has a performance impact as the processor has to | |
179 | bring the flushed guest data back into the L1D. Depending on the | |
180 | frequency of VMEXIT/VMENTER and the type of computations in the guest | |
181 | performance degradation in the range of 1% to 50% has been observed. For | |
182 | scenarios where guest VMEXIT/VMENTER are rare the performance impact is | |
183 | minimal. Virtio and mechanisms like posted interrupts are designed to | |
184 | confine the VMEXITs to a bare minimum, but specific configurations and | |
185 | application scenarios might still suffer from a high VMEXIT rate. | |
186 | ||
187 | The kernel provides two L1D flush modes: | |
188 | - conditional ('cond') | |
189 | - unconditional ('always') | |
190 | ||
191 | The conditional mode avoids L1D flushing after VMEXITs which execute | |
192 | only audited code paths before the corresponding VMENTER. These code | |
193 | paths have been verified not to expose secrets or other interesting | |
194 | data to an attacker, but they can leak information about the address | |
195 | space layout of the hypervisor. | |
196 | ||
197 | Unconditional mode flushes L1D on all VMENTER invocations and provides | |
198 | maximum protection. It has a higher overhead than the conditional | |
199 | mode. The overhead cannot be quantified correctly as it depends on the | |
200 | workload scenario and the resulting number of VMEXITs. | |
201 | ||
202 | The general recommendation is to enable L1D flush on VMENTER. The kernel | |
203 | defaults to conditional mode on affected processors. | |
204 | ||
205 | **Note**, that L1D flush does not prevent the SMT problem because the | |
206 | sibling thread will also bring back its data into the L1D which makes it | |
207 | attackable again. | |
208 | ||
209 | L1D flush can be controlled by the administrator via the kernel command | |
210 | line and sysfs control files. See :ref:`mitigation_control_command_line` | |
211 | and :ref:`mitigation_control_kvm`. | |
212 | ||
213 | .. _guest_confinement: | |
214 | ||
215 | 2. Guest VCPU confinement to dedicated physical cores | |
216 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
217 | ||
218 | To address the SMT problem, it is possible to make a guest or a group of | |
219 | guests affine to one or more physical cores. The proper mechanism for | |
220 | that is to utilize exclusive cpusets to ensure that no other guest or | |
221 | host tasks can run on these cores. | |
222 | ||
223 | If only a single guest or related guests run on sibling SMT threads on | |
224 | the same physical core then they can only attack their own memory and | |
225 | restricted parts of the host memory. | |
226 | ||
227 | Host memory is attackable when one of the sibling SMT threads runs in | |
228 | host OS (hypervisor) context and the other in guest context. The amount | |
229 | of valuable information from the host OS context depends on the context | |
230 | which the host OS executes, i.e. interrupts, soft interrupts and kernel | |
231 | threads. The amount of valuable data from these contexts cannot be | |
232 | declared as non-interesting for an attacker without deep inspection of | |
233 | the code. | |
234 | ||
235 | **Note**, that assigning guests to a fixed set of physical cores affects | |
236 | the ability of the scheduler to do load balancing and might have | |
237 | negative effects on CPU utilization depending on the hosting | |
238 | scenario. Disabling SMT might be a viable alternative for particular | |
239 | scenarios. | |
240 | ||
241 | For further information about confining guests to a single or to a group | |
242 | of cores consult the cpusets documentation: | |
243 | ||
244 | https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt | |
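Confinement only helps if a guest owns whole physical cores, i.e. every SMT sibling of each guest CPU is also assigned to that guest. The sketch below checks a proposed CPU set for this property. It assumes the sibling information is supplied as CPU list strings such as those found in the per-CPU `topology/thread_siblings_list` sysfs files, which are not described in this document; the function names are hypothetical.

```python
def parse_cpulist(text):
    """Parse a CPU list string such as "0-3,8" into a set of integers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        # A single CPU like "8" has no range end, so reuse the start.
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def cpus_sharing_core_with_outsiders(guest_cpus, siblings_by_cpu):
    """Return guest CPUs that have an SMT sibling outside the guest set.

    siblings_by_cpu maps a CPU number to its sibling CPU list string.
    An empty result means the guest set covers only whole cores.
    """
    guest = set(guest_cpus)
    return sorted(
        cpu for cpu in guest
        if not parse_cpulist(siblings_by_cpu[cpu]) <= guest
    )
```

For example, with siblings `{0: "0,4", 1: "1,5", 4: "0,4", 5: "1,5"}`, the guest set `[0, 1, 4]` is reported as problematic for CPU 1, because its sibling CPU 5 could still run host or other guest code.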
245 | ||
246 | .. _interrupt_isolation: | |
247 | ||
248 | 3. Interrupt affinity | |
249 | ^^^^^^^^^^^^^^^^^^^^^ | |
250 | ||
251 | Interrupts can be made affine to logical CPUs. This is not universally | |
252 | true because there are types of interrupts which are truly per CPU | |
253 | interrupts, e.g. the local timer interrupt. Aside from that, multi-queue | |
254 | devices affine their interrupts to single CPUs or groups of CPUs per | |
255 | queue without allowing the administrator to control the affinities. | |
256 | ||
257 | Moving the interrupts, which can be affinity controlled, away from CPUs | |
258 | which run untrusted guests, reduces the attack vector space. | |
259 | ||
260 | Whether the interrupts which are affine to CPUs that run untrusted | |
261 | guests provide interesting data for an attacker depends on the system | |
262 | configuration and the scenarios which run on the system. While for some | |
263 | of the interrupts it can be assumed that they won't expose interesting | |
264 | information beyond exposing hints about the host OS memory layout, there | |
265 | is no way to make general assumptions. | |
266 | ||
267 | Interrupt affinity can be controlled by the administrator via the | |
268 | /proc/irq/$NR/smp_affinity[_list] files. Limited documentation is | |
269 | available at: | |
270 | ||
271 | https://www.kernel.org/doc/Documentation/IRQ-affinity.txt | |
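As an illustration of moving controllable interrupts away from guest CPUs, the following sketch computes the CPU list string that an administrator could write to a `smp_affinity_list` file. It only builds the string; the function names are hypothetical, and whether a given interrupt accepts the new affinity depends on the interrupt type, as described above.

```python
def format_cpulist(cpus):
    """Format CPU numbers as a compact list string, e.g. "0-1,4-5"."""
    ranges = []
    start = prev = None
    for cpu in sorted(set(cpus)):
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu                      # extend the current range
        else:
            ranges.append((start, prev))    # close the range, start a new one
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def host_irq_cpulist(online_cpus, guest_cpus):
    """CPU list for interrupts: all online CPUs except those running guests."""
    return format_cpulist(set(online_cpus) - set(guest_cpus))
```

With eight online CPUs and guests confined to CPUs 2, 3, 6 and 7, `host_irq_cpulist(range(8), [2, 3, 6, 7])` yields the list of the remaining host CPUs.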
272 | ||
273 | .. _smt_control: | |
274 | ||
275 | 4. SMT control | |
276 | ^^^^^^^^^^^^^^ | |
277 | ||
278 | To prevent the SMT issues of L1TF it might be necessary to disable SMT | |
279 | completely. Disabling SMT can have a significant performance impact, but | |
280 | the impact depends on the hosting scenario and the type of workloads. | |
281 | The impact of disabling SMT also needs to be weighed against the impact | |
282 | of other mitigation solutions like confining guests to dedicated cores. | |
283 | ||
284 | The kernel provides a sysfs interface to retrieve the status of SMT and | |
285 | to control it. It also provides a kernel command line interface to | |
286 | control SMT. | |
287 | ||
288 | The kernel command line interface consists of the following options: | |
289 | ||
290 | =========== ========================================================== | |
291 | nosmt Affects the bring up of the secondary CPUs during boot. The | |
292 | kernel tries to bring all present CPUs online during the | |
293 | boot process. "nosmt" makes sure that from each physical | |
294 | core only one - the so called primary (hyper) thread is | |
295 | activated. Due to a design flaw of Intel processors related | |
296 | to Machine Check Exceptions the non primary siblings have | |
297 | to be brought up at least partially and are then shut down | |
298 | again. "nosmt" can be undone via the sysfs interface. | |
299 | ||
300 | nosmt=force Has the same effect as "nosmt" but does not allow undoing | |
301 | the SMT disable via the sysfs interface. | |
302 | =========== ========================================================== | |
303 | ||
304 | The sysfs interface provides two files: | |
305 | ||
306 | - /sys/devices/system/cpu/smt/control | |
307 | - /sys/devices/system/cpu/smt/active | |
308 | ||
309 | /sys/devices/system/cpu/smt/control: | |
310 | ||
311 | This file allows reading out the SMT control state and provides the | |
312 | ability to disable or (re)enable SMT. The possible states are: | |
313 | ||
314 | ============== =================================================== | |
315 | on SMT is supported by the CPU and enabled. All | |
316 | logical CPUs can be onlined and offlined without | |
317 | restrictions. | |
318 | ||
319 | off SMT is supported by the CPU and disabled. Only | |
320 | the so called primary SMT threads can be onlined | |
321 | and offlined without restrictions. An attempt to | |
322 | online a non-primary sibling is rejected. | |
323 | ||
324 | forceoff Same as 'off' but the state cannot be controlled. | |
325 | Attempts to write to the control file are rejected. | |
326 | ||
327 | notsupported The processor does not support SMT. It's therefore | |
328 | not affected by the SMT implications of L1TF. | |
329 | Attempts to write to the control file are rejected. | |
330 | ============== =================================================== | |
331 | ||
332 | The possible states which can be written into this file to control SMT | |
333 | state are: | |
334 | ||
335 | - on | |
336 | - off | |
337 | - forceoff | |
338 | ||
339 | /sys/devices/system/cpu/smt/active: | |
340 | ||
341 | This file reports whether SMT is enabled and active, i.e. if on any | |
342 | physical core two or more sibling threads are online. | |
343 | ||
344 | SMT control is also possible at boot time via the l1tf kernel command | |
345 | line parameter in combination with L1D flush control. See | |
346 | :ref:`mitigation_control_command_line`. | |
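The two sysfs files can be combined into a single view of the SMT state. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the `active` file is assumed to contain "1" when SMT is active, which this document does not state explicitly.

```python
from pathlib import Path

SMT_DIR = Path("/sys/devices/system/cpu/smt")
KNOWN_STATES = {"on", "off", "forceoff", "notsupported"}


def interpret_smt(control, active):
    """Combine the raw contents of the control and active files."""
    control = control.strip()
    if control not in KNOWN_STATES:
        raise ValueError(f"unexpected SMT control state: {control!r}")
    return {
        "control": control,
        # Assumption: "1" means at least one core has two or more
        # sibling threads online.
        "active": active.strip() == "1",
        # Writes are rejected in the forceoff and notsupported states.
        "can_change": control in {"on", "off"},
    }


def read_smt(smt_dir=SMT_DIR):
    """Read both files from sysfs and interpret them."""
    smt_dir = Path(smt_dir)
    return interpret_smt(
        (smt_dir / "control").read_text(),
        (smt_dir / "active").read_text(),
    )
```

Separating `interpret_smt` from the file access keeps the state logic usable on captured values, for example in monitoring scripts that collect the files remotely.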
347 | ||
348 | 5. Disabling EPT | |
349 | ^^^^^^^^^^^^^^^^ | |
350 | ||
351 | Disabling EPT for virtual machines provides full mitigation for L1TF even | |
352 | with SMT enabled, because the effective page tables for guests are | |
353 | managed and sanitized by the hypervisor. However, disabling EPT has a | |
354 | significant performance impact, especially when the Meltdown mitigation | |
355 | KPTI is enabled. | |
356 | ||
357 | EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter. | |
358 | ||
359 | There is ongoing research and development for new mitigation mechanisms to | |
360 | address the performance impact of disabling SMT or EPT. | |
361 | ||
362 | .. _mitigation_control_command_line: | |
363 | ||
364 | Mitigation control on the kernel command line | |
365 | --------------------------------------------- | |
366 | ||
367 | The kernel command line allows control of the L1TF mitigations at boot | |
368 | time with the option "l1tf=". The valid arguments for this option are: | |
369 | ||
370 | ============ ============================================================= | |
371 | full Provides all available mitigations for the L1TF | |
372 | vulnerability. Disables SMT and enables all mitigations in | |
373 | the hypervisors, i.e. unconditional L1D flushing. | |
374 | ||
375 | SMT control and L1D flush control via the sysfs interface | |
376 | is still possible after boot. Hypervisors will issue a | |
377 | warning when the first VM is started in a potentially | |
378 | insecure configuration, i.e. SMT enabled or L1D flush | |
379 | disabled. | |
380 | ||
381 | full,force Same as 'full', but disables SMT and L1D flush runtime | |
382 | control. Implies the 'nosmt=force' command line option. | |
383 | (i.e. sysfs control of SMT is disabled.) | |
384 | ||
385 | flush Leaves SMT enabled and enables the default hypervisor | |
386 | mitigation, i.e. conditional L1D flushing. | |
387 | ||
388 | SMT control and L1D flush control via the sysfs interface | |
389 | is still possible after boot. Hypervisors will issue a | |
390 | warning when the first VM is started in a potentially | |
391 | insecure configuration, i.e. SMT enabled or L1D flush | |
392 | disabled. | |
393 | ||
394 | flush,nosmt Disables SMT and enables the default hypervisor mitigation, | |
395 | i.e. conditional L1D flushing. | |
396 | ||
397 | SMT control and L1D flush control via the sysfs interface | |
398 | is still possible after boot. Hypervisors will issue a | |
399 | warning when the first VM is started in a potentially | |
400 | insecure configuration, i.e. SMT enabled or L1D flush | |
401 | disabled. | |
402 | ||
403 | flush,nowarn Same as 'flush', but hypervisors will not warn when a VM is | |
404 | started in a potentially insecure configuration. | |
405 | ||
406 | off Disables hypervisor mitigations and doesn't emit any | |
407 | warnings. | |
408 | It also drops the swap size and available RAM limit restrictions |
409 | on both hypervisor and bare metal. | |
410 | ||
411 | ============ ============================================================= |
412 | ||
413 | The default is 'flush'. For details about L1D flushing see :ref:`l1d_flush`. | |
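The option table above can be summarized as a lookup structure. The sketch below is an illustration only: the dictionary layout and the function name are hypothetical, `None` marks a property the table does not state, and letting the last `l1tf=` occurrence win is an assumption of this sketch rather than documented kernel behaviour.

```python
DEFAULT_L1TF = "flush"

# Effects of each "l1tf=" argument as described in the table above.
L1TF_OPTIONS = {
    "full":         {"smt_disabled": True,  "l1d_flush": "always", "runtime_control": True,  "warn": True},
    "full,force":   {"smt_disabled": True,  "l1d_flush": "always", "runtime_control": False, "warn": True},
    "flush":        {"smt_disabled": False, "l1d_flush": "cond",   "runtime_control": True,  "warn": True},
    "flush,nosmt":  {"smt_disabled": True,  "l1d_flush": "cond",   "runtime_control": True,  "warn": True},
    "flush,nowarn": {"smt_disabled": False, "l1d_flush": "cond",   "runtime_control": True,  "warn": False},
    # Runtime control is not stated for "off" in the table.
    "off":          {"smt_disabled": False, "l1d_flush": "never",  "runtime_control": None,  "warn": False},
}


def l1tf_from_cmdline(cmdline):
    """Return (argument, effects) for the l1tf= option in a command line."""
    value = DEFAULT_L1TF
    for token in cmdline.split():
        if token.startswith("l1tf="):
            value = token[len("l1tf="):]  # assumption: last occurrence wins
    if value not in L1TF_OPTIONS:
        raise ValueError(f"unknown l1tf argument: {value!r}")
    return value, L1TF_OPTIONS[value]
```

A command line without any `l1tf=` option resolves to the documented default, 'flush'.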
414 | ||
415 | ||
416 | .. _mitigation_control_kvm: | |
417 | ||
418 | Mitigation control for KVM - module parameter | |
419 | ------------------------------------------------------------- | |
420 | ||
421 | The KVM hypervisor mitigation mechanism, flushing the L1D cache when | |
422 | entering a guest, can be controlled with a module parameter. | |
423 | ||
424 | The option/parameter is "kvm-intel.vmentry_l1d_flush=". It takes the | |
425 | following arguments: | |
426 | ||
427 | ============ ============================================================== | |
428 | always L1D cache flush on every VMENTER. | |
429 | ||
430 | cond Flush L1D on VMENTER only when the code between VMEXIT and | |
431 | VMENTER can leak host memory which is considered | |
432 | interesting for an attacker. This can still leak host memory | |
433 | which allows e.g. determining the host's address space layout. | |
434 | ||
435 | never Disables the mitigation | |
436 | ============ ============================================================== | |
437 | ||
438 | The parameter can be provided on the kernel command line, as a module | |
439 | parameter when loading the modules, or modified at runtime via the sysfs | |
440 | file: | |
441 | ||
442 | /sys/module/kvm_intel/parameters/vmentry_l1d_flush | |
443 | ||
444 | The default is 'cond'. If 'l1tf=full,force' is given on the kernel command | |
445 | line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush | |
446 | module parameter is ignored and writes to the sysfs file are rejected. | |
447 | ||
448 | ||
449 | Mitigation selection guide | |
450 | -------------------------- | |
451 | ||
452 | 1. No virtualization in use | |
453 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
454 | ||
455 | The system is protected by the kernel unconditionally and no further | |
456 | action is required. | |
457 | ||
458 | 2. Virtualization with trusted guests | |
459 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
460 | ||
461 | If the guest comes from a trusted source and the guest OS kernel is | |
462 | guaranteed to have the L1TF mitigations in place the system is fully | |
463 | protected against L1TF and no further action is required. | |
464 | ||
465 | To avoid the overhead of the default L1D flushing on VMENTER the | |
466 | administrator can disable the flushing via the kernel command line and | |
467 | sysfs control files. See :ref:`mitigation_control_command_line` and | |
468 | :ref:`mitigation_control_kvm`. | |
469 | ||
470 | ||
471 | 3. Virtualization with untrusted guests | |
472 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
473 | ||
474 | 3.1. SMT not supported or disabled | |
475 | """""""""""""""""""""""""""""""""" | |
476 | ||
477 | If SMT is not supported by the processor or disabled in the BIOS or by | |
478 | the kernel, it's only required to enforce L1D flushing on VMENTER. | |
479 | ||
480 | Conditional L1D flushing is the default behaviour and can be tuned. See | |
481 | :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`. | |
482 | ||
483 | 3.2. EPT not supported or disabled | |
484 | """""""""""""""""""""""""""""""""" | |
485 | ||
486 | If EPT is not supported by the processor or disabled in the hypervisor, | |
487 | the system is fully protected. SMT can stay enabled and L1D flushing on | |
488 | VMENTER is not required. | |
489 | ||
490 | EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter. | |
491 | ||
492 | 3.3. SMT and EPT supported and active | |
493 | """"""""""""""""""""""""""""""""""""" | |
494 | ||
495 | If SMT and EPT are supported and active then various degrees of | |
496 | mitigations can be employed: | |
497 | ||
498 | - L1D flushing on VMENTER: | |
499 | ||
500 | L1D flushing on VMENTER is the minimal protection requirement, but it | |
501 | is only potent in combination with other mitigation methods. | |
502 | ||
503 | Conditional L1D flushing is the default behaviour and can be tuned. See | |
504 | :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`. | |
505 | ||
506 | - Guest confinement: | |
507 | ||
508 | Confinement of guests to a single or a group of physical cores which | |
509 | are not running any other processes, can reduce the attack surface | |
510 | significantly, but interrupts, soft interrupts and kernel threads can | |
511 | still expose valuable data to a potential attacker. See | |
512 | :ref:`guest_confinement`. | |
513 | ||
514 | - Interrupt isolation: | |
515 | ||
516 | Isolating the guest CPUs from interrupts can reduce the attack surface | |
517 | further, but still allows a malicious guest to explore a limited amount | |
518 | of host physical memory. This can at least be used to gain knowledge | |
519 | about the host address space layout. The interrupts which have a fixed | |
520 | affinity to the CPUs which run the untrusted guests can, depending on | |
521 | the scenario, still trigger soft interrupts and schedule kernel threads | |
522 | which might expose valuable information. See | |
523 | :ref:`interrupt_isolation`. | |
524 | ||
525 | The above three mitigation methods combined can provide protection to a | |
526 | certain degree, but the risk of the remaining attack surface has to be | |
527 | carefully analyzed. For full protection the following methods are | |
528 | available: | |
529 | ||
530 | - Disabling SMT: | |
531 | ||
532 | Disabling SMT and enforcing the L1D flushing provides the maximum | |
533 | amount of protection. This mitigation does not depend on any of the | |
534 | above mitigation methods. | |
535 | ||
536 | SMT control and L1D flushing can be tuned by the command line | |
537 | parameters 'nosmt', 'l1tf', 'kvm-intel.vmentry_l1d_flush' and at run | |
538 | time with the matching sysfs control files. See :ref:`smt_control`, | |
539 | :ref:`mitigation_control_command_line` and | |
540 | :ref:`mitigation_control_kvm`. | |
541 | ||
542 | - Disabling EPT: | |
543 | ||
544 | Disabling EPT provides the maximum amount of protection as well. It | |
545 | does not depend on any of the above mitigation methods. SMT can stay | |
546 | enabled and L1D flushing is not required, but the performance impact is | |
547 | significant. | |
548 | ||
549 | EPT can be disabled in the hypervisor via the 'kvm-intel.ept' | |
550 | parameter. | |
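The decision path through sections 1 to 3.3 of this guide can be condensed into a small function. This is a hedged sketch of the guide's logic, not a replacement for the risk analysis it asks for; the function name, parameters and result strings are hypothetical.

```python
def select_mitigation(virtualization, guests_trusted=True,
                      smt_active=False, ept_active=True):
    """Return the guide's recommendations for a deployment scenario.

    guests_trusted assumes the guest comes from a trusted source and its
    kernel is guaranteed to have the L1TF mitigations in place.
    """
    if not virtualization:
        return ["no further action required"]
    if guests_trusted:
        return ["no further action required",
                "L1D flush on VMENTER may be disabled to avoid its overhead"]
    # Untrusted guests from here on.
    if not ept_active:
        return ["fully protected: SMT can stay enabled, L1D flush not required"]
    if not smt_active:
        return ["enforce L1D flush on VMENTER"]
    return ["full protection: disable SMT and enforce L1D flush, or disable EPT",
            "partial protection: L1D flush combined with guest confinement "
            "and interrupt isolation"]
```

The EPT check is placed before the SMT check because a system without EPT is described as fully protected regardless of the SMT state.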
551 | ||
552 | 3.4. Nested virtual machines |
553 | """""""""""""""""""""""""""" | |
554 | ||
555 | When nested virtualization is in use, three operating systems are involved: | |
556 | the bare metal hypervisor, the nested hypervisor and the nested virtual | |
557 | machine. VMENTER operations from the nested hypervisor into the nested | |
558 | guest will always be processed by the bare metal hypervisor. If KVM is the | |
559 | bare metal hypervisor it will: | |
560 | |
561 | - Flush the L1D cache on every switch from the nested hypervisor to the | |
562 | nested virtual machine, so that the nested hypervisor's secrets are not | |
563 | exposed to the nested virtual machine; | |
564 | ||
565 | - Flush the L1D cache on every switch from the nested virtual machine to | |
566 | the nested hypervisor; this is a complex operation, and flushing the L1D | |
567 | cache prevents the bare metal hypervisor's secrets from being exposed to the | |
568 | nested virtual machine; | |
569 | ||
570 | - Instruct the nested hypervisor to not perform any L1D cache flush. This | |
571 | is an optimization to avoid double L1D flushing. | |
572 | ||
573 | |
574 | .. _default_mitigations: | |
575 | ||
576 | Default mitigations | |
577 | ------------------- | |
578 | ||
579 | The kernel default mitigations for vulnerable processors are: | |
580 | ||
581 | - PTE inversion to protect against malicious user space. This is done | |
582 | unconditionally and cannot be controlled. The swap storage is limited |
583 | to ~16TB. | |
584 | |
585 | - L1D conditional flushing on VMENTER when EPT is enabled for | |
586 | a guest. | |
587 | ||
588 | The kernel does not by default enforce the disabling of SMT, which leaves | |
589 | SMT systems vulnerable when running untrusted guests with EPT enabled. | |
590 | ||
591 | The rationale for this choice is: | |
592 | ||
593 | - Force disabling SMT can break existing setups, especially with | |
594 | unattended updates. | |
595 | ||
596 | - If regular users run untrusted guests on their machine, then L1TF is | |
597 | just an add on to other malware which might be embedded in an untrusted | |
598 | guest, e.g. spam-bots or attacks on the local network. | |
599 | ||
600 | There is no technical way to prevent a user from running untrusted code | |
601 | on their machines blindly. | |
602 | ||
603 | - It's technically extremely unlikely and from today's knowledge even | |
604 | impossible that L1TF can be exploited via the most popular attack | |
605 | mechanisms like JavaScript because these mechanisms have no way to | |
606 | control PTEs. If this were possible and no other mitigation were | |
607 | available, then the default might be different. | |
608 | ||
609 | - The administrators of cloud and hosting setups have to carefully | |
610 | analyze the risk for their scenarios and make the appropriate | |
611 | mitigation choices, which might even vary across their deployed | |
612 | machines and also result in other changes of their overall setup. | |
613 | There is no way for the kernel to provide a sensible default for these | |
614 | kinds of scenarios.