Microarchitectural Data Sampling (MDS) mitigation
=================================================

.. _mds:

Overview
--------

Microarchitectural Data Sampling (MDS) is a family of side channel attacks
on internal buffers in Intel CPUs. The variants are:

- Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
- Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
- Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)

MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a
dependent load (store-to-load forwarding) as an optimization. The forward
can also happen to a faulting or assisting load operation for a different
memory address, which can be exploited under certain conditions. Store
buffers are partitioned between Hyper-Threads so cross thread forwarding is
not possible. But if a thread enters or exits a sleep state the store
buffer is repartitioned which can expose data from one thread to the other.

MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
L1 miss situations and to hold data which is returned or sent in response
to a memory or I/O operation. Fill buffers can forward data to a load
operation and also write data to the cache. When the fill buffer is
deallocated it can retain the stale data of the preceding operations which
can then be forwarded to a faulting or assisting load operation, which can
be exploited under certain conditions. Fill buffers are shared between
Hyper-Threads so cross thread leakage is possible.

MLPDS leaks Load Port Data. Load ports are used to perform load operations
from memory or I/O. The received data is then forwarded to the register
file or a subsequent operation. In some implementations the Load Port can
contain stale data from a previous operation which can be forwarded to
faulting or assisting loads under certain conditions, which again can be
exploited eventually. Load ports are shared between Hyper-Threads so cross
thread leakage is possible.


Exposure assumptions
--------------------

It is assumed that attack code resides in user space or in a guest with one
exception. The rationale behind this assumption is that the code construct
needed for exploiting MDS requires:

- to control the load to trigger a fault or assist

- to have a disclosure gadget which exposes the speculatively accessed
  data for consumption through a side channel.

- to control the pointer through which the disclosure gadget exposes the
  data

The existence of such a construct in the kernel cannot be excluded with
100% certainty, but the complexity involved makes it extremely unlikely.

There is one exception, which is untrusted BPF. The functionality of
untrusted BPF is limited, but it needs to be thoroughly investigated
whether it can be used to create such a construct.


Mitigation strategy
-------------------

All variants have the same mitigation strategy at least for the single CPU
thread case (SMT off): Force the CPU to clear the affected buffers.

This is achieved by using the otherwise unused and obsolete VERW
instruction in combination with a microcode update. The microcode clears
the affected CPU buffers when the VERW instruction is executed.

For virtualization there are two ways to achieve CPU buffer clearing:
either via the modified VERW instruction or via the L1D Flush command. The
latter is issued when L1TF mitigation is enabled so the extra VERW can be
avoided. If the CPU is not affected by L1TF then VERW needs to be issued.
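
The choice between the two clearing methods before entering a guest can be
sketched as a small decision function. This is a simplified user-space
model, not the hypervisor code; the enum, the function name and both
parameters are hypothetical names chosen for illustration.

```c
#include <stdbool.h>

/* Hypothetical labels for the buffer clearing method used before guest entry. */
enum vmentry_clear {
    VMENTRY_CLEAR_NONE,      /* neither mitigation is active */
    VMENTRY_CLEAR_L1D_FLUSH, /* L1D Flush, issued by the L1TF mitigation */
    VMENTRY_CLEAR_VERW,      /* explicit VERW based buffer clear */
};

/*
 * Prefer the L1D Flush when the L1TF mitigation issues it anyway, so the
 * extra VERW can be avoided. Fall back to VERW when only the MDS
 * mitigation is active.
 */
enum vmentry_clear vmentry_clear_method(bool l1d_flush_enabled,
                                        bool mds_clear_enabled)
{
    if (l1d_flush_enabled)
        return VMENTRY_CLEAR_L1D_FLUSH;
    if (mds_clear_enabled)
        return VMENTRY_CLEAR_VERW;
    return VMENTRY_CLEAR_NONE;
}
```

For example, vmentry_clear_method(false, true) models a CPU which is not
affected by L1TF but needs the MDS mitigation, and yields the VERW method.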

If the VERW instruction with the supplied segment selector argument is
executed on a CPU without the microcode update there is no side effect
other than a small number of pointlessly wasted CPU cycles.
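
A VERW invocation with a segment selector operand might look like the
following sketch. It is written for user space with GCC/Clang inline
assembly and is not the kernel implementation: the function name is
hypothetical, and reading the selector from the stack segment register is
an assumption made so that the example has a valid selector to pass.

```c
#include <stdint.h>

/*
 * Illustrative VERW invocation (hypothetical helper, not the kernel
 * function). Returns 1 if VERW was executed and 0 when the code is built
 * for a non-x86 target, where the instruction does not exist.
 */
int verw_clear_sketch(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint16_t sel;

    /* Assumption: reuse the current stack segment selector as the operand. */
    __asm__ volatile("mov %%ss, %0" : "=r"(sel));

    /* VERW verifies the selector and updates ZF, hence the "cc" clobber. */
    __asm__ volatile("verw %0" : : "m"(sel) : "cc");
    return 1;
#else
    return 0;
#endif
}
```

Whether the buffers are actually cleared depends on the microcode update
described above; without it the call only costs a few cycles.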

This does not protect against cross Hyper-Thread attacks except for MSBDS
which is only exploitable cross Hyper-Thread when one of the Hyper-Threads
enters a C-state.

The kernel provides a function to invoke the buffer clearing:

    mds_clear_cpu_buffers()

The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state
(idle) transitions.

As a special quirk to address virtualization scenarios where the host has
the microcode updated, but the hypervisor does not (yet) expose the
MD_CLEAR CPUID bit to guests, the kernel issues the VERW instruction in the
hope that it might actually clear the buffers. The state is reflected
accordingly.

According to current knowledge additional mitigations inside the kernel
itself are not required because the necessary gadgets to expose the leaked
data cannot be controlled in a way which allows exploitation from malicious
user space or VM guests.

Kernel internal mitigation modes
--------------------------------

=======  ============================================================
off      Mitigation is disabled. Either the CPU is not affected or
         mds=off is supplied on the kernel command line.

full     Mitigation is enabled. CPU is affected and MD_CLEAR is
         advertised in CPUID.

vmwerv   Mitigation is enabled. CPU is affected and MD_CLEAR is not
         advertised in CPUID. That is mainly for virtualization
         scenarios where the host has the updated microcode but the
         hypervisor does not expose MD_CLEAR in CPUID. It's a best
         effort approach without guarantee.
=======  ============================================================

If the CPU is affected and mds=off is not supplied on the kernel command
line then the kernel selects the appropriate mitigation mode depending on
the availability of the MD_CLEAR CPUID bit.
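
The mode selection described by the table can be modelled as follows. This
is a minimal sketch under the assumption that the three inputs are already
known as booleans; the enum and function names are hypothetical and do not
claim to match the kernel source.

```c
#include <stdbool.h>

/* Hypothetical names for the three modes listed in the table. */
enum mds_mode {
    MDS_MODE_OFF,    /* CPU not affected, or mds=off on the command line */
    MDS_MODE_FULL,   /* CPU affected, MD_CLEAR advertised in CPUID */
    MDS_MODE_VMWERV, /* CPU affected, MD_CLEAR not advertised: best effort */
};

enum mds_mode mds_select_mode(bool cpu_affected, bool cmdline_off,
                              bool has_md_clear)
{
    /* An unaffected CPU or an explicit mds=off disables the mitigation. */
    if (!cpu_affected || cmdline_off)
        return MDS_MODE_OFF;

    /* Otherwise the MD_CLEAR CPUID bit decides between the two modes. */
    return has_md_clear ? MDS_MODE_FULL : MDS_MODE_VMWERV;
}
```

A guest on an updated host whose hypervisor hides MD_CLEAR corresponds to
mds_select_mode(true, false, false), which yields the vmwerv mode.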

Mitigation points
-----------------

1. Return to user space
^^^^^^^^^^^^^^^^^^^^^^^

When transitioning from kernel to user space the CPU buffers are flushed
on affected CPUs when the mitigation is not disabled on the kernel
command line. The mitigation is enabled through the static key
mds_user_clear.
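
The gating of the buffer clear on the exit path can be illustrated with a
small user-space model. A static key patches the branch in the code at
runtime; the plain boolean below only models the resulting behaviour. All
names in this sketch are hypothetical, and the buffer clear is replaced by
a counter so that its effect can be observed.

```c
#include <stdbool.h>

/* Stand-in for the mds_user_clear static key; a plain flag in this model. */
bool mds_user_clear_model;

/* Counts simulated buffer clears instead of executing VERW. */
unsigned int buffer_clear_count;

void clear_cpu_buffers_model(void)
{
    buffer_clear_count++;
}

/* Models the last step before returning to user space. */
void exit_to_usermode_model(void)
{
    /* Other exit work would run before this point. */
    if (mds_user_clear_model)
        clear_cpu_buffers_model();
}
```

With the flag cleared the exit path does nothing extra. Once the flag is
set, every modelled return to user space clears the buffers exactly once.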

The mitigation is invoked in prepare_exit_to_usermode() which covers
most of the kernel to user space transitions. There are a few exceptions
which are not invoking prepare_exit_to_usermode() on return to user
space. These exceptions use the paranoid exit code.

- Non Maskable Interrupt (NMI):

  Access to sensitive data like keys, credentials in the NMI context is
  mostly theoretical: The CPU can do prefetching or execute a
  misspeculated code path and thereby fetch data which might end up
  leaking through a buffer.

  But for mounting other attacks the kernel stack address of the task is
  already valuable information. So in full mitigation mode, the NMI is
  mitigated on the return from do_nmi() to provide almost complete
  coverage.

- Double fault (#DF):

  A double fault is usually fatal, but the ESPFIX workaround, which can
  be triggered from user space through modify_ldt(2), is a recoverable
  double fault. #DF uses the paranoid exit path, so explicit mitigation
  in the double fault handler is required.

- Machine Check Exception (#MC):

  Another corner case is a #MC which hits between the CPU buffer clear
  invocation and the actual return to user. As this is still in kernel
  space, it takes the paranoid exit path which does not clear the CPU
  buffers. So the #MC handler repopulates the buffers to some
  extent. Machine checks are not reliably controllable and the window is
  extremely small so mitigation would just tick a checkbox that this
  theoretical corner case is covered. To keep the amount of special
  cases small, ignore #MC.

- Debug Exception (#DB):

  This takes the paranoid exit path only when the INT1 breakpoint is in
  kernel space. #DB on a user space address takes the regular exit path,
  so no extra mitigation is required.
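
The handling of the four paranoid exit cases listed above can be summarized
in a lookup function. This is only a model of the decisions described in
the text: the names are hypothetical, and the single mitigation_active
flag folds the "full mitigation mode" condition stated for the NMI case
into one boolean.

```c
#include <stdbool.h>

/* Hypothetical labels for the exceptions which use the paranoid exit code. */
enum paranoid_entry {
    PARANOID_NMI, /* Non Maskable Interrupt */
    PARANOID_DF,  /* Double fault */
    PARANOID_MC,  /* Machine Check Exception */
    PARANOID_DB,  /* Debug Exception hitting in kernel space */
};

/* Returns true when the handler clears the CPU buffers explicitly. */
bool paranoid_exit_clears_buffers(enum paranoid_entry entry,
                                  bool mitigation_active)
{
    if (!mitigation_active)
        return false;

    switch (entry) {
    case PARANOID_NMI: /* mitigated on the return from do_nmi() */
    case PARANOID_DF:  /* recoverable via ESPFIX, so mitigated explicitly */
        return true;
    case PARANOID_MC:  /* deliberately ignored, the window is tiny */
    case PARANOID_DB:  /* user space #DB takes the regular exit path */
        return false;
    }
    return false;
}
```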


2. C-State transition
^^^^^^^^^^^^^^^^^^^^^

When a CPU goes idle and enters a C-State the CPU buffers need to be
cleared on affected CPUs when SMT is active. This addresses the
repartitioning of the store buffer when one of the Hyper-Threads enters
a C-State.

When SMT is inactive, i.e. either the CPU does not support it or all
sibling threads are offline, CPU buffer clearing is not required.

The idle clearing is enabled on CPUs which are only affected by MSBDS
and not by any other MDS variant. The other MDS variants cannot be
protected against cross Hyper-Thread attacks because the Fill Buffer and
the Load Ports are shared. So on CPUs affected by other variants, the
idle clearing would be a window dressing exercise and is therefore not
activated.

The invocation is controlled by the static key mds_idle_clear which is
switched depending on the chosen mitigation mode and the SMT state of
the system.
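
The conditions under which the idle clearing is switched on can be written
as a single predicate. This is a simplified model of the rules stated
above; the function and parameter names are hypothetical.

```c
#include <stdbool.h>

/*
 * Models the state of the mds_idle_clear static key.
 *
 * mitigation_enabled: the chosen mitigation mode is not "off"
 * msbds_only:         the CPU is affected by MSBDS and no other MDS variant
 * smt_active:         the CPU supports SMT and a sibling thread is online
 */
bool mds_idle_clear_wanted(bool mitigation_enabled, bool msbds_only,
                           bool smt_active)
{
    return mitigation_enabled && msbds_only && smt_active;
}
```

In this model a CPU which is also affected by MFBDS or MLPDS never enables
the idle clearing, regardless of the SMT state.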

The buffer clear is only invoked before entering the C-State to prevent
stale data from the idling CPU from spilling to the Hyper-Thread
sibling after the store buffer got repartitioned and all entries are
available to the non idle sibling.

When coming out of idle the store buffer is partitioned again so each
sibling has half of it available. The CPU coming back from idle could
then be speculatively exposed to contents of the sibling. The buffers are
flushed either on exit to user space or on VMENTER so malicious code
in user space or the guest cannot speculatively access them.

The mitigation is hooked into all variants of halt()/mwait(), but does
not cover the legacy ACPI IO-Port mechanism because the ACPI idle driver
was superseded by the intel_idle driver around 2010, and intel_idle is
preferred on all affected CPUs which are expected to gain the MD_CLEAR
functionality in microcode. Aside from that, the IO-Port mechanism is a
legacy interface which is only used on older systems which are either
not affected or do not receive microcode updates anymore.