Microarchitectural Data Sampling (MDS) mitigation
=================================================

.. _mds:

Overview
--------

Microarchitectural Data Sampling (MDS) is a family of side channel attacks
on internal buffers in Intel CPUs. The variants are:

 - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
 - Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
 - Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)

MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a
dependent load (store-to-load forwarding) as an optimization. The forward
can also happen to a faulting or assisting load operation for a different
memory address, which can be exploited under certain conditions. Store
buffers are partitioned between Hyper-Threads so cross thread forwarding is
not possible. But if a thread enters or exits a sleep state the store
buffer is repartitioned which can expose data from one thread to the other.

MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
L1 miss situations and to hold data which is returned or sent in response
to a memory or I/O operation. Fill buffers can forward data to a load
operation and also write data to the cache. When the fill buffer is
deallocated it can retain the stale data of the preceding operations which
can then be forwarded to a faulting or assisting load operation, which can
be exploited under certain conditions. Fill buffers are shared between
Hyper-Threads so cross thread leakage is possible.

MLPDS leaks Load Port Data. Load ports are used to perform load operations
from memory or I/O. The received data is then forwarded to the register
file or a subsequent operation. In some implementations the Load Port can
contain stale data from a previous operation which can be forwarded to
faulting or assisting loads under certain conditions, which in turn can be
exploited. Load ports are shared between Hyper-Threads so cross thread
leakage is possible.


Exposure assumptions
--------------------

It is assumed that attack code resides in user space or in a guest, with
one exception. The rationale behind this assumption is that the code
construct needed for exploiting MDS requires:

 - to control the load to trigger a fault or assist

 - to have a disclosure gadget which exposes the speculatively accessed
   data for consumption through a side channel

 - to control the pointer through which the disclosure gadget exposes the
   data

The existence of such a construct in the kernel cannot be excluded with
100% certainty, but the complexity involved makes it extremely unlikely.

The one exception is untrusted BPF. The functionality of untrusted BPF is
limited, but it needs to be thoroughly investigated whether it can be used
to create such a construct.


Mitigation strategy
-------------------

All variants have the same mitigation strategy at least for the single CPU
thread case (SMT off): Force the CPU to clear the affected buffers.

This is achieved by using the otherwise unused and obsolete VERW
instruction in combination with a microcode update. The microcode clears
the affected CPU buffers when the VERW instruction is executed.

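The VERW invocation can be illustrated with a small user space sketch. It
is not the kernel's implementation: the function name
clear_cpu_buffers_sketch() and the choice of the current stack segment
selector as the VERW operand are assumptions made for this example. The
sketch uses the memory-operand form of VERW and marks the flags as
clobbered because VERW modifies ZF. Whether any CPU buffers are actually
cleared depends on the microcode update; on non-x86 builds the function
does nothing and returns 0.

```c
#include <stdint.h>

/*
 * Hypothetical user space stand-in for a buffer clearing helper.
 * Returns 1 if a VERW instruction was issued, 0 on non-x86 builds.
 */
static inline int clear_cpu_buffers_sketch(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
	uint16_t sel;

	/* Assumption: reuse the current stack segment selector as operand. */
	__asm__ __volatile__("mov %%ss, %0" : "=r" (sel));

	/*
	 * Memory-operand form of VERW. VERW writes ZF, hence the "cc"
	 * clobber. With updated microcode this also clears CPU buffers;
	 * without it, only the segment check is performed.
	 */
	__asm__ __volatile__("verw %0" : : "m" (sel) : "cc");
	return 1;
#else
	return 0;
#endif
}
```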
For virtualization there are two ways to achieve CPU buffer clearing:
either the modified VERW instruction or the L1D Flush command. The latter
is issued when L1TF mitigation is enabled so the extra VERW can be avoided.
If the CPU is not affected by L1TF then VERW needs to be issued.

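The choice between the two clearing methods before entering a guest can be
sketched as a small decision function. The enum, the function name
choose_vmenter_clear() and both parameters are hypothetical and only model
the rule stated above: if the L1D flush is issued anyway, the extra VERW is
skipped.

```c
#include <stdbool.h>

/* Hypothetical labels for the buffer clearing method used on guest entry. */
enum vmenter_clear {
	VMENTER_CLEAR_NONE,
	VMENTER_CLEAR_L1D_FLUSH,
	VMENTER_CLEAR_VERW,
};

/*
 * l1d_flush_enabled:  the L1TF mitigation issues an L1D flush on guest entry
 * mds_clear_enabled:  the MDS mitigation is active
 */
static enum vmenter_clear choose_vmenter_clear(bool l1d_flush_enabled,
					       bool mds_clear_enabled)
{
	/* The L1D flush already covers the buffers, so VERW is avoided. */
	if (l1d_flush_enabled)
		return VMENTER_CLEAR_L1D_FLUSH;

	/* No L1D flush is issued: VERW has to do the clearing. */
	if (mds_clear_enabled)
		return VMENTER_CLEAR_VERW;

	return VMENTER_CLEAR_NONE;
}
```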
If the VERW instruction with the supplied segment selector argument is
executed on a CPU without the microcode update there is no side effect
other than a small number of pointlessly wasted CPU cycles.

This does not protect against cross Hyper-Thread attacks except for MSBDS
which is only exploitable cross Hyper-Thread when one of the Hyper-Threads
enters a C-state.

The kernel provides a function to invoke the buffer clearing:

    mds_clear_cpu_buffers()

The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state
(idle) transitions.

According to current knowledge additional mitigations inside the kernel
itself are not required because the necessary gadgets to expose the leaked
data cannot be controlled in a way which allows exploitation from malicious
user space or VM guests.


Mitigation points
-----------------

1. Return to user space
^^^^^^^^^^^^^^^^^^^^^^^

   When transitioning from kernel to user space the CPU buffers are flushed
   on affected CPUs when the mitigation is not disabled on the kernel
   command line. The mitigation is enabled through the static key
   mds_user_clear.

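The effect of the static key can be modelled in plain C. This is a
simplified sketch: a regular boolean stands in for the static key
mds_user_clear (a real static key patches the branch at runtime), and the
names mds_user_clear_enabled, buffer_clear_count, clear_buffers_stub() and
exit_to_usermode_sketch() are hypothetical.

```c
#include <stdbool.h>

/* Stand-in for the static key mds_user_clear; false means "mitigation off". */
static bool mds_user_clear_enabled;

/* Counts how often the buffer clear was requested, for illustration only. */
static unsigned long buffer_clear_count;

/* Placeholder for the real buffer clearing function. */
static void clear_buffers_stub(void)
{
	buffer_clear_count++;
}

/* Model of the last step before returning to user space. */
static void exit_to_usermode_sketch(void)
{
	/* Other exit work would run before this point. */
	if (mds_user_clear_enabled)
		clear_buffers_stub();
}
```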
   The mitigation is invoked in prepare_exit_to_usermode() which covers
   most of the kernel to user space transitions. There are a few exceptions
   which do not invoke prepare_exit_to_usermode() on return to user
   space. These exceptions use the paranoid exit code.

   - Non Maskable Interrupt (NMI):

     Access to sensitive data like keys or credentials in the NMI context
     is mostly theoretical: The CPU can do prefetching or execute a
     misspeculated code path and thereby fetch data which might end up
     leaking through a buffer.

     But for mounting other attacks the kernel stack address of the task is
     already valuable information. So in full mitigation mode, the NMI is
     mitigated on the return from do_nmi() to provide almost complete
     coverage.

   - Double fault (#DF):

     A double fault is usually fatal, but the ESPFIX workaround, which can
     be triggered from user space through modify_ldt(2), is a recoverable
     double fault. #DF uses the paranoid exit path, so explicit mitigation
     in the double fault handler is required.

   - Machine Check Exception (#MC):

     Another corner case is a #MC which hits between the CPU buffer clear
     invocation and the actual return to user. As this still is in kernel
     space it takes the paranoid exit path which does not clear the CPU
     buffers. So the #MC handler repopulates the buffers to some
     extent. Machine checks are not reliably controllable and the window is
     extremely small so mitigation would just tick a checkbox that this
     theoretical corner case is covered. To keep the amount of special
     cases small, ignore #MC.

   - Debug Exception (#DB):

     This takes the paranoid exit path only when the INT1 breakpoint is in
     kernel space. #DB on a user space address takes the regular exit path,
     so no extra mitigation is required.

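The cases above can be summarized as a lookup. The sketch below is one
reading of the list, not kernel code: the enum labels and the function
buffers_cleared_before_user() are hypothetical, and "full mitigation mode"
is modelled as a single boolean.

```c
#include <stdbool.h>

/* Hypothetical labels for the return-to-user cases discussed above. */
enum exit_case {
	EXIT_CASE_REGULAR,       /* goes through prepare_exit_to_usermode() */
	EXIT_CASE_NMI,           /* paranoid exit, mitigated after do_nmi() */
	EXIT_CASE_DOUBLE_FAULT,  /* paranoid exit, mitigated in the handler */
	EXIT_CASE_MACHINE_CHECK, /* paranoid exit, deliberately not covered */
	EXIT_CASE_DEBUG_USER,    /* #DB on a user address, regular exit path */
};

/* Returns true if the CPU buffers are cleared before user space runs. */
static bool buffers_cleared_before_user(enum exit_case c,
					bool mitigation_enabled)
{
	if (!mitigation_enabled)
		return false;

	switch (c) {
	case EXIT_CASE_REGULAR:
	case EXIT_CASE_DEBUG_USER:
		return true;	/* covered by the regular exit path */
	case EXIT_CASE_NMI:
	case EXIT_CASE_DOUBLE_FAULT:
		return true;	/* covered by an explicit invocation */
	case EXIT_CASE_MACHINE_CHECK:
		return false;	/* ignored to keep special cases small */
	}
	return false;
}
```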

2. C-State transition
^^^^^^^^^^^^^^^^^^^^^

   When a CPU goes idle and enters a C-State the CPU buffers need to be
   cleared on affected CPUs when SMT is active. This addresses the
   repartitioning of the store buffer when one of the Hyper-Threads enters
   a C-State.

   When SMT is inactive, i.e. either the CPU does not support it or all
   sibling threads are offline, CPU buffer clearing is not required.

   The idle clearing is enabled on CPUs which are only affected by MSBDS
   and not by any other MDS variant. The other MDS variants cannot be
   protected against cross Hyper-Thread attacks because the Fill Buffer and
   the Load Ports are shared. So on CPUs affected by other variants, the
   idle clearing would be a window dressing exercise and is therefore not
   activated.

   The invocation is controlled by the static key mds_idle_clear which is
   switched depending on the chosen mitigation mode and the SMT state of
   the system.

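The conditions under which the idle clearing is switched on can be written
as a single predicate. This is a sketch under assumptions: the function
name idle_clear_wanted() and its three parameters are hypothetical, and
the mitigation mode is reduced to an on/off flag.

```c
#include <stdbool.h>

/*
 * mitigation_enabled: the MDS mitigation is not disabled
 * msbds_only:         the CPU is affected by MSBDS and no other MDS variant
 * smt_active:         SMT is supported and a sibling thread is online
 */
static bool idle_clear_wanted(bool mitigation_enabled, bool msbds_only,
			      bool smt_active)
{
	/* Without SMT there is no sibling the store buffer could leak to. */
	if (!smt_active)
		return false;

	/* With shared fill buffers or load ports idle clearing does not help. */
	if (!msbds_only)
		return false;

	return mitigation_enabled;
}
```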
   The buffer clear is only invoked before entering the C-State to prevent
   stale data from the idling CPU from spilling to the Hyper-Thread
   sibling after the store buffer got repartitioned and all entries are
   available to the non idle sibling.

   When coming out of idle the store buffer is partitioned again so each
   sibling has half of it available. The CPU coming back from idle could
   then be speculatively exposed to contents of the sibling. The buffers
   are flushed either on exit to user space or on VMENTER so malicious code
   in user space or the guest cannot speculatively access them.

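The ordering described above, clear first, then enter the C-State, and no
additional clear on wake-up, can be made explicit with a small trace model.
Everything in this sketch is hypothetical (the enum, the trace array and
idle_once_sketch()); it records the sequence of steps and does not idle a
CPU.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical steps of one idle cycle. */
enum idle_step {
	IDLE_STEP_CLEAR_BUFFERS,
	IDLE_STEP_ENTER_CSTATE,
	IDLE_STEP_WAKE_UP,
};

#define IDLE_TRACE_MAX 4

static enum idle_step idle_trace[IDLE_TRACE_MAX];
static size_t idle_trace_len;

/* Append one step to the trace, dropping it if the trace is full. */
static void idle_trace_add(enum idle_step step)
{
	if (idle_trace_len < IDLE_TRACE_MAX)
		idle_trace[idle_trace_len++] = step;
}

/* Record the steps of a single idle cycle into idle_trace. */
static void idle_once_sketch(bool idle_clear_enabled)
{
	idle_trace_len = 0;

	/* Clear before the store buffer is handed to the sibling. */
	if (idle_clear_enabled)
		idle_trace_add(IDLE_STEP_CLEAR_BUFFERS);

	idle_trace_add(IDLE_STEP_ENTER_CSTATE);

	/*
	 * No clear on wake-up: the buffers are flushed later, on exit to
	 * user space or on VMENTER.
	 */
	idle_trace_add(IDLE_STEP_WAKE_UP);
}
```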
   The mitigation is hooked into all variants of halt()/mwait(), but does
   not cover the legacy ACPI IO-Port mechanism because the ACPI idle driver
   was superseded around 2010 by the intel_idle driver, which is preferred
   on all affected CPUs which are expected to gain the MD_CLEAR
   functionality in microcode. Aside from that the IO-Port mechanism is a
   legacy interface which is only used on older systems which are either
   not affected or do not receive microcode updates anymore.