MDS - Microarchitectural Data Sampling
======================================

Microarchitectural Data Sampling is a hardware vulnerability which allows
unprivileged speculative access to data which is available in various CPU
internal buffers.

Affected processors
-------------------

This vulnerability affects a wide range of Intel processors. The
vulnerability is not present on:

   - Processors from AMD, Centaur and other non-Intel vendors

   - Older processor models, where the CPU family is < 6

   - Some Atoms (Bonnell, Saltwell, Goldmont, GoldmontPlus)

   - Intel processors which have the ARCH_CAP_MDS_NO bit set in the
     IA32_ARCH_CAPABILITIES MSR.

Whether a processor is affected or not can be read out from the MDS
vulnerability file in sysfs. See :ref:`mds_sys_info`.

Not all processors are affected by all variants of MDS, but the mitigation
is identical for all of them so the kernel treats them as a single
vulnerability.
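
The ``ARCH_CAP_MDS_NO`` check from the list above can be sketched in Python.
This is an illustration only, not how the kernel performs the check. The MSR
index ``0x10a`` and the bit position 5 are taken from the kernel's
``msr-index.h`` rather than from this document, the helper names are
hypothetical, and reading the MSR from user space assumes that the ``msr``
driver is loaded and that the caller is root. The sysfs file remains the
supported way to query the state.

```python
import os
import struct

# Assumed values, taken from the kernel's msr-index.h (not from this document).
MSR_IA32_ARCH_CAPABILITIES = 0x10A
ARCH_CAP_MDS_NO = 1 << 5


def mds_no_bit_set(arch_capabilities: int) -> bool:
    """Return True if the MDS_NO bit is set in a raw MSR value."""
    return bool(arch_capabilities & ARCH_CAP_MDS_NO)


def read_arch_capabilities(cpu: int = 0) -> int:
    """Read IA32_ARCH_CAPABILITIES via /dev/cpu/<cpu>/msr.

    Hypothetical helper: needs root and the msr driver. It raises OSError
    when the device node is missing or the CPU does not implement the MSR.
    """
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, MSR_IA32_ARCH_CAPABILITIES)
    finally:
        os.close(fd)
    return struct.unpack("<Q", raw)[0]
```

A caller would typically treat an ``OSError`` from ``read_arch_capabilities()``
as "MSR not available" and fall back to the sysfs file.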

Related CVEs
------------

The following CVE entries are related to the MDS vulnerability:

   ==============  =====  ===================================================
   CVE-2018-12126  MSBDS  Microarchitectural Store Buffer Data Sampling
   CVE-2018-12130  MFBDS  Microarchitectural Fill Buffer Data Sampling
   CVE-2018-12127  MLPDS  Microarchitectural Load Port Data Sampling
   CVE-2019-11091  MDSUM  Microarchitectural Data Sampling Uncacheable Memory
   ==============  =====  ===================================================

Problem
-------

When performing store, load, or L1 refill operations, processors write data
into temporary microarchitectural structures (buffers). The data in the
buffer can be forwarded to load operations as an optimization.

Under certain conditions, usually a fault/assist caused by a load
operation, data unrelated to the load memory address can be speculatively
forwarded from the buffers. Because the load operation causes a fault or
assist and its result will be discarded, the forwarded data will not cause
incorrect program execution or state changes. But a malicious operation
may be able to forward this speculative data to a disclosure gadget which
allows in turn to infer the value via a cache side channel attack.

Because the buffers are potentially shared between Hyper-Threads,
cross Hyper-Thread attacks are possible.

Deeper technical information is available in the MDS specific x86
architecture section: :ref:`Documentation/x86/mds.rst <mds>`.


Attack scenarios
----------------

Attacks against the MDS vulnerabilities can be mounted from malicious
unprivileged user space applications running on hosts or guests. Malicious
guest OSes can obviously mount attacks as well.

Contrary to other speculation based vulnerabilities, the MDS vulnerability
does not allow the attacker to control the memory target address. As a
consequence, the attacks are purely sampling based, but as demonstrated with
the TLBleed attack, samples can be postprocessed successfully.

Web-Browsers
^^^^^^^^^^^^

  It's unclear whether attacks through Web-Browsers are possible at
  all. Exploitation through JavaScript is considered very unlikely,
  but other widely used web technologies like WebAssembly could possibly be
  abused.


.. _mds_sys_info:

MDS system information
----------------------

The Linux kernel provides a sysfs interface to enumerate the current MDS
status of the system: whether the system is vulnerable, and which
mitigations are active. The relevant sysfs file is:

/sys/devices/system/cpu/vulnerabilities/mds

The possible values in this file are:

  .. list-table::

     * - 'Not affected'
       - The processor is not vulnerable
     * - 'Vulnerable'
       - The processor is vulnerable, but no mitigation is enabled
     * - 'Vulnerable: Clear CPU buffers attempted, no microcode'
       - The processor is vulnerable but microcode is not updated.

         The mitigation is enabled on a best effort basis. See :ref:`vmwerv`
     * - 'Mitigation: Clear CPU buffers'
       - The processor is vulnerable and the CPU buffer clearing mitigation is
         enabled.

If the processor is vulnerable then the following information is appended
to the above information:

    ========================  ============================================
    'SMT vulnerable'          SMT is enabled
    'SMT mitigated'           SMT is enabled and mitigated
    'SMT disabled'            SMT is disabled
    'SMT Host state unknown'  Kernel runs in a VM, Host SMT state unknown
    ========================  ============================================
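
A monitoring script could split the file content into the two parts described
above. The following sketch assumes that the SMT information is separated from
the main state by ``"; "``, which this document does not specify; the function
names are hypothetical.

```python
from pathlib import Path

MDS_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/mds")


def parse_mds_status(text):
    """Split the sysfs content into (state, smt_info).

    Assumption: the SMT part, if present, follows the state after "; ".
    smt_info is None when nothing is appended.
    """
    state, sep, smt = text.strip().partition("; ")
    return state, (smt if sep else None)


def mds_is_mitigated(text):
    """True for 'Not affected' and for any 'Mitigation: ...' state."""
    state, _ = parse_mds_status(text)
    return state == "Not affected" or state.startswith("Mitigation:")


def read_mds_status():
    """Read and parse the sysfs file; raises OSError if it does not exist."""
    return parse_mds_status(MDS_SYSFS.read_text())
```

Note that ``mds_is_mitigated()`` only looks at the main state. A caller that
cares about cross Hyper-Thread attacks also has to inspect the SMT part.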

.. _vmwerv:

Best effort mitigation mode
^^^^^^^^^^^^^^^^^^^^^^^^^^^

  If the processor is vulnerable, but the availability of the microcode based
  mitigation mechanism is not advertised via CPUID, the kernel selects a best
  effort mitigation mode.  This mode invokes the mitigation instructions
  without a guarantee that they clear the CPU buffers.

  This is done to address virtualization scenarios where the host has the
  microcode update applied, but the hypervisor is not yet updated to expose
  the CPUID to the guest. If the host has updated microcode, the protection
  takes effect; otherwise a few CPU cycles are wasted pointlessly.

  The state in the mds sysfs file reflects this situation accordingly.


Mitigation mechanism
--------------------

The kernel detects the affected CPUs and the presence of the microcode
which is required.

If a CPU is affected and the microcode is available, then the kernel
enables the mitigation by default. The mitigation can be controlled at boot
time via a kernel command line option. See
:ref:`mds_mitigation_control_command_line`.

.. _cpu_buffer_clear:

CPU buffer clearing
^^^^^^^^^^^^^^^^^^^

  The mitigation for MDS clears the affected CPU buffers on return to user
  space and when entering a guest.

  If SMT is enabled, it also clears the buffers on idle entry when the CPU
  is only affected by MSBDS and not any other MDS variant, because the
  other variants cannot be protected against cross Hyper-Thread attacks.

  For CPUs which are only affected by MSBDS the user space, guest and idle
  transition mitigations are sufficient and SMT is not affected.

.. _virt_mechanism:

Virtualization mitigation
^^^^^^^^^^^^^^^^^^^^^^^^^

  The protection for host to guest transition depends on the L1TF
  vulnerability of the CPU:

  - CPU is affected by L1TF:

    If the L1D flush mitigation is enabled and up to date microcode is
    available, the L1D flush mitigation is automatically protecting the
    guest transition.

    If the L1D flush mitigation is disabled then the MDS mitigation is
    invoked explicitly when the host MDS mitigation is enabled.

    For details on L1TF and virtualization see:
    :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <mitigation_control_kvm>`.

  - CPU is not affected by L1TF:

    CPU buffers are flushed before entering the guest when the host MDS
    mitigation is enabled.

  The resulting MDS protection matrix for the host to guest transition:

  ============ ===== ============= ============ =================
   L1TF         MDS   VMX-L1FLUSH   Host MDS     MDS-State

   Don't care   No    Don't care    N/A          Not affected

   Yes          Yes   Disabled      Off          Vulnerable

   Yes          Yes   Disabled      Full         Mitigated

   Yes          Yes   Enabled       Don't care   Mitigated

   No           Yes   N/A           Off          Vulnerable

   No           Yes   N/A           Full         Mitigated
  ============ ===== ============= ============ =================

  This only covers the host to guest transition, i.e. prevents leakage from
  host to guest, but does not protect the guest internally. Guests need to
  have their own protections.
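
The protection matrix above can also be read as a small decision function.
The sketch below only restates the table for the host to guest transition;
the function and parameter names are hypothetical and it does not query any
real system state.

```python
def guest_entry_mds_state(l1tf_affected, mds_affected,
                          vmx_l1d_flush_enabled, host_mds_full):
    """Return the MDS-State column for one row of the matrix.

    l1tf_affected / mds_affected: the CPU is affected by L1TF / MDS.
    vmx_l1d_flush_enabled: the L1D flush mitigation is enabled (the text
        above additionally requires up to date microcode for this case).
    host_mds_full: the host MDS mitigation is enabled ("Full").
    """
    if not mds_affected:
        return "Not affected"
    # On L1TF affected CPUs an enabled L1D flush already protects the
    # guest transition, regardless of the host MDS setting.
    if l1tf_affected and vmx_l1d_flush_enabled:
        return "Mitigated"
    # Otherwise the result depends on the host MDS mitigation alone.
    return "Mitigated" if host_mds_full else "Vulnerable"
```

For example, an L1TF affected CPU with the L1D flush disabled and the host
MDS mitigation off yields ``"Vulnerable"``, matching the second row.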

.. _xeon_phi:

XEON PHI specific considerations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  The XEON PHI processor family is affected by MSBDS which can be exploited
  cross Hyper-Threads when entering idle states. Some XEON PHI variants allow
  the use of MWAIT in user space (Ring 3) which opens a potential attack
  vector for malicious user space. The exposure can be disabled on the kernel
  command line with the 'ring3mwait=disable' command line option.

  XEON PHI is not affected by the other MDS variants and MSBDS is mitigated
  before the CPU enters an idle state. As XEON PHI is not affected by L1TF
  either, disabling SMT is not required for full protection.

.. _mds_smt_control:

SMT control
^^^^^^^^^^^

  All MDS variants except MSBDS can be attacked cross Hyper-Threads. That
  means on CPUs which are affected by MFBDS or MLPDS it is necessary to
  disable SMT for full protection. These are most of the affected CPUs; the
  exception is XEON PHI, see :ref:`xeon_phi`.

  Disabling SMT can have a significant performance impact, but the impact
  depends on the type of workloads.

  See the relevant chapter in the L1TF mitigation documentation for details:
  :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <smt_control>`.
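
The rule stated at the start of this section can be written down as a short
check. This is a sketch with hypothetical names; the variant abbreviations
are the ones from the CVE table, and the set of variants affecting a given
CPU is assumed to be known to the caller.

```python
def needs_smt_disabled(variants):
    """Return True if full protection requires disabling SMT.

    variants: iterable of MDS variant names affecting the CPU, for
    example {"MSBDS", "MFBDS"}. Any variant other than MSBDS can be
    attacked cross Hyper-Threads, so its presence requires SMT off.
    """
    return bool(set(variants) - {"MSBDS"})
```

A CPU that is only affected by MSBDS, such as XEON PHI, gives ``False``
here, while a CPU affected by MFBDS or MLPDS gives ``True``.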


.. _mds_mitigation_control_command_line:

Mitigation control on the kernel command line
---------------------------------------------

The MDS mitigations can be controlled at boot time on the kernel command
line with the option "mds=". The valid arguments for this option are:

  ============  =============================================================
  full		If the CPU is vulnerable, enable all available mitigations
		for the MDS vulnerability, CPU buffer clearing on exit to
		userspace and when entering a VM. Idle transitions are
		protected as well if SMT is enabled.

		It does not automatically disable SMT.

  full,nosmt	The same as mds=full, with SMT disabled on vulnerable
		CPUs.  This is the complete mitigation.

  off		Disables MDS mitigations completely.

  ============  =============================================================

Not specifying this option is equivalent to "mds=full". For processors
that are affected by both TAA (TSX Asynchronous Abort) and MDS,
specifying just "mds=off" without an accompanying "tsx_async_abort=off"
will have no effect as the same mitigation is used for both
vulnerabilities.
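
The interaction between "mds=" and "tsx_async_abort=" can be illustrated with
a small parser for a command line string such as the content of
``/proc/cmdline``. This is a sketch under stated assumptions, not the
kernel's parser: the helper names are hypothetical, the last occurrence of an
option is assumed to win, "tsx_async_abort=" is assumed to default to "full",
and other options that may influence mitigations are ignored.

```python
def _last_value(cmdline, name, default):
    """Return the value of the last 'name=' token, or default if absent."""
    value = default
    prefix = name + "="
    for token in cmdline.split():
        if token.startswith(prefix):
            value = token[len(prefix):]
    return value


def mds_option(cmdline):
    """Value of "mds="; not specifying it is equivalent to "full"."""
    return _last_value(cmdline, "mds", "full")


def buffer_clearing_off(cmdline, taa_affected):
    """True if CPU buffer clearing is really switched off.

    On CPUs affected by both TAA and MDS, "mds=off" alone has no effect
    because the same mitigation serves both vulnerabilities.
    """
    if mds_option(cmdline) != "off":
        return False
    if taa_affected:
        # Assumed default for tsx_async_abort= is "full".
        return _last_value(cmdline, "tsx_async_abort", "full") == "off"
    return True
```

With ``taa_affected=True``, the string ``"mds=off"`` yields ``False`` and
``"mds=off tsx_async_abort=off"`` yields ``True``.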

Mitigation selection guide
--------------------------

1. Trusted userspace
^^^^^^^^^^^^^^^^^^^^

   If all userspace applications are from a trusted source and do not
   execute untrusted code which is supplied externally, then the mitigation
   can be disabled.


2. Virtualization with trusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   The same considerations as above versus trusted user space apply.

3. Virtualization with untrusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   The protection depends on the state of the L1TF mitigations.
   See :ref:`virt_mechanism`.

   If the MDS mitigation is enabled and SMT is disabled, guest to host and
   guest to guest attacks are prevented.

.. _mds_default_mitigations:

Default mitigations
-------------------

  The kernel default mitigations for vulnerable processors are:

  - Enable CPU buffer clearing

  The kernel does not by default enforce the disabling of SMT, which leaves
  SMT systems vulnerable when running untrusted code. The same rationale as
  for L1TF applies.
  See :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <default_mitigations>`.