Documentation: x86: convert intel_mpx.txt to reST

author Changbin Du <changbin.du@gmail.com>

Wed, 8 May 2019 15:21:27 +0000 (23:21 +0800)

committer Jonathan Corbet <corbet@lwn.net>

Wed, 8 May 2019 20:34:10 +0000 (14:34 -0600)
author Changbin Du <changbin.du@gmail.com>
Wed, 8 May 2019 15:21:27 +0000 (23:21 +0800)
committer Jonathan Corbet <corbet@lwn.net>
Wed, 8 May 2019 20:34:10 +0000 (14:34 -0600)
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst

index e2c0db9fcd4ed6a8e19ee70f22a7ce7b6cc65e40..b5cdc0d889b338c4b8f8cdc14a880e841a4b839b 100644 (file)
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -19,3 +19,4 @@ x86-specific Documentation
     mtrr
     pat
     protection-keys
+   intel_mpx
diff --git a/Documentation/x86/intel_mpx.rst b/Documentation/x86/intel_mpx.rst

new file mode 100644 (file)

index 0000000..387a640
--- /dev/null
+++ b/Documentation/x86/intel_mpx.rst
@@ -0,0 +1,252 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================================
+Intel(R) Memory Protection Extensions (MPX)
+===========================================
+
+Intel(R) MPX Overview
+=====================
+
+Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new capability
+introduced into Intel Architecture. Intel MPX provides hardware features
+that can be used in conjunction with compiler changes to check memory
+references, for those references whose compile-time normal intentions are
+usurped at runtime due to buffer overflow or underflow.
+
+You can tell if your CPU supports MPX by looking in /proc/cpuinfo::
+
+       cat /proc/cpuinfo  | grep ' mpx '
+
+For more information, please refer to Intel(R) Architecture Instruction
+Set Extensions Programming Reference, Chapter 9: Intel(R) Memory Protection
+Extensions.
+
+Note: As of December 2014, no hardware with MPX is available but it is
+possible to use SDE (Intel(R) Software Development Emulator) instead, which
+can be downloaded from
+http://software.intel.com/en-us/articles/intel-software-development-emulator
+
+
+How to get the advantage of MPX
+===============================
+
+For MPX to work, changes are required in the kernel, binutils and compiler.
+No source changes are required for applications, just a recompile.
+
+There are a lot of moving parts of this to all work right. The following
+is how we expect the compiler, application and kernel to work together.
+
+1) Application developer compiles with -fmpx. The compiler will add the
+   instrumentation as well as some setup code called early after the app
+   starts. New instruction prefixes are noops for old CPUs.
+2) That setup code allocates (virtual) space for the "bounds directory",
+   points the "bndcfgu" register to the directory (must also set the valid
+   bit) and notifies the kernel (via the new prctl(PR_MPX_ENABLE_MANAGEMENT))
+   that the app will be using MPX.  The app must be careful not to access
+   the bounds tables between the time when it populates "bndcfgu" and
+   when it calls the prctl().  This might be hard to guarantee if the app
+   is compiled with MPX.  You can add "__attribute__((bnd_legacy))" to
+   the function to disable MPX instrumentation to help guarantee this.
+   Also be careful not to call out to any other code which might be
+   MPX-instrumented.
+3) The kernel detects that the CPU has MPX, allows the new prctl() to
+   succeed, and notes the location of the bounds directory. Userspace is
+   expected to keep the bounds directory at that location. We note it
+   instead of reading it each time because the 'xsave' operation needed
+   to access the bounds directory register is an expensive operation.
+4) If the application needs to spill bounds out of the 4 registers, it
+   issues a bndstx instruction. Since the bounds directory is empty at
+   this point, a bounds fault (#BR) is raised, the kernel allocates a
+   bounds table (in the user address space) and makes the relevant entry
+   in the bounds directory point to the new table.
+5) If the application violates the bounds specified in the bounds registers,
+   a separate kind of #BR is raised which will deliver a signal with
+   information about the violation in the 'struct siginfo'.
+6) Whenever memory is freed, we know that it can no longer contain valid
+   pointers, and we attempt to free the associated space in the bounds
+   tables. If an entire table becomes unused, we will attempt to free
+   the table and remove the entry in the directory.
+
+To summarize, there are essentially three things interacting here:
+
+GCC with -fmpx:
+ * enables annotation of code with MPX instructions and prefixes
+ * inserts code early in the application to call in to the "gcc runtime"
+GCC MPX Runtime:
+ * Checks for hardware MPX support in cpuid leaf
+ * allocates virtual space for the bounds directory (malloc() essentially)
+ * points the hardware BNDCFGU register at the directory
+ * calls a new prctl(PR_MPX_ENABLE_MANAGEMENT) to notify the kernel to
+   start managing the bounds directories
+Kernel MPX Code:
+ * Checks for hardware MPX support in cpuid leaf
+ * Handles #BR exceptions and sends SIGSEGV to the app when it violates
+   bounds, like during a buffer overflow.
+ * When bounds are spilled in to an unallocated bounds table, the kernel
+   notices in the #BR exception, allocates the virtual space, then
+   updates the bounds directory to point to the new table. It keeps
+   special track of the memory with a VM_MPX flag.
+ * Frees unused bounds tables at the time that the memory they described
+   is unmapped.
+
+
+How does MPX kernel code work
+=============================
+
+Handling #BR faults caused by MPX
+---------------------------------
+
+When MPX is enabled, there are 2 new situations that can generate
+#BR faults.
+
+  * new bounds tables (BT) need to be allocated to save bounds.
+  * bounds violation caused by MPX instructions.
+
+We hook #BR handler to handle these two new situations.
+
+On-demand kernel allocation of bounds tables
+--------------------------------------------
+
+MPX only has 4 hardware registers for storing bounds information. If
+MPX-enabled code needs more than these 4 registers, it needs to spill
+them somewhere. It has two special instructions for this which allow
+the bounds to be moved between the bounds registers and some new "bounds
+tables".
+
+#BR exceptions are a new class of exceptions just for MPX. They are
+similar conceptually to a page fault and will be raised by the MPX
+hardware during both bounds violations or when the tables are not
+present. The kernel handles those #BR exceptions for not-present tables
+by carving the space out of the normal processes address space and then
+pointing the bounds-directory over to it.
+
+The tables need to be accessed and controlled by userspace because
+the instructions for moving bounds in and out of them are extremely
+frequent. They potentially happen every time a register points to
+memory. Any direct kernel involvement (like a syscall) to access the
+tables would obviously destroy performance.
+
+Why not do this in userspace? MPX does not strictly require anything in
+the kernel. It can theoretically be done completely from userspace. Here
+are a few ways this could be done. We don't think any of them are practical
+in the real-world, but here they are.
+
+:Q: Can virtual space simply be reserved for the bounds tables so that we
+    never have to allocate them?
+:A: MPX-enabled application will possibly create a lot of bounds tables in
+    process address space to save bounds information. These tables can take
+    up huge swaths of memory (as much as 80% of the memory on the system)
+    even if we clean them up aggressively. In the worst-case scenario, the
+    tables can be 4x the size of the data structure being tracked. IOW, a
+    1-page structure can require 4 bounds-table pages. An X-GB virtual
+    area needs 4*X GB of virtual space, plus 2GB for the bounds directory.
+    If we were to preallocate them for the 128TB of user virtual address
+    space, we would need to reserve 512TB+2GB, which is larger than the
+    entire virtual address space today. This means they can not be reserved
+    ahead of time. Also, a single process's pre-populated bounds directory
+    consumes 2GB of virtual *AND* physical memory. IOW, it's completely
+    infeasible to prepopulate bounds directories.
+
+:Q: Can we preallocate bounds table space at the same time memory is
+    allocated which might contain pointers that might eventually need
+    bounds tables?
+:A: This would work if we could hook the site of each and every memory
+    allocation syscall. This can be done for small, constrained applications.
+    But, it isn't practical at a larger scale since a given app has no
+    way of controlling how all the parts of the app might allocate memory
+    (think libraries). The kernel is really the only place to intercept
+    these calls.
+
+:Q: Could a bounds fault be handed to userspace and the tables allocated
+    there in a signal handler instead of in the kernel?
+:A: mmap() is not on the list of safe async handler functions and even
+    if mmap() would work it still requires locking or nasty tricks to
+    keep track of the allocation state there.
+
+Having ruled out all of the userspace-only approaches for managing
+bounds tables that we could think of, we create them on demand in
+the kernel.
+
+Decoding MPX instructions
+-------------------------
+
+If a #BR is generated due to a bounds violation caused by MPX.
+We need to decode MPX instructions to get violation address and
+set this address into extended struct siginfo.
+
+The _sigfault field of struct siginfo is extended as follow::
+
+  87           /* SIGILL, SIGFPE, SIGSEGV, SIGBUS */
+  88           struct {
+  89                   void __user *_addr; /* faulting insn/memory ref. */
+  90 #ifdef __ARCH_SI_TRAPNO
+  91                   int _trapno;    /* TRAP # which caused the signal */
+  92 #endif
+  93                   short _addr_lsb; /* LSB of the reported address */
+  94                   struct {
+  95                           void __user *_lower;
+  96                           void __user *_upper;
+  97                   } _addr_bnd;
+  98           } _sigfault;
+
+The '_addr' field refers to violation address, and new '_addr_and'
+field refers to the upper/lower bounds when a #BR is caused.
+
+Glibc will be also updated to support this new siginfo. So user
+can get violation address and bounds when bounds violations occur.
+
+Cleanup unused bounds tables
+----------------------------
+
+When a BNDSTX instruction attempts to save bounds to a bounds directory
+entry marked as invalid, a #BR is generated. This is an indication that
+no bounds table exists for this entry. In this case the fault handler
+will allocate a new bounds table on demand.
+
+Since the kernel allocated those tables on-demand without userspace
+knowledge, it is also responsible for freeing them when the associated
+mappings go away.
+
+Here, the solution for this issue is to hook do_munmap() to check
+whether one process is MPX enabled. If yes, those bounds tables covered
+in the virtual address region which is being unmapped will be freed also.
+
+Adding new prctl commands
+-------------------------
+
+Two new prctl commands are added to enable and disable MPX bounds tables
+management in kernel.
+::
+
+  155  #define PR_MPX_ENABLE_MANAGEMENT        43
+  156  #define PR_MPX_DISABLE_MANAGEMENT       44
+
+Runtime library in userspace is responsible for allocation of bounds
+directory. So kernel have to use XSAVE instruction to get the base
+of bounds directory from BNDCFG register.
+
+But XSAVE is expected to be very expensive. In order to do performance
+optimization, we have to get the base of bounds directory and save it
+into struct mm_struct to be used in future during PR_MPX_ENABLE_MANAGEMENT
+command execution.
+
+
+Special rules
+=============
+
+1) If userspace is requesting help from the kernel to do the management
+of bounds tables, it may not create or modify entries in the bounds directory.
+
+Certainly users can allocate bounds tables and forcibly point the bounds
+directory at them through XSAVE instruction, and then set valid bit
+of bounds entry to have this entry valid.  But, the kernel will decline
+to assist in managing these tables.
+
+2) Userspace may not take multiple bounds directory entries and point
+them at the same bounds table.
+
+This is allowed architecturally.  See more information "Intel(R) Architecture
+Instruction Set Extensions Programming Reference" (9.3.4).
+
+However, if users did this, the kernel might be fooled in to unmapping an
+in-use bounds table since it does not recognize sharing.
diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt

deleted file mode 100644 (file)

index 85d0549..0000000
--- a/Documentation/x86/intel_mpx.txt
+++ /dev/null
@@ -1,244 +0,0 @@
-1. Intel(R) MPX Overview
-========================
-
-Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new capability
-introduced into Intel Architecture. Intel MPX provides hardware features
-that can be used in conjunction with compiler changes to check memory
-references, for those references whose compile-time normal intentions are
-usurped at runtime due to buffer overflow or underflow.
-
-You can tell if your CPU supports MPX by looking in /proc/cpuinfo:
-
-       cat /proc/cpuinfo  | grep ' mpx '
-
-For more information, please refer to Intel(R) Architecture Instruction
-Set Extensions Programming Reference, Chapter 9: Intel(R) Memory Protection
-Extensions.
-
-Note: As of December 2014, no hardware with MPX is available but it is
-possible to use SDE (Intel(R) Software Development Emulator) instead, which
-can be downloaded from
-http://software.intel.com/en-us/articles/intel-software-development-emulator
-
-
-2. How to get the advantage of MPX
-==================================
-
-For MPX to work, changes are required in the kernel, binutils and compiler.
-No source changes are required for applications, just a recompile.
-
-There are a lot of moving parts of this to all work right. The following
-is how we expect the compiler, application and kernel to work together.
-
-1) Application developer compiles with -fmpx. The compiler will add the
-   instrumentation as well as some setup code called early after the app
-   starts. New instruction prefixes are noops for old CPUs.
-2) That setup code allocates (virtual) space for the "bounds directory",
-   points the "bndcfgu" register to the directory (must also set the valid
-   bit) and notifies the kernel (via the new prctl(PR_MPX_ENABLE_MANAGEMENT))
-   that the app will be using MPX.  The app must be careful not to access
-   the bounds tables between the time when it populates "bndcfgu" and
-   when it calls the prctl().  This might be hard to guarantee if the app
-   is compiled with MPX.  You can add "__attribute__((bnd_legacy))" to
-   the function to disable MPX instrumentation to help guarantee this.
-   Also be careful not to call out to any other code which might be
-   MPX-instrumented.
-3) The kernel detects that the CPU has MPX, allows the new prctl() to
-   succeed, and notes the location of the bounds directory. Userspace is
-   expected to keep the bounds directory at that location. We note it
-   instead of reading it each time because the 'xsave' operation needed
-   to access the bounds directory register is an expensive operation.
-4) If the application needs to spill bounds out of the 4 registers, it
-   issues a bndstx instruction. Since the bounds directory is empty at
-   this point, a bounds fault (#BR) is raised, the kernel allocates a
-   bounds table (in the user address space) and makes the relevant entry
-   in the bounds directory point to the new table.
-5) If the application violates the bounds specified in the bounds registers,
-   a separate kind of #BR is raised which will deliver a signal with
-   information about the violation in the 'struct siginfo'.
-6) Whenever memory is freed, we know that it can no longer contain valid
-   pointers, and we attempt to free the associated space in the bounds
-   tables. If an entire table becomes unused, we will attempt to free
-   the table and remove the entry in the directory.
-
-To summarize, there are essentially three things interacting here:
-
-GCC with -fmpx:
- * enables annotation of code with MPX instructions and prefixes
- * inserts code early in the application to call in to the "gcc runtime"
-GCC MPX Runtime:
- * Checks for hardware MPX support in cpuid leaf
- * allocates virtual space for the bounds directory (malloc() essentially)
- * points the hardware BNDCFGU register at the directory
- * calls a new prctl(PR_MPX_ENABLE_MANAGEMENT) to notify the kernel to
-   start managing the bounds directories
-Kernel MPX Code:
- * Checks for hardware MPX support in cpuid leaf
- * Handles #BR exceptions and sends SIGSEGV to the app when it violates
-   bounds, like during a buffer overflow.
- * When bounds are spilled in to an unallocated bounds table, the kernel
-   notices in the #BR exception, allocates the virtual space, then
-   updates the bounds directory to point to the new table. It keeps
-   special track of the memory with a VM_MPX flag.
- * Frees unused bounds tables at the time that the memory they described
-   is unmapped.
-
-
-3. How does MPX kernel code work
-================================
-
-Handling #BR faults caused by MPX
----------------------------------
-
-When MPX is enabled, there are 2 new situations that can generate
-#BR faults.
-  * new bounds tables (BT) need to be allocated to save bounds.
-  * bounds violation caused by MPX instructions.
-
-We hook #BR handler to handle these two new situations.
-
-On-demand kernel allocation of bounds tables
---------------------------------------------
-
-MPX only has 4 hardware registers for storing bounds information. If
-MPX-enabled code needs more than these 4 registers, it needs to spill
-them somewhere. It has two special instructions for this which allow
-the bounds to be moved between the bounds registers and some new "bounds
-tables".
-
-#BR exceptions are a new class of exceptions just for MPX. They are
-similar conceptually to a page fault and will be raised by the MPX
-hardware during both bounds violations or when the tables are not
-present. The kernel handles those #BR exceptions for not-present tables
-by carving the space out of the normal processes address space and then
-pointing the bounds-directory over to it.
-
-The tables need to be accessed and controlled by userspace because
-the instructions for moving bounds in and out of them are extremely
-frequent. They potentially happen every time a register points to
-memory. Any direct kernel involvement (like a syscall) to access the
-tables would obviously destroy performance.
-
-Why not do this in userspace? MPX does not strictly require anything in
-the kernel. It can theoretically be done completely from userspace. Here
-are a few ways this could be done. We don't think any of them are practical
-in the real-world, but here they are.
-
-Q: Can virtual space simply be reserved for the bounds tables so that we
-   never have to allocate them?
-A: MPX-enabled application will possibly create a lot of bounds tables in
-   process address space to save bounds information. These tables can take
-   up huge swaths of memory (as much as 80% of the memory on the system)
-   even if we clean them up aggressively. In the worst-case scenario, the
-   tables can be 4x the size of the data structure being tracked. IOW, a
-   1-page structure can require 4 bounds-table pages. An X-GB virtual
-   area needs 4*X GB of virtual space, plus 2GB for the bounds directory.
-   If we were to preallocate them for the 128TB of user virtual address
-   space, we would need to reserve 512TB+2GB, which is larger than the
-   entire virtual address space today. This means they can not be reserved
-   ahead of time. Also, a single process's pre-populated bounds directory
-   consumes 2GB of virtual *AND* physical memory. IOW, it's completely
-   infeasible to prepopulate bounds directories.
-
-Q: Can we preallocate bounds table space at the same time memory is
-   allocated which might contain pointers that might eventually need
-   bounds tables?
-A: This would work if we could hook the site of each and every memory
-   allocation syscall. This can be done for small, constrained applications.
-   But, it isn't practical at a larger scale since a given app has no
-   way of controlling how all the parts of the app might allocate memory
-   (think libraries). The kernel is really the only place to intercept
-   these calls.
-
-Q: Could a bounds fault be handed to userspace and the tables allocated
-   there in a signal handler instead of in the kernel?
-A: mmap() is not on the list of safe async handler functions and even
-   if mmap() would work it still requires locking or nasty tricks to
-   keep track of the allocation state there.
-
-Having ruled out all of the userspace-only approaches for managing
-bounds tables that we could think of, we create them on demand in
-the kernel.
-
-Decoding MPX instructions
--------------------------
-
-If a #BR is generated due to a bounds violation caused by MPX.
-We need to decode MPX instructions to get violation address and
-set this address into extended struct siginfo.
-
-The _sigfault field of struct siginfo is extended as follow:
-
-87             /* SIGILL, SIGFPE, SIGSEGV, SIGBUS */
-88             struct {
-89                     void __user *_addr; /* faulting insn/memory ref. */
-90 #ifdef __ARCH_SI_TRAPNO
-91                     int _trapno;    /* TRAP # which caused the signal */
-92 #endif
-93                     short _addr_lsb; /* LSB of the reported address */
-94                     struct {
-95                             void __user *_lower;
-96                             void __user *_upper;
-97                     } _addr_bnd;
-98             } _sigfault;
-
-The '_addr' field refers to violation address, and new '_addr_and'
-field refers to the upper/lower bounds when a #BR is caused.
-
-Glibc will be also updated to support this new siginfo. So user
-can get violation address and bounds when bounds violations occur.
-
-Cleanup unused bounds tables
-----------------------------
-
-When a BNDSTX instruction attempts to save bounds to a bounds directory
-entry marked as invalid, a #BR is generated. This is an indication that
-no bounds table exists for this entry. In this case the fault handler
-will allocate a new bounds table on demand.
-
-Since the kernel allocated those tables on-demand without userspace
-knowledge, it is also responsible for freeing them when the associated
-mappings go away.
-
-Here, the solution for this issue is to hook do_munmap() to check
-whether one process is MPX enabled. If yes, those bounds tables covered
-in the virtual address region which is being unmapped will be freed also.
-
-Adding new prctl commands
--------------------------
-
-Two new prctl commands are added to enable and disable MPX bounds tables
-management in kernel.
-
-155    #define PR_MPX_ENABLE_MANAGEMENT        43
-156    #define PR_MPX_DISABLE_MANAGEMENT       44
-
-Runtime library in userspace is responsible for allocation of bounds
-directory. So kernel have to use XSAVE instruction to get the base
-of bounds directory from BNDCFG register.
-
-But XSAVE is expected to be very expensive. In order to do performance
-optimization, we have to get the base of bounds directory and save it
-into struct mm_struct to be used in future during PR_MPX_ENABLE_MANAGEMENT
-command execution.
-
-
-4. Special rules
-================
-
-1) If userspace is requesting help from the kernel to do the management
-of bounds tables, it may not create or modify entries in the bounds directory.
-
-Certainly users can allocate bounds tables and forcibly point the bounds
-directory at them through XSAVE instruction, and then set valid bit
-of bounds entry to have this entry valid.  But, the kernel will decline
-to assist in managing these tables.
-
-2) Userspace may not take multiple bounds directory entries and point
-them at the same bounds table.
-
-This is allowed architecturally.  See more information "Intel(R) Architecture
-Instruction Set Extensions Programming Reference" (9.3.4).
-
-However, if users did this, the kernel might be fooled in to unmapping an
-in-use bounds table since it does not recognize sharing.
author	Changbin Du <changbin.du@gmail.com>
	Wed, 8 May 2019 15:21:27 +0000 (23:21 +0800)
committer	Jonathan Corbet <corbet@lwn.net>
	Wed, 8 May 2019 20:34:10 +0000 (14:34 -0600)
Documentation/x86/index.rst		patch \| blob \| blame \| history
Documentation/x86/intel_mpx.rst	[new file with mode: 0644]	patch \| blob
Documentation/x86/intel_mpx.txt	[deleted file]	patch \| blob \| blame \| history