.. SPDX-License-Identifier: GPL-2.0

======================
Memory Protection Keys
======================

Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
which is found on Intel's Skylake (and later) "Scalable Processor"
Server CPUs. It will be available in future non-server Intel parts
and future AMD processors.

For anyone wishing to test or use this feature, it is available in
Amazon's EC2 C5 instances and is known to work there using an Ubuntu
17.04 image.

Memory Protection Keys provide a mechanism for enforcing page-based
protections, but without requiring modification of the page tables
when an application changes protection domains. It works by
dedicating 4 previously ignored bits in each page table entry to a
"protection key", giving 16 possible keys.

There is also a new user-accessible register (PKRU) with two separate
bits (Access Disable and Write Disable) for each key. Being a CPU
register, PKRU is inherently thread-local, potentially giving each
thread a different set of protections from every other thread.

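As a rough illustration of the per-key bit pairs described above, the
sketch below decodes the two bits for one key from a PKRU value. It
assumes the x86 layout in which Access Disable sits at bit ``2 * key``
and Write Disable at bit ``2 * key + 1``. The helper and macro names are
hypothetical and are not kernel or libc interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PKRU_BITS_PER_KEY 2     /* Access Disable + Write Disable */
#define PKRU_NR_KEYS      16    /* 4 key bits per page table entry */

/* Hypothetical helper: true if Access Disable is set for 'key'. */
static bool pkru_access_disabled(uint32_t pkru, unsigned int key)
{
        assert(key < PKRU_NR_KEYS);
        return (pkru >> (key * PKRU_BITS_PER_KEY)) & 0x1;
}

/* Hypothetical helper: true if Write Disable is set for 'key'. */
static bool pkru_write_disabled(uint32_t pkru, unsigned int key)
{
        assert(key < PKRU_NR_KEYS);
        return (pkru >> (key * PKRU_BITS_PER_KEY + 1)) & 0x1;
}
```

Because each thread has its own PKRU, the same key can decode to
different rights depending on which thread's register value is examined.
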
There are two new instructions (RDPKRU/WRPKRU) for reading and writing
to the new register. The feature is only available in 64-bit mode,
even though there is theoretically space in the PAE PTEs. These
permissions are enforced on data access only and have no effect on
instruction fetches.

Syscalls
========

There are 3 system calls which directly interact with pkeys::

        int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
        int pkey_free(int pkey);
        int pkey_mprotect(unsigned long start, size_t len,
                          unsigned long prot, int pkey);

Before a pkey can be used, it must first be allocated with
pkey_alloc(). An application calls the WRPKRU instruction
directly in order to change access permissions to memory covered
by a key. In this example WRPKRU is wrapped by a C function
called pkey_set().
::

        int real_prot = PROT_READ|PROT_WRITE;
        pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
        ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
        ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
        ... application runs here

Now, if the application needs to update the data at 'ptr', it can
gain access, do the update, then remove its write access::

        pkey_set(pkey, 0);                  // clear PKEY_DISABLE_WRITE
        *ptr = foo;                         // assign something
        pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again

Now when it frees the memory, it will also free the pkey since it
is no longer in use::

        munmap(ptr, PAGE_SIZE);
        pkey_free(pkey);

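The fragments above can be combined into one function, sketched here
under the assumption that the C library provides the pkey_alloc(),
pkey_mprotect(), pkey_set() and pkey_free() wrappers (glibc 2.27 and
later expose them through ``<sys/mman.h>`` when ``_GNU_SOURCE`` is
defined). The function name and its return convention are hypothetical.
When protection keys are unavailable, pkey_alloc() fails and the sketch
reports that rather than treating it as an error.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for the glibc pkey_*() wrappers */
#endif
#include <assert.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns 0 on success, 1 if no protection key could
 * be allocated on this system, and -1 on any other failure.
 */
static int pkey_lifecycle_demo(void)
{
        size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
        int ret = -1;
        int pkey;
        int *ptr;

        /* The single int stored below must fit inside the mapping. */
        assert(page_size >= sizeof(*ptr));

        /* Allocate a key whose initial rights forbid writes. */
        pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
        if (pkey < 0)
                return 1;       /* no support, or no free keys */

        ptr = mmap(NULL, page_size, PROT_NONE,
                   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        if (ptr == MAP_FAILED)
                goto out_free;

        /* Page permissions allow read/write; the key still blocks writes. */
        if (pkey_mprotect(ptr, page_size, PROT_READ | PROT_WRITE, pkey) != 0)
                goto out_unmap;

        if (pkey_set(pkey, 0) != 0)                     /* allow writes */
                goto out_unmap;
        *ptr = 42;
        if (pkey_set(pkey, PKEY_DISABLE_WRITE) != 0)    /* forbid them again */
                goto out_unmap;

        /* Reads still work because only writes are disabled. */
        ret = (*ptr == 42) ? 0 : -1;

out_unmap:
        munmap(ptr, page_size);
out_free:
        pkey_free(pkey);
        return ret;
}
```

Unmapping before freeing the key follows the order used in the example
above, so the key is no longer attached to any memory when it is freed.
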
.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
          An example implementation can be found in
          tools/testing/selftests/x86/protection_keys.c.

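To make the read-modify-write nature of such a wrapper concrete, the
sketch below computes the value that would be written back with WRPKRU
after the current value has been read with RDPKRU. It is not the
selftest implementation. The helper name and the ``SKETCH_`` constants
are hypothetical, with values chosen to match the kernel's
``PKEY_DISABLE_ACCESS`` (0x1) and ``PKEY_DISABLE_WRITE`` (0x2), and the
instructions themselves are deliberately left out so that the code runs
on any machine.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as the kernel's PKEY_DISABLE_* flags, renamed for the sketch. */
#define SKETCH_PKEY_DISABLE_ACCESS 0x1u
#define SKETCH_PKEY_DISABLE_WRITE  0x2u

/*
 * Hypothetical helper: return 'pkru' with the two bits belonging to
 * 'pkey' replaced by 'rights', leaving every other key untouched.
 */
static uint32_t pkru_with_rights(uint32_t pkru, int pkey, uint32_t rights)
{
        const uint32_t mask = SKETCH_PKEY_DISABLE_ACCESS |
                              SKETCH_PKEY_DISABLE_WRITE;
        unsigned int shift;

        assert(pkey >= 0 && pkey < 16);
        assert((rights & ~mask) == 0);

        shift = (unsigned int)pkey * 2;
        pkru &= ~(mask << shift);       /* clear the old rights for this key */
        pkru |= rights << shift;        /* install the new rights */
        return pkru;
}
```

A real pkey_set() would pass the current register contents as ``pkru``
and write the returned value back to the register.
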
Behavior
========

The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect(). For instance, if you do this::

        mprotect(ptr, size, PROT_NONE);
        something(ptr);

you can expect the same effects with protection keys when doing this::

        pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE);
        pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
        something(ptr);

That should be true whether something() is a direct access to 'ptr'
like::

        *ptr = foo;

or when the kernel does the access on the application's behalf like
with a read()::

        read(fd, ptr, 1);

The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.