.. SPDX-License-Identifier: GPL-2.0

======================
Memory Protection Keys
======================

Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
which is found on Intel's Skylake (and later) "Scalable Processor"
Server CPUs. It will be available in future non-server Intel parts
and future AMD processors.

For anyone wishing to test or use this feature, it is available in
Amazon's EC2 C5 instances and is known to work there using an Ubuntu
17.04 image.

Memory Protection Keys provides a mechanism for enforcing page-based
protections, but without requiring modification of the page tables
when an application changes protection domains. It works by
dedicating 4 previously ignored bits in each page table entry to a
"protection key", giving 16 possible keys.

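As a rough illustration of the 4-bit key field described above, the
sketch below reads and replaces the protection key in a raw 64-bit page
table entry. It assumes the x86-64 layout, where the key occupies bits
59 through 62. The helper names pte_pkey() and pte_set_pkey() are
hypothetical and are not kernel APIs.

```c
#include <stdint.h>

/* Assumption: x86-64 page table entries keep the protection key in bits 62:59. */
#define PTE_PKEY_SHIFT 59
#define PTE_PKEY_MASK  (UINT64_C(0xF) << PTE_PKEY_SHIFT)

/* Hypothetical helper: return the 4-bit protection key stored in a PTE value. */
unsigned int pte_pkey(uint64_t pte)
{
    return (unsigned int)((pte & PTE_PKEY_MASK) >> PTE_PKEY_SHIFT);
}

/* Hypothetical helper: return a copy of the PTE with its key set to pkey (0..15). */
uint64_t pte_set_pkey(uint64_t pte, unsigned int pkey)
{
    return (pte & ~PTE_PKEY_MASK) |
           ((uint64_t)(pkey & 0xF) << PTE_PKEY_SHIFT);
}
```

Because the key lives in the page table entry itself, retagging a page
still requires a page table update. Only the per-key permissions can be
changed without touching the page tables.
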
There is also a new user-accessible register (PKRU) with two separate
bits (Access Disable and Write Disable) for each key. Being a CPU
register, PKRU is inherently thread-local, potentially giving each
thread a different set of protections from every other thread.

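The per-key bit pairs can be sketched as plain bit arithmetic on a
32-bit PKRU value. The layout assumed here is the x86 one, with the
Access Disable bit of key N at bit 2*N and the Write Disable bit at bit
2*N+1. The names PKRU_AD, PKRU_WD, pkru_get_rights() and
pkru_set_rights() are made up for this example. The functions only
compute values and never touch the real register.

```c
#include <stdint.h>

/* Illustrative names; the values match Linux's PKEY_DISABLE_ACCESS/WRITE. */
#define PKRU_AD 0x1u  /* Access Disable: blocks reads and writes */
#define PKRU_WD 0x2u  /* Write Disable: blocks writes only */

/* Return the two rights bits that a PKRU value holds for one key (0..15). */
uint32_t pkru_get_rights(uint32_t pkru, int pkey)
{
    return (pkru >> (2 * pkey)) & 0x3u;
}

/* Return a PKRU value with the rights bits of one key (0..15) replaced. */
uint32_t pkru_set_rights(uint32_t pkru, int pkey, uint32_t rights)
{
    unsigned int shift = 2u * (unsigned int)pkey;

    pkru &= ~(0x3u << shift);               /* clear the old AD/WD pair */
    return pkru | ((rights & 0x3u) << shift);
}
```

For example, pkru_set_rights(0, 1, PKRU_WD) yields 0x8: only the Write
Disable bit of key 1 is set, and every other key is left unrestricted.
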
There are two new instructions (RDPKRU/WRPKRU) for reading and writing
to the new register. The feature is only available in 64-bit mode,
even though there is theoretically space in the PAE PTEs. These
permissions are enforced on data access only and have no effect on
instruction fetches.

Syscalls
========

There are 3 system calls which directly interact with pkeys::

    int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
    int pkey_free(int pkey);
    int pkey_mprotect(unsigned long start, size_t len,
                      unsigned long prot, int pkey);

Before a pkey can be used, it must first be allocated with
pkey_alloc(). An application calls the WRPKRU instruction
directly in order to change access permissions to memory covered
with a key. In this example WRPKRU is wrapped by a C function
called pkey_set()::

    int real_prot = PROT_READ|PROT_WRITE;
    pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
    ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
    ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
    ... application runs here

Now, if the application needs to update the data at 'ptr', it can
gain access, do the update, then remove its write access::

    pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
    *ptr = foo; // assign something
    pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again

Now when it frees the memory, it will also free the pkey since it
is no longer in use::

    munmap(ptr, PAGE_SIZE);
    pkey_free(pkey);

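The fragments above can be combined into one function, shown here as a
minimal sketch rather than a reference implementation. It assumes the
glibc wrappers pkey_alloc(), pkey_mprotect(), pkey_set() and pkey_free()
(glibc 2.27 or later) instead of raw system calls. The function name
pkey_demo() and its return convention are invented for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes the pkey_*() wrappers in glibc */
#endif
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo of the allocate / protect / toggle / free sequence.
 * Returns 0 on success, 1 if no pkey could be allocated, -1 on error.
 */
int pkey_demo(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    int pkey, ret = -1;
    int *ptr;

    /* Fails if the CPU or kernel lacks pkey support, or no key is free. */
    pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
    if (pkey < 0) {
        perror("pkey_alloc");
        return 1;
    }

    ptr = mmap(NULL, page_size, PROT_NONE,
               MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (ptr == MAP_FAILED) {
        perror("mmap");
        goto out_free;
    }

    if (pkey_mprotect(ptr, page_size, PROT_READ | PROT_WRITE, pkey) < 0) {
        perror("pkey_mprotect");
        goto out_unmap;
    }

    pkey_set(pkey, 0);                  /* this thread may now write */
    *ptr = 42;
    pkey_set(pkey, PKEY_DISABLE_WRITE); /* writes fault again, reads work */

    ret = (*ptr == 42) ? 0 : -1;

out_unmap:
    munmap(ptr, page_size);
out_free:
    pkey_free(pkey);
    return ret;
}
```

On a machine without protection key support the function is expected to
take the early return after pkey_alloc() fails, so it never reaches
pkey_set().
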
.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
          An example implementation can be found in
          tools/testing/selftests/x86/protection_keys.c.

Behavior
========

The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect(). For instance if you do this::

    mprotect(ptr, size, PROT_NONE);
    something(ptr);

you can expect the same effects with protection keys when doing this::

    pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_ACCESS);
    pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
    something(ptr);

That should be true whether something() is a direct access to 'ptr'
like::

    *ptr = foo;

or when the kernel does the access on the application's behalf like
with a read()::

    read(fd, ptr, 1);

The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.