Commit | Line | Data |
---|---|---|
3fa97bf0 JS |
1 | .. SPDX-License-Identifier: GPL-2.0 |
2 | ||
3 | =============================== | |
4 | Software Guard eXtensions (SGX) | |
5 | =============================== | |
6 | ||
7 | Overview | |
8 | ======== | |
9 | ||
10 | Software Guard eXtensions (SGX) hardware enables user space applications | |
11 | to set aside private memory regions of code and data: | |
12 | ||
13 | * Privileged (ring-0) ENCLS functions orchestrate the construction of the | |
14 | regions. | |
15 | * Unprivileged (ring-3) ENCLU functions allow an application to enter and | |
16 | execute inside the regions. | |
17 | ||
18 | These memory regions are called enclaves. An enclave can only be entered at a | |
19 | fixed set of entry points. Each entry point can hold a single hardware thread | |
20 | at a time. While the enclave is loaded from a regular binary file by using | |
21 | ENCLS functions, only the threads inside the enclave can access its memory. The | |
22 | region is protected from outside access by the CPU and is encrypted before it | |
23 | leaves the LLC. | |
24 | ||
25 | SGX support can be checked with: | |
26 | ||
27 | ``grep sgx /proc/cpuinfo`` | |
28 | ||
29 | SGX must both be supported in the processor and enabled by the BIOS. If SGX | |
30 | appears to be unsupported on a system which has hardware support, ensure | |
31 | support is enabled in the BIOS. If a BIOS presents a choice between "Enabled" | |
32 | and "Software Enabled" modes for SGX, choose "Enabled". | |
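
The check above can also be scripted. The following is a minimal sketch, assuming the usual ``flags`` line layout of ``/proc/cpuinfo``; the helper names ``cpu_flags`` and ``sgx_advertised`` are hypothetical. Unlike a plain ``grep sgx``, it matches the exact ``sgx`` flag rather than any flag that merely contains the string.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the first "flags" line is inspected; all CPUs normally agree.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def sgx_advertised(cpuinfo_text):
    """True if the exact 'sgx' flag is present (hypothetical helper)."""
    return "sgx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("sgx advertised:", sgx_advertised(f.read()))
    except OSError:
        print("cannot read /proc/cpuinfo")
```

A missing flag does not by itself say whether the processor lacks SGX or the BIOS has it disabled; the sketch only reports what the kernel advertises.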
33 | ||
34 | Enclave Page Cache | |
35 | ================== | |
36 | ||
37 | SGX utilizes an *Enclave Page Cache (EPC)* to store pages that are associated | |
38 | with an enclave. It is contained in a BIOS-reserved region of physical memory. | |
39 | Unlike pages used for regular memory, EPC pages can only be accessed from outside of | |
40 | the enclave during enclave construction with special, limited SGX instructions. | |
41 | ||
42 | Only a CPU executing inside an enclave can directly access enclave memory. | |
43 | However, a CPU executing inside an enclave may access normal memory outside the | |
44 | enclave. | |
45 | ||
46 | The kernel manages enclave memory similarly to how it treats device memory. | |
47 | ||
48 | Enclave Page Types | |
49 | ------------------ | |
50 | ||
51 | **SGX Enclave Control Structure (SECS)** | |
52 | Enclave's address range, attributes and other global data are defined | |
53 | by this structure. | |
54 | ||
55 | **Regular (REG)** | |
56 | Regular EPC pages contain the code and data of an enclave. | |
57 | ||
58 | **Thread Control Structure (TCS)** | |
59 | Thread Control Structure pages define the entry points to an enclave and | |
60 | track the execution state of an enclave thread. | |
61 | ||
62 | **Version Array (VA)** | |
63 | Version Array pages contain 512 slots, each of which can contain a version | |
64 | number for a page evicted from the EPC. | |
65 | ||
66 | Enclave Page Cache Map | |
67 | ---------------------- | |
68 | ||
69 | The processor tracks EPC pages in a hardware metadata structure called the | |
70 | *Enclave Page Cache Map (EPCM)*. The EPCM contains an entry for each EPC page | |
71 | which describes the owning enclave, access rights and page type, among other | |
72 | things. | |
73 | ||
74 | EPCM permissions are separate from the normal page tables. This prevents the | |
75 | kernel from, for instance, allowing writes to data which an enclave wishes to | |
76 | remain read-only. EPCM permissions may only impose additional restrictions on | |
77 | top of normal x86 page permissions. | |
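
One way to picture the "additional restrictions only" rule is as an intersection of two permission sets. The sketch below is a toy model, not how the hardware is implemented; ``Perm`` and ``effective`` are hypothetical names introduced for illustration.

```python
import enum


class Perm(enum.Flag):
    """Toy permission bits (illustrative, not the hardware encoding)."""
    NONE = 0
    R = 1
    W = 2
    X = 4


def effective(page_table, epcm):
    """An access is allowed only if both the page tables and the EPCM allow it."""
    return page_table & epcm


# The kernel maps the page read-write, but the enclave asked for read-only:
# the EPCM entry wins and writes remain forbidden.
print(effective(Perm.R | Perm.W, Perm.R))
```

In this model, widening the page-table permissions can never grant an access that the EPCM entry forbids, which is the property the paragraph above describes.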
78 | ||
79 | For all intents and purposes, the SGX architecture allows the processor to | |
80 | invalidate all EPCM entries at will. This requires that software be prepared to | |
81 | handle an EPCM fault at any time. In practice, this can happen on events like | |
82 | power transitions when the ephemeral key that encrypts enclave memory is lost. | |
83 | ||
84 | Application interface | |
85 | ===================== | |
86 | ||
87 | Enclave build functions | |
88 | ----------------------- | |
89 | ||
90 | In addition to the traditional compiler and linker build process, SGX has a | |
91 | separate enclave “build” process. Enclaves must be built before they can be | |
92 | executed (entered). The first step in building an enclave is opening the | |
93 | **/dev/sgx_enclave** device. Since enclave memory is protected from direct | |
94 | access, special privileged instructions are then used to copy data into enclave | |
95 | pages and establish enclave page permissions. | |
96 | ||
97 | .. kernel-doc:: arch/x86/kernel/cpu/sgx/ioctl.c | |
98 | :functions: sgx_ioc_enclave_create | |
99 | sgx_ioc_enclave_add_pages | |
100 | sgx_ioc_enclave_init | |
101 | sgx_ioc_enclave_provision | |
102 | ||
103 | Enclave vDSO | |
104 | ------------ | |
105 | ||
106 | Entering an enclave can only be done through SGX-specific EENTER and ERESUME | |
107 | functions, and is a non-trivial process. Because of the complexity of | |
108 | transitioning to and from an enclave, enclaves typically utilize a library to | |
109 | handle the actual transitions. This is roughly analogous to how glibc | |
110 | implementations are used by most applications to wrap system calls. | |
111 | ||
112 | Another crucial characteristic of enclaves is that they can generate exceptions | |
113 | as part of their normal operation that need to be handled in the enclave or are | |
114 | unique to SGX. | |
115 | ||
116 | Instead of the traditional signal mechanism to handle these exceptions, SGX | |
117 | can leverage special exception fixup provided by the vDSO. The kernel-provided | |
118 | vDSO function wraps low-level transitions to/from the enclave like EENTER and | |
119 | ERESUME. The vDSO function intercepts exceptions that would otherwise generate | |
120 | a signal and returns the fault information directly to its caller. This avoids | |
121 | the need to juggle signal handlers. | |
122 | ||
123 | .. kernel-doc:: arch/x86/include/uapi/asm/sgx.h | |
124 | :functions: vdso_sgx_enter_enclave_t | |
125 | ||
126 | ksgxd | |
127 | ===== | |
128 | ||
129 | SGX support includes a kernel thread called *ksgxd*. | |
130 | ||
131 | EPC sanitization | |
132 | ---------------- | |
133 | ||
134 | ksgxd is started when SGX initializes. Enclave memory is typically ready | |
135 | for use when the processor powers on or resets. However, if SGX has been in | |
136 | use since the reset, enclave pages may be in an inconsistent state. This might | |
137 | occur after a crash and kexec() cycle, for instance. At boot, ksgxd | |
138 | reinitializes all enclave pages so that they can be allocated and re-used. | |
139 | ||
140 | The sanitization is done by going through EPC address space and applying the | |
141 | EREMOVE function to each physical page. Some enclave pages like SECS pages have | |
142 | hardware dependencies on other pages which prevents EREMOVE from functioning. | |
143 | Executing two EREMOVE passes removes the dependencies. | |
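
The need for a second pass can be illustrated with a toy model. This is a sketch under simplifying assumptions: the only dependency modelled is a SECS page that cannot be removed while child pages of the same enclave are still valid, and ``Page``, ``eremove`` and ``sanitize`` are hypothetical names that do not correspond to kernel functions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Page:
    kind: str                      # "SECS", "REG", "TCS", ...
    secs: Optional["Page"] = None  # owning SECS page, if any
    valid: bool = True


def eremove(page, epc):
    """Toy EREMOVE: refuses a SECS page that still has valid child pages."""
    if page.kind == "SECS" and any(p.valid and p.secs is page for p in epc):
        return False
    page.valid = False
    return True


def sanitize(epc, passes=2):
    """Walk the EPC `passes` times; return the pages that are still dirty."""
    dirty = [p for p in epc if p.valid]
    for _ in range(passes):
        dirty = [p for p in dirty if not eremove(p, epc)]
    return dirty
```

With the SECS page placed before its children in the walk order, the first pass frees the children but skips the SECS page, and the second pass frees the SECS page as well.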
144 | ||
145 | Page reclaimer | |
146 | -------------- | |
147 | ||
148 | Similar to the core kswapd, ksgxd is responsible for managing the | |
149 | overcommitment of enclave memory. If the system runs out of enclave memory, | |
150 | *ksgxd* “swaps” enclave memory to normal memory. | |
151 | ||
152 | Launch Control | |
153 | ============== | |
154 | ||
155 | SGX provides a launch control mechanism. After all enclave pages have been | |
156 | copied, the kernel executes the EINIT function, which initializes the enclave. Only after | |
157 | this can the CPU execute inside the enclave. | |
158 | ||
159 | The EINIT function takes an RSA-3072 signature of the enclave measurement. The function | |
160 | checks that the measurement is correct and that the signature was made with the key | |
161 | whose hash is held in the four **IA32_SGXLEPUBKEYHASH{0, 1, 2, 3}** MSRs, which together | |
162 | represent the SHA256 of a public key. | |
163 | ||
164 | Those MSRs can be configured by the BIOS to be either readable or writable. | |
165 | Linux supports only writable configuration in order to give full control to the | |
166 | kernel on launch control policy. Before calling the EINIT function, the driver sets | |
167 | the MSRs to match the enclave's signing key. | |
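
As an illustration of how a 256-bit hash maps onto four 64-bit MSRs, the sketch below derives four values from a public key modulus. It rests on assumptions not stated in this document: that the hash is taken over the 384-byte little-endian encoding of the RSA-3072 modulus, and that the digest is split into four little-endian 64-bit words. ``lepubkeyhash`` is a hypothetical helper, not a kernel function.

```python
import hashlib
import struct

MODULUS_BYTES = 384  # RSA-3072 modulus size in bytes


def lepubkeyhash(modulus):
    """Return four 64-bit words derived from SHA-256 of the modulus.

    Assumption: little-endian modulus encoding and little-endian word
    split; check the SDM before relying on this layout.
    """
    digest = hashlib.sha256(modulus.to_bytes(MODULUS_BYTES, "little")).digest()
    return struct.unpack("<4Q", digest)


if __name__ == "__main__":
    # A dummy modulus, for illustration only.
    for index, word in enumerate(lepubkeyhash(3)):
        print(f"IA32_SGXLEPUBKEYHASH{index} = {word:#018x}")
```

The four words round-trip back to the 32-byte digest, so no information is lost in the split.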
168 | ||
169 | Encryption engines | |
170 | ================== | |
171 | ||
172 | In order to conceal the enclave data while it is out of the CPU package, the | |
173 | memory controller has an encryption engine to transparently encrypt and decrypt | |
174 | enclave memory. | |
175 | ||
176 | In CPUs prior to Ice Lake, the Memory Encryption Engine (MEE) is used to | |
177 | encrypt pages leaving the CPU caches. MEE uses an n-ary Merkle tree with root in | |
178 | SRAM to maintain integrity of the encrypted data. This provides integrity and | |
179 | anti-replay protection but does not scale to large memory sizes because the time | |
180 | required to update the Merkle tree grows logarithmically in relation to the | |
181 | memory size. | |
182 | ||
183 | CPUs starting from Ice Lake use Total Memory Encryption (TME) in place of | |
184 | MEE. TME-based SGX implementations do not have an integrity Merkle tree, which | |
185 | means integrity and replay attacks are not mitigated. However, TME includes | |
186 | additional changes to prevent cipher text from being returned and SW memory | |
187 | aliases from being created. | |
188 | ||
189 | DMA to enclave memory is blocked by range registers on both MEE and TME systems | |
190 | (SDM section 41.10). | |
191 | ||
192 | Usage Models | |
193 | ============ | |
194 | ||
195 | Shared Library | |
196 | -------------- | |
197 | ||
198 | Sensitive data and the code that acts on it are partitioned from the application | |
199 | into a separate library. The library is then linked as a DSO which can be loaded | |
200 | into an enclave. The application can then make individual function calls into | |
201 | the enclave through special SGX instructions. A run-time within the enclave is | |
202 | configured to marshal function parameters into and out of the enclave and to | |
203 | call the correct library function. | |
204 | ||
205 | Application Container | |
206 | --------------------- | |
207 | ||
208 | An application may be loaded into a container enclave which is specially | |
209 | configured with a library OS and run-time which permits the application to run. | |
210 | The enclave run-time and library OS work together to execute the application | |
211 | when a thread enters the enclave. | |
b0c7459b KH |
212 | |
213 | Impact of Potential Kernel SGX Bugs | |
214 | =================================== | |
215 | ||
216 | EPC leaks | |
217 | --------- | |
218 | ||
219 | When EPC page leaks happen, a WARNING like this is shown in dmesg: | |
220 | ||
221 | "EREMOVE returned ... and an EPC page was leaked. SGX may become unusable..." | |
222 | ||
223 | This is effectively a kernel use-after-free of an EPC page, and due | |
224 | to the way SGX works, the bug is detected at freeing. Rather than | |
225 | adding the page back to the pool of available EPC pages, the kernel | |
226 | intentionally leaks the page to avoid additional errors in the future. | |
227 | ||
228 | When this happens, the kernel will likely soon leak more EPC pages, and | |
229 | SGX will likely become unusable because the memory available to SGX is | |
230 | limited. However, while this may be fatal to SGX, the rest of the kernel | |
231 | is unlikely to be impacted and should continue to work. | |
232 | ||
233 | As a result, when this happens, users should stop running any new | |
234 | SGX workloads (or just any new workloads), and migrate all valuable | |
235 | workloads. Although a machine reboot can recover all EPC memory, the bug | |
236 | should be reported to Linux developers. | |
540745dd SC |
237 | |
238 | ||
239 | Virtual EPC | |
240 | =========== | |
241 | ||
242 | The implementation also has a virtual EPC driver to support SGX enclaves | |
243 | in guests. Unlike the SGX driver, an EPC page allocated by the virtual | |
244 | EPC driver doesn't have a specific enclave associated with it. This is | |
245 | because KVM doesn't track how a guest uses EPC pages. | |
246 | ||
247 | As a result, the SGX core page reclaimer doesn't support reclaiming EPC | |
248 | pages allocated to KVM guests through the virtual EPC driver. If the | |
249 | user wants to deploy SGX applications both on the host and in guests | |
250 | on the same machine, the user should reserve enough EPC (by taking out | |
251 | total virtual EPC size of all SGX VMs from the physical EPC size) for | |
252 | host SGX applications so they can run with acceptable performance. |
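
The reservation described above is simple arithmetic, sketched here for concreteness. ``host_epc_bytes`` is a hypothetical helper and the sizes are made-up example values, not recommendations.

```python
MIB = 1 << 20


def host_epc_bytes(physical_epc, guest_vepc_sizes):
    """EPC left for host enclaves after subtracting every guest's virtual EPC."""
    remaining = physical_epc - sum(guest_vepc_sizes)
    if remaining < 0:
        raise ValueError("guests are assigned more virtual EPC than physically exists")
    return remaining


if __name__ == "__main__":
    # Example: 128 MiB of physical EPC, two guests with 32 MiB each.
    print(host_epc_bytes(128 * MIB, [32 * MIB, 32 * MIB]) // MIB, "MiB left for the host")
```

Whether the remaining amount gives acceptable performance depends on the host workloads; the calculation only shows how much EPC is not claimed by guests.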