Commit | Line | Data |
---|---|---|
7d98c21b MCC |
1 | ========================================== |
2 | Reducing OS jitter due to per-cpu kthreads | |
3 | ========================================== | |
49717cb4 PM |
4 | |
5 | This document lists per-CPU kthreads in the Linux kernel and presents | |
6 | options to control their OS jitter. Note that non-per-CPU kthreads are | |
7 | not listed here. To reduce OS jitter from non-per-CPU kthreads, bind | |
8 | them to a "housekeeping" CPU dedicated to such work. | |
9 | ||
7d98c21b MCC |
10 | References |
11 | ========== | |
49717cb4 | 12 | |
7d98c21b | 13 | - Documentation/IRQ-affinity.txt: Binding interrupts to sets of CPUs. |
49717cb4 | 14 | |
7d98c21b | 15 | - Documentation/cgroup-v1: Using cgroups to bind tasks to sets of CPUs. |
49717cb4 | 16 | |
7d98c21b | 17 | - man taskset: Using the taskset command to bind tasks to sets |
49717cb4 PM |
18 | of CPUs. |
19 | ||
7d98c21b | 20 | - man sched_setaffinity: Using the sched_setaffinity() system |
49717cb4 PM |
21 | call to bind tasks to sets of CPUs. |
22 | ||
7d98c21b | 23 | - /sys/devices/system/cpu/cpuN/online: Control CPU N's hotplug state, |
49717cb4 PM |
24 | writing "0" to offline and "1" to online. |
25 | ||
7d98c21b | 26 | - In order to locate kernel-generated OS jitter on CPU N: |
49717cb4 PM |
27 | |
28 | cd /sys/kernel/debug/tracing | |
29 | echo 1 > max_graph_depth # Increase the "1" for more detail | |
30 | echo function_graph > current_tracer | |
31 | # run workload | |
32 | cat per_cpu/cpuN/trace | |
33 | ||
7d98c21b MCC |
34 | kthreads |
35 | ======== | |
36 | ||
37 | Name: | |
38 | ehca_comp/%u | |
49717cb4 | 39 | |
7d98c21b MCC |
40 | Purpose: |
41 | Periodically process Infiniband-related work. | |
49717cb4 | 42 | |
49717cb4 | 43 | To reduce its OS jitter, do any of the following: |
7d98c21b | 44 | |
49717cb4 PM |
45 | 1. Don't use eHCA Infiniband hardware, instead choosing hardware |
46 | that does not require per-CPU kthreads. This will prevent these | |
47 | kthreads from being created in the first place. (This will | |
48 | work for most people, as this hardware, though important, is | |
49 | relatively old and is produced in relatively low unit volumes.) | |
50 | 2. Do all eHCA-Infiniband-related work on other CPUs, including | |
51 | interrupts. | |
52 | 3. Rework the eHCA driver so that its per-CPU kthreads are | |
53 | provisioned only on selected CPUs. | |
54 | ||
55 | ||
7d98c21b MCC |
56 | Name: |
57 | irq/%d-%s | |
58 | ||
59 | Purpose: | |
60 | Handle threaded interrupts. | |
61 | ||
49717cb4 | 62 | To reduce its OS jitter, do the following: |
7d98c21b | 63 | |
49717cb4 PM |
64 | 1. Use irq affinity to force the irq threads to execute on |
65 | some other CPU. | |
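As a minimal sketch of the irq-affinity step above, the helper below writes a CPU list to an interrupt's smp_affinity_list file, which is described in Documentation/IRQ-affinity.txt. The function name, the IRQ number 42, and the PROC_ROOT override (present only so the helper can be exercised against a scratch directory) are illustrative assumptions, not part of the kernel interface.

```shell
#!/bin/sh
# Sketch only: steer one IRQ, and with it the matching irq/%d-%s thread,
# onto housekeeping CPUs. PROC_ROOT is a hypothetical override for dry runs.
PROC_ROOT="${PROC_ROOT:-/proc}"

# set_irq_affinity IRQ CPULIST
# Writes CPULIST (for example "0-1") to the IRQ's smp_affinity_list file.
set_irq_affinity() {
    irq="$1"
    cpulist="$2"
    file="$PROC_ROOT/irq/$irq/smp_affinity_list"
    if [ ! -w "$file" ]; then
        echo "cannot write $file (no such IRQ, or not root?)" >&2
        return 1
    fi
    printf '%s\n' "$cpulist" > "$file"
}

# Example (as root, assuming IRQ 42 belongs to the device in question):
#   set_irq_affinity 42 0-1
```

Some interrupts cannot be moved, in which case the kernel rejects the write, so it is worth reading the file back afterwards.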
66 | ||
7d98c21b MCC |
67 | Name: |
68 | kcmtpd_ctr_%d | |
69 | ||
70 | Purpose: | |
71 | Handle Bluetooth work. | |
72 | ||
49717cb4 | 73 | To reduce its OS jitter, do one of the following: |
7d98c21b | 74 | |
49717cb4 PM |
75 | 1. Don't use Bluetooth, in which case these kthreads won't be |
76 | created in the first place. | |
77 | 2. Use irq affinity to force Bluetooth-related interrupts to | |
78 | occur on some other CPU and furthermore initiate all | |
79 | Bluetooth activity on some other CPU. | |
80 | ||
7d98c21b MCC |
81 | Name: |
82 | ksoftirqd/%u | |
83 | ||
84 | Purpose: | |
85 | Execute softirq handlers when threaded or when under heavy load. | |
86 | ||
49717cb4 PM |
87 | To reduce its OS jitter, each softirq vector must be handled |
88 | separately as follows: | |
7d98c21b MCC |
89 | |
90 | TIMER_SOFTIRQ | |
91 | ------------- | |
92 | ||
93 | Do all of the following: | |
94 | ||
49717cb4 PM |
95 | 1. To the extent possible, keep the CPU out of the kernel when it |
96 | is non-idle, for example, by avoiding system calls and by forcing | |
97 | both kernel threads and interrupts to execute elsewhere. | |
98 | 2. Build with CONFIG_HOTPLUG_CPU=y. After boot completes, force | |
99 | the CPU offline, then bring it back online. This forces | |
100 | recurring timers to migrate elsewhere. If you are concerned | |
101 | with multiple CPUs, force them all offline before bringing the | |
102 | first one back online. Once you have onlined the CPUs in question, | |
103 | do not offline any other CPUs, because doing so could force the | |
104 | timer back onto one of the CPUs in question. | |
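The offline-then-online sequence in step 2 can be sketched as follows, using the /sys/devices/system/cpu/cpuN/online files listed under References. The function name and the SYS_CPU_ROOT override (there only to allow a dry run against a scratch directory) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: force a set of CPUs offline, then bring them all back, so
# that recurring timers migrate elsewhere. SYS_CPU_ROOT is a hypothetical
# override for dry runs.
SYS_CPU_ROOT="${SYS_CPU_ROOT:-/sys/devices/system/cpu}"

# cycle_cpus CPU...
cycle_cpus() {
    # First pass: take every listed CPU offline before onlining any of them.
    for cpu in "$@"; do
        printf '0\n' > "$SYS_CPU_ROOT/cpu$cpu/online" || return 1
    done
    # Second pass: only now bring the CPUs back online.
    for cpu in "$@"; do
        printf '1\n' > "$SYS_CPU_ROOT/cpu$cpu/online" || return 1
    done
}

# Example (as root, after boot completes): cycle_cpus 2 3
```

Two separate passes are used because the text above asks for all CPUs of interest to be offline before the first one comes back.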
7d98c21b MCC |
105 | |
106 | NET_TX_SOFTIRQ and NET_RX_SOFTIRQ | |
107 | --------------------------------- | |
108 | ||
109 | Do all of the following: | |
110 | ||
49717cb4 PM |
111 | 1. Force networking interrupts onto other CPUs. |
112 | 2. Initiate any network I/O on other CPUs. | |
113 | 3. Once your application has started, prevent CPU-hotplug operations | |
114 | from being initiated from tasks that might run on the CPU to | |
115 | be de-jittered. (It is OK to force this CPU offline and then | |
116 | bring it back online before you start your application.) | |
7d98c21b MCC |
117 | |
118 | BLOCK_SOFTIRQ | |
119 | ------------- | |
120 | ||
121 | Do all of the following: | |
122 | ||
49717cb4 PM |
123 | 1. Force block-device interrupts onto some other CPU. |
124 | 2. Initiate any block I/O on other CPUs. | |
125 | 3. Once your application has started, prevent CPU-hotplug operations | |
126 | from being initiated from tasks that might run on the CPU to | |
127 | be de-jittered. (It is OK to force this CPU offline and then | |
128 | bring it back online before you start your application.) | |
7d98c21b MCC |
129 | |
130 | IRQ_POLL_SOFTIRQ | |
131 | ---------------- | |
132 | ||
133 | Do all of the following: | |
134 | ||
49717cb4 PM |
135 | 1. Force block-device interrupts onto some other CPU. |
136 | 2. Initiate any block I/O and block-I/O polling on other CPUs. | |
137 | 3. Once your application has started, prevent CPU-hotplug operations | |
138 | from being initiated from tasks that might run on the CPU to | |
139 | be de-jittered. (It is OK to force this CPU offline and then | |
140 | bring it back online before you start your application.) | |
7d98c21b MCC |
141 | |
142 | TASKLET_SOFTIRQ | |
143 | --------------- | |
144 | ||
145 | Do one or more of the following: | |
146 | ||
49717cb4 PM |
147 | 1. Avoid use of drivers that use tasklets. (Such drivers will contain |
148 | calls to things like tasklet_schedule().) | |
149 | 2. Convert all drivers that you must use from tasklets to workqueues. | |
150 | 3. Force interrupts for drivers using tasklets onto other CPUs, | |
151 | and also do I/O involving these drivers on other CPUs. | |
7d98c21b MCC |
152 | |
153 | SCHED_SOFTIRQ | |
154 | ------------- | |
155 | ||
156 | Do all of the following: | |
157 | ||
49717cb4 PM |
158 | 1. Avoid sending scheduler IPIs to the CPU to be de-jittered. |
159 | For example, ensure that at most one runnable kthread is present | |
160 | on that CPU. If a thread that expects to run on the de-jittered | |
161 | CPU awakens, the scheduler will send an IPI that can result in | |
162 | a subsequent SCHED_SOFTIRQ. | |
44c65ff2 PM |
163 | 2. Build with CONFIG_NO_HZ_FULL=y and ensure that the CPU to be de-jittered |
164 | is marked as an adaptive-ticks CPU using the "nohz_full=" | |
165 | boot parameter. This reduces the number of scheduler-clock | |
166 | interrupts that the de-jittered CPU receives, minimizing its | |
167 | chances of being selected to do the load balancing work that | |
168 | runs in SCHED_SOFTIRQ context. | |
49717cb4 PM |
169 | 3. To the extent possible, keep the CPU out of the kernel when it |
170 | is non-idle, for example, by avoiding system calls and by | |
171 | forcing both kernel threads and interrupts to execute elsewhere. | |
172 | This further reduces the number of scheduler-clock interrupts | |
173 | received by the de-jittered CPU. | |
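To confirm that a CPU was actually marked adaptive-ticks by "nohz_full=", a small membership check over a kernel-style CPU list can help. This is a sketch: the function name is made up, and the /sys/devices/system/cpu/nohz_full file shown in the usage comment is an assumption about the running kernel (it is not mentioned in this document and may be absent on older kernels).

```shell
#!/bin/sh
# Sketch only: test whether a CPU number appears in a CPU list such as
# "1-3,5". The function name is hypothetical.

# cpu_in_list CPU LIST
cpu_in_list() {
    cpu="$1"
    old_ifs="$IFS"
    IFS=','
    # Split LIST on commas into the positional parameters.
    set -- $2
    IFS="$old_ifs"
    for range in "$@"; do
        # Skip anything that is not a number or a number range.
        case "$range" in
            ''|*[!0-9-]*) continue ;;
        esac
        lo="${range%-*}"   # "1-3" -> 1, "5" -> 5
        hi="${range#*-}"   # "1-3" -> 3, "5" -> 5
        if [ "$cpu" -ge "$lo" ] && [ "$cpu" -le "$hi" ]; then
            return 0
        fi
    done
    return 1
}

# Example (file location is an assumption, see above):
#   cpu_in_list 3 "$(cat /sys/devices/system/cpu/nohz_full)" &&
#       echo "CPU 3 is an adaptive-ticks CPU"
```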
7d98c21b MCC |
174 | |
175 | HRTIMER_SOFTIRQ | |
176 | --------------- | |
177 | ||
178 | Do all of the following: | |
179 | ||
49717cb4 PM |
180 | 1. To the extent possible, keep the CPU out of the kernel when it |
181 | is non-idle. For example, avoid system calls and force both | |
182 | kernel threads and interrupts to execute elsewhere. | |
183 | 2. Build with CONFIG_HOTPLUG_CPU=y. Once boot completes, force the | |
184 | CPU offline, then bring it back online. This forces recurring | |
185 | timers to migrate elsewhere. If you are concerned with multiple | |
186 | CPUs, force them all offline before bringing the first one | |
187 | back online. Once you have onlined the CPUs in question, do not | |
188 | offline any other CPUs, because doing so could force the timer | |
189 | back onto one of the CPUs in question. | |
7d98c21b MCC |
190 | |
191 | RCU_SOFTIRQ | |
192 | ----------- | |
193 | ||
194 | Do at least one of the following: | |
195 | ||
49717cb4 PM |
196 | 1. Offload callbacks and keep the CPU in either dyntick-idle or |
197 | adaptive-ticks state by doing all of the following: | |
7d98c21b | 198 | |
44c65ff2 PM |
199 | a. Build with CONFIG_NO_HZ_FULL=y and ensure that the CPU to be |
200 | de-jittered is marked as an adaptive-ticks CPU using the | |
201 | "nohz_full=" boot parameter. Bind the rcuo kthreads to | |
202 | housekeeping CPUs, which can tolerate OS jitter. | |
49717cb4 PM |
203 | b. To the extent possible, keep the CPU out of the kernel |
204 | when it is non-idle, for example, by avoiding system | |
205 | calls and by forcing both kernel threads and interrupts | |
206 | to execute elsewhere. | |
7d98c21b | 207 | |
49717cb4 PM |
208 | 2. Enable RCU to do its processing remotely via dyntick-idle by |
209 | doing all of the following: | |
7d98c21b | 210 | |
49717cb4 PM |
211 | a. Build with CONFIG_NO_HZ=y and CONFIG_RCU_FAST_NO_HZ=y. |
212 | b. Ensure that the CPU goes idle frequently, allowing other | |
213 | CPUs to detect that it has passed through an RCU quiescent | |
214 | state. If the kernel is built with CONFIG_NO_HZ_FULL=y, | |
215 | userspace execution also allows other CPUs to detect that | |
216 | the CPU in question has passed through a quiescent state. | |
217 | c. To the extent possible, keep the CPU out of the kernel | |
218 | when it is non-idle, for example, by avoiding system | |
219 | calls and by forcing both kernel threads and interrupts | |
220 | to execute elsewhere. | |
221 | ||
7d98c21b MCC |
222 | Name: |
223 | kworker/%u:%d%s (cpu, id, priority) | |
224 | ||
225 | Purpose: | |
226 | Execute workqueue requests. | |
227 | ||
f7bac9b8 | 228 | To reduce its OS jitter, do any of the following: |
7d98c21b | 229 | |
f7bac9b8 PM |
230 | 1. Run your workload at a real-time priority, which will allow |
231 | preempting the kworker daemons. | |
bbf393b0 PM |
232 | 2. A given workqueue can be made visible in the sysfs filesystem |
233 | by passing the WQ_SYSFS flag to that workqueue's alloc_workqueue(). | |
234 | Such a workqueue can be confined to a given subset of the | |
7d98c21b | 235 | CPUs using the ``/sys/devices/virtual/workqueue/*/cpumask`` sysfs |
bbf393b0 PM |
236 | files. The set of WQ_SYSFS workqueues can be displayed using |
237 | "ls /sys/devices/virtual/workqueue". That said, the workqueues | |
238 | maintainer would like to caution people against indiscriminately | |
239 | sprinkling WQ_SYSFS across all the workqueues. The reason for | |
240 | caution is that it is easy to add WQ_SYSFS, but because sysfs is | |
241 | part of the formal user/kernel API, it can be nearly impossible | |
242 | to remove it, even if its addition was a mistake. | |
243 | 3. Do any of the following needed to avoid jitter that your | |
f7bac9b8 | 244 | application cannot tolerate: |
7d98c21b | 245 | |
f7bac9b8 PM |
246 | a. Build your kernel with CONFIG_SLUB=y rather than |
247 | CONFIG_SLAB=y, thus avoiding the slab allocator's periodic | |
248 | use of each CPU's workqueues to run its cache_reap() | |
249 | function. | |
250 | b. Avoid using oprofile, thus avoiding OS jitter from | |
251 | wq_sync_buffer(). | |
252 | c. Limit your CPU frequency so that a CPU-frequency | |
253 | governor is not required, possibly enlisting the aid of | |
254 | special heatsinks or other cooling technologies. If done | |
255 | correctly, and if your CPU architecture permits, you should | |
256 | be able to build your kernel with CONFIG_CPU_FREQ=n to | |
257 | avoid the CPU-frequency governor periodically running | |
258 | on each CPU, including cs_dbs_timer() and od_dbs_timer(). | |
7d98c21b | 259 | |
f7bac9b8 PM |
260 | WARNING: Please check your CPU specifications to |
261 | make sure that this is safe on your particular system. | |
89bf5d82 PM |
262 | d. As of v3.18, Christoph Lameter's on-demand vmstat workers |
263 | commit prevents OS jitter due to vmstat_update() on | |
264 | CONFIG_SMP=y systems. Before v3.18, it is not possible | |
265 | to entirely get rid of the OS jitter, but you can | |
266 | decrease its frequency by writing a large value to | |
267 | /proc/sys/vm/stat_interval. The default value is HZ, | |
268 | for an interval of one second. Of course, larger values | |
269 | will make your virtual-memory statistics update more | |
270 | slowly. You can also run your workload at | |
271 | a real-time priority, thus preempting vmstat_update(), | |
64f26e5c PM |
272 | but if your workload is CPU-bound, this is a bad idea. |
273 | However, there is an RFC patch from Christoph Lameter | |
274 | (based on an earlier one from Gilad Ben-Yossef) that | |
275 | reduces or even eliminates vmstat overhead for some | |
276 | workloads at https://lkml.org/lkml/2013/9/4/379. | |
f1360570 PM |
277 | e. Boot with "elevator=noop" to avoid workqueue use by |
278 | the block layer. | |
279 | f. If running on high-end powerpc servers, build with | |
f7bac9b8 PM |
280 | CONFIG_PPC_RTAS_DAEMON=n. This prevents the RTAS |
281 | daemon from running on each CPU every second or so. | |
282 | (This will require editing Kconfig files and will defeat | |
283 | this platform's RAS functionality.) This avoids jitter | |
284 | due to the rtas_event_scan() function. | |
285 | WARNING: Please check your CPU specifications to | |
286 | make sure that this is safe on your particular system. | |
f1360570 | 287 | g. If running on Cell Processor, build your kernel with |
f7bac9b8 PM |
288 | CONFIG_CBE_CPUFREQ_SPU_GOVERNOR=n to avoid OS jitter from | |
289 | spu_gov_work(). | |
290 | WARNING: Please check your CPU specifications to | |
291 | make sure that this is safe on your particular system. | |
f1360570 | 292 | h. If running on PowerMAC, build your kernel with |
f7bac9b8 PM |
293 | CONFIG_PMAC_RACKMETER=n to disable the CPU-meter, |
294 | avoiding OS jitter from rackmeter_do_timer(). | |
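For option 2 above, confining a WQ_SYSFS workqueue means writing a hexadecimal CPU mask to its cpumask file. The sketch below builds that mask from CPU numbers and writes it. The function names, the workqueue name "demo", and the WQ_ROOT override (for dry runs) are hypothetical; only the /sys/devices/virtual/workqueue/*/cpumask path comes from the text.

```shell
#!/bin/sh
# Sketch only: confine one WQ_SYSFS workqueue to a set of CPUs.
# WQ_ROOT is a hypothetical override for dry runs.
WQ_ROOT="${WQ_ROOT:-/sys/devices/virtual/workqueue}"

# cpus_to_mask CPU...
# Prints a hex mask with one bit set per listed CPU. Shell integer
# arithmetic limits this sketch to small CPU numbers (below 62).
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( $mask | (1 << $cpu) ))
    done
    printf '%x\n' "$mask"
}

# confine_workqueue NAME CPU...
confine_workqueue() {
    name="$1"
    shift
    cpus_to_mask "$@" > "$WQ_ROOT/$name/cpumask"
}

# Example (as root, "demo" standing in for a real WQ_SYSFS workqueue):
#   confine_workqueue demo 0 1
```

Only workqueues that show up under /sys/devices/virtual/workqueue can be confined this way.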
295 | ||
7d98c21b MCC |
296 | Name: |
297 | rcuc/%u | |
298 | ||
299 | Purpose: | |
300 | Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels. | |
301 | ||
49717cb4 | 302 | To reduce its OS jitter, do at least one of the following: |
7d98c21b | 303 | |
49717cb4 PM |
304 | 1. Build the kernel with CONFIG_PREEMPT=n. This prevents these |
305 | kthreads from being created in the first place, and also obviates | |
306 | the need for RCU priority boosting. This approach is feasible | |
307 | for workloads that do not require high degrees of responsiveness. | |
308 | 2. Build the kernel with CONFIG_RCU_BOOST=n. This prevents these | |
309 | kthreads from being created in the first place. This approach | |
310 | is feasible only if your workload never requires RCU priority | |
311 | boosting, for example, if you ensure frequent idle time on all | |
312 | CPUs that might execute within the kernel. | |
44c65ff2 PM |
313 | 3. Build with CONFIG_RCU_NOCB_CPU=y and boot with the rcu_nocbs= |
314 | boot parameter offloading RCU callbacks from all CPUs susceptible | |
315 | to OS jitter. This approach prevents the rcuc/%u kthreads from | |
316 | having any work to do, so that they are never awakened. | |
49717cb4 PM |
317 | 4. Ensure that the CPU never enters the kernel, and, in particular, |
318 | avoid initiating any CPU hotplug operations on this CPU. This is | |
319 | another way of preventing any callbacks from being queued on the | |
320 | CPU, again preventing the rcuc/%u kthreads from having any work | |
321 | to do. | |
322 | ||
7d98c21b MCC |
323 | Name: |
324 | rcuob/%d, rcuop/%d, and rcuos/%d | |
325 | ||
326 | Purpose: | |
327 | Offload RCU callbacks from the corresponding CPU. | |
328 | ||
49717cb4 | 329 | To reduce its OS jitter, do at least one of the following: |
7d98c21b | 330 | |
49717cb4 PM |
331 | 1. Use affinity, cgroups, or other mechanism to force these kthreads |
332 | to execute on some other CPU. | |
b9651622 | 333 | 2. Build with CONFIG_RCU_NOCB_CPU=n, which will prevent these |
49717cb4 PM |
334 | kthreads from being created in the first place. However, please |
335 | note that this will not eliminate OS jitter, but will instead | |
336 | shift it to RCU_SOFTIRQ. | |
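Option 1 can be approached with the taskset command listed under References. The sketch below only prints the taskset invocations, one per PID read from standard input, so that they can be reviewed before being run. The function name is hypothetical, and using pgrep to find the rcuo kthreads is an assumption about the tools installed on the system.

```shell
#!/bin/sh
# Sketch only: generate taskset commands that bind a list of PIDs to the
# given housekeeping CPUs. Nothing is executed by this function itself.

# taskset_cmds CPULIST   (PIDs are read from stdin, one per line)
taskset_cmds() {
    cpulist="$1"
    while read -r pid; do
        if [ -n "$pid" ]; then
            printf 'taskset -pc %s %s\n' "$cpulist" "$pid"
        fi
    done
}

# Example (as root; pgrep usage is an assumption):
#   pgrep '^rcuo' | taskset_cmds 0-1        # review the commands
#   pgrep '^rcuo' | taskset_cmds 0-1 | sh   # then apply them
```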
337 | ||
7d98c21b MCC |
338 | Name: |
339 | watchdog/%u | |
340 | ||
341 | Purpose: | |
342 | Detect software lockups on each CPU. | |
343 | ||
49717cb4 | 344 | To reduce its OS jitter, do at least one of the following: |
7d98c21b | 345 | |
49717cb4 PM |
346 | 1. Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these |
347 | kthreads from being created in the first place. | |
f1360570 PM |
348 | 2. Boot with "nosoftlockup=0", which will also prevent these kthreads |
349 | from being created. Other related watchdog and softlockup boot | |
8c27ceff | 350 | parameters may be found in Documentation/admin-guide/kernel-parameters.rst |
f1360570 PM |
351 | and Documentation/watchdog/watchdog-parameters.txt. |
352 | 3. Echo a zero to /proc/sys/kernel/watchdog to disable the | |
49717cb4 | 353 | watchdog timer. |
f1360570 | 354 | 4. Echo a large number to /proc/sys/kernel/watchdog_thresh in |
49717cb4 PM |
355 | order to reduce the frequency of OS jitter due to the watchdog |
356 | timer down to a level that is acceptable for your workload. |
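Options 3 and 4 are runtime writes to the two /proc/sys/kernel files named above and can be sketched as a pair of helpers. The function names and the KERNEL_SYSCTL_ROOT override (for dry runs against a scratch directory) are hypothetical, and the value 60 in the usage comment is only an example; the kernel may reject values outside the range it accepts.

```shell
#!/bin/sh
# Sketch only: disable the watchdog, or make it fire less often.
# KERNEL_SYSCTL_ROOT is a hypothetical override for dry runs.
KERNEL_SYSCTL_ROOT="${KERNEL_SYSCTL_ROOT:-/proc/sys/kernel}"

# disable_watchdog: write a zero to the watchdog file (option 3).
disable_watchdog() {
    printf '0\n' > "$KERNEL_SYSCTL_ROOT/watchdog"
}

# set_watchdog_thresh SECONDS: raise the threshold (option 4).
set_watchdog_thresh() {
    printf '%s\n' "$1" > "$KERNEL_SYSCTL_ROOT/watchdog_thresh"
}

# Example (as root):
#   set_watchdog_thresh 60   # example value, less frequent watchdog activity
#   disable_watchdog         # or turn the watchdog off entirely
```

Both settings last only until the next reboot; options 1 and 2 are the persistent alternatives.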