Commit | Line | Data |
---|---|---|
1da177e4 LT |
1 | Documentation for /proc/sys/kernel/* kernel version 2.2.10 |
2 | (c) 1998, 1999, Rik van Riel <riel@nl.linux.org> | |
760df93e | 3 | (c) 2009, Shen Feng <shen@cn.fujitsu.com> |
1da177e4 LT |
4 | |
5 | For general info and legal blurb, please look in README. | |
6 | ||
7 | ============================================================== | |
8 | ||
9 | This file contains documentation for the sysctl files in | |
10 | /proc/sys/kernel/ and is valid for Linux kernel version 2.2. | |
11 | ||
12 | The files in this directory can be used to tune and monitor | |
13 | miscellaneous and general things in the operation of the Linux | |
14 | kernel. Since some of the files _can_ be used to screw up your | |
15 | system, it is advisable to read both documentation and source | |
16 | before actually making adjustments. | |
17 | ||
18 | Currently, these files might (depending on your configuration) | |
19 | show up in /proc/sys/kernel: | |
807094c0 | 20 | |
1da177e4 | 21 | - acct |
807094c0 BP |
22 | - acpi_video_flags |
23 | - auto_msgmni | |
d75757ab PA |
24 | - bootloader_type [ X86 only ] |
25 | - bootloader_version [ X86 only ] | |
c114728a | 26 | - callhome [ S390 only ] |
73efc039 | 27 | - cap_last_cap |
1da177e4 | 28 | - core_pattern |
a293980c | 29 | - core_pipe_limit |
1da177e4 LT |
30 | - core_uses_pid |
31 | - ctrl-alt-del | |
eaf06b24 | 32 | - dmesg_restrict |
1da177e4 LT |
33 | - domainname |
34 | - hostname | |
35 | - hotplug | |
55537871 | 36 | - hardlockup_all_cpu_backtrace |
270750db AT |
37 | - hung_task_panic |
38 | - hung_task_check_count | |
39 | - hung_task_timeout_secs | |
40 | - hung_task_warnings | |
7984754b | 41 | - kexec_load_disabled |
455cd5ab | 42 | - kptr_restrict |
1da177e4 | 43 | - l2cr [ PPC only ] |
ac76cff2 | 44 | - modprobe ==> Documentation/debugging-modules.txt |
3d43321b | 45 | - modules_disabled |
03f59566 | 46 | - msg_next_id [ sysv ipc ] |
1da177e4 LT |
47 | - msgmax |
48 | - msgmnb | |
49 | - msgmni | |
760df93e | 50 | - nmi_watchdog |
1da177e4 LT |
51 | - osrelease |
52 | - ostype | |
53 | - overflowgid | |
54 | - overflowuid | |
55 | - panic | |
807094c0 | 56 | - panic_on_oops |
55af7796 | 57 | - panic_on_stackoverflow |
9e3961a0 PB |
58 | - panic_on_unrecovered_nmi |
59 | - panic_on_warn | |
088e9d25 | 60 | - panic_on_rcu_stall |
3379e0c3 BH |
61 | - perf_cpu_time_max_percent |
62 | - perf_event_paranoid | |
c5dfd78e | 63 | - perf_event_max_stack |
ac0bb6b7 | 64 | - perf_event_mlock_kb |
c85b0334 | 65 | - perf_event_max_contexts_per_stack |
1da177e4 LT |
66 | - pid_max |
67 | - powersave-nap [ PPC only ] | |
68 | - printk | |
807094c0 BP |
69 | - printk_delay |
70 | - printk_ratelimit | |
71 | - printk_ratelimit_burst | |
8b253b07 | 72 | - pty ==> Documentation/filesystems/devpts.txt |
1ec7fd50 | 73 | - randomize_va_space |
8c27ceff | 74 | - real-root-dev ==> Documentation/admin-guide/initrd.rst |
1da177e4 LT |
75 | - reboot-cmd [ SPARC only ] |
76 | - rtsig-max | |
77 | - rtsig-nr | |
8e5f1ad1 | 78 | - seccomp/ ==> Documentation/userspace-api/seccomp_filter.rst |
1da177e4 | 79 | - sem |
03f59566 | 80 | - sem_next_id [ sysv ipc ] |
1da177e4 | 81 | - sg-big-buff [ generic SCSI device (sg) ] |
03f59566 | 82 | - shm_next_id [ sysv ipc ] |
b34a6b1d | 83 | - shm_rmid_forced |
1da177e4 LT |
84 | - shmall |
85 | - shmmax [ sysv ipc ] | |
86 | - shmmni | |
ed235875 | 87 | - softlockup_all_cpu_backtrace |
195daf66 | 88 | - soft_watchdog |
1da177e4 | 89 | - stop-a [ SPARC only ] |
d3c1a297 | 90 | - sysrq ==> Documentation/admin-guide/sysrq.rst |
f4aacea2 | 91 | - sysctl_writes_strict |
1da177e4 LT |
92 | - tainted |
93 | - threads-max | |
760df93e | 94 | - unknown_nmi_panic |
195daf66 | 95 | - watchdog |
08825c90 | 96 | - watchdog_thresh |
1da177e4 LT |
97 | - version |
98 | ||
99 | ============================================================== | |
100 | ||
101 | acct: | |
102 | ||
103 | highwater lowwater frequency | |
104 | ||
105 | If BSD-style process accounting is enabled, these values control | |
106 | its behaviour. If free space on the filesystem where the log lives | |
107 | goes below <lowwater>%, accounting suspends. If free space gets | |
108 | above <highwater>%, accounting resumes. <frequency> determines | |
109 | how often the amount of free space is checked (value is in | |
110 | seconds). Default: | |
111 | 4 2 30 | |
112 | That is, suspend accounting if <= 2% free space is left; resume it | |
113 | once >= 4% is free; consider information about the amount of free | |
114 | space valid for 30 seconds. | |
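
The suspend/resume rule above is a simple hysteresis. The following is a minimal sketch of that rule only, assuming the "<= lowwater" and ">= highwater" comparisons from the default-value explanation; the function name is hypothetical and the <frequency> timer is not modelled.

```python
def accounting_active(active, free_percent, highwater=4, lowwater=2):
    """Return the new accounting state after one free-space check."""
    if active and free_percent <= lowwater:
        return False   # suspend: free space fell to the low-water mark
    if not active and free_percent >= highwater:
        return True    # resume: free space recovered to the high-water mark
    return active      # otherwise keep the current state


# Walk through a sequence of free-space readings, starting active.
state = True
history = []
for free in [10, 3, 2, 3, 4, 3]:
    state = accounting_active(state, free)
    history.append(state)
print(history)
```

Note that a reading of 3% keeps whatever state was already in effect, which is the point of having two thresholds.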
115 | ||
807094c0 BP |
116 | ============================================================== |
117 | ||
118 | acpi_video_flags: | |
119 | ||
120 | flags | |
121 | ||
122 | See Doc*/kernel/power/video.txt. This file allows the video boot | |
123 | mode to be set at run time. | |
124 | ||
125 | ============================================================== | |
126 | ||
127 | auto_msgmni: | |
128 | ||
0050ee05 MS |
129 | This variable has no effect and may be removed in future kernel |
130 | releases. Reading it always returns 0. | |
131 | Up to Linux 3.17, it enabled/disabled automatic recomputing of msgmni | |
132 | upon memory add/remove or upon ipc namespace creation/removal. | |
133 | Echoing "1" into this file enabled msgmni automatic recomputing. | |
134 | Echoing "0" turned it off. auto_msgmni default value was 1. | |
807094c0 BP |
135 | |
136 | ||
1da177e4 LT |
137 | ============================================================== |
138 | ||
d75757ab PA |
139 | bootloader_type: |
140 | ||
141 | x86 bootloader identification | |
142 | ||
143 | This gives the bootloader type number as indicated by the bootloader, | |
144 | shifted left by 4, and OR'd with the low four bits of the bootloader | |
145 | version. The reason for this encoding is that this used to match the | |
146 | type_of_loader field in the kernel header; the encoding is kept for | |
147 | backwards compatibility. That is, if the full bootloader type number | |
148 | is 0x15 and the full version number is 0x234, this file will contain | |
149 | the value 340 = 0x154. | |
150 | ||
151 | See the type_of_loader and ext_loader_type fields in | |
152 | Documentation/x86/boot.txt for additional information. | |
153 | ||
154 | ============================================================== | |
155 | ||
156 | bootloader_version: | |
157 | ||
158 | x86 bootloader version | |
159 | ||
160 | The complete bootloader version number. In the example above, this | |
161 | file will contain the value 564 = 0x234. | |
162 | ||
163 | See the type_of_loader and ext_loader_ver fields in | |
164 | Documentation/x86/boot.txt for additional information. | |
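
The encoding described for bootloader_type can be checked with a few lines of arithmetic. This is only a sketch of the formula as stated in the text (type shifted left by 4, OR'd with the low four bits of the version); the function name is hypothetical.

```python
def bootloader_type_value(loader_type, version):
    """Combine type and version the way the bootloader_type text describes."""
    return (loader_type << 4) | (version & 0xF)


# Example from the text: full type 0x15, full version 0x234.
value = bootloader_type_value(0x15, 0x234)
print(value, hex(value))      # bootloader_type
print(0x234)                  # bootloader_version holds the full version
```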
165 | ||
166 | ============================================================== | |
167 | ||
c114728a HJP |
168 | callhome: |
169 | ||
170 | Controls the kernel's callhome behavior in case of a kernel panic. | |
171 | ||
172 | The s390 hardware allows an operating system to send a notification | |
173 | to a service organization (callhome) in case of an operating system panic. | |
174 | ||
175 | When the value in this file is 0 (which is the default behavior) | |
176 | nothing happens in case of a kernel panic. If this value is set to "1" | |
177 | the complete kernel oops message is sent to the IBM customer service | |
178 | organization, provided the mainframe the Linux operating system is | |
179 | running on has a service contract with IBM. | |
180 | ||
181 | ============================================================== | |
182 | ||
73efc039 DB |
183 | cap_last_cap: |
184 | ||
185 | Highest valid capability of the running kernel. Exports | |
186 | CAP_LAST_CAP from the kernel. | |
187 | ||
188 | ============================================================== | |
189 | ||
1da177e4 LT |
190 | core_pattern: |
191 | ||
192 | core_pattern is used to specify a core dumpfile pattern name. | |
cd081041 | 193 | . max length 128 characters; default value is "core" |
1da177e4 LT |
194 | . core_pattern is used as a pattern template for the output filename; |
195 | certain string patterns (beginning with '%') are substituted with | |
196 | their actual values. | |
197 | . backward compatibility with core_uses_pid: | |
198 | If core_pattern does not include "%p" (default does not) | |
199 | and core_uses_pid is set, then .PID will be appended to | |
200 | the filename. | |
201 | . corename format specifiers: | |
202 | %<NUL> '%' is dropped | |
203 | %% output one '%' | |
204 | %p pid | |
65aafb1e | 205 | %P global pid (init PID namespace) |
b03023ec ON |
206 | %i tid |
207 | %I global tid (init PID namespace) | |
5202efe5 NI |
208 | %u uid (in initial user namespace) |
209 | %g gid (in initial user namespace) | |
12a2b4b2 ON |
210 | %d dump mode, matches PR_SET_DUMPABLE and |
211 | /proc/sys/fs/suid_dumpable | |
1da177e4 LT |
212 | %s signal number |
213 | %t UNIX time of dump | |
214 | %h hostname | |
57cc083a JS |
215 | %e executable filename (may be shortened) |
216 | %E executable path | |
1da177e4 | 217 | %<OTHER> both are dropped |
cd081041 MU |
218 | . If the first character of the pattern is a '|', the kernel will treat |
219 | the rest of the pattern as a command to run. The core dump will be | |
220 | written to the standard input of that program instead of to a file. | |
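
The substitution rules for file patterns can be illustrated with a small expander. This is a hedged sketch, not the kernel's code: the function name and the `fields` dictionary are hypothetical, only specifiers present in `fields` are expanded, and the '|' pipe form is not handled.

```python
def expand_core_pattern(pattern, fields, core_uses_pid=False):
    """Expand '%' specifiers in a core_pattern-style template.

    fields maps a specifier letter (e.g. "p", "e") to its value.
    """
    out = []
    saw_pid = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch != "%":
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(pattern):
            break                      # '%' at the very end is dropped
        spec = pattern[i + 1]
        if spec == "%":
            out.append("%")            # '%%' outputs one '%'
        elif spec in fields:
            out.append(str(fields[spec]))
            if spec == "p":
                saw_pid = True
        # any other specifier: both characters are dropped
        i += 2
    name = "".join(out)
    if core_uses_pid and not saw_pid:
        name += "." + str(fields["p"])  # backward compatibility with core_uses_pid
    return name


print(expand_core_pattern("core", {"p": 1234}, core_uses_pid=True))
print(expand_core_pattern("core.%e.%p", {"e": "bash", "p": 42}))
```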
1da177e4 LT |
221 | |
222 | ============================================================== | |
223 | ||
a293980c NH |
224 | core_pipe_limit: |
225 | ||
807094c0 BP |
226 | This sysctl is only applicable when core_pattern is configured to pipe |
227 | core files to a user space helper (when the first character of | |
228 | core_pattern is a '|', see above). When collecting cores via a pipe | |
229 | to an application, it is occasionally useful for the collecting | |
230 | application to gather data about the crashing process from its | |
231 | /proc/pid directory. In order to do this safely, the kernel must wait | |
232 | for the collecting process to exit, so as not to remove the crashing | |
233 | process's proc files prematurely. This in turn creates the | |
234 | possibility that a misbehaving userspace collecting process can block | |
235 | the reaping of a crashed process simply by never exiting. This sysctl | |
236 | defends against that. It defines how many concurrent crashing | |
237 | processes may be piped to user space applications in parallel. If | |
238 | this value is exceeded, then those crashing processes above that value | |
239 | are noted via the kernel log and their cores are skipped. 0 is a | |
240 | special value, indicating that unlimited processes may be captured in | |
241 | parallel, but that no waiting will take place (i.e. the collecting | |
242 | process is not guaranteed access to /proc/<crashing pid>/). This | |
243 | value defaults to 0. | |
a293980c NH |
244 | |
245 | ============================================================== | |
246 | ||
1da177e4 LT |
247 | core_uses_pid: |
248 | ||
249 | The default coredump filename is "core". By setting | |
250 | core_uses_pid to 1, the coredump filename becomes core.PID. | |
251 | If core_pattern does not include "%p" (default does not) | |
252 | and core_uses_pid is set, then .PID will be appended to | |
253 | the filename. | |
254 | ||
255 | ============================================================== | |
256 | ||
257 | ctrl-alt-del: | |
258 | ||
259 | When the value in this file is 0, ctrl-alt-del is trapped and | |
260 | sent to the init(1) program to handle a graceful restart. | |
261 | When, however, the value is > 0, Linux's reaction to a Vulcan | |
262 | Nerve Pinch (tm) will be an immediate reboot, without even | |
263 | syncing its dirty buffers. | |
264 | ||
265 | Note: when a program (like dosemu) has the keyboard in 'raw' | |
266 | mode, the ctrl-alt-del is intercepted by the program before it | |
267 | ever reaches the kernel tty layer, and it's up to the program | |
268 | to decide what to do with it. | |
269 | ||
270 | ============================================================== | |
271 | ||
eaf06b24 DR |
272 | dmesg_restrict: |
273 | ||
807094c0 BP |
274 | This toggle indicates whether unprivileged users are prevented |
275 | from using dmesg(8) to view messages from the kernel's log buffer. | |
276 | When dmesg_restrict is set to (0) there are no restrictions. When | |
38ef4c2e | 277 | dmesg_restrict is set to (1), users must have CAP_SYSLOG to use |
eaf06b24 DR |
278 | dmesg(8). |
279 | ||
807094c0 BP |
280 | The kernel config option CONFIG_SECURITY_DMESG_RESTRICT sets the |
281 | default value of dmesg_restrict. | |
eaf06b24 DR |
282 | |
283 | ============================================================== | |
284 | ||
1da177e4 LT |
285 | domainname & hostname: |
286 | ||
287 | These files can be used to set the NIS/YP domainname and the | |
288 | hostname of your box in exactly the same way as the commands | |
289 | domainname and hostname, i.e.: | |
290 | # echo "darkstar" > /proc/sys/kernel/hostname | |
291 | # echo "mydomain" > /proc/sys/kernel/domainname | |
292 | has the same effect as | |
293 | # hostname "darkstar" | |
294 | # domainname "mydomain" | |
295 | ||
296 | Note, however, that the classic darkstar.frop.org has the | |
297 | hostname "darkstar" and DNS (Internet Domain Name Server) | |
298 | domainname "frop.org", not to be confused with the NIS (Network | |
299 | Information Service) or YP (Yellow Pages) domainname. These two | |
300 | domain names are in general different. For a detailed discussion | |
301 | see the hostname(1) man page. | |
302 | ||
55537871 JK |
303 | ============================================================== |
304 | hardlockup_all_cpu_backtrace: | |
305 | ||
306 | This value controls the hard lockup detector behavior when a hard | |
307 | lockup condition is detected as to whether or not to gather further | |
308 | debug information. If enabled, arch-specific all-CPU stack dumping | |
309 | will be initiated. | |
310 | ||
311 | 0: do nothing. This is the default behavior. | |
312 | ||
313 | 1: on detection capture more debug information. | |
1da177e4 LT |
314 | ============================================================== |
315 | ||
316 | hotplug: | |
317 | ||
318 | Path for the hotplug policy agent. | |
319 | Default value is "/sbin/hotplug". | |
320 | ||
321 | ============================================================== | |
322 | ||
270750db AT |
323 | hung_task_panic: |
324 | ||
325 | Controls the kernel's behavior when a hung task is detected. | |
326 | This file shows up if CONFIG_DETECT_HUNG_TASK is enabled. | |
327 | ||
328 | 0: continue operation. This is the default behavior. | |
329 | ||
330 | 1: panic immediately. | |
331 | ||
332 | ============================================================== | |
333 | ||
334 | hung_task_check_count: | |
335 | ||
336 | The upper bound on the number of tasks that are checked. | |
337 | This file shows up if CONFIG_DETECT_HUNG_TASK is enabled. | |
338 | ||
339 | ============================================================== | |
340 | ||
341 | hung_task_timeout_secs: | |
342 | ||
343 | Check interval. When a task in D state has not been scheduled | |
344 | for more than this many seconds, a warning is reported. | |
345 | This file shows up if CONFIG_DETECT_HUNG_TASK is enabled. | |
346 | ||
347 | 0: means infinite timeout - no checking done. | |
80df2847 | 348 | Possible values to set are in range {0..LONG_MAX/HZ}. |
270750db AT |
349 | |
350 | ============================================================== | |
351 | ||
70e0ac5f | 352 | hung_task_warnings: |
270750db AT |
353 | |
354 | The maximum number of warnings to report. During a check interval | |
70e0ac5f AT |
355 | if a hung task is detected, this value is decreased by 1. |
356 | When this value reaches 0, no more warnings will be reported. | |
270750db AT |
357 | This file shows up if CONFIG_DETECT_HUNG_TASK is enabled. |
358 | ||
359 | -1: report an infinite number of warnings. | |
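
The counting behaviour of hung_task_warnings can be sketched as follows. This is an illustration of the rules stated above only; the function name is hypothetical and nothing here reflects the kernel's actual implementation.

```python
def report_hung_task(warnings_left):
    """Return (report_warning, new_warnings_left) for one detected hung task."""
    if warnings_left == 0:
        return False, 0            # budget exhausted: stay silent
    if warnings_left > 0:
        warnings_left -= 1         # each reported warning uses up one unit
    return True, warnings_left     # -1 is never decremented: unlimited warnings


# With a budget of 2, the third detection is no longer reported.
left = 2
reported = []
for _ in range(3):
    shown, left = report_hung_task(left)
    reported.append(shown)
print(reported, left)
```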
360 | ||
361 | ============================================================== | |
362 | ||
7984754b KC |
363 | kexec_load_disabled: |
364 | ||
365 | A toggle indicating if the kexec_load syscall has been disabled. This | |
366 | value defaults to 0 (false: kexec_load enabled), but can be set to 1 | |
367 | (true: kexec_load disabled). Once true, kexec can no longer be used, and | |
368 | the toggle cannot be set back to false. This allows a kexec image to be | |
369 | loaded before disabling the syscall, allowing a system to set up (and | |
370 | later use) an image without it being altered. Generally used together | |
371 | with the "modules_disabled" sysctl. | |
372 | ||
373 | ============================================================== | |
374 | ||
455cd5ab DR |
375 | kptr_restrict: |
376 | ||
377 | This toggle indicates whether restrictions are placed on | |
312b4e22 RM |
378 | exposing kernel addresses via /proc and other interfaces. |
379 | ||
380 | When kptr_restrict is set to (0), the default, there are no restrictions. | |
381 | ||
382 | When kptr_restrict is set to (1), kernel pointers printed using the %pK | |
383 | format specifier will be replaced with 0's unless the user has CAP_SYSLOG | |
384 | and effective user and group ids are equal to the real ids. This is | |
385 | because %pK checks are done at read() time rather than open() time, so | |
386 | if permissions are elevated between the open() and the read() (e.g. via | |
387 | a setuid binary) then %pK will not leak kernel pointers to unprivileged | |
388 | users. Note, this is a temporary solution only. The correct long-term | |
389 | solution is to do the permission checks at open() time. Consider removing | |
390 | world read permissions from files that use %pK, and using dmesg_restrict | |
391 | to protect against uses of %pK in dmesg(8) if leaking kernel pointer | |
392 | values to unprivileged users is a concern. | |
393 | ||
394 | When kptr_restrict is set to (2), kernel pointers printed using | |
395 | %pK will be replaced with 0's regardless of privileges. | |
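
The three kptr_restrict levels amount to a small decision table. The sketch below restates the rules from the text as a predicate; the function and parameter names are hypothetical, and the check is assumed to run at read() time as described above.

```python
def pk_shows_address(kptr_restrict, has_cap_syslog, uid, euid, gid, egid):
    """Return True if a %pK pointer would be printed unmodified."""
    if kptr_restrict == 0:
        return True                      # no restrictions
    if kptr_restrict == 1:
        # CAP_SYSLOG is required, and effective ids must equal real ids
        return has_cap_syslog and uid == euid and gid == egid
    return False                         # level 2: always replaced with 0's


# A setuid helper (euid != uid) does not see addresses at level 1,
# even if it holds CAP_SYSLOG.
print(pk_shows_address(1, True, uid=1000, euid=0, gid=1000, egid=1000))
```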
455cd5ab DR |
396 | |
397 | ============================================================== | |
398 | ||
807094c0 BP |
399 | l2cr: (PPC only) |
400 | ||
401 | This flag controls the L2 cache of G3 processor boards. If | |
402 | 0, the cache is disabled. Enabled if nonzero. | |
403 | ||
404 | ============================================================== | |
405 | ||
3d43321b KC |
406 | modules_disabled: |
407 | ||
408 | A toggle value indicating if modules are allowed to be loaded | |
409 | in an otherwise modular kernel. This toggle defaults to off | |
410 | (0), but can be set true (1). Once true, modules can be | |
411 | neither loaded nor unloaded, and the toggle cannot be set back | |
7984754b | 412 | to false. Generally used with the "kexec_load_disabled" toggle. |
3d43321b KC |
413 | |
414 | ============================================================== | |
415 | ||
03f59566 SK |
416 | msg_next_id, sem_next_id, and shm_next_id: |
417 | ||
418 | These three toggles allow the desired id to be specified for the next | |
419 | allocated IPC object: message, semaphore or shared memory respectively. | |
420 | ||
421 | By default they are equal to -1, which means generic allocation logic. | |
422 | Possible values to set are in range {0..INT_MAX}. | |
423 | ||
424 | Notes: | |
425 | 1) The kernel doesn't guarantee that the new object will have the desired | |
426 | id. So it's up to userspace how to handle an object with a "wrong" id. | |
427 | 2) A toggle with a non-default value will be set back to -1 by the kernel | |
428 | after successful IPC object allocation. | |
429 | ||
430 | ============================================================== | |
431 | ||
807094c0 BP |
432 | nmi_watchdog: |
433 | ||
195daf66 UO |
434 | This parameter can be used to control the NMI watchdog |
435 | (i.e. the hard lockup detector) on x86 systems. | |
807094c0 | 436 | |
195daf66 UO |
437 | 0 - disable the hard lockup detector |
438 | 1 - enable the hard lockup detector | |
439 | ||
440 | The hard lockup detector monitors each CPU for its ability to respond to | |
441 | timer interrupts. The mechanism utilizes CPU performance counter registers | |
442 | that are programmed to generate Non-Maskable Interrupts (NMIs) periodically | |
443 | while a CPU is busy. Hence, the alternative name 'NMI watchdog'. | |
444 | ||
445 | The NMI watchdog is disabled by default if the kernel is running as a guest | |
446 | in a KVM virtual machine. This default can be overridden by adding | |
447 | ||
448 | nmi_watchdog=1 | |
449 | ||
8c27ceff | 450 | to the guest kernel command line (see Documentation/admin-guide/kernel-parameters.rst). |
807094c0 BP |
451 | |
452 | ============================================================== | |
453 | ||
10fc05d0 MG |
454 | numa_balancing: |
455 | ||
456 | Enables/disables automatic page fault based NUMA memory | |
457 | balancing. Memory is moved automatically to nodes | |
458 | that access it often. | |
459 | ||
460 | Enables/disables automatic NUMA memory balancing. On NUMA machines, there | |
461 | is a performance penalty if remote memory is accessed by a CPU. When this | |
462 | feature is enabled the kernel samples what task thread is accessing memory | |
463 | by periodically unmapping pages and later trapping a page fault. At the | |
464 | time of the page fault, it is determined if the data being accessed should | |
465 | be migrated to a local memory node. | |
466 | ||
467 | The unmapping of pages and trapping faults incur additional overhead that | |
468 | ideally is offset by improved memory locality but there is no universal | |
469 | guarantee. If the target workload is already bound to NUMA nodes then this | |
470 | feature should be disabled. Otherwise, if the system overhead from the | |
471 | feature is too high then the rate the kernel samples for NUMA hinting | |
472 | faults may be controlled by the numa_balancing_scan_period_min_ms, | |
930aa174 | 473 | numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, |
52bf84aa | 474 | numa_balancing_scan_size_mb, and numa_balancing_settle_count sysctls. |
10fc05d0 MG |
475 | |
476 | ============================================================== | |
477 | ||
478 | numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, | |
930aa174 | 479 | numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb |
10fc05d0 MG |
480 | |
481 | Automatic NUMA balancing scans a task's address space and unmaps pages to | |
482 | detect if pages are properly placed or if the data should be migrated to a | |
483 | memory node local to where the task is running. Every "scan delay" the task | |
484 | scans the next "scan size" number of pages in its address space. When the | |
485 | end of the address space is reached the scanner restarts from the beginning. | |
486 | ||
487 | In combination, the "scan delay" and "scan size" determine the scan rate. | |
488 | When "scan delay" decreases, the scan rate increases. The scan delay and | |
489 | hence the scan rate of every task is adaptive and depends on historical | |
490 | behaviour. If pages are properly placed then the scan delay increases, | |
491 | otherwise the scan delay decreases. The "scan size" is not adaptive but | |
492 | the higher the "scan size", the higher the scan rate. | |
493 | ||
494 | Higher scan rates incur higher system overhead as page faults must be | |
495 | trapped and potentially data must be migrated. However, the higher the scan | |
496 | rate, the more quickly a task's memory is migrated to a local node if the | |
497 | workload pattern changes and minimises performance impact due to remote | |
498 | memory accesses. These sysctls control the thresholds for scan delays and | |
499 | the number of pages scanned. | |
500 | ||
598f0ec0 MG |
501 | numa_balancing_scan_period_min_ms is the minimum time in milliseconds to |
502 | scan a task's virtual memory. It effectively controls the maximum scanning | |
503 | rate for each task. | |
10fc05d0 MG |
504 | |
505 | numa_balancing_scan_delay_ms is the starting "scan delay" used for a task | |
506 | when it initially forks. | |
507 | ||
598f0ec0 MG |
508 | numa_balancing_scan_period_max_ms is the maximum time in milliseconds to |
509 | scan a task's virtual memory. It effectively controls the minimum scanning | |
510 | rate for each task. | |
10fc05d0 MG |
511 | |
512 | numa_balancing_scan_size_mb is how many megabytes worth of pages are | |
513 | scanned for a given scan. | |
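
Since one "scan size" is scanned per "scan delay", the resulting scan rate bounds can be estimated with simple arithmetic. This is a rough sketch with illustrative values (256 MB, 1000 ms, 60000 ms) chosen only for the example; the function name is hypothetical and the adaptive behaviour of the scan delay is not modelled.

```python
def scan_rate_mb_per_s(scan_size_mb, scan_period_ms):
    """Megabytes scanned per second when scan_size_mb is scanned every scan_period_ms."""
    return scan_size_mb * 1000.0 / scan_period_ms


size_mb = 256          # stands in for numa_balancing_scan_size_mb
period_min_ms = 1000   # stands in for numa_balancing_scan_period_min_ms
period_max_ms = 60000  # stands in for numa_balancing_scan_period_max_ms

max_rate = scan_rate_mb_per_s(size_mb, period_min_ms)  # shortest period: fastest scanning
min_rate = scan_rate_mb_per_s(size_mb, period_max_ms)  # longest period: slowest scanning
print(max_rate, round(min_rate, 2))
```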
514 | ||
10fc05d0 MG |
515 | ============================================================== |
516 | ||
1da177e4 LT |
517 | osrelease, ostype & version: |
518 | ||
519 | # cat osrelease | |
520 | 2.1.88 | |
521 | # cat ostype | |
522 | Linux | |
523 | # cat version | |
524 | #5 Wed Feb 25 21:49:24 MET 1998 | |
525 | ||
526 | The files osrelease and ostype should be clear enough. Version | |
527 | needs a little more clarification however. The '#5' means that | |
528 | this is the fifth kernel built from this source base and the | |
529 | date behind it indicates the time the kernel was built. | |
530 | The only way to tune these values is to rebuild the kernel :-) | |
531 | ||
532 | ============================================================== | |
533 | ||
534 | overflowgid & overflowuid: | |
535 | ||
807094c0 BP |
536 | If your architecture did not always support 32-bit UIDs (i.e. arm, |
537 | i386, m68k, sh, and sparc32), a fixed UID and GID will be returned to | |
538 | applications that use the old 16-bit UID/GID system calls, if the | |
539 | actual UID or GID would exceed 65535. | |
1da177e4 LT |
540 | |
541 | These sysctls allow you to change the value of the fixed UID and GID. | |
542 | The default is 65534. | |
543 | ||
544 | ============================================================== | |
545 | ||
546 | panic: | |
547 | ||
807094c0 BP |
548 | The value in this file represents the number of seconds the kernel |
549 | waits before rebooting on a panic. When you use the software watchdog, | |
550 | the recommended setting is 60. | |
551 | ||
552 | ============================================================== | |
9f318e3f HK |
553 | |
554 | panic_on_io_nmi: | |
555 | ||
556 | Controls the kernel's behavior when a CPU receives an NMI caused by | |
557 | an IO error. | |
558 | ||
559 | 0: try to continue operation (default) | |
560 | ||
561 | 1: panic immediately. The IO error triggered an NMI. This indicates a | |
562 | serious system condition which could result in IO data corruption. | |
563 | Rather than continuing, panicking might be a better choice. Some | |
564 | servers issue this sort of NMI when the dump button is pushed, | |
565 | and you can use this option to take a crash dump. | |
566 | ||
567 | ============================================================== | |
807094c0 | 568 | |
1da177e4 LT |
569 | panic_on_oops: |
570 | ||
571 | Controls the kernel's behaviour when an oops or BUG is encountered. | |
572 | ||
573 | 0: try to continue operation | |
574 | ||
a982ac06 | 575 | 1: panic immediately. If the `panic' sysctl is also non-zero then the |
8b23d04d | 576 | machine will be rebooted. |
1da177e4 LT |
577 | |
578 | ============================================================== | |
579 | ||
55af7796 MH |
580 | panic_on_stackoverflow: |
581 | ||
582 | Controls the kernel's behavior when an overflow of the kernel, IRQ | |
583 | or exception stacks (but not of a user stack) is detected. | |
584 | This file shows up if CONFIG_DEBUG_STACKOVERFLOW is enabled. | |
585 | ||
586 | 0: try to continue operation. | |
587 | ||
588 | 1: panic immediately. | |
589 | ||
590 | ============================================================== | |
591 | ||
9e3961a0 PB |
592 | panic_on_unrecovered_nmi: |
593 | ||
594 | The default Linux behaviour on an NMI caused by a memory error or by an | |
595 | unknown source is to continue operation. For many environments such as | |
596 | scientific computing it is preferable that the box is taken out and the | |
597 | error dealt with than that an uncorrected parity/ECC error gets propagated. | |
598 | ||
599 | A small number of systems do generate NMIs for bizarre random reasons | |
600 | such as power management, so the default is off. This sysctl works like | |
601 | the existing panic controls already in that directory. | |
602 | ||
603 | ============================================================== | |
604 | ||
605 | panic_on_warn: | |
606 | ||
607 | Calls panic() in the WARN() path when set to 1. This is useful to avoid | |
608 | a kernel rebuild when attempting to kdump at the location of a WARN(). | |
609 | ||
610 | 0: only WARN(), default behaviour. | |
611 | ||
612 | 1: call panic() after printing out WARN() location. | |
613 | ||
614 | ============================================================== | |
615 | ||
088e9d25 DBO |
616 | panic_on_rcu_stall: |
617 | ||
618 | When set to 1, calls panic() after RCU stall detection messages. This | |
619 | is useful to define the root cause of RCU stalls using a vmcore. | |
620 | ||
621 | 0: do not panic() when RCU stall takes place, default behavior. | |
622 | ||
623 | 1: panic() after printing RCU stall messages. | |
624 | ||
625 | ============================================================== | |
626 | ||
14c63f17 DH |
627 | perf_cpu_time_max_percent: |
628 | ||
629 | Hints to the kernel how much CPU time it should be allowed to | |
630 | use to handle perf sampling events. If the perf subsystem | |
631 | is informed that its samples are exceeding this limit, it | |
632 | will drop its sampling frequency to attempt to reduce its CPU | |
633 | usage. | |
634 | ||
635 | Some perf sampling happens in NMIs. If these samples | |
636 | unexpectedly take too long to execute, the NMIs can become | |
637 | stacked up next to each other so much that nothing else is | |
638 | allowed to execute. | |
639 | ||
640 | 0: disable the mechanism. Do not monitor or correct perf's | |
641 | sampling rate no matter how much CPU time it takes. | |
642 | ||
643 | 1-100: attempt to throttle perf's sample rate to this | |
644 | percentage of CPU. Note: the kernel calculates an | |
645 | "expected" length of each sample event. 100 here means | |
646 | 100% of that expected length. Even if this is set to | |
647 | 100, you may still see sample throttling if this | |
648 | length is exceeded. Set to 0 if you truly do not care | |
649 | how much CPU is consumed. | |
650 | ||
651 | ============================================================== | |
652 | ||
3379e0c3 BH |
653 | perf_event_paranoid: |
654 | ||
655 | Controls use of the performance events system by unprivileged | |
0161028b | 656 | users (without CAP_SYS_ADMIN). The default value is 2. |
3379e0c3 BH |
657 | |
658 | -1: Allow use of (almost) all events by all users | |
ac0bb6b7 KK |
659 | Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK |
660 | >=0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN | |
661 | Disallow raw tracepoint access by users without CAP_SYS_ADMIN | |
3379e0c3 BH |
662 | >=1: Disallow CPU event access by users without CAP_SYS_ADMIN |
663 | >=2: Disallow kernel profiling by users without CAP_SYS_ADMIN | |
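
Because the levels are cumulative (">=" thresholds), the set of operations denied to a user without CAP_SYS_ADMIN can be derived from a small table. The sketch below only restates the list above; the names are hypothetical and the mlock behaviour of level -1 is not modelled.

```python
# (threshold, operation): the operation is disallowed once the level reaches the threshold
RESTRICTIONS = [
    (0, "ftrace function tracepoint"),
    (0, "raw tracepoint access"),
    (1, "CPU event access"),
    (2, "kernel profiling"),
]


def disallowed_for_unprivileged(level):
    """Operations denied to users without CAP_SYS_ADMIN at this level."""
    return [name for threshold, name in RESTRICTIONS if level >= threshold]


for level in (-1, 0, 1, 2):
    print(level, disallowed_for_unprivileged(level))
```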
664 | ||
665 | ============================================================== | |
55af7796 | 666 | |
c5dfd78e ACM |
667 | perf_event_max_stack: |
668 | ||
669 | Controls maximum number of stack frames to copy for (attr.sample_type & | |
670 | PERF_SAMPLE_CALLCHAIN) configured events, for instance, when using | |
671 | 'perf record -g' or 'perf trace --call-graph fp'. | |
672 | ||
673 | This can only be done when no events are in use that have callchains | |
674 | enabled, otherwise writing to this file will return -EBUSY. | |
675 | ||
676 | The default value is 127. | |
677 | ||
678 | ============================================================== | |
679 | ||
ac0bb6b7 KK |
680 | perf_event_mlock_kb: |
681 | ||
682 | Controls the size of the per-cpu ring buffer not counted against the mlock limit. | |
683 | ||
684 | The default value is 512 + 1 page | |
685 | ||
686 | ============================================================== | |
687 | ||
c85b0334 ACM |
688 | perf_event_max_contexts_per_stack: |
689 | ||
690 | Controls maximum number of stack frame context entries for | |
691 | (attr.sample_type & PERF_SAMPLE_CALLCHAIN) configured events, for | |
692 | instance, when using 'perf record -g' or 'perf trace --call-graph fp'. | |
693 | ||
694 | This can only be done when no events are in use that have callchains | |
695 | enabled, otherwise writing to this file will return -EBUSY. | |
696 | ||
697 | The default value is 8. | |
698 | ||
699 | ============================================================== | |
700 | ||
1da177e4 LT |
701 | pid_max: |
702 | ||
beb7dd86 | 703 | PID allocation wrap value. When the kernel's next PID value |
1da177e4 LT |
704 | reaches this value, it wraps back to a minimum PID value. |
705 | PIDs of value pid_max or larger are not allocated. | |
706 | ||
707 | ============================================================== | |
708 | ||
b8f566b0 PE |
709 | ns_last_pid: |
710 | ||
711 | The last pid allocated in the current pid namespace (the one in which | |
712 | the task using this sysctl lives). When selecting a pid for the next task | |
713 | on fork, the kernel tries to allocate a number starting from this one. | |
714 | ||
715 | ============================================================== | |
716 | ||
1da177e4 LT |
717 | powersave-nap: (PPC only) |
718 | ||
719 | If set, Linux-PPC will use the 'nap' mode of powersaving, | |
720 | otherwise the 'doze' mode will be used. | |
721 | ||
722 | ============================================================== | |
723 | ||
724 | printk: | |
725 | ||
726 | The four values in printk denote: console_loglevel, | |
727 | default_message_loglevel, minimum_console_loglevel and | |
728 | default_console_loglevel respectively. | |
729 | ||
730 | These values influence printk() behavior when printing or | |
731 | logging error messages. See 'man 2 syslog' for more info on | |
732 | the different loglevels. | |
733 | ||
734 | - console_loglevel: messages with a higher priority than | |
735 | this will be printed to the console | |
87889e15 | 736 | - default_message_loglevel: messages without an explicit priority |
1da177e4 LT |
737 | will be printed with this priority |
738 | - minimum_console_loglevel: minimum (highest) value to which | |
739 | console_loglevel can be set | |
740 | - default_console_loglevel: default value for console_loglevel | |
741 | ||
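To make the field order concrete, here is a minimal sketch that labels the four values. The function name describe_printk is hypothetical. It takes the file contents as a string, so it can be tried without access to /proc.

```shell
# Hypothetical helper: label the four fields of /proc/sys/kernel/printk.
# The file contents are passed as one argument.
describe_printk() {
    # Split the argument on whitespace into four positional fields.
    set -- $1
    if [ "$#" -ne 4 ]; then
        echo "expected 4 values, got $#" >&2
        return 1
    fi
    echo "console_loglevel=$1"
    echo "default_message_loglevel=$2"
    echo "minimum_console_loglevel=$3"
    echo "default_console_loglevel=$4"
}

# Possible use on a live system:
#   describe_printk "$(cat /proc/sys/kernel/printk)"
```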
742 | ============================================================== | |
743 | ||
807094c0 BP |
744 | printk_delay: |
745 | ||
746 | Delay each printk message by printk_delay milliseconds. | |
747 | ||
748 | Values from 0 to 10000 are allowed. | |
749 | ||
750 | ============================================================== | |
751 | ||
1da177e4 LT |
752 | printk_ratelimit: |
753 | ||
754 | Some warning messages are rate limited. printk_ratelimit specifies | |
755 | the minimum length of time between these messages (in seconds). By | |
756 | default we allow one every 5 seconds. | |
757 | ||
758 | A value of 0 will disable rate limiting. | |
759 | ||
760 | ============================================================== | |
761 | ||
762 | printk_ratelimit_burst: | |
763 | ||
764 | While long term we enforce one message per printk_ratelimit | |
765 | seconds, we do allow a burst of messages to pass through. | |
766 | printk_ratelimit_burst specifies the number of messages we can | |
767 | send before ratelimiting kicks in. | |
768 | ||
769 | ============================================================== | |
770 | ||
750afe7b BP |
771 | printk_devkmsg: |
772 | ||
773 | Control the logging to /dev/kmsg from userspace: | |
774 | ||
775 | ratelimit: default, ratelimited | |
776 | on: unlimited logging to /dev/kmsg from userspace | |
777 | off: logging to /dev/kmsg disabled | |
778 | ||
779 | The kernel command line parameter printk.devkmsg= overrides this and is | |
780 | a one-time setting until next reboot: once set, it cannot be changed by | |
781 | this sysctl interface anymore. | |
782 | ||
783 | ============================================================== | |
784 | ||
807094c0 | 785 | randomize_va_space: |
1ec7fd50 JK |
786 | |
787 | This option can be used to select the type of process address | |
788 | space randomization that is used in the system, for architectures | |
789 | that support this feature. | |
790 | ||
b7f5ab6f HS |
791 | 0 - Turn the process address space randomization off. This is the |
792 | default for architectures that do not support this feature anyway, | |
793 | and kernels that are booted with the "norandmaps" parameter. | |
1ec7fd50 JK |
794 | |
795 | 1 - Make the addresses of mmap base, stack and VDSO page randomized. | |
796 | This, among other things, implies that shared libraries will be | |
b7f5ab6f HS |
797 | loaded to random addresses. Also for PIE-linked binaries, the |
798 | location of code start is randomized. This is the default if the | |
799 | CONFIG_COMPAT_BRK option is enabled. | |
1ec7fd50 | 800 | |
b7f5ab6f HS |
801 | 2 - Additionally enable heap randomization. This is the default if |
802 | CONFIG_COMPAT_BRK is disabled. | |
803 | ||
804 | There are a few legacy applications out there (such as some ancient | |
1ec7fd50 | 805 | versions of libc.so.5 from 1996) that assume that brk area starts |
b7f5ab6f HS |
806 | just after the end of the code+bss. These applications break when |
807 | start of the brk area is randomized. There are however no known | |
1ec7fd50 | 808 | non-legacy applications that would be broken this way, so for most |
b7f5ab6f HS |
809 | systems it is safe to choose full randomization. |
810 | ||
811 | Systems with ancient and/or broken binaries should be configured | |
812 | with CONFIG_COMPAT_BRK enabled, which excludes the heap from process | |
813 | address space randomization. | |
1ec7fd50 JK |
814 | |
815 | ============================================================== | |
816 | ||
1da177e4 LT |
817 | reboot-cmd: (Sparc only) |
818 | ||
819 | ??? This seems to be a way to give an argument to the Sparc | |
820 | ROM/Flash boot loader. Maybe to tell it what to do after | |
821 | rebooting. ??? | |
822 | ||
823 | ============================================================== | |
824 | ||
825 | rtsig-max & rtsig-nr: | |
826 | ||
827 | The file rtsig-max can be used to tune the maximum number | |
828 | of POSIX realtime (queued) signals that can be outstanding | |
829 | in the system. | |
830 | ||
831 | rtsig-nr shows the number of RT signals currently queued. | |
832 | ||
833 | ============================================================== | |
834 | ||
cb251765 MG |
835 | sched_schedstats: |
836 | ||
837 | Enables/disables scheduler statistics. Enabling this feature | |
838 | incurs a small amount of overhead in the scheduler but is | |
839 | useful for debugging and performance tuning. | |
840 | ||
841 | ============================================================== | |
842 | ||
1da177e4 LT |
843 | sg-big-buff: |
844 | ||
845 | This file shows the size of the generic SCSI (sg) buffer. | |
846 | You can't tune it just yet, but you could change it at | |
847 | compile time by editing include/scsi/sg.h and changing | |
848 | the value of SG_BIG_BUFF. | |
849 | ||
850 | There shouldn't be any reason to change this value. If | |
851 | you can come up with one, you probably know what you | |
852 | are doing anyway :) | |
853 | ||
854 | ============================================================== | |
855 | ||
358e419f CALP |
856 | shmall: |
857 | ||
858 | This parameter sets the total amount of shared memory pages that | |
859 | can be used system wide. Hence, SHMALL should always be at least | |
860 | ceil(shmmax/PAGE_SIZE). | |
861 | ||
862 | If you are not sure what the default PAGE_SIZE is on your Linux | |
863 | system, you can run the following command: | |
864 | ||
865 | # getconf PAGE_SIZE | |
866 | ||
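The lower bound ceil(shmmax/PAGE_SIZE) can be computed with integer arithmetic. The sketch below assumes both inputs fit in the shell's signed integer range, so a very large shmmax may overflow. The function name min_shmall is hypothetical.

```shell
# Hypothetical helper: smallest shmall (in pages) that can hold one
# segment of shmmax bytes, i.e. ceil(shmmax / page_size).
# Assumes both values fit in the shell's signed integer arithmetic.
min_shmall() {
    shmmax=$1
    page_size=$2
    # Adding page_size - 1 before dividing rounds the result up.
    echo $(( (shmmax + page_size - 1) / page_size ))
}

# Possible use, with the page size taken from getconf:
#   min_shmall 33554432 "$(getconf PAGE_SIZE)"
```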
867 | ============================================================== | |
868 | ||
807094c0 | 869 | shmmax: |
1da177e4 LT |
870 | |
871 | This value can be used to query and set the run time limit | |
872 | on the maximum shared memory segment size that can be created. | |
807094c0 | 873 | Shared memory segments up to 1GB are now supported in the |
1da177e4 LT |
874 | kernel. This value defaults to SHMMAX. |
875 | ||
876 | ============================================================== | |
877 | ||
b34a6b1d VK |
878 | shm_rmid_forced: |
879 | ||
880 | Linux lets you set resource limits, including how much memory one | |
881 | process can consume, via setrlimit(2). Unfortunately, shared memory | |
882 | segments are allowed to exist without association with any process, and | |
883 | thus might not be counted against any resource limits. If enabled, | |
884 | shared memory segments are automatically destroyed when their attach | |
885 | count becomes zero after a detach or a process termination. It will | |
886 | also destroy segments that were created, but never attached to, on exit | |
887 | from the process. The only use left for IPC_RMID is to immediately | |
888 | destroy an unattached segment. Of course, this breaks the way things are | |
889 | defined, so some applications might stop working. Note that this | |
890 | feature will do you no good unless you also configure your resource | |
891 | limits (in particular, RLIMIT_AS and RLIMIT_NPROC). Most systems don't | |
892 | need this. | |
893 | ||
894 | Note that if you change this from 0 to 1, already created segments | |
895 | without users and with a dead creator process will be destroyed. | |
896 | ||
897 | ============================================================== | |
898 | ||
f4aacea2 KC |
899 | sysctl_writes_strict: |
900 | ||
901 | Control how file position affects the behavior of updating sysctl values | |
902 | via the /proc/sys interface: | |
903 | ||
904 | -1 - Legacy per-write sysctl value handling, with no printk warnings. | |
905 | Each write syscall must fully contain the sysctl value to be | |
906 | written, and multiple writes on the same sysctl file descriptor | |
907 | will rewrite the sysctl value, regardless of file position. | |
41662f5c KC |
908 | 0 - Same behavior as above, but warn about processes that perform writes |
909 | to a sysctl file descriptor when the file position is not 0. | |
910 | 1 - (default) Respect file position when writing sysctl strings. Multiple | |
911 | writes will append to the sysctl value buffer. Anything past the max | |
912 | length of the sysctl value buffer will be ignored. Writes to numeric | |
913 | sysctl entries must always be at file position 0 and the value must | |
914 | be fully contained in the buffer sent in the write syscall. | |
f4aacea2 KC |
915 | |
916 | ============================================================== | |
917 | ||
ed235875 AT |
918 | softlockup_all_cpu_backtrace: |
919 | ||
920 | This value controls the soft lockup detector thread's behavior | |
921 | when a soft lockup condition is detected as to whether or not | |
922 | to gather further debug information. If enabled, each cpu will | |
923 | be issued an NMI and instructed to capture a stack trace. | |
924 | ||
925 | This feature is only applicable for architectures which support | |
926 | NMI. | |
927 | ||
928 | 0: do nothing. This is the default behavior. | |
929 | ||
930 | 1: on detection capture more debug information. | |
931 | ||
932 | ============================================================== | |
933 | ||
195daf66 UO |
934 | soft_watchdog: |
935 | ||
936 | This parameter can be used to control the soft lockup detector. | |
937 | ||
938 | 0 - disable the soft lockup detector | |
939 | 1 - enable the soft lockup detector | |
940 | ||
941 | The soft lockup detector monitors CPUs for threads that are hogging the CPUs | |
942 | without rescheduling voluntarily, and thus prevent the 'watchdog/N' threads | |
943 | from running. The mechanism depends on the CPUs' ability to respond to timer | |
944 | interrupts which are needed for the 'watchdog/N' threads to be woken up by | |
945 | the watchdog timer function, otherwise the NMI watchdog - if enabled - can | |
946 | detect a hard lockup condition. | |
947 | ||
948 | ============================================================== | |
949 | ||
807094c0 | 950 | tainted: |
1da177e4 LT |
951 | |
952 | Non-zero if the kernel has been tainted. Numeric values, which | |
953 | can be ORed together: | |
954 | ||
bb20698d GKH |
955 | 1 - A module with a non-GPL license has been loaded, this |
956 | includes modules with no license. | |
957 | Set by modutils >= 2.4.9 and module-init-tools. | |
958 | 2 - A module was force loaded by insmod -f. | |
959 | Set by modutils >= 2.4.9 and module-init-tools. | |
960 | 4 - Unsafe SMP processors: SMP with CPUs not designed for SMP. | |
961 | 8 - A module was forcibly unloaded from the system by rmmod -f. | |
962 | 16 - A hardware machine check error occurred on the system. | |
963 | 32 - A bad page was discovered on the system. | |
964 | 64 - The user has asked that the system be marked "tainted". This | |
965 | could be because they are running software that directly modifies | |
966 | the hardware, or for other reasons. | |
967 | 128 - The system has died. | |
968 | 256 - The ACPI DSDT has been overridden with one supplied by the user | |
969 | instead of using the one provided by the hardware. | |
970 | 512 - A kernel warning has occurred. | |
971 | 1024 - A module from drivers/staging was loaded. | |
f5fe184b LF |
972 | 2048 - The system is working around a severe firmware bug. |
973 | 4096 - An out-of-tree module has been loaded. | |
66cc69e3 MD |
974 | 8192 - An unsigned module has been loaded in a kernel supporting module |
975 | signature. | |
69361eef | 976 | 16384 - A soft lockup has previously occurred on the system. |
c5f45465 | 977 | 32768 - The kernel has been live patched. |
1da177e4 | 978 | |
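Because the flags are ORed together, a taint value can be split back into its individual flags by testing each power of two. This is a minimal sketch; the function name decode_taint is hypothetical, and it prints only the numeric flags, which can then be looked up in the list above.

```shell
# Hypothetical helper: print each power-of-two flag set in a taint value.
decode_taint() {
    value=$1
    bit=1
    flags=""
    while [ "$bit" -le "$value" ]; do
        # Keep this flag if its bit is set in the value.
        if [ $(( value & bit )) -ne 0 ]; then
            flags="$flags $bit"
        fi
        bit=$(( bit * 2 ))
    done
    # Strip the leading space; an untainted kernel (0) prints an empty line.
    echo "${flags# }"
}

# Possible use on a live system:
#   decode_taint "$(cat /proc/sys/kernel/tainted)"
```

For instance, a value of 4097 splits into 1 (non-GPL module) and 4096 (out-of-tree module).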
760df93e SF |
979 | ============================================================== |
980 | ||
0ec62afe HS |
981 | threads-max: |
982 | ||
983 | This value controls the maximum number of threads that can be created | |
984 | using fork(). | |
985 | ||
986 | During initialization the kernel sets this value such that even if the | |
987 | maximum number of threads is created, the thread structures occupy only | |
988 | a part (1/8th) of the available RAM pages. | |
989 | ||
990 | The minimum value that can be written to threads-max is 20. | |
991 | The maximum value that can be written to threads-max is given by the | |
992 | constant FUTEX_TID_MASK (0x3fffffff). | |
993 | If a value outside of this range is written to threads-max, the write | |
994 | fails with EINVAL. | |
995 | ||
996 | The value written is checked against the available RAM pages. If the | |
997 | thread structures would occupy too much (more than 1/8th) of the | |
998 | available RAM pages, threads-max is reduced accordingly. | |
999 | ||
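The accepted range can be checked before writing. The sketch below covers only the static bounds described above (20 to FUTEX_TID_MASK); it does not model the later reduction based on available RAM. The function name threads_max_in_range is hypothetical.

```shell
# Hypothetical helper: succeed if a candidate threads-max value lies in
# the range accepted on write. 1073741823 is FUTEX_TID_MASK (0x3fffffff).
threads_max_in_range() {
    [ "$1" -ge 20 ] && [ "$1" -le 1073741823 ]
}

# Possible use before writing the value:
#   threads_max_in_range 100000 && echo 100000 > /proc/sys/kernel/threads-max
```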
1000 | ============================================================== | |
1001 | ||
760df93e SF |
1002 | unknown_nmi_panic: |
1003 | ||
807094c0 BP |
1004 | The value in this file affects how unknown NMIs are handled. When the |
1005 | value is non-zero, an unknown NMI is trapped and the kernel panics. At | |
1006 | that time, kernel debugging information is displayed on the console. | |
760df93e | 1007 | |
807094c0 BP |
1008 | For example, the NMI switch found on most IA32 servers raises an |
1009 | unknown NMI. If a system hangs, try pressing the NMI switch. | |
08825c90 LZ |
1010 | |
1011 | ============================================================== | |
1012 | ||
195daf66 UO |
1013 | watchdog: |
1014 | ||
1015 | This parameter can be used to disable or enable the soft lockup detector | |
1016 | _and_ the NMI watchdog (i.e. the hard lockup detector) at the same time. | |
1017 | ||
1018 | 0 - disable both lockup detectors | |
1019 | 1 - enable both lockup detectors | |
1020 | ||
1021 | The soft lockup detector and the NMI watchdog can also be disabled or | |
1022 | enabled individually, using the soft_watchdog and nmi_watchdog parameters. | |
1023 | If the watchdog parameter is read, for example by executing | |
1024 | ||
1025 | cat /proc/sys/kernel/watchdog | |
1026 | ||
1027 | the output of this command (0 or 1) shows the logical OR of soft_watchdog | |
1028 | and nmi_watchdog. | |
1029 | ||
1030 | ============================================================== | |
1031 | ||
fe4ba3c3 CM |
1032 | watchdog_cpumask: |
1033 | ||
1034 | This value can be used to control on which cpus the watchdog may run. | |
1035 | The default cpumask is all possible cores, but if NO_HZ_FULL is | |
1036 | enabled in the kernel config, and cores are specified with the | |
1037 | nohz_full= boot argument, those cores are excluded by default. | |
1038 | Offline cores can be included in this mask, and if the core is later | |
1039 | brought online, the watchdog will be started based on the mask value. | |
1040 | ||
1041 | Typically this value would only be touched in the nohz_full case | |
1042 | to re-enable cores that by default were not running the watchdog, | |
1043 | if a kernel lockup was suspected on those cores. | |
1044 | ||
1045 | The argument value is the standard cpulist format for cpumasks, | |
1046 | so for example to enable the watchdog on cores 0, 2, 3, and 4 you | |
1047 | might say: | |
1048 | ||
1049 | echo 0,2-4 > /proc/sys/kernel/watchdog_cpumask | |
1050 | ||
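To see which cores a cpulist string such as the one above selects, it can be expanded into individual core numbers. This sketch handles only comma-separated single cores and simple N-M ranges, not any other forms the cpulist format may accept. The function name expand_cpulist is hypothetical.

```shell
# Hypothetical helper: expand a simple cpulist (comma-separated cores and
# N-M ranges) into a space-separated list of core numbers.
expand_cpulist() {
    out=""
    # Split the argument on commas into positional parameters.
    old_ifs=$IFS
    IFS=,
    set -- $1
    IFS=$old_ifs
    for range in "$@"; do
        # For a single core such as "0", start and end are both "0".
        start=${range%-*}
        end=${range#*-}
        cpu=$start
        while [ "$cpu" -le "$end" ]; do
            out="$out $cpu"
            cpu=$(( cpu + 1 ))
        done
    done
    echo "${out# }"
}

# Possible use on a live system:
#   expand_cpulist "$(cat /proc/sys/kernel/watchdog_cpumask)"
```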
1051 | ============================================================== | |
1052 | ||
08825c90 LZ |
1053 | watchdog_thresh: |
1054 | ||
1055 | This value can be used to control the frequency of hrtimer and NMI | |
1056 | events and the soft and hard lockup thresholds. The default threshold | |
1057 | is 10 seconds. | |
1058 | ||
1059 | The softlockup threshold is (2 * watchdog_thresh). Setting this | |
1060 | tunable to zero will disable lockup detection altogether. | |
1061 | ||
1062 | ============================================================== |