Commit | Line | Data |
---|---|---|
faa5273c | 1 | Documentation for /proc/sys/net/* |
760df93e SF |
2 | (c) 1999 Terrehon Bowden <terrehon@pacbell.net> |
3 | Bodo Bauer <bb@ricochet.net> | |
4 | (c) 2000 Jorge Nerin <comandante@zaralinux.com> | |
5 | (c) 2009 Shen Feng <shen@cn.fujitsu.com> | |
6 | ||
7 | For general info and legal blurb, please look in README. | |
8 | ||
9 | ============================================================== | |
10 | ||
11 | This file contains the documentation for the sysctl files in | |
faa5273c | 12 | /proc/sys/net |
760df93e SF |
13 | |
14 | The interface to the networking parts of the kernel is located in | |
faa5273c | 15 | /proc/sys/net. The following table shows all possible subdirectories. You may |
760df93e SF |
16 | see only some of them, depending on your kernel's configuration. |
17 | ||
18 | ||
19 | Table : Subdirectories in /proc/sys/net | |
20 | .............................................................................. | |
21 | Directory Content Directory Content | |
22 | core General parameter appletalk Appletalk protocol | |
23 | unix Unix domain sockets netrom NET/ROM | |
24 | 802 E802 protocol ax25 AX25 | |
25 | ethernet Ethernet protocol rose X.25 PLP layer | |
26 | ipv4 IP version 4 x25 X.25 protocol | |
27 | ipx IPX token-ring IBM token ring | |
28 | bridge Bridging decnet DEC net | |
cc79dd1b | 29 | ipv6 IP version 6 tipc TIPC |
760df93e SF |
30 | .............................................................................. |
31 | ||
32 | 1. /proc/sys/net/core - Network core options | |
33 | ------------------------------------------------------- | |
34 | ||
0a14842f ED |
35 | bpf_jit_enable |
36 | -------------- | |
37 | ||
2110ba58 DB |
38 | This enables the BPF Just in Time (JIT) compiler. BPF is a flexible |
39 | and efficient infrastructure for executing bytecode at various | |
40 | hook points. It is used in a number of Linux kernel subsystems such | |
41 | as networking (e.g. XDP, tc), tracing (e.g. kprobes, uprobes, tracepoints) | |
42 | and security (e.g. seccomp). LLVM has a BPF back end that can compile | |
43 | restricted C into a sequence of BPF instructions. After program load | |
44 | through bpf(2) and passing a verifier in the kernel, a JIT will then | |
45 | translate these BPF proglets into native CPU instructions. There are | |
46 | two flavors of JITs, the newer eBPF JIT currently supported on: | |
014cd0a3 | 47 | - x86_64 |
03f5781b | 48 | - x86_32 |
014cd0a3 | 49 | - arm64 |
d2aaa3dc | 50 | - arm32 |
014cd0a3 ME |
51 | - ppc64 |
52 | - sparc64 | |
53 | - mips64 | |
d4dd2d75 | 54 | - s390x |
014cd0a3 | 55 | |
2110ba58 | 56 | And the older cBPF JIT supported on the following archs: |
014cd0a3 ME |
57 | - mips |
58 | - ppc | |
59 | - sparc | |
60 | ||
2110ba58 DB |
61 | eBPF JITs are a superset of cBPF JITs, meaning the kernel will |
62 | migrate cBPF instructions into eBPF instructions and then JIT | |
63 | compile them transparently. Older cBPF JITs can only translate | |
64 | tcpdump filters, seccomp rules, etc., but not the aforementioned eBPF | |
65 | programs loaded through bpf(2). | |
014cd0a3 | 66 | |
0a14842f ED |
67 | Values : |
68 | 0 - disable the JIT (default value) | |
69 | 1 - enable the JIT | |
70 | 2 - enable the JIT and ask the compiler to emit traces to the kernel log. | |
71 | ||
4f3446bb DB |
72 | bpf_jit_harden |
73 | -------------- | |
74 | ||
2110ba58 DB |
75 | This enables hardening for the BPF JIT compiler. Only eBPF JIT |
76 | backends are supported. Enabling hardening costs some performance, but can | |
77 | mitigate JIT spraying. | |
4f3446bb DB |
78 | Values : |
79 | 0 - disable JIT hardening (default value) | |
80 | 1 - enable JIT hardening for unprivileged users only | |
81 | 2 - enable JIT hardening for all users | |
82 | ||
74451e66 DB |
83 | bpf_jit_kallsyms |
84 | ---------------- | |
85 | ||
2110ba58 DB |
86 | When the BPF JIT compiler is enabled, the addresses of compiled images are |
87 | unknown to the kernel, meaning they show up neither in traces nor | |
88 | in /proc/kallsyms. This enables export of these addresses, which can | |
89 | be used for debugging/tracing. If bpf_jit_harden is enabled, this | |
90 | feature is disabled. | |
74451e66 DB |
91 | Values : |
92 | 0 - disable JIT kallsyms export (default value) | |
93 | 1 - enable JIT kallsyms export for privileged users only | |
94 | ||
c60f6aa8 SW |
95 | dev_weight |
96 | -------------- | |
97 | ||
98 | The maximum number of packets that the kernel can handle on a NAPI interrupt. | |
97bbf662 MC |
99 | It is a per-CPU variable. For drivers that support LRO or GRO_HW, a hardware |
100 | aggregated packet is counted as one packet in this context. | |
101 | ||
c60f6aa8 SW |
102 | Default: 64 |
103 | ||
3d48b53f MT |
104 | dev_weight_rx_bias |
105 | -------------- | |
106 | ||
107 | RPS (e.g. RFS, aRFS) processing is competing with the registered NAPI poll function | |
108 | of the driver for the per softirq cycle netdev_budget. This parameter influences | |
109 | the proportion of the configured netdev_budget that is spent on RPS based packet | |
110 | processing during RX softirq cycles. It is further meant for making current | |
111 | dev_weight adaptable for asymmetric CPU needs on RX/TX side of the network stack. | |
112 | (See dev_weight_tx_bias.) It is effective on a per-CPU basis. Determination is based | |
113 | on dev_weight and is calculated multiplicatively (dev_weight * dev_weight_rx_bias). | |
114 | Default: 1 | |
115 | ||
116 | dev_weight_tx_bias | |
117 | -------------- | |
118 | ||
119 | Scales the maximum number of packets that can be processed during a TX softirq cycle. | |
120 | Effective on a per CPU basis. Allows scaling of current dev_weight for asymmetric | |
121 | net stack processing needs. Be careful to avoid making TX softirq processing a CPU hog. | |
122 | Calculation is based on dev_weight (dev_weight * dev_weight_tx_bias). | |
123 | Default: 1 | |
124 | ||
6da7c8fc | 125 | default_qdisc |
126 | -------------- | |
127 | ||
128 | The default queuing discipline to use for network devices. This allows | |
2e64126b PS |
129 | overriding the default of pfifo_fast with an alternative. Since the default |
130 | queuing discipline is created without additional parameters, it is best suited | |
131 | to queuing disciplines that work well without configuration like stochastic | |
132 | fair queue (sfq), CoDel (codel) or fair queue CoDel (fq_codel). Don't use | |
133 | queuing disciplines like Hierarchical Token Bucket or Deficit Round Robin | |
134 | which require setting up classes and bandwidths. Note that physical multiqueue | |
135 | interfaces still use mq as root qdisc, which in turn uses this default for its | |
136 | leaves. Virtual devices (e.g. lo or veth) ignore this setting and instead | |
137 | default to noqueue. | |
6da7c8fc | 138 | Default: pfifo_fast |
139 | ||
64b0dc51 | 140 | busy_read |
06021292 | 141 | ---------------- |
e0d1095a | 142 | Low latency busy poll timeout for socket reads. (needs CONFIG_NET_RX_BUSY_POLL) |
cbf55001 | 143 | Approximate time in microseconds to busy loop waiting for packets on the device queue. |
64b0dc51 ET |
144 | This sets the default value of the SO_BUSY_POLL socket option. |
145 | Can be set or overridden per socket by setting socket option SO_BUSY_POLL, | |
146 | which is the preferred method of enabling. If you need to enable the feature | |
147 | globally via sysctl, a value of 50 is recommended. | |
cbf55001 | 148 | Will increase power usage. |
06021292 ET |
149 | Default: 0 (off) |
150 | ||
64b0dc51 | 151 | busy_poll |
2d48d67f | 152 | ---------------- |
e0d1095a | 153 | Low latency busy poll timeout for poll and select. (needs CONFIG_NET_RX_BUSY_POLL) |
cbf55001 | 154 | Approximate time in microseconds to busy loop waiting for events. |
2d48d67f ET |
155 | Recommended value depends on the number of sockets you poll on. |
156 | For several sockets 50, for several hundreds 100. | |
157 | For more than that you probably want to use epoll. | |
64b0dc51 ET |
158 | Note that only sockets with SO_BUSY_POLL set will be busy polled, |
159 | so you want to either selectively set SO_BUSY_POLL on those sockets or set | |
160 | sysctl net.core.busy_read globally. | |
cbf55001 | 161 | Will increase power usage. |
2d48d67f ET |
162 | Default: 0 (off) |
163 | ||
760df93e SF |
164 | rmem_default |
165 | ------------ | |
166 | ||
167 | The default setting of the socket receive buffer in bytes. | |
168 | ||
169 | rmem_max | |
170 | -------- | |
171 | ||
172 | The maximum receive socket buffer size in bytes. | |
173 | ||
b245be1f WB |
174 | tstamp_allow_data |
175 | ----------------- | |
176 | Allow processes to receive tx timestamps looped together with the original | |
177 | packet contents. If disabled, transmit timestamp requests from unprivileged | |
178 | processes are dropped unless socket option SOF_TIMESTAMPING_OPT_TSONLY is set. | |
179 | Default: 1 (on) | |
180 | ||
181 | ||
760df93e SF |
182 | wmem_default |
183 | ------------ | |
184 | ||
185 | The default setting (in bytes) of the socket send buffer. | |
186 | ||
187 | wmem_max | |
188 | -------- | |
189 | ||
190 | The maximum send socket buffer size in bytes. | |
191 | ||
192 | message_burst and message_cost | |
193 | ------------------------------ | |
194 | ||
195 | These parameters are used to limit the warning messages written to the kernel | |
196 | log from the networking code. They enforce a rate limit to make a | |
197 | denial-of-service attack impossible. A higher message_cost factor results in | |
198 | fewer messages being written. Message_burst controls when messages will | |
199 | be dropped. The default settings limit warning messages to one every five | |
200 | seconds. | |
201 | ||
202 | warnings | |
203 | -------- | |
204 | ||
ba7a46f1 JP |
205 | This sysctl is now unused. |
206 | ||
207 | This was used to control console messages from the networking stack that | |
208 | occur because of problems on the network like duplicate address or bad | |
209 | checksums. | |
210 | ||
211 | These messages are now emitted at KERN_DEBUG and can generally be enabled | |
212 | and controlled by the dynamic_debug facility. | |
760df93e SF |
213 | |
214 | netdev_budget | |
215 | ------------- | |
216 | ||
217 | Maximum number of packets taken from all interfaces in one polling cycle (NAPI | |
218 | poll). In one polling cycle interfaces which are registered to polling are | |
7acf8a1e MW |
219 | probed in a round-robin manner. Also, a polling cycle may not exceed |
220 | netdev_budget_usecs microseconds, even if netdev_budget has not been | |
221 | exhausted. | |
222 | ||
223 | netdev_budget_usecs | |
224 | --------------------- | |
225 | ||
226 | Maximum number of microseconds in one NAPI polling cycle. Polling | |
227 | will exit when either netdev_budget_usecs have elapsed during the | |
228 | poll cycle or the number of packets processed reaches netdev_budget. | |
760df93e SF |
229 | |
230 | netdev_max_backlog | |
231 | ------------------ | |
232 | ||
233 | Maximum number of packets, queued on the INPUT side, when the interface | |
234 | receives packets faster than the kernel can process them. | |
235 | ||
960fb622 ED |
236 | netdev_rss_key |
237 | -------------- | |
238 | ||
239 | RSS (Receive Side Scaling) enabled drivers use a 40-byte host key that is | |
240 | randomly generated. | |
241 | Some user space might need to gather its content even if drivers do not | |
242 | provide ethtool -x support yet. | |
243 | ||
244 | myhost:~# cat /proc/sys/net/core/netdev_rss_key | |
245 | 84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8: ... (52 bytes total) | |
246 | ||
247 | The file contains null bytes if no driver has ever called the netdev_rss_key_fill() function. | |
248 | Note: | |
249 | /proc/sys/net/core/netdev_rss_key contains 52 bytes of key, | |
250 | but most drivers only use 40 bytes of it. | |
251 | ||
252 | myhost:~# ethtool -x eth0 | |
253 | RX flow hash indirection table for eth0 with 8 RX ring(s): | |
254 | 0: 0 1 2 3 4 5 6 7 | |
255 | RSS hash key: | |
256 | 84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89 | |
257 | ||
3b098e2d ED |
258 | netdev_tstamp_prequeue |
259 | ---------------------- | |
260 | ||
261 | If set to 0, RX packet timestamps can be sampled after RPS processing, when | |
262 | the target CPU processes packets. This might delay timestamps somewhat, but | |
263 | permits the load to be distributed across several CPUs. | |
264 | ||
265 | If set to 1 (default), timestamps are sampled as soon as possible, before | |
266 | queueing. | |
267 | ||
760df93e SF |
268 | optmem_max |
269 | ---------- | |
270 | ||
271 | Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence | |
272 | of struct cmsghdr structures with appended data. | |
273 | ||
79134e6c ED |
274 | fb_tunnels_only_for_init_net |
275 | ---------------------------- | |
276 | ||
277 | Controls if fallback tunnels (like tunl0, gre0, gretap0, erspan0, | |
278 | sit0, ip6tnl0, ip6gre0) are automatically created when a new | |
279 | network namespace is created, if the corresponding tunnel is present | |
280 | in the initial network namespace. | |
281 | If set to 1, these devices are not automatically created, and | |
282 | user space is responsible for creating them if needed. | |
283 | ||
284 | Default : 0 (for compatibility reasons) | |
285 | ||
760df93e SF |
286 | 2. /proc/sys/net/unix - Parameters for Unix domain sockets |
287 | ------------------------------------------------------- | |
288 | ||
45dad7bd LX |
289 | There is only one file in this directory. |
290 | unix_dgram_qlen limits the max number of datagrams queued in a Unix domain | |
ca8b9950 | 291 | socket's buffer. It will not take effect unless the PF_UNIX flag is specified. |
760df93e SF |
292 | |
293 | ||
294 | 3. /proc/sys/net/ipv4 - IPV4 settings | |
295 | ------------------------------------------------------- | |
296 | Please see: Documentation/networking/ip-sysctl.txt and ipvs-sysctl.txt for | |
297 | descriptions of these entries. | |
298 | ||
299 | ||
300 | 4. Appletalk | |
301 | ------------------------------------------------------- | |
302 | ||
303 | The /proc/sys/net/appletalk directory holds the Appletalk configuration data | |
304 | when Appletalk is loaded. The configurable parameters are: | |
305 | ||
306 | aarp-expiry-time | |
307 | ---------------- | |
308 | ||
309 | The amount of time we keep an ARP entry before expiring it. Used to age out | |
310 | old hosts. | |
311 | ||
312 | aarp-resolve-time | |
313 | ----------------- | |
314 | ||
315 | The amount of time we will spend trying to resolve an Appletalk address. | |
316 | ||
317 | aarp-retransmit-limit | |
318 | --------------------- | |
319 | ||
320 | The number of times we will retransmit a query before giving up. | |
321 | ||
322 | aarp-tick-time | |
323 | -------------- | |
324 | ||
325 | Controls the rate at which expiries are checked. | |
326 | ||
327 | The directory /proc/net/appletalk holds the list of active Appletalk sockets | |
328 | on a machine. | |
329 | ||
330 | The fields indicate the DDP type, the local address (in network:node format), | |
331 | the remote address, the size of the transmit pending queue, the size of the | |
332 | receive queue (bytes waiting for applications to read), the state, and the uid | |
333 | owning the socket. | |
334 | ||
335 | /proc/net/atalk_iface lists all the interfaces configured for Appletalk. It | |
336 | shows the name of the interface, its Appletalk address, the network range on | |
337 | that address (or network number for phase 1 networks), and the status of the | |
338 | interface. | |
339 | ||
340 | /proc/net/atalk_route lists each known network route. It lists the target | |
341 | (network) that the route leads to, the router (may be directly connected), the | |
342 | route flags, and the device the route is using. | |
343 | ||
344 | ||
345 | 5. IPX | |
346 | ------------------------------------------------------- | |
347 | ||
348 | The IPX protocol has no tunable values in /proc/sys/net. | |
349 | ||
350 | The IPX protocol does, however, provide /proc/net/ipx. This lists each IPX | |
351 | socket giving the local and remote addresses in Novell format (that is | |
352 | network:node:port). In accordance with the strange Novell tradition, | |
353 | everything but the port is in hex. Not_Connected is displayed for sockets that | |
354 | are not tied to a specific remote address. The Tx and Rx queue sizes indicate | |
355 | the number of bytes pending for transmission and reception. The state | |
356 | indicates the state the socket is in and the uid is the owning uid of the | |
357 | socket. | |
358 | ||
359 | The /proc/net/ipx_interface file lists all IPX interfaces. For each interface | |
360 | it gives the network number, the node number, and indicates if the network is | |
361 | the primary network. It also indicates which device it is bound to (or | |
362 | Internal for internal networks) and the Frame Type if appropriate. Linux | |
363 | supports 802.3, 802.2, 802.2 SNAP and DIX (Blue Book) ethernet framing for | |
364 | IPX. | |
365 | ||
366 | The /proc/net/ipx_route table holds a list of IPX routes. For each route it | |
367 | gives the destination network, the router node (or Directly) and the network | |
368 | address of the router (or Connected) for internal networks. | |
cc79dd1b YX |
369 | |
370 | 6. TIPC | |
371 | ------------------------------------------------------- | |
372 | ||
a5325ae5 EH |
373 | tipc_rmem |
374 | ---------- | |
375 | ||
cc79dd1b YX |
376 | The TIPC protocol now has a tunable for the receive memory, similar to the |
377 | tcp_rmem - i.e. a vector of 3 INTEGERs: (min, default, max) | |
378 | ||
379 | # cat /proc/sys/net/tipc/tipc_rmem | |
380 | 4252725 34021800 68043600 | |
381 | # | |
382 | ||
383 | The max value is set to CONN_OVERLOAD_LIMIT, and the default and min values | |
384 | are scaled (shifted) versions of that same value. Note that the min value | |
385 | is not currently used in any meaningful way, but the triplet is | |
386 | preserved in order to be consistent with things like tcp_rmem. | |
a5325ae5 EH |
387 | |
388 | named_timeout | |
389 | -------------- | |
390 | ||
391 | TIPC name table updates are distributed asynchronously in a cluster, without | |
392 | any form of transaction handling. This means that different race scenarios are | |
393 | possible. One such is that a name withdrawal sent out by one node and received | |
394 | by another node may arrive after a second, overlapping name publication already | |
395 | has been accepted from a third node, although the conflicting updates | |
396 | originally may have been issued in the correct sequential order. | |
397 | If named_timeout is nonzero, failed topology updates will be placed on a defer | |
398 | queue until another event arrives that clears the error, or until the timeout | |
399 | expires. Value is in milliseconds. |
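
As an illustration of the tipc_rmem triplet described in the TIPC section, the following is a minimal Python sketch that parses the (min, default, max) vector and checks the shift relationship visible in the sample output shown there. The helper names parse_rmem and read_rmem are hypothetical and not part of any kernel interface. The shift amounts (1 and 4) are inferred from the sample values only and may differ between kernel versions.

```python
from pathlib import Path

# Path named in the TIPC section; it exists only when TIPC is loaded.
TIPC_RMEM = Path("/proc/sys/net/tipc/tipc_rmem")


def parse_rmem(text):
    """Parse a 'min default max' triplet as printed by the sysctl file."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 integers, got {len(fields)}")
    low, default, high = (int(f) for f in fields)
    return low, default, high


def read_rmem(path=TIPC_RMEM):
    """Return the triplet, or None if the file is absent (TIPC not loaded)."""
    try:
        return parse_rmem(path.read_text())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    # Sample values copied from the cat output in the TIPC section.
    low, default, high = parse_rmem("4252725 34021800 68043600")
    # Assumption drawn from the sample: default and min are right-shifted max.
    print(high >> 1 == default, high >> 4 == low)
    print(read_rmem())
```

On a machine without TIPC loaded, read_rmem() returns None rather than raising, so the sketch can be run anywhere.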