[[sysadmin_network_configuration]]
Network Configuration
---------------------
ifdef::wiki[]
:pve-toplevel:
endif::wiki[]

{pve} uses the Linux network stack. This provides a lot of flexibility in how
to set up the network on the {pve} nodes. The configuration can be done either
via the GUI, or by manually editing the file `/etc/network/interfaces`, which
contains the whole network configuration. The `interfaces(5)` manual page
contains the complete format description. All {pve} tools try hard to preserve
direct user modifications, but using the GUI is still preferable, because it
protects you from errors.

A 'vmbr' interface is needed to connect guests to the underlying physical
network. Such an interface is a Linux bridge, which can be thought of as a
virtual switch to which the guests and physical interfaces are connected.
This section provides some examples of how the network can be set up to
accommodate different use cases, like redundancy with a
xref:sysadmin_network_bond['bond'], xref:sysadmin_network_vlan['VLANs'], or
xref:sysadmin_network_routed['routed'] and
xref:sysadmin_network_masquerading['NAT'] setups.

The xref:chapter_pvesdn[Software Defined Network] is an option for more complex
virtual networks in {pve} clusters.

WARNING: It's discouraged to use the traditional Debian tools `ifup` and
`ifdown` if unsure, as they have some pitfalls, like interrupting all guest
traffic on `ifdown vmbrX`, but not reconnecting those guests again when doing
`ifup` on the same bridge later.

Apply Network Changes
~~~~~~~~~~~~~~~~~~~~~

{pve} does not write changes directly to `/etc/network/interfaces`. Instead, we
write into a temporary file called `/etc/network/interfaces.new`; this way you
can do many related changes at once. This also allows you to ensure your
changes are correct before applying them, as a wrong network configuration may
render a node inaccessible.

Live-Reload Network with ifupdown2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

With the recommended 'ifupdown2' package (default for new installations since
{pve} 7.0), it is possible to apply network configuration changes without a
reboot. If you change the network configuration via the GUI, you can click the
'Apply Configuration' button. This will move changes from the staging
`interfaces.new` file to `/etc/network/interfaces` and apply them live.

If you made manual changes directly to the `/etc/network/interfaces` file, you
can apply them by running `ifreload -a`.

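As a minimal sketch of that workflow, assuming 'ifupdown2' is installed:

----
# edit the live configuration directly ...
editor /etc/network/interfaces
# ... then reload all interfaces whose configuration changed
ifreload -a
----
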
NOTE: If you installed {pve} on top of Debian, or upgraded to {pve} 7.0 from an
older {pve} installation, make sure 'ifupdown2' is installed: `apt install
ifupdown2`

Reboot Node to Apply
^^^^^^^^^^^^^^^^^^^^

Another way to apply a new network configuration is to reboot the node.
In that case, the systemd service `pvenetcommit` will activate the staging
`interfaces.new` file before the `networking` service applies that
configuration.

Naming Conventions
~~~~~~~~~~~~~~~~~~

We currently use the following naming conventions for device names:

* Ethernet devices: en*, systemd network interface names. This naming scheme is
used for new {pve} installations since version 5.0.

* Ethernet devices: eth[N], where 0 ≤ N (`eth0`, `eth1`, ...) This naming
scheme is used for {pve} hosts which were installed before the 5.0
release. When upgrading to 5.0, the names are kept as-is.

* Bridge names: vmbr[N], where 0 ≤ N ≤ 4094 (`vmbr0` - `vmbr4094`)

* Bonds: bond[N], where 0 ≤ N (`bond0`, `bond1`, ...)

* VLANs: Simply add the VLAN number to the device name,
separated by a period (`eno1.50`, `bond1.30`)

This makes it easier to debug network problems, because the device
name implies the device type.

Systemd Network Interface Names
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Systemd uses the two-character prefix 'en' for Ethernet network
devices. The next characters depend on the device driver and on
which schema matches first:

* o<index>[n<phys_port_name>|d<dev_port>] — devices on board

* s<slot>[f<function>][n<phys_port_name>|d<dev_port>] — device by hotplug id

* [P<domain>]p<bus>s<slot>[f<function>][n<phys_port_name>|d<dev_port>] — devices by bus id

* x<MAC> — device by MAC address

The most common patterns are:

* eno1 — is the first on-board NIC

* enp3s0f1 — is the NIC on PCI bus 3, slot 0, using NIC function 1.

For more information see https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/[Predictable Network Interface Names].

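To check which names are in use on a given host, you can list the network
devices with standard iproute2 tooling, for example:

----
# show all network devices and their link state in brief format
ip -br link show
----
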
Choosing a network configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Depending on your current network organization and your resources you can
choose either a bridged, routed, or masquerading networking setup.

{pve} server in a private LAN, using an external gateway to reach the internet
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The *Bridged* model makes the most sense in this case, and this is also
the default mode on new {pve} installations.
Each of your guest systems will have a virtual interface attached to the
{pve} bridge. This is similar in effect to having the guest network card
directly connected to a new switch on your LAN, with the {pve} host playing
the role of the switch.

{pve} server at hosting provider, with public IP ranges for Guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For this setup, you can use either a *Bridged* or *Routed* model, depending on
what your provider allows.

{pve} server at hosting provider, with a single public IP address
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In that case, the only way to get outgoing network access for your guest
systems is to use *Masquerading*. For incoming network access to your guests,
you will need to configure *Port Forwarding*.

For further flexibility, you can configure
VLANs (IEEE 802.1q) and network bonding, also known as "link
aggregation". That way it is possible to build complex and flexible
virtual networks.

Default Configuration using a Bridge
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[thumbnail="default-network-setup-bridge.svg"]
Bridges are like physical network switches implemented in software.
All virtual guests can share a single bridge, or you can create multiple
bridges to separate network domains. Each host can have up to 4094 bridges.

The installation program creates a single bridge named `vmbr0`, which
is connected to the first Ethernet card. The corresponding
configuration in `/etc/network/interfaces` might look like this:

----
auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.10.2/24
        gateway 192.168.10.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
----

Virtual machines behave as if they were directly connected to the
physical network. The network, in turn, sees each virtual machine as
having its own MAC, even though there is only one network cable
connecting all of these VMs to the network.

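To verify which interfaces are attached to a bridge, you can query the kernel
with the iproute2 `bridge` tool, for example:

----
# list all bridge ports together with the bridge they belong to
bridge link show
----
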
[[sysadmin_network_routed]]
Routed Configuration
~~~~~~~~~~~~~~~~~~~~

Most hosting providers do not support the above setup. For security
reasons, they disable networking as soon as they detect multiple MAC
addresses on a single interface.

TIP: Some providers allow you to register additional MACs through their
management interface. This avoids the problem, but can be clumsy to
configure because you need to register a MAC for each of your VMs.

You can avoid the problem by ``routing'' all traffic via a single
interface. This makes sure that all network packets use the same MAC
address.

[thumbnail="default-network-setup-routed.svg"]
A common scenario is that you have a public IP (assume `198.51.100.5`
for this example), and an additional IP block for your VMs
(`203.0.113.16/28`). We recommend the following setup for such
situations:

----
auto lo
iface lo inet loopback

auto eno0
iface eno0 inet static
        address 198.51.100.5/29
        gateway 198.51.100.1
        post-up echo 1 > /proc/sys/net/ipv4/ip_forward
        post-up echo 1 > /proc/sys/net/ipv4/conf/eno0/proxy_arp


auto vmbr0
iface vmbr0 inet static
        address 203.0.113.17/28
        bridge-ports none
        bridge-stp off
        bridge-fd 0
----
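
Inside a guest, a matching configuration would then use an address from the
additional block, with the bridge address as gateway. As a purely hypothetical
guest-side sketch for the example above:

----
auto eth0
iface eth0 inet static
        address 203.0.113.18/28
        gateway 203.0.113.17
----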


[[sysadmin_network_masquerading]]
Masquerading (NAT) with `iptables`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Masquerading allows guests having only a private IP address to access the
network by using the host IP address for outgoing traffic. Each outgoing
packet is rewritten by `iptables` to appear as originating from the host,
and responses are rewritten accordingly to be routed to the original sender.

----
auto lo
iface lo inet loopback

auto eno1
#real IP address
iface eno1 inet static
        address 198.51.100.5/24
        gateway 198.51.100.1

auto vmbr0
#private sub network
iface vmbr0 inet static
        address 10.10.10.1/24
        bridge-ports none
        bridge-stp off
        bridge-fd 0

        post-up   echo 1 > /proc/sys/net/ipv4/ip_forward
        post-up   iptables -t nat -A POSTROUTING -s '10.10.10.0/24' -o eno1 -j MASQUERADE
        post-down iptables -t nat -D POSTROUTING -s '10.10.10.0/24' -o eno1 -j MASQUERADE
----
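
As mentioned earlier, incoming access to the guests requires *Port
Forwarding*. As a sketch, assuming a hypothetical guest at `10.10.10.2`
serving HTTP on port 80, 'DNAT' rules could be added to `vmbr0` in the same
way:

----
        post-up   iptables -t nat -A PREROUTING -i eno1 -p tcp --dport 8080 -j DNAT --to-destination 10.10.10.2:80
        post-down iptables -t nat -D PREROUTING -i eno1 -p tcp --dport 8080 -j DNAT --to-destination 10.10.10.2:80
----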

NOTE: In some masquerade setups with firewall enabled, conntrack zones might be
needed for outgoing connections. Otherwise the firewall could block outgoing
connections, since they will prefer the `POSTROUTING` of the VM bridge (and not
`MASQUERADE`).

Adding these lines to `/etc/network/interfaces` can fix this problem:

----
post-up   iptables -t raw -I PREROUTING -i fwbr+ -j CT --zone 1
post-down iptables -t raw -D PREROUTING -i fwbr+ -j CT --zone 1
----

For more information about this, refer to the following links:

https://commons.wikimedia.org/wiki/File:Netfilter-packet-flow.svg[Netfilter Packet Flow]

https://lwn.net/Articles/370152/[Patch on netdev-list introducing conntrack zones]

https://web.archive.org/web/20220610151210/https://blog.lobraun.de/2019/05/19/prox/[Blog post with a good explanation by using TRACE in the raw table]


[[sysadmin_network_bond]]
Linux Bond
~~~~~~~~~~

Bonding (also called NIC teaming or Link Aggregation) is a technique
for binding multiple NICs to a single network device. It makes it possible
to achieve different goals, like making the network fault-tolerant,
increasing performance, or both together.

High-speed hardware like Fibre Channel and the associated switching
hardware can be quite expensive. By doing link aggregation, two NICs
can appear as one logical interface, resulting in double speed. This
is a native Linux kernel feature that is supported by most
switches. If your nodes have multiple Ethernet ports, you can
distribute your points of failure by running network cables to
different switches and the bonded connection will failover to one
cable or the other in case of network trouble.

Aggregated links can reduce live-migration delays and improve the
speed of replication of data between Proxmox VE Cluster nodes.

There are 7 modes for bonding:

* *Round-robin (balance-rr):* Transmit network packets in sequential
order from the first available network interface (NIC) slave through
the last. This mode provides load balancing and fault tolerance.

* *Active-backup (active-backup):* Only one NIC slave in the bond is
active. A different slave becomes active if, and only if, the active
slave fails. The single logical bonded interface's MAC address is
externally visible on only one NIC (port) to avoid distortion in the
network switch. This mode provides fault tolerance.

* *XOR (balance-xor):* Transmit network packets based on [(source MAC
address XOR'd with destination MAC address) modulo NIC slave
count]. This selects the same NIC slave for each destination MAC
address. This mode provides load balancing and fault tolerance.

* *Broadcast (broadcast):* Transmit network packets on all slave
network interfaces. This mode provides fault tolerance.

* *IEEE 802.3ad Dynamic link aggregation (802.3ad)(LACP):* Creates
aggregation groups that share the same speed and duplex
settings. Utilizes all slave network interfaces in the active
aggregator group according to the 802.3ad specification.

* *Adaptive transmit load balancing (balance-tlb):* Linux bonding
driver mode that does not require any special network-switch
support. The outgoing network packet traffic is distributed according
to the current load (computed relative to the speed) on each network
interface slave. Incoming traffic is received by one currently
designated slave network interface. If this receiving slave fails,
another slave takes over the MAC address of the failed receiving
slave.

* *Adaptive load balancing (balance-alb):* Includes balance-tlb plus receive
load balancing (rlb) for IPV4 traffic, and does not require any
special network switch support. The receive load balancing is achieved
by ARP negotiation. The bonding driver intercepts the ARP Replies sent
by the local system on their way out and overwrites the source
hardware address with the unique hardware address of one of the NIC
slaves in the single logical bonded interface such that different
network-peers use different MAC addresses for their network packet
traffic.

If your switch supports the LACP (IEEE 802.3ad) protocol, then we recommend
using the corresponding bonding mode (802.3ad). Otherwise you should generally
use the active-backup mode. +
// http://lists.linux-ha.org/pipermail/linux-ha/2013-January/046295.html
If you intend to run your cluster network on the bonding interfaces, then you
have to use active-backup (active-passive) mode on the bonding interfaces;
other modes are unsupported.

The following bond configuration can be used as a distributed/shared
storage network. The benefit is that you get more speed and the
network will be fault-tolerant.

.Example: Use bond with fixed IP address
----
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

iface eno3 inet manual

auto bond0
iface bond0 inet static
        bond-slaves eno1 eno2
        address 192.168.1.2/24
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3

auto vmbr0
iface vmbr0 inet static
        address 10.10.10.2/24
        gateway 10.10.10.1
        bridge-ports eno3
        bridge-stp off
        bridge-fd 0

----


[thumbnail="default-network-setup-bond.svg"]
Another possibility is to use the bond directly as the bridge port.
This can be used to make the guest network fault-tolerant.

.Example: Use a bond as bridge port
----
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3

auto vmbr0
iface vmbr0 inet static
        address 10.10.10.2/24
        gateway 10.10.10.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0

----
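
After applying such a configuration, the kernel's view of the bond can be
checked to verify the bonding mode and the state of each slave, for example:

----
# show bonding mode, link states and LACP details for bond0
cat /proc/net/bonding/bond0
----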


[[sysadmin_network_vlan]]
VLAN 802.1Q
~~~~~~~~~~~

A virtual LAN (VLAN) is a broadcast domain that is partitioned and
isolated in the network at layer two. So it is possible to have
multiple networks (up to 4096) in a physical network, each independent
of the other ones.

Each VLAN network is identified by a number, often called a 'tag'.
Network packets are then 'tagged' to identify which virtual network
they belong to.


VLAN for Guest Networks
^^^^^^^^^^^^^^^^^^^^^^^

{pve} supports this setup out of the box. You can specify the VLAN tag
when you create a VM (a CLI sketch follows the list below). The VLAN tag
is part of the guest network configuration. The networking layer supports
different modes to implement VLANs, depending on the bridge configuration:

* *VLAN awareness on the Linux bridge:*
In this case, each guest's virtual network card is assigned to a VLAN tag,
which is transparently supported by the Linux bridge.
Trunk mode is also possible, but that makes configuration
in the guest necessary.

* *"traditional" VLAN on the Linux bridge:*
In contrast to the VLAN awareness method, this method is not transparent
and creates a VLAN device with associated bridge for each VLAN.
That is, creating a guest on VLAN 5 for example, would create two
interfaces eno1.5 and vmbr0v5, which would remain until a reboot occurs.

* *Open vSwitch VLAN:*
This mode uses the OVS VLAN feature.

* *Guest configured VLAN:*
VLANs are assigned inside the guest. In this case, the setup is
completely done inside the guest and can not be influenced from the
outside. The benefit is that you can use more than one VLAN on a
single virtual NIC.

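As referenced above, a sketch of setting the VLAN tag for an existing VM on
the command line, assuming a hypothetical VM with ID 100 that should use
VLAN 5 on `vmbr0`:

----
# attach the VM's first network device to vmbr0 with VLAN tag 5
qm set 100 -net0 virtio,bridge=vmbr0,tag=5
----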

VLAN on the Host
^^^^^^^^^^^^^^^^

To allow host communication with an isolated network, it is possible
to apply VLAN tags to any network device (NIC, bond, bridge). In
general, you should configure the VLAN on the interface with the least
abstraction layers between itself and the physical NIC.

For example, in a default configuration, you may want to place
the host management address on a separate VLAN.


.Example: Use VLAN 5 for the {pve} management IP with traditional Linux bridge
----
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno1.5 inet manual

auto vmbr0v5
iface vmbr0v5 inet static
        address 10.10.10.2/24
        gateway 10.10.10.1
        bridge-ports eno1.5
        bridge-stp off
        bridge-fd 0

auto vmbr0
iface vmbr0 inet manual
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0

----

.Example: Use VLAN 5 for the {pve} management IP with VLAN aware Linux bridge
----
auto lo
iface lo inet loopback

iface eno1 inet manual


auto vmbr0.5
iface vmbr0.5 inet static
        address 10.10.10.2/24
        gateway 10.10.10.1

auto vmbr0
iface vmbr0 inet manual
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
----

The next example is the same setup, but a bond is used to
make this network fail-safe.

.Example: Use VLAN 5 with bond0 for the {pve} management IP with traditional Linux bridge
----
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3

iface bond0.5 inet manual

auto vmbr0v5
iface vmbr0v5 inet static
        address 10.10.10.2/24
        gateway 10.10.10.1
        bridge-ports bond0.5
        bridge-stp off
        bridge-fd 0

auto vmbr0
iface vmbr0 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0

----

Disabling IPv6 on the Node
~~~~~~~~~~~~~~~~~~~~~~~~~~

{pve} works correctly in all environments, irrespective of whether IPv6 is
deployed or not. We recommend leaving all settings at the provided defaults.

Should you still need to disable support for IPv6 on your node, do so by
creating an appropriate `sysctl.conf (5)` snippet file and setting the proper
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt[sysctls],
for example adding `/etc/sysctl.d/disable-ipv6.conf` with content:

----
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
----
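
Settings in `/etc/sysctl.d/` are applied at boot. To load the new snippet
immediately, without a reboot, you can run, for example:

----
# re-apply all sysctl configuration files, including the new snippet
sysctl --system
----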

This method is preferred to disabling the loading of the IPv6 module on the
https://www.kernel.org/doc/Documentation/networking/ipv6.rst[kernel commandline].


Disabling MAC Learning on a Bridge
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, MAC learning is enabled on a bridge to ensure a smooth experience
with virtual guests and their networks.

But in some environments this can be undesirable. Since {pve} 7.3 you can
disable MAC learning on the bridge by setting the
`bridge-disable-mac-learning 1` configuration on a bridge in
`/etc/network/interfaces`, for example:

----
# ...

auto vmbr0
iface vmbr0 inet static
        address 10.10.10.2/24
        gateway 10.10.10.1
        bridge-ports ens18
        bridge-stp off
        bridge-fd 0
        bridge-disable-mac-learning 1
----

Once enabled, {pve} will manually add the configured MAC address from VMs and
Containers to the bridge's forwarding database, to ensure that guests can still
use the network - but only when they are using their actual MAC address.
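
To check which entries are currently present in the forwarding database of a
bridge, you can again use the iproute2 `bridge` tool, for example:

----
# list the forwarding database entries of vmbr0
bridge fdb show br vmbr0
----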

////
TODO: explain IPv6 support?
TODO: explain OVS
////