[[sysadmin_network_configuration]]
Network Configuration
---------------------
ifdef::wiki[]
:pve-toplevel:
endif::wiki[]

{pve} uses the Linux network stack. This provides a lot of flexibility in how
to set up the network on the {pve} nodes. The configuration can be done either
via the GUI, or by manually editing the file `/etc/network/interfaces`, which
contains the whole network configuration. The `interfaces(5)` manual page
contains the complete format description. All {pve} tools try hard to preserve
direct user modifications, but using the GUI is still preferable, because it
protects you from errors.

A 'vmbr' interface is needed to connect guests to the underlying physical
network. It is a Linux bridge, which can be thought of as a virtual switch
to which the guests and physical interfaces are connected. This section
provides some examples on how the network can be set up to accommodate
different use cases, like redundancy with a xref:sysadmin_network_bond['bond'],
xref:sysadmin_network_vlan['vlans'] or
xref:sysadmin_network_routed['routed'] and
xref:sysadmin_network_masquerading['NAT'] setups.

The xref:chapter_pvesdn[Software Defined Network] is an option for more complex
virtual networks in {pve} clusters.

WARNING: It's discouraged to use the traditional Debian tools `ifup` and
`ifdown` if you are unsure, as they have pitfalls like interrupting all guest
traffic on `ifdown vmbrX`, but not reconnecting those guests again when doing
`ifup` on the same bridge later.

Apply Network Changes
~~~~~~~~~~~~~~~~~~~~~

{pve} does not write changes directly to `/etc/network/interfaces`. Instead,
we write into a temporary file called `/etc/network/interfaces.new`; this way
you can do many related changes at once. This also allows you to ensure your
changes are correct before applying them, as a wrong network configuration may
render a node inaccessible.
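
For example, you can review the staged changes before applying them (via the
methods described in the following subsections) by diffing the pending file
against the live configuration:

----
diff -u /etc/network/interfaces /etc/network/interfaces.new
----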

Reboot Node to apply
^^^^^^^^^^^^^^^^^^^^

One way to apply a new network configuration is to reboot the node.

Reload Network with ifupdown2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

With the 'ifupdown2' package (default since {pve} 7), it is possible to apply
network configuration changes without a reboot. If you change the network
configuration via the GUI, you can click the 'Apply Configuration' button. Run
the following command if you make changes directly to the
`/etc/network/interfaces` file:

----
ifreload -a
----

NOTE: If you installed {pve} on top of Debian, make sure 'ifupdown2' is
installed: 'apt install ifupdown2'

Naming Conventions
~~~~~~~~~~~~~~~~~~

We currently use the following naming conventions for device names:

* Ethernet devices: en*, systemd network interface names. This naming scheme is
used for new {pve} installations since version 5.0.

* Ethernet devices: eth[N], where 0 ≤ N (`eth0`, `eth1`, ...) This naming
scheme is used for {pve} hosts which were installed before the 5.0
release. When upgrading to 5.0, the names are kept as-is.

* Bridge names: vmbr[N], where 0 ≤ N ≤ 4094 (`vmbr0` - `vmbr4094`)

* Bonds: bond[N], where 0 ≤ N (`bond0`, `bond1`, ...)

* VLANs: Simply add the VLAN number to the device name,
separated by a period (`eno1.50`, `bond1.30`)

This makes it easier to debug network problems, because the device
name implies the device type.
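
To see which device names are currently in use on a node, you can list all
network interfaces in brief form, for example:

----
ip -br link
----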

Systemd Network Interface Names
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Systemd uses the two-character prefix 'en' for Ethernet network
devices. The next characters depend on the device driver and on which
schema matches first.

* o<index>[n<phys_port_name>|d<dev_port>] — devices on board

* s<slot>[f<function>][n<phys_port_name>|d<dev_port>] — device by hotplug id

* [P<domain>]p<bus>s<slot>[f<function>][n<phys_port_name>|d<dev_port>] — devices by bus id

* x<MAC> — device by MAC address

The most common patterns are:

* eno1 — the first on-board NIC

* enp3s0f1 — the NIC on PCI bus 3, slot 0, using NIC function 1

For more information see https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/[Predictable Network Interface Names].
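
If you need a name that stays stable across hardware or driver changes, one
option is to pin it to the NIC's MAC address with a systemd link file. The
following is a sketch; the file name, the MAC address, and the interface name
`enwan0` are placeholders you need to adapt:

----
# /etc/systemd/network/10-enwan0.link
[Match]
MACAddress=aa:bb:cc:dd:ee:ff

[Link]
Name=enwan0
----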

Choosing a network configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Depending on your current network organization and your resources, you can
choose either a bridged, routed, or masquerading networking setup.

{pve} server in a private LAN, using an external gateway to reach the internet
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The *Bridged* model makes the most sense in this case, and this is also
the default mode on new {pve} installations.
Each of your guest systems will have a virtual interface attached to the
{pve} bridge. This is similar in effect to having the guest network card
directly connected to a new switch on your LAN, with the {pve} host playing
the role of the switch.

{pve} server at hosting provider, with public IP ranges for Guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For this setup, you can use either a *Bridged* or *Routed* model, depending on
what your provider allows.

{pve} server at hosting provider, with a single public IP address
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In that case, the only way to get outgoing network access for your guest
systems is to use *Masquerading*. For incoming network access to your guests,
you will need to configure *Port Forwarding*.
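
A port forward is typically implemented as a DNAT rule. As a sketch, assuming
a masquerading setup like the one shown later (guests on `10.10.10.0/24`
behind `vmbr0`, uplink `eno1`), forwarding TCP port 2222 on the host to SSH on
a guest at `10.10.10.10` could look like this:

----
iptables -t nat -A PREROUTING -i eno1 -p tcp --dport 2222 -j DNAT --to-destination 10.10.10.10:22
----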

For further flexibility, you can configure
VLANs (IEEE 802.1q) and network bonding, also known as "link
aggregation". That way it is possible to build complex and flexible
virtual networks.

Default Configuration using a Bridge
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[thumbnail="default-network-setup-bridge.svg"]
Bridges are like physical network switches implemented in software.
All virtual guests can share a single bridge, or you can create multiple
bridges to separate network domains. Each host can have up to 4094 bridges.

The installation program creates a single bridge named `vmbr0`, which
is connected to the first Ethernet card. The corresponding
configuration in `/etc/network/interfaces` might look like this:

----
auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.10.2/24
        gateway 192.168.10.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
----

Virtual machines behave as if they were directly connected to the
physical network. The network, in turn, sees each virtual machine as
having its own MAC, even though there is only one network cable
connecting all of these VMs to the network.

[[sysadmin_network_routed]]
Routed Configuration
~~~~~~~~~~~~~~~~~~~~

Most hosting providers do not support the above setup. For security
reasons, they disable networking as soon as they detect multiple MAC
addresses on a single interface.

TIP: Some providers allow you to register additional MACs through their
management interface. This avoids the problem, but can be clumsy to
configure because you need to register a MAC for each of your VMs.

You can avoid the problem by ``routing'' all traffic via a single
interface. This makes sure that all network packets use the same MAC
address.

[thumbnail="default-network-setup-routed.svg"]
A common scenario is that you have a public IP (assume `198.51.100.5`
for this example), and an additional IP block for your VMs
(`203.0.113.16/28`). We recommend the following setup for such
situations:

----
auto lo
iface lo inet loopback

auto eno0
iface eno0 inet static
        address 198.51.100.5/29
        gateway 198.51.100.1
        post-up echo 1 > /proc/sys/net/ipv4/ip_forward
        post-up echo 1 > /proc/sys/net/ipv4/conf/eno0/proxy_arp

auto vmbr0
iface vmbr0 inet static
        address 203.0.113.17/28
        bridge-ports none
        bridge-stp off
        bridge-fd 0
----


[[sysadmin_network_masquerading]]
Masquerading (NAT) with `iptables`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Masquerading allows guests that only have a private IP address to access the
network by using the host IP address for outgoing traffic. Each outgoing
packet is rewritten by `iptables` to appear as originating from the host,
and responses are rewritten accordingly to be routed to the original sender.

----
auto lo
iface lo inet loopback

auto eno1
#real IP address
iface eno1 inet static
        address 198.51.100.5/24
        gateway 198.51.100.1

auto vmbr0
#private sub network
iface vmbr0 inet static
        address 10.10.10.1/24
        bridge-ports none
        bridge-stp off
        bridge-fd 0

        post-up   echo 1 > /proc/sys/net/ipv4/ip_forward
        post-up   iptables -t nat -A POSTROUTING -s '10.10.10.0/24' -o eno1 -j MASQUERADE
        post-down iptables -t nat -D POSTROUTING -s '10.10.10.0/24' -o eno1 -j MASQUERADE
----

NOTE: In some masquerade setups with the firewall enabled, conntrack zones
might be needed for outgoing connections. Otherwise, the firewall could block
outgoing connections, since they will prefer the `POSTROUTING` of the VM
bridge (and not `MASQUERADE`).

Adding these lines to `/etc/network/interfaces` can fix this problem:

----
post-up   iptables -t raw -I PREROUTING -i fwbr+ -j CT --zone 1
post-down iptables -t raw -D PREROUTING -i fwbr+ -j CT --zone 1
----

For more information about this, refer to the following links:

https://commons.wikimedia.org/wiki/File:Netfilter-packet-flow.svg[Netfilter Packet Flow]

https://lwn.net/Articles/370152/[Patch on netdev-list introducing conntrack zones]

https://blog.lobraun.de/2019/05/19/prox/[Blog post with a good explanation by using TRACE in the raw table]


[[sysadmin_network_bond]]
Linux Bond
~~~~~~~~~~

Bonding (also called NIC teaming or Link Aggregation) is a technique
for binding multiple NICs to a single network device. It makes it
possible to achieve different goals, like making the network
fault-tolerant, increasing performance, or both together.

High-speed hardware like Fibre Channel and the associated switching
hardware can be quite expensive. By doing link aggregation, two NICs
can appear as one logical interface, resulting in double speed. This
is a native Linux kernel feature that is supported by most
switches. If your nodes have multiple Ethernet ports, you can
distribute your points of failure by running network cables to
different switches, and the bonded connection will fail over to one
cable or the other in case of network trouble.

Aggregated links can reduce live-migration delays and improve the
speed of data replication between Proxmox VE cluster nodes.

There are 7 modes for bonding:

* *Round-robin (balance-rr):* Transmit network packets in sequential
order from the first available network interface (NIC) slave through
the last. This mode provides load balancing and fault tolerance.

* *Active-backup (active-backup):* Only one NIC slave in the bond is
active. A different slave becomes active if, and only if, the active
slave fails. The single logical bonded interface's MAC address is
externally visible on only one NIC (port) to avoid distortion in the
network switch. This mode provides fault tolerance.

* *XOR (balance-xor):* Transmit network packets based on [(source MAC
address XOR'd with destination MAC address) modulo NIC slave
count]. This selects the same NIC slave for each destination MAC
address. This mode provides load balancing and fault tolerance.

* *Broadcast (broadcast):* Transmit network packets on all slave
network interfaces. This mode provides fault tolerance.

* *IEEE 802.3ad Dynamic link aggregation (802.3ad, LACP):* Creates
aggregation groups that share the same speed and duplex
settings. Utilizes all slave network interfaces in the active
aggregator group according to the 802.3ad specification.

* *Adaptive transmit load balancing (balance-tlb):* Linux bonding
driver mode that does not require any special network-switch
support. The outgoing network packet traffic is distributed according
to the current load (computed relative to the speed) on each network
interface slave. Incoming traffic is received by one currently
designated slave network interface. If this receiving slave fails,
another slave takes over the MAC address of the failed receiving
slave.

* *Adaptive load balancing (balance-alb):* Includes balance-tlb plus receive
load balancing (rlb) for IPv4 traffic, and does not require any
special network switch support. The receive load balancing is achieved
by ARP negotiation. The bonding driver intercepts the ARP Replies sent
by the local system on their way out and overwrites the source
hardware address with the unique hardware address of one of the NIC
slaves in the single logical bonded interface, such that different
network peers use different MAC addresses for their network packet
traffic.

If your switch supports the LACP (IEEE 802.3ad) protocol, then we recommend
using the corresponding bonding mode (802.3ad). Otherwise you should generally
use the active-backup mode. +
// http://lists.linux-ha.org/pipermail/linux-ha/2013-January/046295.html
If you intend to run your cluster network on the bonding interfaces, then you
have to use active-backup mode on the bonding interfaces; other modes are
unsupported.

The following bond configuration can be used as a distributed/shared
storage network. The benefit is that you get more speed and the
network will be fault-tolerant.

.Example: Use bond with fixed IP address
----
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

iface eno3 inet manual

auto bond0
iface bond0 inet static
        bond-slaves eno1 eno2
        address 192.168.1.2/24
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3

auto vmbr0
iface vmbr0 inet static
        address 10.10.10.2/24
        gateway 10.10.10.1
        bridge-ports eno3
        bridge-stp off
        bridge-fd 0
----


[thumbnail="default-network-setup-bond.svg"]
Another possibility is to use the bond directly as the bridge port.
This can be used to make the guest network fault-tolerant.

.Example: Use a bond as bridge port
----
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3

auto vmbr0
iface vmbr0 inet static
        address 10.10.10.2/24
        gateway 10.10.10.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
----
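
To verify that a bond actually came up in the intended mode with all of its
slaves, you can inspect the kernel's bonding status file at runtime, for
example:

----
cat /proc/net/bonding/bond0
----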


[[sysadmin_network_vlan]]
VLAN 802.1Q
~~~~~~~~~~~

A virtual LAN (VLAN) is a broadcast domain that is partitioned and
isolated in the network at layer two. So it is possible to have
multiple networks (up to 4096) in a physical network, each independent
of the others.

Each VLAN network is identified by a number, often called a 'tag'.
Network packets are then 'tagged' to identify which virtual network
they belong to.


VLAN for Guest Networks
^^^^^^^^^^^^^^^^^^^^^^^

{pve} supports this setup out of the box. You can specify the VLAN tag
when you create a VM; the VLAN tag is part of the guest network
configuration (see the example after this list). The networking layer
supports different modes to implement VLANs, depending on the bridge
configuration:

* *VLAN awareness on the Linux bridge:*
In this case, each guest's virtual network card is assigned to a VLAN tag,
which is transparently supported by the Linux bridge.
Trunk mode is also possible, but that makes configuration
in the guest necessary.

* *"traditional" VLAN on the Linux bridge:*
In contrast to the VLAN awareness method, this method is not transparent
and creates a VLAN device with an associated bridge for each VLAN.
That is, creating a guest on VLAN 5, for example, would create two
interfaces eno1.5 and vmbr0v5, which would remain until a reboot occurs.

* *Open vSwitch VLAN:*
This mode uses the OVS VLAN feature.

* *Guest configured VLAN:*
VLANs are assigned inside the guest. In this case, the setup is
completely done inside the guest and cannot be influenced from the
outside. The benefit is that you can use more than one VLAN on a
single virtual NIC.
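
For example, with a VLAN-aware bridge, the tag is simply set on the guest's
virtual NIC. For a VM with the hypothetical ID 100, assigning its first
network device to VLAN 5 on `vmbr0` could look like this:

----
qm set 100 --net0 virtio,bridge=vmbr0,tag=5
----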


VLAN on the Host
^^^^^^^^^^^^^^^^

To allow host communication with an isolated network, it is possible
to apply VLAN tags to any network device (NIC, bond, bridge). In
general, you should configure the VLAN on the interface with the least
abstraction layers between itself and the physical NIC.

For example, in a default configuration, you may want to place
the host management address on a separate VLAN.


.Example: Use VLAN 5 for the {pve} management IP with traditional Linux bridge
----
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno1.5 inet manual

auto vmbr0v5
iface vmbr0v5 inet static
        address 10.10.10.2/24
        gateway 10.10.10.1
        bridge-ports eno1.5
        bridge-stp off
        bridge-fd 0

auto vmbr0
iface vmbr0 inet manual
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
----

.Example: Use VLAN 5 for the {pve} management IP with VLAN aware Linux bridge
----
auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0.5
iface vmbr0.5 inet static
        address 10.10.10.2/24
        gateway 10.10.10.1

auto vmbr0
iface vmbr0 inet manual
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
----

The next example is the same setup, but it uses a bond to
make this network fail-safe.

.Example: Use VLAN 5 with bond0 for the {pve} management IP with traditional Linux bridge
----
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3

iface bond0.5 inet manual

auto vmbr0v5
iface vmbr0v5 inet static
        address 10.10.10.2/24
        gateway 10.10.10.1
        bridge-ports bond0.5
        bridge-stp off
        bridge-fd 0

auto vmbr0
iface vmbr0 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
----

Disabling IPv6 on the Node
~~~~~~~~~~~~~~~~~~~~~~~~~~

{pve} works correctly in all environments, irrespective of whether IPv6 is
deployed or not. We recommend leaving all settings at the provided defaults.

Should you still need to disable support for IPv6 on your node, do so by
creating an appropriate `sysctl.conf(5)` snippet file and setting the proper
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt[sysctls],
for example adding `/etc/sysctl.d/disable-ipv6.conf` with content:

----
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
----
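
The settings take effect at the next boot; to apply them immediately, you can,
for example, load the snippet with `sysctl -p /etc/sysctl.d/disable-ipv6.conf`.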

This method is preferred to disabling the loading of the IPv6 module on the
https://www.kernel.org/doc/Documentation/networking/ipv6.rst[kernel command line].

////
TODO: explain IPv6 support?
TODO: explain OVS
////