:pve-toplevel:
endif::wiki[]
-Network configuration can be done either via the GUI, or by manually
-editing the file `/etc/network/interfaces`, which contains the
-whole network configuration. The `interfaces(5)` manual page contains the
-complete format description. All {pve} tools try hard to keep direct
-user modifications, but using the GUI is still preferable, because it
+{pve} uses the Linux network stack. This provides a lot of flexibility for
+setting up the network on the {pve} nodes. The configuration can be done
+either via the GUI, or by manually editing the file `/etc/network/interfaces`,
+which contains the whole network configuration. The `interfaces(5)` manual
+page contains the complete format description. All {pve} tools try hard to
+preserve direct user modifications, but using the GUI is still preferable,
+because it
protects you from errors.
-Once the network is configured, you can use the Debian traditional tools `ifup`
-and `ifdown` commands to bring interfaces up and down.
+A 'vmbr' interface is needed to connect guests to the underlying physical
+network. These are Linux bridges, which can be thought of as virtual switches
+to which the guests and physical interfaces are connected. This section
+provides some examples on how the network can be set up to accommodate
+different use cases, like redundancy with a xref:sysadmin_network_bond['bond'],
+xref:sysadmin_network_vlan['vlans'] or
+xref:sysadmin_network_routed['routed'] and
+xref:sysadmin_network_masquerading['NAT'] setups.
+
+The xref:chapter_pvesdn[Software Defined Network] is an option for more complex
+virtual networks in {pve} clusters.
+
+WARNING: It's discouraged to use the traditional Debian tools `ifup` and
+`ifdown` if unsure, as they have some pitfalls, like interrupting all guest
+traffic on `ifdown vmbrX`, but not reconnecting those guests again when doing
+`ifup` on the same bridge later.
Apply Network Changes
~~~~~~~~~~~~~~~~~~~~~
are correct before applying, as a wrong network configuration may render a node
inaccessible.
-Reboot Node to apply
-^^^^^^^^^^^^^^^^^^^^
-
-With the default installed `ifupdown` network managing package you need to
-reboot to commit any pending network changes. Most of the time, the basic {pve}
-network setup is stable and does not change often, so rebooting should not be
-required often.
-
-Reload Network with ifupdown2
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-With the optional `ifupdown2` network managing package you also can reload the
-network configuration live, without requiring a reboot.
-
-Since {pve} 6.1 you can apply pending network changes over the web-interface,
-using the 'Apply Configuration' button in the 'Network' panel of a node.
+Live-Reload Network with ifupdown2
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-To install 'ifupdown2' ensure you have the latest {pve} updates installed, then
+With the recommended 'ifupdown2' package (default for new installations since
+{pve} 7.0), it is possible to apply network configuration changes without a
+reboot. If you change the network configuration via the GUI, you can click the
+'Apply Configuration' button. This will move changes from the staging
+`interfaces.new` file to `/etc/network/interfaces` and apply them live.
-WARNING: installing 'ifupdown2' will remove 'ifupdown', but as the removal
-scripts of 'ifupdown' before version '0.8.35+pve1' have a issue where network
-is fully stopped on removal footnote:[Introduced with Debian Buster:
-https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=945877] you *must* ensure
-that you have a up to date 'ifupdown' package version.
+If you made manual changes directly to the `/etc/network/interfaces` file, you
+can apply them by running `ifreload -a`.
-For the installation itself you can then simply do:
+NOTE: If you installed {pve} on top of Debian, or upgraded to {pve} 7.0 from an
+older {pve} installation, make sure 'ifupdown2' is installed: `apt install
+ifupdown2`
- apt install ifupdown2
+Reboot Node to Apply
+^^^^^^^^^^^^^^^^^^^^
-With that you're all set. You can also switch back to the 'ifupdown' variant at
-any time, if you run into issues.
+Another way to apply a new network configuration is to reboot the node.
+In that case, the systemd service `pvenetcommit` will activate the staging
+`interfaces.new` file before the `networking` service applies the
+configuration.
Naming Conventions
~~~~~~~~~~~~~~~~~~
We currently use the following naming conventions for device names:
-* Ethernet devices: en*, systemd network interface names. This naming scheme is
+* Ethernet devices: `en*`, systemd network interface names. This naming scheme is
used for new {pve} installations since version 5.0.
-* Ethernet devices: eth[N], where 0 ≤ N (`eth0`, `eth1`, ...) This naming
+* Ethernet devices: `eth[N]`, where 0 ≤ N (`eth0`, `eth1`, ...) This naming
scheme is used for {pve} hosts which were installed before the 5.0
release. When upgrading to 5.0, the names are kept as-is.
-* Bridge names: vmbr[N], where 0 ≤ N ≤ 4094 (`vmbr0` - `vmbr4094`)
+* Bridge names: `vmbr[N]`, where 0 ≤ N ≤ 4094 (`vmbr0` - `vmbr4094`)
-* Bonds: bond[N], where 0 ≤ N (`bond0`, `bond1`, ...)
+* Bonds: `bond[N]`, where 0 ≤ N (`bond0`, `bond1`, ...)
* VLANs: Simply add the VLAN number to the device name,
separated by a period (`eno1.50`, `bond1.30`)
This makes it easier to debug network problems, because the device
name implies the device type.
+[[systemd_network_interface_names]]
Systemd Network Interface Names
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Systemd uses the two character prefix 'en' for Ethernet network
-devices. The next characters depends on the device driver and the fact
-which schema matches first.
+Systemd defines a versioned naming scheme for network device names. The
+scheme uses the two-character prefix `en` for Ethernet network devices. The
+next characters depend on the device driver, the device location and other
+attributes. Some possible patterns are:
+
+* `o<index>[n<phys_port_name>|d<dev_port>]` — devices on board
+
+* `s<slot>[f<function>][n<phys_port_name>|d<dev_port>]` — devices by hotplug id
+
+* `[P<domain>]p<bus>s<slot>[f<function>][n<phys_port_name>|d<dev_port>]` —
+devices by bus id
+
+* `x<MAC>` — devices by MAC address
+
+Some examples for the most common patterns are:
+
+* `eno1` — the first on-board NIC
+
+* `enp3s0f1` — function 1 of the NIC on PCI bus 3, slot 0
+
+For a full list of possible device name patterns, see the
+https://manpages.debian.org/stable/systemd/systemd.net-naming-scheme.7.en.html[
+systemd.net-naming-scheme(7) manpage].
+
+A new version of systemd may define a new version of the network device naming
+scheme, which it then uses by default. Consequently, updating to a newer
+systemd version, for example during a major {pve} upgrade, can change the names
+of network devices and require adjusting the network configuration. To avoid
+name changes due to a new version of the naming scheme, you can manually pin a
+particular naming scheme version (see
+xref:network_pin_naming_scheme_version[below]).
+
+However, even with a pinned naming scheme version, network device names can
+still change due to kernel or driver updates. In order to avoid name changes
+for a particular network device altogether, you can manually override its name
+using a link file (see xref:network_override_device_names[below]).
+
+For more information on network interface names, see
+https://systemd.io/PREDICTABLE_INTERFACE_NAMES/[Predictable Network Interface
+Names].
+
+[[network_pin_naming_scheme_version]]
+Pinning a specific naming scheme version
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+You can pin a specific version of the naming scheme for network devices by
+adding the `net.naming-scheme=<version>` parameter to the
+xref:sysboot_edit_kernel_cmdline[kernel command line]. For a list of naming
+scheme versions, see the
+https://manpages.debian.org/stable/systemd/systemd.net-naming-scheme.7.en.html[
+systemd.net-naming-scheme(7) manpage].
+
+For example, to pin the version `v252`, which is the latest naming scheme
+version for a fresh {pve} 8.0 installation, add the following kernel
+command-line parameter:
+
+----
+net.naming-scheme=v252
+----
+
+See also xref:sysboot_edit_kernel_cmdline[this section] on editing the kernel
+command line. You need to reboot for the changes to take effect.
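+
+As a sketch, on a node that boots via GRUB, this could be done by appending
+the parameter to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub` and then
+running `update-grub` (setups booting via `systemd-boot` edit
+`/etc/kernel/cmdline` instead, as described in the referenced section):
+
+----
+# /etc/default/grub (excerpt)
+GRUB_CMDLINE_LINUX_DEFAULT="quiet net.naming-scheme=v252"
+----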
+
+[[network_override_device_names]]
+Overriding network device names
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+You can manually assign a name to a particular network device using a custom
+https://manpages.debian.org/stable/udev/systemd.link.5.en.html[systemd.link
+file]. This overrides the name that would be assigned according to the latest
+network device naming scheme. This way, you can avoid naming changes due to
+kernel updates, driver updates or newer versions of the naming scheme.
-* o<index>[n<phys_port_name>|d<dev_port>] — devices on board
+Custom link files should be placed in `/etc/systemd/network/` and named
+`<n>-<id>.link`, where `n` is a priority smaller than `99` and `id` is some
+identifier. A link file has two sections: `[Match]` determines which interfaces
+the file will apply to; `[Link]` determines how these interfaces should be
+configured, including their naming.
-* s<slot>[f<function>][n<phys_port_name>|d<dev_port>] — device by hotplug id
+To assign a name to a particular network device, you need a way to uniquely and
+permanently identify that device in the `[Match]` section. One possibility is
+to match the device's MAC address using the `MACAddress` option, as it is
+unlikely to change. Then, you can assign a name using the `Name` option in the
+`[Link]` section.
-* [P<domain>]p<bus>s<slot>[f<function>][n<phys_port_name>|d<dev_port>] — devices by bus id
+For example, to assign the name `enwan0` to the device with MAC address
+`aa:bb:cc:dd:ee:ff`, create a file `/etc/systemd/network/10-enwan0.link` with
+the following contents:
-* x<MAC> — device by MAC address
+----
+[Match]
+MACAddress=aa:bb:cc:dd:ee:ff
-The most common patterns are:
+[Link]
+Name=enwan0
+----
-* eno1 — is the first on board NIC
+Do not forget to adjust `/etc/network/interfaces` to use the new name.
+You need to reboot the node for the change to take effect.
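+
+Continuing the example above, the bridge definition in
+`/etc/network/interfaces` would then reference the new name. This is only a
+sketch; the bridge name and addresses are illustrative:
+
+----
+auto vmbr0
+iface vmbr0 inet static
+        address 192.0.2.10/24
+        gateway 192.0.2.1
+        bridge-ports enwan0
+        bridge-stp off
+        bridge-fd 0
+----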
-* enp3s0f1 — is the NIC on pcibus 3 slot 0 and use the NIC function 1.
+NOTE: It is recommended to assign a name starting with `en` or `eth` so that
+{pve} recognizes the interface as a physical network device which can then be
+configured via the GUI. Also, you should ensure that the name will not clash
+with other interface names in the future. One possibility is to assign a name
+that does not match any name pattern that systemd uses for network interfaces
+(xref:systemd_network_interface_names[see above]), such as `enwan0` in the
+example above.
-For more information see https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/[Predictable Network Interface Names].
+For more information on link files, see the
+https://manpages.debian.org/stable/udev/systemd.link.5.en.html[systemd.link(5)
+manpage].
Choosing a network configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
auto vmbr0
iface vmbr0 inet static
- address 192.168.10.2
- netmask 255.255.255.0
+ address 192.168.10.2/24
gateway 192.168.10.1
bridge-ports eno1
bridge-stp off
having its own MAC, even though there is only one network cable
connecting all of these VMs to the network.
+[[sysadmin_network_routed]]
Routed Configuration
~~~~~~~~~~~~~~~~~~~~
[thumbnail="default-network-setup-routed.svg"]
A common scenario is that you have a public IP (assume `198.51.100.5`
for this example), and an additional IP block for your VMs
-(`203.0.113.16/29`). We recommend the following setup for such
+(`203.0.113.16/28`). We recommend the following setup for such
situations:
----
auto lo
iface lo inet loopback
-auto eno1
-iface eno1 inet static
- address 198.51.100.5
- netmask 255.255.255.0
+auto eno0
+iface eno0 inet static
+ address 198.51.100.5/29
gateway 198.51.100.1
post-up echo 1 > /proc/sys/net/ipv4/ip_forward
- post-up echo 1 > /proc/sys/net/ipv4/conf/eno1/proxy_arp
+ post-up echo 1 > /proc/sys/net/ipv4/conf/eno0/proxy_arp
auto vmbr0
iface vmbr0 inet static
- address 203.0.113.17
- netmask 255.255.255.248
+ address 203.0.113.17/28
bridge-ports none
bridge-stp off
bridge-fd 0
----
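+
+With this setup, guests use addresses from the routed block directly, with the
+bridge address as their gateway. Inside a guest, a static configuration could
+look like the following sketch, where the interface name and the chosen
+address are examples:
+
+----
+auto eth0
+iface eth0 inet static
+        address 203.0.113.18/28
+        gateway 203.0.113.17
+----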
+[[sysadmin_network_masquerading]]
Masquerading (NAT) with `iptables`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
auto eno1
#real IP address
iface eno1 inet static
- address 198.51.100.5
- netmask 255.255.255.0
+ address 198.51.100.5/24
gateway 198.51.100.1
auto vmbr0
#private sub network
iface vmbr0 inet static
- address 10.10.10.1
- netmask 255.255.255.0
+ address 10.10.10.1/24
bridge-ports none
bridge-stp off
bridge-fd 0
https://lwn.net/Articles/370152/[Patch on netdev-list introducing conntrack zones]
-https://blog.lobraun.de/2019/05/19/prox/[Blog post with a good explanation by using TRACE in the raw table]
-
+https://web.archive.org/web/20220610151210/https://blog.lobraun.de/2019/05/19/prox/[Blog post with a good explanation of using TRACE in the raw table]
+[[sysadmin_network_bond]]
Linux Bond
~~~~~~~~~~
If your switch supports the LACP (IEEE 802.3ad) protocol then we recommend using
the corresponding bonding mode (802.3ad). Otherwise you should generally use the
-active-backup mode. +
-// http://lists.linux-ha.org/pipermail/linux-ha/2013-January/046295.html
-If you intend to run your cluster network on the bonding interfaces, then you
-have to use active-passive mode on the bonding interfaces, other modes are
-unsupported.
+active-backup mode.
+
+For the cluster network (Corosync) we recommend configuring it with multiple
+networks. Corosync does not need a bond for network redundancy, as it can
+switch between networks by itself if one becomes unusable.
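+
+For illustration, a node entry in `/etc/pve/corosync.conf` with two redundant
+links could look like the following sketch, where the node name, id and
+addresses are examples:
+
+----
+node {
+    name: node1
+    nodeid: 1
+    quorum_votes: 1
+    ring0_addr: 10.10.1.1
+    ring1_addr: 10.10.2.1
+}
+----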
The following bond configuration can be used as a distributed/shared
storage network. The benefit would be that you get more speed and the
auto bond0
iface bond0 inet static
bond-slaves eno1 eno2
- address 192.168.1.2
- netmask 255.255.255.0
+ address 192.168.1.2/24
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer2+3
auto vmbr0
iface vmbr0 inet static
- address 10.10.10.2
- netmask 255.255.255.0
+ address 10.10.10.2/24
gateway 10.10.10.1
bridge-ports eno3
bridge-stp off
auto vmbr0
iface vmbr0 inet static
- address 10.10.10.2
- netmask 255.255.255.0
+ address 10.10.10.2/24
gateway 10.10.10.1
bridge-ports bond0
bridge-stp off
----
+[[sysadmin_network_vlan]]
VLAN 802.1Q
~~~~~~~~~~~
auto vmbr0v5
iface vmbr0v5 inet static
- address 10.10.10.2
- netmask 255.255.255.0
+ address 10.10.10.2/24
gateway 10.10.10.1
bridge-ports eno1.5
bridge-stp off
auto vmbr0.5
iface vmbr0.5 inet static
- address 10.10.10.2
- netmask 255.255.255.0
+ address 10.10.10.2/24
gateway 10.10.10.1
auto vmbr0
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
+ bridge-vids 2-4094
----
The next example is the same setup but a bond is used to
auto vmbr0v5
iface vmbr0v5 inet static
- address 10.10.10.2
- netmask 255.255.255.0
+ address 10.10.10.2/24
gateway 10.10.10.1
bridge-ports bond0.5
bridge-stp off
This method is preferred to disabling the loading of the IPv6 module on the
https://www.kernel.org/doc/Documentation/networking/ipv6.rst[kernel command line].
+
+Disabling MAC Learning on a Bridge
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+By default, MAC learning is enabled on a bridge to ensure a smooth experience
+with virtual guests and their networks.
+
+But in some environments this can be undesirable. Since {pve} 7.3 you can
+disable MAC learning on a bridge by setting `bridge-disable-mac-learning 1` in
+its configuration in `/etc/network/interfaces`, for example:
+
+----
+# ...
+
+auto vmbr0
+iface vmbr0 inet static
+ address 10.10.10.2/24
+ gateway 10.10.10.1
+ bridge-ports ens18
+ bridge-stp off
+ bridge-fd 0
+ bridge-disable-mac-learning 1
+----
+
+Once enabled, {pve} will manually add the configured MAC addresses of VMs and
+containers to the bridge's forwarding database, to ensure that guests can still
+use the network - but only when they are using their actual MAC address.
+
////
TODO: explain IPv6 support?
TODO: explain OVS