X-Git-Url: https://git.proxmox.com/?p=pve-docs.git;a=blobdiff_plain;f=pvecm.adoc;h=3d62c0b6be5dafce2c3d8e27a352bb2ea62243d9;hp=3b2a75da18eb8d3436a6b1982773db9b2617db09;hb=94958b8b9230d5b9b5e2e70c481f115b18a5fa0b;hpb=d8742b0c9cbfbc0a2bac4b342657dc94db079a81 diff --git a/pvecm.adoc b/pvecm.adoc index 3b2a75d..3d62c0b 100644 --- a/pvecm.adoc +++ b/pvecm.adoc @@ -1,14 +1,15 @@ +[[chapter_pvecm]] ifdef::manvolnum[] -PVE({manvolnum}) -================ -include::attributes.txt[] +pvecm(1) +======== +:pve-toplevel: NAME ---- pvecm - Proxmox VE Cluster Manager -SYNOPSYS +SYNOPSIS -------- include::pvecm.1-synopsis.adoc[] @@ -20,13 +21,1005 @@ endif::manvolnum[] ifndef::manvolnum[] Cluster Manager =============== -include::attributes.txt[] +:pve-toplevel: endif::manvolnum[] -'pvecm' is a program to manage the cluster configuration. It can be -used to create a new cluster, join nodes to a cluster, leave the -cluster, get status information and do various other cluster related -tasks. +The {PVE} cluster manager `pvecm` is a tool to create a group of +physical servers. Such a group is called a *cluster*. We use the +http://www.corosync.org[Corosync Cluster Engine] for reliable group +communication, and such clusters can consist of up to 32 physical nodes +(probably more, dependent on network latency). + +`pvecm` can be used to create a new cluster, join nodes to a cluster, +leave the cluster, get status information and do various other cluster +related tasks. The **P**rox**m**o**x** **C**luster **F**ile **S**ystem (``pmxcfs'') +is used to transparently distribute the cluster configuration to all cluster +nodes. + +Grouping nodes into a cluster has the following advantages: + +* Centralized, web based management + +* Multi-master clusters: each node can do all management task + +* `pmxcfs`: database-driven file system for storing configuration files, + replicated in real-time on all nodes using `corosync`. + +* Easy migration of virtual machines and containers between physical + hosts + +* Fast deployment + +* Cluster-wide services like firewall and HA + + +Requirements +------------ + +* All nodes must be in the same network as `corosync` uses IP Multicast + to communicate between nodes (also see + http://www.corosync.org[Corosync Cluster Engine]). Corosync uses UDP + ports 5404 and 5405 for cluster communication. ++ +NOTE: Some switches do not support IP multicast by default and must be +manually enabled first. + +* Date and time have to be synchronized. + +* SSH tunnel on TCP port 22 between nodes is used. + +* If you are interested in High Availability, you need to have at + least three nodes for reliable quorum. All nodes should have the + same version. + +* We recommend a dedicated NIC for the cluster traffic, especially if + you use shared storage. + +* Root password of a cluster node is required for adding nodes. + +NOTE: It is not possible to mix {pve} 3.x and earlier with {pve} 4.X cluster +nodes. + +NOTE: While it's possible for {pve} 4.4 and {pve} 5.0 this is not supported as +production configuration and should only used temporarily during upgrading the +whole cluster from one to another major version. + + +Preparing Nodes +--------------- + +First, install {PVE} on all nodes. Make sure that each node is +installed with the final hostname and IP configuration. Changing the +hostname and IP is not possible after cluster creation. + +Currently the cluster creation can either be done on the console (login via +`ssh`) or the API, which we have a GUI implementation for (__Datacenter -> +Cluster__). + +While it's often common use to reference all other nodenames in `/etc/hosts` +with their IP this is not strictly necessary for a cluster, which normally uses +multicast, to work. It maybe useful as you then can connect from one node to +the other with SSH through the easier to remember node name. + +[[pvecm_create_cluster]] +Create the Cluster +------------------ + +Login via `ssh` to the first {pve} node. Use a unique name for your cluster. +This name cannot be changed later. The cluster name follows the same rules as +node names. + +---- + hp1# pvecm create CLUSTERNAME +---- + +CAUTION: The cluster name is used to compute the default multicast address. +Please use unique cluster names if you run more than one cluster inside your +network. To avoid human confusion, it is also recommended to choose different +names even if clusters do not share the cluster network. + +To check the state of your cluster use: + +---- + hp1# pvecm status +---- + +Multiple Clusters In Same Network +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +It is possible to create multiple clusters in the same physical or logical +network. Each cluster must have a unique name, which is used to generate the +cluster's multicast group address. As long as no duplicate cluster names are +configured in one network segment, the different clusters won't interfere with +each other. + +If multiple clusters operate in a single network it may be beneficial to setup +an IGMP querier and enable IGMP Snooping in said network. This may reduce the +load of the network significantly because multicast packets are only delivered +to endpoints of the respective member nodes. + + +[[pvecm_join_node_to_cluster]] +Adding Nodes to the Cluster +--------------------------- + +Login via `ssh` to the node you want to add. + +---- + hp2# pvecm add IP-ADDRESS-CLUSTER +---- + +For `IP-ADDRESS-CLUSTER` use the IP from an existing cluster node. + +CAUTION: A new node cannot hold any VMs, because you would get +conflicts about identical VM IDs. Also, all existing configuration in +`/etc/pve` is overwritten when you join a new node to the cluster. To +workaround, use `vzdump` to backup and restore to a different VMID after +adding the node to the cluster. + +To check the state of cluster: + +---- + # pvecm status +---- + +.Cluster status after adding 4 nodes +---- +hp2# pvecm status +Quorum information +~~~~~~~~~~~~~~~~~~ +Date: Mon Apr 20 12:30:13 2015 +Quorum provider: corosync_votequorum +Nodes: 4 +Node ID: 0x00000001 +Ring ID: 1928 +Quorate: Yes + +Votequorum information +~~~~~~~~~~~~~~~~~~~~~~ +Expected votes: 4 +Highest expected: 4 +Total votes: 4 +Quorum: 2 +Flags: Quorate + +Membership information +~~~~~~~~~~~~~~~~~~~~~~ + Nodeid Votes Name +0x00000001 1 192.168.15.91 +0x00000002 1 192.168.15.92 (local) +0x00000003 1 192.168.15.93 +0x00000004 1 192.168.15.94 +---- + +If you only want the list of all nodes use: + +---- + # pvecm nodes +---- + +.List nodes in a cluster +---- +hp2# pvecm nodes + +Membership information +~~~~~~~~~~~~~~~~~~~~~~ + Nodeid Votes Name + 1 1 hp1 + 2 1 hp2 (local) + 3 1 hp3 + 4 1 hp4 +---- + +[[adding-nodes-with-separated-cluster-network]] +Adding Nodes With Separated Cluster Network +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When adding a node to a cluster with a separated cluster network you need to +use the 'ringX_addr' parameters to set the nodes address on those networks: + +[source,bash] +---- +pvecm add IP-ADDRESS-CLUSTER -ring0_addr IP-ADDRESS-RING0 +---- + +If you want to use the Redundant Ring Protocol you will also want to pass the +'ring1_addr' parameter. + + +Remove a Cluster Node +--------------------- + +CAUTION: Read carefully the procedure before proceeding, as it could +not be what you want or need. + +Move all virtual machines from the node. Make sure you have no local +data or backups you want to keep, or save them accordingly. +In the following example we will remove the node hp4 from the cluster. + +Log in to a *different* cluster node (not hp4), and issue a `pvecm nodes` +command to identify the node ID to remove: + +---- +hp1# pvecm nodes + +Membership information +~~~~~~~~~~~~~~~~~~~~~~ + Nodeid Votes Name + 1 1 hp1 (local) + 2 1 hp2 + 3 1 hp3 + 4 1 hp4 +---- + + +At this point you must power off hp4 and +make sure that it will not power on again (in the network) as it +is. + +IMPORTANT: As said above, it is critical to power off the node +*before* removal, and make sure that it will *never* power on again +(in the existing cluster network) as it is. +If you power on the node as it is, your cluster will be screwed up and +it could be difficult to restore a clean cluster state. + +After powering off the node hp4, we can safely remove it from the cluster. + +---- + hp1# pvecm delnode hp4 +---- + +If the operation succeeds no output is returned, just check the node +list again with `pvecm nodes` or `pvecm status`. You should see +something like: + +---- +hp1# pvecm status + +Quorum information +~~~~~~~~~~~~~~~~~~ +Date: Mon Apr 20 12:44:28 2015 +Quorum provider: corosync_votequorum +Nodes: 3 +Node ID: 0x00000001 +Ring ID: 1992 +Quorate: Yes + +Votequorum information +~~~~~~~~~~~~~~~~~~~~~~ +Expected votes: 3 +Highest expected: 3 +Total votes: 3 +Quorum: 3 +Flags: Quorate + +Membership information +~~~~~~~~~~~~~~~~~~~~~~ + Nodeid Votes Name +0x00000001 1 192.168.15.90 (local) +0x00000002 1 192.168.15.91 +0x00000003 1 192.168.15.92 +---- + +If, for whatever reason, you want that this server joins the same +cluster again, you have to + +* reinstall {pve} on it from scratch + +* then join it, as explained in the previous section. + +[[pvecm_separate_node_without_reinstall]] +Separate A Node Without Reinstalling +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +CAUTION: This is *not* the recommended method, proceed with caution. Use the +above mentioned method if you're unsure. + +You can also separate a node from a cluster without reinstalling it from +scratch. But after removing the node from the cluster it will still have +access to the shared storages! This must be resolved before you start removing +the node from the cluster. A {pve} cluster cannot share the exact same +storage with another cluster, as storage locking doesn't work over cluster +boundary. Further, it may also lead to VMID conflicts. + +Its suggested that you create a new storage where only the node which you want +to separate has access. This can be an new export on your NFS or a new Ceph +pool, to name a few examples. Its just important that the exact same storage +does not gets accessed by multiple clusters. After setting this storage up move +all data from the node and its VMs to it. Then you are ready to separate the +node from the cluster. + +WARNING: Ensure all shared resources are cleanly separated! You will run into +conflicts and problems else. + +First stop the corosync and the pve-cluster services on the node: +[source,bash] +---- +systemctl stop pve-cluster +systemctl stop corosync +---- + +Start the cluster filesystem again in local mode: +[source,bash] +---- +pmxcfs -l +---- + +Delete the corosync configuration files: +[source,bash] +---- +rm /etc/pve/corosync.conf +rm /etc/corosync/* +---- + +You can now start the filesystem again as normal service: +[source,bash] +---- +killall pmxcfs +systemctl start pve-cluster +---- + +The node is now separated from the cluster. You can deleted it from a remaining +node of the cluster with: +[source,bash] +---- +pvecm delnode oldnode +---- + +If the command failed, because the remaining node in the cluster lost quorum +when the now separate node exited, you may set the expected votes to 1 as a workaround: +[source,bash] +---- +pvecm expected 1 +---- + +And the repeat the 'pvecm delnode' command. + +Now switch back to the separated node, here delete all remaining files left +from the old cluster. This ensures that the node can be added to another +cluster again without problems. + +[source,bash] +---- +rm /var/lib/corosync/* +---- + +As the configuration files from the other nodes are still in the cluster +filesystem you may want to clean those up too. Remove simply the whole +directory recursive from '/etc/pve/nodes/NODENAME', but check three times that +you used the correct one before deleting it. + +CAUTION: The nodes SSH keys are still in the 'authorized_key' file, this means +the nodes can still connect to each other with public key authentication. This +should be fixed by removing the respective keys from the +'/etc/pve/priv/authorized_keys' file. + +Quorum +------ + +{pve} use a quorum-based technique to provide a consistent state among +all cluster nodes. + +[quote, from Wikipedia, Quorum (distributed computing)] +____ +A quorum is the minimum number of votes that a distributed transaction +has to obtain in order to be allowed to perform an operation in a +distributed system. +____ + +In case of network partitioning, state changes requires that a +majority of nodes are online. The cluster switches to read-only mode +if it loses quorum. + +NOTE: {pve} assigns a single vote to each node by default. + +Cluster Network +--------------- + +The cluster network is the core of a cluster. All messages sent over it have to +be delivered reliable to all nodes in their respective order. In {pve} this +part is done by corosync, an implementation of a high performance low overhead +high availability development toolkit. It serves our decentralized +configuration file system (`pmxcfs`). + +[[cluster-network-requirements]] +Network Requirements +~~~~~~~~~~~~~~~~~~~~ +This needs a reliable network with latencies under 2 milliseconds (LAN +performance) to work properly. While corosync can also use unicast for +communication between nodes its **highly recommended** to have a multicast +capable network. The network should not be used heavily by other members, +ideally corosync runs on its own network. +*never* share it with network where storage communicates too. + +Before setting up a cluster it is good practice to check if the network is fit +for that purpose. + +* Ensure that all nodes are in the same subnet. This must only be true for the + network interfaces used for cluster communication (corosync). + +* Ensure all nodes can reach each other over those interfaces, using `ping` is + enough for a basic test. + +* Ensure that multicast works in general and a high package rates. This can be + done with the `omping` tool. The final "%loss" number should be < 1%. ++ +[source,bash] +---- +omping -c 10000 -i 0.001 -F -q NODE1-IP NODE2-IP ... +---- + +* Ensure that multicast communication works over an extended period of time. + This uncovers problems where IGMP snooping is activated on the network but + no multicast querier is active. This test has a duration of around 10 + minutes. ++ +[source,bash] +---- +omping -c 600 -i 1 -q NODE1-IP NODE2-IP ... +---- + +Your network is not ready for clustering if any of these test fails. Recheck +your network configuration. Especially switches are notorious for having +multicast disabled by default or IGMP snooping enabled with no IGMP querier +active. + +In smaller cluster its also an option to use unicast if you really cannot get +multicast to work. + +Separate Cluster Network +~~~~~~~~~~~~~~~~~~~~~~~~ + +When creating a cluster without any parameters the cluster network is generally +shared with the Web UI and the VMs and its traffic. Depending on your setup +even storage traffic may get sent over the same network. Its recommended to +change that, as corosync is a time critical real time application. + +Setting Up A New Network +^^^^^^^^^^^^^^^^^^^^^^^^ + +First you have to setup a new network interface. It should be on a physical +separate network. Ensure that your network fulfills the +<>. + +Separate On Cluster Creation +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This is possible through the 'ring0_addr' and 'bindnet0_addr' parameter of +the 'pvecm create' command used for creating a new cluster. + +If you have setup an additional NIC with a static address on 10.10.10.1/25 +and want to send and receive all cluster communication over this interface +you would execute: + +[source,bash] +---- +pvecm create test --ring0_addr 10.10.10.1 --bindnet0_addr 10.10.10.0 +---- + +To check if everything is working properly execute: +[source,bash] +---- +systemctl status corosync +---- + +Afterwards, proceed as descripted in the section to +<>. + +[[separate-cluster-net-after-creation]] +Separate After Cluster Creation +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +You can do this also if you have already created a cluster and want to switch +its communication to another network, without rebuilding the whole cluster. +This change may lead to short durations of quorum loss in the cluster, as nodes +have to restart corosync and come up one after the other on the new network. + +Check how to <> first. +The open it and you should see a file similar to: + +---- +logging { + debug: off + to_syslog: yes +} + +nodelist { + + node { + name: due + nodeid: 2 + quorum_votes: 1 + ring0_addr: due + } + + node { + name: tre + nodeid: 3 + quorum_votes: 1 + ring0_addr: tre + } + + node { + name: uno + nodeid: 1 + quorum_votes: 1 + ring0_addr: uno + } + +} + +quorum { + provider: corosync_votequorum +} + +totem { + cluster_name: thomas-testcluster + config_version: 3 + ip_version: ipv4 + secauth: on + version: 2 + interface { + bindnetaddr: 192.168.30.50 + ringnumber: 0 + } + +} +---- + +The first you want to do is add the 'name' properties in the node entries if +you do not see them already. Those *must* match the node name. + +Then replace the address from the 'ring0_addr' properties with the new +addresses. You may use plain IP addresses or also hostnames here. If you use +hostnames ensure that they are resolvable from all nodes. + +In my example I want to switch my cluster communication to the 10.10.10.1/25 +network. So I replace all 'ring0_addr' respectively. I also set the bindnetaddr +in the totem section of the config to an address of the new network. It can be +any address from the subnet configured on the new network interface. + +After you increased the 'config_version' property the new configuration file +should look like: + +---- + +logging { + debug: off + to_syslog: yes +} + +nodelist { + + node { + name: due + nodeid: 2 + quorum_votes: 1 + ring0_addr: 10.10.10.2 + } + + node { + name: tre + nodeid: 3 + quorum_votes: 1 + ring0_addr: 10.10.10.3 + } + + node { + name: uno + nodeid: 1 + quorum_votes: 1 + ring0_addr: 10.10.10.1 + } + +} + +quorum { + provider: corosync_votequorum +} + +totem { + cluster_name: thomas-testcluster + config_version: 4 + ip_version: ipv4 + secauth: on + version: 2 + interface { + bindnetaddr: 10.10.10.1 + ringnumber: 0 + } + +} +---- + +Now after a final check whether all changed information is correct we save it +and see again the <> section to +learn how to bring it in effect. + +As our change cannot be enforced live from corosync we have to do an restart. + +On a single node execute: +[source,bash] +---- +systemctl restart corosync +---- + +Now check if everything is fine: + +[source,bash] +---- +systemctl status corosync +---- + +If corosync runs again correct restart corosync also on all other nodes. +They will then join the cluster membership one by one on the new network. + +[[pvecm_rrp]] +Redundant Ring Protocol +~~~~~~~~~~~~~~~~~~~~~~~ +To avoid a single point of failure you should implement counter measurements. +This can be on the hardware and operating system level through network bonding. + +Corosync itself offers also a possibility to add redundancy through the so +called 'Redundant Ring Protocol'. This protocol allows running a second totem +ring on another network, this network should be physically separated from the +other rings network to actually increase availability. + +RRP On Cluster Creation +~~~~~~~~~~~~~~~~~~~~~~~ + +The 'pvecm create' command provides the additional parameters 'bindnetX_addr', +'ringX_addr' and 'rrp_mode', can be used for RRP configuration. + +NOTE: See the <> if you do not know what each parameter means. + +So if you have two networks, one on the 10.10.10.1/24 and the other on the +10.10.20.1/24 subnet you would execute: + +[source,bash] +---- +pvecm create CLUSTERNAME -bindnet0_addr 10.10.10.1 -ring0_addr 10.10.10.1 \ +-bindnet1_addr 10.10.20.1 -ring1_addr 10.10.20.1 +---- + +RRP On Existing Clusters +~~~~~~~~~~~~~~~~~~~~~~~~ + +You will take similar steps as described in +<> to +enable RRP on an already running cluster. The single difference is, that you +will add `ring1` and use it instead of `ring0`. + +First add a new `interface` subsection in the `totem` section, set its +`ringnumber` property to `1`. Set the interfaces `bindnetaddr` property to an +address of the subnet you have configured for your new ring. +Further set the `rrp_mode` to `passive`, this is the only stable mode. + +Then add to each node entry in the `nodelist` section its new `ring1_addr` +property with the nodes additional ring address. + +So if you have two networks, one on the 10.10.10.1/24 and the other on the +10.10.20.1/24 subnet, the final configuration file should look like: + +---- +totem { + cluster_name: tweak + config_version: 9 + ip_version: ipv4 + rrp_mode: passive + secauth: on + version: 2 + interface { + bindnetaddr: 10.10.10.1 + ringnumber: 0 + } + interface { + bindnetaddr: 10.10.20.1 + ringnumber: 1 + } +} + +nodelist { + node { + name: pvecm1 + nodeid: 1 + quorum_votes: 1 + ring0_addr: 10.10.10.1 + ring1_addr: 10.10.20.1 + } + + node { + name: pvecm2 + nodeid: 2 + quorum_votes: 1 + ring0_addr: 10.10.10.2 + ring1_addr: 10.10.20.2 + } + + [...] # other cluster nodes here +} + +[...] # other remaining config sections here + +---- + +Bring it in effect like described in the +<> section. + +This is a change which cannot take live in effect and needs at least a restart +of corosync. Recommended is a restart of the whole cluster. + +If you cannot reboot the whole cluster ensure no High Availability services are +configured and the stop the corosync service on all nodes. After corosync is +stopped on all nodes start it one after the other again. + +Corosync Configuration +---------------------- + +The `/etc/pve/corosync.conf` file plays a central role in {pve} cluster. It +controls the cluster member ship and its network. +For reading more about it check the corosync.conf man page: +[source,bash] +---- +man corosync.conf +---- + +For node membership you should always use the `pvecm` tool provided by {pve}. +You may have to edit the configuration file manually for other changes. +Here are a few best practice tips for doing this. + +[[edit-corosync-conf]] +Edit corosync.conf +~~~~~~~~~~~~~~~~~~ + +Editing the corosync.conf file can be not always straight forward. There are +two on each cluster, one in `/etc/pve/corosync.conf` and the other in +`/etc/corosync/corosync.conf`. Editing the one in our cluster file system will +propagate the changes to the local one, but not vice versa. + +The configuration will get updated automatically as soon as the file changes. +This means changes which can be integrated in a running corosync will take +instantly effect. So you should always make a copy and edit that instead, to +avoid triggering some unwanted changes by an in between safe. + +[source,bash] +---- +cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new +---- + +Then open the Config file with your favorite editor, `nano` and `vim.tiny` are +preinstalled on {pve} for example. + +NOTE: Always increment the 'config_version' number on configuration changes, +omitting this can lead to problems. + +After making the necessary changes create another copy of the current working +configuration file. This serves as a backup if the new configuration fails to +apply or makes problems in other ways. + +[source,bash] +---- +cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak +---- + +Then move the new configuration file over the old one: +[source,bash] +---- +mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf +---- + +You may check with the commands +[source,bash] +---- +systemctl status corosync +journalctl -b -u corosync +---- + +If the change could applied automatically. If not you may have to restart the +corosync service via: +[source,bash] +---- +systemctl restart corosync +---- + +On errors check the troubleshooting section below. + +Troubleshooting +~~~~~~~~~~~~~~~ + +Issue: 'quorum.expected_votes must be configured' +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +When corosync starts to fail and you get the following message in the system log: + +---- +[...] +corosync[1647]: [QUORUM] Quorum provider: corosync_votequorum failed to initialize. +corosync[1647]: [SERV ] Service engine 'corosync_quorum' failed to load for reason + 'configuration error: nodelist or quorum.expected_votes must be configured!' +[...] +---- + +It means that the hostname you set for corosync 'ringX_addr' in the +configuration could not be resolved. + + +Write Configuration When Not Quorate +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +If you need to change '/etc/pve/corosync.conf' on an node with no quorum, and you +know what you do, use: +[source,bash] +---- +pvecm expected 1 +---- + +This sets the expected vote count to 1 and makes the cluster quorate. You can +now fix your configuration, or revert it back to the last working backup. + +This is not enough if corosync cannot start anymore. Here its best to edit the +local copy of the corosync configuration in '/etc/corosync/corosync.conf' so +that corosync can start again. Ensure that on all nodes this configuration has +the same content to avoid split brains. If you are not sure what went wrong +it's best to ask the Proxmox Community to help you. + + +[[corosync-conf-glossary]] +Corosync Configuration Glossary +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +ringX_addr:: +This names the different ring addresses for the corosync totem rings used for +the cluster communication. + +bindnetaddr:: +Defines to which interface the ring should bind to. It may be any address of +the subnet configured on the interface we want to use. In general its the +recommended to just use an address a node uses on this interface. + +rrp_mode:: +Specifies the mode of the redundant ring protocol and may be passive, active or +none. Note that use of active is highly experimental and not official +supported. Passive is the preferred mode, it may double the cluster +communication throughput and increases availability. + + +Cluster Cold Start +------------------ + +It is obvious that a cluster is not quorate when all nodes are +offline. This is a common case after a power failure. + +NOTE: It is always a good idea to use an uninterruptible power supply +(``UPS'', also called ``battery backup'') to avoid this state, especially if +you want HA. + +On node startup, the `pve-guests` service is started and waits for +quorum. Once quorate, it starts all guests which have the `onboot` +flag set. + +When you turn on nodes, or when power comes back after power failure, +it is likely that some nodes boots faster than others. Please keep in +mind that guest startup is delayed until you reach quorum. + + +Guest Migration +--------------- + +Migrating virtual guests to other nodes is a useful feature in a +cluster. There are settings to control the behavior of such +migrations. This can be done via the configuration file +`datacenter.cfg` or for a specific migration via API or command line +parameters. + +It makes a difference if a Guest is online or offline, or if it has +local resources (like a local disk). + +For Details about Virtual Machine Migration see the +xref:qm_migration[QEMU/KVM Migration Chapter] + +For Details about Container Migration see the +xref:pct_migration[Container Migration Chapter] + +Migration Type +~~~~~~~~~~~~~~ + +The migration type defines if the migration data should be sent over an +encrypted (`secure`) channel or an unencrypted (`insecure`) one. +Setting the migration type to insecure means that the RAM content of a +virtual guest gets also transferred unencrypted, which can lead to +information disclosure of critical data from inside the guest (for +example passwords or encryption keys). + +Therefore, we strongly recommend using the secure channel if you do +not have full control over the network and can not guarantee that no +one is eavesdropping to it. + +NOTE: Storage migration does not follow this setting. Currently, it +always sends the storage content over a secure channel. + +Encryption requires a lot of computing power, so this setting is often +changed to "unsafe" to achieve better performance. The impact on +modern systems is lower because they implement AES encryption in +hardware. The performance impact is particularly evident in fast +networks where you can transfer 10 Gbps or more. + + +Migration Network +~~~~~~~~~~~~~~~~~ + +By default, {pve} uses the network in which cluster communication +takes place to send the migration traffic. This is not optimal because +sensitive cluster traffic can be disrupted and this network may not +have the best bandwidth available on the node. + +Setting the migration network parameter allows the use of a dedicated +network for the entire migration traffic. In addition to the memory, +this also affects the storage traffic for offline migrations. + +The migration network is set as a network in the CIDR notation. This +has the advantage that you do not have to set individual IP addresses +for each node. {pve} can determine the real address on the +destination node from the network specified in the CIDR form. To +enable this, the network must be specified so that each node has one, +but only one IP in the respective network. + + +Example +^^^^^^^ + +We assume that we have a three-node setup with three separate +networks. One for public communication with the Internet, one for +cluster communication and a very fast one, which we want to use as a +dedicated network for migration. + +A network configuration for such a setup might look as follows: + +---- +iface eno1 inet manual + +# public network +auto vmbr0 +iface vmbr0 inet static + address 192.X.Y.57 + netmask 255.255.250.0 + gateway 192.X.Y.1 + bridge_ports eno1 + bridge_stp off + bridge_fd 0 + +# cluster network +auto eno2 +iface eno2 inet static + address 10.1.1.1 + netmask 255.255.255.0 + +# fast network +auto eno3 +iface eno3 inet static + address 10.1.2.1 + netmask 255.255.255.0 +---- + +Here, we will use the network 10.1.2.0/24 as a migration network. For +a single migration, you can do this using the `migration_network` +parameter of the command line tool: + +---- +# qm migrate 106 tre --online --migration_network 10.1.2.0/24 +---- + +To configure this as the default network for all migrations in the +cluster, set the `migration` property of the `/etc/pve/datacenter.cfg` +file: + +---- +# use dedicated migration network +migration: secure,network=10.1.2.0/24 +---- + +NOTE: The migration type must always be set when the migration network +gets set in `/etc/pve/datacenter.cfg`. ifdef::manvolnum[]