[[chapter_pvecm]]
ifdef::manvolnum[]
pvecm(1)
========
:pve-toplevel:

NAME
----

pvecm - Proxmox VE Cluster Manager

SYNOPSIS
--------

include::pvecm.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Cluster Manager
===============
:pve-toplevel:
endif::manvolnum[]

The {pve} cluster manager `pvecm` is a tool to create a group of
physical servers. Such a group is called a *cluster*. We use the
http://www.corosync.org[Corosync Cluster Engine] for reliable group
communication. There's no explicit limit for the number of nodes in a cluster.
In practice, the actual possible node count may be limited by the host and
network performance. Currently (2021), there are reports of clusters (using
high-end enterprise hardware) with over 50 nodes in production.

`pvecm` can be used to create a new cluster, join nodes to a cluster,
leave the cluster, get status information, and do various other cluster-related
tasks. The **P**rox**m**o**x** **C**luster **F**ile **S**ystem (``pmxcfs'')
is used to transparently distribute the cluster configuration to all cluster
nodes.

Grouping nodes into a cluster has the following advantages:

* Centralized, web-based management

* Multi-master clusters: each node can do all management tasks

* Use of `pmxcfs`, a database-driven file system, for storing configuration
  files, replicated in real-time on all nodes using `corosync`

* Easy migration of virtual machines and containers between physical
  hosts

* Fast deployment

* Cluster-wide services like firewall and HA

Requirements
------------

* All nodes must be able to connect to each other via UDP ports 5404 and 5405
  for corosync to work.

* Date and time must be synchronized (a quick check is shown after this list).

* An SSH tunnel on TCP port 22 between nodes is required.

* If you are interested in High Availability, you need to have at
  least three nodes for reliable quorum. All nodes should have the
  same version.

* We recommend a dedicated NIC for the cluster traffic, especially if
  you use shared storage.

* The root password of a cluster node is required for adding nodes.

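As a quick sanity check for the time synchronization requirement, you can, for
example, inspect the NTP status on each node before creating the cluster (a
minimal sketch; the exact output wording depends on the systemd version):

----
# timedatectl status
----
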
NOTE: It is not possible to mix {pve} 3.x and earlier with {pve} 4.X cluster
nodes.

NOTE: While it's possible to mix {pve} 4.4 and {pve} 5.0 nodes, doing so is
not supported as a production configuration and should only be done temporarily,
during an upgrade of the whole cluster from one major version to another.

NOTE: Running a cluster of {pve} 6.x with earlier versions is not possible. The
cluster protocol (corosync) between {pve} 6.x and earlier versions changed
fundamentally. The corosync 3 packages for {pve} 5.4 are only intended for the
upgrade procedure to {pve} 6.0.

Preparing Nodes
---------------

First, install {pve} on all nodes. Make sure that each node is
installed with the final hostname and IP configuration. Changing the
hostname and IP is not possible after cluster creation.

While it's common to reference all node names and their IPs in `/etc/hosts` (or
make their names resolvable through other means), this is not necessary for a
cluster to work. It may be useful however, as you can then connect from one node
to another via SSH, using the easier to remember node name (see also
xref:pvecm_corosync_addresses[Link Address Types]). Note that we always
recommend referencing nodes by their IP addresses in the cluster configuration.

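If you do maintain `/etc/hosts`, a minimal sketch could look like this
(hostnames and addresses are placeholders):

----
192.0.2.11 pve-node1.example.com pve-node1
192.0.2.12 pve-node2.example.com pve-node2
192.0.2.13 pve-node3.example.com pve-node3
----
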

[[pvecm_create_cluster]]
Create a Cluster
----------------

You can either create a cluster on the console (login via `ssh`), or through
the API using the {pve} web interface (__Datacenter -> Cluster__).

NOTE: Use a unique name for your cluster. This name cannot be changed later.
The cluster name follows the same rules as node names.

[[pvecm_cluster_create_via_gui]]
Create via Web GUI
~~~~~~~~~~~~~~~~~~

[thumbnail="screenshot/gui-cluster-create.png"]

Under __Datacenter -> Cluster__, click on *Create Cluster*. Enter the cluster
name and select a network connection from the drop-down list to serve as the
main cluster network (Link 0). It defaults to the IP resolved via the node's
hostname.

As of {pve} 6.2, up to 8 fallback links can be added to a cluster. To add a
redundant link, click the 'Add' button and select a link number and IP address
from the respective fields. Prior to {pve} 6.2, to add a second link as
fallback, you can select the 'Advanced' checkbox and choose an additional
network interface (Link 1, see also xref:pvecm_redundancy[Corosync Redundancy]).

NOTE: Ensure that the network selected for cluster communication is not used for
any high traffic purposes, like network storage or live-migration.
While the cluster network itself produces small amounts of data, it is very
sensitive to latency. Check out the full
xref:pvecm_cluster_network_requirements[cluster network requirements].

[[pvecm_cluster_create_via_cli]]
Create via the Command Line
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Log in via `ssh` to the first {pve} node and run the following command:

----
 hp1# pvecm create CLUSTERNAME
----

To check the state of the new cluster, use:

----
 hp1# pvecm status
----

Multiple Clusters in the Same Network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is possible to create multiple clusters in the same physical or logical
network. In this case, each cluster must have a unique name to avoid possible
clashes in the cluster communication stack. Furthermore, this helps avoid human
confusion by making clusters clearly distinguishable.

While the bandwidth requirement of a corosync cluster is relatively low, the
latency of packets and the packets per second (PPS) rate is the limiting
factor. Different clusters in the same network can compete with each other for
these resources, so it may still make sense to use separate physical network
infrastructure for bigger clusters.

[[pvecm_join_node_to_cluster]]
Adding Nodes to the Cluster
---------------------------

CAUTION: A node that is about to be added to the cluster cannot hold any guests.
All existing configuration in `/etc/pve` is overwritten when joining a cluster,
since guest IDs could otherwise conflict. As a workaround, you can create a
backup of the guest (`vzdump`) and restore it under a different ID, after the
node has been added to the cluster.

Join Node to Cluster via GUI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[thumbnail="screenshot/gui-cluster-join-information.png"]

Log in to the web interface on an existing cluster node. Under __Datacenter ->
Cluster__, click the *Join Information* button at the top. Then, click on the
button *Copy Information*. Alternatively, copy the string from the 'Information'
field manually.

[thumbnail="screenshot/gui-cluster-join.png"]

Next, log in to the web interface on the node you want to add.
Under __Datacenter -> Cluster__, click on *Join Cluster*. Fill in the
'Information' field with the 'Join Information' text you copied earlier.
Most settings required for joining the cluster will be filled out
automatically. For security reasons, the cluster password has to be entered
manually.

NOTE: To enter all required data manually, you can disable the 'Assisted Join'
checkbox.

After clicking the *Join* button, the cluster join process will start
immediately. After the node has joined the cluster, its current node certificate
will be replaced by one signed by the cluster certificate authority (CA).
This means that the current session will stop working after a few seconds. You
then might need to force-reload the web interface and log in again with the
cluster credentials.

Now your node should be visible under __Datacenter -> Cluster__.

Join Node to Cluster via Command Line
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Log in to the node you want to join into an existing cluster via `ssh`.

----
 # pvecm add IP-ADDRESS-CLUSTER
----

For `IP-ADDRESS-CLUSTER`, use the IP or hostname of an existing cluster node.
An IP address is recommended (see xref:pvecm_corosync_addresses[Link Address Types]).


To check the state of the cluster, use:

----
 # pvecm status
----

.Cluster status after adding 4 nodes
----
 # pvecm status
Cluster information
~~~~~~~~~~~~~~~~~~~
Name:             prod-central
Config Version:   3
Transport:        knet
Secure auth:      on

Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Tue Sep 14 11:06:47 2021
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1.1a8
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.91
0x00000002          1 192.168.15.92 (local)
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
----

If you only want a list of all nodes, use:

----
 # pvecm nodes
----

.List nodes in a cluster
----
 # pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1
         2          1 hp2 (local)
         3          1 hp3
         4          1 hp4
----

[[pvecm_adding_nodes_with_separated_cluster_network]]
Adding Nodes with Separated Cluster Network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When adding a node to a cluster with a separated cluster network, you need to
use the 'link0' parameter to set the node's address on that network:

[source,bash]
----
pvecm add IP-ADDRESS-CLUSTER --link0 LOCAL-IP-ADDRESS-LINK0
----

If you want to use the built-in xref:pvecm_redundancy[redundancy] of the
Kronosnet transport layer, also use the 'link1' parameter.

Using the GUI, you can select the correct interface from the corresponding
'Link X' fields in the *Cluster Join* dialog.

Remove a Cluster Node
---------------------

CAUTION: Read the procedure carefully before proceeding, as it may
not be what you want or need.

Move all virtual machines from the node. Make sure you have made copies of any
local data or backups that you want to keep. In the following example, we will
remove the node hp4 from the cluster.

Log in to a *different* cluster node (not hp4), and issue a `pvecm nodes`
command to identify the node ID to remove:

----
 hp1# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1 (local)
         2          1 hp2
         3          1 hp3
         4          1 hp4
----


At this point, you must power off hp4 and ensure that it will not power on
again (in the network) with its current configuration.

IMPORTANT: As mentioned above, it is critical to power off the node
*before* removal, and make sure that it will *not* power on again
(in the existing cluster network) with its current configuration.
If you power on the node as it is, the cluster could end up broken,
and it could be difficult to restore it to a functioning state.

After powering off the node hp4, we can safely remove it from the cluster.

----
 hp1# pvecm delnode hp4
 Killing node 4
----

NOTE: At this point, it is possible that you will receive an error message
stating `Could not kill node (error = CS_ERR_NOT_EXIST)`. This does not
signify an actual failure in the deletion of the node, but rather a failure in
corosync trying to kill an offline node. Thus, it can be safely ignored.

Use `pvecm nodes` or `pvecm status` to check the node list again. It should
look something like:

----
hp1# pvecm status

...

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.90 (local)
0x00000002          1 192.168.15.91
0x00000003          1 192.168.15.92
----

If, for whatever reason, you want this server to join the same cluster again,
you have to:

* do a fresh install of {pve} on it,

* then join it, as explained in the previous section.

NOTE: After removal of the node, its SSH fingerprint will still reside in the
'known_hosts' of the other nodes. If you receive an SSH error after rejoining
a node with the same IP or hostname, run `pvecm updatecerts` once on the
re-added node to update its fingerprint cluster wide.

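For example, on the node that was just re-added (as referenced in the note
above):

----
# pvecm updatecerts
----
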
[[pvecm_separate_node_without_reinstall]]
Separate a Node Without Reinstalling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

CAUTION: This is *not* the recommended method, proceed with caution. Use the
previous method if you're unsure.

You can also separate a node from a cluster without reinstalling it from
scratch. But after removing the node from the cluster, it will still have
access to any shared storage. This must be resolved before you start removing
the node from the cluster. A {pve} cluster cannot share the exact same
storage with another cluster, as storage locking doesn't work over the cluster
boundary. Furthermore, it may also lead to VMID conflicts.

It's suggested that you create a new storage, where only the node which you want
to separate has access. This can be a new export on your NFS or a new Ceph
pool, to name a few examples. It's just important that the exact same storage
does not get accessed by multiple clusters. After setting up this storage, move
all data and VMs from the node to it. Then you are ready to separate the
node from the cluster.

WARNING: Ensure that all shared resources are cleanly separated! Otherwise you
will run into conflicts and problems.

First, stop the corosync and pve-cluster services on the node:
[source,bash]
----
systemctl stop pve-cluster
systemctl stop corosync
----

Start the cluster file system again in local mode:
[source,bash]
----
pmxcfs -l
----

Delete the corosync configuration files:
[source,bash]
----
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*
----

You can now start the file system again as a normal service:
[source,bash]
----
killall pmxcfs
systemctl start pve-cluster
----

The node is now separated from the cluster. You can delete it from any
remaining node of the cluster with:
[source,bash]
----
pvecm delnode oldnode
----

If the command fails due to a loss of quorum in the remaining node, you can set
the expected votes to 1 as a workaround:
[source,bash]
----
pvecm expected 1
----

And then repeat the 'pvecm delnode' command.

Now switch back to the separated node and delete all the remaining cluster
files on it. This ensures that the node can be added to another cluster again
without problems.

[source,bash]
----
rm /var/lib/corosync/*
----

As the configuration files from the other nodes are still in the cluster
file system, you may want to clean those up too. After making absolutely sure
that you have the correct node name, you can simply remove the entire
directory recursively from '/etc/pve/nodes/NODENAME'.

CAUTION: The node's SSH keys will remain in the 'authorized_keys' file. This
means that the nodes can still connect to each other with public key
authentication. You should fix this by removing the respective keys from the
'/etc/pve/priv/authorized_keys' file.

Quorum
------

{pve} uses a quorum-based technique to provide a consistent state among
all cluster nodes.

[quote, from Wikipedia, Quorum (distributed computing)]
____
A quorum is the minimum number of votes that a distributed transaction
has to obtain in order to be allowed to perform an operation in a
distributed system.
____

In case of network partitioning, state changes require that a
majority of nodes are online. The cluster switches to read-only mode
if it loses quorum.

NOTE: {pve} assigns a single vote to each node by default.

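As a worked example (assuming the default of one vote per node), quorum
requires a strict majority, that is 'floor(N/2) + 1' votes for 'N' nodes:

----
Nodes (N)   Votes needed for quorum
        2   2
        3   2
        4   3
        5   3
----

This is also why a two-node cluster cannot tolerate the failure of either node
without additional measures, such as the external vote support described below.
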

Cluster Network
---------------

The cluster network is the core of a cluster. All messages sent over it have to
be delivered reliably to all nodes in their respective order. In {pve} this
part is done by corosync, an implementation of a high performance, low overhead,
high availability development toolkit. It serves our decentralized configuration
file system (`pmxcfs`).

[[pvecm_cluster_network_requirements]]
Network Requirements
~~~~~~~~~~~~~~~~~~~~
This needs a reliable network with latencies under 2 milliseconds (LAN
performance) to work properly. The network should not be used heavily by other
members; ideally corosync runs on its own network. Do not use a shared network
for corosync and storage (except as a potential low-priority fallback in a
xref:pvecm_redundancy[redundant] configuration).

Before setting up a cluster, it is good practice to check if the network is fit
for that purpose. To ensure that the nodes can connect to each other on the
cluster network, you can test the connectivity between them with the `ping`
tool.

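For example, a simple round-trip check from one node to another on the cluster
network (the address is a placeholder); the reported average latency should
stay well below the 2 ms mentioned above:

----
# ping -c 10 192.0.2.12
----
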
If the {pve} firewall is enabled, ACCEPT rules for corosync will automatically
be generated - no manual action is required.

NOTE: Corosync used Multicast before version 3.0 (introduced in {pve} 6.0).
Modern versions rely on https://kronosnet.org/[Kronosnet] for cluster
communication, which, for now, only supports regular UDP unicast.

CAUTION: You can still enable Multicast or legacy unicast by setting your
transport to `udp` or `udpu` in your xref:pvecm_edit_corosync_conf[corosync.conf],
but keep in mind that this will disable all cryptography and redundancy support.
This is therefore not recommended.

Separate Cluster Network
~~~~~~~~~~~~~~~~~~~~~~~~

When creating a cluster without any parameters, the corosync cluster network is
generally shared with the web interface and the VMs' network. Depending on
your setup, even storage traffic may get sent over the same network. It's
recommended to change that, as corosync is a time-critical, real-time
application.

Setting Up a New Network
^^^^^^^^^^^^^^^^^^^^^^^^

First, you have to set up a new network interface. It should be on a physically
separate network. Ensure that your network fulfills the
xref:pvecm_cluster_network_requirements[cluster network requirements].

Separate On Cluster Creation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is possible via the 'linkX' parameters of the 'pvecm create'
command, used for creating a new cluster.

If you have set up an additional NIC with a static address on 10.10.10.1/25,
and want to send and receive all cluster communication over this interface,
you would execute:

[source,bash]
----
pvecm create test --link0 10.10.10.1
----

To check if everything is working properly, execute:
[source,bash]
----
systemctl status corosync
----

Afterwards, proceed as described above to
xref:pvecm_adding_nodes_with_separated_cluster_network[add nodes with a separated cluster network].

[[pvecm_separate_cluster_net_after_creation]]
Separate After Cluster Creation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can do this if you have already created a cluster and want to switch
its communication to another network, without rebuilding the whole cluster.
This change may lead to short periods of quorum loss in the cluster, as nodes
have to restart corosync and come up one after the other on the new network.

Check how to xref:pvecm_edit_corosync_conf[edit the corosync.conf file] first.
Then, open it and you should see a file similar to:

----
logging {
  debug: off
  to_syslog: yes
}

nodelist {

  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: due
  }

  node {
    name: tre
    nodeid: 3
    quorum_votes: 1
    ring0_addr: tre
  }

  node {
    name: uno
    nodeid: 1
    quorum_votes: 1
    ring0_addr: uno
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: testcluster
  config_version: 3
  ip_version: ipv4-6
  secauth: on
  version: 2
  interface {
    linknumber: 0
  }

}
----

NOTE: `ringX_addr` actually specifies a corosync *link address*. The name "ring"
is a remnant of older corosync versions that is kept for backwards
compatibility.

The first thing you want to do is add the 'name' properties in the node entries,
if you do not see them already. Those *must* match the node name.

Then replace all addresses from the 'ring0_addr' properties of all nodes with
the new addresses. You may use plain IP addresses or hostnames here. If you use
hostnames, ensure that they are resolvable from all nodes (see also
xref:pvecm_corosync_addresses[Link Address Types]).

In this example, we want to switch cluster communication to the
10.10.10.1/25 network, so we change the 'ring0_addr' of each node accordingly.

NOTE: The exact same procedure can be used to change other 'ringX_addr' values
as well. However, we recommend only changing one link address at a time, so
that it's easier to recover if something goes wrong.

After we increase the 'config_version' property, the new configuration file
should look like:

----
logging {
  debug: off
  to_syslog: yes
}

nodelist {

  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
  }

  node {
    name: tre
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.10.10.3
  }

  node {
    name: uno
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: testcluster
  config_version: 4
  ip_version: ipv4-6
  secauth: on
  version: 2
  interface {
    linknumber: 0
  }

}
----

Then, after a final check to see that all changed information is correct, we
save it and once again follow the
xref:pvecm_edit_corosync_conf[edit corosync.conf file] section to bring it into
effect.

The changes will be applied live, so restarting corosync is not strictly
necessary. If you changed other settings as well, or notice corosync
complaining, you can optionally trigger a restart.

On a single node execute:

[source,bash]
----
systemctl restart corosync
----

Now check if everything is okay:

[source,bash]
----
systemctl status corosync
----

If corosync begins to work again, restart it on all other nodes too.
They will then join the cluster membership one by one on the new network.

[[pvecm_corosync_addresses]]
Corosync Addresses
~~~~~~~~~~~~~~~~~~

A corosync link address (for backwards compatibility denoted by 'ringX_addr' in
`corosync.conf`) can be specified in two ways:

* **IPv4/v6 addresses** can be used directly. They are recommended, since they
are static and usually not changed carelessly.

* **Hostnames** will be resolved using `getaddrinfo`, which means that by
default, IPv6 addresses will be used first, if available (see also
`man gai.conf`). Keep this in mind, especially when upgrading an existing
cluster to IPv6.

CAUTION: Hostnames should be used with care, since the addresses they
resolve to can be changed without touching corosync or the node it runs on -
which may lead to a situation where an address is changed without thinking
about implications for corosync.

A separate, static hostname specifically for corosync is recommended, if
hostnames are preferred. Also, make sure that every node in the cluster can
resolve all hostnames correctly.

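One way to do this, as a hedged sketch (names and addresses are placeholders),
is an `/etc/hosts` entry per node that is used only for corosync:

----
# /etc/hosts on every cluster node
192.0.2.21  corosync-node1
192.0.2.22  corosync-node2
192.0.2.23  corosync-node3
----

The 'ringX_addr' entries in `corosync.conf` would then reference these names
instead of the regular node hostnames.
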
Since {pve} 5.1, while supported, hostnames will be resolved at the time of
entry. Only the resolved IP is saved to the configuration.

Nodes that joined the cluster on earlier versions likely still use their
unresolved hostname in `corosync.conf`. It might be a good idea to replace
them with IPs or a separate hostname, as mentioned above.

[[pvecm_redundancy]]
Corosync Redundancy
-------------------

Corosync supports redundant networking via its integrated Kronosnet layer by
default (it is not supported on the legacy udp/udpu transports). It can be
enabled by specifying more than one link address, either via the '--linkX'
parameters of `pvecm`, in the GUI as **Link 1** (while creating a cluster or
adding a new node) or by specifying more than one 'ringX_addr' in
`corosync.conf`.

NOTE: To provide useful failover, every link should be on its own
physical network connection.

Links are used according to a priority setting. You can configure this priority
by setting 'knet_link_priority' in the corresponding interface section in
`corosync.conf`, or, preferably, using the 'priority' parameter when creating
your cluster with `pvecm`:

----
 # pvecm create CLUSTERNAME --link0 10.10.10.1,priority=15 --link1 10.20.20.1,priority=20
----

This would cause 'link1' to be used first, since it has the higher priority.

If no priorities are configured manually (or two links have the same priority),
links will be used in order of their number, with the lower number having higher
priority.

Even if all links are working, only the one with the highest priority will see
corosync traffic. Link priorities cannot be mixed, meaning that links with
different priorities will not be able to communicate with each other.

Since lower priority links will not see traffic unless all higher priorities
have failed, it becomes a useful strategy to specify networks used for
other tasks (VMs, storage, etc.) as low-priority links. If worst comes to
worst, a higher latency or more congested connection might be better than no
connection at all.

Adding Redundant Links To An Existing Cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To add a new link to a running configuration, first check how to
xref:pvecm_edit_corosync_conf[edit the corosync.conf file].

Then, add a new 'ringX_addr' to every node in the `nodelist` section. Make
sure that your 'X' is the same for every node you add it to, and that it is
unique for each node.

Lastly, add a new 'interface', as shown below, to your `totem`
section, replacing 'X' with the link number chosen above.

Assuming you added a link with number 1, the new configuration file could look
like this:

----
logging {
  debug: off
  to_syslog: yes
}

nodelist {

  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
    ring1_addr: 10.20.20.2
  }

  node {
    name: tre
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.10.10.3
    ring1_addr: 10.20.20.3
  }

  node {
    name: uno
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
    ring1_addr: 10.20.20.1
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: testcluster
  config_version: 4
  ip_version: ipv4-6
  secauth: on
  version: 2
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
}
----

The new link will be enabled as soon as you follow the last steps to
xref:pvecm_edit_corosync_conf[edit the corosync.conf file]. A restart should not
be necessary. You can check that corosync loaded the new link using:

----
journalctl -b -u corosync
----

It might be a good idea to test the new link by temporarily disconnecting the
old link on one node and making sure that its status remains online while
disconnected:

----
pvecm status
----

If you see a healthy cluster state, it means that your new link is being used.

Role of SSH in {pve} Clusters
-----------------------------

{pve} utilizes SSH tunnels for various features.

* Proxying console/shell sessions (node and guests)
+
When using the shell for node B while being connected to node A, this connects
to a terminal proxy on node A, which is in turn connected to the login shell on
node B via a non-interactive SSH tunnel.

* VM and CT memory and local-storage migration in 'secure' mode.
+
During the migration, one or more SSH tunnel(s) are established between the
source and target nodes, in order to exchange migration information and
transfer memory and disk contents.

* Storage replication

.Pitfalls due to automatic execution of `.bashrc` and siblings
[IMPORTANT]
====
In case you have a custom `.bashrc`, or similar files that get executed on
login by the configured shell, `ssh` will automatically run it once the session
is established successfully. This can cause some unexpected behavior, as those
commands may be executed with root permissions on any of the operations
described above. This can cause possible problematic side-effects!

In order to avoid such complications, it's recommended to add a check in
`/root/.bashrc` to make sure the session is interactive, and only then run
`.bashrc` commands.

You can add this snippet at the beginning of your `.bashrc` file:

----
# Early exit if not running interactively to avoid side-effects!
case $- in
    *i*) ;;
      *) return;;
esac
----
====

Corosync External Vote Support
------------------------------

This section describes a way to deploy an external voter in a {pve} cluster.
When configured, the cluster can sustain more node failures without
violating safety properties of the cluster communication.

For this to work, there are two services involved:

* A QDevice daemon which runs on each {pve} node

* An external vote daemon which runs on an independent server

As a result, you can achieve higher availability, even in smaller setups (for
example 2+1 nodes).

QDevice Technical Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~

The Corosync Quorum Device (QDevice) is a daemon which runs on each cluster
node. It provides a configured number of votes to the cluster's quorum
subsystem, based on an externally running third-party arbitrator's decision.
Its primary use is to allow a cluster to sustain more node failures than
standard quorum rules allow. This can be done safely as the external device
can see all nodes and thus choose only one set of nodes to give its vote.
This will only be done if said set of nodes can have quorum (again) after
receiving the third-party vote.

Currently, only 'QDevice Net' is supported as a third-party arbitrator. This is
a daemon which provides a vote to a cluster partition, if it can reach the
partition members over the network. It will only give votes to one partition
of a cluster at any time.
It's designed to support multiple clusters and is almost configuration and
state free. New clusters are handled dynamically and no configuration file
is needed on the host running a QDevice.

The only requirements for the external host are that it needs network access to
the cluster and to have a corosync-qnetd package available. We provide a package
for Debian based hosts, and other Linux distributions should also have a package
available through their respective package manager.

NOTE: In contrast to corosync itself, a QDevice connects to the cluster over
TCP/IP. The daemon may even run outside of the cluster's LAN and can have longer
latencies than 2 ms.

Supported Setups
~~~~~~~~~~~~~~~~

We support QDevices for clusters with an even number of nodes and recommend
it for 2 node clusters, if they should provide higher availability.
For clusters with an odd node count, we currently discourage the use of
QDevices. The reason for this is the difference in the votes which the QDevice
provides for each cluster type. Even numbered clusters get a single additional
vote, which only increases availability, because if the QDevice
itself fails, you are in the same position as with no QDevice at all.

On the other hand, with an odd numbered cluster size, the QDevice provides
'(N-1)' votes -- where 'N' corresponds to the cluster node count. This
alternative behavior makes sense; if it had only one additional vote, the
cluster could get into a split-brain situation. This algorithm allows for all
nodes but one (and naturally the QDevice itself) to fail. However, there are two
drawbacks to this:

* If the QNet daemon itself fails, no other node may fail or the cluster
  immediately loses quorum. For example, in a cluster with 15 nodes, 7
  could fail before the cluster becomes inquorate. But, if a QDevice is
  configured here and it itself fails, **no single node** of the 15 may fail.
  The QDevice acts almost as a single point of failure in this case.

* The fact that all but one node plus QDevice may fail sounds promising at
  first, but this may result in a mass recovery of HA services, which could
  overload the single remaining node. Furthermore, a Ceph server will stop
  providing services if only '((N-1)/2)' nodes or fewer remain online.

If you understand the drawbacks and implications, you can decide for yourself
if you want to use this technology in an odd numbered cluster setup.

QDevice-Net Setup
~~~~~~~~~~~~~~~~~

We recommend running any daemon which provides votes to corosync-qdevice as an
unprivileged user. {pve} and Debian provide a package which is already
configured to do so.
The traffic between the daemon and the cluster must be encrypted to ensure a
safe and secure integration of the QDevice in {pve}.

First, install the 'corosync-qnetd' package on your external server

----
external# apt install corosync-qnetd
----

and the 'corosync-qdevice' package on all cluster nodes

----
pve# apt install corosync-qdevice
----

After doing this, ensure that all the nodes in the cluster are online.

You can now set up your QDevice by running the following command on one
of the {pve} nodes:

----
pve# pvecm qdevice setup <QDEVICE-IP>
----

The SSH key from the cluster will be automatically copied to the QDevice.

NOTE: Make sure that the SSH configuration on your external server allows root
login via password, if you are asked for a password during this step.

After you enter the password and all the steps have successfully completed, you
will see "Done". You can verify that the QDevice has been set up with:

----
pve# pvecm status

...

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.168.22.180 (local)
0x00000002          1    A,V,NMW 192.168.22.181
0x00000000          1            Qdevice

----


Frequently Asked Questions
~~~~~~~~~~~~~~~~~~~~~~~~~~

Tie Breaking
^^^^^^^^^^^^

In case of a tie, where two same-sized cluster partitions cannot see each other
but can see the QDevice, the QDevice chooses one of those partitions randomly
and provides a vote to it.

Possible Negative Implications
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For clusters with an even node count, there are no negative implications when
using a QDevice. If it fails to work, it is the same as not having a QDevice
at all.

Adding/Deleting Nodes After QDevice Setup
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you want to add a new node or remove an existing one from a cluster with a
QDevice setup, you need to remove the QDevice first. After that, you can add or
remove nodes normally. Once you have a cluster with an even node count again,
you can set up the QDevice again as described previously.

Removing the QDevice
^^^^^^^^^^^^^^^^^^^^

If you used the official `pvecm` tool to add the QDevice, you can remove it
by running:

----
pve# pvecm qdevice remove
----

//Still TODO
//^^^^^^^^^^
//There is still stuff to add here

Corosync Configuration
----------------------

The `/etc/pve/corosync.conf` file plays a central role in a {pve} cluster. It
controls the cluster membership and its network.
For further information about it, check the corosync.conf man page:
[source,bash]
----
man corosync.conf
----

For node membership, you should always use the `pvecm` tool provided by {pve}.
You may have to edit the configuration file manually for other changes.
Here are a few best practice tips for doing this.

[[pvecm_edit_corosync_conf]]
Edit corosync.conf
~~~~~~~~~~~~~~~~~~

Editing the corosync.conf file is not always very straightforward. There are
two on each cluster node, one in `/etc/pve/corosync.conf` and the other in
`/etc/corosync/corosync.conf`. Editing the one in our cluster file system will
propagate the changes to the local one, but not vice versa.

The configuration will get updated automatically, as soon as the file changes.
This means that changes which can be integrated in a running corosync will take
effect immediately. Thus, you should always make a copy and edit that instead,
to avoid triggering unintended changes when saving the file while editing.

[source,bash]
----
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
----

Then, open the config file with your favorite editor, such as `nano` or
`vim.tiny`, which come pre-installed on every {pve} node.

NOTE: Always increment the 'config_version' number after configuration changes;
omitting this can lead to problems.

After making the necessary changes, create another copy of the current working
configuration file. This serves as a backup if the new configuration fails to
apply or causes other issues.

[source,bash]
----
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
----

Then replace the old configuration file with the new one:
[source,bash]
----
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
----

You can check if the changes could be applied automatically, using the following
commands:
[source,bash]
----
systemctl status corosync
journalctl -b -u corosync
----

If the changes could not be applied automatically, you may have to restart the
corosync service via:
[source,bash]
----
systemctl restart corosync
----

On errors, check the troubleshooting section below.

Troubleshooting
~~~~~~~~~~~~~~~

Issue: 'quorum.expected_votes must be configured'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When corosync starts to fail and you get the following message in the system log:

----
[...]
corosync[1647]:  [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
corosync[1647]:  [SERV  ] Service engine 'corosync_quorum' failed to load for reason
                 'configuration error: nodelist or quorum.expected_votes must be configured!'
[...]
----

It means that the hostname you set for a corosync 'ringX_addr' in the
configuration could not be resolved.

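You can check name resolution for the affected entry directly on that node, for
example (the node name is taken from the sample configuration above):

----
# getent hosts due
----

If this returns nothing, fix the entry in `/etc/hosts` (or your DNS), or switch
the 'ringX_addr' to a plain IP address as recommended in
xref:pvecm_corosync_addresses[Corosync Addresses].
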
Write Configuration When Not Quorate
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you need to change '/etc/pve/corosync.conf' on a node with no quorum, and you
understand what you are doing, use:
[source,bash]
----
pvecm expected 1
----

This sets the expected vote count to 1 and makes the cluster quorate. You can
then fix your configuration, or revert it back to the last working backup.

This is not enough if corosync cannot start anymore. In that case, it is best to
edit the local copy of the corosync configuration in
'/etc/corosync/corosync.conf', so that corosync can start again. Ensure that on
all nodes, this configuration has the same content to avoid split-brain
situations.


[[pvecm_corosync_conf_glossary]]
Corosync Configuration Glossary
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ringX_addr::
This names the different link addresses for the Kronosnet connections between
nodes.


Cluster Cold Start
------------------

It is obvious that a cluster is not quorate when all nodes are
offline. This is a common case after a power failure.

NOTE: It is always a good idea to use an uninterruptible power supply
(``UPS'', also called ``battery backup'') to avoid this state, especially if
you want HA.

On node startup, the `pve-guests` service is started and waits for
quorum. Once quorate, it starts all guests which have the `onboot`
flag set.

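The `onboot` flag is part of the guest configuration. As a small sketch (the
VM and container IDs are placeholders), it can be enabled from the command line
with:

----
# qm set 100 --onboot 1    # virtual machine
# pct set 101 --onboot 1   # container
----
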
When you turn on nodes, or when power comes back after power failure,
it is likely that some nodes will boot faster than others. Please keep in
mind that guest startup is delayed until you reach quorum.


Guest Migration
---------------

Migrating virtual guests to other nodes is a useful feature in a
cluster. There are settings to control the behavior of such
migrations. This can be done via the configuration file
`datacenter.cfg` or for a specific migration via API or command line
parameters.

It makes a difference if a guest is online or offline, or if it has
local resources (like a local disk).

For details about virtual machine migration, see the
xref:qm_migration[QEMU/KVM Migration Chapter].

For details about container migration, see the
xref:pct_migration[Container Migration Chapter].

Migration Type
~~~~~~~~~~~~~~

The migration type defines if the migration data should be sent over an
encrypted (`secure`) channel or an unencrypted (`insecure`) one.
Setting the migration type to insecure means that the RAM content of a
virtual guest is also transferred unencrypted, which can lead to
information disclosure of critical data from inside the guest (for
example, passwords or encryption keys).

Therefore, we strongly recommend using the secure channel if you do
not have full control over the network and cannot guarantee that no
one is eavesdropping on it.

NOTE: Storage migration does not follow this setting. Currently, it
always sends the storage content over a secure channel.

Encryption requires a lot of computing power, so this setting is often
changed to `insecure` to achieve better performance. The impact on
modern systems is lower because they implement AES encryption in
hardware. The performance impact is particularly evident in fast
networks, where you can transfer 10 Gbps or more.

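As a hedged sketch of how this could be configured cluster-wide (the `migration`
property also accepts a network, as shown in the next section; verify the exact
syntax against your `datacenter.cfg` man page):

----
# /etc/pve/datacenter.cfg
migration: insecure
----
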
Migration Network
~~~~~~~~~~~~~~~~~

By default, {pve} uses the network in which cluster communication
takes place to send the migration traffic. This is not optimal, because
sensitive cluster traffic can be disrupted and this network may not
have the best bandwidth available on the node.

Setting the migration network parameter allows the use of a dedicated
network for all migration traffic. In addition to the memory,
this also affects the storage traffic for offline migrations.

The migration network is set as a network using CIDR notation. This
has the advantage that you don't have to set individual IP addresses
for each node. {pve} can determine the real address on the
destination node from the network specified in the CIDR form. To
enable this, the network must be specified so that each node has exactly one
IP in the respective network.

Example
^^^^^^^

We assume that we have a three-node setup, with three separate
networks. One for public communication with the Internet, one for
cluster communication, and a very fast one, which we want to use as a
dedicated network for migration.

A network configuration for such a setup might look as follows:

----
iface eno1 inet manual

# public network
auto vmbr0
iface vmbr0 inet static
        address 192.X.Y.57/24
        gateway 192.X.Y.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0

# cluster network
auto eno2
iface eno2 inet static
        address 10.1.1.1/24

# fast network
auto eno3
iface eno3 inet static
        address 10.1.2.1/24
----

Here, we will use the network 10.1.2.0/24 as a migration network. For
a single migration, you can do this using the `migration_network`
parameter of the command line tool:

----
# qm migrate 106 tre --online --migration_network 10.1.2.0/24
----

To configure this as the default network for all migrations in the
cluster, set the `migration` property of the `/etc/pve/datacenter.cfg`
file:

----
# use dedicated migration network
migration: secure,network=10.1.2.0/24
----

NOTE: The migration type must always be set when the migration network
is set in `/etc/pve/datacenter.cfg`.


ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]