[[chapter_pvecm]]
ifdef::manvolnum[]
pvecm(1)
========
:pve-toplevel:

NAME
----

pvecm - Proxmox VE Cluster Manager

SYNOPSIS
--------

include::pvecm.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Cluster Manager
===============
:pve-toplevel:
endif::manvolnum[]

The {pve} cluster manager `pvecm` is a tool to create a group of
physical servers. Such a group is called a *cluster*. We use the
http://www.corosync.org[Corosync Cluster Engine] for reliable group
communication. There's no explicit limit for the number of nodes in a cluster.
In practice, the actual possible node count may be limited by the host and
network performance. Currently (2021), there are reports of clusters (using
high-end enterprise hardware) with over 50 nodes in production.

`pvecm` can be used to create a new cluster, join nodes to a cluster,
leave the cluster, get status information, and do various other cluster-related
tasks. The **P**rox**m**o**x** **C**luster **F**ile **S**ystem (``pmxcfs'')
is used to transparently distribute the cluster configuration to all cluster
nodes.

Grouping nodes into a cluster has the following advantages:

* Centralized, web-based management

* Multi-master clusters: each node can do all management tasks

* Use of `pmxcfs`, a database-driven file system, for storing configuration
  files, replicated in real-time on all nodes using `corosync`

* Easy migration of virtual machines and containers between physical
  hosts

* Fast deployment

* Cluster-wide services like firewall and HA

Requirements
------------

* All nodes must be able to connect to each other via UDP ports 5404 and 5405
  for corosync to work.

* Date and time must be synchronized (see the quick checks below).

* An SSH tunnel on TCP port 22 between nodes is required.

* If you are interested in High Availability, you need to have at
  least three nodes for reliable quorum. All nodes should have the
  same version.

* We recommend a dedicated NIC for the cluster traffic, especially if
  you use shared storage.

* The root password of a cluster node is required for adding nodes.

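A minimal sketch of how the time-synchronization and SSH requirements can be
verified from a node (the peer address is just a placeholder for one of your
own nodes):

[source,bash]
----
# the system clock should report as synchronized (NTP)
timedatectl status

# SSH on TCP port 22 to a peer node should succeed
# (you may be prompted for the root password if no key is set up yet)
ssh root@192.168.15.92 true && echo "SSH to peer OK"
----
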
NOTE: It is not possible to mix {pve} 3.x and earlier with {pve} 4.X cluster
nodes.

NOTE: While it's possible to mix {pve} 4.4 and {pve} 5.0 nodes, doing so is
not supported as a production configuration and should only be done temporarily,
during an upgrade of the whole cluster from one major version to another.

NOTE: Running a cluster of {pve} 6.x with earlier versions is not possible. The
cluster protocol (corosync) between {pve} 6.x and earlier versions changed
fundamentally. The corosync 3 packages for {pve} 5.4 are only intended for the
upgrade procedure to {pve} 6.0.

Preparing Nodes
---------------

First, install {pve} on all nodes. Make sure that each node is
installed with the final hostname and IP configuration. Changing the
hostname and IP is not possible after cluster creation.

While it's common to reference all node names and their IPs in `/etc/hosts` (or
make their names resolvable through other means), this is not necessary for a
cluster to work. It may be useful however, as you can then connect from one node
to another via SSH, using the easier to remember node name (see also
xref:pvecm_corosync_addresses[Link Address Types]). Note that we always
recommend referencing nodes by their IP addresses in the cluster configuration.

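For example, such `/etc/hosts` entries could look like this (the domain is
hypothetical; the node names and addresses match the status examples further
below):

----
192.168.15.91 hp1.example.local hp1
192.168.15.92 hp2.example.local hp2
192.168.15.93 hp3.example.local hp3
192.168.15.94 hp4.example.local hp4
----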

[[pvecm_create_cluster]]
Create a Cluster
----------------

You can either create a cluster on the console (login via `ssh`), or through
the API using the {pve} web interface (__Datacenter -> Cluster__).

NOTE: Use a unique name for your cluster. This name cannot be changed later.
The cluster name follows the same rules as node names.

[[pvecm_cluster_create_via_gui]]
Create via Web GUI
~~~~~~~~~~~~~~~~~~

[thumbnail="screenshot/gui-cluster-create.png"]

Under __Datacenter -> Cluster__, click on *Create Cluster*. Enter the cluster
name and select a network connection from the drop-down list to serve as the
main cluster network (Link 0). It defaults to the IP resolved via the node's
hostname.

To add a second link as fallback, you can select the 'Advanced' checkbox and
choose an additional network interface (Link 1, see also
xref:pvecm_redundancy[Corosync Redundancy]).

NOTE: Ensure that the network selected for cluster communication is not used for
any high traffic purposes, like network storage or live-migration.
While the cluster network itself produces small amounts of data, it is very
sensitive to latency. Check out the full
xref:pvecm_cluster_network_requirements[cluster network requirements].

[[pvecm_cluster_create_via_cli]]
Create via the Command Line
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Log in via `ssh` to the first {pve} node and run the following command:

----
 hp1# pvecm create CLUSTERNAME
----

To check the state of the new cluster use:

----
 hp1# pvecm status
----

Multiple Clusters in the Same Network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is possible to create multiple clusters in the same physical or logical
network. In this case, each cluster must have a unique name to avoid possible
clashes in the cluster communication stack. Furthermore, this helps avoid human
confusion by making clusters clearly distinguishable.

While the bandwidth requirement of a corosync cluster is relatively low, the
latency of packets and the packets per second (PPS) rate is the limiting
factor. Different clusters in the same network can compete with each other for
these resources, so it may still make sense to use separate physical network
infrastructure for bigger clusters.

[[pvecm_join_node_to_cluster]]
Adding Nodes to the Cluster
---------------------------

CAUTION: A node that is about to be added to the cluster cannot hold any guests.
All existing configuration in `/etc/pve` is overwritten when joining a cluster,
since guest IDs could otherwise conflict. As a workaround, you can create a
backup of the guest (`vzdump`) and restore it under a different ID, after the
node has been added to the cluster.

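A minimal sketch of this workaround, assuming a VM with ID 100 and a backup
storage named `local` on the joining node (both hypothetical):

[source,bash]
----
# on the node that will join, before joining the cluster
vzdump 100 --storage local --mode stop

# after the node has joined, restore the backup under an unused VMID (e.g. 120)
# (the archive path is a placeholder; check /var/lib/vz/dump for the real name)
qmrestore /var/lib/vz/dump/vzdump-qemu-100-<timestamp>.vma.zst 120
----
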
Join Node to Cluster via GUI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[thumbnail="screenshot/gui-cluster-join-information.png"]

Log in to the web interface on an existing cluster node. Under __Datacenter ->
Cluster__, click the *Join Information* button at the top. Then, click on the
button *Copy Information*. Alternatively, copy the string from the 'Information'
field manually.

[thumbnail="screenshot/gui-cluster-join.png"]

Next, log in to the web interface on the node you want to add.
Under __Datacenter -> Cluster__, click on *Join Cluster*. Fill in the
'Information' field with the 'Join Information' text you copied earlier.
Most settings required for joining the cluster will be filled out
automatically. For security reasons, the cluster password has to be entered
manually.

NOTE: To enter all required data manually, you can disable the 'Assisted Join'
checkbox.

After clicking the *Join* button, the cluster join process will start
immediately. After the node has joined the cluster, its current node certificate
will be replaced by one signed by the cluster certificate authority (CA).
This means that the current session will stop working after a few seconds. You
then might need to force-reload the web interface and log in again with the
cluster credentials.

Now your node should be visible under __Datacenter -> Cluster__.

Join Node to Cluster via Command Line
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Log in to the node you want to join into an existing cluster via `ssh`.

----
 hp2# pvecm add IP-ADDRESS-CLUSTER
----

For `IP-ADDRESS-CLUSTER`, use the IP or hostname of an existing cluster node.
An IP address is recommended (see xref:pvecm_corosync_addresses[Link Address Types]).

To check the state of the cluster use:

----
 # pvecm status
----

.Cluster status after adding 4 nodes
----
hp2# pvecm status
Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1/8
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.91
0x00000002          1 192.168.15.92 (local)
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
----

If you only want a list of all nodes, use:

----
 # pvecm nodes
----

.List nodes in a cluster
----
hp2# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1
         2          1 hp2 (local)
         3          1 hp3
         4          1 hp4
----

[[pvecm_adding_nodes_with_separated_cluster_network]]
Adding Nodes with Separated Cluster Network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When adding a node to a cluster with a separated cluster network, you need to
use the 'link0' parameter to set the node's address on that network:

[source,bash]
----
pvecm add IP-ADDRESS-CLUSTER -link0 LOCAL-IP-ADDRESS-LINK0
----

If you want to use the built-in xref:pvecm_redundancy[redundancy] of the
Kronosnet transport layer, also use the 'link1' parameter.

Using the GUI, you can select the correct interface from the corresponding
'Link X' fields in the *Cluster Join* dialog.

Remove a Cluster Node
---------------------

CAUTION: Read the procedure carefully before proceeding, as it may
not be what you want or need.

Move all virtual machines from the node. Make sure you have made copies of any
local data or backups that you want to keep. In the following example, we will
remove the node hp4 from the cluster.

Log in to a *different* cluster node (not hp4), and issue a `pvecm nodes`
command to identify the node ID to remove:

----
hp1# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1 (local)
         2          1 hp2
         3          1 hp3
         4          1 hp4
----

At this point, you must power off hp4 and ensure that it will not power on
again (in the network) with its current configuration.

IMPORTANT: As mentioned above, it is critical to power off the node
*before* removal, and make sure that it will *not* power on again
(in the existing cluster network) with its current configuration.
If you power on the node as it is, the cluster could end up broken,
and it could be difficult to restore it to a functioning state.

After powering off the node hp4, we can safely remove it from the cluster.

----
 hp1# pvecm delnode hp4
 Killing node 4
----

Use `pvecm nodes` or `pvecm status` to check the node list again. It should
look something like:

----
hp1# pvecm status

Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:44:28 2015
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1/8
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.90 (local)
0x00000002          1 192.168.15.91
0x00000003          1 192.168.15.92
----

If, for whatever reason, you want this server to join the same cluster again,
you have to:

* do a fresh install of {pve} on it,

* then join it, as explained in the previous section.

NOTE: After removal of the node, its SSH fingerprint will still reside in the
'known_hosts' of the other nodes. If you receive an SSH error after rejoining
a node with the same IP or hostname, run `pvecm updatecerts` once on the
re-added node to update its fingerprint cluster wide.

[[pvecm_separate_node_without_reinstall]]
Separate a Node Without Reinstalling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

CAUTION: This is *not* the recommended method, proceed with caution. Use the
previous method if you're unsure.

You can also separate a node from a cluster without reinstalling it from
scratch. But after removing the node from the cluster, it will still have
access to any shared storage. This must be resolved before you start removing
the node from the cluster. A {pve} cluster cannot share the exact same
storage with another cluster, as storage locking doesn't work over the cluster
boundary. Furthermore, it may also lead to VMID conflicts.

It's suggested that you create a new storage, where only the node which you want
to separate has access. This can be a new export on your NFS or a new Ceph
pool, to name a few examples. It's just important that the exact same storage
does not get accessed by multiple clusters. After setting up this storage, move
all data and VMs from the node to it. Then you are ready to separate the
node from the cluster.

WARNING: Ensure that all shared resources are cleanly separated! Otherwise you
will run into conflicts and problems.

First, stop the corosync and pve-cluster services on the node:
[source,bash]
----
systemctl stop pve-cluster
systemctl stop corosync
----

Start the cluster file system again in local mode:
[source,bash]
----
pmxcfs -l
----

Delete the corosync configuration files:
[source,bash]
----
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*
----

You can now start the file system again as a normal service:
[source,bash]
----
killall pmxcfs
systemctl start pve-cluster
----

The node is now separated from the cluster. You can delete it from any
remaining node of the cluster with:
[source,bash]
----
pvecm delnode oldnode
----

If the command fails due to a loss of quorum in the remaining node, you can set
the expected votes to 1 as a workaround:
[source,bash]
----
pvecm expected 1
----

And then repeat the 'pvecm delnode' command.

Now switch back to the separated node and delete all the remaining cluster
files on it. This ensures that the node can be added to another cluster again
without problems.

[source,bash]
----
rm /var/lib/corosync/*
----

As the configuration files from the other nodes are still in the cluster
file system, you may want to clean those up too. After making absolutely sure
that you have the correct node name, you can simply remove the entire
directory recursively from '/etc/pve/nodes/NODENAME'.

CAUTION: The node's SSH keys will remain in the 'authorized_keys' file. This
means that the nodes can still connect to each other with public key
authentication. You should fix this by removing the respective keys from the
'/etc/pve/priv/authorized_keys' file.


Quorum
------

{pve} uses a quorum-based technique to provide a consistent state among
all cluster nodes.

[quote, from Wikipedia, Quorum (distributed computing)]
____
A quorum is the minimum number of votes that a distributed transaction
has to obtain in order to be allowed to perform an operation in a
distributed system.
____

In case of network partitioning, state changes require that a
majority of nodes are online. The cluster switches to read-only mode
if it loses quorum.

NOTE: {pve} assigns a single vote to each node by default.

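As a simple illustration, with the default of one vote per node a partition is
quorate only if it holds a strict majority of all configured votes:

----
quorum = floor(total_votes / 2) + 1

3 nodes -> 2 votes required
4 nodes -> 3 votes required
5 nodes -> 3 votes required
----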

Cluster Network
---------------

The cluster network is the core of a cluster. All messages sent over it have to
be delivered reliably to all nodes in their respective order. In {pve} this
part is done by corosync, an implementation of a high performance, low overhead,
high availability development toolkit. It serves our decentralized configuration
file system (`pmxcfs`).

[[pvecm_cluster_network_requirements]]
Network Requirements
~~~~~~~~~~~~~~~~~~~~
This needs a reliable network with latencies under 2 milliseconds (LAN
performance) to work properly. The network should not be used heavily by other
members; ideally corosync runs on its own network. Do not use a shared network
for corosync and storage (except as a potential low-priority fallback in a
xref:pvecm_redundancy[redundant] configuration).

Before setting up a cluster, it is good practice to check if the network is fit
for that purpose. To ensure that the nodes can connect to each other on the
cluster network, you can test the connectivity between them with the `ping`
tool.

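For example, a quick latency check from one node to another could look like
this (the address is a placeholder for one of your own nodes):

[source,bash]
----
# round-trip times should stay well below 2 ms on a network suitable for corosync
ping -c 10 192.168.15.92
----
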
If the {pve} firewall is enabled, ACCEPT rules for corosync will automatically
be generated - no manual action is required.

NOTE: Corosync used Multicast before version 3.0 (introduced in {pve} 6.0).
Modern versions rely on https://kronosnet.org/[Kronosnet] for cluster
communication, which, for now, only supports regular UDP unicast.

CAUTION: You can still enable Multicast or legacy unicast by setting your
transport to `udp` or `udpu` in your xref:pvecm_edit_corosync_conf[corosync.conf],
but keep in mind that this will disable all cryptography and redundancy support.
This is therefore not recommended.

Separate Cluster Network
~~~~~~~~~~~~~~~~~~~~~~~~

When creating a cluster without any parameters, the corosync cluster network is
generally shared with the web interface and the VMs' network. Depending on
your setup, even storage traffic may get sent over the same network. It's
recommended to change that, as corosync is a time-critical, real-time
application.

Setting Up a New Network
^^^^^^^^^^^^^^^^^^^^^^^^

First, you have to set up a new network interface. It should be on a physically
separate network. Ensure that your network fulfills the
xref:pvecm_cluster_network_requirements[cluster network requirements].

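A minimal sketch of such an interface in `/etc/network/interfaces`, using the
10.10.10.1/25 address from the example in the next section (the NIC name `eno4`
is hypothetical):

----
# dedicated corosync network (hypothetical NIC)
auto eno4
iface eno4 inet static
        address 10.10.10.1
        netmask 255.255.255.128
----
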
Separate On Cluster Creation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is possible via the 'linkX' parameters of the 'pvecm create'
command, used for creating a new cluster.

If you have set up an additional NIC with a static address on 10.10.10.1/25,
and want to send and receive all cluster communication over this interface,
you would execute:

[source,bash]
----
pvecm create test --link0 10.10.10.1
----

To check if everything is working properly, execute:
[source,bash]
----
systemctl status corosync
----

Afterwards, proceed as described above to
xref:pvecm_adding_nodes_with_separated_cluster_network[add nodes with a separated cluster network].

[[pvecm_separate_cluster_net_after_creation]]
Separate After Cluster Creation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can do this if you have already created a cluster and want to switch
its communication to another network, without rebuilding the whole cluster.
This change may lead to short periods of quorum loss in the cluster, as nodes
have to restart corosync and come up one after the other on the new network.

Check how to xref:pvecm_edit_corosync_conf[edit the corosync.conf file] first.
Then, open it and you should see a file similar to:

----
logging {
  debug: off
  to_syslog: yes
}

nodelist {

  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: due
  }

  node {
    name: tre
    nodeid: 3
    quorum_votes: 1
    ring0_addr: tre
  }

  node {
    name: uno
    nodeid: 1
    quorum_votes: 1
    ring0_addr: uno
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: testcluster
  config_version: 3
  ip_version: ipv4-6
  secauth: on
  version: 2
  interface {
    linknumber: 0
  }

}
----

NOTE: `ringX_addr` actually specifies a corosync *link address*. The name "ring"
is a remnant of older corosync versions that is kept for backwards
compatibility.

The first thing you want to do is add the 'name' properties in the node entries,
if you do not see them already. Those *must* match the node name.

Then replace all addresses from the 'ring0_addr' properties of all nodes with
the new addresses. You may use plain IP addresses or hostnames here. If you use
hostnames, ensure that they are resolvable from all nodes (see also
xref:pvecm_corosync_addresses[Link Address Types]).

In this example, we want to switch cluster communication to the
10.10.10.1/25 network, so we change the 'ring0_addr' of each node respectively.

NOTE: The exact same procedure can be used to change other 'ringX_addr' values
as well. However, we recommend only changing one link address at a time, so
that it's easier to recover if something goes wrong.

After we increase the 'config_version' property, the new configuration file
should look like:

----
logging {
  debug: off
  to_syslog: yes
}

nodelist {

  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
  }

  node {
    name: tre
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.10.10.3
  }

  node {
    name: uno
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: testcluster
  config_version: 4
  ip_version: ipv4-6
  secauth: on
  version: 2
  interface {
    linknumber: 0
  }

}
----

Then, after a final check to see that all changed information is correct, we
save it and once again follow the
xref:pvecm_edit_corosync_conf[edit corosync.conf file] section to bring it into
effect.

The changes will be applied live, so restarting corosync is not strictly
necessary. If you changed other settings as well, or notice corosync
complaining, you can optionally trigger a restart.

On a single node execute:

[source,bash]
----
systemctl restart corosync
----

Now check if everything is okay:

[source,bash]
----
systemctl status corosync
----

If corosync begins to work again, restart it on all other nodes too.
They will then join the cluster membership one by one on the new network.

[[pvecm_corosync_addresses]]
Corosync Addresses
~~~~~~~~~~~~~~~~~~

A corosync link address (for backwards compatibility denoted by 'ringX_addr' in
`corosync.conf`) can be specified in two ways:

* **IPv4/v6 addresses** can be used directly. They are recommended, since they
are static and usually not changed carelessly.

* **Hostnames** will be resolved using `getaddrinfo`, which means that by
default, IPv6 addresses will be used first, if available (see also
`man gai.conf`). Keep this in mind, especially when upgrading an existing
cluster to IPv6 (see the resolution check below).

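To see which addresses `getaddrinfo` returns for a given name, and in which
order, you can query it with `getent` (the node name is just an example):

[source,bash]
----
# lists the addresses getaddrinfo would use for this name, in resolution order
getent ahosts hp1
----
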
CAUTION: Hostnames should be used with care, since the addresses they
resolve to can be changed without touching corosync or the node it runs on -
which may lead to a situation where an address is changed without thinking
about implications for corosync.

A separate, static hostname specifically for corosync is recommended, if
hostnames are preferred. Also, make sure that every node in the cluster can
resolve all hostnames correctly.

Since {pve} 5.1, while supported, hostnames will be resolved at the time of
entry. Only the resolved IP is saved to the configuration.

Nodes that joined the cluster on earlier versions likely still use their
unresolved hostname in `corosync.conf`. It might be a good idea to replace
them with IPs or a separate hostname, as mentioned above.

[[pvecm_redundancy]]
Corosync Redundancy
-------------------

Corosync supports redundant networking via its integrated Kronosnet layer by
default (it is not supported on the legacy udp/udpu transports). It can be
enabled by specifying more than one link address, either via the '--linkX'
parameters of `pvecm`, in the GUI as **Link 1** (while creating a cluster or
adding a new node) or by specifying more than one 'ringX_addr' in
`corosync.conf`.

NOTE: To provide useful failover, every link should be on its own
physical network connection.

Links are used according to a priority setting. You can configure this priority
by setting 'knet_link_priority' in the corresponding interface section in
`corosync.conf`, or, preferably, using the 'priority' parameter when creating
your cluster with `pvecm`:

----
 # pvecm create CLUSTERNAME --link0 10.10.10.1,priority=15 --link1 10.20.20.1,priority=20
----

This would cause 'link1' to be used first, since it has the higher priority.

If no priorities are configured manually (or two links have the same priority),
links will be used in order of their number, with the lower number having higher
priority.

Even if all links are working, only the one with the highest priority will see
corosync traffic. Link priorities cannot be mixed, meaning that links with
different priorities will not be able to communicate with each other.

Since lower priority links will not see traffic unless all higher priorities
have failed, it becomes a useful strategy to specify networks used for
other tasks (VMs, storage, etc.) as low-priority links. If worst comes to
worst, a higher latency or more congested connection might be better than no
connection at all.

Adding Redundant Links To An Existing Cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To add a new link to a running configuration, first check how to
xref:pvecm_edit_corosync_conf[edit the corosync.conf file].

Then, add a new 'ringX_addr' to every node in the `nodelist` section. Make
sure that your 'X' is the same for every node you add it to, and that it is
unique for each node.

Lastly, add a new 'interface', as shown below, to your `totem`
section, replacing 'X' with the link number chosen above.

Assuming you added a link with number 1, the new configuration file could look
like this:

----
logging {
  debug: off
  to_syslog: yes
}

nodelist {

  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
    ring1_addr: 10.20.20.2
  }

  node {
    name: tre
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.10.10.3
    ring1_addr: 10.20.20.3
  }

  node {
    name: uno
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
    ring1_addr: 10.20.20.1
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: testcluster
  config_version: 4
  ip_version: ipv4-6
  secauth: on
  version: 2
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
}
----

The new link will be enabled as soon as you follow the last steps to
xref:pvecm_edit_corosync_conf[edit the corosync.conf file]. A restart should not
be necessary. You can check that corosync loaded the new link using:

----
journalctl -b -u corosync
----

It might be a good idea to test the new link by temporarily disconnecting the
old link on one node and making sure that its status remains online while
disconnected:

----
pvecm status
----

If you see a healthy cluster state, it means that your new link is being used.

Role of SSH in {pve} Clusters
-----------------------------

{pve} utilizes SSH tunnels for various features.

* Proxying console/shell sessions (node and guests)
+
When using the shell for node B while being connected to node A, the session
connects to a terminal proxy on node A, which is in turn connected to the login
shell on node B via a non-interactive SSH tunnel.

* VM and CT memory and local-storage migration in 'secure' mode.
+
During the migration, one or more SSH tunnel(s) are established between the
source and target nodes, in order to exchange migration information and
transfer memory and disk contents.

* Storage replication

.Pitfalls due to automatic execution of `.bashrc` and siblings
[IMPORTANT]
====
In case you have a custom `.bashrc`, or similar files that get executed on
login by the configured shell, `ssh` will automatically run it once the session
is established successfully. This can cause some unexpected behavior, as those
commands may be executed with root permissions on any of the operations
described above. This can have problematic side effects!

In order to avoid such complications, it's recommended to add a check in
`/root/.bashrc` to make sure the session is interactive, and only then run
`.bashrc` commands.

You can add this snippet at the beginning of your `.bashrc` file:

----
# Early exit if not running interactively to avoid side-effects!
case $- in
    *i*) ;;
      *) return;;
esac
----
====

Corosync External Vote Support
------------------------------

This section describes a way to deploy an external voter in a {pve} cluster.
When configured, the cluster can sustain more node failures without
violating safety properties of the cluster communication.

For this to work, there are two services involved:

* A QDevice daemon which runs on each {pve} node

* An external vote daemon which runs on an independent server

As a result, you can achieve higher availability, even in smaller setups (for
example 2+1 nodes).

QDevice Technical Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~

The Corosync Quorum Device (QDevice) is a daemon which runs on each cluster
node. It provides a configured number of votes to the cluster's quorum
subsystem, based on an externally running third-party arbitrator's decision.
Its primary use is to allow a cluster to sustain more node failures than
standard quorum rules allow. This can be done safely as the external device
can see all nodes and thus choose only one set of nodes to give its vote.
This will only be done if said set of nodes can have quorum (again) after
receiving the third-party vote.

Currently, only 'QDevice Net' is supported as a third-party arbitrator. This is
a daemon which provides a vote to a cluster partition, if it can reach the
partition members over the network. It will only give votes to one partition
of a cluster at any time.
It's designed to support multiple clusters and is almost configuration and
state free. New clusters are handled dynamically and no configuration file
is needed on the host running a QDevice.

The only requirements for the external host are that it needs network access to
the cluster and to have a corosync-qnetd package available. We provide a package
for Debian based hosts, and other Linux distributions should also have a package
available through their respective package manager.

NOTE: In contrast to corosync itself, a QDevice connects to the cluster over
TCP/IP. The daemon may even run outside of the cluster's LAN and can have longer
latencies than 2 ms.

Supported Setups
~~~~~~~~~~~~~~~~

We support QDevices for clusters with an even number of nodes and recommend
it for 2 node clusters, if they should provide higher availability.
For clusters with an odd node count, we currently discourage the use of
QDevices. The reason for this is the difference in the votes which the QDevice
provides for each cluster type. Even numbered clusters get a single additional
vote, which only increases availability, because if the QDevice
itself fails, you are in the same position as with no QDevice at all.

On the other hand, with an odd numbered cluster size, the QDevice provides
'(N-1)' votes -- where 'N' corresponds to the cluster node count. This
alternative behavior makes sense; if it had only one additional vote, the
cluster could get into a split-brain situation. This algorithm allows for all
nodes but one (and naturally the QDevice itself) to fail. However, there are two
drawbacks to this:

* If the QNet daemon itself fails, no other node may fail or the cluster
  immediately loses quorum. For example, in a cluster with 15 nodes, 7
  could fail before the cluster becomes inquorate. But, if a QDevice is
  configured here and it itself fails, **no single node** of the 15 may fail.
  The QDevice acts almost as a single point of failure in this case.

* The fact that all but one node plus QDevice may fail sounds promising at
  first, but this may result in a mass recovery of HA services, which could
  overload the single remaining node. Furthermore, a Ceph server will stop
  providing services if only '((N-1)/2)' nodes or less remain online.

If you understand the drawbacks and implications, you can decide yourself if
you want to use this technology in an odd numbered cluster setup.

QDevice-Net Setup
~~~~~~~~~~~~~~~~~

We recommend running any daemon which provides votes to corosync-qdevice as an
unprivileged user. {pve} and Debian provide a package which is already
configured to do so.
The traffic between the daemon and the cluster must be encrypted to ensure a
safe and secure integration of the QDevice in {pve}.

First, install the 'corosync-qnetd' package on your external server

----
external# apt install corosync-qnetd
----

and the 'corosync-qdevice' package on all cluster nodes

----
pve# apt install corosync-qdevice
----

After doing this, ensure that all the nodes in the cluster are online.

You can now set up your QDevice by running the following command on one
of the {pve} nodes:

----
pve# pvecm qdevice setup <QDEVICE-IP>
----

The SSH key from the cluster will be automatically copied to the QDevice.

NOTE: Make sure that the SSH configuration on your external server allows root
login via password, if you are asked for a password during this step.

After you enter the password and all the steps have successfully completed, you
will see "Done". You can verify that the QDevice has been set up with:

----
pve# pvecm status

...

Votequorum information
~~~~~~~~~~~~~~~~~~~~~
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.168.22.180 (local)
0x00000002          1    A,V,NMW 192.168.22.181
0x00000000          1            Qdevice

----

Frequently Asked Questions
~~~~~~~~~~~~~~~~~~~~~~~~~~

Tie Breaking
^^^^^^^^^^^^

In case of a tie, where two same-sized cluster partitions cannot see each other
but can see the QDevice, the QDevice chooses one of those partitions randomly
and provides a vote to it.

Possible Negative Implications
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For clusters with an even node count, there are no negative implications when
using a QDevice. If it fails to work, it is the same as not having a QDevice
at all.

Adding/Deleting Nodes After QDevice Setup
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you want to add a new node or remove an existing one from a cluster with a
QDevice setup, you need to remove the QDevice first. After that, you can add or
remove nodes normally. Once you have a cluster with an even node count again,
you can set up the QDevice again as described previously.

Removing the QDevice
^^^^^^^^^^^^^^^^^^^^

If you used the official `pvecm` tool to add the QDevice, you can remove it
by running:

----
pve# pvecm qdevice remove
----

//Still TODO
//^^^^^^^^^^
//There is still stuff to add here

Corosync Configuration
----------------------

The `/etc/pve/corosync.conf` file plays a central role in a {pve} cluster. It
controls the cluster membership and its network.
For further information about it, check the corosync.conf man page:
[source,bash]
----
man corosync.conf
----

For node membership, you should always use the `pvecm` tool provided by {pve}.
You may have to edit the configuration file manually for other changes.
Here are a few best practice tips for doing this.

[[pvecm_edit_corosync_conf]]
Edit corosync.conf
~~~~~~~~~~~~~~~~~~

Editing the corosync.conf file is not always very straightforward. There are
two on each cluster node, one in `/etc/pve/corosync.conf` and the other in
`/etc/corosync/corosync.conf`. Editing the one in our cluster file system will
propagate the changes to the local one, but not vice versa.

The configuration will get updated automatically, as soon as the file changes.
This means that changes which can be integrated in a running corosync will take
effect immediately. Thus, you should always make a copy and edit that instead,
to avoid triggering unintended changes when saving the file while editing.

[source,bash]
----
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
----

Then, open the config file with your favorite editor, such as `nano` or
`vim.tiny`, which come pre-installed on every {pve} node.

NOTE: Always increment the 'config_version' number after configuration changes;
omitting this can lead to problems.

After making the necessary changes, create another copy of the current working
configuration file. This serves as a backup if the new configuration fails to
apply or causes other issues.

[source,bash]
----
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
----

Then replace the old configuration file with the new one:
[source,bash]
----
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
----

You can check if the changes could be applied automatically, using the following
commands:
[source,bash]
----
systemctl status corosync
journalctl -b -u corosync
----

If the changes could not be applied automatically, you may have to restart the
corosync service via:
[source,bash]
----
systemctl restart corosync
----

On errors, check the troubleshooting section below.

Troubleshooting
~~~~~~~~~~~~~~~

Issue: 'quorum.expected_votes must be configured'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When corosync starts to fail and you get the following message in the system log:

----
[...]
corosync[1647]:  [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
corosync[1647]:  [SERV  ] Service engine 'corosync_quorum' failed to load for reason
 'configuration error: nodelist or quorum.expected_votes must be configured!'
[...]
----

It means that the hostname you set for a corosync 'ringX_addr' in the
configuration could not be resolved.

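One way to verify this is to check whether the names from `corosync.conf`
resolve on the affected node; a sketch using the node names from the earlier
example configuration:

[source,bash]
----
# each name used as a ringX_addr should resolve to an address
getent hosts due tre uno
----
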
Write Configuration When Not Quorate
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you need to change '/etc/pve/corosync.conf' on a node with no quorum, and you
understand what you are doing, use:
[source,bash]
----
pvecm expected 1
----

This sets the expected vote count to 1 and makes the cluster quorate. You can
then fix your configuration, or revert it back to the last working backup.

This is not enough if corosync cannot start anymore. In that case, it is best to
edit the local copy of the corosync configuration in
'/etc/corosync/corosync.conf', so that corosync can start again. Ensure that on
all nodes, this configuration has the same content to avoid split-brain
situations.


[[pvecm_corosync_conf_glossary]]
Corosync Configuration Glossary
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ringX_addr::
This names the different link addresses for the Kronosnet connections between
nodes.


Cluster Cold Start
------------------

It is obvious that a cluster is not quorate when all nodes are
offline. This is a common case after a power failure.

NOTE: It is always a good idea to use an uninterruptible power supply
(``UPS'', also called ``battery backup'') to avoid this state, especially if
you want HA.

On node startup, the `pve-guests` service is started and waits for
quorum. Once quorate, it starts all guests which have the `onboot`
flag set.

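The flag can be set per guest, for example (the IDs are placeholders):

[source,bash]
----
# start this VM / container automatically once the node is quorate
qm set 100 --onboot 1
pct set 101 --onboot 1
----
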
When you turn on nodes, or when power comes back after power failure,
it is likely that some nodes will boot faster than others. Please keep in
mind that guest startup is delayed until you reach quorum.

Guest Migration
---------------

Migrating virtual guests to other nodes is a useful feature in a
cluster. There are settings to control the behavior of such
migrations. This can be done via the configuration file
`datacenter.cfg` or for a specific migration via API or command line
parameters.

It makes a difference if a guest is online or offline, or if it has
local resources (like a local disk).

For details about virtual machine migration, see the
xref:qm_migration[QEMU/KVM Migration Chapter].

For details about container migration, see the
xref:pct_migration[Container Migration Chapter].

Migration Type
~~~~~~~~~~~~~~

The migration type defines if the migration data should be sent over an
encrypted (`secure`) channel or an unencrypted (`insecure`) one.
Setting the migration type to insecure means that the RAM content of a
virtual guest is also transferred unencrypted, which can lead to
information disclosure of critical data from inside the guest (for
example, passwords or encryption keys).

Therefore, we strongly recommend using the secure channel if you do
not have full control over the network and cannot guarantee that no
one is eavesdropping on it.

NOTE: Storage migration does not follow this setting. Currently, it
always sends the storage content over a secure channel.

Encryption requires a lot of computing power, so this setting is often
changed to `insecure` to achieve better performance. The impact on
modern systems is lower because they implement AES encryption in
hardware. The performance impact is particularly evident in fast
networks, where you can transfer 10 Gbps or more.

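For a single migration, the type can also be selected on the command line; a
sketch using the same example VM and target node as in the next section:

----
# qm migrate 106 tre --online --migration_type insecure
----
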
Migration Network
~~~~~~~~~~~~~~~~~

By default, {pve} uses the network in which cluster communication
takes place to send the migration traffic. This is not optimal both because
sensitive cluster traffic can be disrupted and this network may not
have the best bandwidth available on the node.

Setting the migration network parameter allows the use of a dedicated
network for all migration traffic. In addition to the memory,
this also affects the storage traffic for offline migrations.

The migration network is set as a network using CIDR notation. This
has the advantage that you don't have to set individual IP addresses
for each node. {pve} can determine the real address on the
destination node from the network specified in the CIDR form. To
enable this, the network must be specified so that each node has exactly one
IP in the respective network.

Example
^^^^^^^

We assume that we have a three-node setup, with three separate
networks. One for public communication with the Internet, one for
cluster communication, and a very fast one, which we want to use as a
dedicated network for migration.

A network configuration for such a setup might look as follows:

----
iface eno1 inet manual

# public network
auto vmbr0
iface vmbr0 inet static
        address 192.X.Y.57
        netmask 255.255.250.0
        gateway 192.X.Y.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0

# cluster network
auto eno2
iface eno2 inet static
        address 10.1.1.1
        netmask 255.255.255.0

# fast network
auto eno3
iface eno3 inet static
        address 10.1.2.1
        netmask 255.255.255.0
----

Here, we will use the network 10.1.2.0/24 as a migration network. For
a single migration, you can do this using the `migration_network`
parameter of the command line tool:

----
# qm migrate 106 tre --online --migration_network 10.1.2.0/24
----

To configure this as the default network for all migrations in the
cluster, set the `migration` property of the `/etc/pve/datacenter.cfg`
file:

----
# use dedicated migration network
migration: secure,network=10.1.2.0/24
----

NOTE: The migration type must always be set when the migration network
is set in `/etc/pve/datacenter.cfg`.


ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]