[[chapter_pvecm]]
ifdef::manvolnum[]
pvecm(1)
========
:pve-toplevel:

NAME
----

pvecm - Proxmox VE Cluster Manager

SYNOPSIS
--------

include::pvecm.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Cluster Manager
===============
:pve-toplevel:
endif::manvolnum[]

The {PVE} cluster manager `pvecm` is a tool to create a group of
physical servers. Such a group is called a *cluster*. We use the
http://www.corosync.org[Corosync Cluster Engine] for reliable group
communication, and such clusters can consist of up to 32 physical nodes
(probably more, depending on network latency).

`pvecm` can be used to create a new cluster, join nodes to a cluster,
leave the cluster, get status information and do various other cluster
related tasks. The **P**rox**m**o**x** **C**luster **F**ile **S**ystem (``pmxcfs'')
is used to transparently distribute the cluster configuration to all cluster
nodes.

Grouping nodes into a cluster has the following advantages:

* Centralized, web-based management

* Multi-master clusters: each node can do all management tasks

* `pmxcfs`: database-driven file system for storing configuration files,
  replicated in real-time on all nodes using `corosync`.

* Easy migration of virtual machines and containers between physical
  hosts

* Fast deployment

* Cluster-wide services like firewall and HA


Requirements
------------

* All nodes must be in the same network, as `corosync` uses IP Multicast
  to communicate between nodes (also see
  http://www.corosync.org[Corosync Cluster Engine]). Corosync uses UDP
  ports 5404 and 5405 for cluster communication.
+
NOTE: Some switches do not support IP multicast by default and must be
manually enabled first.

* Date and time have to be synchronized.

* An SSH tunnel on TCP port 22 between nodes is used.

* If you are interested in High Availability, you need to have at
  least three nodes for reliable quorum. All nodes should have the
  same version.

* We recommend a dedicated NIC for the cluster traffic, especially if
  you use shared storage.

* The root password of a cluster node is required for adding nodes.

NOTE: It is not possible to mix Proxmox VE 3.x and earlier with
Proxmox VE 4.0 cluster nodes.


Preparing Nodes
---------------

First, install {PVE} on all nodes. Make sure that each node is
installed with the final hostname and IP configuration. Changing the
hostname and IP is not possible after cluster creation.

Currently the cluster creation can either be done on the console (login via
`ssh`) or through the API, for which we have a GUI implementation (__Datacenter ->
Cluster__).

While it is common practice to reference all other node names with their IPs in
`/etc/hosts`, this is not strictly necessary for a cluster, which normally uses
multicast, to work. It may still be useful, as you can then connect from one
node to another via SSH using the easier to remember node name.

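If you do maintain such entries, a minimal sketch of `/etc/hosts` could look
like the following (node names and addresses are placeholders for your own
setup):

----
10.10.10.1 hp1.example.local hp1
10.10.10.2 hp2.example.local hp2
10.10.10.3 hp3.example.local hp3
----
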
[[pvecm_create_cluster]]
Create the Cluster
------------------

Log in via `ssh` to the first {pve} node. Use a unique name for your cluster.
This name cannot be changed later. The cluster name follows the same rules as
node names.

----
 hp1# pvecm create CLUSTERNAME
----

CAUTION: The cluster name is used to compute the default multicast address.
Please use unique cluster names if you run more than one cluster inside your
network. To avoid human confusion, it is also recommended to choose different
names even if clusters do not share the cluster network.

To check the state of your cluster use:

----
 hp1# pvecm status
----

Multiple Clusters In Same Network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is possible to create multiple clusters in the same physical or logical
network. Each cluster must have a unique name, which is used to generate the
cluster's multicast group address. As long as no duplicate cluster names are
configured in one network segment, the different clusters won't interfere with
each other.

If multiple clusters operate in a single network, it may be beneficial to set up
an IGMP querier and enable IGMP snooping in that network. This can reduce the
network load significantly, because multicast packets are then only delivered
to the endpoints of the respective member nodes.
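
On a plain Linux bridge carrying the cluster traffic, this can be sketched as
follows (assuming the bridge is called `vmbr0`; managed switches have their own
equivalent settings):

[source,bash]
----
# enable IGMP snooping and an IGMP querier on the bridge (example for vmbr0)
echo 1 > /sys/class/net/vmbr0/bridge/multicast_snooping
echo 1 > /sys/class/net/vmbr0/bridge/multicast_querier
----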


[[pvecm_join_node_to_cluster]]
Adding Nodes to the Cluster
---------------------------

Log in via `ssh` to the node you want to add.

----
 hp2# pvecm add IP-ADDRESS-CLUSTER
----

For `IP-ADDRESS-CLUSTER` use the IP from an existing cluster node.

CAUTION: A new node cannot hold any VMs, because you would get
conflicts due to identical VM IDs. Also, all existing configuration in
`/etc/pve` is overwritten when you join a new node to the cluster. As a
workaround, use `vzdump` to back up the guests and restore them under a
different VMID after adding the node to the cluster.

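A minimal sketch of that workaround, assuming the joining node holds a VM with
ID 100 and that ID 200 is still free in the cluster:

[source,bash]
----
# on the node, before joining: back up the VM (hypothetical VMID 100)
vzdump 100 --dumpdir /var/lib/vz/dump
# after the node has joined: restore under a free VMID (here 200);
# the archive name and extension depend on your vzdump settings
qmrestore /var/lib/vz/dump/vzdump-qemu-100-*.vma 200
----
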
To check the state of the cluster:

----
 # pvecm status
----

.Cluster status after adding 4 nodes
----
hp2# pvecm status
Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1928
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           2
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.91
0x00000002          1 192.168.15.92 (local)
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
----

If you only want the list of all nodes use:

----
 # pvecm nodes
----

.List nodes in a cluster
----
hp2# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1
         2          1 hp2 (local)
         3          1 hp3
         4          1 hp4
----

[[adding-nodes-with-separated-cluster-network]]
Adding Nodes With Separated Cluster Network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When adding a node to a cluster with a separated cluster network, you need to
use the 'ringX_addr' parameters to set the node's address on those networks:

[source,bash]
----
pvecm add IP-ADDRESS-CLUSTER -ring0_addr IP-ADDRESS-RING0
----

If you want to use the Redundant Ring Protocol you will also want to pass the
'ring1_addr' parameter, as sketched below.
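
For example, assuming the two separated cluster networks are 10.10.10.0/24 and
10.10.20.0/24 and the joining node uses the host part .2 on both, the call
could look like this (the addresses are placeholders for your own setup):

[source,bash]
----
pvecm add IP-ADDRESS-CLUSTER -ring0_addr 10.10.10.2 -ring1_addr 10.10.20.2
----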


Remove a Cluster Node
---------------------

CAUTION: Read the procedure carefully before proceeding, as it may not
be what you want or need.

Move all virtual machines off the node. Make sure you have no local
data or backups you want to keep, or save them accordingly.
In the following example we will remove the node hp4 from the cluster.

Log in to a *different* cluster node (not hp4), and issue a `pvecm nodes`
command to identify the node ID to remove:

----
hp1# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1 (local)
         2          1 hp2
         3          1 hp3
         4          1 hp4
----

At this point you must power off hp4 and make sure that it will not
power on again (in the network) as it is.

IMPORTANT: As said above, it is critical to power off the node
*before* removal, and make sure that it will *never* power on again
(in the existing cluster network) as it is.
If you power on the node as it is, your cluster could end up in a broken
state, and it could be difficult to restore a clean cluster state.

After powering off the node hp4, we can safely remove it from the cluster.

----
 hp1# pvecm delnode hp4
----

If the operation succeeds, no output is returned; just check the node
list again with `pvecm nodes` or `pvecm status`. You should see
something like:

----
hp1# pvecm status

Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:44:28 2015
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1992
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           3
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.90 (local)
0x00000002          1 192.168.15.91
0x00000003          1 192.168.15.92
----

If, for whatever reason, you want this server to join the same
cluster again, you have to

* reinstall {pve} on it from scratch

* then join it, as explained in the previous section.

[[pvecm_separate_node_without_reinstall]]
Separate A Node Without Reinstalling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

CAUTION: This is *not* the recommended method, proceed with caution. Use the
above mentioned method if you're unsure.

You can also separate a node from a cluster without reinstalling it from
scratch. But after removing the node from the cluster, it will still have
access to the shared storages! This must be resolved before you start removing
the node from the cluster. A {pve} cluster cannot share the exact same
storage with another cluster, as storage locking doesn't work across cluster
boundaries. Further, it may also lead to VMID conflicts.

It is suggested that you create a new storage to which only the node that you
want to separate has access. This can be a new export on your NFS or a new Ceph
pool, to name a few examples. It is just important that the exact same storage
does not get accessed by multiple clusters. After setting up this storage, move
all data from the node and its VMs to it. Then you are ready to separate the
node from the cluster.
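
For example, a dedicated NFS export that only the node to be separated may
access could be added like this (storage name, server address, and export path
are placeholders):

[source,bash]
----
# hypothetical values; '--nodes' restricts the storage to the node to separate
pvesm add nfs separate-storage --server 192.168.15.200 \
  --export /export/separate --content images,rootdir --nodes hp4
----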

WARNING: Ensure all shared resources are cleanly separated! Otherwise you will
run into conflicts and problems.

First stop the corosync and the pve-cluster services on the node:
[source,bash]
----
systemctl stop pve-cluster
systemctl stop corosync
----

Start the cluster filesystem again in local mode:
[source,bash]
----
pmxcfs -l
----

Delete the corosync configuration files:
[source,bash]
----
rm /etc/pve/corosync.conf
rm /etc/corosync/*
----

You can now start the filesystem again as a normal service:
[source,bash]
----
killall pmxcfs
systemctl start pve-cluster
----

The node is now separated from the cluster. You can delete it from a remaining
node of the cluster with:
[source,bash]
----
pvecm delnode oldnode
----

If the command fails because the remaining node in the cluster lost quorum
when the now separated node exited, you may set the expected votes to 1 as a
workaround:
[source,bash]
----
pvecm expected 1
----

Then repeat the 'pvecm delnode' command.

Now switch back to the separated node and delete all remaining files left
from the old cluster. This ensures that the node can be added to another
cluster again without problems.

[source,bash]
----
rm /var/lib/corosync/*
----

As the configuration files from the other nodes are still in the cluster
filesystem, you may want to clean those up too. Simply remove the whole
directory recursively from '/etc/pve/nodes/NODENAME', but check three times
that you used the correct one before deleting it.
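
A minimal sketch, assuming the separated node was called `oldnode`:

[source,bash]
----
# triple-check the node name before running this
rm -rf /etc/pve/nodes/oldnode
----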

CAUTION: The node's SSH keys are still in the 'authorized_key' file. This means
the nodes can still connect to each other with public key authentication. This
should be fixed by removing the respective keys from the
'/etc/pve/priv/authorized_keys' file.

Quorum
------

{pve} uses a quorum-based technique to provide a consistent state among
all cluster nodes.

[quote, from Wikipedia, Quorum (distributed computing)]
____
A quorum is the minimum number of votes that a distributed transaction
has to obtain in order to be allowed to perform an operation in a
distributed system.
____

In case of network partitioning, state changes require that a
majority of nodes are online. The cluster switches to read-only mode
if it loses quorum.

NOTE: {pve} assigns a single vote to each node by default.

Cluster Network
---------------

The cluster network is the core of a cluster. All messages sent over it have to
be delivered reliably to all nodes in their respective order. In {pve} this
part is done by corosync, an implementation of a high performance, low overhead
high availability development toolkit. It serves our decentralized
configuration file system (`pmxcfs`).

[[cluster-network-requirements]]
Network Requirements
~~~~~~~~~~~~~~~~~~~~
This needs a reliable network with latencies under 2 milliseconds (LAN
performance) to work properly. While corosync can also use unicast for
communication between nodes, it's **highly recommended** to have a multicast
capable network. The network should not be used heavily by other members;
ideally corosync runs on its own network. *Never* share it with the network
the storage communicates over, either.

Before setting up a cluster it is good practice to check if the network is fit
for that purpose.

* Ensure that all nodes are in the same subnet. This must only be true for the
  network interfaces used for cluster communication (corosync).

* Ensure all nodes can reach each other over those interfaces, using `ping` is
  enough for a basic test.

* Ensure that multicast works in general and at high packet rates. This can be
  done with the `omping` tool. The final "%loss" number should be < 1%.
+
[source,bash]
----
omping -c 10000 -i 0.001 -F -q NODE1-IP NODE2-IP ...
----

* Ensure that multicast communication works over an extended period of time.
  This uncovers problems where IGMP snooping is activated on the network but
  no multicast querier is active. This test has a duration of around 10
  minutes.
+
[source,bash]
----
omping -c 600 -i 1 -q NODE1-IP NODE2-IP ...
----

Your network is not ready for clustering if any of these tests fail. Recheck
your network configuration. Switches in particular are notorious for having
multicast disabled by default or IGMP snooping enabled with no IGMP querier
active.

In smaller clusters it is also an option to use unicast if you really cannot
get multicast to work, as sketched below.
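
A minimal sketch of such a unicast setup: corosync is switched to UDP unicast
by setting the `transport` option in the `totem` section of `corosync.conf`
(see the <<edit-corosync-conf,edit corosync.conf>> section for how to apply
such a change; the cluster name is a placeholder):

----
totem {
  cluster_name: mycluster
  config_version: 4
  transport: udpu
  [...] # remaining totem options stay unchanged
}
----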

Separate Cluster Network
~~~~~~~~~~~~~~~~~~~~~~~~

When creating a cluster without any parameters, the cluster network is generally
shared with the Web UI and the VMs and their traffic. Depending on your setup,
even storage traffic may get sent over the same network. It is recommended to
change that, as corosync is a time critical real time application.

Setting Up A New Network
^^^^^^^^^^^^^^^^^^^^^^^^

First you have to set up a new network interface. It should be on a physically
separate network. Ensure that your network fulfills the
<<cluster-network-requirements,cluster network requirements>>.

Separate On Cluster Creation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is possible through the 'ring0_addr' and 'bindnet0_addr' parameters of
the 'pvecm create' command used for creating a new cluster.

If you have set up an additional NIC with a static address on 10.10.10.1/25
and want to send and receive all cluster communication over this interface,
you would execute:

[source,bash]
----
pvecm create test --ring0_addr 10.10.10.1 --bindnet0_addr 10.10.10.0
----

To check if everything is working properly execute:
[source,bash]
----
systemctl status corosync
----

Afterwards, proceed as described in the section to
<<adding-nodes-with-separated-cluster-network,add nodes with a separated cluster network>>.

[[separate-cluster-net-after-creation]]
Separate After Cluster Creation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can also do this if you have already created a cluster and want to switch
its communication to another network, without rebuilding the whole cluster.
This change may lead to short periods of quorum loss in the cluster, as nodes
have to restart corosync and come up one after the other on the new network.

Check how to <<edit-corosync-conf,edit the corosync.conf file>> first.
Then open it and you should see a file similar to:

----
logging {
  debug: off
  to_syslog: yes
}

nodelist {

  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: due
  }

  node {
    name: tre
    nodeid: 3
    quorum_votes: 1
    ring0_addr: tre
  }

  node {
    name: uno
    nodeid: 1
    quorum_votes: 1
    ring0_addr: uno
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: thomas-testcluster
  config_version: 3
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 192.168.30.50
    ringnumber: 0
  }

}
----

The first thing you want to do is add the 'name' properties in the node entries
if you do not see them already. Those *must* match the node name.

Then replace the addresses in the 'ring0_addr' properties with the new
addresses. You may use plain IP addresses or hostnames here. If you use
hostnames, ensure that they are resolvable from all nodes.

In my example, I want to switch the cluster communication to the 10.10.10.1/25
network, so I replace all 'ring0_addr' properties accordingly. I also set the
'bindnetaddr' in the totem section of the config to an address of the new
network. It can be any address from the subnet configured on the new network
interface.

After you have increased the 'config_version' property, the new configuration
file should look like:

----

logging {
  debug: off
  to_syslog: yes
}

nodelist {

  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
  }

  node {
    name: tre
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.10.10.3
  }

  node {
    name: uno
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: thomas-testcluster
  config_version: 4
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.10.10.1
    ringnumber: 0
  }

}
----

Now, after a final check that all the changed information is correct, we save it
and refer again to the <<edit-corosync-conf,edit corosync.conf file>> section to
learn how to bring it into effect.

As our change cannot be applied live by corosync, we have to do a restart.

On a single node execute:
[source,bash]
----
systemctl restart corosync
----

Now check if everything is fine:

[source,bash]
----
systemctl status corosync
----

If corosync runs correctly again, restart it on all other nodes too.
They will then join the cluster membership one by one on the new network.

[[pvecm_rrp]]
Redundant Ring Protocol
~~~~~~~~~~~~~~~~~~~~~~~
To avoid a single point of failure you should implement countermeasures.
This can be done on the hardware and operating system level through network
bonding.

Corosync itself also offers the possibility to add redundancy through the so
called 'Redundant Ring Protocol'. This protocol allows running a second totem
ring on another network. This network should be physically separated from the
other ring's network to actually increase availability.

RRP On Cluster Creation
~~~~~~~~~~~~~~~~~~~~~~~

The 'pvecm create' command provides the additional parameters 'bindnetX_addr',
'ringX_addr' and 'rrp_mode', which can be used for RRP configuration.

NOTE: See the <<corosync-conf-glossary,glossary>> if you do not know what each parameter means.

So if you have two networks, one on the 10.10.10.1/24 and the other on the
10.10.20.1/24 subnet, you would execute:

[source,bash]
----
pvecm create CLUSTERNAME -bindnet0_addr 10.10.10.1 -ring0_addr 10.10.10.1 \
-bindnet1_addr 10.10.20.1 -ring1_addr 10.10.20.1
----

RRP On Existing Clusters
~~~~~~~~~~~~~~~~~~~~~~~~

You will take similar steps as described in
<<separate-cluster-net-after-creation,separating the cluster network>> to
enable RRP on an already running cluster. The only difference is that you
will add `ring1` and use it instead of `ring0`.

First add a new `interface` subsection in the `totem` section and set its
`ringnumber` property to `1`. Set the interface's `bindnetaddr` property to an
address of the subnet you have configured for your new ring.
Further, set the `rrp_mode` to `passive`; this is the only stable mode.

Then add to each node entry in the `nodelist` section its new `ring1_addr`
property with the node's additional ring address.

So if you have two networks, one on the 10.10.10.1/24 and the other on the
10.10.20.1/24 subnet, the final configuration file should look like:

----
totem {
  cluster_name: tweak
  config_version: 9
  ip_version: ipv4
  rrp_mode: passive
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.10.10.1
    ringnumber: 0
  }
  interface {
    bindnetaddr: 10.10.20.1
    ringnumber: 1
  }
}

nodelist {
  node {
    name: pvecm1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
    ring1_addr: 10.10.20.1
  }

  node {
    name: pvecm2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
    ring1_addr: 10.10.20.2
  }

  [...] # other cluster nodes here
}

[...] # other remaining config sections here

----

Bring it into effect as described in the
<<edit-corosync-conf,edit the corosync.conf file>> section.

This is a change which cannot take effect live and needs at least a restart
of corosync. A restart of the whole cluster is recommended.

If you cannot reboot the whole cluster, ensure no High Availability services are
configured and then stop the corosync service on all nodes. After corosync is
stopped on all nodes, start it again one after the other.

Corosync Configuration
----------------------

The `/etc/pve/corosync.conf` file plays a central role in a {pve} cluster. It
controls the cluster membership and its network.
To read more about it, check the corosync.conf man page:
[source,bash]
----
man corosync.conf
----

For node membership you should always use the `pvecm` tool provided by {pve}.
You may have to edit the configuration file manually for other changes.
Here are a few best practice tips for doing this.

[[edit-corosync-conf]]
Edit corosync.conf
~~~~~~~~~~~~~~~~~~

Editing the corosync.conf file is not always straightforward. There are
two on each cluster, one in `/etc/pve/corosync.conf` and the other in
`/etc/corosync/corosync.conf`. Editing the one in our cluster file system will
propagate the changes to the local one, but not vice versa.

The configuration will get updated automatically as soon as the file changes.
This means changes which can be integrated into a running corosync will take
effect instantly. So you should always make a copy and edit that instead, to
avoid triggering unwanted changes by an intermediate save.

[source,bash]
----
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
----

Then open the config file with your favorite editor; `nano` and `vim.tiny` are
preinstalled on {pve}, for example.

NOTE: Always increment the 'config_version' number on configuration changes;
omitting this can lead to problems.

After making the necessary changes, create another copy of the current working
configuration file. This serves as a backup if the new configuration fails to
apply or causes other problems.

[source,bash]
----
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
----

Then move the new configuration file over the old one:
[source,bash]
----
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
----

You can check with the commands
[source,bash]
----
systemctl status corosync
journalctl -b -u corosync
----

if the change could be applied automatically. If not, you may have to restart
the corosync service via:
[source,bash]
----
systemctl restart corosync
----

On errors check the troubleshooting section below.

Troubleshooting
~~~~~~~~~~~~~~~

Issue: 'quorum.expected_votes must be configured'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When corosync starts to fail and you get the following message in the system log:

----
[...]
corosync[1647]: [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
corosync[1647]: [SERV ] Service engine 'corosync_quorum' failed to load for reason
    'configuration error: nodelist or quorum.expected_votes must be configured!'
[...]
----

It means that the hostname you set for corosync 'ringX_addr' in the
configuration could not be resolved.
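
A quick check on each node could look like this ('nodename' stands for the
hostname used in the 'ringX_addr' entry):

[source,bash]
----
getent hosts nodename
----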


Write Configuration When Not Quorate
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you need to change '/etc/pve/corosync.conf' on a node with no quorum, and you
know what you are doing, use:
[source,bash]
----
pvecm expected 1
----

This sets the expected vote count to 1 and makes the cluster quorate. You can
now fix your configuration, or revert it back to the last working backup.

This is not enough if corosync cannot start anymore. Here it is best to edit the
local copy of the corosync configuration in '/etc/corosync/corosync.conf', so
that corosync can start again. Ensure that on all nodes this configuration has
the same content to avoid split brains. If you are not sure what went wrong,
it's best to ask the Proxmox Community to help you.


[[corosync-conf-glossary]]
Corosync Configuration Glossary
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ringX_addr::
This names the different ring addresses for the corosync totem rings used for
the cluster communication.

bindnetaddr::
Defines the interface the ring should bind to. It may be any address of
the subnet configured on the interface we want to use. In general, it is
recommended to just use an address a node uses on this interface.

rrp_mode::
Specifies the mode of the redundant ring protocol and may be passive, active or
none. Note that the use of active is highly experimental and not officially
supported. Passive is the preferred mode; it may double the cluster
communication throughput and increases availability.


Cluster Cold Start
------------------

It is obvious that a cluster is not quorate when all nodes are
offline. This is a common case after a power failure.

NOTE: It is always a good idea to use an uninterruptible power supply
(``UPS'', also called ``battery backup'') to avoid this state, especially if
you want HA.

On node startup, the `pve-guests` service is started and waits for
quorum. Once quorate, it starts all guests which have the `onboot`
flag set.

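For example, the `onboot` flag can be set per guest as follows (VMID 100 and
CT 101 are placeholders):

[source,bash]
----
qm set 100 --onboot 1    # virtual machine
pct set 101 --onboot 1   # container
----
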
When you turn on nodes, or when power comes back after power failure,
it is likely that some nodes boot faster than others. Please keep in
mind that guest startup is delayed until you reach quorum.


Guest Migration
---------------

Migrating virtual guests to other nodes is a useful feature in a
cluster. There are settings to control the behavior of such
migrations. This can be done via the configuration file
`datacenter.cfg` or for a specific migration via API or command line
parameters.

It makes a difference if a guest is online or offline, or if it has
local resources (like a local disk).

For details about virtual machine migration, see the
xref:qm_migration[QEMU/KVM Migration Chapter].

For details about container migration, see the
xref:pct_migration[Container Migration Chapter].

Migration Type
~~~~~~~~~~~~~~

The migration type defines if the migration data should be sent over an
encrypted (`secure`) channel or an unencrypted (`insecure`) one.
Setting the migration type to insecure means that the RAM content of a
virtual guest is also transferred unencrypted, which can lead to
information disclosure of critical data from inside the guest (for
example passwords or encryption keys).

Therefore, we strongly recommend using the secure channel if you do
not have full control over the network and can not guarantee that no
one is eavesdropping on it.

NOTE: Storage migration does not follow this setting. Currently, it
always sends the storage content over a secure channel.

Encryption requires a lot of computing power, so this setting is often
changed to `insecure` to achieve better performance. The impact on
modern systems is lower because they implement AES encryption in
hardware. The performance impact is particularly evident in fast
networks where you can transfer 10 Gbps or more.
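
For example, to make the unencrypted channel the cluster-wide default, the
`migration` property in `/etc/pve/datacenter.cfg` could be set as sketched
here:

----
# /etc/pve/datacenter.cfg
migration: insecure
----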


Migration Network
~~~~~~~~~~~~~~~~~

By default, {pve} uses the network in which cluster communication
takes place to send the migration traffic. This is not optimal because
sensitive cluster traffic can be disrupted and this network may not
have the best bandwidth available on the node.

Setting the migration network parameter allows the use of a dedicated
network for all migration traffic. In addition to the memory,
this also affects the storage traffic for offline migrations.

The migration network is set as a network in CIDR notation. This
has the advantage that you do not have to set individual IP addresses
for each node. {pve} can determine the real address on the
destination node from the network specified in the CIDR form. To
enable this, the network must be specified so that each node has exactly
one IP in the respective network.


Example
^^^^^^^

We assume that we have a three-node setup with three separate
networks. One for public communication with the Internet, one for
cluster communication and a very fast one, which we want to use as a
dedicated network for migration.

A network configuration for such a setup might look as follows:

----
iface eno1 inet manual

# public network
auto vmbr0
iface vmbr0 inet static
    address 192.X.Y.57
    netmask 255.255.250.0
    gateway 192.X.Y.1
    bridge_ports eno1
    bridge_stp off
    bridge_fd 0

# cluster network
auto eno2
iface eno2 inet static
    address 10.1.1.1
    netmask 255.255.255.0

# fast network
auto eno3
iface eno3 inet static
    address 10.1.2.1
    netmask 255.255.255.0
----

Here, we will use the network 10.1.2.0/24 as a migration network. For
a single migration, you can do this using the `migration_network`
parameter of the command line tool:

----
# qm migrate 106 tre --online --migration_network 10.1.2.0/24
----

To configure this as the default network for all migrations in the
cluster, set the `migration` property of the `/etc/pve/datacenter.cfg`
file:

----
# use dedicated migration network
migration: secure,network=10.1.2.0/24
----

NOTE: The migration type must always be set when the migration network
is set in `/etc/pve/datacenter.cfg`.


ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]