[[chapter_pvecm]]
ifdef::manvolnum[]
pvecm(1)
========
:pve-toplevel:

NAME
----

pvecm - Proxmox VE Cluster Manager

SYNOPSIS
--------

include::pvecm.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Cluster Manager
===============
:pve-toplevel:
endif::manvolnum[]

The {PVE} cluster manager `pvecm` is a tool to create a group of
physical servers. Such a group is called a *cluster*. We use the
http://www.corosync.org[Corosync Cluster Engine] for reliable group
communication, and such clusters can consist of up to 32 physical nodes
(probably more, depending on network latency).

`pvecm` can be used to create a new cluster, join nodes to a cluster,
leave the cluster, get status information and do various other cluster
related tasks. The **P**rox**m**o**x** **C**luster **F**ile **S**ystem (``pmxcfs'')
is used to transparently distribute the cluster configuration to all cluster
nodes.

Grouping nodes into a cluster has the following advantages:

* Centralized, web-based management

* Multi-master clusters: each node can do all management tasks

* `pmxcfs`: database-driven file system for storing configuration files,
  replicated in real-time on all nodes using `corosync`.

* Easy migration of virtual machines and containers between physical
  hosts

* Fast deployment

* Cluster-wide services like firewall and HA


Requirements
------------

* All nodes must be on the same network, as `corosync` uses IP multicast
  to communicate between nodes (also see
  http://www.corosync.org[Corosync Cluster Engine]). Corosync uses UDP
  ports 5404 and 5405 for cluster communication.
+
NOTE: Some switches do not support IP multicast by default and it must
be enabled manually first.

* Date and time have to be synchronized (see the quick check after this
  list).

* An SSH tunnel on TCP port 22 between nodes is used.

* If you are interested in High Availability, you need at least three
  nodes for reliable quorum. All nodes should run the same version.

* We recommend a dedicated NIC for the cluster traffic, especially if
  you use shared storage.

* The root password of a cluster node is required for adding nodes.

NOTE: It is not possible to mix Proxmox VE 3.x and earlier with
Proxmox VE 4.0 cluster nodes.


Preparing Nodes
---------------

First, install {PVE} on all nodes. Make sure that each node is
installed with the final hostname and IP configuration. Changing the
hostname and IP is not possible after cluster creation.

Currently, cluster creation can be done either on the console (login via
`ssh`) or through the API, for which we have a GUI implementation
(__Datacenter -> Cluster__).

[[pvecm_create_cluster]]
Create the Cluster
------------------

Log in via `ssh` to the first {pve} node. Use a unique name for your cluster.
This name cannot be changed later. The cluster name follows the same rules as
node names.

 hp1# pvecm create YOUR-CLUSTER-NAME

CAUTION: The cluster name is used to compute the default multicast
address. Please use unique cluster names if you run more than one
cluster inside your network.

To check the state of your cluster use:

 hp1# pvecm status

Multiple Clusters In Same Network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is possible to create multiple clusters in the same physical or logical
network. Each cluster must have a unique name, which is used to generate the
cluster's multicast group address. As long as no duplicate cluster names are
configured in one network segment, the different clusters won't interfere with
each other.

If multiple clusters operate in a single network, it may be beneficial to set
up an IGMP querier and enable IGMP snooping in said network. This may reduce
the network load significantly, because multicast packets are only delivered
to the endpoints of the respective member nodes.
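
On a plain Linux bridge, IGMP snooping and a querier can be toggled through
sysfs. The following is a minimal sketch, assuming `vmbr0` is the bridge that
carries the cluster traffic; for physical switches, consult the vendor
documentation instead.

[source,bash]
----
# enable IGMP snooping on the bridge
echo 1 > /sys/class/net/vmbr0/bridge/multicast_snooping

# let the bridge act as IGMP querier for this segment
echo 1 > /sys/class/net/vmbr0/bridge/multicast_querier
----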


[[pvecm_join_node_to_cluster]]
Adding Nodes to the Cluster
---------------------------

Log in via `ssh` to the node you want to add.

 hp2# pvecm add IP-ADDRESS-CLUSTER

For `IP-ADDRESS-CLUSTER` use the IP address of an existing cluster node.

CAUTION: A new node cannot hold any VMs, because you would get
conflicts about identical VM IDs. Also, all existing configuration in
`/etc/pve` is overwritten when you join a new node to the cluster. As a
workaround, use `vzdump` to back up the guests and restore them to different
VMIDs after adding the node to the cluster.

To check the state of the cluster:

 # pvecm status

.Cluster status after adding 4 nodes
----
hp2# pvecm status
Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1928
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           2
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.91
0x00000002          1 192.168.15.92 (local)
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
----

If you only want the list of all nodes, use:

 # pvecm nodes

.List nodes in a cluster
----
hp2# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1
         2          1 hp2 (local)
         3          1 hp3
         4          1 hp4
----

[[adding-nodes-with-separated-cluster-network]]
Adding Nodes With Separated Cluster Network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When adding a node to a cluster with a separated cluster network, you need to
use the 'ringX_addr' parameters to set the node's address on those networks:

[source,bash]
----
pvecm add IP-ADDRESS-CLUSTER -ring0_addr IP-ADDRESS-RING0
----

If you want to use the Redundant Ring Protocol, you will also want to pass the
'ring1_addr' parameter, as in the sketch below.
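
A minimal sketch of such a join, assuming the new node's addresses on the two
ring networks are 10.10.10.2 and 10.10.20.2:

[source,bash]
----
pvecm add IP-ADDRESS-CLUSTER -ring0_addr 10.10.10.2 -ring1_addr 10.10.20.2
----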


Remove a Cluster Node
---------------------

CAUTION: Read the procedure carefully before proceeding, as it may not
be what you want or need.

Move all virtual machines off the node. Make sure you have no local
data or backups you want to keep, or save them accordingly.
In the following example we will remove the node hp4 from the cluster.

Log in to a *different* cluster node (not hp4), and issue a `pvecm nodes`
command to identify the node ID to remove:

----
hp1# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1 (local)
         2          1 hp2
         3          1 hp3
         4          1 hp4
----


At this point you must power off hp4 and make sure that it will not
power on again (in the network) as it is.

IMPORTANT: As said above, it is critical to power off the node
*before* removal, and make sure that it will *never* power on again
(in the existing cluster network) as it is.
If you power on the node as it is, your cluster will be broken and
it can become difficult to restore a clean cluster state.

After powering off the node hp4, we can safely remove it from the cluster.

 hp1# pvecm delnode hp4

If the operation succeeds, no output is returned; just check the node
list again with `pvecm nodes` or `pvecm status`. You should see
something like:

----
hp1# pvecm status

Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:44:28 2015
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1992
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           3
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.90 (local)
0x00000002          1 192.168.15.91
0x00000003          1 192.168.15.92
----

If, for whatever reason, you want this server to join the same
cluster again, you have to

* reinstall {pve} on it from scratch

* then join it, as explained in the previous section.

[[pvecm_separate_node_without_reinstall]]
Separate A Node Without Reinstalling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

CAUTION: This is *not* the recommended method, proceed with caution. Use the
above-mentioned method if you're unsure.

You can also separate a node from a cluster without reinstalling it from
scratch. But after removing the node from the cluster, it will still have
access to the shared storage! This must be resolved before you start removing
the node from the cluster. A {pve} cluster cannot share the exact same
storage with another cluster, as storage locking doesn't work over cluster
boundaries. Further, it may also lead to VMID conflicts.

It is suggested that you create a new storage to which only the node that you
want to separate has access. This can be a new export on your NFS or a new
Ceph pool, to name a few examples. It is just important that the exact same
storage does not get accessed by multiple clusters. After setting up this
storage, move all data from the node and its VMs to it. Then you are ready to
separate the node from the cluster.

WARNING: Ensure all shared resources are cleanly separated! Otherwise you
will run into conflicts and problems.

First, stop the corosync and pve-cluster services on the node:
[source,bash]
----
systemctl stop pve-cluster
systemctl stop corosync
----

Start the cluster filesystem again in local mode:
[source,bash]
----
pmxcfs -l
----

Delete the corosync configuration files:
[source,bash]
----
rm /etc/pve/corosync.conf
rm /etc/corosync/*
----

You can now start the filesystem again as a normal service:
[source,bash]
----
killall pmxcfs
systemctl start pve-cluster
----

The node is now separated from the cluster. You can delete it from a remaining
node of the cluster with:
[source,bash]
----
pvecm delnode oldnode
----

If the command fails because the remaining node in the cluster lost quorum
when the now separated node exited, you may set the expected votes to 1 as a
workaround:
[source,bash]
----
pvecm expected 1
----

Then repeat the 'pvecm delnode' command.

Now switch back to the separated node and delete all remaining files left
over from the old cluster. This ensures that the node can be added to another
cluster again without problems.

[source,bash]
----
rm /var/lib/corosync/*
----

As the configuration files from the other nodes are still in the cluster
filesystem, you may want to clean those up too. Simply remove the whole
directory '/etc/pve/nodes/NODENAME' recursively, but check three times that
you used the correct one before deleting it.

CAUTION: The node's SSH keys are still in the 'authorized_keys' file, which
means the nodes can still connect to each other with public key
authentication. This should be fixed by removing the respective keys from the
'/etc/pve/priv/authorized_keys' file.

Quorum
------

{pve} uses a quorum-based technique to provide a consistent state among
all cluster nodes.

[quote, from Wikipedia, Quorum (distributed computing)]
____
A quorum is the minimum number of votes that a distributed transaction
has to obtain in order to be allowed to perform an operation in a
distributed system.
____

In case of network partitioning, state changes require that a
majority of nodes are online. The cluster switches to read-only mode
if it loses quorum.

NOTE: {pve} assigns a single vote to each node by default.
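
Besides `pvecm status` (shown above), corosync's own quorum tool prints a
similar vote summary; a minimal sketch:

[source,bash]
----
# show expected votes, total votes and whether the cluster is quorate
corosync-quorumtool -s
----
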
Cluster Network
---------------

The cluster network is the core of a cluster. All messages sent over it have to
be delivered reliably to all nodes in their respective order. In {pve} this
part is done by corosync, an implementation of a high performance, low
overhead, high availability development toolkit. It serves our decentralized
configuration file system (`pmxcfs`).

[[cluster-network-requirements]]
Network Requirements
~~~~~~~~~~~~~~~~~~~~
This needs a reliable network with latencies under 2 milliseconds (LAN
performance) to work properly. While corosync can also use unicast for
communication between nodes, it is **highly recommended** to have a multicast
capable network. The network should not be used heavily by other members;
ideally, corosync runs on its own network. *Never* share it with a network
where storage communicates too.

Before setting up a cluster, it is good practice to check if the network is fit
for that purpose.

* Ensure that all nodes are in the same subnet. This must only be true for the
  network interfaces used for cluster communication (corosync).

* Ensure all nodes can reach each other over those interfaces; using `ping` is
  enough for a basic test.

* Ensure that multicast works in general and at high packet rates. This can be
  done with the `omping` tool. The final "%loss" number should be < 1%.
+
[source,bash]
----
omping -c 10000 -i 0.001 -F -q NODE1-IP NODE2-IP ...
----

* Ensure that multicast communication works over an extended period of time.
  This uncovers problems where IGMP snooping is activated on the network but
  no multicast querier is active. This test has a duration of around 10
  minutes.
+
[source,bash]
----
omping -c 600 -i 1 -q NODE1-IP NODE2-IP ...
----

Your network is not ready for clustering if any of these tests fail. Recheck
your network configuration. Switches in particular are notorious for having
multicast disabled by default or IGMP snooping enabled with no IGMP querier
active.

In smaller clusters, it is also an option to use unicast if you really cannot
get multicast to work; see the sketch below.
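
Switching to unicast is done in the `totem` section of corosync.conf; a
minimal sketch of the relevant snippet, assuming the corosync 2.x version
shipped with {pve} (edit the file as described in the
<<edit-corosync-conf,edit corosync.conf>> section):

----
totem {
  # existing totem settings stay as they are
  transport: udpu
}
----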

Separate Cluster Network
~~~~~~~~~~~~~~~~~~~~~~~~

When creating a cluster without any parameters, the cluster network is
generally shared with the web UI and the VMs and their traffic. Depending on
your setup, even storage traffic may get sent over the same network. It is
recommended to change that, as corosync is a time-critical real-time
application.

Setting Up A New Network
^^^^^^^^^^^^^^^^^^^^^^^^

First, you have to set up a new network interface. It should be on a
physically separate network. Ensure that your network fulfills the
<<cluster-network-requirements,cluster network requirements>>.
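
For example, the new interface could be declared in `/etc/network/interfaces`
similar to the following sketch; `eno4` and the 10.10.10.1/25 address are
placeholders matching the example in the next subsection:

----
# dedicated corosync network
auto eno4
iface eno4 inet static
    address 10.10.10.1
    netmask 255.255.255.128
----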

Separate On Cluster Creation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is possible through the 'ring0_addr' and 'bindnet0_addr' parameters of
the 'pvecm create' command used for creating a new cluster.

If you have set up an additional NIC with a static address on 10.10.10.1/25
and want to send and receive all cluster communication over this interface,
you would execute:

[source,bash]
----
pvecm create test --ring0_addr 10.10.10.1 --bindnet0_addr 10.10.10.0
----

To check if everything is working properly, execute:
[source,bash]
----
systemctl status corosync
----

Afterwards, proceed as described in the section to
<<adding-nodes-with-separated-cluster-network,add nodes with a separated cluster network>>.

[[separate-cluster-net-after-creation]]
Separate After Cluster Creation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can do this also if you have already created a cluster and want to switch
its communication to another network, without rebuilding the whole cluster.
This change may lead to short periods of quorum loss in the cluster, as nodes
have to restart corosync and come up one after the other on the new network.

Check how to <<edit-corosync-conf,edit the corosync.conf file>> first.
Then open it and you should see a file similar to:

----
logging {
  debug: off
  to_syslog: yes
}

nodelist {

  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: due
  }

  node {
    name: tre
    nodeid: 3
    quorum_votes: 1
    ring0_addr: tre
  }

  node {
    name: uno
    nodeid: 1
    quorum_votes: 1
    ring0_addr: uno
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: thomas-testcluster
  config_version: 3
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 192.168.30.50
    ringnumber: 0
  }

}
----

The first thing you want to do is add the 'name' properties to the node
entries, if you do not see them already. Those *must* match the node name.

Then replace the addresses of the 'ring0_addr' properties with the new
addresses. You may use plain IP addresses or hostnames here. If you use
hostnames, ensure that they are resolvable from all nodes.

In my example, I want to switch my cluster communication to the 10.10.10.1/25
network. So I replace all 'ring0_addr' entries accordingly. I also set the
bindnetaddr in the totem section of the config to an address of the new
network. It can be any address from the subnet configured on the new network
interface.

After you have increased the 'config_version' property, the new configuration
file should look like:

----

logging {
  debug: off
  to_syslog: yes
}

nodelist {

  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
  }

  node {
    name: tre
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.10.10.3
  }

  node {
    name: uno
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: thomas-testcluster
  config_version: 4
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.10.10.1
    ringnumber: 0
  }

}
----

Now, after a final check that all the changed information is correct, we save
it and refer again to the <<edit-corosync-conf,edit corosync.conf file>>
section to learn how to bring it into effect.

As our change cannot be applied live by corosync, we have to do a restart.

On a single node execute:
[source,bash]
----
systemctl restart corosync
----

Now check if everything is fine:

[source,bash]
----
systemctl status corosync
----

If corosync runs correctly again, restart it on all other nodes as well.
They will then join the cluster membership one by one on the new network.

[[pvecm_rrp]]
Redundant Ring Protocol
~~~~~~~~~~~~~~~~~~~~~~~
To avoid a single point of failure you should implement countermeasures.
This can be done on the hardware and operating system level through network
bonding.

Corosync itself also offers the possibility to add redundancy through the
so-called 'Redundant Ring Protocol'. This protocol allows running a second
totem ring on another network; this network should be physically separated
from the other ring's network to actually increase availability.

RRP On Cluster Creation
~~~~~~~~~~~~~~~~~~~~~~~

The 'pvecm create' command provides the additional parameters 'bindnetX_addr',
'ringX_addr' and 'rrp_mode', which can be used for RRP configuration.

NOTE: See the <<corosync-conf-glossary,glossary>> if you do not know what each parameter means.

So if you have two networks, one on the 10.10.10.1/24 and the other on the
10.10.20.1/24 subnet, you would execute:

[source,bash]
----
pvecm create CLUSTERNAME -bindnet0_addr 10.10.10.1 -ring0_addr 10.10.10.1 \
-bindnet1_addr 10.10.20.1 -ring1_addr 10.10.20.1
----

RRP On Existing Clusters
~~~~~~~~~~~~~~~~~~~~~~~~

You will take similar steps as described in
<<separate-cluster-net-after-creation,separating the cluster network>> to
enable RRP on an already running cluster. The single difference is that you
will add `ring1` and use it instead of `ring0`.

First, add a new `interface` subsection in the `totem` section and set its
`ringnumber` property to `1`. Set the interface's `bindnetaddr` property to an
address of the subnet you have configured for your new ring.
Further, set `rrp_mode` to `passive`; this is the only stable mode.

Then add to each node entry in the `nodelist` section its new `ring1_addr`
property with the node's additional ring address.

So if you have two networks, one on the 10.10.10.1/24 and the other on the
10.10.20.1/24 subnet, the final configuration file should look like:

----
totem {
  cluster_name: tweak
  config_version: 9
  ip_version: ipv4
  rrp_mode: passive
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.10.10.1
    ringnumber: 0
  }
  interface {
    bindnetaddr: 10.10.20.1
    ringnumber: 1
  }
}

nodelist {
  node {
    name: pvecm1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
    ring1_addr: 10.10.20.1
  }

  node {
    name: pvecm2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
    ring1_addr: 10.10.20.2
  }

  [...] # other cluster nodes here
}

[...] # other remaining config sections here

----

Bring it into effect as described in the
<<edit-corosync-conf,edit the corosync.conf file>> section.

This is a change which cannot take effect live and needs at least a restart
of corosync. A restart of the whole cluster is recommended.

If you cannot reboot the whole cluster, ensure that no High Availability
services are configured, then stop the corosync service on all nodes. After
corosync is stopped on all nodes, start it again one node after the other,
as sketched below.
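
A minimal sketch of that sequence, run from a host with SSH access to all
nodes; the node names are the placeholders from the example above:

[source,bash]
----
# stop corosync on every node first
for node in pvecm1 pvecm2; do ssh root@$node systemctl stop corosync; done

# then start it again, one node after the other
for node in pvecm1 pvecm2; do ssh root@$node systemctl start corosync; done
----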

Corosync Configuration
----------------------

The `/etc/pve/corosync.conf` file plays a central role in a {pve} cluster. It
controls the cluster membership and its network.
To read more about it, check the corosync.conf man page:
[source,bash]
----
man corosync.conf
----

For node membership you should always use the `pvecm` tool provided by {pve}.
You may have to edit the configuration file manually for other changes.
Here are a few best practice tips for doing this.

[[edit-corosync-conf]]
Edit corosync.conf
~~~~~~~~~~~~~~~~~~

Editing the corosync.conf file is not always straightforward. There are
two on each cluster, one in `/etc/pve/corosync.conf` and the other in
`/etc/corosync/corosync.conf`. Editing the one in our cluster file system will
propagate the changes to the local one, but not vice versa.

The configuration will get updated automatically as soon as the file changes.
This means changes which can be integrated into a running corosync will take
effect instantly. So you should always make a copy and edit that instead, to
avoid triggering unwanted changes by an in-between save.

[source,bash]
----
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
----

Then open the config file with your favorite editor; `nano` and `vim.tiny` are
preinstalled on {pve}, for example.

NOTE: Always increment the 'config_version' number on configuration changes;
omitting this can lead to problems.

After making the necessary changes, create another copy of the current working
configuration file. This serves as a backup if the new configuration fails to
apply or causes problems in other ways.

[source,bash]
----
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
----

Then move the new configuration file over the old one:
[source,bash]
----
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
----

You may check with the commands
[source,bash]
----
systemctl status corosync
journalctl -b -u corosync
----

whether the change could be applied automatically. If not, you may have to
restart the corosync service via:
[source,bash]
----
systemctl restart corosync
----

On errors, check the troubleshooting section below.

Troubleshooting
~~~~~~~~~~~~~~~

Issue: 'quorum.expected_votes must be configured'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When corosync starts to fail and you get the following message in the system
log:

----
[...]
corosync[1647]:  [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
corosync[1647]:  [SERV  ] Service engine 'corosync_quorum' failed to load for reason
    'configuration error: nodelist or quorum.expected_votes must be configured!'
[...]
----

it means that the hostname you set for corosync 'ringX_addr' in the
configuration could not be resolved.

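To verify the resolution, query it the same way the resolver does; a minimal
sketch, with `due` standing in for one of the hostnames used as 'ringX_addr':

[source,bash]
----
# should print the address the name resolves to; if it prints nothing,
# add the hostname to /etc/hosts or to your DNS
getent hosts due
----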

Write Configuration When Not Quorate
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you need to change '/etc/pve/corosync.conf' on a node with no quorum, and
you know what you are doing, use:
[source,bash]
----
pvecm expected 1
----

This sets the expected vote count to 1 and makes the cluster quorate. You can
now fix your configuration, or revert it back to the last working backup.

This is not enough if corosync cannot start anymore. Here it is best to edit
the local copy of the corosync configuration in '/etc/corosync/corosync.conf'
so that corosync can start again. Ensure that this configuration has the same
content on all nodes to avoid split-brain situations. If you are not sure what
went wrong, it's best to ask the Proxmox Community to help you.


[[corosync-conf-glossary]]
Corosync Configuration Glossary
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ringX_addr::
This names the different ring addresses for the corosync totem rings used for
the cluster communication.

bindnetaddr::
Defines the interface the ring should bind to. It may be any address of
the subnet configured on the interface we want to use. In general, it is
recommended to just use an address the node uses on this interface.

rrp_mode::
Specifies the mode of the redundant ring protocol and may be passive, active
or none. Note that the use of active is highly experimental and not officially
supported. Passive is the preferred mode; it may double the cluster
communication throughput and increases availability.


Cluster Cold Start
------------------

It is obvious that a cluster is not quorate when all nodes are
offline. This is a common case after a power failure.

NOTE: It is always a good idea to use an uninterruptible power supply
(``UPS'', also called ``battery backup'') to avoid this state, especially if
you want HA.

On node startup, the `pve-guests` service is started and waits for
quorum. Once quorate, it starts all guests which have the `onboot`
flag set.
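
The `onboot` flag is set per guest; a minimal sketch, with 100 and 101 as
example IDs for a VM and a container:

[source,bash]
----
# start this VM automatically once the node is quorate after boot
qm set 100 --onboot 1

# the same for a container
pct set 101 --onboot 1
----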

When you turn on nodes, or when power comes back after power failure,
it is likely that some nodes boot faster than others. Please keep in
mind that guest startup is delayed until you reach quorum.


Guest Migration
---------------

Migrating virtual guests to other nodes is a useful feature in a
cluster. There are settings to control the behavior of such
migrations. This can be done via the configuration file
`datacenter.cfg` or for a specific migration via API or command line
parameters.

It makes a difference whether a guest is online or offline, or if it has
local resources (like a local disk).

For details about virtual machine migration, see the
xref:qm_migration[QEMU/KVM Migration Chapter].

For details about container migration, see the
xref:pct_migration[Container Migration Chapter].

Migration Type
~~~~~~~~~~~~~~

The migration type defines whether the migration data should be sent over an
encrypted (`secure`) channel or an unencrypted (`insecure`) one.
Setting the migration type to insecure means that the RAM content of a
virtual guest is also transferred unencrypted, which can lead to
information disclosure of critical data from inside the guest (for
example, passwords or encryption keys).

Therefore, we strongly recommend using the secure channel if you do
not have full control over the network and cannot guarantee that no
one is eavesdropping on it.

NOTE: Storage migration does not follow this setting. Currently, it
always sends the storage content over a secure channel.

Encryption requires a lot of computing power, so this setting is often
changed to `insecure` to achieve better performance. The impact on
modern systems is lower because they implement AES encryption in
hardware. The performance impact is particularly evident in fast
networks, where you can transfer 10 Gbps or more.
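
For a single migration, the type can also be given on the command line; a
minimal sketch, reusing VMID 106 and target node `tre` from the example
further below:

[source,bash]
----
# use the unencrypted channel for this one online migration only
qm migrate 106 tre --online --migration_type insecure
----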


Migration Network
~~~~~~~~~~~~~~~~~

By default, {pve} uses the network in which cluster communication
takes place to send the migration traffic. This is not optimal because
sensitive cluster traffic can be disrupted and this network may not
have the best bandwidth available on the node.

Setting the migration network parameter allows the use of a dedicated
network for the entire migration traffic. In addition to the memory,
this also affects the storage traffic for offline migrations.

The migration network is set as a network in the CIDR notation. This
has the advantage that you do not have to set individual IP addresses
for each node. {pve} can determine the real address on the
destination node from the network specified in the CIDR form. To
enable this, the network must be specified so that each node has one,
but only one IP in the respective network.


Example
^^^^^^^

We assume that we have a three-node setup with three separate
networks. One for public communication with the Internet, one for
cluster communication and a very fast one, which we want to use as a
dedicated network for migration.

A network configuration for such a setup might look as follows:

----
iface eno1 inet manual

# public network
auto vmbr0
iface vmbr0 inet static
    address 192.X.Y.57
    netmask 255.255.250.0
    gateway 192.X.Y.1
    bridge_ports eno1
    bridge_stp off
    bridge_fd 0

# cluster network
auto eno2
iface eno2 inet static
    address 10.1.1.1
    netmask 255.255.255.0

# fast network
auto eno3
iface eno3 inet static
    address 10.1.2.1
    netmask 255.255.255.0
----

Here, we will use the network 10.1.2.0/24 as a migration network. For
a single migration, you can do this using the `migration_network`
parameter of the command line tool:

----
# qm migrate 106 tre --online --migration_network 10.1.2.0/24
----

To configure this as the default network for all migrations in the
cluster, set the `migration` property of the `/etc/pve/datacenter.cfg`
file:

----
# use dedicated migration network
migration: secure,network=10.1.2.0/24
----

NOTE: The migration type must always be set when the migration network
gets set in `/etc/pve/datacenter.cfg`.


ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]