ifdef::manvolnum[]
PVE(1)
======
include::attributes.txt[]
:pve-toplevel:

NAME
----

pvecm - Proxmox VE Cluster Manager

SYNOPSIS
--------

include::pvecm.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Cluster Manager
===============
include::attributes.txt[]
endif::manvolnum[]

ifdef::wiki[]
:pve-toplevel:
endif::wiki[]

The {PVE} cluster manager `pvecm` is a tool to create a group of
physical servers. Such a group is called a *cluster*. We use the
http://www.corosync.org[Corosync Cluster Engine] for reliable group
communication, and such clusters can consist of up to 32 physical nodes
(probably more, depending on network latency).

`pvecm` can be used to create a new cluster, join nodes to a cluster,
leave the cluster, get status information and do various other
cluster-related tasks. The **P**rox**m**o**x** **C**luster **F**ile **S**ystem (``pmxcfs'')
is used to transparently distribute the cluster configuration to all cluster
nodes.

Grouping nodes into a cluster has the following advantages:

* Centralized, web-based management

* Multi-master clusters: each node can do all management tasks

* `pmxcfs`: database-driven file system for storing configuration files,
replicated in real-time on all nodes using `corosync`.

* Easy migration of virtual machines and containers between physical
hosts

* Fast deployment

* Cluster-wide services like firewall and HA


Requirements
------------

* All nodes must be in the same network as `corosync` uses IP Multicast
to communicate between nodes (also see
http://www.corosync.org[Corosync Cluster Engine]). Corosync uses UDP
ports 5404 and 5405 for cluster communication.
+
NOTE: Some switches do not support IP multicast by default, so it must be
enabled manually first.

* Date and time have to be synchronized (a quick check is sketched after
this list).

* An SSH tunnel on TCP port 22 between nodes is used.

* If you are interested in High Availability, you need to have at
least three nodes for reliable quorum. All nodes should have the
same version.

* We recommend a dedicated NIC for the cluster traffic, especially if
you use shared storage.

NOTE: It is not possible to mix Proxmox VE 3.x and earlier with
Proxmox VE 4.0 cluster nodes.
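
To verify that time is in sync on a node, you can for example query systemd's
`timedatectl` (shown only as a sketch; any NTP setup that keeps the clocks
synchronized is fine):

[source,bash]
----
timedatectl status | grep -i 'synchronized'
----
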


Preparing Nodes
---------------

First, install {PVE} on all nodes. Make sure that each node is
installed with the final hostname and IP configuration. Changing the
hostname and IP is not possible after cluster creation.

Currently the cluster creation has to be done on the console, so you
need to log in via `ssh`.


Create the Cluster
------------------

Log in via `ssh` to the first {pve} node. Use a unique name for your cluster.
This name cannot be changed later.

 hp1# pvecm create YOUR-CLUSTER-NAME

CAUTION: The cluster name is used to compute the default multicast
address. Please use unique cluster names if you run more than one
cluster inside your network.

To check the state of your cluster use:

 hp1# pvecm status


Adding Nodes to the Cluster
---------------------------

Log in via `ssh` to the node you want to add.

 hp2# pvecm add IP-ADDRESS-CLUSTER

For `IP-ADDRESS-CLUSTER` use the IP from an existing cluster node.

CAUTION: A new node cannot hold any VMs, because you would get
conflicts about identical VM IDs. Also, all existing configuration in
`/etc/pve` is overwritten when you join a new node to the cluster. As a
workaround, use `vzdump` to back up and restore to a different VMID after
adding the node to the cluster.
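
A minimal sketch of that workaround, assuming a single VM with the ID 100 on
the joining node and the default dump directory (the IDs, storage name and
archive path are only examples):

[source,bash]
----
# on the node that is going to join, before running 'pvecm add'
vzdump 100 --mode stop --storage local

# after the node has joined the cluster, restore the backup under a free VMID
qmrestore /var/lib/vz/dump/vzdump-qemu-100-*.vma 200
----

For containers, `pct restore` is used instead of `qmrestore`.
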

To check the state of the cluster:

 # pvecm status

.Cluster status after adding 4 nodes
----
hp2# pvecm status
Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1928
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           2
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.91
0x00000002          1 192.168.15.92 (local)
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
----

If you only want the list of all nodes use:

 # pvecm nodes

.List nodes in a cluster
----
hp2# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1
         2          1 hp2 (local)
         3          1 hp3
         4          1 hp4
----

Adding Nodes With Separated Cluster Network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When adding a node to a cluster with a separated cluster network you need to
use the 'ringX_addr' parameters to set the node's address on those networks:

[source,bash]
----
pvecm add IP-ADDRESS-CLUSTER -ring0_addr IP-ADDRESS-RING0
----

If you want to use the Redundant Ring Protocol you will also want to pass the
'ring1_addr' parameter.


Remove a Cluster Node
---------------------

CAUTION: Read the procedure carefully before proceeding, as it may not
be what you want or need.

Move all virtual machines from the node. Make sure you have no local
data or backups you want to keep, or save them accordingly.

Log in to one remaining node via ssh. Issue a `pvecm nodes` command to
identify the node ID:

----
hp1# pvecm status

Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1928
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           2
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.91 (local)
0x00000002          1 192.168.15.92
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
----

IMPORTANT: At this point you must power off the node to be removed and
make sure that it will not power on again (in the network) as it
is.

----
hp1# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1 (local)
         2          1 hp2
         3          1 hp3
         4          1 hp4
----

Log in to one remaining node via ssh. Issue the delete command (here
deleting node `hp4`):

 hp1# pvecm delnode hp4

If the operation succeeds, no output is returned; just check the node
list again with `pvecm nodes` or `pvecm status`. You should see
something like:

----
hp1# pvecm status

Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:44:28 2015
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1992
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           3
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.90 (local)
0x00000002          1 192.168.15.91
0x00000003          1 192.168.15.92
----

IMPORTANT: As said above, it is very important to power off the node
*before* removal, and make sure that it will *never* power on again
(in the existing cluster network) as it is.

If you power on the node as it is, your cluster will be screwed up and
it could be difficult to restore a clean cluster state.

If, for whatever reason, you want this server to join the same
cluster again, you have to

* reinstall {pve} on it from scratch

* then join it, as explained in the previous section.

Separate A Node Without Reinstalling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

CAUTION: This is *not* the recommended method, proceed with caution. Use the
above mentioned method if you're unsure.

You can also separate a node from a cluster without reinstalling it from
scratch. But after removing the node from the cluster it will still have
access to the shared storages! This must be resolved before you start removing
the node from the cluster. A {pve} cluster cannot share the exact same
storage with another cluster, as it leads to VMID conflicts.

It is suggested that you create a new storage to which only the node that you
want to separate has access. This can be a new export on your NFS or a new Ceph
pool, to name a few examples. It is just important that the exact same storage
does not get accessed by multiple clusters. After setting this storage up, move
all data from the node and its VMs to it. Then you are ready to separate the
node from the cluster.
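
As a sketch, such a storage could be added with `pvesm` and restricted to the
node that is going to be separated (the storage name, server address, export
path and node name below are made up for illustration):

[source,bash]
----
pvesm add nfs separate-nfs --server 192.168.15.200 \
    --export /export/separate --content images,rootdir --nodes hp4
----
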

WARNING: Ensure all shared resources are cleanly separated! Otherwise you will
run into conflicts and problems.

First stop the corosync and the pve-cluster services on the node:
[source,bash]
----
systemctl stop pve-cluster
systemctl stop corosync
----

Start the cluster filesystem again in local mode:
[source,bash]
----
pmxcfs -l
----

Delete the corosync configuration files:
[source,bash]
----
rm /etc/pve/corosync.conf
rm /etc/corosync/*
----

You can now start the filesystem again as a normal service:
[source,bash]
----
killall pmxcfs
systemctl start pve-cluster
----

The node is now separated from the cluster. You can delete it from a remaining
node of the cluster with:
[source,bash]
----
pvecm delnode oldnode
----

If the command fails because the remaining node in the cluster lost quorum
when the now separate node exited, you may set the expected votes to 1 as a workaround:
[source,bash]
----
pvecm expected 1
----

And then repeat the 'pvecm delnode' command.

Now switch back to the separated node and delete all remaining files left
over from the old cluster. This ensures that the node can be added to another
cluster again without problems.

[source,bash]
----
rm /var/lib/corosync/*
----

As the configuration files from the other nodes are still in the cluster
filesystem you may want to clean those up too. Simply remove the whole
directory recursively from '/etc/pve/nodes/NODENAME', but check three times
that you used the correct one before deleting it.
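
For example, if the separated node was called `hp4` (a made-up name), this
would be:

[source,bash]
----
rm -rf /etc/pve/nodes/hp4
----
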

CAUTION: The node's SSH keys are still in the 'authorized_keys' file, this means
the nodes can still connect to each other with public key authentication. This
should be fixed by removing the respective keys from the
'/etc/pve/priv/authorized_keys' file.
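
One way to do that, assuming the key entries of the separated node can be
identified by its hostname (again using the made-up name `hp4`), is:

[source,bash]
----
sed -i '/hp4/d' /etc/pve/priv/authorized_keys
----
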

Quorum
------

{pve} uses a quorum-based technique to provide a consistent state among
all cluster nodes.

[quote, from Wikipedia, Quorum (distributed computing)]
____
A quorum is the minimum number of votes that a distributed transaction
has to obtain in order to be allowed to perform an operation in a
distributed system.
____

In case of network partitioning, state changes require that a
majority of nodes are online. The cluster switches to read-only mode
if it loses quorum.

NOTE: {pve} assigns a single vote to each node by default.

Cluster Network
---------------

The cluster network is the core of a cluster. All messages sent over it have to
be delivered reliably to all nodes in their respective order. In {pve} this
part is done by corosync, an implementation of a high performance, low overhead,
high availability development toolkit. It serves our decentralized
configuration file system (`pmxcfs`).

[[cluster-network-requirements]]
Network Requirements
~~~~~~~~~~~~~~~~~~~~
This needs a reliable network with latencies under 2 milliseconds (LAN
performance) to work properly. While corosync can also use unicast for
communication between nodes it is **highly recommended** to have a multicast
capable network. The network should not be used heavily by other members;
ideally corosync runs on its own network.
*Never* share it with a network where storage communicates too.

Before setting up a cluster it is good practice to check if the network is fit
for that purpose.

* Ensure that all nodes are in the same subnet. This must only be true for the
network interfaces used for cluster communication (corosync).

* Ensure all nodes can reach each other over those interfaces, using `ping` is
enough for a basic test.

* Ensure that multicast works in general and at a high packet rate. This can be
done with the `omping` tool. The final "%loss" number should be < 1%.
[source,bash]
----
omping -c 10000 -i 0.001 -F -q NODE1-IP NODE2-IP ...
----

* Ensure that multicast communication works over an extended period of time.
This uncovers problems where IGMP snooping is activated on the network but
no multicast querier is active. This test has a duration of around 10
minutes.
[source,bash]
----
omping -c 600 -i 1 -q NODE1-IP NODE2-IP ...
----

Your network is not ready for clustering if any of these tests fails. Recheck
your network configuration. Switches in particular are notorious for having
multicast disabled by default or IGMP snooping enabled with no IGMP querier
active.

In smaller clusters it is also an option to use unicast if you really cannot
get multicast to work.

Separate Cluster Network
~~~~~~~~~~~~~~~~~~~~~~~~

When creating a cluster without any parameters the cluster network is generally
shared with the Web UI and the VMs and their traffic. Depending on your setup
even storage traffic may get sent over the same network. It is recommended to
change that, as corosync is a time critical real time application.

Setting Up A New Network
^^^^^^^^^^^^^^^^^^^^^^^^

First you have to set up a new network interface. It should be on a physically
separate network. Ensure that your network fulfills the
<<cluster-network-requirements,cluster network requirements>>.

Separate On Cluster Creation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is possible through the 'ring0_addr' and 'bindnet0_addr' parameters of
the 'pvecm create' command used for creating a new cluster.

If you have set up an additional NIC with a static address on 10.10.10.1/25
and want to send and receive all cluster communication over this interface
you would execute:

[source,bash]
----
pvecm create test --ring0_addr 10.10.10.1 --bindnet0_addr 10.10.10.0
----

To check if everything is working properly execute:
[source,bash]
----
systemctl status corosync
----

[[separate-cluster-net-after-creation]]
Separate After Cluster Creation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can also do this if you have already created a cluster and want to switch
its communication to another network, without rebuilding the whole cluster.
This change may lead to short periods of quorum loss in the cluster, as nodes
have to restart corosync and come up one after the other on the new network.

Check how to <<edit-corosync-conf,edit the corosync.conf file>> first.
Then open it and you should see a file similar to:

----
logging {
  debug: off
  to_syslog: yes
}

nodelist {

  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: due
  }

  node {
    name: tre
    nodeid: 3
    quorum_votes: 1
    ring0_addr: tre
  }

  node {
    name: uno
    nodeid: 1
    quorum_votes: 1
    ring0_addr: uno
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: thomas-testcluster
  config_version: 3
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 192.168.30.50
    ringnumber: 0
  }

}
----

The first thing you want to do is add the 'name' properties in the node entries
if you do not see them already. Those *must* match the node name.

Then replace the addresses from the 'ring0_addr' properties with the new
addresses. You may use plain IP addresses or hostnames here. If you use
hostnames, ensure that they are resolvable from all nodes.

In my example I want to switch my cluster communication to the 10.10.10.1/25
network. So I replace all 'ring0_addr' respectively. I also set the bindnetaddr
in the totem section of the config to an address of the new network. It can be
any address from the subnet configured on the new network interface.

After you increased the 'config_version' property the new configuration file
should look like:

----

logging {
  debug: off
  to_syslog: yes
}

nodelist {

  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
  }

  node {
    name: tre
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.10.10.3
  }

  node {
    name: uno
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: thomas-testcluster
  config_version: 4
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.10.10.1
    ringnumber: 0
  }

}
----

Now, after a final check that all changed information is correct, we save it
and again see the <<edit-corosync-conf,edit corosync.conf file>> section to
learn how to bring it into effect.

As our change cannot be applied live by corosync, we have to do a restart.

On a single node execute:
[source,bash]
----
systemctl restart corosync
----

Now check if everything is fine:

[source,bash]
----
systemctl status corosync
----

If corosync runs correctly again, restart it on all other nodes too.
They will then join the cluster membership one by one on the new network.

Redundant Ring Protocol
~~~~~~~~~~~~~~~~~~~~~~~
To avoid a single point of failure you should implement countermeasures.
This can be done on the hardware and operating system level through network
bonding.

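A minimal sketch of such a bond on the operating system level, using Debian's
ifupdown configuration in '/etc/network/interfaces' (interface names, mode and
addresses are just examples, not a recommendation for your setup):

----
auto bond0
iface bond0 inet static
    address 10.10.10.1
    netmask 255.255.255.0
    bond-slaves eth1 eth2
    bond-mode active-backup
    bond-miimon 100
----
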
Corosync itself also offers the possibility to add redundancy through the
so-called 'Redundant Ring Protocol'. This protocol allows running a second
totem ring on another network; this network should be physically separated
from the other ring's network to actually increase availability.

RRP On Cluster Creation
~~~~~~~~~~~~~~~~~~~~~~~

The 'pvecm create' command provides the additional parameters 'bindnetX_addr',
'ringX_addr' and 'rrp_mode', which can be used for RRP configuration.

NOTE: See the <<corosync-conf-glossary,glossary>> if you do not know what each parameter means.

So if you have two networks, one on the 10.10.10.1/24 and the other on the
10.10.20.1/24 subnet you would execute:

[source,bash]
----
pvecm create CLUSTERNAME -bindnet0_addr 10.10.10.1 -ring0_addr 10.10.10.1 \
-bindnet1_addr 10.10.20.1 -ring1_addr 10.10.20.1
----

RRP On A Created Cluster
~~~~~~~~~~~~~~~~~~~~~~~~

When enabling an already running cluster to use RRP you will take similar steps
as described in <<separate-cluster-net-after-creation,separating the cluster
network>>. You just do it on another ring.

First add a new `interface` subsection in the `totem` section and set its
`ringnumber` property to `1`. Set the interface's `bindnetaddr` property to an
address of the subnet you have configured for your new ring.
Further set the `rrp_mode` to `passive`, this is the only stable mode.

Then add to each node entry in the `nodelist` section its new `ring1_addr`
property with the node's additional ring address.

So if you have two networks, one on the 10.10.10.1/24 and the other on the
10.10.20.1/24 subnet, the final configuration file should look like:

----
totem {
  cluster_name: tweak
  config_version: 9
  ip_version: ipv4
  rrp_mode: passive
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.10.10.1
    ringnumber: 0
  }
  interface {
    bindnetaddr: 10.10.20.1
    ringnumber: 1
  }
}

nodelist {
  node {
    name: pvecm1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
    ring1_addr: 10.10.20.1
  }

  node {
    name: pvecm2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
    ring1_addr: 10.10.20.2
  }

  [...] # other cluster nodes here
}

[...] # other remaining config sections here

----

Bring it into effect as described in the <<edit-corosync-conf,edit the
corosync.conf file>> section.

This is a change which cannot take effect live and needs at least a restart
of corosync. A restart of the whole cluster is recommended.

If you cannot reboot the whole cluster, ensure no High Availability services
are configured and then stop the corosync service on all nodes. After corosync
is stopped on all nodes, start it again one after the other.

Corosync Configuration
----------------------

The `/etc/pve/corosync.conf` file plays a central role in a {pve} cluster. It
controls the cluster membership and its network.
For reading more about it check the corosync.conf man page:
[source,bash]
----
man corosync.conf
----

For node membership you should always use the `pvecm` tool provided by {pve}.
You may have to edit the configuration file manually for other changes.
Here are a few best practice tips for doing this.

[[edit-corosync-conf]]
Edit corosync.conf
~~~~~~~~~~~~~~~~~~

Editing the corosync.conf file is not always straightforward. There are
two on each cluster, one in `/etc/pve/corosync.conf` and the other in
`/etc/corosync/corosync.conf`. Editing the one in our cluster file system will
propagate the changes to the local one, but not vice versa.

The configuration will get updated automatically as soon as the file changes.
This means changes which can be integrated in a running corosync will take
effect instantly. So you should always make a copy and edit that instead, to
avoid triggering unwanted changes by an intermediate save.

[source,bash]
----
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
----

Then open the config file with your favorite editor, `nano` and `vim.tiny` are
preinstalled on {pve} for example.

NOTE: Always increment the 'config_version' number on configuration changes,
omitting this can lead to problems.

After making the necessary changes create another copy of the current working
configuration file. This serves as a backup if the new configuration fails to
apply or causes problems in other ways.

[source,bash]
----
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
----

Then move the new configuration file over the old one:
[source,bash]
----
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
----

You may check with the commands
[source,bash]
----
systemctl status corosync
journalctl -b -u corosync
----

whether the change could be applied automatically. If not, you may have to
restart the corosync service via:
[source,bash]
----
systemctl restart corosync
----

On errors check the troubleshooting section below.

Troubleshooting
~~~~~~~~~~~~~~~

Issue: 'quorum.expected_votes must be configured'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When corosync starts to fail and you get the following message in the system log:

----
[...]
corosync[1647]: [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
corosync[1647]: [SERV ] Service engine 'corosync_quorum' failed to load for reason
'configuration error: nodelist or quorum.expected_votes must be configured!'
[...]
----

It means that the hostname you set for corosync 'ringX_addr' in the
configuration could not be resolved.
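
A quick way to verify name resolution on each node, using the hostname `due`
from the earlier example configuration (substitute your own ring address), is:

[source,bash]
----
getent hosts due
----
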


Write Configuration When Not Quorate
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you need to change '/etc/pve/corosync.conf' on a node with no quorum, and
you know what you are doing, use:
[source,bash]
----
pvecm expected 1
----

This sets the expected vote count to 1 and makes the cluster quorate. You can
now fix your configuration, or revert it back to the last working backup.

This is not enough if corosync cannot start anymore. In that case it is best to
edit the local copy of the corosync configuration in
'/etc/corosync/corosync.conf' so that corosync can start again. Ensure that on
all nodes this configuration has the same content to avoid split-brain
situations. If you are not sure what went wrong it's best to ask the Proxmox
Community to help you.


[[corosync-conf-glossary]]
Corosync Configuration Glossary
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ringX_addr::
This names the different ring addresses for the corosync totem rings used for
the cluster communication.

bindnetaddr::
Defines to which interface the ring should bind. It may be any address of the
subnet configured on the interface we want to use. In general it is
recommended to just use an address a node uses on this interface.

rrp_mode::
Specifies the mode of the redundant ring protocol and may be passive, active or
none. Note that use of active is highly experimental and not officially
supported. Passive is the preferred mode; it may double the cluster
communication throughput and increases availability.


Cluster Cold Start
------------------

It is obvious that a cluster is not quorate when all nodes are
offline. This is a common case after a power failure.

NOTE: It is always a good idea to use an uninterruptible power supply
(``UPS'', also called ``battery backup'') to avoid this state, especially if
you want HA.

On node startup, service `pve-manager` is started and waits for
quorum. Once quorate, it starts all guests which have the `onboot`
flag set.
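
The flag can be set per guest, for example (the VMID 100 is just an
illustration):

[source,bash]
----
qm set 100 --onboot 1     # virtual machine
pct set 100 --onboot 1    # container
----
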

When you turn on nodes, or when power comes back after power failure,
it is likely that some nodes boot faster than others. Please keep in
mind that guest startup is delayed until you reach quorum.


ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]