[[chapter_pvecm]]
ifdef::manvolnum[]
pvecm(1)
========
:pve-toplevel:

NAME
----

pvecm - Proxmox VE Cluster Manager

SYNOPSIS
--------

include::pvecm.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Cluster Manager
===============
:pve-toplevel:
endif::manvolnum[]

The {PVE} cluster manager `pvecm` is a tool to create a group of
physical servers. Such a group is called a *cluster*. We use the
http://www.corosync.org[Corosync Cluster Engine] for reliable group
communication, and such clusters can consist of up to 32 physical nodes
(probably more, depending on network latency).

`pvecm` can be used to create a new cluster, join nodes to a cluster,
leave the cluster, get status information and do various other cluster
related tasks. The **P**rox**m**o**x** **C**luster **F**ile **S**ystem (``pmxcfs'')
is used to transparently distribute the cluster configuration to all cluster
nodes.

Grouping nodes into a cluster has the following advantages:

* Centralized, web-based management

* Multi-master clusters: each node can do all management tasks

* `pmxcfs`: database-driven file system for storing configuration files,
  replicated in real-time on all nodes using `corosync`.

* Easy migration of virtual machines and containers between physical
  hosts

* Fast deployment

* Cluster-wide services like firewall and HA

Requirements
------------

* All nodes must be in the same network as `corosync` uses IP Multicast
  to communicate between nodes (also see
  http://www.corosync.org[Corosync Cluster Engine]). Corosync uses UDP
  ports 5404 and 5405 for cluster communication.
+
NOTE: Some switches do not support IP multicast by default and must be
manually enabled first.

* Date and time have to be synchronized (one way to verify this is shown in
  the example after this list).

* SSH tunnel on TCP port 22 between nodes is used.

* If you are interested in High Availability, you need to have at
  least three nodes for reliable quorum. All nodes should have the
  same version.

* We recommend a dedicated NIC for the cluster traffic, especially if
  you use shared storage.

* The root password of a cluster node is required for adding nodes.
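
One way to verify time synchronization on a node is to query systemd's time
status (a sketch, assuming the node uses a systemd-based time service such as
`systemd-timesyncd` or another NTP client):

[source,bash]
----
# shows whether the system clock is synchronized via NTP
timedatectl status
----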

NOTE: It is not possible to mix {pve} 3.x and earlier with {pve} 4.X cluster
nodes.

NOTE: While it's possible to mix {pve} 4.4 and {pve} 5.0 nodes, this is not
supported as a production configuration and should only be done temporarily
while upgrading the whole cluster from one major version to another.

Preparing Nodes
---------------

First, install {PVE} on all nodes. Make sure that each node is
installed with the final hostname and IP configuration. Changing the
hostname and IP is not possible after cluster creation.

Currently the cluster creation can either be done on the console (login via
`ssh`) or the API, which we have a GUI implementation for (__Datacenter ->
Cluster__).

While it's common practice to reference all other node names in `/etc/hosts`
with their IP, this is not strictly necessary for a cluster, which normally
uses multicast, to work. It may still be useful, as you can then connect
from one node to the other with SSH through the easier to remember node name.
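
Such entries could look like this (a sketch; the node names and addresses are
the example ones used elsewhere in this chapter, the domain is made up):

----
# /etc/hosts
192.168.15.91 hp1.example.org hp1
192.168.15.92 hp2.example.org hp2
192.168.15.93 hp3.example.org hp3
192.168.15.94 hp4.example.org hp4
----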

[[pvecm_create_cluster]]
Create the Cluster
------------------

Login via `ssh` to the first {pve} node. Use a unique name for your cluster.
This name cannot be changed later. The cluster name follows the same rules as
node names.

----
hp1# pvecm create CLUSTERNAME
----

CAUTION: The cluster name is used to compute the default multicast address.
Please use unique cluster names if you run more than one cluster inside your
network. To avoid human confusion, it is also recommended to choose different
names even if clusters do not share the cluster network.

To check the state of your cluster use:

----
hp1# pvecm status
----

Multiple Clusters In Same Network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is possible to create multiple clusters in the same physical or logical
network. Each cluster must have a unique name, which is used to generate the
cluster's multicast group address. As long as no duplicate cluster names are
configured in one network segment, the different clusters won't interfere with
each other.

If multiple clusters operate in a single network it may be beneficial to set up
an IGMP querier and enable IGMP Snooping in said network. This may reduce the
load of the network significantly because multicast packets are only delivered
to the endpoints of the respective member nodes.
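
If no querier is available on the physical switches, one hedged option on a
Linux bridge based setup is to let the bridge itself act as querier (assuming
the nodes are attached to a bridge named `vmbr0`):

[source,bash]
----
# enable IGMP snooping and the querier on the bridge (example bridge name vmbr0)
echo 1 > /sys/class/net/vmbr0/bridge/multicast_snooping
echo 1 > /sys/class/net/vmbr0/bridge/multicast_querier
----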

[[pvecm_join_node_to_cluster]]
Adding Nodes to the Cluster
---------------------------

Login via `ssh` to the node you want to add.

----
hp2# pvecm add IP-ADDRESS-CLUSTER
----

For `IP-ADDRESS-CLUSTER` use the IP or hostname of an existing cluster node.
An IP address is recommended (see <<corosync-addresses,Ring Address Types>>).

CAUTION: A new node cannot hold any VMs, because you would get
conflicts about identical VM IDs. Also, all existing configuration in
`/etc/pve` is overwritten when you join a new node to the cluster. As a
workaround, use `vzdump` to back up and restore it to a different VMID after
adding the node to the cluster.

To check the state of the cluster:

----
# pvecm status
----

.Cluster status after adding 4 nodes
----
hp2# pvecm status
Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1928
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.91
0x00000002          1 192.168.15.92 (local)
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
----

If you only want the list of all nodes use:

----
# pvecm nodes
----

.List nodes in a cluster
----
hp2# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1
         2          1 hp2 (local)
         3          1 hp3
         4          1 hp4
----

[[adding-nodes-with-separated-cluster-network]]
Adding Nodes With Separated Cluster Network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When adding a node to a cluster with a separated cluster network you need to
use the 'ringX_addr' parameters to set the node's address on those networks:

[source,bash]
----
pvecm add IP-ADDRESS-CLUSTER -ring0_addr IP-ADDRESS-RING0
----

If you want to use the Redundant Ring Protocol you will also want to pass the
'ring1_addr' parameter.
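
For example, a join using both rings could look like this (a sketch with
placeholder addresses):

[source,bash]
----
pvecm add IP-ADDRESS-CLUSTER -ring0_addr IP-ADDRESS-RING0 -ring1_addr IP-ADDRESS-RING1
----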

Remove a Cluster Node
---------------------

CAUTION: Read carefully the procedure before proceeding, as it may
not be what you want or need.

Move all virtual machines from the node. Make sure you have no local
data or backups you want to keep, or save them accordingly.
In the following example we will remove the node hp4 from the cluster.

Log in to a *different* cluster node (not hp4), and issue a `pvecm nodes`
command to identify the node ID to remove:

----
hp1# pvecm nodes

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
         1          1 hp1 (local)
         2          1 hp2
         3          1 hp3
         4          1 hp4
----

At this point you must power off hp4 and
make sure that it will not power on again (in the network) as it
is.

IMPORTANT: As said above, it is critical to power off the node
*before* removal, and make sure that it will *never* power on again
(in the existing cluster network) as it is.
If you power on the node as it is, your cluster will be screwed up and
it could be difficult to restore a clean cluster state.

After powering off the node hp4, we can safely remove it from the cluster.

----
hp1# pvecm delnode hp4
----

If the operation succeeds, no output is returned; just check the node
list again with `pvecm nodes` or `pvecm status`. You should see
something like:

----
hp1# pvecm status

Quorum information
~~~~~~~~~~~~~~~~~~
Date:             Mon Apr 20 12:44:28 2015
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1992
Quorate:          Yes

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes Name
0x00000001          1 192.168.15.90 (local)
0x00000002          1 192.168.15.91
0x00000003          1 192.168.15.92
----

If, for whatever reason, you want this server to join the same
cluster again, you have to

* reinstall {pve} on it from scratch

* then join it, as explained in the previous section.

NOTE: After removal of the node, its SSH fingerprint will still reside in the
'known_hosts' of the other nodes. If you receive an SSH error after rejoining
a node with the same IP or hostname, run `pvecm updatecerts` once on the
re-added node to update its fingerprint cluster wide.
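
For example, on the re-added node (a short sketch):

[source,bash]
----
# update the fingerprint information cluster wide (see note above)
pvecm updatecerts
----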

[[pvecm_separate_node_without_reinstall]]
Separate A Node Without Reinstalling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

CAUTION: This is *not* the recommended method, proceed with caution. Use the
above mentioned method if you're unsure.

You can also separate a node from a cluster without reinstalling it from
scratch. But after removing the node from the cluster it will still have
access to the shared storages! This must be resolved before you start removing
the node from the cluster. A {pve} cluster cannot share the exact same
storage with another cluster, as storage locking doesn't work over the cluster
boundary. Further, it may also lead to VMID conflicts.

It's suggested that you create a new storage where only the node which you want
to separate has access. This can be a new export on your NFS or a new Ceph
pool, to name a few examples. It's just important that the exact same storage
does not get accessed by multiple clusters. After setting this storage up, move
all data from the node and its VMs to it. Then you are ready to separate the
node from the cluster.

WARNING: Ensure all shared resources are cleanly separated! Otherwise you will
run into conflicts and problems.

First stop the corosync and the pve-cluster services on the node:
[source,bash]
----
systemctl stop pve-cluster
systemctl stop corosync
----

Start the cluster filesystem again in local mode:
[source,bash]
----
pmxcfs -l
----

Delete the corosync configuration files:
[source,bash]
----
rm /etc/pve/corosync.conf
rm /etc/corosync/*
----

You can now start the filesystem again as a normal service:
[source,bash]
----
killall pmxcfs
systemctl start pve-cluster
----

The node is now separated from the cluster. You can delete it from a remaining
node of the cluster with:
[source,bash]
----
pvecm delnode oldnode
----

If the command fails because the remaining node in the cluster lost quorum
when the now separated node exited, you may set the expected votes to 1 as a workaround:
[source,bash]
----
pvecm expected 1
----

And then repeat the 'pvecm delnode' command.

Now switch back to the separated node and delete all remaining files left
from the old cluster. This ensures that the node can be added to another
cluster again without problems.

[source,bash]
----
rm /var/lib/corosync/*
----

As the configuration files from the other nodes are still in the cluster
filesystem, you may want to clean those up too. Simply remove the whole
directory recursively from '/etc/pve/nodes/NODENAME', but check three times
that you used the correct one before deleting it.
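
For example, if the separated node was called hp4 (a sketch; triple-check the
name, this deletion cannot be undone):

[source,bash]
----
rm -rf /etc/pve/nodes/hp4
----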

CAUTION: The node's SSH keys are still in the 'authorized_keys' file; this means
the nodes can still connect to each other with public key authentication. This
should be fixed by removing the respective keys from the
'/etc/pve/priv/authorized_keys' file.

Quorum
------

{pve} uses a quorum-based technique to provide a consistent state among
all cluster nodes.

[quote, from Wikipedia, Quorum (distributed computing)]
____
A quorum is the minimum number of votes that a distributed transaction
has to obtain in order to be allowed to perform an operation in a
distributed system.
____

In case of network partitioning, state changes require that a
majority of nodes are online. The cluster switches to read-only mode
if it loses quorum.

NOTE: {pve} assigns a single vote to each node by default.
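
As a rough sketch of how the quorum threshold relates to the vote count
(consistent with the `pvecm status` outputs shown earlier in this chapter):

----
quorum = floor(expected_votes / 2) + 1

4 nodes, 1 vote each:  floor(4/2) + 1 = 3
3 nodes, 1 vote each:  floor(3/2) + 1 = 2
----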

Cluster Network
---------------

The cluster network is the core of a cluster. All messages sent over it have to
be delivered reliably to all nodes in their respective order. In {pve} this
part is done by corosync, an implementation of a high performance, low overhead,
high availability development toolkit. It serves our decentralized
configuration file system (`pmxcfs`).

[[cluster-network-requirements]]
Network Requirements
~~~~~~~~~~~~~~~~~~~~
This needs a reliable network with latencies under 2 milliseconds (LAN
performance) to work properly. While corosync can also use unicast for
communication between nodes it's **highly recommended** to have a multicast
capable network. The network should not be used heavily by other members;
ideally corosync runs on its own network. *Never* share it with a network
where storage communicates too.

Before setting up a cluster it is good practice to check if the network is fit
for that purpose.

* Ensure that all nodes are in the same subnet. This must only be true for the
  network interfaces used for cluster communication (corosync).

* Ensure all nodes can reach each other over those interfaces, using `ping` is
  enough for a basic test.

* Ensure that multicast works in general and at high packet rates. This can be
  done with the `omping` tool. The final "%loss" number should be < 1%.
+
[source,bash]
----
omping -c 10000 -i 0.001 -F -q NODE1-IP NODE2-IP ...
----

* Ensure that multicast communication works over an extended period of time.
  This uncovers problems where IGMP snooping is activated on the network but
  no multicast querier is active. This test has a duration of around 10
  minutes.
+
[source,bash]
----
omping -c 600 -i 1 -q NODE1-IP NODE2-IP ...
----

Your network is not ready for clustering if any of these tests fails. Recheck
your network configuration. Especially switches are notorious for having
multicast disabled by default or IGMP snooping enabled with no IGMP querier
active.

In smaller clusters it's also an option to use unicast if you really cannot get
multicast to work.

Separate Cluster Network
~~~~~~~~~~~~~~~~~~~~~~~~

When creating a cluster without any parameters the cluster network is generally
shared with the Web UI and the VMs and their traffic. Depending on your setup
even storage traffic may get sent over the same network. It's recommended to
change that, as corosync is a time-critical real-time application.

Setting Up A New Network
^^^^^^^^^^^^^^^^^^^^^^^^

First, you have to set up a new network interface. It should be on a physically
separate network. Ensure that your network fulfills the
<<cluster-network-requirements,cluster network requirements>>.
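
A minimal sketch of such an interface definition in `/etc/network/interfaces`,
assuming a spare NIC called `eno4` (a hypothetical name) and the 10.10.10.1/25
address used in the following example:

----
auto eno4
iface eno4 inet static
        address 10.10.10.1
        netmask 255.255.255.128
----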

Separate On Cluster Creation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is possible through the 'ring0_addr' and 'bindnet0_addr' parameters of
the 'pvecm create' command used for creating a new cluster.

If you have set up an additional NIC with a static address on 10.10.10.1/25
and want to send and receive all cluster communication over this interface
you would execute:

[source,bash]
----
pvecm create test --ring0_addr 10.10.10.1 --bindnet0_addr 10.10.10.0
----

To check if everything is working properly execute:
[source,bash]
----
systemctl status corosync
----

Afterwards, proceed as described in the section on how to
<<adding-nodes-with-separated-cluster-network,add nodes with a separated cluster network>>.

[[separate-cluster-net-after-creation]]
Separate After Cluster Creation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can do this also if you have already created a cluster and want to switch
its communication to another network, without rebuilding the whole cluster.
This change may lead to short periods of quorum loss in the cluster, as nodes
have to restart corosync and come up one after the other on the new network.

Check how to <<edit-corosync-conf,edit the corosync.conf file>> first.
Then open it and you should see a file similar to:

----
logging {
  debug: off
  to_syslog: yes
}

nodelist {

  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: due
  }

  node {
    name: tre
    nodeid: 3
    quorum_votes: 1
    ring0_addr: tre
  }

  node {
    name: uno
    nodeid: 1
    quorum_votes: 1
    ring0_addr: uno
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: thomas-testcluster
  config_version: 3
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 192.168.30.50
    ringnumber: 0
  }

}
----

The first thing you want to do is add the 'name' properties in the node entries
if you do not see them already. Those *must* match the node name.

Then replace the addresses from the 'ring0_addr' properties with the new
addresses. You may use plain IP addresses or also hostnames here. If you use
hostnames ensure that they are resolvable from all nodes (see also
<<corosync-addresses,Ring Address Types>>).

In my example I want to switch my cluster communication to the 10.10.10.1/25
network. So I replace all 'ring0_addr' respectively. I also set the bindnetaddr
in the totem section of the config to an address of the new network. It can be
any address from the subnet configured on the new network interface.

After you increased the 'config_version' property the new configuration file
should look like:

----

logging {
  debug: off
  to_syslog: yes
}

nodelist {

  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
  }

  node {
    name: tre
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.10.10.3
  }

  node {
    name: uno
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: thomas-testcluster
  config_version: 4
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.10.10.1
    ringnumber: 0
  }

}
----

Now after a final check that all changed information is correct we save it
and see again the <<edit-corosync-conf,edit corosync.conf file>> section to
learn how to bring it into effect.

As our change cannot be enforced live from corosync we have to do a restart.

On a single node execute:
[source,bash]
----
systemctl restart corosync
----

Now check if everything is fine:

[source,bash]
----
systemctl status corosync
----

If corosync runs correctly again, restart it on all other nodes as well.
They will then join the cluster membership one by one on the new network.

[[corosync-addresses]]
Corosync addresses
~~~~~~~~~~~~~~~~~~

A corosync link or ring address can be specified in two ways:

* **IPv4/v6 addresses** will be used directly. They are recommended, since they
are static and usually not changed carelessly.

* **Hostnames** will be resolved using `getaddrinfo`, which means that per
default, IPv6 addresses will be used first, if available (see also
`man gai.conf`). Keep this in mind, especially when upgrading an existing
cluster to IPv6.

CAUTION: Hostnames should be used with care, since the address they
resolve to can be changed without touching corosync or the node it runs on -
which may lead to a situation where an address is changed without thinking
about implications for corosync.

A separate, static hostname specifically for corosync is recommended, if
hostnames are preferred. Also, make sure that every node in the cluster can
resolve all hostnames correctly.

Since {pve} 5.1, while supported, hostnames will be resolved at the time of
entry. Only the resolved IP is then saved to the configuration.

Nodes that joined the cluster on earlier versions likely still use their
unresolved hostname in `corosync.conf`. It might be a good idea to replace
them with IPs or a separate hostname, as mentioned above.

[[pvecm_rrp]]
Redundant Ring Protocol
~~~~~~~~~~~~~~~~~~~~~~~
To avoid a single point of failure you should implement countermeasures.
This can be on the hardware and operating system level through network bonding.

Corosync itself also offers the possibility to add redundancy through the so
called 'Redundant Ring Protocol'. This protocol allows running a second totem
ring on another network. This network should be physically separated from the
other ring's network to actually increase availability.

RRP On Cluster Creation
~~~~~~~~~~~~~~~~~~~~~~~

The 'pvecm create' command provides the additional parameters 'bindnetX_addr',
'ringX_addr' and 'rrp_mode', which can be used for RRP configuration.

NOTE: See the <<corosync-conf-glossary,glossary>> if you do not know what each parameter means.

So if you have two networks, one on the 10.10.10.1/24 and the other on the
10.10.20.1/24 subnet you would execute:

[source,bash]
----
pvecm create CLUSTERNAME -bindnet0_addr 10.10.10.1 -ring0_addr 10.10.10.1 \
  -bindnet1_addr 10.10.20.1 -ring1_addr 10.10.20.1
----

RRP On Existing Clusters
~~~~~~~~~~~~~~~~~~~~~~~~

You will take similar steps as described in
<<separate-cluster-net-after-creation,separating the cluster network>> to
enable RRP on an already running cluster. The single difference is that you
will add `ring1` and use it instead of `ring0`.

First add a new `interface` subsection in the `totem` section and set its
`ringnumber` property to `1`. Set the interface's `bindnetaddr` property to an
address of the subnet you have configured for your new ring.
Further, set the `rrp_mode` to `passive`; this is the only stable mode.

Then add to each node entry in the `nodelist` section its new `ring1_addr`
property with the node's additional ring address.

So if you have two networks, one on the 10.10.10.1/24 and the other on the
10.10.20.1/24 subnet, the final configuration file should look like:

----
totem {
  cluster_name: tweak
  config_version: 9
  ip_version: ipv4
  rrp_mode: passive
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.10.10.1
    ringnumber: 0
  }
  interface {
    bindnetaddr: 10.10.20.1
    ringnumber: 1
  }
}

nodelist {
  node {
    name: pvecm1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
    ring1_addr: 10.10.20.1
  }

  node {
    name: pvecm2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
    ring1_addr: 10.10.20.2
  }

  [...] # other cluster nodes here
}

[...] # other remaining config sections here

----

Bring it into effect as described in the
<<edit-corosync-conf,edit the corosync.conf file>> section.

This is a change which cannot take effect live and needs at least a restart
of corosync. A restart of the whole cluster is recommended.

If you cannot reboot the whole cluster, ensure no High Availability services
are configured and then stop the corosync service on all nodes. After corosync
is stopped on all nodes, start it again one after the other.

Corosync External Vote Support
------------------------------

This section describes a way to deploy an external voter in a {pve} cluster.
When configured, the cluster can sustain more node failures without
violating safety properties of the cluster communication.

For this to work there are two services involved:

* a so-called qdevice daemon which runs on each {pve} node

* an external vote daemon which runs on an independent server.

As a result you can achieve higher availability even in smaller setups (for
example 2+1 nodes).

QDevice Technical Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~

The Corosync Quorum Device (QDevice) is a daemon which runs on each cluster
node. It provides a configured number of votes to the cluster's quorum
subsystem, based on an externally running third-party arbitrator's decision.
Its primary use is to allow a cluster to sustain more node failures than
standard quorum rules allow. This can be done safely as the external device
can see all nodes and thus choose only one set of nodes to give its vote.
This will only be done if said set of nodes can have quorum (again) when
receiving the third-party vote.

Currently only 'QDevice Net' is supported as a third-party arbitrator. It is
a daemon which provides a vote to a cluster partition if it can reach the
partition members over the network. It will give only votes to one partition
of a cluster at any time.
It's designed to support multiple clusters and is almost configuration and
state free. New clusters are handled dynamically and no configuration file
is needed on the host running a QDevice.

The only requirement for the external host is that it needs network access to
the cluster and has a corosync-qnetd package available. We provide such a
package for Debian based hosts; other Linux distributions should also have a
package available through their respective package manager.

NOTE: In contrast to corosync itself, a QDevice connects to the cluster over
TCP/IP and thus does not need a multicast capable network between itself and
the cluster. In fact the daemon may run outside of the LAN and can have
longer latencies than 2 ms.

Supported Setups
~~~~~~~~~~~~~~~~

We support QDevices for clusters with an even number of nodes and recommend
it for 2 node clusters, if they should provide higher availability.
For clusters with an odd node count we currently discourage the use of
QDevices. The reason for this is the difference in the votes the QDevice
provides for each cluster type. Even numbered clusters get a single additional
vote, with which we can only increase availability, i.e. if the QDevice
itself fails we are in the same situation as with no QDevice at all.

Now, with an odd numbered cluster size the QDevice provides '(N-1)' votes --
where 'N' corresponds to the cluster node count. This difference makes
sense; if we had only one additional vote the cluster could get into a split
brain situation.
This algorithm would allow that all nodes but one (and naturally the
QDevice itself) could fail.
There are two drawbacks with this:

* If the QNet daemon itself fails, no other node may fail or the cluster
  immediately loses quorum. For example, in a cluster with 15 nodes 7
  could fail before the cluster becomes inquorate. But, if a QDevice is
  configured here and said QDevice fails itself **no single node** of
  the 15 may fail. The QDevice acts almost as a single point of failure in
  this case.

* The fact that all but one node plus QDevice may fail sounds promising at
  first, but this may result in a mass recovery of HA services that would
  overload the single node left. Also, Ceph servers will stop providing
  services after only '((N-1)/2)' nodes are online.

If you understand the drawbacks and implications you can decide yourself if
you should use this technology in an odd numbered cluster setup.

QDevice-Net Setup
~~~~~~~~~~~~~~~~~

We recommend to run any daemon which provides votes to corosync-qdevice as an
unprivileged user. {pve} and Debian provide a package which is already
configured to do so.
The traffic between the daemon and the cluster must be encrypted to ensure a
safe and secure QDevice integration in {pve}.

First install the 'corosync-qnetd' package on your external server and
the 'corosync-qdevice' package on all cluster nodes.

After that, ensure that all your nodes on the cluster are online.
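
To verify that the external vote daemon is running on the independent server,
you can check its service status (a sketch, assuming the Debian package's
default systemd unit name):

[source,bash]
----
systemctl status corosync-qnetd
----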

You can now easily set up your QDevice by running the following command on one
of the {pve} nodes:

----
pve# pvecm qdevice setup <QDEVICE-IP>
----

The SSH key from the cluster will be automatically copied to the QDevice. You
might need to enter an SSH password during this step.

After you enter the password and all the steps are successfully completed, you
will see "Done". You can check the status now:

----
pve# pvecm status

...

Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.168.22.180 (local)
0x00000002          1    A,V,NMW 192.168.22.181
0x00000000          1            Qdevice

----

which means the QDevice is set up.


Frequently Asked Questions
~~~~~~~~~~~~~~~~~~~~~~~~~~

Tie Breaking
^^^^^^^^^^^^

In case of a tie, where two same-sized cluster partitions cannot see each other
but can see the QDevice, the QDevice chooses one of those partitions randomly
and provides a vote to it.

Possible Negative Implications
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For clusters with an even node count there are no negative implications when
setting up a QDevice. If it fails to work, you are as good as without a QDevice
at all.

Adding/Deleting Nodes After QDevice Setup
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you want to add a new node or remove an existing one from a cluster with a
QDevice setup, you need to remove the QDevice first. After that, you can add or
remove nodes normally. Once you have a cluster with an even node count again,
you can set up the QDevice again as described above.

Removing the QDevice
^^^^^^^^^^^^^^^^^^^^

If you used the official `pvecm` tool to add the QDevice, you can remove it
trivially by running:

----
pve# pvecm qdevice remove
----

//Still TODO
//^^^^^^^^^^
//There is still stuff to add here


Corosync Configuration
----------------------

The `/etc/pve/corosync.conf` file plays a central role in a {pve} cluster. It
controls the cluster membership and its network.
For reading more about it check the corosync.conf man page:
[source,bash]
----
man corosync.conf
----

For node membership you should always use the `pvecm` tool provided by {pve}.
You may have to edit the configuration file manually for other changes.
Here are a few best practice tips for doing this.

[[edit-corosync-conf]]
Edit corosync.conf
~~~~~~~~~~~~~~~~~~

Editing the corosync.conf file is not always straightforward. There are
two on each cluster, one in `/etc/pve/corosync.conf` and the other in
`/etc/corosync/corosync.conf`. Editing the one in our cluster file system will
propagate the changes to the local one, but not vice versa.

The configuration will get updated automatically as soon as the file changes.
This means changes which can be integrated in a running corosync will take
effect instantly. So you should always make a copy and edit that instead, to
avoid triggering unwanted changes by an intermediate save.

[source,bash]
----
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
----

Then open the config file with your favorite editor, `nano` and `vim.tiny` are
preinstalled on {pve} for example.

NOTE: Always increment the 'config_version' number on configuration changes,
omitting this can lead to problems.

After making the necessary changes create another copy of the current working
configuration file. This serves as a backup if the new configuration fails to
apply or causes problems in other ways.

[source,bash]
----
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
----

Then move the new configuration file over the old one:
[source,bash]
----
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
----

You may check with the commands
[source,bash]
----
systemctl status corosync
journalctl -b -u corosync
----

whether the change could be applied automatically. If not you may have to
restart the corosync service via:
[source,bash]
----
systemctl restart corosync
----

On errors check the troubleshooting section below.

Troubleshooting
~~~~~~~~~~~~~~~

Issue: 'quorum.expected_votes must be configured'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When corosync starts to fail and you get the following message in the system log:

----
[...]
corosync[1647]:  [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
corosync[1647]:  [SERV  ] Service engine 'corosync_quorum' failed to load for reason
'configuration error: nodelist or quorum.expected_votes must be configured!'
[...]
----

It means that the hostname you set for corosync 'ringX_addr' in the
configuration could not be resolved.
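
You can verify the resolution from every node, for example with (hypothetical
hostname `pvecm1`):

[source,bash]
----
getent hosts pvecm1
----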

Write Configuration When Not Quorate
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you need to change '/etc/pve/corosync.conf' on a node with no quorum, and you
know what you are doing, use:
[source,bash]
----
pvecm expected 1
----

This sets the expected vote count to 1 and makes the cluster quorate. You can
now fix your configuration, or revert it back to the last working backup.

This is not enough if corosync cannot start anymore. Here it's best to edit the
local copy of the corosync configuration in '/etc/corosync/corosync.conf' so
that corosync can start again. Ensure that on all nodes this configuration has
the same content to avoid split brains. If you are not sure what went wrong
it's best to ask the Proxmox Community to help you.


[[corosync-conf-glossary]]
Corosync Configuration Glossary
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ringX_addr::
This names the different ring addresses for the corosync totem rings used for
the cluster communication.

bindnetaddr::
Defines which interface the ring should bind to. It may be any address of
the subnet configured on the interface we want to use. In general it is
recommended to just use an address a node uses on this interface.

rrp_mode::
Specifies the mode of the redundant ring protocol and may be passive, active or
none. Note that use of active is highly experimental and not officially
supported. Passive is the preferred mode; it may double the cluster
communication throughput and increases availability.


Cluster Cold Start
------------------

It is obvious that a cluster is not quorate when all nodes are
offline. This is a common case after a power failure.

NOTE: It is always a good idea to use an uninterruptible power supply
(``UPS'', also called ``battery backup'') to avoid this state, especially if
you want HA.

On node startup, the `pve-guests` service is started and waits for
quorum. Once quorate, it starts all guests which have the `onboot`
flag set.
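
The `onboot` flag can be set per guest, for example for a VM (a sketch with a
hypothetical VMID):

[source,bash]
----
qm set 100 -onboot 1
----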

When you turn on nodes, or when power comes back after power failure,
it is likely that some nodes boot faster than others. Please keep in
mind that guest startup is delayed until you reach quorum.


Guest Migration
---------------

Migrating virtual guests to other nodes is a useful feature in a
cluster. There are settings to control the behavior of such
migrations. This can be done via the configuration file
`datacenter.cfg` or for a specific migration via API or command line
parameters.

It makes a difference if a guest is online or offline, or if it has
local resources (like a local disk).

For details about virtual machine migration see the
xref:qm_migration[QEMU/KVM Migration Chapter].

For details about container migration see the
xref:pct_migration[Container Migration Chapter].

Migration Type
~~~~~~~~~~~~~~

The migration type defines whether the migration data should be sent over an
encrypted (`secure`) channel or an unencrypted (`insecure`) one.
Setting the migration type to insecure means that the RAM content of a
virtual guest is also transferred unencrypted, which can lead to
information disclosure of critical data from inside the guest (for
example passwords or encryption keys).

Therefore, we strongly recommend using the secure channel if you do
not have full control over the network and cannot guarantee that no
one is eavesdropping on it.

NOTE: Storage migration does not follow this setting. Currently, it
always sends the storage content over a secure channel.

Encryption requires a lot of computing power, so this setting is often
changed to `insecure` to achieve better performance. The impact on
modern systems is lower because they implement AES encryption in
hardware. The performance impact is particularly evident in fast
networks where you can transfer 10 Gbps or more.
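
For example, to enforce encrypted migration traffic cluster-wide, the
`migration` property in `/etc/pve/datacenter.cfg` could be set like this (a
sketch; see also the network example further below):

----
migration: secure
----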

Migration Network
~~~~~~~~~~~~~~~~~

By default, {pve} uses the network in which cluster communication
takes place to send the migration traffic. This is not optimal because
sensitive cluster traffic can be disrupted and this network may not
have the best bandwidth available on the node.

Setting the migration network parameter allows the use of a dedicated
network for the entire migration traffic. In addition to the memory,
this also affects the storage traffic for offline migrations.

The migration network is set as a network in CIDR notation. This
has the advantage that you do not have to set individual IP addresses
for each node. {pve} can determine the real address on the
destination node from the network specified in the CIDR form. To
enable this, the network must be specified so that each node has one,
but only one IP in the respective network.

Example
^^^^^^^

We assume that we have a three-node setup with three separate
networks. One for public communication with the Internet, one for
cluster communication and a very fast one, which we want to use as a
dedicated network for migration.

A network configuration for such a setup might look as follows:

----
iface eno1 inet manual

# public network
auto vmbr0
iface vmbr0 inet static
        address 192.X.Y.57
        netmask 255.255.250.0
        gateway 192.X.Y.1
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0

# cluster network
auto eno2
iface eno2 inet static
        address 10.1.1.1
        netmask 255.255.255.0

# fast network
auto eno3
iface eno3 inet static
        address 10.1.2.1
        netmask 255.255.255.0
----

Here, we will use the network 10.1.2.0/24 as a migration network. For
a single migration, you can do this using the `migration_network`
parameter of the command line tool:

----
# qm migrate 106 tre --online --migration_network 10.1.2.0/24
----

To configure this as the default network for all migrations in the
cluster, set the `migration` property of the `/etc/pve/datacenter.cfg`
file:

----
# use dedicated migration network
migration: secure,network=10.1.2.0/24
----

NOTE: The migration type must always be set when the migration network
gets set in `/etc/pve/datacenter.cfg`.


ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]