git.proxmox.com Git - ceph.git/blob - ceph/doc/rados/operations/add-or-rm-mons.rst
update ceph source to reef 18.2.1
1 .. _adding-and-removing-monitors:
2
3 ==========================
4 Adding/Removing Monitors
5 ==========================
6
7 It is possible to add monitors to a running cluster as long as redundancy is
8 maintained. To bootstrap a monitor, see `Manual Deployment`_ or `Monitor
9 Bootstrap`_.
10
11 .. _adding-monitors:
12
13 Adding Monitors
14 ===============
15
16 Ceph monitors serve as the single source of truth for the cluster map. It is
17 possible to run a cluster with only one monitor, but for a production cluster
18 it is recommended to have at least three monitors provisioned and in quorum.
19 Ceph monitors use a variation of the `Paxos`_ algorithm to maintain consensus
20 about maps and about other critical information across the cluster. Due to the
21 nature of Paxos, Ceph is able to maintain quorum (and thus establish
22 consensus) only if a majority of the monitors are ``active``.
23
24 It is best to run an odd number of monitors. This is because a cluster that is
25 running an odd number of monitors is more resilient than a cluster running an
26 even number. For example, in a two-monitor deployment, no failures can be
27 tolerated if quorum is to be maintained; in a three-monitor deployment, one
28 failure can be tolerated; in a four-monitor deployment, one failure can be
29 tolerated; and in a five-monitor deployment, two failures can be tolerated. In
30 general, a cluster running an odd number of monitors is best because it avoids
31 what is called the *split brain* phenomenon. In short, Ceph is able to operate
32 only if a majority of monitors are ``active`` and able to communicate with each
33 other (for example, there must be a single monitor, two out of two monitors,
34 two out of three monitors, three out of five monitors, or the like).
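
The majority rule above reduces to simple arithmetic: a cluster of ``N``
monitors needs ``N / 2 + 1`` monitors (integer division) to form quorum, and
can lose the rest. The following is a minimal shell sketch of that calculation.
The function names ``quorum_size`` and ``tolerated_failures`` are illustrative
and are not part of Ceph:

.. code-block:: bash

   #!/usr/bin/env bash
   # Illustrative helpers (not part of Ceph): majority arithmetic for N monitors.

   # Smallest number of monitors that forms a majority of N.
   quorum_size() {
       local n=$1
       echo $(( n / 2 + 1 ))
   }

   # Number of monitors that can fail while a majority still remains.
   tolerated_failures() {
       local n=$1
       echo $(( n - (n / 2 + 1) ))
   }

   for n in 1 2 3 4 5; do
       echo "$n monitors: quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
   done

For four monitors this reports a quorum of three and one tolerated failure,
which is no better than the three-monitor case. This is the reason an odd
number of monitors is recommended.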
35
36 For small or non-critical deployments of multi-node Ceph clusters, it is
37 recommended to deploy three monitors. For larger clusters or for clusters that
38 are intended to survive a double failure, it is recommended to deploy five
39 monitors. Only in rare circumstances is there any justification for deploying
40 seven or more monitors.
41
42 It is possible to run a monitor on the same host that is running an OSD.
43 However, this approach has disadvantages. For example, `fsync` issues with the
44 kernel might weaken performance, and monitor and OSD daemons might be inactive at
45 the same time and cause disruption if the node crashes, is rebooted, or is
46 taken down for maintenance. Because of these risks, it is instead
47 recommended to run monitors and managers on dedicated hosts.
48
49 .. note:: A *majority* of monitors in your cluster must be able to
50 reach each other in order for quorum to be established.
51
52 Deploying your Hardware
53 -----------------------
54
55 Some operators choose to add a new monitor host at the same time that they add
56 a new monitor. For details on the minimum recommendations for monitor hardware,
57 see `Hardware Recommendations`_. Before adding a monitor host to the cluster,
58 make sure that there is an up-to-date version of Linux installed.
59
60 Add the newly installed monitor host to a rack in your cluster, connect the
61 host to the network, and make sure that the host has network connectivity.
62
63 .. _Hardware Recommendations: ../../../start/hardware-recommendations
64
65 Installing the Required Software
66 --------------------------------
67
68 In manually deployed clusters, it is necessary to install Ceph packages
69 manually. For details, see `Installing Packages`_. Configure SSH so that it can
70 be used by a user that has passwordless authentication and root permissions.
71
72 .. _Installing Packages: ../../../install/install-storage-cluster
73
74
75 .. _Adding a Monitor (Manual):
76
77 Adding a Monitor (Manual)
78 -------------------------
79
80 The procedure in this section creates a ``ceph-mon`` data directory, retrieves
81 both the monitor map and the monitor keyring, and adds a ``ceph-mon`` daemon to
82 the cluster. The procedure might result in a Ceph cluster that contains only
83 two monitor daemons. To add more monitors until there are enough ``ceph-mon``
84 daemons to establish quorum, repeat the procedure.
85
86 This is a good point at which to define the new monitor's ``id``. Monitors have
87 often been named with single letters (``a``, ``b``, ``c``, etc.), but you are
88 free to define the ``id`` however you see fit. In this document, ``{mon-id}``
89 refers to the ``id`` exclusive of the ``mon.`` prefix: for example, if
90 ``mon.a`` has been chosen as the ``id`` of a monitor, then ``{mon-id}`` is
91 ``a``.
92
93 #. Create a data directory on the machine that will host the new monitor:
94
95 .. prompt:: bash $
96
97 ssh {new-mon-host}
98 sudo mkdir /var/lib/ceph/mon/ceph-{mon-id}
99
100 #. Create a temporary directory ``{tmp}`` that will contain the files needed
101 during this procedure. This directory should be different from the data
102 directory created in the previous step. Because this is a temporary
103 directory, it can be removed after the procedure is complete:
104
105 .. prompt:: bash $
106
107 mkdir {tmp}
108
109 #. Retrieve the keyring for your monitors (``{tmp}`` is the path to the
110 retrieved keyring and ``{key-filename}`` is the name of the file that
111 contains the retrieved monitor key):
112
113 .. prompt:: bash $
114
115 ceph auth get mon. -o {tmp}/{key-filename}
116
117 #. Retrieve the monitor map (``{tmp}`` is the path to the retrieved monitor map
118 and ``{map-filename}`` is the name of the file that contains the retrieved
119 monitor map):
120
121 .. prompt:: bash $
122
123 ceph mon getmap -o {tmp}/{map-filename}
124
125 #. Prepare the monitor's data directory, which was created in the first step.
126 The following command must specify the path to the monitor map (so that
127 information about a quorum of monitors and their ``fsid``\s can be
128 retrieved) and specify the path to the monitor keyring:
129
130 .. prompt:: bash $
131
132 sudo ceph-mon -i {mon-id} --mkfs --monmap {tmp}/{map-filename} --keyring {tmp}/{key-filename}
133
134 #. Start the new monitor. It will automatically join the cluster. To provide
135 information to the daemon about which address to bind to, use either the
136 ``--public-addr {ip}`` option or the ``--public-network {network}`` option.
137 For example:
138
139 .. prompt:: bash $
140
141 ceph-mon -i {mon-id} --public-addr {ip:port}
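
The steps above can be collected into a small helper. The following is a sketch
only: the function name ``print_add_mon_commands`` and the file names
``mon.keyring`` and ``monmap`` are assumptions made for illustration. The
function prints the commands instead of running them, so that they can be
reviewed and then run by hand on the new monitor host:

.. code-block:: bash

   #!/usr/bin/env bash
   # Sketch: print (do not run) the commands of the manual add-monitor procedure.
   # The file names "mon.keyring" and "monmap" are illustrative choices.
   print_add_mon_commands() {
       local mon_id=$1 tmp=$2 public_addr=$3
       printf '%s\n' \
           "sudo mkdir /var/lib/ceph/mon/ceph-${mon_id}" \
           "mkdir ${tmp}" \
           "ceph auth get mon. -o ${tmp}/mon.keyring" \
           "ceph mon getmap -o ${tmp}/monmap" \
           "sudo ceph-mon -i ${mon_id} --mkfs --monmap ${tmp}/monmap --keyring ${tmp}/mon.keyring" \
           "ceph-mon -i ${mon_id} --public-addr ${public_addr}"
   }

   # Example: a monitor with id "d" that binds to 10.0.0.4:6789.
   print_add_mon_commands d /tmp/add-mon 10.0.0.4:6789

Printing the commands keeps the sketch safe to run anywhere. It does not
replace checking each step against the state of the cluster.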
142
143 .. _removing-monitors:
144
145 Removing Monitors
146 =================
147
148 When monitors are removed from a cluster, it is important to remember
149 that Ceph monitors use Paxos to maintain consensus about the cluster
150 map. Such consensus is possible only if the number of monitors is sufficient
151 to establish quorum.
152
153
154 .. _Removing a Monitor (Manual):
155
156 Removing a Monitor (Manual)
157 ---------------------------
158
159 The procedure in this section removes a ``ceph-mon`` daemon from the cluster.
160 The procedure might result in a Ceph cluster that contains a number of monitors
161 insufficient to maintain quorum, so plan carefully. When replacing an old
162 monitor with a new monitor, add the new monitor first, wait for quorum to be
163 established, and then remove the old monitor. This ensures that quorum is not
164 lost.
165
166
167 #. Stop the monitor:
168
169 .. prompt:: bash $
170
171 service ceph -a stop mon.{mon-id}
172
173 #. Remove the monitor from the cluster:
174
175 .. prompt:: bash $
176
177 ceph mon remove {mon-id}
178
179 #. Remove the monitor entry from the ``ceph.conf`` file.
180
181 .. _rados-mon-remove-from-unhealthy:
182
183
184 Removing Monitors from an Unhealthy Cluster
185 -------------------------------------------
186
187 The procedure in this section removes a ``ceph-mon`` daemon from an unhealthy
188 cluster (for example, a cluster whose monitors are unable to form a quorum).
189
190 #. Stop all ``ceph-mon`` daemons on all monitor hosts:
191
192 .. prompt:: bash $
193
194 ssh {mon-host}
195 systemctl stop ceph-mon.target
196
197 Repeat this step on every monitor host.
198
199 #. Identify a surviving monitor and log in to the monitor's host:
200
201 .. prompt:: bash $
202
203 ssh {mon-host}
204
205 #. Extract a copy of the ``monmap`` file by running a command of the following
206 form:
207
208 .. prompt:: bash $
209
210 ceph-mon -i {mon-id} --extract-monmap {map-path}
211
212 Here is a more concrete example. In this example, ``hostname`` is the
213 ``{mon-id}`` and ``/tmp/monmap`` is the ``{map-path}``:
214
215 .. prompt:: bash $
216
217 ceph-mon -i `hostname` --extract-monmap /tmp/monmap
218
219 #. Remove the non-surviving or otherwise problematic monitors:
220
221 .. prompt:: bash $
222
223 monmaptool {map-path} --rm {mon-id}
224
225 For example, suppose that there are three monitors |---| ``mon.a``, ``mon.b``,
226 and ``mon.c`` |---| and that only ``mon.a`` will survive:
227
228 .. prompt:: bash $
229
230 monmaptool /tmp/monmap --rm b
231 monmaptool /tmp/monmap --rm c
232
233 #. Inject the modified map, from which the non-surviving monitors have been
234 removed, into the surviving monitor(s):
235
236 .. prompt:: bash $
237
238 ceph-mon -i {mon-id} --inject-monmap {map-path}
239
240 Continuing with the above example, inject a map into monitor ``mon.a`` by
241 running the following command:
242
243 .. prompt:: bash $
244
245 ceph-mon -i a --inject-monmap /tmp/monmap
246
247
248 #. Start only the surviving monitors.
249
250 #. Verify that the monitors form a quorum by running the command ``ceph -s``.
251
252 #. The data directories of the removed monitors are in ``/var/lib/ceph/mon``:
253 either archive each data directory in a safe location or delete it.
254 However, do not delete a data directory unless you are confident that the
255 remaining monitors are healthy and sufficiently redundant. Make sure that
256 there is enough room for the live DB to expand and compact, and make sure
257 that there is also room for an archived copy of the DB. The archived copy
258 can be compressed.
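
The archiving option in the last step can be sketched as follows. The helper
name ``archive_mon_dir`` and the destination path in the usage comment are
hypothetical. The sketch assumes that the data directory is named
``ceph-{mon-id}``, as in `Adding a Monitor (Manual)`_:

.. code-block:: bash

   #!/usr/bin/env bash
   # Sketch: archive a removed monitor's data directory as a compressed tarball.
   # archive_mon_dir is an illustrative helper, not a Ceph command.
   archive_mon_dir() {
       local mon_root=$1 mon_id=$2 dest_dir=$3
       local src="ceph-${mon_id}"
       local archive="${dest_dir}/${src}.tar.gz"

       # Refuse to continue if the data directory is missing.
       if [ ! -d "${mon_root}/${src}" ]; then
           echo "no such directory: ${mon_root}/${src}" >&2
           return 1
       fi

       # -C keeps the paths inside the archive relative to the monitor root.
       tar -czf "${archive}" -C "${mon_root}" "${src}" && echo "${archive}"
   }

   # Example (run as a user that can read the monitor store):
   #   archive_mon_dir /var/lib/ceph/mon b /srv/mon-archive

The function only creates the archive. Deleting the original directory is left
as a separate, deliberate step.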
259
260 .. _Changing a Monitor's IP address:
261
262 Changing a Monitor's IP Address
263 ===============================
264
265 .. important:: Existing monitors are not supposed to change their IP addresses.
266
267 Monitors are critical components of a Ceph cluster. The entire system can work
268 properly only if the monitors maintain quorum, and quorum can be established
269 only if the monitors have discovered each other by means of their IP addresses.
270 Ceph has strict requirements on the discovery of monitors.
271
272 Although the ``ceph.conf`` file is used by Ceph clients and other Ceph daemons
273 to discover monitors, the monitor map is used by monitors to discover each
274 other. This is why it is necessary to obtain the current ``monmap`` at the time
275 a new monitor is created: as can be seen above in `Adding a Monitor (Manual)`_,
276 the ``monmap`` is one of the arguments required by the ``ceph-mon -i {mon-id}
277 --mkfs`` command. The following sections explain the consistency requirements
278 for Ceph monitors, and also explain a number of safe ways to change a monitor's
279 IP address.
280
281
282 Consistency Requirements
283 ------------------------
284
285 When a monitor discovers other monitors in the cluster, it always refers to the
286 local copy of the monitor map. Using the monitor map instead of using the
287 ``ceph.conf`` file avoids errors that could break the cluster (for example,
288 typos or other slight errors in ``ceph.conf`` when a monitor address or port is
289 specified). Because monitors use monitor maps for discovery and because they
290 share monitor maps with Ceph clients and other Ceph daemons, the monitor map
291 provides monitors with a strict guarantee that their consensus is valid.
292
293 Strict consistency also applies to updates to the monmap. As with any other
294 updates on the monitor, changes to the monmap always run through a distributed
295 consensus algorithm called `Paxos`_. The monitors must agree on each update to
296 the monmap, such as adding or removing a monitor, to ensure that each monitor
297 in the quorum has the same version of the monmap. Updates to the monmap are
298 incremental so that monitors have the latest agreed upon version, and a set of
299 previous versions, allowing a monitor that has an older version of the monmap
300 to catch up with the current state of the cluster.
301
302 There are additional advantages to using the monitor map rather than
303 ``ceph.conf`` when monitors discover each other. Because ``ceph.conf`` is not
304 automatically updated and distributed, its use would bring certain risks:
305 monitors might use an outdated ``ceph.conf`` file, might fail to recognize a
306 specific monitor, might fall out of quorum, and might develop a situation in
307 which `Paxos`_ is unable to accurately ascertain the current state of the
308 system. Because of these risks, any changes to an existing monitor's IP address
309 must be made with great care.
310
311 .. _operations_add_or_rm_mons_changing_mon_ip:
312
313 Changing a Monitor's IP address (Preferred Method)
314 --------------------------------------------------
315
316 If a monitor's IP address is changed only in the ``ceph.conf`` file, there is
317 no guarantee that the other monitors in the cluster will receive the update.
318 For this reason, the preferred method to change a monitor's IP address is as
319 follows: add a new monitor with the desired IP address (as described in `Adding
320 a Monitor (Manual)`_), make sure that the new monitor successfully joins the
321 quorum, remove the monitor that is using the old IP address, and update the
322 ``ceph.conf`` file to ensure that clients and other daemons are made aware of
323 the new monitor's IP address.
324
325 For example, suppose that there are three monitors in place::
326
327 [mon.a]
328 host = host01
329 addr = 10.0.0.1:6789
330 [mon.b]
331 host = host02
332 addr = 10.0.0.2:6789
333 [mon.c]
334 host = host03
335 addr = 10.0.0.3:6789
336
337 To change ``mon.c`` so that its name is ``host04`` and its IP address is
338 ``10.0.0.4``: (1) follow the steps in `Adding a Monitor (Manual)`_ to add a new
339 monitor ``mon.d``, (2) make sure that ``mon.d`` is running before removing
340 ``mon.c`` or else quorum will be broken, and (3) follow the steps in `Removing
341 a Monitor (Manual)`_ to remove ``mon.c``. To move all three monitors to new IP
342 addresses, repeat this process.
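
After ``mon.d`` has been added and ``mon.c`` has been removed, the monitor
sections of ``ceph.conf`` in this example would be expected to look like the
following. This is a sketch based on the values given above, and the port
``6789`` for ``mon.d`` is an assumption chosen to match the other monitors::

    [mon.a]
    host = host01
    addr = 10.0.0.1:6789
    [mon.b]
    host = host02
    addr = 10.0.0.2:6789
    [mon.d]
    host = host04
    addr = 10.0.0.4:6789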
343
344 Changing a Monitor's IP address (Advanced Method)
345 -------------------------------------------------
346
347 There are cases in which the method outlined in :ref:`Changing a Monitor's IP
348 Address (Preferred Method) <operations_add_or_rm_mons_changing_mon_ip>` cannot
349 be used. For example, it might be necessary to move the cluster's monitors to a
350 different network, to a different part of the datacenter, or to a different
351 datacenter altogether. It is still possible to change the monitors' IP
352 addresses, but a different method must be used.
353
354 For such cases, a new monitor map with updated IP addresses for every monitor
355 in the cluster must be generated and injected on each monitor. Although this
356 method is not particularly easy, such a major migration is unlikely to be a
357 routine task. As stated at the beginning of this section, existing monitors are
358 not supposed to change their IP addresses.
359
360 Continue with the monitor configuration in the example from :ref:`Changing a
361 Monitor's IP Address (Preferred Method)
362 <operations_add_or_rm_mons_changing_mon_ip>`. Suppose that all of the monitors
363 are to be moved from the ``10.0.0.x`` range to the ``10.1.0.x`` range, and that
364 these networks are unable to communicate. Carry out the following procedure:
365
366 #. Retrieve the monitor map (``{tmp}`` is the path to the retrieved monitor
367 map, and ``{filename}`` is the name of the file that contains the retrieved
368 monitor map):
369
370 .. prompt:: bash $
371
372 ceph mon getmap -o {tmp}/{filename}
373
374 #. Check the contents of the monitor map:
375
376 .. prompt:: bash $
377
378 monmaptool --print {tmp}/{filename}
379
380 ::
381
382 monmaptool: monmap file {tmp}/{filename}
383 epoch 1
384 fsid 224e376d-c5fe-4504-96bb-ea6332a19e61
385 last_changed 2012-12-17 02:46:41.591248
386 created 2012-12-17 02:46:41.591248
387 0: 10.0.0.1:6789/0 mon.a
388 1: 10.0.0.2:6789/0 mon.b
389 2: 10.0.0.3:6789/0 mon.c
390
391 #. Remove the existing monitors from the monitor map:
392
393 .. prompt:: bash $
394
395 monmaptool --rm a --rm b --rm c {tmp}/{filename}
396
397 ::
398
399 monmaptool: monmap file {tmp}/{filename}
400 monmaptool: removing a
401 monmaptool: removing b
402 monmaptool: removing c
403 monmaptool: writing epoch 1 to {tmp}/{filename} (0 monitors)
404
405 #. Add the new monitor locations to the monitor map:
406
407 .. prompt:: bash $
408
409 monmaptool --add a 10.1.0.1:6789 --add b 10.1.0.2:6789 --add c 10.1.0.3:6789 {tmp}/{filename}
410
411 ::
412
413 monmaptool: monmap file {tmp}/{filename}
414 monmaptool: writing epoch 1 to {tmp}/{filename} (3 monitors)
415
416 #. Check the new contents of the monitor map:
417
418 .. prompt:: bash $
419
420 monmaptool --print {tmp}/{filename}
421
422 ::
423
424 monmaptool: monmap file {tmp}/{filename}
425 epoch 1
426 fsid 224e376d-c5fe-4504-96bb-ea6332a19e61
427 last_changed 2012-12-17 02:46:41.591248
428 created 2012-12-17 02:46:41.591248
429 0: 10.1.0.1:6789/0 mon.a
430 1: 10.1.0.2:6789/0 mon.b
431 2: 10.1.0.3:6789/0 mon.c
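
When there are many monitors, the ``--rm`` and ``--add`` arguments used above
can be derived from the output of ``monmaptool --print`` instead of being typed
by hand. The following sketch assumes the output format shown above (lines of
the form ``0: 10.0.0.1:6789/0 mon.a``), which might differ between Ceph
releases, and assumes that only the leading part of each address changes. The
helper name ``monmap_rewrite_args`` is hypothetical:

.. code-block:: bash

   #!/usr/bin/env bash
   # Sketch: read "monmaptool --print" output on stdin and print the --rm and
   # --add arguments for moving every monitor from one address prefix to another.
   # monmap_rewrite_args is an illustrative helper, not a Ceph command.
   monmap_rewrite_args() {
       local old_prefix=$1 new_prefix=$2
       awk -v old="$old_prefix" -v new="$new_prefix" '
           # Monitor lines look like: "0: 10.0.0.1:6789/0 mon.a"
           $1 ~ /^[0-9]+:$/ && $3 ~ /^mon\./ {
               name = substr($3, 5)          # strip the "mon." prefix
               addr = $2
               sub(/\/[0-9]+$/, "", addr)    # drop the trailing "/0" suffix
               if (index(addr, old) == 1)    # replace the prefix only at the start
                   addr = new substr(addr, length(old) + 1)
               rm  = rm  " --rm "  name
               add = add " --add " name " " addr
           }
           END { print "rm:" rm; print "add:" add }
       '
   }

   # Example:
   #   monmaptool --print {tmp}/{filename} | monmap_rewrite_args 10.0.0. 10.1.0.

The two printed lines are meant to be reviewed and then pasted into the
``monmaptool`` commands shown above. The helper never modifies the monitor map
itself.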
432
433 At this point, we assume that the monitors (and stores) have been installed at
434 the new location. Next, propagate the modified monitor map to the new monitors,
435 and inject the modified monitor map into each new monitor.
436
437 #. Make sure all of your monitors have been stopped. Never inject into a
438 monitor while the monitor daemon is running.
439
440 #. Inject the monitor map:
441
442 .. prompt:: bash $
443
444 ceph-mon -i {mon-id} --inject-monmap {tmp}/{filename}
445
446 #. Restart all of the monitors.
447
448 Migration to the new location is now complete. The monitors should operate
449 successfully.
450
451
452
453 .. _Manual Deployment: ../../../install/manual-deployment
454 .. _Monitor Bootstrap: ../../../dev/mon-bootstrap
455 .. _Paxos: https://en.wikipedia.org/wiki/Paxos_(computer_science)
456
457 .. |---| unicode:: U+2014 .. EM DASH
458 :trim: