==========================
 Adding/Removing Monitors
==========================

When you have a cluster up and running, you may add or remove monitors
from the cluster at runtime. To bootstrap a monitor, see `Manual Deployment`_
or `Monitor Bootstrap`_.

Adding Monitors
===============

Ceph monitors are lightweight processes that maintain a master copy of the
cluster map. You can run a cluster with a single monitor, but we recommend at
least three monitors for a production cluster. Ceph monitors use a variation of
the `Paxos`_ protocol to establish consensus about maps and other critical
information across the cluster. Due to the nature of Paxos, Ceph requires
a majority of monitors to be running in order to establish a quorum (and thus
reach consensus).

It is advisable, but not mandatory, to run an odd number of monitors. Going
from an odd number of monitors to the next even number does not increase the
number of failures that can be tolerated. For instance, a 2-monitor deployment
cannot tolerate any failure and still maintain a quorum; with 3 monitors,
one failure can be tolerated; in a 4-monitor deployment, still only one failure
can be tolerated; with 5 monitors, two failures can be tolerated. This is
why an odd number is advisable. In summary, Ceph needs a majority of
monitors to be running (and able to communicate with each other), and that
majority can be achieved with a single monitor, 2 out of 2 monitors,
2 out of 3, 3 out of 4, and so on.

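The majority rule above can be checked with a little arithmetic. The following
is a minimal sketch, not part of Ceph; the function names ``quorum_size`` and
``failures_tolerated`` are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Illustrative helpers only; these names are not Ceph commands.

# Smallest number of monitors that forms a majority of N monitors.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of monitors that can fail while the remaining ones still form a majority.
failures_tolerated() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
    echo "$n monitors: quorum of $(quorum_size "$n"), tolerates $(failures_tolerated "$n") failure(s)"
done
```

The loop reproduces the cases listed above: 3 and 4 monitors both tolerate one
failure, which is why the fourth monitor adds no resiliency.
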
For an initial deployment of a multi-node Ceph cluster, it is advisable to
deploy three monitors, increasing the number two at a time if a valid need
for more than three exists.

Since monitors are lightweight, it is possible to run them on the same
host as an OSD; however, we recommend running them on separate hosts,
because fsync issues with the kernel may impair performance.

.. note:: A *majority* of monitors in your cluster must be able to
   reach each other in order to establish a quorum.

Deploy your Hardware
--------------------

If you are adding a new host when adding a new monitor, see `Hardware
Recommendations`_ for details on minimum recommendations for monitor hardware.
To add a monitor host to your cluster, first make sure you have an up-to-date
version of Linux installed (typically Ubuntu 14.04 or RHEL 7).

Add your monitor host to a rack in your cluster, connect it to the network
and ensure that it has network connectivity.

.. _Hardware Recommendations: ../../../start/hardware-recommendations

Install the Required Software
-----------------------------

For manually deployed clusters, you must install Ceph packages
manually. See `Installing Packages`_ for details.
You should configure SSH to a user with password-less authentication
and root permissions.

.. _Installing Packages: ../../../install/install-storage-cluster


.. _Adding a Monitor (Manual):

Adding a Monitor (Manual)
-------------------------

This procedure creates a ``ceph-mon`` data directory, retrieves the monitor map
and monitor keyring, and adds a ``ceph-mon`` daemon to your cluster. If
this results in only two monitor daemons, you may add more monitors by
repeating this procedure until you have a sufficient number of ``ceph-mon``
daemons to achieve a quorum.

At this point you should define your monitor's id. Traditionally, monitors
have been named with single letters (``a``, ``b``, ``c``, ...), but you are
free to define the id as you see fit. Throughout this document, ``{mon-id}``
is the id you chose, without the ``mon.`` prefix (i.e., ``{mon-id}`` is the
``a`` in ``mon.a``).

#. Create the default directory on the machine that will host your
   new monitor. ::

        ssh {new-mon-host}
        sudo mkdir /var/lib/ceph/mon/ceph-{mon-id}

#. Create a temporary directory ``{tmp}`` to keep the files needed during
   this process. This directory should be different from the monitor's default
   directory created in the previous step, and can be removed after all the
   steps are executed. ::

        mkdir {tmp}

#. Retrieve the keyring for your monitors, where ``{tmp}`` is the path to
   the retrieved keyring, and ``{key-filename}`` is the name of the file
   containing the retrieved monitor key. ::

        ceph auth get mon. -o {tmp}/{key-filename}

#. Retrieve the monitor map, where ``{tmp}`` is the path to
   the retrieved monitor map, and ``{map-filename}`` is the name of the file
   containing the retrieved monitor map. ::

        ceph mon getmap -o {tmp}/{map-filename}

#. Prepare the monitor's data directory created in the first step. You must
   specify the path to the monitor map so that the new monitor can retrieve
   the information about a quorum of monitors and their ``fsid``. You must also
   specify a path to the monitor keyring::

        sudo ceph-mon -i {mon-id} --mkfs --monmap {tmp}/{map-filename} --keyring {tmp}/{key-filename}


#. Start the new monitor and it will automatically join the cluster.
   The daemon needs to know which address to bind to, either via
   ``--public-addr {ip:port}`` or by setting ``mon addr`` in the
   appropriate section of ``ceph.conf``. For example::

        ceph-mon -i {mon-id} --public-addr {ip:port}


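The steps above can be collected into one small script. This is a hedged
sketch rather than an official tool: the function names, the
``CEPH_SKETCH_RUN`` variable, the scratch directory default, and the
``keyring`` and ``monmap`` file names are all assumptions chosen for
illustration. It is assumed to run on the new monitor host with access to an
admin keyring. By default it only prints the commands so they can be reviewed
first.

```shell
#!/bin/sh
# Sketch only: prints the commands from the procedure above. Set
# CEPH_SKETCH_RUN=1 to execute them instead. All names here are hypothetical.

run() {
    if [ "${CEPH_SKETCH_RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

add_monitor_steps() {
    mon_id=$1                           # id without the "mon." prefix
    mon_addr=$2                         # ip:port the new monitor binds to
    tmp_dir=${3:-/tmp/add-mon-$mon_id}  # assumed scratch directory

    run sudo mkdir "/var/lib/ceph/mon/ceph-$mon_id"
    run mkdir "$tmp_dir"
    run ceph auth get mon. -o "$tmp_dir/keyring"
    run ceph mon getmap -o "$tmp_dir/monmap"
    run sudo ceph-mon -i "$mon_id" --mkfs \
        --monmap "$tmp_dir/monmap" --keyring "$tmp_dir/keyring"
    run ceph-mon -i "$mon_id" --public-addr "$mon_addr"
}

# Dry run: print the commands for a new monitor "d" at 10.0.0.4:6789.
add_monitor_steps d 10.0.0.4:6789
```

Printing first and executing only on request keeps the sketch safe to try on a
host that is not yet part of the cluster.
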
Removing Monitors
=================

When you remove monitors from a cluster, consider that Ceph monitors use
Paxos to establish consensus about the master cluster map. You must have
a sufficient number of monitors to establish a quorum for consensus about
the cluster map.

.. _Removing a Monitor (Manual):

Removing a Monitor (Manual)
---------------------------

This procedure removes a ``ceph-mon`` daemon from your cluster. If this
procedure results in only two monitor daemons, you may add or remove another
monitor until you have a number of ``ceph-mon`` daemons that can achieve a
quorum.

#. Stop the monitor. ::

        service ceph -a stop mon.{mon-id}

#. Remove the monitor from the cluster. ::

        ceph mon remove {mon-id}

#. Remove the monitor entry from ``ceph.conf``.


Removing Monitors from an Unhealthy Cluster
-------------------------------------------

This procedure removes a ``ceph-mon`` daemon from an unhealthy
cluster, for example a cluster where the monitors cannot form a
quorum.


#. Stop all ``ceph-mon`` daemons on all monitor hosts. ::

        ssh {mon-host}
        service ceph stop mon || stop ceph-mon-all
        # and repeat for all mons

#. Identify a surviving monitor and log in to that host. ::

        ssh {mon-host}

#. Extract a copy of the monmap file. ::

        ceph-mon -i {mon-id} --extract-monmap {map-path}
        # in most cases, that's
        ceph-mon -i `hostname` --extract-monmap /tmp/monmap

#. Remove the non-surviving or problematic monitors. For example, if
   you have three monitors, ``mon.a``, ``mon.b``, and ``mon.c``, where
   only ``mon.a`` will survive, follow the example below::

        monmaptool {map-path} --rm {mon-id}
        # for example,
        monmaptool /tmp/monmap --rm b
        monmaptool /tmp/monmap --rm c

#. Inject the surviving map with the removed monitors into the
   surviving monitor(s). For example, to inject a map into monitor
   ``mon.a``, follow the example below::

        ceph-mon -i {mon-id} --inject-monmap {map-path}
        # for example,
        ceph-mon -i a --inject-monmap /tmp/monmap

#. Start only the surviving monitors.

#. Verify the monitors form a quorum (``ceph -s``).

#. You may wish to archive the removed monitors' data directory in
   ``/var/lib/ceph/mon`` in a safe location, or delete it if you are
   confident the remaining monitors are healthy and are sufficiently
   redundant.

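The extract, remove, and inject steps above can be sketched as a single
function. This is an illustration under stated assumptions, not a supported
tool: ``run``, ``prune_monmap_steps``, and ``CEPH_SKETCH_RUN`` are hypothetical
names, and the sketch assumes that every ``ceph-mon`` daemon has already been
stopped and that it runs on the surviving monitor's host. By default it only
prints the commands.

```shell
#!/bin/sh
# Sketch only: prints the monmap surgery commands from the procedure above.
# Set CEPH_SKETCH_RUN=1 to execute them. All names here are hypothetical.

run() {
    if [ "${CEPH_SKETCH_RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Usage: prune_monmap_steps {surviving-mon-id} {map-path} {mon-id-to-remove}...
prune_monmap_steps() {
    survivor=$1
    map_path=$2
    shift 2

    # Extract the monmap from the surviving monitor's store.
    run ceph-mon -i "$survivor" --extract-monmap "$map_path"

    # Remove each non-surviving monitor from the extracted map.
    for dead_mon in "$@"; do
        run monmaptool "$map_path" --rm "$dead_mon"
    done

    # Inject the edited map back into the surviving monitor.
    run ceph-mon -i "$survivor" --inject-monmap "$map_path"
}

# Dry run for the example above: mon.a survives, mon.b and mon.c are removed.
prune_monmap_steps a /tmp/monmap b c
```

After the commands have actually been run, start only the surviving monitors
and check the quorum with ``ceph -s`` as described in the last steps.
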
.. _Changing a Monitor's IP address:

Changing a Monitor's IP Address
===============================

.. important:: Existing monitors are not supposed to change their IP addresses.

Monitors are critical components of a Ceph cluster, and they need to maintain a
quorum for the whole system to work properly. To establish a quorum, the
monitors need to discover each other. Ceph has strict requirements for
discovering monitors.

Ceph clients and other Ceph daemons use ``ceph.conf`` to discover monitors.
However, monitors discover each other using the monitor map, not ``ceph.conf``.
For example, if you refer to `Adding a Monitor (Manual)`_ you will see that you
need to obtain the current monmap for the cluster when creating a new monitor,
as it is one of the required arguments of ``ceph-mon -i {mon-id} --mkfs``. The
following sections explain the consistency requirements for Ceph monitors, and a
few safe ways to change a monitor's IP address.


Consistency Requirements
------------------------

A monitor always refers to the local copy of the monmap when discovering other
monitors in the cluster. Using the monmap instead of ``ceph.conf`` avoids
errors that could break the cluster (e.g., typos in ``ceph.conf`` when
specifying a monitor address or port). Since monitors use monmaps for discovery
and they share monmaps with clients and other Ceph daemons, the monmap provides
monitors with a strict guarantee that their consensus is valid.

Strict consistency also applies to updates to the monmap. As with any other
updates on the monitor, changes to the monmap always run through a distributed
consensus algorithm called `Paxos`_. The monitors must agree on each update to
the monmap, such as adding or removing a monitor, to ensure that each monitor in
the quorum has the same version of the monmap. Updates to the monmap are
incremental so that monitors have the latest agreed upon version, and a set of
previous versions, allowing a monitor that has an older version of the monmap to
catch up with the current state of the cluster.

If monitors discovered each other through the Ceph configuration file instead of
through the monmap, it would introduce additional risks because the Ceph
configuration files are not updated and distributed automatically. Monitors
might inadvertently use an older ``ceph.conf`` file, fail to recognize a
monitor, fall out of a quorum, or develop a situation where `Paxos`_ is not able
to determine the current state of the system accurately. Consequently, making
changes to an existing monitor's IP address must be done with great care.


Changing a Monitor's IP address (The Right Way)
-----------------------------------------------

Changing a monitor's IP address in ``ceph.conf`` only is not sufficient to
ensure that other monitors in the cluster will receive the update. To change a
monitor's IP address, you must add a new monitor with the IP address you want
to use (as described in `Adding a Monitor (Manual)`_), ensure that the new
monitor successfully joins the quorum, and then remove the monitor that uses
the old IP address. Finally, update the ``ceph.conf`` file to ensure that
clients and other daemons know the IP address of the new monitor.

For example, let's assume there are three monitors in place, such as ::

        [mon.a]
                host = host01
                addr = 10.0.0.1:6789
        [mon.b]
                host = host02
                addr = 10.0.0.2:6789
        [mon.c]
                host = host03
                addr = 10.0.0.3:6789

To change ``mon.c`` to ``host04`` with the IP address ``10.0.0.4``, follow the
steps in `Adding a Monitor (Manual)`_ by adding a new monitor ``mon.d``. Ensure
that ``mon.d`` is running before removing ``mon.c``, or it will break the
quorum. Remove ``mon.c`` as described in `Removing a Monitor (Manual)`_. Moving
all three monitors would thus require repeating this process once for each
monitor.


Changing a Monitor's IP address (The Messy Way)
-----------------------------------------------

There may come a time when the monitors must be moved to a different network, a
different part of the datacenter or a different datacenter altogether. While it
is possible to do it, the process becomes a bit more hazardous.

In such a case, the solution is to generate a new monmap with updated IP
addresses for all the monitors in the cluster, and inject the new map on each
individual monitor. This is not the most user-friendly approach, but we do not
expect this to be something that needs to be done every other week. As stated
at the top of this section, monitors are not supposed to change IP addresses.

Using the previous monitor configuration as an example, assume you want to move
all the monitors from the ``10.0.0.x`` range to ``10.1.0.x``, and these
networks are unable to communicate. Use the following procedure:

#. Retrieve the monitor map, where ``{tmp}`` is the path to
   the retrieved monitor map, and ``{filename}`` is the name of the file
   containing the retrieved monitor map. ::

        ceph mon getmap -o {tmp}/{filename}

#. The following example demonstrates the contents of the monmap. ::

        $ monmaptool --print {tmp}/{filename}

        monmaptool: monmap file {tmp}/{filename}
        epoch 1
        fsid 224e376d-c5fe-4504-96bb-ea6332a19e61
        last_changed 2012-12-17 02:46:41.591248
        created 2012-12-17 02:46:41.591248
        0: 10.0.0.1:6789/0 mon.a
        1: 10.0.0.2:6789/0 mon.b
        2: 10.0.0.3:6789/0 mon.c

#. Remove the existing monitors. ::

        $ monmaptool --rm a --rm b --rm c {tmp}/{filename}

        monmaptool: monmap file {tmp}/{filename}
        monmaptool: removing a
        monmaptool: removing b
        monmaptool: removing c
        monmaptool: writing epoch 1 to {tmp}/{filename} (0 monitors)

#. Add the new monitor locations. ::

        $ monmaptool --add a 10.1.0.1:6789 --add b 10.1.0.2:6789 --add c 10.1.0.3:6789 {tmp}/{filename}

        monmaptool: monmap file {tmp}/{filename}
        monmaptool: writing epoch 1 to {tmp}/{filename} (3 monitors)

#. Check new contents. ::

        $ monmaptool --print {tmp}/{filename}

        monmaptool: monmap file {tmp}/{filename}
        epoch 1
        fsid 224e376d-c5fe-4504-96bb-ea6332a19e61
        last_changed 2012-12-17 02:46:41.591248
        created 2012-12-17 02:46:41.591248
        0: 10.1.0.1:6789/0 mon.a
        1: 10.1.0.2:6789/0 mon.b
        2: 10.1.0.3:6789/0 mon.c

At this point, we assume the monitors (and stores) are installed at the new
location. The next step is to propagate the modified monmap to the new
monitors, and inject the modified monmap into each new monitor.

#. First, make sure to stop all your monitors. Injection must be done while
   the daemon is not running.

#. Inject the monmap. ::

        ceph-mon -i {mon-id} --inject-monmap {tmp}/{filename}

#. Restart the monitors.

After this step, migration to the new location is complete and
the monitors should operate successfully.


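The monmap editing steps of this procedure can be sketched as one function
that takes ``id=ip:port`` pairs. This is only an illustration: ``run``,
``readdress_monmap_steps``, ``CEPH_SKETCH_RUN``, and the ``id=ip:port``
argument format are assumptions, not part of Ceph. The sketch removes and
re-adds one monitor per ``monmaptool`` call instead of combining them on one
command line, and by default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints the commands that rewrite monitor addresses in a monmap.
# Set CEPH_SKETCH_RUN=1 to execute them. All names here are hypothetical.

run() {
    if [ "${CEPH_SKETCH_RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Usage: readdress_monmap_steps {map-path} {mon-id}={ip:port}...
readdress_monmap_steps() {
    map_path=$1
    shift

    # Retrieve the current monmap while the cluster is still reachable.
    run ceph mon getmap -o "$map_path"

    # Remove every monitor that is about to change address.
    for entry in "$@"; do
        run monmaptool --rm "${entry%%=*}" "$map_path"
    done

    # Add the same monitors back at their new addresses.
    for entry in "$@"; do
        run monmaptool --add "${entry%%=*}" "${entry#*=}" "$map_path"
    done

    # Print the result so it can be checked before injection.
    run monmaptool --print "$map_path"
}

# Dry run for the example above: move mon.a, mon.b, mon.c to 10.1.0.x.
readdress_monmap_steps /tmp/monmap a=10.1.0.1:6789 b=10.1.0.2:6789 c=10.1.0.3:6789
```

Injection is left out of the sketch on purpose: it has to be run on each
monitor host, with that monitor's daemon stopped, as described in the steps
above.
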
.. _Manual Deployment: ../../../install/manual-deployment
.. _Monitor Bootstrap: ../../../dev/mon-bootstrap
.. _Paxos: http://en.wikipedia.org/wiki/Paxos_(computer_science)