1 | ========================== |
2 | Adding/Removing Monitors | |
3 | ========================== | |
4 | ||
5 | When you have a cluster up and running, you may add or remove monitors | |
6 | from the cluster at runtime. To bootstrap a monitor, see `Manual Deployment`_ | |
7 | or `Monitor Bootstrap`_. | |
8 | ||
9 | Adding Monitors | |
10 | =============== | |
11 | ||
12 | Ceph monitors are light-weight processes that maintain a master copy of the | |
13 | cluster map. You can run a cluster with 1 monitor. We recommend at least 3 | |
14 | monitors for a production cluster. Ceph monitors use a variation of the | |
15 | `Paxos`_ protocol to establish consensus about maps and other critical | |
16 | information across the cluster. Due to the nature of Paxos, Ceph requires | |
17 | a majority of monitors running to establish a quorum (thus establishing | |
18 | consensus). | |
19 | ||
20 | It is advisable, but not mandatory, to run an odd number of monitors. An | |
21 | even number of monitors tolerates no more failures than the next lower | |
22 | odd number. For instance, in a 2-monitor deployment, no | |
23 | failures can be tolerated while maintaining a quorum; with 3 monitors, | |
24 | one failure can be tolerated; in a 4-monitor deployment, still only one failure | |
25 | can be tolerated; with 5 monitors, two failures can be tolerated. This is | |
26 | why an odd number is advisable. In summary, Ceph needs a majority of | |
27 | monitors to be running (and able to communicate with each other), and that | |
28 | majority can be achieved using a single monitor, or 2 out of 2 monitors, | |
29 | 2 out of 3, 3 out of 4, etc. | |
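
The majority rule described above can be restated as a small shell sketch. The function names ``quorum_size`` and ``tolerated_failures`` are hypothetical helpers written for this illustration; they are not part of Ceph.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Ceph) that restate the majority rule.

# Smallest number of monitors that forms a majority of N monitors.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of monitors that can fail while a majority is still running.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the figures for the deployment sizes discussed in the text.
for n in 1 2 3 4 5; do
    echo "$n monitors: quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Running the loop shows that 3 and 4 monitors both tolerate one failure, which is why growing from an odd to an even count adds no resiliency.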
30 | ||
31 | For an initial deployment of a multi-node Ceph cluster, it is advisable to | |
32 | deploy three monitors, increasing the number two at a time if a valid need | |
33 | for more than three exists. | |
34 | ||
35 | Since monitors are light-weight, it is possible to run them on the same | |
36 | host as an OSD; however, we recommend running them on separate hosts, | |
37 | because fsync issues with the kernel may impair performance. | |
38 | ||
39 | .. note:: A *majority* of monitors in your cluster must be able to | |
40 | reach each other in order to establish a quorum. | |
41 | ||
42 | Deploy your Hardware | |
43 | -------------------- | |
44 | ||
45 | If you are adding a new host when adding a new monitor, see `Hardware | |
46 | Recommendations`_ for details on minimum recommendations for monitor hardware. | |
47 | To add a monitor host to your cluster, first make sure you have an up-to-date | |
48 | version of Linux installed (typically Ubuntu 14.04 or RHEL 7). | |
49 | ||
50 | Add your monitor host to a rack in your cluster, connect it to the network | |
51 | and ensure that it has network connectivity. | |
52 | ||
53 | .. _Hardware Recommendations: ../../../start/hardware-recommendations | |
54 | ||
55 | Install the Required Software | |
56 | ----------------------------- | |
57 | ||
58 | For manually deployed clusters, you must install Ceph packages | |
59 | manually. See `Installing Packages`_ for details. | |
60 | You should configure SSH to a user with password-less authentication | |
61 | and root permissions. | |
62 | ||
63 | .. _Installing Packages: ../../../install/install-storage-cluster | |
64 | ||
65 | ||
66 | .. _Adding a Monitor (Manual): | |
67 | ||
68 | Adding a Monitor (Manual) | |
69 | ------------------------- | |
70 | ||
71 | This procedure creates a ``ceph-mon`` data directory, retrieves the monitor map | |
72 | and monitor keyring, and adds a ``ceph-mon`` daemon to your cluster. If | |
73 | this results in only two monitor daemons, you may add more monitors by | |
74 | repeating this procedure until you have a sufficient number of ``ceph-mon`` | |
75 | daemons to achieve a quorum. | |
76 | ||
77 | At this point you should define your monitor's id. Traditionally, monitors | |
78 | have been named with single letters (``a``, ``b``, ``c``, ...), but you are | |
79 | free to define the id as you see fit. For the purpose of this document, | |
80 | please take into account that ``{mon-id}`` should be the id you chose, | |
81 | without the ``mon.`` prefix (i.e., ``{mon-id}`` should be the ``a`` | |
82 | in ``mon.a``). | |
83 | ||
84 | #. Create the default directory on the machine that will host your | |
85 | new monitor. :: | |
86 | ||
87 | ssh {new-mon-host} | |
88 | sudo mkdir /var/lib/ceph/mon/ceph-{mon-id} | |
89 | ||
90 | #. Create a temporary directory ``{tmp}`` to keep the files needed during | |
91 | this process. This directory should be different from the monitor's default | |
92 | directory created in the previous step, and can be removed after all the | |
93 | steps are executed. :: | |
94 | ||
95 | mkdir {tmp} | |
96 | ||
97 | #. Retrieve the keyring for your monitors, where ``{tmp}`` is the path to | |
98 | the retrieved keyring, and ``{key-filename}`` is the name of the file | |
99 | containing the retrieved monitor key. :: | |
100 | ||
101 | ceph auth get mon. -o {tmp}/{key-filename} | |
102 | ||
103 | #. Retrieve the monitor map, where ``{tmp}`` is the path to | |
104 | the retrieved monitor map, and ``{map-filename}`` is the name of the file | |
105 | containing the retrieved monitor map. :: | |
106 | ||
107 | ceph mon getmap -o {tmp}/{map-filename} | |
108 | ||
109 | #. Prepare the monitor's data directory created in the first step. You must | |
110 | specify the path to the monitor map so that you can retrieve the | |
111 | information about a quorum of monitors and their ``fsid``. You must also | |
112 | specify a path to the monitor keyring:: | |
113 | ||
114 | sudo ceph-mon -i {mon-id} --mkfs --monmap {tmp}/{map-filename} --keyring {tmp}/{key-filename} | |
115 | ||
116 | ||
117 | #. Start the new monitor and it will automatically join the cluster. | |
118 | The daemon needs to know which address to bind to, either via | |
119 | ``--public-addr {ip:port}`` or by setting ``mon addr`` in the | |
120 | appropriate section of ``ceph.conf``. For example:: | |
121 | ||
122 | ceph-mon -i {mon-id} --public-addr {ip:port} | |
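
The steps above can be collected into one script. The following is a minimal sketch, not an official tool: the ``run`` and ``add_monitor`` functions, the ``DRY_RUN`` variable, the file names ``keyring`` and ``monmap``, and the example arguments are all assumptions made for illustration. Only the commands themselves come from the procedure. The ``ssh`` step is left out; the sketch is meant to run on the new monitor host.

```shell
#!/bin/sh
# Sketch of the manual "add a monitor" procedure (hypothetical wrapper).
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# add_monitor MON_ID TMP_DIR IP:PORT
add_monitor() {
    mon_id="$1"; tmp="$2"; addr="$3"
    # Create the monitor data directory and a scratch directory.
    run sudo mkdir "/var/lib/ceph/mon/ceph-$mon_id" &&
    run mkdir "$tmp" &&
    # Retrieve the monitor keyring and the current monitor map.
    run ceph auth get mon. -o "$tmp/keyring" &&
    run ceph mon getmap -o "$tmp/monmap" &&
    # Prepare the data directory, then start the daemon on the given address.
    run sudo ceph-mon -i "$mon_id" --mkfs --monmap "$tmp/monmap" --keyring "$tmp/keyring" &&
    run ceph-mon -i "$mon_id" --public-addr "$addr"
}

# Example values are placeholders.
add_monitor d /tmp/add-mon 10.0.0.4:6789
```

Reviewing the printed commands first, and only then setting ``DRY_RUN=0``, keeps the sketch from touching a live cluster by accident.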
123 | ||
124 | ||
125 | Removing Monitors | |
126 | ================= | |
127 | ||
128 | When you remove monitors from a cluster, consider that Ceph monitors use | |
129 | Paxos to establish consensus about the master cluster map. You must have | |
130 | a sufficient number of monitors to establish a quorum for consensus about | |
131 | the cluster map. | |
132 | ||
133 | .. _Removing a Monitor (Manual): | |
134 | ||
135 | Removing a Monitor (Manual) | |
136 | --------------------------- | |
137 | ||
138 | This procedure removes a ``ceph-mon`` daemon from your cluster. If this | |
139 | procedure results in only two monitor daemons, you may add or remove another | |
140 | monitor until you have a number of ``ceph-mon`` daemons that can achieve a | |
141 | quorum. | |
142 | ||
143 | #. Stop the monitor. :: | |
144 | ||
145 | service ceph -a stop mon.{mon-id} | |
146 | ||
147 | #. Remove the monitor from the cluster. :: | |
148 | ||
149 | ceph mon remove {mon-id} | |
150 | ||
151 | #. Remove the monitor entry from ``ceph.conf``. | |
152 | ||
153 | ||
154 | Removing Monitors from an Unhealthy Cluster | |
155 | ------------------------------------------- | |
156 | ||
157 | This procedure removes a ``ceph-mon`` daemon from an unhealthy | |
158 | cluster, for example a cluster where the monitors cannot form a | |
159 | quorum. | |
160 | ||
161 | ||
162 | #. Stop all ``ceph-mon`` daemons on all monitor hosts. :: | |
163 | ||
164 | ssh {mon-host} | |
165 | service ceph stop mon || stop ceph-mon-all | |
166 | # and repeat for all mons | |
167 | ||
168 | #. Identify a surviving monitor and log in to that host. :: | |
169 | ||
170 | ssh {mon-host} | |
171 | ||
172 | #. Extract a copy of the monmap file. :: | |
173 | ||
174 | ceph-mon -i {mon-id} --extract-monmap {map-path} | |
175 | # in most cases, that's | |
176 | ceph-mon -i `hostname` --extract-monmap /tmp/monmap | |
177 | ||
178 | #. Remove the non-surviving or problematic monitors. For example, if | |
179 | you have three monitors, ``mon.a``, ``mon.b``, and ``mon.c``, where | |
180 | only ``mon.a`` will survive, follow the example below:: | |
181 | ||
182 | monmaptool {map-path} --rm {mon-id} | |
183 | # for example, | |
184 | monmaptool /tmp/monmap --rm b | |
185 | monmaptool /tmp/monmap --rm c | |
186 | ||
187 | #. Inject the modified map, with the problematic monitors removed, into the | |
188 | surviving monitor(s). For example, to inject a map into monitor | |
189 | ``mon.a``, follow the example below:: | |
190 | ||
191 | ceph-mon -i {mon-id} --inject-monmap {map-path} | |
192 | # for example, | |
193 | ceph-mon -i a --inject-monmap /tmp/monmap | |
194 | ||
195 | #. Start only the surviving monitors. | |
196 | ||
197 | #. Verify the monitors form a quorum (``ceph -s``). | |
198 | ||
199 | #. You may wish to archive the removed monitors' data directory in | |
200 | ``/var/lib/ceph/mon`` in a safe location, or delete it if you are | |
201 | confident the remaining monitors are healthy and are sufficiently | |
202 | redundant. | |
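
The extract, remove, and inject steps of this procedure can be sketched as one function. This is an illustration under stated assumptions: ``run``, ``prune_monmap``, and ``DRY_RUN`` are hypothetical names, and the sketch assumes all ``ceph-mon`` daemons have already been stopped. The ``ceph-mon`` and ``monmaptool`` invocations are the ones shown in the steps above.

```shell
#!/bin/sh
# Sketch: prune dead monitors from a surviving monitor's monmap.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# prune_monmap SURVIVOR_ID MAP_PATH REMOVED_ID...
prune_monmap() {
    survivor="$1"; map_path="$2"; shift 2
    # Extract the monmap from the surviving monitor's store.
    run ceph-mon -i "$survivor" --extract-monmap "$map_path" || return 1
    # Remove each non-surviving monitor from the extracted map.
    for mon_id in "$@"; do
        run monmaptool "$map_path" --rm "$mon_id" || return 1
    done
    # Inject the modified map back into the surviving monitor.
    run ceph-mon -i "$survivor" --inject-monmap "$map_path"
}

# Example from the text: only mon.a survives; mon.b and mon.c are removed.
prune_monmap a /tmp/monmap b c
```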
203 | ||
204 | .. _Changing a Monitor's IP address: | |
205 | ||
206 | Changing a Monitor's IP Address | |
207 | =============================== | |
208 | ||
209 | .. important:: Existing monitors are not supposed to change their IP addresses. | |
210 | ||
211 | Monitors are critical components of a Ceph cluster, and they need to maintain a | |
212 | quorum for the whole system to work properly. To establish a quorum, the | |
213 | monitors need to discover each other. Ceph has strict requirements for | |
214 | discovering monitors. | |
215 | ||
216 | Ceph clients and other Ceph daemons use ``ceph.conf`` to discover monitors. | |
217 | However, monitors discover each other using the monitor map, not ``ceph.conf``. | |
218 | For example, if you refer to `Adding a Monitor (Manual)`_ you will see that you | |
219 | need to obtain the current monmap for the cluster when creating a new monitor, | |
220 | as it is one of the required arguments of ``ceph-mon -i {mon-id} --mkfs``. The | |
221 | following sections explain the consistency requirements for Ceph monitors, and a | |
222 | few safe ways to change a monitor's IP address. | |
223 | ||
224 | ||
225 | Consistency Requirements | |
226 | ------------------------ | |
227 | ||
228 | A monitor always refers to the local copy of the monmap when discovering other | |
229 | monitors in the cluster. Using the monmap instead of ``ceph.conf`` avoids | |
230 | errors that could break the cluster (e.g., typos in ``ceph.conf`` when | |
231 | specifying a monitor address or port). Since monitors use monmaps for discovery | |
232 | and they share monmaps with clients and other Ceph daemons, the monmap provides | |
233 | monitors with a strict guarantee that their consensus is valid. | |
234 | ||
235 | Strict consistency also applies to updates to the monmap. As with any other | |
236 | updates on the monitor, changes to the monmap always run through a distributed | |
237 | consensus algorithm called `Paxos`_. The monitors must agree on each update to | |
238 | the monmap, such as adding or removing a monitor, to ensure that each monitor in | |
239 | the quorum has the same version of the monmap. Updates to the monmap are | |
240 | incremental so that monitors have the latest agreed upon version, and a set of | |
241 | previous versions, allowing a monitor that has an older version of the monmap to | |
242 | catch up with the current state of the cluster. | |
243 | ||
244 | If monitors discovered each other through the Ceph configuration file instead of | |
245 | through the monmap, it would introduce additional risks because the Ceph | |
246 | configuration files are not updated and distributed automatically. Monitors | |
247 | might inadvertently use an older ``ceph.conf`` file, fail to recognize a | |
248 | monitor, fall out of a quorum, or develop a situation where `Paxos`_ is not able | |
249 | to determine the current state of the system accurately. Consequently, making | |
250 | changes to an existing monitor's IP address must be done with great care. | |
251 | ||
252 | ||
253 | Changing a Monitor's IP address (The Right Way) | |
254 | ----------------------------------------------- | |
255 | ||
256 | Changing a monitor's IP address in ``ceph.conf`` only is not sufficient to | |
257 | ensure that other monitors in the cluster will receive the update. To change a | |
258 | monitor's IP address, you must add a new monitor with the IP address you want | |
259 | to use (as described in `Adding a Monitor (Manual)`_), ensure that the new | |
260 | monitor successfully joins the quorum, and then remove the monitor that uses the | |
261 | old IP address. Finally, update the ``ceph.conf`` file to ensure that clients and | |
262 | other daemons know the IP address of the new monitor. | |
263 | ||
264 | For example, let's assume there are three monitors in place, such as :: | |
265 | ||
266 | [mon.a] | |
267 | host = host01 | |
268 | addr = 10.0.0.1:6789 | |
269 | [mon.b] | |
270 | host = host02 | |
271 | addr = 10.0.0.2:6789 | |
272 | [mon.c] | |
273 | host = host03 | |
274 | addr = 10.0.0.3:6789 | |
275 | ||
276 | To change ``mon.c`` to ``host04`` with the IP address ``10.0.0.4``, follow the | |
277 | steps in `Adding a Monitor (Manual)`_ by adding a new monitor ``mon.d``. Ensure | |
278 | that ``mon.d`` is running before removing ``mon.c``, or it will break the | |
279 | quorum. Remove ``mon.c`` as described in `Removing a Monitor (Manual)`_. Moving | |
280 | all three monitors would thus require repeating this process as many times as | |
281 | needed. | |
282 | ||
283 | ||
284 | Changing a Monitor's IP address (The Messy Way) | |
285 | ----------------------------------------------- | |
286 | ||
287 | There may come a time when the monitors must be moved to a different network, a | |
288 | different part of the datacenter or a different datacenter altogether. While | |
289 | this is possible, the process is more hazardous. | |
290 | ||
291 | In such a case, the solution is to generate a new monmap with updated IP | |
292 | addresses for all the monitors in the cluster, and inject the new map on each | |
293 | individual monitor. This is not the most user-friendly approach, but we do not | |
294 | expect this to be something that needs to be done every other week. As | |
295 | stated at the top of this section, monitors are not supposed to change | |
296 | IP addresses. | |
297 | ||
298 | Using the previous monitor configuration as an example, assume you want to move | |
299 | all the monitors from the ``10.0.0.x`` range to ``10.1.0.x``, and these | |
300 | networks are unable to communicate. Use the following procedure: | |
301 | ||
302 | #. Retrieve the monitor map, where ``{tmp}`` is the path to | |
303 | the retrieved monitor map, and ``{filename}`` is the name of the file | |
304 | containing the retrieved monitor map. :: | |
305 | ||
306 | ceph mon getmap -o {tmp}/{filename} | |
307 | ||
308 | #. The following example demonstrates the contents of the monmap. :: | |
309 | ||
310 | $ monmaptool --print {tmp}/{filename} | |
311 | ||
312 | monmaptool: monmap file {tmp}/{filename} | |
313 | epoch 1 | |
314 | fsid 224e376d-c5fe-4504-96bb-ea6332a19e61 | |
315 | last_changed 2012-12-17 02:46:41.591248 | |
316 | created 2012-12-17 02:46:41.591248 | |
317 | 0: 10.0.0.1:6789/0 mon.a | |
318 | 1: 10.0.0.2:6789/0 mon.b | |
319 | 2: 10.0.0.3:6789/0 mon.c | |
320 | ||
321 | #. Remove the existing monitors. :: | |
322 | ||
323 | $ monmaptool --rm a --rm b --rm c {tmp}/{filename} | |
324 | ||
325 | monmaptool: monmap file {tmp}/{filename} | |
326 | monmaptool: removing a | |
327 | monmaptool: removing b | |
328 | monmaptool: removing c | |
329 | monmaptool: writing epoch 1 to {tmp}/{filename} (0 monitors) | |
330 | ||
331 | #. Add the new monitor locations. :: | |
332 | ||
333 | $ monmaptool --add a 10.1.0.1:6789 --add b 10.1.0.2:6789 --add c 10.1.0.3:6789 {tmp}/{filename} | |
334 | ||
335 | monmaptool: monmap file {tmp}/{filename} | |
336 | monmaptool: writing epoch 1 to {tmp}/{filename} (3 monitors) | |
337 | ||
338 | #. Check new contents. :: | |
339 | ||
340 | $ monmaptool --print {tmp}/{filename} | |
341 | ||
342 | monmaptool: monmap file {tmp}/{filename} | |
343 | epoch 1 | |
344 | fsid 224e376d-c5fe-4504-96bb-ea6332a19e61 | |
345 | last_changed 2012-12-17 02:46:41.591248 | |
346 | created 2012-12-17 02:46:41.591248 | |
347 | 0: 10.1.0.1:6789/0 mon.a | |
348 | 1: 10.1.0.2:6789/0 mon.b | |
349 | 2: 10.1.0.3:6789/0 mon.c | |
350 | ||
351 | At this point, we assume the monitors (and stores) are installed at the new | |
352 | location. The next step is to propagate the modified monmap to the new | |
353 | monitors, and inject the modified monmap into each new monitor. | |
354 | ||
355 | #. First, make sure to stop all your monitors. Injection must be done while | |
356 | the daemon is not running. | |
357 | ||
358 | #. Inject the monmap. :: | |
359 | ||
360 | ceph-mon -i {mon-id} --inject-monmap {tmp}/{filename} | |
361 | ||
362 | #. Restart the monitors. | |
363 | ||
364 | After this step, migration to the new location is complete and | |
365 | the monitors should operate successfully. | |
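
The remove and add steps of the procedure above can be derived from the old addresses rather than typed by hand. The sketch below only prints the ``monmaptool`` command lines for the example monitors; it does not run them. The ``move_addr`` function, the variable names, and the ``/tmp/monmap`` path are hypothetical and chosen for this illustration.

```shell
#!/bin/sh
# Sketch: build the monmaptool arguments for moving monitors between networks.

# Map an address from the old prefix to the new one, for example
# 10.0.0.1:6789 -> 10.1.0.1:6789. Other addresses are returned unchanged.
move_addr() {
    old_prefix="$1"; new_prefix="$2"; addr="$3"
    case "$addr" in
        "$old_prefix"*)
            suffix=${addr#"$old_prefix"}
            echo "$new_prefix$suffix"
            ;;
        *)
            echo "$addr"
            ;;
    esac
}

map_file="/tmp/monmap"   # placeholder for {tmp}/{filename}
rm_args=""
add_args=""
# Each entry is ID=ADDRESS, taken from the example monmap in the text.
for entry in a=10.0.0.1:6789 b=10.0.0.2:6789 c=10.0.0.3:6789; do
    id="${entry%%=*}"
    addr="${entry#*=}"
    rm_args="$rm_args --rm $id"
    add_args="$add_args --add $id $(move_addr 10.0.0. 10.1.0. "$addr")"
done

# Print the two command lines for review before running them manually.
echo "monmaptool$rm_args $map_file"
echo "monmaptool$add_args $map_file"
```

Printing the commands rather than executing them leaves the decision to modify the monmap with the operator, which suits a procedure the text itself describes as hazardous.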
366 | ||
367 | ||
368 | .. _Manual Deployment: ../../../install/manual-deployment | |
369 | .. _Monitor Bootstrap: ../../../dev/mon-bootstrap | |
370 | .. _Paxos: http://en.wikipedia.org/wiki/Paxos_(computer_science) |