======================
Adding/Removing OSDs
======================

When a cluster is up and running, it is possible to add or remove OSDs.

Adding OSDs
===========

OSDs can be added to a cluster in order to expand the cluster's capacity and
resilience. Typically, an OSD is a Ceph ``ceph-osd`` daemon running on one
storage drive within a host machine. But if your host machine has multiple
storage drives, you may map one ``ceph-osd`` daemon for each drive on the
machine.

It's a good idea to check the capacity of your cluster so that you know when it
approaches its capacity limits. If your cluster has reached its ``near full``
ratio, then you should add OSDs to expand your cluster's capacity.
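
For example, you can check the cluster's overall utilization with the ``ceph
df`` command and per-OSD utilization with ``ceph osd df``:

.. prompt:: bash $

   ceph df
   ceph osd df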

.. warning:: Do not add an OSD after your cluster has reached its ``full
   ratio``. OSD failures that occur after the cluster reaches its ``near full
   ratio`` might cause the cluster to exceed its ``full ratio``.


Deploying your Hardware
-----------------------

If you are also adding a new host when adding a new OSD, see `Hardware
Recommendations`_ for details on minimum recommendations for OSD hardware. To
add an OSD host to your cluster, begin by making sure that an appropriate
version of Linux has been installed on the host machine and that all initial
preparations for your storage drives have been carried out. For details, see
`Filesystem Recommendations`_.

Next, add your OSD host to a rack in your cluster, connect the host to the
network, and ensure that the host has network connectivity. For details, see
`Network Configuration Reference`_.


.. _Hardware Recommendations: ../../../start/hardware-recommendations
.. _Filesystem Recommendations: ../../configuration/filesystem-recommendations
.. _Network Configuration Reference: ../../configuration/network-config-ref


Installing the Required Software
--------------------------------

If your cluster has been manually deployed, you will need to install Ceph
software packages manually. For details, see `Installing Ceph (Manual)`_.
Configure SSH for the appropriate user to have both passwordless authentication
and root permissions.
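
For example, one way to set this up is to create a key, copy it to the new
host, and grant the relevant user passwordless ``sudo``. The ``cephuser``
account shown here is hypothetical; substitute the user that will run your
deployment commands:

.. prompt:: bash $

   ssh-keygen -t ed25519
   ssh-copy-id cephuser@{new-osd-host}
   ssh cephuser@{new-osd-host}
   echo "cephuser ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/cephuser
   sudo chmod 0440 /etc/sudoers.d/cephuser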

.. _Installing Ceph (Manual): ../../../install


Adding an OSD (Manual)
----------------------

The following procedure sets up a ``ceph-osd`` daemon, configures this OSD to
use one drive, and configures the cluster to distribute data to the OSD. If
your host machine has multiple drives, you may add an OSD for each drive on the
host by repeating this procedure.

As the following procedure will demonstrate, adding an OSD involves creating a
metadata directory for it, configuring a data storage drive, adding the OSD to
the cluster, and then adding it to the CRUSH map.


When you add the OSD to the CRUSH map, you will need to consider the weight you
assign to the new OSD. Since storage drive capacities increase over time, newer
OSD hosts are likely to have larger hard drives than the older hosts in the
cluster have and therefore might have greater weight as well. (By convention,
an OSD's CRUSH weight corresponds to the capacity of its drive in TiB: for
example, a 1 TiB drive is conventionally assigned a weight of ``1.0``.)

.. tip:: Ceph works best with uniform hardware across pools. It is possible to
   add drives of dissimilar size and then adjust their weights accordingly.
   However, for best performance, consider a CRUSH hierarchy that has drives of
   the same type and size. It is better to add larger drives uniformly to
   existing hosts. This can be done incrementally, replacing smaller drives
   each time the new drives are added.


#. Create the new OSD by running a command of the following form. If you opt
   not to specify a UUID in this command, the UUID will be set automatically
   when the OSD starts up. The OSD number, which is needed for subsequent
   steps, is found in the command's output:

   .. prompt:: bash $

      ceph osd create [{uuid} [{id}]]

   If the optional parameter ``{id}`` is specified, it will be used as the OSD
   ID. However, if that ID number is already in use, the command will fail.

   .. warning:: Explicitly specifying the ``{id}`` parameter is not
      recommended. IDs are allocated as an array, and any skipping of entries
      consumes extra memory. This memory consumption can become significant if
      there are large gaps or if clusters are large. By leaving the ``{id}``
      parameter unspecified, we ensure that Ceph uses the smallest ID number
      available and that these problems are avoided.
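
   For example, to create a new OSD and let Ceph choose both the UUID and the
   ID, run the command with no arguments:

   .. prompt:: bash $

      ceph osd create

   The command prints the new OSD's number. The value shown below is purely
   illustrative; your cluster will assign the smallest available ID::

      6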

#. Create the default directory for your new OSD by running commands of the
   following form:

   .. prompt:: bash $

      ssh {new-osd-host}
      sudo mkdir /var/lib/ceph/osd/ceph-{osd-number}

#. If the OSD will be created on a drive other than the OS drive, prepare it
   for use with Ceph. Run commands of the following form:

   .. prompt:: bash $

      ssh {new-osd-host}
      sudo mkfs -t {fstype} /dev/{drive}
      sudo mount -o user_xattr /dev/{drive} /var/lib/ceph/osd/ceph-{osd-number}

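
   For example, if the new data drive is ``/dev/sdb`` (a hypothetical device
   name; substitute your own) and the new OSD's number is ``6``, the commands
   might look like this:

   .. prompt:: bash $

      ssh {new-osd-host}
      sudo mkfs -t ext4 /dev/sdb
      sudo mount -o user_xattr /dev/sdb /var/lib/ceph/osd/ceph-6
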

#. Initialize the OSD data directory by running commands of the following form:

   .. prompt:: bash $

      ssh {new-osd-host}
      ceph-osd -i {osd-num} --mkfs --mkkey

   Make sure that the directory is empty before running ``ceph-osd``.
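
   For example, continuing with the illustrative OSD number ``6`` used above:

   .. prompt:: bash $

      ssh {new-osd-host}
      ceph-osd -i 6 --mkfs --mkkey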

#. Register the OSD authentication key by running a command of the following
   form:

   .. prompt:: bash $

      ceph auth add osd.{osd-num} osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-{osd-num}/keyring

   The path in this command contains ``ceph-{osd-num}`` because the default
   cluster name is ``ceph``. If your cluster name is not ``ceph``, replace the
   string ``ceph`` in ``ceph-{osd-num}`` with your cluster name. For example,
   if your cluster name is ``cluster1``, then the path in the command should be
   ``/var/lib/ceph/osd/cluster1-{osd-num}/keyring``.
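
   For example, in a cluster with the default name ``ceph``, the command for
   the illustrative OSD number ``6`` would be:

   .. prompt:: bash $

      ceph auth add osd.6 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-6/keyring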

#. Add the OSD to the CRUSH map by running the following command. This allows
   the OSD to begin receiving data. The ``ceph osd crush add`` command can add
   OSDs to the CRUSH hierarchy wherever you want. If you specify one or more
   buckets, the command places the OSD in the most specific of those buckets,
   and it moves that bucket underneath any other buckets that you have
   specified. **Important:** If you specify only the root bucket, the command
   will attach the OSD directly to the root, but CRUSH rules expect OSDs to be
   inside of hosts. If the OSDs are not inside hosts, the OSDs will likely not
   receive any data.

   .. prompt:: bash $

      ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]

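   For example, to add the illustrative OSD ``osd.6`` with a weight of ``1.0``
   under a hypothetical host bucket named ``node1`` (adapt the name and weight
   to your environment):

   .. prompt:: bash $

      ceph osd crush add osd.6 1.0 host=node1
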

   Note that there is another way to add a new OSD to the CRUSH map: decompile
   the CRUSH map, add the OSD to the device list, add the host as a bucket (if
   it is not already in the CRUSH map), add the device as an item in the host,
   assign the device a weight, recompile the CRUSH map, and set the CRUSH map.
   For details, see `Add/Move an OSD`_. This is rarely necessary with recent
   releases (as of the Reef release).


.. _rados-replacing-an-osd:

Replacing an OSD
----------------

.. note:: If the procedure in this section does not work for you, try the
   instructions in the ``cephadm`` documentation:
   :ref:`cephadm-replacing-an-osd`.

Sometimes OSDs need to be replaced: for example, when a disk fails, or when an
administrator wants to reprovision OSDs with a new back end (perhaps when
switching from Filestore to BlueStore). Replacing an OSD differs from `Removing
the OSD`_ in that the replaced OSD's ID and CRUSH map entry must be kept intact
after the OSD is destroyed for replacement.


#. Make sure that it is safe to destroy the OSD:

   .. prompt:: bash $

      while ! ceph osd safe-to-destroy osd.{id} ; do sleep 10 ; done

#. Destroy the OSD:

   .. prompt:: bash $

      ceph osd destroy {id} --yes-i-really-mean-it

#. *Optional*: If the disk that you plan to use is not a new disk and has been
   used before for other purposes, zap the disk:

   .. prompt:: bash $

      ceph-volume lvm zap /dev/sdX

#. Prepare the disk for replacement by using the ID of the OSD that was
   destroyed in previous steps:

   .. prompt:: bash $

      ceph-volume lvm prepare --osd-id {id} --data /dev/sdX

#. Finally, activate the OSD:

   .. prompt:: bash $

      ceph-volume lvm activate {id} {fsid}

Alternatively, instead of carrying out the final two steps (preparing the disk
and activating the OSD), you can re-create the OSD by running a single command
of the following form:

.. prompt:: bash $

   ceph-volume lvm create --osd-id {id} --data /dev/sdX

Starting the OSD
----------------

After an OSD is added to Ceph, the OSD is in the cluster. However, until it is
started, the OSD is considered ``down`` and ``in``. The OSD is not running and
will be unable to receive data. To start an OSD, either run ``service ceph``
from your admin host or run a command of the following form to start the OSD
from its host machine:

.. prompt:: bash $

   sudo systemctl start ceph-osd@{osd-num}

After the OSD is started, it is considered ``up`` and ``in``.
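
You can confirm the OSD's state by running the ``ceph osd tree`` command,
which lists each OSD together with its ``up`` or ``down`` status:

.. prompt:: bash $

   ceph osd tree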

Observing the Data Migration
----------------------------

After the new OSD has been added to the CRUSH map, Ceph begins rebalancing the
cluster by migrating placement groups (PGs) to the new OSD. To observe this
process by using the `ceph`_ tool, run the following command:

.. prompt:: bash $

   ceph -w

Or:

.. prompt:: bash $

   watch ceph status

The PG states will first change from ``active+clean`` to ``active, some
degraded objects`` and then return to ``active+clean`` when migration
completes. When you are finished observing, press Ctrl-C to exit.

.. _Add/Move an OSD: ../crush-map#addosd
.. _ceph: ../monitoring


Removing OSDs (Manual)
======================

It is possible to remove an OSD manually while the cluster is running: you
might want to do this in order to reduce the size of the cluster or when
replacing hardware. Typically, an OSD is a Ceph ``ceph-osd`` daemon running on
one storage drive within a host machine. Alternatively, if your host machine
has multiple storage drives, you might need to remove multiple ``ceph-osd``
daemons: one daemon for each drive on the machine.

.. warning:: Before you begin the process of removing an OSD, make sure that
   your cluster is not near its ``full ratio``. Otherwise the act of removing
   OSDs might cause the cluster to reach or exceed its ``full ratio``.


Taking the OSD ``out`` of the Cluster
-------------------------------------

OSDs are typically ``up`` and ``in`` before they are removed from the cluster.
Before the OSD can be removed from the cluster, the OSD must be taken ``out``
of the cluster so that Ceph can begin rebalancing and copying its data to other
OSDs. To take an OSD ``out`` of the cluster, run a command of the following
form:

.. prompt:: bash $

   ceph osd out {osd-num}
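
For example, to take the OSD numbered ``1`` out of the cluster:

.. prompt:: bash $

   ceph osd out 1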


Observing the Data Migration
----------------------------

After the OSD has been taken ``out`` of the cluster, Ceph begins rebalancing
the cluster by migrating placement groups out of the OSD that was removed. To
observe this process by using the `ceph`_ tool, run the following command:

.. prompt:: bash $

   ceph -w

The PG states will change from ``active+clean`` to ``active, some degraded
objects`` and will then return to ``active+clean`` when migration completes.
When you are finished observing, press Ctrl-C to exit.

.. note:: Under certain conditions, the action of taking ``out`` an OSD
   might lead CRUSH to encounter a corner case in which some PGs remain stuck
   in the ``active+remapped`` state. This problem sometimes occurs in small
   clusters with few hosts (for example, in a small testing cluster). To
   address this problem, mark the OSD ``in`` by running a command of the
   following form:

   .. prompt:: bash $

      ceph osd in {osd-num}

   After the OSD has come back to its initial state, do not mark the OSD
   ``out`` again. Instead, set the OSD's weight to ``0`` by running a command
   of the following form:

   .. prompt:: bash $

      ceph osd crush reweight osd.{osd-num} 0

   After the OSD has been reweighted, observe the data migration and confirm
   that it has completed successfully. The difference between marking an OSD
   ``out`` and reweighting the OSD to ``0`` has to do with the bucket that
   contains the OSD. When an OSD is marked ``out``, the weight of the bucket is
   not changed. But when an OSD is reweighted to ``0``, the weight of the
   bucket is updated (namely, the weight of the OSD is subtracted from the
   overall weight of the bucket). When operating small clusters, it can
   sometimes be preferable to use the above reweight command.


Stopping the OSD
----------------

After you take an OSD ``out`` of the cluster, the OSD might still be running.
In such a case, the OSD is ``up`` and ``out``. Before it is removed from the
cluster, the OSD must be stopped by running commands of the following form:

.. prompt:: bash $

   ssh {osd-host}
   sudo systemctl stop ceph-osd@{osd-num}

After the OSD has been stopped, it is ``down``.


Removing the OSD
----------------

The following procedure removes an OSD from the cluster map, removes the OSD's
authentication key, removes the OSD from the OSD map, and removes the OSD from
the ``ceph.conf`` file. If your host has multiple drives, it might be necessary
to remove an OSD from each drive by repeating this procedure.

#. Begin by having the cluster forget the OSD. This step removes the OSD from
   the CRUSH map, removes the OSD's authentication key, and removes the OSD
   from the OSD map. (The :ref:`purge subcommand <ceph-admin-osd>` was
   introduced in Luminous. For older releases, see :ref:`the procedure linked
   here <ceph_osd_purge_procedure_pre_luminous>`.):

   .. prompt:: bash $

      ceph osd purge {id} --yes-i-really-mean-it


#. Navigate to the host where the master copy of the cluster's
   ``ceph.conf`` file is kept:

   .. prompt:: bash $

      ssh {admin-host}
      cd /etc/ceph
      vim ceph.conf

#. Remove the OSD entry from your ``ceph.conf`` file (if such an entry
   exists)::

      [osd.1]
      host = {hostname}

#. Copy the updated ``ceph.conf`` file from the location on the host where the
   master copy of the cluster's ``ceph.conf`` is kept to the ``/etc/ceph``
   directory of the other hosts in your cluster.
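
   For example, you might copy the file with ``scp`` (the host name below is a
   placeholder; repeat the command for each of the other hosts in your
   cluster):

   .. prompt:: bash $

      scp /etc/ceph/ceph.conf {other-host}:/etc/ceph/ceph.conf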

.. _ceph_osd_purge_procedure_pre_luminous:

If your Ceph cluster is older than Luminous, you will be unable to use the
``ceph osd purge`` command. Instead, carry out the following procedure:

#. Remove the OSD from the CRUSH map so that it no longer receives data (for
   more details, see `Remove an OSD`_):

   .. prompt:: bash $

      ceph osd crush remove {name}


   Instead of removing the OSD from the CRUSH map, you might opt for one of two
   alternatives: (1) decompile the CRUSH map, remove the OSD from the device
   list, and remove the device from the host bucket; (2) remove the host bucket
   from the CRUSH map (provided that it is in the CRUSH map and that you intend
   to remove the host), recompile the map, and set it.

#. Remove the OSD authentication key:

   .. prompt:: bash $

      ceph auth del osd.{osd-num}

#. Remove the OSD:

   .. prompt:: bash $

      ceph osd rm {osd-num}

   For example:

   .. prompt:: bash $

      ceph osd rm 1

.. _Remove an OSD: ../crush-map#removeosd