======================
Adding/Removing OSDs
======================

When you have a cluster up and running, you may add OSDs or remove OSDs
from the cluster at runtime.

Adding OSDs
===========

When you want to expand a cluster, you may add an OSD at runtime. With Ceph, an
OSD is generally one Ceph ``ceph-osd`` daemon for one storage drive within a
host machine. If your host has multiple storage drives, you may map one
``ceph-osd`` daemon for each drive.

Generally, it's a good idea to check the capacity of your cluster to see if you
are reaching the upper end of its capacity. As your cluster reaches its ``near
full`` ratio, you should add one or more OSDs to expand your cluster's capacity.
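
As a sketch of that capacity check, the snippet below compares a cluster's raw
usage percentage against the default ``nearfull`` ratio of 85%. The
``USED_PCT`` value is hard-coded for illustration; on a live cluster you would
parse it from ``ceph df`` output, whose exact columns vary by release.

```shell
# Hypothetical capacity check: warn when raw usage approaches the default
# nearfull ratio (85%). USED_PCT is a stand-in for a value parsed from
# "ceph df"; the parsing is omitted because output columns vary by release.
USED_PCT=72.4
NEARFULL_PCT=85
if awk -v u="$USED_PCT" -v n="$NEARFULL_PCT" 'BEGIN { exit !(u < n) }'; then
    echo "capacity OK (${USED_PCT}% used)"
else
    echo "approaching nearfull (${USED_PCT}% used) - consider adding OSDs"
fi
```
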

.. warning:: Do not let your cluster reach its ``full ratio`` before
   adding an OSD. OSD failures that occur after the cluster reaches
   its ``near full`` ratio may cause the cluster to exceed its
   ``full ratio``.

Deploy your Hardware
--------------------

If you are adding a new host when adding a new OSD, see `Hardware
Recommendations`_ for details on minimum recommendations for OSD hardware. To
add an OSD host to your cluster, first make sure you have an up-to-date version
of Linux installed, and that you have made some initial preparations for your
storage drives. See `Filesystem Recommendations`_ for details.

Add your OSD host to a rack in your cluster, connect it to the network
and ensure that it has network connectivity. See the `Network Configuration
Reference`_ for details.

.. _Hardware Recommendations: ../../../start/hardware-recommendations
.. _Filesystem Recommendations: ../../configuration/filesystem-recommendations
.. _Network Configuration Reference: ../../configuration/network-config-ref

Install the Required Software
-----------------------------

For manually deployed clusters, you must install Ceph packages
manually. See `Installing Ceph (Manual)`_ for details.
You should configure SSH for a user with password-less authentication
and root permissions.

.. _Installing Ceph (Manual): ../../../install


Adding an OSD (Manual)
----------------------

This procedure sets up a ``ceph-osd`` daemon, configures it to use one drive,
and configures the cluster to distribute data to the OSD. If your host has
multiple drives, you may add an OSD for each drive by repeating this procedure.

To add an OSD, create a data directory for it, mount a drive to that directory,
add the OSD to the cluster, and then add it to the CRUSH map.

When you add the OSD to the CRUSH map, consider the weight you give to the new
OSD. Hard drive capacity tends to grow over time, so newer OSD hosts may have
larger hard drives than older hosts in the cluster (i.e., they may have
greater weight).

.. tip:: Ceph prefers uniform hardware across pools. If you are adding drives
   of dissimilar size, you can adjust their weights. However, for best
   performance, consider a CRUSH hierarchy with drives of the same type/size.

#. Create the OSD. If no UUID is given, it will be set automatically when the
   OSD starts up. The following command will output the OSD number, which you
   will need for subsequent steps::

     ceph osd create [{uuid} [{id}]]

   If the optional parameter {id} is given, it will be used as the OSD id.
   Note that in this case the command may fail if that id is already in use.

   .. warning:: In general, explicitly specifying {id} is not recommended.
      IDs are allocated as an array, and skipping entries consumes some extra
      memory. This can become significant if there are large gaps and/or
      clusters are large. If {id} is not specified, the smallest available
      id is used.

#. Create the default directory on your new OSD::

     ssh {new-osd-host}
     sudo mkdir /var/lib/ceph/osd/ceph-{osd-number}

#. If the OSD is for a drive other than the OS drive, prepare it
   for use with Ceph, and mount it to the directory you just created::

     ssh {new-osd-host}
     sudo mkfs -t {fstype} /dev/{drive}
     sudo mount -o user_xattr /dev/{drive} /var/lib/ceph/osd/ceph-{osd-number}

#. Initialize the OSD data directory::

     ssh {new-osd-host}
     ceph-osd -i {osd-num} --mkfs --mkkey

   The directory must be empty before you can run ``ceph-osd``.

#. Register the OSD authentication key. The value of ``ceph`` in the
   ``ceph-{osd-num}`` path is the ``$cluster-$id``. If your cluster name
   differs from ``ceph``, use your cluster name instead::

     ceph auth add osd.{osd-num} osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-{osd-num}/keyring

#. Add the OSD to the CRUSH map so that the OSD can begin receiving data. The
   ``ceph osd crush add`` command allows you to add OSDs to the CRUSH hierarchy
   wherever you wish. If you specify at least one bucket, the command
   will place the OSD into the most specific bucket you specify, *and* it will
   move that bucket underneath any other buckets you specify. **Important:** If
   you specify only the root bucket, the command will attach the OSD directly
   to the root, but CRUSH rules expect OSDs to be inside of hosts.

   For Argonaut (v0.48), execute the following::

     ceph osd crush add {id} {name} {weight} [{bucket-type}={bucket-name} ...]

   For Bobtail (v0.56) and later releases, execute the following::

     ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]

   You may also decompile the CRUSH map, add the OSD to the device list, add the
   host as a bucket (if it's not already in the CRUSH map), add the device as an
   item in the host, assign it a weight, recompile the map and set it. See
   `Add/Move an OSD`_ for details.
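
As a concrete, hypothetical example of the Bobtail-style form, the snippet
below adds ``osd.3`` with a weight of ``1.0`` under a host bucket named
``node4`` in the ``default`` root. The bucket names are assumptions, not
values from your cluster, and the ``ceph`` function is a stub that only echoes
the command so the sketch runs without a live cluster; drop the stub on a real
admin host.

```shell
# Stub "ceph" so this sketch runs without a live cluster; remove the stub
# (and use the real CLI) on an actual admin host.
ceph() { echo "would run: ceph $*"; }

# Hypothetical names: osd.3, weight 1.0, host bucket "node4", root "default".
ceph osd crush add osd.3 1.0 root=default host=node4
```
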

.. topic:: Argonaut (v0.48) Best Practices

   To limit impact on user I/O performance, add an OSD to the CRUSH map
   with an initial weight of ``0``. Then, ramp up the CRUSH weight a
   little bit at a time. For example, to ramp by increments of ``0.2``,
   start with::

     ceph osd crush reweight {osd-id} .2

   and allow migration to complete before reweighting to ``0.4``,
   ``0.6``, and so on until the desired CRUSH weight is reached.

   To limit the impact of OSD failures, you can set::

     mon osd down out interval = 0

   which prevents down OSDs from automatically being marked out, and then
   ramp them down manually with::

     ceph osd reweight {osd-num} .8

   Again, wait for the cluster to finish migrating data, and then adjust
   the weight further until you reach a weight of 0. Note that this
   setting prevents the cluster from automatically re-replicating data after
   a failure, so please ensure that sufficient monitoring is in place for
   an administrator to intervene promptly.

   Note that this practice is no longer necessary in Bobtail and
   subsequent releases.
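
The incremental ramp-up described above can be sketched as a small loop. The
``osd.3`` name and the ``0.2`` step are assumptions, and ``ceph`` is again a
stub that echoes its arguments so the sketch runs without a cluster; in
practice you would poll ``ceph -s`` between iterations and continue only once
recovery has finished.

```shell
# Stubbed "ceph" for illustration; on a real Argonaut-era cluster, remove
# the stub and wait for recovery to finish between reweight steps.
ceph() { echo "would run: ceph $*"; }

for w in 0.2 0.4 0.6 0.8 1.0; do
    ceph osd crush reweight osd.3 "$w"
    # here: poll "ceph -s" and continue only once migration has completed
done
```
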

.. _rados-replacing-an-osd:

Replacing an OSD
----------------

When a disk fails, or when an administrator wants to reprovision OSDs with a
new backend (for instance, when switching from FileStore to BlueStore), OSDs
need to be replaced. Unlike `Removing the OSD`_, the replaced OSD's id and
CRUSH map entry need to be kept intact after the OSD is destroyed for
replacement.

#. Destroy the OSD first::

     ceph osd destroy {id} --yes-i-really-mean-it

#. Zap the disk for the new OSD if the disk was previously used for other
   purposes. This is not necessary for a new disk::

     ceph-disk zap /dev/sdX

#. Prepare the disk for replacement by using the previously destroyed OSD id::

     ceph-disk prepare --bluestore /dev/sdX --osd-id {id} --osd-uuid `uuidgen`

#. And activate the OSD::

     ceph-disk activate /dev/sdX1


Starting the OSD
----------------

After you add an OSD to Ceph, the OSD is in your configuration. However,
it is not yet running. The OSD is ``down`` and ``in``. You must start
your new OSD before it can begin receiving data. You may use
``service ceph`` from your admin host or start the OSD from its host
machine.

For Ubuntu Trusty, use Upstart::

   sudo start ceph-osd id={osd-num}

For all other distros, use systemd::

   sudo systemctl start ceph-osd@{osd-num}


Once you start your OSD, it is ``up`` and ``in``.

Observe the Data Migration
--------------------------

Once you have added your new OSD to the CRUSH map, Ceph will begin rebalancing
the cluster by migrating placement groups to your new OSD. You can observe this
process with the `ceph`_ tool::

   ceph -w

You should see the placement group states change from ``active+clean`` to
``active, some degraded objects``, and finally ``active+clean`` when migration
completes. (Press Control-C to exit.)
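
If you prefer one-shot checks over the streaming ``ceph -w`` view, ``ceph
health`` and ``ceph pg stat`` give a point-in-time summary. The stubbed
``ceph`` function below only echoes the commands so the sketch runs anywhere;
remove it on a real admin host.

```shell
# Stub for illustration; remove on a real admin host.
ceph() { echo "would run: ceph $*"; }

ceph health    # overall cluster health (HEALTH_OK once migration is done)
ceph pg stat   # one-line placement group state summary
```
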

.. _Add/Move an OSD: ../crush-map#addosd
.. _ceph: ../monitoring


Removing OSDs (Manual)
======================

When you want to reduce the size of a cluster or replace hardware, you may
remove an OSD at runtime. With Ceph, an OSD is generally one Ceph ``ceph-osd``
daemon for one storage drive within a host machine. If your host has multiple
storage drives, you may need to remove one ``ceph-osd`` daemon for each drive.
Generally, it's a good idea to check the capacity of your cluster to see if you
are reaching the upper end of its capacity. Ensure that when you remove an OSD
your cluster is not at its ``near full`` ratio.

.. warning:: Do not let your cluster reach its ``full ratio`` when
   removing an OSD. Removing OSDs could cause the cluster to reach
   or exceed its ``full ratio``.


Take the OSD out of the Cluster
-------------------------------

Before you remove an OSD, it is usually ``up`` and ``in``. You need to take it
out of the cluster so that Ceph can begin rebalancing and copying its data to
other OSDs::

   ceph osd out {osd-num}


Observe the Data Migration
--------------------------

Once you have taken your OSD ``out`` of the cluster, Ceph will begin
rebalancing the cluster by migrating placement groups out of the OSD you
removed. You can observe this process with the `ceph`_ tool::

   ceph -w

You should see the placement group states change from ``active+clean`` to
``active, some degraded objects``, and finally ``active+clean`` when migration
completes. (Press Control-C to exit.)

.. note:: Sometimes, typically in a "small" cluster with few hosts (for
   instance with a small testing cluster), taking the OSD ``out`` can
   trigger a CRUSH corner case where some PGs remain stuck in the
   ``active+remapped`` state. If this happens, you should mark
   the OSD ``in`` with:

   ``ceph osd in {osd-num}``

   to return to the initial state, and then, instead of marking the OSD
   ``out``, set its weight to 0 with:

   ``ceph osd crush reweight osd.{osd-num} 0``

   After that, you can observe the data migration, which should run to
   completion. The difference between marking the OSD ``out`` and reweighting
   it to 0 is that in the first case the weight of the bucket which contains
   the OSD is not changed, whereas in the second case the weight of the bucket
   is updated (decreased by the OSD weight). The reweight command may
   therefore be preferable in the case of a "small" cluster.


Stopping the OSD
----------------

After you take an OSD out of the cluster, it may still be running.
That is, the OSD may be ``up`` and ``out``. You must stop
your OSD before you remove it from the configuration::

   ssh {osd-host}
   sudo systemctl stop ceph-osd@{osd-num}

Once you stop your OSD, it is ``down``.


Removing the OSD
----------------

This procedure removes an OSD from the cluster map, removes its authentication
key, removes the OSD from the OSD map, and removes the OSD from the
``ceph.conf`` file. If your host has multiple drives, you may need to remove an
OSD for each drive by repeating this procedure.

#. Let the cluster forget the OSD first. This step removes the OSD from the
   CRUSH map, removes its authentication key, and removes it from the OSD map
   as well. Please note that the `purge subcommand`_ was introduced in
   Luminous; for older versions, please see below::

     ceph osd purge {id} --yes-i-really-mean-it

#. Navigate to the host where you keep the master copy of the cluster's
   ``ceph.conf`` file::

     ssh {admin-host}
     cd /etc/ceph
     vim ceph.conf

#. Remove the OSD entry from your ``ceph.conf`` file (if it exists)::

     [osd.1]
         host = {hostname}

#. From the host where you keep the master copy of the cluster's ``ceph.conf``
   file, copy the updated ``ceph.conf`` file to the ``/etc/ceph`` directory of
   the other hosts in your cluster.


If your Ceph cluster is older than Luminous, instead of using ``ceph osd
purge`` you will need to perform the following steps manually:

#. Remove the OSD from the CRUSH map so that it no longer receives data. You
   may also decompile the CRUSH map, remove the OSD from the device list,
   remove the device as an item in the host bucket or remove the host bucket
   (if it's in the CRUSH map and you intend to remove the host), recompile the
   map and set it. See `Remove an OSD`_ for details::

     ceph osd crush remove {name}

#. Remove the OSD authentication key::

     ceph auth del osd.{osd-num}

   The value of ``ceph`` in the ``ceph-{osd-num}`` path is the
   ``$cluster-$id``. If your cluster name differs from ``ceph``, use your
   cluster name instead.

#. Remove the OSD::

     ceph osd rm {osd-num}

   For example::

     ceph osd rm 1
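
Taken together, the pre-Luminous steps above amount to three commands run in
order for a given OSD. The sketch below sequences them with a shell variable;
``ceph`` is stubbed so the sketch runs without a cluster, and the id ``1`` is
just the example id from the last step.

```shell
# Stubbed "ceph" for illustration; remove the stub on a real admin host.
ceph() { echo "would run: ceph $*"; }

OSD_NUM=1
ceph osd crush remove "osd.${OSD_NUM}"   # stop mapping data to the OSD
ceph auth del "osd.${OSD_NUM}"           # drop its authentication key
ceph osd rm "${OSD_NUM}"                 # remove it from the OSD map
```
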


.. _Remove an OSD: ../crush-map#removeosd
.. _purge subcommand: /man/8/ceph#osd