======================
 Adding/Removing OSDs
======================

When you have a cluster up and running, you may add OSDs or remove OSDs
from the cluster at runtime.

Adding OSDs
===========

When you want to expand a cluster, you may add an OSD at runtime. With Ceph, an
OSD is generally one Ceph ``ceph-osd`` daemon for one storage drive within a
host machine. If your host has multiple storage drives, you may map one
``ceph-osd`` daemon for each drive.

Generally, it's a good idea to check the capacity of your cluster to see if you
are reaching the upper end of its capacity. As your cluster reaches its ``near
full`` ratio, you should add one or more OSDs to expand your cluster's capacity.
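
For example, you can check overall and per-OSD utilization with the
``ceph df`` and ``ceph osd df`` commands. ::

  ceph df
  ceph osd df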

.. warning:: Do not let your cluster reach its ``full ratio`` before
   adding an OSD. OSD failures that occur after the cluster reaches
   its ``near full`` ratio may cause the cluster to exceed its
   ``full ratio``.

Deploy your Hardware
--------------------

If you are adding a new host when adding a new OSD, see `Hardware
Recommendations`_ for details on minimum recommendations for OSD hardware. To
add an OSD host to your cluster, first make sure you have an up-to-date version
of Linux installed, and that you have made some initial preparations for your
storage drives. See `Filesystem Recommendations`_ for details.

Add your OSD host to a rack in your cluster, connect it to the network
and ensure that it has network connectivity. See the `Network Configuration
Reference`_ for details.

.. _Hardware Recommendations: ../../../start/hardware-recommendations
.. _Filesystem Recommendations: ../../configuration/filesystem-recommendations
.. _Network Configuration Reference: ../../configuration/network-config-ref

Install the Required Software
-----------------------------

For manually deployed clusters, you must install Ceph packages
manually. See `Installing Ceph (Manual)`_ for details.
You should configure SSH to a user with password-less authentication
and root permissions.

.. _Installing Ceph (Manual): ../../../install

Adding an OSD (Manual)
----------------------

This procedure sets up a ``ceph-osd`` daemon, configures it to use one drive,
and configures the cluster to distribute data to the OSD. If your host has
multiple drives, you may add an OSD for each drive by repeating this procedure.

To add an OSD, create a data directory for it, mount a drive to that directory,
add the OSD to the cluster, and then add it to the CRUSH map.

When you add the OSD to the CRUSH map, consider the weight you give to the new
OSD. Hard drive capacity grows 40% per year, so newer OSD hosts may have larger
hard drives than older hosts in the cluster (i.e., they may have greater
weight).

.. tip:: Ceph prefers uniform hardware across pools. If you are adding drives
   of dissimilar size, you can adjust their weights. However, for best
   performance, consider a CRUSH hierarchy with drives of the same type/size.

#. Create the OSD. If no UUID is given, it will be set automatically when the
   OSD starts up. The following command will output the OSD number, which you
   will need for subsequent steps. ::

     ceph osd create [{uuid} [{id}]]

   If the optional parameter {id} is given, it will be used as the OSD ID.
   Note that in this case the command may fail if that number is already in use.

   .. warning:: In general, explicitly specifying {id} is not recommended.
      IDs are allocated as an array, and skipping entries consumes some extra
      memory. This can become significant if there are large gaps and/or
      clusters are large. If {id} is not specified, the smallest available is
      used.
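
   For example, to create the OSD and capture its assigned number in a shell
   variable for use in the later steps (a minimal sketch; the variable name is
   illustrative)::

     OSD_ID=$(ceph osd create)
     echo ${OSD_ID}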

#. Create the default directory on your new OSD. ::

     ssh {new-osd-host}
     sudo mkdir /var/lib/ceph/osd/ceph-{osd-number}

#. If the OSD is for a drive other than the OS drive, prepare it
   for use with Ceph, and mount it to the directory you just created::

     ssh {new-osd-host}
     sudo mkfs -t {fstype} /dev/{drive}
     sudo mount -o user_xattr /dev/{hdd} /var/lib/ceph/osd/ceph-{osd-number}
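
   For example, assuming the new OSD was assigned number 12 and its drive is
   ``/dev/sdb1`` (illustrative values), the commands might look like::

     sudo mkfs -t ext4 /dev/sdb1
     sudo mount -o user_xattr /dev/sdb1 /var/lib/ceph/osd/ceph-12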

#. Initialize the OSD data directory. ::

     ssh {new-osd-host}
     ceph-osd -i {osd-num} --mkfs --mkkey

   The directory must be empty before you can run ``ceph-osd``.

#. Register the OSD authentication key. The value of ``ceph`` for
   ``ceph-{osd-num}`` in the path is the ``$cluster-$id``. If your
   cluster name differs from ``ceph``, use your cluster name instead. ::

     ceph auth add osd.{osd-num} osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-{osd-num}/keyring


#. Add the OSD to the CRUSH map so that the OSD can begin receiving data. The
   ``ceph osd crush add`` command allows you to add OSDs to the CRUSH hierarchy
   wherever you wish. If you specify at least one bucket, the command
   will place the OSD into the most specific bucket you specify, *and* it will
   move that bucket underneath any other buckets you specify. **Important:** If
   you specify only the root bucket, the command will attach the OSD directly
   to the root, but CRUSH rules expect OSDs to be inside of hosts.

   Execute the following::

     ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]

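   For example, to add a hypothetical ``osd.12`` with a weight of ``1.0`` under
   a host named ``node1`` in the ``default`` root (names are illustrative)::

     ceph osd crush add osd.12 1.0 root=default host=node1
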
   You may also decompile the CRUSH map, add the OSD to the device list, add the
   host as a bucket (if it's not already in the CRUSH map), add the device as an
   item in the host, assign it a weight, recompile it and set it. See
   `Add/Move an OSD`_ for details.


.. _rados-replacing-an-osd:

Replacing an OSD
----------------

When disks fail, or if an administrator wants to reprovision OSDs with a new
back end (for instance, when switching from FileStore to BlueStore), OSDs need
to be replaced. Unlike `Removing the OSD`_, a replaced OSD's ID and CRUSH map
entry must be kept intact after the OSD is destroyed for replacement.

#. Make sure it is safe to destroy the OSD::

     while ! ceph osd safe-to-destroy osd.{id} ; do sleep 10 ; done

#. Destroy the OSD first::

     ceph osd destroy {id} --yes-i-really-mean-it

#. Zap a disk for the new OSD, if the disk was used before for other purposes.
   It's not necessary for a new disk::

     ceph-volume lvm zap /dev/sdX

#. Prepare the disk for replacement by using the previously destroyed OSD id::

     ceph-volume lvm prepare --osd-id {id} --data /dev/sdX

#. And activate the OSD::

     ceph-volume lvm activate {id} {fsid}

Alternatively, instead of preparing and activating, the device can be recreated
in one call, like::

  ceph-volume lvm create --osd-id {id} --data /dev/sdX

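For example, replacing a hypothetical ``osd.1`` whose new data device is
``/dev/sdb`` (illustrative values) with the one-call form might look like::

  while ! ceph osd safe-to-destroy osd.1 ; do sleep 10 ; done
  ceph osd destroy 1 --yes-i-really-mean-it
  ceph-volume lvm zap /dev/sdb
  ceph-volume lvm create --osd-id 1 --data /dev/sdb
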

Starting the OSD
----------------

After you add an OSD to Ceph, the OSD is in your configuration. However,
it is not yet running. The OSD is ``down`` and ``in``. You must start
your new OSD before it can begin receiving data. You may use
``service ceph`` from your admin host or start the OSD from its host
machine.

For Ubuntu Trusty use Upstart. ::

  sudo start ceph-osd id={osd-num}

For all other distros use systemd. ::

  sudo systemctl start ceph-osd@{osd-num}


Once you start your OSD, it is ``up`` and ``in``.
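
You can verify the OSD's state; for example, for a hypothetical ``osd.12``::

  ceph osd tree | grep osd.12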


Observe the Data Migration
--------------------------

Once you have added your new OSD to the CRUSH map, Ceph will begin rebalancing
the cluster by migrating placement groups to your new OSD. You can observe this
process with the `ceph`_ tool. ::

  ceph -w

You should see the placement group states change from ``active+clean`` to
``active, some degraded objects``, and finally ``active+clean`` when migration
completes. (Press Control-C to exit.)
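
To get a one-time status summary instead of a continuous stream, you can also
run ``ceph -s``. ::

  ceph -s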


.. _Add/Move an OSD: ../crush-map#addosd
.. _ceph: ../monitoring



Removing OSDs (Manual)
======================

When you want to reduce the size of a cluster or replace hardware, you may
remove an OSD at runtime. With Ceph, an OSD is generally one Ceph ``ceph-osd``
daemon for one storage drive within a host machine. If your host has multiple
storage drives, you may need to remove one ``ceph-osd`` daemon for each drive.
Generally, it's a good idea to check the capacity of your cluster to see if you
are reaching the upper end of its capacity. Ensure that when you remove an OSD
your cluster is not at its ``near full`` ratio.

.. warning:: Do not let your cluster reach its ``full ratio`` when
   removing an OSD. Removing OSDs could cause the cluster to reach
   or exceed its ``full ratio``.


Take the OSD out of the Cluster
-------------------------------

Before you remove an OSD, it is usually ``up`` and ``in``. You need to take it
out of the cluster so that Ceph can begin rebalancing and copying its data to
other OSDs. ::

  ceph osd out {osd-num}


Observe the Data Migration
--------------------------

Once you have taken your OSD ``out`` of the cluster, Ceph will begin
rebalancing the cluster by migrating placement groups out of the OSD you
removed. You can observe this process with the `ceph`_ tool. ::

  ceph -w

You should see the placement group states change from ``active+clean`` to
``active, some degraded objects``, and finally ``active+clean`` when migration
completes. (Press Control-C to exit.)

.. note:: Sometimes, typically in a "small" cluster with few hosts (for
   instance with a small testing cluster), taking the OSD ``out`` can
   trigger a CRUSH corner case where some PGs remain stuck in the
   ``active+remapped`` state. If you are in this case, you should mark
   the OSD ``in`` again with:

       ``ceph osd in {osd-num}``

   to come back to the initial state, and then, instead of marking the OSD
   ``out``, set its weight to 0 with:

       ``ceph osd crush reweight osd.{osd-num} 0``

   After that, you can observe the data migration, which should come to its
   end. The difference between marking the OSD ``out`` and reweighting it
   to 0 is that in the first case the weight of the bucket which contains
   the OSD is not changed, whereas in the second case the weight of the
   bucket is updated (decreased by the OSD's weight). The reweight command
   may sometimes be favoured in the case of a "small" cluster.
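
   For example, for a hypothetical ``osd.1`` on a small test cluster::

       ceph osd in 1
       ceph osd crush reweight osd.1 0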


Stopping the OSD
----------------

After you take an OSD out of the cluster, it may still be running.
That is, the OSD may be ``up`` and ``out``. You must stop
your OSD before you remove it from the configuration. ::

  ssh {osd-host}
  sudo systemctl stop ceph-osd@{osd-num}

Once you stop your OSD, it is ``down``.
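
You can confirm this, for example by listing only the OSDs that are ``down``::

  ceph osd tree down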


Removing the OSD
----------------

This procedure removes an OSD from the cluster map, removes its authentication
key, removes the OSD from the OSD map, and removes the OSD from the
``ceph.conf`` file. If your host has multiple drives, you may need to remove an
OSD for each drive by repeating this procedure.

#. Let the cluster forget the OSD first. This step removes the OSD from the
   CRUSH map, removes its authentication key, and removes it from the OSD map
   as well. Please note that the :ref:`purge subcommand <ceph-admin-osd>` was
   introduced in Luminous; for older versions, please see below. ::

     ceph osd purge {id} --yes-i-really-mean-it
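
   For example, to purge a hypothetical OSD with ID ``1``::

     ceph osd purge 1 --yes-i-really-mean-it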

#. Navigate to the host where you keep the master copy of the cluster's
   ``ceph.conf`` file. ::

     ssh {admin-host}
     cd /etc/ceph
     vim ceph.conf

#. Remove the OSD entry from your ``ceph.conf`` file (if it exists). ::

     [osd.1]
         host = {hostname}

#. From the host where you keep the master copy of the cluster's ``ceph.conf``
   file, copy the updated ``ceph.conf`` file to the ``/etc/ceph`` directory of
   other hosts in your cluster.
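
   For example, using ``scp`` (the host name is a placeholder)::

     scp /etc/ceph/ceph.conf {other-host}:/etc/ceph/ceph.conf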

If your Ceph cluster is older than Luminous, instead of using ``ceph osd purge``,
you will need to perform these steps manually:


#. Remove the OSD from the CRUSH map so that it no longer receives data. You may
   also decompile the CRUSH map, remove the OSD from the device list, remove the
   device as an item in the host bucket or remove the host bucket (if it's in the
   CRUSH map and you intend to remove the host), recompile the map and set it.
   See `Remove an OSD`_ for details. ::

     ceph osd crush remove {name}

#. Remove the OSD authentication key. ::

     ceph auth del osd.{osd-num}

   The value of ``ceph`` for ``ceph-{osd-num}`` in the path is the
   ``$cluster-$id``. If your cluster name differs from ``ceph``, use your
   cluster name instead.

#. Remove the OSD. ::

     ceph osd rm {osd-num}

   For example::

     ceph osd rm 1


.. _Remove an OSD: ../crush-map#removeosd