======================
Adding/Removing OSDs
======================

When you have a cluster up and running, you may add OSDs or remove OSDs
from the cluster at runtime.

Adding OSDs
===========

When you want to expand a cluster, you may add an OSD at runtime. With Ceph, an
OSD is generally one Ceph ``ceph-osd`` daemon for one storage drive within a
host machine. If your host has multiple storage drives, you may map one
``ceph-osd`` daemon for each drive.

Generally, it's a good idea to check the capacity of your cluster to see if you
are reaching the upper end of its capacity. As your cluster reaches its ``near
full`` ratio, you should add one or more OSDs to expand your cluster's capacity.

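A quick way to check where the cluster stands relative to its ``near full``
ratio is the ``ceph df`` family of commands. The sketch below is a dry run
that only *prints* the commands, since they assume a live cluster; remove the
``run`` helper to execute them for real.

```shell
#!/bin/sh
# Dry-run sketch: commands for checking cluster capacity before adding OSDs.
run() { echo "$*"; }   # print each command instead of executing it

run ceph df            # cluster-wide usage and per-pool statistics
run ceph osd df        # per-OSD utilization and weight
```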
.. warning:: Do not let your cluster reach its ``full ratio`` before
   adding an OSD. OSD failures that occur after the cluster reaches
   its ``near full`` ratio may cause the cluster to exceed its
   ``full ratio``.

Deploy your Hardware
--------------------

If you are adding a new host when adding a new OSD, see `Hardware
Recommendations`_ for details on minimum recommendations for OSD hardware. To
add an OSD host to your cluster, first make sure you have an up-to-date version
of Linux installed, and that you have made some initial preparations for your
storage drives. See `Filesystem Recommendations`_ for details.

Add your OSD host to a rack in your cluster, connect it to the network
and ensure that it has network connectivity. See the `Network Configuration
Reference`_ for details.

.. _Hardware Recommendations: ../../../start/hardware-recommendations
.. _Filesystem Recommendations: ../../configuration/filesystem-recommendations
.. _Network Configuration Reference: ../../configuration/network-config-ref

Install the Required Software
-----------------------------

For manually deployed clusters, you must install Ceph packages
manually. See `Installing Ceph (Manual)`_ for details.
You should configure SSH for a user with password-less authentication
and root permissions.

.. _Installing Ceph (Manual): ../../../install


Adding an OSD (Manual)
----------------------

This procedure sets up a ``ceph-osd`` daemon, configures it to use one drive,
and configures the cluster to distribute data to the OSD. If your host has
multiple drives, you may add an OSD for each drive by repeating this procedure.

To add an OSD, create a data directory for it, mount a drive to that directory,
add the OSD to the cluster, and then add it to the CRUSH map.

When you add the OSD to the CRUSH map, consider the weight you give to the new
OSD. Because hard drive capacity tends to grow (historically on the order of
40% per year), newer OSD hosts may have larger hard drives than older hosts in
the cluster (i.e., they may have greater weight).

.. tip:: Ceph prefers uniform hardware across pools. If you are adding drives
   of dissimilar size, you can adjust their weights. However, for best
   performance, consider a CRUSH hierarchy with drives of the same type/size.

#. Create the OSD. If no UUID is given, it will be set automatically when the
   OSD starts up. The following command will output the OSD number, which you
   will need for subsequent steps. ::

     ceph osd create [{uuid} [{id}]]

   If the optional parameter {id} is given, it will be used as the OSD id.
   Note that in this case the command may fail if that number is already in use.

   .. warning:: In general, explicitly specifying {id} is not recommended.
      IDs are allocated as an array, and skipping entries consumes some extra
      memory. This can become significant if there are large gaps and/or
      clusters are large. If {id} is not specified, the smallest available is
      used.

#. Create the default directory on your new OSD. ::

     ssh {new-osd-host}
     sudo mkdir /var/lib/ceph/osd/ceph-{osd-number}

#. If the OSD is for a drive other than the OS drive, prepare it
   for use with Ceph, and mount it to the directory you just created. ::

     ssh {new-osd-host}
     sudo mkfs -t {fstype} /dev/{drive}
     sudo mount -o user_xattr /dev/{hdd} /var/lib/ceph/osd/ceph-{osd-number}

#. Initialize the OSD data directory. ::

     ssh {new-osd-host}
     ceph-osd -i {osd-num} --mkfs --mkkey

   The directory must be empty before you can run ``ceph-osd``.

#. Register the OSD authentication key. The ``ceph`` portion of the
   ``ceph-{osd-num}`` path is the ``$cluster-$id``. If your cluster name
   differs from ``ceph``, use your cluster name instead. ::

     ceph auth add osd.{osd-num} osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-{osd-num}/keyring

#. Add the OSD to the CRUSH map so that the OSD can begin receiving data. The
   ``ceph osd crush add`` command allows you to add OSDs to the CRUSH hierarchy
   wherever you wish. If you specify at least one bucket, the command
   will place the OSD into the most specific bucket you specify, *and* it will
   move that bucket underneath any other buckets you specify. **Important:** If
   you specify only the root bucket, the command will attach the OSD directly
   to the root, but CRUSH rules expect OSDs to be inside of hosts.

   For Argonaut (v0.48), execute the following::

     ceph osd crush add {id} {name} {weight} [{bucket-type}={bucket-name} ...]

   For Bobtail (v0.56) and later releases, execute the following::

     ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]

   You may also decompile the CRUSH map, add the OSD to the device list, add the
   host as a bucket (if it's not already in the CRUSH map), add the device as an
   item in the host, assign it a weight, recompile the map, and set it. See
   `Add/Move an OSD`_ for details.
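
Taken together, the numbered steps above can be sketched as a single script.
This is a minimal sketch under assumed values, not a definitive procedure: the
OSD id (``3``), device (``/dev/sdb``), filesystem (``xfs``), host bucket
(``osd-host1``), and weight (``1.0``) are all hypothetical, and each command is
only *printed* for review (via the ``run`` helper) rather than executed against
a live cluster.

```shell
#!/bin/bash
# Dry-run sketch of the manual OSD-addition steps above.
# All values are hypothetical; adjust them for your cluster.
osd_id=3                 # normally captured from the output of `ceph osd create`
drive=/dev/sdb
fstype=xfs
host=osd-host1
weight=1.0
data_dir=/var/lib/ceph/osd/ceph-$osd_id

run() { echo "$*"; }     # print each command instead of executing it

run ceph osd create
run mkdir -p "$data_dir"
run mkfs -t "$fstype" "$drive"
run mount -o user_xattr "$drive" "$data_dir"
run ceph-osd -i "$osd_id" --mkfs --mkkey
run ceph auth add "osd.$osd_id" osd 'allow *' mon 'allow rwx' \
    -i "$data_dir/keyring"
run ceph osd crush add "osd.$osd_id" "$weight" host="$host"
```

To execute for real, drop the ``run`` prefix (or change the helper to run its
arguments) and replace each hypothetical value with one from your cluster.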


.. topic:: Argonaut (v0.48) Best Practices

   To limit impact on user I/O performance, add an OSD to the CRUSH map
   with an initial weight of ``0``. Then, ramp up the CRUSH weight a
   little bit at a time. For example, to ramp by increments of ``0.2``,
   start with::

     ceph osd crush reweight {osd-id} .2

   and allow migration to complete before reweighting to ``0.4``,
   ``0.6``, and so on until the desired CRUSH weight is reached.

   To limit the impact of OSD failures, you can set::

     mon osd down out interval = 0

   which prevents down OSDs from automatically being marked out, and then
   ramp them down manually with::

     ceph osd reweight {osd-num} .8

   Again, wait for the cluster to finish migrating data, and then adjust
   the weight further until you reach a weight of ``0``. Note that this
   approach prevents the cluster from automatically re-replicating data
   after a failure, so please ensure that sufficient monitoring is in
   place for an administrator to intervene promptly.

   Note that this practice is no longer necessary in Bobtail and
   subsequent releases.

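The incremental ramp-up described in the best practices above can be sketched
as a small script. This is a dry run under assumed values: the OSD id (``3``)
and the step schedule are hypothetical, and the reweight commands are only
collected and printed, not executed; in real use you would run each one and
wait for data migration to settle before issuing the next.

```shell
#!/bin/bash
# Dry-run sketch: generate the incremental CRUSH reweight commands for
# ramping a new OSD (hypothetical id 3) from 0 to a full weight of 1.0
# in steps of 0.2.
osd_id=3
cmds=()
for w in 0.2 0.4 0.6 0.8 1.0; do
    cmds+=("ceph osd crush reweight osd.$osd_id $w")
done
printf '%s\n' "${cmds[@]}"
```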

Starting the OSD
----------------

After you add an OSD to Ceph, the OSD is in your configuration. However,
it is not yet running. The OSD is ``down`` and ``in``. You must start
your new OSD before it can begin receiving data. You may use
``service ceph`` from your admin host or start the OSD from its host
machine.

For Ubuntu Trusty, use Upstart. ::

  sudo start ceph-osd id={osd-num}

For all other distros, use systemd. ::

  sudo systemctl start ceph-osd@{osd-num}


Once you start your OSD, it is ``up`` and ``in``.

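Assuming a live cluster, you can confirm the new OSD's state with the standard
status commands. As with the earlier sketches, the snippet below only prints
the commands so it can be reviewed anywhere.

```shell
#!/bin/sh
# Dry-run sketch: verify a newly started OSD.
run() { echo "$*"; }   # print each command instead of executing it

run ceph osd stat      # summary: how many OSDs are up/in
run ceph osd tree      # per-OSD up/down status and CRUSH position
```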

Observe the Data Migration
--------------------------

Once you have added your new OSD to the CRUSH map, Ceph will begin rebalancing
the cluster by migrating placement groups to your new OSD. You can observe this
process with the `ceph`_ tool. ::

  ceph -w

You should see the placement group states change from ``active+clean`` to
``active, some degraded objects``, and finally ``active+clean`` when migration
completes. (Press Control-C to exit.)


.. _Add/Move an OSD: ../crush-map#addosd
.. _ceph: ../monitoring



Removing OSDs (Manual)
======================

When you want to reduce the size of a cluster or replace hardware, you may
remove an OSD at runtime. With Ceph, an OSD is generally one Ceph ``ceph-osd``
daemon for one storage drive within a host machine. If your host has multiple
storage drives, you may need to remove one ``ceph-osd`` daemon for each drive.
Generally, it's a good idea to check the capacity of your cluster to see if you
are reaching the upper end of its capacity. Ensure that when you remove an OSD,
your cluster is not at its ``near full`` ratio.

.. warning:: Do not let your cluster reach its ``full ratio`` when
   removing an OSD. Removing OSDs could cause the cluster to reach
   or exceed its ``full ratio``.


Take the OSD out of the Cluster
-------------------------------

Before you remove an OSD, it is usually ``up`` and ``in``. You need to take it
out of the cluster so that Ceph can begin rebalancing and copying its data to
other OSDs. ::

  ceph osd out {osd-num}


Observe the Data Migration
--------------------------

Once you have taken your OSD ``out`` of the cluster, Ceph will begin
rebalancing the cluster by migrating placement groups out of the OSD you
removed. You can observe this process with the `ceph`_ tool. ::

  ceph -w

You should see the placement group states change from ``active+clean`` to
``active, some degraded objects``, and finally ``active+clean`` when migration
completes. (Press Control-C to exit.)

.. note:: Sometimes, typically in a "small" cluster with few hosts (for
   instance with a small testing cluster), taking an OSD ``out`` can
   trigger a CRUSH corner case in which some PGs remain stuck in the
   ``active+remapped`` state. If this happens, mark the OSD ``in``
   again with:

   ``ceph osd in {osd-num}``

   to return to the initial state. Then, instead of marking the OSD
   ``out``, set its weight to ``0`` with:

   ``ceph osd crush reweight osd.{osd-num} 0``

   After that, you can observe the data migration, which should run to
   completion. The difference between marking the OSD ``out`` and
   reweighting it to ``0`` is that marking it ``out`` leaves the weight of
   the bucket containing the OSD unchanged, whereas reweighting the OSD to
   ``0`` updates the bucket's weight (decreasing it by the OSD's weight).
   The reweight command may therefore be preferable on a "small" cluster.



Stopping the OSD
----------------

After you take an OSD out of the cluster, it may still be running.
That is, the OSD may be ``up`` and ``out``. You must stop
your OSD before you remove it from the configuration. ::

  ssh {osd-host}
  sudo systemctl stop ceph-osd@{osd-num}

Once you stop your OSD, it is ``down``.


Removing the OSD
----------------

This procedure removes an OSD from the cluster map, removes its authentication
key, removes the OSD from the OSD map, and removes the OSD from the
``ceph.conf`` file. If your host has multiple drives, you may need to remove an
OSD for each drive by repeating this procedure.

#. Remove the OSD from the CRUSH map so that it no longer receives data. You may
   also decompile the CRUSH map, remove the OSD from the device list, remove the
   device as an item in the host bucket or remove the host bucket (if it's in the
   CRUSH map and you intend to remove the host), recompile the map, and set it.
   See `Remove an OSD`_ for details. ::

     ceph osd crush remove {name}

#. Remove the OSD authentication key. ::

     ceph auth del osd.{osd-num}

   The ``ceph`` portion of the ``ceph-{osd-num}`` path is the ``$cluster-$id``.
   If your cluster name differs from ``ceph``, use your cluster name instead.

#. Remove the OSD. ::

     ceph osd rm {osd-num}
     # for example
     ceph osd rm 1

#. Navigate to the host where you keep the master copy of the cluster's
   ``ceph.conf`` file. ::

     ssh {admin-host}
     cd /etc/ceph
     vim ceph.conf

#. Remove the OSD entry from your ``ceph.conf`` file (if it exists). ::

     [osd.1]
     host = {hostname}

#. From the host where you keep the master copy of the cluster's ``ceph.conf``
   file, copy the updated ``ceph.conf`` file to the ``/etc/ceph`` directory of
   the other hosts in your cluster.

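The removal steps above can likewise be sketched as one dry-run script. The
OSD id (``1``) and host name (``osd-host1``) are hypothetical, and the
commands are printed rather than executed; in real use you would wait for
``ceph -w`` to report ``active+clean`` after the ``out`` step, and editing
``ceph.conf`` and redistributing it remain manual steps.

```shell
#!/bin/sh
# Dry-run sketch of the manual OSD-removal steps above.
# All values are hypothetical; adjust them for your cluster.
osd_id=1
osd_host=osd-host1

run() { echo "$*"; }   # print each command instead of executing it

run ceph osd out "$osd_id"
# ...wait for `ceph -w` to report active+clean before continuing...
run ssh "$osd_host" sudo systemctl stop "ceph-osd@$osd_id"
run ceph osd crush remove "osd.$osd_id"
run ceph auth del "osd.$osd_id"
run ceph osd rm "$osd_id"
# Remaining manual steps: remove the [osd.1] entry from ceph.conf on the
# admin host and copy the updated file to the other hosts.
```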


.. _Remove an OSD: ../crush-map#removeosd