***********
OSD Service
***********
.. _device management: ../rados/operations/devices
.. _libstoragemgmt: https://github.com/libstorage/libstoragemgmt

List Devices
============

``ceph-volume`` scans each host in the cluster from time to time in order
to determine which devices are present and whether they are eligible to be
used as OSDs.

To print a list of devices discovered by ``cephadm``, run this command:

.. prompt:: bash #

   ceph orch device ls [--hostname=...] [--wide] [--refresh]

Example::

   Hostname  Path      Type  Serial        Size  Health   Ident  Fault  Available
   srv-01    /dev/sdb  hdd   15P0A0YFFRD6  300G  Unknown  N/A    N/A    No
   srv-01    /dev/sdc  hdd   15R0A08WFRD6  300G  Unknown  N/A    N/A    No
   srv-01    /dev/sdd  hdd   15R0A07DFRD6  300G  Unknown  N/A    N/A    No
   srv-01    /dev/sde  hdd   15P0A0QDFRD6  300G  Unknown  N/A    N/A    No
   srv-02    /dev/sdb  hdd   15R0A033FRD6  300G  Unknown  N/A    N/A    No
   srv-02    /dev/sdc  hdd   15R0A05XFRD6  300G  Unknown  N/A    N/A    No
   srv-02    /dev/sde  hdd   15R0A0ANFRD6  300G  Unknown  N/A    N/A    No
   srv-02    /dev/sdf  hdd   15R0A06EFRD6  300G  Unknown  N/A    N/A    No
   srv-03    /dev/sdb  hdd   15R0A0OGFRD6  300G  Unknown  N/A    N/A    No
   srv-03    /dev/sdc  hdd   15R0A0P7FRD6  300G  Unknown  N/A    N/A    No
   srv-03    /dev/sdd  hdd   15R0A0O7FRD6  300G  Unknown  N/A    N/A    No

Using the ``--wide`` option provides all details relating to the device,
including any reasons that the device might not be eligible for use as an OSD.

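For example, to re-scan a single host and show full details for its devices,
you might run (the hostname is taken from the example output above):

.. prompt:: bash #

   ceph orch device ls --hostname=srv-01 --wide --refresh
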
In the above example you can see fields named "Health", "Ident", and "Fault".
This information is provided by integration with `libstoragemgmt`_. By default,
this integration is disabled (because `libstoragemgmt`_ may not be 100%
compatible with your hardware). To make ``cephadm`` include these fields,
enable cephadm's "enhanced device scan" option as follows:

.. prompt:: bash #

   ceph config set mgr mgr/cephadm/device_enhanced_scan true

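You can confirm the setting afterwards with ``ceph config get``; a minimal
check (the expected output is simply ``true``):

.. prompt:: bash #

   ceph config get mgr mgr/cephadm/device_enhanced_scan
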
.. warning::
   Although the libstoragemgmt library performs standard SCSI inquiry calls,
   there is no guarantee that your firmware fully implements these standards.
   This can lead to erratic behaviour and even bus resets on some older
   hardware. It is therefore recommended that, before enabling this feature,
   you test your hardware's compatibility with libstoragemgmt first to avoid
   unplanned interruptions to services.

   There are a number of ways to test compatibility, but the simplest may be
   to use the cephadm shell to call libstoragemgmt directly - ``cephadm shell
   lsmcli ldl``. If your hardware is supported you should see something like
   this:

   ::

      Path     | SCSI VPD 0x83    | Link Type | Serial Number | Health Status
      ----------------------------------------------------------------------
      /dev/sda | 50000396082ba631 | SAS       | 15P0A0R0FRD6  | Good
      /dev/sdb | 50000396082bbbf9 | SAS       | 15P0A0YFFRD6  | Good

After you have enabled libstoragemgmt support, the output will look something
like this:

::

   # ceph orch device ls
   Hostname  Path      Type  Serial        Size  Health  Ident  Fault  Available
   srv-01    /dev/sdb  hdd   15P0A0YFFRD6  300G  Good    Off    Off    No
   srv-01    /dev/sdc  hdd   15R0A08WFRD6  300G  Good    Off    Off    No
   :

In this example, libstoragemgmt has confirmed the health of the drives and the
ability to interact with the Identification and Fault LEDs on the drive
enclosures. For further information about interacting with these LEDs, refer
to `device management`_.

.. note::
   The current release of `libstoragemgmt`_ (1.8.8) supports SCSI, SAS, and
   SATA based local disks only. There is no official support for NVMe devices
   (PCIe).

.. _cephadm-deploy-osds:

Deploy OSDs
===========

Listing Storage Devices
-----------------------

In order to deploy an OSD, there must be a storage device that is *available*
on which the OSD will be deployed.

Run this command to display an inventory of storage devices on all cluster
hosts:

.. prompt:: bash #

   ceph orch device ls

A storage device is considered *available* if all of the following
conditions are met:

* The device must have no partitions.
* The device must not have any LVM state.
* The device must not be mounted.
* The device must not contain a file system.
* The device must not contain a Ceph BlueStore OSD.
* The device must be larger than 5 GB.

Ceph will not provision an OSD on a device that is not available.

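If a device fails these checks only because of leftover partitions, LVM
state, or an old file system, it can be made *available* again by erasing
(zapping) it, as described in `Erasing Devices (Zapping Devices)`_ below:

.. prompt:: bash #

   ceph orch device zap <hostname> <path>
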
Creating New OSDs
-----------------

There are a few ways to create new OSDs:

* Tell Ceph to consume any available and unused storage device:

  .. prompt:: bash #

     ceph orch apply osd --all-available-devices

* Create an OSD from a specific device on a specific host:

  .. prompt:: bash #

     ceph orch daemon add osd *<host>*:*<device-path>*

  For example:

  .. prompt:: bash #

     ceph orch daemon add osd host1:/dev/sdb

  Advanced OSD creation from specific devices on a specific host:

  .. prompt:: bash #

     ceph orch daemon add osd host1:data_devices=/dev/sda,/dev/sdb,db_devices=/dev/sdc,osds_per_device=2

* Create an OSD on a specific LVM logical volume on a specific host:

  .. prompt:: bash #

     ceph orch daemon add osd *<host>*:*<lvm-path>*

  For example:

  .. prompt:: bash #

     ceph orch daemon add osd host1:/dev/vg_osd/lvm_osd1701

* You can use :ref:`drivegroups` to categorize device(s) based on their
  properties. This might be useful in forming a clearer picture of which
  devices are available to consume. Properties include device type (SSD or
  HDD), device model names, size, and the hosts on which the devices exist:

  .. prompt:: bash #

     ceph orch apply -i spec.yml

Dry Run
-------

The ``--dry-run`` flag causes the orchestrator to present a preview of what
will happen without actually creating the OSDs.

For example:

.. prompt:: bash #

   ceph orch apply osd --all-available-devices --dry-run

::

   NAME                   HOST   DATA      DB   WAL
   all-available-devices  node1  /dev/vdb  -    -
   all-available-devices  node2  /dev/vdc  -    -
   all-available-devices  node3  /dev/vdd  -    -

.. _cephadm-osd-declarative:

Declarative State
-----------------

The effect of ``ceph orch apply`` is persistent. This means that drives that
are added to the system after the ``ceph orch apply`` command completes will be
automatically found and added to the cluster. It also means that drives that
become available (by zapping, for example) after the ``ceph orch apply``
command completes will be automatically found and added to the cluster.

We will examine the effects of the following command:

.. prompt:: bash #

   ceph orch apply osd --all-available-devices

After running the above command:

* If you add new disks to the cluster, they will automatically be used to
  create new OSDs.
* If you remove an OSD and clean the LVM physical volume, a new OSD will be
  created automatically.

If you want to avoid this behavior (that is, disable the automatic creation of
OSDs on available devices), use the ``unmanaged`` parameter:

.. prompt:: bash #

   ceph orch apply osd --all-available-devices --unmanaged=true

.. note::

   Keep these three facts in mind:

   - The default behavior of ``ceph orch apply`` causes cephadm to reconcile
     continuously. This means that cephadm creates OSDs as soon as new drives
     are detected.

   - Setting ``unmanaged: True`` disables the creation of OSDs. If
     ``unmanaged: True`` is set, nothing will happen even if you apply a new
     OSD service.

   - ``ceph orch daemon add`` creates OSDs, but does not add an OSD service.

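To inspect the OSD specs that cephadm is currently managing, including whether
they are flagged ``unmanaged``, you can export them. The output below is
illustrative for the ``--all-available-devices`` case:

.. prompt:: bash #

   ceph orch ls osd --export

::

   service_type: osd
   service_id: all-available-devices
   unmanaged: true
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       all: true
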
* For cephadm, see also :ref:`cephadm-spec-unmanaged`.

.. _cephadm-osd-removal:

Remove an OSD
=============

Removing an OSD from a cluster involves two steps:

#. evacuating all placement groups (PGs) from the OSD
#. removing the PG-free OSD from the cluster

The following command performs these two steps:

.. prompt:: bash #

   ceph orch osd rm <osd_id(s)> [--replace] [--force]

Example:

.. prompt:: bash #

   ceph orch osd rm 0

Expected output::

   Scheduled OSD(s) for removal

OSDs that are not safe to destroy will be rejected.

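You can check in advance whether a given OSD is safe to destroy; a minimal
check (the OSD id is illustrative):

.. prompt:: bash #

   ceph osd safe-to-destroy osd.0
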
.. note::
   After removing OSDs, if the drives the OSDs were deployed on once again
   become available, cephadm may automatically try to deploy more OSDs
   on these drives if they match an existing drivegroup spec. If you deployed
   the OSDs you are removing with a spec and don't want any new OSDs deployed
   on the drives after removal, it's best to modify the drivegroup spec before
   removal. Either set ``unmanaged: true`` to stop it from picking up new
   drives, or modify it in some way that it no longer matches the drives used
   for the OSDs you wish to remove. Then re-apply the spec. For more info on
   drivegroup specs see :ref:`drivegroups`. For more info on the declarative
   nature of cephadm in reference to deploying OSDs, see
   :ref:`cephadm-osd-declarative`.

Monitoring OSD State
--------------------

You can query the state of OSD removal operations with the following command:

.. prompt:: bash #

   ceph orch osd rm status

Expected output::

   OSD_ID  HOST         STATE                    PG_COUNT  REPLACE  FORCE  STARTED_AT
   2       cephadm-dev  done, waiting for purge  0         True     False  2020-07-17 13:01:43.147684
   3       cephadm-dev  draining                 17        False    True   2020-07-17 13:01:45.162158
   4       cephadm-dev  started                  42        False    True   2020-07-17 13:01:45.162158

When no PGs are left on the OSD, it will be decommissioned and removed from
the cluster.

.. note::
   After removing an OSD, if you wipe the LVM physical volume in the device
   used by the removed OSD, a new OSD will be created. For more information
   on this, read about the ``unmanaged`` parameter in
   :ref:`cephadm-osd-declarative`.

Stopping OSD Removal
--------------------

It is possible to stop queued OSD removals by using the following command:

.. prompt:: bash #

   ceph orch osd rm stop <osd_id(s)>

Example:

.. prompt:: bash #

   ceph orch osd rm stop 4

Expected output::

   Stopped OSD(s) removal

This resets the initial state of the OSD and takes it off the removal queue.

.. _cephadm-replacing-an-osd:

Replacing an OSD
----------------

.. prompt:: bash #

   ceph orch osd rm <osd_id(s)> --replace [--force]

Example:

.. prompt:: bash #

   ceph orch osd rm 4 --replace

Expected output::

   Scheduled OSD(s) for replacement

This follows the same procedure as the "Remove OSD" section, with one
exception: the OSD is not permanently removed from the CRUSH hierarchy, but
is instead assigned a 'destroyed' flag.

.. note::
   The new OSD that will replace the removed OSD must be created on the same
   host as the OSD that was removed.

**Preserving the OSD ID**

The 'destroyed' flag is used to determine which OSD ids will be reused in the
next OSD deployment.

If you use OSDSpecs for OSD deployment, your newly added disks will be
assigned the OSD ids of their replaced counterparts. This assumes that the
new disks still match the OSDSpecs.

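You can verify which OSD ids carry the 'destroyed' flag, and are therefore
eligible for reuse, with ``ceph osd tree``. Illustrative output:

.. prompt:: bash #

   ceph osd tree

::

   ID  CLASS  WEIGHT   TYPE NAME        STATUS     REWEIGHT  PRI-AFF
   -1         0.29306  root default
   -3         0.29306      host node1
    4    hdd  0.09769          osd.4    destroyed         0  1.00000
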
Use the ``--dry-run`` flag to make certain that the ``ceph orch apply osd``
command does what you want it to. The ``--dry-run`` flag shows you what the
outcome of the command will be without making the changes you specify. When
you are satisfied that the command will do what you want, run the command
without the ``--dry-run`` flag.

.. tip::

   The name of your OSDSpec can be retrieved with the command ``ceph orch ls``

Alternatively, you can use your OSDSpec file:

.. prompt:: bash #

   ceph orch apply -i <osd_spec_file> --dry-run

Expected output::

   NAME                HOST   DATA      DB   WAL
   <name_of_osd_spec>  node1  /dev/vdb  -    -

When this output reflects your intention, omit the ``--dry-run`` flag to
execute the deployment.

Erasing Devices (Zapping Devices)
---------------------------------

Erase (zap) a device so that it can be reused. ``zap`` calls ``ceph-volume
zap`` on the remote host.

.. prompt:: bash #

   ceph orch device zap <hostname> <path>

Example command:

.. prompt:: bash #

   ceph orch device zap my_hostname /dev/sdx

.. note::
   If the unmanaged flag is unset, cephadm automatically deploys drives that
   match the OSDSpec. For example, if you use the ``all-available-devices``
   option when creating OSDs, when you ``zap`` a device the cephadm
   orchestrator automatically creates a new OSD on the device. To disable
   this behavior, see :ref:`cephadm-osd-declarative`.

.. _osd_autotune:

Automatically tuning OSD memory
===============================

OSD daemons will adjust their memory consumption based on the
``osd_memory_target`` config option (several gigabytes, by
default). If Ceph is deployed on dedicated nodes that are not sharing
memory with other services, cephadm can automatically adjust the per-OSD
memory consumption based on the total amount of RAM and the number of
deployed OSDs.

.. warning:: Cephadm sets ``osd_memory_target_autotune`` to ``true`` by
   default, which is unsuitable for hyperconverged infrastructures.

Cephadm will start with a fraction
(``mgr/cephadm/autotune_memory_target_ratio``, which defaults to
``.7``) of the total RAM in the system, subtract off any memory
consumed by non-autotuned daemons (non-OSDs, and OSDs for which
``osd_memory_target_autotune`` is false), and then divide the result by the
number of remaining OSDs.

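As a worked example (all figures hypothetical), consider a host with 128 GiB
of RAM running 10 autotuned OSDs, where daemons that are not autotuned
consume 9.6 GiB::

   memory budget  = 128 GiB * 0.7        = 89.6 GiB
   remaining      = 89.6 GiB - 9.6 GiB   = 80 GiB
   per-OSD target = 80 GiB / 10 OSDs     = 8 GiB

Each autotuned OSD on that host would therefore receive an
``osd_memory_target`` of roughly 8 GiB.
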
The final targets are reflected in the config database with options like::

   WHO   MASK      LEVEL   OPTION              VALUE
   osd   host:foo  basic   osd_memory_target   126092301926
   osd   host:bar  basic   osd_memory_target   6442450944

Both the limits and the current memory consumed by each daemon are visible
from the ``ceph orch ps`` output in the ``MEM LIMIT`` column::

   NAME   HOST  PORTS  STATUS         REFRESHED  AGE  MEM USED  MEM LIMIT  VERSION                IMAGE ID      CONTAINER ID
   osd.1  dael         running (3h)   10s ago    3h   72857k    117.4G     17.0.0-3781-gafaed750  7015fda3cd67  9e183363d39c
   osd.2  dael         running (81m)  10s ago    81m  63989k    117.4G     17.0.0-3781-gafaed750  7015fda3cd67  1f0cc479b051
   osd.3  dael         running (62m)  10s ago    62m  64071k    117.4G     17.0.0-3781-gafaed750  7015fda3cd67  ac5537492f27

To exclude an OSD from memory autotuning, disable the autotune option
for that OSD and also set a specific memory target. For example:

.. prompt:: bash #

   ceph config set osd.123 osd_memory_target_autotune false
   ceph config set osd.123 osd_memory_target 16G

.. _drivegroups:

Advanced OSD Service Specifications
===================================

:ref:`orchestrator-cli-service-spec`\s of type ``osd`` are a way to describe a
cluster layout using the properties of disks. Service specifications give the
user an abstract way to tell Ceph which disks should turn into OSDs with which
configurations, without knowing the specifics of device names and paths.

Service specifications make it possible to define a yaml or json file that can
be used to reduce the amount of manual work involved in creating OSDs.

For example, instead of running the following command:

.. prompt:: bash [monitor.1]#

   ceph orch daemon add osd *<host>*:*<path-to-device>*

for each device and each host, we can define a yaml or json file that allows
us to describe the layout. Here's the most basic example.

Create a file called (for example) ``osd_spec.yml``:

.. code-block:: yaml

   service_type: osd
   service_id: default_drive_group  # custom name of the osd spec
   placement:
     host_pattern: '*'              # which hosts to target
   spec:
     data_devices:                  # the type of devices you are applying specs to
       all: true                    # a filter, check below for a full list

This means:

#. Turn any available device (ceph-volume decides what 'available' is) into an
   OSD on all hosts that match the glob pattern '*'. (The glob pattern matches
   against the registered hosts from `host ls`.) A more detailed section on
   host_pattern is available below.

#. Pass the file to the orchestrator like this:

   .. prompt:: bash [monitor.1]#

      ceph orch apply -i /path/to/osd_spec.yml

This instruction will be issued to all the matching hosts, and will deploy
these OSDs.

Setups more complex than the one specified by the ``all`` filter are
possible. See :ref:`osd_filters` for details.

A ``--dry-run`` flag can be passed to the ``apply osd`` command to display a
synopsis of the proposed layout.

Example:

.. prompt:: bash [monitor.1]#

   ceph orch apply -i /path/to/osd_spec.yml --dry-run

.. _osd_filters:

Filters
-------

.. note::
   Filters are applied using an `AND` gate by default. This means that a drive
   must fulfill all filter criteria in order to get selected. This behavior
   can be adjusted by setting ``filter_logic: OR`` in the OSD specification.

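A minimal sketch of a spec using ``filter_logic: OR`` (the service id and
filter values are illustrative), which selects devices matching *either*
criterion:

.. code-block:: yaml

   service_type: osd
   service_id: osd_spec_or_example   # hypothetical name
   placement:
     host_pattern: '*'
   spec:
     filter_logic: OR
     data_devices:
       rotational: 1
       size: '2TB:'
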
Filters are used to assign disks to groups, using their attributes to group
them.

The attributes are based on ceph-volume's disk query. You can retrieve
information about the attributes with this command:

.. code-block:: bash

   ceph-volume inventory </path/to/disk>

Vendor or Model
^^^^^^^^^^^^^^^

Specific disks can be targeted by vendor or model:

.. code-block:: yaml

   model: disk_model_name

or

.. code-block:: yaml

   vendor: disk_vendor_name

Size
^^^^

Specific disks can be targeted by `Size`:

.. code-block:: yaml

   size: size_spec

Size specs
__________

Size specifications can be of the following forms:

* LOW:HIGH
* :HIGH
* LOW:
* EXACT

Concrete examples:

To include disks of an exact size:

.. code-block:: yaml

   size: '10G'

To include disks within a given range of size:

.. code-block:: yaml

   size: '10G:40G'

To include disks that are less than or equal to 10G in size:

.. code-block:: yaml

   size: ':10G'

To include disks equal to or greater than 40G in size:

.. code-block:: yaml

   size: '40G:'

Sizes don't have to be specified exclusively in Gigabytes (G).

Other units of size are supported: Megabyte (M), Gigabyte (G) and
Terabyte (T). Appending the (B) for byte is also supported: ``MB``, ``GB``,
``TB``.

Rotational
^^^^^^^^^^

This operates on the 'rotational' attribute of the disk.

.. code-block:: yaml

   rotational: 0 | 1

`1` to match all disks that are rotational

`0` to match all disks that are non-rotational (SSD, NVME etc)

All
^^^

This will take all disks that are 'available'.

.. note:: This filter is valid only for the ``data_devices`` section.

.. code-block:: yaml

   all: true

Limiter
^^^^^^^

If you have specified some valid filters but want to limit the number of disks
that they match, use the ``limit`` directive:

.. code-block:: yaml

   limit: 2

For example, if you used `vendor` to match all disks that are from `VendorA`
but want to use only the first two, you could use `limit`:

.. code-block:: yaml

   data_devices:
     vendor: VendorA
     limit: 2

.. note:: `limit` is a last resort and shouldn't be used if it can be avoided.

Additional Options
------------------

There are multiple optional settings you can use to change the way OSDs are
deployed. You can add these options to the base level of an OSD spec for them
to take effect.

This example would deploy all OSDs with encryption enabled.

.. code-block:: yaml

   service_type: osd
   service_id: example_osd_spec
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       all: true
     encrypted: true

See a full list in the DriveGroupSpecs:

.. py:currentmodule:: ceph.deployment.drive_group

.. autoclass:: DriveGroupSpec
   :members:
   :exclude-members: from_json

Examples
========

The simple case
---------------

All nodes with the same setup:

.. code-block:: none

   20 HDDs
   Vendor: VendorA
   Model: HDD-123-foo
   Size: 4TB

   2 SSDs
   Vendor: VendorB
   Model: MC-55-44-ZX
   Size: 512GB

This is a common setup and can be described quite easily:

.. code-block:: yaml

   service_type: osd
   service_id: osd_spec_default
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       model: HDD-123-foo  # Note, HDD-123 would also be valid
     db_devices:
       model: MC-55-44-XZ  # Same here, MC-55-44 is valid

However, we can improve it by reducing the filters on core properties of the
drives:

.. code-block:: yaml

   service_type: osd
   service_id: osd_spec_default
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       rotational: 1
     db_devices:
       rotational: 0

Now, all rotating devices are declared 'data devices' and all non-rotating
devices will be used as shared devices (wal, db).

If you know that drives larger than 2 TB should always be used as the slower
data devices, you can filter by size:

.. code-block:: yaml

   service_type: osd
   service_id: osd_spec_default
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       size: '2TB:'
     db_devices:
       size: ':2TB'

.. note:: All of the above OSD specs are equally valid. Which of those you
   want to use depends on taste and on how much you expect your node layout
   to change.

Multiple OSD specs for a single host
------------------------------------

Here we have two distinct setups:

.. code-block:: none

   20 HDDs
   Vendor: VendorA
   Model: HDD-123-foo
   Size: 4TB

   12 SSDs
   Vendor: VendorB
   Model: MC-55-44-ZX
   Size: 512GB

   2 NVMEs
   Vendor: VendorC
   Model: NVME-QQQQ-987
   Size: 256GB

* 20 HDDs should share 2 SSDs
* 10 SSDs should share 2 NVMes

This can be described with two layouts:

.. code-block:: yaml

   service_type: osd
   service_id: osd_spec_hdd
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       rotational: 1
     db_devices:
       model: MC-55-44-XZ
       limit: 2  # db_slots is actually to be favoured here, but it's not implemented yet
   ---
   service_type: osd
   service_id: osd_spec_ssd
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       model: MC-55-44-XZ
     db_devices:
       vendor: VendorC

This would create the desired layout by using all HDDs as data_devices with
two SSDs assigned as dedicated db/wal devices. The remaining SSDs (10) will
be data_devices that have the 'VendorC' NVMEs assigned as dedicated db/wal
devices.

Multiple hosts with the same disk layout
----------------------------------------

Assuming the cluster has different kinds of hosts, each with a similar disk
layout, it is recommended to apply different OSD specs, each matching only
one set of hosts. Typically you will have a spec for multiple hosts with the
same layout.

The service id serves as the unique key: if a new OSD spec with an already
applied service id is applied, the existing OSD spec will be superseded.
cephadm will then create new OSD daemons based on the new spec definition.
Existing OSD daemons will not be affected. See :ref:`cephadm-osd-declarative`.

Node1-5

.. code-block:: none

   20 HDDs
   Vendor: VendorA
   Model: SSD-123-foo
   Size: 4TB
   2 SSDs
   Vendor: VendorB
   Model: MC-55-44-ZX
   Size: 512GB

Node6-10

.. code-block:: none

   5 NVMEs
   Vendor: VendorA
   Model: SSD-123-foo
   Size: 4TB
   20 SSDs
   Vendor: VendorB
   Model: MC-55-44-ZX
   Size: 512GB

You can use the 'placement' key in the layout to target certain nodes.

.. code-block:: yaml

   service_type: osd
   service_id: disk_layout_a
   placement:
     label: disk_layout_a
   spec:
     data_devices:
       rotational: 1
     db_devices:
       rotational: 0
   ---
   service_type: osd
   service_id: disk_layout_b
   placement:
     label: disk_layout_b
   spec:
     data_devices:
       model: MC-55-44-XZ
     db_devices:
       model: SSD-123-foo

This applies different OSD specs to different hosts depending on the
`placement` key. See :ref:`orchestrator-cli-placement-spec`.

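For the label-based placement above to match any hosts, each host must carry
the corresponding label. A sketch of assigning the label (the hostname is
illustrative):

.. prompt:: bash #

   ceph orch host label add node1 disk_layout_a
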
.. note::

   Assuming each host has a unique disk layout, each OSD
   spec needs to have a different service id.

Dedicated wal + db
------------------

All previous cases co-located the WALs with the DBs. However, it is possible
to deploy the WAL on a dedicated device as well, if it makes sense.

.. code-block:: none

   20 HDDs
   Vendor: VendorA
   Model: SSD-123-foo
   Size: 4TB

   2 SSDs
   Vendor: VendorB
   Model: MC-55-44-ZX
   Size: 512GB

   2 NVMEs
   Vendor: VendorC
   Model: NVME-QQQQ-987
   Size: 256GB

The OSD spec for this case would look like the following (using the `model`
filter):

.. code-block:: yaml

   service_type: osd
   service_id: osd_spec_default
   placement:
     host_pattern: '*'
   spec:
     data_devices:
       model: MC-55-44-XZ
     db_devices:
       model: SSD-123-foo
     wal_devices:
       model: NVME-QQQQ-987

It is also possible to specify device paths directly for specific hosts, like
the following:

.. code-block:: yaml

   service_type: osd
   service_id: osd_using_paths
   placement:
     hosts:
       - Node01
       - Node02
   spec:
     data_devices:
       paths:
         - /dev/sdb
     db_devices:
       paths:
         - /dev/sdc
     wal_devices:
       paths:
         - /dev/sdd

This can easily be done with other filters, like `size` or `vendor`, as well.

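For instance, a sketch of the same host-targeted placement using filters
instead of explicit paths (the filter values are illustrative):

.. code-block:: yaml

   service_type: osd
   service_id: osd_using_filters   # hypothetical name
   placement:
     hosts:
       - Node01
       - Node02
   spec:
     data_devices:
       size: '2TB:'
     db_devices:
       vendor: VendorC
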
It's possible to specify the `crush_device_class` parameter within the
DriveGroup spec, and it is applied to all the devices defined by the `paths`
keyword:

.. code-block:: yaml

   service_type: osd
   service_id: osd_using_paths
   placement:
     hosts:
       - Node01
       - Node02
   crush_device_class: ssd
   spec:
     data_devices:
       paths:
         - /dev/sdb
         - /dev/sdc
     db_devices:
       paths:
         - /dev/sdd
     wal_devices:
       paths:
         - /dev/sde

The `crush_device_class` parameter, however, can be defined for each OSD
passed using the `paths` keyword with the following syntax:

.. code-block:: yaml

   service_type: osd
   service_id: osd_using_paths
   placement:
     hosts:
       - Node01
       - Node02
   crush_device_class: ssd
   spec:
     data_devices:
       paths:
         - path: /dev/sdb
           crush_device_class: ssd
         - path: /dev/sdc
           crush_device_class: nvme
     db_devices:
       paths:
         - /dev/sdd
     wal_devices:
       paths:
         - /dev/sde

.. _cephadm-osd-activate:

Activate existing OSDs
======================

If the OS of a host was reinstalled, existing OSDs need to be activated
again. For this use case, cephadm provides a wrapper for
:ref:`ceph-volume-lvm-activate` that activates all existing OSDs on a host.

.. prompt:: bash #

   ceph cephadm osd activate <host>...

This will scan all existing disks for OSDs and deploy corresponding daemons.

Further Reading
===============

* :ref:`ceph-volume`
* :ref:`rados-index`