***********
OSD Service
***********
.. _device management: ../rados/operations/devices
.. _libstoragemgmt: https://github.com/libstorage/libstoragemgmt

List Devices
============

``ceph-volume`` scans each host in the cluster from time to time in order
to determine which devices are present and whether they are eligible to be
used as OSDs.

To print a list of devices discovered by ``cephadm``, run this command:

.. prompt:: bash #

    ceph orch device ls [--hostname=...] [--wide] [--refresh]

Example
::

    Hostname  Path      Type  Serial        Size   Health   Ident  Fault  Available
    srv-01    /dev/sdb  hdd   15P0A0YFFRD6  300G   Unknown  N/A    N/A    No
    srv-01    /dev/sdc  hdd   15R0A08WFRD6  300G   Unknown  N/A    N/A    No
    srv-01    /dev/sdd  hdd   15R0A07DFRD6  300G   Unknown  N/A    N/A    No
    srv-01    /dev/sde  hdd   15P0A0QDFRD6  300G   Unknown  N/A    N/A    No
    srv-02    /dev/sdb  hdd   15R0A033FRD6  300G   Unknown  N/A    N/A    No
    srv-02    /dev/sdc  hdd   15R0A05XFRD6  300G   Unknown  N/A    N/A    No
    srv-02    /dev/sde  hdd   15R0A0ANFRD6  300G   Unknown  N/A    N/A    No
    srv-02    /dev/sdf  hdd   15R0A06EFRD6  300G   Unknown  N/A    N/A    No
    srv-03    /dev/sdb  hdd   15R0A0OGFRD6  300G   Unknown  N/A    N/A    No
    srv-03    /dev/sdc  hdd   15R0A0P7FRD6  300G   Unknown  N/A    N/A    No
    srv-03    /dev/sdd  hdd   15R0A0O7FRD6  300G   Unknown  N/A    N/A    No

Using the ``--wide`` option provides all details relating to the device,
including any reasons that the device might not be eligible for use as an OSD.

In the above example you can see fields named "Health", "Ident", and "Fault".
This information is provided by integration with `libstoragemgmt`_. By default,
this integration is disabled (because `libstoragemgmt`_ may not be 100%
compatible with your hardware). To make ``cephadm`` include these fields,
enable cephadm's "enhanced device scan" option as follows:

.. prompt:: bash #

    ceph config set mgr mgr/cephadm/device_enhanced_scan true

.. warning::
    Although the libstoragemgmt library performs standard SCSI inquiry calls,
    there is no guarantee that your firmware fully implements these standards.
    This can lead to erratic behaviour and even bus resets on some older
    hardware. It is therefore recommended that, before enabling this feature,
    you test your hardware's compatibility with libstoragemgmt first to avoid
    unplanned interruptions to services.

    There are a number of ways to test compatibility, but the simplest may be
    to use the cephadm shell to call libstoragemgmt directly - ``cephadm shell
    lsmcli ldl``. If your hardware is supported you should see something like
    this:

    ::

        Path     | SCSI VPD 0x83    | Link Type | Serial Number | Health Status
        ------------------------------------------------------------------------
        /dev/sda | 50000396082ba631 | SAS       | 15P0A0R0FRD6  | Good
        /dev/sdb | 50000396082bbbf9 | SAS       | 15P0A0YFFRD6  | Good


After you have enabled libstoragemgmt support, the output will look something
like this:

::

    # ceph orch device ls
    Hostname  Path      Type  Serial        Size   Health   Ident  Fault  Available
    srv-01    /dev/sdb  hdd   15P0A0YFFRD6  300G   Good     Off    Off    No
    srv-01    /dev/sdc  hdd   15R0A08WFRD6  300G   Good     Off    Off    No
    :

In this example, libstoragemgmt has confirmed the health of the drives and the ability to
interact with the Identification and Fault LEDs on the drive enclosures. For further
information about interacting with these LEDs, refer to `device management`_.

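The LED control itself is done with the ``ceph device light`` command covered in
`device management`_. As a rough sketch (see that page for the authoritative
syntax and for how device ids are named; ``<devid>`` below is only a
placeholder), toggling a drive's identification LED looks something like this:

.. prompt:: bash #

    ceph device light on <devid> ident
    ceph device light off <devid> ident
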
.. note::
    The current release of `libstoragemgmt`_ (1.8.8) supports SCSI, SAS, and SATA based
    local disks only. There is no official support for NVMe devices (PCIe).

.. _cephadm-deploy-osds:

Deploy OSDs
===========

Listing Storage Devices
-----------------------

In order to deploy an OSD, there must be a storage device that is *available* on
which the OSD will be deployed.

Run this command to display an inventory of storage devices on all cluster hosts:

.. prompt:: bash #

    ceph orch device ls

A storage device is considered *available* if all of the following
conditions are met:

* The device must have no partitions.
* The device must not have any LVM state.
* The device must not be mounted.
* The device must not contain a file system.
* The device must not contain a Ceph BlueStore OSD.
* The device must be larger than 5 GB.

Ceph will not provision an OSD on a device that is not available.
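
To see why a particular device was not classified as *available*, include the
rejection reasons in the listing by adding the ``--wide`` flag shown earlier:

.. prompt:: bash #

    ceph orch device ls --wide --refresh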

Creating New OSDs
-----------------

There are a few ways to create new OSDs:

* Tell Ceph to consume any available and unused storage device:

  .. prompt:: bash #

      ceph orch apply osd --all-available-devices

* Create an OSD from a specific device on a specific host:

  .. prompt:: bash #

      ceph orch daemon add osd *<host>*:*<device-path>*

  For example:

  .. prompt:: bash #

      ceph orch daemon add osd host1:/dev/sdb

  Advanced OSD creation from specific devices on a specific host:

  .. prompt:: bash #

      ceph orch daemon add osd host1:data_devices=/dev/sda,/dev/sdb,db_devices=/dev/sdc,osds_per_device=2

* You can use :ref:`drivegroups` to categorize device(s) based on their
  properties. This might be useful in forming a clearer picture of which
  devices are available to consume. Properties include device type (SSD or
  HDD), device model names, size, and the hosts on which the devices exist:

  .. prompt:: bash #

      ceph orch apply -i spec.yml

Dry Run
-------

The ``--dry-run`` flag causes the orchestrator to present a preview of what
will happen without actually creating the OSDs.

For example:

.. prompt:: bash #

    ceph orch apply osd --all-available-devices --dry-run

::

    NAME                   HOST   DATA      DB   WAL
    all-available-devices  node1  /dev/vdb  -    -
    all-available-devices  node2  /dev/vdc  -    -
    all-available-devices  node3  /dev/vdd  -    -

.. _cephadm-osd-declarative:

Declarative State
-----------------

The effect of ``ceph orch apply`` is persistent. This means that drives that
are added to the system after the ``ceph orch apply`` command completes will be
automatically found and added to the cluster. It also means that drives that
become available (by zapping, for example) after the ``ceph orch apply``
command completes will be automatically found and added to the cluster.

We will examine the effects of the following command:

.. prompt:: bash #

    ceph orch apply osd --all-available-devices

After running the above command:

* If you add new disks to the cluster, they will automatically be used to
  create new OSDs.
* If you remove an OSD and clean the LVM physical volume, a new OSD will be
  created automatically.

If you want to avoid this behavior (that is, disable the automatic creation of
OSDs on available devices), use the ``unmanaged`` parameter:

.. prompt:: bash #

    ceph orch apply osd --all-available-devices --unmanaged=true

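If you manage OSDs with a spec file instead of the ``--all-available-devices``
shortcut, the same effect can be achieved by setting ``unmanaged: true`` at the
top level of the spec and re-applying it. A minimal sketch (the service id and
the filter are only placeholders):

.. code-block:: yaml

    service_type: osd
    service_id: example_drive_group   # hypothetical name
    unmanaged: true                   # stop cephadm from creating new OSDs for this spec
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        all: true
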
.. note::

    Keep these three facts in mind:

    - The default behavior of ``ceph orch apply`` causes cephadm to reconcile
      constantly. This means that cephadm creates OSDs as soon as new drives
      are detected.

    - Setting ``unmanaged: True`` disables the creation of OSDs. If
      ``unmanaged: True`` is set, nothing will happen even if you apply a new
      OSD service.

    - ``ceph orch daemon add`` creates OSDs, but does not add an OSD service.

* For cephadm, see also :ref:`cephadm-spec-unmanaged`.

.. _cephadm-osd-removal:

Remove an OSD
=============

Removing an OSD from a cluster involves two steps:

#. evacuating all placement groups (PGs) from the OSD
#. removing the PG-free OSD from the cluster

The following command performs these two steps:

.. prompt:: bash #

    ceph orch osd rm <osd_id(s)> [--replace] [--force]

Example:

.. prompt:: bash #

    ceph orch osd rm 0

Expected output::

    Scheduled OSD(s) for removal

OSDs that are not safe to destroy will be rejected.

.. note::
    After removing OSDs, if the drives the OSDs were deployed on once again
    become available, cephadm may automatically try to deploy more OSDs
    on these drives if they match an existing drivegroup spec. If you deployed
    the OSDs you are removing with a spec and don't want any new OSDs deployed on
    the drives after removal, it's best to modify the drivegroup spec before removal.
    Either set ``unmanaged: true`` to stop it from picking up new drives at all,
    or modify it in some way that it no longer matches the drives used for the
    OSDs you wish to remove. Then re-apply the spec. For more info on drivegroup
    specs see :ref:`drivegroups`. For more info on the declarative nature of
    cephadm in reference to deploying OSDs, see :ref:`cephadm-osd-declarative`.

Monitoring OSD State
--------------------

You can query the state of the OSD removal operation with the following command:

.. prompt:: bash #

    ceph orch osd rm status

Expected output::

    OSD_ID  HOST         STATE                    PG_COUNT  REPLACE  FORCE  STARTED_AT
    2       cephadm-dev  done, waiting for purge  0         True     False  2020-07-17 13:01:43.147684
    3       cephadm-dev  draining                 17        False    True   2020-07-17 13:01:45.162158
    4       cephadm-dev  started                  42        False    True   2020-07-17 13:01:45.162158


When no PGs are left on the OSD, it will be decommissioned and removed from the cluster.

.. note::
    After removing an OSD, if you wipe the LVM physical volume in the device used by the removed OSD, a new OSD will be created.
    For more information on this, read about the ``unmanaged`` parameter in :ref:`cephadm-osd-declarative`.

Stopping OSD Removal
--------------------

It is possible to stop queued OSD removals by using the following command:

.. prompt:: bash #

    ceph orch osd rm stop <osd_id(s)>

Example:

.. prompt:: bash #

    ceph orch osd rm stop 4

Expected output::

    Stopped OSD(s) removal

This resets the initial state of the OSD and takes it off the removal queue.

.. _cephadm-replacing-an-osd:

Replacing an OSD
----------------

.. prompt:: bash #

    ceph orch osd rm <osd_id(s)> --replace [--force]

Example:

.. prompt:: bash #

    ceph orch osd rm 4 --replace

Expected output::

    Scheduled OSD(s) for replacement

This follows the same procedure as the one in the "Remove an OSD" section, with
one exception: the OSD is not permanently removed from the CRUSH hierarchy, but
is instead assigned a 'destroyed' flag.

.. note::
    The new OSD that will replace the removed OSD must be created on the same
    host as the OSD that was removed.

**Preserving the OSD ID**

The 'destroyed' flag is used to determine which OSD ids will be reused in the
next OSD deployment.

If you use OSDSpecs for OSD deployment, your newly added disks will be assigned
the OSD ids of their replaced counterparts. This assumes that the new disks
still match the OSDSpecs.

Use the ``--dry-run`` flag to make certain that the ``ceph orch apply osd``
command does what you want it to. The ``--dry-run`` flag shows you what the
outcome of the command will be without making the changes you specify. When
you are satisfied that the command will do what you want, run the command
without the ``--dry-run`` flag.

.. tip::

    The name of your OSDSpec can be retrieved with the command ``ceph orch ls``.

Alternatively, you can use your OSDSpec file:

.. prompt:: bash #

    ceph orch apply -i <osd_spec_file> --dry-run

Expected output::

    NAME                HOST   DATA      DB   WAL
    <name_of_osd_spec>  node1  /dev/vdb  -    -


When this output reflects your intention, omit the ``--dry-run`` flag to
execute the deployment.


Erasing Devices (Zapping Devices)
---------------------------------

Erase (zap) a device so that it can be reused. ``zap`` calls ``ceph-volume
zap`` on the remote host.

.. prompt:: bash #

    ceph orch device zap <hostname> <path>

Example command:

.. prompt:: bash #

    ceph orch device zap my_hostname /dev/sdx

.. note::
    If the unmanaged flag is unset, cephadm automatically deploys drives that
    match the OSDSpec. For example, if you use the
    ``all-available-devices`` option when creating OSDs, when you ``zap`` a
    device the cephadm orchestrator automatically creates a new OSD in the
    device. To disable this behavior, see :ref:`cephadm-osd-declarative`.

.. _osd_autotune:

Automatically tuning OSD memory
===============================

OSD daemons will adjust their memory consumption based on the
``osd_memory_target`` config option (several gigabytes, by
default). If Ceph is deployed on dedicated nodes that are not sharing
memory with other services, cephadm can automatically adjust the per-OSD
memory consumption based on the total amount of RAM and the number of deployed
OSDs.

.. warning:: Cephadm sets ``osd_memory_target_autotune`` to ``true`` by default, which is unsuitable for hyperconverged infrastructures.

Cephadm will start with a fraction
(``mgr/cephadm/autotune_memory_target_ratio``, which defaults to
``.7``) of the total RAM in the system, subtract any memory
consumed by non-autotuned daemons (non-OSD daemons, and OSDs for which
``osd_memory_target_autotune`` is false), and then divide the result by the
number of remaining OSDs.
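
As a worked example (with made-up numbers): on a host with 128 GB of RAM,
1 GB consumed by non-autotuned daemons, and 10 OSDs subject to autotuning,
each OSD would receive an ``osd_memory_target`` of roughly
``(128 GB * 0.7 - 1 GB) / 10 ≈ 8.9 GB``.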

The final targets are reflected in the config database with options like::

    WHO   MASK      LEVEL   OPTION              VALUE
    osd   host:foo  basic   osd_memory_target   126092301926
    osd   host:bar  basic   osd_memory_target   6442450944

Both the limits and the current memory consumed by each daemon are visible from
the ``ceph orch ps`` output in the ``MEM LIMIT`` column::

    NAME   HOST  PORTS  STATUS         REFRESHED  AGE  MEM USED  MEM LIMIT  VERSION                IMAGE ID      CONTAINER ID
    osd.1  dael         running (3h)   10s ago    3h   72857k    117.4G     17.0.0-3781-gafaed750  7015fda3cd67  9e183363d39c
    osd.2  dael         running (81m)  10s ago    81m  63989k    117.4G     17.0.0-3781-gafaed750  7015fda3cd67  1f0cc479b051
    osd.3  dael         running (62m)  10s ago    62m  64071k    117.4G     17.0.0-3781-gafaed750  7015fda3cd67  ac5537492f27

To exclude an OSD from memory autotuning, disable the autotune option
for that OSD and also set a specific memory target. For example:

.. prompt:: bash #

    ceph config set osd.123 osd_memory_target_autotune false
    ceph config set osd.123 osd_memory_target 16G

.. _drivegroups:

Advanced OSD Service Specifications
===================================

:ref:`orchestrator-cli-service-spec`\s of type ``osd`` are a way to describe a
cluster layout, using the properties of disks. Service specifications give the
user an abstract way to tell Ceph which disks should turn into OSDs with which
configurations, without knowing the specifics of device names and paths.

Service specifications make it possible to define a yaml or json file that can
be used to reduce the amount of manual work involved in creating OSDs.

For example, instead of running the following command:

.. prompt:: bash [monitor.1]#

    ceph orch daemon add osd *<host>*:*<path-to-device>*

for each device and each host, we can define a yaml or json file that allows us
to describe the layout. Here's the most basic example.

Create a file called (for example) ``osd_spec.yml``:

.. code-block:: yaml

    service_type: osd
    service_id: default_drive_group  # custom name of the osd spec
    placement:
      host_pattern: '*'              # which hosts to target
    spec:
      data_devices:                  # the type of devices you are applying specs to
        all: true                    # a filter, check below for a full list

This means:

#. Turn any available device (ceph-volume decides what 'available' is) into an
   OSD on all hosts that match the glob pattern '*'. (The glob pattern matches
   against the registered hosts from `host ls`.) A more detailed section on
   host_pattern is available below.

#. Then pass it to `osd create` like this:

   .. prompt:: bash [monitor.1]#

       ceph orch apply -i /path/to/osd_spec.yml

This instruction will be issued to all the matching hosts, and will deploy
these OSDs.

Setups more complex than the one specified by the ``all`` filter are
possible. See :ref:`osd_filters` for details.

A ``--dry-run`` flag can be passed to the ``apply osd`` command to display a
synopsis of the proposed layout.

Example

.. prompt:: bash [monitor.1]#

    ceph orch apply -i /path/to/osd_spec.yml --dry-run



.. _osd_filters:

Filters
-------

.. note::
    Filters are applied using an `AND` gate by default. This means that a drive
    must fulfill all filter criteria in order to get selected. This behavior can
    be adjusted by setting ``filter_logic: OR`` in the OSD specification.

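For example, the following sketch (the rotational and size criteria are only
placeholders) would select a drive for ``data_devices`` if it is *either*
rotational *or* at least 2TB in size, instead of requiring both:

.. code-block:: yaml

    service_type: osd
    service_id: osd_spec_or_example   # hypothetical name
    placement:
      host_pattern: '*'
    spec:
      filter_logic: OR
      data_devices:
        rotational: 1
        size: '2TB:'
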
Filters are used to assign disks to groups, using their attributes to group
them.

The attributes are based on ceph-volume's disk query. You can retrieve
information about the attributes with this command:

.. code-block:: bash

    ceph-volume inventory </path/to/disk>

Vendor or Model
^^^^^^^^^^^^^^^

Specific disks can be targeted by vendor or model:

.. code-block:: yaml

    model: disk_model_name

or

.. code-block:: yaml

    vendor: disk_vendor_name

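These filter keys go under a device selector such as ``data_devices`` or
``db_devices``. A minimal sketch (the vendor and model strings are only
placeholders) that combines both filters:

.. code-block:: yaml

    data_devices:
      vendor: disk_vendor_name
      model: disk_model_name
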
Size
^^^^

Specific disks can be targeted by `Size`:

.. code-block:: yaml

    size: size_spec

Size specs
__________

Size specifications can be of the following forms:

* LOW:HIGH
* :HIGH
* LOW:
* EXACT

Concrete examples:

To include disks of an exact size:

.. code-block:: yaml

    size: '10G'

To include disks within a given range of size:

.. code-block:: yaml

    size: '10G:40G'

To include disks that are less than or equal to 10G in size:

.. code-block:: yaml

    size: ':10G'

To include disks equal to or greater than 40G in size:

.. code-block:: yaml

    size: '40G:'

Sizes don't have to be specified exclusively in Gigabytes (G).

Other units of size are supported: Megabyte (M), Gigabyte (G) and Terabyte (T).
Appending (B) for byte is also supported: ``MB``, ``GB``, ``TB``.

Rotational
^^^^^^^^^^

This operates on the 'rotational' attribute of the disk.

.. code-block:: yaml

    rotational: 0 | 1

`1` to match all disks that are rotational

`0` to match all disks that are non-rotational (SSD, NVME, etc.)


All
^^^

This will take all disks that are 'available'.

.. note:: This filter is exclusive to the data_devices section.

.. code-block:: yaml

    all: true


Limiter
^^^^^^^

If you have specified some valid filters but want to limit the number of disks that they match, use the ``limit`` directive:

.. code-block:: yaml

    limit: 2

For example, if you used `vendor` to match all disks that are from `VendorA`
but want to use only the first two, you could use `limit`:

.. code-block:: yaml

    data_devices:
      vendor: VendorA
      limit: 2

.. note:: `limit` is a last resort and shouldn't be used if it can be avoided.


Additional Options
------------------

There are multiple optional settings you can use to change the way OSDs are deployed.
You can add these options to the base level of an OSD spec for them to take effect.

This example would deploy all OSDs with encryption enabled.

.. code-block:: yaml

    service_type: osd
    service_id: example_osd_spec
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        all: true
      encrypted: true

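Other settings listed in ``DriveGroupSpec`` below can be set the same way. As a
sketch (assuming the ``osds_per_device`` setting suits your devices), the
following would create two OSDs on each selected data device:

.. code-block:: yaml

    service_type: osd
    service_id: example_osd_spec_multi   # hypothetical name
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        all: true
      osds_per_device: 2
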
See the full list of supported settings in the ``DriveGroupSpec`` reference below.

.. py:currentmodule:: ceph.deployment.drive_group

.. autoclass:: DriveGroupSpec
   :members:
   :exclude-members: from_json

Examples
========

The simple case
---------------

All nodes with the same setup:

.. code-block:: none

    20 HDDs
    Vendor: VendorA
    Model: HDD-123-foo
    Size: 4TB

    2 SSDs
    Vendor: VendorB
    Model: MC-55-44-ZX
    Size: 512GB

This is a common setup and can be described quite easily:

.. code-block:: yaml

    service_type: osd
    service_id: osd_spec_default
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        model: HDD-123-foo   # Note, HDD-123 would also be valid
      db_devices:
        model: MC-55-44-ZX   # Same here, MC-55-44 is valid

However, we can improve it by reducing the filters on core properties of the drives:

.. code-block:: yaml

    service_type: osd
    service_id: osd_spec_default
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        rotational: 1
      db_devices:
        rotational: 0

Now all rotating devices are declared as 'data devices', and all non-rotating devices will be used as shared devices (wal, db).

If you know that drives larger than 2TB will always be the slower data devices, you can also filter by size:

.. code-block:: yaml

    service_type: osd
    service_id: osd_spec_default
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        size: '2TB:'
      db_devices:
        size: ':2TB'

.. note:: All of the above OSD specs are equally valid. Which of those you want to use depends on taste and on how much you expect your node layout to change.

Multiple OSD specs for a single host
------------------------------------

Here we have two distinct setups:

.. code-block:: none

    20 HDDs
    Vendor: VendorA
    Model: HDD-123-foo
    Size: 4TB

    12 SSDs
    Vendor: VendorB
    Model: MC-55-44-ZX
    Size: 512GB

    2 NVMEs
    Vendor: VendorC
    Model: NVME-QQQQ-987
    Size: 256GB


* 20 HDDs should share 2 SSDs
* 10 SSDs should share 2 NVMes

This can be described with two layouts.

.. code-block:: yaml

    service_type: osd
    service_id: osd_spec_hdd
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        rotational: 1
      db_devices:
        model: MC-55-44-ZX
        limit: 2   # db_slots is actually to be favoured here, but it's not implemented yet
    ---
    service_type: osd
    service_id: osd_spec_ssd
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        model: MC-55-44-ZX
      db_devices:
        vendor: VendorC

This would create the desired layout by using all HDDs as data_devices with two SSDs assigned as dedicated db/wal devices.
The remaining SSDs (10) will be data_devices that have the 'VendorC' NVMEs assigned as dedicated db/wal devices.

Multiple hosts with the same disk layout
----------------------------------------

If the cluster has different kinds of hosts, each kind with a similar disk
layout, it is recommended to apply a separate OSD spec to each kind, with each
spec matching only one set of hosts. Typically you will have a spec for
multiple hosts with the same layout.

The service id is the unique key: if a new OSD spec with an already applied
service id is applied, the existing OSD spec will be superseded, and cephadm
will create new OSD daemons based on the new spec definition. Existing OSD
daemons will not be affected. See :ref:`cephadm-osd-declarative`.

Node1-5

.. code-block:: none

    20 HDDs
    Vendor: Intel
    Model: SSD-123-foo
    Size: 4TB
    2 SSDs
    Vendor: VendorA
    Model: MC-55-44-ZX
    Size: 512GB

Node6-10

.. code-block:: none

    5 NVMEs
    Vendor: Intel
    Model: SSD-123-foo
    Size: 4TB
    20 SSDs
    Vendor: VendorA
    Model: MC-55-44-ZX
    Size: 512GB

You can use the 'placement' key in the layout to target certain nodes.

.. code-block:: yaml

    service_type: osd
    service_id: disk_layout_a
    placement:
      label: disk_layout_a
    spec:
      data_devices:
        rotational: 1
      db_devices:
        rotational: 0
    ---
    service_type: osd
    service_id: disk_layout_b
    placement:
      label: disk_layout_b
    spec:
      data_devices:
        model: MC-55-44-ZX
      db_devices:
        model: SSD-123-foo

This applies different OSD specs to different hosts, depending on the `placement` key.
See :ref:`orchestrator-cli-placement-spec`.

.. note::

    Assuming each host has a unique disk layout, each OSD
    spec needs to have a different service id.


Dedicated wal + db
------------------

All previous cases co-located the WALs with the DBs.
It is, however, possible to deploy the WAL on a dedicated device as well, if it makes sense.

.. code-block:: none

    20 HDDs
    Vendor: VendorA
    Model: SSD-123-foo
    Size: 4TB

    2 SSDs
    Vendor: VendorB
    Model: MC-55-44-ZX
    Size: 512GB

    2 NVMEs
    Vendor: VendorC
    Model: NVME-QQQQ-987
    Size: 256GB


The OSD spec for this case would look like the following (using the `model` filter):

.. code-block:: yaml

    service_type: osd
    service_id: osd_spec_default
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        model: SSD-123-foo
      db_devices:
        model: MC-55-44-ZX
      wal_devices:
        model: NVME-QQQQ-987


It is also possible to specify device paths directly, for specific hosts, like the following:

.. code-block:: yaml

    service_type: osd
    service_id: osd_using_paths
    placement:
      hosts:
        - Node01
        - Node02
    spec:
      data_devices:
        paths:
          - /dev/sdb
      db_devices:
        paths:
          - /dev/sdc
      wal_devices:
        paths:
          - /dev/sdd


This can easily be done with other filters, like `size` or `vendor` as well.

It's possible to specify the `crush_device_class` parameter within the
DriveGroup spec, and it's applied to all the devices defined by the `paths`
keyword:

.. code-block:: yaml

    service_type: osd
    service_id: osd_using_paths
    placement:
      hosts:
        - Node01
        - Node02
    crush_device_class: ssd
    spec:
      data_devices:
        paths:
          - /dev/sdb
          - /dev/sdc
      db_devices:
        paths:
          - /dev/sdd
      wal_devices:
        paths:
          - /dev/sde

The `crush_device_class` parameter, however, can be defined for each OSD passed
using the `paths` keyword with the following syntax:

.. code-block:: yaml

    service_type: osd
    service_id: osd_using_paths
    placement:
      hosts:
        - Node01
        - Node02
    crush_device_class: ssd
    spec:
      data_devices:
        paths:
          - path: /dev/sdb
            crush_device_class: ssd
          - path: /dev/sdc
            crush_device_class: nvme
      db_devices:
        paths:
          - /dev/sdd
      wal_devices:
        paths:
          - /dev/sde

.. _cephadm-osd-activate:

Activate existing OSDs
======================

If the OS of a host has been reinstalled, the existing OSDs that belong to it
need to be activated again. For this use case, cephadm provides a wrapper for
:ref:`ceph-volume-lvm-activate` that activates all existing OSDs on a host.

.. prompt:: bash #

    ceph cephadm osd activate <host>...

This will scan all existing disks for OSDs and deploy corresponding daemons.
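
For example, to reactivate the OSDs of a single reinstalled host (the hostname
is only a placeholder):

.. prompt:: bash #

    ceph cephadm osd activate host1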

Further Reading
===============

* :ref:`ceph-volume`
* :ref:`rados-index`