2 .. _orchestrator-cli-module:
3
4 ================
5 Orchestrator CLI
6 ================
7
8 This module provides a command line interface (CLI) to orchestrator
9 modules (ceph-mgr modules which interface with external orchestration services).
10
11 As the orchestrator CLI unifies different external orchestrators, a common nomenclature
12 for the orchestrator module is needed.
13
+--------------------------------------+---------------------------------------+
| *host*                               | hostname (not DNS name) of the        |
|                                      | physical host. Not the podname,       |
|                                      | container name, or hostname inside    |
|                                      | the container.                        |
+--------------------------------------+---------------------------------------+
| *service type*                       | The type of the service, e.g., nfs,   |
|                                      | mds, osd, mon, rgw, mgr, iscsi.       |
+--------------------------------------+---------------------------------------+
| *service*                            | A logical service, typically          |
|                                      | composed of multiple service          |
|                                      | instances on multiple hosts for HA.   |
|                                      |                                       |
|                                      | * ``fs_name`` for mds type            |
|                                      | * ``rgw_zone`` for rgw type           |
|                                      | * ``ganesha_cluster_id`` for nfs type |
+--------------------------------------+---------------------------------------+
| *daemon*                             | A single instance of a service.       |
|                                      | Usually a daemon, but possibly a      |
|                                      | kernel service such as LIO or knfsd.  |
|                                      |                                       |
|                                      | This identifier should uniquely       |
|                                      | identify the instance.                |
+--------------------------------------+---------------------------------------+
39
The relationship between these names is as follows:
41
42 * A *service* has a specific *service type*
43 * A *daemon* is a physical instance of a *service type*
44
45
46 .. note::
47
48 Orchestrator modules may only implement a subset of the commands listed below.
49 Also, the implementation of the commands may differ between modules.
50
51 Status
52 ======
53
54 ::
55
56 ceph orch status
57
58 Show current orchestrator mode and high-level status (whether the orchestrator
59 plugin is available and operational)
60
61 Host Management
62 ===============
63
64 List hosts associated with the cluster::
65
66 ceph orch host ls
67
68 Add and remove hosts::
69
70 ceph orch host add <hostname> [<addr>] [<labels>...]
71 ceph orch host rm <hostname>
72
73 For cephadm, see also :ref:`cephadm-fqdn`.
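
For example (the hostname, address, and labels are illustrative)::

    ceph orch host add node-00 192.168.0.10 example1 example2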
74
75 Host Specification
76 ------------------
77
78 Many hosts can be added at once using
79 ``ceph orch apply -i`` by submitting a multi-document YAML file::
80
81 ---
82 service_type: host
83 addr: node-00
84 hostname: node-00
85 labels:
86 - example1
87 - example2
88 ---
89 service_type: host
90 addr: node-01
91 hostname: node-01
92 labels:
93 - grafana
94 ---
95 service_type: host
96 addr: node-02
97 hostname: node-02
98
This can be combined with service specifications (below) to create a cluster spec file that deploys a whole cluster in one command. See ``cephadm bootstrap --apply-spec`` to do this during bootstrap. Note that the cluster SSH key must be copied to hosts before they are added.
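
Such a file can be applied with (the file name is illustrative)::

    ceph orch apply -i hosts.yaml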
100
101 OSD Management
102 ==============
103
104 List Devices
105 ------------
106
107 Print a list of discovered devices, grouped by host and optionally
108 filtered to a particular host:
109
110 ::
111
112 ceph orch device ls [--host=...] [--refresh]
113
114 Example::
115
116 HOST PATH TYPE SIZE DEVICE AVAIL REJECT REASONS
117 master /dev/vda hdd 42.0G False locked
118 node1 /dev/vda hdd 42.0G False locked
119 node1 /dev/vdb hdd 8192M 387836 False locked, LVM detected, Insufficient space (<5GB) on vgs
120 node1 /dev/vdc hdd 8192M 450575 False locked, LVM detected, Insufficient space (<5GB) on vgs
121 node3 /dev/vda hdd 42.0G False locked
122 node3 /dev/vdb hdd 8192M 395145 False LVM detected, locked, Insufficient space (<5GB) on vgs
123 node3 /dev/vdc hdd 8192M 165562 False LVM detected, locked, Insufficient space (<5GB) on vgs
124 node2 /dev/vda hdd 42.0G False locked
125 node2 /dev/vdb hdd 8192M 672147 False LVM detected, Insufficient space (<5GB) on vgs, locked
126 node2 /dev/vdc hdd 8192M 228094 False LVM detected, Insufficient space (<5GB) on vgs, locked
127
128
129
130
131 Erase Devices (Zap Devices)
132 ---------------------------
133
134 Erase (zap) a device so that it can be reused. ``zap`` calls ``ceph-volume zap`` on the remote host.
135
136 ::
137
    ceph orch device zap <hostname> <path>
139
140 Example command::
141
142 ceph orch device zap my_hostname /dev/sdx
143
.. note::
    The cephadm orchestrator will automatically deploy OSDs on drives that match the DriveGroup in your OSDSpec if the ``unmanaged`` flag is unset.
    For example, if you use the ``all-available-devices`` option when creating OSDs, then when you ``zap`` a device the cephadm orchestrator will automatically create a new OSD on that device.
    To disable this behavior, see :ref:`orchestrator-cli-create-osds`.
148
149 .. _orchestrator-cli-create-osds:
150
151 Create OSDs
152 -----------
153
154 Create OSDs on a set of devices on a single host::
155
156 ceph orch daemon add osd <host>:device1,device2
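
For example (the host and device paths are illustrative)::

    ceph orch daemon add osd node1:/dev/vdb,/dev/vdc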
157
Another way is to use the ``apply`` interface::
159
160 ceph orch apply osd -i <json_file/yaml_file> [--dry-run]
161
where ``json_file/yaml_file`` is a DriveGroup specification.
For a more in-depth guide to DriveGroups, please refer to :ref:`drivegroups`.

``--dry-run`` will cause the orchestrator to present a preview of what will happen
without actually creating the OSDs.
167
168 Example::
169
170 # ceph orch apply osd --all-available-devices --dry-run
171 NAME HOST DATA DB WAL
172 all-available-devices node1 /dev/vdb - -
173 all-available-devices node2 /dev/vdc - -
174 all-available-devices node3 /dev/vdd - -
175
176 When the parameter ``all-available-devices`` or a DriveGroup specification is used, a cephadm service is created.
177 This service guarantees that all available devices or devices included in the DriveGroup will be used for OSDs.
178 Note that the effect of ``--all-available-devices`` is persistent; that is, drives which are added to the system
179 or become available (say, by zapping) after the command is complete will be automatically found and added to the cluster.
180
181 That is, after using::
182
183 ceph orch apply osd --all-available-devices
184
185 * If you add new disks to the cluster they will automatically be used to create new OSDs.
186 * A new OSD will be created automatically if you remove an OSD and clean the LVM physical volume.
187
If you want to avoid this behavior (disable automatic creation of OSDs on available devices), use the ``unmanaged`` parameter::
189
190 ceph orch apply osd --all-available-devices --unmanaged=true
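
The same intent can also be expressed declaratively. A minimal sketch of an OSD
specification with ``unmanaged`` set, following the service and DriveGroup
specification formats described below (the ``service_id`` and placement shown
are illustrative):

.. code-block:: yaml

    service_type: osd
    service_id: all-available-devices
    placement:
      host_pattern: "*"
    data_devices:
      all: true
    unmanaged: true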
191
192 Remove an OSD
193 -------------
194 ::
195
196 ceph orch osd rm <osd_id(s)> [--replace] [--force]
197
198 Evacuates PGs from an OSD and removes it from the cluster.
199
200 Example::
201
202 # ceph orch osd rm 0
203 Scheduled OSD(s) for removal
204
205
206 OSDs that are not safe-to-destroy will be rejected.
207
208 You can query the state of the operation with::
209
210 # ceph orch osd rm status
211 OSD_ID HOST STATE PG_COUNT REPLACE FORCE STARTED_AT
212 2 cephadm-dev done, waiting for purge 0 True False 2020-07-17 13:01:43.147684
213 3 cephadm-dev draining 17 False True 2020-07-17 13:01:45.162158
214 4 cephadm-dev started 42 False True 2020-07-17 13:01:45.162158
215
216
217 When no PGs are left on the OSD, it will be decommissioned and removed from the cluster.
218
219 .. note::
220 After removing an OSD, if you wipe the LVM physical volume in the device used by the removed OSD, a new OSD will be created.
221 Read information about the ``unmanaged`` parameter in :ref:`orchestrator-cli-create-osds`.
222
223 Stopping OSD Removal
224 --------------------
225
226 You can stop the queued OSD removal operation with
227
228 ::
229
    ceph orch osd rm stop <osd_id(s)>
231
232 Example::
233
234 # ceph orch osd rm stop 4
235 Stopped OSD(s) removal
236
This resets the OSD to its initial state and removes it from the removal queue.
238
239
240 Replace an OSD
241 -------------------
242 ::
243
    ceph orch osd rm <osd_id(s)> --replace [--force]
245
246 Example::
247
248 # ceph orch osd rm 4 --replace
249 Scheduled OSD(s) for replacement
250
251
This follows the same procedure as in "Remove an OSD", with the exception that the OSD is
not permanently removed from the CRUSH hierarchy, but is instead assigned a 'destroyed' flag.
254
255 **Preserving the OSD ID**
256
The previously set 'destroyed' flag is used to determine which OSD IDs will be reused in the next OSD deployment.

If you use OSDSpecs for OSD deployment, your newly added disks will be assigned the OSD IDs of their replaced
counterparts, assuming the new disks still match the OSDSpecs.

For assistance in this process you can use the ``--dry-run`` feature.

Tip: The name of your OSDSpec can be retrieved with ``ceph orch ls``.
265
266 Alternatively, you can use your OSDSpec file::
267
268 ceph orch apply osd -i <osd_spec_file> --dry-run
269 NAME HOST DATA DB WAL
270 <name_of_osd_spec> node1 /dev/vdb - -
271
272
If this matches your anticipated behavior, just omit the ``--dry-run`` flag to execute the deployment.
274
275
276 ..
277 Turn On Device Lights
278 ^^^^^^^^^^^^^^^^^^^^^
279 ::
280
281 ceph orch device ident-on <dev_id>
282 ceph orch device ident-on <dev_name> <host>
283 ceph orch device fault-on <dev_id>
284 ceph orch device fault-on <dev_name> <host>
285
286 ceph orch device ident-off <dev_id> [--force=true]
287 ceph orch device ident-off <dev_id> <host> [--force=true]
288 ceph orch device fault-off <dev_id> [--force=true]
289 ceph orch device fault-off <dev_id> <host> [--force=true]
290
291 where ``dev_id`` is the device id as listed in ``osd metadata``,
292 ``dev_name`` is the name of the device on the system and ``host`` is the host as
293 returned by ``orchestrator host ls``
294
295 ceph orch osd ident-on {primary,journal,db,wal,all} <osd-id>
296 ceph orch osd ident-off {primary,journal,db,wal,all} <osd-id>
297 ceph orch osd fault-on {primary,journal,db,wal,all} <osd-id>
298 ceph orch osd fault-off {primary,journal,db,wal,all} <osd-id>
299
300 where ``journal`` is the filestore journal device, ``wal`` is the bluestore
301 write ahead log device, and ``all`` stands for all devices associated with the OSD
302
303
Monitor and Manager Management
==============================

Create or remove MONs or MGRs from the cluster. The orchestrator may return an
error if it doesn't know how to perform this transition.
309
310 Update the number of monitor hosts::
311
312 ceph orch apply mon --placement=<placement> [--dry-run]
313
314 Where ``placement`` is a :ref:`orchestrator-cli-placement-spec`.
315
Each host in the placement specification can optionally specify a network for the monitor to listen on.
317
318 Update the number of manager hosts::
319
320 ceph orch apply mgr --placement=<placement> [--dry-run]
321
322 Where ``placement`` is a :ref:`orchestrator-cli-placement-spec`.
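
For example, to run three managers on specific hosts (the host names are illustrative)::

    ceph orch apply mgr --placement="3 host1 host2 host3"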
323
324 ..
325 .. note::
326
327 The host lists are the new full list of mon/mgr hosts
328
329 .. note::
330
331 specifying hosts is optional for some orchestrator modules
332 and mandatory for others (e.g. Ansible).
333
334
335 Service Status
336 ==============
337
Print a list of services known to the orchestrator. The list can be limited to
services of a particular type with the optional ``--service_type`` parameter
(mon, osd, mgr, mds, rgw) and/or to a particular service with the optional
``--service_name`` parameter:
342
343 ::
344
345 ceph orch ls [--service_type type] [--service_name name] [--export] [--format f] [--refresh]
346
347 Discover the status of a particular service or daemons::
348
349 ceph orch ls --service_type type --service_name <name> [--refresh]
350
Export the service specs known to the orchestrator as YAML, in a format that is
compatible with ``ceph orch apply -i``::
353
354 ceph orch ls --export
355
356
357 Daemon Status
358 =============
359
360 Print a list of all daemons known to the orchestrator::
361
362 ceph orch ps [--hostname host] [--daemon_type type] [--service_name name] [--daemon_id id] [--format f] [--refresh]
363
Query the status of a particular service instance (mon, osd, mds, rgw). For OSDs
the ID is the numeric OSD ID; for MDS services it is the file system name::
366
367 ceph orch ps --daemon_type osd --daemon_id 0
368
369
370 .. _orchestrator-cli-cephfs:
371
372 Deploying CephFS
373 ================
374
375 In order to set up a :term:`CephFS`, execute::
376
377 ceph fs volume create <fs_name> <placement spec>
378
where ``fs_name`` is the name of the CephFS and ``placement spec`` is a
:ref:`orchestrator-cli-placement-spec`.
381
382 This command will create the required Ceph pools, create the new
383 CephFS, and deploy mds servers.
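
For example, to create a CephFS named ``myfs`` with three MDS daemons (the file
system name and hosts are illustrative; the placement is passed as described
above)::

    ceph fs volume create myfs "3 host1 host2 host3"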
384
385
386 .. _orchestrator-cli-stateless-services:
387
388 Stateless services (MDS/RGW/NFS/rbd-mirror/iSCSI)
389 =================================================
390
(Please note: The orchestrator will not configure the services themselves. See the
corresponding documentation for service configuration details.)
393
394 The ``name`` parameter is an identifier of the group of instances:
395
396 * a CephFS file system for a group of MDS daemons,
397 * a zone name for a group of RGWs
398
399 Creating/growing/shrinking/removing services::
400
401 ceph orch apply mds <fs_name> [--placement=<placement>] [--dry-run]
402 ceph orch apply rgw <realm> <zone> [--subcluster=<subcluster>] [--port=<port>] [--ssl] [--placement=<placement>] [--dry-run]
403 ceph orch apply nfs <name> <pool> [--namespace=<namespace>] [--placement=<placement>] [--dry-run]
404 ceph orch rm <service_name> [--force]
405
406 where ``placement`` is a :ref:`orchestrator-cli-placement-spec`.
407
408 e.g., ``ceph orch apply mds myfs --placement="3 host1 host2 host3"``
409
410 Service Commands::
411
412 ceph orch <start|stop|restart|redeploy|reconfig> <service_name>
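
For example, to restart all daemons of an MDS service (the service name is
illustrative; use ``ceph orch ls`` to find the actual service names)::

    ceph orch restart mds.myfs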
413
414 Deploying custom containers
415 ===========================
416
417 The orchestrator enables custom containers to be deployed using a YAML file.
418 A corresponding :ref:`orchestrator-cli-service-spec` must look like:
419
420 .. code-block:: yaml
421
422 service_type: container
423 service_id: foo
424 placement:
425 ...
426 image: docker.io/library/foo:latest
427 entrypoint: /usr/bin/foo
428 uid: 1000
429 gid: 1000
430 args:
431 - "--net=host"
432 - "--cpus=2"
433 ports:
434 - 8080
435 - 8443
436 envs:
437 - SECRET=mypassword
438 - PORT=8080
439 - PUID=1000
440 - PGID=1000
441 volume_mounts:
442 CONFIG_DIR: /etc/foo
443 bind_mounts:
444 - ['type=bind', 'source=lib/modules', 'destination=/lib/modules', 'ro=true']
445 dirs:
446 - CONFIG_DIR
447 files:
448 CONFIG_DIR/foo.conf:
449 - refresh=true
450 - username=xyz
451 - "port: 1234"
452
453 where the properties of a service specification are:
454
455 * ``service_id``
456 A unique name of the service.
457 * ``image``
458 The name of the Docker image.
459 * ``uid``
460 The UID to use when creating directories and files in the host system.
461 * ``gid``
462 The GID to use when creating directories and files in the host system.
463 * ``entrypoint``
464 Overwrite the default ENTRYPOINT of the image.
465 * ``args``
466 A list of additional Podman/Docker command line arguments.
467 * ``ports``
468 A list of TCP ports to open in the host firewall.
469 * ``envs``
470 A list of environment variables.
471 * ``bind_mounts``
472 When you use a bind mount, a file or directory on the host machine
473 is mounted into the container. Relative `source=...` paths will be
474 located below `/var/lib/ceph/<cluster-fsid>/<daemon-name>`.
475 * ``volume_mounts``
476 When you use a volume mount, a new directory is created within
477 Docker’s storage directory on the host machine, and Docker manages
478 that directory’s contents. Relative source paths will be located below
479 `/var/lib/ceph/<cluster-fsid>/<daemon-name>`.
480 * ``dirs``
481 A list of directories that are created below
482 `/var/lib/ceph/<cluster-fsid>/<daemon-name>`.
483 * ``files``
484 A dictionary, where the key is the relative path of the file and the
485 value the file content. The content must be double quoted when using
  a string. Use '\\n' for line breaks in that case. Otherwise define
  multi-line content as a list of strings. The given files will be created
488 below the directory `/var/lib/ceph/<cluster-fsid>/<daemon-name>`.
489 The absolute path of the directory where the file will be created must
490 exist. Use the `dirs` property to create them if necessary.
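
A specification like the one above can then be applied with (the file name is
illustrative)::

    ceph orch apply -i container-foo.yaml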
491
492 .. _orchestrator-cli-service-spec:
493
494 Service Specification
495 =====================
496
497 A *Service Specification* is a data structure represented as YAML
498 to specify the deployment of services. For example:
499
500 .. code-block:: yaml
501
502 service_type: rgw
503 service_id: realm.zone
504 placement:
505 hosts:
506 - host1
507 - host2
508 - host3
509 unmanaged: false
510 ...
511
512 where the properties of a service specification are:
513
514 * ``service_type``
515 The type of the service. Needs to be either a Ceph
516 service (``mon``, ``crash``, ``mds``, ``mgr``, ``osd`` or
517 ``rbd-mirror``), a gateway (``nfs`` or ``rgw``), part of the
518 monitoring stack (``alertmanager``, ``grafana``, ``node-exporter`` or
519 ``prometheus``) or (``container``) for custom containers.
520 * ``service_id``
521 The name of the service.
522 * ``placement``
523 See :ref:`orchestrator-cli-placement-spec`.
524 * ``unmanaged``
525 If set to ``true``, the orchestrator will not deploy nor
526 remove any daemon associated with this service. Placement and all other
  properties will be ignored. This is useful if the service should
  temporarily not be managed by the orchestrator.
529
530 Each service type can have additional service specific properties.
531
532 Service specifications of type ``mon``, ``mgr``, and the monitoring
533 types do not require a ``service_id``.
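
For example, a minimal sketch of a ``mon`` specification without a
``service_id`` (the placement count is illustrative):

.. code-block:: yaml

    service_type: mon
    placement:
      count: 3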
534
535 A service of type ``nfs`` requires a pool name and may contain
536 an optional namespace:
537
538 .. code-block:: yaml
539
540 service_type: nfs
541 service_id: mynfs
542 placement:
543 hosts:
544 - host1
545 - host2
546 spec:
547 pool: mypool
548 namespace: mynamespace
549
where ``pool`` is the RADOS pool in which NFS client recovery data is stored
and ``namespace`` is the RADOS namespace within that pool where the recovery
data is stored.
553
A service of type ``osd`` is described in :ref:`drivegroups`.
555
556 Many service specifications can be applied at once using
557 ``ceph orch apply -i`` by submitting a multi-document YAML file::
558
559 cat <<EOF | ceph orch apply -i -
560 service_type: mon
561 placement:
562 host_pattern: "mon*"
563 ---
564 service_type: mgr
565 placement:
566 host_pattern: "mgr*"
567 ---
568 service_type: osd
569 service_id: default_drive_group
570 placement:
571 host_pattern: "osd*"
572 data_devices:
573 all: true
574 EOF
575
576 .. _orchestrator-cli-placement-spec:
577
578 Placement Specification
579 =======================
580
581 For the orchestrator to deploy a *service*, it needs to know where to deploy
582 *daemons*, and how many to deploy. This is the role of a placement
specification. Placement specifications can either be passed as command line
arguments or in a YAML file.
585
586 Explicit placements
587 -------------------
588
589 Daemons can be explicitly placed on hosts by simply specifying them::
590
591 orch apply prometheus --placement="host1 host2 host3"
592
593 Or in YAML:
594
595 .. code-block:: yaml
596
597 service_type: prometheus
598 placement:
599 hosts:
600 - host1
601 - host2
602 - host3
603
604 MONs and other services may require some enhanced network specifications::
605
606 orch daemon add mon --placement="myhost:[v2:1.2.3.4:3000,v1:1.2.3.4:6789]=name"
607
608 where ``[v2:1.2.3.4:3000,v1:1.2.3.4:6789]`` is the network address of the monitor
609 and ``=name`` specifies the name of the new monitor.
610
611 Placement by labels
612 -------------------
613
Daemons can be explicitly placed on hosts that match a specific label::
615
616 orch apply prometheus --placement="label:mylabel"
617
618 Or in YAML:
619
620 .. code-block:: yaml
621
622 service_type: prometheus
623 placement:
624 label: "mylabel"
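
Labels can be assigned to hosts when they are added (see `Host Management`_
above) or, if your orchestrator backend supports it, with a command like the
following (the host name and label are illustrative)::

    ceph orch host label add host1 mylabel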
625
626
627 Placement by pattern matching
628 -----------------------------
629
Daemons can also be placed on hosts whose names match a pattern::
631
632 orch apply prometheus --placement='myhost[1-3]'
633
634 Or in YAML:
635
636 .. code-block:: yaml
637
638 service_type: prometheus
639 placement:
640 host_pattern: "myhost[1-3]"
641
642 To place a service on *all* hosts, use ``"*"``::
643
644 orch apply crash --placement='*'
645
646 Or in YAML:
647
648 .. code-block:: yaml
649
650 service_type: node-exporter
651 placement:
652 host_pattern: "*"
653
654
655 Setting a limit
656 ---------------
657
658 By specifying ``count``, only that number of daemons will be created::
659
660 orch apply prometheus --placement=3
661
662 To deploy *daemons* on a subset of hosts, also specify the count::
663
664 orch apply prometheus --placement="2 host1 host2 host3"
665
If the count is greater than the number of hosts, cephadm deploys one daemon per host::
667
668 orch apply prometheus --placement="3 host1 host2"
669
670 results in two Prometheus daemons.
671
672 Or in YAML:
673
674 .. code-block:: yaml
675
676 service_type: prometheus
677 placement:
678 count: 3
679
680 Or with hosts:
681
682 .. code-block:: yaml
683
684 service_type: prometheus
685 placement:
686 count: 2
687 hosts:
688 - host1
689 - host2
690 - host3
691
692 Updating Service Specifications
693 ===============================
694
695 The Ceph Orchestrator maintains a declarative state of each
696 service in a ``ServiceSpec``. For certain operations, like updating
697 the RGW HTTP port, we need to update the existing
698 specification.
699
700 1. List the current ``ServiceSpec``::
701
702 ceph orch ls --service_name=<service-name> --export > myservice.yaml
703
704 2. Update the yaml file::
705
706 vi myservice.yaml
707
708 3. Apply the new ``ServiceSpec``::
709
710 ceph orch apply -i myservice.yaml [--dry-run]
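
For example, to update the specification of an RGW service (the service name is
illustrative; use ``ceph orch ls`` to find the actual name)::

    ceph orch ls --service_name=rgw.myrealm.myzone --export > rgw.yaml
    vi rgw.yaml
    ceph orch apply -i rgw.yaml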
711
712 Configuring the Orchestrator CLI
713 ================================
714
715 To enable the orchestrator, select the orchestrator module to use
716 with the ``set backend`` command::
717
718 ceph orch set backend <module>
719
720 For example, to enable the Rook orchestrator module and use it with the CLI::
721
722 ceph mgr module enable rook
723 ceph orch set backend rook
724
Check that the backend is properly configured::
726
727 ceph orch status
728
729 Disable the Orchestrator
730 ------------------------
731
732 To disable the orchestrator, use the empty string ``""``::
733
734 ceph orch set backend ""
735 ceph mgr module disable rook
736
737 Current Implementation Status
738 =============================
739
740 This is an overview of the current implementation status of the orchestrators.
741
742 =================================== ====== =========
743 Command Rook Cephadm
744 =================================== ====== =========
745 apply iscsi ⚪ ✔
746 apply mds ✔ ✔
747 apply mgr ⚪ ✔
748 apply mon ✔ ✔
749 apply nfs ✔ ✔
750 apply osd ✔ ✔
751 apply rbd-mirror ✔ ✔
752 apply rgw ⚪ ✔
753 apply container ⚪ ✔
754 host add ⚪ ✔
755 host ls ✔ ✔
756 host rm ⚪ ✔
757 daemon status ⚪ ✔
758 daemon {stop,start,...} ⚪ ✔
device {ident,fault}-{on,off}       ⚪      ✔
760 device ls ✔ ✔
761 iscsi add ⚪ ✔
762 mds add ⚪ ✔
763 nfs add ✔ ✔
764 rbd-mirror add ⚪ ✔
765 rgw add ⚪ ✔
766 ps ✔ ✔
767 =================================== ====== =========
768
769 where
770
771 * ⚪ = not yet implemented
772 * ❌ = not applicable
773 * ✔ = implemented