.. _orchestrator-cli-module:

================
Orchestrator CLI
================

This module provides a command line interface (CLI) to orchestrator
modules (ceph-mgr modules which interface with external orchestration services).

As the orchestrator CLI unifies different external orchestrators, a common nomenclature
for the orchestrator module is needed.

+--------------------------------------+---------------------------------------+
| *host*                               | hostname (not DNS name) of the        |
|                                      | physical host. Not the podname,       |
|                                      | container name, or hostname inside    |
|                                      | the container.                        |
+--------------------------------------+---------------------------------------+
| *service type*                       | The type of the service. e.g., nfs,   |
|                                      | mds, osd, mon, rgw, mgr, iscsi        |
+--------------------------------------+---------------------------------------+
| *service*                            | A logical service, typically          |
|                                      | comprised of multiple service         |
|                                      | instances on multiple hosts for HA    |
|                                      |                                       |
|                                      | * ``fs_name`` for mds type            |
|                                      | * ``rgw_zone`` for rgw type           |
|                                      | * ``ganesha_cluster_id`` for nfs type |
+--------------------------------------+---------------------------------------+
| *daemon*                             | A single instance of a service.       |
|                                      | Usually a daemon, but maybe not       |
|                                      | (e.g., might be a kernel service      |
|                                      | like LIO or knfsd or whatever)        |
|                                      |                                       |
|                                      | This identifier should                |
|                                      | uniquely identify the instance        |
+--------------------------------------+---------------------------------------+

The relation between the names is the following:

* A *service* has a specific *service type*
* A *daemon* is a physical instance of a *service type*

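
For example, using the nomenclature above (an illustrative sketch; the file system
name is an assumption)::

    service type:  mds
    service:       the group of MDS daemons serving the CephFS "myfs" (fs_name = myfs)
    daemon:        a single MDS instance of that service running on a particular host
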

.. note::

    Orchestrator modules may only implement a subset of the commands listed below.
    Also, the implementation of the commands is orchestrator-module dependent and will
    differ between implementations.

Status
======

::

    ceph orch status

Show the current orchestrator mode and high-level status (whether the module is able
to talk to it).

Also show any in-progress actions.

Host Management
===============

List hosts associated with the cluster::

    ceph orch host ls

Add and remove hosts::

    ceph orch host add <host>
    ceph orch host rm <host>

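
For example, assuming two additional hosts named ``node2`` and ``node3`` that are
already reachable by the orchestrator backend (the host names are assumptions)::

    ceph orch host add node2
    ceph orch host add node3
    ceph orch host ls
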
OSD Management
==============

List Devices
------------

Print a list of discovered devices, grouped by host and optionally
filtered to a particular host:

::

    ceph orch device ls [--host=...] [--refresh]

Example::

    # ceph orch device ls
    Host 192.168.121.206:
    Device Path    Type   Size    Rotates  Available  Model
    /dev/sdb       hdd    50.0G   True     True       ATA/QEMU HARDDISK
    /dev/sda       hdd    50.0G   True     False      ATA/QEMU HARDDISK

    Host 192.168.121.181:
    Device Path    Type   Size    Rotates  Available  Model
    /dev/sdb       hdd    50.0G   True     True       ATA/QEMU HARDDISK
    /dev/sda       hdd    50.0G   True     False      ATA/QEMU HARDDISK

.. note::
    Output from the Ansible orchestrator

Create OSDs
-----------

Create OSDs on a group of devices on a single host::

    ceph orch osd create <host>:<drive>
    ceph orch osd create -i <path-to-drive-group.json>

The output of ``osd create`` is not specified and may vary between orchestrator backends.

Where ``drive-group.json`` is a JSON file containing the fields defined in
:class:`ceph.deployment_utils.drive_group.DriveGroupSpec`.
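
A minimal drive group file might look like the following sketch, which selects all
available data devices on every matching host. The field names mirror the ``osd``
entry in the multi-document YAML example in the Service Specification section below;
the exact accepted fields depend on your DriveGroupSpec version:

.. code-block:: json

    {
        "service_type": "osd",
        "placement": {
            "host_pattern": "*"
        },
        "data_devices": {
            "all": true
        }
    }
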

Example::

    # ceph orch osd create 192.168.121.206:/dev/sdc
    {"status": "OK", "msg": "", "data": {"event": "playbook_on_stats", "uuid": "7082f3ba-f5b7-4b7c-9477-e74ca918afcb", "stdout": "\r\nPLAY RECAP *********************************************************************\r\n192.168.121.206 : ok=96 changed=3 unreachable=0 failed=0 \r\n", "counter": 932, "pid": 10294, "created": "2019-05-28T22:22:58.527821", "end_line": 1170, "runner_ident": "083cad3c-8197-11e9-b07a-2016b900e38f", "start_line": 1166, "event_data": {"ignored": 0, "skipped": {"192.168.121.206": 186}, "ok": {"192.168.121.206": 96}, "artifact_data": {}, "rescued": 0, "changed": {"192.168.121.206": 3}, "pid": 10294, "dark": {}, "playbook_uuid": "409364a6-9d49-4e44-8b7b-c28e5b3adf89", "playbook": "add-osd.yml", "failures": {}, "processed": {"192.168.121.206": 1}}, "parent_uuid": "409364a6-9d49-4e44-8b7b-c28e5b3adf89"}}

.. note::
    Output from the Ansible orchestrator

Decommission an OSD
-------------------

::

    ceph orch osd rm <osd-id> [osd-id...]

Removes one or more OSDs from the cluster and the host, if the OSDs are marked as
``destroyed``.

Example::

    # ceph orch osd rm 4
    {"status": "OK", "msg": "", "data": {"event": "playbook_on_stats", "uuid": "1a16e631-906d-48e0-9e24-fa7eb593cc0a", "stdout": "\r\nPLAY RECAP *********************************************************************\r\n192.168.121.158 : ok=2 changed=0 unreachable=0 failed=0 \r\n192.168.121.181 : ok=2 changed=0 unreachable=0 failed=0 \r\n192.168.121.206 : ok=2 changed=0 unreachable=0 failed=0 \r\nlocalhost : ok=31 changed=8 unreachable=0 failed=0 \r\n", "counter": 240, "pid": 10948, "created": "2019-05-28T22:26:09.264012", "end_line": 308, "runner_ident": "8c093db0-8197-11e9-b07a-2016b900e38f", "start_line": 301, "event_data": {"ignored": 0, "skipped": {"localhost": 37}, "ok": {"192.168.121.181": 2, "192.168.121.158": 2, "192.168.121.206": 2, "localhost": 31}, "artifact_data": {}, "rescued": 0, "changed": {"localhost": 8}, "pid": 10948, "dark": {}, "playbook_uuid": "a12ec40e-bce9-4bc9-b09e-2d8f76a5be02", "playbook": "shrink-osd.yml", "failures": {}, "processed": {"192.168.121.181": 1, "192.168.121.158": 1, "192.168.121.206": 1, "localhost": 1}}, "parent_uuid": "a12ec40e-bce9-4bc9-b09e-2d8f76a5be02"}}

.. note::
    Output from the Ansible orchestrator

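
To mark an OSD as ``destroyed`` beforehand, the standard Ceph CLI can be used
(a minimal sketch, assuming OSD ``4`` has already been drained of its data)::

    ceph osd out 4
    ceph osd destroy 4 --yes-i-really-mean-it
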
..
    Blink Device Lights
    ^^^^^^^^^^^^^^^^^^^
    ::

        ceph orch device ident-on <dev_id>
        ceph orch device ident-on <dev_name> <host>
        ceph orch device fault-on <dev_id>
        ceph orch device fault-on <dev_name> <host>

        ceph orch device ident-off <dev_id> [--force=true]
        ceph orch device ident-off <dev_id> <host> [--force=true]
        ceph orch device fault-off <dev_id> [--force=true]
        ceph orch device fault-off <dev_id> <host> [--force=true]

    where ``dev_id`` is the device id as listed in ``osd metadata``,
    ``dev_name`` is the name of the device on the system and ``host`` is the host as
    returned by ``orchestrator host ls``

        ceph orch osd ident-on {primary,journal,db,wal,all} <osd-id>
        ceph orch osd ident-off {primary,journal,db,wal,all} <osd-id>
        ceph orch osd fault-on {primary,journal,db,wal,all} <osd-id>
        ceph orch osd fault-off {primary,journal,db,wal,all} <osd-id>

    Where ``journal`` is the filestore journal, ``wal`` is the write ahead log of
    bluestore and ``all`` stands for all devices associated with the osd


Monitor and manager management
==============================

Creates or removes MONs or MGRs from the cluster. The orchestrator may return an
error if it doesn't know how to perform this transition.

Update the number of monitor hosts::

    ceph orch apply mon <num> [host, host:network...]

Each host can optionally specify a network for the monitor to listen on.

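
For example, to place three monitors on specific hosts, binding one of them to a
particular network (a sketch following the syntax above; the host names and the
network are assumptions)::

    ceph orch apply mon 3 host1:10.1.0.0/24 host2 host3
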
Update the number of manager hosts::

    ceph orch apply mgr <num> [host...]

..
    .. note::

        The host lists are the new full list of mon/mgr hosts

.. note::

    Specifying hosts is optional for some orchestrator modules
    and mandatory for others (e.g. Ansible).

Service Status
==============

Print a list of services known to the orchestrator. The list can be limited to
services on a particular host with the optional ``--host`` parameter, and/or to
services of a particular type via the optional ``--svc_type`` parameter
(mon, osd, mgr, mds, rgw):

::

    ceph orch ps
    ceph orch service ls [--host host] [--svc_type type] [--refresh]

Discover the status of a particular service or of its daemons::

    ceph orch service ls --svc_type type --svc_id <name> [--refresh]

Query the status of a particular service instance (mon, osd, mds, rgw). For OSDs
the id is the numeric OSD ID; for MDS services it is the file system name::

    ceph orch daemon status <type> <instance-name> [--refresh]

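
For example, following the conventions above (the daemon names are assumptions)::

    ceph orch daemon status osd 0
    ceph orch daemon status mds myfs
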
.. _orchestrator-cli-cephfs:

Deploying CephFS
================

In order to set up a :term:`CephFS`, execute::

    ceph fs volume create <fs_name> <placement spec>

Where ``fs_name`` is the name of the CephFS and ``placement`` is a
:ref:`orchestrator-cli-placement-spec`.

This command will create the required Ceph pools, create the new
CephFS, and deploy mds servers.

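
For example, to create a CephFS called ``myfs`` whose MDS daemons may run on any of
three hosts (a sketch using the placement syntax described below; the names are
assumptions)::

    ceph fs volume create myfs "3 host1 host2 host3"
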
Stateless services (MDS/RGW/NFS/rbd-mirror/iSCSI)
=================================================

The orchestrator is not responsible for configuring the services. Please refer to the
corresponding documentation for details.

The ``name`` parameter is an identifier of the group of instances:

* a CephFS file system for a group of MDS daemons,
* a zone name for a group of RGWs

Sizing: the ``size`` parameter gives the number of daemons in the cluster
(e.g. the number of MDS daemons for a particular CephFS file system).

Creating/growing/shrinking/removing services::

    ceph orch {mds,rgw} update <name> <size> [host…]
    ceph orch {mds,rgw} add <name>
    ceph orch nfs update <name> <size> [host…]
    ceph orch nfs add <name> <pool> [--namespace=<namespace>]
    ceph orch {mds,rgw,nfs} rm <name>

e.g., ``ceph orch mds update myfs 3 host1 host2 host3``

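
Similarly, an NFS service could be created and later removed like this (a sketch
following the commands above; the name, pool, and namespace are assumptions)::

    ceph orch nfs add mynfs nfs-ganesha --namespace=nfs-ns
    ceph orch nfs rm mynfs
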
Start/stop/reload::

    ceph orch service {stop,start,reload} <type> <name>

    ceph orch daemon {start,stop,reload} <type> <daemon-id>

.. _orchestrator-cli-service-spec:

Service Specification
=====================

A *Service Specification* is a data structure, often represented as YAML,
that specifies the deployment of services. For example:

.. code-block:: yaml

    service_type: rgw
    service_id: realm.zone
    placement:
      hosts:
        - host1
        - host2
        - host3
    spec: ...

Where the properties of a service specification are the following:

* ``service_type`` is the type of the service. Needs to be either a Ceph
  service (``mon``, ``crash``, ``mds``, ``mgr``, ``osd`` or
  ``rbd-mirror``), a gateway (``nfs`` or ``rgw``), or part of the
  monitoring stack (``alertmanager``, ``grafana``, ``node-exporter`` or
  ``prometheus``).
* ``service_id`` is the name of the service. Omit the ``service_id`` for
  service types that do not require one (see below).
* ``placement`` is a :ref:`orchestrator-cli-placement-spec`.
* ``spec``: additional specifications for a specific service.

Each service type can have different requirements for the spec.

Service specifications of type ``mon``, ``mgr``, and the monitoring
types do not require a ``service_id``.

A service of type ``nfs`` requires a pool name and may contain
an optional namespace:

.. code-block:: yaml

    service_type: nfs
    service_id: mynfs
    placement:
      hosts:
        - host1
        - host2
    spec:
      pool: mypool
      namespace: mynamespace

Where ``pool`` is a RADOS pool where NFS client recovery data is stored
and ``namespace`` is a RADOS namespace where NFS client recovery
data is stored in the pool.

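
If the referenced pool does not exist yet, it can be created with the standard RADOS
commands (a sketch, assuming default pool settings are acceptable)::

    ceph osd pool create mypool
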
A service of type ``osd`` is described in detail in :ref:`drivegroups`.

Many service specifications can then be applied at once using
``ceph orch apply -i`` by submitting a multi-document YAML file::

    cat <<EOF | ceph orch apply -i -
    service_type: mon
    placement:
      host_pattern: "mon*"
    ---
    service_type: mgr
    placement:
      host_pattern: "mgr*"
    ---
    service_type: osd
    placement:
      host_pattern: "osd*"
    data_devices:
      all: true
    EOF

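
The same specifications can also be kept in a file and passed to ``ceph orch apply -i``
directly (the file name is an assumption)::

    ceph orch apply -i cluster-specs.yaml
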
.. _orchestrator-cli-placement-spec:

Placement Specification
=======================

In order to allow the orchestrator to deploy a *service*, it needs to
know how many *daemons* to deploy and where to deploy them. The orchestrator
defines a placement specification that can be passed either as a command line
argument or as part of a service specification in YAML.

Explicit placements
-------------------

Daemons can be explicitly placed on hosts by simply specifying them::

    orch apply prometheus "host1 host2 host3"

Or in yaml:

.. code-block:: yaml

    service_type: prometheus
    placement:
      hosts:
        - host1
        - host2
        - host3

MONs and other services may require some enhanced network specifications::

    orch daemon add mon myhost:[v2:1.2.3.4:3000,v1:1.2.3.4:6789]=name

Where ``[v2:1.2.3.4:3000,v1:1.2.3.4:6789]`` is the network address of the monitor
and ``=name`` specifies the name of the new monitor.

Placement by labels
-------------------

Daemons can be explicitly placed on hosts that match a specific label::

    orch apply prometheus label:mylabel

Or in yaml:

.. code-block:: yaml

    service_type: prometheus
    placement:
      label: "mylabel"

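
With the cephadm backend, a label can typically be attached to a host beforehand
(a sketch; the host and label names are assumptions)::

    ceph orch host label add host1 mylabel
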

Placement by pattern matching
-----------------------------

Daemons can also be placed on hosts matching a host pattern::

    orch apply prometheus 'myhost[1-3]'

Or in yaml:

.. code-block:: yaml

    service_type: prometheus
    placement:
      host_pattern: "myhost[1-3]"

To place a service on *all* hosts, use ``"*"``::

    orch apply crash '*'

Or in yaml:

.. code-block:: yaml

    service_type: node-exporter
    placement:
      host_pattern: "*"

Setting a limit
---------------

By specifying ``count``, only that number of daemons will be created::

    orch apply prometheus 3

To deploy *daemons* on a subset of hosts, also specify the count::

    orch apply prometheus "2 host1 host2 host3"

If the count is bigger than the number of hosts, cephadm deploys only as many
daemons as there are hosts; for example, the following deploys only two daemons::

    orch apply prometheus "3 host1 host2"

Or in yaml:

.. code-block:: yaml

    service_type: prometheus
    placement:
      count: 3

Or with hosts:

.. code-block:: yaml

    service_type: prometheus
    placement:
      count: 2
      hosts:
        - host1
        - host2
        - host3

Configuring the Orchestrator CLI
================================

To enable the orchestrator, select the orchestrator module to use
with the ``set backend`` command::

    ceph orch set backend <module>

For example, to enable the Rook orchestrator module and use it with the CLI::

    ceph mgr module enable rook
    ceph orch set backend rook

Check that the backend is properly configured::

    ceph orch status

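
Similarly, to use the cephadm orchestrator listed in the implementation table below::

    ceph mgr module enable cephadm
    ceph orch set backend cephadm
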
Disable the Orchestrator
------------------------

To disable the orchestrator, use the empty string ``""``::

    ceph orch set backend ""
    ceph mgr module disable rook

Current Implementation Status
=============================

This is an overview of the current implementation status of the orchestrators.

=================================== ====== =========
 Command                             Rook   Cephadm
=================================== ====== =========
 apply iscsi                         ⚪      ⚪
 apply mds                           ✔      ✔
 apply mgr                           ⚪      ✔
 apply mon                           ✔      ✔
 apply nfs                           ✔      ✔
 apply osd                           ✔      ✔
 apply rbd-mirror                    ✔      ✔
 apply rgw                           ⚪      ✔
 host add                            ⚪      ✔
 host ls                             ✔      ✔
 host rm                             ⚪      ✔
 daemon status                       ⚪      ✔
 daemon {stop,start,...}             ⚪      ✔
 device {ident,fault}-{on,off}       ⚪      ✔
 device ls                           ✔      ✔
 iscsi add                           ⚪      ⚪
 mds add                             ✔      ✔
 nfs add                             ✔      ✔
 ps                                  ✔      ✔
 rbd-mirror add                      ⚪      ✔
 rgw add                             ✔      ✔
=================================== ====== =========

where

* ⚪ = not yet implemented
* ❌ = not applicable
* ✔ = implemented