.. _orchestrator-modules:

.. py:currentmodule:: orchestrator

ceph-mgr orchestrator modules
=============================

.. warning::

   This is developer documentation, describing Ceph internals that
   are only relevant to people writing ceph-mgr orchestrator modules.

In this context, *orchestrator* refers to some external service that
provides the ability to discover devices and create Ceph services. This
includes external projects such as Rook.

An *orchestrator module* is a ceph-mgr module (:ref:`mgr-module-dev`)
which implements common management operations using a particular
orchestrator.

Orchestrator modules subclass the ``Orchestrator`` class: this class is
an interface that only provides method definitions to be implemented
by subclasses. The purpose of defining this common interface
for different orchestrators is to enable common UI code, such as
the dashboard, to work with various backends.


.. graphviz::

   digraph G {
       subgraph cluster_1 {
           volumes [label="mgr/volumes"]
           rook [label="mgr/rook"]
           dashboard [label="mgr/dashboard"]
           orchestrator_cli [label="mgr/orchestrator"]
           orchestrator [label="Orchestrator Interface"]
           cephadm [label="mgr/cephadm"]

           label = "ceph-mgr";
       }

       volumes -> orchestrator
       dashboard -> orchestrator
       orchestrator_cli -> orchestrator
       orchestrator -> rook -> rook_io
       orchestrator -> cephadm


       rook_io [label="Rook"]

       rankdir="TB";
   }

Behind all the abstraction, the purpose of orchestrator modules is simple:
enable Ceph to do things like discover available hardware, create and
destroy OSDs, and run MDS and RGW services.

A tutorial is not included here: for full and concrete examples, see
the existing implemented orchestrator modules in the Ceph source tree.

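Even so, the overall shape of a backend is easy to sketch. The skeleton below
is purely illustrative: the helper methods ``_query_backend_for_hosts`` and
``_wrap_in_completion`` are hypothetical stand-ins for backend-specific logic
and for completion handling, which differ between releases and backends.

.. code-block:: python

    from typing import List

    from mgr_module import MgrModule
    import orchestrator


    class MyOrchestrator(orchestrator.Orchestrator, MgrModule):
        """Hypothetical backend implementing the Orchestrator interface."""

        def get_hosts(self):
            # Ask the external orchestration system which hosts it manages
            # (backend-specific, not shown), then wrap the answer in the
            # completion type this release expects.
            hosts: List[orchestrator.HostSpec] = self._query_backend_for_hosts()
            return self._wrap_in_completion(hosts)

        def _query_backend_for_hosts(self):
            # Hypothetical helper: talk to the external system's API.
            raise NotImplementedError

        def _wrap_in_completion(self, result):
            # Hypothetical helper: completion handling differs between
            # releases and backends; see "Completions and batching" below.
            raise NotImplementedError
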
Glossary
--------

Stateful service
  a daemon that uses local storage, such as an OSD or a mon.

Stateless service
  a daemon that doesn't use any local storage, such
  as an MDS, RGW, nfs-ganesha, or iSCSI gateway.

Label
  arbitrary string tags that may be applied by administrators
  to hosts. Typically administrators use labels to indicate
  which hosts should run which kinds of service. Labels are
  advisory (from human input) and do not guarantee that hosts
  have particular physical capabilities.

Drive group
  collection of block devices with common/shared OSD
  formatting (typically one or more SSDs acting as
  journals/dbs for a group of HDDs).

Placement
  choice of which host is used to run a service.

Key Concepts
------------

The underlying orchestrator remains the source of truth for information
about whether a service is running, what is running where, which
hosts are available, etc. Orchestrator modules should avoid taking
any internal copies of this information, and read it directly from
the orchestrator backend as much as possible.

Bootstrapping hosts and adding them to the underlying orchestration
system is outside the scope of Ceph's orchestrator interface. Ceph
can only work on hosts when the orchestrator is already aware of them.

Where possible, placement of stateless services should be left up to the
orchestrator.

Completions and batching
------------------------

All methods that read or modify the state of the system can potentially
be long running. Therefore the module needs to schedule those operations.

Each orchestrator module implements its own underlying mechanisms
for completions. This might involve running the underlying operations
in threads, or batching the operations up before later executing
in one go in the background. If implementing such a batching pattern, the
module would do no work on any operation until it appeared in a list
of completions passed into *process*.

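One possible shape for such a mechanism is sketched below: each operation is
recorded in a small completion-like object and only executed when a background
thread (or the batching *process* step) gets to it. The ``_Completion`` class
here is a hypothetical illustration of the pattern, not the actual
``Completion`` type used by the orchestrator interface.

.. code-block:: python

    import threading
    from typing import Any, Callable, List, Optional


    class _Completion:
        """Hypothetical wrapper recording one deferred operation."""

        def __init__(self, fn: Callable[[], Any]) -> None:
            self._fn = fn
            self._done = threading.Event()
            self.result: Optional[Any] = None
            self.exception: Optional[Exception] = None

        def run(self) -> None:
            # Execute the deferred operation once and record the outcome.
            try:
                self.result = self._fn()
            except Exception as e:
                self.exception = e
            finally:
                self._done.set()

        def wait(self, timeout: Optional[float] = None) -> bool:
            return self._done.wait(timeout)


    def process(completions: List[_Completion]) -> None:
        # Batching pattern: no work happens on any operation until its
        # completion shows up in the list passed to process().
        for completion in completions:
            completion.run()
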
Error Handling
--------------

The main goal of error handling within orchestrator modules is to provide debug information to
assist users when dealing with deployment errors.

.. autoclass:: OrchestratorError
.. autoclass:: NoOrchestrator
.. autoclass:: OrchestratorValidationError


In detail, orchestrators need to explicitly deal with different kinds of errors
(a short sketch follows the list):

1. No orchestrator configured

   See :class:`NoOrchestrator`.

2. An orchestrator doesn't implement a specific method.

   For example, an Orchestrator doesn't support ``add_host``.

   In this case, a ``NotImplementedError`` is raised.

3. Missing features within implemented methods.

   E.g. optional parameters to a command that are not supported by the
   backend (e.g. the ``hosts`` field of the :func:`Orchestrator.apply_mon` command with the Rook backend).

   See :class:`OrchestratorValidationError`.

4. Input validation errors

   The ``orchestrator`` module and other calling modules are supposed to
   provide meaningful error messages.

   See :class:`OrchestratorValidationError`.

5. Errors when actually executing commands

   The resulting Completion should contain an error string that assists in understanding the
   problem. In addition, :func:`Completion.is_errored` is set to ``True``.

6. Invalid configuration in the orchestrator modules

   This can be tackled similarly to case 5.

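The sketch below shows how cases 2 to 4 might surface in a backend module.
Only the exception classes come from the ``orchestrator`` interface; the
method bodies and the ``_schedule_*`` helpers are hypothetical.

.. code-block:: python

    import orchestrator


    class MyOrchestrator(orchestrator.Orchestrator):
        """Hypothetical backend used to illustrate error cases 2-4."""

        # Case 2: any interface method that is *not* overridden here (for
        # example ``update_host_addr``) falls through to the interface
        # default, which raises ``NotImplementedError``.

        def apply_mon(self, spec):
            # Case 3: the method exists, but an optional field is not
            # supported by this backend.
            if spec.placement.hosts:
                raise orchestrator.OrchestratorValidationError(
                    "this backend cannot place mons on explicitly named hosts")
            return self._schedule_apply(spec)  # hypothetical helper

        def add_host(self, host_spec):
            # Case 4: reject invalid input with a meaningful message.
            if not host_spec.hostname:
                raise orchestrator.OrchestratorValidationError("hostname is required")
            return self._schedule_add_host(host_spec)  # hypothetical helper

        def _schedule_apply(self, spec):
            raise NotImplementedError  # backend-specific, not shown

        def _schedule_add_host(self, host_spec):
            raise NotImplementedError  # backend-specific, not shown
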
All other errors are unexpected orchestrator issues and thus should raise an exception that is then
logged into the mgr log file. If there is a completion object at that point,
:func:`Completion.result` may contain an error message.


Excluded functionality
----------------------

- Ceph's orchestrator interface is not a general-purpose framework for
  managing Linux servers -- it is deliberately constrained to manage
  the Ceph cluster's services only.
- Multipathed storage is not handled (multipathing is unnecessary for
  Ceph clusters). Each drive is assumed to be visible only on
  a single host.

Host management
---------------

.. automethod:: Orchestrator.add_host
.. automethod:: Orchestrator.remove_host
.. automethod:: Orchestrator.get_hosts
.. automethod:: Orchestrator.update_host_addr
.. automethod:: Orchestrator.add_host_label
.. automethod:: Orchestrator.remove_host_label

.. autoclass:: HostSpec

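For example, a caller holding a reference to the active backend might register
a host roughly as follows. The field values are illustrative, and the
completion returned by :meth:`Orchestrator.add_host` still has to be awaited
in whatever way the release in use expects.

.. code-block:: python

    import orchestrator


    def register_host(orch: "orchestrator.Orchestrator"):
        # Illustrative values: a host, its address and some advisory labels.
        spec = orchestrator.HostSpec(hostname='node-01',
                                     addr='10.0.0.1',
                                     labels=['mon', 'osd'])
        # add_host returns a completion; awaiting it is backend/release
        # specific (see "Completions and batching" above).
        return orch.add_host(spec)
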
Devices
-------

.. automethod:: Orchestrator.get_inventory
.. autoclass:: InventoryFilter

.. py:currentmodule:: ceph.deployment.inventory

.. autoclass:: Devices
   :members:

.. autoclass:: Device
   :members:

.. py:currentmodule:: orchestrator

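As a hedged sketch, a caller could scan a resolved
:meth:`Orchestrator.get_inventory` result for usable drives like this. The
per-host inventory objects are assumed to expose a host ``name`` and a
``devices`` collection (the ``Devices``/``Device`` classes above); verify the
exact attribute names against the release in use.

.. code-block:: python

    def report_available_devices(inventory) -> None:
        # 'inventory' is assumed to be the resolved result of
        # Orchestrator.get_inventory(): a list of per-host inventory objects,
        # each carrying a host name and a Devices collection.
        for host in inventory:
            for device in host.devices.devices:
                # 'available' and 'path' are attributes of Device (see above).
                if device.available:
                    print(f"{host.name}: {device.path}")
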
Placement
---------

A :ref:`orchestrator-cli-placement-spec` defines the placement of
daemons of a specific service.

In general, stateless services do not require any specific placement
rules, as they can run anywhere that sufficient system resources
are available. However, some orchestrators may not include the
functionality to choose a location in this way. Optionally, you can
specify a location when creating a stateless service.


.. py:currentmodule:: ceph.deployment.service_spec

.. autoclass:: PlacementSpec
   :members:

.. py:currentmodule:: orchestrator

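For instance, a placement can name hosts explicitly, request a daemon count,
or select hosts by label; the keyword arguments below correspond to the
members documented above.

.. code-block:: python

    from ceph.deployment.service_spec import PlacementSpec

    # Run on two explicitly named hosts.
    on_named_hosts = PlacementSpec(hosts=['host1', 'host2'])

    # Let the orchestrator pick any three suitable hosts.
    any_three_hosts = PlacementSpec(count=3)

    # Run on every host carrying the 'mon' label.
    on_labelled_hosts = PlacementSpec(label='mon')
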
Services
--------

.. autoclass:: ServiceDescription

.. py:currentmodule:: ceph.deployment.service_spec

.. autoclass:: ServiceSpec
   :members:
   :private-members:
   :noindex:

.. py:currentmodule:: orchestrator

.. automethod:: Orchestrator.describe_service

.. automethod:: Orchestrator.service_action
.. automethod:: Orchestrator.remove_service

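A small example of describing a service declaratively; the ``service_type``
and ``service_id`` values are illustrative.

.. code-block:: python

    from ceph.deployment.service_spec import PlacementSpec, ServiceSpec

    # Desired state: an MDS service with id 'cephfs' and two daemons, placed
    # wherever the orchestrator sees fit.
    spec = ServiceSpec(service_type='mds',
                       service_id='cephfs',
                       placement=PlacementSpec(count=2))

    # The generated service name is what describe_service, service_action
    # and remove_service take as their service identifier.
    print(spec.service_name())  # "mds.cephfs"
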
Daemons
-------

.. automethod:: Orchestrator.list_daemons
.. automethod:: Orchestrator.remove_daemons
.. automethod:: Orchestrator.daemon_action

OSD management
--------------

.. automethod:: Orchestrator.create_osds

.. automethod:: Orchestrator.blink_device_light
.. autoclass:: DeviceLightLoc

.. _orchestrator-osd-replace:

OSD Replacement
^^^^^^^^^^^^^^^

See :ref:`rados-replacing-an-osd` for the underlying process.

Replacing OSDs is fundamentally a two-stage process, as users need to
physically replace drives. The orchestrator therefore exposes this two-stage process.

Phase one is a call to :meth:`Orchestrator.remove_daemons` with ``destroy=True`` in order to mark
the OSD as destroyed.


Phase two is a call to :meth:`Orchestrator.create_osds` with a Drive Group with

.. py:currentmodule:: ceph.deployment.drive_group

:attr:`DriveGroupSpec.osd_id_claims` set to the destroyed OSD ids.

.. py:currentmodule:: orchestrator

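A hedged sketch of such a phase-two spec follows: a drive group that recreates
OSDs on one host and claims the previously destroyed OSD ids. The exact shape
of ``osd_id_claims`` (assumed here to map host names to lists of OSD id
strings) and of ``DeviceSelection`` should be checked against the drive group
documentation for the release in use.

.. code-block:: python

    from ceph.deployment.drive_group import DeviceSelection, DriveGroupSpec
    from ceph.deployment.service_spec import PlacementSpec

    # Recreate OSDs on 'node-01', claiming the previously destroyed ids 2 and 3.
    dg = DriveGroupSpec(
        service_id='replacement_osds',
        placement=PlacementSpec(hosts=['node-01']),
        data_devices=DeviceSelection(paths=['/dev/sdb', '/dev/sdc']),
        osd_id_claims={'node-01': ['2', '3']},
    )

    # Phase two is then a call such as: orch.create_osds(dg)
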
Services
--------

.. automethod:: Orchestrator.add_daemon
.. automethod:: Orchestrator.apply_mon
.. automethod:: Orchestrator.apply_mgr
.. automethod:: Orchestrator.apply_mds
.. automethod:: Orchestrator.apply_rbd_mirror

.. py:currentmodule:: ceph.deployment.service_spec

.. autoclass:: RGWSpec
   :noindex:

.. py:currentmodule:: orchestrator

.. automethod:: Orchestrator.apply_rgw

.. py:currentmodule:: ceph.deployment.service_spec

.. autoclass:: NFSServiceSpec

.. py:currentmodule:: orchestrator

.. automethod:: Orchestrator.apply_nfs

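As an illustration, a caller might declare an RGW service and hand it to the
backend as follows. The ``rgw_realm``/``rgw_zone`` fields refer to the
``RGWSpec`` documented above, but treat the exact field set as
release-dependent; completion handling is elided.

.. code-block:: python

    from ceph.deployment.service_spec import PlacementSpec, RGWSpec

    import orchestrator


    def deploy_rgw(orch: "orchestrator.Orchestrator"):
        # Two RGW daemons for the given realm/zone, placed on hosts that
        # carry the 'rgw' label.  Field values are illustrative.
        spec = RGWSpec(service_id='myrgw',
                       rgw_realm='myrealm',
                       rgw_zone='myzone',
                       placement=PlacementSpec(count=2, label='rgw'))
        # apply_rgw declares desired state and returns a completion.
        return orch.apply_rgw(spec)
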
Upgrades
--------

.. automethod:: Orchestrator.upgrade_available
.. automethod:: Orchestrator.upgrade_start
.. automethod:: Orchestrator.upgrade_status
.. autoclass:: UpgradeStatusSpec

Utility
-------

.. automethod:: Orchestrator.available
.. automethod:: Orchestrator.get_feature_set

Client Modules
--------------

.. autoclass:: OrchestratorClientMixin
   :members:

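A rough sketch of a client module: it mixes ``OrchestratorClientMixin`` into
its module class and calls the interface methods on ``self``. The
error-handling helper shown (``raise_if_exception``) and the way the
completion's result is read vary between releases, so treat those details as
assumptions to verify against the in-tree client modules.

.. code-block:: python

    from mgr_module import MgrModule
    import orchestrator


    class MyClientModule(orchestrator.OrchestratorClientMixin, MgrModule):
        """Hypothetical consumer of the orchestrator interface."""

        def list_host_names(self):
            # The mixin forwards this call to whichever orchestrator
            # backend is currently active.
            completion = self.get_hosts()
            # Assumption: a raise_if_exception-style helper turns backend
            # errors into exceptions; exact completion handling varies by
            # release.
            orchestrator.raise_if_exception(completion)
            return [h.hostname for h in completion.result]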