.. _orchestrator-modules:

.. py:currentmodule:: orchestrator

ceph-mgr orchestrator modules
=============================

.. warning::

    This is developer documentation, describing Ceph internals that
    are only relevant to people writing ceph-mgr orchestrator modules.

In this context, *orchestrator* refers to some external service that
provides the ability to discover devices and create Ceph services. This
includes external projects such as Rook.

An *orchestrator module* is a ceph-mgr module (:ref:`mgr-module-dev`)
which implements common management operations using a particular
orchestrator.

Orchestrator modules subclass the ``Orchestrator`` class. This class is
an interface: it only provides method definitions to be implemented
by subclasses. The purpose of defining this common interface
for different orchestrators is to enable common UI code, such as
the dashboard, to work with various different backends.

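For orientation, a backend skeleton might start out like this (a minimal
sketch, not a complete module; ``MyOrchestrator`` is an invented name):

.. code-block:: python

    from mgr_module import MgrModule
    import orchestrator

    class MyOrchestrator(orchestrator.Orchestrator, MgrModule):
        """Skeleton backend implementing the common interface."""

        def available(self):
            # Report whether the backend is ready to serve requests.
            return True, ""

        def get_hosts(self):
            # A real backend returns a completion that resolves to the
            # list of hosts known to the external orchestrator.
            raise NotImplementedError()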

.. graphviz::

   digraph G {
       subgraph cluster_1 {
           volumes [label="mgr/volumes"]
           rook [label="mgr/rook"]
           dashboard [label="mgr/dashboard"]
           orchestrator_cli [label="mgr/orchestrator"]
           orchestrator [label="Orchestrator Interface"]
           cephadm [label="mgr/cephadm"]

           label = "ceph-mgr";
       }

       volumes -> orchestrator
       dashboard -> orchestrator
       orchestrator_cli -> orchestrator
       orchestrator -> rook -> rook_io
       orchestrator -> cephadm

       rook_io [label="Rook"]

       rankdir="TB";
   }

Behind all the abstraction, the purpose of orchestrator modules is simple:
enable Ceph to do things like discover available hardware, create and
destroy OSDs, and run MDS and RGW services.

A tutorial is not included here: for full and concrete examples, see
the existing implemented orchestrator modules in the Ceph source tree.

Glossary
--------

Stateful service
  a daemon that uses local storage, such as OSD or mon.

Stateless service
  a daemon that doesn't use any local storage, such
  as an MDS, RGW, nfs-ganesha, or iSCSI gateway.

Label
  arbitrary string tags that may be applied by administrators
  to hosts. Typically administrators use labels to indicate
  which hosts should run which kinds of service. Labels are
  advisory (from human input) and do not guarantee that hosts
  have particular physical capabilities.

Drive group
  a collection of block devices with common/shared OSD
  formatting (typically one or more SSDs acting as
  journals/dbs for a group of HDDs).

Placement
  the choice of which host is used to run a service.

Key Concepts
------------

The underlying orchestrator remains the source of truth for information
about whether a service is running, what is running where, which
hosts are available, etc. Orchestrator modules should avoid taking
any internal copies of this information, and read it directly from
the orchestrator backend as much as possible.

Bootstrapping hosts and adding them to the underlying orchestration
system is outside the scope of Ceph's orchestrator interface. Ceph
can only work on hosts when the orchestrator is already aware of them.

Calls to orchestrator modules are all asynchronous, and return *completion*
objects (see below) rather than returning values immediately.

Where possible, placement of stateless services should be left up to the
orchestrator.

Completions and batching
------------------------

All methods that read or modify the state of the system can potentially
be long-running. To handle that, all such methods return a *Completion*
object. Orchestrator modules must implement the *process* method: this
takes a list of completions, and is responsible for checking if they're
finished, and advancing the underlying operations as needed.

Each orchestrator module implements its own underlying mechanisms
for completions. This might involve running the underlying operations
in threads, or batching the operations up before later executing
in one go in the background. If implementing such a batching pattern, the
module would do no work on any operation until it appeared in a list
of completions passed into *process*.

Some operations need to show progress. Those operations need to add
a *ProgressReference* to the completion. At some point, the progress reference
becomes *effective*, meaning that the operation has really happened
(e.g. a service has actually been started).

.. automethod:: Orchestrator.process

.. autoclass:: Completion
   :members:

.. autoclass:: ProgressReference
   :members:

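As an illustration, a backend that batches work might advance its
operations only inside ``process`` (a minimal sketch; ``_advance`` is an
invented helper, not part of the interface):

.. code-block:: python

    import orchestrator

    class MyOrchestrator(orchestrator.Orchestrator):

        def process(self, completions):
            # No work is done for an operation until its completion
            # shows up in this list.
            for c in completions:
                if not c.is_finished:
                    self._advance(c)

        def _advance(self, completion):
            # Talk to the external orchestrator and, once the underlying
            # operation has finished, mark the completion accordingly.
            ...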

Error Handling
--------------

The main goal of error handling within orchestrator modules is to provide debug information to
assist users when dealing with deployment errors.

.. autoclass:: OrchestratorError
.. autoclass:: NoOrchestrator
.. autoclass:: OrchestratorValidationError


In detail, orchestrators need to explicitly deal with different kinds of errors:

1. No orchestrator configured

   See :class:`NoOrchestrator`.

2. An orchestrator doesn't implement a specific method.

   For example, an orchestrator doesn't support ``add_host``.

   In this case, a ``NotImplementedError`` is raised.

3. Missing features within implemented methods.

   E.g. optional parameters to a command that are not supported by the
   backend (e.g. the hosts field in the :func:`Orchestrator.apply_mon` command with the Rook backend).

   See :class:`OrchestratorValidationError` (a sketch follows this list).

4. Input validation errors

   The ``orchestrator`` module and other calling modules are supposed to
   provide meaningful error messages.

   See :class:`OrchestratorValidationError`.

5. Errors when actually executing commands

   The resulting Completion should contain an error string that assists in understanding the
   problem. In addition, :func:`Completion.is_errored` is set to ``True``.

6. Invalid configuration in the orchestrator modules

   This can be tackled similarly to 5.


All other errors are unexpected orchestrator issues and thus should raise exceptions that are then
logged into the mgr log file. If there is a completion object at that point,
:func:`Completion.result` may contain an error message.

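For cases 3 and 4, a backend might validate input explicitly and raise
:class:`OrchestratorValidationError` itself (a hedged sketch; the backend
class and the placement check are invented for illustration):

.. code-block:: python

    from orchestrator import Orchestrator, OrchestratorValidationError

    class MyOrchestrator(Orchestrator):

        def apply_mon(self, spec):
            # This hypothetical backend chooses mon placement itself,
            # so an explicit host list is rejected with a meaningful
            # message instead of failing deep inside the backend.
            if spec.placement.hosts:
                raise OrchestratorValidationError(
                    "host placement is not supported by this backend")
            ...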

Excluded functionality
----------------------

- Ceph's orchestrator interface is not a general purpose framework for
  managing Linux servers -- it is deliberately constrained to manage
  the Ceph cluster's services only.
- Multipathed storage is not handled (multipathing is unnecessary for
  Ceph clusters). Each drive is assumed to be visible only on
  a single host.

Host management
---------------

.. automethod:: Orchestrator.add_host
.. automethod:: Orchestrator.remove_host
.. automethod:: Orchestrator.get_hosts
.. automethod:: Orchestrator.update_host_addr
.. automethod:: Orchestrator.add_host_label
.. automethod:: Orchestrator.remove_host_label

.. autoclass:: HostSpec

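As an illustration, a calling module might add a host roughly like this
(a sketch; the hostname and address are illustrative, and error handling
is omitted):

.. code-block:: python

    import orchestrator

    class MyClient(orchestrator.OrchestratorClientMixin):

        def register_host(self):
            # Describe the host; the completion resolves once the
            # backend has accepted it.
            spec = orchestrator.HostSpec(hostname='node1', addr='192.168.0.10')
            completion = self.add_host(spec)
            self._orchestrator_wait([completion])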

Devices
-------

.. automethod:: Orchestrator.get_inventory
.. autoclass:: InventoryFilter

.. py:currentmodule:: ceph.deployment.inventory

.. autoclass:: Devices
   :members:

.. autoclass:: Device
   :members:

.. py:currentmodule:: orchestrator

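For example, an inventory query can be narrowed to specific hosts with an
:class:`InventoryFilter` (a sketch; the calling-module context is the same
as in the host management example above):

.. code-block:: python

    import orchestrator

    class MyClient(orchestrator.OrchestratorClientMixin):

        def devices_on(self, hostname):
            # Restrict the inventory query to a single host.
            f = orchestrator.InventoryFilter(hosts=[hostname])
            completion = self.get_inventory(host_filter=f)
            self._orchestrator_wait([completion])
            return completion.result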

Placement
---------

A :ref:`orchestrator-cli-placement-spec` defines the placement of
daemons of a specific service.

In general, stateless services do not require any specific placement
rules as they can run anywhere that sufficient system resources
are available. However, some orchestrators may not include the
functionality to choose a location in this way. Optionally, you can
specify a location when creating a stateless service.


.. py:currentmodule:: ceph.deployment.service_spec

.. autoclass:: PlacementSpec
   :members:

.. py:currentmodule:: orchestrator

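For example (a sketch; the label and hostnames are illustrative):

.. code-block:: python

    from ceph.deployment.service_spec import PlacementSpec

    # Run three daemons somewhere on the hosts carrying the "mon" label.
    by_label = PlacementSpec(count=3, label='mon')

    # Alternatively, pin the daemons to explicit hosts.
    by_host = PlacementSpec(hosts=['host1', 'host2'])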

Services
--------

.. autoclass:: ServiceDescription

.. py:currentmodule:: ceph.deployment.service_spec

.. autoclass:: ServiceSpec

.. py:currentmodule:: orchestrator

.. automethod:: Orchestrator.describe_service

.. automethod:: Orchestrator.service_action
.. automethod:: Orchestrator.remove_service

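As an illustration, a spec for three MDS daemons of a hypothetical
filesystem ``myfs`` might look like this (a sketch; names are illustrative):

.. code-block:: python

    from ceph.deployment.service_spec import ServiceSpec, PlacementSpec

    spec = ServiceSpec(
        service_type='mds',
        service_id='myfs',
        placement=PlacementSpec(count=3),
    )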

Daemons
-------

.. automethod:: Orchestrator.list_daemons
.. automethod:: Orchestrator.remove_daemons
.. automethod:: Orchestrator.daemon_action

OSD management
--------------

.. automethod:: Orchestrator.create_osds

.. automethod:: Orchestrator.blink_device_light
.. autoclass:: DeviceLightLoc

.. _orchestrator-osd-replace:

OSD Replacement
^^^^^^^^^^^^^^^

See :ref:`rados-replacing-an-osd` for the underlying process.

Replacing OSDs is fundamentally a two-staged process, as users need to
physically replace drives. The orchestrator therefore exposes this two-staged process.

Phase one is a call to :meth:`Orchestrator.remove_daemons` with ``destroy=True`` in order to mark
the OSD as destroyed.

Phase two is a call to :meth:`Orchestrator.create_osds` with a Drive Group with

.. py:currentmodule:: ceph.deployment.drive_group

:attr:`DriveGroupSpec.osd_id_claims` set to the destroyed OSD ids.

.. py:currentmodule:: orchestrator

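A sketch of phase two, assuming the OSD with ID 3 on host ``node1`` was
destroyed in phase one (the hostname and device path are illustrative):

.. code-block:: python

    from ceph.deployment.drive_group import DriveGroupSpec, DeviceSelection
    from ceph.deployment.service_spec import PlacementSpec

    dg = DriveGroupSpec(
        placement=PlacementSpec(hosts=['node1']),
        data_devices=DeviceSelection(paths=['/dev/sdb']),
        # Re-use the ID of the OSD destroyed in phase one.
        osd_id_claims={'node1': ['3']},
    )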

Monitors
--------

.. automethod:: Orchestrator.add_mon
.. automethod:: Orchestrator.apply_mon

Stateless Services
------------------

.. automethod:: Orchestrator.add_mgr
.. automethod:: Orchestrator.apply_mgr
.. automethod:: Orchestrator.add_mds
.. automethod:: Orchestrator.apply_mds
.. automethod:: Orchestrator.add_rbd_mirror
.. automethod:: Orchestrator.apply_rbd_mirror

.. py:currentmodule:: ceph.deployment.service_spec

.. autoclass:: RGWSpec

.. py:currentmodule:: orchestrator

.. automethod:: Orchestrator.add_rgw
.. automethod:: Orchestrator.apply_rgw

.. py:currentmodule:: ceph.deployment.service_spec

.. autoclass:: NFSServiceSpec

.. py:currentmodule:: orchestrator

.. automethod:: Orchestrator.add_nfs
.. automethod:: Orchestrator.apply_nfs

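For instance, an RGW spec might combine realm/zone information with a
placement (a hedged sketch; the realm and zone names are illustrative):

.. code-block:: python

    from ceph.deployment.service_spec import RGWSpec, PlacementSpec

    spec = RGWSpec(
        rgw_realm='myrealm',
        rgw_zone='myzone',
        placement=PlacementSpec(count=2),
    )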

Upgrades
--------

.. automethod:: Orchestrator.upgrade_available
.. automethod:: Orchestrator.upgrade_start
.. automethod:: Orchestrator.upgrade_status
.. autoclass:: UpgradeStatusSpec

Utility
-------

.. automethod:: Orchestrator.available
.. automethod:: Orchestrator.get_feature_set

Client Modules
--------------

.. autoclass:: OrchestratorClientMixin
   :members:
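
As an illustration, a consuming module might mix in
:class:`OrchestratorClientMixin` and call interface methods directly (a
sketch; error handling is omitted):

.. code-block:: python

    from mgr_module import MgrModule
    import orchestrator

    class MyModule(orchestrator.OrchestratorClientMixin, MgrModule):

        def list_hostnames(self):
            # The mixin forwards this call to the active orchestrator
            # backend and returns a completion object.
            completion = self.get_hosts()
            self._orchestrator_wait([completion])
            return [h.hostname for h in completion.result]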