.. _orchestrator-modules:

.. py:currentmodule:: orchestrator

ceph-mgr orchestrator modules
=============================

.. warning::

    This is developer documentation, describing Ceph internals that
    are only relevant to people writing ceph-mgr orchestrator modules.

In this context, *orchestrator* refers to some external service that
provides the ability to discover devices and create Ceph services. This
includes external projects such as Rook.

An *orchestrator module* is a ceph-mgr module (:ref:`mgr-module-dev`)
which implements common management operations using a particular
orchestrator.

Orchestrator modules subclass the ``Orchestrator`` class: this class is
an interface that only provides method definitions to be implemented
by subclasses. The purpose of defining this common interface
for different orchestrators is to enable common UI code, such as
the dashboard, to work with various different backends.


.. graphviz::

    digraph G {
        subgraph cluster_1 {
            volumes [label="mgr/volumes"]
            rook [label="mgr/rook"]
            dashboard [label="mgr/dashboard"]
            orchestrator_cli [label="mgr/orchestrator"]
            orchestrator [label="Orchestrator Interface"]
            cephadm [label="mgr/cephadm"]

            label = "ceph-mgr";
        }

        volumes -> orchestrator
        dashboard -> orchestrator
        orchestrator_cli -> orchestrator
        orchestrator -> rook -> rook_io
        orchestrator -> cephadm


        rook_io [label="Rook"]

        rankdir="TB";
    }

Behind all the abstraction, the purpose of orchestrator modules is simple:
enable Ceph to do things like discover available hardware, create and
destroy OSDs, and run MDS and RGW services.

A tutorial is not included here: for full and concrete examples, see
the existing implemented orchestrator modules in the Ceph source tree.

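That said, the basic shape of a backend is simply a ceph-mgr module that also
subclasses the interface. A minimal sketch (the class name is illustrative and
the base-class order may vary between in-tree modules):

.. code-block:: python

    from typing import Tuple

    from mgr_module import MgrModule
    import orchestrator


    class MyOrchestrator(orchestrator.Orchestrator, MgrModule):
        """Skeleton backend: override only the operations the backend
        supports; everything else keeps the interface's default
        ``NotImplementedError`` behaviour."""

        def available(self) -> Tuple[bool, str]:
            # Report whether the backend can currently be reached
            # (see the Utility section below).
            return True, ""
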
Glossary
--------

Stateful service
  a daemon that uses local storage, such as OSD or mon.

Stateless service
  a daemon that doesn't use any local storage, such as
  an MDS, RGW, nfs-ganesha, or iSCSI gateway.

Label
  arbitrary string tags that may be applied by administrators
  to hosts. Typically administrators use labels to indicate
  which hosts should run which kinds of service. Labels are
  advisory (from human input) and do not guarantee that hosts
  have particular physical capabilities.

Drive group
  collection of block devices with common/shared OSD
  formatting (typically one or more SSDs acting as
  journals/dbs for a group of HDDs).

Placement
  choice of which host is used to run a service.

Key Concepts
------------

The underlying orchestrator remains the source of truth for information
about whether a service is running, what is running where, which
hosts are available, etc. Orchestrator modules should avoid taking
any internal copies of this information, and read it directly from
the orchestrator backend as much as possible.

Bootstrapping hosts and adding them to the underlying orchestration
system is outside the scope of Ceph's orchestrator interface. Ceph
can only work on hosts when the orchestrator is already aware of them.

Calls to orchestrator modules are all asynchronous, and return *completion*
objects (see below) rather than returning values immediately.

Where possible, placement of stateless services should be left up to the
orchestrator.

Completions and batching
------------------------

All methods that read or modify the state of the system can potentially
be long-running. To handle that, all such methods return a *Completion*
object. Orchestrator modules
must implement the *process* method: this takes a list of completions, and
is responsible for checking if they're finished, and advancing the underlying
operations as needed.

Each orchestrator module implements its own underlying mechanisms
for completions. This might involve running the underlying operations
in threads, or batching the operations up before later executing
them in one go in the background. If implementing such a batching pattern, the
module would do no work on any operation until it appeared in a list
of completions passed into *process*.

Some operations need to report progress. Those operations add
a *ProgressReference* to the completion. At some point, the progress reference
becomes *effective*, meaning that the operation has really happened
(e.g. a service has actually been started).

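From the calling side, the pattern is to obtain a completion, wait for the
orchestrator to process it, and only then read its result. A minimal sketch,
assuming a client module built on :class:`OrchestratorClientMixin` (see the
Client Modules section below; a real client module is also a regular
``MgrModule``):

.. code-block:: python

    import orchestrator


    class MyClient(orchestrator.OrchestratorClientMixin):
        def get_host_names(self):
            # get_hosts() returns immediately with a Completion, not a value.
            completion = self.get_hosts()
            # Block until the backend's *process* has finished the operation.
            self._orchestrator_wait([completion])
            # Re-raise any error recorded on the completion.
            orchestrator.raise_if_exception(completion)
            return [host.hostname for host in completion.result]
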
.. automethod:: Orchestrator.process

.. autoclass:: Completion
   :members:

.. autoclass:: ProgressReference
   :members:

Error Handling
--------------

The main goal of error handling within orchestrator modules is to provide
debug information to assist users when dealing with deployment errors.

.. autoclass:: OrchestratorError
.. autoclass:: NoOrchestrator
.. autoclass:: OrchestratorValidationError


In detail, orchestrators need to explicitly deal with different kinds of errors:

1. No orchestrator configured

   See :class:`NoOrchestrator`.

2. An orchestrator doesn't implement a specific method.

   For example, an orchestrator doesn't support ``add_host``.

   In this case, a ``NotImplementedError`` is raised.

3. Missing features within implemented methods.

   For example, optional parameters to a command that are not supported by
   the backend (such as the hosts field in the :func:`Orchestrator.apply_mon`
   command with the Rook backend). See :class:`OrchestratorValidationError`
   and the sketch at the end of this section.

4. Input validation errors

   The ``orchestrator`` module and other calling modules are supposed to
   provide meaningful error messages.

   See :class:`OrchestratorValidationError`.

5. Errors when actually executing commands

   The resulting Completion should contain an error string that assists in
   understanding the problem. In addition, :func:`Completion.is_errored` is
   set to ``True``.

6. Invalid configuration in the orchestrator modules

   This can be handled in the same way as case 5.


All other errors are unexpected orchestrator issues and thus should raise an
exception that is then logged into the mgr log file. If there is a completion
object at that point, :func:`Completion.result` may contain an error message.

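As a sketch of cases 3 and 4 above, a backend that cannot honour an optional
parameter should reject it explicitly instead of silently ignoring it. The
method body below is illustrative only; the placement check stands in for
whatever the particular backend cannot support:

.. code-block:: python

    from orchestrator import OrchestratorValidationError


    def apply_mon(self, spec):
        # Illustrative check: this hypothetical backend decides mon
        # placement itself and cannot honour an explicit host list.
        if spec.placement.hosts:
            raise OrchestratorValidationError(
                "this backend does not support explicit host placement for mons")
        ...  # hand the spec to the backend and return a Completion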

Excluded functionality
----------------------

- Ceph's orchestrator interface is not a general-purpose framework for
  managing Linux servers -- it is deliberately constrained to manage
  the Ceph cluster's services only.
- Multipathed storage is not handled (multipathing is unnecessary for
  Ceph clusters). Each drive is assumed to be visible only on
  a single host.

Host management
---------------

.. automethod:: Orchestrator.add_host
.. automethod:: Orchestrator.remove_host
.. automethod:: Orchestrator.get_hosts
.. automethod:: Orchestrator.update_host_addr
.. automethod:: Orchestrator.add_host_label
.. automethod:: Orchestrator.remove_host_label

.. autoclass:: HostSpec

Devices
-------

.. automethod:: Orchestrator.get_inventory
.. autoclass:: InventoryFilter

.. py:currentmodule:: ceph.deployment.inventory

.. autoclass:: Devices
   :members:

.. autoclass:: Device
   :members:

.. py:currentmodule:: orchestrator

Placement
---------

A :ref:`orchestrator-cli-placement-spec` defines the placement of
daemons of a specific service.

In general, stateless services do not require any specific placement
rules as they can run anywhere that sufficient system resources
are available. However, some orchestrators may not include the
functionality to choose a location in this way. Optionally, you can
specify a location when creating a stateless service.

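For instance, a service specification can either leave placement entirely to
the orchestrator or pin the daemons to explicit hosts. A small sketch using
the classes documented below (the service id and host names are placeholders):

.. code-block:: python

    from ceph.deployment.service_spec import PlacementSpec, ServiceSpec

    # Run three MDS daemons, wherever the orchestrator sees fit.
    anywhere = ServiceSpec('mds', service_id='fsname',
                           placement=PlacementSpec(count=3))

    # Pin the daemons to two specific hosts instead.
    pinned = ServiceSpec('mds', service_id='fsname',
                         placement=PlacementSpec(hosts=['host1', 'host2']))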

.. py:currentmodule:: ceph.deployment.service_spec

.. autoclass:: PlacementSpec
   :members:

.. py:currentmodule:: orchestrator


Services
--------

.. autoclass:: ServiceDescription

.. py:currentmodule:: ceph.deployment.service_spec

.. autoclass:: ServiceSpec

.. py:currentmodule:: orchestrator

.. automethod:: Orchestrator.describe_service

.. automethod:: Orchestrator.service_action
.. automethod:: Orchestrator.remove_service


Daemons
-------

.. automethod:: Orchestrator.list_daemons
.. automethod:: Orchestrator.remove_daemons
.. automethod:: Orchestrator.daemon_action

OSD management
--------------

.. automethod:: Orchestrator.create_osds

.. automethod:: Orchestrator.blink_device_light
.. autoclass:: DeviceLightLoc

.. _orchestrator-osd-replace:

OSD Replacement
^^^^^^^^^^^^^^^

See :ref:`rados-replacing-an-osd` for the underlying process.

Replacing OSDs is fundamentally a two-stage process, as users need to
physically replace drives. The orchestrator therefore exposes this as a
two-stage process.

Phase one is a call to :meth:`Orchestrator.remove_daemons` with ``destroy=True``
in order to mark the OSD as destroyed.

Phase two is a call to :meth:`Orchestrator.create_osds` with a Drive Group with

.. py:currentmodule:: ceph.deployment.drive_group

:attr:`DriveGroupSpec.osd_id_claims` set to the destroyed OSD IDs.

.. py:currentmodule:: orchestrator

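Taken together, the flow looks roughly like the sketch below, written from the
point of view of a module driving the orchestrator (waiting on the returned
completions is omitted; see Completions and batching above). The host name,
device path and OSD id are placeholders, ``destroy=True`` follows the
description above, and the ``DriveGroupSpec``/``DeviceSelection`` arguments
are illustrative:

.. code-block:: python

    from ceph.deployment.drive_group import DriveGroupSpec, DeviceSelection

    # Phase one: mark osd.12 as destroyed so that its OSD id can be reused.
    completion = self.remove_daemons(['osd.12'], destroy=True)

    # Phase two: after the drive has been physically replaced, re-create the
    # OSD on the new device, claiming the destroyed id for that host.
    dg = DriveGroupSpec(
        data_devices=DeviceSelection(paths=['/dev/sdb']),
        osd_id_claims={'host1': ['12']},
    )
    completion = self.create_osds(dg)
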
Monitors
--------

.. automethod:: Orchestrator.add_mon
.. automethod:: Orchestrator.apply_mon

Stateless Services
------------------

.. automethod:: Orchestrator.add_mgr
.. automethod:: Orchestrator.apply_mgr
.. automethod:: Orchestrator.add_mds
.. automethod:: Orchestrator.apply_mds
.. automethod:: Orchestrator.add_rbd_mirror
.. automethod:: Orchestrator.apply_rbd_mirror

.. py:currentmodule:: ceph.deployment.service_spec

.. autoclass:: RGWSpec

.. py:currentmodule:: orchestrator

.. automethod:: Orchestrator.add_rgw
.. automethod:: Orchestrator.apply_rgw

.. py:currentmodule:: ceph.deployment.service_spec

.. autoclass:: NFSServiceSpec

.. py:currentmodule:: orchestrator

.. automethod:: Orchestrator.add_nfs
.. automethod:: Orchestrator.apply_nfs

Upgrades
--------

.. automethod:: Orchestrator.upgrade_available
.. automethod:: Orchestrator.upgrade_start
.. automethod:: Orchestrator.upgrade_status
.. autoclass:: UpgradeStatusSpec

Utility
-------

.. automethod:: Orchestrator.available
.. automethod:: Orchestrator.get_feature_set

Client Modules
--------------

.. autoclass:: OrchestratorClientMixin
   :members: