.. _orchestrator-modules:

.. py:currentmodule:: orchestrator

ceph-mgr orchestrator modules
=============================

.. warning::

   This is developer documentation, describing Ceph internals that
   are only relevant to people writing ceph-mgr orchestrator modules.

In this context, *orchestrator* refers to some external service that
provides the ability to discover devices and create Ceph services. This
includes external projects such as Rook.

An *orchestrator module* is a ceph-mgr module (:ref:`mgr-module-dev`)
which implements common management operations using a particular
orchestrator.

Orchestrator modules subclass the ``Orchestrator`` class: this class is
an interface that only provides method definitions to be implemented
by subclasses. The purpose of defining this common interface
for different orchestrators is to enable common UI code, such as
the dashboard, to work with various backends.


.. graphviz::

   digraph G {
       subgraph cluster_1 {
           volumes [label="mgr/volumes"]
           rook [label="mgr/rook"]
           dashboard [label="mgr/dashboard"]
           orchestrator_cli [label="mgr/orchestrator"]
           orchestrator [label="Orchestrator Interface"]
           cephadm [label="mgr/cephadm"]

           label = "ceph-mgr";
       }

       volumes -> orchestrator
       dashboard -> orchestrator
       orchestrator_cli -> orchestrator
       orchestrator -> rook -> rook_io
       orchestrator -> cephadm


       rook_io [label="Rook"]

       rankdir="TB";
   }

Behind all the abstraction, the purpose of orchestrator modules is simple:
enable Ceph to do things like discover available hardware, create and
destroy OSDs, and run MDS and RGW services.

A tutorial is not included here: for full and concrete examples, see
the existing implemented orchestrator modules in the Ceph source tree.

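Even so, the overall shape of a backend is easy to sketch. The skeleton below
is purely illustrative: the helper methods ``_query_backend_for_hosts`` and
``_wrap_in_completion`` are hypothetical stand-ins for backend-specific logic
and for completion handling, which differ between releases and backends.

.. code-block:: python

    from typing import List

    from mgr_module import MgrModule
    import orchestrator


    class MyOrchestrator(orchestrator.Orchestrator, MgrModule):
        """Hypothetical backend implementing the Orchestrator interface."""

        def get_hosts(self):
            # Ask the external orchestration system which hosts it manages
            # (backend-specific, not shown), then wrap the answer in the
            # completion type this release expects.
            hosts: List[orchestrator.HostSpec] = self._query_backend_for_hosts()
            return self._wrap_in_completion(hosts)

        def _query_backend_for_hosts(self):
            # Hypothetical helper: talk to the external system's API.
            raise NotImplementedError

        def _wrap_in_completion(self, result):
            # Hypothetical helper: completion handling differs between
            # releases and backends; see "Completions and batching" below.
            raise NotImplementedError
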
Glossary
--------

Stateful service
  a daemon that uses local storage, such as an OSD or a mon.

Stateless service
  a daemon that doesn't use any local storage, such
  as an MDS, RGW, nfs-ganesha, or iSCSI gateway.

Label
  arbitrary string tags that may be applied by administrators
  to hosts. Typically administrators use labels to indicate
  which hosts should run which kinds of service. Labels are
  advisory (from human input) and do not guarantee that hosts
  have particular physical capabilities.

Drive group
  collection of block devices with common/shared OSD
  formatting (typically one or more SSDs acting as
  journals/dbs for a group of HDDs).

Placement
  choice of which host is used to run a service.

Key Concepts
------------

The underlying orchestrator remains the source of truth for information
about whether a service is running, what is running where, which
hosts are available, etc. Orchestrator modules should avoid taking
any internal copies of this information, and read it directly from
the orchestrator backend as much as possible.

Bootstrapping hosts and adding them to the underlying orchestration
system is outside the scope of Ceph's orchestrator interface. Ceph
can only work on hosts when the orchestrator is already aware of them.

Where possible, placement of stateless services should be left up to the
orchestrator.

Completions and batching
------------------------

All methods that read or modify the state of the system can potentially
be long running. Therefore the module needs to schedule those operations.

Each orchestrator module implements its own underlying mechanisms
for completions. This might involve running the underlying operations
in threads, or batching the operations up before later executing
in one go in the background. If implementing such a batching pattern, the
module would do no work on any operation until it appeared in a list
of completions passed into *process*.

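One possible shape for such a mechanism is sketched below: each operation is
recorded in a small completion-like object and only executed when a background
thread (or the batching *process* step) gets to it. The ``_Completion`` class
here is a hypothetical illustration of the pattern, not the actual
``Completion`` type used by the orchestrator interface.

.. code-block:: python

    import threading
    from typing import Any, Callable, List, Optional


    class _Completion:
        """Hypothetical wrapper recording one deferred operation."""

        def __init__(self, fn: Callable[[], Any]) -> None:
            self._fn = fn
            self._done = threading.Event()
            self.result: Optional[Any] = None
            self.exception: Optional[Exception] = None

        def run(self) -> None:
            # Execute the deferred operation once and record the outcome.
            try:
                self.result = self._fn()
            except Exception as e:
                self.exception = e
            finally:
                self._done.set()

        def wait(self, timeout: Optional[float] = None) -> bool:
            return self._done.wait(timeout)


    def process(completions: List[_Completion]) -> None:
        # Batching pattern: no work happens on any operation until its
        # completion shows up in the list passed to process().
        for completion in completions:
            completion.run()
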
Error Handling
--------------

The main goal of error handling within orchestrator modules is to provide debug information to
assist users when dealing with deployment errors.

.. autoclass:: OrchestratorError
.. autoclass:: NoOrchestrator
.. autoclass:: OrchestratorValidationError


In detail, orchestrators need to explicitly deal with different kinds of errors
(a short sketch follows the list):

1. No orchestrator configured

   See :class:`NoOrchestrator`.

2. An orchestrator doesn't implement a specific method.

   For example, an Orchestrator doesn't support ``add_host``.

   In this case, a ``NotImplementedError`` is raised.

3. Missing features within implemented methods.

   E.g. optional parameters to a command that are not supported by the
   backend (e.g. the ``hosts`` field of the :func:`Orchestrator.apply_mon` command with the Rook backend).

   See :class:`OrchestratorValidationError`.

4. Input validation errors

   The ``orchestrator`` module and other calling modules are supposed to
   provide meaningful error messages.

   See :class:`OrchestratorValidationError`.

5. Errors when actually executing commands

   The resulting Completion should contain an error string that assists in understanding the
   problem. In addition, :func:`Completion.is_errored` is set to ``True``.

6. Invalid configuration in the orchestrator modules

   This can be tackled similarly to case 5.

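The sketch below shows how cases 2 to 4 might surface in a backend module.
Only the exception classes come from the ``orchestrator`` interface; the
method bodies and the ``_schedule_*`` helpers are hypothetical.

.. code-block:: python

    import orchestrator


    class MyOrchestrator(orchestrator.Orchestrator):
        """Hypothetical backend used to illustrate error cases 2-4."""

        # Case 2: any interface method that is *not* overridden here (for
        # example ``update_host_addr``) falls through to the interface
        # default, which raises ``NotImplementedError``.

        def apply_mon(self, spec):
            # Case 3: the method exists, but an optional field is not
            # supported by this backend.
            if spec.placement.hosts:
                raise orchestrator.OrchestratorValidationError(
                    "this backend cannot place mons on explicitly named hosts")
            return self._schedule_apply(spec)  # hypothetical helper

        def add_host(self, host_spec):
            # Case 4: reject invalid input with a meaningful message.
            if not host_spec.hostname:
                raise orchestrator.OrchestratorValidationError("hostname is required")
            return self._schedule_add_host(host_spec)  # hypothetical helper

        def _schedule_apply(self, spec):
            raise NotImplementedError  # backend-specific, not shown

        def _schedule_add_host(self, host_spec):
            raise NotImplementedError  # backend-specific, not shown
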
All other errors are unexpected orchestrator issues and thus should raise an exception that is then
logged into the mgr log file. If there is a completion object at that point,
:func:`Completion.result` may contain an error message.


Excluded functionality
----------------------

- Ceph's orchestrator interface is not a general-purpose framework for
  managing Linux servers -- it is deliberately constrained to manage
  the Ceph cluster's services only.
- Multipathed storage is not handled (multipathing is unnecessary for
  Ceph clusters). Each drive is assumed to be visible only on
  a single host.

Host management
---------------

.. automethod:: Orchestrator.add_host
.. automethod:: Orchestrator.remove_host
.. automethod:: Orchestrator.get_hosts
.. automethod:: Orchestrator.update_host_addr
.. automethod:: Orchestrator.add_host_label
.. automethod:: Orchestrator.remove_host_label

.. autoclass:: HostSpec

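For example, a caller holding a reference to the active backend might register
a host roughly as follows. The field values are illustrative, and the
completion returned by :meth:`Orchestrator.add_host` still has to be awaited
in whatever way the release in use expects.

.. code-block:: python

    import orchestrator


    def register_host(orch: "orchestrator.Orchestrator"):
        # Illustrative values: a host, its address and some advisory labels.
        spec = orchestrator.HostSpec(hostname='node-01',
                                     addr='10.0.0.1',
                                     labels=['mon', 'osd'])
        # add_host returns a completion; awaiting it is backend/release
        # specific (see "Completions and batching" above).
        return orch.add_host(spec)
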
Devices
-------

.. automethod:: Orchestrator.get_inventory
.. autoclass:: InventoryFilter

.. py:currentmodule:: ceph.deployment.inventory

.. autoclass:: Devices
   :members:

.. autoclass:: Device
   :members:

.. py:currentmodule:: orchestrator

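As a hedged sketch, a caller could scan a resolved
:meth:`Orchestrator.get_inventory` result for usable drives like this. The
per-host inventory objects are assumed to expose a host ``name`` and a
``devices`` collection (the ``Devices``/``Device`` classes above); verify the
exact attribute names against the release in use.

.. code-block:: python

    def report_available_devices(inventory) -> None:
        # 'inventory' is assumed to be the resolved result of
        # Orchestrator.get_inventory(): a list of per-host inventory objects,
        # each carrying a host name and a Devices collection.
        for host in inventory:
            for device in host.devices.devices:
                # 'available' and 'path' are attributes of Device (see above).
                if device.available:
                    print(f"{host.name}: {device.path}")
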
Placement
---------

A :ref:`orchestrator-cli-placement-spec` defines the placement of
daemons of a specific service.

In general, stateless services do not require any specific placement
rules, as they can run anywhere that sufficient system resources
are available. However, some orchestrators may not include the
functionality to choose a location in this way. Optionally, you can
specify a location when creating a stateless service.


.. py:currentmodule:: ceph.deployment.service_spec

.. autoclass:: PlacementSpec
   :members:

.. py:currentmodule:: orchestrator

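For instance, a placement can name hosts explicitly, request a daemon count,
or select hosts by label; the keyword arguments below correspond to the
members documented above.

.. code-block:: python

    from ceph.deployment.service_spec import PlacementSpec

    # Run on two explicitly named hosts.
    on_named_hosts = PlacementSpec(hosts=['host1', 'host2'])

    # Let the orchestrator pick any three suitable hosts.
    any_three_hosts = PlacementSpec(count=3)

    # Run on every host carrying the 'mon' label.
    on_labelled_hosts = PlacementSpec(label='mon')
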
Services
--------

.. autoclass:: ServiceDescription

.. py:currentmodule:: ceph.deployment.service_spec

.. autoclass:: ServiceSpec
   :members:
   :private-members:
   :noindex:

.. py:currentmodule:: orchestrator

.. automethod:: Orchestrator.describe_service

.. automethod:: Orchestrator.service_action
.. automethod:: Orchestrator.remove_service

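A small example of describing a service declaratively; the ``service_type``
and ``service_id`` values are illustrative.

.. code-block:: python

    from ceph.deployment.service_spec import PlacementSpec, ServiceSpec

    # Desired state: an MDS service with id 'cephfs' and two daemons, placed
    # wherever the orchestrator sees fit.
    spec = ServiceSpec(service_type='mds',
                       service_id='cephfs',
                       placement=PlacementSpec(count=2))

    # The generated service name is what describe_service, service_action
    # and remove_service take as their service identifier.
    print(spec.service_name())  # "mds.cephfs"
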
Daemons
-------

.. automethod:: Orchestrator.list_daemons
.. automethod:: Orchestrator.remove_daemons
.. automethod:: Orchestrator.daemon_action

OSD management
--------------

.. automethod:: Orchestrator.create_osds

.. automethod:: Orchestrator.blink_device_light
.. autoclass:: DeviceLightLoc

.. _orchestrator-osd-replace:

OSD Replacement
^^^^^^^^^^^^^^^

See :ref:`rados-replacing-an-osd` for the underlying process.

Replacing OSDs is fundamentally a two-stage process, as users need to
physically replace drives. The orchestrator therefore exposes this two-stage process.

Phase one is a call to :meth:`Orchestrator.remove_daemons` with ``destroy=True`` in order to mark
the OSD as destroyed.


Phase two is a call to :meth:`Orchestrator.create_osds` with a Drive Group with

.. py:currentmodule:: ceph.deployment.drive_group

:attr:`DriveGroupSpec.osd_id_claims` set to the destroyed OSD ids.

.. py:currentmodule:: orchestrator

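A hedged sketch of such a phase-two spec follows: a drive group that recreates
OSDs on one host and claims the previously destroyed OSD ids. The exact shape
of ``osd_id_claims`` (assumed here to map host names to lists of OSD id
strings) and of ``DeviceSelection`` should be checked against the drive group
documentation for the release in use.

.. code-block:: python

    from ceph.deployment.drive_group import DeviceSelection, DriveGroupSpec
    from ceph.deployment.service_spec import PlacementSpec

    # Recreate OSDs on 'node-01', claiming the previously destroyed ids 2 and 3.
    dg = DriveGroupSpec(
        service_id='replacement_osds',
        placement=PlacementSpec(hosts=['node-01']),
        data_devices=DeviceSelection(paths=['/dev/sdb', '/dev/sdc']),
        osd_id_claims={'node-01': ['2', '3']},
    )

    # Phase two is then a call such as: orch.create_osds(dg)
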
Services
--------

.. automethod:: Orchestrator.add_daemon
.. automethod:: Orchestrator.apply_mon
.. automethod:: Orchestrator.apply_mgr
.. automethod:: Orchestrator.apply_mds
.. automethod:: Orchestrator.apply_rbd_mirror

.. py:currentmodule:: ceph.deployment.service_spec

.. autoclass:: RGWSpec
   :noindex:

.. py:currentmodule:: orchestrator

.. automethod:: Orchestrator.apply_rgw

.. py:currentmodule:: ceph.deployment.service_spec

.. autoclass:: NFSServiceSpec

.. py:currentmodule:: orchestrator

.. automethod:: Orchestrator.apply_nfs

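As an illustration, a caller might declare an RGW service and hand it to the
backend as follows. The ``rgw_realm``/``rgw_zone`` fields refer to the
``RGWSpec`` documented above, but treat the exact field set as
release-dependent; completion handling is elided.

.. code-block:: python

    from ceph.deployment.service_spec import PlacementSpec, RGWSpec

    import orchestrator


    def deploy_rgw(orch: "orchestrator.Orchestrator"):
        # Two RGW daemons for the given realm/zone, placed on hosts that
        # carry the 'rgw' label.  Field values are illustrative.
        spec = RGWSpec(service_id='myrgw',
                       rgw_realm='myrealm',
                       rgw_zone='myzone',
                       placement=PlacementSpec(count=2, label='rgw'))
        # apply_rgw declares desired state and returns a completion.
        return orch.apply_rgw(spec)
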
Upgrades
--------

.. automethod:: Orchestrator.upgrade_available
.. automethod:: Orchestrator.upgrade_start
.. automethod:: Orchestrator.upgrade_status
.. autoclass:: UpgradeStatusSpec

Utility
-------

.. automethod:: Orchestrator.available
.. automethod:: Orchestrator.get_feature_set

Client Modules
--------------

.. autoclass:: OrchestratorClientMixin
   :members:

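A rough sketch of a client module: it mixes ``OrchestratorClientMixin`` into
its module class and calls the interface methods on ``self``. The
error-handling helper shown (``raise_if_exception``) and the way the
completion's result is read vary between releases, so treat those details as
assumptions to verify against the in-tree client modules.

.. code-block:: python

    from mgr_module import MgrModule
    import orchestrator


    class MyClientModule(orchestrator.OrchestratorClientMixin, MgrModule):
        """Hypothetical consumer of the orchestrator interface."""

        def list_host_names(self):
            # The mixin forwards this call to whichever orchestrator
            # backend is currently active.
            completion = self.get_hosts()
            # Assumption: a raise_if_exception-style helper turns backend
            # errors into exceptions; exact completion handling varies by
            # release.
            orchestrator.raise_if_exception(completion)
            return [h.hostname for h in completion.result]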