.. _mgr-module-dev:

ceph-mgr module developer's guide
=================================

.. warning::

    This is developer documentation, describing Ceph internals that
    are only relevant to people writing ceph-mgr modules.

Creating a module
-----------------

In ``pybind/mgr/``, create a Python module. Within your module, create a class
that inherits from ``MgrModule``. For ceph-mgr to detect your module, your
directory must contain a file called ``module.py``.

The most important methods to override are:

* a ``serve`` member function for server-type modules. This
  function should block forever.
* a ``notify`` member function if your module needs to
  take action when new cluster data is available.
* a ``handle_command`` member function if your module
  exposes CLI commands. This approach to exposing commands is
  deprecated, however. For more details, see
  :ref:`mgr-module-exposing-commands`.

Some modules interface with external orchestrators to deploy
Ceph services. These also inherit from ``Orchestrator``, which adds
methods to the base ``MgrModule`` class. See
:ref:`Orchestrator modules <orchestrator-modules>` for more on
creating these modules.

Installing a module
-------------------

Once your module is present in the location set by the
``mgr module path`` configuration setting, you can enable it
via the ``ceph mgr module enable`` command::

    ceph mgr module enable mymodule

Note that the MgrModule interface is not stable, so any modules maintained
outside of the Ceph tree are liable to break when run against any newer
or older versions of Ceph.

.. _mgr module dev logging:

Logging
-------

Logging in Ceph manager modules is done as in any other Python program. Just
import the ``logging`` package and get a logger instance with the
``logging.getLogger`` function.

Each module has a ``log_level`` option that specifies the current Python
logging level of the module.
To change or query the logging level of the module, use the following Ceph
commands::

    ceph config get mgr mgr/<module_name>/log_level
    ceph config set mgr mgr/<module_name>/log_level <info|debug|critical|error|warning|>

The logging level used when the module starts is determined by the current
logging level of the mgr daemon, unless the ``log_level`` option was
previously set with the ``config set ...`` command. The mgr daemon logging
level is mapped to the module Python logging level as follows:

* <= 0 is CRITICAL
* <= 1 is WARNING
* <= 4 is INFO
* <= +inf is DEBUG
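
The mapping above can be written down as a small function. This is only an
illustrative sketch: the function name ``mgr_level_to_python`` is made up for
this example and is not part of the ``MgrModule`` interface.

.. code-block:: python

    import logging


    def mgr_level_to_python(mgr_debug_level: int) -> int:
        """Map a mgr daemon debug level to a Python logging level.

        Hypothetical helper that mirrors the thresholds listed above.
        """
        if mgr_debug_level <= 0:
            return logging.CRITICAL
        if mgr_debug_level <= 1:
            return logging.WARNING
        if mgr_debug_level <= 4:
            return logging.INFO
        return logging.DEBUG


    # Obtaining a logger works as in any other Python program.
    log = logging.getLogger(__name__)
    log.setLevel(mgr_level_to_python(4))

With a daemon level of 4 the logger above ends up at ``logging.INFO``, so
``log.debug(...)`` calls are dropped while ``log.info(...)`` calls are kept.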

To unset the module log level and fall back to the mgr daemon logging level,
run the following command::

    ceph config set mgr mgr/<module_name>/log_level ''

By default, modules' logging messages are processed by the Ceph logging layer,
where they are recorded in the mgr daemon's log file.
It is also possible to send a module's logging messages to its own file.

The module's log file is located in the same directory as the mgr daemon's
log file, with the following name pattern::

    <mgr_daemon_log_file_name>.<module_name>.log

To enable file logging for a module, use the following command::

    ceph config set mgr mgr/<module_name>/log_to_file true

When the module's file logging is enabled, the module's logging messages stop
being written to the mgr daemon's log file and are only written to the
module's log file.

To check the status of file logging or to disable it, use the
following commands::

    ceph config get mgr mgr/<module_name>/log_to_file
    ceph config set mgr mgr/<module_name>/log_to_file false

.. _mgr-module-exposing-commands:

Exposing commands
-----------------

There are two approaches for exposing a command. The first one is to
use the ``@CLICommand`` decorator to decorate the method which handles
the command, like this:

.. code:: python

    @CLICommand('antigravity send to blackhole',
                perm='rw')
    def send_to_blackhole(self, oid: str, blackhole: Optional[str] = None, inbuf: Optional[str] = None):
        '''
        Send the specified object to black hole
        '''
        obj = self.find_object(oid)
        if obj is None:
            return HandleCommandResult(-errno.ENOENT, stderr=f"object '{oid}' not found")
        if blackhole is not None and inbuf is not None:
            # inbuf holds the content of the file given with --in-file;
            # here it is used as the passphrase
            try:
                location = self.decrypt(blackhole, passphrase=inbuf)
            except ValueError:
                return HandleCommandResult(-errno.EINVAL, stderr='unable to decrypt location')
        else:
            location = blackhole
        self.send_object_to(obj, location)
        return HandleCommandResult(stdout=f"the black hole swallowed '{oid}'")

The first parameter passed to ``CLICommand`` is the "name" of the command.
Since there are lots of commands in Ceph, we tend to group related commands
with a common prefix. In this case, "antigravity" is used for this purpose,
as the author is probably designing a module which is also able to launch
rockets into deep space.

The `type annotations <https://www.python.org/dev/peps/pep-0484/>`_ for the
method parameters are mandatory here, so that the usage of the command can be
properly reported to the ``ceph`` CLI, and the manager daemon can convert
the serialized command parameters sent by the clients to the expected type
before passing them to the handler method. With properly implemented types,
one can also perform some sanity checks against the parameters.

The names of the parameters are part of the command interface, so please
take backward compatibility into consideration when changing
them. However, you **cannot** change the name of the ``inbuf`` parameter: it
is used to pass the content of the file specified by the ``ceph --in-file``
option.

The docstring of the method is used for the description of the command.

The manager daemon cooks the usage of the command from these ingredients,
like::

    antigravity send to blackhole <oid> [<blackhole>]  Send the specified object to black hole

as part of the output of ``ceph --help``.

In addition to ``@CLICommand``, you could also use ``@CLIReadCommand`` or
``@CLIWriteCommand`` if your command only requires read permissions or
write permissions respectively.

The second one is to set the ``COMMANDS`` class attribute of your module to
a list of dicts like this::

    COMMANDS = [
        {
            "cmd": "foobar name=myarg,type=CephString",
            "desc": "Do something awesome",
            "perm": "rw",
            # optional:
            "poll": "true"
        }
    ]

The ``cmd`` part of each entry is parsed in the same way as internal
Ceph mon and admin socket commands (see ``mon/MonCommands.h`` in
the Ceph source for examples). Note that the "poll" field is optional
and defaults to False; when set, it indicates to the ``ceph`` CLI
that it should call this command repeatedly and output results (see
``ceph -h`` and its ``--period`` option).

Each command is expected to return a tuple ``(retval, stdout, stderr)``.
``retval`` is an integer representing a libc error code (e.g. EINVAL,
EPERM, or 0 for no error), ``stdout`` is a string containing any
non-error output, and ``stderr`` is a string containing any progress or
error explanation output. Either or both of the two strings may be empty.

Implement the ``handle_command`` function to respond to the commands
when they are sent:


.. py:currentmodule:: mgr_module
.. automethod:: MgrModule.handle_command
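
As a rough sketch of how the ``COMMANDS`` entry and the return tuple fit
together, the handler below dispatches on the command name. It assumes that
the parsed command arrives as a dict whose ``prefix`` key holds the command
name and whose remaining keys are the declared arguments, and it follows the
decorator example in returning a negated errno value. ``ExampleCommands`` is
a hypothetical stand-in so that the snippet runs outside ceph-mgr; a real
module would define these members on its ``MgrModule`` subclass.

.. code-block:: python

    import errno
    from typing import Any, Dict, Tuple


    class ExampleCommands:
        """Hypothetical stand-in; a real module would subclass MgrModule."""

        COMMANDS = [
            {
                "cmd": "foobar name=myarg,type=CephString",
                "desc": "Do something awesome",
                "perm": "rw",
            }
        ]

        def handle_command(self, inbuf: str, cmd: Dict[str, Any]) -> Tuple[int, str, str]:
            # Assumed layout: cmd["prefix"] is the command name and every
            # declared argument appears under its own name.
            if cmd.get("prefix") == "foobar":
                return 0, f"did something awesome with {cmd['myarg']}", ""
            # Unknown command: no stdout, explanation on stderr.
            return -errno.EINVAL, "", f"unknown command {cmd.get('prefix')!r}"

Returning the explanation in ``stderr`` and leaving ``stdout`` empty keeps the
non-error output clean for callers that parse it.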

Configuration options
---------------------

Modules can load and store configuration options using the
``set_module_option`` and ``get_module_option`` methods.

.. note:: Use ``set_module_option`` and ``get_module_option`` to
   manage user-visible configuration options that are not blobs (like
   certificates). If you want to persist module-internal data or
   binary configuration data consider using the `KV store`_.

You must declare your available configuration options in the
``MODULE_OPTIONS`` class attribute, like this:

.. code-block:: python

    MODULE_OPTIONS = [
        Option(name="my_option")
    ]

If you try to use ``set_module_option`` or ``get_module_option`` on options
not declared in ``MODULE_OPTIONS``, an exception will be raised.

You may choose to provide setter commands in your module to perform
high level validation. Users can also modify configuration using
the normal ``ceph config set`` command, where the configuration options
for a mgr module are named like ``mgr/<module name>/<option>``.

If a configuration option is different depending on which node the mgr
is running on, then use *localized* configuration
(``get_localized_module_option``, ``set_localized_module_option``).
This may be necessary for options such as what address to listen on.
Localized options may also be set externally with ``ceph config set``,
where the key name is like ``mgr/<module name>/<mgr id>/<option>``.
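
The two key layouts can be illustrated with a small helper. This is a sketch
for clarity only: ``option_key`` and the module, option and mgr names used
with it are invented for the example and do not exist in Ceph.

.. code-block:: python

    from typing import Optional


    def option_key(module_name: str, option: str, mgr_id: Optional[str] = None) -> str:
        """Build the ``ceph config set`` key name for a module option.

        Hypothetical helper: passing mgr_id yields the localized form.
        """
        parts = ["mgr", module_name]
        if mgr_id is not None:
            parts.append(mgr_id)
        parts.append(option)
        return "/".join(parts)


    # Cluster-wide option, then the same option localized to one mgr daemon.
    global_key = option_key("mymodule", "listen_addr")
    local_key = option_key("mymodule", "listen_addr", mgr_id="x")

Here ``global_key`` is ``mgr/mymodule/listen_addr`` and ``local_key`` is
``mgr/mymodule/x/listen_addr``; either could be passed to
``ceph config set mgr <key> <value>``.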

If you need to load and store data (e.g. something larger, binary, or multiline),
use the KV store instead of configuration options (see next section).

Hints for using config options:

* Reads are fast: ceph-mgr keeps a local in-memory copy, so in many cases
  you can just do a ``get_module_option`` every time you use an option,
  rather than copying it out into a variable.
* Writes block until the value is persisted (i.e. round trip to the monitor),
  but reads from another thread will see the new value immediately.
* If a user has used ``config set`` from the command line, then the new
  value becomes visible to ``get_module_option`` almost immediately. The
  mon->mgr update is asynchronous, however, so ``config set`` may return a
  fraction of a second before the new value is visible on the mgr.
* To delete a config value (i.e. revert to default), just pass ``None`` to
  ``set_module_option``.

.. automethod:: MgrModule.get_module_option
.. automethod:: MgrModule.set_module_option
.. automethod:: MgrModule.get_localized_module_option
.. automethod:: MgrModule.set_localized_module_option

KV store
--------

Modules have access to a private (per-module) key value store, which
is implemented using the monitor's "config-key" commands. Use
the ``set_store`` and ``get_store`` methods to access the KV store from
your module.

The KV store commands work in a similar way to the configuration
commands. Reads are fast, operating from a local cache. Writes block
on persistence and do a round trip to the monitor.

This data can be accessed from outside of ceph-mgr using the
``ceph config-key [get|set]`` commands. Key names follow the same
conventions as configuration options. Note that any values updated
from outside of ceph-mgr will not be seen by running modules until
the next restart. Users should be discouraged from accessing module KV
data externally -- if it is necessary for users to populate data, modules
should provide special commands to set the data via the module.

Use the ``get_store_prefix`` function to enumerate keys within
a particular prefix (i.e. all keys starting with a particular substring).


.. automethod:: MgrModule.get_store
.. automethod:: MgrModule.set_store
.. automethod:: MgrModule.get_localized_store
.. automethod:: MgrModule.set_localized_store
.. automethod:: MgrModule.get_store_prefix
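
A common use of the KV store is to persist structured module state as a JSON
string. The sketch below assumes only that ``get_store`` returns ``None`` for
a key that has never been set; ``save_state``, ``load_state`` and
``FakeModule`` are hypothetical names, the last being an in-memory stand-in
so that the helpers can be tried outside ceph-mgr.

.. code-block:: python

    import json
    from typing import Any, Dict, Optional


    def save_state(module: Any, key: str, state: Dict[str, Any]) -> None:
        """Serialize state to JSON and persist it in the module's KV store."""
        module.set_store(key, json.dumps(state))


    def load_state(module: Any, key: str) -> Optional[Dict[str, Any]]:
        """Load previously saved state, or None if nothing was stored."""
        raw = module.get_store(key)
        return json.loads(raw) if raw is not None else None


    class FakeModule:
        """Hypothetical in-memory stand-in for a MgrModule instance."""

        def __init__(self) -> None:
            self._kv: Dict[str, str] = {}

        def set_store(self, key: str, val: str) -> None:
            self._kv[key] = val

        def get_store(self, key: str, default: Optional[str] = None) -> Optional[str]:
            return self._kv.get(key, default)

Inside a real module the helpers would be called with ``self`` as the first
argument, outside of ``__init__``.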


Accessing cluster data
----------------------

Modules have access to the in-memory copies of the Ceph cluster's
state that the mgr maintains. Accessor functions are exposed
as members of ``MgrModule``.

Calls that access the cluster or daemon state generally go
from Python into native C++ routines. There is some overhead to this,
but much less than, for example, calling into a REST API or
an SQL database.

There are no consistency rules about access to cluster structures or
daemon metadata. For example, an OSD might exist in OSDMap but
have no metadata, or vice versa. On a healthy cluster these
will be very rare transient states, but modules should be written
to cope with the possibility.

Note that these accessors must not be called in the module's ``__init__``
function. This will result in a circular locking exception.

.. automethod:: MgrModule.get
.. automethod:: MgrModule.get_server
.. automethod:: MgrModule.list_servers
.. automethod:: MgrModule.get_metadata
.. automethod:: MgrModule.get_daemon_status
.. automethod:: MgrModule.get_perf_schema
.. automethod:: MgrModule.get_counter
.. automethod:: MgrModule.get_mgr_id
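
The following sketch shows one way to cope with an OSD that is in the OSDMap
but has no metadata yet. It rests on two assumptions that should be checked
against the accessor documentation above: that ``get("osd_map")`` returns a
dict with an ``osds`` list whose entries carry the OSD id under ``osd``, and
that ``get_metadata`` returns ``None`` when no metadata is known.
``osds_missing_metadata`` and ``FakeModule`` are hypothetical names.

.. code-block:: python

    from typing import Any, Dict, List, Optional


    def osds_missing_metadata(module: Any) -> List[int]:
        """Return ids of OSDs present in the OSDMap but lacking metadata."""
        missing = []
        for osd in module.get("osd_map").get("osds", []):
            osd_id = osd["osd"]
            # Assumed: None means the mgr has no metadata for this daemon.
            if module.get_metadata("osd", str(osd_id)) is None:
                missing.append(osd_id)
        return missing


    class FakeModule:
        """Hypothetical stand-in with canned data, for use outside ceph-mgr."""

        def get(self, data_name: str) -> Dict[str, Any]:
            return {"osds": [{"osd": 0}, {"osd": 1}]}

        def get_metadata(self, svc_type: str, svc_id: str) -> Optional[Dict[str, str]]:
            return {"hostname": "node-a"} if svc_id == "0" else None

A module would typically skip the returned OSDs for the current pass and look
again later, rather than treating the missing metadata as an error.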

Exposing health checks
----------------------

Modules can raise first-class Ceph health checks, which will be reported
in the output of ``ceph status`` and in other places that report on the
cluster's health.

If you use ``set_health_checks`` to report a problem, be sure to call
it again with an empty dict to clear your health check when the problem
goes away.

.. automethod:: MgrModule.set_health_checks
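
The raise-then-clear pattern can be sketched as follows. The check dict layout
used here (``severity``, ``summary``, ``count``, ``detail``) is an assumption
to be verified against the ``set_health_checks`` documentation above; the
check name ``MGR_EXAMPLE_SLOW_DISKS`` and the ``report_slow_disks`` and
``FakeModule`` names are invented for the example.

.. code-block:: python

    from typing import Any, Dict, List


    def report_slow_disks(module: Any, slow_disks: List[str]) -> None:
        """Raise a health check while disks are slow and clear it afterwards."""
        checks: Dict[str, Dict[str, Any]] = {}
        if slow_disks:
            checks["MGR_EXAMPLE_SLOW_DISKS"] = {
                "severity": "warning",
                "summary": f"{len(slow_disks)} disks are slow",
                "count": len(slow_disks),
                "detail": [f"{disk} is slow" for disk in slow_disks],
            }
        # An empty dict clears any check this module raised earlier.
        module.set_health_checks(checks)


    class FakeModule:
        """Hypothetical stand-in that records the last reported checks."""

        def __init__(self) -> None:
            self.checks: Dict[str, Dict[str, Any]] = {}

        def set_health_checks(self, checks: Dict[str, Dict[str, Any]]) -> None:
            self.checks = checks

Calling the function on every evaluation pass, with or without problems, means
the clearing step cannot be forgotten.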

What if the mons are down?
--------------------------

The manager daemon gets much of its state (such as the cluster maps)
from the monitor. If the monitor cluster is inaccessible, whichever
manager was active will continue to run, with the latest state it saw
still in memory.

However, if you are creating a module that shows the cluster state
to the user, then you may well not want to mislead them by showing
them that out-of-date state.

To check if the manager daemon currently has a connection to
the monitor cluster, use this function:

.. automethod:: MgrModule.have_mon_connection

Reporting if your module cannot run
-----------------------------------

If your module cannot be run for any reason (such as a missing dependency),
then you can report that by implementing the ``can_run`` function.

.. automethod:: MgrModule.can_run

Note that this will only work properly if your module can always be imported:
if you are importing a dependency that may be absent, then do it in a
try/except block so that your module can be loaded far enough to use
``can_run`` even if the dependency is absent.
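
A minimal sketch of the guarded import follows. It assumes that ``can_run`` is
a static method returning a ``(bool, str)`` pair, which should be checked
against the method documentation above. The dependency name
``not_a_real_dependency`` is deliberately fictitious, and the plain ``Module``
class stands in for a real ``MgrModule`` subclass.

.. code-block:: python

    from typing import Tuple

    try:
        import not_a_real_dependency  # hypothetical optional dependency
    except ImportError:
        # Remember the failure instead of letting the module import fail.
        not_a_real_dependency = None


    class Module:  # a real module would inherit from MgrModule
        @staticmethod
        def can_run() -> Tuple[bool, str]:
            if not_a_real_dependency is None:
                return False, "not_a_real_dependency is not installed"
            return True, ""

Because the import error is swallowed at load time, the module can still be
imported and the reason for not running reaches the user as a message.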

Sending commands
----------------

A non-blocking facility is provided for sending monitor commands
to the cluster.

.. automethod:: MgrModule.send_command

Receiving notifications
-----------------------

The manager daemon calls the ``notify`` function on all active modules
when certain important pieces of cluster state are updated, such as the
cluster maps.

The actual data is not passed into this function; rather, it is a cue for
the module to go and read the relevant structure if it is interested. Most
modules ignore most types of notification: to ignore a notification,
simply return from this function without doing anything.

.. automethod:: MgrModule.notify
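
The sketch below ignores every notification except OSDMap updates. It assumes
that the notification type compares equal to the string ``"osd_map"``; check
the ``notify`` documentation above for the exact types used by your Ceph
version. The plain ``Module`` class and its ``osd_map_dirty`` attribute are
hypothetical.

.. code-block:: python

    class Module:  # a real module would inherit from MgrModule
        def __init__(self) -> None:
            # Set when the OSDMap changed and should be read again.
            self.osd_map_dirty = False

        def notify(self, notify_type, notify_id) -> None:
            if notify_type != "osd_map":
                # Not interesting to this module: just return.
                return
            # No data is passed in; only record that a re-read is due.
            self.osd_map_dirty = True

Keeping ``notify`` this small and doing the actual re-read elsewhere, for
example in ``serve``, avoids doing slow work inside the callback.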

Accessing RADOS or CephFS
-------------------------

If you want to use the librados python API to access data stored in
the Ceph cluster, you can access the ``rados`` attribute of your
``MgrModule`` instance. This is an instance of ``rados.Rados`` which
has been constructed for you using the existing Ceph context (an internal
detail of the C++ Ceph code) of the mgr daemon.

Always use this specially constructed librados instance instead of
constructing one by hand.

Similarly, if you are using libcephfs to access the file system, then
use the libcephfs ``create_with_rados`` to construct it from the
``MgrModule.rados`` librados instance, and thereby inherit the correct context.

Remember that your module may be running while other parts of the cluster
are down: do not assume that librados or libcephfs calls will return
promptly -- consider whether to use timeouts or to block if the rest of
the cluster is not fully available.

Implementing standby mode
-------------------------

For some modules, it is useful to run on standby manager daemons as well
as on the active daemon. For example, an HTTP server can usefully
serve HTTP redirect responses from the standby managers so that
the user can point their browser at any of the manager daemons without
having to worry about which one is active.

Standby manager daemons look for a subclass of ``StandbyModule``
in each module. If the class is not found then the module is not
used at all on standby daemons. If the class is found, then
its ``serve`` method is called. Implementations of ``StandbyModule``
must inherit from ``mgr_module.MgrStandbyModule``.

The interface of ``MgrStandbyModule`` is much more restricted than that of
``MgrModule`` -- none of the Ceph cluster state is available to
the module. ``serve`` and ``shutdown`` methods are used in the same
way as a normal module class. The ``get_active_uri`` method enables
the standby module to discover the address of its active peer in
order to make redirects. See the ``MgrStandbyModule`` definition
in the Ceph source code for the full list of methods.

For an example of how to use this interface, look at the source code
of the ``dashboard`` module.

Communicating between modules
-----------------------------

Modules can invoke member functions of other modules.

.. automethod:: MgrModule.remote

Be sure to handle ``ImportError`` to deal with the case that the desired
module is not enabled.

If the remote method raises a Python exception, this will be converted
to a ``RuntimeError`` on the calling side, where the message string describes
the exception that was originally thrown. If your logic intends
to handle certain errors cleanly, it is better to modify the remote method
to return an error value instead of raising an exception.

At time of writing, inter-module calls are implemented without
copies or serialization, so when you return a Python object, you're
returning a reference to that object to the calling module. It
is recommended *not* to rely on this reference passing, as in future the
implementation may change to serialize arguments and return
values.
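
Both failure modes described above can be handled in one place. The wrapper
below is only a sketch: ``call_remote`` is a hypothetical helper, and the
choice to return ``None`` on failure is made for the example rather than
taken from Ceph.

.. code-block:: python

    import logging
    from typing import Any, Optional

    log = logging.getLogger(__name__)


    def call_remote(module: Any, target: str, method: str, *args: Any) -> Optional[Any]:
        """Call a method of another module, returning None if that fails."""
        try:
            return module.remote(target, method, *args)
        except ImportError:
            # The target module is not enabled.
            log.info("module %s is not enabled", target)
        except RuntimeError as e:
            # The remote method raised; the message describes the original error.
            log.warning("%s.%s failed: %s", target, method, e)
        return None

Note that ``None`` is then ambiguous if the remote method can legitimately
return ``None``; in that case a dedicated sentinel or a result tuple is the
safer choice.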

Shutting down cleanly
---------------------

If a module implements the ``serve()`` method, it should also implement
the ``shutdown()`` method to shut down cleanly: misbehaving modules
may otherwise prevent clean shutdown of ceph-mgr.
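
One common way to pair the two methods is a ``threading.Event`` that ``serve``
waits on and ``shutdown`` sets. The following is a sketch of that pattern, not
the only possible design. The fallback ``MgrModule`` class exists only so
that the snippet can run outside ceph-mgr, and the 5 second interval and the
``_do_work`` placeholder are arbitrary choices for the example.

.. code-block:: python

    import threading

    try:
        from mgr_module import MgrModule  # only importable inside ceph-mgr
    except ImportError:
        class MgrModule:  # minimal stand-in so the sketch runs elsewhere
            def __init__(self, *args, **kwargs):
                pass


    class Module(MgrModule):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self._stop = threading.Event()

        def _do_work(self) -> None:
            """Placeholder for the module's periodic work."""

        def serve(self) -> None:
            # Block until shutdown() is called, working once per interval.
            while not self._stop.is_set():
                self._do_work()
                self._stop.wait(timeout=5)

        def shutdown(self) -> None:
            # Wake up serve() immediately so that it can return.
            self._stop.set()

Waiting on the event instead of calling ``time.sleep`` lets ``shutdown`` take
effect at once rather than after the remainder of the interval.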

Limitations
-----------

It is not possible to call back into C++ code from a module's
``__init__()`` method. For example, calling ``self.get_module_option()`` at
this point will result in an assertion failure in ceph-mgr. For modules
that implement the ``serve()`` method, it usually makes sense to do most
initialization inside that method instead.

Debugging
---------

We can always use the :ref:`mgr module dev logging` facility
to debug a ceph-mgr module. But some of us might miss `PDB`_ and the
interactive Python interpreter. Yes, we can have them as well when developing
ceph-mgr modules! ``ceph_mgr_repl.py`` can drop you into an interactive shell
that talks to the ``selftest`` module. With this tool, one can peek and poke
the ceph-mgr module, and use all the exposed facilities in much the same way
as we use the Python command line interpreter. To use ``ceph_mgr_repl.py``,
we need to:

#. prepare a Ceph cluster
#. enable the ``selftest`` module
#. set up the necessary environment variables
#. launch the tool

.. _PDB: https://docs.python.org/3/library/pdb.html

The following is a sample session, in which the Ceph version is queried by
entering ``print(mgr.version)`` at the prompt. Later, the
``timeit`` module is imported to measure the execution time of
``mgr.get_mgr_id()``.

.. code-block:: console

    $ cd build
    $ MDS=0 MGR=1 OSD=3 MON=1 ../src/vstart.sh -n -x
    $ bin/ceph mgr module enable selftest
    $ ../src/pybind/ceph_mgr_repl.py --show-env
    $ export PYTHONPATH=/home/me/ceph/src/pybind:/home/me/ceph/build/lib/cython_modules/lib.3:/home/me/ceph/src/python-common:$PYTHONPATH
    $ export LD_LIBRARY_PATH=/home/me/ceph/build/lib:$LD_LIBRARY_PATH
    $ export PYTHONPATH=/home/me/ceph/src/pybind:/home/me/ceph/build/lib/cython_modules/lib.3:/home/me/ceph/src/python-common:$PYTHONPATH
    $ export LD_LIBRARY_PATH=/home/me/ceph/build/lib:$LD_LIBRARY_PATH
    $ ../src/pybind/ceph_mgr_repl.py
    Python 3.9.2 (default, Feb 28 2021, 17:03:44)
    [GCC 10.2.1 20210110] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    (MgrModuleInteractiveConsole)
    [mgr self-test eval] >>> print(mgr.version)
    ceph version Development (no_version) quincy (dev)
    [mgr self-test eval] >>> from timeit import timeit
    [mgr self-test eval] >>> timeit(mgr.get_mgr_id)
    0.16303414600042743
    [mgr self-test eval] >>>

If you want to "talk" to a ceph-mgr module other than ``selftest`` using
this tool, you can add a command to the module you want to debug,
exactly as the ``mgr self-test eval`` command was added to ``selftest``.
Alternatively, you can make this simpler by promoting the ``eval()`` method
to a dedicated `Mixin`_ class, inheriting your ``MgrModule`` subclass from it,
and defining a command with it. Assuming the prefix of the command is
``mgr my-module eval``, one can just run

.. prompt:: bash $

    ../src/pybind/ceph_mgr_repl.py --prefix "mgr my-module eval"


.. _Mixin: https://en.wikipedia.org/wiki/Mixin

Is something missing?
---------------------

The ceph-mgr Python interface is not set in stone. If you have a need
that is not satisfied by the current interface, please bring it up
on the ceph-devel mailing list. While it is desired to avoid bloating
the interface, it is not generally very hard to expose existing data
to the Python code when there is a good reason.