1 ==================
2 Cephadm Operations
3 ==================
4
5 .. _watching_cephadm_logs:
6
7 Watching cephadm log messages
8 =============================
9
10 Cephadm writes logs to the ``cephadm`` cluster log channel. You can
11 monitor Ceph's activity in real time by reading the logs as they fill
12 up. Run the following command to see the logs in real time:
13
14 .. prompt:: bash #
15
16 ceph -W cephadm
17
18 By default, this command shows info-level events and above. To see
19 debug-level messages as well as info-level events, run the following
20 commands:
21
22 .. prompt:: bash #
23
24 ceph config set mgr mgr/cephadm/log_to_cluster_level debug
25 ceph -W cephadm --watch-debug
26
27 .. warning::
28
29 The debug messages are very verbose!
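
When you are done debugging, you can lower the channel's verbosity again. The
following is a minimal sketch that assumes ``info`` is the desired (default)
level for ``mgr/cephadm/log_to_cluster_level``:

.. prompt:: bash #

   ceph config set mgr mgr/cephadm/log_to_cluster_level info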
30
31 You can see recent events by running the following command:
32
33 .. prompt:: bash #
34
35 ceph log last cephadm
36
37 These events are also logged to the ``ceph.cephadm.log`` file on
38 monitor hosts as well as to the monitor daemons' stderr.
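
If you want to limit how much history is shown, ``ceph log last`` also accepts
an optional event count, log level, and channel. The invocation below is an
illustrative sketch that assumes the ``log last [count] [level] [channel]``
argument order:

.. prompt:: bash #

   # show only the 25 most recent info-level cephadm events (illustrative)
   ceph log last 25 info cephadm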
39
40
41 .. _cephadm-logs:
42
43
44 Ceph daemon control
45 ===================
46
47 Starting and stopping daemons
48 -----------------------------
49
50 You can stop, start, or restart a daemon with:
51
52 .. prompt:: bash #
53
54 ceph orch daemon stop <name>
55 ceph orch daemon start <name>
56 ceph orch daemon restart <name>
57
You can also do the same for all daemons of a service with:
59
60 .. prompt:: bash #
61
62 ceph orch stop <name>
63 ceph orch start <name>
64 ceph orch restart <name>
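
For example, to restart a single OSD daemon and then all daemons of an RGW
service (``osd.7`` and ``rgw.myrealm`` are hypothetical names; substitute
names reported by ``ceph orch ps`` and ``ceph orch ls``):

.. prompt:: bash #

   # "osd.7" and "rgw.myrealm" are placeholders for your own daemon/service names
   ceph orch daemon restart osd.7
   ceph orch restart rgw.myrealm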
65
66
67 Redeploying or reconfiguring a daemon
68 -------------------------------------
69
70 The container for a daemon can be stopped, recreated, and restarted with
71 the ``redeploy`` command:
72
73 .. prompt:: bash #
74
75 ceph orch daemon redeploy <name> [--image <image>]
76
77 A container image name can optionally be provided to force a
78 particular image to be used (instead of the image specified by the
79 ``container_image`` config value).
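
For example, to redeploy one RGW daemon from a specific image (both the daemon
name and the image tag below are placeholders; any valid Ceph container image
reference can be used):

.. prompt:: bash #

   # daemon name and image tag are examples only
   ceph orch daemon redeploy rgw.myrealm.host1.abcdef --image quay.io/ceph/ceph:v18.2.4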
80
81 If only the ceph configuration needs to be regenerated, you can also
82 issue a ``reconfig`` command, which will rewrite the ``ceph.conf``
83 file but will not trigger a restart of the daemon.
84
85 .. prompt:: bash #
86
87 ceph orch daemon reconfig <name>
88
89
Rotating a daemon's authentication key
--------------------------------------
92
93 All Ceph and gateway daemons in the cluster have a secret key that is used to connect
94 to and authenticate with the cluster. This key can be rotated (i.e., replaced with a
95 new key) with the following command:
96
97 .. prompt:: bash #
98
99 ceph orch daemon rotate-key <name>
100
For MDS, OSD, and MGR daemons, this does not require a daemon restart. For other
daemons (e.g., RGW), however, the daemon may need to be restarted in order to switch
to the new key.
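
For example, to rotate the key of a single OSD daemon (``osd.3`` is a
placeholder; use a daemon name reported by ``ceph orch ps``):

.. prompt:: bash #

   # "osd.3" is an example daemon name
   ceph orch daemon rotate-key osd.3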
103
104
105 Ceph daemon logs
106 ================
107
108 Logging to journald
109 -------------------
110
Ceph daemons traditionally wrote logs to ``/var/log/ceph``. By default,
cephadm-deployed daemons log to journald instead, and the logs are captured by
the container runtime environment. They are accessible via ``journalctl``.
114
115 .. note:: Prior to Quincy, ceph daemons logged to stderr.
116
117 Example of logging to journald
118 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
119
120 For example, to view the logs for the daemon ``mon.foo`` for a cluster
121 with ID ``5c5a50ae-272a-455d-99e9-32c6a013e694``, the command would be
122 something like:
123
124 .. prompt:: bash #
125
126 journalctl -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo
127
128 This works well for normal operations when logging levels are low.
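
To follow the same unit's log in real time, or to limit output to a recent
window, the usual ``journalctl`` options can be combined with the unit name
(the fsid and daemon name are the same placeholders as above):

.. prompt:: bash #

   # follow the log live
   journalctl -f -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo
   # show only the last hour
   journalctl --since "1 hour ago" -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo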
129
130 Logging to files
131 ----------------
132
133 You can also configure Ceph daemons to log to files instead of to
134 journald if you prefer logs to appear in files (as they did in earlier,
135 pre-cephadm, pre-Octopus versions of Ceph). When Ceph logs to files,
136 the logs appear in ``/var/log/ceph/<cluster-fsid>``. If you choose to
137 configure Ceph to log to files instead of to journald, remember to
138 configure Ceph so that it will not log to journald (the commands for
139 this are covered below).
140
141 Enabling logging to files
142 ~~~~~~~~~~~~~~~~~~~~~~~~~
143
144 To enable logging to files, run the following commands:
145
146 .. prompt:: bash #
147
148 ceph config set global log_to_file true
149 ceph config set global mon_cluster_log_to_file true
150
151 Disabling logging to journald
152 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
153
154 If you choose to log to files, we recommend disabling logging to journald or else
everything will be logged twice. Run the following commands to disable logging
to stderr and to journald:
157
158 .. prompt:: bash #
159
160 ceph config set global log_to_stderr false
161 ceph config set global mon_cluster_log_to_stderr false
162 ceph config set global log_to_journald false
163 ceph config set global mon_cluster_log_to_journald false
164
.. note:: You can change the default by passing ``--log-to-file`` when
   bootstrapping a new cluster.
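
As an illustrative sketch, a bootstrap invocation that enables file logging
from the start might look like this (the monitor IP address is a placeholder):

.. prompt:: bash #

   # 10.0.0.1 is an example monitor IP
   cephadm bootstrap --mon-ip 10.0.0.1 --log-to-file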
167
168 Modifying the log retention schedule
169 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
170
171 By default, cephadm sets up log rotation on each host to rotate these
172 files. You can configure the logging retention schedule by modifying
173 ``/etc/logrotate.d/ceph.<cluster-fsid>``.
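
For example, to review the schedule that is currently in place and to dry-run
it (the fsid is a placeholder; ``logrotate -d`` only simulates rotation and
makes no changes):

.. prompt:: bash #

   # the fsid below is an example
   cat /etc/logrotate.d/ceph.5c5a50ae-272a-455d-99e9-32c6a013e694
   logrotate -d /etc/logrotate.d/ceph.5c5a50ae-272a-455d-99e9-32c6a013e694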
174
175
176 Data location
177 =============
178
Cephadm stores daemon data and logs in different locations than older,
pre-cephadm (pre-Octopus) versions of Ceph did:
181
182 * ``/var/log/ceph/<cluster-fsid>`` contains all cluster logs. By
183 default, cephadm logs via stderr and the container runtime. These
184 logs will not exist unless you have enabled logging to files as
185 described in `cephadm-logs`_.
186 * ``/var/lib/ceph/<cluster-fsid>`` contains all cluster daemon data
187 (besides logs).
188 * ``/var/lib/ceph/<cluster-fsid>/<daemon-name>`` contains all data for
189 an individual daemon.
190 * ``/var/lib/ceph/<cluster-fsid>/crash`` contains crash reports for
191 the cluster.
192 * ``/var/lib/ceph/<cluster-fsid>/removed`` contains old daemon
193 data directories for stateful daemons (e.g., monitor, prometheus)
194 that have been removed by cephadm.
195
196 Disk usage
197 ----------
198
Because a few Ceph daemons (notably, the monitors and Prometheus) store a
large amount of data in ``/var/lib/ceph``, we recommend moving this
directory to its own disk, partition, or logical volume so that it does not
fill up the root file system.
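
A quick way to see which daemons are consuming the most space under this
directory is to use standard tools on the host in question (the fsid is a
placeholder):

.. prompt:: bash #

   du -sh /var/lib/ceph/5c5a50ae-272a-455d-99e9-32c6a013e694/*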
203
204
205 Health checks
206 =============
207 The cephadm module provides additional health checks to supplement the
default health checks provided by the cluster. These additional health
209 checks fall into two categories:
210
211 - **cephadm operations**: Health checks in this category are always
212 executed when the cephadm module is active.
213 - **cluster configuration**: These health checks are *optional*, and
214 focus on the configuration of the hosts in the cluster.
215
216 CEPHADM Operations
217 ------------------
218
219 CEPHADM_PAUSED
220 ~~~~~~~~~~~~~~
221
222 This indicates that cephadm background work has been paused with
223 ``ceph orch pause``. Cephadm continues to perform passive monitoring
224 activities (like checking host and daemon status), but it will not
225 make any changes (like deploying or removing daemons).
226
227 Resume cephadm work by running the following command:
228
229 .. prompt:: bash #
230
231 ceph orch resume
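
You can check whether background work is currently paused with ``ceph orch
status``; its output indicates whether the orchestrator has been paused:

.. prompt:: bash #

   ceph orch status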
232
233 .. _cephadm-stray-host:
234
235 CEPHADM_STRAY_HOST
236 ~~~~~~~~~~~~~~~~~~
237
238 This indicates that one or more hosts have Ceph daemons that are
239 running, but are not registered as hosts managed by *cephadm*. This
240 means that those services cannot currently be managed by cephadm
(e.g., restarted, upgraded, or included in ``ceph orch ps``).
242
243 * You can manage the host(s) by running the following command:
244
245 .. prompt:: bash #
246
247 ceph orch host add *<hostname>*
248
249 .. note::
250
251 You might need to configure SSH access to the remote host
252 before this will work.
253
254 * See :ref:`cephadm-fqdn` for more information about host names and
255 domain names.
256
257 * Alternatively, you can manually connect to the host and ensure that
258 services on that host are removed or migrated to a host that is
259 managed by *cephadm*.
260
261 * This warning can be disabled entirely by running the following
262 command:
263
264 .. prompt:: bash #
265
266 ceph config set mgr mgr/cephadm/warn_on_stray_hosts false
267
268 CEPHADM_STRAY_DAEMON
269 ~~~~~~~~~~~~~~~~~~~~
270
One or more Ceph daemons are running but are not managed by
*cephadm*. This may be because they were deployed using a different
273 tool, or because they were started manually. Those
274 services cannot currently be managed by cephadm (e.g., restarted,
upgraded, or included in ``ceph orch ps``).
276
277 * If the daemon is a stateful one (monitor or OSD), it should be adopted
278 by cephadm; see :ref:`cephadm-adoption`. For stateless daemons, it is
279 usually easiest to provision a new daemon with the ``ceph orch apply``
280 command and then stop the unmanaged daemon.
281
282 * If the stray daemon(s) are running on hosts not managed by cephadm, you can manage the host(s) by running the following command:
283
284 .. prompt:: bash #
285
286 ceph orch host add *<hostname>*
287
288 .. note::
289
290 You might need to configure SSH access to the remote host
291 before this will work.
292
293 * See :ref:`cephadm-fqdn` for more information about host names and
294 domain names.
295
296 * This warning can be disabled entirely by running the following command:
297
298 .. prompt:: bash #
299
300 ceph config set mgr mgr/cephadm/warn_on_stray_daemons false
301
302 CEPHADM_HOST_CHECK_FAILED
303 ~~~~~~~~~~~~~~~~~~~~~~~~~
304
305 One or more hosts have failed the basic cephadm host check, which verifies
306 that (1) the host is reachable and cephadm can be executed there, and (2)
307 that the host satisfies basic prerequisites, like a working container
308 runtime (podman or docker) and working time synchronization.
If this test fails, cephadm will not be able to manage services on that host.
310
311 You can manually run this check by running the following command:
312
313 .. prompt:: bash #
314
315 ceph cephadm check-host *<hostname>*
316
317 You can remove a broken host from management by running the following command:
318
319 .. prompt:: bash #
320
321 ceph orch host rm *<hostname>*
322
323 You can disable this health warning by running the following command:
324
325 .. prompt:: bash #
326
327 ceph config set mgr mgr/cephadm/warn_on_failed_host_check false
328
329 Cluster Configuration Checks
330 ----------------------------
Cephadm periodically scans each host in the cluster in order
to understand the state of the OS, disks, network interfaces, etc. This information can
333 then be analyzed for consistency across the hosts in the cluster to
334 identify any configuration anomalies.
335
336 Enabling Cluster Configuration Checks
337 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
338
339 These configuration checks are an **optional** feature, and are enabled
340 by running the following command:
341
342 .. prompt:: bash #
343
344 ceph config set mgr mgr/cephadm/config_checks_enabled true
345
346 States Returned by Cluster Configuration Checks
347 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
348
349 Configuration checks are triggered after each host scan. The
350 cephadm log entries will show the current state and outcome of the
351 configuration checks as follows:
352
353 Disabled state (config_checks_enabled false):
354
355 .. code-block:: bash
356
357 ALL cephadm checks are disabled, use 'ceph config set mgr mgr/cephadm/config_checks_enabled true' to enable
358
359 Enabled state (config_checks_enabled true):
360
361 .. code-block:: bash
362
363 CEPHADM 8/8 checks enabled and executed (0 bypassed, 0 disabled). No issues detected
364
365 Managing Configuration Checks (subcommands)
366 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
367
368 The configuration checks themselves are managed through several cephadm subcommands.
369
370 To determine whether the configuration checks are enabled, run the following command:
371
372 .. prompt:: bash #
373
374 ceph cephadm config-check status
375
376 This command returns the status of the configuration checker as either "Enabled" or "Disabled".
377
378
379 To list all the configuration checks and their current states, run the following command:
380
381 .. code-block:: console
382
383 # ceph cephadm config-check ls
384
385 NAME HEALTHCHECK STATUS DESCRIPTION
386 kernel_security CEPHADM_CHECK_KERNEL_LSM enabled check that SELINUX/Apparmor profiles are consistent across cluster hosts
387 os_subscription CEPHADM_CHECK_SUBSCRIPTION enabled check that subscription states are consistent for all cluster hosts
388 public_network CEPHADM_CHECK_PUBLIC_MEMBERSHIP enabled check that all hosts have a network interface on the Ceph public_network
389 osd_mtu_size CEPHADM_CHECK_MTU enabled check that OSD hosts share a common MTU setting
390 osd_linkspeed CEPHADM_CHECK_LINKSPEED enabled check that OSD hosts share a common network link speed
391 network_missing CEPHADM_CHECK_NETWORK_MISSING enabled check that the cluster/public networks as defined exist on the Ceph hosts
392 ceph_release CEPHADM_CHECK_CEPH_RELEASE enabled check for Ceph version consistency: all Ceph daemons should be the same release unless upgrade is in progress
393 kernel_version CEPHADM_CHECK_KERNEL_VERSION enabled checks that the maj.min version of the kernel is consistent across Ceph hosts
394
The name of each configuration check can be used to enable or disable a specific check by running a command of the following form:
397
398 .. prompt:: bash #
399
400 ceph cephadm config-check disable <name>
401
402 For example:
403
404 .. prompt:: bash #
405
406 ceph cephadm config-check disable kernel_security
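
A check that was previously disabled can be re-enabled with the matching
``enable`` subcommand:

.. prompt:: bash #

   ceph cephadm config-check enable kernel_security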
407
408 CEPHADM_CHECK_KERNEL_LSM
409 ~~~~~~~~~~~~~~~~~~~~~~~~
410 Each host within the cluster is expected to operate within the same Linux
411 Security Module (LSM) state. For example, if the majority of the hosts are
running with SELinux in enforcing mode, any host not running in this mode is
flagged as an anomaly, and a health check in a WARNING state is raised.
414
415 CEPHADM_CHECK_SUBSCRIPTION
416 ~~~~~~~~~~~~~~~~~~~~~~~~~~
417 This check relates to the status of OS vendor subscription. This check is
418 performed only for hosts using RHEL and helps to confirm that all hosts are
419 covered by an active subscription, which ensures that patches and updates are
420 available.
421
422 CEPHADM_CHECK_PUBLIC_MEMBERSHIP
423 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
424 All members of the cluster should have a network interface configured on at least one of the
425 public network subnets. Hosts that are not on the public network will rely on
426 routing, which may affect performance.
427
428 CEPHADM_CHECK_MTU
429 ~~~~~~~~~~~~~~~~~
430 The MTU of the network interfaces on OSD hosts can be a key factor in consistent performance. This
431 check examines hosts that are running OSD services to ensure that the MTU is
configured consistently within the cluster. This is done by identifying the
MTU setting used by the majority of hosts; any anomalies result in a
health check.
435
436 CEPHADM_CHECK_LINKSPEED
437 ~~~~~~~~~~~~~~~~~~~~~~~
438 This check is similar to the MTU check. Link speed consistency is a factor in
439 consistent cluster performance, as is the MTU of the OSD node network interfaces.
This check determines the link speed shared by the majority of OSD hosts, and a
health check is raised for any host that is configured with a lower link speed.
442
443 CEPHADM_CHECK_NETWORK_MISSING
444 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
445 The `public_network` and `cluster_network` settings support subnet definitions
446 for IPv4 and IPv6. If these settings are not found on any host in the cluster,
447 a health check is raised.
448
449 CEPHADM_CHECK_CEPH_RELEASE
450 ~~~~~~~~~~~~~~~~~~~~~~~~~~
451 Under normal operations, the Ceph cluster runs daemons that are of the same Ceph
452 release (for example, Reef). This check determines the active release for each daemon, and
reports any anomalies as a health check. *This check is bypassed if an upgrade
is in progress.*
455
456 CEPHADM_CHECK_KERNEL_VERSION
457 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
458 The OS kernel version (maj.min) is checked for consistency across hosts.
459 The kernel version of the majority of the hosts is used as the basis for
460 identifying anomalies.
461
462 .. _client_keyrings_and_configs:
463
464 Client keyrings and configs
465 ===========================
466 Cephadm can distribute copies of the ``ceph.conf`` file and client keyring
467 files to hosts. Starting from versions 16.2.10 (Pacific) and 17.2.1 (Quincy),
468 in addition to the default location ``/etc/ceph/`` cephadm also stores config
469 and keyring files in the ``/var/lib/ceph/<fsid>/config`` directory. It is usually
470 a good idea to store a copy of the config and ``client.admin`` keyring on any host
471 used to administer the cluster via the CLI. By default, cephadm does this for any
472 nodes that have the ``_admin`` label (which normally includes the bootstrap host).
473
474 .. note:: Ceph daemons will still use files on ``/etc/ceph/``. The new configuration
475 location ``/var/lib/ceph/<fsid>/config`` is used by cephadm only. Having this config
476 directory under the fsid helps cephadm to load the configuration associated with
477 the cluster.
478
479
480 When a client keyring is placed under management, cephadm will:
481
482 - build a list of target hosts based on the specified placement spec (see
483 :ref:`orchestrator-cli-placement-spec`)
484 - store a copy of the ``/etc/ceph/ceph.conf`` file on the specified host(s)
485 - store a copy of the ``ceph.conf`` file at ``/var/lib/ceph/<fsid>/config/ceph.conf`` on the specified host(s)
486 - store a copy of the ``ceph.client.admin.keyring`` file at ``/var/lib/ceph/<fsid>/config/ceph.client.admin.keyring`` on the specified host(s)
487 - store a copy of the keyring file on the specified host(s)
488 - update the ``ceph.conf`` file as needed (e.g., due to a change in the cluster monitors)
489 - update the keyring file if the entity's key is changed (e.g., via ``ceph
490 auth ...`` commands)
491 - ensure that the keyring file has the specified ownership and specified mode
492 - remove the keyring file when client keyring management is disabled
493 - remove the keyring file from old hosts if the keyring placement spec is
494 updated (as needed)
495
496 Listing Client Keyrings
497 -----------------------
498
To see the list of client keyrings that are currently under management, run the following command:
500
501 .. prompt:: bash #
502
503 ceph orch client-keyring ls
504
505 Putting a Keyring Under Management
506 ----------------------------------
507
508 To put a keyring under management, run a command of the following form:
509
510 .. prompt:: bash #
511
ceph orch client-keyring set <entity> <placement> [--mode=<mode>] [--owner=<uid>:<gid>] [--path=<path>]
513
514 - By default, the *path* is ``/etc/ceph/client.{entity}.keyring``, which is
515 where Ceph looks by default. Be careful when specifying alternate locations,
516 as existing files may be overwritten.
517 - A placement of ``*`` (all hosts) is common.
518 - The mode defaults to ``0600`` and ownership to ``0:0`` (user root, group root).
519
520 For example, to create a ``client.rbd`` key and deploy it to hosts with the
521 ``rbd-client`` label and make it group readable by uid/gid 107 (qemu), run the
522 following commands:
523
524 .. prompt:: bash #
525
526 ceph auth get-or-create-key client.rbd mon 'profile rbd' mgr 'profile rbd' osd 'profile rbd pool=my_rbd_pool'
527 ceph orch client-keyring set client.rbd label:rbd-client --owner 107:107 --mode 640
528
529 The resulting keyring file is:
530
531 .. code-block:: console
532
533 -rw-r-----. 1 qemu qemu 156 Apr 21 08:47 /etc/ceph/client.client.rbd.keyring
534
535 Disabling Management of a Keyring File
536 --------------------------------------
537
538 To disable management of a keyring file, run a command of the following form:
539
540 .. prompt:: bash #
541
542 ceph orch client-keyring rm <entity>
543
544 .. note::
545
546 This deletes any keyring files for this entity that were previously written
547 to cluster nodes.
548
549 .. _etc_ceph_conf_distribution:
550
551 /etc/ceph/ceph.conf
552 ===================
553
554 Distributing ceph.conf to hosts that have no keyrings
555 -----------------------------------------------------
556
557 It might be useful to distribute ``ceph.conf`` files to hosts without an
associated client keyring file. By default, cephadm deploys a
``ceph.conf`` file only to hosts where a client keyring is also distributed (see
560 above). To write config files to hosts without client keyrings, run the
561 following command:
562
563 .. prompt:: bash #
564
565 ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true
566
Using Placement Specs to specify which hosts get config files
--------------------------------------------------------------
569
570 By default, the configs are written to all hosts (i.e., those listed by ``ceph
571 orch host ls``). To specify which hosts get a ``ceph.conf``, run a command of
572 the following form:
573
574 .. prompt:: bash #
575
576 ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts <placement spec>
577
581 Distributing ceph.conf to hosts tagged with bare_config
582 -------------------------------------------------------
583
584 For example, to distribute configs to hosts with the ``bare_config`` label, run the following command:
585
586 .. prompt:: bash #
587
588 ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts label:bare_config
589
590 (See :ref:`orchestrator-cli-placement-spec` for more information about placement specs.)
591
592 Purging a cluster
593 =================
594
595 .. danger:: THIS OPERATION WILL DESTROY ALL DATA STORED IN THIS CLUSTER
596
To destroy a cluster and delete all data stored in it, first disable the cephadm
module in order to stop all orchestration operations (this avoids deploying new daemons):
599
600 .. prompt:: bash #
601
602 ceph mgr module disable cephadm
603
604 Then verify the FSID of the cluster:
605
606 .. prompt:: bash #
607
608 ceph fsid
609
Finally, purge Ceph daemons from all hosts in the cluster:
611
612 .. prompt:: bash #
613
614 # For each host:
615 cephadm rm-cluster --force --zap-osds --fsid <fsid>
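
After ``rm-cluster`` has been run on every host, you can confirm that nothing
is left behind by listing the daemons cephadm still sees locally (an empty
list is expected; ``cephadm ls`` inspects only the local host, so repeat it on
each host):

.. prompt:: bash #

   cephadm ls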