==================
Cephadm Operations
==================

.. _watching_cephadm_logs:

Watching cephadm log messages
=============================

Cephadm writes logs to the ``cephadm`` cluster log channel. You can
monitor Ceph's activity in real time by reading the logs as they fill
up. Run the following command to see the logs in real time:

.. prompt:: bash #

   ceph -W cephadm

By default, this command shows info-level events and above. To see
debug-level messages as well as info-level events, run the following
commands:

.. prompt:: bash #

   ceph config set mgr mgr/cephadm/log_to_cluster_level debug
   ceph -W cephadm --watch-debug

.. warning::

   The debug messages are very verbose!

You can see recent events by running the following command:

.. prompt:: bash #

   ceph log last cephadm

These events are also logged to the ``ceph.cephadm.log`` file on
monitor hosts as well as to the monitor daemons' stderr.


.. _cephadm-logs:

Ceph daemon logs
================

Logging to journald
-------------------

Ceph daemons traditionally write logs to ``/var/log/ceph``. By default,
however, cephadm-deployed daemons log to journald, and these logs are
captured by the container runtime environment. They are accessible via
``journalctl``.

.. note:: Prior to Quincy, ceph daemons logged to stderr.

Example of logging to journald
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For example, to view the logs for the daemon ``mon.foo`` for a cluster
with ID ``5c5a50ae-272a-455d-99e9-32c6a013e694``, the command would be
something like:

.. prompt:: bash #

   journalctl -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo

This works well for normal operations when logging levels are low.

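To follow new log entries as they arrive, the same unit can be watched
with ``journalctl``'s ``-f`` (follow) flag:

.. prompt:: bash #

   journalctl -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo -f
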

Logging to files
----------------

You can also configure Ceph daemons to log to files instead of to
journald if you prefer logs to appear in files (as they did in earlier,
pre-cephadm, pre-Octopus versions of Ceph). When Ceph logs to files,
the logs appear in ``/var/log/ceph/<cluster-fsid>``. If you choose to
configure Ceph to log to files instead of to journald, remember to
configure Ceph so that it will not log to journald (the commands for
this are covered below).

Enabling logging to files
~~~~~~~~~~~~~~~~~~~~~~~~~

To enable logging to files, run the following commands:

.. prompt:: bash #

   ceph config set global log_to_file true
   ceph config set global mon_cluster_log_to_file true

Disabling logging to journald
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you choose to log to files, we recommend disabling logging to journald or
else everything will be logged twice. Run the following commands to disable
logging to stderr and to journald:

.. prompt:: bash #

   ceph config set global log_to_stderr false
   ceph config set global mon_cluster_log_to_stderr false
   ceph config set global log_to_journald false
   ceph config set global mon_cluster_log_to_journald false

.. note:: You can change the default by passing ``--log-to-file`` when
   bootstrapping a new cluster.

Modifying the log retention schedule
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, cephadm sets up log rotation on each host to rotate these
files. You can configure the logging retention schedule by modifying
``/etc/logrotate.d/ceph.<cluster-fsid>``.
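
As an illustration, a stanza in that file might look something like the
following (the exact contents written by cephadm vary between releases, so
treat this as a sketch of the logrotate format rather than the canonical
file):

.. code-block:: console

   /var/log/ceph/<cluster-fsid>/*.log {
       rotate 7
       daily
       compress
       missingok
   }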

Data location
=============

Cephadm stores daemon data and logs in different locations than did
older, pre-cephadm (pre-Octopus) versions of Ceph:

* ``/var/log/ceph/<cluster-fsid>`` contains all cluster logs. By
  default, cephadm logs via stderr and the container runtime. These
  logs will not exist unless you have enabled logging to files as
  described in :ref:`cephadm-logs`.
* ``/var/lib/ceph/<cluster-fsid>`` contains all cluster daemon data
  (besides logs).
* ``/var/lib/ceph/<cluster-fsid>/<daemon-name>`` contains all data for
  an individual daemon.
* ``/var/lib/ceph/<cluster-fsid>/crash`` contains crash reports for
  the cluster.
* ``/var/lib/ceph/<cluster-fsid>/removed`` contains old daemon
  data directories for stateful daemons (e.g., monitor, prometheus)
  that have been removed by cephadm.

Disk usage
----------

Because a few Ceph daemons (notably, the monitors and prometheus) store a
large amount of data in ``/var/lib/ceph``, we recommend moving this
directory to its own disk, partition, or logical volume so that it does not
fill up the root file system.
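
To see how much space a cluster's state is currently consuming on a host, a
quick check with standard tools (substituting your cluster's fsid) is:

.. prompt:: bash #

   du -sh /var/lib/ceph/<cluster-fsid>
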

Health checks
=============

The cephadm module provides additional health checks to supplement the
default health checks provided by the cluster. These additional health
checks fall into two categories:

- **cephadm operations**: Health checks in this category are always
  executed when the cephadm module is active.
- **cluster configuration**: These health checks are *optional*, and
  focus on the configuration of the hosts in the cluster.

CEPHADM Operations
------------------

CEPHADM_PAUSED
~~~~~~~~~~~~~~

This indicates that cephadm background work has been paused with
``ceph orch pause``. Cephadm continues to perform passive monitoring
activities (like checking host and daemon status), but it will not
make any changes (like deploying or removing daemons).

Resume cephadm work by running the following command:

.. prompt:: bash #

   ceph orch resume

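To confirm whether background work is currently paused, the orchestrator's
overall state can be inspected; the output reports the active backend and,
in recent releases, whether it is paused:

.. prompt:: bash #

   ceph orch status
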
.. _cephadm-stray-host:

CEPHADM_STRAY_HOST
~~~~~~~~~~~~~~~~~~

This indicates that one or more hosts have Ceph daemons that are
running, but are not registered as hosts managed by *cephadm*. This
means that those services cannot currently be managed by cephadm
(e.g., restarted, upgraded, included in ``ceph orch ps``).

* You can manage the host(s) by running the following command:

  .. prompt:: bash #

    ceph orch host add *<hostname>*

  .. note::

    You might need to configure SSH access to the remote host
    before this will work.

* See :ref:`cephadm-fqdn` for more information about host names and
  domain names.

* Alternatively, you can manually connect to the host and ensure that
  services on that host are removed or migrated to a host that is
  managed by *cephadm*.

* This warning can be disabled entirely by running the following
  command:

  .. prompt:: bash #

    ceph config set mgr mgr/cephadm/warn_on_stray_hosts false

CEPHADM_STRAY_DAEMON
~~~~~~~~~~~~~~~~~~~~

One or more Ceph daemons are running but are not managed by
*cephadm*. This may be because they were deployed using a different
tool, or because they were started manually. Those
services cannot currently be managed by cephadm (e.g., restarted,
upgraded, or included in ``ceph orch ps``).

* If the daemon is a stateful one (monitor or OSD), it should be adopted
  by cephadm; see :ref:`cephadm-adoption`. For stateless daemons, it is
  usually easiest to provision a new daemon with the ``ceph orch apply``
  command and then stop the unmanaged daemon.

* If the stray daemon(s) are running on hosts not managed by cephadm, you
  can manage the host(s) by running the following command:

  .. prompt:: bash #

    ceph orch host add *<hostname>*

  .. note::

    You might need to configure SSH access to the remote host
    before this will work.

* See :ref:`cephadm-fqdn` for more information about host names and
  domain names.

* This warning can be disabled entirely by running the following command:

  .. prompt:: bash #

    ceph config set mgr mgr/cephadm/warn_on_stray_daemons false

CEPHADM_HOST_CHECK_FAILED
~~~~~~~~~~~~~~~~~~~~~~~~~

One or more hosts have failed the basic cephadm host check, which verifies
that (1) the host is reachable and cephadm can be executed there, and (2)
that the host satisfies basic prerequisites, like a working container
runtime (podman or docker) and working time synchronization.
If this test fails, cephadm will not be able to manage services on that host.

You can manually run this check by running the following command:

.. prompt:: bash #

   ceph cephadm check-host *<hostname>*

You can remove a broken host from management by running the following command:

.. prompt:: bash #

   ceph orch host rm *<hostname>*

You can disable this health warning by running the following command:

.. prompt:: bash #

   ceph config set mgr mgr/cephadm/warn_on_failed_host_check false

Cluster Configuration Checks
----------------------------

Cephadm periodically scans each of the hosts in the cluster in order
to understand the state of the OS, disks, NICs, etc. These facts can
then be analysed for consistency across the hosts in the cluster to
identify any configuration anomalies.

Enabling Cluster Configuration Checks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The configuration checks are an **optional** feature, and are enabled
by running the following command:

.. prompt:: bash #

   ceph config set mgr mgr/cephadm/config_checks_enabled true

States Returned by Cluster Configuration Checks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The configuration checks are triggered after each host scan, which runs
once per minute. The cephadm log entries will show the current state and
outcome of the configuration checks as follows:

Disabled state (config_checks_enabled false):

.. code-block:: bash

   ALL cephadm checks are disabled, use 'ceph config set mgr mgr/cephadm/config_checks_enabled true' to enable

Enabled state (config_checks_enabled true):

.. code-block:: bash

   CEPHADM 8/8 checks enabled and executed (0 bypassed, 0 disabled). No issues detected

Managing Configuration Checks (subcommands)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The configuration checks themselves are managed through several cephadm
subcommands.

To determine whether the configuration checks are enabled, run the following
command:

.. prompt:: bash #

   ceph cephadm config-check status

This command returns the status of the configuration checker as either
"Enabled" or "Disabled".

To list all the configuration checks and their current states, run the
following command:

.. code-block:: console

   # ceph cephadm config-check ls

     NAME             HEALTHCHECK                      STATUS   DESCRIPTION
     kernel_security  CEPHADM_CHECK_KERNEL_LSM         enabled  checks SELINUX/Apparmor profiles are consistent across cluster hosts
     os_subscription  CEPHADM_CHECK_SUBSCRIPTION       enabled  checks subscription states are consistent for all cluster hosts
     public_network   CEPHADM_CHECK_PUBLIC_MEMBERSHIP  enabled  check that all hosts have a NIC on the Ceph public_network
     osd_mtu_size     CEPHADM_CHECK_MTU                enabled  check that OSD hosts share a common MTU setting
     osd_linkspeed    CEPHADM_CHECK_LINKSPEED          enabled  check that OSD hosts share a common linkspeed
     network_missing  CEPHADM_CHECK_NETWORK_MISSING    enabled  checks that the cluster/public networks defined exist on the Ceph hosts
     ceph_release     CEPHADM_CHECK_CEPH_RELEASE       enabled  check for Ceph version consistency - ceph daemons should be on the same release (unless upgrade is active)
     kernel_version   CEPHADM_CHECK_KERNEL_VERSION     enabled  checks that the MAJ.MIN of the kernel on Ceph hosts is consistent

The name of each configuration check can be used to enable or disable a
specific check by running a command of the following form:

.. prompt:: bash #

   ceph cephadm config-check disable <name>

For example:

.. prompt:: bash #

   ceph cephadm config-check disable kernel_security

CEPHADM_CHECK_KERNEL_LSM
~~~~~~~~~~~~~~~~~~~~~~~~
Each host within the cluster is expected to operate within the same Linux
Security Module (LSM) state. For example, if the majority of the hosts are
running with SELINUX in enforcing mode, any host not running in this mode is
flagged as an anomaly and a healthcheck (WARNING) state is raised.

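On an SELinux-based host, the current LSM mode that feeds into this check can
be viewed with the standard OS command (not a cephadm subcommand):

.. prompt:: bash #

   getenforce
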
CEPHADM_CHECK_SUBSCRIPTION
~~~~~~~~~~~~~~~~~~~~~~~~~~
This check relates to the status of vendor subscription. This check is
performed only for hosts using RHEL, but helps to confirm that all hosts are
covered by an active subscription, which ensures that patches and updates are
available.

CEPHADM_CHECK_PUBLIC_MEMBERSHIP
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All members of the cluster should have NICs configured on at least one of the
public network subnets. Hosts that are not on the public network will rely on
routing, which may affect performance.

CEPHADM_CHECK_MTU
~~~~~~~~~~~~~~~~~
The MTU of the NICs on OSD hosts can be a key factor in consistent
performance. This check examines hosts that are running OSD services to
ensure that the MTU is configured consistently within the cluster. This is
determined by establishing the MTU setting that the majority of hosts are
using. Any anomalies result in a Ceph health check.

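If this check raises a warning, the MTU of each interface on a suspect host
can be compared against its peers with a standard OS command (the ``mtu``
value is shown for each link in the output):

.. prompt:: bash #

   ip link show
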
CEPHADM_CHECK_LINKSPEED
~~~~~~~~~~~~~~~~~~~~~~~
This check is similar to the MTU check. Linkspeed consistency is a factor in
consistent cluster performance, just as the MTU of the NICs on the OSDs is.
This check determines the linkspeed shared by the majority of OSD hosts, and a
health check is run for any hosts that are set at a lower linkspeed rate.

CEPHADM_CHECK_NETWORK_MISSING
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ``public_network`` and ``cluster_network`` settings support subnet
definitions for IPv4 and IPv6. If these settings are not found on any host in
the cluster, a health check is raised.

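The subnet definitions that this check looks for can be read back from the
cluster configuration:

.. prompt:: bash #

   ceph config get mon public_network
   ceph config get mon cluster_network
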
CEPHADM_CHECK_CEPH_RELEASE
~~~~~~~~~~~~~~~~~~~~~~~~~~
Under normal operations, the Ceph cluster runs all of its daemons under the
same Ceph release (for example, Octopus). This check determines the active
release for each daemon, and reports any anomalies as a healthcheck. *This
check is bypassed if an upgrade process is active within the cluster.*

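The per-daemon version breakdown that this check evaluates can also be
inspected directly:

.. prompt:: bash #

   ceph versions
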
CEPHADM_CHECK_KERNEL_VERSION
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The OS kernel version (maj.min) is checked for consistency across the hosts.
The kernel version of the majority of the hosts is used as the basis for
identifying anomalies.
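
To compare kernel versions manually, run the standard OS command on each
host:

.. prompt:: bash #

   uname -r
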

.. _client_keyrings_and_configs:

Client keyrings and configs
===========================

Cephadm can distribute copies of the ``ceph.conf`` file and client keyring
files to hosts. It is usually a good idea to store a copy of the config and
``client.admin`` keyring on any host used to administer the cluster via the
CLI. By default, cephadm does this for any nodes that have the ``_admin``
label (which normally includes the bootstrap host).

When a client keyring is placed under management, cephadm will:

  - build a list of target hosts based on the specified placement spec (see
    :ref:`orchestrator-cli-placement-spec`)
  - store a copy of the ``/etc/ceph/ceph.conf`` file on the specified host(s)
  - store a copy of the keyring file on the specified host(s)
  - update the ``ceph.conf`` file as needed (e.g., due to a change in the
    cluster monitors)
  - update the keyring file if the entity's key is changed (e.g., via ``ceph
    auth ...`` commands)
  - ensure that the keyring file has the specified ownership and specified mode
  - remove the keyring file when client keyring management is disabled
  - remove the keyring file from old hosts if the keyring placement spec is
    updated (as needed)

Listing Client Keyrings
-----------------------

To see the list of client keyrings that are currently under management, run
the following command:

.. prompt:: bash #

   ceph orch client-keyring ls

Putting a Keyring Under Management
----------------------------------

To put a keyring under management, run a command of the following form:

.. prompt:: bash #

   ceph orch client-keyring set <entity> <placement> [--mode=<mode>] [--owner=<uid>.<gid>] [--path=<path>]

- By default, the *path* is ``/etc/ceph/client.{entity}.keyring``, which is
  where Ceph looks by default. Be careful when specifying alternate locations,
  as existing files may be overwritten.
- A placement of ``*`` (all hosts) is common.
- The mode defaults to ``0600`` and ownership to ``0:0`` (user root, group
  root).

For example, to create a ``client.rbd`` key, deploy it to hosts with the
``rbd-client`` label, and make it group-readable by uid/gid 107 (qemu), run
the following commands:

.. prompt:: bash #

   ceph auth get-or-create-key client.rbd mon 'profile rbd' mgr 'profile rbd' osd 'profile rbd pool=my_rbd_pool'
   ceph orch client-keyring set client.rbd label:rbd-client --owner 107:107 --mode 640

The resulting keyring file is:

.. code-block:: console

   -rw-r-----. 1 qemu qemu 156 Apr 21 08:47 /etc/ceph/client.client.rbd.keyring

Disabling Management of a Keyring File
--------------------------------------

To disable management of a keyring file, run a command of the following form:

.. prompt:: bash #

   ceph orch client-keyring rm <entity>

.. note::

   This deletes any keyring files for this entity that were previously written
   to cluster nodes.

.. _etc_ceph_conf_distribution:

/etc/ceph/ceph.conf
===================

Distributing ceph.conf to hosts that have no keyrings
-----------------------------------------------------

It might be useful to distribute ``ceph.conf`` files to hosts without an
associated client keyring file. By default, cephadm deploys a ``ceph.conf``
file only to hosts where a client keyring is also distributed (see above).
To write config files to hosts without client keyrings, run the following
command:

.. prompt:: bash #

   ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true

Using Placement Specs to specify which hosts get configs
---------------------------------------------------------

By default, the configs are written to all hosts (i.e., those listed by ``ceph
orch host ls``). To specify which hosts get a ``ceph.conf``, run a command of
the following form:

.. prompt:: bash #

   ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts <placement spec>

Distributing ceph.conf to hosts tagged with bare_config
-------------------------------------------------------

For example, to distribute configs to hosts with the ``bare_config`` label,
run the following command:

.. prompt:: bash #

   ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts label:bare_config

(See :ref:`orchestrator-cli-placement-spec` for more information about
placement specs.)

Purging a cluster
=================

.. danger:: THIS OPERATION WILL DESTROY ALL DATA STORED IN THIS CLUSTER

To destroy a cluster and delete all data stored in it, first pause cephadm
so that it does not deploy any new daemons:

.. prompt:: bash #

   ceph orch pause

Then verify the FSID of the cluster:

.. prompt:: bash #

   ceph fsid

Finally, purge the Ceph daemons from all hosts in the cluster:

.. prompt:: bash #

   # For each host:
   cephadm rm-cluster --force --zap-osds --fsid <fsid>
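
Afterwards, you can confirm on each host that no Ceph containers are still
running (assuming podman as the container runtime; substitute ``docker`` if
applicable):

.. prompt:: bash #

   podman ps -a --filter name=ceph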