==================
Cephadm Operations
==================

Watching cephadm log messages
=============================

Cephadm logs to the ``cephadm`` cluster log channel, meaning you can
monitor progress in realtime with::

  # ceph -W cephadm

By default it will show info-level events and above. To see
debug-level messages too::

  # ceph config set mgr mgr/cephadm/log_to_cluster_level debug
  # ceph -W cephadm --watch-debug

Be careful: the debug messages are very verbose!

You can see recent events with::

  # ceph log last cephadm

These events are also logged to the ``ceph.cephadm.log`` file on
monitor hosts and to the monitor daemons' stderr.


.. _cephadm-logs:

Ceph daemon logs
================

Logging to stdout
-----------------

Traditionally, Ceph daemons have logged to ``/var/log/ceph``. By
default, cephadm daemons log to stderr and the logs are
captured by the container runtime environment. For most systems, by
default, these logs are sent to journald and accessible via
``journalctl``.

For example, to view the logs for the daemon ``mon.foo`` for a cluster
with ID ``5c5a50ae-272a-455d-99e9-32c6a013e694``, the command would be
something like::

  journalctl -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo

This works well for normal operations when logging levels are low.
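
To follow a daemon's log in real time, ``journalctl`` can also stream new
entries as they arrive with the ``-f`` flag (the unit name follows the same
``ceph-<cluster-fsid>@<daemon-name>`` pattern as above)::

  journalctl -f -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo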

To disable logging to stderr::

  ceph config set global log_to_stderr false
  ceph config set global mon_cluster_log_to_stderr false

Logging to files
----------------

You can also configure Ceph daemons to log to files instead of stderr,
just like they have in the past. When logging to files, Ceph logs appear
in ``/var/log/ceph/<cluster-fsid>``.

To enable logging to files::

  ceph config set global log_to_file true
  ceph config set global mon_cluster_log_to_file true

We recommend disabling logging to stderr (see above) or else everything
will be logged twice::

  ceph config set global log_to_stderr false
  ceph config set global mon_cluster_log_to_stderr false

By default, cephadm sets up log rotation on each host to rotate these
files. You can configure the logging retention schedule by modifying
``/etc/logrotate.d/ceph.<cluster-fsid>``.


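As an illustration, a rotation policy in ``/etc/logrotate.d/ceph.<cluster-fsid>``
typically looks something like the following. The exact contents written by
cephadm may differ by release; treat this as a sketch of the logrotate format,
not the shipped file::

  /var/log/ceph/<cluster-fsid>/*.log {
      rotate 7
      daily
      compress
      missingok
      notifempty
  }

Adjusting ``rotate`` and the rotation frequency here changes how much log
history is retained on each host.
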
Data location
=============

Cephadm stores daemon data and logs in slightly different locations than
older versions of Ceph:

* ``/var/log/ceph/<cluster-fsid>`` contains all cluster logs. Note
  that by default cephadm logs via stderr and the container runtime,
  so these logs are normally not present.
* ``/var/lib/ceph/<cluster-fsid>`` contains all cluster daemon data
  (besides logs).
* ``/var/lib/ceph/<cluster-fsid>/<daemon-name>`` contains all data for
  an individual daemon.
* ``/var/lib/ceph/<cluster-fsid>/crash`` contains crash reports for
  the cluster.
* ``/var/lib/ceph/<cluster-fsid>/removed`` contains old daemon
  data directories for stateful daemons (e.g., monitor, prometheus)
  that have been removed by cephadm.

Disk usage
----------

Because a few Ceph daemons may store a significant amount of data in
``/var/lib/ceph`` (notably, the monitors and prometheus), we recommend
moving this directory to its own disk, partition, or logical volume so
that it does not fill up the root file system.


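If you place ``/var/lib/ceph`` on its own logical volume, the usual LVM
workflow applies. The volume group and logical volume names below are purely
illustrative; do this before bootstrapping (or with the cluster's daemons
stopped), and copy any existing contents across first::

  lvcreate -n ceph-data -L 100G vg0      # assumes an existing volume group named vg0
  mkfs.xfs /dev/vg0/ceph-data
  mount /dev/vg0/ceph-data /var/lib/ceph

Add a matching entry to ``/etc/fstab`` so the mount persists across reboots.
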
Health checks
=============

The cephadm module provides additional healthchecks to supplement the default
healthchecks provided by the cluster. These additional healthchecks fall into
two categories:

- **cephadm operations**: Healthchecks in this category are always executed when the cephadm module is active.
- **cluster configuration**: These healthchecks are *optional*, and focus on the configuration of the hosts in
  the cluster.

CEPHADM Operations
------------------

CEPHADM_PAUSED
^^^^^^^^^^^^^^

Cephadm background work has been paused with ``ceph orch pause``. Cephadm
continues to perform passive monitoring activities (like checking
host and daemon status), but it will not make any changes (like deploying
or removing daemons).

Resume cephadm work with::

  ceph orch resume
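
If you are unsure whether background work is currently paused, the
orchestrator status command reports the backend state and should indicate a
paused orchestrator (the exact output format varies by release)::

  ceph orch status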

.. _cephadm-stray-host:

CEPHADM_STRAY_HOST
^^^^^^^^^^^^^^^^^^

One or more hosts have running Ceph daemons but are not registered as
hosts managed by *cephadm*. This means that those services cannot
currently be managed by cephadm (e.g., restarted, upgraded, included
in ``ceph orch ps``).

You can manage the host(s) with::

  ceph orch host add *<hostname>*

Note that you may need to configure SSH access to the remote host
before this will work.

Alternatively, you can manually connect to the host and ensure that
services on that host are removed or migrated to a host that is
managed by *cephadm*.

You can also disable this warning entirely with::

  ceph config set mgr mgr/cephadm/warn_on_stray_hosts false

See :ref:`cephadm-fqdn` for more information about host names and
domain names.

CEPHADM_STRAY_DAEMON
^^^^^^^^^^^^^^^^^^^^

One or more Ceph daemons are running but are not managed by
*cephadm*. This may be because they were deployed using a different
tool, or because they were started manually. Those
services cannot currently be managed by cephadm (e.g., restarted,
upgraded, or included in ``ceph orch ps``).

If the daemon is a stateful one (monitor or OSD), it should be adopted
by cephadm; see :ref:`cephadm-adoption`. For stateless daemons, it is
usually easiest to provision a new daemon with the ``ceph orch apply``
command and then stop the unmanaged daemon.

This warning can be disabled entirely with::

  ceph config set mgr mgr/cephadm/warn_on_stray_daemons false

CEPHADM_HOST_CHECK_FAILED
^^^^^^^^^^^^^^^^^^^^^^^^^

One or more hosts have failed the basic cephadm host check, which verifies
that (1) the host is reachable and cephadm can be executed there, and (2)
that the host satisfies basic prerequisites, like a working container
runtime (podman or docker) and working time synchronization.
If this test fails, cephadm will not be able to manage services on that host.

You can manually run this check with::

  ceph cephadm check-host *<hostname>*

You can remove a broken host from management with::

  ceph orch host rm *<hostname>*

You can disable this health warning with::

  ceph config set mgr mgr/cephadm/warn_on_failed_host_check false

Cluster Configuration Checks
----------------------------
Cephadm periodically scans each of the hosts in the cluster to understand the state
of the OS, disks, NICs, etc. These facts can then be analysed for consistency across the hosts
in the cluster to identify any configuration anomalies.

The configuration checks are an **optional** feature, enabled by the following command::

  ceph config set mgr mgr/cephadm/config_checks_enabled true

The configuration checks are triggered after each host scan (every minute). The cephadm log entries will
show the current state and outcome of the configuration checks as follows:

Disabled state (config_checks_enabled false)::

  ALL cephadm checks are disabled, use 'ceph config set mgr mgr/cephadm/config_checks_enabled true' to enable

Enabled state (config_checks_enabled true)::

  CEPHADM 8/8 checks enabled and executed (0 bypassed, 0 disabled). No issues detected

The configuration checks themselves are managed through several cephadm sub-commands.

To determine whether the configuration checks are enabled, you can use the following command::

  ceph cephadm config-check status

This command will return the status of the configuration checker as either "Enabled" or "Disabled".


To list all the configuration checks and their current state::

  ceph cephadm config-check ls

  e.g.
  NAME             HEALTHCHECK                      STATUS   DESCRIPTION
  kernel_security  CEPHADM_CHECK_KERNEL_LSM        enabled  checks SELINUX/Apparmor profiles are consistent across cluster hosts
  os_subscription  CEPHADM_CHECK_SUBSCRIPTION      enabled  checks subscription states are consistent for all cluster hosts
  public_network   CEPHADM_CHECK_PUBLIC_MEMBERSHIP enabled  check that all hosts have a NIC on the Ceph public_network
  osd_mtu_size     CEPHADM_CHECK_MTU               enabled  check that OSD hosts share a common MTU setting
  osd_linkspeed    CEPHADM_CHECK_LINKSPEED         enabled  check that OSD hosts share a common linkspeed
  network_missing  CEPHADM_CHECK_NETWORK_MISSING   enabled  checks that the cluster/public networks defined exist on the Ceph hosts
  ceph_release     CEPHADM_CHECK_CEPH_RELEASE      enabled  check for Ceph version consistency - ceph daemons should be on the same release (unless upgrade is active)
  kernel_version   CEPHADM_CHECK_KERNEL_VERSION    enabled  checks that the MAJ.MIN of the kernel on Ceph hosts is consistent

The name of each configuration check can then be used to enable or disable a specific check::

  ceph cephadm config-check disable <name>

  e.g.
  ceph cephadm config-check disable kernel_security

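A check that has been disabled can be re-enabled by name in the same way,
using the matching ``enable`` sub-command (shown here for the
``kernel_security`` check)::

  ceph cephadm config-check enable kernel_security
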
CEPHADM_CHECK_KERNEL_LSM
^^^^^^^^^^^^^^^^^^^^^^^^
Each host within the cluster is expected to operate within the same Linux Security Module (LSM) state. For example,
if the majority of the hosts are running with SELINUX in enforcing mode, any host not running in this mode
would be flagged as an anomaly and a healthcheck (WARNING) state raised.

CEPHADM_CHECK_SUBSCRIPTION
^^^^^^^^^^^^^^^^^^^^^^^^^^
This check relates to the status of vendor subscription. This check is only performed for hosts using RHEL, but helps
to confirm that all your hosts are covered by an active subscription so patches and updates
are available.

CEPHADM_CHECK_PUBLIC_MEMBERSHIP
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
All members of the cluster should have NICs configured on at least one of the public network subnets. Hosts
that are not on the public network will rely on routing, which may affect performance.

CEPHADM_CHECK_MTU
^^^^^^^^^^^^^^^^^
The MTU of the NICs on OSDs can be a key factor in consistent performance. This check examines hosts
that are running OSD services to ensure that the MTU is configured consistently within the cluster. This is
determined by establishing the MTU setting that the majority of hosts are using, with any anomalies
resulting in a Ceph healthcheck.

CEPHADM_CHECK_LINKSPEED
^^^^^^^^^^^^^^^^^^^^^^^
Similar to the MTU check, linkspeed consistency is also a factor in consistent cluster performance.
This check determines the linkspeed shared by the majority of OSD hosts, resulting in a healthcheck for
any hosts that are set at a lower linkspeed rate.

CEPHADM_CHECK_NETWORK_MISSING
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``public_network`` and ``cluster_network`` settings support subnet definitions for IPv4 and IPv6. If these
settings are not found on any host in the cluster, a healthcheck is raised.

CEPHADM_CHECK_CEPH_RELEASE
^^^^^^^^^^^^^^^^^^^^^^^^^^
Under normal operations, the Ceph cluster should be running daemons under the same Ceph release (i.e. all
pacific). This check looks at the active release for each daemon, and reports any anomalies as a
healthcheck. *This check is bypassed if an upgrade process is active within the cluster.*

CEPHADM_CHECK_KERNEL_VERSION
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The OS kernel version (maj.min) is checked for consistency across the hosts. Once again, the
majority of the hosts are used as the basis for identifying anomalies.

Client keyrings and configs
===========================

Cephadm can distribute copies of the ``ceph.conf`` and client keyring
files to hosts. For example, it is usually a good idea to store a
copy of the config and ``client.admin`` keyring on any hosts that will
be used to administer the cluster via the CLI. By default, cephadm will do
this for any nodes with the ``_admin`` label (which normally includes the bootstrap
host).

When a client keyring is placed under management, cephadm will:

  - build a list of target hosts based on the specified placement spec (see :ref:`orchestrator-cli-placement-spec`)
  - store a copy of the ``/etc/ceph/ceph.conf`` file on the specified host(s)
  - store a copy of the keyring file on the specified host(s)
  - update the ``ceph.conf`` file as needed (e.g., due to a change in the cluster monitors)
  - update the keyring file if the entity's key is changed (e.g., via ``ceph auth ...`` commands)
  - ensure the keyring file has the specified ownership and mode
  - remove the keyring file when client keyring management is disabled
  - remove the keyring file from old hosts if the keyring placement spec is updated (as needed)

To view which client keyrings are currently under management::

  ceph orch client-keyring ls

To place a keyring under management::

  ceph orch client-keyring set <entity> <placement> [--mode=<mode>] [--owner=<uid>.<gid>] [--path=<path>]

- By default, the *path* will be ``/etc/ceph/client.{entity}.keyring``, which is where
  Ceph looks by default. Be careful specifying alternate locations, as existing files
  may be overwritten.
- A placement of ``*`` (all hosts) is common.
- The mode defaults to ``0600`` and ownership to ``0:0`` (user root, group root).

For example, to create and deploy a ``client.rbd`` key to hosts with the ``rbd-client`` label, group readable by uid/gid 107 (qemu)::

  ceph auth get-or-create-key client.rbd mon 'profile rbd' mgr 'profile rbd' osd 'profile rbd pool=my_rbd_pool'
  ceph orch client-keyring set client.rbd label:rbd-client --owner 107:107 --mode 640

The resulting keyring file is::

  -rw-r-----. 1 qemu qemu 156 Apr 21 08:47 /etc/ceph/client.client.rbd.keyring

To disable management of a keyring file::

  ceph orch client-keyring rm <entity>

Note that this will delete any keyring files for this entity that were previously written
to cluster nodes.


/etc/ceph/ceph.conf
===================

It may also be useful to distribute ``ceph.conf`` files to hosts without an associated
client keyring file. By default, cephadm only deploys a ``ceph.conf`` file to hosts where a client keyring
is also distributed (see above). To write config files to hosts without client keyrings::

  ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true

By default, the configs are written to all hosts (i.e., those listed
by ``ceph orch host ls``). To specify which hosts get a ``ceph.conf``::

  ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts <placement spec>

For example, to distribute configs to hosts with the ``bare_config`` label::

  ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts label:bare_config

(See :ref:`orchestrator-cli-placement-spec` for more information about placement specs.)