import ceph 16.2.6
1 ==================
2 Cephadm Operations
3 ==================
4
5 .. _watching_cephadm_logs:
6
7 Watching cephadm log messages
8 =============================
9
10 Cephadm writes logs to the ``cephadm`` cluster log channel. You can
11 monitor Ceph's activity in real time by reading the logs as they fill
12 up. Run the following command to see the logs in real time:
13
14 .. prompt:: bash #
15
16 ceph -W cephadm
17
18 By default, this command shows info-level events and above. To see
19 debug-level messages as well as info-level events, run the following
20 commands:
21
22 .. prompt:: bash #
23
24 ceph config set mgr mgr/cephadm/log_to_cluster_level debug
25 ceph -W cephadm --watch-debug
26
27 .. warning::
28
29 The debug messages are very verbose!
30
31 You can see recent events by running the following command:
32
33 .. prompt:: bash #
34
35 ceph log last cephadm
36
37 These events are also logged to the ``ceph.cephadm.log`` file on
38 monitor hosts as well as to the monitor daemons' stderr.
39
40
41 .. _cephadm-logs:
42
43 Ceph daemon logs
44 ================
45
46 Logging to journald
47 -------------------
48
49 Ceph daemons traditionally wrote logs to ``/var/log/ceph``. By default, Ceph daemons
50 now log to journald, and the logs are captured by the container runtime
51 environment. They are accessible via ``journalctl``.
52
53 .. note:: Prior to Quincy, Ceph daemons logged to stderr.
54
55 Example of logging to journald
56 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
57
58 For example, to view the logs for the daemon ``mon.foo`` for a cluster
59 with ID ``5c5a50ae-272a-455d-99e9-32c6a013e694``, the command would be
60 something like:
61
62 .. prompt:: bash #
63
64 journalctl -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo
65
66 This works well for normal operations when logging levels are low.
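
The unit name in the example above follows the pattern
``ceph-<cluster-fsid>@<daemon-name>``. As a minimal sketch, the shell function
below builds that name so that it can be passed to ``journalctl``. The function
name ``ceph_unit_name`` is illustrative and is not part of cephadm.

```shell
# Build the systemd unit name for a cephadm-deployed daemon.
# Usage: ceph_unit_name <cluster-fsid> <daemon-name>
# NOTE: ceph_unit_name is a hypothetical helper, not a cephadm command.
ceph_unit_name() {
    printf 'ceph-%s@%s\n' "$1" "$2"
}

# Example (run on the host where the daemon lives):
#   journalctl -u "$(ceph_unit_name 5c5a50ae-272a-455d-99e9-32c6a013e694 mon.foo)"
# Add -f to follow new messages as they arrive.
```

The helper only formats a string; it does not check that the unit exists on the host.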
67
68 Logging to files
69 ----------------
70
71 You can also configure Ceph daemons to log to files instead of to
72 journald if you prefer logs to appear in files (as they did in earlier,
73 pre-cephadm, pre-Octopus versions of Ceph). When Ceph logs to files,
74 the logs appear in ``/var/log/ceph/<cluster-fsid>``. If you choose to
75 configure Ceph to log to files instead of to journald, remember to
76 configure Ceph so that it will not log to journald (the commands for
77 this are covered below).
78
79 Enabling logging to files
80 ~~~~~~~~~~~~~~~~~~~~~~~~~
81
82 To enable logging to files, run the following commands:
83
84 .. prompt:: bash #
85
86 ceph config set global log_to_file true
87 ceph config set global mon_cluster_log_to_file true
88
89 Disabling logging to journald
90 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
91
92 If you choose to log to files, we recommend disabling logging to journald; otherwise,
93 everything will be logged twice. Run the following commands to disable logging
94 to stderr and to journald:
95
96 .. prompt:: bash #
97
98 ceph config set global log_to_stderr false
99 ceph config set global mon_cluster_log_to_stderr false
100 ceph config set global log_to_journald false
101 ceph config set global mon_cluster_log_to_journald false
102
103 .. note:: You can change the default by passing ``--log-to-file`` when
104 bootstrapping a new cluster.
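
The settings from the two subsections above can be collected into one small
helper. The sketch below only prints the ``ceph config set`` commands listed in
this section, so that they can be reviewed before anything is changed. The
function name ``print_file_logging_commands`` is hypothetical.

```shell
# Print (do not run) the commands that move logging from journald/stderr to files.
# NOTE: print_file_logging_commands is a hypothetical helper for illustration.
print_file_logging_commands() {
    # Enable file logging for the daemons and for the cluster log.
    for opt in log_to_file mon_cluster_log_to_file; do
        echo "ceph config set global $opt true"
    done
    # Disable the stderr and journald targets so nothing is logged twice.
    for opt in log_to_stderr mon_cluster_log_to_stderr \
               log_to_journald mon_cluster_log_to_journald; do
        echo "ceph config set global $opt false"
    done
}
```

To apply the settings, run the printed commands on a host that has the ``ceph``
CLI and an admin keyring.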
105
106 Modifying the log retention schedule
107 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
108
109 By default, cephadm sets up log rotation on each host to rotate these
110 files. You can configure the logging retention schedule by modifying
111 ``/etc/logrotate.d/ceph.<cluster-fsid>``.
112
113
114 Data location
115 =============
116
117 Cephadm stores daemon data and logs in different locations than did
118 older, pre-cephadm (pre-Octopus) versions of Ceph:
119
120 * ``/var/log/ceph/<cluster-fsid>`` contains all cluster logs. By
121 default, cephadm logs via stderr and the container runtime. These
122 logs will not exist unless you have enabled logging to files as
123 described in `cephadm-logs`_.
124 * ``/var/lib/ceph/<cluster-fsid>`` contains all cluster daemon data
125 (besides logs).
126 * ``/var/lib/ceph/<cluster-fsid>/<daemon-name>`` contains all data for
127 an individual daemon.
128 * ``/var/lib/ceph/<cluster-fsid>/crash`` contains crash reports for
129 the cluster.
130 * ``/var/lib/ceph/<cluster-fsid>/removed`` contains old daemon
131 data directories for stateful daemons (e.g., monitor, prometheus)
132 that have been removed by cephadm.
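
As a quick reference, the locations listed above can be derived from the
cluster fsid. The following sketch prints them for a given fsid and, optionally,
for one daemon. The function name ``cephadm_paths`` and the ``key=path`` output
format are assumptions made for this example.

```shell
# Print the cephadm data and log locations for one cluster.
# Usage: cephadm_paths <cluster-fsid> [<daemon-name>]
# NOTE: cephadm_paths and its key=path output format are hypothetical.
cephadm_paths() {
    fsid=$1
    daemon=${2:-}
    echo "logs=/var/log/ceph/$fsid"
    echo "data=/var/lib/ceph/$fsid"
    echo "crash=/var/lib/ceph/$fsid/crash"
    echo "removed=/var/lib/ceph/$fsid/removed"
    # The per-daemon directory is printed only when a daemon name is given.
    if [ -n "$daemon" ]; then
        echo "daemon=/var/lib/ceph/$fsid/$daemon"
    fi
}
```

The function only formats paths; the log directory will still be empty unless
logging to files has been enabled.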
133
134 Disk usage
135 ----------
136
137 Because a few Ceph daemons (notably, the monitors and prometheus) store a
138 large amount of data in ``/var/lib/ceph``, we recommend moving this
139 directory to its own disk, partition, or logical volume so that it does not
140 fill up the root file system.
141
142
143 Health checks
144 =============
145 The cephadm module provides additional health checks to supplement the
146 default health checks provided by the cluster. These additional health
147 checks fall into two categories:
148
149 - **cephadm operations**: Health checks in this category are always
150 executed when the cephadm module is active.
151 - **cluster configuration**: These health checks are *optional*, and
152 focus on the configuration of the hosts in the cluster.
153
154 CEPHADM Operations
155 ------------------
156
157 CEPHADM_PAUSED
158 ~~~~~~~~~~~~~~
159
160 This indicates that cephadm background work has been paused with
161 ``ceph orch pause``. Cephadm continues to perform passive monitoring
162 activities (like checking host and daemon status), but it will not
163 make any changes (like deploying or removing daemons).
164
165 Resume cephadm work by running the following command:
166
167 .. prompt:: bash #
168
169 ceph orch resume
170
171 .. _cephadm-stray-host:
172
173 CEPHADM_STRAY_HOST
174 ~~~~~~~~~~~~~~~~~~
175
176 This indicates that one or more hosts are running Ceph daemons but are
177 not registered as hosts managed by *cephadm*. This
178 means that those services cannot currently be managed by cephadm
179 (e.g., restarted, upgraded, or included in ``ceph orch ps``).
180
181 You can manage the host(s) by running the following command:
182
183 .. prompt:: bash #
184
185 ceph orch host add *<hostname>*
186
187 .. note::
188
189 You might need to configure SSH access to the remote host
190 before this will work.
191
192 Alternatively, you can manually connect to the host and ensure that
193 services on that host are removed or migrated to a host that is
194 managed by *cephadm*.
195
196 This warning can be disabled entirely by running the following
197 command:
198
199 .. prompt:: bash #
200
201 ceph config set mgr mgr/cephadm/warn_on_stray_hosts false
202
203 See :ref:`cephadm-fqdn` for more information about host names and
204 domain names.
205
206 CEPHADM_STRAY_DAEMON
207 ~~~~~~~~~~~~~~~~~~~~
208
209 One or more Ceph daemons are running but are not managed by
210 *cephadm*. This may be because they were deployed using a different
211 tool, or because they were started manually. Those
212 services cannot currently be managed by cephadm (e.g., restarted,
213 upgraded, or included in ``ceph orch ps``).
214
215 If the daemon is a stateful one (monitor or OSD), it should be adopted
216 by cephadm; see :ref:`cephadm-adoption`. For stateless daemons, it is
217 usually easiest to provision a new daemon with the ``ceph orch apply``
218 command and then stop the unmanaged daemon.
219
220 This warning can be disabled entirely by running the following command:
221
222 .. prompt:: bash #
223
224 ceph config set mgr mgr/cephadm/warn_on_stray_daemons false
225
226 CEPHADM_HOST_CHECK_FAILED
227 ~~~~~~~~~~~~~~~~~~~~~~~~~
228
229 One or more hosts have failed the basic cephadm host check, which verifies
230 that (1) the host is reachable and cephadm can be executed there, and (2)
231 that the host satisfies basic prerequisites, like a working container
232 runtime (podman or docker) and working time synchronization.
233 If this test fails, cephadm will not be able to manage services on that host.
234
235 You can manually run this check by running the following command:
236
237 .. prompt:: bash #
238
239 ceph cephadm check-host *<hostname>*
240
241 You can remove a broken host from management by running the following command:
242
243 .. prompt:: bash #
244
245 ceph orch host rm *<hostname>*
246
247 You can disable this health warning by running the following command:
248
249 .. prompt:: bash #
250
251 ceph config set mgr mgr/cephadm/warn_on_failed_host_check false
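
The health checks above can be picked out of the cluster health report and
matched to the first command suggested in this section. The sketch below
assumes that ``ceph health detail`` prints the health check code (for example,
``CEPHADM_STRAY_DAEMON``) on each detail line. The function names
``cephadm_health_codes`` and ``cephadm_health_hint`` are hypothetical.

```shell
# Read `ceph health detail` output on stdin and print each distinct
# CEPHADM_* health check code, one per line, sorted.
# NOTE: both function names below are hypothetical helpers.
cephadm_health_codes() {
    grep -o 'CEPHADM_[A-Z_]*' | sort -u
}

# Print the first remediation step that this section suggests for a code.
cephadm_health_hint() {
    case "$1" in
        CEPHADM_PAUSED)            echo "ceph orch resume" ;;
        CEPHADM_STRAY_HOST)        echo "ceph orch host add <hostname>" ;;
        CEPHADM_STRAY_DAEMON)      echo "adopt the daemon, or redeploy it with ceph orch apply" ;;
        CEPHADM_HOST_CHECK_FAILED) echo "ceph cephadm check-host <hostname>" ;;
        *)                         echo "no hint available" ;;
    esac
}

# Example:
#   ceph health detail | cephadm_health_codes | while read -r code; do
#       echo "$code: $(cephadm_health_hint "$code")"
#   done
```

The hints are starting points only; the text above describes the alternatives
for each check, including how to silence the warning.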
252
253 Cluster Configuration Checks
254 ----------------------------
255 Cephadm periodically scans each of the hosts in the cluster in order
256 to understand the state of the OS, disks, NICs etc. These facts can
257 then be analysed for consistency across the hosts in the cluster to
258 identify any configuration anomalies.
259
260 Enabling Cluster Configuration Checks
261 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
262
263 The configuration checks are an **optional** feature, and are enabled
264 by running the following command:
265
266 .. prompt:: bash #
267
268 ceph config set mgr mgr/cephadm/config_checks_enabled true
269
270 States Returned by Cluster Configuration Checks
271 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
272
273 The configuration checks are triggered after each host scan (1m). The
274 cephadm log entries will show the current state and outcome of the
275 configuration checks as follows:
276
277 Disabled state (config_checks_enabled false):
278
279 .. code-block:: bash
280
281 ALL cephadm checks are disabled, use 'ceph config set mgr mgr/cephadm/config_checks_enabled true' to enable
282
283 Enabled state (config_checks_enabled true):
284
285 .. code-block:: bash
286
287 CEPHADM 8/8 checks enabled and executed (0 bypassed, 0 disabled). No issues detected
288
289 Managing Configuration Checks (subcommands)
290 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
291
292 The configuration checks themselves are managed through several cephadm subcommands.
293
294 To determine whether the configuration checks are enabled, run the following command:
295
296 .. prompt:: bash #
297
298 ceph cephadm config-check status
299
300 This command returns the status of the configuration checker as either "Enabled" or "Disabled".
301
302
303 To list all the configuration checks and their current states, run the following command:
304
305 .. code-block:: console
306
307 # ceph cephadm config-check ls
308
309 NAME HEALTHCHECK STATUS DESCRIPTION
310 kernel_security CEPHADM_CHECK_KERNEL_LSM enabled checks SELINUX/Apparmor profiles are consistent across cluster hosts
311 os_subscription CEPHADM_CHECK_SUBSCRIPTION enabled checks subscription states are consistent for all cluster hosts
312 public_network CEPHADM_CHECK_PUBLIC_MEMBERSHIP enabled check that all hosts have a NIC on the Ceph public_netork
313 osd_mtu_size CEPHADM_CHECK_MTU enabled check that OSD hosts share a common MTU setting
314 osd_linkspeed CEPHADM_CHECK_LINKSPEED enabled check that OSD hosts share a common linkspeed
315 network_missing CEPHADM_CHECK_NETWORK_MISSING enabled checks that the cluster/public networks defined exist on the Ceph hosts
316 ceph_release CEPHADM_CHECK_CEPH_RELEASE enabled check for Ceph version consistency - ceph daemons should be on the same release (unless upgrade is active)
317 kernel_version CEPHADM_CHECK_KERNEL_VERSION enabled checks that the MAJ.MIN of the kernel on Ceph hosts is consistent
318
319 The name of each configuration check can be used to enable or disable a specific check. For example, to disable a check, run a command of the following form:
321
322 .. prompt:: bash #
323
324 ceph cephadm config-check disable <name>
325
326 For example:
327
328 .. prompt:: bash #
329
330 ceph cephadm config-check disable kernel_security
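
To guard against typos in check names, a wrapper can validate the name against
the list shown by ``ceph cephadm config-check ls`` before building the command.
The following is a minimal sketch: the function name ``config_check_cmd`` is
hypothetical, the list of names is copied from the sample output above, and the
``enable`` form is assumed to mirror the ``disable`` form.

```shell
# Check names copied from the sample `ceph cephadm config-check ls` output.
CONFIG_CHECKS="kernel_security os_subscription public_network osd_mtu_size osd_linkspeed network_missing ceph_release kernel_version"

# Print the command that enables or disables one configuration check.
# Usage: config_check_cmd enable|disable <name>
# NOTE: config_check_cmd is a hypothetical helper; it prints, it does not run.
config_check_cmd() {
    action=$1
    name=$2
    case "$action" in
        enable|disable) ;;
        *) echo "action must be enable or disable" >&2; return 2 ;;
    esac
    for known in $CONFIG_CHECKS; do
        if [ "$known" = "$name" ]; then
            echo "ceph cephadm config-check $action $name"
            return 0
        fi
    done
    echo "unknown check: $name" >&2
    return 1
}
```

For example, ``config_check_cmd disable kernel_security`` prints the command
shown above, while an unrecognized name produces an error and a non-zero exit
status.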
331
332 CEPHADM_CHECK_KERNEL_LSM
333 ~~~~~~~~~~~~~~~~~~~~~~~~
334 Each host within the cluster is expected to operate within the same Linux
335 Security Module (LSM) state. For example, if the majority of the hosts are
336 running with SELinux in enforcing mode, any host not running in this mode is
337 flagged as an anomaly and a health check (WARNING) state is raised.
338
339 CEPHADM_CHECK_SUBSCRIPTION
340 ~~~~~~~~~~~~~~~~~~~~~~~~~~
341 This check relates to the status of vendor subscription. This check is
342 performed only for hosts using RHEL, but helps to confirm that all hosts are
343 covered by an active subscription, which ensures that patches and updates are
344 available.
345
346 CEPHADM_CHECK_PUBLIC_MEMBERSHIP
347 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
348 All members of the cluster should have NICs configured on at least one of the
349 public network subnets. Hosts that are not on the public network will rely on
350 routing, which may affect performance.
351
352 CEPHADM_CHECK_MTU
353 ~~~~~~~~~~~~~~~~~
354 The MTU of the NICs on OSDs can be a key factor in consistent performance. This
355 check examines hosts that are running OSD services to ensure that the MTU is
356 configured consistently within the cluster. This is determined by establishing
357 the MTU setting that the majority of hosts is using. Any anomalies result in a
358 Ceph health check.
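
The majority rule described above can be illustrated outside of Ceph. The
sketch below is not cephadm's implementation; it reads hypothetical
``<host> <value>`` pairs (for example, a host name and its MTU), finds the value
reported by the largest number of hosts, and prints the hosts that differ from
it. The function name ``minority_hosts`` is an assumption made for this example.

```shell
# Read "<host> <value>" lines on stdin and print the hosts whose value
# differs from the most common value. If two values tie for most common,
# which one is treated as the baseline is unspecified.
# NOTE: minority_hosts is a hypothetical helper, not part of cephadm.
minority_hosts() {
    awk '
        { host[NR] = $1; val[NR] = $2; count[$2]++ }
        END {
            best = ""; max = 0
            # Find the value shared by the largest number of hosts.
            for (v in count) if (count[v] > max) { max = count[v]; best = v }
            # Report every host that does not use that value.
            for (i = 1; i <= NR; i++) if (val[i] != best) print host[i]
        }
    '
}

# Example:
#   printf "osd1 9000\nosd2 9000\nosd3 1500\n" | minority_hosts
```

With the example input, two hosts agree on 9000, so only the host that reports
1500 is listed.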
359
360 CEPHADM_CHECK_LINKSPEED
361 ~~~~~~~~~~~~~~~~~~~~~~~
362 This check is similar to the MTU check. Linkspeed consistency is a factor in
363 consistent cluster performance, just as the MTU of the NICs on the OSDs is.
364 This check determines the linkspeed shared by the majority of OSD hosts, and a
365 health check is run for any hosts that are set at a lower linkspeed rate.
366
367 CEPHADM_CHECK_NETWORK_MISSING
368 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
369 The ``public_network`` and ``cluster_network`` settings support subnet definitions
370 for IPv4 and IPv6. If the networks defined by these settings are not found on any host in the cluster,
371 a health check is raised.
372
373 CEPHADM_CHECK_CEPH_RELEASE
374 ~~~~~~~~~~~~~~~~~~~~~~~~~~
375 Under normal operations, the Ceph cluster runs daemons under the same Ceph
376 release (for example, all daemons run
377 Octopus). This check determines the active release for each daemon, and
378 reports any anomalies as a health check. *This check is bypassed if an upgrade
379 process is active within the cluster.*
380
381 CEPHADM_CHECK_KERNEL_VERSION
382 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
383 The OS kernel version (maj.min) is checked for consistency across the hosts.
384 The kernel version of the majority of the hosts is used as the basis for
385 identifying anomalies.
386
387 .. _client_keyrings_and_configs:
388
389 Client keyrings and configs
390 ===========================
391
392 Cephadm can distribute copies of the ``ceph.conf`` file and client keyring
393 files to hosts. It is usually a good idea to store a copy of the config and
394 ``client.admin`` keyring on any host used to administer the cluster via the
395 CLI. By default, cephadm does this for any nodes that have the ``_admin``
396 label (which normally includes the bootstrap host).
397
398 When a client keyring is placed under management, cephadm will:
399
400 - build a list of target hosts based on the specified placement spec (see
401 :ref:`orchestrator-cli-placement-spec`)
402 - store a copy of the ``/etc/ceph/ceph.conf`` file on the specified host(s)
403 - store a copy of the keyring file on the specified host(s)
404 - update the ``ceph.conf`` file as needed (e.g., due to a change in the cluster monitors)
405 - update the keyring file if the entity's key is changed (e.g., via ``ceph
406 auth ...`` commands)
407 - ensure that the keyring file has the specified ownership and specified mode
408 - remove the keyring file when client keyring management is disabled
409 - remove the keyring file from old hosts if the keyring placement spec is
410 updated (as needed)
411
412 Listing Client Keyrings
413 -----------------------
414
415 To see the list of client keyrings that are currently under management, run the following command:
416
417 .. prompt:: bash #
418
419 ceph orch client-keyring ls
420
421 Putting a Keyring Under Management
422 ----------------------------------
423
424 To put a keyring under management, run a command of the following form:
425
426 .. prompt:: bash #
427
428 ceph orch client-keyring set <entity> <placement> [--mode=<mode>] [--owner=<uid>.<gid>] [--path=<path>]
429
430 - By default, the *path* is ``/etc/ceph/client.{entity}.keyring``, which is
431 where Ceph looks by default. Be careful when specifying alternate locations,
432 as existing files may be overwritten.
433 - A placement of ``*`` (all hosts) is common.
434 - The mode defaults to ``0600`` and ownership to ``0:0`` (user root, group root).
435
436 For example, to create a ``client.rbd`` key and deploy it to hosts with the
437 ``rbd-client`` label and make it group readable by uid/gid 107 (qemu), run the
438 following commands:
439
440 .. prompt:: bash #
441
442 ceph auth get-or-create-key client.rbd mon 'profile rbd' mgr 'profile rbd' osd 'profile rbd pool=my_rbd_pool'
443 ceph orch client-keyring set client.rbd label:rbd-client --owner 107:107 --mode 640
444
445 The resulting keyring file is:
446
447 .. code-block:: console
448
449 -rw-r-----. 1 qemu qemu 156 Apr 21 08:47 /etc/ceph/client.client.rbd.keyring
450
451 Disabling Management of a Keyring File
452 --------------------------------------
453
454 To disable management of a keyring file, run a command of the following form:
455
456 .. prompt:: bash #
457
458 ceph orch client-keyring rm <entity>
459
460 .. note::
461
462 This deletes any keyring files for this entity that were previously written
463 to cluster nodes.
464
465 .. _etc_ceph_conf_distribution:
466
467 /etc/ceph/ceph.conf
468 ===================
469
470 Distributing ceph.conf to hosts that have no keyrings
471 -----------------------------------------------------
472
473 It might be useful to distribute ``ceph.conf`` files to hosts without an
474 associated client keyring file. By default, cephadm deploys a
475 ``ceph.conf`` file only to hosts where a client keyring is also distributed (see
476 above). To write config files to hosts without client keyrings, run the
477 following command:
478
479 .. prompt:: bash #
480
481 ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true
482
483 Using Placement Specs to specify which hosts get ceph.conf
484 ----------------------------------------------------------
485
486 By default, the configs are written to all hosts (i.e., those listed by ``ceph
487 orch host ls``). To specify which hosts get a ``ceph.conf``, run a command of
488 the following form:
489
490 .. prompt:: bash #
491
492 ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts <placement spec>
493
496
497 Distributing ceph.conf to hosts tagged with bare_config
498 -------------------------------------------------------
499
500 For example, to distribute configs to hosts with the ``bare_config`` label, run the following command:
501
502 .. prompt:: bash #
503
504 ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts label:bare_config
505
506 (See :ref:`orchestrator-cli-placement-spec` for more information about placement specs.)