==================
Cephadm Operations
==================

.. _watching_cephadm_logs:

Watching cephadm log messages
=============================

Cephadm writes logs to the ``cephadm`` cluster log channel. You can
monitor Ceph's activity in real time by reading the logs as they fill
up. Run the following command to see the logs in real time:

.. prompt:: bash #

   ceph -W cephadm

By default, this command shows info-level events and above. To see
debug-level messages as well as info-level events, run the following
commands:

.. prompt:: bash #

   ceph config set mgr mgr/cephadm/log_to_cluster_level debug
   ceph -W cephadm --watch-debug

.. warning::

   The debug messages are very verbose!

You can see recent events by running the following command:

.. prompt:: bash #

   ceph log last cephadm

These events are also logged to the ``ceph.cephadm.log`` file on
monitor hosts as well as to the monitor daemons' stderr.
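
Depending on your Ceph release, ``ceph log last`` also accepts a message
count, severity, and channel. Treat the exact argument order below as an
assumption and check ``ceph log last --help`` on your version; a command
along these lines limits the output to the most recent cephadm events:

.. prompt:: bash #

   ceph log last 25 info cephadm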


.. _cephadm-logs:

Ceph daemon logs
================

Logging to journald
-------------------

Ceph daemons traditionally wrote logs to ``/var/log/ceph``. Under cephadm,
Ceph daemons instead log to journald by default, and the logs are captured
by the container runtime environment. They are accessible via ``journalctl``.

.. note:: Prior to Quincy, ceph daemons logged to stderr.

Example of logging to journald
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For example, to view the logs for the daemon ``mon.foo`` for a cluster
with ID ``5c5a50ae-272a-455d-99e9-32c6a013e694``, the command would be
something like:

.. prompt:: bash #

   journalctl -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo

This works well for normal operations when logging levels are low.
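
To follow the same log in real time, add journalctl's standard ``-f``
(follow) flag to the command above:

.. prompt:: bash #

   journalctl -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo -f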

Logging to files
----------------

You can also configure Ceph daemons to log to files instead of to
journald if you prefer logs to appear in files (as they did in earlier,
pre-cephadm, pre-Octopus versions of Ceph). When Ceph logs to files,
the logs appear in ``/var/log/ceph/<cluster-fsid>``. If you choose to
configure Ceph to log to files instead of to journald, remember to
configure Ceph so that it will not log to journald (the commands for
this are covered below).

Enabling logging to files
~~~~~~~~~~~~~~~~~~~~~~~~~

To enable logging to files, run the following commands:

.. prompt:: bash #

   ceph config set global log_to_file true
   ceph config set global mon_cluster_log_to_file true

Disabling logging to journald
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you choose to log to files, we recommend disabling logging to journald or else
everything will be logged twice. Run the following commands to disable logging
to stderr and to journald:

.. prompt:: bash #

   ceph config set global log_to_stderr false
   ceph config set global mon_cluster_log_to_stderr false
   ceph config set global log_to_journald false
   ceph config set global mon_cluster_log_to_journald false

.. note:: You can change the default by passing ``--log-to-file`` when
          bootstrapping a new cluster.

Modifying the log retention schedule
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, cephadm sets up log rotation on each host to rotate these
files. You can configure the logging retention schedule by modifying
``/etc/logrotate.d/ceph.<cluster-fsid>``.
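
To preview what the current policy would do without actually rotating any
files, you can use logrotate's standard dry-run flag on that file:

.. prompt:: bash #

   logrotate -d /etc/logrotate.d/ceph.<cluster-fsid>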


Data location
=============

Cephadm stores daemon data and logs in different locations than did
older, pre-cephadm (pre-Octopus) versions of Ceph:

* ``/var/log/ceph/<cluster-fsid>`` contains all cluster logs. By
  default, cephadm logs via stderr and the container runtime. These
  logs will not exist unless you have enabled logging to files as
  described in `cephadm-logs`_.
* ``/var/lib/ceph/<cluster-fsid>`` contains all cluster daemon data
  (besides logs).
* ``/var/lib/ceph/<cluster-fsid>/<daemon-name>`` contains all data for
  an individual daemon.
* ``/var/lib/ceph/<cluster-fsid>/crash`` contains crash reports for
  the cluster.
* ``/var/lib/ceph/<cluster-fsid>/removed`` contains old daemon
  data directories for stateful daemons (e.g., monitor, prometheus)
  that have been removed by cephadm.

Disk usage
----------

Because a few Ceph daemons (notably, the monitors and prometheus) store a
large amount of data in ``/var/lib/ceph``, we recommend moving this
directory to its own disk, partition, or logical volume so that it does not
fill up the root file system.
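
To see how much space the cluster's daemons are currently consuming on a
host, a quick check with ``du`` is usually enough:

.. prompt:: bash #

   du -sh /var/lib/ceph/<cluster-fsid>/*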


Health checks
=============

The cephadm module provides additional health checks to supplement the
default health checks provided by the cluster. These additional health
checks fall into two categories:

- **cephadm operations**: Health checks in this category are always
  executed when the cephadm module is active.
- **cluster configuration**: These health checks are *optional*, and
  focus on the configuration of the hosts in the cluster.

CEPHADM Operations
------------------

CEPHADM_PAUSED
~~~~~~~~~~~~~~

This indicates that cephadm background work has been paused with
``ceph orch pause``. Cephadm continues to perform passive monitoring
activities (like checking host and daemon status), but it will not
make any changes (like deploying or removing daemons).

Resume cephadm work by running the following command:

.. prompt:: bash #

   ceph orch resume
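
You can check whether background work is currently paused with ``ceph orch
status``; in recent releases the output includes a paused indicator while
``ceph orch pause`` is in effect:

.. prompt:: bash #

   ceph orch status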

.. _cephadm-stray-host:

CEPHADM_STRAY_HOST
~~~~~~~~~~~~~~~~~~

This indicates that one or more hosts have Ceph daemons that are
running, but are not registered as hosts managed by *cephadm*. This
means that those services cannot currently be managed by cephadm
(e.g., restarted, upgraded, included in ``ceph orch ps``).

* You can manage the host(s) by running the following command:

  .. prompt:: bash #

     ceph orch host add *<hostname>*

  .. note::

     You might need to configure SSH access to the remote host
     before this will work.

* See :ref:`cephadm-fqdn` for more information about host names and
  domain names.

* Alternatively, you can manually connect to the host and ensure that
  services on that host are removed or migrated to a host that is
  managed by *cephadm*.

* This warning can be disabled entirely by running the following
  command:

  .. prompt:: bash #

     ceph config set mgr mgr/cephadm/warn_on_stray_hosts false

CEPHADM_STRAY_DAEMON
~~~~~~~~~~~~~~~~~~~~

One or more Ceph daemons are running but are not managed by
*cephadm*. This may be because they were deployed using a different
tool, or because they were started manually. Those
services cannot currently be managed by cephadm (e.g., restarted,
upgraded, or included in ``ceph orch ps``).
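
To see which daemons cephadm does manage on a given host, and compare that
list against what is actually running there, you can run:

.. prompt:: bash #

   ceph orch ps <hostname>

The hostname argument is optional; omit it to list the daemons that cephadm
manages across all hosts.
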
* If the daemon is a stateful one (monitor or OSD), it should be adopted
  by cephadm; see :ref:`cephadm-adoption`. For stateless daemons, it is
  usually easiest to provision a new daemon with the ``ceph orch apply``
  command and then stop the unmanaged daemon.

* If the stray daemon(s) are running on hosts not managed by cephadm,
  you can manage the host(s) by running the following command:

  .. prompt:: bash #

     ceph orch host add *<hostname>*

  .. note::

     You might need to configure SSH access to the remote host
     before this will work.

* See :ref:`cephadm-fqdn` for more information about host names and
  domain names.

* This warning can be disabled entirely by running the following command:

  .. prompt:: bash #

     ceph config set mgr mgr/cephadm/warn_on_stray_daemons false

CEPHADM_HOST_CHECK_FAILED
~~~~~~~~~~~~~~~~~~~~~~~~~

One or more hosts have failed the basic cephadm host check, which verifies
that (1) the host is reachable and cephadm can be executed there, and (2)
that the host satisfies basic prerequisites, like a working container
runtime (podman or docker) and working time synchronization.
If this test fails, cephadm will not be able to manage services on that host.

You can manually run this check by running the following command:

.. prompt:: bash #

   ceph cephadm check-host *<hostname>*

You can remove a broken host from management by running the following command:

.. prompt:: bash #

   ceph orch host rm *<hostname>*

You can disable this health warning by running the following command:

.. prompt:: bash #

   ceph config set mgr mgr/cephadm/warn_on_failed_host_check false

Cluster Configuration Checks
----------------------------

Cephadm periodically scans each of the hosts in the cluster in order
to understand the state of the OS, disks, NICs, etc. These facts can
then be analyzed for consistency across the hosts in the cluster to
identify any configuration anomalies.

Enabling Cluster Configuration Checks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The configuration checks are an **optional** feature, and are enabled
by running the following command:

.. prompt:: bash #

   ceph config set mgr mgr/cephadm/config_checks_enabled true

States Returned by Cluster Configuration Checks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The configuration checks are triggered after each host scan (every 1m). The
cephadm log entries will show the current state and outcome of the
configuration checks as follows:

Disabled state (``config_checks_enabled`` is ``false``):

.. code-block:: bash

   ALL cephadm checks are disabled, use 'ceph config set mgr mgr/cephadm/config_checks_enabled true' to enable

Enabled state (``config_checks_enabled`` is ``true``):

.. code-block:: bash

   CEPHADM 8/8 checks enabled and executed (0 bypassed, 0 disabled). No issues detected

Managing Configuration Checks (subcommands)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The configuration checks themselves are managed through several cephadm subcommands.

To determine whether the configuration checks are enabled, run the following command:

.. prompt:: bash #

   ceph cephadm config-check status

This command returns the status of the configuration checker as either "Enabled" or "Disabled".

To list all the configuration checks and their current states, run the following command:

.. code-block:: console

   # ceph cephadm config-check ls

   NAME             HEALTHCHECK                      STATUS   DESCRIPTION
   kernel_security  CEPHADM_CHECK_KERNEL_LSM         enabled  checks SELINUX/Apparmor profiles are consistent across cluster hosts
   os_subscription  CEPHADM_CHECK_SUBSCRIPTION       enabled  checks subscription states are consistent for all cluster hosts
   public_network   CEPHADM_CHECK_PUBLIC_MEMBERSHIP  enabled  check that all hosts have a NIC on the Ceph public_network
   osd_mtu_size     CEPHADM_CHECK_MTU                enabled  check that OSD hosts share a common MTU setting
   osd_linkspeed    CEPHADM_CHECK_LINKSPEED          enabled  check that OSD hosts share a common linkspeed
   network_missing  CEPHADM_CHECK_NETWORK_MISSING    enabled  checks that the cluster/public networks defined exist on the Ceph hosts
   ceph_release     CEPHADM_CHECK_CEPH_RELEASE       enabled  check for Ceph version consistency - ceph daemons should be on the same release (unless upgrade is active)
   kernel_version   CEPHADM_CHECK_KERNEL_VERSION     enabled  checks that the MAJ.MIN of the kernel on Ceph hosts is consistent

The name of each configuration check can be used to enable or disable a
specific check by running a command of the following form:

.. prompt:: bash #

   ceph cephadm config-check disable <name>

For example:

.. prompt:: bash #

   ceph cephadm config-check disable kernel_security
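
To turn a check back on, the corresponding ``enable`` subcommand can be used
in the same way:

.. prompt:: bash #

   ceph cephadm config-check enable kernel_security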

CEPHADM_CHECK_KERNEL_LSM
~~~~~~~~~~~~~~~~~~~~~~~~

Each host within the cluster is expected to operate within the same Linux
Security Module (LSM) state. For example, if the majority of the hosts are
running with SELINUX in enforcing mode, any host not running in this mode is
flagged as an anomaly and a health check (WARNING) state is raised.
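
To see which mode a given host is in, the standard SELinux tooling can be run
directly on that host (``aa-status`` is the AppArmor counterpart):

.. prompt:: bash #

   getenforce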

CEPHADM_CHECK_SUBSCRIPTION
~~~~~~~~~~~~~~~~~~~~~~~~~~

This check relates to the status of the vendor subscription. This check is
performed only for hosts using RHEL, but helps to confirm that all hosts are
covered by an active subscription, which ensures that patches and updates are
available.

CEPHADM_CHECK_PUBLIC_MEMBERSHIP
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All members of the cluster should have NICs configured on at least one of the
public network subnets. Hosts that are not on the public network will rely on
routing, which may affect performance.

CEPHADM_CHECK_MTU
~~~~~~~~~~~~~~~~~

The MTU of the NICs on OSD hosts can be a key factor in consistent
performance. This check examines hosts that are running OSD services to
ensure that the MTU is configured consistently within the cluster. This is
determined by establishing the MTU setting that the majority of hosts are
using. Any anomalies result in a Ceph health check.
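
The MTU that a host is actually using can be confirmed with the standard
``ip`` tool; for example, to show the MTU of a specific interface (``eth0``
here is just a placeholder):

.. prompt:: bash #

   ip link show eth0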

CEPHADM_CHECK_LINKSPEED
~~~~~~~~~~~~~~~~~~~~~~~

This check is similar to the MTU check. Linkspeed consistency is a factor in
consistent cluster performance, just as the MTU of the NICs on the OSD hosts
is. This check determines the linkspeed shared by the majority of OSD hosts,
and a health check is raised for any hosts that are set at a lower linkspeed
rate.

CEPHADM_CHECK_NETWORK_MISSING
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``public_network`` and ``cluster_network`` settings support subnet
definitions for IPv4 and IPv6. If the subnets defined by these settings are
not found on any host in the cluster, a health check is raised.

CEPHADM_CHECK_CEPH_RELEASE
~~~~~~~~~~~~~~~~~~~~~~~~~~

Under normal operations, the Ceph cluster runs daemons that are all under the
same Ceph release (for example, all daemons run Octopus). This check
determines the active release for each daemon, and reports any anomalies as a
health check. *This check is bypassed if an upgrade process is active within
the cluster.*

CEPHADM_CHECK_KERNEL_VERSION
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The OS kernel version (maj.min) is checked for consistency across the hosts.
The kernel version of the majority of the hosts is used as the basis for
identifying anomalies.

.. _client_keyrings_and_configs:

Client keyrings and configs
===========================

Cephadm can distribute copies of the ``ceph.conf`` file and client keyring
files to hosts. It is usually a good idea to store a copy of the config and
``client.admin`` keyring on any host used to administer the cluster via the
CLI. By default, cephadm does this for any nodes that have the ``_admin``
label (which normally includes the bootstrap host).

When a client keyring is placed under management, cephadm will:

- build a list of target hosts based on the specified placement spec (see
  :ref:`orchestrator-cli-placement-spec`)
- store a copy of the ``/etc/ceph/ceph.conf`` file on the specified host(s)
- store a copy of the keyring file on the specified host(s)
- update the ``ceph.conf`` file as needed (e.g., due to a change in the cluster monitors)
- update the keyring file if the entity's key is changed (e.g., via ``ceph
  auth ...`` commands)
- ensure that the keyring file has the specified ownership and specified mode
- remove the keyring file when client keyring management is disabled
- remove the keyring file from old hosts if the keyring placement spec is
  updated (as needed)

Listing Client Keyrings
-----------------------

To see the list of client keyrings that are currently under management, run the following command:

.. prompt:: bash #

   ceph orch client-keyring ls

Putting a Keyring Under Management
----------------------------------

To put a keyring under management, run a command of the following form:

.. prompt:: bash #

   ceph orch client-keyring set <entity> <placement> [--mode=<mode>] [--owner=<uid>.<gid>] [--path=<path>]

- By default, the *path* is ``/etc/ceph/client.{entity}.keyring``, which is
  where Ceph looks by default. Be careful when specifying alternate locations,
  as existing files may be overwritten.
- A placement of ``*`` (all hosts) is common.
- The mode defaults to ``0600`` and ownership to ``0:0`` (user root, group root).

For example, to create a ``client.rbd`` key, deploy it to hosts with the
``rbd-client`` label, and make it group readable by uid/gid 107 (qemu), run
the following commands:

.. prompt:: bash #

   ceph auth get-or-create-key client.rbd mon 'profile rbd' mgr 'profile rbd' osd 'profile rbd pool=my_rbd_pool'
   ceph orch client-keyring set client.rbd label:rbd-client --owner 107:107 --mode 640

The resulting keyring file is:

.. code-block:: console

   -rw-r-----. 1 qemu qemu 156 Apr 21 08:47 /etc/ceph/client.client.rbd.keyring
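
To confirm that the deployed key matches what is stored in the cluster, you
can display the entity's key and capabilities with:

.. prompt:: bash #

   ceph auth get client.rbd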

Disabling Management of a Keyring File
--------------------------------------

To disable management of a keyring file, run a command of the following form:

.. prompt:: bash #

   ceph orch client-keyring rm <entity>

.. note::

   This deletes any keyring files for this entity that were previously written
   to cluster nodes.

.. _etc_ceph_conf_distribution:

/etc/ceph/ceph.conf
===================

Distributing ceph.conf to hosts that have no keyrings
-----------------------------------------------------

It might be useful to distribute ``ceph.conf`` files to hosts without an
associated client keyring file. By default, cephadm deploys only a
``ceph.conf`` file to hosts where a client keyring is also distributed (see
above). To write config files to hosts without client keyrings, run the
following command:

.. prompt:: bash #

   ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true
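
You can verify the current value of this setting (or of any of the
``mgr/cephadm/*`` options used on this page) with ``ceph config get``:

.. prompt:: bash #

   ceph config get mgr mgr/cephadm/manage_etc_ceph_ceph_conf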

Using Placement Specs to specify which hosts get config files
--------------------------------------------------------------

By default, the configs are written to all hosts (i.e., those listed by ``ceph
orch host ls``). To specify which hosts get a ``ceph.conf``, run a command of
the following form:

.. prompt:: bash #

   ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts <placement spec>

Distributing ceph.conf to hosts tagged with bare_config
--------------------------------------------------------

For example, to distribute configs to hosts with the ``bare_config`` label, run the following command:

.. prompt:: bash #

   ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts label:bare_config

(See :ref:`orchestrator-cli-placement-spec` for more information about placement specs.)

Purging a cluster
=================

.. danger:: THIS OPERATION WILL DESTROY ALL DATA STORED IN THIS CLUSTER

In order to destroy a cluster and delete all data stored in that cluster,
first pause cephadm so that it does not deploy any new daemons:

.. prompt:: bash #

   ceph orch pause

Then verify the FSID of the cluster:

.. prompt:: bash #

   ceph fsid

Then purge the Ceph daemons from all hosts in the cluster:

.. prompt:: bash #

   # For each host:
   cephadm rm-cluster --force --zap-osds --fsid <fsid>
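
If SSH access to the hosts is available, one way to apply this last step to
every host is a simple shell loop. The hostnames below are placeholders for
your own hosts; capture the host list (for example, from ``ceph orch host
ls``) before the monitors are removed, because the cluster will stop
responding to ``ceph`` commands once purging begins:

.. prompt:: bash #

   for host in host1 host2 host3; do
       ssh root@$host cephadm rm-cluster --force --zap-osds --fsid <fsid>
   done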