1 ==================
2 Cephadm Operations
3 ==================
4
5 .. _watching_cephadm_logs:
6
7 Watching cephadm log messages
8 =============================
9
10 Cephadm writes logs to the ``cephadm`` cluster log channel. You can
11 monitor Ceph's activity in real time by reading the logs as they fill
12 up. Run the following command to see the logs in real time:
13
14 .. prompt:: bash #
15
16 ceph -W cephadm
17
18 By default, this command shows info-level events and above. To see
19 debug-level messages as well as info-level events, run the following
20 commands:
21
22 .. prompt:: bash #
23
24 ceph config set mgr mgr/cephadm/log_to_cluster_level debug
25 ceph -W cephadm --watch-debug
26
27 .. warning::
28
29 The debug messages are very verbose!
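
When you are done debugging, you can lower the channel's verbosity again. The
following is a minimal sketch that assumes ``info`` is the desired (default)
level for ``mgr/cephadm/log_to_cluster_level``:

.. prompt:: bash #

   ceph config set mgr mgr/cephadm/log_to_cluster_level info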
30
31 You can see recent events by running the following command:
32
33 .. prompt:: bash #
34
35 ceph log last cephadm
36
37 These events are also logged to the ``ceph.cephadm.log`` file on
38 monitor hosts as well as to the monitor daemons' stderr.
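
If you want to limit how much history is shown, ``ceph log last`` also accepts
an optional event count, log level, and channel. The invocation below is an
illustrative sketch that assumes the ``log last [count] [level] [channel]``
argument order:

.. prompt:: bash #

   # show only the 25 most recent info-level cephadm events (illustrative)
   ceph log last 25 info cephadm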
39
40
41 .. _cephadm-logs:
42
43
44 Ceph daemon control
45 ===================
46
47 Starting and stopping daemons
48 -----------------------------
49
50 You can stop, start, or restart a daemon with:
51
52 .. prompt:: bash #
53
54 ceph orch daemon stop <name>
55 ceph orch daemon start <name>
56 ceph orch daemon restart <name>
57
You can also do the same for all daemons of a service with:
59
60 .. prompt:: bash #
61
62 ceph orch stop <name>
63 ceph orch start <name>
64 ceph orch restart <name>
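
For example, to restart a single OSD daemon and then all daemons of an RGW
service (``osd.7`` and ``rgw.myrealm`` are hypothetical names; substitute
names reported by ``ceph orch ps`` and ``ceph orch ls``):

.. prompt:: bash #

   # "osd.7" and "rgw.myrealm" are placeholders for your own daemon/service names
   ceph orch daemon restart osd.7
   ceph orch restart rgw.myrealm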
65
66
67 Redeploying or reconfiguring a daemon
68 -------------------------------------
69
70 The container for a daemon can be stopped, recreated, and restarted with
71 the ``redeploy`` command:
72
73 .. prompt:: bash #
74
75 ceph orch daemon redeploy <name> [--image <image>]
76
77 A container image name can optionally be provided to force a
78 particular image to be used (instead of the image specified by the
79 ``container_image`` config value).
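
For example, to redeploy one RGW daemon from a specific image (both the daemon
name and the image tag below are placeholders; any valid Ceph container image
reference can be used):

.. prompt:: bash #

   # daemon name and image tag are examples only
   ceph orch daemon redeploy rgw.myrealm.host1.abcdef --image quay.io/ceph/ceph:v18.2.4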
80
81 If only the ceph configuration needs to be regenerated, you can also
82 issue a ``reconfig`` command, which will rewrite the ``ceph.conf``
83 file but will not trigger a restart of the daemon.
84
85 .. prompt:: bash #
86
87 ceph orch daemon reconfig <name>
88
89
Rotating a daemon's authentication key
--------------------------------------
92
93 All Ceph and gateway daemons in the cluster have a secret key that is used to connect
94 to and authenticate with the cluster. This key can be rotated (i.e., replaced with a
95 new key) with the following command:
96
97 .. prompt:: bash #
98
99 ceph orch daemon rotate-key <name>
100
For MDS, OSD, and MGR daemons, this does not require a daemon restart. For other
daemons (e.g., RGW), however, the daemon may need to be restarted in order to switch
to the new key.
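
For example, to rotate the key of a single OSD daemon (``osd.3`` is a
placeholder; use a daemon name reported by ``ceph orch ps``):

.. prompt:: bash #

   # "osd.3" is an example daemon name
   ceph orch daemon rotate-key osd.3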
103
104
105 Ceph daemon logs
106 ================
107
108 Logging to journald
109 -------------------
110
Ceph daemons traditionally wrote logs to ``/var/log/ceph``. By default,
cephadm-deployed daemons log to journald instead, and the logs are captured by
the container runtime environment. They are accessible via ``journalctl``.
114
115 .. note:: Prior to Quincy, ceph daemons logged to stderr.
116
117 Example of logging to journald
118 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
119
120 For example, to view the logs for the daemon ``mon.foo`` for a cluster
121 with ID ``5c5a50ae-272a-455d-99e9-32c6a013e694``, the command would be
122 something like:
123
124 .. prompt:: bash #
125
126 journalctl -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo
127
128 This works well for normal operations when logging levels are low.
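
To follow the same unit's log in real time, or to limit output to a recent
window, the usual ``journalctl`` options can be combined with the unit name
(the fsid and daemon name are the same placeholders as above):

.. prompt:: bash #

   # follow the log live
   journalctl -f -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo
   # show only the last hour
   journalctl --since "1 hour ago" -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo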
129
130 Logging to files
131 ----------------
132
133 You can also configure Ceph daemons to log to files instead of to
134 journald if you prefer logs to appear in files (as they did in earlier,
135 pre-cephadm, pre-Octopus versions of Ceph). When Ceph logs to files,
136 the logs appear in ``/var/log/ceph/<cluster-fsid>``. If you choose to
137 configure Ceph to log to files instead of to journald, remember to
138 configure Ceph so that it will not log to journald (the commands for
139 this are covered below).
140
141 Enabling logging to files
142 ~~~~~~~~~~~~~~~~~~~~~~~~~
143
144 To enable logging to files, run the following commands:
145
146 .. prompt:: bash #
147
148 ceph config set global log_to_file true
149 ceph config set global mon_cluster_log_to_file true
150
151 Disabling logging to journald
152 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
153
154 If you choose to log to files, we recommend disabling logging to journald or else
everything will be logged twice. Run the following commands to disable logging
to stderr and to journald:
157
158 .. prompt:: bash #
159
160 ceph config set global log_to_stderr false
161 ceph config set global mon_cluster_log_to_stderr false
162 ceph config set global log_to_journald false
163 ceph config set global mon_cluster_log_to_journald false
164
.. note:: You can change the default by passing ``--log-to-file`` when
   bootstrapping a new cluster.
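
As an illustrative sketch, a bootstrap invocation that enables file logging
from the start might look like this (the monitor IP address is a placeholder):

.. prompt:: bash #

   # 10.0.0.1 is an example monitor IP
   cephadm bootstrap --mon-ip 10.0.0.1 --log-to-file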
167
168 Modifying the log retention schedule
169 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
170
171 By default, cephadm sets up log rotation on each host to rotate these
172 files. You can configure the logging retention schedule by modifying
173 ``/etc/logrotate.d/ceph.<cluster-fsid>``.
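
For example, to review the schedule that is currently in place and to dry-run
it (the fsid is a placeholder; ``logrotate -d`` only simulates rotation and
makes no changes):

.. prompt:: bash #

   # the fsid below is an example
   cat /etc/logrotate.d/ceph.5c5a50ae-272a-455d-99e9-32c6a013e694
   logrotate -d /etc/logrotate.d/ceph.5c5a50ae-272a-455d-99e9-32c6a013e694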
174
175
176 Data location
177 =============
178
Cephadm stores daemon data and logs in different locations than older,
pre-cephadm (pre-Octopus) versions of Ceph did:
181
182 * ``/var/log/ceph/<cluster-fsid>`` contains all cluster logs. By
183 default, cephadm logs via stderr and the container runtime. These
184 logs will not exist unless you have enabled logging to files as
185 described in `cephadm-logs`_.
186 * ``/var/lib/ceph/<cluster-fsid>`` contains all cluster daemon data
187 (besides logs).
188 * ``/var/lib/ceph/<cluster-fsid>/<daemon-name>`` contains all data for
189 an individual daemon.
190 * ``/var/lib/ceph/<cluster-fsid>/crash`` contains crash reports for
191 the cluster.
192 * ``/var/lib/ceph/<cluster-fsid>/removed`` contains old daemon
193 data directories for stateful daemons (e.g., monitor, prometheus)
194 that have been removed by cephadm.
195
196 Disk usage
197 ----------
198
Because a few Ceph daemons (notably, the monitors and Prometheus) store a
large amount of data in ``/var/lib/ceph``, we recommend moving this
directory to its own disk, partition, or logical volume so that it does not
fill up the root file system.
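
A quick way to see which daemons are consuming the most space under this
directory is to use standard tools on the host in question (the fsid is a
placeholder):

.. prompt:: bash #

   du -sh /var/lib/ceph/5c5a50ae-272a-455d-99e9-32c6a013e694/*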
203
204
205 Health checks
206 =============
207 The cephadm module provides additional health checks to supplement the
default health checks provided by the cluster. These additional health
209 checks fall into two categories:
210
211 - **cephadm operations**: Health checks in this category are always
212 executed when the cephadm module is active.
213 - **cluster configuration**: These health checks are *optional*, and
214 focus on the configuration of the hosts in the cluster.
215
216 CEPHADM Operations
217 ------------------
218
219 CEPHADM_PAUSED
220 ~~~~~~~~~~~~~~
221
222 This indicates that cephadm background work has been paused with
223 ``ceph orch pause``. Cephadm continues to perform passive monitoring
224 activities (like checking host and daemon status), but it will not
225 make any changes (like deploying or removing daemons).
226
227 Resume cephadm work by running the following command:
228
229 .. prompt:: bash #
230
231 ceph orch resume
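
You can check whether background work is currently paused with ``ceph orch
status``; its output indicates whether the orchestrator has been paused:

.. prompt:: bash #

   ceph orch status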
232
233 .. _cephadm-stray-host:
234
235 CEPHADM_STRAY_HOST
236 ~~~~~~~~~~~~~~~~~~
237
238 This indicates that one or more hosts have Ceph daemons that are
239 running, but are not registered as hosts managed by *cephadm*. This
240 means that those services cannot currently be managed by cephadm
(e.g., restarted, upgraded, or included in ``ceph orch ps``).
242
243 * You can manage the host(s) by running the following command:
244
245 .. prompt:: bash #
246
247 ceph orch host add *<hostname>*
248
249 .. note::
250
251 You might need to configure SSH access to the remote host
252 before this will work.
253
254 * See :ref:`cephadm-fqdn` for more information about host names and
255 domain names.
256
257 * Alternatively, you can manually connect to the host and ensure that
258 services on that host are removed or migrated to a host that is
259 managed by *cephadm*.
260
261 * This warning can be disabled entirely by running the following
262 command:
263
264 .. prompt:: bash #
265
266 ceph config set mgr mgr/cephadm/warn_on_stray_hosts false
267
268 CEPHADM_STRAY_DAEMON
269 ~~~~~~~~~~~~~~~~~~~~
270
One or more Ceph daemons are running but are not managed by
*cephadm*. This may be because they were deployed using a different
273 tool, or because they were started manually. Those
274 services cannot currently be managed by cephadm (e.g., restarted,
upgraded, or included in ``ceph orch ps``).
276
277 * If the daemon is a stateful one (monitor or OSD), it should be adopted
278 by cephadm; see :ref:`cephadm-adoption`. For stateless daemons, it is
279 usually easiest to provision a new daemon with the ``ceph orch apply``
280 command and then stop the unmanaged daemon.
281
282 * If the stray daemon(s) are running on hosts not managed by cephadm, you can manage the host(s) by running the following command:
283
284 .. prompt:: bash #
285
286 ceph orch host add *<hostname>*
287
288 .. note::
289
290 You might need to configure SSH access to the remote host
291 before this will work.
292
293 * See :ref:`cephadm-fqdn` for more information about host names and
294 domain names.
295
296 * This warning can be disabled entirely by running the following command:
297
298 .. prompt:: bash #
299
300 ceph config set mgr mgr/cephadm/warn_on_stray_daemons false
301
302 CEPHADM_HOST_CHECK_FAILED
303 ~~~~~~~~~~~~~~~~~~~~~~~~~
304
305 One or more hosts have failed the basic cephadm host check, which verifies
306 that (1) the host is reachable and cephadm can be executed there, and (2)
307 that the host satisfies basic prerequisites, like a working container
308 runtime (podman or docker) and working time synchronization.
If this test fails, cephadm will not be able to manage services on that host.
310
311 You can manually run this check by running the following command:
312
313 .. prompt:: bash #
314
315 ceph cephadm check-host *<hostname>*
316
317 You can remove a broken host from management by running the following command:
318
319 .. prompt:: bash #
320
321 ceph orch host rm *<hostname>*
322
323 You can disable this health warning by running the following command:
324
325 .. prompt:: bash #
326
327 ceph config set mgr mgr/cephadm/warn_on_failed_host_check false
328
329 Cluster Configuration Checks
330 ----------------------------
Cephadm periodically scans each host in the cluster in order
to understand the state of the OS, disks, network interfaces, etc. This information can
333 then be analyzed for consistency across the hosts in the cluster to
334 identify any configuration anomalies.
335
336 Enabling Cluster Configuration Checks
337 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
338
339 These configuration checks are an **optional** feature, and are enabled
340 by running the following command:
341
342 .. prompt:: bash #
343
344 ceph config set mgr mgr/cephadm/config_checks_enabled true
345
346 States Returned by Cluster Configuration Checks
347 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
348
349 Configuration checks are triggered after each host scan. The
350 cephadm log entries will show the current state and outcome of the
351 configuration checks as follows:
352
353 Disabled state (config_checks_enabled false):
354
355 .. code-block:: bash
356
357 ALL cephadm checks are disabled, use 'ceph config set mgr mgr/cephadm/config_checks_enabled true' to enable
358
359 Enabled state (config_checks_enabled true):
360
361 .. code-block:: bash
362
363 CEPHADM 8/8 checks enabled and executed (0 bypassed, 0 disabled). No issues detected
364
365 Managing Configuration Checks (subcommands)
366 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
367
368 The configuration checks themselves are managed through several cephadm subcommands.
369
370 To determine whether the configuration checks are enabled, run the following command:
371
372 .. prompt:: bash #
373
374 ceph cephadm config-check status
375
376 This command returns the status of the configuration checker as either "Enabled" or "Disabled".
377
378
379 To list all the configuration checks and their current states, run the following command:
380
381 .. code-block:: console
382
383 # ceph cephadm config-check ls
384
385 NAME HEALTHCHECK STATUS DESCRIPTION
386 kernel_security CEPHADM_CHECK_KERNEL_LSM enabled check that SELINUX/Apparmor profiles are consistent across cluster hosts
387 os_subscription CEPHADM_CHECK_SUBSCRIPTION enabled check that subscription states are consistent for all cluster hosts
388 public_network CEPHADM_CHECK_PUBLIC_MEMBERSHIP enabled check that all hosts have a network interface on the Ceph public_network
389 osd_mtu_size CEPHADM_CHECK_MTU enabled check that OSD hosts share a common MTU setting
390 osd_linkspeed CEPHADM_CHECK_LINKSPEED enabled check that OSD hosts share a common network link speed
391 network_missing CEPHADM_CHECK_NETWORK_MISSING enabled check that the cluster/public networks as defined exist on the Ceph hosts
392 ceph_release CEPHADM_CHECK_CEPH_RELEASE enabled check for Ceph version consistency: all Ceph daemons should be the same release unless upgrade is in progress
393 kernel_version CEPHADM_CHECK_KERNEL_VERSION enabled checks that the maj.min version of the kernel is consistent across Ceph hosts
394
The name of each configuration check can be used to enable or disable a specific check by running a command of the following form:
397
398 .. prompt:: bash #
399
400 ceph cephadm config-check disable <name>
401
402 For example:
403
404 .. prompt:: bash #
405
406 ceph cephadm config-check disable kernel_security
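
A check that was previously disabled can be re-enabled with the matching
``enable`` subcommand:

.. prompt:: bash #

   ceph cephadm config-check enable kernel_security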
407
408 CEPHADM_CHECK_KERNEL_LSM
409 ~~~~~~~~~~~~~~~~~~~~~~~~
410 Each host within the cluster is expected to operate within the same Linux
411 Security Module (LSM) state. For example, if the majority of the hosts are
running with SELinux in enforcing mode, any host not running in this mode is
flagged as an anomaly, and a health check in a WARNING state is raised.
414
415 CEPHADM_CHECK_SUBSCRIPTION
416 ~~~~~~~~~~~~~~~~~~~~~~~~~~
417 This check relates to the status of OS vendor subscription. This check is
418 performed only for hosts using RHEL and helps to confirm that all hosts are
419 covered by an active subscription, which ensures that patches and updates are
420 available.
421
422 CEPHADM_CHECK_PUBLIC_MEMBERSHIP
423 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
424 All members of the cluster should have a network interface configured on at least one of the
425 public network subnets. Hosts that are not on the public network will rely on
426 routing, which may affect performance.
427
428 CEPHADM_CHECK_MTU
429 ~~~~~~~~~~~~~~~~~
430 The MTU of the network interfaces on OSD hosts can be a key factor in consistent performance. This
431 check examines hosts that are running OSD services to ensure that the MTU is
configured consistently within the cluster. This is done by identifying the
MTU setting used by the majority of hosts; any anomalies result in a
health check.
435
436 CEPHADM_CHECK_LINKSPEED
437 ~~~~~~~~~~~~~~~~~~~~~~~
438 This check is similar to the MTU check. Link speed consistency is a factor in
439 consistent cluster performance, as is the MTU of the OSD node network interfaces.
This check determines the link speed shared by the majority of OSD hosts, and a
health check is raised for any host that is configured with a lower link speed.
442
443 CEPHADM_CHECK_NETWORK_MISSING
444 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
445 The `public_network` and `cluster_network` settings support subnet definitions
446 for IPv4 and IPv6. If these settings are not found on any host in the cluster,
447 a health check is raised.
448
449 CEPHADM_CHECK_CEPH_RELEASE
450 ~~~~~~~~~~~~~~~~~~~~~~~~~~
451 Under normal operations, the Ceph cluster runs daemons that are of the same Ceph
452 release (for example, Reef). This check determines the active release for each daemon, and
reports any anomalies as a health check. *This check is bypassed if an upgrade
is in progress.*
455
456 CEPHADM_CHECK_KERNEL_VERSION
457 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
458 The OS kernel version (maj.min) is checked for consistency across hosts.
459 The kernel version of the majority of the hosts is used as the basis for
460 identifying anomalies.
461
462 .. _client_keyrings_and_configs:
463
464 Client keyrings and configs
465 ===========================
466 Cephadm can distribute copies of the ``ceph.conf`` file and client keyring
467 files to hosts. Starting from versions 16.2.10 (Pacific) and 17.2.1 (Quincy),
468 in addition to the default location ``/etc/ceph/`` cephadm also stores config
469 and keyring files in the ``/var/lib/ceph/<fsid>/config`` directory. It is usually
470 a good idea to store a copy of the config and ``client.admin`` keyring on any host
471 used to administer the cluster via the CLI. By default, cephadm does this for any
472 nodes that have the ``_admin`` label (which normally includes the bootstrap host).
473
474 .. note:: Ceph daemons will still use files on ``/etc/ceph/``. The new configuration
475 location ``/var/lib/ceph/<fsid>/config`` is used by cephadm only. Having this config
476 directory under the fsid helps cephadm to load the configuration associated with
477 the cluster.
478
479
480 When a client keyring is placed under management, cephadm will:
481
482 - build a list of target hosts based on the specified placement spec (see
483 :ref:`orchestrator-cli-placement-spec`)
484 - store a copy of the ``/etc/ceph/ceph.conf`` file on the specified host(s)
485 - store a copy of the ``ceph.conf`` file at ``/var/lib/ceph/<fsid>/config/ceph.conf`` on the specified host(s)
486 - store a copy of the ``ceph.client.admin.keyring`` file at ``/var/lib/ceph/<fsid>/config/ceph.client.admin.keyring`` on the specified host(s)
487 - store a copy of the keyring file on the specified host(s)
488 - update the ``ceph.conf`` file as needed (e.g., due to a change in the cluster monitors)
489 - update the keyring file if the entity's key is changed (e.g., via ``ceph
490 auth ...`` commands)
491 - ensure that the keyring file has the specified ownership and specified mode
492 - remove the keyring file when client keyring management is disabled
493 - remove the keyring file from old hosts if the keyring placement spec is
494 updated (as needed)
495
496 Listing Client Keyrings
497 -----------------------
498
To see the list of client keyrings that are currently under management, run the following command:
500
501 .. prompt:: bash #
502
503 ceph orch client-keyring ls
504
505 Putting a Keyring Under Management
506 ----------------------------------
507
508 To put a keyring under management, run a command of the following form:
509
510 .. prompt:: bash #
511
ceph orch client-keyring set <entity> <placement> [--mode=<mode>] [--owner=<uid>:<gid>] [--path=<path>]
513
514 - By default, the *path* is ``/etc/ceph/client.{entity}.keyring``, which is
515 where Ceph looks by default. Be careful when specifying alternate locations,
516 as existing files may be overwritten.
517 - A placement of ``*`` (all hosts) is common.
518 - The mode defaults to ``0600`` and ownership to ``0:0`` (user root, group root).
519
520 For example, to create a ``client.rbd`` key and deploy it to hosts with the
521 ``rbd-client`` label and make it group readable by uid/gid 107 (qemu), run the
522 following commands:
523
524 .. prompt:: bash #
525
526 ceph auth get-or-create-key client.rbd mon 'profile rbd' mgr 'profile rbd' osd 'profile rbd pool=my_rbd_pool'
527 ceph orch client-keyring set client.rbd label:rbd-client --owner 107:107 --mode 640
528
529 The resulting keyring file is:
530
531 .. code-block:: console
532
533 -rw-r-----. 1 qemu qemu 156 Apr 21 08:47 /etc/ceph/client.client.rbd.keyring
534
535 Disabling Management of a Keyring File
536 --------------------------------------
537
538 To disable management of a keyring file, run a command of the following form:
539
540 .. prompt:: bash #
541
542 ceph orch client-keyring rm <entity>
543
544 .. note::
545
546 This deletes any keyring files for this entity that were previously written
547 to cluster nodes.
548
549 .. _etc_ceph_conf_distribution:
550
551 /etc/ceph/ceph.conf
552 ===================
553
554 Distributing ceph.conf to hosts that have no keyrings
555 -----------------------------------------------------
556
557 It might be useful to distribute ``ceph.conf`` files to hosts without an
associated client keyring file. By default, cephadm deploys a
``ceph.conf`` file only to hosts where a client keyring is also distributed (see
560 above). To write config files to hosts without client keyrings, run the
561 following command:
562
563 .. prompt:: bash #
564
565 ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true
566
Using Placement Specs to specify which hosts get config files
--------------------------------------------------------------
569
570 By default, the configs are written to all hosts (i.e., those listed by ``ceph
571 orch host ls``). To specify which hosts get a ``ceph.conf``, run a command of
572 the following form:
573
574 .. prompt:: bash #
575
576 ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts <placement spec>
577
581 Distributing ceph.conf to hosts tagged with bare_config
582 -------------------------------------------------------
583
584 For example, to distribute configs to hosts with the ``bare_config`` label, run the following command:
585
586 .. prompt:: bash #
587
588 ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts label:bare_config
589
590 (See :ref:`orchestrator-cli-placement-spec` for more information about placement specs.)
591
592 Purging a cluster
593 =================
594
595 .. danger:: THIS OPERATION WILL DESTROY ALL DATA STORED IN THIS CLUSTER
596
To destroy a cluster and delete all data stored in it, first disable the cephadm
module in order to stop all orchestration operations (this avoids deploying new daemons):
599
600 .. prompt:: bash #
601
602 ceph mgr module disable cephadm
603
604 Then verify the FSID of the cluster:
605
606 .. prompt:: bash #
607
608 ceph fsid
609
Finally, purge Ceph daemons from all hosts in the cluster:
611
612 .. prompt:: bash #
613
614 # For each host:
615 cephadm rm-cluster --force --zap-osds --fsid <fsid>
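
After ``rm-cluster`` has been run on every host, you can confirm that nothing
is left behind by listing the daemons cephadm still sees locally (an empty
list is expected; ``cephadm ls`` inspects only the local host, so repeat it on
each host):

.. prompt:: bash #

   cephadm ls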