==================
Cephadm Operations
==================

.. _watching_cephadm_logs:

Watching cephadm log messages
=============================

Cephadm writes logs to the ``cephadm`` cluster log channel. You can
monitor Ceph's activity in real time by reading the logs as they fill
up. Run the following command to see the logs in real time:

.. prompt:: bash #

  ceph -W cephadm

By default, this command shows info-level events and above.  To see
debug-level messages as well as info-level events, run the following
commands:

.. prompt:: bash #

  ceph config set mgr mgr/cephadm/log_to_cluster_level debug
  ceph -W cephadm --watch-debug

.. warning::

  The debug messages are very verbose!

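Because debug output is so verbose, you may want to return to the default level when you are done. The sketch below is not part of cephadm: ``watch_cephadm_debug`` and the ``CEPH`` variable (set it to ``echo`` for a dry run) are hypothetical names, and the sketch assumes that removing the override with ``ceph config rm`` restores the default info level.

.. code-block:: bash

  # Hypothetical helper (not shipped with cephadm): enable debug-level cephadm
  # messages, watch them, and remove the override again when the watch ends.
  # Set CEPH=echo to print the commands instead of running them.
  CEPH=${CEPH:-ceph}

  watch_cephadm_debug() (
    # The body runs in a subshell, so the EXIT trap fires when it returns.
    # Turning INT/TERM into a normal exit makes Ctrl-C run the EXIT trap too.
    trap 'exit 130' INT TERM
    trap '"$CEPH" config rm mgr mgr/cephadm/log_to_cluster_level' EXIT
    "$CEPH" config set mgr mgr/cephadm/log_to_cluster_level debug
    "$CEPH" -W cephadm --watch-debug
  )

Run ``watch_cephadm_debug`` and press Ctrl-C to stop watching; the debug override is then removed.
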
You can see recent events by running the following command:

.. prompt:: bash #

  ceph log last cephadm

These events are also logged to the ``ceph.cephadm.log`` file on
monitor hosts as well as to the monitor daemons' stderr.


.. _cephadm-logs:

Ceph daemon logs
================

Logging to journald
-------------------

Ceph daemons traditionally wrote logs to ``/var/log/ceph``. Under cephadm,
Ceph daemons log to journald by default, and their logs are captured by the
container runtime environment. These logs are accessible via ``journalctl``.

.. note:: Prior to Quincy, Ceph daemons logged to stderr.

Example of logging to journald
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For example, to view the logs for the daemon ``mon.foo`` for a cluster
with ID ``5c5a50ae-272a-455d-99e9-32c6a013e694``, the command would be
something like:

.. prompt:: bash #

  journalctl -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo

This works well for normal operations when logging levels are low.

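The unit name in the example above follows the pattern ``ceph-<cluster-fsid>@<daemon-name>``. A pair of hypothetical shell helpers can build that name for you (``ceph_unit`` and ``follow_daemon_log`` are illustrative names, not cephadm commands). When no FSID is passed, this sketch assumes the ``ceph`` CLI works on the host, because it then looks up the cluster ID with ``ceph fsid``.

.. code-block:: bash

  # Hypothetical helpers: build the systemd unit name for a cephadm-managed
  # daemon (ceph-<fsid>@<daemon-name>) and follow its journal.
  ceph_unit() {          # ceph_unit FSID DAEMON
    printf 'ceph-%s@%s\n' "$1" "$2"
  }

  follow_daemon_log() {  # follow_daemon_log DAEMON [FSID]
    local daemon=$1 fsid=${2:-}
    # Fall back to asking the cluster for its FSID.
    if [ -z "$fsid" ]; then
      fsid=$(ceph fsid) || return
    fi
    journalctl -f -u "$(ceph_unit "$fsid" "$daemon")"
  }

For example, ``follow_daemon_log mon.foo 5c5a50ae-272a-455d-99e9-32c6a013e694`` follows the same journal as the command above. Run it on the host where the daemon runs, because each host keeps its own journal.
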
Logging to files
----------------

You can also configure Ceph daemons to log to files instead of to
journald if you prefer logs to appear in files (as they did in earlier,
pre-cephadm, pre-Octopus versions of Ceph).  When Ceph logs to files,
the logs appear in ``/var/log/ceph/<cluster-fsid>``. If you configure
Ceph to log to files, remember to also disable logging to journald (the
commands for this are covered below).

Enabling logging to files
~~~~~~~~~~~~~~~~~~~~~~~~~

To enable logging to files, run the following commands:

.. prompt:: bash #

  ceph config set global log_to_file true
  ceph config set global mon_cluster_log_to_file true

Disabling logging to journald
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you choose to log to files, we recommend disabling logging to journald or else
everything will be logged twice. Run the following commands to disable logging
to stderr and to journald:

.. prompt:: bash #

  ceph config set global log_to_stderr false
  ceph config set global mon_cluster_log_to_stderr false
  ceph config set global log_to_journald false
  ceph config set global mon_cluster_log_to_journald false

.. note:: You can change the default by passing ``--log-to-file`` when
   bootstrapping a new cluster.

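If you make this change on several clusters, it can help to apply all six settings from the two subsections above in one step. This is a minimal sketch: ``enable_file_logging`` and the ``CEPH`` variable are hypothetical, and the order (enable file logging first, then disable stderr and journald) is a choice made here so that there is no moment at which every log destination is switched off.

.. code-block:: bash

  # Hypothetical helper: switch Ceph daemon logging from journald to files
  # using the settings from the two subsections above.
  # Set CEPH=echo to print the commands instead of running them.
  CEPH=${CEPH:-ceph}

  enable_file_logging() {
    local opt
    # Turn on file logging first ...
    for opt in log_to_file mon_cluster_log_to_file; do
      "$CEPH" config set global "$opt" true || return
    done
    # ... then turn off the stderr and journald outputs.
    for opt in log_to_stderr mon_cluster_log_to_stderr \
               log_to_journald mon_cluster_log_to_journald; do
      "$CEPH" config set global "$opt" false || return
    done
  }

The function stops at the first ``ceph config set`` command that fails and returns its exit status.
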
Modifying the log retention schedule
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, cephadm sets up log rotation on each host to rotate these
files.  You can configure the logging retention schedule by modifying
``/etc/logrotate.d/ceph.<cluster-fsid>``.

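Retention is controlled by logrotate directives such as ``rotate``, which sets how many rotated files are kept. The exact contents of the cephadm-generated file may differ between releases, so the following hypothetical helper (``set_log_rotate_count`` is not a cephadm command) only rewrites existing ``rotate N`` lines and keeps a ``.bak`` copy of the original file.

.. code-block:: bash

  # Hypothetical helper: change how many rotated log files logrotate keeps
  # by rewriting every "rotate N" directive in the given logrotate file.
  # The original file is saved with a .bak suffix.
  set_log_rotate_count() {   # set_log_rotate_count FILE COUNT
    local file=$1 count=$2
    case $count in
      ''|*[!0-9]*) echo "count must be a non-negative integer" >&2; return 1 ;;
    esac
    sed -i.bak -E "s/^([[:space:]]*rotate)[[:space:]]+[0-9]+/\1 $count/" "$file"
  }

For example, ``set_log_rotate_count "/etc/logrotate.d/ceph.$(ceph fsid)" 14``. Because log rotation is set up per host, repeat this on each host. ``logrotate -d <file>`` performs a dry run that shows how logrotate interprets the edited file.
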

Data location
=============

Cephadm stores daemon data and logs in different locations than did
older, pre-cephadm (pre-Octopus) versions of Ceph:

* ``/var/log/ceph/<cluster-fsid>`` contains all cluster logs. By
  default, Ceph daemons log to journald and the container runtime, so
  these log files will not exist unless you have enabled logging to files
  as described in `cephadm-logs`_.
* ``/var/lib/ceph/<cluster-fsid>`` contains all cluster daemon data
  (besides logs).
* ``/var/lib/ceph/<cluster-fsid>/<daemon-name>`` contains all data for
  an individual daemon.
* ``/var/lib/ceph/<cluster-fsid>/crash`` contains crash reports for
  the cluster.
* ``/var/lib/ceph/<cluster-fsid>/removed`` contains old daemon
  data directories for stateful daemons (e.g., monitor, prometheus)
  that have been removed by cephadm.

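The layout above can be expressed as a pair of hypothetical shell helpers (``ceph_data_dir`` and ``ceph_log_dir`` are illustrative names, not part of cephadm). They only build paths and do not check that the directories exist.

.. code-block:: bash

  # Hypothetical helpers that print the cephadm directory layout described
  # above; they build paths only and do not touch the file system.
  ceph_data_dir() {   # ceph_data_dir FSID [DAEMON-NAME | crash | removed]
    printf '/var/lib/ceph/%s%s\n' "$1" "${2:+/$2}"
  }

  ceph_log_dir() {    # ceph_log_dir FSID
    printf '/var/log/ceph/%s\n' "$1"
  }

For example, ``du -sh "$(ceph_data_dir "$(ceph fsid)")"/*`` shows how much space each daemon directory uses on the current host, and ``ceph_data_dir <cluster-fsid> crash`` prints the crash report directory.
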
Disk usage
----------

Because a few Ceph daemons (notably, the monitors and prometheus) store a
large amount of data in ``/var/lib/ceph``, we recommend moving this
directory to its own disk, partition, or logical volume so that it does not
fill up the root file system.

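To see whether a host already follows this recommendation, you can compare the mount point that holds ``/var/lib/ceph`` with the root mount point. The following is a hypothetical sketch, not a cephadm feature (``mount_point_of`` and ``check_ceph_data_fs`` are illustrative names); it relies on the POSIX ``df -P`` output format and assumes that mount points contain no spaces.

.. code-block:: bash

  # Hypothetical check (not part of cephadm): warn when a directory, by default
  # /var/lib/ceph, is on the same file system as /.
  mount_point_of() {     # mount_point_of PATH
    # In POSIX "df -P" output, the mount point is the last field of line 2.
    df -P "$1" | awk 'NR == 2 { print $NF }'
  }

  check_ceph_data_fs() { # check_ceph_data_fs [DIR]
    local dir=${1:-/var/lib/ceph} mp
    if [ ! -e "$dir" ]; then
      echo "$dir does not exist" >&2
      return 2
    fi
    mp=$(mount_point_of "$dir")
    if [ "$mp" = "$(mount_point_of /)" ]; then
      echo "WARNING: $dir is on the root file system"
      return 1
    fi
    echo "OK: $dir is on its own file system (mounted at $mp)"
  }

The function returns 1 when the directory shares the root file system, so it can be used in a per-host provisioning check.
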

Health checks
=============
The cephadm module provides additional health checks to supplement the
default health checks provided by the cluster. These additional health
checks fall into two categories:

- **cephadm operations**: Health checks in this category are always
  executed when the cephadm module is active.
- **cluster configuration**: These health checks are *optional*, and
  focus on the configuration of the hosts in the cluster.

CEPHADM Operations
------------------

CEPHADM_PAUSED
~~~~~~~~~~~~~~

This indicates that cephadm background work has been paused with
``ceph orch pause``.  Cephadm continues to perform passive monitoring
activities (like checking host and daemon status), but it will not
make any changes (like deploying or removing daemons).

Resume cephadm work by running the following command:

.. prompt:: bash #

  ceph orch resume

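In scripts, you may want to resume cephadm only when it is actually paused. This is a minimal sketch, assuming that ``ceph health detail`` lists the ``CEPHADM_PAUSED`` code while the pause is in effect; ``resume_if_paused`` and the ``CEPH`` variable are hypothetical.

.. code-block:: bash

  # Hypothetical helper: resume cephadm only if the CEPHADM_PAUSED check is
  # currently reported by "ceph health detail".
  # CEPH can point at a stub command for testing.
  CEPH=${CEPH:-ceph}

  resume_if_paused() {
    if "$CEPH" health detail | grep -q CEPHADM_PAUSED; then
      "$CEPH" orch resume
    else
      echo "cephadm is not paused"
    fi
  }
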
.. _cephadm-stray-host:

CEPHADM_STRAY_HOST
~~~~~~~~~~~~~~~~~~

This indicates that one or more hosts have Ceph daemons that are
running but are not registered as hosts managed by *cephadm*.  This
means that those services cannot currently be managed by cephadm
(e.g., restarted, upgraded, or included in ``ceph orch ps``).

You can manage the host(s) by running the following command:

.. prompt:: bash #

  ceph orch host add *<hostname>*

.. note::

  You might need to configure SSH access to the remote host
  before this will work.

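The SSH setup mentioned in the note usually means installing the cluster's public SSH key on the host. The sketch below wraps that step and ``ceph orch host add`` in one hypothetical function (``manage_host``, ``CEPH`` and ``SSH_COPY_ID`` are illustrative names). It assumes that you can already log in to the host as root by some other means, so that ``ssh-copy-id`` can install the key exported by ``ceph cephadm get-pub-key``.

.. code-block:: bash

  # Hypothetical helper (not part of cephadm): install the cluster's SSH key
  # on a host and then add the host to cephadm. Assumes root can already log
  # in to the host by other means. Override CEPH / SSH_COPY_ID for dry runs.
  CEPH=${CEPH:-ceph}
  SSH_COPY_ID=${SSH_COPY_ID:-ssh-copy-id}

  manage_host() {        # manage_host HOSTNAME [ADDR]
    local host=$1 addr=${2:-$1} dir rc=0
    dir=$(mktemp -d) || return
    # ssh-copy-id expects the key file name to end in .pub.
    "$CEPH" cephadm get-pub-key > "$dir/ceph.pub" &&
      "$SSH_COPY_ID" -f -i "$dir/ceph.pub" "root@$addr" || rc=$?
    rm -rf "$dir"
    [ "$rc" -eq 0 ] || return "$rc"
    # Pass the address only if one was given explicitly.
    "$CEPH" orch host add "$host" ${2:+"$addr"}
  }

For example, ``manage_host host2 10.0.0.2`` installs the key for ``root@10.0.0.2`` and then runs ``ceph orch host add host2 10.0.0.2``.
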
Alternatively, you can manually connect to the host and ensure that
services on that host are removed or migrated to a host that is
managed by *cephadm*.

This warning can be disabled entirely by running the following
command:

.. prompt:: bash #

  ceph config set mgr mgr/cephadm/warn_on_stray_hosts false

See :ref:`cephadm-fqdn` for more information about host names and
domain names.

CEPHADM_STRAY_DAEMON
~~~~~~~~~~~~~~~~~~~~

One or more Ceph daemons are running but are not managed by
*cephadm*.  This may be because they were deployed using a different
tool, or because they were started manually.  Those
services cannot currently be managed by cephadm (e.g., restarted,
upgraded, or included in ``ceph orch ps``).

If the daemon is a stateful one (monitor or OSD), it should be adopted
by cephadm; see :ref:`cephadm-adoption`.  For stateless daemons, it is
usually easiest to provision a new daemon with the ``ceph orch apply``
command and then stop the unmanaged daemon.

This warning can be disabled entirely by running the following command:

.. prompt:: bash #

  ceph config set mgr mgr/cephadm/warn_on_stray_daemons false

CEPHADM_HOST_CHECK_FAILED
~~~~~~~~~~~~~~~~~~~~~~~~~

One or more hosts have failed the basic cephadm host check, which verifies
that (1) the host is reachable and cephadm can be executed there, and (2)
the host satisfies basic prerequisites, like a working container
runtime (podman or docker) and working time synchronization.
If this test fails, cephadm will not be able to manage services on that host.

You can manually run this check by running the following command:

.. prompt:: bash #

  ceph cephadm check-host *<hostname>*

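When several hosts are affected, you can run the check in a loop. This is a hypothetical sketch (``check_hosts`` and ``CEPH`` are illustrative names) that assumes ``ceph cephadm check-host`` exits with a non-zero status when a host fails the check.

.. code-block:: bash

  # Hypothetical helper: run "ceph cephadm check-host" for each host given
  # and report the hosts that fail. CEPH can point at a stub for testing.
  CEPH=${CEPH:-ceph}

  check_hosts() {       # check_hosts HOST...
    local host failed=
    for host in "$@"; do
      if ! "$CEPH" cephadm check-host "$host" >/dev/null 2>&1; then
        failed="${failed:+$failed }$host"
      fi
    done
    if [ -n "$failed" ]; then
      echo "host check failed: $failed"
      return 1
    fi
    echo "all hosts passed"
  }

To check every managed host, you could pass it the names from ``ceph orch host ls``, for example ``check_hosts $(ceph orch host ls --format json | jq -r '.[].hostname')``, assuming ``jq`` is installed and the JSON output lists each host under a ``hostname`` key.
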
You can remove a broken host from management by running the following command:

.. prompt:: bash #

  ceph orch host rm *<hostname>*

You can disable this health warning by running the following command:

.. prompt:: bash #

  ceph config set mgr mgr/cephadm/warn_on_failed_host_check false

Cluster Configuration Checks
----------------------------
Cephadm periodically scans each of the hosts in the cluster in order
to understand the state of the OS, disks, NICs, etc. These facts can
then be analysed for consistency across the hosts in the cluster to
identify any configuration anomalies.

Enabling Cluster Configuration Checks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The configuration checks are an **optional** feature, and are enabled
by running the following command:

.. prompt:: bash #

  ceph config set mgr mgr/cephadm/config_checks_enabled true

States Returned by Cluster Configuration Checks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The configuration checks are triggered after each host scan (every minute).
The cephadm log entries will show the current state and outcome of the
configuration checks as follows:

Disabled state (config_checks_enabled false):

.. code-block:: none

  ALL cephadm checks are disabled, use 'ceph config set mgr mgr/cephadm/config_checks_enabled true' to enable

Enabled state (config_checks_enabled true):

.. code-block:: none

  CEPHADM 8/8 checks enabled and executed (0 bypassed, 0 disabled). No issues detected

Managing Configuration Checks (subcommands)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The configuration checks themselves are managed through several cephadm subcommands.

To determine whether the configuration checks are enabled, run the following command:

.. prompt:: bash #

  ceph cephadm config-check status

This command returns the status of the configuration checker as either "Enabled" or "Disabled".

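Because ``config-check status`` prints either "Enabled" or "Disabled", a script can enable the checker only when needed. This is a minimal sketch; ``ensure_config_checks`` and the ``CEPH`` variable are hypothetical, and the match on the word "Enabled" assumes the output wording described above.

.. code-block:: bash

  # Hypothetical helper: enable the configuration checker only if
  # "ceph cephadm config-check status" does not already report "Enabled".
  CEPH=${CEPH:-ceph}

  ensure_config_checks() {
    local status
    status=$("$CEPH" cephadm config-check status) || return
    case $status in
      *Enabled*) echo "configuration checks already enabled" ;;
      *) "$CEPH" config set mgr mgr/cephadm/config_checks_enabled true ;;
    esac
  }
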

To list all the configuration checks and their current states, run the following command:

.. code-block:: console

  # ceph cephadm config-check ls

  NAME             HEALTHCHECK                      STATUS   DESCRIPTION
  kernel_security  CEPHADM_CHECK_KERNEL_LSM         enabled  checks SELINUX/Apparmor profiles are consistent across cluster hosts
  os_subscription  CEPHADM_CHECK_SUBSCRIPTION       enabled  checks subscription states are consistent for all cluster hosts
  public_network   CEPHADM_CHECK_PUBLIC_MEMBERSHIP  enabled  check that all hosts have a NIC on the Ceph public_netork
  osd_mtu_size     CEPHADM_CHECK_MTU                enabled  check that OSD hosts share a common MTU setting
  osd_linkspeed    CEPHADM_CHECK_LINKSPEED          enabled  check that OSD hosts share a common linkspeed
  network_missing  CEPHADM_CHECK_NETWORK_MISSING    enabled  checks that the cluster/public networks defined exist on the Ceph hosts
  ceph_release     CEPHADM_CHECK_CEPH_RELEASE       enabled  check for Ceph version consistency - ceph daemons should be on the same release (unless upgrade is active)
  kernel_version   CEPHADM_CHECK_KERNEL_VERSION     enabled  checks that the MAJ.MIN of the kernel on Ceph hosts is consistent

The name of each configuration check can be used to enable or disable a specific check by running a command of the following form:

.. prompt:: bash #

  ceph cephadm config-check disable <name>

For example:

.. prompt:: bash #

  ceph cephadm config-check disable kernel_security

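With many checks, it can be convenient to extract the names from the ``ls`` output and disable several at once. The hypothetical helpers below (``enabled_checks`` and ``disable_checks`` are illustrative names) assume the whitespace-separated column layout shown above, with the name in the first column and the status in the third.

.. code-block:: bash

  # Hypothetical helpers for the config-check subcommands above.
  CEPH=${CEPH:-ceph}

  # Read "ceph cephadm config-check ls" output on stdin and print the NAME of
  # every check whose STATUS column is "enabled" (the header row is skipped).
  enabled_checks() {
    awk '$1 != "NAME" && $3 == "enabled" { print $1 }'
  }

  # Disable each named check in turn; stop at the first failure.
  disable_checks() {    # disable_checks NAME...
    local name
    for name in "$@"; do
      "$CEPH" cephadm config-check disable "$name" || return
    done
  }

For example, ``ceph cephadm config-check ls | enabled_checks`` prints the names of the enabled checks, and ``disable_checks kernel_security os_subscription`` disables two checks.
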
CEPHADM_CHECK_KERNEL_LSM
~~~~~~~~~~~~~~~~~~~~~~~~
Each host within the cluster is expected to operate within the same Linux
Security Module (LSM) state. For example, if the majority of the hosts are
running with SELinux in enforcing mode, any host not running in this mode is
flagged as an anomaly and a health check (WARNING) is raised.

CEPHADM_CHECK_SUBSCRIPTION
~~~~~~~~~~~~~~~~~~~~~~~~~~
This check relates to the status of vendor subscriptions. It is performed
only on hosts running RHEL, and it helps to confirm that all hosts are
covered by an active subscription, which ensures that patches and updates are
available.

CEPHADM_CHECK_PUBLIC_MEMBERSHIP
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All members of the cluster should have NICs configured on at least one of the
public network subnets. Hosts that are not on the public network will rely on
routing, which may affect performance.

CEPHADM_CHECK_MTU
~~~~~~~~~~~~~~~~~
The MTU of the NICs on OSDs can be a key factor in consistent performance. This
check examines hosts that are running OSD services to ensure that the MTU is
configured consistently within the cluster. This is determined by establishing
the MTU setting that the majority of hosts are using. Any anomalies result in a
Ceph health check.

CEPHADM_CHECK_LINKSPEED
~~~~~~~~~~~~~~~~~~~~~~~
This check is similar to the MTU check. Link speed consistency is a factor in
consistent cluster performance, just as the MTU of the NICs on the OSDs is.
This check determines the link speed shared by the majority of OSD hosts, and a
health check is raised for any hosts that are set at a lower link speed.

CEPHADM_CHECK_NETWORK_MISSING
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ``public_network`` and ``cluster_network`` settings support subnet definitions
for IPv4 and IPv6. If the subnets defined by these settings are not found on any
host in the cluster, a health check is raised.

CEPHADM_CHECK_CEPH_RELEASE
~~~~~~~~~~~~~~~~~~~~~~~~~~
Under normal operations, all daemons in a Ceph cluster run the same Ceph
release (for example, every daemon runs Octopus). This check determines the
active release for each daemon and reports any anomalies as a health check.
*This check is bypassed if an upgrade process is active within the cluster.*

CEPHADM_CHECK_KERNEL_VERSION
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The OS kernel version (maj.min) is checked for consistency across the hosts.
The kernel version of the majority of the hosts is used as the basis for
identifying anomalies.

.. _client_keyrings_and_configs:

Client keyrings and configs
===========================

Cephadm can distribute copies of the ``ceph.conf`` file and client keyring
files to hosts. It is usually a good idea to store a copy of the config and
``client.admin`` keyring on any host used to administer the cluster via the
CLI.  By default, cephadm does this for any nodes that have the ``_admin``
label (which normally includes the bootstrap host).

When a client keyring is placed under management, cephadm will:

  - build a list of target hosts based on the specified placement spec (see
    :ref:`orchestrator-cli-placement-spec`)
  - store a copy of the ``/etc/ceph/ceph.conf`` file on the specified host(s)
  - store a copy of the keyring file on the specified host(s)
  - update the ``ceph.conf`` file as needed (e.g., due to a change in the cluster monitors)
  - update the keyring file if the entity's key is changed (e.g., via ``ceph
    auth ...`` commands)
  - ensure that the keyring file has the specified ownership and specified mode
  - remove the keyring file when client keyring management is disabled
  - remove the keyring file from old hosts if the keyring placement spec is
    updated (as needed)

Listing Client Keyrings
-----------------------

To see the list of client keyrings that are currently under management, run the following command:

.. prompt:: bash #

  ceph orch client-keyring ls

Putting a Keyring Under Management
----------------------------------

To put a keyring under management, run a command of the following form:

.. prompt:: bash #

  ceph orch client-keyring set <entity> <placement> [--mode=<mode>] [--owner=<uid>:<gid>] [--path=<path>]

- By default, the *path* is ``/etc/ceph/client.{entity}.keyring``, which is
  where Ceph looks by default.  Be careful when specifying alternate locations,
  as existing files may be overwritten.
- A placement of ``*`` (all hosts) is common.
- The mode defaults to ``0600`` and ownership to ``0:0`` (user root, group root).

For example, to create a ``client.rbd`` key and deploy it to hosts with the
``rbd-client`` label and make it group readable by uid/gid 107 (qemu), run the
following commands:

.. prompt:: bash #

  ceph auth get-or-create-key client.rbd mon 'profile rbd' mgr 'profile rbd' osd 'profile rbd pool=my_rbd_pool'
  ceph orch client-keyring set client.rbd label:rbd-client --owner 107:107 --mode 640

The resulting keyring file is:

.. code-block:: console

  -rw-r-----. 1 qemu qemu 156 Apr 21 08:47 /etc/ceph/client.client.rbd.keyring

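After cephadm writes the file, you can confirm that the mode and ownership match what you requested. This hypothetical helper (``check_keyring_perms`` is an illustrative name) uses GNU coreutils ``stat -c`` and accepts a mode with or without a single leading zero.

.. code-block:: bash

  # Hypothetical check (not part of cephadm): compare a keyring file's mode
  # and owner with the values passed to "ceph orch client-keyring set".
  # Requires GNU coreutils stat (-c).
  check_keyring_perms() {  # check_keyring_perms FILE MODE UID:GID
    local file=$1 want="${2#0} $3" have
    have=$(stat -c '%a %u:%g' "$file") || return
    if [ "$have" = "$want" ]; then
      echo "OK: $file"
    else
      echo "MISMATCH: $file has '$have', expected '$want'"
      return 1
    fi
  }

For the example above, run ``check_keyring_perms /etc/ceph/client.client.rbd.keyring 640 107:107`` on one of the ``rbd-client`` hosts.
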
Disabling Management of a Keyring File
--------------------------------------

To disable management of a keyring file, run a command of the following form:

.. prompt:: bash #

  ceph orch client-keyring rm <entity>

.. note::

  This deletes any keyring files for this entity that were previously written
  to cluster nodes.

.. _etc_ceph_conf_distribution:

/etc/ceph/ceph.conf
===================

Distributing ceph.conf to hosts that have no keyrings
-----------------------------------------------------

It might be useful to distribute ``ceph.conf`` files to hosts without an
associated client keyring file.  By default, cephadm deploys only a
``ceph.conf`` file to hosts where a client keyring is also distributed (see
above).  To write config files to hosts without client keyrings, run the
following command:

.. prompt:: bash #

  ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true

Using Placement Specs to specify which hosts get ceph.conf
----------------------------------------------------------

By default, the configs are written to all hosts (i.e., those listed by ``ceph
orch host ls``).  To specify which hosts get a ``ceph.conf``, run a command of
the following form:

.. prompt:: bash #

  ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts <placement spec>

Distributing ceph.conf to hosts tagged with bare_config
-------------------------------------------------------

For example, to distribute configs to hosts with the ``bare_config`` label, run the following command:

.. prompt:: bash #

  ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts label:bare_config

(See :ref:`orchestrator-cli-placement-spec` for more information about placement specs.)
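
The two settings in this section can be applied together: enable ``manage_etc_ceph_ceph_conf`` and restrict it to a label in one step. This is a minimal sketch; ``distribute_conf_to_label`` and the ``CEPH`` variable are hypothetical.

.. code-block:: bash

  # Hypothetical helper: have cephadm write /etc/ceph/ceph.conf to every host
  # with a given label, including hosts that get no client keyring.
  # Set CEPH=echo for a dry run.
  CEPH=${CEPH:-ceph}

  distribute_conf_to_label() {   # distribute_conf_to_label LABEL
    "$CEPH" config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true || return
    "$CEPH" config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts "label:$1"
  }

Hosts receive the label with ``ceph orch host label add <hostname> bare_config``; ``distribute_conf_to_label bare_config`` then applies both settings.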