ceph/doc/rados/operations/monitoring.rst

   1 ======================
   2  Monitoring a Cluster
   3 ======================
   4
   5 Once you have a running cluster, you may use the ``ceph`` tool to monitor your
   6 cluster. Monitoring a cluster typically involves checking OSD status, monitor
   7 status, placement group status and metadata server status.
   8
   9 Using the command line
  10 ======================
  11
  12 Interactive mode
  13 ----------------
  14
  15 To run the ``ceph`` tool in interactive mode, type ``ceph`` at the command line
  16 with no arguments.  For example::
  17
  18         ceph
  19         ceph> health
  20         ceph> status
  21         ceph> quorum_status
  22         ceph> mon stat
  23
  24 Non-default paths
  25 -----------------
  26
  27 If you specified non-default locations for your configuration or keyring,
  28 you may specify their locations::
  29
  30    ceph -c /path/to/conf -k /path/to/keyring health
  31
  32 Checking a Cluster's Status
  33 ===========================
  34
  35 After you start your cluster, and before you start reading and/or
  36 writing data, check your cluster's status first.
  37
  38 To check a cluster's status, execute the following::
  39
  40         ceph status
  41
  42 Or::
  43
  44         ceph -s
  45
  46 In interactive mode, type ``status`` and press **Enter**. ::
  47
  48         ceph> status
  49
  50 Ceph will print the cluster status. For example, a tiny Ceph demonstration
  51 cluster with one of each service may print the following:
  52
  53 ::
  54
  55   cluster:
  56     id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
  57     health: HEALTH_OK
  58
  59   services:
  60     mon: 3 daemons, quorum a,b,c
  61     mgr: x(active)
  62     mds: cephfs_a-1/1/1 up  {0=a=up:active}, 2 up:standby
  63     osd: 3 osds: 3 up, 3 in
  64
  65   data:
  66     pools:   2 pools, 16 pgs
  67     objects: 21 objects, 2.19K
  68     usage:   546 GB used, 384 GB / 931 GB avail
  69     pgs:     16 active+clean
  70
  71
  72 .. topic:: How Ceph Calculates Data Usage
  73
  74    The ``usage`` value reflects the *actual* amount of raw storage used. The
  75    ``xxx GB / xxx GB`` value means the amount available (the lesser number)
  76    of the overall storage capacity of the cluster. The notional number reflects
  77    the size of the stored data before it is replicated, cloned or snapshotted.
  78    Therefore, the amount of data actually stored typically exceeds the notional
  79    amount stored, because Ceph creates replicas of the data and may also use
  80    storage capacity for cloning and snapshotting.
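
As a rough illustration of how the ``usage`` line can be read, the following sketch parses the sample line shown above and computes the fraction of raw capacity in use. The regular expression is inferred from that one sample and is an assumption, not a stable output format; the helper name ``parse_usage`` is hypothetical.

```python
import re

# Pattern inferred from the sample ``usage`` line above (an assumption).
USAGE_RE = re.compile(
    r"usage:\s+(?P<used>[\d.]+) (?P<u1>\w+) used, "
    r"(?P<avail>[\d.]+) (?P<u2>\w+) / (?P<total>[\d.]+) (?P<u3>\w+) avail"
)


def parse_usage(line):
    """Return (used, avail, total) as floats; all three must share one unit."""
    m = USAGE_RE.search(line)
    if m is None:
        raise ValueError("not a usage line: %r" % line)
    if not (m["u1"] == m["u2"] == m["u3"]):
        raise ValueError("mixed units are not handled in this sketch")
    return float(m["used"]), float(m["avail"]), float(m["total"])


used, avail, total = parse_usage("    usage:   546 GB used, 384 GB / 931 GB avail")
raw_used_fraction = used / total  # raw storage used over total raw capacity
print(f"{raw_used_fraction:.1%} of raw capacity used")
```

For anything beyond a quick check, structured output is likely a better input than scraped text.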
  81
  82
  83 Watching a Cluster
  84 ==================
  85
  86 In addition to local logging by each daemon, Ceph clusters maintain
  87 a *cluster log* that records high-level events about the whole system.
  88 This is logged to disk on monitor servers (as ``/var/log/ceph/ceph.log`` by
  89 default), but can also be monitored via the command line.
  90
  91 To follow the cluster log, use the following command:
  92
  93 ::
  94
  95         ceph -w
  96
  97 Ceph will print the status of the system, followed by each log message as it
  98 is emitted.  For example:
  99
 100 ::
 101
 102   cluster:
 103     id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
 104     health: HEALTH_OK
 105
 106   services:
 107     mon: 3 daemons, quorum a,b,c
 108     mgr: x(active)
 109     mds: cephfs_a-1/1/1 up  {0=a=up:active}, 2 up:standby
 110     osd: 3 osds: 3 up, 3 in
 111
 112   data:
 113     pools:   2 pools, 16 pgs
 114     objects: 21 objects, 2.19K
 115     usage:   546 GB used, 384 GB / 931 GB avail
 116     pgs:     16 active+clean
 117
 118
 119   2017-07-24 08:15:11.329298 mon.a mon.0 172.21.9.34:6789/0 23 : cluster [INF] osd.0 172.21.9.34:6806/20527 boot
 120   2017-07-24 08:15:14.258143 mon.a mon.0 172.21.9.34:6789/0 39 : cluster [INF] Activating manager daemon x
 121   2017-07-24 08:15:15.446025 mon.a mon.0 172.21.9.34:6789/0 47 : cluster [INF] Manager daemon x is now available
 122
 123
 124 In addition to using ``ceph -w`` to print log lines as they are emitted,
 125 use ``ceph log last [n]`` to see the most recent ``n`` lines from the cluster
 126 log.
 127
 128 Monitoring Health Checks
 129 ========================
 130
 131 Ceph continuously runs various *health checks* against its own status.  When
 132 a health check fails, this is reflected in the output of ``ceph status`` (or
 133 ``ceph health``).  In addition, messages are sent to the cluster log to
 134 indicate when a check fails, and when the cluster recovers.
 135
 136 For example, when an OSD goes down, the ``health`` section of the status
 137 output may be updated as follows:
 138
 139 ::
 140
 141     health: HEALTH_WARN
 142             1 osds down
 143             Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded
 144
 145 At this time, cluster log messages are also emitted to record the failure of the
 146 health checks:
 147
 148 ::
 149
 150     2017-07-25 10:08:58.265945 mon.a mon.0 172.21.9.34:6789/0 91 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
 151     2017-07-25 10:09:01.302624 mon.a mon.0 172.21.9.34:6789/0 94 : cluster [WRN] Health check failed: Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded (PG_DEGRADED)
 152
 153 When the OSD comes back online, the cluster log records the cluster's return
 154 to a healthy state:
 155
 156 ::
 157
 158     2017-07-25 10:11:11.526841 mon.a mon.0 172.21.9.34:6789/0 109 : cluster [WRN] Health check update: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized (PG_DEGRADED)
 159     2017-07-25 10:11:13.535493 mon.a mon.0 172.21.9.34:6789/0 110 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized)
 160     2017-07-25 10:11:13.535577 mon.a mon.0 172.21.9.34:6789/0 111 : cluster [INF] Cluster is now healthy
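
The health-check messages in the cluster log follow a regular shape, so they can be picked out mechanically. The sketch below is based only on the sample lines above: the patterns and the helper name ``parse_health_event`` are assumptions, and other Ceph releases may format these lines differently.

```python
import re

# Pattern inferred from the sample cluster log lines above (an assumption).
HEALTH_RE = re.compile(
    r"cluster \[(?P<level>\w+)\] "
    r"Health check (?P<event>failed|update|cleared): (?P<rest>.*)$"
)


def parse_health_event(line):
    """Return (level, event, code) for a health-check log line, else None."""
    m = HEALTH_RE.search(line.rstrip())
    if m is None:
        return None
    rest = m["rest"]
    if m["event"] == "cleared":
        # "cleared" lines start with the health check code.
        code = rest.split()[0]
    else:
        # "failed" and "update" lines end with the code in parentheses.
        tail = re.search(r"\((\w+)\)$", rest)
        code = tail.group(1) if tail else None
    return m["level"], m["event"], code


failed = ("2017-07-25 10:08:58.265945 mon.a mon.0 172.21.9.34:6789/0 91 : "
          "cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)")
print(parse_health_event(failed))
```

Lines that are not health-check events, such as ``Cluster is now healthy``, yield ``None`` and can be handled separately.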
 161
 162 Network Performance Checks
 163 --------------------------
 164
 165 Ceph OSDs send heartbeat ping messages to one another to monitor daemon availability.  Ceph
 166 also uses the response times to monitor network performance.
 167 Although a busy OSD could delay a ping response, we can assume that
 168 if a network switch fails, multiple delays will be detected between distinct pairs of OSDs.
 169
 170 By default, Ceph warns about ping times that exceed 1 second (1000 milliseconds).
 171
 172 ::
 173
 174     HEALTH_WARN Slow OSD heartbeats on back (longest 1118.001ms)
 175
 176 The health detail lists the combinations of OSDs that are seeing the delays and by how much.  There is a limit of 10
 177 detail line items.
 178
 179 ::
 180
 181     [WRN] OSD_SLOW_PING_TIME_BACK: Slow OSD heartbeats on back (longest 1118.001ms)
 182         Slow OSD heartbeats on back from osd.0 [dc1,rack1] to osd.1 [dc1,rack1] 1118.001 msec possibly improving
 183         Slow OSD heartbeats on back from osd.0 [dc1,rack1] to osd.2 [dc1,rack2] 1030.123 msec
 184         Slow OSD heartbeats on back from osd.2 [dc1,rack2] to osd.1 [dc1,rack1] 1015.321 msec
 185         Slow OSD heartbeats on back from osd.1 [dc1,rack1] to osd.0 [dc1,rack1] 1010.456 msec
 186
 187 To see even more detail and a complete dump of network performance information, use the ``dump_osd_network`` command.  Typically, this command is
 188 sent to a mgr, but it can be limited to a particular OSD's interactions by issuing it to any OSD.  The current threshold, which defaults to 1 second
 189 (1000 milliseconds), can be overridden by an argument given in milliseconds.
 190
 191 The following command shows all gathered network performance data by specifying a threshold of 0 and sending the command to the mgr.
 192
 193 ::
 194
 195     $ ceph daemon /var/run/ceph/ceph-mgr.x.asok dump_osd_network 0
 196     {
 197         "threshold": 0,
 198         "entries": [
 199             {
 200                 "last update": "Wed Sep  4 17:04:49 2019",
 201                 "stale": false,
 202                 "from osd": 2,
 203                 "to osd": 0,
 204                 "interface": "front",
 205                 "average": {
 206                     "1min": 1.023,
 207                     "5min": 0.860,
 208                     "15min": 0.883
 209                 },
 210                 "min": {
 211                     "1min": 0.818,
 212                     "5min": 0.607,
 213                     "15min": 0.607
 214                 },
 215                 "max": {
 216                     "1min": 1.164,
 217                     "5min": 1.173,
 218                     "15min": 1.544
 219                 },
 220                 "last": 0.924
 221             },
 222             {
 223                 "last update": "Wed Sep  4 17:04:49 2019",
 224                 "stale": false,
 225                 "from osd": 2,
 226                 "to osd": 0,
 227                 "interface": "back",
 228                 "average": {
 229                     "1min": 0.968,
 230                     "5min": 0.897,
 231                     "15min": 0.830
 232                 },
 233                 "min": {
 234                     "1min": 0.860,
 235                     "5min": 0.563,
 236                     "15min": 0.502
 237                 },
 238                 "max": {
 239                     "1min": 1.171,
 240                     "5min": 1.216,
 241                     "15min": 1.456
 242                 },
 243                 "last": 0.845
 244             },
 245             {
 246                 "last update": "Wed Sep  4 17:04:48 2019",
 247                 "stale": false,
 248                 "from osd": 0,
 249                 "to osd": 1,
 250                 "interface": "front",
 251                 "average": {
 252                     "1min": 0.965,
 253                     "5min": 0.811,
 254                     "15min": 0.850
 255                 },
 256                 "min": {
 257                     "1min": 0.650,
 258                     "5min": 0.488,
 259                     "15min": 0.466
 260                 },
 261                 "max": {
 262                     "1min": 1.252,
 263                     "5min": 1.252,
 264                     "15min": 1.362
 265                 },
 266                 "last": 0.791
 267             },
 268         ...
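
Once the ``dump_osd_network`` output has been captured, it can be filtered further on the client side. The following is a minimal sketch that assumes the JSON has the shape shown above and that the ``average`` values are in milliseconds; the function name ``slow_entries`` and the abbreviated sample data are illustrative only.

```python
import json


def slow_entries(dump, threshold_ms, window="1min"):
    """Return (from_osd, to_osd, interface, average) tuples above the threshold.

    Results are sorted with the slowest pair first.
    """
    hits = []
    for entry in dump["entries"]:
        avg = entry["average"][window]
        if avg > threshold_ms:
            hits.append((entry["from osd"], entry["to osd"], entry["interface"], avg))
    return sorted(hits, key=lambda hit: hit[3], reverse=True)


# Abbreviated copy of the sample output above (only the fields used here).
sample = json.loads("""
{"threshold": 0, "entries": [
  {"from osd": 2, "to osd": 0, "interface": "front",
   "average": {"1min": 1.023, "5min": 0.860, "15min": 0.883}},
  {"from osd": 2, "to osd": 0, "interface": "back",
   "average": {"1min": 0.968, "5min": 0.897, "15min": 0.830}},
  {"from osd": 0, "to osd": 1, "interface": "front",
   "average": {"1min": 0.965, "5min": 0.811, "15min": 0.850}}
]}
""")

for hit in slow_entries(sample, threshold_ms=1.0):
    print(hit)
```

Choosing a different ``window`` (``5min`` or ``15min``) helps separate a brief spike from a sustained slowdown.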
 269
 270
 271
 272 Muting health checks
 273 --------------------
 274
 275 Health checks can be muted so that they do not affect the overall
 276 reported status of the cluster.  Alerts are specified using the health
 277 check code (see :ref:`health-checks`)::
 278
 279   ceph health mute <code>
 280
 281 If a health warning is the only active alert, muting it will make the
 282 cluster report an overall status of ``HEALTH_OK``.  For example, to
 283 mute an ``OSD_DOWN`` alert::
 284
 285   ceph health mute OSD_DOWN
 286
 287 Mutes are reported as part of the short and long form of the ``ceph health`` command.
 288 For example, in the above scenario, the cluster would report::
 289
 290   $ ceph health
 291   HEALTH_OK (muted: OSD_DOWN)
 292   $ ceph health detail
 293   HEALTH_OK (muted: OSD_DOWN)
 294   (MUTED) OSD_DOWN 1 osds down
 295       osd.1 is down
 296
 297 A mute can be explicitly removed with::
 298
 299   ceph health unmute <code>
 300
 301 For example::
 302
 303   ceph health unmute OSD_DOWN
 304
 305 A health check mute may optionally have a TTL (time to live)
 306 associated with it, such that the mute will automatically expire
 307 after the specified period of time has elapsed.  The TTL is specified as an optional
 308 duration argument, e.g.::
 309
 310   ceph health mute OSD_DOWN 4h    # mute for 4 hours
 311   ceph health mute MON_DOWN 15m   # mute for 15 minutes
 312
 313 Normally, if a muted health alert is resolved (e.g., in the example
 314 above, the OSD comes back up), the mute goes away.  If the alert comes
 315 back later, it will be reported in the usual way.
 316
 317 It is possible to make a mute "sticky" such that the mute will remain even if the
 318 alert clears.  For example::
 319
 320   ceph health mute OSD_DOWN 1h --sticky   # ignore any/all down OSDs for next hour
 321
 322 Most health mutes also disappear if the extent of an alert gets worse.  For example,
 323 if there is one OSD down, and the alert is muted, the mute will disappear if one
 324 or more additional OSDs go down.  This is true for any health alert that involves
 325 a count indicating how much or how many of something is triggering the warning or
 326 error.
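
When mutes are applied from a script, it can help to assemble the command in one place. The sketch below only builds the argument list for the ``ceph health mute`` forms documented above; the helper name ``build_mute_command`` and the check on the code's spelling are assumptions made for illustration, and the TTL string is passed through unvalidated.

```python
import re


def build_mute_command(code, ttl=None, sticky=False):
    """Build the argv list for ``ceph health mute <code> [ttl] [--sticky]``."""
    # Health check codes in this document are upper-case with underscores.
    if not re.fullmatch(r"[A-Z][A-Z0-9_]*", code):
        raise ValueError("unexpected health check code: %r" % code)
    argv = ["ceph", "health", "mute", code]
    if ttl is not None:
        argv.append(ttl)  # for example "4h" or "15m"; not validated here
    if sticky:
        argv.append("--sticky")
    return argv


print(" ".join(build_mute_command("OSD_DOWN", ttl="1h", sticky=True)))
```

On a host with the ``ceph`` CLI and a suitable keyring, the returned list could be passed to ``subprocess.run(argv, check=True)``.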
 327
 328
 329 Detecting configuration issues
 330 ==============================
 331
 332 In addition to the health checks that Ceph continuously runs on its
 333 own status, there are some configuration issues that may only be detected
 334 by an external tool.
 335
 336 Use the `ceph-medic`_ tool to run these additional checks on your Ceph
 337 cluster's configuration.
 338
 339 Checking a Cluster's Usage Stats
 340 ================================
 341
 342 To check a cluster's data usage and data distribution among pools, you can
 343 use the ``df`` command. It is similar to the Linux ``df`` command. Execute
 344 the following::
 345
 346         ceph df
 347
 348 The output of ``ceph df`` looks like this::
 349
 350    CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
 351    ssd    202 GiB  200 GiB  2.0 GiB   2.0 GiB       1.00
 352    TOTAL  202 GiB  200 GiB  2.0 GiB   2.0 GiB       1.00
 353
 354    --- POOLS ---
 355    POOL                   ID  PGS   STORED   (DATA)   (OMAP)   OBJECTS     USED  (DATA)   (OMAP)   %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY  USED COMPR  UNDER COMPR
 356    device_health_metrics   1    1  242 KiB   15 KiB  227 KiB         4  251 KiB  24 KiB  227 KiB       0    297 GiB            N/A          N/A      4         0 B          0 B
 357    cephfs.a.meta           2   32  6.8 KiB  6.8 KiB      0 B        22   96 KiB  96 KiB      0 B       0    297 GiB            N/A          N/A     22         0 B          0 B
 358    cephfs.a.data           3   32      0 B      0 B      0 B         0      0 B     0 B      0 B       0     99 GiB            N/A          N/A      0         0 B          0 B
 359    test                    4   32   22 MiB   22 MiB   50 KiB       248   19 MiB  19 MiB   50 KiB       0    297 GiB            N/A          N/A    248         0 B          0 B
 360
 361
 362
 363
 364
 365 - **CLASS:** The device class, for example "ssd" or "hdd".
 366 - **SIZE:** The amount of storage capacity managed by the cluster.
 367 - **AVAIL:** The amount of free space available in the cluster.
 368 - **USED:** The amount of raw storage consumed by user data (excluding
 369   BlueStore's database)
 370 - **RAW USED:** The amount of raw storage consumed by user data, internal
 371   overhead, or reserved capacity.
 372 - **%RAW USED:** The percentage of raw storage used. Use this number in
 373   conjunction with the ``full ratio`` and ``near full ratio`` to ensure that
 374   you are not reaching your cluster's capacity. See `Storage Capacity`_ for
 375   additional details.
 376
 377
 378 **POOLS:**
 379
 380 The **POOLS** section of the output provides a list of pools and the notional
 381 usage of each pool. The output from this section **DOES NOT** reflect replicas,
 382 clones or snapshots. For example, if you store an object with 1MB of data, the
 383 notional usage will be 1MB, but the actual usage may be 2MB or more depending
 384 on the number of replicas, clones and snapshots.
 385
 386 - **ID:** The numeric ID of the pool.
 387 - **STORED:** The actual amount of data that the user or Ceph has stored in a pool. This is
 388   similar to the USED column in earlier versions of Ceph, but the calculations
 389   (for BlueStore) are more precise (gaps are properly handled).
 390
 391   - **(DATA):** usage for RBD (RADOS Block Device), CephFS file data, and RGW
 392     (RADOS Gateway) object data.
 393   - **(OMAP):** key-value pairs. Used primarily by CephFS and RGW (RADOS
 394     Gateway) for metadata storage.
 395
 396 - **OBJECTS:** The notional number of objects stored per pool. "Notional" is
 397   defined above in the paragraph immediately under "POOLS".
 398 - **USED:** The space allocated for a pool over all OSDs. This includes
 399   replication, allocation granularity, and erasure-coding overhead. Compression
 400   savings and object content gaps are also taken into account. BlueStore's
 401   database is not included in this amount.
 402
 403   - **(DATA):** object usage for RBD (RADOS Block Device), CephFS file data, and RGW
 404     (RADOS Gateway) object data.
 405   - **(OMAP):** object key-value pairs. Used primarily by CephFS and RGW (RADOS
 406     Gateway) for metadata storage.
 407
 408 - **%USED:** The notional percentage of storage used per pool.
 409 - **MAX AVAIL:** An estimate of the notional amount of data that can be written
 410   to this pool.
 411 - **QUOTA OBJECTS:** The maximum number of objects allowed by the pool's quota.
 412 - **QUOTA BYTES:** The maximum number of bytes allowed by the pool's quota.
 413 - **DIRTY:** The number of objects in the cache pool that have been written to
 414   the cache pool but have not been flushed yet to the base pool. This field is
 415   only available when cache tiering is in use.
 416 - **USED COMPR:** The amount of space allocated for compressed data (that is, this
 417   includes compressed data plus all the allocation, replication and erasure
 418   coding overhead).
 419 - **UNDER COMPR:** The amount of data passed through compression (summed over all
 420   replicas) and beneficial enough to be stored in a compressed form.
 421
 422
 423 .. note:: The numbers in the POOLS section are notional. They are not
 424    inclusive of the number of replicas, snapshots or clones. As a result, the
 425    sum of the USED and %USED amounts will not add up to the USED and %USED
 426    amounts in the RAW section of the output.
 427
 428 .. note:: The MAX AVAIL value is a complicated function of the replication
 429    or erasure code used, the CRUSH rule that maps storage to devices, the
 430    utilization of those devices, and the configured ``mon_osd_full_ratio``.
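
The relationship between notional and raw figures can be made concrete with a little arithmetic. The sketch below assumes a simple replicated pool and ignores clones, snapshots, compression and allocation overhead; the function names and the ``near_full_ratio`` value used in the example are assumptions for illustration, not values read from a cluster.

```python
def raw_footprint(notional_bytes, replicas):
    """Estimate raw bytes for a replicated pool: one full copy per replica."""
    return notional_bytes * replicas


def percent_raw_used(raw_used, size):
    """Percentage of raw capacity consumed (both arguments in the same unit)."""
    return 100.0 * raw_used / size


def is_near_full(percent_used, near_full_ratio):
    """Compare a percentage against a ratio expressed as a fraction of 1."""
    return percent_used >= near_full_ratio * 100.0


# A 1 MB object stored with two replicas occupies about 2 MB of raw storage.
print(raw_footprint(1_000_000, replicas=2))

# Recomputed from the rounded TOTAL row above: 2.0 GiB used of 202 GiB.
pct = percent_raw_used(2.0, 202.0)
print(round(pct, 2), is_near_full(pct, near_full_ratio=0.85))
```

Because the table shows rounded values, a percentage recomputed from it may differ slightly from the ``%RAW USED`` column.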
 431
 432
 433 Checking OSD Status
 434 ===================
 435
 436 You can check OSDs to ensure they are ``up`` and ``in`` by executing the
 437 following command:
 438
 439 .. prompt:: bash #
 440
 441   ceph osd stat
 442
 443 Or:
 444
 445 .. prompt:: bash #
 446
 447   ceph osd dump
 448
 449 You can also view OSDs according to their position in the CRUSH map by
 450 using the following command:
 451
 452 .. prompt:: bash #
 453
 454    ceph osd tree
 455
 456 Ceph will print out a CRUSH tree with a host, its OSDs, whether they are up
 457 and their weight:
 458
 459 .. code-block:: bash
 460
 461    #ID CLASS WEIGHT  TYPE NAME             STATUS REWEIGHT PRI-AFF
 462     -1       3.00000 pool default
 463     -3       3.00000 rack mainrack
 464     -2       3.00000 host osd-host
 465      0   ssd 1.00000         osd.0             up  1.00000 1.00000
 466      1   ssd 1.00000         osd.1             up  1.00000 1.00000
 467      2   ssd 1.00000         osd.2             up  1.00000 1.00000
 468
 469 For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.
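
For a quick scripted check, the ``ceph osd tree`` text can be scanned for OSD rows whose status is not ``up``. This is a sketch that assumes the column layout shown above, where the status follows the ``osd.N`` name; the helper name ``osds_not_up`` is hypothetical, and the sample marks ``osd.1`` as ``down`` purely for illustration.

```python
def osds_not_up(tree_text):
    """Return the names of OSD rows whose STATUS column is not ``up``."""
    not_up = []
    for line in tree_text.splitlines():
        fields = line.split()
        # Look for the "osd.N" name; the next field is the status.
        for i, field in enumerate(fields[:-1]):
            if field.startswith("osd."):
                if fields[i + 1] != "up":
                    not_up.append(field)
                break
    return not_up


sample_tree = """\
#ID CLASS WEIGHT  TYPE NAME             STATUS REWEIGHT PRI-AFF
 -1       3.00000 pool default
 -3       3.00000 rack mainrack
 -2       3.00000 host osd-host
  0   ssd 1.00000         osd.0             up  1.00000 1.00000
  1   ssd 1.00000         osd.1           down  1.00000 1.00000
  2   ssd 1.00000         osd.2             up  1.00000 1.00000
"""
print(osds_not_up(sample_tree))
```

Bucket rows such as ``host osd-host`` are skipped because their names do not start with ``osd.``.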
 470
 471 Checking Monitor Status
 472 =======================
 473
 474 If your cluster has multiple monitors (likely), you should check the monitor
 475 quorum status after you start the cluster and before reading and/or writing data. A
 476 quorum must be present when multiple monitors are running. You should also check
 477 monitor status periodically to ensure that they are running.
 478
 479 To display the monitor map, execute the following::
 480
 481         ceph mon stat
 482
 483 Or::
 484
 485         ceph mon dump
 486
 487 To check the quorum status for the monitor cluster, execute the following::
 488
 489         ceph quorum_status
 490
 491 Ceph will return the quorum status. For example, a Ceph cluster consisting of
 492 three monitors may return the following:
 493
 494 .. code-block:: javascript
 495
 496         { "election_epoch": 10,
 497           "quorum": [
 498                 0,
 499                 1,
 500                 2],
 501           "quorum_names": [
 502                 "a",
 503                 "b",
 504                 "c"],
 505           "quorum_leader_name": "a",
 506           "monmap": { "epoch": 1,
 507               "fsid": "444b489c-4f16-4b75-83f0-cb8097468898",
 508               "modified": "2011-12-12 13:28:27.505520",
 509               "created": "2011-12-12 13:28:27.505520",
 510               "features": {"persistent": [
 511                                 "kraken",
 512                                 "luminous",
 513                                 "mimic"],
 514                 "optional": []
 515               },
 516               "mons": [
 517                     { "rank": 0,
 518                       "name": "a",
 519                       "addr": "127.0.0.1:6789/0",
 520                       "public_addr": "127.0.0.1:6789/0"},
 521                     { "rank": 1,
 522                       "name": "b",
 523                       "addr": "127.0.0.1:6790/0",
 524                       "public_addr": "127.0.0.1:6790/0"},
 525                     { "rank": 2,
 526                       "name": "c",
 527                       "addr": "127.0.0.1:6791/0",
 528                       "public_addr": "127.0.0.1:6791/0"}
 529                    ]
 530           }
 531         }
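
The quorum status can also be checked programmatically by comparing ``quorum_names`` with the monitors listed in the ``monmap``. The sketch below assumes JSON with the fields shown above; the function names are illustrative, and the sample is abbreviated and altered so that monitor ``c`` is out of quorum.

```python
import json


def monitors_out_of_quorum(status):
    """Return names of monitors in the monmap that are not in the quorum."""
    in_quorum = set(status["quorum_names"])
    return [mon["name"] for mon in status["monmap"]["mons"]
            if mon["name"] not in in_quorum]


def has_majority(status):
    """A quorum requires more than half of the monitors in the monmap."""
    return len(status["quorum_names"]) > len(status["monmap"]["mons"]) // 2


# Abbreviated sample; "c" is removed from the quorum for illustration.
sample = json.loads("""
{"election_epoch": 10,
 "quorum": [0, 1],
 "quorum_names": ["a", "b"],
 "quorum_leader_name": "a",
 "monmap": {"epoch": 1,
            "mons": [{"rank": 0, "name": "a"},
                     {"rank": 1, "name": "b"},
                     {"rank": 2, "name": "c"}]}}
""")
print(monitors_out_of_quorum(sample), has_majority(sample))
```

Two of three monitors still form a majority, so the cluster keeps a quorum even though one monitor needs attention.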
 532
 533 Checking MDS Status
 534 ===================
 535
 536 Metadata servers provide metadata services for CephFS. Metadata servers have
 537 two sets of states: ``up | down`` and ``active | inactive``. To ensure your
 538 metadata servers are ``up`` and ``active``, execute the following::
 539
 540         ceph mds stat
 541
 542 To display details of the metadata cluster, execute the following::
 543
 544         ceph fs dump
 545
 546
 547 Checking Placement Group States
 548 ===============================
 549
 550 Placement groups map objects to OSDs. When you monitor your
 551 placement groups, you will want them to be ``active`` and ``clean``.
 552 For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.
 553
 554 .. _Monitoring OSDs and Placement Groups: ../monitoring-osd-pg
 555
 556 .. _rados-monitoring-using-admin-socket:
 557
 558 Using the Admin Socket
 559 ======================
 560
 561 The Ceph admin socket allows you to query a daemon via a socket interface.
 562 By default, Ceph sockets reside under ``/var/run/ceph``. To access a daemon
 563 via the admin socket, log in to the host running the daemon and use the
 564 following command::
 565
 566         ceph daemon {daemon-name}
 567         ceph daemon {path-to-socket-file}
 568
 569 For example, the following are equivalent::
 570
 571     ceph daemon osd.0 foo
 572     ceph daemon /var/run/ceph/ceph-osd.0.asok foo
 573
 574 To view the available admin socket commands, execute the following command::
 575
 576         ceph daemon {daemon-name} help
 577
 578 The admin socket command enables you to show and set your configuration at
 579 runtime. See `Viewing a Configuration at Runtime`_ for details.
 580
 581 Additionally, you can set configuration values at runtime directly (i.e., the
 582 admin socket bypasses the monitor, unlike ``ceph tell {daemon-type}.{id}
 583 config set``, which relies on the monitor but doesn't require you to log in
 584 directly to the host in question).
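
The two equivalent forms of ``ceph daemon`` can be related with a small helper. The sketch below derives a socket path from a daemon name using the pattern seen in the examples in this document (``ceph-osd.0.asok``, ``ceph-mgr.x.asok``); the default cluster name ``ceph``, the default directory and both function names are assumptions, and a cluster may be configured with a different socket location.

```python
def admin_socket_path(daemon_name, cluster="ceph", run_dir="/var/run/ceph"):
    """Guess the admin socket path for a daemon such as ``osd.0``."""
    return f"{run_dir}/{cluster}-{daemon_name}.asok"


def daemon_command(target, *args):
    """Build the argv list for ``ceph daemon <name-or-socket> <command...>``."""
    return ["ceph", "daemon", target, *args]


by_name = daemon_command("osd.0", "help")
by_path = daemon_command(admin_socket_path("osd.0"), "help")
print(" ".join(by_name))
print(" ".join(by_path))
```

Either list could be run with ``subprocess.run`` on the host where the daemon is running.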
 585
 586 .. _Viewing a Configuration at Runtime: ../../configuration/ceph-conf#viewing-a-configuration-at-runtime
 587 .. _Storage Capacity: ../../configuration/mon-config-ref#storage-capacity
 588 .. _ceph-medic: http://docs.ceph.com/ceph-medic/master/