======================
Monitoring a Cluster
======================

Once you have a running cluster, you may use the ``ceph`` tool to monitor your
cluster. Monitoring a cluster typically involves checking OSD status, monitor
status, placement group status and metadata server status.

Using the command line
======================

Interactive mode
----------------

To run the ``ceph`` tool in interactive mode, type ``ceph`` at the command line
with no arguments. For example:

.. prompt:: bash $

   ceph

.. prompt:: ceph>
   :prompts: ceph>

   health
   status
   quorum_status
   mon stat

Non-default paths
-----------------

If you specified non-default locations for your configuration or keyring,
you may specify their locations:

.. prompt:: bash $

   ceph -c /path/to/conf -k /path/to/keyring health
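
Similarly, if you authenticate as a CephX user other than ``client.admin``,
you can name that user and its keyring explicitly with the ``-n``/``--name``
option (a brief illustration; ``client.ops`` is a hypothetical user):

.. prompt:: bash $

   ceph -n client.ops -k /path/to/keyring health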

Checking a Cluster's Status
===========================

After you start your cluster, and before you start reading and/or
writing data, check your cluster's status first.

To check a cluster's status, execute the following:

.. prompt:: bash $

   ceph status

Or:

.. prompt:: bash $

   ceph -s

In interactive mode, type ``status`` and press **Enter**:

.. prompt:: ceph>
   :prompts: ceph>

   status

Ceph will print the cluster status. For example, a tiny Ceph demonstration
cluster with one of each service may print the following:

::

  cluster:
    id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c
    mgr: x(active)
    mds: cephfs_a-1/1/1 up {0=a=up:active}, 2 up:standby
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   2 pools, 16 pgs
    objects: 21 objects, 2.19K
    usage:   546 GB used, 384 GB / 931 GB avail
    pgs:     16 active+clean

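If you are scripting against the cluster, the same information can be emitted
in a machine-readable form with the global ``--format`` option (a brief
illustration; the exact JSON layout varies between releases):

.. prompt:: bash $

   ceph status --format json-pretty
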
.. topic:: How Ceph Calculates Data Usage

   The ``usage`` value reflects the *actual* amount of raw storage used. The
   ``xxx GB / xxx GB`` value means the amount available (the lesser number)
   out of the overall storage capacity of the cluster. The notional number
   reflects the size of the stored data before it is replicated, cloned or
   snapshotted. Therefore, the amount of data actually stored typically
   exceeds the notional amount stored, because Ceph creates replicas of the
   data and may also use storage capacity for cloning and snapshotting.
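
   For example, if the demonstration cluster above keeps its data in 3×
   replicated pools, its 546 GB of raw usage corresponds to roughly 182 GB of
   notional (pre-replication) data, because 182 GB × 3 = 546 GB. (This is an
   illustrative approximation; it ignores metadata and allocation overhead.)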


Watching a Cluster
==================

In addition to local logging by each daemon, Ceph clusters maintain
a *cluster log* that records high level events about the whole system.
This is logged to disk on monitor servers (as ``/var/log/ceph/ceph.log`` by
default), but can also be monitored via the command line.

To follow the cluster log, use the following command:

.. prompt:: bash $

   ceph -w

Ceph will print the status of the system, followed by each log message as it
is emitted. For example:

::

  cluster:
    id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c
    mgr: x(active)
    mds: cephfs_a-1/1/1 up {0=a=up:active}, 2 up:standby
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   2 pools, 16 pgs
    objects: 21 objects, 2.19K
    usage:   546 GB used, 384 GB / 931 GB avail
    pgs:     16 active+clean


  2017-07-24 08:15:11.329298 mon.a mon.0 172.21.9.34:6789/0 23 : cluster [INF] osd.0 172.21.9.34:6806/20527 boot
  2017-07-24 08:15:14.258143 mon.a mon.0 172.21.9.34:6789/0 39 : cluster [INF] Activating manager daemon x
  2017-07-24 08:15:15.446025 mon.a mon.0 172.21.9.34:6789/0 47 : cluster [INF] Manager daemon x is now available


In addition to using ``ceph -w`` to print log lines as they are emitted,
use ``ceph log last [n]`` to see the most recent ``n`` lines from the cluster
log.
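
To review recent history without leaving a terminal attached, you can ask for
a fixed number of entries. For example, to show the last ten cluster log
messages:

.. prompt:: bash $

   ceph log last 10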

Monitoring Health Checks
========================

Ceph continuously runs various *health checks* against its own status. When
a health check fails, this is reflected in the output of ``ceph status`` (or
``ceph health``). In addition, messages are sent to the cluster log to
indicate when a check fails, and when the cluster recovers.
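
To list the checks that are currently failing, together with any detail lines
they provide, you can run:

.. prompt:: bash $

   ceph health detail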

For example, when an OSD goes down, the ``health`` section of the status
output may be updated as follows:

::

    health: HEALTH_WARN
            1 osds down
            Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded

At this time, cluster log messages are also emitted to record the failure of
the health checks:

::

    2017-07-25 10:08:58.265945 mon.a mon.0 172.21.9.34:6789/0 91 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
    2017-07-25 10:09:01.302624 mon.a mon.0 172.21.9.34:6789/0 94 : cluster [WRN] Health check failed: Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded (PG_DEGRADED)

When the OSD comes back online, the cluster log records the cluster's return
to a healthy state:

::

    2017-07-25 10:11:11.526841 mon.a mon.0 172.21.9.34:6789/0 109 : cluster [WRN] Health check update: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized (PG_DEGRADED)
    2017-07-25 10:11:13.535493 mon.a mon.0 172.21.9.34:6789/0 110 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized)
    2017-07-25 10:11:13.535577 mon.a mon.0 172.21.9.34:6789/0 111 : cluster [INF] Cluster is now healthy

Network Performance Checks
--------------------------

Ceph OSDs send heartbeat ping messages amongst themselves to monitor daemon
availability. We also use the response times to monitor network performance.
While it is possible that a busy OSD could delay a ping response, we can
assume that if a network switch fails, multiple delays will be detected
between distinct pairs of OSDs.

By default, we will warn about ping times that exceed 1 second (1000
milliseconds).

::

    HEALTH_WARN Slow OSD heartbeats on back (longest 1118.001ms)

The health detail will list which combinations of OSDs are seeing the delays
and by how much. There is a limit of 10 detail line items.

::

    [WRN] OSD_SLOW_PING_TIME_BACK: Slow OSD heartbeats on back (longest 1118.001ms)
        Slow OSD heartbeats on back from osd.0 [dc1,rack1] to osd.1 [dc1,rack1] 1118.001 msec possibly improving
        Slow OSD heartbeats on back from osd.0 [dc1,rack1] to osd.2 [dc1,rack2] 1030.123 msec
        Slow OSD heartbeats on back from osd.2 [dc1,rack2] to osd.1 [dc1,rack1] 1015.321 msec
        Slow OSD heartbeats on back from osd.1 [dc1,rack1] to osd.0 [dc1,rack1] 1010.456 msec

To see even more detail and a complete dump of network performance
information, the ``dump_osd_network`` command can be used. Typically, this
would be sent to a mgr, but it can be limited to a particular OSD's
interactions by issuing it to any OSD. The current threshold, which defaults
to 1 second (1000 milliseconds), can be overridden as an argument in
milliseconds.

The following command will show all gathered network performance data by
specifying a threshold of 0 and sending it to the mgr.

.. prompt:: bash $

   ceph daemon /var/run/ceph/ceph-mgr.x.asok dump_osd_network 0

::

    {
        "threshold": 0,
        "entries": [
            {
                "last update": "Wed Sep 4 17:04:49 2019",
                "stale": false,
                "from osd": 2,
                "to osd": 0,
                "interface": "front",
                "average": {
                    "1min": 1.023,
                    "5min": 0.860,
                    "15min": 0.883
                },
                "min": {
                    "1min": 0.818,
                    "5min": 0.607,
                    "15min": 0.607
                },
                "max": {
                    "1min": 1.164,
                    "5min": 1.173,
                    "15min": 1.544
                },
                "last": 0.924
            },
            {
                "last update": "Wed Sep 4 17:04:49 2019",
                "stale": false,
                "from osd": 2,
                "to osd": 0,
                "interface": "back",
                "average": {
                    "1min": 0.968,
                    "5min": 0.897,
                    "15min": 0.830
                },
                "min": {
                    "1min": 0.860,
                    "5min": 0.563,
                    "15min": 0.502
                },
                "max": {
                    "1min": 1.171,
                    "5min": 1.216,
                    "15min": 1.456
                },
                "last": 0.845
            },
            {
                "last update": "Wed Sep 4 17:04:48 2019",
                "stale": false,
                "from osd": 0,
                "to osd": 1,
                "interface": "front",
                "average": {
                    "1min": 0.965,
                    "5min": 0.811,
                    "15min": 0.850
                },
                "min": {
                    "1min": 0.650,
                    "5min": 0.488,
                    "15min": 0.466
                },
                "max": {
                    "1min": 1.252,
                    "5min": 1.252,
                    "15min": 1.362
                },
                "last": 0.791
            },
            ...


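To inspect only the interactions of a single OSD, the same command can be sent
to that OSD's admin socket instead of the mgr's (a brief sketch; ``osd.0`` is
just an example ID, and the optional trailing argument overrides the threshold
in milliseconds):

.. prompt:: bash $

   ceph daemon osd.0 dump_osd_network 0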

Muting health checks
--------------------

Health checks can be muted so that they do not affect the overall
reported status of the cluster. Alerts are specified using the health
check code (see :ref:`health-checks`):

.. prompt:: bash $

   ceph health mute <code>

For example, if there is a health warning, muting it will make the
cluster report an overall status of ``HEALTH_OK``. To mute an
``OSD_DOWN`` alert, for instance, run:

.. prompt:: bash $

   ceph health mute OSD_DOWN

Mutes are reported as part of the short and long form of the ``ceph health`` command.
For example, in the above scenario, the cluster would report:

.. prompt:: bash $

   ceph health

::

    HEALTH_OK (muted: OSD_DOWN)

.. prompt:: bash $

   ceph health detail

::

    HEALTH_OK (muted: OSD_DOWN)
    (MUTED) OSD_DOWN 1 osds down
        osd.1 is down

A mute can be explicitly removed with:

.. prompt:: bash $

   ceph health unmute <code>

For example:

.. prompt:: bash $

   ceph health unmute OSD_DOWN

A health check mute may optionally have a TTL (time to live)
associated with it, such that the mute will automatically expire
after the specified period of time has elapsed. The TTL is specified as an
optional duration argument, e.g.:

.. prompt:: bash $

   ceph health mute OSD_DOWN 4h    # mute for 4 hours
   ceph health mute MON_DOWN 15m   # mute for 15 minutes

Normally, if a muted health alert is resolved (e.g., in the example
above, the OSD comes back up), the mute goes away. If the alert comes
back later, it will be reported in the usual way.

It is possible to make a mute "sticky" such that the mute will remain even if the
alert clears. For example:

.. prompt:: bash $

   ceph health mute OSD_DOWN 1h --sticky   # ignore any/all down OSDs for next hour

Most health mutes also disappear if the extent of an alert gets worse. For example,
if there is one OSD down, and the alert is muted, the mute will disappear if one
or more additional OSDs go down. This is true for any health alert that involves
a count indicating how much or how many of something is triggering the warning or
error.


Detecting configuration issues
==============================

In addition to the health checks that Ceph continuously runs on its
own status, there are some configuration issues that may only be detected
by an external tool.

Use the `ceph-medic`_ tool to run these additional checks on your Ceph
cluster's configuration.

Checking a Cluster's Usage Stats
================================

To check a cluster's data usage and data distribution among pools, you can
use the ``df`` option. It is similar to the Linux ``df`` command. Execute
the following:

.. prompt:: bash $

   ceph df

The output of ``ceph df`` looks like this::

  CLASS    SIZE     AVAIL    USED     RAW USED  %RAW USED
  ssd      202 GiB  200 GiB  2.0 GiB  2.0 GiB        1.00
  TOTAL    202 GiB  200 GiB  2.0 GiB  2.0 GiB        1.00

  --- POOLS ---
  POOL                   ID  PGS   STORED   (DATA)   (OMAP)  OBJECTS     USED   (DATA)   (OMAP)  %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY  USED COMPR  UNDER COMPR
  device_health_metrics   1    1  242 KiB   15 KiB  227 KiB        4  251 KiB   24 KiB  227 KiB      0    297 GiB            N/A          N/A      4         0 B          0 B
  cephfs.a.meta           2   32  6.8 KiB  6.8 KiB      0 B       22   96 KiB   96 KiB      0 B      0    297 GiB            N/A          N/A     22         0 B          0 B
  cephfs.a.data           3   32      0 B      0 B      0 B        0      0 B      0 B      0 B      0     99 GiB            N/A          N/A      0         0 B          0 B
  test                    4   32   22 MiB   22 MiB   50 KiB      248   19 MiB   19 MiB   50 KiB      0    297 GiB            N/A          N/A    248         0 B          0 B

- **CLASS:** for example, "ssd" or "hdd"
- **SIZE:** The amount of storage capacity managed by the cluster.
- **AVAIL:** The amount of free space available in the cluster.
- **USED:** The amount of raw storage consumed by user data (excluding
  BlueStore's database).
- **RAW USED:** The amount of raw storage consumed by user data, internal
  overhead, or reserved capacity.
- **%RAW USED:** The percentage of raw storage used. Use this number in
  conjunction with the ``full ratio`` and ``near full ratio`` to ensure that
  you are not reaching your cluster's capacity. See `Storage Capacity`_ for
  additional details.

**POOLS:**

The **POOLS** section of the output provides a list of pools and the notional
usage of each pool. The output from this section **DOES NOT** reflect replicas,
clones or snapshots. For example, if you store an object with 1MB of data, the
notional usage will be 1MB, but the actual usage may be 2MB or more depending
on the number of replicas, clones and snapshots.

- **ID:** The pool's unique numeric identifier.
- **STORED:** The actual amount of data that the user/Ceph has stored in a
  pool. This is similar to the USED column in earlier versions of Ceph, but
  the calculations (for BlueStore!) are more precise (gaps are properly
  handled).

  - **(DATA):** usage for RBD (RADOS Block Device), CephFS file data, and RGW
    (RADOS Gateway) object data.
  - **(OMAP):** key-value pairs. Used primarily by CephFS and RGW (RADOS
    Gateway) for metadata storage.

- **OBJECTS:** The notional number of objects stored per pool. "Notional" is
  defined above in the paragraph immediately under "POOLS".
- **USED:** The space allocated for a pool over all OSDs. This includes
  replication, allocation granularity, and erasure-coding overhead. Compression
  savings and object content gaps are also taken into account. BlueStore's
  database is not included in this amount.

  - **(DATA):** object usage for RBD (RADOS Block Device), CephFS file data,
    and RGW (RADOS Gateway) object data.
  - **(OMAP):** object key-value pairs. Used primarily by CephFS and RGW
    (RADOS Gateway) for metadata storage.

- **%USED:** The notional percentage of storage used per pool.
- **MAX AVAIL:** An estimate of the notional amount of data that can be written
  to this pool.
- **QUOTA OBJECTS:** The number of quota objects.
- **QUOTA BYTES:** The number of bytes in the quota objects.
- **DIRTY:** The number of objects in the cache pool that have been written to
  the cache pool but have not been flushed yet to the base pool. This field is
  only available when cache tiering is in use.
- **USED COMPR:** amount of space allocated for compressed data (i.e. this
  includes compressed data plus all the allocation, replication and erasure
  coding overhead).
- **UNDER COMPR:** amount of data passed through compression (summed over all
  replicas) and beneficial enough to be stored in a compressed form.

.. note:: The numbers in the POOLS section are notional. They are not
   inclusive of the number of replicas, snapshots or clones. As a result, the
   sum of the USED and %USED amounts will not add up to the USED and %USED
   amounts in the RAW section of the output.

.. note:: The MAX AVAIL value is a complicated function of the replication
   or erasure code used, the CRUSH rule that maps storage to devices, the
   utilization of those devices, and the configured ``mon_osd_full_ratio``.

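The same statistics are available in a machine-readable form via the global
``--format`` option. A minimal sketch, assuming the ``jq`` utility is
installed and that your release exposes per-pool usage under
``pools[].stats`` (field names can differ between releases):

.. prompt:: bash $

   ceph df --format json-pretty
   ceph df --format json | jq '.pools[] | {name: .name, percent_used: .stats.percent_used}'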

Checking OSD Status
===================

You can check OSDs to ensure they are ``up`` and ``in`` by executing the
following command:

.. prompt:: bash #

   ceph osd stat

Or:

.. prompt:: bash #

   ceph osd dump

You can also view OSDs according to their position in the CRUSH map by
using the following command:

.. prompt:: bash #

   ceph osd tree

Ceph will print out a CRUSH tree with a host, its OSDs, whether they are up,
and their weight:

.. code-block:: bash

   #ID CLASS WEIGHT  TYPE NAME             STATUS REWEIGHT PRI-AFF
    -1       3.00000 pool default
    -3       3.00000 rack mainrack
    -2       3.00000 host osd-host
     0   ssd 1.00000         osd.0             up  1.00000 1.00000
     1   ssd 1.00000         osd.1             up  1.00000 1.00000
     2   ssd 1.00000         osd.2             up  1.00000 1.00000

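On larger clusters it is often useful to narrow the tree to OSDs in a given
state. Recent Ceph releases accept a state filter on ``ceph osd tree`` (a
brief illustration; check ``ceph osd tree --help`` on your release):

.. prompt:: bash #

   ceph osd tree down
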
For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

Checking Monitor Status
=======================

If your cluster has multiple monitors (likely), you should check the monitor
quorum status after you start the cluster and before reading and/or writing
data. A quorum must be present when multiple monitors are running. You should
also check monitor status periodically to ensure that they are running.

To display the monitor map, execute the following:

.. prompt:: bash $

   ceph mon stat

Or:

.. prompt:: bash $

   ceph mon dump

To check the quorum status for the monitor cluster, execute the following:

.. prompt:: bash $

   ceph quorum_status

Ceph will return the quorum status. For example, a Ceph cluster consisting of
three monitors may return the following:

.. code-block:: javascript

   { "election_epoch": 10,
     "quorum": [
           0,
           1,
           2],
     "quorum_names": [
           "a",
           "b",
           "c"],
     "quorum_leader_name": "a",
     "monmap": { "epoch": 1,
         "fsid": "444b489c-4f16-4b75-83f0-cb8097468898",
         "modified": "2011-12-12 13:28:27.505520",
         "created": "2011-12-12 13:28:27.505520",
         "features": {"persistent": [
                        "kraken",
                        "luminous",
                        "mimic"],
                      "optional": []
                     },
         "mons": [
           { "rank": 0,
             "name": "a",
             "addr": "127.0.0.1:6789/0",
             "public_addr": "127.0.0.1:6789/0"},
           { "rank": 1,
             "name": "b",
             "addr": "127.0.0.1:6790/0",
             "public_addr": "127.0.0.1:6790/0"},
           { "rank": 2,
             "name": "c",
             "addr": "127.0.0.1:6791/0",
             "public_addr": "127.0.0.1:6791/0"}
         ]
       }
   }

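Because ``ceph quorum_status`` already emits JSON, a script can pull out a
single field directly. For example, to print just the current leader (a brief
sketch, assuming the ``jq`` utility is installed):

.. prompt:: bash $

   ceph quorum_status | jq -r .quorum_leader_name
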
Checking MDS Status
===================

Metadata servers provide metadata services for CephFS. Metadata servers have
two sets of states: ``up | down`` and ``active | inactive``. To ensure your
metadata servers are ``up`` and ``active``, execute the following:

.. prompt:: bash $

   ceph mds stat

To display details of the metadata cluster, execute the following:

.. prompt:: bash $

   ceph fs dump

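For a per-file-system summary that shows each rank, its state, and the
available standby daemons, ``ceph fs status`` is also available on recent
releases (a brief illustration; the output columns vary by release):

.. prompt:: bash $

   ceph fs status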

Checking Placement Group States
===============================

Placement groups map objects to OSDs. When you monitor your
placement groups, you will want them to be ``active`` and ``clean``.
For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.
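
For a quick, one-line summary of placement group states (for example, how many
PGs are ``active+clean``), run:

.. prompt:: bash $

   ceph pg stat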

.. _Monitoring OSDs and Placement Groups: ../monitoring-osd-pg

.. _rados-monitoring-using-admin-socket:

Using the Admin Socket
======================

The Ceph admin socket allows you to query a daemon via a socket interface.
By default, Ceph sockets reside under ``/var/run/ceph``. To access a daemon
via the admin socket, log in to the host running the daemon and use the
following command:

.. prompt:: bash $

   ceph daemon {daemon-name}
   ceph daemon {path-to-socket-file}

For example, the following are equivalent:

.. prompt:: bash $

   ceph daemon osd.0 foo
   ceph daemon /var/run/ceph/ceph-osd.0.asok foo

To view the available admin socket commands, execute the following command:

.. prompt:: bash $

   ceph daemon {daemon-name} help

The admin socket command enables you to show and set your configuration at
runtime. See `Viewing a Configuration at Runtime`_ for details.

Additionally, you can set configuration values at runtime directly (i.e., the
admin socket bypasses the monitor, unlike ``ceph tell {daemon-type}.{id}
config set``, which relies on the monitor but doesn't require you to log in
directly to the host in question).
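
For example, to inspect and then change a daemon's debug setting through its
admin socket (a brief sketch; ``osd.0`` and ``debug_osd`` are just examples,
and the equivalent ``ceph tell`` form is shown for comparison):

.. prompt:: bash $

   ceph daemon osd.0 config get debug_osd
   ceph daemon osd.0 config set debug_osd 0/5
   ceph tell osd.0 config set debug_osd 0/5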

.. _Viewing a Configuration at Runtime: ../../configuration/ceph-conf#viewing-a-configuration-at-runtime
.. _Storage Capacity: ../../configuration/mon-config-ref#storage-capacity
.. _ceph-medic: http://docs.ceph.com/ceph-medic/master/