======================
 Monitoring a Cluster
======================

Once you have a running cluster, you may use the ``ceph`` tool to monitor your
cluster. Monitoring a cluster typically involves checking OSD status, monitor
status, placement group status and metadata server status.

Interactive Mode
================

To run the ``ceph`` tool in interactive mode, type ``ceph`` at the command line
with no arguments. For example::

    ceph
    ceph> health
    ceph> status
    ceph> quorum_status
    ceph> mon_status


Checking Cluster Health
=======================

After you start your cluster, and before you start reading and/or
writing data, check your cluster's health with the following::

    ceph health

If you use non-default locations for your configuration or keyring,
specify their paths::

    ceph -c /path/to/conf -k /path/to/keyring health

Upon starting the Ceph cluster, you will likely encounter a health
warning such as ``HEALTH_WARN XXX num placement groups stale``. Wait a few
moments and check it again. When your cluster is ready, ``ceph health`` should
return a message such as ``HEALTH_OK``. At that point, it is okay to begin
using the cluster.

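If the cluster reports a warning or error state, you can ask for a more
detailed explanation of each contributing problem. This is only a suggested
next step; the exact messages vary by release and by the problem at hand::

    ceph health detail
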
Watching a Cluster
==================

To watch the cluster's ongoing events, open a new terminal and enter::

    ceph -w

Ceph will print each event. For example, a tiny Ceph cluster consisting of
one monitor and two OSDs may print the following::

    cluster b370a29d-9287-4ca3-ab57-3d824f65e339
     health HEALTH_OK
     monmap e1: 1 mons at {ceph1=10.0.0.8:6789/0}, election epoch 2, quorum 0 ceph1
     osdmap e63: 2 osds: 2 up, 2 in
      pgmap v41338: 952 pgs, 20 pools, 17130 MB data, 2199 objects
            115 GB used, 167 GB / 297 GB avail
                 952 active+clean

    2014-06-02 15:45:21.655871 osd.0 [INF] 17.71 deep-scrub ok
    2014-06-02 15:45:47.880608 osd.1 [INF] 1.0 scrub ok
    2014-06-02 15:45:48.865375 osd.1 [INF] 1.3 scrub ok
    2014-06-02 15:45:50.866479 osd.1 [INF] 1.4 scrub ok
    2014-06-02 15:45:01.345821 mon.0 [INF] pgmap v41339: 952 pgs: 952 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail
    2014-06-02 15:45:05.718640 mon.0 [INF] pgmap v41340: 952 pgs: 1 active+clean+scrubbing+deep, 951 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail
    2014-06-02 15:45:53.997726 osd.1 [INF] 1.5 scrub ok
    2014-06-02 15:45:06.734270 mon.0 [INF] pgmap v41341: 952 pgs: 1 active+clean+scrubbing+deep, 951 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail
    2014-06-02 15:45:15.722456 mon.0 [INF] pgmap v41342: 952 pgs: 952 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail
    2014-06-02 15:46:06.836430 osd.0 [INF] 17.75 deep-scrub ok
    2014-06-02 15:45:55.720929 mon.0 [INF] pgmap v41343: 952 pgs: 1 active+clean+scrubbing+deep, 951 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail


The output provides:

- Cluster ID
- Cluster health status
- The monitor map epoch and the status of the monitor quorum
- The OSD map epoch and the status of OSDs
- The placement group map version
- The number of placement groups and pools
- The *notional* amount of data stored and the number of objects stored; and,
- The amount of raw storage used, together with the available and total capacity.

.. topic:: How Ceph Calculates Data Usage

   The ``used`` value reflects the *actual* amount of raw storage used. The
   ``xxx GB / xxx GB`` value means the amount available (the lesser number)
   out of the overall storage capacity of the cluster. The notional number
   reflects the size of the stored data before it is replicated, cloned or
   snapshotted. Therefore, the amount of data actually stored typically
   exceeds the notional amount stored, because Ceph creates replicas of the
   data and may also use storage capacity for cloning and snapshotting.


Checking a Cluster's Usage Stats
================================

To check a cluster's data usage and data distribution among pools, you can
use the ``df`` option. It is similar to Linux ``df``. Execute
the following::

    ceph df

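The output resembles the following. This is only an illustrative sketch based
on the small example cluster above (and it shows just one of that cluster's
pools); the names, numbers and exact column layout will differ on your cluster
and between Ceph releases::

    GLOBAL:
        SIZE     AVAIL     RAW USED     %RAW USED
        297G     167G      115G         38.72
    POOLS:
        NAME     ID     USED       %USED     MAX AVAIL     OBJECTS
        rbd      0      17130M     16.76           83G        2199
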
The **GLOBAL** section of the output provides an overview of the amount of
storage your cluster uses for your data.

- **SIZE:** The overall storage capacity of the cluster.
- **AVAIL:** The amount of free space available in the cluster.
- **RAW USED:** The amount of raw storage used.
- **% RAW USED:** The percentage of raw storage used. Use this number in
  conjunction with the ``full ratio`` and ``near full ratio`` to ensure that
  you are not reaching your cluster's capacity. See `Storage Capacity`_ for
  additional details.

The **POOLS** section of the output provides a list of pools and the notional
usage of each pool. The output from this section **DOES NOT** reflect replicas,
clones or snapshots. For example, if you store an object with 1MB of data, the
notional usage will be 1MB, but the actual usage may be 2MB or more depending
on the number of replicas, clones and snapshots.

- **NAME:** The name of the pool.
- **ID:** The pool ID.
- **USED:** The notional amount of data stored in kilobytes, unless the number
  appends **M** for megabytes or **G** for gigabytes.
- **%USED:** The notional percentage of storage used per pool.
- **MAX AVAIL:** An estimate of the notional amount of data that can be written
  to this pool.
- **OBJECTS:** The notional number of objects stored per pool.

.. note:: The numbers in the **POOLS** section are notional. They are not
   inclusive of the number of replicas, snapshots or clones. As a result,
   the sum of the **USED** and **%USED** amounts will not add up to the
   **RAW USED** and **%RAW USED** amounts in the **GLOBAL** section of the
   output.

.. note:: The **MAX AVAIL** value is a complicated function of the
   replication or erasure code used, the CRUSH rule that maps storage
   to devices, the utilization of those devices, and the configured
   ``mon_osd_full_ratio``.


Checking a Cluster's Status
===========================

To check a cluster's status, execute the following::

    ceph status

Or::

    ceph -s

In interactive mode, type ``status`` and press **Enter**. ::

    ceph> status

Ceph will print the cluster status. For example, a tiny Ceph cluster consisting
of one monitor and two OSDs may print the following::

    cluster b370a29d-9287-4ca3-ab57-3d824f65e339
     health HEALTH_OK
     monmap e1: 1 mons at {ceph1=10.0.0.8:6789/0}, election epoch 2, quorum 0 ceph1
     osdmap e63: 2 osds: 2 up, 2 in
      pgmap v41332: 952 pgs, 20 pools, 17130 MB data, 2199 objects
            115 GB used, 167 GB / 297 GB avail
                   1 active+clean+scrubbing+deep
                 951 active+clean


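If you are consuming the status from a script or monitoring system, the same
information is available in machine-readable form via the CLI's generic
``--format`` option (the JSON layout is not guaranteed to be stable across
releases)::

    ceph status --format json-pretty
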
Checking OSD Status
===================

You can check OSDs to ensure they are ``up`` and ``in`` by executing::

    ceph osd stat

Or::

    ceph osd dump

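``ceph osd stat`` prints a one-line summary of the OSD map, while ``ceph osd
dump`` prints the full map, including pools, flags and per-OSD details. For
the small example cluster above, the summary would look much like the
``osdmap`` line shown earlier (illustrative)::

    osdmap e63: 2 osds: 2 up, 2 in
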
You can also view OSDs according to their position in the CRUSH map. ::

    ceph osd tree

Ceph will print out a CRUSH tree with a host, its OSDs, whether they are up,
and their weight. ::

    # id    weight  type name       up/down reweight
    -1      3       pool default
    -3      3               rack mainrack
    -2      3                       host osd-host
    0       1                               osd.0   up      1
    1       1                               osd.1   up      1
    2       1                               osd.2   up      1

For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

Checking Monitor Status
=======================

If your cluster has multiple monitors (likely), you should check the monitor
quorum status after you start the cluster and before reading and/or writing
data. A quorum must be present when multiple monitors are running. You should
also check monitor status periodically to ensure that they are running.

To display the monitor map, execute the following::

    ceph mon stat

Or::

    ceph mon dump

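``ceph mon stat`` prints a one-line summary of the monitor map. For the
single-monitor example cluster above, it would look much like the ``monmap``
line shown earlier (illustrative)::

    e1: 1 mons at {ceph1=10.0.0.8:6789/0}, election epoch 2, quorum 0 ceph1
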
To check the quorum status for the monitor cluster, execute the following::

    ceph quorum_status

Ceph will return the quorum status. For example, a Ceph cluster consisting of
three monitors may return the following:

.. code-block:: javascript

    { "election_epoch": 10,
      "quorum": [
            0,
            1,
            2],
      "monmap": { "epoch": 1,
          "fsid": "444b489c-4f16-4b75-83f0-cb8097468898",
          "modified": "2011-12-12 13:28:27.505520",
          "created": "2011-12-12 13:28:27.505520",
          "mons": [
                { "rank": 0,
                  "name": "a",
                  "addr": "127.0.0.1:6789\/0"},
                { "rank": 1,
                  "name": "b",
                  "addr": "127.0.0.1:6790\/0"},
                { "rank": 2,
                  "name": "c",
                  "addr": "127.0.0.1:6791\/0"}
               ]
        }
    }

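If the monitors cannot form a quorum, commands that go through the cluster
(including ``ceph quorum_status``) may hang. In that case you can ask a single
monitor for its own view through its admin socket (see `Using the Admin
Socket`_ below). A sketch, using the monitor named ``a`` from the example
above::

    ceph daemon mon.a mon_status
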
Checking MDS Status
===================

Metadata servers provide metadata services for Ceph FS. Metadata servers have
two sets of states: ``up | down`` and ``active | inactive``. To ensure your
metadata servers are ``up`` and ``active``, execute the following::

    ceph mds stat

To display details of the metadata cluster, execute the following::

    ceph fs dump


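On recent releases, ``ceph fs status`` (provided by the manager's status
module) gives a more readable per-filesystem summary of MDS ranks, states and
clients; treat this as a hedged suggestion, since its availability and output
format depend on your Ceph version::

    ceph fs status
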
Checking Placement Group States
===============================

Placement groups map objects to OSDs. When you monitor your
placement groups, you will want them to be ``active`` and ``clean``.
For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

.. _Monitoring OSDs and Placement Groups: ../monitoring-osd-pg


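For a quick check, ``ceph pg stat`` prints a one-line summary of placement
group states, and ``ceph pg dump`` prints the full (very verbose) listing;
both are shown here only as a convenient starting point before reading the
page linked above::

    ceph pg stat
    ceph pg dump
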
Using the Admin Socket
======================

The Ceph admin socket allows you to query a daemon via a socket interface.
By default, Ceph sockets reside under ``/var/run/ceph``. To access a daemon
via the admin socket, log in to the host running the daemon and use a command
of the following form::

    ceph daemon {daemon-name}
    ceph daemon {path-to-socket-file}

For example, the following are equivalent::

    ceph daemon osd.0 foo
    ceph daemon /var/run/ceph/ceph-osd.0.asok foo

To view the available admin socket commands, execute the following command::

    ceph daemon {daemon-name} help

The admin socket command enables you to show and set your configuration at
runtime. See `Viewing a Configuration at Runtime`_ for details.

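For example, the admin socket can read or change a single option and report a
daemon's internal performance counters. This is a sketch that reuses the
``osd.0`` daemon from the example above and the ``debug_osd`` option;
substitute your own daemon and option names::

    ceph daemon osd.0 config get debug_osd
    ceph daemon osd.0 config set debug_osd 0/5
    ceph daemon osd.0 perf dump
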
Additionally, you can set configuration values at runtime directly: the
admin socket bypasses the monitor, unlike ``ceph tell {daemon-type}.{id}
injectargs``, which relies on the monitor but doesn't require you to log in
directly to the host in question.

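For example, to change the same option (``debug_osd`` on ``osd.0``, as in the
sketch above) through the monitors instead, run the following from any host
with admin credentials::

    ceph tell osd.0 injectargs '--debug-osd 0/5'
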
.. _Viewing a Configuration at Runtime: ../../configuration/ceph-conf#ceph-runtime-config
.. _Storage Capacity: ../../configuration/mon-config-ref#storage-capacity