======================
 Monitoring a Cluster
======================

Once you have a running cluster, you may use the ``ceph`` tool to monitor your
cluster. Monitoring a cluster typically involves checking OSD status, monitor
status, placement group status and metadata server status.

Using the command line
======================

Interactive mode
----------------

To run the ``ceph`` tool in interactive mode, type ``ceph`` at the command line
with no arguments. For example::

    ceph
    ceph> health
    ceph> status
    ceph> quorum_status
    ceph> mon_status
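
Each of these commands can also be given directly on the command line without
entering interactive mode; for example::

    ceph health
    ceph status
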
Non-default paths
-----------------

If you specified non-default locations for your configuration or keyring,
you may specify their locations::

    ceph -c /path/to/conf -k /path/to/keyring health
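
If you connect as a user other than ``client.admin``, you can also pass the
user name with ``--name`` (or ``--id``). The user ``client.ops`` below is only
an example; substitute a user that exists in your cluster::

    ceph --name client.ops -k /path/to/keyring health
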
Checking a Cluster's Status
===========================

After you start your cluster, and before you start reading and/or
writing data, check your cluster's status.

To check a cluster's status, execute the following::

    ceph status

Or::

    ceph -s

In interactive mode, type ``status`` and press **Enter**. ::

    ceph> status

Ceph will print the cluster status. For example, a tiny Ceph demonstration
cluster with one of each service may print the following:

::

  cluster:
    id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum a
    mgr: x(active)
    mds: 1/1/1 up {0=a=up:active}
    osd: 1 osds: 1 up, 1 in

  data:
    pools:   2 pools, 16 pgs
    objects: 21 objects, 2246 bytes
    usage:   546 GB used, 384 GB / 931 GB avail
    pgs:     16 active+clean
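
If you want to consume the status from a script, the ``ceph`` tool can emit
machine-readable output instead of the display above; for example::

    ceph status --format json-pretty
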
.. topic:: How Ceph Calculates Data Usage

   The ``usage`` value reflects the *actual* amount of raw storage used. The
   ``xxx GB / xxx GB`` value means the amount available (the lesser number)
   of the overall storage capacity of the cluster. The notional number reflects
   the size of the stored data before it is replicated, cloned or snapshotted.
   Therefore, the amount of data actually stored typically exceeds the notional
   amount stored, because Ceph creates replicas of the data and may also use
   storage capacity for cloning and snapshotting.


Watching a Cluster
==================

In addition to local logging by each daemon, Ceph clusters maintain
a *cluster log* that records high level events about the whole system.
This is logged to disk on monitor servers (as ``/var/log/ceph/ceph.log`` by
default), but can also be monitored via the command line.

To follow the cluster log, use the following command

::

    ceph -w

Ceph will print the status of the system, followed by each log message as it
is emitted. For example:

::

  cluster:
    id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum a
    mgr: x(active)
    mds: 1/1/1 up {0=a=up:active}
    osd: 1 osds: 1 up, 1 in

  data:
    pools:   2 pools, 16 pgs
    objects: 21 objects, 2246 bytes
    usage:   546 GB used, 384 GB / 931 GB avail
    pgs:     16 active+clean


  2017-07-24 08:15:11.329298 mon.a mon.0 172.21.9.34:6789/0 23 : cluster [INF] osd.0 172.21.9.34:6806/20527 boot
  2017-07-24 08:15:14.258143 mon.a mon.0 172.21.9.34:6789/0 39 : cluster [INF] Activating manager daemon x
  2017-07-24 08:15:15.446025 mon.a mon.0 172.21.9.34:6789/0 47 : cluster [INF] Manager daemon x is now available


In addition to using ``ceph -w`` to print log lines as they are emitted,
use ``ceph log last [n]`` to see the most recent ``n`` lines from the cluster
log.
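
For example, to show the last ten cluster log messages::

    ceph log last 10
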
Monitoring Health Checks
========================

Ceph continuously runs various *health checks* against its own status. When
a health check fails, this is reflected in the output of ``ceph status`` (or
``ceph health``). In addition, messages are sent to the cluster log to
indicate when a check fails and when the cluster recovers.
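
To list each failing check on its own line, together with a short explanation,
use::

    ceph health detail
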
For example, when an OSD goes down, the ``health`` section of the status
output may be updated as follows:

::

    health: HEALTH_WARN
            1 osds down
            Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded

At this time, cluster log messages are also emitted to record the failure of the
health checks:

::

    2017-07-25 10:08:58.265945 mon.a mon.0 172.21.9.34:6789/0 91 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
    2017-07-25 10:09:01.302624 mon.a mon.0 172.21.9.34:6789/0 94 : cluster [WRN] Health check failed: Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded (PG_DEGRADED)

When the OSD comes back online, the cluster log records the cluster's return
to a healthy state:

::

    2017-07-25 10:11:11.526841 mon.a mon.0 172.21.9.34:6789/0 109 : cluster [WRN] Health check update: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized (PG_DEGRADED)
    2017-07-25 10:11:13.535493 mon.a mon.0 172.21.9.34:6789/0 110 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized)
    2017-07-25 10:11:13.535577 mon.a mon.0 172.21.9.34:6789/0 111 : cluster [INF] Cluster is now healthy


Detecting configuration issues
==============================

In addition to the health checks that Ceph continuously runs on its
own status, there are some configuration issues that may only be detected
by an external tool.

Use the `ceph-medic`_ tool to run these additional checks on your Ceph
cluster's configuration.
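
For example, assuming ``ceph-medic`` is installed and can reach your nodes, a
full set of checks is typically run with::

    ceph-medic check
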
Checking a Cluster's Usage Stats
================================

To check a cluster's data usage and data distribution among pools, you can
use the ``df`` option. It is similar to Linux ``df``. Execute
the following::

    ceph df
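
On a Luminous cluster the output resembles the following (the pool names and
all numbers below are illustrative, not taken from a real cluster)::

    GLOBAL:
        SIZE     AVAIL     RAW USED     %RAW USED
        931G     384G          546G         58.65
    POOLS:
        NAME                ID     USED     %USED     MAX AVAIL     OBJECTS
        cephfs_data          1     2190         0          121G          20
        cephfs_metadata      2       56         0          121G           1
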
The **GLOBAL** section of the output provides an overview of the amount of
storage your cluster uses for your data.

- **SIZE:** The overall storage capacity of the cluster.
- **AVAIL:** The amount of free space available in the cluster.
- **RAW USED:** The amount of raw storage used.
- **% RAW USED:** The percentage of raw storage used. Use this number in
  conjunction with the ``full ratio`` and ``near full ratio`` to ensure that
  you are not reaching your cluster's capacity. See `Storage Capacity`_ for
  additional details.

The **POOLS** section of the output provides a list of pools and the notional
usage of each pool. The output from this section **DOES NOT** reflect replicas,
clones or snapshots. For example, if you store an object with 1MB of data, the
notional usage will be 1MB, but the actual usage may be 2MB or more depending
on the number of replicas, clones and snapshots.

- **NAME:** The name of the pool.
- **ID:** The pool ID.
- **USED:** The notional amount of data stored in kilobytes, unless the number
  appends **M** for megabytes or **G** for gigabytes.
- **%USED:** The notional percentage of storage used per pool.
- **MAX AVAIL:** An estimate of the notional amount of data that can be written
  to this pool.
- **OBJECTS:** The notional number of objects stored per pool.

.. note:: The numbers in the **POOLS** section are notional. They are not
   inclusive of the number of replicas, snapshots or clones. As a result,
   the sum of the **USED** and **%USED** amounts will not add up to the
   **RAW USED** and **%RAW USED** amounts in the **GLOBAL** section of the
   output.

.. note:: The **MAX AVAIL** value is a complicated function of the
   replication or erasure code used, the CRUSH rule that maps storage
   to devices, the utilization of those devices, and the configured
   ``mon_osd_full_ratio``.


Checking OSD Status
===================

You can check OSDs to ensure they are ``up`` and ``in`` by executing::

    ceph osd stat

Or::

    ceph osd dump

You can also view OSDs according to their position in the CRUSH map. ::

    ceph osd tree

Ceph will print out a CRUSH tree with a host, its OSDs, whether they are up
and their weight. ::

    # id    weight  type name       up/down reweight
    -1      3       pool default
    -3      3               rack mainrack
    -2      3                       host osd-host
    0       1                               osd.0   up      1
    1       1                               osd.1   up      1
    2       1                               osd.2   up      1
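
To see how full each OSD is, in addition to whether it is up and in, you can
also use::

    ceph osd df
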
For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

Checking Monitor Status
=======================

If your cluster has multiple monitors (likely), you should check the monitor
quorum status after you start the cluster and before reading and/or writing data.
A quorum must be present when multiple monitors are running. You should also check
monitor status periodically to ensure that they are running.

To display the monitor map, execute the following::

    ceph mon stat

Or::

    ceph mon dump

To check the quorum status for the monitor cluster, execute the following::

    ceph quorum_status

Ceph will return the quorum status. For example, a Ceph cluster consisting of
three monitors may return the following:

.. code-block:: javascript

    { "election_epoch": 10,
      "quorum": [
            0,
            1,
            2],
      "monmap": { "epoch": 1,
          "fsid": "444b489c-4f16-4b75-83f0-cb8097468898",
          "modified": "2011-12-12 13:28:27.505520",
          "created": "2011-12-12 13:28:27.505520",
          "mons": [
                { "rank": 0,
                  "name": "a",
                  "addr": "127.0.0.1:6789\/0"},
                { "rank": 1,
                  "name": "b",
                  "addr": "127.0.0.1:6790\/0"},
                { "rank": 2,
                  "name": "c",
                  "addr": "127.0.0.1:6791\/0"}
               ]
        }
    }
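
The ``quorum`` list contains the ranks of the monitors that are currently in
quorum. If the default output is hard to read, you can ask for a pretty-printed
form::

    ceph quorum_status --format json-pretty
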
Checking MDS Status
===================

Metadata servers provide metadata services for Ceph FS. Metadata servers have
two sets of states: ``up | down`` and ``active | inactive``. To ensure your
metadata servers are ``up`` and ``active``, execute the following::

    ceph mds stat

To display details of the metadata cluster, execute the following::

    ceph fs dump
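
Depending on your release, a more compact per-filesystem summary may also be
available through the ``status`` manager module::

    ceph fs status
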
Checking Placement Group States
===============================

Placement groups map objects to OSDs. When you monitor your
placement groups, you will want them to be ``active`` and ``clean``.
For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.
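
For a quick summary of how many placement groups are in each state, you can
run::

    ceph pg stat
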
.. _Monitoring OSDs and Placement Groups: ../monitoring-osd-pg


Using the Admin Socket
======================

The Ceph admin socket allows you to query a daemon via a socket interface.
By default, Ceph sockets reside under ``/var/run/ceph``. To access a daemon
via the admin socket, log in to the host running the daemon and use the
following command::

    ceph daemon {daemon-name}
    ceph daemon {path-to-socket-file}

For example, the following are equivalent::

    ceph daemon osd.0 foo
    ceph daemon /var/run/ceph/ceph-osd.0.asok foo

To view the available admin socket commands, execute the following command::

    ceph daemon {daemon-name} help

The admin socket command enables you to show and set your configuration at
runtime. See `Viewing a Configuration at Runtime`_ for details.
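
For example, a common use is to dump a daemon's current configuration, or to
read a single option (``debug_osd`` here is just an example option)::

    ceph daemon osd.0 config show
    ceph daemon osd.0 config get debug_osd
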
Additionally, you can set configuration values at runtime directly: the admin
socket bypasses the monitor, unlike ``ceph tell {daemon-type}.{id} injectargs``,
which relies on the monitor but doesn't require you to log in directly to the
host in question.
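
For example, both of the following raise the debug level of ``osd.0`` at
runtime; the first must be run on the host where ``osd.0`` lives, while the
second can be run from any node with monitor access (``debug_osd`` /
``--debug-osd`` is only an example option)::

    ceph daemon osd.0 config set debug_osd 20
    ceph tell osd.0 injectargs '--debug-osd 20'
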
.. _Viewing a Configuration at Runtime: ../../configuration/ceph-conf#ceph-runtime-config
.. _Storage Capacity: ../../configuration/mon-config-ref#storage-capacity
.. _ceph-medic: http://docs.ceph.com/ceph-medic/master/