======================
 Monitoring a Cluster
======================

Once you have a running cluster, you may use the ``ceph`` tool to monitor your
cluster. Monitoring a cluster typically involves checking OSD status, monitor
status, placement group status and metadata server status.

Using the command line
======================

Interactive mode
----------------

To run the ``ceph`` tool in interactive mode, type ``ceph`` at the command line
with no arguments. For example::

    ceph
    ceph> health
    ceph> status
    ceph> quorum_status
    ceph> mon_status

Non-default paths
-----------------

If you specified non-default locations for your configuration or keyring,
you may specify their locations::

    ceph -c /path/to/conf -k /path/to/keyring health

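If you run such checks repeatedly, note that many Ceph command-line tools,
including ``ceph``, also honor the ``CEPH_ARGS`` environment variable (worth
confirming for your release), so the non-default paths only need to be set
once per shell session. A minimal sketch, reusing the placeholder paths
above::

    export CEPH_ARGS="-c /path/to/conf -k /path/to/keyring"
    ceph health
    ceph status
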
Checking a Cluster's Status
===========================

After you start your cluster, and before you start reading and/or
writing data, check your cluster's status first.

To check a cluster's status, execute the following::

    ceph status

Or::

    ceph -s

In interactive mode, type ``status`` and press **Enter**. ::

    ceph> status

Ceph will print the cluster status. For example, a tiny Ceph demonstration
cluster with one of each service may print the following:

::

    cluster:
      id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
      health: HEALTH_OK

    services:
      mon: 3 daemons, quorum a,b,c
      mgr: x(active)
      mds: cephfs_a-1/1/1 up {0=a=up:active}, 2 up:standby
      osd: 3 osds: 3 up, 3 in

    data:
      pools:   2 pools, 16 pgs
      objects: 21 objects, 2.19K
      usage:   546 GB used, 384 GB / 931 GB avail
      pgs:     16 active+clean

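For scripting or integration with external monitoring, the same information
is available in machine-readable form: ``ceph status`` and most other
``ceph`` commands accept a ``--format`` option. A brief illustration (the
exact JSON field names vary by release)::

    # JSON output is easier to parse than the human-readable report above
    ceph status --format json-pretty
    ceph health --format json
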
.. topic:: How Ceph Calculates Data Usage

   The ``usage`` value reflects the *actual* amount of raw storage used. The
   ``xxx GB / xxx GB`` value means the amount available (the lesser number)
   out of the overall storage capacity of the cluster. The notional number
   reflects the size of the stored data before it is replicated, cloned or
   snapshotted. Therefore, the amount of data actually stored typically
   exceeds the notional amount stored, because Ceph creates replicas of the
   data and may also use storage capacity for cloning and snapshotting.

Watching a Cluster
==================

In addition to local logging by each daemon, Ceph clusters maintain
a *cluster log* that records high level events about the whole system.
This is logged to disk on monitor servers (as ``/var/log/ceph/ceph.log`` by
default), but can also be monitored via the command line.

To follow the cluster log, use the following command

::

    ceph -w

Ceph will print the status of the system, followed by each log message as it
is emitted. For example:

::

    cluster:
      id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
      health: HEALTH_OK

    services:
      mon: 3 daemons, quorum a,b,c
      mgr: x(active)
      mds: cephfs_a-1/1/1 up {0=a=up:active}, 2 up:standby
      osd: 3 osds: 3 up, 3 in

    data:
      pools:   2 pools, 16 pgs
      objects: 21 objects, 2.19K
      usage:   546 GB used, 384 GB / 931 GB avail
      pgs:     16 active+clean


    2017-07-24 08:15:11.329298 mon.a mon.0 172.21.9.34:6789/0 23 : cluster [INF] osd.0 172.21.9.34:6806/20527 boot
    2017-07-24 08:15:14.258143 mon.a mon.0 172.21.9.34:6789/0 39 : cluster [INF] Activating manager daemon x
    2017-07-24 08:15:15.446025 mon.a mon.0 172.21.9.34:6789/0 47 : cluster [INF] Manager daemon x is now available


In addition to using ``ceph -w`` to print log lines as they are emitted,
use ``ceph log last [n]`` to see the most recent ``n`` lines from the cluster
log.

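For example, to print the ten most recent cluster log entries and return to
the shell immediately (rather than following the log as ``ceph -w`` does)::

    ceph log last 10
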
Monitoring Health Checks
========================

Ceph continuously runs various *health checks* against its own status. When
a health check fails, this is reflected in the output of ``ceph status`` (or
``ceph health``). In addition, messages are sent to the cluster log to
indicate when a check fails, and when the cluster recovers.

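To see which daemons or placement groups are implicated in a failing check,
you can also ask for an expanded report::

    ceph health detail
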
For example, when an OSD goes down, the ``health`` section of the status
output may be updated as follows:

::

    health: HEALTH_WARN
            1 osds down
            Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded

At this time, cluster log messages are also emitted to record the failure of the
health checks:

::

    2017-07-25 10:08:58.265945 mon.a mon.0 172.21.9.34:6789/0 91 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
    2017-07-25 10:09:01.302624 mon.a mon.0 172.21.9.34:6789/0 94 : cluster [WRN] Health check failed: Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded (PG_DEGRADED)

When the OSD comes back online, the cluster log records the cluster's return
to a healthy state:

::

    2017-07-25 10:11:11.526841 mon.a mon.0 172.21.9.34:6789/0 109 : cluster [WRN] Health check update: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized (PG_DEGRADED)
    2017-07-25 10:11:13.535493 mon.a mon.0 172.21.9.34:6789/0 110 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized)
    2017-07-25 10:11:13.535577 mon.a mon.0 172.21.9.34:6789/0 111 : cluster [INF] Cluster is now healthy


Detecting configuration issues
==============================

In addition to the health checks that Ceph continuously runs on its
own status, there are some configuration issues that may only be detected
by an external tool.

Use the `ceph-medic`_ tool to run these additional checks on your Ceph
cluster's configuration.

Checking a Cluster's Usage Stats
================================

To check a cluster's data usage and data distribution among pools, you can
use the ``df`` option. It is similar to Linux ``df``. Execute
the following::

    ceph df

The **RAW STORAGE** section of the output provides an overview of the
amount of storage that is managed by your cluster.

- **CLASS:** The class of OSD device (or the total for the cluster)
- **SIZE:** The amount of storage capacity managed by the cluster.
- **AVAIL:** The amount of free space available in the cluster.
- **USED:** The amount of raw storage consumed by user data.
- **RAW USED:** The amount of raw storage consumed by user data, internal overhead, or reserved capacity.
- **%RAW USED:** The percentage of raw storage used. Use this number in
  conjunction with the ``full ratio`` and ``near full ratio`` to ensure that
  you are not reaching your cluster's capacity. See `Storage Capacity`_ for
  additional details.

The **POOLS** section of the output provides a list of pools and the notional
usage of each pool. The output from this section **DOES NOT** reflect replicas,
clones or snapshots. For example, if you store an object with 1MB of data, the
notional usage will be 1MB, but the actual usage may be 2MB or more depending
on the number of replicas, clones and snapshots.

- **NAME:** The name of the pool.
- **ID:** The pool ID.
- **USED:** The notional amount of data stored in kilobytes, unless the number
  appends **M** for megabytes or **G** for gigabytes.
- **%USED:** The notional percentage of storage used per pool.
- **MAX AVAIL:** An estimate of the notional amount of data that can be written
  to this pool.
- **OBJECTS:** The notional number of objects stored per pool.

.. note:: The numbers in the **POOLS** section are notional. They are not
   inclusive of the number of replicas, snapshots or clones. As a result,
   the sum of the **USED** and **%USED** amounts will not add up to the
   **USED** and **%USED** amounts in the **RAW STORAGE** section of the
   output.

.. note:: The **MAX AVAIL** value is a complicated function of the
   replication or erasure code used, the CRUSH rule that maps storage
   to devices, the utilization of those devices, and the configured
   ``mon_osd_full_ratio``.

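If you need more per-pool columns than the ones listed above, ``ceph df
detail`` prints an extended version of the same report (the exact set of
additional columns varies by release)::

    ceph df detail
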
Checking OSD Status
===================

You can check OSDs to ensure they are ``up`` and ``in`` by executing::

    ceph osd stat

Or::

    ceph osd dump

You can also view OSDs according to their position in the CRUSH map. ::

    ceph osd tree

Ceph will print out a CRUSH tree with a host, its OSDs, whether they are up
and their weight. ::

    #ID CLASS WEIGHT  TYPE NAME             STATUS REWEIGHT PRI-AFF
     -1       3.00000 pool default
     -3       3.00000 rack mainrack
     -2       3.00000 host osd-host
      0   ssd 1.00000         osd.0             up  1.00000 1.00000
      1   ssd 1.00000         osd.1             up  1.00000 1.00000
      2   ssd 1.00000         osd.2             up  1.00000 1.00000

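To see how full each OSD is and how evenly data is spread across them,
``ceph osd df`` reports per-OSD utilization, and ``ceph osd df tree``
arranges the same figures by CRUSH hierarchy::

    ceph osd df
    ceph osd df tree
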
For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

Checking Monitor Status
=======================

If your cluster has multiple monitors (likely), you should check the monitor
quorum status after you start the cluster and before reading and/or writing data. A
quorum must be present when multiple monitors are running. You should also check
monitor status periodically to ensure that they are running.

To display the monitor map, execute the following::

    ceph mon stat

Or::

    ceph mon dump

To check the quorum status for the monitor cluster, execute the following::

    ceph quorum_status

Ceph will return the quorum status. For example, a Ceph cluster consisting of
three monitors may return the following:

.. code-block:: javascript

    { "election_epoch": 10,
      "quorum": [
            0,
            1,
            2],
      "quorum_names": [
            "a",
            "b",
            "c"],
      "quorum_leader_name": "a",
      "monmap": { "epoch": 1,
          "fsid": "444b489c-4f16-4b75-83f0-cb8097468898",
          "modified": "2011-12-12 13:28:27.505520",
          "created": "2011-12-12 13:28:27.505520",
          "features": {"persistent": [
                            "kraken",
                            "luminous",
                            "mimic"],
                       "optional": []
                      },
          "mons": [
                { "rank": 0,
                  "name": "a",
                  "addr": "127.0.0.1:6789/0",
                  "public_addr": "127.0.0.1:6789/0"},
                { "rank": 1,
                  "name": "b",
                  "addr": "127.0.0.1:6790/0",
                  "public_addr": "127.0.0.1:6790/0"},
                { "rank": 2,
                  "name": "c",
                  "addr": "127.0.0.1:6791/0",
                  "public_addr": "127.0.0.1:6791/0"}
               ]
        }
    }

Checking MDS Status
===================

Metadata servers provide metadata services for CephFS. Metadata servers have
two sets of states: ``up | down`` and ``active | inactive``. To ensure your
metadata servers are ``up`` and ``active``, execute the following::

    ceph mds stat

To display details of the metadata cluster, execute the following::

    ceph fs dump

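On recent Ceph releases, ``ceph fs status`` offers a more compact,
per-file-system summary that combines MDS state with pool usage; if it is
available in your version, it is often the quickest check::

    ceph fs status
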
Checking Placement Group States
===============================

Placement groups map objects to OSDs. When you monitor your
placement groups, you will want them to be ``active`` and ``clean``.
For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

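For a quick look at overall placement group state, ``ceph pg stat`` prints a
one-line summary, and ``ceph pg dump`` lists every placement group (adding
``--format json-pretty`` makes the dump easier to post-process)::

    ceph pg stat
    ceph pg dump --format json-pretty
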
.. _Monitoring OSDs and Placement Groups: ../monitoring-osd-pg

Using the Admin Socket
======================

The Ceph admin socket allows you to query a daemon via a socket interface.
By default, Ceph sockets reside under ``/var/run/ceph``. To access a daemon
via the admin socket, log in to the host running the daemon and use the
following command::

    ceph daemon {daemon-name}
    ceph daemon {path-to-socket-file}

For example, the following are equivalent::

    ceph daemon osd.0 foo
    ceph daemon /var/run/ceph/ceph-osd.0.asok foo

To view the available admin socket commands, execute the following command::

    ceph daemon {daemon-name} help

The admin socket command enables you to show and set your configuration at
runtime. See `Viewing a Configuration at Runtime`_ for details.

Additionally, the admin socket allows you to set configuration values at
runtime directly: it bypasses the monitor, unlike ``ceph tell
{daemon-type}.{id} config set``, which relies on the monitor but does not
require you to log in directly to the host in question.

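As a concrete sketch of both approaches (``osd.0`` and ``debug_osd`` are
illustrative choices; substitute your own daemon and option)::

    # Via the admin socket, on the host that runs osd.0
    ceph daemon osd.0 config get debug_osd
    ceph daemon osd.0 config set debug_osd 20

    # Via the monitor, from any host with a suitable keyring
    ceph tell osd.0 config set debug_osd 20
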
.. _Viewing a Configuration at Runtime: ../../configuration/ceph-conf#viewing-a-configuration-at-runtime
.. _Storage Capacity: ../../configuration/mon-config-ref#storage-capacity
.. _ceph-medic: http://docs.ceph.com/ceph-medic/master/