======================
 Monitoring a Cluster
======================

Once you have a running cluster, you may use the ``ceph`` tool to monitor your
cluster. Monitoring a cluster typically involves checking OSD status, monitor
status, placement group status and metadata server status.

Interactive Mode
================

To run the ``ceph`` tool in interactive mode, type ``ceph`` at the command line
with no arguments. For example::

    ceph
    ceph> health
    ceph> status
    ceph> quorum_status
    ceph> mon_status

Checking Cluster Health
=======================

After you start your cluster, and before you start reading and/or
writing data, check your cluster's health. You can check the health of
your Ceph cluster with the following::

    ceph health

If you specified non-default locations for your configuration or keyring,
you may specify their locations::

    ceph -c /path/to/conf -k /path/to/keyring health

Upon starting the Ceph cluster, you will likely encounter a health
warning such as ``HEALTH_WARN XXX num placement groups stale``. Wait a few
moments and check it again. When your cluster is ready, ``ceph health`` should
return a message such as ``HEALTH_OK``. At that point, it is okay to begin
using the cluster.
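
If the cluster reports a warning or error and you want to see which checks are
contributing to it, you can ask for a per-issue breakdown::

    ceph health detail

The following output is only illustrative; the exact messages depend on the
state of your cluster and your Ceph release::

    HEALTH_WARN 1/3 in osds are down
    osd.0 is down since epoch 23, last address 192.168.106.220:6800/11080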

Watching a Cluster
==================

To watch the cluster's ongoing events, open a new terminal. Then, enter::

    ceph -w

Ceph will print each event. For example, a tiny Ceph cluster consisting of
one monitor and two OSDs may print the following::

    cluster b370a29d-9287-4ca3-ab57-3d824f65e339
     health HEALTH_OK
     monmap e1: 1 mons at {ceph1=10.0.0.8:6789/0}, election epoch 2, quorum 0 ceph1
     osdmap e63: 2 osds: 2 up, 2 in
      pgmap v41338: 952 pgs, 20 pools, 17130 MB data, 2199 objects
            115 GB used, 167 GB / 297 GB avail
                 952 active+clean

    2014-06-02 15:45:21.655871 osd.0 [INF] 17.71 deep-scrub ok
    2014-06-02 15:45:47.880608 osd.1 [INF] 1.0 scrub ok
    2014-06-02 15:45:48.865375 osd.1 [INF] 1.3 scrub ok
    2014-06-02 15:45:50.866479 osd.1 [INF] 1.4 scrub ok
    2014-06-02 15:45:01.345821 mon.0 [INF] pgmap v41339: 952 pgs: 952 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail
    2014-06-02 15:45:05.718640 mon.0 [INF] pgmap v41340: 952 pgs: 1 active+clean+scrubbing+deep, 951 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail
    2014-06-02 15:45:53.997726 osd.1 [INF] 1.5 scrub ok
    2014-06-02 15:45:06.734270 mon.0 [INF] pgmap v41341: 952 pgs: 1 active+clean+scrubbing+deep, 951 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail
    2014-06-02 15:45:15.722456 mon.0 [INF] pgmap v41342: 952 pgs: 952 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail
    2014-06-02 15:46:06.836430 osd.0 [INF] 17.75 deep-scrub ok
    2014-06-02 15:45:55.720929 mon.0 [INF] pgmap v41343: 952 pgs: 1 active+clean+scrubbing+deep, 951 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail

The output provides:

- Cluster ID
- Cluster health status
- The monitor map epoch and the status of the monitor quorum
- The OSD map epoch and the status of OSDs
- The placement group map version
- The number of placement groups and pools
- The *notional* amount of data stored and the number of objects stored; and
- The amount of raw storage used and the total raw capacity available.

.. topic:: How Ceph Calculates Data Usage

   The ``used`` value reflects the *actual* amount of raw storage used. The
   ``xxx GB / xxx GB`` value means the amount available (the lesser number)
   out of the overall storage capacity of the cluster. The notional number
   reflects the size of the stored data before it is replicated, cloned or
   snapshotted. Therefore, the amount of data actually stored typically
   exceeds the notional amount stored, because Ceph creates replicas of the
   data and may also use storage capacity for cloning and snapshotting.
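
As a rough worked example (the numbers here are assumed, not taken from a real
cluster): if you store 10 GB of objects in a pool that keeps three replicas,
the notional data reported is about 10 GB, while the raw storage used is about
30 GB plus journaling and metadata overhead.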

Checking a Cluster's Usage Stats
================================

To check a cluster's data usage and data distribution among pools, you can
use the ``df`` option. It is similar to Linux ``df``. Execute
the following::

    ceph df
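
The output resembles the following. The values shown are purely illustrative
(they assume the small two-OSD example cluster used above); your column
widths and numbers will differ::

    GLOBAL:
        SIZE     AVAIL     RAW USED     %RAW USED
        297G     167G      115G         38.73
    POOLS:
        NAME     ID     USED       %USED     MAX AVAIL     OBJECTS
        rbd      0      17130M     11.11     51150M        2199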

The **GLOBAL** section of the output provides an overview of the amount of
storage your cluster uses for your data.

- **SIZE:** The overall storage capacity of the cluster.
- **AVAIL:** The amount of free space available in the cluster.
- **RAW USED:** The amount of raw storage used.
- **% RAW USED:** The percentage of raw storage used. Use this number in
  conjunction with the ``full ratio`` and ``near full ratio`` to ensure that
  you are not reaching your cluster's capacity. See `Storage Capacity`_ for
  additional details.

The **POOLS** section of the output provides a list of pools and the notional
usage of each pool. The output from this section **DOES NOT** reflect replicas,
clones or snapshots. For example, if you store an object with 1MB of data, the
notional usage will be 1MB, but the actual usage may be 2MB or more depending
on the number of replicas, clones and snapshots.

- **NAME:** The name of the pool.
- **ID:** The pool ID.
- **USED:** The notional amount of data stored in kilobytes, unless the number
  appends **M** for megabytes or **G** for gigabytes.
- **%USED:** The notional percentage of storage used per pool.
- **MAX AVAIL:** An estimate of the notional amount of data that can be written
  to this pool.
- **OBJECTS:** The notional number of objects stored per pool.

.. note:: The numbers in the **POOLS** section are notional. They do not
   include replicas, snapshots or clones. As a result, the sum of the
   **USED** and **%USED** amounts will not add up to the **RAW USED** and
   **%RAW USED** amounts in the **GLOBAL** section of the output.

.. note:: The **MAX AVAIL** value is a complicated function of the
   replication or erasure code used, the CRUSH rule that maps storage
   to devices, the utilization of those devices, and the configured
   ``mon_osd_full_ratio``.

Checking a Cluster's Status
===========================

To check a cluster's status, execute the following::

    ceph status

Or::

    ceph -s

In interactive mode, type ``status`` and press **Enter**. ::

    ceph> status

Ceph will print the cluster status. For example, a tiny Ceph cluster consisting
of one monitor and two OSDs may print the following::

    cluster b370a29d-9287-4ca3-ab57-3d824f65e339
     health HEALTH_OK
     monmap e1: 1 mons at {ceph1=10.0.0.8:6789/0}, election epoch 2, quorum 0 ceph1
     osdmap e63: 2 osds: 2 up, 2 in
      pgmap v41332: 952 pgs, 20 pools, 17130 MB data, 2199 objects
            115 GB used, 167 GB / 297 GB avail
                   1 active+clean+scrubbing+deep
                 951 active+clean

Checking OSD Status
===================

You can check OSDs to ensure they are ``up`` and ``in`` by executing::

    ceph osd stat

Or::

    ceph osd dump
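
For the small example cluster above, ``ceph osd stat`` would return a single
summary line resembling the following (illustrative output; the exact format
varies between releases)::

    osdmap e63: 2 osds: 2 up, 2 in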

You can also view OSDs according to their position in the CRUSH map. ::

    ceph osd tree

Ceph will print out a CRUSH tree with a host, its OSDs, whether they are up,
and their weight. ::

    # id    weight  type name       up/down reweight
    -1      3       pool default
    -3      3               rack mainrack
    -2      3                       host osd-host
    0       1                               osd.0   up      1
    1       1                               osd.1   up      1
    2       1                               osd.2   up      1

For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

Checking Monitor Status
=======================

If your cluster has multiple monitors (likely), you should check the monitor
quorum status after you start the cluster and before reading and/or writing
data. A quorum must be present when multiple monitors are running. You should
also check monitor status periodically to ensure that the monitors are running.

To display the monitor map, execute the following::

    ceph mon stat

Or::

    ceph mon dump
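
With the single-monitor example cluster above, ``ceph mon stat`` would return
a one-line summary resembling the following (illustrative output)::

    e1: 1 mons at {ceph1=10.0.0.8:6789/0}, election epoch 2, quorum 0 ceph1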

To check the quorum status for the monitor cluster, execute the following::

    ceph quorum_status

Ceph will return the quorum status. For example, a Ceph cluster consisting of
three monitors may return the following:

.. code-block:: javascript

    { "election_epoch": 10,
      "quorum": [
            0,
            1,
            2],
      "monmap": { "epoch": 1,
          "fsid": "444b489c-4f16-4b75-83f0-cb8097468898",
          "modified": "2011-12-12 13:28:27.505520",
          "created": "2011-12-12 13:28:27.505520",
          "mons": [
                { "rank": 0,
                  "name": "a",
                  "addr": "127.0.0.1:6789\/0"},
                { "rank": 1,
                  "name": "b",
                  "addr": "127.0.0.1:6790\/0"},
                { "rank": 2,
                  "name": "c",
                  "addr": "127.0.0.1:6791\/0"}
               ]
        }
    }

Checking MDS Status
===================

Metadata servers provide metadata services for Ceph FS. Metadata servers have
two sets of states: ``up | down`` and ``active | inactive``. To ensure your
metadata servers are ``up`` and ``active``, execute the following::

    ceph mds stat
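
For a filesystem with a single active metadata server, the output resembles
the following (illustrative only; the exact format differs between Ceph
releases)::

    e8: 1/1/1 up {0=a=up:active}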

To display details of the metadata cluster, execute the following::

    ceph fs dump


Checking Placement Group States
===============================

Placement groups map objects to OSDs. When you monitor your
placement groups, you will want them to be ``active`` and ``clean``.
For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.
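
For a quick one-line summary of placement group states, you can also run
``ceph pg stat``; with the example cluster above it would return output
resembling the following (illustrative)::

    v41332: 952 pgs: 952 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail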

.. _Monitoring OSDs and Placement Groups: ../monitoring-osd-pg


Using the Admin Socket
======================

The Ceph admin socket allows you to query a daemon via a socket interface.
By default, Ceph sockets reside under ``/var/run/ceph``. To access a daemon
via the admin socket, log in to the host running the daemon and use the
following command::

    ceph daemon {daemon-name}
    ceph daemon {path-to-socket-file}

For example, the following are equivalent::

    ceph daemon osd.0 foo
    ceph daemon /var/run/ceph/ceph-osd.0.asok foo

To view the available admin socket commands, execute the following command::

    ceph daemon {daemon-name} help

The admin socket command enables you to show and set your configuration at
runtime. See `Viewing a Configuration at Runtime`_ for details.

Additionally, you can set configuration values at runtime directly (i.e., the
admin socket bypasses the monitor, unlike ``ceph tell {daemon-type}.{id}
injectargs``, which relies on the monitor but doesn't require you to log in
directly to the host in question).
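
For example, the following sketch (using ``osd.0`` and the ``debug_osd``
setting purely as placeholders) reads and then changes a configuration value
through the admin socket of the local daemon, and shows the equivalent
monitor-mediated change made with ``injectargs``::

    ceph daemon osd.0 config get debug_osd
    ceph daemon osd.0 config set debug_osd 20
    ceph tell osd.0 injectargs '--debug-osd 20'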

.. _Viewing a Configuration at Runtime: ../../configuration/ceph-conf#ceph-runtime-config
.. _Storage Capacity: ../../configuration/mon-config-ref#storage-capacity