======================
 Monitoring a Cluster
======================

Once you have a running cluster, you may use the ``ceph`` tool to monitor your
cluster. Monitoring a cluster typically involves checking OSD status, monitor
status, placement group status and metadata server status.

Interactive Mode
================

To run the ``ceph`` tool in interactive mode, type ``ceph`` at the command line
with no arguments. For example::

    ceph
    ceph> health
    ceph> status
    ceph> quorum_status
    ceph> mon_status


Checking Cluster Health
=======================

After you start your cluster, and before you start reading and/or
writing data, check your cluster's health with the following::

    ceph health

If you use non-default locations for your configuration or keyring,
specify their paths::

    ceph -c /path/to/conf -k /path/to/keyring health

Upon starting the Ceph cluster, you will likely encounter a health
warning such as ``HEALTH_WARN XXX num placement groups stale``. Wait a few
moments and check it again. When your cluster is ready, ``ceph health`` should
return a message such as ``HEALTH_OK``. At that point, it is okay to begin
using the cluster.

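If the cluster reports a warning or error state, you can ask for a more
detailed explanation of each contributing problem. This is only a suggested
next step; the exact messages vary by release and by the problem at hand::

    ceph health detail
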
Watching a Cluster
==================

To watch the cluster's ongoing events, open a new terminal and enter::

    ceph -w

Ceph will print each event. For example, a tiny Ceph cluster consisting of
one monitor and two OSDs may print the following::

    cluster b370a29d-9287-4ca3-ab57-3d824f65e339
     health HEALTH_OK
     monmap e1: 1 mons at {ceph1=10.0.0.8:6789/0}, election epoch 2, quorum 0 ceph1
     osdmap e63: 2 osds: 2 up, 2 in
      pgmap v41338: 952 pgs, 20 pools, 17130 MB data, 2199 objects
            115 GB used, 167 GB / 297 GB avail
                 952 active+clean

    2014-06-02 15:45:21.655871 osd.0 [INF] 17.71 deep-scrub ok
    2014-06-02 15:45:47.880608 osd.1 [INF] 1.0 scrub ok
    2014-06-02 15:45:48.865375 osd.1 [INF] 1.3 scrub ok
    2014-06-02 15:45:50.866479 osd.1 [INF] 1.4 scrub ok
    2014-06-02 15:45:01.345821 mon.0 [INF] pgmap v41339: 952 pgs: 952 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail
    2014-06-02 15:45:05.718640 mon.0 [INF] pgmap v41340: 952 pgs: 1 active+clean+scrubbing+deep, 951 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail
    2014-06-02 15:45:53.997726 osd.1 [INF] 1.5 scrub ok
    2014-06-02 15:45:06.734270 mon.0 [INF] pgmap v41341: 952 pgs: 1 active+clean+scrubbing+deep, 951 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail
    2014-06-02 15:45:15.722456 mon.0 [INF] pgmap v41342: 952 pgs: 952 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail
    2014-06-02 15:46:06.836430 osd.0 [INF] 17.75 deep-scrub ok
    2014-06-02 15:45:55.720929 mon.0 [INF] pgmap v41343: 952 pgs: 1 active+clean+scrubbing+deep, 951 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail


The output provides:

- Cluster ID
- Cluster health status
- The monitor map epoch and the status of the monitor quorum
- The OSD map epoch and the status of OSDs
- The placement group map version
- The number of placement groups and pools
- The *notional* amount of data stored and the number of objects stored; and,
- The amount of raw storage used, together with the available and total capacity.

.. topic:: How Ceph Calculates Data Usage

   The ``used`` value reflects the *actual* amount of raw storage used. The
   ``xxx GB / xxx GB`` value means the amount available (the lesser number)
   out of the overall storage capacity of the cluster. The notional number
   reflects the size of the stored data before it is replicated, cloned or
   snapshotted. Therefore, the amount of data actually stored typically
   exceeds the notional amount stored, because Ceph creates replicas of the
   data and may also use storage capacity for cloning and snapshotting.


Checking a Cluster's Usage Stats
================================

To check a cluster's data usage and data distribution among pools, you can
use the ``df`` option. It is similar to Linux ``df``. Execute
the following::

    ceph df

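The output resembles the following. This is only an illustrative sketch based
on the small example cluster above (and it shows just one of that cluster's
pools); the names, numbers and exact column layout will differ on your cluster
and between Ceph releases::

    GLOBAL:
        SIZE     AVAIL     RAW USED     %RAW USED
        297G     167G      115G         38.72
    POOLS:
        NAME     ID     USED       %USED     MAX AVAIL     OBJECTS
        rbd      0      17130M     16.76           83G        2199
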
The **GLOBAL** section of the output provides an overview of the amount of
storage your cluster uses for your data.

- **SIZE:** The overall storage capacity of the cluster.
- **AVAIL:** The amount of free space available in the cluster.
- **RAW USED:** The amount of raw storage used.
- **% RAW USED:** The percentage of raw storage used. Use this number in
  conjunction with the ``full ratio`` and ``near full ratio`` to ensure that
  you are not reaching your cluster's capacity. See `Storage Capacity`_ for
  additional details.

The **POOLS** section of the output provides a list of pools and the notional
usage of each pool. The output from this section **DOES NOT** reflect replicas,
clones or snapshots. For example, if you store an object with 1MB of data, the
notional usage will be 1MB, but the actual usage may be 2MB or more depending
on the number of replicas, clones and snapshots.

- **NAME:** The name of the pool.
- **ID:** The pool ID.
- **USED:** The notional amount of data stored in kilobytes, unless the number
  appends **M** for megabytes or **G** for gigabytes.
- **%USED:** The notional percentage of storage used per pool.
- **MAX AVAIL:** An estimate of the notional amount of data that can be written
  to this pool.
- **OBJECTS:** The notional number of objects stored per pool.

.. note:: The numbers in the **POOLS** section are notional. They are not
   inclusive of the number of replicas, snapshots or clones. As a result,
   the sum of the **USED** and **%USED** amounts will not add up to the
   **RAW USED** and **%RAW USED** amounts in the **GLOBAL** section of the
   output.

.. note:: The **MAX AVAIL** value is a complicated function of the
   replication or erasure code used, the CRUSH rule that maps storage
   to devices, the utilization of those devices, and the configured
   ``mon_osd_full_ratio``.


Checking a Cluster's Status
===========================

To check a cluster's status, execute the following::

    ceph status

Or::

    ceph -s

In interactive mode, type ``status`` and press **Enter**. ::

    ceph> status

Ceph will print the cluster status. For example, a tiny Ceph cluster consisting
of one monitor and two OSDs may print the following::

    cluster b370a29d-9287-4ca3-ab57-3d824f65e339
     health HEALTH_OK
     monmap e1: 1 mons at {ceph1=10.0.0.8:6789/0}, election epoch 2, quorum 0 ceph1
     osdmap e63: 2 osds: 2 up, 2 in
      pgmap v41332: 952 pgs, 20 pools, 17130 MB data, 2199 objects
            115 GB used, 167 GB / 297 GB avail
                   1 active+clean+scrubbing+deep
                 951 active+clean


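If you are consuming the status from a script or monitoring system, the same
information is available in machine-readable form via the CLI's generic
``--format`` option (the JSON layout is not guaranteed to be stable across
releases)::

    ceph status --format json-pretty
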
Checking OSD Status
===================

You can check OSDs to ensure they are ``up`` and ``in`` by executing::

    ceph osd stat

Or::

    ceph osd dump

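``ceph osd stat`` prints a one-line summary of the OSD map, while ``ceph osd
dump`` prints the full map, including pools, flags and per-OSD details. For
the small example cluster above, the summary would look much like the
``osdmap`` line shown earlier (illustrative)::

    osdmap e63: 2 osds: 2 up, 2 in
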
You can also view OSDs according to their position in the CRUSH map. ::

    ceph osd tree

Ceph will print out a CRUSH tree with a host, its OSDs, whether they are up,
and their weight. ::

    # id    weight  type name       up/down reweight
    -1      3       pool default
    -3      3               rack mainrack
    -2      3                       host osd-host
    0       1                               osd.0   up      1
    1       1                               osd.1   up      1
    2       1                               osd.2   up      1

For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

Checking Monitor Status
=======================

If your cluster has multiple monitors (likely), you should check the monitor
quorum status after you start the cluster and before reading and/or writing
data. A quorum must be present when multiple monitors are running. You should
also check monitor status periodically to ensure that they are running.

To display the monitor map, execute the following::

    ceph mon stat

Or::

    ceph mon dump

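``ceph mon stat`` prints a one-line summary of the monitor map. For the
single-monitor example cluster above, it would look much like the ``monmap``
line shown earlier (illustrative)::

    e1: 1 mons at {ceph1=10.0.0.8:6789/0}, election epoch 2, quorum 0 ceph1
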
To check the quorum status for the monitor cluster, execute the following::

    ceph quorum_status

Ceph will return the quorum status. For example, a Ceph cluster consisting of
three monitors may return the following:

.. code-block:: javascript

    { "election_epoch": 10,
      "quorum": [
            0,
            1,
            2],
      "monmap": { "epoch": 1,
          "fsid": "444b489c-4f16-4b75-83f0-cb8097468898",
          "modified": "2011-12-12 13:28:27.505520",
          "created": "2011-12-12 13:28:27.505520",
          "mons": [
                { "rank": 0,
                  "name": "a",
                  "addr": "127.0.0.1:6789\/0"},
                { "rank": 1,
                  "name": "b",
                  "addr": "127.0.0.1:6790\/0"},
                { "rank": 2,
                  "name": "c",
                  "addr": "127.0.0.1:6791\/0"}
               ]
        }
    }

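If the monitors cannot form a quorum, commands that go through the cluster
(including ``ceph quorum_status``) may hang. In that case you can ask a single
monitor for its own view through its admin socket (see `Using the Admin
Socket`_ below). A sketch, using the monitor named ``a`` from the example
above::

    ceph daemon mon.a mon_status
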
Checking MDS Status
===================

Metadata servers provide metadata services for Ceph FS. Metadata servers have
two sets of states: ``up | down`` and ``active | inactive``. To ensure your
metadata servers are ``up`` and ``active``, execute the following::

    ceph mds stat

To display details of the metadata cluster, execute the following::

    ceph fs dump


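On recent releases, ``ceph fs status`` (provided by the manager's status
module) gives a more readable per-filesystem summary of MDS ranks, states and
clients; treat this as a hedged suggestion, since its availability and output
format depend on your Ceph version::

    ceph fs status
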
Checking Placement Group States
===============================

Placement groups map objects to OSDs. When you monitor your
placement groups, you will want them to be ``active`` and ``clean``.
For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

.. _Monitoring OSDs and Placement Groups: ../monitoring-osd-pg


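For a quick check, ``ceph pg stat`` prints a one-line summary of placement
group states, and ``ceph pg dump`` prints the full (very verbose) listing;
both are shown here only as a convenient starting point before reading the
page linked above::

    ceph pg stat
    ceph pg dump
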
Using the Admin Socket
======================

The Ceph admin socket allows you to query a daemon via a socket interface.
By default, Ceph sockets reside under ``/var/run/ceph``. To access a daemon
via the admin socket, log in to the host running the daemon and use a command
of the following form::

    ceph daemon {daemon-name}
    ceph daemon {path-to-socket-file}

For example, the following are equivalent::

    ceph daemon osd.0 foo
    ceph daemon /var/run/ceph/ceph-osd.0.asok foo

To view the available admin socket commands, execute the following command::

    ceph daemon {daemon-name} help

The admin socket command enables you to show and set your configuration at
runtime. See `Viewing a Configuration at Runtime`_ for details.

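For example, the admin socket can read or change a single option and report a
daemon's internal performance counters. This is a sketch that reuses the
``osd.0`` daemon from the example above and the ``debug_osd`` option;
substitute your own daemon and option names::

    ceph daemon osd.0 config get debug_osd
    ceph daemon osd.0 config set debug_osd 0/5
    ceph daemon osd.0 perf dump
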
Additionally, you can set configuration values at runtime directly: the
admin socket bypasses the monitor, unlike ``ceph tell {daemon-type}.{id}
injectargs``, which relies on the monitor but doesn't require you to log in
directly to the host in question.

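For example, to change the same option (``debug_osd`` on ``osd.0``, as in the
sketch above) through the monitors instead, run the following from any host
with admin credentials::

    ceph tell osd.0 injectargs '--debug-osd 0/5'
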
.. _Viewing a Configuration at Runtime: ../../configuration/ceph-conf#ceph-runtime-config
.. _Storage Capacity: ../../configuration/mon-config-ref#storage-capacity