======================
 Monitoring a Cluster
======================

Once you have a running cluster, you may use the ``ceph`` tool to monitor your
cluster. Monitoring a cluster typically involves checking OSD status, monitor
status, placement group status and metadata server status.

Interactive Mode
================

To run the ``ceph`` tool in interactive mode, type ``ceph`` at the command line
with no arguments. For example::

    ceph
    ceph> health
    ceph> status
    ceph> quorum_status
    ceph> mon_status


Checking Cluster Health
=======================

After you start your cluster, and before you start reading and/or
writing data, check your cluster's health. You can check on the
health of your Ceph cluster with the following::

    ceph health

If you specified non-default locations for your configuration or keyring,
you may specify their locations::

    ceph -c /path/to/conf -k /path/to/keyring health

Upon starting the Ceph cluster, you will likely encounter a health warning
such as ``HEALTH_WARN XXX num placement groups stale``. Wait a few moments
and check it again. When your cluster is ready, ``ceph health`` should return
a message such as ``HEALTH_OK``. At that point, it is okay to begin using the
cluster.
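
If the cluster reports a warning or error and you want to see which specific
issues are contributing to it, you can ask for more detail::

    ceph health detail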

Watching a Cluster
==================

To watch the cluster's ongoing events, open a new terminal. Then, enter::

    ceph -w

Ceph will print each event. For example, a tiny Ceph cluster consisting of
one monitor and two OSDs may print the following::

    cluster b370a29d-9287-4ca3-ab57-3d824f65e339
    health HEALTH_OK
    monmap e1: 1 mons at {ceph1=10.0.0.8:6789/0}, election epoch 2, quorum 0 ceph1
    osdmap e63: 2 osds: 2 up, 2 in
    pgmap v41338: 952 pgs, 20 pools, 17130 MB data, 2199 objects
          115 GB used, 167 GB / 297 GB avail
          952 active+clean

    2014-06-02 15:45:21.655871 osd.0 [INF] 17.71 deep-scrub ok
    2014-06-02 15:45:47.880608 osd.1 [INF] 1.0 scrub ok
    2014-06-02 15:45:48.865375 osd.1 [INF] 1.3 scrub ok
    2014-06-02 15:45:50.866479 osd.1 [INF] 1.4 scrub ok
    2014-06-02 15:45:01.345821 mon.0 [INF] pgmap v41339: 952 pgs: 952 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail
    2014-06-02 15:45:05.718640 mon.0 [INF] pgmap v41340: 952 pgs: 1 active+clean+scrubbing+deep, 951 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail
    2014-06-02 15:45:53.997726 osd.1 [INF] 1.5 scrub ok
    2014-06-02 15:45:06.734270 mon.0 [INF] pgmap v41341: 952 pgs: 1 active+clean+scrubbing+deep, 951 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail
    2014-06-02 15:45:15.722456 mon.0 [INF] pgmap v41342: 952 pgs: 952 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail
    2014-06-02 15:46:06.836430 osd.0 [INF] 17.75 deep-scrub ok
    2014-06-02 15:45:55.720929 mon.0 [INF] pgmap v41343: 952 pgs: 1 active+clean+scrubbing+deep, 951 active+clean; 17130 MB data, 115 GB used, 167 GB / 297 GB avail


The output provides:

- Cluster ID
- Cluster health status
- The monitor map epoch and the status of the monitor quorum
- The OSD map epoch and the status of OSDs
- The placement group map version
- The number of placement groups and pools
- The *notional* amount of data stored and the number of objects stored
- The amount of raw storage used, the amount available, and the total raw
  capacity of the cluster
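
Depending on your Ceph release, ``ceph -w`` may also accept flags such as
``--watch-warn`` or ``--watch-error`` to limit the stream to warning- or
error-level messages; check ``ceph --help`` on your system to confirm which
watch flags are available. For example::

    ceph -w --watch-warn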

.. topic:: How Ceph Calculates Data Usage

   The ``used`` value reflects the *actual* amount of raw storage used. The
   ``xxx GB / xxx GB`` value shows the amount available (the lesser number)
   out of the overall storage capacity of the cluster. The notional number
   reflects the size of the stored data before it is replicated, cloned or
   snapshotted. Therefore, the amount of data actually stored typically
   exceeds the notional amount stored, because Ceph creates replicas of the
   data and may also use storage capacity for cloning and snapshotting.
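
   For example, in a pool with three-way replication, storing 10 GB of
   objects results in a notional usage of 10 GB, but consumes roughly 30 GB
   of raw storage across the cluster.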


Checking a Cluster's Usage Stats
================================

To check a cluster's data usage and data distribution among pools, you can
use the ``df`` option. It is similar to Linux ``df``. Execute
the following::

    ceph df

The **GLOBAL** section of the output provides an overview of the amount of
storage your cluster uses for your data.

- **SIZE:** The overall storage capacity of the cluster.
- **AVAIL:** The amount of free space available in the cluster.
- **RAW USED:** The amount of raw storage used.
- **% RAW USED:** The percentage of raw storage used. Use this number in
  conjunction with the ``full ratio`` and ``near full ratio`` to ensure that
  you are not reaching your cluster's capacity. See `Storage Capacity`_ for
  additional details.

The **POOLS** section of the output provides a list of pools and the notional
usage of each pool. The output from this section **DOES NOT** reflect replicas,
clones or snapshots. For example, if you store an object with 1 MB of data, the
notional usage will be 1 MB, but the actual usage may be 2 MB or more depending
on the number of replicas, clones and snapshots.

- **NAME:** The name of the pool.
- **ID:** The pool ID.
- **USED:** The notional amount of data stored in kilobytes, unless the number
  appends **M** for megabytes or **G** for gigabytes.
- **%USED:** The notional percentage of storage used per pool.
- **MAX AVAIL:** An estimate of the notional amount of data that can be written
  to this pool.
- **OBJECTS:** The notional number of objects stored per pool.

.. note:: The numbers in the **POOLS** section are notional. They are not
   inclusive of the number of replicas, snapshots or clones. As a result,
   the sum of the **USED** and **%USED** amounts will not add up to the
   **RAW USED** and **%RAW USED** amounts in the **GLOBAL** section of the
   output.

.. note:: The **MAX AVAIL** value is a complicated function of the
   replication or erasure code used, the CRUSH rule that maps storage
   to devices, the utilization of those devices, and the configured
   ``mon_osd_full_ratio``.
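
If you need additional per-pool statistics (for example, object counts and
read/write activity), your release may also support a more verbose variant::

    ceph df detail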


Checking a Cluster's Status
===========================

To check a cluster's status, execute the following::

    ceph status

Or::

    ceph -s

In interactive mode, type ``status`` and press **Enter**. ::

    ceph> status

Ceph will print the cluster status. For example, a tiny Ceph cluster consisting
of one monitor and two OSDs may print the following::

    cluster b370a29d-9287-4ca3-ab57-3d824f65e339
    health HEALTH_OK
    monmap e1: 1 mons at {ceph1=10.0.0.8:6789/0}, election epoch 2, quorum 0 ceph1
    osdmap e63: 2 osds: 2 up, 2 in
    pgmap v41332: 952 pgs, 20 pools, 17130 MB data, 2199 objects
          115 GB used, 167 GB / 297 GB avail
          1 active+clean+scrubbing+deep
          951 active+clean
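
Like most ``ceph`` commands, ``ceph status`` also accepts a ``--format``
option (for example, ``json-pretty``) for machine-readable output, which is
convenient for scripts and monitoring systems::

    ceph status --format json-pretty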


Checking OSD Status
===================

You can check OSDs to ensure they are ``up`` and ``in`` by executing::

    ceph osd stat

Or::

    ceph osd dump

You can also view OSDs according to their position in the CRUSH map. ::

    ceph osd tree

Ceph will print out a CRUSH tree with a host, its OSDs, whether they are up,
and their weight. ::

    # id    weight  type name       up/down reweight
    -1      3       pool default
    -3      3               rack mainrack
    -2      3                       host osd-host
    0       1                               osd.0   up      1
    1       1                               osd.1   up      1
    2       1                               osd.2   up      1
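
If an OSD is reported ``down``, it can be useful to find out which host it
runs on. The ``ceph osd find`` command reports an OSD's address and CRUSH
location (``0`` below is just an example OSD ID)::

    ceph osd find 0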

For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

Checking Monitor Status
=======================

If your cluster has multiple monitors (likely), you should check the monitor
quorum status after you start the cluster and before reading and/or writing
data. A quorum must be present when multiple monitors are running. You should
also check monitor status periodically to ensure that the monitors are running.

To display the monitor map, execute the following::

    ceph mon stat

Or::

    ceph mon dump

To check the quorum status for the monitor cluster, execute the following::

    ceph quorum_status

Ceph will return the quorum status. For example, a Ceph cluster consisting of
three monitors may return the following:

.. code-block:: javascript

    { "election_epoch": 10,
      "quorum": [
            0,
            1,
            2],
      "monmap": { "epoch": 1,
          "fsid": "444b489c-4f16-4b75-83f0-cb8097468898",
          "modified": "2011-12-12 13:28:27.505520",
          "created": "2011-12-12 13:28:27.505520",
          "mons": [
                { "rank": 0,
                  "name": "a",
                  "addr": "127.0.0.1:6789\/0"},
                { "rank": 1,
                  "name": "b",
                  "addr": "127.0.0.1:6790\/0"},
                { "rank": 2,
                  "name": "c",
                  "addr": "127.0.0.1:6791\/0"}
               ]
         }
    }

Checking MDS Status
===================

Metadata servers provide metadata services for Ceph FS. Metadata servers have
two sets of states: ``up | down`` and ``active | inactive``. To ensure your
metadata servers are ``up`` and ``active``, execute the following::

    ceph mds stat

To display details of the metadata cluster, execute the following::

    ceph fs dump
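
Depending on your release, the output of ``ceph mds stat`` is a compact
summary along the lines of ``e8: 1/1/1 up {0=a=up:active}``, where rank ``0``
is held by the daemon named ``a`` in state ``up:active``; the exact format
varies between versions.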


Checking Placement Group States
===============================

Placement groups map objects to OSDs. When you monitor your
placement groups, you will want them to be ``active`` and ``clean``.
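
To get a quick summary of placement group states, or a full dump of placement
group statistics, you can run::

    ceph pg stat
    ceph pg dump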

For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

.. _Monitoring OSDs and Placement Groups: ../monitoring-osd-pg


Using the Admin Socket
======================

The Ceph admin socket allows you to query a daemon via a socket interface.
By default, Ceph sockets reside under ``/var/run/ceph``. To access a daemon
via the admin socket, log in to the host running the daemon and use the
following command::

    ceph daemon {daemon-name}
    ceph daemon {path-to-socket-file}

For example, the following are equivalent::

    ceph daemon osd.0 foo
    ceph daemon /var/run/ceph/ceph-osd.0.asok foo

To view the available admin socket commands, execute the following command::

    ceph daemon {daemon-name} help

The admin socket command enables you to show and set your configuration at
runtime. See `Viewing a Configuration at Runtime`_ for details.
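
For example, assuming a daemon named ``osd.0`` runs on the local host, you
might inspect or change a single option at runtime as follows
(``osd_max_backfills`` is used here purely as an illustrative option)::

    ceph daemon osd.0 config get osd_max_backfills
    ceph daemon osd.0 config set osd_max_backfills 2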

Additionally, you can set configuration values at runtime directly. The admin
socket bypasses the monitor, unlike ``ceph tell {daemon-type}.{id} injectargs``,
which relies on the monitor but does not require you to log in directly to the
host in question.
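
For example, the following should set the same illustrative option via the
monitor instead of the local admin socket::

    ceph tell osd.0 injectargs '--osd-max-backfills 2'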

.. _Viewing a Configuration at Runtime: ../../configuration/ceph-conf#ceph-runtime-config
.. _Storage Capacity: ../../configuration/mon-config-ref#storage-capacity