==============
Health Reports
==============


How to Get Reports
==================

In general, there are two channels to retrieve the health reports:

ceph (CLI)
  which sends the ``health`` mon command to retrieve the health status of the cluster
mgr module
  which calls ``mgr.get('health')`` for the same report in the form of a JSON encoded string

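As a sketch of what the second channel hands back, the following decodes a JSON-encoded health report. The ``report_json`` sample below is a hand-written, trimmed approximation of the report shape (a ``status`` field plus a ``checks`` map); the exact schema varies between Ceph releases, and ``summarize_health`` is a hypothetical helper, not a Ceph API:

```python
import json

# A hand-written sample shaped like a JSON-encoded health report
# (fields trimmed for brevity; the real schema varies by release).
report_json = '''
{
  "status": "HEALTH_WARN",
  "checks": {
    "OSD_DOWN": {
      "severity": "HEALTH_WARN",
      "summary": {"message": "1 osds down"}
    }
  }
}
'''

def summarize_health(raw):
    """Decode a JSON-encoded health report and list its checks."""
    report = json.loads(raw)
    checks = [
        (name, check["severity"], check["summary"]["message"])
        for name, check in report.get("checks", {}).items()
    ]
    return report["status"], checks

status, checks = summarize_health(report_json)
print(status)  # HEALTH_WARN
for name, severity, message in checks:
    print(name, severity, message)
```

Inside an actual mgr module, ``raw`` would come from ``self.get('health')``.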
The following diagrams outline the involved parties and how they interact when
clients query for the reports:

Where are the Reports Generated
===============================

Aggregator of Aggregators
-------------------------

Health reports are aggregated from multiple Paxos services:

- AuthMonitor
- HealthMonitor
- MDSMonitor
- MgrMonitor
- MgrStatMonitor
- MonmapMonitor
- OSDMonitor

When persisting the pending changes in its own domain, each of these services
identifies health-related issues and stores them in the monstore with the
``health`` prefix, using the same transaction. For instance, ``OSDMonitor``
checks a pending new osdmap for possible issues, like down OSDs or a missing
scrub flag in a pool, and then stores the encoded form of the health reports
along with the new osdmap. These reports are later loaded and decoded, so they
can be collected on demand. ``MDSMonitor``, in turn, persists the health
metrics sent in the beacons from the MDS daemons, and prepares health reports
when storing the pending changes.

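The pattern above can be modeled as a toy (this is illustrative Python, not Ceph code; the ``Transaction``, ``MonStore``, and key names are invented for the sketch): each service writes its own map *and* its health checks under the ``health`` prefix in one atomic transaction, and the mon later gathers all ``health``-prefixed entries on demand:

```python
# Toy model of the "aggregator of aggregators": each Paxos service
# persists its health checks in the same transaction as its own map.
class Transaction:
    def __init__(self):
        self.puts = {}

    def put(self, prefix, key, value):
        self.puts[(prefix, key)] = value

class MonStore:
    def __init__(self):
        self.kv = {}

    def apply(self, txn):
        # Both the map and the health checks land atomically.
        self.kv.update(txn.puts)

    def collect_health(self):
        # Gather every service's report on demand, as the mon does
        # when answering a ``health`` query.
        return {key: val for (prefix, key), val in self.kv.items()
                if prefix == "health"}

store = MonStore()

# OSDMonitor persists the new osdmap *and* its health checks together.
txn = Transaction()
txn.put("osdmap", "full_42", "<encoded osdmap>")
txn.put("health", "osdmap", {"OSD_DOWN": "1 osds down"})
store.apply(txn)

# MDSMonitor does the same when it encodes its pending FSMap.
txn = Transaction()
txn.put("mdsmap", "fsmap_7", "<encoded fsmap>")
txn.put("health", "mdsmap", {"MDS_DAMAGE": "1 mds daemon damaged"})
store.apply(txn)

print(store.collect_health())
```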
So, if we want to add a new warning related to CephFS, probably the best place
to start is ``MDSMonitor::encode_pending()``, where health reports are
collected from the latest ``FSMap`` and from the health metrics reported by
the MDS daemons.

It is noteworthy, though, that ``MgrStatMonitor`` does *not* prepare the
reports by itself; it just stores whatever health reports it receives from
the mgr!

ceph-mgr -- A Delegate Aggregator
---------------------------------

In Ceph, the mgr was created to share the burden of the monitor, whose job is
to establish consensus on the information that is critical to keep the cluster
functioning. Clearly, the osdmap, mdsmap and monmap fall into this category.
But what about the aggregated statistics of the cluster? They are crucial for
the administrator to understand the status of the cluster, but they might not
be that important for keeping the cluster running. To address this scalability
issue, we offloaded the work of collecting and aggregating the metrics to the
mgr.

Now, the mgr is responsible for receiving and processing the ``MPGStats``
messages from the OSDs. We also developed a protocol allowing a daemon to
periodically report its metrics and status to the mgr using ``MMgrReport``
messages. On the mgr side, the mgr periodically sends an aggregated report to
the ``MgrStatMonitor`` service on the mon. As explained earlier, this service
just persists the health reports from the aggregated report to the monstore.
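This delegate-aggregator role can be sketched as a toy, too (illustrative Python, not the actual ``MMgrReport`` protocol; the ``Mgr`` class and method names are invented): daemons push their health checks to the mgr, which merges them into the single report that is periodically forwarded to ``MgrStatMonitor``:

```python
# Toy model of the mgr as a delegate aggregator: per-daemon reports
# come in, one merged report goes out to MgrStatMonitor on the mon.
class Mgr:
    def __init__(self):
        self.daemon_reports = {}

    def handle_report(self, daemon, health_checks):
        # Called for each incoming (MMgrReport-like) message.
        self.daemon_reports[daemon] = health_checks

    def aggregated_report(self):
        # Merge all per-daemon checks into the single report that
        # would be sent to the MgrStatMonitor service on the mon.
        merged = {}
        for checks in self.daemon_reports.values():
            merged.update(checks)
        return merged

mgr = Mgr()
mgr.handle_report("osd.0", {"OSD_SLOW_OPS": "3 slow ops on osd.0"})
mgr.handle_report("osd.1", {})  # a healthy daemon contributes nothing
print(mgr.aggregated_report())
```

Note how this keeps the mon out of the hot path: it only ever sees the merged report, never the per-daemon message stream.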