==============
Health Reports
==============


How to Get Reports
==================

In general, there are two channels to retrieve the health reports:

ceph (CLI)
  which sends the ``health`` mon command to retrieve the health status of the cluster
mgr module
  which calls ``mgr.get('health')`` to get the same report in the form of a JSON-encoded string
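
Because a mgr module receives the report as a JSON-encoded string, it has to decode it before looking at individual checks. The snippet below is a minimal sketch of that step. It assumes that ``get('health')`` returns a dict whose ``'json'`` key holds the encoded report, and that the decoded report has ``status`` and ``checks`` fields laid out like the output of ``ceph health --format json``. ``FakeMgr`` and ``summarize_health()`` are hypothetical helpers so the snippet runs outside of ``ceph-mgr``; a real module would call ``self.get('health')`` instead.

.. code-block:: python

   import json
   from typing import Any, Dict, List


   class FakeMgr:
       """Hypothetical stand-in for a mgr module (assumption, not Ceph's API)."""

       def __init__(self, report: Dict[str, Any]) -> None:
           self._report = report

       def get(self, data_name: str) -> Dict[str, str]:
           # Assumed shape: the encoded report is wrapped as {'json': <str>}.
           if data_name != 'health':
               raise KeyError(data_name)
           return {'json': json.dumps(self._report)}


   def summarize_health(mgr: Any) -> List[str]:
       """Decode the JSON-encoded report and describe each check in one line."""
       report = json.loads(mgr.get('health')['json'])
       # 'HEALTH_UNKNOWN' is a hypothetical fallback used only by this sketch.
       lines = [report.get('status', 'HEALTH_UNKNOWN')]
       for code, check in sorted(report.get('checks', {}).items()):
           message = check.get('summary', {}).get('message', '')
           lines.append(f"{code} ({check.get('severity')}): {message}")
       return lines


   sample = {
       'status': 'HEALTH_WARN',
       'checks': {
           'OSD_DOWN': {'severity': 'HEALTH_WARN',
                        'summary': {'message': '1 osds down'}},
       },
   }
   for line in summarize_health(FakeMgr(sample)):
       print(line)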

The following diagrams outline the involved parties and how they interact when the clients
query for the reports:

.. seqdiag::

   seqdiag {
     default_note_color = lightblue;
     osd; mon; ceph-cli;
     osd => mon [ label = "update osdmap service" ];
     osd => mon [ label = "update osdmap service" ];
     ceph-cli -> mon [ label = "send 'health' command" ];
     mon -> mon [ leftnote = "gather checks from services" ];
     ceph-cli <-- mon [ label = "checks and mutes" ];
   }
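
The reply to the ``health`` command carries both the checks and the mutes. A muted check is still listed, but it does not raise the overall status of the cluster. The following is a minimal sketch of that last step. It assumes the checks gathered from the services have already been flattened into a map from check code to severity; ``apply_mutes()`` and its input shapes are hypothetical, and the sample codes and severities are only illustrative.

.. code-block:: python

   from typing import Dict, Iterable, Tuple

   # The overall status is the most severe unmuted check.
   SEVERITY_RANK = {'HEALTH_OK': 0, 'HEALTH_WARN': 1, 'HEALTH_ERR': 2}


   def apply_mutes(checks: Dict[str, str],
                   mutes: Iterable[str]) -> Tuple[str, Dict[str, dict]]:
       """Return the overall status and a per-check view with a 'muted' flag.

       ``checks`` maps a check code to its severity and ``mutes`` lists the
       muted codes (hypothetical input shapes).
       """
       muted_codes = set(mutes)
       status = 'HEALTH_OK'
       view: Dict[str, dict] = {}
       for code, severity in checks.items():
           is_muted = code in muted_codes
           view[code] = {'severity': severity, 'muted': is_muted}
           # Muted checks are reported, but they do not affect the status.
           if not is_muted and SEVERITY_RANK[severity] > SEVERITY_RANK[status]:
               status = severity
       return status, view


   status, view = apply_mutes(
       {'OSD_DOWN': 'HEALTH_WARN', 'OSD_FULL': 'HEALTH_ERR'},
       mutes=['OSD_FULL'])
   print(status, view['OSD_FULL'])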

.. seqdiag::

   seqdiag {
     default_note_color = lightblue;
     osd; mon; mgr; mgr-module;
     mgr -> mon [ label = "subscribe for 'mgrdigest'" ];
     osd => mon [ label = "update osdmap service" ];
     osd => mon [ label = "update osdmap service" ];
     mon -> mgr [ label = "send MMgrDigest" ];
     mgr -> mgr [ note = "update cluster state" ];
     mon <-- mgr;
     mgr-module -> mgr [ label = "mgr.get('health')" ];
     mgr-module <-- mgr [ label = "health reports in JSON" ];
   }
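
As the diagram shows, the monitor pushes ``MMgrDigest`` to the mgr, so a module's ``mgr.get('health')`` call can be answered from the mgr's own copy of the cluster state. The following is a toy model of that cache. The ``DigestCache`` class and its methods are hypothetical stand-ins for the mgr's cluster state, and the lock reflects the assumption that modules may read the report from other threads while a new digest arrives.

.. code-block:: python

   import json
   import threading
   from typing import Dict


   class DigestCache:
       """Toy model of the mgr-side cluster state (hypothetical names)."""

       def __init__(self) -> None:
           self._lock = threading.Lock()
           # Until the first digest arrives, serve an empty report (assumption).
           self._health_json = '{}'

       def on_digest(self, health_json: str) -> None:
           # Each new digest from the mon replaces the cached report.
           with self._lock:
               self._health_json = health_json

       def get(self, data_name: str) -> Dict[str, str]:
           # Answer modules from the cached copy; no request goes to the mon.
           if data_name != 'health':
               raise KeyError(data_name)
           with self._lock:
               return {'json': self._health_json}


   cache = DigestCache()
   cache.on_digest(json.dumps({'status': 'HEALTH_WARN', 'checks': {}}))
   print(json.loads(cache.get('health')['json'])['status'])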

Where are the Reports Generated
===============================

Aggregator of Aggregators
-------------------------

Health reports are aggregated from multiple Paxos services:

- AuthMonitor
- HealthMonitor
- MDSMonitor
- MgrMonitor
- MgrStatMonitor
- MonmapMonitor
- OSDMonitor
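
Each of these services keeps its own map of health checks, and answering a ``health`` command amounts to merging those maps. The sketch below shows the idea in Python. ``gather_checks()`` and the dict layout of a check are hypothetical, and the policy for two services reporting the same code (keep the first summary, concatenate the details) is an assumption of this sketch, not a description of the monitor's C++ code.

.. code-block:: python

   from typing import Dict

   # Hypothetical layout of one check:
   #   {'severity': str, 'summary': str, 'detail': [str, ...]}


   def gather_checks(per_service: Dict[str, Dict[str, dict]]) -> Dict[str, dict]:
       """Merge the check maps of all services into a single report."""
       merged: Dict[str, dict] = {}
       for service in sorted(per_service):   # deterministic order
           for code, check in per_service[service].items():
               if code not in merged:
                   # Copy the detail list so the service's own map is untouched.
                   merged[code] = {**check, 'detail': list(check.get('detail', []))}
               else:
                   merged[code]['detail'].extend(check.get('detail', []))
       return merged


   report = gather_checks({
       'OSDMonitor': {'OSD_DOWN': {'severity': 'HEALTH_WARN',
                                   'summary': '1 osds down',
                                   'detail': ['osd.3 is down']}},
       'MDSMonitor': {'FS_DEGRADED': {'severity': 'HEALTH_WARN',
                                      'summary': '1 filesystem is degraded',
                                      'detail': []}},
   })
   print(sorted(report))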

When persisting the pending changes in their own domain, each of them identifies the
health-related issues and stores them in the monstore under the ``health`` prefix,
using the same transaction. For instance, ``OSDMonitor`` checks a pending new osdmap
for possible issues, such as down OSDs or a ``noscrub`` flag set on a pool, and then
stores the encoded form of the health reports along with the new osdmap. These reports
are later loaded and decoded, so they can be collected on demand. As for
``MDSMonitor``, it persists the health metrics carried in the beacons sent by the MDS
daemons, and prepares the health reports when storing the pending changes.
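
The key point is that a service's new map and its health checks are committed together, so the stored checks match the committed map. The following toy model illustrates this with a dict-backed store. ``ToyStore``, ``encode_pending()`` and ``load_health()`` are hypothetical, JSON stands in for Ceph's binary encoding, and keying the ``health`` entries by service name is an assumption of this sketch.

.. code-block:: python

   import json
   from typing import Dict, List, Tuple


   class ToyStore:
       """Hypothetical key/value store standing in for the monstore."""

       def __init__(self) -> None:
           self.data: Dict[Tuple[str, str], str] = {}

       def apply(self, txn: List[Tuple[str, str, str]]) -> None:
           # Stage every put, then swap, so the writes become visible together.
           staged = dict(self.data)
           for prefix, key, value in txn:
               staged[(prefix, key)] = value
           self.data = staged


   def encode_pending(store: ToyStore, service: str, epoch: int,
                      pending_map: dict, checks: dict) -> None:
       """Persist a new map and the checks found in it in one transaction."""
       txn = [
           (service, str(epoch), json.dumps(pending_map)),
           ('health', service, json.dumps(checks)),   # the 'health' prefix
       ]
       store.apply(txn)


   def load_health(store: ToyStore, service: str) -> dict:
       """Load and decode the stored checks so they can be collected later."""
       return json.loads(store.data.get(('health', service), '{}'))


   store = ToyStore()
   encode_pending(store, 'osdmap', 42,
                  pending_map={'osds': {'0': 'up', '1': 'down'}},
                  checks={'OSD_DOWN': {'severity': 'HEALTH_WARN', 'count': 1}})
   print(load_health(store, 'osdmap'))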

.. seqdiag::

   seqdiag {
     default_note_color = lightblue;
     mds; mon-mds; mon-health; ceph-cli;
     mds -> mon-mds [ label = "send beacon" ];
     mon-mds -> mon-mds [ note = "store health metrics in beacon" ];
     mds <-- mon-mds;
     mon-mds -> mon-mds [ note = "encode_health(checks)" ];
     ceph-cli -> mon-health [ label = "send 'health' command" ];
     mon-health => mon-mds [ label = "gather health checks" ];
     ceph-cli <-- mon-health [ label = "checks and mutes" ];
   }

So, if we want to add a new warning related to CephFS, the best place to start is
probably ``MDSMonitor::encode_pending()``, where health reports are collected from
the latest ``FSMap`` and the health metrics reported by the MDS daemons.
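
To illustrate where such a warning would go, here is a toy Python analogue of that check-building step: checks are derived first from the map and then from the metrics found in the MDS beacons. Everything in it is hypothetical (``build_mds_checks()`` and the simplified map and beacon shapes), and the check codes are only illustrative. In this model, a new CephFS warning is one more condition in the first loop.

.. code-block:: python

   from typing import Dict, List, Tuple

   # Hypothetical, simplified inputs:
   #   fsmap:   fs name  -> {'max_mds': int, 'up': int}
   #   beacons: MDS name -> [(check code, message), ...] from its last beacon
   FSMapLike = Dict[str, Dict[str, int]]
   Beacons = Dict[str, List[Tuple[str, str]]]


   def build_mds_checks(fsmap: FSMapLike, beacons: Beacons) -> Dict[str, dict]:
       """Toy analogue of the check-building step in MDSMonitor::encode_pending()."""
       checks: Dict[str, dict] = {}

       def add(code: str, severity: str, detail: str) -> None:
           entry = checks.setdefault(code, {'severity': severity, 'detail': []})
           entry['detail'].append(detail)

       # 1. Checks derived from the (pending) map itself.
       for fs_name, fs in sorted(fsmap.items()):
           if fs['up'] < fs['max_mds']:
               add('MDS_UP_LESS_THAN_MAX', 'HEALTH_WARN',
                   f"fs {fs_name} has {fs['up']} of {fs['max_mds']} ranks up")

       # 2. Checks derived from the metrics stored from the beacons
       #    (this sketch treats every beacon metric as a warning).
       for mds_name, metrics in sorted(beacons.items()):
           for code, message in metrics:
               add(code, 'HEALTH_WARN', f'{mds_name}: {message}')

       return checks


   checks = build_mds_checks({'cephfs': {'max_mds': 2, 'up': 1}},
                             {'mds.a': [('MDS_TRIM', 'behind on trimming')]})
   print(sorted(checks))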

Note, however, that ``MgrStatMonitor`` does *not* prepare the reports by itself;
it just stores whatever health reports it receives from mgr!

ceph-mgr -- A Delegate Aggregator
---------------------------------

In Ceph, mgr was created to share the burden of the monitor, whose job is to establish
consensus on the information that is critical to keeping the cluster functioning.
Clearly, the osdmap, mdsmap and monmap fall into this category. But what about the
aggregated statistics of the cluster? They are crucial for the administrator to
understand the status of the cluster, but they are less critical to keeping the
cluster running. Having the monitor process all of them would not scale well, so to
address this scalability issue, we offloaded the work of collecting and aggregating
the metrics to mgr.

Now, mgr is responsible for receiving and processing the ``MPGStats`` messages from
OSDs. We also developed a protocol that allows a daemon to periodically report its
metrics and status to mgr using ``MMgrReport``. The mgr, in turn, periodically sends
an aggregated report to the ``MgrStatMonitor`` service on mon. As explained earlier,
this service just persists the health reports contained in the aggregated report to
the monstore.
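
The division of labor described above can be modeled in a few lines: the mgr keeps the latest report of every daemon and periodically turns them into one aggregated report, while the mon-side service computes nothing and only stores what it receives. ``ToyMgr`` and ``ToyMgrStatMonitor`` are hypothetical classes, the aggregation rule (one entry per check code, listing the reporting daemons) is an assumption, and the ``SLOW_OPS`` code is only illustrative.

.. code-block:: python

   from typing import Dict


   class ToyMgr:
       """Hypothetical mgr that aggregates per-daemon reports."""

       def __init__(self) -> None:
           self.latest: Dict[str, dict] = {}   # daemon name -> newest report

       def handle_report(self, daemon: str, report: dict) -> None:
           # Each daemon reports periodically; only the newest report is kept.
           self.latest[daemon] = report

       def build_mgr_report(self) -> Dict[str, dict]:
           # One check per code, listing the daemons that raised it.
           checks: Dict[str, dict] = {}
           for daemon in sorted(self.latest):
               for code in self.latest[daemon].get('health', []):
                   checks.setdefault(code, {'detail': []})['detail'].append(daemon)
           return checks


   class ToyMgrStatMonitor:
       """Hypothetical mon-side service: stores the checks it is sent."""

       def __init__(self) -> None:
           self.stored: Dict[str, dict] = {}

       def handle_mgr_report(self, checks: Dict[str, dict]) -> None:
           # No checks are computed here; the report is persisted as-is.
           self.stored = checks


   mgr, mon = ToyMgr(), ToyMgrStatMonitor()
   mgr.handle_report('osd.0', {'health': ['SLOW_OPS']})
   mgr.handle_report('osd.1', {'health': ['SLOW_OPS']})
   mon.handle_mgr_report(mgr.build_mgr_report())   # one periodic tick
   print(mon.stored)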

.. seqdiag::

   seqdiag {
     default_note_color = lightblue;
     service; mgr; mon-mgr-stat; mon-health;
     service -> mgr [ label = "send(open)" ];
     mgr -> mgr [ note = "register the new service" ];
     service <-- mgr;
     mgr => service [ label = "send(configure)" ];
     service -> mgr [ label = "send(report)" ];
     mgr -> mgr [ note = "update/aggregate service metrics" ];
     service <-- mgr;
     service => mgr [ label = "send(report)" ];
     mgr -> mon-mgr-stat [ label = "send(mgr-report)" ];
     mon-mgr-stat -> mon-mgr-stat [ note = "store health checks in the report" ];
     mgr <-- mon-mgr-stat;
     mon-health => mon-mgr-stat [ label = "gather health checks" ];
     service => mgr [ label = "send(report)" ];
     service => mgr [ label = "send(close)" ];
   }
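
From the mgr's point of view, each reporting service goes through the session shown in the diagram: ``open``, ``configure``, a series of ``report`` messages, then ``close``. The following is a minimal bookkeeping sketch of that sequence. ``ToySessionRegistry`` and its methods are hypothetical, the reply to ``open`` is reduced to a dict carrying a report period, and the five-second default is illustrative only.

.. code-block:: python

   from typing import Dict, Optional


   class ToySessionRegistry:
       """Hypothetical mgr-side bookkeeping for open/configure/report/close."""

       def __init__(self, report_period: int = 5) -> None:
           self.report_period = report_period   # seconds, illustrative only
           self.sessions: Dict[str, Optional[dict]] = {}

       def on_open(self, service: str) -> dict:
           # Register the new service, then reply with a 'configure' message
           # telling it how often to send reports.
           self.sessions[service] = None
           return {'type': 'configure', 'period': self.report_period}

       def on_report(self, service: str, metrics: dict) -> bool:
           # In this sketch, reports from unregistered services are dropped.
           if service not in self.sessions:
               return False
           self.sessions[service] = metrics   # keep the latest metrics only
           return True

       def on_close(self, service: str) -> None:
           # Forget the service; later reports from it are dropped.
           self.sessions.pop(service, None)


   registry = ToySessionRegistry()
   reply = registry.on_open('osd.0')
   accepted = registry.on_report('osd.0', {'ops': 10})
   registry.on_close('osd.0')
   print(reply, accepted, registry.on_report('osd.0', {'ops': 11}))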