==============
Health Reports
==============


How to Get Reports
==================

In general, there are two channels for retrieving health reports:

ceph (CLI)
   which sends the ``health`` mon command to retrieve the health status of the cluster
mgr module
   which calls ``mgr.get('health')`` to get the same report as a JSON-encoded string

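
As a rough illustration of consuming such a report, the sketch below decodes a
JSON-encoded health string and lists the checks it contains. The sample payload
and its field names (``status``, ``checks``, ``severity``, ``summary``) are
assumptions made for this example, not a specification of the format, and
``summarize_health`` is a hypothetical helper. In a mgr module the string would
originate from ``mgr.get('health')``; how that call wraps the string is not
covered here.

```python
import json

# Hypothetical payload: the field names are assumptions for illustration only.
SAMPLE_HEALTH_JSON = json.dumps({
    "status": "HEALTH_WARN",
    "checks": {
        "OSD_DOWN": {
            "severity": "HEALTH_WARN",
            "summary": {"message": "1 osds down", "count": 1},
        },
    },
})


def summarize_health(health_json):
    """Decode a health report; return (status, [(code, severity, message)])."""
    report = json.loads(health_json)
    checks = []
    # Sort by check code so the output order is stable.
    for code, check in sorted(report.get("checks", {}).items()):
        severity = check.get("severity", "UNKNOWN")
        message = check.get("summary", {}).get("message", "")
        checks.append((code, severity, message))
    return report.get("status", "UNKNOWN"), checks


if __name__ == "__main__":
    status, checks = summarize_health(SAMPLE_HEALTH_JSON)
    print(status)
    for code, severity, message in checks:
        print(f"{code} [{severity}]: {message}")
```

Missing fields fall back to ``UNKNOWN`` or an empty message, so the helper does
not fail if the real report differs from the assumed shape.
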
The following diagrams outline the parties involved and how they interact when clients
query for the reports:

Where are the Reports Generated
===============================

Aggregator of Aggregators
-------------------------

Health reports are aggregated from multiple Paxos services:

- AuthMonitor
- HealthMonitor
- MDSMonitor
- MgrMonitor
- MgrStatMonitor
- MonmapMonitor
- OSDMonitor

When persisting the pending changes in its own domain, each of these services identifies
health-related issues and stores them in the monstore under the ``health`` prefix,
using the same transaction. For instance, ``OSDMonitor`` checks a pending new osdmap
for possible issues, such as down OSDs or a missing scrub flag in a pool, and then stores
the encoded health reports along with the new osdmap. These reports are
later loaded and decoded, so they can be collected on demand. ``MDSMonitor``, in turn,
persists the health metrics carried in the beacons sent by the MDS daemons,
and prepares health reports when storing the pending changes.

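
The pattern described above can be sketched as a toy model. Nothing here is
Ceph code: ``ToyMonStore``, ``ToyService`` and ``collect_health`` are
hypothetical names, the store is a plain dictionary, and taking the worst
severity as the overall status is an assumption of this sketch. What it shows
is each service writing its health checks under a ``health`` prefix in the same
transaction as its own payload, and an aggregator loading and merging them on
demand.

```python
import json

# Assumed ordering of severities, lowest to highest.
SEVERITY_RANK = {"HEALTH_OK": 0, "HEALTH_WARN": 1, "HEALTH_ERR": 2}


class ToyMonStore:
    """In-memory stand-in for the monstore: maps (prefix, key) to bytes."""

    def __init__(self):
        self.data = {}

    def apply_transaction(self, puts):
        # Stage every write, then swap, so all puts land together.
        staged = dict(self.data)
        staged.update(puts)
        self.data = staged


class ToyService:
    """Stand-in for one Paxos service such as OSDMonitor."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def commit(self, payload, checks):
        # The service's own data and its health checks share one transaction.
        self.store.apply_transaction({
            (self.name, "latest"): json.dumps(payload).encode(),
            ("health", self.name): json.dumps(checks).encode(),
        })

    def load_health(self):
        raw = self.store.data.get(("health", self.name))
        return json.loads(raw.decode()) if raw else {}


def collect_health(services):
    """Merge the persisted checks of all services into one report."""
    merged = {}
    for service in services:
        merged.update(service.load_health())
    worst = max((check["severity"] for check in merged.values()),
                key=SEVERITY_RANK.__getitem__, default="HEALTH_OK")
    return {"status": worst, "checks": merged}
```

Because the checks are written together with the payload they were derived
from, a reader of this toy store never sees a new map paired with stale health
data.
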

So, if we want to add a new warning related to CephFS, probably the best place to
start is ``MDSMonitor::encode_pending()``, where health reports are collected from
the latest ``FSMap`` and the health metrics reported by MDS daemons.

Note, however, that ``MgrStatMonitor`` does *not* prepare the reports by itself;
it just stores whatever health reports it receives from mgr!

ceph-mgr -- A Delegate Aggregator
---------------------------------

In Ceph, mgr was created to share the burden of the monitor, which is used to establish
consensus on the information that is critical to keeping the cluster functioning.
Clearly, osdmap, mdsmap and monmap fall into this category. But what about the
aggregated statistics of the cluster? They are crucial for the administrator to
understand the status of the cluster, but they might not be that important to keep
the cluster running. To address this scalability issue, we offloaded the work of
collecting and aggregating the metrics to mgr.

Now, mgr is responsible for receiving and processing the ``MPGStats`` messages from
OSDs. We also developed a protocol that allows a daemon to periodically report its
metrics and status to mgr using ``MMgrReport``. The mgr, in turn, periodically sends
an aggregated report to the ``MgrStatMonitor`` service on mon. As explained earlier,
this service just persists the health reports contained in the aggregated report to the monstore.

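
The division of labour described above can be modelled with a minimal sketch.
It is a toy under stated assumptions, not the mgr implementation: ``ToyMgr``,
``ToyMgrStatMonitor``, the ``slow_ops`` metric and the ``TOY_SLOW_OPS`` check
code are all invented for illustration. Daemons push reports to the mgr, the
mgr turns the latest metrics into health checks on each tick, and the mon-side
service stores them without analysing anything itself.

```python
class ToyMgrStatMonitor:
    """Mon-side service: persists the health checks it is handed, nothing more."""

    def __init__(self):
        self.persisted_checks = {}

    def handle_report(self, health_checks):
        # No analysis here; the checks were already prepared by the mgr.
        self.persisted_checks = dict(health_checks)


class ToyMgr:
    """Collects per-daemon reports and periodically forwards an aggregate."""

    def __init__(self, mon_service):
        self.mon_service = mon_service
        self.latest = {}  # daemon name -> most recent metrics dict

    def handle_daemon_report(self, daemon, metrics):
        # Only the newest report per daemon is kept.
        self.latest[daemon] = dict(metrics)

    def tick(self):
        """Build health checks from the latest metrics and send them to the mon."""
        slow = sorted(name for name, metrics in self.latest.items()
                      if metrics.get("slow_ops", 0) > 0)
        checks = {}
        if slow:
            # TOY_SLOW_OPS is a made-up check code for this sketch.
            checks["TOY_SLOW_OPS"] = {
                "severity": "HEALTH_WARN",
                "message": f"{len(slow)} daemons report slow ops",
                "detail": slow,
            }
        self.mon_service.handle_report(checks)
        return checks
```

A short usage example: after ``mgr.handle_daemon_report("osd.1", {"slow_ops": 3})``
and ``mgr.tick()``, the ``persisted_checks`` of the toy monitor service hold the
single ``TOY_SLOW_OPS`` entry; once the daemon reports zero slow ops and the mgr
ticks again, the stored checks become empty.
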