Commit | Line | Data |
---|---|---|
3efd9988 | 1 | ================= |
c07f9fc5 FG |
2 | Prometheus plugin |
3 | ================= | |
4 | ||
5 | Provides a Prometheus exporter to pass on Ceph performance counters | |
6 | from the collection point in ceph-mgr. Ceph-mgr receives MMgrReport | |
7 | messages from all MgrClient processes (mons and OSDs, for instance) | |
8 | with performance counter schema data and actual counter data, and keeps | |
9 | a circular buffer of the last N samples. This plugin creates an HTTP | |
10 | endpoint (like all Prometheus exporters) and retrieves the latest sample | |
11 | of every counter when polled (or "scraped" in Prometheus terminology). | |
12 | The HTTP path and query parameters are ignored; all extant counters | |
13 | for all reporting entities are returned in text exposition format. | |
14 | (See the Prometheus `documentation <https://prometheus.io/docs/instrumenting/exposition_formats/#text-format-details>`_.) | |
15 | ||
3efd9988 FG |
16 | Enabling prometheus output |
17 | ========================== | |
c07f9fc5 FG |
18 | |
19 | The *prometheus* module is enabled with:: | |
20 | ||
21 | ceph mgr module enable prometheus | |
22 | ||
23 | Configuration | |
24 | ------------- | |
25 | ||
26 | By default the module will accept HTTP requests on port ``9283`` on all | |
27 | IPv4 and IPv6 addresses on the host. The port and listen address are both | |
28 | configurable with ``ceph config-key set``, with keys | |
29 | ``mgr/prometheus/server_addr`` and ``mgr/prometheus/server_port``. | |
30 | This port is registered with Prometheus's `registry <https://github.com/prometheus/prometheus/wiki/Default-port-allocations>`_. | |
31 | ||
3efd9988 FG |
32 | Statistic names and labels |
33 | ========================== | |
34 | ||
35 | The names of the stats are exactly as Ceph names them, with | |
36 | illegal characters ``.``, ``-`` and ``::`` translated to ``_``, | |
37 | and ``ceph_`` prefixed to all names. | |
38 | ||
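The renaming rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the module's actual implementation: the function name ``prometheus_name`` and the sample counter names are hypothetical, and only the three substitutions named in the text are applied.

```python
def prometheus_name(ceph_name: str) -> str:
    """Translate a Ceph counter name into a Prometheus-style metric name.

    Hypothetical helper: applies only the substitutions described in the
    text ('.', '-' and '::' become '_') and adds the 'ceph_' prefix.
    """
    # Replace "::" first so that it becomes a single underscore.
    for illegal in ("::", ".", "-"):
        ceph_name = ceph_name.replace(illegal, "_")
    return "ceph_" + ceph_name


if __name__ == "__main__":
    # Made-up counter names, used purely for illustration.
    for name in ("osd.op_w", "rocksdb.submit-latency", "throttle::msgr"):
        print(name, "->", prometheus_name(name))
```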
39 | ||
40 | All *daemon* statistics have a ``ceph_daemon`` label such as "osd.123" | |
41 | that identifies the type and ID of the daemon they come from. Some | |
42 | statistics can come from different types of daemon, so when querying | |
43 | e.g. an OSD's RocksDB stats, you would probably want to filter | |
44 | on ceph_daemon starting with "osd" to avoid mixing in the monitor | |
45 | rocksdb stats. | |
46 | ||
47 | ||
48 | The *cluster* statistics (i.e. those global to the Ceph cluster) | |
49 | have labels appropriate to what they report on. For example, | |
50 | metrics relating to pools have a ``pool_id`` label. | |
51 | ||
52 | Pool and OSD metadata series | |
53 | ---------------------------- | |
54 | ||
55 | Special series are output to enable displaying and querying on | |
56 | certain metadata fields. | |
57 | ||
58 | Pools have a ``ceph_pool_metadata`` field like this: | |
59 | ||
60 | :: | |
61 | ||
28e407b8 | 62 | ceph_pool_metadata{pool_id="2",name="cephfs_metadata_a"} 1.0 |
3efd9988 FG |
63 | |
64 | OSDs have a ``ceph_osd_metadata`` field like this: | |
65 | ||
66 | :: | |
67 | ||
28e407b8 | 68 | ceph_osd_metadata{cluster_addr="172.21.9.34:6802/19096",device_class="ssd",ceph_daemon="osd.0",public_addr="172.21.9.34:6801/19096",weight="1.0"} 1.0 |
3efd9988 FG |
69 | |
70 | ||
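The metadata series above are ordinary samples in the Prometheus text exposition format, so their labels can be read with a small parser. The following is a minimal sketch, assuming one sample per line with no trailing timestamp; ``parse_sample`` is a hypothetical helper, it does not unescape label values, and it is not a full implementation of the format.

```python
import re

# metric_name{label="value",...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)$'
)
# A single label="value" pair; the value may contain backslash escapes.
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str):
    """Return (metric_name, labels_dict, value) for one exposition line."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


if __name__ == "__main__":
    line = 'ceph_pool_metadata{pool_id="2",name="cephfs_metadata_a"} 1.0'
    print(parse_sample(line))
```

A lookup table from ``pool_id`` to pool name, for instance, could be built by collecting the ``labels`` dictionaries of all ``ceph_pool_metadata`` samples.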
71 | Correlating drive statistics with node_exporter | |
72 | ----------------------------------------------- | |
73 | ||
74 | The prometheus output from Ceph is designed to be used in conjunction | |
75 | with the generic host monitoring from the Prometheus node_exporter. | |
76 | ||
77 | To enable correlation of Ceph OSD statistics with node_exporter's | |
78 | drive statistics, special series are output like this: | |
79 | ||
80 | :: | |
81 | ||
28e407b8 | 82 | ceph_disk_occupation{ceph_daemon="osd.0",device="sdd", exported_instance="myhost"} |
3efd9988 | 83 | |
28e407b8 AA |
84 | To use this to get disk statistics by OSD ID, use either the ``and`` operator or |
85 | the ``*`` operator in your Prometheus query. All metadata metrics (like | |
86 | ``ceph_disk_occupation``) have the value 1, so they act neutrally with ``*``. Using ``*`` | |
87 | allows the use of the ``group_left`` and ``group_right`` grouping modifiers, so that | |
88 | the resulting metric has additional labels from one side of the query. | |
89 | ||
90 | See the | |
91 | `prometheus documentation`__ for more information about constructing queries. | |
92 | ||
93 | __ https://prometheus.io/docs/prometheus/latest/querying/basics | |
94 | ||
95 | The goal is to run a query like | |
3efd9988 FG |
96 | |
97 | :: | |
98 | ||
99 | rate(node_disk_bytes_written[30s]) and on (device,instance) ceph_disk_occupation{ceph_daemon="osd.0"} | |
100 | ||
28e407b8 AA |
101 | Out of the box the above query will not return any metrics since the ``instance`` labels of |
102 | both metrics don't match. The ``instance`` label of ``ceph_disk_occupation`` | |
103 | will be the currently active MGR node. | |
3efd9988 | 104 | |
28e407b8 | 105 | The following two sections outline two approaches to remedy this. |
3efd9988 | 106 | |
28e407b8 AA |
107 | Use label_replace |
108 | ================= | |
109 | ||
110 | The ``label_replace`` function (see the | |
111 | `label_replace documentation <https://prometheus.io/docs/prometheus/latest/querying/functions/#label_replace>`_) | |
112 | can add a label to, or alter a label of, a metric within a query. | |
3efd9988 | 113 | |
28e407b8 AA |
114 | To correlate an OSD with its disk's write rate, the following query can be used: |
115 | ||
116 | :: | |
3efd9988 | 117 | |
28e407b8 AA |
118 | label_replace(rate(node_disk_bytes_written[30s]), "exported_instance", "$1", "instance", "(.*):.*") and on (device,exported_instance) ceph_disk_occupation{ceph_daemon="osd.0"} |
119 | ||
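To see what the ``label_replace`` call in the query above does to a single series, its effect on one label set can be emulated in Python. This is a rough sketch rather than Prometheus code: ``label_replace_sketch`` is a hypothetical helper, it only supports the ``$1`` placeholder, and the host name and port are made-up values. Prometheus anchors the regular expression at both ends, which ``re.fullmatch`` imitates here.

```python
import re


def label_replace_sketch(labels, dst, replacement, src, regex):
    """Emulate PromQL label_replace() for one series' label set."""
    match = re.fullmatch(regex, labels.get(src, ""))
    if match is None:
        # No match: the series is returned unchanged.
        return dict(labels)
    result = dict(labels)
    # Only the "$1" placeholder is handled in this sketch.
    result[dst] = replacement.replace("$1", match.group(1))
    return result


if __name__ == "__main__":
    # Hypothetical node_exporter series labels.
    node_labels = {"instance": "myhost:9100", "device": "sdd"}
    print(label_replace_sketch(
        node_labels, "exported_instance", "$1", "instance", "(.*):.*"))
```

With the port stripped, the node_exporter series carries the same ``exported_instance`` and ``device`` values as the ``ceph_disk_occupation`` series, which is what the ``and on (device,exported_instance)`` clause matches on.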
120 | Configuring Prometheus server | |
121 | ============================= | |
3efd9988 FG |
122 | |
123 | honor_labels | |
124 | ------------ | |
125 | ||
126 | To enable Ceph to output properly-labelled data relating to any host, | |
127 | use the ``honor_labels`` setting when adding the ceph-mgr endpoints | |
128 | to your prometheus configuration. | |
129 | ||
28e407b8 AA |
130 | This allows Ceph to export the proper ``instance`` label without Prometheus |
131 | overwriting it. Without this setting, Prometheus applies an ``instance`` label | |
132 | that includes the hostname and port of the endpoint that the series came from. | |
133 | Because Ceph clusters have multiple manager daemons, this results in an | |
134 | ``instance`` label that changes spuriously when the active manager daemon | |
135 | changes. | |
3efd9988 | 136 | |
28e407b8 | 137 | node_exporter hostname labels |
3efd9988 FG |
138 | ----------------------------- |
139 | ||
140 | Set your ``instance`` labels to match what appears in Ceph's OSD metadata | |
28e407b8 | 141 | in the ``instance`` field. This is generally the short hostname of the node. |
3efd9988 FG |
142 | |
143 | This is only necessary if you want to correlate Ceph stats with host stats, | |
144 | but you may find it useful to do it anyway so that the correlation is | |
145 | available if you need it in the future. | |
146 | ||
147 | Example configuration | |
148 | --------------------- | |
149 | ||
150 | This example shows a single node configuration running ceph-mgr and | |
28e407b8 AA |
151 | node_exporter on a server called ``senta04``. Note that this requires adding the |
152 | appropriate ``instance`` label to every ``node_exporter`` target individually. | |
3efd9988 FG |
153 | |
154 | This is just an example: there are other ways to configure prometheus | |
155 | scrape targets and label rewrite rules. | |
156 | ||
157 | prometheus.yml | |
158 | ~~~~~~~~~~~~~~ | |
159 | ||
160 | :: | |
161 | ||
162 | global: | |
163 | scrape_interval: 15s | |
164 | evaluation_interval: 15s | |
165 | ||
166 | scrape_configs: | |
167 | - job_name: 'node' | |
168 | file_sd_configs: | |
169 | - files: | |
170 | - node_targets.yml | |
171 | - job_name: 'ceph' | |
172 | honor_labels: true | |
173 | file_sd_configs: | |
174 | - files: | |
175 | - ceph_targets.yml | |
176 | ||
177 | ||
178 | ceph_targets.yml | |
179 | ~~~~~~~~~~~~~~~~ | |
180 | ||
181 | ||
182 | :: | |
183 | ||
184 | [ | |
185 | { | |
186 | "targets": [ "senta04.mydomain.com:9283" ], | |
28e407b8 | 187 | "labels": {} |
3efd9988 FG |
188 | } |
189 | ] | |
190 | ||
191 | ||
192 | node_targets.yml | |
193 | ~~~~~~~~~~~~~~~~ | |
194 | ||
195 | :: | |
196 | ||
197 | [ | |
198 | { | |
199 | "targets": [ "senta04.mydomain.com:9100" ], | |
200 | "labels": { | |
201 | "instance": "senta04" | |
202 | } | |
203 | } | |
204 | ] | |
205 | ||
206 | ||
c07f9fc5 | 207 | Notes |
3efd9988 | 208 | ===== |
c07f9fc5 FG |
209 | |
210 | Counters and gauges are exported; currently histograms and long-running | |
211 | averages are not. It's possible that Ceph's 2-D histograms could be | |
212 | reduced to two separate 1-D histograms, and that long-running averages | |
213 | could be exported as Prometheus' Summary type. | |
214 | ||
c07f9fc5 FG |
215 | Timestamps, as with many Prometheus exporters, are established by |
216 | the server's scrape time (Prometheus expects that it is polling the | |
217 | actual counter process synchronously). It is possible to supply a | |
218 | timestamp along with the stat report, but the Prometheus team strongly | |
219 | advises against this. This means that timestamps will be delayed by | |
220 | an unpredictable amount; it's not clear if this will be problematic, | |
221 | but it's worth knowing about. |