.. _mgr-prometheus:

=================
Prometheus Module
=================

Provides a Prometheus exporter to pass on Ceph performance counters
from the collection point in ceph-mgr. Ceph-mgr receives MMgrReport
messages from all MgrClient processes (mons and OSDs, for instance)
with performance counter schema data and actual counter data, and keeps
a circular buffer of the last N samples. This module creates an HTTP
endpoint (like all Prometheus exporters) and retrieves the latest sample
of every counter when polled (or "scraped" in Prometheus terminology).
The HTTP path and query parameters are ignored; all extant counters
for all reporting entities are returned in text exposition format.
(See the Prometheus `documentation <https://prometheus.io/docs/instrumenting/exposition_formats/#text-format-details>`_.)

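For a quick check that the exporter is serving data, the endpoint can be
scraped by hand. A minimal sketch, assuming the default port ``9283`` on the
active ceph-mgr host (the exact set of metrics returned varies by cluster)::

    curl http://localhost:9283/metrics

The response is plain text with one sample per line, for example
``ceph_osd_up{ceph_daemon="osd.0"} 1.0``.
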
Enabling prometheus output
==========================

The *prometheus* module is enabled with::

    ceph mgr module enable prometheus

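You can verify that the module is loaded by listing the enabled manager
modules::

    ceph mgr module ls
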
Configuration
-------------

By default the module will accept HTTP requests on port ``9283`` on all
IPv4 and IPv6 addresses on the host. The port and listen address are both
configurable with ``ceph config set``, using the keys
``mgr/prometheus/server_addr`` and ``mgr/prometheus/server_port``.
This port is registered with Prometheus's `registry <https://github.com/prometheus/prometheus/wiki/Default-port-allocations>`_.

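For example, to restrict the exporter to one address and move it to a
non-default port (the address and port below are only illustrative)::

    ceph config set mgr mgr/prometheus/server_addr 192.168.0.10
    ceph config set mgr mgr/prometheus/server_port 9284
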
.. _prometheus-rbd-io-statistics:

RBD IO statistics
-----------------

The module can optionally collect RBD per-image IO statistics by enabling
dynamic OSD performance counters. The statistics are gathered for all images
in the pools that are specified in the ``mgr/prometheus/rbd_stats_pools``
configuration parameter. The parameter is a comma- or space-separated list
of ``pool[/namespace]`` entries. If the namespace is not specified the
statistics are collected for all namespaces in the pool.

Example to activate the RBD-enabled pools ``pool1``, ``pool2`` and ``poolN``::

    ceph config set mgr mgr/prometheus/rbd_stats_pools "pool1,pool2,poolN"

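To restrict collection to a single namespace, append it to the pool name.
A sketch, using an illustrative namespace ``ns1`` of ``pool1`` while still
covering all namespaces of ``pool2``::

    ceph config set mgr mgr/prometheus/rbd_stats_pools "pool1/ns1,pool2"
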
The module builds the list of all available images by scanning the specified
pools and namespaces, and refreshes it periodically. The period is
configurable via the ``mgr/prometheus/rbd_stats_pools_refresh_interval``
parameter (in seconds) and is 300 seconds (5 minutes) by default. The module
will force an earlier refresh if it detects statistics from a previously
unknown RBD image.

Example to set the sync interval to 10 minutes::

    ceph config set mgr mgr/prometheus/rbd_stats_pools_refresh_interval 600

Statistic names and labels
==========================

The names of the stats are exactly as Ceph names them, with
illegal characters ``.``, ``-`` and ``::`` translated to ``_``,
and ``ceph_`` prefixed to all names.

All *daemon* statistics have a ``ceph_daemon`` label such as "osd.123"
that identifies the type and ID of the daemon they come from. Some
statistics can come from different types of daemon, so when querying
e.g. an OSD's RocksDB stats, you would probably want to filter
on ``ceph_daemon`` starting with "osd" to avoid mixing in the monitor
RocksDB stats.

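For example, an illustrative way to restrict such a query to OSDs is a
regular-expression match on ``ceph_daemon`` (the metric name below is a
placeholder for whichever RocksDB counter you are interested in)::

    ceph_rocksdb_submit_latency_sum{ceph_daemon=~"osd\\..*"}
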
The *cluster* statistics (i.e. those global to the Ceph cluster)
have labels appropriate to what they report on. For example,
metrics relating to pools have a ``pool_id`` label.

The long-running averages that represent the histograms from core Ceph
are represented by a pair of ``<name>_sum`` and ``<name>_count`` metrics.
This is similar to how histograms are represented in `Prometheus <https://prometheus.io/docs/concepts/metric_types/#histogram>`_
and they can also be treated `similarly <https://prometheus.io/docs/practices/histograms/>`_.

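For instance, a ``_sum``/``_count`` pair can be converted into a recent
average in the usual Prometheus fashion. A sketch, assuming an OSD read
latency pair named ``ceph_osd_op_r_latency_sum`` and
``ceph_osd_op_r_latency_count`` (check your cluster for the exact names)::

    rate(ceph_osd_op_r_latency_sum[5m]) / rate(ceph_osd_op_r_latency_count[5m])
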
Pool and OSD metadata series
----------------------------

Special series are output to enable displaying and querying on
certain metadata fields.

Pools have a ``ceph_pool_metadata`` field like this:

::

    ceph_pool_metadata{pool_id="2",name="cephfs_metadata_a"} 1.0

OSDs have a ``ceph_osd_metadata`` field like this:

::

    ceph_osd_metadata{cluster_addr="172.21.9.34:6802/19096",device_class="ssd",ceph_daemon="osd.0",public_addr="172.21.9.34:6801/19096",weight="1.0"} 1.0

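These metadata series can be joined onto other metrics to pull in extra
labels. A sketch, assuming a per-pool usage metric named
``ceph_pool_bytes_used`` (illustrative; any metric carrying a ``pool_id``
label works), that attaches the pool name via ``group_left``::

    ceph_pool_bytes_used * on (pool_id) group_left(name) ceph_pool_metadata
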
Correlating drive statistics with node_exporter
-----------------------------------------------

The Prometheus output from Ceph is designed to be used in conjunction
with the generic host monitoring from the Prometheus node_exporter.

To enable correlation of Ceph OSD statistics with node_exporter's
drive statistics, special series are output like this:

::

    ceph_disk_occupation{ceph_daemon="osd.0",device="sdd",exported_instance="myhost"}

To use this to get disk statistics by OSD ID, use either the ``and`` operator or
the ``*`` operator in your Prometheus query. All metadata metrics (like
``ceph_disk_occupation``) have the value 1, so they act neutrally with ``*``.
Using ``*`` allows the use of the ``group_left`` and ``group_right`` grouping
modifiers, so that the resulting metric has additional labels from one side
of the query.

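As a sketch of the ``*`` form, the following query multiplies a
node_exporter rate by the (always 1) occupation series and copies the
``ceph_daemon`` label onto the result via ``group_left``. It assumes the
``instance`` labels of the two series already match, which, as explained
below, is not the case out of the box::

    rate(node_disk_bytes_written[30s]) * on (device, instance) group_left(ceph_daemon) ceph_disk_occupation
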
See the
`prometheus documentation`__ for more information about constructing queries.

__ https://prometheus.io/docs/prometheus/latest/querying/basics

The goal is to run a query like

::

    rate(node_disk_bytes_written[30s]) and on (device,instance) ceph_disk_occupation{ceph_daemon="osd.0"}

Out of the box the above query will not return any metrics, since the
``instance`` labels of the two metrics don't match. The ``instance`` label of
``ceph_disk_occupation`` will be the currently active MGR node.

The following two sections outline two approaches to remedy this.

Use label_replace
=================

The ``label_replace`` function (see the
`label_replace documentation <https://prometheus.io/docs/prometheus/latest/querying/functions/#label_replace>`_)
can add a label to, or alter a label of, a metric within a query.

To correlate an OSD with the write rate of its disk, the following query can
be used:

::

    label_replace(rate(node_disk_bytes_written[30s]), "exported_instance", "$1", "instance", "(.*):.*") and on (device,exported_instance) ceph_disk_occupation{ceph_daemon="osd.0"}

Configuring Prometheus server
=============================

honor_labels
------------

To enable Ceph to output properly-labeled data relating to any host,
use the ``honor_labels`` setting when adding the ceph-mgr endpoints
to your Prometheus configuration.

This allows Ceph to export the proper ``instance`` label without Prometheus
overwriting it. Without this setting, Prometheus applies an ``instance`` label
that includes the hostname and port of the endpoint that the series came from.
Because Ceph clusters have multiple manager daemons, this results in an
``instance`` label that changes spuriously when the active manager daemon
changes.

If this is undesirable a custom ``instance`` label can be set in the
Prometheus target configuration: you might wish to set it to the hostname
of your first mgr daemon, or something completely arbitrary like "ceph_cluster".

node_exporter hostname labels
-----------------------------

Set your ``instance`` labels to match what appears in Ceph's OSD metadata
in the ``instance`` field. This is generally the short hostname of the node.

This is only necessary if you want to correlate Ceph stats with host stats,
but you may find it useful to do so in all cases, in case you want to do
the correlation in the future.

Example configuration
---------------------

This example shows a single node configuration running ceph-mgr and
node_exporter on a server called ``senta04``. Note that this requires adding
the appropriate ``instance`` label to every ``node_exporter`` target
individually.

This is just an example: there are other ways to configure Prometheus
scrape targets and label rewrite rules.

prometheus.yml
~~~~~~~~~~~~~~

::

    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    scrape_configs:
      - job_name: 'node'
        file_sd_configs:
          - files:
              - node_targets.yml
      - job_name: 'ceph'
        honor_labels: true
        file_sd_configs:
          - files:
              - ceph_targets.yml

ceph_targets.yml
~~~~~~~~~~~~~~~~

::

    [
        {
            "targets": [ "senta04.mydomain.com:9283" ],
            "labels": {}
        }
    ]

node_targets.yml
~~~~~~~~~~~~~~~~

::

    [
        {
            "targets": [ "senta04.mydomain.com:9100" ],
            "labels": {
                "instance": "senta04"
            }
        }
    ]

Notes
=====

Counters and gauges are exported; currently histograms are not, and
long-running averages are exported only as the ``_sum``/``_count`` pairs
described above. It's possible that Ceph's 2-D histograms could be reduced
to two separate 1-D histograms, and that long-running averages could be
exported as Prometheus' Summary type.

Timestamps, as with many Prometheus exporters, are established by
the server's scrape time (Prometheus expects that it is polling the
actual counter process synchronously). It is possible to supply a
timestamp along with the stat report, but the Prometheus team strongly
advises against this. This means that timestamps will be delayed by
an unpredictable amount; it's not clear if this will be problematic,
but it's worth knowing about.