.. _mgr-prometheus:

=================
Prometheus Module
=================

Provides a Prometheus exporter to pass on Ceph performance counters
from the collection point in ceph-mgr. Ceph-mgr receives MMgrReport
messages from all MgrClient processes (mons and OSDs, for instance)
with performance counter schema data and actual counter data, and keeps
a circular buffer of the last N samples. This module creates an HTTP
endpoint (like all Prometheus exporters) and retrieves the latest sample
of every counter when polled (or "scraped" in Prometheus terminology).
The HTTP path and query parameters are ignored; all extant counters
for all reporting entities are returned in text exposition format.
(See the Prometheus `documentation <https://prometheus.io/docs/instrumenting/exposition_formats/#text-format-details>`_.)

Enabling prometheus output
==========================

The *prometheus* module is enabled with::

    ceph mgr module enable prometheus

Configuration
-------------

.. note::

    The Prometheus manager module needs to be restarted for configuration
    changes to be applied.

By default the module will accept HTTP requests on port ``9283`` on all IPv4
and IPv6 addresses on the host. The port and listen address are both
configurable with ``ceph config set``, with keys
``mgr/prometheus/server_addr`` and ``mgr/prometheus/server_port``. This port
is registered with Prometheus's `registry
<https://github.com/prometheus/prometheus/wiki/Default-port-allocations>`_.

::

    ceph config set mgr mgr/prometheus/server_addr 0.0.0.0
    ceph config set mgr mgr/prometheus/server_port 9283
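
You can check that the module is serving metrics with any HTTP client; the
HTTP path and query parameters are ignored. For example, assuming the default
port and a manager daemon running on the local host::

    curl http://localhost:9283/metrics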

.. warning::

    The ``scrape_interval`` of this module should always be set to match
    the Prometheus scrape interval in order to work properly and not cause
    any issues.

The Prometheus manager module is, by default, configured with a scrape interval
of 15 seconds. The scrape interval in the module is used for caching purposes
and to determine when a cache is stale.

A scrape interval below 10 seconds is not recommended. The recommended scrape
interval is 15 seconds, though in some cases it might be useful to increase it.

To set a different scrape interval in the Prometheus module, set
``scrape_interval`` to the desired value::

    ceph config set mgr mgr/prometheus/scrape_interval 20
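
If you change the module's scrape interval, keep the Prometheus side in sync.
A minimal sketch of a per-job override in ``prometheus.yml`` (the job name and
target address are placeholders)::

    scrape_configs:
      - job_name: 'ceph'
        scrape_interval: 20s
        honor_labels: true
        static_configs:
          - targets: ['ceph-mgr-host.example.com:9283']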

On large clusters (>1000 OSDs), the time to fetch the metrics may become
significant. Without the cache, the Prometheus manager module could, especially
in conjunction with multiple Prometheus instances, overload the manager and lead
to unresponsive or crashing Ceph manager instances. Hence, the cache is enabled
by default. This means that there is a possibility that the cache becomes
stale. The cache is considered stale when the time to fetch the metrics from
Ceph exceeds the configured ``mgr/prometheus/scrape_interval``.

If that is the case, **a warning will be logged** and the module will either

* respond with a 503 HTTP status code (service unavailable), or
* return the content of the cache, even though it might be stale.

This behavior can be configured. By default, it will return a 503 HTTP status
code (service unavailable). You can set other options using the ``ceph config
set`` commands.

To tell the module to respond with possibly stale data, set it to ``return``::

    ceph config set mgr mgr/prometheus/stale_cache_strategy return

To tell the module to respond with "service unavailable", set it to ``fail``::

    ceph config set mgr mgr/prometheus/stale_cache_strategy fail

If you are confident that you don't require the cache, you can disable it::

    ceph config set mgr mgr/prometheus/cache false

.. _prometheus-rbd-io-statistics:

RBD IO statistics
-----------------

The module can optionally collect RBD per-image IO statistics by enabling
dynamic OSD performance counters. The statistics are gathered for all images
in the pools that are specified in the ``mgr/prometheus/rbd_stats_pools``
configuration parameter. The parameter is a comma or space separated list
of ``pool[/namespace]`` entries. If the namespace is not specified the
statistics are collected for all namespaces in the pool.

Example to activate the RBD-enabled pools ``pool1``, ``pool2`` and ``poolN``::

    ceph config set mgr mgr/prometheus/rbd_stats_pools "pool1,pool2,poolN"
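
Entries can also be restricted to a single namespace using the
``pool[/namespace]`` form described above. A sketch using a hypothetical
namespace ``ns1``::

    ceph config set mgr mgr/prometheus/rbd_stats_pools "pool1 pool2/ns1"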

The module builds the list of all available images by scanning the specified
pools and namespaces, and refreshes it periodically. The period is
configurable via the ``mgr/prometheus/rbd_stats_pools_refresh_interval``
parameter (in seconds) and is 300 seconds (5 minutes) by default. The module
will force an earlier refresh if it detects statistics from a previously
unknown RBD image.

Example to increase the refresh interval to 10 minutes::

    ceph config set mgr mgr/prometheus/rbd_stats_pools_refresh_interval 600

Statistic names and labels
==========================

The names of the stats are exactly as Ceph names them, with
illegal characters ``.``, ``-`` and ``::`` translated to ``_``,
and ``ceph_`` prefixed to all names.

All *daemon* statistics have a ``ceph_daemon`` label such as "osd.123"
that identifies the type and ID of the daemon they come from. Some
statistics can come from different types of daemon, so when querying
e.g. an OSD's RocksDB stats, you would probably want to filter
on ``ceph_daemon`` starting with "osd" to avoid mixing in the monitor
RocksDB stats.
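
For example, a query can restrict a daemon metric to OSDs with a regular
expression match on ``ceph_daemon`` (the metric name here is only
illustrative)::

    ceph_rocksdb_submit_latency_sum{ceph_daemon=~"osd\\..*"}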

The *cluster* statistics (i.e. those global to the Ceph cluster)
have labels appropriate to what they report on. For example,
metrics relating to pools have a ``pool_id`` label.

The long running averages that represent the histograms from core Ceph
are represented by a pair of ``<name>_sum`` and ``<name>_count`` metrics.
This is similar to how histograms are represented in `Prometheus <https://prometheus.io/docs/concepts/metric_types/#histogram>`_
and they can also be treated `similarly <https://prometheus.io/docs/practices/histograms/>`_.
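
For example, a recent average can be computed from such a pair by dividing the
rates of the ``_sum`` and ``_count`` series, just as for native Prometheus
histograms (the metric name is only illustrative)::

    rate(ceph_osd_op_r_latency_sum[5m]) / rate(ceph_osd_op_r_latency_count[5m])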

Pool and OSD metadata series
----------------------------

Special series are output to enable displaying and querying on
certain metadata fields.

Pools have a ``ceph_pool_metadata`` field like this:

::

    ceph_pool_metadata{pool_id="2",name="cephfs_metadata_a"} 1.0

OSDs have a ``ceph_osd_metadata`` field like this:

::

    ceph_osd_metadata{cluster_addr="172.21.9.34:6802/19096",device_class="ssd",ceph_daemon="osd.0",public_addr="172.21.9.34:6801/19096",weight="1.0"} 1.0
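
Because these metadata series have the value 1, they can be joined onto other
series to add labels such as the pool name. A sketch of such a query, assuming
a hypothetical per-pool metric ``ceph_pool_objects``::

    ceph_pool_objects * on (pool_id) group_left(name) ceph_pool_metadata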

Correlating drive statistics with node_exporter
-----------------------------------------------

The prometheus output from Ceph is designed to be used in conjunction
with the generic host monitoring from the Prometheus node_exporter.

To enable correlation of Ceph OSD statistics with node_exporter's
drive statistics, special series are output like this:

::

    ceph_disk_occupation{ceph_daemon="osd.0",device="sdd", exported_instance="myhost"}

To use this to get disk statistics by OSD ID, use either the ``and`` operator or
the ``*`` operator in your Prometheus query. All metadata metrics (like
``ceph_disk_occupation``) have the value 1, so they act neutral with ``*``.
Using ``*`` allows the use of the ``group_left`` and ``group_right`` grouping
modifiers, so that the resulting metric has additional labels from one side
of the query.
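
A sketch of the ``*`` form, pulling ``ceph_daemon`` onto the node_exporter
series (this assumes the ``instance`` labels of both sides already match, as
discussed below)::

    rate(node_disk_bytes_written[30s]) * on (device, instance) group_left(ceph_daemon) ceph_disk_occupation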

See the
`prometheus documentation`__ for more information about constructing queries.

__ https://prometheus.io/docs/prometheus/latest/querying/basics

The goal is to run a query like

::

    rate(node_disk_bytes_written[30s]) and on (device,instance) ceph_disk_occupation{ceph_daemon="osd.0"}

Out of the box the above query will not return any metrics since the ``instance`` labels of
both metrics don't match. The ``instance`` label of ``ceph_disk_occupation``
will be the currently active MGR node.

The following two sections outline two approaches to remedy this.

Use label_replace
=================

The ``label_replace`` function (see the
`label_replace documentation <https://prometheus.io/docs/prometheus/latest/querying/functions/#label_replace>`_)
can add a label to, or alter a label of, a metric within a query.

To correlate an OSD and its disk's write rate, the following query can be used:

::

    label_replace(rate(node_disk_bytes_written[30s]), "exported_instance", "$1", "instance", "(.*):.*") and on (device,exported_instance) ceph_disk_occupation{ceph_daemon="osd.0"}

Configuring Prometheus server
=============================

honor_labels
------------

To enable Ceph to output properly-labeled data relating to any host,
use the ``honor_labels`` setting when adding the ceph-mgr endpoints
to your Prometheus configuration.

This allows Ceph to export the proper ``instance`` label without Prometheus
overwriting it. Without this setting, Prometheus applies an ``instance`` label
that includes the hostname and port of the endpoint that the series came from.
Because Ceph clusters have multiple manager daemons, this results in an
``instance`` label that changes spuriously when the active manager daemon
changes.

If this is undesirable, a custom ``instance`` label can be set in the
Prometheus target configuration: you might wish to set it to the hostname
of your first mgr daemon, or something completely arbitrary like "ceph_cluster".
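
A sketch of such a target entry, using the same file-based service discovery
format as the example configuration below and an arbitrary ``instance`` value::

    [
        {
            "targets": [ "senta04.mydomain.com:9283" ],
            "labels": {
                "instance": "ceph_cluster"
            }
        }
    ]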

node_exporter hostname labels
-----------------------------

Set your ``instance`` labels to match what appears in Ceph's OSD metadata
in the ``instance`` field. This is generally the short hostname of the node.

This is only necessary if you want to correlate Ceph stats with host stats,
but you may find it useful to do in all cases, in case you want to do
the correlation in the future.

Example configuration
---------------------

This example shows a single node configuration running ceph-mgr and
node_exporter on a server called ``senta04``. Note that this requires one
to add an appropriate and unique ``instance`` label to each ``node_exporter`` target.

This is just an example: there are other ways to configure Prometheus
scrape targets and label rewrite rules.

prometheus.yml
~~~~~~~~~~~~~~

::

    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    scrape_configs:
      - job_name: 'node'
        file_sd_configs:
          - files:
              - node_targets.yml
      - job_name: 'ceph'
        honor_labels: true
        file_sd_configs:
          - files:
              - ceph_targets.yml

ceph_targets.yml
~~~~~~~~~~~~~~~~

::

    [
        {
            "targets": [ "senta04.mydomain.com:9283" ],
            "labels": {}
        }
    ]

node_targets.yml
~~~~~~~~~~~~~~~~

::

    [
        {
            "targets": [ "senta04.mydomain.com:9100" ],
            "labels": {
                "instance": "senta04"
            }
        }
    ]

Notes
=====

Counters and gauges are exported; currently histograms and long-running
averages are not. It's possible that Ceph's 2-D histograms could be
reduced to two separate 1-D histograms, and that long-running averages
could be exported as Prometheus' Summary type.

Timestamps, as with many Prometheus exporters, are established by
the server's scrape time (Prometheus expects that it is polling the
actual counter process synchronously). It is possible to supply a
timestamp along with the stat report, but the Prometheus team strongly
advises against this. This means that timestamps will be delayed by
an unpredictable amount; it's not clear if this will be problematic,
but it's worth knowing about.