.. _mgr-prometheus:

=================
Prometheus Module
=================

Provides a Prometheus exporter to pass on Ceph performance counters
from the collection point in ceph-mgr. Ceph-mgr receives MMgrReport
messages from all MgrClient processes (mons and OSDs, for instance)
with performance counter schema data and actual counter data, and keeps
a circular buffer of the last N samples. This module creates an HTTP
endpoint (like all Prometheus exporters) and retrieves the latest sample
of every counter when polled (or "scraped" in Prometheus terminology).
The HTTP path and query parameters are ignored; all extant counters
for all reporting entities are returned in text exposition format.
(See the Prometheus `documentation <https://prometheus.io/docs/instrumenting/exposition_formats/#text-format-details>`_.)

Enabling prometheus output
==========================

The *prometheus* module is enabled with::

    ceph mgr module enable prometheus

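Once enabled, ``ceph mgr services`` should report the exporter's URI, with
output similar to the following (the hostname and port shown are only
illustrative)::

    {
        "prometheus": "http://senta04.mydomain.com:9283/"
    }
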

Configuration
-------------

By default the module will accept HTTP requests on port ``9283`` on all
IPv4 and IPv6 addresses on the host. The port and listen address are both
configurable with ``ceph config set``, with keys
``mgr/prometheus/server_addr`` and ``mgr/prometheus/server_port``.
This port is registered with Prometheus's `registry <https://github.com/prometheus/prometheus/wiki/Default-port-allocations>`_.

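For example, to bind the module to a specific address and port (the address
below is only a placeholder)::

    ceph config set mgr mgr/prometheus/server_addr 192.168.0.10
    ceph config set mgr mgr/prometheus/server_port 9283
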

.. _prometheus-rbd-io-statistics:

RBD IO statistics
-----------------

The module can optionally collect RBD per-image IO statistics by enabling
dynamic OSD performance counters. The statistics are gathered for all images
in the pools that are specified in the ``mgr/prometheus/rbd_stats_pools``
configuration parameter. The parameter is a comma or space separated list
of ``pool[/namespace]`` entries. If the namespace is not specified the
statistics are collected for all namespaces in the pool.

Example to activate the RBD-enabled pools ``pool1``, ``pool2`` and ``poolN``::

    ceph config set mgr mgr/prometheus/rbd_stats_pools "pool1,pool2,poolN"

The module builds the list of all available images by scanning the specified
pools and namespaces and refreshes it periodically. The period is
configurable via the ``mgr/prometheus/rbd_stats_pools_refresh_interval``
parameter (in seconds) and is 300 seconds (5 minutes) by default. The module
will force a refresh earlier if it detects statistics from a previously unknown
RBD image.

Example to increase the sync interval to 10 minutes::

    ceph config set mgr mgr/prometheus/rbd_stats_pools_refresh_interval 600

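The per-image counters are exported with labels identifying the pool,
namespace and image. As a sketch (the metric name ``ceph_rbd_write_bytes`` and
its exact label set may vary by release, so verify against your own scrape
output), per-image write throughput could be charted with::

    rate(ceph_rbd_write_bytes{pool="pool1"}[5m])
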

Statistic names and labels
==========================

The names of the stats are exactly as Ceph names them, with
illegal characters ``.``, ``-`` and ``::`` translated to ``_``,
and ``ceph_`` prefixed to all names.

All *daemon* statistics have a ``ceph_daemon`` label such as "osd.123"
that identifies the type and ID of the daemon they come from. Some
statistics can come from different types of daemon, so when querying
e.g. an OSD's RocksDB stats, you would probably want to filter
on ``ceph_daemon`` starting with "osd" to avoid mixing in the monitor
RocksDB stats.

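As a sketch, such a filter looks like the following (the metric name
``ceph_rocksdb_submit_latency_sum`` is only an illustration; check your scrape
output for the counters your daemons actually expose)::

    ceph_rocksdb_submit_latency_sum{ceph_daemon=~"osd.*"}
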

The *cluster* statistics (i.e. those global to the Ceph cluster)
have labels appropriate to what they report on. For example,
metrics relating to pools have a ``pool_id`` label.

The long running averages that represent the histograms from core Ceph
are represented by a pair of ``<name>_sum`` and ``<name>_count`` metrics.
This is similar to how histograms are represented in `Prometheus <https://prometheus.io/docs/concepts/metric_types/#histogram>`_
and they can also be treated `similarly <https://prometheus.io/docs/practices/histograms/>`_.

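For example, a mean value over the queried window can be derived by dividing
the rate of the ``_sum`` series by the rate of the matching ``_count`` series
(the OSD read-latency pair below is used purely as an illustration)::

    rate(ceph_osd_op_r_latency_sum[5m]) / rate(ceph_osd_op_r_latency_count[5m])
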

Pool and OSD metadata series
----------------------------

Special series are output to enable displaying and querying on
certain metadata fields.

Pools have a ``ceph_pool_metadata`` field like this:

::

    ceph_pool_metadata{pool_id="2",name="cephfs_metadata_a"} 1.0

OSDs have a ``ceph_osd_metadata`` field like this:

::

    ceph_osd_metadata{cluster_addr="172.21.9.34:6802/19096",device_class="ssd",ceph_daemon="osd.0",public_addr="172.21.9.34:6801/19096",weight="1.0"} 1.0

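Because these metadata series always have the value 1, they can be multiplied
onto data series to attach extra labels. A sketch of such a join (the data
series ``ceph_pool_stored`` is an assumption; any per-pool metric carrying a
``pool_id`` label works the same way)::

    ceph_pool_stored * on (pool_id) group_left(name) ceph_pool_metadata
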

Correlating drive statistics with node_exporter
-----------------------------------------------

The prometheus output from Ceph is designed to be used in conjunction
with the generic host monitoring from the Prometheus node_exporter.

To enable correlation of Ceph OSD statistics with node_exporter's
drive statistics, special series are output like this:

::

    ceph_disk_occupation{ceph_daemon="osd.0",device="sdd", exported_instance="myhost"}

To use this to get disk statistics by OSD ID, use either the ``and`` operator or
the ``*`` operator in your prometheus query. All metadata metrics (like
``ceph_disk_occupation``) have the value 1 so they act neutral with ``*``. Using ``*``
allows the use of ``group_left`` and ``group_right`` grouping modifiers, so that
the resulting metric has additional labels from one side of the query.

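Below is a minimal sketch of the ``*`` form: it multiplies the drive write
rate by the (always 1) metadata series so that the result carries the
``ceph_daemon`` label. Note that it is subject to the same ``instance``
mismatch discussed below::

    rate(node_disk_bytes_written[30s]) * on (device, instance) group_left (ceph_daemon) ceph_disk_occupation
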

See the
`prometheus documentation`__ for more information about constructing queries.

__ https://prometheus.io/docs/prometheus/latest/querying/basics

The goal is to run a query like

::

    rate(node_disk_bytes_written[30s]) and on (device,instance) ceph_disk_occupation{ceph_daemon="osd.0"}

Out of the box the above query will not return any metrics since the ``instance`` labels of
both metrics don't match. The ``instance`` label of ``ceph_disk_occupation``
will be the currently active MGR node.

The following two sections outline two approaches to remedy this.

Use label_replace
=================

The ``label_replace`` function (see the
`label_replace documentation <https://prometheus.io/docs/prometheus/latest/querying/functions/#label_replace>`_)
can add a label to, or alter a label of, a metric within a query.

To correlate an OSD and its disks' write rate, the following query can be used:

::

    label_replace(rate(node_disk_bytes_written[30s]), "exported_instance", "$1", "instance", "(.*):.*") and on (device,exported_instance) ceph_disk_occupation{ceph_daemon="osd.0"}

Configuring Prometheus server
=============================

honor_labels
------------

To enable Ceph to output properly-labeled data relating to any host,
use the ``honor_labels`` setting when adding the ceph-mgr endpoints
to your prometheus configuration.

This allows Ceph to export the proper ``instance`` label without prometheus
overwriting it. Without this setting, Prometheus applies an ``instance`` label
that includes the hostname and port of the endpoint that the series came from.
Because Ceph clusters have multiple manager daemons, this results in an
``instance`` label that changes spuriously when the active manager daemon
changes.

If this is undesirable, a custom ``instance`` label can be set in the
Prometheus target configuration: you might wish to set it to the hostname
of your first mgr daemon, or something completely arbitrary like "ceph_cluster".

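A sketch of a static scrape target carrying such a fixed label (the hostname
and the label value are placeholders)::

    - job_name: 'ceph'
      static_configs:
        - targets: [ 'ceph-mgr-host:9283' ]
          labels:
            instance: 'ceph_cluster'
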
28e407b8 176node_exporter hostname labels
3efd9988
FG
177-----------------------------
178
179Set your ``instance`` labels to match what appears in Ceph's OSD metadata
28e407b8 180in the ``instance`` field. This is generally the short hostname of the node.
3efd9988
FG
181
This is only necessary if you want to correlate Ceph stats with host stats,
but you may find it useful to set it everywhere in case you want to do
the correlation in the future.

Example configuration
---------------------

This example shows a single node configuration running ceph-mgr and
node_exporter on a server called ``senta04``. Note that this requires adding the
appropriate instance label to every ``node_exporter`` target individually.

This is just an example: there are other ways to configure prometheus
scrape targets and label rewrite rules.

prometheus.yml
~~~~~~~~~~~~~~

::

    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    scrape_configs:
      - job_name: 'node'
        file_sd_configs:
          - files:
              - node_targets.yml
      - job_name: 'ceph'
        honor_labels: true
        file_sd_configs:
          - files:
              - ceph_targets.yml

ceph_targets.yml
~~~~~~~~~~~~~~~~

::

    [
        {
            "targets": [ "senta04.mydomain.com:9283" ],
            "labels": {}
        }
    ]

node_targets.yml
~~~~~~~~~~~~~~~~

::

    [
        {
            "targets": [ "senta04.mydomain.com:9100" ],
            "labels": {
                "instance": "senta04"
            }
        }
    ]

Notes
=====

Counters and gauges are exported; currently histograms are not, and
long-running averages are exported only as the ``_sum``/``_count`` pairs
described above. It's possible that Ceph's 2-D histograms could be reduced to
two separate 1-D histograms, and that long-running averages could be exported
as Prometheus' Summary type.

Timestamps, as with many Prometheus exporters, are established by
the server's scrape time (Prometheus expects that it is polling the
actual counter process synchronously). It is possible to supply a
timestamp along with the stat report, but the Prometheus team strongly
advises against this. This means that timestamps will be delayed by
an unpredictable amount; it's not clear if this will be problematic,
but it's worth knowing about.