.. _mgr-prometheus:

=================
Prometheus Module
=================

Provides a Prometheus exporter to pass on Ceph performance counters
from the collection point in ceph-mgr. Ceph-mgr receives MMgrReport
messages from all MgrClient processes (mons and OSDs, for instance)
with performance counter schema data and actual counter data, and keeps
a circular buffer of the last N samples. This module creates an HTTP
endpoint (like all Prometheus exporters) and retrieves the latest sample
of every counter when polled (or "scraped" in Prometheus terminology).
The HTTP path and query parameters are ignored; all extant counters
for all reporting entities are returned in text exposition format.
(See the Prometheus `documentation <https://prometheus.io/docs/instrumenting/exposition_formats/#text-format-details>`_.)

Enabling prometheus output
==========================

The *prometheus* module is enabled with::

    ceph mgr module enable prometheus

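Once enabled, the exporter's output can be spot-checked from the command
line, for example (a minimal sketch assuming the active mgr runs locally
and listens on the default port)::

    curl http://localhost:9283/metrics
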
Configuration
-------------

By default the module will accept HTTP requests on port ``9283`` on all
IPv4 and IPv6 addresses on the host. The port and listen address are both
configurable with ``ceph config-key set``, with keys
``mgr/prometheus/server_addr`` and ``mgr/prometheus/server_port``.
This port is registered with Prometheus's `registry <https://github.com/prometheus/prometheus/wiki/Default-port-allocations>`_.

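As an illustration (the address value below is arbitrary), the listen
address and port can also be set in the same ``ceph config set mgr``
form used for the RBD statistics settings below::

    ceph config set mgr mgr/prometheus/server_addr 192.168.0.12
    ceph config set mgr mgr/prometheus/server_port 9283
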
.. _prometheus-rbd-io-statistics:

RBD IO statistics
-----------------

The module can optionally collect RBD per-image IO statistics by enabling
dynamic OSD performance counters. The statistics are gathered for all images
in the pools that are specified in the ``mgr/prometheus/rbd_stats_pools``
configuration parameter. The parameter is a comma or space separated list
of ``pool[/namespace]`` entries. If the namespace is not specified the
statistics are collected for all namespaces in the pool.

For example, to enable statistics collection for the pools ``pool1``, ``pool2`` and ``poolN``::

    ceph config set mgr mgr/prometheus/rbd_stats_pools "pool1,pool2,poolN"

The module builds the list of all available images by scanning the specified
pools and namespaces and refreshes it periodically. The refresh period is
configurable via the ``mgr/prometheus/rbd_stats_pools_refresh_interval``
parameter (in seconds) and defaults to 300 seconds (5 minutes). The module
will force an earlier refresh if it detects statistics from a previously
unknown RBD image.

For example, to set the refresh interval to 10 minutes::

    ceph config set mgr mgr/prometheus/rbd_stats_pools_refresh_interval 600

Statistic names and labels
==========================

The names of the stats are exactly as Ceph names them, with
illegal characters ``.``, ``-`` and ``::`` translated to ``_``,
and ``ceph_`` prefixed to all names.

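For illustration only (the exact counter set depends on your Ceph version
and on which daemons are reporting), a counter that Ceph names ``osd.op_w``
would appear in the scrape output along these lines::

    ceph_osd_op_w{ceph_daemon="osd.0"} 1234.0
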
All *daemon* statistics have a ``ceph_daemon`` label such as "osd.123"
that identifies the type and ID of the daemon they come from. Some
statistics can come from different types of daemon, so when querying
e.g. an OSD's RocksDB stats, you would probably want to filter
on ``ceph_daemon`` values starting with "osd" to avoid mixing in the
monitor RocksDB stats.

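For example, a selector like the following (shown with an assumed RocksDB
metric name; check your exporter's output for the real names) keeps only
the series reported by OSDs::

    ceph_rocksdb_submit_latency_count{ceph_daemon=~"osd.*"}
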
The *cluster* statistics (i.e. those global to the Ceph cluster)
have labels appropriate to what they report on. For example,
metrics relating to pools have a ``pool_id`` label.

The long running averages that represent the histograms from core Ceph
are represented by a pair of ``<name>_sum`` and ``<name>_count`` metrics.
This is similar to how histograms are represented in `Prometheus <https://prometheus.io/docs/concepts/metric_types/#histogram>`_
and they can also be treated `similarly <https://prometheus.io/docs/practices/histograms/>`_.

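For example, the recent average of such a metric can be derived the same
way as for a native Prometheus histogram (``ceph_osd_op_latency`` is used
here as an assumed name; substitute any ``_sum``/``_count`` pair from your
output)::

    rate(ceph_osd_op_latency_sum[5m]) / rate(ceph_osd_op_latency_count[5m])
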
Pool and OSD metadata series
----------------------------

Special series are output to enable displaying and querying on
certain metadata fields.

Pools have a ``ceph_pool_metadata`` field like this:

::

    ceph_pool_metadata{pool_id="2",name="cephfs_metadata_a"} 1.0

OSDs have a ``ceph_osd_metadata`` field like this:

::

    ceph_osd_metadata{cluster_addr="172.21.9.34:6802/19096",device_class="ssd",ceph_daemon="osd.0",public_addr="172.21.9.34:6801/19096",weight="1.0"} 1.0

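Because these metadata series have the value 1, they can be multiplied onto
other series to attach extra labels. As a sketch (``ceph_pool_stored`` is an
assumed per-pool metric name; any series carrying a ``pool_id`` label works
the same way), the following adds the pool name to a usage metric::

    ceph_pool_stored * on (pool_id) group_left(name) ceph_pool_metadata
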
Correlating drive statistics with node_exporter
-----------------------------------------------

The prometheus output from Ceph is designed to be used in conjunction
with the generic host monitoring from the Prometheus node_exporter.

To enable correlation of Ceph OSD statistics with node_exporter's
drive statistics, special series are output like this:

::

    ceph_disk_occupation{ceph_daemon="osd.0",device="sdd", exported_instance="myhost"}

To use this to get disk statistics by OSD ID, use either the ``and`` operator or
the ``*`` operator in your prometheus query. All metadata metrics (like
``ceph_disk_occupation``) have the value 1 so they act neutrally with ``*``.
Using ``*`` allows the use of the ``group_left`` and ``group_right`` grouping
modifiers, so that the resulting metric has additional labels from one side
of the query.

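A sketch of the ``*`` form with ``group_left``, reusing the metric names from
the examples above (note that, like the ``and`` query below, it only matches
once the ``instance`` labels line up, as described in the following
paragraphs)::

    rate(node_disk_bytes_written[30s]) * on (device, instance) group_left(ceph_daemon) ceph_disk_occupation
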
See the
`prometheus documentation`__ for more information about constructing queries.

__ https://prometheus.io/docs/prometheus/latest/querying/basics

The goal is to run a query like

::

    rate(node_disk_bytes_written[30s]) and on (device,instance) ceph_disk_occupation{ceph_daemon="osd.0"}

Out of the box the above query will not return any metrics since the ``instance`` labels of
the two metrics don't match. The ``instance`` label of ``ceph_disk_occupation``
will be the currently active MGR node.

The following two sections outline two approaches to remedy this.

Use label_replace
=================

The ``label_replace`` function (see the
`label_replace documentation <https://prometheus.io/docs/prometheus/latest/querying/functions/#label_replace>`_)
can add a label to, or alter a label of, a metric within a query.

To correlate an OSD and its disk's write rate, the following query can be used:

::

    label_replace(rate(node_disk_bytes_written[30s]), "exported_instance", "$1", "instance", "(.*):.*") and on (device,exported_instance) ceph_disk_occupation{ceph_daemon="osd.0"}

Configuring Prometheus server
=============================

honor_labels
------------

To enable Ceph to output properly-labeled data relating to any host,
use the ``honor_labels`` setting when adding the ceph-mgr endpoints
to your Prometheus configuration.

This allows Ceph to export the proper ``instance`` label without Prometheus
overwriting it. Without this setting, Prometheus applies an ``instance`` label
that includes the hostname and port of the endpoint that the series came from.
Because Ceph clusters have multiple manager daemons, this results in an
``instance`` label that changes spuriously when the active manager daemon
changes.

If this is undesirable, a custom ``instance`` label can be set in the
Prometheus target configuration: you might wish to set it to the hostname
of your first mgr daemon, or something completely arbitrary like "ceph_cluster".

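As a sketch of the latter, in a file-based service discovery target list like
the ``ceph_targets.yml`` example below, a fixed ``instance`` label could be
attached as follows (hostname and label value are illustrative)::

    [
        {
            "targets": [ "senta04.mydomain.com:9283" ],
            "labels": { "instance": "ceph_cluster" }
        }
    ]
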
node_exporter hostname labels
-----------------------------

Set your ``instance`` labels to match what appears in Ceph's OSD metadata
in the ``instance`` field. This is generally the short hostname of the node.

This is only necessary if you want to correlate Ceph stats with host stats,
but you may find it useful to do so anyway, in case you want to do
the correlation in the future.

Example configuration
---------------------

This example shows a single node configuration running ceph-mgr and
node_exporter on a server called ``senta04``. Note that this requires adding
the appropriate ``instance`` label to every ``node_exporter`` target individually.

This is just an example: there are other ways to configure Prometheus
scrape targets and label rewrite rules.

prometheus.yml
~~~~~~~~~~~~~~

::

    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    scrape_configs:
      - job_name: 'node'
        file_sd_configs:
          - files:
            - node_targets.yml
      - job_name: 'ceph'
        honor_labels: true
        file_sd_configs:
          - files:
            - ceph_targets.yml


ceph_targets.yml
~~~~~~~~~~~~~~~~

::

    [
        {
            "targets": [ "senta04.mydomain.com:9283" ],
            "labels": {}
        }
    ]


node_targets.yml
~~~~~~~~~~~~~~~~

::

    [
        {
            "targets": [ "senta04.mydomain.com:9100" ],
            "labels": {
                "instance": "senta04"
            }
        }
    ]


Notes
=====

Counters and gauges are exported; currently histograms and long-running
averages are not. It's possible that Ceph's 2-D histograms could be
reduced to two separate 1-D histograms, and that long-running averages
could be exported as Prometheus' Summary type.

Timestamps, as with many Prometheus exporters, are established by
the server's scrape time (Prometheus expects that it is polling the
actual counter process synchronously). It is possible to supply a
timestamp along with the stat report, but the Prometheus team strongly
advises against this. This means that timestamps will be delayed by
an unpredictable amount; it's not clear if this will be problematic,
but it's worth knowing about.