=================
Prometheus plugin
=================

Provides a Prometheus exporter to pass on Ceph performance counters
from the collection point in ceph-mgr. Ceph-mgr receives MMgrReport
messages from all MgrClient processes (mons and OSDs, for instance)
with performance counter schema data and actual counter data, and keeps
a circular buffer of the last N samples. This plugin creates an HTTP
endpoint (like all Prometheus exporters) and retrieves the latest sample
of every counter when polled (or "scraped" in Prometheus terminology).
The HTTP path and query parameters are ignored; all extant counters
for all reporting entities are returned in text exposition format.
(See the Prometheus `documentation <https://prometheus.io/docs/instrumenting/exposition_formats/#text-format-details>`_.)
15 | ||
3efd9988 FG |
16 | Enabling prometheus output |
17 | ========================== | |
c07f9fc5 FG |
18 | |
19 | The *prometheus* module is enabled with:: | |
20 | ||
21 | ceph mgr module enable prometheus | |
22 | ||
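
To confirm that the module is enabled, list the active modules::

    ceph mgr module ls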

Configuration
-------------

By default the module will accept HTTP requests on port ``9283`` on all
IPv4 and IPv6 addresses on the host. The port and listen address are both
configurable with ``ceph config-key set``, with keys
``mgr/prometheus/server_addr`` and ``mgr/prometheus/server_port``.
This port is registered with Prometheus's `registry <https://github.com/prometheus/prometheus/wiki/Default-port-allocations>`_.
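
For example, to set the listen address and port explicitly (the values
shown here are only an illustration)::

    ceph config-key set mgr/prometheus/server_addr 192.168.0.1
    ceph config-key set mgr/prometheus/server_port 9283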

Statistic names and labels
==========================

The names of the stats are exactly as Ceph names them, with
illegal characters ``.``, ``-`` and ``::`` translated to ``_``,
and ``ceph_`` prefixed to all names.
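
For example, a daemon counter that Ceph reports as
``rocksdb.submit_latency`` (the counter name is illustrative) would be
exported as::

    ceph_rocksdb_submit_latency{ceph_daemon="osd.0"} 0.0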


All *daemon* statistics have a ``ceph_daemon`` label such as "osd.123"
that identifies the type and ID of the daemon they come from. Some
statistics can come from different types of daemon, so when querying
e.g. an OSD's RocksDB stats, you would probably want to filter
on ``ceph_daemon`` starting with "osd" to avoid mixing in the monitor
RocksDB stats.
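
For example (reusing the illustrative counter name from above), a query
of this form matches only the OSD series::

    ceph_rocksdb_submit_latency{ceph_daemon=~"osd.*"}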


The *cluster* statistics (i.e. those global to the Ceph cluster)
have labels appropriate to what they report on. For example,
metrics relating to pools have a ``pool_id`` label.

Pool and OSD metadata series
----------------------------

Special series are output to enable displaying and querying on
certain metadata fields.

Pools have a ``ceph_pool_metadata`` field like this:

::

    ceph_pool_metadata{pool_id="2",name="cephfs_metadata_a"} 0.0

OSDs have a ``ceph_osd_metadata`` field like this:

::

    ceph_osd_metadata{cluster_addr="172.21.9.34:6802/19096",device_class="ssd",id="0",public_addr="172.21.9.34:6801/19096",weight="1.0"} 0.0
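
These metadata series can be used to filter other metrics in queries.
For example, assuming a pool statistic named ``ceph_pool_objects`` (the
name is illustrative), this selects a pool's statistics by name rather
than by ID::

    ceph_pool_objects and on (pool_id) ceph_pool_metadata{name="cephfs_metadata_a"}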


Correlating drive statistics with node_exporter
-----------------------------------------------

The prometheus output from Ceph is designed to be used in conjunction
with the generic host monitoring from the Prometheus node_exporter.

To enable correlation of Ceph OSD statistics with node_exporter's
drive statistics, special series are output like this:

::

    ceph_disk_occupation{ceph_daemon="osd.0",device="sdd",instance="myhost",job="ceph"}

To use this to get disk statistics by OSD ID, use the ``and on`` syntax
in your prometheus query like this:

::

    rate(node_disk_bytes_written[30s]) and on (device,instance) ceph_disk_occupation{ceph_daemon="osd.0"}

See the prometheus documentation for more information about constructing
queries.

Note that for this mechanism to work, Ceph and node_exporter must agree
about the values of the ``instance`` label. See the following section
for guidance about how to set up Prometheus in a way that sets
``instance`` properly.

Configuring Prometheus server
=============================

See the prometheus documentation for full details of how to add
scrape endpoints: the notes in this section are tips on how to configure
Prometheus to capture the Ceph statistics in the most usefully-labelled
form.

This configuration is necessary because Ceph is reporting metrics
from many hosts and services via a single endpoint, and because some
metrics relate to no physical host at all (such as pool statistics).

honor_labels
------------

To enable Ceph to output properly-labelled data relating to any host,
use the ``honor_labels`` setting when adding the ceph-mgr endpoints
to your prometheus configuration.

Without this setting, any ``instance`` labels that Ceph outputs, such
as those in ``ceph_disk_occupation`` series, will be overridden
by Prometheus.
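
For example (a minimal sketch; the full configuration appears later in
this document)::

    scrape_configs:
      - job_name: 'ceph'
        honor_labels: true
        static_configs:
          - targets: ['senta04.mydomain.com:9283']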

Ceph instance label
-------------------

By default, Prometheus applies an ``instance`` label that includes
the hostname and port of the endpoint that the series came from. Because
Ceph clusters have multiple manager daemons, this results in an ``instance``
label that changes spuriously when the active manager daemon changes.

Set a custom ``instance`` label in your Prometheus target configuration:
you might wish to set it to the hostname of your first monitor, or something
completely arbitrary like "ceph_cluster".

node_exporter instance labels
-----------------------------

Set your ``instance`` labels to match what appears in Ceph's OSD metadata
in the ``hostname`` field. This is generally the short hostname of the node.
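
To check the hostname Ceph has recorded for an OSD, inspect its metadata
(OSD id 0 is used here as an example)::

    ceph osd metadata 0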

This is only necessary if you want to correlate Ceph stats with host
stats, but you may find it useful to set the labels this way in all
cases, in case you want to do the correlation in the future.

Example configuration
---------------------

This example shows a single node configuration running ceph-mgr and
node_exporter on a server called ``senta04``.

This is just an example: there are other ways to configure prometheus
scrape targets and label rewrite rules.

prometheus.yml
~~~~~~~~~~~~~~

::

    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    scrape_configs:
      - job_name: 'node'
        file_sd_configs:
          - files:
            - node_targets.yml
      - job_name: 'ceph'
        honor_labels: true
        file_sd_configs:
          - files:
            - ceph_targets.yml

ceph_targets.yml
~~~~~~~~~~~~~~~~

::

    [
        {
            "targets": [ "senta04.mydomain.com:9283" ],
            "labels": {
                "instance": "ceph_cluster"
            }
        }
    ]

node_targets.yml
~~~~~~~~~~~~~~~~

::

    [
        {
            "targets": [ "senta04.mydomain.com:9100" ],
            "labels": {
                "instance": "senta04"
            }
        }
    ]

Notes
=====

Counters and gauges are exported; currently histograms and long-running
averages are not. It's possible that Ceph's 2-D histograms could be
reduced to two separate 1-D histograms, and that long-running averages
could be exported as Prometheus' Summary type.

Timestamps, as with many Prometheus exporters, are established by
the server's scrape time (Prometheus expects that it is polling the
actual counter process synchronously). It is possible to supply a
timestamp along with the stat report, but the Prometheus team strongly
advises against this. This means that timestamps will be delayed by
an unpredictable amount; it's not clear if this will be problematic,
but it's worth knowing about.