ceph/doc/mgr/prometheus.rst

=================
Prometheus plugin
=================

Provides a Prometheus exporter to pass on Ceph performance counters
from the collection point in ceph-mgr.  Ceph-mgr receives MMgrReport
messages from all MgrClient processes (mons and OSDs, for instance)
with performance counter schema data and actual counter data, and keeps
a circular buffer of the last N samples.  This plugin creates an HTTP
endpoint (like all Prometheus exporters) and retrieves the latest sample
of every counter when polled (or "scraped" in Prometheus terminology).
The HTTP path and query parameters are ignored; all extant counters
for all reporting entities are returned in text exposition format.
(See the Prometheus `documentation <https://prometheus.io/docs/instrumenting/exposition_formats/#text-format-details>`_.)

Enabling prometheus output
==========================

The *prometheus* module is enabled with::

  ceph mgr module enable prometheus

Configuration
-------------

By default the module will accept HTTP requests on port ``9283`` on all
IPv4 and IPv6 addresses on the host.  The port and listen address are both
configurable with ``ceph config-key set``, with keys
``mgr/prometheus/server_addr`` and ``mgr/prometheus/server_port``.
This port is registered with Prometheus's `registry <https://github.com/prometheus/prometheus/wiki/Default-port-allocations>`_.

Statistic names and labels
==========================

The names of the stats are exactly as Ceph names them, with
illegal characters ``.``, ``-`` and ``::`` translated to ``_``,
and ``ceph_`` prefixed to all names.
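
The naming rule above can be sketched in a few lines of Python. This is an
illustration of the rule as described here, not the module's actual code; the
function name ``prometheus_name`` and the sample counter names are assumptions
chosen for the example.

```python
def prometheus_name(ceph_name: str) -> str:
    """Map a Ceph counter name to its exported name (illustrative sketch)."""
    sanitized = ceph_name
    # Each illegal sequence becomes a single underscore. "::" is handled
    # first so that it maps to one "_" as a unit.
    for illegal in ("::", ".", "-"):
        sanitized = sanitized.replace(illegal, "_")
    # Every exported name carries the "ceph_" prefix.
    return "ceph_" + sanitized


# Sample counter names, assumed for illustration only.
for name in ("osd.op_w", "AsyncMessenger::Worker-0.msgr_recv_messages"):
    print(name, "->", prometheus_name(name))
```

With these inputs, ``osd.op_w`` becomes ``ceph_osd_op_w``.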

All *daemon* statistics have a ``ceph_daemon`` label such as "osd.123"
that identifies the type and ID of the daemon they come from.  Some
statistics can come from different types of daemon, so when querying
e.g. an OSD's RocksDB stats, you would probably want to filter
on ``ceph_daemon`` starting with "osd" to avoid mixing in the monitor
RocksDB stats.

The *cluster* statistics (i.e. those global to the Ceph cluster)
have labels appropriate to what they report on.  For example,
metrics relating to pools have a ``pool_id`` label.

Pool and OSD metadata series
----------------------------

Special series are output to enable displaying and querying on
certain metadata fields.

Pools have a ``ceph_pool_metadata`` series like this:

::

    ceph_pool_metadata{pool_id="2",name="cephfs_metadata_a"} 1.0

OSDs have a ``ceph_osd_metadata`` series like this:

::

    ceph_osd_metadata{cluster_addr="172.21.9.34:6802/19096",device_class="ssd",ceph_daemon="osd.0",public_addr="172.21.9.34:6801/19096",weight="1.0"} 1.0
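
To show how these metadata series can be consumed outside of Prometheus, the
following sketch parses lines like the two samples above into a metric name, a
label dictionary and a value. It is a deliberately simplified parser (no
escaped quotes, no timestamps, no comment lines), and the names
``parse_sample`` and ``is_osd`` are assumptions made for the example.

```python
import re

# Simplified patterns: they cover lines shaped like the samples above only.
SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Split one exposition line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError("unsupported sample line: %r" % line)
    name, label_text, value = match.groups()
    return name, dict(LABEL_RE.findall(label_text)), float(value)


def is_osd(labels):
    """Return True if the ceph_daemon label starts with 'osd'."""
    return labels.get("ceph_daemon", "").startswith("osd")


pool_line = 'ceph_pool_metadata{pool_id="2",name="cephfs_metadata_a"} 1.0'
name, labels, value = parse_sample(pool_line)
print(name, labels["pool_id"], labels["name"], value)
```

The ``is_osd`` helper mirrors the filtering advice above: keeping only series
whose ``ceph_daemon`` label starts with "osd" avoids mixing in monitor
statistics.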

Correlating drive statistics with node_exporter
-----------------------------------------------

The Prometheus output from Ceph is designed to be used in conjunction
with the generic host monitoring from the Prometheus node_exporter.

To enable correlation of Ceph OSD statistics with node_exporter's
drive statistics, special series are output like this:

::

    ceph_disk_occupation{ceph_daemon="osd.0",device="sdd",exported_instance="myhost"}

To use this to get disk statistics by OSD ID, use either the ``and`` operator or
the ``*`` operator in your Prometheus query. All metadata metrics (like
``ceph_disk_occupation``) have the value 1, so they are neutral with respect
to ``*``. Using ``*`` allows the use of the ``group_left`` and ``group_right``
grouping modifiers, so that the resulting metric has additional labels from
one side of the query.

See the
`Prometheus documentation`__ for more information about constructing queries.

__ https://prometheus.io/docs/prometheus/latest/querying/basics

The goal is to run a query like:

::

    rate(node_disk_bytes_written[30s]) and on (device,instance) ceph_disk_occupation{ceph_daemon="osd.0"}

Out of the box, the above query will not return any metrics because the
``instance`` labels of the two metrics do not match. The ``instance`` label of
``ceph_disk_occupation`` will be the currently active MGR node.

The following two sections outline two approaches to remedy this.

Use label_replace
=================

The ``label_replace`` function (see the
`label_replace documentation <https://prometheus.io/docs/prometheus/latest/querying/functions/#label_replace>`_)
can add a label to, or alter a label of, a metric within a query.

To correlate an OSD with the write rate of its disk, the following query can be used:

::

    label_replace(rate(node_disk_bytes_written[30s]), "exported_instance", "$1", "instance", "(.*):.*") and on (device,exported_instance) ceph_disk_occupation{ceph_daemon="osd.0"}
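
To make the effect of the query above concrete, the sketch below imitates
``label_replace`` and the ``on (...)`` label matching on plain Python
dictionaries. It is a rough model for illustration, not Prometheus code: the
helper names are assumptions, the ``instance`` value ``myhost:9100`` is an
assumed node_exporter target, and details such as dropping empty labels are
left out. Prometheus anchors the regular expression at both ends, which is why
``re.fullmatch`` is used.

```python
import re


def label_replace(labels, dst_label, replacement, src_label, regex):
    """Rough model of PromQL label_replace() for a single label set."""
    match = re.fullmatch(regex, labels.get(src_label, ""))
    if match is None:
        # No match: the series is returned unchanged.
        return dict(labels)
    # PromQL writes capture groups as $1; Python's re expects \g<1>.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = dict(labels)
    result[dst_label] = match.expand(template)
    return result


def matches_on(left, right, on):
    """True if both label sets agree on every label listed in `on`."""
    return all(left.get(key) == right.get(key) for key in on)


# Assumed label sets, modelled on the series shown in this document.
node_series = {"instance": "myhost:9100", "device": "sdd"}
ceph_series = {"ceph_daemon": "osd.0", "device": "sdd",
               "exported_instance": "myhost"}

relabelled = label_replace(node_series, "exported_instance", "$1",
                           "instance", "(.*):.*")
on_labels = ("device", "exported_instance")
print(matches_on(node_series, ceph_series, on_labels))
print(matches_on(relabelled, ceph_series, on_labels))
```

Before relabelling, the node_exporter series has no ``exported_instance``
label and the two sides do not pair up. After relabelling, the host part of
``instance`` is copied into ``exported_instance`` and the pair matches.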

Configuring Prometheus server
=============================

honor_labels
------------

To enable Ceph to output properly-labelled data relating to any host,
use the ``honor_labels`` setting when adding the ceph-mgr endpoints
to your Prometheus configuration.

This allows Ceph to export the proper ``instance`` label without Prometheus
overwriting it. Without this setting, Prometheus applies an ``instance`` label
that includes the hostname and port of the endpoint that the series came from.
Because Ceph clusters have multiple manager daemons, this results in an
``instance`` label that changes spuriously when the active manager daemon
changes.

node_exporter hostname labels
-----------------------------

Set your ``instance`` labels to match what appears in Ceph's OSD metadata
in the ``instance`` field.  This is generally the short hostname of the node.

This is only necessary if you want to correlate Ceph stats with host stats,
but you may find it useful to do it anyway so that the correlation is
possible in the future.

Example configuration
---------------------

This example shows a single node configuration running ceph-mgr and
node_exporter on a server called ``senta04``. Note that this requires adding
the appropriate ``instance`` label to every ``node_exporter`` target individually.

This is just an example: there are other ways to configure Prometheus
scrape targets and label rewrite rules.

prometheus.yml
~~~~~~~~~~~~~~

::

    global:
      scrape_interval:     15s
      evaluation_interval: 15s

    scrape_configs:
      - job_name: 'node'
        file_sd_configs:
          - files:
            - node_targets.yml
      - job_name: 'ceph'
        honor_labels: true
        file_sd_configs:
          - files:
            - ceph_targets.yml


ceph_targets.yml
~~~~~~~~~~~~~~~~

::

    [
        {
            "targets": [ "senta04.mydomain.com:9283" ],
            "labels": {}
        }
    ]


node_targets.yml
~~~~~~~~~~~~~~~~

::

    [
        {
            "targets": [ "senta04.mydomain.com:9100" ],
            "labels": {
                "instance": "senta04"
            }
        }
    ]
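
Because every ``node_exporter`` target needs its own ``instance`` label, it can
be convenient to generate ``node_targets.yml`` rather than write it by hand.
The following is a minimal sketch under stated assumptions: the function name
``node_targets`` is hypothetical, the short hostname is assumed to be the first
component of the fully qualified name, and port ``9100`` is taken from the
example above.

```python
import json


def node_targets(fqdns, port=9100):
    """Build one file_sd entry per host, labelled with its short hostname."""
    return [
        {
            "targets": ["%s:%d" % (fqdn, port)],
            # Assumption: the short hostname is the part before the first dot.
            "labels": {"instance": fqdn.split(".")[0]},
        }
        for fqdn in fqdns
    ]


# Prints a JSON document with the same structure as node_targets.yml above.
print(json.dumps(node_targets(["senta04.mydomain.com"]), indent=4))
```

Passing more hostnames yields one entry per host, each with its own
``instance`` label.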


Notes
=====

Counters and gauges are exported; currently histograms and long-running
averages are not.  It's possible that Ceph's 2-D histograms could be
reduced to two separate 1-D histograms, and that long-running averages
could be exported as Prometheus' Summary type.

Timestamps, as with many Prometheus exporters, are established by
the server's scrape time (Prometheus expects that it is polling the
actual counter process synchronously).  It is possible to supply a
timestamp along with the stat report, but the Prometheus team strongly
advises against this.  This means that timestamps will be delayed by
an unpredictable amount; it's not clear if this will be problematic,
but it's worth knowing about.