.. _mgr-cephadm-monitoring:

Monitoring Services
===================

Ceph Dashboard uses `Prometheus <https://prometheus.io/>`_, `Grafana
<https://grafana.com/>`_, and related tools to store and visualize detailed
metrics on cluster utilization and performance. Ceph users have three options:

#. Have cephadm deploy and configure these services. This is the default
   when bootstrapping a new cluster unless the ``--skip-monitoring-stack``
   option is used.
#. Deploy and configure these services manually. This is recommended for users
   with existing Prometheus services in their environment (and in cases where
   Ceph is running in Kubernetes with Rook).
#. Skip the monitoring stack completely. Some Ceph dashboard graphs will
   not be available.

The monitoring stack consists of `Prometheus <https://prometheus.io/>`_,
Prometheus exporters (:ref:`mgr-prometheus`, `Node exporter
<https://prometheus.io/docs/guides/node-exporter/>`_), `Prometheus Alert
Manager <https://prometheus.io/docs/alerting/alertmanager/>`_ and `Grafana
<https://grafana.com/>`_.

.. note::

   Prometheus' security model presumes that untrusted users have access to the
   Prometheus HTTP endpoint and logs. Untrusted users have access to all the
   (meta)data Prometheus collects that is contained in the database, plus a
   variety of operational and debugging information.

   However, Prometheus' HTTP API is limited to read-only operations.
   Configurations can *not* be changed using the API and secrets are not
   exposed. Moreover, Prometheus has some built-in measures to mitigate the
   impact of denial of service attacks.

   Please see `Prometheus' Security model
   <https://prometheus.io/docs/operating/security/>`_ for more detailed
   information.

Deploying monitoring with cephadm
---------------------------------

The default behavior of ``cephadm`` is to deploy a basic monitoring stack. It
is possible, however, to have a Ceph cluster without a monitoring stack and to
want to add one. For example, you might have passed the
``--skip-monitoring-stack`` option to ``cephadm`` during the installation of
the cluster, or you might have converted an existing cluster (which had no
monitoring stack) to cephadm management.

To set up monitoring on a Ceph cluster that has no monitoring, follow the
steps below:

b3b6e05e
TL
55#. Enable the Prometheus module in the ceph-mgr daemon. This exposes the internal Ceph metrics so that Prometheus can scrape them:
56
57 .. prompt:: bash #
9f95a23c
TL
58
59 ceph mgr module enable prometheus
60
b3b6e05e 61#. Deploy a node-exporter service on every node of the cluster. The node-exporter provides host-level metrics like CPU and memory utilization:
f91f0fd5 62
b3b6e05e 63 .. prompt:: bash #
9f95a23c
TL
64
65 ceph orch apply node-exporter '*'
66
b3b6e05e 67#. Deploy alertmanager:
f91f0fd5 68
b3b6e05e 69 .. prompt:: bash #
9f95a23c
TL
70
71 ceph orch apply alertmanager 1
72
b3b6e05e
TL
73#. Deploy Prometheus. A single Prometheus instance is sufficient, but
74 for high availablility (HA) you might want to deploy two:
75
76 .. prompt:: bash #
77
78 ceph orch apply prometheus 1
f91f0fd5 79
b3b6e05e 80 or
9f95a23c 81
b3b6e05e
TL
82 .. prompt:: bash #
83
84 ceph orch apply prometheus 2
9f95a23c 85
b3b6e05e 86#. Deploy grafana:
f91f0fd5 87
b3b6e05e 88 .. prompt:: bash #
9f95a23c
TL
89
90 ceph orch apply grafana 1
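
The same set of services can also be described declaratively. The following
is a minimal sketch, assuming that cephadm accepts a multi-document YAML
service specification through ``ceph orch apply -i``. The function name
``write_monitoring_spec`` and the default file name ``monitoring.yaml`` are
illustrative and are not part of cephadm. The placements mirror the commands
in the steps above.

.. code-block:: bash

   # Hypothetical helper: write a service specification that mirrors the
   # "ceph orch apply" commands above (node-exporter on every host, one
   # instance each of alertmanager, prometheus, and grafana).
   write_monitoring_spec() {
       spec_file="${1:-monitoring.yaml}"
       printf '%s\n' \
           "service_type: node-exporter" \
           "placement:" \
           "  host_pattern: '*'" \
           "---" \
           "service_type: alertmanager" \
           "placement:" \
           "  count: 1" \
           "---" \
           "service_type: prometheus" \
           "placement:" \
           "  count: 1" \
           "---" \
           "service_type: grafana" \
           "placement:" \
           "  count: 1" > "$spec_file"
   }

After calling ``write_monitoring_spec``, running ``ceph orch apply -i
monitoring.yaml`` as root should request the same services as the individual
commands. The Prometheus manager module still has to be enabled separately, as
in the first step.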

Manually setting the Grafana URL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Cephadm automatically configures Prometheus, Grafana, and Alertmanager in
all cases except one.

In some setups, the Dashboard user's browser might not be able to access the
Grafana URL that is configured in Ceph Dashboard. This can happen when the
cluster and the accessing user are in different DNS zones.

If this is the case, you can use a configuration option for Ceph Dashboard
to set the URL that the user's browser will use to access Grafana. This
value will never be altered by cephadm. To set this configuration option,
issue the following command:

   .. prompt:: bash $

     ceph dashboard set-grafana-frontend-api-url <grafana-server-api>

It might take a minute or two for services to be deployed. After the
services have been deployed, you should see something like this when you issue
the command ``ceph orch ls``:

.. code-block:: console

   $ ceph orch ls
   NAME           RUNNING  REFRESHED  IMAGE NAME                                      IMAGE ID      SPEC
   alertmanager   1/1      6s ago     docker.io/prom/alertmanager:latest              0881eb8f169f  present
   crash          2/2      6s ago     docker.io/ceph/daemon-base:latest-master-devel  mix           present
   grafana        1/1      0s ago     docker.io/pcuzner/ceph-grafana-el8:latest       f77afcf0bcf6  absent
   node-exporter  2/2      6s ago     docker.io/prom/node-exporter:latest             e5a616e4b9cf  present
   prometheus     1/1      6s ago     docker.io/prom/prometheus:latest                e935122ab143  present
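
To wait for the deployment to settle, the ``RUNNING`` column can be checked
from a script. The following is a minimal sketch, assuming that the output has
the column layout shown above (service name first, ``running/expected``
second). The function name ``services_not_ready`` is illustrative, and other
releases may format the output differently.

.. code-block:: bash

   # Hypothetical helper: read "ceph orch ls" output on standard input and
   # print the name of every service whose RUNNING column (for example
   # "1/2") shows fewer running daemons than expected.
   # Lines whose second field is not of the form N/M (the prompt line and
   # the header) are ignored.
   services_not_ready() {
       awk '$2 ~ /^[0-9]+\/[0-9]+$/ {
                split($2, n, "/")
                if (n[1] + 0 < n[2] + 0) print $1
            }'
   }

For example, ``ceph orch ls | services_not_ready`` prints nothing once every
listed service has all of its daemons running.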

Configuring SSL/TLS for Grafana
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``cephadm`` deploys Grafana using the certificate defined in the Ceph
key/value store. If no certificate is specified, ``cephadm`` generates a
self-signed certificate during the deployment of the Grafana service.

A custom certificate can be configured using the following commands:

.. prompt:: bash #

  ceph config-key set mgr/cephadm/grafana_key -i $PWD/key.pem
  ceph config-key set mgr/cephadm/grafana_crt -i $PWD/certificate.pem

If you have already deployed Grafana, run ``reconfig`` on the service to
update its configuration:

.. prompt:: bash #

  ceph orch reconfig grafana

The ``reconfig`` command also sets the proper URL for Ceph Dashboard.

Networks and Ports
~~~~~~~~~~~~~~~~~~

The network and port that each monitoring service binds to can be configured
with a YAML service specification.

Example spec file:

.. code-block:: yaml

    service_type: grafana
    service_name: grafana
    placement:
      count: 1
    networks:
    - 192.169.142.0/24
    spec:
      port: 4200

Using custom images
~~~~~~~~~~~~~~~~~~~

It is possible to install or upgrade monitoring components based on other
images. To do so, the name of the image to be used needs to be stored in the
configuration first. The following configuration options are available:

- ``container_image_prometheus``
- ``container_image_grafana``
- ``container_image_alertmanager``
- ``container_image_node_exporter``

Custom images can be set with the ``ceph config`` command:

.. code-block:: bash

     ceph config set mgr mgr/cephadm/<option_name> <value>

For example:

.. code-block:: bash

     ceph config set mgr mgr/cephadm/container_image_prometheus prom/prometheus:v1.4.1

If monitoring stack daemons of the type whose image you have changed are
already running, you must redeploy them so that they actually use the new
image.

For example, if you have changed the Prometheus image:

.. prompt:: bash #

     ceph orch redeploy prometheus
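
The option names follow the service names, with hyphens replaced by
underscores. The following is a minimal sketch of that mapping. The function
names ``image_option`` and ``print_image_switch`` are illustrative and are not
part of Ceph, and the sketch only prints the commands so that they can be
reviewed before being run.

.. code-block:: bash

   # Hypothetical helper: map a monitoring service name to the cephadm
   # option that stores its container image. Hyphens become underscores,
   # so "node-exporter" maps to "container_image_node_exporter".
   image_option() {
       printf 'mgr/cephadm/container_image_%s\n' \
           "$(printf '%s' "$1" | tr '-' '_')"
   }

   # Print (but do not run) the two commands needed to switch a service
   # to a custom image: set the option, then redeploy the service.
   print_image_switch() {
       service="$1"
       image="$2"
       printf 'ceph config set mgr %s %s\n' "$(image_option "$service")" "$image"
       printf 'ceph orch redeploy %s\n' "$service"
   }

For example, ``print_image_switch prometheus prom/prometheus:v1.4.1`` prints
the ``ceph config set`` and ``ceph orch redeploy`` commands shown above.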


.. note::

     Setting a custom image overrides the default value but does not
     overwrite it. The default value changes when updates become available.
     Once a custom image is set, the component it applies to is no longer
     updated automatically. To install updates, you must update the
     configuration (image name and tag) manually.

     If you later decide to return to the default image, reset the custom
     image that you set. The default value will then be used again. Use
     ``ceph config rm`` to reset the configuration option:

     .. code-block:: bash

          ceph config rm mgr mgr/cephadm/<option_name>

     For example:

     .. code-block:: bash

          ceph config rm mgr mgr/cephadm/container_image_prometheus

Using custom configuration files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By overriding cephadm templates, it is possible to completely customize the
configuration files for monitoring services.

Internally, cephadm uses `Jinja2
<https://jinja.palletsprojects.com/en/2.11.x/>`_ templates to generate the
configuration files for all monitoring components. To customize the
configuration of Prometheus, Grafana, or Alertmanager, you can store a Jinja2
template for each service; cephadm then uses that template instead of its own
when it generates the configuration. The template is evaluated every time a
service of that kind is deployed or reconfigured. That way, the custom
configuration is preserved and automatically applied on future deployments of
these services.

.. note::

  The configuration of the custom template is also preserved when the default
  configuration of cephadm changes. If the updated configuration is to be used,
  the custom template needs to be migrated *manually*.

Option names
""""""""""""

The following templates for files that will be generated by cephadm can be
overridden. These are the names to be used when storing with ``ceph config-key
set``:

- ``services/alertmanager/alertmanager.yml``
- ``services/grafana/ceph-dashboard.yml``
- ``services/grafana/grafana.ini``
- ``services/prometheus/prometheus.yml``

You can look up the file templates that are currently used by cephadm in
``src/pybind/mgr/cephadm/templates``:

- ``services/alertmanager/alertmanager.yml.j2``
- ``services/grafana/ceph-dashboard.yml.j2``
- ``services/grafana/grafana.ini.j2``
- ``services/prometheus/prometheus.yml.j2``

Usage
"""""

The following command applies a single-line value:

.. code-block:: bash

  ceph config-key set mgr/cephadm/<option_name> <value>

To set the contents of a file as the template, use the ``-i`` argument:

.. code-block:: bash

  ceph config-key set mgr/cephadm/<option_name> -i $PWD/<filename>

.. note::

  When using files as input to ``config-key``, an absolute path to the file
  must be used.


Then the configuration file for the service needs to be recreated.
This is done using ``reconfig``. For more details see the following example.

Example
"""""""

.. code-block:: bash

  # set the contents of ./prometheus.yml.j2 as template
  ceph config-key set mgr/cephadm/services/prometheus/prometheus.yml \
    -i $PWD/prometheus.yml.j2

  # reconfig the prometheus service
  ceph orch reconfig prometheus
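
Because ``config-key`` needs an absolute path, a script that accepts relative
file names has to resolve them first. The following is a minimal sketch of one
way to do that. The function names ``abs_path`` and
``print_template_override`` are illustrative and are not part of Ceph, and the
sketch only prints the resulting commands instead of running them.

.. code-block:: bash

   # Hypothetical helper: turn a possibly relative file name into an
   # absolute path. The directory that contains the file must exist.
   abs_path() {
       dir="$(cd "$(dirname "$1")" && pwd)" || return 1
       printf '%s/%s\n' "$dir" "$(basename "$1")"
   }

   # Print (but do not run) the commands that store a template and
   # regenerate the configuration of the corresponding service.
   #   $1: service name (for example "prometheus")
   #   $2: option file name (for example "prometheus.yml")
   #   $3: path to the Jinja2 template, relative or absolute
   print_template_override() {
       template="$(abs_path "$3")" || return 1
       printf 'ceph config-key set mgr/cephadm/services/%s/%s -i %s\n' \
           "$1" "$2" "$template"
       printf 'ceph orch reconfig %s\n' "$1"
   }

For example, ``print_template_override prometheus prometheus.yml
prometheus.yml.j2`` prints the two commands from the example above, with the
template path expanded relative to the current directory.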

Disabling monitoring
--------------------

To disable monitoring and remove the software that supports it, run the following commands:

.. code-block:: console

  $ ceph orch rm grafana
  $ ceph orch rm prometheus --force   # this will delete metrics data collected so far
  $ ceph orch rm node-exporter
  $ ceph orch rm alertmanager
  $ ceph mgr module disable prometheus


Deploying monitoring manually
-----------------------------

If you have an existing Prometheus monitoring infrastructure, or would like
to manage it yourself, you need to configure it to integrate with your Ceph
cluster.

* Enable the Prometheus module in the ceph-mgr daemon:

  .. code-block:: bash

     ceph mgr module enable prometheus

  By default, ceph-mgr presents Prometheus metrics on port 9283 on each host
  running a ceph-mgr daemon. Configure Prometheus to scrape these endpoints.

* To enable the dashboard's Prometheus-based alerting, see :ref:`dashboard-alerting`.

* To enable dashboard integration with Grafana, see :ref:`dashboard-grafana`.
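
For the first item above, the scrape targets are the ceph-mgr hosts on port
9283. The following is a minimal sketch that prints a scrape job for an
existing Prometheus configuration. The function name ``ceph_scrape_job`` and
the job name ``ceph`` are illustrative choices, and the ``honor_labels``
setting is included on the assumption that the labels exported by ceph-mgr
should be kept as they are.

.. code-block:: bash

   # Hypothetical helper: print a Prometheus scrape job for the ceph-mgr
   # metrics endpoints. Each argument is a host that runs a ceph-mgr
   # daemon; 9283 is the default port mentioned above.
   ceph_scrape_job() {
       printf '%s\n' \
           "scrape_configs:" \
           "  - job_name: 'ceph'" \
           "    honor_labels: true" \
           "    static_configs:" \
           "      - targets:"
       for host in "$@"; do
           printf "          - '%s:9283'\n" "$host"
       done
   }

The output of, for example, ``ceph_scrape_job mgr1 mgr2`` is meant to be merged
by hand into the ``scrape_configs`` section of an existing ``prometheus.yml``.
Whether an endpoint is reachable can be checked with ``curl
http://<host>:9283/metrics``.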

Enabling RBD-Image monitoring
-----------------------------

For performance reasons, monitoring of RBD images is disabled by default. For
more information, see :ref:`prometheus-rbd-io-statistics`. While it is
disabled, the overview and details dashboards stay empty in Grafana and the
metrics are not visible in Prometheus.
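
The following is a minimal sketch of how RBD image monitoring might be enabled
for selected pools. It assumes that the Prometheus manager module reads a
comma-separated list of pool names from the option
``mgr/prometheus/rbd_stats_pools``; verify this against
:ref:`prometheus-rbd-io-statistics` before use. The function name
``print_rbd_stats_command`` is illustrative, and the sketch only prints the
command instead of running it.

.. code-block:: bash

   # Hypothetical helper: print (but do not run) the command that enables
   # RBD image statistics for the pools given as arguments.
   # Assumption: the pools are stored as a comma-separated list in
   # "mgr/prometheus/rbd_stats_pools".
   print_rbd_stats_command() {
       # Join all arguments with commas inside a subshell so that the
       # caller's IFS is left untouched.
       pools="$(IFS=,; printf '%s' "$*")"
       printf 'ceph config set mgr mgr/prometheus/rbd_stats_pools "%s"\n' "$pools"
   }

For example, ``print_rbd_stats_command rbd vms`` prints a ``ceph config set``
command that names the pools ``rbd`` and ``vms``; the pool names are
placeholders.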