.. _mgr-cephadm-monitoring:

Monitoring Services
===================

Ceph Dashboard uses `Prometheus <https://prometheus.io/>`_, `Grafana
<https://grafana.com/>`_, and related tools to store and visualize detailed
metrics on cluster utilization and performance. Ceph users have three options:

#. Have cephadm deploy and configure these services. This is the default
   when bootstrapping a new cluster unless the ``--skip-monitoring-stack``
   option is used.
#. Deploy and configure these services manually. This is recommended for users
   with existing prometheus services in their environment (and in cases where
   Ceph is running in Kubernetes with Rook).
#. Skip the monitoring stack completely. Some Ceph dashboard graphs will
   not be available.

The monitoring stack consists of `Prometheus <https://prometheus.io/>`_,
Prometheus exporters (:ref:`mgr-prometheus`, `Node exporter
<https://prometheus.io/docs/guides/node-exporter/>`_), `Prometheus Alert
Manager <https://prometheus.io/docs/alerting/alertmanager/>`_ and `Grafana
<https://grafana.com/>`_.

.. note::

   Prometheus' security model presumes that untrusted users have access to the
   Prometheus HTTP endpoint and logs. Untrusted users have access to all the
   (meta)data Prometheus collects that is contained in the database, plus a
   variety of operational and debugging information.

   However, Prometheus' HTTP API is limited to read-only operations.
   Configurations can *not* be changed using the API and secrets are not
   exposed. Moreover, Prometheus has some built-in measures to mitigate the
   impact of denial of service attacks.

   Please see `Prometheus' Security model
   <https://prometheus.io/docs/operating/security/>`_ for more detailed
   information.

Deploying monitoring with cephadm
---------------------------------

The default behavior of ``cephadm`` is to deploy a basic monitoring stack. It
is however possible that you have a Ceph cluster without a monitoring stack,
and you would like to add a monitoring stack to it. (Here are some ways that
you might have come to have a Ceph cluster without a monitoring stack: You
might have passed the ``--skip-monitoring-stack`` option to ``cephadm`` during
the installation of the cluster, or you might have converted an existing
cluster (which had no monitoring stack) to cephadm management.)

To set up monitoring on a Ceph cluster that has no monitoring, follow the
steps below:

#. Deploy a node-exporter service on every node of the cluster. The
   node-exporter provides host-level metrics like CPU and memory utilization:

   .. prompt:: bash #

      ceph orch apply node-exporter

#. Deploy alertmanager:

   .. prompt:: bash #

      ceph orch apply alertmanager

#. Deploy Prometheus. A single Prometheus instance is sufficient, but
   for high availability (HA) you might want to deploy two:

   .. prompt:: bash #

      ceph orch apply prometheus

   or

   .. prompt:: bash #

      ceph orch apply prometheus --placement 'count:2'

#. Deploy grafana:

   .. prompt:: bash #

      ceph orch apply grafana

.. _cephadm-monitoring-networks-ports:

Networks and Ports
~~~~~~~~~~~~~~~~~~

All monitoring services can have the network and port they bind to configured
with a yaml service specification.

Example spec file:

.. code-block:: yaml

    service_type: grafana
    service_name: grafana
    placement:
      count: 1
    networks:
    - 192.169.142.0/24
    spec:
      port: 4200
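
A spec like this can be saved to a file and applied with ``ceph orch apply``
(the file name used here is just a placeholder):

.. prompt:: bash #

   ceph orch apply -i monitoring-spec.yaml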

Using custom images
~~~~~~~~~~~~~~~~~~~

It is possible to install or upgrade monitoring components based on other
images. To do so, the name of the image to be used needs to be stored in the
configuration first. The following configuration options are available.

- ``container_image_prometheus``
- ``container_image_grafana``
- ``container_image_alertmanager``
- ``container_image_node_exporter``

Custom images can be set with the ``ceph config`` command:

.. code-block:: bash

     ceph config set mgr mgr/cephadm/<option_name> <value>

For example:

.. code-block:: bash

     ceph config set mgr mgr/cephadm/container_image_prometheus prom/prometheus:v1.4.1

If any monitoring stack daemons of the type whose image you have changed are
already running, you must redeploy them in order to have them actually use the
new image.

For example, if you had changed the Prometheus image:

.. prompt:: bash #

   ceph orch redeploy prometheus

.. note::

   Setting a custom image overrides the default value (but does not
   overwrite it). The default value changes when updates become available.
   If you set a custom image, the component you set it for will not be
   updated automatically. You will need to update the configuration
   (image name and tag) manually to be able to install updates.

   If you want to return to the default image instead, you can reset the
   custom image you set before. After that, the default value will be
   used again. Use ``ceph config rm`` to reset the configuration option:

   .. code-block:: bash

        ceph config rm mgr mgr/cephadm/<option_name>

   For example:

   .. code-block:: bash

        ceph config rm mgr mgr/cephadm/container_image_prometheus

.. _cephadm-overwrite-jinja2-templates:

Using custom configuration files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By overriding cephadm templates, it is possible to completely customize the
configuration files for monitoring services.

Internally, cephadm already uses `Jinja2
<https://jinja.palletsprojects.com/en/2.11.x/>`_ templates to generate the
configuration files for all monitoring components. To customize the
configuration of Prometheus, Grafana or the Alertmanager, you can store a
Jinja2 template for each service that will be used for configuration
generation instead. This template will be evaluated every time a service of
that kind is deployed or reconfigured. That way, the custom configuration is
preserved and automatically applied on future deployments of these services.

.. note::

   The configuration of the custom template is also preserved when the default
   configuration of cephadm changes. If the updated configuration is to be used,
   the custom template needs to be migrated *manually* after each upgrade of Ceph.

Option names
""""""""""""

The following templates for files that will be generated by cephadm can be
overridden. These are the names to use when storing them with ``ceph
config-key set``:

- ``services/alertmanager/alertmanager.yml``
- ``services/grafana/ceph-dashboard.yml``
- ``services/grafana/grafana.ini``
- ``services/prometheus/prometheus.yml``

You can look up the file templates that are currently used by cephadm in
``src/pybind/mgr/cephadm/templates``:

- ``services/alertmanager/alertmanager.yml.j2``
- ``services/grafana/ceph-dashboard.yml.j2``
- ``services/grafana/grafana.ini.j2``
- ``services/prometheus/prometheus.yml.j2``

Usage
"""""

The following command applies a single-line value:

.. code-block:: bash

  ceph config-key set mgr/cephadm/<option_name> <value>

To set the contents of a file as the template, use the ``-i`` argument:

.. code-block:: bash

  ceph config-key set mgr/cephadm/<option_name> -i $PWD/<filename>

.. note::

   When using files as input to ``config-key``, an absolute path to the file
   must be used.

Then the configuration file for the service needs to be recreated.
This is done using ``reconfig``. For more details see the following example.

Example
"""""""

.. code-block:: bash

  # set the contents of ./prometheus.yml.j2 as template
  ceph config-key set mgr/cephadm/services/prometheus/prometheus.yml \
    -i $PWD/prometheus.yml.j2

  # reconfig the prometheus service
  ceph orch reconfig prometheus
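
Keep in mind that the stored template is used *instead of* the default one, so
the generated configuration comes entirely from your template; for Prometheus
this means it must also contain the scrape jobs for the Ceph exporters. As a
purely illustrative sketch, a custom template might tweak the global settings
and add one extra static scrape job; the job name and target below are
placeholders and this is not based on the template shipped with cephadm:

.. code-block:: yaml

  # hypothetical excerpt of a custom prometheus.yml.j2
  global:
    scrape_interval: 15s
    evaluation_interval: 15s

  scrape_configs:
    # additional scrape job for an external exporter (placeholder target)
    - job_name: 'my-external-exporter'
      static_configs:
        - targets: ['exporter.example.com:9100']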

Deploying monitoring without cephadm
------------------------------------

If you have an existing Prometheus monitoring infrastructure, or would like
to manage it yourself, you need to configure it to integrate with your Ceph
cluster.

* Enable the prometheus module in the ceph-mgr daemon:

  .. code-block:: bash

     ceph mgr module enable prometheus

  By default, ceph-mgr presents Prometheus metrics on port 9283 on each host
  running a ceph-mgr daemon. Configure Prometheus to scrape these endpoints;
  a sketch of such a scrape configuration follows this list.

* To enable the dashboard's Prometheus-based alerting, see :ref:`dashboard-alerting`.

* To enable dashboard integration with Grafana, see :ref:`dashboard-grafana`.
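
A minimal, hypothetical ``scrape_configs`` entry for an external Prometheus
could look like the following. The job name and mgr host names are
placeholders; 9283 is the default port of the prometheus manager module
mentioned above:

.. code-block:: yaml

  scrape_configs:
    # scrape the prometheus module on every host running a ceph-mgr daemon
    - job_name: 'ceph'
      honor_labels: true   # keep the labels exported by ceph-mgr
      static_configs:
        - targets:
          - 'mgr-host-1.example.com:9283'
          - 'mgr-host-2.example.com:9283'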

Disabling monitoring
--------------------

To disable monitoring and remove the software that supports it, run the
following commands:

.. code-block:: console

  $ ceph orch rm grafana
  $ ceph orch rm prometheus --force   # this will delete metrics data collected so far
  $ ceph orch rm node-exporter
  $ ceph orch rm alertmanager
  $ ceph mgr module disable prometheus

See also :ref:`orch-rm`.

Setting up RBD-Image monitoring
-------------------------------

For performance reasons, monitoring of RBD images is disabled by default. For
more information please see :ref:`prometheus-rbd-io-statistics`. If disabled,
the overview and details dashboards will stay empty in Grafana and the metrics
will not be visible in Prometheus.
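
To enable it, tell the prometheus manager module which pools to gather RBD IO
statistics for. A hedged example, where the pool name ``rbd`` is a placeholder
(see :ref:`prometheus-rbd-io-statistics` for the full syntax):

.. prompt:: bash #

   ceph config set mgr mgr/prometheus/rbd_stats_pools "rbd"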

Setting up Grafana
------------------

Manually setting the Grafana URL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Cephadm automatically configures Prometheus, Grafana, and Alertmanager in
all cases except one.

In some setups, the Dashboard user's browser might not be able to access the
Grafana URL that is configured in Ceph Dashboard. This can happen when the
cluster and the accessing user are in different DNS zones.

If this is the case, you can use a configuration option for Ceph Dashboard
to set the URL that the user's browser will use to access Grafana. This
value will never be altered by cephadm. To set this configuration option,
issue the following command:

.. prompt:: bash $

   ceph dashboard set-grafana-frontend-api-url <grafana-server-api>
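
For example, assuming Grafana is reachable by the user's browser at
``grafana.example.com`` (a placeholder host name) on Grafana's default port
3000, the command might look like this:

.. prompt:: bash $

   ceph dashboard set-grafana-frontend-api-url https://grafana.example.com:3000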

It might take a minute or two for services to be deployed. After the services
have been deployed, you should see something like this when you issue the
command ``ceph orch ls``:

.. code-block:: console

  $ ceph orch ls
  NAME           RUNNING  REFRESHED  IMAGE NAME                                       IMAGE ID      SPEC
  alertmanager   1/1      6s ago     docker.io/prom/alertmanager:latest               0881eb8f169f  present
  crash          2/2      6s ago     docker.io/ceph/daemon-base:latest-master-devel   mix           present
  grafana        1/1      0s ago     docker.io/pcuzner/ceph-grafana-el8:latest        f77afcf0bcf6  absent
  node-exporter  2/2      6s ago     docker.io/prom/node-exporter:latest              e5a616e4b9cf  present
  prometheus     1/1      6s ago     docker.io/prom/prometheus:latest                 e935122ab143  present

Configuring SSL/TLS for Grafana
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``cephadm`` deploys Grafana using the certificate defined in the ceph
key/value store. If no certificate is specified, ``cephadm`` generates a
self-signed certificate during the deployment of the Grafana service.

A custom certificate can be configured using the following commands:

.. prompt:: bash #

   ceph config-key set mgr/cephadm/grafana_key -i $PWD/key.pem
   ceph config-key set mgr/cephadm/grafana_crt -i $PWD/certificate.pem
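
If you first need a certificate for testing, a self-signed one can be
generated with ``openssl``. This is just a sketch; the subject name is a
placeholder:

.. prompt:: bash #

   openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
     -keyout key.pem -out certificate.pem \
     -subj "/CN=grafana.example.com"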

If you have already deployed Grafana, run ``reconfig`` on the service to
update its configuration:

.. prompt:: bash #

   ceph orch reconfig grafana

The ``reconfig`` command also sets the proper URL for Ceph Dashboard.

Setting the initial admin password
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, Grafana will not create an initial admin user. In order to create
the admin user, please create a file ``grafana.yaml`` with this content:

.. code-block:: yaml

  service_type: grafana
  spec:
    initial_admin_password: mypassword

Then apply this specification:

.. code-block:: bash

  ceph orch apply -i grafana.yaml
  ceph orch redeploy grafana

Grafana will now create an admin user called ``admin`` with the
given password.

Setting up Alertmanager
-----------------------

Adding Alertmanager webhooks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To add new webhooks to the Alertmanager configuration, add additional
webhook URLs like so:

.. code-block:: yaml

  service_type: alertmanager
  spec:
    user_data:
      default_webhook_urls:
      - "https://foo"
      - "https://bar"

``default_webhook_urls`` is a list of additional URLs that are added to the
default receivers' ``<webhook_configs>`` configuration.

Run ``reconfig`` on the service to update its configuration:

.. prompt:: bash #

   ceph orch reconfig alertmanager

Further Reading
---------------

* :ref:`mgr-prometheus`