Monitoring Stack with Cephadm
=============================

Ceph Dashboard uses `Prometheus <https://prometheus.io/>`_, `Grafana
<https://grafana.com/>`_, and related tools to store and visualize detailed
metrics on cluster utilization and performance. Ceph users have three options:

#. Have cephadm deploy and configure these services. This is the default
   when bootstrapping a new cluster unless the ``--skip-monitoring-stack``
   option is used.
#. Deploy and configure these services manually. This is recommended for users
   with existing Prometheus services in their environment (and in cases where
   Ceph is running in Kubernetes with Rook).
#. Skip the monitoring stack completely. Some Ceph dashboard graphs will
   not be available.

The monitoring stack consists of `Prometheus <https://prometheus.io/>`_,
Prometheus exporters (:ref:`mgr-prometheus`, `Node exporter
<https://prometheus.io/docs/guides/node-exporter/>`_), `Prometheus Alert
Manager <https://prometheus.io/docs/alerting/alertmanager/>`_ and `Grafana
<https://grafana.com/>`_.

.. note::

   Prometheus' security model presumes that untrusted users have access to the
   Prometheus HTTP endpoint and logs. Untrusted users have access to all the
   (meta)data Prometheus collects that is contained in the database, plus a
   variety of operational and debugging information.

   However, Prometheus' HTTP API is limited to read-only operations.
   Configurations can *not* be changed using the API and secrets are not
   exposed. Moreover, Prometheus has some built-in measures to mitigate the
   impact of denial of service attacks.

   Please see `Prometheus' Security model
   <https://prometheus.io/docs/operating/security/>`_ for more detailed
   information.

By default, bootstrap will deploy a basic monitoring stack. If you
did not do this (by passing ``--skip-monitoring-stack``), or if you
converted an existing cluster to cephadm management, you can set up
monitoring by following the steps below.

#. Enable the prometheus module in the ceph-mgr daemon. This exposes the
   internal Ceph metrics so that Prometheus can scrape them.

   .. code-block:: bash

      ceph mgr module enable prometheus

#. Deploy a node-exporter service on every node of the cluster. The
   node-exporter provides host-level metrics like CPU and memory utilization.

   .. code-block:: bash

      ceph orch apply node-exporter '*'

#. Deploy alertmanager.

   .. code-block:: bash

      ceph orch apply alertmanager 1

#. Deploy prometheus. A single prometheus instance is sufficient, but
   for HA you may want to deploy two.

   .. code-block:: bash

      ceph orch apply prometheus 1    # or 2

#. Deploy grafana.

   .. code-block:: bash

      ceph orch apply grafana 1

Cephadm takes care of the configuration of Prometheus, Grafana, and Alertmanager
automatically.

However, there is one exception to this rule. In some setups, the Dashboard
user's browser might not be able to access the Grafana URL configured in Ceph
Dashboard. One such scenario is when the cluster and the accessing user are each
in a different DNS zone.

For this case, there is an extra configuration option for Ceph Dashboard, which
can be used to configure the URL that the user's browser uses to access Grafana.
This value will never be altered by cephadm. To set this configuration option,
issue the following command::

   $ ceph dashboard set-grafana-frontend-api-url <grafana-server-api>

It may take a minute or two for services to be deployed. Once
completed, you should see something like this from ``ceph orch ls``:

.. code-block:: console

   $ ceph orch ls
   NAME           RUNNING  REFRESHED  IMAGE NAME                                      IMAGE ID      SPEC
   alertmanager       1/1  6s ago     docker.io/prom/alertmanager:latest              0881eb8f169f  present
   crash              2/2  6s ago     docker.io/ceph/daemon-base:latest-master-devel  mix           present
   grafana            1/1  0s ago     docker.io/pcuzner/ceph-grafana-el8:latest       f77afcf0bcf6  absent
   node-exporter      2/2  6s ago     docker.io/prom/node-exporter:latest             e5a616e4b9cf  present
   prometheus         1/1  6s ago     docker.io/prom/prometheus:latest                e935122ab143  present

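To check this listing from a script rather than by eye, the two numbers in the
``RUNNING`` column can be compared with each other. The following is a minimal
sketch and not part of cephadm: ``orch_not_ready`` is a hypothetical helper
name, and the filter assumes the column layout shown above, in which the
second column has the form ``running/expected``.

.. code-block:: bash

   # Hypothetical helper (not a cephadm command): reads `ceph orch ls` table
   # output on stdin and prints the NAME of every service whose RUNNING
   # column (for example "1/2") shows fewer daemons than expected.
   orch_not_ready() {
       awk 'NR > 1 && $2 ~ /^[0-9]+\/[0-9]+$/ {
           split($2, n, "/")            # n[1] = running, n[2] = expected
           if (n[1] != n[2]) print $1
       }'
   }

   # Usage, assuming the column layout shown above:
   #   ceph orch ls | orch_not_ready

An empty result means that every listed service reports as many running
daemons as it expects.
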
Configuring SSL/TLS for Grafana
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``cephadm`` will deploy Grafana using the certificate defined in the ceph
key/value store. If a certificate is not specified, ``cephadm`` will generate a
self-signed certificate during deployment of the Grafana service.

A custom certificate can be configured using the following commands.

.. code-block:: bash

   ceph config-key set mgr/cephadm/grafana_key -i $PWD/key.pem
   ceph config-key set mgr/cephadm/grafana_crt -i $PWD/certificate.pem

The ``cephadm`` manager module needs to be restarted to be able to read updates
to these keys.

.. code-block:: bash

   ceph orch restart mgr

If you already deployed Grafana, you need to redeploy the service for the
configuration to be updated.

.. code-block:: bash

   ceph orch redeploy grafana

The ``redeploy`` command also takes care of setting the right URL for Ceph
Dashboard.

Using custom images
~~~~~~~~~~~~~~~~~~~

It is possible to install or upgrade monitoring components based on other
images. To do so, the name of the image to be used needs to be stored in the
configuration first. The following configuration options are available.

- ``container_image_prometheus``
- ``container_image_grafana``
- ``container_image_alertmanager``
- ``container_image_node_exporter``

Custom images can be set with the ``ceph config`` command:

.. code-block:: bash

   ceph config set mgr mgr/cephadm/<option_name> <value>

For example:

.. code-block:: bash

   ceph config set mgr mgr/cephadm/container_image_prometheus prom/prometheus:v1.4.1

.. note::

   Setting a custom image overrides the default value but does not overwrite
   it. The default value changes when updates become available. Once you have
   set a custom image for a component, that component can no longer be updated
   automatically. To install updates, you must manually update the
   configuration (image name and tag).

   If you later decide to follow the recommended defaults instead, you can
   reset the custom image you set before. After that, the default value will
   be used again. Use ``ceph config rm`` to reset the configuration option:

   .. code-block:: bash

      ceph config rm mgr mgr/cephadm/<option_name>

   For example:

   .. code-block:: bash

      ceph config rm mgr mgr/cephadm/container_image_prometheus

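Because a mistyped option name is easy to overlook, the ``ceph config set``
call can be wrapped in a small function that accepts only the four components
listed above. This is a sketch under stated assumptions:
``set_monitoring_image`` is a hypothetical name, not a cephadm command, and
the image name in the usage comment is a placeholder.

.. code-block:: bash

   # Hypothetical wrapper: set a custom image for one monitoring component.
   # $1 = component (prometheus, grafana, alertmanager or node_exporter)
   # $2 = image name and tag
   set_monitoring_image() {
       component=$1
       image=$2
       case "$component" in
           prometheus|grafana|alertmanager|node_exporter) ;;
           *)
               echo "unknown monitoring component: $component" >&2
               return 1
               ;;
       esac
       ceph config set mgr "mgr/cephadm/container_image_${component}" "$image"
   }

   # Usage (placeholder image name):
   #   set_monitoring_image prometheus registry.example.com/prometheus:mytag
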
Using custom configuration files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By overriding cephadm templates, it is possible to completely customize the
configuration files for monitoring services.

Internally, cephadm already uses `Jinja2
<https://jinja.palletsprojects.com/en/2.11.x/>`_ templates to generate the
configuration files for all monitoring components. To customize the
configuration of Prometheus, Grafana or the Alertmanager, you can store a
Jinja2 template for each service; cephadm will then use it to generate the
configuration instead of its built-in template. This template will be evaluated
every time a service of that kind is deployed or reconfigured. That way, the
custom configuration is preserved and automatically applied on future
deployments of these services.

.. note::

   The configuration of the custom template is also preserved when the default
   configuration of cephadm changes. If the updated configuration is to be used,
   the custom template needs to be migrated *manually*.

Option names
""""""""""""

The following templates for files that will be generated by cephadm can be
overridden. These are the names to be used when storing with ``ceph config-key
set``:

- ``alertmanager_alertmanager.yml``
- ``grafana_ceph-dashboard.yml``
- ``grafana_grafana.ini``
- ``prometheus_prometheus.yml``

You can look up the file templates that are currently used by cephadm in
``src/pybind/mgr/cephadm/templates``:

- ``services/alertmanager/alertmanager.yml.j2``
- ``services/grafana/ceph-dashboard.yml.j2``
- ``services/grafana/grafana.ini.j2``
- ``services/prometheus/prometheus.yml.j2``

Usage
"""""

The following command applies a single-line value:

.. code-block:: bash

   ceph config-key set mgr/cephadm/<option_name> <value>

To use the contents of a file as a template, pass the ``-i`` argument:

.. code-block:: bash

   ceph config-key set mgr/cephadm/<option_name> -i $PWD/<filename>

.. note::

   When using files as input to ``config-key`` an absolute path to the file must
   be used.

The cephadm mgr module must be restarted after a configuration option has been
set. The configuration file for the service then needs to be recreated, which
is done using ``redeploy``. For more details see the following example.

Example
"""""""

.. code-block:: bash

   # set the contents of ./prometheus.yml.j2 as template
   ceph config-key set mgr/cephadm/services_prometheus_prometheus.yml \
     -i $PWD/prometheus.yml.j2

   # restart cephadm mgr module
   ceph orch restart mgr

   # redeploy the prometheus service
   ceph orch redeploy prometheus

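Since ``config-key`` requires an absolute path for file input, a small wrapper
can resolve relative paths before storing the template. The following is a
sketch only: ``set_cephadm_template`` is a hypothetical function name, and the
option name and file are whatever you would otherwise pass to ``ceph
config-key set`` by hand.

.. code-block:: bash

   # Hypothetical wrapper: store a template file under mgr/cephadm/<option_name>.
   # $1 = option name, $2 = path to the template file (relative or absolute)
   set_cephadm_template() {
       option=$1
       file=$2
       case "$file" in
           /*) ;;                       # already absolute
           *)  file="$PWD/$file" ;;     # resolve against the current directory
       esac
       if [ ! -r "$file" ]; then
           echo "cannot read template file: $file" >&2
           return 1
       fi
       ceph config-key set "mgr/cephadm/${option}" -i "$file"
   }

   # Usage:
   #   set_cephadm_template <option_name> <filename>

The mgr restart and the ``redeploy`` step shown in the example are still
required afterwards.
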
Disabling monitoring
--------------------

If you have deployed monitoring and would like to remove it, you can do
so with:

.. code-block:: bash

   ceph orch rm grafana
   ceph orch rm prometheus --force   # this will delete metrics data collected so far
   ceph orch rm node-exporter
   ceph orch rm alertmanager
   ceph mgr module disable prometheus


Deploying monitoring manually
-----------------------------

If you have an existing prometheus monitoring infrastructure, or would like
to manage it yourself, you need to configure it to integrate with your Ceph
cluster.

* Enable the prometheus module in the ceph-mgr daemon

  .. code-block:: bash

     ceph mgr module enable prometheus

  By default, ceph-mgr presents prometheus metrics on port 9283 on each host
  running a ceph-mgr daemon. Configure prometheus to scrape these.

* To enable the dashboard's prometheus-based alerting, see :ref:`dashboard-alerting`.

* To enable dashboard integration with Grafana, see :ref:`dashboard-grafana`.

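As an illustration of the scrape configuration mentioned in the first item, the
sketch below prints a Prometheus ``scrape_configs`` fragment with one target
per ceph-mgr host on port 9283. The function name ``ceph_scrape_config``, the
job name ``ceph`` and the host names in the usage comment are assumptions made
for this example; adapt them to your own Prometheus setup.

.. code-block:: bash

   # Hypothetical helper: print a scrape_configs fragment for the given
   # ceph-mgr hosts, using the default metrics port 9283.
   ceph_scrape_config() {
       printf 'scrape_configs:\n'
       printf "  - job_name: 'ceph'\n"
       printf '    static_configs:\n'
       printf '      - targets:\n'
       for host in "$@"; do
           printf "          - '%s:9283'\n" "$host"
       done
   }

   # Usage (placeholder host names):
   #   ceph_scrape_config mgr1.example.com mgr2.example.com

The output is meant to be merged into an existing ``prometheus.yml`` by hand.
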
Enabling RBD-Image monitoring
-----------------------------

For performance reasons, monitoring of RBD images is disabled by default. For
more information please see :ref:`prometheus-rbd-io-statistics`. While it is
disabled, the overview and details dashboards will stay empty in Grafana and
the metrics will not be visible in Prometheus.