.. _mgr-cephadm-monitoring:

Monitoring Services
===================

Ceph Dashboard uses `Prometheus <https://prometheus.io/>`_, `Grafana
<https://grafana.com/>`_, and related tools to store and visualize detailed
metrics on cluster utilization and performance. Ceph users have three options:

#. Have cephadm deploy and configure these services. This is the default
   when bootstrapping a new cluster unless the ``--skip-monitoring-stack``
   option is used.
#. Deploy and configure these services manually. This is recommended for users
   with existing Prometheus services in their environment (and in cases where
   Ceph is running in Kubernetes with Rook).
#. Skip the monitoring stack completely. Some Ceph dashboard graphs will
   not be available.

The monitoring stack consists of `Prometheus <https://prometheus.io/>`_,
Prometheus exporters (:ref:`mgr-prometheus`, `Node exporter
<https://prometheus.io/docs/guides/node-exporter/>`_), `Prometheus Alert
Manager <https://prometheus.io/docs/alerting/alertmanager/>`_ and `Grafana
<https://grafana.com/>`_.

.. note::

   Prometheus' security model presumes that untrusted users have access to the
   Prometheus HTTP endpoint and logs. Untrusted users have access to all the
   (meta)data Prometheus collects that is contained in the database, plus a
   variety of operational and debugging information.

   However, Prometheus' HTTP API is limited to read-only operations.
   Configurations can *not* be changed using the API and secrets are not
   exposed. Moreover, Prometheus has some built-in measures to mitigate the
   impact of denial of service attacks.

   Please see `Prometheus' Security model
   <https://prometheus.io/docs/operating/security/>`_ for more detailed
   information.

Deploying monitoring with cephadm
---------------------------------

The default behavior of ``cephadm`` is to deploy a basic monitoring stack. It
is possible, however, to have a Ceph cluster without a monitoring stack and to
want to add one later. There are two common ways to end up with such a
cluster: you might have passed the ``--skip-monitoring-stack`` option to
``cephadm`` during the installation of the cluster, or you might have converted
an existing cluster (which had no monitoring stack) to cephadm management.

To set up monitoring on a Ceph cluster that has no monitoring, follow the
steps below:

#. Enable the Prometheus module in the ceph-mgr daemon. This exposes the
   internal Ceph metrics so that Prometheus can scrape them:

   .. prompt:: bash #

      ceph mgr module enable prometheus

#. Deploy a node-exporter service on every node of the cluster. The
   node-exporter provides host-level metrics like CPU and memory utilization:

   .. prompt:: bash #

      ceph orch apply node-exporter '*'

#. Deploy alertmanager:

   .. prompt:: bash #

      ceph orch apply alertmanager 1

#. Deploy Prometheus. A single Prometheus instance is sufficient, but
   for high availability (HA) you might want to deploy two:

   .. prompt:: bash #

      ceph orch apply prometheus 1

   or

   .. prompt:: bash #

      ceph orch apply prometheus 2

#. Deploy grafana:

   .. prompt:: bash #

      ceph orch apply grafana 1

Manually setting the Grafana URL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Cephadm automatically configures Prometheus, Grafana, and Alertmanager in
all cases except one.

In some setups, the Dashboard user's browser might not be able to access the
Grafana URL that is configured in Ceph Dashboard. This can happen when the
cluster and the accessing user are in different DNS zones.

If this is the case, you can use a configuration option for Ceph Dashboard
to set the URL that the user's browser will use to access Grafana. This
value will never be altered by cephadm. To set this configuration option,
issue the following command:

.. prompt:: bash $

   ceph dashboard set-grafana-frontend-api-url <grafana-server-api>

It might take a minute or two for services to be deployed. After the
services have been deployed, you should see something like this when you
issue the command ``ceph orch ls``:

.. code-block:: console

   $ ceph orch ls
   NAME           RUNNING  REFRESHED  IMAGE NAME                                      IMAGE ID      SPEC
   alertmanager   1/1      6s ago     docker.io/prom/alertmanager:latest              0881eb8f169f  present
   crash          2/2      6s ago     docker.io/ceph/daemon-base:latest-master-devel  mix           present
   grafana        1/1      0s ago     docker.io/pcuzner/ceph-grafana-el8:latest       f77afcf0bcf6  absent
   node-exporter  2/2      6s ago     docker.io/prom/node-exporter:latest             e5a616e4b9cf  present
   prometheus     1/1      6s ago     docker.io/prom/prometheus:latest                e935122ab143  present

Configuring SSL/TLS for Grafana
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``cephadm`` deploys Grafana using the certificate defined in the Ceph
key/value store. If no certificate is specified, ``cephadm`` generates a
self-signed certificate during the deployment of the Grafana service.

A custom certificate can be configured using the following commands:

.. prompt:: bash #

   ceph config-key set mgr/cephadm/grafana_key -i $PWD/key.pem
   ceph config-key set mgr/cephadm/grafana_crt -i $PWD/certificate.pem

If you have already deployed Grafana, run ``reconfig`` on the service to
update its configuration:

.. prompt:: bash #

   ceph orch reconfig grafana

The ``reconfig`` command also sets the proper URL for Ceph Dashboard.

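For a test deployment where no CA-issued certificate is at hand, a key pair
with the file names used above could be generated with ``openssl``. This is
only a sketch: the host name ``grafana.example.com`` and the 365-day validity
are assumptions chosen for illustration, not values that cephadm requires.

```shell
#!/bin/sh
set -e

# Hypothetical host name; replace it with the name browsers use to reach Grafana.
GRAFANA_HOST="grafana.example.com"

# Generate an unencrypted 2048-bit RSA key and a self-signed certificate
# valid for 365 days, both written to the current directory.
openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout key.pem -out certificate.pem \
    -days 365 -subj "/CN=${GRAFANA_HOST}"

# Print the subject and expiry date as a quick sanity check.
openssl x509 -in certificate.pem -noout -subject -enddate
```

The resulting ``key.pem`` and ``certificate.pem`` can then be stored with the
``ceph config-key set`` commands shown above. Browsers will warn about a
self-signed certificate, so this approach is better suited to testing than to
production.
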
Networks and Ports
~~~~~~~~~~~~~~~~~~

The network and port that each monitoring service binds to can be configured
with a YAML service specification.

Example spec file:

.. code-block:: yaml

   service_type: grafana
   service_name: grafana
   placement:
     count: 1
   networks:
   - 192.169.142.0/24
   spec:
     port: 4200

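A spec like this is usually saved to a file and applied with
``ceph orch apply -i``. The sketch below assumes a file name of
``grafana.yaml`` (hypothetical) and skips the apply step on hosts where the
``ceph`` CLI is not installed.

```shell
#!/bin/sh
set -e

# Hypothetical file name for the service specification.
SPEC_FILE="grafana.yaml"

# Write the example spec from this section to a file.
cat > "$SPEC_FILE" <<'EOF'
service_type: grafana
service_name: grafana
placement:
  count: 1
networks:
- 192.169.142.0/24
spec:
  port: 4200
EOF

# Apply the spec only if the ceph CLI is available on this host.
if command -v ceph >/dev/null 2>&1; then
    ceph orch apply -i "$SPEC_FILE"
else
    echo "ceph CLI not found; wrote $SPEC_FILE without applying it"
fi
```

Keeping the spec in a file makes it possible to track the network and port
settings in version control and to reapply them later.
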
Using custom images
~~~~~~~~~~~~~~~~~~~

It is possible to install or upgrade monitoring components based on other
images. To do so, the name of the image to be used must first be stored in the
configuration. The following configuration options are available:

- ``container_image_prometheus``
- ``container_image_grafana``
- ``container_image_alertmanager``
- ``container_image_node_exporter``

Custom images can be set with the ``ceph config`` command:

.. code-block:: bash

   ceph config set mgr mgr/cephadm/<option_name> <value>

For example:

.. code-block:: bash

   ceph config set mgr mgr/cephadm/container_image_prometheus prom/prometheus:v1.4.1

If monitoring stack daemons of the type whose image you have changed are
already running, you must redeploy them so that they actually use the new
image.

For example, if you have changed the Prometheus image:

.. prompt:: bash #

   ceph orch redeploy prometheus

.. note::

   Setting a custom image overrides the default value (but does not overwrite
   it). The default value changes when updates become available. Once you set
   a custom image, the component it applies to can no longer be updated
   automatically. You will need to manually update the configuration (image
   name and tag) to be able to install updates.

   If you choose to go with the recommendations instead, you can reset the
   custom image you have set before. After that, the default value will be
   used again. Use ``ceph config rm`` to reset the configuration option:

   .. code-block:: bash

      ceph config rm mgr mgr/cephadm/<option_name>

   For example:

   .. code-block:: bash

      ceph config rm mgr mgr/cephadm/container_image_prometheus

Using custom configuration files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By overriding cephadm templates, it is possible to completely customize the
configuration files for monitoring services.

Internally, cephadm already uses `Jinja2
<https://jinja.palletsprojects.com/en/2.11.x/>`_ templates to generate the
configuration files for all monitoring components. To customize the
configuration of Prometheus, Grafana, or Alertmanager, you can store a Jinja2
template for each service; cephadm then uses that template for configuration
generation instead of its own. The template is evaluated every time a service
of that kind is deployed or reconfigured. That way, the custom configuration
is preserved and automatically applied on future deployments of these
services.

.. note::

   The configuration of the custom template is also preserved when the default
   configuration of cephadm changes. If the updated configuration is to be used,
   the custom template needs to be migrated *manually*.

Option names
""""""""""""

The following templates for files that will be generated by cephadm can be
overridden. These are the names to be used when storing with ``ceph config-key
set``:

- ``services/alertmanager/alertmanager.yml``
- ``services/grafana/ceph-dashboard.yml``
- ``services/grafana/grafana.ini``
- ``services/prometheus/prometheus.yml``

You can look up the file templates that are currently used by cephadm in
``src/pybind/mgr/cephadm/templates``:

- ``services/alertmanager/alertmanager.yml.j2``
- ``services/grafana/ceph-dashboard.yml.j2``
- ``services/grafana/grafana.ini.j2``
- ``services/prometheus/prometheus.yml.j2``

Usage
"""""

The following command applies a single-line value:

.. code-block:: bash

   ceph config-key set mgr/cephadm/<option_name> <value>

To set the contents of a file as the template, use the ``-i`` argument:

.. code-block:: bash

   ceph config-key set mgr/cephadm/<option_name> -i $PWD/<filename>

.. note::

   When using files as input to ``config-key``, an absolute path to the file
   must be used.

After the template has been stored, the configuration file for the service
must be recreated. This is done with ``reconfig``, as in the following
example.

Example
"""""""

.. code-block:: bash

   # set the contents of ./prometheus.yml.j2 as template
   ceph config-key set mgr/cephadm/services/prometheus/prometheus.yml \
     -i $PWD/prometheus.yml.j2

   # reconfig the prometheus service
   ceph orch reconfig prometheus

Disabling monitoring
--------------------

To disable monitoring and remove the software that supports it, run the
following commands:

.. code-block:: console

   $ ceph orch rm grafana
   $ ceph orch rm prometheus --force   # this will delete metrics data collected so far
   $ ceph orch rm node-exporter
   $ ceph orch rm alertmanager
   $ ceph mgr module disable prometheus

Deploying monitoring manually
-----------------------------

If you have an existing Prometheus monitoring infrastructure, or would like
to manage it yourself, you need to configure it to integrate with your Ceph
cluster.

* Enable the prometheus module in the ceph-mgr daemon:

  .. code-block:: bash

     ceph mgr module enable prometheus

  By default, ceph-mgr presents Prometheus metrics on port 9283 on each host
  running a ceph-mgr daemon. Configure Prometheus to scrape these.

* To enable the dashboard's Prometheus-based alerting, see :ref:`dashboard-alerting`.

* To enable dashboard integration with Grafana, see :ref:`dashboard-grafana`.

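As a sketch of the scrape step above, the following writes a minimal
Prometheus configuration that targets two ceph-mgr hosts on port 9283. The
host names, the job name ``ceph``, and the file name ``ceph-scrape.yml`` are
assumptions for illustration. ``honor_labels: true`` is included on the
assumption that labels exported by the ceph-mgr module should take precedence
over the ones Prometheus would otherwise attach.

```shell
#!/bin/sh
set -e

# Hypothetical output file; merge its contents into an existing
# prometheus.yml as needed.
CONFIG_FILE="ceph-scrape.yml"

# One static target per host that runs a ceph-mgr daemon (hypothetical names).
cat > "$CONFIG_FILE" <<'EOF'
scrape_configs:
  - job_name: 'ceph'
    honor_labels: true
    static_configs:
      - targets:
          - 'mgr-host-1.example.com:9283'
          - 'mgr-host-2.example.com:9283'
EOF

# Validate the file if promtool (shipped with Prometheus) is installed.
if command -v promtool >/dev/null 2>&1; then
    promtool check config "$CONFIG_FILE"
fi
```

In an existing Prometheus setup, the ``job_name`` entry would be added under
the ``scrape_configs`` key that is already present rather than kept in a
separate file.
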
Enabling RBD-Image monitoring
-----------------------------

For performance reasons, monitoring of RBD images is disabled by default. For
more information, see :ref:`prometheus-rbd-io-statistics`. While RBD image
monitoring is disabled, the overview and details dashboards stay empty in
Grafana and the metrics are not visible in Prometheus.
339 | and the metrics will not be visible in Prometheus. |