Monitoring Stack with Cephadm
=============================

Ceph Dashboard uses `Prometheus <https://prometheus.io/>`_, `Grafana
<https://grafana.com/>`_, and related tools to store and visualize detailed
metrics on cluster utilization and performance. Ceph users have three options:

#. Have cephadm deploy and configure these services. This is the default
   when bootstrapping a new cluster unless the ``--skip-monitoring-stack``
   option is used.
#. Deploy and configure these services manually. This is recommended for users
   with existing Prometheus services in their environment (and in cases where
   Ceph is running in Kubernetes with Rook).
#. Skip the monitoring stack completely. Some Ceph Dashboard graphs will
   not be available.

The monitoring stack consists of `Prometheus <https://prometheus.io/>`_,
Prometheus exporters (:ref:`mgr-prometheus`, `Node exporter
<https://prometheus.io/docs/guides/node-exporter/>`_), `Prometheus Alertmanager
<https://prometheus.io/docs/alerting/alertmanager/>`_, and `Grafana
<https://grafana.com/>`_.

.. note::

   Prometheus' security model presumes that untrusted users have access to the
   Prometheus HTTP endpoint and logs. Untrusted users have access to all the
   (meta)data Prometheus collects that is contained in the database, plus a
   variety of operational and debugging information.

   However, Prometheus' HTTP API is limited to read-only operations.
   Configurations can *not* be changed using the API, and secrets are not
   exposed. Moreover, Prometheus has some built-in measures to mitigate the
   impact of denial-of-service attacks.

   See `Prometheus' security model
   <https://prometheus.io/docs/operating/security/>`_ for more detailed
   information.

By default, bootstrap deploys a basic monitoring stack. If you skipped it
(by passing ``--skip-monitoring-stack``), or if you converted an existing
cluster to cephadm management, you can set up monitoring by following the
steps below.

#. Enable the prometheus module in the ceph-mgr daemon. This exposes the
   internal Ceph metrics so that Prometheus can scrape them. ::

     ceph mgr module enable prometheus

#. Deploy a node-exporter service on every node of the cluster. The
   node-exporter provides host-level metrics such as CPU and memory
   utilization. ::

     ceph orch apply node-exporter '*'

#. Deploy alertmanager::

     ceph orch apply alertmanager 1

#. Deploy prometheus. A single Prometheus instance is sufficient, but for
   high availability (HA) you may want to deploy two. ::

     ceph orch apply prometheus 1    # or 2

#. Deploy grafana::

     ceph orch apply grafana 1

Cephadm handles the prometheus, grafana, and alertmanager
configurations automatically.

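The steps above can be collected into a single shell function, for example
when monitoring has to be enabled on a converted cluster from a script. This
is a minimal sketch rather than a supported tool: the function name
``deploy_monitoring_stack`` and its optional Prometheus instance count argument
are hypothetical, and the function only issues the commands listed above,
stopping at the first one that fails.

.. code-block:: shell

   # Hypothetical helper (not part of cephadm): runs the monitoring setup
   # steps in order and stops at the first failing command.
   # Usage: deploy_monitoring_stack [prometheus_count]   (defaults to 1)
   deploy_monitoring_stack() {
       prom_count="${1:-1}"

       # Expose internal Ceph metrics from ceph-mgr for Prometheus to scrape.
       ceph mgr module enable prometheus || return 1

       # Host-level metrics (CPU, memory, ...) from every host.
       ceph orch apply node-exporter '*' || return 1

       ceph orch apply alertmanager 1 || return 1

       # One Prometheus instance is sufficient; two may be used for HA.
       ceph orch apply prometheus "$prom_count" || return 1

       ceph orch apply grafana 1
   }

For example, ``deploy_monitoring_stack 2`` deploys two Prometheus instances.
Keeping the steps in one function preserves the documented order and lets the
helper be sourced into an existing administration script.
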
It may take a minute or two for services to be deployed. Once
completed, you should see something like this from ``ceph orch ls``::

  $ ceph orch ls
  NAME           RUNNING  REFRESHED  IMAGE NAME                                      IMAGE ID      SPEC
  alertmanager       1/1  6s ago     docker.io/prom/alertmanager:latest              0881eb8f169f  present
  crash              2/2  6s ago     docker.io/ceph/daemon-base:latest-master-devel  mix           present
  grafana            1/1  0s ago     docker.io/pcuzner/ceph-grafana-el8:latest       f77afcf0bcf6  absent
  node-exporter      2/2  6s ago     docker.io/prom/node-exporter:latest             e5a616e4b9cf  present
  prometheus         1/1  6s ago     docker.io/prom/prometheus:latest                e935122ab143  present

Using custom images
-------------------

You can install or upgrade monitoring components using images other than the
defaults. To do so, first store the name of the image to use in the
configuration. The following configuration options are available:

- ``container_image_prometheus``
- ``container_image_grafana``
- ``container_image_alertmanager``
- ``container_image_node_exporter``

Custom images can be set with the ``ceph config`` command::

  ceph config set mgr mgr/cephadm/<option_name> <value>

For example::

  ceph config set mgr mgr/cephadm/container_image_prometheus prom/prometheus:v1.4.1

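To set custom images for several components at once, the ``ceph config set``
command can be wrapped in a loop. The sketch below is illustrative only: the
helper name ``set_monitoring_images`` and its ``<component>=<image>`` argument
format are assumptions made for this example; the option names it builds are
the four listed above.

.. code-block:: shell

   # Hypothetical helper: set custom images for several monitoring components.
   # Each argument has the form <component>=<image>, where <component> is one
   # of prometheus, grafana, alertmanager or node_exporter.
   # Example: set_monitoring_images prometheus=prom/prometheus:v1.4.1
   set_monitoring_images() {
       for pair in "$@"; do
           # Reject arguments without "=" or with an empty image name.
           case "$pair" in
               *=?*) ;;
               *) echo "expected <component>=<image>, got: $pair" >&2; return 1 ;;
           esac
           component="${pair%%=*}"
           image="${pair#*=}"
           case "$component" in
               prometheus|grafana|alertmanager|node_exporter) ;;
               *) echo "unknown component: $component" >&2; return 1 ;;
           esac
           ceph config set mgr "mgr/cephadm/container_image_${component}" "$image" || return 1
       done
   }

Arguments are applied in order, so an invalid argument stops the loop only
after the preceding ones have already been set.
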
.. note::

   Setting a custom image overrides the default value but does not overwrite
   it. The default value changes when updates become available; a component
   with a custom image, however, is no longer updated automatically. To
   install updates for that component, you must manually update the
   configuration option (image name and tag).

If you decide to use the recommended images again, reset the custom image
with ``ceph config rm``. Afterwards, the default value is used again::

  ceph config rm mgr mgr/cephadm/<option_name>

For example::

  ceph config rm mgr mgr/cephadm/container_image_prometheus

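To return all four monitoring components to their default images in one
step, ``ceph config rm`` can be run for each option listed above. This is a
sketch with a hypothetical helper name (``reset_monitoring_images``); it keeps
going if one removal fails and reports the failure through its exit status.

.. code-block:: shell

   # Hypothetical helper: remove any custom monitoring images so that cephadm
   # falls back to its default images.
   reset_monitoring_images() {
       status=0
       for option in container_image_prometheus container_image_grafana \
                     container_image_alertmanager container_image_node_exporter; do
           # Continue with the remaining options, but remember the failure.
           ceph config rm mgr "mgr/cephadm/${option}" || status=1
       done
       return "$status"
   }
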
Disabling monitoring
--------------------

If you have deployed monitoring and would like to remove it, you can do
so with::

  ceph orch rm grafana
  ceph orch rm prometheus --force   # this will delete metrics data collected so far
  ceph orch rm node-exporter
  ceph orch rm alertmanager
  ceph mgr module disable prometheus

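Because removing Prometheus this way deletes the metrics collected so far, a
scripted removal may benefit from an explicit confirmation. The sketch below
wraps the commands above in a hypothetical ``remove_monitoring_stack``
function that refuses to run unless it is called with a
``--yes-delete-metrics`` argument; both the name and the argument are
inventions of this example, not cephadm options.

.. code-block:: shell

   # Hypothetical helper: remove the cephadm-managed monitoring stack.
   # Removing prometheus with --force deletes the metrics collected so far,
   # so the caller must pass --yes-delete-metrics to confirm.
   remove_monitoring_stack() {
       if [ "$1" != "--yes-delete-metrics" ]; then
           echo "refusing to run: pass --yes-delete-metrics to confirm" >&2
           return 1
       fi
       ceph orch rm grafana || return 1
       ceph orch rm prometheus --force || return 1
       ceph orch rm node-exporter || return 1
       ceph orch rm alertmanager || return 1
       ceph mgr module disable prometheus
   }
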
Deploying monitoring manually
-----------------------------

If you have an existing Prometheus monitoring infrastructure, or would like
to manage it yourself, you need to configure it to integrate with your Ceph
cluster.

* Enable the prometheus module in the ceph-mgr daemon::

    ceph mgr module enable prometheus

  By default, ceph-mgr presents Prometheus metrics on port 9283 on each host
  running a ceph-mgr daemon. Configure Prometheus to scrape these.

* To enable the dashboard's Prometheus-based alerting, see :ref:`dashboard-alerting`.

* To enable dashboard integration with Grafana, see :ref:`dashboard-grafana`.

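For an existing Prometheus server, the ceph-mgr endpoints are added as scrape
targets. The sketch below prints a minimal ``scrape_configs`` entry for port
9283 on each ceph-mgr host given as an argument. The helper name, the ``ceph``
job name, and the host names are placeholders; ``honor_labels: true`` follows
the mgr prometheus module documentation, which recommends it so that
Prometheus keeps the ``instance`` label exported by Ceph instead of
overwriting it. Merge the output into your existing Prometheus configuration
rather than replacing it.

.. code-block:: shell

   # Hypothetical helper: print a minimal Prometheus scrape configuration for
   # the ceph-mgr prometheus module (port 9283) on the given hosts.
   # Usage: ceph_mgr_scrape_config mgr-host-1 mgr-host-2 > ceph-scrape.yml
   ceph_mgr_scrape_config() {
       if [ "$#" -eq 0 ]; then
           echo "usage: ceph_mgr_scrape_config <mgr-host>..." >&2
           return 1
       fi
       printf '%s\n' \
           'scrape_configs:' \
           "  - job_name: 'ceph'" \
           '    honor_labels: true' \
           '    static_configs:' \
           '      - targets:'
       # One target per ceph-mgr host.
       for host in "$@"; do
           printf "          - '%s:9283'\n" "$host"
       done
   }

For example, ``ceph_mgr_scrape_config mgr-host-1`` prints::

  scrape_configs:
    - job_name: 'ceph'
      honor_labels: true
      static_configs:
        - targets:
            - 'mgr-host-1:9283'
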
Enabling RBD-Image monitoring
-----------------------------

For performance reasons, monitoring of RBD images is disabled by default. For
more information, see :ref:`prometheus-rbd-io-statistics`. While it is
disabled, the overview and details dashboards stay empty in Grafana and the
metrics are not visible in Prometheus.

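Per-image statistics are enabled by listing the pools to monitor in the
``mgr/prometheus/rbd_stats_pools`` option of the prometheus mgr module,
described in :ref:`prometheus-rbd-io-statistics`. The sketch below sets that
option from a list of pool (or ``pool/namespace``) names. The helper name is
hypothetical, and pool names such as ``rbd`` are placeholders; check the
linked section for the value format accepted by your release.

.. code-block:: shell

   # Hypothetical helper: enable per-image RBD IO statistics for the given
   # pools via mgr/prometheus/rbd_stats_pools.
   # Note: this replaces the current value of the option rather than appending.
   enable_rbd_stats() {
       if [ "$#" -eq 0 ]; then
           echo "usage: enable_rbd_stats <pool>[/<namespace>]..." >&2
           return 1
       fi
       pools=$(printf '%s,' "$@")
       pools="${pools%,}"    # drop the trailing comma
       ceph config set mgr mgr/prometheus/rbd_stats_pools "$pools"
   }

For example, ``enable_rbd_stats rbd`` enables statistics for all images in the
``rbd`` pool.
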