ceph/doc/cephadm/monitoring.rst

   1 Monitoring Stack with Cephadm
   2 =============================
   3
   4 Ceph Dashboard uses `Prometheus <https://prometheus.io/>`_, `Grafana
   5 <https://grafana.com/>`_, and related tools to store and visualize detailed
   6 metrics on cluster utilization and performance.  Ceph users have three options:
   7
   8 #. Have cephadm deploy and configure these services.  This is the default
   9    when bootstrapping a new cluster unless the ``--skip-monitoring-stack``
  10    option is used.
  11 #. Deploy and configure these services manually.  This is recommended for users
  12    with existing Prometheus services in their environment (and in cases where
  13    Ceph is running in Kubernetes with Rook).
  14 #. Skip the monitoring stack completely.  Some Ceph dashboard graphs will
  15    not be available.
  16
  17 The monitoring stack consists of `Prometheus <https://prometheus.io/>`_,
  18 Prometheus exporters (:ref:`mgr-prometheus`, `Node exporter
  19 <https://prometheus.io/docs/guides/node-exporter/>`_), `Prometheus Alert
  20 Manager <https://prometheus.io/docs/alerting/alertmanager/>`_ and `Grafana
  21 <https://grafana.com/>`_.
  22
  23 .. note::
  24
  25   Prometheus' security model presumes that untrusted users have access to the
  26   Prometheus HTTP endpoint and logs. Untrusted users have access to all the
  27   (meta)data Prometheus collects that is contained in the database, plus a
  28   variety of operational and debugging information.
  29
  30   However, Prometheus' HTTP API is limited to read-only operations.
  31   Configurations can *not* be changed using the API and secrets are not
  32   exposed. Moreover, Prometheus has some built-in measures to mitigate the
  33   impact of denial of service attacks.
  34
  35   Please see `Prometheus' Security model
  36   <https://prometheus.io/docs/operating/security/>`_ for more detailed
  37   information.
  38
  39 By default, bootstrap will deploy a basic monitoring stack.  If you
  40 did not do this (by passing ``--skip-monitoring-stack``), or if you
  41 converted an existing cluster to cephadm management, you can set up
  42 monitoring by following the steps below.
  43
  44 #. Enable the prometheus module in the ceph-mgr daemon.  This exposes the internal Ceph metrics so that Prometheus can scrape them.
  45
  46    .. code-block:: bash
  47
  48      ceph mgr module enable prometheus
  49
  50 #. Deploy a node-exporter service on every node of the cluster.  The node-exporter provides host-level metrics like CPU and memory utilization.
  51
  52    .. code-block:: bash
  53
  54      ceph orch apply node-exporter '*'
  55
  56 #. Deploy alertmanager
  57
  58    .. code-block:: bash
  59
  60      ceph orch apply alertmanager 1
  61
  62 #. Deploy prometheus.  A single prometheus instance is sufficient, but
  63    for HA you may want to deploy two.
  64
  65    .. code-block:: bash
  66
  67      ceph orch apply prometheus 1    # or 2
  68
  69 #. Deploy grafana
  70
  71    .. code-block:: bash
  72
  73      ceph orch apply grafana 1
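
     As an alternative to the individual ``ceph orch apply`` commands above, the
     same services could be described in a service specification file and applied
     in one step.  The following is a minimal sketch, not the only way to do it:
     the file name ``monitoring.yaml`` is an arbitrary choice, and the placements
     simply mirror the commands above.

     .. code-block:: bash

       # Write a multi-document service spec
       # (monitoring.yaml is a hypothetical file name)
       cat > monitoring.yaml <<'EOF'
       service_type: node-exporter
       placement:
         host_pattern: '*'
       ---
       service_type: alertmanager
       placement:
         count: 1
       ---
       service_type: prometheus
       placement:
         count: 1
       ---
       service_type: grafana
       placement:
         count: 1
       EOF

       # Expose the Ceph metrics, then hand the spec to the orchestrator
       ceph mgr module enable prometheus
       ceph orch apply -i monitoring.yaml

     Keeping the spec in a file makes it easy to review the placements or to
     re-apply them later.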
  74
  75 Cephadm takes care of the configuration of Prometheus, Grafana, and Alertmanager
  76 automatically.
  77
  78 However, there is one exception to this rule. In some setups, the Dashboard
  79 user's browser might not be able to access the Grafana URL configured in Ceph
  80 Dashboard. One such scenario is when the cluster and the accessing user are each
  81 in a different DNS zone.
  82
  83 For this case, there is an extra configuration option for Ceph Dashboard, which
  84 can be used to configure the URL for accessing Grafana by the user's browser.
  85 This value will never be altered by cephadm. To set this configuration option,
  86 issue the following command::
  87
  88   $ ceph dashboard set-grafana-frontend-api-url <grafana-server-api>
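
     For example, assuming Grafana is reachable from users' browsers at the
     hypothetical address ``https://grafana.example.com:3000``, the option could
     be set and then read back as follows (a sketch; substitute your own URL):

     .. code-block:: bash

       # grafana.example.com is a placeholder, not a value that cephadm provides
       ceph dashboard set-grafana-frontend-api-url https://grafana.example.com:3000

       # Read the value back to confirm that it was stored
       ceph dashboard get-grafana-frontend-api-url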
  89
  90 It may take a minute or two for services to be deployed.  Once
  91 completed, you should see something like this from ``ceph orch ls``:
  92
  93 .. code-block:: console
  94
  95   $ ceph orch ls
  96   NAME           RUNNING  REFRESHED  IMAGE NAME                                      IMAGE ID      SPEC
  97   alertmanager       1/1  6s ago     docker.io/prom/alertmanager:latest              0881eb8f169f  present
  98   crash              2/2  6s ago     docker.io/ceph/daemon-base:latest-master-devel  mix           present
  99   grafana            1/1  0s ago     docker.io/pcuzner/ceph-grafana-el8:latest       f77afcf0bcf6  absent
 100   node-exporter      2/2  6s ago     docker.io/prom/node-exporter:latest             e5a616e4b9cf  present
 101   prometheus         1/1  6s ago     docker.io/prom/prometheus:latest                e935122ab143  present
 102
 103 Configuring SSL/TLS for Grafana
 104 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 105
 106 ``cephadm`` will deploy Grafana using the certificate defined in the Ceph
 107 key/value store. If a certificate is not specified, ``cephadm`` will generate a
 108 self-signed certificate during deployment of the Grafana service.
 109
 110 A custom certificate can be configured using the following commands:
 111
 112 .. code-block:: bash
 113
 114   ceph config-key set mgr/cephadm/grafana_key -i $PWD/key.pem
 115   ceph config-key set mgr/cephadm/grafana_crt -i $PWD/certificate.pem
 116
 117 The ``cephadm`` manager module needs to be restarted to be able to read updates
 118 to these keys.
 119
 120 .. code-block:: bash
 121
 122   ceph orch restart mgr
 123
 124 If you already deployed Grafana, you need to redeploy the service for the
 125 configuration to be updated.
 126
 127 .. code-block:: bash
 128
 129   ceph orch redeploy grafana
 130
 131 The ``redeploy`` command also takes care of setting the right URL for Ceph
 132 Dashboard.
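
     For a test environment, the ``key.pem`` and ``certificate.pem`` files used
     above could be created as a self-signed pair with ``openssl``.  This is only a
     sketch: the common name ``grafana.example.com`` and the one-year validity are
     assumptions, and a production setup would normally use a certificate issued
     by a trusted CA.

     .. code-block:: bash

       # Generate an unencrypted 2048-bit RSA key and a self-signed certificate
       # (the subject CN is a placeholder host name)
       openssl req -x509 -newkey rsa:2048 -nodes \
         -keyout key.pem -out certificate.pem \
         -days 365 -subj "/CN=grafana.example.com"

       # Inspect the subject and expiry date before storing the files
       openssl x509 -in certificate.pem -noout -subject -enddate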
 133
 134 Using custom images
 135 ~~~~~~~~~~~~~~~~~~~
 136
 137 It is possible to install or upgrade monitoring components based on other
 138 images.  To do so, the name of the image to be used needs to be stored in the
 139 configuration first.  The following configuration options are available.
 140
 141 - ``container_image_prometheus``
 142 - ``container_image_grafana``
 143 - ``container_image_alertmanager``
 144 - ``container_image_node_exporter``
 145
 146 Custom images can be set with the ``ceph config`` command:
 147
 148 .. code-block:: bash
 149
 150      ceph config set mgr mgr/cephadm/<option_name> <value>
 151
 152 For example:
 153
 154 .. code-block:: bash
 155
 156      ceph config set mgr mgr/cephadm/container_image_prometheus prom/prometheus:v1.4.1
 157
 158 .. note::
 159
 160      Setting a custom image overrides the default value (but does not
 161      overwrite it).  The default value changes when updates become available.
 162      Once a custom image is set, the component it applies to can no longer
 163      be updated automatically.  To install updates, you will need to
 164      manually update the configuration (image name and tag) for that
 165      component.
 166
 167      If you would rather use the default images again, you can reset the
 168      custom image you set earlier.  After that, the default value will be
 169      used again.  Use ``ceph config rm`` to reset the configuration option:
 170
 171      .. code-block:: bash
 172
 173           ceph config rm mgr mgr/cephadm/<option_name>
 174
 175      For example:
 176
 177      .. code-block:: bash
 178
 179           ceph config rm mgr mgr/cephadm/container_image_prometheus
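
     Putting these pieces together, pinning Prometheus to a specific image might
     look like the sketch below.  The tag is the example value used above.  The
     final ``redeploy`` step is an assumption: it mirrors the ``redeploy`` usage
     elsewhere in this document so that an already running service picks up the
     configured image.

     .. code-block:: bash

       # Pin the image
       ceph config set mgr mgr/cephadm/container_image_prometheus prom/prometheus:v1.4.1

       # Confirm the stored value
       ceph config get mgr mgr/cephadm/container_image_prometheus

       # Redeploy so that a running Prometheus uses the configured image
       ceph orch redeploy prometheus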
 180
 181 Using custom configuration files
 182 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 183
 184 By overriding cephadm templates, it is possible to completely customize the
 185 configuration files for monitoring services.
 186
 187 Internally, cephadm already uses `Jinja2
 188 <https://jinja.palletsprojects.com/en/2.11.x/>`_ templates to generate the
 189 configuration files for all monitoring components. To customize the
 190 configuration of Prometheus, Grafana, or Alertmanager, you can store a
 191 Jinja2 template for each service; cephadm then uses it instead of the
 192 built-in template. The template is evaluated every time a service of that
 193 kind is deployed or reconfigured. That way, the custom configuration is
 194 preserved and automatically applied on future deployments of these services.
 195
 196 .. note::
 197
 198   The configuration of the custom template is also preserved when the default
 199   configuration of cephadm changes. If the updated configuration is to be used,
 200   the custom template needs to be migrated *manually*.
 201
 202 Option names
 203 """"""""""""
 204
 205 The following templates for files that will be generated by cephadm can be
 206 overridden. These are the names to be used when storing with ``ceph config-key
 207 set``:
 208
 209 - ``alertmanager_alertmanager.yml``
 210 - ``grafana_ceph-dashboard.yml``
 211 - ``grafana_grafana.ini``
 212 - ``prometheus_prometheus.yml``
 213
 214 You can look up the file templates that are currently used by cephadm in
 215 ``src/pybind/mgr/cephadm/templates``:
 216
 217 - ``services/alertmanager/alertmanager.yml.j2``
 218 - ``services/grafana/ceph-dashboard.yml.j2``
 219 - ``services/grafana/grafana.ini.j2``
 220 - ``services/prometheus/prometheus.yml.j2``
 221
 222 Usage
 223 """""
 224
 225 The following command applies a single-line value:
 226
 227 .. code-block:: bash
 228
 229   ceph config-key set mgr/cephadm/<option_name> <value>
 230
 231 To use the contents of a file as the template, pass the ``-i`` argument:
 232
 233 .. code-block:: bash
 234
 235   ceph config-key set mgr/cephadm/<option_name> -i $PWD/<filename>
 236
 237 .. note::
 238
 239   When using files as input to ``config-key`` an absolute path to the file must
 240   be used.
 241
 242 The cephadm mgr module must be restarted after a configuration option
 243 has been set. Then the configuration file for the service needs to be recreated.
 244 This is done using ``redeploy``. For more details see the following example.
 245
 246 Example
 247 """""""
 248
 249 .. code-block:: bash
 250
 251   # set the contents of ./prometheus.yml.j2 as template
 252   ceph config-key set mgr/cephadm/services_prometheus_prometheus.yml \
 253     -i $PWD/prometheus.yml.j2
 254
 255   # restart cephadm mgr module
 256   ceph orch restart mgr
 257
 258   # redeploy the prometheus service
 259   ceph orch redeploy prometheus
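
     To check which template is stored, or to go back to the template shipped with
     cephadm, the stored key can be read or removed with the generic
     ``config-key`` subcommands.  This is a sketch: ``<option_name>`` stands for
     one of the names listed under "Option names", and it assumes that cephadm
     falls back to its built-in template once the key is gone.

     .. code-block:: bash

       # show the stored custom template
       ceph config-key get mgr/cephadm/<option_name>

       # remove it so that the built-in template is used again
       ceph config-key rm mgr/cephadm/<option_name>

       # as above, restart the mgr and regenerate the configuration file
       ceph orch restart mgr
       ceph orch redeploy prometheus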
 260
 261 Disabling monitoring
 262 --------------------
 263
 264 If you have deployed monitoring and would like to remove it, you can do
 265 so with:
 266
 267 .. code-block:: bash
 268
 269   ceph orch rm grafana
 270   ceph orch rm prometheus --force   # this will delete metrics data collected so far
 271   ceph orch rm node-exporter
 272   ceph orch rm alertmanager
 273   ceph mgr module disable prometheus
 274
 275
 276 Deploying monitoring manually
 277 -----------------------------
 278
 279 If you have an existing Prometheus monitoring infrastructure, or would like
 280 to manage it yourself, you need to configure it to integrate with your Ceph
 281 cluster.
 282
 283 * Enable the prometheus module in the ceph-mgr daemon
 284
 285   .. code-block:: bash
 286
 287      ceph mgr module enable prometheus
 288
 289   By default, ceph-mgr presents Prometheus metrics on port 9283 on each host
 290   running a ceph-mgr daemon.  Configure Prometheus to scrape these endpoints.
 291
 292 * To enable the dashboard's prometheus-based alerting, see :ref:`dashboard-alerting`.
 293
 294 * To enable dashboard integration with Grafana, see :ref:`dashboard-grafana`.
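
     For the scrape configuration mentioned in the first step, a Prometheus job
     might look like the following sketch.  The host names are placeholders for the
     hosts running ceph-mgr daemons, and ``ceph-scrape.yml`` is a hypothetical file
     name; merge the job into the ``scrape_configs`` section of your own
     ``prometheus.yml``.

     .. code-block:: bash

       # Write an example scrape job for the ceph-mgr endpoints on port 9283
       cat > ceph-scrape.yml <<'EOF'
       scrape_configs:
         - job_name: 'ceph'
           # keep the labels exported by Ceph instead of overwriting them
           honor_labels: true
           static_configs:
             - targets:
                 - 'ceph-mgr-a.example.com:9283'
                 - 'ceph-mgr-b.example.com:9283'
       EOF

     The endpoint served by the currently active manager can be looked up with
     ``ceph mgr services``.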
 295
 296 Enabling RBD-Image monitoring
 297 -----------------------------
 298
 299 For performance reasons, monitoring of RBD images is disabled by default. For more information, see
 300 :ref:`prometheus-rbd-io-statistics`. While it is disabled, the overview and details dashboards stay empty in Grafana
 301 and the metrics are not visible in Prometheus.
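
     According to the prometheus module's documentation, RBD image statistics are
     switched on by listing the pools to monitor in the module's
     ``rbd_stats_pools`` option.  A sketch, with ``pool1`` and ``pool2`` as
     placeholder pool names:

     .. code-block:: bash

       # collect per-image RBD IO statistics for the two named pools
       ceph config set mgr mgr/prometheus/rbd_stats_pools "pool1,pool2"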