ceph/doc/cephadm/monitoring.rst

Monitoring Stack with Cephadm
=============================

Ceph Dashboard uses `Prometheus <https://prometheus.io/>`_, `Grafana
<https://grafana.com/>`_, and related tools to store and visualize detailed
metrics on cluster utilization and performance.  Ceph users have three options:

#. Have cephadm deploy and configure these services.  This is the default
   when bootstrapping a new cluster unless the ``--skip-monitoring-stack``
   option is used.
#. Deploy and configure these services manually.  This is recommended for users
   with existing Prometheus services in their environment (and in cases where
   Ceph is running in Kubernetes with Rook).
#. Skip the monitoring stack completely.  Some Ceph Dashboard graphs will
   not be available.

The monitoring stack consists of `Prometheus <https://prometheus.io/>`_,
Prometheus exporters (:ref:`mgr-prometheus`, `Node exporter
<https://prometheus.io/docs/guides/node-exporter/>`_), `Prometheus Alert
Manager <https://prometheus.io/docs/alerting/alertmanager/>`_ and `Grafana
<https://grafana.com/>`_.

.. note::

  Prometheus' security model presumes that untrusted users have access to the
  Prometheus HTTP endpoint and logs. Untrusted users have access to all the
  (meta)data Prometheus collects that is contained in the database, plus a
  variety of operational and debugging information.

  However, Prometheus' HTTP API is limited to read-only operations.
  Configurations can *not* be changed using the API and secrets are not
  exposed. Moreover, Prometheus has some built-in measures to mitigate the
  impact of denial of service attacks.

  Please see `Prometheus' Security model
  <https://prometheus.io/docs/operating/security/>`_ for more detailed
  information.

By default, bootstrap will deploy a basic monitoring stack.  If you
did not do this (by passing ``--skip-monitoring-stack``), or if you
converted an existing cluster to cephadm management, you can set up
monitoring by following the steps below.

#. Enable the prometheus module in the ceph-mgr daemon.  This exposes the
   internal Ceph metrics so that Prometheus can scrape them::

     ceph mgr module enable prometheus

#. Deploy a node-exporter service on every node of the cluster.  The
   node-exporter provides host-level metrics like CPU and memory utilization::

     ceph orch apply node-exporter '*'

#. Deploy alertmanager::

     ceph orch apply alertmanager 1

#. Deploy prometheus.  A single prometheus instance is sufficient, but
   for HA you may want to deploy two::

     ceph orch apply prometheus 1    # or 2

#. Deploy grafana::

     ceph orch apply grafana 1

Cephadm handles the prometheus, grafana, and alertmanager
configurations automatically.

It may take a minute or two for services to be deployed.  Once
completed, you should see something like this from ``ceph orch ls``::

  $ ceph orch ls
  NAME           RUNNING  REFRESHED  IMAGE NAME                                      IMAGE ID        SPEC
  alertmanager       1/1  6s ago     docker.io/prom/alertmanager:latest              0881eb8f169f  present
  crash              2/2  6s ago     docker.io/ceph/daemon-base:latest-master-devel  mix           present
  grafana            1/1  0s ago     docker.io/pcuzner/ceph-grafana-el8:latest       f77afcf0bcf6   absent
  node-exporter      2/2  6s ago     docker.io/prom/node-exporter:latest             e5a616e4b9cf  present
  prometheus         1/1  6s ago     docker.io/prom/prometheus:latest                e935122ab143  present
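
As a quick check that the prometheus module enabled in the first step is
publishing its endpoint, you can look for it in the output of
``ceph mgr services``, which prints a JSON map of module names to URLs.  The
following is a minimal sketch and not part of cephadm: the function name
``check_prometheus_endpoint`` is hypothetical, and the check only looks for
the module name in the output::

  # Hypothetical helper: succeeds if ceph-mgr lists a prometheus endpoint.
  check_prometheus_endpoint() {
      # `ceph mgr services` prints a JSON map of module name to URL.
      if ceph mgr services | grep -q '"prometheus"'; then
          echo "prometheus module endpoint is published"
      else
          echo "prometheus module endpoint not found" >&2
          return 1
      fi
  }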

Using custom images
~~~~~~~~~~~~~~~~~~~

It is possible to install or upgrade monitoring components based on other
images.  To do so, the name of the image to be used needs to be stored in the
configuration first.  The following configuration options are available.

- ``container_image_prometheus``
- ``container_image_grafana``
- ``container_image_alertmanager``
- ``container_image_node_exporter``

Custom images can be set with the ``ceph config`` command::

     ceph config set mgr mgr/cephadm/<option_name> <value>

For example::

     ceph config set mgr mgr/cephadm/container_image_prometheus prom/prometheus:v1.4.1
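
To review which image is currently configured for each component, the four
options listed above can be read back with ``ceph config get``.  This is a
minimal sketch under the assumption that the options live under
``mgr/cephadm/`` as shown above; the function name
``show_monitoring_images`` is hypothetical::

  # Hypothetical helper: print the configured image for each component.
  show_monitoring_images() {
      for component in prometheus grafana alertmanager node_exporter; do
          option="mgr/cephadm/container_image_${component}"
          # `ceph config get <who> <option>` prints the value in effect.
          printf '%s = %s\n' "$option" "$(ceph config get mgr "$option")"
      done
  }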

.. note::

     Setting a custom image overrides the default value (but does not
     overwrite it).  The default value changes when updates become available.
     Once you have set a custom image for a component, that component can no
     longer be updated automatically.  You will need to manually update the
     configuration (image name and tag) to be able to install updates.

     If you later decide to use the default image instead, you can reset the
     custom image you set before.  After that, the default value will be
     used again.  Use ``ceph config rm`` to reset the configuration option::

          ceph config rm mgr mgr/cephadm/<option_name>

     For example::

          ceph config rm mgr mgr/cephadm/container_image_prometheus

Disabling monitoring
--------------------

If you have deployed monitoring and would like to remove it, you can do
so with::

  ceph orch rm grafana
  ceph orch rm prometheus --force   # this will delete metrics data collected so far
  ceph orch rm node-exporter
  ceph orch rm alertmanager
  ceph mgr module disable prometheus


Deploying monitoring manually
-----------------------------

If you have an existing Prometheus monitoring infrastructure, or would like
to manage it yourself, you need to configure it to integrate with your Ceph
cluster.

* Enable the prometheus module in the ceph-mgr daemon::

     ceph mgr module enable prometheus

  By default, ceph-mgr presents Prometheus metrics on port 9283 on each host
  running a ceph-mgr daemon.  Configure Prometheus to scrape these.

* To enable the dashboard's Prometheus-based alerting, see :ref:`dashboard-alerting`.

* To enable dashboard integration with Grafana, see :ref:`dashboard-grafana`.
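
For the scrape step in the first item above, an externally managed Prometheus
needs a scrape job that targets port 9283 on every host running ceph-mgr.  The
sketch below prints such a job in Prometheus' ``scrape_configs`` format.  The
function name ``gen_ceph_scrape_config``, the job name ``ceph``, and the host
names ``ceph-mgr-a`` and ``ceph-mgr-b`` are assumptions for illustration;
merge the output into your own ``prometheus.yml`` as appropriate::

  # Hypothetical helper: emit a scrape job for the given ceph-mgr hosts.
  gen_ceph_scrape_config() {
      printf 'scrape_configs:\n'
      printf "  - job_name: 'ceph'\n"
      printf '    static_configs:\n'
      printf '      - targets:\n'
      for host in "$@"; do
          # 9283 is the default port of the ceph-mgr prometheus module.
          printf "          - '%s:9283'\n" "$host"
      done
  }

  # Example host names; replace them with the hosts running ceph-mgr.
  gen_ceph_scrape_config ceph-mgr-a ceph-mgr-b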

Enabling RBD-Image monitoring
-----------------------------

For performance reasons, monitoring of RBD images is disabled by default. For
more information, see :ref:`prometheus-rbd-io-statistics`. While it is
disabled, the overview and details dashboards stay empty in Grafana and the
metrics are not visible in Prometheus.
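
The referenced section describes the ``mgr/prometheus/rbd_stats_pools``
option, which takes a comma-separated list of pools whose RBD images should be
monitored.  A minimal sketch of setting it follows; the function name
``enable_rbd_stats`` and the pool names in the usage comment are hypothetical,
and the referenced section remains the authoritative description of the
option::

  # Hypothetical helper: enable RBD statistics for the given pools.
  # Usage: enable_rbd_stats pool1 pool2
  enable_rbd_stats() {
      # Join the pool names with commas, e.g. "pool1,pool2".
      pools="$(IFS=,; echo "$*")"
      ceph config set mgr mgr/prometheus/rbd_stats_pools "$pools"
  }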