.. _rados-troubleshooting-mon:

========================
Troubleshooting Monitors
========================

.. index:: monitor, high availability

If a cluster encounters monitor-related problems, this does not necessarily
mean that the cluster is in danger of going down. Even if multiple monitors
are lost, the cluster can still be up and running, as long as there are
enough surviving monitors to form a quorum.

However serious your cluster's monitor-related problems might be, we recommend
that you take the following troubleshooting steps.


Initial Troubleshooting
=======================

**Are the monitors running?**

  First, make sure that the monitor (*mon*) daemon processes (``ceph-mon``)
  are running. Sometimes Ceph admins either forget to start the mons or
  forget to restart the mons after an upgrade. Checking for this simple
  oversight can save hours of painstaking troubleshooting. It is also
  important to make sure that the manager daemons (``ceph-mgr``) are running.
  Remember that typical cluster configurations provide one ``ceph-mgr`` for
  each ``ceph-mon``.

  .. note:: Rook will not run more than two managers.
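
  For example, on a node whose Ceph daemons are managed directly by systemd,
  you can check the monitor's unit. The unit name below is the conventional
  one and is an assumption; containerized (for example, cephadm) deployments
  name their units differently:

  .. prompt:: bash $

     systemctl status ceph-mon@$(hostname -s)

  On clusters deployed with cephadm, ``ceph orch ps --daemon-type mon`` lists
  the monitor daemons and reports their current state.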

**Can you reach the monitor nodes?**

  In certain rare cases, there may be ``iptables`` rules that block access to
  monitor nodes or TCP ports. These rules might be left over from earlier
  stress testing or rule development. To check for the presence of such
  rules, SSH into each server and then try to connect to the monitor's ports
  (``tcp/3300`` and ``tcp/6789``) using ``telnet``, ``nc``, or a similar
  tool.
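
  For example, ``nc`` can test each port in turn (``mon-host`` here is a
  placeholder; substitute a real monitor hostname or address):

  .. prompt:: bash $

     nc -zv mon-host 3300
     nc -zv mon-host 6789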

**Does the ``ceph status`` command run and receive a reply from the cluster?**

  If the ``ceph status`` command receives a reply from the cluster, then the
  cluster is up and running. The monitors answer a ``status`` request only if
  there is a formed quorum. Also confirm that at least one ``mgr`` daemon is
  reported as running; under ideal conditions, all ``mgr`` daemons will be
  reported as running.

  If the ``ceph status`` command does not receive a reply from the cluster,
  then there are probably not enough monitors ``up`` to form a quorum. The
  ``ceph -s`` command with no further options specified connects to an
  arbitrarily selected monitor. In certain cases, however, it might be
  helpful to connect to a specific monitor (or to several specific monitors
  in sequence) by adding the ``-m`` flag to the command: for example, ``ceph
  status -m mymon1``.

**None of this worked. What now?**

  If the above solutions have not resolved your problems, you might find it
  helpful to examine each individual monitor in turn. Whether or not a quorum
  has been formed, it is possible to contact each monitor individually and
  request its status by using the ``ceph tell mon.ID mon_status`` command
  (here ``ID`` is the monitor's identifier).

  Run the ``ceph tell mon.ID mon_status`` command for each monitor in the
  cluster. For more on this command's output, see :ref:`Understanding
  mon_status <rados_troubleshoting_troubleshooting_mon_understanding_mon_status>`.

  There is also an alternative method: SSH into each monitor node and query
  the daemon's admin socket. See :ref:`Using the Monitor's Admin Socket
  <rados_troubleshoting_troubleshooting_mon_using_admin_socket>`.
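
  For example, in a cluster whose monitors are named ``a``, ``b``, and ``c``
  (hypothetical names used here for illustration), this check can be run as a
  simple loop:

  .. prompt:: bash $

     for m in a b c; do ceph tell mon.$m mon_status; done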

.. _rados_troubleshoting_troubleshooting_mon_using_admin_socket:

Using the monitor's admin socket
================================

A monitor's admin socket allows you to interact directly with a specific
daemon by using a Unix socket file. This file is found in the monitor's
``run`` directory. By default, the admin socket is located at
``/var/run/ceph/ceph-mon.ID.asok``, but this location can be overridden and
the admin socket might be elsewhere, especially if your cluster's daemons are
deployed in containers. If you cannot find it, either check your ``ceph.conf``
for an alternative path or run the following command:

.. prompt:: bash $

   ceph-conf --name mon.ID --show-config-value admin_socket

The admin socket is available for use only when the monitor daemon is
running. Whenever the monitor has been properly shut down, the admin socket
is removed. However, if the monitor is not running and the admin socket
persists, it is likely that the monitor has been improperly shut down. In any
case, if the monitor is not running, it will be impossible to use the admin
socket, and the ``ceph`` command is likely to return ``Error 111: Connection
Refused``.

To access the admin socket, run a ``ceph tell`` command of the following form
(specifying the daemon that you are interested in):

.. prompt:: bash $

   ceph tell mon.<id> mon_status

This command passes the ``mon_status`` command to the specified running
monitor daemon ``<id>`` via its admin socket. If you know the full path to
the admin socket file, this can be done more directly by running the
following command:

.. prompt:: bash $

   ceph --admin-daemon <full_path_to_asok_file> <command>

Passing ``help`` as the command lists all of the commands that are supported
through the admin socket. See especially ``config get``, ``config show``,
``mon stat``, and ``quorum_status``, as those can be enlightening when
troubleshooting a monitor.
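
For example, to ask the monitor ``mon.a`` for its quorum status through the
default socket path (the monitor name and path here are illustrative;
substitute your own):

.. prompt:: bash $

   ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok quorum_status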

.. _rados_troubleshoting_troubleshooting_mon_understanding_mon_status:

Understanding mon_status
========================

The status of a monitor (as reported by the ``ceph tell mon.X mon_status``
command) can always be obtained via the admin socket. This command outputs a
great deal of information about the monitor, including the information found
in the output of the ``quorum_status`` command.

To understand this command's output, let us consider the following example,
in which we see the output of ``ceph tell mon.c mon_status``::

  { "name": "c",
    "rank": 2,
    "state": "peon",
    "election_epoch": 38,
    "quorum": [
          1,
          2],
    "outside_quorum": [],
    "extra_probe_peers": [],
    "sync_provider": [],
    "monmap": { "epoch": 3,
        "fsid": "5c4e9d53-e2e1-478a-8061-f543f8be4cf8",
        "modified": "2013-10-30 04:12:01.945629",
        "created": "2013-10-29 14:14:41.914786",
        "mons": [
              { "rank": 0,
                "name": "a",
                "addr": "127.0.0.1:6789\/0"},
              { "rank": 1,
                "name": "b",
                "addr": "127.0.0.1:6790\/0"},
              { "rank": 2,
                "name": "c",
                "addr": "127.0.0.1:6795\/0"}]}}

It is clear that there are three monitors in the monmap (*a*, *b*, and *c*),
that the quorum is formed by only two monitors, and that *c* is in the quorum
as a *peon*.

**Which monitor is out of the quorum?**

  The answer is **a** (that is, ``mon.a``).

**Why?**

  When the ``quorum`` set is examined, there are clearly two monitors in the
  set: *1* and *2*. But these are not monitor names. They are monitor ranks,
  as established in the current ``monmap``. The ``quorum`` set does not
  include the monitor that has rank 0, and according to the ``monmap`` that
  monitor is ``mon.a``.

**How are monitor ranks determined?**

  Monitor ranks are calculated (or recalculated) whenever monitors are added
  to or removed from the cluster. The calculation of ranks follows a simple
  rule: the **lower** the ``IP:PORT`` combination, the **higher** the rank,
  with rank 0 the highest. In this case, because ``127.0.0.1:6789`` is lower
  than the other two ``IP:PORT`` combinations, ``mon.a`` has the highest
  rank: namely, rank 0.
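
  To see the ranks and ``IP:PORT`` combinations recorded in the current
  monmap, dump it with the following command (the exact output format varies
  by release):

  .. prompt:: bash $

     ceph mon dump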

Most Common Monitor Issues
==========================

socket as explained in `Using the monitor's admin socket`_ and
`Understanding mon_status`_.

  If the monitor is out of the quorum, then its state will be one of
  ``probing``, ``electing``, or ``synchronizing``. If the state is reported
  as ``leader`` or ``peon``, then the monitor believes itself to be in the
  quorum but the rest of the cluster is sure that it is not. Alternatively,
  the monitor might have joined the quorum while you were troubleshooting it:
  run ``ceph status`` again to make sure. Proceed only if the monitor is not
  yet in the quorum.

What if the state is ``probing``?

  This means that the monitor is still looking for the other monitors. Every
  time you start a monitor, it will stay in this state for some time while it
  tries to connect to the rest of the monitors specified in the ``monmap``.
  The amount of time that a monitor spends in this state can vary. For
  instance, in a single-monitor cluster (which should never be used in
  production), the monitor passes through the probing state almost
  instantaneously. In a multi-monitor cluster, the monitors stay in this
  state until they find enough monitors to form a quorum: this means that if
  two out of three monitors are down, the one remaining monitor stays in this
  state indefinitely until you bring one of the other monitors up.

  If you have a quorum, the starting daemon should be able to find the
  other monitors quickly, as long as they can be reached. If your

How do I know there's a clock skew?

  The monitors will warn you via the cluster status ``HEALTH_WARN``. ``ceph
  health detail`` or ``ceph status`` should show something like::

      mon.c addr 10.10.0.1:6789/0 clock skew 0.08235s > max 0.05s (latency 0.0045s)

  That means that ``mon.c`` has been flagged as suffering from a clock skew.

  On releases beginning with Luminous, you can issue the ``ceph
  time-sync-status`` command to check status. Note that the lead mon is
  typically the one with the numerically lowest IP address. It will always
  show ``0``: the reported offsets of the other mons are relative to the lead
  mon, not to any external reference source.
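
  Independently of Ceph, it is worth confirming whether the node's clock is
  actually synchronized. For example, on a node that runs chrony (this
  assumes chrony; on a node that runs ntpd, use ``ntpq -p`` instead), the
  following command reports the node's current offset from its time sources:

  .. prompt:: bash $

     chronyc tracking
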
What should I do if there's a clock skew?
with troubleshooting Ceph. For most situations, setting the following options
on your monitors will be enough to pinpoint a potential source of the issue::

    debug_mon = 10
    debug_ms = 1
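
These options can be set in ``ceph.conf`` and applied by restarting the
daemon. On recent releases, they can usually also be injected at runtime
without a restart; the monitor ID ``a`` below is illustrative:

.. prompt:: bash $

   ceph tell mon.a config set debug_mon 10
   ceph tell mon.a config set debug_ms 1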

If we find that these debug levels are not enough, there's a chance that we
may ask you to raise them, or even to define other debug subsystems from
which to obtain more detailed information.