=================
 Troubleshooting
=================

Slow/stuck operations
=====================

If you are experiencing apparently hung operations, the first task is to
identify where the problem is occurring: in the client, the MDS, or the network
connecting them. Start by looking to see if either side has stuck operations
(:ref:`slow_requests`, below), and narrow it down from there.

We can get hints about what's going on by dumping the MDS cache::

  ceph daemon mds.<name> dump cache /tmp/dump.txt

.. note:: The file `dump.txt` is written on the machine running the MDS. For
   systemd-controlled MDS services this is a tmpfs inside the MDS container, so
   use `nsenter(1)` to locate `dump.txt`, or specify another system-wide path.

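As a minimal sketch of the `nsenter(1)` approach (the process lookup is an
assumption; adjust it to however your MDS is deployed), you can read the dump
from the host by entering the MDS process's mount namespace:

.. code:: bash

   # Find the MDS PID on the host, then read /tmp/dump.txt from inside
   # the container's mount namespace.
   MDS_PID=$(pgrep -x ceph-mds | head -n1)
   sudo nsenter --target "$MDS_PID" --mount cat /tmp/dump.txt > /root/dump.txt
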
If high logging levels are set on the MDS, that will almost certainly hold the
information we need to diagnose and solve the issue.

Stuck during recovery
=====================

Stuck in up:replay
------------------

If your MDS is stuck in ``up:replay`` then it is likely that the journal is
very long. Did you see ``MDS_HEALTH_TRIM`` cluster warnings saying the MDS is
behind on trimming its journal? If the journal has grown very large, it can
take hours to read. There is no way to work around this, but there are things
you can do to speed it along:

Reduce MDS debugging to 0. Even at the default settings, the MDS logs some
messages to memory for dumping if a fatal error is encountered. You can avoid
this:

.. code:: bash

   ceph config set mds debug_mds 0
   ceph config set mds debug_ms 0
   ceph config set mds debug_monc 0

Note that if the MDS fails there will then be virtually no information to
determine why. If you can calculate when ``up:replay`` will complete, you
should restore these configs just prior to entering the next state:

.. code:: bash

   ceph config rm mds debug_mds
   ceph config rm mds debug_ms
   ceph config rm mds debug_monc

Once you've got replay moving along faster, you can calculate when the MDS will
complete. This is done by examining the journal replay status:

.. code:: bash

   $ ceph tell mds.<fs_name>:0 status | jq .replay_status
   {
     "journal_read_pos": 4195244,
     "journal_write_pos": 4195244,
     "journal_expire_pos": 4194304,
     "num_events": 2,
     "num_segments": 2
   }

Replay completes when the ``journal_read_pos`` reaches the
``journal_write_pos``. The write position will not change during replay. Track
the progression of the read position to compute the expected time to complete.

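For example, here is a small sketch (a hypothetical helper, assuming ``jq`` is
installed and that rank 0 of the file system is the replaying MDS) that samples
the read position twice and estimates the remaining replay time:

.. code:: bash

   #!/usr/bin/env bash
   # Usage: ./replay-eta.sh <fs_name>
   # Samples journal_read_pos twice and estimates when replay will finish.
   FS="$1"
   read_pos() { ceph tell "mds.$FS:0" status | jq .replay_status.journal_read_pos; }
   POS1=$(read_pos); sleep 60; POS2=$(read_pos)
   WRITE_POS=$(ceph tell "mds.$FS:0" status | jq .replay_status.journal_write_pos)
   RATE=$(( (POS2 - POS1) / 60 ))   # journal bytes replayed per second
   if [ "$RATE" -gt 0 ]; then
       echo "Estimated seconds until replay completes: $(( (WRITE_POS - POS2) / RATE ))"
   else
       echo "Read position did not advance; sample over a longer interval."
   fi
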

Avoiding recovery roadblocks
----------------------------

When trying to urgently restore your file system during an outage, here are
some things to do:

* **Deny all reconnect to clients.** This effectively blocklists all existing
  CephFS sessions so all mounts will hang or become unavailable.

.. code:: bash

   ceph config set mds mds_deny_all_reconnect true

  Remember to undo this after the MDS becomes active (see the cleanup sketch
  after this list).

.. note:: This does not prevent new sessions from connecting. For that, see the
   ``refuse_client_session`` file system setting.

* **Extend the MDS heartbeat grace period**. This avoids replacing an MDS that
  appears "stuck" doing some operation. Sometimes recovery of an MDS may
  involve an operation that takes longer than expected (from the programmer's
  perspective). This is more likely when recovery is already taking longer than
  normal to complete (indicated by your reading this document). Avoid
  unnecessary replacement loops by extending the heartbeat grace period:

.. code:: bash

   ceph config set mds mds_heartbeat_grace 3600

  This has the effect of having the MDS continue to send beacons to the
  monitors even when its internal "heartbeat" mechanism has not been reset
  (beat) in one hour. Note that the previous mechanism for achieving this was
  the `mds_beacon_grace` monitor setting.

* **Disable open file table prefetch.** Normally, the MDS will prefetch
  directory contents during recovery to heat up its cache. During a long
  recovery, the cache is probably already hot **and large**, so this behavior
  can be undesirable. Disable it using:

.. code:: bash

   ceph config set mds mds_oft_prefetch_dirfrags false

* **Turn off clients.** Clients reconnecting to the newly ``up:active`` MDS may
  cause new load on the file system when it's just getting back on its feet.
  There will likely be some general maintenance to do before workloads should
  be resumed. For example, expediting journal trim may be advisable if the
  recovery took a long time because replay was reading an overly large journal.

  You can do this manually or use the new file system tunable:

.. code:: bash

   ceph fs set <fs_name> refuse_client_session true

  That prevents any clients from establishing new sessions with the MDS.

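Once the MDS is back in ``up:active`` and the cluster has settled, remember to
revert these temporary changes. A sketch of the corresponding cleanup, using
the same ``config rm`` pattern shown earlier (adjust to whichever settings you
actually changed):

.. code:: bash

   ceph config rm mds mds_deny_all_reconnect
   ceph config rm mds mds_heartbeat_grace
   ceph config rm mds mds_oft_prefetch_dirfrags
   ceph fs set <fs_name> refuse_client_session false
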
Expediting MDS journal trim
===========================

If your MDS journal grew too large (maybe your MDS was stuck in ``up:replay``
for a long time!), you will want to have the MDS trim its journal more
frequently. You will know the journal is too large because of
``MDS_HEALTH_TRIM`` warnings.

The main way to do this is to modify the MDS tick interval. The "tick" interval
drives several upkeep activities in the MDS. It is strongly recommended that no
significant file system load be present when modifying this tick interval. This
setting only affects an MDS in ``up:active``; the MDS does not trim its journal
during recovery.

.. code:: bash

   ceph config set mds mds_tick_interval 2

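Once the ``MDS_HEALTH_TRIM`` warnings clear, you will likely want to drop this
override so the tick interval returns to its default:

.. code:: bash

   ceph config rm mds mds_tick_interval
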
RADOS Health
============

If part of the CephFS metadata or data pools is unavailable and CephFS is not
responding, it is probably because RADOS itself is unhealthy. Resolve those
problems first (:doc:`../rados/troubleshooting/index`).

The MDS
=======

If an operation is hung inside the MDS, it will eventually show up in
``ceph health``, identifying "slow requests are blocked". It may also identify
clients as "failing to respond" or misbehaving in other ways. If the MDS
identifies specific clients as misbehaving, you should investigate why they are
doing so.

Generally it will be the result of:

#. Overloading the system (if you have extra RAM, increase the
   "mds cache memory limit" config from its default 1GiB; having a larger
   active file set than your MDS cache is the #1 cause of this!).

#. Running an older (misbehaving) client.

#. Underlying RADOS issues.

Otherwise, you have probably discovered a new bug and should report it to
the developers!

.. _slow_requests:

Slow requests (MDS)
-------------------

You can list current operations via the admin socket by running::

  ceph daemon mds.<name> dump_ops_in_flight

from the MDS host. Identify the stuck commands and examine why they are stuck.
Usually the last "event" will have been an attempt to gather locks, or sending
the operation off to the MDS log. If it is waiting on the OSDs, fix them. If
operations are stuck on a specific inode, you probably have a client holding
caps which prevent others from using it, either because the client is trying
to flush out dirty data or because you have encountered a bug in CephFS'
distributed file lock code (the file "capabilities" ["caps"] system).

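For a quick overview of what is stuck and why, you can filter the dump with
``jq`` (a sketch only; the exact field layout of the dump may vary between
releases):

.. code:: bash

   ceph daemon mds.<name> dump_ops_in_flight | \
       jq '.ops[] | {description, age, flag_point: .type_data.flag_point}'
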
If it's a result of a bug in the capabilities code, restarting the MDS
is likely to resolve the problem.

If there are no slow requests reported on the MDS, and it is not reporting
that clients are misbehaving, either the client has a problem or its
requests are not reaching the MDS.

.. _ceph_fuse_debugging:

ceph-fuse debugging
===================

ceph-fuse also supports ``dump_ops_in_flight``. See if it has any and where
they are stuck.

Debug output
------------

To get more debugging information from ceph-fuse, try running it in the
foreground with logging to the console (``-d``), with client debugging enabled
(``--debug-client=20``), and with per-message prints enabled (``--debug-ms=1``).

If you suspect a potential monitor issue, enable monitor debugging as well
(``--debug-monc=20``).

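Putting those options together, an invocation might look like the following
(``/mnt/cephfs`` is only an example mount point):

.. code:: bash

   ceph-fuse -d --debug-client=20 --debug-ms=1 --debug-monc=20 /mnt/cephfs
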
.. _kernel_mount_debugging:

Kernel mount debugging
======================

If there is an issue with the kernel client, the most important thing is
figuring out whether the problem is with the kernel client or the MDS.
Generally, this is easy to work out. If the kernel client broke directly, there
will be output in ``dmesg``. Collect it and any inappropriate kernel state.

Slow requests
-------------

Unfortunately the kernel client does not support the admin socket, but it has
similar (if limited) interfaces if your kernel has debugfs enabled. There will
be a folder in ``/sys/kernel/debug/ceph/``, and that folder (whose name will
look something like ``28f7427e-5558-4ffd-ae1a-51ec3042759a.client25386880``)
will contain a variety of files that produce interesting output when you
``cat`` them. These files are described below; the most interesting when
debugging slow requests are probably the ``mdsc`` and ``osdc`` files (see the
example after the list).

* bdi: BDI info about the Ceph system (blocks dirtied, written, etc)
* caps: counts of file "caps" structures in-memory and used
* client_options: dumps the options provided to the CephFS mount
* dentry_lru: Dumps the CephFS dentries currently in-memory
* mdsc: Dumps current requests to the MDS
* mdsmap: Dumps the current MDSMap epoch and MDSes
* mds_sessions: Dumps the current sessions to MDSes
* monc: Dumps the current maps from the monitor, and any "subscriptions" held
* monmap: Dumps the current monitor map epoch and monitors
* osdc: Dumps the current ops in-flight to OSDs (i.e., file data I/O)
* osdmap: Dumps the current OSDMap epoch, pools, and OSDs

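For example, to dump the in-flight MDS and OSD requests for every CephFS mount
on a host (run as root, since ``/sys/kernel/debug`` is normally only accessible
to root):

.. code:: bash

   for f in /sys/kernel/debug/ceph/*/{mdsc,osdc}; do
       echo "== $f"
       cat "$f"
   done
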
If the data pool is in a NEARFULL condition, then the kernel CephFS client
will switch to doing writes synchronously, which is quite slow.

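You can check for this condition by looking at cluster health and pool usage,
for example:

.. code:: bash

   ceph health detail | grep -i nearfull
   ceph df
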
Disconnected+Remounted FS
=========================

Because CephFS has a "consistent cache", if your network connection is
disrupted for a long enough time, the client will be forcibly
disconnected from the system. At this point, the kernel client is in
a bind: it cannot safely write back dirty data, and many applications
do not handle IO errors correctly on close().
At the moment, the kernel client will remount the FS, but outstanding file
system IO may or may not be satisfied. In these cases, you may need to reboot
your client system.

You can identify that you are in this situation if dmesg/kern.log reports
something like::

   Jul 20 08:14:38 teuthology kernel: [3677601.123718] ceph: mds0 closed our session
   Jul 20 08:14:38 teuthology kernel: [3677601.128019] ceph: mds0 reconnect start
   Jul 20 08:14:39 teuthology kernel: [3677602.093378] ceph: mds0 reconnect denied
   Jul 20 08:14:39 teuthology kernel: [3677602.098525] ceph: dropping dirty+flushing Fw state for ffff8802dc150518 1099935956631
   Jul 20 08:14:39 teuthology kernel: [3677602.107145] ceph: dropping dirty+flushing Fw state for ffff8801008e8518 1099935946707
   Jul 20 08:14:39 teuthology kernel: [3677602.196747] libceph: mds0 172.21.5.114:6812 socket closed (con state OPEN)
   Jul 20 08:14:40 teuthology kernel: [3677603.126214] libceph: mds0 172.21.5.114:6812 connection reset
   Jul 20 08:14:40 teuthology kernel: [3677603.132176] libceph: reset on mds0

This is an area of ongoing work to improve the behavior. Kernels will soon
be reliably issuing error codes to in-progress IO, although your application(s)
may not deal with them well. In the longer term, we hope to allow reconnect
and reclaim of data in cases where it won't violate POSIX semantics (generally,
data which hasn't been accessed or modified by other clients).

Mounting
========

Mount 5 Error
-------------

A mount 5 error typically occurs if an MDS server is laggy or if it crashed.
Ensure at least one MDS is up and running, and the cluster is ``active +
healthy``.

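You can check the MDS and overall cluster state with, for example::

   ceph fs status
   ceph -s
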
Mount 12 Error
--------------

A mount 12 error with ``cannot allocate memory`` usually occurs if you have a
version mismatch between the :term:`Ceph Client` version and the :term:`Ceph
Storage Cluster` version. Check the versions using::

  ceph -v

If the Ceph Client is behind the Ceph cluster, try to upgrade it::

  sudo apt-get update && sudo apt-get install ceph-common

You may need to uninstall, autoclean and autoremove ``ceph-common``
and then reinstall it so that you have the latest version.

Dynamic Debugging
=================

You can enable dynamic debug against the CephFS module.

Please see: https://github.com/ceph/ceph/blob/master/src/script/kcon_all.sh

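As a minimal sketch of what that script does (assuming your kernel was built
with ``CONFIG_DYNAMIC_DEBUG`` and debugfs is mounted), you can enable dynamic
debug prints for the CephFS and libceph kernel modules with:

.. code:: bash

   # Enable all dynamic debug messages in the ceph and libceph modules;
   # use '-p' instead of '+p' to disable them again.
   echo 'module ceph +p' | sudo tee /sys/kernel/debug/dynamic_debug/control
   echo 'module libceph +p' | sudo tee /sys/kernel/debug/dynamic_debug/control
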
In-memory Log Dump
==================

In-memory logs can be dumped by setting ``mds_extraordinary_events_dump_interval``
while running at a lower debug level (log level < 10).
``mds_extraordinary_events_dump_interval`` is the interval, in seconds, at which
the recent in-memory logs are dumped when an extraordinary event occurs.

The extraordinary events are classified as:

* Client Eviction
* Missed Beacon ACK from the monitors
* Missed Internal Heartbeats

The in-memory log dump is disabled by default to prevent log file bloat in a
production environment. The following commands, run in order, enable it::

    $ ceph config set mds debug_mds <log_level>/<gather_level>
    $ ceph config set mds mds_extraordinary_events_dump_interval <seconds>

The ``log_level`` should be < 10 and ``gather_level`` should be >= 10 to enable
the in-memory log dump. When it is enabled, the MDS checks for the
extraordinary events every ``mds_extraordinary_events_dump_interval`` seconds,
and if any of them occurs, the MDS dumps the in-memory logs containing the
relevant event details to the ceph-mds log.

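For example, a concrete invocation satisfying these constraints (the values are
only illustrative)::

    $ ceph config set mds debug_mds 1/20
    $ ceph config set mds mds_extraordinary_events_dump_interval 60
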
.. note:: At higher log levels (log_level >= 10) there is no reason to dump the
   in-memory logs, and a lower gather level (gather_level < 10) is insufficient
   to gather them. Thus a log level >= 10 or a gather level < 10 in
   ``debug_mds`` prevents the in-memory log dump from being enabled. If
   enabling fails for this reason, reset ``mds_extraordinary_events_dump_interval``
   to 0 and then re-run the commands above.

The in-memory log dump can be disabled using::

    $ ceph config set mds mds_extraordinary_events_dump_interval 0

Filesystems Become Inaccessible After an Upgrade
================================================

.. note::
   You can avoid ``operation not permitted`` errors by running this procedure
   before an upgrade. As of May 2023, it seems that ``operation not permitted``
   errors of the kind discussed here occur after upgrades after Nautilus
   (inclusive).

IF

you have CephFS file systems that have data and metadata pools that were
created by a ``ceph fs new`` command (meaning that they were not created
with the defaults)

OR

you have an existing CephFS file system and are upgrading to a new
post-Nautilus major version of Ceph

THEN

in order for the documented ``ceph fs authorize...`` commands to function as
documented (and to avoid ``operation not permitted`` errors when doing file I/O
or similar security-related problems for all users except the ``client.admin``
user), you must first run:

.. prompt:: bash $

   ceph osd pool application set <your metadata pool name> cephfs metadata <your ceph fs filesystem name>

and

.. prompt:: bash $

   ceph osd pool application set <your data pool name> cephfs data <your ceph fs filesystem name>

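You can confirm that the application metadata was set by inspecting the pool
afterwards (the pool name here is a placeholder):

.. prompt:: bash $

   ceph osd pool application get <your data pool name>
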
Otherwise, when the OSDs receive a request to read or write data (not the
directory info, but file data) they will not know which Ceph file system name
to look up. This is also true of pool names, because the 'defaults' themselves
changed in the major releases, from::

   data pool=fsname
   metadata pool=fsname_metadata

to::

   data pool=fsname.data and
   metadata pool=fsname.meta

Any setup that used ``client.admin`` for all mounts did not run into this
problem, because the admin key gave blanket permissions.

A temporary fix involves changing mount requests to the 'client.admin' user and
its associated key. A less drastic but only partial fix is to change the OSD
cap for your user to just ``caps osd = "allow rw"`` and delete ``tag cephfs
data=....``

Reporting Issues
================

If you have identified a specific issue, please report it with as much
information as possible. Especially important information:

* Ceph versions installed on client and server
* Whether you are using the kernel or fuse client
* If you are using the kernel client, what kernel version?
* How many clients are in play, doing what kind of workload?
* If a system is 'stuck', is that affecting all clients or just one?
* Any ceph health messages
* Any backtraces in the ceph logs from crashes

If you are satisfied that you have found a bug, please file it on `the bug
tracker`_. For more general queries, please write to the `ceph-users mailing
list`_.

.. _the bug tracker: http://tracker.ceph.com
.. _ceph-users mailing list: http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com/