MDS States
==========

The Metadata Server (MDS) goes through several states during normal operation
in CephFS. For example, some states indicate that the MDS is recovering from a
failover by a previous instance of the MDS. Here we'll document all of these
states and include a state diagram to visualize the transitions.

State Descriptions
------------------

Common states
~~~~~~~~~~~~~


::

   up:active

This is the normal operating state of the MDS. It indicates that the MDS
and its rank in the file system are available.

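The state held by each MDS daemon can be checked from the monitors at any
time; for example:

::

   $ ceph mds stat
   $ ceph fs status

Both commands report the state (such as ``active`` or ``standby``) alongside
each MDS daemon's name; the exact output naturally varies per cluster.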

::

   up:standby

The MDS is available to take over for a failed rank (see also :ref:`mds-standby`).
The monitor will automatically assign an MDS in this state to a failed rank
once available.


::

   up:standby_replay

The MDS is following the journal of another ``up:active`` MDS. Should the
active MDS fail, having a standby MDS in replay mode is desirable as the MDS
is replaying the live journal and will take over more quickly. A downside to
having standby-replay MDSs is that they are not available to take over for
any other MDS that fails, only the MDS they follow.

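Standby-replay daemons are assigned from the pool of standbys when the
``allow_standby_replay`` flag is set on a file system; ``cephfs`` below is a
placeholder file system name:

::

   $ ceph fs set cephfs allow_standby_replay true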

Less common or transitory states
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


::

   up:boot

This state is broadcast to the Ceph monitors during startup. This state is
never visible because the monitors immediately assign the MDS to an available
rank or command the MDS to operate as a standby. The state is documented here
for completeness.


::

   up:creating

The MDS is creating a new rank (perhaps rank 0) by constructing some per-rank
metadata (like the journal) and entering the MDS cluster.


::

   up:starting

The MDS is restarting a stopped rank. It opens associated per-rank metadata
and enters the MDS cluster.

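A stopped rank is brought back, causing an MDS to pass through
``up:starting``, by raising ``max_mds`` again; ``cephfs`` below is a
placeholder file system name:

::

   $ ceph fs set cephfs max_mds 2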

::

   up:stopping

When a rank is stopped, the monitors command an active MDS to enter the
``up:stopping`` state. In this state, the MDS accepts no new client
connections, migrates all subtrees to other ranks in the file system, flushes
its metadata journal, and, if it is the last rank (0), evicts all clients and
shuts down (see also :ref:`cephfs-administration`).

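A rank typically enters ``up:stopping`` because the operator shrinks the MDS
cluster by lowering ``max_mds``; ``cephfs`` below is a placeholder file
system name:

::

   $ ceph fs set cephfs max_mds 1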

::

   up:replay

The MDS is taking over for a failed rank. This state represents that the MDS
is recovering its journal and other metadata.



::

   up:resolve

The MDS enters this state from ``up:replay`` if the Ceph file system has
multiple ranks (including this one), i.e. it's not a single active MDS
cluster. The MDS is resolving any uncommitted inter-MDS operations. All ranks
in the file system must be in this state or later for progress to be made,
i.e. no rank can be failed/damaged or ``up:replay``.



::

   up:reconnect

An MDS enters this state from ``up:replay`` or ``up:resolve``. In this state,
the MDS solicits reconnections from clients. Any client which had a session
with this rank must reconnect within this window, which is configurable via
``mds_reconnect_timeout``.

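The reconnect window can be inspected or adjusted through the central
configuration database; the value below (in seconds) is only illustrative:

::

   $ ceph config get mds mds_reconnect_timeout
   $ ceph config set mds mds_reconnect_timeout 60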


::

   up:rejoin

The MDS enters this state from ``up:reconnect``. In this state, the MDS is
rejoining the MDS cluster cache. In particular, all inter-MDS locks on
metadata are reestablished.

If there are no known client requests to be replayed, the MDS directly
becomes ``up:active`` from this state.



::

   up:clientreplay

The MDS may enter this state from ``up:rejoin``. The MDS is replaying any
client requests which were replied to but not yet durable (not journaled).
Clients resend these requests during ``up:reconnect`` and the requests are
replayed once again. The MDS enters ``up:active`` after completing replay.



Failed states
~~~~~~~~~~~~~

::

   down:failed

No MDS actually holds this state. Instead, it is applied to the rank in the
file system. For example:

::

   $ ceph fs dump
   ...
   max_mds 1
   in      0
   up      {}
   failed  0
   ...

Rank 0 is part of the failed set and is pending takeover by a standby MDS. If
this state persists, it indicates that no suitable MDS daemon could be found
to assign to this rank. This may be caused by having too few standby daemons,
or because all standby daemons have an incompatible compat set (see also
:ref:`upgrade-mds-cluster`).


::

   down:damaged

No MDS actually holds this state. Instead, it is applied to the rank in the
file system. For example:

::

   $ ceph fs dump
   ...
   max_mds 1
   in      0
   up      {}
   failed
   damaged 0
   ...

Rank 0 has become damaged (see also :ref:`cephfs-disaster-recovery`) and has
been placed in the ``damaged`` set. An MDS which was running as rank 0 found
metadata damage that could not be automatically recovered. Operator
intervention is required.

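Once the underlying damage has been addressed with the disaster-recovery
tools, the operator can mark the rank repaired, returning it to the failed
set so that a standby may take it over; the rank number below is
illustrative:

::

   $ ceph mds repaired 0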

::

   down:stopped

No MDS actually holds this state. Instead, it is applied to the rank in the
file system. For example:

::

   $ ceph fs dump
   ...
   max_mds 1
   in      0
   up      {}
   failed
   damaged
   stopped 1
   ...

The rank has been stopped by reducing ``max_mds`` (see also
:ref:`cephfs-multimds`).

State Diagram
-------------

This state diagram shows the possible state transitions for the MDS/rank. The
legend is as follows:

Color
~~~~~

- Green: MDS is active.
- Orange: MDS is in a transient state trying to become active.
- Red: MDS is indicating a state that causes the rank to be marked failed.
- Purple: MDS and rank are stopping.
- Black: MDS is indicating a state that causes the rank to be marked damaged.

Shape
~~~~~

- Circle: an MDS holds this state.
- Hexagon: no MDS holds this state (it is applied to the rank).

Lines
~~~~~

- A double-lined shape indicates the rank is "in".

.. graphviz:: mds-state-diagram.dot