MDS States
==========

The Metadata Server (MDS) goes through several states during normal operation
in CephFS. For example, some states indicate that the MDS is recovering from a
failover by a previous instance of the MDS. Here we'll document all of these
states and include a state diagram to visualize the transitions.

State Descriptions
------------------

Common states
~~~~~~~~~~~~~


::

   up:active

This is the normal operating state of the MDS. It indicates that the MDS
and its rank in the file system are available.

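The state held by each MDS daemon can be checked from the monitors at any
time; for example:

::

   $ ceph mds stat
   $ ceph fs status

Both commands report the state (such as ``active`` or ``standby``) alongside
each MDS daemon's name; the exact output naturally varies per cluster.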

::

   up:standby

The MDS is available to take over for a failed rank (see also :ref:`mds-standby`).
The monitor will automatically assign an MDS in this state to a failed rank
once available.


::

   up:standby_replay

The MDS is following the journal of another ``up:active`` MDS. Should the
active MDS fail, having a standby MDS in replay mode is desirable as the MDS
is replaying the live journal and will take over more quickly. A downside to
having standby-replay MDSs is that they are not available to take over for
any other MDS that fails, only the MDS they follow.

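Standby-replay daemons are assigned from the pool of standbys when the
``allow_standby_replay`` flag is set on a file system; ``cephfs`` below is a
placeholder file system name:

::

   $ ceph fs set cephfs allow_standby_replay true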

Less common or transitory states
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


::

   up:boot

This state is broadcast to the Ceph monitors during startup. This state is
never visible because the monitors immediately assign the MDS to an available
rank or command the MDS to operate as a standby. The state is documented here
for completeness.


::

   up:creating

The MDS is creating a new rank (perhaps rank 0) by constructing some per-rank
metadata (like the journal) and entering the MDS cluster.


::

   up:starting

The MDS is restarting a stopped rank. It opens associated per-rank metadata
and enters the MDS cluster.

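A stopped rank is brought back, causing an MDS to pass through
``up:starting``, by raising ``max_mds`` again; ``cephfs`` below is a
placeholder file system name:

::

   $ ceph fs set cephfs max_mds 2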

::

   up:stopping

When a rank is stopped, the monitors command an active MDS to enter the
``up:stopping`` state. In this state, the MDS accepts no new client
connections, migrates all subtrees to other ranks in the file system, flushes
its metadata journal, and, if it is the last rank (0), evicts all clients and
shuts down (see also :ref:`cephfs-administration`).

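A rank typically enters ``up:stopping`` because the operator shrinks the MDS
cluster by lowering ``max_mds``; ``cephfs`` below is a placeholder file
system name:

::

   $ ceph fs set cephfs max_mds 1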

::

   up:replay

The MDS is taking over for a failed rank. This state represents that the MDS
is recovering its journal and other metadata.



::

   up:resolve

The MDS enters this state from ``up:replay`` if the Ceph file system has
multiple ranks (including this one), i.e. it's not a single active MDS
cluster. The MDS is resolving any uncommitted inter-MDS operations. All ranks
in the file system must be in this state or later for progress to be made,
i.e. no rank can be failed/damaged or ``up:replay``.



::

   up:reconnect

An MDS enters this state from ``up:replay`` or ``up:resolve``. In this state,
the MDS solicits reconnections from clients. Any client which had a session
with this rank must reconnect within this window, which is configurable via
``mds_reconnect_timeout``.

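The reconnect window can be inspected or adjusted through the central
configuration database; the value below (in seconds) is only illustrative:

::

   $ ceph config get mds mds_reconnect_timeout
   $ ceph config set mds mds_reconnect_timeout 60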


::

   up:rejoin

The MDS enters this state from ``up:reconnect``. In this state, the MDS is
rejoining the MDS cluster cache. In particular, all inter-MDS locks on
metadata are reestablished.

If there are no known client requests to be replayed, the MDS directly
becomes ``up:active`` from this state.



::

   up:clientreplay

The MDS may enter this state from ``up:rejoin``. The MDS is replaying any
client requests which were replied to but not yet durable (not journaled).
Clients resend these requests during ``up:reconnect`` and the requests are
replayed once again. The MDS enters ``up:active`` after completing replay.



Failed states
~~~~~~~~~~~~~

::

   down:failed

No MDS actually holds this state. Instead, it is applied to the rank in the
file system. For example:

::

   $ ceph fs dump
   ...
   max_mds 1
   in      0
   up      {}
   failed  0
   ...

Rank 0 is part of the failed set and is pending takeover by a standby MDS. If
this state persists, it indicates that no suitable MDS daemon could be found
to assign to this rank. This may be caused by having too few standby daemons,
or because all standby daemons have an incompatible compat set (see also
:ref:`upgrade-mds-cluster`).


::

   down:damaged

No MDS actually holds this state. Instead, it is applied to the rank in the
file system. For example:

::

   $ ceph fs dump
   ...
   max_mds 1
   in      0
   up      {}
   failed
   damaged 0
   ...

Rank 0 has become damaged (see also :ref:`cephfs-disaster-recovery`) and has
been placed in the ``damaged`` set. An MDS which was running as rank 0 found
metadata damage that could not be automatically recovered. Operator
intervention is required.

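Once the underlying damage has been addressed with the disaster-recovery
tools, the operator can mark the rank repaired, returning it to the failed
set so that a standby may take it over; the rank number below is
illustrative:

::

   $ ceph mds repaired 0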

::

   down:stopped

No MDS actually holds this state. Instead, it is applied to the rank in the
file system. For example:

::

   $ ceph fs dump
   ...
   max_mds 1
   in      0
   up      {}
   failed
   damaged
   stopped 1
   ...

The rank has been stopped by reducing ``max_mds`` (see also
:ref:`cephfs-multimds`).

State Diagram
-------------

This state diagram shows the possible state transitions for the MDS/rank. The
legend is as follows:

Color
~~~~~

- Green: MDS is active.
- Orange: MDS is in a transient state trying to become active.
- Red: MDS is indicating a state that causes the rank to be marked failed.
- Purple: MDS and rank are stopping.
- Black: MDS is indicating a state that causes the rank to be marked damaged.

Shape
~~~~~

- Circle: an MDS holds this state.
- Hexagon: no MDS holds this state (it is applied to the rank).

Lines
~~~~~

- A double-lined shape indicates the rank is "in".

.. graphviz:: mds-state-diagram.dot