MDS States
==========


The Metadata Server (MDS) goes through several states during normal operation
in CephFS. For example, some states indicate that the MDS is recovering from a
failover by a previous instance of the MDS. Here we'll document all of these
states and include a state diagram to visualize the transitions.

State Descriptions
------------------

Common states
~~~~~~~~~~~~~


::

    up:active

This is the normal operating state of the MDS. It indicates that the MDS and
its rank in the file system are available.


::

    up:standby

The MDS is available to take over for a failed rank (see also :ref:`mds-standby`).
The monitor will automatically assign an MDS in this state to a failed rank
once available.


::

    up:standby_replay

The MDS is following the journal of another ``up:active`` MDS. Should the
active MDS fail, having a standby MDS in replay mode is desirable as it is
replaying the live journal and can take over more quickly. A downside to
having standby-replay MDSs is that they are not available to take over for any
other MDS that fails, only the MDS they follow.
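This distinction can be sketched in a few lines. The records below are a
hypothetical, simplified stand-in for the per-daemon entries reported by
``ceph fs dump --format=json``; the field names and values here are
assumptions for illustration, not the exact output format.

```python
# Hypothetical, simplified stand-in for per-daemon records from
# `ceph fs dump --format=json`; field names are assumptions.
daemons = [
    {"name": "mds.a", "state": "up:active"},
    {"name": "mds.b", "state": "up:standby_replay"},
    {"name": "mds.c", "state": "up:standby"},
]

def takeover_candidates(daemons):
    """Daemons able to take over *any* failed rank (plain standbys only);
    a standby-replay daemon can only replace the one MDS it follows."""
    return [d["name"] for d in daemons if d["state"] == "up:standby"]

print(takeover_candidates(daemons))  # only mds.c qualifies
```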


Less common or transitory states
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


::

    up:boot

This state is broadcast to the Ceph monitors during startup. This state is
never visible as the monitor immediately assigns the MDS to an available rank
or commands the MDS to operate as a standby. The state is documented here for
completeness.


::

    up:creating

The MDS is creating a new rank (perhaps rank 0) by constructing some per-rank
metadata (like the journal) and entering the MDS cluster.


::

    up:starting

The MDS is restarting a stopped rank. It opens associated per-rank metadata
and enters the MDS cluster.


::

    up:stopping

When a rank is stopped, the monitors command an active MDS to enter the
``up:stopping`` state. In this state, the MDS accepts no new client
connections, migrates all subtrees to other ranks in the file system, flushes
its metadata journal, and, if it holds the last rank (0), evicts all clients
and shuts down (see also :ref:`cephfs-administration`).


::

    up:replay

The MDS is taking over a failed rank. This state represents that the MDS is
recovering its journal and other metadata.


::

    up:resolve

The MDS enters this state from ``up:replay`` if the Ceph file system has
multiple ranks (including this one), i.e. it's not a single active MDS
cluster. The MDS is resolving any uncommitted inter-MDS operations. All ranks
in the file system must be in this state or later for progress to be made,
i.e. no rank can be failed/damaged or ``up:replay``.


::

    up:reconnect

An MDS enters this state from ``up:replay`` or ``up:resolve``. This state
solicits reconnections from clients. Any client which had a session with this
rank must reconnect within this window of time, configurable via
``mds_reconnect_timeout``.
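The reconnect window amounts to a simple deadline check. Below is an
illustrative sketch only, not the actual MDS logic; the 45-second figure is
assumed here as the usual default for ``mds_reconnect_timeout``.

```python
# Illustrative sketch, not the MDS implementation. Assumes
# mds_reconnect_timeout defaults to 45 seconds.
MDS_RECONNECT_TIMEOUT = 45.0

def client_survives(reconnect_at, window_start,
                    timeout=MDS_RECONNECT_TIMEOUT):
    """A client keeps its session only if it reconnects within the
    window; otherwise it is evicted before the MDS leaves up:reconnect."""
    return (reconnect_at - window_start) <= timeout

print(client_survives(30.0, 0.0))  # in time: session kept
print(client_survives(60.0, 0.0))  # too late: session evicted
```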


::

    up:rejoin

The MDS enters this state from ``up:reconnect``. In this state, the MDS is
rejoining the MDS cluster cache. In particular, all inter-MDS locks on
metadata are reestablished.

If there are no known client requests to be replayed, the MDS directly becomes
``up:active`` from this state.


::

    up:clientreplay

The MDS may enter this state from ``up:rejoin``. The MDS is replaying any
client requests which were replied to but not yet durable (not journaled).
Clients resend these requests during ``up:reconnect`` and the requests are
replayed once again. The MDS enters ``up:active`` after completing replay.
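The recovery states above follow a fixed forward order. The sketch below
models only the happy path described in this section (failure edges are
omitted): single-rank systems skip ``up:resolve``, and ``up:clientreplay``
occurs only when there are client requests to replay.

```python
# Documentation sketch of the forward recovery path described above;
# not the full MDSMap state machine (failure edges omitted).
def recovery_path(multiple_ranks, has_client_replay):
    """Return the state sequence a recovering rank passes through."""
    path = ["up:replay"]
    if multiple_ranks:
        path.append("up:resolve")   # only with more than one rank
    path.append("up:reconnect")     # solicit client reconnections
    path.append("up:rejoin")        # rejoin the MDS cluster cache
    if has_client_replay:
        path.append("up:clientreplay")
    path.append("up:active")
    return path

print(recovery_path(multiple_ranks=False, has_client_replay=False))
```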


Failed states
~~~~~~~~~~~~~

::

    down:failed

No MDS actually holds this state. Instead, it is applied to the rank in the
file system. For example:

::

    $ ceph fs dump
    ...
    max_mds 1
    in 0
    up {}
    failed 0
    ...

Rank 0 is part of the failed set.
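The same information is available in machine-readable form from
``ceph fs dump --format=json``. The structure below only loosely mirrors the
real output; treat the exact field names as assumptions and check the JSON
produced by your cluster.

```python
# Simplified, assumed layout of `ceph fs dump --format=json` output;
# the real JSON differs in detail.
dump = {
    "mdsmap": {
        "max_mds": 1,
        "in": [0],      # ranks that are part of the file system
        "up": {},       # no rank-to-daemon mapping: nothing is up
        "failed": [0],  # ranks in the failed set
    }
}

def failed_ranks(dump):
    """Ranks in the failed set: 'in' the file system but held by no MDS."""
    return dump["mdsmap"]["failed"]

print(failed_ranks(dump))  # rank 0 is failed
```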


::

    down:damaged

No MDS actually holds this state. Instead, it is applied to the rank in the
file system. For example:

::

    $ ceph fs dump
    ...
    max_mds 1
    in 0
    up {}
    failed
    damaged 0
    ...

Rank 0 has become damaged (see also :ref:`cephfs-disaster-recovery`) and is
placed in the ``damaged`` set. An MDS which was running as rank 0 found
metadata damage that could not be automatically recovered. Operator
intervention is required.


::

    down:stopped

No MDS actually holds this state. Instead, it is applied to the rank in the
file system. For example:

::

    $ ceph fs dump
    ...
    max_mds 1
    in 0
    up {}
    failed
    damaged
    stopped 1
    ...

The rank has been stopped by reducing ``max_mds`` (see also :ref:`cephfs-multimds`).
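Putting the three ``down`` sets together: a rank that no MDS holds is either
failed, damaged, or stopped. A small classification sketch, under the same
simplified (assumed) dump layout as the examples above:

```python
# Sketch: classify a rank using the failed/damaged/stopped sets shown in
# the `ceph fs dump` examples above. Simplified, assumed field layout;
# the real JSON output differs in detail.
def rank_disposition(rank, mdsmap):
    if rank in mdsmap.get("failed", []):
        return "down:failed"
    if rank in mdsmap.get("damaged", []):
        return "down:damaged"
    if rank in mdsmap.get("stopped", []):
        return "down:stopped"
    if rank in mdsmap.get("up", {}):
        return "up"
    return "unknown"

mdsmap = {"failed": [], "damaged": [], "stopped": [1], "up": {0: 4107}}
print(rank_disposition(1, mdsmap))  # down:stopped
print(rank_disposition(0, mdsmap))  # up
```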

State Diagram
-------------

This state diagram shows the possible state transitions for the MDS/rank. The
legend is as follows:

Color
~~~~~

- Green: MDS is active.
- Orange: MDS is in a transient state trying to become active.
- Red: MDS is indicating a state that causes the rank to be marked failed.
- Purple: MDS and rank are stopping.
- Black: MDS is indicating a state that causes the rank to be marked damaged.

Shape
~~~~~

- Circle: an MDS holds this state.
- Hexagon: no MDS holds this state (it is applied to the rank).

Lines
~~~~~

- A double-lined shape indicates the rank is "in".

.. image:: mds-state-diagram.svg