Terminology
-----------

A Ceph cluster may have zero or more CephFS *filesystems*. CephFS
filesystems have a human-readable name (set in ``fs new``)
and an integer ID. The ID is called the filesystem cluster ID,
or *FSCID*.

Each CephFS filesystem has a number of *ranks*, one by default,
which start at zero. A rank may be thought of as a metadata shard.
Controlling the number of ranks in a filesystem is described
in :doc:`/cephfs/multimds`.

Each CephFS ceph-mds process (a *daemon*) initially starts up
without a rank. It may be assigned one by the monitor cluster.
A daemon may only hold one rank at a time. Daemons only give up
a rank when the ceph-mds process stops.

If a rank is not associated with a daemon, the rank is
considered *failed*. Once a rank is assigned to a daemon,
the rank is considered *up*.

A daemon has a *name* that is set statically by the administrator
when the daemon is first configured. Typical configurations
use the hostname where the daemon runs as the daemon name.

Each time a daemon starts up, it is also assigned a *GID*, which
is unique to this particular process lifetime of the daemon. The
GID is an integer.

Referring to MDS daemons
------------------------

Most of the administrative commands that refer to an MDS daemon
accept a flexible argument format that may contain a rank, a GID
or a name.

Where a rank is used, this may optionally be qualified with
a leading filesystem name or ID. If a daemon is a standby (i.e.
it is not currently assigned a rank), then it may only be
referred to by GID or name.

For example, if we had an MDS daemon which was called 'myhost',
had GID 5446, and was assigned rank 0 in the filesystem 'myfs'
which had FSCID 3, then any of the following would be suitable
forms of the 'fail' command:

::

    ceph mds fail 5446     # GID
    ceph mds fail myhost   # Daemon name
    ceph mds fail 0        # Unqualified rank
    ceph mds fail 3:0      # FSCID and rank
    ceph mds fail myfs:0   # Filesystem name and rank

Managing failover
-----------------

If an MDS daemon stops communicating with the monitor, the monitor will
wait ``mds_beacon_grace`` seconds (default 15 seconds) before marking
the daemon as *laggy*.

Each file system may specify the number of standby daemons it needs in order
to be considered healthy. This number includes daemons in standby-replay
waiting for a rank to fail (remember that a standby-replay daemon will not be
assigned to take over a failure for another rank or a failure in another
CephFS file system). The pool of standby daemons not in replay counts towards
any file system's count. Each file system may set the number of standby
daemons wanted using:

::

    ceph fs set <fs name> standby_count_wanted <count>

Setting ``count`` to 0 will disable the health check.

Configuring standby daemons
---------------------------

There are four configuration settings that control how a daemon
will behave while in standby:

::

    mds_standby_for_name
    mds_standby_for_rank
    mds_standby_for_fscid
    mds_standby_replay

These may be set in the ceph.conf on the host where the MDS daemon
runs (as opposed to on the monitor). The daemon loads these settings
when it starts, and sends them to the monitor.

By default, if none of these settings are used, all MDS daemons
which do not hold a rank will be used as standbys for any rank.

The settings which associate a standby daemon with a particular
name or rank do not guarantee that the daemon will *only* be used
for that rank. They mean that when several standbys are available,
the associated standby daemon will be used. If a rank is failed,
and a standby is available, it will be used even if it is associated
with a different rank or named daemon.

mds_standby_replay
~~~~~~~~~~~~~~~~~~

If this is set to true, then the standby daemon will continuously read
the metadata journal of an up rank. This will give it
a warm metadata cache, and speed up the process of failing over
if the daemon serving the rank fails.

An up rank may only have one standby replay daemon assigned to it.
If two daemons are both set to be standby replay, then one of them
will arbitrarily win, and the other will become a normal non-replay
standby.

Once a daemon has entered the standby replay state, it will only be
used as a standby for the rank that it is following. If another rank
fails, this standby replay daemon will not be used as a replacement,
even if no other standbys are available.

*Historical note:* In Ceph prior to v10.2.1, this setting was always
treated as true when ``mds_standby_for_*`` was also set, even if it was
set to ``false``.

mds_standby_for_name
~~~~~~~~~~~~~~~~~~~~

Set this to make the standby daemon only take over a failed rank
if the last daemon to hold it matches this name.

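
As an illustration, a ceph.conf fragment along the following lines would
associate one daemon with a named peer. The daemon names 'a' and 'b' are
placeholders chosen for this sketch, not names taken from a real cluster:

::

    [mds.b]
    # Hypothetical names: use mds.b as the standby for a rank
    # whose last holder was the daemon named 'a'.
    mds standby for name = a
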
mds_standby_for_rank
~~~~~~~~~~~~~~~~~~~~

Set this to make the standby daemon only take over the specified
rank. If another rank fails, this daemon will not be used to
replace it.

Use in conjunction with ``mds_standby_for_fscid`` to be specific
about which filesystem's rank you are targeting, if you have
multiple filesystems.

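
A minimal sketch of the qualified form, assuming a daemon named 'c' and a
filesystem whose FSCID is 1 (both values are placeholders for illustration):

::

    [mds.c]
    # Placeholder values: target rank 0 of the filesystem with FSCID 1.
    mds standby for rank = 0
    mds standby for fscid = 1
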
mds_standby_for_fscid
~~~~~~~~~~~~~~~~~~~~~

If ``mds_standby_for_rank`` is set, this is simply a qualifier to
say which filesystem's rank is referred to.

If ``mds_standby_for_rank`` is not set, then setting FSCID will
cause this daemon to target any rank in the specified FSCID. Use
this if you have a daemon that you want to use for any rank, but
only within a particular filesystem.

mon_force_standby_active
~~~~~~~~~~~~~~~~~~~~~~~~

This setting is used on monitor hosts. It defaults to true.

If it is false, then daemons configured with ``mds_standby_replay`` set
to true will **only** become active if the rank/name that they have
been configured to follow fails. On the other hand, if this
setting is true, then a daemon configured with ``mds_standby_replay``
set to true may be assigned some other rank.

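
For instance, the stricter behaviour could be selected with a fragment like
the following in the ceph.conf used by the monitors. Placing the option in
the ``[mon]`` section is an assumption of this sketch:

::

    [mon]
    # Only promote standby-replay daemons for the rank/name they follow.
    mon force standby active = false
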
Examples
--------

These are example ceph.conf snippets. In practice you can either
copy a ceph.conf with all daemons' configuration to all your servers,
or you can have a different file on each server that contains just
that server's daemons' configuration.

Simple pair
~~~~~~~~~~~

Two MDS daemons 'a' and 'b' acting as a pair, where whichever one is not
currently assigned a rank will be the standby replay follower
of the other.

::

    [mds.a]
    mds standby replay = true
    mds standby for rank = 0

    [mds.b]
    mds standby replay = true
    mds standby for rank = 0

Floating standby
~~~~~~~~~~~~~~~~

Three MDS daemons 'a', 'b' and 'c', in a filesystem that has
``max_mds`` set to 2.

::

    # No explicit configuration required: whichever daemon is
    # not assigned a rank will go into 'standby' and take over
    # for whichever other daemon fails.

Two MDS clusters
~~~~~~~~~~~~~~~~

Two filesystems and four MDS daemons, where two daemons act as a pair
for one filesystem and the other two act as a pair for the other
filesystem.

::

    [mds.a]
    mds standby for fscid = 1

    [mds.b]
    mds standby for fscid = 1

    [mds.c]
    mds standby for fscid = 2

    [mds.d]
    mds standby for fscid = 2
217 [mds.c]
218 mds standby for fscid = 2
219
220 [mds.d]
221 mds standby for fscid = 2
222