===============================
Ceph filesystem client eviction
===============================

When a filesystem client is unresponsive or otherwise misbehaving, it
may be necessary to forcibly terminate its access to the filesystem. This
process is called *eviction*.

Evicting a CephFS client prevents it from communicating further with MDS
daemons and OSD daemons. If a client was doing buffered IO to the filesystem,
any un-flushed data will be lost.

Clients may either be evicted automatically (if they fail to communicate
promptly with the MDS), or manually (by the system administrator).

The client eviction process applies to clients of all kinds, including
FUSE mounts, kernel mounts, nfs-ganesha gateways, and any process using
libcephfs.

Automatic client eviction
=========================

There are three situations in which a client may be evicted automatically:

* On an active MDS daemon, if a client has not communicated with the MDS for
  over ``mds_session_autoclose`` seconds (300 seconds by default), it
  will be evicted automatically.
* On an active MDS daemon, if a client has not responded to cap revoke
  messages for over ``mds_cap_revoke_eviction_timeout`` seconds, it will be
  evicted. This timeout is a configuration option and is disabled by default.
* During MDS startup (including on failover), the MDS passes through a
  state called ``reconnect``. During this state, it waits for all the
  clients to connect to the new MDS daemon. Any clients that fail to do
  so within the time window (``mds_reconnect_timeout``, 45 seconds by
  default) are evicted.

A warning message is sent to the cluster log if any of these situations
arises.
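
The three timeouts above are ordinary MDS configuration options. As a minimal
sketch, assuming a release that has the centralized configuration database
(``ceph config set``, Mimic or later), they could be adjusted as shown here.
The values are illustrative assumptions, not recommendations; on older
releases the same options can be set in the ``[mds]`` section of
``ceph.conf`` instead.

::

    # Wait 10 minutes instead of the default 5 before evicting a silent client
    ceph config set mds mds_session_autoclose 600
    # Evict clients that ignore cap revoke messages for 5 minutes
    # (this check stays disabled unless a timeout is set)
    ceph config set mds mds_cap_revoke_eviction_timeout 300
    # Give clients 90 seconds to reconnect after an MDS restart or failover
    ceph config set mds mds_reconnect_timeout 90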

Manual client eviction
======================

Sometimes, the administrator may want to evict a client manually. This
could happen if a client has died and the administrator does not
want to wait for its session to time out, or if a client is misbehaving
and the administrator does not have access to the client node to
unmount it.

It is useful to inspect the list of clients first:

::

    ceph tell mds.0 client ls

    [
        {
            "id": 4305,
            "num_leases": 0,
            "num_caps": 3,
            "state": "open",
            "replay_requests": 0,
            "completed_requests": 0,
            "reconnecting": false,
            "inst": "client.4305 172.21.9.34:0/422650892",
            "client_metadata": {
                "ceph_sha1": "ae81e49d369875ac8b569ff3e3c456a31b8f3af5",
                "ceph_version": "ceph version 12.0.0-1934-gae81e49 (ae81e49d369875ac8b569ff3e3c456a31b8f3af5)",
                "entity_id": "0",
                "hostname": "senta04",
                "mount_point": "/tmp/tmpcMpF1b/mnt.0",
                "pid": "29377",
                "root": "/"
            }
        }
    ]
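
On a busy cluster the session list can be long. As a minimal sketch, the
session IDs of all clients on a given host could be picked out of the JSON
as follows. The function name ``client_ids_on_host`` is hypothetical, and
``python3`` is assumed to be available on the admin host.

::

    # Hypothetical helper: print the session ID of every client whose
    # client_metadata.hostname equals $1. Expects "client ls" JSON on stdin.
    client_ids_on_host() {
        python3 -c '
    import json, sys

    host = sys.argv[1]
    for session in json.load(sys.stdin):
        if session.get("client_metadata", {}).get("hostname") == host:
            print(session["id"])
    ' "$1"
    }

For example, ``ceph tell mds.0 client ls | client_ids_on_host senta04`` would
print ``4305`` for the session shown above.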

Once you have identified the client you want to evict, you can evict it
using its unique ID, or using other attributes that identify it:

::

    # These all work
    ceph tell mds.0 client evict id=4305
    ceph tell mds.0 client evict client_metadata.=4305

Advanced: Un-blacklisting a client
==================================

Ordinarily, a blacklisted client may not reconnect to the servers: it
must be unmounted and then mounted anew.

However, in some situations it may be useful to permit a client that
was evicted to attempt to reconnect.

Because CephFS uses the RADOS OSD blacklist to control client eviction,
CephFS clients can be permitted to reconnect by removing them from
the blacklist:

::

    ceph osd blacklist ls
    # ... identify the address of the client ...
    ceph osd blacklist rm <address>

Doing this may put data integrity at risk if other clients have accessed
files that the blacklisted client was doing buffered IO to. It is also not
guaranteed to result in a fully functional client -- the best way to get
a fully healthy client back after an eviction is to unmount the client
and do a fresh mount.

If you are trying to reconnect clients in this way, you may also
find it useful to set ``client_reconnect_stale`` to true in the
FUSE client, to prompt the client to try to reconnect.
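
The ``inst`` field reported by ``client ls`` has the form
``client.<id> <address>``, as in the sample session earlier in this document.
As a sketch, assuming the blacklist entry created at eviction time uses that
same address (check it against ``ceph osd blacklist ls`` before removing
anything), a hypothetical helper ``inst_to_addr`` can strip the entity name
and leave the address:

::

    # Hypothetical helper: drop the leading "client.<id> " from an "inst"
    # value and print what remains (the client address).
    inst_to_addr() {
        printf '%s\n' "${1#* }"
    }

With the sample session, ``inst_to_addr 'client.4305 172.21.9.34:0/422650892'``
prints ``172.21.9.34:0/422650892``, which could then be passed to
``ceph osd blacklist rm``.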

Advanced: Configuring blacklisting
==================================

If you are experiencing frequent client evictions, due to slow
client hosts or an unreliable network, and you cannot fix the underlying
issue, then you may want to ask the MDS to be less strict.

It is possible to respond to slow clients by simply dropping their
MDS sessions, but permit them to re-open sessions and permit them
to continue talking to OSDs. To enable this mode, set
``mds_session_blacklist_on_timeout`` to false on your MDS nodes.

For the equivalent behaviour on manual evictions, set
``mds_session_blacklist_on_evict`` to false.
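
One possible way to apply both settings to every MDS at once is sketched
here. It assumes a release with the centralized configuration database
(``ceph config set``); where that is not available, the two options would
instead be set in the ``[mds]`` section of ``ceph.conf`` on each MDS node.

::

    # Drop the MDS session on timeout, but do not blacklist the client
    ceph config set mds mds_session_blacklist_on_timeout false
    # Same relaxed behaviour for manual "client evict" commands
    ceph config set mds mds_session_blacklist_on_evict false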

Note that if blacklisting is disabled, then evicting a client will
only have an effect on the MDS you send the command to. On a system
with multiple active MDS daemons, you would need to send an
eviction command to each active daemon. When blacklisting is enabled
(the default), sending an eviction command to just a single
MDS is sufficient, because the blacklist propagates it to the others.
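
When blacklisting is disabled, the per-daemon evictions described above can
be scripted. The following is a minimal sketch; the function name
``evict_on_ranks`` is hypothetical, and the caller is assumed to know which
MDS ranks are currently active.

::

    # Hypothetical helper: send the same eviction to every listed MDS rank.
    # Usage: evict_on_ranks <client-id> <rank> [<rank> ...]
    evict_on_ranks() {
        client_id="$1"
        shift
        for rank in "$@"; do
            ceph tell "mds.${rank}" client evict "id=${client_id}"
        done
    }

For example, ``evict_on_ranks 4305 0 1`` sends the eviction for client 4305
to ``mds.0`` and then to ``mds.1``.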

.. _background_blacklisting_and_osd_epoch_barrier:

Background: Blacklisting and OSD epoch barrier
==============================================

After a client is blacklisted, it is necessary to make sure that
other clients and MDS daemons have the latest OSDMap (including
the blacklist entry) before they try to access any data objects
that the blacklisted client might have been accessing.

This is ensured using an internal "osdmap epoch barrier" mechanism.

The purpose of the barrier is to ensure that when we hand out any
capabilities which might allow touching the same RADOS objects, the
clients we hand out the capabilities to must have a sufficiently recent
OSD map to not race with cancelled operations (from ENOSPC) or
blacklisted clients (from evictions).

More specifically, the cases where an epoch barrier is set are:

* Client eviction (where the client is blacklisted and other clients
  must wait for a post-blacklist epoch to touch the same objects).
* OSD map full flag handling in the client (where the client may
  cancel some OSD ops from a pre-full epoch, so other clients must
  wait until the full epoch or later before touching the same objects).
* MDS startup, because we don't persist the barrier epoch, so must
  assume that latest OSD map is always required after a restart.

Note that this is a global value for simplicity. We could maintain this on
a per-inode basis. But we don't, because:

* It would be more complicated.
* It would use an extra 4 bytes of memory for every inode.
* It would not be much more efficient, because almost always everyone has
  the latest OSD map anyway; in most cases everyone will breeze through
  this barrier rather than waiting.
* This barrier is done in very rare cases, so any benefit from per-inode
  granularity would only very rarely be seen.

The epoch barrier is transmitted along with all capability messages, and
instructs the receiver of the message to avoid sending any more RADOS
operations to OSDs until it has seen this OSD epoch. This mainly applies
to clients (doing their data writes directly to files), but also applies
to the MDS because things like file size probing and file deletion are
done directly from the MDS.