================================
Ceph file system client eviction
================================

When a file system client is unresponsive or otherwise misbehaving, it
may be necessary to forcibly terminate its access to the file system. This
process is called *eviction*.

Evicting a CephFS client prevents it from communicating further with MDS
daemons and OSD daemons. If a client was doing buffered IO to the file system,
any un-flushed data will be lost.

Clients may either be evicted automatically (if they fail to communicate
promptly with the MDS), or manually (by the system administrator).

The client eviction process applies to clients of all kinds, including
FUSE mounts, kernel mounts, nfs-ganesha gateways, and any process using
libcephfs.

Automatic client eviction
=========================

There are three situations in which a client may be evicted automatically.

#. On an active MDS daemon, if a client has not communicated with the MDS for over
   ``session_autoclose`` (a file system variable) seconds (300 seconds by
   default), then it will be evicted automatically.

#. On an active MDS daemon, if a client has not responded to cap revoke messages
   for over ``mds_cap_revoke_eviction_timeout`` (configuration option) seconds,
   then it will be evicted. This is disabled by default.

#. During MDS startup (including on failover), the MDS passes through a
   state called ``reconnect``. During this state, it waits for all the
   clients to connect to the new MDS daemon. If any clients fail to do
   so within the time window (``mds_reconnect_timeout``, 45 seconds by default)
   then they will be evicted.

A warning message is sent to the cluster log if any of these situations
arises.

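The first two rules above can be modelled as a small decision function. This is
only an illustrative sketch, not the MDS implementation: the function and
parameter names are hypothetical, and treating a cap revoke timeout of ``0`` as
"disabled" is an assumption made for the example.

```python
from typing import Optional

SESSION_AUTOCLOSE_DEFAULT = 300  # seconds, the default stated above


def auto_evict_reason(idle_seconds: float,
                      cap_revoke_pending_seconds: Optional[float] = None,
                      session_autoclose: float = SESSION_AUTOCLOSE_DEFAULT,
                      cap_revoke_eviction_timeout: float = 0) -> Optional[str]:
    """Return the rule that would evict a client, or None if none applies.

    A cap_revoke_eviction_timeout of 0 is treated as "disabled"
    (an assumption made for this sketch).
    """
    # Rule 1: no communication with the MDS for over session_autoclose seconds.
    if idle_seconds > session_autoclose:
        return "session_autoclose"
    # Rule 2: a cap revoke has gone unanswered for longer than the timeout.
    if (cap_revoke_eviction_timeout > 0
            and cap_revoke_pending_seconds is not None
            and cap_revoke_pending_seconds > cap_revoke_eviction_timeout):
        return "mds_cap_revoke_eviction_timeout"
    return None
```

With the defaults, a client that has been idle for 301 seconds matches the
first rule, while an unanswered cap revoke never triggers eviction until a
non-zero timeout is passed in.
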
Manual client eviction
======================

Sometimes, the administrator may want to evict a client manually. This
could happen if a client has died and the administrator does not
want to wait for its session to time out, or it could happen if
a client is misbehaving and the administrator does not have access to
the client node to unmount it.

It is useful to inspect the list of clients first:

::

    ceph tell mds.0 client ls

    [
        {
            "id": 4305,
            "num_leases": 0,
            "num_caps": 3,
            "state": "open",
            "replay_requests": 0,
            "completed_requests": 0,
            "reconnecting": false,
            "inst": "client.4305 172.21.9.34:0/422650892",
            "client_metadata": {
                "ceph_sha1": "ae81e49d369875ac8b569ff3e3c456a31b8f3af5",
                "ceph_version": "ceph version 12.0.0-1934-gae81e49 (ae81e49d369875ac8b569ff3e3c456a31b8f3af5)",
                "entity_id": "0",
                "hostname": "senta04",
                "mount_point": "/tmp/tmpcMpF1b/mnt.0",
                "pid": "29377",
                "root": "/"
            }
        }
    ]

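When many clients are connected, it can help to filter the session list by
script rather than by eye. The following is a minimal sketch that assumes the
JSON array shown above has already been captured as a string; the helper name
``find_client_ids`` and the trimmed sample data are hypothetical.

```python
import json


def find_client_ids(client_ls_json: str, hostname: str) -> list:
    """Return the IDs of all sessions whose client_metadata.hostname matches."""
    sessions = json.loads(client_ls_json)
    return [
        session["id"]
        for session in sessions
        # Sessions without metadata are skipped rather than raising KeyError.
        if session.get("client_metadata", {}).get("hostname") == hostname
    ]


# Trimmed copy of the sample output above, used only for illustration.
sample = """
[
    {
        "id": 4305,
        "state": "open",
        "inst": "client.4305 172.21.9.34:0/422650892",
        "client_metadata": {
            "hostname": "senta04",
            "mount_point": "/tmp/tmpcMpF1b/mnt.0"
        }
    }
]
"""

print(find_client_ids(sample, "senta04"))  # prints [4305]
```

The returned ID is the value used to identify the client in an eviction
command.
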
Once you have identified the client you want to evict, you can
evict it using its unique ID or other identifying attributes:

::

    # These all work
    ceph tell mds.0 client evict id=4305
    ceph tell mds.0 client evict client_metadata.=4305

Advanced: Un-blacklisting a client
==================================

Ordinarily, a blacklisted client may not reconnect to the servers: it
must be unmounted and then mounted anew.

However, in some situations it may be useful to permit a client that
was evicted to attempt to reconnect.

Because CephFS uses the RADOS OSD blacklist to control client eviction,
CephFS clients can be permitted to reconnect by removing them from
the blacklist:

::

    $ ceph osd blacklist ls
    listed 1 entries
    127.0.0.1:0/3710147553 2018-03-19 11:32:24.716146
    $ ceph osd blacklist rm 127.0.0.1:0/3710147553
    un-blacklisting 127.0.0.1:0/3710147553

Doing this may put data integrity at risk if other clients have accessed
files that the blacklisted client was doing buffered IO to. It is also not
guaranteed to result in a fully functional client -- the best way to get
a fully healthy client back after an eviction is to unmount the client
and do a fresh mount.

If you are trying to reconnect clients in this way, you may also
find it useful to set ``client_reconnect_stale`` to true in the
FUSE client, to prompt the client to try to reconnect.

Advanced: Configuring blacklisting
==================================

If you are experiencing frequent client evictions, due to slow
client hosts or an unreliable network, and you cannot fix the underlying
issue, then you may want to ask the MDS to be less strict.

It is possible to respond to slow clients by simply dropping their
MDS sessions, but permit them to re-open sessions and permit them
to continue talking to OSDs. To enable this mode, set
``mds_session_blacklist_on_timeout`` to false on your MDS nodes.

For the equivalent behaviour on manual evictions, set
``mds_session_blacklist_on_evict`` to false.

Note that if blacklisting is disabled, then evicting a client will
only have an effect on the MDS you send the command to. On a system
with multiple active MDS daemons, you would need to send an
eviction command to each active daemon. When blacklisting is enabled
(the default), sending an eviction command to just a single
MDS is sufficient, because the blacklist propagates it to the others.

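The difference between the two modes can be illustrated by generating the
eviction commands that an administrator would need to run. This is a sketch
under stated assumptions: the helper name ``eviction_commands`` is
hypothetical, the list of active MDS ranks is supplied by the caller, and the
commands are only built as strings, not executed.

```python
def eviction_commands(client_id, active_ranks, blacklisting_enabled=True):
    """Build the `ceph tell` commands needed to evict one client.

    With blacklisting enabled, one command to a single MDS is enough.
    With blacklisting disabled, every active MDS must be told separately.
    """
    ranks = list(active_ranks)
    if blacklisting_enabled:
        # The blacklist propagates the eviction, so one rank suffices.
        ranks = ranks[:1]
    return [f"ceph tell mds.{rank} client evict id={client_id}" for rank in ranks]


# Example: two active MDS daemons (ranks 0 and 1), blacklisting disabled.
for command in eviction_commands(4305, [0, 1], blacklisting_enabled=False):
    print(command)
```
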
.. _background_blacklisting_and_osd_epoch_barrier:

Background: Blacklisting and OSD epoch barrier
==============================================

After a client is blacklisted, it is necessary to make sure that
other clients and MDS daemons have the latest OSDMap (including
the blacklist entry) before they try to access any data objects
that the blacklisted client might have been accessing.

This is ensured using an internal "osdmap epoch barrier" mechanism.

The purpose of the barrier is to ensure that when we hand out any
capabilities which might allow touching the same RADOS objects, the
clients we hand out the capabilities to must have a sufficiently recent
OSD map to not race with cancelled operations (from ENOSPC) or
blacklisted clients (from evictions).

More specifically, the cases where an epoch barrier is set are:

 * Client eviction (where the client is blacklisted and other clients
   must wait for a post-blacklist epoch to touch the same objects).
 * OSD map full flag handling in the client (where the client may
   cancel some OSD ops from a pre-full epoch, so other clients must
   wait until the full epoch or later before touching the same objects).
 * MDS startup, because we don't persist the barrier epoch, so we must
   assume that the latest OSD map is always required after a restart.

Note that this is a global value for simplicity. We could maintain it on
a per-inode basis, but we don't, because:

 * It would be more complicated.
 * It would use an extra 4 bytes of memory for every inode.
 * It would not be much more efficient because, almost always, everyone has
   the latest OSD map, so in most cases everyone will breeze through this
   barrier rather than waiting.
 * This barrier is done in very rare cases, so any benefit from per-inode
   granularity would only very rarely be seen.

The epoch barrier is transmitted along with all capability messages, and
instructs the receiver of the message to avoid sending any more RADOS
operations to OSDs until it has seen this OSD epoch. This mainly applies
to clients (doing their data writes directly to files), but also applies
to the MDS because things like file size probing and file deletion are
done directly from the MDS.
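
The receiver-side rule described above can be sketched as a toy model. This is
not the Ceph implementation: the class and method names are hypothetical, and
the assumption that the stored barrier only ever moves forward is made for the
example.

```python
class BarrierReceiver:
    """Toy model of a receiver of capability messages (a client or an MDS)."""

    def __init__(self, osdmap_epoch: int):
        self.osdmap_epoch = osdmap_epoch  # newest OSD map epoch seen so far
        self.barrier_epoch = 0            # single global barrier, not per-inode

    def on_cap_message(self, barrier_epoch: int) -> None:
        # Every capability message carries the barrier; keep the highest one
        # seen (assumption: the barrier never moves backwards).
        self.barrier_epoch = max(self.barrier_epoch, barrier_epoch)

    def on_new_osdmap(self, epoch: int) -> None:
        self.osdmap_epoch = max(self.osdmap_epoch, epoch)

    def may_send_osd_op(self) -> bool:
        # RADOS operations are held back until the barrier epoch has been seen.
        return self.osdmap_epoch >= self.barrier_epoch


receiver = BarrierReceiver(osdmap_epoch=10)
receiver.on_cap_message(barrier_epoch=12)  # e.g. set after a client eviction
print(receiver.may_send_osd_op())          # False: map epoch 10 is too old
receiver.on_new_osdmap(12)
print(receiver.may_send_osd_op())          # True: barrier epoch reached
```

In the common case the receiver already has the latest OSD map, so the check
passes immediately and no waiting occurs.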