Ceph filesystem client eviction
===============================
When a filesystem client is unresponsive or otherwise misbehaving, it
may be necessary to forcibly terminate its access to the filesystem. This
process is called *eviction*.
This process is deliberately thorough, in order to protect against data
inconsistency resulting from misbehaving clients.
First, prevent the client from performing any more data operations by
*blacklisting* it at the RADOS level. You may be familiar with this concept
as *fencing* in other storage systems.
Identify the client to evict from the MDS session list:

::

    # ceph daemon mds.a session ls
    [
        { "id": 4117,
          ...
          "reconnecting": false,
          "inst": "client.4117 172.16.79.251:0\/3271",
          "client_metadata": { "entity_id": "admin",
              "hostname": "fedoravm.localdomain",
              "mount_point": "\/home\/user\/mnt"}}]
In this case the 'fedoravm' client has address ``172.16.79.251:0/3271``, so
we blacklist it:

::

    # ceph osd blacklist add 172.16.79.251:0/3271
    blacklisting 172.16.79.251:0/3271 until 2014-12-09 13:09:56.569368 (3600 sec)
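On a busy system the ``session ls`` output above can be long. The relevant
fields can be picked out programmatically; the sketch below is our own helper
(not part of Ceph), assuming only the JSON shape shown above:

```python
import json

def find_sessions(session_ls_json, hostname):
    """Return (id, address) pairs for MDS sessions whose client
    reported the given hostname in its metadata."""
    matches = []
    for session in json.loads(session_ls_json):
        meta = session.get("client_metadata", {})
        if meta.get("hostname") == hostname:
            # "inst" looks like "client.4117 172.16.79.251:0/3271"
            addr = session["inst"].split()[1]
            matches.append((session["id"], addr))
    return matches

# Example using the session entry shown above:
sample = '''[{"id": 4117,
              "inst": "client.4117 172.16.79.251:0/3271",
              "client_metadata": {"entity_id": "admin",
                                  "hostname": "fedoravm.localdomain",
                                  "mount_point": "/home/user/mnt"}}]'''
print(find_sessions(sample, "fedoravm.localdomain"))
# -> [(4117, '172.16.79.251:0/3271')]
```

The address is what gets passed to ``ceph osd blacklist add``, and the id is
used later when evicting the MDS session.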
While the evicted client is now marked as blacklisted in the central (mon)
copy of the OSD map, it is also necessary to ensure that this OSD map update
has propagated to all daemons involved in subsequent filesystem I/O. To do
this, use the ``osdmap barrier`` MDS admin socket command.
First read the latest OSD epoch:

::

    # ceph osd dump
    epoch 12
    fsid fd61ca96-53ff-4311-826c-f36b176d69ea
    created 2014-12-09 12:03:38.595844
    modified 2014-12-09 12:09:56.619957
    ...
In this case it is 12. Now request the MDS to barrier on this epoch:

::

    # ceph daemon mds.a osdmap barrier 12
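When scripting this step, the epoch can be parsed out of the ``ceph osd dump``
text before being passed to ``osdmap barrier``. A minimal sketch; the parsing
helper is our own, not a Ceph API:

```python
def osd_epoch(dump_output):
    """Extract the epoch number from `ceph osd dump` text output,
    whose first line has the form 'epoch N'."""
    for line in dump_output.splitlines():
        if line.startswith("epoch "):
            return int(line.split()[1])
    raise ValueError("no epoch line found in osd dump output")

# Example with the dump output shown above:
sample = """epoch 12
fsid fd61ca96-53ff-4311-826c-f36b176d69ea
created 2014-12-09 12:03:38.595844
modified 2014-12-09 12:09:56.619957"""

print(osd_epoch(sample))  # -> 12
```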
Finally, it is safe to evict the client's MDS session, such that any
capabilities it held may be issued to other clients. The ID here is the
``id`` attribute from the ``session ls`` output:

::

    # ceph daemon mds.a session evict 4117
That's it! The client has now been evicted, and any resources it had locked
will now be available for other clients.
Background: OSD epoch barrier
-----------------------------
The purpose of the barrier is to ensure that when we hand out any
capabilities which might allow touching the same RADOS objects, the
clients we hand those capabilities to have a sufficiently recent OSD
map that they cannot race with cancelled operations (from ENOSPC) or
blacklisted clients (from evictions).
More specifically, the cases where we set an epoch barrier are:

* Client eviction (where the client is blacklisted and other clients
  must wait for a post-blacklist epoch to touch the same objects).
* OSD map full flag handling in the client (where the client may
  cancel some OSD ops from a pre-full epoch, so other clients must
  wait until the full epoch or later before touching the same objects).
* MDS startup, because we don't persist the barrier epoch, so must
  assume that the latest OSD map is always required after a restart.
Note that this is a global value for simplicity: we could maintain it on
a per-inode basis instead. We don't, because:

* It would be more complicated.
* It would use an extra 4 bytes of memory for every inode.
* It would not be much more efficient, because almost everyone has the
  latest OSD map anyway; in most cases everyone will breeze through this
  barrier.
* We only set this barrier in very rare cases, so any benefit from
  per-inode granularity would only very rarely be seen.
The epoch barrier is transmitted along with all capability messages, and
instructs the receiver of the message to avoid sending any more RADOS
operations to OSDs until it has seen this OSD epoch. This mainly applies
to clients (doing their data writes directly to files), but also applies
to the MDS, because things like file size probing and file deletion are
done directly from the MDS.
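To make the mechanism concrete, here is a toy model (our own sketch, not Ceph
code) of a receiver honouring the barrier: operations submitted while its OSD
map is older than the barrier epoch are held back, and released once a
sufficiently new map arrives:

```python
class BarrierClient:
    """Toy model of a client honouring an OSD epoch barrier."""

    def __init__(self, osd_epoch):
        self.osd_epoch = osd_epoch   # latest OSD map epoch this client has seen
        self.barrier = 0             # highest barrier received in a cap message
        self.queued = []             # RADOS ops held back by the barrier
        self.sent = []               # RADOS ops actually sent to OSDs

    def receive_caps(self, barrier_epoch):
        # Every capability message carries the barrier epoch.
        self.barrier = max(self.barrier, barrier_epoch)

    def submit_op(self, op):
        # Ops are held until we have seen an OSD map >= the barrier epoch.
        if self.osd_epoch >= self.barrier:
            self.sent.append(op)
        else:
            self.queued.append(op)

    def receive_osdmap(self, epoch):
        self.osd_epoch = max(self.osd_epoch, epoch)
        if self.osd_epoch >= self.barrier:
            self.sent.extend(self.queued)
            self.queued.clear()

client = BarrierClient(osd_epoch=11)
client.receive_caps(barrier_epoch=12)  # e.g. barrier set after an eviction
client.submit_op("write A")            # held back: client's map is too old
client.receive_osdmap(12)              # map catches up; queued ops released
client.submit_op("write B")            # sent immediately
print(client.sent)                     # -> ['write A', 'write B']
```

In real Ceph the same idea applies to the MDS itself, since it too issues
RADOS operations directly.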