================================
Ceph file system client eviction
================================

When a file system client is unresponsive or otherwise misbehaving, it
may be necessary to forcibly terminate its access to the file system.  This
process is called *eviction*.

Evicting a CephFS client prevents it from communicating further with MDS
daemons and OSD daemons.  If a client was doing buffered IO to the file system,
any un-flushed data will be lost.

Clients may either be evicted automatically (if they fail to communicate
promptly with the MDS), or manually (by the system administrator).

The client eviction process applies to clients of all kinds, including
FUSE mounts, kernel mounts, nfs-ganesha gateways, and any process using
libcephfs.

Automatic client eviction
=========================

There are three situations in which a client may be evicted automatically.

#. On an active MDS daemon, if a client has not communicated with the MDS for over
   ``session_autoclose`` (a file system variable) seconds (300 seconds by
   default), then it will be evicted automatically.

#. On an active MDS daemon, if a client has not responded to cap revoke messages
   for over ``mds_cap_revoke_eviction_timeout`` (configuration option) seconds,
   then it will be evicted automatically.  This is disabled by default.

#. During MDS startup (including on failover), the MDS passes through a
   state called ``reconnect``.  During this state, it waits for all the
   clients to connect to the new MDS daemon.  If any clients fail to do
   so within the time window (``mds_reconnect_timeout``, 45 seconds by default)
   then they will be evicted.

A warning message is sent to the cluster log if any of these situations
arises.
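
The two timeout rules that apply on an active MDS can be sketched as a small
decision function.  This is an illustrative model only, not MDS code: the
``SessionView`` class and ``eviction_reason`` function are hypothetical names,
and treating a timeout of ``0`` as "disabled" is an assumption made for the
sketch.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Optional

    SESSION_AUTOCLOSE = 300.0          # seconds; the default quoted above
    CAP_REVOKE_EVICTION_TIMEOUT = 0.0  # seconds; 0 models "disabled" (assumption)


    @dataclass
    class SessionView:
        """Hypothetical summary of what the MDS knows about one client session."""
        idle_seconds: float                            # time since the client was last heard from
        oldest_unanswered_revoke: Optional[float] = None  # age of the oldest unanswered cap revoke


    def eviction_reason(session: SessionView,
                        autoclose: float = SESSION_AUTOCLOSE,
                        revoke_timeout: float = CAP_REVOKE_EVICTION_TIMEOUT) -> Optional[str]:
        """Return the name of the rule that would trigger eviction, or None."""
        if session.idle_seconds > autoclose:
            return "session_autoclose"
        if (revoke_timeout > 0
                and session.oldest_unanswered_revoke is not None
                and session.oldest_unanswered_revoke > revoke_timeout):
            return "mds_cap_revoke_eviction_timeout"
        return None

With the defaults, a client that has been silent for 301 seconds matches the
first rule, while a client that is merely slow to answer a cap revoke is not
evicted unless the second timeout has been set to a non-zero value.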

Manual client eviction
======================

Sometimes, the administrator may want to evict a client manually.  This
could happen if a client has died and the administrator does not
want to wait for its session to time out, or it could happen if
a client is misbehaving and the administrator does not have access to
the client node to unmount it.

It is useful to inspect the list of clients first:

::

    ceph tell mds.0 client ls

    [
        {
            "id": 4305,
            "num_leases": 0,
            "num_caps": 3,
            "state": "open",
            "replay_requests": 0,
            "completed_requests": 0,
            "reconnecting": false,
            "inst": "client.4305 172.21.9.34:0/422650892",
            "client_metadata": {
                "ceph_sha1": "ae81e49d369875ac8b569ff3e3c456a31b8f3af5",
                "ceph_version": "ceph version 12.0.0-1934-gae81e49 (ae81e49d369875ac8b569ff3e3c456a31b8f3af5)",
                "entity_id": "0",
                "hostname": "senta04",
                "mount_point": "/tmp/tmpcMpF1b/mnt.0",
                "pid": "29377",
                "root": "/"
            }
        }
    ]


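
When the session list is long, it can help to filter the JSON rather than
read it by eye.  The following is a minimal sketch that assumes the output has
the shape shown above; ``find_sessions`` is a hypothetical helper, and
``SAMPLE`` is a trimmed copy of the example output rather than live data.

.. code-block:: python

    import json

    # Trimmed copy of the example output above (not live cluster data).
    SAMPLE = """
    [
        {
            "id": 4305,
            "num_caps": 3,
            "state": "open",
            "inst": "client.4305 172.21.9.34:0/422650892",
            "client_metadata": {
                "hostname": "senta04",
                "mount_point": "/tmp/tmpcMpF1b/mnt.0",
                "root": "/"
            }
        }
    ]
    """


    def find_sessions(client_ls_json, **metadata):
        """Return sessions whose client_metadata matches every key=value given."""
        sessions = json.loads(client_ls_json)
        return [
            s for s in sessions
            if all(s.get("client_metadata", {}).get(k) == v
                   for k, v in metadata.items())
        ]


    for session in find_sessions(SAMPLE, hostname="senta04"):
        meta = session["client_metadata"]
        print(session["id"], meta["hostname"], meta["mount_point"])

In practice the JSON text would come from the ``client ls`` command shown
above instead of an embedded string.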

Once you have identified the client you want to evict, you can
evict it using its unique ID or another attribute that identifies it:

::

    # These all work
    ceph tell mds.0 client evict id=4305
    ceph tell mds.0 client evict client_metadata.=4305


Advanced: Un-blocklisting a client
==================================

Ordinarily, a blocklisted client may not reconnect to the servers: it
must be unmounted and then mounted anew.

However, in some situations it may be useful to permit a client that
was evicted to attempt to reconnect.

Because CephFS uses the RADOS OSD blocklist to control client eviction,
CephFS clients can be permitted to reconnect by removing them from
the blocklist:

::

    $ ceph osd blocklist ls
    listed 1 entries
    127.0.0.1:0/3710147553 2018-03-19 11:32:24.716146
    $ ceph osd blocklist rm 127.0.0.1:0/3710147553
    un-blocklisting 127.0.0.1:0/3710147553

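
To find the entry that belongs to a particular client, the listing can be
parsed into address and expiry pairs.  This is a sketch under the assumption
that each entry line has the form shown above (address, a space, then the
expiry time); other Ceph releases may format the line differently.
``parse_blocklist`` and ``split_addr`` are hypothetical helper names.

.. code-block:: python

    def parse_blocklist(text):
        """Parse blocklist listing text into (address, expiry) tuples."""
        entries = []
        for line in text.splitlines():
            line = line.strip()
            # Skip blank lines and the "listed N entries" summary line.
            if not line or line.startswith("listed "):
                continue
            addr, _, expires = line.partition(" ")
            entries.append((addr, expires.strip()))
        return entries


    def split_addr(addr):
        """Split 'host:port/nonce' into its three parts."""
        hostport, _, nonce = addr.rpartition("/")
        host, _, port = hostport.rpartition(":")
        return host, int(port), int(nonce)


    listing = """listed 1 entries
    127.0.0.1:0/3710147553 2018-03-19 11:32:24.716146"""

    for addr, expires in parse_blocklist(listing):
        print(split_addr(addr), "expires", expires)

The address has the same ``host:port/nonce`` form as the second field of
``inst`` in the session listing, so the two strings can be compared to find
the entry for an evicted client.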

Doing this may put data integrity at risk if other clients have accessed
files that the blocklisted client was doing buffered IO to.  It is also not
guaranteed to result in a fully functional client -- the best way to get
a fully healthy client back after an eviction is to unmount the client
and do a fresh mount.

If you are trying to reconnect clients in this way, you may also
find it useful to set ``client_reconnect_stale`` to true in the
FUSE client, to prompt the client to try to reconnect.

Advanced: Configuring blocklisting
==================================

If you are experiencing frequent client evictions due to slow
client hosts or an unreliable network, and you cannot fix the underlying
issue, then you may want to ask the MDS to be less strict.

It is possible to respond to slow clients by simply dropping their
MDS sessions, while permitting them to re-open sessions and to
continue talking to OSDs.  To enable this mode, set
``mds_session_blocklist_on_timeout`` to false on your MDS nodes.

For the equivalent behaviour on manual evictions, set
``mds_session_blocklist_on_evict`` to false.

Note that if blocklisting is disabled, then evicting a client will
only have an effect on the MDS you send the command to.  On a system
with multiple active MDS daemons, you would need to send an
eviction command to each active daemon.  When blocklisting is enabled
(the default), sending an eviction command to just a single
MDS is sufficient, because the blocklist propagates it to the others.
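
With blocklisting disabled, the per-daemon eviction described above amounts to
repeating the same command once per active rank.  The sketch below only builds
the command strings; it assumes that daemons are addressed as ``mds.<rank>``,
as in the ``mds.0`` examples earlier, and that the list of active ranks is
already known.  ``evict_on_all_ranks`` is a hypothetical helper.

.. code-block:: python

    def evict_on_all_ranks(client_id, active_ranks):
        """Build one eviction command per active MDS rank."""
        return [
            f"ceph tell mds.{rank} client evict id={client_id}"
            for rank in active_ranks
        ]


    for command in evict_on_all_ranks(4305, [0, 1]):
        print(command)

Printing the commands instead of running them leaves the operator free to
review the list before sending anything to the cluster.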

.. _background_blocklisting_and_osd_epoch_barrier:

Background: Blocklisting and OSD epoch barrier
==============================================

After a client is blocklisted, it is necessary to make sure that
other clients and MDS daemons have the latest OSDMap (including
the blocklist entry) before they try to access any data objects
that the blocklisted client might have been accessing.

This is ensured using an internal "osdmap epoch barrier" mechanism.

The purpose of the barrier is to ensure that, when we hand out any
capabilities which might allow touching the same RADOS objects, the
clients receiving those capabilities have a sufficiently recent
OSD map to not race with cancelled operations (from ENOSPC) or
blocklisted clients (from evictions).

More specifically, the cases where an epoch barrier is set are:

 * Client eviction (where the client is blocklisted and other clients
   must wait for a post-blocklist epoch to touch the same objects).
 * OSD map full flag handling in the client (where the client may
   cancel some OSD ops from a pre-full epoch, so other clients must
   wait until the full epoch or later before touching the same objects).
 * MDS startup, because we don't persist the barrier epoch, so we must
   assume that the latest OSD map is always required after a restart.

Note that this is a global value for simplicity. We could maintain this on
a per-inode basis, but we don't, because:

 * It would be more complicated.
 * It would use an extra 4 bytes of memory for every inode.
 * It would not be much more efficient because, almost always, everyone has
   the latest OSD map, and in most cases everyone will breeze through this
   barrier rather than waiting.
 * The barrier is needed only in very rare cases, so any benefit from per-inode
   granularity would only very rarely be seen.

The epoch barrier is transmitted along with all capability messages, and
instructs the receiver of the message to avoid sending any more RADOS
operations to OSDs until it has seen this OSD epoch.  This mainly applies
to clients (doing their data writes directly to files), but also applies
to the MDS because things like file size probing and file deletion are
done directly from the MDS.
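
The barrier rule can be illustrated with a toy model.  This is a sketch of the
behaviour described above, not the actual MDS or client implementation: the
class names and methods are hypothetical, and the model assumes that the
barrier only ever moves forward.

.. code-block:: python

    class BarrierSource:
        """Toy stand-in for the side that keeps the single global barrier."""

        def __init__(self):
            self.barrier = 0

        def raise_barrier(self, epoch):
            # Assumption: the barrier never moves backwards.
            self.barrier = max(self.barrier, epoch)


    class Receiver:
        """Toy stand-in for anything that receives capability messages."""

        def __init__(self, osdmap_epoch):
            self.osdmap_epoch = osdmap_epoch
            self.barrier = 0

        def on_cap_message(self, barrier):
            # Every capability message carries the current barrier epoch.
            self.barrier = max(self.barrier, barrier)

        def on_osdmap(self, epoch):
            self.osdmap_epoch = max(self.osdmap_epoch, epoch)

        def may_send_osd_ops(self):
            # RADOS operations are held back until the barrier epoch is seen.
            return self.osdmap_epoch >= self.barrier


    source = BarrierSource()
    source.raise_barrier(42)            # e.g. a client was blocklisted in epoch 42

    client = Receiver(osdmap_epoch=40)
    client.on_cap_message(source.barrier)
    print(client.may_send_osd_ops())    # client still has an older OSD map

    client.on_osdmap(42)
    print(client.may_send_osd_ops())    # client has caught up with the barrier

A receiver that already holds the latest OSD map passes the check at once,
which is the common case noted in the list of reasons for keeping one global
value.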