Commit | Line | Data |
---|---|---|
7c673cae | 1 | |
31f18b77 | 2 | =============================== |
7c673cae FG |
3 | Ceph filesystem client eviction |
4 | =============================== | |
5 | ||
6 | When a filesystem client is unresponsive or otherwise misbehaving, it | |
7 | may be necessary to forcibly terminate its access to the filesystem. This | |
8 | process is called *eviction*. | |
9 | ||
31f18b77 FG |
10 | Evicting a CephFS client prevents it from communicating further with MDS |
11 | daemons and OSD daemons. If a client was doing buffered IO to the filesystem, | |
12 | any un-flushed data will be lost. | |
13 | ||
14 | Clients may either be evicted automatically (if they fail to communicate | |
15 | promptly with the MDS), or manually (by the system administrator). | |
16 | ||
17 | The client eviction process applies to clients of all kinds, including | |
18 | FUSE mounts, kernel mounts, nfs-ganesha gateways, and any process using | |
19 | libcephfs. | |
20 | ||
21 | Automatic client eviction | |
22 | ========================= | |
23 | ||
24 | There are two situations in which a client may be evicted automatically: | |
25 | ||
26 | On an active MDS daemon, if a client has not communicated with the MDS for | |
27 | over ``mds_session_autoclose`` seconds (300 seconds by default), then it | |
28 | will be evicted automatically. | |
29 | ||
30 | During MDS startup (including on failover), the MDS passes through a | |
31 | state called ``reconnect``. During this state, it waits for all the | |
32 | clients to connect to the new MDS daemon. If any clients fail to do | |
33 | so within the time window (``mds_reconnect_timeout``, 45 seconds by default) | |
34 | then they will be evicted. | |
35 | ||
36 | A warning message is sent to the cluster log if either of these situations | |
37 | arises. | |
7c673cae | 38 | |
31f18b77 FG |
39 | Manual client eviction |
40 | ====================== | |
7c673cae | 41 | |
31f18b77 FG |
42 | Sometimes, the administrator may want to evict a client manually. This |
43 | could happen if a client has died and the administrator does not | |
44 | want to wait for its session to time out, or it could happen if | |
45 | a client is misbehaving and the administrator does not have access to | |
46 | the client node to unmount it. | |
7c673cae | 47 | |
31f18b77 | 48 | It is useful to inspect the list of clients first: |
7c673cae FG |
49 | |
50 | :: | |
51 | ||
31f18b77 FG |
52 | ceph tell mds.0 client ls |
53 | ||
7c673cae | 54 | [ |
31f18b77 FG |
55 | { |
56 | "id": 4305, | |
57 | "num_leases": 0, | |
58 | "num_caps": 3, | |
59 | "state": "open", | |
60 | "replay_requests": 0, | |
61 | "completed_requests": 0, | |
62 | "reconnecting": false, | |
63 | "inst": "client.4305 172.21.9.34:0/422650892", | |
64 | "client_metadata": { | |
65 | "ceph_sha1": "ae81e49d369875ac8b569ff3e3c456a31b8f3af5", | |
66 | "ceph_version": "ceph version 12.0.0-1934-gae81e49 (ae81e49d369875ac8b569ff3e3c456a31b8f3af5)", | |
67 | "entity_id": "0", | |
68 | "hostname": "senta04", | |
69 | "mount_point": "/tmp/tmpcMpF1b/mnt.0", | |
70 | "pid": "29377", | |
71 | "root": "/" | |
72 | } | |
73 | } | |
74 | ] | |
75 | ||
76 | ||
77 | ||
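When many sessions are listed, it can help to filter the JSON programmatically. The following is a minimal sketch, not part of Ceph: it assumes the output of ``ceph tell mds.0 client ls`` has been captured as a string, and the helper name ``find_sessions`` is hypothetical. The sample data is a trimmed copy of the session shown above.

```python
import json

# Trimmed copy of the sample session from the listing above. In practice this
# string would be the captured output of `ceph tell mds.0 client ls`.
SAMPLE = """
[
    {
        "id": 4305,
        "num_caps": 3,
        "state": "open",
        "inst": "client.4305 172.21.9.34:0/422650892",
        "client_metadata": {
            "hostname": "senta04",
            "mount_point": "/tmp/tmpcMpF1b/mnt.0",
            "root": "/"
        }
    }
]
"""


def find_sessions(sessions, **metadata):
    """Return the sessions whose client_metadata matches every key/value given.

    `find_sessions` is a hypothetical helper for illustration only.
    """
    return [
        session
        for session in sessions
        if all(
            session.get("client_metadata", {}).get(key) == value
            for key, value in metadata.items()
        )
    ]


sessions = json.loads(SAMPLE)

# Print the ID and address of every session mounted from host "senta04".
for session in find_sessions(sessions, hostname="senta04"):
    print(session["id"], session["inst"])
```

The ``id`` field of a matching session is the value to pass to the eviction command.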
78 | Once you have identified the client you want to evict, you can | |
79 | evict it using its unique ID, or using various other attributes that identify it: | |
7c673cae FG |
80 | |
81 | :: | |
31f18b77 FG |
82 | |
83 | # These all work | |
84 | ceph tell mds.0 client evict id=4305 | |
85 | ceph tell mds.0 client evict client_metadata.=4305 | |
86 | ||
7c673cae | 87 | |
31f18b77 FG |
88 | Advanced: Un-blacklisting a client |
89 | ================================== | |
7c673cae | 90 | |
31f18b77 FG |
91 | Ordinarily, a blacklisted client may not reconnect to the servers: it |
92 | must be unmounted and then mounted anew. | |
7c673cae | 93 | |
31f18b77 FG |
94 | However, in some situations it may be useful to permit a client that |
95 | was evicted to attempt to reconnect. | |
7c673cae | 96 | |
31f18b77 FG |
97 | Because CephFS uses the RADOS OSD blacklist to control client eviction, |
98 | CephFS clients can be permitted to reconnect by removing them from | |
99 | the blacklist: | |
7c673cae FG |
100 | |
101 | :: | |
102 | ||
31f18b77 FG |
103 | ceph osd blacklist ls |
104 | # ... identify the address of the client ... | |
105 | ceph osd blacklist rm <address> | |
7c673cae | 106 | |
31f18b77 FG |
107 | Doing this may put data integrity at risk if other clients have accessed |
108 | files that the blacklisted client was doing buffered IO to. It is also not | |
109 | guaranteed to result in a fully functional client -- the best way to get | |
110 | a fully healthy client back after an eviction is to unmount the client | |
111 | and do a fresh mount. | |
7c673cae | 112 | |
31f18b77 FG |
113 | If you are trying to reconnect clients in this way, you may also |
114 | find it useful to set ``client_reconnect_stale`` to true in the | |
115 | FUSE client, to prompt the client to try to reconnect. | |
7c673cae | 116 | |
31f18b77 FG |
117 | Advanced: Configuring blacklisting |
118 | ================================== | |
7c673cae | 119 | |
31f18b77 FG |
120 | If you are experiencing frequent client evictions, due to slow |
121 | client hosts or an unreliable network, and you cannot fix the underlying | |
122 | issue, then you may want to ask the MDS to be less strict. | |
7c673cae | 123 | |
31f18b77 FG |
124 | It is possible to respond to slow clients by simply dropping their |
125 | MDS sessions, while still permitting them to re-open sessions and | |
126 | to continue talking to OSDs. To enable this mode, set | |
127 | ``mds_session_blacklist_on_timeout`` to false on your MDS nodes. | |
7c673cae | 128 | |
31f18b77 FG |
129 | For the equivalent behaviour on manual evictions, set |
130 | ``mds_session_blacklist_on_evict`` to false. | |
131 | ||
132 | Note that if blacklisting is disabled, then evicting a client will | |
133 | only have an effect on the MDS you send the command to. On a system | |
134 | with multiple active MDS daemons, you would need to send an | |
135 | eviction command to each active daemon. When blacklisting is enabled | |
136 | (the default), sending an eviction command to just a single | |
137 | MDS is sufficient, because the blacklist propagates it to the others. | |
138 | ||
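With blacklisting disabled, the per-daemon evictions described above could be scripted. This is a rough sketch under stated assumptions: the helper ``eviction_commands`` is hypothetical, the list of active ranks (``[0, 1]`` here) is made up and must be supplied by the administrator, and the command form is the ``ceph tell mds.<rank> client evict id=<id>`` form shown earlier in this document.

```python
def eviction_commands(client_id, active_ranks):
    """Build one `ceph tell ... client evict` argument list per active MDS rank.

    Hypothetical helper: it only builds the commands and does not run them.
    """
    return [
        ["ceph", "tell", f"mds.{rank}", "client", "evict", f"id={client_id}"]
        for rank in active_ranks
    ]


# Example: client 4305 on a system assumed to have active MDS ranks 0 and 1.
for argv in eviction_commands(4305, [0, 1]):
    print(" ".join(argv))
```

Each argument list could be passed to ``subprocess.run`` once the commands have been reviewed. Building them separately from running them makes it easy to check what will be sent to each daemon first.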
139 | Advanced options | |
140 | ================ | |
141 | ||
142 | ``mds_blacklist_interval`` - this setting controls how many seconds | |
143 | entries remain in the blacklist. | |
7c673cae | 144 | |
7c673cae | 145 |