CephFS Reclaim Interface
========================

Introduction
------------
NFS servers typically do not track ephemeral state on stable storage. If
the NFS server is restarted, it comes back up with no ephemeral state,
and the NFS clients are expected to send requests during a grace period
to reclaim the state they held.

In order to support this use case, libcephfs has grown several functions
that allow a client that has been stopped and restarted to destroy or
reclaim state held by a previous incarnation of itself. This allows the
client to reacquire state held by its previous incarnation, and to avoid
the long wait for the old session to time out before releasing the state
previously held.

As soon as an NFS server running over CephFS goes down, it's racing
against its MDS session timeout. If the Ceph session times out before
the NFS grace period is started, then conflicting state could be
acquired by another client. This mechanism also allows us to increase
the timeout for these clients, to ensure that the server has a long
window of time to be restarted.

Setting the UUID
----------------
In order to properly reset or reclaim against the old session, we need a
way to identify the old session. This is done by setting a unique opaque
value on the session using **ceph_set_uuid()**. The uuid value can be
any string and is treated as opaque by the client.

Setting the uuid directly can only be done on a new session, prior to
mounting. When reclaim is performed the current session will inherit the
old session's uuid.
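Because a later ceph_start_reclaim call has to name the old session by the same string, the uuid needs to be reproducible across restarts. One way to get that is to derive it from stable, configured values. The helper below is a minimal sketch of that idea; the function name ``make_session_uuid`` and the ``nfs-<name>-<id>`` layout are hypothetical conventions chosen for illustration, not anything libcephfs requires.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build a session uuid that comes out identical on every restart of the
 * same server. The "nfs-<name>-<id>" layout is a made-up convention for
 * this sketch; the client treats the string as opaque.
 *
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int make_session_uuid(char *buf, size_t len,
                             const char *server_name, unsigned int node_id)
{
    int n = snprintf(buf, len, "nfs-%s-%04x", server_name, node_id);

    /* snprintf reports the length it wanted; reject truncated output. */
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```

The resulting string would be handed to ceph_set_uuid on a new session, and to ceph_start_reclaim by the next incarnation of the same server.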

Starting Reclaim
----------------
After calling ceph_create and ceph_init on the resulting struct
ceph_mount_info, the client should then issue ceph_start_reclaim,
passing in the uuid of the previous incarnation of the client with any
flags.

CEPH_RECLAIM_RESET
  This flag indicates that we do not intend to do any sort of reclaim
  against the old session indicated by the given uuid, and that it
  should just be discarded. Any state held by the previous client
  should be released immediately.

Finishing Reclaim
-----------------
After the Ceph client has completed all of its reclaim operations, the
client should issue ceph_finish_reclaim to indicate that the reclaim is
now complete.

Setting Session Timeout (Optional)
----------------------------------
When a client dies and is restarted, and we need to preserve its state,
we are effectively racing against the session expiration clock. In this
situation we generally want a longer timeout since we expect to
eventually kill off the old session manually. The timeout is set with
ceph_set_session_timeout, which takes the new value in seconds.
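How long the timeout should be depends on how long a restart is expected to take. As a rough sketch, it can be computed from an estimated restart time plus a safety margin, never dropping below some floor. The function name ``pick_session_timeout`` and all three inputs are hypothetical, operator-supplied values; nothing here is derived from Ceph itself.

```c
/*
 * Pick a session timeout, in seconds, long enough to cover a server
 * restart. All inputs are operator estimates (assumptions of this
 * sketch), not values obtained from Ceph.
 */
static unsigned int pick_session_timeout(unsigned int restart_secs,
                                         unsigned int margin_secs,
                                         unsigned int floor_secs)
{
    unsigned int want = restart_secs + margin_secs;

    /* Never go below the floor, even if the estimates are small. */
    return want > floor_secs ? want : floor_secs;
}
```

With a 180 second restart estimate and a 120 second margin this yields 300, which would then be passed as the second argument to ceph_set_session_timeout.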

Example 1: Reset Old Session
----------------------------
This example just kills off the MDS session held by a previous instance
of itself. An NFS server can start a grace period and then ask the MDS
to tear down the old session. This allows clients to start reclaim
immediately.

(Note: error handling omitted for clarity)

.. code-block:: c

  /* Declarations come from <cephfs/libcephfs.h>. */
  struct ceph_mount_info *cmount;
  const char *uuid = "foobarbaz";
  const char *rootpath = "/";   /* example mount root */
  int rc;

  /* Set up a new cephfs session, but don't mount it yet. */
  rc = ceph_create(&cmount, NULL);
  rc = ceph_init(cmount);

  /*
   * Set the timeout to 5 minutes to lengthen the window of time for
   * the server to restart, should it crash.
   */
  ceph_set_session_timeout(cmount, 300);

  /*
   * Start reclaim vs. session with old uuid. Before calling this,
   * all NFS servers that could acquire conflicting state _must_ be
   * enforcing their grace period locally.
   */
  rc = ceph_start_reclaim(cmount, uuid, CEPH_RECLAIM_RESET);

  /* Declare reclaim complete (ceph_finish_reclaim returns void) */
  ceph_finish_reclaim(cmount);

  /* Set uuid held by new session */
  ceph_set_uuid(cmount, uuid);

  /*
   * Now mount up the file system and do normal open/lock operations to
   * satisfy reclaim requests.
   */
  ceph_mount(cmount, rootpath);
  ...
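Example 1 leaves out error handling. A minimal sketch of the same call sequence with return codes checked might look like the following. The function name ``reset_and_mount``, the ``USE_LIBCEPHFS`` build switch, and the stub functions are inventions for this sketch: the stubs only exist so that it compiles without Ceph installed, and they record call order rather than talk to a cluster. Treating a failed ceph_start_reclaim as non-fatal is also an assumption, on the reasoning that there may simply be no old session to reset; check the return codes of the libcephfs version in use.

```c
#include <stdio.h>
#include <string.h>

#ifdef USE_LIBCEPHFS
#include <cephfs/libcephfs.h>
#else
/*
 * Stand-ins so the sketch builds without Ceph installed. They mirror the
 * libcephfs prototypes and only record the order in which they are called.
 */
struct ceph_mount_info { int unused; };
#define CEPH_RECLAIM_RESET 1 /* placeholder value for the stub build */
static struct ceph_mount_info stub_mount;
static char call_log[128];

static int log_call(const char *name)
{
    strncat(call_log, name, sizeof(call_log) - strlen(call_log) - 1);
    return 0;
}
static int ceph_create(struct ceph_mount_info **cm, const char *id)
{ (void)id; *cm = &stub_mount; return log_call("create,"); }
static int ceph_init(struct ceph_mount_info *cm)
{ (void)cm; return log_call("init,"); }
static int ceph_set_session_timeout(struct ceph_mount_info *cm, unsigned t)
{ (void)cm; (void)t; return log_call("timeout,"); }
static int ceph_start_reclaim(struct ceph_mount_info *cm, const char *uuid,
                              unsigned flags)
{ (void)cm; (void)uuid; (void)flags; return log_call("start,"); }
static void ceph_finish_reclaim(struct ceph_mount_info *cm)
{ (void)cm; log_call("finish,"); }
static void ceph_set_uuid(struct ceph_mount_info *cm, const char *uuid)
{ (void)cm; (void)uuid; log_call("uuid,"); }
static int ceph_mount(struct ceph_mount_info *cm, const char *root)
{ (void)cm; (void)root; return log_call("mount,"); }
static int ceph_release(struct ceph_mount_info *cm)
{ (void)cm; return log_call("release,"); }
#endif

/*
 * Reset the session left behind by a previous incarnation, then mount.
 * Returns 0 on success or a negative errno-style code.
 */
static int reset_and_mount(const char *uuid, const char *root,
                           struct ceph_mount_info **out)
{
    struct ceph_mount_info *cmount;
    int rc;

    /* Cluster configuration is omitted here, as in Example 1. */
    rc = ceph_create(&cmount, NULL);
    if (rc < 0)
        return rc;

    rc = ceph_init(cmount);
    if (rc < 0)
        goto fail;

    /* 5 minutes, matching Example 1. */
    rc = ceph_set_session_timeout(cmount, 300);
    if (rc < 0)
        goto fail;

    /* Assumption: a failure may just mean there was nothing to reset. */
    rc = ceph_start_reclaim(cmount, uuid, CEPH_RECLAIM_RESET);
    if (rc < 0)
        fprintf(stderr, "start_reclaim: %s (continuing)\n", strerror(-rc));
    ceph_finish_reclaim(cmount);

    /* Tag the new session so the next incarnation can find it. */
    ceph_set_uuid(cmount, uuid);

    rc = ceph_mount(cmount, root);
    if (rc < 0)
        goto fail;

    *out = cmount;
    return 0;

fail:
    fprintf(stderr, "reclaim setup failed: %s\n", strerror(-rc));
    ceph_release(cmount);
    return rc;
}
```

Building with ``-DUSE_LIBCEPHFS`` and linking against libcephfs swaps the stubs for the real library calls. As in Example 1, all NFS servers that could acquire conflicting state must already be enforcing their grace period before this function is called.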