CephFS Reclaim Interface
========================

Introduction
------------
NFS servers typically do not track ephemeral state on stable storage. If
the NFS server is restarted, then it will be resurrected with no
ephemeral state, and the NFS clients are expected to send requests during
a grace period to reclaim the state they held.

In order to support this use-case, libcephfs has grown several functions
that allow a client that has been stopped and restarted to destroy or
reclaim state held by a previous incarnation of itself. This lets the
client reacquire that state without the long wait for the old session to
time out and release it.

As soon as an NFS server running over CephFS goes down, it is racing
against its MDS session timeout. If the Ceph session times out before
the NFS grace period is started, then conflicting state could be
acquired by another client. This mechanism also allows us to increase
the timeout for these clients, to ensure that the server has a longer
window of time to be restarted.

Setting the UUID
----------------
In order to properly reset or reclaim against the old session, we need a
way to identify the old session. This is done by setting a unique opaque
value on the session using **ceph_set_uuid()**. The uuid value can be
any string and is treated as opaque by the client.

Setting the uuid directly can only be done on a new session, prior to
mounting. When reclaim is performed, the current session will inherit the
old session's uuid.
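
Because the restarted client has to pass the uuid of its previous
incarnation, that value must be recoverable after a crash. One way to do
this, shown here only as an assumption and not as something libcephfs
requires, is to derive the uuid deterministically from a stable per-node
identifier. The helper ``make_session_uuid()`` and the ``nfs-`` prefix are
hypothetical; the sketch uses only standard C string handling.

.. code-block:: c

    #include <stdio.h>
    #include <string.h>

    /*
     * Hypothetical helper: build this server's session uuid from a stable
     * node identifier. The "nfs-" prefix is an illustrative assumption;
     * libcephfs treats the resulting string as opaque.
     * Returns 0 on success, -1 if the result does not fit in buf.
     */
    static int make_session_uuid(char *buf, size_t len, const char *nodeid)
    {
        int n = snprintf(buf, len, "nfs-%s", nodeid);

        /* snprintf returns the length it needed; reject truncated output. */
        if (n < 0 || (size_t)n >= len)
            return -1;
        return 0;
    }

Since the same node identifier always yields the same string, the
restarted server can recompute the old uuid, hand it to
**ceph_start_reclaim()**, and then set it on the new session with
**ceph_set_uuid()** so that the next restart can find it again.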

Starting Reclaim
----------------
After calling **ceph_create()** and then **ceph_init()** on the resulting
``struct ceph_mount_info``, the client should issue
**ceph_start_reclaim()**, passing in the uuid of the previous incarnation
of the client along with any flags. The following flag may be passed:

CEPH_RECLAIM_RESET
    This flag indicates that we do not intend to do any sort of reclaim
    against the old session indicated by the given uuid, and that it
    should just be discarded. Any state held by the previous client
    should be released immediately.

Finishing Reclaim
-----------------
After the Ceph client has completed all of its reclaim operations, the
client should issue **ceph_finish_reclaim()** to indicate that the
reclaim is now complete.

Setting Session Timeout (Optional)
----------------------------------
When a client dies and is restarted, and we need to preserve its state,
we are effectively racing against the session expiration clock. In this
situation we generally want a longer timeout, since we expect to
eventually kill off the old session manually. The timeout, in seconds,
can be set with **ceph_set_session_timeout()** before mounting.

Example 1: Reset Old Session
----------------------------
This example just kills off the MDS session held by a previous instance
of itself. An NFS server can start a grace period and then ask the MDS
to tear down the old session. This allows clients to start reclaim
immediately.

(Note: error handling omitted for clarity)

.. code-block:: c

    #include <cephfs/libcephfs.h>

    struct ceph_mount_info *cmount;
    const char *uuid = "foobarbaz";     /* uuid of the previous incarnation */
    const char *nodeid = "foobarbaz";   /* uuid for this session; kept stable across restarts (placeholder) */
    const char *rootpath = "/";         /* path to mount (placeholder) */
    int rc;

    /* Set up a new cephfs session, but don't mount it yet. */
    rc = ceph_create(&cmount, NULL);
    rc = ceph_conf_read_file(cmount, NULL);   /* load the default ceph.conf */
    rc = ceph_init(cmount);

    /*
     * Set the timeout to 5 minutes to lengthen the window of time for
     * the server to restart, should it crash.
     */
    ceph_set_session_timeout(cmount, 300);

    /*
     * Start reclaim vs. session with old uuid. Before calling this,
     * all NFS servers that could acquire conflicting state _must_ be
     * enforcing their grace period locally.
     */
    rc = ceph_start_reclaim(cmount, uuid, CEPH_RECLAIM_RESET);

    /* Declare reclaim complete */
    rc = ceph_finish_reclaim(cmount);

    /* Set uuid held by new session */
    ceph_set_uuid(cmount, nodeid);

    /*
     * Now mount up the file system and do normal open/lock operations to
     * satisfy reclaim requests.
     */
    rc = ceph_mount(cmount, rootpath);
    ...