=================================
CephFS Distributed Metadata Cache
=================================
While the data for inodes in a Ceph file system is stored in RADOS and
accessed by the clients directly, inode metadata and directory
information is managed by the Ceph metadata server (MDS). The MDSs
act as mediators for all metadata-related activity, storing the resulting
information in a separate RADOS pool from the file data.

CephFS clients can request that the MDS fetch or change inode metadata
on their behalf, but an MDS can also grant the client **capabilities**
(aka **caps**) for each inode (see :doc:`/cephfs/capabilities`).

A capability grants the client the ability to cache and possibly
manipulate some portion of the data or metadata associated with the
inode. When another client needs access to the same information, the MDS
will revoke the capability and the client will eventually return it,
along with an updated version of the inode's metadata (if the client
changed the metadata while it held the capability).

Clients can request capabilities and will generally get them, but when
there is competing access or memory pressure on the MDS, they may be
**revoked**. When a capability is revoked, the client is responsible for
returning it as soon as it is able. Clients that fail to do so in a
timely fashion may end up **blocklisted** and unable to communicate with
the cluster.

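
The grant-and-revoke cycle described above can be sketched as a toy
model. This is a minimal illustration, not Ceph code: the ``ToyMDS`` and
``ToyClient`` classes, their method names, and the simplification to a
single exclusive cap per inode are all assumptions made for the example.

```python
class ToyClient:
    """Hypothetical client that caches inode metadata while it holds a cap."""

    def __init__(self, name):
        self.name = name
        self.cached = {}  # ino -> locally cached (possibly modified) metadata

    def on_grant(self, ino, metadata):
        self.cached[ino] = dict(metadata)

    def on_revoke(self, ino):
        # Return the cap together with the metadata as last modified locally.
        return self.cached.pop(ino)


class ToyMDS:
    """Hypothetical MDS that hands out one exclusive cap per inode."""

    def __init__(self):
        self.metadata = {}  # ino -> metadata as known to the MDS
        self.holder = {}    # ino -> client currently holding the cap

    def request_cap(self, client, ino):
        current = self.holder.get(ino)
        if current is client:
            return  # the client already holds the cap
        if current is not None:
            # Competing access: revoke, then fold the returned state back in.
            self.metadata[ino] = current.on_revoke(ino)
        self.holder[ino] = client
        client.on_grant(ino, self.metadata.setdefault(ino, {"size": 0}))


mds = ToyMDS()
a, b = ToyClient("a"), ToyClient("b")
mds.request_cap(a, 1)
a.cached[1]["size"] = 4096   # a changes metadata while holding the cap
mds.request_cap(b, 1)        # competing access forces a revoke of a's cap
print(b.cached[1]["size"])   # 4096
```

The second client sees the size written by the first because the
revoked cap is returned together with the updated metadata.
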
Since the cache is distributed, the MDS must take great care to ensure
that no client holds capabilities that may conflict with other clients'
capabilities, or with operations that the MDS performs itself. This
allows CephFS clients to rely on much greater cache coherence than a
file system like NFS, where the client may cache data and metadata
beyond the point where it has changed on the server.

Client Metadata Requests
------------------------
When a client needs to query or change inode metadata or perform an
operation on a directory, it has two options. It can make a request to
the MDS directly, or serve the information out of its cache. With
CephFS, the latter is only possible if the client has the necessary
caps.

Clients can send simple requests to the MDS to query or request changes
to certain metadata. The replies to these requests may also grant the
client a certain set of caps for the inode, allowing it to perform
subsequent requests without consulting the MDS.

Clients can also request caps directly from the MDS, which is necessary
in order to read or write file data.

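
The choice between the local cache and an MDS request can be sketched as
follows. This is a simplified illustration under stated assumptions: the
``CachingClient`` class, its ``stat`` method, and the ``fake_mds``
callable are hypothetical stand-ins, and caps are reduced to a single
yes/no flag per inode.

```python
class CachingClient:
    """Hypothetical client that serves metadata locally when it holds a cap."""

    def __init__(self, mds_lookup):
        self.mds_lookup = mds_lookup  # callable standing in for an MDS request
        self.caps = set()             # inodes for which a cap is held
        self.cache = {}               # ino -> cached metadata
        self.mds_requests = 0

    def stat(self, ino):
        if ino in self.caps and ino in self.cache:
            return self.cache[ino]    # served from cache, no MDS round trip
        self.mds_requests += 1
        metadata, granted = self.mds_lookup(ino)
        self.cache[ino] = metadata
        if granted:
            self.caps.add(ino)        # the reply also granted a cap
        return metadata


def fake_mds(ino):
    # Stand-in MDS: returns metadata and always grants a cap with the reply.
    return {"ino": ino, "mode": 0o644}, True


client = CachingClient(fake_mds)
client.stat(7)
client.stat(7)
print(client.mds_requests)  # 1
```

Only the first call reaches the stand-in MDS; the second is answered
from the cache because the first reply carried a cap.
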
Distributed Locks in an MDS Cluster
-----------------------------------
When an MDS wants to read or change information about an inode, it must
gather the appropriate locks for it. The MDS cluster may have a series
of different types of locks on the given inode and each MDS may have
disjoint sets of locks.

If there are outstanding caps that would conflict with these locks, they
must be revoked before the lock can be acquired. Once the competing
caps are returned to the MDS, it can get the locks and do the
operation.

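
The ordering described above (revoke conflicting caps first, then take
the lock) can be illustrated with a small sketch. The conflict rule used
here is an assumption chosen for the example, not the actual CephFS lock
and cap matrix, and the function names are hypothetical.

```python
def conflicting_holders(outstanding, lock_mode):
    """Return the clients whose caps must be revoked before taking lock_mode.

    Toy rule (an assumption): a "write" lock conflicts with every cap,
    a "read" lock only with "exclusive" caps.
    """
    if lock_mode == "write":
        return sorted(outstanding)
    return sorted(c for c, cap in outstanding.items() if cap == "exclusive")


def acquire(outstanding, lock_mode):
    """Revoke conflicting caps, then treat the lock as acquired."""
    revoked = conflicting_holders(outstanding, lock_mode)
    for client in revoked:
        # Stands in for the revoke message and the client returning the cap.
        del outstanding[client]
    return revoked


caps = {"client.a": "shared", "client.b": "shared"}
print(acquire(caps, "read"))   # []
print(acquire(caps, "write"))  # ['client.a', 'client.b']
```
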
On a file system served by multiple MDSs, the metadata cache is also
distributed among the MDSs in the cluster. For every inode, at any given
time, only one MDS in the cluster is considered **authoritative**. Any
requests to change that inode must be done by the authoritative MDS,
though a non-authoritative MDS can forward requests to the authoritative
one.

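
A minimal sketch of request forwarding, assuming a hypothetical
``ToyCluster`` class that records which rank is authoritative for each
inode. Real MDS forwarding involves messaging between daemons; here it
is collapsed into a single method call.

```python
class ToyCluster:
    """Hypothetical MDS cluster with one authoritative rank per inode."""

    def __init__(self, n_ranks):
        self.auth = {}                                  # ino -> authoritative rank
        self.store = [dict() for _ in range(n_ranks)]   # per-rank inode state

    def handle_change(self, receiving_rank, ino, key, value):
        auth_rank = self.auth[ino]
        # A non-authoritative rank does not apply the change itself.
        forwarded = auth_rank != receiving_rank
        self.store[auth_rank].setdefault(ino, {})[key] = value
        return auth_rank, forwarded


cluster = ToyCluster(2)
cluster.auth[42] = 1
print(cluster.handle_change(0, 42, "mode", 0o600))  # (1, True)
```
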
Non-auth MDSs can also obtain read locks that prevent the auth MDS from
changing the data until the lock is dropped, so that they can serve
inode info to the clients.

The auth MDS for an inode can change over time as well. The MDSs will
actively balance responsibility for the inode cache amongst
themselves, but this can be overridden by **pinning** certain subtrees
to a single MDS.
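
As a hedged illustration of pinning: the CephFS documentation on
multiple active MDS daemons describes an export pin set through the
``ceph.dir.pin`` extended attribute on a directory. The sketch below
sets that attribute from Python. The ``pin_subtree`` helper and the
example path are hypothetical, and ``os.setxattr`` is only available on
Linux.

```python
import os


def pin_subtree(path, rank, setxattr=None):
    """Set the ceph.dir.pin extended attribute on a CephFS directory.

    ``setxattr`` may be replaced with a stand-in for testing; by default
    the Linux-only os.setxattr is used.
    """
    if setxattr is None:
        setxattr = os.setxattr
    setxattr(path, "ceph.dir.pin", str(rank).encode())


# Example call (hypothetical mount point; requires a mounted CephFS):
# pin_subtree("/mnt/cephfs/projects", 2)
```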