MDS Journaling
==============

CephFS Metadata Pool
--------------------

CephFS uses a separate (metadata) pool for managing file metadata (inodes and
dentries) in a Ceph File System. The metadata pool holds all the information about
files in a Ceph File System, including the file system hierarchy. Additionally,
CephFS maintains metadata for other entities in a file system, such as the
file system journals, the open file table, and the session map.

This document describes how Ceph Metadata Servers use and rely on journaling.

CephFS MDS Journaling
---------------------

CephFS metadata servers stream a journal of metadata events into RADOS in the metadata
pool prior to executing a file system operation. Active MDS daemon(s) manage metadata
for files and directories in CephFS.

CephFS uses journaling for a couple of reasons:

#. Consistency: On an MDS failover, the journal events can be replayed to reach a
   consistent file system state. Also, metadata operations that require multiple
   updates to the backing store need to be journaled for crash consistency (along
   with other consistency mechanisms such as locking).

#. Performance: Journal updates are (mostly) sequential, hence updates to journals
   are fast. Furthermore, updates can be batched into a single write, thereby saving
   the disk seek time involved in updates to different parts of a file. Having a large
   journal also helps a standby MDS to warm its cache, which helps indirectly during
   MDS failover.
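
The consistency argument can be illustrated with a toy model. The sketch below is
not MDS code: ``ToyMDS``, its methods, and the event dictionaries are hypothetical
names chosen for illustration, and it ignores trimming, so the whole history is
replayed from an empty state. It only shows the ordering (journal first, then
apply) and how a multi-update operation is recorded as a single event.

.. code-block:: python

   from dataclasses import dataclass, field


   @dataclass
   class ToyMDS:
       """Toy metadata server: journal first, then apply (hypothetical model)."""

       journal: list = field(default_factory=list)   # stands in for the journal in RADOS
       metadata: dict = field(default_factory=dict)  # stands in for the backing store

       def create(self, path: str, inode: int) -> None:
           event = {"op": "create", "path": path, "inode": inode}
           self.journal.append(event)  # journal before executing the operation
           self._apply(event)

       def rename(self, src: str, dst: str) -> None:
           # One event covers both updates (remove src, add dst), so replay
           # never observes a half-applied rename.
           event = {"op": "rename", "src": src, "dst": dst}
           self.journal.append(event)
           self._apply(event)

       def _apply(self, event: dict) -> None:
           if event["op"] == "create":
               self.metadata[event["path"]] = event["inode"]
           elif event["op"] == "rename":
               self.metadata[event["dst"]] = self.metadata.pop(event["src"])


   def replay(journal: list) -> dict:
       """Rebuild metadata on a replacement MDS by replaying events in order."""
       replacement = ToyMDS()
       for event in journal:
           replacement._apply(event)
       return replacement.metadata


   active = ToyMDS()
   active.create("/a", 1)
   active.rename("/a", "/b")
   recovered = replay(active.journal)
   print(recovered == active.metadata)

Replaying the journal on a fresh ``ToyMDS`` yields the same metadata as the
instance that originally executed the operations.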

Each active metadata server maintains its own journal in the metadata pool. Journals
are striped over multiple objects. Journal entries that are no longer required (deemed
old) are trimmed by the metadata server.
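
As a rough illustration of striping and trimming, the sketch below treats a journal
as a byte stream cut into fixed-size objects. The object size, the function names,
and the ``expire_pos`` offset are assumptions made for this example and do not
describe the actual journal layout or the trimming policy of the MDS.

.. code-block:: python

   OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size, for illustration only


   def object_index(offset: int, object_size: int = OBJECT_SIZE) -> int:
       """Index of the object that holds the journal byte at `offset`."""
       return offset // object_size


   def objects_spanned(start: int, length: int,
                       object_size: int = OBJECT_SIZE) -> range:
       """Indices of all objects touched by an entry of `length` bytes at `start`."""
       if length <= 0:
           raise ValueError("length must be positive")
       first = object_index(start, object_size)
       last = object_index(start + length - 1, object_size)
       return range(first, last + 1)


   def trimmable_objects(expire_pos: int, first_object: int = 0,
                         object_size: int = OBJECT_SIZE) -> range:
       """Objects lying entirely below `expire_pos`, the (hypothetical) offset
       before which no journal entry is required any more."""
       return range(first_object, max(first_object, expire_pos // object_size))


   # An entry that straddles an object boundary touches two objects.
   straddling = list(objects_spanned(OBJECT_SIZE - 10, 20))
   # Only whole objects below the expire position can be dropped.
   droppable = list(trimmable_objects(2 * OBJECT_SIZE + 5))

In this model an object can only be removed once every entry stored in it is old,
which is why trimming works on whole objects rather than on individual entries.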

Journal Events
--------------

Apart from journaling file system metadata updates, CephFS journals various other events,
such as client session info and directory import/export state, to name a few. The
metadata server uses these events to reestablish correct state as required. For example,
when journal events are replayed on restart, an event of a specific type records that a
client entity had a session with the MDS before it was restarted, and the MDS tries to
reconnect that client.
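
The session example can be sketched as follows. The event dictionaries and their
field names (``type``, ``client``, ``open``) are hypothetical and are not the
on-disk journal format; the sketch only shows how replaying session events in order
yields the set of clients the MDS would try to reconnect.

.. code-block:: python

   def sessions_to_reconnect(journal_events):
       """Replay session events; return clients whose session was still open."""
       open_sessions = set()
       for event in journal_events:
           if event["type"] != "SESSION":
               continue  # other event types do not affect the session set
           if event["open"]:
               open_sessions.add(event["client"])
           else:
               open_sessions.discard(event["client"])
       return open_sessions


   events = [
       {"type": "SESSION", "client": "client.4101", "open": True},
       {"type": "UPDATE", "path": "/a"},
       {"type": "SESSION", "client": "client.4102", "open": True},
       {"type": "SESSION", "client": "client.4101", "open": False},
   ]
   pending = sessions_to_reconnect(events)

Here ``client.4101`` closed its session before the restart, so only
``client.4102`` remains in ``pending``.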

To examine the list of such events recorded in the journal, CephFS provides a command
line utility `cephfs-journal-tool` which can be used as follows:

::

  cephfs-journal-tool --rank=<fs>:<rank> event get list

`cephfs-journal-tool` is also used to discover and repair a damaged Ceph File System.
(See :doc:`/cephfs/cephfs-journal-tool` for more details.)
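
When the command is driven from a script, it can help to build the argument list in
one place. The helper below is a minimal sketch: ``journal_event_list_cmd`` is a
hypothetical name, and the file system name ``cephfs`` and rank ``0`` are assumed
example values. It only assembles the command shown above and does not run it.

.. code-block:: python

   import shlex


   def journal_event_list_cmd(fs_name: str, rank: int) -> list:
       """Build the argv for listing journal events of one MDS rank."""
       if not fs_name:
           raise ValueError("fs_name must not be empty")
       if rank < 0:
           raise ValueError("rank must be non-negative")
       return ["cephfs-journal-tool", f"--rank={fs_name}:{rank}",
               "event", "get", "list"]


   cmd = journal_event_list_cmd("cephfs", 0)  # assumed fs name and rank
   print(shlex.join(cmd))

The resulting list can be passed to ``subprocess.run`` on a host that has access to
the cluster.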

Journal Event Types
-------------------

The following event types are journaled by the MDS.

#. `EVENT_COMMITTED`: Mark a request (id) as committed.

#. `EVENT_EXPORT`: Maps directories to an MDS rank.

#. `EVENT_FRAGMENT`: Tracks various stages of directory fragmentation (split/merge).

#. `EVENT_IMPORTSTART`: Logged when an MDS rank starts importing directory fragments.

#. `EVENT_IMPORTFINISH`: Logged when an MDS rank finishes importing directory fragments.

#. `EVENT_NOOP`: No operation event type for skipping over a journal region.

#. `EVENT_OPEN`: Tracks which inodes have open file handles.

#. `EVENT_RESETJOURNAL`: Used to mark a journal as `reset` post truncation.

#. `EVENT_SESSION`: Tracks open client sessions.

#. `EVENT_SLAVEUPDATE`: Logs various stages of an operation that has been forwarded to a (slave) MDS.

#. `EVENT_SUBTREEMAP`: Map of directory inodes to directory contents (subtree partition).

#. `EVENT_TABLECLIENT`: Logs transition states of the MDS's view of client tables (snap/anchor).

#. `EVENT_TABLESERVER`: Logs transition states of the MDS's view of server tables (snap/anchor).

#. `EVENT_UPDATE`: Logs file operations on an inode.