CephFS Snapshots
================

CephFS supports snapshots, generally created by invoking mkdir within the
``.snap`` directory. Note this is a hidden, special directory, not visible
during a directory listing.

Overview
-----------

Generally, snapshots do what they sound like: they create an immutable view
of the file system at the point in time they're taken. There are some headline
features that make CephFS snapshots different from what you might expect:

* Arbitrary subtrees. Snapshots are created within any directory you choose,
  and cover all data in the file system under that directory.
* Asynchronous. If you create a snapshot, buffered data is flushed out lazily,
  including from other clients. As a result, "creating" the snapshot is
  very fast.

Important Data Structures
-------------------------
* SnapRealm: A `SnapRealm` is created whenever you create a snapshot at a new
  point in the hierarchy (or, when a snapshotted inode is moved outside of its
  parent snapshot). SnapRealms contain an `sr_t srnode` and the `inodes_with_caps`
  that are part of the snapshot. Clients also have a SnapRealm concept that
  maintains less data but is used to associate a `SnapContext` with each open
  file for writing.
* sr_t: An `sr_t` is the on-disk snapshot metadata. It is part of the containing
  directory and contains sequence counters, timestamps, the list of associated
  snapshot IDs, and `past_parent_snaps`.
* SnapServer: The SnapServer manages snapshot ID allocation, snapshot deletion, and
  tracks the list of effective snapshots in the file system. A file system only has
  one instance of the SnapServer.
* SnapClient: The SnapClient is used to communicate with the SnapServer; each MDS rank
  has its own SnapClient instance. The SnapClient also caches effective snapshots
  locally. A rough sketch of how these structures relate follows this list.
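
Here is a rough, illustrative sketch of how these structures relate, written in
Python for brevity rather than Ceph's actual C++ types; the field names are
simplified from the descriptions above and are not the real member names:

.. code:: python

    from dataclasses import dataclass, field
    from typing import List, Set

    @dataclass
    class SrT:
        """Simplified stand-in for the on-disk sr_t snapshot metadata."""
        seq: int = 0                                               # sequence counter
        snaps: List[int] = field(default_factory=list)             # snapids taken in this realm
        past_parent_snaps: Set[int] = field(default_factory=set)   # inherited from former parents

    @dataclass
    class SnapRealm:
        """Simplified MDS-side SnapRealm: an srnode plus the inodes with caps."""
        srnode: SrT = field(default_factory=SrT)
        inodes_with_caps: List[int] = field(default_factory=list)  # inode numbers, for illustration

    class SnapServer:
        """One per file system: allocates snapids and tracks effective snapshots."""
        def __init__(self):
            self.last_snap = 0
            self.effective_snaps: Set[int] = set()

        def allocate_snapid(self) -> int:
            self.last_snap += 1
            self.effective_snaps.add(self.last_snap)
            return self.last_snap
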

Creating a snapshot
-------------------
The CephFS snapshot feature is enabled by default on new file systems. To
enable it on existing file systems, use the command below.

.. code::

    $ ceph fs set <fs_name> allow_new_snaps true

When snapshots are enabled, all directories in CephFS will have a special
``.snap`` directory. (You may configure a different name with the ``client
snapdir`` setting if you wish.)

To create a CephFS snapshot, create a subdirectory under
``.snap`` with a name of your choice. For example, to create a snapshot on
directory "/1/2/3/", invoke ``mkdir /1/2/3/.snap/my-snapshot-name``.
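
From a client's point of view, snapshot creation is just a directory creation,
so any interface that can make directories works. A minimal sketch in Python,
assuming the file system is mounted at ``/mnt/cephfs`` (the mount point and
snapshot name here are only examples):

.. code:: python

    import os

    # Assumed mount point and example path; adjust for your deployment.
    snap_path = "/mnt/cephfs/1/2/3/.snap/my-snapshot-name"

    os.mkdir(snap_path)                            # creates the snapshot
    print(os.listdir("/mnt/cephfs/1/2/3/.snap"))   # the new snapshot shows up here
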

This is transmitted to the MDS Server as a
CEPH_MDS_OP_MKSNAP-tagged `MClientRequest`, and initially handled in
Server::handle_client_mksnap(). It allocates a `snapid` from the `SnapServer`,
projects a new inode with the new SnapRealm, and commits it to the MDLog as
usual. When committed, it invokes
`MDCache::do_realm_invalidate_and_update_notify()`, which notifies all clients
with caps on files under "/1/2/3/" about the new SnapRealm. When clients get
the notifications, they update the client-side SnapRealm hierarchy, link files
under "/1/2/3/" to the new SnapRealm and generate a `SnapContext` for the
new SnapRealm.

Note that this *is not* a synchronous part of the snapshot creation!
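
The sequence above can be condensed into the following conceptual outline. It is
plain Python pseudocode rather than the MDS implementation, and the helper names
(`allocate_snapid`, `project_snaprealm`, `journal_and_commit`, `notify_clients`)
are invented for illustration:

.. code:: python

    def handle_client_mksnap(mds, dir_inode, snap_name):
        """Conceptual outline of CEPH_MDS_OP_MKSNAP handling, not real MDS code."""
        snapid = mds.allocate_snapid()         # obtained from the SnapServer
        realm = dir_inode.project_snaprealm()  # new SnapRealm projected onto the inode
        realm.add_snap(snapid, snap_name)
        mds.journal_and_commit(dir_inode)      # written to the MDLog as usual

        # After the commit: notify every client holding caps under dir_inode.
        # Clients relink affected files to the new SnapRealm and build a new
        # SnapContext for later writes; this happens asynchronously.
        mds.notify_clients(dir_inode, realm)
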

Updating a snapshot
-------------------
If you delete a snapshot, a similar process is followed. If you remove an inode
out of its parent SnapRealm, the rename code creates a new SnapRealm for the
renamed inode (if a SnapRealm does not already exist), saves the IDs of snapshots
that are effective on the original parent SnapRealm into `past_parent_snaps` of
the new SnapRealm, then follows a process similar to creating a snapshot.

Generating a SnapContext
------------------------
A RADOS `SnapContext` consists of a snapshot sequence ID (`snapid`) and all
the snapshot IDs that an object is already part of. To generate that list, we
combine the `snapids` associated with the SnapRealm and all valid `snapids` in
`past_parent_snaps`. Stale `snapids` are filtered out by the SnapClient's cached
effective snapshots.
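
A minimal sketch of that combination step, assuming the realm's own snapids, its
`past_parent_snaps`, and the SnapClient's cached set of effective snapshots are
available as plain collections (simplified names, not the actual MDS/client code):

.. code:: python

    from typing import Iterable, List, Set, Tuple

    def build_snap_context(realm_snaps: Iterable[int],
                           past_parent_snaps: Iterable[int],
                           effective_snaps: Set[int]) -> Tuple[int, List[int]]:
        """Return (seq, snaps) in the shape a RADOS SnapContext expects:
        seq is the newest relevant snapid, snaps are sorted newest first."""
        snaps = set(realm_snaps)
        # Keep only past-parent snapids that are still effective (drops stale ones).
        snaps |= {s for s in past_parent_snaps if s in effective_snaps}

        ordered = sorted(snaps, reverse=True)   # RADOS expects descending snapids
        seq = ordered[0] if ordered else 0
        return seq, ordered

    # Example: the realm has snaps 4 and 7; past parent snap 2 was already deleted.
    print(build_snap_context([4, 7], [2, 3], {3, 4, 7}))   # (7, [7, 4, 3])
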

Storing snapshot data
---------------------
File data is stored in RADOS "self-managed" snapshots. Clients are careful to
use the correct `SnapContext` when writing file data to the OSDs.

Storing snapshot metadata
-------------------------
Snapshotted dentries (and their inodes) are stored in-line as part of the
directory they were in at the time of the snapshot. *All dentries* include a
`first` and `last` snapid for which they are valid. (Non-snapshotted dentries
will have their `last` set to CEPH_NOSNAP.)
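
For illustration, a given dentry version is visible in a snapshot exactly when
that snapshot's ID falls inside its `[first, last]` range; a minimal sketch,
with a stand-in value for the CEPH_NOSNAP sentinel:

.. code:: python

    CEPH_NOSNAP = 2**64 - 2   # stand-in for Ceph's "no snapshot" sentinel (live head version)

    def dentry_visible_in_snap(first: int, last: int, snapid: int) -> bool:
        """A dentry version is valid for snapids in the inclusive range [first, last]."""
        return first <= snapid <= last

    # A dentry version snapshotted at snapid 5 and superseded after snapid 8:
    print(dentry_visible_in_snap(first=5, last=8, snapid=6))             # True
    print(dentry_visible_in_snap(first=5, last=8, snapid=9))             # False
    print(dentry_visible_in_snap(first=9, last=CEPH_NOSNAP, snapid=12))  # True (head)
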

Snapshot writeback
------------------
There is a great deal of code to handle writeback efficiently. When a Client
receives an `MClientSnap` message, it updates the local `SnapRealm`
representation and its links to specific `Inodes`, and generates a `CapSnap`
for the `Inode`. The `CapSnap` is flushed out as part of capability writeback,
and if there is dirty data the `CapSnap` is used to block fresh data writes
until the snapshot is completely flushed to the OSDs.
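
The client-side blocking behaviour can be pictured with a small conceptual model.
This is plain Python with invented names (`on_mclientsnap`, `pending_capsnaps`,
`osd_flush`), not the real client code:

.. code:: python

    class Inode:
        """Toy client inode: CapSnap data must reach the OSDs before new writes land."""
        def __init__(self, osd_flush):
            self.osd_flush = osd_flush    # callable writing data under a given SnapContext
            self.dirty_data = b""
            self.pending_capsnaps = []    # (snap_context, data) captured at snapshot time

        def on_mclientsnap(self, old_snap_context):
            # Snapshot notification: capture dirty state under the *old* SnapContext.
            if self.dirty_data:
                self.pending_capsnaps.append((old_snap_context, self.dirty_data))

        def write(self, data):
            # Fresh writes are held up until every outstanding CapSnap is flushed.
            while self.pending_capsnaps:
                snapc, snap_data = self.pending_capsnaps.pop(0)
                self.osd_flush(snapc, snap_data)
            self.dirty_data += data
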

In the MDS, we generate snapshot-representing dentries as part of the regular
process for flushing them. Dentries with outstanding `CapSnap` data are kept
pinned and in the journal.

Deleting snapshots
------------------
Snapshots are deleted by invoking "rmdir" on the ".snap" directory they are
rooted in. (Attempts to delete a directory which roots snapshots *will fail*;
you must delete the snapshots first.) Once deleted, they are entered into the
`OSDMap` list of deleted snapshots and the file data is removed by the OSDs.
Metadata is cleaned up as the directory objects are read in and written back
out again.
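
Mirroring the creation example above, deleting a snapshot from a client is just a
directory removal; a minimal sketch, again assuming a mount at ``/mnt/cephfs`` and
the earlier example snapshot name:

.. code:: python

    import os

    os.rmdir("/mnt/cephfs/1/2/3/.snap/my-snapshot-name")   # deletes the snapshot

    # Note: removing the snapshotted directory itself ("/mnt/cephfs/1/2/3") keeps
    # failing until all snapshots rooted in it have been deleted first.
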

Hard links
----------
An inode with multiple hard links is moved to a dummy global SnapRealm. The
dummy SnapRealm covers all snapshots in the file system. The inode's data
will be preserved for any new snapshot. The preserved data will cover
snapshots on any linkage of the inode.

Multi-FS
---------
Snapshots and multiple file systems don't interact well. Specifically, each
MDS cluster allocates `snapids` independently; if you have multiple file systems
sharing a single pool (via namespaces), their snapshots *will* collide and
deleting one will result in missing file data for others. (This may even be
invisible, not throwing errors to the user.) If each FS gets its own
pool, things probably work, but this isn't tested and may not be true.

.. Note:: To avoid snap ID collisions between mon-managed snapshots and file system
   snapshots, pools with mon-managed snapshots are not allowed to be attached
   to a file system. Also, mon-managed snapshots can't be created in pools
   already attached to a file system either.