# Copyright (C) 2005-2016 Junjiro R. Okajima
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

Introduction
----------------------------------------

aufs [ei ju: ef es] | [a u f s]
1. abbrev. for "advanced multi-layered unification filesystem".
2. abbrev. for "another unionfs".
3. abbrev. for "auf das" in German which means "on the" in English.
   Ex. "Butter aufs Brot" (G) means "butter onto bread" (E).
   But "Filesystem aufs Filesystem" is hard to understand.

AUFS is a filesystem with the following features:
- multi-layered stackable unification filesystem; each member directory
  is called a branch.
- branch permissions and attributes, 'readonly', 'real-readonly',
  'readwrite', 'whiteout-able', 'link-able whiteout', etc., and their
  combinations.
- internal "file copy-on-write".
- logical deletion, whiteout.
- dynamic branch manipulation: adding, deleting and changing permission.
- allows bypassing aufs, i.e. users' direct access to branches.
- external inode number translation table and bitmap which maintain the
  persistent aufs inode numbers.
- seekable directory, including NFS readdir.
- file mapping, mmap and sharing pages.
- pseudo-link, hardlink over branches.
- loopback mounted filesystem as a branch.
- several policies to select one among multiple writable branches.
- reverting a single system call when an error occurs in aufs.
- and more...
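
To make the branch attributes and the "several policies" item above a
little more concrete, here is a minimal userspace sketch in C. The flag
names, struct branch, pick_top_down() and pick_round_robin() are all
hypothetical illustrations, not aufs identifiers, and the two policies
are only examples of the idea; branch 0 is assumed to be the top branch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute bits modelling the branch attributes listed above. */
enum br_attr {
    BR_READONLY      = 1u << 0, /* aufs never writes to this branch */
    BR_REAL_READONLY = 1u << 1, /* the branch itself is known never to change */
    BR_READWRITE     = 1u << 2, /* new files and copy-ups may go here */
    BR_WHITEOUTABLE  = 1u << 3  /* whiteouts may be created here */
};

struct branch {
    const char *path;   /* directory used as this branch */
    unsigned attr;      /* combination of enum br_attr bits */
};

static int br_writable(const struct branch *br)
{
    return (br->attr & BR_READWRITE) != 0;
}

/* Hypothetical policy A: the topmost writable branch (branch 0 is the top). */
static int pick_top_down(const struct branch *brs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (br_writable(&brs[i]))
            return (int)i;
    return -1;  /* no writable branch: a write would have to fail */
}

/* Hypothetical policy B: rotate among the writable branches.
 * *cursor carries the starting point from one call to the next. */
static int pick_round_robin(const struct branch *brs, size_t n, size_t *cursor)
{
    assert(cursor != NULL);
    for (size_t k = 0; k < n; k++) {
        size_t i = (*cursor + k) % n;
        if (br_writable(&brs[i])) {
            *cursor = (i + 1) % n;
            return (int)i;
        }
    }
    return -1;
}
```

With the branches ro, rw, ro, rw (top to bottom), the top-down policy
always picks branch 1, while round-robin alternates between 1 and 3.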


Multi Layered Stackable Unification Filesystem
----------------------------------------------------------------------
Most people already know what it is.
It is a filesystem which unifies several directories and provides a
single merged directory. When users access a file, the access is
passed (redirected) to the real file on the member filesystem. The
member filesystem is called a 'lower filesystem' or 'branch' and has a
mode, 'readonly' or 'readwrite'. Deleting a file on a lower readonly
branch is handled by creating a 'whiteout' on the upper writable
branch.
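
The lookup rule described above (upper entries hide lower ones, and a
whiteout hides everything below it) can be sketched as a small
userspace model. This is not aufs code: each branch is just a
NULL-terminated list of names, branch 0 is the top, and a whiteout for
"foo" is assumed to be an entry named ".wh.foo". WH_PREFIX, struct layer
and union_lookup() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define WH_PREFIX ".wh."  /* assumption: aufs-style whiteout name prefix */

struct layer {
    const char *const *names;  /* NULL-terminated entries of this branch */
};

static int layer_has(const struct layer *l, const char *name)
{
    for (const char *const *p = l->names; *p; p++)
        if (strcmp(*p, name) == 0)
            return 1;
    return 0;
}

/* Return the index of the branch that supplies NAME, or -1 if NAME does
 * not exist or has been logically deleted by a whiteout. */
static int union_lookup(const struct layer *layers, int nlayers, const char *name)
{
    char wh[256];
    int len = snprintf(wh, sizeof(wh), "%s%s", WH_PREFIX, name);

    assert(len > 0 && (size_t)len < sizeof(wh));
    for (int i = 0; i < nlayers; i++) {
        if (layer_has(&layers[i], name))
            return i;    /* found: this entry hides any lower ones */
        if (layer_has(&layers[i], wh))
            return -1;   /* whiteout: lower entries are logically deleted */
    }
    return -1;
}
```

With an upper branch holding "a" and ".wh.b" over a lower branch holding
"a", "b" and "c", union_lookup() resolves "a" to the upper branch, "c"
to the lower one, and treats "b" as deleted.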

On LKML, there have been discussions about UnionMount (Jan Blunck,
Bharata B Rao and Valerie Aurora) and Unionfs (Erez Zadok). They took
different approaches to implementing the merged view.
The former tries to put it into the VFS, and the latter implements it
as a separate filesystem.
(If I have misunderstood these implementations, please let me know and
I shall correct it, because it is a long time since I last read their
source files.)

UnionMount's approach may keep the implementation small, but it may be
hard to share branches between several UnionMounts, since its whiteout
is implemented in the inode on the branch filesystem and is always
shared. According to Bharata's post, readdir does not seem to be
finished yet.
There are several known problems in this implementation, such as
- for users, the inode number may change silently, e.g. on copy-up.
- link(2) may be broken by copy-up.
- read(2) may get obsolete file data (fstat(2) too).
- fcntl(F_SETLK) may be broken by copy-up.
- unnecessary copy-up may happen, for example mmap(MAP_PRIVATE) after
  open(O_RDWR).

In linux-3.18, the "overlay" filesystem (formerly known as "overlayfs")
was merged into mainline. This is another implementation of UnionMount
as a separate filesystem. All the limitations and known problems of
UnionMount are equally inherited by the "overlay" filesystem.

Unionfs has a longer history. When I started implementing a stackable
filesystem (Aug 2005), it already existed. It has virtual super_block,
inode, dentry and file objects, each of which has an array pointing to
the lower objects of the same kind. After contributing many patches to
Unionfs, I re-started my own project, AUFS (Jun 2006).

In AUFS, the structure of the filesystem resembles Unionfs, but I
implemented my own ideas, approaches and enhancements, and it became a
totally different one.

Comparing DM snapshot with an fs-based implementation, DM snapshot has
these characteristics:
- the number of bytes to be copied between devices is much smaller.
- there can be only one filesystem type.
- the fs must be writable; no readonly fs is allowed, even for the lower
  original device, so a compressed fs will not be usable. But if we use
  loopback mounts, we may address this issue.
  For instance,
    mount /cdrom/squashfs.img /sq
    losetup /dev/loop0 /sq/ext2.img
    losetup /dev/loop1 /somewhere/cow
    dmsetup "snapshot /dev/loop0 /dev/loop1 ..."
- it will be difficult (or need more operations) to extract the
  difference between the original device and the COW.
- DM snapshot-merge may help a lot when users try merging. In the
  fs-layer union, users will use rsync(1).

You may want to read my old paper "Filesystems in LiveCD"
(http://aufs.sourceforge.net/aufs2/report/sq/sq.pdf).


Several characters/aspects/persona of aufs
----------------------------------------------------------------------

Aufs has several characters, aspects or personas:
1. a filesystem, callee of VFS helpers
2. sub-VFS, caller of VFS helpers for branches
3. a virtual filesystem which maintains persistent inode numbers
4. reader/writer of files on branches, like an application

1. Callee of VFS Helper
As an ordinary Linux filesystem, aufs is a callee of the VFS. For
instance, unlink(2) from an application reaches the sys_unlink() kernel
function and then vfs_unlink() is called. vfs_unlink() is one of the
VFS helpers, and it calls the filesystem-specific unlink operation.
Aufs does implement the unlink operation, but it behaves like a
redirector.
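
The callee/redirector relationship can be modelled in userspace with a
table of function pointers, roughly the way the kernel dispatches to a
filesystem's inode operations. Everything below (struct fs, struct
fs_ops, demo_vfs_unlink() and the two unlink implementations) is a
hypothetical simplification, not the kernel or aufs API.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

struct fs;

/* Hypothetical, much-reduced operation table: only an unlink hook. */
struct fs_ops {
    int (*unlink)(struct fs *fs, const char *name);
};

struct fs {
    const struct fs_ops *ops;
    struct fs *branch;       /* stacking fs only: the lower fs it forwards to */
    char last_unlinked[64];  /* what this fs actually removed (for illustration) */
};

/* VFS-side helper: common checks would live here, then it calls the
 * filesystem-specific operation through the table. */
static int demo_vfs_unlink(struct fs *fs, const char *name)
{
    assert(fs != NULL && fs->ops != NULL);
    if (!fs->ops->unlink)
        return -EPERM;       /* this filesystem does not support unlink */
    return fs->ops->unlink(fs, name);
}

/* An ordinary filesystem: does the real work itself. */
static int branchfs_unlink(struct fs *fs, const char *name)
{
    snprintf(fs->last_unlinked, sizeof(fs->last_unlinked), "%s", name);
    return 0;
}

/* The stacking filesystem: a redirector that calls the VFS helper again
 * for its branch, as if the VFS had called the branch directly. */
static int stackfs_unlink(struct fs *fs, const char *name)
{
    return demo_vfs_unlink(fs->branch, name);
}

static const struct fs_ops branchfs_ops = { .unlink = branchfs_unlink };
static const struct fs_ops stackfs_ops  = { .unlink = stackfs_unlink };
```

Unlinking "foo" through a stacking fs whose branch uses branchfs_ops
ends up recorded only in the branch; the stacking fs itself removes
nothing.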

2. Caller of VFS Helper for Branches
aufs_unlink() passes the unlink request to the branch filesystem as if
it were called from the VFS, so the called unlink operation of the
branch filesystem acts as usual. As a caller of VFS helpers, aufs has
to handle every necessary pre/post operation for the branch filesystem:
- acquire the lock for the parent dir on a branch
- lookup in a branch
- revalidate the dentry on a branch
- mnt_want_write() for a branch
- vfs_unlink() for a branch
- mnt_drop_write() for a branch
- release the lock on a branch
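
The bracketing above, including unwinding when a step fails, can be
sketched as follows. The br_*() functions are hypothetical stubs that
only append a letter to op_trace so the order can be checked; in the
kernel they would be the real locking, lookup, revalidation,
mnt_want_write()/mnt_drop_write() and vfs_unlink() calls, whose exact
signatures vary between kernel versions.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Order of steps taken, one letter per step. */
static char op_trace[32];
static int fail_at;   /* which step fails: 0 = none, 1..4 = see the stubs */

static void record_step(char c)
{
    size_t n = strlen(op_trace);
    assert(n + 1 < sizeof(op_trace));
    op_trace[n] = c;
    op_trace[n + 1] = '\0';
}

/* Hypothetical stand-ins for the branch-side helpers named in the list. */
static void br_lock_parent(void)   { record_step('L'); }
static void br_unlock_parent(void) { record_step('U'); }
static int  br_lookup(void)        { record_step('l'); return fail_at == 1 ? -ENOENT : 0; }
static int  br_revalidate(void)    { record_step('r'); return fail_at == 2 ? -ESTALE : 0; }
static int  br_want_write(void)    { record_step('w'); return fail_at == 3 ? -EROFS : 0; }
static void br_drop_write(void)    { record_step('d'); }
static int  br_vfs_unlink(void)    { record_step('u'); return fail_at == 4 ? -EBUSY : 0; }

static int sketch_branch_unlink(void)
{
    int err;

    br_lock_parent();
    err = br_lookup();
    if (err)
        goto out_unlock;
    err = br_revalidate();
    if (err)
        goto out_unlock;
    err = br_want_write();
    if (err)
        goto out_unlock;
    err = br_vfs_unlink();
    br_drop_write();        /* paired with br_want_write(), success or not */
out_unlock:
    br_unlock_parent();     /* the parent lock is always released */
    return err;
}
```

The goto-based unwinding keeps each "acquire" paired with exactly one
"release" on every path, which is the usual kernel idiom for this kind
of sequence.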

3. Persistent Inode Number
One of the most important issues for a filesystem is maintaining inode
numbers. This is particularly important to support exporting a
filesystem via NFS. Aufs is a virtual filesystem which doesn't have a
backend block device of its own, but some storage is necessary to keep
and maintain the inode numbers. It may need a large space and may not
be suitable to keep in memory. Aufs borrows some space from its first
writable branch filesystem (by default) and creates file(s) on it.
These files are created by aufs internally and (currently) removed
soon afterwards while being kept open.
Note: Because these files are removed, they are totally gone after
      unmounting aufs. It means the inode numbers are not persistent
      across unmount or reboot. I have a plan to make them really
      persistent, which will be important for aufs on an NFS server.
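
A minimal userspace sketch of such a table, assuming a single file that
maps a branch inode number to an aufs inode number by storing the
latter (native-endian) at offset br_ino * 8. The file is created with
mkstemp(3) and unlinked at once while its descriptor stays open, which
mirrors the "removed soon, kept open" behaviour described above.
struct xino and its functions are hypothetical names, and freeing and
reusing numbers (the bitmap mentioned in the feature list) is omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

struct xino {
    int fd;             /* open descriptor of the already-unlinked table file */
    uint64_t next_ino;  /* next unused aufs inode number (0 means "unassigned") */
};

/* Create the table file under DIR (e.g. on the first writable branch). */
static int xino_open(struct xino *x, const char *dir)
{
    char path[4096];

    if (snprintf(path, sizeof(path), "%s/xino.XXXXXX", dir) >= (int)sizeof(path))
        return -1;
    x->fd = mkstemp(path);
    if (x->fd < 0)
        return -1;
    unlink(path);       /* the name is gone, but the open fd keeps the data */
    x->next_ino = 1;
    return 0;
}

/* Return the aufs inode number for branch inode BR_INO, assigning a new
 * one on first use. Holes and reads past EOF yield 0, i.e. "unassigned". */
static uint64_t xino_get(struct xino *x, uint64_t br_ino)
{
    uint64_t ino = 0;
    off_t off = (off_t)(br_ino * sizeof(ino));
    ssize_t n;

    assert(x->fd >= 0);
    n = pread(x->fd, &ino, sizeof(ino), off);
    if (n == (ssize_t)sizeof(ino) && ino != 0)
        return ino;     /* already assigned: same number as before */
    ino = x->next_ino++;
    if (pwrite(x->fd, &ino, sizeof(ino), off) != (ssize_t)sizeof(ino))
        return 0;       /* 0 signals an I/O error in this sketch */
    return ino;
}

static void xino_close(struct xino *x)
{
    close(x->fd);       /* last reference dropped: the table disappears */
    x->fd = -1;
}
```

Looking up the same branch inode twice returns the same number, while
closing the descriptor discards the whole table, which is exactly the
"not persistent across unmount" limitation in the note above.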

4. Read/Write Files Internally (copy-on-write)
Because a branch can be readonly, when you write to a file on it, aufs
will "copy-up" the file to the upper writable branch internally, and
then write the originally requested data to the file. Generally the
kernel doesn't open/read/write files actively. In aufs, even a single
write may cause an internal "file copy". This behaviour is very
similar to the cp(1) command.
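
The copy-up-then-write sequence can be sketched in userspace C. This is
not aufs's in-kernel implementation: copy_up() and write_with_copy_up()
are hypothetical helpers that copy the whole file the way cp(1) would,
preserving only the permission bits (owner, timestamps and xattrs are
ignored), and then apply the write to the upper copy. The file name is
assumed to be a single path component present on the lower branch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy SRC (readonly branch) to DST (writable branch), like cp(1). */
static int copy_up(const char *src, const char *dst)
{
    struct stat st;
    char buf[8192];
    ssize_t n;
    int in, out, err = 0;

    in = open(src, O_RDONLY);
    if (in < 0)
        return -errno;
    if (fstat(in, &st) < 0) {
        err = -errno;
        close(in);
        return err;
    }
    out = open(dst, O_WRONLY | O_CREAT | O_EXCL, st.st_mode & 07777);
    if (out < 0) {
        err = -errno;
        close(in);
        return err;
    }
    while (!err && (n = read(in, buf, sizeof(buf))) != 0) {
        if (n < 0) {
            err = -errno;
            break;
        }
        for (ssize_t off = 0; off < n;) {   /* handle short writes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                err = -errno;
                break;
            }
            off += w;
        }
    }
    if (close(out) < 0 && !err)
        err = -errno;
    close(in);
    if (err)
        unlink(dst);    /* do not leave a half-copied file on the upper branch */
    return err;
}

/* Write LEN bytes at OFFSET to NAME, copying it up first if only the
 * lower copy exists. The lower file is never modified. */
static int write_with_copy_up(const char *lower_dir, const char *upper_dir,
                              const char *name, const void *data, size_t len,
                              off_t offset)
{
    char lpath[4096], upath[4096];
    ssize_t w;
    int fd, err;

    assert(data != NULL || len == 0);
    if (strchr(name, '/'))
        return -EINVAL;
    if (snprintf(lpath, sizeof(lpath), "%s/%s", lower_dir, name) >= (int)sizeof(lpath) ||
        snprintf(upath, sizeof(upath), "%s/%s", upper_dir, name) >= (int)sizeof(upath))
        return -ENAMETOOLONG;

    if (access(upath, F_OK) != 0) {
        if (errno != ENOENT)
            return -errno;
        err = copy_up(lpath, upath);        /* first write: copy-up */
        if (err)
            return err;
    }
    fd = open(upath, O_WRONLY);
    if (fd < 0)
        return -errno;
    w = pwrite(fd, data, len, offset);
    err = w < 0 ? -errno : (w == (ssize_t)len ? 0 : -EIO);
    close(fd);
    return err;
}
```

Writing "J" at offset 0 to a lower file containing "hello" leaves the
lower file untouched and produces "Jello" on the upper branch; a second
write goes straight to the existing upper copy.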

Some people may think it is better to pass such work to a user space
helper instead of doing it in kernel space. Actually I am still
thinking about it, but currently I have implemented it in kernel space.