# Copyright (C) 2005-2017 Junjiro R. Okajima
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

Lookup in a Branch
----------------------------------------------------------------------
Since aufs behaves as a sub-VFS (see Introduction), it performs lookups
on its branches the way VFS does. That could be heavy work, but almost
every lookup in aufs is the simplest case, i.e. looking up a single entry
directly under its parent. Digging down the directory hierarchy is
unnecessary. VFS has a function lookup_one_len() for that use, and
aufs calls it.
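
The single-entry lookup described above can be sketched as a userland
Python model. This is an illustration only, not kernel code: branch
directories are modelled as dicts, and lookup_one() and aufs_lookup() are
hypothetical names, with lookup_one() standing in for lookup_one_len().
The sketch ignores everything else aufs does during a lookup.

```python
# Userland model only: each branch directory is a dict mapping name -> child.
# lookup_one() and aufs_lookup() are hypothetical names for this sketch.

def lookup_one(parent_dir, name):
    """Look up exactly one name directly under parent_dir (no path walk)."""
    if "/" in name:
        raise ValueError("single path component expected")
    return parent_dir.get(name)  # None models a negative lookup


def aufs_lookup(branch_parents, name):
    """Ask every branch for `name` under its already-known parent dir.

    branch_parents is ordered from the highest branch to the lowest.
    Returns a list of (branch_index, entry) for the branches that have it.
    """
    found = []
    for index, parent_dir in enumerate(branch_parents):
        entry = lookup_one(parent_dir, name)
        if entry is not None:
            found.append((index, entry))
    return found


rw_root = {"fileA": "rw-fileA"}
ro_root = {"fileA": "ro-fileA", "dirA": {}}
print(aufs_lookup([rw_root, ro_root], "fileA"))
print(aufs_lookup([rw_root, ro_root], "dirA"))
```

Because the parent on each branch is already known, no call ever has to
walk more than one path component.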

When a branch is a remote filesystem, aufs basically relies on the
branch's ->d_revalidate(), and it also forces the strictest revalidation
tests for such branches.
For ->d_revalidate(), aufs implements three levels of revalidation tests.
See "Revalidate Dentry and UDBA" for details.


Test Only the Highest One for the Directory Permission (dirperm1 option)
----------------------------------------------------------------------
Let's try a case study.
- aufs has two branches, upper readwrite and lower readonly.
  /au = /rw + /ro
- "dirA" exists under /ro, but not under /rw, and its mode is 0700.
- a user invokes "chmod a+rx /au/dirA"
- the internal copy-up is activated, "/rw/dirA" is created, and its
  permission bits are set to world readable.
- does "/au/dirA" then become world readable?

In this case, /ro/dirA is still 0700 since it exists on the readonly
branch, or the branch may be a natively readonly filesystem. If aufs
respects the lower branch, it should not respond to readdir requests from
other users. But the user allowed it by chmod. Should aufs really refuse
to show the entries under /ro/dirA?

To be honest, I don't have a good solution for this case. So aufs
implements the 'dirperm1' and 'nodirperm1' mount options, and leaves the
decision to users.
When dirperm1 is specified, aufs checks only the highest dir for the
directory permission, and shows the entries. Otherwise, as usual, it
checks every dir existing on all branches and, in this case, rejects the
request.

As a side effect, the dirperm1 option improves the performance of aufs
because the number of permission checks is reduced when there are many
branches.
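
The case study can be replayed with a small userland Python model. It is
a sketch under simplifying assumptions: only the "other" permission bits
are examined, and other_may_read_dir() and may_readdir() are hypothetical
names, not aufs functions.

```python
import stat


def other_may_read_dir(mode):
    """True if 'other' users may read and search a dir with this mode."""
    wanted = stat.S_IROTH | stat.S_IXOTH
    return (mode & wanted) == wanted


def may_readdir(dir_modes, dirperm1):
    """dir_modes holds the mode of the same dir on each branch, highest first."""
    if dirperm1:
        # dirperm1: test only the dir on the highest branch.
        return other_may_read_dir(dir_modes[0])
    # nodirperm1: test the dir on every branch where it exists.
    return all(other_may_read_dir(mode) for mode in dir_modes)


# After "chmod a+rx /au/dirA": /rw/dirA is 0755 (copied up), /ro/dirA is 0700.
modes = [0o755, 0o700]
print(may_readdir(modes, dirperm1=True))    # True
print(may_readdir(modes, dirperm1=False))   # False
```

With dirperm1 the loop over the branches disappears, which is where the
performance side effect comes from.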


Revalidate Dentry and UDBA (User's Direct Branch Access)
----------------------------------------------------------------------
Generally VFS helpers revalidate a dentry as a part of lookup.
0. dig down the directory hierarchy.
1. lock the parent dir by its i_mutex.
2. look up the final (child) entry.
3. revalidate it.
4. call the actual operation (create, unlink, etc.)
5. unlock the parent dir

If the filesystem implements ->d_revalidate() (step 3), then it is
called. Aufs does implement it and checks that the dentry on a branch is
still valid.
But that is not enough, because aufs has to release the lock for the
parent dir on a branch at the end of ->lookup() (step 2) and
->d_revalidate() (step 3) while the i_mutex of the aufs dir is still
held by VFS.
If the file on a branch is changed directly, e.g. bypassing aufs, after
aufs released the lock, then the subsequent operation may cause
an unpleasant result.
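
The window between step 3 and step 4 can be shown with a userland Python
model. It is a sketch only: a branch dir is a dict of name to inode
number, and CachedDentry, lookup() and is_stale() are hypothetical names
that do not exist in aufs.

```python
# Userland model only; no real locking or kernel objects are involved.

class CachedDentry:
    def __init__(self, name, branch_ino):
        self.name = name
        self.branch_ino = branch_ino  # what lookup (step 2) saw on the branch


def lookup(branch_dir, name):
    # Steps 2-3: look up and revalidate. In aufs the lock on the branch's
    # parent dir is released when this returns.
    return CachedDentry(name, branch_dir[name])


def is_stale(branch_dir, dentry):
    # What step 4 would find if it looked at the branch again.
    return branch_dir.get(dentry.name) != dentry.branch_ino


branch_dir = {"fileA": 100}
dentry = lookup(branch_dir, "fileA")

# Direct branch access, bypassing aufs: "fileA" is replaced by another file.
del branch_dir["fileA"]
branch_dir["fileA"] = 200

print(is_stale(branch_dir, dentry))  # True: step 4 would act on old data
```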

This situation is a result of the VFS architecture, in which ->lookup()
and ->d_revalidate() are separate. But I would never say it is wrong. It
is a good design from the VFS's point of view. It is just not suitable
for the sub-VFS character of aufs.

Aufs handles such cases with three levels of revalidation, selectable by
the user.
1. Simple Revalidate
    In addition to the native flow in VFS, confirm the child-parent
    relationship on the branch just after locking the parent dir on the
    branch in the "actual operation" (step 4). When this validation
    fails, aufs returns EBUSY. ->d_revalidate() (step 3) in aufs still
    checks the validity of the dentry on branches.
2. Monitor Changes Internally by Inotify/Fsnotify
    In addition to the above, in the "actual operation" (step 4) aufs
    looks up the dentry on the branch again, and returns EBUSY if it
    finds a different dentry.
    Additionally, aufs sets an inotify/fsnotify watch for every dir on
    branches while it is in cache. When an event is notified, aufs
    registers a function to the kernel 'events' thread by schedule_work().
    The function sets a special status on the cached aufs dentry and
    inode private data. If they are not cached, then aufs has nothing to
    do. When the same file is accessed through aufs (step 0-3) later,
    aufs will detect the status and refresh all necessary data.
    In this mode, aufs has to ignore the events which are fired by aufs
    itself.
3. No Extra Validation
    This is the simplest test. It doesn't add any additional revalidation
    test, and skips the revalidation in step 4. It is useful and improves
    aufs performance when the system reliably hides the aufs branches
    from users, by over-mounting something (or by another method).
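
The step 4 checks of the three levels can be compared in a userland
Python model. This is a sketch under heavy simplification: each check is
reduced to one identity comparison, the inotify/fsnotify watch of level 2
is not modelled, and Level, BranchEntry and check_before_operation() are
hypothetical names rather than aufs identifiers.

```python
import errno
from enum import Enum


class Level(Enum):
    SIMPLE = 1   # "Simple Revalidate"
    NOTIFY = 2   # "Monitor Changes Internally by Inotify/Fsnotify"
    NONE = 3     # "No Extra Validation"


class BranchEntry:
    def __init__(self, name, parent):
        self.name = name
        self.parent = parent  # dict: the branch dir currently holding it


def check_before_operation(level, cached_parent, cached_entry):
    """Step 4 check. Returns 0 or -EBUSY; cached_* is what lookup saw."""
    if level is Level.NONE:
        return 0
    # Levels 1 and 2: confirm the child-parent relationship on the branch.
    if cached_entry.parent is not cached_parent:
        return -errno.EBUSY
    if level is Level.NOTIFY:
        # Level 2: look the name up again and compare with the cached entry.
        if cached_parent.get(cached_entry.name) is not cached_entry:
            return -errno.EBUSY
    return 0


dir1, dir2 = {}, {}
entry = BranchEntry("fileA", dir1)
dir1["fileA"] = entry

# Direct branch access: "fileA" is moved from dir1 to dir2, bypassing aufs.
dir2["fileA"] = dir1.pop("fileA")
entry.parent = dir2

for level in Level:
    print(level.name, check_before_operation(level, dir1, entry))
```

In this model the first two levels report -EBUSY for the moved file,
while the third lets the operation proceed, which is only safe when
nothing can touch the branches behind aufs.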