# Copyright (C) 2011-2016 Junjiro R. Okajima
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA


File-based Hierarchical Storage Management (FHSM)
----------------------------------------------------------------------
Hierarchical Storage Management (HSM) is a well-known feature in the
storage world. Aufs provides it on a per-file basis over multiple
writable branches, following the principle "the colder, the lower".
Here "colder" means less frequently used, and "lower" refers to the
position of a branch in the vertical order of the stacked branches.
The writable branches are prioritized, i.e. the topmost one should be
on the fastest drive and is the most heavily used.

o Characters in aufs FHSM story
- aufs itself and a new branch attribute.
- a new ioctl interface to move-down and to establish a connection with
  the daemon ("move-down" is the converse of "copy-up").
- userspace tool and daemon.

The userspace daemon establishes a connection with aufs and waits for
notifications. The notified information is very similar to struct
statfs and contains the number of consumed blocks and inodes.
When the consumed blocks/inodes of a branch exceed the user-specified
upper watermark, the daemon runs its move-down process until the
consumed blocks/inodes fall to the user-specified lower watermark.
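
The watermark behaviour can be illustrated with a minimal sketch. It is
not the daemon's code: the names Watermark and plan_move_down, the use
of fractions of capacity, and the "coldest first" ordering of the file
list are all assumptions made for illustration, and only blocks (not
inodes) are modelled.

```python
from dataclasses import dataclass


@dataclass
class Watermark:
    """Hypothetical per-branch thresholds, as fractions of capacity."""
    upper: float  # start moving files down when usage exceeds this
    lower: float  # stop moving files down once usage is at or below this


def plan_move_down(used, total, files, mark):
    """Return the names of the files to move down.

    `files` is a list of (name, blocks) pairs, assumed to be sorted
    from coldest to hottest already; how "cold" is measured is outside
    this sketch.
    """
    if used <= mark.upper * total:
        return []  # upper watermark not exceeded: nothing to do
    chosen = []
    for name, blocks in files:
        if used <= mark.lower * total:
            break  # lower watermark reached
        chosen.append(name)
        used -= blocks  # blocks freed on this branch by the move-down
    return chosen
```

Using two thresholds instead of one keeps the daemon from starting and
stopping repeatedly while the usage hovers around a single limit.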

The actual move-down is done by aufs upon a request from user-space,
since aufs needs to maintain the inode number and its internal pointer
arrays.

Currently aufs FHSM handles regular files only. Additionally, they
must be neither hard-linked nor pseudo-linked.


o Cowork of aufs and the user-space daemon
While the userspace daemon holds the connection, aufs sends a small
notification to it whenever aufs writes something into a writable
branch. This may be expensive since aufs issues statfs(2) internally,
so the user can specify a new option to cache the info. The
notification is controlled by these factors.
+ the specified cache time.
+ whether aufs internally classifies the write as "force".
Until the specified time expires, aufs doesn't send the info except in
the forced cases. When aufs decides to force, the info is always
notified to userspace.
For example, the number of free inodes is generally large enough and
a shortage of them rarely happens. So aufs doesn't force the
notification when creating a new file, directory and others. This is
the typical case in which aufs doesn't force.
When aufs writes actual file data and the file consumes any new
blocks, aufs forces the notification.
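
The interaction between the cache time and the "force" classification
can be modelled as below. This is a sketch and not the aufs code: the
class NotifyThrottle and its members are hypothetical names, and
restarting the cache period after a forced notification is an
assumption.

```python
class NotifyThrottle:
    """Hypothetical model of the cache-time/force decision."""

    def __init__(self, cache_sec):
        self.cache_sec = cache_sec  # plays the role of the cache time
        self.last_sent = None       # time of the last notification, if any

    def should_notify(self, now, force):
        """Return True when a notification is sent at time `now`."""
        expired = (self.last_sent is None
                   or now - self.last_sent >= self.cache_sec)
        if force or expired:
            # Assumption: any sent notification restarts the cache period.
            self.last_sent = now
            return True
        return False
```

In this model, a write which consumes new blocks would be passed with
force=True, while creating an empty file would be passed with
force=False and be suppressed until the cache time expires.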


o Interfaces in aufs
- New branch attribute.
  + fhsm
    Specifies that the branch is managed by the FHSM feature, in other
    words, that it is a participant in FHSM.
    When nofhsm is set on a branch, it will not be the source/target
    branch of the move-down operation. This attribute is set
    independently of the coo and moo attributes, and if you want full
    FHSM, you should specify them as well.
- New mount option.
  + fhsm_sec
    Specifies the time in seconds during which the less important
    notifications are suppressed.
- New ioctl.
  + AUFS_CTL_FHSM_FD
    Creates a new file descriptor from which userspace can read the
    notification (a subset of struct statfs) from aufs.
- Module parameter 'brs'
  It has to be set to 1. Otherwise the new mount option 'fhsm' will not
  be set.
- mount helpers /sbin/mount.aufs and /sbin/umount.aufs
  When there are two or more branches with the fhsm attribute,
  /sbin/mount.aufs invokes the user-space daemon and /sbin/umount.aufs
  terminates it. As a result of remounting and branch-manipulation, the
  number of branches with the fhsm attribute can drop to one. In this
  case, /sbin/mount.aufs will terminate the user-space daemon.


Finally the operation is done in these steps in kernel-space.
- make sure that,
  + no one else is using the file.
  + the file is not hard-linked.
  + the file is not pseudo-linked.
  + the file is a regular file.
  + the parent dir is not opaque.
- find the target writable branch.
- make sure the file is not whiteout-ed by a branch upper than the
  target.
- make the parent dir on the target branch.
- mutex lock the inode on the branch.
- unlink the whiteout on the target branch (if it exists).
- lookup and create the whiteout-ed temporary name on the target branch.
- copy the file as the whiteout-ed temporary name on the target branch.
- rename the whiteout-ed temporary name to the original name.
- unlink the file on the source branch.
- maintain the internal pointer array and the external inode number
  table (XINO).
- maintain the timestamps and other attributes of the parent dir and the
  file.

And of course, an error may happen in any step, so the operation
should restore the original file state after an error happens.
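
The core of the steps above (copy under a temporary name, rename to the
original name, unlink the source, and restore on error) can be sketched
in userspace as follows. This is only an analogue between two ordinary
directories and not the kernel implementation: the function move_down,
the ".tmp." prefix and the directory arguments are hypothetical, and
locking, whiteouts, the pointer arrays and XINO are not modelled.

```python
import os
import shutil
import stat


def move_down(src_dir, dst_dir, name):
    """Move the regular file `name` from src_dir down to dst_dir."""
    src = os.path.join(src_dir, name)
    dst = os.path.join(dst_dir, name)
    tmp = os.path.join(dst_dir, ".tmp." + name)  # hypothetical temp name

    # Only regular files which are not hard-linked are handled.
    st = os.lstat(src)
    if not stat.S_ISREG(st.st_mode) or st.st_nlink != 1:
        raise ValueError("only regular, non-hard-linked files are moved")

    os.makedirs(dst_dir, exist_ok=True)  # make the parent dir on the target
    try:
        shutil.copy2(src, tmp)  # copy data and timestamps to the temp name
        os.rename(tmp, dst)     # publish it under the original name
    except OSError:
        if os.path.exists(tmp):
            os.unlink(tmp)      # restore: the source is still intact
        raise
    try:
        os.unlink(src)          # drop the file on the source side
    except OSError:
        os.unlink(dst)          # restore: keep only the source copy
        raise
```

Copying to a temporary name first means that a failure in the middle of
the copy never leaves a half-written file under the original name.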