.. SPDX-License-Identifier: GPL-2.0

======================================
Enhanced Read-Only File System - EROFS
======================================

Overview
========

EROFS stands for Enhanced Read-Only File System. Unlike other read-only
file systems, it is designed for flexibility and scalability while being
kept simple and delivering high performance.

It is designed as a better filesystem solution for the following scenarios:

 - read-only storage media, or

 - part of a fully trusted read-only solution, which means it needs to be
   immutable and bit-for-bit identical to the official golden image for
   each release due to security and other considerations, and

 - saving extra storage space while guaranteeing end-to-end performance, by
   using reduced metadata and transparent file compression, especially for
   embedded devices with limited memory (e.g., smartphones);

Here are the main features of EROFS:

 - Little endian on-disk design;

 - Currently 4KB block size (nobh) and therefore a maximum 16TB address space;

 - Metadata and data can be mixed by design;

 - 2 inode versions for different requirements:

   =====================  ============  =====================================
                          compact (v1)  extended (v2)
   =====================  ============  =====================================
   Inode metadata size    32 bytes      64 bytes
   Max file size          4 GB          16 EB (also limited by max. vol size)
   Max uids/gids          65536         4294967296
   File change time       no            yes (64 + 32-bit timestamp)
   Max hardlinks          65536         4294967296
   Metadata reserved      4 bytes       14 bytes
   =====================  ============  =====================================

 - Support extended attributes (xattrs) as an option;

 - Support xattr inline and tail-end data inline for all files;

 - Support POSIX.1e ACLs by using xattrs;

 - Support transparent file compression as an option:
   LZ4 algorithm with 4 KB fixed-sized output compression for high performance.
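
The limits in the list and table above follow from simple field-width arithmetic. This sketch assumes (our assumption, not stated here) that block addresses are 32-bit, that compact inodes use 16-bit uid/gid/link fields and a 32-bit size, and that extended inodes widen these to 32 and 64 bits. The helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Assumed widths (illustrative, not copied from erofs_fs.h):
 * 4KB blocks (2^12 bytes) addressed by 32-bit block numbers.
 */
#define SKETCH_BLKSZ_BITS   12u
#define SKETCH_BLKADDR_BITS 32u

/* Number of distinct values an unsigned field of 'bits' bits can hold (bits < 64). */
uint64_t sketch_field_range(unsigned int bits)
{
	return UINT64_C(1) << bits;
}

/* 2^32 blocks * 2^12 bytes per block = 2^44 bytes = 16 TiB. */
uint64_t sketch_max_volume_bytes(void)
{
	return sketch_field_range(SKETCH_BLKADDR_BITS) << SKETCH_BLKSZ_BITS;
}
```

Under these assumptions, sketch_field_range(16) gives the 65536 uids/gids and hardlinks of a compact inode, and sketch_field_range(32) gives both the extended inode's 4294967296 and the 4 GB compact file-size bound.
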

The following git tree provides the file system user-space tools under
development (e.g., the formatting tool mkfs.erofs):

 - git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git

Bug reports and patches are welcome; please send them to the following
mailing list:

 - linux-erofs mailing list <linux-erofs@lists.ozlabs.org>

Mount options
=============

===================    =========================================================
(no)user_xattr         Set up Extended User Attributes. Note: xattr is enabled
                       by default if CONFIG_EROFS_FS_XATTR is selected.
(no)acl                Set up POSIX Access Control List. Note: acl is enabled
                       by default if CONFIG_EROFS_FS_POSIX_ACL is selected.
cache_strategy=%s      Select a strategy for cached decompression:

                       ==========  =============================================
                       disabled    In-place I/O decompression only;
                       readahead   Cache the last incomplete compressed physical
                                   cluster for further reading. It still does
                                   in-place I/O decompression for the remaining
                                   compressed physical clusters;
                       readaround  Cache both ends of incomplete compressed
                                   physical clusters for further reading.
                                   It still does in-place I/O decompression
                                   for the remaining compressed physical
                                   clusters.
                       ==========  =============================================
===================    =========================================================
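
For illustration, these options can be passed through the data argument of mount(2). Only the option spellings come from the table above; the helper names are hypothetical, the "erofs" filesystem type assumes a kernel built with EROFS support, and cache_strategy only affects compressed files.

```c
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* The cache_strategy values listed in the table above. */
static const char *const sketch_strategies[] = {
	"disabled", "readahead", "readaround",
};

#define SKETCH_NR_STRATEGIES \
	(sizeof(sketch_strategies) / sizeof(sketch_strategies[0]))

/*
 * Hypothetical helper: build an option string such as
 * "cache_strategy=readaround,noacl". Returns 0 on success, or -1 for an
 * unknown strategy or a buffer that is too small.
 */
int sketch_erofs_opts(char *buf, size_t len, const char *strategy, int acl)
{
	size_t i;
	int n;

	for (i = 0; i < SKETCH_NR_STRATEGIES; i++)
		if (strcmp(strategy, sketch_strategies[i]) == 0)
			break;
	if (i == SKETCH_NR_STRATEGIES)
		return -1;

	n = snprintf(buf, len, "cache_strategy=%s,%s",
		     strategy, acl ? "acl" : "noacl");
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical wrapper: mount an EROFS image read-only (needs CAP_SYS_ADMIN). */
int sketch_mount_erofs(const char *dev, const char *dir, const char *opts)
{
	return mount(dev, dir, "erofs", MS_RDONLY, opts);
}
```

A caller would build the string first and then pass it on, e.g. sketch_mount_erofs("/dev/loop0", "/mnt", buf), where both paths are placeholders.
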

On-disk details
===============

Summary
-------
Unlike other read-only file systems, an EROFS volume is designed to be as
simple as possible::

                                  |-> aligned with the block size
     ____________________________________________________________
    | |SB| | ... | Metadata | ... | Data | Metadata | ... | Data |
    |_|__|_|_____|__________|_____|______|__________|_____|______|
    0 +1K

All data areas should be aligned with the block size, but metadata areas
need not be. All metadata can now be observed in two different spaces (views):

 1. Inode metadata space

    Each valid inode should be aligned with an inode slot, which has a fixed
    size (32 bytes) chosen to match the compact inode size.

    Each inode can be directly found with the following formula::

         inode offset = meta_blkaddr * block_size + 32 * nid

    ::

                                    |-> aligned with 8B
                                               |-> followed closely
        + meta_blkaddr blocks                                      |-> another slot
         _____________________________________________________________________
        |  ...   | inode |  xattrs  | extents  | data inline | ... | inode ...
        |________|_______|(optional)|(optional)|__(optional)_|_____|__________
                 |-> aligned with the inode slot size
                        .             .
                      .                   .
                   .                          .
                .                                  .
             .                                          .
           .                                                .
          .____________________________________________________|-> aligned with 4B
          | xattr_ibody_header | shared xattrs | inline xattrs |
          |____________________|_______________|_______________|
          |->    12 bytes    <-|->x * 4 bytes<-|               .
                          .                   .                 .
                    .                        .                   .
              .                             .                     .
          ._______________________________.______________________.
          | id | id | id | id |  ...  | id | ent | ... | ent| ... |
          |____|____|____|____|_______|____|_____|_____|____|_____|
                                           |-> aligned with 4B
                                                       |-> aligned with 4B

    An inode can be 32 or 64 bytes long; the two versions are distinguished
    by i_format, a field common to all inode versions::

        __________________               __________________
       |     i_format     |             |     i_format     |
       |__________________|             |__________________|
       |       ...        |             |       ...        |
       |                  |             |                  |
       |__________________| 32 bytes    |                  |
                                        |                  |
                                        |__________________| 64 bytes

    Xattrs, extents and inline data follow the corresponding inode with
    proper alignment, and they may be optional depending on the data mapping.
    Currently, 4 valid data mappings are supported in total:

    ==  ====================================================================
     0  flat file data without data inline (no extent);
     1  fixed-sized output data compression (with non-compacted indexes);
     2  flat file data with tail packing data inline (no extent);
     3  fixed-sized output data compression (with compacted indexes, v5.3+).
    ==  ====================================================================

    The size of the optional xattrs is indicated by i_xattr_count in the inode
    header. Large xattrs or xattrs shared by many different files can be
    stored in the shared xattrs metadata rather than inlined right after the
    inode.

 2. Shared xattrs metadata space

    The shared xattrs space is similar to the inode space above: it starts at
    a specific block indicated by xattr_blkaddr, and xattrs are stored one
    after another with proper alignment.

    Each shared xattr can also be directly found with the following formula::

         xattr offset = xattr_blkaddr * block_size + 4 * xattr_id

    ::

                               |-> aligned with 4 bytes
        + xattr_blkaddr blocks                     |-> aligned with 4 bytes
         _________________________________________________________________________
        |  ...   | xattr_entry |  xattr data | ... |  xattr_entry | xattr data ...
        |________|_____________|_____________|_____|______________|_______________
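
The two address formulas above, together with a look at i_format, can be sketched as follows. The formulas are taken directly from the text. The i_format bit positions (bit 0 for the inode version, bits 1-3 for the data mapping) are our assumption based on erofs_fs.h of this era and should be checked against the header you build with; all function names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_INODE_SLOT 32u /* inode slot size, from the text */
#define SKETCH_XATTR_UNIT 4u  /* shared xattr id granularity, from the text */

/* inode offset = meta_blkaddr * block_size + 32 * nid */
uint64_t sketch_inode_offset(uint32_t meta_blkaddr, uint32_t block_size,
			     uint64_t nid)
{
	return (uint64_t)meta_blkaddr * block_size + SKETCH_INODE_SLOT * nid;
}

/* xattr offset = xattr_blkaddr * block_size + 4 * xattr_id */
uint64_t sketch_xattr_offset(uint32_t xattr_blkaddr, uint32_t block_size,
			     uint32_t xattr_id)
{
	return (uint64_t)xattr_blkaddr * block_size +
	       (uint64_t)SKETCH_XATTR_UNIT * xattr_id;
}

/*
 * ASSUMPTION: bit 0 of i_format is the inode version (0 = compact,
 * 1 = extended) and bits 1-3 hold the data mapping number (0-3 in the
 * table above). Verify against erofs_fs.h before relying on this.
 */
unsigned int sketch_inode_size(uint16_t i_format)
{
	return (i_format & 0x1u) ? 64u : 32u;
}

unsigned int sketch_data_mapping(uint16_t i_format)
{
	return (i_format >> 1) & 0x7u;
}
```

Since nid counts 32-byte slots, an extended (64-byte) inode occupies two consecutive slots. Under the assumed bit layout, an i_format of 0x0005 would describe an extended inode using data mapping 2 (flat file data with tail packing).
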
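
The xattr sizes in the two diagrams above can be computed as below. Two assumptions are ours rather than this document's: i_xattr_count (spelled i_xattr_icount in the kernel headers we are familiar with) counts 4-byte units, with the 12-byte xattr_ibody_header counting as the first unit; and each xattr entry starts with a 4-byte header (name length, name index, 16-bit value size) followed by the name and value, padded to 4 bytes. The function names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_XATTR_IBODY_HEADER 12u /* xattr_ibody_header, per the diagram */
#define SKETCH_XATTR_ENTRY_HEADER 4u  /* assumed: name_len, name_index, value_size */

/* Round n up to a multiple of 4, the alignment shown in both diagrams. */
static uint32_t sketch_align4(uint32_t n)
{
	return (n + 3u) & ~3u;
}

/* Assumed: 0 means no inline xattr area; otherwise header + (icount - 1) * 4. */
uint32_t sketch_xattr_ibody_size(uint16_t icount)
{
	if (icount == 0)
		return 0;
	return SKETCH_XATTR_IBODY_HEADER + 4u * (uint32_t)(icount - 1);
}

/* On-disk footprint of one xattr entry, including padding to 4 bytes. */
uint32_t sketch_xattr_entry_size(uint8_t name_len, uint16_t value_size)
{
	return sketch_align4(SKETCH_XATTR_ENTRY_HEADER + name_len + value_size);
}
```

For example, an entry with a 3-byte name and a 2-byte value needs 9 bytes and is padded to 12, so the next entry again starts on a 4-byte boundary.
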

Directories
-----------
All directories are now organized in a compact on-disk format. Note that
each directory block is divided into index and name areas in order to support
random file lookup, and all directory entries are *strictly* recorded in
alphabetical order to support an improved prefix binary search algorithm
(see the related source code).

::

                                ________________________________
                               |                                |
                     ______________|_______________                 |
                    |              |               | nameoff1       | nameoffN-1
    ____________.______________._______________v________________v__________
   | dirent | dirent | ... | dirent | filename | filename | ... | filename |
   |___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____|
        \                           ^
         \                          |                            * could have
          \                         |                              trailing '\0'
           \________________________| nameoff0

                               Directory block

Note that apart from the offset of the first filename, nameoff0 also indicates
the total number of directory entries in this block, so no additional on-disk
field is needed.
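
A directory block can be walked using only nameoff0, as described above. This sketch assumes a 12-byte dirent laid out as a 64-bit nid, a 16-bit nameoff, an 8-bit file type and one reserved byte, all little-endian (our reading of struct erofs_dirent, not stated in this document), and assumes names compare byte-wise as memcmp() does. It performs a plain binary search, not the improved prefix variant mentioned above; all function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* ASSUMED dirent layout: nid (le64), nameoff (le16), file_type (u8), reserved (u8). */
#define SKETCH_DIRENT_SIZE 12u

static uint16_t sketch_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* nameoff0 doubles as the entry count: it equals nentries * dirent size. */
unsigned int sketch_dir_count(const uint8_t *blk)
{
	return sketch_le16(blk + 8) / SKETCH_DIRENT_SIZE;
}

/*
 * Locate the name of entry i. Names are packed back to back; the last one
 * ends at the block end or at a trailing '\0', whichever comes first.
 */
const char *sketch_dir_name(const uint8_t *blk, unsigned int blksz,
			    unsigned int i, unsigned int *len)
{
	unsigned int n = sketch_dir_count(blk);
	unsigned int off = sketch_le16(blk + i * SKETCH_DIRENT_SIZE + 8);
	unsigned int end;

	if (i + 1 < n) {
		end = sketch_le16(blk + (i + 1) * SKETCH_DIRENT_SIZE + 8);
	} else {
		end = off;
		while (end < blksz && blk[end] != '\0')
			end++;
	}
	*len = end - off;
	return (const char *)blk + off;
}

/* Plain binary search over the sorted names; returns the entry index or -1. */
int sketch_dir_lookup(const uint8_t *blk, unsigned int blksz, const char *name)
{
	size_t qlen = strlen(name);
	int lo = 0, hi = (int)sketch_dir_count(blk) - 1;

	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;
		unsigned int len;
		const char *d = sketch_dir_name(blk, blksz, (unsigned int)mid, &len);
		size_t m = len < qlen ? len : qlen;
		int c = memcmp(name, d, m);

		if (c == 0) /* equal prefix: the shorter name sorts first */
			c = (qlen > len) - (qlen < len);
		if (c == 0)
			return mid;
		if (c < 0)
			hi = mid - 1;
		else
			lo = mid + 1;
	}
	return -1;
}
```

For a block whose three entries are named "a", "bc" and "d", sketch_dir_count() returns 3 and sketch_dir_lookup() finds "bc" at index 1, while a query for "b" returns -1.
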

Compression
-----------
Currently, EROFS supports 4KB fixed-sized output transparent file compression,
as illustrated below::

             |---- Variant-Length Extent ----|-------- VLE --------|----- VLE -----
             clusterofs                      clusterofs            clusterofs
             |                               |                     |   logical data
    _________v_______________________________v_____________________v_______________
    ... |    .        |             |        .    |             |  .          | ...
    ____|____.________|_____________|________.____|_____________|__.__________|____
        |-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|
             size          size          size          size          size
                .                          .                   .
                   .                     .                 .
                      .                .               .
                 _______._____________._____________._____________._____________________
                    ... |             |             |             | ... physical data
                 _______|_____________|_____________|_____________|_____________________
                        |-> cluster <-|-> cluster <-|-> cluster <-|
                             size          size          size
235 | |
236 | Currently each on-disk physical cluster can contain 4KB (un)compressed data | |
237 | at most. For each logical cluster, there is a corresponding on-disk index to | |
238 | describe its cluster type, physical cluster address, etc. | |
239 | ||
240 | See "struct z_erofs_vle_decompressed_index" in erofs_fs.h for more details. |
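
Finding the on-disk index for a given file position starts by splitting that position into a logical cluster number and an offset within the cluster. The sketch below assumes 4KB logical clusters to match the text above. It does not decode struct z_erofs_vle_decompressed_index itself, and the function names are hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: 4KB logical clusters, matching the 4KB figure in the text. */
#define SKETCH_LCLUSTER_BITS 12u
#define SKETCH_LCLUSTER_SIZE (1u << SKETCH_LCLUSTER_BITS)

/* Logical cluster number holding byte 'pos' of the uncompressed file. */
uint64_t sketch_lcn(uint64_t pos)
{
	return pos >> SKETCH_LCLUSTER_BITS;
}

/* Offset of 'pos' inside its logical cluster. */
uint32_t sketch_lcluster_ofs(uint64_t pos)
{
	return (uint32_t)(pos & (SKETCH_LCLUSTER_SIZE - 1));
}

/* Logical clusters (hence on-disk indexes) needed for a file of 'size' bytes. */
uint64_t sketch_nr_lclusters(uint64_t size)
{
	return (size + SKETCH_LCLUSTER_SIZE - 1) >> SKETCH_LCLUSTER_BITS;
}
```

For example, byte 10000 of a file falls in logical cluster 2 at offset 1808, and a 8193-byte file needs 3 logical clusters and therefore 3 indexes.
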