Commit | Line | Data |
---|---|---|
489fcb91 | 1 | .. SPDX-License-Identifier: GPL-2.0 |
fc513a33 | 2 | |
489fcb91 | 3 | ======================== |
d3091215 | 4 | ext4 General Information |
489fcb91 | 5 | ======================== |
fc513a33 | 6 | |
c9f3f2d8 | 7 | Ext4 is an advanced level of the ext3 filesystem which incorporates |
22359f57 DC |
8 | scalability and reliability enhancements for supporting large filesystems |
9 | (64 bit) in keeping with increasing disk capacities and state-of-the-art | |
10 | feature requirements. | |
fc513a33 | 11 | |
22359f57 DC |
12 | Mailing list: linux-ext4@vger.kernel.org |
13 | Web site: http://ext4.wiki.kernel.org | |
fc513a33 DK |
14 | |
15 | ||
489fcb91 DW |
16 | Quick usage instructions |
17 | ======================== | |
fc513a33 | 18 | |
22359f57 | 19 | Note: More extensive information for getting started with ext4 can be |
489fcb91 DW |
20 | found at the ext4 wiki site at the URL: |
21 | http://ext4.wiki.kernel.org/index.php/Ext4_Howto | |
22359f57 | 22 | |
0694f8c3 | 23 | - The latest version of e2fsprogs can be found at: |
93e3270c | 24 | |
0694f8c3 | 25 | https://www.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/ |
489fcb91 | 26 | |
93e3270c JS |
27 | or |
28 | ||
0694f8c3 | 29 | http://sourceforge.net/project/showfiles.php?group_id=2406 |
fc513a33 | 30 | |
93e3270c JS |
31 | or grab the latest git repository from: |
32 | ||
0694f8c3 | 33 | https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git |
4537398d | 34 | |
0694f8c3 | 35 | - Create a new filesystem using the ext4 filesystem type: |
93e3270c | 36 | |
489fcb91 | 37 | # mke2fs -t ext4 /dev/hda1 |
93e3270c | 38 | |
0694f8c3 | 39 | Or to configure an existing ext3 filesystem to support extents: |
fc513a33 | 40 | |
22359f57 | 41 | # tune2fs -O extents /dev/hda1 |
fc513a33 | 42 | |
93e3270c | 43 | If the filesystem was created with 128 byte inodes, it can be |
0694f8c3 | 44 | converted to use 256 byte for greater efficiency via: |
fc513a33 | 45 | |
93e3270c | 46 | # tune2fs -I 256 /dev/hda1 |
fc513a33 | 47 | |
0694f8c3 | 48 | - Mounting: |
93e3270c | 49 | |
03010a33 | 50 | # mount -t ext4 /dev/hda1 /wherever |
fc513a33 | 51 | |
8e1a4857 TT |
52 | - When comparing performance with other filesystems, it's always |
53 | important to try multiple workloads; very often a subtle change in a | |
54 | workload parameter can completely change the ranking of which | |
55 | filesystems do well compared to others. When comparing versus ext3, | |
56 | note that ext4 enables write barriers by default, while ext3 does | |
57 | not enable write barriers by default. So it is useful to | |
58 | explicitly specify whether barriers are enabled or not via the | |
59 | '-o barrier=[0|1]' mount option for both ext3 and ext4 filesystems | |
60 | for a fair comparison. When tuning ext3 for best benchmark numbers, | |
61 | it is often worthwhile to try changing the data journaling mode; '-o | |
ad434017 LC |
62 | data=writeback' can be faster for some workloads. (Note however that |
63 | running mounted with data=writeback can potentially leave stale data | |
64 | exposed in recently written files in case of an unclean shutdown, | |
65 | which could be a security exposure in some situations.) Configuring | |
66 | the filesystem with a large journal can also be helpful for | |
67 | metadata-intensive workloads. | |
fc513a33 | 68 | |
489fcb91 DW |
69 | Features |
70 | ======== | |
fc513a33 | 71 | |
489fcb91 DW |
72 | Currently Available |
73 | ------------------- | |
fc513a33 | 74 | |
93e3270c | 75 | * ability to use filesystems > 16TB (e2fsprogs support not available yet) |
fc513a33 DK |
76 | * extent format reduces metadata overhead (RAM, IO for access, transactions) |
77 | * extent format more robust in face of on-disk corruption due to magics, | |
8e1a4857 | 78 | internal redundancy in tree |
49f1487b | 79 | * improved file allocation (multi-block alloc) |
722bde68 | 80 | * lift 32000 subdirectory limit imposed by i_links_count[1] |
93e3270c JS |
81 | * nsec timestamps for mtime, atime, ctime, create time |
82 | * inode version field on disk (NFSv4, Lustre) | |
83 | * reduced e2fsck time via uninit_bg feature | |
84 | * journal checksumming for robustness, performance | |
85 | * persistent file preallocation (e.g. for streaming media, databases) | |
86 | * ability to pack bitmaps and inode tables into larger virtual groups via the | |
87 | flex_bg feature | |
88 | * large file support | |
98bfa344 | 89 | * inode allocation using large virtual block groups via flex_bg |
49f1487b MC |
90 | * delayed allocation |
91 | * large block (up to pagesize) support | |
98bfa344 | 92 | * efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force |
49f1487b | 93 | the ordering) |
0a790fe4 | 94 | * Case-insensitive file name lookups |
2fdff4c8 EB |
95 | * file-based encryption support (fscrypt) |
96 | * file-based verity support (fsverity) | |
fc513a33 | 97 | |
722bde68 TT |
98 | [1] Filesystems with a block size of 1k may see a limit imposed by the |
99 | directory hash tree having a maximum depth of two. | |
100 | ||
0a790fe4 GKB |
101 | Case-insensitive file name lookups |
102 | ================================== | |
103 | ||
104 | The case-insensitive file name lookup feature is supported on a | |
105 | per-directory basis, allowing the user to mix case-insensitive and | |
106 | case-sensitive directories in the same filesystem. It is enabled by | |
107 | flipping the +F inode attribute of an empty directory. The | |
108 | case-insensitive string match operation is only defined when we know how | |
109 | text is encoded in a byte sequence. For that reason, in order to enable | |
110 | case-insensitive directories, the filesystem must have the | |
111 | casefold feature, which stores the filesystem-wide encoding | |
112 | model used. By default, the charset adopted is the latest version of | |
113 | Unicode (12.1.0, by the time of this writing), encoded in the UTF-8 | |
114 | form. The comparison algorithm is implemented by normalizing the | |
115 | strings to the Canonical decomposition form, as defined by Unicode, | |
116 | followed by a byte per byte comparison. | |
117 | ||
118 | The case-awareness is name-preserving on the disk, meaning that the file | |
119 | name provided by userspace is a byte-per-byte match to what is actually | |
120 | written to the disk. The Unicode normalization format used by the | |
121 | kernel is thus an internal representation, and not exposed to the | |
122 | userspace nor to the disk, with the important exception of disk hashes, | |
123 | used on large case-insensitive directories with DX feature. On DX | |
124 | directories, the hash must be calculated using the casefolded version of | |
125 | the filename, meaning that the normalization format used actually has an | |
126 | impact on where the directory entry is stored. | |
127 | ||
128 | When we change from viewing filenames as opaque byte sequences to seeing | |
129 | them as encoded strings we need to address what happens when a program | |
130 | tries to create a file with an invalid name. The Unicode subsystem | |
131 | within the kernel leaves the decision of what to do in this case to the | |
132 | filesystem, which selects its preferred behavior by enabling/disabling | |
133 | the strict mode. When Ext4 encounters one of those strings and the | |
134 | filesystem did not require strict mode, it falls back to considering the | |
135 | entire string as an opaque byte sequence, which still allows the user to | |
136 | operate on that file, but the case-insensitive lookups won't work. | |
137 | ||
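The comparison described above can be illustrated with a small userspace sketch. It is only an approximation and not the kernel implementation: it relies on Python's `unicodedata` module (whose Unicode version depends on the interpreter and is not necessarily 12.1.0) and on `str.casefold()`, and the function names `casefold_key` and `names_match` are hypothetical. Invalid UTF-8 is treated as an opaque byte sequence, mirroring the non-strict fallback.

```python
import unicodedata


def casefold_key(name: bytes) -> bytes:
    """Return a comparison key for a file name given as raw bytes."""
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        # Non-strict fallback: an invalid name is an opaque byte
        # sequence, so it only ever matches itself exactly.
        return name
    # Decompose, fold case, then decompose again because folding can
    # produce characters that are not in decomposed form.
    folded = unicodedata.normalize("NFD", text).casefold()
    return unicodedata.normalize("NFD", folded).encode("utf-8")


def names_match(a: bytes, b: bytes) -> bool:
    """Case-insensitive, normalization-insensitive name comparison."""
    return casefold_key(a) == casefold_key(b)
```

Note that the name stored on disk would still be the original byte string; only the key used for matching is folded.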
489fcb91 DW |
138 | Options |
139 | ======= | |
fc513a33 DK |
140 | |
141 | When mounting an ext4 filesystem, the following options are accepted: | |
142 | (*) == default | |
143 | ||
c0e3e040 DW |
144 | ro |
145 | Mount filesystem read only. Note that ext4 will replay the journal (and | |
146 | thus write to the partition) even when mounted "read only". The mount | |
147 | options "ro,noload" can be used to prevent writes to the filesystem. | |
148 | ||
149 | journal_checksum | |
150 | Enable checksumming of the journal transactions. This will allow the | |
151 | recovery code in e2fsck and the kernel to detect corruption in the | |
152 | journal. It is a compatible change and will be ignored by older | |
153 | kernels. | |
154 | ||
155 | journal_async_commit | |
156 | Commit block can be written to disk without waiting for descriptor | |
157 | blocks. If enabled older kernels cannot mount the device. This will | |
158 | enable 'journal_checksum' internally. | |
159 | ||
160 | journal_path=path, journal_dev=devnum | |
161 | When the external journal device's major/minor numbers have changed, | |
162 | these options allow the user to specify the new journal location. The | |
163 | journal device is identified through either its new major/minor numbers | |
164 | encoded in devnum, or via a path to the device. | |
165 | ||
166 | norecovery, noload | |
167 | Don't load the journal on mounting. Note that if the filesystem was | |
168 | not unmounted cleanly, skipping the journal replay will lead to the | |
169 | filesystem containing inconsistencies that can lead to any number of | |
170 | problems. | |
171 | ||
172 | data=journal | |
173 | All data are committed into the journal prior to being written into the | |
174 | main file system. Enabling this mode will disable delayed allocation | |
175 | and O_DIRECT support. | |
176 | ||
177 | data=ordered (*) | |
178 | All data are forced directly out to the main file system prior to its | |
179 | metadata being committed to the journal. | |
180 | ||
181 | data=writeback | |
182 | Data ordering is not preserved, data may be written into the main file | |
183 | system after its metadata has been committed to the journal. | |
184 | ||
185 | commit=nrsec (*) | |
23f6b024 JK |
186 | This setting limits the maximum age of the running transaction to |
187 | 'nrsec' seconds. The default value is 5 seconds. This means that if | |
188 | you lose your power, you will lose as much as the latest 5 seconds of | |
189 | metadata changes (your filesystem will not be damaged though, thanks | |
190 | to the journaling). This default value (or any low value) will hurt | |
191 | performance, but it's good for data-safety. Setting it to 0 will have | |
192 | the same effect as leaving it at the default (5 seconds). Setting it | |
193 | to very large values will improve performance. Note that due to | |
194 | delayed allocation even older data can be lost on power failure since | |
195 | writeback of those data begins only after time set in | |
196 | /proc/sys/vm/dirty_expire_centisecs. | |
c0e3e040 DW |
197 | |
198 | barrier=<0|1(*)>, barrier(*), nobarrier | |
199 | This enables/disables the use of write barriers in the jbd code. | |
200 | barrier=0 disables, barrier=1 enables. This also requires an IO stack | |
201 | which can support barriers, and if jbd gets an error on a barrier | |
202 | write, it will disable again with a warning. Write barriers enforce | |
203 | proper on-disk ordering of journal commits, making volatile disk write | |
204 | caches safe to use, at some performance penalty. If your disks are | |
205 | battery-backed in one way or another, disabling barriers may safely | |
206 | improve performance. The mount options "barrier" and "nobarrier" can | |
207 | also be used to enable or disable barriers, for consistency with other | |
208 | ext4 mount options. | |
209 | ||
210 | inode_readahead_blks=n | |
211 | This tuning parameter controls the maximum number of inode table blocks | |
212 | that ext4's inode table readahead algorithm will pre-read into the | |
213 | buffer cache. The default value is 32 blocks. | |
214 | ||
215 | nouser_xattr | |
216 | Disables Extended User Attributes. See the attr(5) manual page for | |
217 | more information about extended attributes. | |
218 | ||
219 | noacl | |
220 | This option disables POSIX Access Control List support. If ACL support | |
221 | is enabled in the kernel configuration (CONFIG_EXT4_FS_POSIX_ACL), ACL | |
222 | is enabled by default on mount. See the acl(5) manual page for more | |
223 | information about acl. | |
224 | ||
225 | bsddf (*) | |
226 | Make 'df' act like BSD. | |
227 | ||
228 | minixdf | |
229 | Make 'df' act like Minix. | |
230 | ||
231 | debug | |
232 | Extra debugging information is sent to syslog. | |
233 | ||
234 | abort | |
235 | Simulate the effects of calling ext4_abort() for debugging purposes. | |
236 | This is normally used while remounting a filesystem which is already | |
237 | mounted. | |
238 | ||
239 | errors=remount-ro | |
240 | Remount the filesystem read-only on an error. | |
241 | ||
242 | errors=continue | |
243 | Keep going on a filesystem error. | |
244 | ||
245 | errors=panic | |
246 | Panic and halt the machine if an error occurs. (These mount options | |
247 | override the errors behavior specified in the superblock, which can be | |
248 | configured using tune2fs) | |
249 | ||
250 | data_err=ignore(*) | |
251 | Just print an error message if an error occurs in a file data buffer in | |
252 | ordered mode. | |
253 | data_err=abort | |
254 | Abort the journal if an error occurs in a file data buffer in ordered | |
255 | mode. | |
256 | ||
257 | grpid | bsdgroups | |
258 | New objects have the group ID of their parent. | |
259 | ||
260 | nogrpid (*) | sysvgroups | |
261 | New objects have the group ID of their creator. | |
262 | ||
263 | resgid=n | |
264 | The group ID which may use the reserved blocks. | |
265 | ||
266 | resuid=n | |
267 | The user ID which may use the reserved blocks. | |
268 | ||
269 | sb= | |
270 | Use alternate superblock at this location. | |
271 | ||
272 | quota, noquota, grpquota, usrquota | |
273 | These options are ignored by the filesystem. They are used only by | |
274 | quota tools to recognize volumes where quota should be turned on. See | |
275 | documentation in the quota-tools package for more details | |
276 | (http://sourceforge.net/projects/linuxquota). | |
277 | ||
278 | jqfmt=<quota type>, usrjquota=<file>, grpjquota=<file> | |
279 | These options tell filesystem details about quota so that quota | |
280 | information can be properly updated during journal replay. They replace | |
281 | the above quota options. See documentation in the quota-tools package | |
282 | for more details (http://sourceforge.net/projects/linuxquota). | |
283 | ||
284 | stripe=n | |
285 | Number of filesystem blocks that mballoc will try to use for allocation | |
286 | size and alignment. For RAID5/6 systems this should be the number of | |
287 | data disks * RAID chunk size in file system blocks. | |
288 | ||
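As a worked example of the stripe formula above, the following sketch computes a value for `stripe=n`. The function name `stripe_blocks` and its parameters are hypothetical, and a 4096-byte filesystem block size is assumed by default.

```python
def stripe_blocks(data_disks, chunk_kib, block_size_bytes=4096):
    """Number of filesystem blocks in one full RAID stripe.

    data_disks excludes parity disks (e.g. 3 for a 4-disk RAID5).
    """
    chunk_bytes = chunk_kib * 1024
    if chunk_bytes % block_size_bytes != 0:
        raise ValueError("RAID chunk must be a multiple of the block size")
    # RAID chunk size expressed in filesystem blocks, times data disks.
    return data_disks * (chunk_bytes // block_size_bytes)
```

For a 4-disk RAID5 array (3 data disks) with 64 KiB chunks and 4 KiB blocks, this gives 3 * 16 = 48, which would be passed as `stripe=48`.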
289 | delalloc (*) | |
290 | Defer block allocation until just before ext4 writes out the block(s) | |
291 | in question. This allows ext4 to make better allocation decisions more | |
292 | efficiently. | |
293 | ||
294 | nodelalloc | |
295 | Disable delayed allocation. Blocks are allocated when the data is | |
296 | copied from userspace to the page cache, either via the write(2) system | |
297 | call or when an mmap'ed page which was previously unallocated is | |
298 | written for the first time. | |
299 | ||
300 | max_batch_time=usec | |
301 | Maximum amount of time ext4 should wait for additional filesystem | |
302 | operations to be batched together with a synchronous write operation. | |
303 | Since a synchronous write operation is going to force a commit and then | |
304 | wait for the I/O to complete, a short wait doesn't cost much and can be a | |
305 | huge throughput win, so we wait for a small amount of time to see if any other | |
306 | transactions can piggyback on the synchronous write. The algorithm | |
307 | used is designed to automatically tune for the speed of the disk, by | |
308 | measuring the amount of time (on average) that it takes to finish | |
309 | committing a transaction. Call this time the "commit time". If the | |
310 | time that the transaction has been running is less than the commit | |
311 | time, ext4 will try sleeping for the commit time to see if other | |
312 | operations will join the transaction. The commit time is capped by | |
313 | the max_batch_time, which defaults to 15000us (15ms). This | |
314 | optimization can be turned off entirely by setting max_batch_time to 0. | |
315 | ||
316 | min_batch_time=usec | |
317 | This parameter sets the commit time (as described above) to be at least | |
318 | min_batch_time. It defaults to zero microseconds. Increasing this | |
319 | parameter may improve the throughput of multi-threaded, synchronous | |
320 | workloads on very fast disks, at the cost of increasing latency. | |
321 | ||
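The interaction between the measured commit time, min_batch_time, and max_batch_time can be modelled roughly as follows. This is a simplified sketch of the description above, not the jbd2 code; the function name `batch_sleep_usec` and its arguments are hypothetical, and all times are in microseconds.

```python
def batch_sleep_usec(avg_commit_usec, running_usec,
                     min_batch_time=0, max_batch_time=15000):
    """How long to wait for other operations to join the transaction."""
    # Raise the commit time to at least min_batch_time, then cap it
    # at max_batch_time (the cap is applied last in this model).
    commit_time = max(avg_commit_usec, min_batch_time)
    commit_time = min(commit_time, max_batch_time)
    # Only sleep if the transaction is younger than the commit time.
    if running_usec < commit_time:
        return commit_time
    return 0
```

With max_batch_time set to 0 the commit time is capped at zero, so the function never returns a wait, which matches the statement that the optimization can be turned off entirely.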
322 | journal_ioprio=prio | |
323 | The I/O priority (from 0 to 7, where 0 is the highest priority) which | |
324 | should be used for I/O operations submitted by kjournald2 during a | |
325 | commit operation. This defaults to 3, which is a slightly higher | |
326 | priority than the default I/O priority. | |
327 | ||
328 | auto_da_alloc(*), noauto_da_alloc | |
329 | Many broken applications don't use fsync() when replacing existing | |
330 | files via patterns such as fd = open("foo.new")/write(fd,..)/close(fd)/ | |
331 | rename("foo.new", "foo"), or worse yet, fd = open("foo", | |
332 | O_TRUNC)/write(fd,..)/close(fd). If auto_da_alloc is enabled, ext4 | |
333 | will detect the replace-via-rename and replace-via-truncate patterns | |
334 | and force that any delayed allocation blocks are allocated such that at | |
335 | the next journal commit, in the default data=ordered mode, the data | |
336 | blocks of the new file are forced to disk before the rename() operation | |
337 | is committed. This provides roughly the same level of guarantees as | |
338 | ext3, and avoids the "zero-length" problem that can happen when a | |
339 | system crashes before the delayed allocation blocks are forced to disk. | |
340 | ||
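For contrast with the broken patterns above, here is a minimal sketch of a replace-via-rename that does call fsync() and therefore does not depend on auto_da_alloc. The helper name `replace_file_durably` and the ".new" suffix are illustrative choices, not part of ext4.

```python
import os


def replace_file_durably(path, data):
    """Atomically replace path with data (bytes), flushing before rename."""
    tmp = path + ".new"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Force the new data blocks to disk before the rename.
        os.fsync(fd)
    finally:
        os.close(fd)
    os.rename(tmp, path)
    # Flush the directory so the rename itself is persistent.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

After a crash, a reader should then see either the complete old file or the complete new file, never a zero-length one.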
341 | noinit_itable | |
342 | Do not initialize any uninitialized inode table blocks in the | |
343 | background. This feature may be used by installation CDs so that the | |
344 | install process can complete as quickly as possible; the inode table | |
345 | initialization process would then be deferred until the next time the | |
346 | file system is unmounted. | |
347 | ||
348 | init_itable=n | |
349 | The lazy itable init code will wait n times the number of milliseconds | |
350 | it took to zero out the previous block group's inode table. This | |
351 | minimizes the impact on the system performance while file system's | |
352 | inode table is being initialized. | |
353 | ||
354 | discard, nodiscard(*) | |
355 | Controls whether ext4 should issue discard/TRIM commands to the | |
356 | underlying block device when blocks are freed. This is useful for SSD | |
357 | devices and sparse/thinly-provisioned LUNs, but it is off by default | |
358 | until sufficient testing has been done. | |
359 | ||
360 | nouid32 | |
361 | Disables 32-bit UIDs and GIDs. This is for interoperability with | |
362 | older kernels which only store and expect 16-bit values. | |
363 | ||
364 | block_validity(*), noblock_validity | |
365 | These options enable or disable the in-kernel facility for tracking | |
366 | filesystem metadata blocks within internal data structures. This | |
367 | allows the multi-block allocator and other routines to notice bugs or | |
368 | corrupted allocation bitmaps which cause blocks to be allocated which | |
369 | overlap with filesystem metadata blocks. | |
370 | ||
371 | dioread_lock, dioread_nolock | |
372 | Controls whether or not ext4 should use the DIO read locking. If the | |
373 | dioread_nolock option is specified, ext4 will allocate an uninitialized | |
374 | extent before a buffer write and convert the extent to initialized after | |
375 | IO completes. This approach allows ext4 code to avoid using the inode | |
376 | mutex, which improves scalability on high-speed storage. However, this | |
377 | does not work with data journaling, and the dioread_nolock option will be | |
378 | ignored with a kernel warning. Note that the dioread_nolock code path is only | |
379 | used for extent-based files. Because of the restrictions this option | |
380 | imposes, it is off by default (i.e. dioread_lock). | |
381 | ||
382 | max_dir_size_kb=n | |
383 | This limits the size of directories so that any attempt to expand them | |
384 | beyond the specified limit in kilobytes will cause an ENOSPC error. | |
385 | This is useful in memory constrained environments, where a very large | |
386 | directory can cause severe performance problems or even provoke the Out | |
387 | Of Memory killer. (For example, if there is only 512mb memory | |
388 | available, a 176mb directory may seriously cramp the system's style.) | |
389 | ||
390 | i_version | |
391 | Enable 64-bit inode version support. This option is off by default. | |
392 | ||
393 | dax | |
394 | Use direct access (no page cache). See | |
395 | Documentation/filesystems/dax.txt. Note that this option is | |
396 | incompatible with data=journal. | |
923ae0ff | 397 | |
fc513a33 | 398 | Data Mode |
93e3270c | 399 | ========= |
fc513a33 DK |
400 | There are 3 different data modes: |
401 | ||
402 | * writeback mode | |
489fcb91 DW |
403 | |
404 | In data=writeback mode, ext4 does not journal data at all. This mode provides | |
405 | a similar level of journaling as that of XFS, JFS, and ReiserFS in its default | |
406 | mode - metadata journaling. A crash+recovery can cause incorrect data to | |
407 | appear in files which were written shortly before the crash. This mode will | |
408 | typically provide the best ext4 performance. | |
fc513a33 DK |
409 | |
410 | * ordered mode | |
489fcb91 DW |
411 | |
412 | In data=ordered mode, ext4 only officially journals metadata, but it logically | |
413 | groups metadata information related to data changes with the data blocks into | |
414 | a single unit called a transaction. When it's time to write the new metadata | |
415 | out to disk, the associated data blocks are written first. In general, this | |
416 | mode performs slightly slower than writeback but significantly faster than | |
417 | journal mode. | |
fc513a33 DK |
418 | |
419 | * journal mode | |
489fcb91 DW |
420 | |
421 | data=journal mode provides full data and metadata journaling. All new data is | |
422 | written to the journal first, and then to its final location. In the event of | |
423 | a crash, the journal can be replayed, bringing both data and metadata into a | |
424 | consistent state. This mode is the slowest except when data needs to be read | |
425 | from and written to disk at the same time where it outperforms all other | |
426 | modes. Enabling this mode will disable delayed allocation and O_DIRECT | |
427 | support. | |
fc513a33 | 428 | |
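The restrictions that this document attaches to data=journal can be collected into a small option checker. This is only an illustrative sketch: the function name `journal_mode_notes` is hypothetical, it covers just the interactions stated in this document, and it does not reflect how the kernel parses mount options.

```python
def journal_mode_notes(options):
    """Return notes about data=journal interactions in an option string."""
    opts = [o.strip() for o in options.split(",") if o.strip()]
    notes = []
    if "data=journal" in opts:
        notes.append("delayed allocation and O_DIRECT support are disabled")
        if "dax" in opts:
            notes.append("dax is incompatible with data=journal")
        if "dioread_nolock" in opts:
            notes.append("dioread_nolock is ignored with data journaling")
    return notes
```

For example, `journal_mode_notes("data=journal,dax")` reports both the loss of delayed allocation and the dax conflict, while `journal_mode_notes("data=ordered")` returns an empty list.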
6f9524e9 LC |
429 | /proc entries |
430 | ============= | |
431 | ||
432 | Information about mounted ext4 file systems can be found in | |
433 | /proc/fs/ext4. Each mounted filesystem will have a directory in | |
434 | /proc/fs/ext4 based on its device name (e.g., /proc/fs/ext4/hdc or | |
435 | /proc/fs/ext4/dm-0). The files in each per-device directory are shown | |
436 | in the table below. | |
437 | ||
438 | Files in /proc/fs/ext4/<devname> | |
489fcb91 | 439 | |
c0e3e040 DW |
440 | mb_groups |
441 | details of multiblock allocator buddy cache of free blocks | |
6f9524e9 LC |
442 | |
443 | /sys entries | |
444 | ============ | |
445 | ||
446 | Information about mounted ext4 file systems can be found in | |
447 | /sys/fs/ext4. Each mounted filesystem will have a directory in | |
448 | /sys/fs/ext4 based on its device name (e.g., /sys/fs/ext4/hdc or | |
449 | /sys/fs/ext4/dm-0). The files in each per-device directory are shown | |
450 | in the table below. | |
451 | ||
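A per-device directory laid out as described can be read with a few lines of Python. This is a minimal sketch: the helper name `read_ext4_tunables` is hypothetical, and the `sysfs_root` parameter exists only so the function can be pointed at a different directory for testing.

```python
from pathlib import Path


def read_ext4_tunables(devname, sysfs_root="/sys/fs/ext4"):
    """Return {file name: contents} for one mounted ext4 device."""
    dev_dir = Path(sysfs_root) / devname
    values = {}
    for entry in sorted(dev_dir.iterdir()):
        if not entry.is_file():
            continue
        try:
            values[entry.name] = entry.read_text().strip()
        except OSError:
            # Some entries may be unreadable (e.g. write-only); skip them.
            continue
    return values
```

On a system with an ext4 filesystem on dm-0, `read_ext4_tunables("dm-0")` would return one entry per readable file in that directory.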
489fcb91 DW |
452 | Files in /sys/fs/ext4/<devname>: |
453 | ||
6f9524e9 | 454 | (see also Documentation/ABI/testing/sysfs-fs-ext4) |
6f9524e9 | 455 | |
c0e3e040 DW |
456 | delayed_allocation_blocks |
457 | This file is read-only and shows the number of blocks that are dirty in | |
458 | the page cache, but which do not have their location in the filesystem | |
459 | allocated yet. | |
460 | ||
461 | inode_goal | |
462 | Tuning parameter which (if non-zero) controls the goal inode used by | |
463 | the inode allocator in preference to all other allocation heuristics. | |
464 | This is intended for debugging use only, and should be 0 on production | |
465 | systems. | |
466 | ||
467 | inode_readahead_blks | |
468 | Tuning parameter which controls the maximum number of inode table | |
469 | blocks that ext4's inode table readahead algorithm will pre-read into | |
470 | the buffer cache. | |
471 | ||
472 | lifetime_write_kbytes | |
473 | This file is read-only and shows the number of kilobytes of data that | |
474 | have been written to this filesystem since it was created. | |
475 | ||
476 | max_writeback_mb_bump | |
477 | The maximum number of megabytes the writeback code will try to write | |
478 | out before moving on to another inode. | |
479 | ||
480 | mb_group_prealloc | |
481 | The multiblock allocator will round up allocation requests to a | |
482 | multiple of this tuning parameter if the stripe size is not set in the | |
483 | ext4 superblock. | |
484 | ||
485 | mb_max_to_scan | |
486 | The maximum number of extents the multiblock allocator will search to | |
487 | find the best extent. | |
488 | ||
489 | mb_min_to_scan | |
490 | The minimum number of extents the multiblock allocator will search to | |
491 | find the best extent. | |
492 | ||
493 | mb_order2_req | |
494 | Tuning parameter which controls the minimum size for requests (as a | |
495 | power of 2) where the buddy cache is used. | |
496 | ||
497 | mb_stats | |
498 | Controls whether the multiblock allocator should collect statistics, | |
499 | which are shown during the unmount. 1 means to collect statistics, 0 | |
500 | means not to collect statistics. | |
501 | ||
502 | mb_stream_req | |
503 | Files which have fewer blocks than this tunable parameter will have | |
504 | their blocks allocated out of a block group specific preallocation | |
505 | pool, so that small files are packed closely together. Each large file | |
506 | will have its blocks allocated out of its own unique preallocation | |
507 | pool. | |
508 | ||
509 | session_write_kbytes | |
510 | This file is read-only and shows the number of kilobytes of data that | |
511 | have been written to this filesystem since it was mounted. | |
512 | ||
513 | reserved_clusters | |
514 | This is a read-write file and contains the number of reserved clusters in the file | |
515 | system, which will be used in specific situations to avoid costly | |
516 | zeroout, unexpected ENOSPC, or possible data loss. The default is 2% or | |
517 | 4096 clusters, whichever is smaller. This can be changed, but it | |
518 | can never exceed the number of clusters in the file system. If there is not | |
519 | enough space for the reserved clusters when mounting the file system, the mount will | |
520 | _not_ fail. | |
6f9524e9 LC |
521 | |
522 | Ioctls | |
523 | ====== | |
524 | ||
525 | There is some Ext4 specific functionality which can be accessed by applications | |
526 | through the system call interfaces. The list of all Ext4 specific ioctls is | |
527 | shown in the table below. | |
528 | ||
529 | Table of Ext4 specific ioctls | |
489fcb91 | 530 | |
c0e3e040 DW |
531 | EXT4_IOC_GETFLAGS |
532 | Get additional attributes associated with inode. The ioctl argument is | |
533 | an integer bitfield, with bit values described in ext4.h. This ioctl is | |
534 | an alias for FS_IOC_GETFLAGS. | |
535 | ||
536 | EXT4_IOC_SETFLAGS | |
537 | Set additional attributes associated with inode. The ioctl argument is | |
538 | an integer bitfield, with bit values described in ext4.h. This ioctl is | |
539 | an alias for FS_IOC_SETFLAGS. | |
540 | ||
541 | EXT4_IOC_GETVERSION, EXT4_IOC_GETVERSION_OLD | |
542 | Get the inode i_generation number stored for each inode. The | |
543 | i_generation number is normally changed only when a new inode is created | |
544 | and it is particularly useful for network filesystems. The '_OLD' | |
545 | version of this ioctl is an alias for FS_IOC_GETVERSION. | |
546 | ||
547 | EXT4_IOC_SETVERSION, EXT4_IOC_SETVERSION_OLD | |
548 | Set the inode i_generation number stored for each inode. The '_OLD' | |
549 | version of this ioctl is an alias for FS_IOC_SETVERSION. | |
550 | ||
551 | EXT4_IOC_GROUP_EXTEND | |
552 | This ioctl has the same purpose as the resize mount option. It allows | |
553 | resizing the filesystem to the end of the last existing block group; | |
554 | further resizing has to be done with resize2fs, either online or | |
555 | offline. The argument points to the unsigned long number representing | |
556 | the filesystem's new block count. | |
557 | ||
558 | EXT4_IOC_MOVE_EXT | |
559 | Move the block extents from orig_fd (the one this ioctl is pointing to) | |
560 | to the donor_fd (the one specified in move_extent structure passed as | |
561 | an argument to this ioctl). Then, exchange inode metadata between | |
562 | orig_fd and donor_fd. This is especially useful for online | |
563 | defragmentation, because the allocator has the opportunity to allocate | |
564 | moved blocks better, ideally into one contiguous extent. | |
565 | ||
566 | EXT4_IOC_GROUP_ADD | |
567 | Add a new group descriptor to an existing or new group descriptor | |
568 | block. The new group descriptor is described by ext4_new_group_input | |
569 | structure, which is passed as an argument to this ioctl. This is | |
570 | especially useful in conjunction with EXT4_IOC_GROUP_EXTEND, which | |
571 | allows online resize of the filesystem to the end of the last existing | |
572 | block group. Those two ioctls combined are used in the userspace online | |
573 | resize tool (e.g. resize2fs). | |
574 | ||
575 | EXT4_IOC_MIGRATE | |
576 | This ioctl operates on the filesystem itself. It converts (migrates) | |
577 | an ext3 indirect block mapped inode to an ext4 extent mapped inode by walking | |
578 | through the indirect block mapping of the original inode and converting | |
579 | contiguous block ranges into ext4 extents of a temporary inode. Then, | |
580 | the inodes are swapped. This ioctl might help when migrating from ext3 to | |
581 | ext4; however, the suggested approach is to create a fresh ext4 filesystem | |
582 | and copy the data from a backup. Note that the filesystem has to support | |
583 | extents for this ioctl to work. | |
584 | ||
585 | EXT4_IOC_ALLOC_DA_BLKS | |
586 | Force all of the delay allocated blocks to be allocated to preserve | |
587 | application-expected ext3 behaviour. Note that this will also start | |
588 | triggering a write of the data blocks, but this behaviour may change in | |
589 | the future as it is not necessary and has been done this way only for | |
590 | sake of simplicity. | |
591 | ||
592 | EXT4_IOC_RESIZE_FS | |
593 | Resize the filesystem to a new size. The number of blocks of the resized | |
594 | filesystem is passed in via a 64-bit integer argument. The kernel | |
595 | allocates the bitmaps and inode table, so the userspace tool just passes | |
596 | the new number of blocks. | |
597 | ||
598 | EXT4_IOC_SWAP_BOOT | |
599 | Swap i_data and associated attributes (like i_blocks, i_size, | |
600 | i_flags, ...) from the specified inode with inode EXT4_BOOT_LOADER_INO | |
601 | (#5). This is typically used to store a boot loader in a secure part of | |
602 | the filesystem, where it can't be changed by a normal user by accident. | |
603 | The data blocks of the previous boot loader will be associated with the | |
604 | given inode. | |
6f9524e9 | 605 | |
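As an illustration of calling one of these ioctls from userspace, the sketch below reads the inode flags through FS_IOC_GETFLAGS, which EXT4_IOC_GETFLAGS aliases. Several assumptions are involved: the request number is built with the generic Linux `_IOR('f', 1, long)` encoding (a few architectures encode ioctl numbers differently), the kernel is assumed to store the flags as a 32-bit value at the start of the buffer, and the helper names `ioc_read` and `get_inode_flags` are hypothetical.

```python
import fcntl
import os
import struct

# Generic Linux ioctl number layout (an assumption; not universal).
_IOC_TYPESHIFT = 8
_IOC_SIZESHIFT = 16
_IOC_DIRSHIFT = 30
_IOC_READ = 2


def ioc_read(type_char, nr, size):
    """Build a read-direction ioctl request number, like the C _IOR macro."""
    return ((_IOC_READ << _IOC_DIRSHIFT) | (size << _IOC_SIZESHIFT)
            | (ord(type_char) << _IOC_TYPESHIFT) | nr)


# FS_IOC_GETFLAGS is declared as _IOR('f', 1, long).
FS_IOC_GETFLAGS = ioc_read("f", 1, struct.calcsize("l"))


def get_inode_flags(path):
    """Return the inode flag bitfield for path; raises OSError if unsupported."""
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = bytearray(struct.calcsize("l"))
        fcntl.ioctl(fd, FS_IOC_GETFLAGS, buf)
    finally:
        os.close(fd)
    # Only the first 32 bits are interpreted as flags.
    return struct.unpack_from("I", buf)[0]
```

On an ext4 filesystem, testing the result against the bit values in ext4.h (for example `flags & 0x00080000` for EXT4_EXTENTS_FL) shows whether a file is extent-mapped. Filesystems that do not implement the ioctl raise OSError instead.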
fc513a33 DK |
606 | References |
607 | ========== | |
608 | ||
609 | kernel source: <file:fs/ext4/> | |
610 | <file:fs/jbd2/> | |
611 | ||
612 | programs: http://e2fsprogs.sourceforge.net/ | |
fc513a33 DK |
613 | |
614 | useful links: http://fedoraproject.org/wiki/ext3-devel | |
615 | http://www.bullopensource.org/ext4/ | |
93e3270c JS |
616 | http://ext4.wiki.kernel.org/index.php/Main_Page |
617 | http://fedoraproject.org/wiki/Features/Ext4 |