Commit | Line | Data |
---|---|---|
25b532ce MCC |
1 | ==================== |
2 | Changes since 2.5.0: | |
3 | ==================== | |
4 | ||
5 | --- | |
6 | ||
7 | **recommended** | |
8 | ||
9 | New helpers: sb_bread(), sb_getblk(), sb_find_get_block(), set_bh(), | |
10 | sb_set_blocksize() and sb_min_blocksize(). | |
11 | ||
12 | Use them. | |
13 | ||
14 | (sb_find_get_block() replaces 2.4's get_hash_table()) | |
15 | ||
16 | --- | |
17 | ||
18 | **recommended** | |
19 | ||
20 | New methods: ->alloc_inode() and ->destroy_inode(). | |
21 | ||
22 | Remove inode->u.foo_inode_i | |
23 | ||
24 | Declare:: | |
25 | ||
26 | struct foo_inode_info { | |
27 | /* fs-private stuff */ | |
28 | struct inode vfs_inode; | |
29 | }; | |
30 | static inline struct foo_inode_info *FOO_I(struct inode *inode) | |
31 | { | |
32 | return container_of(inode, struct foo_inode_info, vfs_inode); | |
33 | } | |
34 | ||
35 | Use FOO_I(inode) instead of &inode->u.foo_inode_i; | |
36 | ||
37 | Add foo_alloc_inode() and foo_destroy_inode() - the former should allocate | |
38 | foo_inode_info and return the address of ->vfs_inode, the latter should free | |
39 | FOO_I(inode) (see in-tree filesystems for examples). | |
40 | ||
41 | Make them ->alloc_inode and ->destroy_inode in your super_operations. | |
42 | ||
43 | Keep in mind that now you need explicit initialization of private data | |
44 | typically between calling iget_locked() and unlocking the inode. | |
45 | ||
46 | At some point that will become mandatory. | |
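The embedding pattern described above can be tried out in userspace. The following is only a sketch under stated assumptions: `struct inode` is a one-field stand-in for the kernel structure, `container_of()` is redefined locally, `foo_flags` is a hypothetical private field, and the `struct super_block *` argument that the real ->alloc_inode() receives is omitted. In-tree filesystems also typically allocate from a slab cache rather than with calloc().

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct inode; only here so the sketch compiles. */
struct inode {
	unsigned long i_ino;
};

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct foo_inode_info {
	int foo_flags;			/* fs-private stuff (hypothetical field) */
	struct inode vfs_inode;		/* embedded VFS inode */
};

static inline struct foo_inode_info *FOO_I(struct inode *inode)
{
	return container_of(inode, struct foo_inode_info, vfs_inode);
}

/* Allocate the containing object and hand back the embedded inode. */
static struct inode *foo_alloc_inode(void)
{
	struct foo_inode_info *fi = calloc(1, sizeof(*fi));

	if (!fi)
		return NULL;
	return &fi->vfs_inode;
}

/* Free the containing object, not the embedded inode itself. */
static void foo_destroy_inode(struct inode *inode)
{
	free(FOO_I(inode));
}
```

The point of the sketch is that the VFS only ever sees `&fi->vfs_inode`, while the filesystem recovers its private data through FOO_I() without any pointer stored in the inode.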
47 | ||
48 | --- | |
49 | ||
50 | **mandatory** | |
51 | ||
52 | Change of file_system_type method (->read_super to ->get_sb) | |
53 | ||
54 | ->read_super() is no more. Ditto for DECLARE_FSTYPE and DECLARE_FSTYPE_DEV. | |
55 | ||
56 | Turn your foo_read_super() into a function that would return 0 in case of | |
57 | success and negative number in case of error (-EINVAL unless you have more | |
58 | informative error value to report). Call it foo_fill_super(). Now declare:: | |
59 | ||
60 | int foo_get_sb(struct file_system_type *fs_type, | |
61 | int flags, const char *dev_name, void *data, struct vfsmount *mnt) | |
62 | { | |
63 | return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super, | |
64 | mnt); | |
65 | } | |
66 | ||
67 | (or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of | |
68 | filesystem). | |
69 | ||
70 | Replace DECLARE_FSTYPE... with explicit initializer and have ->get_sb set as | |
71 | foo_get_sb. | |
72 | ||
73 | --- | |
74 | ||
75 | **mandatory** | |
76 | ||
77 | Locking change: ->s_vfs_rename_sem is taken only by cross-directory renames. | |
78 | Most likely there is no need to change anything, but if you relied on | |
79 | global exclusion between renames for some internal purpose - you need to | |
80 | change your internal locking. Otherwise exclusion guarantees remain the | |
81 | same (i.e. parents and victim are locked, etc.). | |
82 | ||
83 | --- | |
84 | ||
85 | **informational** | |
86 | ||
87 | Now we have the exclusion between ->lookup() and directory removal (by | |
88 | ->rmdir() and ->rename()). If you used to need that exclusion and provided | |
89 | it by internal locking (most filesystems couldn't care less) - you | |
90 | can relax your locking. | |
91 | ||
92 | --- | |
93 | ||
94 | **mandatory** | |
95 | ||
96 | ->lookup(), ->truncate(), ->create(), ->unlink(), ->mknod(), ->mkdir(), | |
97 | ->rmdir(), ->link(), ->llseek(), ->symlink(), ->rename() | |
98 | and ->readdir() are called without BKL now. Grab it on entry, drop upon return | |
99 | - that will guarantee the same locking you used to have. If your method or its | |
100 | parts do not need BKL - better yet, now you can shift lock_kernel() and | |
101 | unlock_kernel() so that they would protect exactly what needs to be | |
102 | protected. | |
103 | ||
104 | --- | |
105 | ||
106 | **mandatory** | |
107 | ||
108 | BKL is also moved from around sb operations. BKL should have been shifted into | |
109 | individual fs sb_op functions. If you don't need it, remove it. | |
110 | ||
111 | --- | |
112 | ||
113 | **informational** | |
114 | ||
115 | check for ->link() target not being a directory is done by callers. Feel | |
116 | free to drop it... | |
117 | ||
118 | --- | |
119 | ||
120 | **informational** | |
121 | ||
122 | ->link() callers hold ->i_mutex on the object we are linking to. Some of your | |
123 | problems might be over... | |
124 | ||
125 | --- | |
126 | ||
127 | **mandatory** | |
128 | ||
129 | new file_system_type method - kill_sb(superblock). If you are converting | |
130 | an existing filesystem, set it according to ->fs_flags:: | |
131 | ||
132 | FS_REQUIRES_DEV - kill_block_super | |
133 | FS_LITTER - kill_litter_super | |
134 | neither - kill_anon_super | |
135 | ||
136 | FS_LITTER is gone - just remove it from fs_flags. | |
137 | ||
138 | --- | |
139 | ||
140 | **mandatory** | |
141 | ||
142 | FS_SINGLE is gone (actually, that had happened back when ->get_sb() | |
143 | went in - and hadn't been documented ;-/). Just remove it from fs_flags | |
144 | (and see ->get_sb() entry for other actions). | |
145 | ||
146 | --- | |
147 | ||
148 | **mandatory** | |
149 | ||
150 | ->setattr() is called without BKL now. Caller _always_ holds ->i_mutex, so | |
151 | watch for ->i_mutex-grabbing code that might be used by your ->setattr(). | |
152 | Callers of notify_change() need ->i_mutex now. | |
153 | ||
154 | --- | |
155 | ||
156 | **recommended** | |
157 | ||
158 | New super_block field ``struct export_operations *s_export_op`` for | |
159 | explicit support for exporting, e.g. via NFS. The structure is fully | |
160 | documented at its declaration in include/linux/fs.h, and in | |
9195c3e8 | 161 | Documentation/filesystems/nfs/exporting.rst. |
25b532ce MCC |
162 | |
163 | Briefly it allows for the definition of decode_fh and encode_fh operations | |
164 | to encode and decode filehandles, and allows the filesystem to use | |
165 | a standard helper function for decode_fh, and provide file-system specific | |
166 | support for this helper, particularly get_parent. | |
167 | ||
168 | It is planned that this will be required for exporting once the code | |
169 | settles down a bit. | |
170 | ||
171 | **mandatory** | |
172 | ||
173 | s_export_op is now required for exporting a filesystem. | |
174 | isofs, ext2, ext3, reiserfs, fat | |
175 | can be used as examples of very different filesystems. | |
176 | ||
177 | --- | |
178 | ||
179 | **mandatory** | |
180 | ||
181 | iget4() and the read_inode2 callback have been superseded by iget5_locked() | |
182 | which has the following prototype:: | |
183 | ||
184 | struct inode *iget5_locked(struct super_block *sb, unsigned long ino, | |
185 | int (*test)(struct inode *, void *), | |
186 | int (*set)(struct inode *, void *), | |
187 | void *data); | |
188 | ||
189 | 'test' is an additional function that can be used when the inode | |
190 | number is not sufficient to identify the actual file object. 'set' | |
191 | should be a non-blocking function that initializes those parts of a | |
192 | newly created inode to allow the test function to succeed. 'data' is | |
193 | passed as an opaque value to both test and set functions. | |
194 | ||
195 | When the inode has been created by iget5_locked(), it will be returned with the | |
196 | I_NEW flag set and will still be locked. The filesystem then needs to finalize | |
197 | the initialization. Once the inode is initialized it must be unlocked by | |
198 | calling unlock_new_inode(). | |
199 | ||
200 | The filesystem is responsible for setting (and possibly testing) i_ino | |
201 | when appropriate. There is also a simpler iget_locked function that | |
202 | just takes the superblock and inode number as arguments and does the | |
203 | test and set for you. | |
204 | ||
205 | e.g.:: | |
206 | ||
207 | inode = iget_locked(sb, ino); | |
208 | if (inode->i_state & I_NEW) { | |
209 | err = read_inode_from_disk(inode); | |
210 | if (err < 0) { | |
211 | iget_failed(inode); | |
212 | return err; | |
213 | } | |
214 | unlock_new_inode(inode); | |
215 | } | |
216 | ||
217 | Note that if the process of setting up a new inode fails, then iget_failed() | |
218 | should be called on the inode to render it dead, and an appropriate error | |
219 | should be passed back to the caller. | |
220 | ||
221 | --- | |
222 | ||
223 | **recommended** | |
224 | ||
225 | ->getattr() finally getting used. See instances in nfs, minix, etc. | |
226 | ||
227 | --- | |
228 | ||
229 | **mandatory** | |
230 | ||
231 | ->revalidate() is gone. If your filesystem had it - provide ->getattr() | |
232 | and let it call whatever you had as ->revalidate() + (for symlinks that | |
233 | had ->revalidate()) add calls in ->follow_link()/->readlink(). | |
234 | ||
235 | --- | |
236 | ||
237 | **mandatory** | |
238 | ||
239 | ->d_parent changes are not protected by BKL anymore. Read access is safe | |
240 | if at least one of the following is true: | |
241 | ||
242 | * filesystem has no cross-directory rename() | |
243 | * we know that parent had been locked (e.g. we are looking at | |
244 | ->d_parent of ->lookup() argument). | |
245 | * we are called from ->rename(). | |
246 | * the child's ->d_lock is held | |
247 | ||
248 | Audit your code and add locking if needed. Notice that any place that is | |
249 | not protected by the conditions above is risky even in the old tree - you | |
250 | had been relying on BKL and that's prone to screwups. Old tree had quite | |
251 | a few holes of that kind - unprotected access to ->d_parent leading to | |
252 | anything from oops to silent memory corruption. | |
253 | ||
254 | --- | |
255 | ||
256 | **mandatory** | |
257 | ||
258 | FS_NOMOUNT is gone. If you use it - just set SB_NOUSER in flags | |
259 | (see rootfs for one kind of solution and bdev/socket/pipe for another). | |
260 | ||
261 | --- | |
262 | ||
263 | **recommended** | |
264 | ||
265 | Use bdev_read_only(bdev) instead of is_read_only(kdev). The latter | |
266 | is still alive, but only because of the mess in drivers/s390/block/dasd.c. | |
267 | As soon as it gets fixed is_read_only() will die. | |
268 | ||
269 | --- | |
270 | ||
271 | **mandatory** | |
272 | ||
273 | ->permission() is called without BKL now. Grab it on entry, drop upon | |
274 | return - that will guarantee the same locking you used to have. If | |
275 | your method or its parts do not need BKL - better yet, now you can | |
276 | shift lock_kernel() and unlock_kernel() so that they would protect | |
277 | exactly what needs to be protected. | |
278 | ||
279 | --- | |
280 | ||
281 | **mandatory** | |
282 | ||
283 | ->statfs() is now called without BKL held. BKL should have been | |
284 | shifted into individual fs sb_op functions where it's not clear that | |
285 | it's safe to remove it. If you don't need it, remove it. | |
286 | ||
287 | --- | |
288 | ||
289 | **mandatory** | |
290 | ||
291 | is_read_only() is gone; use bdev_read_only() instead. | |
292 | ||
293 | --- | |
294 | ||
295 | **mandatory** | |
296 | ||
297 | destroy_buffers() is gone; use invalidate_bdev(). | |
298 | ||
299 | --- | |
300 | ||
301 | **mandatory** | |
302 | ||
303 | fsync_dev() is gone; use fsync_bdev(). NOTE: lvm breakage is | |
304 | deliberate; as soon as struct block_device * is propagated in a reasonable | |
305 | way by that code fixing will become trivial; until then nothing can be | |
306 | done. | |
307 | ||
308 | **mandatory** | |
309 | ||
310 | block truncation on error exit from ->write_begin, and ->direct_IO | |
311 | moved from generic methods (block_write_begin, cont_write_begin, | |
312 | nobh_write_begin, blockdev_direct_IO*) to callers. Take a look at | |
313 | ext2_write_failed and callers for an example. | |
314 | ||
315 | **mandatory** | |
316 | ||
317 | ->truncate is gone. The whole truncate sequence needs to be | |
318 | implemented in ->setattr, which is now mandatory for filesystems | |
319 | implementing on-disk size changes. Start with a copy of the old inode_setattr | |
320 | and vmtruncate, and then reorder the vmtruncate + foofs_vmtruncate sequence to | |
321 | be in order of zeroing blocks using block_truncate_page or similar helpers, | |
322 | size update and finally on-disk truncation which should not fail. | |
323 | setattr_prepare (which used to be inode_change_ok) now includes the size checks | |
324 | for ATTR_SIZE and must be called in the beginning of ->setattr unconditionally. | |
325 | ||
326 | **mandatory** | |
327 | ||
328 | ->clear_inode() and ->delete_inode() are gone; ->evict_inode() should | |
329 | be used instead. It gets called whenever the inode is evicted, whether it has | |
330 | remaining links or not. Caller does *not* evict the pagecache or inode-associated | |
331 | metadata buffers; the method has to use truncate_inode_pages_final() to get rid | |
332 | of those. Caller makes sure async writeback cannot be running for the inode while | |
333 | (or after) ->evict_inode() is called. | |
334 | ||
335 | ->drop_inode() returns int now; it's called on final iput() with | |
336 | inode->i_lock held and it returns true if the filesystem wants the inode to be | |
337 | dropped. As before, generic_drop_inode() is still the default and it's been | |
338 | updated appropriately. generic_delete_inode() is also alive and it consists | |
339 | simply of return 1. Note that all actual eviction work is done by caller after | |
340 | ->drop_inode() returns. | |
341 | ||
342 | As before, clear_inode() must be called exactly once on each call of | |
343 | ->evict_inode() (as it used to be for each call of ->delete_inode()). Unlike | |
344 | before, if you are using inode-associated metadata buffers (i.e. | |
345 | mark_buffer_dirty_inode()), it's your responsibility to call | |
346 | invalidate_inode_buffers() before clear_inode(). | |
347 | ||
348 | NOTE: checking i_nlink in the beginning of ->write_inode() and bailing out | |
349 | if it's zero is not *and* *never* *had* *been* enough. Final unlink() and iput() | |
350 | may happen while the inode is in the middle of ->write_inode(); e.g. if you blindly | |
351 | free the on-disk inode, you may end up doing that while ->write_inode() is writing | |
352 | to it. | |
353 | ||
354 | --- | |
355 | ||
356 | **mandatory** | |
357 | ||
358 | .d_delete() now only advises the dcache as to whether or not to cache | |
359 | unreferenced dentries, and is now only called when the dentry refcount goes to | |
360 | 0. Even on 0 refcount transition, it must be able to tolerate being called 0, | |
361 | 1, or more times (e.g. constant, idempotent). | |
362 | ||
363 | --- | |
364 | ||
365 | **mandatory** | |
366 | ||
367 | .d_compare() calling convention and locking rules are significantly | |
368 | changed. Read updated documentation in Documentation/filesystems/vfs.rst (and | |
369 | look at examples of other filesystems) for guidance. | |
370 | ||
371 | --- | |
372 | ||
373 | **mandatory** | |
374 | ||
375 | .d_hash() calling convention and locking rules are significantly | |
376 | changed. Read updated documentation in Documentation/filesystems/vfs.rst (and | |
377 | look at examples of other filesystems) for guidance. | |
378 | ||
379 | --- | |
380 | ||
381 | **mandatory** | |
382 | ||
383 | dcache_lock is gone, replaced by fine grained locks. See fs/dcache.c | |
384 | for details of what locks to replace dcache_lock with in order to protect | |
385 | particular things. Most of the time, a filesystem only needs ->d_lock, which | |
386 | protects *all* the dcache state of a given dentry. | |
387 | ||
388 | --- | |
389 | ||
390 | **mandatory** | |
391 | ||
392 | Filesystems must RCU-free their inodes, if they can have been accessed | |
393 | via rcu-walk path walk (basically, if the file can have had a path name in the | |
394 | vfs namespace). | |
395 | ||
396 | Even though i_dentry and i_rcu share storage in a union, we will | |
397 | initialize the former in inode_init_always(), so just leave it alone in | |
398 | the callback. It used to be necessary to clean it there, but not anymore | |
399 | (starting at 3.2). | |
400 | ||
401 | --- | |
402 | ||
403 | **recommended** | |
404 | ||
405 | vfs now tries to do path walking in "rcu-walk mode", which avoids | |
406 | atomic operations and scalability hazards on dentries and inodes (see | |
407 | Documentation/filesystems/path-lookup.txt). d_hash and d_compare changes | |
408 | (above) are examples of the changes required to support this. For more complex | |
409 | filesystem callbacks, the vfs drops out of rcu-walk mode before the fs call, so | |
410 | no changes are required to the filesystem. However, this is costly and loses | |
411 | the benefits of rcu-walk mode. We will begin to add filesystem callbacks that | |
412 | are rcu-walk aware, shown below. Filesystems should take advantage of this | |
413 | where possible. | |
414 | ||
415 | --- | |
416 | ||
417 | **mandatory** | |
418 | ||
419 | d_revalidate is a callback that is made on every path element (if | |
420 | the filesystem provides it), which requires dropping out of rcu-walk mode. This | |
421 | may now be called in rcu-walk mode (nd->flags & LOOKUP_RCU). -ECHILD should be | |
422 | returned if the filesystem cannot handle rcu-walk. See | |
423 | Documentation/filesystems/vfs.rst for more details. | |
424 | ||
425 | permission is an inode permission check that is called on many or all | |
426 | directory inodes on the way down a path walk (to check for exec permission). It | |
427 | must now be rcu-walk aware (mask & MAY_NOT_BLOCK). See | |
428 | Documentation/filesystems/vfs.rst for more details. | |
429 | ||
430 | --- | |
431 | ||
432 | **mandatory** | |
433 | ||
434 | In ->fallocate() you must check the mode option passed in. If your | |
435 | filesystem does not support hole punching (deallocating space in the middle of a | |
436 | file) you must return -EOPNOTSUPP if FALLOC_FL_PUNCH_HOLE is set in mode. | |
437 | Currently you can only have FALLOC_FL_PUNCH_HOLE with FALLOC_FL_KEEP_SIZE set, | |
438 | so the i_size should not change when hole punching, even when punching the end of | |
439 | a file off. | |
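A minimal sketch of the mode check described above, written as a standalone userspace function. The name `foo_check_fallocate_mode()` and the `FOO_`-prefixed constants are hypothetical; the constants are defined locally with the same values as FALLOC_FL_KEEP_SIZE and FALLOC_FL_PUNCH_HOLE in include/uapi/linux/falloc.h so the example does not depend on kernel headers. A real ->fallocate() would perform this check before doing any allocation work.

```c
#include <errno.h>

/* Local copies of the uapi flag values, renamed to mark them as sketch-only. */
#define FOO_FALLOC_FL_KEEP_SIZE		0x01
#define FOO_FALLOC_FL_PUNCH_HOLE	0x02

/*
 * Mode check for a filesystem that does not support hole punching:
 * refuse any request that has the punch-hole bit set.
 */
static int foo_check_fallocate_mode(int mode)
{
	if (mode & FOO_FALLOC_FL_PUNCH_HOLE)
		return -EOPNOTSUPP;
	return 0;
}
```

Plain preallocation, with or without the keep-size flag, passes the check; only requests carrying the punch-hole bit are rejected.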
440 | ||
441 | --- | |
442 | ||
443 | **mandatory** | |
444 | ||
445 | ->get_sb() is gone. Switch to use of ->mount(). Typically it's just | |
446 | a matter of switching from calling ``get_sb_``... to ``mount_``... and changing | |
447 | the function type. If you were doing it manually, just switch from setting | |
448 | ->mnt_root to some pointer to returning that pointer. On errors return | |
449 | ERR_PTR(...). | |
450 | ||
451 | --- | |
452 | ||
453 | **mandatory** | |
454 | ||
455 | ->permission() and generic_permission() have lost flags | |
456 | argument; instead of passing IPERM_FLAG_RCU we add MAY_NOT_BLOCK into mask. | |
457 | ||
458 | generic_permission() has also lost the check_acl argument; ACL checking | |
459 | has been taken to VFS and filesystems need to provide a non-NULL ->i_op->get_acl | |
460 | to read an ACL from disk. | |
461 | ||
462 | --- | |
463 | ||
464 | **mandatory** | |
465 | ||
466 | If you implement your own ->llseek() you must handle SEEK_HOLE and | |
467 | SEEK_DATA. You can handle this by returning -EINVAL, but it would be nicer to | |
468 | support it in some way. The generic handler assumes that the entire file is | |
469 | data and there is a virtual hole at the end of the file. So if the provided | |
470 | offset is less than i_size and SEEK_DATA is specified, return the same offset. | |
471 | If the above is true for the offset and you are given SEEK_HOLE, return the end | |
472 | of the file. If the offset is i_size or greater return -ENXIO in either case. | |
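The "whole file is data, virtual hole at EOF" rule described above can be written down as a small pure function. This is a userspace sketch, not the kernel's generic handler: `foo_seek_hole_data()` and `enum foo_whence` are hypothetical names, and treating a negative offset like an offset past the end of the file is an assumption of the sketch.

```c
#include <errno.h>

/* Hypothetical stand-ins for SEEK_DATA and SEEK_HOLE. */
enum foo_whence { FOO_SEEK_DATA, FOO_SEEK_HOLE };

/*
 * Assume the entire file is data with a single virtual hole at i_size.
 * Returns the resulting offset, or -ENXIO if the offset is not inside
 * the file.
 */
static long long foo_seek_hole_data(long long offset, long long i_size,
				    enum foo_whence whence)
{
	/* Assumption: negative offsets are handled like offsets beyond EOF. */
	if (offset < 0 || offset >= i_size)
		return -ENXIO;

	if (whence == FOO_SEEK_DATA)
		return offset;		/* already positioned on data */

	return i_size;			/* the only hole starts at EOF */
}
```

A filesystem that tracks real holes would replace the last two returns with a lookup in its extent or block map, while keeping the -ENXIO behaviour for offsets at or beyond i_size.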
473 | ||
474 | **mandatory** | |
475 | ||
476 | If you have your own ->fsync() you must make sure to call | |
477 | filemap_write_and_wait_range() so that all dirty pages are synced out properly. | |
478 | You must also keep in mind that ->fsync() is not called with i_mutex held | |
479 | anymore, so if you require i_mutex locking you must make sure to take it and | |
480 | release it yourself. | |
481 | ||
482 | --- | |
483 | ||
484 | **mandatory** | |
485 | ||
486 | d_alloc_root() is gone, along with a lot of bugs caused by code | |
487 | misusing it. Replacement: d_make_root(inode). On success d_make_root(inode) | |
488 | allocates and returns a new dentry instantiated with the passed in inode. | |
489 | On failure NULL is returned and the passed in inode is dropped so the reference | |
490 | to inode is consumed in all cases and failure handling need not do any cleanup | |
491 | for the inode. If d_make_root(inode) is passed a NULL inode it returns NULL | |
492 | and also requires no further error handling. Typical usage is:: | |
493 | ||
494 | inode = foofs_new_inode(....); | |
495 | s->s_root = d_make_root(inode); | |
496 | if (!s->s_root) | |
497 | /* Nothing needed for the inode cleanup */ | |
498 | return -ENOMEM; | |
499 | ... | |
500 | ||
501 | --- | |
502 | ||
503 | **mandatory** | |
504 | ||
505 | The witch is dead! Well, 2/3 of it, anyway. ->d_revalidate() and | |
506 | ->lookup() do *not* take struct nameidata anymore; just the flags. | |
507 | ||
508 | --- | |
509 | ||
510 | **mandatory** | |
511 | ||
512 | ->create() doesn't take ``struct nameidata *``; unlike the previous | |
513 | two, it gets "is it an O_EXCL or equivalent?" boolean argument. Note that | |
514 | local filesystems can ignore that argument - they are guaranteed that the | |
515 | object doesn't exist. It's remote/distributed ones that might care... | |
516 | ||
517 | --- | |
518 | ||
519 | **mandatory** | |
520 | ||
521 | FS_REVAL_DOT is gone; if you used to have it, add ->d_weak_revalidate() | |
522 | in your dentry operations instead. | |
523 | ||
524 | --- | |
525 | ||
526 | **mandatory** | |
527 | ||
528 | vfs_readdir() is gone; switch to iterate_dir() instead | |
529 | ||
530 | --- | |
531 | ||
532 | **mandatory** | |
533 | ||
534 | ->readdir() is gone now; switch to ->iterate() | |
535 | ||
536 | **mandatory** | |
537 | ||
538 | vfs_follow_link has been removed. Filesystems must use nd_set_link | |
539 | from ->follow_link for normal symlinks, or nd_jump_link for magic | |
540 | /proc/<pid> style links. | |
541 | ||
542 | --- | |
543 | ||
544 | **mandatory** | |
545 | ||
546 | iget5_locked()/ilookup5()/ilookup5_nowait() test() callback used to be | |
547 | called with both ->i_lock and inode_hash_lock held; the former is *not* | |
548 | taken anymore, so verify that your callbacks do not rely on it (none | |
549 | of the in-tree instances did). inode_hash_lock is still held, | |
550 | of course, so they are still serialized wrt removal from inode hash, | |
551 | as well as wrt set() callback of iget5_locked(). | |
552 | ||
553 | --- | |
554 | ||
555 | **mandatory** | |
556 | ||
557 | d_materialise_unique() is gone; d_splice_alias() does everything you | |
558 | need now. Remember that they have opposite orders of arguments ;-/ | |
559 | ||
560 | --- | |
561 | ||
562 | **mandatory** | |
563 | ||
564 | f_dentry is gone; use f_path.dentry, or, better yet, see if you can avoid | |
565 | it entirely. | |
566 | ||
567 | --- | |
568 | ||
569 | **mandatory** | |
570 | ||
571 | never call ->read() and ->write() directly; use __vfs_{read,write} or | |
572 | wrappers; instead of checking for ->write or ->read being NULL, look for | |
573 | FMODE_CAN_{WRITE,READ} in file->f_mode. | |
574 | ||
575 | --- | |
576 | ||
577 | **mandatory** | |
578 | ||
579 | do _not_ use new_sync_{read,write} for ->read/->write; leave it NULL | |
580 | instead. | |
581 | ||
582 | --- | |
583 | ||
584 | **mandatory** | |
585 | ->aio_read/->aio_write are gone. Use ->read_iter/->write_iter. | |
586 | ||
587 | --- | |
588 | ||
589 | **recommended** | |
590 | ||
591 | for embedded ("fast") symlinks just set inode->i_link to wherever the | |
592 | symlink body is and use simple_follow_link() as ->follow_link(). | |
593 | ||
594 | --- | |
595 | ||
596 | **mandatory** | |
597 | ||
598 | calling conventions for ->follow_link() have changed. Instead of returning | |
599 | cookie and using nd_set_link() to store the body to traverse, we return | |
600 | the body to traverse and store the cookie using explicit void ** argument. | |
601 | nameidata isn't passed at all - nd_jump_link() doesn't need it and | |
602 | nd_[gs]et_link() is gone. | |
603 | ||
604 | --- | |
605 | ||
606 | **mandatory** | |
607 | ||
608 | calling conventions for ->put_link() have changed. It gets inode instead of | |
609 | dentry, it does not get nameidata at all and it gets called only when cookie | |
610 | is non-NULL. Note that link body isn't available anymore, so if you need it, | |
611 | store it as cookie. | |
612 | ||
613 | --- | |
614 | ||
615 | **mandatory** | |
616 | ||
617 | any symlink that might use page_follow_link_light/page_put_link() must | |
618 | have inode_nohighmem(inode) called before anything might start playing with | |
619 | its pagecache. No highmem pages should end up in the pagecache of such | |
620 | symlinks. That includes any preseeding that might be done during symlink | |
621 | creation. __page_symlink() will honour the mapping gfp flags, so once | |
622 | you've done inode_nohighmem() it's safe to use, but if you allocate and | |
623 | insert the page manually, make sure to use the right gfp flags. | |
624 | ||
625 | --- | |
626 | ||
627 | **mandatory** | |
628 | ||
629 | ->follow_link() is replaced with ->get_link(); same API, except that | |
630 | ||
631 | * ->get_link() gets inode as a separate argument | |
632 | * ->get_link() may be called in RCU mode - in that case NULL | |
633 | dentry is passed | |
634 | ||
635 | --- | |
636 | ||
637 | **mandatory** | |
638 | ||
639 | ->get_link() gets struct delayed_call ``*done`` now, and should do | |
640 | set_delayed_call() where it used to set ``*cookie``. | |
641 | ||
642 | ->put_link() is gone - just give the destructor to set_delayed_call() | |
643 | in ->get_link(). | |
644 | ||
645 | --- | |
646 | ||
647 | **mandatory** | |
648 | ||
649 | ->getxattr() and xattr_handler.get() get dentry and inode passed separately. | |
650 | dentry might be yet to be attached to inode, so do _not_ use its ->d_inode | |
651 | in the instances. Rationale: !@#!@# security_d_instantiate() needs to be | |
652 | called before we attach dentry to inode. | |
653 | ||
654 | --- | |
655 | ||
656 | **mandatory** | |
657 | ||
658 | symlinks are no longer the only inodes that do *not* have i_bdev/i_cdev/ | |
659 | i_pipe/i_link union zeroed out at inode eviction. As the result, you can't | |
660 | assume that non-NULL value in ->i_nlink at ->destroy_inode() implies that | |
661 | it's a symlink. Checking ->i_mode is really needed now. In-tree we had | |
662 | to fix shmem_destroy_callback() that used to take that kind of shortcut; | |
663 | watch out, since that shortcut is no longer valid. | |
664 | ||
665 | --- | |
666 | ||
667 | **mandatory** | |
668 | ||
669 | ->i_mutex is replaced with ->i_rwsem now. inode_lock() et al. work as | |
670 | they used to - they just take it exclusive. However, ->lookup() may be | |
671 | called with parent locked shared. Its instances must not | |
672 | ||
673 | * use d_instantiate() and d_rehash() separately - use d_add() or | |
674 | d_splice_alias() instead. | |
675 | * use d_rehash() alone - call d_add(new_dentry, NULL) instead. | |
676 | * in the unlikely case when (read-only) access to filesystem | |
677 | data structures needs exclusion for some reason, arrange it | |
678 | yourself. None of the in-tree filesystems needed that. | |
679 | * rely on ->d_parent and ->d_name not changing after dentry has | |
680 | been fed to d_add() or d_splice_alias(). Again, none of the | |
681 | in-tree instances relied upon that. | |
682 | ||
683 | We are guaranteed that lookups of the same name in the same directory | |
684 | will not happen in parallel ("same" in the sense of your ->d_compare()). | |
685 | Lookups on different names in the same directory can and do happen in | |
686 | parallel now. | |
687 | ||
688 | --- | |
689 | ||
690 | **recommended** | |
691 | ||
692 | ->iterate_shared() is added; it's a parallel variant of ->iterate(). | |
693 | Exclusion on struct file level is still provided (as well as that | |
694 | between it and lseek on the same struct file), but if your directory | |
695 | has been opened several times, you can get these called in parallel. | |
696 | Exclusion between that method and all directory-modifying ones is | |
697 | still provided, of course. | |
698 | ||
699 | Often enough ->iterate() can serve as ->iterate_shared() without any | |
700 | changes - it is a read-only operation, after all. If you have any | |
701 | per-inode or per-dentry in-core data structures modified by ->iterate(), | |
702 | you might need something to serialize the access to them. If you | |
703 | do dcache pre-seeding, you'll need to switch to d_alloc_parallel() for | |
704 | that; look for in-tree examples. | |
705 | ||
706 | Old method is only used if the new one is absent; eventually it will | |
707 | be removed. Switch while you still can; the old one won't stay. | |
708 | ||
709 | --- | |
710 | ||
711 | **mandatory** | |
712 | ||
713 | ->atomic_open() calls without O_CREAT may happen in parallel. | |
714 | ||
715 | --- | |
716 | ||
717 | **mandatory** | |
718 | ||
719 | ->setxattr() and xattr_handler.set() get dentry and inode passed separately. | |
720 | dentry might be yet to be attached to inode, so do _not_ use its ->d_inode | |
721 | in the instances. Rationale: !@#!@# security_d_instantiate() needs to be | |
722 | called before we attach dentry to inode and !@#!@##!@$!$#!@#$!@$!@$ smack | |
723 | ->d_instantiate() uses not just ->getxattr() but ->setxattr() as well. | |
724 | ||
725 | --- | |
726 | ||
727 | **mandatory** | |
728 | ||
729 | ->d_compare() doesn't get parent as a separate argument anymore. If you | |
730 | used it for finding the struct super_block involved, dentry->d_sb will | |
731 | work just as well; if it's something more complicated, use dentry->d_parent. | |
732 | Just be careful not to assume that fetching it more than once will yield | |
733 | the same value - in RCU mode it could change under you. | |
734 | ||
735 | --- | |
736 | ||
737 | **mandatory** | |
738 | ||
739 | ->rename() has an added flags argument. Any flags not handled by the | |
740 | filesystem should result in -EINVAL being returned. | |
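One common way to satisfy this is to mask the incoming flags against the set the filesystem supports. The following is a userspace sketch only: `foo_check_rename_flags()` and the `FOO_`-prefixed names are hypothetical, the two constants carry the same values as RENAME_NOREPLACE and RENAME_EXCHANGE in the uapi headers, and the choice to support only the no-replace flag is an assumption made for illustration.

```c
#include <errno.h>

/* Local copies of the uapi flag values, renamed to mark them as sketch-only. */
#define FOO_RENAME_NOREPLACE	(1u << 0)
#define FOO_RENAME_EXCHANGE	(1u << 1)

/* Assumption: this example filesystem only implements the no-replace flag. */
#define FOO_RENAME_SUPPORTED	FOO_RENAME_NOREPLACE

/* Reject any flag bit the filesystem does not explicitly handle. */
static int foo_check_rename_flags(unsigned int flags)
{
	if (flags & ~FOO_RENAME_SUPPORTED)
		return -EINVAL;
	return 0;
}
```

Masking against a supported set, rather than testing for specific unsupported bits, means flags added later are rejected by default.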
741 | ||
742 | --- | |
743 | ||
744 | ||
745 | **recommended** | |
746 | ||
747 | ->readlink is optional for symlinks. Don't set, unless filesystem needs | |
748 | to fake something for readlink(2). | |
749 | ||
750 | --- | |
751 | ||
752 | **mandatory** | |
753 | ||
754 | ->getattr() is now passed a struct path rather than a vfsmount and | |
755 | dentry separately, and it now has request_mask and query_flags arguments | |
756 | to specify the fields and sync type requested by statx. Filesystems not | |
757 | supporting any statx-specific features may ignore the new arguments. | |
758 | ||
759 | --- | |
760 | ||
761 | **mandatory** | |
762 | ||
763 | ->atomic_open() calling conventions have changed. Gone is ``int *opened``, | |
764 | along with FILE_OPENED/FILE_CREATED. In place of those we have | |
765 | FMODE_OPENED/FMODE_CREATED, set in file->f_mode. Additionally, return | |
766 | value for 'called finish_no_open(), open it yourself' case has become | |
767 | 0, not 1. Since finish_no_open() itself is returning 0 now, that part | |
768 | does not need any changes in ->atomic_open() instances. | |
769 | ||
770 | --- | |
771 | ||
772 | **mandatory** | |
773 | ||
774 | alloc_file() has become static now; two wrappers are to be used instead. | |
775 | alloc_file_pseudo(inode, vfsmount, name, flags, ops) is for the cases | |
776 | when dentry needs to be created; that's the majority of old alloc_file() | |
777 | users. Calling conventions: on success a reference to new struct file | |
778 | is returned and caller's reference to inode is subsumed by that. On | |
779 | failure, ERR_PTR() is returned and no caller's references are affected, | |
780 | so the caller needs to drop the inode reference it held. | |
781 | alloc_file_clone(file, flags, ops) does not affect any caller's references. | |
782 | On success you get a new struct file sharing the mount/dentry with the | |
783 | original, on failure - ERR_PTR(). | |
784 | ||
785 | --- | |
786 | ||
787 | **mandatory** | |
788 | ||
789 | ->clone_file_range() and ->dedupe_file_range have been replaced with | |
790 | ->remap_file_range(). See Documentation/filesystems/vfs.rst for more | |
791 | information. | |
792 | ||
793 | --- | |
794 | ||
795 | **recommended** | |
796 | ||
797 | ->lookup() instances doing an equivalent of:: | |
798 | ||
799 | if (IS_ERR(inode)) | |
800 | return ERR_CAST(inode); | |
801 | return d_splice_alias(inode, dentry); | |
802 | ||
803 | don't need to bother with the check - d_splice_alias() will do the | |
804 | right thing when given ERR_PTR(...) as inode. Moreover, passing NULL | |
805 | inode to d_splice_alias() will also do the right thing (equivalent of | |
806 | d_add(dentry, NULL); return NULL;), so such special cases | |
807 | don't need separate treatment either. | |
808 | ||
809 | --- | |
810 | ||
811 | **strongly recommended** | |
812 | ||
813 | take the RCU-delayed parts of ->destroy_inode() into a new method - | |
814 | ->free_inode(). If ->destroy_inode() becomes empty - all the better, | |
815 | just get rid of it. Synchronous work (e.g. the stuff that can't | |
816 | be done from an RCU callback, or any WARN_ON() where we want the | |
817 | stack trace) *might* be movable to ->evict_inode(); however, | |
818 | that goes only for the things that are not needed to balance something | |
819 | done by ->alloc_inode(). IOW, if it's cleaning up the stuff that | |
820 | might have accumulated over the life of in-core inode, ->evict_inode() | |
821 | might be a fit. | |
822 | ||
823 | Rules for inode destruction: | |
824 | ||
825 | * if ->destroy_inode() is non-NULL, it gets called | |
826 | * if ->free_inode() is non-NULL, it gets scheduled by call_rcu() | |
827 | * combination of NULL ->destroy_inode and NULL ->free_inode is | |
828 | treated as NULL/free_inode_nonrcu, to preserve the compatibility. | |
829 | ||
830 | Note that the callback (be it via ->free_inode() or explicit call_rcu() | |
831 | in ->destroy_inode()) is *NOT* ordered wrt superblock destruction; | |
832 | as a matter of fact, the superblock and all associated structures | |
833 | might be already gone. The filesystem driver is guaranteed to be still | |
834 | there, but that's it. Freeing memory in the callback is fine; doing | |
835 | more than that is possible, but requires a lot of care and is best | |
836 | avoided. | |
837 | ||
838 | --- | |
839 | ||
840 | **mandatory** | |
841 | ||
842 | DCACHE_RCUACCESS is gone; having an RCU delay on dentry freeing is the | |
843 | default. DCACHE_NORCU opts out, and only d_alloc_pseudo() has any | |
844 | business doing so. | |
845 | ||
846 | --- | |
847 | ||
848 | **mandatory** | |
849 | ||
850 | d_alloc_pseudo() is internal-only; uses outside of alloc_file_pseudo() are | |
851 | very suspect (and won't work in modules). Such uses are very likely to | |
852 | be misspelled d_alloc_anon(). | |
d9a9f484 AV |
853 | |
854 | --- | |
855 | ||
856 | **mandatory** | |
857 | ||
858 | [should've been added in 2016] stale comment in finish_open() notwithstanding, | |
859 | failure exits in ->atomic_open() instances should *NOT* fput() the file, | |
860 | no matter what. Everything is handled by the caller. | |
df820f8d MS |
861 | |
862 | --- | |
863 | ||
864 | **mandatory** | |
865 | ||
866 | clone_private_mount() returns a longterm mount now, so the proper destructor of | |
867 | its result is kern_unmount() or kern_unmount_array(). |