git.proxmox.com Git - mirror_ubuntu-bionic-kernel.git/commit

xfs: remote attribute blocks aren't really userdata

When adding a new remote attribute, we write the attribute to the
new extent before the allocation transaction is committed. This
means we cannot reuse busy extents as that violates crash
consistency semantics. Hence we currently treat remote attribute
extent allocation like userdata because it has the same overwrite
ordering constraints as userdata.
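A toy model may help make the ordering constraint concrete. The sketch below is not XFS code. The `struct extent` type and the `overlaps_busy()` and `pick_extent()` helpers are hypothetical. "Busy" is simplified to mean a block range freed by a transaction that has not yet been committed to the log. An allocation whose contents are written before its own transaction commits (`allow_busy_reuse == false`) skips any candidate that overlaps a busy range, which is the property the paragraph above relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical block range: [bno, bno + len). */
struct extent {
	unsigned long bno;	/* first block */
	unsigned long len;	/* length in blocks */
};

/* True if [bno, bno + len) overlaps any busy (freed, uncommitted) range. */
static bool overlaps_busy(const struct extent *busy, size_t nbusy,
			  unsigned long bno, unsigned long len)
{
	for (size_t i = 0; i < nbusy; i++) {
		if (bno < busy[i].bno + busy[i].len &&
		    busy[i].bno < bno + len)
			return true;
	}
	return false;
}

/*
 * Return the start of the first free extent with at least @len blocks.
 * When @allow_busy_reuse is false (data is written before the allocation
 * commits), candidates overlapping a busy range are skipped.
 * Returns -1 if nothing suitable is found.
 */
static long pick_extent(const struct extent *free_list, size_t nfree,
			const struct extent *busy, size_t nbusy,
			unsigned long len, bool allow_busy_reuse)
{
	assert(len > 0);
	for (size_t i = 0; i < nfree; i++) {
		if (free_list[i].len < len)
			continue;
		if (!allow_busy_reuse &&
		    overlaps_busy(busy, nbusy, free_list[i].bno, len))
			continue;
		return (long)free_list[i].bno;
	}
	return -1;
}
```

For example, with free extents starting at blocks 100 and 200 (8 blocks each) and blocks 100-107 busy, a 4-block request returns 100 when busy reuse is allowed and 200 when it is not.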

Unfortunately, this also allows the allocator to incorrectly apply
extent size hints to the remote attribute extent allocation. This
results in interesting failures, such as transaction block
reservation overruns and in-memory inode attribute fork corruption.
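A simplified sketch of the coupling that caused the problem. It assumes that a single `userdata` flag drove both the busy-reuse decision and the extent size hint decision, and that applying a hint means rounding the request length up to a multiple of the hint. The real alignment logic also adjusts the start offset. All names here are hypothetical, and only the hint half of the flag is modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pre-patch request: one flag for two different decisions. */
struct old_alloc_req {
	bool userdata;		/* set for data fork AND remote attr blocks */
	unsigned long len;	/* blocks actually needed */
	unsigned long extsz;	/* inode extent size hint, 0 = none */
};

/* Round @len up to a multiple of @hint (simplified alignment). */
static unsigned long round_up_to(unsigned long len, unsigned long hint)
{
	assert(hint > 0);
	return ((len + hint - 1) / hint) * hint;
}

/* Blocks the toy allocator would request under the old rules. */
static unsigned long old_request_len(const struct old_alloc_req *req)
{
	if (req->userdata && req->extsz > 0)
		return round_up_to(req->len, req->extsz);
	return req->len;
}
```

Under these assumptions, a remote attribute value needing 1 block on an inode with a 16-block hint becomes a 16-block request. If the transaction reserved blocks only for the attribute value itself, that is the kind of reservation overrun described above.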

To fix this, we need to separate the busy extent reuse configuration
from the userdata configuration. This changes the definition of
XFS_BMAPI_METADATA slightly: it now means that the allocation is
metadata and reuse of busy extents is acceptable due to the metadata
ordering semantics of the journal. If this flag is not set, it
means the allocation is for data that has unordered writeback, and
hence busy extent reuse is not allowed. It no longer implies the
allocation is for user data, just that the data write will not be
strictly ordered. This matches the semantics for both user data
and remote attribute block allocation.
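The revised meaning of the flag reduces to a one-line predicate. This is a hypothetical model, not kernel code: `BMAPI_METADATA` stands in for XFS_BMAPI_METADATA, and the fork argument is present only to show that it no longer affects the busy-reuse decision.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the XFS_BMAPI_METADATA caller flag. */
#define BMAPI_METADATA	(1u << 0)

/* Hypothetical fork selector. */
enum fork { DATA_FORK, ATTR_FORK };

/*
 * Modeled post-patch meaning of the metadata flag:
 *   set   -> journal-ordered metadata, busy extents may be reused;
 *   clear -> unordered data writeback, busy extents must not be
 *            reused, whichever fork the extent belongs to.
 */
static bool may_reuse_busy(unsigned int bmapi_flags, enum fork whichfork)
{
	/* The fork is validated but deliberately ignored. */
	assert(whichfork == DATA_FORK || whichfork == ATTR_FORK);
	return (bmapi_flags & BMAPI_METADATA) != 0;
}
```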

As such, this patch changes the "userdata" field to a "datatype"
field, and adds a "no busy reuse" flag to the field.
When we detect an unordered data extent allocation, we immediately set
the no reuse flag. We then set the "user data" flags based on the
inode fork we are allocating the extent to. Hence we only set
userdata flags on data fork allocations now and consider attribute
fork remote extents to be unordered metadata extents.
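A minimal sketch of the new "datatype" field, assuming one no-busy-reuse bit plus a user data bit. The flag names, values and helper functions are hypothetical stand-ins chosen for illustration, not the definitions added by this patch, and any other user data flags are omitted. `request_len()` assumes that extent size hints are applied by rounding the length up to a multiple of the hint, and only for user data.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical caller flag, standing in for XFS_BMAPI_METADATA. */
#define BMAPI_METADATA	(1u << 0)

/* Hypothetical "datatype" bits; other user data flags omitted. */
#define ALLOC_USERDATA	(1u << 0)	/* allocation is for user data */
#define ALLOC_NOBUSY	(1u << 1)	/* busy extent reuse not allowed */

enum fork { DATA_FORK, ATTR_FORK };

/* Build the datatype for an allocation request. */
static unsigned int alloc_datatype(unsigned int bmapi_flags,
				   enum fork whichfork)
{
	unsigned int datatype = 0;

	assert(whichfork == DATA_FORK || whichfork == ATTR_FORK);

	/* Journal-ordered metadata: no flags, busy reuse is fine. */
	if (bmapi_flags & BMAPI_METADATA)
		return datatype;

	/* Unordered data write: set the no-reuse flag first... */
	datatype = ALLOC_NOBUSY;

	/* ...then mark it as user data only for the data fork. */
	if (whichfork == DATA_FORK)
		datatype |= ALLOC_USERDATA;

	return datatype;
}

/* User data is any datatype bit other than ALLOC_NOBUSY. */
static bool alloc_is_userdata(unsigned int datatype)
{
	return (datatype & ~ALLOC_NOBUSY) != 0;
}

static bool alloc_allow_busy_reuse(unsigned int datatype)
{
	return (datatype & ALLOC_NOBUSY) == 0;
}

/* Assumed: hints are applied by rounding up, and only for user data. */
static unsigned long request_len(unsigned int datatype, unsigned long len,
				 unsigned long extsz)
{
	if (alloc_is_userdata(datatype) && extsz > 0)
		return ((len + extsz - 1) / extsz) * extsz;
	return len;
}
```

With these definitions a remote attribute allocation gets `ALLOC_NOBUSY` alone. Busy extents are still avoided, but a 1-block request stays 1 block even with a 16-block hint, while a data fork allocation keeps both behaviours.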

The result is that remote attribute extents now have the expected
allocation semantics, and the data fork allocation behaviour is
completely unchanged.

It should be noted that there may be other ways to fix this (e.g.
use ordered metadata buffers for the remote attribute extent data
write) but they are more invasive and difficult to validate both
from a design and implementation POV. Hence this patch takes the
simple, obvious route to fixing the problem...

Reported-and-tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

author	Dave Chinner <dchinner@redhat.com>
	Sun, 25 Sep 2016 22:21:28 +0000 (08:21 +1000)
committer	Dave Chinner <david@fromorbit.com>
	Sun, 25 Sep 2016 22:21:28 +0000 (08:21 +1000)
commit	292378edcb408c652e841fdc867fc14f8b4995fa
tree	7a7c1961c4083c311f4ff7cf0bede090e72223f2
parent	ea78d80866ce375defb2fdd1c8a3aafec95e0f85

fs/xfs/libxfs/xfs_alloc.c
fs/xfs/libxfs/xfs_alloc.h
fs/xfs/libxfs/xfs_bmap.c
fs/xfs/libxfs/xfs_bmap.h
fs/xfs/xfs_bmap_util.c
fs/xfs/xfs_extent_busy.c
fs/xfs/xfs_filestream.c
fs/xfs/xfs_trace.h