git.proxmox.com Git - mirror_zfs.git/log
3 months ago Add 'zpool status -e' flag to see unhealthy vdevs
Cameron Harr [Wed, 7 Feb 2024 17:12:12 +0000 (09:12 -0800)]
Add 'zpool status -e' flag to see unhealthy vdevs

When very large pools are present, it can be laborious to find
reasons for why a pool is degraded and/or where an unhealthy vdev
is. This option filters out vdevs that are ONLINE and with no errors
to make it easier to see where the issues are. Root and parents of
unhealthy vdevs will always be printed.

Testing:
ZFS errors and drive failures for multiple vdevs were simulated with
zinject.

Sample vdev listings with '-e' option
- All vdevs healthy
    NAME        STATE     READ WRITE CKSUM
    iron5       ONLINE       0     0     0

- ZFS errors
    NAME        STATE     READ WRITE CKSUM
    iron5       ONLINE       0     0     0
      raidz2-5  ONLINE       1     0     0
        L23     ONLINE       1     0     0
        L24     ONLINE       1     0     0
        L37     ONLINE       1     0     0

- Vdev faulted
    NAME        STATE     READ WRITE CKSUM
    iron5       DEGRADED     0     0     0
      raidz2-6  DEGRADED     0     0     0
        L67     FAULTED      0     0     0  too many errors

- Vdev faults and data errors
    NAME        STATE     READ WRITE CKSUM
    iron5       DEGRADED     0     0     0
      raidz2-1  DEGRADED     0     0     0
        L2      FAULTED      0     0     0  too many errors
      raidz2-5  ONLINE       1     0     0
        L23     ONLINE       1     0     0
        L24     ONLINE       1     0     0
        L37     ONLINE       1     0     0
      raidz2-6  DEGRADED     0     0     0
        L67     FAULTED      0     0     0  too many errors

- Vdev missing
    NAME        STATE     READ WRITE CKSUM
    iron5       DEGRADED     0     0     0
      raidz2-6  DEGRADED     0     0     0
        L67     UNAVAIL      3     1     0

- Slow devices when -s provided with -e
    NAME        STATE     READ WRITE CKSUM  SLOW
    iron5       DEGRADED     0     0     0     -
      raidz2-5  DEGRADED     0     0     0     -
        L10     FAULTED      0     0     0     0  external device fault
        L51     ONLINE       0     0     0    14
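
The filtering rule described above (always print the root, and print
any other vdev only if it or one of its descendants is unhealthy) can
be sketched as follows. This is a minimal illustration, not the code
from this change: vdev_t, status_filter() and the helper names are
hypothetical, and "unhealthy" is assumed to mean a state other than
ONLINE or a nonzero READ/WRITE/CKSUM counter. The SLOW column shown
with -s is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified vdev node; not the real libzfs structures. */
typedef struct vdev {
	const char *name;
	const char *state;	/* e.g. "ONLINE", "DEGRADED", "FAULTED" */
	unsigned long read_errs, write_errs, cksum_errs;
	const struct vdev *const *children;	/* nchildren pointers or NULL */
	size_t nchildren;
} vdev_t;

/* Assumption: unhealthy means not ONLINE or any nonzero error counter. */
static int
vdev_self_unhealthy(const vdev_t *v)
{
	return (strcmp(v->state, "ONLINE") != 0 ||
	    v->read_errs != 0 || v->write_errs != 0 || v->cksum_errs != 0);
}

/* True if this vdev or any of its descendants is unhealthy. */
static int
vdev_subtree_unhealthy(const vdev_t *v)
{
	if (vdev_self_unhealthy(v))
		return (1);
	for (size_t i = 0; i < v->nchildren; i++) {
		if (vdev_subtree_unhealthy(v->children[i]))
			return (1);
	}
	return (0);
}

/*
 * Append the names that would be listed to out[] (at most cap entries are
 * stored) and return the updated count.  The root is always listed; any
 * other vdev is listed only if its subtree contains an unhealthy vdev.
 */
size_t
status_filter(const vdev_t *v, int is_root, const char **out, size_t cap,
    size_t n)
{
	assert(v != NULL && out != NULL);
	if (!is_root && !vdev_subtree_unhealthy(v))
		return (n);
	if (n < cap)
		out[n] = v->name;
	n++;
	for (size_t i = 0; i < v->nchildren; i++)
		n = status_filter(v->children[i], 0, out, cap, n);
	return (n);
}
```

Applied to a pool shaped like the "Vdev faulted" sample, with healthy
siblings also present, the sketch lists only iron5, raidz2-6 and L67.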

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cameron Harr <harr1@llnl.gov>
Closes #15769

3 months ago zed: fix typo in variable ZED_POWER_OFF_ENCLO*US*RE_SLOT_ON_FAULT
Mauricio Faria de Oliveira [Sat, 9 Dec 2023 00:32:35 +0000 (21:32 -0300)]
zed: fix typo in variable ZED_POWER_OFF_ENCLO*US*RE_SLOT_ON_FAULT

Replace ENCLO_US_RE with ENCLO_SU_RE in the name of the variable.

Note this changes the user-visible string in zed.rc, and thus might
break current users who set the misspelled name, but the zfs-2.2.0 tag
has only been out for ~2 months, so it should not be widespread yet.

Mechanical change:

    $ grep -rl ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT
    cmd/zed/zed.d/zed.rc
    cmd/zed/zed.d/statechange-slot_off.sh

    $ sed -i 's/ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT/ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT/g' \
      cmd/zed/zed.d/zed.rc \
      cmd/zed/zed.d/statechange-slot_off.sh

    $ grep -rl ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT
    $

Fixes 11fbcacf37d1a66c7a40bb8920c70ce9a87270ea
("zed: Add zedlet to power off slot when drive is faulted")

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Closes #15651

4 months ago Improve performance for zpool trim on linux
Umer Saleem [Fri, 2 Feb 2024 19:51:51 +0000 (00:51 +0500)]
Improve performance for zpool trim on linux

On Linux, ZFS uses blkdev_issue_discard in vdev_disk_io_trim to issue
trim commands, and that call is synchronous.

This commit updates vdev_disk_io_trim to use __blkdev_issue_discard,
which is asynchronous. Unfortunately there isn't any asynchronous
version for blkdev_issue_secure_erase, so performance of secure trim
will still suffer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15843

4 months ago BRT: Fix FICLONE/FICLONERANGE shortened copy
Tony Hutter [Tue, 6 Feb 2024 17:55:43 +0000 (09:55 -0800)]
BRT: Fix FICLONE/FICLONERANGE shortened copy

On Linux the ioctl_ficlonerange() and ioctl_ficlone() system calls
are expected to either fully clone the specified range or return an
error.  The range may be for an entire file.  While internally ZFS
supports cloning partial ranges there's no way to return the length
cloned to the caller so we need to make this all or nothing.

As part of this change support for the REMAP_FILE_CAN_SHORTEN flag
has been added.  When REMAP_FILE_CAN_SHORTEN is set zfs_clone_range()
will return a shortened range when encountering pending dirty records.
When it's clear, zfs_clone_range() will block and wait for the records
to be written out, allowing the blocks to be cloned.

Furthermore, the file range lock is held over the region being cloned
to prevent it from being modified while cloning.  This doesn't quite
provide atomic semantics since if an error is encountered only a
portion of the range may be cloned.  This will be converted to an
error if REMAP_FILE_CAN_SHORTEN was not provided and returned to the
caller.  However, the destination file range is left in an undefined
state.
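
The all-or-nothing rule above can be sketched as a small decision
function. This is only an illustration of the reporting logic, under
stated assumptions: clone_report() and SKETCH_CAN_SHORTEN are
hypothetical names standing in for the real code and for
REMAP_FILE_CAN_SHORTEN, and -EIO is a placeholder because the text
does not say which error is returned. Blocking on pending dirty
records is not modelled.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the kernel's REMAP_FILE_CAN_SHORTEN flag. */
#define	SKETCH_CAN_SHORTEN	0x1u

/*
 * Given how many bytes were requested and how many were actually cloned,
 * return the length to report to the caller, or a negative errno.
 */
long long
clone_report(long long requested, long long cloned, unsigned int flags)
{
	assert(cloned >= 0 && cloned <= requested);
	if (cloned == requested)
		return (cloned);	/* the whole range was cloned */
	if (flags & SKETCH_CAN_SHORTEN)
		return (cloned);	/* caller accepts a shortened range */
	return (-EIO);			/* placeholder: all or nothing */
}
```

With these assumptions, a 4096-byte request that clones only 1024
bytes reports 1024 when the flag is set and an error when it is not.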

A test case has been added which exercises this functionality by
verifying that `cp --reflink=never|auto|always` works correctly.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15728
Closes #15842

4 months ago Fix the FreeBSD userspace build (#15716)
Mark Johnston [Wed, 27 Dec 2023 20:17:53 +0000 (15:17 -0500)]
Fix the FreeBSD userspace build (#15716)

- Mark some parameters to zpool_power*() as unused.
- Add a stub zpool_disk_wait().

Fixes: a9520e6e5 ("zpool: Add slot power control, print power status")
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>

4 months ago zpool: Add slot power control, print power status
Tony Hutter [Thu, 21 Dec 2023 18:53:16 +0000 (10:53 -0800)]
zpool: Add slot power control, print power status

Add `zpool` flags to control the slot power to drives.  This assumes
your SAS or NVMe enclosure supports slot power control via sysfs.

The new `--power` flag is added to `zpool offline|online|clear`:

    zpool offline --power <pool> <device>    Turn off device slot power
    zpool online --power <pool> <device>     Turn on device slot power
    zpool clear --power <pool> [device]      Turn on device slot power

If the ZPOOL_AUTO_POWER_ON_SLOT env var is set, then the '--power'
option is automatically implied for `zpool online` and `zpool clear`
and does not need to be passed.
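
The rule for when '--power' applies can be sketched as below. This is
a hedged illustration rather than the zpool implementation:
power_option_active() is a hypothetical helper, and it assumes that
any set value of ZPOOL_AUTO_POWER_ON_SLOT (even an empty one) counts
as "set". The caller is expected to pass the result of
getenv("ZPOOL_AUTO_POWER_ON_SLOT").

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return nonzero if the --power behaviour applies to this invocation.
 * power_flag is nonzero when --power was given on the command line;
 * env_value is NULL when the environment variable is unset.
 */
int
power_option_active(const char *subcmd, int power_flag, const char *env_value)
{
	assert(subcmd != NULL);
	if (power_flag)
		return (1);
	if (env_value == NULL)
		return (0);
	/* The variable only implies --power for 'online' and 'clear'. */
	return (strcmp(subcmd, "online") == 0 ||
	    strcmp(subcmd, "clear") == 0);
}
```

Note that 'zpool offline' is deliberately excluded: per the text, the
variable never causes a slot to be powered off implicitly.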

zpool status also gets a --power option to print the slot power status.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mart Frauenlob <AllKind@fastest.cc>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15662

4 months ago zed: misc vdev_enc_sysfs_path fixes
Tony Hutter [Tue, 7 Nov 2023 17:09:24 +0000 (09:09 -0800)]
zed: misc vdev_enc_sysfs_path fixes

There have been rare cases where the VDEV_ENC_SYSFS_PATH value that zed
gets passed is stale.  To mitigate this, dynamically check the sysfs
path at the time of zed event processing, and use the dynamic value if
possible.  Note that there will be other times when we can not
dynamically detect the sysfs path (like if a disk disappears) and have
to rely on the old value for things like turning on the fault LED.  That
is to say, we can't just blindly use the dynamic path in every case.

Also:
- Add enclosure sysfs entry when running 'zpool add'
- Fix 'slot' and 'enc' zpool.d scripts for nvme

Reviewed-by: Don Brady <dev.fs.zfs@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15462

4 months ago ZTS: Add dirty dnode stress test
Tony Hutter [Mon, 11 Dec 2023 17:59:59 +0000 (09:59 -0800)]
ZTS: Add dirty dnode stress test

Add a test for the dirty dnode SEEK_HOLE/SEEK_DATA bug described in
https://github.com/openzfs/zfs/issues/15526

The bug was fixed in https://github.com/openzfs/zfs/pull/15571 and
was backported to 2.2.2 and 2.1.14.  This test case is just to
make sure it does not come back.

seekflood.c originally written by Rob Norris.

Reviewed-by: Graham Perrin <grahamperrin@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15608

4 months ago Linux 6.8 compat: handle mnt_idmap user_namespace change
Rob Norris [Tue, 23 Jan 2024 10:14:06 +0000 (21:14 +1100)]
Linux 6.8 compat: handle mnt_idmap user_namespace change

struct mnt_idmap no longer has a struct user_namespace within it. Work
around this by creating a temporary with the copy of the map we need
taken from the idmap.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805

4 months ago Linux 6.8 compat: fix inode permission tests
Rob Norris [Tue, 23 Jan 2024 06:43:20 +0000 (17:43 +1100)]
Linux 6.8 compat: fix inode permission tests

The name inode_permission is now defined in the kernel. Rename ours to
test_permission, in line with most of our other tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805

4 months ago Linux 6.8 compat: replace MAX_ORDER define
Rob Norris [Tue, 23 Jan 2024 05:41:05 +0000 (16:41 +1100)]
Linux 6.8 compat: replace MAX_ORDER define

MAX_ORDER has been renamed to MAX_PAGE_ORDER. Rather than just
redefining it, instead define our own name and set it consistently from
the start.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805

4 months ago Linux 6.8 compat: implement strlcpy fallback
Rob Norris [Tue, 23 Jan 2024 05:34:49 +0000 (16:34 +1100)]
Linux 6.8 compat: implement strlcpy fallback

Linux has removed strlcpy in favour of strscpy. This implements a
fallback implementation of strlcpy for this case.
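
For reference, the conventional strlcpy semantics that such a fallback
has to provide can be sketched as follows. This is a userspace
illustration, not the code added by this commit; sketch_strlcpy is a
hypothetical name chosen to avoid clashing with any libc strlcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst, which has room for `size` bytes, always
 * NUL-terminating the result when size > 0.  Returns strlen(src), so a
 * return value >= size tells the caller that truncation happened.
 */
size_t
sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t srclen;

	assert(src != NULL);
	srclen = strlen(src);
	if (size > 0) {
		size_t n = (srclen < size - 1) ? srclen : size - 1;

		assert(dst != NULL);
		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return (srclen);
}
```

Copying "hello" into a 4-byte buffer stores "hel" and returns 5; the
caller detects truncation by comparing the result against the buffer
size.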

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805

4 months ago Linux 6.8 compat: update for new bdev access functions
Rob Norris [Tue, 23 Jan 2024 04:42:57 +0000 (15:42 +1100)]
Linux 6.8 compat: update for new bdev access functions

blkdev_get_by_path() and blkdev_put() have been replaced by
bdev_open_by_path() and bdev_release(), which return a "handle" object
with the bdev object itself inside.

This adds detection for the new functions, and macros to handle the old
and new forms consistently.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805

4 months ago Linux 6.8 compat: make test functions static
Rob Norris [Mon, 22 Jan 2024 23:50:53 +0000 (10:50 +1100)]
Linux 6.8 compat: make test functions static

The kernel is now being compiled with -Wmissing-prototypes. Most of our
test stub functions had no prototype, and failed to compile. Since they
don't need to be visible anywhere else, just make them all static.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805

4 months ago Linux 6.7 compat: META
Brian Behlendorf [Mon, 29 Jan 2024 19:35:43 +0000 (11:35 -0800)]
Linux 6.7 compat: META

Update the META file to reflect compatibility with the 6.7 kernel.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15833

4 months ago Don't assert mg_initialized due to device addition race
Paul Dagnelie [Mon, 29 Jan 2024 18:36:42 +0000 (10:36 -0800)]
Don't assert mg_initialized due to device addition race

During device removal stress tests, we noticed that we were tripping
the assertion that mg_initialized was true. After investigation, it was
determined that the mg in question was the embedded log metaslab
group for a newly added vdev; the normal mg had been initialized (by
metaslab_sync_reassess, via vdev_sync_done). However, because the spa
config alloc lock is not held as writer across both calls to
metaslab_sync_reassess, it is possible for an allocation to happen
between the two metaslab_groups being initialized. Because the metaslab
code doesn't check the group in question, just the vdev's main mg, it
is possible to get past the initial check in vdev_allocatable and
later fail due to the assertion.

We simply remove the assertions. We could also consider locking the
ALLOC lock around the reassess calls in vdev_sync_done, but that risks
deadlocks. We could check the actual target mg in vdev_allocatable,
but that risks racing with a passivation that comes in after that
check but before the assertion. We still won't be able to actually
allocate from the metaslab group if no metaslabs are ready, so this
change shouldn't break anything.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15818

4 months ago Update man pages to time(1) from time(2)
Chris Davidson [Mon, 29 Jan 2024 17:44:08 +0000 (12:44 -0500)]
Update man pages to time(1) from time(2)

zpool-iostat.8: Updated time(2) -> time(1) to align to manual page
zpool-list.8: Updated time(2) -> time(1) to align to manual page
zpool-status.8: Updated time(2) -> time(1) to align to manual page
zpool-wait.8: Updated time(2) -> time(1) to align to manual page

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christopher Davidson <christopher.davidson@gmail.com>
Closes #15823

4 months ago ZTS: Allow longer run time for zdb_args_pos
Brian Behlendorf [Mon, 29 Jan 2024 17:41:26 +0000 (09:41 -0800)]
ZTS: Allow longer run time for zdb_args_pos

The zdb_args_pos test may take slightly longer than 600 seconds to run
on some of the CI builders.  To prevent this from causing failures allow
up to 1200 seconds for tests in this group.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15826

4 months ago Move nodes into correct subgraphs
Andrew Innes [Mon, 29 Jan 2024 17:16:02 +0000 (01:16 +0800)]
Move nodes into correct subgraphs

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Andrew Innes <andrew.c12@gmail.com>
Closes #15828

4 months ago zpool wait: print timestamp before the header
Rob N [Fri, 26 Jan 2024 22:41:31 +0000 (09:41 +1100)]
zpool wait: print timestamp before the header

list, status and iostat all display the -T timestamp before the header,
but wait showed it after. Make it be like the others.

Reported-by: Kyle Evans <kevans@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15825

4 months ago Update vdev devid and physpath if changed between imports
Ameer Hamza [Fri, 26 Jan 2024 22:24:35 +0000 (03:24 +0500)]
Update vdev devid and physpath if changed between imports

If devid or physpath for a vdev changes between imports, ensure it is
updated to the new value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15816

4 months ago ZTS: Update deprecated Github Action version numbers
Tino Reichardt [Fri, 26 Jan 2024 22:22:26 +0000 (23:22 +0100)]
ZTS: Update deprecated Github Action version numbers

GitHub Actions is transitioning from Node 16 to Node 20.

So we need to update these:
- actions/checkout@v3 -> v4
- actions/download-artifact@v3 -> v4
- actions/upload-artifact@v3 -> v4 and some minor changes

Also update the documentation of the testing workflow.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Andrew Innes <andrew.c12@gmail.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #15820

4 months ago Switch to CodeQL to detect prohibited function use
Richard Yao [Fri, 26 Jan 2024 22:11:33 +0000 (17:11 -0500)]
Switch to CodeQL to detect prohibited function use

The LLVM/Clang developers pointed out that using the CPP to detect use
of functions that our QA policies prohibit risks invoking undefined
behavior. To resolve this, we configure CodeQL to detect forbidden
function usage.

Note that cpp in the context of CodeQL refers to C/C++, rather than the
C PreProcessor, which C++ also uses. It really should have been written
cxx, but that ship sailed a long time ago. This misuse of the term cpp
is retained in the CodeQL configuration for consistency with upstream
CodeQL.

As a side benefit, verbose make no longer is a wall of text showing a
bunch of CPP macros, which can make debugging slightly easier.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #15819
Closes #14134

4 months ago ZTS: Apply small changes for speeding up the tests
Tino Reichardt [Fri, 26 Jan 2024 21:36:59 +0000 (22:36 +0100)]
ZTS: Apply small changes for speeding up the tests

The GitHub Actions runner got some new hardware metrics.  We should use
the provided empty disk, which is now pre-mounted at /mnt.

Disk1: 89GiB -> rootfs + bootfs with ~80MB/s -> don't care
Disk2: 64GiB -> /mnt with 420MB/s -> new testing ssd

This commit mounts the new disk at /var/tmp, which should hopefully
speed up our testing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Andrew Innes <andrew.c12@gmail.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #15811

4 months ago FreeBSD: Fix bootstrapping tools under Linux/musl
Val Packett [Fri, 19 Jan 2024 21:01:26 +0000 (18:01 -0300)]
FreeBSD: Fix bootstrapping tools under Linux/musl

musl libc has deprecated LFS64 aliases, so bootstrapping FreeBSD tools
under musl distros has been failing with stat64 errors.

Apply the aliases under non-glibc Linux to fix this problem.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Val Packett <val@packett.cool>
Closes #15780

4 months ago linux spl: fix typo in top comment of spl-condvar.c
Tino Reichardt [Wed, 17 Jan 2024 17:05:12 +0000 (18:05 +0100)]
linux spl: fix typo in top comment of spl-condvar.c

Credential Implementation -> Condition Variables Implementation

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #15782

4 months ago Make sure all necessary RPM path macros are defined
Lalufu [Tue, 16 Jan 2024 21:32:59 +0000 (22:32 +0100)]
Make sure all necessary RPM path macros are defined

When building (s)rpm files through the Makefile, a directory structure
is created in /tmp to hold the various files.

In case the user running the command has overridden some of the RPM path
settings through their user profile (for example in `~/.rpmmacros`),
these paths do not line up with the configuration, and the build fails.

Make sure all paths used are properly defined.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ralf Ertzinger <ralf@skytale.net>
Closes #15756

4 months ago Make spl_kmem_cache size check consistent
youzhongyang [Tue, 16 Jan 2024 21:30:58 +0000 (16:30 -0500)]
Make spl_kmem_cache size check consistent

On Linux x86_64, a kmem cache can have a size of up to 4M;
however, increasing spl_kmem_cache_slab_limit can lead
to a crash due to inconsistent size checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #15757

4 months ago Add path handling for aux vdevs in `label_path`
Ameer Hamza [Thu, 4 Jan 2024 14:35:04 +0000 (19:35 +0500)]
Add path handling for aux vdevs in `label_path`

If an AUX vdev is added using its UUID, importing the pool falls back
to opening the AUX vdev by disk name instead of UUID, due to the
absence of path information for AUX vdevs. Since the AUX label now has
path information, this PR adds path handling for it in `label_path`.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15737

4 months ago Extend aux label to add path information
Ameer Hamza [Thu, 4 Jan 2024 14:32:53 +0000 (19:32 +0500)]
Extend aux label to add path information

Pool import logic uses vdev paths, so it makes sense to add path
information on AUX vdev as well.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15737

4 months ago fix: Uber block label not always found for aux vdevs
Ameer Hamza [Thu, 4 Jan 2024 14:02:50 +0000 (19:02 +0500)]
fix: Uber block label not always found for aux vdevs

When a spare or l2cache (aux) vdev is added during pool creation,
spa->spa_uberblock is not dumped until that point. Subsequently,
the aux label is never synchronized after its initial creation,
resulting in the uberblock label remaining undumped. The uberblock
is crucial for lib_blkid in identifying the ZFS partition type. To
address this issue, we now sync the uberblock label once if it was
not dumped initially.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15737

4 months ago Fix "out of memory" error
Brian Behlendorf [Fri, 12 Jan 2024 20:35:29 +0000 (12:35 -0800)]
Fix "out of memory" error

Drop the no_memory() call from zpool_in_use() when reading the
label fails and instead return the error to the caller.  This
prevents a misleading "internal error: out of memory" error
when the label can't be read.  This will result in is_spare()
returning B_FALSE instead of aborting, which is already safely
handled.

Furthermore, on Linux it's possible for EREMOTEIO to be returned
by an NVMe device if the device has been low-level formatted
and not rescanned.  In this case we want to fall back to the
legacy scanning method and read any of the labels we can.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #13538
Closes #15747

4 months ago fix: preserve linux kmod signature in zfs-kmod rpm spec
Benjamin Sherman [Fri, 12 Jan 2024 20:33:41 +0000 (14:33 -0600)]
fix: preserve linux kmod signature in zfs-kmod rpm spec

This change provides rpm spec macros to sign the zfs and spl kmods as
the final step after the %install scriptlet. This is needed since the
find-debuginfo.sh script strips out debug symbols plus signatures.

Kernel module signing only occurs when the required files are present
as typically required in the Linux source tree:
- certs/signing_key.pem
- certs/signing_key.x509

The method for overriding the default __spec_install_post macro is
inspired by (and largely copied from) the Fedora kernel.spec.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Benjamin Sherman <benjamin@holyarmy.org>
Closes #15744

4 months ago fix(mount): do not truncate shares not zfs mount
Stefan Lendl [Fri, 12 Jan 2024 20:05:11 +0000 (21:05 +0100)]
fix(mount): do not truncate shares not zfs mount

When running zfs share -a, resetting exports.d/zfs.exports makes
sense to get a clean state.
Truncating was also done by zfs mount, which would not populate the
file again.
Add a test to verify shares persist after mount -a.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Stefan Lendl <s.lendl@proxmox.com>
Closes #15607
Closes #15660

4 months ago Fix a potential use-after-free in zfs_setsecattr()
Mark Johnston [Tue, 9 Jan 2024 23:57:09 +0000 (18:57 -0500)]
Fix a potential use-after-free in zfs_setsecattr()

In general, VOPs must not load the "z_log" field until having called
zfs_enter_verify_zp().

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #15752

4 months ago Linux: Defer loading the object set in zfs_setattr()
Mark Johnston [Tue, 9 Jan 2024 15:57:29 +0000 (10:57 -0500)]
Linux: Defer loading the object set in zfs_setattr()

We need to wait until after having done a zfs_enter() to load some
fields from the zfsvfs structure.  Otherwise a use-after-free is
possible in the face of a concurrent rollback.

Other functions in this file are careful to avoid this bug; I believe
this is the only instance.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #15752

4 months ago Make zdb -R scale less poorly
Rich Ercolani [Fri, 12 Jan 2024 19:55:17 +0000 (14:55 -0500)]
Make zdb -R scale less poorly

zdb -R with :d tries to use gzip decompression 9 times per size.
There's absolutely no reason for that, they're all the same
decompressor.
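
One generic way to avoid retrying the same decompressor, sketched
under assumptions: entries that share a function pointer with an
earlier entry are skipped. This is not necessarily how the zdb change
is written; codec_t, try_decompressors() and the two stand-in
decompressors are hypothetical and exist only to make the example
self-contained.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*decompress_fn)(const void *src, void *dst, size_t slen,
    size_t dlen);

/* Hypothetical table entry: several names may share one function. */
typedef struct {
	const char *name;
	decompress_fn decompress;
} codec_t;

/* Stand-in decompressors for the sketch; both always report failure. */
int
fake_gzip_decompress(const void *src, void *dst, size_t slen, size_t dlen)
{
	(void) src, (void) dst, (void) slen, (void) dlen;
	return (-1);
}

int
fake_lz4_decompress(const void *src, void *dst, size_t slen, size_t dlen)
{
	(void) src, (void) dst, (void) slen, (void) dlen;
	return (-1);
}

/*
 * Try each distinct decompressor once, stopping at the first success
 * (return value 0).  Returns the number of attempts actually made.
 */
size_t
try_decompressors(const codec_t *codecs, size_t ncodecs, const void *src,
    void *dst, size_t slen, size_t dlen)
{
	size_t attempts = 0;

	assert(codecs != NULL);
	for (size_t i = 0; i < ncodecs; i++) {
		int seen = 0;

		/* Skip entries whose function was already tried. */
		for (size_t j = 0; j < i && !seen; j++)
			seen = (codecs[j].decompress == codecs[i].decompress);
		if (seen)
			continue;
		attempts++;
		if (codecs[i].decompress(src, dst, slen, dlen) == 0)
			break;
	}
	return (attempts);
}
```

With nine gzip-level entries and one lz4 entry in the table, the
sketch makes two attempts per size instead of ten.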

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #15726

4 months ago Stop wasting time on malloc in snprintf_zstd_header
Rich Ercolani [Fri, 12 Jan 2024 20:17:26 +0000 (15:17 -0500)]
Stop wasting time on malloc in snprintf_zstd_header

Profiling zdb -vvvvv on datasets with a lot of zstd blocks, we find
ourselves spending quite a lot of time on malloc/free, because we
allocate a 16M abd each call, and never free it, so we're leaking
16M per call as well.

This seems sub-optimal. So let's just keep the buffer around and
reuse it.
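
The "keep the buffer around" idea can be sketched in plain C as
below. This is an illustration only: the real code works with an abd,
whereas scratch_buffer() is a hypothetical helper built on realloc(),
and it is not thread-safe.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Return a scratch buffer of at least `size` bytes.  The buffer lives in
 * static storage and is reused by later calls; it only ever grows and is
 * released when the process exits.
 */
void *
scratch_buffer(size_t size)
{
	static void *buf = NULL;
	static size_t cap = 0;

	assert(size > 0);
	if (size > cap) {
		void *grown = realloc(buf, size);

		if (grown == NULL)
			return (NULL);	/* the old buffer remains valid */
		buf = grown;
		cap = size;
	}
	return (buf);
}
```

Repeated calls with the same or a smaller size return the same
pointer, so the per-call allocation and the per-call leak both go
away.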

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #15721

4 months ago Fix file descriptor leak on pool import.
Pawel Jakub Dawidek [Tue, 23 Jan 2024 23:03:48 +0000 (15:03 -0800)]
Fix file descriptor leak on pool import.

Descriptor leak can be easily reproduced by doing:

# zpool import tank
# sysctl kern.openfiles
# zpool export tank; zpool import tank
# sysctl kern.openfiles

We were leaking four file descriptors on every import.

Similar leak most likely existed when using file-based VDEVs.

External-issue: https://reviews.freebsd.org/D43529
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #15630

4 months ago ZTS: Apply zfs_bclone_enabled to bclone tests
Brian Behlendorf [Tue, 23 Jan 2024 00:15:03 +0000 (16:15 -0800)]
ZTS: Apply zfs_bclone_enabled to bclone tests

If block cloning is disabled by default then enable it when running
the bclone tests.  Follow up to #15529.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15796

4 months ago fix: variable type with zfs-tests/cmd/clonefile.c
Tino Reichardt [Wed, 17 Jan 2024 17:06:14 +0000 (18:06 +0100)]
fix: variable type with zfs-tests/cmd/clonefile.c

Compiling on arm64 freebsd-13.2 and arm64 almalinux-8 currently
produces this error:

```
  CC       tests/zfs-tests/cmd/clonefile.o
tests/zfs-tests/cmd/clonefile.c:166:43: error: result of comparison of \
constant -1 with expression of type 'char' is always true \
[-Werror,-Wtautological-constant-out-of-range-compare]
        while ((c = getopt(argc, argv, "crfdq")) != -1) {
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^  ~~
1 error generated.
gmake[2]: *** [Makefile:8675: tests/zfs-tests/cmd/clonefile.o] Error 1
```

Fix: use correct variable type `int`.
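
The pattern behind the fix, as a small sketch: getopt() returns an
int, and on platforms where plain char is unsigned (arm64 among them)
a char can never compare equal to -1, so the loop condition is always
true. count_options() is a hypothetical helper, not code from
clonefile.c; only the option string "crfdq" is taken from the error
message above.

```c
#define	_POSIX_C_SOURCE 200809L	/* getopt() under strict -std= modes */
#include <assert.h>
#include <unistd.h>

/*
 * Count the recognised single-letter options in argv.  The variable
 * that receives getopt()'s result must be an int so that the -1
 * end-of-options value survives the assignment.
 */
int
count_options(int argc, char *const argv[])
{
	int c;		/* int, not char */
	int seen = 0;

	assert(argc >= 1 && argv != NULL);
	optind = 1;	/* restart scanning on every call */
	opterr = 0;	/* keep getopt() quiet about unknown options */
	while ((c = getopt(argc, argv, "crfdq")) != -1) {
		if (c != '?')
			seen++;
	}
	return (seen);
}
```

For an argument vector such as { "clonefile", "-c", "-q" } the sketch
returns 2.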

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #15783

4 months ago Fix cloning into mmaped and cached file.
Pawel Jakub Dawidek [Wed, 17 Jan 2024 16:51:07 +0000 (08:51 -0800)]
Fix cloning into mmaped and cached file.

If the destination file is mmaped and the mmaped region was already
read, so it is cached, we need to update mmaped pages after successful
clone using update_pages().

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Pointed out by: Ka Ho Ng <khng@freebsd.org>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #15772

4 months ago ZTS: Test for clone, mmap and write for block cloning
Umer Saleem [Tue, 16 Jan 2024 21:15:10 +0000 (02:15 +0500)]
ZTS: Test for clone, mmap and write for block cloning

For block cloning, if we mmap the cloned file and write from the
map into the file, it triggers a panic in dbuf_redirty() on Linux.

The same scenario causes data corruption on FreeBSD. Both these
issues are fixed under PR#15656 and PR#15665.

It would be good to add a test for this scenario in ZTS. The test
program and issue were produced by @robn.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15717

4 months ago Enable block_cloning tests on FreeBSD
Brian Behlendorf [Fri, 12 Jan 2024 19:57:13 +0000 (11:57 -0800)]
Enable block_cloning tests on FreeBSD

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #15749

4 months ago Block cloning tests.
Pawel Jakub Dawidek [Tue, 26 Dec 2023 20:01:53 +0000 (12:01 -0800)]
Block cloning tests.

The tests mostly focus on testing various corner cases.
The tests take a long time to run, so for the common.run runfile
we randomly select a hundred tests.
To run all the bclone tests, bclone.run runfile should be used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #15631

4 months ago Test LWB buffer overflow for block cloning
Umer Saleem [Fri, 15 Dec 2023 22:18:27 +0000 (03:18 +0500)]
Test LWB buffer overflow for block cloning

PR#15634 removes the 128K into 2x68K LWB split optimization, since it
was found to cause an LWB buffer overflow while trying to write a 128KB
TX_CLONE_RANGE record with 1022 block pointers into a 68KB buffer,
with a multiple-VDEV ZIL.

This commit adds a test for this particular scenario by writing a
maximum-sized TX_CLONE_RANGE record with 1022 block pointers into a
68KB buffer, with two SLOG devices.
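
The sizes quoted above can be checked with a little arithmetic. The
sketch assumes a 128-byte on-disk block pointer and ignores the record
header; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define	SKETCH_BLKPTR_SIZE	128u		/* assumed bytes per block pointer */
#define	SKETCH_LWB_BUF_SIZE	(68u * 1024u)	/* the 68KB buffer from the text */

/* Bytes taken by nbps block pointers alone, without the record header. */
size_t
clone_record_bp_bytes(size_t nbps)
{
	return (nbps * SKETCH_BLKPTR_SIZE);
}

/* Nonzero if that payload fits into a single 68KB buffer. */
int
fits_in_lwb(size_t nbps)
{
	size_t bytes = clone_record_bp_bytes(nbps);

	assert(bytes / SKETCH_BLKPTR_SIZE == nbps);	/* no overflow */
	return (bytes <= SKETCH_LWB_BUF_SIZE);
}
```

Under that assumption, 1022 block pointers occupy 130816 bytes, just
under 128KB (131072 bytes) and well above the 69632 bytes of a 68KB
buffer, which is why such a record cannot be written into one.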

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15672

4 months ago ZTS: Add test cases for block cloning replay
Ameer Hamza [Thu, 30 Nov 2023 20:14:56 +0000 (01:14 +0500)]
ZTS: Add test cases for block cloning replay

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15614

4 months ago ZTS: block_cloning: Use numeric sort for get_same_blocks
Ameer Hamza [Wed, 6 Dec 2023 20:18:43 +0000 (01:18 +0500)]
ZTS: block_cloning: Use numeric sort for get_same_blocks

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15614

4 months agoAutotrim High Load Average Fix
Kevin Jin [Wed, 17 Jan 2024 17:03:58 +0000 (12:03 -0500)]
Autotrim High Load Average Fix

Switch from cv_wait() to cv_wait_idle() in vdev_autotrim_wait_kick(),
which should mitigate the high load average while waiting.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes #15781

4 months agoLinux 6.7 compat: zfs_setattr fix atime update
Rob N [Tue, 16 Jan 2024 22:01:17 +0000 (09:01 +1100)]
Linux 6.7 compat: zfs_setattr fix atime update

In db4fc559c I messed up and changed this bit of code to set the inode
atime to an uninitialised value, when actually it was just supposed to
load the atime from the inode to be stored in the SA. This changes it
to what it should have been.

Ensure times change by the right amount. Previously, we only checked
if the times changed at all, which missed a bug where the atime was
being set to an undefined value.

Now ensure the times change by two seconds (or thereabouts), ensuring
we catch cases where we set the time to something bonkers
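
A minimal sketch of the "changed by about two seconds" check, not the
actual ZTS test. The helper name changed_by, the one-second slack and
the base timestamp are assumptions made for illustration.

```python
import os
import tempfile


def changed_by(before_ns, after_ns, expected_s=2.0, slack_s=1.0):
    """Return True if the timestamp moved by roughly expected_s seconds."""
    delta_s = (after_ns - before_ns) / 1e9
    return abs(delta_s - expected_s) <= slack_s


with tempfile.NamedTemporaryFile() as f:
    base_ns = 1_700_000_000 * 10**9  # arbitrary whole-second timestamp
    os.utime(f.name, ns=(base_ns, base_ns))
    before = os.stat(f.name)

    later_ns = base_ns + 2 * 10**9  # advance both times by two seconds
    os.utime(f.name, ns=(later_ns, later_ns))
    after = os.stat(f.name)

    atime_ok = changed_by(before.st_atime_ns, after.st_atime_ns)
    mtime_ok = changed_by(before.st_mtime_ns, after.st_mtime_ns)
    # A garbage value (here 0) differs from the old time, so a plain
    # "did it change?" test would pass, but this check rejects it.
    bogus_ok = changed_by(before.st_atime_ns, 0)
```

Checking the size of the change, rather than only that a change
happened, is what lets the test reject an uninitialised atime.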

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15762
Closes #15773

4 months agocompat: workaround for GPL-only symbols on riscv from Linux 6.2
Shengqi Chen [Wed, 6 Dec 2023 20:37:50 +0000 (04:37 +0800)]
compat: workaround for GPL-only symbols on riscv from Linux 6.2

Since Linux 6.2, the implementation of flush_dcache_page on riscv
references the GPL-only symbol `PageHuge`, breaking the build of zfs.

This patch uses the existing mechanism to override flush_dcache_page,
removing the call to `PageHuge`. According to comments in the kernel,
it is only used to perform a check against HugeTLB pages, which only
exist in userspace. ZFS uses flush_dcache_page only on kernel pages,
thus this patch will not introduce any behaviour change.

See also: torvalds/linux@d33deda, openzfs/zfs@589f59b

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #14974
Closes #15627

4 months agospa: Let spa_taskq_param_get()'s addition of a newline be optional
Mark Johnston [Fri, 29 Dec 2023 17:56:35 +0000 (12:56 -0500)]
spa: Let spa_taskq_param_get()'s addition of a newline be optional

For FreeBSD sysctls, we don't want the extra newline, since the
sysctl(8) utility will format strings appropriately.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reported-by: Peter Holm <pho@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #15719

4 months agospa: Fix FreeBSD sysctl handlers
Mark Johnston [Fri, 29 Dec 2023 15:22:58 +0000 (10:22 -0500)]
spa: Fix FreeBSD sysctl handlers

sbuf_cpy() resets the sbuf state, which is wrong for sbufs allocated by
sbuf_new_for_sysctl().  In particular, this code triggers an assertion
failure in sbuf_clear().

Simplify by just using sysctl_handle_string() for both reading and
setting the tunable.

Fixes: 6930ecbb7 ("spa: make read/write queues configurable")
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reported-by: Peter Holm <pho@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #15719

4 months agofreebsd: fix compile for spa_taskq_read/spa_taskq_write params
Rob Norris [Thu, 11 Jan 2024 08:43:38 +0000 (19:43 +1100)]
freebsd: fix compile for spa_taskq_read/spa_taskq_write params

Missed in #15695, backporting #15675.

Signed-off-by: Rob Norris <robn@despairlabs.com>
4 months agoFix livelist assertions for dedup and cloning
Alexander Motin [Tue, 9 Jan 2024 17:48:40 +0000 (12:48 -0500)]
Fix livelist assertions for dedup and cloning

Two block pointers in livelist pointing to the same location may
be caused not only by dedup, but also by block cloning. We should
not assert D bit set in them.

Two block pointers in livelist pointing to the same location may
have different logical birth time in case of dedup or cloning. We
should assert identical physical birth time instead.

Assert identical physical block size between pointers in addition
to checksum, since that is what checksums are calculated on.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15732

4 months agoImprove block sizes checks during cloning
Alexander Motin [Tue, 9 Jan 2024 17:46:43 +0000 (12:46 -0500)]
Improve block sizes checks during cloning

- Fail if source block is smaller than destination.  We can only
  grow blocks, not shrink them.
- Fail if we do not have full znode range lock.  In that case grow
  is not even called.  We should improve zfs_rangelock_cb() somehow
  to know when cloning needs to grow the block size unlike write.
- Fail if we tried to resize, but failed.  There are many reasons
  for it to fail that we can not predict at this level, so be ready
  for them.  Unlike write, that may proceed after growth failure,
  block cloning can't and must return error.

This fixes assertion inside dmu_brt_clone() when it sees different
number of blocks held in destination than it got block pointers.
Builds without ZFS_DEBUG returned EXDEV, so are not affected much.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15724
Closes #15735

4 months agoLinux 6.2 compat: add check for kernel_neon_* availability
Shengqi Chen [Tue, 9 Jan 2024 00:05:24 +0000 (08:05 +0800)]
Linux 6.2 compat: add check for kernel_neon_* availability

This patch adds check for `kernel_neon_*` symbols on arm and arm64
platforms to address the following issues:

1. Linux 6.2+ on arm64 has exported them with `EXPORT_SYMBOL_GPL`, so
   license compatibility must be checked before use.
2. On both arm and arm64, the definitions of these symbols are guarded
   by `CONFIG_KERNEL_MODE_NEON`, but their declarations are still
   present. Checking in configuration phase only leads to MODPOST
   errors (undefined references).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #15711
Closes #14555
Closes: #15401
4 months agoDon't panic on unencrypted block in encrypted dataset
chrisperedun [Thu, 21 Dec 2023 19:12:30 +0000 (14:12 -0500)]
Don't panic on unencrypted block in encrypted dataset

While 763ca47 closes the situation of block cloning creating
unencrypted records in encrypted datasets, existing data still causes
panic on read. Setting zfs_recover bypasses this but at the cost of
potentially ignoring more serious issues.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Peredun <chris.peredun@ixsystems.com>
Closes #15677

4 months agodbuf: Set dr_data when unoverriding after clone
Alexander Motin [Tue, 12 Dec 2023 20:59:24 +0000 (15:59 -0500)]
dbuf: Set dr_data when unoverriding after clone

Block cloning normally creates dirty record without dr_data.  But if
the block is read after cloning, it is moved into DB_CACHED state and
receives the data buffer.  If after that we call dbuf_unoverride()
to convert the dirty record into normal write, we should give it the
data buffer from dbuf and release one.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15654
Closes #15656

4 months agodbuf: Handle arcbuf assignment after block cloning
Alexander Motin [Tue, 12 Dec 2023 20:53:59 +0000 (15:53 -0500)]
dbuf: Handle arcbuf assignment after block cloning

In some cases dbuf_assign_arcbuf() may be called on a block that
was recently cloned.  If it happened in the current TXG we must undo
the block cloning first, since the one dirty record allowed per TXG
can't and shouldn't mean both cloning and overwrite at the same time.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15653

4 months agoDMU: Fix lock leak on dbuf_hold() error
Alexander Motin [Sat, 9 Dec 2023 00:43:39 +0000 (19:43 -0500)]
DMU: Fix lock leak on dbuf_hold() error

dmu_assign_arcbuf_by_dnode() should drop dn_struct_rwlock lock in
case dbuf_hold() failed.  I don't have reproduction for this, but
it looks inconsistent with dmu_buf_hold_noread_by_dnode() and co.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15644

4 months agoBRT: Limit brt_vdev_dump() to only one vdev
Alexander Motin [Wed, 6 Dec 2023 23:37:27 +0000 (18:37 -0500)]
BRT: Limit brt_vdev_dump() to only one vdev

Without this patch on pool of 60 vdevs with ZFS_DEBUG enabled clone
takes much more time than copy, while heavily trashing dbgmsg for
no good reason, repeatedly dumping all vdevs BRTs again and again,
even unmodified ones.

I am generally not sure this dumping is not excessive, but decided
to keep it for now, just restricting its scope to something more
reasonable.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15625

4 months agoZIL: Remove 128K into 2x68K LWB split optimization
Alexander Motin [Wed, 6 Dec 2023 23:02:05 +0000 (18:02 -0500)]
ZIL: Remove 128K into 2x68K LWB split optimization

To improve 128KB block write performance in case of multiple VDEVs
ZIL used to split those writes into two 64KB ones.  Unfortunately it
was found to cause LWB buffer overflow, trying to write a maximum-sized
128KB TX_CLONE_RANGE record with 1022 block pointers into a
68KB buffer, since unlike TX_WRITE ZIL code can't split it.
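
The size mismatch can be checked with simple arithmetic. This is only
a sketch: the 128-byte block pointer size is an assumption not stated
in this message, and the variable names are made up for illustration.

```python
BLKPTR_SIZE = 128          # assumed size of one block pointer, in bytes
NUM_BPS = 1022             # block pointers in a maximum-sized record
RECORD_SIZE = 128 * 1024   # maximum TX_CLONE_RANGE record, in bytes
LWB_BUF_SIZE = 68 * 1024   # buffer size after the split, in bytes

bp_bytes = NUM_BPS * BLKPTR_SIZE        # space taken by block pointers
header_room = RECORD_SIZE - bp_bytes    # what is left for the header
overflow = bp_bytes - LWB_BUF_SIZE      # bytes that do not fit

print(bp_bytes, header_room, overflow)
```

Under that assumption the block pointers alone need 130816 bytes, far
more than a 69632-byte buffer can hold, and the record cannot be split
the way a TX_WRITE can.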

This is a minimally-invasive temporary block cloning fix until the
following more invasive prediction code refactoring.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15634

4 months agozdb: Dump encrypted write and clone ZIL records
Alexander Motin [Wed, 6 Dec 2023 20:39:12 +0000 (15:39 -0500)]
zdb: Dump encrypted write and clone ZIL records

Block pointers are not encrypted in TX_WRITE and TX_CLONE_RANGE
records, so we can dump them, that may be useful for debugging.

Related to #15543.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15629

4 months agoAllow block cloning across encrypted datasets
oromenahar [Tue, 5 Dec 2023 19:03:48 +0000 (20:03 +0100)]
Allow block cloning across encrypted datasets

When two datasets share the same master encryption key, it is safe
to clone encrypted blocks. Currently only snapshots and clones
of a dataset share with it the same encryption key.

Added a test for:
- Clone from encrypted sibling to encrypted sibling with
  non encrypted parent
- Clone from encrypted parent to inherited encrypted child
- Clone from child to sibling with encrypted parent
- Clone from snapshot to the original datasets
- Clone from foreign snapshot to a foreign dataset
- Cloning from non-encrypted to encrypted datasets
- Cloning from encrypted to non-encrypted datasets

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Original-patch-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Signed-off-by: Kay Pedersen <mail@mkwg.de>
Closes #15544

4 months agoZIL: Do not clone blocks from the future
Alexander Motin [Tue, 5 Dec 2023 18:58:11 +0000 (13:58 -0500)]
ZIL: Do not clone blocks from the future

ZIL claim can not handle block pointers cloned from the future,
since they are not yet allocated at that point.  It may happen
either if the block was just written when it was cloned, or if
the pool was frozen or somehow else rewound on import.

Handle it from two sides: prevent cloning of blocks with physical
birth time from not yet synced or frozen TXG, and abort ZIL claim
if we still detect such blocks due to rewind or something else.

While there, assert that any cloned blocks we claim are really
allocated by calling metaslab_check_free().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15617

4 months agoZIL: Remove TX_CLONE_RANGE replay for ZVOLs.
Alexander Motin [Fri, 1 Dec 2023 23:23:20 +0000 (18:23 -0500)]
ZIL: Remove TX_CLONE_RANGE replay for ZVOLs.

zil_claim_clone_range() takes references on cloned blocks before ZIL
replay.  Later zil_free_clone_range() drops them after replay or on
dataset destroy.  The total balance is neutral.  It means we do not
need to do anything (drop the references) for the not yet implemented
TX_CLONE_RANGE replay for ZVOLs.

This is a logical follow up to #15603.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15612

4 months agoZIO: Add overflow checks for linear buffers
Alexander Motin [Fri, 1 Dec 2023 19:50:10 +0000 (14:50 -0500)]
ZIO: Add overflow checks for linear buffers

Since we use a limited set of kmem caches, quite often we have unused
memory after the end of the buffer.  Put there up to a 512-byte canary
when built with debug to detect buffer overflows at the free time.
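
A rough sketch of the canary idea, not the ZIO code itself. The fill
byte, the function names and the use of a bytearray to stand in for a
kmem cache buffer are all assumptions made for illustration.

```python
CANARY_BYTE = 0xA5   # hypothetical fill pattern
MAX_CANARY = 512     # upper bound on canary length, per the text above


def alloc_with_canary(size, cache_size):
    """Allocate cache_size bytes and fill the unused tail with a canary."""
    buf = bytearray(cache_size)
    canary_len = min(cache_size - size, MAX_CANARY)
    buf[size:size + canary_len] = bytes([CANARY_BYTE]) * canary_len
    return buf, canary_len


def canary_intact(buf, size, canary_len):
    """Check at free time that nothing wrote past the requested size."""
    return all(b == CANARY_BYTE for b in buf[size:size + canary_len])


# A 5000-byte request served from a hypothetical 8192-byte cache.
buf, canary_len = alloc_with_canary(5000, 8192)
ok_before = canary_intact(buf, 5000, canary_len)

buf[5000] = 0  # simulate a one-byte overflow past the requested size
ok_after = canary_intact(buf, 5000, canary_len)
```

The check only costs anything when there is slack between the requested
size and the cache's buffer size, which is why it suits debug builds.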

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15553

4 months agoZIL: Assert record sizes in different places
Alexander Motin [Tue, 28 Nov 2023 21:35:14 +0000 (16:35 -0500)]
ZIL: Assert record sizes in different places

This should make sure we have log written without overflows.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15517

4 months agoL2ARC: Restrict write size to 1/4 of the device
Alexander Motin [Tue, 14 Nov 2023 21:47:57 +0000 (16:47 -0500)]
L2ARC: Restrict write size to 1/4 of the device

PR #15457 exposed weird logic in L2ARC write sizing. If the write size
appeared bigger than the device size, instead of limiting the write it
reset all the system-wide tunables to their defaults.  Aside from being
excessive, it did not actually help with the problem, still allowing an
infinite loop to happen.

This patch removes the tunables reverting logic, but instead limits
L2ARC writes (or at least eviction/trim) to 1/4 of the capacity.
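
The limiting behaviour amounts to a clamp. A minimal sketch, assuming
sizes in bytes; the function name is hypothetical and the real code
may differ in how it rounds or what else it accounts for.

```python
def clamp_l2arc_write(requested, device_size):
    """Limit a write to at most a quarter of the device capacity."""
    return min(requested, device_size // 4)


MIB = 1024 * 1024
small_device = 16 * MIB

# A request larger than a quarter of the device is cut down...
clamped = clamp_l2arc_write(8 * MIB, small_device)
# ...while a modest request passes through unchanged.
unchanged = clamp_l2arc_write(1 * MIB, small_device)
```

Unlike resetting tunables, a clamp leaves the administrator's settings
untouched and still bounds the write on a small device.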

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15519

4 months agoLinux: Reclaim unused spl_kmem_cache_reclaim
Alexander Motin [Fri, 10 Nov 2023 18:34:46 +0000 (13:34 -0500)]
Linux: Reclaim unused spl_kmem_cache_reclaim

It has been unused for 3 years, since #10576.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15507

4 months agoFreeBSD: Optimize large kstat outputs
Alexander Motin [Tue, 7 Nov 2023 19:35:40 +0000 (14:35 -0500)]
FreeBSD: Optimize large kstat outputs

- Use sbuf_new_for_sysctl() to reduce double-buffering on sysctl
output.
- Use much faster sbuf_cat() instead of sbuf_printf("%s").

Together it reduces `sysctl kstat.zfs.misc.dbufs` time from minutes
to seconds, making dbufstat almost usable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15495

4 months agoUpdate the kstat dataset_name when renaming a zvol
Alan Somers [Tue, 7 Nov 2023 19:34:50 +0000 (12:34 -0700)]
Update the kstat dataset_name when renaming a zvol

Add a dataset_kstats_rename function, and call it when renaming
a zvol on FreeBSD and Linux.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes #15482
Closes #15486

4 months agoABD: Be more assertive in iterators
Alexander Motin [Tue, 24 Oct 2023 21:33:58 +0000 (17:33 -0400)]
ABD: Be more assertive in iterators

Once we have verified the ABDs and asserted the sizes we should never
see premature ABD ends.  Assert that and remove extra branches
from production builds.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15428

5 months agospa: make read/write queues configurable
Rob Norris [Wed, 25 Oct 2023 04:11:37 +0000 (15:11 +1100)]
spa: make read/write queues configurable

We are finding that as customers get larger and faster machines
(hundreds of cores, large NVMe-backed pools) they keep hitting
relatively low performance ceilings. Our profiling work almost always
finds that they're running into bottlenecks on the SPA IO taskqs.
Unfortunately there's often little we can advise at that point, because
there's very few ways to change behaviour without patching.

This commit adds two load-time parameters `zio_taskq_read` and
`zio_taskq_write` that can configure the READ and WRITE IO taskqs
directly.

This achieves two goals: it gives operators (and those that support
them) a way to tune things without requiring a custom build of OpenZFS,
which is often not possible, and it lets us easily try different config
variations in a variety of environments to inform the development of
better defaults for these kind of systems.

Because tuning the IO taskqs really requires a fairly deep understanding
of how IO in ZFS works, and generally isn't needed without a pretty
serious workload and an ability to identify bottlenecks, only minimal
documentation is provided. It's expected that anyone using this is going
to have the source code there as well.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
5 months agoLinux 6.5 compat: check BLK_OPEN_EXCL is defined
Brian Behlendorf [Thu, 21 Dec 2023 19:22:56 +0000 (11:22 -0800)]
Linux 6.5 compat: check BLK_OPEN_EXCL is defined

On some systems we already have blkdev_get_by_path() with 4 args
but still the old FMODE_EXCL and not BLK_OPEN_EXCL defined.
The vdev_bdev_mode() function was added to handle this case
but there was no generic way to specify exclusive access.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15692

5 months agoZTS: Disable io_uring test on CentOS 9
Brian Behlendorf [Sat, 9 Dec 2023 01:31:31 +0000 (17:31 -0800)]
ZTS: Disable io_uring test on CentOS 9

The io_uring test fails on CentOS 9 with the following fio error.
Disable the test for the benefit of the CI until this can be fully
investigated.  This basic test passes as expected on newer kernels.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15636

5 months agolinux 6.7 compat: rework shrinker setup for heap allocations
Rob Norris [Sat, 16 Dec 2023 13:36:21 +0000 (00:36 +1100)]
linux 6.7 compat: rework shrinker setup for heap allocations

6.7 changes the shrinker API such that shrinkers must be allocated
dynamically by the kernel. To accommodate this, this commit reworks
spl_register_shrinker() to do something similar against earlier kernels.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://github.com/sponsors/robn
5 months agolinux 6.7 compat: handle superblock shrinker member change
Rob Norris [Sat, 16 Dec 2023 06:39:07 +0000 (17:39 +1100)]
linux 6.7 compat: handle superblock shrinker member change

In 6.7 the superblock shrinker member s_shrink has changed from being an
embedded struct to a pointer. Detect this, and don't take a reference if
it already is one.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://github.com/sponsors/robn
5 months agolinux 6.7 compat: use inode atime/mtime accessors
Rob Norris [Sat, 16 Dec 2023 11:31:32 +0000 (22:31 +1100)]
linux 6.7 compat: use inode atime/mtime accessors

6.6 made i_ctime inaccessible; 6.7 has done the same for i_atime and
i_mtime. This extends the method used for ctime in b37f29341 to atime
and mtime as well.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://github.com/sponsors/robn
5 months agolinux 6.7 compat: simplify current_time() check
Rob Norris [Sat, 16 Dec 2023 07:01:45 +0000 (18:01 +1100)]
linux 6.7 compat: simplify current_time() check

6.7 changed the names of the time members in struct inode, so we can't
assign back to it because we don't know its name. In practice this
doesn't matter though - if we're missing current_time(), then we must be
on <4.9, and we know our fallback will need to return timespec.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://github.com/sponsors/robn
6 months agoTag zfs-2.2.2 zfs-2.2.2
Tony Hutter [Tue, 28 Nov 2023 22:53:42 +0000 (14:53 -0800)]
Tag zfs-2.2.2

META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
6 months agoFreeBSD: Fix ZFS so that snapshots under .zfs/snapshot are NFS visible
rmacklem [Tue, 28 Nov 2023 00:31:03 +0000 (16:31 -0800)]
FreeBSD: Fix ZFS so that snapshots under .zfs/snapshot are NFS visible

Call vfs_exjail_clone() for mounts created under .zfs/snapshot
to fill in the mnt_exjail field for the mount.  If this is not
done, the snapshots under .zfs/snapshot will not be accessible
over NFS.

This version has the argument name in vfs.h fixed to match that
of the name in spl_vfs.c, although it really does not matter.

External-issue: https://reviews.freebsd.org/D42672
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes #15563

6 months agoZIL: Call brt_pending_add() replaying TX_CLONE_RANGE
Alexander Motin [Wed, 29 Nov 2023 18:51:34 +0000 (13:51 -0500)]
ZIL: Call brt_pending_add() replaying TX_CLONE_RANGE

zil_claim_clone_range() takes references on cloned blocks before ZIL
replay.  Later zil_free_clone_range() drops them after replay or on
dataset destroy.  The total balance is neutral.  It means on actual
replay we must take additional references, which would stay in BRT.

Without this blocks could be freed prematurely when either original
file or its clone are destroyed.  I've observed BRT being emptied
and the feature being deactivated after ZIL replay completion, which
should not have happened.  With the patch I see expected stats.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15603

6 months agozdb: fix printf() length for uint64_t devid
Martin Matuška [Wed, 29 Nov 2023 17:18:30 +0000 (18:18 +0100)]
zdb: fix printf() length for uint64_t devid

Bug introduced in 213d6829673.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Warner Losh <imp@FreeBSD.org>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #15606

6 months agoLinux 6.6 compat: fix configure error with clang (#15558)
Jaron Kent-Dobias [Tue, 28 Nov 2023 19:34:40 +0000 (20:34 +0100)]
Linux 6.6 compat: fix configure error with clang (#15558)

With Linux v6.6.x and clang 16, a configure step fails on a warning that
later results in an error while building, due to 'ts' being
uninitialized. Add a trivial initialization to silence the warning.

Signed-off-by: Jaron Kent-Dobias <jaron@kent-dobias.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
6 months agozfs-dkms: fix shell-init error message
AllKind [Mon, 27 Nov 2023 21:17:48 +0000 (22:17 +0100)]
zfs-dkms: fix shell-init error message

If all zfs dkms modules have been removed, a shell-init error message
may appear, because /var/lib/dkms/zfs no longer exists.
Resolve this by leaving the directory earlier on.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mart Frauenlob <AllKind@fastest.cc>
Closes #15576

6 months agoFreeBSD: Fix the build on FreeBSD 12
Alan Somers [Mon, 27 Nov 2023 20:58:03 +0000 (13:58 -0700)]
FreeBSD: Fix the build on FreeBSD 12

It was broken for several reasons:
* VOP_UNLOCK lost an argument in 13.0.  So OpenZFS should be using
  VOP_UNLOCK1, but a few direct calls to VOP_UNLOCK snuck in.
* The location of the zlib header moved in 13.0 and 12.1.  We can drop
  support for building on 12.0, which is EoL.
* knlist_init lost an argument in 13.0.  OpenZFS change 9d0887402ba
  assumed 13.0 or later.
* FreeBSD 13.0 added copy_file_range, and OpenZFS change 67a1b037915
  assumed 13.0 or later.

Sponsored-by: Axcient
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #15551

6 months agodmu_buf_will_clone: fix race in transition back to NOFILL
Rob N [Tue, 28 Nov 2023 17:53:04 +0000 (04:53 +1100)]
dmu_buf_will_clone: fix race in transition back to NOFILL

Previously, dmu_buf_will_clone() would roll back any dirty record, but
would not clean out the modified data nor reset the state before
releasing the lock. That leaves the last-written data in db_data, but
the dbuf in the wrong state.

This is eventually corrected when the dbuf state is made NOFILL, and
dbuf_noread() called (which clears out the old data), but at this point
it's too late, because the lock was already dropped with that invalid
state.

Any caller acquiring the lock before the call into
dmu_buf_will_not_fill() can find what appears to be a clean, readable
buffer, and would take the wrong state from it: it should be getting the
data from the cloned block, not from earlier (unwritten) dirty data.

Even after the state was switched to NOFILL, the old data was still not
cleaned out until dbuf_noread(), which is another gap for a caller to
take the lock and read the wrong data.

This commit fixes all this by properly cleaning up the previous state
and then setting the new state before dropping the lock. The
DBUF_VERIFY() calls confirm that the dbuf is in a valid state when the
lock is dropped.

Sponsored-by: Klara, Inc.
Sponsored-By: OpenDrives Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15566
Closes #15526

6 months agozdb: Fix zdb '-O|-r' options with -e/exported zpool
Akash B [Mon, 27 Nov 2023 21:41:58 +0000 (03:11 +0530)]
zdb: Fix zdb '-O|-r' options with -e/exported zpool

zdb with '-e' or exported zpool doesn't work along with
'-O' and '-r' options as we process them before '-e' has
been processed.

Below errors are seen:

~> zdb -e pool-mds65/mdt65 -O oi.9/0x200000009:0x0:0x0
failed to hold dataset 'pool-mds65/mdt65': No such file or directory

~> zdb -e pool-oss0/ost0 -r file1 /tmp/filecopy1 -p.
failed to hold dataset 'pool-oss0/ost0': No such file or directory
zdb: internal error: No such file or directory

We need to make sure to process '-O|-r' options after the
'-e' option has been processed, which imports the pool to
the namespace if it's not in the cachefile.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #15532

6 months agozdb: show BRT statistics and dump its contents
Rob Norris [Sat, 18 Nov 2023 10:33:45 +0000 (21:33 +1100)]
zdb: show BRT statistics and dump its contents

Same idea as the dedup stats, but for block cloning.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15541

6 months agobrt: lift internal definitions into _impl header
Rob Norris [Sat, 18 Nov 2023 10:32:16 +0000 (21:32 +1100)]
brt: lift internal definitions into _impl header

So that zdb (and others!) can get at the BRT on-disk structures.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15541

6 months agoZTS: Fix zfs_load-key failures on F39
Tony Hutter [Mon, 27 Nov 2023 21:24:37 +0000 (13:24 -0800)]
ZTS: Fix zfs_load-key failures on F39

The zfs_load-key tests were failing on F39 due to their use of the
deprecated ssl.wrap_socket function.  This commit updates the test to
instead use ssl.SSLContext() as described in
https://stackoverflow.com/a/65194957.
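
A short sketch of the replacement pattern, not the test's actual code.
The function name and its optional arguments are assumptions; only the
ssl module calls themselves are standard.

```python
import ssl


def make_server_context(certfile=None, keyfile=None):
    """Build a server-side TLS context instead of using ssl.wrap_socket."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    if certfile is not None:
        # Paths are supplied by the caller; none are assumed here.
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx


ctx = make_server_context()
# With a certificate loaded, a listening socket would then be wrapped
# with: ctx.wrap_socket(raw_sock, server_side=True)
```

The context object carries the protocol and certificate settings, so
the per-socket call only needs to say which side it is.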

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15534
Closes #15550

6 months agoZIL: Do not encrypt block pointers in lr_clone_range_t
Alexander Motin [Sun, 19 Nov 2023 01:01:03 +0000 (20:01 -0500)]
ZIL: Do not encrypt block pointers in lr_clone_range_t

In case of crash cloned blocks need to be claimed on pool import.
It is only possible if they (lr_bps) and their count (lr_nbps) are
not encrypted but only authenticated, similar to block pointer in
lr_write_t.  Few other fields can be and are still encrypted.

This should fix panic on ZIL claim after crash when block cloning
is actively used.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Reviewed-by: Sean Eric Fagan <sef@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15543
Closes #15513

6 months agodnode_is_dirty: check dnode and its data for dirtiness
Rob N [Tue, 28 Nov 2023 17:15:48 +0000 (04:15 +1100)]
dnode_is_dirty: check dnode and its data for dirtiness

Over its history the dirty dnode test has been changed between
checking for a dnode being on `os_dirty_dnodes` (`dn_dirty_link`) and
`dn_dirty_record`.

  de198f2d9 Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency
  2531ce372 Revert "Report holes when there are only metadata changes"
  ec4f9b8f3 Report holes when there are only metadata changes
  454365bba Fix dirty check in dmu_offset_next()
  66aca2473 SEEK_HOLE should not block on txg_wait_synced()

Also illumos/illumos-gate@c543ec060d illumos/illumos-gate@2bcf0248e9

It turns out both are actually required.

In the case of appending data to a newly created file, the dnode proper
is dirtied (at least to change the blocksize) and dirty records are
added.  Thus, a single logical operation is represented by separate
dirty indicators, and must not be separated.

The incorrect dirty check becomes a problem when the first block of a
file is being appended to while another process is calling lseek to skip
holes. There is a small window where the dnode part is undirtied while
there are still dirty records. In this case, `lseek(fd, 0, SEEK_DATA)`
would not know that the file is dirty, and would go to
`dnode_next_offset()`. Since the object has no data blocks yet, it
returns `ESRCH`, indicating no data found, which results in `ENXIO`
being returned to `lseek()`'s caller.

Since coreutils 9.2, `cp` performs sparse copies by default, that is, it
uses `SEEK_DATA` and `SEEK_HOLE` against the source file and attempts to
replicate the holes in the target. When it hits the bug, its initial
search for data fails, and it goes on to call `fallocate()` to create a
hole over the entire destination file.
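
A small sketch of the probe a sparse-aware copy makes first, assuming a
platform where Python exposes os.SEEK_DATA (Linux does). The helper
name is hypothetical, and this illustrates the system call only, not
the coreutils implementation.

```python
import errno
import os
import tempfile


def first_data_offset(fd):
    """Return the offset of the first data in fd, or None if there is none."""
    try:
        return os.lseek(fd, 0, os.SEEK_DATA)
    except OSError as e:
        if e.errno == errno.ENXIO:
            # No data at or after offset 0: the file looks like one hole.
            return None
        raise


with tempfile.TemporaryFile() as f:
    empty_result = first_data_offset(f.fileno())  # nothing written yet
    f.write(b"hello")
    f.flush()
    data_result = first_data_offset(f.fileno())   # data starts at 0
```

A caller that gets None for a file which really has unsynced data will
treat the whole file as a hole, which is the failure described above.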

This has come up more recently as users upgrade their systems, getting
OpenZFS 2.2 as well as a newer coreutils. However, this problem has been
reproduced against 2.1, as well as on FreeBSD 13 and 14.

This change simply updates the dirty check to check both types of dirty.
If there's anything dirty at all, we immediately go to the "wait for
sync" stage. It doesn't really matter after that; both changes are on
disk, so the dirty fields should be correct.
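
The logic of checking both indicators can be illustrated with a toy
model. This is not the OpenZFS code: `ToyDnode` and its fields are
hypothetical stand-ins for `dn_dirty_link` and `dn_dirty_record`, used
only to show why either check alone can miss a dirty state.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDnode:
    # Hypothetical stand-ins for the two indicators named in the message.
    on_dirty_list: bool = False  # models dn_dirty_link being active
    dirty_records: list = field(default_factory=list)  # models dn_dirty_record


def dirty_by_link_only(dn):
    """Earlier style of check: only the dnode's dirty-list membership."""
    return dn.on_dirty_list


def dirty_by_records_only(dn):
    """Earlier style of check: only the pending dirty records."""
    return bool(dn.dirty_records)


def dirty_by_either(dn):
    """Combined check: anything dirty at all counts as dirty."""
    return dn.on_dirty_list or bool(dn.dirty_records)


# The window described in the message: the dnode proper has been
# undirtied, but dirty records are still pending.
window = ToyDnode(on_dirty_list=False, dirty_records=["append to block 0"])
```

In the `window` state the link-only check reports clean while the
combined check reports dirty. The mirror-image state (on the dirty list
with no records) defeats the records-only check in the same way.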

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15571
Closes #15526

6 months agoRevert "Tune zio buffer caches and their alignments"
Brian Behlendorf [Mon, 27 Nov 2023 21:49:20 +0000 (13:49 -0800)]
Revert "Tune zio buffer caches and their alignments"

This reverts commit bd7a02c251d8c119937e847d5161b512913667e6, which
can trigger an existing, rarely hit bio alignment issue on Linux.
The reverted change is good in itself, but the underlying issue it
exposes needs to be resolved before it can be re-applied.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #15533

6 months agoTag zfs-2.2.1 zfs-2.2.1
Tony Hutter [Mon, 13 Nov 2023 19:38:57 +0000 (11:38 -0800)]
Tag zfs-2.2.1

META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>

6 months agoZTS: Fix 'could not unmount datasets' on Alma 9
Tony Hutter [Sat, 18 Nov 2023 21:07:06 +0000 (13:07 -0800)]
ZTS: Fix 'could not unmount datasets' on Alma 9

Many tests fail on AlmaLinux 9 because ZTS cannot destroy the pool
during cleanup. This is due to $PWD being set to '.' instead of the
expected full path. This patch sets $PWD to the full path.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>

6 months agozfs-2.2.1: Disable block cloning by default
Tony Hutter [Thu, 16 Nov 2023 19:42:19 +0000 (11:42 -0800)]
zfs-2.2.1: Disable block cloning by default

Disable block cloning by default to mitigate possible data corruption
(see #15529 and #15526).

Signed-off-by: Tony Hutter <hutter2@llnl.gov>

6 months agoAdd a tunable to disable BRT support.
Rich Ercolani [Thu, 16 Nov 2023 19:35:22 +0000 (14:35 -0500)]
Add a tunable to disable BRT support.

Copy the disable parameter that FreeBSD implemented, and extend it to
work on Linux as well, until we're sure this is stable.
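
Checking the tunable's current value on Linux might look like the sketch
below. This log does not name the parameter; the name
`zfs_bclone_enabled` and the sysfs directory used here are assumptions
taken from the OpenZFS module parameter documentation and should be
verified against the installed version. The helper `read_zfs_param` is
hypothetical.

```python
from pathlib import Path


def read_zfs_param(name, root="/sys/module/zfs/parameters"):
    """Return a ZFS module parameter as a stripped string, or None if it
    cannot be read (module not loaded, parameter absent, no permission)."""
    try:
        return (Path(root) / name).read_text().strip()
    except (FileNotFoundError, PermissionError):
        return None


# Assumed parameter name, not taken from this log.
bclone_enabled = read_zfs_param("zfs_bclone_enabled")
```

A result of None means only that the value could not be read; it says
nothing about whether block cloning is enabled.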

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #15529