git.proxmox.com Git - mirror_zfs.git/log
19 months agoRevert "Reduce dbuf_find() lock contention"
Brian Behlendorf [Mon, 19 Sep 2022 18:07:15 +0000 (11:07 -0700)]
Revert "Reduce dbuf_find() lock contention"

This reverts commit 34dbc618f50cfcd392f90af80c140398c38cbcd1.  While this
change resolved the lock contention observed for certain workloads, it
inadvertently reduced the maximum hash inserts/removes per second.  This
appears to be due to the slightly higher acquisition cost of a rwlock vs
a mutex.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
19 months agoCleanup: Change 1 used in bitshifts to 1ULL
Richard Yao [Thu, 22 Sep 2022 18:28:33 +0000 (14:28 -0400)]
Cleanup: Change 1 used in bitshifts to 1ULL

Coverity complains about this. It is not a bug as long as we never shift
by more than 31, but it is not terrible to change the constants from 1
to 1ULL as clean up.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13914
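
As a minimal sketch of why the constant's type matters (the helper name `bit64()` is hypothetical and not part of the commit): a plain `1` has type `int`, so shifting it by 32 or more is undefined where `int` is 32 bits wide, while `1ULL` is at least 64 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a 64-bit mask with only bit `shift` set. */
static inline uint64_t
bit64(unsigned int shift)
{
	assert(shift < 64);	/* shifting by >= the type width is undefined */
	/* 1ULL is at least 64 bits wide, so shifts up to 63 are well defined. */
	return (1ULL << shift);
}
```

With the `int` constant, only shifts below 31 are safe to rely on; with `1ULL` the same expression keeps working if a larger shift is ever introduced.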

19 months agoRetire ZFS_TEARDOWN_TRY_ENTER_READ
Mateusz Guzik [Tue, 20 Sep 2022 22:34:41 +0000 (00:34 +0200)]
Retire ZFS_TEARDOWN_TRY_ENTER_READ

There were never any users and it so happens the operation is not even
supported by rrm locks -- the macros were wrong for Linux and FreeBSD
when not using its RMS locks.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13906

19 months agoAdd membar_sync
Mateusz Guzik [Tue, 20 Sep 2022 22:32:44 +0000 (00:32 +0200)]
Add membar_sync

Provides the missing full barrier variant to the membar primitive set.

While not used right now, this is probably going to change down the
road.

Name taken from Solaris, to follow the existing routines.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13907

19 months agoFix minor issues in namespace delegation support
youzhongyang [Tue, 20 Sep 2022 22:25:21 +0000 (18:25 -0400)]
Fix minor issues in namespace delegation support

get_user_ns() is only done once for each namespace, so put_user_ns()
should be done once too.

Fix two typos in user_namespace/user_namespace_002.ksh and
user_namespace/user_namespace_003.ksh.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #13918

19 months agoFreeBSD: handle V_PCATCH
Mateusz Guzik [Tue, 20 Sep 2022 22:22:32 +0000 (00:22 +0200)]
FreeBSD: handle V_PCATCH

See https://cgit.FreeBSD.org/src/commit/?id=a75d1ddd74312f5dd79bc1e965f7077679659f2e

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13910

19 months agoFreeBSD: catch up to 1400068
Mateusz Guzik [Tue, 20 Sep 2022 22:21:30 +0000 (00:21 +0200)]
FreeBSD: catch up to 1400068

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13909

19 months agoCall va_end() before return in zpool_standard_error_fmt()
Richard Yao [Tue, 20 Sep 2022 22:20:56 +0000 (18:20 -0400)]
Call va_end() before return in zpool_standard_error_fmt()

Commit ecd6cf800b63704be73fb264c3f5b6e0dafc068d by marks in OpenSolaris
at Tue Jun 26 07:44:24 2007 -0700 introduced a bug where we fail to call
`va_end()` before returning.

The man page for va_start() says:

"Each invocation of va_start() must be matched by a corresponding
invocation of va_end() in the same function."

Coverity complained about this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13904
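
The pairing rule quoted above can be sketched as follows. This is an illustrative formatter with a hypothetical name (`format_msg()`), not the libzfs function; it only shows that every return path after `va_start()` calls `va_end()`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical formatter, not the libzfs function: returns -1 on a bad
 * buffer, otherwise the vsnprintf() result.
 */
static int
format_msg(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(fmt != NULL);
	va_start(ap, fmt);
	if (buf == NULL || len == 0) {
		va_end(ap);	/* the early return still pairs with va_start() */
		return (-1);
	}
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return (n);
}
```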

19 months agoFix potential NULL pointer dereference in zfsdle_vdev_online()
Richard Yao [Tue, 20 Sep 2022 22:20:04 +0000 (18:20 -0400)]
Fix potential NULL pointer dereference in zfsdle_vdev_online()

Coverity complained about this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13903

19 months agoDelay ZFS_PROP_SHARESMB property to handle it for encrypted raw receive
Ameer Hamza [Tue, 20 Sep 2022 22:19:05 +0000 (03:19 +0500)]
Delay ZFS_PROP_SHARESMB property to handle it for encrypted raw receive

For encrypted raw receive, objset creation is delayed until a call to
dmu_recv_stream(). ZFS_PROP_SHARESMB property requires objset to be
populated when calling zpl_earlier_version(). To correctly handle the
ZFS_PROP_SHARESMB property for encrypted raw receive, this change
delays setting the property.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13878

19 months agoFreeBSD: Cleanup zfs_readdir()
Richard Yao [Tue, 20 Sep 2022 21:50:16 +0000 (17:50 -0400)]
FreeBSD: Cleanup zfs_readdir()

The FreeBSD project's coverity scans found dead code in `zfs_readdir()`.
Also, the comment above `zfs_readdir()` is out of date.

I fixed the comment and deleted all of the dead code, plus additional
dead code that was found upon review.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13924

19 months agoFreeBSD: Fix uninitialized pointer read in spa_import_rootpool()
Richard Yao [Tue, 20 Sep 2022 21:43:03 +0000 (17:43 -0400)]
FreeBSD: Fix uninitialized pointer read in spa_import_rootpool()

The FreeBSD project's coverity scans found this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13923

19 months agoCleanup: Remove unused uu_pname code
Richard Yao [Tue, 20 Sep 2022 00:33:52 +0000 (20:33 -0400)]
Cleanup: Remove unused uu_pname code

Coverity caught a possible NULL pointer dereference in dead code. We can
delete it all.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13900

19 months agoFix usage of zed_log_msg() and zfs_panic_recover()
Richard Yao [Tue, 20 Sep 2022 00:32:18 +0000 (20:32 -0400)]
Fix usage of zed_log_msg() and zfs_panic_recover()

Coverity complained about the format specifiers not matching variables.
In one case, the variable is a constant, so we fix it. In another, we
were missing an argument (about which coverity also complained).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13888

19 months agoLinux: Fix use-after-free in zfsvfs_create()
Richard Yao [Tue, 20 Sep 2022 00:30:58 +0000 (20:30 -0400)]
Linux: Fix use-after-free in zfsvfs_create()

Coverity reported that we pass a pointer to zfsvfs to
`dmu_objset_disown()` after freeing zfsvfs in zfsvfs_create_impl() after
a failure in zfsvfs_init().

We have nearly identical duplicate versions of this code for FreeBSD and
Linux, but interestingly, the FreeBSD version of this code differs in
such a way that it does not suffer from this bug. We remove the
difference from the FreeBSD version to fix this bug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13883

19 months agoFreeBSD: fix static module build broken in 7bb707ffa
Martin Matuška [Tue, 20 Sep 2022 00:21:45 +0000 (02:21 +0200)]
FreeBSD: fix static module build broken in 7bb707ffa

param_set_arc_free_target(SYSCTL_HANDLER_ARGS) and
param_set_arc_no_grow_shift(SYSCTL_HANDLER_ARGS) defined in
sysctl_os.c must be made available to arc_os.c.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #13915

19 months agoFreeBSD: stop passing LK_INTERLOCK to VOP_LOCK
Mateusz Guzik [Tue, 20 Sep 2022 00:17:27 +0000 (02:17 +0200)]
FreeBSD: stop passing LK_INTERLOCK to VOP_LOCK

There is an ongoing effort to eliminate this feature.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13908

20 months agoAdd PPC cpu feature tests for FreeBSD and Linux
Tino Reichardt [Wed, 7 Sep 2022 18:33:59 +0000 (20:33 +0200)]
Add PPC cpu feature tests for FreeBSD and Linux

Add needed cpu feature tests for powerpc architecture.

Overview:
zfs_altivec_available() - needed by RAID-Z
zfs_vsx_available()     - needed by BLAKE3
zfs_isa207_available()  - needed by SHA2

Part 1 - Userspace
- use getauxval() for Linux and elf_aux_info() for FreeBSD
- directly including <sys/auxv.h> fails with double definitions
- so we define the needed functions and definitions ourselves

Part 2 - Kernel space FreeBSD
- use exported cpu_features of <powerpc/cpu.h>

Part 3 - Kernel space Linux
- use cpu_has_feature() function of <asm/cpufeature.h>

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13725

20 months agoAdd zfs_blake3_impl to zfs.4
Tino Reichardt [Sat, 3 Sep 2022 08:40:29 +0000 (10:40 +0200)]
Add zfs_blake3_impl to zfs.4

The zfs module parameter zfs_blake3_impl did not get a manual page entry
when BLAKE3 was added to OpenZFS. This commit adds the required notes
about the parameter to zfs.4

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Ryan Moeller <ryan@freqlabs.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13725

20 months agoFix BLAKE3 tuneable and module loading on Linux and FreeBSD
Tino Reichardt [Wed, 3 Aug 2022 16:36:41 +0000 (18:36 +0200)]
Fix BLAKE3 tuneable and module loading on Linux and FreeBSD

Apply options to BLAKE3 similar to those used for zfs_fletcher_4_impl.

The zfs module parameter on Linux changes from icp_blake3_impl to
zfs_blake3_impl.

You can check and set it on Linux via sysfs like this:
```
[bash]# cat /sys/module/zfs/parameters/zfs_blake3_impl
cycle [fastest] generic sse2 sse41 avx2

[bash]# echo sse2 > /sys/module/zfs/parameters/zfs_blake3_impl
[bash]# cat /sys/module/zfs/parameters/zfs_blake3_impl
cycle fastest generic [sse2] sse41 avx2
```

The modprobe module parameters may also be used now:
```
[bash]# modprobe zfs zfs_blake3_impl=sse41
[bash]# cat /sys/module/zfs/parameters/zfs_blake3_impl
cycle fastest generic sse2 [sse41] avx2
```

On FreeBSD the BLAKE3 implementation can be set via sysctl like this:
```
[bsd]# sysctl vfs.zfs.blake3_impl
vfs.zfs.blake3_impl: cycle [fastest] generic sse2 sse41 avx2
[bsd]# sysctl vfs.zfs.blake3_impl=sse2
vfs.zfs.blake3_impl: cycle [fastest] generic sse2 sse41 avx2 \
  -> cycle fastest generic [sse2] sse41 avx2
```

This commit also changes some BLAKE3 internals:
- blake3_impl_ops_t was renamed to blake3_ops_t
- all functions are named blake3_impl_NAME() now

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13725

20 months agozfs_enter rework followup
Brian Behlendorf [Fri, 16 Sep 2022 21:22:52 +0000 (14:22 -0700)]
zfs_enter rework followup

The zpl_fadvise() function was recently added and was not included
in the initial patch.  Update it accordingly.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13831

20 months agoFix null pointer dereferences in PAM
Richard Yao [Fri, 16 Sep 2022 21:02:54 +0000 (17:02 -0400)]
Fix null pointer dereferences in PAM

Coverity caught these.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13889

20 months agoHandle ECKSUM as new EZFS_CKSUM ‒ "insufficient replicas"
наб [Fri, 16 Sep 2022 20:59:25 +0000 (22:59 +0200)]
Handle ECKSUM as new EZFS_CKSUM ‒ "insufficient replicas"

Add a meaningful error message for ECKSUM to common error messages.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #6805
Closes #13808
Closes #13898

20 months agozfs recv hangs if max recordsize is less than received recordsize
Ameer Hamza [Fri, 16 Sep 2022 20:52:25 +0000 (01:52 +0500)]
zfs recv hangs if max recordsize is less than received recordsize

- Some optimizations for bqueue enqueue/dequeue.
- Added a fix to prevent deadlock when both bqueue_enqueue_impl()
and bqueue_dequeue() wait for a signal to be triggered.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13855

20 months agoUpdate coverity model
Richard Yao [Fri, 16 Sep 2022 20:45:15 +0000 (16:45 -0400)]
Update coverity model

`uu_panic()` needs to be modelled and the definition of `vpanic()` from
the original coverity model was missing
`__coverity_format_string_sink__()`.

We also model `libspl_assertf()` as part of an attempt to eliminate
false positives.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13901

20 months agoFix unable to export zpool without nfs-utils
Chunwei Chen [Fri, 16 Sep 2022 20:43:26 +0000 (13:43 -0700)]
Fix unable to export zpool without nfs-utils

Don't return error in nfs_disable_share when nfs is not available, since
it wouldn't have been able to share in the first place.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #13534
Closes #13800

20 months agozfs_enter rework
Chunwei Chen [Fri, 16 Sep 2022 20:36:47 +0000 (13:36 -0700)]
zfs_enter rework

Replace ZFS_ENTER and ZFS_VERIFY_ZP, which have hidden returns, with
functions that return an error code. We want to do this because hidden
returns are not obvious and have caused fail path unwinding to be
missed in some places.

This patch changes the common, linux, and freebsd parts. Also fixes
fail path unwinding in zfs_fsync, zpl_fsync, zpl_xattr_{list,get,set}, and
zfs_lookup().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #13831
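
The difference between a macro with a hidden return and a function that returns an error code can be sketched like this. All names here (`struct fs`, `FS_ENTER_HIDDEN`, `fs_enter()`, `fs_do_work()`) are hypothetical stand-ins, not the real ZFS definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a mounted filesystem's state. */
struct fs {
	bool unmounted;
	int holders;
};

/* Old style: the macro hides a return statement from the caller. */
#define	FS_ENTER_HIDDEN(fs)			\
	do {					\
		if ((fs)->unmounted)		\
			return (EIO);		\
		(fs)->holders++;		\
	} while (0)

/* New style: a function that reports failure through its return value. */
static int
fs_enter(struct fs *fs)
{
	if (fs->unmounted)
		return (EIO);
	fs->holders++;
	return (0);
}

static void
fs_exit(struct fs *fs)
{
	assert(fs->holders > 0);
	fs->holders--;
}

/* Caller: the error path is visible, so unwinding can be written out. */
static int
fs_do_work(struct fs *fs, int *scratch_in_use)
{
	int error;

	*scratch_in_use = 1;		/* resource acquired before entering */
	if ((error = fs_enter(fs)) != 0) {
		*scratch_in_use = 0;	/* explicit fail path unwinding */
		return (error);
	}
	/* ... work would happen here ... */
	fs_exit(fs);
	*scratch_in_use = 0;
	return (0);
}
```

Had `fs_do_work()` used `FS_ENTER_HIDDEN()`, the macro would return before `scratch_in_use` was reset, which is the kind of missed unwinding the commit message describes.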

20 months agoAdd zfs_btree_verify_intensity kernel module parameter
Richard Yao [Thu, 15 Sep 2022 23:22:33 +0000 (19:22 -0400)]
Add zfs_btree_verify_intensity kernel module parameter

I see a few issues in the issue tracker that might be aided by being
able to turn this on. We have no module parameter for it, so I would
like to add one.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13874

20 months agoFix incorrect size given to bqueue_enqueue() call in dmu_redact.c
Richard Yao [Thu, 15 Sep 2022 23:21:21 +0000 (19:21 -0400)]
Fix incorrect size given to bqueue_enqueue() call in dmu_redact.c

We pass sizeof (struct redact_record *) rather than sizeof (struct
redact_record). Passing the pointer size is wrong.

Coverity caught this in two places.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13885
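
A short sketch of the mistake, using a hypothetical `struct record` in place of `struct redact_record` (whose layout is not shown here): the size of a pointer to the structure is unrelated to the size of the structure itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type standing in for struct redact_record. */
struct record {
	uint64_t start;
	uint64_t end;
	uint64_t flags;
};

/* Size of a pointer to the record: what the buggy calls passed. */
static size_t
pointer_size(void)
{
	return (sizeof (struct record *));
}

/* Size of the record itself: what the queue accounting needs. */
static size_t
record_size(const struct record *rec)
{
	/* sizeof does not evaluate its operand, so rec is never dereferenced. */
	return (sizeof (*rec));
}
```

Writing `sizeof (*rec)` rather than naming the type keeps the size correct even if the variable's type changes later.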

20 months agoUse correct mdoc macros for arguments
Mateusz Piotrowski [Thu, 15 Sep 2022 21:22:00 +0000 (23:22 +0200)]
Use correct mdoc macros for arguments

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Closes #13890

20 months agoFix assertions in crypto reference helpers
Richard Yao [Thu, 15 Sep 2022 20:24:00 +0000 (16:24 -0400)]
Fix assertions in crypto reference helpers

The assertions are racy and the use of `membar_exit()` did nothing to
fix that.

The helpers use atomic functions, so we cleverly get values from the
atomics that we can use to ensure that the assertions operate on the
correct values.

We also use `membar_producer()` prior to decrementing reference counts
so that operations that happened prior to a decrement to 0 will be
guaranteed to happen before the decrement on architectures that reorder
atomics.

This also slightly improves performance by eliminating unnecessary
reads, although I doubt it would be measurable in any benchmark.

Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13880
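
The idea of asserting on the value returned by the atomic operation, rather than re-reading the counter, can be sketched with C11 atomics. This is an analogy under stated assumptions: `struct obj`, `obj_hold()` and `obj_rele()` are hypothetical, and `memory_order_release` stands in for the `membar_producer()` call described above; it is not the icp code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct obj {
	atomic_uint refcnt;
};

static void
obj_hold(struct obj *o)
{
	/* The returned old value feeds the assertion: no second, racy read. */
	unsigned int old = atomic_fetch_add_explicit(&o->refcnt, 1,
	    memory_order_relaxed);
	assert(old != 0);	/* the object must already be referenced */
	(void) old;
}

/* Returns true when the caller dropped the last reference. */
static bool
obj_rele(struct obj *o)
{
	/* Release ordering: earlier writes are ordered before the decrement. */
	unsigned int old = atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_release);
	assert(old != 0);	/* underflow check on the value actually seen */
	return (old == 1);
}
```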

20 months agoZTS: parameter expansion in zfs_unshare_006_pos
John Wren Kennedy [Thu, 15 Sep 2022 20:14:35 +0000 (14:14 -0600)]
ZTS: parameter expansion in zfs_unshare_006_pos

zfs_unshare_006 checks to see if a dataset still has an active SMB
share after doing an NFS unshare -a. The test could fail because the
check for the SMB share does not expect dashes in a dataset name to be
converted to underscores as pathname delimiters are.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #13893

20 months agoAdd coverity model to repository
Richard Yao [Thu, 15 Sep 2022 18:50:19 +0000 (14:50 -0400)]
Add coverity model to repository

Other projects such as the python project include their coverity models
in their repositories. This provides transparency, which is beneficial
in open source projects. Therefore, it is a good idea to include the
coverity model in our repository too.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13884

20 months agoFix use-after-free bugs in icp code
Richard Yao [Thu, 15 Sep 2022 18:46:42 +0000 (14:46 -0400)]
Fix use-after-free bugs in icp code

These were reported by Coverity as "Read from pointer after free" bugs.
Presumably, it did not report it as a use-after-free bug because it does
not understand the inline assembly that implements the atomic
instruction.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13881

20 months agoCI: revert `--with-config=dist` to hotfix Ubuntu 20.04
George Melikov [Wed, 14 Sep 2022 23:26:57 +0000 (02:26 +0300)]
CI: revert `--with-config=dist` to hotfix Ubuntu 20.04

Recently Github action runners started to fail on kmod build.
Revert --with-config=dist from ./configure section of github
runners to stabilize CI for now.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #13894

20 months agoFreeBSD: Fix integer conversion for vnlru_free{,_vfsops}()
Richard Yao [Wed, 14 Sep 2022 19:51:55 +0000 (15:51 -0400)]
FreeBSD: Fix integer conversion for vnlru_free{,_vfsops}()

When reviewing #13875, I noticed that our FreeBSD code has an issue
where it converts from `int64_t` to `int` when calling
`vnlru_free{,_vfsops}()`. The result is that if the int64_t is `1 <<
36`, the int will be 0, since the low bits are 0. Even when some low
bits are set, a value such as `((1 << 36) + 1)` would truncate to 1,
which is wrong.

There is protection against this on 32-bit platforms, but on 64-bit
platforms, there is no check to protect us, so we add a check.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13882
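
The truncation described above, and one possible guard against it, can be sketched as follows. The commit message does not say what its check does, so the clamping helper `clamp_to_int()` is an assumption for illustration only.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: clamp a 64-bit count into the range of int. */
static int
clamp_to_int(int64_t count)
{
	if (count > INT_MAX)
		return (INT_MAX);
	if (count < INT_MIN)
		return (INT_MIN);
	return ((int)count);	/* now guaranteed to fit */
}
```

Without such a guard, converting `INT64_C(1) << 36` to a 32-bit `int` typically keeps only the low 32 bits, which are all zero.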

20 months agoAdd assertion to dsl_dataset_set_compression_sync
Richard Yao [Wed, 14 Sep 2022 19:50:03 +0000 (15:50 -0400)]
Add assertion to dsl_dataset_set_compression_sync

Coverity pointed out that if we somehow receive SPA_FEATURE_NONE, we
will use a negative number as an array index. A defensive assertion
seems appropriate.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13872
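
A defensive assertion of this kind might look like the sketch below. The enum and array are hypothetical; the only assumption carried over from the message is that the "none" value is negative and would otherwise be used as an array index.

```c
#include <assert.h>

/* Hypothetical feature enum; the "none" value is negative. */
enum feature {
	FEATURE_NONE = -1,
	FEATURE_A,
	FEATURE_B,
	FEATURE_COUNT
};

static int feature_refcount[FEATURE_COUNT];

static void
feature_incr(enum feature f)
{
	/* Catch the negative sentinel before it is used as an index. */
	assert(f != FEATURE_NONE);
	assert(f < FEATURE_COUNT);
	feature_refcount[f]++;
}
```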

20 months agoFix theoretical "use-after-free" in dbuf_prefetch_indirect_done()
Richard Yao [Wed, 14 Sep 2022 00:58:29 +0000 (20:58 -0400)]
Fix theoretical "use-after-free" in dbuf_prefetch_indirect_done()

Coverity complains about a "use-after-free" bug in
`dbuf_prefetch_indirect_done()` because we use a pointer value after
freeing its buffer. The pointer is used for refcounting in ARC (as the
reference holder). There is a theoretical situation where the pointer
would be reused in a way that causes the refcounting to collide, so we
change the order in which we call arc_buf_destroy() and
dbuf_prefetch_fini() to match the rest of the function. This prevents
the theoretical situation from being a possibility.

Also, we have a few return statements with a value, despite this being a
void function. We clean those up while we are making changes here.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13869

20 months agoRemove incorrect free() in zfs_get_pci_slots_sys_path()
Richard Yao [Wed, 14 Sep 2022 00:00:53 +0000 (20:00 -0400)]
Remove incorrect free() in zfs_get_pci_slots_sys_path()

Coverity found this. We attempted to free tmp, which is a pointer to a
string that should be freed by the caller.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13864

20 months agoCleanup: Make memory barrier definitions consistent across kernels
Richard Yao [Tue, 13 Sep 2022 23:59:33 +0000 (19:59 -0400)]
Cleanup: Make memory barrier definitions consistent across kernels

We inherited membar_consumer() and membar_producer() from OpenSolaris,
but we had replaced membar_consumer() with Linux's smp_rmb() in
zfs_ioctl.c. The FreeBSD SPL consequently implemented a shim for the
Linux-only smp_rmb().

We reinstate membar_consumer() in platform independent code and fix the
FreeBSD SPL to implement membar_consumer() in a way analogous to Linux.

Reviewed-by: Konstantin Belousov <kib@FreeBSD.org>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13843

20 months agoFix memory leak in ztest
Richard Yao [Tue, 13 Sep 2022 23:53:21 +0000 (19:53 -0400)]
Fix memory leak in ztest

Coverity found this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13863

20 months agoCleanup dead spa_boot code
Richard Yao [Tue, 13 Sep 2022 23:40:10 +0000 (19:40 -0400)]
Cleanup dead spa_boot code

Unused code detected by coverity.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13868

20 months agozpool_load_compat() should create strings of length ZFS_MAXPROPLEN
Richard Yao [Mon, 12 Sep 2022 19:54:43 +0000 (15:54 -0400)]
zpool_load_compat() should create strings of length ZFS_MAXPROPLEN

Otherwise, `strlcat()` can overflow them.

Coverity found this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13866

20 months agovdev_draid_lookup_map() should not iterate outside draid_maps
Richard Yao [Mon, 12 Sep 2022 19:51:17 +0000 (15:51 -0400)]
vdev_draid_lookup_map() should not iterate outside draid_maps

Coverity reported this as an out-of-bounds read.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13865

20 months agoFix file descriptor handling in zdb_copy_object()
Richard Yao [Mon, 12 Sep 2022 19:34:10 +0000 (15:34 -0400)]
Fix file descriptor handling in zdb_copy_object()

Coverity found a file descriptor leak. Eyeballing it showed that we had
no handling for the `open()` call failing either. We can address both of
these at once.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13862
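
Both problems, the unchecked `open()` and the leaked descriptor, can be addressed with a pattern like the one below. `write_file()` is a hypothetical helper, not the zdb code; it is only meant to show one error check and one `close()` per successful `open()`.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path; returns 0 or an errno. */
static int
write_file(const char *path, const void *buf, size_t len)
{
	int fd, err = 0;
	ssize_t n;

	assert(path != NULL);
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return (errno);		/* open() failed: nothing to close */

	n = write(fd, buf, len);
	if (n == -1)
		err = errno;
	else if ((size_t)n != len)
		err = EIO;		/* short write */

	/* Close on every path so the descriptor is never leaked. */
	if (close(fd) != 0 && err == 0)
		err = errno;
	return (err);
}
```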

20 months agoFix use-after-free in btree code
Richard Yao [Mon, 12 Sep 2022 18:22:15 +0000 (14:22 -0400)]
Fix use-after-free in btree code

Coverity static analysis found these.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #10989
Closes #13861

20 months agoCleanup: Use OpenSolaris functions to call scheduler
Richard Yao [Mon, 12 Sep 2022 16:55:37 +0000 (12:55 -0400)]
Cleanup: Use OpenSolaris functions to call scheduler

In our codebase, `cond_resched()` and `schedule()` are Linux kernel
functions that have replaced the OpenSolaris `kpreempt()` functions to
such an extent that `kpreempt()` in zfs_context.h was
broken. Nobody noticed because we did not actually use it. The header
had defined `kpreempt()` as `yield()`, which works on OpenSolaris and
Illumos where `sched_yield()` is a wrapper for `yield()`, but that does
not work on any other platform.

The FreeBSD platform specific code implemented shims for these, but the
shim for `schedule()` forced us to wait, which is different than merely
rescheduling to another thread as the original Linux code does, while
the shim for `cond_resched()` had the same definition as its kernel
kpreempt() shim.

After studying this, I have concluded that we should reintroduce the
kpreempt() function in platform independent code with the following
definitions:

- In the Linux kernel:
kpreempt(unused) -> cond_resched()

- In the FreeBSD kernel:
kpreempt(unused) -> kern_yield(PRI_USER)

- In userspace:
kpreempt(unused) -> sched_yield()

In userspace, nothing changes from this cleanup. In the kernels, the
function `fm_fini()` will now call `kern_yield(PRI_USER)` on FreeBSD and
`cond_resched()` on Linux.  This is instead of `pause("schedule", 1)` on
FreeBSD and `schedule()` on Linux. This makes our behavior consistent
across platforms.

Note that Linux's SPL continues to use `cond_resched()` and
`schedule()`.  However, those functions have been removed from both the
FreeBSD code and userspace code.

This should have the benefit of making it slightly easier to port the
code to new platforms by making how things should be mapped less
confusing.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13845
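
The userspace mapping listed above can be sketched as a macro that ignores its argument. The loop around it, `process_items()`, is a hypothetical caller added only to show where a yield point would sit; the macro body here is an illustration, not a copy of zfs_context.h.

```c
#include <assert.h>
#include <sched.h>

/* Userspace mapping from the list above; the argument is ignored. */
#define	kpreempt(unused)	((void) sched_yield())

/*
 * Hypothetical work loop: yield after every `batch` items so other
 * runnable threads get a turn. Returns how many times it yielded.
 */
static unsigned int
process_items(unsigned int nitems, unsigned int batch)
{
	unsigned int yields = 0;

	assert(batch != 0);
	for (unsigned int i = 1; i <= nitems; i++) {
		/* ... per-item work would go here ... */
		if (i % batch == 0) {
			kpreempt(0);
			yields++;
		}
	}
	return (yields);
}
```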

20 months agoMake zfs-share service resilient to stale exports
Don Brady [Fri, 9 Sep 2022 17:54:16 +0000 (11:54 -0600)]
Make zfs-share service resilient to stale exports

There are a few cases where stale entries in /etc/exports.d/zfs.exports
will cause the nfs-server service to fail when starting up.

Since the nfs-server startup consumes /etc/exports.d/zfs.exports, the
zfs-share service (which rebuilds the list of zfs exports) should run
before the nfs-server service.

To make the zfs-share service resilient to stale exports, this change
truncates the zfs config file as part of the zfs share -a operation.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #13775

20 months agoFreeBSD: Replace legacy make_dev() interface usage
Ryan Moeller [Thu, 8 Sep 2022 17:40:18 +0000 (13:40 -0400)]
FreeBSD: Replace legacy make_dev() interface usage

The function make_dev_s() was introduced to replace make_dev() in
FreeBSD 11.0.  It allows further specification of properties and flags
and returns an error code on failure.  Using this we can fail loading
the module more gracefully than a panic in situations such as when a
device named zfs already exists.  We already use it for zvols.

Use make_dev_s() for /dev/zfs.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13854

20 months agozed: Fix config_sync autoexpand flood
Tony Hutter [Thu, 8 Sep 2022 17:32:30 +0000 (10:32 -0700)]
zed: Fix config_sync autoexpand flood

Users were seeing floods of `config_sync` events when autoexpand was
enabled.  This happened because all "disk status change" udev events
invoke the autoexpand codepath, which calls zpool_relabel_disk(),
which in turn causes another "disk status change" event to happen,
in a feedback loop.  Note that "disk status change" happens every time
a user calls close() on a block device.

This commit breaks the feedback loop by only allowing an autoexpand
to happen if the disk actually changed size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes: #7132
Closes: #7366
Closes #13729

20 months agoImprove too large physical ashift handling
Alexander Motin [Thu, 8 Sep 2022 17:30:53 +0000 (13:30 -0400)]
Improve too large physical ashift handling

When iterating through children physical ashifts for vdev, prefer
ones above the maximum logical ashift, that we can actually use,
but within the administrator defined maximum.

When selecting the top-level vdev ashift, do not set it to the defined
maximum if the physical ashift is even higher; just ignore the physical
ashift instead.  Using the maximum does not prevent misaligned writes,
but reduces space efficiency.  Since ZFS tries to write data
sequentially and aggregates the writes, in many cases large misaligned
writes may not be as bad as the space penalty would be otherwise.

Allow internal physical ashifts for vdevs higher than SHIFT_MAX.
Maybe one day the allocator or aggregation could benefit from that.

Reduce zfs_vdev_max_auto_ashift default from 16 (64KB) to 14 (16KB),
so that ZFS may still use bigger ashifts up to SHIFT_MAX (64KB),
but only if it really has to or explicitly told to, but not as an
"optimization".

There are some read-intensive NVMe SSDs that report a Preferred Write
Alignment of 64KB, and an attempt to build a RAIDZ2 of those leads to a
space inefficiency that can't be justified.  Instead these changes
make ZFS fall back to a logical ashift of 12 (4KB) by default and
only warn the user that it may be suboptimal for performance.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #13798

20 months agoAdd Linux posix_fadvise support
Finix1979 [Thu, 8 Sep 2022 17:29:41 +0000 (01:29 +0800)]
Add Linux posix_fadvise support

The purpose of this PR is to accept fadvise ioctl from userland
to do read-ahead on demand.

It could dramatically improve sequential read performance especially
when primarycache is set to metadata or zfs_prefetch_disable is 1.

If the file is mmapped, generic_fadvise is also called for page cache
read-ahead in addition to dmu_prefetch.

Only POSIX_FADV_WILLNEED and POSIX_FADV_SEQUENTIAL are supported in
this PR currently.
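
From userland, the two supported hints could be issued roughly as
follows.  This is a minimal sketch using Python's os.posix_fadvise()
wrapper; the helper name and chunk size are hypothetical, and the
advice is treated as a best-effort hint that may be unavailable.

```python
import os


def read_with_readahead_hint(path, chunk=1 << 20):
    """Read a whole file after advising the kernel of the access pattern."""
    with open(path, "rb") as f:
        fd = f.fileno()
        if hasattr(os, "posix_fadvise"):  # not present on every platform
            try:
                # Offset 0 with length 0 covers the file to its end.
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
            except OSError:
                pass  # advice is only a hint; keep reading without it
        data = bytearray()
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            data += buf
        return bytes(data)
```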

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Finix Yan <yancw@info2soft.com>
Closes #13694

20 months agoLinux SPL module init: Handle memory allocation failures correctly
Richard Yao [Thu, 8 Sep 2022 17:28:20 +0000 (13:28 -0400)]
Linux SPL module init: Handle memory allocation failures correctly

Upon inspection of our code, I noticed that we assume that
__alloc_percpu() cannot fail, and while it probably never has failed in
practice, technically, it can fail, so we should handle that.

Additionally, we incorrectly assume that `taskq_create()` in
spl_kmem_cache_init() cannot fail. The same remark applies to it.

Lastly, `spl_init()` failures should always return negative error
values, but in some places, we are returning positive 1, which is
incorrect. We change those values to their correct error codes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13847

20 months agoFix build on FreeBSD/powerpc64*
pkubaj [Thu, 8 Sep 2022 17:27:25 +0000 (17:27 +0000)]
Fix build on FreeBSD/powerpc64*

There's no VSX handler on FreeBSD for now.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Piotr Kubaj <pkubaj@FreeBSD.org>
Closes #13848

20 months agomake DMU_OT_IS_METADATA and DMU_OT_IS_ENCRYPTED return B_TRUE or B_FALSE
Christian Schwarz [Thu, 8 Sep 2022 00:04:15 +0000 (02:04 +0200)]
make DMU_OT_IS_METADATA and DMU_OT_IS_ENCRYPTED return B_TRUE or B_FALSE

Without this patch, the

    ASSERT3U(dbuf_is_metadata(db), ==, arc_is_metadata(buf));

at the beginning of dbuf_assign_arcbuf can panic
if the object type is a DMU_OT_NEWTYPE that has
DMU_OT_METADATA set.

While we're at it, fix DMU_OT_IS_ENCRYPTED as well.
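
Why the assertion can trip is easy to model.  The sketch below covers
only the DMU_OT_NEWTYPE path and assumes the flag values shown; the
helper names are hypothetical.  A raw masked bit is nonzero but is not
equal to the 1 that a boolean-returning function yields.

```python
# Assumed flag values for illustration.
DMU_OT_NEWTYPE = 0x80
DMU_OT_METADATA = 0x40


def is_metadata_old(ot):
    """Pre-patch style: returns the raw masked bit (0 or 0x40)."""
    return ot & DMU_OT_METADATA


def is_metadata_new(ot):
    """Post-patch style: normalized to 1 (B_TRUE) or 0 (B_FALSE)."""
    return 1 if (ot & DMU_OT_METADATA) != 0 else 0


ot = DMU_OT_NEWTYPE | DMU_OT_METADATA
# An equality check against a boolean 1 fails with the raw bit ...
raw_matches = is_metadata_old(ot) == 1
# ... and passes once the result is normalized.
normalized_matches = is_metadata_new(ot) == 1
```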

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #13842

20 months agoAdd xattr_handler support for Android kernels
Walter Huf [Tue, 6 Sep 2022 17:02:18 +0000 (10:02 -0700)]
Add xattr_handler support for Android kernels

Some ARM BSPs run the Android kernel, which has
a modified xattr_handler->get() function signature.
This adds support to compile against these kernels.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Walter Huf <hufman@gmail.com>
Closes #13824

20 months agoFreeBSD: add kqfilter support for zvol cdev
Rob Wing [Wed, 2 Feb 2022 05:00:57 +0000 (20:00 -0900)]
FreeBSD: add kqfilter support for zvol cdev

The only event hooked up is NOTE_ATTRIB, which is triggered when the
device is resized.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Wing <rew@FreeBSD.org>
Closes #13773

20 months agoFreeBSD: add knlist_init_sx() for exclusive locks
Rob Wing [Sun, 14 Aug 2022 05:09:49 +0000 (21:09 -0800)]
FreeBSD: add knlist_init_sx() for exclusive locks

This will be used to implement kqfilter support for zvol cdevs.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Wing <rew@FreeBSD.org>
Closes #13773

20 months agoCleanup Raid-Z Typo fixes
Richard Yao [Tue, 6 Sep 2022 16:43:21 +0000 (12:43 -0400)]
Cleanup Raid-Z Typo fixes

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13834

20 months agoFix column width in 'zpool iostat -v' and 'zpool list -v'
Samuel [Tue, 6 Sep 2022 16:37:47 +0000 (22:07 +0530)]
Fix column width in 'zpool iostat -v' and 'zpool list -v'

This commit fixes a minor spacing issue caused when
enumerating vdev names, which originated from #13031

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Samuel Wycliffe <samuelwycliffe@gmail.com>
Closes #13811

20 months agoAdd DD_FIELD string for snapshots_changed property
Umer Saleem [Fri, 2 Sep 2022 20:33:50 +0000 (01:33 +0500)]
Add DD_FIELD string for snapshots_changed property

This commit adds DD_FIELD string used in extensified dsl_dir zap object
for snapshots_changed property.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13819

20 months agoAdd zfs.sync.snapshot_rename
Andriy Gapon [Fri, 2 Sep 2022 20:31:19 +0000 (23:31 +0300)]
Add zfs.sync.snapshot_rename

Only the single snapshot rename is provided.
The recursive or more complex rename can be scripted.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #13802

20 months agoFreeBSD: Organize sysctls
Ryan Moeller [Tue, 9 Aug 2022 09:05:47 +0000 (09:05 +0000)]
FreeBSD: Organize sysctls

FreeBSD had a few platform-specific ARC tunables in the wrong place:

- Move FreeBSD-specific ARC tunables into the same vfs.zfs.arc node as
  the rest of the ARC tunables.
- Move the handlers from arc_os.c to sysctl_os.c and add compat sysctls
  for the legacy names.

While here, some additional clean up:

- Most handlers are specific to a particular variable and don't need a
  pointer passed through the args.
- Group blocks of related variables, handlers, and sysctl declarations
  into logical sections.
- Match variable types for temporaries in handlers with the type of the
  global variable.
- Remove leftover comments.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13756

20 months agoFreeBSD: Mark ZFS_MODULE_PARAM_CALL as MPSAFE
Ryan Moeller [Tue, 9 Aug 2022 09:05:29 +0000 (09:05 +0000)]
FreeBSD: Mark ZFS_MODULE_PARAM_CALL as MPSAFE

ZFS_MODULE_PARAM_CALL handlers implement their own locking if needed
and do not require Giant.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13756

20 months agoAdd zilstat script to report zil kstats in a user friendly manner
Ameer Hamza [Fri, 2 Sep 2022 20:24:07 +0000 (01:24 +0500)]
Add zilstat script to report zil kstats in a user friendly manner

Added a python script to process both global and per dataset
zil kstats and report them in a user friendly manner similar
to arcstat and dbufstat.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13704

20 months agoApply arc_shrink_shift to ARC above arc_c_min
Alexander Motin [Fri, 2 Sep 2022 20:21:18 +0000 (16:21 -0400)]
Apply arc_shrink_shift to ARC above arc_c_min

It makes sense to free memory in smaller chunks when approaching
arc_c_min to let other kernel subsystems free more, since after
that point we can't free anything.  This also matches the behavior on
Linux, where only the size above arc_c_min is reported to the shrinker.
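
The effect on the step size can be illustrated with a simplified model.
It assumes the earlier step was the whole target shifted right by
arc_shrink_shift and the new step shifts only the part above arc_c_min;
the function names and sizes are hypothetical, and this is not the
actual kernel code.

```python
MIB = 1 << 20
GIB = 1 << 30


def shrink_step_old(arc_c, arc_shrink_shift):
    """Assumed earlier behavior: fraction of the whole ARC target."""
    return arc_c >> arc_shrink_shift


def shrink_step_new(arc_c, arc_c_min, arc_shrink_shift):
    """Assumed new behavior: fraction of the part above arc_c_min."""
    return max(arc_c - arc_c_min, 0) >> arc_shrink_shift


# Near the minimum the new step becomes much smaller than the old one.
near_min = 1 * GIB + 128 * MIB
old_step = shrink_step_old(near_min, 7)
new_step = shrink_step_new(near_min, 1 * GIB, 7)
```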

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #13794

20 months agoFreeBSD: Cleanup dead code from VFS
Richard Yao [Fri, 2 Sep 2022 20:20:10 +0000 (16:20 -0400)]
FreeBSD: Cleanup dead code from VFS

The vfs_*_feature() macros turn anything that uses them into dead code,
so we can delete all of it.

As a side effect, zfs_set_fuid_feature() is now identical in
module/os/freebsd/zfs/zfs_vnops_os.c and
module/os/linux/zfs/zfs_vnops_os.c. A few other functions are identical
too. Future cleanup could move these into a common file.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13832

20 months agoAlloc zdb_cd_t to fix stack issue
Andrew Innes [Fri, 2 Sep 2022 20:15:18 +0000 (04:15 +0800)]
Alloc zdb_cd_t to fix stack issue

Allocate zdb_cd_t on the heap since it is too large for the stack on
Windows, which results in `zdb` crashing immediately.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Innes <andrew.c12@gmail.com>
Co-authored-by: Jorgen Lundman <lundman@lundman.net>
Closes #13807

20 months agoImporting from cachefile can trip assertion
George Wilson [Fri, 26 Aug 2022 21:04:27 +0000 (16:04 -0500)]
Importing from cachefile can trip assertion

When importing from cachefile, it is possible that the builtin retry
logic will trip an assertion because it also fails to find the pool.
This fix addresses that case and returns the correct error message to
the user.

Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #13781

20 months agoZTS: zvol_stress: fix race condition with zinject usage
Christian Schwarz [Thu, 25 Aug 2022 21:22:10 +0000 (23:22 +0200)]
ZTS: zvol_stress: fix race condition with zinject usage

In automated ZTS runs, I'd occasionally hit

    log_fail "Expected to see some write errors"

because there weren't any write errors.

The reason is that we're not syncing the zpool before `zinject -c`.
If the writes by `dd` aren't synced out at the time `zinject -c` runs,
they will not hit an error and we'll hit the log_fail above.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #13793

20 months agoRevert "Avoid panic with recordsize > 128k, raw sending and no large_blocks"
Brian Behlendorf [Thu, 25 Aug 2022 20:33:32 +0000 (13:33 -0700)]
Revert "Avoid panic with recordsize > 128k, raw sending and no large_blocks"

This reverts commit 80a650b7bb04bce3aef5e4cfd1d966e3599dafd4.  This change
inadvertently introduced a regression in ztest where one of the new ASSERTs
is triggered in dsl_scan_visitbp().

Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12275
Closes #13799

20 months agoUpdates for snapshots_changed property
Umer Saleem [Wed, 24 Aug 2022 21:20:43 +0000 (02:20 +0500)]
Updates for snapshots_changed property

Currently, snapshots_changed property is stored in dd_props_zapobj, due
to which the property is assumed to be local. This causes a difference
in behavior with respect to other readonly properties.

This commit stores the snapshots_changed property in dd_object. Source
is not set to local in this case, which makes it consistent with other
readonly properties.

This commit also updates the date string format to include seconds.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13785

20 months agoFix zpool status in case of unloaded keys
George Amanakis [Tue, 23 Aug 2022 00:42:01 +0000 (02:42 +0200)]
Fix zpool status in case of unloaded keys

When scrubbing an encrypted filesystem with an unloaded key, still
report an error in zpool status.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #13675
Closes #13717

20 months agoPrevent zevent list from consuming all of kernel memory
Paul Dagnelie [Mon, 22 Aug 2022 19:36:22 +0000 (12:36 -0700)]
Prevent zevent list from consuming all of kernel memory

There are a couple changes included here. The first is to introduce
a cap on the size the ZED will grow the zevent list to. One million
entries is more than enough for most use cases, and if you are
overflowing that value, the problem needs to be addressed another
way. The value is also tunable, for those who want the limit to be
higher or lower.

The other change is to add a kernel module parameter that allows
snapshot creation/deletion to be exempted from the history logging;
for most workloads, having these things logged is valuable, but for
some workloads it produces large quantities of log spam and isn't
especially helpful.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Issue #13374
Closes #13753

21 months agocontrib: dracut: zfs-snapshot-bootfs: exit status fix
gregory-lee-bartholomew [Fri, 12 Aug 2022 21:28:15 +0000 (16:28 -0500)]
contrib: dracut: zfs-snapshot-bootfs: exit status fix

When the zfs-snapshot-bootfs service attempts to create a snapshot
that already exists, the exit status of the command is non-zero and
the service reports failed to the systemd service manager. This is a
common occurrence if bootfs.snapshot is left set on the kernel command
line and it should not be considered a failure.

This service was originally set to ignore this error by prefixing
the command with - on the ExecStart line, but the leading - appears
to have been dropped in #13359.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #13769

21 months agoarcstat: fix -p option
r-ricci [Fri, 12 Aug 2022 21:21:52 +0000 (22:21 +0100)]
arcstat: fix -p option

When the -p option is used, a list of floats is passed to sep.join(),
which expects strings. Fix this by converting each value to a string.
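
The failure mode and the fix can be reproduced in a few lines.  This is
a minimal sketch; format_row and the ";" separator are hypothetical and
are not arcstat's actual code.

```python
def format_row(values, sep=";"):
    """Join mixed numeric values into one delimited output line."""
    # str.join() raises TypeError for non-string items, so convert first.
    return sep.join(str(v) for v in values)


def join_unconverted(values, sep=";"):
    """Show the pre-fix behavior: returns the TypeError message, if any."""
    try:
        return sep.join(values)
    except TypeError as exc:
        return "TypeError: " + str(exc)
```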

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Roberto Ricci <ricci@disroot.org>
Closes #12916
Closes #13767

21 months agoEnable relatime by default
George Melikov [Fri, 12 Aug 2022 21:20:25 +0000 (00:20 +0300)]
Enable relatime by default

Linux sets relatime on mount by default for any file system,
but relatime=off in ZFS disables it explicitly.

Let's be consistent with other file systems on Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #13614

21 months agoZTS: Fix zpool_expand_001_pos
Tony Hutter [Tue, 9 Aug 2022 20:26:46 +0000 (13:26 -0700)]
ZTS: Fix zpool_expand_001_pos

`zpool_expand_001_pos` was often failing due to not seeing autoexpand
commands in the `zpool history`.  During testing, I found this to be
unreliable (sometimes the "online" wouldn't appear in `zpool history`)
and unnecessary, as we could simply check that the pool increased in
size.

This commit revamps the test to check for the expanded pool size
and corresponding new free space.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13743

21 months agoAdd comment on acb_zio_dummy
Christian Schwarz [Mon, 8 Aug 2022 23:55:13 +0000 (01:55 +0200)]
Add comment on acb_zio_dummy

Thanks to George Wilson for clarifying this on Slack.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #13698

21 months agoLinux 6.0 compat: register_shrinker() now var-arg
Coleman Kane [Mon, 8 Aug 2022 23:18:30 +0000 (19:18 -0400)]
Linux 6.0 compat: register_shrinker() now var-arg

The 6.0 kernel added a printf-style var-arg for args > 0 to the
register_shrinker function, in order to add names to shrinkers, in
commit e33c267ab70de4249d22d7eab1cc7d68a889bac2. This enables the
shrinkers to have friendly names exposed in /sys/kernel/debug/shrinker/.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #13748

21 months agolibzfs: Remove unused zpool_get_physpath()
Ryan Moeller [Fri, 5 Aug 2022 00:04:09 +0000 (20:04 -0400)]
libzfs: Remove unused zpool_get_physpath()

This is an oddly specific function that has never had any consumers in
the history of this repo.  Get rid of it and the pile of helper
functions that exist for it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13724

21 months agozpool: fix redundancy check after vdev removal
Stéphane Lesimple [Fri, 5 Aug 2022 00:02:57 +0000 (03:02 +0300)]
zpool: fix redundancy check after vdev removal

The presence of indirect vdevs was confusing get_redundancy(), which
considered a pool with e.g. only mirror top-level vdevs and at least
one indirect vdev (due to the removal of a previous vdev) as already
having a broken redundancy, which is not the case. This led to the
possibility of compromising the redundancy of a pool by adding
mismatched vdevs without requiring the use of `-f`, and with no
visible notice or warning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Stéphane Lesimple <speed47_github@speed47.net>
Closes #13705
Closes #13711

21 months agoLinux 5.20 compat: blk_cleanup_disk()
Brian Behlendorf [Thu, 4 Aug 2022 00:37:52 +0000 (17:37 -0700)]
Linux 5.20 compat: blk_cleanup_disk()

As of the Linux 5.20 kernel blk_cleanup_disk() has been removed,
all callers should use put_disk().

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13728

21 months agoLinux 5.20 compat: bdevname()
Brian Behlendorf [Wed, 3 Aug 2022 18:35:47 +0000 (11:35 -0700)]
Linux 5.20 compat: bdevname()

As of the Linux 5.20 kernel bdevname() has been removed, all
callers should use snprintf() and the "%pg" format specifier.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13728

21 months agoDon't double-zero buffers in fault management nvlists
Paul Dagnelie [Thu, 4 Aug 2022 23:53:47 +0000 (16:53 -0700)]
Don't double-zero buffers in fault management nvlists

This is a small cleanup for a trivial problem which happened to
be noticed while another issue was being investigated.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #13730

21 months agoAdd snapshots_changed as property
Umer Saleem [Tue, 2 Aug 2022 23:45:30 +0000 (04:45 +0500)]
Add snapshots_changed as property

Make the dd_snap_cmtime property persistent across mount and unmount
operations by storing it in ZAP and restoring the value from ZAP on hold
into dd_snap_cmtime instead of updating it.

Expose dd_snap_cmtime as the 'snapshots_changed' property, which provides
a mechanism to quickly determine whether the snapshot list for a dataset
has changed without having to mount the dataset or iterate the snapshot
list.

It specifies the time at which a snapshot for a dataset was last
created or deleted. This allows us to be more efficient about how often
we query snapshots.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13635

21 months agoFreeBSD: Ignore symlink to i386 includes
Ryan Moeller [Tue, 2 Aug 2022 23:34:23 +0000 (19:34 -0400)]
FreeBSD: Ignore symlink to i386 includes

A symlink to i386 includes is created in the build dir on amd64 since
freebsd/freebsd-src@d07600c563039f252becc29ac7d9a454b6b0600d

Tell git to ignore it like the other include links.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13719

21 months agoLinux 5.19 compat: META
Brian Behlendorf [Tue, 2 Aug 2022 17:04:38 +0000 (10:04 -0700)]
Linux 5.19 compat: META

Update the META file to reflect compatibility with the 5.19 kernel.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13715

21 months agoSkip checksum benchmarks on systems with slow cpu
Tino Reichardt [Mon, 1 Aug 2022 16:51:45 +0000 (18:51 +0200)]
Skip checksum benchmarks on systems with slow cpu

The checksum benchmarking on module load may take a really long time
on embedded systems with a slow cpu. Avoid all benchmarks >= 1MiB on
systems where EdonR is slower than 300 MiB/s.

This limit is currently hardcoded via the define LIMIT_PERF_MBS.

This is the new benchmark output of a slow Intel Atom:

```
 implementation    1k    4k   16k   64k  256k    1m    4m   16m
 edonr-generic    209   257   268   259   262     0     0     0
 skein-generic    129   150   151   150   150     0     0     0
 sha256-generic    50    55    56    56    56     0     0     0
 sha512-generic    76    86    88    89    88     0     0     0
 blake3-generic    63    62    62    62    61     0     0     0
 blake3-sse2      114   292   301   307   309     0     0     0
```
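
The skip rule can be sketched as below.  Only LIMIT_PERF_MBS and the
size columns come from this message; which EdonR measurement is compared
against the limit is an assumption, and the function name is
hypothetical.

```python
LIMIT_PERF_MBS = 300  # threshold named in this commit message

SIZES = ["1k", "4k", "16k", "64k", "256k", "1m", "4m", "16m"]
LARGE_SIZES = {"1m", "4m", "16m"}  # the benchmarks >= 1MiB


def sizes_to_benchmark(edonr_mbs):
    """Return the block sizes worth benchmarking for a measured EdonR rate."""
    if edonr_mbs < LIMIT_PERF_MBS:
        # Slow CPU: skip the large sizes, reported as 0 in the table.
        return [s for s in SIZES if s not in LARGE_SIZES]
    return list(SIZES)
```

With the 262 MiB/s measured for edonr-generic at 256k above, only the
sizes up to 256k would be run in this model.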

Reviewed-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13695

21 months agoFix checkstyle warning: E275 missing whitespace after keyword
Tino Reichardt [Mon, 1 Aug 2022 16:49:35 +0000 (18:49 +0200)]
Fix checkstyle warning: E275 missing whitespace after keyword

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13710

21 months agoImplement a new type of zfs receive: corrective receive (-c)
Alek P [Thu, 28 Jul 2022 22:52:46 +0000 (18:52 -0400)]
Implement a new type of zfs receive: corrective receive (-c)

This type of recv is used to heal corrupted data when a replica
of the data already exists (in the form of a send file for example).
With the provided send stream, corrective receive will read from
disk blocks described by the WRITE records. When any of the reads
come back with ECKSUM we use the data from the corresponding WRITE
record to rewrite the corrupted block.
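
The healing loop can be modeled in userland terms.  The sketch below is
a toy under stated assumptions: a dict stands in for the disk, SHA-256
stands in for the block checksum, and the extra check that the stream
data itself matches the expected checksum is an addition of this sketch.
All names are hypothetical.

```python
import hashlib


def checksum(data):
    """Stand-in block checksum (assumption: SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()


def corrective_receive(disk, expected, write_records):
    """Heal corrupted blocks from (block_id, data) WRITE records.

    disk:     dict of block_id -> bytes currently stored
    expected: dict of block_id -> checksum the block should have
    Returns the list of block ids that were rewritten.
    """
    healed = []
    for block_id, data in write_records:
        on_disk = disk.get(block_id)
        if on_disk is None:
            continue  # the record does not describe a block we hold
        if checksum(on_disk) == expected[block_id]:
            continue  # the read verified; nothing to heal
        # The read would have failed with ECKSUM; use the stream's copy,
        # but only if that copy matches the expected checksum.
        if checksum(data) == expected[block_id]:
            disk[block_id] = data
            healed.append(block_id)
    return healed
```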

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes #9372

21 months agoFreeBSD compile fix
Tino Reichardt [Thu, 28 Jul 2022 21:19:41 +0000 (23:19 +0200)]
FreeBSD compile fix

The file module/os/freebsd/zfs/zfs_ioctl_compat.c fails to compile
because of this error: 'static' is not at beginning of declaration

This commit fixes the three places within that file.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13702

21 months agoZTS: Fix io_uring support check
Brian Behlendorf [Tue, 26 Jul 2022 21:39:23 +0000 (14:39 -0700)]
ZTS: Fix io_uring support check

Not all Linux distribution kernels enable io_uring support by
default.  Update the run time check to verify that the booted
kernel was built with CONFIG_IO_URING=y.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Co-authored-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13648
Closes #13685

21 months agoAdd createtxg sort support for simple snapshot iterator
Ameer Hamza [Mon, 25 Jul 2022 21:04:46 +0000 (02:04 +0500)]
Add createtxg sort support for simple snapshot iterator

- When iterating snapshots with name only, e.g., "-o name -s name",
libzfs uses simple snapshot iterator and results are displayed
in alphabetic order. This PR adds support for faster version of
createtxg sort by avoiding nvlist parsing for properties. Flags
"-o name -s createtxg" will enable createtxg sort while using
simple snapshot iterator.
- Added support to read createtxg property directly from zfs handle
for filesystem, volume and snapshot types instead of parsing nvlist.
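
The two orderings differ as soon as snapshot names are not chosen in
creation order.  A small sketch with made-up (name, createtxg) pairs;
the data and function names are hypothetical and say nothing about how
libzfs implements the sort.

```python
# Hypothetical snapshots as (name, createtxg) pairs.
SNAPSHOTS = [
    ("pool/fs@weekly", 120),
    ("pool/fs@adhoc", 450),
    ("pool/fs@daily", 300),
]


def sort_by_name(snaps):
    """Ordering comparable to '-o name -s name'."""
    return [name for name, _ in sorted(snaps, key=lambda s: s[0])]


def sort_by_createtxg(snaps):
    """Ordering comparable to '-o name -s createtxg'."""
    return [name for name, _ in sorted(snaps, key=lambda s: s[1])]
```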

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13577

21 months agoZTS: Fix occasional inherit_001_pos.ksh failure
Brian Behlendorf [Mon, 25 Jul 2022 16:52:42 +0000 (09:52 -0700)]
ZTS: Fix occasional inherit_001_pos.ksh failure

The mountpoint may still be busy when the `zfs unmount -a` command
is run causing an unexpected failure.  Retry the unmount a couple
of times since it should not remain busy for long.

    19:10:50.29 NOTE: Reading state from .../inheritance/state021.cfg
    19:10:50.32 cannot unmount '/TESTPOOL': pool or dataset is busy
    19:10:50.32 ERROR: zfs unmount -a exited 1

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13686

21 months agozdb: dump spill block pointer if present
Christian Schwarz [Thu, 21 Jul 2022 00:16:29 +0000 (02:16 +0200)]
zdb: dump spill block pointer if present

Output will look like so:

  $ sudo zdb -dddd -vv testpool/fs 2
  Dataset testpool/fs [ZPL], ID 260, cr_txg 8, 25K, 7 objects, rootbp DVA[0]=<0:1800be00:200> DVA[1]=<0:1c00be00:200> [L0 DMU objset] fletcher4 lz4 unencrypted LE contiguous unique double size=1000L/200P birth=16L/16P fill=7 cksum=d03b396cd:489ca835517:d4b04a4d0a62:1b413aac454d53

      Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
           2    1   128K    512     1K     512    512    0.00  ZFS plain file (K=inherit) (Z=inherit=lz4)
                                                 192   bonus  System attributes
      dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED SPILL_BLKPTR
      dnode maxblkid: 0
      path    /testfile
      uid     0
      gid     0
      atime   Fri Jul 15 12:36:35 2022
      mtime   Fri Jul 15 12:36:35 2022
      ctime   Fri Jul 15 12:36:51 2022
      crtime  Fri Jul 15 12:36:35 2022
      gen 10
      mode    100600
      size    0
      parent  34
      links   1
      pflags  840800000004
      SA xattrs: 248 bytes, 2 entries

          security.selinux = nutanix_u:object_r:unlabeled_t:s0\000
          user.foo = xbLQJjyVvEVPGGuRHV/gjkFFO1MdehKnLjjd36ZaoMVaUqtqFoMMYT5Ya9yywHApJNoK/1hNJfO3\012XCJWv9/QUTKamoWW9xVDE7yi8zn166RNw5QUhf84cZ3JNLnw6oN

Spill block: 0:10005c00:200 0:14005c00:200 200L/200P F=1 B=16/16 cksum=1cdfac47a4:910c5caa557:195d0493dfe5a:332b6fde6ad547
Indirect blocks:

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #13640

21 months agoAdd support for per dataset zil stats and use wmsum counters
ixhamza [Thu, 21 Jul 2022 00:14:06 +0000 (05:14 +0500)]
Add support for per dataset zil stats and use wmsum counters

ZIL kstats are reported in an inclusive way, i.e., the same counters are
shared to capture all the activity happening in the zil. Added support
to report zil stats for every dataset individually by combining them
with the already exposed dataset kstats.

Wmsum uses per-cpu counters and has less overhead than atomic
operations. Updated zil kstats to use wmsum counters to avoid
atomic operations.
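
The idea behind a write-mostly per-cpu counter can be modeled as below.
This is an illustrative sketch only, not the wmsum implementation: each
writer updates its own slot, and a reader pays the cost of summing all
slots.  The class and method names are hypothetical.

```python
class ShardedCounter:
    """Toy model of a write-mostly sum with one slot per CPU."""

    def __init__(self, ncpus):
        self._slots = [0] * ncpus

    def add(self, cpu, delta):
        # A writer touches only its own slot, so writers on different
        # CPUs do not contend on a single shared value.
        self._slots[cpu % len(self._slots)] += delta

    def value(self):
        # Reads are rarer and pay the cost of summing every slot.
        return sum(self._slots)
```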

Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13636

21 months agoFix scrub resume from newly created hole
Alexander Motin [Thu, 21 Jul 2022 00:02:36 +0000 (20:02 -0400)]
Fix scrub resume from newly created hole

It may happen that the scan bookmark points to a block that was turned
into part of a big hole.  In such a case dsl_scan_visitbp() may skip
it and dsl_scan_check_resume() will not be called for it.  As a result
a new scan suspend won't be possible until the end of the object, which
may take hours if the object is a multi-terabyte ZVOL on a slow HDD
pool, stretching the TXG to all that time and creating all sorts of
problems.

This patch changes the resume condition to any greater or equal block,
so even if we miss the bookmarked block, the next one we find will
delete the bookmark, allowing a new suspend.
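
The difference between the two conditions can be shown with a toy model.
It assumes the earlier condition required reaching the bookmarked block
itself and reduces bookmarks to plain integer block ids; real bookmarks
have several fields, and the function names here are hypothetical.

```python
def resume_exact(bookmark, blkid):
    """Assumed earlier condition: only the bookmarked block matches."""
    return blkid == bookmark


def resume_greater_or_equal(bookmark, blkid):
    """New condition: any block at or past the bookmark matches."""
    return blkid >= bookmark


def first_resume_point(visited, bookmark, cond):
    """Return the first visited block that clears the bookmark, if any."""
    for blkid in visited:
        if cond(bookmark, blkid):
            return blkid
    return None


# Blocks 4..6 became a hole, so the scan never visits bookmarked block 5.
VISITED = [0, 1, 2, 3, 7, 8, 9]
```

In this model the exact-match condition never clears a bookmark at
block 5, while the greater-or-equal condition clears it at block 7.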

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13643

21 months agoFix memory allocation for the checksum benchmark
Tino Reichardt [Thu, 21 Jul 2022 00:01:32 +0000 (02:01 +0200)]
Fix memory allocation for the checksum benchmark

Allocation via kmem_cache_alloc() is limited to less than 4m on
some architectures.

This commit limits the benchmarks with the linear abd cache to 1m
on all architectures and adds 4m + 16m benchmarks via non-linear
abd_alloc().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13669
Closes #13670

22 months agoExpose ZFS dataset case sensitivity setting via sb_opts
ixhamza [Thu, 14 Jul 2022 17:38:16 +0000 (22:38 +0500)]
Expose ZFS dataset case sensitivity setting via sb_opts

Makes the case sensitivity setting visible on Linux in /proc/mounts.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13607