]> git.proxmox.com Git - mirror_zfs.git/log
mirror_zfs.git
3 years agoMemory leak in ztest_vdev_attach_detach()
Matthew Ahrens [Fri, 25 Dec 2020 04:51:13 +0000 (20:51 -0800)]
Memory leak in ztest_vdev_attach_detach()

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396

3 years agonvlist leaked in zpool_find_config()
Matthew Ahrens [Wed, 23 Dec 2020 17:52:24 +0000 (09:52 -0800)]
nvlist leaked in zpool_find_config()

In `zpool_find_config()`, the `pools` nvlist is leaked.  Part of it (a
sub-nvlist) is returned in `*configp`, but the callers also leak that.

Additionally, in `zdb.c:main()`, the `searchdirs` is leaked.

The leaks were detected by ASAN (`configure --enable-asan`).

This commit resolves the leaks.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396

3 years agoimplicit conversion from 'boolean_t' to 'ds_hold_flags_t'
Toomas Soome [Mon, 28 Dec 2020 00:31:02 +0000 (02:31 +0200)]
implicit conversion from 'boolean_t' to 'ds_hold_flags_t'

Build error on illumos with gcc 10 did reveal:

In function 'dmu_objset_refresh_ownership':
../../common/fs/zfs/dmu_objset.c:857:25: error: implicit conversion
from 'boolean_t' to 'ds_hold_flags_t' {aka 'enum ds_hold_flags'}
[-Werror=enum-conversion]
      857 |  dsl_dataset_disown(ds, decrypt, tag);
          |                         ^~~~~~~
cc1: all warnings being treated as errors

libzfs_input_check.c: In function 'zfs_ioc_input_tests':
libzfs_input_check.c:754:28: error: implicit conversion from
'enum dmu_objset_type' to 'enum lzc_dataset_type'
[-Werror=enum-conversion]
  754 |  err = lzc_create(dataset, DMU_OST_ZFS, NULL, NULL, 0);
      |                            ^~~~~~~~~~~
cc1: all warnings being treated as errors

The same issue is present in openzfs, and also the same issue about
ds_hold_flags_t, which currently defines exactly one valid value.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #11406

3 years agoLinux 5.11 compat: blk_{un}register_region()
Brian Behlendorf [Tue, 22 Dec 2020 22:12:31 +0000 (14:12 -0800)]
Linux 5.11 compat: blk_{un}register_region()

As of 5.11 the blk_register_region() and blk_unregister_region()
functions have been retired. This isn't a problem since add_disk()
has implicitly allocated minor numbers for a very long time.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390

3 years agoLinux 5.11 compat: revalidate_disk_size()
Brian Behlendorf [Tue, 22 Dec 2020 21:53:25 +0000 (13:53 -0800)]
Linux 5.11 compat: revalidate_disk_size()

Both revalidate_disk_size() and revalidate_disk() have been removed.
Functionally this isn't a problem because we only relied on these
functions to call zvol_revalidate_disk() for us and to perform any
additional handling which might be needed for that kernel version.
When neither are available we know there's no additional handling
needed and we can directly call zvol_revalidate_disk().

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390

3 years agoLinux 5.11 compat: bdev_whole()
Brian Behlendorf [Tue, 22 Dec 2020 21:02:59 +0000 (13:02 -0800)]
Linux 5.11 compat: bdev_whole()

The bd_contains member was removed from the block_device structure.
Callers needing to determine if a vdev is a whole block device should
use the new bdev_whole() wrapper.  For older kernels we provide our
own bdev_whole() wrapper which relies on bd_contains for compatibility.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390

3 years agoLinux 5.11 compat: bio_start_io_acct() / bio_end_io_acct()
Brian Behlendorf [Tue, 22 Dec 2020 20:17:13 +0000 (12:17 -0800)]
Linux 5.11 compat: bio_start_io_acct() / bio_end_io_acct()

The generic IO accounting functions have been removed in favor of the
bio_start_io_acct() and bio_end_io_acct() functions which provide a
better interface.  These new functions were introduced in the 5.8
kernels but it wasn't until the 5.11 kernel that the previous generic
IO accounting interfaces were removed.

This commit updates the blk_generic_*_io_acct() wrappers to provide
and interface similar to the updated kernel interface.  It's slightly
different because for older kernels we need to pass the request queue
as well as the bio.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390

3 years agoLinux 5.11 compat: lookup_bdev()
Brian Behlendorf [Tue, 22 Dec 2020 18:26:45 +0000 (10:26 -0800)]
Linux 5.11 compat: lookup_bdev()

The lookup_bdev() function has been updated to require a dev_t
be passed as the second argument. This is actually pretty nice
since the major number stored in the dev_t was the only part we
were interested in. This allows to us avoid handling the bdev
entirely.  The vdev_lookup_bdev() wrapper was updated to emulate
the behavior of the new lookup_bdev() for all supported kernels.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390

3 years agoLinux 5.11 compat: conftest
Brian Behlendorf [Wed, 23 Dec 2020 19:40:02 +0000 (11:40 -0800)]
Linux 5.11 compat: conftest

Update the ZFS_LINUX_TEST_PROGRAM macro to always set the module
license.  As of the 5.11 kernel not setting a license has been
converted from a warning to an error.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390

3 years agodbufstat: Fix warnings with Python 3.8
Ryan Moeller [Wed, 23 Dec 2020 23:10:35 +0000 (18:10 -0500)]
dbufstat: Fix warnings with Python 3.8

Replace "is" with "==" and "is not" with "!=".

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11394

3 years agoLinux 5.10 compat: META
Brian Behlendorf [Wed, 23 Dec 2020 16:55:02 +0000 (08:55 -0800)]
Linux 5.10 compat: META

Increase the Linux-Maximum version in the META file to 5.10.
All of the required compatibility patches have been merged.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11391

3 years agozfs-kmods: install to /lib/modules instead of /usr/lib/modules
Christian Schwarz [Tue, 22 Dec 2020 04:14:32 +0000 (04:14 +0000)]
zfs-kmods: install to /lib/modules instead of /usr/lib/modules

Before this patch, dracut wouldn't find zfs.ko for inclusion in
initramfs. This was caused by the packages installing in to
/lib/modules instead of /usr/lib/modules.  Correcting this allows
dracut to do the right thing, even without

    # /etc/dracut.conf
    add_drivers+=" zfs "

Notably, rpm/redhat/zfs-kmod.spec.in does not contain the definition of
the `prefix` macro that this commit removes in the generic kmod spec.
And https://rpmfusion.org/Packaging/KernelModules/Kmods2 does not
mention `prefix` at all.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11381

3 years agoDangling reference from dmu_objset_upgrade
Andy Fiddaman [Mon, 21 Dec 2020 18:13:23 +0000 (18:13 +0000)]
Dangling reference from dmu_objset_upgrade

After porting the fix for https://github.com/openzfs/zfs/issues/5295
over to illumos, we started hitting an assertion failure when running
the testsuite:

assertion failed: rc->rc_count == number, file: .../refcount.c

and the unexpected hold has this stack:

dsl_dataset_long_hold+0x59 dmu_objset_upgrade+0x73
dmu_objset_id_quota_upgrade+0x15 dmu_objset_own+0x14f

The simplest reproducer for this in illumos is

    zpool create -f -O version=1 testpool c3t0d0; zpool destroy testpool

which is run as part of the zpool_create_tempname test, but I can't get
this to trigger on FreeBSD. This appears to be because of the call to
txg_wait_synced() in dmu_objset_upgrade_stop() (which was missing in
illumos), slows down dmu_objset_disown() enough to avoid the condition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andy Fiddaman <andy@omnios.org>
Closes #11368

3 years agoLinux 4.18.0-257.el8 compat: blk_alloc_queue()
Brian Behlendorf [Mon, 21 Dec 2020 18:11:56 +0000 (10:11 -0800)]
Linux 4.18.0-257.el8 compat: blk_alloc_queue()

The CentOS stream 4.18.0-257 kernel appears to have backported
the Linux 5.9 change to make_request_fn and the associated API.
To maintain weak modules compatibility the original symbol was
retained and the new interface blk_alloc_queue_rh() was added.

Unfortunately, blk_alloc_queue() was replaced in the blkdev.h
header by blk_alloc_queue_bh() so there doesn't seem to be a way
to build new kmods against the old interfces.  Even though they
appear to still be available for weak module binding.

To accommodate this a configure check is added for the new _rh()
variant of the function and used if available.  If compatibility
code gets added to the kernel for the original blk_alloc_queue()
interface this should be fine.  OpenZFS will simply continue to
prefer the new interface and only fallback to blk_alloc_queue()
when blk_alloc_queue_rh() isn't available.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11374

3 years agoLinux 5.10 compat: also zvol_revalidate_disk()
Michael D Labriola [Fri, 18 Dec 2020 17:36:19 +0000 (12:36 -0500)]
Linux 5.10 compat: also zvol_revalidate_disk()

Commit 59b68723 added a configure check for 5.10, which removed
revalidate_disk(), and conditionally replaced it's usage with a call to
the new revalidate_disk_size() function.  However, the old function also
invoked the device's registered callback, in our case
zvol_revalidate_disk().  This commit adds a call to zvol_revalidate_disk()
in zvol_update_volsize() to make sure the code path stays the same.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Michael D Labriola <michael.d.labriola@gmail.com>
Closes #11358

3 years agoFix maybe uninitialized variable warning
Brian Behlendorf [Sun, 20 Dec 2020 17:50:13 +0000 (09:50 -0800)]
Fix maybe uninitialized variable warning

Commit 1c2358c12 restructured this code and introduced a warning
about the variable maybe not being initialized.  This cannot happen
with the updated code but we should initialize the variable anyway
to silence the warning.

    zpl_file.c: In function ‘zpl_iter_write’:
    zpl_file.c:324:9: warning: ‘count’ may be used uninitialized
        in this function [-Wmaybe-uninitialized]

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11373

3 years agoRemove iov_iter_advance() from iter_read
Brian Behlendorf [Sun, 20 Dec 2020 17:49:29 +0000 (09:49 -0800)]
Remove iov_iter_advance() from iter_read

There's no need to call iov_iter_advance() in zpl_iter_read().
This was preserved from the previous code where it wasn't needed
but also didn't cause any problems.  Now that the iter functions
also handle pipes that's no longer the case.  When fully reading a
pipe buffer iov_iter_advance() may results in the pipe buf release
function being called which will not be registered resulting in
a NULL dereference.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11375
Closes #11378

3 years agoLinux 5.10 compat: use iov_iter in uio structure
Brian Behlendorf [Fri, 18 Dec 2020 16:48:26 +0000 (08:48 -0800)]
Linux 5.10 compat: use iov_iter in uio structure

As of the 5.10 kernel the generic splice compatibility code has been
removed.  All filesystems are now responsible for registering a
->splice_read and ->splice_write callback to support this operation.

The good news is the VFS provided generic_file_splice_read() and
iter_file_splice_write() callbacks can be used provided the ->iter_read
and ->iter_write callback support pipes.  However, this is currently
not the case and only iovecs and bvecs (not pipes) are ever attached
to the uio structure.

This commit changes that by allowing full iov_iter structures to be
attached to uios.  Ever since the 4.9 kernel the iov_iter structure
has supported iovecs, kvecs, bvevs, and pipes so it's desirable to
pass the entire thing when possible.  In conjunction with this the
uio helper functions (i.e uiomove(), uiocopy(), etc) have been
updated to understand the new UIO_ITER type.

Note that using the kernel provided uio_iter interfaces allowed the
existing Linux specific uio handling code to be simplified.  When
there's no longer a need to support kernel's older than 4.9, then
it will be possible to remove the iovec and bvec members from the
uio structure and always use a uio_iter.  Until then we need to
maintain all of the existing types for older kernels.

Some additional refactoring and cleanup was included in this change:

- Added checks to configure to detect available iov_iter interfaces.
  Some are available all the way back to the 3.10 kernel and are used
  when available.  In particular, uio_prefaultpages() now always uses
  iov_iter_fault_in_readable() which is available for all supported
  kernels.

- The unused UIO_USERISPACE type has been removed.  It is no longer
  needed now that the uio_seg enum is platform specific.

- Moved zfs_uio.c from the zcommon.ko module to the Linux specific
  platform code for the zfs.ko module.  This gets it out of libzfs
  where it was never needed and keeps this Linux specific code out
  of the common sources.

- Removed unnecessary O_APPEND handling from zfs_iter_write(), this
  is redundant and O_APPEND is already handled in zfs_write();

NOTE: Cleanly applying this kernel compatibility change required
applying the following commits.  This makes the change larger than
it absolutely needs to be, but the resulting code matches what's in
the branch branch.  This is both more tested and makes it easier to
apply any future backports in this area.

7cf4cd824 Remove incorrect assertion
783be694f Reduce confusion in zfs_write
af5626ac2 Return EFAULT at the end of zfs_write() when set
cc1f85be8 Simplify offset and length limit in zfs_write
9585538d0 Const some unchanging variables in zfs_write
86e74dc16 Remove redundant oid parameter to update_pages
b3d723fb0 Factor uid, gid, and projid out of loop in zfs_write
3d40b6554 Share zfs_fsync, zfs_read, zfs_write, et al between Linux and FreeBSD

Reviewed-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11351

3 years agoRemove incorrect assertion
Brian Behlendorf [Tue, 24 Nov 2020 17:28:42 +0000 (09:28 -0800)]
Remove incorrect assertion

Commit 85703f6 added a new ASSERT to zfs_write() as part of the
cleanup which isn't correct in the case where multiple processes
are concurrently extending a file.  The `zp->z_size` is updated
atomically while holding a range lock on only a portion of the
file.  Therefore, it's possible for the file size to increase
after a same check is performed earlier in the loop causing this
ASSERT to fail.  The code itself handles this case correctly so
only the invalid ASSERT needs to be removed.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11235

3 years agoReduce confusion in zfs_write
Ryan Moeller [Wed, 18 Nov 2020 23:06:59 +0000 (18:06 -0500)]
Reduce confusion in zfs_write

Is this block when abuf != NULL ever reached? Yes, it is.

Add asserts and comments to prove that when we get here, we have a full
block write at an aligned offset extending past EOF.

Simplify by removing the check that tx_bytes == max_blksz, since we can
assert that it is always true.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11191

3 years agoReturn EFAULT at the end of zfs_write() when set
Ryan Moeller [Sat, 14 Nov 2020 18:16:26 +0000 (13:16 -0500)]
Return EFAULT at the end of zfs_write() when set

FreeBSD's VFS expects EFAULT from zfs_write() if we didn't complete
the full write so it can retry the operation.  Add some missing
SET_ERRORs in zfs_write().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11193

3 years agoSimplify offset and length limit in zfs_write
Ryan Moeller [Mon, 9 Nov 2020 21:01:56 +0000 (16:01 -0500)]
Simplify offset and length limit in zfs_write

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176

3 years agoConst some unchanging variables in zfs_write
Ryan Moeller [Wed, 4 Nov 2020 23:10:12 +0000 (23:10 +0000)]
Const some unchanging variables in zfs_write

Show that these values will not be changing later.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176

3 years agoRemove redundant oid parameter to update_pages
Ryan Moeller [Wed, 4 Nov 2020 21:47:14 +0000 (21:47 +0000)]
Remove redundant oid parameter to update_pages

The oid comes from the znode we are already passing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176

3 years agoFactor uid, gid, and projid out of loop in zfs_write
Ryan Moeller [Wed, 4 Nov 2020 22:10:13 +0000 (22:10 +0000)]
Factor uid, gid, and projid out of loop in zfs_write

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176

3 years agoShare zfs_fsync, zfs_read, zfs_write, et al between Linux and FreeBSD
Matthew Macy [Wed, 21 Oct 2020 21:08:06 +0000 (14:08 -0700)]
Share zfs_fsync, zfs_read, zfs_write, et al between Linux and FreeBSD

The zfs_fsync, zfs_read, and zfs_write function are almost identical
between Linux and FreeBSD.  With a little refactoring they can be
moved to the common code which is what is done by this commit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11078

3 years agoZTS: Simplify zpool_initialize_verify_initialized
Brian Behlendorf [Fri, 18 Dec 2020 16:42:59 +0000 (08:42 -0800)]
ZTS: Simplify zpool_initialize_verify_initialized

Consider the test to be a success as long as the initializing pattern
is found at least once per metaslab.  This indicates that at least
part of the free space was initialized.  Ideally we'd check that the
pattern was written to all free space but that's much trickier so this
check is a reasonable compromise.

Using a here-string to feed the loop in this test causes an empty
string to still trigger the loop so we miss the `spacemaps=0` case.
Pipe into the loop instead.

While here, we can use `zpool wait -t initialize $TESTPOOL` to wait for
the pool to initialize.

Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11365

3 years agospecial device removal space accounting fixes
Matthew Ahrens [Thu, 17 Dec 2020 20:11:56 +0000 (12:11 -0800)]
special device removal space accounting fixes

The space in special devices is not included in spa_dspace (or
dsl_pool_adjustedsize(), or the zfs `available` property).  Therefore
there is always at least as much free space in the normal class, as
there is allocated in the special class(es).  And therefore, there is
always enough free space to remove a special device.

However, the checks for free space when removing special devices did not
take this into account.  This commit corrects that.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11329

3 years agoUse the correct return type for getopt
sterlingjensen [Thu, 17 Dec 2020 18:19:30 +0000 (12:19 -0600)]
Use the correct return type for getopt

Use the correct return type for getopt otherwise clang complains
about tautological-constant-out-of-range-compare.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11359

3 years agoDKMS: Disable weak modules
gregory-lee-bartholomew [Tue, 15 Dec 2020 17:22:30 +0000 (11:22 -0600)]
DKMS: Disable weak modules

Fedora does not guarantee a stable kABI, so weak modules should be dis-
abled. See the dkms man page for a more detailed explanation of the weak
module feature.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #9891
Closes #11128
Closes #11242
Closes #11335

3 years agolua: avoid gcc -Wreturn-local-addr bug
Ryan Libby [Tue, 15 Dec 2020 17:20:48 +0000 (09:20 -0800)]
lua: avoid gcc -Wreturn-local-addr bug

Avoid a bug with gcc's -Wreturn-local-addr warning with some
obfuscation.  In buggy versions of gcc, if a return value is an
expression that involves the address of a local variable, and even if
that address is legally converted to a non-pointer type, a warning may
be emitted and the value of the address may be replaced with zero.
Howerver, buggy versions don't emit the warning or replace the value
when simply returning a local variable of non-pointer type.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90737

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11337

3 years agospa: avoid type narrowing warning
Ryan Libby [Tue, 15 Dec 2020 17:20:06 +0000 (09:20 -0800)]
spa: avoid type narrowing warning

Building the spa module for i386 caused gcc to emit
-Wint-to-pointer-cast "cast to pointer from integer of different size"
because spa.spa_did was uint64_t but pthread_join (via thread_join in
spa_deactivate) takes a pointer (32-bit on i386).  Define spa_did to be
pointer-size instead.  For now spa_did is in fact never non-zero and the
thread_join could instead be ifdef'd out, but changing the size of
spa_did may be more useful for the future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11336

3 years agoFreeBSD libzfs: gcc requires __thread after static
Ryan Libby [Mon, 14 Dec 2020 17:28:24 +0000 (09:28 -0800)]
FreeBSD libzfs: gcc requires __thread after static

Building libzfs with gcc on FreeBSD failed because gcc is picky about
the order of keywords in declarations with __thread, whereas clang is
more relaxed.

https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11331

3 years agoFix reporting of CKSUM errors in indirect vdevs
George Amanakis [Fri, 11 Dec 2020 20:15:37 +0000 (21:15 +0100)]
Fix reporting of CKSUM errors in indirect vdevs

When removing and subsequently reattaching a vdev, CKSUM errors may
occur as vdev_indirect_read_all() reads from all children of a mirror
in case of a resilver.

Fix this by checking whether a child is missing the data and setting a
flag (ic_error) which is then checked in vdev_indirect_repair() and
suppresses incrementing the checksum counter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11277

3 years agoarc_summary3: Handle overflowing value width
Ryan Moeller [Tue, 8 Dec 2020 20:20:25 +0000 (20:20 +0000)]
arc_summary3: Handle overflowing value width

Some tunables shown by arc_summary3 have string values that may exceed
the normal line length, leaving a negative offset between the name and
value fields.  The negative space is of course not valid and Python
rightly barfs up an exception traceback.

Handle an overflowing value field width by ignoring the line length
and separating the name from the value by a single space instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11270

3 years agoFreeBSD: Implement sysctl for fletcher4 impl
Ryan Moeller [Wed, 2 Dec 2020 21:45:08 +0000 (21:45 +0000)]
FreeBSD: Implement sysctl for fletcher4 impl

There is a tunable to select the fletcher 4 checksum implementation on
Linux but it was not present in FreeBSD.

Implement the sysctl handler for FreeBSD and use ZFS_MODULE_PARAM_CALL
to provide the tunable on both platforms.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11270

3 years agoFix kernel panic induced by redacted send
Paul Dagnelie [Fri, 11 Dec 2020 18:22:29 +0000 (10:22 -0800)]
Fix kernel panic induced by redacted send

In the redaction list traversal code, there is a bug in the binary search
logic when looking for the resume point. Maxbufid can be decremented to -1,
causing us to read the last possible block of the object instead of the one we
wanted. This can cause incorrect resume behavior, or possibly even a hang in
some cases. In addition, when examining non-last blocks, we can treat the
block as being the same size as the last block, causing us to miss entries in
the redaction list when determining where to resume. Finally, we were ignoring
the case where the resume point was found in the buffer being searched, and
resuming from minbufid. All these issues have been corrected, and the code has
been significantly simplified to make future issues less likely.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11297

3 years agoFreeBSD: Fix format of vfs.zfs.arc_no_grow_shift
Ryan Moeller [Tue, 8 Dec 2020 17:21:36 +0000 (17:21 +0000)]
FreeBSD: Fix format of vfs.zfs.arc_no_grow_shift

vfs.zfs.arc_no_grow_shift has an invalid type (15) and this causes
py-sysctl to format it as a bytearray when it should be an integer.

"U" is not a valid format, it should be "I" and the type should match
the variable type, int.  We can return EINVAL if the value is set below
zero.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11318

3 years agoFreeBSD: Update usage of py-sysctl
Ryan Moeller [Tue, 8 Dec 2020 17:02:16 +0000 (17:02 +0000)]
FreeBSD: Update usage of py-sysctl

py-sysctl now includes the CTLTYPE_NODE type nodes in the list returned
by sysctl.filter() on FreeBSD head.  It also provides descriptions now.

Eliminate the subprocess call to get descriptions, and filter out the
nodes so we only deal with values.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11318

3 years agoFix possibly uninitialized 'root_inode' variable warning
Brian Behlendorf [Thu, 10 Dec 2020 23:23:26 +0000 (15:23 -0800)]
Fix possibly uninitialized 'root_inode' variable warning

Resolve an uninitialized variable warning when compiling.

    In function ‘zfs_domount’:
    warning: ‘root_inode’ may be used uninitialized in this
        function [-Wmaybe-uninitialized]
    sb->s_root = d_make_root(root_inode);

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11306

3 years agoCI: add zloop workflow
George Melikov [Tue, 8 Dec 2020 18:40:44 +0000 (21:40 +0300)]
CI: add zloop workflow

Run ztest via zloop for 20 minutes, total run time is ~30 minutes.

Signed-off-by: George Melikov <mail@gmelikov.ru>
3 years agoFreeBSD: Do zcommon_init sooner to avoid FPU panic
Ryan Moeller [Thu, 10 Dec 2020 05:29:00 +0000 (00:29 -0500)]
FreeBSD: Do zcommon_init sooner to avoid FPU panic

There has been a panic affecting some system configurations where the
thread FPU context is disturbed during the fletcher 4 benchmarks,
leading to a panic at boot.

module_init() registers zcommon_init to run in the last subsystem
(SI_SUB_LAST).  Running it as soon as interrupts have been configured
(SI_SUB_INT_CONFIG_HOOKS) makes sure we have finished the benchmarks
before we start doing other things.

While it's not clear *how* the FPU context was being disturbed, this
does seem to avoid it.

Add a module_init_early() macro to run zcommon_init() at this earlier
point on FreeBSD.  On Linux this is defined as module_init().

Authored by: Konstantin Belousov <kib@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11302

3 years agomount_zfs: print strerror instead of errno for error reporting
Érico Nogueira Rolim [Thu, 10 Dec 2020 05:24:59 +0000 (02:24 -0300)]
mount_zfs: print strerror instead of errno for error reporting

Tracking down an error message with the errno value can be difficult,
using strerror makes the error message clearer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11303

3 years agoDrop path prefix workaround
sterlingjensen [Thu, 10 Dec 2020 05:24:26 +0000 (23:24 -0600)]
Drop path prefix workaround

Canonicalization, the source of the trouble, was disabled in 9000a9f.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11295

3 years agoDelete rw_semaphore.wait_lock configure check
Orivej Desh [Thu, 10 Dec 2020 05:22:54 +0000 (05:22 +0000)]
Delete rw_semaphore.wait_lock configure check

Last use of wait_lock was removed in "Linux 5.3 compat: retire
rw_tryupgrade()" (e7a99dab2b065ac2f8736a65d1b226d21754d771).

Fixes the issue reported in
https://github.com/openzfs/zfs/issues/11097#issuecomment-714532367

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Orivej Desh <orivej@gmx.fr>
Closes #11309

3 years agoFix optional "force" arg handing in zfs_ioc_pool_sync()
Brian Behlendorf [Wed, 9 Dec 2020 22:52:45 +0000 (14:52 -0800)]
Fix optional "force" arg handing in zfs_ioc_pool_sync()

The fnvlist_lookup_boolean_value() function should not be used
to check the force argument since it's optional.  It may not be
provided or may have been created with the wrong flags.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11281
Closes #11284

3 years agoCI: add new zfs-tests-sanity workflow
George Melikov [Tue, 8 Dec 2020 17:53:45 +0000 (20:53 +0300)]
CI: add new zfs-tests-sanity workflow

Run zfs-tests with sanity.run for brief results.  Timeouts
are rare, so minimize false positives by increasing the
default from 60 to 180 seconds.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11304

3 years agoZTS: zpool_trim tests throttle trim process
George Melikov [Mon, 7 Dec 2020 18:06:10 +0000 (21:06 +0300)]
ZTS: zpool_trim tests throttle trim process

Otherwise trim may finish before progress checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11296

3 years agoReduce fletcher4 and raidz benchmark times
Brian Behlendorf [Sun, 6 Dec 2020 17:57:20 +0000 (09:57 -0800)]
Reduce fletcher4 and raidz benchmark times

During module load time all of the available fetcher4 and raidz
implementations are benchmarked for a fixed amount of time to
determine the fastest available.  Manual testing has shown that this
time can be significantly reduced with negligible effect on the final
results.

This commit changes the benchmark time to 1ms which can reduce the
module load time by over a second on x86_64.  On an x86_64 system
with sse3, ssse3, and avx2 instructions the benchmark times are:

    Fletcher4    603ms   -> 15ms
    RAIDZ        1,322ms -> 64ms

Reviewed-by: Matthew Macy <mmacy@freebsd.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11282

3 years agoZTS: adjust zpool_import_012_pos timeout
Brian Behlendorf [Sun, 6 Dec 2020 17:48:36 +0000 (09:48 -0800)]
ZTS: adjust zpool_import_012_pos timeout

When running in the CI the zpool_import_012_pos test case occasionally
takes longer than the maximum 600 seconds.  When this happens the test
case is considered to have failed but always completes a few minutes
latter.  Since the logs suggest nothing has actually failed this commit
increases timeout and removes the exception.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11286

3 years agoZTS: Update zfs_share_concurrent_shares.ksh
Brian Behlendorf [Sun, 6 Dec 2020 17:47:33 +0000 (09:47 -0800)]
ZTS: Update zfs_share_concurrent_shares.ksh

Occasionally an out of memory error is hit by this test case
when mounting the filesystems.  Try and reduce the likelihood
of this occurring by reducing the thread count from 100 to 50.
It also has the advantage of slightly speeding up the test.

    cannot mount 'testpool/testfs3/79': Cannot allocate memory
        filesystem successfully created, but not mounted

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11283

3 years agoAdd sanity.run file
Brian Behlendorf [Thu, 3 Dec 2020 18:49:39 +0000 (10:49 -0800)]
Add sanity.run file

This run file contains a subset of functional tests which exercise
as much functionality as possible while still executing relatively
quickly.  The included tests should take no more than a few seconds
each to run at most.  This provides a convenient way to sanity test a
change before committing to a full test run which takes several hours.

    $ ./scripts/zfs-tests.sh -r sanity
    ...
    Results Summary
    PASS  813

    Running Time: 00:14:42
    Percent passed: 100.0%

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11271

3 years agoFix trivial typo in zfs-diff.8
melak [Thu, 3 Dec 2020 18:18:26 +0000 (19:18 +0100)]
Fix trivial typo in zfs-diff.8

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tamas TEVESZ <ice@extreme.hu>
Closes #11268
Closes #11272

3 years agoFix for "Reduce latency effects of non-interactive I/O"
Alexander Motin [Thu, 3 Dec 2020 18:02:39 +0000 (13:02 -0500)]
Fix for "Reduce latency effects of non-interactive I/O"

It was found that setting min_active tunables for non-interactive I/Os
makes them stuck.  It is caused by zfs_vdev_nia_delay, that can never
be reached if we never issue any I/Os due to min_active set to zero.

Fix this by issuing at least one non-interactive I/O at a time when
there are no interactive I/Os.  When there are interactive I/Os, zero
min_active allows to completely block any non-interactive I/O.  It may
min_active starvation in some scenarios, but who we are to deny foot
shooting?

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11261

3 years agoReduce latency effects of non-interactive I/O
Alexander Motin [Tue, 24 Nov 2020 17:26:42 +0000 (12:26 -0500)]
Reduce latency effects of non-interactive I/O

Investigating influence of scrub (especially sequential) on random read
latency I've noticed that on some HDDs single 4KB read may take up to 4
seconds!  Deeper investigation shown that many HDDs heavily prioritize
sequential reads even when those are submitted with queue depth of 1.

This patch addresses the latency from two sides:
 - by using _min_active queue depths for non-interactive requests while
   the interactive request(s) are active and few requests after;
 - by throttling it further if no interactive requests has completed
   while configured amount of non-interactive did.

While there, I've also modified vdev_queue_class_to_issue() to give
more chances to schedule at least _min_active requests to the lowest
priorities.  It should reduce starvation if several non-interactive
processes are running same time with some interactive and I think should
make possible setting of zfs_vdev_max_active to as low as 1.

I've benchmarked this change with 4KB random reads from ZVOL with 16KB
block size on newly written non-fragmented pool.  On fragmented pool I
also saw improvements, but not so dramatic.  Below are log2 histograms
of the random read latency in milliseconds for different devices:

4 2x mirror vdevs of SATA HDD WDC WD20EFRX-68EUZN0 before:
0, 0, 2,  1,  12,  21,  19,  18, 10, 15, 17, 21
after:
0, 0, 0, 24, 101, 195, 419, 250, 47,  4,  0,  0
, that means maximum latency reduction from 2s to 500ms.

4 2x mirror vdevs of SATA HDD WDC WD80EFZX-68UW8N0 before:
0, 0,  2,  31,  38,  28,  18,  12, 17, 20, 24, 10, 3
after:
0, 0, 55, 247, 455, 470, 412, 181, 36,  0,  0,  0, 0
, i.e. from 4s to 250ms.

1 SAS HDD SEAGATE ST14000NM0048 before:
0,  0,  29,   70, 107,   45,  27, 1, 0, 0, 1, 4, 19
after:
1, 29, 681, 1261, 676, 1633,  67, 1, 0, 0, 0, 0,  0
, i.e. from 4s to 125ms.

1 SAS SSD SEAGATE XS3840TE70014 before (microseconds):
0, 0, 0, 0, 0, 0, 0, 0,  70, 18343, 82548, 618
after:
0, 0, 0, 0, 0, 0, 0, 0, 283, 92351, 34844,  90

I've also measured scrub time during the test and on idle pools.  On
idle fragmented pool I've measured scrub getting few percent faster
due to use of QD3 instead of QD2 before.  On idle non-fragmented pool
I've measured no difference.  On busy non-fragmented pool I've measured
scrub time increase about 1.5-1.7x, while IOPS increase reached 5-9x.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11166

3 years agoAdd compatibility for busybox mktemp
qzdanis [Thu, 3 Dec 2020 18:01:16 +0000 (13:01 -0500)]
Add compatibility for busybox mktemp

Busybox's mktemp requires at least six X's in the template, causing
the current sed --in-place check to fail because the file does not
exist. This change adds additional X's to mktemp templates that do
not already have at least six X's in them.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Quentin Zdanis <zdanisq@gmail.com>
Closes #11269

3 years agoFreeBSD: notify userspace when a vdev is removed
Ryan Moeller [Wed, 2 Dec 2020 18:20:02 +0000 (13:20 -0500)]
FreeBSD: notify userspace when a vdev is removed

This is needed for zfsd to autoreplace vdevs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11260

3 years agoMake zpool status "remove:" label print in bold
Andrew Sun [Tue, 1 Dec 2020 23:22:51 +0000 (18:22 -0500)]
Make zpool status "remove:" label print in bold

When ZFS_COLOR is set, zpool status shows row headings in bold,
except for the "remove:" heading. This is a quick fix that makes
it print in bold too.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Sun <me@andrewsun.com>
Closes #11255

3 years agoCI: simplify checkstyle runner
George Melikov [Tue, 1 Dec 2020 20:15:55 +0000 (23:15 +0300)]
CI: simplify checkstyle runner

Remove excess steps.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11262

3 years agoZED/zfs-list-cacher.sh: don't exit on ignored event type
Prawn [Sun, 20 Dec 2020 02:01:21 +0000 (03:01 +0100)]
ZED/zfs-list-cacher.sh: don't exit on ignored event type

Check for the history_event type instead.

The zfs-list-cacher.sh script currently respects the event types
excluded from syslog(!) in ZED_SYSLOG_SUBCLASS_EXCLUDE.
This makes little sense in this single-purpose script and
silently breaks when history_events are excluded from syslog,
which is the default since 13d65987a9d9958de77422f5d9d25b47e486537d.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #11164
Closes #11347

3 years agoTag 2.0.0 zfs-2.0.0
Brian Behlendorf [Mon, 30 Nov 2020 18:13:06 +0000 (10:13 -0800)]
Tag 2.0.0

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
3 years agoVerify zfs module loaded before starting services
Brian Behlendorf [Sat, 28 Nov 2020 19:11:18 +0000 (11:11 -0800)]
Verify zfs module loaded before starting services

Extend the change made in ae12b02 to verify the zfs kernel
modules are loaded to the rest of the OpenZFS services.  If
the modules aren't loaded the neither the share, volume, or
and zed services can be started.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11243

3 years agodracut: use /bin/sh instead of bash as the intepreter
Đoàn Trần Công Danh [Sat, 28 Nov 2020 19:02:08 +0000 (02:02 +0700)]
dracut: use /bin/sh instead of bash as the intepreter

Despite that dracut has a hard dependency on bash,
its modules doesn't, dracut only has a hard dependency on bash for
module-setup (on a fully usable machine). Inside initramfs, dracut
allows users choose from a list of handful other shells, e.g. bash,
busybox, dash, mkfsh.

In fact, my local machine's initramfs is being built with dash,
and it's functional for a very long time.

Before 64025fa3a (Silence 'make checkbashisms', 2020-08-20), we also
allows our users to have that right, too.

Let's fix the problem 'make checkbashisms' reported and allows our users
to have that right, again.

For 'plymouth' case, let's simply run the command inside the if instead
of checking for the existence of command before running it, because the
status is also failture if plymouth is unavailable.

While we're at it, let's remove an unnecessary fork for grep in
zfs-generator.sh.in and its following complicated 'if elif fi' with
a simple 'case ... esac'.

To support this change, also exclude 90zfs from "make checkbashisms"
because the current CI infrastructure ships an old version of
"checkbashisms", which complains about "command -v", while the current
latest "checkbashisms" thinks it's fine. In the near future, we can
revert that change to "Makefile.am" when CI infrastructure is updated.

Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Closes #11244

3 years agoRevert "Reduce latency effects of non-interactive I/O"
Brian Behlendorf [Mon, 30 Nov 2020 17:38:15 +0000 (09:38 -0800)]
Revert "Reduce latency effects of non-interactive I/O"

Under certain conditions commit a3a4b8def appears to result in a
hang, or poor performance, when importing a pool.  Until the root
cause can be identified it has been reverted from the release branch.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11245

3 years agoTag 2.0.0-rc7
Brian Behlendorf [Wed, 25 Nov 2020 17:18:18 +0000 (09:18 -0800)]
Tag 2.0.0-rc7

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
3 years agoReduce latency effects of non-interactive I/O
Alexander Motin [Tue, 24 Nov 2020 17:26:42 +0000 (12:26 -0500)]
Reduce latency effects of non-interactive I/O

Investigating influence of scrub (especially sequential) on random read
latency I've noticed that on some HDDs single 4KB read may take up to 4
seconds!  Deeper investigation shown that many HDDs heavily prioritize
sequential reads even when those are submitted with queue depth of 1.

This patch addresses the latency from two sides:
 - by using _min_active queue depths for non-interactive requests while
   the interactive request(s) are active and few requests after;
 - by throttling it further if no interactive requests has completed
   while configured amount of non-interactive did.

While there, I've also modified vdev_queue_class_to_issue() to give
more chances to schedule at least _min_active requests to the lowest
priorities.  It should reduce starvation if several non-interactive
processes are running same time with some interactive and I think should
make possible setting of zfs_vdev_max_active to as low as 1.

I've benchmarked this change with 4KB random reads from ZVOL with 16KB
block size on newly written non-fragmented pool.  On fragmented pool I
also saw improvements, but not so dramatic.  Below are log2 histograms
of the random read latency in milliseconds for different devices:

4 2x mirror vdevs of SATA HDD WDC WD20EFRX-68EUZN0 before:
0, 0, 2,  1,  12,  21,  19,  18, 10, 15, 17, 21
after:
0, 0, 0, 24, 101, 195, 419, 250, 47,  4,  0,  0
, that means maximum latency reduction from 2s to 500ms.

4 2x mirror vdevs of SATA HDD WDC WD80EFZX-68UW8N0 before:
0, 0,  2,  31,  38,  28,  18,  12, 17, 20, 24, 10, 3
after:
0, 0, 55, 247, 455, 470, 412, 181, 36,  0,  0,  0, 0
, i.e. from 4s to 250ms.

1 SAS HDD SEAGATE ST14000NM0048 before:
0,  0,  29,   70, 107,   45,  27, 1, 0, 0, 1, 4, 19
after:
1, 29, 681, 1261, 676, 1633,  67, 1, 0, 0, 0, 0,  0
, i.e. from 4s to 125ms.

1 SAS SSD SEAGATE XS3840TE70014 before (microseconds):
0, 0, 0, 0, 0, 0, 0, 0,  70, 18343, 82548, 618
after:
0, 0, 0, 0, 0, 0, 0, 0, 283, 92351, 34844,  90

I've also measured scrub time during the test and on idle pools.  On
idle fragmented pool I've measured scrub getting few percent faster
due to use of QD3 instead of QD2 before.  On idle non-fragmented pool
I've measured no difference.  On busy non-fragmented pool I've measured
scrub time increase about 1.5-1.7x, while IOPS increase reached 5-9x.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11166

3 years agopam_zfs_key: accommodate different dataset naming scheme
cragw [Sun, 22 Nov 2020 17:32:34 +0000 (01:32 +0800)]
pam_zfs_key: accommodate different dataset naming scheme

Name of dataset for user home directory may vary from the expected
$homes_prefix/$username, if different naming scheme is being used.

We can use property mountpoint to specify the dataset for $username
as long as its value is identical to passwd's pw_dir.

For example:
    NAME                       PROPERTY     VALUE
    rpool/home/myuser_123456   mountpoint   /home/myuser

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Crag Wang <crag0715@gmail.com>
Closes #11165

3 years agoFreeBSD: decouple ZFS_DEBUG from kernel debug settings
Matthew Macy [Tue, 24 Nov 2020 17:16:46 +0000 (09:16 -0800)]
FreeBSD: decouple ZFS_DEBUG from kernel debug settings

Reviewed-by: Martelli Nikola @martellini
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11213

3 years agoObsolete earlier packages due to version bump
Brian Behlendorf [Tue, 24 Nov 2020 17:24:24 +0000 (09:24 -0800)]
Obsolete earlier packages due to version bump

In order for package managers such as dnf to upgrade cleanly after
the package SONAME bump the obsolete package names must be known.
Update the new packages to correctly obsolete the old ones.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11230
Closes #11233

3 years agolibzfsbootenv: do not depend on libnvpair
Antonio Russo [Sun, 22 Nov 2020 23:16:42 +0000 (16:16 -0700)]
libzfsbootenv: do not depend on libnvpair

We do not build libnvpair.pc.  Moreover, it is automatically pulled in
by libzfs.pc, so no additional specific dependency is required.

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11227

3 years agoInclude the ABI with dist tarball
Brian Behlendorf [Sat, 21 Nov 2020 18:44:52 +0000 (10:44 -0800)]
Include the ABI with dist tarball

The ABI should be included when generating the `make dist` tarball
since it's required by the `make checkabi` target.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11225

3 years agoCorrect missing zil_claim() DTL updates
Brian Behlendorf [Fri, 20 Nov 2020 21:14:45 +0000 (13:14 -0800)]
Correct missing zil_claim() DTL updates

Commit a1d477c2 accidentally disabled DTL updates for the zil_claim()
case described at the end of vdev_stat_update() by unconditionally
disabling all DTL updates when loading.  This was done to avoid
a deadlock on the vd_dtl_lock when loading the DTLs from disk.

    vdev_dtl_contains <--- Takes vd->vd_dtl_lock
    vdev_mirror_child_missing
    vdev_mirror_io_start
    zio_vdev_io_start
    __zio_execute
    arc_read
    dbuf_issue_final_prefetch
    dbuf_prefetch_impl
    dbuf_prefetch
    dmu_prefetch
    space_map_iterate
    space_map_load_length
    space_map_load
    vdev_dtl_load <--- Takes vd->vd_dtl_lock
    vdev_load
    spa_ld_load_vdev_metadata
    spa_tryimport

The missing DTL updates can be restored by moving the space_map_load()
call outside the vd_dtl_lock.  A private range tree is populated by
reading the space map and then merged in to the DTL_MISSING tree
under the lock.

Furthermore, the SPA_LOAD_NONE check in vdev_dtl_contains() leads to an
additional problem.  Any resilvering which occurs before SPA_LOAD_NONE
is set will incorrectly determine that there's nothing to repair.  This
can result in full redundancy not being restored for some blocks.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11218

3 years agoTrack SONAME version bump in packaging
Antonio Russo [Fri, 20 Nov 2020 00:25:24 +0000 (17:25 -0700)]
Track SONAME version bump in packaging

RPM and DEB packages are named after the SONAME version of the library
they contain.  After bumping this version, the packaging should be
renamed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11219

3 years agoEnable ABI checks for the checkstyle workflow
Brian Behlendorf [Wed, 18 Nov 2020 17:32:49 +0000 (09:32 -0800)]
Enable ABI checks for the checkstyle workflow

For the OpenZFS 2.0 release branch extend the CI checkstyle
workflow to perform the library ABI checks.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11215

3 years agoAdd ABI snapshot
Brian Behlendorf [Tue, 17 Nov 2020 20:48:53 +0000 (20:48 +0000)]
Add ABI snapshot

Add a snapshot of the OpenZFS 2.0 ABI using libabigail-1.7-2.
The included ABI passes `make checkabi` for CentOS 7, Fedora 33,
Debian 10, and Ubuntu 20.04.  This covers a fairly wide range
of glibc, gcc, and libabigail versions plus other changes which
are platform specific.

Reviewed-by: Antonio Russo <aerusso@aerusso.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11144

3 years agoLibrary ABI tracking with abigail
Antonio Russo [Sun, 15 Nov 2020 04:35:31 +0000 (21:35 -0700)]
Library ABI tracking with abigail

Provide two make targets: checkabi and storeabi.

storeabi uses libabigail to generate a reference copy of the ABI for the
public libraries.

checkabi compares such a reference to the compiled version, failing if
they are not compatible.  No ABI is generated for libzpool.so, it is
only used by ztest and zdb and not external consumers.

Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11144

3 years agoFix problems in zvol_set_volmode_impl
Matthew Macy [Tue, 17 Nov 2020 17:50:52 +0000 (09:50 -0800)]
Fix problems in zvol_set_volmode_impl

- Don't leave fstrans set when passed a snapshot
- Don't remove minor if volmode already matches new value
- (FreeBSD) Wait for GEOM ops to complete before trying
  remove (at create time GEOM will be "tasting" in parallel)
- (FreeBSD) Don't leak zvol_state_lock on open if zv == NULL
- (FreeBSD) Don't try to unlock zv->zv_state lock if zv == NULL

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11199

3 years agozpool: correctly align columns with -p
наб [Fri, 13 Nov 2020 22:38:29 +0000 (23:38 +0100)]
zpool: correctly align columns with -p

zpool_expand_proplist() now ignores pl_fixed if its new literal
argument is true.  The rest is a consequence of needing to pass
that down.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiao?=~Dska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11202

3 years agozpool(8): fix pool-wi[sd]e typo
наб [Fri, 13 Nov 2020 22:53:37 +0000 (23:53 +0100)]
zpool(8): fix pool-wi[sd]e typo

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11202

3 years agoFix 'zfs userspace' for received datasets in encrypted root
loli10K [Mon, 16 Nov 2020 17:10:29 +0000 (18:10 +0100)]
Fix 'zfs userspace' for received datasets in encrypted root

For encrypted receives, where user accounting is initially disabled on
creation, both 'zfs userspace' and 'zfs groupspace' fails with
EOPNOTSUPP: this is because dmu_objset_id_quota_upgrade_cb() forgets to
set OBJSET_FLAG_USERACCOUNTING_COMPLETE on the objset flags after a
successful dmu_objset_space_upgrade().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9501
Closes #9596

3 years agoFix ASSERT logic in l2arc_evict()
George Amanakis [Mon, 16 Nov 2020 17:08:11 +0000 (18:08 +0100)]
Fix ASSERT logic in l2arc_evict()

In case of cache device removal it is possible that at the end of
l2arc_evict() we have l2ad_hand = l2ad_evict. This can lead to the
following panic in case of a debug build:

VERIFY3(dev->l2ad_hand < dev->l2ad_evict) failed (321920512 < 321920512)
Call Trace:
 dump_stack+0x66/0x90
 spl_panic+0xef/0x117 [spl]
 l2arc_remove_vdev+0x11d/0x290 [zfs]
 spa_load_l2cache+0x275/0x5b0 [zfs]
 spa_vdev_remove+0x4a5/0x6e0 [zfs]
 zfs_ioc_vdev_remove+0x59/0xa0 [zfs]
 zfsdev_ioctl_common+0x5b3/0x630 [zfs]
 zfsdev_ioctl+0x53/0xe0 [zfs]
 do_vfs_ioctl+0x42e/0x6b0
 ksys_ioctl+0x5e/0x90
 do_syscall_64+0x5b/0x1a0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

In case of cache device removal it also possible that l2ad_hand +
distance > l2ad_end since we do not iterate l2arc_evict() and l2ad_hand
is not reset. This has no functional consequence however as the cache
device is about to be removed.

Fix this by omitting the ASSERT in case of device removal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11205

3 years agoconfig/dracut/90zfs: handle cases where hostid(1) returns all zeros
Érico Rolim [Fri, 13 Nov 2020 03:00:59 +0000 (00:00 -0300)]
config/dracut/90zfs: handle cases where hostid(1) returns all zeros

On systems with musl libc, hostid(1) always prints "00000000", which
will cause improper behavior when the 90zfs module is configured in a
dracut initramfs. Work around this by copying the host /etc/hostid if
the file exists, and otherwise only write /etc/hostid if hostid(1)
returns something meaningful. This avoids zgenhostid creating a random
/etc/hostid for the initramfs, which could lead to errors when trying to
import the pool if spl_hostid isn't defined in the kernel command line.

Furthermore, tag the /etc/hostid file as hostonly, since it is system
specific and shouldn't be taken into account when trying to use an
initramfs generated in one system to boot into a different system.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Co-authored-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11174
Closes #11189

3 years agozgenhostid: accept hostid arguments equal to zero.
Érico Rolim [Tue, 10 Nov 2020 14:22:27 +0000 (11:22 -0300)]
zgenhostid: accept hostid arguments equal to zero.

A common usage pattern for zgenhostid, including in the ZFS dracut
module, is running it as:

  zgenhostid $(hostid)

However, zgenhostid only accepted hostid arguments greater than 0, which
meant that, when the output of hostid(1) was "00000000", zgenhostid
would error out, even though 0 is a possible return value for the
gethostid(3) function used by hostid(1):

- On current musl libc, gethostid(3) is a stub that always returns 0.
- On glibc, gethostid(3) will return 0 if /etc/hostid exists but is
  smaller than 4 bytes.

In these cases, it makes more sense for zgenhostid to treat a value of 0
as other parts of the zfs codebase do, meaning that a hostid value
couldn't be determined; therefore, it should attempt to generate a
random value to write into /etc/hostid.

The manpage and usage output have been updated to reflect this.

Whitespace has also been fixed in the usage output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11174
Closes #11189

3 years agoLinux: Fix ZFS_ENTER/ZFS_EXIT/ZFS_VERFY_ZP usage
Brian Behlendorf [Sat, 14 Nov 2020 18:19:00 +0000 (10:19 -0800)]
Linux: Fix ZFS_ENTER/ZFS_EXIT/ZFS_VERFY_ZP usage

The ZFS_ENTER/ZFS_EXIT/ZFS_VERFY_ZP macros should not be used
in the Linux zpl_*.c source files.  They return a positive error
value which is correct for the common code, but not for the Linux
specific kernel code which expects a negative return value.  The
ZPL_ENTER/ZPL_EXIT/ZPL_VERFY_ZP macros should be used instead.

Furthermore, the ZPL_EXIT macro has been updated to not call the
zfs_exit_fs() function.  This prevents a possible deadlock which
can occur when a snapshot is automatically unmounted because the
zpl_show_devname() must never wait on in progress automatic
snapshot unmounts.

Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11169
Closes #11201

3 years agoAssertion failure when logging large output of channel program
Matthew Ahrens [Sat, 14 Nov 2020 18:17:16 +0000 (10:17 -0800)]
Assertion failure when logging large output of channel program

The output of ZFS channel programs is logged on-disk in the zpool
history, and printed by `zpool history -i`.  Channel programs can use
10MB of memory by default, and up to 100MB by using the `zfs program -m`
flag.  Therefore their output can be up to some fraction of 100MB.

In addition to being somewhat wasteful of the limited space reserved for
the pool history (which for large pools is 1GB), in extreme cases this
can result in a failure of `ASSERT(length <= DMU_MAX_ACCESS);` in
`dmu_buf_hold_array_by_dnode()`.

This commit limits the output size that will be logged to 1MB.  Larger
outputs will not be logged, instead a entry will be logged indicating
the size of the omitted output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11194

3 years agoTag 2.0.0-rc6
Brian Behlendorf [Thu, 12 Nov 2020 19:02:58 +0000 (11:02 -0800)]
Tag 2.0.0-rc6

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
3 years agoChannel program may spuriously fail with "memory limit exhausted"
Matthew Ahrens [Thu, 12 Nov 2020 01:16:15 +0000 (17:16 -0800)]
Channel program may spuriously fail with "memory limit exhausted"

ZFS channel programs (invoked by `zfs program`) are executed in a LUA
sandbox with a limit on the amount of memory they can consume.  The
limit is 10MB by default, and can be raised to 100MB with the `-m` flag.
If the memory limit is exceeded, the LUA program exits and the command
fails with a message like `Channel program execution failed: Memory
limit exhausted.`

The LUA sandbox allocates memory with `vmem_alloc(KM_NOSLEEP)`, which
will fail if the requested memory is not immediately available.  In this
case, the program fails with the same message, `Memory limit exhausted`.
However, in this case the specified memory limit has not been reached,
and the memory may only be temporarily unavailable.

This commit changes the LUA memory allocator `zcp_lua_alloc()` to use
`vmem_alloc(KM_SLEEP)`, so that we won't spuriously fail when memory is
temporarily low.  Instead, we rely on the system to be able to free up
memory (e.g. by evicting from the ARC), and we assume that even at the
highest memory limit of 100MB, the channel program will not truly
exhaust the system's memory.

External-issue: DLPX-71924
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11190

3 years agoLinux: Fix mount/unmount when dataset name has a space
Brian Behlendorf [Thu, 12 Nov 2020 01:14:24 +0000 (17:14 -0800)]
Linux: Fix mount/unmount when dataset name has a space

The custom zpl_show_devname() helper should translate spaces in
to the octal escape sequence \040.  The getmntent(2) function
is aware of this convention and properly translates the escape
character back to a space when reading the fsname.

Without this change the `zfs mount` and `zfs unmount` commands
incorrectly detect when a dataset with a name containing spaces
is mounted.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11182
Closes #11187

3 years agoStart snapdir_iterate traversals to begin wtih the value of zero.
Tony Perkins [Mon, 28 Sep 2020 00:46:22 +0000 (20:46 -0400)]
Start snapdir_iterate traversals to begin wtih the value of zero.

The microzap hash can sometimes be zero for single digit snapnames.
The zap cursor can then have a serialized value of two (for . and ..),
and skip the first entry in the avl tree for the .zfs/snapshot directory
listing, and therefore does not return all snapshots.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Cedric Berger <cedric@precidata.com>
Signed-off-by: Tony Perkins <tperkins@datto.com>
Closes #11039

3 years agoG/C data_alloc_arena
Mateusz Guzik [Thu, 12 Nov 2020 01:11:32 +0000 (02:11 +0100)]
G/C data_alloc_arena

It is a leftover from illumos always set to NULL and introducing a
spurious difference between zio_buf and zio_data_buf.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11188

3 years agoG/C struct znode -> z_moved
Mateusz Guzik [Tue, 10 Nov 2020 20:42:47 +0000 (21:42 +0100)]
G/C struct znode -> z_moved

The field is yet another leftover from unsupported zfs_znode_move.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11186

3 years agoFix compiling on FreeBSD + gcc - don't assume illmnos bits
Adrian Chadd [Thu, 15 Oct 2020 20:07:12 +0000 (13:07 -0700)]
Fix compiling on FreeBSD + gcc - don't assume illmnos bits

This looks like it was once from the illumnos compat code.
FreeBSD doesn't have cmn_err as a compiler format attribute, so
it definitely errors out.

It doesn't show up on LLVM because it doesn't trigger at all.

Add in the format flags but keep them behind #if 0 for now;
there are too many format issues that trigger when one does
format checking in the shared code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: adrian chadd <adrian@freebsd.org>
Closes #11068
Closes #11069

3 years agoFix pointer-is-uint64_t-sized assumption in the ioctl path
Adrian Chadd [Thu, 15 Oct 2020 20:02:43 +0000 (13:02 -0700)]
Fix pointer-is-uint64_t-sized assumption in the ioctl path

This shows up when compiling freebsd-head on amd64 using gcc-6.4.
The lib32 compat build ends up tripping over this assumption.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: adrian chadd <adrian@freebsd.org>
Closes #11068
Closes #11069

3 years agoFix memleak in cmd/mount_zfs.c
sterlingjensen [Tue, 10 Nov 2020 23:50:44 +0000 (17:50 -0600)]
Fix memleak in cmd/mount_zfs.c

Convert dynamic allocation to static buffer, simplify parse_dataset
function return path. Add tests specific to the mount helper.

Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11098

3 years agozpoolprops.8: clarify vdev expansion rules
наб [Tue, 10 Nov 2020 20:48:26 +0000 (21:48 +0100)]
zpoolprops.8: clarify vdev expansion rules

Remove reference to EFI(?), explain that the new space
is beyond the GPT for whole-disk vdevs, and add section noting how it
behaves with partition vdevs in terms of how the user is most likely to
encounter it ‒ the previous phrasing was confusing
and seemed to indicate that "zpool online -e" will be able to claim

  GPT[whatever, ZFS, free space, whatever]

into

  GPT[whatever, ZFS, whatever]
but that's not the case, as it'll only be able to do so after manually
resizing the ZFS partition to include the free space beforehand, i.e.:
  GPT[whatever, ZFS, free space, whatever]
  GPT[whatever, [ZFS + free space], potentially left-overs, whatever]
  # zpool online -e
  GPT[whatever, ZFS, whatever]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11158

3 years agoinitramfs: zfsunlock hook breaks /usr/bin
Pavel Zakharov [Tue, 10 Nov 2020 19:12:07 +0000 (14:12 -0500)]
initramfs: zfsunlock hook breaks /usr/bin

The copy_exec() function expects that the full path of the target
file is passed rather than just the directory, and will take care
of creating the underlying directories if they don't exist.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Closes #11162

3 years agoFreeBSD: Simplify zvol_geom_open and zvol_cdev_open
Ryan Moeller [Fri, 6 Nov 2020 18:56:58 +0000 (13:56 -0500)]
FreeBSD: Simplify zvol_geom_open and zvol_cdev_open

We can consolidate the unlocking procedure into one place by starting
with drop_suspend set to B_FALSE and moving the open count check up.

While here, a little code cleanup. Match the out labels between
zvol_geom_open and zvol_cdev_open, and add a missing period in some
comments.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11175

3 years agoFreeBSD: Avoid spurious EINTR in zvol_cdev_open
Ryan Moeller [Fri, 6 Nov 2020 18:52:16 +0000 (13:52 -0500)]
FreeBSD: Avoid spurious EINTR in zvol_cdev_open

zvol_first_open can fail with EINTR if spa_namespace_lock is not held
and cannot be taken without waiting.

Apply the same logic that was done for zvol_geom_open to take
spa_namespace_lock if not already held on first open in zvol_cdev_open.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11175

3 years agoFix dmu_tx_dirty_throttle after arc_c reduction
Alexander Motin [Tue, 10 Nov 2020 18:39:26 +0000 (13:39 -0500)]
Fix dmu_tx_dirty_throttle after arc_c reduction

After initial arc_c was reduced to arc_c_min it became possible that
on datasets with primarycache=metadata or none dirty data make up most
of ARC capacity and easily more than configured 50% of initial arc_c,
that causes forced txg commits by arc_tempreserve_space() and periodic
very long write delays.

This patch makes arc_tempreserve_space() to use arc_c only after ARC
warmed up once and arc_c really means something, but use arc_c_max
before that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11178

3 years agoFix dnode refcount tracking
Matthew Macy [Tue, 10 Nov 2020 18:37:10 +0000 (10:37 -0800)]
Fix dnode refcount tracking

Fix a couple of places where the wrong tag is passed
to dnode_{hold, rele}

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11184