git.proxmox.com Git - mirror_zfs.git/log
3 years agoZTS events_002: Improve speed and reliability
Antonio Russo [Mon, 8 Mar 2021 16:42:45 +0000 (09:42 -0700)]
ZTS events_002: Improve speed and reliability

events_002 exercises the ZED, ensuring that it neither misses events
nor reports events twice.

On slow test hardware, some of the timeouts are insufficient to allow
the ZED to properly settle.  Conversely, on fast hardware these same
timeouts are too long, unnecessarily slowing the test run.

Instead of using a fixed timeout, wait for the expected final event
before returning.  Additionally, wait with a timeout for unexpected
events to avoid missing them if they show up late.
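
The waiting strategy described above can be sketched as follows. This is
a minimal Python model, not the ksh code used by the test suite;
wait_for_event, get_events, and the event names are hypothetical and
exist only for illustration.

```python
import time


def wait_for_event(get_events, expected, timeout=5.0, interval=0.01,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until `expected` shows up in get_events() or `timeout` elapses.

    Returns True as soon as the event is seen, so fast hardware does not
    pay for the full timeout, while slow hardware gets the whole window.
    """
    deadline = clock() + timeout
    while True:
        if expected in get_events():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Hypothetical event source: the final event only appears on the third poll.
polls = []


def fake_events():
    polls.append(1)
    return ["final_event"] if len(polls) >= 3 else []


# Expected final event: returns early, well before the timeout.
found = wait_for_event(fake_events, "final_event", timeout=1.0)

# Unexpected event: wait out a short timeout so a late arrival is not missed.
missing = wait_for_event(lambda: [], "unexpected_event", timeout=0.05)
```

The same helper covers both cases: a generous timeout for the event that
must arrive, and a short one for events that must not.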

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11703

3 years agozvol: call zil_replaying() during replay
Christian Schwarz [Sun, 7 Mar 2021 17:49:58 +0000 (18:49 +0100)]
zvol: call zil_replaying() during replay

zil_replaying(zil, tx) has the side-effect of informing the ZIL that an
entry has been replayed in the (still open) tx.  The ZIL uses that
information to record the replay progress in the ZIL header when that
tx's txg syncs.

ZPL log entries are not idempotent and logically dependent and thus
calling zil_replaying() is necessary for correctness.

For ZVOLs the question of correctness is more nuanced: ZVOL logs only
TX_WRITE and TX_TRUNCATE, both of which are idempotent. Logical
dependencies between two records exist only if the write or discard
request had sync semantics or if the ranges affected by the records
overlap.

Thus, at first glance, it would be correct to restart replay from
the beginning if we crash before replay completes. But this does not
address the following scenario:
Assume one log record per LWB.
The chain on disk is

    HDR -> 1:W(1, "A") -> 2:W(1, "B") -> 3:W(2, "X") -> 4:W(3, "Z")

where N:W(O, C) represents log entry number N which is a TX_WRITE of C
to offset O.
We replay 1, 2 and 3 in one txg, sync that txg, then crash.
Bit flips corrupt 2, 3, and 4.
We come up again and restart replay from the beginning because
we did not call zil_replaying() during replay.
We replay 1 again, then interpret 2's invalid checksum as the end
of the ZIL chain and call replay done.
The replayed zvol content is "AX".

If we had called zil_replaying() the HDR would have pointed to 3
and our resumed replay would not have replayed anything because
3 was corrupted, resulting in zvol content "BX".

If 3 logically depends on 2 then the replay corrupted the ZVOL_OBJ's
contents.
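
The scenario above can be replayed with a small model. This is only a
sketch: the record tuples, the replay() helper, and the dict standing in
for the zvol are illustrative assumptions and do not reflect the real
ZIL data structures.

```python
def replay(log, start, zvol):
    """Replay records from index `start` until a bad checksum ends the chain."""
    for offset, data, checksum_ok in log[start:]:
        if not checksum_ok:
            break  # an invalid checksum is treated as the end of the ZIL chain
        zvol[offset] = data
    return zvol


def content(zvol):
    """Concatenate the written blocks in offset order."""
    return "".join(zvol[k] for k in sorted(zvol))


# (offset, data, checksum_ok) for records 1..4 from the chain above.
intact = [(1, "A", True), (1, "B", True), (2, "X", True), (3, "Z", True)]

# First import: records 1, 2 and 3 are replayed and the txg syncs, then we crash.
synced = replay(intact[:3], 0, {})

# Bit flips corrupt records 2, 3 and 4 before the next import.
damaged = [(1, "A", True), (1, "B", False), (2, "X", False), (3, "Z", False)]

# Without zil_replaying(): the header still points at record 1, so we restart.
without_progress = content(replay(damaged, 0, dict(synced)))

# With zil_replaying(): the header points at record 3, which is corrupt.
with_progress = content(replay(damaged, 2, dict(synced)))
```

In this model without_progress is "AX" and with_progress is "BX",
matching the two outcomes described above.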

This patch adds the zil_replaying() calls to the replay functions.
Since the callbacks in the replay function need the zilog_t* pointer
so that they can call zil_replaying() we open the ZIL while
replaying in zvol_create_minor(). We also verify that replay has
been done when on-demand-opening the ZIL on the first modifying
bio.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11667

3 years agoZTS: Improve cleanup in zpool tests
Ryan Moeller [Sun, 7 Mar 2021 17:41:01 +0000 (12:41 -0500)]
ZTS: Improve cleanup in zpool tests

* Restore original kern.corefile value after the test.
* Don't leave behind a frozen pool.
* Clean up leftover vdev files.
* Make zpool_002_pos and zpool_003_pos consistent in their handling of
core files while here.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11694

3 years agoClarify compressed zfs send/recv behavior
manfromafar [Sun, 7 Mar 2021 17:39:16 +0000 (10:39 -0700)]
Clarify compressed zfs send/recv behavior

Docs for send and receive do not explain behavior when sending a
compressed stream then receiving on a host that overrides compression
with -o compress=value.

The data from the send stream is written as it was sent, in its
compressed form, but the compression algorithm set on the receiver
is the overridden value, which causes some confusion as to what
algorithm was actually used.

Updated man docs to clarify behavior.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed By: Allan Jude <allanjude@freebsd.org>
Signed-off-by: manfromafar <manfromafar@outlook.com>
Closes #11690

3 years agoIntentionally allow ZFS_READONLY in zfs_write
Ryan Moeller [Sun, 7 Mar 2021 17:31:52 +0000 (12:31 -0500)]
Intentionally allow ZFS_READONLY in zfs_write

ZFS_READONLY represents the "DOS R/O" attribute.
When that flag is set, we should behave as if write access
were not granted by anything in the ACL.  In particular:
We _must_ allow writes after opening the file r/w, then
setting the DOS R/O attribute, and writing some more.
(Similar to how you can write after fchmod(fd, 0444).)
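
The fchmod() analogy can be checked from userspace. The sketch below
shows only the POSIX behavior the comparison refers to (permissions are
checked at open time); it does not touch the DOS R/O attribute or any
ZFS code, and assumes a Unix-like system where os.fchmod is available.

```python
import os
import tempfile

# Open a scratch file read/write, then drop every write permission bit.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"before ")
    os.fchmod(fd, 0o444)              # new opens for writing would now be denied
    written = os.write(fd, b"after")  # the already-open descriptor can still write
    os.lseek(fd, 0, os.SEEK_SET)
    data = os.read(fd, 64)
finally:
    os.close(fd)
    os.unlink(path)
```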

Restore these semantics which were lost on FreeBSD when refactoring
zfs_write.  To my knowledge Linux does not actually expose this flag,
but we'll need it to eventually so I've added the supporting checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11693

3 years agoSuppress cppcheck invalidSyntax warnings
Brian Behlendorf [Sat, 6 Mar 2021 01:56:35 +0000 (17:56 -0800)]
Suppress cppcheck invalidSyntax warnings

For some reason cppcheck 1.90 is generating an invalidSyntax warning
when the BF64_SET macro is used in the zstream source.  The same
warning is not reported by cppcheck 2.3, nor is there any evident
problem with the expanded macro.  This appears to be an issue with
this version of cppcheck.  This commit annotates the source to suppress
the warning.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11700

3 years agoInitialize ZIL buffers
Brian Behlendorf [Fri, 5 Mar 2021 22:45:13 +0000 (14:45 -0800)]
Initialize ZIL buffers

When populating a ZIL destination buffer ensure it is always
zeroed before its contents are constructed.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11687

3 years agoFix abd_get_offset_struct() may allocate new abd
Jorgen Lundman [Fri, 5 Mar 2021 20:22:57 +0000 (05:22 +0900)]
Fix abd_get_offset_struct() may allocate new abd

Even when supplied with an abd to abd_get_offset_struct(), the call
to abd_get_offset_impl() can allocate a different abd. Make sure to
call abd_fini_struct() on the abd that is not used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #11683

3 years agoFreeBSD module --enable-debug --enable-invariants
Ryan Moeller [Fri, 5 Mar 2021 20:16:41 +0000 (15:16 -0500)]
FreeBSD module --enable-debug --enable-invariants

Wire up the --enable-debug flag for configure to the FreeBSD module
build.  Add --enable-invariants.

The running FreeBSD kernel config is used to detect whether to enable
INVARIANTS if not explicitly specified with --enable-invariants or
--disable-invariants.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11678

3 years agozpool: use tab to indent continuation from removal status
Thomas Lamprecht [Fri, 5 Mar 2021 20:15:35 +0000 (21:15 +0100)]
zpool: use tab to indent continuation from removal status

Bring the output of the removal status in line with the other
"fields" that zpool status outputs, allowing a parser to more easily
detect this as a continuation of the 'remove:' output.

Before:
remove: Removal of vdev 0 copied 282G in 0h9m, completed on [...]
    776K memory used for removed device mappings

Now:
remove: Removal of vdev 0 copied 282G in 0h9m, completed on [...]
	776K memory used for removed device mappings
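
As a rough illustration of why the tab helps a parser, the sketch below
folds tab-indented lines into the preceding field. The
parse_status_fields() helper and the sample text are hypothetical and
much simpler than real zpool status output.

```python
def parse_status_fields(text):
    """Map 'name: value' lines to a dict, folding tab-indented continuations."""
    fields = {}
    current = None
    for line in text.splitlines():
        if line.startswith("\t") and current is not None:
            # A leading tab marks a continuation of the previous field.
            fields[current] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


sample = (
    "remove: Removal of vdev 0 copied 282G in 0h9m\n"
    "\t776K memory used for removed device mappings\n"
)
fields = parse_status_fields(sample)
```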

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Closes #11674

3 years agoDon't bomb out when using keylocation=file://
James Wah [Wed, 3 Mar 2021 16:28:49 +0000 (03:28 +1100)]
Don't bomb out when using keylocation=file://

Avoid following the error path when the operation in fact succeeded.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: James Wah <james@laird-wah.net>
Closes #11651

3 years agolinux: zvol: avoid heap allocation for zvol_request_sync=1
Christian Schwarz [Wed, 3 Mar 2021 16:15:28 +0000 (17:15 +0100)]
linux: zvol: avoid heap allocation for zvol_request_sync=1

The spl_kmem_alloc showed up in some flamegraphs in a single-threaded
4k sync write workload at 85k IOPS on an
Intel(R) Xeon(R) Silver 4215 CPU @ 2.50GHz.
Certainly not a huge win but I believe the change is clean and
easy to maintain down the road.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11666

3 years agoAdd "zstd-fast" to help options for "compression" property
Jake Howard [Wed, 3 Mar 2021 16:14:19 +0000 (16:14 +0000)]
Add "zstd-fast" to help options for "compression" property

This value does work as expected, and is documented in the manpage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jake Howard <git@theorangeone.net>
Closes #11670

3 years agoCancel TRIM / initialize on FAULTED non-writeable vdevs
nssrikanth [Tue, 2 Mar 2021 18:27:27 +0000 (23:57 +0530)]
Cancel TRIM / initialize on FAULTED non-writeable vdevs

When a device which is actively trimming or initializing becomes
FAULTED, and therefore no longer writable, cancel the active
TRIM or initialization.  When the device is merely taken offline
with `zpool offline` then stop the operation but do not cancel it.
When the device is brought back online the operation will be
resumed if possible.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Signed-off-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Closes #11588

3 years agoFix assert in FreeBSD-specific dmu_read_pages
Andriy Gapon [Sun, 28 Feb 2021 01:23:09 +0000 (03:23 +0200)]
Fix assert in FreeBSD-specific dmu_read_pages

The function has three similar pieces of code: for read-behind pages,
requested pages and read-ahead pages.  All three pieces had an
assert to ensure that the page is not mapped.  Later the assert was
relaxed to require that the page is not mapped for writing.  But that
was done in two places out of three.  This change fixes the third piece,
read-ahead.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #11654

3 years agoZTS: zpool_trim_start_and_cancel_pos.ksh
Brian Behlendorf [Sun, 28 Feb 2021 01:19:50 +0000 (17:19 -0800)]
ZTS: zpool_trim_start_and_cancel_pos.ksh

Several of the TRIM tests were based off the initialize tests and
then adapted for TRIM.  The zpool_trim_start_and_cancel_pos.ksh
test was intended to be one such test but it was overlooked and
actually never adapted.  Update it accordingly.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11649

3 years agoAdd missing checks for unsupported features
Martin Matuška [Sun, 28 Feb 2021 01:16:02 +0000 (02:16 +0100)]
Add missing checks for unsupported features

After 35ec517 it has become possible to import ZFS pools with an
active org.illumos:edonr feature on FreeBSD, leading to a panic.

In addition, "zpool status" reported all pools without edonr
as upgradable and "zpool upgrade -v" reported edonr in the list
of upgradable features.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11653

3 years agoLinux 5.12 compat: replace bio_*_io_acct with disk_*_io_acct
Coleman Kane [Tue, 23 Feb 2021 02:18:41 +0000 (21:18 -0500)]
Linux 5.12 compat: replace bio_*_io_acct with disk_*_io_acct

The bio_*_io_acct functions became GPL exports, which causes the
kernel modules to refuse to compile. This replaces code with
alternate function calls to the disk_*_io_acct interfaces, which
are not GPL exports. This change was added in kernel commit
99dfc43ecbf67f12a06512918aaba61d55863efc.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11639

3 years agoLinux 5.12 compat: bio->bi_disk member moved
Coleman Kane [Tue, 23 Feb 2021 02:07:51 +0000 (21:07 -0500)]
Linux 5.12 compat: bio->bi_disk member moved

The struct bio member bi_disk was moved underneath a new member named
bi_bdev. So all attempts to reference bio->bi_disk need to now become
bio->bi_bdev->bd_disk.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11639

3 years agoFix vdev_rebuild_thread deadlock
Brian Behlendorf [Wed, 24 Feb 2021 18:01:00 +0000 (10:01 -0800)]
Fix vdev_rebuild_thread deadlock

The metaslab_disable() call may block waiting for a txg sync.
Therefore it's important that vdev_rebuild_thread release the
SCL_CONFIG read lock it is holding before this call.  Failure
to do so can result in the txg_sync thread getting blocked
waiting for this lock which results in a deadlock.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11647

3 years agoFix overly broad locking in spa_vdev_config_exit()
Brian Behlendorf [Wed, 24 Feb 2021 18:00:21 +0000 (10:00 -0800)]
Fix overly broad locking in spa_vdev_config_exit()

Calling vdev_free() only requires that we acquire the spa config
SCL_STATE_ALL locks, not the SCL_ALL locks.  In particular, we need
to avoid taking the SCL_CONFIG lock (included in SCL_ALL) as a
writer since this can lead to a deadlock.  The txg_sync_thread() may
block in spa_txg_history_init_io() when taking the SCL_CONFIG lock
as a reader when it detects there's a pending writer.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11585

3 years agovdev_id: Fix partition regular expression
Tony Hutter [Wed, 24 Feb 2021 17:58:46 +0000 (09:58 -0800)]
vdev_id: Fix partition regular expression

Given a DM device name, the old vdev_id script would extract any text
after a 'p' as the partition number.  It then appends "-part" + the
partition number to the name, giving a by-vdev name like "L0-part5".

This works fine if the DM name is like 'dm-2p5', but doesn't work if
the DM name is a multipath name like "mpatha".  In those cases it
incorrectly matches the 'p' in "mpatha", giving by-vdev names like
"L0-partatha".

This patch fixes the issue by making the partition regex match stricter.
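
The difference between a loose and a strict match can be sketched as
below. Both patterns are hypothetical stand-ins chosen to reproduce the
names quoted above; the actual expressions in the vdev_id shell script
may differ.

```python
import re

# Hypothetical stand-ins for the old and new patterns.
LOOSE = re.compile(r"p(.*)$")     # anything after the first 'p'
STRICT = re.compile(r"p(\d+)$")   # 'p' followed only by digits at the end


def by_vdev_name(alias, dm_name, pattern):
    """Append "-part<N>" to `alias` when `pattern` finds a partition suffix."""
    match = pattern.search(dm_name)
    return f"{alias}-part{match.group(1)}" if match else alias


old_partition = by_vdev_name("L0", "dm-2p5", LOOSE)
old_multipath = by_vdev_name("L0", "mpatha", LOOSE)
new_partition = by_vdev_name("L0", "dm-2p5", STRICT)
new_multipath = by_vdev_name("L0", "mpatha", STRICT)
```

With the loose pattern "mpatha" yields "L0-partatha"; with the strict
pattern it is left as "L0", while "dm-2p5" still yields "L0-part5".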

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11637

3 years agoLinux: increase max nvlist_src size
Brian Behlendorf [Wed, 24 Feb 2021 17:57:18 +0000 (09:57 -0800)]
Linux: increase max nvlist_src size

On Linux increase the maximum allowed size of the src nvlist which
can be passed to the /dev/zfs ioctl.  Originally, this was set
to a maximum of KMALLOC_MAX_SIZE (4M) because it was kmalloc'd.
Since that time it's been converted to a vmalloc so that's no
longer a hard limit, and it's desirable for `zfs send/recv` to
allow larger nvlists so more snapshots can be sent at once.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6572
Closes #11638

3 years agoAdd upper bound for slop space calculation
Prakash Surya [Wed, 24 Feb 2021 17:52:43 +0000 (09:52 -0800)]
Add upper bound for slop space calculation

This change modifies the behavior of how we determine how much slop
space to use in the pool, such that now it has an upper limit. The
default upper limit is 128G, but is configurable via a tunable.
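
A rough sketch of a capped slop calculation follows. Only the 128G
upper limit comes from this message; the 1/32 fraction (shift of 5),
the 128M floor, and the function and parameter names are assumptions
made for illustration.

```python
GIB = 1 << 30
MIB = 1 << 20


def slop_space(pool_bytes, slop_shift=5, max_slop=128 * GIB,
               min_slop=128 * MIB):
    """Reserve a fraction of the pool, clamped between a floor and a cap."""
    slop = min(pool_bytes >> slop_shift, max_slop)  # new upper bound
    # Assumed floor: never reserve more than half of a very small pool.
    return max(slop, min(pool_bytes >> 1, min_slop))


small = slop_space(1 * GIB)   # below the floor, so the floor applies
medium = slop_space(1 << 40)  # 1 TiB pool: plain 1/32 fraction
huge = slop_space(1 << 50)    # 1 PiB pool: capped at the upper limit
```

Without the cap, the 1 PiB pool in this model would reserve 32 TiB
instead of 128 GiB.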

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #11023

3 years agoWrap bare EINVAL returns with SET_ERROR
Ryan Moeller [Wed, 24 Feb 2021 17:51:10 +0000 (12:51 -0500)]
Wrap bare EINVAL returns with SET_ERROR

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11636

3 years agoForce symlink creation for zpool.d compat links
Ryan Moeller [Wed, 24 Feb 2021 17:49:59 +0000 (12:49 -0500)]
Force symlink creation for zpool.d compat links

gmake install fails when zpool.d compat links already exist.

Force the symlinks to be recreated if already present.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11633

3 years agosend_iterate_snap : doall send without fromsnap
Cedric Maunoury [Wed, 24 Feb 2021 17:48:58 +0000 (18:48 +0100)]
send_iterate_snap : doall send without fromsnap

The behavior of a NULL fromsnap was inadvertently changed for a doall
send when the send/recv logic in libzfs was updated.  Restore the
previous behavior by correcting send_iterate_snap() to include all
the snapshots in the nvlist for this case.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cedric Maunoury <cedric.maunoury@gmail.com>
Closes #11608

3 years agoFix error message when zfs modules are already unloaded
Adam D. Moss [Sun, 21 Feb 2021 04:23:10 +0000 (20:23 -0800)]
Fix error message when zfs modules are already unloaded

Using zfs-sh -u on Linux will fail with an inaccurate message when the
zfs modules are already unloaded.  Deal with the case where a module
is already unloaded; its USE_COUNT will be the empty string.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11627

3 years agovdev_ops: don't try to call vdev_op_hold or vdev_op_rele when NULL
fbynite [Sun, 21 Feb 2021 04:19:20 +0000 (19:19 -0900)]
vdev_ops: don't try to call vdev_op_hold or vdev_op_rele when NULL

This prevents a panic after a SLOG add/removal on the root pool followed
by a zpool scrub.

When a SLOG is removed, a hole takes its place - the vdev_ops for a hole
is vdev_hole_ops, which defines the handler functions of vdev_op_hold
and vdev_op_rele as NULL.

This bug has been reported in illumos and FreeBSD, although the
FreeBSD report has a different trigger.

Credit for this patch goes to Patrick Mooney <pmooney@pfmooney.com>

Obtained from: illumos-gate commit: c65bd18728f34725
External-issue: https://www.illumos.org/issues/12981
External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252396
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Wing <rob.fx907@gmail.com>
Closes #11623

3 years agoBetter zfs_get_enclosure_sysfs_path() enclosure support
Tony Hutter [Sun, 21 Feb 2021 04:17:45 +0000 (20:17 -0800)]
Better zfs_get_enclosure_sysfs_path() enclosure support

A multipathed disk will have several 'underlying' paths to the disk.  For
example, multipath disk 'dm-0' may be made up of paths:
/dev/{sda,sdb,sdc,sdd}.  On many enclosures those underlying sysfs
paths will have a symlink back to their enclosure device entry
(like 'enclosure_device0/slot1').  This is used by the
statechange-led.sh script to set/clear the fault LED for a disk, and
by 'zpool status -c'.

However, on some enclosures, those underlying paths may not all have
symlinks back to the enclosure device.  Maybe only two out of four
of them might.

This patch updates zfs_get_enclosure_sysfs_path() to favor returning
paths that have symlinks back to their enclosure devices, rather
than just returning the first path.
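
The selection rule can be sketched as below. This is a simplified model
and not the libzfs code: pick_enclosure_path() and has_enclosure_link
are hypothetical names, and falling back to the first path when no path
has a symlink is an assumption.

```python
def pick_enclosure_path(paths, has_enclosure_link):
    """Prefer the first path with an enclosure symlink, else the first path."""
    for path in paths:
        if has_enclosure_link(path):
            return path
    # Assumed fallback when no underlying path links to an enclosure.
    return paths[0] if paths else None


# Hypothetical multipath device where only two of four paths have a symlink.
paths = ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"]
linked = {"/dev/sdc", "/dev/sdd"}

chosen = pick_enclosure_path(paths, lambda p: p in linked)
fallback = pick_enclosure_path(paths, lambda p: False)
```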

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11617

3 years agoCleaning up uio headers
Brian Atkinson [Sun, 21 Feb 2021 04:16:50 +0000 (21:16 -0700)]
Cleaning up uio headers

Making uio_impl.h the common header interface between Linux and FreeBSD
so both OS's can share a common header file. This also helps reduce code
duplication for zfs_uio_t for each OS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11622

3 years agoztest: propagate -o to the zdb child process
Christian Schwarz [Thu, 18 Feb 2021 11:20:09 +0000 (12:20 +0100)]
ztest: propagate -o to the zdb child process

I think this is the behavior that most users expect.

Future work: have a separate flag, e.g., -O, to specify separate
set_global_vars for the zdb child than for the ztest children.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602

3 years agoztest: fix -o by calling set_global_var in child processes
Christian Schwarz [Tue, 16 Feb 2021 10:14:44 +0000 (11:14 +0100)]
ztest: fix -o by calling set_global_var in child processes

Without set_global_var() in the child processes the -o option provides
little use.

Before this change set_global_var() was called as a side-effect of
getopt processing which only happens for the parent ztest process.

This change limits the set of options that can be set and makes them
available to the child through ztest_shared_opts_t.

Future work: support arbitrary option count and length.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602

3 years agolibzpool: set_global_var: refactor to not modify 'arg'
Christian Schwarz [Tue, 16 Feb 2021 11:27:48 +0000 (12:27 +0100)]
libzpool: set_global_var: refactor to not modify 'arg'

Also fixes leak of the dlopen handle in the error case.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602

3 years agolibzpool: set_global_var: fix endianness handling (fixes zdb -o )
Christian Schwarz [Mon, 15 Feb 2021 12:02:32 +0000 (13:02 +0100)]
libzpool: set_global_var: fix endianness handling (fixes zdb -o )

Without this patch I get the error

  Setting global variables is only supported on little-endian systems

when using `zdb -o` on my amd64 machine.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602

3 years agoRestore FreeBSD resource usage accounting
Ryan Moeller [Sat, 20 Feb 2021 06:34:33 +0000 (01:34 -0500)]
Restore FreeBSD resource usage accounting

Add zfs_racct_* interfaces for platform-dependent read/write accounting.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11613

3 years agoChecksum errors may not be counted
Don Brady [Sat, 20 Feb 2021 06:33:15 +0000 (23:33 -0700)]
Checksum errors may not be counted

Fix regression seen in issue #11545 where checksum errors
were not being counted or showing up in a zpool event.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #11609

3 years agoFreeBSD: disable the use of hardware crypto offload drivers for now
Mark Johnston [Thu, 18 Feb 2021 23:51:20 +0000 (18:51 -0500)]
FreeBSD: disable the use of hardware crypto offload drivers for now

First, the crypto request completion handler contains a bug in that it
fails to reset fs_done correctly after the request is completed.  This
is only a problem for asynchronous drivers.  Second, some hardware
drivers have input constraints which ZFS does not satisfy.  For
instance, ccp(4) apparently requires the AAD length for AES-GCM to be a
multiple of the cipher block size, and with qat(4) the AES-GCM AAD
length may not be longer than 240 bytes.  FreeBSD's generic crypto
framework doesn't have a mechanism to automatically fall back to a
software implementation if a hardware driver cannot process a request,
and ZFS does not tolerate such errors.

The plan is to implement such a fallback mechanism, but with FreeBSD
13.0 approaching we should simply disable the use of hardware drivers
for now.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #11612

3 years agoFix report_mount_progress never calling set_progress_header
Andriy Gapon [Thu, 18 Feb 2021 21:53:05 +0000 (23:53 +0200)]
Fix report_mount_progress never calling set_progress_header

That happens because of an off-by-one mistake.
share_mount_one_cb() calls report_mount_progress(current=sm_done) after
having incremented sm_done by one.  Then report_mount_progress()
increments the parameter again.  It appears that that logic became
obsolete after commit a10d50f999511, parallel zfs mount.

On FreeBSD I observe that zfs mount -a -v prints, for example,
    (null): (201/248)
That happens because set_progress_header() is never called.

With this change the output becomes correct:
    Mounting ZFS filesystems: (209/248)
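
The off-by-one can be modelled as below. This is a simplified sketch,
not the libzfs source: it assumes the header is set only when the
reported counter equals 1, and the extra_increment flag is a
hypothetical switch between the old and new behavior.

```python
def report_mount_progress(current, total, log, extra_increment):
    """Simplified model: the header is only set when `current` equals 1."""
    if extra_increment:
        current += 1  # the obsolete second increment described above
    if current == 1:
        log.append("header")
    log.append(f"({current}/{total})")


def mount_all(total, extra_increment):
    """Model of the caller, which increments its counter before reporting."""
    log = []
    done = 0
    for _ in range(total):
        done += 1  # the caller already counts from 1
        report_mount_progress(done, total, log, extra_increment)
    return log


buggy = mount_all(3, extra_increment=True)
fixed = mount_all(3, extra_increment=False)
```

In the buggy variant the counter starts at 2, so the header is never
set; in the fixed variant the first report sets it.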

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #11607

3 years agoRemove unused abd_alloc_scatter_offset_chunkcnt
Ryan Libby [Thu, 18 Feb 2021 05:39:13 +0000 (21:39 -0800)]
Remove unused abd_alloc_scatter_offset_chunkcnt

Remove a function that became unused after refactoring in
e2af2acce3436acdb2b35fdc7c9de1a30ea85514.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11614

3 years agoAdd "compatibility" property for zpool feature sets
Colm [Thu, 18 Feb 2021 05:30:45 +0000 (05:30 +0000)]
Add "compatibility" property for zpool feature sets

Property to allow sets of features to be specified; for compatibility
with specific versions / releases / external systems. Influences
the behavior of 'zpool upgrade' and 'zpool create'. Initial man
page changes and test cases included.

Brief synopsis:

zpool create -o compatibility=off|legacy|file[,file...] pool vdev...

compatibility = off : disable compatibility mode (enable all features)
compatibility = legacy : request that no features be enabled
compatibility = file[,file...] : read features from specified files.
Only features present in *all* files will be enabled on the
resulting pool. Filenames may be absolute, or relative to
/etc/zfs/compatibility.d or /usr/share/zfs/compatibility.d (/etc
checked first).
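
The "present in *all* files" rule amounts to a set intersection. The
sketch below assumes each file has already been parsed into a list of
feature names; enabled_features() and the file names are hypothetical,
and this is not the zpool_load_compat implementation.

```python
def enabled_features(files):
    """Intersect the feature sets read from each compatibility file."""
    sets = [set(features) for features in files.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical file contents, for illustration only.
files = {
    "example-a": ["async_destroy", "lz4_compress", "large_blocks"],
    "example-b": ["async_destroy", "lz4_compress", "embedded_data"],
}
common = enabled_features(files)
```

Here only async_destroy and lz4_compress would be enabled, since the
other two features each appear in just one file.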

Only affects zpool create, zpool upgrade and zpool status.

ABI changes in libzfs:

* New function "zpool_load_compat" to load and parse compat sets.
* Add "zpool_compat_status_t" typedef for compatibility parse status.
* Add ZPOOL_PROP_COMPATIBILITY to the pool properties enum
* Add ZPOOL_STATUS_COMPATIBILITY_ERR to the pool status enum

An initial set of base compatibility sets are included in
cmd/zpool/compatibility.d, and the Makefile for cmd/zpool is
modified to install these in $pkgdatadir/compatibility.d and to
create symbolic links to a reasonable set of aliases.

Reviewed-by: ericloewe
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #11468

3 years agoFreeBSD: disable edonr in zfs_mod_supported_feature()
Brian Behlendorf [Wed, 17 Feb 2021 16:14:51 +0000 (08:14 -0800)]
FreeBSD: disable edonr in zfs_mod_supported_feature()

Rather than conditionally compiling out the edonr code for FreeBSD
update zfs_mod_supported_feature() to indicate this feature is
unsupported.  This ensures that all spa features are defined on
every platform, even if they are not supported.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11605
Issue #11468

3 years agoSupport uClibc for the tests compilations
José Luis Salvador Rufo [Wed, 17 Feb 2021 05:51:46 +0000 (06:51 +0100)]
Support uClibc for the tests compilations

Two issues prevent ZFS from being compiled with uClibc: `backtrace()`,
and `program_invocation_short_name` as a `const`.
This patch adds uClibc to the conditionals in the same way as is
already done for Glibc for `backtrace()`, and removes the external param
`program_invocation_short_name` because it's only used here in the
whole project.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Closes #11600

3 years agoMake inline ABD predicates compatible with C++
Ryan Moeller [Mon, 15 Feb 2021 18:15:50 +0000 (13:15 -0500)]
Make inline ABD predicates compatible with C++

FreeBSD's zfsd fails to build after e2af2acce3 due to strict type
checking errors from the implicit conversion between bool and boolean_t
in the inline predicate definitions in abd.h.

Use conditionals to return the correct value type from these functions.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11592

3 years agoLinux 5.11 compat: META
Brian Behlendorf [Wed, 10 Feb 2021 18:11:21 +0000 (10:11 -0800)]
Linux 5.11 compat: META

Increase the Linux-Maximum version in the META file to 5.11.
All of the required compatibility patches have been merged.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11586

3 years agovdev_id: Support daisy-chained JBODs in multipath mode
Arshad Hussain [Tue, 9 Feb 2021 21:04:09 +0000 (02:34 +0530)]
vdev_id: Support daisy-chained JBODs in multipath mode

Within function sas_handler() userspace commands like
'/usr/sbin/multipath' have been replaced with sourcing
device details from within sysfs which reduced a
significant amount of overhead and processing time.
Multiple JBOD enclosures and their order are sourced
from the bsg driver (/sys/class/enclosure) to isolate
chassis top-level expanders, which are then dynamically
indexed based on host channel of the multipath subordinate
disk member device being processed. Additionally added a
"mixed" mode for slot identification for environments where
a ZFS server system may contain SAS disk slots where there
is no expander (direct connect to HBA) while an attached
external JBOD with an expander have different slot identifier
methods.

How Has This Been Tested?
~~~~~~~~~~~~~~~~~~~~~~~~~

Testing was performed on an AMD EPYC based dual-server
high-availability multipath environment with multiple
HBAs per ZFS server and four SAS JBODs. The two primary
JBODs were multipath/cross-connected between the two
ZFS-HA servers. The secondary JBODs were daisy-chained
off of the primary JBODs using aligned SAS expander
channels (JBOD-0 expanderA--->JBOD-1 expanderA,
          JBOD-0 expanderB--->JBOD-1 expanderB, etc).
Pools were created, exported and re-imported, imported
globally with 'zpool import -a -d /dev/disk/by-vdev'.
Low level udev debug outputs were traced to isolate
and resolve errors.

Result:
~~~~~~~

Initial testing of a previous version of this change
showed how reliance on userspace utilities like
'/usr/sbin/multipath' and '/usr/bin/lsscsi' were
exacerbated by increasing numbers of disks and JBODs.
With four 60-disk SAS JBODs and 240 disks the time to
process a udevadm trigger was 3 minutes 30 seconds
during which nearly all CPU cores were above 80%
utilization. By switching from userspace utilities to
sysfs in this version, the udevadm trigger processing
time was reduced to 12.2 seconds with negligible CPU
load.
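
As a rough check on the figures above (plain arithmetic, not part
of the patch), the reduction in udevadm trigger time works out as:

```python
# Timings quoted in the message above.
before_s = 3 * 60 + 30   # 3 minutes 30 seconds
after_s = 12.2           # seconds

speedup = before_s / after_s
print(f"{speedup:.1f}x faster")
```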

This patch also fixes a few shellcheck complaints.

Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Jeff Johnson <jeff.johnson@aeoncomputing.com>
Signed-off-by: Jeff Johnson <jeff.johnson@aeoncomputing.com>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #11526

3 years agoRename zfs_inode_update to zfs_znode_update_vfs
khng300 [Tue, 9 Feb 2021 19:17:29 +0000 (03:17 +0800)]
Rename zfs_inode_update to zfs_znode_update_vfs

zfs_znode_update_vfs is a more platform-agnostic name than
zfs_inode_update. Besides that, the function's prototype is moved to
include/sys/zfs_znode.h as the function is also used in common code.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ka Ho Ng <khng300@gmail.com>
Sponsored by: The FreeBSD Foundation
Closes #11580

3 years agoAdd an assert to clarify code
Kleber Tarcísio [Tue, 9 Feb 2021 19:14:59 +0000 (16:14 -0300)]
Add an assert to clarify code

The first time through the loop prevdb and prevhdl are NULL.  They
are then both set, but only prevdb is checked.  Add an ASSERT to
make it clear that prevhdl must be set when prevdb is.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kleber <klebertarcisio@yahoo.com.br>
Closes #10754
Closes #11575

3 years agoSet file mode during zfs_write
Antonio Russo [Mon, 8 Feb 2021 17:15:05 +0000 (10:15 -0700)]
Set file mode during zfs_write

3d40b65 refactored zfs_vnops.c, which shared much code verbatim between
Linux and BSD.  After a successful write, the suid/sgid bits are reset,
and the mode to be written is stored in newmode.  On Linux, this was
propagated to both the in-memory inode and znode, which is then updated
with sa_update.

3d40b65 accidentally removed the initialization of newmode, which
happened to occur on the same line as the inode update (which has been
moved out of the function).

The uninitialized newmode can be saved to disk, leading to a crash on
stat() of that file, in addition to a merely incorrect file mode.
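
For illustration only, a minimal Python sketch of the intended
behaviour: the new mode is computed from the current mode with the
suid/sgid bits cleared, and it is assigned before anything reads it.
The helper name is hypothetical, and the privilege and execute-bit
checks the real code performs are deliberately left out.

```python
import stat

def mode_after_write(mode: int) -> int:
    """Return mode with the setuid/setgid bits cleared (hypothetical helper)."""
    return mode & ~(stat.S_ISUID | stat.S_ISGID)

# Always initialise newmode before it can be stored anywhere.
newmode = mode_after_write(0o104755)
```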

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11474
Closes #11576

3 years agozfs-import-{cache,scan}: change condition to FileNotEmpty
наб [Fri, 5 Feb 2021 19:25:22 +0000 (20:25 +0100)]
zfs-import-{cache,scan}: change condition to FileNotEmpty

When all pools are exported ZFS will generate an empty cache file.
This will cause the import service to fail, which is sub-optimal,
since this means that dracut fails, and it is necessary to run
`zpool import -a` to boot, delete the file, and regenerate+reinstall
the initrd.

This resolves the issue by treating a zero-length cache file the
same as a missing cache file.  This aligns the behavior with that
of the `zpool` command itself.
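
The rule can be sketched in Python as follows. This is only an
illustration of "empty counts as missing"; the function name is
hypothetical, and the actual change is a condition in the service
units rather than code like this.

```python
import os

def cache_file_usable(path: str) -> bool:
    """True only if the file exists and has non-zero length."""
    try:
        return os.path.getsize(path) > 0
    except OSError:
        # Missing (or unreadable) file: same outcome as an empty one.
        return False
```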

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11568

3 years agoFixed issue with processing of EC_dev_remove event
nssrikanth [Fri, 5 Feb 2021 16:30:50 +0000 (22:00 +0530)]
Fixed issue with processing of EC_dev_remove event

The pool guid and vdev guid received by zfs_agent_post_event(),
which calls zfs_retire_recv(), are normally non-zero.  However,
later in this same method they may be unconditionally reset to
zero by the code which is intended to handle multipath, spare
and l2arc vdevs.  This will result in the EC_dev_remove not
being handled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Signed-off-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Closes #11564

3 years agozfs-list.8: clarify listing snapshots
Brian Behlendorf [Thu, 4 Feb 2021 17:56:28 +0000 (09:56 -0800)]
zfs-list.8: clarify listing snapshots

Clarify how to include snapshots in the `zfs list` output by
referencing the full name of the `listsnapshots` pool property,
and the `zfs list -t snapshot` option.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11562
Closes #11565

3 years agoDocument monotonicity of dmu_tx_assign() and txg_hold_open()
Christian Schwarz [Mon, 25 Jan 2021 12:13:45 +0000 (13:13 +0100)]
Document monotonicity of dmu_tx_assign() and txg_hold_open()

Expand the comments to make it clear exactly what is guaranteed
by dmu_tx_assign() and txg_hold_open().  Additionally, update
the comment which refers to txg_exit() when it should reference
txg_rele_to_sync().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11521

3 years agozts-report.py: ignore some skipped tests in Github CI
George Melikov [Wed, 27 Jan 2021 12:18:01 +0000 (15:18 +0300)]
zts-report.py: ignore some skipped tests in Github CI

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11554

3 years agoCI: add ubuntu-* functional tests runner
George Melikov [Tue, 26 Jan 2021 12:01:44 +0000 (15:01 +0300)]
CI: add ubuntu-* functional tests runner

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11554

3 years agoCI: rename zfs-tests workflow
George Melikov [Tue, 26 Jan 2021 12:01:19 +0000 (15:01 +0300)]
CI: rename zfs-tests workflow

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11554

3 years agoRemove unused iov_iter_init_compat() wrapper
Brian Behlendorf [Sat, 30 Jan 2021 18:06:14 +0000 (10:06 -0800)]
Remove unused iov_iter_init_compat() wrapper

This compatibility code is no longer needed.  For a while
iov_iter_init_compat() was used by zfs_uio_prefaultpages() but
this code should have been dropped as part of commit 83b91ae1.
Take care of that oversight and remove it.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11543

3 years agoThe abd child/parent relationship does not need to be tracked
Matthew Ahrens [Sat, 30 Jan 2021 18:04:42 +0000 (10:04 -0800)]
The abd child/parent relationship does not need to be tracked

ABD's currently track their parent/child relationship.  This applies to
`abd_get_offset()` and `abd_borrow_buf()`.  However, nothing depends on
knowing this relationship, it's only used for consistency checks to
verify that we are not destroying an ABD that's still in use.  When we
are creating/destroying ABD's frequently, the performance impact of
maintaining these data structures (in particular the atomic
increment/decrement operations) can be measurable.

This commit removes this verification code on production builds, but
keeps it when ZFS_DEBUG is set.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11535

3 years agoAdded extra check to replace Faulted VDEV with Distributed Spare
nssrikanth [Fri, 29 Jan 2021 01:00:26 +0000 (06:30 +0530)]
Added extra check to replace Faulted VDEV with Distributed Spare

Add a check to the ZED zfs_retire agent so that a Faulted VDEV is
also replaced with a Distributed Spare.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Signed-off-by: Mark Maybee <mark.maybee@hpe.com>
Closes #11354
Closes #11355

3 years agoFixing gang ABD when adding another gang
Brian Atkinson [Fri, 29 Jan 2021 00:54:12 +0000 (17:54 -0700)]
Fixing gang ABD when adding another gang

I originally applied a fix in #11539 to fix a parent's child references
when a gang ABD is free'd. However, I did not take into account
abd_gang_add_gang(). We still need to make sure to update the child
references in this function as well. In order to resolve this I removed
decreasing the gang ABD's size in abd_free_gang() as well as moved back
the original placement of zfs_refcount_remove_many() in abd_free().

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11542

3 years agoZTS: add userspace_send_encrypted.ksh to Makefile
George Melikov [Thu, 28 Jan 2021 21:39:38 +0000 (00:39 +0300)]
ZTS: add userspace_send_encrypted.ksh to Makefile

All tests need to be included in the Makefiles.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11541

3 years agofix abd_nr_pages_off for gang abd
Matthew Ahrens [Thu, 28 Jan 2021 17:28:20 +0000 (09:28 -0800)]
fix abd_nr_pages_off for gang abd

`__vdev_disk_physio()` uses `abd_nr_pages_off()` to allocate a bio with
a sufficient number of iovec's to process this zio (i.e.
`nr_iovecs`/`bi_max_vecs`).  If there are not enough iovec's in the bio,
then additional bio's will be allocated.  However, this is a sub-optimal
code path.  In particular, it requires several abd calls (to
`abd_nr_pages_off()` and `abd_bio_map_off()`) which will have to walk
the constituents of the ABD (the pages or the gang children) because
they are looking for offsets > 0.

For gang ABD's, `abd_nr_pages_off()` returns the number of iovec's
needed for the first constituent, rather than the sum of all
constituents (within the requested range).  This always under-estimates
the required number of iovec's, which causes us to always need several
bio's.  The end result is that `__vdev_disk_physio()` is usually O(n^2)
for gang ABD's (and occasionally O(n^3), when more than 16 bio's are
needed).

This commit fixes `abd_nr_pages_off()`'s handling of gang ABD's, to
correctly determine how many iovec's are needed, by adding up the number
of iovec's for each of the gang children in the requested range.
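
A simplified Python model of the corrected counting, for
illustration only. It assumes every gang child is a page-aligned
linear buffer described just by its size in bytes; the names and the
4096-byte page size are assumptions, not the kernel code.

```python
PAGE_SIZE = 4096  # assumed page size

def pages_in_range(off, length):
    """Pages touched by [off, off + length) in one page-aligned child."""
    if length == 0:
        return 0
    first = off // PAGE_SIZE
    last = (off + length - 1) // PAGE_SIZE
    return last - first + 1

def gang_nr_pages_off(children, off, size):
    """Sum the pages needed across all children in the requested range."""
    total = 0
    for child_size in children:
        if off >= child_size:
            off -= child_size          # range starts past this child
            continue
        take = min(child_size - off, size)
        total += pages_in_range(off, take)
        size -= take
        off = 0                        # later children start at offset 0
        if size == 0:
            break
    return total
```

With children of 8192, 4096 and 12288 bytes, counting only the first
child would give 2 pages for the whole range, while the sum over all
children gives 6.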

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11536

3 years agoAvoid updating the L2ARC device header unnecessarily
George Amanakis [Thu, 28 Jan 2021 17:20:03 +0000 (18:20 +0100)]
Avoid updating the L2ARC device header unnecessarily

If we do not write any buffers to the cache device and the evict hand
has not advanced, do not update the cache device header.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11522
Closes #11537

3 years agoRemoving ABD Parent Child Reference Before Freeing ABD
Brian Atkinson [Thu, 28 Jan 2021 17:15:17 +0000 (10:15 -0700)]
Removing ABD Parent Child Reference Before Freeing ABD

Moving the call to zfs_refcount_remove_many() in abd_free() to be called
before any of the ABD free variants are called. This is necessary
because abd_free_gang() adjusts the abd_size for the gang ABD. If the
parent's child references are removed after free'ing the gang ABD the
refcount is not adjusted correctly for the parent's children.

I also removed some stray abd_put() in comments and changed
abd_free_gang_abd() -> abd_free_gang().

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11539

3 years agoAdd zdb -r <dataset> <object-id | file> <output>
Allan Jude [Thu, 28 Jan 2021 05:36:01 +0000 (00:36 -0500)]
Add zdb -r <dataset> <object-id | file> <output>

While you can use zdb -R poolname vdev:offset:[<lsize>/]<psize>[:flags]
to extract individual DVAs from a vdev, it would be handy to be able
to copy an entire file out of the pool.

Given a file or object number, add support to copy the contents to a
file. Useful for debugging and recovery.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #11027

3 years agoRevert special case code from pre-hashtable nvlist era
Mark Maybee [Thu, 28 Jan 2021 05:31:51 +0000 (22:31 -0700)]
Revert special case code from pre-hashtable nvlist era

Before a hash table was added on top of the nvlist code, there were
cases where the nvlist allocation was changed from fnvlist_alloc()
to nvlist_alloc() to avoid expensive NV_UNIQUE_NAME checks. Now
this is no longer necessary. These changes should be reverted to be
consistent with other code. There are some cases where this change
will also reduce the number of iterations.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Maybee <mark.maybee@delphix.com>
Closes #11464

3 years agoFix zrele race in zrele_async that can cause hang
Paul Dagnelie [Thu, 28 Jan 2021 05:29:58 +0000 (21:29 -0800)]
Fix zrele race in zrele_async that can cause hang

There is a race condition in zfs_zrele_async when we are checking if
we would be the one to evict an inode. This can lead to a txg sync
deadlock.

Instead of calling into iput directly, we attempt to perform the atomic
decrement ourselves, unless that would set the i_count value to zero.
In that case, we dispatch a call to iput to run later, to prevent a
deadlock from occurring.
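
The "decrement unless it would reach zero" idea can be sketched in
Python like this. It is a model only: the lock stands in for the
kernel's atomic operation, and the class, the function name and the
deferred list are hypothetical.

```python
import threading

class Inode:
    def __init__(self, count):
        self.i_count = count
        self._lock = threading.Lock()  # stands in for an atomic op

    def dec_unless_last(self):
        """Drop one reference unless it is the last; report success."""
        with self._lock:
            if self.i_count > 1:
                self.i_count -= 1
                return True
            return False

def zrele_async_sketch(inode, deferred):
    if not inode.dec_unless_last():
        # We hold the last reference: hand the final release to a
        # worker instead of doing it in this context.
        deferred.append(inode)
```

Checking and decrementing in one step closes the window in which
another thread could drop its reference between the two.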

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11527
Closes #11530

3 years agoZTS: pool_state test check for pool existence in cleanup
George Melikov [Thu, 28 Jan 2021 01:33:30 +0000 (04:33 +0300)]
ZTS: pool_state test check for pool existence in cleanup

If there is no scsi_debug module, then this test
must be skipped.  In that case the cleanup routine
must be prepared for the pool to be absent.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11534

3 years agoFix a resource leak in uu_avl_pool_destroy
Alan Somers [Wed, 27 Jan 2021 03:39:28 +0000 (20:39 -0700)]
Fix a resource leak in uu_avl_pool_destroy

Need to destroy the pthread mutex created in uu_avl_pool_create.

https://svnweb.freebsd.org/base?view=revision&revision=262912

Obtained from: FreeBSD
Sponsored by: Spectra Logic Corporation
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11528

3 years agoParallelize vdev_validate
Alan Somers [Tue, 12 Jan 2021 22:25:52 +0000 (15:25 -0700)]
Parallelize vdev_validate

The runtime of vdev_validate is dominated by the disk accesses in
vdev_label_read_config.  Speed it up by validating all vdevs in
parallel using a taskq.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11470

3 years agoRead all disk labels concurrently in vdev_label_read_config
Alan Somers [Tue, 12 Jan 2021 21:59:56 +0000 (14:59 -0700)]
Read all disk labels concurrently in vdev_label_read_config

This is similar to what we already do in vdev_geom_read_config.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11470

3 years agoParallelize vdev_load
Alan Somers [Tue, 12 Jan 2021 00:00:19 +0000 (17:00 -0700)]
Parallelize vdev_load

metaslab_init is the slowest part of importing a mature pool, and it
must be repeated hundreds of times for each top-level vdev.  But its
speed is dominated by a few serialized disk accesses.  That can lead to
import times of > 1 hour for pools with many top-level vdevs on spinny
disks.

Speed up the import by using a taskqueue to parallelize vdev_load across
all top-level vdevs.

This also requires adding mutex protection to
metaslab_class_t.mc_histogram.  The mc_histogram fields were
unprotected when that code was first written in "Illumos 4976-4984 -
metaslab improvements" (OpenZFS
f3a7f6610f2df0217ba3b99099019417a954b673).  The lock wasn't added until
3dfb57a35e8cbaa7c424611235d669f3c575ada1, though it's unclear exactly
which fields it's supposed to protect.  In any case, it wasn't until
vdev_load was parallelized that any code attempted concurrent access to
those fields.
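
As an illustration of the pattern (not the ZFS code), the Python
sketch below runs a per-vdev load function on a thread pool and
protects a shared histogram with a lock. The thread pool stands in
for a taskq, and all names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

histogram = [0] * 4               # shared across all loads
histogram_lock = threading.Lock()

def load_vdev(local_hist):
    """Hypothetical per-vdev load: fold local counts into the shared ones."""
    with histogram_lock:          # serialise the read-modify-write
        for i, n in enumerate(local_hist):
            histogram[i] += n
    return 0                      # 0 means success

def load_all(per_vdev_hists):
    """Load every top-level vdev concurrently; return the first error."""
    workers = max(1, len(per_vdev_hists))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        errors = list(pool.map(load_vdev, per_vdev_hists))
    return next((e for e in errors if e != 0), 0)
```

Without the lock, two loads updating the same bucket at once could
lose an increment, which is the hazard that only appears once the
loads run in parallel.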

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11470

3 years agoFix a man page link in zfs-program.8
Alan Somers [Wed, 27 Jan 2021 00:17:11 +0000 (17:17 -0700)]
Fix a man page link in zfs-program.8

zfs-program.8 has an orphan link, fix it.

https://svnweb.freebsd.org/base?view=revision&revision=360080

Obtained from: FreeBSD
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11529

3 years agocppcheck: integrate cppcheck
Brian Behlendorf [Fri, 22 Jan 2021 20:54:34 +0000 (12:54 -0800)]
cppcheck: integrate cppcheck

In order for cppcheck to perform a proper analysis it needs to be
aware of how the sources are compiled (source files, include
paths/files, extra defines, etc).  All the needed information is
available from the Makefiles and can be leveraged with a generic
cppcheck Makefile target.  So let's add one.

Additional minor changes:

* Removing the cppcheck-suppressions.txt file.  With cppcheck 2.3
  and these changes it appears to no longer be needed.  Some inline
  suppressions were also removed since they appear not to be
  needed.  We can add them back if it turns out they're needed
  for older versions of cppcheck.

* Added the ax_count_cpus m4 macro to detect at configure time how
  many processors are available in order to run multiple cppcheck
  jobs.  This value is also now used as a replacement for nproc
  when executing the kernel interface checks.

* "PHONY =" line moved in to the Rules.am file which is included
  at the top of all Makefile.am's.  This is just convenient because
  it allows us to use the += syntax to add phony targets.

* One upside of this integration worth mentioning is it now allows
  `make cppcheck` to be run in any directory to check that subtree.

* For the moment, cppcheck is not run against the FreeBSD specific
  kernel sources.  The cppcheck-FreeBSD target will need to be
  implemented and tested on FreeBSD to support this.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508

3 years agocppcheck: return value always 0
Brian Behlendorf [Sat, 23 Jan 2021 05:26:41 +0000 (21:26 -0800)]
cppcheck: return value always 0

Identical condition and return expression 'rc', return value is
always 0.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508

3 years agocppcheck: remove redundant ASSERTs
Brian Behlendorf [Sat, 23 Jan 2021 05:24:08 +0000 (21:24 -0800)]
cppcheck: remove redundant ASSERTs

The ASSERT that the passed pointer isn't NULL appears after the
pointer has already been dereferenced.  Remove the redundant check.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508

3 years agocppcheck: resolve double free
Brian Behlendorf [Sat, 23 Jan 2021 00:17:16 +0000 (16:17 -0800)]
cppcheck: resolve double free

The double free reported for the realloc() failure branch is a
false positive.  It should be resolved in cppcheck 2.4 but for
the benefit of older versions we suppress the warning.

    https://trac.cppcheck.net/ticket/9292

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508

3 years agocppcheck: zpool_main.c possible null pointer dereference
Brian Behlendorf [Fri, 22 Jan 2021 23:03:56 +0000 (15:03 -0800)]
cppcheck: zpool_main.c possible null pointer dereference

Explicitly check for NULL to satisfy cppcheck that "val" can never
be NULL when passed to printf().  This looks like a false positive
since is_blank_str() can never take the false conditional branch
when passed a NULL.  But there's no harm in adding the extra check.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508

3 years agoRAIDZ2/3 fails to heal silently corrupted parity w/2+ bad disks
Matthew Ahrens [Wed, 27 Jan 2021 00:05:05 +0000 (16:05 -0800)]
RAIDZ2/3 fails to heal silently corrupted parity w/2+ bad disks

When scrubbing, (non-sequential) resilvering, or correcting a checksum
error using RAIDZ parity, ZFS should heal any incorrect RAIDZ parity by
overwriting it.  For example, if P disks are silently corrupted (P being
the number of failures tolerated; e.g. RAIDZ2 has P=2), `zpool scrub`
should detect and heal all the bad state on these disks, including
parity.  This way if there is a subsequent failure we are fully
protected.

With RAIDZ2 or RAIDZ3, a block can have silent damage to a parity
sector, and also damage (silent or known) to a data sector.  In this
case the parity should be healed but it is not.

The problem can be noticed by scrubbing the pool twice.  Assuming there
was no damage concurrent with the scrubs, the first scrub should fix all
silent damage, and the second scrub should be "clean" (`zpool status`
should not report checksum errors on any disks).  If the bug is
encountered, then the second scrub will repair the silently-damaged
parity that the first scrub failed to repair, and these checksum errors
will be reported after the second scrub.  Since the first scrub repaired
all the damaged data, the bug can not be encountered during the second
scrub, so subsequent scrubs (more than two) are not necessary.

The root cause of the problem is some code that was inadvertently added
to `raidz_parity_verify()` by the DRAID changes.  The incorrect code
causes the parity healing to be aborted if there is damaged data
(`rc_error != 0`) or the data disk is not present (`!rc_tried`).  These
checks are not necessary, because we only call `raidz_parity_verify()`
if we have the correct data (which may have been reconstructed using
parity, and which was verified by the checksum).

This commit fixes the problem by removing the incorrect checks in
`raidz_parity_verify()`.
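
To illustrate why the removed checks were unnecessary, here is a
small Python sketch of the verification idea. It models only simple
XOR (P) parity; the Q and R parities of RAIDZ2/3 use Galois-field
arithmetic that is not shown, and the function names are
hypothetical.

```python
from functools import reduce

def xor_parity(data_cols):
    """P parity: byte-wise XOR of equal-length data columns."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*data_cols))

def parity_needs_repair(data_cols, stored_parity):
    """True if stored parity differs from parity regenerated from the data."""
    return xor_parity(data_cols) != stored_parity
```

The decision depends only on the verified data and the stored
parity, not on whether a data column had to be reconstructed first.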

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11489
Closes #11510

3 years agoZTS: zpool_export test improvements
Will Andrews [Tue, 26 Jan 2021 21:14:04 +0000 (15:14 -0600)]
ZTS: zpool_export test improvements

- refactor cleanup routines into common kshlib zpool_export_cleanup func
- don't require physical disks to test, just use files

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Will Andrews <will@firepipe.net>
Closes #11518

3 years agodracut: Fix race condition between load-key and import
Lorenz Hüdepohl [Tue, 26 Jan 2021 20:14:22 +0000 (21:14 +0100)]
dracut: Fix race condition between load-key and import

zfs-load-key.sh is called by the dracut-pre-mount.service unit which has
no explicit 'After' dependency on zfs-import.target.  As a result, the
pool may not yet have been imported, and zfs-load-key.sh finishes
without ever seeing the relevant pool.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Lorenz Hüdepohl <dev@stellardeath.org>
Closes #11500

3 years agospa_export_common: refactor common exit points
Will Andrews [Mon, 25 Jan 2021 23:04:11 +0000 (17:04 -0600)]
spa_export_common: refactor common exit points

Create a common exit point for spa_export_common (a very long
function), which avoids missing steps on failure.  This work
is helpful for the planned forced pool export changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Will Andrews <will@firepipe.net>
Closes #11514

3 years agoZTS: improve output clarity of check_prop_source
Will Andrews [Sun, 11 Oct 2020 20:11:06 +0000 (15:11 -0500)]
ZTS: improve output clarity of check_prop_source

Instead of just failing, indicate the expected and actual value and
source as a NOTE.  Tests using this failed in an earlier version of
the changeset and this information helped find the cause.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Will Andrews <will@firepipe.net>
Closes #11517

3 years agoZTS: remove duplicate check_prop_source from zfs_receive
Will Andrews [Mon, 25 Jan 2021 22:38:19 +0000 (16:38 -0600)]
ZTS: remove duplicate check_prop_source from zfs_receive

There is an identical definition in zfs_set_common.kshlib already.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Will Andrews <will@firepipe.net>
Closes #11516

3 years agologapi: cat output file instead of printing
Will Andrews [Sun, 11 Oct 2020 20:08:56 +0000 (15:08 -0500)]
logapi: cat output file instead of printing

This avoids globbing together multiple lines in the log, if you happen
to specify LOGAPI_DEBUG because you want to see it.

Signed-off-by: Will Andrews <will@firepipe.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11515

3 years agospl-taskq: Make sure thread tsd hash entry is cleared
Matthew Macy [Mon, 25 Jan 2021 19:18:28 +0000 (11:18 -0800)]
spl-taskq: Make sure thread tsd hash entry is cleared

Like any other thread created by thread_create() we need to call
thread_exit() to properly clean it up.  In particular, this ensures the
tsd hash for the thread is cleared.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11512

3 years agoSpeed up "zpool import" in the presence of many zvols
Alan Somers [Mon, 25 Jan 2021 00:02:45 +0000 (17:02 -0700)]
Speed up "zpool import" in the presence of many zvols

By default, FreeBSD does not allow zpools to be backed by zvols (that
can be changed with the "vfs.zfs.vol.recursive" sysctl). When that
sysctl is set to 0, the kernel does not attempt to read zvols when
looking for vdevs. But the zpool command still does. This change brings
the zpool command into line with the kernel's behavior. It speeds "zpool
import" when an already imported pool has many zvols, or a zvol with
many snapshots.

https://svnweb.freebsd.org/base?view=revision&revision=357235
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241083
https://reviews.freebsd.org/D22077

Obtained from: FreeBSD
Reported by: Martin Birgmeier <d8zNeCFG@aon.at>
Sponsored by: Axcient
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11502

3 years agozfsprops.8: fix mispluralisation in "Default values is"
наб [Sun, 24 Jan 2021 23:57:51 +0000 (00:57 +0100)]
zfsprops.8: fix mispluralisation in "Default values is"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11509

3 years agoZTS: Use swapctl to list swap devices on FreeBSD
Ryan Moeller [Sun, 24 Jan 2021 23:56:59 +0000 (18:56 -0500)]
ZTS: Use swapctl to list swap devices on FreeBSD

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11503

3 years agovdev_id: Add error message when $CONFIG is missing
Arshad Hussain [Sat, 23 Jan 2021 23:52:29 +0000 (05:22 +0530)]
vdev_id: Add error message when $CONFIG is missing

It was observed that vdev_id exits silently when
the $CONFIG file is missing.

This patch adds an error message for the case where vdev_id
is called without the default $CONFIG or '-c'.  This makes
the reason for the exit visible to the end user.

Before Patch:
~~~~~~~~~~~~~
$ ./cmd/vdev_id/vdev_id
$

After Patch:
~~~~~~~~~~~~
$ ./cmd/vdev_id/vdev_id
Error: Config file "/etc/zfs/vdev_id.conf" not found
$

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #11498

3 years agoFix two minor lint errors (cppcheck)
Colm [Sat, 23 Jan 2021 23:49:32 +0000 (23:49 +0000)]
Fix two minor lint errors (cppcheck)

Fix two minor errors reported by cppcheck:

In module/zfs/abd.c (abd_get_offset_impl), add non-NULL
assertion to prevent NULL dereference warning.

In module/zfs/arc.c (l2arc_write_buffers), change 'try'
variable to 'pass' to avoid C++ reserved word.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #11507

3 years agoRelax special_small_blocks assertion.
Alexander Motin [Sat, 23 Jan 2021 23:45:27 +0000 (18:45 -0500)]
Relax special_small_blocks assertion.

Follow-up for commit 624222a: the value was asserted to be
<= SPA_OLD_MAXBLOCKSIZE instead of SPA_MAXBLOCKSIZE, as it should be
after the previous change.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11501

3 years agoAdd basic io_uring test
Matthew Macy [Sat, 23 Jan 2021 23:42:42 +0000 (15:42 -0800)]
Add basic io_uring test

Provide basic test coverage for io_uring I/O.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11497

3 years agoFreeBSD: upstream changes to VFS interface
Ryan Moeller [Thu, 21 Jan 2021 23:20:14 +0000 (23:20 +0000)]
FreeBSD: upstream changes to VFS interface

Set VIRF_MOUNTPOINT flag on snapshot mountpoint.

Authored-by: Mateusz Guzik <mjg@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11458

3 years agoFreeBSD: fix HEAD build, conditionally remove FDSYNC defines
Matt Macy [Wed, 13 Jan 2021 00:22:29 +0000 (16:22 -0800)]
FreeBSD: fix HEAD build, conditionally remove FDSYNC defines

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11458

3 years agoOnly add supported features during pool creation
Brian Behlendorf [Fri, 22 Jan 2021 17:47:06 +0000 (09:47 -0800)]
Only add supported features during pool creation

When creating a pool only features supported by both user and
kernel space should be enabled.  Furthermore, improve the error
messages when attempting to create, or add, a dRAID vdev when
the dRAID feature is not supported by the kernel modules.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11492

3 years agoSet aside a metaslab for ZIL blocks
Matthew Ahrens [Thu, 21 Jan 2021 23:12:54 +0000 (15:12 -0800)]
Set aside a metaslab for ZIL blocks

Mixing ZIL and normal allocations has several problems:

1. The ZIL allocations are allocated, written to disk, and then a few
seconds later freed.  This leaves behind holes (free segments) where the
ZIL blocks used to be, which increases fragmentation, which negatively
impacts performance.

2. When under moderate load, ZIL allocations are of 128KB.  If the pool
is fairly fragmented, there may not be many free chunks of that size.
This causes ZFS to load more metaslabs to locate free segments of 128KB
or more.  The loading happens synchronously (from zil_commit()), and can
take around a second even if the metaslab's spacemap is cached in the
ARC.  All concurrent synchronous operations on this filesystem must wait
while the metaslab is loading.  This can cause a significant performance
impact.

3. If the pool is very fragmented, there may be zero free chunks of
128KB or more.  In this case, the ZIL falls back to txg_wait_synced(),
which has an enormous performance impact.

These problems can be eliminated by using a dedicated log device
("slog"), even one with the same performance characteristics as the
normal devices.

This change sets aside one metaslab from each top-level vdev that is
preferentially used for ZIL allocations (vdev_log_mg,
spa_embedded_log_class).  From an allocation perspective, this is
similar to having a dedicated log device, and it eliminates the
above-mentioned performance problems.

Log (ZIL) blocks can be allocated from the following locations.  Each
one is tried in order until the allocation succeeds:
1. dedicated log vdevs, aka "slog" (spa_log_class)
2. embedded slog metaslabs (spa_embedded_log_class)
3. other metaslabs in normal vdevs (spa_normal_class)
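
The ordered fallback above can be sketched as follows.  This is an
illustration only: alloc_class_t, class_alloc() and log_block_alloc()
are hypothetical stand-ins, not the real metaslab class code, and free
space is reduced to a single counter per class.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an allocation class. */
typedef struct alloc_class {
	const char *name;
	uint64_t free_bytes;	/* simplified: one counter, no segments */
} alloc_class_t;

/* Try to take `size` bytes from one class; true on success. */
static bool
class_alloc(alloc_class_t *ac, uint64_t size)
{
	if (ac == NULL || ac->free_bytes < size)
		return (false);
	ac->free_bytes -= size;
	return (true);
}

/*
 * Try each class in priority order and stop at the first success.
 * A NULL entry models a class that does not exist in this pool
 * (for example, no dedicated log vdev).  Returns the class that
 * satisfied the request, or NULL if every class failed.
 */
static alloc_class_t *
log_block_alloc(alloc_class_t *classes[], size_t nclasses, uint64_t size)
{
	for (size_t i = 0; i < nclasses; i++) {
		if (class_alloc(classes[i], size))
			return (classes[i]);
	}
	return (NULL);
}
```

Passing the classes as { slog, embedded, normal } reproduces the order
listed above; when the embedded class runs out of space the request
falls through to the normal class instead of failing.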

The space required for the embedded slog metaslabs is usually between
0.5% and 1.0% of the pool, and comes out of the existing 3.2% of "slop"
space that is not available for user data.

On an all-ssd system with 4TB storage, 87% fragmentation, 60% capacity,
and recordsize=8k, testing shows a ~50% performance increase on random
8k sync writes.  On even more fragmented systems (which hit problem #3
above and call txg_wait_synced()), the performance improvement can be
arbitrarily large (>100x).

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11389

3 years agodracut: Support /usr/bin as 'systemctl' path
Lorenz Hüdepohl [Thu, 21 Jan 2021 20:59:24 +0000 (21:59 +0100)]
dracut: Support /usr/bin as 'systemctl' path

On openSUSE, the initrd has systemctl in /usr/bin, so check this path
as well.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Lorenz Hüdepohl <dev@stellardeath.org>
Closes #11487

3 years agoInstall zgenhostid to sbindir
Antonio Russo [Thu, 21 Jan 2021 20:58:24 +0000 (13:58 -0700)]
Install zgenhostid to sbindir

zgenhostid(8) is used to modify or create /etc/hostid.  This
administrative tool is currently installed to bindir.  System utilities
are typically placed in sbin.

Modify the installation directory for zgenhostid.  Additionally, update
the places that use it, in dracut and the rpm installation, to match.

Authored-by: наб <nabijaczleweli@nabijaczleweli.xyz>
Authored-by: Antonio Russo <aerusso@aerusso.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11485

3 years agozpool: speed up importing large pools (#11469)
Alan Somers [Thu, 21 Jan 2021 20:55:54 +0000 (13:55 -0700)]
zpool: speed up importing large pools (#11469)

The ZFS_IOC_POOL_TRYIMPORT ioctl returns an nvlist from the kernel to a
preallocated buffer in userland.  Userland must guess how large the
buffer should be.  If it undersizes it, it must reallocate and try
again.  That can cost a lot of time for large pools.

OpenZFS commit 28b40c8a6e3 set the guess at "zc.zc_nvlist_conf_size * 4"
without explanation.  On my system, that is too small.  From experiment,
32 is a better multiplier, but I don't know how to calculate it
theoretically.
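
The guess-and-retry pattern described above can be sketched as follows.
This is not the libzfs code: fill_fn_t, fake_fill() and
fetch_with_guess() are hypothetical names, and fake_fill() only
simulates a producer that returns ENOMEM and reports the size it needs
when the buffer is too small.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical producer: fills buf, or returns ENOMEM and the needed size. */
typedef int (*fill_fn_t)(void *buf, size_t *sizep, void *arg);

/* Simulated producer; `arg` points to the number of bytes it must write. */
static int
fake_fill(void *buf, size_t *sizep, void *arg)
{
	size_t need = *(size_t *)arg;

	if (*sizep < need) {
		*sizep = need;
		return (ENOMEM);
	}
	memset(buf, 0, need);
	*sizep = need;
	return (0);
}

/*
 * Start from conf_size * multiplier and grow to the reported size on
 * ENOMEM.  Returns the filled buffer (caller frees) or NULL on error;
 * *attempts counts how many times the producer was called.
 */
static void *
fetch_with_guess(fill_fn_t fill, void *arg, size_t conf_size,
    size_t multiplier, int *attempts)
{
	size_t size = conf_size * multiplier;
	void *buf = NULL;

	if (size == 0)
		size = 1;
	*attempts = 0;
	for (;;) {
		void *nbuf = realloc(buf, size);
		if (nbuf == NULL) {
			free(buf);
			return (NULL);
		}
		buf = nbuf;
		(*attempts)++;

		size_t cap = size;
		int err = fill(buf, &size, arg);
		if (err == 0)
			return (buf);
		/* Give up unless the producer asked for a larger buffer. */
		if (err != ENOMEM || size <= cap) {
			free(buf);
			return (NULL);
		}
	}
}
```

Each extra pass through the loop is another full round trip to the
producer, which is why a larger initial multiplier can save time even
though it over-allocates for small pools.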

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11469