]> git.proxmox.com Git - mirror_zfs.git/log
mirror_zfs.git
3 years agocppcheck: zpool_main.c possible null pointer dereference
Brian Behlendorf [Fri, 22 Jan 2021 23:03:56 +0000 (15:03 -0800)]
cppcheck: zpool_main.c possible null pointer dereference

Explicitly check for NULL to satisfy cppcheck that "val" can never
be NULL when passed to printf().  This looks like a false positive
since is_blank_str() can never take the false conditional branch
when passed a NULL.  But there's no harm in adding the extra check.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508

3 years agoRAIDZ2/3 fails to heal silently corrupted parity w/2+ bad disks
Matthew Ahrens [Wed, 27 Jan 2021 00:05:05 +0000 (16:05 -0800)]
RAIDZ2/3 fails to heal silently corrupted parity w/2+ bad disks

When scrubbing, (non-sequential) resilvering, or correcting a checksum
error using RAIDZ parity, ZFS should heal any incorrect RAIDZ parity by
overwriting it.  For example, if P disks are silently corrupted (P being
the number of failures tolerated; e.g. RAIDZ2 has P=2), `zpool scrub`
should detect and heal all the bad state on these disks, including
parity.  This way if there is a subsequent failure we are fully
protected.

With RAIDZ2 or RAIDZ3, a block can have silent damage to a parity
sector, and also damage (silent or known) to a data sector.  In this
case the parity should be healed but it is not.

The problem can be noticed by scrubbing the pool twice.  Assuming there
was no damage concurrent with the scrubs, the first scrub should fix all
silent damage, and the second scrub should be "clean" (`zpool status`
should not report checksum errors on any disks).  If the bug is
encountered, then the second scrub will repair the silently-damaged
parity that the first scrub failed to repair, and these checksum errors
will be reported after the second scrub.  Since the first scrub repaired
all the damaged data, the bug can not be encountered during the second
scrub, so subsequent scrubs (more than two) are not necessary.

The root cause of the problem is some code that was inadvertently added
to `raidz_parity_verify()` by the DRAID changes.  The incorrect code
causes the parity healing to be aborted if there is damaged data
(`rc_error != 0`) or the data disk is not present (`!rc_tried`).  These
checks are not necessary, because we only call `raidz_parity_verify()`
if we have the correct data (which may have been reconstructed using
parity, and which was verified by the checksum).

This commit fixes the problem by removing the incorrect checks in
`raidz_parity_verify()`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11489
Closes #11510

3 years agoZTS: zpool_export test improvements
Will Andrews [Tue, 26 Jan 2021 21:14:04 +0000 (15:14 -0600)]
ZTS: zpool_export test improvements

- refactor cleanup routines into common kshlib zpool_export_cleanup func
- don't require physical disks to test, just use files

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Will Andrews <will@firepipe.net>
Closes #11518

3 years agodracut: Fix race condition between load-key and import
Lorenz Hüdepohl [Tue, 26 Jan 2021 20:14:22 +0000 (21:14 +0100)]
dracut: Fix race condition between load-key and import

zfs-load-key.sh is called by the dracut-pre-mount.service unit which has
no explicit 'After' dependency on zfs-import.target. That way it can be
that the pool has not yet been imported and the zfs-load-key.sh finishes
without ever seeing the relevant pool.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Lorenz Hüdepohl <dev@stellardeath.org>
Closes #11500

3 years agospa_export_common: refactor common exit points
Will Andrews [Mon, 25 Jan 2021 23:04:11 +0000 (17:04 -0600)]
spa_export_common: refactor common exit points

Create a common exit point for spa_export_common (a very long
function), which avoids missing steps on failure.  This work
is helpful for the planned forced pool export changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Will Andrews <will@firepipe.net>
Closes #11514

3 years agoZTS: improve output clarity of check_prop_source
Will Andrews [Sun, 11 Oct 2020 20:11:06 +0000 (15:11 -0500)]
ZTS: improve output clarity of check_prop_source

Instead of just failing, indicate the expected and actual value and
source as a NOTE.  Tests using this failed in an earlier version of
the changeset and this information helped find the cause.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Will Andrews <will@firepipe.net>
Closes #11517

3 years agoZTS: remove duplicate check_prop_source from zfs_receive
Will Andrews [Mon, 25 Jan 2021 22:38:19 +0000 (16:38 -0600)]
ZTS: remove duplicate check_prop_source from zfs_receive

There is an identical definition in zfs_set_common.kshlib already.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Will Andrews <will@firepipe.net>
Closes #11516

3 years agologapi: cat output file instead of printing
Will Andrews [Sun, 11 Oct 2020 20:08:56 +0000 (15:08 -0500)]
logapi: cat output file instead of printing

This avoids globbing together multiple lines in the log, if you happen
to specify LOGAPI_DEBUG because you want to see it.

Signed-off-by: Will Andrews <will@firepipe.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11515

3 years agospl-taskq: Make sure thread tsd hash entry is cleared
Matthew Macy [Mon, 25 Jan 2021 19:18:28 +0000 (11:18 -0800)]
spl-taskq: Make sure thread tsd hash entry is cleared

Like any other thread created by thread_create() we need to call
thread_exit() to properly clean it up.  In particular, this ensures the
tsd hash for the thread is cleared.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11512

3 years agoSpeed up "zpool import" in the presence of many zvols
Alan Somers [Mon, 25 Jan 2021 00:02:45 +0000 (17:02 -0700)]
Speed up "zpool import" in the presence of many zvols

By default, FreeBSD does not allow zpools to be backed by zvols (that
can be changed with the "vfs.zfs.vol.recursive" sysctl). When that
sysctl is set to 0, the kernel does not attempt to read zvols when
looking for vdevs. But the zpool command still does. This change brings
the zpool command into line with the kernel's behavior. It speeds "zpool
import" when an already imported pool has many zvols, or a zvol with
many snapshots.

https://svnweb.freebsd.org/base?view=revision&revision=357235
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241083
https://reviews.freebsd.org/D22077

Obtained from: FreeBSD
Reported by: Martin Birgmeier <d8zNeCFG@aon.at>
Sponsored by: Axcient
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11502

3 years agozfsprops.8: fix mispluralisation in "Default values is"
наб [Sun, 24 Jan 2021 23:57:51 +0000 (00:57 +0100)]
zfsprops.8: fix mispluralisation in "Default values is"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11509

3 years agoZTS: Use swapctl to list swap devices on FreeBSD
Ryan Moeller [Sun, 24 Jan 2021 23:56:59 +0000 (18:56 -0500)]
ZTS: Use swapctl to list swap devices on FreeBSD

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11503

3 years agovdev_id: Add error message when $CONFIG is missing
Arshad Hussain [Sat, 23 Jan 2021 23:52:29 +0000 (05:22 +0530)]
vdev_id: Add error message when $CONFIG is missing

It was observed that vdev_id exists silently when
the $CONFIG file is missing.

This patch adds error message in case vdev_id is
called without default $CONFIG or '-c'. This makes
end user observe the exit message more easily.

Before Patch:
~~~~~~~~~~~~~
$ ./cmd/vdev_id/vdev_id
$

After Patch:
~~~~~~~~~~~~
$ ./cmd/vdev_id/vdev_id
Error: Config file "/etc/zfs/vdev_id.conf" not found
$

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #11498

3 years agoFix two minor lint errors (cppcheck)
Colm [Sat, 23 Jan 2021 23:49:32 +0000 (23:49 +0000)]
Fix two minor lint errors (cppcheck)

Fix two minor errors reported by cppcheck:

In module/zfs/abd.c (abd_get_offset_impl), add non-NULL
assertion to prevent NULL dereference warning.

In module/zfs/arc.c (l2arc_write_buffers), change 'try'
variable to 'pass' to avoid C++ reserved word.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #11507

3 years agoRelax special_small_blocks assertion.
Alexander Motin [Sat, 23 Jan 2021 23:45:27 +0000 (18:45 -0500)]
Relax special_small_blocks assertion.

Follow up for commit 624222a, value asserted <= SPA_OLD_MAXBLOCKSIZE
instead of SPA_MAXBLOCKSIZE as it should be after the previous change.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11501

3 years agoAdd basic io_uring test
Matthew Macy [Sat, 23 Jan 2021 23:42:42 +0000 (15:42 -0800)]
Add basic io_uring test

Provide a basic test coverage for io_uring I/O.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11497

3 years agoFreeBSD: upstream changes to VFS interface
Ryan Moeller [Thu, 21 Jan 2021 23:20:14 +0000 (23:20 +0000)]
FreeBSD: upstream changes to VFS interface

Set VIRF_MOUNTPOINT flag on snapshot mountpoint.

Authored-by: Mateusz Guzik <mjg@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11458

3 years agoFreeBSD: fix HEAD build, conditionally remove FDSYNC defines
Matt Macy [Wed, 13 Jan 2021 00:22:29 +0000 (16:22 -0800)]
FreeBSD: fix HEAD build, conditionally remove FDSYNC defines

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11458

3 years agoOnly add supported features during pool creation
Brian Behlendorf [Fri, 22 Jan 2021 17:47:06 +0000 (09:47 -0800)]
Only add supported features during pool creation

When creating a pool only features supported by both user and
kernel space should be enabled.  Furthermore, improve the error
messages when attempting to create, or add, a dRAID vdev when
the dRAID feature is not supported by the kernel modules.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11492

3 years agoSet aside a metaslab for ZIL blocks
Matthew Ahrens [Thu, 21 Jan 2021 23:12:54 +0000 (15:12 -0800)]
Set aside a metaslab for ZIL blocks

Mixing ZIL and normal allocations has several problems:

1. The ZIL allocations are allocated, written to disk, and then a few
seconds later freed.  This leaves behind holes (free segments) where the
ZIL blocks used to be, which increases fragmentation, which negatively
impacts performance.

2. When under moderate load, ZIL allocations are of 128KB.  If the pool
is fairly fragmented, there may not be many free chunks of that size.
This causes ZFS to load more metaslabs to locate free segments of 128KB
or more.  The loading happens synchronously (from zil_commit()), and can
take around a second even if the metaslab's spacemap is cached in the
ARC.  All concurrent synchronous operations on this filesystem must wait
while the metaslab is loading.  This can cause a significant performance
impact.

3. If the pool is very fragmented, there may be zero free chunks of
128KB or more.  In this case, the ZIL falls back to txg_wait_synced(),
which has an enormous performance impact.

These problems can be eliminated by using a dedicated log device
("slog"), even one with the same performance characteristics as the
normal devices.

This change sets aside one metaslab from each top-level vdev that is
preferentially used for ZIL allocations (vdev_log_mg,
spa_embedded_log_class).  From an allocation perspective, this is
similar to having a dedicated log device, and it eliminates the
above-mentioned performance problems.

Log (ZIL) blocks can be allocated from the following locations.  Each
one is tried in order until the allocation succeeds:
1. dedicated log vdevs, aka "slog" (spa_log_class)
2. embedded slog metaslabs (spa_embedded_log_class)
3. other metaslabs in normal vdevs (spa_normal_class)

The space required for the embedded slog metaslabs is usually between
0.5% and 1.0% of the pool, and comes out of the existing 3.2% of "slop"
space that is not available for user data.

On an all-ssd system with 4TB storage, 87% fragmentation, 60% capacity,
and recordsize=8k, testing shows a ~50% performance increase on random
8k sync writes.  On even more fragmented systems (which hit problem #3
above and call txg_wait_synced()), the performance improvement can be
arbitrarily large (>100x).

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11389

3 years agodracut: Support /usr/bin as 'systemctl' path
Lorenz Hüdepohl [Thu, 21 Jan 2021 20:59:24 +0000 (21:59 +0100)]
dracut: Support /usr/bin as 'systemctl' path

On openSUSE the initrd has systemctl in /usr/bin, check this path as
well.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Lorenz Hüdepohl <dev@stellardeath.org>
Closes #11487

3 years agoInstall zgenhostid to sbindir
Antonio Russo [Thu, 21 Jan 2021 20:58:24 +0000 (13:58 -0700)]
Install zgenhostid to sbindir

zgenhostid(8) is used to modify or create /etc/hostid.  This
administrative tool is currently installed to bindir.  System utilities
are typically placed in sbin.

Modify the installation directory for zgenhostid.  Additionally, track
this change in its use in dracut and the rpm installation.

Authored-by: наб <nabijaczleweli@nabijaczleweli.xyz>
Authored-by: Antonio Russo <aerusso@aerusso.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11485

3 years agozpool: speed up importing large pools (#11469)
Alan Somers [Thu, 21 Jan 2021 20:55:54 +0000 (13:55 -0700)]
zpool: speed up importing large pools (#11469)

The ZFS_IOC_POOL_TRYIMPORT ioctl returns an nvlist from the kernel to a
preallocated buffer in  userland.  Userland must guess how large the
buffer should be.  If it undersizes it, it must reallocate and try
again.  That can cost a lot of time for large pools.

OpenZFS commit 28b40c8a6e3 set the guess at "zc.zc_nvlist_conf_size * 4"
without explanation.  On my system, that is too small.  From experiment,
x 32 is a better multiplier.  But I don't know how to calculate it
theoretically.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11469

3 years agolibzutil: optimize zpool_read_label with AIO
Alan Somers [Wed, 13 Jan 2021 17:00:12 +0000 (10:00 -0700)]
libzutil: optimize zpool_read_label with AIO

Read all labels in parallel instead of sequentially.

Originally committed as
https://cgit.freebsd.org/src/commit/?id=b49e9abcf44cafaf5cfad7029c9a6adbb28346e8

Obtained from: FreeBSD
Sponsored by: Spectra Logic, Axcient
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11467

3 years agolibzutil: don't read extraneous data in zpool_read_label
Alan Somers [Wed, 13 Jan 2021 16:30:48 +0000 (09:30 -0700)]
libzutil: don't read extraneous data in zpool_read_label

zpool_read_label doesn't need the full labels including uberblocks.  It
only needs the vdev_phys_t.  This reduces by half the amount of data
read to check for a label, speeding up "zpool import", "zpool
labelclear", etc.

Originally committed as
https://cgit.freebsd.org/src/commit/?id=63f8025d6acab1b334373ddd33f940a69b3b54cc

Obtained from: FreeBSD
Sponsored by: Spectra Logic Corp, Axcient
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11467

3 years agoLinux 5.10 compat: restore custom uio_prefaultpages()
Brian Behlendorf [Thu, 21 Jan 2021 18:43:39 +0000 (10:43 -0800)]
Linux 5.10 compat: restore custom uio_prefaultpages()

As part of commit 1c2358c1 the custom uio_prefaultpages() code
was removed in favor of using the generic kernel provided
iov_iter_fault_in_readable() interface.  Unfortunately, it
turns out that up until the Linux 4.7 kernel the function would
only ever fault in the first iovec of the iov_iter.  The result
being uiomove_iov() may hang waiting for the page.

This commit effectively restores the custom uio_prefaultpages()
pages code for Linux 4.9 and earlier kernels which contain the
troublesome version of iov_iter_fault_in_readable().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11463
Closes #11484

3 years agoExtending FreeBSD UIO Struct
Brian Atkinson [Thu, 21 Jan 2021 05:27:30 +0000 (22:27 -0700)]
Extending FreeBSD UIO Struct

In FreeBSD the struct uio was just a typedef to uio_t. In order to
extend this struct, outside of the definition for the struct uio, the
struct uio has been embedded inside of a uio_t struct.

Also renamed all the uio_* interfaces to be zfs_uio_* to make it clear
this is a ZFS interface.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11438

3 years agoallow callers to allocate and provide the abd_t struct
Matthew Ahrens [Wed, 20 Jan 2021 19:24:37 +0000 (11:24 -0800)]
allow callers to allocate and provide the abd_t struct

The `abd_get_offset_*()` routines create an abd_t that references
another abd_t, and doesn't allocate any pages/buffers of its own.  In
some workloads, these routines may be called frequently, to create many
abd_t's representing small pieces of a single large abd_t.  In
particular, the upcoming RAIDZ Expansion project makes heavy use of
these routines.

This commit adds the ability for the caller to allocate and provide the
abd_t struct to a variant of `abd_get_offset_*()`.  This eliminates the
cost of allocating the abd_t and performing the accounting associated
with it (`abdstat_struct_size`).  The RAIDZ/DRAID code uses this for
the `rc_abd`, which references the zio's abd.  The upcoming RAIDZ
Expansion project will leverage this infrastructure to increase
performance of reads post-expansion by around 50%.

Additionally, some of the interfaces around creating and destroying
abd_t's are cleaned up.  Most significantly, the distinction between
`abd_put()` and `abd_free()` is eliminated; all types of abd_t's are
now disposed of with `abd_free()`.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Issue #8853
Closes #11439

3 years agoRe-apply path sanitizer, as mount(8) still mangles it
sterlingjensen [Tue, 19 Jan 2021 19:57:31 +0000 (13:57 -0600)]
Re-apply path sanitizer, as mount(8) still mangles it

Prior to util-linux 2.36.2, if a file or directory in the
current working directory was named 'dataset' then mount(8)
would prepend the current working directory to the dataset.

Eventually, we should be able to drop this workaround.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11295
Closes #11462

3 years agoZTS: avoid piping to special devices
Antonio Russo [Tue, 19 Jan 2021 19:53:35 +0000 (12:53 -0700)]
ZTS: avoid piping to special devices

As described in #11445, the kernel interface kernel_{read,write} no
longer act on special devices.  In the ZTS, zfs send and receive are
tested by piping to these devices, leading to spurious failures (for
positive tests) and may mask errors (for negative tests).

Until a more permanent mechanism to address this deficiency is
developed, clean up the output from the ZTS by avoiding directly piping
to or from /dev/null and /dev/zero.

For /dev/zero input, simply use a pipe: `cat </dev/zero |` .

However, for /dev/null output, the shell semantics for pipe failures
means that zfs send error codes will be masked by the successful
`| cat >/dev/null` command execution.  In that case, use a temporary
file under $TEST_BASE_DIR for output in favor.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11478

3 years agolibzfs_sendrecv: Use fnv* to verify nvlist/nvpair*
Ryan Moeller [Thu, 14 Jan 2021 17:53:09 +0000 (12:53 -0500)]
libzfs_sendrecv: Use fnv* to verify nvlist/nvpair*

Use verified variants of nvlist/nvpair functions where applicable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11460

3 years agoztest: Clean up use of ASSERT and VERIFY
Ryan Moeller [Wed, 13 Jan 2021 01:21:01 +0000 (20:21 -0500)]
ztest: Clean up use of ASSERT and VERIFY

Try to use more appropriate ASSERT and VERIFY variants in ztest.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11454

3 years agoZTS: avoid race to unmount in zfs_rollback_001
Antonio Russo [Wed, 13 Jan 2021 01:20:02 +0000 (18:20 -0700)]
ZTS: avoid race to unmount in zfs_rollback_001

The zfs_rollback_001 test modifies files in a temporary, test dataset
repeatedly.  Before each iteration, any preexisting dataset is removed,
after unmounted with umount -f, if necessary.

Add a short delay after the forced unmount, avoiding a race that can
prevent zfs destroy from succeeding, leading to a test failure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11451

3 years agoztest: Use fnvlist_* instead of VERIFYing nvlist_*
Ryan Moeller [Mon, 11 Jan 2021 17:30:33 +0000 (12:30 -0500)]
ztest: Use fnvlist_* instead of VERIFYing nvlist_*

Simplify ztest by using fnvlist functions to verify success.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11441

3 years agorecord ioctl elapsed time in zpool history
Matthew Ahrens [Mon, 11 Jan 2021 17:29:25 +0000 (09:29 -0800)]
record ioctl elapsed time in zpool history

Each zfs ioctl that changes on-disk state (e.g. set property, create
snapshot, destroy filesystem) is recorded in the zpool history, and is
printed by `zpool history -i`.

For performance diagnostic purposes, it would be useful to know how long
each of these ioctls took to run.  This commit adds that functionality,
with a new `ZPOOL_HIST_ELAPSED_NS` member of the history nvlist.

Additionally, the time recorded in this history log is currently the
time that the history record is written to disk.  But in many cases (CLI
args logging and ioctl logging), this happens asynchronously,
potentially many seconds after the operation completed.  This commit
changes the timestamp to reflect when the history event was created,
rather than when it was written to disk.

Reviewed-by: Mark Maybee <mmaybee@cray.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11440

3 years agoassertion failed in arc_wait_for_eviction()
Matthew Ahrens [Fri, 8 Jan 2021 04:06:32 +0000 (20:06 -0800)]
assertion failed in arc_wait_for_eviction()

If the system is very low on memory (specifically,
`arc_free_memory() < arc_sys_free/2`, i.e. less than 1/16th of RAM
free), `arc_evict_state_impl()` will defer wakups.  In this case, the
arc_evict_waiter_t's remain on the list, even though `arc_evict_count`
has been incremented past their `aew_count`.

The problem is that `arc_wait_for_eviction()` assumes that if there are
waiters on the list, the count they are waiting for has not yet been
reached.  However, the deferred wakeups may violate this, causing
`ASSERT(last->aew_count > arc_evict_count)` to fail.

This commit resolves the issue by having new waiters use the greater of
`arc_evict_count` and the last `aew_count`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11285
Closes #11397

3 years agoFreeBSD: minor_t needs to be signed so that -1 is recognized as such
Matthew Macy [Thu, 7 Jan 2021 18:41:27 +0000 (10:41 -0800)]
FreeBSD: minor_t needs to be signed so that -1 is recognized as such

zfsdev_close sets zs_minor to -1 to avoid duplicate calls to
destroy. This doesn't mix well with the current u_int used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11437

3 years agoAutoconf 2.70 compatibility
Brian Behlendorf [Sun, 3 Jan 2021 00:55:55 +0000 (16:55 -0800)]
Autoconf 2.70 compatibility

Several m4 macros have been retired in autoconf 2.70.  Update the
the build system to use the new macros provided to replace them.

* Replaced AC_HELP_STRING with AS_HELP_STRING.

* Replaced AC_TRY_COMPILE with AC_COMPILE_IFELSE/AC_LANG_PROGRAM.

* Replaced AC_CANONICAL_SYSTEM with AC_CANONICAL_TARGET

* Replaced AC_PROG_LIBTOOL with LT_INIT

* $CPP is not defined in ZFS_AC_KERNEL and really shouldn't be
  directly used like this.  Replace it with an $AWK command
  to extract the kernel source version.

Reviewed-by: Eli Schwartz <eschwartz@archlinux.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11413
Closes #11419

3 years agozfs_mount_all_mountpoints: cleanup_all should leave pool root mounted
Toomas Soome [Sun, 3 Jan 2021 00:54:53 +0000 (02:54 +0200)]
zfs_mount_all_mountpoints: cleanup_all should leave pool root mounted

if pool root is not mounted, then zpool umount in next test will leave
dataset mountpoint directory around and next zfs mount -a will fail
with error: cannot mount '/testpool': directory is not empty

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #11417

3 years agoVZ 7 kernel compat: introduce ITER-enabled .direct_IO() via IOVECs
Konstantin Khorenko [Wed, 30 Dec 2020 22:18:29 +0000 (01:18 +0300)]
VZ 7 kernel compat: introduce ITER-enabled .direct_IO() via IOVECs

Virtuozzo 7 kernels starting 3.10.0-1127.18.2.vz7.163.46
have the following configuration:

  * no HAVE_VFS_RW_ITERATE
  * HAVE_VFS_DIRECT_IO_ITER_RW_OFFSET

=> let's add implementation of zpl_direct_IO() via
zpl_aio_{read,write}() in this case.

https://bugs.openvz.org/browse/OVZ-7243

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Closes #11410
Closes #11411

3 years agoMemory leak in zdb:import_checkpointed_state()
Matthew Ahrens [Fri, 25 Dec 2020 05:07:24 +0000 (21:07 -0800)]
Memory leak in zdb:import_checkpointed_state()

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396

3 years agoMemory leak in ztest_dmu_objset_own()
Matthew Ahrens [Fri, 25 Dec 2020 04:58:17 +0000 (20:58 -0800)]
Memory leak in ztest_dmu_objset_own()

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396

3 years agoMemory leak in ztest_vdev_attach_detach()
Matthew Ahrens [Fri, 25 Dec 2020 04:51:13 +0000 (20:51 -0800)]
Memory leak in ztest_vdev_attach_detach()

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396

3 years agonvlist leaked in zpool_find_config()
Matthew Ahrens [Wed, 23 Dec 2020 17:52:24 +0000 (09:52 -0800)]
nvlist leaked in zpool_find_config()

In `zpool_find_config()`, the `pools` nvlist is leaked.  Part of it (a
sub-nvlist) is returned in `*configp`, but the callers also leak that.

Additionally, in `zdb.c:main()`, the `searchdirs` is leaked.

The leaks were detected by ASAN (`configure --enable-asan`).

This commit resolves the leaks.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396

3 years agoimplicit conversion from 'boolean_t' to 'ds_hold_flags_t'
Toomas Soome [Mon, 28 Dec 2020 00:31:02 +0000 (02:31 +0200)]
implicit conversion from 'boolean_t' to 'ds_hold_flags_t'

Build error on illumos with gcc 10 did reveal:

In function 'dmu_objset_refresh_ownership':
../../common/fs/zfs/dmu_objset.c:857:25: error: implicit conversion
from 'boolean_t' to 'ds_hold_flags_t' {aka 'enum ds_hold_flags'}
[-Werror=enum-conversion]
      857 |  dsl_dataset_disown(ds, decrypt, tag);
          |                         ^~~~~~~
cc1: all warnings being treated as errors

libzfs_input_check.c: In function 'zfs_ioc_input_tests':
libzfs_input_check.c:754:28: error: implicit conversion from
'enum dmu_objset_type' to 'enum lzc_dataset_type'
[-Werror=enum-conversion]
  754 |  err = lzc_create(dataset, DMU_OST_ZFS, NULL, NULL, 0);
      |                            ^~~~~~~~~~~
cc1: all warnings being treated as errors

The same issue is present in openzfs, and also the same issue about
ds_hold_flags_t, which currently defines exactly one valid value.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #11406

3 years agoLinux 5.11 compat: blk_{un}register_region()
Brian Behlendorf [Tue, 22 Dec 2020 22:12:31 +0000 (14:12 -0800)]
Linux 5.11 compat: blk_{un}register_region()

As of 5.11 the blk_register_region() and blk_unregister_region()
functions have been retired. This isn't a problem since add_disk()
has implicitly allocated minor numbers for a very long time.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390

3 years agoLinux 5.11 compat: revalidate_disk_size()
Brian Behlendorf [Tue, 22 Dec 2020 21:53:25 +0000 (13:53 -0800)]
Linux 5.11 compat: revalidate_disk_size()

Both revalidate_disk_size() and revalidate_disk() have been removed.
Functionally this isn't a problem because we only relied on these
functions to call zvol_revalidate_disk() for us and to perform any
additional handling which might be needed for that kernel version.
When neither are available we know there's no additional handling
needed and we can directly call zvol_revalidate_disk().

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390

3 years agoLinux 5.11 compat: bdev_whole()
Brian Behlendorf [Tue, 22 Dec 2020 21:02:59 +0000 (13:02 -0800)]
Linux 5.11 compat: bdev_whole()

The bd_contains member was removed from the block_device structure.
Callers needing to determine if a vdev is a whole block device should
use the new bdev_whole() wrapper.  For older kernels we provide our
own bdev_whole() wrapper which relies on bd_contains for compatibility.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390

3 years agoLinux 5.11 compat: bio_start_io_acct() / bio_end_io_acct()
Brian Behlendorf [Tue, 22 Dec 2020 20:17:13 +0000 (12:17 -0800)]
Linux 5.11 compat: bio_start_io_acct() / bio_end_io_acct()

The generic IO accounting functions have been removed in favor of the
bio_start_io_acct() and bio_end_io_acct() functions which provide a
better interface.  These new functions were introduced in the 5.8
kernels but it wasn't until the 5.11 kernel that the previous generic
IO accounting interfaces were removed.

This commit updates the blk_generic_*_io_acct() wrappers to provide
and interface similar to the updated kernel interface.  It's slightly
different because for older kernels we need to pass the request queue
as well as the bio.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390

3 years agoLinux 5.11 compat: lookup_bdev()
Brian Behlendorf [Tue, 22 Dec 2020 18:26:45 +0000 (10:26 -0800)]
Linux 5.11 compat: lookup_bdev()

The lookup_bdev() function has been updated to require a dev_t
be passed as the second argument. This is actually pretty nice
since the major number stored in the dev_t was the only part we
were interested in. This allows to us avoid handling the bdev
entirely.  The vdev_lookup_bdev() wrapper was updated to emulate
the behavior of the new lookup_bdev() for all supported kernels.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390

3 years agoLinux 5.11 compat: conftest
Brian Behlendorf [Wed, 23 Dec 2020 19:40:02 +0000 (11:40 -0800)]
Linux 5.11 compat: conftest

Update the ZFS_LINUX_TEST_PROGRAM macro to always set the module
license.  As of the 5.11 kernel not setting a license has been
converted from a warning to an error.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390

3 years agodbufstat: Fix warnings with Python 3.8
Ryan Moeller [Wed, 23 Dec 2020 23:10:35 +0000 (18:10 -0500)]
dbufstat: Fix warnings with Python 3.8

Replace "is" with "==" and "is not" with "!=".

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11394

3 years agoLinux 5.10 compat: META
Brian Behlendorf [Wed, 23 Dec 2020 16:55:02 +0000 (08:55 -0800)]
Linux 5.10 compat: META

Increase the Linux-Maximum version in the META file to 5.10.
All of the required compatibility patches have been merged.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11391

3 years agoRemove unused check from dmu_tx_count_write()
Brian Behlendorf [Tue, 22 Dec 2020 04:17:13 +0000 (20:17 -0800)]
Remove unused check from dmu_tx_count_write()

Individual transactions may not be larger than DMU_MAX_ACCESS.
This is enforced by the assertions in dmu_tx_hold_write() and
dmu_tx_hold_write_by_dnode().  There's an additional check in
dmu_tx_count_write() however it has no effect and only sets a
local err variable.  We could enable this check, however since
it's already enforced by ASSERTs elsewhere I opted to remove it
instead.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3731
Closes #11384

3 years agozfs-kmods: install to /lib/modules instead of /usr/lib/modules
Christian Schwarz [Tue, 22 Dec 2020 04:14:32 +0000 (04:14 +0000)]
zfs-kmods: install to /lib/modules instead of /usr/lib/modules

Before this patch, dracut wouldn't find zfs.ko for inclusion in
initramfs. This was caused by the packages installing in to
/lib/modules instead of /usr/lib/modules.  Correcting this allows
dracut to do the right thing, even without

    # /etc/dracut.conf
    add_drivers+=" zfs "

Notably, rpm/redhat/zfs-kmod.spec.in does not contain the definition of
the `prefix` macro that this commit removes in the generic kmod spec.
And https://rpmfusion.org/Packaging/KernelModules/Kmods2 does not
mention `prefix` at all.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11381

3 years agoForward questions to github discussions
Kjeld Schouten-Lebbing [Tue, 22 Dec 2020 04:09:02 +0000 (05:09 +0100)]
Forward questions to github discussions

Instead of creating issues with type "question"
Forward to the GitHub Discussion system.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #11383

3 years agoDangling reference from dmu_objset_upgrade
Andy Fiddaman [Mon, 21 Dec 2020 18:13:23 +0000 (18:13 +0000)]
Dangling reference from dmu_objset_upgrade

After porting the fix for https://github.com/openzfs/zfs/issues/5295
over to illumos, we started hitting an assertion failure when running
the testsuite:

assertion failed: rc->rc_count == number, file: .../refcount.c

and the unexpected hold has this stack:

dsl_dataset_long_hold+0x59 dmu_objset_upgrade+0x73
dmu_objset_id_quota_upgrade+0x15 dmu_objset_own+0x14f

The simplest reproducer for this in illumos is

    zpool create -f -O version=1 testpool c3t0d0; zpool destroy testpool

which is run as part of the zpool_create_tempname test, but I can't get
this to trigger on FreeBSD. This appears to be because of the call to
txg_wait_synced() in dmu_objset_upgrade_stop() (which was missing in
illumos), slows down dmu_objset_disown() enough to avoid the condition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andy Fiddaman <andy@omnios.org>
Closes #11368

3 years agoLinux 4.18.0-257.el8 compat: blk_alloc_queue()
Brian Behlendorf [Mon, 21 Dec 2020 18:11:56 +0000 (10:11 -0800)]
Linux 4.18.0-257.el8 compat: blk_alloc_queue()

The CentOS stream 4.18.0-257 kernel appears to have backported
the Linux 5.9 change to make_request_fn and the associated API.
To maintain weak modules compatibility the original symbol was
retained and the new interface blk_alloc_queue_rh() was added.

Unfortunately, blk_alloc_queue() was replaced in the blkdev.h
header by blk_alloc_queue_bh() so there doesn't seem to be a way
to build new kmods against the old interfces.  Even though they
appear to still be available for weak module binding.

To accommodate this a configure check is added for the new _rh()
variant of the function and used if available.  If compatibility
code gets added to the kernel for the original blk_alloc_queue()
interface this should be fine.  OpenZFS will simply continue to
prefer the new interface and only fallback to blk_alloc_queue()
when blk_alloc_queue_rh() isn't available.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11374

3 years agoFix maybe uninitialized variable warning
Brian Behlendorf [Sun, 20 Dec 2020 17:50:13 +0000 (09:50 -0800)]
Fix maybe uninitialized variable warning

Commit 1c2358c12 restructured this code and introduced a warning
about the variable maybe not being initialized.  This cannot happen
with the updated code but we should initialize the variable anyway
to silence the warning.

    zpl_file.c: In function ‘zpl_iter_write’:
    zpl_file.c:324:9: warning: ‘count’ may be used uninitialized
        in this function [-Wmaybe-uninitialized]

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11373

3 years agoRemove iov_iter_advance() from iter_read
Brian Behlendorf [Sun, 20 Dec 2020 17:49:29 +0000 (09:49 -0800)]
Remove iov_iter_advance() from iter_read

There's no need to call iov_iter_advance() in zpl_iter_read().
This was preserved from the previous code where it wasn't needed
but also didn't cause any problems.  Now that the iter functions
also handle pipes that's no longer the case.  When fully reading a
pipe buffer iov_iter_advance() may results in the pipe buf release
function being called which will not be registered resulting in
a NULL dereference.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11375
Closes #11378

3 years agodsl_pool: extend comment on DSL Pool Configuration Lock
Christian Schwarz [Sun, 20 Dec 2020 02:04:05 +0000 (02:04 +0000)]
dsl_pool: extend comment on DSL Pool Configuration Lock

Based on a conversation with Matt on the OpenZFS Slack.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11370

3 years agoLinux 5.10 compat: also zvol_revalidate_disk()
Michael D Labriola [Fri, 18 Dec 2020 17:36:19 +0000 (12:36 -0500)]
Linux 5.10 compat: also zvol_revalidate_disk()

Commit 59b68723 added a configure check for 5.10, which removed
revalidate_disk(), and conditionally replaced it's usage with a call to
the new revalidate_disk_size() function.  However, the old function also
invoked the device's registered callback, in our case
zvol_revalidate_disk().  This commit adds a call to zvol_revalidate_disk()
in zvol_update_volsize() to make sure the code path stays the same.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Michael D Labriola <michael.d.labriola@gmail.com>
Closes #11358

3 years agoZED/zfs-list-cacher.sh: don't exit on ignored event type
Prawn [Fri, 18 Dec 2020 17:34:10 +0000 (18:34 +0100)]
ZED/zfs-list-cacher.sh: don't exit on ignored event type

Check for the history_event type instead.

The zfs-list-cacher.sh script currently respects the event types
excluded from syslog(!) in ZED_SYSLOG_SUBCLASS_EXCLUDE.
This makes little sense in this single-purpose script and
silently breaks when history_events are excluded from syslog,
which is the default since 13d65987a9d9958de77422f5d9d25b47e486537d.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #11164
Closes #11347

3 years agoLinux 5.10 compat: use iov_iter in uio structure
Brian Behlendorf [Fri, 18 Dec 2020 16:48:26 +0000 (08:48 -0800)]
Linux 5.10 compat: use iov_iter in uio structure

As of the 5.10 kernel the generic splice compatibility code has been
removed.  All filesystems are now responsible for registering a
->splice_read and ->splice_write callback to support this operation.

The good news is the VFS provided generic_file_splice_read() and
iter_file_splice_write() callbacks can be used provided the ->iter_read
and ->iter_write callback support pipes.  However, this is currently
not the case and only iovecs and bvecs (not pipes) are ever attached
to the uio structure.

This commit changes that by allowing full iov_iter structures to be
attached to uios.  Ever since the 4.9 kernel the iov_iter structure
has supported iovecs, kvecs, bvevs, and pipes so it's desirable to
pass the entire thing when possible.  In conjunction with this the
uio helper functions (i.e uiomove(), uiocopy(), etc) have been
updated to understand the new UIO_ITER type.

Note that using the kernel provided uio_iter interfaces allowed the
existing Linux specific uio handling code to be simplified.  When
there's no longer a need to support kernel's older than 4.9, then
it will be possible to remove the iovec and bvec members from the
uio structure and always use a uio_iter.  Until then we need to
maintain all of the existing types for older kernels.

Some additional refactoring and cleanup was included in this change:

- Added checks to configure to detect available iov_iter interfaces.
  Some are available all the way back to the 3.10 kernel and are used
  when available.  In particular, uio_prefaultpages() now always uses
  iov_iter_fault_in_readable() which is available for all supported
  kernels.

- The unused UIO_USERISPACE type has been removed.  It is no longer
  needed now that the uio_seg enum is platform specific.

- Moved zfs_uio.c from the zcommon.ko module to the Linux specific
  platform code for the zfs.ko module.  This gets it out of libzfs
  where it was never needed and keeps this Linux specific code out
  of the common sources.

- Removed unnecessary O_APPEND handling from zfs_iter_write(), this
  is redundant and O_APPEND is already handled in zfs_write();

Reviewed-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11351

3 years agoZTS: Simplify zpool_initialize_verify_initialized
Brian Behlendorf [Fri, 18 Dec 2020 16:42:59 +0000 (08:42 -0800)]
ZTS: Simplify zpool_initialize_verify_initialized

Consider the test to be a success as long as the initializing pattern
is found at least once per metaslab.  This indicates that at least
part of the free space was initialized.  Ideally we'd check that the
pattern was written to all free space but that's much trickier so this
check is a reasonable compromise.

Using a here-string to feed the loop in this test causes an empty
string to still trigger the loop so we miss the `spacemaps=0` case.
Pipe into the loop instead.

While here, we can use `zpool wait -t initialize $TESTPOOL` to wait for
the pool to initialize.

Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11365

3 years agospecial device removal space accounting fixes
Matthew Ahrens [Thu, 17 Dec 2020 20:11:56 +0000 (12:11 -0800)]
special device removal space accounting fixes

The space in special devices is not included in spa_dspace (or
dsl_pool_adjustedsize(), or the zfs `available` property).  Therefore
there is always at least as much free space in the normal class, as
there is allocated in the special class(es).  And therefore, there is
always enough free space to remove a special device.

However, the checks for free space when removing special devices did not
take this into account.  This commit corrects that.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11329

3 years agoAvoid extra work updating ARC kstats and tunables
Ryan Moeller [Thu, 17 Dec 2020 19:16:42 +0000 (14:16 -0500)]
Avoid extra work updating ARC kstats and tunables

After e357046 it should not be necessary to periodically update ARC
kstats and tunables.  Tunable updates are applied when modified, and
kstats are updated on demand.

Update kstats in `arc_evict_cb_check()` for `ZFS_DEBUG` builds only.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11237

3 years agoUse the correct return type for getopt
sterlingjensen [Thu, 17 Dec 2020 18:19:30 +0000 (12:19 -0600)]
Use the correct return type for getopt

Use the correct return type for getopt otherwise clang complains
about tautological-constant-out-of-range-compare.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11359

3 years agoOnly examine best metaslabs on each vdev
Matthew Ahrens [Wed, 16 Dec 2020 22:40:05 +0000 (14:40 -0800)]
Only examine best metaslabs on each vdev

On a system with very high fragmentation, we may need to do lots of gang
allocations (e.g. most indirect block allocations (~50KB) may need to
gang). Before failing a "normal" allocation and resorting to ganging, we
try every metaslab.  This has the impact of loading every metaslab (not
a huge deal since we now typically keep all metaslabs loaded), and also
iterating over every metaslab for every failing allocation. If there are
many metaslabs (more than the typical ~200, e.g. due to vdev expansion
or very large vdevs), the CPU cost of this iteration can be very
impactful.  This iteration is done with the mg_lock held, creating long
hold times and high lock contention for concurrent allocations,
ultimately causing long txg sync times and poor application performance.

To address this, this commit changes the behavior of "normal" (not
try_hard, not ZIL) allocations.  These will now only examine the 100
best metaslabs (as determined by their ms_weight).  If none of these
have a large enough free segment, then the allocation will fail and
we'll fall back on ganging.

To accomplish this, we will now (normally) gang before doing a
`try_hard` allocation.  Non-try_hard allocations will only examine the
100 best metaslabs of each vdev.  In summary, we will first try normal
allocation.  If that fails then we will do a gang allocation.  If that
fails then we will do a "try hard" gang allocation.  If that fails then
we will have a multi-layer gang block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11327

3 years agoMake metaslab class rotor and aliquot per-allocator.
Alexander Motin [Tue, 15 Dec 2020 18:55:44 +0000 (13:55 -0500)]
Make metaslab class rotor and aliquot per-allocator.

Metaslab rotor and aliquot are used to distribute workload between
vdevs while keeping some locality for logically adjacent blocks.  Once
multiple allocators were introduced to separate allocation of different
objects it does not make much sense for different allocators to write
into different metaslabs of the same metaslab group (vdev) same time,
competing for its resources.  This change makes each allocator choose
metaslab group independently, colliding with others only sporadically.

Test including simultaneous write into 4 files with recordsize of 4KB
on a striped pool of 30 disks on a system with 40 logical cores show
reduction of vdev queue lock contention from 54 to 27% due to better
load distribution.  Unfortunately it won't help much ZVOLs yet since
only one dataset/ZVOL is synced at a time, and so for the most part
only one allocator is used, but it may improve later.

While there, to reduce the number of pointer dereferences change
per-allocator storage for metaslab classes and groups from several
separate malloc()'s to variable length arrays at the ends of the
original class and group structures.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11288

3 years agoDKMS: Disable weak modules
gregory-lee-bartholomew [Tue, 15 Dec 2020 17:22:30 +0000 (11:22 -0600)]
DKMS: Disable weak modules

Fedora does not guarantee a stable kABI, so weak modules should be dis-
abled. See the dkms man page for a more detailed explanation of the weak
module feature.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #9891
Closes #11128
Closes #11242
Closes #11335

3 years agolua: avoid gcc -Wreturn-local-addr bug
Ryan Libby [Tue, 15 Dec 2020 17:20:48 +0000 (09:20 -0800)]
lua: avoid gcc -Wreturn-local-addr bug

Avoid a bug with gcc's -Wreturn-local-addr warning with some
obfuscation.  In buggy versions of gcc, if a return value is an
expression that involves the address of a local variable, and even if
that address is legally converted to a non-pointer type, a warning may
be emitted and the value of the address may be replaced with zero.
Howerver, buggy versions don't emit the warning or replace the value
when simply returning a local variable of non-pointer type.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90737

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11337

3 years agospa: avoid type narrowing warning
Ryan Libby [Tue, 15 Dec 2020 17:20:06 +0000 (09:20 -0800)]
spa: avoid type narrowing warning

Building the spa module for i386 caused gcc to emit
-Wint-to-pointer-cast "cast to pointer from integer of different size"
because spa.spa_did was uint64_t but pthread_join (via thread_join in
spa_deactivate) takes a pointer (32-bit on i386).  Define spa_did to be
pointer-size instead.  For now spa_did is in fact never non-zero and the
thread_join could instead be ifdef'd out, but changing the size of
spa_did may be more useful for the future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11336

3 years agoFreeBSD libzfs: gcc requires __thread after static
Ryan Libby [Mon, 14 Dec 2020 17:28:24 +0000 (09:28 -0800)]
FreeBSD libzfs: gcc requires __thread after static

Building libzfs with gcc on FreeBSD failed because gcc is picky about
the order of keywords in declarations with __thread, whereas clang is
more relaxed.

https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11331

3 years agodmu_zfetch: fix memory leak
Matthew Macy [Sun, 13 Dec 2020 00:00:00 +0000 (16:00 -0800)]
dmu_zfetch: fix memory leak

The last change caused the read completion callback to not be called
if the IO was still in progress. This change restores allocation
of the arc buf callback, but in the callback path checks the new
acb_nobuf field to know to skip buffer allocation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11324

3 years agoFix reporting of CKSUM errors in indirect vdevs
George Amanakis [Fri, 11 Dec 2020 20:15:37 +0000 (21:15 +0100)]
Fix reporting of CKSUM errors in indirect vdevs

When removing and subsequently reattaching a vdev, CKSUM errors may
occur as vdev_indirect_read_all() reads from all children of a mirror
in case of a resilver.

Fix this by checking whether a child is missing the data and setting a
flag (ic_error) which is then checked in vdev_indirect_repair() and
suppresses incrementing the checksum counter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11277

3 years agoRemove draid.d symlink from zfs_helpers.sh
Brian Behlendorf [Fri, 11 Dec 2020 19:00:58 +0000 (11:00 -0800)]
Remove draid.d symlink from zfs_helpers.sh

In an earlier revision of dRAID there existed an /etc/zfs/draid.d
directory.  This was removed before the final version was integrated
but a little bit was accidentally overlooked in the zfs_helpers.sh
script.  Remove this remnant.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11326

3 years agoarc_summary3: Handle overflowing value width
Ryan Moeller [Tue, 8 Dec 2020 20:20:25 +0000 (20:20 +0000)]
arc_summary3: Handle overflowing value width

Some tunables shown by arc_summary3 have string values that may exceed
the normal line length, leaving a negative offset between the name and
value fields.  The negative space is of course not valid and Python
rightly barfs up an exception traceback.

Handle an overflowing value field width by ignoring the line length
and separating the name from the value by a single space instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11270

3 years agoFreeBSD: Implement sysctl for fletcher4 impl
Ryan Moeller [Wed, 2 Dec 2020 21:45:08 +0000 (21:45 +0000)]
FreeBSD: Implement sysctl for fletcher4 impl

There is a tunable to select the fletcher 4 checksum implementation on
Linux but it was not present in FreeBSD.

Implement the sysctl handler for FreeBSD and use ZFS_MODULE_PARAM_CALL
to provide the tunable on both platforms.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11270

3 years agoImprove zfs receive performance with lightweight write
Matthew Ahrens [Fri, 11 Dec 2020 18:26:02 +0000 (10:26 -0800)]
Improve zfs receive performance with lightweight write

The performance of `zfs receive` can be bottlenecked on the CPU consumed
by the `receive_writer` thread, especially when receiving streams with
small compressed block sizes.  Much of the CPU is spent creating and
destroying dbuf's and arc buf's, one for each `WRITE` record in the send
stream.

This commit introduces the concept of "lightweight writes", which allows
`zfs receive` to write to the DMU by providing an ABD, and instantiating
only a new type of `dbuf_dirty_record_t`.  The dbuf and arc buf for this
"dirty leaf block" are not instantiated.

Because there is no dbuf with the dirty data, this mechanism doesn't
support reading from "lightweight-dirty" blocks (they would see the
on-disk state rather than the dirty data).  Since the dedup-receive code
has been removed, `zfs receive` is write-only, so this works fine.

Because there are no arc bufs for the received data, the received data
is no longer cached in the ARC.

Testing a receive of a stream with average compressed block size of 4KB,
this commit improves performance by 50%, while also reducing CPU usage
by 50% of a CPU.  On a per-block basis, CPU consumed by receive_writer()
and dbuf_evict() is now 1/7th (14%) of what it was.

Baseline: 450MB/s, CPU in receive_writer() 40% + dbuf_evict() 35%
New: 670MB/s, CPU in receive_writer() 17% + dbuf_evict() 0%

The code is also restructured in a few ways:

Added a `dr_dnode` field to the dbuf_dirty_record_t.  This simplifies
some existing code that no longer needs `DB_DNODE_ENTER()` and related
routines.  The new field is needed by the lightweight-type dirty record.

To ensure that the `dr_dnode` field remains valid until the dirty record
is freed, we have to ensure that the `dnode_move()` doesn't relocate the
dnode_t.  To do this we keep a hold on the dnode until it's zio's have
completed.  This is already done by the user-accounting code
(`userquota_updates_task()`), this commit extends that so that it always
keeps the dnode hold until zio completion (see `dnode_rele_task()`).

`dn_dirty_txg` was previously zeroed when the dnode was synced.  This
was not necessary, since its meaning can be "when was this dnode last
dirtied".  This change simplifies the new `dnode_rele_task()` code.

Removed some dead code related to `DRR_WRITE_BYREF` (dedup receive).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11105

3 years agoFix kernel panic induced by redacted send
Paul Dagnelie [Fri, 11 Dec 2020 18:22:29 +0000 (10:22 -0800)]
Fix kernel panic induced by redacted send

In the redaction list traversal code, there is a bug in the binary search
logic when looking for the resume point. Maxbufid can be decremented to -1,
causing us to read the last possible block of the object instead of the one we
wanted. This can cause incorrect resume behavior, or possibly even a hang in
some cases. In addition, when examining non-last blocks, we can treat the
block as being the same size as the last block, causing us to miss entries in
the redaction list when determining where to resume. Finally, we were ignoring
the case where the resume point was found in the buffer being searched, and
resuming from minbufid. All these issues have been corrected, and the code has
been significantly simplified to make future issues less likely.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11297

3 years agoFreeBSD: Fix format of vfs.zfs.arc_no_grow_shift
Ryan Moeller [Tue, 8 Dec 2020 17:21:36 +0000 (17:21 +0000)]
FreeBSD: Fix format of vfs.zfs.arc_no_grow_shift

vfs.zfs.arc_no_grow_shift has an invalid type (15) and this causes
py-sysctl to format it as a bytearray when it should be an integer.

"U" is not a valid format, it should be "I" and the type should match
the variable type, int.  We can return EINVAL if the value is set below
zero.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11318

3 years agoFreeBSD: Update usage of py-sysctl
Ryan Moeller [Tue, 8 Dec 2020 17:02:16 +0000 (17:02 +0000)]
FreeBSD: Update usage of py-sysctl

py-sysctl now includes the CTLTYPE_NODE type nodes in the list returned
by sysctl.filter() on FreeBSD head.  It also provides descriptions now.

Eliminate the subprocess call to get descriptions, and filter out the
nodes so we only deal with values.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11318

3 years agoFix possibly uninitialized 'root_inode' variable warning
Brian Behlendorf [Thu, 10 Dec 2020 23:23:26 +0000 (15:23 -0800)]
Fix possibly uninitialized 'root_inode' variable warning

Resolve an uninitialized variable warning when compiling.

    In function ‘zfs_domount’:
    warning: ‘root_inode’ may be used uninitialized in this
        function [-Wmaybe-uninitialized]
    sb->s_root = d_make_root(root_inode);

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11306

3 years agoImplement memory and CPU hotplug
Paul Dagnelie [Thu, 10 Dec 2020 22:09:23 +0000 (14:09 -0800)]
Implement memory and CPU hotplug

ZFS currently doesn't react to hotplugging cpu or memory into the
system in any way. This patch changes that by adding logic to the ARC
that allows the system to take advantage of new memory that is added
for caching purposes. It also adds logic to the taskq infrastructure
to support dynamically expanding the number of threads allocated to a
taskq.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Matthew Ahrens <matthew.ahrens@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11212

3 years agoCI: add zloop workflow
Brian Behlendorf [Thu, 10 Dec 2020 18:55:53 +0000 (10:55 -0800)]
CI: add zloop workflow

Run ztest via zloop for 20 minutes, total run time is ~30 minutes.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11319

3 years agoCI: add zloop workflow
George Melikov [Tue, 8 Dec 2020 18:40:44 +0000 (21:40 +0300)]
CI: add zloop workflow

Run ztest via zloop for 20 minutes, total run time is ~30 minutes.

Signed-off-by: George Melikov <mail@gmelikov.ru>
3 years agoFreeBSD: Do zcommon_init sooner to avoid FPU panic
Ryan Moeller [Thu, 10 Dec 2020 05:29:00 +0000 (00:29 -0500)]
FreeBSD: Do zcommon_init sooner to avoid FPU panic

There has been a panic affecting some system configurations where the
thread FPU context is disturbed during the fletcher 4 benchmarks,
leading to a panic at boot.

module_init() registers zcommon_init to run in the last subsystem
(SI_SUB_LAST).  Running it as soon as interrupts have been configured
(SI_SUB_INT_CONFIG_HOOKS) makes sure we have finished the benchmarks
before we start doing other things.

While it's not clear *how* the FPU context was being disturbed, this
does seem to avoid it.

Add a module_init_early() macro to run zcommon_init() at this earlier
point on FreeBSD.  On Linux this is defined as module_init().

Authored by: Konstantin Belousov <kib@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11302

3 years agoZTS: three small follow up fixes for #11167
Attila Fülöp [Thu, 10 Dec 2020 05:27:12 +0000 (06:27 +0100)]
ZTS: three small follow up fixes for #11167

Follow up fix for 0cb40fa3. Remove unused variables, don't source
unused libs and add missed cleanup.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11311

3 years agomount_zfs: print strerror instead of errno for error reporting
Érico Nogueira Rolim [Thu, 10 Dec 2020 05:24:59 +0000 (02:24 -0300)]
mount_zfs: print strerror instead of errno for error reporting

Tracking down an error message with the errno value can be difficult,
using strerror makes the error message clearer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11303

3 years agoDrop path prefix workaround
sterlingjensen [Thu, 10 Dec 2020 05:24:26 +0000 (23:24 -0600)]
Drop path prefix workaround

Canonicalization, the source of the trouble, was disabled in 9000a9f.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11295

3 years agoDelete rw_semaphore.wait_lock configure check
Orivej Desh [Thu, 10 Dec 2020 05:22:54 +0000 (05:22 +0000)]
Delete rw_semaphore.wait_lock configure check

Last use of wait_lock was removed in "Linux 5.3 compat: retire
rw_tryupgrade()" (e7a99dab2b065ac2f8736a65d1b226d21754d771).

Fixes the issue reported in
https://github.com/openzfs/zfs/issues/11097#issuecomment-714532367

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Orivej Desh <orivej@gmx.fr>
Closes #11309

3 years agoDecouple arc_read_done callback from arc buf instantiation
Matthew Macy [Wed, 9 Dec 2020 23:05:06 +0000 (15:05 -0800)]
Decouple arc_read_done callback from arc buf instantiation

Add ARC_FLAG_NO_BUF to indicate that a buffer need not be
instantiated.  This fixes a ~20% performance regression on
cached reads due to zfetch changes.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11220
Closes #11232

3 years agoFix optional "force" arg handing in zfs_ioc_pool_sync()
Brian Behlendorf [Wed, 9 Dec 2020 22:52:45 +0000 (14:52 -0800)]
Fix optional "force" arg handing in zfs_ioc_pool_sync()

The fnvlist_lookup_boolean_value() function should not be used
to check the force argument since it's optional.  It may not be
provided or may have been created with the wrong flags.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11281
Closes #11284

3 years agoCI: add new zfs-tests-sanity workflow
George Melikov [Tue, 8 Dec 2020 17:53:45 +0000 (20:53 +0300)]
CI: add new zfs-tests-sanity workflow

Run zfs-tests with sanity.run for brief results.  Timeouts
are rare, so minimize false positives by increasing the
default from 60 to 180 seconds.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11304

3 years agoZTS: zpool_trim tests throttle trim process
George Melikov [Mon, 7 Dec 2020 18:06:10 +0000 (21:06 +0300)]
ZTS: zpool_trim tests throttle trim process

Otherwise trim may finish before progress checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11296

3 years agoReduce fletcher4 and raidz benchmark times
Brian Behlendorf [Sun, 6 Dec 2020 17:57:20 +0000 (09:57 -0800)]
Reduce fletcher4 and raidz benchmark times

During module load time all of the available fetcher4 and raidz
implementations are benchmarked for a fixed amount of time to
determine the fastest available.  Manual testing has shown that this
time can be significantly reduced with negligible effect on the final
results.

This commit changes the benchmark time to 1ms which can reduce the
module load time by over a second on x86_64.  On an x86_64 system
with sse3, ssse3, and avx2 instructions the benchmark times are:

    Fletcher4    603ms   -> 15ms
    RAIDZ        1,322ms -> 64ms

Reviewed-by: Matthew Macy <mmacy@freebsd.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11282

3 years agoAvoid some spa_has_pending_synctask() calls.
Alexander Motin [Sun, 6 Dec 2020 17:55:02 +0000 (12:55 -0500)]
Avoid some spa_has_pending_synctask() calls.

Since 8c4fb36a24 (PR #7795) spa_has_pending_synctask() started to
take two more locks per write inside txg_all_lists_empty().  I am
surprised those pool-wide locks are not contended, but still their
operations are visible in CPU profiles under contended vdev lock.

This commit slightly changes vdev_queue_max_async_writes() flow to
not call the function if we are going to return max_active any way
due to high amount of dirty data.  It allows to save some CPU time
exactly when the pool is busy.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11280

3 years agoBring consistency to ABD chunk count types.
Alexander Motin [Sun, 6 Dec 2020 17:53:40 +0000 (12:53 -0500)]
Bring consistency to ABD chunk count types.

With both abd_size and abd_nents being uint_t it makes no sense for
abd_chunkcnt_for_bytes() to return size_t.  Random mix of different
types used to count chunks looks bad and makes compiler more difficult
to optimize the code.

In particular on FreeBSD this change allows compiler to completely
optimize out abd_verify_scatter() when built without debug, removing
pointless 64-bit division and even more pointless empty loop.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11279

3 years agoEnable ABI checks for the checkstyle workflow
Brian Behlendorf [Sun, 6 Dec 2020 17:50:47 +0000 (09:50 -0800)]
Enable ABI checks for the checkstyle workflow

Extend the CI checkstyle workflow to perform the library ABI
checks in the master branch.  The intent is not to prevent any
ABI changes but to detect them immediately so when they're
made it's done intentionally.

When the changing the ABI the `make storeabi` target can be
used to generate a new .abi file which can be included with
the commit.  This depends on the libabigail utility which is
available from the majority of distribution package managers.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11287