git.proxmox.com Git - mirror_zfs.git/log
mirror_zfs.git
4 years ago Change default to overlay=on
Ryan Moeller [Fri, 6 Mar 2020 17:28:19 +0000 (12:28 -0500)]
Change default to overlay=on

Filesystems allow overlay mounts by default on FreeBSD and Linux.

Respect the native convention by switching the default to overlay=on,
while retaining the option to turn the property off for compatibility
with other operating systems' conventions.

Update documentation and tests accordingly.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10030

4 years ago ZTS: Update zts-report exceptions for FreeBSD
Ryan Moeller [Fri, 6 Mar 2020 17:26:38 +0000 (12:26 -0500)]
ZTS: Update zts-report exceptions for FreeBSD

The new zfs_sync_trim_* tests are skipped on FreeBSD.
Both of the previously failing tests are now passing.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10105

4 years ago ZTS: Speed up write_dirs cleanup
Brian Behlendorf [Wed, 4 Mar 2020 23:12:12 +0000 (15:12 -0800)]
ZTS: Speed up write_dirs cleanup

The write_dirs tests fill a filesystem with a bunch of files until it
is full.  In cleanup the files are truncated and removed individually.
These tests already take a while to run.

It is quicker and easier to destroy the whole dataset and create a new
one to replace it in the cleanup functions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10098

4 years ago ZTS: Add missing quotes
Brian Behlendorf [Wed, 4 Mar 2020 23:10:45 +0000 (15:10 -0800)]
ZTS: Add missing quotes

`default_setup` takes a disk list as the first argument and has
optional additional arguments that control secondary functionality.
A couple of test setups mistakenly call `default_setup $DISKS`.

Add quotes so the second and subsequent disks are correctly included
in the pool as vdevs rather than triggering unwanted behavior from
`default_setup`.
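
The difference the quotes make can be sketched as follows.  This is a
Python model of shell word splitting, not ZTS code: the three-parameter
signature of `default_setup` (a disk list followed by two optional
arguments, called `container` and `volume` here) and the disk names are
assumptions made for illustration.

```python
import shlex


def default_setup(disklist, container=None, volume=None):
    """Hypothetical stand-in: the first argument is the whole disk list."""
    return {"disks": disklist.split(), "container": container, "volume": volume}


DISKS = "da0 da1 da2"  # made-up disk names

# `default_setup $DISKS`: the shell splits the value into three arguments,
# so only the first disk is treated as the disk list.
unquoted = default_setup(*shlex.split(DISKS))

# `default_setup "$DISKS"`: the value arrives as a single argument.
quoted = default_setup(DISKS)

print(unquoted)
print(quoted)
```

In the unquoted call the second and third disks land in the optional
parameters, which is the unwanted behavior the commit describes.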

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10097

4 years ago ZTS: Add zts-report exceptions for FreeBSD
Brian Behlendorf [Wed, 4 Mar 2020 23:09:40 +0000 (15:09 -0800)]
ZTS: Add zts-report exceptions for FreeBSD

There are three tests we expect to fail only on FreeBSD.
* link_count never exits and eventually times out:
 - @amotin tells me this test is probably not applicable to us
 - Skip on FreeBSD
* userobj feature does not activate immediately after pool upgrade
 - low impact; we are aware of this issue
* removal does not appear to condense on export
 - low impact; we are aware of this issue

Additionally removal_with_zdb passes on FreeBSD, so it is moved to
"maybe".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10093

4 years ago zio: dprintf_bp() if errors > 0 in zfs_blkptr_verify()
Brian Behlendorf [Wed, 4 Mar 2020 23:08:41 +0000 (15:08 -0800)]
zio: dprintf_bp() if errors > 0 in zfs_blkptr_verify()

Also dprintf_bp() in case BLK_VERIFY_HALT of zfs_blkptr_verify_log()
since dprintf_bp() in zfs_blkptr_verify() will never be executed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Justin Keogh <commits@v6y.net>
Closes #10086

4 years ago ZTS: Test the correct filesystem_limits behavior
Brian Behlendorf [Wed, 4 Mar 2020 23:07:52 +0000 (15:07 -0800)]
ZTS: Test the correct filesystem_limits behavior

See issue #8226: Property filesystem_limit does not work as documented

There have been previous attempts to fix the behavior on Linux, but so
far the issue is still open.  See PRs #8228, #8280.

The existing tests pass for the incorrect behavior.  This is a problem
on FreeBSD; we are failing the tests because we implement the feature
correctly.

I have adapted the tests based on the work by @loli10k in #8280 and
extended the changes to fix the snapshot_limit test as well.

Linux now fails these tests, so entries linking to the issue have been
added to the "maybe" group in zts-report.py.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10082

4 years ago Add trim support to zpool wait
Brian Behlendorf [Wed, 4 Mar 2020 23:07:11 +0000 (15:07 -0800)]
Add trim support to zpool wait

Manual trims fall into the category of long-running pool activities
which people might want to wait synchronously for. This change adds
support to 'zpool wait' for waiting for manual trim operations to
complete. It also adds a '-w' flag to 'zpool trim' which can be used to
turn 'zpool trim' into a synchronous operation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: John Gallagher <john.gallagher@delphix.com>
Closes #10071

4 years ago Improve performance of zio_taskq_member
Matthew Ahrens [Tue, 3 Mar 2020 18:29:38 +0000 (10:29 -0800)]
Improve performance of zio_taskq_member

__zio_execute() calls zio_taskq_member() to determine if we are running
in a zio interrupt taskq, in which case we may need to switch to
processing this zio in a zio issue taskq.  The call to
zio_taskq_member() can become a performance bottleneck when we are
processing a high rate of zios.

zio_taskq_member() calls taskq_member() on each of the zio interrupt
taskqs, of which there are 21.  This is slow because each call to
taskq_member() does tsd_get(taskq_tsd), which on Linux is relatively
slow.

This commit improves the performance of zio_taskq_member() by having it
cache the value of tsd_get(taskq_tsd), reducing the number of those
calls to 1/21st of the current behavior.

In a test case running `zfs send -c >/dev/null` of a filesystem with
small blocks (average 2.5KB/block), zio_taskq_member() was using 6.7% of
one CPU, and with this change it is reduced to 1.3%.  Overall time to
perform the `zfs send` reduced by 10% (~150,000 blocks/sec to ~165,000
blocks/sec).
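
The effect of the caching can be illustrated with a small model.  This
is a Python sketch, not the kernel code: `Tsd`, `member_uncached` and
`member_cached` are hypothetical names, and the only details taken from
the message above are the 21 interrupt taskqs and the one tsd_get() per
taskq_member() call.

```python
class Tsd:
    """Stand-in for thread-specific data; counts how often it is read."""

    def __init__(self, current_taskq):
        self.current_taskq = current_taskq
        self.lookups = 0

    def get(self):
        self.lookups += 1
        return self.current_taskq


def taskq_member(tsd, taskq):
    # Each membership test performs its own lookup, like tsd_get(taskq_tsd).
    return tsd.get() is taskq


def member_uncached(tsd, interrupt_taskqs):
    return any(taskq_member(tsd, tq) for tq in interrupt_taskqs)


def member_cached(tsd, interrupt_taskqs):
    current = tsd.get()  # one lookup, reused for every comparison
    return any(current is tq for tq in interrupt_taskqs)


interrupt_taskqs = [object() for _ in range(21)]
issue_taskq = object()  # the calling thread is not in an interrupt taskq

before = Tsd(issue_taskq)
after = Tsd(issue_taskq)
result_before = member_uncached(before, interrupt_taskqs)
result_after = member_cached(after, interrupt_taskqs)
print(before.lookups, after.lookups)  # 21 1
```

Both variants return the same answer; only the number of lookups
differs.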

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10070

4 years ago ZTS: Provide for nested cleanup routines
Ryan Moeller [Tue, 3 Mar 2020 18:28:09 +0000 (13:28 -0500)]
ZTS: Provide for nested cleanup routines

Shared test library functions lack a simple way to ensure proper
cleanup in the event of a failure.  The `log_onexit` cleanup pattern
cannot be used in library functions because it uses one global
variable to store the cleanup command.

An example of where this is a serious issue is when a tunable that
artificially stalls kernel progress gets activated and then some check
fails.  Unless the caller knows about the tunable and sets it back,
the system will be left in a bad state.

To solve this problem, turn the global cleanup variable into a stack.
Provide push and pop functions to add additional cleanup steps and
remove them after it is safe again.

The first use of this new functionality is in attempt_during_removal,
which sets REMOVAL_SUSPEND_PROGRESS.
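
The stack idea can be sketched in a few lines.  The test suite itself
is written in ksh; the Python below is only a model, and the helper
names (`push_cleanup`, `pop_cleanup`, `run_cleanups`) and the dictionary
of tunables are hypothetical.

```python
cleanup_stack = []


def push_cleanup(fn):
    """Register an extra cleanup step on top of the existing ones."""
    cleanup_stack.append(fn)


def pop_cleanup():
    """Drop the most recent cleanup step once it is no longer needed."""
    return cleanup_stack.pop()


def run_cleanups():
    """Run all registered steps, most recently pushed first."""
    while cleanup_stack:
        cleanup_stack.pop()()


tunables = {"REMOVAL_SUSPEND_PROGRESS": 0}
log = []


def attempt_during_removal(check):
    # Library function: stalls removal, so it must guarantee the reset.
    tunables["REMOVAL_SUSPEND_PROGRESS"] = 1
    push_cleanup(lambda: tunables.update(REMOVAL_SUSPEND_PROGRESS=0))
    check()  # may raise before the normal reset below is reached
    tunables["REMOVAL_SUSPEND_PROGRESS"] = 0
    pop_cleanup()


def failing_check():
    raise RuntimeError("check failed")


# The test registers its own cleanup first, then calls the library.
push_cleanup(lambda: log.append("test cleanup"))
try:
    attempt_during_removal(failing_check)
except RuntimeError:
    run_cleanups()

print(tunables, log)
```

Because the library pushed its own step, the tunable is reset even
though the caller never knew it had been changed.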

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10080

4 years ago Make spa_history_zone platform-dependent in kernel
Ryan Moeller [Mon, 2 Mar 2020 17:43:30 +0000 (12:43 -0500)]
Make spa_history_zone platform-dependent in kernel

This function should only return "linux" on Linux.

Move the kernel part of the function out of common code.
Fix the tests for FreeBSD.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10079

4 years ago ZTS: Change issue URL template to OpenZFS org
Ryan Moeller [Mon, 2 Mar 2020 17:42:22 +0000 (12:42 -0500)]
ZTS: Change issue URL template to OpenZFS org

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10081

4 years ago Don't open zfs control device exclusively
Matthew Macy [Fri, 28 Feb 2020 22:54:14 +0000 (14:54 -0800)]
Don't open zfs control device exclusively

With the FreeBSD platform changes that were made for #10073
it is no longer necessary on FreeBSD to open the control device
exclusively to get onexit callbacks invoked.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10076

4 years ago Don't call zrele on passed zp in zfs_xattr_owner_unlinked on FreeBSD
Matthew Macy [Fri, 28 Feb 2020 22:53:18 +0000 (14:53 -0800)]
Don't call zrele on passed zp in zfs_xattr_owner_unlinked on FreeBSD

FreeBSD has a somewhat more cumbersome locking and refcounting
protocol for the platform counterpart to znode. We need to not call
zrele on the passed zp, but do need to do so on any intermediate zp.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10075

4 years ago Re-share zfsdev_getminor and zfs_onexit_fd_hold
Matthew Macy [Fri, 28 Feb 2020 22:50:32 +0000 (14:50 -0800)]
Re-share zfsdev_getminor and zfs_onexit_fd_hold

By adding a zfs_file_private accessor to the common
interfaces and some extensions to FreeBSD platform
code it is now possible to share the implementations
for the aforementioned functions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10073

4 years ago Improve zfs destroy performance with zio_t-free zio_free()
Matthew Ahrens [Fri, 28 Feb 2020 22:49:44 +0000 (14:49 -0800)]
Improve zfs destroy performance with zio_t-free zio_free()

When "zfs destroy" is run, it completes quickly, and in the background
we locate the blocks to free and free them.  This background activity
can be observed with `zpool get freeing` and `zpool wait -t free ...`.

This background activity is processed by a single thread (the spa_sync
thread) which calls zio_free() on each of the blocks to free.  With even
modest storage performance, the CPU consumption of zio_free() can be the
performance bottleneck.

Performance of zio_free() can be improved by not actually creating a
zio_t in the common case (non-dedup, non-gang), instead calling
metaslab_free() directly.  This avoids the CPU cost of allocating the
zio_t, and more importantly the cost of adding and later removing this
zio_t from the parent zio's child list.

The result is that performance of background freeing more than doubles,
from 0.6 million blocks per second to 1.3 million blocks per second.
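
The shape of the fast path can be modelled as below.  This is a Python
sketch under simplifying assumptions, not the C implementation: the
`BlockPointer` class, the counters and the `zio_free_slow` helper are
hypothetical, and the real function may check further conditions that
the message above does not mention.

```python
from dataclasses import dataclass


@dataclass
class BlockPointer:
    dedup: bool = False
    gang: bool = False


stats = {"zio_created": 0, "direct_free": 0}


def metaslab_free(bp):
    # Fast path: no zio_t allocated, nothing linked into a parent's child list.
    stats["direct_free"] += 1


def zio_free_slow(bp):
    # Slow path: models creating a zio_t and attaching it to the parent zio.
    stats["zio_created"] += 1


def zio_free(bp):
    if bp.dedup or bp.gang:
        zio_free_slow(bp)
    else:
        metaslab_free(bp)


blocks = [BlockPointer() for _ in range(8)]
blocks += [BlockPointer(dedup=True), BlockPointer(gang=True)]
for bp in blocks:
    zio_free(bp)

print(stats)
```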

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10034

4 years ago ZTS: Fixup shebang in rsend_016, add to common.run
Ryan Moeller [Fri, 28 Feb 2020 17:48:29 +0000 (12:48 -0500)]
ZTS: Fixup shebang in rsend_016, add to common.run

All other ksh scripts use /bin/ksh in the shebang.

Make rsend_016_neg consistent with the rest of the suite.

The test also was absent from any runfiles. Add it to common.run.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10051

4 years ago ZTS: Eliminate partitioning from zpool_add
Ryan Moeller [Fri, 28 Feb 2020 17:46:51 +0000 (12:46 -0500)]
ZTS: Eliminate partitioning from zpool_add

Use file vdevs if we are short on $DISKS.
Also fixed vol recursion for FreeBSD in 004.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10060

4 years ago Fix CONFIG_MODULES=no Linux kernel config
Brian Behlendorf [Fri, 28 Feb 2020 17:23:48 +0000 (09:23 -0800)]
Fix CONFIG_MODULES=no Linux kernel config

When configuring as builtin (--enable-linux-builtin) for kernels
without loadable module support (CONFIG_MODULES=n) only the object
file is created, never a loadable kmod.

Update ZFS_LINUX_TRY_COMPILE to handle this in a manner similar to
the ZFS_LINUX_TEST_COMPILE_ALL macro.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9887
Closes #10063

4 years ago Linux 5.5 compat: blkg_tryget()
Brian Behlendorf [Fri, 28 Feb 2020 16:58:39 +0000 (08:58 -0800)]
Linux 5.5 compat: blkg_tryget()

Commit https://github.com/torvalds/linux/commit/9e8d42a0f accidentally
converted the static inline function blkg_tryget() to GPL-only for
kernels built with CONFIG_PREEMPT_RCU=y and CONFIG_BLK_CGROUP=y.

Resolve the build issue by providing our own equivalent functionality
when needed which uses rcu_read_lock_sched() internally as before.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9745
Closes #10072

4 years ago arc_summary: Make get_descriptions per platform
Ryan Moeller [Fri, 28 Feb 2020 01:15:06 +0000 (20:15 -0500)]
arc_summary: Make get_descriptions per platform

Linux uses modinfo to get tunables descriptions, FreeBSD has to use
sysctl.

Move the existing function definition so it is defined that way on
Linux, and add a definition in terms of sysctl for FreeBSD.
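
A per-platform definition of this kind might look like the sketch
below.  It is not the arc_summary source: the parser names, the command
lists and the sample lines are assumptions, based on the usual
`parm: name:description (type)` format of modinfo and the
`oid: description` format of `sysctl -d`.

```python
import sys


def parse_modinfo(text):
    """Collect tunable descriptions from modinfo-style 'parm:' lines."""
    descs = {}
    for line in text.splitlines():
        if not line.startswith("parm:"):
            continue
        name, _, rest = line[len("parm:"):].strip().partition(":")
        descs[name] = rest.rsplit(" (", 1)[0]  # drop the trailing "(type)"
    return descs


def parse_sysctl(text):
    """Collect tunable descriptions from 'sysctl -d' style lines."""
    descs = {}
    for line in text.splitlines():
        oid, sep, desc = line.partition(": ")
        if sep:
            descs[oid.rsplit(".", 1)[-1]] = desc  # keep the last OID component
    return descs


# Choose the command and parser once, based on the running platform.
if sys.platform.startswith("freebsd"):
    COMMAND, parse_descriptions = ["sysctl", "-d", "vfs.zfs"], parse_sysctl
else:
    COMMAND, parse_descriptions = ["modinfo", "zfs"], parse_modinfo

# Made-up sample lines, used here instead of running either command.
linux_sample = "parm:           zfs_arc_max:Max arc size (ulong)"
freebsd_sample = "vfs.zfs.arc_max: Maximum ARC size"
print(parse_modinfo(linux_sample))
print(parse_sysctl(freebsd_sample))
```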

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10062

4 years ago pyzfs: Add constants for platform-specific errnos
Ryan Moeller [Fri, 28 Feb 2020 01:14:21 +0000 (20:14 -0500)]
pyzfs: Add constants for platform-specific errnos

FreeBSD doesn't have EBADE, ECHRNG, or ETIME.

Add constants for these and set them appropriately for the platform.
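
One way to define such constants is sketched below.  The helper name
`platform_errno` is hypothetical and the fallback values are
placeholders chosen for illustration; they are not claimed to be the
values this commit assigns on FreeBSD.

```python
import errno


def platform_errno(name, fallback):
    """Return errno.<name> where the platform defines it, else the fallback."""
    return getattr(errno, name, fallback)


# Placeholder fallbacks (assumptions), used only where the name is missing.
EBADE = platform_errno("EBADE", errno.EIO)
ECHRNG = platform_errno("ECHRNG", errno.ENXIO)
ETIME = platform_errno("ETIME", errno.ETIMEDOUT)

print(EBADE, ECHRNG, ETIME)
```

Code that maps error numbers to exceptions can then refer to the module
level constants instead of touching `errno.EBADE` directly, which would
raise AttributeError on a platform that lacks it.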

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10061

4 years ago Consolidate arc_buf allocation checks
Matthew Macy [Fri, 28 Feb 2020 01:12:44 +0000 (17:12 -0800)]
Consolidate arc_buf allocation checks

The following check currently occurs in three separate locations
in dbuf.c.  This change consolidates those checks in to the
dbuf_alloc_arcbuf_from_arcbuf() function.

    if (arc_is_encrypted(data)) {
        ...
    } else if (compress_type != ZIO_COMPRESS_OFF) {
        ...
    } else {
        ...
    }

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10057

4 years ago ZTS: Misc fixes for FreeBSD
Ryan Moeller [Thu, 27 Feb 2020 17:38:34 +0000 (12:38 -0500)]
ZTS: Misc fixes for FreeBSD

* Set geom debug flags in corrupt_blocks_at_level
* Use the right time zone for history tests
* Add missing commands.cfg entry for diskinfo
* Rewrite get_last_txg_synced to use zdb
* Don't check ulimits for sparse files
* Suspend removal before removing a vdev, not after

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10054

4 years ago ZTS: Fix zfs_receive_004_neg
Matthew Ahrens [Thu, 27 Feb 2020 17:37:34 +0000 (09:37 -0800)]
ZTS: Fix zfs_receive_004_neg

`zfs recv` of an incremental stream that already exists is ignored, with
a message like:

    receiving incremental stream of pool/fs@incsnap into pool/fs@incsnap
    snap testpool/testfs@incsnap already exists; ignoring

And the command exits successfully (exit code 0).

The zfs_receive_004_neg test is expecting that this case will fail,
with a nonzero exit code.

The fix is to remove this specific command from the test case.  This
lets us check that the remaining commands do in fact fail.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10055

4 years ago Linux 5.6 compat: time_t
Brian Behlendorf [Wed, 26 Feb 2020 21:18:07 +0000 (13:18 -0800)]
Linux 5.6 compat: time_t

As part of the Linux kernel's y2038 changes the time_t type has been
fully retired.  Callers are now required to use the time64_t type.

Rather than move to the new type, I've removed the few remaining
places where a time_t is used in the kernel code.  They've been
replaced with a uint64_t which is already how ZFS internally
handled these values.

Going forward we should work towards updating the remaining user
space time_t consumers to the 64-bit interfaces.
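
For background on the y2038 limit mentioned above: a signed 32-bit
count of seconds runs out in January 2038, while a 64-bit value does
not.  The sketch below uses Python's ctypes only to emulate fixed-width
integers; it is not ZFS or kernel code.

```python
import ctypes
from datetime import datetime, timezone

LAST_32BIT_SECOND = 2**31 - 1

# The last moment a signed 32-bit seconds counter can represent.
last_moment = datetime.fromtimestamp(LAST_32BIT_SECOND, tz=timezone.utc)

# One second later: the 32-bit value wraps, the 64-bit value does not.
wrapped = ctypes.c_int32(LAST_32BIT_SECOND + 1).value
kept = ctypes.c_uint64(LAST_32BIT_SECOND + 1).value

print(last_moment.isoformat())  # 2038-01-19T03:14:07+00:00
print(wrapped, kept)            # -2147483648 2147483648
```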

Reviewed-by: Matthew Macy <mmacy@freebsd.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10052
Closes #10064

4 years ago Linux 5.6 compat: ktime_get_raw_ts64()
Brian Behlendorf [Wed, 26 Feb 2020 20:42:33 +0000 (12:42 -0800)]
Linux 5.6 compat: ktime_get_raw_ts64()

The getrawmonotonic() and getrawmonotonic64() interfaces have been
fully retired.  Update gethrtime() to use the replacement interface
ktime_get_raw_ts64() which was introduced in the 4.18 kernel.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10052
Closes #10064

4 years ago Refactor dnode dirty context from dbuf_dirty
Matthew Macy [Thu, 27 Feb 2020 00:09:17 +0000 (16:09 -0800)]
Refactor dnode dirty context from dbuf_dirty

* Add dedicated dnode_set_dirtyctx routine.
* Add empty dirty record on destroy assertion.
* Make much more extensive use of the SET_ERROR macro.

Reviewed-by: Will Andrews <wca@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9924

4 years ago ZTS: Fix zfs_copies_002_pos
Ryan Moeller [Wed, 26 Feb 2020 22:29:13 +0000 (17:29 -0500)]
ZTS: Fix zfs_copies_002_pos

The function `get_used_prop` does not exist.

Use `get_prop used` instead.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10059

4 years ago ZTS: Adapt casenorm tests for FreeBSD
Ryan Moeller [Wed, 26 Feb 2020 16:41:30 +0000 (11:41 -0500)]
ZTS: Adapt casenorm tests for FreeBSD

Several casenorm tests pass on FreeBSD but are expected to fail on
Linux.

Move the passing tests from "fail" to "maybe" so that passing on
FreeBSD is not unexpected.

Invert platform logic so FreeBSD doesn't use illumos-only zlook.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10050

4 years ago ZTS: Misc fixes for FreeBSD
Ryan Moeller [Wed, 26 Feb 2020 00:23:27 +0000 (19:23 -0500)]
ZTS: Misc fixes for FreeBSD

* Check for mountd in is_shared to avoid timeout when not running
* Enhance robustness of some cleanup functions
* Simplify atime lookup
* Skip sharenfs validation for now
* Don't add mountpoint property to inheritance validation on FreeBSD

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10047

4 years ago Add missing newline after zfs redact help message
Ryan Moeller [Wed, 26 Feb 2020 00:20:50 +0000 (19:20 -0500)]
Add missing newline after zfs redact help message

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10045

4 years ago ZTS: zed_start should not fail if zed is already running
Olaf Faaland [Wed, 26 Feb 2020 00:02:10 +0000 (16:02 -0800)]
ZTS: zed_start should not fail if zed is already running

zed_start may be called in places where zed is not
typically already running, but this is not a requirement
of the tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #9974

4 years ago Remove dead code error handling from dsl_crypt.c
Matthew Macy [Tue, 25 Feb 2020 23:59:29 +0000 (15:59 -0800)]
Remove dead code error handling from dsl_crypt.c

Sleepable (KM_SLEEP) allocations cannot fail. Hence
error handling for them is not useful.

Reviewed-By: Tom Caputi <tcaputi@datto.com>
Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10031

4 years ago ZTS: Move atime_003 to linux.run
Ryan Moeller [Tue, 25 Feb 2020 23:27:41 +0000 (18:27 -0500)]
ZTS: Move atime_003 to linux.run

This test verifies relatime behavior, which is only present on Linux.

Move the test to linux.run

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10046

4 years ago Update README for OpenZFS
Matthew Ahrens [Tue, 25 Feb 2020 19:43:20 +0000 (11:43 -0800)]
Update README for OpenZFS

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10053

4 years ago Remove zfs_getattr and convoff dead code
Dirkjan Bussink [Mon, 24 Feb 2020 23:38:23 +0000 (00:38 +0100)]
Remove zfs_getattr and convoff dead code

The `convoff` function is called only in one code path in `zfs_space`.
Each caller of `zfs_space` is called with a `flock64_t` that has
`l_whence` set to `SEEK_SET`. This means that `convoff` always results
in a no-op as the `bfp` parameter has `l_whence` set to `SEEK_SET` and
`int whence` is `SEEK_SET` as well.
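
Why the call is a no-op can be seen in a simplified model.  The Python
below is not the removed C function: the `Flock` class and the exact
conversion steps are assumptions about what an offset-rebasing helper
of this kind does.

```python
import os
from dataclasses import dataclass


@dataclass
class Flock:
    l_whence: int
    l_start: int


def convoff(lock, whence, cur_offset, file_size):
    """Rebase lock.l_start from lock.l_whence to the requested whence."""
    # First express the start relative to the beginning of the file.
    if lock.l_whence == os.SEEK_CUR:
        lock.l_start += cur_offset
    elif lock.l_whence == os.SEEK_END:
        lock.l_start += file_size
    # Then express it relative to the requested base.
    if whence == os.SEEK_CUR:
        lock.l_start -= cur_offset
    elif whence == os.SEEK_END:
        lock.l_start -= file_size
    lock.l_whence = whence


# The only situation the callers produce: SEEK_SET in, SEEK_SET out.
lock = Flock(os.SEEK_SET, 4096)
convoff(lock, os.SEEK_SET, cur_offset=100, file_size=1000)
print(lock)  # unchanged

# For contrast, a conversion that does change the value.
relative = Flock(os.SEEK_CUR, 10)
convoff(relative, os.SEEK_SET, cur_offset=100, file_size=1000)
print(relative)
```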

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Closes #10006

4 years ago ZTS: Misc fixes for FreeBSD
Ryan Moeller [Mon, 24 Feb 2020 18:17:55 +0000 (13:17 -0500)]
ZTS: Misc fixes for FreeBSD

* Force UFS sync before snap in vol rollback tests
* rw is not a valid share option on FreeBSD, use ro instead
* zfs_unmount_nested: mountpoint is in the pool, rmdir *before* export
* Fix some more platform checks
* Fix disappearing group in delegate tests
* Don't try delegating for jailed, only root can set it

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10038

4 years ago Remove unused structs and members in dmu_send.c
Matthew Ahrens [Mon, 24 Feb 2020 17:50:14 +0000 (09:50 -0800)]
Remove unused structs and members in dmu_send.c

There are several structs (and members of structs) related to redaction,
which are no longer used.  This commit removes them.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10039

4 years ago ZTS: Eliminate partitioning from zpool_destroy
Ryan Moeller [Sat, 22 Feb 2020 00:00:23 +0000 (19:00 -0500)]
ZTS: Eliminate partitioning from zpool_destroy

The zpool destroy tests partition a single disk to create two pools.

This can be done using two disks and no partitioning instead.
And temporarily allow vol recursion for FreeBSD while in here.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10036

4 years ago ZTS: Refactor is_shared, fix impl on FreeBSD
Ryan Moeller [Fri, 21 Feb 2020 23:59:20 +0000 (18:59 -0500)]
ZTS: Refactor is_shared, fix impl on FreeBSD

FreeBSD doesn't have a `share` command.  It does have showmount.

Split the separate platform impls out of is_shared_impl.
Dispatch to the correct platform impl function from is_shared.
Eliminate the use of is_shared_impl from tests.  is_shared works.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10037

4 years ago ZTS: Move privilege tests to sunos.run
Ryan Moeller [Fri, 21 Feb 2020 16:52:44 +0000 (11:52 -0500)]
ZTS: Move privilege tests to sunos.run

These tests are unsupported on FreeBSD and Linux for lack of pfexec.

Move the privilege tests to sunos.run and remove the platform checks.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10035

4 years ago ZTS: Don't use lsblk on FreeBSD
Ryan Moeller [Fri, 21 Feb 2020 16:38:34 +0000 (11:38 -0500)]
ZTS: Don't use lsblk on FreeBSD

These tests use lsblk to find the sector size of a disk.
FreeBSD doesn't have lsblk.

Use diskinfo -v to get sector size on FreeBSD.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10033

4 years ago ZTS: Fix userquota_006_pos on FreeBSD
Ryan Moeller [Thu, 20 Feb 2020 16:14:25 +0000 (11:14 -0500)]
ZTS: Fix userquota_006_pos on FreeBSD

FreeBSD uses `pw` for account management. `userquota_006_pos`
erroneously invokes the non-existent `groupdel` command on FreeBSD.

Use `pw groupdel -n` instead of `groupdel` on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10032

4 years ago ZTS: Check the right mount options on FreeBSD
Ryan Moeller [Thu, 20 Feb 2020 16:12:24 +0000 (11:12 -0500)]
ZTS: Check the right mount options on FreeBSD

FreeBSD does not support the "devices" and "nodevices" mount options.

Do not check these options on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10028

4 years ago ZTS: Fix faulty slog_replay_fs_001 test
Ryan Moeller [Thu, 20 Feb 2020 16:11:51 +0000 (11:11 -0500)]
ZTS: Fix faulty slog_replay_fs_001 test

This test is supposed to verify zil operations. For TX_WRITE, writes
must be synchronous in order to be entered in the zil. Linux seems to
be doing sync writes even when they are not asked for, but on FreeBSD
the test does not do what is intended.

Use dd oflag=sync for the parts of this test that are supposed to
result in TX_WRITE zil entries.
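
What "asking for" a sync write means at the system-call level can be
sketched as below.  GNU dd documents oflag=sync as synchronized I/O,
which corresponds to opening the output with O_SYNC; the Python here
does the same thing directly, assuming a POSIX platform where
`os.O_SYNC` is defined.  The helper name and the file name are
hypothetical.

```python
import os
import tempfile


def write_sync(path, data):
    """Write data through a descriptor opened with O_SYNC."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        # With O_SYNC, write() returns only after the data is on stable storage.
        return os.write(fd, data)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "payload")
    written = write_sync(target, b"x" * 8192)
    size = os.path.getsize(target)

print(written, size)
```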

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10022

4 years ago Fix icp include directories for in-tree build
Arvind Sankar [Thu, 20 Feb 2020 16:10:47 +0000 (11:10 -0500)]
Fix icp include directories for in-tree build

When zfs is built in-tree using --enable-linux-builtin, the compile
commands are executed from the kernel build directory. If the build
directory is different from the kernel source directory, passing
-Ifs/zfs/icp will not find the headers as they are not present in the
build directory.

Fix this by adding @abs_top_srcdir@ to pull the headers from the zfs
source tree instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10021

4 years ago ZTS: Eliminate partitioning from zpool_create etc
Ryan Moeller [Thu, 20 Feb 2020 16:10:13 +0000 (11:10 -0500)]
ZTS: Eliminate partitioning from zpool_create etc

These tests can be made to work without a bunch of complex
partitioning of physical disks.

Use the 3 disks directly, creating a few file disks if needed for a
compelling reason.

Reduce the use of shared variables that don't have a clear utility.

Catch the fallout in tests that include cfg/shlib from zpool_create.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10002

4 years ago ZTS: Fix zpool_create/create-o_ashift on FreeBSD
Ryan Moeller [Wed, 19 Feb 2020 18:27:23 +0000 (13:27 -0500)]
ZTS: Fix zpool_create/create-o_ashift on FreeBSD

For some unknown reason, egrep was misbehaving with this pattern on
FreeBSD.  The command works fine run interactively from a shell, but
in the test the output of egrep is empty.

Work around the issue by using a filter in the awk script instead.

While here, add a bit of diagnostic output and other simplifications
to the awk script as well.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10023

4 years ago ZTS: Avoid nonportable cmp flag
Ryan Moeller [Wed, 19 Feb 2020 17:03:31 +0000 (12:03 -0500)]
ZTS: Avoid nonportable cmp flag

FreeBSD doesn't have the -n flag for cmp.

Read the area for the first four labels from the disk to a separate
file to compare instead of using the special flag to limit the size.
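
The portable approach, copying the region of interest out first and
then comparing whole files, can be sketched as follows.  The test
itself is a ksh script; this Python model uses hypothetical helper
names, made-up file contents and an arbitrary 1024-byte region.

```python
import filecmp
import os
import tempfile


def copy_prefix(src, dst, nbytes):
    """Copy the first nbytes of src into dst."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read(nbytes))


def same_prefix(path_a, path_b, nbytes, workdir):
    """Compare two files over their first nbytes without a size-limit flag."""
    head_a = os.path.join(workdir, "head_a")
    head_b = os.path.join(workdir, "head_b")
    copy_prefix(path_a, head_a, nbytes)
    copy_prefix(path_b, head_b, nbytes)
    return filecmp.cmp(head_a, head_b, shallow=False)


with tempfile.TemporaryDirectory() as workdir:
    disk_a = os.path.join(workdir, "disk_a")
    disk_b = os.path.join(workdir, "disk_b")
    # Identical leading region, different trailing data.
    with open(disk_a, "wb") as f:
        f.write(b"L" * 1024 + b"A" * 64)
    with open(disk_b, "wb") as f:
        f.write(b"L" * 1024 + b"B" * 64)

    labels_match = same_prefix(disk_a, disk_b, 1024, workdir)
    whole_match = filecmp.cmp(disk_a, disk_b, shallow=False)

print(labels_match, whole_match)  # True False
```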

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10024

4 years ago Add notice that forced unmount is not supported on Linux
Mariusz Zaborski [Tue, 18 Feb 2020 21:36:23 +0000 (22:36 +0100)]
Add notice that forced unmount is not supported on Linux

The Linux VFS will never allow a filesystem which is in use to
be unmounted.  This behavior differs from other platforms like
FreeBSD which allow a filesystem to be force unmounted.  On those
platforms a forced unmount results in errors being returned to
applications actively using the filesystem.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Closes #10013

4 years ago ZTS: Move free to Linux commands list
Ryan Moeller [Tue, 18 Feb 2020 19:23:41 +0000 (14:23 -0500)]
ZTS: Move free to Linux commands list

FreeBSD does not have the free command. This command is only used by
Linux in a perf hostinfo function.

Move free from the list of common commands to the list of Linux
commands.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10011

4 years ago Enable zpool events tunables and tests on FreeBSD
Ryan Moeller [Tue, 18 Feb 2020 19:22:56 +0000 (14:22 -0500)]
Enable zpool events tunables and tests on FreeBSD

We have made the necessary changes in our module code to expose
zevents through both devd and the zpool events ioctl. Now the tunables
can be exposed and zpool events tests can be enabled on both platforms.

A few minor tweaks to the tests were needed to accommodate the way wc
formats output on FreeBSD.

zed remains to be ported.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10008

4 years ago Factor out some dbuf subroutines and add state change tracing
Matthew Macy [Tue, 18 Feb 2020 19:21:37 +0000 (11:21 -0800)]
Factor out some dbuf subroutines and add state change tracing

Create dedicated dbuf_read_hole and dbuf_read_bonus.
Additionally, add a dtrace probe to allow state change tracing.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Will Andrews <wca@FreeBSD.org>
Reviewed-by: Brad Lewis <brad.lewis@delphix.com>
Authored-by: Will Andrews <wca@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9923

4 years agoPrefer org.openzfs for features and properties
Richard Laager [Tue, 18 Feb 2020 17:36:50 +0000 (11:36 -0600)]
Prefer org.openzfs for features and properties

Moving forward, we wish to use org.openzfs (no dash) rather than
org.open-zfs or org.zfsonlinux for feature GUIDs and property names.
The existing feature GUIDs cannot be changed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10003

4 years agoZTS: Move cksum to common system commands
Ryan Moeller [Sun, 16 Feb 2020 20:49:49 +0000 (15:49 -0500)]
ZTS: Move cksum to common system commands

The cksum command is used by delegate tests. We have it on FreeBSD,
so it should not have been moved to the Linux commands list.

Move it back to the common commands list.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10007

4 years agoHonour sync=disabled when relinking tmpfiles
DeHackEd [Sun, 16 Feb 2020 20:44:08 +0000 (15:44 -0500)]
Honour sync=disabled when relinking tmpfiles

Unlinked files don't respect synchronous flush commands, but when they get relinked
their state is unknown. Previously we force flushed all such files even when
sync=disabled. Correct this case.

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: DHE <git@dehacked.net>
Closes #10005

4 years agoSystemd mount generator: Generate noauto units; add control properties
InsanePrawn [Wed, 12 Feb 2020 17:01:15 +0000 (18:01 +0100)]
Systemd mount generator: Generate noauto units; add control properties

This commit refactors the systemd mount generators and makes the
following major changes:

- The generator now generates units for datasets marked canmount=noauto,
  too. These units are NOT WantedBy local-fs.target.
  If there are multiple noauto datasets for a path, no noauto unit will
  be created. Datasets with canmount=on are prioritized.

- Introduces handling of new user properties which are now included in
  the zfs-list.cache files:
    - org.openzfs.systemd:requires:
      List of units to require for this mount unit
    - org.openzfs.systemd:requires-mounts-for:
      List of mounts to require by this mount unit
    - org.openzfs.systemd:before:
      List of units to order after this mount unit
    - org.openzfs.systemd:after:
      List of units to order before this mount unit
    - org.openzfs.systemd:wanted-by:
      List of units to add a Wants dependency on this mount unit to
    - org.openzfs.systemd:required-by:
      List of units to add a Requires dependency on this mount unit to
    - org.openzfs.systemd:nofail:
      Toggles between a wants and a requires dependency.
    - org.openzfs.systemd:ignore:
      Do not generate a mount unit for this dataset.

  Consult the updated man page for detailed documentation.

- Restructures and extends the zfs-mount-generator(8) man page with the
  above properties, information on unit ordering and a license header.
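
As a rough illustration of how the new user properties might be set, the
sketch below only prints the zfs(8) commands instead of running them.
The helper name, the datasets (tank/data, tank/www) and the unit names
are hypothetical; consult zfs-mount-generator(8) for the accepted values.

```shell
# Print, rather than run, the zfs(8) command that would set one of the
# mount-generator user properties, so the sketch works without ZFS.
# systemd_prop_cmd, the dataset names and the unit names are
# hypothetical and used for illustration only.
systemd_prop_cmd() {
    printf 'zfs set org.openzfs.systemd:%s=%s %s\n' "$1" "$2" "$3"
}

systemd_prop_cmd after network-online.target tank/data
systemd_prop_cmd wanted-by nginx.service tank/www
```

Dropping the printf wrapper and running the printed commands as root
would be the assumed real-world usage.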

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9649

4 years agoSystemd mount generator: Silence shellcheck warnings
InsanePrawn [Sat, 11 Jan 2020 18:14:23 +0000 (19:14 +0100)]
Systemd mount generator: Silence shellcheck warnings

Silences a warning about an intentionally unquoted variable.
Fixes a warning caused by strings split across lines by slightly
refactoring keyloadcmd.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9649

4 years agoSupport setting user properties in a channel program
Jason King [Fri, 14 Feb 2020 21:41:42 +0000 (15:41 -0600)]
Support setting user properties in a channel program

This adds support for setting user properties in a
zfs channel program by adding 'zfs.sync.set_prop'
and 'zfs.check.set_prop' to the ZFS Lua API.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Co-authored-by: Sara Hartse <sara.hartse@delphix.com>
Contributions-by: Jason King <jason.king@joyent.com>
Signed-off-by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: Jason King <jason.king@joyent.com>
Closes #9950

4 years agoRemove limit on number of async zio_frees of non-dedup blocks
Matthew Ahrens [Fri, 14 Feb 2020 16:39:46 +0000 (08:39 -0800)]
Remove limit on number of async zio_frees of non-dedup blocks

The module parameter zfs_async_block_max_blocks limits the number of
blocks that can be freed by the background freeing of filesystems and
snapshots (from "zfs destroy"), in one TXG.  This is useful when freeing
dedup blocks, because each zio_free() of a dedup block can require an
i/o to read the relevant part of the dedup table (DDT), and will also
dirty that block.

zfs_async_block_max_blocks is set to 100,000 by default.  For the more
typical case where dedup is not used, this can have a negative
performance impact on the rate of background freeing (from "zfs
destroy").  For example, with recordsize=8k, and TXGs syncing once
every 5 seconds, we can free only 160MB of data per second, which may be
much less than the rate we can write data.

This change increases zfs_async_block_max_blocks to be unlimited by
default.  To address the dedup freeing issue, a new tunable is
introduced, zfs_max_async_dedup_frees, which limits the number of
zio_free()'s of dedup blocks done by background destroys, per txg.  The
default is 100,000 frees (same as the old zfs_async_block_max_blocks
default).
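
The 160MB per second figure quoted above can be reproduced with a short
calculation.  This is only a sketch: free_rate_mb is a hypothetical
helper, and 1 MB is taken as 1000 KB to match the round number in the
message.

```shell
# Upper bound on background-free throughput when at most max_blocks
# blocks may be freed per TXG. free_rate_mb is a hypothetical helper,
# not part of ZFS; 1 MB is treated as 1000 KB for round numbers.
free_rate_mb() {
    max_blocks=$1        # blocks freed per TXG (old default: 100000)
    recordsize_kb=$2     # block size in KB (recordsize=8k -> 8)
    txg_seconds=$3       # seconds between TXG syncs (5 in the example)
    echo $(( max_blocks * recordsize_kb / txg_seconds / 1000 ))
}

free_rate_mb 100000 8 5    # prints 160
```

With the old limit the bound scales linearly with recordsize, so small
records are the worst case.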

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10000

4 years agoMake zpool.d/iostat work on FreeBSD
Ryan Moeller [Fri, 14 Feb 2020 16:37:40 +0000 (11:37 -0500)]
Make zpool.d/iostat work on FreeBSD

There are slight differences in the iostat commands between FreeBSD and
Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9979

4 years agoUse POSIX stdout/stderr redirect in configure macro
Andrew J. Hesford [Fri, 14 Feb 2020 16:30:29 +0000 (11:30 -0500)]
Use POSIX stdout/stderr redirect in configure macro

This PR fixes an issue wherein redirecting stdout and stderr when
building kernel modules in configure tests relied on a bashism that
does not work as expected when /bin/sh is not bash.
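
A minimal sketch of the portable form, assuming the bashism in question
is the `&>file` shorthand (the message does not name it).  run_logged is
a hypothetical helper, not the configure macro itself.

```shell
# run_logged is a hypothetical helper: run a command and send both
# stdout and stderr to a log file using only POSIX redirection.
run_logged() {
    log=$1
    shift
    # Portable form: redirect stdout to the file, then point stderr at
    # the same place. A plain POSIX sh parses the bash-only "&>file" as
    # "run in background" followed by ">file", leaving the log empty.
    "$@" >"$log" 2>&1
}

log_file=$(mktemp)
run_logged "$log_file" sh -c 'echo to-stdout; echo to-stderr >&2'
cat "$log_file"    # both lines end up in the log
rm -f "$log_file"
```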

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Andrew J. Hesford <ajh@sideband.org>
Closes #9990
Closes #9998

4 years agoZTS: Misc test fixes for FreeBSD
Ryan Moeller [Thu, 13 Feb 2020 21:52:34 +0000 (16:52 -0500)]
ZTS: Misc test fixes for FreeBSD

Add missing logic for FreeBSD to a few test scripts.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9994

4 years agoZTS: Don't include zpool_create.shlib in zpool_add
Ryan Moeller [Thu, 13 Feb 2020 20:11:25 +0000 (15:11 -0500)]
ZTS: Don't include zpool_create.shlib in zpool_add

The zpool_add tests include zpool_create.shlib for a few silly
variables.

Don't use those variables for the file names. Include zpool_add.kshlib
for whatever variables we still need.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9997

4 years agoZTS: Eliminate partitioning from zpool_remove
Ryan Moeller [Thu, 13 Feb 2020 20:10:36 +0000 (15:10 -0500)]
ZTS: Eliminate partitioning from zpool_remove

These tests do not need to use partitions.

Get rid of the partitioning and just use the disks directly.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9996

4 years agoZTS: Eliminate partitioning from write_dirs
Ryan Moeller [Thu, 13 Feb 2020 20:08:59 +0000 (15:08 -0500)]
ZTS: Eliminate partitioning from write_dirs

These tests do not need to use partitions.

Get rid of the partitioning and just use the disks directly.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9995

4 years agoZTS: Cleanup some cleanup functions
Ryan Moeller [Thu, 13 Feb 2020 20:05:32 +0000 (15:05 -0500)]
ZTS: Cleanup some cleanup functions

Cleanup functions should make a best effort to clean up as much as
possible.

Do a consistency pass in a bunch of tests to make the cleanup
functions less prone to failure and fix a few typos here and there.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9993

4 years agoFix a typo/whitespace in tests README
Ryan Moeller [Thu, 13 Feb 2020 20:04:47 +0000 (15:04 -0500)]
Fix a typo/whitespace in tests README

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9991

4 years agoZTS: Use ECKSUM instead of EBADE in libzfs test
Ryan Moeller [Thu, 13 Feb 2020 20:03:01 +0000 (15:03 -0500)]
ZTS: Use ECKSUM instead of EBADE in libzfs test

Linux defines ECKSUM as EBADE, FreeBSD defines it as EINTEGRITY.

Test for ECKSUM instead of EBADE so we don't have to define EBADE for
this test on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9992

4 years agozfs-mount-generator: Fix escaping for /
Richard Laager [Thu, 13 Feb 2020 19:55:59 +0000 (13:55 -0600)]
zfs-mount-generator: Fix escaping for /

The correct name for the mount unit for / is "-.mount", not ".mount".
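
The special case can be illustrated with a simplified sketch of
path-to-unit-name escaping.  mount_unit_name is a hypothetical helper,
not the generator's code: it only maps "/" to "-" and does not escape
other special characters the way systemd's full algorithm does.

```shell
# Simplified sketch of path-to-unit-name escaping. mount_unit_name is a
# hypothetical helper: it only maps "/" separators to "-" and does not
# escape special characters the way systemd's full algorithm does.
mount_unit_name() {
    path=$1
    # Strip leading slashes; the root path becomes empty.
    while [ "${path#/}" != "$path" ]; do path=${path#/}; done
    # Strip trailing slashes.
    while [ "${path%/}" != "$path" ]; do path=${path%/}; done
    if [ -z "$path" ]; then
        # Root is the special case: it escapes to a single dash.
        echo "-.mount"
    else
        echo "$(printf '%s' "$path" | tr '/' '-').mount"
    fi
}

mount_unit_name /           # prints -.mount
mount_unit_name /var/log    # prints var-log.mount
```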

Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Antonio Russo <antonio.e.russo@gmail.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #9970

4 years agofix zstreamdump -C
Matthew Ahrens [Thu, 13 Feb 2020 19:24:57 +0000 (11:24 -0800)]
fix zstreamdump -C

zstreamdump -C always fails.  It is not calculating the checksums, but
it's still trying to verify that the (non-calculated) checksum matches
the one stored in the send stream.

This change makes zstreamdump -C not verify checksums.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #9983

4 years agoMissed wakeup when growing kmem cache
Matthew Ahrens [Thu, 13 Feb 2020 19:23:02 +0000 (11:23 -0800)]
Missed wakeup when growing kmem cache

When growing the size of a (VMEM or KVMEM) kmem cache, spl_cache_grow()
always does taskq_dispatch(spl_cache_grow_work), and then waits for the
KMC_BIT_GROWING to be cleared by the taskq thread.

The taskq thread (spl_cache_grow_work()) does:
1. allocate new slab and add to list
2. wake_up_all(skc_waitq)
3. clear_bit(KMC_BIT_GROWING)

Therefore, the waiting thread can wake up before GROWING has been
cleared.  It will see that the growing has not yet completed, and go
back to sleep until it hits the 100ms timeout.

This can have an extreme performance impact on workloads that alloc/free
more than fits in the (statically-sized) magazines.  These workloads
allocate and free slabs with high frequency.

The problem can be observed with `funclatency spl_cache_grow`, which on
some workloads shows that 99.5% of the time it takes <64us to allocate
slabs, but we spend ~70% of our time in outliers, waiting for the 100ms
timeout.

The fix is to do `clear_bit(KMC_BIT_GROWING)` before
`wake_up_all(skc_waitq)`.

A future investigation should evaluate if we still actually need to
taskq_dispatch() at all, and if so on which kernel versions.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #9989

4 years agoRemove duplicate dbufs accounting
Alexander Motin [Thu, 13 Feb 2020 19:20:42 +0000 (14:20 -0500)]
Remove duplicate dbufs accounting

Since the AVL tree already has an embedded element counter, use
dn_dbufs_count only for dbufs not counted there (bonus buffers) and
just add the two counts.  This removes two atomics per dbuf life cycle.

According to the profiler, this reduces the time spent by dbuf_destroy()
inside the bottlenecked dbuf_evict_thread() from 13.36% to 9.20% of the
core.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #9949

4 years agoZTS: Move user_namespace test to linux.run
Ryan Moeller [Wed, 12 Feb 2020 21:06:00 +0000 (16:06 -0500)]
ZTS: Move user_namespace test to linux.run

Namespaces are a Linux feature not available on other platforms.

Move the user_namespace test out of common.run to linux.run.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9982

4 years agoZTS: Interpret env vars in faketty on FreeBSD
Ryan Moeller [Wed, 12 Feb 2020 21:04:51 +0000 (16:04 -0500)]
ZTS: Interpret env vars in faketty on FreeBSD

This was missed in review. On FreeBSD, script does not understand
environment variables being passed as a command.

Use env to make faketty handle env vars on FreeBSD.
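
The effect can be reproduced without script(1).  In this sketch run_cmd
is a hypothetical stand-in for any wrapper that executes its arguments
directly as a command; it is not the faketty function itself.

```shell
# run_cmd stands in for a wrapper such as script(1) that executes its
# arguments directly as a command; it is a hypothetical helper.
run_cmd() {
    "$@"
}

# Passing VAR=value as the first word fails: it is looked up as a
# command name instead of being treated as an assignment.
if ! run_cmd GREETING=hello sh -c 'echo "$GREETING"' 2>/dev/null; then
    echo "plain assignment was not understood"
fi

# Prefixing env turns the assignment into an argument of a real command.
run_cmd env GREETING=hello sh -c 'echo "$GREETING"'    # prints hello
```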

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9981

4 years agoZTS: Fix zdb_display_block on FreeBSD
Ryan Moeller [Wed, 12 Feb 2020 21:01:55 +0000 (16:01 -0500)]
ZTS: Fix zdb_display_block on FreeBSD

Missed this in the review, but wc output on FreeBSD is indented,
so string comparisons mismatch when comparing to an unindented number.

Compare counts as integers instead of strings.
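
A small sketch of the difference, simulating the padded wc output with
printf rather than calling wc on FreeBSD (the padding width here is
only illustrative):

```shell
# Simulate FreeBSD-style wc output, which pads the count with leading
# spaces (the exact width here is only illustrative).
count=$(printf '%8d' 3)

# String comparison sees the padded value and "3" as different.
if [ "$count" = "3" ]; then
    echo "string compare: match"
else
    echo "string compare: mismatch"
fi

# Integer comparison ignores the padding.
if [ "$count" -eq 3 ]; then
    echo "integer compare: match"
else
    echo "integer compare: mismatch"
fi
```

Only the integer comparison is expected to report a match.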

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9980

4 years agoMove zfs_version_kernel to platform code
Ryan Moeller [Wed, 12 Feb 2020 21:00:19 +0000 (16:00 -0500)]
Move zfs_version_kernel to platform code

Linux uses sysfs to determine the module version, FreeBSD uses a
different method.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9978

4 years agoZTS: Move zpool_split_wholedisks to linux.run
Ryan Moeller [Wed, 12 Feb 2020 20:59:28 +0000 (15:59 -0500)]
ZTS: Move zpool_split_wholedisks to linux.run

This test uses the scsi_debug Linux kernel module.

Move the test to linux.run until we have an alternative to scsi_debug
worked out on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9984

4 years agozdb: Always print symlink target
Justin Keogh [Wed, 12 Feb 2020 19:36:05 +0000 (19:36 +0000)]
zdb: Always print symlink target

When zdb is printing paths, also print the symlink target if it exists.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Justin Keogh <commits@v6y.net>
Closes #9925

4 years agozcp: add zfs.sync.bookmark
Christian Schwarz [Thu, 16 Jan 2020 01:15:05 +0000 (02:15 +0100)]
zcp: add zfs.sync.bookmark

Add support for bookmark creation and cloning.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #9571

4 years agoImplement bookmark copying
Christian Schwarz [Mon, 11 Nov 2019 07:24:14 +0000 (23:24 -0800)]
Implement bookmark copying

This feature allows copying existing bookmarks using

    zfs bookmark fs#target fs#newbookmark

There are some niche use cases for such functionality,
e.g. when using bookmarks as markers for replication progress.

Copying redaction bookmarks produces a normal bookmark that
cannot be used for redacted send (we are not duplicating
the redaction object).

ZCP support for bookmarking (both creation and copying) will be
implemented in a separate patch based on this work.

Overview:

- Terminology:
    - source = existing snapshot or bookmark
    - new/bmark = new bookmark
- Implement bookmark copying in `dsl_bookmark.c`
  - create new bookmark node
  - copy source's `zbn_phys` to new's `zbn_phys`
  - zero-out redaction object id in copy
- Extend existing bookmark ioctl nvlist schema to accept
  bookmarks as sources
  - => `dsl_bookmark_create_nvl_validate` is authoritative
- use `dsl_dataset_is_before` check for both snapshot
  and bookmark sources
- Adjust CLI
  - refactor shortname expansion logic in `zfs_do_bookmark`
- Update man pages
  - warn about redaction bookmark handling
- Add test cases
  - CLI
  - pyzfs libzfs_core bindings

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #9571

4 years agoAddress Coverity warnings in #9902
Matthew Macy [Tue, 11 Feb 2020 21:12:41 +0000 (13:12 -0800)]
Address Coverity warnings in #9902

Coverity reports the variable may be NULL, but due to the
way the dirty records are handled this cannot be the case.
Add a comment and VERIFY to make this clear and silence
the warning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9962

4 years agoAdd missing dmu_buf_unlock_parent() calls to dbuf_read_impl()
Brian Behlendorf [Mon, 10 Feb 2020 22:54:12 +0000 (14:54 -0800)]
Add missing dmu_buf_unlock_parent() calls to dbuf_read_impl()

As explained by the comments in dbuf_read() and above dbuf_read_impl(),
under all circumstances the parent lock specified by dblt should be
dropped when exiting dbuf_read_impl().  This was not being done for
two exit paths.  Additionally, ensure the mutex is unlocked before
dropping the parent lock.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9968

4 years agoFix zdb -R with 'b' flag
Paul Zuchowski [Mon, 10 Feb 2020 22:00:05 +0000 (18:00 -0400)]
Fix zdb -R with 'b' flag

zdb -R :b fails due to the indirect block being compressed,
and the 'b' and 'd' flags not working in tandem when specified.
Fix the flag parsing code and create a zfs test for zdb -R
block display.  Also fix the zio flags where the dotted notation
for the vdev portion of DVA (i.e. 0.0:offset:length) fails.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9640
Closes #9729

4 years agobash scripts: use /usr/bin/env for bash shebangs
Graham Christensen [Mon, 10 Feb 2020 21:13:46 +0000 (16:13 -0500)]
bash scripts: use /usr/bin/env for bash shebangs

Not all systems / distros have a `/bin/bash`, and these scripts are
more difficult to run at development time.

For example, my system is NixOS which doesn't have a /bin/bash. This
is not a problem for NixOS building ZFS as a package: the build
environment automatically replaces these shebangs with corrected
paths.

The problem is much more annoying at development time: either the
scripts don't run, or I correct them for my local machine and deal with
a perpetually dirty work tree.

Before committing this patch I confirmed there are existing scripts
which use `/usr/bin/env` to locate bash, so I am thinking this is a
safe transformation.

There are a handful of other shebangs in this repository which don't
work on my system. This patch is useful on its own specifically for
`commitcheck.sh`, otherwise I can't validate my commits before
submission.

Here are the remaining shebangs which NixOS systems won't have:

       1274 #!/bin/ksh -p
         91 #!/bin/ksh
         89 #! /bin/ksh -p
          2 #!/bin/sed -f
          1 #!/usr/bin/perl -w
          1 #!/usr/bin/ksh
          1 #!/bin/nawk -f

plus this which will create an invalid shebang in
`tests/zfs-tests/tests/functional/mv_files/mv_files_common.kshlib`:

        echo "#!/bin/ksh" > $TEST_BASE_DIR/exitsZero.ksh

I chose to leave those alone for now, and gauge the interest in this
much smaller patch first.

The fixes for these are easy enough by simply using `/usr/bin/env ksh`:

         91 #!/bin/ksh
          1 #!/usr/bin/ksh

The fix for the other set is much trickier. Quoting the GNU coreutils
manual:

    Most operating systems (e.g. GNU/Linux, BSDs) treat all text after
    the first space as a single argument. When using env in a script it
    is thus not possible to specify multiple arguments.

and not all `env`'s support arguments.

Mine (GNU Coreutils 8.31) does, though this feature is new since
April 2018, GNU Coreutils 8.30:
https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=668306ed86c8c79b0af0db8b9c882654ebb66db2

and worse, requires the -S argument:

    -S, --split-string=S  process and split S into separate arguments;
                          used to pass multiple arguments on shebang
                          lines

Example:

    $ seq 1 2 | $(nix-build '<nixpkgs>' -A coreutils)/bin/env "sort -nr"
    /nix/[...]-coreutils-8.31/bin/env: ‘sort -nr’: No such file or directory
    /nix/[...]-coreutils-8.31/bin/env: use -[v]S to pass options in shebang lines

    $ seq 1 2 | $(nix-build '<nixpkgs>' -A coreutils)/bin/env "-S sort -nr"
    2
    1

GNU Coreutils says FreeBSD's `env` does, though I wonder if FreeBSD's
would be unhappy with the `-S`:
https://www.gnu.org/software/coreutils/manual/html_node/env-invocation.html#env-invocation

BusyBox v1.30.1 does not, and does not have a `-S`-like option:

    $ seq 1 2 | $(nix-build '<nixpkgs>' -A busybox)/bin/env "sort -nr"
    env: can't execute 'sort -nr': No such file or directory

Toybox 0.8.1 also does not, and also does not have a `-S` option:

    $ seq 1 2 | $(nix-build '<nixpkgs>' -A toybox)/bin/env "sort -nr"
    env: exec sort -nr: No such file or directory

---

At any rate, if this patch merges and the remaining ~1,500 are updated,
the much larger patch should probably include a checkstyle-like test
asserting all new shebangs use `/usr/bin/env`. I also don't mind
dealing with NixOS weirdness if the project would prefer that.
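
A checkstyle-like test of that kind could look roughly like the
following.  uses_env_shebang is a hypothetical function, not an
existing project script, and it only inspects the first line of a
single file.

```shell
# uses_env_shebang is a hypothetical check, not the project's actual
# style test: succeed only if the first line of a file starts with
# "#!/usr/bin/env ".
uses_env_shebang() {
    # Read the first line; tolerate a missing trailing newline.
    IFS= read -r first_line <"$1" || :
    case $first_line in
        '#!/usr/bin/env '*) return 0 ;;
        *) return 1 ;;
    esac
}

sample=$(mktemp)
printf '#!/usr/bin/env bash\necho ok\n' >"$sample"
uses_env_shebang "$sample" && echo "env shebang"
printf '#!/bin/bash\necho ok\n' >"$sample"
uses_env_shebang "$sample" || echo "hard-coded interpreter path"
rm -f "$sample"
```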

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Graham Christensen <graham@grahamc.com>
Closes #9893

4 years agoShare some code for spa deadman tunables
Ryan Moeller [Mon, 10 Feb 2020 21:11:30 +0000 (16:11 -0500)]
Share some code for spa deadman tunables

We need to do the same thing to update all spas on any OS for these
tunables, so let's share the code.

While here let's match the types of the literals initializing the
variables with the type of the variable.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9964

4 years agoZTS: Test zvol I/O in different volmodes
Ryan Moeller [Mon, 10 Feb 2020 21:10:25 +0000 (16:10 -0500)]
ZTS: Test zvol I/O in different volmodes

We found that our zvol code had some issues with volmode=dev that were
not revealed by ZTS.

Add some basic I/O operations to exercise more code paths in the
volmode test.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9953

4 years agoICP: Improve AES-GCM performance
Attila Fülöp [Mon, 10 Feb 2020 20:59:50 +0000 (21:59 +0100)]
ICP: Improve AES-GCM performance

Currently SIMD accelerated AES-GCM performance is limited by two
factors:

a. The need to disable preemption and interrupts and save the FPU
state before using it and to do the reverse when done. Due to the
way the code is organized (see (b) below) we have to pay this price
twice for each 16 byte GCM block processed.

b. Most processing is done in C, operating on single GCM blocks.
The use of SIMD instructions is limited to the AES encryption of the
counter block (AES-NI) and the Galois multiplication (PCLMULQDQ).
This leads to the FPU not being fully utilized for crypto
operations.

To solve (a) we do crypto processing in larger chunks while owning
the FPU. An `icp_gcm_avx_chunk_size` module parameter was introduced
to make this chunk size tweakable. It defaults to 32 KiB. This step
alone roughly doubles performance. (b) is tackled by porting and
using the highly optimized openssl AES-GCM assembler routines, which
do all the processing (CTR, AES, GMULT) in a single routine. Both
steps together result in up to 32x reduction of the time spent in
the en/decryption routines, leading up to approximately 12x
throughput increase for large (128 KiB) blocks.

Lastly, this commit changes the default encryption algorithm from
AES-CCM to AES-GCM when setting the `encryption=on` property.

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Jason King <jason.king@joyent.com>
Reviewed-By: Tom Caputi <tcaputi@datto.com>
Reviewed-By: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #9749

4 years agoFactor out dbuf_sync_bonus
Matthew Macy [Fri, 7 Feb 2020 22:22:29 +0000 (14:22 -0800)]
Factor out dbuf_sync_bonus

Factor the portion of dbuf_sync_leaf() responsible for handling bonus
buffers out in to its own dbuf_sync_bonus() helper function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9909

4 years agoZTS: Add an is_dilos function for future ZTS updates
Igor K [Fri, 7 Feb 2020 20:32:52 +0000 (23:32 +0300)]
ZTS: Add an is_dilos function for future ZTS updates

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #9960

4 years agoZTS: Use wc -c instead of --bytes for portability
Ryan Moeller [Fri, 7 Feb 2020 20:31:38 +0000 (15:31 -0500)]
ZTS: Use wc -c instead of --bytes for portability

FreeBSD does not have the long opts for wc.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9963

4 years agoLinux 5.6 compat: timestamp_truncate()
Brian Behlendorf [Thu, 6 Feb 2020 20:37:25 +0000 (12:37 -0800)]
Linux 5.6 compat: timestamp_truncate()

The timestamp_truncate() function was added, it replaces the existing
timespec64_trunc() function.  This change renames our wrapper function
to be consistent with the upstream name and updates the compatibility
code for older kernels accordingly.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9956
Closes #9961

4 years agoLinux 5.6 compat: struct proc_ops
Brian Behlendorf [Thu, 6 Feb 2020 18:30:41 +0000 (10:30 -0800)]
Linux 5.6 compat: struct proc_ops

The proc_ops structure was introduced to replace the use of the
file_operations structure when registering proc handlers.  This
change creates a new kstat_proc_op_t typedef for compatibility
which can be used to pass around the correct structure.

This change additionally adds the 'const' keyword to all of the
existing proc operations structures.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9961

4 years agoReduce number of atomic_add() calls in aggsum
Alexander Motin [Thu, 6 Feb 2020 21:21:06 +0000 (16:21 -0500)]
Reduce number of atomic_add() calls in aggsum

Previous code used 4 atomics to do aggsum_flush_bucket() and 2 more to
re-borrow after the flush.  But since asc_borrowed and asc_delta are
accessed only while holding asc_lock, it makes no sense to modify
as_lower_bound and as_upper_bound in multiple steps.  Instead, the new
code uses only 2 atomics in all cases, one per as_*_bound variable.  I
think even that is overkill; a simple atomic store and load could be
used here, since all modifications are done under the as_lock, but
there are no such primitives in ZFS code now.

While there, make the borrow code consider the previous borrow value,
so that on mixed request patterns there is less chance of needing to
borrow again if a much larger request follows a tiny one that needed
to borrow.

Also reduce as_numbuckets from uint64_t to u_int.  It makes no sense
to use such a large division operation on every aggsum_add().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #9930

4 years agoReplace static per-cpu with dynamic per-cpu data
Romain Dolbeau [Thu, 6 Feb 2020 17:26:13 +0000 (18:26 +0100)]
Replace static per-cpu with dynamic per-cpu data

This solves the issue of loading the spl module on RISC-V.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>
Closes #9942

4 years agoFix static data to link with -fno-common
Romain Dolbeau [Thu, 6 Feb 2020 17:25:29 +0000 (18:25 +0100)]
Fix static data to link with -fno-common

-fno-common is the new default in GCC 10, replacing -fcommon in
GCC <= 9, so static data must only be allocated once.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>
Closes #9943

4 years agoFix unknown cc flag -fno-ipa-sra
Ryan Moeller [Thu, 6 Feb 2020 17:24:37 +0000 (12:24 -0500)]
Fix unknown cc flag -fno-ipa-sra

Clang does not recognize -fno-ipa-sra, so add a check for it.
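
The idea behind such a check can be sketched in shell.  cc_accepts_flag
is a hypothetical probe, not the configure macro added by this change:
it compiles an empty C input with the flag and relies on -Werror to turn
"unsupported flag" warnings into failures.

```shell
# cc_accepts_flag is a hypothetical probe, not the project's configure
# macro: compile an empty C input with the flag and report whether the
# compiler accepted it. -Werror turns "unsupported flag" warnings into
# failures.
cc_accepts_flag() {
    compiler=$1
    flag=$2
    "$compiler" "$flag" -Werror -x c -c /dev/null -o /dev/null >/dev/null 2>&1
}

# Fall back to "cc" when CC is not set in the environment.
if cc_accepts_flag "${CC:-cc}" -fno-ipa-sra; then
    echo "using -fno-ipa-sra"
else
    echo "skipping -fno-ipa-sra"
fi
```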

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9946

4 years agoSuggest using visudo to edit
Gerardwx [Wed, 5 Feb 2020 19:31:53 +0000 (14:31 -0500)]
Suggest using visudo to edit

Suggest visudo which allows editing the sudoers file in a safe fashion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gerardwx <gerardw@alum.mit.edu>
Closes #9918

4 years agoFew microoptimizations to dbuf layer
Alexander Motin [Wed, 5 Feb 2020 19:08:44 +0000 (14:08 -0500)]
Few microoptimizations to dbuf layer

Move db_link into the same cache line as db_blkid and db_level.
This significantly reduces avl_add() time in dbuf_create() on
systems with large RAM and a huge number of dbufs per dnode.

Avoid a few accesses to dbuf_caches[].size, which is highly congested
under high IOPS and never stays in cache for a long time.  Use the
local value we are receiving from zfs_refcount_add_many() anyway.

Remove the cache_size_bytes_max bump from dbuf_evict_one().  I don't
see a point in doing it on dbuf eviction after we have done it on
insertion in dbuf_rele_and_unlock().

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #9931