Prakash Surya [Wed, 30 Oct 2019 18:02:41 +0000 (11:02 -0700)]
Enable use of DTRACE_PROBE* macros in "spl" module
This change modifies some of the infrastructure for enabling the use of
the DTRACE_PROBE* macros, such that we can use tehm in the "spl" module.
Currently, when the DTRACE_PROBE* macros are used, they get expanded to
create new functions, and these dynamically generated functions become
part of the "zfs" module.
Since the "spl" module does not depend on the "zfs" module, the use of
DTRACE_PROBE* in the "spl" module would result in undefined symbols
being used in the "spl" module. Specifically, DTRACE_PROBE* would turn
into a function call, and the function being called would be a symbol
only contained in the "zfs" module; which results in a linker and/or
runtime error.
Thus, this change adds the necessary logic to the "spl" module, to
mirror the tracing functionality available to the "zfs" module. After
this change, we'll have a "trace_zfs.h" header file which defines the
probes available only to the "zfs" module, and a "trace_spl.h" header
file which defines the probes available only to the "spl" module.
Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #9525
Matthew Macy [Fri, 1 Nov 2019 17:37:33 +0000 (10:37 -0700)]
Prefix struct rangelock
A struct rangelock already exists on FreeBSD. Add a zfs_ prefix as
per our convention to prevent any conflict with existing symbols.
This change is a follow up to 2cc479d0.
Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9534
Brian Behlendorf [Thu, 31 Oct 2019 19:57:42 +0000 (12:57 -0700)]
ZTS: Fix removal_with_errors
The removal_with_errors.ksh test case could occasionally complete
the removal process instead of canceling due to an injected error.
To prevent this false positive, export and import the pool between
test phases to flush the ARC cache. Furthermore, double the amount
of data in the pool to increase the removal time.
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9528
Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9533
Matthew Macy [Thu, 31 Oct 2019 17:09:01 +0000 (10:09 -0700)]
Include prototypes for vdev_initialize
Address two prototype related warnings emitted by clang.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9535
Matthew Macy [Thu, 31 Oct 2019 16:52:22 +0000 (09:52 -0700)]
Factor Linux specific code out of spa_misc.c
Move these Linux module parameter get/set helpers in to
platform specific code.
Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9457
alaviss [Wed, 30 Oct 2019 21:38:41 +0000 (21:38 +0000)]
dracut/zfs-load-key.sh: properly remove prefixes
Removes the 'ZFS=' prefix from $BOOTFS instead of $root. This makes sure
that the 'zfs:' prefix remains stripped so that users with
'root=zfs:dataset' cmdline can have key loaded on boot again.
Reviewed-by: Garrett Fields <ghfields@gmail.com> Reviewed-by: Dacian Reece-Stremtan <dacianstremtan@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Hiếu Lê <leorize+oss@disroot.org>
Closes #9520
Tom Caputi [Wed, 30 Oct 2019 18:27:28 +0000 (14:27 -0400)]
Fix 'zfs change-key' with unencrypted child
Currently, when you call 'zfs change-key' on an encrypted dataset
that has an unencrypted child, the code will trigger a VERIFY.
This VERIFY is leftover from before we allowed unencrypted
datasets to exist underneath encrypted ones. This patch fixes the
issue by simply replacing the VERIFY with an early return when
recursing through datasets.
Reviewed by: Jason King <jason.brian.king@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9524
Matthew Macy [Mon, 28 Oct 2019 16:51:53 +0000 (09:51 -0700)]
Minor spa portability fixes
- FreeBSD's rootpool import code uses spa_config_parse
- Move the zvol_create_minors call out from under the
spa_namespace_lock in spa_import. It isn't needed and it causes
a lock order reversal on FreeBSD.
Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9499
Chunwei Chen [Mon, 28 Oct 2019 16:49:44 +0000 (09:49 -0700)]
Fix zpool history unbounded memory usage
In original implementation, zpool history will read the whole history
before printing anything, causing memory usage goes unbounded. We fix
this by breaking it into read-print iterations.
Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #9516
loli10K [Sat, 26 Oct 2019 22:22:19 +0000 (00:22 +0200)]
Fix for ARC sysctls ignored at runtime
This change leverage module_param_call() to run arc_tuning_update()
immediately after the ARC tunable has been updated as suggested in cffa8372 code review.
A simple test case is added to the ZFS Test Suite to prevent future
regressions in functionality.
Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9487
Closes #9489
Matthew Macy [Sat, 26 Oct 2019 17:07:59 +0000 (10:07 -0700)]
Remove unneeded header from libzpool/kernel.c
The sys/signal.h header doesn't exist on FreeBSD, nor is
it needed on Linux.
Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9510
Ryan Moeller [Sat, 26 Oct 2019 03:03:46 +0000 (23:03 -0400)]
ZTS: Move more tests to linux.run
Tests that rely on special filesystems that are specific to Linux
should only be run on Linux.
Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9512
Tom Caputi [Thu, 24 Oct 2019 17:51:01 +0000 (13:51 -0400)]
Fix incremental recursive encrypted receive
Currently, incremental recursive encrypted receives fail to work
for any snapshot after the first. The reason for this is because
the check in zfs_setup_cmdline_props() did not properly realize
that when the user attempts to use '-x encryption' in this
situation, they are not really overriding the existing encryption
property and instead are attempting to prevent it from changing.
This resulted in an error message stating: "encryption property
'encryption' cannot be set or excluded for raw or incremental
streams".
This problem is fixed by updating the logic to expect this use
case.
Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9494
Brian Behlendorf [Thu, 24 Oct 2019 17:17:33 +0000 (10:17 -0700)]
Linux 4.14, 4.19, 5.0+ compat: SIMD save/restore
Contrary to initial testing we cannot rely on these kernels to
invalidate the per-cpu FPU state and restore the FPU registers.
Nor can we guarantee that the kernel won't modify the FPU state
which we saved in the task struck.
Therefore, the kfpu_begin() and kfpu_end() functions have been
updated to save and restore the FPU state using our own dedicated
per-cpu FPU state variables.
This has the additional advantage of allowing us to use the FPU
again in user threads. So we remove the code which was added to
use task queues to ensure some functions ran in kernel threads.
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #9346
Closes #9403
chrisrd [Wed, 23 Oct 2019 22:26:17 +0000 (09:26 +1100)]
Don't call arc_buf_destroy on unallocated arc_buf
Fixes an obvious issue of calling arc_buf_destroy() on an
unallocated arc_buf.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes #9453
Matthew Macy [Wed, 23 Oct 2019 20:48:31 +0000 (13:48 -0700)]
Use platform independent error code for libzfs_run_process_impl
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9493
Matthew Macy [Mon, 21 Oct 2019 03:37:30 +0000 (20:37 -0700)]
Use correct format string when printing int8
Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9486
Matthew Macy [Sun, 20 Oct 2019 00:08:19 +0000 (17:08 -0700)]
Remove gratuitous Linux only include in ztest & zdb
We don't need to include stdio_ext.h
Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9483
With 4k disks, this test will fail in the last section because the
expected human readable value of 20.0M is reported as 20.1M. Rather than
use the human readable property, switch to the parsable property and
verify that the values are reasonably close.
Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #9477
Giving a name to this enum makes it discoverable from
debugging tools like DRGN and SDB. For example, with
the name proposed on this patch we can iterate over
these values in DRGN:
```
>>> prog.type('enum kmc_bit').enumerators
(('KMC_BIT_NOTOUCH', 0), ('KMC_BIT_NODEBUG', 1),
('KMC_BIT_NOMAGAZINE', 2), ('KMC_BIT_NOHASH', 3),
('KMC_BIT_QCACHE', 4), ('KMC_BIT_KMEM', 5),
('KMC_BIT_VMEM', 6), ('KMC_BIT_SLAB', 7),
...
```
This enables SDB to easily pretty-print the flags of
the spl_kmem_caches in the system like this:
```
> spl_kmem_caches -o "name,flags,total_memory"
name flags total_memory
------------------------ ----------------------- ------------
abd_t KMC_NOMAGAZINE|KMC_SLAB 4.5MB
arc_buf_hdr_t_full KMC_NOMAGAZINE|KMC_SLAB 12.3MB
... <cropped> ...
ddt_cache KMC_VMEM 583.7KB
ddt_entry_cache KMC_NOMAGAZINE|KMC_SLAB 0.0B
... <cropped> ...
zio_buf_1048576 KMC_NODEBUG|KMC_VMEM 0.0B
... <cropped> ...
```
Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9478
Update skc_obj_alloc for spl kmem caches that are backed by Linux
Currently, for certain sizes and classes of allocations we use
SPL caches that are backed by caches in the Linux Slab allocator
to reduce fragmentation and increase utilization of memory. The
way things are implemented for these caches as of now though is
that we don't keep any statistics of the allocations that we
make from these caches.
This patch enables the tracking of allocated objects in those
SPL caches by making the trade-off of grabbing the cache lock
at every object allocation and free to update the respective
counter.
Additionally, this patch makes those caches visible in the
/proc/spl/kmem/slab special file.
As a side note, enabling the specific counter for those caches
enables SDB to create a more user-friendly interface than
/proc/spl/kmem/slab that can also cross-reference data from
slabinfo. Here is for example the output of one of those
caches in SDB that outputs the name of the underlying Linux
cache, the memory of SPL objects allocated in that cache,
and the percentage of those objects compared to all the
objects in it:
```
> spl_kmem_caches | filter obj.skc_name == "zio_buf_512" | pp
name ... source total_memory util
----------- ... ----------------- ------------ ----
zio_buf_512 ... kmalloc-512[SLUB] 16.9MB 8
```
Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9474
Ryan Moeller [Thu, 17 Oct 2019 02:19:48 +0000 (22:19 -0400)]
Detect if sed supports --in-place
Not all versions of sed have the --in-place flag. Detect support for
the flag during ./configure and provide a fallback mechanism for those
systems where sed's behavior differs. The autoconf variable
${ac_inplace} can be used to choose the correct flags for editing a
file in place with sed.
Replace violating usages in Makefile.am with ${ac_inplace}.
Reviewed-by: Chris Dunlop <chris@onthe.net.au> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9463
Matthew Macy [Thu, 17 Oct 2019 01:43:52 +0000 (18:43 -0700)]
Make zfsdev_getminor signature cross platform
Only pass the file descriptor to make zfsdev_get_miror() portable.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9466
Matthew Macy [Thu, 17 Oct 2019 01:37:31 +0000 (18:37 -0700)]
Move linux specific mmp module_param_call handler to platform code
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9465
Matthew Macy [Thu, 17 Oct 2019 01:36:16 +0000 (18:36 -0700)]
Add default case to lua kernel code
Some platforms, e.g. FreeBSD, support user space setjmp
semantics in kernel.
Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9450
Matthew Macy [Thu, 17 Oct 2019 01:34:19 +0000 (18:34 -0700)]
Fix signature for private functions without header declarations
Clang will complain if a function has no prior declaration
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9467
Matthew Macy [Mon, 14 Oct 2019 02:25:19 +0000 (19:25 -0700)]
Remove dead code and cleanup scoping in dmu_send.c
This addresses a number of problems with dmu_send.c:
* bp_span is unused which makes clang complain
* dump_write conflicts with FreeBSD's existing core dump code
* range_alloc is private to the file and not declared in any headers
causing clang to complain
Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9432
Paul Dagnelie [Mon, 14 Oct 2019 02:21:51 +0000 (19:21 -0700)]
Don't call sizeof on void
We get the sizeof the appropriate type, and don't cast away const.
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9455
Matthew Macy [Mon, 14 Oct 2019 02:19:39 +0000 (19:19 -0700)]
Move zfs_onexit_fd_hold to platform code
FreeBSD has a very different implementation.
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9442
Matthew Macy [Mon, 14 Oct 2019 02:15:27 +0000 (19:15 -0700)]
Move zio_delay_interrupt to platform code
FreeBSD has its own implementation as do other platforms.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9439
Brian Behlendorf [Mon, 14 Oct 2019 02:13:26 +0000 (19:13 -0700)]
Modify sharenfs=on default behavior
While it may sometimes be convenient to export an NFS filesystem with
no_root_squash it should not be the default behavior. Align the
default behavior with the Linux NFS server defaults. To restore
the previous behavior use 'zfs set sharenfs="no_root_squash,..."'.
Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9397
Closes #9425
chrisrd [Fri, 11 Oct 2019 17:33:13 +0000 (04:33 +1100)]
Typo fix in comment: dso_dryrun
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes #9452
Brian Behlendorf [Fri, 11 Oct 2019 17:22:20 +0000 (10:22 -0700)]
ZTS: Fix zpool_status_-s
After commit 5e74ac51 which split and reordered the run files the
`zpool_status_-s` test began failing. The new ordering placed
the test after a previous test which used `zpool replace` to replace
a disk but did not clear its label. This resulted in the next test,
`zpool_status_-s`, failing because of the potentially active
pool being detected on the replaced vdev.
/dev/loop0 is part of potentially active pool 'testpool'
Use the default_mirror_setup_noexit() and default_cleanup_noexit()
functions to create the pool in `zpool_status_-s`. They use the -f
flag by default.
In the `scrub_after_resilver` test wipe the label during cleanup
to prevent future failures if the tests are again reordered.
Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9451
Paul Dagnelie [Fri, 11 Oct 2019 17:13:21 +0000 (10:13 -0700)]
Function name and comment updates
Rename certain functions for more consistency when they share common
features. Make comments clearer about what arguments should be passed
to the insert and add functions.
Reviewed by: Sara Hartse <sara.hartse@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <matt@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9441
Matthew Macy [Fri, 11 Oct 2019 17:10:20 +0000 (10:10 -0700)]
Remove linux/mod_compat.h from common code
It is no longer necessary; mod_compat.h is included from zfs_context.h.
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9449
Matthew Macy [Fri, 11 Oct 2019 17:06:18 +0000 (10:06 -0700)]
Expose dmu_buf_hold_array_by_dnode to platform code
FreeBSD uses this in its pager ops routines
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9431
Richard Yao [Wed, 9 Oct 2019 19:16:12 +0000 (12:16 -0700)]
Implement ZPOOL_IMPORT_UDEV_TIMEOUT_MS
Since 0.7.0, zpool import would unconditionally block on udev for 30
seconds. This introduced a regression in initramfs environments that
lack udev (particularly mdev based environments), yet use a zfs userland
tools intended for the system that had been built against udev. Gentoo's
genkernel is the main example, although custom user initramfs
environments would be similarly impacted unless special builds of the
ZFS userland utilities were done for them. Such environments already
have their own mechanisms for blocking until device nodes are ready
(such as genkernel's scandelay parameter), so it is unnecessary for
zpool import to block on a non-existent udev until a timeout is reached
inside of them.
Rather than trying to intelligently determine whether udev is available
on the system to avoid unnecessarily blocking in such environments, it
seems best to just allow the environment to override the timeout. I
propose that we add an environment variable called
ZPOOL_IMPORT_UDEV_TIMEOUT_MS. Setting it to 0 would restore the 0.6.x
behavior that was more desirable in mdev based initramfs environments.
This allows the system user land utilities to be reused when building
mdev-based initramfs archives.
Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org> Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #9436
Ryan Moeller [Fri, 11 Oct 2019 16:50:46 +0000 (12:50 -0400)]
Clarify loop variable name in zfs copies test
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9445
Ryan Moeller [Fri, 11 Oct 2019 16:49:48 +0000 (12:49 -0400)]
Fix some style nits in tests
Mostly whitespace changes, no functional changes intended.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9447
loli10K [Thu, 10 Oct 2019 23:39:41 +0000 (01:39 +0200)]
Fix pool creation with feature@allocation_classes disabled
When "feature@allocation_classes" is not enabled on the pool no vdev
with "special" or "dedup" allocation type should be allowed to exist in
the vdev tree.
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9427
Closes #9429
Matthew Macy [Thu, 10 Oct 2019 22:59:34 +0000 (15:59 -0700)]
Move get_temporary_prop to platform code
Temporary property handling at the VFS layer requires
platform specific code.
Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9401
Matthew Macy [Thu, 10 Oct 2019 22:39:44 +0000 (15:39 -0700)]
Make `zil_async_to_sync` visible to platform code
FreeBSD's zvol platform code requires access to the
zil_async_to_sync() function.
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9440
Brian Behlendorf [Thu, 10 Oct 2019 16:49:45 +0000 (09:49 -0700)]
Update `zfs program` command usage
Update the zfs(8) man page to clearly describe that arguments for
channel programs are to be listed after the -- sentinel which
terminates argument processing. This behavior is supported by
getopt on Linux, FreeBSD, and Illumos according to each platforms
respective man pages.
zfs program [-jn] [-t instruction-limit] [-m memory-limit]
pool script [--] arg1 ...
Reviewed-by: Clint Armstrong <clint@clintarmstrong.net> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9056
Closes #9428
Matthew Macy [Thu, 10 Oct 2019 16:45:38 +0000 (09:45 -0700)]
Make clang happy with vdev_raidz_ code
The macros are used to generate code for conditions without a
corresponding branch. This is not a problem in practice, but
clang has no way of knowing that. Add a default branch with a
VERIFY(0) to indicate that it "can't happen"
```
In file included from \
/usr/home/mmacy/devel/ZoF/module/zfs/vdev_raidz_math_sse2.c:607:
/usr/home/mmacy/devel/ZoF/module/zfs/vdev_raidz_math_impl.h:281:3: \
error: no case matching constant switch condition '3' [-Werror]
```
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9434
Ryan Moeller [Wed, 9 Oct 2019 17:39:26 +0000 (13:39 -0400)]
Move platform independent tests to a shared runfile
Tests that aren't limited to running on Linux can be moved to a common
runfile to be shared with other platforms.
The test runner and wrapper script are enhanced to allow specifying
multiple runfiles as a comma-separated list. The default runfiles are
now "common.run,PLATFORM.run" where PLATFORM is determined at run time.
Sections in runfiles that share a path with another runfile can append
a colon separator and an identifier to the path in the section
name, ie `[tests/functional/atime:Linux]`, to avoid overriding the tests
specified by other runfiles.
Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9391
Paul Dagnelie [Wed, 9 Oct 2019 17:36:03 +0000 (10:36 -0700)]
Reduce loaded range tree memory usage
This patch implements a new tree structure for ZFS, and uses it to
store range trees more efficiently.
The new structure is approximately a B-tree, though there are some
small differences from the usual characterizations. The tree has core
nodes and leaf nodes; each contain data elements, which the elements
in the core nodes acting as separators between its children. The
difference between core and leaf nodes is that the core nodes have an
array of children, while leaf nodes don't. Every node in the tree may
be only partially full; in most cases, they are all at least 50% full
(in terms of element count) except for the root node, which can be
less full. Underfull nodes will steal from their neighbors or merge to
remain full enough, while overfull nodes will split in two. The data
elements are contained in tree-controlled buffers; they are copied
into these on insertion, and overwritten on deletion. This means that
the elements are not independently allocated, which reduces overhead,
but also means they can't be shared between trees (and also that
pointers to them are only valid until a side-effectful tree operation
occurs). The overhead varies based on how dense the tree is, but is
usually on the order of about 50% of the element size; the per-node
overheads are very small, and so don't make a significant difference.
The trees can accept arbitrary records; they accept a size and a
comparator to allow them to be used for a variety of purposes.
The new trees replace the AVL trees used in the range trees today.
Currently, the range_seg_t structure contains three 8 byte integers
of payload and two 24 byte avl_tree_node_ts to handle its storage in
both an offset-sorted tree and a size-sorted tree (total size: 64
bytes). In the new model, the range seg structures are usually two 4
byte integers, but a separate one needs to exist for the size-sorted
and offset-sorted tree. Between the raw size, the 50% overhead, and
the double storage, the new btrees are expected to use 8*1.5*2 = 24
bytes per record, or 33.3% as much memory as the AVL trees (this is
for the purposes of storing metaslab range trees; for other purposes,
like scrubs, they use ~50% as much memory).
We reduced the size of the payload in the range segments by teaching
range trees about starting offsets and shifts; since metaslabs have a
fixed starting offset, and they all operate in terms of disk sectors,
we can store the ranges using 4-byte integers as long as the size of
the metaslab divided by the sector size is less than 2^32. For 512-byte
sectors, this is a 2^41 (or 2TB) metaslab, which with the default
settings corresponds to a 256PB disk. 4k sector disks can handle
metaslabs up to 2^46 bytes, or 2^63 byte disks. Since we do not
anticipate disks of this size in the near future, there should be
almost no cases where metaslabs need 64-byte integers to store their
ranges. We do still have the capability to store 64-byte integer ranges
to account for cases where we are storing per-vdev (or per-dnode) trees,
which could reasonably go above the limits discussed. We also do not
store fill information in the compact version of the node, since it
is only used for sorted scrub.
We also optimized the metaslab loading process in various other ways
to offset some inefficiencies in the btree model. While individual
operations (find, insert, remove_from) are faster for the btree than
they are for the avl tree, remove usually requires a find operation,
while in the AVL tree model the element itself suffices. Some clever
changes actually caused an overall speedup in metaslab loading; we use
approximately 40% less cpu to load metaslabs in our tests on Illumos.
Another memory and performance optimization was achieved by changing
what is stored in the size-sorted trees. When a disk is heavily
fragmented, the df algorithm used by default in ZFS will almost always
find a number of small regions in its initial cursor-based search; it
will usually only fall back to the size-sorted tree to find larger
regions. If we increase the size of the cursor-based search slightly,
and don't store segments that are smaller than a tunable size floor
in the size-sorted tree, we can further cut memory usage down to
below 20% of what the AVL trees store. This also results in further
reductions in CPU time spent loading metaslabs.
The 16KiB size floor was chosen because it results in substantial memory
usage reduction while not usually resulting in situations where we can't
find an appropriate chunk with the cursor and are forced to use an
oversized chunk from the size-sorted tree. In addition, even if we do
have to use an oversized chunk from the size-sorted tree, the chunk
would be too small to use for ZIL allocations, so it isn't as big of a
loss as it might otherwise be. And often, more small allocations will
follow the initial one, and the cursor search will now find the
remainder of the chunk we didn't use all of and use it for subsequent
allocations. Practical testing has shown little or no change in
fragmentation as a result of this change.
If the size-sorted tree becomes empty while the offset sorted one still
has entries, it will load all the entries from the offset sorted tree
and disregard the size floor until it is unloaded again. This operation
occurs rarely with the default setting, only on incredibly thoroughly
fragmented pools.
There are some other small changes to zdb to teach it to handle btrees,
but nothing major.
Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed by: Sebastien Roy seb@delphix.com Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9181
George Melikov [Tue, 8 Oct 2019 17:10:23 +0000 (20:10 +0300)]
module/Makefile.in: don't run xargs if empty
If stdin if empty - don't run xargs command,
otherwise we can get `cp: missing file operand`
error.
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #9418
ZTS: Fix trim/trim_config and trim/autotrim_config
There have been occasional CI failures which occur when the trimmed
vdev size exactly matches the target size. Resolve this by slightly
relaxing the conditional and checking for -ge rather than -gt. In
all of the cases observer, the values match exactly. For example:
Failure /mnt/trim-vdev1 is 768 MB which is not -gt than 768 MB
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9399
Commit 093bb64 resolved an automount failures for chroot'd processes
but inadvertently broke automounting for root filesystems where the
vfs_mntpoint is NULL. Resolve the issue by checking for NULL in order
to generate the correct path.
Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9381
Closes #9384
Matthew Macy [Thu, 3 Oct 2019 22:54:29 +0000 (15:54 -0700)]
Rename rangelock_ functions to zfs_rangelock_
A rangelock KPI already exists on FreeBSD. Add a zfs_ prefix as
per our convention to prevent any conflict with existing symbols.
Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9402
Tony Nguyen [Thu, 3 Oct 2019 22:33:38 +0000 (16:33 -0600)]
dbuf_hold_impl() cleanup to improve cached read performance
Currently every dbuf_hold_impl() incurs kmem_alloc() and kmem_free()
which can be costly for cached read performance.
This change reverts the dbuf_hold_impl() fix stack commit, i.e. fc5bb51f08a6c91ff9ad3559d0266eeeab0b1f61 to eliminate the extra
kmem_alloc() and kmem_free() operations and improve cached read
performance. With the change, each dbuf_hold_impl() frame uses 40 bytes
more, total of 800 for 20 recursive levels. Linux kernel stack sizes are
8K and 16K for 32bit and 64bit, respectively, so stack overrun risk is
limited.
Performance observations on 8K recordsize filesystem:
- 8/128/1024K at 1-128 sequential cached read, ~3% improvement
Testing done on Ubuntu 18.04 with 4.15 kernel, 8vCPUs and SSD storage on
VMware ESX.
Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #9351
Matthew Macy [Thu, 3 Oct 2019 17:20:44 +0000 (10:20 -0700)]
OpenZFS restructuring - libzutil
Factor Linux specific functionality out of libzutil.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9356
Update cleanup_upgrade to use destroy_dataset and destroy_pool
when performing cleanup. These wrappers retry if the pool is busy
preventing occasional failures like those observed when running
tests upgrade_readonly_pool. For example:
SUCCESS: test enabled == enabled
User accounting upgrade is not executed on readonly pool
NOTE: Performing local cleanup via log_onexit (cleanup_upgrade)
cannot destroy 'testpool': pool is busy
ERROR: zpool destroy testpool exited 1
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9400
Didier Roche [Wed, 2 Oct 2019 17:51:55 +0000 (19:51 +0200)]
Workaround to avoid a race when /var/lib is a persistent dataset
If /var/lib is a dataset not under <pool>/ROOT/<root_dataset>, as
proposed in the ubuntu root on zfs upstream guide
(https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS),
we end up with a race where some services, like systemd-random-seed
are writing under /var/lib, while zfs-mount is called. zfs mount will
then potentially fail because of /var/lib isn't empty and so, can't be
mounted.
Order those 2 units for now (more may be needed) as we can't declare
virtually a provide mount point to match
"RequiresMountsFor=/var/lib/systemd/random-seed" from
systemd-random-seed.service.
The optional generator for zfs 0.8 fixes it, but it's not enabled
by default nor necessarily required.
Both zfs-mount.service and systemd-random-seed.service are starting
After=systemd-remount-fs.service. zfs-mount.service should be done
before local-fs.target while systemd-random-seed.service should finish
before sysinit.target (which is a later target).
Ideally, we would have a way for zfs mount -a unit to declare all paths
or move systemd-random-seed after local-fs.target.
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Didier Roche <didrocks@ubuntu.com>
Closes #9360
Matthew Macy [Wed, 2 Oct 2019 17:39:48 +0000 (10:39 -0700)]
OpenZFS restructuring - libspl
Factor Linux specific pieces out of libspl.
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9336
Matthew Macy [Tue, 1 Oct 2019 23:35:05 +0000 (16:35 -0700)]
OpenZFS restructuring - arc_stats
Make arc_stats visible to platform code.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9386
dacianstremtan [Tue, 1 Oct 2019 19:54:27 +0000 (15:54 -0400)]
Fix for zfs-dracut regression
Line 31 and 32 overwrote the ${root} variable which broke mount-zfs.sh
We have create a new variable for the dataset instead of overwriting the
${root} variable in zfs-load-key.sh${root} variable in zfs-load-key.sh
Reduce the time required for ./configure to perform the needed
KABI checks by allowing kbuild to compile multiple test cases in
parallel. This was accomplished by splitting each test's source
code from the logic handling whether that code could be compiled
or not.
By introducing this split it's possible to minimize the number of
times kbuild needs to be invoked. As importantly, it means all of
the tests can be built in parallel. This does require a little extra
care since we expect some tests to fail, so the --keep-going (-k)
option must be provided otherwise some tests may not get compiled.
Furthermore, since a failure during the kbuild modpost phase will
result in an early exit; the final linking phase is limited to tests
which passed the initial compilation and produced an object file.
Once everything has been built the configure script proceeds as
previously. The only significant difference is that it now merely
needs to test for the existence of a .ko file to determine the
result of a given test. This vastly speeds up the entire process.
New test cases should use ZFS_LINUX_TEST_SRC to declare their test
source code and ZFS_LINUX_TEST_RESULT to check the result. All of
the existing kernel-*.m4 files have been updated accordingly, see
config/kernel-current-time.m4 for a basic example. The legacy
ZFS_LINUX_TRY_COMPILE macro has been kept to handle special cases
but it's use is not encouraged.
master (secs) patched (secs)
------------- ----------------
autogen.sh 61 68
configure 137 24 (~17% of current run time)
make -j $(nproc) 44 44
make rpms 287 150
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8547
Closes #9132
Closes #9341
Prakash Surya [Tue, 1 Oct 2019 19:33:12 +0000 (12:33 -0700)]
Timeout waiting for ZVOL device to be created
We've seen cases where after creating a ZVOL, the ZVOL device node in
"/dev" isn't generated after 20 seconds of waiting, which is the point
at which our applications gives up on waiting and reports an error.
The workload when this occurs is to "refresh" 400+ ZVOLs roughly at the
same time, based on a policy set by the user. This refresh operation
will destroy the ZVOL, and re-create it based on a snapshot.
When this occurs, we see many hundreds of entries on the "z_zvol" taskq
(based on inspection of the /proc/spl/taskq-all file). Many of the
entries on the taskq end up in the "zvol_remove_minors_impl" function,
and I've measured the latency of that function:
That data is from a 10 second sample, using the BCC "funclatency" tool.
As we can see, in this 10 second sample, most calls took 128ms at a
minimum. Thus, some basic math tells us that in any 20 second interval,
we could only process at most about 150 removals, which is much less
than the 400+ that'll occur based on the workload.
As a result of this, and since all ZVOL minor operations will go through
the single threaded "z_zvol" taskq, the latency for creating a single
ZVOL device can be unreasonably large due to other ZVOL activity on the
system. In our case, it's large enough to cause the application to
generate an error and fail the operation.
When profiling the "zvol_remove_minors_impl" function, I saw that most
of the time in the function was spent off-cpu, blocked in the function
"taskq_wait_outstanding". How this works, is "zvol_remove_minors_impl"
will dispatch calls to "zvol_free" using the "system_taskq", and then
the "taskq_wait_outstanding" function is used to wait for all of those
dispatched calls to occur before "zvol_remove_minors_impl" will return.
As far as I can tell, "zvol_remove_minors_impl" doesn't necessarily have
to wait for all calls to "zvol_free" to occur before it returns. Thus,
this change removes the call to "taskq_wait_oustanding", so that calls
to "zvol_free" don't affect the latency of "zvol_remove_minors_impl".
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Gallagher <john.gallagher@delphix.com> Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #9380
Matthew Macy [Mon, 30 Sep 2019 19:16:06 +0000 (12:16 -0700)]
OpenZFS restructuring - zpool
Factor Linux specific functions out of the zpool command.
Reviewed-by: Allan Jude <allanjude@freebsd.org> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9333
Matthew Macy [Fri, 27 Sep 2019 17:46:28 +0000 (10:46 -0700)]
OpenZFS restructuring - zfs_ioctl
Refactor the zfs ioctls in to platform dependent and independent bits.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Signed-off-by: Matthew Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9301
Ben McGough [Thu, 26 Sep 2019 16:52:10 +0000 (09:52 -0700)]
Adding slack notifier
Allow ZED notification via slack incoming webhook.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: Ben McGough <bmcgough@fredhutch.org>
Closes #9076
Closes #9350
Tom Caputi [Thu, 26 Sep 2019 00:02:33 +0000 (20:02 -0400)]
Fix encryption hierarchy issues with zfs recv -d
Currently, the recv_fix_encryption_hierarchy() function accepts
'destsnap' as one of its parameters. Originally, this was intended
to be the top-level dataset of a receive (whether or not the
receive was recursive). Unfortunately, this parameter actually is
simply the input that is passed in from the command line. When
the user specifies 'zfs recv -d', this string is actually only the
name of the receiving pool since the rest of the name is derived
from the send stream. This causes the function to fail, leaving
some datasets with an invalid encryption hierarchy.
This patch resolves this problem by passing in the top_zfs variable
instead. In order to make this work, this patch also includes some
changes that ensure the value is always present when we need it.
Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9273
Closes #9309
Ryan Moeller [Wed, 25 Sep 2019 18:42:04 +0000 (14:42 -0400)]
ZTS: Fix typos in zfs_destroy tests cleanup
lot_must -> log_must
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Sara Hartse <sara.hartse@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9362
Brian Behlendorf [Wed, 25 Sep 2019 16:24:45 +0000 (09:24 -0700)]
ZTS: harden xattr/cleanup.ksh
When the xattr/cleanup.ksh script is unable to remove the test group
due to an active process then it will not call default_cleanup. This
will result in a zvol_ENOSPC/setup failure when attempting to create
the /mnt/testdir directory which will already exist.
Resolve the issue by performing the default_cleanup before removing
the test user and group to ensure this step always happens. Also
allow one more retry to further minimize the likelihood of the
cleanup failing.
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9358
Brian Behlendorf [Wed, 25 Sep 2019 16:23:29 +0000 (09:23 -0700)]
Add warning for zfs_vdev_elevator option removal
Originally the zfs_vdev_elevator module option was added as a
convenience so the requested elevator would be automatically set
on the underlying block devices. At the time this was simple
because the kernel provided an API function which did exactly this.
This API was then removed in the Linux 4.12 kernel which prompted
us to add compatibly code to set the elevator via a usermodehelper.
While well intentioned this introduced a bug which could cause a
system hang, that issue was subsequently fixed by commit 2a0d4188.
In order to avoid future bugs in this area, and to simplify the code,
this functionality is being deprecated. A console warning has been
added to notify any existing consumers and the documentation updated
accordingly. This option will remain for the lifetime of the 0.8.x
series for compatibility but if planned to be phased out of master.
Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #8664
Closes #9317
Trying to 'zfs diff' a snapshot with large dnodes will incorrectly try
to access its interior slots when dnodesize > sizeof(dnode_phys_t).
This is normally not an issue because the interior slots are
zero-filled, which report_dnode() handles calling
report_free_dnode_range(). However this is not the case for encrypted
large dnodes or filesystem using many SA based xattrs where the extra
data past the legacy dnode size boundary is interpreted as a
dnode_phys_t.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7678
Closes #8931
Closes #9343
Ryan Moeller [Sun, 22 Sep 2019 22:27:53 +0000 (18:27 -0400)]
Use signed types to prevent subtraction overflow
The difference between the sizes could be positive or negative. Leaving
the types as unsigned means the result overflows when the difference is
negative and removing the labs() means we'll have introduced a bug. The
subtraction results in the correct value when the unsigned integer is
interpreted as a signed integer by labs().
Clang doesn't see that we're doing a subtraction and abusing the types.
It sees the result of the subtraction, an unsigned value, being passed
to an absolute value function and emits a warning which we treat as an
error.
Reviewed by: Youzhong Yang <youzhong@gmail.com> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9355
Disabled resilver_defer feature leads to looping resilvers
When a disk is replaced with another on a pool with the resilver_defer
feature present, but not enabled the resilver activity restarts during
each spa_sync. This patch checks to make sure that the resilver_defer
feature is first enabled before requesting a deferred resilver.
This was originally fixed in illumos-joyent as OS-7982.
Reviewed-by: Chris Dunlop <chris@onthe.net.au> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com> Signed-off-by: Kody A Kantor <kody@kkantor.com>
External-issue: illumos-joyent OS-7982
Closes #9299
Closes #9338
Ryan Moeller [Wed, 18 Sep 2019 16:05:57 +0000 (12:05 -0400)]
Refactor libzfs_error_init newlines
Move the trailing newlines from the error message strings to the format
strings to more closely match the other error messages.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9330
The was incorrect with respect to swapping dataset IDs both in the
on-disk ZAP object and the in-memory queue.
In both cases, if ds1 was already present, then it would be first
replaced with ds2 and then ds would be replaced back with ds1.
Also, both cases did not properly handle a situation where both ds1 and
ds2 are already queued. A duplicate insertion would be attempted and
its failure would result in a panic.
Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #9140
Closes #9163