git.proxmox.com Git - mirror_zfs.git/log
3 years agoMake metaslab class rotor and aliquot per-allocator.
Alexander Motin [Tue, 15 Dec 2020 18:55:44 +0000 (13:55 -0500)]
Make metaslab class rotor and aliquot per-allocator.

Metaslab rotor and aliquot are used to distribute workload between
vdevs while keeping some locality for logically adjacent blocks.  Once
multiple allocators were introduced to separate allocation of different
objects, it no longer makes much sense for different allocators to write
into different metaslabs of the same metaslab group (vdev) at the same
time, competing for its resources.  This change makes each allocator
choose a metaslab group independently, colliding with others only
sporadically.

A test with simultaneous writes into 4 files with a recordsize of 4KB
on a striped pool of 30 disks on a system with 40 logical cores shows
a reduction of vdev queue lock contention from 54% to 27% due to better
load distribution.  Unfortunately it won't help ZVOLs much yet, since
only one dataset/ZVOL is synced at a time, and so for the most part
only one allocator is used, but it may improve later.

While there, to reduce the number of pointer dereferences, change
per-allocator storage for metaslab classes and groups from several
separate malloc()'s to variable length arrays at the ends of the
original class and group structures.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11288

3 years agoDKMS: Disable weak modules
gregory-lee-bartholomew [Tue, 15 Dec 2020 17:22:30 +0000 (11:22 -0600)]
DKMS: Disable weak modules

Fedora does not guarantee a stable kABI, so weak modules should be
disabled. See the dkms man page for a more detailed explanation of the
weak module feature.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #9891
Closes #11128
Closes #11242
Closes #11335

3 years agolua: avoid gcc -Wreturn-local-addr bug
Ryan Libby [Tue, 15 Dec 2020 17:20:48 +0000 (09:20 -0800)]
lua: avoid gcc -Wreturn-local-addr bug

Avoid a bug with gcc's -Wreturn-local-addr warning with some
obfuscation.  In buggy versions of gcc, if a return value is an
expression that involves the address of a local variable, and even if
that address is legally converted to a non-pointer type, a warning may
be emitted and the value of the address may be replaced with zero.
However, buggy versions don't emit the warning or replace the value
when simply returning a local variable of non-pointer type.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90737

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11337

3 years agospa: avoid type narrowing warning
Ryan Libby [Tue, 15 Dec 2020 17:20:06 +0000 (09:20 -0800)]
spa: avoid type narrowing warning

Building the spa module for i386 caused gcc to emit
-Wint-to-pointer-cast "cast to pointer from integer of different size"
because spa.spa_did was uint64_t but pthread_join (via thread_join in
spa_deactivate) takes a pointer (32-bit on i386).  Define spa_did to be
pointer-size instead.  For now spa_did is in fact never non-zero and the
thread_join could instead be ifdef'd out, but changing the size of
spa_did may be more useful for the future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11336

3 years agoFreeBSD libzfs: gcc requires __thread after static
Ryan Libby [Mon, 14 Dec 2020 17:28:24 +0000 (09:28 -0800)]
FreeBSD libzfs: gcc requires __thread after static

Building libzfs with gcc on FreeBSD failed because gcc is picky about
the order of keywords in declarations with __thread, whereas clang is
more relaxed.

https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11331

3 years agodmu_zfetch: fix memory leak
Matthew Macy [Sun, 13 Dec 2020 00:00:00 +0000 (16:00 -0800)]
dmu_zfetch: fix memory leak

The last change caused the read completion callback to not be called
if the IO was still in progress. This change restores allocation
of the arc buf callback, but in the callback path checks the new
acb_nobuf field to know to skip buffer allocation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11324

3 years agoFix reporting of CKSUM errors in indirect vdevs
George Amanakis [Fri, 11 Dec 2020 20:15:37 +0000 (21:15 +0100)]
Fix reporting of CKSUM errors in indirect vdevs

When removing and subsequently reattaching a vdev, CKSUM errors may
occur as vdev_indirect_read_all() reads from all children of a mirror
in case of a resilver.

Fix this by checking whether a child is missing the data and setting a
flag (ic_error) which is then checked in vdev_indirect_repair() and
suppresses incrementing the checksum counter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11277

3 years agoRemove draid.d symlink from zfs_helpers.sh
Brian Behlendorf [Fri, 11 Dec 2020 19:00:58 +0000 (11:00 -0800)]
Remove draid.d symlink from zfs_helpers.sh

In an earlier revision of dRAID there existed an /etc/zfs/draid.d
directory.  This was removed before the final version was integrated
but a little bit was accidentally overlooked in the zfs_helpers.sh
script.  Remove this remnant.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11326

3 years agoarc_summary3: Handle overflowing value width
Ryan Moeller [Tue, 8 Dec 2020 20:20:25 +0000 (20:20 +0000)]
arc_summary3: Handle overflowing value width

Some tunables shown by arc_summary3 have string values that may exceed
the normal line length, leaving a negative offset between the name and
value fields.  The negative space is of course not valid and Python
rightly barfs up an exception traceback.

Handle an overflowing value field width by ignoring the line length
and separating the name from the value by a single space instead.
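
The fallback described above can be sketched as follows. This is a minimal illustration, not the arc_summary3 code: the function name `format_line` and the default `line_length` of 72 are assumptions made for the example.

```python
def format_line(name: str, value: str, line_length: int = 72) -> str:
    """Pad between name and value so the line fills line_length.

    If the value is too long to fit, ignore the line length and
    separate name and value with a single space instead.
    """
    width = line_length - len(name) - len(value)
    if width < 1:
        # Overflowing value: a negative (or zero) gap is not usable.
        return f"{name} {value}"
    return name + " " * width + value
```

A short tunable keeps the aligned layout, while a long string value simply follows its name after one space.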

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11270

3 years agoFreeBSD: Implement sysctl for fletcher4 impl
Ryan Moeller [Wed, 2 Dec 2020 21:45:08 +0000 (21:45 +0000)]
FreeBSD: Implement sysctl for fletcher4 impl

There is a tunable to select the fletcher 4 checksum implementation on
Linux but it was not present in FreeBSD.

Implement the sysctl handler for FreeBSD and use ZFS_MODULE_PARAM_CALL
to provide the tunable on both platforms.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11270

3 years agoImprove zfs receive performance with lightweight write
Matthew Ahrens [Fri, 11 Dec 2020 18:26:02 +0000 (10:26 -0800)]
Improve zfs receive performance with lightweight write

The performance of `zfs receive` can be bottlenecked on the CPU consumed
by the `receive_writer` thread, especially when receiving streams with
small compressed block sizes.  Much of the CPU is spent creating and
destroying dbuf's and arc buf's, one for each `WRITE` record in the send
stream.

This commit introduces the concept of "lightweight writes", which allows
`zfs receive` to write to the DMU by providing an ABD, and instantiating
only a new type of `dbuf_dirty_record_t`.  The dbuf and arc buf for this
"dirty leaf block" are not instantiated.

Because there is no dbuf with the dirty data, this mechanism doesn't
support reading from "lightweight-dirty" blocks (they would see the
on-disk state rather than the dirty data).  Since the dedup-receive code
has been removed, `zfs receive` is write-only, so this works fine.

Because there are no arc bufs for the received data, the received data
is no longer cached in the ARC.

Testing a receive of a stream with average compressed block size of 4KB,
this commit improves performance by 50%, while also reducing CPU usage
by 50% of a CPU.  On a per-block basis, CPU consumed by receive_writer()
and dbuf_evict() is now 1/7th (14%) of what it was.

Baseline: 450MB/s, CPU in receive_writer() 40% + dbuf_evict() 35%
New: 670MB/s, CPU in receive_writer() 17% + dbuf_evict() 0%
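
The relationship between these figures can be checked with a little arithmetic. The sketch below uses only the numbers quoted above and assumes the average block size is the same in both runs, so that CPU per MB and CPU per block scale identically; the variable names are made up for the example.

```python
# Figures quoted in the commit message above.
baseline_mb_s, baseline_cpu_pct = 450, 40 + 35  # receive_writer() + dbuf_evict()
new_mb_s, new_cpu_pct = 670, 17 + 0

# Throughput gain as a fraction of the baseline.
speedup = new_mb_s / baseline_mb_s - 1

# CPU spent per unit of data, new relative to baseline.
# Assumption: equal average block size, so per-MB equals per-block.
cpu_ratio = (new_cpu_pct / new_mb_s) / (baseline_cpu_pct / baseline_mb_s)

print(f"throughput gain: {speedup:.0%}")
print(f"per-block CPU relative to baseline: {cpu_ratio:.2f}")
```

The ratio comes out near 0.15, which is in line with the "1/7th" figure quoted above.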

The code is also restructured in a few ways:

Added a `dr_dnode` field to the dbuf_dirty_record_t.  This simplifies
some existing code that no longer needs `DB_DNODE_ENTER()` and related
routines.  The new field is needed by the lightweight-type dirty record.

To ensure that the `dr_dnode` field remains valid until the dirty record
is freed, we have to ensure that `dnode_move()` doesn't relocate the
dnode_t.  To do this we keep a hold on the dnode until its zios have
completed.  This is already done by the user-accounting code
(`userquota_updates_task()`); this commit extends that so that it always
keeps the dnode hold until zio completion (see `dnode_rele_task()`).

`dn_dirty_txg` was previously zeroed when the dnode was synced.  This
was not necessary, since its meaning can be "when was this dnode last
dirtied".  This change simplifies the new `dnode_rele_task()` code.

Removed some dead code related to `DRR_WRITE_BYREF` (dedup receive).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11105

3 years agoFix kernel panic induced by redacted send
Paul Dagnelie [Fri, 11 Dec 2020 18:22:29 +0000 (10:22 -0800)]
Fix kernel panic induced by redacted send

In the redaction list traversal code, there is a bug in the binary search
logic when looking for the resume point. Maxbufid can be decremented to -1,
causing us to read the last possible block of the object instead of the one we
wanted. This can cause incorrect resume behavior, or possibly even a hang in
some cases. In addition, when examining non-last blocks, we can treat the
block as being the same size as the last block, causing us to miss entries in
the redaction list when determining where to resume. Finally, we were ignoring
the case where the resume point was found in the buffer being searched, and
resuming from minbufid. All these issues have been corrected, and the code has
been significantly simplified to make future issues less likely.
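
As a rough illustration of the kind of search involved, the sketch below looks for the block that contains a resume point in a list of sorted blocks. It is not the redaction list code: `find_resume_block`, the list-of-lists layout, and the integer keys are all assumptions made for the example. It uses a half-open range so the upper bound never drops below zero, and it reads each block's own last entry instead of assuming every block has the size of the last one.

```python
def find_resume_block(blocks, resume_key):
    """Return the index of the first block whose last entry is >= resume_key.

    blocks is a list of non-empty, sorted lists whose entries increase
    from one block to the next. Returns len(blocks) if every entry is
    smaller than resume_key.
    """
    lo, hi = 0, len(blocks)  # half-open range [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        # Use this block's own last entry; blocks may differ in size.
        if blocks[mid][-1] < resume_key:
            lo = mid + 1
        else:
            hi = mid
    return lo
```

With `blocks = [[1, 3, 5], [7, 9, 11], [13]]`, a resume key of 9 resolves to the middle block, and a key beyond the last entry resolves to `len(blocks)` rather than wrapping around.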

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11297

3 years agoFreeBSD: Fix format of vfs.zfs.arc_no_grow_shift
Ryan Moeller [Tue, 8 Dec 2020 17:21:36 +0000 (17:21 +0000)]
FreeBSD: Fix format of vfs.zfs.arc_no_grow_shift

vfs.zfs.arc_no_grow_shift has an invalid type (15) and this causes
py-sysctl to format it as a bytearray when it should be an integer.

"U" is not a valid format, it should be "I" and the type should match
the variable type, int.  We can return EINVAL if the value is set below
zero.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11318

3 years agoFreeBSD: Update usage of py-sysctl
Ryan Moeller [Tue, 8 Dec 2020 17:02:16 +0000 (17:02 +0000)]
FreeBSD: Update usage of py-sysctl

py-sysctl now includes the CTLTYPE_NODE type nodes in the list returned
by sysctl.filter() on FreeBSD head.  It also provides descriptions now.

Eliminate the subprocess call to get descriptions, and filter out the
nodes so we only deal with values.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11318

3 years agoFix possibly uninitialized 'root_inode' variable warning
Brian Behlendorf [Thu, 10 Dec 2020 23:23:26 +0000 (15:23 -0800)]
Fix possibly uninitialized 'root_inode' variable warning

Resolve an uninitialized variable warning when compiling.

    In function ‘zfs_domount’:
    warning: ‘root_inode’ may be used uninitialized in this
        function [-Wmaybe-uninitialized]
    sb->s_root = d_make_root(root_inode);

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11306

3 years agoImplement memory and CPU hotplug
Paul Dagnelie [Thu, 10 Dec 2020 22:09:23 +0000 (14:09 -0800)]
Implement memory and CPU hotplug

ZFS currently doesn't react to hotplugging cpu or memory into the
system in any way. This patch changes that by adding logic to the ARC
that allows the system to take advantage of new memory that is added
for caching purposes. It also adds logic to the taskq infrastructure
to support dynamically expanding the number of threads allocated to a
taskq.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Matthew Ahrens <matthew.ahrens@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11212

3 years agoCI: add zloop workflow
Brian Behlendorf [Thu, 10 Dec 2020 18:55:53 +0000 (10:55 -0800)]
CI: add zloop workflow

Run ztest via zloop for 20 minutes, total run time is ~30 minutes.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11319

3 years agoCI: add zloop workflow
George Melikov [Tue, 8 Dec 2020 18:40:44 +0000 (21:40 +0300)]
CI: add zloop workflow

Run ztest via zloop for 20 minutes, total run time is ~30 minutes.

Signed-off-by: George Melikov <mail@gmelikov.ru>

3 years agoFreeBSD: Do zcommon_init sooner to avoid FPU panic
Ryan Moeller [Thu, 10 Dec 2020 05:29:00 +0000 (00:29 -0500)]
FreeBSD: Do zcommon_init sooner to avoid FPU panic

There has been a panic affecting some system configurations where the
thread FPU context is disturbed during the fletcher 4 benchmarks,
leading to a panic at boot.

module_init() registers zcommon_init to run in the last subsystem
(SI_SUB_LAST).  Running it as soon as interrupts have been configured
(SI_SUB_INT_CONFIG_HOOKS) makes sure we have finished the benchmarks
before we start doing other things.

While it's not clear *how* the FPU context was being disturbed, this
does seem to avoid it.

Add a module_init_early() macro to run zcommon_init() at this earlier
point on FreeBSD.  On Linux this is defined as module_init().

Authored by: Konstantin Belousov <kib@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11302

3 years agoZTS: three small follow up fixes for #11167
Attila Fülöp [Thu, 10 Dec 2020 05:27:12 +0000 (06:27 +0100)]
ZTS: three small follow up fixes for #11167

Follow up fix for 0cb40fa3. Remove unused variables, don't source
unused libs and add missed cleanup.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11311

3 years agomount_zfs: print strerror instead of errno for error reporting
Érico Nogueira Rolim [Thu, 10 Dec 2020 05:24:59 +0000 (02:24 -0300)]
mount_zfs: print strerror instead of errno for error reporting

Tracking down an error message with the errno value can be difficult;
using strerror makes the error message clearer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11303

3 years agoDrop path prefix workaround
sterlingjensen [Thu, 10 Dec 2020 05:24:26 +0000 (23:24 -0600)]
Drop path prefix workaround

Canonicalization, the source of the trouble, was disabled in 9000a9f.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11295

3 years agoDelete rw_semaphore.wait_lock configure check
Orivej Desh [Thu, 10 Dec 2020 05:22:54 +0000 (05:22 +0000)]
Delete rw_semaphore.wait_lock configure check

Last use of wait_lock was removed in "Linux 5.3 compat: retire
rw_tryupgrade()" (e7a99dab2b065ac2f8736a65d1b226d21754d771).

Fixes the issue reported in
https://github.com/openzfs/zfs/issues/11097#issuecomment-714532367

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Orivej Desh <orivej@gmx.fr>
Closes #11309

3 years agoDecouple arc_read_done callback from arc buf instantiation
Matthew Macy [Wed, 9 Dec 2020 23:05:06 +0000 (15:05 -0800)]
Decouple arc_read_done callback from arc buf instantiation

Add ARC_FLAG_NO_BUF to indicate that a buffer need not be
instantiated.  This fixes a ~20% performance regression on
cached reads due to zfetch changes.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11220
Closes #11232

3 years agoFix optional "force" arg handling in zfs_ioc_pool_sync()
Brian Behlendorf [Wed, 9 Dec 2020 22:52:45 +0000 (14:52 -0800)]
Fix optional "force" arg handling in zfs_ioc_pool_sync()

The fnvlist_lookup_boolean_value() function should not be used
to check the force argument since it's optional.  It may not be
provided or may have been created with the wrong flags.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11281
Closes #11284

3 years agoCI: add new zfs-tests-sanity workflow
George Melikov [Tue, 8 Dec 2020 17:53:45 +0000 (20:53 +0300)]
CI: add new zfs-tests-sanity workflow

Run zfs-tests with sanity.run for brief results.  Timeouts
are rare, so minimize false positives by increasing the
default from 60 to 180 seconds.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11304

3 years agoZTS: zpool_trim tests throttle trim process
George Melikov [Mon, 7 Dec 2020 18:06:10 +0000 (21:06 +0300)]
ZTS: zpool_trim tests throttle trim process

Otherwise trim may finish before progress checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11296

3 years agoReduce fletcher4 and raidz benchmark times
Brian Behlendorf [Sun, 6 Dec 2020 17:57:20 +0000 (09:57 -0800)]
Reduce fletcher4 and raidz benchmark times

During module load time all of the available fletcher4 and raidz
implementations are benchmarked for a fixed amount of time to
determine the fastest available.  Manual testing has shown that this
time can be significantly reduced with negligible effect on the final
results.

This commit changes the benchmark time to 1ms which can reduce the
module load time by over a second on x86_64.  On an x86_64 system
with sse3, ssse3, and avx2 instructions the benchmark times are:

    Fletcher4    603ms   -> 15ms
    RAIDZ        1,322ms -> 64ms
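
The "over a second" claim follows directly from the two rows above. A quick check, using only the quoted figures (the dictionary names are illustrative):

```python
# Benchmark times in milliseconds, as quoted above.
before_ms = {"fletcher4": 603, "raidz": 1322}
after_ms = {"fletcher4": 15, "raidz": 64}

# Total module load time saved by shortening the benchmarks.
saved_ms = sum(before_ms.values()) - sum(after_ms.values())
print(f"saved: {saved_ms} ms")
```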

Reviewed-by: Matthew Macy <mmacy@freebsd.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11282

3 years agoAvoid some spa_has_pending_synctask() calls.
Alexander Motin [Sun, 6 Dec 2020 17:55:02 +0000 (12:55 -0500)]
Avoid some spa_has_pending_synctask() calls.

Since 8c4fb36a24 (PR #7795) spa_has_pending_synctask() started to
take two more locks per write inside txg_all_lists_empty().  I am
surprised those pool-wide locks are not contended, but still their
operations are visible in CPU profiles under contended vdev lock.

This commit slightly changes the vdev_queue_max_async_writes() flow to
not call the function if we are going to return max_active anyway
due to a high amount of dirty data.  This saves some CPU time
exactly when the pool is busy.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11280

3 years agoBring consistency to ABD chunk count types.
Alexander Motin [Sun, 6 Dec 2020 17:53:40 +0000 (12:53 -0500)]
Bring consistency to ABD chunk count types.

With both abd_size and abd_nents being uint_t it makes no sense for
abd_chunkcnt_for_bytes() to return size_t.  A random mix of different
types used to count chunks looks bad and makes it harder for the
compiler to optimize the code.

In particular on FreeBSD this change allows the compiler to completely
optimize out abd_verify_scatter() when built without debug, removing
a pointless 64-bit division and an even more pointless empty loop.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11279

3 years agoEnable ABI checks for the checkstyle workflow
Brian Behlendorf [Sun, 6 Dec 2020 17:50:47 +0000 (09:50 -0800)]
Enable ABI checks for the checkstyle workflow

Extend the CI checkstyle workflow to perform the library ABI
checks in the master branch.  The intent is not to prevent any
ABI changes but to detect them immediately so when they're
made it's done intentionally.

When changing the ABI the `make storeabi` target can be
used to generate a new .abi file which can be included with
the commit.  This depends on the libabigail utility which is
available from the majority of distribution package managers.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11287

3 years agoZTS: adjust zpool_import_012_pos timeout
Brian Behlendorf [Sun, 6 Dec 2020 17:48:36 +0000 (09:48 -0800)]
ZTS: adjust zpool_import_012_pos timeout

When running in the CI the zpool_import_012_pos test case occasionally
takes longer than the maximum 600 seconds.  When this happens the test
case is considered to have failed but always completes a few minutes
later.  Since the logs suggest nothing has actually failed, this commit
increases the timeout and removes the exception.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11286

3 years agoZTS: Update zfs_share_concurrent_shares.ksh
Brian Behlendorf [Sun, 6 Dec 2020 17:47:33 +0000 (09:47 -0800)]
ZTS: Update zfs_share_concurrent_shares.ksh

Occasionally an out of memory error is hit by this test case
when mounting the filesystems.  Try and reduce the likelihood
of this occurring by reducing the thread count from 100 to 50.
It also has the advantage of slightly speeding up the test.

    cannot mount 'testpool/testfs3/79': Cannot allocate memory
        filesystem successfully created, but not mounted

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11283

3 years agoFix raw sends on encrypted datasets when copying back snapshots
George Amanakis [Fri, 4 Dec 2020 22:34:29 +0000 (23:34 +0100)]
Fix raw sends on encrypted datasets when copying back snapshots

When sending raw encrypted datasets the user space accounting is present
when it's not expected to be. This leads to the subsequent mount failure
due to a checksum error when verifying the local mac.
Fix this by clearing OBJSET_FLAG_USERACCOUNTING_COMPLETE and resetting
the local mac. This allows the user accounting to be correctly updated
on first mount using the normal upgrade process.

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10523
Closes #11221

3 years agozpool: Dryrun fails to list some devices
Attila Fülöp [Fri, 4 Dec 2020 22:04:39 +0000 (23:04 +0100)]
zpool: Dryrun fails to list some devices

`zpool create -n` fails to list cache and spare vdevs.
`zpool add -n` fails to list spare devices.
`zpool split -n` fails to list `special` and `dedup` labels.
`zpool add -n` and `zpool split -n` shouldn't list hole devices.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11122
Closes #11167

3 years agoAdd -u option to 'zfs create'
Ryan Moeller [Fri, 4 Dec 2020 22:01:42 +0000 (17:01 -0500)]
Add -u option to 'zfs create'

Add -u option to 'zfs create' that prevents the file system from being
automatically mounted. This is similar to 'zfs receive -u'.

Authored by: pjd <pjd@FreeBSD.org>
FreeBSD-commit: freebsd/freebsd@35c58230e292775a694d189ff2b0bea2dcf6947d

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Ported-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11254

3 years agoAdd sanity.run file
Brian Behlendorf [Thu, 3 Dec 2020 18:49:39 +0000 (10:49 -0800)]
Add sanity.run file

This run file contains a subset of functional tests which exercise
as much functionality as possible while still executing relatively
quickly.  The included tests should take no more than a few seconds
each to run at most.  This provides a convenient way to sanity test a
change before committing to a full test run which takes several hours.

    $ ./scripts/zfs-tests.sh -r sanity
    ...
    Results Summary
    PASS  813

    Running Time: 00:14:42
    Percent passed: 100.0%

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11271

3 years agoFix trivial typo in zfs-diff.8
melak [Thu, 3 Dec 2020 18:18:26 +0000 (19:18 +0100)]
Fix trivial typo in zfs-diff.8

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tamas TEVESZ <ice@extreme.hu>
Closes #11268
Closes #11272

3 years agoFix for "Reduce latency effects of non-interactive I/O"
Alexander Motin [Thu, 3 Dec 2020 18:02:39 +0000 (13:02 -0500)]
Fix for "Reduce latency effects of non-interactive I/O"

It was found that setting min_active tunables for non-interactive I/Os
makes them stuck.  It is caused by zfs_vdev_nia_delay, that can never
be reached if we never issue any I/Os due to min_active set to zero.

Fix this by issuing at least one non-interactive I/O at a time when
there are no interactive I/Os.  When there are interactive I/Os, zero
min_active makes it possible to completely block any non-interactive I/O.
This may cause starvation in some scenarios, but who are we to deny foot
shooting?

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11261

3 years agoAdd compatibility for busybox mktemp
qzdanis [Thu, 3 Dec 2020 18:01:16 +0000 (13:01 -0500)]
Add compatibility for busybox mktemp

Busybox's mktemp requires at least six X's in the template, causing
the current sed --in-place check to fail because the file does not
exist. This change adds additional X's to mktemp templates that do
not already have at least six X's in them.
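
The template adjustment can be sketched as below. This is only an illustration of the rule "at least six X's": the helper name `pad_template` is hypothetical, and the sketch assumes the X's form a trailing run at the end of the template.

```python
import re


def pad_template(template: str, min_x: int = 6) -> str:
    """Extend the trailing run of X's in a mktemp template to min_x."""
    # Length of the run of X's at the very end of the template (may be 0).
    run = len(re.search(r"X*$", template).group(0))
    if run >= min_x:
        return template
    return template + "X" * (min_x - run)
```

Templates that already end in six or more X's are returned unchanged, so applying the helper twice gives the same result as applying it once.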

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Quentin Zdanis <zdanisq@gmail.com>
Closes #11269

3 years agoFreeBSD: notify userspace when a vdev is removed
Ryan Moeller [Wed, 2 Dec 2020 18:20:02 +0000 (13:20 -0500)]
FreeBSD: notify userspace when a vdev is removed

This is needed for zfsd to autoreplace vdevs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11260

3 years agoAvoid unnecessary zio allocation and wait
Finix1979 [Wed, 2 Dec 2020 17:28:55 +0000 (01:28 +0800)]
Avoid unnecessary zio allocation and wait

In function dmu_buf_hold_array_by_dnode, the usage of zio is only for
the reading operation. Only create the zio and wait on it in the reading
scenario as a performance optimization.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Finix Yan <yancw@info2soft.com>
Closes #11251
Closes #11256

3 years agoMake zpool status "remove:" label print in bold
Andrew Sun [Tue, 1 Dec 2020 23:22:51 +0000 (18:22 -0500)]
Make zpool status "remove:" label print in bold

When ZFS_COLOR is set, zpool status shows row headings in bold,
except for the "remove:" heading. This is a quick fix that makes
it print in bold too.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Sun <me@andrewsun.com>
Closes #11255

3 years agoCI: simplify checkstyle runner
George Melikov [Tue, 1 Dec 2020 20:15:55 +0000 (23:15 +0300)]
CI: simplify checkstyle runner

Remove excess steps.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11262

3 years agozpool_influxdb: move to libexec dir
Pavel Snajdr [Sat, 28 Nov 2020 19:15:57 +0000 (20:15 +0100)]
zpool_influxdb: move to libexec dir

Move the zpool_influxdb command to /usr/libexec/zfs,
and include the /usr/libexec/zfs path in the system search
directory when running the test suite.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #11156
Closes #11160
Closes #11224

3 years agoVerify zfs module loaded before starting services
Brian Behlendorf [Sat, 28 Nov 2020 19:11:18 +0000 (11:11 -0800)]
Verify zfs module loaded before starting services

Extend the change made in ae12b02 to verify the zfs kernel
modules are loaded to the rest of the OpenZFS services.  If
the modules aren't loaded then none of the share, volume, or
zed services can be started.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11243

3 years agodracut: use /bin/sh instead of bash as the interpreter
Đoàn Trần Công Danh [Sat, 28 Nov 2020 19:02:08 +0000 (02:02 +0700)]
dracut: use /bin/sh instead of bash as the interpreter

Although dracut has a hard dependency on bash, its modules don't.
dracut only has a hard dependency on bash for module-setup (on a fully
usable machine). Inside the initramfs, dracut allows users to choose
from a handful of other shells, e.g. bash, busybox, dash, mkfsh.

In fact, my local machine's initramfs is built with dash,
and it has been functional for a very long time.

Before 64025fa3a (Silence 'make checkbashisms', 2020-08-20), we also
allowed our users to have that right.

Let's fix the problems 'make checkbashisms' reported and allow our users
to have that right again.

For the 'plymouth' case, let's simply run the command inside the if
instead of checking for the existence of the command before running it,
because the status is also a failure if plymouth is unavailable.

While we're at it, let's remove an unnecessary fork for grep in
zfs-generator.sh.in and replace its following complicated 'if elif fi'
with a simple 'case ... esac'.

To support this change, also exclude 90zfs from "make checkbashisms"
because the current CI infrastructure ships an old version of
"checkbashisms", which complains about "command -v", while the current
latest "checkbashisms" thinks it's fine. In the near future, we can
revert that change to "Makefile.am" when CI infrastructure is updated.

Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Closes #11244

3 years agoRemove incorrect assertion
Brian Behlendorf [Tue, 24 Nov 2020 17:28:42 +0000 (09:28 -0800)]
Remove incorrect assertion

Commit 85703f6 added a new ASSERT to zfs_write() as part of the
cleanup which isn't correct in the case where multiple processes
are concurrently extending a file.  The `zp->z_size` is updated
atomically while holding a range lock on only a portion of the
file.  Therefore, it's possible for the file size to increase
after the same check is performed earlier in the loop, causing this
ASSERT to fail.  The code itself handles this case correctly so
only the invalid ASSERT needs to be removed.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11235

3 years agoReduce latency effects of non-interactive I/O
Alexander Motin [Tue, 24 Nov 2020 17:26:42 +0000 (12:26 -0500)]
Reduce latency effects of non-interactive I/O

While investigating the influence of scrub (especially sequential) on
random read latency, I noticed that on some HDDs a single 4KB read may
take up to 4 seconds!  Deeper investigation showed that many HDDs heavily
prioritize sequential reads even when those are submitted with a queue
depth of 1.

This patch addresses the latency from two sides:
 - by using _min_active queue depths for non-interactive requests while
   interactive request(s) are active and for a few requests after;
 - by throttling them further if no interactive requests have completed
   while a configured number of non-interactive ones did.

While there, I've also modified vdev_queue_class_to_issue() to give
more chances to schedule at least _min_active requests to the lowest
priorities.  It should reduce starvation if several non-interactive
processes are running at the same time as some interactive ones, and I
think it should make it possible to set zfs_vdev_max_active as low as 1.

I've benchmarked this change with 4KB random reads from a ZVOL with 16KB
block size on a newly written non-fragmented pool.  On a fragmented pool
I also saw improvements, but not as dramatic.  Below are log2 histograms
of the random read latency in milliseconds for different devices:

4 2x mirror vdevs of SATA HDD WDC WD20EFRX-68EUZN0 before:
0, 0, 2,  1,  12,  21,  19,  18, 10, 15, 17, 21
after:
0, 0, 0, 24, 101, 195, 419, 250, 47,  4,  0,  0
, i.e. a maximum latency reduction from 2s to 500ms.

4 2x mirror vdevs of SATA HDD WDC WD80EFZX-68UW8N0 before:
0, 0,  2,  31,  38,  28,  18,  12, 17, 20, 24, 10, 3
after:
0, 0, 55, 247, 455, 470, 412, 181, 36,  0,  0,  0, 0
, i.e. from 4s to 250ms.

1 SAS HDD SEAGATE ST14000NM0048 before:
0,  0,  29,   70, 107,   45,  27, 1, 0, 0, 1, 4, 19
after:
1, 29, 681, 1261, 676, 1633,  67, 1, 0, 0, 0, 0,  0
, i.e. from 4s to 125ms.

1 SAS SSD SEAGATE XS3840TE70014 before (microseconds):
0, 0, 0, 0, 0, 0, 0, 0,  70, 18343, 82548, 618
after:
0, 0, 0, 0, 0, 0, 0, 0, 283, 92351, 34844,  90
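
The "from 2s to 500ms" style figures above are consistent with reading
the highest non-empty bucket, assuming bucket i stands for roughly 2^i
units (milliseconds for the HDDs).  That reading is inferred from the
numbers, not stated explicitly.  A small sketch under that assumption,
with highest_bucket_value as a hypothetical helper:

```sh
# highest_bucket_value "c0, c1, c2, ..."
# Hypothetical helper: prints 2^i for the highest bucket i with a
# non-zero count.  The body runs in a subshell so the IFS change and the
# helper variables do not leak into the caller.
highest_bucket_value() (
    index=0
    highest=-1
    IFS=', '
    for count in $1; do
        if [ "$count" -gt 0 ]; then
            highest=$index
        fi
        index=$((index + 1))
    done
    if [ "$highest" -ge 0 ]; then
        echo $((1 << highest))
    fi
)
```

Applied to the first pair of histograms this gives 2048 before and 512
after, which matches the quoted 2s and 500ms after rounding.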

I've also measured scrub time during the test and on idle pools.  On an
idle fragmented pool I measured scrub getting a few percent faster
due to the use of QD3 instead of the previous QD2.  On an idle
non-fragmented pool I measured no difference.  On a busy non-fragmented
pool I measured a scrub time increase of about 1.5-1.7x, while the IOPS
increase reached 5-9x.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11166

3 years agoObsolete earlier packages due to version bump
Brian Behlendorf [Tue, 24 Nov 2020 17:24:24 +0000 (09:24 -0800)]
Obsolete earlier packages due to version bump

In order for package managers such as dnf to upgrade cleanly after
the package SONAME bump the obsolete package names must be known.
Update the new packages to correctly obsolete the old ones.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11230
Closes #11233

3 years agoFreeBSD: decouple ZFS_DEBUG from kernel debug settings
Matthew Macy [Tue, 24 Nov 2020 17:16:46 +0000 (09:16 -0800)]
FreeBSD: decouple ZFS_DEBUG from kernel debug settings

Reviewed-by: Martelli Nikola @martellini
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11213

3 years agoUpdate dRAID short feature description
Brian Behlendorf [Mon, 23 Nov 2020 22:49:17 +0000 (14:49 -0800)]
Update dRAID short feature description

The documentation describes dRAID as a distributed spare, not
parity, RAID implementation.  Update the short feature description
to match the rest of the documentation.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11229

3 years agolibzfsbootenv: do not depend on libnvpair
Antonio Russo [Sun, 22 Nov 2020 23:16:42 +0000 (16:16 -0700)]
libzfsbootenv: do not depend on libnvpair

We do not build libnvpair.pc.  Moreover, it is automatically pulled in
by libzfs.pc, so no additional specific dependency is required.

Reviewed-by: Toomas Soome <tsoome@me.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11227

3 years agopam_zfs_key: accommodate different dataset naming scheme
cragw [Sun, 22 Nov 2020 17:32:34 +0000 (01:32 +0800)]
pam_zfs_key: accommodate different dataset naming scheme

The name of the dataset for a user's home directory may differ from the
expected $homes_prefix/$username if a different naming scheme is used.

We can use the mountpoint property to identify the dataset for $username,
as long as its value is identical to passwd's pw_dir.

For example:
    NAME                       PROPERTY     VALUE
    rpool/home/myuser_123456   mountpoint   /home/myuser
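
The lookup can be pictured as matching pw_dir against the mountpoint
column.  This is only a sketch of the idea, not how pam_zfs_key is
implemented; dataset_for_home is a hypothetical helper that reads
tab-separated "name mountpoint" lines such as those printed by
`zfs list -H -o name,mountpoint`:

```sh
# dataset_for_home HOME_DIR
# Hypothetical helper: reads "name<TAB>mountpoint" lines on stdin and
# prints the first dataset whose mountpoint equals HOME_DIR.
dataset_for_home() {
    tab=$(printf '\t')
    while IFS=$tab read -r name mountpoint; do
        if [ "$mountpoint" = "$1" ]; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1
}

# Possible usage on a system with ZFS installed:
#   zfs list -H -o name,mountpoint | dataset_for_home /home/myuser
```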

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Crag Wang <crag0715@gmail.com>
Closes #11165

3 years agoInclude the ABI with dist tarball
Brian Behlendorf [Sat, 21 Nov 2020 18:44:52 +0000 (10:44 -0800)]
Include the ABI with dist tarball

The ABI should be included when generating the `make dist` tarball
since it's required by the `make checkabi` target.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11225

3 years agoCorrect missing zil_claim() DTL updates
Brian Behlendorf [Fri, 20 Nov 2020 21:14:45 +0000 (13:14 -0800)]
Correct missing zil_claim() DTL updates

Commit a1d477c2 accidentally disabled DTL updates for the zil_claim()
case described at the end of vdev_stat_update() by unconditionally
disabling all DTL updates when loading.  This was done to avoid
a deadlock on the vd_dtl_lock when loading the DTLs from disk.

    vdev_dtl_contains <--- Takes vd->vd_dtl_lock
    vdev_mirror_child_missing
    vdev_mirror_io_start
    zio_vdev_io_start
    __zio_execute
    arc_read
    dbuf_issue_final_prefetch
    dbuf_prefetch_impl
    dbuf_prefetch
    dmu_prefetch
    space_map_iterate
    space_map_load_length
    space_map_load
    vdev_dtl_load <--- Takes vd->vd_dtl_lock
    vdev_load
    spa_ld_load_vdev_metadata
    spa_tryimport

The missing DTL updates can be restored by moving the space_map_load()
call outside the vd_dtl_lock.  A private range tree is populated by
reading the space map and then merged in to the DTL_MISSING tree
under the lock.

Furthermore, the SPA_LOAD_NONE check in vdev_dtl_contains() leads to an
additional problem.  Any resilvering which occurs before SPA_LOAD_NONE
is set will incorrectly determine that there's nothing to repair.  This
can result in full redundancy not being restored for some blocks.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11218

3 years agoTrack SONAME version bump in packaging
Antonio Russo [Fri, 20 Nov 2020 00:25:24 +0000 (17:25 -0700)]
Track SONAME version bump in packaging

RPM and DEB packages are named after the SONAME version of the library
they contain.  After bumping this version, the packaging should be
renamed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11219

3 years agodracut/mount-zfs.sh: quote expansion on zpool test
наб [Thu, 12 Nov 2020 22:16:50 +0000 (23:16 +0100)]
dracut/mount-zfs.sh: quote expansion on zpool test

Bring over some of the improvements from dracut/zfs-load-key.sh,
shellcheck is slightly quieter as well

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11198

3 years agodracut/zfs-load-key.sh: simplify import loop, quote variable assignments
наб [Thu, 12 Nov 2020 22:06:24 +0000 (23:06 +0100)]
dracut/zfs-load-key.sh: simplify import loop, quote variable assignments

The loop now has a less confusing condition and properly uses
systemctl(1) is-failed's return code instead of that entire mess

The assignments could turn into "var=val program" if encryptionroot
or keylocation had whitespace in them

As a bonus, this (mostly) silences shellcheck

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11198

3 years agoReduce confusion in zfs_write
Ryan Moeller [Wed, 18 Nov 2020 23:06:59 +0000 (18:06 -0500)]
Reduce confusion in zfs_write

Is this block when abuf != NULL ever reached? Yes, it is.

Add asserts and comments to prove that when we get here, we have a full
block write at an aligned offset extending past EOF.

Simplify by removing the check that tx_bytes == max_blksz, since we can
assert that it is always true.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11191

3 years agoFix problems in zvol_set_volmode_impl
Matthew Macy [Tue, 17 Nov 2020 17:50:52 +0000 (09:50 -0800)]
Fix problems in zvol_set_volmode_impl

- Don't leave fstrans set when passed a snapshot
- Don't remove minor if volmode already matches new value
- (FreeBSD) Wait for GEOM ops to complete before trying
  remove (at create time GEOM will be "tasting" in parallel)
- (FreeBSD) Don't leak zvol_state_lock on open if zv == NULL
- (FreeBSD) Don't try to unlock zv->zv_state lock if zv == NULL

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11199

3 years agoAdd ABI snapshot
Brian Behlendorf [Sun, 15 Nov 2020 04:38:34 +0000 (21:38 -0700)]
Add ABI snapshot

Add a snapshot of the current ABI using libabigail-1.7-2.  The
included ABI passes `make checkabi` for CentOS 7, Fedora 33,
Debian 10, and Ubuntu 20.04.  This covers a fairly wide range
of glibc, gcc, and libabigail versions plus other changes which
are platform specific.

Reviewed-by: Antonio Russo <aerusso@aerusso.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11144

3 years agoLibrary ABI tracking with abigail
Antonio Russo [Sun, 15 Nov 2020 04:35:31 +0000 (21:35 -0700)]
Library ABI tracking with abigail

Provide two make targets: checkabi and storeabi.

storeabi uses libabigail to generate a reference copy of the ABI for the
public libraries.

checkabi compares such a reference to the compiled version, failing if
they are not compatible.  No ABI is generated for libzpool.so, it is
only used by ztest and zdb and not external consumers.

Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11144

3 years agozpool: correctly align columns with -p
наб [Fri, 13 Nov 2020 22:38:29 +0000 (23:38 +0100)]
zpool: correctly align columns with -p

zpool_expand_proplist() now ignores pl_fixed if its new literal
argument is true.  The rest is a consequence of needing to pass
that down.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11202

3 years agozpool(8): fix pool-wi[sd]e typo
наб [Fri, 13 Nov 2020 22:53:37 +0000 (23:53 +0100)]
zpool(8): fix pool-wi[sd]e typo

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11202

3 years agoFix 'zfs userspace' for received datasets in encrypted root
loli10K [Mon, 16 Nov 2020 17:10:29 +0000 (18:10 +0100)]
Fix 'zfs userspace' for received datasets in encrypted root

For encrypted receives, where user accounting is initially disabled on
creation, both 'zfs userspace' and 'zfs groupspace' fail with
EOPNOTSUPP: this is because dmu_objset_id_quota_upgrade_cb() forgets to
set OBJSET_FLAG_USERACCOUNTING_COMPLETE on the objset flags after a
successful dmu_objset_space_upgrade().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9501
Closes #9596

3 years agoFix ASSERT logic in l2arc_evict()
George Amanakis [Mon, 16 Nov 2020 17:08:11 +0000 (18:08 +0100)]
Fix ASSERT logic in l2arc_evict()

In case of cache device removal it is possible that at the end of
l2arc_evict() we have l2ad_hand = l2ad_evict. This can lead to the
following panic in case of a debug build:

VERIFY3(dev->l2ad_hand < dev->l2ad_evict) failed (321920512 < 321920512)
Call Trace:
 dump_stack+0x66/0x90
 spl_panic+0xef/0x117 [spl]
 l2arc_remove_vdev+0x11d/0x290 [zfs]
 spa_load_l2cache+0x275/0x5b0 [zfs]
 spa_vdev_remove+0x4a5/0x6e0 [zfs]
 zfs_ioc_vdev_remove+0x59/0xa0 [zfs]
 zfsdev_ioctl_common+0x5b3/0x630 [zfs]
 zfsdev_ioctl+0x53/0xe0 [zfs]
 do_vfs_ioctl+0x42e/0x6b0
 ksys_ioctl+0x5e/0x90
 do_syscall_64+0x5b/0x1a0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

In case of cache device removal it is also possible that l2ad_hand +
distance > l2ad_end since we do not iterate l2arc_evict() and l2ad_hand
is not reset. This has no functional consequence however as the cache
device is about to be removed.

Fix this by omitting the ASSERT in case of device removal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11205

3 years agoconfig/dracut/90zfs: handle cases where hostid(1) returns all zeros
Érico Rolim [Fri, 13 Nov 2020 03:00:59 +0000 (00:00 -0300)]
config/dracut/90zfs: handle cases where hostid(1) returns all zeros

On systems with musl libc, hostid(1) always prints "00000000", which
will cause improper behavior when the 90zfs module is configured in a
dracut initramfs. Work around this by copying the host /etc/hostid if
the file exists, and otherwise only write /etc/hostid if hostid(1)
returns something meaningful. This avoids zgenhostid creating a random
/etc/hostid for the initramfs, which could lead to errors when trying to
import the pool if spl_hostid isn't defined in the kernel command line.

Furthermore, tag the /etc/hostid file as hostonly, since it is system
specific and shouldn't be taken into account when trying to use an
initramfs generated in one system to boot into a different system.
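
The decision described above can be summarised as a three-way choice.
The sketch below only models that choice; hostid_plan and its printed
action names are hypothetical and are not taken from the 90zfs module:

```sh
# hostid_plan HOST_FILE HOSTID_OUTPUT
# Hypothetical helper: prints which action the module would take.
#   copy  - the host already has the file, so reuse it
#   write - hostid(1) printed something meaningful, so write it out
#   skip  - hostid(1) printed all zeros (e.g. musl), so write nothing
hostid_plan() {
    if [ -e "$1" ]; then
        echo copy
    elif [ "$2" != "00000000" ]; then
        echo write
    else
        echo skip
    fi
}
```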

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Co-authored-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11174
Closes #11189

3 years agozgenhostid: accept hostid arguments equal to zero.
Érico Rolim [Tue, 10 Nov 2020 14:22:27 +0000 (11:22 -0300)]
zgenhostid: accept hostid arguments equal to zero.

A common usage pattern for zgenhostid, including in the ZFS dracut
module, is running it as:

  zgenhostid $(hostid)

However, zgenhostid only accepted hostid arguments greater than 0, which
meant that, when the output of hostid(1) was "00000000", zgenhostid
would error out, even though 0 is a possible return value for the
gethostid(3) function used by hostid(1):

- On current musl libc, gethostid(3) is a stub that always returns 0.
- On glibc, gethostid(3) will return 0 if /etc/hostid exists but is
  smaller than 4 bytes.

In these cases, it makes more sense for zgenhostid to treat a value of 0
as other parts of the zfs codebase do, meaning that a hostid value
couldn't be determined; therefore, it should attempt to generate a
random value to write into /etc/hostid.

The manpage and usage output have been updated to reflect this.

Whitespace has also been fixed in the usage output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11174
Closes #11189

3 years agoLinux: Fix ZFS_ENTER/ZFS_EXIT/ZFS_VERIFY_ZP usage
Brian Behlendorf [Sat, 14 Nov 2020 18:19:00 +0000 (10:19 -0800)]
Linux: Fix ZFS_ENTER/ZFS_EXIT/ZFS_VERIFY_ZP usage

The ZFS_ENTER/ZFS_EXIT/ZFS_VERIFY_ZP macros should not be used
in the Linux zpl_*.c source files.  They return a positive error
value which is correct for the common code, but not for the Linux
specific kernel code which expects a negative return value.  The
ZPL_ENTER/ZPL_EXIT/ZPL_VERIFY_ZP macros should be used instead.

Furthermore, the ZPL_EXIT macro has been updated to not call the
zfs_exit_fs() function.  This prevents a possible deadlock which
can occur when a snapshot is automatically unmounted because the
zpl_show_devname() function must never wait on in-progress automatic
snapshot unmounts.

Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11169
Closes #11201

3 years agoAssertion failure when logging large output of channel program
Matthew Ahrens [Sat, 14 Nov 2020 18:17:16 +0000 (10:17 -0800)]
Assertion failure when logging large output of channel program

The output of ZFS channel programs is logged on-disk in the zpool
history, and printed by `zpool history -i`.  Channel programs can use
10MB of memory by default, and up to 100MB by using the `zfs program -m`
flag.  Therefore their output can be up to some fraction of 100MB.

In addition to being somewhat wasteful of the limited space reserved for
the pool history (which for large pools is 1GB), in extreme cases this
can result in a failure of `ASSERT(length <= DMU_MAX_ACCESS);` in
`dmu_buf_hold_array_by_dnode()`.

This commit limits the output size that will be logged to 1MB.  Larger
outputs will not be logged; instead an entry will be logged indicating
the size of the omitted output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11194

3 years agoReturn EFAULT at the end of zfs_write() when set
Ryan Moeller [Sat, 14 Nov 2020 18:16:26 +0000 (13:16 -0500)]
Return EFAULT at the end of zfs_write() when set

FreeBSD's VFS expects EFAULT from zfs_write() if we didn't complete
the full write so it can retry the operation.  Add some missing
SET_ERRORs in zfs_write().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11193

3 years agoDistributed Spare (dRAID) Feature
Brian Behlendorf [Fri, 13 Nov 2020 21:51:51 +0000 (13:51 -0800)]
Distributed Spare (dRAID) Feature

This patch adds a new top-level vdev type called dRAID, which stands
for Distributed parity RAID.  This pool configuration allows all dRAID
vdevs to participate when rebuilding to a distributed hot spare device.
This can substantially reduce the total time required to restore full
parity to a pool with a failed device.

A dRAID pool can be created using the new top-level `draid` type.
Like `raidz`, the desired redundancy is specified after the type:
`draid[1,2,3]`.  No additional information is required to create the
pool and reasonable default values will be chosen based on the number
of child vdevs in the dRAID vdev.

    zpool create <pool> draid[1,2,3] <vdevs...>

Unlike raidz, additional optional dRAID configuration values can be
provided as part of the draid type as colon separated values. This
allows administrators to fully specify a layout for either performance
or capacity reasons.  The supported options include:

    zpool create <pool> \
        draid[<parity>][:<data>d][:<children>c][:<spares>s] \
        <vdevs...>

    - draid[parity]       - Parity level (default 1)
    - draid[:<data>d]     - Data devices per group (default 8)
    - draid[:<children>c] - Expected number of child vdevs
    - draid[:<spares>s]   - Distributed hot spares (default 0)
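
To make the colon-separated syntax concrete, the sketch below assembles
a vdev type string from its parts.  draid_spec is a hypothetical helper
written for illustration; only the
draid[<parity>][:<data>d][:<children>c][:<spares>s] layout comes from
the description above:

```sh
# draid_spec PARITY [DATA] [CHILDREN] [SPARES]
# Hypothetical helper: prints a draid type string, adding only the
# optional fields that were supplied.
draid_spec() {
    spec="draid$1"
    if [ -n "${2:-}" ]; then spec="$spec:${2}d"; fi
    if [ -n "${3:-}" ]; then spec="$spec:${3}c"; fi
    if [ -n "${4:-}" ]; then spec="$spec:${4}s"; fi
    printf '%s\n' "$spec"
}
```

For example, `draid_spec 2 8 68 2` prints draid2:8d:68c:2s, the same
form that appears (with a "-0" suffix) in the status output below.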

Abbreviated example `zpool status` output for a 68 disk dRAID pool
with two distributed spares using special allocation classes.

```
  pool: tank
 state: ONLINE
config:

    NAME                  STATE     READ WRITE CKSUM
    slag7                 ONLINE       0     0     0
      draid2:8d:68c:2s-0  ONLINE       0     0     0
        L0                ONLINE       0     0     0
        L1                ONLINE       0     0     0
        ...
        U25               ONLINE       0     0     0
        U26               ONLINE       0     0     0
        spare-53          ONLINE       0     0     0
          U27             ONLINE       0     0     0
          draid2-0-0      ONLINE       0     0     0
        U28               ONLINE       0     0     0
        U29               ONLINE       0     0     0
        ...
        U42               ONLINE       0     0     0
        U43               ONLINE       0     0     0
    special
      mirror-1            ONLINE       0     0     0
        L5                ONLINE       0     0     0
        U5                ONLINE       0     0     0
      mirror-2            ONLINE       0     0     0
        L6                ONLINE       0     0     0
        U6                ONLINE       0     0     0
    spares
      draid2-0-0          INUSE     currently in use
      draid2-0-1          AVAIL
```

When adding test coverage for the new dRAID vdev type the following
options were added to the ztest command.  These options are leveraged
by zloop.sh to test a wide range of dRAID configurations.

    -K draid|raidz|random - kind of RAID to test
    -D <value>            - dRAID data drives per group
    -S <value>            - dRAID distributed hot spares
    -R <value>            - RAID parity (raidz or dRAID)

The zpool_create, zpool_import, redundancy, replacement and fault
test groups have all been updated to provide test coverage for the
dRAID feature.

Co-authored-by: Isaac Huang <he.huang@intel.com>
Co-authored-by: Mark Maybee <mmaybee@cray.com>
Co-authored-by: Don Brady <don.brady@delphix.com>
Co-authored-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mmaybee@cray.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10102

3 years agoChannel program may spuriously fail with "memory limit exhausted"
Matthew Ahrens [Thu, 12 Nov 2020 01:16:15 +0000 (17:16 -0800)]
Channel program may spuriously fail with "memory limit exhausted"

ZFS channel programs (invoked by `zfs program`) are executed in a LUA
sandbox with a limit on the amount of memory they can consume.  The
limit is 10MB by default, and can be raised to 100MB with the `-m` flag.
If the memory limit is exceeded, the LUA program exits and the command
fails with a message like `Channel program execution failed: Memory
limit exhausted.`

The LUA sandbox allocates memory with `vmem_alloc(KM_NOSLEEP)`, which
will fail if the requested memory is not immediately available.  In this
case, the program fails with the same message, `Memory limit exhausted`.
However, in this case the specified memory limit has not been reached,
and the memory may only be temporarily unavailable.

This commit changes the LUA memory allocator `zcp_lua_alloc()` to use
`vmem_alloc(KM_SLEEP)`, so that we won't spuriously fail when memory is
temporarily low.  Instead, we rely on the system to be able to free up
memory (e.g. by evicting from the ARC), and we assume that even at the
highest memory limit of 100MB, the channel program will not truly
exhaust the system's memory.

External-issue: DLPX-71924
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11190

3 years agoLinux: Fix mount/unmount when dataset name has a space
Brian Behlendorf [Thu, 12 Nov 2020 01:14:24 +0000 (17:14 -0800)]
Linux: Fix mount/unmount when dataset name has a space

The custom zpl_show_devname() helper should translate spaces into
the octal escape sequence \040.  The getmntent(3) function
is aware of this convention and properly translates the escape
sequence back to a space when reading the fsname.

Without this change the `zfs mount` and `zfs unmount` commands
incorrectly detect when a dataset with a name containing spaces
is mounted.
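
The escaping convention itself is easy to demonstrate outside the
kernel.  The two helpers below are hypothetical and only illustrate the
space to \040 round trip; they are not part of the patch:

```sh
# escape_spaces NAME: replace each space with the octal escape \040.
escape_spaces() {
    printf '%s\n' "$1" | sed 's/ /\\040/g'
}

# unescape_spaces NAME: turn each \040 back into a space.
unescape_spaces() {
    printf '%s\n' "$1" | sed 's/\\040/ /g'
}
```

A dataset named "tank/my data" would thus be shown as tank/my\040data
in the mount table and read back with its space restored.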

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11182
Closes #11187

3 years agoG/C data_alloc_arena
Mateusz Guzik [Thu, 12 Nov 2020 01:11:32 +0000 (02:11 +0100)]
G/C data_alloc_arena

It is a leftover from illumos that is always set to NULL and introduces
a spurious difference between zio_buf and zio_data_buf.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11188

3 years agoStart snapdir_iterate traversals to begin with the value of zero.
Tony Perkins [Mon, 28 Sep 2020 00:46:22 +0000 (20:46 -0400)]
Start snapdir_iterate traversals to begin with the value of zero.

The microzap hash can sometimes be zero for single digit snapnames.
The zap cursor can then have a serialized value of two (for . and ..),
and skip the first entry in the avl tree for the .zfs/snapshot directory
listing, and therefore does not return all snapshots.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Cedric Berger <cedric@precidata.com>
Signed-off-by: Tony Perkins <tperkins@datto.com>
Closes #11039

3 years agoFix compiling on FreeBSD + gcc - don't assume illumos bits
Adrian Chadd [Thu, 15 Oct 2020 20:07:12 +0000 (13:07 -0700)]
Fix compiling on FreeBSD + gcc - don't assume illumos bits

This looks like it was once from the illumos compat code.
FreeBSD doesn't have cmn_err as a compiler format attribute, so
it definitely errors out.

It doesn't show up on LLVM because it doesn't trigger at all.

Add in the format flags but keep them behind #if 0 for now;
there are too many format issues that trigger when one does
format checking in the shared code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: adrian chadd <adrian@freebsd.org>
Closes #11068
Closes #11069

3 years agoFix pointer-is-uint64_t-sized assumption in the ioctl path
Adrian Chadd [Thu, 15 Oct 2020 20:02:43 +0000 (13:02 -0700)]
Fix pointer-is-uint64_t-sized assumption in the ioctl path

This shows up when compiling freebsd-head on amd64 using gcc-6.4.
The lib32 compat build ends up tripping over this assumption.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: adrian chadd <adrian@freebsd.org>
Closes #11068
Closes #11069

3 years agoFix memleak in cmd/mount_zfs.c
sterlingjensen [Tue, 10 Nov 2020 23:50:44 +0000 (17:50 -0600)]
Fix memleak in cmd/mount_zfs.c

Convert dynamic allocation to static buffer, simplify parse_dataset
function return path. Add tests specific to the mount helper.

Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11098

3 years agozpoolprops.8: clarify vdev expansion rules
наб [Tue, 10 Nov 2020 20:48:26 +0000 (21:48 +0100)]
zpoolprops.8: clarify vdev expansion rules

Remove reference to EFI(?), explain that the new space
is beyond the GPT for whole-disk vdevs, and add section noting how it
behaves with partition vdevs in terms of how the user is most likely to
encounter it ‒ the previous phrasing was confusing
and seemed to indicate that "zpool online -e" will be able to claim

  GPT[whatever, ZFS, free space, whatever]

into

  GPT[whatever, ZFS, whatever]
but that's not the case, as it'll only be able to do so after manually
resizing the ZFS partition to include the free space beforehand, i.e.:
  GPT[whatever, ZFS, free space, whatever]
  GPT[whatever, [ZFS + free space], potentially left-overs, whatever]
  # zpool online -e
  GPT[whatever, ZFS, whatever]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11158

3 years agoG/C struct znode -> z_moved
Mateusz Guzik [Tue, 10 Nov 2020 20:42:47 +0000 (21:42 +0100)]
G/C struct znode -> z_moved

The field is yet another leftover from unsupported zfs_znode_move.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11186

3 years agoinitramfs: zfsunlock hook breaks /usr/bin
Pavel Zakharov [Tue, 10 Nov 2020 19:12:07 +0000 (14:12 -0500)]
initramfs: zfsunlock hook breaks /usr/bin

The copy_exec() function expects that the full path of the target
file is passed rather than just the directory, and will take care
of creating the underlying directories if they don't exist.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Closes #11162

3 years agoFreeBSD: Simplify zvol_geom_open and zvol_cdev_open
Ryan Moeller [Fri, 6 Nov 2020 18:56:58 +0000 (13:56 -0500)]
FreeBSD: Simplify zvol_geom_open and zvol_cdev_open

We can consolidate the unlocking procedure into one place by starting
with drop_suspend set to B_FALSE and moving the open count check up.

While here, a little code cleanup. Match the out labels between
zvol_geom_open and zvol_cdev_open, and add a missing period in some
comments.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11175

3 years agoFreeBSD: Avoid spurious EINTR in zvol_cdev_open
Ryan Moeller [Fri, 6 Nov 2020 18:52:16 +0000 (13:52 -0500)]
FreeBSD: Avoid spurious EINTR in zvol_cdev_open

zvol_first_open can fail with EINTR if spa_namespace_lock is not held
and cannot be taken without waiting.

Apply the same logic that was done for zvol_geom_open to take
spa_namespace_lock if not already held on first open in zvol_cdev_open.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11175

3 years agoSimplify offset and length limit in zfs_write
Ryan Moeller [Mon, 9 Nov 2020 21:01:56 +0000 (16:01 -0500)]
Simplify offset and length limit in zfs_write

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176

3 years agoConst some unchanging variables in zfs_write
Ryan Moeller [Wed, 4 Nov 2020 23:10:12 +0000 (23:10 +0000)]
Const some unchanging variables in zfs_write

Show that these values will not be changing later.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176

3 years agoFreeBSD: Move uio_prefaultpages def to uio.h
Ryan Moeller [Wed, 4 Nov 2020 21:43:30 +0000 (21:43 +0000)]
FreeBSD: Move uio_prefaultpages def to uio.h

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176

3 years agoRemove redundant oid parameter to update_pages
Ryan Moeller [Wed, 4 Nov 2020 21:47:14 +0000 (21:47 +0000)]
Remove redundant oid parameter to update_pages

The oid comes from the znode we are already passing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176

3 years agoFactor uid, gid, and projid out of loop in zfs_write
Ryan Moeller [Wed, 4 Nov 2020 22:10:13 +0000 (22:10 +0000)]
Factor uid, gid, and projid out of loop in zfs_write

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176

3 years agoFix dmu_tx_dirty_throttle after arc_c reduction
Alexander Motin [Tue, 10 Nov 2020 18:39:26 +0000 (13:39 -0500)]
Fix dmu_tx_dirty_throttle after arc_c reduction

After the initial arc_c was reduced to arc_c_min, it became possible on
datasets with primarycache=metadata or none for dirty data to make up
most of the ARC capacity, easily exceeding the configured 50% of the
initial arc_c.  That causes forced txg commits by
arc_tempreserve_space() and periodic very long write delays.

This patch makes arc_tempreserve_space() use arc_c only after the ARC
has warmed up once and arc_c really means something, and use arc_c_max
before that.
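
The selection rule described above can be sketched as follows. This is
an illustrative model in Python, not the kernel's C code: the function
name dirty_limit, the warmed_up flag, and the dirty_percent parameter
are hypothetical names introduced here, and only the "50%" figure and
the arc_c / arc_c_max choice come from the commit message.

```python
def dirty_limit(arc_c, arc_c_max, warmed_up, dirty_percent=50):
    """Return the dirty-data ceiling, in bytes, for a hypothetical throttle.

    arc_c         -- current ARC target size
    arc_c_max     -- configured maximum ARC size
    warmed_up     -- assumed flag: True once the ARC has filled up once
    dirty_percent -- assumed tunable; 50 matches the figure quoted above
    """
    # Before warm-up arc_c may still sit near arc_c_min, so it is not a
    # meaningful basis for the limit; fall back to arc_c_max instead.
    base = arc_c if warmed_up else arc_c_max
    return base * dirty_percent // 100
```

With a small initial arc_c, the cold-cache limit is derived from
arc_c_max, so dirty data no longer exceeds the limit immediately.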

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11178

3 years agoFix dnode refcount tracking
Matthew Macy [Tue, 10 Nov 2020 18:37:10 +0000 (10:37 -0800)]
Fix dnode refcount tracking

Fix a couple of places where the wrong tag is passed
to dnode_{hold, rele}.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11184

3 years agoZTS: Add L1 corruption test
Ryan Moeller [Wed, 21 Oct 2020 22:35:08 +0000 (22:35 +0000)]
ZTS: Add L1 corruption test

Add a new test case which corrupts all level 1 blocks in a file,
then verifies that the corruption is detected and repaired.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11141

3 years agoZTS: Output all block copies in list_file_blocks
Ryan Moeller [Thu, 29 Oct 2020 21:43:38 +0000 (21:43 +0000)]
ZTS: Output all block copies in list_file_blocks

The second part of list_file_blocks transforms the object description
output by zdb -ddddd $ds $objnum into a stream of lines of the form
"level path offset length" for the indirect blocks in the given file.
The current code only works for the first copy of L0 blocks.  L1 and
L2 indirect blocks have more than one copy on disk.

Add one more -d to the zdb command so we get all block copies and
rewrite the transformation to match more than L0 and output all DVAs.
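
The transformation described above can be sketched as follows. This is
a rough Python model, not the shell code in the test suite. The shape
of the zdb lines (a level tag such as "L1" followed by one or more
"DVA[n]=<vdev:offset:size>" entries, with hexadecimal offset and size
and a decimal vdev id) is an assumption, and block_copies and
vdev_paths are hypothetical names.

```python
import re

# Assumed line shape: "... L<level>  DVA[0]=<vdev:offset:size> DVA[1]=<...> ..."
LEVEL_RE = re.compile(r"\bL(\d+)\s+DVA\[")
DVA_RE = re.compile(r"DVA\[\d+\]=<(\d+):([0-9a-f]+):([0-9a-f]+)>")


def block_copies(zdb_lines, vdev_paths):
    """Return one (level, path, offset, length) tuple per DVA found."""
    out = []
    for line in zdb_lines:
        level = LEVEL_RE.search(line)
        if level is None:
            continue  # not a block pointer line
        # Emit every copy on the line, not only the first DVA.
        for vdev, offset, length in DVA_RE.findall(line):
            out.append((int(level.group(1)), vdev_paths[int(vdev)],
                        int(offset, 16), int(length, 16)))
    return out
```

Matching every DVA entry on a line, at any level, is what lets L1 and
L2 blocks with several on-disk copies show up in the output.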

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11141

3 years agoZTS: Fix list_file_blocks for mirror vdevs, level > 0
Ryan Moeller [Wed, 28 Oct 2020 20:29:31 +0000 (20:29 +0000)]
ZTS: Fix list_file_blocks for mirror vdevs, level > 0

The first part of list_file_blocks transforms the pool configuration
output by zdb -C $pool into shell code to set up a shell variable,
VDEV_MAP, that maps from vdev id to the underlying vdev path. This
variable is a simple indexed array. However, the vdev id in a DVA is
only the id of the top level vdev.

When the pool is mirrored, the top level vdev is a mirror and its
children are the mirrored devices. So, what we need is to map from
the top level vdev id to a list of the underlying vdev paths.
list_file_blocks does not need to work for raidz vdevs, so we can
disregard that case.
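
The mapping described above can be sketched as follows. This is an
illustrative Python model rather than the shell code generated from
zdb -C output. The dictionary layout of the pool configuration (keys
"id", "path", and "children") and the name build_vdev_map are
assumptions made for the example.

```python
def build_vdev_map(top_level_vdevs):
    """Map each top-level vdev id to the list of underlying device paths."""
    vdev_map = {}
    for vdev in top_level_vdevs:
        children = vdev.get("children")
        if children:
            # Mirror: the DVA names the mirror, so list every child device.
            paths = [child["path"] for child in children]
        else:
            # Plain disk: the top-level vdev is the device itself.
            paths = [vdev["path"]]
        vdev_map[vdev["id"]] = paths
    return vdev_map
```

A plain disk then maps to a one-element list and a mirror to one entry
per child, so callers can treat both cases the same way.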

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11141

3 years agoFreeBSD: Prevent a NULL reference in zvol_cdev_open
Mariusz Zaborski [Fri, 6 Nov 2020 01:02:19 +0000 (02:02 +0100)]
FreeBSD: Prevent a NULL reference in zvol_cdev_open

Check if the ZVOL has been written before calling zil_async_to_sync.
The ZIL will be opened on the first write, not earlier.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Closes #11152

3 years agoFreeBSD: Prevent NULL pointer dereference of resid
khng300 [Thu, 5 Nov 2020 00:50:08 +0000 (08:50 +0800)]
FreeBSD: Prevent NULL pointer dereference of resid

spa_config_load() passes NULL into resid when doing zfs_file_read().
This leads to a NULL pointer dereference when vfs.zfs.autoimport_disable=0.

Sponsored by: The FreeBSD Foundation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Ka Ho Ng <khng@freebsdfoundation.org>
Closes #11149

3 years agoSynchronize library ABI levels
Antonio Russo [Sat, 31 Oct 2020 14:39:58 +0000 (08:39 -0600)]
Synchronize library ABI levels

Bump library SOVERSION under Linux to match FreeBSD's.

Additionally, this bump properly accounts for the ABI changes relative
to ZoL 0.8.5 for the Linux build.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Issue #11144

3 years agoFreeBSD: zvol_os: Use SET_ERROR more judiciously
Ryan Moeller [Tue, 3 Nov 2020 17:21:09 +0000 (12:21 -0500)]
FreeBSD: zvol_os: Use SET_ERROR more judiciously

SET_ERROR is useful to trace errors, so use it where the errors occur
rather than factored out to the end of a function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11146

3 years agoZTS: zdb_block_size_histogram increase variance
Brian Behlendorf [Tue, 3 Nov 2020 17:20:34 +0000 (09:20 -0800)]
ZTS: zdb_block_size_histogram increase variance

The expected variance for this test case was originally set at 10%
based on local testing.  Additional testing via the CI has shown it
can be as large as 11%.  Increase the expected maximum to 12% to
prevent this test from incorrectly failing.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11148