Arvind Sankar [Sun, 7 Jun 2020 21:03:12 +0000 (17:03 -0400)]
Cleanup linux module kbuild files
The linux module can be built either as an external module, or compiled
into the kernel, using copy-builtin. The source and build directories
are slightly different between the two cases, and currently, compiling
into the kernel still refers to some files from the configured ZFS
source tree, instead of the copies inside the kernel source tree. There
is also duplication between copy-builtin, which creates a Kbuild file to
build ZFS inside the kernel tree, and the top-level module/Makefile.in.
Fix this by moving the list of modules and the CFLAGS settings into a
new module/Kbuild.in, which will be used by the kernel kbuild
infrastructure, and using KBUILD_EXTMOD to distinguish the two cases
within the Makefiles, in order to choose appropriate include
directories etc.
Module CFLAGS setting is simplified by using subdir-ccflags-y (available
since 2.6.30) to set them in the top-level Kbuild instead of each
individual module. The disabling of -Wunused-but-set-variable is removed
from the lua and zfs modules. The variable that the Makefile uses is
actually not defined, so this has no effect; and the warning has long
been disabled by the kernel Makefile itself.
The target_cpu definition in module/{zfs,zcommon} is removed as it was
replaced by use of CONFIG_SPARC64 in
commit 70835c5b755e ("Unify target_cpu handling").
os/linux/{spl,zfs} are removed from obj-m, as they are not modules in
themselves, but are included by the Makefile in the spl and zfs module
directories. The vestigial Makefiles in os and os/linux are removed.
Andrea Gelmini [Wed, 10 Jun 2020 04:24:09 +0000 (06:24 +0200)]
Fix typos
Correct various typos in the comments and tests.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #10423
Matthew Ahrens [Tue, 9 Jun 2020 17:41:01 +0000 (10:41 -0700)]
File incorrectly zeroed when receiving incremental stream that toggles -L
Background:
By increasing the recordsize property above the default of 128KB, a
filesystem may have "large" blocks. By default, a send stream of such a
filesystem does not contain large WRITE records; instead, it decreases
objects' block sizes to 128KB and splits the large blocks into 128KB
blocks, allowing the large-block filesystem to be received by a system
that does not support the `large_blocks` feature. A send stream
generated by `zfs send -L` (or `--large-block`) preserves the large
block size on the receiving system, by using large WRITE records.
When receiving an incremental send stream for a filesystem with large
blocks, if the send stream's -L flag was toggled, a bug is encountered
in which the file's contents are incorrectly zeroed out. The contents
of any blocks that were not modified by this send stream will be lost.
"Toggled" means that the previous send used `-L`, but this incremental
does not use `-L` (-L to no-L); or that the previous send did not use
`-L`, but this incremental does use `-L` (no-L to -L).
Changes:
This commit addresses the problem with several changes to the semantics
of zfs send/receive:
1. "-L to no-L" incrementals are rejected. If the previous send used
`-L`, but this incremental does not use `-L`, the `zfs receive` will
fail with this error message:
incremental send stream requires -L (--large-block), to match
previous receive.
2. "no-L to -L" incrementals are handled correctly, preserving the
smaller (128KB) block size of any already-received files that used large
blocks on the sending system but were split by `zfs send` without the
`-L` flag.
3. A new send stream format flag is added, `SWITCH_TO_LARGE_BLOCKS`.
This feature indicates that we can correctly handle "no-L to -L"
incrementals. This flag is currently not set on any send streams. In
the future, we intend for incremental send streams of snapshots that
have large blocks to use `-L` by default, and these streams will also
have the `SWITCH_TO_LARGE_BLOCKS` feature set. This ensures that streams
from the default use of `zfs send` won't encounter the bug mentioned
above, because they can't be received by software with the bug.
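The receive-side rules in items 1 and 2 can be summarized as a small decision function. This is only an illustrative sketch: the enum, the function name `large_block_toggle_action()`, and its boolean parameters are hypothetical and are not the names used in the OpenZFS receive code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes when receiving an incremental stream. */
typedef enum {
	RECV_PROCEED,			/* -L usage matches the previous receive */
	RECV_PRESERVE_SMALL_BLOCKS,	/* no-L to -L: keep already-split 128KB files */
	RECV_REJECT			/* -L to no-L: fail the receive */
} recv_action_t;

static recv_action_t
large_block_toggle_action(bool prev_used_large, bool stream_uses_large)
{
	if (prev_used_large && !stream_uses_large)
		return (RECV_REJECT);
	if (!prev_used_large && stream_uses_large)
		return (RECV_PRESERVE_SMALL_BLOCKS);
	return (RECV_PROCEED);
}
```

In this sketch, `RECV_REJECT` corresponds to the "incremental send stream requires -L (--large-block)" error quoted in item 1.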
Implementation notes:
To facilitate accessing the ZPL's generation number,
`zfs_space_delta_cb()` has been renamed to `zpl_get_file_info()` and
restructured to fill in a struct with ZPL-specific info including owner
and generation.
In the "no-L to -L" case, if this is a compressed send stream (from
`zfs send -cL`), large WRITE records that are being written to small
(128KB) blocksize files need to be decompressed so that they can be
written split up into multiple blocks. The zio pipeline will recompress
each smaller block individually.
A new test case, `send-L_toggle`, is added, which tests the "no-L to -L"
case and verifies that we get an error for the "-L to no-L" case.
Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #6224
Closes #10383
Igor K [Tue, 9 Jun 2020 17:31:16 +0000 (20:31 +0300)]
ZTS: Fix add-o_ashift.ksh
Use option '-o' after action for compatibility
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #10426
George Amanakis [Tue, 9 Jun 2020 17:15:08 +0000 (13:15 -0400)]
Trim L2ARC
The l2arc_evict() function is responsible for evicting buffers which
reference the next bytes of the L2ARC device to be overwritten. Teach
this function to additionally TRIM that vdev space before it is
overwritten if the device has been filled with data. This is done by
vdev_trim_simple() which trims by issuing a new type of TRIM,
TRIM_TYPE_SIMPLE.
We also implement a "Trim Ahead" feature. It is a zfs module parameter,
expressed in % of the current write size. This trims ahead of the
current write size. A minimum of 64MB will be trimmed. The default is 0
which disables TRIM on L2ARC, as it can put significant stress on the
underlying storage devices. To enable TRIM on L2ARC we set
l2arc_trim_ahead > 0.
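The "Trim Ahead" sizing described above amounts to a percentage with a 64MB floor. A minimal sketch follows, assuming the percentage is applied to the write size in bytes; the helper name `trim_ahead_bytes()` and the constant name are hypothetical, and the real code in arc.c may compute the distance differently.

```c
#include <assert.h>
#include <stdint.h>

/* 64MB minimum taken from the commit message. */
#define	MIN_TRIM_AHEAD_BYTES	(64ULL * 1024 * 1024)

/*
 * Hypothetical helper: bytes to TRIM ahead of a write of write_size bytes.
 * trim_ahead_pct mirrors the l2arc_trim_ahead module parameter; a value of
 * 0 means TRIM on L2ARC is disabled.
 */
static uint64_t
trim_ahead_bytes(uint64_t write_size, uint64_t trim_ahead_pct)
{
	uint64_t ahead;

	if (trim_ahead_pct == 0)
		return (0);
	/* Guard the multiplication below against overflow. */
	assert(write_size <= UINT64_MAX / trim_ahead_pct);
	ahead = (write_size * trim_ahead_pct) / 100;
	return (ahead > MIN_TRIM_AHEAD_BYTES ? ahead : MIN_TRIM_AHEAD_BYTES);
}
```

With an 8MB write and a setting of 100, the floor applies and 64MB is trimmed; with a 1GB write and a setting of 50, 512MB is trimmed.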
We also implement TRIM of the whole cache device upon addition to a
pool, pool creation or when the header of the device is invalid upon
importing a pool or onlining a cache device. This is dependent on
l2arc_trim_ahead > 0. TRIM of the whole device is done with
TRIM_TYPE_MANUAL so that its status can be monitored by zpool status -t.
We save the TRIM state for the whole device and the time of completion
on-disk in the header, and restore these upon L2ARC rebuild so that
zpool status -t can correctly report them. Whole device TRIM is done
asynchronously so that the user can export the pool or remove the
cache device while it is trimming (i.e. if it is too slow).
We do not TRIM the whole device if persistent L2ARC has been disabled by
l2arc_rebuild_enabled = 0 because we may not want to lose all cached
buffers (e.g. we may want to import the pool with
l2arc_rebuild_enabled = 0 only once because of memory pressure). If
persistent L2ARC has been disabled by setting the module parameter
l2arc_rebuild_blocks_min_l2size to a value greater than the size of the
cache device then the whole device is trimmed upon creation or import of
a pool if l2arc_trim_ahead > 0.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Adam D. Moss <c@yotes.com> Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #9713
Closes #9789
Closes #10224
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #10422
In Illumos it is possible to call ioctl functions from within the
kernel by passing the FKIOCTL flag. Neither FreeBSD nor Linux supports
that, but it doesn't hurt to keep it around, as all the code is there.
Before this commit it was dead code and zc_iflags was always zero.
Restore this functionality by allowing a flag to be passed to the
zfsdev_ioctl_common() function.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #10417
Removing excessive includes takes us a small step closer to
compiling this file in userland.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #10415
Paul Dagnelie [Mon, 8 Jun 2020 15:58:13 +0000 (08:58 -0700)]
Don't erase final byte of envblock
When we copy the envblock's contents out, we currently treat it as
a normal C string. However, this functionality is supposed to more
closely emulate interacting with a file. As a consequence, we were
incorrectly truncating the contents of the envblock by replacing
the final byte of the buffer with a null character.
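The difference between the two treatments can be shown with two toy copy routines. This is a sketch only; `copy_as_cstring()` and `copy_as_file()` are hypothetical names and do not correspond to functions in the envblock code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * String-style copy: the last byte of dst is reserved for a terminator,
 * so a source that fills the whole buffer loses its final byte.
 */
static size_t
copy_as_cstring(char *dst, const char *src, size_t size)
{
	assert(size > 0);
	memcpy(dst, src, size);
	dst[size - 1] = '\0';	/* overwrites the last data byte */
	return (size - 1);
}

/* File-style copy: every byte is payload and nothing is terminated. */
static size_t
copy_as_file(char *dst, const char *src, size_t size)
{
	memcpy(dst, src, size);
	return (size);
}
```

For a 4-byte source `{'a', 'b', 'c', 'd'}`, the string-style copy leaves `'\0'` in the last position, while the file-style copy preserves `'d'`.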
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10405
Jorgen Lundman [Sun, 7 Jun 2020 18:42:12 +0000 (03:42 +0900)]
Replace sprintf()->snprintf() and strcpy()->strlcpy()
The strcpy() and sprintf() functions are deprecated on some platforms.
Care is needed to ensure the correct size is used. If some platforms
lack snprintf(), we can add a #define to sprintf(), likewise for strlcpy().
The biggest change is adding a size parameter to zfs_id_to_fuidstr().
The various *_impl_get() functions are only used on linux and have
not yet been updated.
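As an illustration of the pattern of passing the destination size explicitly, here is a minimal sketch. The function `id_to_hexstr()` is hypothetical; it only mimics the idea of adding a size parameter, as was done for zfs_id_to_fuidstr(), and does not reproduce that function's output format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical formatter that takes the destination size from the caller
 * instead of assuming the buffer is large enough.
 * Returns 0 on success, or -1 if the output did not fit (buf is still
 * NUL-terminated in that case).
 */
static int
id_to_hexstr(uint64_t id, char *buf, size_t len)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "%llx", (unsigned long long)id);
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
```

Callers would typically pass `sizeof (buf)` for a local array, which is the part that needs care when converting existing sprintf() call sites.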
Reviewed by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10400
The pool may not be imported when the previous pass is terminated.
In which case, spa_open() will return ENOENT to indicate the pool
is not currently imported. Refactor the code slightly to handle
this case by importing the pool and then retrying the spa_open().
The ztest_import() function was moved before ztest_run() and the
import logic split into a small internal helper function. The
ztest_freeze() function was also moved but no changes were made.
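The open, import, retry flow described above can be sketched with toy stand-ins. Everything here (`toy_pool_t`, `toy_open()`, `toy_import()`, `open_importing_if_needed()`) is hypothetical and only models the control flow; it is not the ztest code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { bool imported; } toy_pool_t;

/* Stand-in for spa_open(): ENOENT means the pool is not imported. */
static int
toy_open(const toy_pool_t *p)
{
	return (p->imported ? 0 : ENOENT);
}

/* Stand-in for the import helper. */
static int
toy_import(toy_pool_t *p)
{
	p->imported = true;
	return (0);
}

static int
open_importing_if_needed(toy_pool_t *p)
{
	int err;

	assert(p != NULL);
	err = toy_open(p);
	if (err == ENOENT) {
		/* Pool was left exported by the previous pass: import, then retry. */
		err = toy_import(p);
		if (err == 0)
			err = toy_open(p);
	}
	return (err);
}
```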
Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10407
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #10386
Paul Dagnelie [Thu, 4 Jun 2020 02:53:21 +0000 (19:53 -0700)]
Fix double mutex_init bug in send code
It was possible to cause a kernel panic in the send code by
initializing an already-initialized mutex, if a record was created
with type DATA, destroyed with a different type (bypassing the
mutex_destroy call) and then re-allocated as a DATA record again.
We tweak the logic to not change the type of a record once it has
been created, avoiding the issue.
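A toy model of why the record type must stay fixed: creation and destruction both key off the type, so changing it in between skips the destroy and leads to a second init when the memory is reused. The types and functions below are hypothetical stand-ins, not the structures used in the send code.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { REC_DATA, REC_HOLE } rec_type_t;

/* Toy mutex that only tracks whether it is currently initialized. */
typedef struct { bool initialized; } toy_mutex_t;

typedef struct {
	rec_type_t type;	/* fixed at creation time */
	toy_mutex_t lock;	/* only used by REC_DATA records */
} record_t;

static void
toy_mutex_init(toy_mutex_t *m)
{
	assert(!m->initialized);	/* a double init is the bug being avoided */
	m->initialized = true;
}

static void
toy_mutex_destroy(toy_mutex_t *m)
{
	assert(m->initialized);
	m->initialized = false;
}

static void
record_create(record_t *r, rec_type_t type)
{
	r->type = type;
	if (type == REC_DATA)
		toy_mutex_init(&r->lock);
}

static void
record_destroy(record_t *r)
{
	/* Correct only because r->type never changes after record_create(). */
	if (r->type == REC_DATA)
		toy_mutex_destroy(&r->lock);
}
```

If code between `record_create()` and `record_destroy()` set `r->type = REC_HOLE`, the destroy would be skipped and the next `record_create(r, REC_DATA)` on the same memory would trip the assertion in `toy_mutex_init()`.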
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10374
Ryan Moeller [Wed, 3 Jun 2020 17:45:12 +0000 (13:45 -0400)]
FreeBSD: Simplify zvol and fix locking
zvol_geom_bio_strategy should handle its own use of the zvol
suspend reader lock and ensure the zilog exists when needed.
A few other places using the zvol zilog should use the suspend
reader lock as well.
Simplify consumers of zvol_geom_bio_strategy, fix the locking, and
while in here, use the boolean_t constants with doread.
Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10381
Ryan Moeller [Wed, 3 Jun 2020 16:52:38 +0000 (12:52 -0400)]
Periodically update ARC kstats
FreeBSD needs arc_adjust_zthr to run periodically for kstats to be
updated. A comment in the code suggests this may have been the
original intent in illumos as well:
Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10371
Jorgen Lundman [Wed, 3 Jun 2020 16:49:32 +0000 (01:49 +0900)]
Restore avl_update() calls and related functions
The macOS kmem implementation uses avl_update() and related
functions. These same functions exist in the Solaris AVL code but
were removed because they were unused. Restore them.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10390
Matthew Macy [Sat, 30 May 2020 19:54:57 +0000 (12:54 -0700)]
Fix crypto build on FreeBSD HEAD
Update API usage to reflect recent change.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10384
Add bootfs.snapshot and bootfs.rollback kernel parameters
Unlike other filesystems, snapshots and rollbacks of bootfs need to be
done from a rescue environment. This patch makes it possible to snapshot
or roll back the bootfs simply by specifying bootfs.snapshot or
bootfs.rollback on the kernel command line. The operation will be
performed by dracut just before bootfs is mounted.
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #10198
Brian Behlendorf [Sat, 30 May 2020 04:14:10 +0000 (21:14 -0700)]
ztest: Fix ztest_run_zdb() failure
It's possible for ztest to be killed while the pool is exported
which results in an empty cache file. This is a valid state to
test, but the validation check performed by ztest_run_zdb()
depends on the pool being in the cache file. If it's not, the
following error is printed:
zdb -bccsv -G -d -Y -U /tmp/zloop-run/zpool.cache ztest
zdb: can't open '/tmp/zloop-run': No such file or directory
Resolve these failures by removing the dependency on the cache
file. Functionally, we only care that the pool can be imported
and that the zdb verification passes.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10385
allen-4 [Fri, 29 May 2020 19:01:57 +0000 (15:01 -0400)]
Update zfs-functions.in
The init.d zfs-share script does not perform the intended
action without having a variable set for ZFS_SHARE and
ZFS_UNSHARE.
Assign default values to ZFS_SHARE and ZFS_UNSHARE. Export
the environment variables after sourcing the configuration
file.
Reviewed-by: Richard Yao <ryao@gentoo.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org> Signed-off-by: Allen Holl <allen.m.holl@gmail.com>
Closes #10341
Closes #10382
John Gallagher [Thu, 28 May 2020 00:27:28 +0000 (17:27 -0700)]
Rework error handling in zpool_trim()
When a manual trim is run against an entire pool, errors about
particular devices which don't support trim are suppressed. This changes
zpool_trim() in libzfs so that it doesn't return an error when the only
errors are suppressed ones. An exception is made when none of the
devices support trim, in which case an error is reported and a non-zero
status is returned.
This also fixes how the --wait flag works in the presence of suppressed
errors. In particular, suppressed errors no longer cause zpool_trim()
to skip the wait.
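The reporting rule for a whole-pool manual trim can be sketched as a reduction over per-device results. The enum and `pool_trim_status()` are hypothetical and only model the policy described above; the actual zpool_trim() in libzfs works on nvlists of vdevs and errors.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
	VDEV_TRIM_STARTED,	/* trim was kicked off on this device */
	VDEV_TRIM_UNSUPPORTED,	/* device cannot trim; suppressed for whole-pool runs */
	VDEV_TRIM_FAILED	/* any other per-device error */
} vdev_trim_result_t;

/*
 * Hypothetical summary of a whole-pool manual trim.
 * Returns 0 on success and -1 when an error should be reported.
 */
static int
pool_trim_status(const vdev_trim_result_t *res, size_t n)
{
	size_t unsupported = 0;

	assert(res != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (res[i] == VDEV_TRIM_FAILED)
			return (-1);
		if (res[i] == VDEV_TRIM_UNSUPPORTED)
			unsupported++;
	}
	/* Suppressed errors only count when no device could trim at all. */
	return ((n > 0 && unsupported == n) ? -1 : 0);
}
```

Under this model, a caller implementing --wait would proceed to wait whenever the result is 0, even if some devices reported the suppressed "unsupported" error.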
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: John Gallagher <john.gallagher@delphix.com>
Closes #10263
Closes #10372
Ryan Moeller [Thu, 28 May 2020 00:18:06 +0000 (20:18 -0400)]
ZTS: Retry export/destroy when busy in zpool_import_012
It can take a moment for the NFS server to give up the mountpoint
after unsharing a filesystem.
Use log_must_busy to retry export/destroy a few times after switching
off sharenfs.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10380
Brian Behlendorf [Tue, 26 May 2020 23:07:50 +0000 (16:07 -0700)]
Revert "Let zfs mount all tolerate in-progress mounts"
This reverts commit a9cd8bf which introduced a segfault when running
`zfs mount -a` multiple times when there are mountpoints which are
not empty. This segfault is now seen frequently by the CI after
the mount code was updated to directly call mount(2).
The original reason this logic was added is described in #8881.
Since then the systemd `zfs-share.target` has been updated to run
"After" the `zfs-mount.service` which should avoid this issue.
Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9560
Closes #10364
Marcel Schilling [Tue, 26 May 2020 22:09:25 +0000 (00:09 +0200)]
Fix dead links http://list.zfsonlinux.org
Originally, I wanted to point directly to
https://zfsonlinux.topicbox.com/groups/zfs-discuss
as the text refers to that specific mailing list, but George Melikov
requested changing it to the general one to give users an overview.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Marcel Schilling <marcel.schilling@uni-luebeck.de>
Closes #10367
Closes #10369
Brian Behlendorf [Sun, 24 May 2020 00:13:42 +0000 (17:13 -0700)]
ZTS: Fix zfs_mount.kshlib cleanup
Update cleanup_filesystem to use destroy_dataset when performing
cleanup. This ensures the destroy is retried if the pool is busy,
preventing occasional failures.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10358
Brian Atkinson [Thu, 21 May 2020 01:06:09 +0000 (19:06 -0600)]
Gang ABD Type
Adding the gang ABD type, which allows for linear and scatter ABDs to
be chained together into a single ABD.
This can be used to avoid doing memory copies to/from ABDs. An example
of this can be found in vdev_queue.c in the vdev_queue_aggregate()
function.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Brian <bwa@clemson.edu> Co-authored-by: Mark Maybee <mmaybee@cray.com> Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #10069
felixdoerre [Thu, 21 May 2020 01:02:41 +0000 (04:02 +0300)]
mount: use the mount syscall directly
Allow zfs datasets to be mounted on Linux without relying on the
invocation of an external process. This is the same behavior
which is implemented for FreeBSD.
Use of the libmount library was originally considered because it
provides functionality to properly lock and update the /etc/mtab
file. However, these days /etc/mtab is typically a symlink to
/proc/self/mounts so there's nothing to update. Therefore, we
call mount(2) directly and avoid any additional dependencies.
If required the legacy behavior can be enabled by setting the
ZFS_MOUNT_HELPER environment variable. This may be needed in
environments where SELinux is enabled and the zfs binary does
not have mount permission.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Felix Dörre <felix@dogcraft.de>
#10294
DeHackEd [Wed, 20 May 2020 17:07:21 +0000 (13:07 -0400)]
Use boot_ncpus in place of max_ncpus in taskq_create
Due to hotplug support or BIOS bugs sometimes max_ncpus can be
an absurdly high value. I have a system with 32 cores/threads
that reports max_ncpus == 440. This many threads can potentially
cripple the system during arc_prune floods, for example.
boot_ncpus is the number of working CPUs when called so use
that instead.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: DHE <git@dehacked.net>
Closes #10282
George Amanakis [Tue, 19 May 2020 21:24:10 +0000 (17:24 -0400)]
Fix gcc 10.1 stringop-truncation error
As we do not expect the destination of these strncpy() calls to be
NUL-terminated, substitute them with memcpy().
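A minimal sketch of the substitution for a fixed-width field that is never terminated. The struct `hdr_t`, the constant `TAG_LEN`, and `hdr_set_tag()` are hypothetical; the sketch also assumes the source is at least `TAG_LEN` bytes long, since memcpy() always reads the full length whereas strncpy() stops reading at the first NUL.

```c
#include <assert.h>
#include <string.h>

#define	TAG_LEN	4

/* Fixed-width field: exactly TAG_LEN bytes, no terminator expected. */
typedef struct {
	char tag[TAG_LEN];
} hdr_t;

static void
hdr_set_tag(hdr_t *h, const char *src)
{
	assert(h != NULL && src != NULL);
	/*
	 * strncpy(h->tag, src, TAG_LEN) can trigger gcc's
	 * -Wstringop-truncation because the result may lack a terminator.
	 * memcpy() states the intent directly: copy TAG_LEN raw bytes.
	 * src must provide at least TAG_LEN readable bytes.
	 */
	memcpy(h->tag, src, TAG_LEN);
}
```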
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10346
Kyle Evans [Sat, 16 May 2020 17:12:01 +0000 (12:12 -0500)]
freebsd: return EISDIR for read(2) on directories
This is arguably a change for internal consistency within OpenZFS, as the
Linux implementation will reject read(2) on directories with EISDIR. It's
not unreasonable for read(2) to do something here on FreeBSD, but we don't
currently copy out anything useful anyway, so start rejecting it with the
appropriate error.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #10338
ColMelvin [Fri, 15 May 2020 03:51:33 +0000 (22:51 -0500)]
RPM: Remove old versions of DKMS on upgrade
Due to a mismatch between the text and a regex looking for that text,
the `%preuninstall` script would never run the `dkms remove` command
necessary to avoid corrupting the DKMS data configuration. Increase
regex specificity to avoid this issue.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Lindee <chris.lindee+github@gmail.com> Closes: #9891
Closes #10327
Matthew Ahrens [Fri, 15 May 2020 03:48:29 +0000 (20:48 -0700)]
Fix error handling in receive_writer_thread()
If `receive_writer_thread()` gets an error from `receive_process_record()`,
it should be saved in `rwa->err` so that we will stop processing records,
and the main thread will notice that the receive has failed.
When an error is first encountered, this happens correctly. However, if
there are more records to dequeue, the next time through the loop we
will reset `rwa->err` to zero, allowing us to try to process the
following record (2 after the failed record). Depending on what types
of records remain, we may incorrectly complete the receive
"successfully", but without actually having processed all the records.
The fix is to only set `rwa->err` if we got a *non-zero* error.
This bug was introduced by #10099 "Improve zfs receive performance by
batching writes".
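The loop behaviour described above can be modelled in a few lines. This is a simplified sketch, not the real receive_writer_thread(): `writer_args_t`, `process_record()`, and `writer_loop()` are hypothetical, and records are reduced to plain integers where a negative value stands for a record that fails to process.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef struct {
	int err;	/* first error seen, 0 if none (models rwa->err) */
	int processed;	/* number of records successfully processed */
} writer_args_t;

/* Hypothetical stand-in for receive_process_record(). */
static int
process_record(int rec)
{
	return (rec < 0 ? EIO : 0);
}

static void
writer_loop(writer_args_t *rwa, const int *records, size_t n)
{
	assert(rwa != NULL);
	for (size_t i = 0; i < n; i++) {
		int err = 0;

		/* After a failure, remaining records are drained, not processed. */
		if (rwa->err == 0) {
			err = process_record(records[i]);
			if (err == 0)
				rwa->processed++;
		}
		/*
		 * The fix: only latch non-zero errors. An unconditional
		 * "rwa->err = err" here would reset the saved error to zero
		 * on the iteration after the failure.
		 */
		if (err != 0)
			rwa->err = err;
	}
}
```

With records `{1, -1, 2, 3}`, the sketch stops processing after the second record and keeps `EIO` in `err`; with the unconditional assignment, the fourth record would have been processed and the loop would have ended with `err == 0`.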
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10320
Brian Behlendorf [Fri, 15 May 2020 03:45:16 +0000 (20:45 -0700)]
Fix abd_enter/exit_critical wrappers
Commit fc551d7 introduced the wrappers abd_enter_critical() and
abd_exit_critical() to mark critical sections. On Linux these are
implemented with the local_irq_save() and local_irq_restore() macros
which set the 'flags' argument when saving. By wrapping them with
a function the local variable is no longer set by the macro and is
no longer properly restored.
Convert abd_enter_critical() and abd_exit_critical() to macros to
resolve this issue and ensure the flags are properly restored.
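The pitfall is a general C one and can be reproduced in userspace. In this sketch, `fake_irq_save()` and `fake_irq_restore()` are hypothetical stand-ins for local_irq_save() and local_irq_restore(), and the wrapper names are illustrative rather than the actual ABD definitions.

```c
#include <assert.h>

static unsigned long fake_irq_state = 1;	/* 1 = "interrupts enabled" */

/* Like the kernel macros, the save macro assigns to its argument. */
#define	fake_irq_save(flags) \
	do { (flags) = fake_irq_state; fake_irq_state = 0; } while (0)
#define	fake_irq_restore(flags) \
	do { fake_irq_state = (flags); } while (0)

/*
 * Broken wrapper: 'flags' is a by-value copy, so the macro writes to the
 * copy and the caller's variable is never set.
 */
static void
enter_critical_fn(unsigned long flags)
{
	fake_irq_save(flags);
}

/* Fixed wrappers: macros expand in the caller, so its variable is written. */
#define	enter_critical(flags)	fake_irq_save(flags)
#define	exit_critical(flags)	fake_irq_restore(flags)
```

Calling `enter_critical_fn(f)` leaves the caller's `f` unchanged, so a later restore would write back whatever `f` happened to hold; `enter_critical(f)` stores the saved state in `f` and `exit_critical(f)` puts it back.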
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10332
Brian Behlendorf [Thu, 14 May 2020 16:41:29 +0000 (09:41 -0700)]
flake8 E741 variable name warning
Update the zts-report.py script to conform to the flake8 E741 rule.
"Variables named I, O, and l can be very hard to read. This is
because the letter I and the letter l are easily confused, and
the letter O and the number 0 can be easily confused."
- https://www.flake8rules.com/rules/E741.html
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10323
The cleanup routine for this test attempts to remove some temporary
files with `rm -f $VDEV_*`, but VDEV_ is undefined. As a result, all
files in the current working directory (/var/tmp/test_results/current)
get removed instead. This includes the complete log file of all tests.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: George Amanakis <gamanakis@gmail.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #10324
John Poduska [Wed, 13 May 2020 17:54:27 +0000 (13:54 -0400)]
Resilver restarts unnecessarily when it encounters errors
When a resilver finishes, vdev_dtl_reassess is called to hopefully
excise DTL_MISSING (amongst other things). If there are errors during
the resilver, they are tracked in DTL_SCRUB, as spelled out in the
block comment in vdev.c. DTL_SCRUB is in-core only, so it can only
be used if the pool was online for the whole resilver. This state is
tracked with the spa_scrub_started flag, which only gets set when
the scan is initialized. Unfortunately, this flag gets cleared right
before vdev_dtl_reassess gets called, so if there are any errors
during the scan, DTL_MISSING will never get excised and the resilver
will just continually restart. This fix simply moves the clearing of that
flag to after the call to vdev_dtl_reassess.
In addition, if a pool is imported and already has scn_errors > 0,
this change will restart the resilver immediately instead of doing
the rest of the scan and then restarting it from the beginning. On
the other hand, if scn_errors == 0 at import, then no errors have
been encountered so far, so the spa_scrub_started flag can be safely
set.
A test has been added to verify that resilver does not restart when
relevant DTLs are available.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Signed-off-by: John Poduska <jpoduska@datto.com>
Closes #10291
AJ Jordan [Mon, 4 May 2020 08:00:59 +0000 (04:00 -0400)]
Fix outdated comment header
Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: AJ Jordan <alex@strugee.net>
Closes #10288
AJ Jordan [Mon, 4 May 2020 07:49:33 +0000 (03:49 -0400)]
Fix up arcstat(1) to match our version
Turns out the illumos manpage, which is what this originates from, was
written for the original Perl version of the utility which is not the
version in the OpenZFS tree. *That* version originates from a Python
rewrite that was done for FreeNAS. So fix up the manpage to match what
we actually ship (and fix a few typos in the process).
Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: AJ Jordan <alex@strugee.net>
Closes #10288
Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: AJ Jordan <alex@strugee.net>
Closes #10288
AJ Jordan [Thu, 7 May 2020 21:49:00 +0000 (17:49 -0400)]
Fix inconsistent capitalization in `arcstat -v`
Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: AJ Jordan <alex@strugee.net>
Closes #10288
Richard Laager [Sun, 10 May 2020 19:26:08 +0000 (14:26 -0500)]
Change zfsunlock for better busybox compatibility
It turns out that there are two versions of Busybox, at least on Ubuntu
18.04. If you have the busybox-static package installed, you get a
busybox that supports `ps a` and `head`. If you only have
busybox-initramfs, you don't. Either way, you have `awk`.
This change should also make this compatible with GNU ps, if you somehow
end up with that in the initramfs environment.
Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Andrey Prokopenko <job@terem.fr> Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10307
Brian Atkinson [Sun, 10 May 2020 19:23:52 +0000 (13:23 -0600)]
Combine OS-independent ABD Code into Common Source File
Reorganize the ABD code base so that OS-independent ABD code is placed
into a common abd.c file. OS-dependent ABD code has been left in each
OS's ABD source files, and these source files have been renamed to
abd_os.
The OS-independent ABD code is now under:
module/zfs/abd.c
With the OS-dependent code in:
module/os/linux/zfs/abd_os.c
module/os/freebsd/zfs/abd_os.c
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #10293
Fixed LDADD library links in Makefiles for cross compilation builds
When building on a native dev system there are no issues, but when
cross-compiling for a target system some linker errors are observed.
The only way to avoid these errors is by adjusting the Makefile.am
of those various components to add the library dependencies.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Petros Koutoupis <petros@petroskoutoupis.com>
Closes #10304
When recursively destroying the dataset it's possible for the
dataset volume to be open by an unrelated process, like blkid.
Use destroy_dataset(), which will retry when this occurs.
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10305
This commit adds a new feature for Debian-based distributions to unlock
an encrypted root partition over SSH. This feature is very handy on
headless NAS or VPS cloud servers. To use this feature, you will need
to install the dropbear-initramfs package.
Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-By: Tom Caputi <tcaputi@datto.com> Signed-off-by: Andrey Prokopenko <job@terem.fr> Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10027
Richard Laager [Sat, 2 May 2020 23:46:46 +0000 (18:46 -0500)]
Cleanup contrib/initramfs automake
The initramfs hook scripts depend on Makefile. This way, if the
substitution code is changed, they should update. This brings it in
line with etc/init.d (which was modified to match the example in the
automake docs).
The initramfs hook script cleaning now matches etc/init.d.
There was a mix of SUBDIRS recursion and custom install rules for files
in subdirectories. This was duplicated for the "hooks" and "scripts"
subdirectories. Now everything uses SUBDIRS.
I fixed the substitution of DEFAULT_INITCONF_DIR for hooks/zfs.
Reviewed-By: Andrey Prokopenko <job@terem.fr> Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-By: Tom Caputi <tcaputi@datto.com> Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10027
George Amanakis [Thu, 7 May 2020 23:34:03 +0000 (19:34 -0400)]
Improvements on persistent L2ARC
Functional changes:
We implement refcounts of log blocks and their aligned size on the
cache device along with two corresponding arcstats. The refcounts are
reflected in the header of the device and provide valuable information
as to whether log blocks are accounted for correctly. These are
dynamically adjusted as log blocks are committed/evicted. zdb also uses
this information in the device header and compares it to the
corresponding values as reported by dump_l2arc_log_blocks() which
emulates l2arc_rebuild(). If the refcounts saved in the device header
report higher values, zdb exits with an error. For this feature to work
correctly there should be no active writes on the device. This is also
employed in the tests of persistent L2ARC. We extend the structure of
the cache device header by adding the two new variables mirroring the
refcounts after the existing variables to preserve backward
compatibility in terms of persistent L2ARC.
1) a new arcstat "l2_log_blk_asize" and refcount "l2ad_lb_asize" which
reflect the total aligned size of log blocks on the device. This is
also reflected in the header of the cache device as "dh_lb_asize".
2) a new arcstat "l2arc_log_blk_count" and refcount "l2ad_lb_count"
which reflect the total number of L2ARC log blocks present on cache
devices. It is also reflected in the header of the cache device as
"dh_lb_count".
In l2arc_rebuild_vdev() if the amount of committed log entries in a log
block is 0 and the device header is valid we update the device header.
This will facilitate trimming of the whole device in this case when
TRIM for L2ARC is implemented.
Improve loop protection in l2arc_rebuild() by using the starting offset
of the payload of each log block instead of the starting offset of the
log block.
If the zio in l2arc_write_buffers() fails, restore the lbps array in the
header of the device to its previous state in l2arc_write_done().
If l2arc_rebuild() ends the rebuild process without restoring any L2ARC
log blocks in ARC and without any other error, this means that the lbps
array in the header is pointing to non-existent or invalid log blocks.
Reset the device header in this case.
In l2arc_rebuild() change the zfs_dbgmsg messages to
spa_history_log_internal(), making them user-visible with the zpool
history command.
Non-functional changes:
Make the first test in persistent L2ARC use `zdb -lll` to increase
coverage in `zdb.c`.
Rename psize to asize when referring to log blocks, since
L2ARC_SET_PSIZE stores the vdev aligned size for log blocks. Also
rename dh_log_blk_entries to dh_log_entries to make it clear that
it is a mirror of l2ad_log_entries. Added comments for both changes.
Fix inaccurate comments, for example in l2arc_log_blk_restore().
Add asserts at the end in l2arc_evict() and l2arc_write_buffers().
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10228
Paul Dagnelie [Thu, 7 May 2020 16:36:33 +0000 (09:36 -0700)]
Add support for boot environment data to be stored in the label
Modern bootloaders leverage data stored in the root filesystem to
enable some of their powerful features. GRUB specifically has a grubenv
file which can store large amounts of configuration data that can be
read and written at boot time and during normal operation. This allows
sysadmins to configure useful features like automated failover after
failed boot attempts. Unfortunately, due to the Copy-on-Write nature
of ZFS, the standard behavior of these tools cannot handle writing to
ZFS files safely at boot time. We need an alternative way to store
data that allows the bootloader to make changes to the data.
This work is very similar to work that was done on Illumos to enable
similar functionality in the FreeBSD bootloader. This patch is different
in that the data being stored is a raw grubenv file; this file can store
arbitrary variables and values, and the scripting provided by grub is
powerful enough that special structures are not required to implement
advanced behavior.
We repurpose the second padding area in each label to store the grubenv
file, protected by an embedded checksum. We add two ioctls to get and
set this data, and libzfs_core and libzfs functions to access them more
easily. There are no direct command line interfaces to these functions;
these will be added directly to the bootloader utilities.
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10009
Philip Pokorny [Thu, 7 May 2020 00:17:38 +0000 (17:17 -0700)]
Fix column width calculation issue with certain terminal widths
If the reported terminal width is 0 or less than 42, the signed variable
width was set to a negative number that was then assigned to the
unsigned column width becoming a huge number.
Add comments and change logic to better explain what's happening.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Philip Pokorny <ppokorny@mindspring.com>
Closes #10247
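The wrap-around described in the column width fix above can be avoided
by clamping in the signed domain before converting to an unsigned type.
The following is a minimal sketch of that idea, not the code from the
patch: the function name column_width(), the RESERVED_COLS constant and
the min_width parameter are assumptions made for illustration (only the
threshold of 42 comes from the commit message).

```c
#include <stddef.h>

/* Hypothetical: columns reserved for the fixed-width fields. */
#define	RESERVED_COLS	42

/*
 * Return the number of columns left for the variable-width field.
 * The subtraction is done in a signed type and clamped before the
 * result is converted to size_t, so a narrow terminal (or one that
 * reports a width of 0) can never wrap around to a huge value.
 */
size_t
column_width(int term_width, size_t min_width)
{
	int avail = term_width - RESERVED_COLS;

	if (avail < 0 || (size_t)avail < min_width)
		return (min_width);
	return ((size_t)avail);
}
```

With a reported width of 0 the sketch falls back to min_width instead
of producing a value near SIZE_MAX.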
George Amanakis [Wed, 6 May 2020 17:32:28 +0000 (13:32 -0400)]
Enable splitting mirrors with indirect vdevs
When a top-level vdev is removed from a pool it is converted to an
indirect vdev. Until now splitting such mirrored pools was not possible
with zpool split. This patch enables handling of indirect vdevs and
splitting of those pools with zpool split.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10283
alaviss [Mon, 4 May 2020 22:25:48 +0000 (22:25 +0000)]
config/kernel-inode-times: initialize timespec
Usage of this variable uninitialized triggers -Werror,-Wuninitialized
when compiled under clang for linux kernel 5.6, leading the build system
to believe that the function is not declared.
This commit initializes the variable to suppress the warning and fix the
build for kernel 5.6 with clang.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Hiếu Lê <leorize+oss@disroot.org>
Closes #10279
Closes #10281
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Ported-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10270
Ryan Moeller [Mon, 4 May 2020 22:07:04 +0000 (18:07 -0400)]
Update FreeBSD SPL atomics
Sync up with the following changes from FreeBSD:
ZFS: add emulation of atomic_swap_64 and atomic_load_64
Some 32-bit platforms do not provide 64-bit atomic operations that ZFS
requires, either in userland or at all. We emulate those operations
for those platforms using a mutex. That is not entirely correct and
it's very efficient. Besides, the loads are plain loads, so torn
values are possible.
Nevertheless, the emulation seems to work for some definition of work.
This change adds atomic_swap_64, which is already used in ZFS code,
and atomic_load_64 that can be used to prevent torn reads.
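As a rough illustration of the mutex-based emulation described above,
the sketch below serializes a 64-bit swap and a 64-bit load through a
single lock. It is a userland approximation using pthreads, not the
FreeBSD code; the emu_ prefix on the function names is an assumption,
while atomic_mtx is the name the commit message uses for the lock.

```c
#include <pthread.h>
#include <stdint.h>

/* One global lock serializes every emulated 64-bit operation. */
static pthread_mutex_t atomic_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Store value into *target and return the previous contents. */
uint64_t
emu_atomic_swap_64(volatile uint64_t *target, uint64_t value)
{
	uint64_t old;

	pthread_mutex_lock(&atomic_mtx);
	old = *target;
	*target = value;
	pthread_mutex_unlock(&atomic_mtx);
	return (old);
}

/*
 * Read *target under the same lock, so the load cannot observe a
 * value half-written by emu_atomic_swap_64().
 */
uint64_t
emu_atomic_load_64(volatile uint64_t *target)
{
	uint64_t value;

	pthread_mutex_lock(&atomic_mtx);
	value = *target;
	pthread_mutex_unlock(&atomic_mtx);
	return (value);
}
```

The protection only holds against other accesses that take the same
lock; a plain store to the same location can still race with it.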
atomic_cas_32 is implemented using atomic_fcmpset_32 on all platforms.
Ditto for atomic_cas_64 and atomic_fcmpset_64 on platforms that have
it. The only exception is sparc64 that provides MD atomic_cas_32 and
atomic_cas_64.
This is slightly inefficient as fcmpset reports whether the operation
updated the target and that information is not needed for cas.
Nevertheless, there is less code to maintain and to add for new
platforms. Also, the operations are done inline now as opposed to
function calls before.
atomic_add_64_nv is implemented using atomic_fetchadd_64 on platforms
that provide it.
casptr, cas32, atomic_or_8, atomic_or_8_nv are completely removed as
they have no users.
atomic_mtx that is used to emulate 64-bit atomics on platforms that
lack them is defined only on those platforms.
As a result, platform specific opensolaris_atomic.S files have lost
most of their code. The only exception is i386 where the
compat+contrib code provides 64-bit atomics for userland use. That
code assumes availability of cmpxchg8b instruction. FreeBSD does not
have that assumption for i386 userland and does not provide 64-bit
atomics. Hopefully, this can and will be fixed.
emulate illumos membar_producer with atomic_thread_fence_rel
membar_producer is supposed to be a store-store barrier.
Also, in the code that FreeBSD has ported from illumos membar_producer
is used only with regular stores to regular memory (with respect to
caching).
We do not have an MI primitive for the store-store barrier, so
atomic_thread_fence_rel is the closest we have as it provides
(load | store) -> store barrier.
Previously, membar_producer was an empty function call on all 32-bit
arm-s, 32-bit powerpc, riscv and all mips variants. I think that it
was inadequate.
On other platforms, such as amd64, arm64, i386, powerpc64, sparc64,
membar_producer was implemented using stronger primitives than required
for a store-store barrier with respect to regular memory access.
For example, it used sfence on amd64 and lock-ed nop in i386 (despite
TSO).
On powerpc64 we now use recommended lwsync instead of eieio.
On sparc64 FreeBSD uses TSO mode.
On arm64/aarch64 we now use dmb sy instead of dmb ish. Not sure if
this is an improvement, actually.
After this change we can drop opensolaris_atomic.S for aarch64, amd64,
powerpc64 and sparc64 as all required atomic operations have either
direct or light-weight mapping to FreeBSD native atomic operations.
fix up r353340, don't assume that fcmpset has strong semantics
fcmpset can have two kinds of semantics, weak and strong.
For practical purposes, strong semantics means that if fcmpset fails
then the reported current value is always different from the expected
value. Weak semantics means that the reported current value may be the
same as the expected value even though fcmpset failed. That's a so
called "sporadic" failure.
I originally implemented atomic_cas expecting strong semantics, but
many platforms actually have weak one.
Reported by: pkubaj (not confirmed if same issue)
Discussed with: kib, mjg
Authored by: avg <avg@FreeBSD.org>
FreeBSD-commit: freebsd/freebsd@238787c74e737e271f17330fbad900acc35651c
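The weak-semantics issue above can be sketched with C11 atomics, whose
atomic_compare_exchange_weak() is likewise allowed to fail sporadically.
This is an illustration under that substitution, not the FreeBSD
implementation; the name cas_32_sketch() is made up for the example.
The loop retries only when the failure was sporadic, that is, when the
reported current value still equals the expected one.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Compare-and-swap that returns the value observed in *target.
 * atomic_compare_exchange_weak() stands in for fcmpset here: on
 * failure it stores the current value in "expected", and it may fail
 * even though that value equals cmp.
 */
uint32_t
cas_32_sketch(_Atomic uint32_t *target, uint32_t cmp, uint32_t newval)
{
	uint32_t expected = cmp;

	do {
		if (atomic_compare_exchange_weak(target, &expected, newval))
			break;		/* swapped; expected == cmp */
	} while (expected == cmp);	/* sporadic failure: retry */

	return (expected);
}
```

A caller that compares the return value with cmp therefore never sees
a "failed" result that looks like a success.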
[PowerPC] [MIPS] Implement 32-bit kernel emulation of atomic64 operations
This is a lock-based emulation of 64-bit atomics for kernel use, split off
from an earlier patch by jhibbits.
This is needed to unblock future improvements that reduce the need for
locking on 64-bit platforms by using atomic updates.
The implementation allows for future integration with userland atomic64,
but as that implies going through sysarch for every use, the current
status quo of userland doing its own locking may be for the best.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Ported-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10250
Ryan Moeller [Fri, 1 May 2020 00:50:16 +0000 (20:50 -0400)]
ZTS: Count CKSUM for all vdevs in verify_pool
The verify_pool function should detect checksum errors on any vdev, but
it was only checking at the root of the pool.
Accumulate the errors for all vdevs to obtain the correct count.
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10271
Ryan Moeller [Fri, 1 May 2020 00:48:58 +0000 (20:48 -0400)]
zdb: Fix ignored zfs_arc_max tuning
Running zdb -l $disk shows a warning that zfs_arc_max is being ignored.
zdb sets zfs_arc_max below zfs_arc_min, which causes the value to be
ignored by arc_tuning_update().
Set zfs_arc_min to the bare minimum in zdb, which is below zfs_arc_max.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allanjude@freebsd.org> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10269
Paul B. Henson [Fri, 6 Dec 2019 05:35:38 +0000 (05:35 +0000)]
OpenZFS 6765 - zfs_zaccess_delete() comments do not accurately
reflect delete permissions for ACLs
Authored by: Kevin Crowe <kevin.crowe@nexenta.com>
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Paul B. Henson <henson@acm.org>
Porting Notes:
* Only comments are updated
Paul B. Henson [Thu, 5 Dec 2019 04:30:02 +0000 (04:30 +0000)]
OpenZFS 6764 - zfs issues with inheritance flags during chmod(2)
with aclmode=passthrough
Authored by: Albert Lee <trisk@nexenta.com>
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Paul B. Henson <henson@acm.org>
OpenZFS-issue: https://www.illumos.org/issues/6764
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/de0f1ddb59
Closes #10266
Paul B. Henson [Thu, 5 Dec 2019 00:45:14 +0000 (00:45 +0000)]
OpenZFS 3254 - add support in zfs for aclmode=restricted
Authored-by: Paul B. Henson <henson@acm.org>
Reviewed by: Albert Lee <trisk@nexenta.com>
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Paul B. Henson <henson@acm.org>
OpenZFS-issue: https://www.illumos.org/issues/3254
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/71dbfc287c
Closes #10266
Paul B. Henson [Thu, 5 Dec 2019 00:35:18 +0000 (00:35 +0000)]
OpenZFS 742 - Resurrect the ZFS "aclmode" property
OpenZFS 664 - Umask masking "deny" ACL entries
OpenZFS 279 - Bug in the new ACL (post-PSARC/2010/029) semantics
Porting notes:
* Updated zfs_acl_chmod to take 'boolean_t isdir' as first parameter
rather than 'zfsvfs_t *zfsvfs'
* zfs man pages changes mixed between zfs and new zfsprops man pages
Reviewed by: Aram Hăvărneanu <aram@nexenta.com>
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Robert Gordon <rbg@openrbg.com>
Reviewed by: Mark.Maybee@oracle.com
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@nexenta.com> Ported-by: Paul B. Henson <henson@acm.org>
OpenZFS-issue: https://www.illumos.org/issues/742
OpenZFS-issue: https://www.illumos.org/issues/664
OpenZFS-issue: https://www.illumos.org/issues/279
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/a3c49ce110
Closes #10266
Jason King [Tue, 28 Apr 2020 17:55:18 +0000 (12:55 -0500)]
Support custom URI schemes for the keylocation property
Every platform has its own preferred methods for implementing URI
schemes beyond the currently supported file scheme (e.g. 'https' on
FreeBSD would likely use libfetch, while Linux distros and illumos
would probably use libcurl, etc). It would be helpful if libzfs can
be extended to support additional schemes in a simple manner.
A table of (scheme, handler_function) pairs is added to libzfs_crypto.c,
and the existing functions in libzfs_crypto.c are modified so that when
the key format is ZFS_KEYFORMAT_URI, the scheme from the URI string is
extracted, and a matching handler is located in the aforementioned
table (returning an error if no matching handler is found). The handler
function is then invoked to retrieve the key material (in the format
specified by the keyformat property) and the key is loaded or the
handler can return an error to abort the key loading process.
Reviewed by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Jason King <jason.king@joyent.com>
Closes #10218
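The (scheme, handler_function) table described above might look roughly
like the following. This is a sketch only: the type names, the handler
signature, file_handler() and lookup_handler() are all hypothetical and
are not taken from libzfs_crypto.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical handler signature: fetch key material for a URI. */
typedef int (*key_handler_f)(const char *uri, char *buf, size_t buflen);

typedef struct uri_handler {
	const char	*scheme;
	key_handler_f	handler;
} uri_handler_t;

/* Placeholder; a real handler would read the file named by the URI. */
int
file_handler(const char *uri, char *buf, size_t buflen)
{
	(void) uri;
	(void) buf;
	(void) buflen;
	return (0);
}

/* Platforms would add their own entries (e.g. "https") here. */
static const uri_handler_t uri_handlers[] = {
	{ "file", file_handler },
	{ NULL, NULL }
};

/*
 * Return the handler whose scheme matches the text before the first
 * ':' in uri, or NULL if there is no scheme or no matching entry.
 */
key_handler_f
lookup_handler(const char *uri)
{
	const char *colon = strchr(uri, ':');
	size_t len;

	if (colon == NULL || colon == uri)
		return (NULL);
	len = (size_t)(colon - uri);

	for (const uri_handler_t *h = uri_handlers; h->scheme != NULL; h++) {
		if (strlen(h->scheme) == len &&
		    strncmp(h->scheme, uri, len) == 0)
			return (h->handler);
	}
	return (NULL);
}
```

A caller would treat a NULL result as the "no matching handler" error
and otherwise invoke the returned function to obtain the key material.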
Sara Hartse [Tue, 28 Apr 2020 16:56:31 +0000 (09:56 -0700)]
Add more sanity testing for zdb input args
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: sara hartse <sara.hartse@delphix.com>
Closes #10243
Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-By: Tom Caputi <tcaputi@datto.com> Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10246
Ryan Moeller [Tue, 28 Apr 2020 16:14:30 +0000 (12:14 -0400)]
Fix zlib leak on FreeBSD
zlib_inflateEnd was accidentally a wrapper for inflateInit instead of
inflateEnd, and hilarity ensues.
Fix the typo so we free memory instead of allocating more.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10225
Closes #10252
alex [Sat, 25 Apr 2020 02:04:34 +0000 (10:04 +0800)]
zfs_create: round up volume size to multiple of bs
Round up the volume size requested in `zfs create -V size` to the next
higher multiple of the volblocksize. Updates the man page and adds a
test to verify the new behavior.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reported-by: puffi <puffi@users.noreply.github.com> Signed-off-by: Alex John <alex@stty.io>
Closes #8541
Closes #10196
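The rounding rule described above is plain integer arithmetic. A
minimal sketch follows; the function name roundup_volsize() is an
assumption, and overflow near UINT64_MAX is deliberately not handled.

```c
#include <stdint.h>

/*
 * Round size up to the next multiple of blocksize. A size that is
 * already a multiple is returned unchanged. blocksize must be nonzero.
 */
uint64_t
roundup_volsize(uint64_t size, uint64_t blocksize)
{
	uint64_t rem = size % blocksize;

	return (rem == 0 ? size : size + (blocksize - rem));
}
```

For example, with a block size of 8192 a request for 10000 bytes
becomes 16384 bytes, while a request for exactly 8192 bytes is left
as-is.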
Tom Caputi [Sat, 25 Apr 2020 02:00:32 +0000 (22:00 -0400)]
Fix missing ivset guid with resumed raw base recv
This patch corrects a bug introduced in 61152d1069. When
resuming a raw base receive, the dmu_recv code always sets
drc->drc_fromsnapobj to the object ID of the previous
snapshot. For incrementals, this is correct, but for base
sends, this should be left at 0. The presence of this ID
eventually allows a check to run which determines whether
or not the incoming stream and the previous snapshot have
matching IVset guids. This check fails because it is not
meant to run when there is no previous snapshot. When it
does fail, the user receives an error stating that the
incoming stream has the problem outlined in errata 4.
This patch corrects this issue by simply ensuring
drc->drc_fromsnapobj is left as 0 for base receives.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #10234
Closes #10239
Brian Behlendorf [Thu, 23 Apr 2020 22:54:38 +0000 (15:54 -0700)]
Fix uninitialized variable in `zstream redup` command
Fix uninitialized variable in `zstream redup` command. The compiler
may determine the 'stream_offset' variable can be uninitialized
because not all rdt_lookup() exit paths set it. This should never
happen in practice as documented by the assert, but initialize it
regardless to resolve the warning.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10241
Closes #10244
Matthew Ahrens [Thu, 23 Apr 2020 22:53:14 +0000 (15:53 -0700)]
change libspl list member names to match kernel
This aids in debugging, so that we can use the same infrastructure to
walk zfs's list_t in the kernel module and in the userland libraries
(e.g. when debugging ztest).
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10236
Matthew Ahrens [Thu, 23 Apr 2020 17:06:57 +0000 (10:06 -0700)]
Remove deduplicated send/receive code
Deduplicated send streams (i.e. `zfs send -D` and `zfs receive` of such
streams) are deprecated. Deduplicated send streams can be received by
first converting them to non-deduplicated with the `zstream redup`
command.
This commit removes the code for sending and receiving deduplicated send
streams. `zfs send -D` will now print a warning, ignore the `-D` flag,
and generate a regular (non-deduplicated) send stream. `zfs receive` of
a deduplicated send stream will print an error message and fail.
The resulting code simplification (especially in the kernel's support
for receiving dedup streams) should help enable future performance
enhancements.
Several new tests are added which leverage `zstream redup`.
Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Issue #7887
Issue #10117
Issue #10156
Closes #10212
Matthew Ahrens [Wed, 22 Apr 2020 17:26:56 +0000 (10:26 -0700)]
Use a struct to organize metaslab-group-allocator fields
Each metaslab group (of which there is one per top-level vdev) has
several (4, by default) "metaslab group allocators". Each "allocator"
has its own metaslab that it prefers to allocate from (the "primary"
allocator), and each can perform allocations concurrently with the other
allocators. In addition to the primary metaslab, there are several
other fields that need to be tracked separately for each allocator.
These are currently stored as several arrays in the metaslab_group_t,
each array indexed by allocator number.
This change organizes all the metaslab-group-allocator-specific fields
into a new struct, metaslab_group_allocator_t. The metaslab_group_t now
needs only one array indexed by the allocator number - which contains
the metaslab_group_allocator_t's.
Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10213
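The reorganization described above replaces several parallel arrays
with one array of structs. The sketch below shows the general shape
only; the struct and field names are hypothetical stand-ins and do not
reproduce the real metaslab_group_t or metaslab_group_allocator_t.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-allocator state; the real field set is larger. */
typedef struct group_allocator {
	uint64_t	ga_queue_depth;		/* in-flight allocations */
	uint64_t	ga_max_queue_depth;	/* throttle limit */
	void		*ga_primary;		/* preferred metaslab */
} group_allocator_t;

typedef struct group {
	int			g_allocators;	/* number of allocators */
	group_allocator_t	*g_allocator;	/* indexed by allocator */
} group_t;

/* Allocate one zeroed group_allocator_t per allocator; 0 on success. */
int
group_init(group_t *g, int allocators)
{
	g->g_allocator = calloc((size_t)allocators,
	    sizeof (group_allocator_t));
	if (g->g_allocator == NULL)
		return (-1);
	g->g_allocators = allocators;
	return (0);
}

/* Release the per-allocator array. */
void
group_fini(group_t *g)
{
	free(g->g_allocator);
	g->g_allocator = NULL;
	g->g_allocators = 0;
}
```

All state for allocator i is then reached through g->g_allocator[i]
rather than through one lookup per array.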
On zpools containing hole vdevs (e.g. removed log devices), the `zpool
trim` (and presumably `zpool initialize`) commands will attempt calling
their respective functions on "hole", which fails, as this is not a real
vdev.
Avoid this by removing HOLE vdevs in zpool_collect_leaves.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Niklas Haas <git@haasn.xyz>
Closes #10227
Matthew Ahrens [Mon, 20 Apr 2020 17:12:48 +0000 (10:12 -0700)]
Fix zfs send progress reporting
The progress of a send is supposed to be reported by `zfs send -v`, but
it is not. This works by creating a new user thread (with
pthread_create()) which does ZFS_IOC_SEND_PROGRESS ioctls to check how
much progress has been made. This IOCTL finds the specified send (since
there may be multiple concurrent sends in the system). The IOCTL also
checks that the specified send was started by the current process.
On Linux, different threads of the same process are represented as
different `struct task_struct`s (and, confusingly, have different
PIDs). To check if two threads are in the same process, we need to
check if they have the same `struct task_struct:group_leader`.
We used to do this correctly, but it was inadvertently changed by
30af21b02569 (Redacted Send) to simply check if the current
`struct task_struct` is the one that started the send.
This commit changes the code back to checking if the send was started by
a `struct task_struct` with the same `group_leader` as the calling
thread.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Chris Wedgwood <cw@f00f.org> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10215
Closes #10216
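The difference between the two checks described above can be shown with
a small userland model. This is a sketch, not kernel code: struct task
is a minimal stand-in for the kernel's struct task_struct, keeping only
a group_leader pointer, and same_process() is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct task_struct. */
struct task {
	int		pid;
	struct task	*group_leader;	/* first thread of the process */
};

/*
 * True if the two threads belong to the same process. Comparing the
 * task pointers themselves would only match the exact same thread.
 */
bool
same_process(const struct task *a, const struct task *b)
{
	return (a->group_leader == b->group_leader);
}
```

In this model, the thread that started the send and the progress thread
created with pthread_create() are distinct tasks that share one
group_leader, so only the group_leader comparison treats them as the
same process.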
Matthew Macy [Fri, 17 Apr 2020 16:30:26 +0000 (09:30 -0700)]
Use new FreeBSD API to largely eliminate object locking
Propagate changes in HEAD that mostly eliminate object locking.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10205
George Amanakis [Fri, 17 Apr 2020 16:27:40 +0000 (12:27 -0400)]
Persistent L2ARC minor fixes
Minor fixes on persistent L2ARC improving code readability and fixing
a typo in zdb.c when byte-swapping a log block. It also improves the
persist_l2arc_007_pos.ksh test by giving it more time to retrieve log
blocks on the cache device.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Adam D. Moss <c@yotes.com> Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10210
Ryan Moeller [Wed, 15 Apr 2020 16:21:40 +0000 (12:21 -0400)]
Don't delete freebsd.run in distclean
Add a comment so the file is not empty.
The comment can be removed when FreeBSD-specific tests are added.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10206