Toomas Soome [Tue, 25 May 2021 17:33:18 +0000 (20:33 +0300)]
zdb: dump_history needs space
One space is missing from zdb -h output causing strings to be concatenated. (fixing #11940)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #12098
verify_slog_support no longer applies to ZFS since slog support
is always available.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #12092
Alexander Motin [Mon, 24 May 2021 20:42:45 +0000 (16:42 -0400)]
FreeBSD: Retry OCF ENOMEM errors.
ZFS does not expect transient errors from crypto. For read they are
counted as checksum errors, while for write end up in panic. To not
panic on random low memory conditions retry ENOMEM errors in the OCF
wrapper function.
While there remove unneeded timeout and priority from msleep().
External-issue: https://reviews.freebsd.org/D30339 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc.
Closes #12077
наб [Sat, 8 May 2021 11:17:04 +0000 (13:17 +0200)]
Don't abuse vfork()
According to POSIX.1, "vfork() has the same effect as fork(2),
except that the behavior is undefined if the process created by vfork()
either modifies any data other than a variable of type pid_t
used to store the return value from vfork(), [...],
or calls any other function before successfully calling _exit(2)
or one of the exec(3) family of functions."
These do all three, and work by pure chance
(or maybe they don't, but we blisfully don't know).
Either way: bad idea to call vfork() from C,
unless you're the standard library, and POSIX.1-2008 removes it entirely
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12015
наб [Wed, 19 May 2021 11:56:24 +0000 (13:56 +0200)]
libzfs: run_process: set O_NONBLOCK on lines pipe
Without this, we can deadlock: the child is stuck writing to the pipe,
and we are stuck waiting on the child
With this, we the child fills up the pipe (a few hundred kBish)
and starts getting EAGAINs, which allows it to either crash
or ignore them
libzfs_run_process_get_stdout*() is used only by zpool -c scripts,
which output short runs of K=V pairs, so the likelihood of losing
legitimate data there is relatively low
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12082
This change addresses two distinct scenarios which are possible
when performing a sequential resilver to a dRAID pool with vdevs
that contain silent unknown damage. Which in this circumstance
took the form of the devices being intentionally overwritten with
zeros. However, it could also result from a device returning incorrect
data while a sequential resilver was in progress.
Scenario 1) A sequential resilver is performed while all of the
dRAID vdevs are ONLINE and there is silent damage present on the
vdev being resilvered. In this case, nothing will be repaired
by vdev_raidz_io_done_reconstruct_known_missing() because
rc->rc_error isn't set on any of the raid columns. To address
this vdev_draid_io_start_read() has been updated to always mark
the resilvering column as ESTALE for sequential resilver IO.
Scenario 2) Multiple columns contain silent damage for the same
block and a sequential resilver is performed. In this case it's
impossible to generate the correct data from parity unless all of
the damaged columns are being sequentially resilvered (and thus
only good data is used to generate parity). This is as expected
and there's nothing which can be done about it. However, we need
to be careful not to make to situation worse. Since we can't
verify the data is actually good without a checksum, we must
only repair the devices which are being sequentially resilvered.
Otherwise, an incorrect repair to a device which previously
contained good data could effectively lock in the damage and
make reconstruction impossible. A check for this was added to
vdev_raidz_io_done_verified() along with a new test case.
Lastly, this change updates the redundancy_draid_spare1 and
redundancy_draid_spare3 test cases to be more representative
of normal dRAID replacement operation. Specifically, what we
care about is that the scrub run after a sequential resilver
does not find additional blocks which need repair. This would
indicate the sequential resilver failed to rebuild a section of
one of the devices. Note also the tests were switched to using
the verify_pool() function which still checks for checksum errors.
Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12061
Lauri Tirkkonen [Thu, 20 May 2021 16:03:03 +0000 (19:03 +0300)]
zfs-allow.8: mention 'bookmark' permission
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Lauri Tirkkonen <lauri@hacktheplanet.fi>
Closes #12064
Alexander Motin [Fri, 14 May 2021 16:13:53 +0000 (12:13 -0400)]
Scale worker threads and taskqs with number of CPUs
While use of dynamic taskqs allows to reduce number of idle threads,
hardcoded 8 taskqs of each kind is a big overkill for small systems,
complicating CPU scheduling, increasing I/O reorder, etc, while
providing no real locking benefits, just not needed there.
On another side, 12*8 worker threads per kind are able to overload
almost any system nowadays. For example, pool of several fast SSDs
with SHA256 checksum makes system barely responsive during scrub, or
with dedup enabled barely responsive during large file deletion.
To address both problems this patch introduces ZTI_SCALE macro, alike
to ZTI_BATCH, but with multiple taskqs, depending on number of CPUs,
to be used in places where lock scalability is needed, while request
ordering is not so much. The code is made to create new taskq for
~6 worker threads (less for small systems, but more for very large)
up to 80% of CPU cores (previous 75% was not good for rounding down).
Both number of threads and threads per taskq are now tunable in case
somebody really wants to use all of system power for ZFS.
While obviously some benchmarks show small peak performance reduction
(not so big really, especially on systems with SMT, where use of the
second threads does not give as much performance as the first ones),
they also show dramatic latency reduction and much more smooth user-
space operation in case of high CPU usage by ZFS.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc.
Closes #11966
Brian Behlendorf [Fri, 14 May 2021 16:11:56 +0000 (09:11 -0700)]
ZTS: Increase redundancy test timeout
The redundancy_draid.ksh and redundancy_raidz.ksh tests were updated
by commit 93c8e91fe to additionally verify self-healing. This
additional check increased the run time which can now occasionally
exceed the default maximum timeout in the CI environment. To prevent
this from causing failures increase the default timeout for the
redundancy test cases.
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12043
наб [Fri, 14 May 2021 04:47:53 +0000 (06:47 +0200)]
i-t: rewrite hooks
This produces a leaner image, doesn't fail if zdb doesn't exist,
properly handles hostnameless systems, doesn't mention crypto modules
for no reason, doesn't add useless empty executable in hopes an
eight-year-old PR is merged, uses i-t builtins for all copies
Also optimize the checkbashisms filter to spawn one (or a few) awks
instead of one per regular file and remove initramfs/hooks therefrom due
to a command -v false positive
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12017
Ryan Moeller [Thu, 29 Apr 2021 03:27:57 +0000 (03:27 +0000)]
FreeBSD: Implement xattr=sa
FreeBSD historically has not cared about the xattr property; it was
always treated as xattr=on. With xattr=on, xattrs are stored as files
in a hidden xattr directory. With xattr=sa, xattrs are stored as
system attributes and get cached in nvlists during xattr operations.
This makes SA xattrs simpler and more efficient to manipulate. FreeBSD
needs to implement the SA xattr operations for feature parity with
Linux and to ensure that SA xattrs are accessible when migrated or
replicated from Linux.
Following the example set by Linux, refactor our existing extattr vnops
to split off the parts handling dir style xattrs, and add the
corresponding SA handling parts.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11997
Ryan Moeller [Wed, 28 Apr 2021 19:19:28 +0000 (19:19 +0000)]
FreeBSD: Use SET_ERROR to trace xattr name errors
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11997
Ryan Moeller [Wed, 28 Apr 2021 18:58:30 +0000 (18:58 +0000)]
FreeBSD: Don't force xattr mount option
The kernel will use the xattr property by default when not overridden
by a mount option.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11997
Brian Behlendorf [Thu, 13 May 2021 17:00:17 +0000 (10:00 -0700)]
Revert "Fix raw sends on encrypted datasets when copying back snapshots"
Commit d1d4769 takes into account the encryption key version to
decide if the local_mac could be zeroed out. However, this could lead
to failure mounting encrypted datasets created with intermediate
versions of ZFS encryption available in master between major releases.
In order to prevent this situation revert d1d4769 pending a more
comprehensive fix which addresses the mount failure case.
Reviewed-by: George Amanakis <gamanakis@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11294
Issue #12025
Issue #12300
Closes #12033
наб [Mon, 10 May 2021 15:24:59 +0000 (17:24 +0200)]
Widen mancheck target to all pages, fix them
mandoc: ./man/man8/zfs-mount-generator.8.in:188:2:
ERROR: skipping end of block that is not open: RE
mandoc: ./man/man8/zfs_ids_to_path.8:38:2:
ERROR: skipping unknown macro: .LP
mandoc: ./man/man8/zfs_ids_to_path.8:48:2:
ERROR: inserting missing end of block: Sh breaks Bl
mandoc: ./man/man8/zfs-wait.8:69:2:
ERROR: skipping end of block that is not open: El
mandoc: ./man/man8/zfs-program.8:460:2:
ERROR: inserting missing end of block: It breaks Bd
mandoc: ./man/man8/zfs-mount-generator.8:188:2:
ERROR: skipping end of block that is not open: RE
mandoc: ./man/man8/zstream.8:43:2:
ERROR: skipping unknown macro: .LP
mandoc: ./man/man8/zstream.8:107:2:
ERROR: inserting missing end of block: Sh breaks Bl
mandoc: ./man/man8/zstream.8:107:2:
ERROR: inserting missing end of block: Sh breaks Bl
make: *** [Makefile:1529: mancheck] Error 1
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12017
Brian Behlendorf [Wed, 12 May 2021 02:55:12 +0000 (19:55 -0700)]
ZTS: Add known exceptions
The following seven tests been observed to occasionally fail during
CI testing. This commit adds them to the list of known somewhat
flaky test cases.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12023
Coleman Kane [Wed, 12 May 2021 02:53:02 +0000 (22:53 -0400)]
linux 5.13 compat: bdevops->revalidate_disk() removed
Linux kernel commit 0f00b82e5413571ed225ddbccad6882d7ea60bc7 removes the
revalidate_disk() handler from struct block_device_operations. This
caused a regression, and this commit eliminates the call to it and the
assignment in the block_device_operations static handler assignment
code, when configure identifies that the kernel doesn't support that
API handler.
Reviewed-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11967
Closes #11977
Ryan Moeller [Tue, 11 May 2021 05:02:25 +0000 (01:02 -0400)]
Remove unimplemented virus scanning hooks
Reviewed-by: Adam Moss <c@yotes.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11972
наб [Mon, 10 May 2021 18:00:15 +0000 (20:00 +0200)]
module/zfs: remove zfs_zevent_console and zfs_zevent_cols
zfs_zevent_console committed multiple printk()s per line without
properly continuing them ‒ a single event could easily be fragmented
across over thirty lines, making it useless for direct application
zfs_zevent_cols exists purely to wrap the output from zfs_zevent_console
The niche this was supposed to fill can be better served by something
akin to the all-syslog ZEDLET
наб [Mon, 3 May 2021 10:01:13 +0000 (12:01 +0200)]
zfs_get_enclosure_sysfs_path(): don't leak dev path
Also always free tmp2 at the end
Before:
nabijaczleweli@tarta:~/uwu$ valgrind --leak-check=full ./blergh
==8947== Memcheck, a memory error detector
==8947== Using Valgrind-3.14.0 and LibVEX
==8947== Command: ./blergh
==8947==
(null)
==8947==
==8947== HEAP SUMMARY:
==8947== in use at exit: 23 bytes in 1 blocks
==8947== total heap usage: 3 allocs, 2 frees, 1,147 bytes allocated
==8947==
==8947== 23 bytes in 1 blocks are definitely lost in loss record 1 of 1
==8947== at 0x483577F: malloc (vg_replace_malloc.c:299)
==8947== by 0x48D74B7: vasprintf (vasprintf.c:73)
==8947== by 0x48B7833: asprintf (asprintf.c:35)
==8947== by 0x401258: zfs_get_enclosure_sysfs_path
(zutil_device_path_os.c:191)
==8947== by 0x401482: main (blergh.c:107)
==8947==
==8947== LEAK SUMMARY:
==8947== definitely lost: 23 bytes in 1 blocks
==8947== indirectly lost: 0 bytes in 0 blocks
==8947== possibly lost: 0 bytes in 0 blocks
==8947== still reachable: 0 bytes in 0 blocks
==8947== suppressed: 0 bytes in 0 blocks
==8947==
==8947== For counts of detected and suppressed errors, rerun with: -v
==8947== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
nabijaczleweli@tarta:~/uwu$ sed -n 191p zutil_device_path_os.c
tmpsize = asprintf(&tmp1, "/sys/block/%s/device", dev_name);
After:
nabijaczleweli@tarta:~/uwu$ valgrind --leak-check=full ./blergh
==9512== Memcheck, a memory error detector
==9512== Using Valgrind-3.14.0 and LibVEX
==9512== Command: ./blergh
==9512==
(null)
==9512==
==9512== HEAP SUMMARY:
==9512== in use at exit: 0 bytes in 0 blocks
==9512== total heap usage: 3 allocs, 3 frees, 1,147 bytes allocated
==9512==
==9512== All heap blocks were freed -- no leaks are possible
==9512==
==9512== For counts of detected and suppressed errors, rerun with: -v
==9512== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
libzfs: zpool_load_compat(): open feature file cloexec
As a bonus, this also passes the open flags into the open flags instead
of the mode (it worked by accident because O_RDONLY is 0),
correctly detects a failed map,
and prefaults the entire file since we're always writing to every page
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
illiliti [Sat, 8 May 2021 15:58:26 +0000 (15:58 +0000)]
copy-builtin: posix conformance
This commits contains changes to allow running `copy-builtin` without
bash + some minor improvements.
changed shebang to /bin/sh
added -f option to `set` to globally disable unneeded globbing
replaced all `echo` commands within add_after() with `printf`
alternative to avoid possible issues with options (-neE)
dropped non-portable superfluous `readlink` command
replaced superfluous `true` command with `:` builtin alternative
replaced non-portable `--recursive` option of `cp` command with `-R`
alternative
dropped non-portable `local` keyword
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: illiliti <illiliti@protonmail.com>
Closes #12004
When dRAID performs a normal read operation only the data columns
in the raid map are read from disk. This is enough information to
calculate the checksum, verify it, and return the needed data to the
application. It's only in the event of a checksum failure that the
additional parity and any empty columns must be read since they are
required for parity reconstruction.
Reading these additional columns is handled by vdev_raidz_read_all()
which calls vdev_draid_map_alloc_empty() to expand the raid_map_t
and submit IOs for the missing columns. This all works correctly,
but it fails to account for any "short" columns. These are data
columns which are padded with a empty skip sector at the end.
Since that empty sector is not needed for a normal read it's not
read when columns is first read from disk. However, like the parity
and empty columns the skip sector is needed to perform reconstruction.
The fix is to mark any "short" columns as never being read by clearing
the rc_tried flag when expanding the raid_map_t. This will cause
the entire column to re-read from disk in the event of a checksum
failure allowing the self-healing functionality to repair the block.
Note that this only effects the self-healing feature because when
scrubbing a pool the parity, data, and empty columns are all read
initially to verify their contents. Furthermore, only blocks which
contain "short" columns would be effected, and only when the memory
backing the skip sector wasn't already zeroed out.
This change extends the existing redundancy_raidz.ksh test case to
verify self-healing (as well as resilver and scrub). Then applies
the same test case to dRAID with a slightly modified version of
the test script called redundancy_draid.ksh. The unused variable
combrec was also removed from both test cases.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12010
наб [Mon, 3 May 2021 19:21:21 +0000 (21:21 +0200)]
Replace ZoL with OpenZFS where applicable
Afterward, git grep ZoL matches:
* README.md: * [ZoL Site](https://zfsonlinux.org)
- Correct
* etc/default/zfs.in:# ZoL userland configuration.
- Changing this would induce a needless upgrade-check,
if the user has modified the configuration;
this can be updated the next time the defaults change
* module/zfs/dmu_send.c: * ZoL < 0.7 does not handle [...]
- Before 0.7 is ZoL, so fair enough
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #11956
Ryan Moeller [Mon, 3 May 2021 17:56:08 +0000 (17:56 +0000)]
FreeBSD: Remove !FreeBSD ifdef'd code
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11994
Ryan Moeller [Mon, 3 May 2021 17:51:03 +0000 (17:51 +0000)]
Clean up use of zfs_log_create in zfs_dir
zfs_log_create returns void, so there is no reason to cast its return
value to void at the call site.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11994
наб [Fri, 7 May 2021 22:10:16 +0000 (00:10 +0200)]
zed: protect against wait4()/fork() races to the global PID table
This can be very easily triggered by adding a sleep(1) before
the wait4() on a PID-starved system: the reaper thread would wait
for a child before its entry appeared, letting old entries accumulate:
By employing this wider lock, we atomise [wait, remove] and [fork, add]:
slowing down the reaper thread now just causes some zombies
to accumulate until it can get to them
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11963
Closes #11965
Alyssa Ross [Fri, 7 May 2021 22:08:16 +0000 (22:08 +0000)]
Return required size when encode_fh size too small
Quoting <linux/exportfs.h>:
> encode_fh() should return the fileid_type on success and on error
> returns 255 (if the space needed to encode fh is greater than
> @max_len*4 bytes). On error @max_len contains the minimum size (in 4
> byte unit) needed to encode the file handle.
ZFS was not setting max_len in the case where the handle was too
small. As a result of this, the `t_name_to_handle_at.c' example in
name_to_handle_at(2) did not work on ZFS.
zfsctl_fid() will itself set max_len if called with a fid that is too
small, so if we give zfs_fid() that behavior as well, the fix is quite
easy: if the handle is too small, just use a zero-size fid instead of
the handle.
Tested by running t_name_to_handle_at on a normal file, a directory, a
.zfs directory, and a snapshot.
Thanks-to: Puck Meerburg <puck@puckipedia.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Alyssa Ross <hi@alyssa.is>
Closes #11995
Alexander Motin [Fri, 7 May 2021 22:07:03 +0000 (18:07 -0400)]
Simplify/fix dnode_move() for dn_zfetch
Previous code tried to keep prefetch streams while moving dnode. But
it was at least not updating per-stream zs_fetchback pointers, causing
use-after-free on next access. Instead of that I see much easier and
cleaner to just drop old prefetch state and start new from scratch.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc.
Closes #11936
Closes #11998
Matthew Ahrens [Thu, 6 May 2021 18:24:56 +0000 (11:24 -0700)]
undocumented libzfs API changes broke "zfs list"
While OpenZFS does permit breaking changes to the libzfs API, we should
avoid these changes when reasonably possible, and take steps to mitigate
the impact to consumers when changes are necessary.
Commit e4288a8397bb1f made a libzfs API change that is especially
difficult for consumers because there is no change to the function
signatures, only to their behavior. Therefore, consumers can't notice
that there was a change at compile time. Also, the API change was
incompletely and incorrectly documented.
The commit message mentions `zfs_get_prop()` [sic], but all callers of
`get_numeric_property()` are impacted: `zfs_prop_get()`,
`zfs_prop_get_numeric()`, and `zfs_prop_get_int()`.
`zfs_prop_get_int()` always calls `get_numeric_property(src=NULL)`, so
it assumes that the filesystem is not mounted. This means that e.g.
`zfs_prop_get_int(ZFS_PROP_MOUNTED)` always returns 0.
The documentation says that to preserve the previous behavior, callers
should initialize `*src=ZPROP_SRC_NONE`, and some callers were changed
to do that. However, the existing behavior is actually preserved by
initializing `*src=ZPROP_SRC_ALL`, not `NONE`.
The code comment above `zfs_prop_get()` says, "src: ... NULL will be
treated as ZPROP_SRC_ALL.". However, the code actually treats NULL as
ZPROP_SRC_NONE. i.e. `zfs_prop_get(src=NULL)` assumes that the
filesystem is not mounted.
There are several existing calls which use `src=NULL` which are impacted
by the API change, most noticeably those used by `zfs list`, which now
assumes that filesystems are not mounted. For example,
`zfs list -o name,mounted` previously indicated whether a filesystem was
mounted or not, but now it always (incorrectly) indicates that the
filesystem is not mounted (`MOUNTED: no`). Similarly, properties that
are set at mount time are ignored. E.g. `zfs list -o name,atime` may
display an incorrect value if it was set at mount time.
To address these problems, this commit reverts commit e4288a8397bb1f:
"zfs get: don't lookup mount options when using "-s local""
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11999
Ryan Moeller [Thu, 6 May 2021 16:45:16 +0000 (12:45 -0400)]
FreeBSD: Initialize/destroy zp->z_lock
zp->z_lock is used in shared code for protecting projid and scantime.
We don't exercise these paths much if at all on FreeBSD, so have been
lucky enough not to have issues with the uninitialized locks so far.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #12003
Ryan Moeller [Fri, 30 Apr 2021 23:36:10 +0000 (19:36 -0400)]
FreeBSD: Clean up ASSERT/VERIFY use in module
Convert use of ASSERT() to ASSERT0(), ASSERT3U(), ASSERT3S(),
ASSERT3P(), and likewise for VERIFY(). In some cases it ended up
making more sense to change the code, such as VERIFY on nvlist
operations that I have converted to use fnvlist instead. In one
place I changed an internal struct member from int to boolean_t to
match its use. Some asserts that combined multiple checks with &&
in a single assert have been split to separate asserts, to make it
apparent which check fails.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11971
zed.d/pool_import-led.sh: fix for current zpool scripts
Also minor clean-up with folding state_to_val() into a case,
unrolling the lesser-available seq into numbers,
ignoring vdev states we don't care about,
and documentation comments
Ryan Moeller [Fri, 30 Apr 2021 14:37:02 +0000 (10:37 -0400)]
ZTS: Fix xattr_002_neg passing too soon
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11970
Toomas Soome [Thu, 29 Apr 2021 23:44:07 +0000 (02:44 +0300)]
zdb: dump_history can be improved
We only recognize some history records, instead, use
same logic as in print_history_records() in zpool_main.c.
Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #11940
Alan Somers [Thu, 29 Apr 2021 21:19:44 +0000 (15:19 -0600)]
zfs get: don't lookup mount options when using "-s local"
Looking up mount options can be very expensive on servers with many
mounted file systems. When doing "zfs get" with any "-s" option that
does not include "temporary", the mount list will never be used. This
commit optimizes for that case.
This is a breaking commit for libzfs! Callers of zfs_get_prop are now
required to initialize src. To preserve existing behavior, they should
initialize it to ZPROP_SRC_NONE.
Sponsored by: Axcient Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11955
vdev_mirror: don't scrub/resilver devices that can't be read
This ensures that we don't accumulate checksum errors against offline or
unavailable devices but, more importantly, means that we don't
needlessly create DTL entries for offline devices that are already
up-to-date.
Consider a 3-way mirror, with disk A always online (and so always with
an empty DTL) and B and C only occasionally online. When A & B resilver
with C offline, B's DTL will effectively be appended to C's due to these
spurious ZIOs even as the resilver empties B's DTL:
* These ZIOs land in vdev_mirror_scrub_done() and flag an error
* That flagged error causes vdev_mirror_io_done() to see
unexpected_errors, so it issues a ZIO_TYPE_WRITE repair ZIO, which
inherits ZIO_FLAG_SCAN_THREAD because zio_vdev_child_io() includes
that flag in ZIO_VDEV_CHILD_FLAGS.
* That ZIO fails, too, and eventually zio_done() gets its hands on it
and calls vdev_stat_update().
* vdev_stat_update() sees the error and this zio...
* is not speculative,
* is not due to EIO (but rather ENXIO, since the device is closed)
* has an ->io_vd != NULL (specifically, the offline leaf device)
* is a write
* is for a txg != 0 (but rather the read block's physical birth txg)
* has ZIO_FLAG_SCAN_THREAD asserted
* So: vdev_stat_update() calls vdev_dtl_dirty() on the offline vdev.
Then, when A & C resilver with B offline, that story gets replayed and
C's DTL will be appended to B's.
In fact, one does not need this permanently-broken-mirror scenario to
induce badness: breaking a mirror with no DTLs and then scrubbing will
create DTLs for all offline devices. These DTLs will persist until the
entire mirror is reassembled for the duration of the *resilver*, which,
incidentally, will not consider the devices with good data to be sources
of good data in the case of a read failure.
Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Nathaniel Wesley Filardo <nwfilardo@gmail.com>
Closes #11930
Brian Behlendorf [Wed, 28 Apr 2021 00:45:40 +0000 (17:45 -0700)]
zfs.spec.in: remove post ldconfig scriptlets
In Fedora 28 the packaging guidelines were changed such that ldconfig
should no longer be called in either the %post or %postun scriptlets.
Instead the new compatibility macros %ldconfig_post, %ldconfig_postun,
and %ldocnfig_scriptlets should be used.
Since we only currently support Fedora 31 and newer, we could drop
%post or %postun scriptlets entirely according to the guidelines.
However, since we also use the same spec file for CentOS / RHEL
it's convenient to call the macros which are available starting
with CentOS / RHEL 8. For CentOS / RHEL 7 we must still call
ldconfig in the traditional way.
Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11931
Brian Behlendorf [Tue, 27 Apr 2021 15:27:03 +0000 (08:27 -0700)]
ZTS: Add known exceptions
Both the zpool_initialize_import_export and checkpoint_discard_busy
test cases a known to occasionally fail. Add them to the list of
known possible failures and reference the appropriate issue on the
tracker.
Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11949
Martin Matuška [Tue, 27 Apr 2021 15:25:48 +0000 (17:25 +0200)]
Drop "All rights reserved" from files by trasz@FreeBSD.org
This obeys the change in freebsd/freebsd-src@bce7ee9d4
External-issue: https://reviews.freebsd.org/D26980 Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11947
receive: don't fail inheriting (-x) properties on wrong dataset type
Receiving datasets while blanket inheriting properties like zfs
receive -x mountpoint can generally be desirable, e.g. to avoid
unexpected mounts on backup hosts.
Currently this will fail to receive zvols due to the mountpoint
property being applicable to filesystems only. This limitation
currently requires operators to special-case their minds and tools
for zvols.
This change gets rid of this limitation for inherit (-x) by
Spiting up the dataset type handling: Warnings for inheriting (-x),
errors for overriding (-o).
Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #11416
Closes #11840
Closes #11864
zed: protect against wait4()/fork() races to the launched process tree
As soon as wait4() returns, fork() can immediately return with the same
PID, and race to lock _launched_processes_lock, then try to add the new
(duplicate) PID to _launched_processes, which asserts
By locking before wait4(), we ensure, that, given that same
unfortunate scheduling, _launched_processes_lock cannot be locked by the
spawner before we pop the process in the reaper, and only afterward will
it be added
This moves where the reaper idles when there are children from the
wait4() to the pause(), locking for the duration of that single syscall
in both the no-children and running-children cases; the impact of this
is one to two syscalls (depending on _launched_processes_lock state)
per loop
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11924
Closes #11928
Daniel Stevenson [Tue, 20 Apr 2021 17:16:00 +0000 (12:16 -0500)]
Fixed incorrect man page reference in zfsprops(8)
The special_small_blocks section directed readers to zpool(8) for
documentation on special allocation classes, while they are actually
documented in zpoolconcepts(8).
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Daniel Stevenson <daniel@dstev.net>
Closes #11918
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11886
freebsd/libshare: nfs: make nfs_is_shared() thread-safe
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11886
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11886
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11886
libshare: nfs: don't leak nfs_lock_fd when lock fails
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11886
This replaces the generic libspl atomic.c atomics implementation
with one based on builtin gcc atomics. This functionality was added
as an experimental feature in gcc 4.4. Today even CentOS 7 ships
with gcc 4.8 as the default compiler we can make this the default.
Furthermore, the builtin atomics are as good or better than our
hand-rolled implementation so it's reasonable to drop that custom code.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11904
Brian Behlendorf [Mon, 19 Apr 2021 04:58:36 +0000 (21:58 -0700)]
ZTS: Improve redundancy test scripts
- Add additional logging to provide more information about why the
test failed. This including logging more of the individual commands
and the contents and differences of the record files on failure.
- Updated get_vdevs() to properly exclude all top-level vdevs
including raidz3 and draid[1-3].
- Replaced gnudd with dd. This is the only remaining place in the
test suite gnudd is used and it shouldn't be needed.
- The refill_test_env function expects the pool as the first argument
but never sets the pool variable.
- Only fill the test pools to 50% of capacity instead of 75% to help
speed up the tests.
- Fix replace_missing_devs() calculation, MINDEVSIZE should be
MINVDEVSIZE.
- Fix damage_devs() so it overwrites almost all of the device so
we're guaranteed to damage filesystem blocks.
- redundancy_stripe.ksh should not use log_mustnot to check if the
pool is healthy since the return value may be misinterpreted.
Just perform a normal conditional check and log the failure.
Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11906
Objtool requires the use of a DRAP register while aligning the
stack. Since a DRAP register is a gcc concept and we are
notoriously low on registers in the crypto code, it's not worth
the effort to mimic gcc generated stack realignment.
We simply silence the warning by adding the offending object files
to OBJECT_FILES_NON_STANDARD.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #6950
Closes #11914
As compared to the output after:
# zfs set xyz.nabijaczleweli:test=hehe zest/__test
# ./a.out
before: xyz.nabijaczleweli:test:
value: 'hehe'
source: 'zest/__test'
This deduplicates 2 sets of caches which use the same allocation size.
Memory savings fluctuate a lot, one sample result is FreeBSD running
"make buildworld" saving ~180MB RAM in reduced page count associated
with zio caches.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11877
contrib/dracut: 90: mount essential datasets under root
This partly mirrors what the i-t script does (though that mounts all
children, recursively) ‒ /etc, /usr, /lib*, and /bin are all essential,
if present, to successfully invoke the real init, which will then mount
everything else it might need in the right order
Ryan Moeller [Fri, 16 Apr 2021 00:43:07 +0000 (20:43 -0400)]
zfs-send(8): Restore sorting of flags
Before #11710 the flags in zfs-send(8) were sorted.
Restore order and bump the date.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11905
linux/spl: base proc_dohostid() on proc_dostring()
This fixes /proc/sys/kernel/spl/hostid on kernels with mainline commit 32927393dc1ccd60fb2bdc05b9e8e88753761469 ("sysctl: pass kernel pointers
to ->proc_handler") ‒ 5.7-rc1 and up
The access_ok() check in copy_to_user() in proc_copyout_string() would
always fail, so all userspace reads and writes would fail with EINVAL
proc_dostring() strips only the final new-line,
but simple_strtoul() doesn't actually need a back-trimmed string ‒
writing "012345678 \n" is still allowed, as is "012345678zupsko", &c.
This alters what happens when an invalid value is written ‒
previously it'd get set to what-ever simple_strtoul() returned
(probably 0, thereby resetting it to default), now it does nothing