The first warning of a misspelling is a false positive, so we annotate
the script accordingly. As for the x-prefix warnings update the check
to use the conventional '[ -z <string> ]' syntax.
all-syslog.sh:46:47: warning: Possible misspelling: ZEVENT_ZIO_OBJECT
may not be assigned, but ZEVENT_ZIO_OBJSET is. [SC2153]
make_gitrev.sh:53:6: note: Avoid x-prefix in comparisons as it no
longer serves a purpose [SC2268]
man-dates.sh:10:7: note: Avoid x-prefix in comparisons as it no
longer serves a purpose [SC2268]
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12208
Rich Ercolani [Wed, 9 Jun 2021 00:20:16 +0000 (20:20 -0400)]
Correct a flaw in the Python 3 version checking
It turns out the ax_python_devel.m4 version check assumes that
("3.X+1.0" >= "3.X.0") is True in Python, which is not when X+1
is 10 or above and X is not. (Also presumably X+1=100 and ...)
So let's remake the check to behave consistently, using the
"packaging" or (if absent) the "distlib" modules.
(Also, update the Github workflows to use the new packages.)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes: #12073
This:
(a) improves the error log message,
(b) locks per pool instead of globally,
(c) locks the actual output file instead of /var/lock/zfs-list,
which would otherwise linger there forever (well, still will,
but you can remove it and it won't come back), and
(d) preserves attributes of the output file
instead of reverting them to 0:0 644
It is imperative that the previous commit
("zed-functions.sh: zed_lock(): don't truncate lock")
be included in any series that contains this one
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
наб [Fri, 14 May 2021 02:18:20 +0000 (04:18 +0200)]
zed.d/all-debug.sh: simplify
By locking the log file itself, we can omit arduous rebinding and
explicit umask setting, but, perhaps more importantly, avoid permanently
littering /var/lock/ with zed.debug.log.lock we will never delete
It is imperative that the previous commit
("zed-functions.sh: zed_lock(): don't truncate lock")
be included in any series that contains this one
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
наб [Fri, 14 May 2021 02:17:31 +0000 (04:17 +0200)]
zed-functions.sh: zed_lock(): don't truncate lock
By appending instead of truncating, we can lock on any file (with write
permissions) instead of only dedicated lock files, since the locking
process itself no longer alters the file in any way
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
Alan Somers [Tue, 8 Jun 2021 13:36:43 +0000 (07:36 -0600)]
libzfs: On FreeBSD, use MNT_NOWAIT with getfsstat
`getfsstat(2)` is used to retrieve the list of mounted file systems,
which libzfs uses when fetching properties like mountpoint, atime,
setuid, etc. The `mode` parameter may be `MNT_NOWAIT`, which uses
information in the VFS's cache, or `MNT_WAIT`, which effectively does a
`statfs` on every single mounted file system in order to fetch the most
up-to-date information. As far as I can tell, the only fields that
libzfs cares about are the filesystem's name, mountpoint, fstypename,
and mount flags. Those things are always updated on mount and unmount,
so they will always be accurate in the VFS's mount cache except in two
circumstances:
1) When a file system is busy unmounting
2) When a ZFS file system changes the value of a mount-overridable
property like atime or setuid, but doesn't remount the file system.
Right now that only happens when the property is changed by an
unprivileged user who has delegated authority to change the property
but not to mount the dataset. But perhaps libzfs could choose to do
it for other reasons in the future.
Switching to `MNT_NOWAIT` will greatly improve speed with no downside,
as long as we explicitly update the mount cache whenever we change a
mount-overridable property.
For comparison, Illumos gets this information using the native
`getmntany` and `getmntent` functions, which also use cached
information. The illumos function that would refresh the cache,
`resetmnttab`, is never called by libzfs.
And on GNU/Linux, `getmntany` and `getmntent` don't even communicate
with the kernel directly. They simply parse the file they are given,
which is usually /etc/mtab or /proc/mounts. Perhaps the implementation
of /proc/mounts is synchronous, ala MNT_WAIT; I don't know.
Sponsored-by: Axcient Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com> Closes: #12091
Rich Ercolani [Mon, 7 Jun 2021 19:29:27 +0000 (15:29 -0400)]
Force --enable-debug on FreeBSD if INVARIANTS is set
There's already logic to force INVARIANTS on for building if it's
present in the running kernel; however, not having DEBUG enabled
when DEBUG and INVARIANTS are can cause strange panics.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12185
Closes #12163
Update the logic to handle the dedup-case of consecutive
FREEs in the livelist code. The logic still ensures that
all the FREE entries are matched up with a respective
ALLOC by keeping a refcount for each FREE blkptr that we
encounter and ensuring that this refcount gets to zero
by the time we are done processing the livelist.
zdb -y no longer panics when encountering double frees
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #11480
Closes #12177
Alexander Motin [Mon, 7 Jun 2021 16:02:47 +0000 (12:02 -0400)]
More aggsum optimizations
- Avoid atomic_add() when updating as_lower_bound/as_upper_bound.
Previous code was excessively strong on 64bit systems while not
strong enough on 32bit ones. Instead introduce and use real
atomic_load() and atomic_store() operations, just an assignments
on 64bit machines, but using proper atomics on 32bit ones to avoid
torn reads/writes.
- Reduce number of buckets on large systems. Extra buckets not as
much improve add speed, as hurt reads. Unlike wmsum for aggsum
reads are still important.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc.
Closes #12145
Rich Ercolani [Fri, 4 Jun 2021 21:00:39 +0000 (17:00 -0400)]
Let zfs diff be more permissive
In the current world, `zfs diff` will die on certain kinds of errors
that come up on ordinary, not-mangled filesystems - like EINVAL,
which can come from a file with multiple hardlinks having the one
whose name is referenced deleted.
Since it should always be safe to continue, let's relax about all
error codes - still print something for most, but don't immediately
abort when we encounter them.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12072
jharmening [Fri, 4 Jun 2021 20:11:08 +0000 (13:11 -0700)]
FreeBSD: incorporate changes to the VFS_QUOTACTL(9) KPI
VFS_QUOTACTL(9) has been updated to allow each filesystem to indicate
whether it has changed the busy state of the mount. The filesystem
may still assume that its .vfs_quotactl entrypoint is always called
with the mount busied, but only needs to unbusy the mount (and clear
*mp_busy) if it does something that actually requires the mount to be
unbusied. It no longer needs to blindly copy-paste the UFS protocol
for calling vfs_unbusy(9) for the Q_QUOTAOFF and Q_QUOTAON commands.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Jason Harmening <jason.harmening@gmail.com>
Closes #12052
Ryan Moeller [Fri, 4 Jun 2021 19:53:44 +0000 (15:53 -0400)]
Fix error check in nvlist_print_json_string
Move check for errors from mbrtowc() into the loop. The error values
are not actually negative, so we don't break out of the loop when they
are encountered.
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12175
Closes #12176
Linux: Set spl_kmem_cache_slab_limit when page size !4K
For small objects the kernel's slab implementation is very fast and
space efficient. However, as the allocation size increases to
require multiple pages performance suffers. The SPL kmem cache
allocator was designed to better handle these large allocation
sizes. Therefore, on Linux the kmem_cache_* compatibility wrappers
prefer to use the kernel's slab allocator for small objects and
the custom SPL kmem cache allocator for larger objects.
This logic was effectively disabled for all architectures using
a non-4K page size which caused all kmem caches to only use the
SPL implementation. Functionally this is fine, but the SPL code
which calculates the target number of objects per-slab does not
take in to account that __vmalloc() always returns page-aligned
memory. This can result in a massive amount of wasted space when
allocating tiny objects on a platform using large pages (64k).
To resolve this issue we set the spl_kmem_cache_slab_limit cutoff
to 16K for all architectures.
This particular change does not attempt to update the logic used
to calculate the optimal number of pages per slab. This remains
an issue which should be addressed in a future change.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12152
Closes #11429
Closes #11574
Closes #12150
наб [Sat, 15 May 2021 09:53:14 +0000 (11:53 +0200)]
libzfs: convert to -fvisibility=hidden
Also mark all printf-like funxions in libzfs_impl.h as printf-like
and add --no-show-locs to storeabi, in hopes diffs will make more sense
in future
This removes these symbols from libzfs:
D nfs_only
T SHA256Init
T SHA2Final
T SHA2Init
T SHA2Update
T SHA384Init
T SHA512Init
D share_all_proto
D smb_only
T zfs_is_shared_proto
W zpool_mount_datasets
W zpool_unmount_datasets
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12048
Colm [Thu, 3 Jun 2021 15:13:42 +0000 (16:13 +0100)]
A couple of small style cleanups
In `zpool_load_compat()`:
* initialize `l_features[]` with a loop rather than a static
initializer.
* don't redefine system constants; use private names instead
Rationale here:
When an array is initialized using a static {foo}, only the specified
members are initialized to the provided values, the rest are
initialized to zero. While B_FALSE is of course zero, it feels
unsafe to rely on this being true forever, so I'm inclined to sacrifice
a few microseconds of runtime here and initialize using a loop.
When looking for the correct combination of system constants to use
(in open() and mmap()), I prefer to use private constants rather than
redefining system ones; due to the small chance that the system
ones might be referenced later in the file. So rather than defining
O_PATH and MAP_POPULATE, I use distinct constant names.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #12156
zfs_arc_overflow_shift was never a parameter: ca0bf58d65f77e944b9905571df9a2eae647aeca ("Illumos 5497 - lock
contention on arcs_mtx") is the only result in
git log -Soverflow_shift, and it wasn't exposed then, nor is it now
zfs_read_chunk_size was renamed to zfs_vnops_read_chunk_size in e53d678d4ad596a310d51dab107bb6fa97e2b226 ("Share zfs_fsync, zfs_read,
zfs_write, et al between Linux and FreeBSD")
zio_decompress_fail_fraction was never a parameter: it was added in c3bd3fb4ac49705819666055ff1206a9fa3d1b9e ("OpenZFS 9403 - assertion
failed in arc_buf_destroy()") as a developer aid for setting in zdb, but
it's a dangerous test tunable and has no place in public documentation,
(not to mention that it obviously doesn't work):
> Although this did uncover a few low priority issues, this
unfortuantely also causes ztest to ASSERT in many locations where the
code is working correctly since it is designed to fail on IO errors.
Developers can manually set this variable with the '-o' option to find
and debug issues.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12157
Rich Ercolani [Tue, 1 Jun 2021 21:20:50 +0000 (17:20 -0400)]
Added another missed case to arc_summary3
It turns out that sometimes, evidently only when run inside the
ZTS handler, arc_summary3 | head > /dev/null will die with ENOTCONN,
and ruin the test run.
Added handling for that.
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12160
Ryan Moeller [Tue, 1 Jun 2021 21:13:26 +0000 (17:13 -0400)]
libzfs_core: Fix some style violations
Made function names start on a new line. Added a blank line between
functions. This helps when grepping for functions.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12137
grembo [Tue, 1 Jun 2021 21:03:49 +0000 (23:03 +0200)]
FreeBSD boot code reminder after zpool upgrade
There used to be a warning after upgrading a zpool in FreeBSD, so users
won't forget to update the boot loader that pool is booted from.
This change brings this warning back, but only if the bootfs property
is set on the pool, which should be sufficient for the vast majority of
FreeBSD installations. People running something custom are most likely
aware of what to do after an upgrade in their specific environment.
Functionality is implemented in an OS specific helper function.
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Co-authored-by: Michael Gmelin <grembo@FreeBSD.org> Signed-off-by: Michael Gmelin <grembo@FreeBSD.org>
Closes #12099
Closes #12104
Rich Ercolani [Tue, 1 Jun 2021 18:58:08 +0000 (14:58 -0400)]
Remove iov_iter_advance() for iter_write
The additional iter advance is incorrect, as copy_from_iter() has
already done the right thing. This will result in the following
warning being printed to the console as of the 5.12 kernel.
Attempted to advance past end of bvec iter
This change should have been included with #11378 when a
similar change was made on the read side.
Manoj Joseph [Fri, 28 May 2021 22:06:07 +0000 (15:06 -0700)]
long options for ztest
This change introduces long options for ztest. It builds the usage
message as well as the long_options array from a single table. It also
adds #defines for the default values.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Manoj Joseph <manoj.joseph@delphix.com>
Closes #12117
Paul Dagnelie [Fri, 28 May 2021 21:08:41 +0000 (14:08 -0700)]
Don't direct to freenode in issue template
While Libera doesn't yet have a webchat client, we should at least
direct them to the right network. Once a webchat client is available,
we can direct them to it.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12127
Rich Ercolani [Fri, 28 May 2021 05:39:53 +0000 (01:39 -0400)]
Minor fix to configure on s390x
configure on s390x has a key check fail with an error about
a variable being used uninitialized. So let's initialize it.
Reviewed-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12126
Alexander Motin [Thu, 27 May 2021 20:27:29 +0000 (16:27 -0400)]
Introduce write-mostly sums
wmsum counters are a reduced version of aggsum counters, optimized for
write-mostly scenarios. They do not provide optimized read functions,
but instead allow much cheaper add function. The primary usage is
infrequently read statistic counters, not requiring exact precision.
The Linux implementation is directly mapped into percpu_counter KPI.
The FreeBSD implementation is directly mapped into counter(9) KPI.
In user-space due to lack of better implementation mapped to aggsum.
Unfortunately neither Linux percpu_counter nor FreeBSD counter(9)
provide sufficient functionality to completelly replace aggsum, so
it still remains to be used for several hot counters.
Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc.
Closes #12114
Alexander Motin [Thu, 27 May 2021 16:11:39 +0000 (12:11 -0400)]
Improve scrub maxinflight_bytes math.
Previously, ZFS scaled maxinflight_bytes based on total number of
disks in the pool. A 3-wide mirror was receiving a queue depth of 3
disks, which it should not, since it reads from all the disks inside.
For wide raidz the situation was slightly better, but still a 3-wide
raidz1 received a depth of 3 disks instead of 2.
The new code counts only unique data disks, i.e. 1 disk for mirrors
and non-parity disks for raidz/draid. For draid the math is still
imperfect, since vdev_get_nparity() returns number of parity disks
per group, not per vdev, but still some better than it was.
This should slightly reduce scrub influence on payload for some pool
topologies by avoiding excessive queuing.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closing #12046
A plain rewrite of the shell version, and generates identical
units, save for replacing some empty lines with nothing, having fewer
meaningless spaces in After=s and different spacing in the lock scripts,
for a clean git diff -w
This is a gain of anywhere from 0m0.336s vs 0m0.022s (15.27x)
to 0m0.202s vs 0m0.006s (33.67x), depending on the hardware,
a.k.a. from "absolutely unusable" to "perfectly fine"
This also properly deals with canmount=noauto units across multiple
pools
See PR for detailed timings (of an early version) and diffs
Reviewed-by: Antonio Russo <aerusso@aerusso.net> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: InsanePrawn <insane.prawny@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #11915
Closes #11917
Alexander Motin [Wed, 26 May 2021 18:14:26 +0000 (14:14 -0400)]
FreeBSD: Update dataset_kstats for zvols in dev mode
Previous commit added accounting for geom mode, but not for dev.
In geom mode we actually have GEOM statistics, while in dev mode
additional accounting actually makes more sense.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12097
Alexander Motin [Wed, 26 May 2021 01:38:34 +0000 (21:38 -0400)]
FreeBSD: avoid memory allocation in arc_prune_async
Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12049
Rich Ercolani [Wed, 26 May 2021 01:08:27 +0000 (21:08 -0400)]
Add note for printing all dbgmsg entries on FreeBSD
I looked for a bit, and couldn't find any documentation on
how to print all logged dbgmsg entries, just messages since
the DTrace probe started, until @allanjude kindly pointed me
toward the sysctl.
So let's add that note where the DTrace probe is mentioned for
FreeBSD, so other people can find it.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12113
vermavipinkumar [Tue, 25 May 2021 18:32:07 +0000 (00:02 +0530)]
Propagate vdev state due to invalid label corruption
Propagate vdev child state to parents on invalid label
Add VDEV_AUX_BAD_LABEL to print_import_config()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Co-authored-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com> Signed-off-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Closes #12088