Justin Gottula [Wed, 30 Jun 2021 01:50:13 +0000 (18:50 -0700)]
Udev rules: use non-ancient comma syntax
This file is old as dirt. It's entirely possible that commas were
optional in udev back at that time. But they're definitely supposed to
be there nowadays.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
Alexander Motin [Thu, 1 Jul 2021 15:32:31 +0000 (11:32 -0400)]
Remove avl_size field from struct avl_tree
This field is used only by illumos mdb. On other platforms it only
increases the struct size from 32 to 40 bytes. For struct vdev_queue,
which includes 13 instances of avl_tree_t, the extra size means more
active cache lines.
Keep the padding in user-space for now to not break the ABI.
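The size arithmetic is easy to check. The sketch below uses simplified, hypothetical stand-ins for avl_tree_t (field names are invented, not the real definition). On LP64 platforms, dropping one pointer-sized field takes the header from 40 to 32 bytes, so a structure that embeds 13 trees shrinks by 104 bytes, which is more than one 64-byte cache line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified tree headers; field names are illustrative. */
typedef struct tree_with_size {
	void *root;
	int (*compar)(const void *, const void *);
	size_t offset;
	uint64_t numnodes;
	size_t size;		/* the field being removed */
} tree_with_size_t;

typedef struct tree_without_size {
	void *root;
	int (*compar)(const void *, const void *);
	size_t offset;
	uint64_t numnodes;
} tree_without_size_t;

/* Bytes saved by a structure that embeds `n` trees. */
static size_t
bytes_saved(size_t n)
{
	return (n * (sizeof (tree_with_size_t) -
	    sizeof (tree_without_size_t)));
}
```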
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc.
Closes #12290
Alexander Motin [Thu, 1 Jul 2021 15:30:31 +0000 (11:30 -0400)]
Compact dbuf/buf hashes and lock arrays
With the default dbuf cache size of 1/32 of ARC, it makes no sense to have
a hash table of the same size (or even bigger on Linux). Reduce it to
1/8 of the ARC's hash table size, still leaving some slack, assuming a
higher I/O rate via the dbuf cache than via the ARC.
Remove padding from the ARC hash locks array. The idea behind padding
is to avoid false sharing between locks. That would make sense if
there were a limited number of very busy locks. But since we
have no limit on the number, by using the same memory for more locks we
can achieve even lower lock contention with the same false sharing,
or we can use less memory for the same contention level.
Reduce the number of hash locks from 8192 to 2048. The number is still
big enough to avoid contention, but the reduced memory size improves
the cache hit rate for mutex_tryenter() in the ARC eviction thread,
saving about 1% of the thread time.
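To illustrate the trade-off, here is a sketch with hypothetical names using pthread mutexes: a padded lock occupies a whole cache line, while packed locks share lines, so the same memory holds several times more locks. The lock for a given hash is picked with a mask, which assumes a power-of-two lock count such as 2048.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define	CACHE_LINE_SIZE	64
#define	NUM_HASH_LOCKS	2048	/* must be a power of two */

/* Old style: each lock padded out to its own cache line. */
typedef struct padded_lock {
	_Alignas(CACHE_LINE_SIZE) pthread_mutex_t mtx;
} padded_lock_t;

/* New style: locks packed back to back; several share a cache line. */
static pthread_mutex_t hash_locks[NUM_HASH_LOCKS];

static void
hash_locks_init(void)
{
	for (int i = 0; i < NUM_HASH_LOCKS; i++)
		pthread_mutex_init(&hash_locks[i], NULL);
}

/* Map a hash value to its lock; masking works because the count is 2^k. */
static pthread_mutex_t *
hash_lock(uint64_t hash)
{
	return (&hash_locks[hash & (NUM_HASH_LOCKS - 1)]);
}
```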
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc.
Closes #12289
Fix a leak of abd_t that manifested mostly when using
raidzN with at least as many columns as N (e.g. a
four-disk raidz2 but not a three-disk raidz2).
Sufficiently heavy raidz use would eventually run a system
out of memory.
Additionally:
* Switch abd_cache arena to FIRSTFIT, which empirically
improves performance.
* Make abd_chunk_cache more performant and debuggable.
* Allocate the abd_zero_buf from abd_chunk_cache rather
than the heap.
* Don't try to reap non-existent qcaches in abd_cache arena.
* KM_PUSHPAGE->KM_SLEEP when allocating chunks from their
own arena
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Jorgen Lundman <lundman@lundman.net> Co-authored-by: Sean Doran <smd@use.net>
Closes #12295
Kevin Jin [Thu, 1 Jul 2021 15:20:27 +0000 (11:20 -0400)]
Optimize txg_kick() process (#12274)
Use dp_dirty_pertxg[] for txg_kick(), instead of dp_dirty_total as in
the original code. An extra parameter "txg" is added to txg_kick(), so it
knows which txg to kick. Also, the txg_kick() call is moved from
dsl_pool_need_dirty_delay() to dsl_pool_dirty_space() so that we can
know the txg number assigned for txg_kick().
Some unnecessary code regarding dp_dirty_total in txg_sync_thread() is
also cleaned up.
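A rough sketch of per-txg accounting, assuming a hypothetical pool_t, kick threshold and txg_kick() stand-in (the real code uses dp_dirty_pertxg[] with its own thresholds): dirty space is charged to the slot of the txg it was assigned to, and because that txg number is known at this point, the kick can name it directly. TXG_SIZE and TXG_MASK model a small ring of in-flight txgs.

```c
#include <assert.h>
#include <stdint.h>

#define	TXG_SIZE	4		/* txgs tracked concurrently */
#define	TXG_MASK	(TXG_SIZE - 1)

/* Hypothetical, simplified pool state. */
typedef struct pool {
	uint64_t dirty_pertxg[TXG_SIZE];	/* dirty bytes per open txg */
	uint64_t kick_threshold;		/* hypothetical sync trigger */
	uint64_t last_kicked_txg;
} pool_t;

/* Kick a specific txg; this sketch only records which one. */
static void
txg_kick(pool_t *dp, uint64_t txg)
{
	if (txg > dp->last_kicked_txg)
		dp->last_kicked_txg = txg;
}

/* Charge dirty space to the txg it was assigned to, then maybe kick it. */
static void
pool_dirty_space(pool_t *dp, uint64_t space, uint64_t txg)
{
	dp->dirty_pertxg[txg & TXG_MASK] += space;
	if (dp->dirty_pertxg[txg & TXG_MASK] >= dp->kick_threshold)
		txg_kick(dp, txg);
}
```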
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes #12274
Alexander Motin [Thu, 1 Jul 2021 15:16:54 +0000 (11:16 -0400)]
Remove refcount from spa_config_*()
The only reason for spa_config_*() to use a refcount instead of a simple
non-atomic (thanks to scl_lock) variable for scl_count is tracking,
which has been hard-disabled for the last 8 years. Switching to a simple
int scl_count reduces the lock hold time by avoiding atomics, and makes
the structure fit into a single cache line, reducing lock contention.
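A simplified sketch of why a plain int suffices: the count is only read or modified while the mutex is held, so the mutex already serializes it and no atomic is needed. config_lock_t and its helpers are hypothetical pthread stand-ins, not the real spa_config_enter()/spa_config_exit(), and they omit the write-wanted logic.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified reader/writer config lock. */
typedef struct config_lock {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	void *writer;		/* owner when held as writer, else NULL */
	int count;		/* plain int: only touched under `lock` */
} config_lock_t;

static void
config_lock_init(config_lock_t *scl)
{
	pthread_mutex_init(&scl->lock, NULL);
	pthread_cond_init(&scl->cv, NULL);
	scl->writer = NULL;
	scl->count = 0;
}

static void
config_enter_reader(config_lock_t *scl)
{
	pthread_mutex_lock(&scl->lock);
	while (scl->writer != NULL)
		pthread_cond_wait(&scl->cv, &scl->lock);
	scl->count++;		/* no atomic needed; the mutex serializes it */
	pthread_mutex_unlock(&scl->lock);
}

static void
config_enter_writer(config_lock_t *scl, void *self)
{
	pthread_mutex_lock(&scl->lock);
	while (scl->count != 0)
		pthread_cond_wait(&scl->cv, &scl->lock);
	scl->writer = self;
	scl->count++;
	pthread_mutex_unlock(&scl->lock);
}

static void
config_exit(config_lock_t *scl)
{
	pthread_mutex_lock(&scl->lock);
	assert(scl->count > 0);
	if (--scl->count == 0) {
		scl->writer = NULL;
		pthread_cond_broadcast(&scl->cv);
	}
	pthread_mutex_unlock(&scl->lock);
}
```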
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc.
Closes #12287
Ryan Moeller [Wed, 30 Jun 2021 14:37:20 +0000 (10:37 -0400)]
ZED: Match added disk by pool/vdev GUID if found (#12217)
This enables ZED to auto-online vdevs that are not wholedisk managed by
ZFS.
Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Don Brady <don.brady@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Brian Behlendorf [Tue, 29 Jun 2021 20:16:38 +0000 (13:16 -0700)]
Linux 5.13 compat: META
Increase the Linux-Maximum version in the META file to 5.13.
All of the required compatibility patches have been merged
and the 5.13 kernel has been officially released.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Alexander [Tue, 29 Jun 2021 14:26:11 +0000 (16:26 +0200)]
module/zfs: simplify ddt_stat_add() loop
LLVM's Polly (ISL to be precise) is unhappy with the loop from
ddt_stat_add():
CC [M] fs/zfs/zfs/ddt.o
../lib/External/isl/isl_schedule_node.c:2470: cannot insert node
between set or sequence node and its filter children
(building with the custom patch which adds Polly support to Kbuild)
The mentioned loop is rather suboptimal. All we need is to
treat ddt_stat_t as an array of u64 and perform 1:1 addition or
subtraction. This can be done in a simpler for-loop with a
determined index and bounds. The compiler will expand d_end - d into
the number of ddt_stat_t fields at compile time.
This prevents Polly from failing on this file.
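A sketch of the simplified loop, using a hypothetical four-field stand-in for ddt_stat_t. The XOR/subtract form used here to negate src when subtracting is one common branch-free technique and is an assumption of this sketch, not necessarily the exact upstream expression.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for ddt_stat_t: every field is a uint64_t. */
typedef struct dstat {
	uint64_t blocks;
	uint64_t lsize;
	uint64_t psize;
	uint64_t dsize;
} dstat_t;

/*
 * Add (neg == 0) or subtract (neg == UINT64_MAX) src from dst field by
 * field, treating both structs as arrays of uint64_t.  With neg == ~0,
 * (s ^ neg) - neg == ~s + 1 == -s in two's complement.
 */
static void
dstat_add(dstat_t *dst, const dstat_t *src, uint64_t neg)
{
	const uint64_t *s = (const uint64_t *)src;
	uint64_t *d = (uint64_t *)dst;
	const int n = sizeof (*dst) / sizeof (uint64_t);	/* compile-time */

	assert(neg == 0 || neg == UINT64_MAX);
	for (int i = 0; i < n; i++)
		d[i] += (s[i] ^ neg) - neg;
}
```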
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #12253
Alexander Motin [Tue, 29 Jun 2021 12:59:14 +0000 (08:59 -0400)]
Avoid 64bit division in multilist index functions
The number of sublists in a multilist is relatively small. We don't need
64 bits to calculate an index. 32 bits is sufficient and makes the
code more efficient.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc.
Closes #12288
Michal Vasilek [Sat, 26 Jun 2021 05:43:25 +0000 (07:43 +0200)]
Fix plymouth passphrase prompt with dracut
plymouth --command splits the command on spaces which means
that zfs-load-key was getting the filesystem name enclosed
in single quotes (since 13c59bb76) and failing. This commit
fixes it by piping the password directly to the command
similar to how it's done in other scripts (initramfs,
dracut without plymouth).
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Michal Vasilek <michal@vasilek.cz> Related-to: #9193 Related-to: #9202
Closes #12147
Rich Ercolani [Sat, 26 Jun 2021 05:28:12 +0000 (01:28 -0400)]
Fix build with KASAN
The stock zstd code expects some helpers from ASAN if present.
This works fine in userland, but in kernel, KASAN also gets detected,
and lacks those helpers. So let's make some empty substitutes for
that case.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12232
Alexander Motin [Fri, 25 Jun 2021 23:38:31 +0000 (19:38 -0400)]
Help compiler optimize out abd_verify()
While abd_verify() does nothing when built without debug, compiler
can't optimize it out by itself due to calls to external list_*()
and abd_verify_scatter(). This commit makes it explicit.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Adam Moss <c@yotes.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc.
Closes #12280
Martin Matuška [Fri, 25 Jun 2021 17:28:51 +0000 (19:28 +0200)]
FreeBSD: fix compilation of FreeBSD world after 29274c9f6
prng32_bounded() is available to kernel only on FreeBSD 13+.
Call inline random_get_pseudo_bytes() with correct pointer type.
To be consistent, apply to Linux as well.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #12282
Brian Behlendorf [Thu, 24 Jun 2021 21:30:02 +0000 (14:30 -0700)]
Update cache file when setting compatibility property
Unlike most other properties the 'compatibility' property is stored
in the pool config object and not the DMU_OT_POOL_PROPS object.
This has the advantage that the compatibility information is available
without needing to fully import the pool (it can be read with zdb).
However, this means we need to make sure to update both the copy of
the config in the MOS and the cache file. This wasn't being done.
This commit adds a call to spa_async_request() to ensure the copy of
the config in the cache file gets updated as well as the one stored
in the pool. This same change is made for the 'comment' property
which suffers from the same inconsistency.
Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Colm Buckley <colm@tuatha.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12261
Closes #12276
Paul Dagnelie [Thu, 24 Jun 2021 19:42:01 +0000 (12:42 -0700)]
Fix flag copying in resume case
A couple of flags weren't being copied in the case where we're doing size
estimation on a resume.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes: #12266
jumbi77 [Thu, 24 Jun 2021 17:02:54 +0000 (19:02 +0200)]
zfs_metaslab_mem_limit should be 25 instead of 75
According to current zfs man page zfs_metaslab_mem_limit should be
25 instead of 75.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: jumbi77@users.noreply.github.com
Closes #12273
Attila Fülöp [Wed, 23 Jun 2021 23:57:06 +0000 (01:57 +0200)]
gcc 11 cleanup
Compiling with gcc 11.1.0 produces three new warnings.
Change the code slightly to avoid them.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12130
Closes #12188
Closes #12237
Brian Behlendorf [Wed, 23 Jun 2021 22:53:13 +0000 (15:53 -0700)]
ZTS: Add known exceptions
The receive-o-x_props_override test case reliably fails on the
FreeBSD main builders (but not on Linux). Until the root cause is
understood, add this test to the FreeBSD exception list.
On Linux the alloc_class_012_pos test case may occasionally fail.
This is a known false positive which has also been added to the
Linux exception list until the test can be made entirely reliable.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12272
Rich Ercolani [Wed, 23 Jun 2021 04:53:45 +0000 (00:53 -0400)]
Annotated dprintf as printf-like
ZFS loves using %llu for uint64_t, but that requires a cast to not
be noisy, which is done in many, though not all, places.
Also, a couple of places used %u for uint64_t; these were promoted
to %llu.
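Marking a function printf-like with the GCC/Clang format attribute is what lets the compiler flag %llu/uint64_t mismatches. dbg_snprintf() and log_txg() are hypothetical helpers (the real dprintf() plumbing differs, and dprintf() is also a POSIX function name, hence the different name here).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* The format attribute makes the compiler check args like printf(). */
static int dbg_snprintf(char *buf, size_t len, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

static int
dbg_snprintf(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	int n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return (n);
}

static void
log_txg(char *buf, size_t len, uint64_t txg)
{
	/* Without the cast, -Wformat warns where uint64_t is unsigned long. */
	(void) dbg_snprintf(buf, len, "txg %llu", (unsigned long long)txg);
}
```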
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12233
Per the discussion in #11531, the reverted commit, which was intended only
as a cleanup commit, introduced a subtle, unintended change in
behavior.
Care was taken to partially revert and then reapply 10b3c7f5e4,
which would otherwise have caused a conflict. These changes were
squashed into this commit.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Suggested-by: @chrisrd Suggested-by: robn@despairlabs.com Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11531
Closes #12227
Alexander Motin [Tue, 22 Jun 2021 23:35:23 +0000 (19:35 -0400)]
Optimize small random numbers generation
In all places except two, spa_get_random() is used for small values,
and the consumers do not require well seeded high quality values.
Switch those two exceptions directly to random_get_pseudo_bytes()
and optimize spa_get_random(), renaming it to random_in_range(),
since it is not related to SPA or ZFS in general.
On FreeBSD directly map random_in_range() to new prng32_bounded() KPI
added in FreeBSD 13. On Linux and in user-space just reduce the type
used to uint32_t to avoid more expensive 64bit division.
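A minimal user-space sketch of the idea, under stated assumptions: a toy xorshift32 generator stands in for the platform PRNG (random_get_pseudo_bytes() or prng32_bounded() in the real code), and the result is reduced with a 32-bit modulo. The small modulo bias is acceptable only because these callers do not need high-quality values.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in PRNG (xorshift32); the real code uses the platform's. */
static uint32_t prng_state = 2463534242u;

static uint32_t
prng32(void)
{
	uint32_t x = prng_state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return (prng_state = x);
}

/* Return a value in [0, range) using only a 32-bit division. */
static uint32_t
random_in_range(uint32_t range)
{
	assert(range != 0);
	if (range == 1)
		return (0);
	return (prng32() % range);
}
```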
Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc.
Closes #12183
Alexander Motin [Thu, 17 Jun 2021 00:19:34 +0000 (20:19 -0400)]
Use wmsum for arc, abd, dbuf and zfetch statistics. (#12172)
wmsum was designed exactly for cases like these, with many updates
and rare reads. It allows atomic operations on congested global
variables to be avoided completely.
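The general per-CPU counter pattern behind this can be sketched in a few lines. This is a simplified, single-threaded illustration with a hypothetical fixed CPU count and a caller-supplied CPU index, not the real wmsum implementation: updates touch only the local slot, and the rare read pays for walking every slot.

```c
#include <assert.h>
#include <stdint.h>

#define	NCPU	4	/* hypothetical fixed CPU count for the sketch */

/*
 * Write-mostly sum.  In a real implementation each slot would live in
 * per-CPU memory (or at least its own cache line) to avoid false
 * sharing, and the update would run with preemption disabled.
 */
typedef struct wm_sum {
	int64_t slot[NCPU];
} wm_sum_t;

/* Cheap, frequent update: no atomics, only the local slot. */
static void
wm_sum_add(wm_sum_t *ws, int cpu, int64_t delta)
{
	ws->slot[cpu] += delta;
}

/* Expensive, rare read: sum every slot. */
static int64_t
wm_sum_value(const wm_sum_t *ws)
{
	int64_t total = 0;

	for (int i = 0; i < NCPU; i++)
		total += ws->slot[i];
	return (total);
}
```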
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc.
Closes #12172
George Amanakis [Thu, 17 Jun 2021 00:17:42 +0000 (03:17 +0300)]
Avoid deadlock when removing L2ARC devices under I/O
If we have I/O in flight and try to remove an L2ARC device, a deadlock might
occur. arc_read()->zio_read()->zfs_blkptr_verify() waits for SCL_VDEV
to be dropped while holding the hash_lock. However, spa_l2cache_load()
holds SCL_ALL and waits for the hash_lock in l2arc_evict().
Fix this by moving zfs_blkptr_verify() to the top of arc_read(), before
the hash_lock is taken. Verify the block pointer and return a checksum
error if it is damaged rather than halting the system, by using
BLK_VERIFY_LOG instead of BLK_VERIFY_HALT.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12054
Turns out $ZPOOL_IMPORT_OPTS expands in a shell-like fashion,
yielding 'import' '-aN' '-o' 'cachefile=none' for an unset variable,
and 'import' '-aN' '-o' 'cachefile=none' 'word1' 'word2' for a
white-spaced one, but ${ZPOOL_IMPORT_OPTS} expands like "${Z_I_O}"
would in a shell, yielding 'import' '-aN' '-o' 'cachefile=none' ''
(empty) and 'import' '-aN' '-o' 'cachefile=none' 'word1 word2' (spaced)
Matthew Ahrens [Sun, 13 Jun 2021 17:48:53 +0000 (10:48 -0700)]
vdev_draid_min_asize() ignores reserved space
vdev_draid_min_asize() returns the minimum size of a child vdev. This
is used when determining if a disk is big enough to replace a child.
It's also used by zdb to determine how big of a child to make to test
replacement.
vdev_draid_min_asize() says that the child’s asize has to be at least
1/Nth of the entire draid’s asize, which is the same logic as raidz.
However, this contradicts the code in vdev_draid_open(), which
calculates the draid’s asize based on a reduced child size:
An additional 32MB of scratch space is reserved at the end of each
child for use by the dRAID expansion feature
So the problem is that you can replace a draid disk with one that’s
vdev_draid_min_asize(), but it actually needs to be larger to accommodate
the additional 32MB. The replacement is allowed and everything works at
first (since the reserved space is at the end, and we don’t try to use
it yet), but when you try to close and reopen the pool,
vdev_draid_open() calculates a smaller asize for the draid, because of
the smaller leaf, which is not allowed.
I think the confusion is that vdev_draid_min_asize() is correctly
returning the amount of required *allocatable* space in a leaf, but the
actual *size* of the leaf needs to be at least 32MB more than that.
ztest_vdev_attach_detach() assumes that it can attach that size of
device, and it actually can (the kernel/libzpool accepts it), but it
then later causes zdb to not be able to open the pool.
This commit changes vdev_draid_min_asize() to return the required size
of the leaf, not the size that draid will make available to the metaslab
allocator.
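In other words, the minimum leaf size is the per-child share of allocatable space plus the reserve. A hypothetical helper, assuming the children share the allocatable size evenly (rounded up) and using the 32MB per-child reserve quoted in the commit text:

```c
#include <assert.h>
#include <stdint.h>

/* Per-child scratch reserve described in the commit text (32 MiB). */
#define	DRAID_CHILD_RESERVE	(32ULL << 20)

/*
 * Hypothetical helper: smallest leaf that can replace a child.  Each
 * child must supply 1/Nth of the dRAID's allocatable size (rounded up)
 * plus the reserved space at its end.
 */
static uint64_t
draid_min_leaf_size(uint64_t draid_min_asize, uint64_t ndisks)
{
	assert(ndisks > 0);
	return (DRAID_CHILD_RESERVE +
	    (draid_min_asize + ndisks - 1) / ndisks);
}
```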
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11459
Closes #12221
Paul Zuchowski [Sat, 12 Jun 2021 00:00:33 +0000 (20:00 -0400)]
Do not hash unlinked inodes
In zfs_znode_alloc we always hash inodes. If the
znode is unlinked, we do not need to hash it. This
fixes the problem where zfs_suspend_fs is doing zrele
(iput) in an async fashion, and zfs_resume_fs unlinked
drain processing will try to hash an inode that could
still be hashed, resulting in a panic.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alan Somers <asomers@gmail.com> Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9741
Closes #11223
Closes #11648
Closes #12210
наб [Sat, 22 May 2021 15:19:14 +0000 (17:19 +0200)]
Forbid basename(3) and dirname(3)
There are at least two interpretations of basename(3),
in addition to both functions being allowed to /both/ return a static
buffer (unsuitable in multi-threaded environments) /and/ raze the input
(which encourages overallocations, at best)
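One way to avoid both problems is a helper that returns a pointer into the caller's string and never writes to it. This is a hypothetical sketch, not the replacement the commit adds; note that it treats a trailing slash differently from POSIX basename() ("a/b/" yields "").

```c
#include <assert.h>
#include <string.h>

/* Last path component as a pointer into `path`; no static buffer. */
static const char *
last_component(const char *path)
{
	const char *slash = strrchr(path, '/');

	return (slash == NULL ? path : slash + 1);
}

/* Length of the directory part, so callers can copy or print it. */
static size_t
dir_length(const char *path)
{
	const char *slash = strrchr(path, '/');

	return (slash == NULL ? 0 : (size_t)(slash - path));
}
```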
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12105
Brian Behlendorf [Fri, 11 Jun 2021 15:21:36 +0000 (08:21 -0700)]
ZTS: Add zfs_clone_livelist_dedup.ksh to Makefile.am
Commit 86b5f4c12 added a new zfs_clone_livelist_dedup.ksh test case
but didn't include it in the Makefile.am. This results in the test
not being included in the dist tarball so it's never run by the CI.
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #12224
Alexander Motin [Thu, 10 Jun 2021 16:42:31 +0000 (12:42 -0400)]
Re-embed multilist_t storage
This commit partially reverts changes to multilists in PR 7968
(multi-threaded spa-sync()) and adds some cache line alignments to
separate read-only multilists and heavily modified refcounts into different
cache lines.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-by: iXsystems, Inc.
Closes #12158
Alexander Motin [Thu, 10 Jun 2021 15:27:33 +0000 (11:27 -0400)]
Remove pool io kstats (#12212)
This mostly reverts the "3537 want pool io kstats" commit of 8 years ago.
On one hand, this code's use of pool-wide locks became quite bad for
performance, creating significant lock contention in the I/O pipeline.
On the other, there are more efficient ways now to obtain detailed
statistics, while these statistics are illumos-specific and much less
usable on Linux and FreeBSD, where they are reported only via procfs/sysctls.
This commit does not remove the KSTAT_TYPE_IO implementation; that may
be removed later together with the already unused KSTAT_TYPE_INTR and
KSTAT_TYPE_TIMER.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc.
Closes #12212
Rich Ercolani [Thu, 10 Jun 2021 00:57:57 +0000 (20:57 -0400)]
Added error for writing to /dev/ on Linux
Starting in Linux 5.10, trying to write to /dev/{null,zero} errors out.
Prefer to inform people when this happens rather than hoping they guess
what's wrong.
Reviewed-by: Antonio Russo <aerusso@aerusso.net> Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes: #11991
No symbols affected in libavl
No symbols affected by libtpool, but pre-ANSI declarations got purged
No symbols affected by libzfs_core
No symbols affected by libzfs_bootenv
libefi got cleaned, gained efi_debug documentation in efi_partition.h,
and removes one undocumented and unused symbol from libzfs_core:
D default_vtoc_map
libnvpair saw removal of these symbols:
D nv_alloc_nosleep_def
D nv_alloc_sleep
D nv_alloc_sleep_def
D nv_fixed_ops_def
D nvlist_hashtable_init_size
D nvpair_max_recursion
libshare saw removal of these symbols from libzfs:
T libshare_nfs_init
T libshare_smb_init
T register_fstype
B smb_shares
libzutil saw removal of these internal symbols from libzfs_core:
T label_paths
T slice_cache_compare
T zpool_find_import_blkid
T zpool_open_func
T zutil_alloc
T zutil_strdup
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12191
наб [Thu, 3 Jun 2021 21:34:27 +0000 (23:34 +0200)]
libefi: remove efi_auto_sense()
It's present (but undocumented) in the illumos gate and used exclusively
by rmformat(1) (which I recommend as a nice blast from the past),
and also the math assumes 512B sectors and is therefore wrong
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12191
The first warning of a misspelling is a false positive, so we annotate
the script accordingly. As for the x-prefix warnings update the check
to use the conventional '[ -z <string> ]' syntax.
all-syslog.sh:46:47: warning: Possible misspelling: ZEVENT_ZIO_OBJECT
may not be assigned, but ZEVENT_ZIO_OBJSET is. [SC2153]
make_gitrev.sh:53:6: note: Avoid x-prefix in comparisons as it no
longer serves a purpose [SC2268]
man-dates.sh:10:7: note: Avoid x-prefix in comparisons as it no
longer serves a purpose [SC2268]
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12208
Rich Ercolani [Wed, 9 Jun 2021 00:20:16 +0000 (20:20 -0400)]
Correct a flaw in the Python 3 version checking
It turns out the ax_python_devel.m4 version check assumes that
("3.X+1.0" >= "3.X.0") is True in Python, which is not the case when X+1
is 10 or above and X is not. (Also presumably X+1=100 and ...)
So let's remake the check to behave consistently, using the
"packaging" or (if absent) the "distlib" modules.
(Also, update the GitHub workflows to use the new packages.)
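The pitfall is easy to reproduce outside Python: comparing version strings character by character puts "3.10.0" before "3.9.0". The build check itself is fixed with the packaging/distlib modules; this C sketch only illustrates lexicographic versus numeric, component-wise comparison, and version_cmp() is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Compare dotted version strings numerically, component by component,
 * returning <0, 0 or >0 like strcmp().  Missing components count as 0
 * and any non-numeric suffix (e.g. "rc1") is ignored.
 */
static int
version_cmp(const char *a, const char *b)
{
	while (*a != '\0' || *b != '\0') {
		unsigned long x = 0, y = 0;
		char *end;

		if (*a != '\0') {
			x = strtoul(a, &end, 10);
			a = (*end == '.') ? end + 1 : end + strlen(end);
		}
		if (*b != '\0') {
			y = strtoul(b, &end, 10);
			b = (*end == '.') ? end + 1 : end + strlen(end);
		}
		if (x != y)
			return (x < y ? -1 : 1);
	}
	return (0);
}
```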
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes: #12073
This:
(a) improves the error log message,
(b) locks per pool instead of globally,
(c) locks the actual output file instead of /var/lock/zfs-list,
which would otherwise linger there forever (well, still will,
but you can remove it and it won't come back), and
(d) preserves attributes of the output file
instead of reverting them to 0:0 644
It is imperative that the previous commit
("zed-functions.sh: zed_lock(): don't truncate lock")
be included in any series that contains this one
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
наб [Fri, 14 May 2021 02:18:20 +0000 (04:18 +0200)]
zed.d/all-debug.sh: simplify
By locking the log file itself, we can omit arduous rebinding and
explicit umask setting, but, perhaps more importantly, avoid permanently
littering /var/lock/ with zed.debug.log.lock we will never delete
It is imperative that the previous commit
("zed-functions.sh: zed_lock(): don't truncate lock")
be included in any series that contains this one
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
наб [Fri, 14 May 2021 02:17:31 +0000 (04:17 +0200)]
zed-functions.sh: zed_lock(): don't truncate lock
By appending instead of truncating, we can lock on any file (with write
permissions) instead of only dedicated lock files, since the locking
process itself no longer alters the file in any way
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
Alan Somers [Tue, 8 Jun 2021 13:36:43 +0000 (07:36 -0600)]
libzfs: On FreeBSD, use MNT_NOWAIT with getfsstat
`getfsstat(2)` is used to retrieve the list of mounted file systems,
which libzfs uses when fetching properties like mountpoint, atime,
setuid, etc. The `mode` parameter may be `MNT_NOWAIT`, which uses
information in the VFS's cache, or `MNT_WAIT`, which effectively does a
`statfs` on every single mounted file system in order to fetch the most
up-to-date information. As far as I can tell, the only fields that
libzfs cares about are the filesystem's name, mountpoint, fstypename,
and mount flags. Those things are always updated on mount and unmount,
so they will always be accurate in the VFS's mount cache except in two
circumstances:
1) When a file system is busy unmounting
2) When a ZFS file system changes the value of a mount-overridable
property like atime or setuid, but doesn't remount the file system.
Right now that only happens when the property is changed by an
unprivileged user who has delegated authority to change the property
but not to mount the dataset. But perhaps libzfs could choose to do
it for other reasons in the future.
Switching to `MNT_NOWAIT` will greatly improve speed with no downside,
as long as we explicitly update the mount cache whenever we change a
mount-overridable property.
For comparison, Illumos gets this information using the native
`getmntany` and `getmntent` functions, which also use cached
information. The illumos function that would refresh the cache,
`resetmnttab`, is never called by libzfs.
And on GNU/Linux, `getmntany` and `getmntent` don't even communicate
with the kernel directly. They simply parse the file they are given,
which is usually /etc/mtab or /proc/mounts. Perhaps the implementation
of /proc/mounts is synchronous, à la MNT_WAIT; I don't know.
Sponsored-by: Axcient Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com> Closes: #12091
Rich Ercolani [Mon, 7 Jun 2021 19:29:27 +0000 (15:29 -0400)]
Force --enable-debug on FreeBSD if INVARIANTS is set
There's already logic to force INVARIANTS on for building if it's
present in the running kernel; however, not having DEBUG enabled
when the running kernel has both DEBUG and INVARIANTS enabled can cause
strange panics.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12185
Closes #12163
Update the logic to handle the dedup-case of consecutive
FREEs in the livelist code. The logic still ensures that
all the FREE entries are matched up with a respective
ALLOC by keeping a refcount for each FREE blkptr that we
encounter and ensuring that this refcount gets to zero
by the time we are done processing the livelist.
zdb -y no longer panics when encountering double frees
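A simplified model of the matching logic, with a hypothetical entry format (a block id instead of a full blkptr) and a fixed-size table instead of a proper search structure. It walks entries from newest to oldest, which is an assumption of this sketch: each FREE bumps a refcount for its block, so consecutive FREEs of a deduplicated block are allowed, and each ALLOC consumes one pending FREE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	MAX_FREES	16	/* hypothetical fixed capacity */

typedef enum { ENTRY_ALLOC, ENTRY_FREE } entry_type_t;

typedef struct entry {
	entry_type_t type;
	uint64_t blkid;		/* stand-in for a block pointer's identity */
} entry_t;

typedef struct free_ref {
	uint64_t blkid;
	int refcount;		/* FREEs of this block not yet matched */
} free_ref_t;

/* True iff every FREE is matched by an ALLOC (all refcounts reach 0). */
static bool
livelist_frees_matched(const entry_t *ents, size_t n)
{
	free_ref_t refs[MAX_FREES];
	size_t nrefs = 0;

	for (size_t i = n; i-- > 0; ) {
		size_t j;

		for (j = 0; j < nrefs && refs[j].blkid != ents[i].blkid; j++)
			;
		if (ents[i].type == ENTRY_FREE) {
			if (j == nrefs) {
				if (nrefs == MAX_FREES)
					return (false);
				refs[nrefs++] = (free_ref_t){ ents[i].blkid, 0 };
			}
			refs[j].refcount++;
		} else if (j < nrefs && refs[j].refcount > 0) {
			refs[j].refcount--;
		}
	}
	for (size_t j = 0; j < nrefs; j++)
		if (refs[j].refcount != 0)
			return (false);
	return (true);
}
```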
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #11480
Closes #12177
Alexander Motin [Mon, 7 Jun 2021 16:02:47 +0000 (12:02 -0400)]
More aggsum optimizations
- Avoid atomic_add() when updating as_lower_bound/as_upper_bound.
Previous code was excessively strong on 64bit systems while not
strong enough on 32bit ones. Instead, introduce and use real
atomic_load() and atomic_store() operations, which are just plain
assignments on 64bit machines but use proper atomics on 32bit ones
to avoid torn reads/writes.
- Reduce the number of buckets on large systems. Extra buckets improve
add speed less than they hurt reads. Unlike with wmsum, reads are
still important for aggsum.
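A minimal sketch of the load/store idea using C11 `<stdatomic.h>` relaxed operations, rather than the atomic_load()/atomic_store() operations the commit introduces. bounds_t and its helpers are hypothetical, and the sketch assumes writers are serialized by a lock while readers are lockless. A relaxed 64-bit load or store is a plain move on 64-bit machines, still cannot tear on 32-bit ones, and avoids a read-modify-write such as atomic_add.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct bounds {
	_Atomic int64_t lower;
	_Atomic int64_t upper;
} bounds_t;

/* Writer side: caller holds the lock that serializes updates. */
static void
bounds_shift(bounds_t *b, int64_t delta)
{
	int64_t lo = atomic_load_explicit(&b->lower, memory_order_relaxed);
	int64_t hi = atomic_load_explicit(&b->upper, memory_order_relaxed);

	atomic_store_explicit(&b->lower, lo + delta, memory_order_relaxed);
	atomic_store_explicit(&b->upper, hi + delta, memory_order_relaxed);
}

/* Lockless reader: each 64-bit value is read whole, never torn. */
static int64_t
bounds_lower(bounds_t *b)
{
	return (atomic_load_explicit(&b->lower, memory_order_relaxed));
}
```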
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc.
Closes #12145
Rich Ercolani [Fri, 4 Jun 2021 21:00:39 +0000 (17:00 -0400)]
Let zfs diff be more permissive
In the current world, `zfs diff` will die on certain kinds of errors
that come up on ordinary, not-mangled filesystems - like EINVAL,
which can occur when a file has multiple hard links and the link
whose name is referenced has been deleted.
Since it should always be safe to continue, let's relax about all
error codes - still print something for most, but don't immediately
abort when we encounter them.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12072
jharmening [Fri, 4 Jun 2021 20:11:08 +0000 (13:11 -0700)]
FreeBSD: incorporate changes to the VFS_QUOTACTL(9) KPI
VFS_QUOTACTL(9) has been updated to allow each filesystem to indicate
whether it has changed the busy state of the mount. The filesystem
may still assume that its .vfs_quotactl entrypoint is always called
with the mount busied, but only needs to unbusy the mount (and clear
*mp_busy) if it does something that actually requires the mount to be
unbusied. It no longer needs to blindly copy-paste the UFS protocol
for calling vfs_unbusy(9) for the Q_QUOTAOFF and Q_QUOTAON commands.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Jason Harmening <jason.harmening@gmail.com>
Closes #12052
Ryan Moeller [Fri, 4 Jun 2021 19:53:44 +0000 (15:53 -0400)]
Fix error check in nvlist_print_json_string
Move the check for errors from mbrtowc() into the loop. The error values
are not actually negative, so previously we did not break out of the loop
when they were encountered.
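A sketch of the corrected pattern, assuming a simplified character-counting loop rather than the real nvlist_print_json_string(): mbrtowc() returns size_t, so its error values (size_t)-1 and (size_t)-2 are huge positive numbers that a `< 0` test can never catch; they must be compared explicitly on every iteration. count_chars() and mbrtowc_failed() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <wchar.h>

/* mbrtowc() signals errors with (size_t)-1 or (size_t)-2. */
static bool
mbrtowc_failed(size_t sz)
{
	return (sz == (size_t)-1 || sz == (size_t)-2);
}

/* Count characters in `s`; returns -1 on an invalid/truncated sequence. */
static long
count_chars(const char *s)
{
	mbstate_t mbr;
	size_t len = strlen(s);
	long count = 0;

	memset(&mbr, 0, sizeof (mbr));
	while (len > 0) {
		wchar_t c;
		size_t sz = mbrtowc(&c, s, len, &mbr);

		if (mbrtowc_failed(sz))		/* checked every iteration */
			return (-1);
		if (sz == 0)			/* defensive: decoded a NUL */
			break;
		s += sz;
		len -= sz;
		count++;
	}
	return (count);
}
```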
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12175
Closes #12176
Linux: Set spl_kmem_cache_slab_limit when page size !4K
For small objects, the kernel's slab implementation is very fast and
space efficient. However, as the allocation size increases to
require multiple pages, performance suffers. The SPL kmem cache
allocator was designed to better handle these large allocation
sizes. Therefore, on Linux the kmem_cache_* compatibility wrappers
prefer to use the kernel's slab allocator for small objects and
the custom SPL kmem cache allocator for larger objects.
This logic was effectively disabled for all architectures using
a non-4K page size, which caused all kmem caches to only use the
SPL implementation. Functionally this is fine, but the SPL code
which calculates the target number of objects per-slab does not
take into account that __vmalloc() always returns page-aligned
memory. This can result in a massive amount of wasted space when
allocating tiny objects on a platform using large pages (64k).
To resolve this issue we set the spl_kmem_cache_slab_limit cutoff
to 16K for all architectures.
This particular change does not attempt to update the logic used
to calculate the optimal number of pages per slab. This remains
an issue which should be addressed in a future change.
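The waste is simple arithmetic. In this simplified model (it ignores slab headers and uses hypothetical object counts), a slab's backing memory is rounded up to a whole page, so the same 2 KiB of tiny objects wastes 2 KiB on 4K pages but 62 KiB on 64K pages.

```c
#include <assert.h>
#include <stdint.h>

/* Round `x` up to a multiple of `align` (a power of two). */
static uint64_t
round_up(uint64_t x, uint64_t align)
{
	return ((x + align - 1) & ~(align - 1));
}

/* Bytes wasted when a slab's memory is always page-aligned and sized. */
static uint64_t
slab_waste(uint64_t objsize, uint64_t nobjs, uint64_t pagesize)
{
	uint64_t need = objsize * nobjs;

	return (round_up(need, pagesize) - need);
}
```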
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12152
Closes #11429
Closes #11574
Closes #12150
наб [Sat, 15 May 2021 09:53:14 +0000 (11:53 +0200)]
libzfs: convert to -fvisibility=hidden
Also mark all printf-like functions in libzfs_impl.h as printf-like
and add --no-show-locs to storeabi, in the hope that future diffs
will make more sense.
This removes these symbols from libzfs:
D nfs_only
T SHA256Init
T SHA2Final
T SHA2Init
T SHA2Update
T SHA384Init
T SHA512Init
D share_all_proto
D smb_only
T zfs_is_shared_proto
W zpool_mount_datasets
W zpool_unmount_datasets
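A sketch of the two techniques named above, assuming GCC or Clang. HYPO_PUBLIC, hypo_fmt(), and hypo_public_error() are hypothetical names. Under -fvisibility=hidden, non-static symbols stay internal unless marked "default", and the format attribute lets the compiler check printf-style calls:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Explicitly exported when building with -fvisibility=hidden. */
#define	HYPO_PUBLIC	__attribute__((visibility("default")))

/* Internal helper: hidden by default even though it is not static. */
int hypo_fmt(char *buf, size_t len, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));	/* arg 3 = format, varargs from 4 */

int
hypo_fmt(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return (n);
}

/* Public entry point: the only symbol here that stays exported. */
HYPO_PUBLIC int
hypo_public_error(char *buf, size_t len, int code)
{
	return (hypo_fmt(buf, len, "error %d", code));
}
```

With -Wformat enabled, a mismatched call such as hypo_fmt(buf, len, "%s", 42) is flagged at compile time.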
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12048
Colm [Thu, 3 Jun 2021 15:13:42 +0000 (16:13 +0100)]
A couple of small style cleanups
In `zpool_load_compat()`:
* initialize `l_features[]` with a loop rather than a static
initializer.
* don't redefine system constants; use private names instead
Rationale here:
When an array is initialized using a static initializer such as {foo},
only the specified members are initialized to the provided values;
the rest are initialized to zero. While B_FALSE is of course zero, it
feels unsafe to rely on this being true forever, so I'm inclined to
sacrifice a few microseconds of runtime here and initialize using a loop.
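A minimal sketch of loop initialization, assuming a boolean_t enum like libspl's and a hypothetical feature count (HYPO_NFEATURES and hypo_init_features() are illustrative):

```c
#include <assert.h>

/* Assumed to mirror the libspl/illumos boolean_t definition. */
typedef enum { B_FALSE = 0, B_TRUE = 1 } boolean_t;

#define	HYPO_NFEATURES	16	/* stands in for the real feature count */

/*
 * Set every element explicitly instead of writing
 * boolean_t f[HYPO_NFEATURES] = { B_FALSE };
 * which names only the first element and zero-fills the rest.
 */
static void
hypo_init_features(boolean_t *features, int n, boolean_t value)
{
	for (int i = 0; i < n; i++)
		features[i] = value;
}
```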
When looking for the correct combination of system constants to use
(in open() and mmap()), I prefer to use private constants rather than
redefining system ones, because of the small chance that the system
ones might be referenced later in the file. So rather than defining
O_PATH and MAP_POPULATE, I use distinct constant names.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #12156
zfs_arc_overflow_shift was never a parameter: ca0bf58d65f77e944b9905571df9a2eae647aeca
("Illumos 5497 - lock contention on arcs_mtx") is the only result in
git log -Soverflow_shift, and it wasn't exposed then, nor is it now.
zfs_read_chunk_size was renamed to zfs_vnops_read_chunk_size in
e53d678d4ad596a310d51dab107bb6fa97e2b226 ("Share zfs_fsync, zfs_read,
zfs_write, et al between Linux and FreeBSD").
zio_decompress_fail_fraction was never a parameter: it was added in
c3bd3fb4ac49705819666055ff1206a9fa3d1b9e ("OpenZFS 9403 - assertion
failed in arc_buf_destroy()") as a developer aid for setting in zdb, but
it's a dangerous test tunable and has no place in public documentation
(not to mention that it obviously doesn't work):
> Although this did uncover a few low priority issues, this
> unfortunately also causes ztest to ASSERT in many locations where the
> code is working correctly since it is designed to fail on IO errors.
> Developers can manually set this variable with the '-o' option to find
> and debug issues.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12157
Rich Ercolani [Tue, 1 Jun 2021 21:20:50 +0000 (17:20 -0400)]
Added another missed case to arc_summary3
It turns out that sometimes, evidently only when run inside the
ZTS handler, arc_summary3 | head > /dev/null will die with ENOTCONN
and ruin the test run.
Added handling for that.
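arc_summary3 itself is a Python script, so the following is only a C analogue of the idea, with hypothetical names: treat a write failure caused by the reader going away (EPIPE, or ENOTCONN as seen under ZTS) as a quiet stop instead of an error. It assumes SIGPIPE is ignored so that EPIPE is reported through errno:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* True when errno indicates the downstream reader has gone away. */
static bool
hypo_reader_gone(int err)
{
	return (err == EPIPE || err == ENOTCONN);
}

/* Returns 0 on success or early reader exit, -1 on a real error. */
static int
hypo_emit(FILE *fp, const char *line)
{
	if (fputs(line, fp) == EOF || fflush(fp) == EOF)
		return (hypo_reader_gone(errno) ? 0 : -1);
	return (0);
}
```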
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12160
Ryan Moeller [Tue, 1 Jun 2021 21:13:26 +0000 (17:13 -0400)]
libzfs_core: Fix some style violations
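Made function names start at the beginning of a new line, with the
return type on the line above. Added a blank line between functions.
This helps when grepping for function definitions, since a pattern
anchored with ^ then matches only the definition.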
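An illustrative pair of functions in this style (hypo_sum() and hypo_double() are hypothetical). Because each name begins in column 0, grep -n '^hypo_sum(' finds only the definition and skips the indented call site:

```c
#include <assert.h>

int
hypo_sum(int a, int b)
{
	return (a + b);
}

int
hypo_double(int a)
{
	return (hypo_sum(a, a));	/* indented call: not matched by ^ */
}
```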
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12137
grembo [Tue, 1 Jun 2021 21:03:49 +0000 (23:03 +0200)]
FreeBSD boot code reminder after zpool upgrade
There used to be a warning after upgrading a zpool on FreeBSD, so that
users would not forget to update the boot loader on the pool the system
boots from.
This change brings this warning back, but only if the bootfs property
is set on the pool, which should be sufficient for the vast majority of
FreeBSD installations. People running something custom are most likely
aware of what to do after an upgrade in their specific environment.
The functionality is implemented in an OS-specific helper function.
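A hypothetical model of the decision the helper makes (the names and message wording are illustrative, not the actual OpenZFS code). It assumes an unset bootfs arrives as NULL, an empty string, or "-", which is how zpool get displays an unset value:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool
hypo_should_remind(const char *bootfs)
{
	return (bootfs != NULL && bootfs[0] != '\0' &&
	    strcmp(bootfs, "-") != 0);
}

static void
hypo_print_boot_reminder(FILE *fp, const char *pool, const char *bootfs)
{
	if (!hypo_should_remind(bootfs))
		return;	/* custom setups: assume the admin knows */
	/* Illustrative wording only. */
	fprintf(fp, "Pool '%s' has bootfs set; remember to update the "
	    "boot code on its boot devices.\n", pool);
}
```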
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Co-authored-by: Michael Gmelin <grembo@FreeBSD.org> Signed-off-by: Michael Gmelin <grembo@FreeBSD.org>
Closes #12099
Closes #12104
Rich Ercolani [Tue, 1 Jun 2021 18:58:08 +0000 (14:58 -0400)]
Remove iov_iter_advance() for iter_write
The additional iterator advance is incorrect, as copy_from_iter() has
already advanced the iterator. As of the 5.12 kernel, this results in
the following warning being printed to the console:
Attempted to advance past end of bvec iter
This change should have been included with #11378 when a
similar change was made on the read side.
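A user-space model of the bug, assuming a simplified iterator. struct hypo_iter and both functions are hypothetical and are not the kernel's iov_iter API. The copy already moves the offset, so advancing again by the same count walks past the end:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct hypo_iter {
	const char *base;
	size_t len;	/* total bytes */
	size_t off;	/* bytes consumed so far */
};

/* Like copy_from_iter(): copies data and advances the iterator. */
static size_t
hypo_copy_from_iter(char *dst, size_t n, struct hypo_iter *it)
{
	size_t left = it->len - it->off;

	if (n > left)
		n = left;
	memcpy(dst, it->base + it->off, n);
	it->off += n;
	return (n);
}

/* Like an advance call: warns when asked to move past the end. */
static int
hypo_iter_advance(struct hypo_iter *it, size_t n)
{
	if (n > it->len - it->off) {
		fprintf(stderr, "Attempted to advance past end of iter\n");
		it->off = it->len;
		return (-1);
	}
	it->off += n;
	return (0);
}
```

Dropping the second advance after the copy leaves the offset correct.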