Brian Behlendorf [Mon, 21 Feb 2022 03:21:31 +0000 (19:21 -0800)]
ZTS: Retry in import_rewind_config_changed.ksh
As explained by the disclaimer in the test case,
"This test can fail since nothing guarantees that old
MOS blocks aren't overwritten."
This behavior is expected and correct, but results in a
flaky test case which is problematic for the CI. The best
we can do to resolve this is to retry the sub-test which
failed when the MOS blocks have clearly been overwritten.
When testing failures were rare enough that a single retry
should normally be sufficient. However, we allow up to
five for good measure.
Reviewed by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13119
Ryan Moeller [Fri, 18 Feb 2022 21:09:03 +0000 (16:09 -0500)]
libzfs: Fail making a dataset handle gracefully
When a dataset is in the process of being received it gets marked as
inconsistent and should not be used. We should check for this when
opening a dataset handle in libzfs and return with an appropriate error
set, rather than hitting an abort because of the incomplete data.
zfs_open() passes errno to zfs_standard_error() after observing
make_dataset_handle() fail, which ends up aborting if errno is 0.
Set errno before returning where we know it has not been set already.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13077
Damian Szuberski [Fri, 18 Feb 2022 19:43:11 +0000 (19:43 +0000)]
spl: make 'spl_panic_halt' working for all cases
The default behavior where the serious ZFS errors cause FS thread to
stuck is very bad for some production scenario.
In some production scenarios (Linux), it is recommended to make real
kernel PANIC, where system can be rebooted by watchdog or kernel itself.
This patch enables coherent handling of spl_panic_halt parameter.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Authored-by: Wojciech Nizinski <w.nizinski@grinn-global.com> Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12120
Closes #13109
Brian Behlendorf [Thu, 17 Feb 2022 20:09:06 +0000 (12:09 -0800)]
ZTS: Fix vdev_zaps_004_pos.ksh
When attaching a vdev to a mirror wait for the resilver to complete
before invoking `zdb` to inspect the pool. This ensures the pool is
essentially idle which allows `zdb` to open the imported pool reliably.
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13112
Closes #6935
Savyasachee Jha [Mon, 14 Feb 2022 16:58:37 +0000 (22:28 +0530)]
Multiple dracut module install script cleanups
- Replaced intances of `dracut_install` with `inst_simple`
- Removed calls to `test -x mark_hostonly` because the function is an
inbuilt dracut function
- Removed redundant installation of `systemd-ask-password` and
`systemd-tty-ask-password-agent` because they are already installed by
the systemd module. There is no need to install them again
- Removed multiple calls to the `mark_hostonly` function because the
`inst_simple` has a command-line switch for it
- Cleaned up the installation of the `zpool.cache`, `vdev_id.conf` and
`hostid` files to make the logic easier to follow
- Cleaned up and simplified the systemd service installation logic by
invoking systemctl instead of creating symlinks manually
- Replaced various hard-coded paths with dracut equivalents to better
conform with expected dracut behaviour
- Removed redundant call to `mkdir` (`inst_simple` creates the parent
directory if it does not exist on the destination initrd)
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Andrew J. Hesford <ajh@sideband.org> Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13010
Savyasachee Jha [Mon, 14 Feb 2022 12:45:16 +0000 (18:15 +0530)]
Remove absolute paths to udev rules and binaries for dracut
Since dracut functions can locate both udev rules and binaries, there is
no point in keeping absolute paths in the module setup script. It also
breaks the --sysroot option in dracut. This commit removes mentions to
absolute paths for binaries and udev rules.
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Andrew J. Hesford <ajh@sideband.org> Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13010
Savyasachee Jha [Tue, 25 Jan 2022 03:22:48 +0000 (03:22 +0000)]
Make better use of dracut functions when building initramfs
Setting up the module involves multiple redundant calls to a bunch of
dracut functions wheich can be combined into one. Additionally, the mass
of code required to load libgcc_s.so* can be replaced with one dracut
function. This has the additional effect of removing errors involving
the non-installation of libgcc_s.so* which are seen on debian bullseye
when using version 2.1.2-1~bpo11+1 from the backports repository.
The systemd binaries are separated out into their own `dracut_install`
function call so they do not get pulled in when dracut does not load the
systemd module.
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Andrew J. Hesford <ajh@sideband.org> Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13010
Damian Szuberski [Wed, 16 Feb 2022 23:17:16 +0000 (00:17 +0100)]
Fix Linux kernel directories detection
Most modern Linux distributions have separate locations for bare
source and prebuilt ("build") files. Additionally, there are `source`
and `build` symlinks in `/lib/modules/$(KERNEL_VERSION)` pointing to
them. The order of directory search is now:
- `configure` command line values if both `--with-linux` and
`--with-linux-obj` were defined
- If only `--with-linux` was defined, `--with-linux-obj` is assumed
to have the same value as `--with-linux`
- If neither `--with-linux` nor `--with-linux-obj` were defined
autodetection is used:
- `/lib/modules/$(uname -r)/{source,build}` respectively, if exist
- The first directory in `/lib/modules` with the highest version
number according to `sort -V` which contains `source` and `build`
symlinks/directories
- The first directory matching `/usr/src/kernels/*` and
`/usr/src/linux-*` with the highest version number according to
`sort -V`. Here the source and prebuilt directories are assumed
to be the same.
George Amanakis [Wed, 16 Feb 2022 19:52:02 +0000 (20:52 +0100)]
Enable encrypted raw sending to pools with greater ashift
Raw sending from pool1/encrypted with ashift=9 to pool2/encrypted with
ashift=12 results to failure when mounting pool2/encrypted (Input/Output
error). Notably, the opposite, raw sending from a greater ashift to a
lower one does not fail.
This happens because zio_compress_write() falsely checks only
ZIO_FLAG_RAW_COMPRESS and not ZIO_FLAG_RAW_ENCRYPT which is also set in
encrypted raw send streams. In this case it rounds up the psize and if
not equal to the zio->io_size it modifies the block by zeroing out
the extra bytes. Because this happens in a SA attr. registration object
(type=46), the decryption fails upon mounting the filesystem, and zpool
status falsely reports an error.
Fix this by checking both ZIO_FLAG_RAW_COMPRESS and ZIO_FLAG_RAW_ENCRYPT
before deciding whether to zero-pad a block.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #13067
Closes #13074
наб [Wed, 16 Feb 2022 00:42:30 +0000 (01:42 +0100)]
libzfs: calculate receive times with sub-second precision
Provide two digits of precision when reporting send/receive
times. Tiny snapshots may take significantly less than a second
and rounding up to a full second can introduce a significant error.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13047
Ryan Moeller [Wed, 16 Feb 2022 00:35:30 +0000 (19:35 -0500)]
Cross-platform xattr user namespace compatibility
ZFS on Linux originally implemented xattr namespaces in a way that is
incompatible with other operating systems. On illumos, xattrs do not
have namespaces. Every xattr name is visible. FreeBSD has two
universally defined namespaces: EXTATTR_NAMESPACE_USER and
EXTATTR_NAMESPACE_SYSTEM. The system namespace is used for protected
FreeBSD-specific attributes such as MAC labels and pnfs state. These
attributes have the namespace string "freebsd:system:" prefixed to the
name in the encoding scheme used by ZFS. The user namespace is used
for general purpose user attributes and obeys normal access control
mechanisms. These attributes have no namespace string prefixed, so
xattrs written on illumos are accessible in the user namespace on
FreeBSD, and xattrs written to the user namespace on FreeBSD are
accessible by the same name on illumos.
Linux has several xattr namespaces. On Linux, ZFS encodes the
namespace in the xattr name for every namespace, including the user
namespace. As a consequence, an xattr in the user namespace with the
name "foo" is stored by ZFS with the name "user.foo" and therefore
appears on FreeBSD and illumos to have the name "user.foo" rather than
"foo". Conversely, none of the xattrs written on FreeBSD or illumos
are accessible on Linux unless the name happens to be prefixed with one
of the Linux xattr namespaces, in which case the namespace is stripped
from the name. This makes xattrs entirely incompatible between Linux
and other platforms.
We want to make the encoding of user namespace xattrs compatible across
platforms. A critical requirement of this compatibility is for xattrs
from existing pools from FreeBSD and illumos to be accessible by the
same names in the user namespace on Linux. It is also necessary that
existing pools with xattrs written by Linux retain access to those
xattrs by the same names on Linux. Making user namespace xattrs from
Linux accessible by the correct names on other platforms is important.
The handling of other namespaces is not required to be consistent.
Add a fallback mechanism for listing and getting xattrs to treat xattrs
as being in the user namespace if they do not match a known prefix.
Do not allow setting or getting xattrs with a name that is prefixed
with one of the namespace names used by ZFS on supported platforms.
Allow choosing between legacy illumos and FreeBSD compatibility and
legacy Linux compatibility with a new tunable. This facilitates
replication and migration of pools between hosts with different
compatibility needs.
The tunable controls whether or not to prefix the namespace to the
name. If the xattr is already present with the alternate prefix,
remove it so only the new version persists. By default the platform's
existing convention is used.
Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com> Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11919
наб [Wed, 2 Feb 2022 22:11:34 +0000 (23:11 +0100)]
module: icp: remove provider stats
These were all folded into a single kstat at
/proc/spl/kstat/kcf/NONAME_provider_stats
with no way to know which one it actually was,
and only the AES and SHA (so not Skein) ones were ever updated
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
наб [Thu, 23 Dec 2021 02:22:27 +0000 (03:22 +0100)]
module: icp: have a static 8 providers
This is currently twice the amount we actually have (sha[12], skein,
aes), and 512 * sizeof(void*) = 4096: 128x more than we need and a waste
of most of a page in the kernel address space
Plus, there's no need to actually allocate it dynamically: it's always
got a static size. Put it in .data
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
George Amanakis [Tue, 15 Feb 2022 23:48:59 +0000 (00:48 +0100)]
Avoid dirtying the final TXGs when exporting a pool
There are two codepaths than can dirty final TXGs:
1) If calling spa_export_common()->spa_unload()->
spa_unload_log_sm_flush_all() after the spa_final_txg is set, then
spa_sync()->spa_flush_metaslabs() may end up dirtying the final
TXGs. Then we have the following panic:
Call Trace:
<TASK>
dump_stack_lvl+0x46/0x62
spl_panic+0xea/0x102 [spl]
dbuf_dirty+0xcd6/0x11b0 [zfs]
zap_lockdir_impl+0x321/0x590 [zfs]
zap_lockdir+0xed/0x150 [zfs]
zap_update+0x69/0x250 [zfs]
feature_sync+0x5f/0x190 [zfs]
space_map_alloc+0x83/0xc0 [zfs]
spa_generate_syncing_log_sm+0x10b/0x2f0 [zfs]
spa_flush_metaslabs+0xb2/0x350 [zfs]
spa_sync_iterate_to_convergence+0x15a/0x320 [zfs]
spa_sync+0x2e0/0x840 [zfs]
txg_sync_thread+0x2b1/0x3f0 [zfs]
thread_generic_wrapper+0x62/0xa0 [spl]
kthread+0x127/0x150
ret_from_fork+0x22/0x30
</TASK>
2) Calling vdev_*_stop_all() for a second time in spa_unload() after
spa_export_common() unnecessarily delays the final TXGs beyond what
spa_final_txg is set at.
Fix this by performing the check and call for
spa_unload_log_sm_flush_all() before the spa_final_txg is set in
spa_export_common(). Also check if the spa_final_txg has already been
set in spa_unload() and skip those calls in this case.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com>
External-issue: https://www.illumos.org/issues/9081
Closes #13048
Closes #13098
Brian Behlendorf [Sun, 13 Feb 2022 22:22:49 +0000 (14:22 -0800)]
ZTS: Fix checkpoint_ro_rewind.ksh
Related to commit 90b77a036. Retry the `zpool export` if the pool is
"busy" indicating there is a process accessing the mount point. This
can happen after an import and allowing it to be retried will avoid
spurious test failures.
Reviewed by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13092
Brian Behlendorf [Sun, 13 Feb 2022 22:22:00 +0000 (14:22 -0800)]
ZTS: Fix zpool_expand_001_pos
The dRAID section of the zpool_expand_001_pos test would reliably fail
because the calculated expansion size assumed the dRAID top-level vdev
was created with a distributed spare. Create the vdev as expected to
resolve the test failure.
This test case flaw was accidentally caused by changing the default
number of dRAID distributed spares from one to zero while dRAID was
being developed.
Additionally, remove zpool_expand_005_pos from the list of possible
faulty tests. It appears to be passing consistently in my testing.
Reviewed by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13091
Brian Behlendorf [Fri, 11 Feb 2022 22:31:45 +0000 (14:31 -0800)]
Fix gcc warning in kfpu_begin()
Observed when building on CentOS 8 Stream. Remove the `out`
label at the end of the function and instead return.
linux/simd_x86.h: In function 'kfpu_begin':
linux/simd_x86.h:337:1: error: label at end of compound statement
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Attila Fülöp <attila@fueloep.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13089
Paul Zuchowski [Fri, 11 Feb 2022 21:32:08 +0000 (16:32 -0500)]
ZTS: Fix problem with zdb_objset_id test
Use large numbers for datasets with numeric names to avoid name
and id collisions. Sporadic test failures were observed when the
test would create $TESTPOOL/100 with an objset ID of 100.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #13087
наб [Fri, 11 Feb 2022 19:49:07 +0000 (20:49 +0100)]
zpoolprops.7: document leaked
It's noted very scarcely in the code as it stands, indeed the only
actual comment on this is
/*
* We have finished background destroying, but there is still
* some space left in the dp_free_dir. Transfer this leaked
* space to the dp_leak_dir.
*/
Introduced in fbeddd60b79690b6a6ececc9b00b6014d21405aa ("Illumos 4390 -
I/O errors can corrupt space map when deleting fs/vol"),
which explains, alongside the references, that this can only happen
with a corrupted pool
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13081
Brian Behlendorf [Thu, 10 Feb 2022 01:00:03 +0000 (17:00 -0800)]
ZTS: Fix zvol_misc_volmode test
Changing volmode may need to remove minors, which could be open, so
call udev_wait() before we "zfs set volmode=<value>". This ensures
no udev process has the zvol open (i.e. blkid) and the kernel
zvol_remove_minor_impl() function won't skip removing the in use
device.
Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13075
Attila Fülöp [Wed, 9 Feb 2022 22:38:33 +0000 (23:38 +0100)]
Receive checks should allow unencrypted child datasets
dmu_recv_begin_check() unconditionally sets the DS_HOLD_FLAG_DECRYPT
flag before calling dsl_dataset_hold_flags(). If the key on the
receiving side isn't loaded or the send stream contains embedded
blocks, the receive check fails for a stream which is perfectly
valid and could be received without any problem. This seems like
a remnant of the initial design, where unencrypted datasets below
encrypted ones weren't allowed.
Add a condition to set `DS_HOLD_FLAG_DECRYPT` only for encrypted
datasets, modify an existing test to detect this regression and add
a test for raw replication streams.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Amanakis <gamanakis@gmail.com> Co-authored-by: George Amanakis <gamanakis@gmail.com> Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13033
Closes #13076
Peter Levine [Wed, 2 Feb 2022 05:44:59 +0000 (00:44 -0500)]
Add support for $KERNEL_{CC,LD,LLVM} variables
Currently, $(CC), $(LD), and $(LLVM) variables aren't passed to kbuild
while building modules. This causes modules to build with the default
GNU GCC toolchain and prevents experimenting with other toolchains such
as CLANG/LLVM. It can also lead to build failure if the CFLAGS/LDFLAGS
passed are incompatible with gcc/ld.
Pass $KERNEL_CC, $KERNEL_LD, and $KERNEL_LLVM as $(CC), $(LD), and
$(LLVM), respectively, to kbuild for each that is defined in the
environment. This should take care of the majority of alternative
toolchain use cases.
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Peter Levine <plevine457@gmail.com>
Closes #13046
Attila Fülöp [Wed, 9 Feb 2022 20:50:10 +0000 (21:50 +0100)]
Linux 5.16 compat: don't use XSTATE_XSAVE to save FPU state
Linux 5.16 moved XSTATE_XSAVE and XSTATE_XRESTORE out of our reach,
so add our own XSAVE{,OPT,S} code and use it for Linux 5.16.
Please note that this differs from previous behavior in that it
won't handle exceptions created by XSAVE an XRSTOR. This is sensible
for three reasons.
- Exceptions during XSAVE and XRSTOR can only occur if the feature
is not supported or enabled or the memory operand isn't aligned
on a 64 byte boundary. If this happens something else went
terribly wrong, and it may be better to stop execution.
- Previously we just printed a warning and didn't handle the fault,
this is arguable for the above reason.
- All other *SAVE instruction also don't handle exceptions, so this
at least aligns behavior.
Finally add a test to catch such a regression in the future.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13042
Closes #13059
Majority of the software installed by default in GitHub runners is
irrelevant to OpenZFS. Reclaimed space could be used for more/bigger
vdev files. File deletion happens in the background, leveraging
`systemd-run` - the workflow is not significantly slowed down.
Before
```
Filesystem Size Used Avail Use% Mounted on
/dev/root 84G 53G 31G 63% /
```
After
```
Filesystem Size Used Avail Use% Mounted on
/dev/root 84G 15G 70G 18% /
```
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13066
Using `zfs_mount_at()` gives opportunity to properly propagate
mountopts from what's stored in a pool to the `mount(2)` syscall
invocation. It fixes cases when mount options are set to incorrect
values and rectification is impossible (e. g. Linux initrd boot
sequence in #7947).
Moved debug information printing after all variables are
initialized - printed text reflects what is passed to `mount(2)`.
Change enforced shell type from `dash` to `sh` and excluded
`SC2039` and `SC3043` by default. `local` keyword is accepted by all
POSIX shells from practical point of view. There is no need anymore
to enforce dash so `local` is accepted.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13020
There's no need to make the platform ops dynamic dispatch.
This change replaces the dynamic dispatch with static calls to the
platform-specific functions.
To avoid name collisions, prefix all platform-specific functions
with `zvol_os_`.
I actually find `zvol_..._os` slightly nicer to read in the calling
code, but having it as a prefix is useful.
Alexander Motin [Fri, 4 Feb 2022 21:06:38 +0000 (16:06 -0500)]
Add more control/visibility to spa_load_verify().
Use error thresholds from policy to control whether to scrub data
and/or metadata. If threshold is set to UINT64_MAX, then caller
probably does not care about result and we may skip that part.
By default import neither set the data error threshold nor read
the error counter, so skip the data scrub for faster import.
Metadata are still scrubbed and fail if even single error found.
While there just for symmetry return number of metadata errors in
case threshold is not set to zero and we haven't reached it.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #13022