git.proxmox.com Git - mirror_zfs.git/log

Val Packett [Thu, 11 May 2023 21:16:57 +0000 (18:16 -0300)]

PAM: enable testing on FreeBSD

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834

Val Packett [Sat, 6 May 2023 01:17:12 +0000 (22:17 -0300)]

PAM: support password changes even when not mounted

There's usually no requirement that a user be logged in for changing
their password, so let's not be surprising here.

We need to use the fetch_lazy mechanism for the old password to avoid
a double prompt for it, so that mechanism is now generalized a bit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834

Val Packett [Sat, 6 May 2023 01:34:58 +0000 (22:34 -0300)]

PAM: add 'uid_min' and 'uid_max' options for changing the uid range

Instead of a fixed >=1000 check, allow the configuration to override
the minimum UID and add a maximum one as well. While here, add the
uid range check to the authenticate method as well, and fix the return
in the chauthtok method (seems very wrong to report success when we've
done absolutely nothing).
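
The range check described above can be sketched as follows. This is only an illustration: the type `uid_range_t`, the function `uid_in_range()`, and the idea that the maximum defaults to "no upper limit" are assumptions, not the module's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical config holding the inclusive UID range. */
typedef struct {
    uint32_t uid_min;   /* lower bound, e.g. 1000 */
    uint32_t uid_max;   /* upper bound (assumed: UINT32_MAX means no limit) */
} uid_range_t;

/* Return true if uid falls inside the configured inclusive range. */
static bool
uid_in_range(const uid_range_t *cfg, uint32_t uid)
{
    return (uid >= cfg->uid_min && uid <= cfg->uid_max);
}
```

Each PAM entry point (authenticate, chauthtok, session) would run the same check first and return a non-success code when it fails, so that a skipped user is never reported as handled.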

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834

Val Packett [Sat, 6 May 2023 01:02:13 +0000 (22:02 -0300)]

PAM: add 'forceunmount' flag

Probably not always a good idea, but it's nice to have the option.
It is a workaround for FreeBSD calling the PAM session end earlier than
the last process is actually done touching the mount, for example.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834

Val Packett [Fri, 5 May 2023 22:35:57 +0000 (19:35 -0300)]

PAM: add 'recursive_homes' flag to use with 'prop_mountpoint'

It's not always desirable to have a fixed flat homes directory.
With the 'recursive_homes' flag, 'prop_mountpoint' search would
traverse the whole tree starting at 'homes' (which can now be '*'
to mean all pools) to find a dataset with a mountpoint matching
the home directory.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834

Val Packett [Sat, 6 May 2023 00:56:39 +0000 (21:56 -0300)]

PAM: use boolean_t for config flags

Since we already use boolean_t in the file, we can use it here.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834

Val Packett [Fri, 5 May 2023 23:00:48 +0000 (20:00 -0300)]

PAM: do not fail to mount if the key's already loaded

If we're expecting a working home directory on login, it would be
rather frustrating to not have it mounted just because it e.g. failed to
unmount once on logout.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834

Rich Ercolani [Wed, 31 May 2023 23:58:41 +0000 (19:58 -0400)]

Revert "initramfs: use `mount.zfs` instead of `mount`"

This broke mounting of snapshots on / for users.

See https://github.com/openzfs/zfs/issues/9461#issuecomment-1376162949 for more context.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14908

Luís Henriques [Tue, 30 May 2023 22:15:24 +0000 (23:15 +0100)]

Fix NULL pointer dereference when doing concurrent 'send' operations

A NULL pointer dereference will occur when doing a 'zfs send -S' on a dataset that
is still being received. The problem is that the new 'send' will
rightfully fail to own the datasets (i.e. dsl_dataset_own_force() will
fail), but then dmu_send() will still do the dsl_dataset_disown().
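
The failure mode is the classic "release what was never acquired" pattern. A minimal sketch of the safe shape, using hypothetical stand-ins (`dataset_t`, `dataset_own()`, `dataset_disown()`, `send_sketch()`) rather than the real DSL interfaces:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dataset handle. */
typedef struct {
    bool busy;      /* true while another operation owns it */
    int owners;
} dataset_t;

/* Try to take ownership; on failure *out stays NULL. */
static int
dataset_own(dataset_t *ds, dataset_t **out)
{
    *out = NULL;
    if (ds->busy)
        return (-1);
    ds->owners++;
    *out = ds;
    return (0);
}

/* Drop ownership; dereferences ds, so it must never be NULL. */
static void
dataset_disown(dataset_t *ds)
{
    ds->owners--;
}

/* Release only what was actually acquired. */
static int
send_sketch(dataset_t *ds)
{
    dataset_t *owned;
    int err = dataset_own(ds, &owned);

    if (err != 0)
        return (err);   /* nothing owned, so nothing to disown */
    /* ... the send work would happen here ... */
    dataset_disown(owned);
    return (0);
}
```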

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Luís Henriques <henrix@camandro.org>
Closes #14903
Closes #14890

Brian Behlendorf [Mon, 29 May 2023 19:55:35 +0000 (12:55 -0700)]

ZTS: zvol_misc_trim disable blk mq

Disable the zvol_misc_fua.ksh and zvol_misc_trim.ksh test cases on impacted
kernels.  This issue is being actively worked on in #14872 and as part of that
fix this commit will be reverted.

    VERIFY(zh->zh_claim_txg == 0) failed
    PANIC at zil.c:904:zil_create()

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14872
Closes #14870

Richard Yao [Fri, 26 May 2023 22:47:52 +0000 (18:47 -0400)]

Use __attribute__((malloc)) on memory allocation functions

This informs the C compiler that pointers returned from these functions
do not alias other pointers, which allows it to do better code
optimization and should make the compiled code smaller.

References:
https://stackoverflow.com/a/53654773
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-malloc-function-attribute
https://clang.llvm.org/docs/AttributeReference.html#malloc
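
For illustration, this is how the attribute is typically attached to an allocator wrapper. The function `my_zalloc()` is a hypothetical example, not one of the functions touched by this change.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical allocator wrapper. The attribute tells GCC and Clang that
 * the returned pointer does not alias any other pointer that is valid
 * when the function returns.
 */
static void *my_zalloc(size_t size) __attribute__((malloc));

static void *
my_zalloc(size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        memset(p, 0, size);
    return (p);
}
```

The attribute is only correct for functions that return fresh storage; a function that may return a pointer into an existing object (a realloc-style interface, for example) should not carry it.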

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14827

Brian Behlendorf [Fri, 26 May 2023 22:39:23 +0000 (15:39 -0700)]

ZTS: Add zpool_resilver_concurrent exception

The zpool_resilver_concurrent test case requires the ZED which is not used
on FreeBSD. Add this test to the known list of skipped tests for FreeBSD.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14904

Mike Swanson [Fri, 26 May 2023 22:37:15 +0000 (15:37 -0700)]

Add compatibility symlinks for FreeBSD 12.{3,4} and 13.{0,1,2}

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mike Swanson <mikeonthecomputer@gmail.com>
Closes #14902

Colm [Fri, 26 May 2023 17:04:19 +0000 (10:04 -0700)]

Adding new read-only compatible zpool features to compatibility.d/grub2

GRUB2 is compatible with all "read-only compatible" features,
so it is safe to add new features of this type to the grub2
compatibility list. We generally want to include all compatible
features, to minimize the differences between grub2-compatible
pools and no-compatibility pools.

Adding the new features `livelist` and `zpool_checkpoint` accordingly.

Also adding them to the man page which references this file as an
example, for consistency.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #14893

Richard Yao [Fri, 26 May 2023 17:03:12 +0000 (13:03 -0400)]

btree: Implement faster binary search algorithm

This implements a binary search algorithm for B-Trees that reduces
branching to the absolute minimum necessary for a binary search
algorithm. It also enables the compiler to inline the comparator to
ensure that the only slowdown when doing binary search is from waiting
for memory accesses. Additionally, it instructs the compiler to unroll
the loop, which gives an additional 40% improvement with Clang and 8%
improvement with GCC.
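
To show what "reducing branching" means here, the following is a generic branch-light lower-bound search over a sorted `int` array. It is a textbook sketch, not the B-Tree code from this change: the name `lower_bound_sketch()` is hypothetical, the comparator is hard-coded, and no loop unrolling is requested.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the index of the first element >= key in a sorted array,
 * or n if every element is smaller than key.
 */
static size_t
lower_bound_sketch(const int *a, size_t n, int key)
{
    const int *base = a;

    if (n == 0)
        return (0);
    while (n > 1) {
        size_t half = n / 2;

        /* A select like this can usually become a conditional move. */
        base = (base[half] < key) ? base + half : base;
        n -= half;
    }
    return ((size_t)(base - a) + (*base < key));
}
```

The loop runs a number of iterations that depends only on `n`, never on the data, so the only unpredictable cost left is the memory access itself.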

Consumers must opt into using the faster algorithm. At present, only
B-Trees used inside kernel code have been modified to use the faster
algorithm.

Micro-benchmarks suggest that this can improve binary search performance
by up to 3.5 times when compiling with Clang 16 and up to 1.9 times when
compiling with GCC 12.2.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14866

George Amanakis [Fri, 26 May 2023 16:53:00 +0000 (18:53 +0200)]

Fix inconsistent definition of zfs_scrub_error_blocks_per_txg

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14894

Damiano Albani [Thu, 25 May 2023 23:10:54 +0000 (01:10 +0200)]

Add missing files to Debian DKMS package

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Damiano Albani <damiano.albani@gmail.com>
Closes #14887
Closes #14889

Brian Behlendorf [Thu, 25 May 2023 20:53:08 +0000 (13:53 -0700)]

Update compatibility.d files

Add an openzfs-2.2 compatibility file for the next release.

Edon-R support has been enabled for FreeBSD removing the need
for different FreeBSD and Linux files. Symlinks for the -linux
and -freebsd names are created for any scripts expecting that
convention.

Additionally, a symlink for ubuntu-22.04 was added.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14833

Alexander Motin [Thu, 25 May 2023 20:51:53 +0000 (16:51 -0400)]

zil: Add some more statistics.

In addition to the number of actual log bytes written, also account for
the total bytes written including padding and the total bytes allocated
(bytes <= write <= alloc). This should make it possible to monitor ZIL
traffic and space efficiency.

Add dtrace probe for zil block size selection.

Make zilstat report more information and fit it into less width.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14863

Alexander Motin [Thu, 25 May 2023 16:48:43 +0000 (12:48 -0400)]

ZIL: Reduce scope of per-dataset zl_issuer_lock.

Before this change ZIL copied all log data while holding the lock.
It caused huge lock contention on workloads with many big parallel
writes. This change splits the process into two parts: first,
zil_lwb_assign() estimates the log space needed for all transactions,
and zil_lwb_write_close() allocates blocks and zios while holding the
lock, then, after the lock is dropped, zil_lwb_commit() copies the
data, and zil_lwb_write_issue() issues the I/Os.
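
The general technique (do the bookkeeping under the lock, do the expensive copy outside it) can be sketched with POSIX threads. Everything here is hypothetical: `log_buf_t`, `log_append()`, and the fixed capacity are illustrative and much simpler than the real ZIL structures.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define LOG_CAPACITY 64

/* Hypothetical log buffer; names are illustrative only. */
typedef struct {
    pthread_mutex_t lock;
    size_t next;                /* next free offset */
    char data[LOG_CAPACITY];
} log_buf_t;

/*
 * Reserve space under the lock, then copy after dropping it.
 * Returns 0 on success, -1 if the record does not fit.
 */
static int
log_append(log_buf_t *lb, const void *rec, size_t len)
{
    size_t off;

    pthread_mutex_lock(&lb->lock);
    if (len > LOG_CAPACITY - lb->next) {
        pthread_mutex_unlock(&lb->lock);
        return (-1);
    }
    off = lb->next;             /* short critical section: bookkeeping only */
    lb->next += len;
    pthread_mutex_unlock(&lb->lock);

    /* The copy runs without the lock; the reserved range is exclusive. */
    memcpy(lb->data + off, rec, len);
    return (0);
}
```

Because each caller owns a disjoint range after the reservation, parallel writers contend only for the few instructions that advance `next`.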

While here, also slightly reduce the scope of zl_lock.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14841

Dimitri John Ledkov [Wed, 24 May 2023 19:31:28 +0000 (20:31 +0100)]

systemd: Use non-absolute paths in Exec* lines

Since systemd v239, Exec* binaries are resolved from PATH when they
are not absolute. Switch to this by default for ease of downstream
maintenance. Many downstream distributions move individual binaries
to locations that existing compile-time configurations cannot
accommodate.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Closes #14880

Akash B [Wed, 24 May 2023 19:28:09 +0000 (00:58 +0530)]

Fix concurrent resilvers initiated at same time

For draid vdevs it was possible to initiate both the
sequential and healing resilver at the same time.

This fixes the following two scenarios.
1) There's a window where a sequential rebuild can
be started via ZED even if a healing resilver has been
scheduled.
- This is fixed by adding an additional check in
spa_vdev_attach() for any scheduled resilver and returning an
appropriate error code when a resilver is already in
progress.

2) It was possible for zpool clear to start a healing
resilver when it wasn't needed at all. This occurs because
during a vdev_open() the device is presumed to be healthy
until it is validated by vdev_validate() and set
unavailable. However, by this point an async resilver will
have already been requested if the DTL isn't empty.
- This is fixed by cancelling the SPA_ASYNC_RESILVER
request immediately at the end of vdev_reopen() when a resilver
is unneeded.

Finally, added a testcase in ZTS for verification.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #14881
Closes #14892

youzhongyang [Wed, 24 May 2023 19:23:42 +0000 (15:23 -0400)]

Linux 6.4 compat: reclaimed_slab renamed to reclaimed

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14891

Brian Atkinson [Fri, 19 May 2023 20:05:53 +0000 (16:05 -0400)]

Hold db_mtx when updating db_state

Commit 555ef90 did some general code refactoring for
dmu_buf_will_not_fill() and dmu_buf_will_fill(). However, the db_mtx was
not held when updating db->db_state in those code blocks. The rest of the
dbuf code always holds the db_mtx when updating db_state. This is
important because cv_wait() db_changed is used to check for db_state
changes.

Update dmu_buf_will_not_fill() and dmu_buf_will_fill() to hold the
db_mtx when updating db_state.
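
The rule being restored is the usual condition-variable discipline: the state a waiter checks must only change while the associated mutex is held. A minimal POSIX threads sketch, with hypothetical names (`buf_t`, `buf_set_state()`, `buf_wait_for()`) standing in for the dbuf code:

```c
#include <assert.h>
#include <pthread.h>

typedef enum { BUF_UNCACHED, BUF_FILL, BUF_CACHED } buf_state_t;

/* Hypothetical buffer with a state protected by mtx. */
typedef struct {
    pthread_mutex_t mtx;
    pthread_cond_t changed;
    buf_state_t state;
} buf_t;

/* Update the state only while holding the mutex, then wake waiters. */
static void
buf_set_state(buf_t *b, buf_state_t s)
{
    pthread_mutex_lock(&b->mtx);
    b->state = s;
    pthread_cond_broadcast(&b->changed);
    pthread_mutex_unlock(&b->mtx);
}

/* Waiters re-check the state under the same mutex. */
static void
buf_wait_for(buf_t *b, buf_state_t s)
{
    pthread_mutex_lock(&b->mtx);
    while (b->state != s)
        pthread_cond_wait(&b->changed, &b->mtx);
    pthread_mutex_unlock(&b->mtx);
}
```

If the writer skipped the mutex, a waiter could test the old state, be preempted, and then sleep after the update had already happened, missing the wakeup.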

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #14875

Brian Behlendorf [Fri, 19 May 2023 20:05:09 +0000 (13:05 -0700)]

Probe vdevs before marking removed

Before allowing the ZED to mark a vdev as REMOVED due to a
hotplug event, confirm that it is non-responsive with a probe.
Any device which can be successfully probed should be left
ONLINE to prevent a healthy pool from being incorrectly
SUSPENDED.  This may occur for at least the following two
scenarios.

1) Drive expansion (zpool online -e) in VMware environments.
   If, during the partition resize operation, a partition is
   removed and re-created then udev will send a removed event.

2) Re-scanning the namespaces of an NVMe device (nvme ns-rescan)
   may result in a udev remove and add event being delivered.

Finally, update the ZED to only kick in a spare when the
removal was successful.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14859
Closes #14861

George Amanakis [Fri, 17 Dec 2021 20:35:28 +0000 (21:35 +0100)]

Teach zpool scrub to scrub only blocks in error log

Added a flag '-e' in zpool scrub to scrub only blocks in error log. A
user can pause, resume and cancel the error scrub by passing additional
command line arguments -p -s just like a regular scrub. This involves
adding a new flag, creating new libzfs interfaces, a new ioctl, and the
actual iteration and read-issuing logic. Error scrubbing is executed over
multiple txgs to make sure pool performance is not affected.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Co-authored-by: TulsiJain <tulsi.jain@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #8995
Closes #12355

Brian Behlendorf [Thu, 18 May 2023 17:02:20 +0000 (10:02 -0700)]

Add the ability to uninitialize

zpool initialize functions well for touching every free byte...once.
But if we want to do it again, we're currently out of luck.

So let's add zpool initialize -u to clear it.

Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12451
Closes #14873

Antonio Russo [Mon, 15 May 2023 23:11:33 +0000 (17:11 -0600)]

test-runner: pass kmemleak and kmsg to Cmd.run

test-runner.py orchestrates all of the ZTS executions. The `Cmd` object
manages these processes, and its `run` method specifically invokes these
possibly long-running processes, possibly retrying in the event of a
timeout. Since its inception, memory leak detection using the kmemleak
infrastructure [1], and kernel logging [2] have been added to this run
mechanism.

However, the callback to cull a process beyond its timeout threshold,
`kill_cmd`, has evaded modernization by both of these changes. As a
result, this function fails to properly invoke `run`, leading to an
untrapped exception and unreported test failure.

This patch extends `kill_cmd` to receive these kernel devices through
the `options` parameter, and regularizes all the `.run` calls from
`Cmd`, and its subclasses, to accept that parameter.

[1] Commit a69765ea5b563e0cd4d15fac4b1ac08c6ccf12d1
[2] Commit fc2c0256c55a2859d1988671b0896d22b75c8aba

Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #14849

Richard Yao [Fri, 12 May 2023 21:10:14 +0000 (17:10 -0400)]

Fix undefined behavior in spa_sync_props()

8eae2d214cfa53862833eeeda9a5c1e9d5ded47d caused Coverity to begin
complaining about "Improper use of negative value" in two places in
spa_sync_props() because Coverity correctly inferred from `prop ==
ZPOOL_PROP_INVAL` that prop could be -1 while both zpool_prop_to_name()
and zpool_prop_get_type() use it as an array index, which is undefined
behavior.
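
A small sketch of the hazard and of one defensive shape for it. The enum `prop_t`, the table `prop_names`, and `prop_to_name_checked()` are hypothetical; the actual change to spa_sync_props() may be structured differently.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    PROP_INVAL = -1,    /* sentinel for "not a known property" */
    PROP_NAME,
    PROP_SIZE,
    PROP_COUNT
} prop_t;

static const char *const prop_names[PROP_COUNT] = { "name", "size" };

/* Return the name, or NULL when prop is outside the table. */
static const char *
prop_to_name_checked(prop_t prop)
{
    if (prop < 0 || prop >= PROP_COUNT)
        return (NULL);  /* never index the table with PROP_INVAL (-1) */
    return (prop_names[prop]);
}
```

Indexing `prop_names[PROP_INVAL]` directly would read one element before the start of the array, which is the out-of-bounds access Coverity flagged in the description above.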

Assuming that the system does not panic from an attempt to read invalid
memory, the case statement for ZPOOL_PROP_INVAL will ensure that only
user properties will reach this code when prop is ZPOOL_PROP_INVAL, such
that execution will continue safely. However, if we are unlucky enough
to read invalid memory, then the system will panic.

This issue predates the patch that caused Coverity to begin complaining.
Thankfully, our userland tools do not pass nonsense to us, so this bug
should not be triggered unless a future userland tool attempts to set a
property that we do not understand.

Reported-by: Coverity (CID-1561129)
Reported-by: Coverity (CID-1561130)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14860

Richard Yao [Fri, 12 May 2023 20:47:56 +0000 (16:47 -0400)]

Fix use after free regression in spa_remove_healed_errors()

6839ec6f1098c28ff7b772f1b31b832d05e6b567 placed code in
spa_remove_healed_errors() that uses a pointer after the kmem_free()
call that frees it.
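
The usual remedy for this class of bug is to read everything that is needed from the object before freeing it. A minimal userland sketch with hypothetical names (`err_entry_t`, `remove_entry()`), using `free()` where the kernel code uses kmem_free():

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical error-list entry. */
typedef struct {
    int id;
} err_entry_t;

/* Copy out what is needed, then free; return the saved value. */
static int
remove_entry(err_entry_t *e)
{
    int id = e->id;     /* read before the free */

    free(e);
    /* e must not be dereferenced past this point */
    return (id);
}
```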

Reported-by: Coverity (CID-1562375)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14860

Alexander Motin [Fri, 12 May 2023 16:49:26 +0000 (12:49 -0400)]

zil: Free lwb_buf after write completion.

There is no point in keeping that memory allocated during the flush.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14855

Alexander Motin [Fri, 12 May 2023 16:14:29 +0000 (12:14 -0400)]

zil: Some micro-optimizations.

Should not cause functional changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14854

Don Brady [Fri, 12 May 2023 16:12:28 +0000 (10:12 -0600)]

Refine special_small_blocks property validation

When the special_small_blocks property is being set during a pool
create it enforces a limit of 128KiB even if the pool's record size
is larger.

If the recordsize property is being set during a pool create, then
use that value instead of the default SPA_OLD_MAXBLOCKSIZE value.
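
The refined rule can be sketched as below. The helper `small_blocks_ok()` and the convention that `recordsize == 0` means "not specified at pool creation" are assumptions made for this example; only the 128KiB default comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_MAX_BLOCKSIZE (128ULL * 1024)   /* 128KiB default limit */

/*
 * Validate special_small_blocks against the recordsize given at pool
 * creation, falling back to the 128KiB default when none was given
 * (represented here, hypothetically, by recordsize == 0).
 */
static bool
small_blocks_ok(uint64_t small_blocks, uint64_t recordsize)
{
    uint64_t limit = (recordsize != 0) ? recordsize : DEFAULT_MAX_BLOCKSIZE;

    return (small_blocks <= limit);
}
```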

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <dev.fs.zfs@gmail.com>
Closes #13815
Closes #14811

Brian Behlendorf [Fri, 12 May 2023 16:07:58 +0000 (09:07 -0700)]

ZTS: Add auto_replace_001_pos to exceptions

The auto_replace_001_pos test case does not reliably pass on
Fedora 37 and newer. Until the test case can be updated to make
it reliable, add it to the list of "maybe" exceptions on Linux.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14851
Closes #14852

Pawel Jakub Dawidek [Wed, 10 May 2023 05:32:30 +0000 (22:32 -0700)]

Make sure we are not trying to clone a spill block.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825

Pawel Jakub Dawidek [Thu, 4 May 2023 23:14:19 +0000 (16:14 -0700)]

Correct comment.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825

Pawel Jakub Dawidek [Thu, 4 May 2023 06:25:22 +0000 (23:25 -0700)]

Remove badly placed comment.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825

Pawel Jakub Dawidek [Wed, 3 May 2023 07:24:47 +0000 (00:24 -0700)]

Don't call zfs_exit_two() before zfs_enter_two().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825

Pawel Jakub Dawidek [Tue, 2 May 2023 22:46:14 +0000 (15:46 -0700)]

Don't use dmu_buf_is_dirty() for unassigned transaction.

The dmu_buf_is_dirty() call doesn't make sense here for two reasons:
1. txg is 0 for unassigned tx, so it was a no-op.
2. It is equivalent to checking whether we have dirty records, which we
already do a few lines earlier.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825

Pawel Jakub Dawidek [Tue, 2 May 2023 21:24:43 +0000 (14:24 -0700)]

Deny block cloning if dbuf size doesn't match BP size.

I don't know an easy way to shrink down dbuf size, so just deny block cloning
into dbufs that don't match our BP's size.

This fixes the following situation:
1. Create a small file, eg. 1kB of random bytes. Its dbuf will be 1kB.
2. Create a larger file, eg. 2kB of random bytes. Its dbuf will be 2kB.
3. Truncate the large file to 0. Its dbuf will remain 2kB.
4. Clone the small file into the large file. Small file's BP lsize is
1kB, but the large file's dbuf is 2kB.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825

Pawel Jakub Dawidek [Sun, 30 Apr 2023 09:47:09 +0000 (02:47 -0700)]

Additional block cloning fixes.

Reimplement some of the block cloning vs dbuf logic, mostly to fix
the situation where we clone a block and in the same transaction group
we want to partially overwrite the clone.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825

Alexander Motin [Thu, 11 May 2023 21:27:12 +0000 (17:27 -0400)]

zil: Don't expect zio_shrink() to succeed.

At least for RAIDZ zio_shrink() does not reduce zio size, but reduced
wsz in that case likely results in writing uninitialized memory.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14853

Ameer Hamza [Wed, 10 May 2023 00:56:35 +0000 (05:56 +0500)]

Prevent panic during concurrent snapshot rollback and zvol read

Protect zvol_cdev_read with zv_suspend_lock to prevent concurrent
release of the dnode, avoiding panic when a snapshot is rolled back
in parallel during an ongoing zvol read operation.

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14839

Tony Hutter [Wed, 10 May 2023 00:55:19 +0000 (17:55 -0700)]

pam: Fix "buffer overflow" in pam ZTS tests on F38

The pam ZTS tests were reporting a buffer overflow on F38, possibly
due to F38 now setting _FORTIFY_SOURCE=3 by default. gdb and
valgrind narrowed this down to a snprintf() buffer overflow in
zfs_key_config_modify_session_counter(). I'm not clear why this
particular snprintf() was being flagged as an overflow, but when
I replaced it with an asprintf(), the test passed reliably.
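
The point of the switch is that the buffer is sized from the formatted length instead of from a guess. asprintf() is a GNU/BSD extension, so this sketch deliberately swaps in the portable ISO C two-pass snprintf() idiom, which has the same effect. The function name `format_counter_path()` and the path layout are hypothetical, not taken from the module.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "<dir>/<uid>" in freshly allocated memory sized to fit.
 * Returns NULL on failure; the caller frees the result.
 */
static char *
format_counter_path(const char *dir, unsigned int uid)
{
    /* First pass: measure the formatted length without writing. */
    int len = snprintf(NULL, 0, "%s/%u", dir, uid);
    char *path;

    if (len < 0)
        return (NULL);
    path = malloc((size_t)len + 1);
    if (path == NULL)
        return (NULL);
    /* Second pass: write into a buffer that is exactly large enough. */
    (void) snprintf(path, (size_t)len + 1, "%s/%u", dir, uid);
    return (path);
}
```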

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #14802
Closes #14842

Brian Behlendorf [Tue, 9 May 2023 16:03:10 +0000 (09:03 -0700)]

Add dmu_tx_hold_append() interface

Provides an interface which callers can use to declare a write when
the exact starting offset is not yet known. Since the full range
being updated is not available, only the first L0 block at the
provided offset will be prefetched.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14819

Brian Behlendorf [Tue, 9 May 2023 15:57:02 +0000 (08:57 -0700)]

Debug auto_replace_001_pos failures

Reduced the timeout to 60 seconds which should be more than
sufficient and allow the test to be marked as FAILED rather
than KILLED. Also dump the pool status on cleanup.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14829

George Amanakis [Tue, 9 May 2023 15:54:41 +0000 (17:54 +0200)]

Remove duplicate code in l2arc_evict()

l2arc_evict() performs the adjustment of the size of buffers to be
written on L2ARC unnecessarily. l2arc_write_size() is called right
before l2arc_evict() and performs those adjustments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14828

Alexander Motin [Tue, 9 May 2023 15:54:01 +0000 (11:54 -0400)]

Remove single parent assertion from zio_nowait().

We only need to know if ZIO has any parent there. We do not care if
it has more than one, but use of zio_unique_parent() == NULL asserts
that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14823

George Amanakis [Tue, 9 May 2023 15:53:27 +0000 (17:53 +0200)]

Enable the head_errlog feature to remove errors

In case check_filesystem() does not error out and does not report
an error, remove that error block from error lists and logs
without requiring a scrub. This can happen when the original file and
all snapshots/clones referencing it have been removed.

Otherwise zpool status will still report that "Permanent errors have
been detected..." without actually reporting any of them.

To implement this change the functions introduced in corrective
receive were modified to take into account the head_errlog feature.

Before this change:
=============================
  pool: test
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
config:

        NAME                   STATE     READ WRITE CKSUM
        test                   ONLINE       0     0     0
          /home/user/vdev_a    ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:

=============================

After this change:
=============================
  pool: test
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
config:

        NAME                   STATE     READ WRITE CKSUM
        test                   ONLINE       0     0     0
          /home/user/vdev_a    ONLINE       0     0     2

errors: No known data errors
=============================

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14813

George Amanakis [Mon, 8 May 2023 20:35:03 +0000 (22:35 +0200)]

Fixes in head_errlog feature with encryption

For the head_errlog feature use dsl_dataset_hold_obj_flags() instead of
dsl_dataset_hold_obj() in order to enable access to the encryption keys
(if loaded). This enables reporting of errors in encrypted filesystems
which are not mounted but have their keys loaded.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14837

Matthew Ahrens [Mon, 8 May 2023 18:20:23 +0000 (11:20 -0700)]

Verify block pointers before writing them out

If a block pointer is corrupted (but the block containing it checksums
correctly, e.g. due to a bug that overwrites random memory), we can
often detect it before the block is read, with the `zfs_blkptr_verify()`
function, which is used in `arc_read()`, `zio_free()`, etc.

However, such corruption is not typically recoverable. To recover from
it we would need to detect the memory error before the block pointer is
written to disk.

This PR verifies BP's that are contained in indirect blocks and dnodes
before they are written to disk, in `dbuf_write_ready()`. This way,
we'll get a panic before the on-disk data is corrupted. This will help
us to diagnose what's causing the corruption, as well as being much
easier to recover from.

To minimize performance impact, only checks that can be done without
holding the spa_config_lock are performed.

Additionally, when corruption is detected, the raw words of the block
pointer are logged. (Note that `dprintf_bp()` is a no-op by default,
but if enabled it is not safe to use with invalid block pointers.)

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #14817

Brian Behlendorf [Mon, 8 May 2023 18:17:41 +0000 (11:17 -0700)]

zdb: consistent xattr output

When using zdb to output the value of an xattr only interpret it
as printable characters if the entire byte array is printable.
Additionally, if the --parseable option is set always output the
buffer contents as octal for easy parsing.
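
The decision rule can be sketched as follows. The helper names (`all_printable()`, `format_xattr()`) and the exact `\ooo` escape layout are assumptions for illustration; zdb's real output format may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* True only if every byte in the buffer is printable. */
static bool
all_printable(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (!isprint(buf[i]))
            return (false);
    }
    return (true);
}

/*
 * Render a value into out: as text when it is fully printable and
 * force_octal is false, otherwise as \ooo octal escapes.
 * out must hold at least 4 * len + 1 bytes.
 */
static void
format_xattr(const unsigned char *buf, size_t len, bool force_octal,
    char *out)
{
    if (!force_octal && all_printable(buf, len)) {
        memcpy(out, buf, len);
        out[len] = '\0';
        return;
    }
    for (size_t i = 0; i < len; i++)
        out += snprintf(out, 5, "\\%03o", (unsigned int)buf[i]);
    *out = '\0';
}
```

Treating the value as all-or-nothing avoids mixed output in which a mostly binary value happens to start with a few printable bytes.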

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14830

Brian Behlendorf [Mon, 8 May 2023 17:09:30 +0000 (10:09 -0700)]

ZTS: add snapshot/snapshot_002_pos exception

Add snapshot_002_pos to the known list of occasional failures
for FreeBSD until it can be made entirely reliable.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14831
Closes #14832

Alexander Motin [Fri, 5 May 2023 16:17:55 +0000 (12:17 -0400)]

Fix two abd_gang_add_gang() issues.

- There is no reason to assert that the added gang is not empty. It
may be weird to add an empty gang, but it is legal.
- When moving the chain list from the added gang, clear its size, or it
will trigger an assertion in abd_verify() when that gang is freed.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14816

Pawel Jakub Dawidek [Fri, 5 May 2023 16:09:12 +0000 (01:09 +0900)]

Simplify and optimize random_int_between().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14805

Pawel Jakub Dawidek [Fri, 5 May 2023 15:51:41 +0000 (00:51 +0900)]

Plug memory leak in zfsdev_state.

On kernel module unload, free all zfsdev state structures, except for
zfsdev_state_listhead, which is statically allocated.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14824

Ameer Hamza [Wed, 3 May 2023 22:10:32 +0000 (03:10 +0500)]

zpool import -m also removing spare and cache when log device is missing

spa_import() relies on a pool config fetched by spa_tryimport() for
spare/cache devices. Import flags are not passed to spa_tryimport(),
which makes it return early due to the missing log device, so the
cache and spare devices are never retrieved. Passing
ZFS_IMPORT_MISSING_LOG to spa_tryimport() makes it fetch the correct
configuration regardless of the missing log device.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14794

buzzingwires [Wed, 3 May 2023 16:03:57 +0000 (12:03 -0400)]

Allow zhack label repair to restore detached devices.

This commit expands on the zhack label repair command in d04b5c9 by
adding the -u option to undetach a device by regenerating uberblocks,
in addition to the existing functionality of fixing checksums, now
represented by -c. Previous behavior is retained in the case of no
options.

The changes are heavily inspired by Jeff Bonwick's labelfix
utility, as archived at:

https://gist.github.com/jjwhitney/baaa63144da89726e482

Additionally, it is now capable of properly determining the size of
block devices and other media, as well as handling sizes which are
not divisible by 2^18. This should make it viable for use on physical
devices and partitions, in addition to files.
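
To illustrate why sizes that are not a multiple of 2^18 need care, here is a minimal sketch that assumes the usual layout of four 256 KiB labels, two at the front of the device and two at the end. The macro and function names are hypothetical and are not taken from zhack.

```c
#include <stdint.h>

#define	LABEL_SIZE	(UINT64_C(1) << 18)	/* 256 KiB, i.e. 2^18 bytes */
#define	LABEL_COUNT	4

/* Round a raw device size down to a multiple of the label size. */
static uint64_t
align_to_label(uint64_t raw_size)
{
	return (raw_size & ~(LABEL_SIZE - 1));
}

/*
 * Byte offset of label l (0..3). Labels 0 and 1 sit at the front;
 * labels 2 and 3 sit at the end of the aligned size, so any trailing
 * bytes beyond the last full 256 KiB unit are ignored.
 */
static uint64_t
label_offset(uint64_t raw_size, int l)
{
	uint64_t psize = align_to_label(raw_size);
	uint64_t off = (uint64_t)l * LABEL_SIZE;

	if (l >= LABEL_COUNT / 2)
		off += psize - LABEL_COUNT * LABEL_SIZE;
	return (off);
}
```

Without the rounding step, the trailing labels of a device whose size is not a multiple of 256 KiB would be looked for at the wrong offsets.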

These changes should make it possible to import zpools that have had
their uberblocks erased, such as in the case of pools rendered
inaccessible by erroneous detach commands.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: buzzingwires <buzzingwires@outlook.com>
Closes #14773

George Amanakis [Wed, 3 May 2023 16:00:14 +0000 (18:00 +0200)]

Optimize check_filesystem() and process_error_log()

Integrate check_clones() into check_filesystem() and implement a list
instead of iterating recursively over the clones, thus eliminating the
risk of a stack overflow.
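
The general technique of replacing recursion with an explicit list can be sketched as below. This is not the check_filesystem() code: `node_t`, `work_t`, and `visit_all()` are hypothetical stand-ins, and plain malloc() is used in place of the kernel allocator.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a dataset with zero or more clones. */
typedef struct node {
	int id;
	struct node **clones;
	size_t nclones;
} node_t;

/* One pending entry on the heap-allocated worklist. */
typedef struct work {
	node_t *n;
	struct work *next;
} work_t;

/* Push a node onto the list; returns the new head, or NULL on failure. */
static work_t *
work_push(work_t *head, node_t *n)
{
	work_t *w = malloc(sizeof (*w));

	if (w == NULL)
		return (NULL);
	w->n = n;
	w->next = head;
	return (w);
}

/*
 * Visit a node and all of its transitive clones without recursion.
 * Pending nodes live on the heap, so stack use stays constant no
 * matter how deep the clone hierarchy is. Returns the number of nodes
 * visited, or -1 if an allocation fails.
 */
static int
visit_all(node_t *root)
{
	work_t *head = work_push(NULL, root);
	int visited = 0;
	int failed = (head == NULL);

	while (head != NULL) {
		work_t *cur = head;
		node_t *n = cur->n;

		head = cur->next;
		free(cur);
		if (failed)
			continue;	/* just drain the list */
		visited++;

		for (size_t i = 0; i < n->nclones; i++) {
			work_t *w = work_push(head, n->clones[i]);

			if (w == NULL) {
				failed = 1;
				break;
			}
			head = w;
		}
	}
	return (failed ? -1 : visited);
}
```

The trade-off is one small allocation per pending node in exchange for a stack depth that no longer grows with the number of nested clones.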

Also use kmem_zalloc() to allocate large structures in
process_error_log() reducing its stack size from ~700 to ~128 bytes.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14744

Pawel Jakub Dawidek [Tue, 2 May 2023 16:24:26 +0000 (01:24 +0900)]

Use correct block pointer in block cloning case.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14806

Brian Behlendorf [Tue, 2 May 2023 16:21:47 +0000 (09:21 -0700)]

Wrap clang specific pragma

Clang specific pragmas need to be wrapped to prevent a build
warning when compiling with gcc.
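
One common way to wrap such a pragma is to guard it with the `__clang__` predefined macro, so that gcc never sees it. The sketch below is illustrative only: the warning name and the `guarded_example()` function are arbitrary placeholders, not the pragma this commit actually wraps.

```c
/*
 * Only Clang defines __clang__, so gcc skips these pragmas entirely
 * and has nothing unknown to warn about.
 */
#if defined(__clang__)
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wunused-parameter"	/* placeholder */
#endif

/* Code that would otherwise trigger the warning under Clang goes here. */
static int
guarded_example(void)
{
	return (1);
}

#if defined(__clang__)
#pragma clang diagnostic pop
#endif
```

Pairing `push` with `pop` keeps the suppression local to the guarded region instead of leaking into the rest of the translation unit.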

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14814

Mateusz Guzik [Tue, 2 May 2023 00:21:27 +0000 (02:21 +0200)]

blake3: fix up bogus checksums in face of cpu migration

This is a temporary measure until a better fix is sorted out.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Sponsored by: Rubicon Communications, LLC ("Netgate")
Closes #14785
Closes #14808

Serapheim Dimitropoulos [Tue, 2 May 2023 00:18:42 +0000 (17:18 -0700)]

Correct ABD size for split block ZIOs

Currently when layering the ABD buffer of each split block on top of
an indirect vdev's ZIO ABD we don't specify the split block's ABD.
This results in those ABDs being incorrectly sized by inheriting
the size of their parent ABD which is larger than what each split
block needs.

The above behavior isn't causing any bugs currently but can lead
to unexpected ABD sizes for people analyzing and/or working on
the ZIO codepath. This patch fixes this behavior by properly setting
the ABD size for split block ZIOs.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #14804

Justin Hibbits [Thu, 27 Apr 2023 19:49:21 +0000 (15:49 -0400)]

powerpc64: Support ELFv2 asm on Big Endian

FreeBSD/powerpc64 is all ELFv2 since FreeBSD 13, even big endian. The
existing sha256 and sha512 asm code assumes that BE is all ELFv1, and LE
is ELFv2. Minor changes to add ELFv2 on the BE side get this working
correctly on FreeBSD with the latest OpenZFS import.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Justin Hibbits <chmeeedalf@gmail.com>
Closes #14779

Alexander Motin [Thu, 27 Apr 2023 19:32:58 +0000 (15:32 -0400)]

Mark TX_COMMIT transaction with TXG_NOTHROTTLE.

TX_COMMIT has no on-disk representation and does not produce any more
dirty data. It should not wait for anything, and even just skipping
the checks when not waiting gives an improvement noticeable in the profiler.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14798

Val Packett [Thu, 27 Apr 2023 16:49:03 +0000 (13:49 -0300)]

PAM: support the authentication facility

Implement the pam_sm_authenticate method, using the noop argument of
lzc_load_key to do a passphrase check without actually loading the key.

This allows using ZFS as the source of truth for user passwords,
without storing any password hashes in /etc or using other PAM modules.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14789

Tino Reichardt [Wed, 26 Apr 2023 19:40:26 +0000 (21:40 +0200)]

Fix BLAKE3 aarch64 assembly for FreeBSD and macOS

The x18 register isn't usable within FreeBSD kernel space, so we
have to fix the BLAKE3 aarch64 assembly to avoid using it.

The source files are here: https://github.com/mcmilk/BLAKE3-tests

Reviewed-by: Kyle Evans <kevans@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14728

Brian Behlendorf [Wed, 26 Apr 2023 18:49:16 +0000 (11:49 -0700)]

Fix checkstyle warning

Resolve a missed checkstyle warning.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14799

Alexander Motin [Wed, 26 Apr 2023 16:20:43 +0000 (12:20 -0400)]

Fix positive ABD size assertion in abd_verify().

Gang ABDs without children are legal, and they do have zero size.
For other ABD types a zero size doesn't make much sense and likely
does not work correctly now.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14795

Mateusz Guzik [Thu, 20 Apr 2023 09:00:03 +0000 (09:00 +0000)]

FreeBSD: fix up EINVAL from getdirentries on .zfs

Without the change:
/.zfs
/.zfs/snapshot
find: /.zfs: Invalid argument

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #14774

Mateusz Guzik [Thu, 20 Apr 2023 08:59:38 +0000 (08:59 +0000)]

FreeBSD: add missing vn state transition for .zfs

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #14774

Rob N [Wed, 26 Apr 2023 15:50:44 +0000 (01:50 +1000)]

tests/zdb_encrypted: parse numbers a little more robustly

On FreeBSD, `wc` prints some leading spaces, while on Linux it does not.
So we tell ksh to expect an integer, and it does the rest.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #14791
Closes #14797

Brian Behlendorf [Wed, 26 Apr 2023 15:43:39 +0000 (08:43 -0700)]

zdb: Fix minor memory leak

Commit 6b6aaf6dc2e65c63c74fbd7840c14627e9a91ce2 introduced a small
memory leak in zdb. This was detected by the LeakSanitizer and was
causing all ztest runs to fail.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14796

Brian Behlendorf [Tue, 25 Apr 2023 23:40:55 +0000 (16:40 -0700)]

Revert "Fix data race between zil_commit() and zil_suspend()"

This reverts commit 4c856fb333ac57d9b4a6ddd44407fd022a702f00 to
resolve a newly introduced deadlock which in practice is more
disruptive than the issue this commit intended to address.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14775
Closes #14790

Han Gao [Tue, 25 Apr 2023 23:05:45 +0000 (07:05 +0800)]

Add loongarch64 support

Add loongarch64 definitions & lua module setjmp asm

LoongArch is a new RISC ISA, which is a bit like MIPS or RISC-V.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Han Gao <gaohan@uniontech.com>
Signed-off-by: WANG Xuerui <xen0n@gentoo.org>
Closes #13422

Rich Ercolani [Mon, 24 Apr 2023 23:55:07 +0000 (19:55 -0400)]

Taught zdb -bb to print metadata totals

People often want estimates of how much of their pool is occupied
by metadata, but they end up using lots of text processing on zdb's
output to get it.

So let's just...provide it for them.

Now, zdb -bbbs will output something like:

Blocks  LSIZE   PSIZE   ASIZE     avg    comp   %Total  Type
[...]
    68  1.06M    272K    544K      8K    4.00     0.00      L6 Total
 1.71K   212M   6.85M   13.7M      8K   30.91     0.00      L5 Total
 1.71K   212M   6.85M   13.7M      8K   30.91     0.00      L4 Total
 1.73K   214M   6.92M   13.8M      8K   30.89     0.00      L3 Total
 18.7K  2.29G    111M    221M   11.8K   21.19     0.00      L2 Total
 3.56M   454G   28.4G   56.9G   16.0K   15.97     0.19      L1 Total
  308M  36.8T   28.2T   28.6T   95.1K    1.30    99.80      L0 Total
  311M  37.3T   28.3T   28.6T   94.2K    1.32   100.00  Total
 50.4M   774G    113G    291G   5.77K    6.85     0.99  Metadata Total

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14746

Mateusz Guzik [Mon, 24 Apr 2023 23:15:42 +0000 (01:15 +0200)]

FreeBSD: add missing vop_fplookup assignments

It became illegal to not have them as of
5f6df177758b9dff88e4b6069aeb2359e8b0c493 ("vfs: validate that vop
vectors provide all or none fplookup vops") upstream.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #14788

Mateusz Guzik [Wed, 5 Apr 2023 21:28:52 +0000 (21:28 +0000)]

FreeBSD: try to fallback early if can't do optimized copy

Not complete, but already shaves off some locking.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Sponsored by: Rubicon Communications, LLC ("Netgate")
Closes #14723

Mateusz Guzik [Wed, 5 Apr 2023 21:12:17 +0000 (21:12 +0000)]

FreeBSD: fix up EXDEV handling for clone_range

API contract requires VOPs to handle EXDEV internally, worst case by
falling back to the generic copy routine. This broke with the recent
changes.

While here whack custom loop to lock 2 vnodes with vn_lock_pair, which
provides the same functionality internally. write start/finish around
it plays no role so got eliminated.

One difference is that vn_lock_pair always takes an exclusive lock on
both vnodes. I did not patch around it because current code takes an
exclusive lock on the target vnode. zfs supports shared-locking for
writes, so this serializes different calls to the routine as is, despite
range locking inside. At the same time you may notice the source vnode
can get some traffic if only shared-locked, thus once more this goes
the safer route of exclusive-locking. Note this should be patched to
use shared-locking for both once the feature is considered stable.

Technically the switch to vn_lock_pair should be a separate change, but
it would only introduce churn immediately whacked by the rest of the
patch.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Sponsored by: Rubicon Communications, LLC ("Netgate")
Closes #14723

Dimitry Andric [Fri, 21 Apr 2023 17:22:52 +0000 (19:22 +0200)]

FreeBSD: make zfs_vfs_held() definition consistent with declaration

Noticed while attempting to change FreeBSD's boolean_t into an actual
bool: in include/sys/zfs_ioctl_impl.h, zfs_vfs_held() is declared to
return a boolean_t, but in module/os/freebsd/zfs/zfs_ioctl_os.c it is
defined to return an int. Make the definition match the declaration.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #14776

Allan Jude [Fri, 21 Apr 2023 17:20:36 +0000 (13:20 -0400)]

Add support for zpool user properties

Usage:

zpool set org.freebsd:comment="this is my pool" poolname

Tests are based on zfs_set's user property tests.

Also stop truncating property values at MAXNAMELEN, use ZFS_MAXPROPLEN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Sponsored-by: Beckhoff Automation GmbH & Co. KG.
Sponsored-by: Klara Inc.
Closes #11680

Richard Yao [Tue, 11 Apr 2023 17:56:16 +0000 (17:56 +0000)]

Linux: Suppress -Wordered-compare-function-pointers in tracepoint code

Clang points out that there is a comparison against -1, but we cannot
fix it because that is from the kernel headers, which we must support.
We can work around this by using a pragma.

Sponsored-By: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Closes #14738

Richard Yao [Tue, 11 Apr 2023 17:50:43 +0000 (17:50 +0000)]

Linux: zfs_zaccess_trivial() should always call generic_permission()

Building with Clang on Linux generates a warning that err could be
uninitialized if mnt_ns is a NULL pointer. However, mnt_ns should never
be NULL, so there is no need to put this behind an if statement. Taking
it outside of the if statement means that the possibility of err being
uninitialized, which was always zero in a way that the compiler could
not realize, is now always zero in a way that the compiler can
realize.

Sponsored-By: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Closes #14738

Brian Behlendorf [Thu, 20 Apr 2023 17:25:16 +0000 (10:25 -0700)]

ZTS: zvol_misc_trim retry busy export

Retry the export if the pool is busy due to an open zvol.
Observed in the CI on Fedora 37.

cannot export 'testpool': pool is busy
ERROR: zpool export testpool exited 1

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14769

rob-wing [Thu, 20 Apr 2023 17:07:56 +0000 (09:07 -0800)]

Create zap for root vdev

And add it to the AVZ, this is not backwards compatible with older pools
due to an assertion in spa_sync() that verifies the number of ZAPs of
all vdevs matches the number of ZAPs in the AVZ.

Granted, the assertion only applies to #DEBUG builds - still, a feature
flag is introduced to avoid the assertion, com.klarasystems:vdev_zaps_v2

Notably, this allows getting and setting properties on the root vdev:

    % zpool set user:prop=value <pool> root-0

Before this commit, it was already possible to get/set properties on
top-level vdevs with the syntax <type>-<vdev_id> (e.g. mirror-0):

    % zpool set user:prop=value <pool> mirror-0

This syntax also applies to the root vdev, as it is of type 'root'
with a vdev_id of 0, root-0. The keyword 'root' serves as an alias for
'root-0'.

The following tests have been added:

    - zpool get all properties from root vdev
    - zpool set a property on root vdev
    - verify root vdev ZAP is created

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Sponsored-by: Seagate Technology
Submitted-by: Klara, Inc.
Closes #14405

Herb Wartens [Wed, 19 Apr 2023 20:22:59 +0000 (13:22 -0700)]

Allow MMP to bypass waiting for other threads

At our site we have seen cases when multi-modifier protection is enabled
(multihost=on) on our pool and the pool gets suspended due to a single
disk that is failing and responding very slowly. Our pools have 90 disks
in them and we expect disks to fail. The current version of MMP requires
that we wait for other writers before moving on. When a disk is
responding very slowly, we observed that waiting here was bad enough to
cause the pool to suspend. This change allows the MMP thread to bypass
waiting for other threads and reduces the chances the pool gets
suspended.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Herb Wartens <hawartens@gmail.com>
Closes #14659

Paul Dagnelie [Wed, 19 Apr 2023 20:20:02 +0000 (13:20 -0700)]

ZTS: send-c_volume is flaky

We use block_device_wait to wait for the zvol block device to
actually appear, and we log the result of the dd calls by using
an intermediate file.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #14767

Ameer Hamza [Wed, 19 Apr 2023 16:04:32 +0000 (21:04 +0500)]

Fix "Detach spare vdev in case if resilvering does not happen"

A spare vdev should detach from the pool when a disk is reinserted.
However, spare detachment depends on the completion of resilvering,
and if a resilver is not scheduled, the spare vdev stays attached to
the pool until the next resilvering. When a zfs pool contains
several disks (25+ mirror), resilvering does not always happen when
a disk is reinserted. In this patch, the spare vdev is manually detached
from the pool when resilvering does not occur. It has been tested
on both Linux and FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14722

наб [Wed, 19 Apr 2023 16:03:42 +0000 (18:03 +0200)]

zfsprops.7: update mandlock

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=f7e33bdbd6d1bdf9c3df8bba5abcf3399f957ac3
https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=7e59106e9c34458540f7d382d5b49071d1b7104f

Fixes: commit fb9baa9b2045a193a3caf0a46b5cac5ef7a84b61 ("zfsprops.8:
remove nbmand-not-used-on-Linux and pointer to mount(8)")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #14765

youzhongyang [Wed, 19 Apr 2023 01:10:40 +0000 (21:10 -0400)]

Silence clang warning of flexible array not at end

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14764

Low-power [Tue, 18 Apr 2023 18:34:41 +0000 (02:34 +0800)]

Values printed by zpool-iostat(8) should be right-aligned

This inappropriate left-alignment was introduced in 7bb7b1f.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #14751

Tony Hutter [Tue, 18 Apr 2023 15:41:52 +0000 (08:41 -0700)]

Revert "ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()"

This reverts commit 4b3133e671b958fa2c915a4faf57812820124a7b.

Users identified this commit as a possible source of data
corruption:
https://github.com/openzfs/zfs/issues/14753

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Issue #14753
Closes #14761

Rich Ercolani [Tue, 18 Apr 2023 00:38:09 +0000 (20:38 -0400)]

Work around Raspberry Pi kernel packaging oddities

On Debian and Ubuntu and friends, you get something like
"linux-image-$(uname -r)" and "linux-headers-$(uname -r)" that you
can put a Depends on.

On Raspberry Pi OS, you get "raspberrypi-kernel" and
"raspberrypi-kernel-headers", with version numbers like 20230411.

There is not, as far as I can tell, a reasonable way to map that
to a kernel version short of reaching out and digging around in
the changelogs or Makefile, so just special-case it so the packages
don't fail to install at install time. They still might not build
if the versions don't match, but I don't see a way to do anything
about that...

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14745
Closes #14747

Pawel Jakub Dawidek [Mon, 17 Apr 2023 23:42:09 +0000 (08:42 +0900)]

Fix VERIFY(!zil_replaying(zilog, tx)) panic

The zfs_log_clone_range() function is never called from the
zfs_clone_range_replay() function, so I assumed it is safe to assert
that zil_replaying() is never TRUE here. It turns out zil_replaying()
also returns TRUE when the sync property is set to disabled.

Fix the problem by just returning if zil_replaying() returns TRUE.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported by: Florian Smeets
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14758

dodexahedron [Thu, 13 Apr 2023 16:15:34 +0000 (09:15 -0700)]

Minor improvements to zpoolconcepts.7

* Fixed one typo (effects -> affects)
* Re-worded raidz description to make it clearer that it is not
   quite the same as RAID5, though similar
* Clarified that data is not necessarily written in a static
   stripe width
* Minor grammar consistency improvement
* Noted that "volumes" means zvols
* Fixed a couple of split infinitives
* Clarified that hot spares come from the same pool they were
   assigned to
* "we" -> ZFS
* Fixed warnings thrown by mandoc, and removed unnecessary
  wordiness in one fixed line.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brandon Thetford <brandon@dodecatec.com>
Closes #14726

youzhongyang [Thu, 13 Apr 2023 16:12:03 +0000 (12:12 -0400)]

Linux 6.3 compat: Fix memcpy "detected field-spanning write" error

Add a new union member of flexible array to dnode_phys_t and use
it in the macro so we can silence the memcpy() fortify error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14737

Pawel Jakub Dawidek [Wed, 12 Apr 2023 23:15:05 +0000 (08:15 +0900)]

Fix data corruption when cloning embedded blocks

Don't overwrite blk_phys_birth, as for embedded blocks it is part of
the payload.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Issue #13392
Closes #14739

наб [Wed, 12 Apr 2023 17:08:49 +0000 (19:08 +0200)]

initramfs: source user scripts from /e/z/initramfs-tools-load-key{,.d/*}

By dropping in a file in a directory (for packages) or by making a file
(for local administrators), custom key loading methods may be provided
for the rootfs and necessities.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nicholas Morris <security@niwamo.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Co-authored-by: Nicholas Morris <security@niwamo.com>
Supersedes: #14704
Closes: #13757
Closes #14733

George Amanakis [Wed, 12 Apr 2023 15:53:53 +0000 (17:53 +0200)]

Fix in check_filesystem()

Fix the code in case of missing snapshots. Previously the check was in
a conditional that would be executed if the filesystem had snapshots.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14735

Alan Somers [Mon, 10 Apr 2023 21:24:27 +0000 (15:24 -0600)]

Trim needless zeroes from checksum events

The ereport.fs.zfs.checksum event contains histograms of the bits that
were wrongly set or cleared according to their bit position in a 64-bit
word. So the maximum value that any histogram bucket could have would
be 64. But ZFS currently uses a uint32_t to hold each bucket. As a
result, the event report is full of needless zeroes.

Change the bucket size to uint8_t, stripping 768 needless zeros from
each event.
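
The figure of 768 can be checked with a short calculation, assuming each bucket is printed as fixed-width hex (two digits per byte) and that an event carries two histograms, bad_set_histogram and bad_cleared_histogram. The macro and function names below are hypothetical and exist only for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define	HIST_BUCKETS	64	/* one bucket per bit position in a 64-bit word */
#define	HISTOGRAMS	2	/* bad_set_histogram and bad_cleared_histogram */

/* Hex digits needed to print one histogram with buckets of the given width. */
static size_t
hist_hex_digits(size_t bucket_bytes)
{
	return (HIST_BUCKETS * bucket_bytes * 2);
}

/* Digits removed from each event by shrinking buckets to uint8_t. */
static size_t
digits_saved(void)
{
	return (HISTOGRAMS * (hist_hex_digits(sizeof (uint32_t)) -
	    hist_hex_digits(sizeof (uint8_t))));
}
```

With uint32_t buckets each histogram takes 512 hex digits, with uint8_t buckets it takes 128, and the difference of 384 across two histograms gives 768.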

Original event format:
```
class=ereport.fs.zfs.checksum ena=639460469834258433 pool=testpool.1933 pool_guid=4979719877084416563 pool_state=0 pool_context=0 pool_failmode=wait vdev_guid=4136721804819128578 vdev_type=file vdev_path=/tmp/kyua.1TxP3A/2/work/file1.1933 vdev_ashift=9 vdev_complete_ts=609837019678 vdev_delta_ts=33450 vdev_read_errors=0 vdev_write_errors=0 vdev_cksum_errors=20 vdev_delays=0 parent_guid=2751977006639883417 parent_type=raidz vdev_spare_guids= zio_err=0 zio_flags=1048752 zio_stage=4194304 zio_pipeline=65011712 zio_delay=0 zio_timestamp=0 zio_delta=0 zio_priority=4 zio_offset=702976 zio_size=1024 zio_objset=24 zio_object=0 zio_level=3 zio_blkid=0 bad_ranges=0000000000000400 bad_ranges_min_gap=8 bad_range_sets=0000079e bad_range_clears=00000854 bad_set_histogram=000000210000001a000000150000001d000000240000001b000000220000001b000000210000002100000018000000260000002300000025000000210000001e000000250000001b0000001d0000001e0000001600000025000000180000001b000000240000001b000000240000001b0000001c000000210000001b0000001e000000210000001a0000001e000000220000001d0000001b000000200000001f0000001a000000250000001f0000001d0000001b0000001d000000240000001d0000001b0000001b0000001f00000024000000190000001a0000001f0000001e000000240000001e0000002400000021000000200000001d0000001d00000021 bad_cleared_histogram=000000220000002700000021000000210000001b0000001a000000250000001f0000001c0000001e0000002400000022000000220000002400000022000000240000002200000021000000220000001b0000002100000021000000190000001b000000240000002400000020000000290000002a00000028000000250000002400000020000000270000002500000016000000270000001c000000210000001f000000240000001c0000002100000022000000240000002100000023000000210000002700000022000000240000001b00000022000000210000001c00000023000000150000002600000020000000270000001e0000001d0000002400000026 time=00000016806457270000000323406839 eid=458
```

New format:
```
class=ereport.fs.zfs.checksum ena=96599319807790081 pool=testpool.1933 pool_guid=1236902063710799041 pool_state=0 pool_context=0 pool_failmode=wait vdev_guid=2774253874431514999 vdev_type=file vdev_path=/tmp/kyua.6Temlq/2/work/file1.1933 vdev_ashift=9 vdev_complete_ts=92124283803 vdev_delta_ts=46670 vdev_read_errors=0 vdev_write_errors=0 vdev_cksum_errors=20 vdev_delays=0 parent_guid=8090931855087882905 parent_type=raidz vdev_spare_guids= zio_err=0 zio_flags=1048752 zio_stage=4194304 zio_pipeline=65011712 zio_delay=0 zio_timestamp=0 zio_delta=0 zio_priority=4 zio_offset=1028608 zio_size=512 zio_objset=0 zio_object=0 zio_level=0 zio_blkid=4 bad_ranges=0000000000000200 bad_ranges_min_gap=8 bad_range_sets=0000061f bad_range_clears=000001f4 bad_set_histogram=1719161c1c1c101618171a151a1a19161e1c171d1816161c191f1a18192117191c131d171b1613151a171419161a1b1319101b14171b18151e191a1b141a1c17 bad_cleared_histogram=06090a0808070a0b020609060506090a01090a050a0a0509070609080d050d0607080d060507080c04070807070a0608020c080c080908040808090a05090a07 time=00000016806477050000000604157480 eid=62
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Alan Somers <asomers@FreeBSD.org>
Sponsored-by: Axcient
Closes #14716

ZFS On Linux mirror for PVE
