git.proxmox.com Git - mirror_zfs.git/log
3 years ago
наб [Mon, 12 Apr 2021 16:32:43 +0000 (18:32 +0200)]
libuutil: purge unused functions

Remove vestigial uu_open_tmp().  The problems with this implementation
are many, but the primary one is the TMPPATHFMT macro, which is
unused, and always has been.

Searching around for any users leads only to earlier imports of the
same, identical file, i.a. into an apple repository (which does patch
gethrtime() into it and gives us a copyright date of 2007),
and a MidnightBSD one from 2008.

Searching illumos-gate, uu_open_tmp appears, in current HEAD, three
times: in the header, libuutil's mapfile ABI, and the implementation.

This slowly grows up to eight occurrences as one moves back to the root
"OpenSolaris Launch" commit: the header, implementation, twice in
libuutil's spec ABI, twice (with multilib and non-multilib paths) in
libuutil.so's i386 and SPARC binary db ABIs.

That's 2005, and this file was abandonware even then; it's dead code.

The situation is similar for the uu_dprintf() family of functions and
uu_dump().  Nothing in accessibly recorded history has ever used them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11873

3 years ago
Colm [Mon, 12 Apr 2021 16:08:56 +0000 (17:08 +0100)]
Improvements to the 'compatibility' property

Several improvements to the operation of the 'compatibility' property:

1) Improved handling of unrecognized features:
Change the way unrecognized features in compatibility files are handled.

 * invalid features in files under /usr/share/zfs/compatibility.d
   only get a warning (as these may refer to future features not yet in
   the library),
 * invalid features in files under /etc/zfs/compatibility.d
   get an error (as these are presumed to refer to the current system).

2) Improved error reporting from zpool_load_compat.
Note: slight ABI change to zpool_load_compat for better error reporting.

3) compatibility=legacy inhibits all 'zpool upgrade' operations.

4) Detect when features are enabled outside current compatibility set
   * zpool set compatibility=foo <-- print a warning
   * zpool set feature@xxx=enabled <-- error
   * zpool status <-- indicate this state

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #11861
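
A minimal sketch of the directory-dependent handling described in item 1, written as a Python model rather than the actual libzfs C code. `classify_unknown_feature` and `KNOWN_FEATURES` are hypothetical names; only the two directory paths come from the commit message, and treating every non-system directory as an error is a simplifying assumption.

```python
import os

# Directories named in the commit message.
SYSTEM_DIR = "/usr/share/zfs/compatibility.d"
LOCAL_DIR = "/etc/zfs/compatibility.d"

# Hypothetical stand-in for the features the installed library knows about.
KNOWN_FEATURES = {"async_destroy", "lz4_compress", "embedded_data"}


def classify_unknown_feature(compat_file, feature):
    """Return "ok", "warning", or "error" for one feature name in a file."""
    if feature in KNOWN_FEATURES:
        return "ok"
    directory = os.path.dirname(os.path.abspath(compat_file))
    if directory == SYSTEM_DIR:
        # Distribution-shipped files may name features newer than the library.
        return "warning"
    # Assumption: anything else (notably LOCAL_DIR) describes this system,
    # so an unknown feature is treated as a mistake.
    return "error"
```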

3 years ago
Brian Behlendorf [Mon, 12 Apr 2021 04:49:13 +0000 (21:49 -0700)]
ZTS: fix removal_condense_export test case

It's been observed in the CI that the required 25% of obsolete bytes
in the mapping can be too high a threshold for this test, resulting in
condensing never being triggered and a test failure.  To prevent these
failures make the existing zfs_condense_indirect_obsolete_pct tuning
available so the obsolete percentage can be reduced from 25% to 5%
during this test.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11869

3 years ago
Brian Behlendorf [Mon, 12 Apr 2021 00:03:55 +0000 (17:03 -0700)]
Update libzfs.abi for zfs_send() change

Commit 099fa7e4 intentionally modified the libzfs ABI.  However, it
failed to include an update for the libzfs.abi file.  This commit
resolves the `make checkabi` warning due to that omission.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11710

3 years ago
pstef [Sun, 11 Apr 2021 23:35:07 +0000 (01:35 +0200)]
Balance parentheses in parameter descriptions

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Piotr Paweł Stefaniak <pstef@freebsd.org>
Closes #11882

3 years ago
Brian Behlendorf [Sun, 11 Apr 2021 22:55:38 +0000 (15:55 -0700)]
ZTS: Add known exceptions

The fault/auto_spare_shared, l2arc/persist_l2arc_007_pos, and
alloc_class/alloc_class_013_pos test cases are not entirely reliable
and may occasionally fail resulting in a false positive in the CI.
Add these tests to the known list of possible failures until they can
be made 100% reliable.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11890

3 years ago
наб [Thu, 8 Apr 2021 20:17:38 +0000 (22:17 +0200)]
lib/: set O_CLOEXEC on all fds

As found by
  git grep -E '(open|setmntent|pipe2?)\(' |
    grep -vE '((zfs|zpool)_|fd|dl|lzc_re|pidfile_|g_)open\('

FreeBSD's pidfile_open() says nothing about the flags of the files it
opens, but we can't do anything about it anyway; the implementation does
open all files with O_CLOEXEC

Consider this output with zpool.d/media appended with
"pid=$$; (ls -l /proc/$pid/fd > /dev/tty)":
  $ /sbin/zpool iostat -vc media
  lrwx------ 0 -> /dev/pts/0
  l-wx------ 1 -> 'pipe:[3278500]'
  l-wx------ 2 -> /dev/null
  lrwx------ 3 -> /dev/zfs
  lr-x------ 4 -> /proc/31895/mounts
  lrwx------ 5 -> /dev/zfs
  lr-x------ 10 -> /usr/lib/zfs-linux/zpool.d/media
vs
  $ ./zpool iostat -vc vendor,upath,iostat,media
  lrwx------ 0 -> /dev/pts/0
  l-wx------ 1 -> 'pipe:[3279887]'
  l-wx------ 2 -> /dev/null
  lr-x------ 10 -> /usr/lib/zfs-linux/zpool.d/media

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
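
The effect of the flag can be inspected from user space. The following is a rough Python illustration, not the C code touched by the commit; `is_cloexec` is a hypothetical helper. Note that Python already opens descriptors as non-inheritable by default, so the flag is cleared explicitly here to mimic the pre-patch C behaviour.

```python
import fcntl
import os


def is_cloexec(fd):
    """Return True if the close-on-exec flag is set on fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


# Request close-on-exec atomically at open time, as O_CLOEXEC does in C.
fd = os.open(os.devnull, os.O_RDONLY | os.O_CLOEXEC)
cloexec_at_open = is_cloexec(fd)

# Clearing the flag reproduces the leaky state: such a descriptor survives
# exec() and would be listed in the child's /proc/<pid>/fd.
os.set_inheritable(fd, True)
cloexec_after_clear = is_cloexec(fd)

os.close(fd)
```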

3 years ago
наб [Tue, 9 Mar 2021 23:00:43 +0000 (00:00 +0100)]
libzfs{,_core}: set O_CLOEXEC on persistent (ZFS_DEV and MNTTAB) fds

These were fd 3, 4, and 5 by the time zfs change-key hit
execute_key_fob()

glibc appends "e" to setmntent() mode, but musl's just returns fopen()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866

3 years ago
наб [Thu, 11 Mar 2021 13:34:01 +0000 (14:34 +0100)]
libzfs: zfs_crypto_create() requires a new key by definition: set newkey

This changes the password prompt for new encryption roots from
  Enter passphrase:
  Re-enter passphrase:
to
  Enter new passphrase:
  Re-enter new passphrase:
which makes more sense and is more consistent with "new passphrase"
now always meaning "come up with something" and plain "passphrase"
"remember that thing"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866

3 years ago
наб [Wed, 10 Mar 2021 10:18:49 +0000 (11:18 +0100)]
libzfs_crypto.c: remove unused key_locator enum

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866

3 years ago
наб [Thu, 11 Mar 2021 16:42:22 +0000 (17:42 +0100)]
zfsprops(8): fix spacing in jailed= arguments

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866

3 years ago
наб [Wed, 10 Mar 2021 14:56:01 +0000 (15:56 +0100)]
zfs-[un]jail(8): fix "zfs-jail [un]jail" leftovers

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866

3 years ago
наб [Wed, 7 Apr 2021 14:17:44 +0000 (16:17 +0200)]
zed: untangle _zed_conf_parse_path()

Dunno, maybe it's just me, but the previous style was /really/ confusing

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860

3 years ago
наб [Wed, 7 Apr 2021 13:38:22 +0000 (15:38 +0200)]
zed: don't malloc() global zed_conf instance, optimise zed_conf layout

It's all of 40 bytes with 4-byte pointers and 64 with 8-byte ones
(previously 44 and 88, respectively) ‒
there's no reason it can't live on the stack

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860

3 years ago
наб [Wed, 7 Apr 2021 13:35:10 +0000 (15:35 +0200)]
zed: remove zed_conf::{min,max}_events and ZED_{MIN,MAX}_EVENTS

No users, fields marked "reserved for future use", macros defined to 0

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860

3 years ago
наб [Wed, 7 Apr 2021 13:32:45 +0000 (15:32 +0200)]
zed: remove zed_conf::syslog_facility

No users, nobody sets it, main() hard-codes LOG_DAEMON, which is the
only correct value for this

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860

3 years ago
наб [Wed, 7 Apr 2021 13:27:55 +0000 (15:27 +0200)]
zed: _zed_conf_display_help(): be consistent about what got_err means

Users passed in EXIT_SUCCESS and EXIT_FAILURE, despite it being a bool

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860

3 years ago
наб [Wed, 7 Apr 2021 13:20:22 +0000 (15:20 +0200)]
zed: untangle -h option listing

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860

3 years ago
наб [Wed, 7 Apr 2021 12:52:58 +0000 (14:52 +0200)]
zed: print out licence string as one big chunk

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860

3 years ago
pablofsf [Sun, 11 Apr 2021 19:05:35 +0000 (21:05 +0200)]
Allow zfs to send replication streams with missing snapshots

A tentative implementation and discussion was done in #5285.
Following that discussion, a send --skip-missing|-s flag has been added.
In a replication stream, when there are snapshots missing in
the hierarchy, if -s is provided, print a warning and ignore the
dataset (and its children) instead of throwing an error.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pablo Correa Gómez <ablocorrea@hotmail.com>
Closes #11710

3 years ago
Olaf Faaland [Sun, 11 Apr 2021 19:02:26 +0000 (12:02 -0700)]
kmod-zfs should obsolete kmod-spl as well as spl-kmod

Without this Obsoletes, using packages built --with-spec=redhat, an
upgrade from zfs-0.7 to zfs-2.x does not cause the kmod-spl-0.7 package
to be removed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #11865

3 years ago
наб [Fri, 9 Apr 2021 16:12:07 +0000 (18:12 +0200)]
zvol_wait: properly handle zvol_volmode sysctl being 3/none

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859

3 years ago
наб [Wed, 7 Apr 2021 17:06:42 +0000 (19:06 +0200)]
zfs_ids_to_path: print correct wrong values

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859

3 years ago
наб [Wed, 7 Apr 2021 17:04:46 +0000 (19:04 +0200)]
zfs_ids_to_path: the -v comes after the executable name

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859

3 years ago
наб [Wed, 7 Apr 2021 16:38:07 +0000 (18:38 +0200)]
contrib/bpftrace: exec bpftrace, remove useless cat

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859

3 years ago
наб [Wed, 7 Apr 2021 16:02:35 +0000 (18:02 +0200)]
arc_summary3: just read /s/m/{mod}/version instead of spawning cat

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
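
Reading the file directly instead of spawning cat might look like the sketch below. It assumes the title's `/s/m/{mod}/version` abbreviates `/sys/module/{mod}/version`; `read_module_version` and its `sysfs_root` parameter are hypothetical and not taken from arc_summary3 itself.

```python
import os


def read_module_version(module, sysfs_root="/sys/module"):
    """Return the module's version string, or None if it cannot be read."""
    path = os.path.join(sysfs_root, module, "version")
    try:
        # A plain read replaces the subprocess that used to run cat.
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        # Module not loaded, or no sysfs on this platform.
        return None
```

The `sysfs_root` parameter exists only so the helper can be exercised against a temporary directory on machines without the module loaded.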

3 years ago
наб [Wed, 7 Apr 2021 15:37:55 +0000 (17:37 +0200)]
zvol_wait: fix for zvols with spaces in name, optimise

list_zvols() would happily, for zvols with spaces in their names,
assign the second half to volmode, &c., so use a normal read
and set IFS to a tab instead of using 4 separate AWK processes(?)

Similarly, in filter_out_deleted_zvols(), run zfs(8) once and use the
output directly instead of spawning a zfs(8) process per zvol

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
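
The field-splitting problem described for list_zvols() can be shown in a few lines. This is a Python illustration, not the shell code from the script; the sample record and its column order (name, volmode, then two further properties) are assumptions, as is the hypothetical helper `parse_zvol_line`. It relies on the input being tab-separated, as `zfs list -H` output is.

```python
def parse_zvol_line(line):
    """Split one tab-separated record into its fields, keeping spaces."""
    return line.rstrip("\n").split("\t")


# Assumed sample record for a zvol whose name contains a space.
record = "tank/my volume\tdefault\t-\t-\n"

# Splitting on any whitespace shifts every column by one:
# the second half of the name lands where volmode is expected.
naive = record.split()

# Splitting on tabs only keeps the name intact.
fields = parse_zvol_line(record)
```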

3 years ago
наб [Wed, 7 Apr 2021 14:55:40 +0000 (16:55 +0200)]
zstreamdump: exec zstream dump

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859

3 years ago
Ryan Moeller [Tue, 16 Mar 2021 13:04:58 +0000 (13:04 +0000)]
Move zfsdev_state_{init,destroy} to common code

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11833

3 years ago
Ryan Moeller [Tue, 16 Mar 2021 12:44:23 +0000 (12:44 +0000)]
Eliminate zfsdev_get_state_impl

After 3937ab20f zfsdev_get_state_impl can become zfsdev_get_state.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11833

3 years ago
TerraTech [Fri, 9 Apr 2021 04:15:29 +0000 (21:15 -0700)]
zpl_inode.c: Fix SMACK interoperability

SMACK needs to have the ZFS dentry security field setup before
SMACK's d_instantiate() hook is called as it requires functioning
'__vfs_getxattr()' calls to properly set the labels.

Fixes:
1) file instantiation properly setting the object label to the
   subject's label
2) proper file labeling in a transmutable directory

Functions Updated:
1) zpl_create()
2) zpl_mknod()
3) zpl_mkdir()
4) zpl_symlink()

External-issue: https://github.com/cschaufler/smack-next/issues/1
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: TerraTech <TerraTech@users.noreply.github.com>
Closes #11646
Closes #11839

3 years ago
Ryan Moeller [Fri, 9 Apr 2021 04:10:28 +0000 (00:10 -0400)]
ZTS: Improve cleanup in removal_with_export

Kill the removal operation on every platform, not just Linux.
The test has been fixed and is now stable on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11856

3 years ago
Rich Ercolani [Thu, 8 Apr 2021 21:37:51 +0000 (17:37 -0400)]
Added check for broken alien version

Added a check for alien 8.95.{1,2,3}, which is known to fail to
generate debs 100% of the time, and instead print out a message
informing the developer that it's known to be broken and linking
them to more information.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #11848
Closes #11850

3 years ago
Brian Behlendorf [Thu, 8 Apr 2021 21:33:15 +0000 (14:33 -0700)]
Use dsl_scan_setup_check() to setup a scrub

When a rebuild completes it will automatically schedule a follow up
scrub to verify all of the block checksums.  Before setting up the
scrub execute the counterpart dsl_scan_setup_check() function to
confirm the scrub can be started.  Prior to this change we'd only
check vdev_rebuild_active() which isn't as comprehensive, and using
the check function keeps all of this logic in one place.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11849

3 years ago
Tino Reichardt [Thu, 8 Apr 2021 20:25:24 +0000 (22:25 +0200)]
Fix double sha1/sha1.o line in module/icp/Makefile.in

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #11852

3 years ago
Ryan Moeller [Thu, 8 Apr 2021 20:21:53 +0000 (16:21 -0400)]
ZTS: Tests using zhack may fail on FreeBSD

As described in #11854, zhack is occasionally segfaulting on FreeBSD.
Debugging this is proving to be tricky. To avoid false positives in
the CI add entries for the tests that use zhack in zts-report to
accept that they may occasionally fail on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Issue #11854
Closes #11855

3 years ago
Ryan Moeller [Wed, 7 Apr 2021 23:23:57 +0000 (19:23 -0400)]
Ratelimit deadman zevents as with delay zevents

Just as delay zevents can flood the zevent pipe when a vdev becomes
unresponsive, so do the deadman zevents.

Ratelimit deadman zevents according to the same tunable as for delay
zevents.

Enable deadman tests on FreeBSD and add a test for deadman event
ratelimiting.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11786
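
For readers unfamiliar with the idea, a generic fixed-window rate limiter is sketched below. It is not the ZFS kernel implementation; the class name, the `burst` and `interval` parameters, and the injectable clock are all hypothetical choices made for illustration.

```python
import time


class RateLimiter:
    """Allow at most `burst` events per `interval` seconds."""

    def __init__(self, burst, interval=1.0, clock=time.monotonic):
        self.burst = burst
        self.interval = interval
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def allow(self):
        """Return True if the event may be posted, False if it is dropped."""
        now = self.clock()
        if now - self.window_start >= self.interval:
            # A new window begins: forget the events counted so far.
            self.window_start = now
            self.count = 0
        if self.count >= self.burst:
            return False
        self.count += 1
        return True
```

Sharing one limit between delay and deadman events, as the commit describes, would correspond to constructing both limiters from the same `burst` value.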

3 years ago
наб [Sat, 3 Apr 2021 10:09:24 +0000 (12:09 +0200)]
zed: only go up to current limit in close_from() fallback

Consider the following strace log:
  prlimit64(0, RLIMIT_NOFILE,
            NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = -1 EBADF (Bad file descriptor)
  dup2(0, 30000)                      = -1 EBADF (Bad file descriptor)
  dup2(0, 300000)                     = -1 EBADF (Bad file descriptor)
  prlimit64(0, RLIMIT_NOFILE,
            {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = 3000
  dup2(0, 30000)                      = 30000
  dup2(0, 300000)                     = 300000

Even a privileged process needs to bump its rlimit before being able
to use fds higher than rlim_cur.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
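
The behaviour in the strace log can be reproduced with the standard `resource` module. This is a hedged Python sketch, not code from zed; `probe_with_soft_limit` and its helper are hypothetical. It temporarily lowers the soft limit, tries to dup2() onto a descriptor number above it, and restores the limit afterwards.

```python
import errno
import os
import resource


def fd_is_open(fd):
    """Return True if fd currently refers to an open file."""
    try:
        os.fstat(fd)
    except OSError:
        return False
    return True


def probe_with_soft_limit(soft_limit, target_fd):
    """Set the RLIMIT_NOFILE soft limit, then try dup2() onto target_fd.

    Returns 0 on success or the errno of the failed dup2().
    """
    if fd_is_open(target_fd):
        raise ValueError("target_fd is already in use")
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    src = os.open(os.devnull, os.O_RDONLY)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
        try:
            os.dup2(src, target_fd)
        except OSError as err:
            return err.errno
        os.close(target_fd)
        return 0
    finally:
        # Always restore the original limit and release the source fd.
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
        os.close(src)
```

With a soft limit of 64, probing descriptor 300 is expected to fail with `errno.EBADF`, which is why a fallback loop has no reason to go past `rlim_cur`.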

3 years ago
наб [Fri, 2 Apr 2021 19:37:53 +0000 (21:37 +0200)]
zed.8: the Diagnosis Engine is implemented

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834

3 years ago
наб [Fri, 2 Apr 2021 19:31:23 +0000 (21:31 +0200)]
zed: replace zed_file_write_n() with write(2), purge it

We set SA_RESTART early on, which will prevent EINTRs (indeed, to the
point of needing to clear it in the reaper, since it interferes with
pause(2)), which is the only error zed_file_write_n() actually handled
(plus, the pid write is no bigger than 12 bytes anyway)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834

3 years ago
наб [Fri, 2 Apr 2021 18:47:00 +0000 (20:47 +0200)]
zed: merge all _NOT_IMPLEMENTED_ events

These events should currently never be generated.

Also untag _zed_event_add_nvpair() from merge with
zpool_do_events_nvprint() ‒ they serve different purposes (machine,
usually script vs human consumption) and format the output differently
as it stands

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834

3 years ago
наб [Fri, 2 Apr 2021 15:32:51 +0000 (17:32 +0200)]
zed: remove unused zed_file_read_n()

Same deal as zed_file_close_on_exec()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834

3 years ago
наб [Fri, 2 Apr 2021 15:14:31 +0000 (17:14 +0200)]
zed: bump zfs_zevent_len_max if we miss any events

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834

3 years ago
наб [Fri, 2 Apr 2021 14:40:48 +0000 (16:40 +0200)]
zed.8: don't pretend an unprivileged user could change the script owner

And add a note on /why/ ZEDLETs need to be owned by root

Quoth chown(2), Linux man-pages project:
  Only a privileged process (Linux: one with the CAP_CHOWN capability)
  may change the owner of a file.

Quoth chown(2), FreeBSD:
     [EPERM]  The operation would change the ownership,
              but the effective user ID is not the super-user.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834

3 years ago
наб [Fri, 2 Apr 2021 13:57:23 +0000 (15:57 +0200)]
zed: purge all mentions of a configuration file

There simply isn't a need for one, since the flags the daemon takes
are all short (mostly just toggles) and administrative in nature,
and are therefore better served by the age-old tradition of sourcing an
environment file and preparing the cmdline in the init-specific handler
itself, if needed at all

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834

3 years ago
наб [Fri, 2 Apr 2021 13:10:34 +0000 (15:10 +0200)]
zed: implement close_from() in terms of /proc/self/fd, if available

/dev/fd on Darwin

Consider the following strace output:
  prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0

Yes, that is well over a million file descriptors!

This reduces the ZED start-up time from "at least a second" to
"instantaneous", and, under strace, from "don't even try" to "usable"
by simple virtue of doing five syscalls instead of over a million;
in most cases the main loop does nothing

Recent Linuxes (5.8+) have close_range(2) for this, but that's an
overoptimisation (and libcs don't have wrappers for it yet)

This is also run by the ZEDLET pre-exec. Compare:
  Finished "all-syslog.sh" eid=13 pid=6717 time=1.027100s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=13 pid=6718 time=1.046923s exit=0
to
  Finished "all-syslog.sh" eid=12 pid=4834 time=0.001836s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=12 pid=4835 time=0.001346s exit=0
lol

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
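
The approach of enumerating the descriptor directory rather than probing every possible number can be sketched as follows. This is a Python approximation, not the C implementation in zed; the function signature and the `fd_dir` parameter are assumptions (on Darwin the commit names /dev/fd as the directory).

```python
import os
import resource


def close_from(lowfd, fd_dir="/proc/self/fd"):
    """Close every open descriptor numbered lowfd or higher.

    Returns the list of descriptors that were actually closed.
    """
    try:
        # One directory read lists exactly the descriptors that exist.
        candidates = [int(name) for name in os.listdir(fd_dir)]
    except OSError:
        # Fallback: probe each number, but only up to the soft limit.
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        candidates = range(lowfd, soft_limit)
    closed = []
    for fd in candidates:
        if fd < lowfd:
            continue
        try:
            os.close(fd)
        except OSError:
            # Not open (for example, the fd used to read the directory).
            continue
        closed.append(fd)
    return closed
```

The work done is proportional to the number of descriptors actually open, not to the size of the descriptor table, which is where the start-up saving quoted above comes from.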

3 years ago
наб [Fri, 2 Apr 2021 12:10:31 +0000 (14:10 +0200)]
zed: print combined system/user time after ZEDLET death

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834

3 years ago
Marcin Skarbek [Wed, 7 Apr 2021 17:17:39 +0000 (19:17 +0200)]
Add kmodtool fix to detect different System.map location

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Marcin Skarbek <git@skarbek.name>
Closes #7807
Closes #11836

3 years ago
Olaf Faaland [Wed, 7 Apr 2021 17:10:34 +0000 (10:10 -0700)]
fix misplaced quotes in kmod-preamble

rpm/redhat/zfs-kmod.spec.in has a typo in the shell code that
creates the kmod-preamble file.  This typo results in the
preamble file having the wrong name,

./SOURCES/kmod-preamblenObsoletes

and missing the Obsoletes clause that has become part of the name.

Because the filename is incorrect, the built package does not have
"obsoletes" or "conflicts" set.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #11851

3 years ago
Brian Behlendorf [Wed, 7 Apr 2021 17:09:21 +0000 (10:09 -0700)]
Obsolete earlier packages due to version bump

In order for package managers such as dnf to upgrade cleanly after
the package SONAME bump the obsolete package names must be known.
Update the new packages to correctly obsolete the old ones.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11844
Closes #11847

3 years ago
наб [Sat, 3 Apr 2021 22:53:40 +0000 (00:53 +0200)]
i-t: don't brokenly set the scheduler for root pool vdev's disks

This effectively reverts
  4fc411f7a3ecee8a70fc8d6c687fae9a1cf20b31 (part of #6807) and
  f6fbe25664629d1ae6a3b186f14ec69dbe6c6232 (#9042) ‒
the code itself and latter PR cite symmetry with whole-disk-vdev
behaviour (presumably because rootfs vdevs are rarely whole disks),
but the code is broken for NVME devices (indeed, it'd strip the
controller number instead of the (potential) partition number, turning
"nvme0n1p1" into "nvmen1p1", which would then subsequently fail the
sysfs existence check); it could be fixed to handle those (and any
others) rather easily by dereferencing /sys/class/block/$devname,
but this isn't the place for setting this ‒ as noted in the commit that
removed setting the scheduler by default
(9e17e6f2541c69a7a5e0ed814a7f5e71cbf8b90a) ‒ use an udev rule

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11838

3 years ago
наб [Sat, 3 Apr 2021 16:18:39 +0000 (18:18 +0200)]
i-t: fix root=zfs:AUTO

IFS= would break loops in import_pool(), which would fault
any automatic import

Additionally $ZFS_BOOTFS from cmdline would interfere with find_rootfs()

If many pools were present, same thing could happen across multiple
find_rootfs() runs, so bail out early and clean up in error path

Suggested-by: @nachtgeist
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11278
Closes #11838

3 years ago
matt-fidd [Tue, 6 Apr 2021 23:05:54 +0000 (00:05 +0100)]
zfs get -p only outputs 3 columns if "clones" property is empty

get_clones_string currently returns an empty string for filesystem
snapshots which have no clones. This breaks parsable `zfs get` output as
only three columns are output, instead of 4.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Fiddaman <github@m.fiddaman.uk>
Co-authored-by: matt <matt@fiddaman.net>
Closes #11837

3 years ago
Matthew Ahrens [Tue, 6 Apr 2021 19:44:54 +0000 (12:44 -0700)]
kmem_alloc(KM_SLEEP) should use kvmalloc()

`kmem_alloc(size>PAGESIZE, KM_SLEEP)` is backed by `kmalloc()`, which
finds contiguous physical memory.  If there isn't enough contiguous
physical memory available (e.g. due to physical page fragmentation), the
OOM killer will be invoked to make more memory available.  This is not
ideal because processes may be killed when there is still plenty of free
memory (it just happens to be in individual pages, not contiguous runs
of pages).  We have observed this when allocating the ~13KB `zfs_cmd_t`,
for example in `zfsdev_ioctl()`.

This commit changes the behavior of
`kmem_alloc(size>PAGESIZE, KM_SLEEP)` when there are insufficient
contiguous free pages.  In this case we will find individual pages and
stitch them together using virtual memory.  This is accomplished by
using `kvmalloc()`, which implements the described behavior by trying
`kmalloc(__GFP_NORETRY)` and falling back on `vmalloc()`.

The behavior of `kmem_alloc(KM_NOSLEEP)` is not changed; it continues to
use `kmalloc(GFP_ATOMIC | __GFP_NORETRY)`.  This is because `vmalloc()`
may sleep.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11461

3 years ago
наб [Tue, 6 Apr 2021 19:39:54 +0000 (21:39 +0200)]
zpool-features.5: remove "booting not possible with this feature"s

The exact limitations on what features are supported when booting
vary considerably depending on the environment.  In order to minimize
confusion avoid categorical statements which assume GRUB2 is being
used.  The supported GRUB2 features are covered earlier in this man
page for easy reference.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11842

3 years ago
George Melikov [Tue, 6 Apr 2021 19:27:40 +0000 (22:27 +0300)]
man: fix wrong .Xr macros usages

In addition, html doc will have working hyperlinks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11845

3 years ago
наб [Tue, 6 Apr 2021 19:25:53 +0000 (21:25 +0200)]
libzutil: zfs_isnumber(): return false if input empty

zpool list, which is the only user, would mistakenly try to parse the
empty string as the interval in this case:

  $ zpool list "a"
  cannot open 'a': no such pool
  $ zpool list ""
  interval cannot be zero
  usage: <usage string follows>
which is now symmetric with zpool get:
  $ zpool list ""
  cannot open '': name must begin with a letter

Avoid breaking the "interval cannot be zero" string.
There simply isn't a need for this, and it's user-facing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11841
Closes #11843

3 years agoZTS: pool_checkpoint improvements
Brian Behlendorf [Sat, 3 Apr 2021 15:33:22 +0000 (08:33 -0700)]
ZTS: pool_checkpoint improvements

The pool_checkpoint tests may incorrectly fail because several of
them invoke zdb for an imported pool.  In this scenario it's not
unexpected for zdb to fail if the pool is modified.  To resolve
this these zdb checks are now done after the pool has been exported.

Additionally, the default cleanup functions assumed the pool would
be imported when they were run.  If this was not the case they'd
exit early and fail to clean up all of the test state, causing
subsequent tests to fail.  Add a check to only destroy the pool
when it is imported.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11832

3 years agoFix various typos
Andrea Gelmini [Sat, 3 Apr 2021 01:38:53 +0000 (18:38 -0700)]
Fix various typos

Correct an assortment of typos throughout the code base.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #11774

3 years agobash_completion.d: always call zfs/zpool binaries directly
наб [Fri, 2 Apr 2021 23:34:58 +0000 (01:34 +0200)]
bash_completion.d: always call zfs/zpool binaries directly

/dev/zfs is 0:0 666 on most systems, so the [ -w /dev/zfs ] check always
succeeds, but if zfs isn't in $PATH (e.g. when completing from
"/sbin/zfs list" on a regular account) this can lead to error spew like

  nabijaczleweli@szarotka:~$ /sbin/zfs list bash: zfs: command not found
  @ bash: zfs: command not found

We only do read-only commands, and quite general ones at that,
so there's no need to elevate one way or another.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11828

3 years agoAdd RELEASES.md file
Brian Behlendorf [Fri, 2 Apr 2021 23:33:40 +0000 (16:33 -0700)]
Add RELEASES.md file

Document the project's policy regarding publishing and maintaining
official OpenZFS releases.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11821

3 years agozed: allow limiting concurrent jobs
наб [Mon, 29 Mar 2021 13:21:54 +0000 (15:21 +0200)]
zed: allow limiting concurrent jobs

A 200ms time-out is relatively long, but if we already hit the cap,
then we'll likely be able to spawn multiple new jobs when we wake up.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807

3 years agozed: remove unused zed_file_close_on_exec()
наб [Sat, 27 Mar 2021 13:18:27 +0000 (14:18 +0100)]
zed: remove unused zed_file_close_on_exec()

The FIXME comment has been there since the initial implementation in
2014, and there are no users.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807

3 years agozed: use separate reaper thread and collect ZEDLETs asynchronously
наб [Fri, 26 Mar 2021 13:41:38 +0000 (14:41 +0100)]
zed: use separate reaper thread and collect ZEDLETs asynchronously

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807

3 years agozed: set names for all threads
наб [Fri, 26 Mar 2021 20:18:18 +0000 (21:18 +0100)]
zed: set names for all threads

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807

3 years agoZTS: inheritance/inherit_001_pos is flaky
Ryan Moeller [Fri, 2 Apr 2021 18:11:52 +0000 (14:11 -0400)]
ZTS: inheritance/inherit_001_pos is flaky

Add inheritance/inherit_001_pos to the maybe fails on FreeBSD list.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11830

3 years agoAvoid taking global lock to destroy zfsdev state
Ryan Moeller [Fri, 2 Apr 2021 18:09:05 +0000 (14:09 -0400)]
Avoid taking global lock to destroy zfsdev state

We have exclusive access to our zfsdev state object in this section
until it is invalidated by setting zs_minor to -1, so we can destroy
the state without taking a lock if we do the invalidation last, after
a memory barrier to ensure correct ordering.

While here, strengthen the assertions that zs_minor is valid when we
enter.
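
The ordering argument can be sketched with C11 atomics.  This is an
illustration under assumptions, not the kernel code: the struct, the
`_sketch` names, and the use of malloc()/free() for the members are
hypothetical, and a release store stands in for whatever barrier the
platform provides.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical state object; field names only mirror the message. */
typedef struct state_sketch {
	atomic_int zs_minor;	/* -1 means "slot is free for reuse" */
	void *zs_onexit;
	void *zs_zevent;
} state_sketch_t;

static void
state_destroy_sketch(state_sketch_t *zs)
{
	/* We are the sole owner while zs_minor is still valid. */
	free(zs->zs_onexit);
	zs->zs_onexit = NULL;
	free(zs->zs_zevent);
	zs->zs_zevent = NULL;

	/*
	 * Invalidate last.  The release store orders the teardown above
	 * before the slot becomes visible as free to other threads.
	 */
	atomic_store_explicit(&zs->zs_minor, -1, memory_order_release);
}
```

A thread that wants to reuse the slot would pair this with an acquire
load of zs_minor before touching the other members.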

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11751

3 years agoFreeBSD: Fix stable/12 after AT_BENEATH removal
Ryan Moeller [Fri, 2 Apr 2021 18:06:44 +0000 (14:06 -0400)]
FreeBSD: Fix stable/12 after AT_BENEATH removal

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11827

3 years agoBump libzfs.so and libzpool.so versions
Brian Behlendorf [Thu, 1 Apr 2021 23:53:05 +0000 (16:53 -0700)]
Bump libzfs.so and libzpool.so versions

Bump the library versions as advised by the libtool guidelines.

https://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html

Two new functions were added but no existing functions were changed,
so we increase the version and the age (version:revision:age).

Added functions (2):
- boolean_t zpool_is_draid_spare(const char *);
- zpool_compat_status_t zpool_load_compat(const char *,
      boolean_t *, char *, char *);

Additionally bump the libzpool.so version information.  This library
is for internal use but we still want to update the version to track
major changes to the interfaces.

The libzfsbootenv, libuutil, libnvpair and libzfs_core libraries
have not been updated.
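
For reference, the libtool update rules linked above can be written
out as a small function.  This is a sketch for illustration: the
`vinfo_sketch_t` type and the starting numbers in the usage note are
hypothetical, and libtool itself names the first field "current".

```c
#include <stdbool.h>

/* libtool -version-info triple: current:revision:age. */
typedef struct vinfo_sketch {
	unsigned current;
	unsigned revision;
	unsigned age;
} vinfo_sketch_t;

static vinfo_sketch_t
vinfo_bump_sketch(vinfo_sketch_t v, bool source_changed, bool added,
    bool removed_or_changed)
{
	/* Any source change since the last release bumps revision. */
	if (source_changed)
		v.revision++;
	/* Any interface change bumps current and resets revision. */
	if (added || removed_or_changed) {
		v.current++;
		v.revision = 0;
	}
	/* Additions keep old consumers working, so age grows. */
	if (added)
		v.age++;
	/* Removals or changes break old consumers, so age resets. */
	if (removed_or_changed)
		v.age = 0;
	return (v);
}
```

Starting from a made-up 4:0:0, adding functions without changing
existing ones yields 5:0:1, which is the "increase the version and the
age" case described in this commit.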

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11817

3 years agoAllow pool names that look like Solaris disk names
Ryan Moeller [Thu, 1 Apr 2021 15:49:41 +0000 (11:49 -0400)]
Allow pool names that look like Solaris disk names

Nothing bad happens if a prefix of your pool name matches a disk name.
This is a bit of a silly restriction at this point.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11781
Closes #11813

3 years agoDon't scale zfs_zevent_len_max by CPU count
Ryan Moeller [Wed, 31 Mar 2021 17:56:37 +0000 (13:56 -0400)]
Don't scale zfs_zevent_len_max by CPU count

The lower bound for this scaling is too low and the upper bound is too
high.  Use a fixed default length of 512 instead, which is a reasonable
value on any system.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11822

3 years agoAtomically check and set dropped zevent count
Ryan Moeller [Mon, 29 Mar 2021 19:44:27 +0000 (15:44 -0400)]
Atomically check and set dropped zevent count

ratelimit_dropped isn't protected by a lock and is expected to
be updated atomically.
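
One way to read "atomically check and set", sketched with C11 atomics
rather than the kernel's own primitives.  The counter and both
function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counter of events dropped by the rate limiter. */
static _Atomic uint64_t dropped_sketch;

/* Record one dropped event. */
static void
note_drop_sketch(void)
{
	atomic_fetch_add(&dropped_sketch, 1);
}

/*
 * Read the count and reset it to zero in one step, so a drop recorded
 * concurrently is either returned now or kept for the next caller,
 * never lost between a separate load and store.
 */
static uint64_t
take_dropped_sketch(void)
{
	return (atomic_exchange(&dropped_sketch, 0));
}
```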

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11822

3 years agoCI: Increase free space in workflow
Brian Behlendorf [Thu, 1 Apr 2021 15:39:27 +0000 (08:39 -0700)]
CI: Increase free space in workflow

Recently we've been running out of free space in the Ubuntu 20.04
environment resulting in test failures.  This appears to be caused
by a change in the default available free space and not because of
any change in OpenZFS. Try and avoid this failure by applying a
suggested workaround which removes some unnecessary files.

https://github.com/actions/virtual-environments/issues/2840

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11826

3 years agoFixing m4 iops rename check
Brian Atkinson [Thu, 1 Apr 2021 15:37:41 +0000 (09:37 -0600)]
Fixing m4 iops rename check

The configure check for iops->rename wanting flags was missing the
AC_MSG_CHECKING() so it would just print yes without saying what was
being checked.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11825

3 years agofsck.zfs: implement 4/8 exit codes as suggested in manpage
наб [Wed, 31 Mar 2021 17:49:56 +0000 (19:49 +0200)]
fsck.zfs: implement 4/8 exit codes as suggested in manpage

Update the fsck.zfs helper to bubble up some already-known-about
errors if they are detected in the pool.

health=degraded => 4/"Filesystem errors left uncorrected"
health=faulted && dataset in /etc/fstab => 8/"Operational error"
pool not found => 8/"Operational error"
everything else => 0
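
The mapping above, restated as a C function purely for illustration.
The enum and function names are hypothetical and say nothing about how
the helper itself is implemented.

```c
#include <stdbool.h>

/* Hypothetical pool states relevant to the mapping. */
typedef enum pool_health_sketch {
	POOL_ONLINE_SKETCH,
	POOL_DEGRADED_SKETCH,
	POOL_FAULTED_SKETCH,
	POOL_NOT_FOUND_SKETCH
} pool_health_sketch_t;

static int
fsck_exit_code_sketch(pool_health_sketch_t health, bool in_fstab)
{
	switch (health) {
	case POOL_DEGRADED_SKETCH:
		return (4);	/* filesystem errors left uncorrected */
	case POOL_FAULTED_SKETCH:
		/* Only an error if the dataset is in /etc/fstab. */
		return (in_fstab ? 8 : 0);
	case POOL_NOT_FOUND_SKETCH:
		return (8);	/* operational error */
	default:
		return (0);
	}
}
```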

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11806

3 years agoAdd compatibility file sets (ZoL 0.6.1, 0.6.4, OpenZFS 2.1)
Mike Swanson [Wed, 31 Mar 2021 16:40:25 +0000 (09:40 -0700)]
Add compatibility file sets (ZoL 0.6.1, 0.6.4, OpenZFS 2.1)

ZoL 0.6.1 introduced feature flags with the three features that all
implementations at the time were guaranteed to have.  0.6.4 introduced
a few more, and 0.6.5 added two after that.  OpenZFS 2.1 added the
dRAID feature.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mike Swanson <mikeonthecomputer@gmail.com>
Closes #11818

3 years agoUpdate META
Brian Behlendorf [Tue, 30 Mar 2021 17:32:29 +0000 (10:32 -0700)]
Update META

Increase the version to 2.1.99 to indicate the master branch is
newer than the 2.1.x release.  This ensures packages built from
master branch are considered to be newer than the last release.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
3 years agoTag 2.1.0-rc1
Brian Behlendorf [Mon, 29 Mar 2021 23:31:29 +0000 (16:31 -0700)]
Tag 2.1.0-rc1

New features:
- Distributed Spare (dRAID) Feature
- Added "compatibility" property for zpool feature sets
- Added zpool_influxdb command to collect zpool statistics

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
3 years agozed: reap child after killing on time-out
наб [Fri, 26 Mar 2021 21:21:00 +0000 (22:21 +0100)]
zed: reap child after killing on time-out

When a child process is killed waitpid() must be called on the
pid to reap the zombie process.

Update BUGS section to reflect reality by replacing "zedlets
aren't time limited" with "zedlets can be interrupted".
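
The kill-then-reap pattern, as a standalone POSIX sketch.  Both
function names are hypothetical and this is not the zed code; the
sleeping child merely stands in for a ZEDLET that ran past its
time-out.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that just blocks, standing in for a stuck ZEDLET. */
static pid_t
spawn_sleeper_sketch(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		for (;;)
			pause();
	}
	return (pid);	/* -1 on fork failure */
}

/*
 * Kill the child, then reap it.  Returns the terminating signal, or
 * -1 if the child could not be killed or reaped.
 */
static int
kill_and_reap_sketch(pid_t pid)
{
	int status;
	pid_t r;

	if (kill(pid, SIGKILL) != 0)
		return (-1);

	/* Without this waitpid() the killed child stays a zombie. */
	do {
		r = waitpid(pid, &status, 0);
	} while (r == -1 && errno == EINTR);

	if (r != pid)
		return (-1);
	return (WIFSIGNALED(status) ? WTERMSIG(status) : -1);
}
```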

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11769
Closes #11798

3 years agoUse a helper function to clarify gang block size
Matthew Ahrens [Fri, 26 Mar 2021 18:19:35 +0000 (11:19 -0700)]
Use a helper function to clarify gang block size

For gang blocks, `DVA_GET_ASIZE()` is the total space allocated for the
gang DVA including its children BP's.  The space allocated at each DVA's
vdev/offset is `vdev_psize_to_asize(vd, SPA_GANGBLOCKSIZE)`.

This commit makes this relationship more clear by using a helper
function, `vdev_gang_header_asize()`, for the space allocated at the
gang block's vdev/offset.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11744

3 years agoWhen specifying raidz vdev name, parity count should match
Matthew Ahrens [Fri, 26 Mar 2021 18:12:22 +0000 (11:12 -0700)]
When specifying raidz vdev name, parity count should match

When specifying the name of a RAIDZ vdev on the command line, it can be
specified as raidz-<vdevID> or raidzP-<vdevID>.
e.g. `zpool clear poolname raidz-0` or `zpool clear poolname raidz2-0`

If the parity is specified in the vdev name, it should match the actual
parity of that RAIDZ vdev, otherwise the command should fail.  This
commit makes it so.
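
A rough sketch of the rule, assuming names of the form raidz-<vdevID>
or raidzP-<vdevID> as described above.  The function is hypothetical
and does not reflect how libzfs actually parses vdev names.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Accepts "raidz-<id>" or "raidz<P>-<id>".  Returns true when the
 * name is well formed and any parity given equals actual_parity.
 */
static bool
raidz_name_matches_sketch(const char *name, unsigned long actual_parity)
{
	const char *p;
	char *end;

	if (strncmp(name, "raidz", 5) != 0)
		return (false);
	p = name + 5;

	if (*p != '-') {
		unsigned long parity;

		/* Parity was spelled out, so it must match. */
		if (*p < '0' || *p > '9')
			return (false);
		parity = strtoul(p, &end, 10);
		if (*end != '-' || parity != actual_parity)
			return (false);
		p = end;
	}

	/* p points at '-'; require a numeric vdev id after it. */
	p++;
	if (*p < '0' || *p > '9')
		return (false);
	(void) strtoul(p, &end, 10);
	return (*end == '\0');
}
```

For a RAIDZ2 vdev, "raidz-0" and "raidz2-0" are accepted while
"raidz1-0" is rejected.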

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Stuart Maybee <stuart.maybee@comcast.net>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11742

3 years agoFix error code on __zpl_ioctl_setflags()
Luis Henriques [Fri, 26 Mar 2021 17:46:45 +0000 (17:46 +0000)]
Fix error code on __zpl_ioctl_setflags()

Other (all?) Linux filesystems seem to return -EPERM instead of -EACCESS
when trying to set FS_APPEND_FL or FS_IMMUTABLE_FL without the
CAP_LINUX_IMMUTABLE capability.  This was detected by generic/545 test
in the fstest suite.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Luis Henriques <henrix@camandro.org>
Closes #11791

3 years agoSupport running FreeBSD buildworld on Arm-based macOS hosts
Jessica Clarke [Fri, 26 Mar 2021 17:45:12 +0000 (17:45 +0000)]
Support running FreeBSD buildworld on Arm-based macOS hosts

Arm-based Macs are like FreeBSD and provide a full 64-bit stat from the
start, so have no stat64 variants. Thus, define stat64 and fstat64 as
aliases for the normal versions.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
Closes #11771

3 years agoRemoved duplicated includes
Andrea Gelmini [Mon, 22 Mar 2021 19:34:58 +0000 (20:34 +0100)]
Removed duplicated includes

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #11775

3 years agoFix typo in Python method name
Andrea Gelmini [Mon, 22 Mar 2021 19:32:38 +0000 (20:32 +0100)]
Fix typo in Python method name

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #11776

3 years agoSplit dmu_zfetch() speculation and execution parts
Alexander Motin [Sat, 20 Mar 2021 05:56:11 +0000 (01:56 -0400)]
Split dmu_zfetch() speculation and execution parts

To make better predictions on parallel workloads dmu_zfetch() should
be called as early as possible to reduce possible request reordering.
In particular, it should be called before dmu_buf_hold_array_by_dnode()
calls dbuf_hold(), which may sleep waiting for indirect blocks, waking
up multiple threads at the same time on completion, which can
significantly reorder the requests and make the stream look random.  But we
should not issue prefetch requests before the on-demand ones, since
they may get to the disks first despite the I/O scheduler, increasing
on-demand request latency.

This patch splits dmu_zfetch() into two functions: dmu_zfetch_prepare()
and dmu_zfetch_run().  The first can be executed as early as needed.
It only updates statistics and makes predictions without issuing any
I/Os.  The I/O issuance is handled by dmu_zfetch_run(), which can be
called later when all on-demand I/Os are already issued.  It even
tracks the activity of other concurrent threads, issuing the prefetch
only when _all_ on-demand requests are issued.
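
The shape of the split can be shown with a toy model.  Everything here
is hypothetical and heavily simplified: one stream, a read-ahead equal
to the request size, and a counter in place of real prefetch I/O.  It
only illustrates that the prepare step predicts without issuing
anything and that the run step does the issuing later.

```c
#include <stdint.h>

/* Hypothetical, much-simplified stream state. */
typedef struct stream_sketch {
	uint64_t ss_next_blkid;	/* block expected next if sequential */
} stream_sketch_t;

/* What the prepare step decided; nothing has been issued yet. */
typedef struct plan_sketch {
	uint64_t ps_start;
	uint64_t ps_count;	/* 0 means "no prefetch" */
} plan_sketch_t;

static uint64_t issued_sketch;	/* counts simulated prefetch I/Os */

/* Speculation only: update the stream and predict, issue no I/O. */
static plan_sketch_t
fetch_prepare_sketch(stream_sketch_t *ss, uint64_t blkid, uint64_t nblks)
{
	plan_sketch_t plan = { 0, 0 };

	if (blkid == ss->ss_next_blkid) {
		/* Sequential hit: plan to read ahead by the same amount. */
		plan.ps_start = blkid + nblks;
		plan.ps_count = nblks;
	}
	ss->ss_next_blkid = blkid + nblks;
	return (plan);
}

/* Execution: meant to be called after on-demand reads are issued. */
static void
fetch_run_sketch(const plan_sketch_t *plan)
{
	issued_sketch += plan->ps_count;	/* stand-in for prefetch I/O */
}
```

A caller would invoke the prepare step before any call that may sleep,
issue its on-demand reads, and only then invoke the run step.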

For many years this was a big problem for storage servers handling
deeper request queues from their clients: they had to either serialize
sequential reads to make the ZFS prefetcher usable, or execute the
incoming requests as-is and get almost no prefetch from ZFS, relying
only on deep enough prefetch by the clients.  The benefits of each
approach varied, but neither was perfect.  With this patch, deeper-queue
sequential read benchmarks with CrystalDiskMark from Windows via
iSCSI to a FreeBSD target show me much better throughput with an almost
100% prefetcher hit rate, compared to almost zero before.

While there, I also removed per-stream zs_lock as useless, completely
covered by parent zf_lock.  Also I reused zs_blocks refcount to track
zf_stream linkage of the stream, since I believe previous zs_fetch ==
NULL check in dmu_zfetch_stream_done() was racy.

Delete prefetch streams when they reach ends of files.  It saves up
to 1KB of RAM per file, plus reduces searches through the stream list.

Skip data prefetch (speculation and indirect block prefetch are still
done, since they are cheaper) if all dbufs of the stream are already
in the DMU cache.  The first cache miss immediately fires all the
prefetch that would have been done for the stream by that time.  This
saves some CPU time if the same files within DMU cache capacity are
read over and over.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11652

3 years agoFix zfs_get_data access to files with wrong generation
Chunwei Chen [Sat, 20 Mar 2021 05:53:31 +0000 (22:53 -0700)]
Fix zfs_get_data access to files with wrong generation

If a TX_WRITE is created on a file, and the file is later deleted and a
new directory is created on the same object id, it is possible that when
zil_commit happens, zfs_get_data will be called on the new directory.
This may result in a panic as it tries to take a range lock.

This patch fixes this issue by recording the generation number during
zfs_log_write, so zfs_get_data can check if the object is valid.
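
The idea in miniature, with hypothetical types and names: the log
record remembers the generation of the object it was created for, and
the later lookup refuses to proceed when the object id now belongs to
a different generation.  Returning ENOENT here is an assumption of the
sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical object: an id can be reused, the generation cannot. */
typedef struct object_sketch {
	uint64_t os_id;
	uint64_t os_gen;
} object_sketch_t;

/* Hypothetical log record for a write. */
typedef struct write_record_sketch {
	uint64_t wr_id;
	uint64_t wr_gen;
} write_record_sketch_t;

/* At log time, remember which generation the write belongs to. */
static void
log_write_sketch(write_record_sketch_t *wr, const object_sketch_t *obj)
{
	wr->wr_id = obj->os_id;
	wr->wr_gen = obj->os_gen;
}

/* At commit time, reject an object that merely reuses the id. */
static int
get_data_sketch(const write_record_sketch_t *wr,
    const object_sketch_t *current)
{
	if (current->os_id != wr->wr_id || current->os_gen != wr->wr_gen)
		return (ENOENT);
	return (0);
}
```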

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #10593
Closes #11682

3 years agoFix regression in POSIX mode behavior
Andrew [Sat, 20 Mar 2021 05:50:46 +0000 (01:50 -0400)]
Fix regression in POSIX mode behavior

Commit 235a85657 introduced a regression in evaluation of POSIX modes
that require group DENY entries in the internal ZFS ACL. An example
of such a POSIX mode is 007. When write_implies_delete_child is set,
then ACE_WRITE_DATA is added to `wanted_dirperms` prior to calling
zfs_zaccess_common(). This occurs in zfs_zaccess_delete().

Unfortunately, when zfs_zaccess_aces_check hits this particular DENY
ACE, zfs_groupmember() is checked to determine whether access should be
denied.  Since zfs_groupmember() always returns B_TRUE on Linux, this
check fails, resulting ultimately in EPERM being returned.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Closes #11760

3 years agoZTS: New test for kernel panic induced by redacted send
Palash Gandhi [Sat, 20 Mar 2021 05:47:50 +0000 (22:47 -0700)]
ZTS: New test for kernel panic induced by redacted send

This change adds a new test that covers a bug fix in the binary search
in the redacted send resume logic that causes a kernel panic.
The bug was fixed in https://github.com/openzfs/zfs/pull/11297.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Palash Gandhi <palash.gandhi@delphix.com>
Closes #11764

3 years agoAllow setting bootfs property on pools with indirect vdevs
Martin Matuška [Sat, 20 Mar 2021 05:46:43 +0000 (06:46 +0100)]
Allow setting bootfs property on pools with indirect vdevs

The FreeBSD boot loader relies on the bootfs property and is capable
of booting from removed (indirect) vdevs.

Reviewed-by: Eric van Gyzen
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11763

3 years agoFix typo in zgenhostid.8
Ryan Moeller [Sat, 20 Mar 2021 05:39:42 +0000 (01:39 -0400)]
Fix typo in zgenhostid.8

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11770

3 years agoRemoving old code for k(un)map_atomic
Brian Atkinson [Sat, 20 Mar 2021 05:38:44 +0000 (23:38 -0600)]
Removing old code for k(un)map_atomic

It used to be required to pass an enum km_type to kmap_atomic() and
kunmap_atomic(), however this is no longer necessary and the wrappers
zfs_k(un)map_atomic removed these. This is confusing in the ABD code as
the struct abd_iter member iter_km no longer exists and the wrapper
macros simply compile them out.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11768

3 years agoInitialize metaslab range trees in metaslab_init
Serapheim Dimitropoulos [Sat, 20 Mar 2021 05:36:02 +0000 (22:36 -0700)]
Initialize metaslab range trees in metaslab_init

= Motivation

We've noticed several zloop crashes within Delphix generated
due to the following sequence of events:

- A device gets expanded and new metaslabs are allocated for
  it. These metaslabs go through `metaslab_init()` but haven't
  gone through `metaslab_sync_done()` yet. This means that the
  only range tree that's actually set is the `ms_allocatable`.
  All the others are NULL.

- A vdev_initialization is issued and `vdev_initialize_thread`
  starts processing one of these new metaslabs of the expanded
  vdev.

- As part of `vdev_initialize_calculate_progress()` we call
  into `metaslab_load()` and `metaslab_load_impl()` which
  in turn tries to dereference the metaslab's trees that
  are still NULL and therefore we crash.

The same failure can come up from the `vdev_trim` code paths.

= This Patch

We considered the following solutions to deal with this issue:

[A] Add logic to `vdev_initialize/trim` to skip those new
    metaslabs. We decided against this as it would be good
    to avoid exposing this lower-level detail to higher-level
    operations.

[B] Have `metaslab_load_impl()` return early for new metaslabs
    and thus never touch those range_trees that are NULL at
    that time. This seemed more of a work-around for the bug
    and not a clear-cut solution.

[C] Refactor our logic so all metaslabs have their range_trees
    created at the time of their creation in `metaslab_init()`.

In this patch we decided to go with [C] because:

(1) It doesn't expose more metaslab details to higher level
    operations such as vdev initialize and trim.

(2) The current behavior of creating the range trees lazily
    in `metaslab_sync_done()` is unnecessarily complicated.

(3) Always initializing the metaslab range_trees makes other
    parts of the codebase cleaner. For example, we used to
    use `ms_freed` as the reference value for knowing whether
    all the range_trees have been initialized. Now we no
    longer need to do that check in most places (and in the
    few that we do we use the `ms_new` boolean field now
    which is more readable).
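
Option [C] in miniature.  The types below are hypothetical and carry
only a small subset of fields, with plain calloc() in place of the
real range tree constructors; the point is only that every tree exists
as soon as the init function returns.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Placeholder type; the real range tree is far richer. */
typedef struct range_tree_sketch {
	int rt_placeholder;
} range_tree_sketch_t;

/* Hypothetical subset of the metaslab fields named in the message. */
typedef struct metaslab_sketch {
	range_tree_sketch_t *ms_allocatable;
	range_tree_sketch_t *ms_freeing;
	range_tree_sketch_t *ms_freed;
	bool ms_new;	/* assumed meaning: not yet through a sync pass */
} metaslab_sketch_t;

static void
metaslab_fini_sketch(metaslab_sketch_t *ms)
{
	if (ms == NULL)
		return;
	free(ms->ms_allocatable);
	free(ms->ms_freeing);
	free(ms->ms_freed);
	free(ms);
}

/* Create every tree up front so later code never sees a NULL tree. */
static metaslab_sketch_t *
metaslab_init_sketch(void)
{
	metaslab_sketch_t *ms = calloc(1, sizeof (*ms));

	if (ms == NULL)
		return (NULL);
	ms->ms_allocatable = calloc(1, sizeof (range_tree_sketch_t));
	ms->ms_freeing = calloc(1, sizeof (range_tree_sketch_t));
	ms->ms_freed = calloc(1, sizeof (range_tree_sketch_t));
	if (ms->ms_allocatable == NULL || ms->ms_freeing == NULL ||
	    ms->ms_freed == NULL) {
		metaslab_fini_sketch(ms);
		return (NULL);
	}
	ms->ms_new = true;
	return (ms);
}
```

Callers that need to know whether a metaslab is brand new test the
boolean instead of checking one of the tree pointers for NULL.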

= Side Changes

Probably due to a mismerge we set `ms_loaded` to `B_TRUE` twice
in `metaslab_load_impl()`. In this patch we remove the extraneous
assignment.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #11737

3 years agoLinux 5.12 update: bio_max_segs() replaces BIO_MAX_PAGES
Coleman Kane [Sat, 20 Mar 2021 05:33:42 +0000 (01:33 -0400)]
Linux 5.12 update: bio_max_segs() replaces BIO_MAX_PAGES

The BIO_MAX_PAGES macro is being retired in favor of a bio_max_segs()
function that implements the typical MIN(x,y) logic used throughout the
kernel for bounding the allocation, and also the new implementation is
intended to be signed-safe (which the former was not).
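
The bounding logic amounts to a clamp on an unsigned count.  A
userspace sketch follows; the cap of 256 and both names are
assumptions for illustration, not values taken from this commit.

```c
/* Assumed upper bound on segments per bio; illustration only. */
#define	BIO_MAX_SEGS_CAP_SKETCH	256U

/*
 * Clamp a requested segment count to the cap.  Taking and returning
 * unsigned values sidesteps the sign problems a bare macro
 * comparison against a signed argument can run into.
 */
static unsigned int
bio_max_segs_sketch(unsigned int nr_segs)
{
	return (nr_segs < BIO_MAX_SEGS_CAP_SKETCH ?
	    nr_segs : BIO_MAX_SEGS_CAP_SKETCH);
}
```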

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11765

3 years agoLinux 5.12 compat: idmapped mounts
Coleman Kane [Sat, 20 Mar 2021 04:00:59 +0000 (00:00 -0400)]
Linux 5.12 compat: idmapped mounts

In Linux 5.12, the filesystem API was modified to support idmapped
mounts by adding a "struct user_namespace *" parameter to a number of
functions and VFS handlers. This change adds the needed autoconf
macros to detect the new interfaces and updates the code appropriately.
This change does not add support for idmapped mounts, instead it
preserves the existing behavior by passing the initial user namespace
where needed.  A subsequent commit will be required to add support
for idmapped mounts.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11712

3 years agoClean up RAIDZ/DRAID ereport code
Matthew Ahrens [Fri, 19 Mar 2021 23:22:10 +0000 (16:22 -0700)]
Clean up RAIDZ/DRAID ereport code

The RAIDZ and DRAID code is responsible for reporting checksum errors on
their child vdevs.  Checksum errors represent events where a disk
returned data or parity that should have been correct, but was not.  In
other words, these are instances of silent data corruption.  The
checksum errors show up in the vdev stats (and thus `zpool status`'s
CKSUM column), and in the event log (`zpool events`).

Note, this is in contrast with the more common "noisy" errors where a
disk goes offline, in which case ZFS knows that the disk is bad and
doesn't try to read it, or the device returns an error on the requested
read or write operation.

RAIDZ/DRAID generate checksum errors via three code paths:

1. When RAIDZ/DRAID reconstructs a damaged block, checksum errors are
reported on any children whose data was not used during the
reconstruction.  This is handled in `raidz_reconstruct()`.  This is the
most common type of RAIDZ/DRAID checksum error.

2. When RAIDZ/DRAID is not able to reconstruct a damaged block, that
means that the data has been lost.  The zio fails and an error is
returned to the consumer (e.g. the read(2) system call).  This would
happen if, for example, three different disks in a RAIDZ2 group are
silently damaged.  Since the damage is silent, it isn't possible to know
which three disks are damaged, so a checksum error is reported against
every child that returned data or parity for this read.  (For DRAID,
typically only one "group" of children is involved in each io.)  This
case is handled in `vdev_raidz_cksum_finish()`. This is the next most
common type of RAIDZ/DRAID checksum error.

3. If RAIDZ/DRAID is not able to reconstruct a damaged block (like in
case 2), but there happen to be additional copies of this block due to
"ditto blocks" (i.e. multiple DVA's in this blkptr_t), and one of those
copies is good, then RAIDZ/DRAID compares each sector of the data or
parity that it retrieved with the good data from the other DVA, and if
they differ then it reports a checksum error on this child.  This
differs from case 2 in that the checksum error is reported on only the
subset of children that actually have bad data or parity.  This case
happens very rarely, since normally only metadata has ditto blocks.  If
the silent damage is extensive, there will be many instances of case 2,
and the pool will likely be unrecoverable.

The code for handling case 3 is considerably more complicated than the
other cases, for two reasons:

1. It needs to run after the main raidz read logic has completed.  The
data RAIDZ read needs to be preserved until after the alternate DVA has
been read, which necessitates refcounts and callbacks managed by the
non-raidz-specific zio layer.

2. It's nontrivial to map the sections of data read by RAIDZ to the
correct data.  For example, the correct data does not include the parity
information, so the parity must be recalculated based on the correct
data, and then compared to the parity that was read from the RAIDZ
children.

Due to the complexity of case 3, the rareness of hitting it, and the
minimal benefit it provides above case 2, this commit removes the code
for case 3.  These types of errors will now be handled the same as case
2, i.e. the checksum error will be reported against all children that
returned data or parity.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11735

3 years agoFreeBSD: make seqc asserts conditional on replay
Mateusz Guzik [Thu, 18 Mar 2021 05:09:45 +0000 (06:09 +0100)]
FreeBSD: make seqc asserts conditional on replay

Avoids tripping on asserts when doing pool recovery.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11739

3 years agoRemove unused rr_code
Matthew Ahrens [Thu, 18 Mar 2021 04:57:09 +0000 (21:57 -0700)]
Remove unused rr_code

The `rr_code` field in `raidz_row_t` is unused.

This commit removes the field, as well as the code that's used to set
it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11736

3 years agoFreeBSD: Fix memory leaks in kstats
Ryan Moeller [Thu, 18 Mar 2021 04:55:18 +0000 (00:55 -0400)]
FreeBSD: Fix memory leaks in kstats

Don't (incorrectly) handle kmem_zalloc() failure.  With KM_SLEEP, it
will never return NULL.

Free the data allocated for non-virtual kstats when deleting the object.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11767

3 years agoLinux: always check or verify return of igrab()
Adam D. Moss [Tue, 16 Mar 2021 23:33:34 +0000 (16:33 -0700)]
Linux: always check or verify return of igrab()

zhold() wraps igrab() on Linux, and igrab() may fail when the inode
is in the process of being deleted.  This means zhold() must only be
called when a reference exists and therefore it cannot be deleted.
This is the case for all existing consumers so add a VERIFY and a
comment explaining this requirement.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11704