]> git.proxmox.com Git - mirror_zfs.git/log
mirror_zfs.git
5 years agoAdvise users to retain issue/PR templates
bunder2015 [Wed, 17 Oct 2018 17:25:38 +0000 (13:25 -0400)]
Advise users to retain issue/PR templates

Occasionally we get issues and PRs from users who delete the
templates.  Advise users that their issues and PRs may be closed if
they do not fill out the templates as we really need this information.

Also updating PR template to drop unneeded approval toggle as we are
now using issue labels for status tracking.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #8029

5 years agoAdd types to featureflags in zfs
Paul Dagnelie [Tue, 16 Oct 2018 18:15:04 +0000 (11:15 -0700)]
Add types to featureflags in zfs

The boolean featureflags in use thus far in ZFS are extremely useful,
but because they take advantage of the zap layer, more interesting data
than just a true/false value can be stored in a featureflag. In redacted
send/receive, this is used to store the list of redaction snapshots for
a redacted dataset.

This change adds the ability for ZFS to store types other than a boolean
in a featureflag. The only other implemented type is a uint64_t array.
It also modifies the interfaces around dataset features to accomodate
the new capabilities, and adds a few new functions to increase
encapsulation.

This functionality will be used by the Redacted Send/Receive feature.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7981

5 years agodeadlock between mm_sem and tx assign in zfs_write() and page fault
ilbsmart [Tue, 16 Oct 2018 18:11:24 +0000 (02:11 +0800)]
deadlock between mm_sem and tx assign in zfs_write() and page fault

The bug time sequence:
1. thread #1, `zfs_write` assign a txg "n".
2. In a same process, thread #2, mmap page fault (which means the
   `mm_sem` is hold) occurred, `zfs_dirty_inode` open a txg failed,
   and wait previous txg "n" completed.
3. thread #1 call `uiomove` to write, however page fault is occurred
   in `uiomove`, which means it need `mm_sem`, but `mm_sem` is hold by
   thread #2, so it stuck and can't complete,  then txg "n" will
   not complete.

So thread #1 and thread #2 are deadlocked.

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Grady Wong <grady.w@xtaotech.com>
Closes #7939

5 years agoAdd zts-report.py to python shebang exclusion
Brian Behlendorf [Mon, 15 Oct 2018 23:59:28 +0000 (16:59 -0700)]
Add zts-report.py to python shebang exclusion

Include zts-report.py is the __brp_mangle_shebangs_exclude_from
to resolve build failures in Fedora 28 and newer.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8020
Issue #7360

5 years agoOpenZFS 9847 - leaking dd_clones (DMU_OT_DSL_CLONES) objects (#7979)
Matthew Ahrens [Fri, 12 Oct 2018 18:28:26 +0000 (11:28 -0700)]
OpenZFS 9847 - leaking dd_clones (DMU_OT_DSL_CLONES) objects (#7979)

OpenZFS 9847 - leaking dd_clones (DMU_OT_DSL_CLONES) objects

We're leaking the dd_clones objects in dsl_dir_destroy_sync.  This bug
appears to have been around forever.  Thankfully the amount of space
typically involved is tiny.

In addition this adds a mechanism in ZDB to find objects in the MOS
which are leaked (not referenced anywhere).

Porting notes:
* Added dd_crypto_obj to ZDB MOS object leak tracking

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Matthew Ahrens <mahrens@delphix.com>
OpenZFS-issue: https://illumos.org/issues/9847
Closes #7979

5 years agoImproved error handling for extreme rewinds
Brian Behlendorf [Tue, 9 Oct 2018 22:42:42 +0000 (15:42 -0700)]
Improved error handling for extreme rewinds

The vdev_checkpoint_sm_object(), vdev_obsolete_sm_object(), and
vdev_obsolete_counts_are_precise() functions assume that the
only way a zap_lookup() can fail is if the requested entry is
missing.  While this is the most common cause, it's not the only
cause.  Attemping to access a damaged ZAP will result in other
errors.

The most likely scenario for accessing a damaged ZAP is during
an extreme rewind pool import.  Under these conditions the pool
is expected to contain damaged objects and the import code was
updated to handle this gracefully.  Getting an ECKSUM error from
these ZAPs after the pool in import a far less likely, therefore
the behavior for call paths was not modified.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7809
Closes #7921

5 years agoRevert "Allow ECKSUM in vdev_checkpoint_sm_object()"
Brian Behlendorf [Tue, 9 Oct 2018 22:21:27 +0000 (15:21 -0700)]
Revert "Allow ECKSUM in vdev_checkpoint_sm_object()"

This reverts commit e927fc8a522e1c0db89955cc555841aa23bbd634.

Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7921

5 years agoDefine timestruc_t for Lustre compatibility
Tony Hutter [Fri, 12 Oct 2018 18:13:34 +0000 (11:13 -0700)]
Define timestruc_t for Lustre compatibility

Lustre 2.8 (and possibly other versions) are still using timestruc_t,
which was removed in spl-0.7.10 in favor of inode_timespec_t.  Add
in a backwards compatibility #define for timestruc_t so that Lustre
builds.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #8014

5 years agoOpenZFS 9689 - zfs range lock code should not be zpl-specific
Matt Ahrens [Mon, 1 Oct 2018 22:13:12 +0000 (15:13 -0700)]
OpenZFS 9689 - zfs range lock code should not be zpl-specific

The ZFS range locking code in zfs_rlock.c/h depends on ZPL-specific
data structures, specifically znode_t.  However, it's also used by
the ZVOL code, which uses a "dummy" znode_t to pass to the range
locking code.

We should clean this up so that the range locking code is generic
and can be used equally by ZPL and ZVOL, and also can be used by
future consumers that may need to run in userland (libzpool) as
well as the kernel.

Porting notes:
* Added missing sys/avl.h include to sys/zfs_rlock.h.
* Removed 'dbuf is within the locked range' ASSERTs from dmu_sync().
  This was needed because ztest does not yet use a locked_range_t.
* Removed "Approved by:" tag requirement from OpenZFS commit
  check to prevent needless warnings when integrating changes
  which has not been merged to illumos.
* Reverted free_list range lock changes which were originally
  needed to defer the cv_destroy() which was called immediately
  after cv_broadcast().  With d2733258 this should be safe but
  if not we may need to reintroduce this logic.
* Reverts: The following two commits were reverted and squashed in
  to this change in order to make it easier to apply OpenZFS 9689.
  - d88895a0, which removed the dummy znode from zvol_state
  - e3a07cd0, which updated ztest to use range locks
* Preserved optimized rangelock comparison function.  Preserved the
  rangelock free list.  The cv_destroy() function will block waiting
  for all processes in cv_wait() to be scheduled and drop their
  reference.  This is done to ensure it's safe to free the condition
  variable.  However, blocking while holding the rl->rl_lock mutex
  can result in a deadlock on Linux.  A free list is introduced to
  defer the cv_destroy() and kmem_free() until after the mutex is
  released.

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/9689
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/680
External-issue: DLPX-58662
Closes #7980

5 years agoFix changelist mounted-dataset iteration
Alek P [Thu, 11 Oct 2018 04:13:13 +0000 (00:13 -0400)]
Fix changelist mounted-dataset iteration

Commit 0c6d093 caused a regression in the inherit codepath.
The fix is to restrict the changelist iteration on mountpoints and
add proper handling for 'legacy' mountpoints

Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes #7988
Closes #7991

5 years agoCheck scheduler for "noop" before setting "noop"
Garrett Fields [Wed, 10 Oct 2018 15:46:22 +0000 (11:46 -0400)]
Check scheduler for "noop" before setting "noop"

Originally code only checked for presence of "/sys/block/$i/queue/
scheduler".  "sh: write error: Invalid argument" was produced when
trying to set "noop" on certain devices (eg. virtio) when it isn't
a listed option. This modification continues to check for the presence
of "/sys/block/$i/queue/scheduler" and also checks that it contains
"noop" as an option before setting "noop".

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Garrett Fields <ghfields@gmail.com>
Closes #8004

5 years agoPrint "(repairing)" in zpool status again
Tony Hutter [Wed, 10 Oct 2018 03:30:32 +0000 (20:30 -0700)]
Print "(repairing)" in zpool status again

Historically, zpool status prints "(repairing)" for any drives that
have errors during a scrub:

        NAME            STATE     READ WRITE CKSUM
        mypool          ONLINE       0     0     0
          mirror-0      ONLINE       0     0     0
            /tmp/file1  ONLINE      13     0     0  (repairing)
            /tmp/file2  ONLINE       0     0     0
            /tmp/file3  ONLINE       0     0     0

This was accidentally broken in "OpenZFS 9166 - zfs storage pool
checkpoint" (d2734cc).  This patch adds it back in.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #7779
Closes #7978

5 years ago Refactor dmu_recv into its own file
Paul Dagnelie [Tue, 9 Oct 2018 21:05:13 +0000 (14:05 -0700)]
 Refactor dmu_recv into its own file

This change moves the bottom half of dmu_send.c (where the receive
logic is kept) into a new file, dmu_recv.c, and does similarly
for receive-related changes in header files.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7982

5 years agoFix arc_release() refcount
Brian Behlendorf [Mon, 8 Oct 2018 21:59:34 +0000 (14:59 -0700)]
Fix arc_release() refcount

Update arc_release to use arc_buf_size().  This hunk was accidentally
dropped when porting compressed send/recv, 2aa34383b.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8000

5 years agoAdd zfs_refcount_transfer_ownership_many()
Brian Behlendorf [Mon, 8 Oct 2018 21:58:21 +0000 (14:58 -0700)]
Add zfs_refcount_transfer_ownership_many()

When debugging is enabled and a zfs_refcount_t contains multiple holders
using the same key, but different ref_counts, the wrong reference_t may
be transferred.  Add a zfs_refcount_transfer_ownership_many() function,
like the existing zfs_refcount_*_many() functions, to match and transfer
the correct refcount_t;

This issue may occur when using encryption with refcount debugging
enabled.  An arc_buf_hdr_t can have references for both the
hdr->b_l1hdr.b_pabd and hdr->b_crypt_hdr.b_rabd both of which use
the hdr as the reference holder.  When unsharing the buffer the
p_abd should be transferred.

This issue does not impact production builds because refcount holders
are not tracked.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7219
Closes #8000

5 years agoCreate /proc/sys/kernel/spl/gitrev with git hash
Matthew Ahrens [Tue, 9 Oct 2018 04:57:02 +0000 (21:57 -0700)]
Create /proc/sys/kernel/spl/gitrev with git hash

The existing mechanisms for determining what code is running in the
kernel do not always correctly report the git hash.  The versions
reported there do not reflect changes made since `configure` was run
(i.e. incremental builds do not update the version) and they are
misleading if git tags are not set up properly.  This applies to
`modinfo zfs`, `dmesg`, and `/sys/module/zfs/version`.

There are complicated requirements on how the existing version is
generated.  Therefore we are leaving that alone, and adding a new
mechanism to record and retrieve the git hash:
`cat /proc/sys/kernel/spl/gitrev`

The gitrev is re-generated at compile time, when running `make`
(including for incremental builds).  The value is the output of `git
describe` (or "unknown" if not in a git repo or there are uncommitted
changes).

We're also removing /proc/sys/kernel/spl/version, which was never very
useful.

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Tim Chase <tim@chase2k.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #7931
Closes #7965

5 years agoOpenZFS 9617 - too-frequent TXG sync causes excessive write inflation
Matthew Ahrens [Tue, 12 Dec 2017 23:46:58 +0000 (15:46 -0800)]
OpenZFS 9617 - too-frequent TXG sync causes excessive write inflation

Porting notes:
* Renamed zfs_dirty_data_sync_pct to zfs_dirty_data_sync_percent and
  changed the type to be consistent with the other dirty module params.
* Updated zfs-module-parameters.5 accordingly.

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/9617
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7928f4ba
Closes #7976

5 years agoWarn if checking programs are not installed
Matthew Ahrens [Thu, 4 Oct 2018 20:11:45 +0000 (13:11 -0700)]
Warn if checking programs are not installed

`make checkstyle` silently skips checks if the required programs are not
installed (e.g. shellcheck, mandoc).  Therefore developers may not
realize that they are not getting the full suite of code checks.  This
also applies to more specific targets like `make shellcheck`.

We should print a warning message when a check is skipped due to missing
tools.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #7984

5 years agoAdd codecheck make target
Matthew Ahrens [Thu, 4 Oct 2018 20:10:10 +0000 (13:10 -0700)]
Add codecheck make target

We'd like to have tooling that verifies code style, while ignoring the
commit message.  For example, code does not need to be signed off in
order to be tested.  Current workarounds are to run `git checkstyle` and
ignore the commit message errors, or to run `make cstyle shellcheck
flake8 mancheck testscheck`, and make sure that list stays updated.

Solution is to add a new make target, `codecheck` which does all the
code checks.  `checkstyle` is now simply `codecheck` + `commitcheck`.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #7985

5 years agoFix ASSERT macros to not over-expand
Paul Dagnelie [Thu, 4 Oct 2018 03:16:45 +0000 (20:16 -0700)]
Fix ASSERT macros to not over-expand

The code reuse in the definitions of the ASSERT and VERIFY macros result
in expansion of their arguments before they are stringified, which
produces ugly and undesirable output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7884

5 years agoAdd new fnvlist_lookup_* functions
Paul Dagnelie [Wed, 3 Oct 2018 22:30:55 +0000 (15:30 -0700)]
Add new fnvlist_lookup_* functions

Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7977

5 years agoVerify 'zfs destroy' will unshare the dataset
Prakash Surya [Fri, 21 Sep 2018 17:54:49 +0000 (10:54 -0700)]
Verify 'zfs destroy' will unshare the dataset

This change adds a new test case to the zfs-test suite to verify that
when 'zfs destroy' is used on a shared dataset, the dataset will be
unshared after the destroy operation completes.

Reviewed by: loli10K <ezomori.nozomu@gmail.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #7941

5 years agoFix "zfs destroy" when "sharenfs=on" is used
Prakash Surya [Fri, 21 Sep 2018 15:47:42 +0000 (08:47 -0700)]
Fix "zfs destroy" when "sharenfs=on" is used

When using "zfs destroy" on a dataset that is using "sharenfs=on" and
has been automatically exported (by libzfs), the dataset will not be
automatically unexported as it should be. This workflow appears to have
been broken by this commit: 3fd3e56cfd543d7d7a1bf502bfc0db6e24139668

In that change, the "zfs_unmount" function was modified to use the
"mnt.mnt_special" field when determining the mount point that is being
unmounted, rather than "mnt.mnt_mountp".

As a result, when "mntpt" is passed into "zfs_unshare_proto", it's value
is now the dataset name rather than the mountpoint. Thus, when this
value is used with the "is_shared" function (via "zfs_unshare_proto") it
will not find a match (since that function assumes it'll be passed the
mountpoint) and incorrectly reports that the dataset is not shared.

This can be easily reproduced with the following commands:

    $ sudo zpool create tank xvdb
    $ sudo zfs create -o sharenfs=on tank/fish
    $ sudo zfs destroy tank/fish

    $ sudo zfs list -r tank
    NAME   USED  AVAIL  REFER  MOUNTPOINT
    tank  97.5K  7.27G    24K  /tank

    $ sudo exportfs
    /tank/fish      <world>
    $ sudo cat /etc/dfs/sharetab
    /tank/fish      -       nfs     rw,crossmnt

At this point, the "tank/fish" filesystem doesn't exist, but it's still
listed as exported when looking at "exportfs" and "/etc/dfs/sharetab".

Also note, this change brings us back in-sync with the illumos code, as
it pertains to this one line; on illumos, "mnt.mnt_mountp" is used.

Reviewed by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Issue #6143
Closes #7941

5 years agoOpenZFS 9677 - panic from zio_write_gang_block()
Brad Lewis [Thu, 15 Mar 2018 19:47:26 +0000 (13:47 -0600)]
OpenZFS 9677 - panic from zio_write_gang_block()

Panic from zio_write_gang_block() when creating dump device
on fragmented rpool.

Authored by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/9677
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7341a7d
Closes #7975

5 years agoOpenZFS 9616 - Bogus error when attempting to set property on read-only pool
Andrew Stormont [Sun, 17 Jun 2018 10:53:29 +0000 (11:53 +0100)]
OpenZFS 9616 - Bogus error when attempting to set property on read-only pool

Authored by: Andrew Stormont <astormont@racktopsystems.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/9616
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f62db44d
Closes #7974

5 years agoRefcounted DSL Crypto Key Mappings
Tom Caputi [Wed, 3 Oct 2018 16:47:11 +0000 (12:47 -0400)]
Refcounted DSL Crypto Key Mappings

Since native ZFS encryption was merged, we have been fighting
against a series of bugs that come down to the same problem: Key
mappings (which must be present during all I/O operations) are
created and destroyed based on dataset ownership, but I/Os can
have traditionally been allowed to "leak" into the next txg after
the dataset is disowned.

In the past we have attempted to solve this problem by trying to
ensure that datasets are disowned ater all I/O is finished by
calling txg_wait_synced(), but we have repeatedly found edge cases
that need to be squashed and code paths that might incur a high
number of txg syncs. This patch attempts to resolve this issue
differently, by adding a reference to the key mapping for each txg
it is dirtied in. By doing so, we can remove many of the
unnecessary calls to txg_wait_synced() we have added in the past
and ensure we don't need to deal with this problem in the future.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #7949

5 years agoOpenZFS 9700 - ZFS resilvered mirror does not balance reads
Jerry Jelinek [Thu, 30 Aug 2018 13:13:30 +0000 (13:13 +0000)]
OpenZFS 9700 - ZFS resilvered mirror does not balance reads

Authored by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/9700
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/82f63c3c
Closes #7973

5 years agoOpenZFS 9763 - zfs(1M): broken formatting in allow/unallow description
Yuri Pankov [Fri, 24 Aug 2018 00:36:04 +0000 (03:36 +0300)]
OpenZFS 9763 -  zfs(1M): broken formatting in allow/unallow description

Porting notes:
* Two of the three changes from the upstream patch were already
  applied for Linux.  Only the last one is required.

Authored by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Approved by: Gordon Ross <gwr@nexenta.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/9763
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/8a702e55
Closes #7972

5 years agochangelist should be able to iter on mounts
Alek P [Tue, 2 Oct 2018 19:30:58 +0000 (15:30 -0400)]
changelist should be able to iter on mounts

Modified changelist_gather()ing for the mountpoint property.
Now instead of iterating on all dataset descendants, we read
/proc/self/mounts and iterate on the mounted descendant datasets only.

Switched changelist implementation from a uu_list_* to uu_avl_* in
order to  reduce changlist code-path's worst case time complexity.

Reviewed by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes #7967

5 years agoZTS: Fix snapshot_009_pos, snapshot_010_pos
Brian Behlendorf [Tue, 2 Oct 2018 00:15:57 +0000 (17:15 -0700)]
ZTS: Fix snapshot_009_pos, snapshot_010_pos

Mitigate the likelihood of the newly created volumes being busy
when the 'zfs destroy -r' is issued by waiting for udev to settle.
Since this is not a iron clad fix I've added the test case to
the known list of possible failures and referenced issue #7961.

Finally, in the case this test does fail fix the cleanup logic
so subsequent tests won't incorrectly fail.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7961
Closes #7962

5 years agoPrefix all refcount functions with zfs_
Tim Schumacher [Mon, 1 Oct 2018 17:42:05 +0000 (19:42 +0200)]
Prefix all refcount functions with zfs_

Recent changes in the Linux kernel made it necessary to prefix
the refcount_add() function with zfs_ due to a name collision.

To bring the other functions in line with that and to avoid future
collisions, prefix the other refcount functions as well.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Schumacher <timschumi@gmx.de>
Closes #7963

5 years agoRemove duplicate macro in dsl_dir.h
Matthew Ahrens [Mon, 1 Oct 2018 17:40:11 +0000 (10:40 -0700)]
Remove duplicate macro in dsl_dir.h

The DD_FIELD_LAST_REMAP_TXG macro was added twice (with the same value).
This change removes one of them.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #7968

5 years agoRefine split block reconstruction
Brian Behlendorf [Mon, 1 Oct 2018 17:36:34 +0000 (10:36 -0700)]
Refine split block reconstruction

Due to a flaw in 4589f3ae the number of unique combinations
could be calculated incorrectly.  This could result in the
random combinations reconstruction being used when it would
have been possible to check all combinations.

This change fixes the unique combinations calculation and
simplifies the reconstruction logic by maintaining a per-
segment list of unique copies.

The vdev_indirect_splits_damage() function was introduced
to validate both the enumeration and random reconstruction
logic with ztest.  It is implemented such it will never
make a known recoverable block unrecoverable.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #6900
Closes #7934

5 years agoFixes for procfs files backed by linked lists
John Gallagher [Wed, 26 Sep 2018 18:08:12 +0000 (11:08 -0700)]
Fixes for procfs files backed by linked lists

There are some issues with the way the seq_file interface is implemented
for kstats backed by linked lists (zfs_dbgmsgs and certain per-pool
debugging info):

* We don't account for the fact that seq_file sometimes visits a node
  multiple times, which results in missing messages when read through
  procfs.
* We don't keep separate state for each reader of a file, so concurrent
  readers will receive incorrect results.
* We don't account for the fact that entries may have been removed from
  the list between read syscalls, so reading from these files in procfs
  can cause the system to crash.

This change fixes these issues and adds procfs_list, a wrapper around a
linked list which abstracts away the details of implementing the
seq_file interface for a list and exposing the contents of the list
through procfs.

Reviewed by: Don Brady <don.brady@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: John Gallagher <john.gallagher@delphix.com>
External-issue: LX-1211
Closes #7819

5 years agoFix flake 8 style warnings
Gregor Kopka [Wed, 26 Sep 2018 18:02:26 +0000 (20:02 +0200)]
Fix flake 8 style warnings

Ran zts-report.py and test-runner.py from ./tests/test-runner/bin/
through the 2to3 (https://docs.python.org/2/library/2to3.html).
Checked the result, fixed:
- 'maxint' -> 'maxsize' that 2to3 missed.
- 'cmp=' parameter for a 'sorted()' with a 'key=' version.
- try/except wrapping of configparser import as there are still
  python 2.7 systems that lack a compatibility shim

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregor Kopka <gregor@kopka.net>
Closes #7925
Closes #7952

5 years agoLinux 4.19-rc3+ compat: Remove refcount_t compat
Tim Schumacher [Wed, 26 Sep 2018 17:29:26 +0000 (19:29 +0200)]
Linux 4.19-rc3+ compat: Remove refcount_t compat

torvalds/linux@59b57717f ("blkcg: delay blkg destruction until
after writeback has finished") added a refcount_t to the blkcg
structure. Due to the refcount_t compatibility code, zfs_refcount_t
was used by mistake.

Resolve this by removing the compatibility code and replacing the
occurrences of refcount_t with zfs_refcount_t.

Reviewed-by: Franz Pletz <fpletz@fnordicwalking.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Schumacher <timschumi@gmx.de>
Closes #7885
Closes #7932

5 years agoFix small sysfs leak
Brian Behlendorf [Wed, 26 Sep 2018 16:50:58 +0000 (09:50 -0700)]
Fix small sysfs leak

When zfs_kobj_init() is called with an attr_cnt of 0 only the
kobj->zko_default_attrs is allocated.  It subsequently won't
get freed in zfs_kobj_release since the free is wrapped in
a kobj->zko_attr_count != 0 conditional.

Split the block in zfs_kobj_release() to make sure the
kobj->zko_default_attrs are freed in this case.

Additionally, fix a minor spelling mistake and typo in
zfs_kobj_init() which could also cause a leak but in practice
is almost certain not to fail.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: John Gallagher <john.gallagher@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7957

5 years agoZpool iostat: remove latency/queue scaling
Gregor Kopka [Tue, 25 Sep 2018 23:29:16 +0000 (01:29 +0200)]
Zpool iostat: remove latency/queue scaling

Bandwidth and iops are average per second while *_wait are averages
per request for latency or, for queue depths, an instantaneous
measurement at the end of an interval (according to man zpool).

When calculating the first two it makes sense to do
x/interval_duration (x being the increase in total bytes or number of
requests over the duration of the interval, interval_duration in
seconds) to 'scale' from amount/interval_duration to amount/second.

But applying the same math for the latter (*_wait latencies/queue) is
wrong as there is no interval_duration component in the values (these
are time/requests to get to average_time/request or already an
absulute number).

This bug leads to the only correct continuous *_wait figures for both
latencies and queue depths from 'zpool iostat -l/q' being with
duration=1 as then the wrong math cancels itself (x/1 is a nop).

This removes temporal scaling from latency and queue depth figures.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregor Kopka <gregor@kopka.net>
Closes #7945
Closes #7694

5 years agoRevert "Fix flake 8 style warnings"
Brian Behlendorf [Tue, 25 Sep 2018 00:16:41 +0000 (17:16 -0700)]
Revert "Fix flake 8 style warnings"

This reverts commit b8fd4310c54444eecb66140d99a6156f4353b29b which
accidentally introduced a regression for some versions of python.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #7929

5 years agoFix statfs(2) for 32-bit user space
Brian Behlendorf [Tue, 25 Sep 2018 00:11:25 +0000 (17:11 -0700)]
Fix statfs(2) for 32-bit user space

When handling a 32-bit statfs() system call the returned fields,
although 64-bit in the kernel, must be limited to 32-bits or an
EOVERFLOW error will be returned.

This is less of an issue for block counts since the default
reported block size in 128KiB. But since it is possible to
set a smaller block size, these values will be scaled as
needed to fit in a 32-bit unsigned long.

Unlike most other filesystems the total possible file counts
are more likely to overflow because they are calculated based
on the available free space in the pool. In order to prevent
this the reported value must be capped at 2^32-1. This is
only for statfs(2) reporting, there are no changes to the
internal ZFS limits.

Reviewed-by: Andreas Dilger <andreas.dilger@whamcloud.com>
Reviewed-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #7927
Closes #7122
Closes #7937

5 years agoZTS: Fix removal_resume_export
LOLi [Mon, 24 Sep 2018 19:58:16 +0000 (21:58 +0200)]
ZTS: Fix removal_resume_export

This change simplify the test case removing part of the logic which was
introducing a race condition and thus causing spurious failures: we use
attempt_during_removal() from removal.kshlib instead which has been
observed to be more stable.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7894
Closes #7913

5 years agoFix flake 8 style warnings
Gregor Kopka [Mon, 24 Sep 2018 17:12:59 +0000 (19:12 +0200)]
Fix flake 8 style warnings

Ran zts-report.py and test-runner.py from ./tests/test-runner/bin/
through the 2to3 (https://docs.python.org/2/library/2to3.html).
Checked the result, fixed:
- 'maxint' -> 'maxsize' that 2to3 missed.
- 'cmp=' parameter for a 'sorted()' with a 'key=' version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Signed-off-by: Gregor Kopka <gregor@kopka.net>
Closes #7925
Closes #7929

5 years agovdev_disk_error() prints ASCII SOH to debug log
LOLi [Fri, 21 Sep 2018 16:42:42 +0000 (18:42 +0200)]
vdev_disk_error() prints ASCII SOH to debug log

Currently vdev_disk_error() prepends its messages sent to the internal
ZFS debug log with KERN_WARNING, which is currently defined as follows:

   #define KERN_SOH      "\001"
   #define KERN_WARNING  KERN_SOH "4"

Since "\001" (ASCII Start Of Header) is not printable this results in
weird characters displayed when inspecting the debug log. This commit
simply removes this superfluous prefix passed to zfs_dbgmsg().

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7936

5 years agoFix reference to zpool-features(5)
DeHackEd [Fri, 21 Sep 2018 16:41:08 +0000 (12:41 -0400)]
Fix reference to zpool-features(5)

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: DHE <git@dehacked.net>
Closes #7938

5 years agoAdd limits to spa_slop_shift tunable
LOLi [Fri, 21 Sep 2018 04:10:12 +0000 (06:10 +0200)]
Add limits to spa_slop_shift tunable

This change adds limits to the possible spa_slop_shift values set via
the sysfs interface. Accepted values are from a minimum of 1 to a
maximum of 31 (inclusive): these limits are based on the following
values observed on a 128PB file-vdev test pool:

spa_slop_shift=1, spa_get_slop_space=63.5PiB
spa_slop_shift=2, spa_get_slop_space=31.8PiB
spa_slop_shift=3, spa_get_slop_space=15.9PiB
spa_slop_shift=4, spa_get_slop_space=7.9PiB
spa_slop_shift=5, spa_get_slop_space=4PiB
spa_slop_shift=6, spa_get_slop_space=2PiB
...
spa_slop_shift=25, spa_get_slop_space=4GiB
spa_slop_shift=26, spa_get_slop_space=2GiB
spa_slop_shift=27, spa_get_slop_space=1016MiB
spa_slop_shift=28, spa_get_slop_space=508MiB
spa_slop_shift=29, spa_get_slop_space=254MiB
spa_slop_shift=30, spa_get_slop_space=128MiB
spa_slop_shift=31, spa_get_slop_space=128MiB
spa_slop_shift=32, spa_get_slop_space=128MiB

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7876
Closes #7900

5 years agoAdd NEWS file
Richard Yao [Tue, 18 Sep 2018 19:03:47 +0000 (15:03 -0400)]
Add NEWS file

I received a request for a NEWS file. That needs to be handled by Tony
and Brian, but for now, we can at least provide one that provides a link
to github so that users who expect NEWS files from their packages will
know where to go for release information.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #7918

5 years agozstreamdump dumps core printing truncated nvlist
LOLi [Tue, 18 Sep 2018 16:43:09 +0000 (18:43 +0200)]
zstreamdump dumps core printing truncated nvlist

This change prevents zstreamdump from crashing when trying to print
invalid nvlist data (DRR_BEGIN record) from a truncated send stream.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7917

5 years agoFix allocation_classes GUID in zpool-features(5)
DeHackEd [Tue, 18 Sep 2018 15:59:47 +0000 (11:59 -0400)]
Fix allocation_classes GUID in zpool-features(5)

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: DHE <git@dehacked.net>
Closes #7920

5 years agoAdd new wiki page to CONTRIBUTING
bunder2015 [Tue, 18 Sep 2018 15:58:15 +0000 (11:58 -0400)]
Add new wiki page to CONTRIBUTING

A new wiki page has been added for users who may be new to Git or
GitHub.  Adding link to CONTRIBUTING.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Gregor Kopka <gregor@kopka.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #7923
Closes #7915

5 years agoMan page fixes - zpool/zfs optional parameters
Gregor Kopka [Tue, 18 Sep 2018 15:55:33 +0000 (17:55 +0200)]
Man page fixes - zpool/zfs optional parameters

The man pages for zpool and zfs (get command)
listed the pool/dataset parameter as required,
but these are optional. Fixed that.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregor Kopka <gregor@kopka.net>
Closes #7916

5 years agoClarify 'zpool remove' restrictions
Brian Behlendorf [Tue, 18 Sep 2018 00:28:18 +0000 (17:28 -0700)]
Clarify 'zpool remove' restrictions

Update zpool(8) to clarify what type of vdevs may be safely
removed and that the existence of any top-level raidz device
which is part of the primary pool will prevent device removal.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7880
Closes #7893

5 years agozpool should detect invalid fs property on create
LOLi [Thu, 13 Sep 2018 20:37:42 +0000 (22:37 +0200)]
zpool should detect invalid fs property on create

This change improve the handling of invalid filesystem properties when
specified at pool creation: this is useful when 'zpool create -n'
(dry run) is executed to detect invalid fs-level options (-O) before
the actual command is run.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7620
Closes #7878

5 years agoAdd removal_resume_export to zts-report.py
Brian Behlendorf [Thu, 13 Sep 2018 20:35:09 +0000 (13:35 -0700)]
Add removal_resume_export to zts-report.py

Add the removal_resume_export test case to the possible failure
section of the zts-report.py and reference the Github issue.  In
the CI environment this test has proven to be unreliable due to
the way it detects the removal thread.  This is a flaw in the test
and not device removal so update the result summary accordingly.

Additionally, increase the allowed timeout in an effort to reduce
the observed rate of false positves.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7895
Issue #7894

5 years agozpool split can create a corrupted pool
Roman Strashkin [Thu, 13 Sep 2018 01:14:42 +0000 (04:14 +0300)]
zpool split can create a corrupted pool

Added vdev_resilver_needed() check to verify VDEVs are fully
synced, so that after split the new pool will not be corrupted.

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Roman Strashkin <roman.strashkin@nexenta.com>
Closes #7865
Closes #7881

5 years agoTag 0.8.0-rc1
Brian Behlendorf [Fri, 7 Sep 2018 16:35:09 +0000 (09:35 -0700)]
Tag 0.8.0-rc1

Major new features:
- Native encryption
- Device removal
- Allocation classes
- Pool checkpoints
- Sequential scrub and resilver
- Project quota
- Channel programs
- Direct IO

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
5 years agoFix in-kernel sysfs entries
Don Brady [Fri, 7 Sep 2018 04:44:52 +0000 (22:44 -0600)]
Fix in-kernel sysfs entries

The recent sysfs zfs properties feature breaks the in-kernel
builds of zfs (sans module).  When not built as a module add
the sysfs entries under /sys/fs/zfs/.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7868
Closes #7872

5 years agoFix zfs_sysfs_live test failure
Don Brady [Fri, 7 Sep 2018 00:36:00 +0000 (18:36 -0600)]
Fix zfs_sysfs_live test failure

The ZTS zfs_sysfs_live test fails occasionally due to an uninitialized
string on an error path.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7869

5 years agoFix 'zfs allow' for create time permissions
LOLi [Thu, 6 Sep 2018 20:11:21 +0000 (22:11 +0200)]
Fix 'zfs allow' for create time permissions

When no permission set is defined for a dataset the create time
permissions are incorrectly shown as if they were a permission set.
This change simply correct how allow permissions are displayed.

This commit also fixes a small manpage formatting issue and adds the
"zfs_allow_003_pos" test case to the ZFS Test Suite.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7519
Closes #7860

5 years agoPool allocation classes
Don Brady [Thu, 6 Sep 2018 01:33:36 +0000 (19:33 -0600)]
Pool allocation classes

Allocation Classes add the ability to have allocation classes in a
pool that are dedicated to serving specific block categories, such
as DDT data, metadata, and small file blocks. A pool can opt-in to
this feature by adding a 'special' or 'dedup' top-level VDEV.

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Reviewed-by: HÃ¥kan Johansson <f96hajo@chalmers.se>
Reviewed-by: Andreas Dilger <andreas.dilger@chamcloud.com>
Reviewed-by: DHE <git@dehacked.net>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Gregor Kopka <gregor@kopka.net>
Reviewed-by: Kash Pande <kash@tripleback.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #5182

5 years agoCorrectly handle errors from kern_path
Chris Siebenmann [Wed, 5 Sep 2018 05:26:56 +0000 (01:26 -0400)]
Correctly handle errors from kern_path

As a regular kernel function, kern_path() returns errors as negative
errnos, such as -ELOOP. zfsctl_snapdir_vget() must convert these into
the positive errnos used throughout the ZFS code when it returns them
to other ZFS functions so that the ZFS code properly sees them as
errors.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Siebenmann <cks.git01@cs.toronto.edu>
Closes #7764
Closes #7864

5 years agoAdded recalculation of ARC stats mid-eviction
Rich Ercolani [Wed, 5 Sep 2018 05:15:14 +0000 (01:15 -0400)]
Added recalculation of ARC stats mid-eviction

Re-adds a recalculation step for the ARC stats after the MRU
eviction so that we don't pathologically attempt to evict the MFU.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored-by: Mark Johnston <markj@freebsd.org>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #7855

5 years agoRevert "Update zfs_admin_snapshot default value (disabled)"
Brian Behlendorf [Wed, 5 Sep 2018 04:58:23 +0000 (21:58 -0700)]
Revert "Update zfs_admin_snapshot default value (disabled)"

This reverts commit a6214a0ae9e78d3cac0e495e2fcf7af0858a872f.
Disabling zfs_admin_snapshot by default results in multiple ZTS
tests failing which depend on this functionality.  Revert this
change until the relevant test cases can be updated.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #7838

5 years agoUpdate zfs_admin_snapshot default value (disabled)
George Melikov [Wed, 5 Sep 2018 00:21:24 +0000 (03:21 +0300)]
Update zfs_admin_snapshot default value (disabled)

It's disabled by default, update code to reflect
the documentation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Gregor Kopka <gregor@kopka.net>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #7835
Closes #7838

5 years agoOpenZFS 9751 - Allocation throttling misplacing ditto blocks
mav [Fri, 17 Aug 2018 15:17:09 +0000 (15:17 +0000)]
OpenZFS 9751 - Allocation throttling misplacing ditto blocks

Relax allocation throttling for ditto blocks.  Due to random imbalances
in allocation it tends to push block copies to one vdev, that looks
slightly better at the moment.  Slightly less strict policy allows both
improve data security and surprisingly write performance, since we don't
need to touch extra metaslabs on each vdev to respect the min distance.

Sponsored by: iXsystems, Inc.

Authored by: mav <mav@FreeBSD.org>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/9751
FreeBSD-commit: https://github.com/freebsd/freebsd/commit/8253837ac3
Closes #7857

5 years agoOpenZFS 9738 - Fix third block copy allocations, broken at 9112.
mav [Fri, 17 Aug 2018 15:00:41 +0000 (15:00 +0000)]
OpenZFS 9738 - Fix third block copy allocations, broken at 9112.

Use METASLAB_WEIGHT_CLAIM weight to allocate tertiary blocks.
Previous use of METASLAB_WEIGHT_SECONDARY for that caused errors
later on metaslab_activate_allocator() call, leading to massive
load of unneeded metaslabs and write freezes.

Authored by: mav <mav@FreeBSD.org>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/9738
FreeBSD-commit: https://github.com/freebsd/freebsd/commit/63e7138
Closes #7858

5 years agoAdd basic zfs ioc input nvpair validation
Don Brady [Sun, 2 Sep 2018 19:14:01 +0000 (15:14 -0400)]
Add basic zfs ioc input nvpair validation

We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).

Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non
legacy commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.

This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.

The zfs_ioc_key_t for zfs_keys_channel_program looks like:

static const zfs_ioc_key_t zfs_keys_channel_program[] = {
       {"program",     DATA_TYPE_STRING,               0},
       {"arg",         DATA_TYPE_UNKNOWN,              0},
       {"sync",        DATA_TYPE_BOOLEAN_VALUE,        ZK_OPTIONAL},
       {"instrlimit",  DATA_TYPE_UINT64,               ZK_OPTIONAL},
       {"memlimit",    DATA_TYPE_UINT64,               ZK_OPTIONAL},
};

Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).

ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel
ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel
ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing
ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7780

5 years agoAdd zfs module feature and property info to sysfs
Don Brady [Sun, 2 Sep 2018 19:09:53 +0000 (15:09 -0400)]
Add zfs module feature and property info to sysfs

This extends our sysfs '/sys/module/zfs' entry to include feature
and property attributes. The primary consumer of this information
is user processes, like the zfs CLI, that need to know what the
current loaded ZFS module supports. The libzfs binary will consult
this information when instantiating the zfs and zpool property
tables and the pool features table.

This introduces 4 kernel objects (dirs) into '/sys/module/zfs'
with corresponding attributes (files):
  features.runtime
  features.pool
  properties.dataset
  properties.pool

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7706

5 years agoZTS: Fix EBUSY volume destroy failures
Brian Behlendorf [Fri, 31 Aug 2018 22:30:44 +0000 (15:30 -0700)]
ZTS: Fix EBUSY volume destroy failures

It's possible for an unrelated process, like blkid, to have the
volume open when 'zfs destroy' is run.  Switch the cleanup functions
to the destroy_dataset() helper which handles this case by retrying
the destroy when the dataset is busy.  This was done not only for
volumes but also for file systems for consistency.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7854

5 years agoAllow ECKSUM in vdev_checkpoint_sm_object()
Brian Behlendorf [Fri, 31 Aug 2018 21:20:34 +0000 (14:20 -0700)]
Allow ECKSUM in vdev_checkpoint_sm_object()

The checkpoint space map object may not be accessible from the
vdev's ZAP when it has been damaged.  This may be the case when
performing an extreme rewind when importing the pool.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7809
Closes #7853

5 years agoclean up __dbuf_hold_impl
Matthew Ahrens [Fri, 31 Aug 2018 17:16:54 +0000 (13:16 -0400)]
clean up __dbuf_hold_impl

We can simplify the dbuf_hold code by allocating dbuf_hold_arg_t's on
demand, rather than allocating a big array of them up front.  While this
can occasionally increase the number of allocations, typically only one
allocation is needed since the indirect block is already cached.

The performance test suite gets the same results with this change.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #7841

5 years agoZTS: pool_checkpoint path cleanup
bunder2015 [Thu, 30 Aug 2018 21:45:16 +0000 (17:45 -0400)]
ZTS: pool_checkpoint path cleanup

Removing hardcoded paths in pool_checkpoint.kshlib

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #7840

5 years agoZTS: Fix DEV_DSKDIR trim from disk
Richard Elling [Thu, 30 Aug 2018 21:43:37 +0000 (14:43 -0700)]
ZTS: Fix DEV_DSKDIR trim from disk

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com>
Closes #7848

5 years agoZTS: zvol_swap_003 path cleanup
bunder2015 [Thu, 30 Aug 2018 20:53:06 +0000 (16:53 -0400)]
ZTS: zvol_swap_003 path cleanup

Removing hardcoded paths in zvol_swap_003

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #7839

5 years agoZTS: path cleanup
bernie1995 [Thu, 30 Aug 2018 20:46:55 +0000 (22:46 +0200)]
ZTS: path cleanup

Removing hardcoded paths in many scripts.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bernie1995 <bernie.pikes@gmail.com>
Issue #7507
Closes #7843

5 years agoZTS: Fix zfs_create_013_pos
Brian Behlendorf [Thu, 30 Aug 2018 20:38:09 +0000 (13:38 -0700)]
ZTS: Fix zfs_create_013_pos

It's possible for an unrelated process, like blkid, to have the
volume open when 'zfs destroy' is run.  Switch the cleanup function
to the destroy_dataset() helper which handles this case by retrying
the destroy when the dataset is busy.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7847

5 years agoOpenZFS 9403 - assertion failed in arc_buf_destroy()
Tom Caputi [Wed, 29 Aug 2018 18:33:33 +0000 (14:33 -0400)]
OpenZFS 9403 - assertion failed in arc_buf_destroy()

Assertion failed in arc_buf_destroy() when concurrently reading
block with checksum error.

Porting notes:
* The ability to zinject decompression errors has been added, but
  this only works at the zio_decompress() level, where we have all
  of the info we need to match against the user's zinject options.
* The decompress_fault test has been added to test the new zinject
  functionality
* We attempted to set zio_decompress_fail_fraction to (1 << 18) in
  ztest for further test coverage. Although this did uncover a few
  low priority issues, this unfortuantely also causes ztest to
  ASSERT in many locations where the code is working correctly since
  it is designed to fail on IO errors. Developers can manually set
  this variable with the '-o' option to find and debug issues.

Authored by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Matt Ahrens <mahrens@delphix.com>
Ported-by: Tom Caputi <tcaputi@datto.com>
OpenZFS-issue: https://illumos.org/issues/9403
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/fa98e487a9
Closes #7822

5 years agoAlways wait for txg sync when umounting dataset
Tom Caputi [Mon, 20 Aug 2018 20:42:17 +0000 (16:42 -0400)]
Always wait for txg sync when umounting dataset

Currently, when unmounting a filesystem, ZFS will only wait for
a txg sync if the dataset is dirty and not readonly. However, this
can be problematic in cases where a dataset is remounted readonly
immediately before being unmounted, which often happens when the
system is being shut down. Since encrypted datasets require that
all I/O is completed before the dataset is disowned, this issue
causes problems when write I/Os leak into the txgs after the
dataset is disowned, which can happen when sync=disabled.

While looking into fixes for this issue, it was discovered that
dsl_dataset_is_dirty() does not return B_TRUE when the dataset has
been removed from the txg dirty datasets list, but has not actually
been processed yet. Furthermore, the implementation is comletely
different from dmu_objset_is_dirty(), adding to the confusion.
Rather than relying on this function, this patch forces the umount
code path (and the remount readonly code path) to always perform a
txg sync on read-write datasets and removes the function altogether.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #7753
Closes #7795

5 years agoSmall rework of txg_list code
Tom Caputi [Mon, 20 Aug 2018 20:41:53 +0000 (16:41 -0400)]
Small rework of txg_list code

This patch simply adds some missing locking to the txg_list
functions and refactors txg_verify() so that it is only compiled
in for debug builds.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #7795

5 years agoDirect IO support
Brian Behlendorf [Mon, 27 Aug 2018 17:04:21 +0000 (10:04 -0700)]
Direct IO support

Direct IO via the O_DIRECT flag was originally introduced in XFS by
IRIX for database workloads. Its purpose was to allow the database
to bypass the page and buffer caches to prevent unnecessary IO
operations (e.g.  readahead) while preventing contention for system
memory between the database and kernel caches.

On Illumos, there is a library function called directio(3C) that
allows user space to provide a hint to the file system that Direct IO
is useful, but the file system is free to ignore it. The semantics
are also entirely a file system decision. Those that do not
implement it return ENOTTY.

Since the semantics were never defined in any standard, O_DIRECT is
implemented such that it conforms to the behavior described in the
Linux open(2) man page as follows.

    1.  Minimize cache effects of the I/O.

    By design the ARC is already scan-resistant which helps mitigate
    the need for special O_DIRECT handling.  Data which is only
    accessed once will be the first to be evicted from the cache.
    This behavior is in consistent with Illumos and FreeBSD.

    Future performance work may wish to investigate the benefits of
    immediately evicting data from the cache which has been read or
    written with the O_DIRECT flag.  Functionally this behavior is
    very similar to applying the 'primarycache=metadata' property
    per open file.

    2. O_DIRECT _MAY_ impose restrictions on IO alignment and length.

    No additional alignment or length restrictions are imposed.

    3. O_DIRECT _MAY_ perform unbuffered IO operations directly
       between user memory and block device.

    No unbuffered IO operations are currently supported.  In order
    to support features such as transparent compression, encryption,
    and checksumming a copy must be made to transform the data.

    4. O_DIRECT _MAY_ imply O_DSYNC (XFS).

    O_DIRECT does not imply O_DSYNC for ZFS.  Callers must provide
    O_DSYNC to request synchronous semantics.

    5. O_DIRECT _MAY_ disable file locking that serializes IO
       operations.  Applications should avoid mixing O_DIRECT
       and normal IO or mmap(2) IO to the same file.  This is
       particularly true for overlapping regions.

    All I/O in ZFS is locked for correctness and this locking is not
    disabled by O_DIRECT.  However, concurrently mixing O_DIRECT,
    mmap(2), and normal I/O on the same file is not recommended.

This change is implemented by layering the aops->direct_IO operations
on the existing AIO operations.  Code already existed in ZFS on Linux
for bypassing the page cache when O_DIRECT is specified.

References:
  * http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/ch02s09.html
  * https://blogs.oracle.com/roch/entry/zfs_and_directio
  * https://ext4.wiki.kernel.org/index.php/Clarifying_Direct_IO's_Semantics
  * https://illumos.org/man/3c/directio

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #224
Closes #7823

5 years agoRemove %changelog from spec file
Brian Behlendorf [Sun, 26 Aug 2018 19:59:33 +0000 (12:59 -0700)]
Remove %changelog from spec file

Remove the %changelog section from the spec files since it does
not get updated in the master branch.  Not only does this mean
the information is stale, but it can result in 'make deb' failing
to build packages, issue #7825.  This section should be updated
for tagged releases.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7825
Closes #7827

5 years agoFedora 28: Fix misc bounds check compiler warnings
Joao Carlos Mendes Luis [Sun, 26 Aug 2018 19:55:44 +0000 (16:55 -0300)]
Fedora 28: Fix misc bounds check compiler warnings

Fix a bunch of truncation compiler warnings that show up
on Fedora 28 (GCC 8.0.1).

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #7368
Closes #7826
Closes #7830

5 years agoFix libaio-devel requirement for Debian-based distributions
LOLi [Sun, 26 Aug 2018 19:43:27 +0000 (21:43 +0200)]
Fix libaio-devel requirement for Debian-based distributions

BuildRequires tags for "-devel" packages in the RPM spec file do not
work when building on Debian-based distributions.

Fix this issue by making this requirement conditional to RPM-based
distributions.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7829
Closes #7831

5 years agoAdd libaio-devel BuildRequires
Brian Behlendorf [Thu, 23 Aug 2018 16:34:34 +0000 (09:34 -0700)]
Add libaio-devel BuildRequires

The zfs-test package needs a build requirement on the libaio-devel
package.  Without it ./configure will correctly determine that
mmap_libaio cannot be built and it will be skipped.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7821
Closes #7824

5 years agoStack overflow when destroying deeply nested clones
LOLi [Wed, 22 Aug 2018 18:03:31 +0000 (20:03 +0200)]
Stack overflow when destroying deeply nested clones

Destroy operations on deeply nested chains of clones can overflow
the stack:

        Depth    Size   Location    (221 entries)
        -----    ----   --------
  0)    15664      48   mutex_lock+0x5/0x30
  1)    15616       8   mutex_lock+0x5/0x30
...
 26)    13576      72   dsl_dataset_remove_clones_key.isra.4+0x124/0x1e0 [zfs]
 27)    13504      72   dsl_dataset_remove_clones_key.isra.4+0x18a/0x1e0 [zfs]
 28)    13432      72   dsl_dataset_remove_clones_key.isra.4+0x18a/0x1e0 [zfs]
...
185)     2128      72   dsl_dataset_remove_clones_key.isra.4+0x18a/0x1e0 [zfs]
186)     2056      72   dsl_dataset_remove_clones_key.isra.4+0x18a/0x1e0 [zfs]
187)     1984      72   dsl_dataset_remove_clones_key.isra.4+0x18a/0x1e0 [zfs]
188)     1912     136   dsl_destroy_snapshot_sync_impl+0x4e0/0x1090 [zfs]
189)     1776      16   dsl_destroy_snapshot_check+0x0/0x90 [zfs]
...
218)      304     128   kthread+0xdf/0x100
219)      176      48   ret_from_fork+0x22/0x40
220)      128     128   kthread+0x0/0x100

Fix this issue by converting dsl_dataset_remove_clones_key() from
recursive to iterative.

Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7279
Closes #7810

5 years agoAdded metadata/dnode cache info to arc_summary
Rich Ercolani [Wed, 22 Aug 2018 16:35:20 +0000 (12:35 -0400)]
Added metadata/dnode cache info to arc_summary

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #7815

5 years agos/VERIFY/VERIFY3S in vdev_checkpoint_sm_object
Tim Chase [Tue, 21 Aug 2018 23:08:15 +0000 (18:08 -0500)]
s/VERIFY/VERIFY3S in vdev_checkpoint_sm_object

Using VERIFY3S allows to view the unexpected error value in the system
log.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Issue #7809
Closes #7818

5 years agoFix issues with raw receive_write_byref()
Tom Caputi [Mon, 20 Aug 2018 18:03:56 +0000 (14:03 -0400)]
Fix issues with raw receive_write_byref()

This patch fixes 2 issues with raw, deduplicated send streams. The
first is that datasets who had been completely received earlier in
the stream were not still marked as raw receives. This caused
problems when newly received datasets attempted to fetch raw data
from these datasets without this flag set.

The second problem was that the arc freeze checksum code was not
consistent about which locks needed to be held while performing
its asserts. The proper locking needed to run these asserts is
actually fairly nuanced, since the asserts touch the linked list
of buffers (requiring the header lock), the arc_state (requiring
the b_evict_lock), and the b_freeze_cksum (requiring the
b_freeze_lock). This seems like a large performance sacrifice and
a lot of unneeded complexity to verify that this relatively small
debug feature is working as intended, so this patch simply removes
these asserts instead.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #7701

5 years agopyzfs: add missing libzfs_core functions
LOLi [Mon, 20 Aug 2018 17:11:52 +0000 (19:11 +0200)]
pyzfs: add missing libzfs_core functions

This change adds the following libzfs_core functions to pyzfs:
lzc_remap, lzc_pool_checkpoint, lzc_pool_checkpoint_discard

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7793
Closes #7800

5 years agoSkip import activity test in more zdb code paths
Olaf Faaland [Mon, 20 Aug 2018 17:05:23 +0000 (10:05 -0700)]
Skip import activity test in more zdb code paths

Since zdb opens the pools read-only, it cannot damage the pool in the
event the pool is already imported either on the same host or on
another one.

If the pool vdev structure is changing while zdb is importing the
pool, it may cause zdb to crash.  However this is unlikely, and in any
case it's a user space process and can simply be run again.

For this reason, zdb should disable the multihost activity test on
import that is normally run.

This commit fixes a few zdb code paths where that had been overlooked.
It also adds tests to ensure that several common use cases handle this
properly in the future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Gu Zheng <guzheng2331314@163.com>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #7797
Closes #7801

5 years agoDon't modify argv[] in user tools
DeHackEd [Mon, 20 Aug 2018 16:55:18 +0000 (12:55 -0400)]
Don't modify argv[] in user tools

argv[] gets modified during string parsing for input arguments. This
is reflected in the live process listing. Don't do that.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes #7760

5 years agoIntroduce read/write kstats per dataset
Serapheim Dimitropoulos [Mon, 20 Aug 2018 16:52:37 +0000 (09:52 -0700)]
Introduce read/write kstats per dataset

The following patch introduces a few statistics on reads and writes
grouped by dataset. These statistics are implemented as kstats
(backed by aggregate sums for performance) and can be retrieved by
using the dataset objset ID number. The motivation for this change is
to provide some preliminary analytics on dataset usage/performance.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #7705

5 years agoZTS: events path cleanup
bunder2015 [Sun, 19 Aug 2018 04:19:41 +0000 (00:19 -0400)]
ZTS: events path cleanup

Removing hardcoded paths in events.cfg

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #7805

5 years agoZTS: largest_pool_001 path cleanup
bunder2015 [Sun, 19 Aug 2018 04:18:31 +0000 (00:18 -0400)]
ZTS: largest_pool_001 path cleanup

Removing hardcoded paths in largest_pool_001

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #7804

5 years agoZTS: privilege group path cleanup
bunder2015 [Sun, 19 Aug 2018 04:17:22 +0000 (00:17 -0400)]
ZTS: privilege group path cleanup

Removing hardcoded paths in privilege group tests

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #7803

5 years agoZTS: Fix import_cache_device_replaced
Brian Behlendorf [Sun, 19 Aug 2018 04:16:12 +0000 (21:16 -0700)]
ZTS: Fix import_cache_device_replaced

Allow the 'zpool replace' to run slowly without overwhelming the vdev
queues by setting zfs_scan_vdev_limit=128k.  This limits the number of
concurrent slow IOs which need to be handled.  The net effect is the
test case runs approximately 3x faster putting it well under the 10
minute per-test time limit.

Rename import_cache* test cases to imprt_cachefile*.  Originally
these were renamed due to a maximum tar name limit, this limit was
removed by commit 1dfde3d9b.

Replaced instances of /var/tmp in zpool_import.cfg with $TEST_BASE_DIR.

Reviewed-by: bunder2015 <omfgbunder@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7765
Closes #7802

5 years ago'zfs holds' scripted mode is not documented
LOLi [Sat, 18 Aug 2018 22:47:41 +0000 (00:47 +0200)]
'zfs holds' scripted mode is not documented

This change simply documents the existing "scripted mode" option in
both command help and man page.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7798

5 years agoFix arcstat.py handling of unsupported options
LOLi [Sat, 18 Aug 2018 20:10:36 +0000 (22:10 +0200)]
Fix arcstat.py handling of unsupported options

This change allows the arcstat.py script to handle unsupported options
gracefully and print both error and usage messages when one such option
is provided.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7799

5 years agoZTS: Fix reservation_001_pos
Brian Behlendorf [Fri, 17 Aug 2018 17:01:47 +0000 (10:01 -0700)]
ZTS: Fix reservation_001_pos

It's possible for an unrelated process, like blkid, to have the
volume open when 'zfs destroy' is run.  Switch the cleanup function
to the destroy_dataset() helper which handles this case by retrying
the destroy when the dataset is busy.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7796

5 years agoFix traverse_impl() kmem leak
Brian Behlendorf [Wed, 15 Aug 2018 16:53:44 +0000 (09:53 -0700)]
Fix traverse_impl() kmem leak

The error path must free the memory allocated by this function or
it will be leaked.  In practice, this would leak only a few bytes
of memory under rare circumstances and thus is unlikely to have
caused any real problems.  This issue was caught by the kmemleak.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7791

5 years agoUse posix format for dist tarballs
Brian Behlendorf [Wed, 15 Aug 2018 16:52:28 +0000 (09:52 -0700)]
Use posix format for dist tarballs

Traditionally Automake has defaulted to the V7 tar format when
creating tarballs for distributions.  One of the many limitions
of this format is a 99 character maximum path + file name limit.
This can cause problems when adding new test cases to the ZTS
due to the depth of the sub-tree and descriptive test names.

This change switches the build system to the posix (aliased as
pax) tar format which conforms to the POSIX.1-2001 specification.
This format does not suffer from the V7 limitations, was designed
to be compatible, and will become the default format in future
versions of GNU tar.

https://www.gnu.org/software/tar/manual/html_chapter/tar_8.html

As part of this change the blockfiles directories which were
originally removed due to this limit have been readded.

Reviewed by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7767