]> git.proxmox.com Git - mirror_zfs.git/log
mirror_zfs.git
6 years ago Fix aarch64 build
Brian Behlendorf [Sat, 29 Jul 2017 20:25:53 +0000 (13:25 -0700)]
 Fix aarch64 build

Add aarch64 to the list of architecture which do not sanitize the
LDFLAGS from the environment.  See fb963d33 for details.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6424

6 years agoDisable zfs_send_007_pos
Giuseppe Di Natale [Sat, 29 Jul 2017 05:37:27 +0000 (22:37 -0700)]
Disable zfs_send_007_pos

Test case zfs_send_007_pos regularly is killed
by test-runner during zfs-tests on buildbot. Disable
it for now until further investigation can be done.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6422

6 years agozfs promote|rename .../%recv should be an error
LOLi [Fri, 28 Jul 2017 21:12:34 +0000 (23:12 +0200)]
zfs promote|rename .../%recv should be an error

If we are in the middle of an incremental 'zfs receive', the child
.../%recv will exist. If we run 'zfs promote' .../%recv, it will "work",
but then zfs gets confused about the status of the new dataset.
Attempting to do this promote should be an error.

Similarly renaming .../%recv datasets should not be allowed.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #4843
Closes #6339

6 years agoOpenZFS 7915 - checks in l2arc_evict could use some cleaning up
Andriy Gapon [Tue, 28 Feb 2017 21:32:55 +0000 (23:32 +0200)]
OpenZFS 7915 - checks in l2arc_evict could use some cleaning up

Authored by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/7915
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/836a00c
Closes #6375

6 years agoOpenZFS 8373 - TXG_WAIT in ZIL commit path
Andriy Gapon [Mon, 17 Jul 2017 12:31:30 +0000 (15:31 +0300)]
OpenZFS 8373 - TXG_WAIT in ZIL commit path

Authored by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8373
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7f04961
Closes #6403

6 years agoCorrect man page generation
bunder2015 [Fri, 28 Jul 2017 02:06:34 +0000 (22:06 -0400)]
Correct man page generation

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #6409
Closes #6410

6 years agoTag zfs-0.7.0
Brian Behlendorf [Wed, 26 Jul 2017 17:10:28 +0000 (10:10 -0700)]
Tag zfs-0.7.0

META file and changelog updated.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
6 years agoOpenZFS 8508 - Mounting a zpool on 32-bit platforms panics
Giuseppe Di Natale [Wed, 26 Jul 2017 16:44:21 +0000 (09:44 -0700)]
OpenZFS 8508 - Mounting a zpool on 32-bit platforms panics

Authored by: Justin Hibbits <chmeeedalf@gmail.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8508
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/15fc257
Closes #6404

6 years agoAdd line info and SET_ERROR() to ZFS debug log
Ned Bass [Wed, 26 Jul 2017 06:09:48 +0000 (23:09 -0700)]
Add line info and SET_ERROR() to ZFS debug log

Redefine the SET_ERROR macro in terms of __dprintf() so the error
return codes get logged as both tracepoint events (if tracepoints are
enabled) and as ZFS debug log entries.  This also allows us to use
the same definition of SET_ERROR() in kernel and user space.

Define a new debug flag ZFS_DEBUG_SET_ERROR=512 that may be bitwise
or'd into zfs_flags. Setting this flag enables both dprintf() and
SET_ERROR() messages in the debug log. That is, setting
ZFS_DEBUG_SET_ERROR and ZFS_DEBUG_DPRINTF|ZFS_DEBUG_SET_ERROR are
equivalent (this was done for sake of simplicity). Leaving
ZFS_DEBUG_SET_ERROR unset suppresses the SET_ERROR() messages which
helps avoid cluttering up the logs.

To enable SET_ERROR() logging, run:

  echo 1 >   /sys/module/zfs/parameters/zfs_dbgmsg_enable
  echo 512 > /sys/module/zfs/parameters/zfs_flags

Remove the zfs_set_error_class tracepoints event class since
SET_ERROR() now uses __dprintf(). This sacrifices a bit of
granularity when selecting individual tracepoint events to enable but
it makes the code simpler.

Include file, function, and line number information in debug log
entries.  The information is now added to the message buffer in
__dprintf() and as a result the zfs_dprintf_class tracepoints event
class was changed from a 4 parameter interface to a single parameter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #6400

6 years agoFix zpool-features.5 indentation
Brian Behlendorf [Wed, 26 Jul 2017 01:57:00 +0000 (18:57 -0700)]
Fix zpool-features.5 indentation

The userobj_accounting feature described in the zpool-features.5
man page was incorrectly indented.  Fix it.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6402

6 years agoSome additional send stream validity checking
Ned Bass [Wed, 26 Jul 2017 01:52:40 +0000 (18:52 -0700)]
Some additional send stream validity checking

Check in the DMU whether an object record in a send stream being
received contains an unsupported dnode slot count, and return an
error if it does. Failure to catch an unsupported dnode slot count
would result in a panic when the SPA attempts to increment the
reference count for the large_dnode feature and the pool has the
feature disabled. This is not normally an issue for a well-formed
send stream which would have the DMU_BACKUP_FEATURE_LARGE_DNODE flag
set if it contains large dnodes, so it will be rejected as
unsupported if the required feature is disabled. This change adds a
missing object record field validation.

Add missing stream feature flag checks in
dmu_recv_resume_begin_check().

Consolidate repetitive comment blocks in dmu_recv_begin_check().

Update zstreamdump to print the dnode slot count (dn_slots) for an
object record when running in verbose mode.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #6396

6 years agoFix 'zpool clear' on suspended pools
Brian Behlendorf [Tue, 25 Jul 2017 19:20:52 +0000 (12:20 -0700)]
Fix 'zpool clear' on suspended pools

'zpool clear' should be able to resume I/O on suspended, but otherwise
healthy, pools.

4a283c7 accidentally introduced a new code path where we call
txg_wait_synced() on the suspended pool before we had the chance to
resume I/O via zio_resume(): this results in the 'zpool clear'
command hanging indefinitely, waiting for a TXG that cannot be synced.

Fix this by avoiding the call to txg_wait_synced().

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6399

6 years agoFix autoconf detection of super_setup_bdi_name
Justin Bedő [Tue, 25 Jul 2017 17:30:20 +0000 (03:30 +1000)]
Fix autoconf detection of super_setup_bdi_name

The previous autoconf test for the presence of super_setup_bdi_name()
uses an invocation with an incorrect type signature, producing a
warning by the compiler when the test is run. This gets elevated to an
error when compiling with -Werror=format-security, causing autoconf to
falsely infer super_setup_bdi_name() is not present. This updates the
testing code to match the invocation used in
include/linux/vfs_compat.h.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Justin Bedo <cu@cua0.org>
Closes #6398

6 years agoReport MMP_STATE_NO_HOSTID immediately
Olaf Faaland [Sat, 15 Jul 2017 01:15:00 +0000 (18:15 -0700)]
Report MMP_STATE_NO_HOSTID immediately

There is no need to perform the activity check before detecting that the
user must set the system hostid, because the pool's multihost property
is on, but spa_get_hostid() returned 0.  The initial call to
vdev_uberblock_load() provided the information required.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #6388

6 years agoAdd callback for zfs_multihost_interval
Olaf Faaland [Fri, 21 Jul 2017 00:54:26 +0000 (17:54 -0700)]
Add callback for zfs_multihost_interval

Add a callback to wake all running mmp threads when
zfs_multihost_interval is changed.

This is necessary when the interval is changed from a very large value
to a significantly lower one, while pools are imported that have the
multihost property enabled.

Without this commit, the mmp thread does not wake up and detect the new
interval until after it has waited the old multihost interval time.  A
user monitoring mmp writes via the provided kstat would be led to
believe that the changed setting did not work.

Added a test in the ZTS under mmp to verify the new functionality is
working.

Added a test to ztest which starts and stops mmp threads, and calls into
the code to signal sleeping mmp threads, to test for deadlocks or
similar locking issues.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #6387

6 years agoSkip activity check for zhack RO import
Olaf Faaland [Fri, 14 Jul 2017 23:32:55 +0000 (16:32 -0700)]
Skip activity check for zhack RO import

"zhack feature stat" performs a read-only import, so the MMP activity
check is not necessary.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #6388
Closes #6389

6 years agoAdd zgenhostid utility script
Olaf Faaland [Wed, 19 Jul 2017 01:11:08 +0000 (18:11 -0700)]
Add zgenhostid utility script

Turning the multihost property on requires that a hostid be set to allow
ZFS to determine when a foreign system is attemping to import a pool.
The error message instructing the user to set a hostid refers to
genhostid(1).

Genhostid(1) is not available on SUSE Linux.  This commit adds a script
modeled after genhostid(1) for those users.

Zgenhostid checks for an /etc/hostid file; if it does not exist, it
creates one and stores a value.  If the user has provided a hostid as an
argument, that value is used.  Otherwise, a random hostid is generated
and stored.

This differs from the CENTOS 6/7 versions of genhostid, which overwrite
the /etc/hostid file even though their manpages state otherwise.

A man page for zgenhostid is added. The one for genhostid is in (1), but
I put zgenhostid in (8) because I believe it's more appropriate.

The mmp tests are modified to use zgenhostid to set the hostid instead
of using the spl_hostid module parameter.  zgenhostid will not replace
an existing /etc/hostid file, so new mmp_clear_hostid calls are
required.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #6358
Closes #6379

6 years agoRelease SCL_STATE in map_write_done()
Olaf Faaland [Mon, 24 Jul 2017 15:48:28 +0000 (08:48 -0700)]
Release SCL_STATE in map_write_done()

The config lock must be held for the duration of the MMP write.
Since the I/Os are executed via map_nowait(), the done function
is the only place where we know the write has completed.

Since SCL_STATE is taken as reader, overlapping I/Os do not
create a deadlock.  The refcount is simply increased when new
I/Os are queued and decreased when I/Os complete.

Test case added which exercises the probe IO call path to
verify the fix and prevent a regression.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #6394

6 years agoRevert Fix vdev_probe() call wrt SCL_STATE_ALL
Olaf Faaland [Tue, 18 Jul 2017 18:43:55 +0000 (11:43 -0700)]
Revert Fix vdev_probe() call wrt SCL_STATE_ALL

This reverts commit cc9c6bc, which has been causing intermittent
test failures on buildbot.  A correct fix for this locking issue
has been applied in a separate patch.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
6 years agoIncrease delay for zed log in events tests
Giuseppe Di Natale [Mon, 24 Jul 2017 20:02:42 +0000 (13:02 -0700)]
Increase delay for zed log in events tests

In zed event test cases, a brief delay was introduced
to allow for events to make it to the zed log. On at least
one buildbot builder, the 1 second delay is not long enough.
Therefore, increasing the delay should ensure the zed has
more than enough time to write to its log.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6395

6 years agoFix buffer overflow in dsl_dataset_name()
LOLi [Mon, 24 Jul 2017 19:56:49 +0000 (21:56 +0200)]
Fix buffer overflow in dsl_dataset_name()

If we're creating a pool with version >= SPA_VERSION_DSL_SCRUB (v11)
we need to account for additional space needed by the origin dataset
which will also be snapshotted: "poolname"+"/"+"$ORIGIN"+"@"+"$ORIGIN".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6374

6 years agoFix don't zero_label when replace with spare
Chunwei Chen [Mon, 24 Jul 2017 19:49:27 +0000 (12:49 -0700)]
Fix don't zero_label when replace with spare

When replacing a disk with non-wholedisk spare, we shouldn't zero_label
it. The wholedisk case already skip it. In fact, zero_label function
will fail saying device busy because it's already opened exclusively,
but since there's no error checking, the replace command will succeed,
causing great confusion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #6369

6 years agoRestrict zpool iostat/status -c to search path
Giuseppe Di Natale [Mon, 24 Jul 2017 18:53:59 +0000 (11:53 -0700)]
Restrict zpool iostat/status -c to search path

zpool iostat/status -c is supposed to be restricted
by its search path, but currently isn't. To prevent
arbitrary scripts from being executed, disallow '/'
from commands.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6353
Closes #6359

6 years agoUse correct macro for hz in mmp.c
Olaf Faaland [Mon, 24 Jul 2017 18:22:10 +0000 (11:22 -0700)]
Use correct macro for hz in mmp.c

Commit 379ca9c Multi-modifier protection (MMP) used HZ to convert
nanoseconds to ticks for use with cv_timedwait() and ddi_get_lbolt().
The correct macro is hz, which is defined within the SPL for kernel
space, and within zfs_context.h for user space.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #6357
Closes #6360

6 years agoFix coverity defects: CID 165755
Giuseppe Di Natale [Mon, 24 Jul 2017 18:16:58 +0000 (11:16 -0700)]
Fix coverity defects: CID 165755

CID 165755: Division or modulo by zero (DIVIDE_BY_ZERO)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6352

6 years agozfs_mount_001_neg: use log_must_busy in cleanup
Giuseppe Di Natale [Mon, 24 Jul 2017 18:10:25 +0000 (11:10 -0700)]
zfs_mount_001_neg: use log_must_busy in cleanup

Use log_must_busy when destroying the snapshot
and dataset during cleanup in zfs_mount_001_neg.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6382

6 years agoDisable nbmand tests on kernels w/o support
Giuseppe Di Natale [Mon, 24 Jul 2017 18:03:50 +0000 (11:03 -0700)]
Disable nbmand tests on kernels w/o support

This change allows mountpoint_003_pos and send-c_props
to run on Linux kernels that do not support mandatory
locking. Linux kernel versions greater than or equal to
4.4 no longer support mandatory locking and the test
suite will now account for that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6346
Closes #6347
Closes #6362

6 years agoAdd new fsck return code to zvol_misc_002_pos
Tony Hutter [Mon, 24 Jul 2017 17:58:14 +0000 (10:58 -0700)]
Add new fsck return code to zvol_misc_002_pos

zvol_misc_002_pos was failing on Fedora 26 because its newer version
of fsck was returning a different code than previous versions.  The
new fsck error code is valid and is been added to the test in this
patch.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #6350

6 years agoOpenZFS 8491 - uberblock on-disk padding to reserve space for smoothly merging zpool...
Serapheim Dimitropoulos [Thu, 13 Jul 2017 14:32:53 +0000 (07:32 -0700)]
OpenZFS 8491 - uberblock on-disk padding to reserve space for smoothly merging zpool checkpoint & MMP in ZFS

The zpool checkpoint feature in DxOS added a new field in the uberblock.
The Multi-Modifier Protection Pull Request from ZoL adds three new fields
in the uberblock (Reference: https://github.com/zfsonlinux/zfs/pull/6279).
As these two changes come from two different sources and once upstreamed
and deployed will introduce an incompatibility with each other we want
to upstream a change that will reserve the padding for both of them so
integration goes smoothly and everyone gets both features.

Porting Notes: Preserved MMP comments in uberblock struct.

Authored by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Olaf Faaland <faaland1@llnl.gov>
Approved by: Gordon Ross <gwr@nexenta.com>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8491
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/d84fa5f
Closes #6390

6 years agoLinux 4.13 compat: bio->bi_status and blk_status_t
Brian Behlendorf [Mon, 24 Jul 2017 02:37:12 +0000 (19:37 -0700)]
Linux 4.13 compat: bio->bi_status and blk_status_t

Commit torvalds/linux@4e4cbee9.  The bio->bi_error field was
replaced with bio->bi_status which is an enum that describes
all possible error types.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6351

6 years agoMinor fixes in zpool iostat -c documentation (#6370)
Ned Bass [Fri, 21 Jul 2017 00:04:35 +0000 (17:04 -0700)]
Minor fixes in zpool iostat -c documentation (#6370)

- Use nested [] notation to denote optional script list elements
- Fix space before comma after smarctl(8)
- Fix typo and formatting error in reference to -v option
- Fix spelling errors

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #6370

6 years agoFix coverity defects: CID 165757
George Melikov [Fri, 14 Jul 2017 16:34:35 +0000 (19:34 +0300)]
Fix coverity defects: CID 165757

CID 165757: Control flow issues (MISSING_BREAK)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #6348

6 years agoTag 0.7.0-rc5
Brian Behlendorf [Thu, 13 Jul 2017 19:08:53 +0000 (12:08 -0700)]
Tag 0.7.0-rc5

Fifth release candidate.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
`

6 years agoFix vdev_probe() call outside SCL_STATE_ALL lock
Brian Behlendorf [Wed, 12 Jul 2017 00:35:34 +0000 (20:35 -0400)]
Fix vdev_probe() call outside SCL_STATE_ALL lock

When an IO fails then zio_vdev_io_done() can call vdev_probe()
to determine the health of the vdev.  This is safe as long as
the original zio was submitted with zio_wait() and holds the
SCL_STATE_ALL lock over the operation.

If zio_no_wait() was used then the done callback will submit
the probe IO outside the SCL_STATE_ALL lock and hit this
ASSERT in zio_create()

  ASSERT(!vd || spa_config_held(spa, SCL_STATE_ALL, RW_READER));

Resolve the issue by only allowing vdev_probe() to be called
when there's a waiter indicating the caller is using zio_wait().
This assumes that caller is still holding SCL_STATE_ALL.

This issue isn't MMP specific but was surfaced when testing.
Without this patch it can be reproduced by running:

  zpool set multihost on <pool>
  zinject -d <vdev> -e io -T write -f 50 <pool> -L uber

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@intel.com>
Closes #745
Closes #6279

6 years agoMulti-modifier protection (MMP)
Olaf Faaland [Sat, 8 Jul 2017 03:20:35 +0000 (20:20 -0700)]
Multi-modifier protection (MMP)

Add multihost=on|off pool property to control MMP.  When enabled
a new thread writes uberblocks to the last slot in each label, at a
set frequency, to indicate to other hosts the pool is actively imported.
These uberblocks are the last synced uberblock with an updated
timestamp.  Property defaults to off.

During tryimport, find the "best" uberblock (newest txg and timestamp)
repeatedly, checking for change in the found uberblock.  Include the
results of the activity test in the config returned by tryimport.
These results are reported to user in "zpool import".

Allow the user to control the period between MMP writes, and the
duration of the activity test on import, via a new module parameter
zfs_multihost_interval.  The period is specified in milliseconds.  The
activity test duration is calculated from this value, and from the
mmp_delay in the "best" uberblock found initially.

Add a kstat interface to export statistics about Multiple Modifier
Protection (MMP) updates. Include the last synced txg number, the
timestamp, the delay since the last MMP update, the VDEV GUID, the VDEV
label that received the last MMP update, and the VDEV path.  Abbreviated
output below.

$ cat /proc/spl/kstat/zfs/mypool/multihost
31 0 0x01 10 880 105092382393521 105144180101111
txg   timestamp  mmp_delay   vdev_guid   vdev_label vdev_path
20468    261337  250274925   68396651780       3    /dev/sda
20468    261339  252023374   6267402363293     1    /dev/sdc
20468    261340  252000858   6698080955233     1    /dev/sdx
20468    261341  251980635   783892869810      2    /dev/sdy
20468    261342  253385953   8923255792467     3    /dev/sdd
20468    261344  253336622   042125143176      0    /dev/sdab
20468    261345  253310522   1200778101278     2    /dev/sde
20468    261346  253286429   0950576198362     2    /dev/sdt
20468    261347  253261545   96209817917       3    /dev/sds
20468    261349  253238188   8555725937673     3    /dev/sdb

Add a new tunable zfs_multihost_history to specify the number of MMP
updates to store history for. By default it is set to zero meaning that
no MMP statistics are stored.

When using ztest to generate activity, for automated tests of the MMP
function, some test functions interfere with the test.  For example, the
pool is exported to run zdb and then imported again.  Add a new ztest
function, "-M", to alter ztest behavior to prevent this.

Add new tests to verify the new functionality.  Tests provided by
Giuseppe Di Natale.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #745
Closes #6279

6 years agoMake hostid consistent in user and kernel space
Olaf Faaland [Thu, 25 May 2017 20:32:06 +0000 (13:32 -0700)]
Make hostid consistent in user and kernel space

If no spl_hostid was set, and no /etc/hostid file existed, the user
and kernel would have different values for the hostid.

The kernel's would be 0.  User space's would depend on the libc
implementation.  On systems with glibc, it would be a generated value,
probably the first 4 bytes of an IP address (see man 3 gethostid and
comments above hostid_read in SPL for details).

This then causes the hostid stored in the labels and in the pool
config not to match the hostid userspace obtains from
get_system_hostid().

Since the kernel has no way to know the libc's generated hostid value,
it serves no purpose for ZFS to use the value.

This patch changes user space's get_system_hostid() to conform to the
kernel's method, first checking for the spl_hostid via sysfs, and then
reading from /etc/hostid directly.

It does not look up spl_hostid_path, because if that is set and the
file it pointed to exists, spl_hostid will reflect its contents.

It eliminates the call to libc's gethostid().

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #745
Closes #6279

6 years agoOpenZFS 6939 - add sysevents to zfs core for commands
Dave Eddy [Tue, 30 May 2017 18:39:17 +0000 (11:39 -0700)]
OpenZFS 6939 - add sysevents to zfs core for commands

Authored by: Dave Eddy <dave@daveeddy.com>
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Joshua M. Clulow <jmc@joyent.com>
Reviewed by: Josh Wilsdon <jwilsdon@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Alan Somers <asomers@gmail.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/6939
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ce1577b
Closes #6328

6 years agoFixed VERIFY3_IMPL() bug from 682ce104
Tom Caputi [Thu, 13 Jul 2017 00:15:24 +0000 (20:15 -0400)]
Fixed VERIFY3_IMPL() bug from 682ce104

When VERIFY3_IMPL() was adjusted in 682ce104, the values of
the operands were omitted from the variadic arguments list.
This patch simply corrects this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #6343

6 years agoAdd port of FreeBSD 'volmode' property
LOLi [Wed, 12 Jul 2017 20:05:37 +0000 (22:05 +0200)]
Add port of FreeBSD 'volmode' property

The volmode property may be set to control the visibility of ZVOL
block devices.

This allow switching ZVOL between three modes:
   full - existing fully functional behaviour (default)
   dev  - hide partitions on ZVOL block devices
   none - not exposing volumes outside ZFS

Additionally the new zvol_volmode module parameter can be used to
control the default behaviour.

This functionality can be used, for instance, on "backup" pools to
avoid cluttering /dev with unneeded zd* devices.

Original-patch-by: mav <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
FreeBSD-commit: https://github.com/freebsd/freebsd/commit/dd28e6bb
Closes #1796
Closes #3438
Closes #6233

6 years agoOpenZFS 5428 - provide fts(), reallocarray(), and strtonum()
Yuri Pankov [Tue, 13 Jun 2017 03:16:28 +0000 (20:16 -0700)]
OpenZFS 5428 - provide fts(), reallocarray(), and strtonum()

Authored by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
* All hunks unrelated to ZFS were dropped.

OpenZFS-issue: https://www.illumos.org/issues/5428
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4585130
Closes #6326

6 years agoExit test-runner with non-zero if tests are KILLED
Giuseppe Di Natale [Sat, 8 Jul 2017 00:07:40 +0000 (17:07 -0700)]
Exit test-runner with non-zero if tests are KILLED

fe46eeb introduced non-zero exit codes to test-runner.
A non-zero exit code should be returned when test-runner
decided to kill a test and mark it as KILLED.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6325

6 years agoFix chattr_001_pos
LOLi [Fri, 7 Jul 2017 22:45:29 +0000 (00:45 +0200)]
Fix chattr_001_pos

Commands should be eval()ed if they involve a shell redirection,
otherwise we end up writing log_* functions messages to the output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6300
Closes #6323

6 years agoOpenZFS 8126 - ztest assertion failed in dbuf_dirty due to dn_nlevels changing
Matthew Ahrens [Mon, 20 Mar 2017 22:38:11 +0000 (15:38 -0700)]
OpenZFS 8126 - ztest assertion failed in dbuf_dirty due to dn_nlevels changing

The sync thread is concurrently modifying dn_phys->dn_nlevels
while dbuf_dirty() is trying to assert something about it, without
holding the necessary lock. We need to move this assertion further down
in the function, after we have acquired the dn_struct_rwlock.

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8126
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/0ef125d
Closes #6314

6 years agoOpenZFS 8067 - zdb should be able to dump literal embedded block pointer
Matthew Ahrens [Mon, 1 May 2017 18:06:07 +0000 (11:06 -0700)]
OpenZFS 8067 - zdb should be able to dump literal embedded block pointer

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8067
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/8173085
Closes #6319

6 years agoPrevent dependencies on Debianized packages
Antonio Russo [Fri, 7 Jul 2017 17:45:17 +0000 (13:45 -0400)]
Prevent dependencies on Debianized packages

Call dpkg-shlibdeps with arguments excluding the Debianized packages
lib{uutil1,nvpair1,zfs2,zpool2}linux from the auto-generated
dependencies of generated .debs. A shim dh_shlibdeps that calls the
real dh_shlibdeps with corresponding arguments is installed into a
temporary directory, which is in turn pre-pended to the PATH for the
alien call, working around alien's inability to directly alter the
dependencies of its output debs. Resolves #6106.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #6309
Closes #6106

6 years agoFix 'zpool clear' on readonly pools
LOLi [Fri, 7 Jul 2017 17:39:53 +0000 (19:39 +0200)]
Fix 'zpool clear' on readonly pools

Illumos 4080 inadvertently allows 'zpool clear' on readonly pools: fix
this by reintroducing a check (POOL_CHECK_READONLY) in zfs_ioc_clear
registration code.

Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6306

6 years agoImplemented zpool scrub pause/resume
Alek P [Fri, 7 Jul 2017 05:16:13 +0000 (22:16 -0700)]
Implemented zpool scrub pause/resume

Currently, there is no way to pause a scrub. Pausing may
be useful when the pool is busy with other I/O to preserve
bandwidth.

This patch adds the ability to pause and resume scrubbing.
This is achieved by maintaining a persistent on-disk scrub state.
While the state is 'paused' we do not scrub any more blocks.
We do however perform regular scan housekeeping such as
freeing async destroyed and deadlist blocks while paused.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: Serapheim Dimitropoulos <serapheimd@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes #6167

6 years agoReschedule processes on -ERESTARTSYS
Arkadiusz Bubała [Thu, 6 Jul 2017 15:38:24 +0000 (17:38 +0200)]
Reschedule processes on -ERESTARTSYS

On the single core machine the system may hang when the
spa_namespare_lock acquisition fails in the zvol_first_open
function. It returns -ERESTARTSYS error what causes the
endless loop in __blkdev_get function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Arkadiusz Bubała <arkadiusz.bubala@open-e.com>
Closes #6283
Closes #6312

6 years agoZTS: replace su commands by run_user function
George Melikov [Wed, 5 Jul 2017 17:46:52 +0000 (20:46 +0300)]
ZTS: replace su commands by run_user function

Needed for PATH variable to be passed into su.  The
posix* tests were fixed, but they need further investigation
before they can be enabled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #6303

6 years agoMusl libc fixes
alaviss [Wed, 5 Jul 2017 17:39:13 +0000 (00:39 +0700)]
Musl libc fixes

Musl libc's <stdio.h> doesn't include <stdarg.h>, which cause
`va_start` and `va_end` end up being undefined symbols.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Leorize <alaviss@users.noreply.github.com>
Closes #6310

6 years agoClang fixes
alaviss [Wed, 5 Jul 2017 17:38:20 +0000 (00:38 +0700)]
Clang fixes

Clang doesn't support `/` as comment in assembly, this patch replaces
them with `#`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Leorize <alaviss@users.noreply.github.com>
Closes #6311

6 years agoOpenZFS 8378 - crash due to bp in-memory modification of nopwrite block
Matthew Ahrens [Fri, 14 Apr 2017 19:59:18 +0000 (12:59 -0700)]
OpenZFS 8378 - crash due to bp in-memory modification of nopwrite block

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
The problem is that zfs_get_data() supplies a stale zgd_bp to
dmu_sync(), which we then nopwrite against.
zfs_get_data() doesn't hold any DMU-related locks, so after it
copies db_blkptr to zgd_bp, dbuf_write_ready() could change
db_blkptr, and dbuf_write_done() could remove the dirty record.
dmu_sync() then sees the stale BP and that the dbuf it not dirty,
so it is eligible for nop-writing.
The fix is for dmu_sync() to copy db_blkptr to zgd_bp after
acquiring the db_mtx. We could still see a stale db_blkptr,
but if it is stale then the dirty record will still exist and
thus we won't attempt to nopwrite.

OpenZFS-issue: https://www.illumos.org/issues/8378
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3127742
Closes #6293

6 years agoOpenZFS 7600 - zfs rollback should pass target snapshot to kernel
Andriy Gapon [Sat, 11 Mar 2017 18:26:47 +0000 (20:26 +0200)]
OpenZFS 7600 - zfs rollback should pass target snapshot to kernel

Authored by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
The existing kernel-side code only provides a method to rollback to a
latest snapshot, whatever it happens to be at the time when the rollback
is actually done.  That could be unsafe or confusing in environments
where concurrent DSL changes are possible as the resulting state could
correspond to a newer or older snapshot than the originally requested
one.
This change allows to amend that method such that the rollback is
performed only when the latest snapshot has a specific name.  That is,
if a new snapshot is concurrently created or the target snapshot is
destroyed, then no rollback is done and EXDEV error is returned.
New libzfs_core function lzc_rollback_to() is provided for the new
functionality.  libzfs is changed to use lzc_rollback_to() to implement
zfs rollback command.
Perhaps we should return different errors to distinguish the case where
the desired snapshot exists but it's not the latest snapshot and the
case where the desired snapshot does not exist.

OpenZFS-issue: https://www.illumos.org/issues/7600
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3d645eb
Closes #6292

6 years agoOpenZFS 7910 - l2arc_write_buffers() may write beyond target_sz
Andriy Gapon [Sat, 11 Mar 2017 17:48:35 +0000 (19:48 +0200)]
OpenZFS 7910 - l2arc_write_buffers() may write beyond target_sz

Authored by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/7910
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/cb6af4b
Closes #6291

6 years agoOpenZFS 8418 - zfs_prop_get_table() call in zfs_validate_name() is a no-op
Marcel Telka [Thu, 22 Jun 2017 13:30:49 +0000 (15:30 +0200)]
OpenZFS 8418 - zfs_prop_get_table() call in zfs_validate_name() is a no-op

Authored by: Marcel Telka <marcel@telka.sk>
Reviewed by: Vitaliy Gusev <gusev.vitaliy@icloud.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8418
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/e09ba01
Closes #6305

6 years agoZTS: minor typo and old default values
George Melikov [Mon, 3 Jul 2017 21:21:12 +0000 (00:21 +0300)]
ZTS: minor typo and old default values

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #6298

6 years agoOn failure tests-runner should do non-zero exit
Alek P [Fri, 30 Jun 2017 18:14:26 +0000 (14:14 -0400)]
On failure tests-runner should do non-zero exit

Right now test runner will always exit(0).
It's helpful to have zfs-tests.sh provide different
exit values depending on if everything passed or not.
We can then use common shell cmds to run tests until failure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes #6285

6 years agoPrint fail messages before callbacks in test suite
Giuseppe Di Natale [Fri, 30 Jun 2017 18:12:29 +0000 (11:12 -0700)]
Print fail messages before callbacks in test suite

Reorder operations in _endlog so failure messages get
printed prior to performing callbacks and cleanup. This
helps clarify why a test failed and places the message
closer to the point of incident in the resulting logs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6281

6 years agoOpenZFS 8430 - dir_is_empty_readdir() doesn't properly handle error from fdopendir()
Sowrabha Gopal [Thu, 1 Jun 2017 20:27:02 +0000 (13:27 -0700)]
OpenZFS 8430 - dir_is_empty_readdir() doesn't properly handle error from fdopendir()

Authored by: Sowrabha Gopal <sowrabha.gopal@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
dir_is_empty_readdir() immediately returns if fdopendir() fails.
We should close dirfd when that happens.

OpenZFS-issue: https://www.illumos.org/issues/8430
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/e165e20
Closes #6289

6 years agoOpenZFS 8416 - abd.h is not C++ friendly
Andriy Gapon [Wed, 21 Jun 2017 20:47:54 +0000 (23:47 +0300)]
OpenZFS 8416 - abd.h is not C++ friendly

Authored by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8416
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/589c189
Closes #6288

6 years agoOpenZFS 8426 - mark immutable buffer arguments as such in abd.h
Andriy Gapon [Mon, 26 Jun 2017 10:46:45 +0000 (13:46 +0300)]
OpenZFS 8426 - mark immutable buffer arguments as such in abd.h

Authored by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8426
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/37359a6
Closes #6287

6 years agoOpenZFS 8377 - Panic in bookmark deletion
Matthew Ahrens [Fri, 14 Apr 2017 19:52:43 +0000 (12:52 -0700)]
OpenZFS 8377 - Panic in bookmark deletion

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
The problem is that when dsl_bookmark_destroy_check() is
executed from open context (the pre-check), it fills in
dbda_success based on the existence of the bookmark. But
the bookmark (or containing filesystem as in this case)
can be destroyed before we get to syncing context. When
we re-run dsl_bookmark_destroy_check() in syncing context,
it will not add the deleted bookmark to dbda_success,
intending for dsl_bookmark_destroy_sync() to not process
it. But because the bookmark is still in dbda_success from
the open-context call, we do try to destroy it.
The fix is that dsl_bookmark_destroy_check() should not
modify dbda_success when called from open context.

OpenZFS-issue: https://www.illumos.org/issues/8377
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/b0b6fe3
Closes #6286

6 years agoClean up large dnode code
Matthew Ahrens [Thu, 29 Jun 2017 17:18:03 +0000 (10:18 -0700)]
Clean up large dnode code

Resolves issues discovered when porting to OpenZFS.

* Lint warnings.
* Made dnode_move_impl() large dnode aware.  This
  functionality is currently unused on Linux.

Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #6262

6 years agoSet arc_meta_limit, arc_dnode_limit on change
chrisrd [Thu, 29 Jun 2017 16:57:27 +0000 (02:57 +1000)]
Set arc_meta_limit, arc_dnode_limit on change

Make zfs_arc_meta_limit_percent and zfs_arc_dnode_limit_percent behave
as you would expect from zfs-module-parameters.5.

- recalculate arc_meta_limit if zfs_arc_meta_limit_percent changes
- recalculate arc_dnode_limit if zfs_arc_dnode_limit_percent changes
- correctly set arc_meta_limit and arc_dnode_limit if zfs_arc_max or
  zfs_arc_meta_min changes

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes #6269

6 years agoConvert man zfs.8 to mdoc (OpenZFS sync)
Brian Behlendorf [Thu, 29 Jun 2017 16:55:30 +0000 (09:55 -0700)]
Convert man zfs.8 to mdoc (OpenZFS sync)

* Fixed some typos
* Additional description for some commands arguments
* Text reworked to be in sync with OpenZFS
* Added Linux as .Os type

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6282

6 years agoGCC 7.1 fixes
Tony Hutter [Wed, 28 Jun 2017 17:05:16 +0000 (10:05 -0700)]
GCC 7.1 fixes

GCC 7.1 with will warn when we're not checking the snprintf()
return code in cases where the buffer could be truncated. This
patch either checks the snprintf return code (where applicable),
or simply disables the warnings (ztest.c).

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #6253

6 years agoConvert man zpool.8 to mdoc (OpenZFS sync)
George Melikov [Sun, 18 Jun 2017 18:27:06 +0000 (21:27 +0300)]
Convert man zpool.8 to mdoc (OpenZFS sync)

* Fixed some typos
* Additional description for some commands arguments
* `listsnapshots` remained
* Text reworked to be in sync with OpenZFS
* Added Linux as .Os type
* Updated `zpool events` section.
* Updated `zpool iostat|status -c` sections
* Added zed(8) reference to SEE ALSO

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #6245

6 years agoFix RHEL 7.4 bio_set_op_attrs build error
Tony Hutter [Tue, 27 Jun 2017 19:00:27 +0000 (12:00 -0700)]
Fix RHEL 7.4 bio_set_op_attrs build error

On RHEL 7.4, include/linux/bio.h now includes a macro for
bio_set_op_attrs that conflicts with the ifndef in ZFS
include/linux/blkdev_compat.h.  This patch fixes the build.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #6234
Closes #6271

6 years agoCap maximum aggregate IO size
Brian Behlendorf [Tue, 27 Jun 2017 17:09:16 +0000 (10:09 -0700)]
Cap maximum aggregate IO size

Commit 8542ef8 allowed optional IOs to be aggregated beyond
the specified aggregation limit.  Since the aggregation limit
was also used to enforce the maximum block size, setting
`zfs_vdev_aggregation_limit=16777216` could result in an
attempt to allocate an ABD larger than 16M.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6259
Closes #6270

6 years agoFix zpool_add_005_pos
bunder2015 [Tue, 27 Jun 2017 17:06:07 +0000 (13:06 -0400)]
Fix zpool_add_005_pos

Under Linux the existence of a block device in /etc/fstab is
not sufficient to prevent the use of the force flag.  Without
the force flag a warning will be printed that the device has
a filesystem of a given type.  Providing the force option
will overwrite that filesystem as long as it is not actively
mounted.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Tested-by: bunder2015 <omfgbunder@gmail.com>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #6267
Closes #6272

6 years agoRefine use of zv_state_lock.
Boris Protopopov [Tue, 13 Jun 2017 16:03:44 +0000 (12:03 -0400)]
Refine use of zv_state_lock.

Use zv_state_lock to protect all members of zvol_state structure, add
relevant ASSERT()s. Take zv_suspend_lock before zv_state_lock, do not
hold zv_state_lock across suspend/resume.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Closes #6226

6 years agoOpenZFS 5220 - L2ARC does not support devices that do not provide 512B access
Giuseppe Di Natale [Tue, 27 Jun 2017 00:32:43 +0000 (17:32 -0700)]
OpenZFS 5220 - L2ARC does not support devices that do not provide 512B access

Authored by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/5220
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/403a8da
Closes #6260

6 years agoOpenZFS 8264 - want support for promoting datasets in libzfs_core
Giuseppe Di Natale [Mon, 26 Jun 2017 23:56:09 +0000 (16:56 -0700)]
OpenZFS 8264 - want support for promoting datasets in libzfs_core

Authored by: Andrew Stormont <astormont@racktopsystems.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@kebe.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8264
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/a4b8c9a
Closes #6254

6 years agoFix arithmetic error message in zfs_clone_010_pos
bunder2015 [Mon, 26 Jun 2017 21:48:54 +0000 (17:48 -0400)]
Fix arithmetic error message in zfs_clone_010_pos

zfs_clone_010_pos.ksh: line 234: ZFS_MAXPROPLEN: arithmetic syntax error

Reviewed-by: Kash Pande <kash@tripleback.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #6268

6 years agoCall cv_signal() with mutex held
Boris Protopopov [Mon, 26 Jun 2017 21:36:49 +0000 (17:36 -0400)]
Call cv_signal() with mutex held

In bqueue_dequeue(), call cv_signal() with bq_lock held.
Re-enable rsend_009_pos to test the fix.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Closes #5887

6 years agoOpenZFS 8331 - zfs_unshare returns wrong error code for smb unshare failure
Andrew Stormont [Mon, 12 Jun 2017 16:56:09 +0000 (17:56 +0100)]
OpenZFS 8331 - zfs_unshare returns wrong error code for smb unshare failure

Authored by: Andrew Stormont <astormont@racktopsystems.com>
Reviewed by: Marcel Telka <marcel@telka.sk>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8331
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4f4378c
Closes #6255

6 years agoDashes for zero latency values in zpool iostat -p
Tony Hutter [Thu, 22 Jun 2017 16:39:01 +0000 (09:39 -0700)]
Dashes for zero latency values in zpool iostat -p

This prints dashes instead of zeros for zero latency values in
'zpool iostat -p'.  You'll get zero latencies reported when the
disk is idle, but technically a zero latency is invalid, since you
can't measure the latency of doing nothing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #6210

6 years agoAdd kpreempt_disable/enable around CPU_SEQID uses
Morgan Jones [Mon, 19 Jun 2017 16:43:16 +0000 (16:43 +0000)]
Add kpreempt_disable/enable around CPU_SEQID uses

In zfs/dmu_object and icp/core/kcf_sched, the CPU_SEQID macro
should be surrounded by `kpreempt_disable` and `kpreempt_enable`
calls to avoid a Linux kernel BUG warning.  These code paths use
the cpuid to minimize lock contention and is is safe to reschedule
the process to a different processor at any time.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Morgan Jones <me@numin.it>
Closes #6239

6 years agoInject zinject(8) a percentage amount of dev errs
Don Brady [Sat, 17 Jun 2017 00:21:11 +0000 (18:21 -0600)]
Inject zinject(8) a percentage amount of dev errs

In the original form of device error injection, it was an all or nothing
situation.  To help simulate intermittent error conditions, you can now
specify a real number percentage value. This is also very useful for our
ZFS fault diagnosis testing and for injecting intermittent errors during
load testing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@intel.com>
Closes #6227

6 years agoProvide links to info about ZFS Buildbot options
Giuseppe Di Natale [Fri, 16 Jun 2017 00:52:18 +0000 (17:52 -0700)]
Provide links to info about ZFS Buildbot options

Add links for information about the ZFS buildbot options
to the contributing guidelines and PR template.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6235

6 years agoAvoid 'queue not locked' warning at pool import.
Boris Protopopov [Wed, 14 Jun 2017 20:18:36 +0000 (16:18 -0400)]
Avoid 'queue not locked' warning at pool import.

Use queue_flag_set_unlocked() in zvol_alloc().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Issue #6226

6 years agoFix zvol_state_t->zv_open_count race
LOLi [Thu, 15 Jun 2017 18:08:45 +0000 (20:08 +0200)]
Fix zvol_state_t->zv_open_count race

5559ba0 added zv_state_lock to protect zvol_state_t internal data:
this, however, doesn't guard zv->zv_open_count and
zv->zv_disk->private_data in zvol_remove_minors_impl().

Fix this by taking zv->zv_state_lock before we check its zv_open_count.

P1 (z_zvol)                       P2 (systemd-udevd)
---                               ---
zvol_remove_minors_impl()
: zv->zv_open_count==0
                                  zvol_open()
                                  ->mutex_enter(zv_state_lock)
                                  : zv->zv_open_count++
                                  ->mutex_exit(zv_state_lock)
->mutex_enter(zv->zv_state_lock)
->zvol_remove(zv)
->mutex_exit(zv->zv_state_lock)
: zv->zv_disk->private_data = NULL
->zvol_free()
-->ASSERT(zv->zv_open_count==0) *
                                  zvol_release()
                                  : zv = disk->private_data
                                  ->ASSERT(zv && zv->zv_open_count>0) *
---                               ---
* ASSERT() fails

Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6213

6 years agoFix manual description of zfs_arc_dnode_limit
chrisrd [Wed, 14 Jun 2017 20:23:02 +0000 (06:23 +1000)]
Fix manual description of zfs_arc_dnode_limit

In arc_evict_state() we start pruning when arc_dnode_size >
arc_dnode_limit, i.e. arc_dnode_limit is a ceiling rather than a
floor.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes #6228

6 years agoFix zvol_init error handling
Richard Yao [Sat, 20 May 2017 18:01:55 +0000 (14:01 -0400)]
Fix zvol_init error handling

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@prophetstor.com>
6 years agoMake zvol operations use _by_dnode routines
Richard Yao [Tue, 13 Jun 2017 16:18:08 +0000 (12:18 -0400)]
Make zvol operations use _by_dnode routines

This continues what was started in
0eef1bde31d67091d3deed23fe2394f5a8bf2276 by fully converting zvols
to avoid unnecessary dnode_hold() calls. This saves a small amount
of CPU time and slightly improves latencies of operations on zvols.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@prophetstor.com>
Closes #6058

6 years agoFix zpool_import_all_001_pos
Giuseppe Di Natale [Tue, 13 Jun 2017 16:05:55 +0000 (09:05 -0700)]
Fix zpool_import_all_001_pos

Cleanup zpool_import_all_001_pos to no longer use devices.
The test is meant to test zpool import -a and by no longer
requiring devices, a number of dependencies are no longer
necessary.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6198

6 years agoReduce stack usage of dsl_dir_tempreserve_impl
DeHackEd [Mon, 12 Jun 2017 18:41:03 +0000 (14:41 -0400)]
Reduce stack usage of dsl_dir_tempreserve_impl

Buildbots and zfs-tests regularly see 7 kilobytes of stack
usage with this function. Convert self-calls to iterations

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes #6219

6 years agoUse log_must_busy in destroy_pool
Brian Behlendorf [Mon, 12 Jun 2017 16:45:32 +0000 (09:45 -0700)]
Use log_must_busy in destroy_pool

The log function log_must_busy was added in commit e623aea2 for
this purpose.  Update destroy_pool to use it.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6217

6 years agoAdd missing \n for "invalid optionusage" output
kpande [Fri, 9 Jun 2017 16:51:13 +0000 (12:51 -0400)]
Add missing \n for "invalid optionusage" output

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: DHE <git@dehacked.net>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Jack Draak <jackdraak@gmail.com>
Signed-off-by: Kash Pande <kash@tripleback.net>
Closes #6203

6 years agoOpenZFS 8056 - zfs send size estimate is inaccurate for some zvols
Paul Dagnelie [Thu, 7 Jul 2016 22:00:51 +0000 (15:00 -0700)]
OpenZFS 8056 - zfs send size estimate is inaccurate for some zvols

Authored by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Kash Pande <kash@tripleback.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
The send size estimate for a zvol can be too low, if the size of the
record headers (dmu_replay_record_t's) is a significant portion of the
size. This is typically the case when the data is highly compressible,
especially with embedded blocks.

The problem is that dmu_adjust_send_estimate_for_indirects() assumes
that blocks are the size of the "recordsize" property (128KB). However,
for zvols, the blocks are the size of the "volblocksize" property (8KB).
Therefore, we estimate that there will be 16x less record headers than
there really will be.

The fix is to check the type of the object set (whether it is a zvol or
not) and pick the appropriate property. In addition, while we are at it,
we also add the size of the BEGIN and END records to the estimate.

OpenZFS-issue: https://www.illumos.org/issues/8056
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/faf09cd
Closes #6205

6 years agoOpenZFS 8156 - dbuf_evict_notify() does not need dbuf_evict_lock
Matthew Ahrens [Tue, 28 Mar 2017 22:31:49 +0000 (15:31 -0700)]
OpenZFS 8156 - dbuf_evict_notify() does not need dbuf_evict_lock

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
dbuf_evict_notify() holds the dbuf_evict_lock while checking if it should
do the eviction itself (because the evict thread is not able to keep up).
This can result in massive lock contention.  It isn't necessary to hold
the lock, because if we make the wrong choice occasionally, nothing bad
will happen. This commit results in a ~60% performance improvement for
ARC-cached sequential reads.

OpenZFS-issue: https://www.illumos.org/issues/8156
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f73e5d9
Closes #6204

6 years agoOpenZFS 8199 - multi-threaded dmu_object_alloc()
Matthew Ahrens [Fri, 13 May 2016 04:16:36 +0000 (21:16 -0700)]
OpenZFS 8199 - multi-threaded dmu_object_alloc()

dmu_object_alloc() is single-threaded, so when multiple threads are
creating files in a single filesystem, they spend a lot of time waiting
for the os_obj_lock.  To improve performance of multi-threaded file
creation, we must make dmu_object_alloc() typically not grab any
filesystem-wide locks.

The solution is to have a "next object to allocate" for each CPU. Each
of these "next object"s is in a different block of the dnode object, so
that concurrent allocation holds dnodes in different dbufs.  When a
thread's "next object" reaches the end of a chunk of objects (by default
4 blocks worth -- 128 dnodes), it will be reset to the per-objset
os_obj_next, which will be increased by a chunk of objects (128).  Only
when manipulating the os_obj_next will we need to grab the os_obj_lock.
This decreases lock contention dramatically, because each thread only
needs to grab the os_obj_lock briefly, once per 128 allocations.

This results in a 70% performance improvement to multi-threaded object
creation (where each thread is creating objects in its own directory),
from 67,000/sec to 115,000/sec, with 8 CPUs.

Work sponsored by Intel Corp.

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
OpenZFS-issue: https://www.illumos.org/issues/8199
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/374
Closes #4703
Closes #6117

6 years agoOpenZFS 7578 - Fix/improve some aspects of ZIL writing
Giuseppe Di Natale [Fri, 9 Jun 2017 16:15:37 +0000 (09:15 -0700)]
OpenZFS 7578 - Fix/improve some aspects of ZIL writing

- After some ZIL changes 6 years ago zil_slog_limit got partially broken
due to zl_itx_list_sz not updated when async itx'es upgraded to sync.
Actually because of other changes about that time zl_itx_list_sz is not
really required to implement the functionality, so this patch removes
some unneeded broken code and variables.

 - Original idea of zil_slog_limit was to reduce chance of SLOG abuse by
single heavy logger, that increased latency for other (more latency critical)
loggers, by pushing heavy log out into the main pool instead of SLOG.  Beside
huge latency increase for heavy writers, this implementation caused double
write of all data, since the log records were explicitly prepared for SLOG.
Since we now have I/O scheduler, I've found it can be much more efficient
to reduce priority of heavy logger SLOG writes from ZIO_PRIORITY_SYNC_WRITE
to ZIO_PRIORITY_ASYNC_WRITE, while still leave them on SLOG.

 - Existing ZIL implementation had problem with space efficiency when it
has to write large chunks of data into log blocks of limited size.  In some
cases efficiency stopped to almost as low as 50%.  In case of ZIL stored on
spinning rust, that also reduced log write speed in half, since head had to
uselessly fly over allocated but not written areas.  This change improves
the situation by offloading problematic operations from z*_log_write() to
zil_lwb_commit(), which knows real situation of log blocks allocation and
can split large requests into pieces much more efficiently.  Also as side
effect it removes one of two data copy operations done by ZIL code WR_COPIED
case.

 - While there, untangle and unify code of z*_log_write() functions.
Also zfs_log_write() alike to zvol_log_write() can now handle writes crossing
block boundary, that may also improve efficiency if ZPL is made to do that.

Sponsored by:   iXsystems, Inc.

Authored by: Alexander Motin <mav@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Steven Hartland <steven.hartland@multiplay.co.uk>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <ryao@gentoo.org>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/7578
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/aeb13ac
Closes #6191

6 years agoOpenZFS 8155 - simplify dmu_write_policy handling of pre-compressed buffers
Matthew Ahrens [Thu, 23 Mar 2017 16:07:27 +0000 (09:07 -0700)]
OpenZFS 8155 - simplify dmu_write_policy handling of pre-compressed buffers

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
When writing pre-compressed buffers, arc_write() requires that
the compression algorithm used to compress the buffer matches
the compression algorithm requested by the zio_prop_t, which is
set by dmu_write_policy(). This makes dmu_write_policy() and its
callers a bit more complicated.

We simplify this by making arc_write() trust the caller to supply
the type of pre-compressed buffer that it wants to write,
and override the compression setting in the zio_prop_t.

OpenZFS-issue: https://www.illumos.org/issues/8155
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/b55ff58
Closes #6200

6 years agoAdd MS_MANDLOCK mount failure message
Brian Behlendorf [Wed, 7 Jun 2017 17:59:44 +0000 (10:59 -0700)]
Add MS_MANDLOCK mount failure message

Commit torvalds/linux@9e8925b6 allowed for kernels to be built
without support for mandatory locking (MS_MANDLOCK).  This will
result in 'zfs mount' failing when the nbmand=on property is set
if the kernel is built without CONFIG_MANDATORY_FILE_LOCKING.

Unfortunately we can not reliably detect prior to the mount(2) system
call if the kernel was built with this support.  The best we can do
is check if the mount failed with EPERM and if we passed 'mand'
as a mount option and then print a more useful error message. e.g.

  filesystem 'tank/fs' has the 'nbmand=on' property set, this mount
  option may be disabled in your kernel.  Use 'zfs set nbmand=off'
  to disable this option and try to mount the filesystem again.

Additionally, switch the default error message case to use
strerror() to produce a more human readable message.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4729
Closes #6199

6 years agoSkip tests that are slow on 32-bit builders
Giuseppe Di Natale [Wed, 7 Jun 2017 02:04:01 +0000 (22:04 -0400)]
Skip tests that are slow on 32-bit builders

zpool_create_024_pos, zvol_misc_002_pos, write_dirs_002_pos are slow
on the buildbot 32-bit builder. Skip the test cases for now on 32-bit
builders.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6195

6 years agoReduce async_destroy_001_pos memory requirements
Brian Behlendorf [Tue, 6 Jun 2017 18:30:47 +0000 (11:30 -0700)]
Reduce async_destroy_001_pos memory requirements

The number of blocks which can be freed per TXG is controlled
by the zfs_free_max_blocks module option (defaults to 100,000).
Both speed up this test case and reduce the memory requirements
by only creating 4 TXGs worth of blocks to be freed.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #5479
Closes #6192

6 years agoAllow add of raidz and mirror with same redundancy
Håkan Johansson [Mon, 5 Jun 2017 20:53:09 +0000 (22:53 +0200)]
Allow add of raidz and mirror with same redundancy

Allow new members to be added to a pool mixing raidz and mirror vdevs
without giving -f, as long as they have matching redundancy.  This case
was missed in #5915, which only handled zpool create.

Add zfstest zpool_add_010_pos.ksh, with test of zpool create
followed by zpool add of mixed raidz and mirror vdevs.

Add some more mixed raidz and mirror cases to zpool_create_006_pos.ksh.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Haakan Johansson <f96hajo@chalmers.se>
Issue #5915
Closes #6181

6 years agoLinux 4.9 compat: fix zfs_ctldir xattr handling
LOLi [Mon, 5 Jun 2017 18:26:25 +0000 (20:26 +0200)]
Linux 4.9 compat: fix zfs_ctldir xattr handling

Since torvalds/linux@d0a5b99 IOP_XATTR is used to indicate the inode
has xattr support: clear it for the ctldir inodes to avoid EIO errors.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6189

6 years agozpool iostat/status -c improvements
Giuseppe Di Natale [Mon, 5 Jun 2017 17:52:15 +0000 (13:52 -0400)]
zpool iostat/status -c improvements

Users can now provide their own scripts to be run
with 'zpool iostat/status -c'. User scripts should be
placed in ~/.zpool.d to be included in zpool's
default search path.

Provide a script which can be used with
'zpool iostat|status -c' that will return the type of
device (hdd, sdd, file).

Provide a script to get various values from smartctl
when using 'zpool iostat/status -c'.

Allow users to define the ZPOOL_SCRIPTS_PATH
environment variable which can be used to override
the default 'zpool iostat/status -c' search path.

Allow the ZPOOL_SCRIPTS_ENABLED environment
variable to enable or disable 'zpool status/iostat -c'
functionality.

Use the new smart script to provide the serial command.

Install /etc/sudoers.d/zfs file which contains the sudoer
rule for smartctl as a sample.

Allow 'zpool iostat/status -c' tests to run in tree.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6121
Closes #6153