git.proxmox.com Git - mirror_zfs.git/log
mirror_zfs.git
7 years ago  Increase zfs_vdev_async_write_min_active to 2
DHE [Sun, 26 Mar 2017 02:36:28 +0000 (22:36 -0400)]
Increase zfs_vdev_async_write_min_active to 2

Resilver operations frequently cause only a small amount of dirty data
to be written to disk at a time, so the IO scheduler issues only
1 write at a time to the resilvering disk. On rotational media the
drive will often travel past the next sector to be written before
receiving a write command from ZFS, significantly delaying the
write of the next sector.

Raise zfs_vdev_async_write_min_active so that drives are kept fed
during resilvering.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Issue #4825
Closes #5926

7 years ago  OpenZFS 8061 - sa_find_idx_tab can be declared more type-safely
Matthew Ahrens [Thu, 13 Apr 2017 21:38:16 +0000 (14:38 -0700)]
OpenZFS 8061 - sa_find_idx_tab can be declared more type-safely

Authored by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>

sa_find_idx_tab() is declared as taking and returning "void *" parameters.
These can be declared to be the specific types.

OpenZFS-issue: https://www.illumos.org/issues/8061
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4e64aff
Closes #6017

7 years ago  OpenZFS 7900 - zdb shouldn't print the path of a znode at verbosity < 5
Alan Somers [Thu, 13 Apr 2017 21:22:32 +0000 (14:22 -0700)]
OpenZFS 7900 - zdb shouldn't print the path of a znode at verbosity < 5

Authored by: Alan Somers <asomers@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>

There are two reasons:
1) Finding a znode's path is slower than printing any other znode
   information at verbosity < 5.
2) On a corrupted pool like the one mentioned below, zdb will crash when it
   tries to determine the znode's path. But with this patch, zdb can still
   extract useful information from such pools.

OpenZFS-issue: https://www.illumos.org/issues/7900
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/2b0dee1
Closes #6016

7 years ago  OpenZFS 6101 - attempt to lzc_create() a filesystem under a volume results in a panic
Andriy Gapon [Thu, 13 Apr 2017 21:32:08 +0000 (14:32 -0700)]
OpenZFS 6101 - attempt to lzc_create() a filesystem under a volume results in a panic

Authored by: Andriy Gapon <avg@FreeBSD.org>
Approved by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>

When querying ZPL properties, verify that the objset is of type
DMU_OST_ZFS.

OpenZFS-issue: https://www.illumos.org/issues/6101
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ce2243a
Closes #6015

7 years ago  OpenZFS 8026 - retire zfs_throttle_delay and zfs_throttle_resolution
Andriy Gapon [Fri, 14 Apr 2017 05:42:15 +0000 (01:42 -0400)]
OpenZFS 8026 - retire zfs_throttle_delay and zfs_throttle_resolution

Authored by: Andriy Gapon <avg@FreeBSD.org>
Approved by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8026
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/9b33e07
Closes #6014

7 years ago  SEEK_HOLE should not block on txg_wait_synced()
Debabrata Banerjee [Fri, 24 Mar 2017 21:28:38 +0000 (17:28 -0400)]
SEEK_HOLE should not block on txg_wait_synced()

Forcing a flush of txgs can be painfully slow when competing for disk
IO, since this is a process meant to execute asynchronously. Optimize
this path by allowing data/hole seeking if the file is clean, and
falling back to the old logic if it is dirty. This is a compromise
short of disabling the feature entirely.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Closes #4306
Closes #5962

7 years ago  OpenZFS 6410 - teach zdb to perform object lookups by path
Brian Behlendorf [Thu, 13 Apr 2017 16:40:56 +0000 (09:40 -0700)]
OpenZFS 6410 - teach zdb to perform object lookups by path

Authored by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Dan McDonald <danmcd@omniti.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
- Replaced zdb.8 with upstream mdoc zdb.1m version.  Updated to
  include Linux specific features: -V verbatim imports and
  improved label printing (-u, and -l).
- Minor changes to `zdb -h` output to honor 80 character limit.

OpenZFS-issue: https://www.illumos.org/issues/6410
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ed61ec1
Closes #6006

7 years ago  OpenZFS 5120 - zfs should allow large block/gzip/raidz boot pool (loader project)
Brian Behlendorf [Thu, 13 Apr 2017 16:40:00 +0000 (09:40 -0700)]
OpenZFS 5120 - zfs should allow large block/gzip/raidz boot pool (loader project)

Authored by: Toomas Soome <tsoome@me.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Don Brady <don.brady@intel.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
- grub-2.02-beta2-422-gcad5cc0 includes support for large blocks.
- Commit 8aab121 allowed GZIP[1-9].
- Grub allows pools with multiple top-level vdevs.

OpenZFS-issue: https://www.illumos.org/issues/5120
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c8811bd
Closes #6007

7 years ago  Invalidate cache during a zpool labelclear
Giuseppe Di Natale [Wed, 12 Apr 2017 22:49:31 +0000 (15:49 -0700)]
Invalidate cache during a zpool labelclear

Be sure to invalidate a vdev's cache before performing
a zpool labelclear. There are cases where the cache is
stale because we did some operation that bypassed it,
and since we are doing an open with only O_RDWR, we
should invalidate it to be safe.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6009

7 years ago  OpenZFS 7503 - zfs-test should tail ::zfs_dbgmsg on test failure
Brian Behlendorf [Wed, 12 Apr 2017 20:36:48 +0000 (13:36 -0700)]
OpenZFS 7503 - zfs-test should tail ::zfs_dbgmsg on test failure

Authored by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
- Enable internal log for DEBUG builds and in zfs-tests.sh.
- callbacks/zfs_dbgmsg.ksh - Dump internal log via kstat.
- callbacks/zfs_dmesg.ksh - Dump dmesg log.
- default.cfg - 'Test Suite Specific Commands' dropped.

OpenZFS-issue: https://www.illumos.org/issues/7503
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/55a1300
Closes #6002

7 years ago  Fix header inclusions for standards conformance
Richard Yao [Sat, 8 Apr 2017 16:51:04 +0000 (12:51 -0400)]
Fix header inclusions for standards conformance

musl's sys/errno.h is literally:

#warning redirecting incorrect #include <sys/errno.h> to <errno.h>
#include <errno.h>

It does the same for sys/{poll,signal}.h. This is rather noisy when
building ZoL against musl. musl is also correct in pointing out that the
correct headers are outside of sys/ according to the Single UNIX
Specification:

http://pubs.opengroup.org/onlinepubs/7908799/xsh/errno.h.html
http://pubs.opengroup.org/onlinepubs/7908799/xsh/poll.h.html
http://pubs.opengroup.org/onlinepubs/7908799/xsh/signal.h.html

Let's implement our own sys/* versions of these headers to redirect to
the proper userland ones when building in userspace. That will silence
the warning.
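
A redirecting header of this kind can be very small. The following is a
minimal sketch, not the header added by this commit; the include guard
name is an assumption made for illustration.

```c
/*
 * Hypothetical userspace replacement for sys/errno.h.
 * The guard name EXAMPLE_SYS_ERRNO_H is illustrative only.
 */
#ifndef EXAMPLE_SYS_ERRNO_H
#define EXAMPLE_SYS_ERRNO_H

/* Pull in the standard header directly, so no #warning is emitted. */
#include <errno.h>

#endif /* EXAMPLE_SYS_ERRNO_H */
```

Code that still includes the sys/ path then sees the same errno and
E* constants as code that includes <errno.h> directly.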

There are also some instances where we include incorrectly from sys/ or
from outside of sys/ in userspace-only code. In these instances, let's
just fix the includes directly.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #5993

7 years ago  Fix `zpool iostat -T d 1` on musl
Richard Yao [Sun, 9 Apr 2017 19:00:03 +0000 (15:00 -0400)]
Fix `zpool iostat -T d 1` on musl

When building on Gentoo against musl, GCC complains:

timestamp.c: In function ‘print_timestamp’:
timestamp.c:32:19: warning: passing argument 1 of ‘nl_langinfo’ makes
integer from pointer without a cast
 #define _DATE_FMT "%+"
                   ^
timestamp.c:47:21: note: in expansion of macro ‘_DATE_FMT’
   fmt = nl_langinfo(_DATE_FMT);
                     ^
The error was wrapped to meet comment style requirements.

This code is used by `zpool iostat -T d 1` to print a date and upon
testing it, I see no date printed. Let's use D_T_FMT so that something
gets printed and if D_T_FMT is not available, then we can fall back to
"%+".

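The fallback described above can be sketched as follows. This is an
illustration under stated assumptions, not the code from timestamp.c:
the helper names date_format() and format_timestamp() are hypothetical.
nl_langinfo() takes an nl_item constant such as D_T_FMT, which is why
passing the string "%+" to it produced the warning quoted above.

```c
#include <langinfo.h>
#include <stddef.h>
#include <time.h>

/*
 * Pick a strftime() format: the locale's date-and-time format when
 * D_T_FMT is available, otherwise the literal "%+" fallback.
 */
const char *
date_format(void)
{
#ifdef D_T_FMT
    return (nl_langinfo(D_T_FMT));
#else
    return ("%+");
#endif
}

/* Format time `t` into `buf`; returns the number of bytes written. */
size_t
format_timestamp(char *buf, size_t len, time_t t)
{
    return (strftime(buf, len, date_format(), localtime(&t)));
}
```

A caller would pass a buffer of around 128 bytes and print the result
only when the return value is nonzero.
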
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #5993

7 years ago  Add missing includes to zed_log.c
Richard Yao [Sat, 8 Apr 2017 17:14:14 +0000 (13:14 -0400)]
Add missing includes to zed_log.c

GCC 4.9.4 complains about implicit function declarations when building
against musl on Gentoo.

zed_log.c: In function ‘zed_log_pipe_open’:
zed_log.c:69:7: warning: implicit declaration of function ‘getpid’
       (int)getpid());
       ^
zed_log.c:71:2: warning: implicit declaration of function ‘pipe’
  if (pipe(_ctx.pipe_fd) < 0)
  ^
zed_log.c: In function ‘zed_log_pipe_close_reads’:
zed_log.c:90:2: warning: implicit declaration of function ‘close’
  if (close(_ctx.pipe_fd[0]) < 0)
  ^
zed_log.c: In function ‘zed_log_pipe_wait’:
zed_log.c:141:3: warning: implicit declaration of function ‘read’
   n = read(_ctx.pipe_fd[0], &c, sizeof (c));

The [-Wimplicit-function-declaration] at the end of each warning has
been removed to meet comment style requirements.

The man pages say to include <sys/types.h> and <unistd.h>. Doing that
silences the warnings.
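
As an illustration, the small function below uses the same four calls
named in the warnings and compiles without implicit declarations once
both headers are included. It is a sketch only; pipe_roundtrip() is a
hypothetical name and is not part of zed_log.c.

```c
#include <sys/types.h>  /* pid_t */
#include <unistd.h>     /* getpid(), pipe(), close(), read(), write() */

/*
 * Round-trip one byte through a pipe using the calls named in the
 * warnings. Returns 0 on success, -1 on any failure.
 */
int
pipe_roundtrip(void)
{
    int fd[2];
    char out = 'x', in = 0;
    int ok;

    if (getpid() <= 0 || pipe(fd) < 0)
        return (-1);

    ok = (write(fd[1], &out, sizeof (out)) == 1 &&
        read(fd[0], &in, sizeof (in)) == 1 && in == out);

    if (close(fd[0]) < 0 || close(fd[1]) < 0)
        return (-1);

    return (ok ? 0 : -1);
}
```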

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #5993

7 years ago  OpenZFS 7535 - need test for resumed send of top most filesystem
Brian Behlendorf [Wed, 12 Apr 2017 15:47:42 +0000 (08:47 -0700)]
OpenZFS 7535 - need test for resumed send of top most filesystem

Authored by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
- zfs_share_001_pos.ksh - Older versions of exportfs will match
  multiple exports that share a common prefix.  Reorder the 'fs'
  list so unshares occur from most to least unique.
- zfs_share_005_pos.ksh - Enabled and updated for Linux.

OpenZFS-issue: https://www.illumos.org/issues/7535
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ac89d1e
Closes #5979

7 years ago  Skip rate limiting events in zfs_ereport_post
Giuseppe Di Natale [Wed, 12 Apr 2017 01:37:45 +0000 (18:37 -0700)]
Skip rate limiting events in zfs_ereport_post

In zfs_ereport_post, if an event is a rate limiting
event, immediately return before any processing is done.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5998

7 years ago  OpenZFS 6865 - want zfs-tests cases for zpool labelclear command
Yuri Pankov [Fri, 13 Jan 2017 17:25:15 +0000 (09:25 -0800)]
OpenZFS 6865 - want zfs-tests cases for zpool labelclear command

Authored by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
- Updated 'zpool labelclear' and 'zdb -l' such that they attempt
  to find a vdev given solely its short name.  This behavior is
  consistent with the upstream OpenZFS code and the test cases
  depend on it.  The actual implementation differs slightly due
  to device naming conventions on Linux.
- auto_online_001_pos, auto_replace_001_pos and add-o_ashift
  test cases updated to expect failure when no label exists.
- read_efi_label() and zpool_label_disk_check() are read-only
  operations and should use O_RDONLY at open time to enforce this.
- zpool_label_disk() and zpool_relabel_disk() write the partition
  information using O_DIRECT, fsync(), and page cache invalidation
  to ensure a consistent view of the device.
- dump_label() in zdb should invalidate the page cache in order
  to get the authoritative label from disk.

OpenZFS-issue: https://www.illumos.org/issues/6865
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c95076c
Closes #5981

7 years ago  Fix size inflation in spa_get_worst_case_asize()
LOLi [Mon, 10 Apr 2017 22:28:21 +0000 (00:28 +0200)]
Fix size inflation in spa_get_worst_case_asize()

When we try to assign a new transaction to a TXG we must know beforehand
if there is sufficient free space on disk. This is to decide,
in dmu_tx_assign(), if we should reject the TX with ENOSPC.

We rely on spa_get_worst_case_asize() to inflate the size of our
logical writes by a factor of spa_asize_inflation which is
calculated as:

   (VDEV_RAIDZ_MAXPARITY + 1) * SPA_DVAS_PER_BP * 2 == 24

The problem with the current implementation is that we don't take
into account what happens with very small writes on VDEVs with large
physical block sizes.
Consider the case of writes to a dataset with recordsize=512,
copies=3 on a VDEV with ashift=13 (usually an SSD with 8K block size):
every logical IO will end up allocating 3 * 8K = 24K on disk, that is
512 bytes multiplied by 48, which is double the size we account for.
If we allow this kind of write to be assigned a TX it is possible,
when the pool is almost full, to trigger an allocation failure
(ENOSPC) in the ZIO pipeline, which will in turn result in the whole
pool being suspended.

The bug is fixed by using, in spa_get_worst_case_asize(), the MAX()
value chosen between the logical io size from zfs_write() and the
maximum physical block size used among our VDEVs.
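
The arithmetic above can be checked with a short sketch. The constants
are the ones quoted in this message; the function names and the
max_ashift parameter are assumptions for illustration and are not the
actual spa code.

```c
#include <stdint.h>

/* Constants as quoted in the commit message. */
#define VDEV_RAIDZ_MAXPARITY 3
#define SPA_DVAS_PER_BP 3
#define ASIZE_INFLATION ((VDEV_RAIDZ_MAXPARITY + 1) * SPA_DVAS_PER_BP * 2)

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Old accounting: inflate the logical size only. */
uint64_t
worst_case_asize_old(uint64_t lsize)
{
    return (lsize * ASIZE_INFLATION);
}

/* Sketch of the fix: never account for less than one physical block. */
uint64_t
worst_case_asize_new(uint64_t lsize, unsigned max_ashift)
{
    return (MAX(lsize, UINT64_C(1) << max_ashift) * ASIZE_INFLATION);
}
```

With lsize=512 the old estimate is 12K, below the 24K that
copies=3 on ashift=13 really allocates. The new estimate with
max_ashift=13 is 192K, which covers it.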

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #5941

7 years ago  OpenZFS 8005 - poor performance of 1MB writes on certain RAID-Z configurations
Matthew Ahrens [Mon, 10 Apr 2017 22:21:45 +0000 (15:21 -0700)]
OpenZFS 8005 - poor performance of 1MB writes on certain RAID-Z configurations

Authored by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Don Brady <don.brady@intel.com>
Ported-by: Matt Ahrens <mahrens@delphix.com>

RAID-Z requires that space be allocated in multiples of P+1 sectors,
because this is the minimum size block that can have the required amount
of parity.  Thus blocks on RAIDZ1 must be allocated in a multiple of 2
sectors; on RAIDZ2 a multiple of 3; and on RAIDZ3 a multiple of 4.  A
sector is a unit of 2^ashift bytes, typically 512B or 4KB.

To satisfy this constraint, the allocation size is rounded up to the
proper multiple, resulting in up to 3 "pad sectors" at the end of some
blocks.  The contents of these pad sectors are not used, so we do not
need to read or write these sectors.  However, some storage hardware
performs much worse (around 1/2 as fast) on mostly-contiguous writes
when there are small gaps of non-overwritten data between the writes.
Therefore, ZFS creates "optional" zio's when writing RAID-Z blocks that
include pad sectors.  If writing a pad sector will fill the gap between
two (required) writes, we will issue the optional zio, thus doubling
performance.  The gap-filling performance improvement was introduced in
July 2009.
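
The rounding rule and the resulting pad sectors can be sketched as
below. This is a simplified model written for this note, assuming
psize > 0 and one parity sector per parity disk for every row of data;
the function names are hypothetical and this is not the vdev_raidz.c
implementation.

```c
#include <stdint.h>

/*
 * Sectors allocated for a block of `psize` bytes on a RAID-Z group with
 * `cols` disks, `nparity` of which hold parity: data sectors, plus
 * parity for each row, rounded up to a multiple of (nparity + 1).
 */
uint64_t
raidz_alloc_sectors(uint64_t psize, unsigned ashift, unsigned cols,
    unsigned nparity)
{
    uint64_t data = ((psize - 1) >> ashift) + 1;
    uint64_t rows = (data + (cols - nparity) - 1) / (cols - nparity);
    uint64_t total = data + nparity * rows;
    uint64_t unit = nparity + 1;

    return (((total + unit - 1) / unit) * unit);
}

/* Number of pad sectors added by the final round-up. */
uint64_t
raidz_pad_sectors(uint64_t psize, unsigned ashift, unsigned cols,
    unsigned nparity)
{
    uint64_t data = ((psize - 1) >> ashift) + 1;
    uint64_t rows = (data + (cols - nparity) - 1) / (cols - nparity);

    return (raidz_alloc_sectors(psize, ashift, cols, nparity) -
        (data + nparity * rows));
}
```

For a 1MB block on a 10-wide RAIDZ2 this model gives 2 pad sectors
with 512B sectors and 1 pad sector with 4KB sectors.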

Writing the optional zio is done by the io aggregation code in
vdev_queue.c.  The problem is that it is also subject to the limit on
the size of aggregate writes, zfs_vdev_aggregation_limit, which is by
default 128KB.  For a given block, if the amount of data plus padding
written to a leaf device exceeds zfs_vdev_aggregation_limit, the
optional zio will not be written, resulting in a ~2x performance
degradation.

The problem occurs only for certain values of ashift, compressed block
size, and RAID-Z configuration (number of parity and data disks).  It
cannot occur with the default recordsize=128KB.  If compression is
enabled, all configurations with recordsize=1MB or larger will be
impacted to some degree.

The problem notably occurs with recordsize=1MB, compression=off, with 10
disks in a RAIDZ2 or RAIDZ3 group (with 512B or 4KB sectors).  Therefore
this problem has been known as "the 1MB 10-wide RAIDZ2 (or 3) problem".
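
A rough way to see why the 10-wide RAIDZ2 case is hit: 1MB spread over
8 data disks is exactly 128KB per disk, so adding one pad sector pushes
the write past the default limit. The sketch below is a deliberately
simplified model (it assumes the block divides evenly across the data
disks); the names are hypothetical and the real aggregation logic in
vdev_queue.c is more involved.

```c
#include <stdint.h>

#define AGGREGATION_LIMIT (128 * 1024) /* default quoted above */

/* Bytes of one block that land on a single data disk (even split). */
uint64_t
per_disk_bytes(uint64_t psize, unsigned data_disks)
{
    return (psize / data_disks);
}

/* Would appending one pad sector push the write past the limit? */
int
pad_exceeds_limit(uint64_t psize, unsigned data_disks, unsigned ashift)
{
    uint64_t with_pad = per_disk_bytes(psize, data_disks) +
        (UINT64_C(1) << ashift);

    return (with_pad > AGGREGATION_LIMIT);
}
```

With the default recordsize=128KB the per-disk share stays far below
the limit, which matches the statement that the problem cannot occur
there.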

The problem also occurs with the following configurations:

With recordsize=512KB or 256KB, compression=off, the problem occurs only
in rarely-used configurations:
* 4-wide RAIDZ1 with recordsize=512KB and ashift=12 (4KB sectors)
* 4-wide RAIDZ2 (either recordsize, either ashift)
* 5-wide RAIDZ2 with recordsize=512KB (either ashift)
* 6-wide RAIDZ2 with recordsize=512KB (either ashift)

With recordsize=1MB, compression=off, ashift=9 (512B sectors)
* RAIDZ1 with 4 or 8 disks
* RAIDZ2 with 4, 8, or 10 disks
* RAIDZ3 with 6, 8, 9, or 10 disks

With recordsize=1MB, compression=off, ashift=12 (4KB sectors)
* RAIDZ1 with 7 or 8 disks
* RAIDZ2 with 4, 5, or 10 disks
* RAIDZ3 with 6, 9, or 10 disks

With recordsize=2MB and larger (which can only be selected by changing
kernel tunables), many configurations are affected, including with
higher numbers of disks (up to 18 disks with recordsize=2MB).

As a workaround, zfs_vdev_aggregation_limit can be increased to allow
the optional zio to be aggregated, thus avoiding the problem.  Setting
it to 256KB fixes all commonly-used configurations.

The solution is to aggregate optional zio's regardless of the
aggregation size limit.

Analysis sponsored by Intel Corp.

OpenZFS-issue: https://www.illumos.org/issues/8005
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/321
Closes #5931

7 years ago  OpenZFS 2932 - support crash dumps to raidz, etc. pools
Giuseppe Di Natale [Thu, 6 Apr 2017 15:25:47 +0000 (08:25 -0700)]
OpenZFS 2932 - support crash dumps to raidz, etc. pools

Authored by: Bill Pijewski <wdp@joyent.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@nexenta.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/2932
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/810e43b
Closes #5984
Closes #5216

7 years ago  zfstest - replace dircmp with diff
George Melikov [Sun, 9 Apr 2017 23:17:55 +0000 (03:17 +0400)]
zfstest - replace dircmp with diff

`dircmp` doesn't exist on Linux, while `diff` is already used
by zfstests on all platforms.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #5996

7 years ago  zfstest reservation_009_pos.sh missed backslash
George Melikov [Sun, 9 Apr 2017 23:15:44 +0000 (03:15 +0400)]
zfstest reservation_009_pos.sh missed backslash

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #5997

7 years ago  OpenZFS 8023 - Panic destroying a metaslab deferred range tree
George Wilson [Fri, 7 Apr 2017 20:50:18 +0000 (13:50 -0700)]
OpenZFS 8023 - Panic destroying a metaslab deferred range tree

Authored by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>

We don't want to dirty any data when we're in the final txgs of the pool
export logic. This change introduces checks to make sure that no data is
dirtied after a certain point. It also addresses the culprit of this
specific bug – the space map cannot be upgraded when we're in final
stages of pool export. If we encounter a space map that wants to be
upgraded in this phase, then we simply ignore the request as it will get
retried the next time we set the fragmentation metric on that metaslab.

OpenZFS-issue: https://www.illumos.org/issues/8023
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/2ef00f5
Closes #5991

7 years ago  OpenZFS 5380 - receive of a send -p stream doesn't need to try renaming snapshots
Andriy Gapon [Fri, 7 Apr 2017 20:54:29 +0000 (13:54 -0700)]
OpenZFS 5380 - receive of a send -p stream doesn't need to try renaming snapshots

Authored by: Andriy Gapon <avg@FreeBSD.org>
Approved by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>

recv_incremental_replication() takes care of things like removing
datasets that have been removed on the sending side, detecting renamed
datasets, and ensuring that all datasets in the affected hierarchy have
the same properties as their counterparts on the sending side.
None of the above is necessary if we are receiving a stream for a
single dataset that has been generated with zfs send -p, that is, a
stream that includes properties.  zfs_receive_one() already takes care
of applying the properties to the received datasets.

OpenZFS-issue: https://www.illumos.org/issues/5380
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/b8ab927
Closes #5990

7 years ago  OpenZFS 8046 - Let calloc() do the multiplication in libzfs_fru_refresh
Pedro Giffuni [Fri, 7 Apr 2017 20:36:06 +0000 (13:36 -0700)]
OpenZFS 8046 - Let calloc() do the multiplication in libzfs_fru_refresh

Authored by: Pedro Giffuni <pfg@freebsd.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8046
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3a3c0d5
Closes #5989
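
The commit carries no description beyond its title. The general idea of
letting calloc() do the multiplication can be sketched as below;
alloc_array() is a hypothetical helper and not the libzfs_fru_refresh
code. Passing the count and element size separately lets calloc()
detect an overflowing product, which a hand-written n * size inside
a single-argument allocation would silently wrap.

```c
#include <stdlib.h>

/*
 * Zeroed array of `n` elements of `size` bytes each. calloc() itself
 * checks the n * size product and fails instead of wrapping.
 */
void *
alloc_array(size_t n, size_t size)
{
    return (calloc(n, size));
}
```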

7 years ago  OpenZFS 8027 - tighten up dsl_pool_dirty_delta
Andriy Gapon [Fri, 7 Apr 2017 20:52:26 +0000 (13:52 -0700)]
OpenZFS 8027 - tighten up dsl_pool_dirty_delta

Authored by: Andriy Gapon <avg@FreeBSD.org>
Approved by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8027
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/642668d
Closes #5988

7 years ago  zfs_receive_010_pos.ksh local => typeset change
George Melikov [Sun, 9 Apr 2017 23:01:54 +0000 (03:01 +0400)]
zfs_receive_010_pos.ksh local => typeset change

Ksh uses `typeset`; `local` is the Bash analog.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #5995

7 years ago  zfstests cli_user/misc/setup.ksh space missed
George Melikov [Sun, 9 Apr 2017 23:00:43 +0000 (03:00 +0400)]
zfstests cli_user/misc/setup.ksh space missed

Ksh syntax requires a space after `!` in an if statement.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #5994

7 years ago  OpenZFS 7404 - rootpool_007_neg, bootfs_006_pos and bootfs_008_neg tests fail with...
Toomas Soome [Sat, 3 Dec 2016 07:13:44 +0000 (23:13 -0800)]
OpenZFS 7404 - rootpool_007_neg, bootfs_006_pos and bootfs_008_neg tests fail with the loader project bits

Authored by: Toomas Soome <tsoome@me.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Marcel Telka <marcel@telka.sk>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
- Removed gzip and zle compression restriction on bootfs
  datasets.  Grub added support for these long ago.  Any
  version of grub which understands lz4 also supports this.
- Enabled rootpool tests in runfile but skipped by default
  in setup on Linux since they modify the rootpool.
- bootfs_006_pos.ksh, striped pools are allowed as bootfs.

OpenZFS-issue: https://www.illumos.org/issues/7404
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/55a424c
Closes #5982

7 years ago  OpenZFS 7629 - Fix for 7290 neglected to remove some escape sequences
Brian Behlendorf [Fri, 7 Apr 2017 16:30:05 +0000 (09:30 -0700)]
OpenZFS 7629 - Fix for 7290 neglected to remove some escape sequences

Authored by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
- Multiple changes in this commit were applied in c1d9abf.

OpenZFS-issue: https://www.illumos.org/issues/7629
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f5fb56d
Closes #5980

7 years ago  Correct shellcheck make recipe
Giuseppe Di Natale [Fri, 7 Apr 2017 00:16:41 +0000 (17:16 -0700)]
Correct shellcheck make recipe

Consolidated the shellcheck call in the
make recipe down to a single call of
shellcheck. Corrected script errors that
have been skipped. Corrected script errors
that have been introduced because make
wasn't reporting any errors from shellcheck.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5976

7 years ago  Skip xfstests on Amazon Linux
Brian Behlendorf [Fri, 7 Apr 2017 00:15:30 +0000 (17:15 -0700)]
Skip xfstests on Amazon Linux

The ZFS-enabled version of xfstests fails to build cleanly on
Amazon Linux.  This issue should be resolved by rebasing the ZFS
patches against the latest xfstests and pushing those patches
upstream.  This would allow us to use an unmodified xfstests.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #5481
Closes #5977

7 years ago  Fix coverity defects: CID 161288
Giuseppe Di Natale [Thu, 6 Apr 2017 20:18:22 +0000 (13:18 -0700)]
Fix coverity defects: CID 161288

CID 161288:  Null pointer dereferences  (REVERSE_INULL)

Ensure physpath != NULL before the strcmp.
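
Coverity's REVERSE_INULL check flags a pointer that is dereferenced
before it is tested for NULL. The ordering the fix calls for can be
sketched as below; physpath_matches() is a hypothetical helper written
for this note, not the function touched by the commit.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 only when `physpath` is non-NULL and equals `expected`. */
int
physpath_matches(const char *physpath, const char *expected)
{
    /* Test for NULL first, so strcmp() never sees a NULL pointer. */
    if (physpath == NULL || expected == NULL)
        return (0);

    return (strcmp(physpath, expected) == 0);
}
```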

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5974

7 years ago  OpenZFS 7290 - ZFS test suite needs to control what utilities it can run
John Wren Kennedy [Thu, 6 Apr 2017 00:18:22 +0000 (20:18 -0400)]
OpenZFS 7290 - ZFS test suite needs to control what utilities it can run

Authored by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: George Melikov <mail@gmelikov.ru>
Porting Notes:
- Utilities which aren't available under Linux have been removed.
- Because of sudo's default secure path behavior PATH must be
  explicitly reset at the top of libtest.shlib.  This avoids the
  need for all users to customize secure path on their system.
- Updated ZoL infrastructure to manage constrained path
- Updated all test cases
- Check permissions for usergroup tests
- When testing in-tree create links under bin/
- Update fault cleanup such that missing files during
  cleanup aren't fatal.
- Configure su environment with constrained path

OpenZFS-issue: https://www.illumos.org/issues/7290
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1d32ba6
Closes #5903

7 years ago  Added auto-replace FMA test for the ZFS Test Suite
Sydney Vanda [Thu, 2 Mar 2017 16:47:26 +0000 (09:47 -0700)]
Added auto-replace FMA test for the ZFS Test Suite

Also included are updates to the auto-online test.

Automated auto-replace test to go along with ZED FMA integration
(PR 4673). auto-replace_001.pos works using a scsi_debug device
(currently the only usable virtual device, because the whole_disk
variable needs to be set).

Functionality for automated FMA auto-replace test to work with
scsi_debug devs:  Some functionality/exceptions needed to be
added for automation of auto-replace to work correctly.

In the test an alias vdev_id rule is added for any scsi_debug
device which sets the phys_path="scsidebug" after a udevadm
trigger command.

A symlink is created for the vdev_id.conf file (in /etc/zfs/ by
default) to be used in-tree for the test suite
(/var/tmp/zfs/vdev_id.conf).  "./scripts/zfs-helpers.sh -i" needs
to be run before fault tests in the ZTS (to use udev rules in-tree)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: David Quigley <david.quigley@intel.com>
Signed-off-by: Sydney Vanda <sydney.m.vanda@intel.com>
Closes #5944

7 years ago  Accept raidz and mirror with similar redundancy
Håkan Johansson [Wed, 5 Apr 2017 22:21:13 +0000 (00:21 +0200)]
Accept raidz and mirror with similar redundancy

Allow a pool to be created with both raidz and mirror members,
without giving -f, as long as they have matching redundancy.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Haakan T Johansson <f96hajo@chalmers.se>
Closes #5915

7 years ago  Fix regression in zfs_ereport_start()
Don Brady [Wed, 5 Apr 2017 21:24:26 +0000 (15:24 -0600)]
Fix regression in zfs_ereport_start()

On 32-bit platforms spa_state is 32 bits without cast, and thus
caused a NULL pointer dereference when treated as 64bit in
var arg.  Accidentally introduced by bcdb96a.
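
The class of bug described above can be sketched in a few lines.  The
variadic function and the struct below are hypothetical stand-ins, not the
zfs_ereport_start() code; the point is only that a value read with
va_arg(ap, uint64_t) must have been passed as a 64-bit value.

```c
#include <stdarg.h>
#include <stdint.h>

/*
 * Hypothetical variadic consumer that reads every argument as a
 * 64-bit value.
 */
uint64_t
sum_u64_args(int count, ...)
{
    va_list ap;
    uint64_t total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, uint64_t);
    va_end(ap);
    return (total);
}

/* Hypothetical stand-in for a structure with a 32-bit state field. */
struct pool {
    uint32_t state;
};

uint64_t
report_state(const struct pool *p)
{
    /*
     * Without the cast, a 32-bit argument would be passed while the
     * callee reads 64 bits, which is undefined behavior and can
     * misalign every argument that follows.
     */
    return (sum_u64_args(1, (uint64_t)p->state));
}
```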

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Signed-off-by: Don Brady <don.brady@intel.com>
Closes #5966
Closes #5965

7 years agoFix coverity defects: CID 161264
Giuseppe Di Natale [Wed, 5 Apr 2017 20:21:10 +0000 (13:21 -0700)]
Fix coverity defects: CID 161264

CID 161264:  Uninitialized variables  (UNINIT)

In _zed_event_add_nvpair, when handling DATA_TYPE_UINT64,
we should be using i64 throughout the entire case.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@intel.com>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5964

7 years agoOpenZFS 7885 - zpool list can report 16.0e for expandsz
Steven Hartland [Mon, 3 Apr 2017 23:38:51 +0000 (16:38 -0700)]
OpenZFS 7885 - zpool list can report 16.0e for expandsz

Authored by: Steven Hartland <steven.hartland@multiplay.co.uk>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>

When a member of a RAIDZ has been replaced with a device smaller than
the original, then the top level vdev can report its expand size as
16.0E.
The reduced child asize causes the RAIDZ to have a vdev_asize lower than
its vdev_max_asize which then results in an underflow during the
calculation of the parent's expand size.
Fix this by updating the vdev_asize if it shrinks, which is already
protected by a check against vdev_min_asize so should always be safe.
Also for RAIDZ vdevs, ensure that the sum of their child vdev_min_asize
is always greater than the parent's vdev_min_size.
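
The 16.0E figure is what an unsigned 64-bit subtraction looks like after it
wraps: 2^64 bytes is 16 EiB.  The sketch below only demonstrates that
wraparound with hypothetical function names; the guarded variant is shown
for contrast and is not the fix made by this commit, which updates
vdev_asize instead.

```c
#include <stdint.h>

/* Naive difference: wraps to a huge value when cur_size > max_size. */
uint64_t
expand_size_naive(uint64_t max_size, uint64_t cur_size)
{
    return (max_size - cur_size);
}

/* Guarded difference: reports no expandable space instead of wrapping. */
uint64_t
expand_size_guarded(uint64_t max_size, uint64_t cur_size)
{
    return (max_size > cur_size ? max_size - cur_size : 0);
}
```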

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: George Melikov <mail@gmelikov.ru>
OpenZFS-issue: https://www.illumos.org/issues/7885
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/bb0dbaa
Closes #5963

7 years agolist -o props should be alloc,free not used,avail
Tom Matthews [Tue, 4 Apr 2017 18:03:33 +0000 (19:03 +0100)]
list -o props should be alloc,free not used,avail

Manpage suggests the zpool list properties include 'used'
and 'available', when these are invalid property names.
Use alloc and free in their place.

```
$ zpool list -o name,size,used   2>&1 |head -1
bad property list: invalid property 'used'
$ zpool list -o name,size,avail   2>&1 |head -1
bad property list: invalid property 'avail'
$ zpool list -o name,size,available   2>&1 |head -1
bad property list: invalid property 'available'
$ zpool list -o name,size,alloc,free
NAME    SIZE  ALLOC   FREE
apool   464M   203M   261M
bpool  3.62T  1.97T  1.65T
```

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Matthews <tom@axiom-partners.com>
Closes #5959

7 years agoAdditional Information for Zedlets
N Clark [Mon, 3 Apr 2017 21:23:02 +0000 (17:23 -0400)]
Additional Information for Zedlets

* Add ZPOOL pool state to zfs_post_common to
  allow differentiation between export and destroy
  by zedlets.

* Add pool name as standard export.  This ensures
  pool name is exported to zedlets.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Closes #5942

7 years agoPrevent commitcheck.sh from running twice
Giuseppe Di Natale [Mon, 3 Apr 2017 21:20:01 +0000 (14:20 -0700)]
Prevent commitcheck.sh from running twice

A stray semicolon was causing commitcheck.sh
to run twice when running make checkstyle.
Updated regexes for matching tagged lines.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5952

7 years agozfs_get_005_neg.ksh fix typos
George Melikov [Mon, 3 Apr 2017 18:06:04 +0000 (22:06 +0400)]
zfs_get_005_neg.ksh fix typos

`test_options_bookmark` function must have an `s` at the end.

Reviewed-by: Marcel Telka <marcel@telka.sk>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #5957

7 years agoCommit message format in contributing guidelines
Giuseppe Di Natale [Fri, 31 Mar 2017 16:33:38 +0000 (09:33 -0700)]
Commit message format in contributing guidelines

Add the need to have a commit message with a specific
format to the contributing guidelines. Provide a script
to help enforce commit message style.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5943

7 years agoglibc 2.5 compat: use correct header for makedev() et al.
Olaf Faaland [Fri, 31 Mar 2017 16:32:00 +0000 (09:32 -0700)]
glibc 2.5 compat: use correct header for makedev() et al.

In glibc 2.5, makedev(), major(), and minor() are defined in
sys/sysmacros.h.  They are also defined in types.h for backward
compatibility, but using these definitions triggers a compile warning.
This breaks the ZFS build, as it builds with -Werror.

autoconf email threads indicate these macros may be defined in
sys/mkdev.h in some cases.

This commit adds configure checks to detect where makedev() is defined:
  sys/sysmacros.h
  sys/mkdev.h

It assumes major() and minor() are defined in the same place.

The libspl types.h then includes
sys/sysmacros.h (preferred) or
sys/mkdev.h (2nd choice)
if one of those defines makedev().

This is done before including the system types.h.
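
The include order described above might look roughly like the following.
HAVE_MAKEDEV_IN_SYSMACROS and HAVE_MAKEDEV_IN_MKDEV are assumed names for
the configure results, and dev_matches() is a hypothetical helper; one of
the macros is defined by hand so the sketch compiles on a glibc system
without running configure.

```c
/* Normally emitted by configure; defined here only for the sketch. */
#ifndef HAVE_MAKEDEV_IN_SYSMACROS
#define HAVE_MAKEDEV_IN_SYSMACROS 1
#endif

#if defined(HAVE_MAKEDEV_IN_SYSMACROS)
#include <sys/sysmacros.h>  /* preferred */
#elif defined(HAVE_MAKEDEV_IN_MKDEV)
#include <sys/mkdev.h>      /* second choice */
#endif

#include <sys/types.h>      /* the system types.h comes afterwards */

/* Return nonzero if dev carries the given major/minor pair. */
int
dev_matches(dev_t dev, unsigned int maj, unsigned int min)
{
    return (major(dev) == maj && minor(dev) == min);
}
```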

An alternative would be to remove uses of major, minor, and makedev,
instead comparing the st_dev returned from stat64.  These configure
checks would then be unnecessary.

This change revealed that __NORETURN was being defined unnecessarily in
libspl/include/sys/sysmacros.h.  That definition is removed.

The files in which __NORETURN is used all include types.h, and so all
will get the definition provided by feature_tests.h.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #5945

7 years agoFix add-o_ashift.ksh permissions
Brian Behlendorf [Fri, 31 Mar 2017 16:25:23 +0000 (09:25 -0700)]
Fix add-o_ashift.ksh permissions

Test cases must be executable or they will be skipped.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5947

7 years agoRemove dependency on linear ABD
Gvozden Neskovic [Thu, 5 Jan 2017 19:10:07 +0000 (14:10 -0500)]
Remove dependency on linear ABD

Wherever possible it's best to avoid depending on a linear ABD.
Update the code accordingly in the following areas.

- vdev_raidz
- zio, zio_checksum
- zfs_fm
- change abd_alloc_for_io() to use abd_alloc()

Reviewed-by: David Quigley <david.quigley@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Closes #5668

7 years agoOpenZFS 7990 - libzfs: snapspec_cb() does not need to call zfs_strdup()
Giuseppe Di Natale [Wed, 29 Mar 2017 00:22:46 +0000 (17:22 -0700)]
OpenZFS 7990 - libzfs: snapspec_cb() does not need to call zfs_strdup()

Authored by: Marcel Telka <marcel@telka.sk>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/7990
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/d8584ba
Closes #5939

7 years agoCheck ashift validity in 'zpool add'
LOLi [Wed, 29 Mar 2017 00:21:11 +0000 (02:21 +0200)]
Check ashift validity in 'zpool add'

df83110 added the ability to specify a custom "ashift" value from the command
line in 'zpool add' and 'zpool attach'. This commit adds additional checks to
the provided ashift to prevent invalid values from being used, which could
result in disastrous consequences for the whole pool.

Additionally provide ASHIFT_MAX and ASHIFT_MIN definitions in spa.h.
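
A range check of this kind can be sketched as below.  The numeric bounds
and the treatment of 0 as "auto-detect" are assumptions made for the
example; consult spa.h for the values the project actually defines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bounds, for illustration only. */
#define ASHIFT_MIN 9    /* 2^9  = 512-byte sectors */
#define ASHIFT_MAX 13   /* 2^13 = 8 KiB sectors */

/* Hypothetical validator; 0 is assumed to mean "auto-detect". */
bool
ashift_is_valid(uint64_t ashift)
{
    if (ashift == 0)
        return (true);
    return (ashift >= ASHIFT_MIN && ashift <= ASHIFT_MAX);
}
```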

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #5878

7 years agoFix wrong offset args in vdev_cache_write
Chunwei Chen [Tue, 28 Mar 2017 18:06:22 +0000 (11:06 -0700)]
Fix wrong offset args in vdev_cache_write

The offset arguments were wrong after the change to abd_copy_off in a6255b7.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5932
Closes #5936

7 years agoFix "undefined reference to xdr_control" when building raidz_test cmd
Sen Haerens [Tue, 28 Mar 2017 17:47:50 +0000 (19:47 +0200)]
Fix "undefined reference to xdr_control" when building raidz_test cmd

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: SenH <sen@senhaerens.be>
Closes #5933

7 years agoDisable rsend_009_pos
Brian Behlendorf [Tue, 28 Mar 2017 16:58:23 +0000 (09:58 -0700)]
Disable rsend_009_pos

Test rsend_009_pos has been observed to fail pretty frequently
when testing using a kmemleak enabled kernel.  For the moment
disable this test case until the underlying issue is resolved.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #5887
Closes #5934

7 years agoUpdate documentation for new parameter "zfs_qat_disable"
wli5 [Mon, 27 Mar 2017 19:33:57 +0000 (03:33 +0800)]
Update documentation for new parameter "zfs_qat_disable"

Update documentation in zfs-module-parameters.5 for new
parameter "zfs_qat_disable" which was introduced by #5846.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Weigang Li <weigang.li@intel.com>
Closes #5914

7 years agoAllow c99 when building ZFS in the kernel tree
Brian Behlendorf [Mon, 27 Mar 2017 19:31:15 +0000 (12:31 -0700)]
Allow c99 when building ZFS in the kernel tree

Commit 4a5d7f82 enabled building c99 out of the kernel tree.
However, when building as part of the kernel different Makefiles
are used and -std=gnu99 must additionally be added there.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5919

7 years agoFix 'zdb -o' segmentation fault
LOLi [Fri, 24 Mar 2017 01:57:54 +0000 (02:57 +0100)]
Fix 'zdb -o' segmentation fault

Fix a regression accidentally introduced by OpenZFS 7280 in ed828c0: since
whether to accept NULL as a valid first parameter in strchr() is implementation
specific we add an additional check to avoid crashing.
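
The defensive pattern is simply to test the pointer before calling
strchr().  The helper below is hypothetical and is not the zdb code; it
splits a "name=value" option string as an example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the value part of a
 * "name=value" string, or NULL if there is no '=' or no string at all.
 */
const char *
option_value(const char *opt)
{
    const char *eq;

    if (opt == NULL)    /* never hand NULL to strchr() */
        return (NULL);
    eq = strchr(opt, '=');
    return (eq != NULL ? eq + 1 : NULL);
}
```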

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #5917

7 years agoRetry zfs_znode_alloc() in zfs_mknode()
Brian Behlendorf [Fri, 24 Mar 2017 01:26:50 +0000 (18:26 -0700)]
Retry zfs_znode_alloc() in zfs_mknode()

For historical reasons zfs_mknode() was written such that it could
never fail.  This poses a problem for Linux since zfs_znode_alloc()
could potentially fail due to low memory.  Handle this gracefully
by retrying zfs_znode_alloc() until it succeeds; direct reclaim
will eventually be able to allocate memory.
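
A retry-until-success loop of this shape can be sketched in user space as
follows.  alloc_retry() and flaky_alloc() are hypothetical names; the
latter is a test double that fails twice before deferring to malloc(), so
the loop has something to retry against.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator callback that may return NULL under pressure. */
typedef void *(*alloc_fn_t)(size_t size);

/*
 * Call alloc() until it succeeds.  The number of attempts is stored in
 * *attempts when that pointer is not NULL.
 */
void *
alloc_retry(alloc_fn_t alloc, size_t size, unsigned *attempts)
{
    void *p;
    unsigned n = 0;

    do {
        p = alloc(size);
        n++;
    } while (p == NULL);

    if (attempts != NULL)
        *attempts = n;
    return (p);
}

/* Test double: fails on its first two calls, then uses malloc(). */
void *
flaky_alloc(size_t size)
{
    static unsigned calls;

    if (calls++ < 2)
        return (NULL);
    return (malloc(size));
}
```

An unbounded loop like this is only reasonable when something else, such as
memory reclaim, is guaranteed to make progress in the meantime.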

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5535
Closes #5908

7 years agoFix undefined reference to `libzfs_fru_compare'
Brian Behlendorf [Fri, 24 Mar 2017 01:24:09 +0000 (18:24 -0700)]
Fix undefined reference to `libzfs_fru_compare'

Add trivial libzfs_fru_compare() function which can be used when
HAVE_LIBTOPO is not defined.  The only caller is find_vdev() and
this function should never be reached because search_fru must be
NULL unless HAVE_LIBTOPO is defined.

Rename _HAS_FMD_TOPO to existing HAVE_LIBTOPO which was
originally added for this purpose.  This macro will never be defined.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5402
Closes #5909

7 years agoOpenZFS 3821 - Race in rollback, zil close, and zil flush
George Wilson [Sun, 6 Nov 2016 03:43:56 +0000 (20:43 -0700)]
OpenZFS 3821 - Race in rollback, zil close, and zil flush

Authored by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Approved by: Richard Lowe <richlowe@richlowe.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: George Melikov <mail@gmelikov.ru>
OpenZFS-issue: https://www.illumos.org/issues/3821
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/43297f9
Closes #5905

7 years agoFix `zpool status -v` error message
Brian Behlendorf [Thu, 23 Mar 2017 01:08:55 +0000 (18:08 -0700)]
Fix `zpool status -v` error message

When a pool is suspended it's impossible to read the list
of damaged files from disk.  This would result in a generic
misleading "insufficient permissions" error message.

Update zpool_get_errlog() to use the standard zpool error
logging functions to generate a useful error message.  In
this case:

  errors: List of errors unavailable: pool I/O is currently suspended

This patch does not address the related issue of potentially
not being able to resume a suspended pool when the underlying
device names have changed.

Additionally, remove the error handling from zfs_alloc()
in zpool_get_errlog() for readability since this function
can never fail.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4031
Closes #5731
Closes #5907

7 years agoGZIP compression offloading with QAT accelerator
wli5 [Thu, 23 Mar 2017 00:58:47 +0000 (08:58 +0800)]
GZIP compression offloading with QAT accelerator

This patch implements hardware acceleration for GZIP compression
in ZFS. When GZIP compression is enabled on a ZFS pool, compression
calls are automatically offloaded to the hardware accelerator to
free up CPU resources and reduce compression time.

* To enable Intel QAT hardware acceleration in ZOL you need to have QAT
  hardware and the driver installed:
  * QAT hardware DH8950:
  http://ark.intel.com/products/79483/Intel-QuickAssist-Adapter-8950
  * QAT driver:
  https://01.org/intel-quickassist-technology
* Start QAT driver in your system:
  service qat_service start
* Enable QAT in ZFS, e.g.:
  ./configure --with-qat=<qat-driver-path>/QAT1.6
  make
* Set GZIP compression in ZFS dataset:
  zfs set compression=gzip <dataset>
* Get QAT hardware statistics by:
  cat /proc/spl/kstat/zfs/qat
* To disable QAT in ZFS:
  insmod zfs.ko zfs_qat_disable=1

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Weigang Li <weigang.li@intel.com>
Closes #5846

7 years agolibspl: Fix incorrect use of platform defines on sparc64
John Paul Adrian Glaubitz [Thu, 23 Mar 2017 00:55:00 +0000 (01:55 +0100)]
libspl: Fix incorrect use of platform defines on sparc64

libspl tries to detect sparc64 by checking whether __sparc64__
is defined. Unfortunately, this assumption is not correct as
sparc64 does not define __sparc64__ but it defines __sparc__
and __arch64__ instead. This leads to sparc64 being detected
as 32-Bit sparc and the build fails because both _ILP32 and
_LP64 are defined in this case.

To fix the problem, remove the checks for __sparc64__ and
just check __arch64__ if a sparc host was previously
detected with __sparc__.
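
The detection order can be sketched with the preprocessor as below.  The
data-model macros are spelled DEMO_LP64 and DEMO_ILP32, and the function
name is hypothetical, so the example does not clash with definitions the
compiler or system headers may already provide.

```c
/* Check __sparc__ first, then use __arch64__ to pick the data model. */
#if defined(__sparc__)
#if defined(__arch64__)
#define DEMO_LP64 1     /* 64-bit sparc */
#else
#define DEMO_ILP32 1    /* 32-bit sparc */
#endif
#endif

/* The failure mode described above: both models selected at once. */
#if defined(DEMO_LP64) && defined(DEMO_ILP32)
#error "both data models selected"
#endif

/* Report which branch the preprocessor took on this host. */
const char *
sparc_data_model(void)
{
#if defined(DEMO_LP64)
    return ("sparc LP64");
#elif defined(DEMO_ILP32)
    return ("sparc ILP32");
#else
    return ("not sparc");
#endif
}
```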

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Closes #5913

7 years agoOpenZFS 7968 - multi-threaded spa_sync()
Matthew Ahrens [Tue, 21 Mar 2017 01:36:00 +0000 (18:36 -0700)]
OpenZFS 7968 - multi-threaded spa_sync()

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Matthew Ahrens <mahrens@delphix.com>

spa_sync() iterates over all the dirty dnodes and processes each of them
by calling dnode_sync(). If there are many dirty dnodes (e.g. because we
created or removed a lot of files), the single thread of spa_sync()
calling dnode_sync() can become a bottleneck. Additionally, if many
dnodes are dirtied concurrently in open context (e.g. due to concurrent
file creation), the os_lock will experience lock contention via
dnode_setdirty().

The solution is to track dirty dnodes on a multilist_t, and for
spa_sync() to use separate threads to process each of the sublists in
the multilist.
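
The partitioning idea can be sketched as follows.  This is not the
multilist_t implementation: the types, the fixed sublist count, and the
modulo hash are assumptions, and both the per-sublist locks and the worker
threads are left out.  It shows only that each dirty node lands on exactly
one sublist, so each sublist can be drained independently.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_SUBLISTS 4  /* hypothetical fixed count */

struct dirty_node {
    uint64_t object;            /* object number, picks the sublist */
    struct dirty_node *next;
};

struct dirty_multilist {
    struct dirty_node *sublist[NUM_SUBLISTS];
};

/* Push a node onto the sublist selected by its object number. */
void
dirty_multilist_insert(struct dirty_multilist *ml, struct dirty_node *dn)
{
    unsigned idx = (unsigned)(dn->object % NUM_SUBLISTS);

    dn->next = ml->sublist[idx];
    ml->sublist[idx] = dn;
}

/*
 * Work for one sync thread: drain a single sublist and return how many
 * nodes were processed.  A real implementation would sync each node.
 */
size_t
dirty_multilist_sync_sublist(struct dirty_multilist *ml, unsigned idx)
{
    size_t n = 0;

    while (ml->sublist[idx] != NULL) {
        ml->sublist[idx] = ml->sublist[idx]->next;
        n++;
    }
    return (n);
}
```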

OpenZFS-issue: https://www.illumos.org/issues/7968
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4a2a54c
Closes #5752

7 years agoLinux 4.11 compat: iops.getattr and friends
Olaf Faaland [Tue, 21 Mar 2017 00:51:16 +0000 (17:51 -0700)]
Linux 4.11 compat: iops.getattr and friends

In torvalds/linux@a528d35, there are changes to the getattr family of functions,
struct kstat, and the interface of inode_operations .getattr.

The inode_operations .getattr and simple_getattr() interface changed to:

int (*getattr) (const struct path *, struct kstat *,
    u32 request_mask, unsigned int query_flags)

The request_mask argument indicates which field(s) the caller intends to use.
Fields the caller has not specified via request_mask may be set in the returned
struct anyway, but their values may be approximate.

The query_flags argument indicates whether the filesystem must update
the attributes from the backing store.

Currently both fields are ignored.  It is possible that getattr-related
functions within zfs could be optimized based on the request_mask.

struct kstat includes new fields:
u32               result_mask;  /* What fields the user got */
u64               attributes;   /* See STATX_ATTR_* flags */
struct timespec   btime;        /* File creation time */

Fields attributes and btime are cleared; the result_mask reflects this.  These
appear to be optional based on simple_getattr() and vfs_getattr() within the
kernel, which take the same approach.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #5875

7 years agozfs(8) fixes
DeHackEd [Mon, 20 Mar 2017 22:14:28 +0000 (18:14 -0400)]
zfs(8) fixes

Documentation fixes for zfs(8)

* White space issue in the userused@user property section
* zfs send supports using bookmarks as the origin snapshot

Reviewed by: Ned Bass <bass6@llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes #5906

7 years agoOpenZFS 7801 - add more by-dnode routines (lint)
Matthew Ahrens [Wed, 15 Mar 2017 12:49:59 +0000 (08:49 -0400)]
OpenZFS 7801 - add more by-dnode routines (lint)

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/7801
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f25efb3
Closes #5894

7 years agoOpenZFS 6874 - rollback and receive need to reset ZPL state to what's on disk
Matthew Ahrens [Wed, 11 May 2016 03:49:02 +0000 (20:49 -0700)]
OpenZFS 6874 - rollback and receive need to reset ZPL state to what's on disk

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

When we do a clone swap (caused by "zfs rollback" or "zfs receive"), the
ZPL doesn't completely reload the state from the DMU; some values remain
cached in the zfsvfs_t.

OpenZFS-issue: https://www.illumos.org/issues/6874
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1fdcbd0
Closes #5888

7 years agoAlign mount options handling and type/function names with OpenZFS
Brian Behlendorf [Mon, 13 Mar 2017 22:08:40 +0000 (15:08 -0700)]
Align mount options handling and type/function names with OpenZFS

Refactor the temporary mount option in a way which minimizes
differences with upstream.  Additionally, replace the zfs_sb_t
type with zfsvfs_t and rename several functions to be consistent
with the upstream names.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5876

7 years agoRestructure mount option handling
Brian Behlendorf [Thu, 9 Mar 2017 00:56:09 +0000 (19:56 -0500)]
Restructure mount option handling

Restructure the handling of mount options to be consistent with
upstream OpenZFS.  This required making the following changes.

- The zfs_mntopts_t was renamed vfs_t and adjusted to provide
  the minimal needed functionality.  This includes a pointer
  back to the associated zfsvfs_t.  Plus it made it possible
  to revert zfs_register_callbacks() and zfsvfs_create() back
  to their original prototypes.

- A zfs_mnt_t structure was added for the sole purpose of
  providing a structure to pass the osname and raw mount
  pointer to zfs_domount() without having to copy them.

- Mount option parsing was moved down from the zpl_* wrapper
  functions into the zfs_* functions.  This allowed for the
  code to be simplified and it's where similar functionality
  appears on other platforms.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

7 years agoRename zfs_* functions
Brian Behlendorf [Wed, 8 Mar 2017 22:56:19 +0000 (17:56 -0500)]
Rename zfs_* functions

Several functions were renamed when ZFS was originally ported to
Linux.  Revert the code to the original names to minimize the
delta with upstream OpenZFS.

  zfs_sb_teardown -> zfsvfs_teardown
  zfs_sb_create -> zfsvfs_create
  zfs_sb_setup -> zfsvfs_setup
  zfs_sb_free -> zfsvfs_free
  get_zfs_sb -> getzfsvfs
  zfs_sb_hold -> zfsvfs_hold
  zfs_sb_rele -> zfsvfs_rele

  zfs_sb_prune_aliases  -> zfs_prune_aliases (Linux-only)
  zfs_sb_prune -> zfs_prune (Linux only)

Align the zfs_vnops.h and zfs_vfsops.h with upstream as much
as possible.  Several prototypes were removed and those that
remain were reordered.

Move the EXPORT_SYMBOL lines to the end of the source files
for consistency with the other source files.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

7 years agoRename zfs_sb_t -> zfsvfs_t
Brian Behlendorf [Wed, 8 Mar 2017 00:21:37 +0000 (19:21 -0500)]
Rename zfs_sb_t -> zfsvfs_t

The use of zfs_sb_t instead of zfsvfs_t results in unnecessary
conflicts with the upstream source.  Change all instances of
zfs_sb_t to zfsvfs_t including updating the variable names.

Whenever possible the code was updated to be consistent with
how it appears in the upstream OpenZFS source.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

7 years agoFix ZVOL BLKFLSBUF ioctl
Brian Behlendorf [Fri, 10 Mar 2017 01:43:36 +0000 (17:43 -0800)]
Fix ZVOL BLKFLSBUF ioctl

The BLKFLSBUF ioctl is expected to do two things:

  - flush dirty pages to stable storage, and
  - invalidate clean pages

Unfortunately, the existing implementation of BLKFLSBUF in
zvol_ioctl() only flushes pages which are part of the current
TXG to disk.  There may be additional dirty pages in the
page cache which haven't yet been submitted to the DMU and
therefore aren't part of any TXG.

Furthermore because zvol_ioctl() returns 0 the generic
blkdev_flushbuf() does not invalidate the page cache.

Resolve the issue by moving bdev_flush() in to zvol_ioctl()
and explicitly waiting for a full TXG sync.  Then invalidate
the page cache.  The associated ARC buffers need not be
evicted since they cannot be bypassed using O_DIRECT.
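
From user space, the ioctl in question is issued as shown in this minimal
sketch.  flush_block_device() is a hypothetical helper, not part of ZFS;
BLKFLSBUF comes from the Linux header linux/fs.h and normally requires
administrative privileges to succeed.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* BLKFLSBUF */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Ask the kernel to flush and invalidate buffers for a block device.
 * Returns 0 on success, or -1 with errno set on failure.
 */
int
flush_block_device(const char *path)
{
    int fd, rc, saved;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return (-1);

    rc = ioctl(fd, BLKFLSBUF, 0);
    saved = errno;      /* close() may overwrite errno */
    (void) close(fd);
    errno = saved;
    return (rc);
}
```

A caller would pass the device node of the volume, for example a path
under /dev, and check the return value.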

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5871
Closes #5879

7 years agoSuppress cppcheck nullPointer error in zfs_write
Giuseppe Di Natale [Fri, 10 Mar 2017 01:40:21 +0000 (17:40 -0800)]
Suppress cppcheck nullPointer error in zfs_write

Newer versions of cppcheck find the potential NULL pointer
bug in zfs_write(). The function is difficult to refactor without
extensive work, so for now suppress the potential NULL pointer
error, which cannot occur.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5882

7 years agoCorrect arc_summary and dbufstat python style
Giuseppe Di Natale [Thu, 9 Mar 2017 18:21:59 +0000 (10:21 -0800)]
Correct arc_summary and dbufstat python style

arc_summary and dbufstat should have two spaces
after their last function definitions.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5881

7 years agoEnable shellcheck to run for select scripts
Giuseppe Di Natale [Thu, 9 Mar 2017 18:20:15 +0000 (10:20 -0800)]
Enable shellcheck to run for select scripts

Enable shellcheck to run on zed scripts,
paxcheck.sh, zfs-tests.sh, zfs.sh, and zloop.sh.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5812

7 years agoFix nfs snapdir automount
Chunwei Chen [Wed, 8 Mar 2017 17:26:33 +0000 (09:26 -0800)]
Fix nfs snapdir automount

The current implementation for allowing nfs to access snapdir is very buggy.
It uses a special fh for snapdirs, such that the next time nfsd does
fh_to_dentry, it actually returns the root inode inside the snapshot. So nfsd
never knows it crosses a mountpoint.

The problem is that nfsd will not hold a reference on the vfsmount of the
snapshot. This causes the auto unmounter to unmount the snapshot even though
nfs is still holding dentries in it.

To fix this, we return the inode for the snapdirs themselves. However, we also
trigger automount upon fh_to_dentry, and return ESTALE so nfsd will revalidate
and see the mountpoint and do crossmnt.

Because nfsd will now be aware that these are different filesystems users
must add crossmnt to their export options to access snapshot directories.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #3794
Closes #4716
Closes #5810
Closes #5833

7 years agoFix harmless "BARRIER is deprecated" kernel warning on Centos 6.8
Tony Hutter [Wed, 8 Mar 2017 17:20:21 +0000 (09:20 -0800)]
Fix harmless "BARRIER is deprecated" kernel warning on Centos 6.8

A one time warning after module load that "BARRIER is deprecated" was seen
on the heavily patched 2.6.32-642.13.1.el6.x86_64 Centos 6.8 kernel.  It seems
that kernel had both the old BARRIER and the newer FLUSH/FUA interfaces
defined.  This fixes the warning by preferring the newer FLUSH/FUA interface
if it's available.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #5739
Closes #5828

7 years agoOpenZFS 7867 - ARC space accounting leak
Andriy Gapon [Mon, 27 Feb 2017 22:47:33 +0000 (14:47 -0800)]
OpenZFS 7867 - ARC space accounting leak

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Tim Chase <tim@chase2k.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/7867
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/aa1f740d
Closes #5874

7 years agoCorrected highlight for zpool man page
bunder2015 [Tue, 7 Mar 2017 21:01:39 +0000 (16:01 -0500)]
Corrected highlight for zpool man page

SS is already highlighted and the fB/fR tags break the highlighting
prematurely; removing the tags highlights the entire line.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #5873

7 years ago[icp] fpu and asm cleanup for linux
Gvozden Neskovic [Tue, 7 Mar 2017 20:59:31 +0000 (21:59 +0100)]
[icp] fpu and asm cleanup for linux

Properly annotate functions and data section so that objtool does not complain
when CONFIG_STACK_VALIDATION and CONFIG_FRAME_POINTER are enabled.

Pass KERNELCPPFLAGS to assembler.

Use kfpu_begin()/kfpu_end() to protect SIMD regions in Linux kernel.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Closes #5872
Closes #5041

7 years agoFix multi-line error messages in blkdev_compat.h
bunder2015 [Tue, 7 Mar 2017 17:54:55 +0000 (12:54 -0500)]
Fix multi-line error messages in blkdev_compat.h

Fix multi-line error messages in blkdev_compat.h by changing
error-generating multi-line error messages to single line errors.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #5860

7 years agoOpenZFS 7793 - ztest fails assertion in dmu_tx_willuse_space
Brian Behlendorf [Tue, 7 Mar 2017 17:51:59 +0000 (09:51 -0800)]
OpenZFS 7793 - ztest fails assertion in dmu_tx_willuse_space

Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

Background information: This assertion about tx_space_* verifies that we
are not dirtying more stuff than we thought we would. We “need” to know
how much we will dirty so that we can check if we should fail this
transaction with ENOSPC/EDQUOT, in dmu_tx_assign(). While the
transaction is open (i.e. between dmu_tx_assign() and dmu_tx_commit() —
typically less than a millisecond), we call dbuf_dirty() on the exact
blocks that will be modified. Once this happens, the temporary
accounting in tx_space_* is unnecessary, because we know exactly what
blocks are newly dirtied; we call dnode_willuse_space() to track this
more exact accounting.

The fundamental problem causing this bug is that dmu_tx_hold_*() relies
on the current state in the DMU (e.g. dn_nlevels) to predict how much
will be dirtied by this transaction, but this state can change before we
actually perform the transaction (i.e. call dbuf_dirty()).

This bug will be fixed by removing the assertion that the tx_space_*
accounting is perfectly accurate (i.e. we never dirty more than was
predicted by dmu_tx_hold_*()). By removing the requirement that this
accounting be perfectly accurate, we can also vastly simplify it, e.g.
removing most of the logic in dmu_tx_count_*().

The new tx space accounting will be very approximate, and may be more or
less than what is actually dirtied. It will still be used to determine
if this transaction will put us over quota. Transactions that are marked
by dmu_tx_mark_netfree() will be excepted from this check. We won’t make
an attempt to determine how much space will be freed by the transaction
— this was rarely accurate enough to determine if a transaction should
be permitted when we are over quota, which is why dmu_tx_mark_netfree()
was introduced in 2014.
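
As a rough illustration of the quota check described above, the following
Python sketch models an approximate estimate being compared against a quota,
with net-free transactions excepted. It is a toy model under stated
assumptions, not the DMU code: the Tx class, try_assign() and the byte
counts are hypothetical names invented for this example.

```python
import errno
from dataclasses import dataclass


@dataclass
class Tx:
    """Toy stand-in for a DMU transaction (hypothetical, not dmu_tx_t)."""
    estimated_bytes: int   # approximate space estimate built up by the holds
    netfree: bool = False  # analogue of having called dmu_tx_mark_netfree()


def try_assign(tx: Tx, used_bytes: int, quota_bytes: int) -> int:
    """Return 0 if the transaction may proceed, else errno.EDQUOT."""
    if tx.netfree:
        # Transactions marked net-free are excepted from the quota check.
        return 0
    if used_bytes + tx.estimated_bytes > quota_bytes:
        # The estimate is only approximate, but it still gates the quota.
        return errno.EDQUOT
    return 0
```

In this model a transaction that would exceed the quota is refused unless it
was marked net-free, which mirrors the exception described in the text; no
attempt is made to predict how much space the transaction would free.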

We also won’t attempt to give “credit” when overwriting existing blocks,
if those blocks may be freed. This allows us to remove the
do_free_accounting logic in dbuf_dirty(), and associated routines. This
logic attempted to predict what will be on disk when this txg syncs, to
know if the overwritten block will be freed (i.e. exists, and has no
snapshots).

OpenZFS-issue: https://www.illumos.org/issues/7793
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3704e0a
Upstream bugs: DLPX-32883a
Closes #5804

Porting notes:
- DNODE_SIZE replaced with DNODE_MIN_SIZE in dmu_tx_count_dnode().
  Using the default dnode size would be slightly better.
- DEBUG_DMU_TX wrappers and configure option removed.
- Resolved _by_dnode() conflicts; these changes have not yet been
  applied to OpenZFS.

7 years agoOpenZFS 7843 - get_clones_stat() is suboptimal for lots of clones
Brian Behlendorf [Tue, 7 Mar 2017 17:47:40 +0000 (09:47 -0800)]
OpenZFS 7843 - get_clones_stat() is suboptimal for lots of clones

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/7843
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4d519e7
Closes #5868

7 years agoDump unique configurations and Uberblocks in zdb -lu
Olaf Faaland [Tue, 7 Mar 2017 00:01:45 +0000 (16:01 -0800)]
Dump unique configurations and Uberblocks in zdb -lu

For zdb -l, detect when the configuration nvlist in some label l (l>0)
is the same as a configuration already dumped.  If so, do not dump it.

Make a similar check when dumping Uberblocks for zdb -lu.  Check whether
a label already dumped contains an identical Uberblock.  If so, do not
dump the Uberblock.

When dumping a configuration or Uberblock, state which labels it is
found in (0-3), for example: labels = 1 2 3

Detecting redundant uberblocks or configurations is accomplished by
calculating checksums of the uberblocks and the packed nvlists
containing the configuration.

If there is nothing unique to be dumped for a label (i.e. the
configuration and uberblocks have checksums matching those already
dumped) print nothing for that label.
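
The checksum-based detection described above can be sketched in Python as
follows. This is an illustrative model only: group_by_checksum() and dump()
are hypothetical helpers, the inputs are plain byte strings standing in for
packed nvlists or uberblocks, and SHA-256 is an arbitrary choice (the commit
message does not say which checksum zdb uses).

```python
import hashlib
from collections import OrderedDict


def group_by_checksum(blobs):
    """Map each distinct blob to the list of label indices that hold it.

    `blobs` is a list indexed by label number (0-3); each entry is the raw
    bytes of a packed configuration or an uberblock.
    """
    groups = OrderedDict()  # checksum -> (first blob seen, [labels])
    for label, blob in enumerate(blobs):
        digest = hashlib.sha256(blob).hexdigest()
        if digest not in groups:
            groups[digest] = (blob, [])
        groups[digest][1].append(label)
    return list(groups.values())


def dump(blobs):
    """Return the lines to print: each unique blob once, with its labels."""
    lines = []
    for blob, labels in group_by_checksum(blobs):
        lines.append(blob.decode())
        lines.append("labels = " + " ".join(str(n) for n in labels))
    return lines
```

Given four labels where labels 1 to 3 hold identical contents, dump() emits
that content once followed by "labels = 1 2 3", and nothing further for
labels 2 and 3.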

With additional l's or u's, increase verbosity as follows:

-l      Dump each unique configuration only once.
        Indicate which labels it appears in.
-ll     In addition, dump label space usage stats.
-lll    Dump every configuration, unique or not.

-u      Dump each unique, valid, uberblock only once.
        Indicate which labels it appears in.
-uu     In addition, state which slots are invalid.
-uuu    Dump every uberblock, unique or not.
-uuuu   Dump the uberblock blockpointer (used to be -uuu)

Make exit values conform to the manual page.  Failing to unpack a
configuration nvlist is considered an error, as well as failing to open
or read from the device.

Add three tests, zdb_00{3,4,5}_pos to verify the above functionality.

An example of the output:
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'pool'
    state: 1
    txg: 880
    < ... redacted ... >
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0
    Uberblock[0]
        magic = 0000000000bab10c
        version = 5000
        txg = 0
        guid_sum = 3038694082047428541
        timestamp = 1487715500 UTC = Tue Feb 21 14:18:20 2017
        labels = 0 1 2 3
    Uberblock[4]
        magic = 0000000000bab10c
        version = 5000
        txg = 772
        guid_sum = 9045970794941528051
        timestamp = 1487727291 UTC = Tue Feb 21 17:34:51 2017
        labels = 0
    < ... redacted ... >
------------------------------------
LABEL 1
------------------------------------
    version: 5000
    name: 'pool'
    state: 1
    txg: 14
    < ... redacted ... >
        com.delphix:embedded_data
    labels = 1 2 3
    Uberblock[4]
        magic = 0000000000bab10c
        version = 5000
        txg = 4
        guid_sum = 7793930272573252584
        timestamp = 1487727521 UTC = Tue Feb 21 17:38:41 2017
        labels = 1 2 3
    < ... redacted ... >

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #5738

7 years agoFix loop device becomes read-only
Chunwei Chen [Mon, 6 Mar 2017 17:20:20 +0000 (09:20 -0800)]
Fix loop device becomes read-only

Commit 933ec99 removes read and write from f_op because the vfs layer will
select iter_write or aio_write automatically. However, for Linux <= 4.0,
loop_set_fd will actually check f_op->write and set the loop device
read-only if it does not exist. This patch adds them back and uses the
generic do_sync_{read,write} for aio_{read,write} and
new_sync_{read,write} for {read,write}_iter.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5776
Closes #5855

7 years agoFix powerpc build
Brian Behlendorf [Mon, 6 Mar 2017 17:17:24 +0000 (09:17 -0800)]
Fix powerpc build

Unlike other architectures, which sanitize the LDFLAGS from the
environment in arch/<arch>/Makefile, the powerpc Makefile
allows LDFLAGS to be passed through, resulting in the following
build failure.

  /usr/bin/ld: unrecognized option '-Wl,-z,relro'

LDFLAGS is set in /usr/lib/rpm/redhat/macros by default.  Clear
the environment variable when building kmods for powerpc.

Additionally, now that ppc64le exists it's no longer safe to
assume a powerpc system is big endian.  Rely on the endianness
provided by the compiler.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5856

7 years agoReduce size of zvol and enforce 4k blocksize in zvol tests
Giuseppe Di Natale [Wed, 1 Mar 2017 20:58:12 +0000 (12:58 -0800)]
Reduce size of zvol and enforce 4k blocksize in zvol tests

32-bit builders in the buildbot are having trouble completing
their ENOSPC testing in less than the timeout. Reduce the
zvol size and use a 4k block size to reduce read-modify-writes
which are particularly expensive on 32-bit systems due to the
reduced maximum ARC size.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kash Pande <kash@tripleback.net>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5845

7 years agoBug fixes for single test runs in zfs-tests
Giuseppe Di Natale [Wed, 1 Mar 2017 02:02:48 +0000 (18:02 -0800)]
Bug fixes for single test runs in zfs-tests

Correctly remove the temporary runfile after the
single test is run.

Cleanup and setup scripts are relative to the
test suite's location; correct how we look for
those scripts.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5844

7 years agoAdd auto-online test for ZED/FMA as part of the ZTS
Sydney Vanda [Fri, 23 Sep 2016 20:51:08 +0000 (13:51 -0700)]
Add auto-online test for ZED/FMA as part of the ZTS

Automated auto-online test to go along with ZED FMA integration (PR 4673).
auto_online_001.pos works with real devices (sd- and mpath) and with non-real
block devices (loop) by adding a scsi_debug device to the pool.

Note: In order for the test group to run, ZED must not currently be running.
Kernel 3.16.37 or higher is needed for scsi_debug to work properly.
If a timeout occurs on a test using a scsi_debug device (error noticed on an
Ubuntu system), a reboot might be needed in order for the test to pass. (This
needs more investigation.)

Also suppressed output from is_real_device/is_loop_device/is_mpath_device,
which was making the log file very cluttered with useless error messages
(e.g. "/dev/mapper/sdc is not a block device") from the previous patch.

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: David Quigley <david.quigley@intel.com>
Signed-off-by: Sydney Vanda <sydney.m.vanda@intel.com>
Closes #5774

7 years agoLinux 4.11 compat: avoid refcount_t name conflict
Olaf Faaland [Wed, 1 Mar 2017 00:10:18 +0000 (16:10 -0800)]
Linux 4.11 compat: avoid refcount_t name conflict

Linux 4.11 introduces a new type, refcount_t, which conflicts with the
type of the same name defined within ZFS.

Rename the ZFS type zfs_refcount_t.  Within the ZFS code, use a macro to
cause references to refcount_t to be changed to zfs_refcount_t at
compile time.  This reduces conflicts when later landing OpenZFS
patches.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #5823
Closes #5842

7 years agoFix initramfs hook for merged /usr/lib and /lib
Matt Kemp [Mon, 27 Feb 2017 20:03:23 +0000 (14:03 -0600)]
Fix initramfs hook for merged /usr/lib and /lib

Under a merged `/lib` -> `/usr/lib` which renders `/lib` as a symlink,
`find /lib -type f -name libgcc_s.so.1` will not return a result as
`find` will not traverse the symlink. Modifying it to `find /lib/ -type
f -name libgcc_s.so.1` should work for both symlinked and non-symlinked
`/lib` directories.
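
The difference between the two invocations can be reproduced in a throwaway
directory. The Python sketch below shells out to `find` (assumed to be on
PATH) against a temporary layout in which `lib` is a symlink to `usr_lib`;
the directory names and the find_libgcc() and demo() helpers are made up for
this illustration and are not part of the initramfs hook.

```python
import os
import subprocess
import tempfile


def find_libgcc(start):
    """Run `find <start> -type f -name libgcc_s.so.1`; return the matches."""
    result = subprocess.run(
        ["find", start, "-type", "f", "-name", "libgcc_s.so.1"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def demo():
    """Build a merged-style layout and search it with and without a slash."""
    with tempfile.TemporaryDirectory() as root:
        real = os.path.join(root, "usr_lib")  # stands in for /usr/lib
        link = os.path.join(root, "lib")      # stands in for /lib
        os.mkdir(real)
        open(os.path.join(real, "libgcc_s.so.1"), "w").close()
        os.symlink(real, link)
        without_slash = find_libgcc(link)      # symlink is not traversed
        with_slash = find_libgcc(link + "/")   # trailing slash resolves it
        return without_slash, with_slash
```

Calling demo() returns the matches for each form, so the two result lists
can be compared directly.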

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Kemp <matt@mattikus.com>
Closes #5834

7 years agoClean up by-dnode code in dmu_tx.c
Matthew Ahrens [Fri, 24 Feb 2017 21:34:26 +0000 (13:34 -0800)]
Clean up by-dnode code in dmu_tx.c

https://github.com/zfsonlinux/zfs/commit/0eef1bde31d67091d3deed23fe2394f5a8bf2276
introduced some changes whose style we slightly improved when
porting to illumos.

There is also one minor error-handling fix: in zap_add() the "zap" may
become NULL in case of an error re-opening the ZAP.

Originally suggested at: https://github.com/openzfs/openzfs/pull/276

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #5805

7 years agoABD style cleanups
Isaac Huang [Fri, 24 Feb 2017 20:05:42 +0000 (13:05 -0700)]
ABD style cleanups

The commit a6255b7fce400d485a0e87cbe369aa0ed7dc5dc4 removed a few
assertions which help catch errors and improve code readability. It also
duplicated two conditionals, which was unnecessary and made the code
confusing to read. This patch cleans it up.

Reviewed-by: David Quigley <david.quigley@intel.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Isaac Huang <he.huang@intel.com>
Closes #5802

7 years agoFix checksumflags assignment in cksummer
Tim Crawford [Fri, 24 Feb 2017 19:29:47 +0000 (14:29 -0500)]
Fix checksumflags assignment in cksummer

drr_checksumflags was incorrectly set to drr_checksumtype.

Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Tim Crawford <tcrawford@datto.com>
Closes #5830

7 years agoOpenZFS 7736 - ZFS Performance tests should log FIO summary output
Ahmed G [Wed, 18 Jan 2017 00:53:31 +0000 (16:53 -0800)]
OpenZFS 7736 - ZFS Performance tests should log FIO summary output

Authored by: Ahmed G <ahmedg@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Stephen Blinick <stephen.blinick@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Porting Notes:
- Using $FIO until 7290 is ported.

OpenZFS-issue: https://www.illumos.org/issues/7736
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7a61309
Closes #5827

7 years agoOpenZFS 7812 - Remove gender specific language
Daniel Hoffman [Fri, 17 Feb 2017 19:48:20 +0000 (11:48 -0800)]
OpenZFS 7812 - Remove gender specific language

Authored by: Daniel Hoffman <dj.hoffman@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Ported-by: George Melikov <mail@gmelikov.ru>
This change removes all gendered language that did not refer specifically
to an individual person or pet. The convention taken was to use
variations on "they" when referring to users and/or human beings, while
using "it" when referring to code, functions, and/or libraries.
Additionally, we took the liberty to fix up any whitespace issues that
were found in any files that were already being modified.

OpenZFS-issue: https://www.illumos.org/issues/7812
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ad626db
Closes #5822

7 years agoOpenZFS 7761 - bootfs_005_neg's pool destruction must handle EBUSY
Prakash Surya [Thu, 12 Jan 2017 00:36:58 +0000 (16:36 -0800)]
OpenZFS 7761 - bootfs_005_neg's pool destruction must handle EBUSY

Authored by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Ported-by: George Melikov <mail@gmelikov.ru>
OpenZFS-issue: https://www.illumos.org/issues/7761
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ad309d3
Closes #5818

7 years agoOpenZFS 7199 - dsl_dataset_rollback_sync may try to free already free blocks
Andriy Gapon [Mon, 21 Nov 2016 23:09:54 +0000 (15:09 -0800)]
OpenZFS 7199 - dsl_dataset_rollback_sync may try to free already free blocks

7200 no blocks must be born in a txg after a snapshot is created
Authored by: Andriy Gapon <andriy.gapon@clusterhq.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: George Melikov <mail@gmelikov.ru>
OpenZFS-issue: https://www.illumos.org/issues/7199
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/bfaed0b
Closes #5817

7 years agoOpenZFS 7337 - inherit_001_pos occasionally times out
Matthew Ahrens [Sat, 24 Sep 2016 03:44:15 +0000 (20:44 -0700)]
OpenZFS 7337 - inherit_001_pos occasionally times out

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Ported-by: George Melikov <mail@gmelikov.ru>
OpenZFS-issue: https://www.illumos.org/issues/7337
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/b021ac0
Closes #5800

Porting notes:
- Additional code refactor for better ZoL and OpenZFS codebase sync

7 years agoAllow zfs-tests to run a single test
Giuseppe Di Natale [Fri, 24 Feb 2017 18:59:24 +0000 (10:59 -0800)]
Allow zfs-tests to run a single test

Add a -t flag to zfs-tests to allow a user
to run a single test by providing the path
to the test relative to STF_SUITE.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5775

7 years agoFix incorrect spare vdev state after replacing
Isaac Huang [Thu, 23 Feb 2017 18:32:15 +0000 (11:32 -0700)]
Fix incorrect spare vdev state after replacing

After a hot spare replaces an OFFLINE vdev, the new
parent spare vdev state is set incorrectly to OFFLINE.
The correct state should be DEGRADED. The incorrect
OFFLINE state will prevent top-level vdev from reading
the spare vdev, thus causing unnecessary reconstruction.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Isaac Huang <he.huang@intel.com>
Closes #5766
Closes #5770

7 years agoRetry setting LED
Christopher Voltz [Thu, 16 Feb 2017 21:41:48 +0000 (15:41 -0600)]
Retry setting LED

If the LED is being accessed by another process when we try to update
it, the update will be lost. Add a retry loop which will read the state
of the LED and update it until the LED is in the correct state. The
number of times this will occur is limited to ensure that the ZEDlet
won't hang ZED.
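
A bounded read-then-update loop of the kind described above might look like
the following Python sketch. It is not the ZEDlet itself: set_led(), its
callable parameters and the default of 10 attempts are assumptions made for
illustration, since the commit message does not state the actual limit.

```python
def set_led(read_state, write_state, wanted, max_tries=10):
    """Write `wanted` until a read-back confirms it or tries run out.

    `read_state` and `write_state` are caller-supplied callables
    (hypothetical stand-ins for reading and writing the LED state).
    Returns True on success, False if the LED never reached `wanted`.
    """
    for _ in range(max_tries):
        if read_state() == wanted:
            return True
        # Another process may clobber this write, so loop and re-check.
        write_state(wanted)
    # Final check after the last write; bounded so the caller never hangs.
    return read_state() == wanted
```

Capping the number of attempts trades a possibly stale LED for the guarantee
that the caller always returns, which is the concern the commit raises about
hanging ZED.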

Refactor to remove duplication so setting of the LED occurs in only one
place.

Clean up a couple of the warnings generated by shellcheck which weren't
the result of specific choices by the author. Several notes and warnings
are still present but removing them would make the code less clear or
require adding lines to tell shellcheck to ignore the warning.

Remove ",i" from the documentation at the top of the file which appears
to be a typographic error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Christopher Voltz <christopher.voltz@hpe.com>
Closes #5795