git.proxmox.com Git - mirror_zfs.git/log
7 years agoOpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
Tony Hutter [Wed, 15 Jun 2016 22:47:05 +0000 (15:47 -0700)]
OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported by: Tony Hutter <hutter2@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/4185
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/45818ee

Porting Notes:
This code is ported on top of the Illumos Crypto Framework code:

    https://github.com/zfsonlinux/zfs/pull/4329/commits/b5e030c8dbb9cd393d313571dee4756fbba8c22d

The list of porting changes includes:

- Copied module/icp/include/sha2/sha2.h directly from illumos

- Removed from module/icp/algs/sha2/sha2.c:
#pragma inline(SHA256Init, SHA384Init, SHA512Init)

- Added 'ctx' to lib/libzfs/libzfs_sendrecv.c:zio_checksum_SHA256() since
  it now takes in an extra parameter.

- Added CTASSERT() to assert.h for module/zfs/edonr_zfs.c

- Added skein & edonr to libicp/Makefile.am

- Added sha512.S.  It was generated from sha512-x86_64.pl in Illumos.

- Updated ztest.c with new fletcher_4_*() args; used NULL for new CTX argument.

- In icp/algs/edonr/edonr_byteorder.h, removed the #if defined(__linux) section
  to avoid #including the non-existent endian.h.

- In skein_test.c, replace NULL with 0 in "no test vector" array entries to get
  around a compiler warning.

- Fixup test files:
   - Rename <sys/varargs.h> -> <varargs.h>, <strings.h> -> <string.h>
   - Remove <note.h> and define NOTE() as NOP.
   - Define u_longlong_t
   - Rename "#!/usr/bin/ksh" -> "#!/bin/ksh -p"
   - Rename NULL to 0 in "no test vector" array entries to get around a
     compiler warning.
   - Remove "for isa in $($ISAINFO); do" stuff
   - Add/update Makefiles
   - Add some userspace headers like stdio.h/stdlib.h in place of
     sys/types.h.

- EXPORT_SYMBOL *_Init/*_Update/*_Final... routines in ICP modules.

- Update scripts/zfs2zol-patch.sed

- include <sys/sha2.h> in sha2_impl.h

- Add sha2.h to include/sys/Makefile.am

- Add skein and edonr dirs to icp Makefile

- Add new checksums to zpool_get.cfg

- Move checksum switch block from zfs_secpolicy_setprop() to
  zfs_check_settable()

- Fix -Wuninitialized error in edonr_byteorder.h on PPC

- Fix stack frame size errors on ARM32
   - Don't unroll loops in Skein on 32-bit to save stack space
   - Add memory barriers in sha2.c on 32-bit to save stack space

- Add filetest_001_pos.ksh checksum sanity test

- Add option to write pseudorandom data in file_write utility

7 years agoAdd parity generation/rebuild using 128-bits NEON for Aarch64
Romain Dolbeau [Mon, 3 Oct 2016 16:44:00 +0000 (18:44 +0200)]
Add parity generation/rebuild using 128-bits NEON for Aarch64

This reuses the framework established for SSE2, SSSE3 and
AVX2. However, GCC uses FP registers on Aarch64, so
unlike SSE/AVX2 we can't rely on the registers being left alone
between ASM statements. So instead, the NEON code uses
C variables and GCC extended ASM syntax. Note that since
the kernel explicitly disables vector registers, they
have to be locally re-enabled explicitly.

As we use the variable's number to define the symbolic
name, and GCC won't allow duplicate symbolic names,
numbers have to be unique, even when the code is not
going to be used (e.g. the case for 4 registers when
using the macro with only 2). Only the actually used
variables should be declared, otherwise the build
will fail in debug mode.

This requires the replacement of the XOR(X,X) syntax
by a new ZERO(X) macro, which does the same thing but
without repeating the argument. And perhaps someday
there will be a machine where there is a more efficient
way to zero a register than XOR with itself. This affects
scalar, SSE2, SSSE3 and AVX2 as they need the new macro.

It's possible to write faster implementations (different
scheduling, different unrolling, interleaving NEON and
scalar, ...) for various cores, but this one has the
advantage of fitting in the current state of the code,
and thus is likely easier to review/check/merge.

The only difference between aarch64-neon and aarch64-neonx2
is that aarch64-neonx2 unrolls some functions further.

Reviewed-by: Gvozden Neskovic <neskovic@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@atos.net>
Closes #4801

7 years agoCorrect zpool_vdev_remove() error message
Richard Laager [Sun, 2 Oct 2016 18:34:17 +0000 (13:34 -0500)]
Correct zpool_vdev_remove() error message

The error message in zpool_vdev_remove() said top-level devices
could be removed, but that has never been true.

Reported-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #4506
Closes #5213

7 years agoFix coverity defects: CID 147448, 147449, 147450, 147453, 147454
luozhengzheng [Sun, 2 Oct 2016 18:24:54 +0000 (02:24 +0800)]
Fix coverity defects: CID 147448, 147449, 147450, 147453, 147454

coverity scan CID:147448,type: unchecked return value
coverity scan CID:147449,type: unchecked return value
coverity scan CID:147450,type: unchecked return value
coverity scan CID:147453,type: unchecked return value
coverity scan CID:147454,type: unchecked return value

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5206

7 years agoFix NULL deref in kcf_remove_mech_provider
candychencan [Fri, 30 Sep 2016 23:04:43 +0000 (07:04 +0800)]
Fix NULL deref in kcf_remove_mech_provider

In the default case the function must return to avoid dereferencing
'prov_mech' which will be NULL.
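
For illustration only, a minimal sketch of the pattern described above.
The names (mech_class, mech_entry, release_mech) are hypothetical and are
not the ICP's real types or the code from this patch; the point is only
that the default case returns before the pointer is dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the ICP's real types. */
enum mech_class { CLASS_DIGEST, CLASS_CIPHER, CLASS_COUNT };

struct mech_entry {
	int me_refcnt;
};

static struct mech_entry mech_tab[CLASS_COUNT] = { { 1 }, { 1 } };

/* Drop one reference; returns 0 on success, -1 for an unknown class. */
static int
release_mech(int cls)
{
	struct mech_entry *me = NULL;

	switch (cls) {
	case CLASS_DIGEST:
	case CLASS_CIPHER:
		me = &mech_tab[cls];
		break;
	default:
		/* Return here: 'me' is still NULL on this path. */
		return (-1);
	}

	assert(me != NULL);
	me->me_refcnt--;
	return (0);
}
```

Without the early return, control would fall out of the switch and reach
the decrement with a NULL pointer.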

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: candychencan <chen.can2@zte.com.cn>
Closes #5134

7 years agoFix coverity defects: CID 147563, 147560
cao [Fri, 30 Sep 2016 22:56:17 +0000 (06:56 +0800)]
Fix coverity defects: CID 147563, 147560

coverity scan CID:147563, Type:dereference null return value
coverity scan CID:147560, Type:dereference null return value

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5168

7 years agoFix coverity defects: CID 147531 147532 147533 147535
GeLiXin [Fri, 30 Sep 2016 22:47:57 +0000 (06:47 +0800)]
Fix coverity defects: CID 147531 147532 147533 147535

coverity scan CID:147531,type: Argument cannot be negative
- may copy data with negative size
coverity scan CID:147532,type: resource leaks
- may close a fd which is negative
coverity scan CID:147533,type: resource leaks
- may call pwrite64 with a negative size
coverity scan CID:147535,type: resource leaks
- may call fdopen with a negative fd

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Closes #5176

7 years agoFix coverity defects: CID 147536, 147537, 147538
GeLiXin [Fri, 30 Sep 2016 22:40:07 +0000 (06:40 +0800)]
Fix coverity defects: CID 147536, 147537, 147538

coverity scan CID:147536, type: Argument cannot be negative
- may write or close fd which is negative
coverity scan CID:147537, type: Argument cannot be negative
- may call dup2 with a negative fd
coverity scan CID:147538, type: Argument cannot be negative
- may read or fchown with a negative fd
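
As a hedged illustration of this defect class (not the code touched by
this change), the sketch below checks the descriptor returned by open()
before it is handed to read() or close(). The helper name
read_first_byte is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns the first byte of 'path' (0..255), or -1 on any failure. */
static int
read_first_byte(const char *path)
{
	unsigned char byte;
	ssize_t n;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (-1);	/* never pass a negative fd to read()/close() */

	n = read(fd, &byte, 1);
	(void) close(fd);

	return (n == 1 ? (int)byte : -1);
}
```

Coverity reports "Argument cannot be negative" when a path exists on
which the failed open() result reaches a call such as read(), dup2() or
fchown() unchecked.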

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Closes #5185

7 years agoraidz_test: respect wall time
Gvozden Neskovic [Fri, 30 Sep 2016 22:19:51 +0000 (00:19 +0200)]
raidz_test: respect wall time

When timeout is specified (-t), stop worker threads in the middle of work units.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Issue #5180
Closes #5190

7 years agoFix cppcheck warning in buf_init()
Brian Behlendorf [Fri, 30 Sep 2016 22:04:21 +0000 (15:04 -0700)]
Fix cppcheck warning in buf_init()

Cppcheck 1.63 erroneously complains about an uninitialized value
in buf_init().  Newer versions of cppcheck (1.72) handle this
correctly but we'll initialize the value anyway to silence the
warning.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5203

7 years agoDisable zpool_import_002_pos and ro_props_001_pos
Brian Behlendorf [Fri, 30 Sep 2016 19:12:53 +0000 (12:12 -0700)]
Disable zpool_import_002_pos and ro_props_001_pos

These test cases fail some percentage of the time resulting
in automated testing failures.  Disable the offending tests
until they can be made reliable.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #5201
Issue #5202
Closes #5194

7 years agoFix coverity defects: CID 147707
cao [Fri, 30 Sep 2016 17:49:16 +0000 (01:49 +0800)]
Fix coverity defects: CID 147707

coverity scan CID:147707, Type:Double free.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5097

7 years agoAdd a script to change file names when upstreaming to OpenZFS/illumos
Matthew Ahrens [Fri, 30 Sep 2016 04:01:50 +0000 (21:01 -0700)]
Add a script to change file names when upstreaming to OpenZFS/illumos

Add a script to change file names when upstreaming to OpenZFS/illumos.

Reviewed-by: Prashanth Sreenivasa <prashksp@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <mahrens@delphix.com>
Closes #5178

7 years agoAvoid undefined shift overflow in fzap_cursor_retrieve()
Gvozden Neskovic [Fri, 2 Sep 2016 13:10:34 +0000 (15:10 +0200)]
Avoid undefined shift overflow in fzap_cursor_retrieve()

Avoid calculating (1<<64) if lh_prefix_len == 0. Semantics of the method remain
the same.

Assert (lh_prefix_len > 0) in zap_expand_leaf() to detect possibly the same
problem.
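
A minimal sketch of the guard, assuming a 64-bit hash and a prefix
length in the range 0..64. The helper name nocare_mask is hypothetical
and this is not the code from the patch; it only shows why a prefix
length of 0 needs its own branch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mask of the low (64 - prefix_len) hash bits. For prefix_len == 0 the
 * naive expression would evaluate (1 << 64), which is undefined in C.
 */
static uint64_t
nocare_mask(unsigned prefix_len)
{
	assert(prefix_len <= 64);

	if (prefix_len == 0)
		return (UINT64_MAX);
	return ((UINT64_C(1) << (64 - prefix_len)) - 1);
}
```

The special case returns the value the shift would give mathematically
(all 64 bits set), so callers see the same semantics.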

Issue #4883

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
7 years agoExplicit integer promotion for bit shift operations
Gvozden Neskovic [Fri, 2 Sep 2016 13:07:00 +0000 (15:07 +0200)]
Explicit integer promotion for bit shift operations

Explicitly promote variables to the correct type. Undefined behavior is
reported because the width of int is implementation-defined in the C standard.
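
A short sketch of the idea, with a hypothetical helper name
(block_to_offset) that does not come from the patch. A shift is carried
out in the promoted type of its left operand, so a 32-bit operand has to
be widened before the shift, not after it.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a 32-bit block id to a byte offset; 'shift' must be < 64. */
static uint64_t
block_to_offset(uint32_t blkid, unsigned shift)
{
	assert(shift < 64);

	/* Cast first so the shift is performed on a 64-bit value. */
	return ((uint64_t)blkid << shift);
}
```

Written as (blkid << shift), the shift would be done in 32 bits: high
bits are lost for an unsigned operand, and for a signed int the overflow
is undefined behavior.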

Issue #4883

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
7 years agofix: Shift exponent too large
Gvozden Neskovic [Wed, 31 Aug 2016 08:12:08 +0000 (10:12 +0200)]
fix: Shift exponent too large

Undefined operation is reported by running ztest (or zloop) compiled with GCC
UndefinedBehaviorSanitizer. Error only happens on top level of dnode indirection
with large enough offset values. Logically, the left shift operation would work,
but the bit shift semantics in C, and the limits of uint64_t, do not produce the
desired result.
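
A hedged sketch of one way to handle such a case, not necessarily what
this patch does. The helper block_index and its parameters (datashift,
epbs) are hypothetical; the sketch assumes the exponent grows with the
indirection level and treats an exponent of 64 or more explicitly.

```c
#include <stdint.h>

/*
 * Hypothetical helper: index of the block covering 'offset' at a given
 * indirection level. 'datashift' and 'epbs' are illustrative parameters.
 */
static uint64_t
block_index(uint64_t offset, unsigned datashift, unsigned level,
    unsigned epbs)
{
	unsigned exp = datashift + level * epbs;

	/* Shifting a 64-bit value by 64 or more bits is undefined in C. */
	if (exp >= 64)
		return (0);
	return (offset >> exp);
}
```

The same rule applies to left and right shifts: the exponent has to be
smaller than the width of the promoted left operand.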

Issue #5059, #4883

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
7 years agoFix coverity defects: CID 147443, 147656, 147655, 147441, 147653
BearBabyLiu [Thu, 29 Sep 2016 20:33:09 +0000 (04:33 +0800)]
Fix coverity defects: CID 147443, 147656, 147655, 147441, 147653

coverity scan CID:147443, Type: Buffer not null terminated
coverity scan CID:147656, Type: Copy into fixed size buffer
coverity scan CID:147655, Type: Copy into fixed size buffer
coverity scan CID:147441, Type: Buffer not null terminated
coverity scan CID:147653, Type: Copy into fixed size buffer

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: liuhuang <liu.huang@zte.com.cn>
Closes #5165

7 years agoExplicit block device plugging when submitting multiple BIOs
Isaac Huang [Thu, 29 Sep 2016 20:13:31 +0000 (14:13 -0600)]
Explicit block device plugging when submitting multiple BIOs

Without plugging, the default 'noop' scheduler will not merge
the BIOs which are part of a large ZIO.

Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Isaac Huang <he.huang@intel.com>
Closes #5181

7 years agoFix zfs_clone_010_pos.ksh to verify zfs clones property displays right
liaoyuxiangqin [Thu, 29 Sep 2016 20:08:44 +0000 (04:08 +0800)]
Fix zfs_clone_010_pos.ksh to verify zfs clones property displays right

Because the macro ZFS_MAXPROPLEN used in function print_dataset
differs between platforms, set it appropriately and calculate the expected
number of passes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: yuxiang <guo.yong33@zte.com.cn>
Closes #5154

7 years agoEnable ro_props_001_pos and onoffs_001_pos
ChaoyuZhang [Thu, 29 Sep 2016 19:56:48 +0000 (03:56 +0800)]
Enable ro_props_001_pos and onoffs_001_pos

Enable ro_props_001_pos and onoffs_001_pos which pass reliably.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn>
Closes #5183

7 years agoFix zfs_clone_010_pos.ksh to verify the space used by multiple copies
liaoyuxiangqin [Thu, 29 Sep 2016 19:46:13 +0000 (03:46 +0800)]
Fix zfs_clone_010_pos.ksh to verify the space used by multiple copies

The default blocksize in Linux is 1024 due to a GNU-ism.  Setting the
expected blocksize resolves the issue.  As mentioned in the PR an
alternate solution would be to set POSIXLY_CORRECT=1.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: yuxiang <guo.yong33@zte.com.cn>
Closes #5167

7 years agoFix coverity defects: CID 147610, 147608, 147607
cao [Thu, 29 Sep 2016 19:11:44 +0000 (03:11 +0800)]
Fix coverity defects: CID 147610, 147608, 147607

coverity scan CID:147610, Type: Resource leak.
coverity scan CID:147608, Type: Resource leak.
coverity scan CID:147607, Type: Resource leak.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5143

7 years agoFix coverity defects: 147658, 147652, 147651
cao [Thu, 29 Sep 2016 19:06:14 +0000 (03:06 +0800)]
Fix coverity defects: 147658, 147652, 147651

coverity scan CID:147658, Type:copy into fixed size buffer.
coverity scan CID:147652, Type:copy into fixed size buffer.
coverity scan CID:147651, Type:copy into fixed size buffer.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5160

7 years agoRefactor inode->i_mode management
lorddoskias [Tue, 27 Sep 2016 21:08:52 +0000 (00:08 +0300)]
Refactor inode->i_mode management

Refactor the code in such a way so that inode->i_mode is being set
at the same time zp->z_mode is being changed. This has the effect of
keeping both in sync without relying on zfs_inode_update.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Closes #5158

7 years agoEnable property_alias_001_pos.ksh
candychencan [Tue, 27 Sep 2016 18:49:45 +0000 (02:49 +0800)]
Enable property_alias_001_pos.ksh

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: candychencan <chen.can2@zte.com.cn>
Closes #5175

7 years agoFix coverity defects: CID 147650, 147649, 147647, 147646
cao [Sun, 25 Sep 2016 22:08:28 +0000 (06:08 +0800)]
Fix coverity defects: CID 147650, 147649, 147647, 147646

coverity scan CID:147650, Type:copy into fixed size buffer.
coverity scan CID:147649, Type:copy into fixed size buffer.
coverity scan CID:147647, Type:copy into fixed size buffer.
coverity scan CID:147646, Type:copy into fixed size buffer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5161

7 years agoFix coverity defects: CID 147602 147604
cao [Fri, 23 Sep 2016 22:43:46 +0000 (06:43 +0800)]
Fix coverity defects: CID 147602 147604

coverity scan CID:147604, Type: Resource leak.
coverity scan CID:147602, Type: Resource leak.
reason: calcvs is allocated with safe_malloc but is not freed before the
goto to children.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5155

7 years agoFix multilist_create() memory leak
Brian Behlendorf [Fri, 23 Sep 2016 17:55:10 +0000 (10:55 -0700)]
Fix multilist_create() memory leak

In arc_state_fini() the `arc_l2c_only->arcs_list[*]` multilists
must be destroyed.  This accidentally regressed in d3c2ae1c.

Reviewed by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #5151
Closes #5152

7 years agoFix coverity defects: CID 147613 147614 147616 147617
luozhengzheng [Fri, 23 Sep 2016 16:10:50 +0000 (00:10 +0800)]
Fix coverity defects: CID 147613 147614 147616 147617

coverity scan CID:147617,type: resource leaks
coverity scan CID:147616,type: resource leaks
coverity scan CID:147614,type: resource leaks
coverity scan CID:147613,type: resource leaks

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5150

7 years agoLinux 4.7 compat: Fix deadlock during lookup on case-insensitive
tuxoko [Fri, 23 Sep 2016 02:09:16 +0000 (19:09 -0700)]
Linux 4.7 compat: Fix deadlock during lookup on case-insensitive

We must not use d_add_ci if the dentry already has the real name. Otherwise,
d_add_ci()->d_alloc_parallel() will find itself on the lookup hash and wait
on itself causing deadlock.

Tested-by: satmandu
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5124
Closes #5141
Closes #5147
Closes #5148

7 years agoOpenZFS 6111 - zfs send should ignore datasets created after the ending snapshot
kernelOfTruth aka. kOT, Gentoo user [Fri, 23 Sep 2016 00:22:37 +0000 (02:22 +0200)]
OpenZFS 6111 - zfs send should ignore datasets created after the ending snapshot

Authored by: Alex Deiter <alex.deiter@gmail.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: kernelOfTruth <kerneloftruth@gmail.com>
OpenZFS-issue: https://www.illumos.org/issues/6111
OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/4a20c933
Closes #5110

Porting notes:

There were changes from upstream due to the following commits:
- zfs send -p send properties only for snapshots that are actually sent
  https://github.com/zfsonlinux/zfs/commit/057485504e3a4502c265813ab58e9ec8ffc2a3be
- Produce a full snapshot list for zfs send -p
  https://github.com/zfsonlinux/zfs/commit/e890dd85a7522730ad46daf68150aafd3952d0c1
- Implement zfs_ioc_recv_new() for OpenZFS 2605
  https://github.com/zfsonlinux/zfs/commit/43e52eddb13d8accbd052fac9a242ce979531aa4
- OpenZFS 6314 - buffer overflow in dsl_dataset_name
  ZFS_MAXNAMELEN was changed to the now used ZFS_MAX_DATASET_NAME_LEN since
  https://github.com/zfsonlinux/zfs/commit/eca7b76001a7d33f78bd98884aef8325bdbf98e7

7 years agoOpenZFS 7230 - add assertions to dmu_send_impl() to verify that stream includes BEGIN...
kernelOfTruth aka. kOT, Gentoo user [Thu, 22 Sep 2016 23:01:19 +0000 (01:01 +0200)]
OpenZFS 7230 - add assertions to dmu_send_impl() to verify that stream includes BEGIN and END records

Authored by: Matt Krantz <matt.krantz@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: kernelOfTruth <kerneloftruth@gmail.com>
OpenZFS-issue: https://www.illumos.org/issues/7230
OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/12b90ee2
Closes #5112

7 years agoFix coverity defects
luozhengzheng [Thu, 22 Sep 2016 22:55:41 +0000 (06:55 +0800)]
Fix coverity defects

1.coverity scan CID:147445 function zfs_do_send in zfs_main.c
Buffer not null terminated (BUFFER_SIZE_WARNING)

2.coverity scan CID:147443 function zfs_do_bookmark in zfs_main.c
Buffer not null terminated (BUFFER_SIZE_WARNING)

3.coverity scan CID:147660 function main in zinject.c
Passing string argv[0] of unknown size to strcpy
This change also fixes the leak of g_zfs.

4.coverity scan CID: 147442 function make_disks in zpool_vdev.c
Buffer not null terminated (BUFFER_SIZE_WARNING)

5.coverity scan CID: 147661 function main in dir_rd_update.c
passing string cp1 of unknown size to strcpy

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5130

7 years agoUpdate zfs destroy test scripts
cao [Thu, 22 Sep 2016 22:28:34 +0000 (06:28 +0800)]
Update zfs destroy test scripts

Update and enable zfs_destroy_0[08-13]_*.ksh.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5068

7 years agoFix coverity defects
luozhengzheng [Thu, 22 Sep 2016 01:09:00 +0000 (09:09 +0800)]
Fix coverity defects

coverity scan CID:147633,type: sizeof not portable
coverity scan CID:147637,type: sizeof not portable
coverity scan CID:147638,type: sizeof not portable
coverity scan CID:147640,type: sizeof not portable

In these particular cases sizeof (X **) happens to be equal to sizeof (X *),
but this is not a portable assumption.
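
One common way to avoid this class of warning, shown as a sketch and not
as the change made here, is to take sizeof of the dereferenced pointer
so the element type never has to be spelled out. The helper name
alloc_name_array is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate an array of 'count' string pointers. */
static char **
alloc_name_array(size_t count)
{
	char **names;

	/*
	 * sizeof (*names) is the size of the element type (char *), so the
	 * allocation stays correct even if the declaration changes.
	 */
	names = calloc(count, sizeof (*names));
	return (names);
}
```

Writing sizeof (char **) in the same place would request the size of the
wrong pointer level and only work where both pointer types have the same
size.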

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5144

7 years agoFix zfs_destroy_001_pos.ksh
candychencan [Wed, 21 Sep 2016 20:51:53 +0000 (04:51 +0800)]
Fix zfs_destroy_001_pos.ksh

Due to how the Linux VFS was designed, busy mount points
cannot be destroyed even when given the force option.  Update
the zfs_destroy_001_pos test case to expect this behavior when
running under Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: candychencan <chen.can2@zte.com.cn>
Closes #5132

7 years agoFix automatically generated release number
Brian Behlendorf [Wed, 21 Sep 2016 20:45:21 +0000 (13:45 -0700)]
Fix automatically generated release number

When building from the head of a branch a release number is
automatically generated with `git describe` using the last tag
on that branch as the base.  For this to work the last tag on the
branch needs to be predictable given the current META file.

This logic was accidentally broken when an -rcX tag was added to
the branch.  Update it to search for a VERSION or VERSION-RELEASE
tag.

Reviewed-by: Chris Siebenmann <cks.git01@cs.toronto.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5105
Closes #5140

7 years agoReduce noise in tracing logs
Isaac Huang [Wed, 21 Sep 2016 20:37:20 +0000 (14:37 -0600)]
Reduce noise in tracing logs

dbuf_read_impl() returns (SET_ERROR(err)) when err can be 0, which adds
lots of noise in tracing logs.
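
A sketch of the idea with hypothetical stand-ins (sketch_set_error is
not the real SET_ERROR() macro, and sketch_finish is not
dbuf_read_impl()): only non-zero values are routed through the tracing
hook, so successful calls leave no trace record.

```c
/* Hypothetical stand-in for a tracing hook; not the real SET_ERROR(). */
static int sketch_trace_count;

static int
sketch_set_error(int err)
{
	sketch_trace_count++;	/* one trace record per call */
	return (err);
}

/* Only route non-zero values through the tracing hook. */
static int
sketch_finish(int err)
{
	return (err != 0 ? sketch_set_error(err) : 0);
}
```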

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Isaac Huang <he.huang@intel.com>
Closes #4430
Closes #5146

7 years agoFix regression that broke dracut initramfs generation
Moritz Maxeiner [Wed, 21 Sep 2016 20:35:16 +0000 (22:35 +0200)]
Fix regression that broke dracut initramfs generation

Based upon @ryao's initial fix for 1c73494394fc9de9283b3fd4f00bcdf4bd300a7
(5e9843405f63fdabe76e87b92b81a127d488abc7), this one also uses
`command -v` instead of `type`, but additionally only applies the
fix to close zfsonlinux/zfs#4749 when `libgcc_s.so.1` has not been included
by dracut automatically (verified by whether `zpool` links directly to
`libgcc_s.so`), and changes the fallback option to match `libgcc_s.so*`.

Tested-by: Ben Jencks <ben@bjencks.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Moritz Maxeiner <moritz@ucworks.org>
Closes #5089
Closes #5138

7 years agoFix coverity defects
BearBabyLiu [Wed, 21 Sep 2016 02:09:22 +0000 (10:09 +0800)]
Fix coverity defects

coverity scan CID:147504 Type: Explicit null dereferenced
Reason: passing null pointer dl to zfs_dirent_unlock

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: BearBabyLiu <liu.huang@zte.com.cn>
Closes #5131

7 years agoRemove script zfs_commands.cfg
legend-hua [Wed, 21 Sep 2016 01:36:24 +0000 (09:36 +0800)]
Remove script zfs_commands.cfg

zfs_commands.cfg printed "No such file or directory" when executing
script/zfs-test.sh. The file is a symlink to ../../../zfs-script-config.sh,
so delete the symlink and source $SRCDIR/zfs-script-config.sh directly
from default.cfg.in when it exists.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: legend-hua <liu.hua130@zte.com.cn>
Closes #5133

7 years agoFix coverity defects
cao [Wed, 21 Sep 2016 00:45:45 +0000 (08:45 +0800)]
Fix coverity defects

Fix coverity defects:
coverity scan CID:147623, Type: Resource leak.
coverity scan CID:147622, Type: Resource leak.
reason: zhp is opened with zpool_open but never closed with zpool_close,
so the handle leaks.

coverity scan CID:147621, Type: Resource fd leak.
coverity scan CID:147620, Type: Resource fd leak.
reason: do_write and do_read open a file descriptor but do not close it
on the error path.

Delete the unused definition DMU_OS_IS_L2COMPRESSIBLE.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5137

7 years agoFix strncpy in taskq_create
candychencan [Tue, 20 Sep 2016 18:27:15 +0000 (02:27 +0800)]
Fix strncpy in taskq_create

Limit the copy length to TASKQ_NAMELEN so that, if the name length equals
'TASKQ_NAMELEN+1', the final '\0' of tq->tq_name is preserved.
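
A minimal sketch of the bounded copy, using hypothetical stand-ins
(SKETCH_NAMELEN, struct sketch_tq, sketch_set_name) in place of the real
taskq types. Unlike the description above, the sketch writes the
terminator explicitly, so it does not depend on the buffer having been
zeroed when it was allocated.

```c
#include <string.h>

#define	SKETCH_NAMELEN	31	/* hypothetical stand-in for TASKQ_NAMELEN */

struct sketch_tq {
	char tq_name[SKETCH_NAMELEN + 1];
};

static void
sketch_set_name(struct sketch_tq *tq, const char *name)
{
	/* Copy at most SKETCH_NAMELEN bytes, leaving room for the NUL. */
	strncpy(tq->tq_name, name, SKETCH_NAMELEN);
	tq->tq_name[SKETCH_NAMELEN] = '\0';
}
```

Passing the full buffer size (SKETCH_NAMELEN + 1) to strncpy() would let
an over-long name fill the whole buffer and leave it unterminated.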

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: candychencan <chen.can2@zte.com.cn>
Closes #5136

7 years agoChange /etc/mtab to /proc/self/mounts
slashdd [Tue, 20 Sep 2016 17:07:58 +0000 (13:07 -0400)]
Change /etc/mtab to /proc/self/mounts

Fix misleading error message:

 "The /dev/zfs device is missing and must be created.", if /etc/mtab is missing.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Eric Desrochers <eric.desrochers@canonical.com>
Closes #4680
Closes #5029

7 years agoFix arc_adjust_meta_balanced()
Tim Chase [Mon, 19 Sep 2016 16:28:35 +0000 (11:28 -0500)]
Fix arc_adjust_meta_balanced()

The type of "adjustmnt" was erroneously changed to unsigned when the compressed
ARC code was ported in d3c2ae1c0806b183a315e3d43cc8018cfdca79b5.

As a result of it being unsigned, the balanced metadata eviction logic
would evict all of the non-metadata.
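
The effect of the unsigned type can be shown with a small sketch. The
function names and the used/limit/evictable parameters are hypothetical
and this is not the ARC code; it only assumes the adjustment is computed
as a difference that may be negative.

```c
#include <stdint.h>

/* Buggy variant: the difference wraps to a huge value when used < limit. */
static uint64_t
evict_target_buggy(uint64_t used, uint64_t limit, uint64_t evictable)
{
	uint64_t adjust = used - limit;

	if (adjust > 0)
		return (adjust < evictable ? adjust : evictable);
	return (0);
}

/* Fixed variant: a signed difference can be negative, meaning "no work". */
static uint64_t
evict_target(uint64_t used, uint64_t limit, uint64_t evictable)
{
	int64_t adjust = (int64_t)used - (int64_t)limit;

	if (adjust > 0) {
		return ((uint64_t)adjust < evictable ?
		    (uint64_t)adjust : evictable);
	}
	return (0);
}
```

With used = 100, limit = 200 and evictable = 50, the buggy variant asks
for all 50 evictable bytes while the signed variant asks for none.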

Reviewed-by: Chris Severance <github.severach@spamgourmet.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: David Quigley <david.quigley@intel.com>
Signed-off-by: Tim Chase <tim@onlight.com>
Closes #5128
Closes #5129

7 years agoFix FALLOC_FL_PUNCH_HOLE use in randfree_file.c
legend-hua [Sat, 17 Sep 2016 22:20:10 +0000 (06:20 +0800)]
Fix FALLOC_FL_PUNCH_HOLE use in randfree_file.c

The FALLOC_FL_PUNCH_HOLE flag was introduced in the 2.6.38
kernel.  To prevent breaking the build on older systems wrap its use
in a conditional.  When FALLOC_FL_PUNCH_HOLE isn't available
return a non-zero status and error message.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: legend-hua <liu.hua130@zte.com.cn>
Closes #5101

7 years agoFix Coverity defects
luozhengzheng [Sat, 17 Sep 2016 22:08:54 +0000 (06:08 +0800)]
Fix Coverity defects

CID 147659, 150952 and 147645

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5103

7 years agoEnable ignore_hole_birth module option by default
Brian Behlendorf [Fri, 16 Sep 2016 21:05:30 +0000 (14:05 -0700)]
Enable ignore_hole_birth module option by default

Enable ignore_hole_birth by default until all known hole birth bugs
have been resolved and relevant test cases added.

Reviewed-by: Boris Protopopov <boris.protopopov@actifio.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4809
Closes #5099

7 years agoDisable zpool_upgrade_004_pos test case
Brian Behlendorf [Fri, 16 Sep 2016 20:25:46 +0000 (13:25 -0700)]
Disable zpool_upgrade_004_pos test case

This test case frequently triggers issue #4034.  Disable this
test case until the root cause of this issue has been addressed.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4034
Closes #5120

7 years agoSimplify time handling logic in zfs_setattr
Nikolay Borisov [Mon, 12 Sep 2016 19:35:56 +0000 (22:35 +0300)]
Simplify time handling logic in zfs_setattr

Simplify time handling in zfs_setattr by mimicking the logic in
setattr_copy from the linux kernel. In order to achieve this
in the case when ZFS' log is being replayed it is necessary
to unconditionally set the ctime in zfs_replay_setattr.

Also use the timespec_trunc function when assigning values to the
generic inode struct. This is currently a noop since zfs sets
s_time_gran to 1, however in the future rules about precision might
change.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Closes #4916

7 years agoRefactor generic inode time updating
Nikolay Borisov [Mon, 1 Aug 2016 20:02:25 +0000 (23:02 +0300)]
Refactor generic inode time updating

ZFS doesn't provide a custom update_time method meaning it delegates
this job to the generic VFS layer. The only time when it needs to
set the various *time values is when the inode is being marshalled
to/from the disk. Do this by moving the relevant code from
zfs_inode_update_impl to zfs_node_alloc and zfs_rezget. As a result
from this change it is no longer necessary to have multiple versions
of the zfs_inode_update function - so just nuke them and leave only
one.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Issue #227
Closes #4916

7 years agoDLPX-44733 combine arc_buf_alloc_impl() with arc_buf_clone()
Dan Kimmel [Wed, 13 Jul 2016 21:17:41 +0000 (17:17 -0400)]
DLPX-44733 combine arc_buf_alloc_impl() with arc_buf_clone()

Authored by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Tom Caputi <tcaputi@datto.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported by: David Quigley <david.quigley@intel.com>
Issue #5078

7 years agoRemove lint suppression from dmu.h and unnecessary dmu.h include in spa.h
Dan Kimmel [Mon, 13 Jun 2016 02:47:35 +0000 (22:47 -0400)]
Remove lint suppression from dmu.h and unnecessary dmu.h include in spa.h

Authored by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Tom Caputi <tcaputi@datto.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported by: David Quigley <david.quigley@intel.com>
Issue #5078

7 years agoEnable raw writes to perform dedup with verification
Tom Caputi [Tue, 13 Sep 2016 01:34:19 +0000 (21:34 -0400)]
Enable raw writes to perform dedup with verification

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: David Quigley <david.quigley@intel.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Issue #5078

7 years agoDLPX-40252 integrate EP-476 compressed zfs send/receive
Dan Kimmel [Mon, 11 Jul 2016 17:45:52 +0000 (13:45 -0400)]
DLPX-40252 integrate EP-476 compressed zfs send/receive

Authored by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Tom Caputi <tcaputi@datto.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported by: David Quigley <david.quigley@intel.com>
Issue #5078

7 years agoOpenZFS 6950 - ARC should cache compressed data
George Wilson [Thu, 2 Jun 2016 04:04:53 +0000 (00:04 -0400)]
OpenZFS 6950 - ARC should cache compressed data

Authored by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Tom Caputi <tcaputi@datto.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported by: David Quigley <david.quigley@intel.com>

This review covers the reading and writing of compressed arc headers, sharing
data between the arc_hdr_t and the arc_buf_t, and the implementation of a new
dbuf cache to keep frequently accessed data uncompressed.

I've added a new member to l1 arc hdr called b_pdata. The b_pdata always hangs
off the arc_buf_hdr_t (if an L1 hdr is in use) and points to the physical block
for that DVA. The physical block may or may not be compressed. If compressed
arc is enabled and the block on-disk is compressed, then the b_pdata will match
the block on-disk and remain compressed in memory. If the block on disk is not
compressed, then neither will the b_pdata. Lastly, if compressed arc is
disabled, then b_pdata will always be an uncompressed version of the on-disk
block.

Typically the arc will cache only the arc_buf_hdr_t and will aggressively evict
any arc_buf_t's that are no longer referenced. This means that the arc will
primarily have compressed blocks as the arc_buf_t's are considered overhead and
are always uncompressed. When a consumer reads a block we first look to see if
the arc_buf_hdr_t is cached. If the hdr is cached then we allocate a new
arc_buf_t and decompress the b_pdata contents into the arc_buf_t's b_data. If
the hdr already has an arc_buf_t, then we will allocate an additional arc_buf_t
and bcopy the uncompressed contents from the first arc_buf_t to the new one.

Writing to the compressed arc requires that we first discard the b_pdata since
the physical block is about to be rewritten. The new data contents will be
passed in via an arc_buf_t (uncompressed) and during the I/O pipeline stages we
will copy the physical block contents to a newly allocated b_pdata.

When an l2arc is in use it will also take advantage of the b_pdata. Now the
l2arc will always write the contents of b_pdata to the l2arc. This means that
when compressed arc is enabled the l2arc blocks are identical to those
stored in the main data pool. This provides a significant advantage since we
can leverage the bp's checksum when reading from the l2arc to determine if the
contents are valid. If the compressed arc is disabled, then we must first
transform the read block to look like the physical block in the main data pool
before comparing the checksum and determining whether it is valid.

OpenZFS-issue: https://www.illumos.org/issues/6950
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7fc10f0
Issue #5078

7 years agoOpenZFS 7262 - remove seq from zfs_receive_010.ksh
Paul Dagnelie [Sat, 3 Sep 2016 04:07:15 +0000 (21:07 -0700)]
OpenZFS 7262 - remove seq from zfs_receive_010.ksh

Authored by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Dan McDonald <danmcd@omniti.com>
Ported-by: candychencan <chen.can2@zte.com.cn>
OpenZFS-issue: https://www.illumos.org/issues/7262
OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/b868f5d
Closes #5080

7 years agoFix memleak in zfs_do_* and zpool_do_*
luozhengzheng [Thu, 1 Sep 2016 02:23:10 +0000 (10:23 +0800)]
Fix memleak in zfs_do_* and zpool_do_*

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5056

7 years agoAllow ZVOL bookmarks to be listed recursively
loli10K [Wed, 7 Sep 2016 17:34:20 +0000 (19:34 +0200)]
Allow ZVOL bookmarks to be listed recursively

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #4503
Closes #5072

7 years agoRemove redundant assignments to arc_c
Tim Chase [Fri, 9 Sep 2016 16:03:03 +0000 (11:03 -0500)]
Remove redundant assignments to arc_c

Several assignments to arc_c had no effect because it is ultimately
initialized to arc_c_max.

This aligns ZoL better with the upstream code which removed these
assignments some time ago.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@onlight.com>
Closes #5081

7 years agoRefactor spa_load_l2cache to make build happy
Nikolay Borisov [Sat, 10 Sep 2016 20:06:17 +0000 (23:06 +0300)]
Refactor spa_load_l2cache to make build happy

When sav->sav_config was NULL the body of the function
would skip the iteration of the l2 cache devices and
just clean up the old devices. However, this wasn't very obvious
since the null check was performed after the loop body and after
the old devices were cleaned up. Refactor the code so that it's now
obvious when the iteration of the l2cache devices is skipped.

This fixes the following cppcheck warning:

[module/zfs/spa.c:1552]: (error) Possible null pointer dereference: newvdevs

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Closes #5087

7 years agoFree property names with spa_strfree() rather than strfree()
Tim Chase [Sat, 10 Sep 2016 15:16:13 +0000 (10:16 -0500)]
Free property names with spa_strfree() rather than strfree()

Since they're allocated with spa_strdup(), they should be freed with
spa_strfree() so the proper length buffer is freed.
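
The pairing matters whenever the free routine must be told how many bytes
were allocated, as the kmem-style interface requires. The following is a
minimal userspace sketch of that idea, not the SPA code itself: every
example_* name is hypothetical, and malloc()/free() stand in for a sized
allocator.

```c
#include <stdlib.h>
#include <string.h>

static size_t example_bytes_outstanding;	/* bytes currently allocated */

/* Hypothetical sized allocator: the caller passes the size back on free. */
static void *
example_alloc(size_t size)
{
	void *p = malloc(size);

	if (p != NULL)
		example_bytes_outstanding += size;
	return (p);
}

static void
example_free(void *p, size_t size)
{
	example_bytes_outstanding -= size;
	free(p);
}

/* Duplicate s, allocating strlen(s) + 1 bytes from the sized allocator. */
static char *
example_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = example_alloc(len);

	if (copy != NULL)
		memcpy(copy, s, len);
	return (copy);
}

/* Matching free: recomputes the same length example_strdup() used. */
static void
example_strfree(char *s)
{
	example_free(s, strlen(s) + 1);
}
```

If a string from example_strdup() were released through a routine that
assumed a different length, the accounting in example_bytes_outstanding
would drift, which is the class of mismatch this commit avoids.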

Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #5082
Closes #5086

7 years agoFix memory/fd leak in check_file() and is_spare()
liuhuang [Sat, 10 Sep 2016 20:41:19 +0000 (04:41 +0800)]
Fix memory/fd leak in check_file() and is_spare()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: liuhuang <liu.huang@zte.com.cn>
Closes #5085

7 years agoFix make lint target
Brian Behlendorf [Fri, 9 Sep 2016 18:01:22 +0000 (11:01 -0700)]
Fix make lint target

When errors are detected 'make lint' should return a non-zero
error code.  The value 2 was chosen to indicate these are warnings
and not fatal.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

7 years agozfs dracut module should not assume systemd presence
Moritz Maxeiner [Thu, 1 Sep 2016 16:29:31 +0000 (18:29 +0200)]
zfs dracut module should not assume systemd presence

Signed-off-by: Moritz Maxeiner <moritz@ucworks.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4749
Closes #5058

7 years agoAdapt genkernel fix for zfsonlinux/zfs#4749 to zfs dracut module
Moritz Maxeiner [Thu, 1 Sep 2016 16:15:10 +0000 (18:15 +0200)]
Adapt genkernel fix for zfsonlinux/zfs#4749 to zfs dracut module

Signed-off-by: Moritz Maxeiner <moritz@ucworks.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4749
Closes #5058

7 years agoOpenZFS - Performance regression suite for zfstest
John Wren Kennedy [Wed, 3 Aug 2016 21:26:15 +0000 (21:26 +0000)]
OpenZFS - Performance regression suite for zfstest

Author: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: David Quigley <david.quigley@intel.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Ported-by: Don Brady <don.brady@intel.com>
OpenZFS-issue: https://www.illumos.org/issues/6950
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/dcbf3bd6
Delphix-commit: https://github.com/delphix/delphix-os/commit/978ed49
Closes #4929

ZFS Test Suite Performance Regression Tests

This was pulled into OpenZFS via the compressed arc feature and was
separated out in zfsonlinux as a separate pull request from PR-4768.
It originally came in as QA-4903 in Delphix-OS from John Kennedy.

Expected Usage:

$ DISKS="sdb sdc sdd" zfs-tests.sh -r perf-regression.run

Porting Notes:
1. Added assertions in the setup script to make sure required tools
   (fio, mpstat, ...) are present.
2. For the config.json generation in perf.shlib, use arcstats and
    other binaries instead of dtrace to query the values.
3. For the perf data collection:
   - use "zpool iostat -lpvyL" instead of the io.d dtrace script
    (currently not collecting zfs_read/write latency stats)
   - mpstat and iostat take different arguments
   - prefetch_io.sh is a placeholder that uses arcstats instead of
     dtrace
4. Build machines require the fio, mdadm and sysstat packages (YMMV).

Future Work:
   - Need a way to measure zfs_read and zfs_write latencies per pool.
   - Need tools that take two sets of output and display/graph the
     differences
   - Bring over additional regression tests from Delphix

7 years agoReal disk partitioning now enabled in test suite for Linux
Sydney Vanda [Fri, 22 Jul 2016 15:07:04 +0000 (15:07 +0000)]
Real disk partitioning now enabled in test suite for Linux

When using real devices, specify DISKS="sdb sdc sdd" as opposed to
/dev/sdb in zfs-tests.sh; otherwise some tests fail because directory
names and disk names are registered as "/dev//dev/sdb".  The same
goes for mpath: DISKS="mpatha mpathad mpathb"

Expected Usage:

$ DISKS="sdb sdc sdd" zfs-tests.sh

SLICE_PREFIX is now set as "p" for a loop device (ie loop0p2) or
"" for a real device (ie sdb2), or either for multipath devices
(ie mpatha1 or mpath1p1) instead of only "p" by default.  Note that
kpartx partitioning is not currently supported in this patch
(ie "partx") and may need to be disabled on Debian distributions.
Functions were added to determine the test directory (/dev or
/dev/mapper) and the slice prefix; these values are determined and
exported mostly in the cfg file of each test group directory.

Currently zpools cannot be created on whole mpath devices that have
been partitioned. In order to fix this tests have either been revised
to use a partition instead, or if there is a size constraint and the
pool needs to be created on the whole disk, partitions are then deleted
if the device is a multipath device.  This functionality is added to
default_cleanup() or to individual cleanup scripts if a non-default
cleanup method is used.

The max partitions is currently set at 8 to account for all of the
tests thus far.

Patch changes are generally encompassed in "if is_linux" construct.

Signed-off-by: Sydney Vanda <sydney.m.vanda@intel.com>
Reviewed-by: John Salinas <John.Salinas@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: David Quigley <david.quigley@intel.com>
Closes #4447
Closes #4964
Closes #5074

7 years agoTag 0.7.0-rc1
Brian Behlendorf [Wed, 7 Sep 2016 17:30:52 +0000 (10:30 -0700)]
Tag 0.7.0-rc1

First release candidate.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

7 years agoBring over illumos ZFS FMA logic -- phase 1
Don Brady [Wed, 31 Aug 2016 21:46:58 +0000 (15:46 -0600)]
Bring over illumos ZFS FMA logic -- phase 1

This first phase brings over the ZFS SLM module, zfs_mod.c, to handle
auto operations in response to disk events. Disk event monitoring is
provided from libudev and generates the expected payload schema for
zfs_mod. This work leverages the recently added devid and phys_path
strings in the vdev label.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@intel.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #4673

7 years agoDelete unreferenced function zfs_ereport_send_interim_checksum
luozhengzheng [Wed, 31 Aug 2016 11:07:36 +0000 (19:07 +0800)]
Delete unreferenced function zfs_ereport_send_interim_checksum

Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5055

7 years agokmem_zalloc with KM_SLEEP will never return NULL
luozhengzheng [Wed, 31 Aug 2016 01:32:02 +0000 (09:32 +0800)]
kmem_zalloc with KM_SLEEP will never return NULL

These allocations can never fail.  Leaving the error handling
code here gives the impression they can so it has been removed.

Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5048

7 years agoFix zfs_unmount() and zfs_unshare_proto() leaks
cao [Wed, 31 Aug 2016 10:35:52 +0000 (18:35 +0800)]
Fix zfs_unmount() and zfs_unshare_proto() leaks

Always free mnpt memory on failure in the zfs_unmount() function.

In the zfs_unshare_proto() function mountpoint is a const and
should not be assigned.

Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5054

7 years agoPerformance optimization of AVL tree comparator functions
Gvozden Neskovic [Sat, 27 Aug 2016 18:12:53 +0000 (20:12 +0200)]
Performance optimization of AVL tree comparator functions

perf: 2.75x faster ddt_entry_compare()
    The first 256 bits of ddt_key_t are a block checksum, which is expected
to be close to random data. Hence, on average, the comparison only needs to
look at the first few bytes of the keys. To reduce the number of conditional
jump instructions, the result is computed as: sign(memcmp(k1, k2)).

Sign of an integer 'a' can be obtained as: `(0 < a) - (a < 0)` := {-1, 0, 1} ,
which is computed efficiently.  Synthetic performance evaluation of
original and new algorithm over 1G random keys on 2.6GHz Intel(R) Xeon(R)
CPU E5-2660 v3:

old 6.85789 s
new 2.49089 s
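
The sign trick described above can be sketched as follows. This is an
illustration only: example_key_t is a hypothetical 256-bit key, not the real
ddt_key_t layout, and the function names are made up for the example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 256-bit key standing in for the checksum part of a key. */
typedef struct example_key {
	uint64_t word[4];
} example_key_t;

/* Branch-free sign: maps any int to -1, 0 or 1. */
static inline int
sign_of(int a)
{
	return ((0 < a) - (a < 0));
}

/* AVL-style comparator: always returns exactly -1, 0 or 1. */
static int
example_key_compare(const void *x1, const void *x2)
{
	const example_key_t *k1 = x1;
	const example_key_t *k2 = x2;

	return (sign_of(memcmp(k1, k2, sizeof (example_key_t))));
}
```

memcmp() is only guaranteed to return a negative, zero or positive value,
so the sign step is what normalizes the result to the three values a tree
comparator is expected to return.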

perf: 2.8x faster vdev_queue_offset_compare() and vdev_queue_timestamp_compare()
    Compute the result directly instead of using conditionals

perf: zfs_range_compare()
    Speedup between 1.1x - 2.5x, depending on compiler version and
optimization level.

perf: spa_error_entry_compare()
    `bcmp()` is not suitable for comparator use. Use `memcmp()` instead.

perf: 2.8x faster metaslab_compare() and metaslab_rangesize_compare()
perf: 2.8x faster zil_bp_compare()
perf: 2.8x faster mze_compare()
perf: faster dbuf_compare()
perf: faster compares in spa_misc
perf: 2.8x faster layout_hash_compare()
perf: 2.8x faster space_reftree_compare()
perf: libzfs: faster avl tree comparators
perf: guid_compare()
perf: dsl_deadlist_compare()
perf: perm_set_compare()
perf: 2x faster range_tree_seg_compare()
perf: faster unique_compare()
perf: faster vdev_cache _compare()
perf: faster vdev_uberblock_compare()
perf: faster fuid _compare()
perf: faster zfs_znode_hold_compare()

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Richard Elling <richard.elling@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5033

7 years agoFix zhack argument processing
Brian Behlendorf [Wed, 31 Aug 2016 01:56:36 +0000 (18:56 -0700)]
Fix zhack argument processing

The argument processing in zhack makes the assumption that getopt()
will not permute argv.  This isn't true for the GNU implementation of
getopt() unless the optstring is prefixed with a '+', which is
equivalent to setting the POSIXLY_CORRECT environment variable.
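
A minimal sketch of the '+' prefix, assuming a tool that takes global options
followed by a subcommand with its own options. The "+v" optstring and the
function name are hypothetical and are not zhack's actual option set.

```c
#define	_POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/*
 * Parse leading options only.  The '+' prefix asks GNU getopt() to stop at
 * the first non-option argument instead of permuting argv.
 * Returns the index of the first non-option argument, or -1 on a bad option.
 */
static int
parse_global_options(int argc, char **argv, int *verbose)
{
	int c;

	optind = 1;	/* start from the first argument (enough for this sketch) */
	while ((c = getopt(argc, argv, "+v")) != -1) {
		switch (c) {
		case 'v':
			(*verbose)++;
			break;
		default:
			return (-1);
		}
	}
	return (optind);
}
```

With arguments such as "tool -v feature -d pool", parsing stops at "feature",
so "-d" is left in place for the subcommand to handle rather than being moved
to the front and rejected as an unknown global option.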

In addition, update the usage() and optstrings to reflect the existing
supported options.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: liaoyuxiangqin <guo.yong33@zte.com.cn>
Closes #5047

7 years agoUpdate zpool_import_001_pos
Brian Behlendorf [Wed, 31 Aug 2016 01:50:11 +0000 (18:50 -0700)]
Update zpool_import_001_pos

Older versions of blkid may not promptly detect ZFS labels when
they're located on partitions.  In order to ensure this test passes
reliably always perform a scan of default search paths (-s).

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: liaoyuxiangqin <guo.yong33@zte.com.cn>
Closes #4987
Closes #5047

7 years agoFix "zpool get guid,freeing,leaked" source
Hajo Möller [Tue, 5 Jan 2016 21:46:54 +0000 (22:46 +0100)]
Fix "zpool get guid,freeing,leaked" source

`zpool get guid,freeing,leaked` shows SOURCE as `default`, it should
be `-` as those props are not editable.

Changed code to not overwrite `src` for `ZPOOL_PROP_VERSION`, so it
stays `ZPROP_SRC_NONE`.  Make src const to avoid future mistakes.

Signed-off-by: Hajo Möller <dasjoe@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4170

7 years agoUpdate zfs_destroy_004.ksh script
cao [Tue, 23 Aug 2016 02:12:41 +0000 (10:12 +0800)]
Update zfs_destroy_004.ksh script

Issues:
Under Linux, destroying $fs fails when zfs_destroy_004.ksh is executed.
The key issue here is that the illumos kernel treats this case
differently than the Linux kernel. On illumos you can unmount and
destroy a filesystem which is busy and all consumers of it get EIO.
On Linux the expected behavior is to prevent the unmount and destroy.

Cause analysis:
The test creates the $fs file system, mounts it at $mntp, and then
runs cd $mntp.  Linux does not allow $fs to be destroyed while the
current working directory is inside the mount, no matter which
parameters are passed to destroy.

Solution:
On Linux, expect the destroy to fail with log_mustnot $ZFS destroy $fs,
then cd $olddir and destroy $fs.

Signed-off-by: caoxuewen cao.xuewen@zte.com.cn
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5012

7 years agoUpdate zfs_create_003_pos.ksh and zfs_create_006_pos.ksh
ChaoyuZhang [Mon, 22 Aug 2016 02:27:06 +0000 (10:27 +0800)]
Update zfs_create_003_pos.ksh and zfs_create_006_pos.ksh

As the scripts zfs_create_003_pos.ksh and zfs_create_006_pos.ksh can
run successfully on Linux, add them to the <linux.run> file to
increase test coverage.

Signed-off-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5002

7 years agoAdd log_must_{retry,busy} helpers
Brian Behlendorf [Mon, 20 Jun 2016 21:28:51 +0000 (14:28 -0700)]
Add log_must_{retry,busy} helpers

Add helpers which automatically retry the provided command when
the error message matches the provided keyword.  This provides an
easy way to handle the asynchronous nature of some ZFS commands.

For example, the `zfs destroy` command may need to be retried in
the case where the block device is unexpectedly busy.  This can be
accomplished as follows:

  log_must_busy $ZFS destroy ...

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #5002

7 years agoUpdate zfs_mount_005_pos.ksh and zfs_mount_010_neg.ksh
liuhuang [Sun, 21 Aug 2016 23:40:54 +0000 (07:40 +0800)]
Update zfs_mount_005_pos.ksh and zfs_mount_010_neg.ksh

Update zfs_mount_005_pos.ksh and zfs_mount_010_neg.ksh to reflect
the expected Linux behavior.  The is_linux wrapper is used so the
test case may be used on Linux and non-Linux platforms.

Signed-off-by: liuhuang <liu.huang@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5000

7 years agoDelete unused zfsctl_snapdir_inactive declaration
cao [Tue, 30 Aug 2016 11:32:22 +0000 (19:32 +0800)]
Delete unused zfsctl_snapdir_inactive declaration

zfsctl_snapdir_inactive is defined in zfs-0.6.3.  In zfs-0.6.5.7
this declaration remains even though the implementation was
removed in commit 278bee93.  Removed fastreboot_disable_highpil
which is also unused.

Signed-off-by: caoxuewen cao.xuewen@zte.com.cn
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5042

7 years agoOpenZFS 6940 - Cannot unlink directories when over quota
Simon Klinkert [Tue, 30 Aug 2016 13:03:05 +0000 (15:03 +0200)]
OpenZFS 6940 - Cannot unlink directories when over quota

From a user's perspective, I would expect that ZFS is always able
to remove files and directories even when the quota is exceeded.

Authored by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/6940
OpenZFS-issue: https://www.illumos.org/issues/6334
OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/9918916
Closes #5044

7 years agoOpenZFS 6322 - ZFS indirect block predictive prefetch
Alexander Motin [Mon, 29 Aug 2016 21:36:39 +0000 (23:36 +0200)]
OpenZFS 6322 - ZFS indirect block predictive prefetch

For quite some time I was thinking about the possibility of prefetching
ZFS indirection tables while doing sequential reads or writes.
Recent changes in the predictive prefetcher made that much easier to
do. My tests on a zvol with 16KB block size on a 5x striped and 2x
mirrored pool of 10 disks show almost double throughput on sequential
read, and almost triple on sequential rewrite. While for reads a similar
effect can be achieved by increasing the maximal prefetch distance
(though at higher memory cost), for rewrites there is no other
solution so far.

Authored by: Alexander Motin <mav@freebsd.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/6322
OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/cb92f413
Closes #5040

Porting notes:
- Change from upstream in module/zfs/dbuf.c in 'int dbuf_read' due
  to commit 5f6d0b6 'Handle block pointers with a corrupt logical size'

- Difference from upstream in module/zfs/dmu_zfetch.c,
  uint32_t zfetch_max_idistance -> unsigned int zfetch_max_idistance

- Variables have been initialized at the beginning of the function
 (void dmu_zfetch) to resemble the order of occurrence and account
 for C99, C11 mode errors.

7 years agoOpenZFS 7086 - ztest attempts dva_get_dsize_sync on an embedded blockpointer
Matthew Ahrens [Mon, 29 Aug 2016 18:40:16 +0000 (11:40 -0700)]
OpenZFS 7086 - ztest attempts dva_get_dsize_sync on an embedded blockpointer

In dbuf_dirty(), we need to grab the dn_struct_rwlock before looking at
the db_blkptr, to prevent it from being changed by syncing context.

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/7086
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/98fa317
Closes #5039

7 years agoFix: Build warnings with different gcc optimization levels in debug mode
GeLiXin [Thu, 25 Aug 2016 08:40:20 +0000 (16:40 +0800)]
Fix: Build warnings with different gcc optimization levels in debug mode

This fix resolves warnings reported when compiling with different gcc
optimization levels in debug mode.

Test tools:
gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC)
Linux version: 2.6.32-573.18.1.el6.x86_64, Red Hat Enterprise Linux Server release 6.1 (Santiago)

List of warnings:
CFLAGS=-O1 ./configure --enable-debug ;make
../../module/icp/core/kcf_sched.c: In function ‘kcf_aop_done’:
../../module/icp/core/kcf_sched.c:499: error: ‘fg’ may be used uninitialized in this function
../../module/icp/core/kcf_sched.c:499: note: ‘fg’ was declared here

CFLAGS=-Os ./configure --enable-debug ; make
libzfs_dataset.c: In function ‘zfs_prop_set_list’:
libzfs_dataset.c:1575: error: ‘nvl_len’ may be used uninitialized in this function

Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5022

7 years agoFix cv_timedwait_hires
Brian Behlendorf [Thu, 25 Aug 2016 20:24:01 +0000 (20:24 +0000)]
Fix cv_timedwait_hires

The user space implementation of cv_timedwait_hires() was always passing
a relative time to pthread_cond_timedwait() when an absolute time is
expected.  This was accidentally introduced in commit 206971d2.

Replace two magic values with their corresponding preprocessor macro.
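
The relative-versus-absolute distinction can be sketched as below. This is
an assumption-laden illustration rather than the libzpool code: the
function names and EXAMPLE_NSEC_PER_SEC are hypothetical, rel_ns is assumed
to be non-negative, and the condition variable is assumed to use the default
CLOCK_REALTIME clock.

```c
#define	_POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

#define	EXAMPLE_NSEC_PER_SEC	1000000000LL	/* named, not a magic value */

/* Convert a relative timeout in nanoseconds into an absolute deadline. */
static struct timespec
deadline_from_relative(const struct timespec *now, int64_t rel_ns)
{
	struct timespec ts;
	int64_t nsec = (int64_t)now->tv_nsec + rel_ns % EXAMPLE_NSEC_PER_SEC;

	ts.tv_sec = now->tv_sec + (time_t)(rel_ns / EXAMPLE_NSEC_PER_SEC);
	if (nsec >= EXAMPLE_NSEC_PER_SEC) {
		/* Carry the overflowed nanoseconds into the seconds field. */
		ts.tv_sec++;
		nsec -= EXAMPLE_NSEC_PER_SEC;
	}
	ts.tv_nsec = (long)nsec;
	return (ts);
}

/*
 * Wait on cv for at most rel_ns nanoseconds.  The caller must hold mtx.
 * Returns 0 if woken, ETIMEDOUT if the deadline passed.
 */
static int
example_timedwait_relative(pthread_cond_t *cv, pthread_mutex_t *mtx,
    int64_t rel_ns)
{
	struct timespec now, deadline;

	(void) clock_gettime(CLOCK_REALTIME, &now);
	deadline = deadline_from_relative(&now, rel_ns);
	/* pthread_cond_timedwait() expects an absolute time. */
	return (pthread_cond_timedwait(cv, mtx, &deadline));
}
```

Passing the relative value straight through would describe a moment shortly
after the epoch, so the wait would time out immediately instead of sleeping
for the requested interval.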

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #5024

7 years agoAdd zfs_arc_meta_limit_percent tunable
GeLiXin [Thu, 11 Aug 2016 03:15:37 +0000 (11:15 +0800)]
Add zfs_arc_meta_limit_percent tunable

ARC will evict meta buffers that exceed the arc_meta_limit. Pending further
investigation into whether meta buffers need special protection,
this tunable makes arc_meta_limit adjustable for different workloads.

People can set zfs_arc_meta_limit_percent to any value when loading zfs.ko
with insmod, so a range check is added to guarantee a suitable arc_meta_limit.
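
A range check for a percent-style tunable might look like the sketch below.
The clamping policy (cap at 100 percent, never fall below a minimum) and
all names are assumptions made for illustration; the commit itself defines
the actual bounds.

```c
#include <stdint.h>

/*
 * Derive a byte limit from a percentage tunable.  A percent above 100 is
 * clamped to 100, and the result is kept within [min_limit, total]
 * (total wins if min_limit is larger).  All names are hypothetical.
 */
static uint64_t
example_limit_from_percent(uint64_t total, uint64_t percent,
    uint64_t min_limit)
{
	uint64_t limit;

	if (percent > 100)
		percent = 100;
	/* Split the multiplication so large totals cannot overflow. */
	limit = (total / 100) * percent + ((total % 100) * percent) / 100;
	if (limit < min_limit)
		limit = min_limit;
	if (limit > total)
		limit = total;
	return (limit);
}
```

Clamping at the point where the tunable is consumed means an arbitrary
value supplied at module load time still yields a usable limit.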

As suggested by Tim Chase, zfs_arc_dnode_limit is changed to a percent-style
tunable as well.

Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4957

7 years agoPrevent reclaim in send_traverse_thread()
Tim Chase [Sun, 21 Aug 2016 13:22:32 +0000 (08:22 -0500)]
Prevent reclaim in send_traverse_thread()

As is the case with traverse_prefetch_thread(), the deep stacks caused
by traversal require disabling reclaim in the send traverse thread.

Also, do the same for receive_writer_thread() in which similar problems
have been observed.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4912
Closes #4998

7 years agoFix: Array bounds read in zprop_print_one_property()
GeLiXin [Mon, 22 Aug 2016 03:20:22 +0000 (11:20 +0800)]
Fix: Array bounds read in zprop_print_one_property()

If the loop index i reaches (ZFS_GET_NCOLS - 1), cbp->cb_columns[i + 1]
actually reads the data of cbp->cb_colwidths[0], which means the array
subscript is above the array bounds.

Luckily cbp->cb_colwidths[0] is always 0 and it seems we haven't
looped enough times to exceed the array bounds so far, but it is
a latent risk.
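
One way to guard a look-ahead such as columns[i + 1] is to test the index
first, as in this sketch. EXAMPLE_NCOLS, EXAMPLE_COL_NONE and
is_last_column() are hypothetical stand-ins and do not reproduce the libzfs
code.

```c
#include <stdbool.h>
#include <stddef.h>

#define	EXAMPLE_NCOLS		5	/* hypothetical column count */
#define	EXAMPLE_COL_NONE	0	/* hypothetical "unused slot" marker */

/*
 * Return true if column i is the last one in use.  The index test comes
 * first so that columns[i + 1] is never evaluated for the final slot.
 */
static bool
is_last_column(const int columns[EXAMPLE_NCOLS], size_t i)
{
	return (i == EXAMPLE_NCOLS - 1 ||
	    columns[i + 1] == EXAMPLE_COL_NONE);
}
```

Because || short-circuits, the out-of-range element is never read when i is
the final index, so the result no longer depends on whatever happens to
follow the array in memory.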

Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5003

7 years agoLinux compat: Grsecurity kernel
Gvozden Neskovic [Sun, 21 Aug 2016 19:29:49 +0000 (21:29 +0200)]
Linux compat: Grsecurity kernel

API Change: Module parameter set/get methods take const parameter in
Grsecurity kernel v4.7.1

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Jason Zaman <jason@perfinion.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4997
Closes #5001

7 years agoOpenZFS 7004 - dmu_tx_hold_zap() does dnode_hold() 7x on same object
Matthew Ahrens [Wed, 20 Jul 2016 22:42:13 +0000 (15:42 -0700)]
OpenZFS 7004 - dmu_tx_hold_zap() does dnode_hold() 7x on same object

Using a benchmark which has 32 threads creating 2 million files in the
same directory, on a machine with 16 CPU cores, I observed poor
performance. I noticed that dmu_tx_hold_zap() was using about 30% of
all CPU, and doing dnode_hold() 7 times on the same object (the ZAP
object that is being held).

dmu_tx_hold_zap() keeps a hold on the dnode_t the entire time it is
running, in dmu_tx_hold_t:txh_dnode, so it would be nice to use the
dnode_t that we already have in hand, rather than repeatedly calling
dnode_hold(). To do this, we need to pass the dnode_t down through
all the intermediate calls that dmu_tx_hold_zap() makes, making these
routines take the dnode_t* rather than an objset_t* and a uint64_t
object number. In particular, the following routines will need to have
analogous *_by_dnode() variants created:

dmu_buf_hold_noread()
dmu_buf_hold()
zap_lookup()
zap_lookup_norm()
zap_count_write()
zap_lockdir()
zap_count_write()

This can improve performance on the benchmark described above by 100%,
from 30,000 file creations per second to 60,000. (This improvement is on
top of that provided by working around the object allocation issue. Peak
performance of ~90,000 creations per second was observed with 8 CPUs;
adding CPUs past that decreased performance due to lock contention.) The
CPU used by dmu_tx_hold_zap() was reduced by 88%, from 340 CPU-seconds
to 40 CPU-seconds.

Sponsored by: Intel Corp.

Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/7004
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/109
Closes #4641
Closes #4972

7 years agoOpenZFS 7003 - zap_lockdir() should tag hold
Matthew Ahrens [Wed, 20 Jul 2016 22:39:55 +0000 (15:39 -0700)]
OpenZFS 7003 - zap_lockdir() should tag hold

zap_lockdir() / zap_unlockdir() should take a "void *tag" argument which
tags the hold on the zap. This will help diagnose programming errors
which misuse the hold on the ZAP.
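
The idea of tagging a hold can be illustrated with a small userspace sketch.
It is not the ZAP or refcount implementation; example_obj_t and the hold and
release functions are hypothetical, and the fixed-size holder array is a
simplification.

```c
#include <assert.h>
#include <stddef.h>

#define	EXAMPLE_MAX_HOLDS	8

typedef struct example_obj {
	const void *holder[EXAMPLE_MAX_HOLDS];	/* tag of each active hold */
	int nholds;
} example_obj_t;

/* Record a hold on o that is owned by tag. */
static void
example_hold(example_obj_t *o, const void *tag)
{
	assert(o->nholds < EXAMPLE_MAX_HOLDS);
	o->holder[o->nholds++] = tag;
}

/* Drop the hold owned by tag.  Returns 0, or -1 if tag holds nothing. */
static int
example_rele(example_obj_t *o, const void *tag)
{
	for (int i = 0; i < o->nholds; i++) {
		if (o->holder[i] == tag) {
			/* Fill the gap with the last entry. */
			o->holder[i] = o->holder[--o->nholds];
			return (0);
		}
	}
	return (-1);
}
```

A release with a tag that never took a hold is reported instead of silently
decrementing a counter, which is the kind of misuse a tag argument is meant
to expose.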

Sponsored by: Intel Corp.

Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Pavel Zakharov <pavel.zakha@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/7003
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/108
Closes #4972

7 years agoFix spa config generate memory leak in spa_load_best function
heary-cao [Sat, 6 Aug 2016 07:08:51 +0000 (15:08 +0800)]
Fix spa config generate memory leak in spa_load_best function

When the spa load retry succeeds and spa recovery is requested, the
generated config may leak in the spa_load_best function.  Always free
the generated config when it is not assigned to the spa.

Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4940

7 years agoUpdate zfs_create_(009,010)_neg.ksh
ChaoyuZhang [Wed, 17 Aug 2016 00:57:25 +0000 (08:57 +0800)]
Update zfs_create_(009,010)_neg.ksh

Just clean up the new fs created during the test, so "$found"
should be "true".

Signed-off-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4978

7 years agoOpenZFS 7176 - Yet another hole birth issue
Paul Dagnelie [Tue, 9 Aug 2016 21:06:39 +0000 (23:06 +0200)]
OpenZFS 7176 - Yet another hole birth issue

This is another bug in the long line of hole-birth related issues. In
this particular case, it was discovered that a previous hole-birth fix
(illumos bug 6513, commit bc77ba73) did not cover as many cases as we
thought it did. While the fix worked in the case of hole-punching
(writing zeroes to a large part of a file), it did not deal with
truncation followed by a write beyond the new end of the file.

The problem is that dbuf_findbp will return ENOENT if the block it's
trying to find is beyond the end of the file. If that happens, we assume
there is no birth time, and so we lose that information when we write
out new blkptrs. We should teach dbuf_findbp to look for things that are
beyond the current end, but not beyond the absolute end of the file.

Authored by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens mahrens@delphix.com
Reviewed by: George Wilson george.wilson@delphix.com
Ported-by: kernelOfTruth <kerneloftruth@gmail.com>
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/7176
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/173/commits/8b9f3ad
Upstream-bugs: DLPX-46009

Porting notes:
- Fix ISO C90 mixed declaration error in dbuf.c ( int nlevels, epbs; ) ;
  keep previous position of the initialization

7 years agoFix do_link portion of ctime test
Nikolay Borisov [Tue, 16 Aug 2016 20:00:16 +0000 (23:00 +0300)]
Fix do_link portion of ctime test

From the man page of dirname: " Both dirname() and basename()
may modify the contents of path, so it may be desirable to pass
a copy when calling one of these functions." And in fact, on Linux,
using dirname actually changes the contents of the passed parameter, as is
evident from the following failure when running the ctime test:

link(/root/zfs-mount, /root/zfs-mount/link_file)

Fix this by creating a copy of the input parameter and passing that
to dirname, leaving the original parameter intact and allowing
the hard link to be created.
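
The copy-before-dirname approach described above can be sketched roughly as
follows. This is an illustration only, not the actual test code: the helper
names `copy_string()` and `dirname_copy()` are hypothetical.

```c
#include <libgen.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap-allocated copy of a NUL-terminated string. */
static char *
copy_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy != NULL)
		memcpy(copy, s, len);
	return (copy);
}

/*
 * Hypothetical helper: return a newly allocated copy of the directory
 * component of `path` without modifying `path`.  The caller frees it.
 */
static char *
dirname_copy(const char *path)
{
	char *scratch = copy_string(path);	/* dirname() may modify this */
	char *result;

	if (scratch == NULL)
		return (NULL);

	/*
	 * dirname() may return a pointer into scratch or into static
	 * storage, so copy the result before freeing scratch.
	 */
	result = copy_string(dirname(scratch));
	free(scratch);
	return (result);
}
```

With this shape, the original path string can still be passed to link()
after its directory component has been computed.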

Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4977

7 years agoIt is not necessary to zero struct dbuf_hold_impl_data
Matthew Ahrens [Thu, 21 Jul 2016 05:50:26 +0000 (22:50 -0700)]
It is not necessary to zero struct dbuf_hold_impl_data

Under a workload which makes heavy use of `dbuf_hold()`, I noticed that a
considerable amount of time was spent in `dbuf_hold_impl()`, due to its call to
`kmem_zalloc(sizeof (struct dbuf_hold_impl_data) * DBUF_HOLD_IMPL_MAX_DEPTH)`,
which is around 2KiB.  This structure is used as a stack, to limit the size of
the C stack as dbuf_hold() calls itself recursively.  We make a recursive call
to hold the parent's dbuf when the requested dbuf is not found.  The vast
majority of the time, the parent or grandparent indirect dbuf is cached, so the
number of recursive calls is very low.  However, we initialize this entire
array for every call to dbuf_hold().

To improve performance, this commit changes `dbuf_hold()` to use `kmem_alloc()`
instead of `kmem_zalloc()`.  `__dbuf_hold_impl_init()` is changed to initialize all
members of the struct before they are used.  I observed a ~5% performance
improvement on a workload which creates many files.
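
A minimal userspace sketch of the idea, assuming a simplified stand-in for
the real structure: `struct hold_data`, `HOLD_MAX_DEPTH`, `hold_data_init()`
and `hold_stack_create()` are hypothetical names, and `malloc()` plays the
role of `kmem_alloc()` (with `calloc()` corresponding to `kmem_zalloc()`).

```c
#include <stdint.h>
#include <stdlib.h>

#define	HOLD_MAX_DEPTH	20	/* hypothetical depth, for illustration only */

/* Hypothetical, simplified stand-in for struct dbuf_hold_impl_data. */
struct hold_data {
	uint64_t	hd_blkid;
	int		hd_level;
	int		hd_depth;
	void		*hd_parent;
};

/* Initialize every member of one slot, so no prior zeroing is needed. */
static void
hold_data_init(struct hold_data *hd, int level, uint64_t blkid, int depth)
{
	hd->hd_blkid = blkid;
	hd->hd_level = level;
	hd->hd_depth = depth;
	hd->hd_parent = NULL;
}

/*
 * Allocate the explicit stack without zeroing it.  Only slot 0 is
 * initialized up front; deeper slots are initialized when a recursive
 * step actually uses them.
 */
static struct hold_data *
hold_stack_create(int level, uint64_t blkid)
{
	struct hold_data *stack = malloc(sizeof (*stack) * HOLD_MAX_DEPTH);

	if (stack != NULL)
		hold_data_init(&stack[0], level, blkid, 0);
	return (stack);
}
```

The trade-off is that correctness now depends on the init function touching
every member; a member added later but not initialized there would contain
garbage instead of zero.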

Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4974

7 years agozdb: fencepost error at zdb_cb.zcb_embedded_histogram[][]
Gvozden Neskovic [Thu, 4 Aug 2016 14:23:35 +0000 (16:23 +0200)]
zdb: fencepost error at zdb_cb.zcb_embedded_histogram[][]

Erroneous access detected by gcc UndefinedBehaviorSanitizer:
`zdb.c:2424:7: runtime error: index 112 out of bounds for type 'uint64_t [112]'`

Fix: increase histogram size by 1 to accommodate all possible sizes.
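
As an illustration of this class of bug (not the zdb code itself): when the
largest legal index is N, the array needs N + 1 slots. The names `MAX_SIZE`,
`histogram` and `histogram_add()` below are hypothetical; 112 is taken from
the sanitizer message above.

```c
#include <stdint.h>

#define	MAX_SIZE	112	/* largest value that can be recorded */

/* Valid indices are 0..MAX_SIZE inclusive, hence MAX_SIZE + 1 slots. */
static uint64_t histogram[MAX_SIZE + 1];

/* Count one item of the given size; 0 on success, -1 if out of range. */
static int
histogram_add(unsigned int size)
{
	if (size > MAX_SIZE)
		return (-1);
	histogram[size]++;
	return (0);
}
```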

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4934
Issue #4883

7 years agoRework of fletcher_4 module
Gvozden Neskovic [Tue, 12 Jul 2016 15:50:54 +0000 (17:50 +0200)]
Rework of fletcher_4 module

- Benchmark memory block is increased to 128 KiB to reflect real block sizes more
accurately. Measurements include all three stages needed for checksum generation,
i.e. `init()/compute()/fini()`. The inner loop is repeated multiple times to offset
the overhead of the timing function.

- The fastest implementation is selected independently for the native and byteswap
methods in the benchmark. To support this, new function pointers
`init_byteswap()/fini_byteswap()` are introduced.

- Implementation mutex lock is replaced by atomic variable.

- To save time, the benchmark is not executed in userspace. Instead, the highest
supported implementation is used as the fastest. The default userspace selector is
still 'cycle'.

- `fletcher_4_native/byteswap()` methods use incremental methods to finish the
calculation if the data size is not a multiple of the vector stride (currently 64B).

- Added `fletcher_4_native_varsize()`, a special purpose method for use when the
buffer size is not known in advance. The method does not enforce 4B alignment on the
buffer size, and will ignore the last (size % 4) bytes of the data buffer.

- Benchmark `kstat` is changed to match the one of vdev_raidz. It now shows
throughput for all supported implementations (in B/s), native and byteswap,
as well as the implementation selected as [fastest].
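
For reference, a minimal scalar sketch of the Fletcher-4 recurrence with an
init/update split, showing why incremental calls can finish a buffer and how
a varsize-style call can ignore the trailing (size % 4) bytes. The type and
function names (`f4_sum_t`, `f4_init()`, `f4_update()`) are hypothetical and
are not the module's API; only native byte order is shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical accumulator holding the four running 64-bit sums. */
typedef struct {
	uint64_t a, b, c, d;
} f4_sum_t;

static void
f4_init(f4_sum_t *s)
{
	s->a = s->b = s->c = s->d = 0;
}

/*
 * Fold `size` bytes into the running sums as native-endian 32-bit words.
 * Any trailing (size % 4) bytes are ignored.  Because the state lives in
 * `s`, a buffer can be processed in several consecutive calls as long as
 * every call except the last is given a multiple of 4 bytes.
 */
static void
f4_update(f4_sum_t *s, const void *buf, size_t size)
{
	const unsigned char *p = buf;
	size_t nwords = size / sizeof (uint32_t);

	for (size_t i = 0; i < nwords; i++) {
		uint32_t w;

		/* memcpy avoids unaligned access on the input buffer */
		memcpy(&w, p + i * sizeof (w), sizeof (w));
		s->a += w;
		s->b += s->a;
		s->c += s->b;
		s->d += s->c;
	}
}
```

A vectorized implementation could, under the same assumptions, handle the
bulk of the buffer in stride-sized chunks and pass the remainder to a scalar
update like this one.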

Example of `fletcher_4_bench` running on `Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz`:
implementation   native         byteswap
scalar           4768120823     3426105750
sse2             7947841777     4318964249
ssse3            7951922722     6112191941
avx2             13269714358    11043200912
fastest          avx2           avx2

Example of `fletcher_4_bench` running on `Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz`:
implementation   native         byteswap
scalar           1291115967     1031555336
sse2             2539571138     1280970926
ssse3            2537778746     1080016762
avx2             4950749767     1078493449
avx512f          9581379998     4010029046
fastest          avx512f        avx512f

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4952