git.proxmox.com Git - mirror

Linux compat 4.16: SECTOR_SIZE

As of https://github.com/torvalds/linux/commit/233bde21,
SECTOR_SIZE is defined in linux/blkdev.h. Define SECTOR_SIZE
in sunldi.h only if it's not already defined.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #697

Remove sysevents

These headers are provided in the ZFS repository and never used
by the SPL. Remove them to ensure the right ones are included.

Reviewed-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #696

Fix spl-kmod builds when using rpm >= 4.14

With rpm-software-management/rpm@5e94633 a package version containing
invalid characters (most commonly a double '-') causes the kmod package
generation to terminate with an error. This change takes advantage of
the newly introduced rpm macro "_wrong_version_format_terminate_build"
to allow kmod packages to be built.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #691

Fix more cstyle warnings

This patch contains no functional changes. It is solely intended
to resolve cstyle warnings in order to facilitate moving the spl
source code in to the zfs repository.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #687

Fix function name typos

vn_init() and vn_fini() had been renamed by 12ff95ff in 2011.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #686

Staticize kstat_default_update()

This is only used via ->ks_update of `kstat_t *`.
This isn't exported nor do headers have its prototype.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #686

Fix multiple evaluations of VERIFY() and ASSERT() on failures

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes #684
Closes #685

Split spl-build.m4

Split the kernel interface configure checks in to seperate m4
macro files. This is intended to facilitate moving the spl
source code in to the zfs repository.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #682

Fix cstyle warnings

This patch contains no functional changes. It is solely intended
to resolve cstyle warnings in order to facilitate moving the spl
source code in to the zfs repository.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #681

Add cv_timedwait_io()

Add missing helper function cv_timedwait_io(), it should be used
when waiting on IO with a specified timeout.

Reviewed-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #674

Fix Debian packaging on ARMv7/ARM64

When building packages on Debian-based systems specify the target
architecture used by 'alien' to convert .rpm packages into .deb: this
avoids detecting an incorrect value which results in the following
errors:

<package>.aarch64.rpm is for architecture aarch64 ; the package cannot be built on this system
<package>.armv7l.rpm is for architecture armel ; the package cannot be built on this system

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes zfsonlinux/zfs#7046
Closes #678

Linux 4.15 compat: timer updates

Use timer_setup() macro and new timeout function definition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #670
Closes #671

Linux 4.14 compat: vfs_read & vfs_write

The kernel_read & kernel_write functions have always wrapped the
vfs_read & vfs_write functions respectively.  However, they could
not be used by vn_rdwr() since the offset wasn't passed as a
pointer.  This prevented us from being able to properly update
the file offset.

Linux 4.14 unexported vfs_read & vfs_write but also changed the
signature of kernel_read & kernel_write to provide the needed
functionality.  Use these updated functions when available.

Reviewed-by: Pritam Baral <pritam@pritambaral.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #656
Closes #667

Remove all spin_is_locked calls

On systems with CONFIG_SMP turned off, spin_is_locked always returns
false causing these assertions to fail. Remove them as suggested in
zfsonlinux/zfs#6558.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: James Cowgill <james.cowgill@mips.com>
Closes #665

Remove vn_rename and vn_remove

Both vn_rename and vn_remove have been historically problematic
to implement reliably. Rather than fixing them yet again they
are being removed.

Reviewed-by: Arkadiusz Bubala <arkadiusz.bubala@open-e.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #648
Closes #661

Update spl module parameters man5 with missing parameter details

Update spl module parameters man5 with the following missing parameter
details for spl_panic_halt.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alex Braunegg <alex.braunegg@gmail.com>
Closes #664

Add DKMS package on Debian-based distributions

* config/deb.am: Enable building DKMS packages for Debian
* rpm/generic/spl-dkms.spec.in: Adjust spec to be Debian-compatible
* Condition kernel-devel Requires to RPM distros
* Ensure that --rpm_safe_upgrade isn't used on non-RPM distros
* config/deb.am: Drop CONFIG_KERNEL and CONFIG_USER guards
* Makefile.am: Add pkg-dkms target

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Neal Gompa <ngompa@datto.com>
Closes #657

Add parenthesis to btop and ptob macros

Add missing parenthesis around btop and ptob macros to ensure
operation ordering is preserved after expansion.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #660

Make include/linux/ conform to ZFS style standard

No semantic changes.

Fix the following types of style issues:
blank after preprocessor #
#define followed by space instead of tab
improper first line of block comment
indent by spaces instead of tabs
last line in file is blank
missing blank after open comment
missing space before left brace
non-continuation indented 4 spaces
spaces instead of tabs
unparenthesized return expression

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>

Make file headers conform to ZFS style standard

No semantic changes.

Change
/************\
and
\************/

to

/*
and
*/

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>

Pool io stat shows wlentime instead of rlentime

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com>
Closes #652
Closes #651

Allow longer SPA names in stats

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: gaurkuma <gauravk.18@gmail.com>
Closes #641

make module/spl/spl-kmem.c non-executable (again)

This was probably accidentally committed in

aeb9baa618beea1458ab3ab22cbc0f39213da6cf
Fix: handle NULL case in spl_kmem_free_track()

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Closes #644

spl-module-parameters.5 manpage: fix macro

There is no '.sh' macro in troff/groff/man, only '.SH' for section
headers. I assume .sp for a line break was intended here like
in the rest of the man page.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #643

splat.1 manpage: fix spelling of 'hexadecimal'

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #642

Add assert under lock to detect cases of dispach of a preallocated

taskq work item to more than one queue concurrently. Also, please
see discussion in zfsonlinux/zfs#3840.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Closes #609

Fix use-after-free in taskq_seq_show_impl

taskq_seq_show_impl walks the tq_active_list to show the tqent_func and
tqent_arg. However for taskq_dispatch_ent, it's very likely that the
task entry will be freed during the function call, and causes a
use-after-free bug.

To fix this, we duplicate the task entry to an on-stack struct, and
assign it instead to tqt_task. This way, the tq_lock alone will
guarantee its safety.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #638
Closes #640

Add __divmoddi4 and __udivmoddi4 for 32-bit arch

gcc-7 seems to use __udivmoddi4 for 64-bit division on 32-bit arch. This
patch implement them so we don't get undefined reference error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes zfsonlinux/zfs#6417
Closes #636

Remove misguided HAVE_MUTEX_OWNER check, take 2

It is just plain unsafe to peek inside in-kernel
mutex structure and make assumptions about what kernel
does with those internal fields like owner.

Kernel is all too happy to stop doing the expected things
like tracing lock owner once you load a tainted module
like spl/zfs that is not GPL.

As such you will get instant assertion failures like this:

  VERIFY3(((*(volatile typeof((&((&zo->zo_lock)->m_mutex))->owner) *)&
      ((&((&zo->zo_lock)->m_mutex))->owner))) ==
     ((void *)0)) failed (ffff88030be28500 == (null))
  PANIC at zfs_onexit.c:104:zfs_onexit_destroy()
  Showing stack for process 3626
  CPU: 0 PID: 3626 Comm: mkfs.lustre Tainted: P OE ------------ 3.10.0-debug #1
  Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
  Call Trace:
  dump_stack+0x19/0x1b
  spl_dumpstack+0x44/0x50 [spl]
  spl_panic+0xbf/0xf0 [spl]
  zfs_onexit_destroy+0x17c/0x280 [zfs]
  zfsdev_release+0x48/0xd0 [zfs]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Closes #639
Closes #632

spl-mutex: fix race in mutex_exit

Prevent race on accessing kmutex_t when the mutex is
embedded in a ref counted structure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Closes zfsonlinux/zfs#6401
Closes #637

Revert "Remove misguided HAVE_MUTEX_OWNER check"

This reverts commit d89616fda88bc030aaff758d37ede7d35e58841a which
introduced some build failures which need to be resolved before
this can be merged.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #633

Remove misguided HAVE_MUTEX_OWNER check

It is just plain unsafe to peek inside in-kernel
mutex structure and make assumptions about what kernel
does with those internal fields like owner.

Kernel is all too happy to stop doing the expected things
like tracing lock owner once you load a tainted module
like spl/zfs that is not GPL.

As such you will get instant assertion failures like this:

  VERIFY3(((*(volatile typeof((&((&zo->zo_lock)->m_mutex))->owner) *)&
      ((&((&zo->zo_lock)->m_mutex))->owner))) ==
     ((void *)0)) failed (ffff88030be28500 == (null))
  PANIC at zfs_onexit.c:104:zfs_onexit_destroy()
  Showing stack for process 3626
  CPU: 0 PID: 3626 Comm: mkfs.lustre Tainted: P OE ------------ 3.10.0-debug #1
  Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
  Call Trace:
  dump_stack+0x19/0x1b
  spl_dumpstack+0x44/0x50 [spl]
  spl_panic+0xbf/0xf0 [spl]
  zfs_onexit_destroy+0x17c/0x280 [zfs]
  zfsdev_release+0x48/0xd0 [zfs]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Closes #632
Closes #633

Fix aarch64 build

Add aarch64 to the list of architecture which do not sanitize the
LDFLAGS from the environment. See e0aacd9b for details.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #635

Tag spl-0.7.0

META file and changelog updated.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Module parameter to enable spl_panic() to panic the kernel

In unattended operations it's often more useful to have node
panic and reboot when it encounters problems as opposed to
sit there indefinitely waiting for somebody to discover it.

This implements an spl_panic_crash module parameter, set it
to nonzero to cause spl_panic() to call panic().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Closes #634

Avoid WARN() from procfs on kstat collision

When we load a ZFS pool having spa_name equals to some existing kstat
we would have to create a duplicate entry, which procfs doesn't like.

For instance a ZFS pool named "zil" would have its kstat "txgs"
(module "zfs/zil") intalled under "/proc/spl/kstat/zfs/zil":
unfortunately we already have a kstat named "zil" (module "zfs")
installed in the same procfs location.

Avoid this issue by skipping the duplicate entry creation in procfs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #628

Linux 4.13 compat: wait queues

Commit torvalds/linux@ac6424b9
- Renamed struct wait_queue -> struct wait_queue_entry.

Commit torvalds/linux@2055da97
- Renamed wait_queue_head::task_list -> wait_queue_head::head
- Renamed wait_queue_entry::task_list -> wait_queue_entry::entry

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #629

Tag 0.7.0-rc5

Fifth release candidate.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Don't cache the system hostid

Historically the SPL cached the system hostid the first time it
was accessed. This was done to speed up subsequent accesses.
But in practice the system host id is rarely accessed and its
inconvenient that it doesn't promptly detect /etc/hostid
configuration changes. Therefore, zone_get_hostid() has been
updated to always refresh the system hostid reported.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #626

Add ASSERT3B/VERIFY3B/USEC2NSEC/NSEC2USEC macros

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #627

Fix RWSEM_SPINLOCK_IS_RAW check failed

Initialize dummy_lock to fix the build error in gcc 7.1.1 with:
error: ‘dummy_lock’ is used uninitialized in this function

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #622

config: allow --with-linux without --with-linux-obj

Don't use `uname -r` to determine kernel build directory when the user
specified kernel source with --with-linux. Otherwise, the user is forced
to use --with-linux-obj even if they are the same directory, which is
very counterintuitive.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>

Improve gitignore

Exclude Makefile.in in module/ and fix the gitignore in cmd/
Also, ignore *.patch and *.orig files

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>

Fix cv_timedwait timeout

Perform the already past expiration time check before updating
cvp->cv_mutex with the provided mutex. This check only depends
on local state. Doing it first ensures that cvp->cv_mutex will not
be updated in the timeout case or if it's ever called with an
expire_time <= now.

Reviewed-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #616

Linux 4.12 compat: PF_FSTRANS was removed

Change SPL_FSTRANS to optionally contains PF_FSTRANS. Also, add
__spl_pf_fstrans_check for the checks specifically for PF_FSTRANS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #614

Tag 0.7.0-rc4

Fourth release candidate.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

glibc 2.25 compat: remove assert(X=Y)

The assert() related definitions in glibc 2.25 were altered to warn
about assert(X=Y) when -Wparentheses is used. See
https://abi-laboratory.pro/tracker/changelog/glibc/2.25/log.html

lib/list.c used this construct to set the value of a magic field which
is defined only when debugging.

Replaced the assert()s with #ifndef/#endifs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #610

Linux 4.11 compat: remove stub for __put_task_struct

Before kernel 2.6.29 credentials were embedded in task_structs, and zfs had
cases where one thread would need to refer to the credential of another thread,
forcing it to take a hold on the foreign thread's task_struct to ensure it was
not freed.

Since 2.6.29, the credential has been moved out of the task_struct into a
cred_t.

In addition, the mainline kernel originally did not export __put_task_struct()
but the RHEL5 kernel did, according to zfsonlinux/spl@e811949a570. As of
2.6.39 the mainline kernel exports it.

There is no longer zfs code that takes or releases holds on a task_struct, and
so there is no longer any reference to __put_task_struct().

This affects the linux 4.11 kernel because the prototype for
__put_task_struct() is in a new include file (linux/sched/task.h) and so the
config check failed to detect the exported symbol.

Removing the unnecessary stub and corresponding config check. This works on
kernels since the oldest one currently supported, 2.6.32 as shipped with
Centos/RHEL.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #608

Linux 4.11 compat: add linux/sched/signal.h

In Linux 4.11, torvalds/linux@2a1f062, signal handling related functions
were moved from sched.h into sched/signal.h.

Add configure checks to detect this and include the new file where
needed.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #608

Linux 4.11 compat: vfs_getattr() takes 4 args

There are changes to vfs_getattr() in torvalds/linux@a528d35.  The new
interface is:

int vfs_getattr(const struct path *path, struct kstat *stat,
               u32 request_mask, unsigned int query_flags)

The request_mask argument indicates which field(s) the caller intends to
use.  Fields the caller does not specify via request_mask may be set in
the returned struct anyway, but their values may be approximate.

The query_flags argument indicates whether the filesystem must update
the attributes from the backing store.

This patch uses the query_flags which result in vfs_getattr behaving the same
as it did with the 2-argument version which the kernel provided before
Linux 4.11.

Members blksize and blocks are now always the same size regardless of
arch.  They match the size of the equivalent members in vnode_t.

The configure checks are modified to ensure that the appropriate
vfs_getattr() interface is used.

A more complete fix, removing the ZFS dependency on vfs_getattr()
entirely, is deferred as it is a much larger project.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #608

Fix powerpc build

Unlike other architectures which sanitize the LDFLAGS from the
environment in arch/<arch>/Makefile.  The powerpc Makefile
allows LDFLAGS to be passed through resulting in the following
build failure.

  /usr/bin/ld: unrecognized option '-Wl,-z,relro'

LDFLAGS is set in /usr/lib/rpm/redhat/macros by default.  Clear
the environment variable when building kmods for powerpc.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #607

Linux 4.11 compat: set_task_state() removed

Replace uses of set_task_state(current, STATE) with
set_current_state(STATE).

In Linux 4.11, torvalds/linux@642fa44, set_task_state() is removed.

All spl uses are of the form set_task_state(current, STATE).
set_current_state(STATE) is equivalent and has been available since
Linux 2.2.26.

Furthermore, set_current_state(STATE) is already used in about 15
locations within spl. This change should have no impact other than
removing an unnecessary dependency.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #603

Use kernel slab for vn_cache and vn_file_cache

Resolve a false positive in the kmemleak checker by shifting to the
kernel slab. It shows up because vn_file_cache is using KMC_KMEM
which is directly allocated using __get_free_pages, which is not
automatically tracked by kmemleak.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #599

Add a PAGESHIFT definition

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: David Quigley <david.quigley@intel.com>
Closes #598

Tag 0.7.0-rc3

Third release candidate.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Reimplement rt_mutex_owner to fix build with DEBUG & PREEMPT_RT_FULL

rt_mutex_owner is internal to kernel/locking/rtmutex_common.h and
inaccessible for SPL via the public kernel headers. The way of
accessing the owner has been stable since at least 3.13 ([1], [2]),
which is masking the lowest bit in the owner pointer in rt_mutex. We
do the same.

[1] http://lxr.free-electrons.com/source/kernel/locking/rtmutex_common.h?v=3.13#L99
[2] http://lxr.free-electrons.com/source/kernel/locking/rtmutex_common.h?v=4.9#L78

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Clemens Fruhwirth <clemens@endorphin.org>
Closes #593

Remove identical if statements in module/spl/spl-vnode.c

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #594

Add support for recent kmem_cache_create_usercopy

SLAB_USERCOPY flag was used to indicate PAX
not to kill copies from kernel to userland.

With recent grsecurity patchset and
CONFIG_GRKERNSEC_HIDESYM that enables
CONFIG_PAX_USERCOPY zfs would panic.

Handle newer API while keeping old one functional.

Tested-by: RageLtMan <rageltman@sempervictus>
Reviewed-by: spendergrsec <spender@grsecurity.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kevin Tanguy <kevin.tanguy@ovh.net>
Closes #595

Update struct member intializers to C89

When building SPL within the kernel tree, C99 initializers cause
build failures and need to be converted to C89 as kernel CFLAGS
specify -std=gnu89.

This fix was provided by @behlendorf in #595 discussion notes and
manually implemented in the current master revision.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: RageLtMan <rageltman@sempervictus>
Closes #597

Add support for rw semaphore under PREEMPT_RT_FULL

The main complication from the RT patch set is that the RW semaphore
locks change such that read locks on an rwsem can be taken only by
a single thread.  All other threads are locked out. This single
thread can take a read lock multiple times though. The underlying
implementation changes to a mutex with an additional read_depth
count.

The implementation can be best understood by inspecting the RT
patch.  rwsem_rt.h and rt.c give the best insight into how RT
rwsem works. My implementation for rwsem_tryupgrade is basically
an inversion of rt_downgrade_write found in rt.c. Please see the
comments in the code.

Unfortunately, I have to drop SPLAT rwlock test4 completely as this
test tries to take multiple locks from different threads, which RT
rwsems do not support.  Otherwise SPLAT, zconfig.sh, zpios-sanity.sh
and zfs-tests.sh pass on my Debian-testing VM with the kernel
linux-image-4.8.0-1-rt-amd64.

Tested-by: kernelOfTruth <kerneloftruth@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Clemens Fruhwirth <clemens@endorphin.org>
Closes zfsonlinux/zfs#5491
Closes #589
Closes #308

Remove stale comment from rw_tryupgrade()

Commit f58040c0fc8bc6490fcc75db7fc3e709dfc3c656 should have removed
this comment which is no longer relevant.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Clemens Fruhwirth <clemens@endorphin.org>
Issue #589

Refactor some splat macro to function

Refactor the code by making splat_test_{init,fini}, splat_subsystem_{init,fini}
into functions. They don't have reason to be macro and it would be too bloated
to inline every call.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>

Fix splat memleak

SPLAT_TEST_FINI didn't call kfree causing memleak.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>

Add system_delay_taskq for long delay

Add a dedicated system_delay_taskq for long delay like spa_deadman and
zpl_posix_acl_free. This will allow us to use system_taskq in the manner of
dispatch multiple tasks and call taskq_wait_outstanding.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #588

Limit number of tasks shown in taskq proc

To prevent holding tq_lock for too long.

Before zfsonlinux/zfs@8e71ab9, hogging delay tasks and cat /proc/spl/taskq
would easily cause a lockup. While that bug has been fixed. It's probably
still a good idea to do this just in case task lists grow too large.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #586

Add TASKQID_INVALID and TASKQID_INITIAL macros

Add the TASKQID_INVALID and TASKQID_INITIAL macros and update the
taskq implementation and test cases to use them. This is solely
for the purposes of readability and introduces no functional change.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Fix vmem_size()

Add a minimal implementation of vmem_size() which accounts for the
virtual memory usage of the SPL's kmem cache.  This functionality
is only useful on 32-bit systems with a small virtual address space.

The following assumptions are made:

  1) The major SPL consumer of virtual memory is the kmem cache.
  2) Memory allocated with vmem_alloc() is short lived and can be ignored.
  3) Allow a 4MB floor as a generous pad given normal consumption.
  4) The spl_kmem_cache_sem only contends with cache create/destroy.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Tag 0.7.0-rc2

Second release candidate.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Linux 4.9 compat: group_info changes

In Linux 4.9, torvalds/linux@81243ea, group_info changed from 2d array via
->blocks to 1d array via ->gid. We change the spl cred functions accordingly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #581

Fix splat-cred.c cred usage

No need to crhold current_cred(), fix possible leak in splat_cred_test2

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #556

Fix crgetgroups out-of-bound and misc cred fix

init_groups has 0 nblocks, therefore calling the current crgetgroups with
init_groups would result in out-of-bound access. We fix this by returning NULL
when nblocks is 0.

Cap crgetngroups to NGROUPS_PER_BLOCK, since crgetgroups will only return
blocks[0].

Also, remove all get_group_info. The cred already holds reference on the
group_info, and cred is not mutable. So there's no reason to hold extra
reference, if we hold cred.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #556

Fix out-of-bound in per_cpu in spl_random_init

When iterating per_cpu values, we need to use for_each_possible_cpu. While
NR_CPUS indicates the number of CPU supported by the kernel, it might not
initialize all of them if the kernel decides it's not possible to use them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #578

Linux 4.8 compat: Fix RW_READ_HELD

Linux 4.8, starting from torvalds/linux@19c5d690e, will set owner to 1 when
read held instead of leave it NULL. So we change the condition to
`rw_owner(rwp) <= 1` in RW_READ_HELD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes zfsonlinux/zfs#5233
Closes #577

Fix p0 initializer

Due to changes in the task_struct the following warning is occurs
when initializing the global p0.  Since this structure only exists
for it's address to be taken initialize it in a manor which isn't
sensitive to internal changes to the structure.

  module/spl/spl-generic.c:58:1: error: missing braces around
  initializer [-Werror=missing-braces]

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #576

Fix aarch64 type warning

Explicitly cast type in splat-rwlock.c test case to silence
the following warning.

warning: format ‘%ld’ expects argument of type ‘long int’,
but argument N has type ‘int’

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #574

Fix automatically generated release number

When building from the head of a branch a release number is
automatically generated with `git describe` using the last tag
on that branch as the base. For this to work the last tag on the
branch needs to be predictable given the current META file.

This logic was accidentally broken when an -rcX tag was added to
the branch. Update it to search for a VERSION or VERSION-RELEASE
tag.

Reviewed-by: Chris Siebenmann <cks.git01@cs.toronto.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#5105
Closes #572

Increase spl_kmem_alloc_warn limit

In order to support ABD with large blocks the spl_kmem_alloc_warn
limit needs to be increased to 64K.

A 16M block requires that pointers be stored for 4096 4K-pages
on an x86_64 system.  Each of these pointers is 8 bytes requiring
an allocation of 8*4096=32,768 bytes.  The addition of a small
header to this structure pushes the allocation over the default
32K warning threshold.

In addition, fix a small bug where MAX was used instead of MIN
when setting the default.  This ensures a reasonable limit is
still set on systems with page sizes larger then 4K.

Reviewed-by: David Quigley <david.quigley@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #571

Fix spl check.sh script

Update splat_cmd to reference the correct location of the splat utility.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Liu Hua<liu.hua130@zte.com.cn>
Closes #570

Cleanup in cred.h

Remove the code that doesn't make any sense.

Reviewed-by: Brian Behlendorf <behlendorf@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #569

Tag 0.7.0-rc1

First release candidate.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Fix: handle NULL case in spl_kmem_free_track()

When DEBUG_KMEM_TRACKING is enabled in SPL, we keep tracking all
the buffers alloced by kmem_alloc() and kmem_zalloc(). If a NULL
pointer which indicates no track info in SPL is passed to
spl_kmem_free_track, we just ignore it.

Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#4967
Closes #567

Fix HAVE_MUTEX_OWNER test for kernels prior to 4.6

Recent 4.X kernels prior to 4.6 require #include of spinlock.h in
order to get the definition of __ARCH_SPIN_LOCK_UNLOCKED which is
used by DEFINE_MUTEX().

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #566

Add handling for kernel 4.7's CONFIG_TRIM_UNUSED_KSYMS

Kernel 4.7 added the option to trim the unused exported symbols. In
my testing this showed to be problematic since the PDE_DATA function
was considered unused and as such was trimmed. This in turn caused the
respective test during spl's configure stage to falsely detect that
PDE_DATA is not defined, which in turn caused build failures later.

Handle this situation by adding detection whether CONFIG_TRIM_UNUSED_KSYMS
is enabled and refuse to build against a kernel which has it enabled

Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #565

Add gitignore entry for spl-*.o.d files

Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #565

Linux 4.8 compat: rw_semaphore atomic_long_t count

For non-rwsem-spinlocks the "count" member was changed from a
"long" to "atomic_long_t" type. A configure check has been
added to detect this change along with new versions of the
_rwsem_tryupgrade() function and RWSEM_COUNT() macro. See
https://github.com/torvalds/linux/commit/8ee62b18 for complete
details.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #563

Added highbit() and lowbit() macros

Signed-off-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #562

Add _ALIGNMENT_REQUIRED to isa_defs.h for checksums

_ALIGNMENT_REQUIRED needs to be #defined in isa_defs.h in order to
port the Illumos checksum code to ZoL:

4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
OpenZFS-issue: https://www.illumos.org/issues/4185
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/45818ee

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #561

Improve spl slab cache alloc

The policy is to try to allocate with KM_NOSLEEP, which will lead to
memory allocation with GFP_ATOMIC, and if it fails, it will launch
an taskq to expand slab space.

This way it should be able to get better NUMA memory locality and
reduce the overhead of context switch.

Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #551

Fix use-after-free in splat_taskq_test7

This splat_vprint is using tq_arg->name after tq_arg is freed.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #557

Implement a proper rw_tryupgrade

Current rw_tryupgrade does rw_exit and then rw_tryenter(RW_RWITER), and then
does rw_enter(RW_READER) if it fails. This violate the assumption that
rw_tryupgrade should be atomic and could cause extra contention or even lock
inversion.

This patch we implement a proper rw_tryupgrade. For rwsem-spinlock, we take
the spinlock to check rwsem->count and rwsem->wait_list. For normal rwsem, we
use cmpxchg on rwsem->count to change the value from single reader to single
writer.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes zfsonlinux/zfs#4692
Closes #554

Add isa_defs for MIPS

GCC for MIPS only defines _LP64 when 64bit,
while no _ILP32 defined when 32bit.

Signed-off-by: YunQiang Su <syq@debian.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #558

Fix taskq_wait_outstanding re-evaluate tq_next_id

wait_event is a macro, so the current implementation will cause re-
evaluation of tq_next_id every time it wakes up. This would cause
taskq_wait_outstanding(tq, 0) to be equivalent to taskq_wait(tq)

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Issue #553

Fix race between taskq_destroy and dynamic spawning thread

While taskq_destroy would wait for dynamic_taskq to finish its tasks, but it
does not implies the thread being spawned is up and running. This will cause
taskq to be freed before the thread can exit.

We fix this by using tq_nspawn to indicate how many threads are being spawned
before they are inserted to the thread list. And have taskq_destroy to wait
for it to drop to zero.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Issue #553
Closes #550

Restore CALLOUT_FLAG_ABSOLUTE in cv_timedwait_hires

In 39cd90e, I mistakenly disabled the ability of using absolute expire time in
cv_timedwait_hires. I don't quite sure why I did that, so let's restore it.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Issue #553

Linux 4.7 compat: inode_lock() and friends

Linux 4.7 changes i_mutex to i_rwsem, and we should used inode_lock and
inode_lock_shared to do exclusive and shared lock respectively.

We use spl_inode_lock{,_shared}() to hide the difference. Note that on older
kernel you'll always take an exclusive lock.

We also add all other inode_lock friends. And nested users now should
explicitly call spl_inode_lock_nested with correct subclass.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#4665
Closes #549

Add cv_timedwait_sig_hires to allow interruptible sleep

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #548

Add a macro to convert seconds to nanoseconds and vice-versa

Required infrastructure for zfsonlinux/zfs#4600.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #546

Clear PF_FSTRANS over spl_filp_fallocate()

The problem described in 2a5d574 also applies to XFS's file or inode
fallocate method. Both paths may trigger writeback and expose this
issue, see the full stack below.

When layered on XFS a warning will be emitted under CentOS7 when entering
either the file or inode fallocate method with PF_FSTRANS already set.
To avoid triggering this error PF_FSTRANS is cleared and then reset
in vn_space().

WARNING: at fs/xfs/xfs_aops.c:982 xfs_vm_writepage+0x58b/0x5d0

Call Trace:
[<ffffffff810a1ed5>] warn_slowpath_common+0x95/0xe0
[<ffffffff810a1f3a>] warn_slowpath_null+0x1a/0x20
[<ffffffffa0231fdb>] xfs_vm_writepage+0x58b/0x5d0 [xfs]
[<ffffffff81173ed7>] __writepage+0x17/0x40
[<ffffffff81176f81>] write_cache_pages+0x251/0x530
[<ffffffff811772b1>] generic_writepages+0x51/0x80
[<ffffffffa0230cb0>] xfs_vm_writepages+0x60/0x80 [xfs]
[<ffffffff81177300>] do_writepages+0x20/0x30
[<ffffffff8116a5f5>] __filemap_fdatawrite_range+0xb5/0x100
[<ffffffff8116a6cb>] filemap_write_and_wait_range+0x8b/0xd0
[<ffffffffa0235bb4>] xfs_free_file_space+0xf4/0x520 [xfs]
[<ffffffffa023cbce>] xfs_file_fallocate+0x19e/0x2c0 [xfs]
[<ffffffffa036c6fc>] vn_space+0x3c/0x40 [spl]
[<ffffffffa0434817>] vdev_file_io_start+0x207/0x260 [zfs]
[<ffffffffa047170d>] zio_vdev_io_start+0xad/0x2d0 [zfs]
[<ffffffffa0474942>] zio_execute+0x82/0xe0 [zfs]
[<ffffffffa036ba7d>] taskq_thread+0x28d/0x5a0 [spl]
[<ffffffff810c1777>] kthread+0xd7/0xf0
[<ffffffff8167de2f>] ret_from_fork+0x3f/0x70

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Closes zfsonlinux/zfs#4529

Use vmem_free() in dfl_free() and add dfl_alloc()

This change was lost, somehow, in e5f9a9a. Since the arrays can be
rather large, they need to be allocated with vmem_zalloc() via dfl_alloc()
and freed with vmem_free() via dfl_free().

The new dfl_alloc() function should be used to allocate object of type
dkioc_free_list_t in order that they're allocated from vmem.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Closes #543

Use kernel provided mutex owner

To reduce mutex footprint, we detect the existence of owner in kernel mutex,
and rely on it if it exists.

Note that before Linux 3.0, mutex owner is of type thread_info. Also note
that, in Linux 3.18, the condition for owner is changed from
CONFIG_DEBUG_MUTEXES || CONFIG_SMP to
CONFIG_DEBUG_MUTEXES || CONFIG_MUTEX_SPIN_ON_OWNER

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #540