]> git.proxmox.com Git - mirror_ubuntu-zesty-kernel.git/log
mirror_ubuntu-zesty-kernel.git
7 years agoafs, rxrpc: Update the MAINTAINERS file
David Howells [Thu, 15 Dec 2016 13:59:50 +0000 (13:59 +0000)]
afs, rxrpc: Update the MAINTAINERS file

Update the MAINTAINERS file for AFS and AF_RXRPC to include a website
pointer.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoredo: radix tree test suite: fix compilation
Matthew Wilcox [Wed, 7 Dec 2016 22:44:33 +0000 (14:44 -0800)]
redo: radix tree test suite: fix compilation

[ This resurrects commit 53855d10f456, which was reverted in
  2b41226b39b6.  It depended on commit d544abd5ff7d ("lib/radix-tree:
  Convert to hotplug state machine") so now it is correct to apply ]

Patch "lib/radix-tree: Convert to hotplug state machine" breaks the test
suite as it adds a call to cpuhp_setup_state_nocalls() which is not
currently emulated in the test suite.  Add it, and delete the emulation
of the old CPU hotplug mechanism.

Link: http://lkml.kernel.org/r/1480369871-5271-36-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoprintk: Remove no longer used second struct cont
Geert Uytterhoeven [Thu, 15 Dec 2016 12:53:58 +0000 (13:53 +0100)]
printk: Remove no longer used second struct cont

If CONFIG_PRINTK=n:

    kernel/printk/printk.c:1893: warning: ‘cont’ defined but not used

Note that there are actually two different struct cont definitions and
objects: the first one is used if CONFIG_PRINTK=y, the second one became
unused by removing console_cont_flush().

Fixes: 5c2992ee7fd8 ("printk: remove console flushing special cases for partial buffered lines")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Petr Mladek <pmladek@suse.com>
[ I do the occasional "allnoconfig" builds, but apparently not often
  enough  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoEDAC: Document HW_EVENT_ERR_DEFERRED type
Yazen Ghannam [Thu, 1 Dec 2016 20:24:53 +0000 (14:24 -0600)]
EDAC: Document HW_EVENT_ERR_DEFERRED type

Add a description of the HW_EVENT_ERR_DEFERRED type that wasn't included
with commit d12a969ebbfc ("EDAC, amd64: Add Deferred Error type").

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
7 years agoedac.rst: move concepts dictionary from edac.h
Mauro Carvalho Chehab [Sat, 29 Oct 2016 18:13:23 +0000 (16:13 -0200)]
edac.rst: move concepts dictionary from edac.h

Instead of storing the concepts dictionary inside header file,
move it to the subsystem documentation.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
7 years agoedac: fix kenel-doc markups at edac.h
Mauro Carvalho Chehab [Fri, 28 Oct 2016 17:04:52 +0000 (15:04 -0200)]
edac: fix kenel-doc markups at edac.h

As this file was never added to the driver-api, the kernel-doc
markups there were never tested. Some of them have issues.
Fix them.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
7 years agoedac: fix kernel-doc tags at the drivers/edac_*.h
Mauro Carvalho Chehab [Sat, 29 Oct 2016 12:35:23 +0000 (10:35 -0200)]
edac: fix kernel-doc tags at the drivers/edac_*.h

Some kernel-doc tags don't provide good descriptions or use
a different style. Adjust them.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
7 years agoedac: adjust docs location at MAINTAINERS and 00-INDEX
Mauro Carvalho Chehab [Thu, 27 Oct 2016 08:35:16 +0000 (06:35 -0200)]
edac: adjust docs location at MAINTAINERS and 00-INDEX

Update MAINTAINERS to reflect the location of edac.rst and ras.rst.

In the case of 00-INDEX, there's already an entry to the admin-guide,
so all we need to do is to remove the entry there.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
7 years agodriver-api: create an edac.rst file with EDAC documentation
Mauro Carvalho Chehab [Wed, 26 Oct 2016 16:14:45 +0000 (14:14 -0200)]
driver-api: create an edac.rst file with EDAC documentation

Currently, there's no device driver documentation for the EDAC
subsystem at the driver-api book. Fill in the blanks for the
structures and functions that misses documentation, uniform
the word on the existing ones, and add a new edac.rst file at
driver-api, in order to document the EDAC subsystem.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
7 years agoedac: move documentation from edac_mc.c to edac_core.h
Mauro Carvalho Chehab [Wed, 26 Oct 2016 17:47:55 +0000 (15:47 -0200)]
edac: move documentation from edac_mc.c to edac_core.h

Several functions are documented at edac_mc.c.

As we'll be including edac_core.h at drivers-api book, move
those, in order for the kernel-doc markups be part of the API
documentation book.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
7 years agoedac: move documentation from edac_pci*.c to edac_pci.h
Mauro Carvalho Chehab [Wed, 26 Oct 2016 18:13:43 +0000 (16:13 -0200)]
edac: move documentation from edac_pci*.c to edac_pci.h

Several functions are documented at edac_pci.c and edac_pci_sysfs.c.

As we'll be including edac_pci.h at drivers-api book, move those,
in order for the kernel-doc markups be part of the API
documentation book.

As several of those kernel-doc macros are not in the right format,
fix them.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
7 years agoedac: move documentation from edac_device to edac_core.h
Mauro Carvalho Chehab [Wed, 26 Oct 2016 18:01:47 +0000 (16:01 -0200)]
edac: move documentation from edac_device to edac_core.h

Several functions are documented at edac_device.c.

As we'll be including edac_core.h at drivers-api book, move those,
in order for the kernel-doc markups be part of the API
documentation book.

As several of those kernel-doc macros are not in the right format,
fix them.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
7 years agoedac: rename edac_core.h to edac_mc.h
Mauro Carvalho Chehab [Sat, 29 Oct 2016 17:16:34 +0000 (15:16 -0200)]
edac: rename edac_core.h to edac_mc.h

Now, all left at edac_core.h are at drivers/edac/edac_mc.c,
so rename it to edac_mc.h.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
7 years agoedac: move EDAC device definitions to drivers/edac/edac_device.h
Mauro Carvalho Chehab [Sat, 29 Oct 2016 12:01:41 +0000 (10:01 -0200)]
edac: move EDAC device definitions to drivers/edac/edac_device.h

The edac_core.h header contain data structures and function
definitions for both EDAC MC and EDAC device.

Let's move the devices ones to a separate header file, as part
of a header reorganization.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
7 years agoedac: move EDAC PCI definitions to drivers/edac/edac_pci.h
Mauro Carvalho Chehab [Sat, 29 Oct 2016 11:56:00 +0000 (09:56 -0200)]
edac: move EDAC PCI definitions to drivers/edac/edac_pci.h

The edac_core.h header contain data structures and function
definitions for the 3 parts of EDAC: MC, PCI and device.

Let's move the PCI ones to a separate header file, as part
of a header reorganization.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
7 years agodocs-rst: admin-guide: add documentation for EDAC
Mauro Carvalho Chehab [Wed, 26 Oct 2016 18:24:41 +0000 (16:24 -0200)]
docs-rst: admin-guide: add documentation for EDAC

EDAC is part of the Kernel's RAS facilities, with is useful for
system admins to detect errors. So, add it to the admin's guide.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
7 years agoedac.txt: Improve documentation, adding RAS introduction
Mauro Carvalho Chehab [Thu, 27 Oct 2016 11:26:36 +0000 (09:26 -0200)]
edac.txt: Improve documentation, adding RAS introduction

The edac.txt assumes that the reader has already deep knowledge
on RAS features. However, this may not be the case. So, add an
introduction chapter explaining the main concepts that are used by
the EDAC subsystem and by other RAS drivers within the Kernel.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
7 years agoedac.txt: update information about newer Intel CPUs
Mauro Carvalho Chehab [Wed, 26 Oct 2016 10:43:58 +0000 (08:43 -0200)]
edac.txt: update information about newer Intel CPUs

There's a chapter at edac.rst written by the time Nehalem
support was added. Such information is used not only by the
Nehalem driver (i7core_edac), but by all newer Intel CPU
architectures that are supported by i7core_edac, sb_edac
and sbx_edac drivers.

Update the information to reflect that.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
7 years agoedac.txt: remove info that the Nehalem EDAC is experimental
Mauro Carvalho Chehab [Wed, 26 Oct 2016 10:19:57 +0000 (08:19 -0200)]
edac.txt: remove info that the Nehalem EDAC is experimental

This driver has been there for almost 3 years, without any
conceptual changes. So, it is not experimental anymore, and
won't likely have any changes at the API or on log outputs.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
7 years agoedac.txt: convert EDAC documentation to ReST
Mauro Carvalho Chehab [Wed, 26 Oct 2016 10:14:12 +0000 (08:14 -0200)]
edac.txt: convert EDAC documentation to ReST

Converts the EDAC driver subsystem documentation to ReST:

- Put paragraph titles in lower case;
- Add code blocks where needed;
- Convert tables to ReST markup;
- Mark filesystem and module names as verbatim;
- Adjust document to be properly displayed in html.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
7 years agoedac.txt: add a section explaining the dimmX and rankX directories
Mauro Carvalho Chehab [Thu, 27 Oct 2016 12:00:46 +0000 (10:00 -0200)]
edac.txt: add a section explaining the dimmX and rankX directories

Documentation for those are missing at the EDAC description.

I guess we end by moving such descriptions in the past to the
ABI document (or only added it there), but it means that the
EDAC documentation is incomplete. So, add it there.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
7 years agoedac: edac_core.h: remove prototype for edac_pci_reset_delay_period()
Mauro Carvalho Chehab [Wed, 26 Oct 2016 18:15:02 +0000 (16:15 -0200)]
edac: edac_core.h: remove prototype for edac_pci_reset_delay_period()

This function doesn't exist. So, remove its prototype.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
7 years agoedac: edac_core.h: get rid of unused kobj_complete
Mauro Carvalho Chehab [Wed, 26 Oct 2016 17:25:23 +0000 (15:25 -0200)]
edac: edac_core.h: get rid of unused kobj_complete

This element of struct edac_pci_ctl_info is never used. So,
get rid of it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
7 years agoMerge branch 'patchwork' into v4l_for_linus
Mauro Carvalho Chehab [Thu, 15 Dec 2016 10:38:35 +0000 (08:38 -0200)]
Merge branch 'patchwork' into v4l_for_linus

* patchwork: (496 commits)
  [media] v4l: tvp5150: Add missing break in set control handler
  [media] v4l: tvp5150: Don't inline the tvp5150_selmux() function
  [media] v4l: tvp5150: Compile tvp5150_link_setup out if !CONFIG_MEDIA_CONTROLLER
  [media] em28xx: don't store usb_device at struct em28xx
  [media] em28xx: use usb_interface for dev_foo() calls
  [media] em28xx: don't change the device's name
  [media] mn88472: fix chip id check on probe
  [media] mn88473: fix chip id check on probe
  [media] lirc: fix error paths in lirc_cdev_add()
  [media] s5p-mfc: Add support for MFC v8 available in Exynos 5433 SoCs
  [media] s5p-mfc: Rework clock handling
  [media] s5p-mfc: Don't keep clock prepared all the time
  [media] s5p-mfc: Kill all IS_ERR_OR_NULL in clocks management code
  [media] s5p-mfc: Remove dead conditional code
  [media] s5p-mfc: Ensure that clock is disabled before turning power off
  [media] s5p-mfc: Remove special clock rate management
  [media] s5p-mfc: Use printk_ratelimited for reporting ioctl errors
  [media] s5p-mfc: Set DMA_ATTR_ALLOC_SINGLE_PAGES
  [media] vivid: Set color_enc on HSV formats
  [media] v4l2-tpg: Init hv_enc field with a valid value
  ...

7 years agoMerge branches 'work.namei', 'work.dcache' and 'work.iov_iter' into for-linus
Al Viro [Thu, 15 Dec 2016 06:07:29 +0000 (01:07 -0500)]
Merge branches 'work.namei', 'work.dcache' and 'work.iov_iter' into for-linus

7 years agoMerge tag 'xfs-for-linus-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 15 Dec 2016 05:35:31 +0000 (21:35 -0800)]
Merge tag 'xfs-for-linus-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs

Pull xfs updates from Dave Chinner:
 "There is quite a varied bunch of stuff in this update, and some of it
  you will have already merged through the ext4 tree which imported the
  dax-4.10-iomap-pmd topic branch from the XFS tree.

  There is also a new direct IO implementation that uses the iomap
  infrastructure. It's much simpler, faster, and has lower IO latency
  than the existing direct IO infrastructure.

  Summary:
   - DAX PMD faults via iomap infrastructure
   - Direct-io support in iomap infrastructure
   - removal of now-redundant XFS inode iolock, replaced with VFS
     i_rwsem
   - synchronisation with fixes and changes in userspace libxfs code
   - extent tree lookup helpers
   - lots of little corruption detection improvements to verifiers
   - optimised CRC calculations
   - faster buffer cache lookups
   - deprecation of barrier/nobarrier mount options - we always use
     REQ_FUA/REQ_FLUSH where appropriate for data integrity now
   - cleanups to speculative preallocation
   - miscellaneous minor bug fixes and cleanups"

* tag 'xfs-for-linus-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (63 commits)
  xfs: nuke unused tracepoint definitions
  xfs: use GPF_NOFS when allocating btree cursors
  xfs: use xfs_vn_setattr_size to check on new size
  xfs: deprecate barrier/nobarrier mount option
  xfs: Always flush caches when integrity is required
  xfs: ignore leaf attr ichdr.count in verifier during log replay
  xfs: use rhashtable to track buffer cache
  xfs: optimise CRC updates
  xfs: make xfs btree stats less huge
  xfs: don't cap maximum dedupe request length
  xfs: don't allow di_size with high bit set
  xfs: error out if trying to add attrs and anextents > 0
  xfs: don't crash if reading a directory results in an unexpected hole
  xfs: complain if we don't get nextents bmap records
  xfs: check for bogus values in btree block headers
  xfs: forbid AG btrees with level == 0
  xfs: several xattr functions can be void
  xfs: handle cow fork in xfs_bmap_trace_exlist
  xfs: pass state not whichfork to trace_xfs_extlist
  xfs: Move AGI buffer type setting to xfs_read_agi
  ...

7 years agoprintk: remove console flushing special cases for partial buffered lines
Linus Torvalds [Tue, 25 Oct 2016 18:27:31 +0000 (11:27 -0700)]
printk: remove console flushing special cases for partial buffered lines

It actively hurts proper merging, and makes for a lot of special cases.
There was a good(ish) reason for doing it originally, but it's getting
too painful to maintain.  And most of the original reasons for it are
long gone.

So instead of having special code to flush partial lines to the console
(as opposed to the record buffers), do _all_ the console writing from
the record buffer, and be done with it.

If an oops happens (or some other synchronous event), we will flush the
partial lines due to the oops printing activity, so this does not affect
that.  It does mean that if you have a completely hung machine, a
partial preceding line may not have been printed out.

That was some of the original reason for this complexity, in fact, back
when we used to test for the historical i386 "halt" instruction problem
by doing

pr_info("Checking 'hlt' instruction... ");

if (!boot_cpu_data.hlt_works_ok) {
pr_cont("disabled\n");
return;
}
halt();
halt();
halt();
halt();
pr_cont("OK\n");

and that model no longer works (it the 'hlt' instruction kills the
machine, the partial line won't have been flushed, so you won't even see
it).

Of course, that was also back in the days when people actually had
textual console output rather than a graphical splash-screen at bootup.
How times change..

Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Petr Mladek <pmladek@suse.com>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoprintk: remove games with previous record flags
Linus Torvalds [Tue, 25 Oct 2016 18:27:31 +0000 (11:27 -0700)]
printk: remove games with previous record flags

The record logging code looks at the previous record flags in various
ways, and they are all wrong.

You can't use the previous record flags to determine anything about the
next record, because they may simply not be related.  In particular, the
reason the previous record was a continuation record may well be exactly
_because_ the new record was printed by a different process, which is
why the previous record was flushed.

So all those games are simply wrong, and make the code hard to
understand (because the code fundamentally cdoes not make sense).

So remove it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agovsock/virtio: fix src/dst cid format
Michael S. Tsirkin [Tue, 6 Dec 2016 04:07:15 +0000 (06:07 +0200)]
vsock/virtio: fix src/dst cid format

These fields are 64 bit, using le32_to_cpu and friends
on these will not do the right thing.
Fix this up.

Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years agovsock/virtio: mark an internal function static
Michael S. Tsirkin [Tue, 6 Dec 2016 04:06:06 +0000 (06:06 +0200)]
vsock/virtio: mark an internal function static

virtio_transport_alloc_pkt is only used locally, make it static.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years agovsock/virtio: add a missing __le annotation
Michael S. Tsirkin [Tue, 6 Dec 2016 04:03:34 +0000 (06:03 +0200)]
vsock/virtio: add a missing __le annotation

guest cid is read from config space, therefore it's in little endian
format and is treated as such, annotate it accordingly.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years agovhost: add missing __user annotations
Michael S. Tsirkin [Tue, 6 Dec 2016 04:01:41 +0000 (06:01 +0200)]
vhost: add missing __user annotations

Several vhost functions were missing __user annotations
on pointers, causing sparse warnings. Fix this up.

sparse also warns about vhost_process_iotlb_msg which
is local and should be static. Fix that up as well.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years agovhost: make interval tree static inline
Michael S. Tsirkin [Tue, 6 Dec 2016 03:57:54 +0000 (05:57 +0200)]
vhost: make interval tree static inline

vhost_umem_interval_tree is only used locally within vhost.c, mark it
static. As some functions generated go unused, this triggers warnings
unless we also mark it inline.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years agodrm/virtio: annotate virtio_gpu_queue_ctrl_buffer_locked
Michael S. Tsirkin [Mon, 5 Dec 2016 20:39:30 +0000 (22:39 +0200)]
drm/virtio: annotate virtio_gpu_queue_ctrl_buffer_locked

virtio_gpu_queue_ctrl_buffer_locked is called with ctrlq.qlock taken, it
releases and acquires this lock.  This causes a sparse warning.  Add
appropriate annotations for sparse context checking.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years agodrm/virtio: fix lock context imbalance
Michael S. Tsirkin [Mon, 5 Dec 2016 20:36:56 +0000 (22:36 +0200)]
drm/virtio: fix lock context imbalance

When virtio_gpu_free_vbufs exits due to list empty, it does not
drop the free_vbufs lock that it took.
list empty is not expected to happen anyway, but it can't hurt to fix
this and drop the lock.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years agodrm/virtio: fix endianness in primary_plane_update
Michael S. Tsirkin [Mon, 5 Dec 2016 19:44:39 +0000 (21:44 +0200)]
drm/virtio: fix endianness in primary_plane_update

virtio_gpu_cmd_transfer_to_host_2d expects x and y
parameters in LE, but virtio_gpu_primary_plane_update
passes in the CPU format instead.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years agovirtio_console: drop unused config fields
Michael S. Tsirkin [Mon, 5 Dec 2016 19:39:42 +0000 (21:39 +0200)]
virtio_console: drop unused config fields

struct ports_device includes a config field including the whole
virtio_console_config, but only max_nr_ports in there is ever updated or
used. The rest is unused and in fact does not even mirror the
device config. Drop everything except max_nr_ports,
saving some memory.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
7 years agoMerge tag 'for-linus-4.10' of git://git.code.sf.net/p/openipmi/linux-ipmi
Linus Torvalds [Thu, 15 Dec 2016 04:54:33 +0000 (20:54 -0800)]
Merge tag 'for-linus-4.10' of git://git.code.sf.net/p/openipmi/linux-ipmi

Pull IPMI updates from Corey Minyard:
 "Various small fixes for IPMI. Cleanups in the documentation and
  convertion printk() to pr_xxx() and removal of an unused module
  parameter. Some small bug fixes and enhancements.

  This also adds a post softdep from the IPMI core module to the IPMI
  device interface. Many people have complained that the device
  interface isn't automatically avaiable when IPMI is loaded. I don't
  want to make the device interface mandatory, though, plenty of people
  use IPMI internally (like with ACPI) and don't need a device interface
  or the added possible security entry. A softdep should make it work
  'out of the box' but allow people to not have it if they don't want
  it"

* tag 'for-linus-4.10' of git://git.code.sf.net/p/openipmi/linux-ipmi:
  ipmi: create hardware-independent softdep for ipmi_devintf
  ipmi: Fix sequence number handling
  ipmi: Pick up slave address from SMBIOS on an ACPI device
  ipmi_si: Clean up printks
  Move platform device creation earlier in the initialization
  ipmi: Update documentation
  ipmi_ssif: Remove an unused module parameter
  ipmi: Periodically check for events, not messages

7 years agologfs: remove from tree
Christoph Hellwig [Sun, 11 Sep 2016 14:04:46 +0000 (09:04 -0500)]
logfs: remove from tree

Logfs was introduced to the kernel in 2009, and hasn't seen any non
drive-by changes since 2012, while having lots of unsolved issues
including the complete lack of error handling, with more and more
issues popping up without any fixes.

The logfs.org domain has been bouncing from a mail, and the maintainer
on the non-logfs.org domain hasn't repsonded to past queries either.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
7 years agoMerge tag 'dmaengine-4.10-rc1' of git://git.infradead.org/users/vkoul/slave-dma
Linus Torvalds [Thu, 15 Dec 2016 04:42:45 +0000 (20:42 -0800)]
Merge tag 'dmaengine-4.10-rc1' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine updates from Vinod Koul:
 "Fairly routine update this time around with all changes specific to
  drivers:

   - New driver for STMicroelectronics FDMA
   - Memory-to-memory transfers on dw dmac
   - Support for slave maps on pl08x devices
   - Bunch of driver fixes to use dma_pool_zalloc
   - Bunch of compile and warning fixes spread across drivers"

[ The ST FDMA driver already came in earlier through the remoteproc tree ]

* tag 'dmaengine-4.10-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (68 commits)
  dmaengine: sirf-dma: remove unused ‘sdesc’
  dmaengine: pl330: remove unused ‘regs’
  dmaengine: s3c24xx: remove unused ‘cdata’
  dmaengine: stm32-dma: remove unused ‘src_addr’
  dmaengine: stm32-dma: remove unused ‘dst_addr’
  dmaengine: stm32-dma: remove unused ‘sfcr’
  dmaengine: pch_dma: remove unused ‘cookie’
  dmaengine: mic_x100_dma: remove unused ‘data’
  dmaengine: img-mdc: remove unused ‘prev_phys’
  dmaengine: usb-dmac: remove unused ‘uchan’
  dmaengine: ioat: remove unused ‘res’
  dmaengine: ioat: remove unused ‘ioat_dma’
  dmaengine: ioat: remove unused ‘is_raid_device’
  dmaengine: pl330: do not generate unaligned access
  dmaengine: k3dma: move to dma_pool_zalloc
  dmaengine: at_hdmac: move to dma_pool_zalloc
  dmaengine: at_xdmac: don't restore unsaved status
  dmaengine: ioat: set error code on failures
  dmaengine: ioat: set error code on failures
  dmaengine: DW DMAC: add multi-block property to device tree
  ...

7 years agovirtio_ring: fix complaint by sparse
Gonglei [Tue, 22 Nov 2016 05:51:50 +0000 (13:51 +0800)]
virtio_ring: fix complaint by sparse

 # make C=2 CF="-D__CHECK_ENDIAN__" ./drivers/virtio/

drivers/virtio/virtio_ring.c:423:19: warning: incorrect type in assignment (different base types)
drivers/virtio/virtio_ring.c:423:19:    expected unsigned int [unsigned] [assigned] i
drivers/virtio/virtio_ring.c:423:19:    got restricted __virtio16 [usertype] next
drivers/virtio/virtio_ring.c:423:19: warning: incorrect type in assignment (different base types)
drivers/virtio/virtio_ring.c:423:19:    expected unsigned int [unsigned] [assigned] i
drivers/virtio/virtio_ring.c:423:19:    got restricted __virtio16 [usertype] next
drivers/virtio/virtio_ring.c:423:19: warning: incorrect type in assignment (different base types)
drivers/virtio/virtio_ring.c:423:19:    expected unsigned int [unsigned] [assigned] i
drivers/virtio/virtio_ring.c:423:19:    got restricted __virtio16 [usertype] next
drivers/virtio/virtio_ring.c:604:39: warning: incorrect type in initializer (different base types)
drivers/virtio/virtio_ring.c:604:39:    expected unsigned short [unsigned] [usertype] nextflag
drivers/virtio/virtio_ring.c:604:39:    got restricted __virtio16
drivers/virtio/virtio_ring.c:612:33: warning: restricted __virtio16 degrades to integer

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years agovirtio_pci_modern: fix complaint by sparse
Gonglei [Tue, 22 Nov 2016 05:51:49 +0000 (13:51 +0800)]
virtio_pci_modern: fix complaint by sparse

drivers/virtio/virtio_pci_modern.c:66:40: warning: incorrect type in argument 2 (different base types)
drivers/virtio/virtio_pci_modern.c:66:40:    expected unsigned int [noderef] [usertype] <asn:2>*addr
drivers/virtio/virtio_pci_modern.c:66:40:    got restricted __le32 [noderef] [usertype] <asn:2>*lo
drivers/virtio/virtio_pci_modern.c:67:33: warning: incorrect type in argument 2 (different base types)
drivers/virtio/virtio_pci_modern.c:67:33:    expected unsigned int [noderef] [usertype] <asn:2>*addr
drivers/virtio/virtio_pci_modern.c:67:33:    got restricted __le32 [noderef] [usertype] <asn:2>*hi
drivers/virtio/virtio_pci_modern.c:150:32: warning: incorrect type in argument 2 (different base types)
drivers/virtio/virtio_pci_modern.c:150:32:    expected unsigned int [noderef] [usertype] <asn:2>*addr
drivers/virtio/virtio_pci_modern.c:150:32:    got restricted __le32 [noderef] <asn:2>*<noident>
drivers/virtio/virtio_pci_modern.c:151:39: warning: incorrect type in argument 1 (different base types)
drivers/virtio/virtio_pci_modern.c:151:39:    expected unsigned int [noderef] [usertype] <asn:2>*addr
drivers/virtio/virtio_pci_modern.c:151:39:    got restricted __le32 [noderef] <asn:2>*<noident>
drivers/virtio/virtio_pci_modern.c:152:32: warning: incorrect type in argument 2 (different base types)
drivers/virtio/virtio_pci_modern.c:152:32:    expected unsigned int [noderef] [usertype] <asn:2>*addr

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years agoMerge tag 'modules-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu...
Linus Torvalds [Thu, 15 Dec 2016 04:12:43 +0000 (20:12 -0800)]
Merge tag 'modules-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux

Pull modules updates from Jessica Yu:
 "Summary of modules changes for the 4.10 merge window:

   - The rodata= cmdline parameter has been extended to additionally
     apply to module mappings

   - Fix a hard to hit race between module loader error/clean up
     handling and ftrace registration

   - Some code cleanups, notably panic.c and modules code use a unified
     taint_flags table now. This is much cleaner than duplicating the
     taint flag code in modules.c"

* tag 'modules-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  module: fix DEBUG_SET_MODULE_RONX typo
  module: extend 'rodata=off' boot cmdline parameter to module mappings
  module: Fix a comment above strong_try_module_get()
  module: When modifying a module's text ignore modules which are going away too
  module: Ensure a module's state is set accordingly during module coming cleanup code
  module: remove trailing whitespace
  taint/module: Clean up global and module taint flags handling
  modpost: free allocated memory

7 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Thu, 15 Dec 2016 01:25:18 +0000 (17:25 -0800)]
Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:

 - a few misc things

 - kexec updates

 - DMA-mapping updates to better support networking DMA operations

 - IPC updates

 - various MM changes to improve DAX fault handling

 - lots of radix-tree changes, mainly to the test suite. All leading up
   to reimplementing the IDA/IDR code to be a wrapper layer over the
   radix-tree. However the final trigger-pulling patch is held off for
   4.11.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits)
  radix tree test suite: delete unused rcupdate.c
  radix tree test suite: add new tag check
  radix-tree: ensure counts are initialised
  radix tree test suite: cache recently freed objects
  radix tree test suite: add some more functionality
  idr: reduce the number of bits per level from 8 to 6
  rxrpc: abstract away knowledge of IDR internals
  tpm: use idr_find(), not idr_find_slowpath()
  idr: add ida_is_empty
  radix tree test suite: check multiorder iteration
  radix-tree: fix replacement for multiorder entries
  radix-tree: add radix_tree_split_preload()
  radix-tree: add radix_tree_split
  radix-tree: add radix_tree_join
  radix-tree: delete radix_tree_range_tag_if_tagged()
  radix-tree: delete radix_tree_locate_item()
  radix-tree: improve multiorder iterators
  btrfs: fix race in btrfs_free_dummy_fs_info()
  radix-tree: improve dump output
  radix-tree: make radix_tree_find_next_bit more useful
  ...

7 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-block
Linus Torvalds [Thu, 15 Dec 2016 01:21:53 +0000 (17:21 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block IO fixes from Jens Axboe:
 "A few fixes that I collected as post-merge.

  I was going to wait a bit with sending this out, but the O_DIRECT fix
  should really go in sooner rather than later"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq: Fix failed allocation path when mapping queues
  blk-mq: Avoid memory reclaim when remapping queues
  block_dev: don't update file access position for sync direct IO
  nvme/pci: Log PCI_STATUS when the controller dies
  block_dev: don't test bdev->bd_contains when it is not stable

7 years agoMerge branch 'for-4.10/fs-unmap' of git://git.kernel.dk/linux-block
Linus Torvalds [Thu, 15 Dec 2016 01:09:00 +0000 (17:09 -0800)]
Merge branch 'for-4.10/fs-unmap' of git://git.kernel.dk/linux-block

Pull fs meta data unmap optimization from Jens Axboe:
 "A series from Jan Kara, providing a more efficient way for unmapping
  meta data from in the buffer cache than doing it block-by-block.

  Provide a general helper that existing callers can use"

* 'for-4.10/fs-unmap' of git://git.kernel.dk/linux-block:
  fs: Remove unmap_underlying_metadata
  fs: Add helper to clean bdev aliases under a bh and use it
  ext2: Use clean_bdev_aliases() instead of iteration
  ext4: Use clean_bdev_aliases() instead of iteration
  direct-io: Use clean_bdev_aliases() instead of handmade iteration
  fs: Provide function to unmap metadata for a range of blocks

7 years agodocs: add back 'Documentation/Changes' file (as symlink)
Linus Torvalds [Thu, 15 Dec 2016 00:30:12 +0000 (16:30 -0800)]
docs: add back 'Documentation/Changes' file (as symlink)

Jaegeuk Kim reports that the debian kernel package build gets confused
by the lack of Documentation/Changes file.  We also refer to that path
name in ver_linux and various how-to files and Kconfig files.

The file got renamed away in commit 186128f75392 ("docs-rst: add
documents to development-process"), and as Jaegeuk Kim points out, the
commit message for that change says "use symlinks instead of renames",
but then the commit itself actually does renames after all.

Maybe we should do the other files too, but for now this just adds the
minimal symlink back to the historical name, so that people looking for
Documentation/Changes will actually find what they are looking for, and
the debian scripts continue to work.

Reported-by: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix tree test suite: delete unused rcupdate.c
Matthew Wilcox [Wed, 14 Dec 2016 23:09:36 +0000 (15:09 -0800)]
radix tree test suite: delete unused rcupdate.c

This file was used to implement call_rcu() before liburcu implemented
that function.  It hasn't even been compiled since before the test suite
was added to the kernel.  Remove it to reduce confusion.

Link: http://lkml.kernel.org/r/1481667692-14500-5-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix tree test suite: add new tag check
Matthew Wilcox [Wed, 14 Dec 2016 23:09:34 +0000 (15:09 -0800)]
radix tree test suite: add new tag check

We have a check that setting a tag on a single entry at root succeeds,
but we were missing a check that clearing a tag on that same entry also
succeeds.

Link: http://lkml.kernel.org/r/1481667692-14500-4-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix-tree: ensure counts are initialised
Matthew Wilcox [Wed, 14 Dec 2016 23:09:31 +0000 (15:09 -0800)]
radix-tree: ensure counts are initialised

radix_tree_join() was freeing nodes with a non-zero ->exceptional count,
and radix_tree_split() wasn't zeroing ->exceptional when it allocated
the new node.  Fix this by making all callers of radix_tree_node_alloc()
pass in the new counts (and some other always-initialised fields), which
will prevent the problem recurring if in future we decide to do
something similar.

Link: http://lkml.kernel.org/r/1481667692-14500-3-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix tree test suite: cache recently freed objects
Matthew Wilcox [Wed, 14 Dec 2016 23:09:28 +0000 (15:09 -0800)]
radix tree test suite: cache recently freed objects

The kmem_cache_alloc implementation simply allocates new memory from
malloc() and calls the ctor, which zeroes out the entire object.  This
means it cannot spot bugs where the object isn't properly reinitialised
before being freed.

Add a small (11 objects) cache before freeing objects back to malloc.
This is enough to let us write a test to catch it, although the memory
allocator is now aware of the structure of the radix tree node, since it
chains free objects through ->private_data (like the percpu cache does).

Link: http://lkml.kernel.org/r/1481667692-14500-2-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix tree test suite: add some more functionality
Matthew Wilcox [Wed, 14 Dec 2016 23:09:25 +0000 (15:09 -0800)]
radix tree test suite: add some more functionality

IDR needs more functionality from the kernel: kmalloc()/kfree(), and
xchg().

Link: http://lkml.kernel.org/r/1480369871-5271-67-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoidr: reduce the number of bits per level from 8 to 6
Matthew Wilcox [Wed, 14 Dec 2016 23:09:22 +0000 (15:09 -0800)]
idr: reduce the number of bits per level from 8 to 6

In preparation for merging the IDR and radix tree, reduce the fanout at
each level from 256 to 64.  If this causes a performance problem then a
bisect will point to this commit, and we'll have a better idea about
what we might do to fix it.

Link: http://lkml.kernel.org/r/1480369871-5271-66-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agorxrpc: abstract away knowledge of IDR internals
Matthew Wilcox [Wed, 14 Dec 2016 23:09:19 +0000 (15:09 -0800)]
rxrpc: abstract away knowledge of IDR internals

Add idr_get_cursor() / idr_set_cursor() APIs, and remove the reference
to IDR_SIZE.

Link: http://lkml.kernel.org/r/1480369871-5271-65-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agotpm: use idr_find(), not idr_find_slowpath()
Matthew Wilcox [Wed, 14 Dec 2016 23:09:16 +0000 (15:09 -0800)]
tpm: use idr_find(), not idr_find_slowpath()

idr_find_slowpath() is not intended to be part of the public API, it's
an implementation detail.  There's no reason to skip straight to the
slowpath here.

Link: http://lkml.kernel.org/r/1480369871-5271-64-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Peter Huewe <peterhuewe@gmx.de>
Cc: Marcel Selhorst <tpmdd@selhorst.net>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoidr: add ida_is_empty
Matthew Wilcox [Wed, 14 Dec 2016 23:09:13 +0000 (15:09 -0800)]
idr: add ida_is_empty

Two of the USB Gadgets were poking around in the internals of struct ida
in order to determine if it is empty.  Add the appropriate abstraction.

Link: http://lkml.kernel.org/r/1480369871-5271-63-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix tree test suite: check multiorder iteration
Matthew Wilcox [Wed, 14 Dec 2016 23:09:10 +0000 (15:09 -0800)]
radix tree test suite: check multiorder iteration

The random iteration test only inserts order-0 entries currently.
Update it to insert entries of order between 7 and 0.  Also make the
maximum index configurable, make some variables static, make the test
duration variable, remove some useless spinning, and add a fifth thread
which calls tag_tagged_items().

Link: http://lkml.kernel.org/r/1480369871-5271-62-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix-tree: fix replacement for multiorder entries
Matthew Wilcox [Wed, 14 Dec 2016 23:09:07 +0000 (15:09 -0800)]
radix-tree: fix replacement for multiorder entries

When replacing an entry with NULL, we need to delete any sibling
entries.  Also account deleting exceptional entries properly.  Also fix
a bug with radix_tree_iter_replace() where we would fail to remove
entirely freed nodes.  Also fix accounting bug when switching between
normal and exceptional entries with replace_slot.  Also add testcases
for all these bugs.

Link: http://lkml.kernel.org/r/1480369871-5271-61-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix-tree: add radix_tree_split_preload()
Matthew Wilcox [Wed, 14 Dec 2016 23:09:04 +0000 (15:09 -0800)]
radix-tree: add radix_tree_split_preload()

Calculate how many nodes we need to allocate to split an old_order entry
into multiple entries, each of size new_order.  The test suite checks
that we allocated exactly the right number of nodes; neither too many
(checked by rtp->nr == 0), nor too few (checked by comparing
nr_allocated before and after the call to radix_tree_split()).

Link: http://lkml.kernel.org/r/1480369871-5271-60-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix-tree: add radix_tree_split
Matthew Wilcox [Wed, 14 Dec 2016 23:09:01 +0000 (15:09 -0800)]
radix-tree: add radix_tree_split

This new function splits a larger multiorder entry into smaller entries
(potentially multi-order entries).  These entries are initialised to
RADIX_TREE_RETRY to ensure that RCU walkers who see this state aren't
confused.  The caller should then call radix_tree_for_each_slot() and
radix_tree_replace_slot() in order to turn these retry entries into the
intended new entries.  Tags are replicated from the original multiorder
entry into each new entry.

Link: http://lkml.kernel.org/r/1480369871-5271-59-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix-tree: add radix_tree_join
Matthew Wilcox [Wed, 14 Dec 2016 23:08:58 +0000 (15:08 -0800)]
radix-tree: add radix_tree_join

This new function allows for the replacement of many smaller entries in
the radix tree with one larger multiorder entry.  From the point of view
of an RCU walker, they may see a mixture of the smaller entries and the
large entry during the same walk, but they will never see NULL for an
index which was populated before the join.

Link: http://lkml.kernel.org/r/1480369871-5271-58-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix-tree: delete radix_tree_range_tag_if_tagged()
Matthew Wilcox [Wed, 14 Dec 2016 23:08:55 +0000 (15:08 -0800)]
radix-tree: delete radix_tree_range_tag_if_tagged()

This is an exceptionally complicated function with just one caller
(tag_pages_for_writeback).  We devote a large portion of the runtime of
the test suite to testing this one function which has one caller.  By
introducing the new function radix_tree_iter_tag_set(), we can eliminate
all of the complexity while keeping the performance.  The caller can now
use a fairly standard radix_tree_for_each() loop, and it doesn't need to
worry about tricksy things like 'start' wrapping.

The test suite continues to spend a large amount of time investigating
this function, but now it's testing the underlying primitives such as
radix_tree_iter_resume() and the radix_tree_for_each_tagged() iterator
which are also used by other parts of the kernel.

Link: http://lkml.kernel.org/r/1480369871-5271-57-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix-tree: delete radix_tree_locate_item()
Matthew Wilcox [Wed, 14 Dec 2016 23:08:52 +0000 (15:08 -0800)]
radix-tree: delete radix_tree_locate_item()

This rather complicated function can be better implemented as an
iterator.  It has only one caller, so move the functionality to the only
place that needs it.  Update the test suite to follow the same pattern.

Link: http://lkml.kernel.org/r/1480369871-5271-56-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix-tree: improve multiorder iterators
Matthew Wilcox [Wed, 14 Dec 2016 23:08:49 +0000 (15:08 -0800)]
radix-tree: improve multiorder iterators

This fixes several interlinked problems with the iterators in the
presence of multiorder entries.

1. radix_tree_iter_next() would only advance by one slot, which would
   result in the iterators returning the same entry more than once if
   there were sibling entries.

2. radix_tree_next_slot() could return an internal pointer instead of
   a user pointer if a tagged multiorder entry was immediately followed by
   an entry of lower order.

3. radix_tree_next_slot() expanded to a lot more code than it used to
   when multiorder support was compiled in.  And I wasn't comfortable with
   entry_to_node() being in a header file.

Fixing radix_tree_iter_next() for the presence of sibling entries
necessarily involves examining the contents of the radix tree, so we now
need to pass 'slot' to radix_tree_iter_next(), and we need to change the
calling convention so it is called *before* dropping the lock which
protects the tree.  Also rename it to radix_tree_iter_resume(), as some
people thought it was necessary to call radix_tree_iter_next() each time
around the loop.

radix_tree_next_slot() becomes closer to how it looked before multiorder
support was introduced.  It only checks to see if the next entry in the
chunk is a sibling entry or a pointer to a node; this should be rare
enough that handling this case out of line is not a performance impact
(and such impact is amortised by the fact that the entry we just
processed was a multiorder entry).  Also, radix_tree_next_slot() used to
force a new chunk lookup for untagged entries, which is more expensive
than the out of line sibling entry skipping.

Link: http://lkml.kernel.org/r/1480369871-5271-55-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agobtrfs: fix race in btrfs_free_dummy_fs_info()
Matthew Wilcox [Wed, 14 Dec 2016 23:08:46 +0000 (15:08 -0800)]
btrfs: fix race in btrfs_free_dummy_fs_info()

We drop the lock which protects the radix tree, so we must call
radix_tree_iter_next() in order to avoid a modification to the tree
invalidating the iterator state.

Link: http://lkml.kernel.org/r/1480369871-5271-54-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix-tree: improve dump output
Matthew Wilcox [Wed, 14 Dec 2016 23:08:43 +0000 (15:08 -0800)]
radix-tree: improve dump output

Print the indices of the entries as unsigned (instead of signed)
integers and print the parent node of each entry to help navigate around
larger trees where the layout is not quite so obvious.  Print the
indices covered by a node.  Rearrange the order of fields printed so the
indices and parents line up for each type of entry.

Link: http://lkml.kernel.org/r/1480369871-5271-53-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix-tree: make radix_tree_find_next_bit more useful
Matthew Wilcox [Wed, 14 Dec 2016 23:08:40 +0000 (15:08 -0800)]
radix-tree: make radix_tree_find_next_bit more useful

Since this function is specialised to the radix tree, pass in the node
and tag to calculate the address of the bitmap in
radix_tree_find_next_bit() instead of the caller.  Likewise, there is no
need to pass in the size of the bitmap.

Link: http://lkml.kernel.org/r/1480369871-5271-52-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix-tree: create node_tag_set()
Matthew Wilcox [Wed, 14 Dec 2016 23:08:37 +0000 (15:08 -0800)]
radix-tree: create node_tag_set()

Similar to node_tag_clear(), factor node_tag_set() out of
radix_tree_range_tag_if_tagged().

Link: http://lkml.kernel.org/r/1480369871-5271-51-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix-tree: move rcu_head into a union with private_list
Matthew Wilcox [Wed, 14 Dec 2016 23:08:34 +0000 (15:08 -0800)]
radix-tree: move rcu_head into a union with private_list

I want to be able to reference node->parent after freeing node.

Currently node->parent is in a union with rcu_head, so it is overwritten
when the node is put on the RCU list.  We know that private_list is not
referenced after the node is freed, so it is safe for these two members
to share space.

Link: http://lkml.kernel.org/r/1480369871-5271-50-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix-tree: fix typo
Matthew Wilcox [Wed, 14 Dec 2016 23:08:31 +0000 (15:08 -0800)]
radix-tree: fix typo

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix tree test suite: use common find-bit code
Matthew Wilcox [Wed, 14 Dec 2016 23:08:29 +0000 (15:08 -0800)]
radix tree test suite: use common find-bit code

Remove the old find_next_bit code in favour of linking in the find_bit
code from tools/lib.

Link: http://lkml.kernel.org/r/1480369871-5271-48-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agotools: add more bitmap functions
Matthew Wilcox [Wed, 14 Dec 2016 23:08:26 +0000 (15:08 -0800)]
tools: add more bitmap functions

I need the following functions for the radix tree:

  bitmap_fill
  bitmap_empty
  bitmap_full

Copy the implementations from include/linux/bitmap.h

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix tree test suite: record order in each item
Matthew Wilcox [Wed, 14 Dec 2016 23:08:23 +0000 (15:08 -0800)]
radix tree test suite: record order in each item

This probably doubles the size of each item allocated by the test suite
but it lets us check a few more things, and may be needed for upcoming
API changes that require the caller pass in the order of the entry.

Link: http://lkml.kernel.org/r/1480369871-5271-46-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix tree test suite: handle exceptional entries
Matthew Wilcox [Wed, 14 Dec 2016 23:08:20 +0000 (15:08 -0800)]
radix tree test suite: handle exceptional entries

item_kill_tree() assumes that everything in the tree is a pointer to a
struct item, which is annoying when testing the behaviour of exceptional
entries.  Fix it to delete exceptional entries on the assumption they
don't need to be freed.

Link: http://lkml.kernel.org/r/1480369871-5271-45-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix tree test suite: use rcu_barrier
Matthew Wilcox [Wed, 14 Dec 2016 23:08:17 +0000 (15:08 -0800)]
radix tree test suite: use rcu_barrier

Calling rcu_barrier() allows all of the rcu-freed memory to be actually
returned to the pool, and allows nr_allocated to return to 0.  As well
as allowing diffs between runs to be more useful, it also lets us
pinpoint leaks more effectively.

Link: http://lkml.kernel.org/r/1480369871-5271-44-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix tree test suite: benchmark for iterator
Konstantin Khlebnikov [Wed, 14 Dec 2016 23:08:14 +0000 (15:08 -0800)]
radix tree test suite: benchmark for iterator

This adds simple benchmark for iterator similar to one I've used for
commit 78c1d78488a3 ("radix-tree: introduce bit-optimized iterator")

Building with make BENCHMARK=1 set radix tree order to 6, this allows to
get performance comparable to in kernel performance.

Link: http://lkml.kernel.org/r/1480369871-5271-43-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix tree test suite: iteration test misuses RCU
Matthew Wilcox [Wed, 14 Dec 2016 23:08:11 +0000 (15:08 -0800)]
radix tree test suite: iteration test misuses RCU

Each thread needs to register itself with RCU, otherwise the reading
thread's read lock has no effect and the freeing thread will free the
memory in the tree without waiting for the read lock to be dropped.

Link: http://lkml.kernel.org/r/1480369871-5271-42-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix tree test suite: make runs more reproducible
Matthew Wilcox [Wed, 14 Dec 2016 23:08:08 +0000 (15:08 -0800)]
radix tree test suite: make runs more reproducible

Instead of reseeding the random number generator every time around the
loop in big_gang_check(), seed it at the beginning of execution.  Use
rand_r() and an independent base seed for each thread in
iteration_test() so they don't stomp all over each others state.  Since
this particular test depends on the kernel scheduler, the iteration test
can't be reproduced based purely on the random seed, but at least it
won't pollute the other tests.

Print the seed, and allow the seed to be specified so that a run which
hits a problem can be reproduced.

Link: http://lkml.kernel.org/r/1480369871-5271-41-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix tree test suite: free preallocated nodes
Matthew Wilcox [Wed, 14 Dec 2016 23:08:05 +0000 (15:08 -0800)]
radix tree test suite: free preallocated nodes

It can be a source of mild concern when the test suite shows that we're
leaking nodes.  While poring over the source code looking for leaks can
lead to some fascinating bugs being discovered, sometimes the leak is
simply that these nodes were preallocated and are sitting on the per-CPU
list.  Free them by calling the CPU dead callback.

Link: http://lkml.kernel.org/r/1480369871-5271-40-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix tree test suite: track preempt_count
Matthew Wilcox [Wed, 14 Dec 2016 23:08:02 +0000 (15:08 -0800)]
radix tree test suite: track preempt_count

Rather than simply NOP out preempt_enable() and preempt_disable(), keep
track of preempt_count and display it regularly in case either the test
suite or the code under test is forgetting to balance the enables &
disables.  Only found a test-case that was forgetting to re-enable
preemption, but it's a possibility worth checking.

Link: http://lkml.kernel.org/r/1480369871-5271-39-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoradix tree test suite: allow GFP_ATOMIC allocations to fail
Matthew Wilcox [Wed, 14 Dec 2016 23:07:59 +0000 (15:07 -0800)]
radix tree test suite: allow GFP_ATOMIC allocations to fail

In order to test the preload code, it is necessary to fail GFP_ATOMIC
allocations, which requires defining GFP_KERNEL and GFP_ATOMIC properly.
Remove the obsolete __GFP_WAIT and copy the definitions of the __GFP
flags which are used from the kernel include files.  We also need the
real definition of gfpflags_allow_blocking() to persuade the radix tree
to actually use its preallocated nodes.

Link: http://lkml.kernel.org/r/1480369871-5271-38-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agotools: add WARN_ON_ONCE
Matthew Wilcox [Wed, 14 Dec 2016 23:07:56 +0000 (15:07 -0800)]
tools: add WARN_ON_ONCE

Patch series "Radix tree patches for 4.10", v3.

Mostly these are improvements; the only bug fixes in here relate to
multiorder entries (which are unused in the 4.9 tree).

This patch (of 32):

The radix tree uses its own buggy WARN_ON_ONCE.  Replace it with the
definition from asm-generic/bug.h

Link: http://lkml.kernel.org/r/1480369871-5271-37-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agodax: clear dirty entry tags on cache flush
Jan Kara [Wed, 14 Dec 2016 23:07:53 +0000 (15:07 -0800)]
dax: clear dirty entry tags on cache flush

Currently we never clear dirty tags in DAX mappings and thus address
ranges to flush accumulate.  Now that we have locking of radix tree
entries, we have all the locking necessary to reliably clear the radix
tree dirty tag when flushing caches for corresponding address range.
Similarly to page_mkclean() we also have to write-protect pages to get a
page fault when the page is next written to so that we can mark the
entry dirty again.

Link: http://lkml.kernel.org/r/1479460644-25076-21-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agodax: protect PTE modification on WP fault by radix tree entry lock
Jan Kara [Wed, 14 Dec 2016 23:07:50 +0000 (15:07 -0800)]
dax: protect PTE modification on WP fault by radix tree entry lock

Currently PTE gets updated in wp_pfn_shared() after dax_pfn_mkwrite()
has released corresponding radix tree entry lock.  When we want to
writeprotect PTE on cache flush, we need PTE modification to happen
under radix tree entry lock to ensure consistent updates of PTE and
radix tree (standard faults use page lock to ensure this consistency).
So move update of PTE bit into dax_pfn_mkwrite().

Link: http://lkml.kernel.org/r/1479460644-25076-20-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agodax: make cache flushing protected by entry lock
Jan Kara [Wed, 14 Dec 2016 23:07:47 +0000 (15:07 -0800)]
dax: make cache flushing protected by entry lock

Currently, flushing of caches for DAX mappings was ignoring entry lock.
So far this was ok (modulo a bug that a difference in entry lock could
cause cache flushing to be mistakenly skipped) but in the following
patches we will write-protect PTEs on cache flushing and clear dirty
tags.  For that we will need more exclusion.  So do cache flushing under
an entry lock.  This allows us to remove one lock-unlock pair of
mapping->tree_lock as a bonus.

Link: http://lkml.kernel.org/r/1479460644-25076-19-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: export follow_pte()
Jan Kara [Wed, 14 Dec 2016 23:07:45 +0000 (15:07 -0800)]
mm: export follow_pte()

DAX will need to implement its own version of page_check_address().  To
avoid duplicating page table walking code, export follow_pte() which
does what we need.

Link: http://lkml.kernel.org/r/1479460644-25076-18-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: change return values of finish_mkwrite_fault()
Jan Kara [Wed, 14 Dec 2016 23:07:42 +0000 (15:07 -0800)]
mm: change return values of finish_mkwrite_fault()

Currently finish_mkwrite_fault() returns 0 when PTE got changed before
we acquired PTE lock and VM_FAULT_WRITE when we succeeded in modifying
the PTE.  This is somewhat confusing since 0 generally means success, it
is also inconsistent with finish_fault() which returns 0 on success.
Change finish_mkwrite_fault() to return 0 on success and VM_FAULT_NOPAGE
when PTE changed.  Practically, there should be no behavioral difference
since we bail out from the fault the same way regardless whether we
return 0, VM_FAULT_NOPAGE, or VM_FAULT_WRITE.  Also note that
VM_FAULT_WRITE has no effect for shared mappings since the only two
places that check it - KSM and GUP - care about private mappings only.
Generally the meaning of VM_FAULT_WRITE for shared mappings is not well
defined and we should probably clean that up.

Link: http://lkml.kernel.org/r/1479460644-25076-17-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: provide helper for finishing mkwrite faults
Jan Kara [Wed, 14 Dec 2016 23:07:39 +0000 (15:07 -0800)]
mm: provide helper for finishing mkwrite faults

Provide a helper function for finishing write faults due to PTE being
read-only.  The helper will be used by DAX to avoid the need of
complicating generic MM code with DAX locking specifics.

Link: http://lkml.kernel.org/r/1479460644-25076-16-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: move part of wp_page_reuse() into the single call site
Jan Kara [Wed, 14 Dec 2016 23:07:36 +0000 (15:07 -0800)]
mm: move part of wp_page_reuse() into the single call site

wp_page_reuse() handles write shared faults which is needed only in
wp_page_shared().  Move the handling only into that location to make
wp_page_reuse() simpler and avoid a strange situation when we sometimes
pass in locked page, sometimes unlocked etc.

Link: http://lkml.kernel.org/r/1479460644-25076-15-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: use vmf->page during WP faults
Jan Kara [Wed, 14 Dec 2016 23:07:33 +0000 (15:07 -0800)]
mm: use vmf->page during WP faults

So far we set vmf->page during WP faults only when we needed to pass it
to the ->page_mkwrite handler.  Set it in all the cases now and use that
instead of passing page pointer explicitly around.

Link: http://lkml.kernel.org/r/1479460644-25076-14-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: pass vm_fault structure into do_page_mkwrite()
Jan Kara [Wed, 14 Dec 2016 23:07:30 +0000 (15:07 -0800)]
mm: pass vm_fault structure into do_page_mkwrite()

We will need more information in the ->page_mkwrite() helper for DAX to
be able to fully finish faults there.  Pass vm_fault structure to
do_page_mkwrite() and use it there so that information propagates
properly from upper layers.

Link: http://lkml.kernel.org/r/1479460644-25076-13-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: factor out common parts of write fault handling
Jan Kara [Wed, 14 Dec 2016 23:07:27 +0000 (15:07 -0800)]
mm: factor out common parts of write fault handling

Currently we duplicate handling of shared write faults in
wp_page_reuse() and do_shared_fault().  Factor them out into a common
function.

Link: http://lkml.kernel.org/r/1479460644-25076-12-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: move handling of COW faults into DAX code
Jan Kara [Wed, 14 Dec 2016 23:07:24 +0000 (15:07 -0800)]
mm: move handling of COW faults into DAX code

Move final handling of COW faults from generic code into DAX fault
handler.  That way generic code doesn't have to be aware of
peculiarities of DAX locking so remove that knowledge and make locking
functions private to fs/dax.c.

Link: http://lkml.kernel.org/r/1479460644-25076-11-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: factor out functionality to finish page faults
Jan Kara [Wed, 14 Dec 2016 23:07:21 +0000 (15:07 -0800)]
mm: factor out functionality to finish page faults

Introduce finish_fault() as a helper function for finishing page faults.
It is rather thin wrapper around alloc_set_pte() but since we'd want to
call this from DAX code or filesystems, it is still useful to avoid some
boilerplate code.

Link: http://lkml.kernel.org/r/1479460644-25076-10-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: allow full handling of COW faults in ->fault handlers
Jan Kara [Wed, 14 Dec 2016 23:07:18 +0000 (15:07 -0800)]
mm: allow full handling of COW faults in ->fault handlers

Patch series "dax: Clear dirty bits after flushing caches", v5.

Patchset to clear dirty bits from radix tree of DAX inodes when caches
for corresponding pfns have been flushed.  In principle, these patches
enable handlers to easily update PTEs and do other work necessary to
finish the fault without duplicating the functionality present in the
generic code.  I'd like to thank Kirill and Ross for reviews of the
series!

This patch (of 20):

To allow full handling of COW faults add memcg field to struct vm_fault
and a return value of ->fault() handler meaning that COW fault is fully
handled and memcg charge must not be canceled.  This will allow us to
remove knowledge about special DAX locking from the generic fault code.

Link: http://lkml.kernel.org/r/1479460644-25076-9-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: add orig_pte field into vm_fault
Jan Kara [Wed, 14 Dec 2016 23:07:16 +0000 (15:07 -0800)]
mm: add orig_pte field into vm_fault

Add orig_pte field to vm_fault structure to allow ->page_mkwrite
handlers to fully handle the fault.

This also allows us to save some passing of extra arguments around.

Link: http://lkml.kernel.org/r/1479460644-25076-8-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: use passed vm_fault structure for in wp_pfn_shared()
Jan Kara [Wed, 14 Dec 2016 23:07:13 +0000 (15:07 -0800)]
mm: use passed vm_fault structure for in wp_pfn_shared()

Instead of creating another vm_fault structure, use the one passed to
wp_pfn_shared() for passing arguments into pfn_mkwrite handler.

Link: http://lkml.kernel.org/r/1479460644-25076-7-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: trim __do_fault() arguments
Jan Kara [Wed, 14 Dec 2016 23:07:10 +0000 (15:07 -0800)]
mm: trim __do_fault() arguments

Use vm_fault structure to pass cow_page, page, and entry in and out of
the function.

That reduces number of __do_fault() arguments from 4 to 1.

Link: http://lkml.kernel.org/r/1479460644-25076-6-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: use passed vm_fault structure in __do_fault()
Jan Kara [Wed, 14 Dec 2016 23:07:07 +0000 (15:07 -0800)]
mm: use passed vm_fault structure in __do_fault()

Instead of creating another vm_fault structure, use the one passed to
__do_fault() for passing arguments into fault handler.

Link: http://lkml.kernel.org/r/1479460644-25076-5-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: use pgoff in struct vm_fault instead of passing it separately
Jan Kara [Wed, 14 Dec 2016 23:07:04 +0000 (15:07 -0800)]
mm: use pgoff in struct vm_fault instead of passing it separately

struct vm_fault has already pgoff entry.  Use it instead of passing
pgoff as a separate argument and then assigning it later.

Link: http://lkml.kernel.org/r/1479460644-25076-4-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>