git.proxmox.com Git - mirror_ubuntu-focal-kernel.git/log

nohz: Fix collision between tick and other hrtimers, again

This restores commit:

  24b91e360ef5: ("nohz: Fix collision between tick and other hrtimers")

... which got reverted by commit:

  558e8e27e73f: ('Revert "nohz: Fix collision between tick and other hrtimers"')

... due to a regression where CPUs spuriously stopped ticking.

The bug happened when a tick fired too early past its expected expiration:
on IRQ exit the tick was scheduled again to the same deadline but skipped
reprogramming because ts->next_tick still kept in cache the deadline.
This has been fixed now with resetting ts->next_tick from the tick
itself. Extra care has also been taken to prevent from obsolete values
throughout CPU hotplug operations.

When the tick is stopped and an interrupt occurs afterward, we check on
that interrupt exit if the next tick needs to be rescheduled. If it
doesn't need any update, we don't want to do anything.

In order to check if the tick needs an update, we compare it against the
clockevent device deadline. Now that's a problem because the clockevent
device is at a lower level than the tick itself if it is implemented
on top of hrtimer.

Every hrtimer share this clockevent device. So comparing the next tick
deadline against the clockevent device deadline is wrong because the
device may be programmed for another hrtimer whose deadline collides
with the tick. As a result we may end up not reprogramming the tick
accidentally.

In a worst case scenario under full dynticks mode, the tick stops firing
as it is supposed to every 1hz, leaving /proc/stat stalled:

      Task in a full dynticks CPU
      ----------------------------

      * hrtimer A is queued 2 seconds ahead
      * the tick is stopped, scheduled 1 second ahead
      * tick fires 1 second later
      * on tick exit, nohz schedules the tick 1 second ahead but sees
        the clockevent device is already programmed to that deadline,
        fooled by hrtimer A, the tick isn't rescheduled.
      * hrtimer A is cancelled before its deadline
      * tick never fires again until an interrupt happens...

In order to fix this, store the next tick deadline to the tick_sched
local structure and reuse that value later to check whether we need to
reprogram the clock after an interrupt.

On the other hand, ts->sleep_length still wants to know about the next
clock event and not just the tick, so we want to improve the related
comment to avoid confusion.

Reported-and-tested-by: Tim Wright <tim@binbash.co.uk>
Reported-and-tested-by: Pavel Machek <pavel@ucw.cz>
Reported-by: James Hartsock <hartsjc@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1492783255-5051-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

nohz: Add hrtimer sanity check

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'pstore-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore fix from Kees Cook:
"Fix bad EFI vars iterator usage"

* tag 'pstore-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
efi-pstore: Fix read iter after pstore API refactor

efi-pstore: Fix read iter after pstore API refactor

During the internal pstore API refactoring, the EFI vars read entry was
accidentally made to update a stack variable instead of the pstore
private data pointer. This corrects the problem (and removes the now
needless argument).

Fixes: 125cc42baf8a ("pstore: Replace arguments for read() API")
Signed-off-by: Kees Cook <keescook@chromium.org>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:

- convert the debug feature to refcount_t

- reduce the copy size for strncpy_from_user

- 8 bug fixes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/virtio: change virtio_feature_desc:features type to __le32
  s390: convert debug_info.ref_count from atomic_t to refcount_t
  s390: move _text symbol to address higher than zero
  s390/qdio: increase string buffer size
  s390/ccwgroup: increase string buffer size
  s390/topology: let topology_mnest_limit() return unsigned char
  s390/uaccess: use sane length for __strncpy_from_user()
  s390/uprobes: fix compile for !KPROBES
  s390/ftrace: fix compile for !MODULES
  s390/cputime: fix incorrect system time

Merge tag 'edac_fix_for_4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC fix from Borislav Petkov:
"A single amd64_edac fix correcting chip select sizes reporting on
F17h"

* tag 'edac_fix_for_4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC, amd64: Fix reporting of Chip Select sizes on Fam17h

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Track alignment in BPF verifier so that legitimate programs won't be
    rejected on !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS architectures.

2) Make tail calls work properly in arm64 BPF JIT, from Deniel
    Borkmann.

3) Make the configuration and semantics Generic XDP make more sense and
    don't allow both generic XDP and a driver specific instance to be
    active at the same time. Also from Daniel.

4) Don't crash on resume in xen-netfront, from Vitaly Kuznetsov.

5) Fix use-after-free in VRF driver, from Gao Feng.

6) Use netdev_alloc_skb_ip_align() to avoid unaligned IP headers in
    qca_spi driver, from Stefan Wahren.

7) Always run cleanup routines in BPF samples when we get SIGTERM, from
    Andy Gospodarek.

8) The mdio phy code should bring PHYs out of reset using the shared
    GPIO lines before invoking bus->reset(). From Florian Fainelli.

9) Some USB descriptor access endian fixes in various drivers from
    Johan Hovold.

10) Handle PAUSE advertisements properly in mlx5 driver, from Gal
    Pressman.

11) Fix reversed test in mlx5e_setup_tc(), from Saeed Mahameed.

12) Cure netdev leak in AF_PACKET when using timestamping via control
    messages. From Douglas Caetano dos Santos.

13) netcp doesn't support HWTSTAMP_FILTER_ALl, reject it. From Miroslav
    Lichvar.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
  ldmvsw: stop the clean timer at beginning of remove
  ldmvsw: unregistering netdev before disable hardware
  net: netcp: fix check of requested timestamping filter
  ipv6: avoid dad-failures for addresses with NODAD
  qed: Fix uninitialized data in aRFS infrastructure
  mdio: mux: fix device_node_continue.cocci warnings
  net/packet: fix missing net_device reference release
  net/mlx4_core: Use min3 to select number of MSI-X vectors
  macvlan: Fix performance issues with vlan tagged packets
  net: stmmac: use correct pointer when printing normal descriptor ring
  net/mlx5: Use underlay QPN from the root name space
  net/mlx5e: IPoIB, Only support regular RQ for now
  net/mlx5e: Fix setup TC ndo
  net/mlx5e: Fix ethtool pause support and advertise reporting
  net/mlx5e: Use the correct pause values for ethtool advertising
  vmxnet3: ensure that adapter is in proper state during force_close
  sfc: revert changes to NIC revision numbers
  net: ch9200: add missing USB-descriptor endianness conversions
  net: irda: irda-usb: fix firmware name on big-endian hosts
  net: dsa: mv88e6xxx: add default case to switch
  ...

Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
"A set of minor cifs fixes"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  [CIFS] Minor cleanup of xattr query function
  fs: cifs: transport: Use time_after for time comparison
  SMB2: Fix share type handling
  cifs: cifsacl: Use a temporary ops variable to reduce code length
  Don't delay freeing mids when blocked on slow socket write of request
  CIFS: silence lockdep splat in cifs_relock_file()

Merge branch 'ldmsw-fixes'

Shannon Nelson says:

====================
ldmvsw: port removal stability

Under heavy reboot stress testing we found a couple of timing issues
when removing the device that could cause the kernel great heartburn,
addressed by these two patches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ldmvsw: stop the clean timer at beginning of remove

Stop the clean timer earlier to be sure there's no asynchronous
interference while stopping the port.

Orabug: 25748241

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ldmvsw: unregistering netdev before disable hardware

When running LDom binding/unbinding test, kernel may panic
in ldmvsw_open(). It is more likely that because we're removing
the ldc connection before unregistering the netdev in vsw_port_remove(),
we set up a window of time where one process could be removing the
device while another trying to UP the device. This also sometimes causes
vio handshake error due to opening a device without closing it completely.
We should unregister the netdev before we disable the "hardware".

Orabug: 25980913, 25925306

Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: fix check of requested timestamping filter

The driver doesn't support timestamping of all received packets and
should return error when trying to enable the HWTSTAMP_FILTER_ALL
filter.

Cc: WingMan Kwok <w-kwok2@ti.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5-fixes-2017-05-12-V2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2017-05-12

This series contains some mlx5 fixes for net.
Please pull and let me know if there's any problem.

For -stable:
("net/mlx5e: Fix ethtool pause support and advertise reporting") kernels >= 4.8
("net/mlx5e: Use the correct pause values for ethtool advertising") kernels >= 4.8

v1->v2:
Dropped statistics spinlock patch, it needs some extra work.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: avoid dad-failures for addresses with NODAD

Every address gets added with TENTATIVE flag even for the addresses with
IFA_F_NODAD flag and dad-work is scheduled for them. During this DAD process
we realize it's an address with NODAD and complete the process without
sending any probe. However the TENTATIVE flags stays on the
address for sometime enough to cause misinterpretation when we receive a NS.
While processing NS, if the address has TENTATIVE flag, we mark it DADFAILED
and endup with an address that was originally configured as NODAD with
DADFAILED.

We can't avoid scheduling dad_work for addresses with NODAD but we can
avoid adding TENTATIVE flag to avoid this racy situation.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Fix uninitialized data in aRFS infrastructure

Current memset is using incorrect type of variable, causing the
upper-half of the strucutre to be left uninitialized and causing:

ethernet/qlogic/qed/qed_init_fw_funcs.c: In function 'qed_set_rfs_mode_disable':
ethernet/qlogic/qed/qed_init_fw_funcs.c:993:3: error: '*((void *)&ramline+4)' is used uninitialized in this function [-Werror=uninitialized]

Fixes: d51e4af5c209 ("qed: aRFS infrastructure support")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

mdio: mux: fix device_node_continue.cocci warnings

Device node iterators put the previous value of the index variable, so an
explicit put causes a double put.

In particular, of_mdiobus_register can fail before doing anything
interesting, so one could view it as a no-op from the reference count
point of view.

Generated by: scripts/coccinelle/iterators/device_node_continue.cocci

CC: Jon Mason <jon.mason@broadcom.com>
Signed-off-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/packet: fix missing net_device reference release

When using a TX ring buffer, if an error occurs processing a control
message (e.g. invalid message), the net_device reference is not
released.

Fixes c14ac9451c348 ("sock: enable timestamping using control messages")
Signed-off-by: Douglas Caetano dos Santos <douglascs@taghos.com.br>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Use min3 to select number of MSI-X vectors

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvlan: Fix performance issues with vlan tagged packets

Macvlan always turns on offload features that have sofware
fallback (NETIF_GSO_SOFTWARE).  This allows much higher guest-guest
communications over macvtap.

However, macvtap does not turn on these features for vlan tagged traffic.
As a result, depending on the HW that mactap is configured on, the
performance of guest-guest communication over a vlan is very
inconsistent.  If the HW supports TSO/UFO over vlans, then the
performance will be fine.  If not, the the performance will suffer
greatly since the VM may continue using TSO/UFO, and will force the host
segment the traffic and possibly overlow the macvtap queue.

This patch adds the always on offloads to vlan_features.  This
makes sure that any vlan tagged traffic between 2 guest will not
be segmented needlessly.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: use correct pointer when printing normal descriptor ring

There are two pointers in sysfs_display_ring,
one that increments if using normal dma descriptors,
another if using extended dma descriptors.

When printing the normal dma descriptors, the wrong pointer is used,
thus the printed descriptor addresses are incorrect.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/virtio: change virtio_feature_desc:features type to __le32

The feature member of virtio_feature_desc contains little endian
values, given that it contents will be converted with
le32_to_cpu(). The "wrong" __u32 type leads to the sparse warnings
below.
In order to avoid them, use the correct __le32 type instead.

drivers/s390/virtio/virtio_ccw.c:749:14: warning: cast to restricted __le32
drivers/s390/virtio/virtio_ccw.c:762:28: warning: cast to restricted __le32

Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

net/mlx5: Use underlay QPN from the root name space

Root flow table is dynamically changed by the underlying flow steering
layer, and IPoIB/ULPs have no idea what will be the root flow table in
the future, hence we need a dynamic infrastructure to move Underlay QPs
with the root flow table.

Fixes: b3ba51498bdd ("net/mlx5: Refactor create flow table method to accept underlay QP")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: IPoIB, Only support regular RQ for now

IPoIB doesn't support striding RQ at the moment, for this
we need to explicitly choose non striding RQ in IPoIB init,
even if the HW supports it.

Fixes: 8f493ffd88ea ("net/mlx5e: IPoIB, RX steering RSS RQTs and TIRs")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Fix setup TC ndo

Fail-safe support patches introduced a trivial bug,
setup tc callback is doing a wrong check of the netdevice state,
the fix is simply to invert the condition.

Fixes: 6f9485af4020 ("net/mlx5e: Fail safe tc setup")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Fix ethtool pause support and advertise reporting

Pause bit should set when RX pause is on, not TX pause.
Also, setting Asym_Pause is incorrect, and should be turned off.

Fixes: 665bc53969d7 ("net/mlx5e: Use new ethtool get/set link ksettings API")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Use the correct pause values for ethtool advertising

Query the operational pause from firmware (PFCC register) instead of
always passing zeros.

Fixes: 665bc53969d7 ("net/mlx5e: Use new ethtool get/set link ksettings API")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

Linux 4.12-rc1

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull some more input subsystem updates from Dmitry Torokhov:
"An updated xpad driver with a few more recognized device IDs, and a
  new psxpad-spi driver, allowing connecting Playstation 1 and 2 joypads
  via SPI bus"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: cros_ec_keyb - remove extraneous 'const'
  Input: add support for PlayStation 1/2 joypads connected via SPI
  Input: xpad - add USB IDs for Mad Catz Brawlstick and Razer Sabertooth
  Input: xpad - sync supported devices with xboxdrv
  Input: xpad - sort supported devices by USB ID

Merge tag 'upstream-4.12-rc1' of git://git.infradead.org/linux-ubifs

Pull UBI/UBIFS updates from Richard Weinberger:

- new config option CONFIG_UBIFS_FS_SECURITY

- minor improvements

- random fixes

* tag 'upstream-4.12-rc1' of git://git.infradead.org/linux-ubifs:
  ubi: Add debugfs file for tracking PEB state
  ubifs: Fix a typo in comment of ioctl2ubifs & ubifs2ioctl
  ubifs: Remove unnecessary assignment
  ubifs: Fix cut and paste error on sb type comparisons
  ubi: fastmap: Fix slab corruption
  ubifs: Add CONFIG_UBIFS_FS_SECURITY to disable/enable security labels
  ubi: Make mtd parameter readable
  ubi: Fix section mismatch

Merge branch 'for-linus-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml

Pull UML fixes from Richard Weinberger:
"No new stuff, just fixes"

* 'for-linus-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: Add missing NR_CPUS include
  um: Fix to call read_initrd after init_bootmem
  um: Include kbuild.h instead of duplicating its macros
  um: Fix PTRACE_POKEUSER on x86_64
  um: Set number of CPUs
  um: Fix _print_addr()

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"15 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm, docs: update memory.stat description with workingset* entries
  mm: vmscan: scan until it finds eligible pages
  mm, thp: copying user pages must schedule on collapse
  dax: fix PMD data corruption when fault races with write
  dax: fix data corruption when fault races with write
  ext4: return to starting transaction in ext4_dax_huge_fault()
  mm: fix data corruption due to stale mmap reads
  dax: prevent invalidation of mapped DAX entries
  Tigran has moved
  mm, vmalloc: fix vmalloc users tracking properly
  mm/khugepaged: add missed tracepoint for collapse_huge_page_swapin
  gcov: support GCC 7.1
  mm, vmstat: Remove spurious WARN() during zoneinfo print
  time: delete current_fs_time()
  hwpoison, memcg: forcibly uncharge LRU pages

[CIFS] Minor cleanup of xattr query function

Some minor cleanup of cifs query xattr functions (will also make
SMB3 xattr implementation cleaner as well).

Signed-off-by: Steve French <steve.french@primarydata.com>

fs: cifs: transport: Use time_after for time comparison

Use time_after kernel macro for time comparison
that has safety check.

Signed-off-by: Karim Eshapa <karim.eshapa@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>

SMB2: Fix share type handling

In fs/cifs/smb2pdu.h, we have:
#define SMB2_SHARE_TYPE_DISK    0x01
#define SMB2_SHARE_TYPE_PIPE    0x02
#define SMB2_SHARE_TYPE_PRINT   0x03

Knowing that, with the current code, the SMB2_SHARE_TYPE_PRINT case can
never trigger and printer share would be interpreted as disk share.

So, test the ShareType value for equality instead.

Fixes: faaf946a7d5b ("CIFS: Add tree connect/disconnect capability for SMB2")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>

cifs: cifsacl: Use a temporary ops variable to reduce code length

Create an ops variable to store tcon->ses->server->ops and cache
indirections and reduce code size a trivial bit.

$ size fs/cifs/cifsacl.o*
   text    data     bss     dec     hex filename
   5338     136       8    5482    156a fs/cifs/cifsacl.o.new
   5371     136       8    5515    158b fs/cifs/cifsacl.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>

mm, docs: update memory.stat description with workingset* entries

Commit 4b4cea91691d ("mm: vmscan: fix IO/refault regression in cache
workingset transition") introduced three new entries in memory stat
file:

- workingset_refault
- workingset_activate
- workingset_nodereclaim

This commit adds a corresponding description to the cgroup v2 docs.

Link: http://lkml.kernel.org/r/1494530293-31236-1-git-send-email-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: vmscan: scan until it finds eligible pages

Although there are a ton of free swap and anonymous LRU page in elgible
zones, OOM happened.

  balloon invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=0, oom_score_adj=0
  CPU: 7 PID: 1138 Comm: balloon Not tainted 4.11.0-rc6-mm1-zram-00289-ge228d67e9677-dirty #17
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
  Call Trace:
   oom_kill_process+0x21d/0x3f0
   out_of_memory+0xd8/0x390
   __alloc_pages_slowpath+0xbc1/0xc50
   __alloc_pages_nodemask+0x1a5/0x1c0
   pte_alloc_one+0x20/0x50
   __pte_alloc+0x1e/0x110
   __handle_mm_fault+0x919/0x960
   handle_mm_fault+0x77/0x120
   __do_page_fault+0x27a/0x550
   trace_do_page_fault+0x43/0x150
   do_async_page_fault+0x2c/0x90
   async_page_fault+0x28/0x30
  Mem-Info:
  active_anon:424716 inactive_anon:65314 isolated_anon:0
   active_file:52 inactive_file:46 isolated_file:0
   unevictable:0 dirty:27 writeback:0 unstable:0
   slab_reclaimable:3967 slab_unreclaimable:4125
   mapped:133 shmem:43 pagetables:1674 bounce:0
   free:4637 free_pcp:225 free_cma:0
  Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
  DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
  lowmem_reserve[]: 0 992 992 1952
  DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
  lowmem_reserve[]: 0 0 0 959
  Movable free:3644kB min:1980kB low:2960kB high:3940kB active_anon:738560kB inactive_anon:261340kB active_file:188kB inactive_file:640kB unevictable:0kB writepending:20kB present:1048444kB managed:1010816kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:832kB local_pcp:60kB free_cma:0kB
  lowmem_reserve[]: 0 0 0 0
  DMA: 1*4kB (E) 0*8kB 18*16kB (E) 10*32kB (E) 10*64kB (E) 9*128kB (ME) 8*256kB (E) 2*512kB (E) 2*1024kB (E) 0*2048kB 0*4096kB = 7524kB
  DMA32: 417*4kB (UMEH) 181*8kB (UMEH) 68*16kB (UMEH) 48*32kB (UMEH) 14*64kB (MH) 3*128kB (M) 1*256kB (H) 1*512kB (M) 2*1024kB (M) 0*2048kB 0*4096kB = 9836kB
  Movable: 1*4kB (M) 1*8kB (M) 1*16kB (M) 1*32kB (M) 0*64kB 1*128kB (M) 2*256kB (M) 4*512kB (M) 1*1024kB (M) 0*2048kB 0*4096kB = 3772kB
  378 total pagecache pages
  17 pages in swap cache
  Swap cache stats: add 17325, delete 17302, find 0/27
  Free swap  = 978940kB
  Total swap = 1048572kB
  524157 pages RAM
  0 pages HighMem/MovableOnly
  12629 pages reserved
  0 pages cma reserved
  0 pages hwpoisoned
  [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
  [  433]     0   433     4904        5      14       3       82             0 upstart-udev-br
  [  438]     0   438    12371        5      27       3      191         -1000 systemd-udevd

With investigation, skipping page of isolate_lru_pages makes reclaim
void because it returns zero nr_taken easily so LRU shrinking is
effectively nothing and just increases priority aggressively.  Finally,
OOM happens.

The problem is that get_scan_count determines nr_to_scan with eligible
zones so although priority drops to zero, it couldn't reclaim any pages
if the LRU contains mostly ineligible pages.

get_scan_count:

        size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
size = size >> sc->priority;

Assumes sc->priority is 0 and LRU list is as follows.

N-N-N-N-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H

(Ie, small eligible pages are in the head of LRU but others are
almost ineligible pages)

In that case, size becomes 4 so VM want to scan 4 pages but 4 pages from
tail of the LRU are not eligible pages.  If get_scan_count counts
skipped pages, it doesn't reclaim any pages remained after scanning 4
pages so it ends up OOM happening.

This patch makes isolate_lru_pages try to scan pages until it encounters
eligible zones's pages.

[akpm@linux-foundation.org: clean up mind-bending `for' statement.  Tweak comment text]
Fixes: 3db65812d688 ("Revert "mm, vmscan: account for skipped pages as a partial scan"")
Link: http://lkml.kernel.org/r/1494457232-27401-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, thp: copying user pages must schedule on collapse

We have encountered need_resched warnings in __collapse_huge_page_copy()
while doing {clear,copy}_user_highpage() over HPAGE_PMD_NR source pages.

mm->mmap_sem is held for write, but the iteration is well bounded.

Reschedule as needed.

Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1705101426380.109808@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dax: fix PMD data corruption when fault races with write

This is based on a patch from Jan Kara that fixed the equivalent race in
the DAX PTE fault path.

Currently DAX PMD read fault can race with write(2) in the following
way:

CPU1 - write(2)                 CPU2 - read fault
                                dax_iomap_pmd_fault()
                                  ->iomap_begin() - sees hole

dax_iomap_rw()
  iomap_apply()
    ->iomap_begin - allocates blocks
    dax_iomap_actor()
      invalidate_inode_pages2_range()
        - there's nothing to invalidate

                                  grab_mapping_entry()
  - we add huge zero page to the radix tree
    and map it to page tables

The result is that hole page is mapped into page tables (and thus zeros
are seen in mmap) while file has data written in that place.

Fix the problem by locking exception entry before mapping blocks for the
fault.  That way we are sure invalidate_inode_pages2_range() call for
racing write will either block on entry lock waiting for the fault to
finish (and unmap stale page tables after that) or read fault will see
already allocated blocks by write(2).

Fixes: 9f141d6ef6258 ("dax: Call ->iomap_begin without entry lock during dax fault")
Link: http://lkml.kernel.org/r/20170510172700.18991-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dax: fix data corruption when fault races with write

Currently DAX read fault can race with write(2) in the following way:

CPU1 - write(2) CPU2 - read fault
dax_iomap_pte_fault()
  ->iomap_begin() - sees hole
dax_iomap_rw()
  iomap_apply()
    ->iomap_begin - allocates blocks
    dax_iomap_actor()
      invalidate_inode_pages2_range()
        - there's nothing to invalidate
  grab_mapping_entry()
  - we add zero page in the radix tree
    and map it to page tables

The result is that hole page is mapped into page tables (and thus zeros
are seen in mmap) while file has data written in that place.

Fix the problem by locking exception entry before mapping blocks for the
fault.  That way we are sure invalidate_inode_pages2_range() call for
racing write will either block on entry lock waiting for the fault to
finish (and unmap stale page tables after that) or read fault will see
already allocated blocks by write(2).

Fixes: 9f141d6ef6258a3a37a045842d9ba7e68f368956
Link: http://lkml.kernel.org/r/20170510085419.27601-5-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ext4: return to starting transaction in ext4_dax_huge_fault()

DAX will return to locking exceptional entry before mapping blocks for a
page fault to fix possible races with concurrent writes. To avoid lock
inversion between exceptional entry lock and transaction start, start
the transaction already in ext4_dax_huge_fault().

Fixes: 9f141d6ef6258a3a37a045842d9ba7e68f368956
Link: http://lkml.kernel.org/r/20170510085419.27601-4-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix data corruption due to stale mmap reads

Currently, we didn't invalidate page tables during invalidate_inode_pages2()
for DAX.  That could result in e.g. 2MiB zero page being mapped into
page tables while there were already underlying blocks allocated and
thus data seen through mmap were different from data seen by read(2).
The following sequence reproduces the problem:

- open an mmap over a 2MiB hole

- read from a 2MiB hole, faulting in a 2MiB zero page

- write to the hole with write(3p). The write succeeds but we
   incorrectly leave the 2MiB zero page mapping intact.

- via the mmap, read the data that was just written. Since the zero
   page mapping is still intact we read back zeroes instead of the new
   data.

Fix the problem by unconditionally calling invalidate_inode_pages2_range()
in dax_iomap_actor() for new block allocations and by properly
invalidating page tables in invalidate_inode_pages2_range() for DAX
mappings.

Fixes: c6dcf52c23d2d3fb5235cec42d7dd3f786b87d55
Link: http://lkml.kernel.org/r/20170510085419.27601-3-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dax: prevent invalidation of mapped DAX entries

Patch series "mm,dax: Fix data corruption due to mmap inconsistency",
v4.

This series fixes data corruption that can happen for DAX mounts when
page faults race with write(2) and as a result page tables get out of
sync with block mappings in the filesystem and thus data seen through
mmap is different from data seen through read(2).

The series passes testing with t_mmap_stale test program from Ross and
also other mmap related tests on DAX filesystem.

This patch (of 4):

dax_invalidate_mapping_entry() currently removes DAX exceptional entries
only if they are clean and unlocked.  This is done via:

  invalidate_mapping_pages()
    invalidate_exceptional_entry()
      dax_invalidate_mapping_entry()

However, for page cache pages removed in invalidate_mapping_pages()
there is an additional criteria which is that the page must not be
mapped.  This is noted in the comments above invalidate_mapping_pages()
and is checked in invalidate_inode_page().

For DAX entries this means that we can can end up in a situation where a
DAX exceptional entry, either a huge zero page or a regular DAX entry,
could end up mapped but without an associated radix tree entry.  This is
inconsistent with the rest of the DAX code and with what happens in the
page cache case.

We aren't able to unmap the DAX exceptional entry because according to
its comments invalidate_mapping_pages() isn't allowed to block, and
unmap_mapping_range() takes a write lock on the mapping->i_mmap_rwsem.

Since we essentially never have unmapped DAX entries to evict from the
radix tree, just remove dax_invalidate_mapping_entry().

Fixes: c6dcf52c23d2 ("mm: Invalidate DAX radix tree entries only if appropriate")
Link: http://lkml.kernel.org/r/20170510085419.27601-2-jack@suse.cz
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org> [4.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Tigran has moved

Cc: Tigran Aivazian <aivazian.tigran@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, vmalloc: fix vmalloc users tracking properly

Commit 1f5307b1e094 ("mm, vmalloc: properly track vmalloc users") has
pulled asm/pgtable.h include dependency to linux/vmalloc.h and that
turned out to be a bad idea for some architectures.  E.g.  m68k fails
with

   In file included from arch/m68k/include/asm/pgtable_mm.h:145:0,
                    from arch/m68k/include/asm/pgtable.h:4,
                    from include/linux/vmalloc.h:9,
                    from arch/m68k/kernel/module.c:9:
   arch/m68k/include/asm/mcf_pgtable.h: In function 'nocache_page':
>> arch/m68k/include/asm/mcf_pgtable.h:339:43: error: 'init_mm' undeclared (first use in this function)
    #define pgd_offset_k(address) pgd_offset(&init_mm, address)

as spotted by kernel build bot. nios2 fails for other reason

  In file included from include/asm-generic/io.h:767:0,
                   from arch/nios2/include/asm/io.h:61,
                   from include/linux/io.h:25,
                   from arch/nios2/include/asm/pgtable.h:18,
                   from include/linux/mm.h:70,
                   from include/linux/pid_namespace.h:6,
                   from include/linux/ptrace.h:9,
                   from arch/nios2/include/uapi/asm/elf.h:23,
                   from arch/nios2/include/asm/elf.h:22,
                   from include/linux/elf.h:4,
                   from include/linux/module.h:15,
                   from init/main.c:16:
  include/linux/vmalloc.h: In function '__vmalloc_node_flags':
  include/linux/vmalloc.h:99:40: error: 'PAGE_KERNEL' undeclared (first use in this function); did you mean 'GFP_KERNEL'?

which is due to the newly added #include <asm/pgtable.h>, which on nios2
includes <linux/io.h> and thus <asm/io.h> and <asm-generic/io.h> which
again includes <linux/vmalloc.h>.

Tweaking that around just turns out a bigger headache than necessary.
This patch reverts 1f5307b1e094 and reimplements the original fix in a
different way.  __vmalloc_node_flags can stay static inline which will
cover vmalloc* functions.  We only have one external user
(kvmalloc_node) and we can export __vmalloc_node_flags_caller and
provide the caller directly.  This is much simpler and it doesn't really
need any games with header files.

[akpm@linux-foundation.org: coding-style fixes]
[mhocko@kernel.org: revert old comment]
Link: http://lkml.kernel.org/r/20170509211054.GB16325@dhcp22.suse.cz
Fixes: 1f5307b1e094 ("mm, vmalloc: properly track vmalloc users")
Link: http://lkml.kernel.org/r/20170509153702.GR6481@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Tobias Klauser <tklauser@distanz.ch>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/khugepaged: add missed tracepoint for collapse_huge_page_swapin

One return case of `__collapse_huge_page_swapin()` does not invoke
tracepoint while every other return case does. This commit adds a
tracepoint invocation for the case.

Link: http://lkml.kernel.org/r/20170507101813.30187-1-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gcov: support GCC 7.1

Starting from GCC 7.1, __gcov_exit is a new symbol expected to be
implemented in a profiling runtime.

[akpm@linux-foundation.org: coding-style fixes]
[mliska@suse.cz: v2]
Link: http://lkml.kernel.org/r/e63a3c59-0149-c97e-4084-20ca8f146b26@suse.cz
Link: http://lkml.kernel.org/r/8c4084fa-3885-29fe-5fc4-0d4ca199c785@suse.cz
Signed-off-by: Martin Liska <mliska@suse.cz>
Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, vmstat: Remove spurious WARN() during zoneinfo print

After commit e2ecc8a79ed4 ("mm, vmstat: print non-populated zones in
zoneinfo"), /proc/zoneinfo will show unpopulated zones.

A memoryless node, having no populated zones at all, was previously
ignored, but will now trigger the WARN() in is_zone_first_populated().

Remove this warning, as its only purpose was to warn of a situation that
has since been enabled.

Aside: The "per-node stats" are still printed under the first populated
zone, but that's not necessarily the first stanza any more. I'm not
sure which criteria is more important with regard to not breaking
parsers, but it looks a little weird to the eye.

Fixes: e2ecc8a79ed4 ("mm, vmstat: print node-based stats in zoneinfo file")
Link: http://lkml.kernel.org/r/1493854905-10918-1-git-send-email-arbab@linux.vnet.ibm.com
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

time: delete current_fs_time()

All uses of the current_fs_time() function have been replaced by other
time interfaces.

And, its use cases can be fulfilled by current_time() or ktime_get_*
variants.

Link: http://lkml.kernel.org/r/1491613030-11599-13-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hwpoison, memcg: forcibly uncharge LRU pages

Laurent Dufour has noticed that hwpoinsoned pages are kept charged.  In
his particular case he has hit a bad_page("page still charged to
cgroup") when onlining a hwpoison page.  While this looks like something
that shouldn't happen in the first place because onlining hwpages and
returning them to the page allocator makes only little sense it shows a
real problem.

hwpoison pages do not get freed usually so we do not uncharge them (at
least not since commit 0a31bc97c80c ("mm: memcontrol: rewrite uncharge
API")).  Each charge pins memcg (since e8ea14cc6ead ("mm: memcontrol:
take a css reference for each charged page")) as well and so the
mem_cgroup and the associated state will never go away.  Fix this leak
by forcibly uncharging a LRU hwpoisoned page in delete_from_lru_cache().
We also have to tweak uncharge_list because it cannot rely on zero ref
count for these pages.

[akpm@linux-foundation.org: coding-style fixes]
Fixes: 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")
Link: http://lkml.kernel.org/r/20170502185507.GB19165@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:
"Incremental fixes and a small feature addition on top of the main
  libnvdimm 4.12 pull request:

   - Geert noticed that tinyconfig was bloated by BLOCK selecting DAX.
     The size regression is fixed by moving all dax helpers into the
     dax-core and only specifying "select DAX" for FS_DAX and
     dax-capable drivers. He also asked for clarification of the
     NR_DEV_DAX config option which, on closer look, does not need to be
     a config option at all. Mike also throws in a DEV_DAX_PMEM fixup
     for good measure.

   - Ben's attention to detail on -stable patch submissions caught a
     case where the recent fixes to arch_copy_from_iter_pmem() missed a
     condition where we strand dirty data in the cache. This is tagged
     for -stable and will also be included in the rework of the pmem api
     to a proposed {memcpy,copy_user}_flushcache() interface for 4.13.

   - Vishal adds a feature that missed the initial pull due to pending
     review feedback. It allows the kernel to clear media errors when
     initializing a BTT (atomic sector update driver) instance on a pmem
     namespace.

   - Ross noticed that the dax_device + dax_operations conversion broke
     __dax_zero_page_range(). The nvdimm unit tests fail to check this
     path, but xfstests immediately trips over it. No excuse for missing
     this before submitting the 4.12 pull request.

  These all pass the nvdimm unit tests and an xfstests spot check. The
  set has received a build success notification from the kbuild robot"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  filesystem-dax: fix broken __dax_zero_page_range() conversion
  libnvdimm, btt: ensure that initializing metadata clears poison
  libnvdimm: add an atomic vs process context flag to rw_bytes
  x86, pmem: Fix cache flushing for iovec write < 8 bytes
  device-dax: kill NR_DEV_DAX
  block, dax: move "select DAX" from BLOCK to FS_DAX
  device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX

Merge tag 'sound-fix-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"This contains a one-liner change that has a significant impact:
  disabling the build of OSS. It's been unmaintained for long time, and
  we'd like to drop the stuff. Finally, as the first step, stop the
  build. Let's see whether it works without much complaints.

  Other than that, there are two small fixes for HD-audio"

* tag 'sound-fix-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  sound: Disable the build of OSS drivers
  ALSA: hda: Fix cpu lockup when stopping the cmd dmas
  ALSA: hda - Add mute led support for HP EliteBook 840 G3

Merge tag 'for-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply

Pull more power-supply updates from Sebastian Reichel:
"The power-supply subsystem has a few more changes for the v4.12 merge
  window:

   - New battery driver for AXP20X and AXP22X PMICs

   - Improve max17042_battery for usage on x86

   - Misc small cleanups & fixes"

* tag 'for-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (34 commits)
  power: supply: cpcap-charger: Keep trickle charger bits disabled
  power: supply: cpcap-charger: Fix enable for 3.8V charge setting
  power: supply: cpcap-charger: Fix charge voltage configuration
  power: supply: cpcap-charger: Fix charger name
  power: supply: twl4030-charger: make twl4030_bci_property_is_writeable static
  power: supply: sbs-battery: Add alert callback
  mailmap: add Sebastian Reichel
  power: supply: avoid unused twl4030-madc.h
  power: supply: sbs-battery: Correct supply status with current draw
  power: supply: sbs-battery: Don't ignore the first external power change
  power: supply: pda_power: move from timer to delayed_work
  power: supply: max17042_battery: Add support for the SCOPE property
  power: supply: max17042_battery: Add support for the CHARGE_NOW property
  power: supply: max17042_battery: Add support for the CHARGE_FULL_DESIGN property
  power: supply: max17042_battery: mAh readings depend on r_sns value
  power: supply: max17042_battery: Add support for the VOLT_MIN property
  power: supply: max17042_battery: Add support for the TECHNOLOGY attribute
  power: supply: max17042_battery: Add external_power_changed callback
  power: supply: max17042_battery: Add support for the STATUS property
  power: supply: max17042_battery: Add default platform_data fallback data
  ...

Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux

Pull thermal management updates from Zhang Rui:

- Fix a problem where orderly_shutdown() is called for multiple times
   due to multiple critical overheating events raised in a short period
   by platform thermal driver. (Keerthy)

- Introduce a backup thermal shutdown mechanism, which invokes
   kernel_power_off()/emergency_restart() directly, after
   orderly_shutdown() being issued for certain amount of time(specified
   via Kconfig). This is useful in certain conditions that userspace may
   be unable to power off the system in a clean manner and leaves the
   system in a critical state, like in the middle of driver probing
   phase. (Keerthy)

- Introduce a new interface in thermal devfreq_cooling code so that the
   driver can provide more precise data regarding actual power to the
   thermal governor every time the power budget is calculated. (Lukasz
   Luba)

- Introduce BCM 2835 soc thermal driver and northstar thermal driver,
   within a new sub-folder. (Rafał Miłecki)

- Introduce DA9062/61 thermal driver. (Steve Twiss)

- Remove non-DT booting on TI-SoC driver. Also add support to fetching
   coefficients from DT. (Keerthy)

- Refactorf RCAR Gen3 thermal driver. (Niklas Söderlund)

- Small fix on MTK and intel-soc-dts thermal driver. (Dawei Chien,
   Brian Bian)

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (25 commits)
  thermal: core: Add a back up thermal shutdown mechanism
  thermal: core: Allow orderly_poweroff to be called only once
  Thermal: Intel SoC DTS: Change interrupt request behavior
  trace: thermal: add another parameter 'power' to the tracing function
  thermal: devfreq_cooling: add new interface for direct power read
  thermal: devfreq_cooling: refactor code and add get_voltage function
  thermal: mt8173: minor mtk_thermal.c cleanups
  thermal: bcm2835: move to the broadcom subdirectory
  thermal: broadcom: ns: specify myself as MODULE_AUTHOR
  thermal: da9062/61: Thermal junction temperature monitoring driver
  Documentation: devicetree: thermal: da9062/61 TJUNC temperature binding
  thermal: broadcom: add Northstar thermal driver
  dt-bindings: thermal: add support for Broadcom's Northstar thermal
  thermal: bcm2835: add thermal driver for bcm2835 SoC
  dt-bindings: Add thermal zone to bcm2835-thermal example
  thermal: rcar_gen3_thermal: add suspend and resume support
  thermal: rcar_gen3_thermal: store device match data in private structure
  thermal: rcar_gen3_thermal: enable hardware interrupts for trip points
  thermal: rcar_gen3_thermal: record and check number of TSCs found
  thermal: rcar_gen3_thermal: check that TSC exists before memory allocation
  ...

Merge tag 'drm-fixes-for-v4.12-rc1' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"AMD, nouveau, one i915, and one EDID fix for v4.12-rc1

  Some fixes that it would be good to have in rc1. It contains the i915
  quiet fix that you reported.

  It also has an amdgpu fixes pull, with lots of ongoing work on Vega10
  which is new in this kernel and is preliminary support so may have a
  fair bit of movement.

  Otherwise a few non-Vega10 AMD fixes, one EDID fix and some nouveau
  regression fixers"

* tag 'drm-fixes-for-v4.12-rc1' of git://people.freedesktop.org/~airlied/linux: (144 commits)
  drm/i915: Make vblank evade warnings optional
  drm/nouveau/therm: remove ineffective workarounds for alarm bugs
  drm/nouveau/tmr: avoid processing completed alarms when adding a new one
  drm/nouveau/tmr: fix corruption of the pending list when rescheduling an alarm
  drm/nouveau/tmr: handle races with hw when updating the next alarm time
  drm/nouveau/tmr: ack interrupt before processing alarms
  drm/nouveau/core: fix static checker warning
  drm/nouveau/fb/ram/gf100-: remove 0x10f200 read
  drm/nouveau/kms/nv50: skip core channel cursor update on position-only changes
  drm/nouveau/kms/nv50: fix source-rect-only plane updates
  drm/nouveau/kms/nv50: remove pointless argument to window atomic_check_acquire()
  drm/amd/powerplay: refine pwm1_enable callback functions for CI.
  drm/amd/powerplay: refine pwm1_enable callback functions for vi.
  drm/amd/powerplay: refine pwm1_enable callback functions for Vega10.
  drm/amdgpu: refine amdgpu pwm1_enable sysfs interface.
  drm/amdgpu: add amd fan ctrl mode enums.
  drm/amd/powerplay: add more smu message on Vega10.
  drm/amdgpu: fix dependency issue
  drm/amd: fix init order of sched job
  drm/amdgpu: add some additional vega10 pci ids
  ...

Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending

Pull SCSI target updates from Nicholas Bellinger:
"Things were a lot more calm than previously expected. It's primarily
  fixes in various areas, with most of the new functionality centering
  around TCMU backend driver work that Xiubo Li has been driving.

  Here's the summary on the feature side:

   - Make T10-PI verify configurable for emulated (FILEIO + RD) backends
    (Dmitry Monakhov)
   - Allow target-core/TCMU pass-through to use in-kernel SPC-PR logic
    (Bryant Ly + MNC)
   - Add TCMU support for growing ring buffer size (Xiubo Li + MNC)
   - Add TCMU support for global block data pool (Xiubo Li + MNC)

  and on the bug-fix side:

   - Fix COMPARE_AND_WRITE non GOOD status handling for READ phase
    failures (Gary Guo + nab)
   - Fix iscsi-target hang with explicitly changing per NodeACL
    CmdSN number depth with concurrent login driven session
    reinstatement.  (Gary Guo + nab)
   - Fix ibmvscsis fabric driver ABORT task handling (Bryant Ly)
   - Fix target-core/FILEIO zero length handling (Bart Van Assche)

  Also, there was an OOPs introduced with the WRITE_VERIFY changes that
  I ended up reverting at the last minute, because as not unusual Bart
  and I could not agree on the fix in time for -rc1. Since it's specific
  to a conformance test, it's been reverted for now.

  There is a separate patch in the queue to address the underlying
  control CDB write overflow regression in >= v4.3 separate from the
  WRITE_VERIFY revert here, that will be pushed post -rc1"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (30 commits)
  Revert "target: Fix VERIFY and WRITE VERIFY command parsing"
  IB/srpt: Avoid that aborting a command triggers a kernel warning
  IB/srpt: Fix abort handling
  target/fileio: Fix zero-length READ and WRITE handling
  ibmvscsis: Do not send aborted task response
  tcmu: fix module removal due to stuck thread
  target: Don't force session reset if queue_depth does not change
  iscsi-target: Set session_fall_back_to_erl0 when forcing reinstatement
  target: Fix compare_and_write_callback handling for non GOOD status
  tcmu: Recalculate the tcmu_cmd size to save cmd area memories
  tcmu: Add global data block pool support
  tcmu: Add dynamic growing data area feature support
  target: fixup error message in target_tg_pt_gp_tg_pt_gp_id_store()
  target: fixup error message in target_tg_pt_gp_alua_access_type_store()
  target/user: PGR Support
  target: Add WRITE_VERIFY_16
  Documentation/target: add an example script to configure an iSCSI target
  target: Use kmalloc_array() in transport_kmap_data_sg()
  target: Use kmalloc_array() in compare_and_write_callback()
  target: Improve size determinations in two functions
  ...

Merge branch 'work.sane_pwd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull misc vfs updates from Al Viro:
"Making sure that something like a referral point won't end up as pwd
  or root.

  The main part is the last commit (fixing mntns_install()); that one
  fixes a hard-to-hit race. The fchdir() commit is making fchdir(2) a
  bit more robust - it should be impossible to get opened files (even
  O_PATH ones) for referral points in the first place, so the existing
  checks are OK, but checking the same thing as in chdir(2) is just as
  cheap.

  The path_init() commit removes a redundant check that shouldn't have
  been there in the first place"

* 'work.sane_pwd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  make sure that mntns_install() doesn't end up with referral for root
  path_init(): don't bother with checking MAY_EXEC for LOOKUP_ROOT
  make sure that fchdir() won't accept referral points, etc.

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates/fixes from Ingo Molnar:
"Mostly tooling updates, but also two kernel fixes: a call chain
  handling robustness fix and an x86 PMU driver event definition fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/callchain: Force USER_DS when invoking perf_callchain_user()
  tools build: Fixup sched_getcpu feature test
  perf tests kmod-path: Don't fail if compressed modules aren't supported
  perf annotate: Fix AArch64 comment char
  perf tools: Fix spelling mistakes
  perf/x86: Fix Broadwell-EP DRAM RAPL events
  perf config: Refactor a duplicated code for obtaining config file name
  perf symbols: Allow user probes on versioned symbols
  perf symbols: Accept symbols starting at address 0
  tools lib string: Adopt prefixcmp() from perf and subcmd
  perf units: Move parse_tag_value() to units.[ch]
  perf ui gtk: Move gtk .so name to the only place where it is used
  perf tools: Move HAS_BOOL define to where perl headers are used
  perf memswap: Split the byteswap memory range wrappers from util.[ch]
  perf tools: Move event prototypes from util.h to event.h
  perf buildid: Move prototypes from util.h to build-id.h

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
"A single ARM Juno clocksource driver fix"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/arm_arch_timer: Fix arch_timer_mem_find_best_frame()

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull stackprotector fixlet from Ingo Molnar:
"A single fix/enhancement to increase stackprotector canary randomness
on 64-bit kernels with very little cost"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"Misc fixes:

   - two boot crash fixes
   - unwinder fixes
   - kexec related kernel direct mappings enhancements/fixes
   - more Clang support quirks
   - minor cleanups
   - Documentation fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel_rdt: Fix a typo in Documentation
  x86/build: Don't add -maccumulate-outgoing-args w/o compiler support
  x86/boot/32: Fix UP boot on Quark and possibly other platforms
  x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init()
  x86/kexec/64: Use gbpages for identity mappings if available
  x86/mm: Add support for gbpages to kernel_ident_mapping_init()
  x86/boot: Declare error() as noreturn
  x86/mm/kaslr: Use the _ASM_MUL macro for multiplication to work around Clang incompatibility
  x86/mm: Fix boot crash caused by incorrect loop count calculation in sync_global_pgds()
  x86/asm: Don't use RBP as a temporary register in csum_partial_copy_generic()
  x86/microcode/AMD: Remove redundant NULL check on mc

Merge tag 'for-linus-4.12b-rc0c-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
"This contains two fixes for booting under Xen introduced during this
  merge window and two fixes for older problems, where one is just much
  more probable due to another merge window change"

* tag 'for-linus-4.12b-rc0c-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: adjust early dom0 p2m handling to xen hypervisor behavior
  x86/amd: don't set X86_BUG_SYSRET_SS_ATTRS when running under Xen
  xen/x86: Do not call xen_init_time_ops() until shared_info is initialized
  x86/xen: fix xsave capability setting

Merge tag 'powerpc-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull more powerpc updates from Michael Ellerman:
"The change to the Linux page table geometry was delayed for more
  testing with 16G pages, and there's the new CPU features stuff which
  just needed one more polish before going in. Plus a few changes from
  Scott which came in a bit late. And then various fixes, mostly minor.

  Summary highlights:

   - rework the Linux page table geometry to lower memory usage on
     64-bit Book3S (IBM chips) using the Hash MMU.

   - support for a new device tree binding for discovering CPU features
     on future firmwares.

   - Freescale updates from Scott:
      "Includes a fix for a powerpc/next mm regression on 64e, a fix for
       a kernel hang on 64e when using a debugger inside a relocated
       kernel, a qman fix, and misc qe improvements."

  Thanks to: Christophe Leroy, Gavin Shan, Horia Geantă, LiuHailong,
  Nicholas Piggin, Roy Pledge, Scott Wood, Valentin Longchamp"

* tag 'powerpc-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Support new device tree binding for discovering CPU features
  powerpc: Don't print cpu_spec->cpu_name if it's NULL
  of/fdt: introduce of_scan_flat_dt_subnodes and of_get_flat_dt_phandle
  powerpc/64s: Fix unnecessary machine check handler relocation branch
  powerpc/mm/book3s/64: Rework page table geometry for lower memory usage
  powerpc: Fix distclean with Makefile.postlink
  powerpc/64e: Don't place the stack beyond TASK_SIZE
  powerpc/powernv: Block PCI config access on BCM5718 during EEH recovery
  powerpc/8xx: Adding support of IRQ in MPC8xx GPIO
  soc/fsl/qbman: Disable IRQs for deferred QBMan work
  soc/fsl/qe: add EXPORT_SYMBOL for the 2 qe_tdm functions
  soc/fsl/qe: only apply QE_General4 workaround on affected SoCs
  soc/fsl/qe: round brg_freq to 1kHz granularity
  soc/fsl/qe: get rid of immrbar_virt_to_phys()
  net: ethernet: ucc_geth: fix MEM_PART_MURAM mode
  powerpc/64e: Fix hang when debugging programs with relocated kernel

Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

Pull MIPS updates from James Hogan:
"math-emu:
   - Add missing clearing of BLTZALL and BGEZALL emulation counters
   - Fix BC1EQZ and BC1NEZ condition handling
   - Fix BLEZL and BGTZL identification

  BPF:
   - Add JIT support for SKF_AD_HATYPE
   - Use unsigned access for unsigned SKB fields
   - Quit clobbering callee saved registers in JIT code
   - Fix multiple problems in JIT skb access helpers

  Loongson 3:
   - Select MIPS_L1_CACHE_SHIFT_6

  Octeon:
   - Remove vestiges of CONFIG_CAVIUM_OCTEON_2ND_KERNEL
   - Remove unused L2C types and macros.
   - Remove unused SLI types and macros.
   - Fix compile error when USB is not enabled.
   - Octeon: Remove unused PCIERCX types and macros.
   - Octeon: Clean up platform code.

  SNI:
   - Remove recursive include of cpu-feature-overrides.h

  Sibyte:
   - Export symbol periph_rev to sb1250-mac network driver.
   - Fix Kconfig warning.

  Generic platform:
   - Enable Root FS on NFS in generic_defconfig

  SMP-MT:
   - Use CPU interrupt controller IPI IRQ domain support

  UASM:
   - Add support for LHU for uasm.
   - Remove needless ISA abstraction

  mm:
   - Add 48-bit VA space and 4-level page tables for 4K pages.

  PCI:
   - Add controllers before the specified head

  irqchip driver for MIPS CPU:
   - Replace magic 0x100 with IE_SW0
   - Prepare for non-legacy IRQ domains
   - Introduce IPI IRQ domain support

  MAINTAINERS:
   - Update email-id of Rahul Bedarkar

  NET:
   - sb1250-mac: Add missing MODULE_LICENSE()

  CPUFREQ:
   - Loongson2: drop set_cpus_allowed_ptr()

  Misc:
   - Disable Werror when W= is set
   - Opt into HAVE_COPY_THREAD_TLS
   - Enable GENERIC_CPU_AUTOPROBE
   - Use common outgoing-CPU-notification code
   - Remove dead define of ST_OFF
   - Remove CONFIG_ARCH_HAS_ILOG2_U{32,64}
   - Stengthen IPI IRQ domain sanity check
   - Remove confusing else statement in __do_page_fault()
   - Don't unnecessarily include kmalloc.h into <asm/cache.h>.
   - Delete unused definition of SMP_CACHE_SHIFT.
   - Delete redundant definition of SMP_CACHE_BYTES"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (39 commits)
  MIPS: Sibyte: Fix Kconfig warning.
  MIPS: Sibyte: Export symbol periph_rev to sb1250-mac network driver.
  NET: sb1250-mac: Add missing MODULE_LICENSE()
  MAINTAINERS: Update email-id of Rahul Bedarkar
  MIPS: Remove confusing else statement in __do_page_fault()
  MIPS: Stengthen IPI IRQ domain sanity check
  MIPS: smp-mt: Use CPU interrupt controller IPI IRQ domain support
  irqchip: mips-cpu: Introduce IPI IRQ domain support
  irqchip: mips-cpu: Prepare for non-legacy IRQ domains
  irqchip: mips-cpu: Replace magic 0x100 with IE_SW0
  MIPS: Remove CONFIG_ARCH_HAS_ILOG2_U{32,64}
  MIPS: generic: Enable Root FS on NFS in generic_defconfig
  MIPS: mach-rm: Remove recursive include of cpu-feature-overrides.h
  MIPS: Opt into HAVE_COPY_THREAD_TLS
  CPUFREQ: Loongson2: drop set_cpus_allowed_ptr()
  MIPS: uasm: Remove needless ISA abstraction
  MIPS: Remove dead define of ST_OFF
  MIPS: Use common outgoing-CPU-notification code
  MIPS: math-emu: Fix BC1EQZ and BC1NEZ condition handling
  MIPS: r2-on-r6-emu: Clear BLTZALL and BGEZALL debugfs counters
  ...

Merge tag 'nios2-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2

Pull nios2 updates from Ley Foon Tan:
"nios2 fixes/enhancements and adding nios2 R2 support"

* tag 'nios2-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
  nios2: remove custom early console implementation
  nios2: use generic strncpy_from_user() and strnlen_user()
  nios2: Add CDX support
  nios2: Add BMX support
  nios2: Add NIOS2_ARCH_REVISION to select between R1 and R2
  nios2: implement flush_dcache_mmap_lock/unlock
  nios2: enable earlycon support
  nios2: constify irq_domain_ops
  nios2: remove wrapper header for cmpxchg.h
  nios2: add .gitignore entries for auto-generated files

vmxnet3: ensure that adapter is in proper state during force_close

There are several paths in vmxnet3, where settings changes cause the
adapter to be brought down and back up (vmxnet3_set_ringparam among
them).  Should part of the reset operation fail, these paths call
vmxnet3_force_close, which enables all napi instances prior to calling
dev_close (with the expectation that vmxnet3_close will then properly
disable them again).  However, vmxnet3_force_close neglects to clear
VMXNET3_STATE_BIT_QUIESCED prior to calling dev_close.  As a result
vmxnet3_quiesce_dev (called from vmxnet3_close), returns early, and
leaves all the napi instances in a enabled state while the device itself
is closed.  If a device in this state is activated again, napi_enable
will be called on already enabled napi_instances, leading to a BUG halt.

The fix is to simply enausre that the QUIESCED bit is cleared in
vmxnet3_force_close to allow quesence to be completed properly on close.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Shrikrishna Khare <skhare@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: revert changes to NIC revision numbers

The revision enum values (eg EFX_REV_HUNT_A0) form part of our API,
and are included in ethtool. If these are inconsistent then ethtool
will print garbage for a register dump (ethtool -d).

Fixes: 5a6681e22c14 ("sfc: separate out SFC4000 ("Falcon") support into new sfc-falcon driver")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ch9200: add missing USB-descriptor endianness conversions

Add the missing endianness conversions to a debug statement printing
the USB device-descriptor idVendor and idProduct fields during probe.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: irda: irda-usb: fix firmware name on big-endian hosts

Add missing endianness conversion when using the USB device-descriptor
bcdDevice field to construct a firmware file name.

Fixes: 8ef80aef118e ("[IRDA]: irda-usb.c: STIR421x cleanups")
Cc: stable <stable@vger.kernel.org> # 2.6.18
Cc: Nick Fedchik <nfedchik@atlantic-link.com.ua>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: add default case to switch

Add default case to switch in order to avoid any chance of using an
uninitialized variable _low_, in case s->type does not match any of
the listed case values.

Addresses-Coverity-ID: 1398130
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: fix src address selection if using secondary addresses for ipv6

Commit 0ca50d12fe46 ("sctp: fix src address selection if using secondary
addresses") has fixed a src address selection issue when using secondary
addresses for ipv4.

Now sctp ipv6 also has the similar issue. When using a secondary address,
sctp_v6_get_dst tries to choose the saddr which has the most same bits
with the daddr by sctp_v6_addr_match_len. It may make some cases not work
as expected.

hostA:
  [1] fd21:356b:459a:cf10::11 (eth1)
  [2] fd21:356b:459a:cf20::11 (eth2)

hostB:
  [a] fd21:356b:459a:cf30::2  (eth1)
  [b] fd21:356b:459a:cf40::2  (eth2)

route from hostA to hostB:
  fd21:356b:459a:cf30::/64 dev eth1  metric 1024  mtu 1500

The expected path should be:
  fd21:356b:459a:cf10::11 <-> fd21:356b:459a:cf30::2
But addr[2] matches addr[a] more bits than addr[1] does, according to
sctp_v6_addr_match_len. It causes the path to be:
  fd21:356b:459a:cf20::11 <-> fd21:356b:459a:cf30::2

This patch is to fix it with the same way as Marcelo's fix for sctp ipv4.
As no ip_dev_find for ipv6, this patch is to use ipv6_chk_addr to check
if the saddr is in a dev instead.

Note that for backwards compatibility, it will still do the addr_match_len
check here when no optimal is found.

Reported-by: Patrick Talbert <ptalbert@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sound: Disable the build of OSS drivers

OSS drivers are left as badly unmaintained, and now we're facing a
problem to clean up the hackish set_fs() usage in their codes.  Since
most of drivers have been covered by ALSA, and the others are dead old
and inactive, let's leave them RIP.

This patch is the first step: disable the build of OSS drivers.
We'll eventually drop the whole codes and clean up later.

Note that sound/oss/dmasound is still kept, since it's a completely
different implementation of OSS, and it doesn't suffer from set_fs()
hack.  Moreover, the build of ALSA is disabled on M68K by some reason,
thus disabling it shall result in a regression.  This one will be
disabled / removed once when we add the support in ALSA side.

Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

drm/i915: Make vblank evade warnings optional

Add a new Kconfig option to enable/disable the extra warnings
from the vblank evade code. For now we'll keep the warning
about an actually missed vblank always enabled as that can have
an actual user visible impact. But if we miss the deadline
othrwise there's no real need to bother the user with that.
We'll want these warnings enabled during development however
so that we can catch regressions.

Based on the reports it looks like this is still very easy
to hit on SKL, so we have more work ahead of us to optimize
the crtiical section further.

Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reported-by: Jens Axboe <axboe@kernel.dk>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: e1edbd44e23b ("drm/i915: Complain if we take too long under vblank evasion.")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge branch 'linux-4.12' of git://github.com/skeggsb/linux into drm-next

Quite a few patches, but not much code changed:
- Fixes regression from atomic when only the source rect of a plane
changes (ie. xrandr --right-of)
- Fixes another issue where atomic changed behaviour underneath us,
potentially causing laggy cursor position updates
- Fixes for a bunch of races in thermal code, which lead to random
lockups for a lot of users

* 'linux-4.12' of git://github.com/skeggsb/linux:
  drm/nouveau/therm: remove ineffective workarounds for alarm bugs
  drm/nouveau/tmr: avoid processing completed alarms when adding a new one
  drm/nouveau/tmr: fix corruption of the pending list when rescheduling an alarm
  drm/nouveau/tmr: handle races with hw when updating the next alarm time
  drm/nouveau/tmr: ack interrupt before processing alarms
  drm/nouveau/core: fix static checker warning
  drm/nouveau/fb/ram/gf100-: remove 0x10f200 read
  drm/nouveau/kms/nv50: skip core channel cursor update on position-only changes
  drm/nouveau/kms/nv50: fix source-rect-only plane updates
  drm/nouveau/kms/nv50: remove pointless argument to window atomic_check_acquire()

Merge branch 'drm-next-4.12' of git://people.freedesktop.org/~agd5f/linux into drm-next

Fixes for 4.12.  This is a bit bigger than usual since it's 3 weeks
worth of fixes and most of these changes are for vega10 which is
new for 4.12 and still in a fair amount of flux.  It looks like
you missed my last pull request, so those patches are included here
as well.  Highlights:
- Lots of vega10 fixes
- Fix interruptable wait mixup
- Fan control method fixes
- Misc display fixes for radeon and amdgpu
- Misc bug fixes

* 'drm-next-4.12' of git://people.freedesktop.org/~agd5f/linux: (132 commits)
  drm/amd/powerplay: refine pwm1_enable callback functions for CI.
  drm/amd/powerplay: refine pwm1_enable callback functions for vi.
  drm/amd/powerplay: refine pwm1_enable callback functions for Vega10.
  drm/amdgpu: refine amdgpu pwm1_enable sysfs interface.
  drm/amdgpu: add amd fan ctrl mode enums.
  drm/amd/powerplay: add more smu message on Vega10.
  drm/amdgpu: fix dependency issue
  drm/amd: fix init order of sched job
  drm/amdgpu: add some additional vega10 pci ids
  drm/amdgpu/soc15: use atomfirmware for setting bios scratch for reset
  drm/amdgpu/atomfirmware: add function to update engine hang status
  drm/radeon: only warn once in radeon_ttm_bo_destroy if va list not empty
  drm/amdgpu: fix mutex list null pointer reference
  drm/amd/powerplay: fix bug sclk/mclk level can't be set on vega10.
  drm/amd/powerplay: Setup sw CTF to allow graceful exit when temperature exceeds maximum.
  drm/amd/powerplay: delete dead code in powerplay.
  drm/amdgpu: Use less generic enum definitions
  drm/amdgpu/gfx9: derive tile pipes from golden settings
  drm/amdgpu/gfx: drop max_gs_waves_per_vgt
  drm/amd/powerplay: disable engine spread spectrum feature on Vega10.
  ...

Merge tag 'drm-misc-next-fixes-2017-05-05' of git://anongit.freedesktop.org/git/drm-misc into drm-next

Core Changes:
- Add quirk for LGD 764 panel to default 10bpc (Mario)

Cc: Mario Kleiner <mario.kleiner.de@gmail.com>
* tag 'drm-misc-next-fixes-2017-05-05' of git://anongit.freedesktop.org/git/drm-misc:
drm/edid: Add 10 bpc quirk for LGD 764 panel in HP zBook 17 G2

net: phy: Call bus->reset() after releasing PHYs from reset

The API convention makes it that a given MDIO bus reset should be able
to access PHY devices in its reset() callback and perform additional
MDIO accesses in order to bring the bus and PHYs in a working state.

Commit 69226896ad63 ("mdio_bus: Issue GPIO RESET to PHYs.") broke that
contract by first calling bus->reset() and then release all PHYs from
reset using their shared GPIO line, so restore the expected
functionality here.

Fixes: 69226896ad63 ("mdio_bus: Issue GPIO RESET to PHYs.")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Handle multiple variable additions into packet pointers in verifier.

We must accumulate into reg->aux_off rather than use a plain assignment.

Add a test for this situation to test_align.

Reported-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: make macro tipc_wait_for_cond() smp safe

The macro tipc_wait_for_cond() is embedding the macro sk_wait_event()
to fulfil its task. The latter, in turn, is evaluating the stated
condition outside the socket lock context. This is problematic if
the condition is accessing non-trivial data structures which may be
altered by incoming interrupts, as is the case with the cong_links()
linked list, used by socket to keep track of the current set of
congested links. We sometimes see crashes when this list is accessed
by a condition function at the same time as a SOCK_WAKEUP interrupt
is removing an element from the list.

We fix this by expanding selected parts of sk_wait_event() into the
outer macro, while ensuring that all evaluations of a given condition
are performed under socket lock protection.

Fixes: commit 365ad353c256 ("tipc: reduce risk of user starvation during link congestion")
Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

samples/bpf: run cleanup routines when receiving SIGTERM

Shahid Habib noticed that when xdp1 was killed from a different console the xdp
program was not cleaned-up properly in the kernel and it continued to forward
traffic.

Most of the applications in samples/bpf cleanup properly, but only when getting
SIGINT. Since kill defaults to using SIGTERM, add support to cleanup when the
application receives either SIGINT or SIGTERM.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Reported-by: Shahid Habib <shahid.habib@broadcom.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: aquantia: remove redundant checks on error status

The error status err is initialized as zero and then being checked
several times to see if it is less than zero even when it has not
been updated. It may seem that the err should be assigned to the
return code of the call to the various *offload_en_set calls and
then we check for failure, however, these functions are void and
never actually return any status.

Since these error checks are redundant we can remove these
as well as err and the error exit label err_exit.

Detected by CoverityScan, CID#1398313 and CID#1398306 ("Logically
dead code")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Acked-by: Pavel Belous <pavel.belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Remove commented out debugging hack in test_align.

Reported-by: Alexander Alemayhu <alexander@alemayhu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qlcnic-fixes'

Manish Chopra says:

====================
qlcnic: Bug fix and update version

This series has one fix and bumps up driver version.
Please consider applying to "net"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Update version to 5.3.66

Bumping up the version as couple of fixes added after 5.3.65

Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Fix link configuration with autoneg disabled

Currently driver returns error on speed configurations
for 83xx adapter's non XGBE ports, due to this link doesn't
come up on the ports using 1000Base-T as a connector with
autoneg disabled. This patch fixes this with initializing
appropriate port type based on queried module/connector
types from hardware before any speed/autoneg configuration.

Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netfront: avoid crashing on resume after a failure in talk_to_netback()

Unavoidable crashes in netfront_resume() and netback_changed() after a
previous fail in talk_to_netback() (e.g. when we fail to read MAC from
xenstore) were discovered. The failure path in talk_to_netback() does
unregister/free for netdev but we don't reset drvdata and we try accessing
it after resume.

Fix the bug by removing the whole xen device completely with
device_unregister(), this guarantees we won't have any calls into netfront
after a failure.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: optimize class dumps

In commit 59cc1f61f09c ("net: sched: convert qdisc linked list to
hashtable") we missed the opportunity to considerably speed up
tc_dump_tclass_root() if a qdisc handle is provided by user.

Instead of iterating all the qdiscs, use qdisc_match_from_root()
to directly get the one we look for.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: avoid fragmenting peculiar skbs in SACK

This patch fixes a bug in splitting an SKB during SACK
processing. Specifically if an skb contains multiple
packets and is only partially sacked in the higher sequences,
tcp_match_sack_to_skb() splits the skb and marks the second fragment
as SACKed.

The current code further attempts rounding up the first fragment
to MSS boundaries. But it misses a boundary condition when the
rounded-up fragment size (pkt_len) is exactly skb size. Spliting
such an skb is pointless and causses a kernel warning and aborts
the SACK processing. This patch universally checks such over-split
before calling tcp_fragment to prevent these unnecessary warnings.

Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netem: fix skb_orphan_partial()

I should have known that lowering skb->truesize was dangerous :/

In case packets are not leaving the host via a standard Ethernet device,
but looped back to local sockets, bad things can happen, as reported
by Michael Madsen ( https://bugzilla.kernel.org/show_bug.cgi?id=195713 )

So instead of tweaking skb->truesize, lets change skb->destructor
and keep a reference on the owner socket via its sk_refcnt.

Fixes: f2f872f9272a ("netem: Introduce skb_orphan_partial() helper")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Michael Madsen <mkm@nabto.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'generic-xdp-followups'

Daniel Borkmann says:

====================
Two generic xdp related follow-ups

Two follow-ups for the generic XDP API, would be great if
both could still be considered, since the XDP API is not
frozen yet. For details please see individual patches.

v1 -> v2:
  - Implemented feedback from Jakub Kicinski (reusing
    attribute on dump), thanks!
  - Rest as is.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

xdp: refine xdp api with regards to generic xdp

While working on the iproute2 generic XDP frontend, I noticed that
as of right now it's possible to have native *and* generic XDP
programs loaded both at the same time for the case when a driver
supports native XDP.

The intended model for generic XDP from b5cdae3291f7 ("net: Generic
XDP") is, however, that only one out of the two can be present at
once which is also indicated as such in the XDP netlink dump part.
The main rationale for generic XDP is to ease accessibility (in
case a driver does not yet have XDP support) and to generically
provide a semantical model as an example for driver developers
wanting to add XDP support. The generic XDP option for an XDP
aware driver can still be useful for comparing and testing both
implementations.

However, it is not intended to have a second XDP processing stage
or layer with exactly the same functionality of the first native
stage. Only reason could be to have a partial fallback for future
XDP features that are not supported yet in the native implementation
and we probably also shouldn't strive for such fallback and instead
encourage native feature support in the first place. Given there's
currently no such fallback issue or use case, lets not go there yet
if we don't need to.

Therefore, change semantics for loading XDP and bail out if the
user tries to load a generic XDP program when a native one is
present and vice versa. Another alternative to bailing out would
be to handle the transition from one flavor to another gracefully,
but that would require to bring the device down, exchange both
types of programs, and bring it up again in order to avoid a tiny
window where a packet could hit both hooks. Given this complicates
the logic for just a debugging feature in the native case, I went
with the simpler variant.

For the dump, remove IFLA_XDP_FLAGS that was added with b5cdae3291f7
and reuse IFLA_XDP_ATTACHED for indicating the mode. Dumping all
or just a subset of flags that were used for loading the XDP prog
is suboptimal in the long run since not all flags are useful for
dumping and if we start to reuse the same flag definitions for
load and dump, then we'll waste bit space. What we really just
want is to dump the mode for now.

Current IFLA_XDP_ATTACHED semantics are: nothing was installed (0),
a program is running at the native driver layer (1). Thus, add a
mode that says that a program is running at generic XDP layer (2).
Applications will handle this fine in that older binaries will
just indicate that something is attached at XDP layer, effectively
this is similar to IFLA_XDP_FLAGS attr that we would have had
modulo the redundancy.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

xdp: add flag to enforce driver mode

After commit b5cdae3291f7 ("net: Generic XDP") we automatically fall
back to a generic XDP variant if the driver does not support native
XDP. Allow for an option where the user can specify that always the
native XDP variant should be selected and in case it's not supported
by a driver, just bail out.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Input: cros_ec_keyb - remove extraneous 'const'

gcc-7 warns about 'const SIMPLE_DEV_PM_OPS', as that macro already contains
a 'const' keyword:

drivers/input/keyboard/cros_ec_keyb.c:663:14: error: duplicate 'const' declaration specifier [-Werror=duplicate-decl-specifier]
static const SIMPLE_DEV_PM_OPS(cros_ec_keyb_pm_ops, NULL, cros_ec_keyb_resume);

This removes the extra one.

Fixes: 6af6dc2d2aa6 ("input: Add ChromeOS EC keyboard driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

drm/nouveau/therm: remove ineffective workarounds for alarm bugs

These were ineffective due to touching the list without the alarm lock,
but should no longer be required.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org

drm/nouveau/tmr: avoid processing completed alarms when adding a new one

The idea here was to avoid having to "manually" program the HW if there's
a new earliest alarm. This was lazy and bad, as it leads to loads of fun
races between inter-related callers (ie. therm).

Turns out, it's not so difficult after all. Go figure ;)

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org

drm/nouveau/tmr: fix corruption of the pending list when rescheduling an alarm

At least therm/fantog "attempts" to work around this issue, which could
lead to corruption of the pending alarm list.

Fix it properly by not updating the timestamp without the lock held, or
trying to add an already pending alarm to the pending alarm list....

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org

drm/nouveau/tmr: handle races with hw when updating the next alarm time

If the time to the next alarm is short enough, we could race with HW and
end up with an ~4 second delay until it triggers.

Fix this by checking again after we update HW.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org

drm/nouveau/tmr: ack interrupt before processing alarms

Fixes a race where we can miss an alarm that triggers while we're already
processing previous alarms.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org

drm/nouveau/core: fix static checker warning

object->engine cannot be NULL, it's either valid, or an error pointer.

This particular condition shouldn't actually be possible, but just in
case, we'll keep it.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/fb/ram/gf100-: remove 0x10f200 read

This reg has moved on Pascal, and causes a bus fault.

We never use the value anyway, so just remove the read.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>