git.proxmox.com Git - mirror_ubuntu-jammy-kernel.git/log

staging: lustre: recovery: don't skip open replay on reconnect

Once reconnect happened during replay, we'd continue the open
replay with the last failed replay, but not the next.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6802
Reviewed-on: http://review.whamcloud.com/15871
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: ldlm: reclaim granted locks defensively

It was discovered that to many ldlm locks where being
created on the server side to the point of memory
exhaustion. The work of LU-6529 introduced watermarks
to avoid this memory exhaustion. This is the client
side part of this work for the upstream client.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6529
Reviewed-on: http://review.whamcloud.com/14931
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6929
Reviewed-on: http://review.whamcloud.com/15813
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: mdt: disable IMA support

For IMA (Integrity Measurement Architecture), there are two xattr
"security.ima" and "security.evm" to protect the file to be modified
accidentally or maliciously, the two xattr are not compatible with
VBR, then disable it to workaround the problem currently and enable
it when the conditions are ready.

Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6455
Reviewed-on: http://review.whamcloud.com/14928
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: ptlrpc: replay bulk request

Even though the server might already got the bulk
replay request, but bulk transfer timeout, let's
replay the bulk request, i.e. treat such replay as
same as no replied replay request (See
ptlrpc_replay_interpret()).

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6924
Reviewed-on: http://review.whamcloud.com/15793
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: mdc: deactive MDT permanently

Add active proc entry for MDC, and mark MDC to
be inactive once the MDC is deactivated.

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6586
Reviewed-on: http://review.whamcloud.com/14747
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: obdecho: don't copy lu_site

While creating an echo device, echo_device_alloc() copies the lu_site
from MD stack, such kind of copy result in uninitialized mutex and
other potential issues.

Instead of copying the lu_site, we'd use the lu_site by pointer directly.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6765
Reviewed-on: http://review.whamcloud.com/15657
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: ptlrpc: Do not resend req with allow_replay

If the request is allowed to be sent during recovery,
and it is not timeout yet, then we do not need to
resend it in the final stage of recovery.

Unnecessary resend will cause the bulk request to resend the
request with different mbit, but same xid, and on the remote
server side, it will drop such request with same xid, so it
will never get the bulk in this case.

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6780
Reviewed-on: http://review.whamcloud.com/15458
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: report back to user bad stripe count

If the user is requesting a stripe count larger than what
is supported in the file system then report back to the
user an error as well as what is the largest possible
striping.

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6602
Reviewed-on: http://review.whamcloud.com/15162
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: basic support of SELinux in CLIO

Bring the ability to properly initiate security context
on SELinux-enabled client and store it on server side via
extended attribute.

Security context initialization is not atomic, but that would
require a wire protocol change to send security label in the
creation request.

Filter out security.selinux from xattr cache as it is
already cached in system slab.

Signed-off-by: Sebastien Buisson <sebastien.buisson@bull.net>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5560
Reviewed-on: http://review.whamcloud.com/11648
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: ptlrpc: embed highest XID in each request

Atomically assign XIDs and put request and sending list so
we can learn the lowest unreplied XID at any point.

This allows to embed in every resquests the highest XID for
which a reply has been received and does not have an unreplied
lower-numbered XID.

This will be used by the MDT target to release in-memory
reply data corresponding to XIDs of reply received by the client.

Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Signed-off-by: Gregoire Pichon <gregoire.pichon@bull.net>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5319
Reviewed-on: http://review.whamcloud.com/14793
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: dne: setdirstripe should fail if not supported

If MDS doesn't support striped directory, creating striped directory
from a userland utility such as 'lfs setdirstripe' should fail.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6661
Reviewed-on: http://review.whamcloud.com/15123
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: clio: update file attributes after sync

This really only affects clients that are attached to lustre
server running ZFS, because zfs does not update # of blocks
until the blocks are flushed to disk.

This patch update file's blocks attribute after OST_SYNC completes.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4389
Reviewed-on: http://review.whamcloud.com/12915
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: lov: remove LSM from struct lustre_md

In struct lustre_md replace the struct lov_stripe_md *lsm member with
an opaque struct lu_buf layout which holds the layout metadata
returned by the server. Refactor the LOV object initialization and
layout change code to accommodate this. Simplify lov_unpackmd() and
supporting functions according to the reduced number of use cases.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5814
Reviewed-on: http://review.whamcloud.com/13722
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: ptlrpc: Introduce iovec to bulk descriptor

Added a union to ptlrpc_bulk_desc for KVEC and KIOV buffers.
bd_type has been changed to be a bit mask. Bits are set in
bd_type to specify {put,get}{source,sink}{kvec,kiov}
changed all instances in the code to access the union properly
ASSUMPTION: all current code only works with KIOV and new DNE code
to be added will introduce a different code path that uses IOVEC

As part of the implementation buffer operations are added
as callbacks to be defined by users of ptlrpc. This enables
buffer users to define how to allocate and release buffers.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5835
Reviewed-on: http://review.whamcloud.com/12525
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: lmv: allow cross-MDT rename and link

Remove checks for cross-MDT operation, so cross-MDT
rename and link will be allowed.

Remove obsolete locality parameters in MDT lock API after all of
cross-MDT operations are allowed.

This is the client side changed only so the upstream
client can support this.

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3537
Reviewed-on: http://review.whamcloud.com/12282
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: obdclass: variable llog chunk size

Do not use fix LLOG_CHUNK_SIZE (8192 bytes), and
it will get the llog_chunk_size from llog_log_hdr.
Accordingly llog header will be variable too, so
we can enlarge the bitmap in the header, then
have more records in each llog file.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6602
Reviewed-on: http://review.whamcloud.com/14883
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: restart short read/write for normal IO

If normal IO got short read/write, we'd restart the IO from where
we've accomplished until we meet EOF or error happens.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6389
Reviewed-on: http://review.whamcloud.com/14123
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: llite: remove IS_ERR(master_inode) check

The kernel function ilookup5_nowait never returns
IS_ERR so we can remove the IS_ERR check in the
ll_md_blocking_ast() function.

Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8697
Reviewed-on: http://review.whamcloud.com/23151
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: lustre: osc: remove handling cl_avail_grant less than zero

Earlier cl_avail_grant was changed to an unsigned int. Juila
Lawall reported for the upstream client the following which
affects the Intel branch as well:

drivers/staging/lustre/lustre/osc/osc_request.c:1045:5-24: WARNING: Unsigned
expression compared with zero: cli -> cl_avail_grant < 0

Since cl_avail_grant can never be negative we can remove the
code handling the negative value case.

Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Intel-bug-id: LU-8697 https://jira.hpdd.intel.com/browse/LU-6303
Reviewed-on: http://review.whamcloud.com/23155
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge 4.9-rc3 into staging-next

This resolves a merge issue with
drivers/staging/iio/accel/sca3000_core.c and we want the fixes all in
here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Linux 4.9-rc3

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 bugfix from Thomas Gleixner:
"A single bugfix for the recent changes related to registering the boot
  cpu when this has not happened before prefill_possible_map().

  The main problem with this change got fixed already, but we missed the
  case where the local APIC is not yet mapped, when prefill_possible_map()
  is invoked, so the registration of the boot cpu which has the APIC bit
  set in CPUID will explode.

  I should have seen that issue earlier, but all I can do now is feeling
  embarassed"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/smpboot: Init apic mapping before usage

Merge tag 'upstream-4.9-rc3' of git://git.infradead.org/linux-ubifs

Pull ubi/ubifs fixes from Richard Weinberger:
"This contains fixes for issues in both UBI and UBIFS:

   - A regression wrt overlayfs, introduced in -rc2.
   - An UBI issue, found by Dan Carpenter's static checker"

* tag 'upstream-4.9-rc3' of git://git.infradead.org/linux-ubifs:
  ubifs: Fix regression in ubifs_readdir()
  ubi: fastmap: Fix add_vol() return value test in ubi_attach_fastmap()

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"We haven't seen a whole lot of fixes for the first two weeks since the
  merge window, but here is the batch that we have at the moment.

  Nothing sticks out as particularly bad or scary, it's mostly a handful
  of smaller fixes to several platforms. The Uniphier reset controller
  changes could probably have been delayed to 4.10, but they're not
  scary and just plumbing up driver changes that went in during the
  merge window.

  We're also adding another maintainer to Marvell Berlin platforms, to
  help out when Sebastian is too busy. Yay teamwork!"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: imx: mach-imx6q: Fix the PHY ID mask for AR8031
  ARM: dts: vf610: fix IRQ flag of global timer
  ARM: imx: gpc: Fix the imx_gpc_genpd_init() error path
  ARM: imx: gpc: Initialize all power domains
  arm64: dts: Updated NAND DT properties for NS2 SVK
  arm64: dts: uniphier: change MIO node to SD control node
  ARM: dts: uniphier: change MIO node to SD control node
  reset: uniphier: rename MIO reset to SD reset for Pro5, PXs2, LD20 SoCs
  arm64: uniphier: select ARCH_HAS_RESET_CONTROLLER
  ARM: uniphier: select ARCH_HAS_RESET_CONTROLLER
  arm64: dts: Add timer erratum property for LS2080A and LS1043A
  arm64: dts: rockchip: remove the abuse of keep-power-in-suspend
  ARM: multi_v7_defconfig: Enable Intel e1000e driver
  MAINTAINERS: add myself as Marvell berlin SoC maintainer
  bus: qcom-ebi2: depend on ARCH_QCOM or COMPILE_TEST
  ARM: dts: fix the SD card on the Snowball
  arm64: dts: rockchip: remove always-on and boot-on from vcc_sd
  arm64: dts: marvell: fix clocksource for CP110 master SPI0
  ARM: mvebu: Select corediv clk for all mvebu v7 SoC

Merge tag 'char-misc-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
"Here are a few small char/misc driver fixes for reported issues.

  The "biggest" are two binder fixes for reported issues that have been
  shipping in Android phones for a while now, the others are various
  fixes for reported problems.

  And there's a MAINTAINERS update for good measure.

  All have been in linux-next with no reported issues"

* tag 'char-misc-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  MAINTAINERS: Add entry for genwqe driver
  VMCI: Doorbell create and destroy fixes
  GenWQE: Fix bad page access during abort of resource allocation
  vme: vme_get_size potentially returning incorrect value on failure
  extcon: qcom-spmi-misc: Sync the extcon state on interrupt
  hv: do not lose pending heartbeat vmbus packets
  mei: txe: don't clean an unprocessed interrupt cause.
  ANDROID: binder: Clear binder and cookie when setting handle in flat binder struct
  ANDROID: binder: Add strong ref checks

Merge tag 'v4.9-rockchip-dts64-fixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into fixes

Correct regulator handling on Rockchip arm64 boards to make
bind/unbind calls work correctly and remove a sdio-only
property from non-sdio mmc hosts, that accidentially was
added there.

* tag 'v4.9-rockchip-dts64-fixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: remove the abuse of keep-power-in-suspend
arm64: dts: rockchip: remove always-on and boot-on from vcc_sd

Signed-off-by: Olof Johansson <olof@lixom.net>

Merge tag 'arm-soc/for-4.9/devicetree-arm64-fixes' of http://github.com/Broadcom/stblinux into fixes

This pull request contains a single fix for Broadcom ARM64-based SoCs:

- Ray adds the required bus width and OOB sector size properties to the
  Northstar 2 SVK reference board in order for the NAND controller to work
  properly

* tag 'arm-soc/for-4.9/devicetree-arm64-fixes' of http://github.com/Broadcom/stblinux:
  arm64: dts: Updated NAND DT properties for NS2 SVK

Signed-off-by: Olof Johansson <olof@lixom.net>

Merge tag 'imx-fixes-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes

The i.MX fixes for 4.9:
- A couple of patches from Fabio to fix the GPC power domain regression
   which is caused by PM Domain core change 0159ec670763dd
   ("PM / Domains: Verify the PM domain is present when adding a
   provider"), and a related kernel crash seen with multi_v7_defconfig
   build.
- Correct the PHY ID mask for AR8031 to match phy driver code.
- Apply new added timer erratum A008585 for LS1043A and LS2080A SoC.
- Correct vf610 global timer IRQ flag to avoid warning from gic driver
   after commit 992345a58e0c ("irqchip/gic: WARN if setting the
   interrupt type for a PPI fails").

* tag 'imx-fixes-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  ARM: imx: mach-imx6q: Fix the PHY ID mask for AR8031
  ARM: dts: vf610: fix IRQ flag of global timer
  ARM: imx: gpc: Fix the imx_gpc_genpd_init() error path
  ARM: imx: gpc: Initialize all power domains
  arm64: dts: Add timer erratum property for LS2080A and LS1043A

Signed-off-by: Olof Johansson <olof@lixom.net>

Merge tag 'uniphier-fixes-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-uniphier into fixes

UniPhier ARM SoC fixes for v4.9

- Add "select ARCH_HAS_RESET_CONTROLLER" in Kconfig
- Rename wrongly-named mioctrl to sdctrl

* tag 'uniphier-fixes-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-uniphier:
  arm64: dts: uniphier: change MIO node to SD control node
  ARM: dts: uniphier: change MIO node to SD control node
  reset: uniphier: rename MIO reset to SD reset for Pro5, PXs2, LD20 SoCs
  arm64: uniphier: select ARCH_HAS_RESET_CONTROLLER
  ARM: uniphier: select ARCH_HAS_RESET_CONTROLLER

Signed-off-by: Olof Johansson <olof@lixom.net>

Merge tag 'driver-core-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
"Here are two small driver core / kernfs fixes for 4.9-rc3.

  One makes the Kconfig entry for DEBUG_TEST_DRIVER_REMOVE a bit more
  explicit that this is a crazy thing to enable for a distro kernel
  (thanks for trying Fedora!), the other resolves an issue with vim
  opening kernfs files (sysfs, configfs, etc.)

  Both have been in linux-next with no reported issues"

* tag 'driver-core-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  driver core: Make Kconfig text for DEBUG_TEST_DRIVER_REMOVE stronger
  kernfs: Add noop_fsync to supported kernfs_file_fops

Merge tag 'staging-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging and IIO driver fixes from Greg KH:
"Here are some small staging and iio driver fixes for reported issues
  for 4.9-rc3. Nothing major, the "largest" being a lustre fix for a
  sysfs file that was obviously wrong, and had never been tested, so it
  was moved to debugfs as that is where it belongs. The others are small
  bug fixes for reported issues with various staging or iio drivers.

  All have been in linux-next for a while with no reported issues"

* tag 'staging-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  greybus: fix a leak on error in gb_module_create()
  greybus: es2: fix error return code in ap_probe()
  greybus: arche-platform: Add missing of_node_put() in arche_platform_change_state()
  staging: android: ion: Fix error handling in ion_query_heaps()
  iio: accel: sca3000_core: avoid potentially uninitialized variable
  iio:chemical:atlas-ph-sensor: Fix use of 32 bit int to hold 16 bit big endian value
  staging/lustre/llite: Move unstable_stats from sysfs to debugfs
  Staging: wilc1000: Fix kernel Oops on opening the device
  staging: android/ion: testing the wrong variable
  Staging: greybus: uart: Use gbphy_dev->dev instead of bundle->dev
  Staging: greybus: gpio: Use gbphy_dev->dev instead of bundle->dev
  iio: adc: ti-adc081c: Select IIO_TRIGGERED_BUFFER to prevent build errors
  iio: maxim_thermocouple: Align 16 bit big endian value of raw reads

Merge tag 'tty-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver fixes from Greg KH:
"Here are a number of small tty and serial driver fixes for reported
  issues for 4.9-rc3. Nothing major, but they do resolve a bunch of
  problems with the tty core changes that are in 4.9-rc1, and finally
  the atmel serial driver is back working properly.

  All have been in linux-next with no reported issues"

* tag 'tty-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: serial_core: fix NULL struct tty pointer access in uart_write_wakeup
  tty: serial_core: Fix serial console crash on port shutdown
  tty/serial: at91: fix hardware handshake on Atmel platforms
  vt: clear selection before resizing
  sc16is7xx: always write state when configuring GPIO as an output
  sh-sci: document R8A7743/5 support
  tty: serial: 8250: 8250_core: NXP SC16C2552 workaround
  tty: limit terminal size to 4M chars
  tty: serial: fsl_lpuart: Fix Tx DMA edge case
  serial: 8250_lpss: enable MSI for sure
  serial: core: fix console problems on uart_close
  serial: 8250_uniphier: fix clearing divisor latch access bit
  serial: 8250_uniphier: fix more unterminated string
  serial: pch_uart: add terminate entry for dmi_system_id tables
  devicetree: bindings: uart: Add new compatible string for ZynqMP
  serial: xuartps: Add new compatible string for ZynqMP
  serial: SERIAL_STM32 should depend on HAS_DMA
  serial: stm32: Fix comparisons with undefined register
  tty: vt, fix bogus division in csi_J

Merge tag 'usb-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are a number of small USB driver fixes for 4.9-rc3.

  There is the usual number of gadget and xhci patches in here to
  resolved reported issues, as well as some usb-serial driver fixes and
  new device ids.

  All have been in linux-next with no reported issues"

* tag 'usb-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (26 commits)
  usb: chipidea: host: fix NULL ptr dereference during shutdown
  usb: renesas_usbhs: add wait after initialization for R-Car Gen3
  usb: increase ohci watchdog delay to 275 msec
  usb: musb: Call pm_runtime from musb_gadget_queue
  usb: musb: Fix hardirq-safe hardirq-unsafe lock order error
  usb: ehci-platform: increase EHCI_MAX_RSTS to 4
  usb: ohci-at91: Set RemoteWakeupConnected bit explicitly.
  USB: serial: fix potential NULL-dereference at probe
  xhci: use default USB_RESUME_TIMEOUT when resuming ports.
  xhci: workaround for hosts missing CAS bit
  xhci: add restart quirk for Intel Wildcatpoint PCH
  USB: serial: cp210x: fix tiocmget error handling
  wusb: fix error return code in wusb_prf()
  Revert "Documentation: devicetree: dwc2: Deprecate g-tx-fifo-size"
  Revert "usb: dwc2: gadget: fix TX FIFO size and address initialization"
  Revert "usb: dwc2: gadget: change variable name to more meaningful"
  USB: serial: ftdi_sio: add support for Infineon TriBoard TC2X7
  wusb: Stop using the stack for sg crypto scratch space
  usb: dwc3: Fix size used in dma_free_coherent()
  usb: gadget: f_fs: stop sleeping in ffs_func_eps_disable
  ...

x86/smpboot: Init apic mapping before usage

The recent changes, which forced the registration of the boot cpu on UP
systems, which do not have ACPI tables, have been fixed for systems w/o
local APIC, but left a wreckage for systems which have neither ACPI nor
mptables, but the CPU has an APIC, e.g. virtualbox.

The boot process crashes in prefill_possible_map() as it wants to register
the boot cpu, which needs to access the local apic, but the local APIC is
not yet mapped.

There is no reason why init_apic_mapping() can't be invoked before
prefill_possible_map(). So instead of playing another silly early mapping
game, as the ACPI/mptables code does, we just move init_apic_mapping()
before the call to prefill_possible_map().

In hindsight, I should have noticed that combination earlier.

Sorry for the churn (also in stable)!

Fixes: ff8560512b8d ("x86/boot/smp: Don't try to poke disabled/non-existent APIC")
Reported-and-debugged-by: Michal Necasek <michal.necasek@oracle.com>
Reported-and-tested-by: Wolfgang Bauer <wbauer@tmo.at>
Cc: prarit@redhat.com
Cc: ville.syrjala@linux.intel.com
Cc: michael.thayer@oracle.com
Cc: knut.osmundsen@oracle.com
Cc: frank.mehnert@oracle.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610282114380.5053@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Merge tag 'acpi-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
"These fix recent ACPICA regressions, an older PCI IRQ management
  regression, and an incorrect return value of a function in the APEI
  code.

  Specifics:

   - Fix three ACPICA issues related to the interpreter locking and
     introduced by recent changes in that area (Lv Zheng).

   - Fix a PCI IRQ management regression introduced during the 4.7 cycle
     and related to the configuration of shared IRQs on systems with an
     ISA bus (Sinan Kaya).

   - Fix up a return value of one function in the APEI code (Punit
     Agrawal)"

* tag 'acpi-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPICA: Dispatcher: Fix interpreter locking around acpi_ev_initialize_region()
  ACPICA: Dispatcher: Fix an unbalanced lock exit path in acpi_ds_auto_serialize_method()
  ACPICA: Dispatcher: Fix order issue of method termination
  ACPI / APEI: Fix incorrect return value of ghes_proc()
  ACPI/PCI: pci_link: Include PIRQ_PENALTY_PCI_USING for ISA IRQs
  ACPI/PCI: pci_link: penalize SCI correctly
  ACPI/PCI/IRQ: assign ISA IRQ directly during early boot stages

Merge tag 'pm-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These fix two intel_pstate issues related to the way it works when the
  scaling_governor sysfs attribute is set to "performance" and fix up
  messages in the system suspend core code.

  Specifics:

   - Fix a missing KERN_CONT in a system suspend message by converting
     the affected code to using pr_info() and pr_cont() instead of the
     "raw" printk() (Jon Hunter).

   - Make intel_pstate set the CPU P-state from its .set_policy()
     callback when the scaling_governor sysfs attribute is set to
     "performance" so that it interacts with NOHZ_FULL more predictably
     which was the case before 4.7 (Rafael Wysocki).

   - Make intel_pstate always request the maximum allowed P-state when
     the scaling_governor sysfs attribute is set to "performance" to
     prevent it from effectively ingoring that setting is some
     situations (Rafael Wysocki)"

* tag 'pm-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: intel_pstate: Always set max P-state in performance mode
  PM / suspend: Fix missing KERN_CONT for suspend message
  cpufreq: intel_pstate: Set P-state upfront in performance mode

Merge tag 'arc-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC updates from Vineet Gupta:

- support IDU intc for UP builds

- support gz, lzma compressed uImage [Daniel Mentz]

- adjust /proc/cpuinfo for non-continuous cpu ids [Noam Camus]

- syscall for userspace cmpxchg assist for configs lacking hardware atomics

- rework of boot log printing mainly for identifying older arc700 cores

- retiring some old code, build toggles

* tag 'arc-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: module: print pretty section names
  ARC: module: elide loop to save reference to .eh_frame
  ARC: mm: retire ARC_DBG_TLB_MISS_COUNT...
  ARC: build: retire old toggles
  ARC: boot log: refactor cpu name/release printing
  ARC: boot log: remove awkward space comma from MMU line
  ARC: boot log: don't assume SWAPE instruction support
  ARC: boot log: refactor printing abt features not captured in BCRs
  ARCv2: boot log: print IOC exists as well as enabled status
  ARCv2: IOC: use @ioc_enable not @ioc_exist where intended
  ARC: syscall for userspace cmpxchg assist
  ARC: fix build warning in elf.h
  ARC: Adjust cpuinfo for non-continuous cpu ids
  ARC: [build] Support gz, lzma compressed uImage
  ARCv2: intc: untangle SMP, MCIP and IDU

Merge branches 'acpica-fixes', 'acpi-pci-fixes' and 'acpi-apei-fixes'

* acpica-fixes:
  ACPICA: Dispatcher: Fix interpreter locking around acpi_ev_initialize_region()
  ACPICA: Dispatcher: Fix an unbalanced lock exit path in acpi_ds_auto_serialize_method()
  ACPICA: Dispatcher: Fix order issue of method termination

* acpi-pci-fixes:
  ACPI/PCI: pci_link: Include PIRQ_PENALTY_PCI_USING for ISA IRQs
  ACPI/PCI: pci_link: penalize SCI correctly
  ACPI/PCI/IRQ: assign ISA IRQ directly during early boot stages

* acpi-apei-fixes:
  ACPI / APEI: Fix incorrect return value of ghes_proc()

ACPICA: Dispatcher: Fix interpreter locking around acpi_ev_initialize_region()

In the code path of acpi_ev_initialize_region(), there is namespace
modification code unlocked. This patch tunes the code to make sure
such modification are always locked.

Fixes: 74f51b80a0c4 (ACPICA: Namespace: Fix dynamic table loading issues)
Tested-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPICA: Dispatcher: Fix an unbalanced lock exit path in acpi_ds_auto_serialize_method()

There is a lock unbalanced exit path in acpi_ds_initialize_method(),
this patch corrects it.

Fixes: 441ad11d078f (ACPICA: Dispatcher: Fix a mutex issue for method auto serialization)
Tested-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPICA: Dispatcher: Fix order issue of method termination

The last step of the method termination should be the end of the method
serialization. Otherwise, the steps happening after it will face the race
issues that cannot be protected by the method serialization mechanism.

This patch fixes this issue by moving the per-method-object deletion code
prior than the end of the method serialization. Otherwise, the possible
race issues may result in AE_ALREADY_EXISTS error in a parallel
environment.

Fixes: 74f51b80a0c4 (ACPICA: Namespace: Fix dynamic table loading issues)
Reported-and-tested-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Merge tag 'powerpc-4.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
"Fixes marked for stable:
   - Convert cmp to cmpd in idle enter sequence (Segher Boessenkool)
   - cxl: Fix leaking pid refs in some error paths (Vaibhav Jain)
   - Re-fix race condition between going idle and entering guest (Paul Mackerras)
   - Fix race condition in setting lock bit in idle/wakeup code (Paul Mackerras)
   - radix: Use tlbiel only if we ever ran on the current cpu (Aneesh Kumar K.V)
   - relocation, register save fixes for system reset interrupt (Nicholas Piggin)

  Fixes for code merged this cycle:
   - Fix CONFIG_ALIVEC typo in restore_tm_state() (Valentin Rothberg)
   - KVM: PPC: Book3S HV: Fix build error when SMP=n (Michael Ellerman)"

* tag 'powerpc-4.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: relocation, register save fixes for system reset interrupt
  powerpc/mm/radix: Use tlbiel only if we ever ran on the current cpu
  powerpc/process: Fix CONFIG_ALIVEC typo in restore_tm_state()
  powerpc/64: Fix race condition in setting lock bit in idle/wakeup code
  powerpc/64: Re-fix race condition between going idle and entering guest
  cxl: Fix leaking pid refs in some error paths
  powerpc: Convert cmp to cmpd in idle enter sequence
  KVM: PPC: Book3S HV: Fix build error when SMP=n

Merge branches 'pm-cpufreq-fixes' and 'pm-sleep-fixes'

* pm-cpufreq-fixes:
  cpufreq: intel_pstate: Always set max P-state in performance mode
  cpufreq: intel_pstate: Set P-state upfront in performance mode

* pm-sleep-fixes:
  PM / suspend: Fix missing KERN_CONT for suspend message

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Misc kernel fixes: a virtualization environment related fix, an uncore
  PMU driver removal handling fix, a PowerPC fix and new events for
  Knights Landing"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Honour the CPUID for number of fixed counters in hypervisors
  perf/powerpc: Don't call perf_event_disable() from atomic context
  perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y kernel panic
  perf/x86/intel/cstate: Add C-state residency events for Knights Landing

Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:
"A build fix, a NULL de-reference found by static analysis, a misuse of
  the percpu_ref_exit() (tagged for -stable), and notification of failed
  attempts to clear media errors.

  These patches have received a build success notification from the
  0day- kbuild-robot and appeared in next-20161028"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  device-dax: fix percpu_ref_exit ordering
  nvdimm: make CONFIG_NVDIMM_DAX 'bool'
  pmem: report error on clear poison failure
  libnvdimm, namespace: potential NULL deref on allocation error

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
"Three arm64 fixes for -rc3.  They're all pretty straightforward: a
  couple of NUMA issues from the Huawei folks and a thinko in
  __page_to_voff that seems to be benign, but is certainly better off
  fixed.

  Summary:
   - couple of NUMA fixes
   - thinko in __page_to_voff"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mm: fix __page_to_voff definition
  arm64/numa: fix incorrect log for memory-less node
  arm64/numa: fix pcpu_cpu_distance() to get correct CPU proximity

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"Misc fixes: three build fixes, an unwinder fix and a microcode loader
  fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Fix more fallout from CONFIG_RANDOMIZE_MEMORY=y
  x86: Fix export for mcount and __fentry__
  x86/quirks: Hide maybe-uninitialized warning
  x86/build: Fix build with older GCC versions
  x86/unwind: Fix empty stack dereference in guess unwinder

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Ingo Molnar:
"Fix four timer locking races: two were noticed by Linus while
  reviewing the code while chasing for a corruption bug, and two
  from fixing spurious USB timeouts"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Prevent base clock corruption when forwarding
  timers: Prevent base clock rewind when forwarding clock
  timers: Lock base for same bucket optimization
  timers: Plug locking race vs. timer migration

Merge branches 'core-urgent-for-linus', 'irq-urgent-for-linus' and 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool, irq and scheduler fixes from Ingo Molnar:
"One more objtool fixlet for GCC6 code generation patterns, an irq
  DocBook fix and an unused variable warning fix in the scheduler"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix rare switch jump table pattern detection

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  doc: Add missing parameter for msi_setup

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Remove unused but set variable 'rq'

ARC: module: print pretty section names

Now that we have referece to section name string table in
apply_relocate_add(), use it to

- print the name of section being relocated
- print symbol with NULL name (since it refers to a section)

before

| Section to fixup 7000a060
| =========================================================
| rela->r_off | rela->addend | sym->st_value | ADDR | VALUE
| =========================================================
| 1c 0 7000e000  7000a07c 7000e000 []
| 40 0 7000a000  7000a0a0 7000a000 []

after

| Section to fixup .eh_frame @7000a060
| =========================================================
| r_off r_add st_value ADDRESS  VALUE
| =========================================================
|    1c 0 7000e000 7000a07c 7000e000 [.init.text]
|    40 0 7000a000 7000a0a0 7000a000 [.exit.text]

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

ARC: module: elide loop to save reference to .eh_frame

The loop was really needed in .debug_frame regime where wanted make it
as SH_ALLOC so that apply_relocate_add() would process it. That's not
needed for .eh_frame, so we check this in apply_relocate_add() which
gets called for each section.

Note that we need to save reference to "section name strings" section in
module_frob_arch_sections() since apply_relocate_add() doesn't get that

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

ARC: mm: retire ARC_DBG_TLB_MISS_COUNT...

... given that we have perf counters abel to do the same thing non
intrusively

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

ARC: build: retire old toggles

These are really ancient toggles and tools no longer require them to be
passed. This paves way for deprecating them in long run.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

ARC: boot log: refactor cpu name/release printing

The motivation is to identify ARC750 vs. ARC770 (we currently print
generic "ARC700").

A given ARC700 release could be 750 or 770, with same ARCNUM (or family
identifier which is unfortunate). The existing arc_cpu_tbl[] kept a single
concatenated string for core name and release which thus doesn't work
for 750 vs. 770 identification.

So split this into 2 tables, one with core names and other with release.
And while we are at it, get rid of the range checking for family numbers.
We just document the known to exist cores running Linux and ditch
others.

With this in place, we add detection of ARC750 which is
- cores 0x33 and before
- cores 0x34 and later with MMUv2

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

ARC: boot log: remove awkward space comma from MMU line

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

ARC: boot log: don't assume SWAPE instruction support

This came to light when helping a customer with oldish ARC750 core who
were getting instruction errors because of lack of SWAPE but boot log
was incorrectly printing it as being present

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

ARC: boot log: refactor printing abt features not captured in BCRs

On older arc700 cores, some of the features configured were not present
in Build config registers. To print about them at boot, we just use the
Kconfig option i.e. whether linux is built to use them or not.
So yes this seems bogus, but what else can be done. Moreover if linux is
booting with these enabled, then the Kconfig info is a good indicator
anyways.

Over time these "hacks" accumulated in read_arc_build_cfg_regs() as well
as arc_cpu_mumbojumbo(). so refactor and move all of those in a single
place: read_arc_build_cfg_regs(). This causes some code redcution too:

| bloat-o-meter2 arch/arc/kernel/setup.o.0 arch/arc/kernel/setup.o.1
| add/remove: 0/0 grow/shrink: 2/1 up/down: 64/-132 (-68)
| function                                     old     new   delta
| setup_processor                              610     670     +60
| cpuinfo_arc700                                76      80      +4
| arc_cpu_mumbojumbo                           752     620    -132

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

Merge branch 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"My patch fixes the btrfs list_head abuse that we tracked down during
  Dave Jones' memory corruption investigation. With both Jens and my
  patches in place, I'm no longer able to trigger problems.

  Filipe is fixing a difficult old bug between snapshots, balance and
  send. Dave is cooking a few more for the next rc, but these are tested
  and ready"

* 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: fix races on root_log_ctx lists
  btrfs: fix incremental send failure caused by balance

ARCv2: boot log: print IOC exists as well as enabled status

Previously we would not print the case when IOC existed but was not
enabled.

And while at it, reduce one line off boot printing by consolidating
the Peripheral address space and IO-Coherency which in a way
applies to them

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

Merge tag 'sound-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"This contains the usual stuff -- the fixups and quirks for HD-audio
  and USB-audio, in addition to a bad regression fix in ALSA sequencer
  timer since 4.8, and a trivial fix for asihpi PCI driver"

* tag 'sound-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: usb-audio: Add quirk for Syntek STK1160
  ALSA: seq: Fix time account regression
  ALSA: hda - Fix surround output pins for ASRock B150M mobo
  ALSA: hda - Fix headset mic detection problem for two Dell laptops
  ALSA: asihpi: fix kernel memory disclosure
  ALSA: hda - Adding a new group of pin cfg into ALC295 pin quirk table
  ALSA: hda - allow 40 bit DMA mask for NVidia devices

Merge tag 'drm-x86-pat-regression-fix' of git://people.freedesktop.org/~airlied/linux

Pull drm x86/pat regression fixes from Dave Airlie:
"This is a standalone pull request for the fix for a regression
  introduced in -rc1 by a change to vm_insert_mixed to start using the
  PAT range tracking to validate page protections. With this fix in
  place, all the VRAM mappings for GPU drivers ended up at UC instead of
  WC.

  There are probably better ways to fix this long term, but nothing I'd
  considered for -fixes that wouldn't need more settling in time. So
  I've just created a new arch API that the drivers can reserve all
  their VRAM aperture ranges as WC"

* tag 'drm-x86-pat-regression-fix' of git://people.freedesktop.org/~airlied/linux:
  drm/drivers: add support for using the arch wc mapping API.
  x86/io: add interface to reserve io memtype for a resource range. (v1.1)

Merge tag 'dm-4.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

- a couple DM raid and DM mirror fixes

- a couple .request_fn request-based DM NULL pointer fixes

- a fix for a DM target reference count leak, on target load error,
   that prevented associated DM target kernel module(s) from being
   removed

* tag 'dm-4.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm table: fix missing dm_put_target_type() in dm_table_add_target()
  dm rq: clear kworker_task if kthread_run() returned an error
  dm: free io_barrier after blk_cleanup_queue call
  dm raid: fix activation of existing raid4/10 devices
  dm mirror: use all available legs on multiple failures
  dm mirror: fix read error on recovery after default leg failure
  dm raid: fix compat_features validation

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull key fixes from James Morris:

- fix a buffer overflow when displaying /proc/keys [CVE-2016-7042].

- fix broken initialisation in the big_key implementation that can
   result in an oops.

- make big_key depend on having a random number generator available in
   Kconfig.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  security/keys: make BIG_KEYS dependent on stdrng.
  KEYS: Sort out big_key initialisation
  KEYS: Fix short sprintf buffer in /proc/keys show function

ubifs: Fix regression in ubifs_readdir()

Commit c83ed4c9dbb35 ("ubifs: Abort readdir upon error") broke
overlayfs support because the fix exposed an internal error
code to VFS.

Reported-by: Peter Rosin <peda@axentia.se>
Tested-by: Peter Rosin <peda@axentia.se>
Reported-by: Ralph Sennhauser <ralph.sennhauser@gmail.com>
Tested-by: Ralph Sennhauser <ralph.sennhauser@gmail.com>
Fixes: c83ed4c9dbb35 ("ubifs: Abort readdir upon error")
Cc: stable@vger.kernel.org
Signed-off-by: Richard Weinberger <richard@nod.at>

ubi: fastmap: Fix add_vol() return value test in ubi_attach_fastmap()

Commit e96a8a3bb671 ("UBI: Fastmap: Do not add vol if it already
exists") introduced a bug by changing the possible error codes returned
by add_vol():
- this function no longer returns NULL in case of allocation failure
but return ERR_PTR(-ENOMEM)
- when a duplicate entry in the volume RB tree is found it returns
ERR_PTR(-EEXIST) instead of ERR_PTR(-EINVAL)

Fix the tests done on add_vol() return val to match this new behavior.

Fixes: e96a8a3bb671 ("UBI: Fastmap: Do not add vol if it already exists")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>

MAINTAINERS: Add entry for genwqe driver

Frank and I maintain this

Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: haver@linux.vnet.ibm.com
Acked-by: Frank Haverkamp <haver@linux.vnet.ibm.com>=
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

VMCI: Doorbell create and destroy fixes

This change consists of two changes:

1) If vmci_doorbell_create is called when neither guest nor
   host personality as been initialized, vmci_get_context_id
   will return VMCI_INVALID_ID. In that case, we should fail
   the create call.
2) In doorbell destroy, we assume that vmci_guest_code_active()
   has the same return value on create and destroy. That may not
   be the case, so we may end up with the wrong refcount.
   Instead, destroy should check explicitly whether the doorbell
   is in the index table as an indicator of whether the guest
   code was active at create time.

Reviewed-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

GenWQE: Fix bad page access during abort of resource allocation

When interrupting an application which was allocating DMAable
memory, it was possible, that the DMA memory was deallocated
twice, leading to the error symptoms below.

Thanks to Gerald, who analyzed the problem and provided this
patch.

I agree with his analysis of the problem: ddcb_cmd_fixups() ->
genwqe_alloc_sync_sgl() (fails in f/lpage, but sgl->sgl != NULL
and f/lpage maybe also != NULL) -> ddcb_cmd_cleanup() ->
genwqe_free_sync_sgl() (double free, because sgl->sgl != NULL and
f/lpage maybe also != NULL)

In this scenario we would have exactly the kind of double free that
would explain the WARNING / Bad page state, and as expected it is
caused by broken error handling (cleanup).

Using the Ubuntu git source, tag Ubuntu-4.4.0-33.52, he was able to reproduce
the "Bad page state" issue, and with the patch on top he could not reproduce
it any more.

------------[ cut here ]------------
WARNING: at /build/linux-o03cxz/linux-4.4.0/arch/s390/include/asm/pci_dma.h:141
Modules linked in: qeth_l2 ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common genwqe_card qeth crc_itu_t qdio ccwgroup vmur dm_multipath dasd_eckd_mod dasd_mod
CPU: 2 PID: 3293 Comm: genwqe_gunzip Not tainted 4.4.0-33-generic #52-Ubuntu
task: 0000000032c7e270 ti: 00000000324e4000 task.ti: 00000000324e4000
Krnl PSW : 0404c00180000000 0000000000156346 (dma_update_cpu_trans+0x9e/0xa8)
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 00000000324e7bcd 0000000000c3c34a 0000000027628298 000000003215b400
           0000000000000400 0000000000001fff 0000000000000400 0000000116853000
           07000000324e7b1e 0000000000000001 0000000000000001 0000000000000001
           0000000000001000 0000000116854000 0000000000156402 00000000324e7a38
Krnl Code: 000000000015633a: 95001000           cli     0(%r1),0
           000000000015633e: a774ffc3           brc     7,1562c4
          #0000000000156342: a7f40001           brc     15,156344
          >0000000000156346: 92011000           mvi     0(%r1),1
           000000000015634a: a7f4ffbd           brc     15,1562c4
           000000000015634e: 0707               bcr     0,%r7
           0000000000156350: c00400000000       brcl    0,156350
           0000000000156356: eb7ff0500024       stmg    %r7,%r15,80(%r15)
Call Trace:
([<00000000001563e0>] dma_update_trans+0x90/0x228)
[<00000000001565dc>] s390_dma_unmap_pages+0x64/0x160
[<00000000001567c2>] s390_dma_free+0x62/0x98
[<000003ff801310ce>] __genwqe_free_consistent+0x56/0x70 [genwqe_card]
[<000003ff801316d0>] genwqe_free_sync_sgl+0xf8/0x160 [genwqe_card]
[<000003ff8012bd6e>] ddcb_cmd_cleanup+0x86/0xa8 [genwqe_card]
[<000003ff8012c1c0>] do_execute_ddcb+0x110/0x348 [genwqe_card]
[<000003ff8012c914>] genwqe_ioctl+0x51c/0xc20 [genwqe_card]
[<000000000032513a>] do_vfs_ioctl+0x3b2/0x518
[<0000000000325344>] SyS_ioctl+0xa4/0xb8
[<00000000007b86c6>] system_call+0xd6/0x264
[<000003ff9e8e520a>] 0x3ff9e8e520a
Last Breaking-Event-Address:
[<0000000000156342>] dma_update_cpu_trans+0x9a/0xa8
---[ end trace 35996336235145c8 ]---
BUG: Bad page state in process jbd2/dasdb1-8  pfn:3215b
page:000003d100c856c0 count:-1 mapcount:0 mapping:          (null) index:0x0
flags: 0x3fffc0000000000()
page dumped because: nonzero _count

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Frank Haverkamp <haver@linux.vnet.ibm.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

vme: vme_get_size potentially returning incorrect value on failure

The function vme_get_size returns the size of the window to the caller,
however it doesn't check the return value of the call to vme_master_get.

Return 0 on failure rather than anything else.

Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martyn Welch <martyn.welch@collabora.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tty: serial_core: fix NULL struct tty pointer access in uart_write_wakeup

Since commit 761ed4a94582ab29 ("tty: serial_core: convert uart_close to
use tty_port_close"), the serial console is broken on various systems
and typing "reboot" splats the following on the serial console:

INIT: Sending p[  427.863916] BUG: unable to handle kernel NULL pointer dereference at 00000000000001e0
[  427.885156] IP: [] tty_wakeup+0xc/0x70
[  427.898337] PGD 0 [  427.902051]
[  427.907498] Oops: 0000 [#1] PREEMPT SMP
[  427.917635] Modules linked in: nfsv3 nfs_acl nfs fscache lockd
sunrpc grace edd af_packet cpufreq_conservative cpufreq_userspace
cpufreq_powersave fuse loop md_mod dm_mod joydev hid_generic usbhid
ipmi_ssif ohci_pci ohci_hcd ehci_pci ehci_hcd e1000e ptp firewire_ohci
edac_core pps_core tpm_infineon sp5100_tco firewire_core acpi_cpufreq
serio_raw pcspkr fjes usbcore shpchp edac_mce_amd tpm_tis ipmi_si
tpm_tis_core i2c_piix4 k10temp sg ipmi_msghandler tpm sr_mod button
cdrom kvm_amd kvm irqbypass crc_itu_t ast ttm drm_kms_helper drm
fb_sys_fops sysimgblt sysfillrect syscopyarea i2c_algo_bit scsi_dh_rdac
scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw ata_generic pata_atiixp
[  428.054179] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc1-1.g73e3f23-default #1
[  428.072868] Hardware name: System manufacturer System Product Name/KGP(M)E-D16, BIOS 0902    12/03/2010
[  428.094755] task: ffffffffa2c0d500 task.stack: ffffffffa2c00000
[  428.109717] RIP: 0010:[]  [] tty_wakeup+0xc/0x70
[  428.128407] RSP: 0018:ffff9a1a5fc03df8  EFLAGS: 00010086
[  428.142184] RAX: ffff9a1857258000 RBX: ffffffffa3050ea0 RCX: 0000000000000000
[  428.159649] RDX: 000000000000001b RSI: 0000000000000000 RDI: 0000000000000000
[  428.177109] RBP: ffff9a1a5fc03e08 R08: 0000000000000000 R09: 0000000000000000
[  428.194547] R10: 0000000000021c77 R11: 0000000000000000 R12: ffff9a1857258000
[  428.212002] R13: 0000000000000000 R14: 0000000000000020 R15: 0000000000000020
[  428.229481] FS:  0000000000000000(0000) GS:ffff9a1a5fc00000(0000) knlGS:0000000000000000
[  428.248938] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  428.263726] CR2: 00000000000001e0 CR3: 0000000390c06000 CR4: 00000000000006f0
[  428.281331] Stack:
[  428.288696]  ffffffffa3050ea0 ffff9a1857258000 ffff9a1a5fc03e18 ffffffffa24e0ab1
[  428.307064]  ffff9a1a5fc03e40 ffffffffa24e8865 ffffffffa3050ea0 00000000000000c2
[  428.325456]  0000000000000046 ffff9a1a5fc03e78 ffffffffa24e8a5f ffffffffa3050ea0
[  428.343905] Call Trace:
[  428.352319]   [  428.356216]  [] uart_write_wakeup+0x21/0x30

The problem is for console ports, the serial port is not shutdown and
interrupts may fire after the struct tty is gone. Simply calling the
tty_port helper tty_port_tty_wakeup instead of tty_wakeup directly will
ensure there is a valid struct tty.

Fixes: 761ed4a94582ab29 ("tty: serial_core: convert uart_close to use tty_port_close")
Reported-by: Borislav Petkov <bp@alien8.de>
Reported-by: Mike Galbraith <mgalbraith@suse.de>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-serial@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tty: serial_core: Fix serial console crash on port shutdown

The port->console flag is always false, as uart_console() is called
before the serial console has been registered.

Hence for a serial port used as the console, uart_tty_port_shutdown()
will still be called when userspace closes the port, powering it down.
This may lead to a system lock up when the serial console driver writes
to the serial port's registers.

To fix this, move the setting of port->console after the call to
uart_configure_port(), which registers the serial console.

Fixes: 761ed4a94582ab29 ("tty: serial_core: convert uart_close to use tty_port_close")
Reported-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Rob Herring <robh@kernel.org>
Tested-by: Mugunthan V N <mugunthanvnm@ti.com>
Tested-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
[robh: rebased on tty-linus]
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tty/serial: at91: fix hardware handshake on Atmel platforms

After commit 1cf6e8fc8341 ("tty/serial: at91: fix RTS line management
when hardware handshake is enabled"), the hardware handshake wasn't
functional anymore on Atmel platforms (beside SAMA5D2).

To understand why, one has to understand the flag ATMEL_US_USMODE_HWHS
first:
Before commit 1cf6e8fc8341 ("tty/serial: at91: fix RTS line management
when hardware handshake is enabled"), this flag was never set.
Thus, the CTS/RTS where only handled by serial_core (and everything
worked just fine).

This commit introduced the use of the ATMEL_US_USMODE_HWHS flag,
enabling it for all boards when the user space enables flow control.

When the ATMEL_US_USMODE_HWHS is set, the Atmel USART controller
handles a part of the flow control job:
- disable the transmitter when the CTS pin gets high.
- drive the RTS pin high when the DMA buffer transfer is completed or
PDC RX buffer full or RX FIFO is beyond threshold. (depending on the
controller version).

NB: This feature is *not* mandatory for the flow control to work.
(Nevertheless, it's very useful if low latencies are needed.)

Now, the specifics of the ATMEL_US_USMODE_HWHS flag:

- For platforms with DMAC and no FIFOs (sam9x25, sam9x35, sama5D3,
sama5D4, sam9g15, sam9g25, sam9g35)* this feature simply doesn't work.
( source: https://lkml.org/lkml/2016/9/7/598 )
Tested it on sam9g35, the RTS pins always stays up, even when RXEN=1
or a new DMA transfer descriptor is set.
=> ATMEL_US_USMODE_HWHS must not be used for those platforms

- For platforms with a PDC (sam926{0,1,3}, sam9g10, sam9g20, sam9g45,
sam9g46)*, there's another kind of problem. Once the flag
ATMEL_US_USMODE_HWHS is set, the RTS pin can't be driven anymore via
RTSEN/RTSDIS in USART Control Register. The RTS pin can only be driven
by enabling/disabling the receiver or setting RCR=RNCR=0 in the PDC
(Receive (Next) Counter Register).
=> Doing this is beyond the scope of this patch and could add other
bugs, so the original (and working) behaviour should be set for those
platforms (meaning ATMEL_US_USMODE_HWHS flag should be unset).

- For platforms with a FIFO (sama5d2)*, the RTS pin is driven according
to the RX FIFO thresholds, and can be also driven by RTSEN/RTSDIS in
USART Control Register. No problem here.
(This was the use case of commit 1cf6e8fc8341 ("tty/serial: at91: fix
RTS line management when hardware handshake is enabled"))
NB: If the CTS pin declared as a GPIO in the DTS, (for instance
cts-gpios = <&pioA PIN_PB31 GPIO_ACTIVE_LOW>), the transmitter will be
disabled.
=> ATMEL_US_USMODE_HWHS flag can be set for this platform ONLY IF the
CTS pin is not a GPIO.

So, the only case when ATMEL_US_USMODE_HWHS can be enabled is when
(atmel_use_fifo(port) &&
!mctrl_gpio_to_gpiod(atmel_port->gpios, UART_GPIO_CTS))

Tested on all Atmel USART controller flavours:
AT91SAM9G35-CM (DMAC flavour), AT91SAM9G20-EK (PDC flavour),
SAMA5D2xplained (FIFO flavour).

* the list may not be exhaustive

Cc: <stable@vger.kernel.org> #4.4+ (beware, missing atmel_port variable)
Fixes: 1cf6e8fc8341 ("tty/serial: at91: fix RTS line management when hardware handshake is enabled")
Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

perf/x86/intel: Honour the CPUID for number of fixed counters in hypervisors

perf doesn't seem to honour the number of fixed counters specified by CPUID
leaf 0xa. It always assumes that Intel CPUs have at least 3 fixed counters.

So if some of the fixed counters are masked out by the hypervisor, it still
tries to check/set them.

This patch makes perf behave nicer when the kernel is running under a
hypervisor that doesn't expose all the counters.

This patch contains some ideas from Matt Wilson.

Signed-off-by: Imre Palik <imrep@amazon.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Kozyrev <alexander.kozyrev@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Artyom Kuanbekov <artyom.kuanbekov@intel.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Wilson <msw@amazon.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1477037939-15605-1-git-send-email-imrep.amz@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/powerpc: Don't call perf_event_disable() from atomic context

The trinity syscall fuzzer triggered following WARN() on powerpc:

  WARNING: CPU: 9 PID: 2998 at arch/powerpc/kernel/hw_breakpoint.c:278
  ...
  NIP [c00000000093aedc] .hw_breakpoint_handler+0x28c/0x2b0
  LR [c00000000093aed8] .hw_breakpoint_handler+0x288/0x2b0
  Call Trace:
  [c0000002f7933580] [c00000000093aed8] .hw_breakpoint_handler+0x288/0x2b0 (unreliable)
  [c0000002f7933630] [c0000000000f671c] .notifier_call_chain+0x7c/0xf0
  [c0000002f79336d0] [c0000000000f6abc] .__atomic_notifier_call_chain+0xbc/0x1c0
  [c0000002f7933780] [c0000000000f6c40] .notify_die+0x70/0xd0
  [c0000002f7933820] [c00000000001a74c] .do_break+0x4c/0x100
  [c0000002f7933920] [c0000000000089fc] handle_dabr_fault+0x14/0x48

Followed by a lockdep warning:

  ===============================
  [ INFO: suspicious RCU usage. ]
  4.8.0-rc5+ #7 Tainted: G        W
  -------------------------------
  ./include/linux/rcupdate.h:556 Illegal context switch in RCU read-side critical section!

  other info that might help us debug this:

  rcu_scheduler_active = 1, debug_locks = 0
  2 locks held by ls/2998:
   #0:  (rcu_read_lock){......}, at: [<c0000000000f6a00>] .__atomic_notifier_call_chain+0x0/0x1c0
   #1:  (rcu_read_lock){......}, at: [<c00000000093ac50>] .hw_breakpoint_handler+0x0/0x2b0

  stack backtrace:
  CPU: 9 PID: 2998 Comm: ls Tainted: G        W       4.8.0-rc5+ #7
  Call Trace:
  [c0000002f7933150] [c00000000094b1f8] .dump_stack+0xe0/0x14c (unreliable)
  [c0000002f79331e0] [c00000000013c468] .lockdep_rcu_suspicious+0x138/0x180
  [c0000002f7933270] [c0000000001005d8] .___might_sleep+0x278/0x2e0
  [c0000002f7933300] [c000000000935584] .mutex_lock_nested+0x64/0x5a0
  [c0000002f7933410] [c00000000023084c] .perf_event_ctx_lock_nested+0x16c/0x380
  [c0000002f7933500] [c000000000230a80] .perf_event_disable+0x20/0x60
  [c0000002f7933580] [c00000000093aeec] .hw_breakpoint_handler+0x29c/0x2b0
  [c0000002f7933630] [c0000000000f671c] .notifier_call_chain+0x7c/0xf0
  [c0000002f79336d0] [c0000000000f6abc] .__atomic_notifier_call_chain+0xbc/0x1c0
  [c0000002f7933780] [c0000000000f6c40] .notify_die+0x70/0xd0
  [c0000002f7933820] [c00000000001a74c] .do_break+0x4c/0x100
  [c0000002f7933920] [c0000000000089fc] handle_dabr_fault+0x14/0x48

While it looks like the first WARN() is probably valid, the other one is
triggered by disabling event via perf_event_disable() from atomic context.

The event is disabled here in case we were not able to emulate
the instruction that hit the breakpoint. By disabling the event
we unschedule the event and make sure it's not scheduled back.

But we can't call perf_event_disable() from atomic context, instead
we need to use the event's pending_disable irq_work method to disable it.

Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161026094824.GA21397@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y kernel panic

CAI Qian reported a crash in the PMU uncore device removal code,
enabled by the CONFIG_DEBUG_TEST_DRIVER_REMOVE=y option:

https://marc.info/?l=linux-kernel&m=147688837328451

The reason for the crash is that perf_pmu_unregister() tries to remove
a PMU device which is not added at this point. We add PMU devices
only after pmu_bus is registered, which happens in the
perf_event_sysfs_init() call and sets the 'pmu_bus_running' flag.

The fix is to get the 'pmu_bus_running' flag state at the point
the PMU is taken out of the PMU list and remove the device
later only if it's set.

Reported-by: CAI Qian <caiqian@redhat.com>
Tested-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161020111011.GA13361@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/microcode/AMD: Fix more fallout from CONFIG_RANDOMIZE_MEMORY=y

We needed the physical address of the container in order to compute the
offset within the relocated ramdisk. And we did this by doing __pa() on
the virtual address.

However, __pa() does checks whether the physical address is within
PAGE_OFFSET and __START_KERNEL_map - see __phys_addr() - which fail
if we have CONFIG_RANDOMIZE_MEMORY enabled: we feed a virtual address
which *doesn't* have the randomization offset into a function which uses
PAGE_OFFSET which *does* have that offset.

This makes this check fire:

VIRTUAL_BUG_ON((x > y) || !phys_addr_valid(x));
^^^^^^

due to the randomization offset.

The fix is as simple as using __pa_nodebug() because we do that
randomization offset accounting later in that function ourselves.

Reported-by: Bob Peterson <rpeterso@redhat.com>
Tested-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm <linux-mm@kvack.org>
Cc: stable@vger.kernel.org # 4.9
Link: http://lkml.kernel.org/r/20161027123623.j2jri5bandimboff@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"20 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  drivers/misc/sgi-gru/grumain.c: remove bogus 0x prefix from printk
  cris/arch-v32: cryptocop: print a hex number after a 0x prefix
  ipack: print a hex number after a 0x prefix
  block: DAC960: print a hex number after a 0x prefix
  fs: exofs: print a hex number after a 0x prefix
  lib/genalloc.c: start search from start of chunk
  mm: memcontrol: do not recurse in direct reclaim
  CREDITS: update credit information for Martin Kepplinger
  proc: fix NULL dereference when reading /proc/<pid>/auxv
  mm: kmemleak: ensure that the task stack is not freed during scanning
  lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB
  latent_entropy: raise CONFIG_FRAME_WARN by default
  kconfig.h: remove config_enabled() macro
  ipc: account for kmem usage on mqueue and msg
  mm/slab: improve performance of gathering slabinfo stats
  mm: page_alloc: use KERN_CONT where appropriate
  mm/list_lru.c: avoid error-path NULL pointer deref
  h8300: fix syscall restarting
  kcov: properly check if we are in an interrupt
  mm/slab: fix kmemcg cache creation delayed issue

drivers/misc/sgi-gru/grumain.c: remove bogus 0x prefix from printk

Would like to have this be a decimal number.

Link: http://lkml.kernel.org/r/20161026134746.GA30169@sgi.com
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cris/arch-v32: cryptocop: print a hex number after a 0x prefix

It makes the result hard to interpret correctly if a base 10 number is
prefixed by 0x. So change to a hex number.

Link: http://lkml.kernel.org/r/20161026125658.25728-6-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipack: print a hex number after a 0x prefix

It makes the result hard to interpret correctly if a base 10 number is
prefixed by 0x. So change to a hex number.

Link: http://lkml.kernel.org/r/20161026125658.25728-4-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Cc: Jens Taprogge <jens.taprogge@taprogge.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

block: DAC960: print a hex number after a 0x prefix

It makes the message hard to interpret correctly if a base 10 number is
prefixed by 0x. So change to a hex number.

Link: http://lkml.kernel.org/r/20161026125658.25728-3-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs: exofs: print a hex number after a 0x prefix

It makes the message hard to interpret correctly if a base 10 number is
prefixed by 0x. So change to a hex number.

Link: http://lkml.kernel.org/r/20161026125658.25728-2-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Boaz Harrosh <ooo@electrozaur.com>
Cc: Benny Halevy <bhalevy@primarydata.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

lib/genalloc.c: start search from start of chunk

gen_pool_alloc_algo() iterates over the chunks of a pool trying to find
a contiguous block of memory that satisfies the allocation request.

The shortcut

if (size > atomic_read(&chunk->avail))
continue;

makes the loop skip over chunks that do not have enough bytes left to
fulfill the request.  There are two situations, though, where an
allocation might still fail:

(1) The available memory is not contiguous, i.e.  the request cannot
    be fulfilled due to external fragmentation.

(2) A race condition.  Another thread runs the same code concurrently
    and is quicker to grab the available memory.

In those situations, the loop calls pool->algo() to search the entire
chunk, and pool->algo() returns some value that is >= end_bit to
indicate that the search failed.  This return value is then assigned to
start_bit.  The variables start_bit and end_bit describe the range that
should be searched, and this range should be reset for every chunk that
is searched.  Today, the code fails to reset start_bit to 0.  As a
result, prefixes of subsequent chunks are ignored.  Memory allocations
might fail even though there is plenty of room left in these prefixes of
those other chunks.

Fixes: 7f184275aa30 ("lib, Make gen_pool memory allocator lockless")
Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.com
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: memcontrol: do not recurse in direct reclaim

On 4.0, we saw a stack corruption from a page fault entering direct
memory cgroup reclaim, calling into btrfs_releasepage(), which then
tried to allocate an extent and recursed back into a kmem charge ad
nauseam:

  [...]
  btrfs_releasepage+0x2c/0x30
  try_to_release_page+0x32/0x50
  shrink_page_list+0x6da/0x7a0
  shrink_inactive_list+0x1e5/0x510
  shrink_lruvec+0x605/0x7f0
  shrink_zone+0xee/0x320
  do_try_to_free_pages+0x174/0x440
  try_to_free_mem_cgroup_pages+0xa7/0x130
  try_charge+0x17b/0x830
  memcg_charge_kmem+0x40/0x80
  new_slab+0x2d9/0x5a0
  __slab_alloc+0x2fd/0x44f
  kmem_cache_alloc+0x193/0x1e0
  alloc_extent_state+0x21/0xc0
  __clear_extent_bit+0x2b5/0x400
  try_release_extent_mapping+0x1a3/0x220
  __btrfs_releasepage+0x31/0x70
  btrfs_releasepage+0x2c/0x30
  try_to_release_page+0x32/0x50
  shrink_page_list+0x6da/0x7a0
  shrink_inactive_list+0x1e5/0x510
  shrink_lruvec+0x605/0x7f0
  shrink_zone+0xee/0x320
  do_try_to_free_pages+0x174/0x440
  try_to_free_mem_cgroup_pages+0xa7/0x130
  try_charge+0x17b/0x830
  mem_cgroup_try_charge+0x65/0x1c0
  handle_mm_fault+0x117f/0x1510
  __do_page_fault+0x177/0x420
  do_page_fault+0xc/0x10
  page_fault+0x22/0x30

On later kernels, kmem charging is opt-in rather than opt-out, and that
particular kmem allocation in btrfs_releasepage() is no longer being
charged and won't recurse and overrun the stack anymore.

But it's not impossible for an accounted allocation to happen from the
memcg direct reclaim context, and we needed to reproduce this crash many
times before we even got a useful stack trace out of it.

Like other direct reclaimers, mark tasks in memcg reclaim PF_MEMALLOC to
avoid recursing into any other form of direct reclaim.  Then let
recursive charges from PF_MEMALLOC contexts bypass the cgroup limit.

Link: http://lkml.kernel.org/r/20161025141050.GA13019@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

CREDITS: update credit information for Martin Kepplinger

Content and employer changed.

Link: http://lkml.kernel.org/r/1477304102-28830-1-git-send-email-martin.kepplinger@ginzinger.com
Signed-off-by: Martin Kepplinger <martink@posteo.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

proc: fix NULL dereference when reading /proc/<pid>/auxv

Reading auxv of any kernel thread results in NULL pointer dereferencing
in auxv_read() where mm can be NULL.  Fix that by checking for NULL mm
and bailing out early.  This is also the original behavior changed by
recent commit c5317167854e ("proc: switch auxv to use of __mem_open()").

  # cat /proc/2/auxv
  Unable to handle kernel NULL pointer dereference at virtual address 000000a8
  Internal error: Oops: 17 [#1] PREEMPT SMP ARM
  CPU: 3 PID: 113 Comm: cat Not tainted 4.9.0-rc1-ARCH+ #1
  Hardware name: BCM2709
  task: ea3b0b00 task.stack: e99b2000
  PC is at auxv_read+0x24/0x4c
  LR is at do_readv_writev+0x2fc/0x37c
  Process cat (pid: 113, stack limit = 0xe99b2210)
  Call chain:
    auxv_read
    do_readv_writev
    vfs_readv
    default_file_splice_read
    splice_direct_to_actor
    do_splice_direct
    do_sendfile
    SyS_sendfile64
    ret_fast_syscall

Fixes: c5317167854e ("proc: switch auxv to use of __mem_open()")
Link: http://lkml.kernel.org/r/1476966200-14457-1-git-send-email-chianglungyu@gmail.com
Signed-off-by: Leon Yu <chianglungyu@gmail.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Janis Danisevskis <jdanis@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: kmemleak: ensure that the task stack is not freed during scanning

Commit 68f24b08ee89 ("sched/core: Free the stack early if
CONFIG_THREAD_INFO_IN_TASK") may cause the task->stack to be freed
during kmemleak_scan() execution, leading to either a NULL pointer fault
(if task->stack is NULL) or kmemleak accessing already freed memory.

This patch uses the new try_get_task_stack() API to ensure that the task
stack is not freed during kmemleak stack scanning.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=173901.

Fixes: 68f24b08ee89 ("sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK")
Link: http://lkml.kernel.org/r/1476266223-14325-1-git-send-email-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: CAI Qian <caiqian@redhat.com>
Tested-by: CAI Qian <caiqian@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: CAI Qian <caiqian@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB

KASAN uses stackdepot to memorize stacks for all kmalloc/kfree calls.
Current stackdepot capacity is 16MB (1024 top level entries x 4 pages on
second level).  Size of each stack is (num_frames + 3) * sizeof(long).
Which gives us ~84K stacks.  This capacity was chosen empirically and it
is enough to run kernel normally.

However, when lots of configs are enabled and a fuzzer tries to maximize
code coverage, it easily hits the limit within tens of minutes.  I've
tested for long a time with number of top level entries bumped 4x
(4096).  And I think I've seen overflow only once.  But I don't have all
configs enabled and code coverage has not reached maximum yet.  So bump
it 8x to 8192.

Since we have two-level table, memory cost of this is very moderate --
currently the top-level table is 8KB, with this patch it is 64KB, which
is negligible under KASAN.

Here is some approx math.

128MB allows us to memorize ~670K stacks (assuming stack is ~200b).
I've grepped kernel for kmalloc|kfree|kmem_cache_alloc|kmem_cache_free|
kzalloc|kstrdup|kstrndup|kmemdup and it gives ~60K matches.  Most of
alloc/free call sites are reachable with only one stack.  But some
utility functions can have large fanout.  Assuming average fanout is 5x,
total number of alloc/free stacks is ~300K.

Link: http://lkml.kernel.org/r/1476458416-122131-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

latent_entropy: raise CONFIG_FRAME_WARN by default

When building with the latent_entropy plugin, set the default
CONFIG_FRAME_WARN to 2048, since some __init functions have many basic
blocks that, when instrumented by the latent_entropy plugin, grow beyond
1024 byte stack size on 32-bit builds.

Link: http://lkml.kernel.org/r/20161018211216.GA39687@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kconfig.h: remove config_enabled() macro

The use of config_enabled() is ambiguous.  For config options,
IS_ENABLED(), IS_REACHABLE(), etc.  will make intention clearer.
Sometimes config_enabled() has been used for non-config options because
it is useful to check whether the given symbol is defined or not.

I have been tackling on deprecating config_enabled(), and now is the
time to finish this work.

Some new users have appeared for v4.9-rc1, but it is trivial to replace
them:

- arch/x86/mm/kaslr.c
  replace config_enabled() with IS_ENABLED() because
  CONFIG_X86_ESPFIX64 and CONFIG_EFI are boolean.

- include/asm-generic/export.h
  replace config_enabled() with __is_defined().

Then, config_enabled() can be removed now.

Going forward, please use IS_ENABLED(), IS_REACHABLE(), etc. for config
options, and __is_defined() for non-config symbols.

Link: http://lkml.kernel.org/r/1476616078-32252-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc: account for kmem usage on mqueue and msg

When kmem accounting switched from account by default to only account if
flagged by __GFP_ACCOUNT, IPC mqueue and messages was left out.

The production use case at hand is that mqueues should be customizable
via sysctls in Docker containers in a Kubernetes cluster. This can only
be safely allowed to the users of the cluster (without the risk that
they can cause resource shortage on a node, influencing other users'
containers) if all resources they control are bounded, i.e. accounted
for.

Link: http://lkml.kernel.org/r/1476806075-1210-1-git-send-email-arozansk@redhat.com
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Reported-by: Stefan Schimanski <sttts@redhat.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Stefan Schimanski <sttts@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/slab: improve performance of gathering slabinfo stats

On large systems, when some slab caches grow to millions of objects (and
many gigabytes), running 'cat /proc/slabinfo' can take up to 1-2
seconds.  During this time, interrupts are disabled while walking the
slab lists (slabs_full, slabs_partial, and slabs_free) for each node,
and this sometimes causes timeouts in other drivers (for instance,
Infiniband).

This patch optimizes 'cat /proc/slabinfo' by maintaining a counter for
total number of allocated slabs per node, per cache.  This counter is
updated when a slab is created or destroyed.  This enables us to skip
traversing the slabs_full list while gathering slabinfo statistics, and
since slabs_full tends to be the biggest list when the cache is large,
it results in a dramatic performance improvement.  Getting slabinfo
statistics now only requires walking the slabs_free and slabs_partial
lists, and those lists are usually much smaller than slabs_full.

We tested this after growing the dentry cache to 70GB, and the
performance improved from 2s to 5ms.

Link: http://lkml.kernel.org/r/1472517876-26814-1-git-send-email-aruna.ramakrishna@oracle.com
Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: page_alloc: use KERN_CONT where appropriate

Recent changes to printk require KERN_CONT uses to continue logging
messages. So add KERN_CONT where necessary.

[akpm@linux-foundation.org: coding-style fixes]
Fixes: 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines")
Link: http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git.joe@perches.com
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/list_lru.c: avoid error-path NULL pointer deref

As described in https://bugzilla.kernel.org/show_bug.cgi?id=177821:

After some analysis it seems to be that the problem is in alloc_super().
In case list_lru_init_memcg() fails it goes into destroy_super(), which
calls list_lru_destroy().

And in list_lru_init() we see that in case memcg_init_list_lru() fails,
lru->node is freed, but not set NULL, which then leads list_lru_destroy()
to believe it is initialized and call memcg_destroy_list_lru().
memcg_destroy_list_lru() in turn can access lru->node[i].memcg_lrus,
which is NULL.

[akpm@linux-foundation.org: add comment]
Signed-off-by: Alexander Polakov <apolyakov@beget.ru>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

h8300: fix syscall restarting

Back in commit f56141e3e2d9 ("all arches, signal: move restart_block to
struct task_struct"), all architectures and core code were changed to
use task_struct::restart_block. However, when h8300 support was
subsequently restored in v4.2, it was not updated to account for this,
and maintains thread_info::restart_block, which is not kept in sync.

This patch drops the redundant restart_block from thread_info, and moves
h8300 to the common one in task_struct, ensuring that syscall restarting
always works as expected.

Fixes: f56141e3e2d9 ("all arches, signal: move restart_block to struct task_struct")
Link: http://lkml.kernel.org/r/1476714934-11635-1-git-send-email-mark.rutland@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: uclinux-h8-devel@lists.sourceforge.jp
Cc: <stable@vger.kernel.org> [4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kcov: properly check if we are in an interrupt

in_interrupt() returns a nonzero value when we are either in an
interrupt or have bh disabled via local_bh_disable(). Since we are
interested in only ignoring coverage from actual interrupts, do a proper
check instead of just calling in_interrupt().

As a result of this change, kcov will start to collect coverage from
within local_bh_disable()/local_bh_enable() sections.

Link: http://lkml.kernel.org/r/1476115803-20712-1-git-send-email-andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Nicolai Stange <nicstange@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: James Morse <james.morse@arm.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/slab: fix kmemcg cache creation delayed issue

There is a bug report that SLAB makes extreme load average due to over
2000 kworker thread.

  https://bugzilla.kernel.org/show_bug.cgi?id=172981

This issue is caused by kmemcg feature that try to create new set of
kmem_caches for each memcg.  Recently, kmem_cache creation is slowed by
synchronize_sched() and futher kmem_cache creation is also delayed since
kmem_cache creation is synchronized by a global slab_mutex lock.  So,
the number of kworker that try to create kmem_cache increases quietly.

synchronize_sched() is for lockless access to node's shared array but
it's not needed when a new kmem_cache is created.  So, this patch rules
out that case.

Fixes: 801faf0db894 ("mm/slab: lockless decision to grow cache")
Link: http://lkml.kernel.org/r/1475734855-4837-1-git-send-email-iamjoonsoo.kim@lge.com
Reported-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

device-dax: fix percpu_ref_exit ordering

We need to wait until the percpu_ref is released before exit. Otherwise,
we sometimes lose the race and trigger this new warning that was added
in v4.9 (commit a67823c1ed10 "percpu-refcount: init ->confirm_switch
member properly"):

WARNING: CPU: 0 PID: 3629 at lib/percpu-refcount.c:107 percpu_ref_exit+0x51/0x60
[..]
Call Trace:
  [<ffffffff814bf093>] dump_stack+0x85/0xc2
  [<ffffffff810b15db>] __warn+0xcb/0xf0
  [<ffffffff810b170d>] warn_slowpath_null+0x1d/0x20
  [<ffffffff814d70c1>] percpu_ref_exit+0x51/0x60
  [<ffffffffa005706a>] dax_pmem_percpu_exit+0x1a/0x50 [dax_pmem]
  [<ffffffff81615f1f>] devm_action_release+0xf/0x20

Cc: <stable@vger.kernel.org>
Fixes: ab68f2622136 ("/dev/dax, pmem: direct access to persistent memory")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Allow KASAN and HOTPLUG_MEMORY to co-exist when doing build testing

No, KASAN may not be able to co-exist with HOTPLUG_MEMORY at runtime,
but for build testing there is no reason not to allow them together.

This hopefully means better build coverage and fewer embarrasing silly
problems like the one fixed by commit 9db4f36e82c2 ("mm: remove unused
variable in memory hotplug") in the future.

Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

nvdimm: make CONFIG_NVDIMM_DAX 'bool'

A bugfix just tried to address a randconfig build problem and introduced
a variant of the same problem: with CONFIG_LIBNVDIMM=y and
CONFIG_NVDIMM_DAX=m, the nvdimm module now fails to link:

drivers/nvdimm/built-in.o: In function `to_nd_device_type':
bus.c:(.text+0x1b5d): undefined reference to `is_nd_dax'
drivers/nvdimm/built-in.o: In function `nd_region_notify_driver_action.constprop.2':
region_devs.c:(.text+0x6b6c): undefined reference to `is_nd_dax'
region_devs.c:(.text+0x6b8c): undefined reference to `to_nd_dax'
drivers/nvdimm/built-in.o: In function `nd_region_probe':
region.c:(.text+0x70f3): undefined reference to `nd_dax_create'
drivers/nvdimm/built-in.o: In function `mode_show':
namespace_devs.c:(.text+0xa196): undefined reference to `is_nd_dax'
drivers/nvdimm/built-in.o: In function `nvdimm_namespace_common_probe':
(.text+0xa55f): undefined reference to `is_nd_dax'
drivers/nvdimm/built-in.o: In function `nvdimm_namespace_common_probe':
(.text+0xa56e): undefined reference to `to_nd_dax'

This reverts the earlier fix, making NVDIMM_DAX a 'bool' option again
as it should be (it gets linked into the libnvdimm module). To fix
the original problem, I'm adding a dependency on LIBNVDIMM to
DEV_DAX_PMEM, which ensures we can't have that one built-in if the
rest is a module.

Fixes: 4e65e9381c7a ("/dev/dax: fix Kconfig dependency build breakage")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>