]> git.proxmox.com Git - mirror_ubuntu-kernels.git/log
mirror_ubuntu-kernels.git
7 years agoandroid: binder: Refactor prev and next buffer into a helper function
Sherry Yang [Wed, 23 Aug 2017 15:46:39 +0000 (08:46 -0700)]
android: binder: Refactor prev and next buffer into a helper function

Use helper functions buffer_next and buffer_prev instead
of list_entry to get the next and previous buffers.

Signed-off-by: Sherry Yang <sherryy@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrivers/fmc: carrier can program FPGA on registration
Federico Vaga [Tue, 18 Jul 2017 06:33:24 +0000 (08:33 +0200)]
drivers/fmc: carrier can program FPGA on registration

The initial FPGA may require programming before it is useful.

Signed-off-by: Federico Vaga <federico.vaga@cern.ch>
Tested-by: Pat Riehecky <riehecky@fnal.gov>
Acked-by: Alessandro Rubini <rubini@gnudd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrivers/fmc: change registration prototype
Federico Vaga [Tue, 18 Jul 2017 06:33:13 +0000 (08:33 +0200)]
drivers/fmc: change registration prototype

Permit use of either fmc_device_register_n or fmc_device_register_n_gw
depending on the type of device in use.

Signed-off-by: Federico Vaga <federico.vaga@cern.ch>
Tested-by: Pat Riehecky <riehecky@fnal.gov>
Acked-by: Alessandro Rubini <rubini@gnudd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrivers/fmc: The only way to dump the SDB is from debugfs
Federico Vaga [Tue, 18 Jul 2017 06:33:03 +0000 (08:33 +0200)]
drivers/fmc: The only way to dump the SDB is from debugfs

Driver should not call fmc_sdb_dump() anymore. (actually they can but the
operation is not supported, so it will print an error message)

Signed-off-by: Federico Vaga <federico.vaga@cern.ch>
Tested-by: Pat Riehecky <riehecky@fnal.gov>
Acked-by: Alessandro Rubini <rubini@gnudd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrivers/fmc: hide fmc operations behind helpers
Federico Vaga [Tue, 18 Jul 2017 06:32:53 +0000 (08:32 +0200)]
drivers/fmc: hide fmc operations behind helpers

This gave us more freedom to change/add/remove operations without
recompiling all device driver.

Typically, Carrier board implement the fmc operations, so they will not
use these helpers.

Signed-off-by: Federico Vaga <federico.vaga@cern.ch>
Tested-by: Pat Riehecky <riehecky@fnal.gov>
Acked-by: Alessandro Rubini <rubini@gnudd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrivers/fmc: remove unused variable
Federico Vaga [Tue, 18 Jul 2017 06:32:42 +0000 (08:32 +0200)]
drivers/fmc: remove unused variable

Signed-off-by: Federico Vaga <federico.vaga@cern.ch>
Tested-by: Pat Riehecky <riehecky@fnal.gov>
Acked-by: Alessandro Rubini <rubini@gnudd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agothunderbolt: Fix reset response_type
Dan Carpenter [Wed, 16 Aug 2017 08:54:17 +0000 (11:54 +0300)]
thunderbolt: Fix reset response_type

There is a mistake here where we accidentally use sizeof(TB_CFG_PKG_RESET)
instead of just TB_CFG_PKG_RESET.  The size of an int is 4 so it's the
same as TB_CFG_PKG_NOTIFY_ACK.

Fixes: d7f781bfdbf4 ("thunderbolt: Rework control channel to be more reliable")
Reported-by: Colin King <colin.king@canonical.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: stable <stable@vger.kernel.org> # 4.13
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agothunderbolt: Allow clearing the key
Bernat, Yehezkel [Tue, 15 Aug 2017 05:19:20 +0000 (08:19 +0300)]
thunderbolt: Allow clearing the key

If secure authentication of a devices fails, either because the device
already has another key uploaded, or there is some other error sending
challenge to the device, and the user only wants to approve the device
just once (without a new key being uploaded to the device) the current
implementation does not allow this because the key cannot be cleared
once set even if we allow it to be changed.

Make this scenario possible and allow clearing the key by writing
empty string to the key sysfs file.

Signed-off-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agothunderbolt: Make key root-only accessible
Bernat, Yehezkel [Tue, 15 Aug 2017 05:19:12 +0000 (08:19 +0300)]
thunderbolt: Make key root-only accessible

Non-root user may read the key back after root wrote it there.
This removes read access to everyone but root.

Signed-off-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agothunderbolt: Remove superfluous check
Bernat, Yehezkel [Tue, 15 Aug 2017 05:19:01 +0000 (08:19 +0300)]
thunderbolt: Remove superfluous check

The key size is tested by hex2bin() already (as '\0' isn't an hex digit)

Suggested-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: constify amba_id
Arvind Yadav [Thu, 24 Aug 2017 16:35:57 +0000 (22:05 +0530)]
coresight: constify amba_id

amba_id are not supposed to change at runtime. All functions
working with const amba_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: etb10: constify amba_id
Arvind Yadav [Thu, 24 Aug 2017 16:35:58 +0000 (22:05 +0530)]
coresight: etb10: constify amba_id

amba_id are not supposed to change at runtime. All functions
working with const amba_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: etm3x: constify amba_id
Arvind Yadav [Thu, 24 Aug 2017 16:35:59 +0000 (22:05 +0530)]
coresight: etm3x: constify amba_id

amba_id are not supposed to change at runtime. All functions
working with const amba_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: etm4x: constify amba_id
Arvind Yadav [Thu, 24 Aug 2017 16:36:00 +0000 (22:06 +0530)]
coresight: etm4x: constify amba_id

amba_id are not supposed to change at runtime. All functions
working with const amba_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: funnel: constify amba_id
Arvind Yadav [Thu, 24 Aug 2017 16:36:01 +0000 (22:06 +0530)]
coresight: funnel: constify amba_id

amba_id are not supposed to change at runtime. All functions
working with const amba_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: replicator: constify amba_id
Arvind Yadav [Thu, 24 Aug 2017 16:36:02 +0000 (22:06 +0530)]
coresight: replicator: constify amba_id

amba_id are not supposed to change at runtime. All functions
working with const amba_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: stm: constify amba_id
Arvind Yadav [Thu, 24 Aug 2017 16:36:03 +0000 (22:06 +0530)]
coresight: stm: constify amba_id

amba_id are not supposed to change at runtime. All functions
working with const amba_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: tmc: constify amba_id
Arvind Yadav [Thu, 24 Aug 2017 16:36:04 +0000 (22:06 +0530)]
coresight: tmc: constify amba_id

amba_id are not supposed to change at runtime. All functions
working with const amba_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: tpiu: constify amba_id
Arvind Yadav [Thu, 24 Aug 2017 16:36:05 +0000 (22:06 +0530)]
coresight: tpiu: constify amba_id

amba_id are not supposed to change at runtime. All functions
working with const amba_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: STM: Clean up __iomem type usage
Stephen Boyd [Wed, 2 Aug 2017 16:22:20 +0000 (10:22 -0600)]
coresight: STM: Clean up __iomem type usage

The casting and other things here is odd, and causes sparse to
complain:

drivers/hwtracing/coresight/coresight-stm.c:279:35: warning: incorrect type in argument 1 (different address spaces)
drivers/hwtracing/coresight/coresight-stm.c:279:35:    expected void [noderef] <asn:2>*addr
drivers/hwtracing/coresight/coresight-stm.c:279:35:    got struct stm_drvdata *drvdata
drivers/hwtracing/coresight/coresight-stm.c:327:17: warning: incorrect type in argument 2 (different address spaces)
drivers/hwtracing/coresight/coresight-stm.c:327:17:    expected void volatile [noderef] <asn:2>*addr
drivers/hwtracing/coresight/coresight-stm.c:327:17:    got void *addr
drivers/hwtracing/coresight/coresight-stm.c:330:17: warning: incorrect type in argument 2 (different address spaces)
drivers/hwtracing/coresight/coresight-stm.c:330:17:    expected void volatile [noderef] <asn:2>*addr
drivers/hwtracing/coresight/coresight-stm.c:330:17:    got void *addr
drivers/hwtracing/coresight/coresight-stm.c:333:17: warning: incorrect type in argument 2 (different address spaces)
drivers/hwtracing/coresight/coresight-stm.c:333:17:    expected void volatile [noderef] <asn:2>*addr
drivers/hwtracing/coresight/coresight-stm.c:333:17:    got void *addr

>From what I can tell, we don't really need to treat ch_addr as
anything besides a pointer, and we can just do pointer math
instead of ORing in the bits of the offset and achieve the same
thing.

Also, we were passing a drvdata pointer to the
coresight_timeout() function, but we really wanted to pass the
address of the register base. Luckily the base is the first
member of the structure, so everything works out, but this is
quite unsafe if we ever change the structure layout. Clean this
all up so sparse stops complaining on this code.

Reported-by: Satyajit Desai <sadesai@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: Add support for Coresight SoC 600 components
Suzuki K Poulose [Wed, 2 Aug 2017 16:22:18 +0000 (10:22 -0600)]
coresight: Add support for Coresight SoC 600 components

Add the peripheral ids for the Coresight SoC 600 TPIU, replicator
and funnel.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight tmc: Add support for Coresight SoC 600 TMC
Suzuki K Poulose [Wed, 2 Aug 2017 16:22:17 +0000 (10:22 -0600)]
coresight tmc: Add support for Coresight SoC 600 TMC

The coresight SoC 600 supports ETR save-restore which allows us
to restore a trace session by retaining the RRP/RWP/STS.Full values
when the TMC leaves the Disabled state. However, the TMC doesn't
have a scatter-gather unit in built.

Also, TMCs have different PIDs in different configurations (ETF,
ETB & ETR), unlike the previous generation.

While the DEVID exposes some of the features/changes in the TMC,
it doesn't explicitly advertises the new save-restore feature
as described above.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight tmc: Support for save-restore in ETR
Suzuki K Poulose [Wed, 2 Aug 2017 16:22:16 +0000 (10:22 -0600)]
coresight tmc: Support for save-restore in ETR

The Coresight SoC 600 TMC ETR supports save-restore feature,
where the values of the RRP/RWP and STS.Full are retained
when it leaves the Disabled state. Hence, we must program the
RRP/RWP and STS.Full to a proper value. For now, set the RRP/RWP
to the base address of the buffer and clear the STS.Full register.
This can be later exploited for proper save-restore of ETR
trace contexts (e.g, perf).

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight tmc etr: Setup AXI cache encoding for read transfers
Suzuki K Poulose [Wed, 2 Aug 2017 16:22:15 +0000 (10:22 -0600)]
coresight tmc etr: Setup AXI cache encoding for read transfers

If the ETR supports split cache encoding (i.e, separate bits for
read and write transfers) unlike the older version (where read
and write transfers use the same encoding in AXICTL[2-5]).
This feature is not advertised and has to be described by the
static mask associated with the device id.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight tmc etr: Cleanup AXICTL register handling
Suzuki K Poulose [Wed, 2 Aug 2017 16:22:14 +0000 (10:22 -0600)]
coresight tmc etr: Cleanup AXICTL register handling

This patch cleans up how we setup the AXICTL register on
TMC ETR. At the moment we don't set the CacheCtrl bits, which
drives the arcache and awcache bits on AXI bus specifying the
cacheablitiy. Set this to Write-back Read and Write-allocate.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight tmc etr: Detect address width at runtime
Suzuki K Poulose [Wed, 2 Aug 2017 16:22:13 +0000 (10:22 -0600)]
coresight tmc etr: Detect address width at runtime

TMC in Coresight SoC-600 advertises the AXI address width
in the device configuration register.

Bit 16 - AXIAW_VALID
 0 - AXI Address Width not valid
 1 - Valid AXI Address width in Bits[23-17]

Bits [23-17] - AXIAW. If AXIAW_VALID = b01 then
 0x20 - 32bit AXI address bus
 0x28 - 40bit AXI address bus
 0x2c - 44bit AXI address bus
 0x30 - 48bit AXI address bus
 0x34 - 52bit AXI address bus

Use the address bits from the device configuration register, if
available. Otherwise, default to 40bit.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight tmc: Detect support for scatter gather
Suzuki K Poulose [Wed, 2 Aug 2017 16:22:12 +0000 (10:22 -0600)]
coresight tmc: Detect support for scatter gather

The SG unit in the TMC has been removed in Coresight SoC-600.
This is however advertised by DEVID:Bit 24 = 0b1. On the
previous generation, the bit is RES0, hence we can rely on the
DEVID to detect the support.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight tmc etr: Add capabilitiy information
Suzuki K Poulose [Wed, 2 Aug 2017 16:22:11 +0000 (10:22 -0600)]
coresight tmc etr: Add capabilitiy information

With new version of TMC ETR, there are differing set of
features supported by the TMC. Add the capability of a
given TMC ETR for making safer decisions at runtime.

The device configuration register of the TMC (DEVID) lists
some of the capabilities. So, we can detect some of them at
probe. However, some of the features (or changes in behavior)
are not advertised and we have to depend on the PID to infer
the features. So we use a static description of the "unadvertised"
capabilities attached to the PID. Combining both, the static
and the dynamic capabilities, we maintain a bitmask of the
available features which can be later checked to take
appropriate actions.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight tmc: Handle configuration types properly
Suzuki K Poulose [Wed, 2 Aug 2017 16:22:10 +0000 (10:22 -0600)]
coresight tmc: Handle configuration types properly

Coresight SoC 600 defines a new configuration for TMC, Embedded Trace
Streamer (ETS), indicated by 0x3 in MODE:CONFIG_TYPE. This would break
the existing driver which will treat anything other than ETR/ETB as an
ETF. Fix the driver to check the configuration type properly and also
add a warning if we encounter an unsupported configuration (ETS).

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight replicator: Expose replicator management registers
Suzuki K Poulose [Wed, 2 Aug 2017 16:22:09 +0000 (10:22 -0600)]
coresight replicator: Expose replicator management registers

Expose the idfilter* registers of the programmable replicator.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight tmc: Expose DBA and AXICTL
Suzuki K Poulose [Wed, 2 Aug 2017 16:22:08 +0000 (10:22 -0600)]
coresight tmc: Expose DBA and AXICTL

Expose DBALO,DBAHI and AXICTL registers

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight tmc: Add helpers for accessing 64bit registers
Suzuki K Poulose [Wed, 2 Aug 2017 16:22:07 +0000 (10:22 -0600)]
coresight tmc: Add helpers for accessing 64bit registers

Coresight TMC splits 64bit registers into a pair of 32bit registers
(e.g DBA, RRP, RWP). Provide helpers to read/write to these registers.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: Use the new helper for defining registers
Suzuki K Poulose [Wed, 2 Aug 2017 16:22:06 +0000 (10:22 -0600)]
coresight: Use the new helper for defining registers

Use the new helpers for exposing coresight component registers,
choosing the 64bit variants for appropriate registers.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: Add support for reading 64bit registers
Suzuki K Poulose [Wed, 2 Aug 2017 16:22:05 +0000 (10:22 -0600)]
coresight: Add support for reading 64bit registers

Add support for reading a lower and upper 32bits of a register
as a single 64bit register. Also add simplified macros for
direct register accesses.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight replicator: Cleanup programmable replicator naming
Suzuki K Poulose [Wed, 2 Aug 2017 16:22:04 +0000 (10:22 -0600)]
coresight replicator: Cleanup programmable replicator naming

The Linux coresight drivers define the programmable ATB replicator as
Qualcomm replicator, while this is designed by ARM. This can cause
confusion to a user selecting the driver. Cleanup all references to
make it explicitly clear. This patch :

 1) Replace the compatible string for the replicator :
      qcom,coresight-replicator1x => arm,coresight-dynamic-replicator
 2) Changes the Kconfig symbol (since this is not part of any defconfigs)
     CORESIGHT_QCOM_REPLICATOR => CORESIGHT_DYNAMIC_REPLICATOR
 3) Improves the help message in the Kconfig.
 4) Changes the name of the driver and the file :
      coresight-replicator-qcom => coresight-dynamic-replicator

Cc: Pratik Patel <pratikp@codeaurora.org>
Cc: Ivan T. Ivanov <ivan.ivanov@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: devicetree@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Rob Herring <robh+dt@kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: etm4x: Adds trace return stack option programming for ETMv4.
Mike Leach [Wed, 2 Aug 2017 16:22:03 +0000 (10:22 -0600)]
coresight: etm4x: Adds trace return stack option programming for ETMv4.

Adds handling to program the return stack option into ETMv4 hardware if
specified in the perf command line.

If option is not supported by the hardware then it will be ignored.
This allows capture to move between core/ETM combinations that have the
hardware support to those that do not.

Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: ptm: Adds trace return stack option programming for PTM.
Mike Leach [Wed, 2 Aug 2017 16:22:02 +0000 (10:22 -0600)]
coresight: ptm: Adds trace return stack option programming for PTM.

Adds handling to program the return stack option into PTM hardware if
specified in the perf command line.

If option is not supported by the hardware then it will be ignored.
This allows capture to move between core/ETM combinations that have the
hardware support to those that do not.

Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: pmu: Adds return stack option to perf coresight pmu
Mike Leach [Wed, 2 Aug 2017 16:22:01 +0000 (10:22 -0600)]
coresight: pmu: Adds return stack option to perf coresight pmu

Return stack is a programmable option on some ETM and PTM hardware.
Adds the option flags to enable this from the perf event command line.

Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agohwtracing: coresight: constify attribute_group structures.
Arvind Yadav [Wed, 2 Aug 2017 16:22:00 +0000 (10:22 -0600)]
hwtracing: coresight: constify attribute_group structures.

attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with const
attribute_group. So mark the non-const structs as const.

File size before:
  text    data     bss     dec     hex filename
   2573     288     296    3157     c55 coresight-etm-perf.o

File size After adding 'const':
   text    data     bss     dec     hex filename
   2613     224     296    3133     c3d coresight-etm-perf.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: etm3x: Set synchronisation frequencty to TRM default
Mathieu Poirier [Wed, 2 Aug 2017 16:21:59 +0000 (10:21 -0600)]
coresight: etm3x: Set synchronisation frequencty to TRM default

Register ETMSYNCFR holds the number of by that need to be generated before
periodic synchronisation packets are inserted in the trace stream.  By
zeroing out the config structure, the current code effectively disable
periodic synchronization.

This patch simply initialise the recommended value for this register as
specified in the technical reference manual.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: etb10: Move etb_disable_hw() outside of lock
Mathieu Poirier [Wed, 2 Aug 2017 16:21:58 +0000 (10:21 -0600)]
coresight: etb10: Move etb_disable_hw() outside of lock

Function etb_disable_hw() is already taking care of unlocking and locking
the coresight access register and as such doesn't need to be placed
within the unlock/lock of function etb_update_buffer().

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: Add barrier packet for synchronisation
Mathieu Poirier [Wed, 2 Aug 2017 16:21:57 +0000 (10:21 -0600)]
coresight: Add barrier packet for synchronisation

When a buffer overflow happens the synchronisation patckets usually
present at the beginning of the buffer are lost, a situation that
prevents the decoder from knowing the context of the traces being
decoded.

This patch adds a barrier packet to be used by sink IPs when a buffer
overflow condition is detected.  These barrier packets are then used
by the decoding library as markers to force re-synchronisation.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: etb10: Remove useless conversion to LE
Mathieu Poirier [Wed, 2 Aug 2017 16:21:56 +0000 (10:21 -0600)]
coresight: etb10: Remove useless conversion to LE

Internal CoreSight components are rendering trace data in little-endian
format.  As such there is no need to convert the data once more, hence
removing the extra step.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocoresight: Correct buffer lost increment
Mathieu Poirier [Wed, 2 Aug 2017 16:21:55 +0000 (10:21 -0600)]
coresight: Correct buffer lost increment

Many conditions may cause synchronisation to be lost when updating
the perf ring buffer but the end result is still the same: synchronisation
is lost.  As such there is no need to increment the lost count for each
condition, just once will suffice.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agospmi: pmic-arb: Move the ownership check to irq_chip callback
Kiran Gunda [Wed, 23 Aug 2017 12:46:26 +0000 (18:16 +0530)]
spmi: pmic-arb: Move the ownership check to irq_chip callback

Check the irq ownership in the irq_request_resources callback
instead of checking it during the irq mapping. This can prevent
installing the flow handler for the interrupt that is not owned by the EE
and allow the irq translation during the gpio driver probe.

Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Tested-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agospmi: Convert to using %pOF instead of full_name
Rob Herring [Tue, 18 Jul 2017 21:43:32 +0000 (16:43 -0500)]
spmi: Convert to using %pOF instead of full_name

Now that we have a custom printf format specifier, convert users of
full_name to use %pOF instead. This is preparation to remove storing
of the full path string for each node.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agospmi: pmic-arb: Remove checking opc value not less than 0
Fenglin Wu [Fri, 28 Jul 2017 07:10:47 +0000 (12:40 +0530)]
spmi: pmic-arb: Remove checking opc value not less than 0

The opc parameter in pmic_arb_write_cmd() function is defined with type
u8 and it's always greater than or equal to 0. Checking that it's not
less than 0 is redundant and it can cause a forbidden warning during
compilation. Remove the check.

Signed-off-by: Fenglin Wu <fenglinw@codeaurora.org>
Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agospmi: pmic-arb: add support for HW version 5
David Collins [Fri, 28 Jul 2017 07:10:46 +0000 (12:40 +0530)]
spmi: pmic-arb: add support for HW version 5

Add support for version 5 of the SPMI PMIC arbiter.  It utilizes
different offsets for registers than those found on version 3.
Also, the procedure to determine if writing and IRQ access is
allowed for a given PPID changes for version 5.

Signed-off-by: David Collins <collinsd@codeaurora.org>
Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agospmi: pmic-arb: fix a possible null pointer dereference
Kiran Gunda [Fri, 28 Jul 2017 07:10:45 +0000 (12:40 +0530)]
spmi: pmic-arb: fix a possible null pointer dereference

If "core" memory resource is not specified, then the driver could
end up dereferencing a null pointer. Fix this issue.

Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agospmi: pmic-arb: return __iomem pointer instead of offset
Kiran Gunda [Fri, 28 Jul 2017 07:10:44 +0000 (12:40 +0530)]
spmi: pmic-arb: return __iomem pointer instead of offset

Modify the pmic_arb version ops to return an __iomem pointer
to the address instead of an offset. That way we do not need to
care about the base address changes in the new HW version.

Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agospmi: pmic-arb: use irq_chip callback to set spmi irq wakeup capability
Kiran Gunda [Fri, 28 Jul 2017 07:10:43 +0000 (12:40 +0530)]
spmi: pmic-arb: use irq_chip callback to set spmi irq wakeup capability

Currently the driver sets the pmic arbiter core interrupt as wakeup capable
irrespective of the child irqs which causes the system to wakeup
unnecessarily. To fix this, set the core interrupt as wakeup capable
only if any of the child irqs request for it. Do this by marking it as
wakeup capable in the irq_set_wake callback.

Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agospmi: pmic-arb: return the value instead of passing by pointer
Kiran Gunda [Fri, 28 Jul 2017 07:10:42 +0000 (12:40 +0530)]
spmi: pmic-arb: return the value instead of passing by pointer

Returning the output value from a function, when it is possible, is the
better and cleaner way than passing it by the pointer. Hence, modify
the ppid_to_apid mapping function to return apid instead of passing
it by a pointer. While at it, pass the ppid as function parameter to
ppid_to_apid mapping function instead of passing the sid and addr.

Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agospmi: pmic-arb: replace the writel_relaxed with __raw_writel
Kiran Gunda [Fri, 28 Jul 2017 07:10:41 +0000 (12:40 +0530)]
spmi: pmic-arb: replace the writel_relaxed with __raw_writel

Replace the writel_relaxed with __raw_writel to avoid byte swapping
in pmic_arb_write_data() function. That way the code is independent
of the CPU endianness.

Fixes: 111a10bf3e53 ("spmi: pmic-arb: rename spmi_pmic_arb_dev to
spmi_pmic_arb")
Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agospmi: pmic-arb: fix memory allocation for mapping_table
Kiran Gunda [Fri, 28 Jul 2017 07:10:40 +0000 (12:40 +0530)]
spmi: pmic-arb: fix memory allocation for mapping_table

Allocate the correct memory size (max_pmic_peripherals) for the
mapping_table that holds the apid to ppid mapping. Also use a local
variable for mapping_table for better alignment of the code.

Fixes: 987a9f128b8a ("spmi: pmic-arb: Support more than 128 peripherals")
Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agospmi: pmic-arb: optimize qpnpint_irq_set_type function
Kiran Gunda [Fri, 28 Jul 2017 07:10:39 +0000 (12:40 +0530)]
spmi: pmic-arb: optimize qpnpint_irq_set_type function

Optimize the qpnpint_irq_set_type() by using a local variable
to hold the handler type. Also clean up other variable usage.

Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agospmi: pmic-arb: clean up pmic_arb_find_apid function
Kiran Gunda [Fri, 28 Jul 2017 07:10:38 +0000 (12:40 +0530)]
spmi: pmic-arb: clean up pmic_arb_find_apid function

Clean up the pmic_arb_find_apid() by using the local
variables to improve the code readability.

Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agospmi: pmic-arb: rename pa_xx to pmic_arb_xx and other cleanup
Kiran Gunda [Fri, 28 Jul 2017 07:10:37 +0000 (12:40 +0530)]
spmi: pmic-arb: rename pa_xx to pmic_arb_xx and other cleanup

This patch cleans up the following.

- Rename the "pa" to "pmic_arb".
- Rename the spmi_pmic_arb *dev to spmi_pmic_arb *pmic_arb.
- Rename the pa_{read,write}_data() functions to
  pmic_arb_{read,write}_data().
- Rename channel to APID.
- Rename the HWIRQ_*() macros to hwirq_to_*().

Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agospmi: pmic-arb: remove the read/write access checks
Kiran Gunda [Fri, 28 Jul 2017 07:10:36 +0000 (12:40 +0530)]
spmi: pmic-arb: remove the read/write access checks

The access mode checks for peripheral ownership for read/write
permissions should not be required. Every peripheral enabled for
this master is expected to have a read/write permissions. If there
is any such invalid access due to wrong configuration in boot loader
or device tree files, then it should be fixed in those locations.
Hence, remove the access mode checks from the driver.

Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomei: make device_type const
Bhumika Goyal [Sat, 19 Aug 2017 08:22:16 +0000 (13:52 +0530)]
mei: make device_type const

Make this const as it is only stored in the type field of a device
structure, which is const.
Done using Coccinelle.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoRevert "staging: Fix build issues with new binder API"
Jisheng Zhang [Fri, 25 Aug 2017 00:39:51 +0000 (08:39 +0800)]
Revert "staging: Fix build issues with new binder API"

This reverts commit d0bdff0db809 ("staging: Fix build issues with new
binder API"), because commit e38361d032f1 ("ARM: 8091/2: add get_user()
support for 8 byte types") has added the 64bit __get_user_asm_*
implementation.

Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoMerge 4.13-rc7 into char-misc-next
Greg Kroah-Hartman [Mon, 28 Aug 2017 08:19:01 +0000 (10:19 +0200)]
Merge 4.13-rc7 into char-misc-next

We want the binder fix in here as well for testing and merge issues.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoLinux 4.13-rc7
Linus Torvalds [Mon, 28 Aug 2017 00:20:40 +0000 (17:20 -0700)]
Linux 4.13-rc7

7 years agoMerge tag 'iommu-fixes-v4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 28 Aug 2017 00:10:34 +0000 (17:10 -0700)]
Merge tag 'iommu-fixes-v4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU fix from Joerg Roedel:
 "Another fix, this time in common IOMMU sysfs code.

  In the conversion from the old iommu sysfs-code to the
  iommu_device_register interface, I missed to update the release path
  for the struct device associated with an IOMMU. It freed the 'struct
  device', which was a pointer before, but is now embedded in another
  struct.

  Freeing from the middle of allocated memory had all kinds of nasty
  side effects when an IOMMU was unplugged. Unfortunatly nobody
  unplugged and IOMMU until now, so this was not discovered earlier. The
  fix is to make the 'struct device' a pointer again"

* tag 'iommu-fixes-v4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu: Fix wrong freeing of iommu_device->dev

7 years agoMerge tag 'char-misc-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Mon, 28 Aug 2017 00:08:37 +0000 (17:08 -0700)]
Merge tag 'char-misc-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc fix from Greg KH:
 "Here is a single misc driver fix for 4.13-rc7. It resolves a reported
  problem in the Android binder driver due to previous patches in
  4.13-rc.

  It's been in linux-next with no reported issues"

* tag 'char-misc-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  ANDROID: binder: fix proc->tsk check.

7 years agoMerge tag 'staging-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Mon, 28 Aug 2017 00:03:33 +0000 (17:03 -0700)]
Merge tag 'staging-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging/iio fixes from Greg KH:
 "Here are few small staging driver fixes, and some more IIO driver
  fixes for 4.13-rc7. Nothing major, just resolutions for some reported
  problems.

  All of these have been in linux-next with no reported problems"

* tag 'staging-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  iio: magnetometer: st_magn: remove ihl property for LSM303AGR
  iio: magnetometer: st_magn: fix status register address for LSM303AGR
  iio: hid-sensor-trigger: Fix the race with user space powering up sensors
  iio: trigger: stm32-timer: fix get trigger mode
  iio: imu: adis16480: Fix acceleration scale factor for adis16480
  PATCH] iio: Fix some documentation warnings
  staging: rtl8188eu: add RNX-N150NUB support
  Revert "staging: fsl-mc: be consistent when checking strcmp() return"
  iio: adc: stm32: fix common clock rate
  iio: adc: ina219: Avoid underflow for sleeping time
  iio: trigger: stm32-timer: add enable attribute
  iio: trigger: stm32-timer: fix get/set down count direction
  iio: trigger: stm32-timer: fix write_raw return value
  iio: trigger: stm32-timer: fix quadrature mode get routine
  iio: bmp280: properly initialize device for humidity reading

7 years agoMerge tag 'ntb-4.13-bugfixes' of git://github.com/jonmason/ntb
Linus Torvalds [Mon, 28 Aug 2017 00:01:54 +0000 (17:01 -0700)]
Merge tag 'ntb-4.13-bugfixes' of git://github.com/jonmason/ntb

Pull NTB fixes from Jon Mason:
 "NTB bug fixes to address an incorrect ntb_mw_count reference in the
  NTB transport, improperly bringing down the link if SPADs are
  corrupted, and an out-of-order issue regarding link negotiation and
  data passing"

* tag 'ntb-4.13-bugfixes' of git://github.com/jonmason/ntb:
  ntb: ntb_test: ensure the link is up before trying to configure the mws
  ntb: transport shouldn't disable link due to bogus values in SPADs
  ntb: use correct mw_count function in ntb_tool and ntb_transport

7 years agoAvoid page waitqueue race leaving possible page locker waiting
Linus Torvalds [Sun, 27 Aug 2017 23:25:09 +0000 (16:25 -0700)]
Avoid page waitqueue race leaving possible page locker waiting

The "lock_page_killable()" function waits for exclusive access to the
page lock bit using the WQ_FLAG_EXCLUSIVE bit in the waitqueue entry
set.

That means that if it gets woken up, other waiters may have been
skipped.

That, in turn, means that if it sees the page being unlocked, it *must*
take that lock and return success, even if a lethal signal is also
pending.

So instead of checking for lethal signals first, we need to check for
them after we've checked the actual bit that we were waiting for.  Even
if that might then delay the killing of the process.

This matches the order of the old "wait_on_bit_lock()" infrastructure
that the page locking used to use (and is still used in a few other
areas).

Note that if we still return an error after having unsuccessfully tried
to acquire the page lock, that is ok: that means that some other thread
was able to get ahead of us and lock the page, and when that other
thread then unlocks the page, the wakeup event will be repeated.  So any
other pending waiters will now get properly woken up.

Fixes: 62906027091f ("mm: add PageWaiters indicating tasks are waiting for a page bit")
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jan Kara <jack@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoMinor page waitqueue cleanups
Linus Torvalds [Sun, 27 Aug 2017 20:55:12 +0000 (13:55 -0700)]
Minor page waitqueue cleanups

Tim Chen and Kan Liang have been battling a customer load that shows
extremely long page wakeup lists.  The cause seems to be constant NUMA
migration of a hot page that is shared across a lot of threads, but the
actual root cause for the exact behavior has not been found.

Tim has a patch that batches the wait list traversal at wakeup time, so
that we at least don't get long uninterruptible cases where we traverse
and wake up thousands of processes and get nasty latency spikes.  That
is likely 4.14 material, but we're still discussing the page waitqueue
specific parts of it.

In the meantime, I've tried to look at making the page wait queues less
expensive, and failing miserably.  If you have thousands of threads
waiting for the same page, it will be painful.  We'll need to try to
figure out the NUMA balancing issue some day, in addition to avoiding
the excessive spinlock hold times.

That said, having tried to rewrite the page wait queues, I can at least
fix up some of the braindamage in the current situation. In particular:

 (a) we don't want to continue walking the page wait list if the bit
     we're waiting for already got set again (which seems to be one of
     the patterns of the bad load).  That makes no progress and just
     causes pointless cache pollution chasing the pointers.

 (b) we don't want to put the non-locking waiters always on the front of
     the queue, and the locking waiters always on the back.  Not only is
     that unfair, it means that we wake up thousands of reading threads
     that will just end up being blocked by the writer later anyway.

Also add a comment about the layout of 'struct wait_page_key' - there is
an external user of it in the cachefiles code that means that it has to
match the layout of 'struct wait_bit_key' in the two first members.  It
so happens to match, because 'struct page *' and 'unsigned long *' end
up having the same values simply because the page flags are the first
member in struct page.

Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Christopher Lameter <cl@linux.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoClarify (and fix) MAX_LFS_FILESIZE macros
Linus Torvalds [Sun, 27 Aug 2017 19:12:25 +0000 (12:12 -0700)]
Clarify (and fix) MAX_LFS_FILESIZE macros

We have a MAX_LFS_FILESIZE macro that is meant to be filled in by
filesystems (and other IO targets) that know they are 64-bit clean and
don't have any 32-bit limits in their IO path.

It turns out that our 32-bit value for that limit was bogus.  On 32-bit,
the VM layer is limited by the page cache to only 32-bit index values,
but our logic for that was confusing and actually wrong.  We used to
define that value to

(((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)

which is actually odd in several ways: it limits the index to 31 bits,
and then it limits files so that they can't have data in that last byte
of a page that has the highest 31-bit index (ie page index 0x7fffffff).

Neither of those limitations make sense.  The index is actually the full
32 bit unsigned value, and we can use that whole full page.  So the
maximum size of the file would logically be "PAGE_SIZE << BITS_PER_LONG".

However, we do wan tto avoid the maximum index, because we have code
that iterates over the page indexes, and we don't want that code to
overflow.  So the maximum size of a file on a 32-bit host should
actually be one page less than the full 32-bit index.

So the actual limit is ULONG_MAX << PAGE_SHIFT.  That means that we will
not actually be using the page of that last index (ULONG_MAX), but we
can grow a file up to that limit.

The wrong value of MAX_LFS_FILESIZE actually caused problems for Doug
Nazar, who was still using a 32-bit host, but with a 9.7TB 2 x RAID5
volume.  It turns out that our old MAX_LFS_FILESIZE was 8TiB (well, one
byte less), but the actual true VM limit is one page less than 16TiB.

This was invisible until commit c2a9737f45e2 ("vfs,mm: fix a dead loop
in truncate_inode_pages_range()"), which started applying that
MAX_LFS_FILESIZE limit to block devices too.

NOTE! On 64-bit, the page index isn't a limiter at all, and the limit is
actually just the offset type itself (loff_t), which is signed.  But for
clarity, on 64-bit, just use the maximum signed value, and don't make
people have to count the number of 'f' characters in the hex constant.

So just use LLONG_MAX for the 64-bit case.  That was what the value had
been before too, just written out as a hex constant.

Fixes: c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
Reported-and-tested-by: Doug Nazar <nazard@nazar.ca>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Sat, 26 Aug 2017 19:48:29 +0000 (12:48 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:

 - a tweak to the IBM Trackpoint driver that helps recognizing
   trackpoints on never Lenovo Carbons

 - a fix to the ALPS driver solving scroll issues on some Dells

 - yet another ACPI ID has been added to Elan I2C toucpad driver

 - quieted diagnostic message in soc_button_array driver

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: ALPS - fix two-finger scroll breakage in right side on ALPS touchpad
  Input: soc_button_array - silence -ENOENT error on Dell XPS13 9365
  Input: trackpoint - add new trackpoint firmware ID
  Input: elan_i2c - add ELAN0602 ACPI ID to support Lenovo Yoga310

7 years agoMerge tag 'pci-v4.13-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaa...
Linus Torvalds [Sat, 26 Aug 2017 19:46:14 +0000 (12:46 -0700)]
Merge tag 'pci-v4.13-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fix from Bjorn Helgaas:
 "Remove needlessly alarming MSI affinity warning (this is not actually
  a bug fix, but the warning prompts unnecessary bug reports)"

* tag 'pci-v4.13-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI/MSI: Don't warn when irq_create_affinity_masks() returns NULL

7 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 26 Aug 2017 16:06:28 +0000 (09:06 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "Two fixes: one for an ldt_struct handling bug and a cherry-picked
  objtool fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix use-after-free of ldt_struct
  objtool: Fix '-mtune=atom' decoding support in objtool 2.0

7 years agoMerge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 26 Aug 2017 16:02:18 +0000 (09:02 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
 "Fix a timer granularity handling race+bug, which would manifest itself
  by spuriously increasing timeouts of some timers (from 1 jiffy to ~500
  jiffies in the worst case measured) in certain nohz states"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Fix excessive granularity of new timers after a nohz idle

7 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 26 Aug 2017 15:59:50 +0000 (08:59 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Ingo Molnar:
 "A single fix to not allow nonsensical event groups that result in
  kernel warnings"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix group {cpu,task} validation

7 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 26 Aug 2017 01:02:27 +0000 (18:02 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "6 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/memblock.c: reversed logic in memblock_discard()
  fork: fix incorrect fput of ->exe_file causing use-after-free
  mm/madvise.c: fix freeing of locked page with MADV_FREE
  dax: fix deadlock due to misaligned PMD faults
  mm, shmem: fix handling /sys/kernel/mm/transparent_hugepage/shmem_enabled
  PM/hibernate: touch NMI watchdog when creating snapshot

7 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sat, 26 Aug 2017 00:46:23 +0000 (17:46 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull Paolo Bonzini:
 "Bugfixes for x86, PPC and s390"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce()
  KVM, pkeys: do not use PKRU value in vcpu->arch.guest_fpu.state
  KVM: x86: simplify handling of PKRU
  KVM: x86: block guest protection keys unless the host has them enabled
  KVM: PPC: Book3S HV: Add missing barriers to XIVE code and document them
  KVM: PPC: Book3S HV: Workaround POWER9 DD1.0 bug causing IPB bit loss
  KVM: PPC: Book3S HV: Use msgsync with hypervisor doorbells on POWER9
  KVM: s390: sthyi: fix specification exception detection
  KVM: s390: sthyi: fix sthyi inline assembly

7 years agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Sat, 26 Aug 2017 00:40:03 +0000 (17:40 -0700)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
 "Fixes two obvious bugs in virtio pci"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_pci: fix cpu affinity support
  virtio_blk: fix incorrect message when disk is resized

7 years agoMerge tag 'powerpc-4.13-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sat, 26 Aug 2017 00:32:35 +0000 (17:32 -0700)]
Merge tag 'powerpc-4.13-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:
 "Just one fix, to add a barrier in the switch_mm() code to make sure
  the mm cpumask update is ordered vs the MMU starting to load
  translations. As far as we know no one's actually hit the bug, but
  that's just luck.

  Thanks to Benjamin Herrenschmidt, Nicholas Piggin"

* tag 'powerpc-4.13-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm: Ensure cpumask update is ordered

7 years agoMerge tag 'nfsd-4.13-2' of git://linux-nfs.org/~bfields/linux
Linus Torvalds [Sat, 26 Aug 2017 00:27:26 +0000 (17:27 -0700)]
Merge tag 'nfsd-4.13-2' of git://linux-nfs.org/~bfields/linux

Pull nfsd fixes from Bruce Fields:
 "Two nfsd bugfixes, neither 4.13 regressions, but both potentially
  serious"

* tag 'nfsd-4.13-2' of git://linux-nfs.org/~bfields/linux:
  net: sunrpc: svcsock: fix NULL-pointer exception
  nfsd: Limit end of page list when decoding NFSv4 WRITE

7 years agoMerge tag 'cifs-fixes-for-4.13-rc6-and-stable' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 26 Aug 2017 00:22:33 +0000 (17:22 -0700)]
Merge tag 'cifs-fixes-for-4.13-rc6-and-stable' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Some bug fixes for stable for cifs"

* tag 'cifs-fixes-for-4.13-rc6-and-stable' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup()
  cifs: Fix df output for users with quota limits

7 years agoMerge tag 'for-linus-20170825' of git://git.infradead.org/linux-mtd
Linus Torvalds [Sat, 26 Aug 2017 00:09:19 +0000 (17:09 -0700)]
Merge tag 'for-linus-20170825' of git://git.infradead.org/linux-mtd

Pull MTD fixes from Brian Norris:
 "Two fixes - one for a 4.13 regression, and the other for an older one:

   - Atmel NAND: since we started utilizing ONFI timings, we found that
     we were being too restrict at rejecting them, partly due to
     discrepancies in ONFI 4.0 and earlier versions. Relax the
     restriction to keep these platforms booting. This is a 4.13-rc1
     regression.

   - nandsim: repeated probe/removal may not work after a failed init,
     because we didn't free up our debugfs files properly on the failure
     path. This has been around since 3.8, but it's nice to get this
     fixed now in a nice easy patch that can target -stable, since
     there's already refactoring work (that also fixes the issue)
     targeted for the next merge window"

* tag 'for-linus-20170825' of git://git.infradead.org/linux-mtd:
  mtd: nand: atmel: Relax tADL_min constraint
  mtd: nandsim: remove debugfs entries in error path

7 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 26 Aug 2017 00:02:59 +0000 (17:02 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A small batch of fixes that should be included for the 4.13 release.
  This contains:

   - Revert of the 4k loop blocksize support. Even with a recent batch
     of 4 fixes, we're still not really happy with it. Rather than be
     stuck with an API issue, let's revert it and get it right for 4.14.

   - Trivial patch from Bart, adding a few flags to the blk-mq debugfs
     exports that were added in this release, but not to the debugfs
     parts.

   - Regression fix for bsg, fixing a potential kernel panic. From
     Benjamin.

   - Tweak for the blk throttling, improving how we account discards.
     From Shaohua"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq-debugfs: Add names for recently added flags
  bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer
  Revert "loop: support 4k physical blocksize"
  blk-throttle: cap discard request size

7 years agoMerge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Fri, 25 Aug 2017 23:59:38 +0000 (16:59 -0700)]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "I2C has some bugfixes for you: mainly Jarkko fixed up a few things in
  the designware driver regarding the new slave mode. But Ulf also fixed
  a long-standing and now agreed suspend problem. Plus, some simple
  stuff which nonetheless needs fixing"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: designware: Fix runtime PM for I2C slave mode
  i2c: designware: Remove needless pm_runtime_put_noidle() call
  i2c: aspeed: fixed potential null pointer dereference
  i2c: simtec: use release_mem_region instead of release_resource
  i2c: core: Make comment about I2C table requirement to reflect the code
  i2c: designware: Fix standard mode speed when configuring the slave mode
  i2c: designware: Fix oops from i2c_dw_irq_handler_slave
  i2c: designware: Fix system suspend

7 years agoPCI/MSI: Don't warn when irq_create_affinity_masks() returns NULL
Christoph Hellwig [Fri, 25 Aug 2017 23:58:42 +0000 (18:58 -0500)]
PCI/MSI: Don't warn when irq_create_affinity_masks() returns NULL

irq_create_affinity_masks() can return NULL on non-SMP systems, when there
are not enough "free" vectors available to spread, or if memory allocation
for the CPU masks fails.  Only the allocation failure is of interest, and
even then the system will work just fine except for non-optimally spread
vectors.  Thus remove the warnings.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'mmc-v4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Fri, 25 Aug 2017 23:57:53 +0000 (16:57 -0700)]
Merge tag 'mmc-v4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fix from Ulf Hansson:
 "MMC core: don't return error code R1_OUT_OF_RANGE for open-ending mode"

* tag 'mmc-v4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: block: prevent propagating R1_OUT_OF_RANGE for open-ending mode

7 years agoMerge tag 'sound-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 25 Aug 2017 23:56:04 +0000 (16:56 -0700)]
Merge tag 'sound-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "We're keeping in a good shape, this batch contains just a few small
  fixes (a regression fix for ASoC rt5677 codec, NULL dereference and
  error-path fixes in firewire, and a corner-case ioctl error fix for
  user TLV), as well as usual quirks for USB-audio and HD-audio"

* tag 'sound-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ASoC: rt5677: Reintroduce I2C device IDs
  ALSA: hda - Add stereo mic quirk for Lenovo G50-70 (17aa:3978)
  ALSA: core: Fix unexpected error at replacing user TLV
  ALSA: usb-audio: Add delay quirk for H650e/Jabra 550a USB headsets
  ALSA: firewire-motu: destroy stream data surely at failure of card initialization
  ALSA: firewire: fix NULL pointer dereference when releasing uninitialized data of iso-resource

7 years agoMerge tag 'dmaengine-fix-4.13-rc7' of git://git.infradead.org/users/vkoul/slave-dma
Linus Torvalds [Fri, 25 Aug 2017 23:43:08 +0000 (16:43 -0700)]
Merge tag 'dmaengine-fix-4.13-rc7' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fix from Vinod Koul:
 "A single fix for tegra210-adma driver to check of_irq_get() error"

* tag 'dmaengine-fix-4.13-rc7' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: tegra210-adma: fix of_irq_get() error check

7 years agoMerge tag 'drm-fixes-for-v4.13-rc7' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Fri, 25 Aug 2017 23:39:51 +0000 (16:39 -0700)]
Merge tag 'drm-fixes-for-v4.13-rc7' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "Fixes for rc7, nothing too crazy, some core, i915, and sunxi fixes,
  Intel CI has been responsible for some of these fixes being required"

* tag 'drm-fixes-for-v4.13-rc7' of git://people.freedesktop.org/~airlied/linux:
  drm/i915/gvt: Fix the kernel null pointer error
  drm: Release driver tracking before making the object available again
  drm/i915: Clear lost context-switch interrupts across reset
  drm/i915/bxt: use NULL for GPIO connection ID
  drm/i915/cnl: Fix LSPCON support.
  drm/i915/vbt: ignore extraneous child devices for a port
  drm/i915: Initialize 'data' in intel_dsi_dcs_backlight.c
  drm/atomic: If the atomic check fails, return its value first
  drm/atomic: Handle -EDEADLK with out-fences correctly
  drm: Fix framebuffer leak
  drm/imx: ipuv3-plane: fix YUV framebuffer scanout on the base plane
  gpu: ipu-v3: add DRM dependency
  drm/rockchip: Fix suspend crash when drm is not bound
  drm/sun4i: Implement drm_driver lastclose to restore fbdev console

7 years agomm/memblock.c: reversed logic in memblock_discard()
Pavel Tatashin [Fri, 25 Aug 2017 22:55:46 +0000 (15:55 -0700)]
mm/memblock.c: reversed logic in memblock_discard()

In recently introduced memblock_discard() there is a reversed logic bug.
Memory is freed of static array instead of dynamically allocated one.

Link: http://lkml.kernel.org/r/1503511441-95478-2-git-send-email-pasha.tatashin@oracle.com
Fixes: 3010f876500f ("mm: discard memblock data later")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reported-by: Woody Suwalski <terraluna977@gmail.com>
Tested-by: Woody Suwalski <terraluna977@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agofork: fix incorrect fput of ->exe_file causing use-after-free
Eric Biggers [Fri, 25 Aug 2017 22:55:43 +0000 (15:55 -0700)]
fork: fix incorrect fput of ->exe_file causing use-after-free

Commit 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for
write killable") made it possible to kill a forking task while it is
waiting to acquire its ->mmap_sem for write, in dup_mmap().

However, it was overlooked that this introduced an new error path before
a reference is taken on the mm_struct's ->exe_file.  Since the
->exe_file of the new mm_struct was already set to the old ->exe_file by
the memcpy() in dup_mm(), it was possible for the mmput() in the error
path of dup_mm() to drop a reference to ->exe_file which was never
taken.

This caused the struct file to later be freed prematurely.

Fix it by updating mm_init() to NULL out the ->exe_file, in the same
place it clears other things like the list of mmaps.

This bug was found by syzkaller.  It can be reproduced using the
following C program:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void *mmap_thread(void *_arg)
    {
        for (;;) {
            mmap(NULL, 0x1000000, PROT_READ,
                 MAP_POPULATE|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
        }
    }

    static void *fork_thread(void *_arg)
    {
        usleep(rand() % 10000);
        fork();
    }

    int main(void)
    {
        fork();
        fork();
        fork();
        for (;;) {
            if (fork() == 0) {
                pthread_t t;

                pthread_create(&t, NULL, mmap_thread, NULL);
                pthread_create(&t, NULL, fork_thread, NULL);
                usleep(rand() % 10000);
                syscall(__NR_exit_group, 0);
            }
            wait(NULL);
        }
    }

No special kernel config options are needed.  It usually causes a NULL
pointer dereference in __remove_shared_vm_struct() during exit, or in
dup_mmap() (which is usually inlined into copy_process()) during fork.
Both are due to a vm_area_struct's ->vm_file being used after it's
already been freed.

Google Bug Id: 64772007

Link: http://lkml.kernel.org/r/20170823211408.31198-1-ebiggers3@gmail.com
Fixes: 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for write killable")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> [v4.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm/madvise.c: fix freeing of locked page with MADV_FREE
Eric Biggers [Fri, 25 Aug 2017 22:55:39 +0000 (15:55 -0700)]
mm/madvise.c: fix freeing of locked page with MADV_FREE

If madvise(..., MADV_FREE) split a transparent hugepage, it called
put_page() before unlock_page().

This was wrong because put_page() can free the page, e.g. if a
concurrent madvise(..., MADV_DONTNEED) has removed it from the memory
mapping. put_page() then rightfully complained about freeing a locked
page.

Fix this by moving the unlock_page() before put_page().

This bug was found by syzkaller, which encountered the following splat:

    BUG: Bad page state in process syzkaller412798  pfn:1bd800
    page:ffffea0006f60000 count:0 mapcount:0 mapping:          (null) index:0x20a00
    flags: 0x200000000040019(locked|uptodate|dirty|swapbacked)
    raw: 0200000000040019 0000000000000000 0000000000020a00 00000000ffffffff
    raw: ffffea0006f60020 ffffea0006f60020 0000000000000000 0000000000000000
    page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
    bad because of flags: 0x1(locked)
    Modules linked in:
    CPU: 1 PID: 3037 Comm: syzkaller412798 Not tainted 4.13.0-rc5+ #35
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
     __dump_stack lib/dump_stack.c:16 [inline]
     dump_stack+0x194/0x257 lib/dump_stack.c:52
     bad_page+0x230/0x2b0 mm/page_alloc.c:565
     free_pages_check_bad+0x1f0/0x2e0 mm/page_alloc.c:943
     free_pages_check mm/page_alloc.c:952 [inline]
     free_pages_prepare mm/page_alloc.c:1043 [inline]
     free_pcp_prepare mm/page_alloc.c:1068 [inline]
     free_hot_cold_page+0x8cf/0x12b0 mm/page_alloc.c:2584
     __put_single_page mm/swap.c:79 [inline]
     __put_page+0xfb/0x160 mm/swap.c:113
     put_page include/linux/mm.h:814 [inline]
     madvise_free_pte_range+0x137a/0x1ec0 mm/madvise.c:371
     walk_pmd_range mm/pagewalk.c:50 [inline]
     walk_pud_range mm/pagewalk.c:108 [inline]
     walk_p4d_range mm/pagewalk.c:134 [inline]
     walk_pgd_range mm/pagewalk.c:160 [inline]
     __walk_page_range+0xc3a/0x1450 mm/pagewalk.c:249
     walk_page_range+0x200/0x470 mm/pagewalk.c:326
     madvise_free_page_range.isra.9+0x17d/0x230 mm/madvise.c:444
     madvise_free_single_vma+0x353/0x580 mm/madvise.c:471
     madvise_dontneed_free mm/madvise.c:555 [inline]
     madvise_vma mm/madvise.c:664 [inline]
     SYSC_madvise mm/madvise.c:832 [inline]
     SyS_madvise+0x7d3/0x13c0 mm/madvise.c:760
     entry_SYSCALL_64_fastpath+0x1f/0xbe

Here is a C reproducer:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define MADV_FREE 8
    #define PAGE_SIZE 4096

    static void *mapping;
    static const size_t mapping_size = 0x1000000;

    static void *madvise_thrproc(void *arg)
    {
        madvise(mapping, mapping_size, (long)arg);
    }

    int main(void)
    {
        pthread_t t[2];

        for (;;) {
            mapping = mmap(NULL, mapping_size, PROT_WRITE,
                           MAP_POPULATE|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);

            munmap(mapping + mapping_size / 2, PAGE_SIZE);

            pthread_create(&t[0], 0, madvise_thrproc, (void*)MADV_DONTNEED);
            pthread_create(&t[1], 0, madvise_thrproc, (void*)MADV_FREE);
            pthread_join(t[0], NULL);
            pthread_join(t[1], NULL);
            munmap(mapping, mapping_size);
        }
    }

Note: to see the splat, CONFIG_TRANSPARENT_HUGEPAGE=y and
CONFIG_DEBUG_VM=y are needed.

Google Bug Id: 64696096

Link: http://lkml.kernel.org/r/20170823205235.132061-1-ebiggers3@gmail.com
Fixes: 854e9ed09ded ("mm: support madvise(MADV_FREE)")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [v4.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agodax: fix deadlock due to misaligned PMD faults
Ross Zwisler [Fri, 25 Aug 2017 22:55:36 +0000 (15:55 -0700)]
dax: fix deadlock due to misaligned PMD faults

In DAX there are two separate places where the 2MiB range of a PMD is
defined.

The first is in the page tables, where a PMD mapping inserted for a
given address spans from (vmf->address & PMD_MASK) to ((vmf->address &
PMD_MASK) + PMD_SIZE - 1).  That is, from the 2MiB boundary below the
address to the 2MiB boundary above the address.

So, for example, a fault at address 3MiB (0x30 0000) falls within the
PMD that ranges from 2MiB (0x20 0000) to 4MiB (0x40 0000).

The second PMD range is in the mapping->page_tree, where a given file
offset is covered by a radix tree entry that spans from one 2MiB aligned
file offset to another 2MiB aligned file offset.

So, for example, the file offset for 3MiB (pgoff 768) falls within the
PMD range for the order 9 radix tree entry that ranges from 2MiB (pgoff
512) to 4MiB (pgoff 1024).

This system works so long as the addresses and file offsets for a given
mapping both have the same offsets relative to the start of each PMD.

Consider the case where the starting address for a given file isn't 2MiB
aligned - say our faulting address is 3 MiB (0x30 0000), but that
corresponds to the beginning of our file (pgoff 0).  Now all the PMDs in
the mapping are misaligned so that the 2MiB range defined in the page
tables never matches up with the 2MiB range defined in the radix tree.

The current code notices this case for DAX faults to storage with the
following test in dax_pmd_insert_mapping():

if (pfn_t_to_pfn(pfn) & PG_PMD_COLOUR)
goto unlock_fallback;

This test makes sure that the pfn we get from the driver is 2MiB
aligned, and relies on the assumption that the 2MiB alignment of the pfn
we get back from the driver matches the 2MiB alignment of the faulting
address.

However, faults to holes were not checked and we could hit the problem
described above.

This was reported in response to the NVML nvml/src/test/pmempool_sync
TEST5:

$ cd nvml/src/test/pmempool_sync
$ make TEST5

You can grab NVML here:

https://github.com/pmem/nvml/

The dmesg warning you see when you hit this error is:

  WARNING: CPU: 13 PID: 2900 at fs/dax.c:641 dax_insert_mapping_entry+0x2df/0x310

Where we notice in dax_insert_mapping_entry() that the radix tree entry
we are about to replace doesn't match the locked entry that we had
previously inserted into the tree.  This happens because the initial
insertion was done in grab_mapping_entry() using a pgoff calculated from
the faulting address (vmf->address), and the replacement in
dax_pmd_load_hole() => dax_insert_mapping_entry() is done using
vmf->pgoff.

In our failure case those two page offsets (one calculated from
vmf->address, one using vmf->pgoff) point to different order 9 radix
tree entries.

This failure case can result in a deadlock because the radix tree unlock
also happens on the pgoff calculated from vmf->address.  This means that
the locked radix tree entry that we swapped in to the tree in
dax_insert_mapping_entry() using vmf->pgoff is never unlocked, so all
future faults to that 2MiB range will block forever.

Fix this by validating that the faulting address's PMD offset matches
the PMD offset from the start of the file.  This check is done at the
very beginning of the fault and covers faults that would have mapped to
storage as well as faults to holes.  I left the COLOUR check in
dax_pmd_insert_mapping() in place in case we ever hit the insanity
condition where the alignment of the pfn we get from the driver doesn't
match the alignment of the userspace address.

Link: http://lkml.kernel.org/r/20170822222436.18926-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: "Slusarz, Marcin" <marcin.slusarz@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm, shmem: fix handling /sys/kernel/mm/transparent_hugepage/shmem_enabled
Kirill A. Shutemov [Fri, 25 Aug 2017 22:55:33 +0000 (15:55 -0700)]
mm, shmem: fix handling /sys/kernel/mm/transparent_hugepage/shmem_enabled

/sys/kernel/mm/transparent_hugepage/shmem_enabled controls if we want
to allocate huge pages when allocate pages for private in-kernel shmem
mount.

Unfortunately, as Dan noticed, I've screwed it up and the only way to
make kernel allocate huge page for the mount is to use "force" there.
All other values will be effectively ignored.

Link: http://lkml.kernel.org/r/20170822144254.66431-1-kirill.shutemov@linux.intel.com
Fixes: 5a6e75f8110c ("shmem: prepare huge= mount option and sysfs knob")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoPM/hibernate: touch NMI watchdog when creating snapshot
Chen Yu [Fri, 25 Aug 2017 22:55:30 +0000 (15:55 -0700)]
PM/hibernate: touch NMI watchdog when creating snapshot

There is a problem that when counting the pages for creating the
hibernation snapshot will take significant amount of time, especially on
system with large memory.  Since the counting job is performed with irq
disabled, this might lead to NMI lockup.  The following warning were
found on a system with 1.5TB DRAM:

  Freezing user space processes ... (elapsed 0.002 seconds) done.
  OOM killer disabled.
  PM: Preallocating image memory...
  NMI watchdog: Watchdog detected hard LOCKUP on cpu 27
  CPU: 27 PID: 3128 Comm: systemd-sleep Not tainted 4.13.0-0.rc2.git0.1.fc27.x86_64 #1
  task: ffff9f01971ac000 task.stack: ffffb1a3f325c000
  RIP: 0010:memory_bm_find_bit+0xf4/0x100
  Call Trace:
   swsusp_set_page_free+0x2b/0x30
   mark_free_pages+0x147/0x1c0
   count_data_pages+0x41/0xa0
   hibernate_preallocate_memory+0x80/0x450
   hibernation_snapshot+0x58/0x410
   hibernate+0x17c/0x310
   state_store+0xdf/0xf0
   kobj_attr_store+0xf/0x20
   sysfs_kf_write+0x37/0x40
   kernfs_fop_write+0x11c/0x1a0
   __vfs_write+0x37/0x170
   vfs_write+0xb1/0x1a0
   SyS_write+0x55/0xc0
   entry_SYSCALL_64_fastpath+0x1a/0xa5
  ...
  done (allocated 6590003 pages)
  PM: Allocated 26360012 kbytes in 19.89 seconds (1325.28 MB/s)

It has taken nearly 20 seconds(2.10GHz CPU) thus the NMI lockup was
triggered.  In case the timeout of the NMI watch dog has been set to 1
second, a safe interval should be 6590003/20 = 320k pages in theory.
However there might also be some platforms running at a lower frequency,
so feed the watchdog every 100k pages.

[yu.c.chen@intel.com: simplification]
Link: http://lkml.kernel.org/r/1503460079-29721-1-git-send-email-yu.c.chen@intel.com
[yu.c.chen@intel.com: use interval of 128k instead of 100k to avoid modulus]
Link: http://lkml.kernel.org/r/1503328098-5120-1-git-send-email-yu.c.chen@intel.com
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reported-by: Jan Filipcewicz <jan.filipcewicz@intel.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Len Brown <lenb@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agovirtio_pci: fix cpu affinity support
Christoph Hellwig [Thu, 24 Aug 2017 16:07:02 +0000 (18:07 +0200)]
virtio_pci: fix cpu affinity support

Commit 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for
virtqueues"") removed the adjustment of the pre_vectors for the virtio
MSI-X vector allocation which was added in commit fb5e31d9 ("virtio:
allow drivers to request IRQ affinity when creating VQs"). This will
lead to an incorrect assignment of MSI-X vectors, and potential
deadlocks when offlining cpus.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes: 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for virtqueues")
Reported-by: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years agovirtio_blk: fix incorrect message when disk is resized
Stefan Hajnoczi [Wed, 26 Jul 2017 14:32:23 +0000 (15:32 +0100)]
virtio_blk: fix incorrect message when disk is resized

The message printed on disk resize is incorrect.  The following is
printed when resizing to 2 GiB:

  $ truncate -s 1G test.img
  $ qemu -device virtio-blk-pci,logical_block_size=4096,...
  (qemu) block_resize drive1 2G

  virtio_blk virtio0: new size: 4194304 4096-byte logical blocks (17.2 GB/16.0 GiB)

The virtio_blk capacity config field is in 512-byte sector units
regardless of logical_block_size as per the VIRTIO specification.
Therefore the message should read:

  virtio_blk virtio0: new size: 524288 4096-byte logical blocks (2.15 GB/2.0 GiB)

Note that this only affects the printed message.  Thankfully the actual
block device has the correct size because the block layer expects
capacity in sectors.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years agoblk-mq-debugfs: Add names for recently added flags
Bart Van Assche [Fri, 18 Aug 2017 22:52:54 +0000 (15:52 -0700)]
blk-mq-debugfs: Add names for recently added flags

The symbolic constants QUEUE_FLAG_SCSI_PASSTHROUGH, QUEUE_FLAG_QUIESCED
and REQ_NOWAIT are missing from blk-mq-debugfs.c. Add these to
blk-mq-debugfs.c such that these appear as names in debugfs instead of
as numbers.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
7 years agoKVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce()
Paul Mackerras [Thu, 24 Aug 2017 09:14:47 +0000 (19:14 +1000)]
KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce()

Nixiaoming pointed out that there is a memory leak in
kvm_vm_ioctl_create_spapr_tce() if the call to anon_inode_getfd()
fails; the memory allocated for the kvmppc_spapr_tce_table struct
is not freed, and nor are the pages allocated for the iommu
tables.  In addition, we have already incremented the process's
count of locked memory pages, and this doesn't get restored on
error.

David Hildenbrand pointed out that there is a race in that the
function checks early on that there is not already an entry in the
stt->iommu_tables list with the same LIOBN, but an entry with the
same LIOBN could get added between then and when the new entry is
added to the list.

This fixes all three problems.  To simplify things, we now call
anon_inode_getfd() before placing the new entry in the list.  The
check for an existing entry is done while holding the kvm->lock
mutex, immediately before adding the new entry to the list.
Finally, on failure we now call kvmppc_account_memlimit to
decrement the process's count of locked memory pages.

Reported-by: Nixiaoming <nixiaoming@huawei.com>
Reported-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoperf/core: Fix group {cpu,task} validation
Mark Rutland [Thu, 22 Jun 2017 14:41:38 +0000 (15:41 +0100)]
perf/core: Fix group {cpu,task} validation

Regardless of which events form a group, it does not make sense for the
events to target different tasks and/or CPUs, as this leaves the group
inconsistent and impossible to schedule. The core perf code assumes that
these are consistent across (successfully intialised) groups.

Core perf code only verifies this when moving SW events into a HW
context. Thus, we can violate this requirement for pure SW groups and
pure HW groups, unless the relevant PMU driver happens to perform this
verification itself. These mismatched groups subsequently wreak havoc
elsewhere.

For example, we handle watchpoints as SW events, and reserve watchpoint
HW on a per-CPU basis at pmu::event_init() time to ensure that any event
that is initialised is guaranteed to have a slot at pmu::add() time.
However, the core code only checks the group leader's cpu filter (via
event_filter_match()), and can thus install follower events onto CPUs
violating thier (mismatched) CPU filters, potentially installing them
into a CPU without sufficient reserved slots.

This can be triggered with the below test case, resulting in warnings
from arch backends.

  #define _GNU_SOURCE
  #include <linux/hw_breakpoint.h>
  #include <linux/perf_event.h>
  #include <sched.h>
  #include <stdio.h>
  #include <sys/prctl.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
   int group_fd, unsigned long flags)
  {
return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
  }

  char watched_char;

  struct perf_event_attr wp_attr = {
.type = PERF_TYPE_BREAKPOINT,
.bp_type = HW_BREAKPOINT_RW,
.bp_addr = (unsigned long)&watched_char,
.bp_len = 1,
.size = sizeof(wp_attr),
  };

  int main(int argc, char *argv[])
  {
int leader, ret;
cpu_set_t cpus;

/*
 * Force use of CPU0 to ensure our CPU0-bound events get scheduled.
 */
CPU_ZERO(&cpus);
CPU_SET(0, &cpus);
ret = sched_setaffinity(0, sizeof(cpus), &cpus);
if (ret) {
printf("Unable to set cpu affinity\n");
return 1;
}

/* open leader event, bound to this task, CPU0 only */
leader = perf_event_open(&wp_attr, 0, 0, -1, 0);
if (leader < 0) {
printf("Couldn't open leader: %d\n", leader);
return 1;
}

/*
 * Open a follower event that is bound to the same task, but a
 * different CPU. This means that the group should never be possible to
 * schedule.
 */
ret = perf_event_open(&wp_attr, 0, 1, leader, 0);
if (ret < 0) {
printf("Couldn't open mismatched follower: %d\n", ret);
return 1;
} else {
printf("Opened leader/follower with mismastched CPUs\n");
}

/*
 * Open as many independent events as we can, all bound to the same
 * task, CPU0 only.
 */
do {
ret = perf_event_open(&wp_attr, 0, 0, -1, 0);
} while (ret >= 0);

/*
 * Force enable/disble all events to trigger the erronoeous
 * installation of the follower event.
 */
printf("Opened all events. Toggling..\n");
for (;;) {
prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0);
prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0);
}

return 0;
  }

Fix this by validating this requirement regardless of whether we're
moving events.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zhou Chengming <zhouchengming1@huawei.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1498142498-15758-1-git-send-email-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agox86/mm: Fix use-after-free of ldt_struct
Eric Biggers [Thu, 24 Aug 2017 17:50:29 +0000 (10:50 -0700)]
x86/mm: Fix use-after-free of ldt_struct

The following commit:

  39a0526fb3f7 ("x86/mm: Factor out LDT init from context init")

renamed init_new_context() to init_new_context_ldt() and added a new
init_new_context() which calls init_new_context_ldt().  However, the
error code of init_new_context_ldt() was ignored.  Consequently, if a
memory allocation in alloc_ldt_struct() failed during a fork(), the
->context.ldt of the new task remained the same as that of the old task
(due to the memcpy() in dup_mm()).  ldt_struct's are not intended to be
shared, so a use-after-free occurred after one task exited.

Fix the bug by making init_new_context() pass through the error code of
init_new_context_ldt().

This bug was found by syzkaller, which encountered the following splat:

    BUG: KASAN: use-after-free in free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
    Read of size 4 at addr ffff88006d2cb7c8 by task kworker/u9:0/3710

    CPU: 1 PID: 3710 Comm: kworker/u9:0 Not tainted 4.13.0-rc4-next-20170811 #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Call Trace:
     __dump_stack lib/dump_stack.c:16 [inline]
     dump_stack+0x194/0x257 lib/dump_stack.c:52
     print_address_description+0x73/0x250 mm/kasan/report.c:252
     kasan_report_error mm/kasan/report.c:351 [inline]
     kasan_report+0x24e/0x340 mm/kasan/report.c:409
     __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429
     free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
     free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
     destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
     destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
     __mmdrop+0xe9/0x530 kernel/fork.c:889
     mmdrop include/linux/sched/mm.h:42 [inline]
     exec_mmap fs/exec.c:1061 [inline]
     flush_old_exec+0x173c/0x1ff0 fs/exec.c:1291
     load_elf_binary+0x81f/0x4ba0 fs/binfmt_elf.c:855
     search_binary_handler+0x142/0x6b0 fs/exec.c:1652
     exec_binprm fs/exec.c:1694 [inline]
     do_execveat_common.isra.33+0x1746/0x22e0 fs/exec.c:1816
     do_execve+0x31/0x40 fs/exec.c:1860
     call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100
     ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431

    Allocated by task 3700:
     save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
     save_stack+0x43/0xd0 mm/kasan/kasan.c:447
     set_track mm/kasan/kasan.c:459 [inline]
     kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
     kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3627
     kmalloc include/linux/slab.h:493 [inline]
     alloc_ldt_struct+0x52/0x140 arch/x86/kernel/ldt.c:67
     write_ldt+0x7b7/0xab0 arch/x86/kernel/ldt.c:277
     sys_modify_ldt+0x1ef/0x240 arch/x86/kernel/ldt.c:307
     entry_SYSCALL_64_fastpath+0x1f/0xbe

    Freed by task 3700:
     save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
     save_stack+0x43/0xd0 mm/kasan/kasan.c:447
     set_track mm/kasan/kasan.c:459 [inline]
     kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
     __cache_free mm/slab.c:3503 [inline]
     kfree+0xca/0x250 mm/slab.c:3820
     free_ldt_struct.part.2+0xdd/0x150 arch/x86/kernel/ldt.c:121
     free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
     destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
     destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
     __mmdrop+0xe9/0x530 kernel/fork.c:889
     mmdrop include/linux/sched/mm.h:42 [inline]
     __mmput kernel/fork.c:916 [inline]
     mmput+0x541/0x6e0 kernel/fork.c:927
     copy_process.part.36+0x22e1/0x4af0 kernel/fork.c:1931
     copy_process kernel/fork.c:1546 [inline]
     _do_fork+0x1ef/0xfb0 kernel/fork.c:2025
     SYSC_clone kernel/fork.c:2135 [inline]
     SyS_clone+0x37/0x50 kernel/fork.c:2129
     do_syscall_64+0x26c/0x8c0 arch/x86/entry/common.c:287
     return_from_SYSCALL_64+0x0/0x7a

Here is a C reproducer:

    #include <asm/ldt.h>
    #include <pthread.h>
    #include <signal.h>
    #include <stdlib.h>
    #include <sys/syscall.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void *fork_thread(void *_arg)
    {
        fork();
    }

    int main(void)
    {
        struct user_desc desc = { .entry_number = 8191 };

        syscall(__NR_modify_ldt, 1, &desc, sizeof(desc));

        for (;;) {
            if (fork() == 0) {
                pthread_t t;

                srand(getpid());
                pthread_create(&t, NULL, fork_thread, NULL);
                usleep(rand() % 10000);
                syscall(__NR_exit_group, 0);
            }
            wait(NULL);
        }
    }

Note: the reproducer takes advantage of the fact that alloc_ldt_struct()
may use vmalloc() to allocate a large ->entries array, and after
commit:

  5d17a73a2ebe ("vmalloc: back off when the current task is killed")

it is possible for userspace to fail a task's vmalloc() by
sending a fatal signal, e.g. via exit_group().  It would be more
difficult to reproduce this bug on kernels without that commit.

This bug only affected kernels with CONFIG_MODIFY_LDT_SYSCALL=y.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org> [v4.6+]
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Fixes: 39a0526fb3f7 ("x86/mm: Factor out LDT init from context init")
Link: http://lkml.kernel.org/r/20170824175029.76040-1-ebiggers3@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>