Trond Myklebust [Wed, 4 Oct 2017 17:49:12 +0000 (13:49 -0400)]
NFSv4/pnfs: Fix an infinite layoutget loop
Since we can now use a lock stateid or a delegation stateid, that
differs from the context stateid, we need to change the test in
nfs4_layoutget_handle_exception() to take this into account.
This fixes an infinite layoutget loop in the NFS client whereby
it keeps retrying the initial layoutget using the same broken
stateid.
Fixes: 70d2f7b1ea19b ("pNFS: Use the standard I/O stateid when...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Olof Johansson [Wed, 4 Oct 2017 17:31:00 +0000 (10:31 -0700)]
Merge tag 'stm32-dt-fixes-for-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32 into fixes
STM32 fixes for v4.14:
---------------------
-Fix STMPE1600 bindings for stm32429i-eval board
-Use right compatible for stm32f469 pinctrl. It implies to use
pinctrl dedicated files for F4 SoCs.
* tag 'stm32-dt-fixes-for-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32:
ARM: dts: stm32: use right pinctrl compatible for stm32f469
ARM: dts: stm32: Fix STMPE1600 binding on stm32429i-eval board
Mark Rutland [Tue, 3 Oct 2017 17:25:46 +0000 (18:25 +0100)]
arm64: Use larger stacks when KASAN is selected
AddressSanitizer instrumentation can significantly bloat the stack, and
with GCC 7 this can result in stack overflows at boot time in some
configurations.
We can avoid this by doubling our stack size when KASAN is in use, as is
already done on x86 (and has been since KASAN was introduced).
Regardless of other patches to decrease KASAN's stack utilization,
kernels built with KASAN will always require more stack space than those
built without, and we should take this into account.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
commit f6810c15cf97 ("iommu/arm-smmu: Clean up early-probing
workarounds") removed kernel code that was allowing to initialize
and probe the SMMU devices early (ie earlier than PCI devices, through
linker script callback entries) in the boot process because it was not
needed any longer in that the SMMU devices/drivers now support deferred
probing.
Since the SMMUs probe routines are also in charge of requesting global
PCI ACS kernel enablement, commit f6810c15cf97 ("iommu/arm-smmu: Clean
up early-probing workarounds") also postponed PCI ACS enablement to
SMMUs devices probe time, which is too late given that PCI devices needs
to detect if PCI ACS is enabled to init the respective capability
through the following call path:
Add code in the ACPI IORT SMMU platform devices initialization path
(that is called before ACPI PCI enumeration) to detect if there
exists firmware mappings to map root complexes ids to SMMU ids
and if so enable ACS for the system.
Fixes: f6810c15cf97 ("iommu/arm-smmu: Clean up early-probing workarounds") Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Nate Watterson <nwatters@codeaurora.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Hanjun Guo <hanjun.guo@linaro.org> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Zhou Wang <wangzhou1@hisilicon.com> Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Linus Torvalds [Wed, 4 Oct 2017 16:30:50 +0000 (09:30 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"A lot of stuff, sorry about that. A week on a beach, then a bunch of
time catching up then more time letting it bake in -next. Shan't do
that again!"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (51 commits)
include/linux/fs.h: fix comment about struct address_space
checkpatch: fix ignoring cover-letter logic
m32r: fix build failure
lib/ratelimit.c: use deferred printk() version
kernel/params.c: improve STANDARD_PARAM_DEF readability
kernel/params.c: fix an overflow in param_attr_show
kernel/params.c: fix the maximum length in param_get_string
mm/memory_hotplug: define find_{smallest|biggest}_section_pfn as unsigned long
mm/memory_hotplug: change pfn_to_section_nr/section_nr_to_pfn macro to inline function
kernel/kcmp.c: drop branch leftover typo
memremap: add scheduling point to devm_memremap_pages
mm, page_alloc: add scheduling point to memmap_init_zone
mm, memory_hotplug: add scheduling point to __add_pages
lib/idr.c: fix comment for idr_replace()
mm: memcontrol: use vmalloc fallback for large kmem memcg arrays
kernel/sysctl.c: remove duplicate UINT_MAX check on do_proc_douintvec_conv()
include/linux/bitfield.h: remove 32bit from FIELD_GET comment block
lib/lz4: make arrays static const, reduces object code size
exec: binfmt_misc: kill the onstack iname[BINPRM_BUF_SIZE] array
exec: binfmt_misc: fix race between load_misc_binary() and kill_node()
...
Boqun Feng [Tue, 3 Oct 2017 13:36:51 +0000 (21:36 +0800)]
kvm/x86: Avoid async PF preempting the kernel incorrectly
Currently, in PREEMPT_COUNT=n kernel, kvm_async_pf_task_wait() could call
schedule() to reschedule in some cases. This could result in
accidentally ending the current RCU read-side critical section early,
causing random memory corruption in the guest, or otherwise preempting
the currently running task inside between preempt_disable and
preempt_enable.
The difficulty to handle this well is because we don't know whether an
async PF delivered in a preemptible section or RCU read-side critical section
for PREEMPT_COUNT=n, since preempt_disable()/enable() and rcu_read_lock/unlock()
are both no-ops in that case.
To cure this, we treat any async PF interrupting a kernel context as one
that cannot be preempted, preventing kvm_async_pf_task_wait() from choosing
the schedule() path in that case.
To do so, a second parameter for kvm_async_pf_task_wait() is introduced,
so that we know whether it's called from a context interrupting the
kernel, and the parameter is set properly in all the callsites.
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: stable@vger.kernel.org Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Linus Torvalds [Wed, 4 Oct 2017 16:21:58 +0000 (09:21 -0700)]
Merge branch 'fixes-v4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull smack fix from James Morris:
"It fixes a bug in xattr_getsecurity() where security_release_secctx()
was being called instead of kfree(), which leads to a memory leak in
the capabilities code. smack_inode_getsecurity is also fixed to behave
correctly when called from there"
* 'fixes-v4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
lsm: fix smack_inode_removexattr and xattr_getsecurity memleak
Marek Szyprowski [Tue, 19 Sep 2017 10:01:08 +0000 (12:01 +0200)]
clk: samsung: exynos4: Enable VPLL and EPLL clocks for suspend/resume cycle
Commit 6edfa11cb396 ("clk: samsung: Add enable/disable operation for
PLL36XX clocks") added enable/disable operations to PLL clocks. Prior that
VPLL and EPPL clocks were always enabled because the enable bit was never
touched. Those clocks have to be enabled during suspend/resume cycle,
because otherwise board fails to enter sleep mode. This patch enables them
unconditionally before entering system suspend state. System restore
function will set them to the previous state saved in the register cache
done before that unconditional enable.
Fixes: 6edfa11cb396 ("clk: samsung: Add enable/disable operation for PLL36XX clocks") CC: stable@vger.kernel.org # v4.13 Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Linus Torvalds [Wed, 4 Oct 2017 15:34:01 +0000 (08:34 -0700)]
Merge tag 'trace-v4.14-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixlets from Steven Rostedt:
"Two updates:
- A memory fix with left over code from spliting out ftrace_ops and
function graph tracer, where the function graph tracer could reset
the trampoline pointer, leaving the old trampoline not to be freed
(memory leak).
- The update to Paul's patch that added the unnecessary READ_ONCE().
This removes the unnecessary READ_ONCE() instead of having to
rebase the branch to update the patch that added it"
* tag 'trace-v4.14-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
rcu: Remove extraneous READ_ONCE()s from rcu_irq_{enter,exit}()
ftrace: Fix kmemleak in unregister_ftrace_graph
Milan Broz [Wed, 13 Sep 2017 13:45:56 +0000 (15:45 +0200)]
dm crypt: reject sector_size feature if device length is not aligned to it
If a crypt mapping uses optional sector_size feature, additional
restrictions to mapped device segment size must be applied in
constructor, otherwise the device activation will fail later.
Fixes: 8f0009a225 ("dm crypt: optionally support larger encryption sector size") Cc: stable@vger.kernel.org # 4.12+ Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Benjamin Block [Tue, 3 Oct 2017 10:48:37 +0000 (12:48 +0200)]
bsg-lib: fix use-after-free under memory-pressure
When under memory-pressure it is possible that the mempool which backs
the 'struct request_queue' will make use of up to BLKDEV_MIN_RQ count
emergency buffers - in case it can't get a regular allocation. These
buffers are preallocated and once they are also used, they are
re-supplied with old finished requests from the same request_queue (see
mempool_free()).
The bug is, when re-supplying the emergency pool, the old requests are
not again ran through the callback mempool_t->alloc(), and thus also not
through the callback bsg_init_rq(). Thus we skip initialization, and
while the sense-buffer still should be good, scsi_request->cmd might
have become to be an invalid pointer in the meantime. When the request
is initialized in bsg.c, and the user's CDB is larger than BLK_MAX_CDB,
bsg will replace it with a custom allocated buffer, which is freed when
the user's command is finished, thus it dangles afterwards. When next a
command is sent by the user that has a smaller/similar CDB as
BLK_MAX_CDB, bsg will assume that scsi_request->cmd is backed by
scsi_request->__cmd, will not make a custom allocation, and write into
undefined memory.
Fix this by splitting bsg_init_rq() into two functions:
- bsg_init_rq() is changed to only do the allocation of the
sense-buffer, which is used to back the bsg job's reply buffer. This
pointer should never change during the lifetime of a scsi_request, so
it doesn't need re-initialization.
- bsg_initialize_rq() is a new function that makes use of
'struct request_queue's initialize_rq_fn callback (which was
introduced in v4.12). This is always called before the request is
given out via blk_get_request(). This function does the remaining
initialization that was previously done in bsg_init_rq(), and will
also do it when the request is taken from the emergency-pool of the
backing mempool.
Fixes: 50b4d485528d ("bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer") Cc: <stable@vger.kernel.org> # 4.11+ Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jean-Denis Girard noticed commit c821e7f3 "pass bytes to
btrfs_bio_alloc" (https://patchwork.kernel.org/patch/9763081/)
introduces a regression on 32 bit machines.
When CONFIG_LBDAF is _not_ defined (CONFIG_LBDAF == Support for large
(2TB+) block devices and files) sector_t is 32 bit on 32bit machines.
In the function submit_extent_page, 'sector' (which is sector_t type) is
multiplied by 512 to convert it from sectors to bytes, leading to an
overflow when the disk is bigger than 4GB (!).
I added a cast to u64 to avoid overflow.
Fixes: c821e7f3 ("btrfs: pass bytes to btrfs_bio_alloc") CC: stable@vger.kernel.org # 4.13+ Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it> Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
ARM: dts: stm32: use right pinctrl compatible for stm32f469
Currently, same stm32f429-pinctrl driver is used for stm32f429 and
stm32f469. As pin map is different between those 2 MCUs,
a stm32f469-pinctrl driver has been recently added.
This patch
-allows to use stm32f469-pinctrl driver for stm32f469 boards
-reworks stm32 devicetree files to fit with stm32f429 / stm32f469
In the same time it fixes an issue when only MACH_STM32F469 flag is
selected in menuconfig.
Peng Xu [Tue, 3 Oct 2017 20:21:51 +0000 (23:21 +0300)]
nl80211: Define policy for packet pattern attributes
Define a policy for packet pattern attributes in order to fix a
potential read over the end of the buffer during nla_get_u32()
of the NL80211_PKTPAT_OFFSET attribute.
Note that the data there can always be read due to SKB allocation
(with alignment and struct skb_shared_info at the end), but the
data might be uninitialized. This could be used to leak some data
from uninitialized vmalloc() memory, but most drivers don't allow
an offset (so you'd just get -EINVAL if the data is non-zero) or
just allow it with a fixed value - 100 or 128 bytes, so anything
above that would get -EINVAL. With brcmfmac the limit is 1500 so
(at least) one byte could be obtained.
Cc: stable@kernel.org Signed-off-by: Peng Xu <pxu@qti.qualcomm.com> Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
[rewrite description based on SKB allocation knowledge] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
powerpc/xive: Clear XIVE internal structures when a CPU is removed
Commit eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE
interrupt controller") introduced support for the XIVE exploitation
mode of the P9 interrupt controller on the pseries platform.
At that time, support for CPU removal was not complete on PowerVM and
CPU hot unplug remained untested. It appears that some cleanups of the
XIVE internal structures are required before releasing the CPU,
without which the kernel crashes in a RTAS call doing the CPU
isolation.
These changes fix the crash by deconfiguring the IPI interrupt source
and clearing the event queues of the CPU when it is removed.
Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller") Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When resetting an IPI, hw_ipi should also be set to zero.
Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller") Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
nvme-pci: Use PCI bus address for data/queues in CMB
Currently, NVMe PCI host driver is programming CMB dma address as
I/O SQs addresses. This results in failures on systems where 1:1
outbound mapping is not used (example Broadcom iProc SOCs) because
CMB BAR will be progammed with PCI bus address but NVMe PCI EP will
try to access CMB using dma address.
To have CMB working on systems without 1:1 outbound mapping, we
program PCI bus address for I/O SQs instead of dma address. This
approach will work on systems with/without 1:1 outbound mapping.
Based on a report and previous patch from Abhishek Shah.
Fixes: 8ffaadf7 ("NVMe: Use CMB for the IO SQes if available") Cc: stable@vger.kernel.org Reported-by: Abhishek Shah <abhishek.shah@broadcom.com> Tested-by: Abhishek Shah <abhishek.shah@broadcom.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
ARM: dts: stm32: Fix STMPE1600 binding on stm32429i-eval board
To declare gpio interrupt line for STMPE1600, 2 possibilities are offered:
-use gpio binding (and then the gpiolib interface inside driver)
-use interrupt binding as each gpio-controller are also interrupt controller
on stm32f429.
In STMPE 1600 node both (gpio and interrupt) bindings are defined.
This patch fixes this issue and use only interrupt binding.
Thomas Gleixner [Wed, 4 Oct 2017 08:03:04 +0000 (10:03 +0200)]
watchdog/core: Rename some softlockup_* functions
The function names made sense up to the point where the watchdog
(re)configuration was unified to use softlockup_reconfigure_threads() for
all configuration purposes. But that includes scenarios which solely
configure the nmi watchdog.
Rename softlockup_reconfigure_threads() and softlockup_init_threads() so
the function names match the functionality.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linuxfoundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Don Zickus <dzickus@redhat.com>
This can be avoided by setting up the cpu hotplug state with nocalls and
move the initialization to the watchdog_nmi_probe() function. That
initializes the hotplug callbacks without invoking the callback and the
following core initialization function then configures the watchdog for the
online CPUs (in this case CPU0) via softlockup_reconfigure_threads().
Reported-and-tested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org
Thomas Gleixner [Tue, 3 Oct 2017 14:37:53 +0000 (16:37 +0200)]
watchdog/core, powerpc: Lock cpus across reconfiguration
Instead of dropping the cpu hotplug lock after stopping NMI watchdog and
threads and reaquiring for restart, the code and the protection rules
become more obvious when holding cpu hotplug lock across the full
reconfiguration.
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Don Zickus <dzickus@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710022105570.2114@nanos
The recent cleanup of the watchdog code split watchdog_nmi_reconfigure()
into two stages. One to stop the NMI and one to restart it after
reconfiguration. That was done by adding a boolean 'run' argument to the
code, which is functionally correct but not necessarily a piece of art.
Replace it by two explicit functions: watchdog_nmi_stop() and
watchdog_nmi_start().
Fixes: 6592ad2fcc8f ("watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage") Requested-by: Linus 'Nursing his pet-peeve' Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Thomas 'Mopping up garbage' Gleixner <tglx@linutronix.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Don Zickus <dzickus@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710021957480.2114@nanos
Gregory CLEMENT [Mon, 2 Oct 2017 14:58:52 +0000 (16:58 +0200)]
mmc: sdhci-xenon: Fix clock resource by adding an optional bus clock
On Armada 7K/8K we need to explicitly enable the bus clock. The bus clock
is optional because not all the SoCs need them but at least for Armada
7K/8K it is actually mandatory.
The binding documentation is updating accordingly.
Without this patch the kernel hand during boot if the mvpp2.2 network
driver was not present in the kernel. Indeed the clock needed by the
xenon controller was set by the network driver.
Jerome Brunet [Mon, 2 Oct 2017 12:27:42 +0000 (14:27 +0200)]
mmc: meson-gx: fix rx phase reset
Resetting the phase when POWER_ON is set the set_ios() call means that the
phase is reset almost every time the set_ios() is called, while the
expected behavior was to reset the phase on a power cycle.
This had gone unnoticed until now because in all mode (except hs400) the
tuning is done after the last to set_ios(). In such case, the tuning
result is used anyway. In HS400, there are a few calls to set_ios() after
the tuning is done, overwriting the tuning result.
Resetting the phase on POWER_UP instead of POWER_ON solve the problem.
Jerome Brunet [Mon, 2 Oct 2017 12:27:41 +0000 (14:27 +0200)]
mmc: meson-gx: make sure the clock is rounded down
Using CLK_DIVIDER_ROUND_CLOSEST is unsafe as the mmc clock could be
rounded to a rate higher the specified rate. Removing this flag ensure
that, if the rate needs to be rounded, it will be rounded down.
Fixes: 51c5d8447bd7 ("MMC: meson: initial support for GX platforms") Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Reviewed-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
In may, Steven sent a patch deleting the bounce buffer handling
and the CONFIG_MMC_BLOCK_BOUNCE option.
I chose the less invasive path of making it a runtime config
option, and we merged that successfully for kernel v4.12.
The code is however just standing in the way and taking up
space for seemingly no gain on any systems in wide use today.
Pierre says the code was there to improve speed on TI SDHCI
controllers on certain HP laptops and possibly some Ricoh
controllers as well. Early SDHCI controllers lacked the
scatter-gather feature, which made software bounce buffers
a significant speed boost.
We are clearly talking about the list of SDHCI PCI-based
MMC/SD card readers found in the pci_ids[] list in
drivers/mmc/host/sdhci-pci-core.c.
The TI SDHCI derivative is not supported by the upstream
kernel. This leaves the Ricoh.
What we can however notice is that the x86 defconfigs in the
kernel did not enable CONFIG_MMC_BLOCK_BOUNCE option, which
means that any such laptop would have to have a custom
configured kernel to actually take advantage of this
bounce buffer speed-up. It simply seems like there was
a speed optimization for the Ricoh controllers that noone
was using. (I have not checked the distro defconfigs but
I am pretty sure the situation is the same there.)
Bounce buffers increased performance on the OMAP HSMMC
at one point, and was part of the original submission in
commit a45c6cb81647 ("[ARM] 5369/1: omap mmc: Add new
omap hsmmc controller for 2430 and 34xx, v3")
This optimization was removed in
commit 0ccd76d4c236 ("omap_hsmmc: Implement scatter-gather
emulation")
which found that scatter-gather emulation provided even
better performance.
The same was introduced for SDHCI in
commit 2134a922c6e7 ("sdhci: scatter-gather (ADMA) support")
I am pretty positively convinced that software
scatter-gather emulation will do for any host controller what
the bounce buffers were doing. Essentially, the bounce buffer
was a reimplementation of software scatter-gather-emulation in
the MMC subsystem, and it should be done away with.
Cc: Pierre Ossman <pierre@ossman.eu> Cc: Juha Yrjola <juha.yrjola@solidboot.com> Cc: Steven J. Hill <Steven.Hill@cavium.com> Cc: Shawn Lin <shawn.lin@rock-chips.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Suggested-by: Steven J. Hill <Steven.Hill@cavium.com> Suggested-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
lsm: fix smack_inode_removexattr and xattr_getsecurity memleak
security_inode_getsecurity() provides the text string value
of a security attribute. It does not provide a "secctx".
The code in xattr_getsecurity() that calls security_inode_getsecurity()
and then calls security_release_secctx() happened to work because
SElinux and Smack treat the attribute and the secctx the same way.
It fails for cap_inode_getsecurity(), because that module has no
secctx that ever needs releasing. It turns out that Smack is the
one that's doing things wrong by not allocating memory when instructed
to do so by the "alloc" parameter.
The fix is simple enough. Change the security_release_secctx() to
kfree() because it isn't a secctx being returned by
security_inode_getsecurity(). Change Smack to allocate the string when
told to do so.
Note: this also fixes memory leaks for LSMs which implement
inode_getsecurity but not release_secctx, such as capabilities.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: stable@vger.kernel.org Signed-off-by: James Morris <james.l.morris@oracle.com>
If we got two AIO writes into a COW area the second one might not have any
COW extents left to convert. Handle that case gracefully instead of
triggering an assert or accessing beyond the bounds of the extent list.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Mon, 18 Sep 2017 16:41:18 +0000 (09:41 -0700)]
xfs: always swap the cow forks when swapping extents
Since the CoW fork exists as a secondary data structure to the data
fork, we must always swap cow forks during swapext. We also need to
swap the extent counts and reset the cowblocks tags.
Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
ARC: [plat-hsdk]: Temporary fix to set CPU frequency to 1GHz
Add temporary fix to HSDK platform code to setup CPU frequency
to 1GHz on early boot.
We can remove this fix when smart hsdk pll driver will be
introduced, see discussion:
https://www.mail-archive.com/linux-snps-arc@lists.infradead.org/msg02689.html
ARCv2 ISA_CONFIG and ARC700_BUILD build config registers are not
compatible. cpuinfo_arc had isa info placeholder which was mashup of bits
form both.
Untangle this by defining it off of ARCv2 ISA info and it is fine even
for ARC700 since former is a super set of latter (ARC700 buildonly has 2
bits for atomics and stack check).
At runtime, we treat ARCv2 ISA info as a generic placeholder but
populate it correctly depending on ARC700 or HS.
This paves way for adding more HS specific bits in isa info which was
colliding with the extra bits for arc700.
Commit 92e5aae45778 "kernel/watchdog: split up config options"
introduced SOFTLOCKUP_DETECTOR which selects LOCKUP_DETECTOR
instead of the latter to be selected itself.
ARC: [plat-axs10x] sdio: Temporary fix of sdio ciu frequency
DW sdio controller has external ciu clock divider controlled
via register in SDIO IP. It divides sdio_ref_clk
(which comes from CGU) by 16 for default. So default mmcclk
clock (which comes to sdk_in) is 25000000 Hz.
Note this is a preventive fix, in line with similar change for HSDK
where this was actually needed. see:
http://lists.infradead.org/pipermail/linux-snps-arc/2017-September/002924.html
ARC: [plat-hsdk] sdio: Temporary fix of sdio ciu frequency
DW sdio controller has external ciu clock divider controlled via
register in SDIO IP. Due to its unexpected default value
(it should divide by 1 but it divides by 8)
SDIO IP uses wrong ciu clock and works unstable
So add temporary fix and change clock frequency from 100000000
to 12500000 Hz until we fix dw sdio driver itself.
ARC: [plat-axs103] Add temporary quirk to reset ethernet IP
DW ethernet controller on AXS10x hangs sometimes after SW reset, so
add temporary quirk to reset DW ethernet controller IP core.
This quirk can be removed after axs10x reset driver
(see http://patchwork.ozlabs.org/patch/800273/)
or simple reset driver
(see https://patchwork.kernel.org/patch/9903375/)
will be available in upstream.
Olof Johansson [Wed, 4 Oct 2017 01:15:58 +0000 (18:15 -0700)]
Merge tag 'omap-for-v4.14/fixes-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes
Fixes for omaps for v4.14-rc cycle
Few minor fixes for omaps, mostly just boot time warning fixes:
- Drop undocumented camera binding that got merged during the merge window by
accident as I applied before Sakari's comments
- Fix soft reset warning for dra7 kexec boot for gpio1 as the optional clocks
need to be enabled for reset
- Fix dra7 kexec boot clock rate for McASP as the rate is no longer the default
rate after kexec
- Fix omap3 pandora MMC warning during boot
- Add am33xx SPI alias like we have on other SoCs
- Remove node for non-existing CPSW EMAC Ethernet on am43xx-epos-evm
* tag 'omap-for-v4.14/fixes-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: am43xx-epos-evm: Remove extra CPSW EMAC entry
ARM: dts: am33xx: Add spi alias to match SOC schematics
ARM: OMAP2+: hsmmc: fix logic to call either omap_hsmmc_init or omap_hsmmc_late_init but not both
ARM: dts: dra7: Set a default parent to mcasp3_ahclkx_mux
ARM: OMAP2+: dra7xx: Set OPT_CLKS_IN_RESET flag for gpio1
ARM: dts: nokia n900: drop unneeded/undocumented parts of the dts
Olof Johansson [Wed, 4 Oct 2017 01:13:35 +0000 (18:13 -0700)]
Merge tag 'v4.14-rockchip-dts64fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into fixes
Adding the operating points on rk3368 like they were did not end up well
for the boards as all of them are missing their cpu supplies, the OPPs
actually need to follow the <target min max> format as the regulator is
shared between both clusters and the one rk3368 board I have, somehow also
doesn't like the higher opps at all - all of which I only realized after
I brought my rk3368 board online again, after its bootloader broke.
So we revert that OPP addition for now.
And also two fixes for the mipi dsi controller on rk3399, which was
referencing a clock to high up in the clock-tree so that an intermediate
gate could be disabled inadvertently and also needs a clock for its area
in the general register files of the rk3399 soc.
* tag 'v4.14-rockchip-dts64fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: add the grf clk for dw-mipi-dsi on rk3399
arm64: dts: rockchip: Correct MIPI DPHY PLL clock on rk3399
Revert "arm64: dts: rockchip: Add basic cpu frequencies for RK3368"
Olof Johansson [Wed, 4 Oct 2017 01:10:12 +0000 (18:10 -0700)]
Merge tag 'reset-fixes-for-4.14' of git://git.pengutronix.de/git/pza/linux into fixes
Reset controller fixes for v4.14
- Remove misleading HSDK v1 suffix, as there is no v2 planned
- Add missing DT binding documentation for HSDK reset driver
- Fix HSDK reset driver dependencies
* tag 'reset-fixes-for-4.14' of git://git.pengutronix.de/git/pza/linux:
reset: Restrict RESET_HSDK to ARC_SOC_HSDK or COMPILE_TEST
ARC: reset: remove the misleading v1 suffix all over
ARC: reset: add missing DT binding documentation for HSDKv1 reset driver
ARC: reset: Only build on archs that have IOMEM
Olof Johansson [Wed, 4 Oct 2017 01:09:08 +0000 (18:09 -0700)]
Merge tag 'davinci-fixes-for-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci into fixes
A fix for random ethernet mac address problem
on DA850 EVM.
* tag 'davinci-fixes-for-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci:
ARM: dts: da850-evm: add serial and ethernet aliases
Olof Johansson [Wed, 4 Oct 2017 01:08:27 +0000 (18:08 -0700)]
Merge tag 'at91-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nferre/linux-at91 into fixes
Fixes for 4.14:
- three DT fixes for the newly introduced sama5d27_som1_ek board
- one treewide modification that didn't touch this new PM code: we
synchronize now to be coherent with the other ARM platforms
* tag 'at91-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nferre/linux-at91:
ARM: at91: Replace uses of virt_to_phys with __pa_symbol
ARM: dts: at91: sama5d27_som1_ek: fix USB host vbus
ARM: dts: at91: sama5d27_som1_ek: fix typos
ARM: dts: at91: sama5d27_som1_ek: update pinmux/pinconf for LEDs and USB
This updates the Gemini defconfig with drivers merged
for v4.13 or v4.14:
- ATA driver is merged
- DMA driver is merged
- RTC driver gets selected from default Kconfig
Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Olof Johansson <olof@lixom.net>
ARM: defconfig: FRAMEBUFFER_CONSOLE can no longer be =m
It is no longer possible to load this at runtime, so let's
change the few remaining users to have it built-in all
the time.
arch/arm/configs/zeus_defconfig:115:warning: symbol value 'm' invalid for FRAMEBUFFER_CONSOLE
arch/arm/configs/viper_defconfig:116:warning: symbol value 'm' invalid for FRAMEBUFFER_CONSOLE
arch/arm/configs/pxa_defconfig:474:warning: symbol value 'm' invalid for FRAMEBUFFER_CONSOLE
Reported-by: kernelci.org bot <bot@kernelci.org> Fixes: 6104c37094e7 ("fbcon: Make fbcon a built-time depency for fbdev") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Olof Johansson <olof@lixom.net>
Mike Rapoport [Tue, 3 Oct 2017 23:16:54 +0000 (16:16 -0700)]
include/linux/fs.h: fix comment about struct address_space
Before commit 9c5d760b8d22 ("mm: split gfp_mask and mapping flags into
separate fields") the private_* fields of struct adrress_space were
grouped together and using "ditto" in comments describing the last
fields was correct.
With introduction of gpf_mask between private_lock and private_list
"ditto" references the wrong description.
ERROR: Does not appear to be a unified-diff format patch
The logic to suppress the unified-diff check for cover letters is there
but is checking $file instead of $filename. Fix the variable to use the
correct one.
Link: http://lkml.kernel.org/r/20170909090406.31523-1-shorne@gmail.com Signed-off-by: Stafford Horne <shorne@gmail.com> Acked-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sudip Mukherjee [Tue, 3 Oct 2017 23:16:49 +0000 (16:16 -0700)]
m32r: fix build failure
The allmodconfig build of m32r is failing with the error:
lib/mpi/mpih-div.o: In function 'mpihelp_divrem':
mpih-div.c:(.text+0x40): undefined reference to 'abort'
mpih-div.c:(.text+0x40): relocation truncated to fit:
R_M32R_26_PCREL_RELA against undefined symbol 'abort'
The function 'abort' was never defined for the m32r architecture.
Create 'abort' as is done in other arch like 'arm' and 'unicore32'.
printk_ratelimit() invokes ___ratelimit() which may invoke a normal
printk() (pr_warn() in this particular case) to warn about suppressed
output. Given that printk_ratelimit() may be called from anywhere, that
pr_warn() is dangerous - it may end up deadlocking the system. Fix
___ratelimit() by using deferred printk().
Jean Delvare [Tue, 3 Oct 2017 23:16:38 +0000 (16:16 -0700)]
kernel/params.c: fix an overflow in param_attr_show
Function param_attr_show could overflow the buffer it is operating on.
The buffer size is PAGE_SIZE, and the string returned by
attribute->param->ops->get is generated by scnprintf(buffer, PAGE_SIZE,
...) so it could be PAGE_SIZE - 1 long, with the terminating '\0' at the
very end of the buffer. Calling strcat(..., "\n") on this isn't safe, as
the '\0' will be replaced by '\n' (OK) and then another '\0' will be added
past the end of the buffer (not OK.)
Simply add the trailing '\n' when writing the attribute contents to the
buffer originally. This is safe, and also faster.
Credits to Teradata for discovering this issue.
Link: http://lkml.kernel.org/r/20170928162602.60c379c7@endymion Signed-off-by: Jean Delvare <jdelvare@suse.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jean Delvare [Tue, 3 Oct 2017 23:16:35 +0000 (16:16 -0700)]
kernel/params.c: fix the maximum length in param_get_string
The length parameter of strlcpy() is supposed to reflect the size of the
target buffer, not of the source string. Harmless in this case as the
buffer is PAGE_SIZE long and the source string is always much shorter than
this, but conceptually wrong, so let's fix it.
Link: http://lkml.kernel.org/r/20170928162515.24846b4f@endymion Signed-off-by: Jean Delvare <jdelvare@suse.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/memory_hotplug: define find_{smallest|biggest}_section_pfn as unsigned long
find_{smallest|biggest}_section_pfn()s find the smallest/biggest section
and return the pfn of the section. But the functions are defined as int.
So the functions always return 0x00000000 - 0xffffffff. It means if
memory address is over 16TB, the functions does not work correctly.
To handle 64 bit value, the patch defines
find_{smallest|biggest}_section_pfn() as unsigned long.
Fixes: 815121d2b5cd ("memory_hotplug: clear zone when removing the memory") Link: http://lkml.kernel.org/r/d9d5593a-d0a4-c4be-ab08-493df59a85c6@gmail.com Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/memory_hotplug: change pfn_to_section_nr/section_nr_to_pfn macro to inline function
pfn_to_section_nr() and section_nr_to_pfn() are defined as macro.
pfn_to_section_nr() has no issue even if it is defined as macro. But
section_nr_to_pfn() has overflow issue if sec is defined as int.
section_nr_to_pfn() just shifts sec by PFN_SECTION_SHIFT. If sec is
defined as unsigned long, section_nr_to_pfn() returns pfn as 64 bit value.
But if sec is defined as int, section_nr_to_pfn() returns pfn as 32 bit
value.
__remove_section() calculates start_pfn using section_nr_to_pfn() and
scn_nr defined as int. So if hot-removed memory address is over 16TB,
overflow issue occurs and section_nr_to_pfn() does not calculate correct
pfn.
To make callers use proper arg, the patch changes the macros to inline
functions.
Fixes: 815121d2b5cd ("memory_hotplug: clear zone when removing the memory") Link: http://lkml.kernel.org/r/e643a387-e573-6bbf-d418-c60c8ee3d15e@gmail.com Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Tue, 3 Oct 2017 23:16:23 +0000 (16:16 -0700)]
memremap: add scheduling point to devm_memremap_pages
devm_memremap_pages is initializing struct pages in for_each_device_pfn
and that can take quite some time. We have even seen a soft lockup
triggering on a non preemptive kernel
Link: http://lkml.kernel.org/r/20170918121410.24466-4-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Johannes Thumshirn <jthumshirn@suse.de> Tested-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Dan Williams <dan.j.williams@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix this by adding a scheduling point once per page block.
Link: http://lkml.kernel.org/r/20170918121410.24466-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Johannes Thumshirn <jthumshirn@suse.de> Tested-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Dan Williams <dan.j.williams@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Tue, 3 Oct 2017 23:16:16 +0000 (16:16 -0700)]
mm, memory_hotplug: add scheduling point to __add_pages
Patch series "mm, memory_hotplug: fix few soft lockups in memory
hotadd".
Johannes has noticed few soft lockups when adding a large nvdimm device.
All of them were caused by a long loop without any explicit cond_resched
which is a problem for !PREEMPT kernels.
The fix is quite straightforward. Just make sure that cond_resched gets
called from time to time.
This patch (of 3):
__add_pages gets a pfn range to add and there is no upper bound for a
single call. This is usually a memory block aligned size for the
regular memory hotplug - smaller sizes are usual for memory balloning
drivers, or the whole NUMA node for physical memory online. There is no
explicit scheduling point in that code path though.
This can lead to long latencies while __add_pages is executed and we
have even seen a soft lockup report during nvdimm initialization with
!PREEMPT kernel
Fix this by adding cond_resched once per each memory section in the
given pfn range. Each section is constant amount of work which itself
is not too expensive but many of them will just add up.
Link: http://lkml.kernel.org/r/20170918121410.24466-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Johannes Thumshirn <jthumshirn@suse.de> Tested-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Dan Williams <dan.j.williams@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Tue, 3 Oct 2017 23:16:10 +0000 (16:16 -0700)]
mm: memcontrol: use vmalloc fallback for large kmem memcg arrays
For quick per-memcg indexing, slab caches and list_lru structures
maintain linear arrays of descriptors. As the number of concurrent
memory cgroups in the system goes up, this requires large contiguous
allocations (8k cgroups = order-5, 16k cgroups = order-6 etc.) for every
existing slab cache and list_lru, which can easily fail on loaded
systems. E.g.:
This output is from an artificial reproducer, but we have repeatedly
observed order-7 failures in production in the Facebook fleet. These
systems become useless as they cannot run more jobs, even though there
is plenty of memory to allocate 128 individual pages.
Use kvmalloc and kvzalloc to fall back to vmalloc space if these arrays
prove too large for allocating them physically contiguous.
Link: http://lkml.kernel.org/r/20170918184919.20644-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Josef Bacik <jbacik@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Colin Ian King [Tue, 3 Oct 2017 23:16:01 +0000 (16:16 -0700)]
lib/lz4: make arrays static const, reduces object code size
Don't populate the read-only arrays dec32table and dec64table on the
stack, instead make them both static const. Makes the object code
smaller by over 10K bytes:
Before:
text data bss dec hex filename
31500 0 0 31500 7b0c lib/lz4/lz4_decompress.o
After:
text data bss dec hex filename
20237 176 0 20413 4fbd lib/lz4/lz4_decompress.o
(gcc version 7.2.0 x86_64)
Link: http://lkml.kernel.org/r/20170921221939.20820-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Sven Schmidt <4sschmid@informatik.uni-hamburg.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Tue, 3 Oct 2017 23:15:55 +0000 (16:15 -0700)]
exec: binfmt_misc: fix race between load_misc_binary() and kill_node()
load_misc_binary() makes a local copy of fmt->interpreter under
entries_lock to avoid the race with kill_node() but this is not enough;
the whole Node can be freed after we drop entries_lock, not only the
->interpreter string.
Add dget/dput(fmt->dentry) to ensure bm_evict_inode() can't destroy/free
this Node.
Link: http://lkml.kernel.org/r/20170922143650.GA17227@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ben Woodard <woodard@redhat.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jim Foraker <foraker1@llnl.gov> Cc: Travis Gummels <tgummels@redhat.com> Cc: <tdhooge@llnl.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Tue, 3 Oct 2017 23:15:48 +0000 (16:15 -0700)]
exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode()
To ensure that load_misc_binary() can't use the partially destroyed
Node, see also the next patch.
The current logic looks wrong in any case, once we close interp_file it
doesn't make any sense to delay kfree(inode->i_private), this Node is no
longer valid. Even if the MISC_FMT_OPEN_FILE/interp_file checks were
not racy (they are), load_misc_binary() should not try to reopen
->interpreter if MISC_FMT_OPEN_FILE is set but ->interp_file is NULL.
And I can't understand why do we use filp_close(), not fput().
Link: http://lkml.kernel.org/r/20170922143644.GA17216@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ben Woodard <woodard@redhat.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jim Foraker <foraker1@llnl.gov> Cc: <tdhooge@llnl.gov> Cc: Travis Gummels <tgummels@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Tue, 3 Oct 2017 23:15:45 +0000 (16:15 -0700)]
exec: binfmt_misc: don't nullify Node->dentry in kill_node()
kill_node() nullifies/checks Node->dentry to avoid double free. This
complicates the next changes and this is very confusing:
- we do not need to check dentry != NULL under entries_lock,
kill_node() is always called under inode_lock(d_inode(root)) and we
rely on this inode_lock() anyway, without this lock the
MISC_FMT_OPEN_FILE cleanup could race with itself.
- if kill_inode() was already called and ->dentry == NULL we should not
even try to close e->interp_file.
We can change bm_entry_write() to simply check !list_empty(list) before
kill_node. Again, we rely on inode_lock(), in particular it saves us
from the race with bm_status_write(), another caller of kill_node().
Link: http://lkml.kernel.org/r/20170922143641.GA17210@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ben Woodard <woodard@redhat.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jim Foraker <foraker1@llnl.gov> Cc: <tdhooge@llnl.gov> Cc: Travis Gummels <tgummels@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Tue, 3 Oct 2017 23:15:42 +0000 (16:15 -0700)]
exec: load_script: kill the onstack interp[BINPRM_BUF_SIZE] array
Patch series "exec: binfmt_misc: fix use-after-free, kill
iname[BINPRM_BUF_SIZE]".
It looks like this code was always wrong, then commit 948b701a607f
("binfmt_misc: add persistent opened binary handler for containers")
added more problems.
This patch (of 6):
load_script() can simply use i_name instead, it points into bprm->buf[]
and nobody can change this memory until we call prepare_binprm().
The only complication is that we need to also change the signature of
bprm_change_interp() but this change looks good too.
While at it, do whitespace/style cleanups.
NOTE: the real motivation for this change is that people want to
increase BINPRM_BUF_SIZE, we need to change load_misc_binary() too but
this looks more complicated because afaics it is very buggy.
Link: http://lkml.kernel.org/r/20170918163446.GA26793@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Travis Gummels <tgummels@redhat.com> Cc: Ben Woodard <woodard@redhat.com> Cc: Jim Foraker <foraker1@llnl.gov> Cc: <tdhooge@llnl.gov> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
userfaultfd: non-cooperative: fix fork use after free
When reading the event from the uffd, we put it on a temporary
fork_event list to detect if we can still access it after releasing and
retaking the event_wqh.lock.
If fork aborts and removes the event from the fork_event all is fine as
long as we're still in the userfault read context and fork_event head is
still alive.
We've to put the event allocated in the fork kernel stack, back from
fork_event list-head to the event_wqh head, before returning from
userfaultfd_ctx_read, because the fork_event head lifetime is limited to
the userfaultfd_ctx_read stack lifetime.
Forgetting to move the event back to its event_wqh place then results in
__remove_wait_queue(&ctx->event_wqh, &ewq->wq); in
userfaultfd_event_wait_completion to remove it from a head that has been
already freed from the reader stack.
This could only happen if resolve_userfault_fork failed (for example if
there are no file descriptors available to allocate the fork uffd). If
it succeeded it was put back correctly.
Furthermore, after find_userfault_evt receives a fork event, the forked
userfault context in fork_nctx and uwq->msg.arg.reserved.reserved1 can
be released by the fork thread as soon as the event_wqh.lock is
released. Taking a reference on the fork_nctx before dropping the lock
prevents an use after free in resolve_userfault_fork().
If the fork side aborted and it already released everything, we still
try to succeed resolve_userfault_fork(), if possible.
Fixes: 893e26e61d04eac9 ("userfaultfd: non-cooperative: Add fork() event") Link: http://lkml.kernel.org/r/20170920180413.26713-1-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shaohua Li [Tue, 3 Oct 2017 23:15:32 +0000 (16:15 -0700)]
mm: fix data corruption caused by lazyfree page
MADV_FREE clears pte dirty bit and then marks the page lazyfree (clear
SwapBacked). There is no lock to prevent the page is added to swap
cache between these two steps by page reclaim. If page reclaim finds
such page, it will simply add the page to swap cache without pageout the
page to swap because the page is marked as clean. Next time, page fault
will read data from the swap slot which doesn't have the original data,
so we have a data corruption. To fix issue, we mark the page dirty and
pageout the page.
However, we shouldn't dirty all pages which is clean and in swap cache.
swapin page is swap cache and clean too. So we only dirty page which is
added into swap cache in page reclaim, which shouldn't be swapin page.
As Minchan suggested, simply dirty the page in add_to_swap can do the
job.
Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages") Link: http://lkml.kernel.org/r/08c84256b007bf3f63c91d94383bd9eb6fee2daa.1506446061.git.shli@fb.com Signed-off-by: Shaohua Li <shli@fb.com> Reported-by: Artem Savkov <asavkov@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hillf Danton <hdanton@sina.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> [4.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shaohua Li [Tue, 3 Oct 2017 23:15:29 +0000 (16:15 -0700)]
mm: avoid marking swap cached page as lazyfree
MADV_FREE clears pte dirty bit and then marks the page lazyfree (clear
SwapBacked). There is no lock to prevent the page is added to swap
cache between these two steps by page reclaim. Page reclaim could add
the page to swap cache and unmap the page. After page reclaim, the page
is added back to lru. At that time, we probably start draining per-cpu
pagevec and mark the page lazyfree. So the page could be in a state
with SwapBacked cleared and PG_swapcache set. Next time there is a
refault in the virtual address, do_swap_page can find the page from swap
cache but the page has PageSwapCache false because SwapBacked isn't set,
so do_swap_page will bail out and do nothing. The task will keep
running into fault handler.
Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages") Link: http://lkml.kernel.org/r/6537ef3814398c0073630b03f176263bc81f0902.1506446061.git.shli@fb.com Signed-off-by: Shaohua Li <shli@fb.com> Reported-by: Artem Savkov <asavkov@redhat.com> Tested-by: Artem Savkov <asavkov@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Hillf Danton <hdanton@sina.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> [4.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Layton [Tue, 3 Oct 2017 23:15:25 +0000 (16:15 -0700)]
mm: have filemap_check_and_advance_wb_err clear AS_EIO/AS_ENOSPC
Eryu noticed that he could sometimes get a leftover error reported when
it shouldn't be on fsync with ext2 and non-journalled ext4.
The problem is that writeback_single_inode still uses filemap_fdatawait.
That picks up a previously set AS_EIO flag, which would ordinarily have
been cleared before.
Since we're mostly using this function as a replacement for
filemap_check_errors, have filemap_check_and_advance_wb_err clear AS_EIO
and AS_ENOSPC when reporting an error. That should allow the new
function to better emulate the behavior of the old with respect to these
flags.
Link: http://lkml.kernel.org/r/20170922133331.28812-1-jlayton@kernel.org Signed-off-by: Jeff Layton <jlayton@redhat.com> Reported-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since commit 056b9d8a7692 ("mm: remove rodata_test_data export, add
pr_fmt"), rodata_test_data is used only inside rodata_test.c By
declaring it static, it gets properly allocated into .rodata section
instead of .data:
Ioan Nicu [Tue, 3 Oct 2017 23:15:13 +0000 (16:15 -0700)]
rapidio: remove global irq spinlocks from the subsystem
Locking of config and doorbell operations should be done only if the
underlying hardware requires it.
This patch removes the global spinlocks from the rapidio subsystem and
moves them to the mport drivers (fsl_rio and tsi721), only to the
necessary places. For example, local config space read and write
operations (lcread/lcwrite) are atomic in all existing drivers, so there
should be no need for locking, while the cread/cwrite operations which
generate maintenance transactions need to be synchronized with a lock.
Later, each driver could chose to use a per-port lock instead of a
global one, or even more granular locking.
Link: http://lkml.kernel.org/r/20170824113023.GD50104@nokia.com Signed-off-by: Ioan Nicu <ioan.nicu.ext@nokia.com> Signed-off-by: Frank Kunz <frank.kunz@nokia.com> Acked-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Tue, 3 Oct 2017 23:15:10 +0000 (16:15 -0700)]
mm: meminit: mark init_reserved_page as __meminit
The function is called from __meminit context and calls other __meminit
functions but isn't it self mark as such today:
WARNING: vmlinux.o(.text.unlikely+0x4516): Section mismatch in reference from the function init_reserved_page() to the function .meminit.text:early_pfn_to_nid()
The function init_reserved_page() references the function __meminit early_pfn_to_nid().
This is often because init_reserved_page lacks a __meminit annotation or the annotation of early_pfn_to_nid is wrong.
On most compilers, we don't notice this because the function gets
inlined all the time. Adding __meminit here fixes the harmless warning
for the old versions and is generally the correct annotation.
Link: http://lkml.kernel.org/r/20170915193149.901180-1-arnd@arndb.de Fixes: 7e18adb4f80b ("mm: meminit: initialise remaining struct pages in parallel with kswapd") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaly Wool [Tue, 3 Oct 2017 23:15:06 +0000 (16:15 -0700)]
z3fold: fix stale list handling
Fix the situation when clear_bit() is called for page->private before
the page pointer is actually assigned. While at it, remove work_busy()
check because it is costly and does not give 100% guarantee anyway.
Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: <Oleksiy.Avramchenko@sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea brought to my attention that the L->{L,S} guarantees are
completely bogus for this case. I was looking at the diagram, from the
offending commit, when that _is_ the race, we had the load reordered
already.
What we need is at least S->L semantics, thus simply use
wq_has_sleeper() to serialize the call for good.
Sherry Yang [Tue, 3 Oct 2017 23:15:00 +0000 (16:15 -0700)]
android: binder: drop lru lock in isolate callback
Drop the global lru lock in isolate callback before calling
zap_page_range which calls cond_resched, and re-acquire the global lru
lock before returning. Also change return code to LRU_REMOVED_RETRY.
Use mmput_async when fail to acquire mmap sem in an atomic context.
Fix "BUG: sleeping function called from invalid context"
errors when CONFIG_DEBUG_ATOMIC_SLEEP is enabled.
Also restore mmput_async, which was initially introduced in commit ec8d7c14ea14 ("mm, oom_reaper: do not mmput synchronously from the oom
reaper context"), and was removed in commit 212925802454 ("mm: oom: let
oom_reap_task and exit_mmap run concurrently").
Link: http://lkml.kernel.org/r/20170914182231.90908-1-sherryy@android.com Fixes: f2517eb76f1f2 ("android: binder: Add global lru shrinker to binder") Signed-off-by: Sherry Yang <sherryy@android.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reported-by: Kyle Yan <kyan@codeaurora.org> Acked-by: Arve Hjønnevåg <arve@android.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Martijn Coenen <maco@google.com> Cc: Todd Kjos <tkjos@google.com> Cc: Riley Andrews <riandrews@android.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Hillf Danton <hdanton@sina.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hoeun Ryu <hoeun.ryu@gmail.com> Cc: Christopher Lameter <cl@linux.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 locks held by a.out/4771:
#0: (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
#1: (percpu_charge_mutex){+.+...}, at: [<ffffffff812b4c97>] try_charge+0x397/0x6e0
The problem is very similar to the one fixed by commit a459eeb7b852
("mm, page_alloc: do not depend on cpu hotplug locks inside the
allocator"). We are taking hotplug locks while we can be sitting on top
of basically arbitrary locks. This just calls for problems.
We can get rid of {get,put}_online_cpus, fortunately. We do not have to
be worried about races with memory hotplug because drain_local_stock,
which is called from both the WQ draining and the memory hotplug
contexts, is always operating on the local cpu stock with IRQs disabled.
The only thing to be careful about is that the target memcg doesn't
vanish while we are still in drain_all_stock so take a reference on it.
Link: http://lkml.kernel.org/r/20170913090023.28322-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Artem Savkov <asavkov@redhat.com> Tested-by: Artem Savkov <asavkov@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Tue, 3 Oct 2017 23:14:50 +0000 (16:14 -0700)]
mm, oom_reaper: skip mm structs with mmu notifiers
Andrea has noticed that the oom_reaper doesn't invalidate the range via
mmu notifiers (mmu_notifier_invalidate_range_start/end) and that can
corrupt the memory of the kvm guest for example.
tlb_flush_mmu_tlbonly already invokes mmu notifiers but that is not
sufficient as per Andrea:
"mmu_notifier_invalidate_range cannot be used in replacement of
mmu_notifier_invalidate_range_start/end. For KVM
mmu_notifier_invalidate_range is a noop and rightfully so. A MMU
notifier implementation has to implement either ->invalidate_range
method or the invalidate_range_start/end methods, not both. And if you
implement invalidate_range_start/end like KVM is forced to do, calling
mmu_notifier_invalidate_range in common code is a noop for KVM.
For those MMU notifiers that can get away only implementing
->invalidate_range, the ->invalidate_range is implicitly called by
mmu_notifier_invalidate_range_end(). And only those secondary MMUs
that share the same pagetable with the primary MMU (like AMD iommuv2)
can get away only implementing ->invalidate_range"
As the callback is allowed to sleep and the implementation is out of
hand of the MM it is safer to simply bail out if there is an mmu
notifier registered. In order to not fail too early make the
mm_has_notifiers check under the oom_lock and have a little nap before
failing to give the current oom victim some more time to exit.
[akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170913113427.2291-1-mhocko@kernel.org Fixes: aac453635549 ("mm, oom: introduce oom reaper") Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaly Wool [Tue, 3 Oct 2017 23:14:47 +0000 (16:14 -0700)]
z3fold: fix potential race in z3fold_reclaim_page
It is possible that on a (partially) unsuccessful page reclaim,
kref_put() called in z3fold_reclaim_page() does not yield page release,
but the page is released shortly afterwards by another thread. Then
z3fold_reclaim_page() would try to list_add() that (released) page again
which is obviously a bug.
To avoid that, spin_lock() has to be taken earlier, before the
kref_put() call mentioned earlier.
Link: http://lkml.kernel.org/r/20170913162937.bfff21c7d12b12a5f47639fd@gmail.com Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: <Oleksiy.Avramchenko@sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sh: sh7269: remove nonexistent GPIO_PH[0-7] to fix pinctrl registration
Pinmux_pins[] is initialized through PINMUX_GPIO(), using designated
array initializers, where the GPIO_* enums serve as indices. If enum
values are defined, but never used, pinmux_pins[] contains (zero-filled)
holes. Such entries are treated as pin zero, which was registered
before, thus leading to pinctrl registration failures, as seen on
sh7722:
sh-pfc pfc-sh7722: pin 0 already registered
sh-pfc pfc-sh7722: error during pin registration
sh-pfc pfc-sh7722: could not register: -22
sh-pfc: probe of pfc-sh7722 failed with error -22
sh: sh7264: remove nonexistent GPIO_PH[0-7] to fix pinctrl registration
Pinmux_pins[] is initialized through PINMUX_GPIO(), using designated
array initializers, where the GPIO_* enums serve as indices. If enum
values are defined, but never used, pinmux_pins[] contains (zero-filled)
holes. Such entries are treated as pin zero, which was registered
before, thus leading to pinctrl registration failures, as seen on
sh7722:
sh-pfc pfc-sh7722: pin 0 already registered
sh-pfc pfc-sh7722: error during pin registration
sh-pfc pfc-sh7722: could not register: -22
sh-pfc: probe of pfc-sh7722 failed with error -22
sh: sh7757: remove nonexistent GPIO_PT[JLNQ]7_RESV to fix pinctrl registration
Commit 3810e96056ff ("sh: modify pinmux for SH7757 2nd cut") renamed
GPIO_PT[JLNQ]7 to GPIO_PT[JLNQ]7_RESV, and removed the existing users
from the pinmux_pins[] array.
However, pinmux_pins[] is initialized through PINMUX_GPIO(), using
designated array initializers, where the GPIO_* enums serve as indices.
Hence entries were not really removed, but replaced by (zero-filled)
holes. Such entries are treated as pin zero, which was registered
before, thus leading to pinctrl registration failures, as seen on
sh7722:
sh-pfc pfc-sh7722: pin 0 already registered
sh-pfc pfc-sh7722: error during pin registration
sh-pfc pfc-sh7722: could not register: -22
sh-pfc: probe of pfc-sh7722 failed with error -22
Remove GPIO_PT[JLNQ]7_RESV from the enum to fix this.
sh: sh7722: remove nonexistent GPIO_PTQ7 to fix pinctrl registration
Patch series "sh: sh7722/sh7757i/sh7264/sh7269: Fix pinctrl registration",
v2.
Magnus Damm reported that on sh7722/Migo-R, pinctrl registration fails
with:
sh-pfc pfc-sh7722: pin 0 already registered
sh-pfc pfc-sh7722: error during pin registration
sh-pfc pfc-sh7722: could not register: -22
sh-pfc: probe of pfc-sh7722 failed with error -22
pinmux_pins[] is initialized through PINMUX_GPIO(), using designated
array initializers, where the GPIO_* enums serve as indices. Apparently
GPIO_PTQ7 was defined in the enum, but never used. If enum values are
defined, but never used, pinmux_pins[] contains (zero-filled) holes.
Hence such entries are treated as pin zero, which was registered before,
and pinctrl registration fails.
I can't see how this ever worked, as at the time of commit f5e25ae52fef
("sh-pfc: Add sh7722 pinmux support"), pinmux_gpios[] in
drivers/pinctrl/sh-pfc/pfc-sh7722.c already had the hole, and
drivers/pinctrl/core.c already had the check.
Some scripting revealed a few more broken drivers:
- sh7757 has four holes, due to nonexistent GPIO_PT[JLNQ]7_RESV.
- sh7264 and sh7269 define GPIO_PH[0-7], but don't use it with
PINMUX_GPIO().
Patch 1 fixes the issue on sh7722, and was tested. Patches 3-4 should
fix the issue on the other 3 SoCs, but was untested due to lack of
hardware.
This patch (of 4):
On sh7722/Migo-R, pinctrl registration fails with:
sh-pfc pfc-sh7722: pin 0 already registered
sh-pfc pfc-sh7722: error during pin registration
sh-pfc pfc-sh7722: could not register: -22
sh-pfc: probe of pfc-sh7722 failed with error -22
pinmux_pins[] is initialized through PINMUX_GPIO(), using designated array
initializers, where the GPIO_* enums serve as indices. As GPIO_PTQ7 is
defined in the enum, but never used, pinmux_pins[] contains a
(zero-filled) hole. Hence this entry is treated as pin zero, which was
registered before, and pinctrl registration fails.
According to the datasheet, port PTQ7 does not exist. Hence remove
GPIO_PTQ7 from the enum to fix this.
Alexandru Moise [Tue, 3 Oct 2017 23:14:31 +0000 (16:14 -0700)]
mm, hugetlb, soft_offline: save compound page order before page migration
This fixes a bug in madvise() where if you'd try to soft offline a
hugepage via madvise(), while walking the address range you'd end up,
using the wrong page offset due to attempting to get the compound order
of a former but presently not compound page, due to dissolving the huge
page (since commit c3114a84f7f9: "mm: hugetlb: soft-offline: dissolve
source hugepage after successful migration").
As a result I ended up with all my free pages except one being offlined.
Link: http://lkml.kernel.org/r/20170912204306.GA12053@gmail.com Fixes: c3114a84f7f9 ("mm: hugetlb: soft-offline: dissolve source hugepage after successful migration") Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Hillf Danton <hdanton@sina.com> Cc: Shaohua Li <shli@fb.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>