]> git.proxmox.com Git - mirror_ubuntu-eoan-kernel.git/log
mirror_ubuntu-eoan-kernel.git
5 years agoHID: core: fix dmesg flooding if report field larger than 32bit
Joshua Clayton [Wed, 30 Oct 2019 07:16:52 +0000 (15:16 +0800)]
HID: core: fix dmesg flooding if report field larger than 32bit

BugLink: https://bugs.launchpad.net/bugs/1850600
Only warn once of oversize hid report value field

On HP spectre x360 convertible the message:
hid-sensor-hub 001F:8087:0AC2.0002: hid_field_extract() called with n (192) > 32! (kworker/1:2)
is continually printed many times per second, crowding out all else.
Protect dmesg by printing the warning only one time.

The size of the hid report field data structure should probably be increased.
The data structure is treated as a u32 in Linux, but an unlimited number
of bits in the USB hid spec, so there is some rearchitecture needed now that
devices are sending more than 32 bits.

Signed-off-by: Joshua Clayton <stillcompiling@gmail.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
(cherry picked from commit 0af10eed9b7308187c7865024248b2a2a5aa382a)
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Acked-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoHID: core: Add printk_once variants to hid_warn() etc
Joshua Clayton [Wed, 30 Oct 2019 07:16:51 +0000 (15:16 +0800)]
HID: core: Add printk_once variants to hid_warn() etc

BugLink: https://bugs.launchpad.net/bugs/1850600
hid_warn_once() is needed. Add the others as part of the block.

Signed-off-by: Joshua Clayton <stillcompiling@gmail.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
(cherry picked from commit aaeabb121a6271775cef35ec3197e9ef124fdb2a)
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Acked-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoHID: core: reformat and reduce hid_printk macros
Joshua Clayton [Wed, 30 Oct 2019 07:16:50 +0000 (15:16 +0800)]
HID: core: reformat and reduce hid_printk macros

BugLink: https://bugs.launchpad.net/bugs/1850600
Reformat hid_printk macros to use standard __VA_ARGS__ syntax.
Per Joe Perches hid_printk(), hid_emerg(), hid_crit(), and hid_alert() are
unlikely ever to be used. Remove them.

Signed-off-by: Joshua Clayton <stillcompiling@gmail.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
(cherry picked from commit 337c22ab1d4f316f37e7a7c78bdea3d768270542)
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Acked-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoUBUNTU: SAUCE: ovl: Restore vm_file value when lower fs mmap fails
Seth Forshee [Thu, 7 Nov 2019 16:08:25 +0000 (10:08 -0600)]
UBUNTU: SAUCE: ovl: Restore vm_file value when lower fs mmap fails

BugLink: https://bugs.launchpad.net/bugs/1850994
ovl_mmap() overwrites vma->vm_file before calling the lower
filesystem mmap but does not restore the original value on
failure. This means it is giving a pointer to the lower fs file
back to the caller with no reference, which is a bad practice.
However, it does not lead to any issues with upstream kernels as
no caller accesses vma->vm_file after call_mmap().

With the aufs patches applied the story is different. Whereas
mmap_region() previously fput a local variable containing the
file it assigned to vm_file, it now calls vma_fput() which will
fput vm_file, for which it has no reference, and the reference
for the original vm_file is not put.

Fix this by restoring vma->vm_file to the original value when the
mmap call into the lower fs fails.

CVE-2019-15794

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoUBUNTU: SAUCE: shiftfs: Restore vm_file value when lower fs mmap fails
Seth Forshee [Thu, 7 Nov 2019 16:08:24 +0000 (10:08 -0600)]
UBUNTU: SAUCE: shiftfs: Restore vm_file value when lower fs mmap fails

BugLink: https://bugs.launchpad.net/bugs/1850994
shiftfs_mmap() overwrites vma->vm_file before calling the lower
filesystem mmap but does not restore the original value on
failure. This means it is giving a pointer to the lower fs file
back to the caller with no reference, which is a bad practice.
However, it does not lead to any issues with upstream kernels as
no caller accesses vma->vm_file after call_mmap().

With the aufs patches applied the story is different. Whereas
mmap_region() previously fput a local variable containing the
file it assigned to vm_file, it now calls vma_fput() which will
fput vm_file, for which it has no reference, and the reference
for the original vm_file is not put.

Fix this by restoring vma->vm_file to the original value when the
mmap call into the lower fs fails.

CVE-2019-15794

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoUBUNTU: SAUCE: fs: Move SB_I_NOSUID to the top of s_iflags
Seth Forshee [Thu, 7 Nov 2019 17:21:53 +0000 (11:21 -0600)]
UBUNTU: SAUCE: fs: Move SB_I_NOSUID to the top of s_iflags

BugLink: https://bugs.launchpad.net/bugs/1851677
SB_I_NOSUID was added by a sauce patch, and over time it has come
to occpy the same bit in s_iflags as SB_I_USERNS_VISIBLE without
being noticed. overlayfs will set SB_I_NOSUID when any lower
mount is nosuid. When this happens for a user namespace mount,
mount_too_revealing() will perform additional, unnecessary checks
which may block mounting when it should be allowed.

Move SB_I_NOSUID to prevent this conflict, and move it to the top
of s_iflags to make future conflicts less likely.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Sultan Alsawaf <sultan.alsawaf@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoUBUNTU: SAUCE: (efi-lockdown) Really don't allow lifting lockdown from userspace
Seth Forshee [Tue, 5 Nov 2019 20:35:05 +0000 (14:35 -0600)]
UBUNTU: SAUCE: (efi-lockdown) Really don't allow lifting lockdown from userspace

BugLink: https://bugs.launchpad.net/bugs/1851380
"UBUNTU: SAUCE: (efi-lockdown) Add a SysRq option to lift kernel
lockdown" adds a sysrq key to lift kernel lockdown, which is
meant to only allow a physically present user to lift lockdown
using a keyboard. However, the code has a bug which also allows
root to lift lockdown through /proc/sysrq-trigger. Fix this bug
to make this work as intended.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
5 years agoUBUNTU: [Debian]: do not skip tests for linux-hwe-edge
Thadeu Lima de Souza Cascardo [Fri, 25 Oct 2019 12:14:00 +0000 (14:14 +0200)]
UBUNTU: [Debian]: do not skip tests for linux-hwe-edge

Ignore: yes

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Acked-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoUBUNTU: upstream stable to v5.3.8
Connor Kuehl [Tue, 29 Oct 2019 15:26:28 +0000 (08:26 -0700)]
UBUNTU: upstream stable to v5.3.8

BugLink: https://bugs.launchpad.net/bugs/1850456
Ignore: yes
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoLinux 5.3.8
Greg Kroah-Hartman [Tue, 29 Oct 2019 08:22:48 +0000 (09:22 +0100)]
Linux 5.3.8

BugLink: https://bugs.launchpad.net/bugs/1850456
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoRDMA/cxgb4: Do not dma memory off of the stack
Greg KH [Tue, 1 Oct 2019 16:56:11 +0000 (18:56 +0200)]
RDMA/cxgb4: Do not dma memory off of the stack

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 3840c5b78803b2b6cc1ff820100a74a092c40cbb upstream.

Nicolas pointed out that the cxgb4 driver is doing dma off of the stack,
which is generally considered a very bad thing.  On some architectures it
could be a security problem, but odds are none of them actually run this
driver, so it's just a "normal" bug.

Resolve this by allocating the memory for a message off of the heap
instead of the stack.  kmalloc() always will give us a proper memory
location that DMA will work correctly from.

Link: https://lore.kernel.org/r/20191001165611.GA3542072@kroah.com
Reported-by: Nicolas Waisman <nico@semmle.com>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoblk-rq-qos: fix first node deletion of rq_qos_del()
Tejun Heo [Tue, 15 Oct 2019 15:49:27 +0000 (08:49 -0700)]
blk-rq-qos: fix first node deletion of rq_qos_del()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 307f4065b9d7c1e887e8bdfb2487e4638559fea1 upstream.

rq_qos_del() incorrectly assigns the node being deleted to the head if
it was the first on the list in the !prev path.  Fix it by iterating
with ** instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Fixes: a79050434b45 ("blk-rq-qos: refactor out common elements of blk-wbt")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoof: reserved_mem: add missing of_node_put() for proper ref-counting
Chris Goldsworthy [Sun, 20 Oct 2019 01:57:24 +0000 (18:57 -0700)]
of: reserved_mem: add missing of_node_put() for proper ref-counting

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 5dba51754b04a941a1064f584e7a7f607df3f9bc upstream.

Commit d698a388146c ("of: reserved-memory: ignore disabled memory-region
nodes") added an early return in of_reserved_mem_device_init_by_idx(), but
didn't call of_node_put() on a device_node whose ref-count was incremented
in the call to of_parse_phandle() preceding the early exit.

Fixes: d698a388146c ("of: reserved-memory: ignore disabled memory-region nodes")
Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: stable@vger.kernel.org
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoopp: of: drop incorrect lockdep_assert_held()
Viresh Kumar [Thu, 10 Oct 2019 10:25:33 +0000 (15:55 +0530)]
opp: of: drop incorrect lockdep_assert_held()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit f2edbb6699b0bc6e4f789846b99007200546c6c2 upstream.

_find_opp_of_np() doesn't traverse the list of OPP tables but instead
just the entries within an OPP table and so only requires to lock the
OPP table itself.

The lockdep_assert_held() was added there by mistake and isn't really
required.

Fixes: 5d6d106fa455 ("OPP: Populate required opp tables from "required-opps" property")
Cc: v5.0+ <stable@vger.kernel.org> # v5.0+
Reported-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoPCI: PM: Fix pci_power_up()
Rafael J. Wysocki [Mon, 14 Oct 2019 11:25:00 +0000 (13:25 +0200)]
PCI: PM: Fix pci_power_up()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 45144d42f299455911cc29366656c7324a3a7c97 upstream.

There is an arbitrary difference between the system resume and
runtime resume code paths for PCI devices regarding the delay to
apply when switching the devices from D3cold to D0.

Namely, pci_restore_standard_config() used in the runtime resume
code path calls pci_set_power_state() which in turn invokes
__pci_start_power_transition() to power up the device through the
platform firmware and that function applies the transition delay
(as per PCI Express Base Specification Revision 2.0, Section 6.6.1).
However, pci_pm_default_resume_early() used in the system resume
code path calls pci_power_up() which doesn't apply the delay at
all and that causes issues to occur during resume from
suspend-to-idle on some systems where the delay is required.

Since there is no reason for that difference to exist, modify
pci_power_up() to follow pci_set_power_state() more closely and
invoke __pci_start_power_transition() from there to call the
platform firmware to power up the device (in case that's necessary).

Fixes: db288c9c5f9d ("PCI / PM: restore the original behavior of pci_set_power_state()")
Reported-by: Daniel Drake <drake@endlessm.com>
Tested-by: Daniel Drake <drake@endlessm.com>
Link: https://lore.kernel.org/linux-pm/CAD8Lp44TYxrMgPLkHCqF9hv6smEurMXvmmvmtyFhZ6Q4SE+dig@mail.gmail.com/T/#m21be74af263c6a34f36e0fc5c77c5449d9406925
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoxen/netback: fix error path of xenvif_connect_data()
Juergen Gross [Fri, 18 Oct 2019 07:45:49 +0000 (09:45 +0200)]
xen/netback: fix error path of xenvif_connect_data()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 3d5c1a037d37392a6859afbde49be5ba6a70a6b3 upstream.

xenvif_connect_data() calls module_put() in case of error. This is
wrong as there is no related module_get().

Remove the superfluous module_put().

Fixes: 279f438e36c0a7 ("xen-netback: Don't destroy the netdev until the vif is shut down")
Cc: <stable@vger.kernel.org> # 3.12
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoceph: just skip unrecognized info in ceph_reply_info_extra
Jeff Layton [Thu, 26 Sep 2019 20:05:11 +0000 (16:05 -0400)]
ceph: just skip unrecognized info in ceph_reply_info_extra

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 1d3f87233e26362fc3d4e59f0f31a71b570f90b9 upstream.

In the future, we're going to want to extend the ceph_reply_info_extra
for create replies. Currently though, the kernel code doesn't accept an
extra blob that is larger than the expected data.

Change the code to skip over any unrecognized fields at the end of the
extra blob, rather than returning -EIO.

Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agocpufreq: Avoid cpufreq_suspend() deadlock on system shutdown
Rafael J. Wysocki [Tue, 8 Oct 2019 23:29:10 +0000 (01:29 +0200)]
cpufreq: Avoid cpufreq_suspend() deadlock on system shutdown

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 65650b35133ff20f0c9ef0abd5c3c66dbce3ae57 upstream.

It is incorrect to set the cpufreq syscore shutdown callback pointer
to cpufreq_suspend(), because that function cannot be run in the
syscore stage of system shutdown for two reasons: (a) it may attempt
to carry out actions depending on devices that have already been shut
down at that point and (b) the RCU synchronization carried out by it
may not be able to make progress then.

The latter issue has been present since commit 45975c7d21a1 ("rcu:
Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds"),
but the former one has been there since commit 90de2a4aa9f3 ("cpufreq:
suspend cpufreq governors on shutdown") regardless.

Fix that by dropping cpufreq_syscore_ops altogether and making
device_shutdown() call cpufreq_suspend() directly before shutting
down devices, which is along the lines of what system-wide power
management does.

Fixes: 45975c7d21a1 ("rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds")
Fixes: 90de2a4aa9f3 ("cpufreq: suspend cpufreq governors on shutdown")
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.0+ <stable@vger.kernel.org> # 4.0+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomemstick: jmb38x_ms: Fix an error handling path in 'jmb38x_ms_probe()'
Christophe JAILLET [Sat, 5 Oct 2019 11:21:01 +0000 (13:21 +0200)]
memstick: jmb38x_ms: Fix an error handling path in 'jmb38x_ms_probe()'

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 28c9fac09ab0147158db0baeec630407a5e9b892 upstream.

If 'jmb38x_ms_count_slots()' returns 0, we must undo the previous
'pci_request_regions()' call.

Goto 'err_out_int' to fix it.

Fixes: 60fdd931d577 ("memstick: add support for JMicron jmb38x MemoryStick host controller")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoKVM: PPC: Book3S HV: XIVE: Ensure VP isn't already in use
Greg Kurz [Fri, 27 Sep 2019 11:53:43 +0000 (13:53 +0200)]
KVM: PPC: Book3S HV: XIVE: Ensure VP isn't already in use

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 12ade69c1eb9958b13374edf5ef742ea20ccffde upstream.

Connecting a vCPU to a XIVE KVM device means establishing a 1:1
association between a vCPU id and the offset (VP id) of a VP
structure within a fixed size block of VPs. We currently try to
enforce the 1:1 relationship by checking that a vCPU with the
same id isn't already connected. This is good but unfortunately
not enough because we don't map VP ids to raw vCPU ids but to
packed vCPU ids, and the packing function kvmppc_pack_vcpu_id()
isn't bijective by design. We got away with it because QEMU passes
vCPU ids that fit well in the packing pattern. But nothing prevents
userspace to come up with a forged vCPU id resulting in a packed id
collision which causes the KVM device to associate two vCPUs to the
same VP. This greatly confuses the irq layer and ultimately crashes
the kernel, as shown below.

Example: a guest with 1 guest thread per core, a core stride of
8 and 300 vCPUs has vCPU ids 0,8,16...2392. If QEMU is patched to
inject at some point an invalid vCPU id 348, which is the packed
version of itself and 2392, we get:

genirq: Flags mismatch irq 199. 00010000 (kvm-2-2392) vs. 00010000 (kvm-2-348)
CPU: 24 PID: 88176 Comm: qemu-system-ppc Not tainted 5.3.0-xive-nr-servers-5.3-gku+ #38
Call Trace:
[c000003f7f9937e0] [c000000000c0110c] dump_stack+0xb0/0xf4 (unreliable)
[c000003f7f993820] [c0000000001cb480] __setup_irq+0xa70/0xad0
[c000003f7f9938d0] [c0000000001cb75c] request_threaded_irq+0x13c/0x260
[c000003f7f993940] [c00800000d44e7ac] kvmppc_xive_attach_escalation+0x104/0x270 [kvm]
[c000003f7f9939d0] [c00800000d45013c] kvmppc_xive_connect_vcpu+0x424/0x620 [kvm]
[c000003f7f993ac0] [c00800000d444428] kvm_arch_vcpu_ioctl+0x260/0x448 [kvm]
[c000003f7f993b90] [c00800000d43593c] kvm_vcpu_ioctl+0x154/0x7c8 [kvm]
[c000003f7f993d00] [c0000000004840f0] do_vfs_ioctl+0xe0/0xc30
[c000003f7f993db0] [c000000000484d44] ksys_ioctl+0x104/0x120
[c000003f7f993e00] [c000000000484d88] sys_ioctl+0x28/0x80
[c000003f7f993e20] [c00000000000b278] system_call+0x5c/0x68
xive-kvm: Failed to request escalation interrupt for queue 0 of VCPU 2392
------------[ cut here ]------------
remove_proc_entry: removing non-empty directory 'irq/199', leaking at least 'kvm-2-348'
WARNING: CPU: 24 PID: 88176 at /home/greg/Work/linux/kernel-kvm-ppc/fs/proc/generic.c:684 remove_proc_entry+0x1ec/0x200
Modules linked in: kvm_hv kvm dm_mod vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter squashfs loop fuse i2c_dev sg ofpart ocxl powernv_flash at24 xts mtd uio_pdrv_genirq vmx_crypto opal_prd ipmi_powernv uio ipmi_devintf ipmi_msghandler ibmpowernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables ext4 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq libcrc32c raid1 raid0 linear sd_mod ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci libata tg3 drm_panel_orientation_quirks [last unloaded: kvm]
CPU: 24 PID: 88176 Comm: qemu-system-ppc Not tainted 5.3.0-xive-nr-servers-5.3-gku+ #38
NIP:  c00000000053b0cc LR: c00000000053b0c8 CTR: c0000000000ba3b0
REGS: c000003f7f9934b0 TRAP: 0700   Not tainted  (5.3.0-xive-nr-servers-5.3-gku+)
MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 48228222  XER: 20040000
CFAR: c000000000131a50 IRQMASK: 0
GPR00: c00000000053b0c8 c000003f7f993740 c0000000015ec500 0000000000000057
GPR04: 0000000000000001 0000000000000000 000049fb98484262 0000000000001bcf
GPR08: 0000000000000007 0000000000000007 0000000000000001 9000000000001033
GPR12: 0000000000008000 c000003ffffeb800 0000000000000000 000000012f4ce5a1
GPR16: 000000012ef5a0c8 0000000000000000 000000012f113bb0 0000000000000000
GPR20: 000000012f45d918 c000003f863758b0 c000003f86375870 0000000000000006
GPR24: c000003f86375a30 0000000000000007 c0002039373d9020 c0000000014c4a48
GPR28: 0000000000000001 c000003fe62a4f6b c00020394b2e9fab c000003fe62a4ec0
NIP [c00000000053b0cc] remove_proc_entry+0x1ec/0x200
LR [c00000000053b0c8] remove_proc_entry+0x1e8/0x200
Call Trace:
[c000003f7f993740] [c00000000053b0c8] remove_proc_entry+0x1e8/0x200 (unreliable)
[c000003f7f9937e0] [c0000000001d3654] unregister_irq_proc+0x114/0x150
[c000003f7f993880] [c0000000001c6284] free_desc+0x54/0xb0
[c000003f7f9938c0] [c0000000001c65ec] irq_free_descs+0xac/0x100
[c000003f7f993910] [c0000000001d1ff8] irq_dispose_mapping+0x68/0x80
[c000003f7f993940] [c00800000d44e8a4] kvmppc_xive_attach_escalation+0x1fc/0x270 [kvm]
[c000003f7f9939d0] [c00800000d45013c] kvmppc_xive_connect_vcpu+0x424/0x620 [kvm]
[c000003f7f993ac0] [c00800000d444428] kvm_arch_vcpu_ioctl+0x260/0x448 [kvm]
[c000003f7f993b90] [c00800000d43593c] kvm_vcpu_ioctl+0x154/0x7c8 [kvm]
[c000003f7f993d00] [c0000000004840f0] do_vfs_ioctl+0xe0/0xc30
[c000003f7f993db0] [c000000000484d44] ksys_ioctl+0x104/0x120
[c000003f7f993e00] [c000000000484d88] sys_ioctl+0x28/0x80
[c000003f7f993e20] [c00000000000b278] system_call+0x5c/0x68
Instruction dump:
2c230000 41820008 3923ff78 e8e900a0 3c82ff69 3c62ff8d 7fa6eb78 7fc5f378
3884f080 3863b948 4bbf6925 60000000 <0fe000004bffff7c fba10088 4bbf6e41
---[ end trace b925b67a74a1d8d1 ]---
BUG: Kernel NULL pointer dereference at 0x00000010
Faulting instruction address: 0xc00800000d44fc04
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in: kvm_hv kvm dm_mod vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter squashfs loop fuse i2c_dev sg ofpart ocxl powernv_flash at24 xts mtd uio_pdrv_genirq vmx_crypto opal_prd ipmi_powernv uio ipmi_devintf ipmi_msghandler ibmpowernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables ext4 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq libcrc32c raid1 raid0 linear sd_mod ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci libata tg3 drm_panel_orientation_quirks [last unloaded: kvm]
CPU: 24 PID: 88176 Comm: qemu-system-ppc Tainted: G        W         5.3.0-xive-nr-servers-5.3-gku+ #38
NIP:  c00800000d44fc04 LR: c00800000d44fc00 CTR: c0000000001cd970
REGS: c000003f7f9938e0 TRAP: 0300   Tainted: G        W          (5.3.0-xive-nr-servers-5.3-gku+)
MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24228882  XER: 20040000
CFAR: c0000000001cd9ac DAR: 0000000000000010 DSISR: 40000000 IRQMASK: 0
GPR00: c00800000d44fc00 c000003f7f993b70 c00800000d468300 0000000000000000
GPR04: 00000000000000c7 0000000000000000 0000000000000000 c000003ffacd06d8
GPR08: 0000000000000000 c000003ffacd0738 0000000000000000 fffffffffffffffd
GPR12: 0000000000000040 c000003ffffeb800 0000000000000000 000000012f4ce5a1
GPR16: 000000012ef5a0c8 0000000000000000 000000012f113bb0 0000000000000000
GPR20: 000000012f45d918 00007ffffe0d9a80 000000012f4f5df0 000000012ef8c9f8
GPR24: 0000000000000001 0000000000000000 c000003fe4501ed0 c000003f8b1d0000
GPR28: c0000033314689c0 c000003fe4501c00 c000003fe4501e70 c000003fe4501e90
NIP [c00800000d44fc04] kvmppc_xive_cleanup_vcpu+0xfc/0x210 [kvm]
LR [c00800000d44fc00] kvmppc_xive_cleanup_vcpu+0xf8/0x210 [kvm]
Call Trace:
[c000003f7f993b70] [c00800000d44fc00] kvmppc_xive_cleanup_vcpu+0xf8/0x210 [kvm] (unreliable)
[c000003f7f993bd0] [c00800000d450bd4] kvmppc_xive_release+0xdc/0x1b0 [kvm]
[c000003f7f993c30] [c00800000d436a98] kvm_device_release+0xb0/0x110 [kvm]
[c000003f7f993c70] [c00000000046730c] __fput+0xec/0x320
[c000003f7f993cd0] [c000000000164ae0] task_work_run+0x150/0x1c0
[c000003f7f993d30] [c000000000025034] do_notify_resume+0x304/0x440
[c000003f7f993e20] [c00000000000dcc4] ret_from_except_lite+0x70/0x74
Instruction dump:
3bff0008 7fbfd040 419e0054 847e0004 2fa30000 419effec e93d0000 8929203c
2f890000 419effb8 4800821d e8410018 <e9230010e9490008 9b2a0039 7c0004ac
---[ end trace b925b67a74a1d8d2 ]---

Kernel panic - not syncing: Fatal exception

This affects both XIVE and XICS-on-XIVE devices since the beginning.

Check the VP id instead of the vCPU id when a new vCPU is connected.
The allocation of the XIVE CPU structure in kvmppc_xive_connect_vcpu()
is moved after the check to avoid the need for rollback.

Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agobtrfs: tracepoints: Fix bad entry members of qgroup events
Qu Wenruo [Thu, 17 Oct 2019 02:38:37 +0000 (10:38 +0800)]
btrfs: tracepoints: Fix bad entry members of qgroup events

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 1b2442b4ae0f234daeadd90e153b466332c466d8 upstream.

[BUG]
For btrfs:qgroup_meta_reserve event, the trace event can output garbage:

  qgroup_meta_reserve: 9c7f6acc-b342-4037-bc47-7f6e4d2232d7: refroot=5(FS_TREE) type=DATA diff=2
  qgroup_meta_reserve: 9c7f6acc-b342-4037-bc47-7f6e4d2232d7: refroot=5(FS_TREE) type=0x258792 diff=2

The @type can be completely garbage, as DATA type is not possible for
trace_qgroup_meta_reserve() trace event.

[CAUSE]
Ther are several problems related to qgroup trace events:
- Unassigned entry member
  Member entry::type of trace_qgroup_update_reserve() and
  trace_qgourp_meta_reserve() is not assigned

- Redundant entry member
  Member entry::type is completely useless in
  trace_qgroup_meta_convert()

Fixes: 4ee0d8832c2e ("btrfs: qgroup: Update trace events for metadata reservation")
CC: stable@vger.kernel.org # 4.10+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agobtrfs: tracepoints: Fix wrong parameter order for qgroup events
Qu Wenruo [Thu, 17 Oct 2019 02:38:36 +0000 (10:38 +0800)]
btrfs: tracepoints: Fix wrong parameter order for qgroup events

BugLink: https://bugs.launchpad.net/bugs/1850456
commit fd2b007eaec898564e269d1f478a2da0380ecf51 upstream.

[BUG]
For btrfs:qgroup_meta_reserve event, the trace event can output garbage:

  qgroup_meta_reserve: 9c7f6acc-b342-4037-bc47-7f6e4d2232d7: refroot=5(FS_TREE) type=DATA diff=2

The diff should always be alinged to sector size (4k), so there is
definitely something wrong.

[CAUSE]
For the wrong @diff, it's caused by wrong parameter order.
The correct parameters are:

  struct btrfs_root, s64 diff, int type.

However the parameters used are:

  struct btrfs_root, int type, s64 diff.

Fixes: 4ee0d8832c2e ("btrfs: qgroup: Update trace events for metadata reservation")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoBtrfs: check for the full sync flag while holding the inode lock during fsync
Filipe Manana [Wed, 16 Oct 2019 15:28:52 +0000 (16:28 +0100)]
Btrfs: check for the full sync flag while holding the inode lock during fsync

BugLink: https://bugs.launchpad.net/bugs/1850456
commit ba0b084ac309283db6e329785c1dc4f45fdbd379 upstream.

We were checking for the full fsync flag in the inode before locking the
inode, which is racy, since at that that time it might not be set but
after we acquire the inode lock some other task set it. One case where
this can happen is on a system low on memory and some concurrent task
failed to allocate an extent map and therefore set the full sync flag on
the inode, to force the next fsync to work in full mode.

A consequence of missing the full fsync flag set is hitting the problems
fixed by commit 0c713cbab620 ("Btrfs: fix race between ranged fsync and
writeback of adjacent ranges"), BUG_ON() when dropping extents from a log
tree, hitting assertion failures at tree-log.c:copy_items() or all sorts
of weird inconsistencies after replaying a log due to file extents items
representing ranges that overlap.

So just move the check such that it's done after locking the inode and
before starting writeback again.

Fixes: 0c713cbab620 ("Btrfs: fix race between ranged fsync and writeback of adjacent ranges")
CC: stable@vger.kernel.org # 5.2+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoBtrfs: fix qgroup double free after failure to reserve metadata for delalloc
Filipe Manana [Tue, 15 Oct 2019 09:54:39 +0000 (10:54 +0100)]
Btrfs: fix qgroup double free after failure to reserve metadata for delalloc

BugLink: https://bugs.launchpad.net/bugs/1850456
commit c7967fc1499beb9b70bb9d33525fb0b384af8883 upstream.

If we fail to reserve metadata for delalloc operations we end up releasing
the previously reserved qgroup amount twice, once explicitly under the
'out_qgroup' label by calling btrfs_qgroup_free_meta_prealloc() and once
again, under label 'out_fail', by calling btrfs_inode_rsv_release() with a
value of 'true' for its 'qgroup_free' argument, which results in
btrfs_qgroup_free_meta_prealloc() being called again, so we end up having
a double free.

Also if we fail to reserve the necessary qgroup amount, we jump to the
label 'out_fail', which calls btrfs_inode_rsv_release() and that in turns
calls btrfs_qgroup_free_meta_prealloc(), even though we weren't able to
reserve any qgroup amount. So we freed some amount we never reserved.

So fix this by removing the call to btrfs_inode_rsv_release() in the
failure path, since it's not necessary at all as we haven't changed the
inode's block reserve in any way at this point.

Fixes: c8eaeac7b73434 ("btrfs: reserve delalloc metadata differently")
CC: stable@vger.kernel.org # 5.2+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agobtrfs: don't needlessly create extent-refs kernel thread
David Sterba [Sat, 12 Oct 2019 16:42:10 +0000 (18:42 +0200)]
btrfs: don't needlessly create extent-refs kernel thread

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 80ed4548d0711d15ca51be5dee0ff813051cfc90 upstream.

The patch 32b593bfcb58 ("Btrfs: remove no longer used function to run
delayed refs asynchronously") removed the async delayed refs but the
thread has been created, without any use. Remove it to avoid resource
consumption.

Fixes: 32b593bfcb58 ("Btrfs: remove no longer used function to run delayed refs asynchronously")
CC: stable@vger.kernel.org # 5.2+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoBtrfs: add missing extents release on file extent cluster relocation error
Filipe Manana [Wed, 9 Oct 2019 16:43:45 +0000 (17:43 +0100)]
Btrfs: add missing extents release on file extent cluster relocation error

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 44db1216efe37bf670f8d1019cdc41658d84baf5 upstream.

If we error out when finding a page at relocate_file_extent_cluster(), we
need to release the outstanding extents counter on the relocation inode,
set by the previous call to btrfs_delalloc_reserve_metadata(), otherwise
the inode's block reserve size can never decrease to zero and metadata
space is leaked. Therefore add a call to btrfs_delalloc_release_extents()
in case we can't find the target page.

Fixes: 8b62f87bad9c ("Btrfs: rework outstanding_extents")
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agobtrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group()
Qu Wenruo [Thu, 10 Oct 2019 02:39:26 +0000 (10:39 +0800)]
btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 4b654acdae850f48b8250b9a578a4eaa518c7a6f upstream.

In btrfs_read_block_groups(), if we have an invalid block group which
has mixed type (DATA|METADATA) while the fs doesn't have MIXED_GROUPS
feature, we error out without freeing the block group cache.

This patch will add the missing btrfs_put_block_group() to prevent
memory leak.

Note for stable backports: the file to patch in versions <= 5.3 is
fs/btrfs/extent-tree.c

Fixes: 49303381f19a ("Btrfs: bail out if block group has different mixed flag")
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agopinctrl: armada-37xx: swap polarity on LED group
Patrick Williams [Tue, 1 Oct 2019 15:51:38 +0000 (10:51 -0500)]
pinctrl: armada-37xx: swap polarity on LED group

BugLink: https://bugs.launchpad.net/bugs/1850456
commit b835d6953009dc350d61402a854b5a7178d8c615 upstream.

The configuration registers for the LED group have inverted
polarity, which puts the GPIO into open-drain state when used in
GPIO mode.  Switch to '0' for GPIO and '1' for LED modes.

Fixes: 87466ccd9401 ("pinctrl: armada-37xx: Add pin controller support for Armada 37xx")
Signed-off-by: Patrick Williams <alpawi@amazon.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191001155154.99710-1-alpawi@amazon.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agopinctrl: armada-37xx: fix control of pins 32 and up
Patrick Williams [Tue, 1 Oct 2019 15:46:31 +0000 (10:46 -0500)]
pinctrl: armada-37xx: fix control of pins 32 and up

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 20504fa1d2ffd5d03cdd9dc9c9dd4ed4579b97ef upstream.

The 37xx configuration registers are only 32 bits long, so
pins 32-35 spill over into the next register.  The calculation
for the register address was done, but the bitmask was not, so
any configuration to pin 32 or above resulted in a bitmask that
overflowed and performed no action.

Fix the register / offset calculation to also adjust the offset.

Fixes: 5715092a458c ("pinctrl: armada-37xx: Add gpio support")
Signed-off-by: Patrick Williams <alpawi@amazon.com>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191001154634.96165-1-alpawi@amazon.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agopinctrl: cherryview: restore Strago DMI workaround for all versions
Dmitry Torokhov [Tue, 24 Sep 2019 02:49:58 +0000 (19:49 -0700)]
pinctrl: cherryview: restore Strago DMI workaround for all versions

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 260996c30f4f3a732f45045e3e0efe27017615e4 upstream.

This is essentially a revert of:

e3f72b749da2 pinctrl: cherryview: fix Strago DMI workaround
86c5dd6860a6 pinctrl: cherryview: limit Strago DMI workarounds to version 1.0

because even with 1.1 versions of BIOS there are some pins that are
configured as interrupts but not claimed by any driver, and they
sometimes fire up and result in interrupt storms that cause touchpad
stop functioning and other issues.

Given that we are unlikely to qualify another firmware version for a
while it is better to keep the workaround active on all Strago boards.

Reported-by: Alex Levin <levinale@chromium.org>
Fixes: 86c5dd6860a6 ("pinctrl: cherryview: limit Strago DMI workarounds to version 1.0")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Alex Levin <levinale@chromium.org>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agox86/hyperv: Make vapic support x2apic mode
Roman Kagan [Thu, 10 Oct 2019 12:33:05 +0000 (12:33 +0000)]
x86/hyperv: Make vapic support x2apic mode

BugLink: https://bugs.launchpad.net/bugs/1850456
commit e211288b72f15259da86eed6eca680758dbe9e74 upstream.

Now that there's Hyper-V IOMMU driver, Linux can switch to x2apic mode
when supported by the vcpus.

However, the apic access functions for Hyper-V enlightened apic assume
xapic mode only.

As a result, Linux fails to bring up secondary cpus when run as a guest
in QEMU/KVM with both hv_apic and x2apic enabled.

According to Michael Kelley, when in x2apic mode, the Hyper-V synthetic
apic MSRs behave exactly the same as the corresponding architectural
x2apic MSRs, so there's no need to override the apic accessors.  The
only exception is hv_apic_eoi_write, which benefits from lazy EOI when
available; however, its implementation works for both xapic and x2apic
modes.

Fixes: 29217a474683 ("iommu/hyper-v: Add Hyper-V stub IOMMU driver")
Fixes: 6b48cb5f8347 ("X86/Hyper-V: Enlighten APIC access")
Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191010123258.16919-1-rkagan@virtuozzo.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agox86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpu
Sean Christopherson [Tue, 1 Oct 2019 20:50:19 +0000 (13:50 -0700)]
x86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpu

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 7a22e03b0c02988e91003c505b34d752a51de344 upstream.

Check that the per-cpu cluster mask pointer has been set prior to
clearing a dying cpu's bit.  The per-cpu pointer is not set until the
target cpu reaches smp_callin() during CPUHP_BRINGUP_CPU, whereas the
teardown function, x2apic_dead_cpu(), is associated with the earlier
CPUHP_X2APIC_PREPARE.  If an error occurs before the cpu is awakened,
e.g. if do_boot_cpu() itself fails, x2apic_dead_cpu() will dereference
the NULL pointer and cause a panic.

  smpboot: do_boot_cpu failed(-22) to wakeup CPU#1
  BUG: kernel NULL pointer dereference, address: 0000000000000008
  RIP: 0010:x2apic_dead_cpu+0x1a/0x30
  Call Trace:
   cpuhp_invoke_callback+0x9a/0x580
   _cpu_up+0x10d/0x140
   do_cpu_up+0x69/0xb0
   smp_init+0x63/0xa9
   kernel_init_freeable+0xd7/0x229
   ? rest_init+0xa0/0xa0
   kernel_init+0xa/0x100
   ret_from_fork+0x35/0x40

Fixes: 023a611748fd5 ("x86/apic/x2apic: Simplify cluster management")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191001205019.5789-1-sean.j.christopherson@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agox86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area
Steve Wahl [Tue, 24 Sep 2019 21:03:55 +0000 (16:03 -0500)]
x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 2aa85f246c181b1fa89f27e8e20c5636426be624 upstream.

Our hardware (UV aka Superdome Flex) has address ranges marked
reserved by the BIOS. Access to these ranges is caught as an error,
causing the BIOS to halt the system.

Initial page tables mapped a large range of physical addresses that
were not checked against the list of BIOS reserved addresses, and
sometimes included reserved addresses in part of the mapped range.
Including the reserved range in the map allowed processor speculative
accesses to the reserved range, triggering a BIOS halt.

Used early in booting, the page table level2_kernel_pgt addresses 1
GiB divided into 2 MiB pages, and it was set up to linearly map a full
 1 GiB of physical addresses that included the physical address range
of the kernel image, as chosen by KASLR.  But this also included a
large range of unused addresses on either side of the kernel image.
And unlike the kernel image's physical address range, this extra
mapped space was not checked against the BIOS tables of usable RAM
addresses.  So there were times when the addresses chosen by KASLR
would result in processor accessible mappings of BIOS reserved
physical addresses.

The kernel code did not directly access any of this extra mapped
space, but having it mapped allowed the processor to issue speculative
accesses into reserved memory, causing system halts.

This was encountered somewhat rarely on a normal system boot, and much
more often when starting the crash kernel if "crashkernel=512M,high"
was specified on the command line (this heavily restricts the physical
address of the crash kernel, in our case usually within 1 GiB of
reserved space).

The solution is to invalidate the pages of this table outside the kernel
image's space before the page table is activated. It fixes this problem
on our hardware.

 [ bp: Touchups. ]

Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: dimitri.sivanich@hpe.com
Cc: Feng Tang <feng.tang@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jordan Borgner <mail@jordan-borgner.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: mike.travis@hpe.com
Cc: russ.anderson@hpe.com
Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Link: https://lkml.kernel.org/r/9c011ee51b081534a7a15065b1681d200298b530.1569358539.git.steve.wahl@hpe.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoirqchip/sifive-plic: Switch to fasteoi flow
Marc Zyngier [Sun, 15 Sep 2019 14:17:45 +0000 (15:17 +0100)]
irqchip/sifive-plic: Switch to fasteoi flow

BugLink: https://bugs.launchpad.net/bugs/1850456
commit bb0fed1c60cccbe4063b455a7228818395dac86e upstream.

The SiFive PLIC interrupt controller seems to have all the HW
features to support the fasteoi flow, but the driver seems to be
stuck in a distant past. Bring it into the 21st century.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Palmer Dabbelt <palmer@sifive.com> (QEMU Boot)
Tested-by: Darius Rad <darius@bluespec.com> (on 2 HW PLIC implementations)
Tested-by: Paul Walmsley <paul.walmsley@sifive.com> (HiFive Unleashed)
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/8636gxskmj.wl-maz@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agodm cache: fix bugs when a GFP_NOWAIT allocation fails
Mikulas Patocka [Wed, 16 Oct 2019 13:21:50 +0000 (09:21 -0400)]
dm cache: fix bugs when a GFP_NOWAIT allocation fails

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 13bd677a472d534bf100bab2713efc3f9e3f5978 upstream.

GFP_NOWAIT allocation can fail anytime - it doesn't wait for memory being
available and it fails if the mempool is exhausted and there is not enough
memory.

If we go down this path:
  map_bio -> mg_start -> alloc_migration -> mempool_alloc(GFP_NOWAIT)
we can see that map_bio() doesn't check the return value of mg_start(),
and the bio is leaked.

If we go down this path:
  map_bio -> mg_start -> mg_lock_writes -> alloc_prison_cell ->
  dm_bio_prison_alloc_cell_v2 -> mempool_alloc(GFP_NOWAIT) ->
  mg_lock_writes -> mg_complete
the bio is ended with an error - it is unacceptable because it could
cause filesystem corruption if the machine ran out of memory
temporarily.

Change GFP_NOWAIT to GFP_NOIO, so that the mempool code will properly
wait until memory becomes available. mempool_alloc with GFP_NOIO can't
fail, so remove the code paths that deal with allocation failure.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agofs/dax: Fix pmd vs pte conflict detection
Dan Williams [Mon, 21 Oct 2019 16:29:20 +0000 (09:29 -0700)]
fs/dax: Fix pmd vs pte conflict detection

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 6370740e5f8ef12de7f9a9bf48a0393d202cd827 upstream.

Users reported a v5.3 performance regression and inability to establish
huge page mappings. A revised version of the ndctl "dax.sh" huge page
unit test identifies commit 23c84eb78375 "dax: Fix missed wakeup with
PMD faults" as the source.

Update get_unlocked_entry() to check for NULL entries before checking
the entry order, otherwise NULL is misinterpreted as a present pte
conflict. The 'order' check needs to happen before the locked check as
an unlocked entry at the wrong order must fallback to lookup the correct
order.

Reported-by: Jeff Smits <jeff.smits@intel.com>
Reported-by: Doug Nelson <doug.nelson@intel.com>
Cc: <stable@vger.kernel.org>
Fixes: 23c84eb78375 ("dax: Fix missed wakeup with PMD faults")
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Link: https://lore.kernel.org/r/157167532455.3945484.11971474077040503994.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agotracing: Fix race in perf_trace_buf initialization
Prateek Sood [Tue, 15 Oct 2019 06:17:25 +0000 (11:47 +0530)]
tracing: Fix race in perf_trace_buf initialization

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 6b1340cc00edeadd52ebd8a45171f38c8de2a387 upstream.

A race condition exists while initialiazing perf_trace_buf from
perf_trace_init() and perf_kprobe_init().

      CPU0                                        CPU1
perf_trace_init()
  mutex_lock(&event_mutex)
    perf_trace_event_init()
      perf_trace_event_reg()
        total_ref_count == 0
buf = alloc_percpu()
        perf_trace_buf[i] = buf
        tp_event->class->reg() //fails       perf_kprobe_init()
goto fail                              perf_trace_event_init()
                                                 perf_trace_event_reg()
        fail:
  total_ref_count == 0

                                                   total_ref_count == 0
                                                   buf = alloc_percpu()
                                                   perf_trace_buf[i] = buf
                                                   tp_event->class->reg()
                                                   total_ref_count++

          free_percpu(perf_trace_buf[i])
          perf_trace_buf[i] = NULL

Any subsequent call to perf_trace_event_reg() will observe total_ref_count > 0,
causing the perf_trace_buf to be always NULL. This can result in perf_trace_buf
getting accessed from perf_trace_buf_alloc() without being initialized. Acquiring
event_mutex in perf_kprobe_init() before calling perf_trace_event_init() should
fix this race.

The race caused the following bug:

 Unable to handle kernel paging request at virtual address 0000003106f2003c
 Mem abort info:
   ESR = 0x96000045
   Exception class = DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x00000045
   CM = 0, WnR = 1
 user pgtable: 4k pages, 39-bit VAs, pgdp = ffffffc034b9b000
 [0000003106f2003c] pgd=0000000000000000, pud=0000000000000000
 Internal error: Oops: 96000045 [#1] PREEMPT SMP
 Process syz-executor (pid: 18393, stack limit = 0xffffffc093190000)
 pstate: 80400005 (Nzcv daif +PAN -UAO)
 pc : __memset+0x20/0x1ac
 lr : memset+0x3c/0x50
 sp : ffffffc09319fc50

  __memset+0x20/0x1ac
  perf_trace_buf_alloc+0x140/0x1a0
  perf_trace_sys_enter+0x158/0x310
  syscall_trace_enter+0x348/0x7c0
  el0_svc_common+0x11c/0x368
  el0_svc_handler+0x12c/0x198
  el0_svc+0x8/0xc

Ramdumps showed the following:
  total_ref_count = 3
  perf_trace_buf = (
      0x0 -> NULL,
      0x0 -> NULL,
      0x0 -> NULL,
      0x0 -> NULL)

Link: http://lkml.kernel.org/r/1571120245-4186-1-git-send-email-prsood@codeaurora.org
Cc: stable@vger.kernel.org
Fixes: e12f03d7031a9 ("perf/core: Implement the 'perf_kprobe' PMU")
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoperf/aux: Fix AUX output stopping
Alexander Shishkin [Tue, 22 Oct 2019 07:39:40 +0000 (10:39 +0300)]
perf/aux: Fix AUX output stopping

BugLink: https://bugs.launchpad.net/bugs/1850456
commit f3a519e4add93b7b31a6616f0b09635ff2e6a159 upstream.

Commit:

  8a58ddae2379 ("perf/core: Fix exclusive events' grouping")

allows CAP_EXCLUSIVE events to be grouped with other events. Since all
of those also happen to be AUX events (which is not the case the other
way around, because arch/s390), this changes the rules for stopping the
output: the AUX event may not be on its PMU's context any more, if it's
grouped with a HW event, in which case it will be on that HW event's
context instead. If that's the case, munmap() of the AUX buffer can't
find and stop the AUX event, potentially leaving the last reference with
the atomic context, which will then end up freeing the AUX buffer. This
will then trip warnings:

Fix this by using the context's PMU context when looking for events
to stop, instead of the event's PMU context.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191022073940.61814-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoCIFS: Fix use after free of file info structures
Pavel Shilovsky [Wed, 23 Oct 2019 22:37:19 +0000 (15:37 -0700)]
CIFS: Fix use after free of file info structures

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 1a67c415965752879e2e9fad407bc44fc7f25f23 upstream.

Currently the code assumes that if a file info entry belongs
to lists of open file handles of an inode and a tcon then
it has non-zero reference. The recent changes broke that
assumption when putting the last reference of the file info.
There may be a situation when a file is being deleted but
nothing prevents another thread to reference it again
and start using it. This happens because we do not hold
the inode list lock while checking the number of references
of the file info structure. Fix this by doing the proper
locking when doing the check.

Fixes: 487317c99477d ("cifs: add spinlock for the openFileList to cifsInodeInfo")
Fixes: cb248819d209d ("cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic")
Cc: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agocifs: Fix missed free operations
Chuhong Yuan [Mon, 14 Oct 2019 07:15:31 +0000 (15:15 +0800)]
cifs: Fix missed free operations

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 783bf7b8b641167fb6f3f4f787f60ae62bad41b3 upstream.

cifs_setattr_nounix has two paths which miss free operations
for xid and fullpath.
Use goto cifs_setattr_exit like other paths to fix them.

CC: Stable <stable@vger.kernel.org>
Fixes: aa081859b10c ("cifs: flush before set-info if we have writeable handles")
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoCIFS: avoid using MID 0xFFFF
Roberto Bergantinos Corpas [Mon, 14 Oct 2019 08:59:23 +0000 (10:59 +0200)]
CIFS: avoid using MID 0xFFFF

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 03d9a9fe3f3aec508e485dd3dcfa1e99933b4bdb upstream.

According to MS-CIFS specification MID 0xFFFF should not be used by the
CIFS client, but we actually do. Besides, this has proven to cause races
leading to oops between SendReceive2/cifs_demultiplex_thread. On SMB1,
MID is a 2 byte value easy to reach in CurrentMid which may conflict with
an oplock break notification request coming from server

Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoarm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected
Marc Zyngier [Fri, 13 Sep 2019 09:57:50 +0000 (10:57 +0100)]
arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 603afdc9438ac546181e843f807253d75d3dbc45 upstream.

Allow the user to select the workaround for TX2-219, and update
the silicon-errata.rst file to reflect this.

Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoarm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
Marc Zyngier [Tue, 9 Apr 2019 15:26:21 +0000 (16:26 +0100)]
arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 93916beb70143c46bf1d2bacf814be3a124b253b upstream.

It appears that the only case where we need to apply the TX2_219_TVM
mitigation is when the core is in SMT mode. So let's condition the
enabling on detecting a CPU whose MPIDR_EL1.Aff0 is non-zero.

Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoarm64: Avoid Cavium TX2 erratum 219 when switching TTBR
Marc Zyngier [Tue, 9 Apr 2019 15:22:24 +0000 (16:22 +0100)]
arm64: Avoid Cavium TX2 erratum 219 when switching TTBR

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 9405447ef79bc93101373e130f72e9e6cbf17dbb upstream.

As a PRFM instruction racing against a TTBR update can have undesirable
effects on TX2, NOP-out such PRFM on cores that are affected by
the TX2-219 erratum.

Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoarm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set
Marc Zyngier [Thu, 7 Feb 2019 16:01:21 +0000 (16:01 +0000)]
arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set

BugLink: https://bugs.launchpad.net/bugs/1850456
commit d3ec3a08fa700c8b46abb137dce4e2514a6f9668 upstream.

In order to workaround the TX2-219 erratum, it is necessary to trap
TTBRx_EL1 accesses to EL2. This is done by setting HCR_EL2.TVM on
guest entry, which has the side effect of trapping all the other
VM-related sysregs as well.

To minimize the overhead, a fast path is used so that we don't
have to go all the way back to the main sysreg handling code,
unless the rest of the hypervisor expects to see these accesses.

Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoEDAC/ghes: Fix Use after free in ghes_edac remove path
James Morse [Mon, 14 Oct 2019 17:19:18 +0000 (18:19 +0100)]
EDAC/ghes: Fix Use after free in ghes_edac remove path

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 1e72e673b9d102ff2e8333e74b3308d012ddf75b upstream.

ghes_edac models a single logical memory controller, and uses a global
ghes_init variable to ensure only the first ghes_edac_register() will
do anything.

ghes_edac is registered the first time a GHES entry in the HEST is
probed. There may be multiple entries, so subsequent attempts to
register ghes_edac are silently ignored as the work has already been
done.

When a GHES entry is unregistered, it calls ghes_edac_unregister(),
which free()s the memory behind the global variables in ghes_edac.

But there may be multiple GHES entries, the next call to
ghes_edac_unregister() will dereference the free()d memory, and attempt
to free it a second time.

This may also be triggered on a platform with one GHES entry, if the
driver is unbound/re-bound and unbound. The re-bind step will do
nothing because of ghes_init, the second unbind will then do the same
work as the first.

Doing the unregister work on the first call is unsafe, as another
CPU may be processing a notification in ghes_edac_report_mem_error(),
using the memory we are about to free.

ghes_init is already half of the reference counting. We only need
to do the register work for the first call, and the unregister work
for the last. Add the unregister check.

This means we no longer free ghes_edac's memory while there are
GHES entries that may receive a notification.

This was detected by KASAN and DEBUG_TEST_DRIVER_REMOVE.

 [ bp: merge into a single patch. ]

Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller")
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20191014171919.85044-2-james.morse@arm.com
Link: https://lkml.kernel.org/r/304df85b-8b56-b77e-1a11-aa23769f2e7c@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoparisc: Fix vmap memory leak in ioremap()/iounmap()
Helge Deller [Fri, 4 Oct 2019 17:23:37 +0000 (19:23 +0200)]
parisc: Fix vmap memory leak in ioremap()/iounmap()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 513f7f747e1cba81f28a436911fba0b485878ebd upstream.

Sven noticed that calling ioremap() and iounmap() multiple times leads
to a vmap memory leak:
vmap allocation for size 4198400 failed:
use vmalloc=<size> to increase size

It seems we missed calling vunmap() in iounmap().

Signed-off-by: Helge Deller <deller@gmx.de>
Noticed-by: Sven Schnelle <svens@stackframe.org>
Cc: <stable@vger.kernel.org> # v3.16+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agolib/vdso: Make clock_getres() POSIX compliant again
Thomas Gleixner [Mon, 21 Oct 2019 10:07:15 +0000 (12:07 +0200)]
lib/vdso: Make clock_getres() POSIX compliant again

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 1638b8f096ca165965189b9626564c933c79fe63 upstream.

A recent commit removed the NULL pointer check from the clock_getres()
implementation causing a test case to fault.

POSIX requires an explicit NULL pointer check for clock_getres() aside of
the validity check of the clock_id argument for obscure reasons.

Add it back for both 32bit and 64bit.

Note, this is only a partial revert of the offending commit which does not
bring back the broken fallback invocation in the the 32bit compat
implementations of clock_getres() and clock_gettime().

Fixes: a9446a906f52 ("lib/vdso/32: Remove inconsistent NULL pointer checks")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1910211202260.1904@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agos390/kaslr: add support for R_390_GLOB_DAT relocation type
Gerald Schaefer [Mon, 21 Oct 2019 17:56:00 +0000 (19:56 +0200)]
s390/kaslr: add support for R_390_GLOB_DAT relocation type

BugLink: https://bugs.launchpad.net/bugs/1850456
commit ac49303d9ef0ad98b79867a380ef23480e48870b upstream.

Commit "bpf: Process in-kernel BTF" in linux-next introduced an undefined
__weak symbol, which results in an R_390_GLOB_DAT relocation type. That
is not yet handled by the KASLR relocation code, and the kernel stops with
the message "Unknown relocation type".

Add code to detect and handle R_390_GLOB_DAT relocation types and undefined
symbols.

Fixes: 805bc0bc238f ("s390/kernel: build a relocatable kernel")
Cc: <stable@vger.kernel.org> # v5.2+
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agos390/zcrypt: fix memleak at release
Johan Hovold [Thu, 10 Oct 2019 13:13:33 +0000 (15:13 +0200)]
s390/zcrypt: fix memleak at release

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 388bb19be8eab4674a660e0c97eaf60775362bc7 upstream.

If a process is interrupted while accessing the crypto device and the
global ap_perms_mutex is contented, release() could return early and
fail to free related resources.

Fixes: 00fab2350e6b ("s390/zcrypt: multiple zcrypt device nodes support")
Cc: <stable@vger.kernel.org> # 4.19
Cc: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoxtensa: fix change_bit in exclusive access option
Max Filippov [Wed, 16 Oct 2019 04:51:43 +0000 (21:51 -0700)]
xtensa: fix change_bit in exclusive access option

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 775fd6bfefc66a8c33e91dd9687ed530643b954d upstream.

change_bit implementation for XCHAL_HAVE_EXCLUSIVE case changes all bits
except the one required due to copy-paste error from clear_bit.

Cc: stable@vger.kernel.org # v5.2+
Fixes: f7c34874f04a ("xtensa: add exclusive atomics support")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoxtensa: drop EXPORT_SYMBOL for outs*/ins*
Max Filippov [Mon, 14 Oct 2019 22:48:19 +0000 (15:48 -0700)]
xtensa: drop EXPORT_SYMBOL for outs*/ins*

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 8b39da985194aac2998dd9e3a22d00b596cebf1e upstream.

Custom outs*/ins* implementations are long gone from the xtensa port,
remove matching EXPORT_SYMBOLs.
This fixes the following build warnings issued by modpost since commit
15bfc2348d54 ("modpost: check for static EXPORT_SYMBOL* functions"):

  WARNING: "insb" [vmlinux] is a static EXPORT_SYMBOL
  WARNING: "insw" [vmlinux] is a static EXPORT_SYMBOL
  WARNING: "insl" [vmlinux] is a static EXPORT_SYMBOL
  WARNING: "outsb" [vmlinux] is a static EXPORT_SYMBOL
  WARNING: "outsw" [vmlinux] is a static EXPORT_SYMBOL
  WARNING: "outsl" [vmlinux] is a static EXPORT_SYMBOL

Cc: stable@vger.kernel.org
Fixes: d38efc1f150f ("xtensa: adopt generic io routines")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agozram: fix race between backing_dev_show and backing_dev_store
Chenwandun [Sat, 19 Oct 2019 03:20:14 +0000 (20:20 -0700)]
zram: fix race between backing_dev_show and backing_dev_store

BugLink: https://bugs.launchpad.net/bugs/1850456
commit f7daefe4231e57381d92c2e2ad905a899c28e402 upstream.

CPU0:        CPU1:
backing_dev_show        backing_dev_store
    ......    ......
    file = zram->backing_dev;
    down_read(&zram->init_lock);    down_read(&zram->init_init_lock)
    file_path(file, ...);    zram->backing_dev = backing_dev;
    up_read(&zram->init_lock);    up_read(&zram->init_lock);

gets the value of zram->backing_dev too early in backing_dev_show, which
resultin the value being NULL at the beginning, and not NULL later.

backtrace:
  d_path+0xcc/0x174
  file_path+0x10/0x18
  backing_dev_show+0x40/0xb4
  dev_attr_show+0x20/0x54
  sysfs_kf_seq_show+0x9c/0x10c
  kernfs_seq_show+0x28/0x30
  seq_read+0x184/0x488
  kernfs_fop_read+0x5c/0x1a4
  __vfs_read+0x44/0x128
  vfs_read+0xa0/0x138
  SyS_read+0x54/0xb4

Link: http://lkml.kernel.org/r/1571046839-16814-1-git-send-email-chenwandun@huawei.com
Signed-off-by: Chenwandun <chenwandun@huawei.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped more...
Jane Chu [Mon, 14 Oct 2019 21:12:29 +0000 (14:12 -0700)]
mm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped more than once

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 3d7fed4ad8ccb691d217efbb0f934e6a4df5ef91 upstream.

Mmap /dev/dax more than once, then read the poison location using
address from one of the mappings.  The other mappings due to not having
the page mapped in will cause SIGKILLs delivered to the process.
SIGKILL succeeds over SIGBUS, so user process loses the opportunity to
handle the UE.

Although one may add MAP_POPULATE to mmap(2) to work around the issue,
MAP_POPULATE makes mapping 128GB of pmem several magnitudes slower, so
isn't always an option.

Details -

  ndctl inject-error --block=10 --count=1 namespace6.0

  ./read_poison -x dax6.0 -o 5120 -m 2
  mmaped address 0x7f5bb6600000
  mmaped address 0x7f3cf3600000
  doing local read at address 0x7f3cf3601400
  Killed

Console messages in instrumented kernel -

  mce: Uncorrected hardware memory error in user-access at edbe201400
  Memory failure: tk->addr = 7f5bb6601000
  Memory failure: address edbe201: call dev_pagemap_mapping_shift
  dev_pagemap_mapping_shift: page edbe201: no PUD
  Memory failure: tk->size_shift == 0
  Memory failure: Unable to find user space address edbe201 in read_poison
  Memory failure: tk->addr = 7f3cf3601000
  Memory failure: address edbe201: call dev_pagemap_mapping_shift
  Memory failure: tk->size_shift = 21
  Memory failure: 0xedbe201: forcibly killing read_poison:22434 because of failure to unmap corrupted page
    => to deliver SIGKILL
  Memory failure: 0xedbe201: Killing read_poison:22434 due to hardware memory corruption
    => to deliver SIGBUS

Link: http://lkml.kernel.org/r/1565112345-28754-3-git-send-email-jane.chu@oracle.com
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Suggested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agohugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic()
David Hildenbrand [Sat, 19 Oct 2019 03:20:05 +0000 (20:20 -0700)]
hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit f231fe4235e22e18d847e05cbe705deaca56580a upstream.

Uninitialized memmaps contain garbage and in the worst case trigger
kernel BUGs, especially with CONFIG_PAGE_POISONING.  They should not get
touched.

Let's make sure that we only consider online memory (managed by the
buddy) that has initialized memmaps.  ZONE_DEVICE is not applicable.

page_zone() will call page_to_nid(), which will trigger
VM_BUG_ON_PGFLAGS(PagePoisoned(page), page) with CONFIG_PAGE_POISONING
and CONFIG_DEBUG_VM_PGFLAGS when called on uninitialized memmaps.  This
can be the case when an offline memory block (e.g., never onlined) is
spanned by a zone.

Note: As explained by Michal in [1], alloc_contig_range() will verify
the range.  So it boils down to the wrong access in this function.

[1] http://lkml.kernel.org/r/20180423000943.GO17484@dhcp22.suse.cz

Link: http://lkml.kernel.org/r/20191015120717.4858-1-david@redhat.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomm: memblock: do not enforce current limit for memblock_phys* family
Mike Rapoport [Sat, 19 Oct 2019 03:20:01 +0000 (20:20 -0700)]
mm: memblock: do not enforce current limit for memblock_phys* family

BugLink: https://bugs.launchpad.net/bugs/1850456
commit f3057ad767542be7bbac44e548cb44017178a163 upstream.

Until commit 92d12f9544b7 ("memblock: refactor internal allocation
functions") the maximal address for memblock allocations was forced to
memblock.current_limit only for the allocation functions returning
virtual address.  The changes introduced by that commit moved the limit
enforcement into the allocation core and as a result the allocation
functions returning physical address also started to limit allocations
to memblock.current_limit.

This caused breakage of etnaviv GPU driver:

  etnaviv etnaviv: bound 130000.gpu (ops gpu_ops)
  etnaviv etnaviv: bound 134000.gpu (ops gpu_ops)
  etnaviv etnaviv: bound 2204000.gpu (ops gpu_ops)
  etnaviv-gpu 130000.gpu: model: GC2000, revision: 5108
  etnaviv-gpu 130000.gpu: command buffer outside valid memory window
  etnaviv-gpu 134000.gpu: model: GC320, revision: 5007
  etnaviv-gpu 134000.gpu: command buffer outside valid memory window
  etnaviv-gpu 2204000.gpu: model: GC355, revision: 1215
  etnaviv-gpu 2204000.gpu: Ignoring GPU with VG and FE2.0

Restore the behaviour of memblock_phys* family so that these functions
will not enforce memblock.current_limit.

Link: http://lkml.kernel.org/r/1570915861-17633-1-git-send-email-rppt@kernel.org
Fixes: 92d12f9544b7 ("memblock: refactor internal allocation functions")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: Adam Ford <aford173@gmail.com>
Tested-by: Adam Ford <aford173@gmail.com> [imx6q-logicpd]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size
Honglei Wang [Sat, 19 Oct 2019 03:19:58 +0000 (20:19 -0700)]
mm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size

BugLink: https://bugs.launchpad.net/bugs/1850456
commit b11edebbc967ebf5c55b8f9e1d5bb6d68ec3a7fd upstream.

Commit 1a61ab8038e72 ("mm: memcontrol: replace zone summing with
lruvec_page_state()") has made lruvec_page_state to use per-cpu counters
instead of calculating it directly from lru_zone_size with an idea that
this would be more effective.

Tim has reported that this is not really the case for their database
benchmark which is showing an opposite results where lruvec_page_state
is taking up a huge chunk of CPU cycles (about 25% of the system time
which is roughly 7% of total cpu cycles) on 5.3 kernels.  The workload
is running on a larger machine (96cpus), it has many cgroups (500) and
it is heavily direct reclaim bound.

Tim Chen said:

: The problem can also be reproduced by running simple multi-threaded
: pmbench benchmark with a fast Optane SSD swap (see profile below).
:
:
: 6.15%     3.08%  pmbench          [kernel.vmlinux]            [k] lruvec_lru_size
:             |
:             |--3.07%--lruvec_lru_size
:             |          |
:             |          |--2.11%--cpumask_next
:             |          |          |
:             |          |           --1.66%--find_next_bit
:             |          |
:             |           --0.57%--call_function_interrupt
:             |                     |
:             |                      --0.55%--smp_call_function_interrupt
:             |
:             |--1.59%--0x441f0fc3d009
:             |          _ops_rdtsc_init_base_freq
:             |          access_histogram
:             |          page_fault
:             |          __do_page_fault
:             |          handle_mm_fault
:             |          __handle_mm_fault
:             |          |
:             |           --1.54%--do_swap_page
:             |                     swapin_readahead
:             |                     swap_cluster_readahead
:             |                     |
:             |                      --1.53%--read_swap_cache_async
:             |                                __read_swap_cache_async
:             |                                alloc_pages_vma
:             |                                __alloc_pages_nodemask
:             |                                __alloc_pages_slowpath
:             |                                try_to_free_pages
:             |                                do_try_to_free_pages
:             |                                shrink_node
:             |                                shrink_node_memcg
:             |                                |
:             |                                |--0.77%--lruvec_lru_size
:             |                                |
:             |                                 --0.76%--inactive_list_is_low
:             |                                           |
:             |                                            --0.76%--lruvec_lru_size
:             |
:              --1.50%--measure_read
:                        page_fault
:                        __do_page_fault
:                        handle_mm_fault
:                        __handle_mm_fault
:                        do_swap_page
:                        swapin_readahead
:                        swap_cluster_readahead
:                        |
:                         --1.48%--read_swap_cache_async
:                                   __read_swap_cache_async
:                                   alloc_pages_vma
:                                   __alloc_pages_nodemask
:                                   __alloc_pages_slowpath
:                                   try_to_free_pages
:                                   do_try_to_free_pages
:                                   shrink_node
:                                   shrink_node_memcg
:                                   |
:                                   |--0.75%--inactive_list_is_low
:                                   |          |
:                                   |           --0.75%--lruvec_lru_size
:                                   |
:                                    --0.73%--lruvec_lru_size

The likely culprit is the cache traffic the lruvec_page_state_local
generates.  Dave Hansen says:

: I was thinking purely of the cache footprint.  If it's reading
: pn->lruvec_stat_local->count[idx] is three separate cachelines, so 192
: bytes of cache *96 CPUs = 18k of data, mostly read-only.  1 cgroup would
: be 18k of data for the whole system and the caching would be pretty
: efficient and all 18k would probably survive a tight page fault loop in
: the L1.  500 cgroups would be ~90k of data per CPU thread which doesn't
: fit in the L1 and probably wouldn't survive a tight page fault loop if
: both logical threads were banging on different cgroups.
:
: It's just a theory, but it's why I noted the number of cgroups when I
: initially saw this show up in profiles

Fix the regression by partially reverting the said commit and calculate
the lru size explicitly.

Link: http://lkml.kernel.org/r/20190905071034.16822-1-honglei.wang@oracle.com
Fixes: 1a61ab8038e72 ("mm: memcontrol: replace zone summing with lruvec_page_state()")
Signed-off-by: Honglei Wang <honglei.wang@oracle.com>
Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Tested-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: <stable@vger.kernel.org> [5.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomm, compaction: fix wrong pfn handling in __reset_isolation_pfn()
Vlastimil Babka [Mon, 14 Oct 2019 21:12:07 +0000 (14:12 -0700)]
mm, compaction: fix wrong pfn handling in __reset_isolation_pfn()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit a2e9a5afce080226edbf1882d63d99bf32070e9e upstream.

Florian and Dave reported [1] a NULL pointer dereference in
__reset_isolation_pfn().  While the exact cause is unclear, staring at
the code revealed two bugs, which might be related.

One bug is that if zone starts in the middle of pageblock, block_page
might correspond to different pfn than block_pfn, and then the
pfn_valid_within() checks will check different pfn's than those accessed
via struct page.  This might result in acessing an unitialized page in
CONFIG_HOLES_IN_ZONE configs.

The other bug is that end_page refers to the first page of next
pageblock and not last page of current pageblock.  The online and valid
check is then wrong and with sections, the while (page < end_page) loop
might wander off actual struct page arrays.

[1] https://lore.kernel.org/linux-xfs/87o8z1fvqu.fsf@mid.deneb.enyo.de/

Link: http://lkml.kernel.org/r/20191008152915.24704-1-vbabka@suse.cz
Fixes: 6b0868c820ff ("mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Florian Weimer <fw@deneb.enyo.de>
Reported-by: Dave Chinner <david@fromorbit.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer release
Roman Gushchin [Sat, 19 Oct 2019 03:19:44 +0000 (20:19 -0700)]
mm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer release

BugLink: https://bugs.launchpad.net/bugs/1850456
commit b749ecfaf6c53ce79d6ab66afd2fc34189a073b1 upstream.

Karsten reported the following panic in __free_slab() happening on a s390x
machine:

  Unable to handle kernel pointer dereference in virtual kernel address space
  Failing address: 0000000000000000 TEID: 0000000000000483
  Fault in home space mode while using kernel ASCE.
  AS:00000000017d4007 R3:000000007fbd0007 S:000000007fbff000 P:000000000000003d
  Oops: 0004 ilc:3 Ý#1¨ PREEMPT SMP
  Modules linked in: tcp_diag inet_diag xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_at nf_nat
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-05872-g6133e3e4bada-dirty #14
  Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
  Krnl PSW : 0704d00180000000 00000000003cadb6 (__free_slab+0x686/0x6b0)
             R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
  Krnl GPRS: 00000000f3a32928 0000000000000000 000000007fbf5d00 000000000117c4b8
             0000000000000000 000000009e3291c1 0000000000000000 0000000000000000
             0000000000000003 0000000000000008 000000002b478b00 000003d080a97600
             0000000000000003 0000000000000008 000000002b478b00 000003d080a97600
             000000000117ba00 000003e000057db0 00000000003cabcc 000003e000057c78
  Krnl Code: 00000000003cada6e310a1400004        lg      %r1,320(%r10)
             00000000003cadacc0e50046c286        brasl   %r14,ca32b8
            #00000000003cadb2a7f4fe36            brc     15,3caa1e
            >00000000003cadb6e32060800024        stg     %r2,128(%r6)
             00000000003cadbca7f4fd9e            brc     15,3ca8f8
             00000000003cadc0c0e50046790c        brasl   %r14,c99fd8
             00000000003cadc6a7f4fe2c            brc     15,3caa
             00000000003cadc6a7f4fe2c            brc     15,3caa1e
             00000000003cadcaecb1ffff00d9        aghik   %r11,%r1,-1
  Call Trace:
  (<00000000003cabcc> __free_slab+0x49c/0x6b0)
   <00000000001f5886> rcu_core+0x5a6/0x7e0
   <0000000000ca2dea> __do_softirq+0xf2/0x5c0
   <0000000000152644> irq_exit+0x104/0x130
   <000000000010d222> do_IRQ+0x9a/0xf0
   <0000000000ca2344> ext_int_handler+0x130/0x134
   <0000000000103648> enabled_wait+0x58/0x128
  (<0000000000103634> enabled_wait+0x44/0x128)
   <0000000000103b00> arch_cpu_idle+0x40/0x58
   <0000000000ca0544> default_idle_call+0x3c/0x68
   <000000000018eaa4> do_idle+0xec/0x1c0
   <000000000018ee0e> cpu_startup_entry+0x36/0x40
   <000000000122df34> arch_call_rest_init+0x5c/0x88
   <0000000000000000> 0x0
  INFO: lockdep is turned off.
  Last Breaking-Event-Address:
   <00000000003ca8f4> __free_slab+0x1c4/0x6b0
  Kernel panic - not syncing: Fatal exception in interrupt

The kernel panics on an attempt to dereference the NULL memcg pointer.
When shutdown_cache() is called from the kmem_cache_destroy() context, a
memcg kmem_cache might have empty slab pages in a partial list, which are
still charged to the memory cgroup.

These pages are released by free_partial() at the beginning of
shutdown_cache(): either directly or by scheduling a RCU-delayed work
(if the kmem_cache has the SLAB_TYPESAFE_BY_RCU flag).  The latter case
is when the reported panic can happen: memcg_unlink_cache() is called
immediately after shrinking partial lists, without waiting for scheduled
RCU works.  It sets the kmem_cache->memcg_params.memcg pointer to NULL,
and the following attempt to dereference it by __free_slab() from the
RCU work context causes the panic.

To fix the issue, let's postpone the release of the memcg pointer to
destroy_memcg_params().  It's called from a separate work context by
slab_caches_to_rcu_destroy_workfn(), which contains a full RCU barrier.
This guarantees that all scheduled page release RCU works will complete
before the memcg pointer will be zeroed.

Big thanks for Karsten for the perfect report containing all necessary
information, his help with the analysis of the problem and testing of the
fix.

Link: http://lkml.kernel.org/r/20191010160549.1584316-1-guro@fb.com
Fixes: fb2f2b0adb98 ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Karsten Graul <kgraul@linux.ibm.com>
Tested-by: Karsten Graul <kgraul@linux.ibm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Karsten Graul <kgraul@linux.ibm.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomm/memunmap: don't access uninitialized memmap in memunmap_pages()
Aneesh Kumar K.V [Sat, 19 Oct 2019 03:19:39 +0000 (20:19 -0700)]
mm/memunmap: don't access uninitialized memmap in memunmap_pages()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 77e080e7680e1e615587352f70c87b9e98126d03 upstream.

Patch series "mm/memory_hotplug: Shrink zones before removing memory",
v6.

This series fixes the access of uninitialized memmaps when shrinking
zones/nodes and when removing memory.  Also, it contains all fixes for
crashes that can be triggered when removing certain namespace using
memunmap_pages() - ZONE_DEVICE, reported by Aneesh.

We stop trying to shrink ZONE_DEVICE, as it's buggy, fixing it would be
more involved (we don't have SECTION_IS_ONLINE as an indicator), and
shrinking is only of limited use (set_zone_contiguous() cannot detect
the ZONE_DEVICE as contiguous).

We continue shrinking !ZONE_DEVICE zones, however, I reduced the amount
of code to a minimum.  Shrinking is especially necessary to keep
zone->contiguous set where possible, especially, on memory unplug of
DIMMs at zone boundaries.

--------------------------------------------------------------------------

Zones are now properly shrunk when offlining memory blocks or when
onlining failed.  This allows to properly shrink zones on memory unplug
even if the separate memory blocks of a DIMM were onlined to different
zones or re-onlined to a different zone after offlining.

Example:

  :/# cat /proc/zoneinfo
  Node 1, zone  Movable
          spanned  0
          present  0
          managed  0
  :/# echo "online_movable" > /sys/devices/system/memory/memory41/state
  :/# echo "online_movable" > /sys/devices/system/memory/memory43/state
  :/# cat /proc/zoneinfo
  Node 1, zone  Movable
          spanned  98304
          present  65536
          managed  65536
  :/# echo 0 > /sys/devices/system/memory/memory43/online
  :/# cat /proc/zoneinfo
  Node 1, zone  Movable
          spanned  32768
          present  32768
          managed  32768
  :/# echo 0 > /sys/devices/system/memory/memory41/online
  :/# cat /proc/zoneinfo
  Node 1, zone  Movable
          spanned  0
          present  0
          managed  0

This patch (of 10):

With an altmap, the memmap falling into the reserved altmap space are not
initialized and, therefore, contain a garbage NID and a garbage zone.
Make sure to read the NID/zone from a memmap that was initialized.

This fixes a kernel crash that is observed when destroying a namespace:

  kernel BUG at include/linux/mm.h:1107!
  cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
      pc: c0000000004b9728: memunmap_pages+0x238/0x340
      lr: c0000000004b9724: memunmap_pages+0x234/0x340
  ...
      pid   = 3669, comm = ndctl
  kernel BUG at include/linux/mm.h:1107!
    devm_action_release+0x30/0x50
    release_nodes+0x268/0x2d0
    device_release_driver_internal+0x174/0x240
    unbind_store+0x13c/0x190
    drv_attr_store+0x44/0x60
    sysfs_kf_write+0x70/0xa0
    kernfs_fop_write+0x1ac/0x290
    __vfs_write+0x3c/0x70
    vfs_write+0xe4/0x200
    ksys_write+0x7c/0x140
    system_call+0x5c/0x68

The "page_zone(pfn_to_page(pfn)" was introduced by 69324b8f4833 ("mm,
devm_memremap_pages: add MEMORY_DEVICE_PRIVATE support"), however, I
think we will never have driver reserved memory with
MEMORY_DEVICE_PRIVATE (no altmap AFAIKS).

[david@redhat.com: minimze code changes, rephrase description]
Link: http://lkml.kernel.org/r/20191006085646.5768-2-david@redhat.com
Fixes: 2c2a5af6fed2 ("mm, memory_hotplug: add nid parameter to arch_remove_memory")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Damian Tometzki <damian.tometzki@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jun Yao <yaojun8558363@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pagupta@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Rich Felker <dalias@libc.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org> [5.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo
Qian Cai [Sat, 19 Oct 2019 03:19:29 +0000 (20:19 -0700)]
mm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo

BugLink: https://bugs.launchpad.net/bugs/1850456
commit a26ee565b6cd8dc2bf15ff6aa70bbb28f928b773 upstream.

Uninitialized memmaps contain garbage and in the worst case trigger
kernel BUGs, especially with CONFIG_PAGE_POISONING.  They should not get
touched.

For example, when not onlining a memory block that is spanned by a zone
and reading /proc/pagetypeinfo with CONFIG_DEBUG_VM_PGFLAGS and
CONFIG_PAGE_POISONING, we can trigger a kernel BUG:

  :/# echo 1 > /sys/devices/system/memory/memory40/online
  :/# echo 1 > /sys/devices/system/memory/memory42/online
  :/# cat /proc/pagetypeinfo > test.file
   page:fffff2c585200000 is uninitialized and poisoned
   raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
   raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
   page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
   There is not page extension available.
   ------------[ cut here ]------------
   kernel BUG at include/linux/mm.h:1107!
   invalid opcode: 0000 [#1] SMP NOPTI

Please note that this change does not affect ZONE_DEVICE, because
pagetypeinfo_showmixedcount_print() is called from
mm/vmstat.c:pagetypeinfo_showmixedcount() only for populated zones, and
ZONE_DEVICE is never populated (zone->present_pages always 0).

[david@redhat.com: move check to outer loop, add comment, rephrase description]
Link: http://lkml.kernel.org/r/20191011140638.8160-1-david@redhat.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomm/slub: fix a deadlock in show_slab_objects()
Qian Cai [Mon, 14 Oct 2019 21:11:51 +0000 (14:11 -0700)]
mm/slub: fix a deadlock in show_slab_objects()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit e4f8e513c3d353c134ad4eef9fd0bba12406c7c8 upstream.

A long time ago we fixed a similar deadlock in show_slab_objects() [1].
However, it is apparently due to the commits like 01fb58bcba63 ("slab:
remove synchronous synchronize_sched() from memcg cache deactivation
path") and 03afc0e25f7f ("slab: get_online_mems for
kmem_cache_{create,destroy,shrink}"), this kind of deadlock is back by
just reading files in /sys/kernel/slab which will generate a lockdep
splat below.

Since the "mem_hotplug_lock" here is only to obtain a stable online node
mask while racing with NUMA node hotplug, in the worst case, the results
may me miscalculated while doing NUMA node hotplug, but they shall be
corrected by later reads of the same files.

  WARNING: possible circular locking dependency detected
  ------------------------------------------------------
  cat/5224 is trying to acquire lock:
  ffff900012ac3120 (mem_hotplug_lock.rw_sem){++++}, at:
  show_slab_objects+0x94/0x3a8

  but task is already holding lock:
  b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #2 (kn->count#45){++++}:
         lock_acquire+0x31c/0x360
         __kernfs_remove+0x290/0x490
         kernfs_remove+0x30/0x44
         sysfs_remove_dir+0x70/0x88
         kobject_del+0x50/0xb0
         sysfs_slab_unlink+0x2c/0x38
         shutdown_cache+0xa0/0xf0
         kmemcg_cache_shutdown_fn+0x1c/0x34
         kmemcg_workfn+0x44/0x64
         process_one_work+0x4f4/0x950
         worker_thread+0x390/0x4bc
         kthread+0x1cc/0x1e8
         ret_from_fork+0x10/0x18

  -> #1 (slab_mutex){+.+.}:
         lock_acquire+0x31c/0x360
         __mutex_lock_common+0x16c/0xf78
         mutex_lock_nested+0x40/0x50
         memcg_create_kmem_cache+0x38/0x16c
         memcg_kmem_cache_create_func+0x3c/0x70
         process_one_work+0x4f4/0x950
         worker_thread+0x390/0x4bc
         kthread+0x1cc/0x1e8
         ret_from_fork+0x10/0x18

  -> #0 (mem_hotplug_lock.rw_sem){++++}:
         validate_chain+0xd10/0x2bcc
         __lock_acquire+0x7f4/0xb8c
         lock_acquire+0x31c/0x360
         get_online_mems+0x54/0x150
         show_slab_objects+0x94/0x3a8
         total_objects_show+0x28/0x34
         slab_attr_show+0x38/0x54
         sysfs_kf_seq_show+0x198/0x2d4
         kernfs_seq_show+0xa4/0xcc
         seq_read+0x30c/0x8a8
         kernfs_fop_read+0xa8/0x314
         __vfs_read+0x88/0x20c
         vfs_read+0xd8/0x10c
         ksys_read+0xb0/0x120
         __arm64_sys_read+0x54/0x88
         el0_svc_handler+0x170/0x240
         el0_svc+0x8/0xc

  other info that might help us debug this:

  Chain exists of:
    mem_hotplug_lock.rw_sem --> slab_mutex --> kn->count#45

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(kn->count#45);
                                 lock(slab_mutex);
                                 lock(kn->count#45);
    lock(mem_hotplug_lock.rw_sem);

   *** DEADLOCK ***

  3 locks held by cat/5224:
   #0: 9eff00095b14b2a0 (&p->lock){+.+.}, at: seq_read+0x4c/0x8a8
   #1: 0eff008997041480 (&of->mutex){+.+.}, at: kernfs_seq_start+0x34/0xf0
   #2: b8ff009693eee398 (kn->count#45){++++}, at:
  kernfs_seq_start+0x44/0xf0

  stack backtrace:
  Call trace:
   dump_backtrace+0x0/0x248
   show_stack+0x20/0x2c
   dump_stack+0xd0/0x140
   print_circular_bug+0x368/0x380
   check_noncircular+0x248/0x250
   validate_chain+0xd10/0x2bcc
   __lock_acquire+0x7f4/0xb8c
   lock_acquire+0x31c/0x360
   get_online_mems+0x54/0x150
   show_slab_objects+0x94/0x3a8
   total_objects_show+0x28/0x34
   slab_attr_show+0x38/0x54
   sysfs_kf_seq_show+0x198/0x2d4
   kernfs_seq_show+0xa4/0xcc
   seq_read+0x30c/0x8a8
   kernfs_fop_read+0xa8/0x314
   __vfs_read+0x88/0x20c
   vfs_read+0xd8/0x10c
   ksys_read+0xb0/0x120
   __arm64_sys_read+0x54/0x88
   el0_svc_handler+0x170/0x240
   el0_svc+0x8/0xc

I think it is important to mention that this doesn't expose the
show_slab_objects to use-after-free.  There is only a single path that
might really race here and that is the slab hotplug notifier callback
__kmem_cache_shrink (via slab_mem_going_offline_callback) but that path
doesn't really destroy kmem_cache_node data structures.

[1] http://lkml.iu.edu/hypermail/linux/kernel/1101.0/02850.html

[akpm@linux-foundation.org: add comment explaining why we don't need mem_hotplug_lock]
Link: http://lkml.kernel.org/r/1570192309-10132-1-git-send-email-cai@lca.pw
Fixes: 01fb58bcba63 ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path")
Fixes: 03afc0e25f7f ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomm/memory-failure.c: don't access uninitialized memmaps in memory_failure()
David Hildenbrand [Sat, 19 Oct 2019 03:19:23 +0000 (20:19 -0700)]
mm/memory-failure.c: don't access uninitialized memmaps in memory_failure()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 96c804a6ae8c59a9092b3d5dd581198472063184 upstream.

We should check for pfn_to_online_page() to not access uninitialized
memmaps.  Reshuffle the code so we don't have to duplicate the error
message.

Link: http://lkml.kernel.org/r/20191009142435.3975-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agommc: sdhci-omap: Fix Tuning procedure for temperatures < -20C
Faiz Abbas [Thu, 10 Oct 2019 10:52:30 +0000 (16:22 +0530)]
mmc: sdhci-omap: Fix Tuning procedure for temperatures < -20C

BugLink: https://bugs.launchpad.net/bugs/1850456
commit feb40824d78eac5e48f56498dca941754dff33d7 upstream.

According to the App note[1] detailing the tuning algorithm, for
temperatures < -20C, the initial tuning value should be min(largest value
in LPW - 24, ceil(13/16 ratio of LPW)). The largest value in LPW is
(max_window + 4 * (max_len - 1)) and not (max_window + 4 * max_len) itself.
Fix this implementation.

[1] http://www.ti.com/lit/an/spraca9b/spraca9b.pdf

Fixes: 961de0a856e3 ("mmc: sdhci-omap: Workaround errata regarding SDR104/HS200 tuning failures (i929)")
Cc: stable@vger.kernel.org
Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agommc: cqhci: Commit descriptors before setting the doorbell
Faiz Abbas [Mon, 14 Oct 2019 18:38:49 +0000 (00:08 +0530)]
mmc: cqhci: Commit descriptors before setting the doorbell

BugLink: https://bugs.launchpad.net/bugs/1850456
commit c07d0073b9ec80a139d07ebf78e9c30d2a28279e upstream.

Add a write memory barrier to make sure that descriptors are actually
written to memory, before ringing the doorbell.

Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agommc: mxs: fix flags passed to dmaengine_prep_slave_sg
Sascha Hauer [Fri, 18 Oct 2019 09:39:34 +0000 (11:39 +0200)]
mmc: mxs: fix flags passed to dmaengine_prep_slave_sg

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 2bb9f7566ba7ab3c2154964461e37b52cdc6b91b upstream.

Since ceeeb99cd821 we no longer abuse the DMA_CTRL_ACK flag for custom
driver use and introduced the MXS_DMA_CTRL_WAIT4END instead. We have not
changed all users to this flag though. This patch fixes it for the
mxs-mmc driver.

Fixes: ceeeb99cd821 ("dmaengine: mxs: rename custom flag")
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Tested-by: Fabio Estevam <festevam@gmail.com>
Reported-by: Bruno Thomsen <bruno.thomsen@gmail.com>
Tested-by: Bruno Thomsen <bruno.thomsen@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoio_uring: used cached copies of sq->dropped and cq->overflow
Jens Axboe [Fri, 25 Oct 2019 16:04:25 +0000 (10:04 -0600)]
io_uring: used cached copies of sq->dropped and cq->overflow

BugLink: https://bugs.launchpad.net/bugs/1850456
[ Upstream commit 498ccd9eda49117c34e0041563d0da6ac40e52b8 ]

We currently use the ring values directly, but that can lead to issues
if the application is malicious and changes these values on our behalf.
Created in-kernel cached versions of them, and just overwrite the user
side when we update them. This is similar to how we treat the sq/cq
ring tail/head updates.

Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoio_uring: Fix race for sqes with userspace
Pavel Begunkov [Fri, 25 Oct 2019 09:31:31 +0000 (12:31 +0300)]
io_uring: Fix race for sqes with userspace

BugLink: https://bugs.launchpad.net/bugs/1850456
[ Upstream commit 935d1e45908afb8853c497f2c2bbbb685dec51dc ]

io_ring_submit() finalises with
1. io_commit_sqring(), which releases sqes to the userspace
2. Then calls to io_queue_link_head(), accessing released head's sqe

Reorder them.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoio_uring: Fix broken links with offloading
Pavel Begunkov [Fri, 25 Oct 2019 09:31:30 +0000 (12:31 +0300)]
io_uring: Fix broken links with offloading

BugLink: https://bugs.launchpad.net/bugs/1850456
[ Upstream commit fb5ccc98782f654778cb8d96ba8a998304f9a51f ]

io_sq_thread() processes sqes by 8 without considering links. As a
result, links will be randomely subdivided.

The easiest way to fix it is to call io_get_sqring() inside
io_submit_sqes() as do io_ring_submit().

Downsides:
1. This removes optimisation of not grabbing mm_struct for fixed files
2. It submitting all sqes in one go, without finer-grained sheduling
with cq processing.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agofs/proc/page.c: don't access uninitialized memmaps in fs/proc/page.c
David Hildenbrand [Sat, 19 Oct 2019 03:19:20 +0000 (20:19 -0700)]
fs/proc/page.c: don't access uninitialized memmaps in fs/proc/page.c

BugLink: https://bugs.launchpad.net/bugs/1850456
commit aad5f69bc161af489dbb5934868bd347282f0764 upstream.

There are three places where we access uninitialized memmaps, namely:
- /proc/kpagecount
- /proc/kpageflags
- /proc/kpagecgroup

We have initialized memmaps either when the section is online or when the
page was initialized to the ZONE_DEVICE.  Uninitialized memmaps contain
garbage and in the worst case trigger kernel BUGs, especially with
CONFIG_PAGE_POISONING.

For example, not onlining a DIMM during boot and calling /proc/kpagecount
with CONFIG_PAGE_POISONING:

  :/# cat /proc/kpagecount > tmp.test
  BUG: unable to handle page fault for address: fffffffffffffffe
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 114616067 P4D 114616067 PUD 114618067 PMD 0
  Oops: 0000 [#1] SMP NOPTI
  CPU: 0 PID: 469 Comm: cat Not tainted 5.4.0-rc1-next-20191004+ #11
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
  RIP: 0010:kpagecount_read+0xce/0x1e0
  Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 e7 48 c1 e7 06 48 03 3d ab 51 01 01 74 1d 48 8b 57 08 480
  RSP: 0018:ffffa14e409b7e78 EFLAGS: 00010202
  RAX: fffffffffffffffe RBX: 0000000000020000 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: 00007f76b5595000 RDI: fffff35645000000
  RBP: 00007f76b5595000 R08: 0000000000000001 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
  R13: 0000000000020000 R14: 00007f76b5595000 R15: ffffa14e409b7f08
  FS:  00007f76b577d580(0000) GS:ffff8f41bd400000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: fffffffffffffffe CR3: 0000000078960000 CR4: 00000000000006f0
  Call Trace:
   proc_reg_read+0x3c/0x60
   vfs_read+0xc5/0x180
   ksys_read+0x68/0xe0
   do_syscall_64+0x5c/0xa0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

For now, let's drop support for ZONE_DEVICE from the three pseudo files
in order to fix this.  To distinguish offline memory (with garbage
memmap) from ZONE_DEVICE memory with properly initialized memmaps, we
would have to check get_dev_pagemap() and pfn_zone_device_reserved()
right now.  The usage of both (especially, special casing devmem) is
frowned upon and needs to be reworked.

The fundamental issue we have is:

if (pfn_to_online_page(pfn)) {
/* memmap initialized */
} else if (pfn_valid(pfn)) {
/*
 * ???
 * a) offline memory. memmap garbage.
 * b) devmem: memmap initialized to ZONE_DEVICE.
 * c) devmem: reserved for driver. memmap garbage.
 * (d) devmem: memmap currently initializing - garbage)
 */
}

We'll leave the pfn_zone_device_reserved() check in stable_page_flags()
in place as that function is also used from memory failure.  We now no
longer dump information about pages that are not in use anymore -
offline.

Link: http://lkml.kernel.org/r/20191009142435.3975-2-david@redhat.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Toshiki Fukasawa <t-fukasawa@vx.jp.nec.com>
Cc: Pankaj gupta <pagupta@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agodrivers/base/memory.c: don't access uninitialized memmaps in soft_offline_page_store()
David Hildenbrand [Sat, 19 Oct 2019 03:19:16 +0000 (20:19 -0700)]
drivers/base/memory.c: don't access uninitialized memmaps in soft_offline_page_store()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 641fe2e9387a36f9ee01d7c69382d1fe147a5e98 upstream.

Uninitialized memmaps contain garbage and in the worst case trigger kernel
BUGs, especially with CONFIG_PAGE_POISONING.  They should not get touched.

Right now, when trying to soft-offline a PFN that resides on a memory
block that was never onlined, one gets a misleading error with
CONFIG_PAGE_POISONING:

  :/# echo 5637144576 > /sys/devices/system/memory/soft_offline_page
  [   23.097167] soft offline: 0x150000 page already poisoned

But the actual result depends on the garbage in the memmap.

soft_offline_page() can only work with online pages, it returns -EIO in
case of ZONE_DEVICE.  Make sure to only forward pages that are online
(iow, managed by the buddy) and, therefore, have an initialized memmap.

Add a check against pfn_to_online_page() and similarly return -EIO.

Link: http://lkml.kernel.org/r/20191010141200.8985-1-david@redhat.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agodrm/amdgpu: user pages array memory leak fix
Philip Yang [Thu, 3 Oct 2019 18:18:25 +0000 (14:18 -0400)]
drm/amdgpu: user pages array memory leak fix

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 209620b422945ee03cebb03f726e706d537b692d upstream.

user_pages array should always be freed after validation regardless if
user pages are changed after bo is created because with HMM change parse
bo always allocate user pages array to get user pages for userptr bo.

v2: remove unused local variable and amend commit

v3: add back get user pages in gem_userptr_ioctl, to detect application
bug where an userptr VMA is not ananymous memory and reject it.

Bugzilla: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1844962

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Joe Barnett <thejoe@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.3
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agodrm/amdgpu/uvd7: fix allocation size in enc ring test (v2)
Alex Deucher [Tue, 15 Oct 2019 22:08:59 +0000 (18:08 -0400)]
drm/amdgpu/uvd7: fix allocation size in enc ring test (v2)

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 5d230bc91f6c15e5d281f2851502918d98b9e770 upstream.

We need to allocate a large enough buffer for the
session info, otherwise the IB test can overwrite
other memory.

v2: - session info is 128K according to mesa
    - use the same session info for create and destroy

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=204241
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Tested-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agodrm/amdgpu/uvd6: fix allocation size in enc ring test (v2)
Alex Deucher [Tue, 15 Oct 2019 22:07:19 +0000 (18:07 -0400)]
drm/amdgpu/uvd6: fix allocation size in enc ring test (v2)

BugLink: https://bugs.launchpad.net/bugs/1850456
commit ce584a8e2885c7b59dfacba42db39761243cacb2 upstream.

We need to allocate a large enough buffer for the
session info, otherwise the IB test can overwrite
other memory.

v2: - session info is 128K according to mesa
    - use the same session info for create and destroy

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=204241
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Tested-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agodrm/amdgpu/vcn: fix allocation size in enc ring test
Alex Deucher [Tue, 15 Oct 2019 22:09:41 +0000 (18:09 -0400)]
drm/amdgpu/vcn: fix allocation size in enc ring test

BugLink: https://bugs.launchpad.net/bugs/1850456
commit c81fffc2c9450750dd7a54a36a788a860ab0425d upstream.

We need to allocate a large enough buffer for the
session info, otherwise the IB test can overwrite
other memory.

- Session info is 128K according to mesa
- Use the same session info for create and destroy

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=204241
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Tested-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agodrm/amdgpu/vce: fix allocation size in enc ring test
Alex Deucher [Thu, 17 Oct 2019 15:36:47 +0000 (11:36 -0400)]
drm/amdgpu/vce: fix allocation size in enc ring test

BugLink: https://bugs.launchpad.net/bugs/1850456
commit ee027828c40faa92a7ef4c2b0641bbb3f4be95d3 upstream.

We need to allocate a large enough buffer for the
feedback buffer, otherwise the IB test can overwrite
other memory.

Reviewed-by: James Zhu <James.Zhu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agodrm/i915: Favor last VBT child device with conflicting AUX ch/DDC pin
Ville Syrjälä [Fri, 11 Oct 2019 20:20:30 +0000 (23:20 +0300)]
drm/i915: Favor last VBT child device with conflicting AUX ch/DDC pin

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 0336ab580878f4c5663dfa2b66095821fdc3e588 upstream.

The first come first served apporoach to handling the VBT
child device AUX ch conflicts has backfired. We have machines
in the wild where the VBT specifies both port A eDP and
port E DP (in that order) with port E being the real one.

So let's try to flip the preference around and let the last
child device win once again.

Cc: stable@vger.kernel.org
Cc: Jani Nikula <jani.nikula@intel.com>
Tested-by: Masami Ichikawa <masami256@gmail.com>
Tested-by: Torsten <freedesktop201910@liggy.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111966
Fixes: 36a0f92020dc ("drm/i915/bios: make child device order the priority order")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191011202030.8829-1-ville.syrjala@linux.intel.com
Acked-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit 41e35ffb380bde1379e4030bb5b2ac824d5139cf)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agodrm/i915/userptr: Never allow userptr into the mappable GGTT
Chris Wilson [Sat, 28 Sep 2019 08:25:46 +0000 (09:25 +0100)]
drm/i915/userptr: Never allow userptr into the mappable GGTT

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 4f2a572eda67aecb1e7e4fc26cc985fb8158f6e8 upstream.

Daniel Vetter uncovered a nasty cycle in using the mmu-notifiers to
invalidate userptr objects which also happen to be pulled into GGTT
mmaps. That is when we unbind the userptr object (on mmu invalidation),
we revoke all CPU mmaps, which may then recurse into mmu invalidation.

We looked for ways of breaking the cycle, but the revocation on
invalidation is required and cannot be avoided. The only solution we
could see was to not allow such GGTT bindings of userptr objects in the
first place. In practice, no one really wants to use a GGTT mmapping of
a CPU pointer...

Just before Daniel's explosive lockdep patches land in v5.4-rc1, we got
a genuine blip from CI:

<4>[  246.793958] ======================================================
<4>[  246.793972] WARNING: possible circular locking dependency detected
<4>[  246.793989] 5.3.0-gbd6c56f50d15-drmtip_372+ #1 Tainted: G     U
<4>[  246.794003] ------------------------------------------------------
<4>[  246.794017] kswapd0/145 is trying to acquire lock:
<4>[  246.794030] 000000003f565be6 (&dev->struct_mutex/1){+.+.}, at: userptr_mn_invalidate_range_start+0x18f/0x220 [i915]
<4>[  246.794250]
                  but task is already holding lock:
<4>[  246.794263] 000000001799cef9 (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe6/0x2a0
<4>[  246.794291]
                  which lock already depends on the new lock.

<4>[  246.794307]
                  the existing dependency chain (in reverse order) is:
<4>[  246.794322]
                  -> #3 (&anon_vma->rwsem){++++}:
<4>[  246.794344]        down_write+0x33/0x70
<4>[  246.794357]        __vma_adjust+0x3d9/0x7b0
<4>[  246.794370]        __split_vma+0x16a/0x180
<4>[  246.794385]        mprotect_fixup+0x2a5/0x320
<4>[  246.794399]        do_mprotect_pkey+0x208/0x2e0
<4>[  246.794413]        __x64_sys_mprotect+0x16/0x20
<4>[  246.794429]        do_syscall_64+0x55/0x1c0
<4>[  246.794443]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  246.794456]
                  -> #2 (&mapping->i_mmap_rwsem){++++}:
<4>[  246.794478]        down_write+0x33/0x70
<4>[  246.794493]        unmap_mapping_pages+0x48/0x130
<4>[  246.794519]        i915_vma_revoke_mmap+0x81/0x1b0 [i915]
<4>[  246.794519]        i915_vma_unbind+0x11d/0x4a0 [i915]
<4>[  246.794519]        i915_vma_destroy+0x31/0x300 [i915]
<4>[  246.794519]        __i915_gem_free_objects+0xb8/0x4b0 [i915]
<4>[  246.794519]        drm_file_free.part.0+0x1e6/0x290
<4>[  246.794519]        drm_release+0xa6/0xe0
<4>[  246.794519]        __fput+0xc2/0x250
<4>[  246.794519]        task_work_run+0x82/0xb0
<4>[  246.794519]        do_exit+0x35b/0xdb0
<4>[  246.794519]        do_group_exit+0x34/0xb0
<4>[  246.794519]        __x64_sys_exit_group+0xf/0x10
<4>[  246.794519]        do_syscall_64+0x55/0x1c0
<4>[  246.794519]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  246.794519]
                  -> #1 (&vm->mutex){+.+.}:
<4>[  246.794519]        i915_gem_shrinker_taints_mutex+0x6d/0xe0 [i915]
<4>[  246.794519]        i915_address_space_init+0x9f/0x160 [i915]
<4>[  246.794519]        i915_ggtt_init_hw+0x55/0x170 [i915]
<4>[  246.794519]        i915_driver_probe+0xc9f/0x1620 [i915]
<4>[  246.794519]        i915_pci_probe+0x43/0x1b0 [i915]
<4>[  246.794519]        pci_device_probe+0x9e/0x120
<4>[  246.794519]        really_probe+0xea/0x3d0
<4>[  246.794519]        driver_probe_device+0x10b/0x120
<4>[  246.794519]        device_driver_attach+0x4a/0x50
<4>[  246.794519]        __driver_attach+0x97/0x130
<4>[  246.794519]        bus_for_each_dev+0x74/0xc0
<4>[  246.794519]        bus_add_driver+0x13f/0x210
<4>[  246.794519]        driver_register+0x56/0xe0
<4>[  246.794519]        do_one_initcall+0x58/0x300
<4>[  246.794519]        do_init_module+0x56/0x1f6
<4>[  246.794519]        load_module+0x25bd/0x2a40
<4>[  246.794519]        __se_sys_finit_module+0xd3/0xf0
<4>[  246.794519]        do_syscall_64+0x55/0x1c0
<4>[  246.794519]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  246.794519]
                  -> #0 (&dev->struct_mutex/1){+.+.}:
<4>[  246.794519]        __lock_acquire+0x15d8/0x1e90
<4>[  246.794519]        lock_acquire+0xa6/0x1c0
<4>[  246.794519]        __mutex_lock+0x9d/0x9b0
<4>[  246.794519]        userptr_mn_invalidate_range_start+0x18f/0x220 [i915]
<4>[  246.794519]        __mmu_notifier_invalidate_range_start+0x85/0x110
<4>[  246.794519]        try_to_unmap_one+0x76b/0x860
<4>[  246.794519]        rmap_walk_anon+0x104/0x280
<4>[  246.794519]        try_to_unmap+0xc0/0xf0
<4>[  246.794519]        shrink_page_list+0x561/0xc10
<4>[  246.794519]        shrink_inactive_list+0x220/0x440
<4>[  246.794519]        shrink_node_memcg+0x36e/0x740
<4>[  246.794519]        shrink_node+0xcb/0x490
<4>[  246.794519]        balance_pgdat+0x241/0x580
<4>[  246.794519]        kswapd+0x16c/0x530
<4>[  246.794519]        kthread+0x119/0x130
<4>[  246.794519]        ret_from_fork+0x24/0x50
<4>[  246.794519]
                  other info that might help us debug this:

<4>[  246.794519] Chain exists of:
                    &dev->struct_mutex/1 --> &mapping->i_mmap_rwsem --> &anon_vma->rwsem

<4>[  246.794519]  Possible unsafe locking scenario:

<4>[  246.794519]        CPU0                    CPU1
<4>[  246.794519]        ----                    ----
<4>[  246.794519]   lock(&anon_vma->rwsem);
<4>[  246.794519]                                lock(&mapping->i_mmap_rwsem);
<4>[  246.794519]                                lock(&anon_vma->rwsem);
<4>[  246.794519]   lock(&dev->struct_mutex/1);
<4>[  246.794519]
                   *** DEADLOCK ***

v2: Say no to mmap_ioctl

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111744
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111870
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190928082546.3473-1-chris@chris-wilson.co.uk
(cherry picked from commit a4311745bba9763e3c965643d4531bd5765b0513)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agodrm/amdgpu/sdma5: fix mask value of POLL_REGMEM packet for pipe sync
Xiaojie Yuan [Wed, 9 Oct 2019 17:01:23 +0000 (01:01 +0800)]
drm/amdgpu/sdma5: fix mask value of POLL_REGMEM packet for pipe sync

BugLink: https://bugs.launchpad.net/bugs/1850456
commit d12c50857c6edc1d18aa7a60c5a4d6d943137bc0 upstream.

sdma will hang once sequence number to be polled reaches 0x1000_0000

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agodrm/amdgpu: Bail earlier when amdgpu.cik_/si_support is not set to 1
Hans de Goede [Thu, 10 Oct 2019 16:28:17 +0000 (18:28 +0200)]
drm/amdgpu: Bail earlier when amdgpu.cik_/si_support is not set to 1

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 984d7a929ad68b7be9990fc9c5cfa5d5c9fc7942 upstream.

Bail from the pci_driver probe function instead of from the drm_driver
load function.

This avoid /dev/dri/card0 temporarily getting registered and then
unregistered again, sending unwanted add / remove udev events to
userspace.

Specifically this avoids triggering the (userspace) bug fixed by this
plymouth merge-request:
https://gitlab.freedesktop.org/plymouth/plymouth/merge_requests/59

Note that despite that being a userspace bug, not sending unnecessary
udev events is a good idea in general.

BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1490490
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agodrm/panfrost: Handle resetting on timeout better
Steven Price [Wed, 9 Oct 2019 09:44:55 +0000 (10:44 +0100)]
drm/panfrost: Handle resetting on timeout better

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 5b3ec8134f5f9fa1ed0a538441a495521078bbee upstream.

Panfrost uses multiple schedulers (one for each slot, so 2 in reality),
and on a timeout has to stop all the schedulers to safely perform a
reset. However more than one scheduler can trigger a timeout at the same
time. This race condition results in jobs being freed while they are
still in use.

When stopping other slots use cancel_delayed_work_sync() to ensure that
any timeout started for that slot has completed. Also use
mutex_trylock() to obtain reset_lock. This means that only one thread
attempts the reset, the other threads will simply complete without doing
anything (the first thread will wait for this in the call to
cancel_delayed_work_sync()).

While we're here and since the function is already dependent on
sched_job not being NULL, let's remove the unnecessary checks.

Fixes: aa20236784ab ("drm/panfrost: Prevent concurrent resets")
Tested-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20191009094456.9704-1-steven.price@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agodrm/ttm: Restore ttm prefaulting
Thomas Hellstrom [Thu, 12 Sep 2019 18:38:54 +0000 (20:38 +0200)]
drm/ttm: Restore ttm prefaulting

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 941f2f72dbbe0cf8c2d6e0b180a8021a0ec477fa upstream.

Commit 4daa4fba3a38 ("gpu: drm: ttm: Adding new return type vm_fault_t")
broke TTM prefaulting. Since vmf_insert_mixed() typically always returns
VM_FAULT_NOPAGE, prefaulting stops after the second PTE.

Restore (almost) the original behaviour. Unfortunately we can no longer
with the new vm_fault_t return type determine whether a prefaulting
PTE insertion hit an already populated PTE, and terminate the insertion
loop. Instead we continue with the pre-determined number of prefaults.

Fixes: 4daa4fba3a38 ("gpu: drm: ttm: Adding new return type vm_fault_t")
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/330387/
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agodrm/edid: Add 6 bpc quirk for SDC panel in Lenovo G50
Kai-Heng Feng [Tue, 2 Apr 2019 03:30:37 +0000 (11:30 +0800)]
drm/edid: Add 6 bpc quirk for SDC panel in Lenovo G50

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 11bcf5f78905b90baae8fb01e16650664ed0cb00 upstream.

Another panel that needs 6BPC quirk.

BugLink: https://bugs.launchpad.net/bugs/1819968
Cc: <stable@vger.kernel.org> # v4.8+
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190402033037.21877-1-kai.heng.feng@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agomac80211: Reject malformed SSID elements
Will Deacon [Fri, 4 Oct 2019 09:51:31 +0000 (10:51 +0100)]
mac80211: Reject malformed SSID elements

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 4152561f5da3fca92af7179dd538ea89e248f9d0 upstream.

Although this shouldn't occur in practice, it's a good idea to bounds
check the length field of the SSID element prior to using it for things
like allocations or memcpy operations.

Cc: <stable@vger.kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Reported-by: Nicolas Waisman <nico@semmle.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20191004095132.15777-1-will@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agocfg80211: wext: avoid copying malformed SSIDs
Will Deacon [Fri, 4 Oct 2019 09:51:32 +0000 (10:51 +0100)]
cfg80211: wext: avoid copying malformed SSIDs

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 4ac2813cc867ae563a1ba5a9414bfb554e5796fa upstream.

Ensure the SSID element is bounds-checked prior to invoking memcpy()
with its length field, when copying to userspace.

Cc: <stable@vger.kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Reported-by: Nicolas Waisman <nico@semmle.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20191004095132.15777-2-will@kernel.org
[adjust commit log a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoiwlwifi: pcie: change qu with jf devices to use qu configuration
Luca Coelho [Tue, 8 Oct 2019 10:21:02 +0000 (13:21 +0300)]
iwlwifi: pcie: change qu with jf devices to use qu configuration

BugLink: https://bugs.launchpad.net/bugs/1850456
commit aa0cc7dde17bb6b8cc533bbcfe3f53d70e0dd269 upstream.

There were a bunch of devices with qu and jf that were loading the
configuration with pu and jf, which is wrong.  Fix them all
accordingly.  Additionally, remove 0x1010 and 0x1210 subsytem IDs from
the list, since they are obviously wrong, and 0x0044 and 0x0244, which
were duplicate.

Cc: stable@vger.kernel.org # 5.1+
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoACPI: NFIT: Fix unlock on error in scrub_show()
Dan Carpenter [Fri, 18 Oct 2019 12:35:34 +0000 (15:35 +0300)]
ACPI: NFIT: Fix unlock on error in scrub_show()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit edffc70f505abdab885f4b4212438b4298dec78f upstream.

We change the locking in this function and forgot to update this error
path so we are accidentally still holding the "dev->lockdep_mutex".

Fixes: 87a30e1f05d7 ("driver-core, libnvdimm: Let device subsystems add local lockdep coverage")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: 5.3+ <stable@vger.kernel.org> # 5.3+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoACPI: CPPC: Set pcc_data[pcc_ss_id] to NULL in acpi_cppc_processor_exit()
John Garry [Tue, 15 Oct 2019 14:07:31 +0000 (22:07 +0800)]
ACPI: CPPC: Set pcc_data[pcc_ss_id] to NULL in acpi_cppc_processor_exit()

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 56a0b978d42f58c7e3ba715cf65af487d427524d upstream.

When enabling KASAN and DEBUG_TEST_DRIVER_REMOVE, I find this KASAN
warning:

[   20.872057] BUG: KASAN: use-after-free in pcc_data_alloc+0x40/0xb8
[   20.878226] Read of size 4 at addr ffff00236cdeb684 by task swapper/0/1
[   20.884826]
[   20.886309] CPU: 19 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc1-00009-ge7f7df3db5bf-dirty #289
[   20.894994] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 03/15/2019
[   20.903505] Call trace:
[   20.905942]  dump_backtrace+0x0/0x200
[   20.909593]  show_stack+0x14/0x20
[   20.912899]  dump_stack+0xd4/0x130
[   20.916291]  print_address_description.isra.9+0x6c/0x3b8
[   20.921592]  __kasan_report+0x12c/0x23c
[   20.925417]  kasan_report+0xc/0x18
[   20.928808]  __asan_load4+0x94/0xb8
[   20.932286]  pcc_data_alloc+0x40/0xb8
[   20.935938]  acpi_cppc_processor_probe+0x4e8/0xb08
[   20.940717]  __acpi_processor_start+0x48/0xb0
[   20.945062]  acpi_processor_start+0x40/0x60
[   20.949235]  really_probe+0x118/0x548
[   20.952887]  driver_probe_device+0x7c/0x148
[   20.957059]  device_driver_attach+0x94/0xa0
[   20.961231]  __driver_attach+0xa4/0x110
[   20.965055]  bus_for_each_dev+0xe8/0x158
[   20.968966]  driver_attach+0x30/0x40
[   20.972531]  bus_add_driver+0x234/0x2f0
[   20.976356]  driver_register+0xbc/0x1d0
[   20.980182]  acpi_processor_driver_init+0x40/0xe4
[   20.984875]  do_one_initcall+0xb4/0x254
[   20.988700]  kernel_init_freeable+0x24c/0x2f8
[   20.993047]  kernel_init+0x10/0x118
[   20.996524]  ret_from_fork+0x10/0x18
[   21.000087]
[   21.001567] Allocated by task 1:
[   21.004785]  save_stack+0x28/0xc8
[   21.008089]  __kasan_kmalloc.isra.9+0xbc/0xd8
[   21.012435]  kasan_kmalloc+0xc/0x18
[   21.015913]  pcc_data_alloc+0x94/0xb8
[   21.019564]  acpi_cppc_processor_probe+0x4e8/0xb08
[   21.024343]  __acpi_processor_start+0x48/0xb0
[   21.028689]  acpi_processor_start+0x40/0x60
[   21.032860]  really_probe+0x118/0x548
[   21.036512]  driver_probe_device+0x7c/0x148
[   21.040684]  device_driver_attach+0x94/0xa0
[   21.044855]  __driver_attach+0xa4/0x110
[   21.048680]  bus_for_each_dev+0xe8/0x158
[   21.052591]  driver_attach+0x30/0x40
[   21.056155]  bus_add_driver+0x234/0x2f0
[   21.059980]  driver_register+0xbc/0x1d0
[   21.063805]  acpi_processor_driver_init+0x40/0xe4
[   21.068497]  do_one_initcall+0xb4/0x254
[   21.072322]  kernel_init_freeable+0x24c/0x2f8
[   21.076667]  kernel_init+0x10/0x118
[   21.080144]  ret_from_fork+0x10/0x18
[   21.083707]
[   21.085186] Freed by task 1:
[   21.088056]  save_stack+0x28/0xc8
[   21.091360]  __kasan_slab_free+0x118/0x180
[   21.095445]  kasan_slab_free+0x10/0x18
[   21.099183]  kfree+0x80/0x268
[   21.102139]  acpi_cppc_processor_exit+0x1a8/0x1b8
[   21.106832]  acpi_processor_stop+0x70/0x80
[   21.110917]  really_probe+0x174/0x548
[   21.114568]  driver_probe_device+0x7c/0x148
[   21.118740]  device_driver_attach+0x94/0xa0
[   21.122912]  __driver_attach+0xa4/0x110
[   21.126736]  bus_for_each_dev+0xe8/0x158
[   21.130648]  driver_attach+0x30/0x40
[   21.134212]  bus_add_driver+0x234/0x2f0
[   21.0x10/0x18
[   21.161764]
[   21.163244] The buggy address belongs to the object at ffff00236cdeb600
[   21.163244]  which belongs to the cache kmalloc-256 of size 256
[   21.175750] The buggy address is located 132 bytes inside of
[   21.175750]  256-byte region [ffff00236cdeb600ffff00236cdeb700)
[   21.187473] The buggy address belongs to the page:
[   21.192254] page:fffffe008d937a00 refcount:1 mapcount:0 mapping:ffff002370c0fa00 index:0x0 compound_mapcount: 0
[   21.202331] flags: 0x1ffff00000010200(slab|head)
[   21.206940] raw: 1ffff00000010200 dead000000000100 dead000000000122 ffff002370c0fa00
[   21.214671] raw: 0000000000000000 00000000802a002a 00000001ffffffff 0000000000000000
[   21.222400] page dumped because: kasan: bad access detected
[   21.227959]
[   21.229438] Memory state around the buggy address:
[   21.234218]  ffff00236cdeb580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   21.241427]  ffff00236cdeb600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   21.248637] >ffff00236cdeb680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   21.255845]                    ^
[   21.259062]  ffff00236cdeb700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   21.266272]  ffff00236cdeb780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   21.273480] ==================================================================

It seems that global pcc_data[pcc_ss_id] can be freed in
acpi_cppc_processor_exit(), but we may later reference this value, so
NULLify it when freed.

Also remove the useless setting of data "pcc_channel_acquired", which
we're about to free.

Fixes: 85b1407bf6d2 ("ACPI / CPPC: Make CPPC ACPI driver aware of PCC subspace IDs")
Signed-off-by: John Garry <john.garry@huawei.com>
Cc: 4.15+ <stable@vger.kernel.org> # 4.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoASoC: rsnd: Reinitialize bit clock inversion flag for every format setting
Junya Monden [Wed, 16 Oct 2019 12:42:55 +0000 (14:42 +0200)]
ASoC: rsnd: Reinitialize bit clock inversion flag for every format setting

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 22e58665a01006d05f0239621f7d41cacca96cc4 upstream.

Unlike other format-related DAI parameters, rdai->bit_clk_inv flag
is not properly re-initialized when setting format for new stream
processing. The inversion, if requested, is then applied not to default,
but to a previous value, which leads to SCKP bit in SSICR register being
set incorrectly.
Fix this by re-setting the flag to its initial value, determined by format.

Fixes: 1a7889ca8aba3 ("ASoC: rsnd: fixup SND_SOC_DAIFMT_xB_xF behavior")
Cc: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Cc: Jiada Wang <jiada_wang@mentor.com>
Cc: Timo Wischer <twischer@de.adit-jv.com>
Cc: stable@vger.kernel.org # v3.17+
Signed-off-by: Junya Monden <jmonden@jp.adit-jv.com>
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://lore.kernel.org/r/20191016124255.7442-1-erosca@de.adit-jv.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoInput: st1232 - fix reporting multitouch coordinates
Dixit Parmar [Mon, 21 Oct 2019 16:32:47 +0000 (09:32 -0700)]
Input: st1232 - fix reporting multitouch coordinates

BugLink: https://bugs.launchpad.net/bugs/1850456
commit b1a402e75a5f5127ff1ffff0615249f98df8b7b3 upstream.

For Sitronix st1633 multi-touch controller driver the coordinates reported
for multiple fingers were wrong, as it was always taking LSB of coordinates
from the first contact data.

Signed-off-by: Dixit Parmar <dixitparmar19@gmail.com>
Reviewed-by: Martin Kepplinger <martink@posteo.de>
Cc: stable@vger.kernel.org
Fixes: 351e0592bfea ("Input: st1232 - add support for st1633")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204561
Link: https://lore.kernel.org/r/1566209314-21767-1-git-send-email-dixitparmar19@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoInput: synaptics-rmi4 - avoid processing unknown IRQs
Evan Green [Sat, 12 Oct 2019 00:22:09 +0000 (17:22 -0700)]
Input: synaptics-rmi4 - avoid processing unknown IRQs

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 363c53875aef8fce69d4a2d0873919ccc7d9e2ad upstream.

rmi_process_interrupt_requests() calls handle_nested_irq() for
each interrupt status bit it finds. If the irq domain mapping for
this bit had not yet been set up, then it ends up calling
handle_nested_irq(0), which causes a NULL pointer dereference.

There's already code that masks the irq_status bits coming out of the
hardware with current_irq_mask, presumably to avoid this situation.
However current_irq_mask seems to more reflect the actual mask set
in the hardware rather than the IRQs software has set up and registered
for. For example, in rmi_driver_reset_handler(), the current_irq_mask
is initialized based on what is read from the hardware. If the reset
value of this mask enables IRQs that Linux has not set up yet, then
we end up in this situation.

There appears to be a third unused bitmask that used to serve this
purpose, fn_irq_bits. Use that bitmask instead of current_irq_mask
to avoid calling handle_nested_irq() on IRQs that have not yet been
set up.

Signed-off-by: Evan Green <evgreen@chromium.org>
Reviewed-by: Andrew Duggan <aduggan@synaptics.com>
Link: https://lore.kernel.org/r/20191008223657.163366-1-evgreen@chromium.org
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoInput: da9063 - fix capability and drop KEY_SLEEP
Marco Felsch [Mon, 16 Sep 2019 19:45:48 +0000 (12:45 -0700)]
Input: da9063 - fix capability and drop KEY_SLEEP

BugLink: https://bugs.launchpad.net/bugs/1850456
commit afce285b859cea91c182015fc9858ea58c26cd0e upstream.

Since commit f889beaaab1c ("Input: da9063 - report KEY_POWER instead of
KEY_SLEEP during power key-press") KEY_SLEEP isn't supported anymore. This
caused input device to not generate any events if "dlg,disable-key-power"
is set.

Fix this by unconditionally setting KEY_POWER capability, and not
declaring KEY_SLEEP.

Fixes: f889beaaab1c ("Input: da9063 - report KEY_POWER instead of KEY_SLEEP during power key-press")
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoRevert "Input: elantech - enable SMBus on new (2018+) systems"
Kai-Heng Feng [Wed, 16 Oct 2019 00:37:37 +0000 (17:37 -0700)]
Revert "Input: elantech - enable SMBus on new (2018+) systems"

BugLink: https://bugs.launchpad.net/bugs/1850456
commit c324345ce89c3cc50226372960619c7ee940f616 upstream.

This reverts commit 883a2a80f79ca5c0c105605fafabd1f3df99b34c.

Apparently use dmi_get_bios_year() as manufacturing date isn't accurate
and this breaks older laptops with new BIOS update.

So let's revert this patch.

There are still new HP laptops still need to use SMBus to support all
features, but it'll be enabled via a whitelist.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191001070845.9720-1-kai.heng.feng@canonical.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoscsi: ch: Make it possible to open a ch device multiple times again
Bart Van Assche [Wed, 9 Oct 2019 17:35:36 +0000 (10:35 -0700)]
scsi: ch: Make it possible to open a ch device multiple times again

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 6a0990eaa768dfb7064f06777743acc6d392084b upstream.

Clearing ch->device in ch_release() is wrong because that pointer must
remain valid until ch_remove() is called. This patch fixes the following
crash the second time a ch device is opened:

BUG: kernel NULL pointer dereference, address: 0000000000000790
RIP: 0010:scsi_device_get+0x5/0x60
Call Trace:
 ch_open+0x4c/0xa0 [ch]
 chrdev_open+0xa2/0x1c0
 do_dentry_open+0x13a/0x380
 path_openat+0x591/0x1470
 do_filp_open+0x91/0x100
 do_sys_open+0x184/0x220
 do_syscall_64+0x5f/0x1a0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 085e56766f74 ("scsi: ch: add refcounting")
Cc: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191009173536.247889-1-bvanassche@acm.org
Reported-by: Rob Turk <robtu@rtist.nl>
Suggested-by: Rob Turk <robtu@rtist.nl>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoscsi: core: try to get module before removing device
Yufen Yu [Tue, 15 Oct 2019 13:05:56 +0000 (21:05 +0800)]
scsi: core: try to get module before removing device

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 77c301287ebae86cc71d03eb3806f271cb14da79 upstream.

We have a test case like block/001 in blktests, which will create a scsi
device by loading scsi_debug module and then try to delete the device by
sysfs interface. At the same time, it may remove the scsi_debug module.

And getting a invalid paging request BUG_ON as following:

[   34.625854] BUG: unable to handle page fault for address: ffffffffa0016bb8
[   34.629189] Oops: 0000 [#1] SMP PTI
[   34.629618] CPU: 1 PID: 450 Comm: bash Tainted: G        W         5.4.0-rc3+ #473
[   34.632524] RIP: 0010:scsi_proc_hostdir_rm+0x5/0xa0
[   34.643555] CR2: ffffffffa0016bb8 CR3: 000000012cd88000 CR4: 00000000000006e0
[   34.644545] Call Trace:
[   34.644907]  scsi_host_dev_release+0x6b/0x1f0
[   34.645511]  device_release+0x74/0x110
[   34.646046]  kobject_put+0x116/0x390
[   34.646559]  put_device+0x17/0x30
[   34.647041]  scsi_target_dev_release+0x2b/0x40
[   34.647652]  device_release+0x74/0x110
[   34.648186]  kobject_put+0x116/0x390
[   34.648691]  put_device+0x17/0x30
[   34.649157]  scsi_device_dev_release_usercontext+0x2e8/0x360
[   34.649953]  execute_in_process_context+0x29/0x80
[   34.650603]  scsi_device_dev_release+0x20/0x30
[   34.651221]  device_release+0x74/0x110
[   34.651732]  kobject_put+0x116/0x390
[   34.652230]  sysfs_unbreak_active_protection+0x3f/0x50
[   34.652935]  sdev_store_delete.cold.4+0x71/0x8f
[   34.653579]  dev_attr_store+0x1b/0x40
[   34.654103]  sysfs_kf_write+0x3d/0x60
[   34.654603]  kernfs_fop_write+0x174/0x250
[   34.655165]  __vfs_write+0x1f/0x60
[   34.655639]  vfs_write+0xc7/0x280
[   34.656117]  ksys_write+0x6d/0x140
[   34.656591]  __x64_sys_write+0x1e/0x30
[   34.657114]  do_syscall_64+0xb1/0x400
[   34.657627]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   34.658335] RIP: 0033:0x7f156f337130

During deleting scsi target, the scsi_debug module have been removed. Then,
sdebug_driver_template belonged to the module cannot be accessd, resulting
in scsi_proc_hostdir_rm() BUG_ON.

To fix the bug, we add scsi_device_get() in sdev_store_delete() to try to
increase refcount of module, avoiding the module been removed.

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191015130556.18061-1-yuyufen@huawei.com
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoscsi: core: save/restore command resid for error handling
Damien Le Moal [Tue, 1 Oct 2019 07:48:39 +0000 (16:48 +0900)]
scsi: core: save/restore command resid for error handling

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 8f8fed0cdbbd6cdbf28d9ebe662f45765d2f7d39 upstream.

When a non-passthrough command is terminated with CHECK CONDITION, request
sense is executed by hijacking the command descriptor. Since
scsi_eh_prep_cmnd() and scsi_eh_restore_cmnd() do not save/restore the
original command resid, the value returned on failure of the original
command is lost and replaced with the value set by the execution of the
request sense command. This value may in many instances be unaligned to the
device sector size, causing sd_done() to print a warning message about the
incorrect unaligned resid before the command is retried.

Fix this problem by saving the original command residual in struct
scsi_eh_save using scsi_eh_prep_cmnd() and restoring it in
scsi_eh_restore_cmnd(). In addition, to make sure that the request sense
command is executed with a correctly initialized command structure, also
reset the residual to 0 in scsi_eh_prep_cmnd() after saving the original
command value in struct scsi_eh_save.

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191001074839.1994-1-damien.lemoal@wdc.com
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoscsi: sd: Ignore a failure to sync cache due to lack of authorization
Oliver Neukum [Tue, 3 Sep 2019 10:18:39 +0000 (12:18 +0200)]
scsi: sd: Ignore a failure to sync cache due to lack of authorization

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 21e3d6c81179bbdfa279efc8de456c34b814cfd2 upstream.

I've got a report about a UAS drive enclosure reporting back Sense: Logical
unit access not authorized if the drive it holds is password protected.
While the drive is obviously unusable in that state as a mass storage
device, it still exists as a sd device and when the system is asked to
perform a suspend of the drive, it will be sent a SYNCHRONIZE CACHE. If
that fails due to password protection, the error must be ignored.

Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20190903101840.16483-1-oneukum@suse.com
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoscsi: zfcp: fix reaction on bit error threshold notification
Steffen Maier [Tue, 1 Oct 2019 10:49:49 +0000 (12:49 +0200)]
scsi: zfcp: fix reaction on bit error threshold notification

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 2190168aaea42c31bff7b9a967e7b045f07df095 upstream.

On excessive bit errors for the FCP channel ingress fibre path, the channel
notifies us.  Previously, we only emitted a kernel message and a trace
record.  Since performance can become suboptimal with I/O timeouts due to
bit errors, we now stop using an FCP device by default on channel
notification so multipath on top can timely failover to other paths.  A new
module parameter zfcp.ber_stop can be used to get zfcp old behavior.

User explanation of new kernel message:

 * Description:
 * The FCP channel reported that its bit error threshold has been exceeded.
 * These errors might result from a problem with the physical components
 * of the local fibre link into the FCP channel.
 * The problem might be damage or malfunction of the cable or
 * cable connection between the FCP channel and
 * the adjacent fabric switch port or the point-to-point peer.
 * Find details about the errors in the HBA trace for the FCP device.
 * The zfcp device driver closed down the FCP device
 * to limit the performance impact from possible I/O command timeouts.
 * User action:
 * Check for problems on the local fibre link, ensure that fibre optics are
 * clean and functional, and all cables are properly plugged.
 * After the repair action, you can manually recover the FCP device by
 * writing "0" into its "failed" sysfs attribute.
 * If recovery through sysfs is not possible, set the CHPID of the device
 * offline and back online on the service element.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: <stable@vger.kernel.org> #2.6.30+
Link: https://lore.kernel.org/r/20191001104949.42810-1-maier@linux.ibm.com
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agostaging: wlan-ng: fix exit return when sme->key_idx >= NUM_WEPKEYS
Colin Ian King [Mon, 14 Oct 2019 11:02:01 +0000 (12:02 +0100)]
staging: wlan-ng: fix exit return when sme->key_idx >= NUM_WEPKEYS

BugLink: https://bugs.launchpad.net/bugs/1850456
commit 153c5d8191c26165dbbd2646448ca7207f7796d0 upstream.

Currently the exit return path when sme->key_idx >= NUM_WEPKEYS is via
label 'exit' and this checks if result is non-zero, however result has
not been initialized and contains garbage.  Fix this by replacing the
goto with a return with the error code.

Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: 0ca6d8e74489 ("Staging: wlan-ng: replace switch-case statements with macro")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191014110201.9874-1-colin.king@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
5 years agoMIPS: tlbex: Fix build_restore_pagemask KScratch restore
Paul Burton [Fri, 18 Oct 2019 22:38:48 +0000 (15:38 -0700)]
MIPS: tlbex: Fix build_restore_pagemask KScratch restore

BugLink: https://bugs.launchpad.net/bugs/1850456
commit b42aa3fd5957e4daf4b69129e5ce752a2a53e7d6 upstream.

build_restore_pagemask() will restore the value of register $1/$at when
its restore_scratch argument is non-zero, and aims to do so by filling a
branch delay slot. Commit 0b24cae4d535 ("MIPS: Add missing EHB in mtc0
-> mfc0 sequence.") added an EHB instruction (Execution Hazard Barrier)
prior to restoring $1 from a KScratch register, in order to resolve a
hazard that can result in stale values of the KScratch register being
observed. In particular, P-class CPUs from MIPS with out of order
execution pipelines such as the P5600 & P6600 are affected.

Unfortunately this EHB instruction was inserted in the branch delay slot
causing the MFC0 instruction which performs the restoration to no longer
execute along with the branch. The result is that the $1 register isn't
actually restored, ie. the TLB refill exception handler clobbers it -
which is exactly the problem the EHB is meant to avoid for the P-class
CPUs.

Similarly build_get_pgd_vmalloc() will restore the value of $1/$at when
its mode argument equals refill_scratch, and suffers from the same
problem.

Fix this by in both cases moving the EHB earlier in the emitted code.
There's no reason it needs to immediately precede the MFC0 - it simply
needs to be between the MTC0 & MFC0.

This bug only affects Cavium Octeon systems which use
build_fast_tlb_refill_handler().

Signed-off-by: Paul Burton <paulburton@kernel.org>
Fixes: 0b24cae4d535 ("MIPS: Add missing EHB in mtc0 -> mfc0 sequence.")
Cc: Dmitry Korotin <dkorotin@wavecomp.com>
Cc: stable@vger.kernel.org # v3.15+
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>