]> git.proxmox.com Git - mirror_ubuntu-bionic-kernel.git/log
mirror_ubuntu-bionic-kernel.git
4 years agommc: meson-gx: Free irq in release() callback
Remi Pommarel [Thu, 10 Jan 2019 23:01:35 +0000 (00:01 +0100)]
mmc: meson-gx: Free irq in release() callback

BugLink: https://bugs.launchpad.net/bugs/1837664
commit bb364890323cca6e43f13e86d190ebf34a7d8cea upstream.

Because the irq was requested through device managed resources API
(devm_request_threaded_irq()) it was freed after meson_mmc_remove()
completion, thus after mmc_free_host() has reclaimed meson_host memory.
As this irq is IRQF_SHARED, while using CONFIG_DEBUG_SHIRQ, its handler
get called by free_irq(). So meson_mmc_irq() was called after the
meson_host memory reclamation and was using invalid memory.

We ended up with the following scenario:
device_release_driver()
meson_mmc_remove()
mmc_free_host() /* Freeing host memory */
...
devres_release_all()
devm_irq_release()
__free_irq()
meson_mmc_irq() /* Uses freed memory */

To avoid this, the irq is released in meson_mmc_remove() and in
mseon_mmc_probe() error path before mmc_free_host() gets called.

Reported-by: Elie Roudninski <xademax@gmail.com>
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agos390/mm: always force a load of the primary ASCE on context switch
Martin Schwidefsky [Tue, 8 Jan 2019 11:44:57 +0000 (12:44 +0100)]
s390/mm: always force a load of the primary ASCE on context switch

BugLink: https://bugs.launchpad.net/bugs/1837664
commit a38662084c8bdb829ff486468c7ea801c13fcc34 upstream.

The ASCE of an mm_struct can be modified after a task has been created,
e.g. via crst_table_downgrade for a compat process. The active_mm logic
to avoid the switch_mm call if the next task is a kernel thread can
lead to a situation where switch_mm is called where 'prev == next' is
true but 'prev->context.asce == next->context.asce' is not.

This can lead to a situation where a CPU uses the outdated ASCE to run
a task. The result can be a crash, endless loops and really subtle
problem due to TLBs being created with an invalid ASCE.

Cc: stable@kernel.org # v3.15+
Fixes: 53e857f30867 ("s390/mm,tlb: race of lazy TLB flush vs. recreation")
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoASoC: tlv320aic32x4: Kernel OOPS while entering DAPM standby mode
b-ak [Mon, 7 Jan 2019 17:00:22 +0000 (22:30 +0530)]
ASoC: tlv320aic32x4: Kernel OOPS while entering DAPM standby mode

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 667e9334fa64da2273e36ce131b05ac9e47c5769 upstream.

During the bootup of the kernel, the DAPM bias level is in the OFF
state. As soon as the DAPM framework kicks in it pushes the codec
into STANDBY state.

The probe function doesn't prepare the clock, and STANDBY state
does a clk_disable_unprepare() without checking the previous state.
This leads to an OOPS.

Not transitioning from an OFF state to the STANDBY state fixes the
problem.

Signed-off-by: b-ak <anur.bhargav@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agomlxsw: spectrum_fid: Update dummy FID index
Nir Dotan [Fri, 18 Jan 2019 15:57:59 +0000 (15:57 +0000)]
mlxsw: spectrum_fid: Update dummy FID index

BugLink: https://bugs.launchpad.net/bugs/1837664
[ Upstream commit a11dcd6497915ba79d95ef4fe2541aaac27f6201 ]

When using a tc flower action of egress mirred redirect, the driver adds
an implicit FID setting action. This implicit action sets a dummy FID to
the packet and is used as part of a design for trapping unmatched flows
in OVS.  While this implicit FID setting action is supposed to be a NOP
when a redirect action is added, in Spectrum-2 the FID record is
consulted as the dummy FID index is an 802.1D FID index and the packet
is dropped instead of being redirected.

Set the dummy FID index value to be within 802.1Q range. This satisfies
both Spectrum-1 which ignores the FID and Spectrum-2 which identifies it
as an 802.1Q FID and will then follow the redirect action.

Fixes: c3ab435466d5 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agomlxsw: pci: Increase PCI SW reset timeout
Nir Dotan [Fri, 18 Jan 2019 15:57:56 +0000 (15:57 +0000)]
mlxsw: pci: Increase PCI SW reset timeout

BugLink: https://bugs.launchpad.net/bugs/1837664
[ Upstream commit d2f372ba0914e5722ac28e15f2ed2db61bcf0e44 ]

Spectrum-2 PHY layer introduces a calibration period which is a part of the
Spectrum-2 firmware boot process. Hence increase the SW timeout waiting for
the firmware to come out of boot. This does not increase system boot time
in cases where the firmware PHY calibration process is done quickly.

Fixes: c3ab435466d5 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoxen: Fix x86 sched_clock() interface for xen
Juergen Gross [Mon, 14 Jan 2019 12:44:13 +0000 (13:44 +0100)]
xen: Fix x86 sched_clock() interface for xen

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 867cefb4cb1012f42cada1c7d1f35ac8dd276071 upstream.

Commit f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable'
sched_clock() interface") broke Xen guest time handling across
migration:

[  187.249951] Freezing user space processes ... (elapsed 0.001 seconds) done.
[  187.251137] OOM killer disabled.
[  187.251137] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[  187.252299] suspending xenstore...
[  187.266987] xen:grant_table: Grant tables using version 1 layout
[18446743811.706476] OOM killer enabled.
[18446743811.706478] Restarting tasks ... done.
[18446743811.720505] Setting capacity to 16777216

Fix that by setting xen_sched_clock_offset at resume time to ensure a
monotonic clock value.

[boris: replaced pr_info() with pr_info_once() in xen_callback_vector()
 to avoid printing with incorrect timestamp during resume (as we
 haven't re-adjusted the clock yet)]

Fixes: f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface")
Cc: <stable@vger.kernel.org> # 4.11
Reported-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agox86/xen/time: Output xen sched_clock time from 0
Pavel Tatashin [Thu, 19 Jul 2018 20:55:32 +0000 (16:55 -0400)]
x86/xen/time: Output xen sched_clock time from 0

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 38669ba205d178d2d38bfd194a196d65a44d5af2 upstream.

It is expected for sched_clock() to output data from 0, when system boots.

Add an offset xen_sched_clock_offset (similarly how it is done in other
hypervisors i.e. kvm_sched_clock_offset) to count sched_clock() from 0,
when time is first initialized.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: douly.fnst@cn.fujitsu.com
Cc: peterz@infradead.org
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: pbonzini@redhat.com
Link: https://lkml.kernel.org/r/20180719205545.16512-14-pasha.tatashin@oracle.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agousb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup
Jack Pham [Thu, 10 Jan 2019 20:39:55 +0000 (12:39 -0800)]
usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup

BugLink: https://bugs.launchpad.net/bugs/1837664
commit bd6742249b9ca918565e4e3abaa06665e587f4b5 upstream.

OUT endpoint requests may somtimes have this flag set when
preparing to be submitted to HW indicating that there is an
additional TRB chained to the request for alignment purposes.
If that request is removed before the controller can execute the
transfer (e.g. ep_dequeue/ep_disable), the request will not go
through the dwc3_gadget_ep_cleanup_completed_request() handler
and will not have its needs_extra_trb flag cleared when
dwc3_gadget_giveback() is called.  This same request could be
later requeued for a new transfer that does not require an
extra TRB and if it is successfully completed, the cleanup
and TRB reclamation will incorrectly process the additional TRB
which belongs to the next request, and incorrectly advances the
TRB dequeue pointer, thereby messing up calculation of the next
requeust's actual/remaining count when it completes.

The right thing to do here is to ensure that the flag is cleared
before it is given back to the function driver.  A good place
to do that is in dwc3_gadget_del_and_unmap_request().

Fixes: c6267a51639b ("usb: dwc3: gadget: align transfers to wMaxPacketSize")
Cc: stable@vger.kernel.org
Signed-off-by: Jack Pham <jackp@codeaurora.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
[jackp: backport to <= 4.20: replaced 'needs_extra_trb' with 'unaligned'
        and 'zero' members in patch and reworded commit text]
Signed-off-by: Jack Pham <jackp@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agonvmet-rdma: fix null dereference under heavy load
Raju Rangoju [Thu, 3 Jan 2019 17:35:31 +0000 (23:05 +0530)]
nvmet-rdma: fix null dereference under heavy load

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 5cbab6303b4791a3e6713dfe2c5fda6a867f9adc upstream.

Under heavy load if we don't have any pre-allocated rsps left, we
dynamically allocate a rsp, but we are not actually allocating memory
for nvme_completion (rsp->req.rsp). In such a case, accessing pointer
fields (req->rsp->status) in nvmet_req_init() will result in crash.

To fix this, allocate the memory for nvme_completion by calling
nvmet_rdma_alloc_rsp()

Fixes: 8407879c("nvmet-rdma:fix possible bogus dereference under heavy load")
Cc: <stable@vger.kernel.org>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agonvmet-rdma: Add unlikely for response allocated check
Israel Rukshin [Mon, 19 Nov 2018 10:58:51 +0000 (10:58 +0000)]
nvmet-rdma: Add unlikely for response allocated check

BugLink: https://bugs.launchpad.net/bugs/1837664
commit ad1f824948e4ed886529219cf7cd717d078c630d upstream.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agos390/smp: Fix calling smp_call_ipl_cpu() from ipl CPU
David Hildenbrand [Fri, 11 Jan 2019 14:18:22 +0000 (15:18 +0100)]
s390/smp: Fix calling smp_call_ipl_cpu() from ipl CPU

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 60f1bf29c0b2519989927cae640cd1f50f59dc7f upstream.

When calling smp_call_ipl_cpu() from the IPL CPU, we will try to read
from pcpu_devices->lowcore. However, due to prefixing, that will result
in reading from absolute address 0 on that CPU. We have to go via the
actual lowcore instead.

This means that right now, we will read lc->nodat_stack == 0 and
therfore work on a very wrong stack.

This BUG essentially broke rebooting under QEMU TCG (which will report
a low address protection exception). And checking under KVM, it is
also broken under KVM. With 1 VCPU it can be easily triggered.

:/# echo 1 > /proc/sys/kernel/sysrq
:/# echo b > /proc/sysrq-trigger
[   28.476745] sysrq: SysRq : Resetting
[   28.476793] Kernel stack overflow.
[   28.476817] CPU: 0 PID: 424 Comm: sh Not tainted 5.0.0-rc1+ #13
[   28.476820] Hardware name: IBM 2964 NE1 716 (KVM/Linux)
[   28.476826] Krnl PSW : 0400c00180000000 0000000000115c0c (pcpu_delegate+0x12c/0x140)
[   28.476861]            R:0 T:1 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[   28.476863] Krnl GPRS: ffffffffffffffff 0000000000000000 000000000010dff8 0000000000000000
[   28.476864]            0000000000000000 0000000000000000 0000000000ab7090 000003e0006efbf0
[   28.476864]            000000000010dff8 0000000000000000 0000000000000000 0000000000000000
[   28.476865]            000000007fffc000 0000000000730408 000003e0006efc58 0000000000000000
[   28.476887] Krnl Code: 0000000000115bfe4170f000            la      %r7,0(%r15)
[   28.476887]            0000000000115c0241f0a000            la      %r15,0(%r10)
[   28.476887]           #0000000000115c06e370f0980024        stg     %r7,152(%r15)
[   28.476887]           >0000000000115c0cc0e5fffff86e        brasl   %r14,114ce8
[   28.476887]            0000000000115c1241f07000            la      %r15,0(%r7)
[   28.476887]            0000000000115c16a7f4ffa8            brc     15,115b66
[   28.476887]            0000000000115c1a: 0707                bcr     0,%r7
[   28.476887]            0000000000115c1c: 0707                bcr     0,%r7
[   28.476901] Call Trace:
[   28.476902] Last Breaking-Event-Address:
[   28.476920]  [<0000000000a01c4a>] arch_call_rest_init+0x22/0x80
[   28.476927] Kernel panic - not syncing: Corrupt kernel stack, can't continue.
[   28.476930] CPU: 0 PID: 424 Comm: sh Not tainted 5.0.0-rc1+ #13
[   28.476932] Hardware name: IBM 2964 NE1 716 (KVM/Linux)
[   28.476932] Call Trace:

Fixes: 2f859d0dad81 ("s390/smp: reduce size of struct pcpu")
Cc: stable@vger.kernel.org # 4.0+
Reported-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoRevert "seccomp: add a selftest for get_metadata"
Sasha Levin [Mon, 28 Jan 2019 19:40:49 +0000 (14:40 -0500)]
Revert "seccomp: add a selftest for get_metadata"

BugLink: https://bugs.launchpad.net/bugs/1837664
This reverts commit e65cd9a20343ea90f576c24c38ee85ab6e7d5fec.

Tommi T. Rrantala notes:

PTRACE_SECCOMP_GET_METADATA was only added in 4.16
(26500475ac1b499d8636ff281311d633909f5d20)

And it's also breaking seccomp_bpf.c compilation for me:

seccomp_bpf.c: In function ‘get_metadata’:
seccomp_bpf.c:2878:26: error: storage size of ‘md’ isn’t known
  struct seccomp_metadata md;

Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agovt: invoke notifier on screen size change
Nicolas Pitre [Wed, 9 Jan 2019 03:55:01 +0000 (22:55 -0500)]
vt: invoke notifier on screen size change

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 0c9b1965faddad7534b6974b5b36c4ad37998f8e upstream.

User space using poll() on /dev/vcs devices are not awaken when a
screen size change occurs. Let's fix that.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agocan: bcm: check timer values before ktime conversion
Oliver Hartkopp [Sun, 13 Jan 2019 18:31:43 +0000 (19:31 +0100)]
can: bcm: check timer values before ktime conversion

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 93171ba6f1deffd82f381d36cb13177872d023f6 upstream.

Kyungtae Kim detected a potential integer overflow in bcm_[rx|tx]_setup()
when the conversion into ktime multiplies the given value with NSEC_PER_USEC
(1000).

Reference: https://marc.info/?l=linux-can&m=154732118819828&w=2

Add a check for the given tv_usec, so that the value stays below one second.
Additionally limit the tv_sec value to a reasonable value for CAN related
use-cases of 400 days and ensure all values to be positive.

Reported-by: Kyungtae Kim <kt0755@gmail.com>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org> # >= 2.6.26
Tested-by: Kyungtae Kim <kt0755@gmail.com>
Acked-by: Andre Naujoks <nautsch2@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agocan: dev: __can_get_echo_skb(): fix bogous check for non-existing skb by removing it
Manfred Schlaegl [Wed, 19 Dec 2018 18:39:58 +0000 (19:39 +0100)]
can: dev: __can_get_echo_skb(): fix bogous check for non-existing skb by removing it

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 7b12c8189a3dc50638e7d53714c88007268d47ef upstream.

This patch revert commit 7da11ba5c506
("can: dev: __can_get_echo_skb(): print error message, if trying to echo non existing skb")

After introduction of this change we encountered following new error
message on various i.MX plattforms (flexcan):

| flexcan 53fc8000.can can0: __can_get_echo_skb: BUG! Trying to echo non
| existing skb: can_priv::echo_skb[0]

The introduction of the message was a mistake because
priv->echo_skb[idx] = NULL is a perfectly valid in following case: If
CAN_RAW_LOOPBACK is disabled (setsockopt) in applications, the pkt_type
of the tx skb's given to can_put_echo_skb is set to PACKET_LOOPBACK. In
this case can_put_echo_skb will not set priv->echo_skb[idx]. It is
therefore kept NULL.

As additional argument for revert: The order of check and usage of idx
was changed. idx is used to access an array element before checking it's
boundaries.

Signed-off-by: Manfred Schlaegl <manfred.schlaegl@ginzinger.com>
Fixes: 7da11ba5c506 ("can: dev: __can_get_echo_skb(): print error message, if trying to echo non existing skb")
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoirqchip/gic-v3-its: Align PCI Multi-MSI allocation on their size
Marc Zyngier [Fri, 18 Jan 2019 14:08:59 +0000 (14:08 +0000)]
irqchip/gic-v3-its: Align PCI Multi-MSI allocation on their size

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 8208d1708b88b412ca97f50a6d951242c88cbbac upstream.

The way we allocate events works fine in most cases, except
when multiple PCI devices share an ITS-visible DevID, and that
one of them is trying to use MultiMSI allocation.

In that case, our allocation is not guaranteed to be zero-based
anymore, and we have to make sure we allocate it on a boundary
that is compatible with the PCI Multi-MSI constraints.

Fix this by allocating the full region upfront instead of iterating
over the number of MSIs. MSI-X are always allocated one by one,
so this shouldn't change anything on that front.

Fixes: b48ac83d6bbc2 ("irqchip: GICv3: ITS: MSI support")
Cc: stable@vger.kernel.org
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoposix-cpu-timers: Unbreak timer rearming
Thomas Gleixner [Fri, 11 Jan 2019 13:33:16 +0000 (14:33 +0100)]
posix-cpu-timers: Unbreak timer rearming

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 93ad0fc088c5b4631f796c995bdd27a082ef33a6 upstream.

The recent commit which prevented a division by 0 issue in the alarm timer
code broke posix CPU timers as an unwanted side effect.

The reason is that the common rearm code checks for timer->it_interval
being 0 now. What went unnoticed is that the posix cpu timer setup does not
initialize timer->it_interval as it stores the interval in CPU timer
specific storage. The reason for the separate storage is historical as the
posix CPU timers always had a 64bit nanoseconds representation internally
while timer->it_interval is type ktime_t which used to be a modified
timespec representation on 32bit machines.

Instead of reverting the offending commit and fixing the alarmtimer issue
in the alarmtimer code, store the interval in timer->it_interval at CPU
timer setup time so the common code check works. This also repairs the
existing inconistency of the posix CPU timer code which kept a single shot
timer armed despite of the interval being 0.

The separate storage can be removed in mainline, but that needs to be a
separate commit as the current one has to be backported to stable kernels.

Fixes: 0e334db6bb4b ("posix-timers: Fix division by zero bug")
Reported-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190111133500.840117406@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agox86/kaslr: Fix incorrect i8254 outb() parameters
Daniel Drake [Mon, 7 Jan 2019 03:40:24 +0000 (11:40 +0800)]
x86/kaslr: Fix incorrect i8254 outb() parameters

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 7e6fc2f50a3197d0e82d1c0e86282976c9e6c8a4 upstream.

The outb() function takes parameters value and port, in that order.  Fix
the parameters used in the kalsr i8254 fallback code.

Fixes: 5bfce5ef55cb ("x86, kaslr: Provide randomness functions")
Signed-off-by: Daniel Drake <drake@endlessm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: linux@endlessm.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190107034024.15005-1-drake@endlessm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agox86/selftests/pkeys: Fork() to check for state being preserved
Dave Hansen [Wed, 2 Jan 2019 21:56:57 +0000 (13:56 -0800)]
x86/selftests/pkeys: Fork() to check for state being preserved

BugLink: https://bugs.launchpad.net/bugs/1837664
commit e1812933b17be7814f51b6c310c5d1ced7a9a5f5 upstream.

There was a bug where the per-mm pkey state was not being preserved across
fork() in the child.  fork() is performed in the pkey selftests, but all of
the pkey activity is performed in the parent.  The child does not perform
any actions sensitive to pkey state.

To make the test more sensitive to these kinds of bugs, add a fork() where
the parent exits, and execution continues in the child.

To achieve this let the key exhaustion test not terminate at the first
allocation failure and fork after 2*NR_PKEYS loops and continue in the
child.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: peterz@infradead.org
Cc: mpe@ellerman.id.au
Cc: will.deacon@arm.com
Cc: luto@kernel.org
Cc: jroedel@suse.de
Cc: stable@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20190102215657.585704B7@viggo.jf.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agox86/pkeys: Properly copy pkey state at fork()
Dave Hansen [Wed, 2 Jan 2019 21:56:55 +0000 (13:56 -0800)]
x86/pkeys: Properly copy pkey state at fork()

BugLink: https://bugs.launchpad.net/bugs/1837664
commit a31e184e4f69965c99c04cc5eb8a4920e0c63737 upstream.

Memory protection key behavior should be the same in a child as it was
in the parent before a fork.  But, there is a bug that resets the
state in the child at fork instead of preserving it.

The creation of new mm's is a bit convoluted.  At fork(), the code
does:

  1. memcpy() the parent mm to initialize child
  2. mm_init() to initalize some select stuff stuff
  3. dup_mmap() to create true copies that memcpy() did not do right

For pkeys two bits of state need to be preserved across a fork:
'execute_only_pkey' and 'pkey_allocation_map'.

Those are preserved by the memcpy(), but mm_init() invokes
init_new_context() which overwrites 'execute_only_pkey' and
'pkey_allocation_map' with "new" values.

The author of the code erroneously believed that init_new_context is *only*
called at execve()-time.  But, alas, init_new_context() is used at execve()
and fork().

The result is that, after a fork(), the child's pkey state ends up looking
like it does after an execve(), which is totally wrong.  pkeys that are
already allocated can be allocated again, for instance.

To fix this, add code called by dup_mmap() to copy the pkey state from
parent to child explicitly.  Also add a comment above init_new_context() to
make it more clear to the next poor sod what this code is used for.

Fixes: e8c24d3a23a ("x86/pkeys: Allocation/free syscalls")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: peterz@infradead.org
Cc: mpe@ellerman.id.au
Cc: will.deacon@arm.com
Cc: luto@kernel.org
Cc: jroedel@suse.de
Cc: stable@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20190102215655.7A69518C@viggo.jf.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoKVM: x86: Fix single-step debugging
Alexander Popov [Mon, 21 Jan 2019 12:48:40 +0000 (15:48 +0300)]
KVM: x86: Fix single-step debugging

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 5cc244a20b86090c087073c124284381cdf47234 upstream.

The single-step debugging of KVM guests on x86 is broken: if we run
gdb 'stepi' command at the breakpoint when the guest interrupts are
enabled, RIP always jumps to native_apic_mem_write(). Then other
nasty effects follow.

Long investigation showed that on Jun 7, 2017 the
commit c8401dda2f0a00cd25c0 ("KVM: x86: fix singlestepping over syscall")
introduced the kvm_run.debug corruption: kvm_vcpu_do_singlestep() can
be called without X86_EFLAGS_TF set.

Let's fix it. Please consider that for -stable.

Signed-off-by: Alexander Popov <alex.popov@linux.com>
Cc: stable@vger.kernel.org
Fixes: c8401dda2f0a00cd25c0 ("KVM: x86: fix singlestepping over syscall")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agodm crypt: fix parsing of extended IV arguments
Milan Broz [Wed, 9 Jan 2019 10:57:14 +0000 (11:57 +0100)]
dm crypt: fix parsing of extended IV arguments

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 1856b9f7bcc8e9bdcccc360aabb56fbd4dd6c565 upstream.

The dm-crypt cipher specification in a mapping table is defined as:
  cipher[:keycount]-chainmode-ivmode[:ivopts]
or (new crypt API format):
  capi:cipher_api_spec-ivmode[:ivopts]

For ESSIV, the parameter includes hash specification, for example:
aes-cbc-essiv:sha256

The implementation expected that additional IV option to never include
another dash '-' character.

But, with SHA3, there are names like sha3-256; so the mapping table
parser fails:

dmsetup create test --table "0 8 crypt aes-cbc-essiv:sha3-256 9c1185a5c5e9fc54612808977ee8f5b9e 0 /dev/sdb 0"
  or (new crypt API format)
dmsetup create test --table "0 8 crypt capi:cbc(aes)-essiv:sha3-256 9c1185a5c5e9fc54612808977ee8f5b9e 0 /dev/sdb 0"

  device-mapper: crypt: Ignoring unexpected additional cipher options
  device-mapper: table: 253:0: crypt: Error creating IV
  device-mapper: ioctl: error adding target to table

Fix the dm-crypt constructor to ignore additional dash in IV options and
also remove a bogus warning (that is ignored anyway).

Cc: stable@vger.kernel.org # 4.12+
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agodm thin: fix passdown_double_checking_shared_status()
Joe Thornber [Tue, 15 Jan 2019 18:27:01 +0000 (13:27 -0500)]
dm thin: fix passdown_double_checking_shared_status()

BugLink: https://bugs.launchpad.net/bugs/1837664
commit d445bd9cec1a850c2100fcf53684c13b3fd934f2 upstream.

Commit 00a0ea33b495 ("dm thin: do not queue freed thin mapping for next
stage processing") changed process_prepared_discard_passdown_pt1() to
increment all the blocks being discarded until after the passdown had
completed to avoid them being prematurely reused.

IO issued to a thin device that breaks sharing with a snapshot, followed
by a discard issued to snapshot(s) that previously shared the block(s),
results in passdown_double_checking_shared_status() being called to
iterate through the blocks double checking their reference count is zero
and issuing the passdown if so.  So a side effect of commit 00a0ea33b495
is passdown_double_checking_shared_status() was broken.

Fix this by checking if the block reference count is greater than 1.
Also, rename dm_pool_block_is_used() to dm_pool_block_is_shared().

Fixes: 00a0ea33b495 ("dm thin: do not queue freed thin mapping for next stage processing")
Cc: stable@vger.kernel.org # 4.9+
Reported-by: ryan.p.norwood@gmail.com
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoacpi/nfit: Fix command-supported detection
Dan Williams [Sat, 19 Jan 2019 18:55:04 +0000 (10:55 -0800)]
acpi/nfit: Fix command-supported detection

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 11189c1089da413aa4b5fd6be4c4d47c78968819 upstream.

The _DSM function number validation only happens to succeed when the
generic Linux command number translation corresponds with a
DSM-family-specific function number. This breaks NVDIMM-N
implementations that correctly implement _LSR, _LSW, and _LSI, but do
not happen to publish support for DSM function numbers 4, 5, and 6.

Recall that the support for _LS{I,R,W} family of methods results in the
DIMM being marked as supporting those command numbers at
acpi_nfit_register_dimms() time. The DSM function mask is only used for
ND_CMD_CALL support of non-NVDIMM_FAMILY_INTEL devices.

Fixes: 31eca76ba2fc ("nfit, libnvdimm: limited/whitelisted dimm command...")
Cc: <stable@vger.kernel.org>
Link: https://github.com/pmem/ndctl/issues/78
Reported-by: Sujith Pandel <sujith_pandel@dell.com>
Tested-by: Sujith Pandel <sujith_pandel@dell.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoacpi/nfit: Block function zero DSMs
Dan Williams [Mon, 14 Jan 2019 22:07:19 +0000 (14:07 -0800)]
acpi/nfit: Block function zero DSMs

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 5e9e38d0db1d29efed1dd4cf9a70115d33521be7 upstream.

In preparation for using function number 0 as an error value, prevent it
from being considered a valid function value by acpi_nfit_ctl().

Cc: <stable@vger.kernel.org>
Cc: stuart hayes <stuart.w.hayes@gmail.com>
Fixes: e02fb7264d8a ("nfit: add Microsoft NVDIMM DSM command set...")
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoInput: uinput - fix undefined behavior in uinput_validate_absinfo()
Dmitry Torokhov [Mon, 14 Jan 2019 21:54:55 +0000 (13:54 -0800)]
Input: uinput - fix undefined behavior in uinput_validate_absinfo()

BugLink: https://bugs.launchpad.net/bugs/1837664
commit d77651a227f8920dd7ec179b84e400cce844eeb3 upstream.

An integer overflow may arise in uinput_validate_absinfo() if "max - min"
can't be represented by an "int". We should check for overflow before
trying to use the result.

Reported-by: Kyungtae Kim <kt0755@gmail.com>
Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agocompiler.h: enable builtin overflow checkers and add fallback code
Rasmus Villemoes [Mon, 7 May 2018 22:36:27 +0000 (00:36 +0200)]
compiler.h: enable builtin overflow checkers and add fallback code

BugLink: https://bugs.launchpad.net/bugs/1837664
commit f0907827a8a9152aedac2833ed1b674a7b2a44f2 upstream.

This adds wrappers for the __builtin overflow checkers present in gcc
5.1+ as well as fallback implementations for earlier compilers. It's not
that easy to implement the fully generic __builtin_X_overflow(T1 a, T2
b, T3 *d) in macros, so the fallback code assumes that T1, T2 and T3 are
the same. We obviously don't want the wrappers to have different
semantics depending on $GCC_VERSION, so we also insist on that even when
using the builtins.

There are a few problems with the 'a+b < a' idiom for checking for
overflow: For signed types, it relies on undefined behaviour and is
not actually complete (it doesn't check underflow;
e.g. INT_MIN+INT_MIN == 0 isn't caught). Due to type promotion it
is wrong for all types (signed and unsigned) narrower than
int. Similarly, when a and b does not have the same type, there are
subtle cases like

  u32 a;

  if (a + sizeof(foo) < a)
    return -EOVERFLOW;
  a += sizeof(foo);

where the test is always false on 64 bit platforms. Add to that that it
is not always possible to determine the types involved at a glance.

The new overflow.h is somewhat bulky, but that's mostly a result of
trying to be type-generic, complete (e.g. catching not only overflow
but also signed underflow) and not relying on undefined behaviour.

Linus is of course right [1] that for unsigned subtraction a-b, the
right way to check for overflow (underflow) is "b > a" and not
"__builtin_sub_overflow(a, b, &d)", but that's just one out of six cases
covered here, and included mostly for completeness.

So is it worth it? I think it is, if nothing else for the documentation
value of seeing

  if (check_add_overflow(a, b, &d))
    return -EGOAWAY;
  do_stuff_with(d);

instead of the open-coded (and possibly wrong and/or incomplete and/or
UBsan-tickling)

  if (a+b < a)
    return -EGOAWAY;
  do_stuff_with(a+b);

While gcc does recognize the 'a+b < a' idiom for testing unsigned add
overflow, it doesn't do nearly as good for unsigned multiplication
(there's also no single well-established idiom). So using
check_mul_overflow in kcalloc and friends may also make gcc generate
slightly better code.

[1] https://lkml.org/lkml/2015/11/2/658

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoInput: xpad - add support for SteelSeries Stratus Duo
Tom Panfil [Sat, 12 Jan 2019 01:49:40 +0000 (17:49 -0800)]
Input: xpad - add support for SteelSeries Stratus Duo

BugLink: https://bugs.launchpad.net/bugs/1837664
commit fe2bfd0d40c935763812973ce15f5764f1c12833 upstream.

Add support for the SteelSeries Stratus Duo, a wireless Xbox 360
controller. The Stratus Duo ships with a USB dongle to enable wireless
connectivity, but it can also function as a wired controller by connecting
it directly to a PC via USB, hence the need for two USD PIDs. 0x1430 is the
dongle, and 0x1431 is the controller.

Signed-off-by: Tom Panfil <tom@steelseries.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoCIFS: Do not reconnect TCP session in add_credits()
Pavel Shilovsky [Sat, 19 Jan 2019 01:25:36 +0000 (17:25 -0800)]
CIFS: Do not reconnect TCP session in add_credits()

BugLink: https://bugs.launchpad.net/bugs/1837664
commit ef68e831840c40c7d01b328b3c0f5d8c4796c232 upstream.

When executing add_credits() we currently call cifs_reconnect()
if the number of credits is zero and there are no requests in
flight. In this case we may call cifs_reconnect() recursively
twice and cause memory corruption given the following sequence
of functions:

mid1.callback() -> add_credits() -> cifs_reconnect() ->
-> mid2.callback() -> add_credits() -> cifs_reconnect().

Fix this by avoiding to call cifs_reconnect() in add_credits()
and checking for zero credits in the demultiplex thread.

Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoCIFS: Fix credit calculation for encrypted reads with errors
Pavel Shilovsky [Fri, 18 Jan 2019 23:38:11 +0000 (15:38 -0800)]
CIFS: Fix credit calculation for encrypted reads with errors

BugLink: https://bugs.launchpad.net/bugs/1837664
commit ec678eae746dd25766a61c4095e2b649d3b20b09 upstream.

We do need to account for credits received in error responses
to read requests on encrypted sessions.

Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoCIFS: Fix credits calculations for reads with errors
Pavel Shilovsky [Thu, 17 Jan 2019 23:29:26 +0000 (15:29 -0800)]
CIFS: Fix credits calculations for reads with errors

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 8004c78c68e894e4fd5ac3c22cc22eb7dc24cabc upstream.

Currently we mark MID as malformed if we get an error from server
in a read response. This leads to not properly processing credits
in the readv callback. Fix this by marking such a response as
normal received response and process it appropriately.

Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoCIFS: Fix possible hang during async MTU reads and writes
Pavel Shilovsky [Thu, 17 Jan 2019 16:21:24 +0000 (08:21 -0800)]
CIFS: Fix possible hang during async MTU reads and writes

BugLink: https://bugs.launchpad.net/bugs/1837664
commit acc58d0bab55a50e02c25f00bd6a210ee121595f upstream.

When doing MTU i/o we need to leave some credits for
possible reopen requests and other operations happening
in parallel. Currently we leave 1 credit which is not
enough even for reopen only: we need at least 2 credits
if durable handle reconnect fails. Also there may be
other operations at the same time including compounding
ones which require 3 credits at a time each. Fix this
by leaving 8 credits which is big enough to cover most
scenarios.

Was able to reproduce this when server was configured
to give out fewer credits than usual.

The proper fix would be to reconnect a file handle first
and then obtain credits for an MTU request but this leads
to bigger code changes and should happen in other patches.

Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoDrivers: hv: vmbus: Check for ring when getting debug info
Dexuan Cui [Mon, 17 Dec 2018 20:16:09 +0000 (20:16 +0000)]
Drivers: hv: vmbus: Check for ring when getting debug info

BugLink: https://bugs.launchpad.net/bugs/1837664
commit ba50bf1ce9a51fc97db58b96d01306aa70bc3979 upstream.

fc96df16a1ce is good and can already fix the "return stack garbage" issue,
but let's also improve hv_ringbuffer_get_debuginfo(), which would silently
return stack garbage, if people forget to check channel->state or
ring_info->ring_buffer, when using the function in the future.

Having an error check in the function would eliminate the potential risk.

Add a Fixes tag to indicate the patch depdendency.

Fixes: fc96df16a1ce ("Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels")
Cc: stable@vger.kernel.org
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agohv_balloon: avoid touching uninitialized struct page during tail onlining
Vitaly Kuznetsov [Fri, 4 Jan 2019 14:19:42 +0000 (15:19 +0100)]
hv_balloon: avoid touching uninitialized struct page during tail onlining

BugLink: https://bugs.launchpad.net/bugs/1837664
commit da8ced360ca8ad72d8f41f5c8fcd5b0e63e1555f upstream.

Hyper-V memory hotplug protocol has 2M granularity and in Linux x86 we use
128M. To deal with it we implement partial section onlining by registering
custom page onlining callback (hv_online_page()). Later, when more memory
arrives we try to online the 'tail' (see hv_bring_pgs_online()).

It was found that in some cases this 'tail' onlining causes issues:

 BUG: Bad page state in process kworker/0:2  pfn:109e3a
 page:ffffe08344278e80 count:0 mapcount:1 mapping:0000000000000000 index:0x0
 flags: 0xfffff80000000()
 raw: 000fffff80000000 dead000000000100 dead000000000200 0000000000000000
 raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 page dumped because: nonzero mapcount
 ...
 Workqueue: events hot_add_req [hv_balloon]
 Call Trace:
  dump_stack+0x5c/0x80
  bad_page.cold.112+0x7f/0xb2
  free_pcppages_bulk+0x4b8/0x690
  free_unref_page+0x54/0x70
  hv_page_online_one+0x5c/0x80 [hv_balloon]
  hot_add_req.cold.24+0x182/0x835 [hv_balloon]
  ...

Turns out that we now have deferred struct page initialization for memory
hotplug so e.g. memory_block_action() in drivers/base/memory.c does
pages_correctly_probed() check and in that check it avoids inspecting
struct pages and checks sections instead. But in Hyper-V balloon driver we
do PageReserved(pfn_to_page()) check and this is now wrong.

Switch to checking online_section_nr() instead.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agotty/n_hdlc: fix __might_sleep warning
Paul Fulghum [Tue, 1 Jan 2019 20:28:53 +0000 (12:28 -0800)]
tty/n_hdlc: fix __might_sleep warning

BugLink: https://bugs.launchpad.net/bugs/1837664
commit fc01d8c61ce02c034e67378cd3e645734bc18c8c upstream.

Fix __might_sleep warning[1] in tty/n_hdlc.c read due to copy_to_user
call while current is TASK_INTERRUPTIBLE.  This is a false positive
since the code path does not depend on current state remaining
TASK_INTERRUPTIBLE.  The loop breaks out and sets TASK_RUNNING after
calling copy_to_user.

This patch supresses the warning by setting TASK_RUNNING before calling
copy_to_user.

[1] https://syzkaller.appspot.com/bug?id=17d5de7f1fcab794cb8c40032f893f52de899324

Signed-off-by: Paul Fulghum <paulkf@microgate.com>
Reported-by: syzbot <syzbot+c244af085a0159d22879@syzkaller.appspotmail.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: stable <stable@vger.kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agouart: Fix crash in uart_write and uart_put_char
Samir Virmani [Wed, 16 Jan 2019 18:28:07 +0000 (10:28 -0800)]
uart: Fix crash in uart_write and uart_put_char

BugLink: https://bugs.launchpad.net/bugs/1837664
commit aff9cf5955185d1f183227e46c5f8673fa483813 upstream.

We were experiencing a crash similar to the one reported as part of
commit:a5ba1d95e46e ("uart: fix race between uart_put_char() and
uart_shutdown()") in our testbed as well. We continue to observe the same
crash after integrating the commit a5ba1d95e46e ("uart: fix race between
uart_put_char() and uart_shutdown()")

On reviewing the change, the port lock should be taken prior to checking for
if (!circ->buf) in fn. __uart_put_char and other fns. that update the buffer
uart_state->xmit.

Traceback:

[11/27/2018 06:24:32.4870] Unable to handle kernel NULL pointer dereference
                           at virtual address 0000003b

[11/27/2018 06:24:32.4950] PC is at memcpy+0x48/0x180
[11/27/2018 06:24:32.4950] LR is at uart_write+0x74/0x120
[11/27/2018 06:24:32.4950] pc : [<ffffffc0002e6808>]
                           lr : [<ffffffc0003747cc>] pstate: 000001c5
[11/27/2018 06:24:32.4950] sp : ffffffc076433d30
[11/27/2018 06:24:32.4950] x29: ffffffc076433d30 x28: 0000000000000140
[11/27/2018 06:24:32.4950] x27: ffffffc0009b9d5e x26: ffffffc07ce36580
[11/27/2018 06:24:32.4950] x25: 0000000000000000 x24: 0000000000000140
[11/27/2018 06:24:32.4950] x23: ffffffc000891200 x22: ffffffc01fc34000
[11/27/2018 06:24:32.4950] x21: 0000000000000fff x20: 0000000000000076
[11/27/2018 06:24:32.4950] x19: 0000000000000076 x18: 0000000000000000
[11/27/2018 06:24:32.4950] x17: 000000000047cf08 x16: ffffffc000099e68
[11/27/2018 06:24:32.4950] x15: 0000000000000018 x14: 776d726966205948
[11/27/2018 06:24:32.4950] x13: 50203a6c6974755f x12: 74647075205d3333
[11/27/2018 06:24:32.4950] x11: 3a35323a36203831 x10: 30322f37322f3131
[11/27/2018 06:24:32.4950] x9 : 5b205d303638342e x8 : 746164206f742070
[11/27/2018 06:24:32.4950] x7 : 7520736920657261 x6 : 000000000000003b
[11/27/2018 06:24:32.4950] x5 : 000000000000817a x4 : 0000000000000008
[11/27/2018 06:24:32.4950] x3 : 2f37322f31312a5b x2 : 000000000000006e
[11/27/2018 06:24:32.4950] x1 : ffffffc0009b9cf0 x0 : 000000000000003b

[11/27/2018 06:24:32.4950] CPU2: stopping
[11/27/2018 06:24:32.4950] CPU: 2 PID: 0 Comm: swapper/2 Tainted: P      D    O    4.1.51 #3
[11/27/2018 06:24:32.4950] Hardware name: Broadcom-v8A (DT)
[11/27/2018 06:24:32.4950] Call trace:
[11/27/2018 06:24:32.4950] [<ffffffc0000883b8>] dump_backtrace+0x0/0x150
[11/27/2018 06:24:32.4950] [<ffffffc00008851c>] show_stack+0x14/0x20
[11/27/2018 06:24:32.4950] [<ffffffc0005ee810>] dump_stack+0x90/0xb0
[11/27/2018 06:24:32.4950] [<ffffffc00008e844>] handle_IPI+0x18c/0x1a0
[11/27/2018 06:24:32.4950] [<ffffffc000080c68>] gic_handle_irq+0x88/0x90

Fixes: a5ba1d95e46e ("uart: fix race between uart_put_char() and uart_shutdown()")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Samir Virmani <samir@embedur.com>
Acked-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agotty: Handle problem if line discipline does not have receive_buf
Greg Kroah-Hartman [Sun, 20 Jan 2019 09:46:58 +0000 (10:46 +0100)]
tty: Handle problem if line discipline does not have receive_buf

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 27cfb3a53be46a54ec5e0bd04e51995b74c90343 upstream.

Some tty line disciplines do not have a receive buf callback, so
properly check for that before calling it.  If they do not have this
callback, just eat the character quietly, as we can't fail this call.

Reported-by: Jann Horn <jannh@google.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agostaging: rtl8188eu: Add device code for D-Link DWA-121 rev B1
Michael Straube [Mon, 7 Jan 2019 17:28:58 +0000 (18:28 +0100)]
staging: rtl8188eu: Add device code for D-Link DWA-121 rev B1

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 5f74a8cbb38d10615ed46bc3e37d9a4c9af8045a upstream.

This device was added to the stand-alone driver on github.
Add it to the staging driver as well.

Link: https://github.com/lwfinger/rtl8188eu/commit/a0619a07cd1e
Signed-off-by: Michael Straube <straube.linux@gmail.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agos390/smp: fix CPU hotplug deadlock with CPU rescan
Gerald Schaefer [Wed, 9 Jan 2019 12:00:03 +0000 (13:00 +0100)]
s390/smp: fix CPU hotplug deadlock with CPU rescan

BugLink: https://bugs.launchpad.net/bugs/1837664
commit b7cb707c373094ce4008d4a6ac9b6b366ec52da5 upstream.

smp_rescan_cpus() is called without the device_hotplug_lock, which can lead
to a dedlock when a new CPU is found and immediately set online by a udev
rule.

This was observed on an older kernel version, where the cpu_hotplug_begin()
loop was still present, and it resulted in hanging chcpu and systemd-udev
processes. This specific deadlock will not show on current kernels. However,
there may be other possible deadlocks, and since smp_rescan_cpus() can still
trigger a CPU hotplug operation, the device_hotplug_lock should be held.

For reference, this was the deadlock with the old cpu_hotplug_begin() loop:

        chcpu (rescan)                       systemd-udevd

 echo 1 > /sys/../rescan
 -> smp_rescan_cpus()
 -> (*) get_online_cpus()
    (increases refcount)
 -> smp_add_present_cpu()
    (new CPU found)
 -> register_cpu()
 -> device_add()
 -> udev "add" event triggered -----------> udev rule sets CPU online
                                         -> echo 1 > /sys/.../online
                                         -> lock_device_hotplug_sysfs()
                                            (this is missing in rescan path)
                                         -> device_online()
                                         -> (**) device_lock(new CPU dev)
                                         -> cpu_up()
                                         -> cpu_hotplug_begin()
                                            (loops until refcount == 0)
                                            -> deadlock with (*)
 -> bus_probe_device()
 -> device_attach()
 -> device_lock(new CPU dev)
    -> deadlock with (**)

Fix this by taking the device_hotplug_lock in the CPU rescan path.

Cc: <stable@vger.kernel.org>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoARC: perf: map generic branches to correct hardware condition
Eugeniy Paltsev [Mon, 17 Dec 2018 09:54:23 +0000 (12:54 +0300)]
ARC: perf: map generic branches to correct hardware condition

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 3affbf0e154ee351add6fcc254c59c3f3947fa8f upstream.

So far we've mapped branches to "ijmp" which also counts conditional
branches NOT taken. This makes us different from other architectures
such as ARM which seem to be counting only taken branches.

So use "ijmptak" hardware condition which only counts (all jump
instructions that are taken)

'ijmptak' event is available on both ARCompact and ARCv2 ISA based
cores.

Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Cc: stable@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: reworked changelog]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoARC: adjust memblock_reserve of kernel memory
Eugeniy Paltsev [Wed, 19 Dec 2018 16:16:16 +0000 (19:16 +0300)]
ARC: adjust memblock_reserve of kernel memory

BugLink: https://bugs.launchpad.net/bugs/1837664
commit a3010a0465383300f909f62b8a83f83ffa7b2517 upstream.

In setup_arch_memory we reserve the memory area wherein the kernel
is located. Current implementation may reserve more memory than
it actually required in case of CONFIG_LINUX_LINK_BASE is not
equal to CONFIG_LINUX_RAM_BASE. This happens because we calculate
start of the reserved region relatively to the CONFIG_LINUX_RAM_BASE
and end of the region relatively to the CONFIG_LINUX_RAM_BASE.

For example in case of HSDK board we wasted 256MiB of physical memory:
------------------->8------------------------------
Memory: 770416K/1048576K available (5496K kernel code,
    240K rwdata, 1064K rodata, 2200K init, 275K bss,
    278160K reserved, 0K cma-reserved)
------------------->8------------------------------

Fix that.

Fixes: 9ed68785f7f2b ("ARC: mm: Decouple RAM base address from kernel link addr")
Cc: stable@vger.kernel.org #4.14+
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoARCv2: lib: memeset: fix doing prefetchw outside of buffer
Eugeniy Paltsev [Mon, 14 Jan 2019 15:16:48 +0000 (18:16 +0300)]
ARCv2: lib: memeset: fix doing prefetchw outside of buffer

BugLink: https://bugs.launchpad.net/bugs/1837664
commit e6a72b7daeeb521753803550f0ed711152bb2555 upstream.

ARCv2 optimized memset uses PREFETCHW instruction for prefetching the
next cache line but doesn't ensure that the line is not past the end of
the buffer. PRETECHW changes the line ownership and marks it dirty,
which can cause issues in SMP config when next line was already owned by
other core. Fix the issue by avoiding the PREFETCHW

Some more details:

The current code has 3 logical loops (ignroing the unaligned part)
  (a) Big loop for doing aligned 64 bytes per iteration with PREALLOC
  (b) Loop for 32 x 2 bytes with PREFETCHW
  (c) any left over bytes

loop (a) was already eliding the last 64 bytes, so PREALLOC was
safe. The fix was removing PREFETCW from (b).

Another potential issue (applicable to configs with 32 or 128 byte L1
cache line) is that PREALLOC assumes 64 byte cache line and may not do
the right thing specially for 32b. While it would be easy to adapt,
there are no known configs with those lie sizes, so for now, just
compile out PREALLOC in such cases.

Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Cc: stable@vger.kernel.org #4.4+
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: rewrote changelog, used asm .macro vs. "C" macro]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoALSA: hda - Add mute LED support for HP ProBook 470 G5
Anthony Wong [Sat, 19 Jan 2019 04:22:31 +0000 (12:22 +0800)]
ALSA: hda - Add mute LED support for HP ProBook 470 G5

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 699390381a7bae2fab01a22f742a17235c44ed8a upstream.

Support speaker and mic mute LEDs on HP ProBook 470 G5.

BugLink: https://bugs.launchpad.net/bugs/1811254
Signed-off-by: Anthony Wong <anthony.wong@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoASoC: rt5514-spi: Fix potential NULL pointer dereference
Gustavo A. R. Silva [Tue, 15 Jan 2019 17:57:23 +0000 (11:57 -0600)]
ASoC: rt5514-spi: Fix potential NULL pointer dereference

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 060d0bf491874daece47053c4e1fb0489eb867d2 upstream.

There is a potential NULL pointer dereference in case devm_kzalloc()
fails and returns NULL.

Fix this by adding a NULL check on rt5514_dsp.

This issue was detected with the help of Coccinelle.

Fixes: 6eebf35b0e4a ("ASoC: rt5514: add rt5514 SPI driver")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoASoC: atom: fix a missing check of snd_pcm_lib_malloc_pages
Kangjie Lu [Wed, 26 Dec 2018 02:29:48 +0000 (20:29 -0600)]
ASoC: atom: fix a missing check of snd_pcm_lib_malloc_pages

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 44fabd8cdaaa3acb80ad2bb3b5c61ae2136af661 upstream.

snd_pcm_lib_malloc_pages() may fail, so let's check its status and
return its error code upstream.

Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoUSB: serial: pl2303: add new PID to support PL2303TB
Charles Yeh [Tue, 15 Jan 2019 15:13:56 +0000 (23:13 +0800)]
USB: serial: pl2303: add new PID to support PL2303TB

BugLink: https://bugs.launchpad.net/bugs/1837664
commit 4dcf9ddc9ad5ab649abafa98c5a4d54b1a33dabb upstream.

Add new PID to support PL2303TB (TYPE_HX)

Signed-off-by: Charles Yeh <charlesyeh522@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoUSB: serial: simple: add Motorola Tetra TPG2200 device id
Max Schulze [Mon, 7 Jan 2019 07:31:49 +0000 (08:31 +0100)]
USB: serial: simple: add Motorola Tetra TPG2200 device id

BugLink: https://bugs.launchpad.net/bugs/1837664
commit b81c2c33eab79dfd3650293b2227ee5c6036585c upstream.

Add new Motorola Tetra device id for Motorola Solutions TETRA PEI device

T:  Bus=02 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#=  4 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=0cad ProdID=9016 Rev=24.16
S:  Manufacturer=Motorola Solutions, Inc.
S:  Product=TETRA PEI interface
C:  #Ifs= 2 Cfg#= 1 Atr=80 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=usb_serial_simple
I:  If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=usb_serial_simple

Signed-off-by: Max Schulze <max.schulze@posteo.de>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agomei: me: add denverton innovation engine device IDs
Tomas Winkler [Sun, 13 Jan 2019 12:24:48 +0000 (14:24 +0200)]
mei: me: add denverton innovation engine device IDs

BugLink: https://bugs.launchpad.net/bugs/1837664
commit f7ee8ead151f9d0b8dac6ab6c3ff49bbe809c564 upstream.

Add the Denverton innovation engine (IE) device ids.
The IE is an ME-like device which provides HW security
offloading.

Cc: <stable@vger.kernel.org>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agotcp: allow MSG_ZEROCOPY transmission also in CLOSE_WAIT state
Willem de Bruijn [Thu, 10 Jan 2019 19:40:33 +0000 (14:40 -0500)]
tcp: allow MSG_ZEROCOPY transmission also in CLOSE_WAIT state

BugLink: https://bugs.launchpad.net/bugs/1837664
[ Upstream commit 13d7f46386e060df31b727c9975e38306fa51e7a ]

TCP transmission with MSG_ZEROCOPY fails if the peer closes its end of
the connection and so transitions this socket to CLOSE_WAIT state.

Transmission in close wait state is acceptable. Other similar tests in
the stack (e.g., in FastOpen) accept both states. Relax this test, too.

Link: https://www.mail-archive.com/netdev@vger.kernel.org/msg276886.html
Link: https://www.mail-archive.com/netdev@vger.kernel.org/msg227390.html
Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY")
Reported-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
CC: Yuchung Cheng <ycheng@google.com>
CC: Neal Cardwell <ncardwell@google.com>
CC: Soheil Hassas Yeganeh <soheil@google.com>
CC: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agonet: ipv4: Fix memory leak in network namespace dismantle
Ido Schimmel [Wed, 9 Jan 2019 09:57:39 +0000 (09:57 +0000)]
net: ipv4: Fix memory leak in network namespace dismantle

BugLink: https://bugs.launchpad.net/bugs/1837664
[ Upstream commit f97f4dd8b3bb9d0993d2491e0f22024c68109184 ]

IPv4 routing tables are flushed in two cases:

1. In response to events in the netdev and inetaddr notification chains
2. When a network namespace is being dismantled

In both cases only routes associated with a dead nexthop group are
flushed. However, a nexthop group will only be marked as dead in case it
is populated with actual nexthops using a nexthop device. This is not
the case when the route in question is an error route (e.g.,
'blackhole', 'unreachable').

Therefore, when a network namespace is being dismantled such routes are
not flushed and leaked [1].

To reproduce:
# ip netns add blue
# ip -n blue route add unreachable 192.0.2.0/24
# ip netns del blue

Fix this by not skipping error routes that are not marked with
RTNH_F_DEAD when flushing the routing tables.

To prevent the flushing of such routes in case #1, add a parameter to
fib_table_flush() that indicates if the table is flushed as part of
namespace dismantle or not.

Note that this problem does not exist in IPv6 since error routes are
associated with the loopback device.

[1]
unreferenced object 0xffff888066650338 (size 56):
  comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 b0 1c 62 61 80 88 ff ff  ..........ba....
    e8 8b a1 64 80 88 ff ff 00 07 00 08 fe 00 00 00  ...d............
  backtrace:
    [<00000000856ed27d>] inet_rtm_newroute+0x129/0x220
    [<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20
    [<00000000cb85801a>] netlink_rcv_skb+0x132/0x380
    [<00000000ebc991d2>] netlink_unicast+0x4c0/0x690
    [<0000000014f62875>] netlink_sendmsg+0x929/0xe10
    [<00000000bac9d967>] sock_sendmsg+0xc8/0x110
    [<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0
    [<000000002e94f880>] __sys_sendmsg+0xf7/0x250
    [<00000000ccb1fa72>] do_syscall_64+0x14d/0x610
    [<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<000000003a8b605b>] 0xffffffffffffffff
unreferenced object 0xffff888061621c88 (size 48):
  comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
  hex dump (first 32 bytes):
    6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
    6b 6b 6b 6b 6b 6b 6b 6b d8 8e 26 5f 80 88 ff ff  kkkkkkkk..&_....
  backtrace:
    [<00000000733609e3>] fib_table_insert+0x978/0x1500
    [<00000000856ed27d>] inet_rtm_newroute+0x129/0x220
    [<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20
    [<00000000cb85801a>] netlink_rcv_skb+0x132/0x380
    [<00000000ebc991d2>] netlink_unicast+0x4c0/0x690
    [<0000000014f62875>] netlink_sendmsg+0x929/0xe10
    [<00000000bac9d967>] sock_sendmsg+0xc8/0x110
    [<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0
    [<000000002e94f880>] __sys_sendmsg+0xf7/0x250
    [<00000000ccb1fa72>] do_syscall_64+0x14d/0x610
    [<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<000000003a8b605b>] 0xffffffffffffffff

Fixes: 8cced9eff1d4 ("[NETNS]: Enable routing configuration in non-initial namespace.")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agovhost: log dirty page correctly
Jason Wang [Wed, 16 Jan 2019 08:54:42 +0000 (16:54 +0800)]
vhost: log dirty page correctly

BugLink: https://bugs.launchpad.net/bugs/1837664
[ Upstream commit cc5e710759470bc7f3c61d11fd54586f15fdbdf4 ]

Vhost dirty page logging API is designed to sync through GPA. But we
try to log GIOVA when device IOTLB is enabled. This is wrong and may
lead to missing data after migration.

To solve this issue, when logging with device IOTLB enabled, we will:

1) reuse the device IOTLB translation result of GIOVA->HVA mapping to
   get HVA, for writable descriptor, get HVA through iovec. For used
   ring update, translate its GIOVA to HVA
2) traverse the GPA->HVA mapping to get the possible GPA and log
   through GPA. Pay attention this reverse mapping is not guaranteed
   to be unique, so we should log each possible GPA in this case.

This fix the failure of scp to guest during migration. In -next, we
will probably support passing GIOVA->GPA instead of GIOVA->HVA.

Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
Reported-by: Jintack Lim <jintack@cs.columbia.edu>
Cc: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoopenvswitch: Avoid OOB read when parsing flow nlattrs
Ross Lagerwall [Mon, 14 Jan 2019 09:16:56 +0000 (09:16 +0000)]
openvswitch: Avoid OOB read when parsing flow nlattrs

BugLink: https://bugs.launchpad.net/bugs/1837664
[ Upstream commit 04a4af334b971814eedf4e4a413343ad3287d9a9 ]

For nested and variable attributes, the expected length of an attribute
is not known and marked by a negative number.  This results in an OOB
read when the expected length is later used to check if the attribute is
all zeros. Fix this by using the actual length of the attribute rather
than the expected length.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agonet_sched: refetch skb protocol for each filter
Cong Wang [Sat, 12 Jan 2019 02:55:42 +0000 (18:55 -0800)]
net_sched: refetch skb protocol for each filter

BugLink: https://bugs.launchpad.net/bugs/1837664
[ Upstream commit cd0c4e70fc0ccfa705cdf55efb27519ce9337a26 ]

Martin reported a set of filters don't work after changing
from reclassify to continue. Looking into the code, it
looks like skb protocol is not always fetched for each
iteration of the filters. But, as demonstrated by Martin,
TC actions could modify skb->protocol, for example act_vlan,
this means we have to refetch skb protocol in each iteration,
rather than using the one we fetch in the beginning of the loop.

This bug is _not_ introduced by commit 3b3ae880266d
("net: sched: consolidate tc_classify{,_compat}"), technically,
if act_vlan is the only action that modifies skb protocol, then
it is commit c7e2b9689ef8 ("sched: introduce vlan action") which
introduced this bug.

Reported-by: Martin Olsson <martin.olsson+netdev@sentorsecurity.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agonet: phy: mdio_bus: add missing device_del() in mdiobus_register() error handling
Thomas Petazzoni [Wed, 16 Jan 2019 09:53:58 +0000 (10:53 +0100)]
net: phy: mdio_bus: add missing device_del() in mdiobus_register() error handling

BugLink: https://bugs.launchpad.net/bugs/1837664
[ Upstream commit e40e2a2e78664fa90ea4b9bdf4a84efce2fea9d9 ]

The current code in __mdiobus_register() doesn't properly handle
failures returned by the devm_gpiod_get_optional() call: it returns
immediately, without unregistering the device that was added by the
call to device_register() earlier in the function.

This leaves a stale device, which then causes a NULL pointer
dereference in the code that handles deferred probing:

[    1.489982] Unable to handle kernel NULL pointer dereference at virtual address 00000074
[    1.498110] pgd = (ptrval)
[    1.500838] [00000074] *pgd=00000000
[    1.504432] Internal error: Oops: 17 [#1] SMP ARM
[    1.509133] Modules linked in:
[    1.512192] CPU: 1 PID: 51 Comm: kworker/1:3 Not tainted 4.20.0-00039-g3b73a4cc8b3e-dirty #99
[    1.520708] Hardware name: Xilinx Zynq Platform
[    1.525261] Workqueue: events deferred_probe_work_func
[    1.530403] PC is at klist_next+0x10/0xfc
[    1.534403] LR is at device_for_each_child+0x40/0x94
[    1.539361] pc : [<c0683fbc>]    lr : [<c0455d90>]    psr: 200e0013
[    1.545628] sp : ceeefe68  ip : 00000001  fp : ffffe000
[    1.550863] r10: 00000000  r9 : c0c66790  r8 : 00000000
[    1.556079] r7 : c0457d44  r6 : 00000000  r5 : ceeefe8c  r4 : cfa2ec78
[    1.562604] r3 : 00000064  r2 : c0457d44  r1 : ceeefe8c  r0 : 00000064
[    1.569129] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[    1.576263] Control: 18c5387d  Table: 0ed7804a  DAC: 00000051
[    1.582013] Process kworker/1:3 (pid: 51, stack limit = 0x(ptrval))
[    1.588280] Stack: (0xceeefe68 to 0xceef0000)
[    1.592630] fe60:                   cfa2ec78 c0c03c08 00000000 c0457d44 00000000 c0c66790
[    1.600814] fe80: 00000000 c0455d90 ceeefeac 00000064 00000000 0d7a542e cee9d494 cfa2ec78
[    1.608998] fea0: cfa2ec78 00000000 c0457d44 c0457d7c cee9d494 c0c03c08 00000000 c0455dac
[    1.617182] fec0: cf98ba44 cf926a00 cee9d494 0d7a542e 00000000 cf935a10 cf935a10 cf935a10
[    1.625366] fee0: c0c4e9b8 c0457d7c c0c4e80c 00000001 cf935a10 c0457df4 cf935a10 c0c4e99c
[    1.633550] ff00: c0c4e99c c045a27c c0c4e9c4 ced63f80 cfde8a80 cfdebc00 00000000 c013893c
[    1.641734] ff20: cfde8a80 cfde8a80 c07bd354 ced63f80 ced63f94 cfde8a80 00000008 c0c02d00
[    1.649936] ff40: cfde8a98 cfde8a80 ffffe000 c0139a30 ffffe000 c0c6624a c07bd354 00000000
[    1.658120] ff60: ffffe000 cee9e780 ceebfe00 00000000 ceeee000 ced63f80 c0139788 cf8cdea4
[    1.666304] ff80: cee9e79c c013e598 00000001 ceebfe00 c013e44c 00000000 00000000 00000000
[    1.674488] ffa0: 00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000
[    1.682671] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.690855] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[    1.699058] [<c0683fbc>] (klist_next) from [<c0455d90>] (device_for_each_child+0x40/0x94)
[    1.707241] [<c0455d90>] (device_for_each_child) from [<c0457d7c>] (device_reorder_to_tail+0x38/0x88)
[    1.716476] [<c0457d7c>] (device_reorder_to_tail) from [<c0455dac>] (device_for_each_child+0x5c/0x94)
[    1.725692] [<c0455dac>] (device_for_each_child) from [<c0457d7c>] (device_reorder_to_tail+0x38/0x88)
[    1.734927] [<c0457d7c>] (device_reorder_to_tail) from [<c0457df4>] (device_pm_move_to_tail+0x28/0x40)
[    1.744235] [<c0457df4>] (device_pm_move_to_tail) from [<c045a27c>] (deferred_probe_work_func+0x58/0x8c)
[    1.753746] [<c045a27c>] (deferred_probe_work_func) from [<c013893c>] (process_one_work+0x210/0x4fc)
[    1.762888] [<c013893c>] (process_one_work) from [<c0139a30>] (worker_thread+0x2a8/0x5c0)
[    1.771072] [<c0139a30>] (worker_thread) from [<c013e598>] (kthread+0x14c/0x154)
[    1.778482] [<c013e598>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
[    1.785689] Exception stack(0xceeeffb0 to 0xceeefff8)
[    1.790739] ffa0:                                     00000000 00000000 00000000 00000000
[    1.798923] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.807107] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    1.813724] Code: e92d47f0 e1a05000 e8900048 e1a00003 (e5937010)
[    1.819844] ---[ end trace 3c2c0c8b65399ec9 ]---

The actual error that we had from devm_gpiod_get_optional() was
-EPROBE_DEFER, due to the GPIO being provided by a driver that is
probed later than the Ethernet controller driver.

To fix this, we simply add the missing device_del() invocation in the
error path.

Fixes: 69226896ad636 ("mdio_bus: Issue GPIO RESET to PHYs")
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agonet: Fix usage of pskb_trim_rcsum
Ross Lagerwall [Thu, 17 Jan 2019 15:34:38 +0000 (15:34 +0000)]
net: Fix usage of pskb_trim_rcsum

BugLink: https://bugs.launchpad.net/bugs/1837664
[ Upstream commit 6c57f0458022298e4da1729c67bd33ce41c14e7a ]

In certain cases, pskb_trim_rcsum() may change skb pointers.
Reinitialize header pointers afterwards to avoid potential
use-after-frees. Add a note in the documentation of
pskb_trim_rcsum(). Found by KASAN.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agonet: bridge: Fix ethernet header pointer before check skb forwardable
Yunjian Wang [Thu, 17 Jan 2019 01:46:41 +0000 (09:46 +0800)]
net: bridge: Fix ethernet header pointer before check skb forwardable

BugLink: https://bugs.launchpad.net/bugs/1837664
[ Upstream commit 28c1382fa28f2e2d9d0d6f25ae879b5af2ecbd03 ]

The skb header should be set to ethernet header before using
is_skb_forwardable. Because the ethernet header length has been
considered in is_skb_forwardable(including dev->hard_header_len
length).

To reproduce the issue:
1, add 2 ports on linux bridge br using following commands:
$ brctl addbr br
$ brctl addif br eth0
$ brctl addif br eth1
2, the MTU of eth0 and eth1 is 1500
3, send a packet(Data 1480, UDP 8, IP 20, Ethernet 14, VLAN 4)
from eth0 to eth1

So the expect result is packet larger than 1500 cannot pass through
eth0 and eth1. But currently, the packet passes through success, it
means eth1's MTU limit doesn't take effect.

Fixes: f6367b4660dd ("bridge: use is_skb_forwardable in forward path")
Cc: bridge@lists.linux-foundation.org
Cc: Nkolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoamd-xgbe: Fix mdio access for non-zero ports and clause 45 PHYs
Lendacky, Thomas [Thu, 17 Jan 2019 14:20:14 +0000 (14:20 +0000)]
amd-xgbe: Fix mdio access for non-zero ports and clause 45 PHYs

BugLink: https://bugs.launchpad.net/bugs/1837664
[ Upstream commit 5ab3121beeb76aa6090195b67d237115860dd9ec ]

The XGBE hardware has support for performing MDIO operations using an
MDIO command request. The driver mistakenly uses the mdio port address
as the MDIO command request device address instead of the MDIO command
request port address. Additionally, the driver does not properly check
for and create a clause 45 MDIO command.

Check the supplied MDIO register to determine if the request is a clause
45 operation (MII_ADDR_C45). For a clause 45 operation, extract the device
address and register number from the supplied MDIO register and use them
to set the MDIO command request device address and register number fields.
For a clause 22 operation, the MDIO request device address is set to zero
and the MDIO command request register number is set to the supplied MDIO
register. In either case, the supplied MDIO port address is used as the
MDIO command request port address.

Fixes: 732f2ab7afb9 ("amd-xgbe: Add support for MDIO attached PHYs")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agouserfaultfd: clear flag if remap event not enabled
Peter Xu [Fri, 28 Dec 2018 08:38:47 +0000 (00:38 -0800)]
userfaultfd: clear flag if remap event not enabled

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 3cfd22be0ad663248fadfc8f6ffa3e255c394552 ]

When the process being tracked does mremap() without
UFFD_FEATURE_EVENT_REMAP on the corresponding tracking uffd file handle,
we should not generate the remap event, and at the same time we should
clear all the uffd flags on the new VMA.  Without this patch, we can still
have the VM_UFFD_MISSING|VM_UFFD_WP flags on the new VMA even the fault
handling process does not even know the existance of the VMA.

Link: http://lkml.kernel.org/r/20181211053409.20317-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Pravin Shedge <pravin.shedge4linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agodm: Check for device sector overflow if CONFIG_LBDAF is not set
Milan Broz [Wed, 7 Nov 2018 21:24:55 +0000 (22:24 +0100)]
dm: Check for device sector overflow if CONFIG_LBDAF is not set

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit ef87bfc24f9b8da82c89aff493df20f078bc9cb1 ]

Reference to a device in device-mapper table contains offset in sectors.

If the sector_t is 32bit integer (CONFIG_LBDAF is not set), then
several device-mapper targets can overflow this offset and validity
check is then performed on a wrong offset and a wrong table is activated.

See for example (on 32bit without CONFIG_LBDAF) this overflow:

  # dmsetup create test --table "0 2048 linear /dev/sdg 4294967297"
  # dmsetup table test
  0 2048 linear 8:96 1

This patch adds explicit check for overflow if the offset is sector_t type.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoperf tools: Add missing open_memstream() prototype for systems lacking it
Arnaldo Carvalho de Melo [Tue, 11 Dec 2018 19:31:19 +0000 (16:31 -0300)]
perf tools: Add missing open_memstream() prototype for systems lacking it

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit d7a8c4a6a055097a67ccfa3ca7c9ff1b64603a70 ]

There are systems such as the Android NDK API level 24 has the
open_memstream() function but doesn't provide a prototype, adding noise
to the build:

  builtin-timechart.c: In function 'cat_backtrace':
  builtin-timechart.c:486:2: warning: implicit declaration of function 'open_memstream' [-Wimplicit-function-declaration]
    FILE *f = open_memstream(&p, &p_len);
    ^
  builtin-timechart.c:486:2: warning: nested extern declaration of 'open_memstream' [-Wnested-externs]
  builtin-timechart.c:486:12: warning: initialization makes pointer from integer without a cast
    FILE *f = open_memstream(&p, &p_len);
              ^

Define a LACKS_OPEN_MEMSTREAM_PROTOTYPE define so that code needing that
can get a prototype.

Checked in the bionic git repo to be available since level 23:

https://android.googlesource.com/platform/bionic/+/master/libc/include/stdio.h#241

  FILE* open_memstream(char** __ptr, size_t* __size_ptr) __INTRODUCED_IN(23);

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-343ashae97e5bq6vizusyfno@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoperf tools: Add missing sigqueue() prototype for systems lacking it
Arnaldo Carvalho de Melo [Tue, 11 Dec 2018 18:48:47 +0000 (15:48 -0300)]
perf tools: Add missing sigqueue() prototype for systems lacking it

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 748fe0889c1ff12d378946bd5326e8ee8eacf5cf ]

There are systems such as the Android NDK API level 24 has the
sigqueue() function but doesn't provide a prototype, adding noise to the
build:

  util/evlist.c: In function 'perf_evlist__prepare_workload':
  util/evlist.c:1494:4: warning: implicit declaration of function 'sigqueue' [-Wimplicit-function-declaration]
      if (sigqueue(getppid(), SIGUSR1, val))
      ^
  util/evlist.c:1494:4: warning: nested extern declaration of 'sigqueue' [-Wnested-externs]

Define a LACKS_SIGQUEUE_PROTOTYPE define so that code needing that can
get a prototype.

Checked in the bionic git repo to be available since level 23:

https://android.googlesource.com/platform/bionic/+/master/libc/include/signal.h#123

  int sigqueue(pid_t __pid, int __signal, const union sigval __value) __INTRODUCED_IN(23);

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-lmhpev1uni9kdrv7j29glyov@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoperf stat: Avoid segfaults caused by negated options
Michael Petlan [Mon, 10 Dec 2018 16:00:04 +0000 (11:00 -0500)]
perf stat: Avoid segfaults caused by negated options

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 51433ead1460fb3f46e1c34f68bb22fd2dd0f5d0 ]

Some 'perf stat' options do not make sense to be negated (event,
cgroup), some do not have negated path implemented (metrics). Due to
that, it is better to disable the "no-" prefix for them, since
otherwise, the later opt-parsing segfaults.

Before:

  $ perf stat --no-metrics -- ls
  Segmentation fault (core dumped)

After:

  $ perf stat --no-metrics -- ls
   Error: option `no-metrics' isn't available
   Usage: perf stat [<options>] [<command>]

Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
LPU-Reference: 1485912065.62416880.1544457604340.JavaMail.zimbra@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agox86/topology: Use total_cpus for max logical packages calculation
Hui Wang [Wed, 7 Nov 2018 02:36:43 +0000 (10:36 +0800)]
x86/topology: Use total_cpus for max logical packages calculation

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit aa02ef099cff042c2a9109782ec2bf1bffc955d4 ]

nr_cpu_ids can be limited on the command line via nr_cpus=. This can break the
logical package management because it results in a smaller number of packages
while in kdump kernel.

Check below case:
There is a two sockets system, each socket has 8 cores, which has 16 logical
cpus while HT was turn on.

 0  1  2  3  4  5  6  7     |    16 17 18 19 20 21 22 23
 cores on socket 0               threads on socket 0
 8  9 10 11 12 13 14 15     |    24 25 26 27 28 29 30 31
 cores on socket 1               threads on socket 1

While starting the kdump kernel with command line option nr_cpus=16 panic
was triggered on one of the cpus 24-31 eg. 26, then online cpu will be
1-15, 26(cpu 0 was disabled in kdump), ncpus will be 16 and
__max_logical_packages will be 1, but actually two packages were booted on.

This issue can reproduced by set kdump option nr_cpus=<real physical core
numbers>, and then trigger panic on last socket's thread, for example:

taskset -c 26 echo c > /proc/sysrq-trigger

Use total_cpus which will not be limited by nr_cpus command line to calculate
the value of __max_logical_packages.

Signed-off-by: Hui Wang <john.wanghui@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <guijianfeng@huawei.com>
Cc: <wencongyang2@huawei.com>
Cc: <douliyang1@huawei.com>
Cc: <qiaonuohan@huawei.com>
Link: https://lkml.kernel.org/r/20181107023643.22174-1-john.wanghui@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agonetfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine
Taehee Yoo [Mon, 5 Nov 2018 09:22:44 +0000 (18:22 +0900)]
netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 5a86d68bcf02f2d1e9a5897dd482079fd5f75e7f ]

When network namespace is destroyed, cleanup_net() is called.
cleanup_net() holds pernet_ops_rwsem then calls each ->exit callback.
So that clusterip_tg_destroy() is called by cleanup_net().
And clusterip_tg_destroy() calls unregister_netdevice_notifier().

But both cleanup_net() and clusterip_tg_destroy() hold same
lock(pernet_ops_rwsem). hence deadlock occurrs.

After this patch, only 1 notifier is registered when module is inserted.
And all of configs are added to per-net list.

test commands:
   %ip netns add vm1
   %ip netns exec vm1 bash
   %ip link set lo up
   %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
-j CLUSTERIP --new --hashmode sourceip \
--clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
   %exit
   %ip netns del vm1

splat looks like:
[  341.809674] ============================================
[  341.809674] WARNING: possible recursive locking detected
[  341.809674] 4.19.0-rc5+ #16 Tainted: G        W
[  341.809674] --------------------------------------------
[  341.809674] kworker/u4:2/87 is trying to acquire lock:
[  341.809674] 000000005da2d519 (pernet_ops_rwsem){++++}, at: unregister_netdevice_notifier+0x8c/0x460
[  341.809674]
[  341.809674] but task is already holding lock:
[  341.809674] 000000005da2d519 (pernet_ops_rwsem){++++}, at: cleanup_net+0x119/0x900
[  341.809674]
[  341.809674] other info that might help us debug this:
[  341.809674]  Possible unsafe locking scenario:
[  341.809674]
[  341.809674]        CPU0
[  341.809674]        ----
[  341.809674]   lock(pernet_ops_rwsem);
[  341.809674]   lock(pernet_ops_rwsem);
[  341.809674]
[  341.809674]  *** DEADLOCK ***
[  341.809674]
[  341.809674]  May be due to missing lock nesting notation
[  341.809674]
[  341.809674] 3 locks held by kworker/u4:2/87:
[  341.809674]  #0: 00000000d9df6c92 ((wq_completion)"%s""netns"){+.+.}, at: process_one_work+0xafe/0x1de0
[  341.809674]  #1: 00000000c2cbcee2 (net_cleanup_work){+.+.}, at: process_one_work+0xb60/0x1de0
[  341.809674]  #2: 000000005da2d519 (pernet_ops_rwsem){++++}, at: cleanup_net+0x119/0x900
[  341.809674]
[  341.809674] stack backtrace:
[  341.809674] CPU: 1 PID: 87 Comm: kworker/u4:2 Tainted: G        W         4.19.0-rc5+ #16
[  341.809674] Workqueue: netns cleanup_net
[  341.809674] Call Trace:
[ ... ]
[  342.070196]  down_write+0x93/0x160
[  342.070196]  ? unregister_netdevice_notifier+0x8c/0x460
[  342.070196]  ? down_read+0x1e0/0x1e0
[  342.070196]  ? sched_clock_cpu+0x126/0x170
[  342.070196]  ? find_held_lock+0x39/0x1c0
[  342.070196]  unregister_netdevice_notifier+0x8c/0x460
[  342.070196]  ? register_netdevice_notifier+0x790/0x790
[  342.070196]  ? __local_bh_enable_ip+0xe9/0x1b0
[  342.070196]  ? __local_bh_enable_ip+0xe9/0x1b0
[  342.070196]  ? clusterip_tg_destroy+0x372/0x650 [ipt_CLUSTERIP]
[  342.070196]  ? trace_hardirqs_on+0x93/0x210
[  342.070196]  ? __bpf_trace_preemptirq_template+0x10/0x10
[  342.070196]  ? clusterip_tg_destroy+0x372/0x650 [ipt_CLUSTERIP]
[  342.123094]  clusterip_tg_destroy+0x3ad/0x650 [ipt_CLUSTERIP]
[  342.123094]  ? clusterip_net_init+0x3d0/0x3d0 [ipt_CLUSTERIP]
[  342.123094]  ? cleanup_match+0x17d/0x200 [ip_tables]
[  342.123094]  ? xt_unregister_table+0x215/0x300 [x_tables]
[  342.123094]  ? kfree+0xe2/0x2a0
[  342.123094]  cleanup_entry+0x1d5/0x2f0 [ip_tables]
[  342.123094]  ? cleanup_match+0x200/0x200 [ip_tables]
[  342.123094]  __ipt_unregister_table+0x9b/0x1a0 [ip_tables]
[  342.123094]  iptable_filter_net_exit+0x43/0x80 [iptable_filter]
[  342.123094]  ops_exit_list.isra.10+0x94/0x140
[  342.123094]  cleanup_net+0x45b/0x900
[ ... ]

Fixes: 202f59afd441 ("netfilter: ipt_CLUSTERIP: do not hold dev")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agonetfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine
Taehee Yoo [Mon, 5 Nov 2018 09:22:55 +0000 (18:22 +0900)]
netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit b12f7bad5ad3724d19754390a3e80928525c0769 ]

When network namespace is destroyed, both clusterip_tg_destroy() and
clusterip_net_exit() are called. and clusterip_net_exit() is called
before clusterip_tg_destroy().
Hence cleanup check code in clusterip_net_exit() doesn't make sense.

test commands:
   %ip netns add vm1
   %ip netns exec vm1 bash
   %ip link set lo up
   %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
-j CLUSTERIP --new --hashmode sourceip \
--clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
   %exit
   %ip netns del vm1

splat looks like:
[  341.184508] WARNING: CPU: 1 PID: 87 at net/ipv4/netfilter/ipt_CLUSTERIP.c:840 clusterip_net_exit+0x319/0x380 [ipt_CLUSTERIP]
[  341.184850] Modules linked in: ipt_CLUSTERIP nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp iptable_filter bpfilter ip_tables x_tables
[  341.184850] CPU: 1 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc5+ #16
[  341.227509] Workqueue: netns cleanup_net
[  341.227509] RIP: 0010:clusterip_net_exit+0x319/0x380 [ipt_CLUSTERIP]
[  341.227509] Code: 0f 85 7f fe ff ff 48 c7 c2 80 64 2c c0 be a8 02 00 00 48 c7 c7 a0 63 2c c0 c6 05 18 6e 00 00 01 e8 bc 38 ff f5 e9 5b fe ff ff <0f> 0b e9 33 ff ff ff e8 4b 90 50 f6 e9 2d fe ff ff 48 89 df e8 de
[  341.227509] RSP: 0018:ffff88011086f408 EFLAGS: 00010202
[  341.227509] RAX: dffffc0000000000 RBX: 1ffff1002210de85 RCX: 0000000000000000
[  341.227509] RDX: 1ffff1002210de85 RSI: ffff880110813be8 RDI: ffffed002210de58
[  341.227509] RBP: ffff88011086f4d0 R08: 0000000000000000 R09: 0000000000000000
[  341.227509] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff1002210de81
[  341.227509] R13: ffff880110625a48 R14: ffff880114cec8c8 R15: 0000000000000014
[  341.227509] FS:  0000000000000000(0000) GS:ffff880116600000(0000) knlGS:0000000000000000
[  341.227509] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  341.227509] CR2: 00007f11fd38e000 CR3: 000000013ca16000 CR4: 00000000001006e0
[  341.227509] Call Trace:
[  341.227509]  ? __clusterip_config_find+0x460/0x460 [ipt_CLUSTERIP]
[  341.227509]  ? default_device_exit+0x1ca/0x270
[  341.227509]  ? remove_proc_entry+0x1cd/0x390
[  341.227509]  ? dev_change_net_namespace+0xd00/0xd00
[  341.227509]  ? __init_waitqueue_head+0x130/0x130
[  341.227509]  ops_exit_list.isra.10+0x94/0x140
[  341.227509]  cleanup_net+0x45b/0x900
[ ... ]

Fixes: 613d0776d3fe ("netfilter: exit_net cleanup check added")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoperf vendor events intel: Fix Load_Miss_Real_Latency on SKL/SKX
Andi Kleen [Tue, 20 Nov 2018 05:06:35 +0000 (21:06 -0800)]
perf vendor events intel: Fix Load_Miss_Real_Latency on SKL/SKX

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 91b2b97025097ce7ca7536bc87eba2bf14760fb4 ]

Fix incorrect event names for the Load_Miss_Real_Latency metric for
Skylake and Skylake Server.

Fixes https://github.com/andikleen/pmu-tools/issues/158

Before:

  % perf stat -M Load_Miss_Real_Latency true
  event syntax error: '..ss.pending,mem_load_retired.l1_miss_ps,mem_load_retired.fb_hit_ps}:W'
                                    \___ parser error

   Usage: perf stat [<options>] [<command>]

      -M, --metrics <metric/metric group list>
                            monitor specified metrics or metric groups (separated by ,)

After:

  % perf stat -M Load_Miss_Real_Latency true

   Performance counter stats for 'true':

             279,204      l1d_pend_miss.pending     #     14.0 Load_Miss_Real_Latency
               4,784      mem_load_uops_retired.l1_miss
              15,188      mem_load_uops_retired.hit_lfb

         0.000899640 seconds time elapsed

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/20181120050635.4215-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agobpf: relax verifier restriction on BPF_MOV | BPF_ALU
Jiong Wang [Fri, 7 Dec 2018 17:16:18 +0000 (12:16 -0500)]
bpf: relax verifier restriction on BPF_MOV | BPF_ALU

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit e434b8cdf788568ba65a0a0fd9f3cb41f3ca1803 ]

Currently, the destination register is marked as unknown for 32-bit
sub-register move (BPF_MOV | BPF_ALU) whenever the source register type is
SCALAR_VALUE.

This is too conservative that some valid cases will be rejected.
Especially, this may turn a constant scalar value into unknown value that
could break some assumptions of verifier.

For example, test_l4lb_noinline.c has the following C code:

    struct real_definition *dst

1:  if (!get_packet_dst(&dst, &pckt, vip_info, is_ipv6))
2:    return TC_ACT_SHOT;
3:
4:  if (dst->flags & F_IPV6) {

get_packet_dst is responsible for initializing "dst" into valid pointer and
return true (1), otherwise return false (0). The compiled instruction
sequence using alu32 will be:

  412: (54) (u32) r7 &= (u32) 1
  413: (bc) (u32) r0 = (u32) r7
  414: (95) exit

insn 413, a BPF_MOV | BPF_ALU, however will turn r0 into unknown value even
r7 contains SCALAR_VALUE 1.

This causes trouble when verifier is walking the code path that hasn't
initialized "dst" inside get_packet_dst, for which case 0 is returned and
we would then expect verifier concluding line 1 in the above C code pass
the "if" check, therefore would skip fall through path starting at line 4.
Now, because r0 returned from callee has became unknown value, so verifier
won't skip analyzing path starting at line 4 and "dst->flags" requires
dereferencing the pointer "dst" which actually hasn't be initialized for
this path.

This patch relaxed the code marking sub-register move destination. For a
SCALAR_VALUE, it is safe to just copy the value from source then truncate
it into 32-bit.

A unit test also included to demonstrate this issue. This test will fail
before this patch.

This relaxation could let verifier skipping more paths for conditional
comparison against immediate. It also let verifier recording a more
accurate/strict value for one register at one state, if this state end up
with going through exit without rejection and it is used for state
comparison later, then it is possible an inaccurate/permissive value is
better. So the real impact on verifier processed insn number is complex.
But in all, without this fix, valid program could be rejected.

>From real benchmarking on kernel selftests and Cilium bpf tests, there is
no impact on processed instruction number when tests ares compiled with
default compilation options. There is slightly improvements when they are
compiled with -mattr=+alu32 after this patch.

Also, test_xdp_noinline/-mattr=+alu32 now passed verification. It is
rejected before this fix.

Insn processed before/after this patch:

                        default     -mattr=+alu32

Kernel selftest

===
test_xdp.o              371/371      369/369
test_l4lb.o             6345/6345    5623/5623
test_xdp_noinline.o     2971/2971    rejected/2727
test_tcp_estates.o      429/429      430/430

Cilium bpf
===
bpf_lb-DLB_L3.o:        2085/2085     1685/1687
bpf_lb-DLB_L4.o:        2287/2287     1986/1982
bpf_lb-DUNKNOWN.o:      690/690       622/622
bpf_lxc.o:              95033/95033   N/A
bpf_netdev.o:           7245/7245     N/A
bpf_overlay.o:          2898/2898     3085/2947

NOTE:
  - bpf_lxc.o and bpf_netdev.o compiled by -mattr=+alu32 are rejected by
    verifier due to another issue inside verifier on supporting alu32
    binary.
  - Each cilium bpf program could generate several processed insn number,
    above number is sum of them.

v1->v2:
 - Restrict the change on SCALAR_VALUE.
 - Update benchmark numbers on Cilium bpf tests.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoarm64: kasan: Increase stack size for KASAN_EXTRA
Qian Cai [Fri, 7 Dec 2018 22:34:49 +0000 (17:34 -0500)]
arm64: kasan: Increase stack size for KASAN_EXTRA

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 6e8830674ea77f57d57a33cca09083b117a71f41 ]

If the kernel is configured with KASAN_EXTRA, the stack size is
increased significantly due to setting the GCC -fstack-reuse option to
"none" [1]. As a result, it can trigger a stack overrun quite often with
32k stack size compiled using GCC 8. For example, this reproducer

  https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c

can trigger a "corrupted stack end detected inside scheduler" very
reliably with CONFIG_SCHED_STACK_END_CHECK enabled. There are other
reports at:

  https://lore.kernel.org/lkml/1542144497.12945.29.camel@gmx.us/
  https://lore.kernel.org/lkml/721E7B42-2D55-4866-9C1A-3E8D64F33F9C@gmx.us/

There are just too many functions that could have a large stack with
KASAN_EXTRA due to large local variables that have been called over and
over again without being able to reuse the stacks. Some noticiable ones
are,

size
7536 shrink_inactive_list
7440 shrink_page_list
6560 fscache_stats_show
3920 jbd2_journal_commit_transaction
3216 try_to_unmap_one
3072 migrate_page_move_mapping
3584 migrate_misplaced_transhuge_page
3920 ip_vs_lblcr_schedule
4304 lpfc_nvme_info_show
3888 lpfc_debugfs_nvmestat_data.constprop

There are other 49 functions over 2k in size while compiling kernel with
"-Wframe-larger-than=" on this machine. Hence, it is too much work to
change Makefiles for each object to compile without
-fsanitize-address-use-after-scope individually.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c23

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agomedia: uvcvideo: Refactor teardown of uvc on USB disconnect
Daniel Axtens [Sun, 23 Apr 2017 04:53:49 +0000 (00:53 -0400)]
media: uvcvideo: Refactor teardown of uvc on USB disconnect

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 10e1fdb95809ed21406f53b5b4f064673a1b9ceb ]

Currently, disconnecting a USB webcam while it is in use prints out a
number of warnings, such as:

WARNING: CPU: 2 PID: 3118 at /build/linux-ezBi1T/linux-4.8.0/fs/sysfs/group.c:237 sysfs_remove_group+0x8b/0x90
sysfs group ffffffffa7cd0780 not found for kobject 'event13'

This has been noticed before. [0]

This is because of the order in which things are torn down.

If there are no streams active during a USB disconnect:

 - uvc_disconnect() is invoked via device_del() through the bus
   notifier mechanism.

 - this calls uvc_unregister_video().

 - uvc_unregister_video() unregisters the video device for each
   stream,

 - because there are no streams open, it calls uvc_delete()

 - uvc_delete() calls uvc_status_cleanup(), which cleans up the status
   input device.

 - uvc_delete() calls media_device_unregister(), which cleans up the
   media device

 - uvc_delete(), uvc_unregister_video() and uvc_disconnect() all
   return, and we end up back in device_del().

 - device_del() then cleans up the sysfs folder for the camera with
   dpm_sysfs_remove(). Because uvc_status_cleanup() and
   media_device_unregister() have already been called, this all works
   nicely.

If, on the other hand, there *are* streams active during a USB disconnect:

 - uvc_disconnect() is invoked

 - this calls uvc_unregister_video()

 - uvc_unregister_video() unregisters the video device for each
   stream,

 - uvc_unregister_video() and uvc_disconnect() return, and we end up
   back in device_del().

 - device_del() then cleans up the sysfs folder for the camera with
   dpm_sysfs_remove(). Because the status input device and the media
   device are children of the USB device, this also deletes their
   sysfs folders.

 - Sometime later, the final stream is closed, invoking uvc_release().

 - uvc_release() calls uvc_delete()

 - uvc_delete() calls uvc_status_cleanup(), which cleans up the status
   input device. Because the sysfs directory has already been removed,
   this causes a WARNing.

 - uvc_delete() calls media_device_unregister(), which cleans up the
   media device. Because the sysfs directory has already been removed,
   this causes another WARNing.

To fix this, we need to make sure the devices are always unregistered
before the end of uvc_disconnect(). To this, move the unregistration
into the disconnect path:

 - split uvc_status_cleanup() into two parts, one on disconnect that
   unregisters and one on delete that frees.

 - move v4l2_device_unregister() and media_device_unregister() into
   the disconnect path.

[0]: https://lkml.org/lkml/2016/12/8/657

[Renamed uvc_input_cleanup() to uvc_input_unregister()]

Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoefi/libstub: Disable some warnings for x86{,_64}
Nathan Chancellor [Thu, 29 Nov 2018 17:12:26 +0000 (18:12 +0100)]
efi/libstub: Disable some warnings for x86{,_64}

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 3db5e0ba8b8f4aee631d7ee04b7a11c56cfdc213 ]

When building the kernel with Clang, some disabled warnings appear
because this Makefile overrides KBUILD_CFLAGS for x86{,_64}. Add them to
this list so that the build is clean again.

-Wpointer-sign was disabled for the whole kernel before the beginning of Git history.

-Waddress-of-packed-member was disabled for the whole kernel and for
the early boot code in these commits:

  bfb38988c51e ("kbuild: clang: Disable 'address-of-packed-member' warning")
  20c6c1890455 ("x86/boot: Disable the address-of-packed-member compiler warning").

-Wgnu was disabled for the whole kernel and for the early boot code in
these commits:

  61163efae020 ("kbuild: LLVMLinux: Add Kbuild support for building kernel with Clang")
  6c3b56b19730 ("x86/boot: Disable Clang warnings about GNU extensions").

 [ mingo: Made the changelog more readable. ]

Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arend van Spriel <arend.vanspriel@broadcom.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Eric Snowberg <eric.snowberg@oracle.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jon Hunter <jonathanh@nvidia.com>
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: YiFei Zhu <zhuyifei1999@gmail.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20181129171230.18699-8-ard.biesheuvel@linaro.org
Link: https://github.com/ClangBuiltLinux/linux/issues/112
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoASoC: pcm3168a: Don't disable pcm3168a when CONFIG_PM defined
Jiada Wang [Wed, 28 Nov 2018 12:26:12 +0000 (21:26 +0900)]
ASoC: pcm3168a: Don't disable pcm3168a when CONFIG_PM defined

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 489db5d941500249583ec6b49fa70e006bd8f632 ]

pcm3168 codec support runtime_[resume|suspend], whenever it
is not active, it enters suspend mode, and it's clock and regulators
will be disabled. so there is no need to disable them again in
remove callback.  Otherwise we got following kernel warnings,
when unload pcm3168a driver

[  222.257514] unbalanced disables for amp-en-regulator
[  222.262526] ------------[ cut here ]------------
[  222.267158] WARNING: CPU: 0 PID: 2423 at drivers/regulator/core.c:2264 _regulator_disable+0x28/0x108
[  222.276291] Modules linked in:
[  222.279343]  snd_soc_pcm3168a_i2c(-)
[  222.282916]  snd_aloop
[  222.285272]  arc4
[  222.287194]  wl18xx
[  222.289289]  wlcore
[  222.291385]  mac80211
[  222.293654]  cfg80211
[  222.295923]  aes_ce_blk
[  222.298366]  crypto_simd
[  222.300896]  cryptd
[  222.302992]  aes_ce_cipher
[  222.305696]  crc32_ce
[  222.307965]  ghash_ce
[  222.310234]  aes_arm64
[  222.312590]  gf128mul
[  222.314860]  snd_soc_rcar
[  222.317476]  sha2_ce
[  222.319658]  xhci_plat_hcd
[  222.322362]  sha256_arm64
[  222.324978]  xhci_hcd
[  222.327247]  sha1_ce
[  222.329430]  renesas_usbhs
[  222.332133]  evdev
[  222.334142]  sha1_generic
[  222.336758]  rcar_gen3_thermal
[  222.339810]  cpufreq_dt
[  222.342253]  ravb_streaming(C)
[  222.345304]  wlcore_sdio
[  222.347834]  thermal_sys
[  222.350363]  udc_core
[  222.352632]  mch_core(C)
[  222.355161]  usb_dmac
[  222.357430]  snd_soc_pcm3168a
[  222.360394]  snd_soc_ak4613
[  222.363184]  gpio_keys
[  222.365540]  virt_dma
[  222.367809]  nfsd
[  222.369730]  ipv6
[  222.371652]  autofs4
[  222.373834]  [last unloaded: snd_soc_pcm3168a_i2c]
[  222.378629] CPU: 0 PID: 2423 Comm: rmmod Tainted: G        WC      4.14.63-04798-gd456126e4a42-dirty #457
[  222.388196] Hardware name: Renesas H3ULCB Kingfisher board based on r8a7795 ES2.0+ (DT)
[  222.396199] task: ffff8006fa8c6200 task.stack: ffff00000a0a0000
[  222.402117] PC is at _regulator_disable+0x28/0x108
[  222.406906] LR is at _regulator_disable+0x28/0x108
[  222.411695] pc : [<ffff0000083bd89c>] lr : [<ffff0000083bd89c>] pstate: 00000145
[  222.419089] sp : ffff00000a0a3c80
[  222.422401] x29: ffff00000a0a3c80
[  222.425799] x28: ffff8006fa8c6200
[  222.429199] x27: ffff0000086f1000
[  222.432597] x26: 000000000000006a
[  222.435997] x25: 0000000000000124
[  222.439395] x24: 0000000000000018
[  222.442795] x23: 0000000000000006
[  222.446193] x22: ffff8006f925d490
[  222.449592] x21: ffff8006f9ac2068
[  222.452991] x20: ffff8006f9ac2000
[  222.456390] x19: 0000000000000005
[  222.459787] x18: 000000000000000a
[  222.463186] x17: 0000000000000000
[  222.466584] x16: 0000000000000000
[  222.469984] x15: 000000000d3f616a
[  222.473382] x14: 0720072007200720
[  222.476781] x13: 0720072007200720
[  222.480179] x12: 0720072007200720
[  222.483578] x11: 0720072007200720
[  222.486975] x10: 0720072007200720
[  222.490375] x9 : 0720072007200720
[  222.493773] x8 : 07200772076f0774
[  222.497172] x7 : 0000000000000000
[  222.500570] x6 : 0000000000000007
[  222.503969] x5 : 0000000000000000
[  222.507367] x4 : 0000000000000000
[  222.510766] x3 : 0000000000000000
[  222.514164] x2 : c790b852091e2600
[  222.517563] x1 : 0000000000000000
[  222.520961] x0 : 0000000000000028
[  222.524361] Call trace:
[  222.526805] Exception stack(0xffff00000a0a3b40 to 0xffff00000a0a3c80)
[  222.533245] 3b40: 0000000000000028 0000000000000000 c790b852091e2600 0000000000000000
[  222.541075] 3b60: 0000000000000000 0000000000000000 0000000000000007 0000000000000000
[  222.548905] 3b80: 07200772076f0774 0720072007200720 0720072007200720 0720072007200720
[  222.556735] 3ba0: 0720072007200720 0720072007200720 0720072007200720 000000000d3f616a
[  222.564564] 3bc0: 0000000000000000 0000000000000000 000000000000000a 0000000000000005
[  222.572394] 3be0: ffff8006f9ac2000 ffff8006f9ac2068 ffff8006f925d490 0000000000000006
[  222.580224] 3c00: 0000000000000018 0000000000000124 000000000000006a ffff0000086f1000
[  222.588053] 3c20: ffff8006fa8c6200 ffff00000a0a3c80 ffff0000083bd89c ffff00000a0a3c80
[  222.595883] 3c40: ffff0000083bd89c 0000000000000145 0000000000000000 0000000000000000
[  222.603713] 3c60: 0000ffffffffffff ffff00000a0a3c30 ffff00000a0a3c80 ffff0000083bd89c
[  222.611543] [<ffff0000083bd89c>] _regulator_disable+0x28/0x108
[  222.617375] [<ffff0000083bd9c4>] regulator_disable+0x48/0x68
[  222.623033] [<ffff0000083be8e4>] regulator_bulk_disable+0x58/0xc0
[  222.629134] [<ffff0000007d831c>] pcm3168a_remove+0x30/0x50 [snd_soc_pcm3168a]
[  222.636270] [<ffff0000007e5010>] pcm3168a_i2c_remove+0x10/0x1c [snd_soc_pcm3168a_i2c]
[  222.644106] [<ffff0000084b9d9c>] i2c_device_remove+0x38/0x70
[  222.649766] [<ffff00000843cd5c>] device_release_driver_internal+0xd0/0x1c0
[  222.656640] [<ffff00000843ced8>] driver_detach+0x70/0x7c
[  222.661951] [<ffff00000843bf68>] bus_remove_driver+0x74/0xa0
[  222.667609] [<ffff00000843d7e4>] driver_unregister+0x48/0x4c
[  222.673268] [<ffff0000084ba8dc>] i2c_del_driver+0x24/0x30
[  222.678666] [<ffff0000007e5078>] pcm3168a_i2c_driver_exit+0x10/0xf98 [snd_soc_pcm3168a_i2c]
[  222.687019] [<ffff00000811bd28>] SyS_delete_module+0x198/0x1d4
[  222.692850] Exception stack(0xffff00000a0a3ec0 to 0xffff00000a0a4000)
[  222.699289] 3ec0: 0000aaaafeb4b268 0000000000000800 14453f6470497100 0000fffffaa520d8
[  222.707119] 3ee0: 0000fffffaa520d9 000000000000000a 1999999999999999 0000000000000000
[  222.714948] 3f00: 000000000000006a 0000ffffa8f7d1d8 000000000000000a 0000000000000005
[  222.722778] 3f20: 0000000000000000 0000000000000000 000000000000002d 0000000000000000
[  222.730607] 3f40: 0000aaaae19b9f68 0000ffffa8f411f0 0000000000000000 0000aaaae19b9000
[  222.738436] 3f60: 0000fffffaa533b8 0000fffffaa531f0 0000000000000000 0000000000000001
[  222.746266] 3f80: 0000fffffaa53ec6 0000000000000000 0000aaaafeb4b200 0000aaaafeb4a010
[  222.754096] 3fa0: 0000000000000000 0000fffffaa53130 0000aaaae199f36c 0000fffffaa53130
[  222.761926] 3fc0: 0000ffffa8f411f8 0000000000000000 0000aaaafeb4b268 000000000000006a
[  222.769755] 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  222.777589] [<ffff0000080832c0>] el0_svc_naked+0x34/0x38
[  222.782899] ---[ end trace eaf8939a3698b1a8 ]---
[  222.787609] Failed to disable VCCDA2: -5
[  222.791649] ------------[ cut here ]------------
[  222.796283] WARNING: CPU: 0 PID: 2423 at drivers/clk/clk.c:595 clk_core_disable+0xc/0x1d8
[  222.804460] Modules linked in:
[  222.807511]  snd_soc_pcm3168a_i2c(-)
[  222.811083]  snd_aloop
[  222.813439]  arc4
[  222.815360]  wl18xx
[  222.817456]  wlcore
[  222.819551]  mac80211
[  222.821820]  cfg80211
[  222.824088]  aes_ce_blk
[  222.826531]  crypto_simd
[  222.829060]  cryptd
[  222.831155]  aes_ce_cipher
[  222.833859]  crc32_ce
[  222.836127]  ghash_ce
[  222.838396]  aes_arm64
[  222.840752]  gf128mul
[  222.843020]  snd_soc_rcar
[  222.845637]  sha2_ce
[  222.847818]  xhci_plat_hcd
[  222.850522]  sha256_arm64
[  222.853138]  xhci_hcd
[  222.855407]  sha1_ce
[  222.857589]  renesas_usbhs
[  222.860292]  evdev
[  222.862300]  sha1_generic
[  222.864917]  rcar_gen3_thermal
[  222.867968]  cpufreq_dt
[  222.870410]  ravb_streaming(C)
[  222.873461]  wlcore_sdio
[  222.875991]  thermal_sys
[  222.878520]  udc_core
[  222.880789]  mch_core(C)
[  222.883318]  usb_dmac
[  222.885587]  snd_soc_pcm3168a
[  222.888551]  snd_soc_ak4613
[  222.891341]  gpio_keys
[  222.893696]  virt_dma
[  222.895965]  nfsd
[  222.897886]  ipv6
[  222.899808]  autofs4
[  222.901990]  [last unloaded: snd_soc_pcm3168a_i2c]
[  222.906783] CPU: 0 PID: 2423 Comm: rmmod Tainted: G        WC      4.14.63-04798-gd456126e4a42-dirty #457
[  222.916349] Hardware name: Renesas H3ULCB Kingfisher board based on r8a7795 ES2.0+ (DT)
[  222.924351] task: ffff8006fa8c6200 task.stack: ffff00000a0a0000
[  222.930270] PC is at clk_core_disable+0xc/0x1d8
[  222.934799] LR is at clk_core_disable_lock+0x20/0x34
[  222.939761] pc : [<ffff0000083ab9b8>] lr : [<ffff0000083acd28>] pstate: 800001c5
[  222.947154] sp : ffff00000a0a3cf0
[  222.950466] x29: ffff00000a0a3cf0
[  222.953864] x28: ffff8006fa8c6200
[  222.957263] x27: ffff0000086f1000
[  222.960661] x26: 000000000000006a
[  222.964061] x25: 0000000000000124
[  222.967458] x24: 0000000000000015
[  222.970858] x23: ffff8006f9ffa8d0
[  222.974256] x22: ffff8006faf16480
[  222.977655] x21: ffff0000007e7040
[  222.981053] x20: ffff8006faadd100
[  222.984452] x19: 0000000000000140
[  222.987850] x18: 000000000000000a
[  222.991249] x17: 0000000000000000
[  222.994647] x16: 0000000000000000
[  222.998046] x15: 000000000d477819
[  223.001444] x14: 0720072007200720
[  223.004843] x13: 0720072007200720
[  223.008242] x12: 0720072007200720
[  223.011641] x11: 0720072007200720
[  223.015039] x10: 0720072007200720
[  223.018438] x9 : 0720072007200720
[  223.021837] x8 : 0720072007200720
[  223.025236] x7 : 0000000000000000
[  223.028634] x6 : 0000000000000007
[  223.032034] x5 : 0000000000000000
[  223.035432] x4 : 0000000000000000
[  223.038831] x3 : 0000000000000000
[  223.042229] x2 : 0000000004720471
[  223.045628] x1 : 0000000000000000
[  223.049026] x0 : ffff8006faadd100
[  223.052426] Call trace:
[  223.054870] Exception stack(0xffff00000a0a3bb0 to 0xffff00000a0a3cf0)
[  223.061309] 3ba0:                                   ffff8006faadd100 0000000000000000
[  223.069139] 3bc0: 0000000004720471 0000000000000000 0000000000000000 0000000000000000
[  223.076969] 3be0: 0000000000000007 0000000000000000 0720072007200720 0720072007200720
[  223.084798] 3c00: 0720072007200720 0720072007200720 0720072007200720 0720072007200720
[  223.092628] 3c20: 0720072007200720 000000000d477819 0000000000000000 0000000000000000
[  223.100458] 3c40: 000000000000000a 0000000000000140 ffff8006faadd100 ffff0000007e7040
[  223.108287] 3c60: ffff8006faf16480 ffff8006f9ffa8d0 0000000000000015 0000000000000124
[  223.116117] 3c80: 000000000000006a ffff0000086f1000 ffff8006fa8c6200 ffff00000a0a3cf0
[  223.123947] 3ca0: ffff0000083acd28 ffff00000a0a3cf0 ffff0000083ab9b8 00000000800001c5
[  223.131777] 3cc0: ffff00000a0a3cf0 ffff0000083acd1c 0000ffffffffffff ffff8006faadd100
[  223.139606] 3ce0: ffff00000a0a3cf0 ffff0000083ab9b8
[  223.144483] [<ffff0000083ab9b8>] clk_core_disable+0xc/0x1d8
[  223.150054] [<ffff0000083acd58>] clk_disable+0x1c/0x28
[  223.155198] [<ffff0000007d8328>] pcm3168a_remove+0x3c/0x50 [snd_soc_pcm3168a]
[  223.162334] [<ffff0000007e5010>] pcm3168a_i2c_remove+0x10/0x1c [snd_soc_pcm3168a_i2c]
[  223.170167] [<ffff0000084b9d9c>] i2c_device_remove+0x38/0x70
[  223.175826] [<ffff00000843cd5c>] device_release_driver_internal+0xd0/0x1c0
[  223.182700] [<ffff00000843ced8>] driver_detach+0x70/0x7c
[  223.188012] [<ffff00000843bf68>] bus_remove_driver+0x74/0xa0
[  223.193669] [<ffff00000843d7e4>] driver_unregister+0x48/0x4c
[  223.199329] [<ffff0000084ba8dc>] i2c_del_driver+0x24/0x30
[  223.204726] [<ffff0000007e5078>] pcm3168a_i2c_driver_exit+0x10/0xf98 [snd_soc_pcm3168a_i2c]
[  223.213079] [<ffff00000811bd28>] SyS_delete_module+0x198/0x1d4
[  223.218909] Exception stack(0xffff00000a0a3ec0 to 0xffff00000a0a4000)
[  223.225349] 3ec0: 0000aaaafeb4b268 0000000000000800 14453f6470497100 0000fffffaa520d8
[  223.233179] 3ee0: 0000fffffaa520d9 000000000000000a 1999999999999999 0000000000000000
[  223.241008] 3f00: 000000000000006a 0000ffffa8f7d1d8 000000000000000a 0000000000000005
[  223.248838] 3f20: 0000000000000000 0000000000000000 000000000000002d 0000000000000000
[  223.256668] 3f40: 0000aaaae19b9f68 0000ffffa8f411f0 0000000000000000 0000aaaae19b9000
[  223.264497] 3f60: 0000fffffaa533b8 0000fffffaa531f0 0000000000000000 0000000000000001
[  223.272327] 3f80: 0000fffffaa53ec6 0000000000000000 0000aaaafeb4b200 0000aaaafeb4a010
[  223.280157] 3fa0: 0000000000000000 0000fffffaa53130 0000aaaae199f36c 0000fffffaa53130
[  223.287986] 3fc0: 0000ffffa8f411f8 0000000000000000 0000aaaafeb4b268 000000000000006a
[  223.295816] 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  223.303648] [<ffff0000080832c0>] el0_svc_naked+0x34/0x38
[  223.308958] ---[ end trace eaf8939a3698b1a9 ]---
[  223.313752] ------------[ cut here ]------------
[  223.318383] WARNING: CPU: 0 PID: 2423 at drivers/clk/clk.c:477 clk_core_unprepare+0xc/0x1ac
[  223.326733] Modules linked in:
[  223.329784]  snd_soc_pcm3168a_i2c(-)
[  223.333356]  snd_aloop
[  223.335712]  arc4
[  223.337633]  wl18xx
[  223.339728]  wlcore
[  223.341823]  mac80211
[  223.344092]  cfg80211
[  223.346360]  aes_ce_blk
[  223.348803]  crypto_simd
[  223.351332]  cryptd
[  223.353428]  aes_ce_cipher
[  223.356131]  crc32_ce
[  223.358400]  ghash_ce
[  223.360668]  aes_arm64
[  223.363024]  gf128mul
[  223.365293]  snd_soc_rcar
[  223.367909]  sha2_ce
[  223.370091]  xhci_plat_hcd
[  223.372794]  sha256_arm64
[  223.375410]  xhci_hcd
[  223.377679]  sha1_ce
[  223.379861]  renesas_usbhs
[  223.382564]  evdev
[  223.384572]  sha1_generic
[  223.387188]  rcar_gen3_thermal
[  223.390239]  cpufreq_dt
[  223.392682]  ravb_streaming(C)
[  223.395732]  wlcore_sdio
[  223.398261]  thermal_sys
[  223.400790]  udc_core
[  223.403059]  mch_core(C)
[  223.405588]  usb_dmac
[  223.407856]  snd_soc_pcm3168a
[  223.410820]  snd_soc_ak4613
[  223.413609]  gpio_keys
[  223.415965]  virt_dma
[  223.418234]  nfsd
[  223.420155]  ipv6
[  223.422076]  autofs4
[  223.424258]  [last unloaded: snd_soc_pcm3168a_i2c]
[  223.429050] CPU: 0 PID: 2423 Comm: rmmod Tainted: G        WC      4.14.63-04798-gd456126e4a42-dirty #457
[  223.438616] Hardware name: Renesas H3ULCB Kingfisher board based on r8a7795 ES2.0+ (DT)
[  223.446618] task: ffff8006fa8c6200 task.stack: ffff00000a0a0000
[  223.452536] PC is at clk_core_unprepare+0xc/0x1ac
[  223.457239] LR is at clk_unprepare+0x28/0x3c
[  223.461506] pc : [<ffff0000083ab5a4>] lr : [<ffff0000083ace4c>] pstate: 60000145
[  223.468900] sp : ffff00000a0a3d00
[  223.472211] x29: ffff00000a0a3d00
[  223.475609] x28: ffff8006fa8c6200
[  223.479009] x27: ffff0000086f1000
[  223.482407] x26: 000000000000006a
[  223.485807] x25: 0000000000000124
[  223.489205] x24: 0000000000000015
[  223.492604] x23: ffff8006f9ffa8d0
[  223.496003] x22: ffff8006faf16480
[  223.499402] x21: ffff0000007e7040
[  223.502800] x20: ffff8006faf16420
[  223.506199] x19: ffff8006faadd100
[  223.509597] x18: 000000000000000a
[  223.512997] x17: 0000000000000000
[  223.516395] x16: 0000000000000000
[  223.519794] x15: 0000000000000000
[  223.523192] x14: 00000033fe89076c
[  223.526591] x13: 0000000000000400
[  223.529989] x12: 0000000000000400
[  223.533388] x11: 0000000000000000
[  223.536786] x10: 00000000000009e0
[  223.540185] x9 : ffff00000a0a3be0
[  223.543583] x8 : ffff8006fa8c6c40
[  223.546982] x7 : ffff8006fa8c6400
[  223.550380] x6 : 0000000000000001
[  223.553780] x5 : 0000000000000000
[  223.557178] x4 : ffff8006fa8c6200
[  223.560577] x3 : 0000000000000000
[  223.563975] x2 : ffff8006fa8c6200
[  223.567374] x1 : 0000000000000000
[  223.570772] x0 : ffff8006faadd100
[  223.574170] Call trace:
[  223.576615] Exception stack(0xffff00000a0a3bc0 to 0xffff00000a0a3d00)
[  223.583054] 3bc0: ffff8006faadd100 0000000000000000 ffff8006fa8c6200 0000000000000000
[  223.590884] 3be0: ffff8006fa8c6200 0000000000000000 0000000000000001 ffff8006fa8c6400
[  223.598714] 3c00: ffff8006fa8c6c40 ffff00000a0a3be0 00000000000009e0 0000000000000000
[  223.606544] 3c20: 0000000000000400 0000000000000400 00000033fe89076c 0000000000000000
[  223.614374] 3c40: 0000000000000000 0000000000000000 000000000000000a ffff8006faadd100
[  223.622204] 3c60: ffff8006faf16420 ffff0000007e7040 ffff8006faf16480 ffff8006f9ffa8d0
[  223.630033] 3c80: 0000000000000015 0000000000000124 000000000000006a ffff0000086f1000
[  223.637863] 3ca0: ffff8006fa8c6200 ffff00000a0a3d00 ffff0000083ace4c ffff00000a0a3d00
[  223.645693] 3cc0: ffff0000083ab5a4 0000000060000145 0000000000000140 ffff8006faadd100
[  223.653523] 3ce0: 0000ffffffffffff ffff0000083ace44 ffff00000a0a3d00 ffff0000083ab5a4
[  223.661353] [<ffff0000083ab5a4>] clk_core_unprepare+0xc/0x1ac
[  223.667103] [<ffff0000007d8330>] pcm3168a_remove+0x44/0x50 [snd_soc_pcm3168a]
[  223.674239] [<ffff0000007e5010>] pcm3168a_i2c_remove+0x10/0x1c [snd_soc_pcm3168a_i2c]
[  223.682070] [<ffff0000084b9d9c>] i2c_device_remove+0x38/0x70
[  223.687731] [<ffff00000843cd5c>] device_release_driver_internal+0xd0/0x1c0
[  223.694604] [<ffff00000843ced8>] driver_detach+0x70/0x7c
[  223.699915] [<ffff00000843bf68>] bus_remove_driver+0x74/0xa0
[  223.705572] [<ffff00000843d7e4>] driver_unregister+0x48/0x4c
[  223.711230] [<ffff0000084ba8dc>] i2c_del_driver+0x24/0x30
[  223.716628] [<ffff0000007e5078>] pcm3168a_i2c_driver_exit+0x10/0xf98 [snd_soc_pcm3168a_i2c]
[  223.724980] [<ffff00000811bd28>] SyS_delete_module+0x198/0x1d4
[  223.730811] Exception stack(0xffff00000a0a3ec0 to 0xffff00000a0a4000)
[  223.737250] 3ec0: 0000aaaafeb4b268 0000000000000800 14453f6470497100 0000fffffaa520d8
[  223.745079] 3ee0: 0000fffffaa520d9 000000000000000a 1999999999999999 0000000000000000
[  223.752909] 3f00: 000000000000006a 0000ffffa8f7d1d8 000000000000000a 0000000000000005
[  223.760739] 3f20: 0000000000000000 0000000000000000 000000000000002d 0000000000000000
[  223.768568] 3f40: 0000aaaae19b9f68 0000ffffa8f411f0 0000000000000000 0000aaaae19b9000
[  223.776398] 3f60: 0000fffffaa533b8 0000fffffaa531f0 0000000000000000 0000000000000001
[  223.784227] 3f80: 0000fffffaa53ec6 0000000000000000 0000aaaafeb4b200 0000aaaafeb4a010
[  223.792057] 3fa0: 0000000000000000 0000fffffaa53130 0000aaaae199f36c 0000fffffaa53130
[  223.799886] 3fc0: 0000ffffa8f411f8 0000000000000000 0000aaaafeb4b268 000000000000006a
[  223.807715] 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  223.815546] [<ffff0000080832c0>] el0_svc_naked+0x34/0x38
[  223.820855] ---[ end trace eaf8939a3698b1aa ]---

Fix this issue by only disable clock and regulators in remove callback
when CONFIG_PM isn't defined

Signed-off-by: Jiada Wang <jiada_wang@mentor.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agodrm/atomic-helper: Complete fake_commit->flip_done potentially earlier
Ville Syrjälä [Thu, 22 Nov 2018 14:34:11 +0000 (16:34 +0200)]
drm/atomic-helper: Complete fake_commit->flip_done potentially earlier

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 2de42f79bb21a412f40ade8831eb6fc445cb78a4 ]

Consider the following scenario:
1. nonblocking enable crtc
2. wait for the event
3. nonblocking disable crtc

On i915 this can lead to a spurious -EBUSY from step 3 on
account of non-enabled planes getting the fake_commit in step 1
and we don't complete the fake_commit-> flip_done until
drm_atomic_helper_commit_hw_done() which can happen a long
time after the flip event was sent out.

This will become somewhat easy to hit on SKL+ once we start
to add all the planes for the crtc to every modeset commit
for the purposes of forcing a watermark register programming
[1].

To make the race a little less pronounced let's complete
fake_commit->flip_done after drm_atomic_helper_wait_for_flip_done().
For the single crtc case this should make the race quite
theoretical, assuming drm_atomic_helper_wait_for_flip_done()
actually has to wait for the real commit flip_done. In case
the real commit flip_done gets completed singificantly before
drm_atomic_helper_wait_for_flip_done(), or we are dealing with
multiple crtcs whose vblanks don't line up nicely the race still
exists.

[1] https://patchwork.freedesktop.org/patch/262670/

Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Fixes: 080de2e5be2d ("drm/atomic: Check for busy planes/connectors before setting the commit")
Testcase: igt/kms_cursor_legacy/*nonblocking-modeset-vs-cursor-atomic
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181122143412.11655-1-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoof: overlay: add missing of_node_put() after add new node to changeset
Frank Rowand [Fri, 5 Oct 2018 03:25:13 +0000 (20:25 -0700)]
of: overlay: add missing of_node_put() after add new node to changeset

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 7c528e457d53c75107d5aa56892316d265c778de ]

The refcount of a newly added overlay node decrements to one
(instead of zero) when the overlay changeset is destroyed.  This
change will cause the final decrement be to zero.

After applying this patch, new validation warnings will be
reported from the devicetree unittest during boot due to
a pre-existing devicetree bug.  The warnings will be similar to:

  OF: ERROR: memory leak before free overlay changeset,  /testcase-data/overlay-node/test-bus/test-unittest4

This pre-existing devicetree bug will also trigger a WARN_ONCE() from
refcount_sub_and_test_checked() when an overlay changeset is
destroyed without having first been applied.  This scenario occurs
when an error in the overlay is detected during the overlay changeset
creation:

  WARNING: CPU: 0 PID: 1 at lib/refcount.c:187 refcount_sub_and_test_checked+0xa8/0xbc
  refcount_t: underflow; use-after-free.

  (unwind_backtrace) from (show_stack+0x10/0x14)
  (show_stack) from (dump_stack+0x6c/0x8c)
  (dump_stack) from (__warn+0xdc/0x104)
  (__warn) from (warn_slowpath_fmt+0x44/0x6c)
  (warn_slowpath_fmt) from (refcount_sub_and_test_checked+0xa8/0xbc)
  (refcount_sub_and_test_checked) from (kobject_put+0x24/0x208)
  (kobject_put) from (of_changeset_destroy+0x2c/0xb4)
  (of_changeset_destroy) from (free_overlay_changeset+0x1c/0x9c)
  (free_overlay_changeset) from (of_overlay_remove+0x284/0x2cc)
  (of_overlay_remove) from (of_unittest_apply_revert_overlay_check.constprop.4+0xf8/0x1e8)
  (of_unittest_apply_revert_overlay_check.constprop.4) from (of_unittest_overlay+0x960/0xed8)
  (of_unittest_overlay) from (of_unittest+0x1cc4/0x2138)
  (of_unittest) from (do_one_initcall+0x4c/0x28c)
  (do_one_initcall) from (kernel_init_freeable+0x29c/0x378)
  (kernel_init_freeable) from (kernel_init+0x8/0x110)
  (kernel_init) from (ret_from_fork+0x14/0x2c)

Tested-by: Alan Tull <atull@kernel.org>
Signed-off-by: Frank Rowand <frank.rowand@sony.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agodrm/amdkfd: fix interrupt spin lock
Christian König [Fri, 2 Nov 2018 13:46:24 +0000 (14:46 +0100)]
drm/amdkfd: fix interrupt spin lock

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 2383a767c0ca06f96534456d8313909017c6c8d0 ]

Vega10 has multiple interrupt rings, so this can be called from multiple
calles at the same time resulting in:

[   71.779334] ================================
[   71.779406] WARNING: inconsistent lock state
[   71.779478] 4.19.0-rc1+ #44 Tainted: G        W
[   71.779565] --------------------------------
[   71.779637] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[   71.779740] kworker/6:1/120 [HC0[0]:SC0[0]:HE1:SE1] takes:
[   71.779832] 00000000ad761971 (&(&kfd->interrupt_lock)->rlock){?...},
at: kgd2kfd_interrupt+0x75/0x100 [amdgpu]
[   71.780058] {IN-HARDIRQ-W} state was registered at:
[   71.780115]   _raw_spin_lock+0x2c/0x40
[   71.780180]   kgd2kfd_interrupt+0x75/0x100 [amdgpu]
[   71.780248]   amdgpu_irq_callback+0x6c/0x150 [amdgpu]
[   71.780315]   amdgpu_ih_process+0x88/0x100 [amdgpu]
[   71.780380]   amdgpu_irq_handler+0x20/0x40 [amdgpu]
[   71.780409]   __handle_irq_event_percpu+0x49/0x2a0
[   71.780436]   handle_irq_event_percpu+0x30/0x70
[   71.780461]   handle_irq_event+0x37/0x60
[   71.780484]   handle_edge_irq+0x83/0x1b0
[   71.780506]   handle_irq+0x1f/0x30
[   71.780526]   do_IRQ+0x53/0x110
[   71.780544]   ret_from_intr+0x0/0x22
[   71.780566]   cpuidle_enter_state+0xaa/0x330
[   71.780591]   do_idle+0x203/0x280
[   71.780610]   cpu_startup_entry+0x6f/0x80
[   71.780634]   start_secondary+0x1b0/0x200
[   71.780657]   secondary_startup_64+0xa4/0xb0

Fix this by always using irq save spin locks.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agonetfilter: ipset: Allow matching on destination MAC address for mac and ipmac sets
Stefano Brivio [Fri, 17 Aug 2018 19:09:47 +0000 (21:09 +0200)]
netfilter: ipset: Allow matching on destination MAC address for mac and ipmac sets

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 8cc4ccf58379935f3ad456cc34e61c4e4c921d0e ]

There doesn't seem to be any reason to restrict MAC address
matching to source MAC addresses in set types bitmap:ipmac,
hash:ipmac and hash:mac. With this patch, and this setup:

  ip netns add A
  ip link add veth1 type veth peer name veth2 netns A
  ip addr add 192.0.2.1/24 dev veth1
  ip -net A addr add 192.0.2.2/24 dev veth2
  ip link set veth1 up
  ip -net A link set veth2 up

  ip netns exec A ipset create test hash:mac
  dst=$(ip netns exec A cat /sys/class/net/veth2/address)
  ip netns exec A ipset add test ${dst}
  ip netns exec A iptables -P INPUT DROP
  ip netns exec A iptables -I INPUT -m set --match-set test dst -j ACCEPT

ipset will match packets based on destination MAC address:

  # ping -c1 192.0.2.2 >/dev/null
  # echo $?
  0

Reported-by: Yi Chen <yiche@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agonet: clear skb->tstamp in bridge forwarding path
Paolo Abeni [Tue, 8 Jan 2019 17:45:05 +0000 (18:45 +0100)]
net: clear skb->tstamp in bridge forwarding path

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 41d1c8839e5f8cb781cc635f12791decee8271b7 ]

Matteo reported forwarding issues inside the linux bridge,
if the enslaved interfaces use the fq qdisc.

Similar to commit 8203e2d844d3 ("net: clear skb->tstamp in
forwarding paths"), we need to clear the tstamp field in
the bridge forwarding path.

Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.")
Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
Reported-and-tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoipmi:ssif: Fix handling of multi-part return messages
Corey Minyard [Fri, 16 Nov 2018 15:59:21 +0000 (09:59 -0600)]
ipmi:ssif: Fix handling of multi-part return messages

BugLink: https://bugs.launchpad.net/bugs/1837477
commit 7d6380cd40f7993f75c4bde5b36f6019237e8719 upstream.

The block number was not being compared right, it was off by one
when checking the response.

Some statistics wouldn't be incremented properly in some cases.

Check to see if that middle-part messages always have 31 bytes of
data.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: stable@vger.kernel.org # 4.4
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoPCI: dwc: Move interrupt acking into the proper callback
Marc Zyngier [Tue, 13 Nov 2018 22:57:34 +0000 (22:57 +0000)]
PCI: dwc: Move interrupt acking into the proper callback

BugLink: https://bugs.launchpad.net/bugs/1837477
commit 3f7bb2ec20ce07c02b2002349d256c91a463fcc5 upstream.

The write to the status register is really an ACK for the HW,
and should be treated as such by the driver. Let's move it to the
irq_ack() callback, which will prevent people from moving it around
in order to paper over other bugs.

Fixes: 8c934095fa2f ("PCI: dwc: Clear MSI interrupt status after it is handled,
not before")
Fixes: 7c5925afbc58 ("PCI: dwc: Move MSI IRQs allocation to IRQ domains
hierarchical API")
Link: https://lore.kernel.org/linux-pci/20181113225734.8026-1-marc.zyngier@arm.com/
Reported-by: Trent Piepho <tpiepho@impinj.com>
Tested-by: Niklas Cassel <niklas.cassel@linaro.org>
Tested-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Tested-by: Stanimir Varbanov <svarbanov@mm-sol.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[lorenzo.pieralisi@arm.com: updated commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agocifs: In Kconfig CONFIG_CIFS_POSIX needs depends on legacy (insecure cifs)
Steve French [Sat, 3 Nov 2018 20:02:44 +0000 (15:02 -0500)]
cifs: In Kconfig CONFIG_CIFS_POSIX needs depends on legacy (insecure cifs)

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 6e785302dad32228819d8066e5376acd15d0e6ba ]

Missing a dependency.  Shouldn't show cifs posix extensions
in Kconfig if CONFIG_CIFS_ALLOW_INSECURE_DIALECTS (ie SMB1
protocol) is disabled.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agocifs: allow disabling insecure dialects in the config
Steve French [Tue, 19 Jun 2018 19:34:08 +0000 (14:34 -0500)]
cifs: allow disabling insecure dialects in the config

BugLink: https://bugs.launchpad.net/bugs/1837477
commit 7420451f6a109f7f8f1bf283f34d08eba3259fb3 upstream.

allow disabling cifs (SMB1 ie vers=1.0) and vers=2.0 in the
config for the build of cifs.ko if want to always prevent mounting
with these less secure dialects.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Jeremy Allison <jra@samba.org>
Cc: Alakesh Haloi <alakeshh@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoUBUNTU: [Config] updateconfigs for CIFS_ALLOW_INSECURE_LEGACY
Kamal Mostafa [Mon, 22 Jul 2019 19:25:06 +0000 (12:25 -0700)]
UBUNTU: [Config] updateconfigs for CIFS_ALLOW_INSECURE_LEGACY

BugLink: https://bugs.launchpad.net/bugs/1837477
Ignore: yes
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agomm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smaps
Michal Hocko [Fri, 28 Dec 2018 08:38:17 +0000 (00:38 -0800)]
mm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smaps

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 7550c6079846a24f30d15ac75a941c8515dbedfb ]

Patch series "THP eligibility reporting via proc".

This series of three patches aims at making THP eligibility reporting much
more robust and long term sustainable.  The trigger for the change is a
regression report [2] and the long follow up discussion.  In short the
specific application didn't have good API to query whether a particular
mapping can be backed by THP so it has used VMA flags to workaround that.
These flags represent a deep internal state of VMAs and as such they
should be used by userspace with a great deal of caution.

A similar has happened for [3] when users complained that VM_MIXEDMAP is
no longer set on DAX mappings.  Again a lack of a proper API led to an
abuse.

The first patch in the series tries to emphasise that that the semantic of
flags might change and any application consuming those should be really
careful.

The remaining two patches provide a more suitable interface to address [2]
and provide a consistent API to query the THP status both for each VMA and
process wide as well.  [1]

http://lkml.kernel.org/r/20181120103515.25280-1-mhocko@kernel.org [2]
http://lkml.kernel.org/r/http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054050.224429@chino.kir.corp.google.com
[3] http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz

This patch (of 3):

Even though vma flags exported via /proc/<pid>/smaps are explicitly
documented to be not guaranteed for future compatibility the warning
doesn't go far enough because it doesn't mention semantic changes to those
flags.  And they are important as well because these flags are a deep
implementation internal to the MM code and the semantic might change at
any time.

Let's consider two recent examples:
http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz
: commit e1fb4a086495 "dax: remove VM_MIXEDMAP for fsdax and device dax" has
: removed VM_MIXEDMAP flag from DAX VMAs. Now our testing shows that in the
: mean time certain customer of ours started poking into /proc/<pid>/smaps
: and looks at VMA flags there and if VM_MIXEDMAP is missing among the VMA
: flags, the application just fails to start complaining that DAX support is
: missing in the kernel.

http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054050.224429@chino.kir.corp.google.com
: Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
: introduced a regression in that userspace cannot always determine the set
: of vmas where thp is ineligible.
: Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps
: to determine if a vma is eligible to be backed by hugepages.
: Previous to this commit, prctl(PR_SET_THP_DISABLE, 1) would cause thp to
: be disabled and emit "nh" as a flag for the corresponding vmas as part of
: /proc/pid/smaps.  After the commit, thp is disabled by means of an mm
: flag and "nh" is not emitted.
: This causes smaps parsing libraries to assume a vma is eligible for thp
: and ends up puzzling the user on why its memory is not backed by thp.

In both cases userspace was relying on a semantic of a specific VMA flag.
The primary reason why that happened is a lack of a proper interface.
While this has been worked on and it will be fixed properly, it seems that
our wording could see some refinement and be more vocal about semantic
aspect of these flags as well.

Link: http://lkml.kernel.org/r/20181211143641.3503-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Paul Oppenheimer <bepvte@gmail.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agomm/swap: use nr_node_ids for avail_lists in swap_info_struct
Aaron Lu [Fri, 28 Dec 2018 08:34:39 +0000 (00:34 -0800)]
mm/swap: use nr_node_ids for avail_lists in swap_info_struct

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 66f71da9dd38af17dc17209cdde7987d4679a699 ]

Since a2468cc9bfdf ("swap: choose swap device according to numa node"),
avail_lists field of swap_info_struct is changed to an array with
MAX_NUMNODES elements.  This made swap_info_struct size increased to 40KiB
and needs an order-4 page to hold it.

This is not optimal in that:
1 Most systems have way less than MAX_NUMNODES(1024) nodes so it
  is a waste of memory;
2 It could cause swapon failure if the swap device is swapped on
  after system has been running for a while, due to no order-4
  page is available as pointed out by Vasily Averin.

Solve the above two issues by using nr_node_ids(which is the actual
possible node number the running system has) for avail_lists instead of
MAX_NUMNODES.

nr_node_ids is unknown at compile time so can't be directly used when
declaring this array.  What I did here is to declare avail_lists as zero
element array and allocate space for it when allocating space for
swap_info_struct.  The reason why keep using array but not pointer is
plist_for_each_entry needs the field to be part of the struct, so pointer
will not work.

This patch is on top of Vasily Averin's fix commit.  I think the use of
kvzalloc for swap_info_struct is still needed in case nr_node_ids is
really big on some systems.

Link: http://lkml.kernel.org/r/20181115083847.GA11129@intel.com
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agomm/page-writeback.c: don't break integrity writeback on ->writepage() error
Brian Foster [Fri, 28 Dec 2018 08:37:20 +0000 (00:37 -0800)]
mm/page-writeback.c: don't break integrity writeback on ->writepage() error

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 3fa750dcf29e8606e3969d13d8e188cc1c0f511d ]

write_cache_pages() is used in both background and integrity writeback
scenarios by various filesystems.  Background writeback is mostly
concerned with cleaning a certain number of dirty pages based on various
mm heuristics.  It may not write the full set of dirty pages or wait for
I/O to complete.  Integrity writeback is responsible for persisting a set
of dirty pages before the writeback job completes.  For example, an
fsync() call must perform integrity writeback to ensure data is on disk
before the call returns.

write_cache_pages() unconditionally breaks out of its processing loop in
the event of a ->writepage() error.  This is fine for background
writeback, which had no strict requirements and will eventually come
around again.  This can cause problems for integrity writeback on
filesystems that might need to clean up state associated with failed page
writeouts.  For example, XFS performs internal delayed allocation
accounting before returning a ->writepage() error, where applicable.  If
the current writeback happens to be associated with an unmount and
write_cache_pages() completes the writeback prematurely due to error, the
filesystem is unmounted in an inconsistent state if dirty+delalloc pages
still exist.

To handle this problem, update write_cache_pages() to always process the
full set of pages for integrity writeback regardless of ->writepage()
errors.  Save the first encountered error and return it to the caller once
complete.  This facilitates XFS (or any other fs that expects integrity
writeback to process the entire set of dirty pages) to clean up its
internal state completely in the event of persistent mapping errors.
Background writeback continues to exit on the first error encountered.

[akpm@linux-foundation.org: fix typo in comment]
Link: http://lkml.kernel.org/r/20181116134304.32440-1-bfoster@redhat.com
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoocfs2: fix panic due to unrecovered local alloc
Junxiao Bi [Fri, 28 Dec 2018 08:32:50 +0000 (00:32 -0800)]
ocfs2: fix panic due to unrecovered local alloc

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 532e1e54c8140188e192348c790317921cb2dc1c ]

mount.ocfs2 ignore the inconsistent error that journal is clean but
local alloc is unrecovered.  After mount, local alloc not empty, then
reserver cluster didn't alloc a new local alloc window, reserveration
map is empty(ocfs2_reservation_map.m_bitmap_len = 0), that triggered the
following panic.

This issue was reported at

  https://oss.oracle.com/pipermail/ocfs2-devel/2015-May/010854.html

and was advised to fixed during mount.  But this is a very unusual
inconsistent state, usually journal dirty flag should be cleared at the
last stage of umount until every other things go right.  We may need do
further debug to check that.  Any way to avoid possible futher
corruption, mount should be abort and fsck should be run.

  (mount.ocfs2,1765,1):ocfs2_load_local_alloc:353 ERROR: Local alloc hasn't been recovered!
  found = 6518, set = 6518, taken = 8192, off = 15912372
  ocfs2: Mounting device (202,64) on (node 0, slot 3) with ordered data mode.
  o2dlm: Joining domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 8 ) 8 nodes
  ocfs2: Mounting device (202,80) on (node 0, slot 3) with ordered data mode.
  o2hb: Region 89CEAC63CC4F4D03AC185B44E0EE0F3F (xvdf) is now a quorum device
  o2net: Accepted connection from node yvwsoa17p (num 7) at 172.22.77.88:7777
  o2dlm: Node 7 joins domain 64FE421C8C984E6D96ED12C55FEE2435 ( 0 1 2 3 4 5 6 7 8 ) 9 nodes
  o2dlm: Node 7 joins domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 7 8 ) 9 nodes
  ------------[ cut here ]------------
  kernel BUG at fs/ocfs2/reservations.c:507!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: ocfs2 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd grace ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ovmapi ppdev parport_pc parport xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea acpi_cpufreq pcspkr i2c_piix4 i2c_core sg ext4 jbd2 mbcache2 sr_mod cdrom xen_blkfront pata_acpi ata_generic ata_piix floppy dm_mirror dm_region_hash dm_log dm_mod
  CPU: 0 PID: 4349 Comm: startWebLogic.s Not tainted 4.1.12-124.19.2.el6uek.x86_64 #2
  Hardware name: Xen HVM domU, BIOS 4.4.4OVM 09/06/2018
  task: ffff8803fb04e200 ti: ffff8800ea4d8000 task.ti: ffff8800ea4d8000
  RIP: 0010:[<ffffffffa05e96a8>]  [<ffffffffa05e96a8>] __ocfs2_resv_find_window+0x498/0x760 [ocfs2]
  Call Trace:
    ocfs2_resmap_resv_bits+0x10d/0x400 [ocfs2]
    ocfs2_claim_local_alloc_bits+0xd0/0x640 [ocfs2]
    __ocfs2_claim_clusters+0x178/0x360 [ocfs2]
    ocfs2_claim_clusters+0x1f/0x30 [ocfs2]
    ocfs2_convert_inline_data_to_extents+0x634/0xa60 [ocfs2]
    ocfs2_write_begin_nolock+0x1c6/0x1da0 [ocfs2]
    ocfs2_write_begin+0x13e/0x230 [ocfs2]
    generic_perform_write+0xbf/0x1c0
    __generic_file_write_iter+0x19c/0x1d0
    ocfs2_file_write_iter+0x589/0x1360 [ocfs2]
    __vfs_write+0xb8/0x110
    vfs_write+0xa9/0x1b0
    SyS_write+0x46/0xb0
    system_call_fastpath+0x18/0xd7
  Code: ff ff 8b 75 b8 39 75 b0 8b 45 c8 89 45 98 0f 84 e5 fe ff ff 45 8b 74 24 18 41 8b 54 24 1c e9 56 fc ff ff 85 c0 0f 85 48 ff ff ff <0f> 0b 48 8b 05 cf c3 de ff 48 ba 00 00 00 00 00 00 00 10 48 85
  RIP   __ocfs2_resv_find_window+0x498/0x760 [ocfs2]
   RSP <ffff8800ea4db668>
  ---[ end trace 566f07529f2edf3c ]---
  Kernel panic - not syncing: Fatal exception
  Kernel Offset: disabled

Link: http://lkml.kernel.org/r/20181121020023.3034-2-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoscsi: megaraid: fix out-of-bound array accesses
Qian Cai [Thu, 13 Dec 2018 13:27:27 +0000 (08:27 -0500)]
scsi: megaraid: fix out-of-bound array accesses

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit c7a082e4242fd8cd21a441071e622f87c16bdacc ]

UBSAN reported those with MegaRAID SAS-3 3108,

[   77.467308] UBSAN: Undefined behaviour in drivers/scsi/megaraid/megaraid_sas_fp.c:117:32
[   77.475402] index 255 is out of range for type 'MR_LD_SPAN_MAP [1]'
[   77.481677] CPU: 16 PID: 333 Comm: kworker/16:1 Not tainted 4.20.0-rc5+ #1
[   77.488556] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 06/01/2018
[   77.495791] Workqueue: events work_for_cpu_fn
[   77.500154] Call trace:
[   77.502610]  dump_backtrace+0x0/0x2c8
[   77.506279]  show_stack+0x24/0x30
[   77.509604]  dump_stack+0x118/0x19c
[   77.513098]  ubsan_epilogue+0x14/0x60
[   77.516765]  __ubsan_handle_out_of_bounds+0xfc/0x13c
[   77.521767]  mr_update_load_balance_params+0x150/0x158 [megaraid_sas]
[   77.528230]  MR_ValidateMapInfo+0x2cc/0x10d0 [megaraid_sas]
[   77.533825]  megasas_get_map_info+0x244/0x2f0 [megaraid_sas]
[   77.539505]  megasas_init_adapter_fusion+0x9b0/0xf48 [megaraid_sas]
[   77.545794]  megasas_init_fw+0x1ab4/0x3518 [megaraid_sas]
[   77.551212]  megasas_probe_one+0x2c4/0xbe0 [megaraid_sas]
[   77.556614]  local_pci_probe+0x7c/0xf0
[   77.560365]  work_for_cpu_fn+0x34/0x50
[   77.564118]  process_one_work+0x61c/0xf08
[   77.568129]  worker_thread+0x534/0xa70
[   77.571882]  kthread+0x1c8/0x1d0
[   77.575114]  ret_from_fork+0x10/0x1c

[   89.240332] UBSAN: Undefined behaviour in drivers/scsi/megaraid/megaraid_sas_fp.c:117:32
[   89.248426] index 255 is out of range for type 'MR_LD_SPAN_MAP [1]'
[   89.254700] CPU: 16 PID: 95 Comm: kworker/u130:0 Not tainted 4.20.0-rc5+ #1
[   89.261665] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 06/01/2018
[   89.268903] Workqueue: events_unbound async_run_entry_fn
[   89.274222] Call trace:
[   89.276680]  dump_backtrace+0x0/0x2c8
[   89.280348]  show_stack+0x24/0x30
[   89.283671]  dump_stack+0x118/0x19c
[   89.287167]  ubsan_epilogue+0x14/0x60
[   89.290835]  __ubsan_handle_out_of_bounds+0xfc/0x13c
[   89.295828]  MR_LdRaidGet+0x50/0x58 [megaraid_sas]
[   89.300638]  megasas_build_io_fusion+0xbb8/0xd90 [megaraid_sas]
[   89.306576]  megasas_build_and_issue_cmd_fusion+0x138/0x460 [megaraid_sas]
[   89.313468]  megasas_queue_command+0x398/0x3d0 [megaraid_sas]
[   89.319222]  scsi_dispatch_cmd+0x1dc/0x8a8
[   89.323321]  scsi_request_fn+0x8e8/0xdd0
[   89.327249]  __blk_run_queue+0xc4/0x158
[   89.331090]  blk_execute_rq_nowait+0xf4/0x158
[   89.335449]  blk_execute_rq+0xdc/0x158
[   89.339202]  __scsi_execute+0x130/0x258
[   89.343041]  scsi_probe_and_add_lun+0x2fc/0x1488
[   89.347661]  __scsi_scan_target+0x1cc/0x8c8
[   89.351848]  scsi_scan_channel.part.3+0x8c/0xc0
[   89.356382]  scsi_scan_host_selected+0x130/0x1f0
[   89.361002]  do_scsi_scan_host+0xd8/0xf0
[   89.364927]  do_scan_async+0x9c/0x320
[   89.368594]  async_run_entry_fn+0x138/0x420
[   89.372780]  process_one_work+0x61c/0xf08
[   89.376793]  worker_thread+0x13c/0xa70
[   89.380546]  kthread+0x1c8/0x1d0
[   89.383778]  ret_from_fork+0x10/0x1c

This is because when populating Driver Map using firmware raid map, all
non-existing VDs set their ldTgtIdToLd to 0xff, so it can be skipped later.

From drivers/scsi/megaraid/megaraid_sas_base.c ,
memset(instance->ld_ids, 0xff, MEGASAS_MAX_LD_IDS);

From drivers/scsi/megaraid/megaraid_sas_fp.c ,
/* For non existing VDs, iterate to next VD*/
if (ld >= (MAX_LOGICAL_DRIVES_EXT - 1))
continue;

However, there are a few places that failed to skip those non-existing VDs
due to off-by-one errors. Then, those 0xff leaked into MR_LdRaidGet(0xff,
map) and triggered the out-of-bound accesses.

Fixes: 51087a8617fe ("megaraid_sas : Extended VD support")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoscsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown()
Yanjiang Jin [Thu, 20 Dec 2018 08:32:35 +0000 (16:32 +0800)]
scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown()

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit e57b2945aa654e48f85a41e8917793c64ecb9de8 ]

We must free all irqs during shutdown, else kexec's 2nd kernel would hang
in pqi_wait_for_completion_io() as below:

Call trace:

 pqi_wait_for_completion_io
 pqi_submit_raid_request_synchronous.constprop.78+0x23c/0x310 [smartpqi]
 pqi_configure_events+0xec/0x1f8 [smartpqi]
 pqi_ctrl_init+0x814/0xca0 [smartpqi]
 pqi_pci_probe+0x400/0x46c [smartpqi]
 local_pci_probe+0x48/0xb0
 pci_device_probe+0x14c/0x1b0
 really_probe+0x218/0x3fc
 driver_probe_device+0x70/0x140
 __driver_attach+0x11c/0x134
 bus_for_each_dev+0x70/0xc8
 driver_attach+0x30/0x38
 bus_add_driver+0x1f0/0x294
 driver_register+0x74/0x12c
 __pci_register_driver+0x64/0x70
 pqi_init+0xd0/0x10000 [smartpqi]
 do_one_initcall+0x60/0x1d8
 do_init_module+0x64/0x1f8
 load_module+0x10ec/0x1350
 __se_sys_finit_module+0xd4/0x100
 __arm64_sys_finit_module+0x28/0x34
 el0_svc_handler+0x104/0x160
 el0_svc+0x8/0xc

This happens only in the following combinations:

1. smartpqi is built as module, not built-in;
2. We have a disk connected to smartpqi card;
3. Both kexec's 1st and 2nd kernels use this disk as Rootfs' mount point.

Signed-off-by: Yanjiang Jin <yanjiang.jin@hxt-semitech.com>
Acked-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoscsi: smartpqi: correct lun reset issues
Kevin Barnett [Fri, 7 Dec 2018 22:29:51 +0000 (16:29 -0600)]
scsi: smartpqi: correct lun reset issues

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 2ba55c9851d74eb015a554ef69ddf2ef061d5780 ]

Problem:
The Linux kernel takes a logical volume offline after a LUN reset.  This is
generally accompanied by this message in the dmesg output:

Device offlined - not ready after error recovery

Root Cause:
The root cause is a "quirk" in the timeout handling in the Linux SCSI
layer. The Linux kernel places a 30-second timeout on most media access
commands (reads and writes) that it send to device drivers.  When a media
access command times out, the Linux kernel goes into error recovery mode
for the LUN that was the target of the command that timed out. Every
command that timed out is kept on a list inside of the Linux kernel to be
retried later. The kernel attempts to recover the command(s) that timed out
by issuing a LUN reset followed by a TEST UNIT READY. If the LUN reset and
TEST UNIT READY commands are successful, the kernel retries the command(s)
that timed out.

Each SCSI command issued by the kernel has a result field associated with
it. This field indicates the final result of the command (success or
error). When a command times out, the kernel places a value in this result
field indicating that the command timed out.

The "quirk" is that after the LUN reset and TEST UNIT READY commands are
completed, the kernel checks each command on the timed-out command list
before retrying it. If the result field is still "timed out", the kernel
treats that command as not having been successfully recovered for a
retry. If the number of commands that are in this state are greater than
two, the kernel takes the LUN offline.

Fix:
When our RAIDStack receives a LUN reset, it simply waits until all
outstanding commands complete. Generally, all of these outstanding commands
complete successfully. Therefore, the fix in the smartpqi driver is to
always set the command result field to indicate success when a request
completes successfully. This normally isn’t necessary because the result
field is always initialized to success when the command is submitted to the
driver. So when the command completes successfully, the result field is
left untouched. But in this case, the kernel changes the result field
behind the driver’s back and then expects the field to be changed by the
driver as the commands that timed-out complete.

Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoIB/usnic: Fix potential deadlock
Parvi Kaustubhi [Tue, 11 Dec 2018 22:15:42 +0000 (14:15 -0800)]
IB/usnic: Fix potential deadlock

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 8036e90f92aae2784b855a0007ae2d8154d28b3c ]

Acquiring the rtnl lock while holding usdev_lock could result in a
deadlock.

For example:

usnic_ib_query_port()
| mutex_lock(&us_ibdev->usdev_lock)
 | ib_get_eth_speed()
  | rtnl_lock()

rtnl_lock()
| usnic_ib_netdevice_event()
 | mutex_lock(&us_ibdev->usdev_lock)

This commit moves the usdev_lock acquisition after the rtnl lock has been
released.

This is safe to do because usdev_lock is not protecting anything being
accessed in ib_get_eth_speed(). Hence, the correct order of holding locks
(rtnl -> usdev_lock) is not violated.

Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agosysfs: Disable lockdep for driver bind/unbind files
Daniel Vetter [Wed, 19 Dec 2018 12:39:09 +0000 (13:39 +0100)]
sysfs: Disable lockdep for driver bind/unbind files

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 4f4b374332ec0ae9c738ff8ec9bed5cd97ff9adc ]

This is the much more correct fix for my earlier attempt at:

https://lkml.org/lkml/2018/12/10/118

Short recap:

- There's not actually a locking issue, it's just lockdep being a bit
  too eager to complain about a possible deadlock.

- Contrary to what I claimed the real problem is recursion on
  kn->count. Greg pointed me at sysfs_break_active_protection(), used
  by the scsi subsystem to allow a sysfs file to unbind itself. That
  would be a real deadlock, which isn't what's happening here. Also,
  breaking the active protection means we'd need to manually handle
  all the lifetime fun.

- With Rafael we discussed the task_work approach, which kinda works,
  but has two downsides: It's a functional change for a lockdep
  annotation issue, and it won't work for the bind file (which needs
  to get the errno from the driver load function back to userspace).

- Greg also asked why this never showed up: To hit this you need to
  unregister a 2nd driver from the unload code of your first driver. I
  guess only gpus do that. The bug has always been there, but only
  with a recent patch series did we add more locks so that lockdep
  built a chain from unbinding the snd-hda driver to the
  acpi_video_unregister call.

Full lockdep splat:

[12301.898799] ============================================
[12301.898805] WARNING: possible recursive locking detected
[12301.898811] 4.20.0-rc7+ #84 Not tainted
[12301.898815] --------------------------------------------
[12301.898821] bash/5297 is trying to acquire lock:
[12301.898826] 00000000f61c6093 (kn->count#39){++++}, at: kernfs_remove_by_name_ns+0x3b/0x80
[12301.898841] but task is already holding lock:
[12301.898847] 000000005f634021 (kn->count#39){++++}, at: kernfs_fop_write+0xdc/0x190
[12301.898856] other info that might help us debug this:
[12301.898862]  Possible unsafe locking scenario:
[12301.898867]        CPU0
[12301.898870]        ----
[12301.898874]   lock(kn->count#39);
[12301.898879]   lock(kn->count#39);
[12301.898883] *** DEADLOCK ***
[12301.898891]  May be due to missing lock nesting notation
[12301.898899] 5 locks held by bash/5297:
[12301.898903]  #0: 00000000cd800e54 (sb_writers#4){.+.+}, at: vfs_write+0x17f/0x1b0
[12301.898915]  #1: 000000000465e7c2 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd3/0x190
[12301.898925]  #2: 000000005f634021 (kn->count#39){++++}, at: kernfs_fop_write+0xdc/0x190
[12301.898936]  #3: 00000000414ef7ac (&dev->mutex){....}, at: device_release_driver_internal+0x34/0x240
[12301.898950]  #4: 000000003218fbdf (register_count_mutex){+.+.}, at: acpi_video_unregister+0xe/0x40
[12301.898960] stack backtrace:
[12301.898968] CPU: 1 PID: 5297 Comm: bash Not tainted 4.20.0-rc7+ #84
[12301.898974] Hardware name: Hewlett-Packard HP EliteBook 8460p/161C, BIOS 68SCF Ver. F.01 03/11/2011
[12301.898982] Call Trace:
[12301.898989]  dump_stack+0x67/0x9b
[12301.898997]  __lock_acquire+0x6ad/0x1410
[12301.899003]  ? kernfs_remove_by_name_ns+0x3b/0x80
[12301.899010]  ? find_held_lock+0x2d/0x90
[12301.899017]  ? mutex_spin_on_owner+0xe4/0x150
[12301.899023]  ? find_held_lock+0x2d/0x90
[12301.899030]  ? lock_acquire+0x90/0x180
[12301.899036]  lock_acquire+0x90/0x180
[12301.899042]  ? kernfs_remove_by_name_ns+0x3b/0x80
[12301.899049]  __kernfs_remove+0x296/0x310
[12301.899055]  ? kernfs_remove_by_name_ns+0x3b/0x80
[12301.899060]  ? kernfs_name_hash+0xd/0x80
[12301.899066]  ? kernfs_find_ns+0x6c/0x100
[12301.899073]  kernfs_remove_by_name_ns+0x3b/0x80
[12301.899080]  bus_remove_driver+0x92/0xa0
[12301.899085]  acpi_video_unregister+0x24/0x40
[12301.899127]  i915_driver_unload+0x42/0x130 [i915]
[12301.899160]  i915_pci_remove+0x19/0x30 [i915]
[12301.899169]  pci_device_remove+0x36/0xb0
[12301.899176]  device_release_driver_internal+0x185/0x240
[12301.899183]  unbind_store+0xaf/0x180
[12301.899189]  kernfs_fop_write+0x104/0x190
[12301.899195]  __vfs_write+0x31/0x180
[12301.899203]  ? rcu_read_lock_sched_held+0x6f/0x80
[12301.899209]  ? rcu_sync_lockdep_assert+0x29/0x50
[12301.899216]  ? __sb_start_write+0x13c/0x1a0
[12301.899221]  ? vfs_write+0x17f/0x1b0
[12301.899227]  vfs_write+0xb9/0x1b0
[12301.899233]  ksys_write+0x50/0xc0
[12301.899239]  do_syscall_64+0x4b/0x180
[12301.899247]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[12301.899253] RIP: 0033:0x7f452ac7f7a4
[12301.899259] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 aa f0 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83
[12301.899273] RSP: 002b:00007ffceafa6918 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[12301.899282] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f452ac7f7a4
[12301.899288] RDX: 000000000000000d RSI: 00005612a1abf7c0 RDI: 0000000000000001
[12301.899295] RBP: 00005612a1abf7c0 R08: 000000000000000a R09: 00005612a1c46730
[12301.899301] R10: 000000000000000a R11: 0000000000000246 R12: 000000000000000d
[12301.899308] R13: 0000000000000001 R14: 00007f452af4a740 R15: 000000000000000d

Looking around I've noticed that usb and i2c already handle similar
recursion problems, where a sysfs file can unbind the same type of
sysfs somewhere else in the hierarchy. Relevant commits are:

commit 356c05d58af05d582e634b54b40050c73609617b
Author: Alan Stern <stern@rowland.harvard.edu>
Date:   Mon May 14 13:30:03 2012 -0400

    sysfs: get rid of some lockdep false positives

commit e9b526fe704812364bca07edd15eadeba163ebfb
Author: Alexander Sverdlin <alexander.sverdlin@nsn.com>
Date:   Fri May 17 14:56:35 2013 +0200

    i2c: suppress lockdep warning on delete_device

Implement the same trick for driver bind/unbind.

v2: Put the macro into bus.c (Greg).

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Arend van Spriel <aspriel@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Bartosz Golaszewski <brgl@bgdev.pl>
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: Vivek Gautam <vivek.gautam@codeaurora.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoALSA: bebob: fix model-id of unit for Apogee Ensemble
Takashi Sakamoto [Wed, 19 Dec 2018 11:00:42 +0000 (20:00 +0900)]
ALSA: bebob: fix model-id of unit for Apogee Ensemble

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 644b2e97405b0b74845e1d3c2b4fe4c34858062b ]

This commit fixes hard-coded model-id for an unit of Apogee Ensemble with
a correct value. This unit uses DM1500 ASIC produced ArchWave AG (formerly
known as BridgeCo AG).

I note that this model supports three modes in the number of data channels
in tx/rx streams; 8 ch pairs, 10 ch pairs, 18 ch pairs. The mode is
switched by Vendor-dependent AV/C command, like:

$ cd linux-firewire-utils
$ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0600000000 (8ch pairs)
$ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0601000000 (10ch pairs)
$ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0602000000 (18ch pairs)

When switching between different mode, the unit disappears from IEEE 1394
bus, then appears on the bus with different combination of stream formats.
In a mode of 18 ch pairs, available sampling rate is up to 96.0 kHz, else
up to 192.0 kHz.

$ ./hinawa-config-rom-printer /dev/fw1
{ 'bus-info': { 'adj': False,
                'bmc': True,
                'chip_ID': 21474898341,
                'cmc': True,
                'cyc_clk_acc': 100,
                'generation': 2,
                'imc': True,
                'isc': True,
                'link_spd': 2,
                'max_ROM': 1,
                'max_rec': 512,
                'name': '1394',
                'node_vendor_ID': 987,
                'pmc': False},
  'root-directory': [ ['HARDWARE_VERSION', 19],
                      [ 'NODE_CAPABILITIES',
                        { 'addressing': {'64': True, 'fix': True, 'prv': False},
                          'misc': {'int': False, 'ms': False, 'spt': True},
                          'state': { 'atn': False,
                                     'ded': False,
                                     'drq': True,
                                     'elo': False,
                                     'init': False,
                                     'lst': True,
                                     'off': False},
                          'testing': {'bas': False, 'ext': False}}],
                      ['VENDOR', 987],
                      ['DESCRIPTOR', 'Apogee Electronics'],
                      ['MODEL', 126702],
                      ['DESCRIPTOR', 'Ensemble'],
                      ['VERSION', 5297],
                      [ 'UNIT',
                        [ ['SPECIFIER_ID', 41005],
                          ['VERSION', 65537],
                          ['MODEL', 126702],
                          ['DESCRIPTOR', 'Ensemble']]],
                      [ 'DEPENDENT_INFO',
                        [ ['SPECIFIER_ID', 2037],
                          ['VERSION', 1],
                          [(58, 'IMMEDIATE'), 16777159],
                          [(59, 'IMMEDIATE'), 1048576],
                          [(60, 'IMMEDIATE'), 16777159],
                          [(61, 'IMMEDIATE'), 6291456]]]]}

Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoclocksource/drivers/integrator-ap: Add missing of_node_put()
Yangtao Li [Sun, 25 Nov 2018 05:00:49 +0000 (00:00 -0500)]
clocksource/drivers/integrator-ap: Add missing of_node_put()

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 5eb73c831171115d3b4347e1e7124a5a35d8086c ]

The function of_find_node_by_path() acquires a reference to the node
returned by it and that reference needs to be dropped by its caller.

integrator_ap_timer_init_of() doesn't do that.  The pri_node and the
sec_node are used as an identifier to compare against the current
node, so we can directly drop the refcount after getting the node from
the path as it is not used as pointer.

By dropping the refcount right after getting it, a single variable is
needed instead of two.

Fix this by use a single variable and drop the refcount right after
of_find_node_by_path().

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoquota: Lock s_umount in exclusive mode for Q_XQUOTA{ON,OFF} quotactls.
Javier Barrio [Thu, 13 Dec 2018 00:06:29 +0000 (01:06 +0100)]
quota: Lock s_umount in exclusive mode for Q_XQUOTA{ON,OFF} quotactls.

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 41c4f85cdac280d356df1f483000ecec4a8868be ]

Commit 1fa5efe3622db58cb8c7b9a50665e9eb9a6c7e97 (ext4: Use generic helpers for quotaon
and quotaoff) made possible to call quotactl(Q_XQUOTAON/OFF) on ext4 filesystems
with sysfile quota support. This leads to calling dquot_enable/disable without s_umount
held in excl. mode, because quotactl_cmd_onoff checks only for Q_QUOTAON/OFF.

The following WARN_ON_ONCE triggers (in this case for dquot_enable, ext4, latest Linus' tree):

[  117.807056] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: quota,prjquota

[...]

[  155.036847] WARNING: CPU: 0 PID: 2343 at fs/quota/dquot.c:2469 dquot_enable+0x34/0xb9
[  155.036851] Modules linked in: quota_v2 quota_tree ipv6 af_packet joydev mousedev psmouse serio_raw pcspkr i2c_piix4 intel_agp intel_gtt e1000 ttm drm_kms_helper drm agpgart fb_sys_fops syscopyarea sysfillrect sysimgblt i2c_core input_leds kvm_intel kvm irqbypass qemu_fw_cfg floppy evdev parport_pc parport button crc32c_generic dm_mod ata_generic pata_acpi ata_piix libata loop ext4 crc16 mbcache jbd2 usb_storage usbcore sd_mod scsi_mod
[  155.036901] CPU: 0 PID: 2343 Comm: qctl Not tainted 4.20.0-rc6-00025-gf5d582777bcb #9
[  155.036903] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  155.036911] RIP: 0010:dquot_enable+0x34/0xb9
[  155.036915] Code: 41 56 41 55 41 54 55 53 4c 8b 6f 28 74 02 0f 0b 4d 8d 7d 70 49 89 fc 89 cb 41 89 d6 89 f5 4c 89 ff e8 23 09 ea ff 85 c0 74 0a <0f> 0b 4c 89 ff e8 8b 09 ea ff 85 db 74 6a 41 8b b5 f8 00 00 00 0f
[  155.036918] RSP: 0018:ffffb09b00493e08 EFLAGS: 00010202
[  155.036922] RAX: 0000000000000001 RBX: 0000000000000008 RCX: 0000000000000008
[  155.036924] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9781b67cd870
[  155.036926] RBP: 0000000000000002 R08: 0000000000000000 R09: 61c8864680b583eb
[  155.036929] R10: ffffb09b00493e48 R11: ffffffffff7ce7d4 R12: ffff9781b7ee8d78
[  155.036932] R13: ffff9781b67cd800 R14: 0000000000000004 R15: ffff9781b67cd870
[  155.036936] FS:  00007fd813250b88(0000) GS:ffff9781ba000000(0000) knlGS:0000000000000000
[  155.036939] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  155.036942] CR2: 00007fd812ff61d6 CR3: 000000007c882000 CR4: 00000000000006b0
[  155.036951] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  155.036953] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  155.036955] Call Trace:
[  155.037004]  dquot_quota_enable+0x8b/0xd0
[  155.037011]  kernel_quotactl+0x628/0x74e
[  155.037027]  ? do_mprotect_pkey+0x2a6/0x2cd
[  155.037034]  __x64_sys_quotactl+0x1a/0x1d
[  155.037041]  do_syscall_64+0x55/0xe4
[  155.037078]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  155.037105] RIP: 0033:0x7fd812fe1198
[  155.037109] Code: 02 77 0d 48 89 c1 48 c1 e9 3f 75 04 48 8b 04 24 48 83 c4 50 5b c3 48 83 ec 08 49 89 ca 48 63 d2 48 63 ff b8 b3 00 00 00 0f 05 <48> 89 c7 e8 c1 eb ff ff 5a c3 48 63 ff b8 bb 00 00 00 0f 05 48 89
[  155.037112] RSP: 002b:00007ffe8cd7b050 EFLAGS: 00000206 ORIG_RAX: 00000000000000b3
[  155.037116] RAX: ffffffffffffffda RBX: 00007ffe8cd7b148 RCX: 00007fd812fe1198
[  155.037119] RDX: 0000000000000000 RSI: 00007ffe8cd7cea9 RDI: 0000000000580102
[  155.037121] RBP: 00007ffe8cd7b0f0 R08: 000055fc8eba8a9d R09: 0000000000000000
[  155.037124] R10: 00007ffe8cd7b074 R11: 0000000000000206 R12: 00007ffe8cd7b168
[  155.037126] R13: 000055fc8eba8897 R14: 0000000000000000 R15: 0000000000000000
[  155.037131] ---[ end trace 210f864257175c51 ]---

and then the syscall proceeds without s_umount locking.

This patch locks the superblock ->s_umount sem. in exclusive mode for all Q_XQUOTAON/OFF
quotactls too in addition to Q_QUOTAON/OFF.

AFAICT, other than ext4, only xfs and ocfs2 are affected by this change.
The VFS will now call in xfs_quota_* functions with s_umount held, which wasn't the case
before. This looks good to me but I can not say for sure. Ext4 and ocfs2 where already
beeing called with s_umount exclusive via quota_quotaon/off which is basically the same.

Signed-off-by: Javier Barrio <javier.barrio.mart@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agodm snapshot: Fix excessive memory usage and workqueue stalls
Nikos Tsironis [Wed, 31 Oct 2018 21:53:08 +0000 (17:53 -0400)]
dm snapshot: Fix excessive memory usage and workqueue stalls

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 721b1d98fb517ae99ab3b757021cf81db41e67be ]

kcopyd has no upper limit to the number of jobs one can allocate and
issue. Under certain workloads this can lead to excessive memory usage
and workqueue stalls. For example, when creating multiple dm-snapshot
targets with a 4K chunk size and then writing to the origin through the
page cache. Syncing the page cache causes a large number of BIOs to be
issued to the dm-snapshot origin target, which itself issues an even
larger (because of the BIO splitting taking place) number of kcopyd
jobs.

Running the following test, from the device mapper test suite [1],

  dmtest run --suite snapshot -n many_snapshots_of_same_volume_N

, with 8 active snapshots, results in the kcopyd job slab cache growing
to 10G. Depending on the available system RAM this can lead to the OOM
killer killing user processes:

[463.492878] kthreadd invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP),
              nodemask=(null), order=1, oom_score_adj=0
[463.492894] kthreadd cpuset=/ mems_allowed=0
[463.492948] CPU: 7 PID: 2 Comm: kthreadd Not tainted 4.19.0-rc7 #3
[463.492950] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[463.492952] Call Trace:
[463.492964]  dump_stack+0x7d/0xbb
[463.492973]  dump_header+0x6b/0x2fc
[463.492987]  ? lockdep_hardirqs_on+0xee/0x190
[463.493012]  oom_kill_process+0x302/0x370
[463.493021]  out_of_memory+0x113/0x560
[463.493030]  __alloc_pages_slowpath+0xf40/0x1020
[463.493055]  __alloc_pages_nodemask+0x348/0x3c0
[463.493067]  cache_grow_begin+0x81/0x8b0
[463.493072]  ? cache_grow_begin+0x874/0x8b0
[463.493078]  fallback_alloc+0x1e4/0x280
[463.493092]  kmem_cache_alloc_node+0xd6/0x370
[463.493098]  ? copy_process.part.31+0x1c5/0x20d0
[463.493105]  copy_process.part.31+0x1c5/0x20d0
[463.493115]  ? __lock_acquire+0x3cc/0x1550
[463.493121]  ? __switch_to_asm+0x34/0x70
[463.493129]  ? kthread_create_worker_on_cpu+0x70/0x70
[463.493135]  ? finish_task_switch+0x90/0x280
[463.493165]  _do_fork+0xe0/0x6d0
[463.493191]  ? kthreadd+0x19f/0x220
[463.493233]  kernel_thread+0x25/0x30
[463.493235]  kthreadd+0x1bf/0x220
[463.493242]  ? kthread_create_on_cpu+0x90/0x90
[463.493248]  ret_from_fork+0x3a/0x50
[463.493279] Mem-Info:
[463.493285] active_anon:20631 inactive_anon:4831 isolated_anon:0
[463.493285]  active_file:80216 inactive_file:80107 isolated_file:435
[463.493285]  unevictable:0 dirty:51266 writeback:109372 unstable:0
[463.493285]  slab_reclaimable:31191 slab_unreclaimable:3483521
[463.493285]  mapped:526 shmem:4903 pagetables:1759 bounce:0
[463.493285]  free:33623 free_pcp:2392 free_cma:0
...
[463.493489] Unreclaimable slab info:
[463.493513] Name                      Used          Total
[463.493522] bio-6                   1028KB       1028KB
[463.493525] bio-5                   1028KB       1028KB
[463.493528] dm_snap_pending_exception     236783KB     243789KB
[463.493531] dm_exception              41KB         42KB
[463.493534] bio-4                   1216KB       1216KB
[463.493537] bio-3                 439396KB     439396KB
[463.493539] kcopyd_job           6973427KB    6973427KB
...
[463.494340] Out of memory: Kill process 1298 (ruby2.3) score 1 or sacrifice child
[463.494673] Killed process 1298 (ruby2.3) total-vm:435740kB, anon-rss:20180kB, file-rss:4kB, shmem-rss:0kB
[463.506437] oom_reaper: reaped process 1298 (ruby2.3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Moreover, issuing a large number of kcopyd jobs results in kcopyd
hogging the CPU, while processing them. As a result, processing of work
items, queued for execution on the same CPU as the currently running
kcopyd thread, is stalled for long periods of time, hurting performance.
Running the aforementioned test we get, in dmesg, messages like the
following:

[67501.194592] BUG: workqueue lockup - pool cpus=4 node=0 flags=0x0 nice=0 stuck for 27s!
[67501.195586] Showing busy workqueues and worker pools:
[67501.195591] workqueue events: flags=0x0
[67501.195597]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
[67501.195611]     pending: cache_reap
[67501.195641] workqueue mm_percpu_wq: flags=0x8
[67501.195645]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
[67501.195656]     pending: vmstat_update
[67501.195682] workqueue kblockd: flags=0x18
[67501.195687]   pwq 5: cpus=2 node=0 flags=0x0 nice=-20 active=1/256
[67501.195698]     pending: blk_timeout_work
[67501.195753] workqueue kcopyd: flags=0x8
[67501.195757]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
[67501.195768]     pending: do_work [dm_mod]
[67501.195802] workqueue kcopyd: flags=0x8
[67501.195806]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
[67501.195817]     pending: do_work [dm_mod]
[67501.195834] workqueue kcopyd: flags=0x8
[67501.195838]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
[67501.195848]     pending: do_work [dm_mod]
[67501.195881] workqueue kcopyd: flags=0x8
[67501.195885]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
[67501.195896]     pending: do_work [dm_mod]
[67501.195920] workqueue kcopyd: flags=0x8
[67501.195924]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=2/256
[67501.195935]     in-flight: 67:do_work [dm_mod]
[67501.195945]     pending: do_work [dm_mod]
[67501.195961] pool 8: cpus=4 node=0 flags=0x0 nice=0 hung=27s workers=3 idle: 129 23765

The root cause for these issues is the way dm-snapshot uses kcopyd. In
particular, the lack of an explicit or implicit limit to the maximum
number of in-flight COW jobs. The merging path is not affected because
it implicitly limits the in-flight kcopyd jobs to one.

Fix these issues by using a semaphore to limit the maximum number of
in-flight kcopyd jobs. We grab the semaphore before allocating a new
kcopyd job in start_copy() and start_full_bio() and release it after the
job finishes in copy_callback().

The initial semaphore value is configurable through a module parameter,
to allow fine tuning the maximum number of in-flight COW jobs. Setting
this parameter to zero initializes the semaphore to INT_MAX.

A default value of 2048 maximum in-flight kcopyd jobs was chosen. This
value was decided experimentally as a trade-off between memory
consumption, stalling the kernel's workqueues and maintaining a high
enough throughput.

Re-running the aforementioned test:

  * Workqueue stalls are eliminated
  * kcopyd's job slab cache uses a maximum of 130MB
  * The time taken by the test to write to the snapshot-origin target is
    reduced from 05m20.48s to 03m26.38s

[1] https://github.com/jthornber/device-mapper-test-suite

Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agotools lib subcmd: Don't add the kernel sources to the include path
Arnaldo Carvalho de Melo [Tue, 11 Dec 2018 18:00:52 +0000 (15:00 -0300)]
tools lib subcmd: Don't add the kernel sources to the include path

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit ece9804985b57e1ccd83b1fb6288520955a29d51 ]

At some point we decided not to directly include kernel sources files
when building tools/perf/, but when tools/lib/subcmd/ was forked from
tools/perf it somehow ended up adding it via these two lines in its
Makefile:

  CFLAGS += -I$(srctree)/include/uapi
  CFLAGS += -I$(srctree)/include

As $(srctree) points to the kernel sources.

Removing those lines and keeping just:

  CFLAGS += -I$(srctree)/tools/include/

Is enough to build tools/perf and tools/objtool.

This fixes the build when building from the sources in environments such
as the Android NDK crossbuilding from a fedora:26 system:

  subcmd-util.h:11:15: error: expected ',' or ';' before 'void'
   static inline void report(const char *prefix, const char *err, va_list params)
                 ^
  In file included from /git/perf/include/uapi/linux/stddef.h:2:0,
                   from /git/perf/include/uapi/linux/posix_types.h:5,
                   from /opt/android-ndk-r12b/platforms/android-24/arch-arm/usr/include/sys/types.h:36,
                   from /opt/android-ndk-r12b/platforms/android-24/arch-arm/usr/include/unistd.h:33,
                   from run-command.c:2:
  subcmd-util.h:18:17: error: '__no_instrument_function__' attribute applies only to functions

The /opt/android-ndk-r12b/platforms/android-24/arch-arm/usr/include/sys/types.h
file that includes linux/posix_types.h ends up getting the one in the kernel
sources causing the breakage. Fix it.

Test built tools/objtool/ too.

Reported-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: 4b6ab94eabe4 ("perf subcmd: Create subcmd library")
Link: https://lkml.kernel.org/n/tip-5lhaoecrj12t0bqwvpiu14sm@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agodm kcopyd: Fix bug causing workqueue stalls
Nikos Tsironis [Wed, 31 Oct 2018 21:53:09 +0000 (17:53 -0400)]
dm kcopyd: Fix bug causing workqueue stalls

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit d7e6b8dfc7bcb3f4f3a18313581f67486a725b52 ]

When using kcopyd to run callbacks through dm_kcopyd_do_callback() or
submitting copy jobs with a source size of 0, the jobs are pushed
directly to the complete_jobs list, which could be under processing by
the kcopyd thread. As a result, the kcopyd thread can continue running
completed jobs indefinitely, without releasing the CPU, as long as
someone keeps submitting new completed jobs through the aforementioned
paths. Processing of work items, queued for execution on the same CPU as
the currently running kcopyd thread, is thus stalled for excessive
amounts of time, hurting performance.

Running the following test, from the device mapper test suite [1],

  dmtest run --suite snapshot -n parallel_io_to_many_snaps_N

, with 8 active snapshots, we get, in dmesg, messages like the
following:

[68899.948523] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 95s!
[68899.949282] Showing busy workqueues and worker pools:
[68899.949288] workqueue events: flags=0x0
[68899.949295]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
[68899.949306]     pending: vmstat_shepherd, cache_reap
[68899.949331] workqueue mm_percpu_wq: flags=0x8
[68899.949337]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[68899.949345]     pending: vmstat_update
[68899.949387] workqueue dm_bufio_cache: flags=0x8
[68899.949392]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[68899.949400]     pending: work_fn [dm_bufio]
[68899.949423] workqueue kcopyd: flags=0x8
[68899.949429]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[68899.949437]     pending: do_work [dm_mod]
[68899.949452] workqueue kcopyd: flags=0x8
[68899.949458]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
[68899.949466]     in-flight: 13:do_work [dm_mod]
[68899.949474]     pending: do_work [dm_mod]
[68899.949487] workqueue kcopyd: flags=0x8
[68899.949493]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[68899.949501]     pending: do_work [dm_mod]
[68899.949515] workqueue kcopyd: flags=0x8
[68899.949521]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[68899.949529]     pending: do_work [dm_mod]
[68899.949541] workqueue kcopyd: flags=0x8
[68899.949547]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[68899.949555]     pending: do_work [dm_mod]
[68899.949568] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=95s workers=4 idle: 27130 27223 1084

Fix this by splitting the complete_jobs list into two parts: A user
facing part, named callback_jobs, and one used internally by kcopyd,
retaining the name complete_jobs. dm_kcopyd_do_callback() and
dispatch_job() now push their jobs to the callback_jobs list, which is
spliced to the complete_jobs list once, every time the kcopyd thread
wakes up. This prevents kcopyd from hogging the CPU indefinitely and
causing workqueue stalls.

Re-running the aforementioned test:

  * Workqueue stalls are eliminated
  * The maximum writing time among all targets is reduced from 09m37.10s
    to 06m04.85s and the total run time of the test is reduced from
    10m43.591s to 7m19.199s

[1] https://github.com/jthornber/device-mapper-test-suite

Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agodm crypt: use u64 instead of sector_t to store iv_offset
AliOS system security [Mon, 5 Nov 2018 07:31:42 +0000 (15:31 +0800)]
dm crypt: use u64 instead of sector_t to store iv_offset

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 8d683dcd65c037efc9fb38c696ec9b65b306e573 ]

The iv_offset in the mapping table of crypt target is a 64bit number when
IV algorithm is plain64, plain64be, essiv or benbi. It will be assigned to
iv_offset of struct crypt_config, cc_sector of struct convert_context and
iv_sector of struct dm_crypt_request. These structures members are defined
as a sector_t. But sector_t is 32bit when CONFIG_LBDAF is not set in 32bit
kernel. In this situation sector_t is not big enough to store the 64bit
iv_offset.

Here is a reproducer.
Prepare test image and device (loop is automatically allocated by cryptsetup):

  # dd if=/dev/zero of=tst.img bs=1M count=1
  # echo "tst"|cryptsetup open --type plain -c aes-xts-plain64 \
  --skip 500000000000000000 tst.img test

On 32bit system (use IV offset value that overflows to 64bit; CONFIG_LBDAF if off)
and device checksum is wrong:

  # dmsetup table test --showkeys
  0 2048 crypt aes-xts-plain64 dfa7cfe3c481f2239155739c42e539ae8f2d38f304dcc89d20b26f69daaf0933 3551657984 7:0 0

  # sha256sum /dev/mapper/test
  533e25c09176632b3794f35303488c4a8f3f965dffffa6ec2df347c168cb6c19 /dev/mapper/test

On 64bit system (and on 32bit system with the patch), table and checksum is now correct:

  # dmsetup table test --showkeys
  0 2048 crypt aes-xts-plain64 dfa7cfe3c481f2239155739c42e539ae8f2d38f304dcc89d20b26f69daaf0933 500000000000000000 7:0 0

  # sha256sum /dev/mapper/test
  5d16160f9d5f8c33d8051e65fdb4f003cc31cd652b5abb08f03aa6fce0df75fc /dev/mapper/test

Signed-off-by: AliOS system security <alios_sys_security@linux.alibaba.com>
Tested-and-Reviewed-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agonetfilter: ipt_CLUSTERIP: check MAC address when duplicate config is set
Taehee Yoo [Mon, 5 Nov 2018 09:23:25 +0000 (18:23 +0900)]
netfilter: ipt_CLUSTERIP: check MAC address when duplicate config is set

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 06aa151ad1fc74a49b45336672515774a678d78d ]

If same destination IP address config is already existing, that config is
just used. MAC address also should be same.
However, there is no MAC address checking routine.
So that MAC address checking routine is added.

test commands:
   %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
   -j CLUSTERIP --new --hashmode sourceip \
   --clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
   %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
   -j CLUSTERIP --new --hashmode sourceip \
   --clustermac 01:00:5e:00:00:21 --total-nodes 2 --local-node 1

After this patch, above commands are disallowed.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoperf parse-events: Fix unchecked usage of strncpy()
Arnaldo Carvalho de Melo [Thu, 6 Dec 2018 16:52:13 +0000 (13:52 -0300)]
perf parse-events: Fix unchecked usage of strncpy()

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit bd8d57fb7e25e9fcf67a9eef5fa13aabe2016e07 ]

The strncpy() function may leave the destination string buffer
unterminated, better use strlcpy() that we have a __weak fallback
implementation for systems without it.

This fixes this warning on an Alpine Linux Edge system with gcc 8.2:

  util/parse-events.c: In function 'print_symbol_events':
  util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation]
      strncpy(name, syms->symbol, MAX_NAME_LEN);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  In function 'print_symbol_events.constprop',
      inlined from 'print_events' at util/parse-events.c:2508:2:
  util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation]
      strncpy(name, syms->symbol, MAX_NAME_LEN);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  In function 'print_symbol_events.constprop',
      inlined from 'print_events' at util/parse-events.c:2511:2:
  util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation]
      strncpy(name, syms->symbol, MAX_NAME_LEN);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: 947b4ad1d198 ("perf list: Fix max event string size")
Link: https://lkml.kernel.org/n/tip-b663e33bm6x8hrkie4uxh7u2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoperf svghelper: Fix unchecked usage of strncpy()
Arnaldo Carvalho de Melo [Thu, 6 Dec 2018 14:29:48 +0000 (11:29 -0300)]
perf svghelper: Fix unchecked usage of strncpy()

BugLink: https://bugs.launchpad.net/bugs/1837477
[ Upstream commit 2f5302533f306d5ee87bd375aef9ca35b91762cb ]

The strncpy() function may leave the destination string buffer
unterminated, better use strlcpy() that we have a __weak fallback
implementation for systems without it.

In this specific case this would only happen if fgets() was buggy, as
its man page states that it should read one less byte than the size of
the destination buffer, so that it can put the nul byte at the end of
it, so it would never copy 255 non-nul chars, as fgets reads into the
orig buffer at most 254 non-nul chars and terminates it. But lets just
switch to strlcpy to keep the original intent and silence the gcc 8.2
warning.

This fixes this warning on an Alpine Linux Edge system with gcc 8.2:

  In function 'cpu_model',
      inlined from 'svg_cpu_box' at util/svghelper.c:378:2:
  util/svghelper.c:337:5: error: 'strncpy' output may be truncated copying 255 bytes from a string of length 255 [-Werror=stringop-truncation]
       strncpy(cpu_m, &buf[13], 255);
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Fixes: f48d55ce7871 ("perf: Add a SVG helper library file")
Link: https://lkml.kernel.org/n/tip-xzkoo0gyr56gej39ltivuh9g@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>