git.proxmox.com Git - mirror_ubuntu-disco-kernel.git/log

drm/amd/powerplay: correct power reading on fiji

BugLink: https://bugs.launchpad.net/bugs/1821607
commit f5742ec36422a39b57f0256e4847f61b3c432f8c upstream.

Set sampling period as 500ms to provide a smooth power
reading output. Also, correct the register for power
reading.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

drm/radeon/evergreen_cs: fix missing break in switch statement

BugLink: https://bugs.launchpad.net/bugs/1821607
commit cc5034a5d293dd620484d1d836aa16c6764a1c8c upstream.

Add missing break statement in order to prevent the code from falling
through to case CB_TARGET_MASK.

This bug was found thanks to the ongoing efforts to enable
-Wimplicit-fallthrough.

Fixes: dd220a00e8bd ("drm/radeon/kms: add support for streamout v7")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

drm/fb-helper: generic: Fix drm_fbdev_client_restore()

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 78de14c23e031420aa5f61973583635eccd6cd2a upstream.

If fbdev setup has failed, lastclose will give a NULL pointer deref:

[   77.794295] [drm:drm_lastclose]
[   77.794414] [drm:drm_lastclose] driver lastclose completed
[   77.794660] Unable to handle kernel NULL pointer dereference at virtual address 00000014
[   77.809460] pgd = b376b71b
[   77.818275] [00000014] *pgd=175ba831, *pte=00000000, *ppte=00000000
[   77.830813] Internal error: Oops: 17 [#1] ARM
[   77.840963] Modules linked in: mi0283qt mipi_dbi tinydrm raspberrypi_hwmon gpio_backlight backlight snd_bcm2835(C) bcm2835_rng rng_core
[   77.865203] CPU: 0 PID: 527 Comm: lt-modetest Tainted: G         C        5.0.0-rc1+ #1
[   77.879525] Hardware name: BCM2835
[   77.889185] PC is at restore_fbdev_mode+0x20/0x164
[   77.900261] LR is at drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0x9c
[   78.002446] Process lt-modetest (pid: 527, stack limit = 0x7a3d5c14)
[   78.291030] Backtrace:
[   78.300815] [<c04f2d0c>] (restore_fbdev_mode) from [<c04f4708>] (drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0x9c)
[   78.319095]  r9:d8a8a288 r8:d891acf0 r7:d7697910 r6:00000000 r5:d891ac00 r4:d891ac00
[   78.334432] [<c04f46b4>] (drm_fb_helper_restore_fbdev_mode_unlocked) from [<c04f47e8>] (drm_fbdev_client_restore+0x18/0x20)
[   78.353296]  r8:d76978c0 r7:d7697910 r6:d7697950 r5:d7697800 r4:d891ac00 r3:c04f47d0
[   78.368689] [<c04f47d0>] (drm_fbdev_client_restore) from [<c051b6b4>] (drm_client_dev_restore+0x7c/0xc0)
[   78.385982] [<c051b638>] (drm_client_dev_restore) from [<c04f8fd0>] (drm_lastclose+0xc4/0xd4)
[   78.402332]  r8:d76978c0 r7:d7471080 r6:c0e0c088 r5:d8a85e00 r4:d7697800
[   78.416688] [<c04f8f0c>] (drm_lastclose) from [<c04f9088>] (drm_release+0xa8/0x10c)
[   78.431929]  r5:d8a85e00 r4:d7697800
[   78.442989] [<c04f8fe0>] (drm_release) from [<c02640c4>] (__fput+0x104/0x1c8)
[   78.457740]  r8:d5ccea10 r7:d96cfb10 r6:00000008 r5:d74c1b90 r4:d8a8a280
[   78.472043] [<c0263fc0>] (__fput) from [<c02641ec>] (____fput+0x18/0x1c)
[   78.486363]  r10:00000006 r9:d7722000 r8:c01011c4 r7:00000000 r6:c0ebac6c r5:d892a340
[   78.501869]  r4:d8a8a280
[   78.512002] [<c02641d4>] (____fput) from [<c013ef1c>] (task_work_run+0x98/0xac)
[   78.527186] [<c013ee84>] (task_work_run) from [<c010cc54>] (do_work_pending+0x4f8/0x570)
[   78.543238]  r7:d7722030 r6:00000004 r5:d7723fb0 r4:00000000
[   78.556825] [<c010c75c>] (do_work_pending) from [<c0101034>] (slow_work_pending+0xc/0x20)
[   78.674256] ---[ end trace 70d3a60cf739be3b ]---

Fix by using drm_fb_helper_lastclose() which checks if fbdev is in use.

Fixes: 9060d7f49376 ("drm/fb-helper: Finish the generic fbdev emulation")
Cc: stable@vger.kernel.org
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190125150300.33268-1-noralf@tronnes.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

media: imx: csi: Stop upstream before disabling IDMA channel

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 4bc1ab41eee9d02ad2483bf8f51a7b72e3504eba upstream.

Move upstream stream off to just after receiving the last EOF completion
and disabling the CSI (and thus before disabling the IDMA channel) in
csi_stop(). For symmetry also move upstream stream on to beginning of
csi_start().

Doing this makes csi_s_stream() more symmetric with prp_s_stream() which
will require the same change to fix a hard lockup.

Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com>
Cc: stable@vger.kernel.org # for 4.13 and up
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

media: imx: csi: Disable CSI immediately after last EOF

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 2e0fe66e0a136252f4d89dbbccdcb26deb867eb8 upstream.

Disable the CSI immediately after receiving the last EOF before stream
off (and thus before disabling the IDMA channel). Do this by moving the
wait for EOF completion into a new function csi_idmac_wait_last_eof().

This fixes a complete system hard lockup on the SabreAuto when streaming
from the ADV7180, by repeatedly sending a stream off immediately followed
by stream on:

while true; do v4l2-ctl -d4 --stream-mmap --stream-count=3; done

Eventually this either causes the system lockup or EOF timeouts at all
subsequent stream on, until a system reset.

The lockup occurs when disabling the IDMA channel at stream off. Disabling
the CSI before disabling the IDMA channel appears to be a reliable fix for
the hard lockup.

Fixes: 4a34ec8e470cb ("[media] media: imx: Add CSI subdev driver")
Reported-by: Gaël PORTAY <gael.portay@collabora.com>
Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com>
Cc: stable@vger.kernel.org # for 4.13 and up
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

media: imx-csi: Input connections to CSI should be optional

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 337e90ed028643c7acdfd0d31e3224d05ca03d66 upstream.

Some imx platforms do not have fwnode connections to all CSI input
ports, and should not be treated as an error. This includes the
imx6q SabreAuto, which has no connections to ipu1_csi1 and ipu2_csi0.
Return -ENOTCONN in imx_csi_parse_endpoint() so that v4l2-fwnode
endpoint parsing will not treat an unconnected CSI input port as
an error.

Fixes: c893500a16baf ("media: imx: csi: Register a subdev notifier")
Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Acked-by: Tim Harvey <tharvey@gateworks.com>
Cc: stable@vger.kernel.org
Tested-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

media: vimc: Add vimc-streamer for stream control

BugLink: https://bugs.launchpad.net/bugs/1821607
commit adc589d2a20808fb99d46a78175cd023f2040338 upstream.

Add a linear pipeline logic for the stream control. It's created by
walking backwards on the entity graph. When the stream starts it will
simply loop through the pipeline calling the respective process_frame
function of each entity.

Fixes: f2fe89061d797 ("vimc: Virtual Media Controller core, capture
and sensor")

Cc: stable@vger.kernel.org # for v4.20
Signed-off-by: Lucas A. M. Magalhães <lucmaga@gmail.com>
Acked-by: Helen Koike <helen.koike@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
[hverkuil-cisco@xs4all.nl: fixed small space-after-tab issue in the patch]
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

media: uvcvideo: Avoid NULL pointer dereference at the end of streaming

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 9dd0627d8d62a7ddb001a75f63942d92b5336561 upstream.

The UVC video driver converts the timestamp from hardware specific unit
to one known by the kernel at the time when the buffer is dequeued. This
is fine in general, but the streamoff operation consists of the
following steps (among other things):

1. uvc_video_clock_cleanup --- the hardware clock sample array is
   released and the pointer to the array is set to NULL,

2. buffers in active state are returned to the user and

3. buf_finish callback is called on buffers that are prepared.
   buf_finish includes calling uvc_video_clock_update that accesses the
   hardware clock sample array.

The above is serialised by a queue specific mutex. Address the problem
by skipping the clock conversion if the hardware clock sample array is
already released.

Fixes: 9c0863b1cc48 ("[media] vb2: call buf_finish from __queue_cancel")
Reported-by: Chiranjeevi Rapolu <chiranjeevi.rapolu@intel.com>
Tested-by: Chiranjeevi Rapolu <chiranjeevi.rapolu@intel.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

media: sun6i: Fix CSI regmap's max_register

BugLink: https://bugs.launchpad.net/bugs/1821607
commit d31b282e2c0de9c7fb113516820340251f03a625 upstream.

max_register is currently set to 0x1000. This is beyond the mapped
address range of the hardware, so attempts to dump the regmap from
debugfs would trigger a kernel exception.

Furthermore, the useful registers only occupy a small section at the
beginning of the full range. Change the value to 0x9c, the last known
register on the V3s and H3.

On the A31, the register range is extended to support additional
capture channels. Since this is not yet supported, ignore it for now.

Fixes: 5cc7522d8965 ("media: sun6i: Add support for Allwinner CSI V3s")
Cc: <stable@vger.kernel.org>
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

media: lgdt330x: fix lock status reporting

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 1b4fd9de6ec7f3722c2b3e08cc5ad171c11f93be upstream.

A typo in code cleanup commit db9c1007bc07 ("media: lgdt330x: do
some cleanups at status logic") broke the FE_HAS_LOCK reporting
for 3303 chips by inadvertently modifying the register mask.

The broken lock status is critial as it prevents video capture
cards from reporting signal strength, scanning for channels,
and capturing video.

Fix regression by reverting mask change.

Cc: stable@vger.kernel.org # Kernel 4.17+
Fixes: db9c1007bc07 ("media: lgdt330x: do some cleanups at status logic")
Signed-off-by: Nick French <naf@ou.edu>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Tested-by: Adam Stylinski <kungfujesus06@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

media: imx: prpencvf: Stop upstream before disabling IDMA channel

BugLink: https://bugs.launchpad.net/bugs/1821607
commit a19c22677377b87e4354f7306f46ad99bc982a9f upstream.

Upstream must be stopped immediately after receiving the last EOF and
before disabling the IDMA channel. This can be accomplished by moving
upstream stream off to just after receiving the last EOF completion in
prp_stop(). For symmetry also move upstream stream on to end of
prp_start().

This fixes a complete system hard lockup on the SabreAuto when streaming
from the ADV7180, by repeatedly sending a stream off immediately followed
by stream on:

while true; do v4l2-ctl -d1 --stream-mmap --stream-count=3; done

Eventually this either causes the system lockup or EOF timeouts at all
subsequent stream on, until a system reset.

The lockup occurs when disabling the IDMA channel at stream off. Stopping
the video data stream entering the IDMA channel before disabling the
channel itself appears to be a reliable fix for the hard lockup.

Fixes: f0d9c8924e2c3 ("[media] media: imx: Add IC subdev drivers")
Reported-by: Gaël PORTAY <gael.portay@collabora.com>
Tested-by: Gaël PORTAY <gael.portay@collabora.com>
Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com>
Cc: stable@vger.kernel.org # for 4.13 and up
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

rcu: Do RCU GP kthread self-wakeup from softirq and interrupt

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 1d1f898df6586c5ea9aeaf349f13089c6fa37903 upstream.

The rcu_gp_kthread_wake() function is invoked when it might be necessary
to wake the RCU grace-period kthread.  Because self-wakeups are normally
a useless waste of CPU cycles, if rcu_gp_kthread_wake() is invoked from
this kthread, it naturally refuses to do the wakeup.

Unfortunately, natural though it might be, this heuristic fails when
rcu_gp_kthread_wake() is invoked from an interrupt or softirq handler
that interrupted the grace-period kthread just after the final check of
the wait-event condition but just before the schedule() call.  In this
case, a wakeup is required, even though the call to rcu_gp_kthread_wake()
is within the RCU grace-period kthread's context.  Failing to provide
this wakeup can result in grace periods failing to start, which in turn
results in out-of-memory conditions.

This race window is quite narrow, but it actually did happen during real
testing.  It would of course need to be fixed even if it was strictly
theoretical in nature.

This patch does not Cc stable because it does not apply cleanly to
earlier kernel versions.

Fixes: 48a7639ce80c ("rcu: Make callers awaken grace-period kthread")
Reported-by: "He, Bo" <bo.he@intel.com>
Co-developed-by: "Zhang, Jun" <jun.zhang@intel.com>
Co-developed-by: "He, Bo" <bo.he@intel.com>
Co-developed-by: "xiao, jin" <jin.xiao@intel.com>
Co-developed-by: Bai, Jie A <jie.a.bai@intel.com>
Signed-off: "Zhang, Jun" <jun.zhang@intel.com>
Signed-off: "He, Bo" <bo.he@intel.com>
Signed-off: "xiao, jin" <jin.xiao@intel.com>
Signed-off: Bai, Jie A <jie.a.bai@intel.com>
Signed-off-by: "Zhang, Jun" <jun.zhang@intel.com>
[ paulmck: Switch from !in_softirq() to "!in_interrupt() &&
  !in_serving_softirq() to avoid redundant wakeups and to also handle the
  interrupt-handler scenario as well as the softirq-handler scenario that
  actually occurred in testing. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Link: https://lkml.kernel.org/r/CD6925E8781EFD4D8E11882D20FC406D52A11F61@SHSMSX104.ccr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

tpm: Unify the send callback behaviour

BugLink: https://bugs.launchpad.net/bugs/1821607
commit f5595f5baa30e009bf54d0d7653a9a0cc465be60 upstream.

The send() callback should never return length as it does not in every
driver except tpm_crb in the success case. The reason is that the main
transmit functionality only cares about whether the transmit was
successful or not and ignores the count completely.

Suggested-by: Stefan Berger <stefanb@linux.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Tested-by: Alexander Steffen <Alexander.Steffen@infineon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

tpm/tpm_crb: Avoid unaligned reads in crb_recv()

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 3d7a850fdc1a2e4d2adbc95cc0fc962974725e88 upstream.

The current approach to read first 6 bytes from the response and then tail
of the response, can cause the 2nd memcpy_fromio() to do an unaligned read
(e.g. read 32-bit word from address aligned to a 16-bits), depending on how
memcpy_fromio() is implemented. If this happens, the read will fail and the
memory controller will fill the read with 1's.

This was triggered by 170d13ca3a2f, which should be probably refined to
check and react to the address alignment. Before that commit, on x86
memcpy_fromio() turned out to be memcpy(). By a luck GCC has done the right
thing (from tpm_crb's perspective) for us so far, but we should not rely on
that. Thus, it makes sense to fix this also in tpm_crb, not least because
the fix can be then backported to stable kernels and make them more robust
when compiled in differing environments.

Cc: stable@vger.kernel.org
Cc: James Morris <jmorris@namei.org>
Cc: Tomas Winkler <tomas.winkler@intel.com>
Cc: Jerry Snitselaar <jsnitsel@redhat.com>
Fixes: 30fc8d138e91 ("tpm: TPM 2.0 CRB Interface")
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Acked-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

x86/ftrace: Fix warning and considate ftrace_jmp_replace() and ftrace_call_replace()

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 745cfeaac09ce359130a5451d90cb0bd4094c290 upstream.

Arnd reported the following compiler warning:

arch/x86/kernel/ftrace.c:669:23: error: 'ftrace_jmp_replace' defined but not used [-Werror=unused-function]

The ftrace_jmp_replace() function now only has a single user and should be
simply moved by that user. But looking at the code, it shows that
ftrace_jmp_replace() is similar to ftrace_call_replace() except that instead
of using the opcode of 0xe8 it uses 0xe9. It makes more sense to consolidate
that function into one implementation that both ftrace_jmp_replace() and
ftrace_call_replace() use by passing in the op code separate.

The structure in ftrace_code_union is also modified to replace the "e8"
field with the more appropriate name "op".

Cc: stable@vger.kernel.org
Reported-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/20190304200748.1418790-1-arnd@arndb.de
Fixes: d2a68c4effd8 ("x86/ftrace: Do not call function graph from dynamic trampolines")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

x86/kvmclock: set offset for kvm unstable clock

BugLink: https://bugs.launchpad.net/bugs/1821607
commit b5179ec4187251a751832193693d6e474d3445ac upstream.

VMs may show incorrect uptime and dmesg printk offsets on hypervisors with
unstable clock. The problem is produced when VM is rebooted without exiting
from qemu.

The fix is to calculate clock offset not only for stable clock but for
unstable clock as well, and use kvm_sched_clock_read() which substracts
the offset for both clocks.

This is safe, because pvclock_clocksource_read() does the right thing and
makes sure that clock always goes forward, so once offset is calculated
with unstable clock, we won't get new reads that are smaller than offset,
and thus won't get negative results.

Thank you Jon DeVree for helping to reproduce this issue.

Fixes: 857baa87b642 ("sched/clock: Enable sched clock early")
Cc: stable@vger.kernel.org
Reported-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

md: Fix failed allocation of md_register_thread

BugLink: https://bugs.launchpad.net/bugs/1821607
commit e406f12dde1a8375d77ea02d91f313fb1a9c6aec upstream.

mddev->sync_thread can be set to NULL on kzalloc failure downstream.
The patch checks for such a scenario and frees allocated resources.

Committer node:

Added similar fix to raid5.c, as suggested by Guoqing.

Cc: stable@vger.kernel.org # v3.16+
Acked-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

perf intel-pt: Fix divide by zero when TSC is not available

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 076333870c2f5bdd9b6d31e7ca1909cf0c84cbfa upstream.

When TSC is not available, "timeless" decoding is used but a divide by
zero occurs if perf_time_to_tsc() is called.

Ensure the divisor is not zero.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org # v4.9+
Link: https://lkml.kernel.org/n/tip-1i4j0wqoc8vlbkcizqqxpsf4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

perf/x86/intel/uncore: Fix client IMC events return huge result

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 8041ffd36f42d8521d66dd1e236feb58cecd68bc upstream.

The client IMC bandwidth events currently return very large values:

  $ perf stat -e uncore_imc/data_reads/ -e uncore_imc/data_writes/ -I 10000 -a

  10.000117222 34,788.76 MiB uncore_imc/data_reads/
  10.000117222 8.26 MiB uncore_imc/data_writes/
  20.000374584 34,842.89 MiB uncore_imc/data_reads/
  20.000374584 10.45 MiB uncore_imc/data_writes/
  30.000633299 37,965.29 MiB uncore_imc/data_reads/
  30.000633299 323.62 MiB uncore_imc/data_writes/
  40.000891548 41,012.88 MiB uncore_imc/data_reads/
  40.000891548 6.98 MiB uncore_imc/data_writes/
  50.001142480 1,125,899,906,621,494.75 MiB uncore_imc/data_reads/
  50.001142480 6.97 MiB uncore_imc/data_writes/

The client IMC events are freerunning counters. They still use the
old event encoding format (0x1 for data_read and 0x2 for data write).
The counter bit width is calculated by common code, which assume that
the standard encoding format is used for the freerunning counters.
Error bit width information is calculated.

The patch intends to convert the old client IMC event encoding to the
standard encoding format.

Current common code uses event->attr.config which directly copy from
user space. We should not implicitly modify it for a converted event.
The event->hw.config is used to replace the event->attr.config in
common code.

For client IMC events, the event->attr.config is used to calculate a
converted event with standard encoding format in the custom
event_init(). The converted event is stored in event->hw.config.
For other events of freerunning counters, they already use the standard
encoding format. The same value as event->attr.config is assigned to
event->hw.config in common event_init().

Reported-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@kernel.org # v4.18+
Fixes: 9aae1780e7e8 ("perf/x86/intel/uncore: Clean up client IMC uncore")
Link: https://lkml.kernel.org/r/20190227165729.1861-1-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

perf intel-pt: Fix overlap calculation for padding

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 5a99d99e3310a565b0cf63f785b347be9ee0da45 upstream.

Auxtrace records might have up to 7 bytes of padding appended. Adjust
the overlap accordingly.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20190206103947.15750-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

perf auxtrace: Define auxtrace record alignment

BugLink: https://bugs.launchpad.net/bugs/1821607
commit c3fcadf0bb765faf45d6d562246e1d08885466df upstream.

Define auxtrace record alignment so that it can be referenced elsewhere.

Note this is preparation for patch "perf intel-pt: Fix overlap calculation
for padding"

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20190206103947.15750-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

perf tools: Fix split_kallsyms_for_kcore() for trampoline symbols

BugLink: https://bugs.launchpad.net/bugs/1821607
commit d6d457451eb94fa747dc202765592eb8885a7352 upstream.

Kallsyms symbols do not have a size, so the size becomes the distance to
the next symbol.

Consequently the recently added trampoline symbols end up with large
sizes because the trampolines are some distance from one another and the
main kernel map.

However, symbols that end outside their map can disrupt the symbol tree
because, after mapping, it can appear incorrectly that they overlap
other symbols.

Add logic to truncate symbol size to the end of the corresponding map.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: stable@vger.kernel.org
Fixes: d83212d5dd67 ("kallsyms, x86: Export addresses of PTI entry trampolines")
Link: http://lkml.kernel.org/r/20190109091835.5570-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

perf intel-pt: Fix CYC timestamp calculation after OVF

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 03997612904866abe7cdcc992784ef65cb3a4b81 upstream.

CYC packet timestamp calculation depends upon CBR which was being
cleared upon overflow (OVF). That can cause errors due to failing to
synchronize with sideband events. Even if a CBR change has been lost,
the old CBR is still a better estimate than zero. So remove the clearing
of CBR.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20190206103947.15750-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

x86/unwind/orc: Fix ORC unwind table alignment

BugLink: https://bugs.launchpad.net/bugs/1821607
commit f76a16adc485699f95bb71fce114f97c832fe664 upstream.

The .orc_unwind section is a packed array of 6-byte structs.  It's
currently aligned to 6 bytes, which is causing warnings in the LLD
linker.

Six isn't a power of two, so it's not a valid alignment value.  The
actual alignment doesn't matter much because it's an array of packed
structs.  An alignment of two is sufficient.  In reality it always gets
aligned to four bytes because it comes immediately after the
4-byte-aligned .orc_unwind_ip section.

Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Reported-by: Dmitry Golovin <dima@golovin.in>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/218
Link: https://lkml.kernel.org/r/d55027ee95fe73e952dcd8be90aebd31b0095c45.1551892041.git.jpoimboe@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

vt: perform safe console erase in the right order

BugLink: https://bugs.launchpad.net/bugs/1821607
commit a6dbe442755999960ca54a9b8ecfd9606be0ea75 upstream.

Commit 4b4ecd9cb853 ("vt: Perform safe console erase only once") removed
what appeared to be an extra call to scr_memsetw(). This missed the fact
that set_origin() must be called before clearing the screen otherwise
old screen content gets restored on the screen when using vgacon. Let's
fix that by moving all the scrollback handling to flush_scrollback()
where it logically belongs, and invoking it before the actual screen
clearing in csi_J(), making the code simpler in the end.

Reported-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Matthew Whitehead <tedheadster@gmail.com>
Fixes: 4b4ecd9cb853 ("vt: Perform safe console erase only once")
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

stable-kernel-rules.rst: add link to networking patch queue

BugLink: https://bugs.launchpad.net/bugs/1821607
commit a41e8f25fa8f8f67360d88eb0eebbabe95a64bdf upstream.

The networking maintainer keeps a public list of the patches being
queued up for the next round of stable releases. Be sure to check there
before asking for a patch to be applied so that you do not waste
people's time.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

bcache: use (REQ_META|REQ_PRIO) to indicate bio for metadata

BugLink: https://bugs.launchpad.net/bugs/1821607
commit dc7292a5bcb4c878b076fca2ac3fc22f81b8f8df upstream.

In 'commit 752f66a75aba ("bcache: use REQ_PRIO to indicate bio for
metadata")' REQ_META is replaced by REQ_PRIO to indicate metadata bio.
This assumption is not always correct, e.g. XFS uses REQ_META to mark
metadata bio other than REQ_PRIO. This is why Nix noticed that bcache
does not cache metadata for XFS after the above commit.

Thanks to Dave Chinner, he explains the difference between REQ_META and
REQ_PRIO from view of file system developer. Here I quote part of his
explanation from mailing list,
   REQ_META is used for metadata. REQ_PRIO is used to communicate to
   the lower layers that the submitter considers this IO to be more
   important that non REQ_PRIO IO and so dispatch should be expedited.

   IOWs, if the filesystem considers metadata IO to be more important
   that user data IO, then it will use REQ_PRIO | REQ_META rather than
   just REQ_META.

Then it seems bios with REQ_META or REQ_PRIO should both be cached for
performance optimation, because they are all probably low I/O latency
demand by upper layer (e.g. file system).

So in this patch, when we want to decide whether to bypass the cache,
REQ_META and REQ_PRIO are both checked. Then both metadata and
high priority I/O requests will be handled properly.

Reported-by: Nix <nix@esperi.org.uk>
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Andre Noll <maan@tuebingen.mpg.de>
Tested-by: Nix <nix@esperi.org.uk>
Cc: stable@vger.kernel.org
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

bcache: treat stale && dirty keys as bad keys

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 58ac323084ebf44f8470eeb8b82660f9d0ee3689 upstream.

Stale && dirty keys can be produced in the follow way:
After writeback in write_dirty_finish(), dirty keys k1 will
replace by clean keys k2
==>ret = bch_btree_insert(dc->disk.c, &keys, NULL, &w->key);
==>btree_insert_fn(struct btree_op *b_op, struct btree *b)
==>static int bch_btree_insert_node(struct btree *b,
       struct btree_op *op,
       struct keylist *insert_keys,
       atomic_t *journal_ref,
Then two steps:
A) update k1 to k2 in btree node memory;
   bch_btree_insert_keys(b, op, insert_keys, replace_key)
B) Write the bset(contains k2) to cache disk by a 30s delay work
   bch_btree_leaf_dirty(b, journal_ref).
But before the 30s delay work write the bset to cache device,
these things happened:
A) GC works, and reclaim the bucket k2 point to;
B) Allocator works, and invalidate the bucket k2 point to,
   and increase the gen of the bucket, and place it into free_inc
   fifo;
C) Until now, the 30s delay work still does not finish work,
   so in the disk, the key still is k1, it is dirty and stale
   (its gen is smaller than the gen of the bucket). and then the
   machine power off suddenly happens;
D) When the machine power on again, after the btree reconstruction,
   the stale dirty key appear.

In bch_extent_bad(), when expensive_debug_checks is off, it would
treat the dirty key as good even it is stale keys, and it would
cause bellow probelms:
A) In read_dirty() it would cause machine crash:
   BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
B) It could be worse when reads hits stale dirty keys, it would
   read old incorrect data.

This patch tolerate the existence of these stale && dirty keys,
and treat them as bad key in bch_extent_bad().

(Coly Li: fix indent which was modified by sender's email client)

Signed-off-by: Tang Junhui <tang.junhui.linux@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

PM / OPP: Update performance state when freq == old_freq

BugLink: https://bugs.launchpad.net/bugs/1821607
commit faef080f6db5320011862f7baf1aa66d0851559f upstream.

At boot up, CPUFreq core performs a sanity check to see if the system is
running at a frequency defined in the frequency table of the CPU. If so,
we try to find a valid frequency (lowest frequency greater than the
currently programmed frequency) from the table and set it. When the call
reaches dev_pm_opp_set_rate(), it calls _find_freq_ceil(opp_table,
&old_freq) to find the previously configured OPP and this call also
updates the old_freq. This eventually sets the old_freq == freq (new
target requested by cpufreq core) and we skip updating the performance
state in this case.

Fix this by also updating the performance state when the old_freq ==
freq.

Fixes: ca1b5d77b1c6 ("OPP: Configure all required OPPs")
Cc: v5.0 <stable@vger.kernel.org> # v5.0
Reported-by: Niklas Cassel <niklas.cassel@linaro.org>
Tested-by: Jorge Ramirez-Ortiz <jorge.ramirez-ortiz@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

PM / wakeup: Rework wakeup source timer cancellation

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 1fad17fb1bbcd73159c2b992668a6957ecc5af8a upstream.

If wakeup_source_add() is called right after wakeup_source_remove()
for the same wakeup source, timer_setup() may be called for a
potentially scheduled timer which is incorrect.

To avoid that, move the wakeup source timer cancellation from
wakeup_source_drop() to wakeup_source_remove().

Moreover, make wakeup_source_remove() clear the timer function after
canceling the timer to let wakeup_source_not_registered() treat
unregistered wakeup sources in the same way as the ones that have
never been registered.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
[ rjw: Subject, changelog, merged two patches together ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

svcrpc: fix UDP on servers with lots of threads

BugLink: https://bugs.launchpad.net/bugs/1821607
commit b7e5034cbecf5a65b7bfdc2b20a8378039577706 upstream.

James Pearson found that an NFS server stopped responding to UDP
requests if started with more than 1017 threads.

sv_max_mesg is about 2^20, so that is probably where the calculation
performed by

svc_sock_setbufsize(svsk->sk_sock,
(serv->sv_nrthreads+3) * serv->sv_max_mesg,
(serv->sv_nrthreads+3) * serv->sv_max_mesg);

starts to overflow an int.

Reported-by: James Pearson <jcpearson@gmail.com>
Tested-by: James Pearson <jcpearson@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

NFSv4.1: Reinitialise sequence results before retransmitting a request

BugLink: https://bugs.launchpad.net/bugs/1821607
commit c1dffe0bf7f9c3d57d9f237a7cb2a81e62babd2b upstream.

If we have to retransmit a request, we should ensure that we reinitialise
the sequence results structure, since in the event of a signal
we need to treat the request as if it had not been sent.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

nfsd: fix wrong check in write_v4_end_grace()

BugLink: https://bugs.launchpad.net/bugs/1821607
commit dd838821f0a29781b185cd8fb8e48d5c177bd838 upstream.

Commit 62a063b8e7d1 "nfsd4: fix crash on writing v4_end_grace before
nfsd startup" is trying to fix a NULL dereference issue, but it
mistakenly checks if the nfsd server is started. So fix it.

Fixes: 62a063b8e7d1 "nfsd4: fix crash on writing v4_end_grace before nfsd startup"
Cc: stable@vger.kernel.org
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

nfsd: fix memory corruption caused by readdir

BugLink: https://bugs.launchpad.net/bugs/1821607
commit b602345da6cbb135ba68cf042df8ec9a73da7981 upstream.

If the result of an NFSv3 readdir{,plus} request results in the
"offset" on one entry having to be split across 2 pages, and is sized
so that the next directory entry doesn't fit in the requested size,
then memory corruption can happen.

When encode_entry() is called after encoding the last entry that fits,
it notices that ->offset and ->offset1 are set, and so stores the
offset value in the two pages as required. It clears ->offset1 but
*does not* clear ->offset.

Normally this omission doesn't matter as encode_entry_baggage() will
be called, and will set ->offset to a suitable value (not on a page
boundary).
But in the case where cd->buflen < elen and nfserr_toosmall is
returned, ->offset is not reset.

This means that nfsd3proc_readdirplus will see ->offset with a value 4
bytes before the end of a page, and ->offset1 set to NULL.
It will try to write 8bytes to ->offset.
If we are lucky, the next page will be read-only, and the system will
BUG: unable to handle kernel paging request at...

If we are unlucky, some innocent page will have the first 4 bytes
corrupted.

nfsd3proc_readdir() doesn't even check for ->offset1, it just blindly
writes 8 bytes to the offset wherever it is.

Fix this by clearing ->offset after it is used, and copying the
->offset handling code from nfsd3_proc_readdirplus into
nfsd3_proc_readdir.

(Note that the commit hash in the Fixes tag is from the 'history'
tree - this bug predates git).

Fixes: 0b1d57cf7654 ("[PATCH] kNFSd: Fix nfs3 dentry encoding")
Fixes-URL: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=0b1d57cf7654
Cc: stable@vger.kernel.org (v2.6.12+)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

nfsd: fix performance-limiting session calculation

BugLink: https://bugs.launchpad.net/bugs/1821607
commit c54f24e338ed2a35218f117a4a1afb5f9e2b4e64 upstream.

We're unintentionally limiting the number of slots per nfsv4.1 session
to 10. Often more than 10 simultaneous RPCs are needed for the best
performance.

This calculation was meant to prevent any one client from using up more
than a third of the limit we set for total memory use across all clients
and sessions. Instead, it's limiting the client to a third of the
maximum for a single session.

Fix this.

Reported-by: Chris Tracy <ctracy@engr.scu.edu>
Cc: stable@vger.kernel.org
Fixes: de766e570413 "nfsd: give out fewer session slots as limit approaches"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

NFS: Don't recoalesce on error in nfs_pageio_complete_mirror()

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 8127d82705998568b52ac724e28e00941538083d upstream.

If the I/O completion failed with a fatal error, then we should just
exit nfs_pageio_complete_mirror() rather than try to recoalesce.

Fixes: a7d42ddb3099 ("nfs: add mirroring support to pgio layer")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

NFS: Fix an I/O request leakage in nfs_do_recoalesce

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 4d91969ed4dbcefd0e78f77494f0cb8fada9048a upstream.

Whether we need to exit early, or just reprocess the list, we
must not lost track of the request which failed to get recoalesced.

Fixes: 03d5eb65b538 ("NFS: Fix a memory leak in nfs_do_recoalesce")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

NFS: Fix I/O request leakages

BugLink: https://bugs.launchpad.net/bugs/1821607
commit f57dcf4c72113c745d83f1c65f7291299f65c14f upstream.

When we fail to add the request to the I/O queue, we currently leave it
to the caller to free the failed request. However since some of the
requests that fail are actually created by nfs_pageio_add_request()
itself, and are not passed back the caller, this leads to a leakage
issue, which can again cause page locks to leak.

This commit addresses the leakage by freeing the created requests on
error, using desc->pg_completion_ops->error_cleanup()

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Fixes: a7d42ddb30997 ("nfs: add mirroring support to pgio layer")
Cc: stable@vger.kernel.org # v4.0: c18b96a1b862: nfs: clean up rest of reqs
Cc: stable@vger.kernel.org # v4.0: d600ad1f2bdb: NFS41: pop some layoutget
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

cpuidle: governor: Add new governors to cpuidle_governors again

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 22782b3f9bb8ae21c710e2880db21bc729771e92 upstream.

After commit 61cb5758d3c4 ("cpuidle: Add cpuidle.governor= command
line parameter") new cpuidle governors are not added to the list
of available governors, so governor selection via sysfs doesn't
work as expected (even though it is rarely used anyway).

Fix that by making cpuidle_register_governor() add new governors to
cpuidle_governors again.

Fixes: 61cb5758d3c4 ("cpuidle: Add cpuidle.governor= command line parameter")
Reported-by: Kees Cook <keescook@chromium.org>
Cc: 5.0+ <stable@vger.kernel.org> # 5.0+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

cpcap-charger: generate events for userspace

BugLink: https://bugs.launchpad.net/bugs/1821607
commit fd10606f93a149a9f3d37574e5385b083b4a7b32 upstream.

The driver doesn't generate uevents on charger connect/disconnect.
This leads to UPower not detecting when AC is on or off... and that is
bad.

Reported by Arthur D. on github (
https://github.com/maemo-leste/bugtracker/issues/206 ), thanks to
Merlijn Wajer for suggesting a fix.

Cc: stable@kernel.org
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

mfd: sm501: Fix potential NULL pointer dereference

BugLink: https://bugs.launchpad.net/bugs/1821607
commit ae7b8eda27b33b1f688dfdebe4d46f690a8f9162 upstream.

There is a potential NULL pointer dereference in case devm_kzalloc()
fails and returns NULL.

Fix this by adding a NULL check on *lookup*

This bug was detected with the help of Coccinelle.

Fixes: b2e63555592f ("i2c: gpio: Convert to use descriptors")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

media: cx25840: mark pad sig_types to fix cx231xx init

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 46c039d06b6ecabb94bd16c3a999b28dc83b79ce upstream.

Without this, we get failures like this when the kernel attempts to
initialize a cx231xx device:

[16046.153653] cx231xx 3-1.2:1.1: New device Hauppauge Hauppauge Device @ 480 Mbps (2040:c200) with 6 interfaces
[16046.153900] cx231xx 3-1.2:1.1: can't change interface 3 alt no. to 3: Max. Pkt size = 0
[16046.153907] cx231xx 3-1.2:1.1: Identified as Hauppauge USB Live 2 (card=9)
[16046.154350] i2c i2c-11: Added multiplexed i2c bus 13
[16046.154379] i2c i2c-11: Added multiplexed i2c bus 14
[16046.267194] cx25840 10-0044: cx23102 A/V decoder found @ 0x88 (cx231xx #0-0)
[16048.424551] cx25840 10-0044: loaded v4l-cx231xx-avcore-01.fw firmware (16382 bytes)
[16048.463224] cx231xx 3-1.2:1.1: v4l2 driver version 0.0.3
[16048.567878] cx231xx 3-1.2:1.1: Registered video device video2 [v4l2]
[16048.568001] cx231xx 3-1.2:1.1: Registered VBI device vbi0
[16048.568419] cx231xx 3-1.2:1.1: audio EndPoint Addr 0x83, Alternate settings: 3
[16048.568425] cx231xx 3-1.2:1.1: video EndPoint Addr 0x84, Alternate settings: 5
[16048.568431] cx231xx 3-1.2:1.1: VBI EndPoint Addr 0x85, Alternate settings: 2
[16048.568436] cx231xx 3-1.2:1.1: sliced CC EndPoint Addr 0x86, Alternate settings: 2
[16048.568448] usb 3-1.2: couldn't get decoder output pad for V4L I/O
[16048.568453] cx231xx 3-1.2:1.1: V4L2 device vbi0 deregistered
[16048.568579] cx231xx 3-1.2:1.1: V4L2 device video2 deregistered
[16048.569001] cx231xx: probe of 3-1.2:1.1 failed with error -22

Likely a regession since Commit 9d6d20e652c0
("media: v4l2-mc: switch it to use the new approach to setup pipelines")
(v4.19-rc1-100-g9d6d20e652c0), which introduced the use of
PAD_SIGNAL_DV within v4l2_mc_create_media_graph().

This also modifies cx25840 to remove the VBI pad, matching the action
taken in Commit 092a37875a22 ("media: v4l2: remove VBI output pad").

Fixes: 9d6d20e652c0 ("media: v4l2-mc: switch it to use the new approach to setup pipelines")
Cc: stable@vger.kernel.org
Signed-off-by: Cody P Schafer <dev@codyps.com>
Tested-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

dm integrity: limit the rate of error messages

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 225557446856448039a9e495da37b72c20071ef2 upstream.

When using dm-integrity underneath md-raid, some tests with raid
auto-correction trigger large amounts of integrity failures - and all
these failures print an error message. These messages can bring the
system to a halt if the system is using serial console.

Fix this by limiting the rate of error messages - it improves the speed
of raid recovery and avoids the hang.

Fixes: 7eada909bfd7a ("dm: add integrity target")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

dm: fix to_sector() for 32bit

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 0bdb50c531f7377a9da80d3ce2d61f389c84cb30 upstream.

A dm-raid array with devices larger than 4GB won't assemble on
a 32 bit host since _check_data_dev_sectors() was added in 4.16.
This is because to_sector() treats its argument as an "unsigned long"
which is 32bits (4GB) on a 32bit host. Using "unsigned long long"
is more correct.

Kernels as early as 4.2 can have other problems due to to_sector()
being used on the size of a device.

Fixes: 0cf4503174c1 ("dm raid: add support for the MD RAID0 personality")
cc: stable@vger.kernel.org (v4.2+)
Reported-and-tested-by: Guillaume Perréal <gperreal@free.fr>
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

ipmi_si: fix use-after-free of resource->name

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 401e7e88d4ef80188ffa07095ac00456f901b8c4 upstream.

When we excute the following commands, we got oops
rmmod ipmi_si
cat /proc/ioports

[ 1623.482380] Unable to handle kernel paging request at virtual address ffff00000901d478
[ 1623.482382] Mem abort info:
[ 1623.482383]   ESR = 0x96000007
[ 1623.482385]   Exception class = DABT (current EL), IL = 32 bits
[ 1623.482386]   SET = 0, FnV = 0
[ 1623.482387]   EA = 0, S1PTW = 0
[ 1623.482388] Data abort info:
[ 1623.482389]   ISV = 0, ISS = 0x00000007
[ 1623.482390]   CM = 0, WnR = 0
[ 1623.482393] swapper pgtable: 4k pages, 48-bit VAs, pgdp = 00000000d7d94a66
[ 1623.482395] [ffff00000901d478] pgd=000000dffbfff003, pud=000000dffbffe003, pmd=0000003f5d06e003, pte=0000000000000000
[ 1623.482399] Internal error: Oops: 96000007 [#1] SMP
[ 1623.487407] Modules linked in: ipmi_si(E) nls_utf8 isofs rpcrdma ib_iser ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_umad rdma_cm ib_cm dm_mirror dm_region_hash dm_log iw_cm dm_mod aes_ce_blk crypto_simd cryptd aes_ce_cipher ses ghash_ce sha2_ce enclosure sha256_arm64 sg sha1_ce hisi_sas_v2_hw hibmc_drm sbsa_gwdt hisi_sas_main ip_tables mlx5_ib ib_uverbs marvell ib_core mlx5_core ixgbe mdio hns_dsaf ipmi_devintf hns_enet_drv ipmi_msghandler hns_mdio [last unloaded: ipmi_si]
[ 1623.532410] CPU: 30 PID: 11438 Comm: cat Kdump: loaded Tainted: G            E     5.0.0-rc3+ #168
[ 1623.541498] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.37 11/21/2017
[ 1623.548822] pstate: a0000005 (NzCv daif -PAN -UAO)
[ 1623.553684] pc : string+0x28/0x98
[ 1623.557040] lr : vsnprintf+0x368/0x5e8
[ 1623.560837] sp : ffff000013213a80
[ 1623.564191] x29: ffff000013213a80 x28: ffff00001138abb5
[ 1623.569577] x27: ffff000013213c18 x26: ffff805f67d06049
[ 1623.574963] x25: 0000000000000000 x24: ffff00001138abb5
[ 1623.580349] x23: 0000000000000fb7 x22: ffff0000117ed000
[ 1623.585734] x21: ffff000011188fd8 x20: ffff805f67d07000
[ 1623.591119] x19: ffff805f67d06061 x18: ffffffffffffffff
[ 1623.596505] x17: 0000000000000200 x16: 0000000000000000
[ 1623.601890] x15: ffff0000117ed748 x14: ffff805f67d07000
[ 1623.607276] x13: ffff805f67d0605e x12: 0000000000000000
[ 1623.612661] x11: 0000000000000000 x10: 0000000000000000
[ 1623.618046] x9 : 0000000000000000 x8 : 000000000000000f
[ 1623.623432] x7 : ffff805f67d06061 x6 : fffffffffffffffe
[ 1623.628817] x5 : 0000000000000012 x4 : ffff00000901d478
[ 1623.634203] x3 : ffff0a00ffffff04 x2 : ffff805f67d07000
[ 1623.639588] x1 : ffff805f67d07000 x0 : ffffffffffffffff
[ 1623.644974] Process cat (pid: 11438, stack limit = 0x000000008d4cbc10)
[ 1623.651592] Call trace:
[ 1623.654068]  string+0x28/0x98
[ 1623.657071]  vsnprintf+0x368/0x5e8
[ 1623.660517]  seq_vprintf+0x70/0x98
[ 1623.668009]  seq_printf+0x7c/0xa0
[ 1623.675530]  r_show+0xc8/0xf8
[ 1623.682558]  seq_read+0x330/0x440
[ 1623.689877]  proc_reg_read+0x78/0xd0
[ 1623.697346]  __vfs_read+0x60/0x1a0
[ 1623.704564]  vfs_read+0x94/0x150
[ 1623.711339]  ksys_read+0x6c/0xd8
[ 1623.717939]  __arm64_sys_read+0x24/0x30
[ 1623.725077]  el0_svc_common+0x120/0x148
[ 1623.732035]  el0_svc_handler+0x30/0x40
[ 1623.738757]  el0_svc+0x8/0xc
[ 1623.744520] Code: d1000406 aa0103e2 54000149 b4000080 (39400085)
[ 1623.753441] ---[ end trace f91b6a4937de9835 ]---
[ 1623.760871] Kernel panic - not syncing: Fatal exception
[ 1623.768935] SMP: stopping secondary CPUs
[ 1623.775718] Kernel Offset: disabled
[ 1623.781998] CPU features: 0x002,21006008
[ 1623.788777] Memory Limit: none
[ 1623.798329] Starting crashdump kernel...
[ 1623.805202] Bye!

If io_setup is called successful in try_smi_init() but try_smi_init()
goes out_err before calling ipmi_register_smi(), so ipmi_unregister_smi()
will not be called while removing module. It leads to the resource that
allocated in io_setup() can not be freed, but the name(DEVICE_NAME) of
resource is freed while removing the module. It causes use-after-free
when cat /proc/ioports.

Fix this by calling io_cleanup() while try_smi_init() goes to out_err.
and don't call io_cleanup() until io_setup() returns successful to avoid
warning prints.

Fixes: 93c303d2045b ("ipmi_si: Clean up shutdown a bit")
Cc: stable@vger.kernel.org
Reported-by: NuoHan Qiao <qiaonuohan@huawei.com>
Suggested-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

ipmi_si: Fix crash when using hard-coded device

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 41b766d661bf94a364960862cfc248a78313dbd3 upstream.

When excuting a command like:
modprobe ipmi_si ports=0xffc0e3 type=bt
The system would get an oops.

The trouble here is that ipmi_si_hardcode_find_bmc() is called before
ipmi_si_platform_init(), but initialization of the hard-coded device
creates an IPMI platform device, which won't be initialized yet.

The real trouble is that hard-coded devices aren't created with
any device, and the fixup is done later. So do it right, create the
hard-coded devices as normal platform devices.

This required adding some new resource types to the IPMI platform
code for passing information required by the hard-coded device
and adding some code to remove the hard-coded platform devices
on module removal.

To enforce the "hard-coded devices passed by the user take priority
over firmware devices" rule, some special code was added to check
and see if a hard-coded device already exists.

Reported-by: Yang Yingliang <yangyingliang@huawei.com>
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Tested-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()"

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 92da008fa21034c369cdb8ca2b629fe5c196826b upstream.

This reverts commit 71883a62fcd6c70639fa12cda733378b4d997409.

The above commit contains an optimization to kvm_zap_gfn_range which
uses gfn-limited TLB flushes, if enabled. If using these limited flushes,
kvm_zap_gfn_range passes lock_flush_tlb=false to slot_handle_level_range
which creates a race when the function unlocks to call cond_resched.
See an example of this race below:

CPU 0                   CPU 1                           CPU 3
// zap_direct_gfn_range
mmu_lock()
// *ptep == pte_1
*ptep = 0
if (lock_flush_tlb)
        flush_tlbs()
mmu_unlock()
                        // In invalidate range
                        // MMU notifier
                        mmu_lock()
                        if (pte != 0)
                                *ptep = 0
                                flush = true
                        if (flush)
                                flush_remote_tlbs()
                        mmu_unlock()
                        return
                        // Host MM reallocates
                        // page previously
                        // backing guest memory.
                                                        // Guest accesses
                                                        // invalid page
                                                        // through pte_1
                                                        // in its TLB!!

Tested: Ran all kvm-unit-tests on a Intel Haswell machine with and
without this patch. The patch introduced no new failures.

Signed-off-by: Ben Gardon <bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2

BugLink: https://bugs.launchpad.net/bugs/1821607
commit c88b093693ccbe41991ef2e9b1d251945e6e54ed upstream.

Due to what looks like a typo dating back to the original addition
of FPEXC32_EL2 handling, KVM currently initialises this register to
an architecturally invalid value.

As a result, the VECITR field (RES1) in bits [10:8] is initialised
with 0, and the two reserved (RES0) bits [6:5] are initialised with
1. (In the Common VFP Subarchitecture as specified by ARMv7-A,
these two bits were IMP DEF. ARMv8-A removes them.)

This patch changes the reset value from 0x70 to 0x700, which
reflects the architectural constraints and is presumably what was
originally intended.

Cc: <stable@vger.kernel.org> # 4.12.x-
Cc: Christoffer Dall <christoffer.dall@arm.com>
Fixes: 62a89c44954f ("arm64: KVM: 32bit handling of coprocessor traps")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

arm64: debug: Ensure debug handlers check triggering exception level

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 6bd288569b50bc89fa5513031086746968f585cb upstream.

Debug exception handlers may be called for exceptions generated both by
user and kernel code. In many cases, this is checked explicitly, but
in other cases things either happen to work by happy accident or they
go slightly wrong. For example, executing 'brk #4' from userspace will
enter the kprobes code and be ignored, but the instruction will be
retried forever in userspace instead of delivering a SIGTRAP.

Fix this issue in the most stable-friendly fashion by simply adding
explicit checks of the triggering exception level to all of our debug
exception handlers.

Cc: <stable@vger.kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals

BugLink: https://bugs.launchpad.net/bugs/1821607
commit b9a4b9d084d978f80eb9210727c81804588b42ff upstream.

FAR_EL1 is UNKNOWN for all debug exceptions other than those caused by
taking a hardware watchpoint. Unfortunately, if a debug handler returns
a non-zero value, then we will propagate the UNKNOWN FAR value to
userspace via the si_addr field of the SIGTRAP siginfo_t.

Instead, let's set si_addr to take on the PC of the faulting instruction,
which we have available in the current pt_regs.

Cc: <stable@vger.kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

arm64: Fix HCR.TGE status for NMI contexts

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 5870970b9a828d8693aa6d15742573289d7dbcd0 upstream.

When using VHE, the host needs to clear HCR_EL2.TGE bit in order
to interact with guest TLBs, switching from EL2&0 translation regime
to EL1&0.

However, some non-maskable asynchronous event could happen while TGE is
cleared like SDEI. Because of this address translation operations
relying on EL2&0 translation regime could fail (tlb invalidation,
userspace access, ...).

Fix this by properly setting HCR_EL2.TGE when entering NMI context and
clear it if necessary when returning to the interrupted context.

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

ARM: s3c24xx: Fix boolean expressions in osiris_dvs_notify

BugLink: https://bugs.launchpad.net/bugs/1821607
commit e2477233145f2156434afb799583bccd878f3e9f upstream.

Fix boolean expressions by using logical AND operator '&&' instead of
bitwise operator '&'.

This issue was detected with the help of Coccinelle.

Fixes: 4fa084af28ca ("ARM: OSIRIS: DVS (Dynamic Voltage Scaling) supoort.")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
[krzk: Fix -Wparentheses warning]
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

powerpc/traps: Fix the message printed when stack overflows

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 9bf3d3c4e4fd82c7174f4856df372ab2a71005b9 upstream.

Today's message is useless:

[ 42.253267] Kernel stack overflow in process (ptrval), r1=c65500b0

This patch fixes it:

[ 66.905235] Kernel stack overflow in process sh[356], r1=c65560b0

Fixes: ad67b74d2469 ("printk: hash addresses printed with %p")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Use task_pid_nr()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

powerpc/traps: fix recoverability of machine check handling on book3s/32

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 0bbea75c476b77fa7d7811d6be911cc7583e640f upstream.

Looks like book3s/32 doesn't set RI on machine check, so
checking RI before calling die() will always be fatal
allthought this is not an issue in most cases.

Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt")
Fixes: daf00ae71dad ("powerpc/traps: restore recoverability of machine_check interrupts")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

powerpc/smp: Fix NMI IPI xmon timeout

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 88b9a3d1425a436e95c41f09986fdae2daee437a upstream.

The xmon debugger IPI handler waits in the callback function while
xmon is still active. This means they don't complete the IPI, and the
initiator always times out waiting for them.

Things manage to work after the timeout because there is some fallback
logic to keep NMI IPI state sane in case of the timeout, but this is a
bit ugly.

This patch changes NMI IPI back to half-asynchronous (i.e., wait for
everyone to call in, do not wait for IPI function to complete), but
the complexity is avoided by going one step further and allowing new
IPIs to be issued before the IPI functions to all complete.

If synchronization against that is required, it is left up to the
caller, but current callers don't require that. In fact with the
timeout handling, callers must be able to cope with this already.

Fixes: 5b73151fff63 ("powerpc: NMI IPI make NMI IPIs fully sychronous")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

powerpc/smp: Fix NMI IPI timeout

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 1b5fc84aba170bdfe3533396ca9662ceea1609b7 upstream.

The NMI IPI timeout logic is broken, if __smp_send_nmi_ipi() times out
on the first condition, delay_us will be zero which will send it into
the second spin loop with no timeout so it will spin forever.

Fixes: 5b73151fff63 ("powerpc: NMI IPI make NMI IPIs fully sychronous")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

powerpc/hugetlb: Don't do runtime allocation of 16G pages in LPAR configuration

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 35f2806b481f5b9207f25e1886cba5d1c4d12cc7 upstream.

We added runtime allocation of 16G pages in commit 4ae279c2c96a
("powerpc/mm/hugetlb: Allow runtime allocation of 16G.") That was done
to enable 16G allocation on PowerNV and KVM config. In case of KVM
config, we mostly would have the entire guest RAM backed by 16G
hugetlb pages for this to work. PAPR do support partial backing of
guest RAM with hugepages via ibm,expected#pages node of memory node in
the device tree. This means rest of the guest RAM won't be backed by
16G contiguous pages in the host and hence a hash page table insertion
can fail in such case.

An example error message will look like

  hash-mmu: mm: Hashing failure ! EA=0x7efc00000000 access=0x8000000000000006 current=readback
  hash-mmu:     trap=0x300 vsid=0x67af789 ssize=1 base psize=14 psize 14 pte=0xc000000400000386
  readback[12260]: unhandled signal 7 at 00007efc00000000 nip 00000000100012d0 lr 000000001000127c code 2

This patch address that by preventing runtime allocation of 16G
hugepages in LPAR config. To allocate 16G hugetlb one need to kernel
command line hugepagesz=16G hugepages=<number of 16G pages>

With radix translation mode we don't run into this issue.

This change will prevent runtime allocation of 16G hugetlb pages on
kvm with hash translation mode. However, with the current upstream it
was observed that 16G hugetlbfs backed guest doesn't boot at all.

We observe boot failure with the below message:
  [131354.647546] KVM: map_vrma at 0 failed, ret=-4

That means this patch is not resulting in an observable regression.
Once we fix the boot issue with 16G hugetlb backed memory, we need to
use ibm,expected#pages memory node attribute to indicate 16G page
reservation to the guest. This will also enable partial backing of
guest RAM with 16G pages.

Fixes: 4ae279c2c96a ("powerpc/mm/hugetlb: Allow runtime allocation of 16G.")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

powerpc/ptrace: Simplify vr_get/set() to avoid GCC warning

BugLink: https://bugs.launchpad.net/bugs/1821607
commit ca6d5149d2ad0a8d2f9c28cbe379802260a0a5e0 upstream.

GCC 8 warns about the logic in vr_get/set(), which with -Werror breaks
the build:

  In function ‘user_regset_copyin’,
      inlined from ‘vr_set’ at arch/powerpc/kernel/ptrace.c:628:9:
  include/linux/regset.h:295:4: error: ‘memcpy’ offset [-527, -529] is
  out of the bounds [0, 16] of object ‘vrsave’ with type ‘union
  <anonymous>’ [-Werror=array-bounds]
  arch/powerpc/kernel/ptrace.c: In function ‘vr_set’:
  arch/powerpc/kernel/ptrace.c:623:5: note: ‘vrsave’ declared here
     } vrsave;

This has been identified as a regression in GCC, see GCC bug 88273.

However we can avoid the warning and also simplify the logic and make
it more robust.

Currently we pass -1 as end_pos to user_regset_copyout(). This says
"copy up to the end of the regset".

The definition of the regset is:
[REGSET_VMX] = {
.core_note_type = NT_PPC_VMX, .n = 34,
.size = sizeof(vector128), .align = sizeof(vector128),
.active = vr_active, .get = vr_get, .set = vr_set
},

The end is calculated as (n * size), ie. 34 * sizeof(vector128).

In vr_get/set() we pass start_pos as 33 * sizeof(vector128), meaning
we can copy up to sizeof(vector128) into/out-of vrsave.

The on-stack vrsave is defined as:
  union {
  elf_vrreg_t reg;
  u32 word;
  } vrsave;

And elf_vrreg_t is:
  typedef __vector128 elf_vrreg_t;

So there is no bug, but we rely on all those sizes lining up,
otherwise we would have a kernel stack exposure/overwrite on our
hands.

Rather than relying on that we can pass an explict end_pos based on
the sizeof(vrsave). The result should be exactly the same but it's
more obviously not over-reading/writing the stack and it avoids the
compiler warning.

Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Mathieu Malaterre <malat@debian.org>
Cc: stable@vger.kernel.org
Tested-by: Mathieu Malaterre <malat@debian.org>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

powerpc: Fix 32-bit KVM-PR lockup and host crash with MacOS guest

BugLink: https://bugs.launchpad.net/bugs/1821607
commit fe1ef6bcdb4fca33434256a802a3ed6aacf0bd2f upstream.

Commit 8792468da5e1 "powerpc: Add the ability to save FPU without
giving it up" unexpectedly removed the MSR_FE0 and MSR_FE1 bits from
the bitmask used to update the MSR of the previous thread in
__giveup_fpu() causing a KVM-PR MacOS guest to lockup and panic the
host kernel.

Leaving FE0/1 enabled means unrelated processes might receive FPEs
when they're not expecting them and crash. In particular if this
happens to init the host will then panic.

eg (transcribed):
  qemu-system-ppc[837]: unhandled signal 8 at 12cc9ce4 nip 12cc9ce4 lr 12cc9ca4 code 0
  systemd[1]: unhandled signal 8 at 202f02e0 nip 202f02e0 lr 001003d4 code 0
  Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

Reinstate these bits to the MSR bitmask to enable MacOS guests to run
under 32-bit KVM-PR once again without issue.

Fixes: 8792468da5e1 ("powerpc: Add the ability to save FPU without giving it up")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

powerpc/64s/hash: Fix assert_slb_presence() use of the slbfee. instruction

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 7104dccfd052fde51eecc9972dad9c40bd3e0d11 upstream.

The slbfee. instruction must have bit 24 of RB clear, failure to do
so can result in false negatives that result in incorrect assertions.

This is not obvious from the ISA v3.0B document, which only says:

The hardware ignores the contents of RB 36:38 40:63 -- p.1032

This patch fixes the bug and also clears all other bits from PPC bit
36-63, which is good practice when dealing with reserved or ignored
bits.

Fixes: e15a4fea4dee ("powerpc/64s/hash: Add some SLB debugging tests")
Cc: stable@vger.kernel.org # v4.20+
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

powerpc/powernv: Don't reprogram SLW image on every KVM guest entry/exit

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 19f8a5b5be2898573a5e1dc1db93e8d40117606a upstream.

Commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api
only on Hotplug", 2017-07-21) added two calls to opal_slw_set_reg()
inside pnv_cpu_offline(), with the aim of changing the LPCR value in
the SLW image to disable wakeups from the decrementer while a CPU is
offline.  However, pnv_cpu_offline() gets called each time a secondary
CPU thread is woken up to participate in running a KVM guest, that is,
not just when a CPU is offlined.

Since opal_slw_set_reg() is a very slow operation (with observed
execution times around 20 milliseconds), this means that an offline
secondary CPU can often be busy doing the opal_slw_set_reg() call
when the primary CPU wants to grab all the secondary threads so that
it can run a KVM guest.  This leads to messages like "KVM: couldn't
grab CPU n" being printed and guest execution failing.

There is no need to reprogram the SLW image on every KVM guest entry
and exit.  So that we do it only when a CPU is really transitioning
between online and offline, this moves the calls to
pnv_program_cpu_hotplug_lpcr() into pnv_smp_cpu_kill_self().

Fixes: 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

powerpc/kvm: Save and restore host AMR/IAMR/UAMOR

BugLink: https://bugs.launchpad.net/bugs/1821607
commit c3c7470c75566a077c8dc71dcf8f1948b8ddfab4 upstream.

When the hash MMU is active the AMR, IAMR and UAMOR are used for
pkeys. The AMR is directly writable by user space, and the UAMOR masks
those writes, meaning both registers are effectively user register
state. The IAMR is used to create an execute only key.

Also we must maintain the value of at least the AMR when running in
process context, so that any memory accesses done by the kernel on
behalf of the process are correctly controlled by the AMR.

Although we are correctly switching all registers when going into a
guest, on returning to the host we just write 0 into all regs, except
on Power9 where we restore the IAMR correctly.

This could be observed by a user process if it writes the AMR, then
runs a guest and we then return immediately to it without
rescheduling. Because we have written 0 to the AMR that would have the
effect of granting read/write permission to pages that the process was
trying to protect.

In addition, when using the Radix MMU, the AMR can prevent inadvertent
kernel access to userspace data, writing 0 to the AMR disables that
protection.

So save and restore AMR, IAMR and UAMOR.

Fixes: cf43d3b26452 ("powerpc: Enable pkey subsystem")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

powerpc/83xx: Also save/restore SPRG4-7 during suspend

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 36da5ff0bea2dc67298150ead8d8471575c54c7d upstream.

The 83xx has 8 SPRG registers and uses at least SPRG4
for DTLB handling LRU.

Fixes: 2319f1239592 ("powerpc/mm: e300c2/c3/c4 TLB errata workaround")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

powerpc/powernv: Make opal log only readable by root

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 7b62f9bd2246b7d3d086e571397c14ba52645ef1 upstream.

Currently the opal log is globally readable. It is kernel policy to
limit the visibility of physical addresses / kernel pointers to root.
Given this and the fact the opal log may contain this information it
would be better to limit the readability to root.

Fixes: bfc36894a48b ("powerpc/powernv: Add OPAL message log interface")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Stewart Smith <stewart@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

powerpc/wii: properly disable use of BATs when requested.

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 6d183ca8baec983dc4208ca45ece3c36763df912 upstream.

'nobats' kernel parameter or some options like CONFIG_DEBUG_PAGEALLOC
deny the use of BATS for mapping memory.

This patch makes sure that the specific wii RAM mapping function
takes it into account as well.

Fixes: de32400dd26e ("wii: use both mem1 and mem2 as ram")
Cc: stable@vger.kernel.org
Reviewed-by: Jonathan Neuschafer <j.neuschaefer@gmx.net>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

powerpc/32: Clear on-stack exception marker upon exception return

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 9580b71b5a7863c24a9bd18bcd2ad759b86b1eff upstream.

Clear the on-stack STACK_FRAME_REGS_MARKER on exception exit in order
to avoid confusing stacktrace like the one below.

  Call Trace:
  [c0e9dca0] [c01c42a0] print_address_description+0x64/0x2bc (unreliable)
  [c0e9dcd0] [c01c4684] kasan_report+0xfc/0x180
  [c0e9dd10] [c0895130] memchr+0x24/0x74
  [c0e9dd30] [c00a9e38] msg_print_text+0x124/0x574
  [c0e9dde0] [c00ab710] console_unlock+0x114/0x4f8
  [c0e9de40] [c00adc60] vprintk_emit+0x188/0x1c4
  --- interrupt: c0e9df00 at 0x400f330
      LR = init_stack+0x1f00/0x2000
  [c0e9de80] [c00ae3c4] printk+0xa8/0xcc (unreliable)
  [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108
  [c0e9df50] [c0c15434] start_kernel+0x310/0x488
  [c0e9dff0] [00003484] 0x3484

With this patch the trace becomes:

  Call Trace:
  [c0e9dca0] [c01c42c0] print_address_description+0x64/0x2bc (unreliable)
  [c0e9dcd0] [c01c46a4] kasan_report+0xfc/0x180
  [c0e9dd10] [c0895150] memchr+0x24/0x74
  [c0e9dd30] [c00a9e58] msg_print_text+0x124/0x574
  [c0e9dde0] [c00ab730] console_unlock+0x114/0x4f8
  [c0e9de40] [c00adc80] vprintk_emit+0x188/0x1c4
  [c0e9de80] [c00ae3e4] printk+0xa8/0xcc
  [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108
  [c0e9df50] [c0c15434] start_kernel+0x310/0x488
  [c0e9dff0] [00003484] 0x3484

Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

security/selinux: fix SECURITY_LSM_NATIVE_LABELS on reused superblock

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 3815a245b50124f0865415dcb606a034e97494d4 upstream.

In the case when we're reusing a superblock, selinux_sb_clone_mnt_opts()
fails to set set_kern_flags, with the result that
nfs_clone_sb_security() incorrectly clears NFS_CAP_SECURITY_LABEL.

The result is that if you mount the same NFS filesystem twice, NFS
security labels are turned off, even if they would work fine if you
mounted the filesystem only once.

("fixes" may be not exactly the right tag, it may be more like
"fixed-other-cases-but-missed-this-one".)

Cc: Scott Mayhew <smayhew@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 0b4d3452b8b4 "security/selinux: allow security_sb_clone_mnt_opts..."
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

selinux: add the missing walk_size + len check in selinux_sctp_bind_connect

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 292c997a1970f8d1e1dfa354ed770a22f7b5a434 upstream.

As does in __sctp_connect(), when checking addrs in a while loop, after
get the addr len according to sa_family, it's necessary to do the check
walk_size + af->sockaddr_len > addrs_size to make sure it won't access
an out-of-bounds addr.

The same thing is needed in selinux_sctp_bind_connect(), otherwise an
out-of-bounds issue can be triggered:

  [14548.772313] BUG: KASAN: slab-out-of-bounds in selinux_sctp_bind_connect+0x1aa/0x1f0
  [14548.927083] Call Trace:
  [14548.938072]  dump_stack+0x9a/0xe9
  [14548.953015]  print_address_description+0x65/0x22e
  [14548.996524]  kasan_report.cold.6+0x92/0x1a6
  [14549.015335]  selinux_sctp_bind_connect+0x1aa/0x1f0
  [14549.036947]  security_sctp_bind_connect+0x58/0x90
  [14549.058142]  __sctp_setsockopt_connectx+0x5a/0x150 [sctp]
  [14549.081650]  sctp_setsockopt.part.24+0x1322/0x3ce0 [sctp]

Cc: stable@vger.kernel.org
Fixes: d452930fd3b9 ("selinux: Add SCTP support")
Reported-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

jbd2: fix compile warning when using JBUFFER_TRACE

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 01215d3edb0f384ddeaa5e4a22c1ae5ff634149f upstream.

The jh pointer may be used uninitialized in the two cases below and the
compiler complain about it when enabling JBUFFER_TRACE macro, fix them.

In file included from fs/jbd2/transaction.c:19:0:
fs/jbd2/transaction.c: In function ‘jbd2_journal_get_undo_access’:
./include/linux/jbd2.h:1637:38: warning: ‘jh’ is used uninitialized in this function [-Wuninitialized]
#define JBUFFER_TRACE(jh, info) do { printk("%s: %d\n", __func__, jh->b_jcount);} while (0)
                                      ^
fs/jbd2/transaction.c:1219:23: note: ‘jh’ was declared here
  struct journal_head *jh;
                       ^
In file included from fs/jbd2/transaction.c:19:0:
fs/jbd2/transaction.c: In function ‘jbd2_journal_dirty_metadata’:
./include/linux/jbd2.h:1637:38: warning: ‘jh’ may be used uninitialized in this function [-Wmaybe-uninitialized]
#define JBUFFER_TRACE(jh, info) do { printk("%s: %d\n", __func__, jh->b_jcount);} while (0)
                                      ^
fs/jbd2/transaction.c:1332:23: note: ‘jh’ was declared here
  struct journal_head *jh;
                       ^

Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

jbd2: clear dirty flag when revoking a buffer from an older transaction

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 904cdbd41d749a476863a0ca41f6f396774f26e4 upstream.

Now, we capture a data corruption problem on ext4 while we're truncating
an extent index block. Imaging that if we are revoking a buffer which
has been journaled by the committing transaction, the buffer's jbddirty
flag will not be cleared in jbd2_journal_forget(), so the commit code
will set the buffer dirty flag again after refile the buffer.

fsx                               kjournald2
                                  jbd2_journal_commit_transaction
jbd2_journal_revoke                commit phase 1~5...
jbd2_journal_forget
   belongs to older transaction    commit phase 6
   jbddirty not clear               __jbd2_journal_refile_buffer
                                     __jbd2_journal_unfile_buffer
                                      test_clear_buffer_jbddirty
                                       mark_buffer_dirty

Finally, if the freed extent index block was allocated again as data
block by some other files, it may corrupt the file data after writing
cached pages later, such as during unmount time. (In general,
clean_bdev_aliases() related helpers should be invoked after
re-allocation to prevent the above corruption, but unfortunately we
missed it when zeroout the head of extra extent blocks in
ext4_ext_handle_unwritten_extents()).

This patch mark buffer as freed and set j_next_transaction to the new
transaction when it already belongs to the committing transaction in
jbd2_journal_forget(), so that commit code knows it should clear dirty
bits when it is done with the buffer.

This problem can be reproduced by xfstests generic/455 easily with
seeds (3246 3247 3248 3249).

Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

serial: 8250_pci: Have ACCES cards that use the four port Pericom PI7C9X7954 chip use the pci_pericom_setup()

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 78d3820b9bd39028727c6aab7297b63c093db343 upstream.

The four port Pericom chips have the fourth port at the wrong address.
Make use of quirk to fix it.

Fixes: c8d192428f52 ("serial: 8250: added acces i/o products quad and octal serial cards")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Jay Dolan <jay.dolan@accesio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

serial: 8250_pci: Fix number of ports for ACCES serial cards

BugLink: https://bugs.launchpad.net/bugs/1821607
commit b896b03bc7fce43a07012cc6bf5e2ab2fddf3364 upstream.

Have the correct number of ports created for ACCES serial cards. Two port
cards show up as four ports, and four port cards show up as eight.

Fixes: c8d192428f52 ("serial: 8250: added acces i/o products quad and octal serial cards")
Signed-off-by: Jay Dolan <jay.dolan@accesio.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

serial: 8250_of: assume reg-shift of 2 for mrvl,mmp-uart

BugLink: https://bugs.launchpad.net/bugs/1821607
commit f4817843e39ce78aace0195a57d4e8500a65a898 upstream.

There are two other drivers that bind to mrvl,mmp-uart and both of them
assume register shift of 2 bits. There are device trees that lack the
property and rely on that assumption.

If this driver wins the race to bind to those devices, it should behave
the same as the older deprecated driver.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

serial: uartps: Fix stuck ISR if RX disabled with non-empty FIFO

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 7abab1605139bc41442864c18f9573440f7ca105 upstream.

If RX is disabled while there are still unprocessed bytes in RX FIFO,
cdns_uart_handle_rx() called from interrupt handler will get stuck in
the receive loop as read bytes will not get removed from the RX FIFO
and CDNS_UART_SR_RXEMPTY bit will never get set.

Avoid the stuck handler by checking first if RX is disabled. port->lock
protects against race with RX-disabling functions.

This HW behavior was mentioned by Nathan Rossi in 43e98facc4a3 ("tty:
xuartps: Fix RX hang, and TX corruption in termios call") which fixed a
similar issue in cdns_uart_set_termios().
The behavior can also be easily verified by e.g. setting
CDNS_UART_CR_RX_DIS at the beginning of cdns_uart_handle_rx() - the
following loop will then get stuck.

Resetting the FIFO using RXRST would not set RXEMPTY either so simply
issuing a reset after RX-disable would not work.

I observe this frequently on a ZynqMP board during heavy RX load at 1M
baudrate when the reader process exits and thus RX gets disabled.

Fixes: 61ec9016988f ("tty/serial: add support for Xilinx PS UART")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

dmaengine: usb-dmac: Make DMAC system sleep callbacks explicit

BugLink: https://bugs.launchpad.net/bugs/1821607
commit d9140a0da4a230a03426d175145989667758aa6a upstream.

This commit fixes the issue that USB-DMAC hangs silently after system
resumes on R-Car Gen3 hence renesas_usbhs will not work correctly
when using USB-DMAC for bulk transfer e.g. ethernet or serial
gadgets.

The issue can be reproduced by these steps:
1. modprobe g_serial
2. Suspend and resume system.
3. connect a usb cable to host side
4. Transfer data from Host to Target
5. cat /dev/ttyGS0 (Target side)
6. echo "test" > /dev/ttyACM0 (Host side)

The 'cat' will not result anything. However, system still can work
normally.

Currently, USB-DMAC driver does not have system sleep callbacks hence
this driver relies on the PM core to force runtime suspend/resume to
suspend and reinitialize USB-DMAC during system resume. After
the commit 17218e0092f8 ("PM / genpd: Stop/start devices without
pm_runtime_force_suspend/resume()"), PM core will not force
runtime suspend/resume anymore so this issue happens.

To solve this, make system suspend resume explicit by using
pm_runtime_force_{suspend,resume}() as the system sleep callbacks.
SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() is used to make sure USB-DMAC
suspended after and initialized before renesas_usbhs."

Signed-off-by: Phuong Nguyen <phuong.nguyen.xw@renesas.com>
Signed-off-by: Hiroyuki Yokoyama <hiroyuki.yokoyama.vx@renesas.com>
Cc: <stable@vger.kernel.org> # v4.16+
[shimoda: revise the commit log and add Cc tag]
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

usb: typec: tps6598x: handle block writes separately with plain-I2C adapters

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 8a863a608d47fa5d9dd15cf841817f73f804cf91 upstream.

Commit 1a2f474d328f handles block _reads_ separately with plain-I2C
adapters, but the problem described with regmap-i2c not handling
SMBus block transfers (i.e. read and writes) correctly also exists
with writes.

As workaround, this patch adds a block write function the same way
1a2f474d328f adds a block read function.

Fixes: 1a2f474d328f ("usb: typec: tps6598x: handle block reads separately with plain-I2C adapters")
Fixes: 0a4c005bd171 ("usb: typec: driver for TI TPS6598x USB Power Delivery controllers")
Signed-off-by: Nikolaus Voss <nikolaus.voss@loewensteinmedical.de>
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

usb: chipidea: tegra: Fix missed ci_hdrc_remove_device()

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 563b9372f7ec57e44e8f9a8600c5107d7ffdd166 upstream.

The ChipIdea's platform device need to be unregistered on Tegra's driver
module removal.

Fixes: dfebb5f43a78827a ("usb: chipidea: Add support for Tegra20/30/114/124")
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Acked-by: Peter Chen <peter.chen@nxp.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

clk: ingenic: Fix doc of ingenic_cgu_div_info

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 7ca4c922aad2e3c46767a12f80d01c6b25337b59 upstream.

The 'div' field does not represent a number of bits used to divide
(understand: right-shift) the divider, but a number itself used to
divide the divider.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Maarten ter Huurne <maarten@treewalker.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

clk: ingenic: Fix round_rate misbehaving with non-integer dividers

BugLink: https://bugs.launchpad.net/bugs/1821607
commit bc5d922c93491878c44c9216e9d227c7eeb81d7f upstream.

Take a parent rate of 180 MHz, and a requested rate of 4.285715 MHz.
This results in a theorical divider of 41.999993 which is then rounded
up to 42. The .round_rate function would then return (180 MHz / 42) as
the clock, rounded down, so 4.285714 MHz.

Calling clk_set_rate on 4.285714 MHz would round the rate again, and
give a theorical divider of 42,0000028, now rounded up to 43, and the
rate returned would be (180 MHz / 43) which is 4.186046 MHz, aka. not
what we requested.

Fix this by rounding up the divisions.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Tested-by: Maarten ter Huurne <maarten@treewalker.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

clk: samsung: exynos5: Fix kfree() of const memory on setting driver_override

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 785c9f411eb2d9a6076d3511c631587d5e676bf3 upstream.

Platform driver driver_override field should not be initialized from
const memory because the core later kfree() it.  If driver_override is
manually set later through sysfs, kfree() of old value leads to:

    $ echo "new_value" > /sys/bus/platform/drivers/.../driver_override

    kernel BUG at ../mm/slub.c:3960!
    Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
    ...
    (kfree) from [<c058e8c0>] (platform_set_driver_override+0x84/0xac)
    (platform_set_driver_override) from [<c058e908>] (driver_override_store+0x20/0x34)
    (driver_override_store) from [<c031f778>] (kernfs_fop_write+0x100/0x1dc)
    (kernfs_fop_write) from [<c0296de8>] (__vfs_write+0x2c/0x17c)
    (__vfs_write) from [<c02970c4>] (vfs_write+0xa4/0x188)
    (vfs_write) from [<c02972e8>] (ksys_write+0x4c/0xac)
    (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)

The clk-exynos5-subcmu driver uses override only for the purpose of
creating meaningful names for children devices (matching names of power
domains, e.g. DISP, MFC).  The driver_override was not developed for
this purpose so just switch to default names of devices to fix the
issue.

Fixes: b06a532bf1fa ("clk: samsung: Add Exynos5 sub-CMU clock driver")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

clk: samsung: exynos5: Fix possible NULL pointer exception on platform_device_alloc() failure

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 5f0b6216ea381b43c0dff88702d6cc5673d63922 upstream.

During initialization of subdevices if platform_device_alloc() failed,
returned NULL pointer will be later dereferenced. Add proper error
paths to exynos5_clk_register_subcmu(). The return value of this
function is still ignored because at this stage of init there is nothing
we can do.

Fixes: b06a532bf1fa ("clk: samsung: Add Exynos5 sub-CMU clock driver")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

clk: clk-twl6040: Fix imprecise external abort for pdmclk

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 5ae51d67aec95f6f9386aa8dd5db424964895575 upstream.

I noticed that modprobe clk-twl6040 can fail after a cold boot with:
abe_cm:clk:0010:0: failed to enable
...
Unhandled fault: imprecise external abort (0x1406) at 0xbe896b20

WARNING: CPU: 1 PID: 29 at drivers/clk/clk.c:828 clk_core_disable_lock+0x18/0x24
...
(clk_core_disable_lock) from [<c0123534>] (_disable_clocks+0x18/0x90)
(_disable_clocks) from [<c0124040>] (_idle+0x17c/0x244)
(_idle) from [<c0125ad4>] (omap_hwmod_idle+0x24/0x44)
(omap_hwmod_idle) from [<c053a038>] (sysc_runtime_suspend+0x48/0x108)
(sysc_runtime_suspend) from [<c06084c4>] (__rpm_callback+0x144/0x1d8)
(__rpm_callback) from [<c0608578>] (rpm_callback+0x20/0x80)
(rpm_callback) from [<c0607034>] (rpm_suspend+0x120/0x694)
(rpm_suspend) from [<c0607a78>] (__pm_runtime_idle+0x60/0x84)
(__pm_runtime_idle) from [<c053aaf0>] (sysc_probe+0x874/0xf2c)
(sysc_probe) from [<c05fecd4>] (platform_drv_probe+0x48/0x98)

After searching around for a similar issue, I came across an earlier fix
that never got merged upstream in the Android tree for glass-omap-xrr02.
There is patch "MFD: twl6040-codec: Implement PDMCLK cold temp errata"
by Misael Lopez Cruz <misael.lopez@ti.com>.

Based on my observations, this fix is also needed when cold booting
devices, and not just for deeper idle modes. Since we now have a clock
driver for pdmclk, let's fix the issue in twl6040_pdmclk_prepare().

Cc: Misael Lopez Cruz <misael.lopez@ti.com>
Cc: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

clk: uniphier: Fix update register for CPU-gear

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 521282237b9d78b9bff423ec818becd4c95841c2 upstream.

Need to set the update bit in UNIPHIER_CLK_CPUGEAR_UPD to update
the CPU-gear value.

Fixes: d08f1f0d596c ("clk: uniphier: add CPU-gear change (cpufreq) support")
Cc: linux-stable@vger.kernel.org
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

ext2: Fix underflow in ext2_max_size()

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 1c2d14212b15a60300a2d4f6364753e87394c521 upstream.

When ext2 filesystem is created with 64k block size, ext2_max_size()
will return value less than 0. Also, we cannot write any file in this fs
since the sb->maxbytes is less than 0. The core of the problem is that
the size of block index tree for such large block size is more than
i_blocks can carry. So fix the computation to count with this
possibility.

File size limits computed with the new function for the full range of
possible block sizes look like:

bits file_size
10     17247252480
11    275415851008
12   2196873666560
13   2197948973056
14   2198486220800
15   2198754754560
16   2198888906752

CC: stable@vger.kernel.org
Reported-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

cxl: Wrap iterations over afu slices inside 'afu_list_lock'

BugLink: https://bugs.launchpad.net/bugs/1821607
commit edeb304f659792fb5bab90d7d6f3408b4c7301fb upstream.

Within cxl module, iteration over array 'adapter->afu' may be racy
at few points as it might be simultaneously read during an EEH and its
contents being set to NULL while driver is being unloaded or unbound
from the adapter. This might result in a NULL pointer to 'struct afu'
being de-referenced during an EEH thereby causing a kernel oops.

This patch fixes this by making sure that all access to the array
'adapter->afu' is wrapped within the context of spin-lock
'adapter->afu_list_lock'.

Fixes: 9e8df8a21963 ("cxl: EEH support")
Cc: stable@vger.kernel.org # v4.3+
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Acked-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

IB/rdmavt: Fix concurrency panics in QP post_send and modify to error

BugLink: https://bugs.launchpad.net/bugs/1821607
commit d757c60eca9b22f4d108929a24401e0fdecda0b1 upstream.

The RC/UC code path can go through a software loopback. In this code path
the receive side QP is manipulated.

If two threads are working on the QP receive side (i.e. post_send, and
modify_qp to an error state), QP information can be corrupted.

(post_send via loopback)
  set r_sge
  loop
     update r_sge
(modify_qp)
     take r_lock
     update r_sge <---- r_sge is now incorrect
(post_send)
     update r_sge <---- crash, etc.
     ...

This can lead to one of the two following crashes:

BUG: unable to handle kernel NULL pointer dereference at (null)
  IP:  hfi1_copy_sge+0xf1/0x2e0 [hfi1]
  PGD 8000001fe6a57067 PUD 1fd9e0c067 PMD 0
Call Trace:
  ruc_loopback+0x49b/0xbc0 [hfi1]
  hfi1_do_send+0x38e/0x3e0 [hfi1]
  _hfi1_do_send+0x1e/0x20 [hfi1]
  process_one_work+0x17f/0x440
  worker_thread+0x126/0x3c0
  kthread+0xd1/0xe0
  ret_from_fork_nospec_begin+0x21/0x21

or:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
  IP:  rvt_clear_mr_refs+0x45/0x370 [rdmavt]
  PGD 80000006ae5eb067 PUD ef15d0067 PMD 0
Call Trace:
  rvt_error_qp+0xaa/0x240 [rdmavt]
  rvt_modify_qp+0x47f/0xaa0 [rdmavt]
  ib_security_modify_qp+0x8f/0x400 [ib_core]
  ib_modify_qp_with_udata+0x44/0x70 [ib_core]
  modify_qp.isra.23+0x1eb/0x2b0 [ib_uverbs]
  ib_uverbs_modify_qp+0xaa/0xf0 [ib_uverbs]
  ib_uverbs_write+0x272/0x430 [ib_uverbs]
  vfs_write+0xc0/0x1f0
  SyS_write+0x7f/0xf0
  system_call_fastpath+0x1c/0x21

Fix by using the appropriate locking on the receiving QP.

Fixes: 15703461533a ("IB/{hfi1, qib, rdmavt}: Move ruc_loopback to rdmavt")
Cc: <stable@vger.kernel.org> #v4.9+
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

IB/rdmavt: Fix loopback send with invalidate ordering

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 38bbc9f0381550d1d227fc57afa08436e36b32fc upstream.

The IBTA spec notes:

o9-5.2.1: For any HCA which supports SEND with Invalidate, upon receiving
an IETH, the Invalidate operation must not take place until after the
normal transport header validation checks have been successfully
completed.

The rdmavt loopback code does the validation after the invalidate.

Fix by relocating the operation specific logic for all SEND variants until
after the validity checks.

Cc: <stable@vger.kernel.org> #v4.20+
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

IB/hfi1: Close race condition on user context disable and close

BugLink: https://bugs.launchpad.net/bugs/1821607
commit bc5add09764c123f58942a37c8335247e683d234 upstream.

When disabling and removing a receive context, it is possible for an
asynchronous event (i.e IRQ) to occur.  Because of this, there is a race
between cleaning up the context, and the context being used by the
asynchronous event.

cpu 0  (context cleanup)
    rc->ref_count-- (ref_count == 0)
    hfi1_rcd_free()
cpu 1  (IRQ (with rcd index))
rcd_get_by_index()
lock
ref_count+++     <-- reference count race (WARNING)
return rcd
unlock
cpu 0
    hfi1_free_ctxtdata() <-- incorrect free location
    lock
    remove rcd from array
    unlock
    free rcd

This race will cause the following WARNING trace:

WARNING: CPU: 0 PID: 175027 at include/linux/kref.h:52 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
CPU: 0 PID: 175027 Comm: IMB-MPI1 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1
Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
Call Trace:
  dump_stack+0x19/0x1b
  __warn+0xd8/0x100
  warn_slowpath_null+0x1d/0x20
  hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
  is_rcv_urgent_int+0x24/0x90 [hfi1]
  general_interrupt+0x1b6/0x210 [hfi1]
  __handle_irq_event_percpu+0x44/0x1c0
  handle_irq_event_percpu+0x32/0x80
  handle_irq_event+0x3c/0x60
  handle_edge_irq+0x7f/0x150
  handle_irq+0xe4/0x1a0
  do_IRQ+0x4d/0xf0
  common_interrupt+0x162/0x162

The race can also lead to a use after free which could be similar to:

general protection fault: 0000 1 SMP
CPU: 71 PID: 177147 Comm: IMB-MPI1 Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1
Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
task: ffff9962a8098000 ti: ffff99717a508000 task.ti: ffff99717a508000 __kmalloc+0x94/0x230
Call Trace:
  ? hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
  hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
  hfi1_aio_write+0xba/0x110 [hfi1]
  do_sync_readv_writev+0x7b/0xd0
  do_readv_writev+0xce/0x260
  ? handle_mm_fault+0x39d/0x9b0
  ? pick_next_task_fair+0x5f/0x1b0
  ? sched_clock_cpu+0x85/0xc0
  ? __schedule+0x13a/0x890
  vfs_writev+0x35/0x60
  SyS_writev+0x7f/0x110
  system_call_fastpath+0x22/0x27

Use the appropriate kref API to verify access.

Reorder context cleanup to ensure context removal before cleanup occurs
correctly.

Cc: stable@vger.kernel.org # v4.14.0+
Fixes: f683c80ca68e ("IB/hfi1: Resolve kernel panics by reference counting receive contexts")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

PCI: pci-bridge-emul: Extend pci_bridge_emul_init() with flags

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 33776d059630e5045ea9ccf756c74de8f9cc86de upstream.

Depending on the capabilities of the PCI controller/platform, the
PCI-to-PCI bridge emulation behavior might need to be different. For
example, on platforms that use the pci-mvebu code, we currently don't
support prefetchable memory BARs, so the corresponding fields in the
PCI-to-PCI bridge configuration space should be read-only.

To implement this, extend pci_bridge_emul_init() to take a "flags"
argument, with currently one flag supported:

PCI_BRIDGE_EMUL_NO_PREFETCHABLE_BAR

that will make the prefetchable memory base and limit registers
read-only.

The pci-mvebu and pci-aardvark drivers are updated accordingly.

Fixes: 1f08673eef123 ("PCI: mvebu: Convert to PCI emulated bridge config space")
Reported-by: Luís Mendes <luis.p.mendes@gmail.com>
Reported-by: Leigh Brown <leigh@solinno.co.uk>
Tested-by: Leigh Brown <leigh@solinno.co.uk>
Tested-by: Luis Mendes <luis.p.mendes@gmail.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: stable@vger.kernel.org
Cc: Luís Mendes <luis.p.mendes@gmail.com>
Cc: Leigh Brown <leigh@solinno.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

PCI: pci-bridge-emul: Create per-bridge copy of register behavior

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 59f81c35e0df840f7112cb296dde48df84a67c79 upstream.

The behavior of the different registers of the PCI-to-PCI bridge is
currently encoded in two global arrays, shared by all instances of
PCI-to-PCI bridge emulation.

However, we will need to tweak the behavior on a per-bridge basis, to
accommodate for different capabilities of the platforms where this
code is used. In preparation for this, create a per-bridge copy of the
register behavior arrays, so that they can later be tweaked on a
per-bridge basis.

Fixes: 1f08673eef123 ("PCI: mvebu: Convert to PCI emulated bridge config space")
Reported-by: Luís Mendes <luis.p.mendes@gmail.com>
Reported-by: Leigh Brown <leigh@solinno.co.uk>
Tested-by: Leigh Brown <leigh@solinno.co.uk>
Tested-by: Luis Mendes <luis.p.mendes@gmail.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: stable@vger.kernel.org
Cc: Luís Mendes <luis.p.mendes@gmail.com>
Cc: Leigh Brown <leigh@solinno.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

PCI: dwc: skip MSI init if MSIs have been explicitly disabled

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 3afc8299f39a27b60e1519a28e18878ce878e7dd upstream.

Since 7c5925afbc58 (PCI: dwc: Move MSI IRQs allocation to IRQ domains
hierarchical API) the MSI init claims one of the controller IRQs as a
chained IRQ line for the MSI controller. On some designs, like the i.MX6,
this line is shared with a PCIe legacy IRQ. When the line is claimed for
the MSI domain, any device trying to use this legacy IRQs will fail to
request this IRQ line.

As MSI and legacy IRQs are already mutually exclusive on the DWC core,
as the core won't forward any legacy IRQs once any MSI has been enabled,
users wishing to use legacy IRQs already need to explictly disable MSI
support (usually via the pci=nomsi kernel commandline option). To avoid
any issues with MSI conflicting with legacy IRQs, just skip all of the
DWC MSI initalization, including the IRQ line claim, when MSI is disabled.

Fixes: 7c5925afbc58 ("PCI: dwc: Move MSI IRQs allocation to IRQ domains hierarchical API")
Tested-by: Tim Harvey <tharvey@gateworks.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

PCI: qcom: Don't deassert reset GPIO during probe

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 02b485e31d98265189b91f3e69c43df2ed50610c upstream.

Acquiring the reset GPIO low means that reset is being deasserted, this
is followed almost immediately with qcom_pcie_host_init() asserting it,
initializing it and then finally deasserting it again, for the link to
come up.

Some PCIe devices requires a minimum time between the initial deassert
and subsequent reset cycles. In a platform that boots with the reset
GPIO asserted this requirement is being violated by this deassert/assert
pulse.

Acquire the reset GPIO high to prevent this situation by matching the
state to the subsequent asserted state.

Fixes: 82a823833f4e ("PCI: qcom: Add Qualcomm PCIe controller driver")
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
[lorenzo.pieralisi@arm.com: updated commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Stanimir Varbanov <svarbanov@mm-sol.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

PCI/DPC: Fix print AER status in DPC event handling

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 9f08a5d896ce43380314c34ed3f264c8e6075b80 upstream.

Previously dpc_handler() called aer_get_device_error_info() without
initializing info->severity, so aer_get_device_error_info() relied on
uninitialized data.

Add dpc_get_aer_uncorrect_severity() to read the port's AER status, mask,
and severity registers and set info->severity.

Also, clear the port's AER fatal error status bits.

Fixes: 8aefa9b0d910 ("PCI/DPC: Print AER status in DPC event handling")
Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

PCI/ASPM: Use LTR if already enabled by platform

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 10ecc818ea7319b5d0d2b4e1aa6a77323e776f76 upstream.

RussianNeuroMancer reported that the Intel 7265 wifi on a Dell Venue 11 Pro
7140 table stopped working after wakeup from suspend and bisected the
problem to 9ab105deb60f ("PCI/ASPM: Disable ASPM L1.2 Substate if we don't
have LTR"). David Ward reported the same problem on a Dell Latitude 7350.

After af8bb9f89838 ("PCI/ACPI: Request LTR control from platform before
using it"), we don't enable LTR unless the platform has granted LTR control
to us. In addition, we don't notice if the platform had already enabled
LTR itself.

After 9ab105deb60f ("PCI/ASPM: Disable ASPM L1.2 Substate if we don't have
LTR"), we avoid using LTR if we don't think the path to the device has LTR
enabled.

The combination means that if the platform itself enables LTR but declines
to give the OS control over LTR, we unnecessarily avoided using ASPM L1.2.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=201469
Fixes: 9ab105deb60f ("PCI/ASPM: Disable ASPM L1.2 Substate if we don't have LTR")
Fixes: af8bb9f89838 ("PCI/ACPI: Request LTR control from platform before using it")
Reported-by: RussianNeuroMancer <russianneuromancer@ya.ru>
Reported-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v4.18+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

swiotlb: Add is_swiotlb_active() function

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 492366f7b4237257ef50ca9c431a6a0d50225aca upstream.

This function will be used from dma_direct code to determine
the maximum segment size of a dma mapping.

Cc: stable@vger.kernel.org
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

swiotlb: Introduce swiotlb_max_mapping_size()

BugLink: https://bugs.launchpad.net/bugs/1821607
commit abe420bfae528c92bd8cc5ecb62dc95672b1fd6f upstream.

The function returns the maximum size that can be remapped
by the SWIOTLB implementation. This function will be later
exposed to users through the DMA-API.

Cc: stable@vger.kernel.org
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

dma: Introduce dma_max_mapping_size()

BugLink: https://bugs.launchpad.net/bugs/1821607
commit 133d624b1cee16906134e92d5befb843b58bcf31 upstream.

The function returns the maximum size that can be mapped
using DMA-API functions. The patch also adds the
implementation for direct DMA and a new dma_map_ops pointer
so that other implementations can expose their limit.

Cc: stable@vger.kernel.org
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

ext4: fix crash during online resizing

BugLink: https://bugs.launchpad.net/bugs/1821607
commit f96c3ac8dfc24b4e38fc4c2eba5fea2107b929d1 upstream.

When computing maximum size of filesystem possible with given number of
group descriptor blocks, we forget to include s_first_data_block into
the number of blocks. Thus for filesystems with non-zero
s_first_data_block it can happen that computed maximum filesystem size
is actually lower than current filesystem size which confuses the code
and eventually leads to a BUG_ON in ext4_alloc_group_tables() hitting on
flex_gd->count == 0. The problem can be reproduced like:

truncate -s 100g /tmp/image
mkfs.ext4 -b 1024 -E resize=262144 /tmp/image 32768
mount -t ext4 -o loop /tmp/image /mnt
resize2fs /dev/loop0 262145
resize2fs /dev/loop0 300000

Fix the problem by properly including s_first_data_block into the
computed number of filesystem blocks.

Fixes: 1c6bd7173d66 "ext4: convert file system to meta_bg if needed..."
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

ext4: add mask of ext4 flags to swap

BugLink: https://bugs.launchpad.net/bugs/1821607
commit abdc644e8cbac2e9b19763680e5a7cf9bab2bee7 upstream.

The reason is that while swapping two inode, we swap the flags too.
Some flags such as EXT4_JOURNAL_DATA_FL can really confuse the things
since we're not resetting the address operations structure. The
simplest way to keep things sane is to restrict the flags that can be
swapped.

Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

ext4: update quota information while swapping boot loader inode

BugLink: https://bugs.launchpad.net/bugs/1821607
commit aa507b5faf38784defe49f5e64605ac3c4425e26 upstream.

While do swap between two inode, they swap i_data without update
quota information. Also, swap_inode_boot_loader can do "revert"
somtimes, so update the quota while all operations has been finished.

Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>