git.proxmox.com Git - mirror_ubuntu-zesty-kernel.git/log

devlink: allow to fillup eswitch attrs even if mode_get op does not exist

BugLink: http://bugs.launchpad.net/bugs/1676388
Even when mode_get op is not present, other eswitch attrs need to be
filled-up.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4456f61cfd2a589c4368fe0b9080b646b9bd470d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

devlink: use nla_put_failure goto label instead of out

BugLink: http://bugs.launchpad.net/bugs/1676388
Be aligned with the rest of the code and use label named nla_put_failure.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1a6aa36b6f92b1a2f2e6789f6785372d4d6ddca9)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

devlink: rename devlink_eswitch_fill to devlink_nl_eswitch_fill

BugLink: http://bugs.launchpad.net/bugs/1676388
Be aligned with the rest of the file and name the helper function
accordingly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 21e3d2dd4a19f842e7d134c341eb584970ff3b32)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

devlink: fix the name of eswitch commands

BugLink: http://bugs.launchpad.net/bugs/1676388
The eswitch_[gs]et command is supposed to be similar to port_[gs]et
command - for multiple eswitch attributes. However, when it was introduced
by 08f4b5918b2d ("net/devlink: Add E-Switch mode control") it was wrongly
named with the word "mode" in it. So fix this now, make the oririnal
enum value existing but obsolete.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit adf200f31c000d707e4afe238ed1d1199e0cce7c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net/mlx5e: Avoid wrong identification of rules on deletion

BugLink: http://bugs.launchpad.net/bugs/1676388
When deleting offloaded TC flows, we must correctly identify E-switch
rules. The current check could get us wrong w.r.t to rules set on the
PF. Since it's possible to set NIC rules on the PF, switch to SRIOV
offloads mode and then attempt to delete a NIC rule.

To solve that, we add a flags field to offloaded rules, set it on
creation time and use that over the code where currently needed.

Fixes: 8b32580df1cb ('net/mlx5e: Add TC vlan action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 65ba8fb7d5c6803ec236bb8d6650465fed7f9769)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: [Debian] add rprovides for spl-modules and zfs-modules

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: efi: arm-stub: Round up FDT allocation to mapping size

The FDT is mapped via a fixmap entry that is at least 2 MB in size and
2 MB aligned on 4 KB page size kernels.

On UEFI systems, the FDT allocation may share this 2 MB block with a
reserved region, or another memory region that we should never map,
unless we account for this in the size of the allocation (the alignment
is already 2 MB)

So instead of taking guesses at the needed space, simply allocate 2 MB
immediately. The allocation will be recorded as a EFI_LOADER_DATA, and
the kernel only memblock_reserve()'s the actual size of the FDT, so the
unused space will be released to the kernel.

BugLink: http://bugs.launchpad.net/bugs/1675046
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-By: Jeffrey Hugo <jhugo@codeaurora.org>
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: efi: arm-stub: Correct FDT and initrd allocation rules for arm64

On arm64, we have made some changes over the past year to the way the
kernel itself is allocated and to how it deals with the initrd and FDT.
This patch brings the allocation logic in the EFI stub in line with that,
which is necessary because the introduction of KASLR has created the
possibility for the initrd to be allocated in a place where the kernel
may not be able to map it. (This is mostly a theoretical scenario, since
it only affects systems where the physical memory footprint exceeds the
size of the linear mapping.)

Since we know the kernel itself will be covered by the linear mapping,
choose a suitably sized window (i.e., based on the size of the linear
region) covering the kernel when allocating memory for the initrd.

The FDT may be anywhere in memory on arm64 now that we map it via the
fixmap, so we can lift the address restriction there completely.

BugLink: http://bugs.launchpad.net/bugs/1675046
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-By: Jeffrey Hugo <jhugo@codeaurora.org>
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

scsi: aacraid: Fix potential null access

BugLink: http://bugs.launchpad.net/bugs/1675872
Currently, command threads fails to return ioctls commands for older
controller versions, since it returns when all the fibs have been
allocated. Another issue is even all the fibs have not been allocated,
the correct allocated fibs is not updated nor freed.

Fixes: 113156bcea9ef1e6 (scsi: aacraid: Reworked aac_command_thread)
Reported-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit e498520edec6655e93ac5e768b04f4fd2299fe4d)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: aacraid: Fix typo in blink status

BugLink: http://bugs.launchpad.net/bugs/1675872
The return status of the adapter check on KERNEL_PANIC is supposed to be
the upper 16 bits of the OMR status register.

Fixes: c421530bf848604e (scsi: aacraid: Reorder Adpater status check)
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 934767c56b0d9dbb95a40e9e6e4d9dcdc3a165ad)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: aacraid: remove redundant zero check on ret

BugLink: http://bugs.launchpad.net/bugs/1675872
The check for ret being zero is redundant as a few statements earlier we
break out of the while loop if ret is non-zero. Thus we can remove the
zero check and also the dead-code non-zero case too.

Detected by CoverityScan, CID#1411632 ("Logically Dead Code")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit fbdab3e7fd547e1ce558db1521659707bdf02cc6)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

ext4: lock the xattr block before checksuming it

BugLink: http://bugs.launchpad.net/bugs/1658633
We must lock the xattr block before calculating or verifying the
checksum in order to avoid spurious checksum failures.

https://bugzilla.kernel.org/show_bug.cgi?id=193661

Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
(cherry picked from commit dac7a4b4b1f664934e8b713f529b629f67db313c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Colin King <colin.king@canonical.com>

net/mlx5e: Avoid supporting udp tunnel port ndo for VF reps

BugLink: http://bugs.launchpad.net/bugs/1676388
This was added to allow the TC offloading code to identify offloading
encap/decap vxlan rules.

The VF reps are effectively related to the same mlx5 PCI device as the
PF. Since the kernel invokes the (say) delete ndo for each netdev, the
FW erred on multiple vxlan dst port deletes when the port was deleted
from the system.

We fix that by keeping the registration to be carried out only by the
PF. Since the PF serves as the uplink device, the VF reps will look
up a port there and realize if they are ok to offload that.

Tested:
<SETUP VFS>
<SETUP switchdev mode to have representors>
ip link add vxlan1 type vxlan id 44 dev ens5f0 dstport 9999
ip link set vxlan1 up
ip link del dev vxlan1

Fixes: 4a25730eb202 ('net/mlx5e: Add ndo_udp_tunnel_add to VF representors')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1ad9a00ae0efc2e9337148d6c382fad3d27bf99a)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net/mlx5: Fix create autogroup prev initializer

BugLink: http://bugs.launchpad.net/bugs/1676388
The autogroups list is a list of non overlapping group boundaries
sorted by their start index. If the autogroups list wasn't empty
and an empty group slot was found at the start of the list,
the new group was added to the end of the list instead of the
beginning, as the prev initializer was incorrect.
When this was repeated, it caused multiple groups to have
overlapping boundaries.

Fixed that by correctly initializing the prev pointer to the
start of the list.

Fixes: eccec8da3b4e ('net/mlx5: Keep autogroups list ordered')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit af36370569eb37420e1e78a2e60c277b781fcd00)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: arm64: arch_timer: Add check for unknown erratum

BugLink: https://bugs.launchpad.net/bugs/1675509
If an unknown erratum type is passed into arch_timer_check_ool_workaround(),
we would call arch_timer_iterate_errata with a NULL match_fn and trigger
an Oops. This does not look possible with the existing code (all types are
accounted for), but emit an error and return if it happens to come to pass
in the future.

Reported-by: Seth Forshee <seth.foreshee@canonical.com>
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: arm64: arch_timer: Add HISILICON_ERRATUM_161010101 ACPI matching data

BugLink: https://bugs.launchpad.net/bugs/1675509
In order to deal with ACPI enabled platforms suffering from the
HISILICON_ERRATUM_161010101, let's add the required OEM data that
allow the workaround to be enabled.

Tested-by: dann frazier <dann.frazier@canonical.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit 68fedd693bbdf8601667302ee12e677e50fe06d8
in the timers/errata-rework branch of
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: arm64: arch_timer: Allow erratum matching with ACPI OEM information

BugLink: https://bugs.launchpad.net/bugs/1675509
Just as we're able to identify a broken platform using some DT
information, let's enable a way to spot the offenders with ACPI.

The difference is that we can only match on some OEM info instead
of implementation-specific properties. So in order to avoid the
insane multiplication of errata structures, we allow an array
of OEM descriptions to be attached to an erratum structure.

Tested-by: dann frazier <dann.frazier@canonical.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit b6789e1ee3a11f2edcd43ce13628a6de664a2c28
in the timers/errata-rework branch of
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: arm64: arch_timer: Workaround for Cortex-A73 erratum 858921

BugLink: https://bugs.launchpad.net/bugs/1675509
Cortex-A73 (all versions) counter read can return a wrong value
when the counter crosses a 32bit boundary.

The workaround involves performing the read twice, and to return
one or the other depending on whether a transition has taken place.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit 7b1f8a2afb42458f5878060aec51ad9c28cbf653
in the timers/errata-rework branch of
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: [Config] CONFIG_ARM64_ERRATUM_858921=y

Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: arm64: arch_timer: Enable CNTVCT_EL0 trap if workaround is enabled

BugLink: https://bugs.launchpad.net/bugs/1675509
Userspace being allowed to use read CNTVCT_EL0 anytime (and not
only in the VDSO), we need to enable trapping whenever a cntvct
workaround is enabled on a given CPU.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit 13fbbd32946ac882cdd3f07c02687f6b860d1a7f
in the timers/errata-rework branch of
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: arm64: arch_timer: Move clocksource_counter and co around

BugLink: https://bugs.launchpad.net/bugs/1675509
In order to access clocksource_counter from the errata handling code,
move it (together with the related structures and functions) towards
the top of the file.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(backported from commit d610f30af30c621399d2e7f5111ddc695f74311b
in the timers/errata-rework branch of
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: arm64: arch_timer: Allows a CPU-specific erratum to only affect a subset of CPUs

BugLink: https://bugs.launchpad.net/bugs/1675509
Instead of applying a CPU-specific workaround to all CPUs in the system,
allow it to only affect a subset of them (typical big-little case).

This is done by turning the erratum pointer into a per-CPU variable.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit a47c823df361fb477bdc98cb8d7d53841f6624a6
in the timers/errata-rework branch of
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: arm64: arch_timer: Make workaround methods optional

BugLink: https://bugs.launchpad.net/bugs/1675509
Not all errata need to workaround all access types. Allow them to
be optional.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit 7ecb0c2a0e111449128d17f559f0d0c6828d784c
in the timers/errata-rework branch of
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: arm64: arch_timer: Rework the set_next_event workarounds

BugLink: https://bugs.launchpad.net/bugs/1675509
The way we work around errata affecting set_next_event is not very
nice, at it imposes this workaround on errata that do not need it.

Add new workaround hooks and let the existing workarounds use them.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit afb31094daae2cb654266707293d992bb491576d
in the timers/errata-rework branch of
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: arm64: arch_timer: Get rid of erratum_workaround_set_sne

BugLink: https://bugs.launchpad.net/bugs/1675509
Let's move the handling of workarounds affecting set_next_event
to the affected function, instead of overriding the pointers
as an afterthough. Yes, this is an extra indirection on the
erratum handling path, but the HW is busted anyway.

This will allow for some more flexibility later.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit 32e0bde2c5371a9747ce917db8156e29d5c9fc8e
in the timers/errata-rework branch of
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: arm64: arch_timer: Move arch_timer_reg_read/write around

BugLink: https://bugs.launchpad.net/bugs/1675509
As we're about to move things around, let's start with the low
level read/write functions. This allows us to use these functions
in the errata handling code without having to use forward declaration
of static functions.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
(cherry picked from commit 685cbf2298afcf404bd519361c0915154ba48c89
in the timers/errata-rework branch of
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: arm64: arch_timer: Add erratum handler for CPU-specific capability

BugLink: https://bugs.launchpad.net/bugs/1675509
Should we ever have a workaround for an erratum that is detected using
a capability and affecting a particular CPU, it'd be nice to have
a way to probe them directly.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit 062d3f37a3e173053e0722f40fa15a46d82c7301
in the timers/errata-rework branch of
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: arm64: arch_timer: Add erratum handler for globally defined capability

BugLink: https://bugs.launchpad.net/bugs/1675509
Should we ever have a workaround for an erratum that is detected using
a capability (and affecting the whole system), it'd be nice to have
a way to probe them directly.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit b50b1f54abc59e7f4913b9bf476c6e89259776b7
in the timers/errata-rework branch of
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: arm64: arch_timer: Add infrastructure for multiple erratum detection methods

BugLink: https://bugs.launchpad.net/bugs/1675509
We're currently stuck with DT when it comes to handling errata, which
is pretty restrictive. In order to make things more flexible, let's
introduce an infrastructure that could support alternative discovery
methods. No change in functionality.

Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit 180cb954f2b26c6bccfd1a1229e6bb7c8fb07167
in the timers/errata-rework branch of
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: arm64: cpu_errata: Add capability to advertise Cortex-A73 erratum 858921

BugLink: https://bugs.launchpad.net/bugs/1675509
In order to work around Cortex-A73 erratum 858921 in a subsequent
patch, add the required capability that advertise the erratum.

As the configuration option it depends on is not present yet,
this has no immediate effect.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
(cherry picked from commit b0f0a6a803aada6268a546c3faf1987e36cbb3ca
in the timers/errata-rework branch of
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: arm64: cpu_errata: Allow an erratum to be match for all revisions of a core

BugLink: https://bugs.launchpad.net/bugs/1675509
Some minor erratum may not be fixed in further revisions of a core,
leading to a situation where the workaround needs to be updated each
time an updated core is released.

Introduce a MIDR_ALL_VERSIONS match helper that will work for all
versions of that MIDR, once and for all.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
(cherry picked from commit 8b8ebca44292a59789c612c0e060c58cb7e0be40
in the timers/errata-rework branch of
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: arm64: Define Cortex-A73 MIDR

BugLink: https://bugs.launchpad.net/bugs/1675509
As we're about to introduce a new workaround that is specific to
Cortex-A73, let's define the coresponding MIDR.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit fcd3b9e93c6e735185c2e8187408847dd3295c20
in the timers/errata-rework branch of
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: arm64: Add CNTVCT_EL0 trap handler

BugLink: https://bugs.launchpad.net/bugs/1675509
Since people seem to make a point in breaking the userspace visible
counter, we have no choice but to trap the access. Add the required
handler.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit 2b88ddaa935c5d004c951b23089256f7f25fcaa3
in the timers/errata-rework branch of
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: arm64: Allow checking of a CPU-local erratum

BugLink: https://bugs.launchpad.net/bugs/1675509
this_cpu_has_cap() only checks the feature array, and not the errata
one. In order to be able to check for a CPU-local erratum, allow it
to inspect the latter as well.

This is consistent with cpus_have_cap()'s behaviour, which includes
errata already.

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
(cherry picked from commit cd3fc121809c82f577363d112bf028b2bd3a655d
in the timers/errata-rework branch of
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

arm64: ptrace: add XZR-safe regs accessors

BugLink: http://bugs.launchpad.net/bugs/1675509
In A64, XZR and the SP share the same encoding (31), and whether an
instruction accesses XZR or SP for a particular register parameter
depends on the definition of the instruction.

We store the SP in pt_regs::regs[31], and thus when emulating
instructions, we must be careful to not erroneously read from or write
back to the saved SP. Unfortunately, we often fail to be this careful.

In all cases, instructions using a transfer register parameter Xt use
this to refer to XZR rather than SP. This patch adds helpers so that we
can more easily and consistently handle these cases.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
(cherry picked from commit 6c23e2ff7013be2c4bbcb7b9b3cc27c763348223)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

Linux 4.10.6

BugLink: http://bugs.launchpad.net/bugs/1676429
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

drm/amdgpu/si: add dpm quirk for Oland

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 18a8de1bc37e97dff1c96ee6cf49adbd02a0f775 upstream.

OLAND 0x1002:0x6604 0x1028:0x066F 0x00 seems to have problems
with higher sclks.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

cgroup/pids: remove spurious suspicious RCU usage warning

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 1d18c2747f937f1b5ec65ce6bf4ccb9ca1aea9e8 upstream.

pids_can_fork() is special in that the css association is guaranteed
to be stable throughout the function and thus doesn't need RCU
protection around task_css access.  When determining the css to charge
the pid, task_css_check() is used to override the RCU sanity check.

While adding a warning message on fork rejection from pids limit,
135b8b37bd91 ("cgroup: Add pids controller event when fork fails
because of pid limit") incorrectly added a task_css access which is
neither RCU protected or explicitly annotated.  This triggers the
following suspicious RCU usage warning when RCU debugging is enabled.

  cgroup: fork rejected by pids controller in

  ===============================
  [ ERR: suspicious RCU usage.  ]
  4.10.0-work+ #1 Not tainted
  -------------------------------
  ./include/linux/cgroup.h:435 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 0
  1 lock held by bash/1748:
   #0:  (&cgroup_threadgroup_rwsem){+++++.}, at: [<ffffffff81052c96>] _do_fork+0xe6/0x6e0

  stack backtrace:
  CPU: 3 PID: 1748 Comm: bash Not tainted 4.10.0-work+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
  Call Trace:
   dump_stack+0x68/0x93
   lockdep_rcu_suspicious+0xd7/0x110
   pids_can_fork+0x1c7/0x1d0
   cgroup_can_fork+0x67/0xc0
   copy_process.part.58+0x1709/0x1e90
   _do_fork+0xe6/0x6e0
   SyS_clone+0x19/0x20
   do_syscall_64+0x5c/0x140
   entry_SYSCALL64_slow_path+0x25/0x25
  RIP: 0033:0x7f7853fab93a
  RSP: 002b:00007ffc12d05c90 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7853fab93a
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
  RBP: 00007ffc12d05cc0 R08: 0000000000000000 R09: 00007f78548db700
  R10: 00007f78548db9d0 R11: 0000000000000246 R12: 00000000000006d4
  R13: 0000000000000001 R14: 0000000000000000 R15: 000055e3ebe2c04d
  /asdf

There's no reason to dereference task_css again here when the
associated css is already available.  Fix it by replacing the
task_cgroup() call with css->cgroup.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Mike Galbraith <efault@gmx.de>
Fixes: 135b8b37bd91 ("cgroup: Add pids controller event when fork fails because of pid limit")
Cc: Kenny Yu <kennyyu@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

percpu: acquire pcpu_lock when updating pcpu_nr_empty_pop_pages

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 320661b08dd6f1746d5c7ab4eb435ec64b97cd45 upstream.

Update to pcpu_nr_empty_pop_pages in pcpu_alloc() is currently done
without holding pcpu_lock. This can lead to bad updates to the variable.
Add missing lock calls.

Fixes: b539b87fed37 ("percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated")
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

gfs2: Avoid alignment hole in struct lm_lockname

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 28ea06c46fbcab63fd9a55531387b7928a18a590 upstream.

Commit 88ffbf3e03 switches to using rhashtables for glocks, hashing over
the entire struct lm_lockname instead of its individual fields. On some
architectures, struct lm_lockname contains a hole of uninitialized
memory due to alignment rules, which now leads to incorrect hash values.
Get rid of that hole.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

isdn/gigaset: fix NULL-deref at probe

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 68c32f9c2a36d410aa242e661506e5b2c2764179 upstream.

Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer should a malicious device lack endpoints.

Fixes: cf7776dc05b8 ("[PATCH] isdn4linux: Siemens Gigaset drivers - direct USB connection")
Cc: Hansjoerg Lipp <hjlipp@web.de>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

target: Fix VERIFY_16 handling in sbc_parse_cdb

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 13603685c1f12c67a7a2427f00b63f39a2b6f7c9 upstream.

As reported by Max, the Windows 2008 R2 chkdsk utility expects
VERIFY_16 to be supported, and does not handle the returned
CHECK_CONDITION properly, resulting in an infinite loop.

The kernel will log huge amounts of this error:

kernel: TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x8f, sending
CHECK_CONDITION.

Signed-off-by: Max Lohrmann <post@wickenrode.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

scsi: mpt3sas: Avoid sleeping in interrupt context

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 8893cf6cb1cf56334c05120e23092dbfc9423ebb upstream.

Commit 669f044170d8 ("scsi: srp_transport: Move queuecommand() wait code
to SCSI core") can make scsi_internal_device_block() sleep.  However,
the mpt3sas driver can call this function from an interrupt
handler. Hence add a second argument to scsi_internal_device_block()
that restores the old behavior of this function for the mpt3sas handler.

The call chain that triggered an "IRQ handler enabled interrupts"
complaint is as follows:

_base_interrupt()
-> _base_async_event()
   -> mpt3sas_scsih_event_callback()
      -> _scsih_check_topo_delete_events()
         -> _scsih_block_io_to_children_attached_directly()
            -> _scsih_block_io_device()
               -> _scsih_internal_device_block()
                  -> scsi_internal_device_block()

Reported-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Omar Sandoval <osandov@osandov.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sathya Prakash <sathya.prakash@broadcom.com>
Cc: Chaitra P B <chaitra.basappa@broadcom.com>
Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>
Cc: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Tested-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

scsi: libiscsi: add lock around task lists to fix list corruption regression

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 6f8830f5bbab16e54f261de187f3df4644a5b977 upstream.

There's a rather long standing regression from the commit "libiscsi:
Reduce locking contention in fast path"

Depending on iSCSI target behavior, it's possible to hit the case in
iscsi_complete_task where the task is still on a pending list
(!list_empty(&task->running)).  When that happens the task is removed
from the list while holding the session back_lock, but other task list
modification occur under the frwd_lock.  That leads to linked list
corruption and eventually a panicked system.

Rather than back out the session lock split entirely, in order to try
and keep some of the performance gains this patch adds another lock to
maintain the task lists integrity.

Major enterprise supported kernels have been backing out the lock split
for while now, thanks to the efforts at IBM where a lab setup has the
most reliable reproducer I've seen on this issue.  This patch has been
tested there successfully.

Signed-off-by: Chris Leech <cleech@redhat.com>
Fixes: 659743b02c41 ("[SCSI] libiscsi: Reduce locking contention in fast path")
Reported-by: Prashantha Subbarao <psubbara@us.ibm.com>
Reviewed-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

scsi: lpfc: Add shutdown method for kexec

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 85e8a23936ab3442de0c42da97d53b29f004ece1 upstream.

We see lpfc devices regularly fail during kexec. Fix this by adding a
shutdown method which mirrors the remove method.

Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Tested-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

target/pscsi: Fix TYPE_TAPE + TYPE_MEDIMUM_CHANGER export

BugLink: http://bugs.launchpad.net/bugs/1676429
commit a04e54f2c35823ca32d56afcd5cea5b783e2f51a upstream.

The following fixes a divide by zero OOPs with TYPE_TAPE
due to pscsi_tape_read_blocksize() failing causing a zero
sd->sector_size being propigated up via dev_attrib.hw_block_size.

It also fixes another long-standing bug where TYPE_TAPE and
TYPE_MEDIMUM_CHANGER where using pscsi_create_type_other(),
which does not call scsi_device_get() to take the device
reference. Instead, rename pscsi_create_type_rom() to
pscsi_create_type_nondisk() and use it for all cases.

Finally, also drop a dump_stack() in pscsi_get_blocks() for
non TYPE_DISK, which in modern target-core can get invoked
via target_sense_desc_format() during CHECK_CONDITION.

Reported-by: Malcolm Haak <insanemal@gmail.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

md/raid1/10: fix potential deadlock

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 61eb2b43b99ebdc9bc6bc83d9792257b243e7cb3 upstream.

Neil Brown pointed out a potential deadlock in raid 10 code with
bio_split/chain. The raid1 code could have the same issue, but recent
barrier rework makes it less likely to happen. The deadlock happens in
below sequence:

1. generic_make_request(bio), this will set current->bio_list
2. raid10_make_request will split bio to bio1 and bio2
3. __make_request(bio1), wait_barrer, add underlayer disk bio to
current->bio_list
4. __make_request(bio2), wait_barrer

If raise_barrier happens between 3 & 4, since wait_barrier runs at 3,
raise_barrier waits for IO completion from 3. And since raise_barrier
sets barrier, 4 waits for raise_barrier. But IO from 3 can't be
dispatched because raid10_make_request() doesn't finished yet.

The solution is to adjust the IO ordering. Quotes from Neil:
"
It is much safer to:

    if (need to split) {
        split = bio_split(bio, ...)
        bio_chain(...)
        make_request_fn(split);
        generic_make_request(bio);
   } else
        make_request_fn(mddev, bio);

This way we first process the initial section of the bio (in 'split')
which will queue some requests to the underlying devices.  These
requests will be queued in generic_make_request.
Then we queue the remainder of the bio, which will be added to the end
of the generic_make_request queue.
Then we return.
generic_make_request() will pop the lower-level device requests off the
queue and handle them first.  Then it will process the remainder
of the original bio once the first section has been fully processed.
"

Note, this only happens in read path. In write path, the bio is flushed to
underlaying disks either by blk flush (from schedule) or offladed to raid1/10d.
It's queued in current->bio_list.

Cc: Coly Li <colyli@suse.de>
Suggested-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

hwrng: omap - Do not access INTMASK_REG on EIP76

BugLink: http://bugs.launchpad.net/bugs/1676429
commit b985735be7afea3a5e0570ce2ea0b662c0e12e19 upstream.

The INTMASK_REG register does not exist on EIP76. Due to this, the call:

omap_rng_write(priv, RNG_INTMASK_REG, RNG_SHUTDOWN_OFLO_MASK);

ends up, through the reg_map_eip76[] array, in accessing the register at
offset 0, which is the RNG_OUTPUT_0_REG. This by itself doesn't cause
any problem, but clearly doesn't enable the interrupt as it was
expected.

On EIP76, the register that allows to enable the interrupt is
RNG_CONTROL_REG. And just like RNG_INTMASK_REG, it's bit 1 of this
register that allows to enable the shutdown_oflo interrupt.

Fixes: 383212425c926 ("hwrng: omap - Add device variant for SafeXcel IP-76 found in Armada 8K")
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

hwrng: omap - use devm_clk_get() instead of of_clk_get()

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 761c2510283066324cab7859930777ee34cbca78 upstream.

The omap-rng driver currently uses of_clk_get() to get a reference to
the clock, but never releases that reference. This commit fixes that by
using devm_clk_get() instead.

Fixes: 383212425c926 ("hwrng: omap - Add device variant for SafeXcel IP-76 found in Armada 8K")
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

hwrng: omap - write registers after enabling the clock

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 45c2fdde01299b02a6e3225e848598a3c1e55539 upstream.

Commit 383212425c926 ("hwrng: omap - Add device variant for SafeXcel
IP-76 found in Armada 8K") added support for the SafeXcel IP-76 variant
of the IP. This modification included getting a reference and enabling a
clock. Unfortunately, this was done *after* writing to the
RNG_INTMASK_REG register. This generally works fine when the driver is
built-in because the clock might have been left enabled by the
bootloader, but fails short when the driver is built as a module: it
causes a system hang because a register is being accessed while the
clock is not enabled.

This commit fixes that by making the register access *after* enabling
the clock.

This issue was found by the kernelci.org testing effort.

Fixes: 383212425c926 ("hwrng: omap - Add device variant for SafeXcel IP-76 found in Armada 8K")
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

powerpc/boot: Fix zImage TOC alignment

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 97ee351b50a49717543533cfb85b4bf9d88c9680 upstream.

Recent toolchains force the TOC to be 256 byte aligned. We need to
enforce this alignment in the zImage linker script, otherwise pointers
to our TOC variables (__toc_start) could be incorrect. If the actual
start of the TOC and __toc_start don't have the same value we crash
early in the zImage wrapper.

Suggested-by: Alan Modra <amodra@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

cpufreq: Fix and clean up show_cpuinfo_cur_freq()

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 9b4f603e7a9f4282aec451063ffbbb8bb410dcd9 upstream.

There is a missing newline in show_cpuinfo_cur_freq(), so add it,
but while at it clean that function up somewhat too.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NFS prevent double free in async nfs4_exchange_id

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 63513232f8cd219dcaa5eafae028740ed3067d83 upstream.

Since rpc_task is async, the release function should be called which
will free the impl_id, scope, and owner.

Trond pointed at 2 more problems:
-- use of client pointer after free in the nfs4_exchangeid_release() function
-- cl_count mismatch if rpc_run_task() isn't run

Fixes: 8d89bd70bc9 ("NFS setup async exchange_id")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

xprtrdma: Squelch kbuild sparse complaint

BugLink: http://bugs.launchpad.net/bugs/1676429
commit eed50879d64ab1b9f76445dbab822e43a098b309 upstream.

New complaint from kbuild for 4.9.y:

net/sunrpc/xprtrdma/verbs.c:489:19: sparse: incompatible types in
comparison expression (different type sizes)

verbs.c:
489 max_sge = min(ia->ri_device->attrs.max_sge, RPCRDMA_MAX_SEND_SGES);

I can't reproduce this running sparse here. Likewise, "make W=1
net/sunrpc/xprtrdma/verbs.o" never indicated any issue.

A little poking suggests that because the range of its values is
small, gcc can make the actual width of RPCRDMA_MAX_SEND_SGES
smaller than the width of an unsigned integer.

Fixes: 16f906d66cd7 ("xprtrdma: Reduce required number of send SGEs")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

md/r5cache: fix set_syndrome_sources() for data in cache

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 0977762f6d15f13caccc20d71a5dec47d098907d upstream.

Before this patch, device InJournal will be included in prexor
(SYNDROME_SRC_WANT_DRAIN) but not in reconstruct (SYNDROME_SRC_WRITTEN). So it
will break parity calculation. With srctype == SYNDROME_SRC_WRITTEN, we need
include both dev with non-null ->written and dev with R5_InJournal. This fixes
logic in 1e6d690(md/r5cache: caching phase of r5cache)

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

perf/core: Fix event inheritance on fork()

BugLink: http://bugs.launchpad.net/bugs/1676429
commit e7cc4865f0f31698ef2f7aac01a50e78968985b7 upstream.

While hunting for clues to a use-after-free, Oleg spotted that
perf_event_init_context() can loose an error value with the result
that fork() can succeed even though we did not fully inherit the perf
event context.

Spotted-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: oleg@redhat.com
Fixes: 889ff0150661 ("perf/core: Split context's event group list into pinned and non-pinned lists")
Link: http://lkml.kernel.org/r/20170316125823.190342547@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

perf/core: Fix use-after-free in perf_release()

BugLink: http://bugs.launchpad.net/bugs/1676429
commit e552a8389aa409e257b7dcba74f67f128f979ccc upstream.

Dmitry reported syzcaller tripped a use-after-free in perf_release().

After much puzzlement Oleg spotted the below scenario:

  Task1                           Task2

  fork()
    perf_event_init_task()
    /* ... */
    goto bad_fork_$foo;
    /* ... */
    perf_event_free_task()
      mutex_lock(ctx->lock)
      perf_free_event(B)

                                  perf_event_release_kernel(A)
                                    mutex_lock(A->child_mutex)
                                    list_for_each_entry(child, ...) {
                                      /* child == B */
                                      ctx = B->ctx;
                                      get_ctx(ctx);
                                      mutex_unlock(A->child_mutex);

        mutex_lock(A->child_mutex)
        list_del_init(B->child_list)
        mutex_unlock(A->child_mutex)

        /* ... */

      mutex_unlock(ctx->lock);
      put_ctx() /* >0 */
    free_task();
                                      mutex_lock(ctx->lock);
                                      mutex_lock(A->child_mutex);
                                      /* ... */
                                      mutex_unlock(A->child_mutex);
                                      mutex_unlock(ctx->lock)
                                      put_ctx() /* 0 */
                                        ctx->task && !TOMBSTONE
                                          put_task_struct() /* UAF */

This patch closes the hole by making perf_event_free_task() destroy the
task <-> ctx relation such that perf_event_release_kernel() will no longer
observe the now dead task.

Spotted-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fweisbec@gmail.com
Cc: oleg@redhat.com
Fixes: c6e5b73242d2 ("perf: Synchronously clean up child events")
Link: http://lkml.kernel.org/r/20170314155949.GE32474@worktop
Link: http://lkml.kernel.org/r/20170316125823.140295131@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

parisc: Fix system shutdown halt

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 73580dac7618e4bcd21679f553cf3c97323fec46 upstream.

On those parisc machines which don't provide a software power off
function, the system currently kills the init process at the end of a
shutdown and unexpectedly restarts insteads of halting.
Fix it by adding a loop which will not return.

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

parisc: support R_PARISC_SECREL32 relocation in modules

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 5f655322b1ba4bd46e26e307d04098f9c84df764 upstream.

The parisc kernel doesn't work with CONFIG_MODVERSIONS since the commit
71810db27c1c853b335675bee335d893bc3d324b. It can't load modules with the
error: "module unix: Unknown relocation: 41".

The commit changes __kcrctab from 64-bit valus to 32-bit values. The
assembler generates R_PARISC_SECREL32 secrel relocation for them and the
module loader doesn't support this relocation.

This patch adds the R_PARISC_SECREL32 relocation to the module loader.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

parisc: Optimize flush_kernel_vmap_range and invalidate_kernel_vmap_range

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 316ec0624f951166daedbe446988ef92ae72b59f upstream.

The previously submitted patch did not resolve the random segmentation
faults observed on the phantom buildd system. There are still
unresolved problems with the Debian 4.8 and 4.9 kernels on C8000.

The attached patch removes the flush of the offset map pages and does a
whole data cache flush for large ranges. No other arch flushes the
offset map in these routines as far as I can tell.

I have not observed any random segmentation faults on rp3440 in two
weeks of testing with 4.10.0 and 4.10.1.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

qla2xxx: Fix request queue corruption.

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 8b666809e10cda9814af3e8be339d35b83909056 upstream.

When FW notify driver or driver detects low FW resource,
driver tries to send out Busy SCSI Status to tell Initiator
side to back off. During the send process, the lock was not held.

Signed-off-by: Quinn Tran <quinn.tran@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

qla2xxx: Fix memory leak for abts processing

BugLink: http://bugs.launchpad.net/bugs/1676429
commit ae940f2c472a62904dc18234de5cf3ed28f195ee upstream.

Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

give up on gcc ilog2() constant optimizations

BugLink: http://bugs.launchpad.net/bugs/1676429
commit 474c90156c8dcc2fa815e6716cc9394d7930cb9c upstream.

gcc-7 has an "optimization" pass that completely screws up, and
generates the code expansion for the (impossible) case of calling
ilog2() with a zero constant, even when the code gcc compiles does not
actually have a zero constant.

And we try to generate a compile-time error for anybody doing ilog2() on
a constant where that doesn't make sense (be it zero or negative).  So
now gcc7 will fail the build due to our sanity checking, because it
created that constant-zero case that didn't actually exist in the source
code.

There's a whole long discussion on the kernel mailing about how to work
around this gcc bug.  The gcc people themselevs have discussed their
"feature" in

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72785

but it's all water under the bridge, because while it looked at one
point like it would be solved by the time gcc7 was released, that was
not to be.

So now we have to deal with this compiler braindamage.

And the only simple approach seems to be to just delete the code that
tries to warn about bad uses of ilog2().

So now "ilog2()" will just return 0 not just for the value 1, but for
any non-positive value too.

It's not like I can recall anybody having ever actually tried to use
this function on any invalid value, but maybe the sanity check just
meant that such code never made it out in public.

Reported-by: Laura Abbott <labbott@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>,
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: Start new release

Ignore: yes
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: Ubuntu-4.10.0-15.17

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

powerpc/64s: POWER9 machine check handler

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675771
Add POWER9 machine check handler. There are several new types of errors
added, so logging messages for those are also added.

This doesn't attempt to reuse any of the P7/8 defines or functions,
because that becomes too complex. The better option in future is to use
a table driven approach.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 7b9f71f974a12740e79e918cfd58c2fce0b5b580)
Signed-off-by: Breno Leitao <brenohl@br.ibm.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

powerpc/64s: allow machine check handler to set severity and initiator

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675771
Currently severity and initiator are always set to MCE_SEV_ERROR_SYNC and
MCE_INITIATOR_CPU in the core mce code. Allow them to be set by the
machine specific mce handlers.

No functional change for existing handlers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit c1bbf387d6191e6e18f3adc4db45b922822c2ba4)
Signed-off-by: Breno Leitao <brenohl@br.ibm.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

powerpc/64s: fix handling of non-synchronous machine checks

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675771
A synchronous machine check is an exception raised by the attempt to
execute the current instruction. If the error can't be corrected, it
can make sense to SIGBUS the currently running process.

In other cases, the error condition is not related to the current
instruction, so killing the current process is not the right thing to
do.

Today, all machine checks are MCE_SEV_ERROR_SYNC, so this has no
practical change. It will be used to handle POWER9 asynchronous
machine checks.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 1363875bdb6317a2d0798284d7aaf320f0782f6d)
Signed-off-by: Breno Leitao <brenohl@br.ibm.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

x86/cpufeature: Add RING3MWAIT to CPU features

BugLink: http://bugs.launchpad.net/bugs/1637550
Add software-defined CPUID bit for the non-architectural ring 3
MONITOR/MWAIT feature.

Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: Piotr.Luc@intel.com
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/r/1484918557-15481-4-git-send-email-grzegorz.andrejczuk@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 1d12d0ef0194ccc4dcebed3d96bb2301b26fc3ee)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Robert Hooker <sarvatt@ubuntu.com>

x86/elf: Add HWCAP2 to expose ring 3 MONITOR/MWAIT

BugLink: http://bugs.launchpad.net/bugs/1637550
Introduce ELF_HWCAP2 variable for x86 and reserve its bit 0 to expose the
ring 3 MONITOR/MWAIT.

HWCAP variables contain bitmasks which can be used by userspace
applications to detect which instruction sets are supported by CPU. On x86
architecture information about CPU capabilities can be checked via CPUID
instructions, unfortunately presence of ring 3 MONITOR/MWAIT feature cannot
be checked this way. ELF_HWCAP cannot be used as well, because on x86 it is
set to CPUID[1].EDX which means that all bits are reserved there.

HWCAP2 approach was chosen because it reuses existing solution present
in other architectures, so only minor modifications are required to the
kernel and userspace applications. When ELF_HWCAP2 is defined
kernel maps it to AT_HWCAP2 during the start of the application.
This way the ring 3 MONITOR/MWAIT feature can be detected using getauxval()
API in a simple and fast manner. ELF_HWCAP2 type is u32 to be consistent
with x86 ELF_HWCAP type.

Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: Piotr.Luc@intel.com
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/r/1484918557-15481-3-git-send-email-grzegorz.andrejczuk@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 0274f9551eff55dbd63b5f5f3efe30fe5d4c801c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Robert Hooker <sarvatt@ubuntu.com>

x86/msr: Add MSR_MISC_FEATURE_ENABLES and RING3MWAIT bit

BugLink: http://bugs.launchpad.net/bugs/1637550
Define new MSR MISC_FEATURE_ENABLES (0x140).

On supported CPUs if bit 1 of this MSR is set, then calling MONITOR and
MWAIT instructions outside of ring 0 will not cause invalid-opcode
exception.

The MSR MISC_FEATURE_ENABLES is not yet documented in the SDM. Here is the
relevant documentation:

Hex   Dec  Name                     Scope
140H  320  MISC_FEATURE_ENABLES     Thread
           0    Reserved
           1    If set to 1, the MONITOR and MWAIT instructions do not
                cause invalid-opcode exceptions when executed with CPL > 0
                or in virtual-8086 mode. If MWAIT is executed when CPL > 0
                or in virtual-8086 mode, and if EAX indicates a C-state
                other than C0 or C1, the instruction operates as if EAX
                indicated the C-state C1.
           63:2 Reserved

Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: Piotr.Luc@intel.com
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/r/1484918557-15481-2-git-send-email-grzegorz.andrejczuk@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit ae47eda905e61ef6ba0b6f79b967c9de15ca4f8e)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Robert Hooker <sarvatt@ubuntu.com>

x86/cpufeature: Enable RING3MWAIT for Knights Mill

BugLink: http://bugs.launchpad.net/bugs/1637550
Enable ring 3 MONITOR/MWAIT for Intel Xeon Phi codenamed Knights Mill. We
can't guarantee that this (KNM) will be the last CPU model that needs this
hack. But, we do recognize that this is far from optimal, and there is an
effort to ensure we don't keep doing extending this hack forever.

Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Cc: Piotr.Luc@intel.com
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/r/1484918557-15481-6-git-send-email-grzegorz.andrejczuk@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 4d8bb00604b182b62e7786bae0e58e0befeeff85)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Robert Hooker <sarvatt@ubuntu.com>

x86/cpufeature: Enable RING3MWAIT for Knights Landing

BugLink: http://bugs.launchpad.net/bugs/1637550
Enable ring 3 MONITOR/MWAIT for Intel Xeon Phi x200 codenamed Knights
Landing.

Presence of this feature cannot be detected automatically (by reading any
other MSR) therefore it is required to explicitly check for the family and
model of the CPU before attempting to enable it.

Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: Piotr.Luc@intel.com
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/r/1484918557-15481-5-git-send-email-grzegorz.andrejczuk@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit e16fd002afe2b16d828bbf738b8a81a185fe9272)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Robert Hooker <sarvatt@ubuntu.com>

pinctrl: intel: Add Intel Gemini Lake pin controller support

BugLink: http://bugs.launchpad.net/bugs/1645951
This driver adds pinctrl/GPIO support for Intel Gemini Lake SoC. The
GPIO controller is based on the next generation GPIO hardware but still
compatible with the one supported by the Intel core pinctrl/GPIO driver.

This commit includes material from David E. Box.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
(cherry picked from commit 6693f9f96a0936488dd7876b8079933786e27b1b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: [Config] CONFIG_PINCTRL_GEMINILAKE=m

BugLink: http://bugs.launchpad.net/bugs/1645951
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

spi: pxa2xx: Add support for Intel Gemini Lake

BugLink: http://bugs.launchpad.net/bugs/1645951
Gemini Lake reuses the same LPSS SPI configuration as Broxton

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit e18a80acd1365e91e3efcd69942d9073936cf851)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

i2c: i801: Add support for Intel Gemini Lake

BugLink: http://bugs.launchpad.net/bugs/1645951
Intel Gemini Lake has the same SMBus host controller than Intel Broxton.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
(cherry picked from commit 9827f9eb79c56424eac6409197a290601cf78eee)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

pwm: lpss: Add Intel Gemini Lake PCI ID

BugLink: http://bugs.launchpad.net/bugs/1645951
Intel Gemini Lake PWM is pretty much same as used in Intel Broxton. Add
this new PCI ID to the list of supported devices.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit ae2520540cb0090eb8cdba559bcb6a82266f0308)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

mfd: intel-lpss: Add Intel Gemini Lake PCI IDs

BugLink: http://bugs.launchpad.net/bugs/1645951
Intel Gemini Lake is essentially Broxton with different PCI IDs. Add these
new PCI IDs to the list of supported devices.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
(cherry picked from commit f80e78aa11ad754de20104233af1ce4cea8f16a5)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Linux 4.10.5

BugLink: http://bugs.launchpad.net/bugs/1675032
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

crypto: s5p-sss - Fix spinlock recursion on LRW(AES)

BugLink: http://bugs.launchpad.net/bugs/1675032
commit 28b62b1458685d8f68f67d9b2d511bf8fa32b746 upstream.

Running TCRYPT with LRW compiled causes spinlock recursion:

    testing speed of async lrw(aes) (lrw(ecb-aes-s5p)) encryption
    tcrypt: test 0 (256 bit key, 16 byte blocks): 19007 operations in 1 seconds (304112 bytes)
    tcrypt: test 1 (256 bit key, 64 byte blocks): 15753 operations in 1 seconds (1008192 bytes)
    tcrypt: test 2 (256 bit key, 256 byte blocks): 14293 operations in 1 seconds (3659008 bytes)
    tcrypt: test 3 (256 bit key, 1024 byte blocks): 11906 operations in 1 seconds (12191744 bytes)
    tcrypt: test 4 (256 bit key, 8192 byte blocks):
    BUG: spinlock recursion on CPU#1, irq/84-10830000/89
     lock: 0xeea99a68, .magic: dead4ead, .owner: irq/84-10830000/89, .owner_cpu: 1
    CPU: 1 PID: 89 Comm: irq/84-10830000 Not tainted 4.11.0-rc1-00001-g897ca6d0800d #559
    Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
    [<c010e1ec>] (unwind_backtrace) from [<c010ae1c>] (show_stack+0x10/0x14)
    [<c010ae1c>] (show_stack) from [<c03449c0>] (dump_stack+0x78/0x8c)
    [<c03449c0>] (dump_stack) from [<c015de68>] (do_raw_spin_lock+0x11c/0x120)
    [<c015de68>] (do_raw_spin_lock) from [<c0720110>] (_raw_spin_lock_irqsave+0x20/0x28)
    [<c0720110>] (_raw_spin_lock_irqsave) from [<c0572ca0>] (s5p_aes_crypt+0x2c/0xb4)
    [<c0572ca0>] (s5p_aes_crypt) from [<bf1d8aa4>] (do_encrypt+0x78/0xb0 [lrw])
    [<bf1d8aa4>] (do_encrypt [lrw]) from [<bf1d8b00>] (encrypt_done+0x24/0x54 [lrw])
    [<bf1d8b00>] (encrypt_done [lrw]) from [<c05732a0>] (s5p_aes_complete+0x60/0xcc)
    [<c05732a0>] (s5p_aes_complete) from [<c0573440>] (s5p_aes_interrupt+0x134/0x1a0)
    [<c0573440>] (s5p_aes_interrupt) from [<c01667c4>] (irq_thread_fn+0x1c/0x54)
    [<c01667c4>] (irq_thread_fn) from [<c0166a98>] (irq_thread+0x12c/0x1e0)
    [<c0166a98>] (irq_thread) from [<c0136a28>] (kthread+0x108/0x138)
    [<c0136a28>] (kthread) from [<c0107778>] (ret_from_fork+0x14/0x3c)

Interrupt handling routine was calling req->base.complete() under
spinlock.  In most cases this wasn't fatal but when combined with some
of the cipher modes (like LRW) this caused recursion - starting the new
encryption (s5p_aes_crypt()) while still holding the spinlock from
previous round (s5p_aes_complete()).

Beside that, the s5p_aes_interrupt() error handling path could execute
two completions in case of error for RX and TX blocks.

Rewrite the interrupt handling routine and the completion by:

1. Splitting the operations on scatterlist copies from
   s5p_aes_complete() into separate s5p_sg_done(). This still should be
   done under lock.
   The s5p_aes_complete() now only calls req->base.complete() and it has
   to be called outside of lock.

2. Moving the s5p_aes_complete() out of spinlock critical sections.
   In interrupt service routine s5p_aes_interrupts(), it appeared in few
   places, including error paths inside other functions called from ISR.
   This code was not so obvious to read so simplify it by putting the
   s5p_aes_complete() only within ISR level.

Reported-by: Nathan Royce <nroycea+kernel@gmail.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

crypto: powerpc - Fix initialisation of crc32c context

BugLink: http://bugs.launchpad.net/bugs/1675032
commit aa2be9b3d6d2d699e9ca7cbfc00867c80e5da213 upstream.

Turning on crypto self-tests on a POWER8 shows:

alg: hash: Test 1 failed for crc32c-vpmsum
00000000: ff ff ff ff

Comparing the code with the Intel CRC32c implementation on which
ours is based shows that we are doing an init with 0, not ~0
as CRC32c requires.

This probably wasn't caught because btrfs does its own weird
open-coded initialisation.

Initialise our internal context to ~0 on init.

This makes the self-tests pass, and btrfs continues to work.

Fixes: 6dd7a82cc54e ("crypto: powerpc - Add POWER8 optimised crc32c")
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

locking/rwsem: Fix down_write_killable() for CONFIG_RWSEM_GENERIC_SPINLOCK=y

BugLink: http://bugs.launchpad.net/bugs/1675032
commit 17fcbd590d0c3e35bd9646e2215f86586378bc42 upstream.

We hang if SIGKILL has been sent, but the task is stuck in down_read()
(after do_exit()), even though no task is doing down_write() on the
rwsem in question:

  INFO: task libupnp:21868 blocked for more than 120 seconds.
  libupnp         D    0 21868      1 0x08100008
  ...
  Call Trace:
  __schedule()
  schedule()
  __down_read()
  do_exit()
  do_group_exit()
  __wake_up_parent()

This bug has already been fixed for CONFIG_RWSEM_XCHGADD_ALGORITHM=y in
the following commit:

04cafed7fc19 ("locking/rwsem: Fix down_write_killable()")

... however, this bug also exists for CONFIG_RWSEM_GENERIC_SPINLOCK=y.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Niklas Cassel <niklass@axis.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: d47996082f52 ("locking/rwsem: Introduce basis for down_write_killable()")
Link: http://lkml.kernel.org/r/1487981873-12649-1-git-send-email-niklass@axis.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

futex: Add missing error handling to FUTEX_REQUEUE_PI

BugLink: http://bugs.launchpad.net/bugs/1675032
commit 9bbb25afeb182502ca4f2c4f3f88af0681b34cae upstream.

Thomas spotted that fixup_pi_state_owner() can return errors and we
fail to unlock the rt_mutex in that case.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170304093558.867401760@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

futex: Fix potential use-after-free in FUTEX_REQUEUE_PI

BugLink: http://bugs.launchpad.net/bugs/1675032
commit c236c8e95a3d395b0494e7108f0d41cf36ec107c upstream.

While working on the futex code, I stumbled over this potential
use-after-free scenario. Dmitry triggered it later with syzkaller.

pi_mutex is a pointer into pi_state, which we drop the reference on in
unqueue_me_pi(). So any access to that pointer after that is bad.

Since other sites already do rt_mutex_unlock() with hb->lock held, see
for example futex_lock_pi(), simply move the unlock before
unqueue_me_pi().

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170304093558.801744246@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

x86/perf: Fix CR4.PCE propagation to use active_mm instead of mm

BugLink: http://bugs.launchpad.net/bugs/1675032
commit 5dc855d44c2ad960a86f593c60461f1ae1566b6d upstream.

If one thread mmaps a perf event while another thread in the same mm
is in some context where active_mm != mm (which can happen in the
scheduler, for example), refresh_pce() would write the wrong value
to CR4.PCE. This broke some PAPI tests.

Reported-and-tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 7911d3f7af14 ("perf/x86: Only allow rdpmc if a perf_event is mapped")
Link: http://lkml.kernel.org/r/0c5b38a76ea50e405f9abe07a13dfaef87c173a1.1489694270.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

x86/intel_rdt: Put group node in rdtgroup_kn_unlock

BugLink: http://bugs.launchpad.net/bugs/1675032
commit 49ec8f5b6ae3ab60385492cad900ffc8a523c895 upstream.

The rdtgroup_kn_unlock waits for the last user to release and put its
node. But it's calling kernfs_put on the node which calls the
rdtgroup_kn_unlock, which might not be the group's directory node, but
another group's file node.

This race could be easily reproduced by running 2 instances
of following script:

  mount -t resctrl resctrl /sys/fs/resctrl/
  pushd /sys/fs/resctrl/
  mkdir krava
  echo "krava" > krava/schemata
  rmdir krava
  popd
  umount  /sys/fs/resctrl

It triggers the slub debug error message with following command
line config: slub_debug=,kernfs_node_cache.

Call kernfs_put on the group's node to fix it.

Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/1489501253-20248-1-git-send-email-jolsa@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

x86/kasan: Fix boot with KASAN=y and PROFILE_ANNOTATED_BRANCHES=y

BugLink: http://bugs.launchpad.net/bugs/1675032
commit be3606ff739d1c1be36389f8737c577ad87e1f57 upstream.

The kernel doesn't boot with both PROFILE_ANNOTATED_BRANCHES=y and KASAN=y
options selected. With branch profiling enabled we end up calling
ftrace_likely_update() before kasan_early_init(). ftrace_likely_update() is
built with KASAN instrumentation, so calling it before kasan has been
initialized leads to crash.

Use DISABLE_BRANCH_PROFILING define to make sure that we don't call
ftrace_likely_update() from early code before kasan_early_init().

Fixes: ef7f0d6a6ca8 ("x86_64: add KASan support")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: kasan-dev@googlegroups.com
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: lkp@01.org
Cc: Dmitry Vyukov <dvyukov@google.com>
Link: http://lkml.kernel.org/r/20170313163337.1704-1-aryabinin@virtuozzo.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

x86/tsc: Fix ART for TSC_KNOWN_FREQ

BugLink: http://bugs.launchpad.net/bugs/1675032
commit 44fee88cea43d3c2cac962e0439cb10a3cabff6d upstream.

Subhransu reported that convert_art_to_tsc() isn't working for him.

The ART to TSC relation is only set up for systems which use the refined
TSC calibration. Systems with known TSC frequency (available via CPUID 15)
are not using the refined calibration and therefor the ART to TSC relation
is never established.

Add the setup to the known frequency init path which skips ART
calibration. The init code needs to be duplicated as for systems which use
refined calibration the ART setup must be delayed until calibration has
been done.

The problem has been there since the ART support was introdduced, but only
detected now because Subhransu tested the first time on hardware which has
TSC frequency enumerated via CPUID 15.

Note for stable: The conditional has changed from TSC_RELIABLE to
TSC_KNOWN_FREQUENCY.

[ tglx: Rewrote changelog and identified the proper 'Fixes' commit ]

Fixes: f9677e0f8308 ("x86/tsc: Always Running Timer (ART) correlated clocksource")
Reported-by: "Prusty, Subhransu S" <subhransu.s.prusty@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: christopher.s.hall@intel.com
Cc: kevin.b.stanton@intel.com
Cc: john.stultz@linaro.org
Cc: akataria@vmware.com
Link: http://lkml.kernel.org/r/20170313145712.GI3312@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

x86/unwind: Fix last frame check for aligned function stacks

BugLink: http://bugs.launchpad.net/bugs/1675032
commit 87a6b2975f0d340c75b7488d22d61d2f98fb8abf upstream.

Pavel Machek reported the following warning on x86-32:

  WARNING: kernel stack frame pointer at f50cdf98 in swapper/2:0 has bad value   (null)

The warning is caused by the unwinder not realizing that it reached the
end of the stack, due to an unusual prologue which gcc sometimes
generates for aligned stacks.  The prologue is based on a gcc feature
called the Dynamic Realign Argument Pointer (DRAP).  It's almost always
enabled for aligned stacks when -maccumulate-outgoing-args isn't set.

This issue is similar to the one fixed by the following commit:

  8023e0e2a48d ("x86/unwind: Adjust last frame check for aligned function stacks")

... but that fix was specific to x86-64.

Make the fix more generic to cover x86-32 as well, and also ensure that
the return address referred to by the frame pointer is a copy of the
original return address.

Fixes: acb4608ad186 ("x86/unwind: Create stack frames for saved syscall registers")
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: http://lkml.kernel.org/r/50d4924db716c264b14f1633037385ec80bf89d2.1489465609.git.jpoimboe@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

drm/i915/lspcon: Fix resume time initialization due to unasserted HPD

BugLink: http://bugs.launchpad.net/bugs/1675032
commit 4b84b4a5507913ee0da27b1f1b27671937839de6 upstream.

During system resume time initialization the HPD level on LSPCON ports
can stay low for an extended amount of time, leading to failed AUX
transfers and LSPCON initialization. Fix this by waiting for HPD to get
asserted.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99178
Cc: Shashank Sharma <shashank.sharma@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Shashank Sharma <shashank.sharma@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1485509961-9010-3-git-send-email-imre.deak@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
(corrected stable tag)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

drm/i915/gen9+: Enable hotplug detection early

BugLink: http://bugs.launchpad.net/bugs/1675032
commit 2a57d9cce1c08578097d965468e37f06d71fa495 upstream.

For LSPCON resume time initialization we need to sample the
corresponding pin's HPD level, but this is only available when HPD
detection is enabled. Currently we enable detection only when enabling
HPD interrupts which is too late, so bring the enabling of detection
earlier.

This is needed by the next patch.

Cc: Shashank Sharma <shashank.sharma@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Shashank Sharma <shashank.sharma@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1485509961-9010-2-git-send-email-imre.deak@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
(rebased onto v4.10.4 due to missing s/IS_BROXTON/IS_GEN9_LP/ upstream change)
(corrected stable tag)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

drm/i915/lspcon: Enable AUX interrupts for resume time initialization

BugLink: http://bugs.launchpad.net/bugs/1675032
commit 908764f6d0bd1ba496cb8da33b9b98297ed27351 upstream.

For LSPCON initialization during system resume we need AUX
functionality, but we call the corresponding encoder reset hook with all
interrupts disabled. Without interrupts we'll do a poll-wait for AUX
transfer completions, which adds a significant delay if the transfers
timeout/need to be retried for some reason.

Fix this by enabling interrupts before calling the reset hooks. Note
that while this will enable AUX interrupts it will keep HPD interrupts
disabled, in a similar way to the init time output setup code.

This issue existed since LSPCON support was added.

v2:
- Rebased on drm-tip.

Cc: Shashank Sharma <shashank.sharma@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1480448429-27739-1-git-send-email-imre.deak@intel.com
(rebased onto v4.10.4 due to missing s/dev/dev_priv/ upstream change)
(corrected stable tag)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

arm64: KVM: VHE: Clear HCR_TGE when invalidating guest TLBs

BugLink: http://bugs.launchpad.net/bugs/1675032
commit 68925176296a8b995e503349200e256674bfe5ac upstream.

When invalidating guest TLBs, special care must be taken to
actually shoot the guest TLBs and not the host ones if we're
running on a VHE system. This is controlled by the HCR_EL2.TGE
bit, which we forget to clear before invalidating TLBs.

Address the issue by introducing two wrappers (__tlb_switch_to_guest
and __tlb_switch_to_host) that take care of both the VTTBR_EL2
and HCR_EL2.TGE switching.

Reported-by: Tomasz Nowicki <tnowicki@caviumnetworks.com>
Tested-by: Tomasz Nowicki <tnowicki@caviumnetworks.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

dccp: fix memory leak during tear-down of unsuccessful connection request

BugLink: http://bugs.launchpad.net/bugs/1675032
[ Upstream commit 72ef9c4125c7b257e3a714d62d778ab46583d6a3 ]

This patch fixes a memory leak, which happens if the connection request
is not fulfilled between parsing the DCCP options and handling the SYN
(because e.g. the backlog is full), because we forgot to free the
list of ack vectors.

Reported-by: Jianwen Ji <jiji@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

tun: fix premature POLLOUT notification on tun devices

BugLink: http://bugs.launchpad.net/bugs/1675032
[ Upstream commit b20e2d54789c6acbf6bd0efdbec2cf5fa4d90ef1 ]

aszlig observed failing ssh tunnels (-w) during initialization since
commit cc9da6cc4f56e0 ("ipv6: addrconf: use stable address generator for
ARPHRD_NONE"). We already had reports that the mentioned commit breaks
Juniper VPN connections. I can't clearly say that the Juniper VPN client
has the same problem, but it is worth a try to hint to this patch.

Because of the early generation of link local addresses, the kernel now
can start asking for routers on the local subnet much earlier than usual.
Those router solicitation packets arrive inside the ssh channels and
should be transmitted to the tun fd before the configuration scripts
might have upped the interface and made it ready for transmission.

ssh polls on the interface and receives back a POLL_OUT. It tries to send
the earily router solicitation packet to the tun interface. Unfortunately
it hasn't been up'ed yet by config scripts, thus failing with -EIO. ssh
doesn't retry again and considers the tun interface broken forever.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=121131
Fixes: cc9da6cc4f56 ("ipv6: addrconf: use stable address generator for ARPHRD_NONE")
Cc: Bjørn Mork <bjorn@mork.no>
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Reported-by: Jonas Lippuner <jonas@lippuner.ca>
Cc: Jonas Lippuner <jonas@lippuner.ca>
Reported-by: aszlig <aszlig@redmoonstudios.org>
Cc: aszlig <aszlig@redmoonstudios.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

dccp/tcp: fix routing redirect race

BugLink: http://bugs.launchpad.net/bugs/1675032
[ Upstream commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0 ]

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

#8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
#9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

224 static bool tcp_in_quickack_mode(struct sock *sk)↩
225 {↩
226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
228 ↩
229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320610d6 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net: use net->count to check whether a netns is alive or not

BugLink: http://bugs.launchpad.net/bugs/1675032
[ Upstream commit 91864f5852f9996210fad400cf70fb85af091243 ]

The previous idea was to check whether a net namespace is in
net_exit_list or not. It doesn't work, because net->exit_list is used in
__register_pernet_operations and __unregister_pernet_operations where
all namespaces are added to a temporary list to make cleanup in a error
case, so list_empty(&net->exit_list) always returns false.

Reported-by: Mantas Mikulėnas <grawity@gmail.com>
Fixes: 002d8a1a6c11 ("net: skip genenerating uevents for network namespaces that are exiting")
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

ipv6: avoid write to a possibly cloned skb

BugLink: http://bugs.launchpad.net/bugs/1675032
[ Upstream commit 79e49503efe53a8c51d8b695bedc8a346c5e4a87 ]

ip6_fragment, in case skb has a fraglist, checks if the
skb is cloned.  If it is, it will move to the 'slow path' and allocates
new skbs for each fragment.

However, right before entering the slowpath loop, it updates the
nexthdr value of the last ipv6 extension header to NEXTHDR_FRAGMENT,
to account for the fragment header that will be inserted in the new
ipv6-fragment skbs.

In case original skb is cloned this munges nexthdr value of another
skb.  Avoid this by doing the nexthdr update for each of the new fragment
skbs separately.

This was observed with tcpdump on a bridge device where netfilter ipv6
reassembly is active:  tcpdump shows malformed fragment headers as
the l4 header (icmpv6, tcp, etc). is decoded as a fragment header.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: Andreas Karis <akaris@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

ipv6: make ECMP route replacement less greedy

BugLink: http://bugs.launchpad.net/bugs/1675032
[ Upstream commit 67e194007be08d071294456274dd53e0a04fdf90 ]

Commit 27596472473a ("ipv6: fix ECMP route replacement") introduced a
loop that removes all siblings of an ECMP route that is being
replaced. However, this loop doesn't stop when it has replaced
siblings, and keeps removing other routes with a higher metric.
We also end up triggering the WARN_ON after the loop, because after
this nsiblings < 0.

Instead, stop the loop when we have taken care of all routes with the
same metric as the route being replaced.

  Reproducer:
  ===========
    #!/bin/sh

    ip netns add ns1
    ip netns add ns2
    ip -net ns1 link set lo up

    for x in 0 1 2 ; do
        ip link add veth$x netns ns2 type veth peer name eth$x netns ns1
        ip -net ns1 link set eth$x up
        ip -net ns2 link set veth$x up
    done

    ip -net ns1 -6 r a 2000::/64 nexthop via fe80::0 dev eth0 \
            nexthop via fe80::1 dev eth1 nexthop via fe80::2 dev eth2
    ip -net ns1 -6 r a 2000::/64 via fe80::42 dev eth0 metric 256
    ip -net ns1 -6 r a 2000::/64 via fe80::43 dev eth0 metric 2048

    echo "before replace, 3 routes"
    ip -net ns1 -6 r | grep -v '^fe80\|^ff00'
    echo

    ip -net ns1 -6 r c 2000::/64 nexthop via fe80::4 dev eth0 \
            nexthop via fe80::5 dev eth1 nexthop via fe80::6 dev eth2

    echo "after replace, only 2 routes, metric 2048 is gone"
    ip -net ns1 -6 r | grep -v '^fe80\|^ff00'

Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>