git.proxmox.com Git - mirror_ubuntu-bionic-kernel.git/log

powerpc/security: Fix spectre_v2 reporting

When I updated the spectre_v2 reporting to handle software count cache
flush I got the logic wrong when there's no software count cache
enabled at all.

The result is that on systems with the software count cache flush
disabled we print:

  Mitigation: Indirect branch cache disabled, Software count cache flush

Which correctly indicates that the count cache is disabled, but
incorrectly says the software count cache flush is enabled.

The root of the problem is that we are trying to handle all
combinations of options. But we know now that we only expect to see
the software count cache flush enabled if the other options are false.

So split the two cases, which simplifies the logic and fixes the bug.
We were also missing a space before "(hardware accelerated)".

The result is we see one of:

  Mitigation: Indirect branch serialisation (kernel only)
  Mitigation: Indirect branch cache disabled
  Mitigation: Software count cache flush
  Mitigation: Software count cache flush (hardware accelerated)

BugLink: https://bugs.launchpad.net/bugs/1822870
Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache flush")
Cc: stable@vger.kernel.org # v4.19+
BugLink: https://bugs.launchpad.net/bugs/1822870
BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(backported from commit 92edf8df0ff2ae86cc632eeca0e651fd8431d40d)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc/fsl: Add nospectre_v2 command line argument

When the command line argument is present, the Spectre variant 2
mitigations are disabled.

BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(backported from commit f633a8ad636efb5d4bba1a047d4a0f1ef719aa06)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc/fsl: Fix spectre_v2 mitigations reporting

Currently for CONFIG_PPC_FSL_BOOK3E the spectre_v2 file is incorrect:

$ cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
"Mitigation: Software count cache flush"

Which is wrong. Fix it to report vulnerable for now.

BugLink: https://bugs.launchpad.net/bugs/1822870
Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache flush")
Cc: stable@vger.kernel.org # v4.19+
BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 7d8bad99ba5a22892f0cad6881289fdc3875a930)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc/powernv: Query firmware for count cache flush settings

Look for fw-features properties to determine the appropriate settings
for the count cache flush, and then call the generic powerpc code to
set it up based on the security feature flags.

BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 99d54754d3d5f896a8f616b0b6520662bc99d66b)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc/pseries: Query hypervisor for count cache flush settings

Use the existing hypercall to determine the appropriate settings for
the count cache flush, and then call the generic powerpc code to set
it up based on the security feature flags.

BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit ba72dc171954b782a79d25e0f4b3ed91090c3b1e)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc/64s: Add support for software count cache flush

Some CPU revisions support a mode where the count cache needs to be
flushed by software on context switch. Additionally some revisions may
have a hardware accelerated flush, in which case the software flush
sequence can be shortened.

If we detect the appropriate flag from firmware we patch a branch
into _switch() which takes us to a count cache flush sequence.

That sequence in turn may be patched to return early if we detect that
the CPU supports accelerating the flush sequence in hardware.

Add debugfs support for reporting the state of the flush, as well as
runtime disabling it.

And modify the spectre_v2 sysfs file to report the state of the
software flush.

BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(backported from commit ee13cb249fabdff8b90aaff61add347749280087)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc/64s: Add new security feature flags for count cache flush

Add security feature flags to indicate the need for software to flush
the count cache on context switch, and for the presence of a hardware
assisted count cache flush.

BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit dc8c6cce9a26a51fc19961accb978217a3ba8c75)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc/asm: Add a patch_site macro & helpers for patching instructions

Add a macro and some helper C functions for patching single asm
instructions.

The gas macro means we can do something like:

1: nop
patch_site 1b, patch__foo

Which is less visually distracting than defining a GLOBAL symbol at 1,
and also doesn't pollute the symbol table which can confuse eg. perf.

These are obviously similar to our existing feature sections, but are
not automatically patched based on CPU/MMU features, rather they are
designed to be manually patched by C code at some arbitrary point.

BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 06d0bbc6d0f56dacac3a79900e9a9a0d5972d818)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc/lib/feature-fixups: use raw_patch_instruction()

feature fixups need to use patch_instruction() early in the boot,
even before the code is relocated to its final address, requiring
patch_instruction() to use PTRRELOC() in order to address data.

But feature fixups applies on code before it is set to read only,
even for modules. Therefore, feature fixups can use
raw_patch_instruction() instead.

BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 8183d99f4a22c2abbc543847a588df3666ef0c0c)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc/lib/code-patching: refactor patch_instruction()

patch_instruction() uses almost the same sequence as
__patch_instruction()

This patch refactor it so that patch_instruction() uses
__patch_instruction() instead of duplicating code.

BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 8cf4c05712f04a405f0dacebcca8f042b391694a)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc/64: Make meltdown reporting Book3S 64 specific

In a subsequent patch we will enable building security.c for Book3E.
However the NXP platforms are not vulnerable to Meltdown, so make the
Meltdown vulnerability reporting PPC_BOOK3S_64 specific.

BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 406d2b6ae3420f5bb2b3db6986dc6f0b6dbb637b)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc/64: Call setup_barrier_nospec() from setup_arch()

Currently we require platform code to call setup_barrier_nospec(). But
if we add an empty definition for the !CONFIG_PPC_BARRIER_NOSPEC case
then we can call it in setup_arch().

BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit af375eefbfb27cbb5b831984e66d724a40d26b5c)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc/64: Add CONFIG_PPC_BARRIER_NOSPEC

Add a config symbol to encode which platforms support the
barrier_nospec speculation barrier. Currently this is just Book3S 64
but we will add Book3E in a future patch.

BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 179ab1cbf883575c3a585bcfc0f2160f1d22a149)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc/64: Make stf barrier PPC_BOOK3S_64 specific.

NXP Book3E platforms are not vulnerable to speculative store
bypass, so make the mitigations PPC_BOOK3S_64 specific.

BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 6453b532f2c8856a80381e6b9a1f5ea2f12294df)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc/64: Disable the speculation barrier from the command line

The speculation barrier can be disabled from the command line
with the parameter: "nospectre_v1".

BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit cf175dc315f90185128fb061dc05b6fbb211aa2f)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc64s: Show ori31 availability in spectre_v1 sysfs file not v2

When I added the spectre_v2 information in sysfs, I included the
availability of the ori31 speculation barrier.

Although the ori31 barrier can be used to mitigate v2, it's primarily
intended as a spectre v1 mitigation. Spectre v2 is mitigated by
hardware changes.

So rework the sysfs files to show the ori31 information in the
spectre_v1 file, rather than v2.

Currently we display eg:

  $ grep . spectre_v*
  spectre_v1:Mitigation: __user pointer sanitization
  spectre_v2:Mitigation: Indirect branch cache disabled, ori31 speculation barrier enabled

After:

  $ grep . spectre_v*
  spectre_v1:Mitigation: __user pointer sanitization, ori31 speculation barrier enabled
  spectre_v2:Mitigation: Indirect branch cache disabled

BugLink: https://bugs.launchpad.net/bugs/1822870
Fixes: d6fbe1c55c55 ("powerpc/64s: Wire up cpu_show_spectre_v2()")
Cc: stable@vger.kernel.org # v4.17+
BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 6d44acae1937b81cf8115ada8958e04f601f3f2e)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc/64s: Enhance the information in cpu_show_spectre_v1()

We now have barrier_nospec as mitigation so print it in
cpu_show_spectre_v1() when enabled.

BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit a377514519b9a20fa1ea9adddbb4129573129cef)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc/64: Use barrier_nospec in syscall entry

Our syscall entry is done in assembly so patch in an explicit
barrier_nospec.

Based on a patch by Michal Suchanek.

BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 51973a815c6b46d7b23b68d6af371ad1c9d503ca)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc: Use barrier_nospec in copy_from_user()

Based on the x86 commit doing the same.

See commit 304ec1b05031 ("x86/uaccess: Use __uaccess_begin_nospec()
and uaccess_try_nospec") and b3bbfb3fb5d2 ("x86: Introduce
__uaccess_begin_nospec() and uaccess_try_nospec") for more detail.

In all cases we are ordering the load from the potentially
user-controlled pointer vs a previous branch based on an access_ok()
check or similar.

Base on a patch from Michal Suchanek.

BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit ddf35cf3764b5a182b178105f57515b42e2634f8)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc/64s: Enable barrier_nospec based on firmware settings

Check what firmware told us and enable/disable the barrier_nospec as
appropriate.

We err on the side of enabling the barrier, as it's no-op on older
systems, see the comment for more detail.

BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit cb3d6759a93c6d0aea1c10deb6d00e111c29c19c)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc/64s: Patch barrier_nospec in modules

Note that unlike RFI which is patched only in kernel the nospec state
reflects settings at the time the module was loaded.

Iterating all modules and re-patching every time the settings change
is not implemented.

Based on lwsync patching.

BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 815069ca57c142eb71d27439bc27f41a433a67b3)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

powerpc/64s: Add support for ori barrier_nospec patching

Based on the RFI patching. This is required to be able to disable the
speculation barrier.

Only one barrier type is supported and it does nothing when the
firmware does not enable it. Also re-patching modules is not supported
So the only meaningful thing that can be done is patching out the
speculation barrier at boot when the user says it is not wanted.

BugLink: https://bugs.launchpad.net/bugs/1822870
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 2eea7f067f495e33b8b116b35b5988ab2b8aec55)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

sctp: implement memory accounting on rx path

sk_forward_alloc's updating is also done on rx path, but to be consistent
we change to use sk_mem_charge() in sctp_skb_set_owner_r().

In sctp_eat_data(), it's not enough to check sctp_memory_pressure only,
which doesn't work for mem_cgroup_sockets_enabled, so we change to use
sk_under_memory_pressure().

When it's under memory pressure, sk_mem_reclaim() and sk_rmem_schedule()
should be called on both RENEGE or CHUNK DELIVERY path exit the memory
pressure status as soon as possible.

Note that sk_rmem_schedule() is using datalen to make things easy there.

Reported-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CVE-2019-3874

(cherry picked from commit 9dde27de3e5efa0d032f3c891a0ca833a0d31911 linux-next)
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

sctp: implement memory accounting on tx path

Now when sending packets, sk_mem_charge() and sk_mem_uncharge() have been
used to set sk_forward_alloc. We just need to call sk_wmem_schedule() to
check if the allocated should be raised, and call sk_mem_reclaim() to
check if the allocated should be reduced when it's under memory pressure.

If sk_wmem_schedule() returns false, which means no memory is allowed to
allocate, it will block and wait for memory to become available.

Note different from tcp, sctp wait_for_buf happens before allocating any
skb, so memory accounting check is done with the whole msg_len before it
too.

Reported-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CVE-2019-3874

(backported from commit 1033990ac5b2ab6cee93734cb6d301aa3a35bcaa linux-next)
[tyhicks: Backport to 4.15:
- sctp_sendmsg_to_asoc() does not yet exist and its code is still in
sctp_sendmsg()
- sctp_sendmsg() has slight context differences due to timeo being
unconditionally assigned]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

sctp: use sk_wmem_queued to check for writable space

sk->sk_wmem_queued is used to count the size of chunks in out queue
while sk->sk_wmem_alloc is for counting the size of chunks has been
sent. sctp is increasing both of them before enqueuing the chunks,
and using sk->sk_wmem_alloc to check for writable space.

However, sk_wmem_alloc is also increased by 1 for the skb allocked
for sending in sctp_packet_transmit() but it will not wake up the
waiters when sk_wmem_alloc is decreased in this skb's destructor.

If msg size is equal to sk_sndbuf and sendmsg is waiting for sndbuf,
the check 'msg_len <= sctp_wspace(asoc)' in sctp_wait_for_sndbuf()
will keep waiting if there's a skb allocked in sctp_packet_transmit,
and later even if this skb got freed, the waiting thread will never
get waked up.

This issue has been there since very beginning, so we change to use
sk->sk_wmem_queued to check for writable space as sk_wmem_queued is
not increased for the skb allocked for sending, also as TCP does.

SOCK_SNDBUF_LOCK check is also removed here as it's for tx buf auto
tuning which I will add in another patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CVE-2019-3874

(backported from commit cd305c74b0f8b49748a79a8f67fc8e5e3e0c4794)
[tyhicks: Backport to 4.15:
- sctp_sendmsg_to_asoc() does not yet exist and its code is still in
sctp_sendmsg()
- sctp_sendmsg() has slight context differences due to timeo being
unconditionally assigned]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

z3fold: fix possible reclaim races

BugLink: https://bugs.launchpad.net/bugs/1814874
Reclaim and free can race on an object which is basically fine but in
order for reclaim to be able to map "freed" object we need to encode
object length in the handle. handle_to_chunks() is then introduced to
extract object length from a handle and use it during mapping.

Moreover, to avoid racing on a z3fold "headless" page release, we should
not try to free that page in z3fold_free() if the reclaim bit is set.
Also, in the unlikely case of trying to reclaim a page being freed, we
should not proceed with that page.

While at it, fix the page accounting in reclaim function.

This patch supersedes "[PATCH] z3fold: fix reclaim lock-ups".

Link: http://lkml.kernel.org/r/20181105162225.74e8837d03583a9b707cf559@gmail.com
Signed-off-by: Vitaly Wool <vitaly.vul@sony.com>
Signed-off-by: Jongseok Kim <ks77sj@gmail.com>
Reported-by-by: Jongseok Kim <ks77sj@gmail.com>
Reviewed-by: Snild Dolkow <snild@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit ca0246bb97c23da9d267c2107c07fb77e38205c9)
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

selftests/ftrace: Add ppc support for kprobe args tests

BugLink: https://bugs.launchpad.net/bugs/1812809
Add powerpc support for the recently added kprobe args tests.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
(cherry picked from commit 9855c4626c67abc24902246ba961e6dd9022dd27)
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

UBUNTU: SAUCE: misc: rtsx: Fixed rts5260 power saving parameter and sd glitch

BugLink: https://bugs.launchpad.net/bugs/1825487
this patch fixes rts5260 power saving parameter
make power saving function work on L1.1, L1.2

Link: https://lore.kernel.org/lkml/20190417073508.12389-1-ricky_wu@realtek.com/
Signed-off-by: RickyWu <ricky_wu@realtek.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: You-Sheng Yang <vicamo.yang@canonical.com>
Acked-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

misc: rtsx: Enable OCP for rts522a rts524a rts525a rts5260

BugLink: https://bugs.launchpad.net/bugs/1825487
this enables and adds OCP function for Realtek A series cardreader chips
and fixes some OCP flow in rts5260.c

Signed-off-by: RickyWu <ricky_wu@realtek.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit bede03a579b3b4a036003c4862cc1baa4ddc351f)
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: You-Sheng Yang <vicamo.yang@canonical.com>
Acked-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

misc: rtsx: make various functions static

BugLink: https://bugs.launchpad.net/bugs/1825487
The functions rts5260_get_ocpstat, rts5260_get_ocpstat2,
rts5260_clear_ocpstat, rts5260_process_ocp, rts5260_init_hw and
rts5260_set_aspm are local to the source and do not need to be
in global scope, so make them static.

Cleans up sparse warnings:
symbol 'rts5260_get_ocpstat' was not declared. Should it be static?
symbol 'rts5260_get_ocpstat2' was not declared. Should it be static?
symbol 'rts5260_clear_ocpstat' was not declared. Should it be static?
symbol 'rts5260_process_ocp' was not declared. Should it be static?
symbol 'rts5260_init_hw' was not declared. Should it be static?
symbol 'rts5260_set_aspm' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 75a898051d3d2a105b3a0ca8be6e356429a68457)
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: You-Sheng Yang <vicamo.yang@canonical.com>
Acked-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

ALSA: hda/realtek - add two more pin configuration sets to quirk table

BugLink: https://bugs.launchpad.net/bugs/1825272
We have two Dell laptops which have the codec 10ec0236 and 10ec0256
respectively, the headset mic on them can't work, need to apply the
quirk of ALC255_FIXUP_DELL1_MIC_NO_PRESENCE. So adding their pin
configurations in the pin quirk table.

Cc: <stable@vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit b26e36b7ef36a8a3a147b1609b2505f8a4ecf511
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git)
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

sunrpc: use-after-free in svc_process_common()

CVE-2018-16884

if node have NFSv41+ mounts inside several net namespaces
it can lead to use-after-free in svc_process_common()

svc_process_common()
/* Setup reply header */
rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); <<< HERE

svc_process_common() can use incorrect rqstp->rq_xprt,
its caller function bc_svc_process() takes it from serv->sv_bc_xprt.
The problem is that serv is global structure but sv_bc_xprt
is assigned per-netnamespace.

According to Trond, the whole "let's set up rqstp->rq_xprt
for the back channel" is nothing but a giant hack in order
to work around the fact that svc_process_common() uses it
to find the xpt_ops, and perform a couple of (meaningless
for the back channel) tests of xpt_flags.

All we really need in svc_process_common() is to be able to run
rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr()

Bruce J Fields points that this xpo_prep_reply_hdr() call
is an awfully roundabout way just to do "svc_putnl(resv, 0);"
in the tcp case.

This patch does not initialiuze rqstp->rq_xprt in bc_svc_process(),
now it calls svc_process_common() with rqstp->rq_xprt = NULL.

To adjust reply header svc_process_common() just check
rqstp->rq_prot and calls svc_tcp_prep_reply_hdr() for tcp case.

To handle rqstp->rq_xprt = NULL case in functions called from
svc_process_common() patch intruduces net namespace pointer
svc_rqst->rq_bc_net and adjust SVC_NET() definition.
Some other function was also adopted to properly handle described case.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Cc: stable@vger.kernel.org
Fixes: 23c20ecd4475 ("NFS: callback up - users counting cleanup")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
(backported from commit d4b09acf924b84bae77cad090a9d108e70b43643)
[ kmously: The upstream patch had 2 additional NULL-checks in
include/trace/events/sunrpc.h that aren't applicable to 4.4.
Also added a declaration for svc_tcp_prep_reply_hdr() in svc.c to
avoid implicit-function-declaration warnings/errors ]
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: You-Sheng Yang <vicamo.yang@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

sunrpc: use SVC_NET() in svcauth_gss_* functions

CVE-2018-16884

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
(cherry picked from commit b8be5674fa9a6f3677865ea93f7803c4212f3e10)
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: You-Sheng Yang <vicamo.yang@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

sky2: Increase D3 delay again

Another platform requires even longer delay to make the device work
correctly after S3.

So increase the delay to 300ms.

BugLink: https://bugs.launchpad.net/bugs/1798921
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1765f5dcd00963e33f1b8a4e0f34061fbc0e2f7f)
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

brcmfmac: assure SSID length from firmware is limited

The SSID length as received from firmware should not exceed
IEEE80211_MAX_SSID_LEN as that would result in heap overflow.

Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
CVE-2019-9500

(cherry picked from commit 1b5e2423164b3670e8bc9174e4762d297990deff)
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

brcmfmac: add subtype check for event handling in data path

For USB there is no separate channel being used to pass events
from firmware to the host driver and as such are passed over the
data path. In order to detect mock event messages an additional
check is needed on event subtype. This check is added conditionally
using unlikely() keyword.

Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
CVE-2019-9503

(cherry picked from commit a4176ec356c73a46c07c181c6d04039fafa34a9f)
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

vfio/type1: Limit DMA mappings per container

Memory backed DMA mappings are accounted against a user's locked
memory limit, including multiple mappings of the same memory. This
accounting bounds the number of such mappings that a user can create.
However, DMA mappings that are not backed by memory, such as DMA
mappings of device MMIO via mmaps, do not make use of page pinning
and therefore do not count against the user's locked memory limit.
These mappings still consume memory, but the memory is not well
associated to the process for the purpose of oom killing a task.

To add bounding on this use case, we introduce a limit to the total
number of concurrent DMA mappings that a user is allowed to create.
This limit is exposed as a tunable module option where the default
value of 64K is expected to be well in excess of any reasonable use
case (a large virtual machine configuration would typically only make
use of tens of concurrent mappings).

This fixes CVE-2019-3882.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
CVE-2019-3882

(cherry picked from commit 492855939bdb59c6f947b0b5b44af9ad82b7e38c)
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

igb: Fix WARN_ONCE on runtime suspend

BugLink: https://bugs.launchpad.net/bugs/1818490
The runtime_suspend device callbacks are not supposed to save
configuration state or change the power state. Commit fb29f76cc566
("igb: Fix an issue that PME is not enabled during runtime suspend")
changed the driver to not save configuration state during runtime
suspend, however the driver callback still put the device into a
low-power state. This causes a warning in the pci pm core and results in
pci_pm_runtime_suspend not calling pci_save_state or pci_finish_runtime_suspend.

Fix this by not changing the power state either, leaving that to pci pm
core, and make the same change for suspend callback as well.

Also move a couple of defines into the appropriate header file instead
of inline in the .c file.

Fixes: fb29f76cc566 ("igb: Fix an issue that PME is not enabled during runtime suspend")
Signed-off-by: Arvind Sankar <niveditas98@gmail.com>
Reviewed-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit dabb8338be533c18f50255cf39ff4f66d4dabdbe)
Signed-off-by: You-Sheng Yang <vicamo.yang@canonical.com>
Acked-by: AceLan Kao <acelan.kao@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

fuse: fix initial parallel dirops

BugLink: https://bugs.launchpad.net/bugs/1823972
If parallel dirops are enabled in FUSE_INIT reply, then first operation may
leave fi->mutex held.

Reported-by: syzbot <syzbot+3f7b29af1baa9d0a55be@syzkaller.appspotmail.com>
Fixes: 5c672ab3f0ee ("fuse: serialize dirops by default")
Cc: <stable@vger.kernel.org> # v4.7
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
(cherry picked from commit 63576c13bd17848376c8ba4a98f5d5151140c4ac)
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

drm/amdgpu: fix&cleanups for wb_clear

BugLink: https://bugs.launchpad.net/bugs/1825074
fix:
should do right shift on wb before clearing

cleanups:
1,should memset all wb buffer
2,set max wb number to 128 (total 4KB) is big enough

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 73469585510d5161368c899b7eacd58c824b2b24)
Signed-off-by: You-Sheng Yang <vicamo.yang@canonical.com>
Acked-by: AceLan Kao <acelan.kao@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ALSA: hda/realtek - Add support headset mode for New DELL WYSE NB

BugLink: https://launchpad.net/bugs/1821290
Enable headset mode support for new WYSE NB platform.

Signed-off-by: Kailang Yang <kailang@realtek.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit da484d00f020af3dd7cfcc6c4b69a7f856832883)
Signed-off-by: Wen-chien Jesse Sung <jesse.sung@canonical.com>
Acked-by: AceLan Kao <acelan.kao@canonical.com>
Acked-by: hwang4 <hui.wang@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ALSA: hda/realtek - Add support headset mode for DELL WYSE AIO

BugLink: https://launchpad.net/bugs/1821290
This patch will enable WYSE AIO for Headset mode.

Signed-off-by: Kailang Yang <kailang@realtek.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(backported from commit 136824efaab2c095fc911048f7c7ddeda258c965)
Signed-off-by: Wen-chien Jesse Sung <jesse.sung@canonical.com>
Acked-by: AceLan Kao <acelan.kao@canonical.com>
Acked-by: hwang4 <hui.wang@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ALSA: hda/realtek - Disable headset Mic VREF for headset mode of ALC225

BugLink: https://launchpad.net/bugs/1821290
Disable Headset Mic VREF for headset mode of ALC225.
This will be controlled by coef bits of headset mode functions.

[ Fixed a compile warning and code simplification -- tiwai ]

Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(backported from commit d1dd42110d2727e81b9265841a62bc84c454c3a2)
Signed-off-by: Wen-chien Jesse Sung <jesse.sung@canonical.com>
Acked-by: AceLan Kao <acelan.kao@canonical.com>
Acked-by: hwang4 <hui.wang@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ALSA: hda/realtek - Add unplug function into unplug state of Headset Mode for ALC225

BugLink: https://launchpad.net/bugs/1821290
Forgot to add unplug function to unplug state of headset mode
for ALC225.

Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit 4d4b0c52bde470c379f5d168d5c139ad866cb808)
Signed-off-by: Wen-chien Jesse Sung <jesse.sung@canonical.com>
Acked-by: AceLan Kao <acelan.kao@canonical.com>
Acked-by: hwang4 <hui.wang@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

mac80211_hwsim: Timer should be initialized before device registered

BugLink: https://bugs.launchpad.net/bugs/1825058
Otherwise if network manager starts configuring Wi-Fi interface
immidiatelly after getting notification of its creation, we will get
NULL pointer dereference:

  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: [<ffffffff95ae94c8>] hrtimer_active+0x28/0x50
  ...
  Call Trace:
   [<ffffffff95ae9997>] ? hrtimer_try_to_cancel+0x27/0x110
   [<ffffffff95ae9a95>] ? hrtimer_cancel+0x15/0x20
   [<ffffffffc0803bf0>] ? mac80211_hwsim_config+0x140/0x1c0 [mac80211_hwsim]

Cc: stable@vger.kernel.org
Signed-off-by: Vasyl Vavrychuk <vasyl.vavrychuk@globallogic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
(cherry picked from commit a1881c9b8a1edef0a5ae1d5c1b61406fe3402114)
Signed-off-by: You-Sheng Yang <vicamo.yang@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ALSA: hda - Add two more machines to the power_save_blacklist

BugLink: https://bugs.launchpad.net/bugs/1821663
Recently we set CONFIG_SND_HDA_POWER_SAVE_DEFAULT to 1 when
configuring the kernel, then two machines were reported to have noise
after installing the new kernel. Put them in the blacklist, the
noise disappears.

https://bugs.launchpad.net/bugs/1821663
Cc: <stable@vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(backported from commit cae30527901d9590db0e12ace994c1d58bea87fd)
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: You-Sheng Yang <vicamo.yang@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ALSA: hda - add Lenovo IdeaCentre B550 to the power_save_blacklist

BugLink: https://bugs.launchpad.net/bugs/1821663
Another machine which does not like the power saving (noise):
https://bugzilla.redhat.com/show_bug.cgi?id=1689623

Also, reorder the Lenovo C50 entry to keep the table sorted.

Reported-by: hs.guimaraes@outlook.com
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit 721f1e6c1fd137e7e2053d8e103b666faaa2d50c)
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: You-Sheng Yang <vicamo.yang@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ALSA: hda: Add Intel NUC7i3BNB to the power_save blacklist

BugLink: https://bugs.launchpad.net/bugs/1821663
Power-saving is causing a humming sound when active on the Intel
NUC7i3BNB, add it to the blacklist.

BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1520902
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit dd6dd5365404a31292715e6f54184f47f9b6aca5)
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: You-Sheng Yang <vicamo.yang@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

nbd: fix how we set bd_invalidated

BugLink: https://bugs.launchpad.net/bugs/1822247
bd_invalidated is kind of a pain wrt partitions as it really only
triggers the partition rescan if it is set after bd_ops->open() runs, so
setting it when we reset the device isn't useful. We also sporadically
would still have partitions left over in some disconnect cases, so fix
this by always setting bd_invalidated on open if there's no
configuration or if we've had a disconnect action happen, that way the
partition table gets invalidated and rescanned properly.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from fe1f9e6659ca6124f500a0f829202c7c902fab0c)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

kvmclock: fix TSC calibration for nested guests

BugLink: https://bugs.launchpad.net/bugs/1822821
Inside a nested guest, access to hardware can be slow enough that
tsc_read_refs always return ULLONG_MAX, causing tsc_refine_calibration_work
to be called periodically and the nested guest to spend a lot of time
reading the ACPI timer.

However, if the TSC frequency is available from the pvclock page,
we can just set X86_FEATURE_TSC_KNOWN_FREQ and avoid the recalibration.
'refine' operation.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
[Commit message rewritten. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit e10f7805032365cc11c739a97f226ebb48aee042)
Signed-off-by: Heitor R. Alves de Siqueira <halves@canonical.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ipvs: fix refcount usage for conns in ops mode

BugLink: https://bugs.launchpad.net/bugs/1819786
Connections in One-packet scheduling mode (-o, --ops) are
removed with refcnt=0 because they are not hashed in conn table.
To avoid refcount_dec reporting this as error, change them to be
removed with refcount_dec_if_one as all other connections.

refcount_t hit zero at ip_vs_conn_put+0x31/0x40 [ip_vs]
in sh[15519], uid/euid: 497/497
WARNING: CPU: 0 PID: 15519 at ../kernel/panic.c:657
refcount_error_report+0x94/0x9e
Modules linked in: ip_vs_rr cirrus ttm sb_edac
edac_core drm_kms_helper crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel pcbc mousedev drm aesni_intel aes_x86_64
crypto_simd glue_helper cryptd psmouse evdev input_leds led_class
intel_agp fb_sys_fops syscopyarea sysfillrect intel_rapl_perf mac_hid
intel_gtt serio_raw sysimgblt agpgart i2c_piix4 i2c_core ata_generic
pata_acpi floppy cfg80211 rfkill button loop macvlan ip_vs
nf_conntrack libcrc32c crc32c_generic ip_tables x_tables ipv6
crc_ccitt autofs4 ext4 crc16 mbcache jbd2 fscrypto ata_piix libata
atkbd libps2 scsi_mod crc32c_intel i8042 rtc_cmos serio af_packet
dm_mod dax fuse xen_netfront xen_blkfront
CPU: 0 PID: 15519 Comm: sh Tainted: G        W
4.15.17 #1-NixOS
Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
RIP: 0010:refcount_error_report+0x94/0x9e
RSP: 0000:ffffa344dde039c8 EFLAGS: 00010296
RAX: 0000000000000057 RBX: ffffffff92f20e06 RCX: 0000000000000006
RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffffa344dde165c0
RBP: ffffa344dde03b08 R08: 0000000000000218 R09: 0000000000000004
R10: ffffffff93006a80 R11: 0000000000000001 R12: ffffa344d68cd100
R13: 00000000000001f1 R14: ffffffff92f12fb0 R15: 0000000000000004
FS:  00007fc9d2040fc0(0000) GS:ffffa344dde00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000262a000 CR3: 0000000016a0c004 CR4: 00000000001606f0
Call Trace:
<IRQ>
ex_handler_refcount+0x4e/0x80
fixup_exception+0x33/0x40
do_trap+0x83/0x140
do_error_trap+0x83/0xf0
? ip_vs_conn_drop_conntrack+0x120/0x1a5 [ip_vs]
? ip_finish_output2+0x29c/0x390
? ip_finish_output2+0x1a2/0x390
invalid_op+0x1b/0x40
RIP: 0010:ip_vs_conn_put+0x31/0x40 [ip_vs]
RSP: 0000:ffffa344dde03bb8 EFLAGS: 00010246
RAX: 0000000000000001 RBX: ffffa344df31cf00 RCX: ffffa344d7450198
RDX: 0000000000000003 RSI: 00000000fffffe01 RDI: ffffa344d7450140
RBP: 0000000000000002 R08: 0000000000000476 R09: 0000000000000000
R10: ffffa344dde03b28 R11: ffffa344df200000 R12: ffffa344d7d09000
R13: ffffa344def3a980 R14: ffffffffc04f6e20 R15: 0000000000000008
ip_vs_in.part.29.constprop.36+0x34f/0x640 [ip_vs]
? ip_vs_conn_out_get+0xe0/0xe0 [ip_vs]
ip_vs_remote_request4+0x47/0xa0 [ip_vs]
? ip_vs_in.part.29.constprop.36+0x640/0x640 [ip_vs]
nf_hook_slow+0x43/0xc0
ip_local_deliver+0xac/0xc0
? ip_rcv_finish+0x400/0x400
ip_rcv+0x26c/0x380
__netif_receive_skb_core+0x3a0/0xb10
? inet_gro_receive+0x23c/0x2b0
? netif_receive_skb_internal+0x24/0xb0
netif_receive_skb_internal+0x24/0xb0
napi_gro_receive+0xb8/0xe0
xennet_poll+0x676/0xb40 [xen_netfront]
net_rx_action+0x139/0x3a0
__do_softirq+0xde/0x2b4
irq_exit+0xae/0xb0
xen_evtchn_do_upcall+0x2c/0x40
xen_hvm_callback_vector+0x7d/0x90
</IRQ>
RIP: 0033:0x7fc9d11c91f9
RSP: 002b:00007ffebe8a2ea0 EFLAGS: 00000202 ORIG_RAX:
ffffffffffffff0c
RAX: 00000000ffffffff RBX: 0000000002609808 RCX: 0000000000000054
RDX: 0000000000000001 RSI: 0000000002605440 RDI: 00000000025f940e
RBP: 00000000025f940e R08: 000000000260213d R09: 1999999999999999
R10: 000000000262a808 R11: 00000000025f942d R12: 00000000025f940e
R13: 00007fc9d1301e20 R14: 00000000025f9408 R15: 00007fc9d1302720
Code: 48 8b 95 80 00 00 00 41 55 49 8d 8c 24 e0 05 00
00 45 8b 84 24 38 04 00 00 41 89 c1 48 89 de 48 c7 c7 a8 2f f2 92 e8
7c fa ff ff <0f> 0b 58 5b 5d 41 5c 41 5d c3 0f 1f 44 00 00 55 48 89 e5
41 56

Reported-by: Net Filter <netfilternetfilter@gmail.com>
Fixes: b54ab92b84b6 ("netfilter: refcounter conversions")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit a050d345cef0dc6249263540da1e902bba617e43)
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: You-Sheng Yang <vicamo.yang@canonical.com>
Acked-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

openvswitch: fix flow actions reallocation

BugLink: https://bugs.launchpad.net/bugs/1813244
The flow action buffer can be resized if it's not big enough to contain
all the requested flow actions. However, this resize doesn't take into
account the new requested size, the buffer is only increased by a factor
of 2x. This might be not enough to contain the new data, causing a
buffer overflow, for example:

[   42.044472] =============================================================================
[   42.045608] BUG kmalloc-96 (Not tainted): Redzone overwritten
[   42.046415] -----------------------------------------------------------------------------

[   42.047715] Disabling lock debugging due to kernel taint
[   42.047716] INFO: 0x8bf2c4a5-0x720c0928. First byte 0x0 instead of 0xcc
[   42.048677] INFO: Slab 0xbc6d2040 objects=29 used=18 fp=0xdc07dec4 flags=0x2808101
[   42.049743] INFO: Object 0xd53a3464 @offset=2528 fp=0xccdcdebb

[   42.050747] Redzone 76f1b237: cc cc cc cc cc cc cc cc                          ........
[   42.051839] Object d53a3464: 6b 6b 6b 6b 6b 6b 6b 6b 0c 00 00 00 6c 00 00 00  kkkkkkkk....l...
[   42.053015] Object f49a30cc: 6c 00 0c 00 00 00 00 00 00 00 00 03 78 a3 15 f6  l...........x...
[   42.054203] Object acfe4220: 20 00 02 00 ff ff ff ff 00 00 00 00 00 00 00 00   ...............
[   42.055370] Object 21024e91: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[   42.056541] Object 070e04c3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[   42.057797] Object 948a777a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[   42.059061] Redzone 8bf2c4a5: 00 00 00 00                                      ....
[   42.060189] Padding a681b46e: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ

Fix by making sure the new buffer is properly resized to contain all the
requested data.

BugLink: https://bugs.launchpad.net/bugs/1813244
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f28cd2af22a0c134e4aa1c64a70f70d815d473fb)
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Acked-by: Juerg Haefliger <juergh@canonical.com>
Acked-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Juerg Haefliger <juergh@canonical.com>

UBUNTU: Ubuntu-4.15.0-48.51

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>

UBUNTU: link-to-tracker: update tracking bug

BugLink: https://bugs.launchpad.net/bugs/1822820
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>

UBUNTU: Start new release

Ignore: yes
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>

UBUNTU: [Packaging] resync retpoline extraction

BugLink: http://bugs.launchpad.net/bugs/1786013
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>

UBUNTU: [Packaging] update helper scripts

BugLink: http://bugs.launchpad.net/bugs/1786013
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>

btrfs: raid56: properly unmap parity page in finish_parity_scrub()

Buglink: https://bugs.launchpad.net/bugs/1812845
Parity page is incorrectly unmapped in finish_parity_scrub(), triggering
a reference counter bug on i386, i.e.:

[ 157.662401] kernel BUG at mm/highmem.c:349!
[ 157.666725] invalid opcode: 0000 [#1] SMP PTI

The reason is that kunmap(p_page) was completely left out, so we never
did an unmap for the p_page and the loop unmapping the rbio page was
iterating over the wrong number of stripes: unmapping should be done
with nr_data instead of rbio->real_stripes.

Test case to reproduce the bug:

- create a raid5 btrfs filesystem:
   # mkfs.btrfs -m raid5 -d raid5 /dev/sdb /dev/sdc /dev/sdd /dev/sde

- mount it:
   # mount /dev/sdb /mnt

- run btrfs scrub in a loop:
   # while :; do btrfs scrub start -BR /mnt; done

BugLink: https://bugs.launchpad.net/bugs/1812845
Fixes: 5a6ac9eacb49 ("Btrfs, raid56: support parity scrub on raid56")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
(cherry picked from commit 3897b6f0a859288c22fb793fad11ec2327e60fcd)
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Acked-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

cpupower : Fix header name to read idle state name

BugLink: https://bugs.launchpad.net/bugs/1719545
The names of the idle states in the output of cpupower monitor command are
truncated to 4 characters. On POWER9, this creates ambiguity as the states
are named "stop0", "stop1", etc.

root:~# cpupower monitor
              |Idle_Stats
PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop
   0|   0|   0|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  1.90
   0|   0|   1|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00
   0|   0|   2|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00
   0|   0|   3|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00

This patch modifies the output to print the state name that results in a
legible output. The names will be printed with atmost 1 padding in left.

root:~# cpupower monitor
              | Idle_Stats
PKG|CORE| CPU|snooze|stop0L| stop0|stop1L| stop1|stop2L| stop2
   0|   0|   0|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.72
   0|   0|   1|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00
   0|   0|   2|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00
   0|   0|   3|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00

This patch does not affect the output for intel.
Output for intel before applying the patch:

root:~# cpupower monitor
    |Idle_Stats
CPU | POLL | C1-S | C1E- | C3-S | C6-S | C7s- | C8-S | C9-S | C10-
   0|  0.00|  0.14|  0.39|  0.35|  7.41|  0.00| 17.67|  1.01| 70.03
   2|  0.00|  0.19|  0.47|  0.10|  6.50|  0.00| 29.66|  2.17| 58.07
   1|  0.00|  0.11|  0.50|  1.50|  9.11|  0.18| 18.19|  0.40| 66.63
   3|  0.00|  0.67|  0.42|  0.03|  5.84|  0.00| 12.58|  0.77| 77.14

Output for intel after applying the patch:

root:~# cpupower monitor
    | Idle_Stats
CPU| POLL | C1-S | C1E- | C3-S | C6-S | C7s- | C8-S | C9-S | C10-
   0|  0.03|  0.33|  1.01|  0.27|  3.03|  0.00| 19.18|  0.00| 71.24
   2|  0.00|  1.58|  0.58|  0.42|  8.55|  0.09| 21.11|  0.99| 63.32
   1|  0.00|  1.26|  0.88|  0.43|  9.00|  0.02|  7.78|  4.65| 71.91
   3|  0.00|  0.30|  0.42|  0.06| 13.62|  0.21| 30.29|  0.00| 52.45

Signed-off-by: Abhishek Goel <huntbag@linux.vnet.ibm.com>
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
(cherry picked from commit f9652d5cae04eb5e85303c087f5842d320499c65)
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

drm/amdgpu: Free VGA stolen memory as soon as possible.

BugLink: https://launchpad.net/bugs/1818617
Reserved VRAM is used to avoid overriding pre OS FB.
Once our display stack takes over we don't need the reserved
VRAM anymore.

v2:
Remove comment, we know actually why we need to reserve the stolen VRAM.
Fix return type for amdgpu_ttm_late_init.
v3:
Return 0 in amdgpu_bo_late_init, rebase on changes to previous patch
v4: rebase
v5:
For GMC9 reserve always just 9M and keep the stolem memory around
until GART table curruption on S3 resume is resolved.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(backported from commit 6f752ec2c20c6a575da29d5b297980f376830e6b)
Signed-off-by: Timo Aaltonen <timo.aaltonen@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

drm/amdgpu/gmc: steal the appropriate amount of vram for fw hand-over (v3)

BugLink: https://launchpad.net/bugs/1818617
Steal 9 MB for vga emulation and fb if vga is enabled, otherwise,
steal enough to cover the current display size as set by the vbios.

If no memory is used (e.g., secondary or headless card), skip
stolen memory reserve.

v2: skip reservation if vram is limited, address Christian's comments
v3: squash in fix from Harry

Reviewed-and-Tested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> (v2)
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
(cherry picked from commit ebdef28ebbcf767d9fa687acb1d02d97d834c628)
Signed-off-by: Timo Aaltonen <timo.aaltonen@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

iommu/vt-d: Disable ATS support on untrusted devices

BugLink: https://bugs.launchpad.net/bugs/1820153
Commit fb58fdcd295b9 ("iommu/vt-d: Do not enable ATS for untrusted
devices") disables ATS support on the devices which have been marked
as untrusted. Unfortunately this is not enough to fix the DMA attack
vulnerabiltiies because IOMMU driver allows translated requests as
long as a device advertises the ATS capability. Hence a malicious
peripheral device could use this to bypass IOMMU.

This disables the ATS support on untrusted devices by clearing the
internal per-device ATS mark. As the result, IOMMU driver will block
any translated requests from any device marked as untrusted.

Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Fixes: fb58fdcd295b9 ("iommu/vt-d: Do not enable ATS for untrusted devices")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
(backported from commit d8b8591054575f33237556c32762d54e30774d28)
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

thunderbolt: Export IOMMU based DMA protection support to userspace

BugLink: https://bugs.launchpad.net/bugs/1820153
Recent systems with Thunderbolt ports may support IOMMU natively. In
practice this means that Thunderbolt connected devices are placed behind
an IOMMU during the whole time it is connected (including during boot)
making Thunderbolt security levels redundant. This is called Kernel DMA
protection [1] by Microsoft.

Some of these systems still have Thunderbolt security level set to
"user" in order to support OS downgrade (the older version of the OS
might not support IOMMU based DMA protection so connecting a device
still relies on user approval).

Export this information to userspace by introducing a new sysfs
attribute (iommu_dma_protection). Based on it userspace tools can make
more accurate decision whether or not authorize the connected device.

In addition update Thunderbolt documentation regarding IOMMU based DMA
protection.

[1] https://docs.microsoft.com/en-us/windows/security/information-protection/kernel-dma-protection-for-thunderbolt

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Yehezkel Bernat <YehezkelShB@gmail.com>
(cherry picked from commit dcc3c9e37fbd70e728d08cce0e50121605390fa0)
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

iommu/vt-d: Do not enable ATS for untrusted devices

BugLink: https://bugs.launchpad.net/bugs/1820153
Currently Linux automatically enables ATS (Address Translation Service)
for any device that supports it (and IOMMU is turned on). ATS is used to
accelerate DMA access as the device can cache translations locally so
there is no need to do full translation on IOMMU side. However, as
pointed out in [1] ATS can be used to bypass IOMMU based security
completely by simply sending PCIe read/write transaction with AT
(Address Translation) field set to "translated".

To mitigate this modify the Intel IOMMU code so that it does not enable
ATS for any device that is marked as being untrusted. In case this turns
out to cause performance issues we may selectively allow ATS based on
user decision but currently use big hammer and disable it completely to
be on the safe side.

[1] https://www.repository.cam.ac.uk/handle/1810/274352

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
(cherry picked from commit fb58fdcd295b914ece1d829b24df00a17a9624bc)
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

iommu/vt-d: Force IOMMU on for platform opt in hint

BugLink: https://bugs.launchpad.net/bugs/1820153
Intel VT-d spec added a new DMA_CTRL_PLATFORM_OPT_IN_FLAG flag in DMAR
ACPI table [1] for BIOS to report compliance about platform initiated
DMA restricted to RMRR ranges when transferring control to the OS. This
means that during OS boot, before it enables IOMMU none of the connected
devices can bypass DMA protection for instance by overwriting the data
structures used by the IOMMU. The OS also treats this as a hint that the
IOMMU should be enabled to prevent DMA attacks from possible malicious
devices.

A use of this flag is Kernel DMA protection for Thunderbolt [2] which in
practice means that IOMMU should be enabled for PCIe devices connected
to the Thunderbolt ports. With IOMMU enabled for these devices, all DMA
operations are limited in the range reserved for it, thus the DMA
attacks are prevented. All these devices are enumerated in the PCI/PCIe
module and marked with an untrusted flag.

This forces IOMMU to be enabled if DMA_CTRL_PLATFORM_OPT_IN_FLAG is set
in DMAR ACPI table and there are PCIe devices marked as untrusted in the
system. This can be turned off by adding "intel_iommu=off" in the kernel
command line, if any problems are found.

[1] https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf
[2] https://docs.microsoft.com/en-us/windows/security/information-protection/kernel-dma-protection-for-thunderbolt

Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
(cherry picked from commit 89a6079df791aeace2044ea93be1b397195824ec)
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

PCI / ACPI: Identify untrusted PCI devices

BugLink: https://bugs.launchpad.net/bugs/1820153
A malicious PCI device may use DMA to attack the system. An external
Thunderbolt port is a convenient point to attach such a device. The OS
may use IOMMU to defend against DMA attacks.

Some BIOSes mark these externally facing root ports with this
ACPI _DSD [1]:

  Name (_DSD, Package () {
      ToUUID ("efcc06cc-73ac-4bc3-bff0-76143807c389"),
      Package () {
          Package () {"ExternalFacingPort", 1},
  Package () {"UID", 0 }
      }
  })

If we find such a root port, mark it and all its children as untrusted.
The rest of the OS may use this information to enable DMA protection
against malicious devices. For instance the device may be put behind an
IOMMU to keep it from accessing memory outside of what the driver has
allocated for it.

While at it, add a comment on top of prp_guids array explaining the
possible caveat resulting when these GUIDs are treated equivalent.

[1] https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
(backported from commit 617654aae50eb59dd98aa53fb562e850937f4cde)
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ACPI / property: Allow multiple property compatible _DSD entries

BugLink: https://bugs.launchpad.net/bugs/1820153
It is possible to have _DSD entries where the data is compatible with
device properties format but are using different GUID for various reasons.
In addition to that there can be many such _DSD entries for a single device
such as for PCIe root port used to host a Thunderbolt hierarchy:

    Scope (\_SB.PCI0.RP21)
    {
        Name (_DSD, Package () {
            ToUUID ("6211e2c0-58a3-4af3-90e1-927a4e0c55a4"),
            Package () {
                Package () {"HotPlugSupportInD3", 1}
            },

            ToUUID ("efcc06cc-73ac-4bc3-bff0-76143807c389"),
            Package () {
                Package () {"ExternalFacingPort", 1},
                Package () {"UID", 0 }
            }
        })
    }

More information about these new _DSD entries can be found in:

  https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports

To make these available for drivers via unified device property APIs,
modify ACPI property core so that it supports multiple _DSD entries
organized in a linked list. We also store GUID of each _DSD entry in struct
acpi_device_properties in case there is need to differentiate between
entries. The supported GUIDs are then listed in prp_guids array.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
(cherry picked from commit 5f5e4890d57a8af5da72c9d73a4efa9bad43a7a3)
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ACPICA: AML parser: attempt to continue loading table after error

BugLink: https://bugs.launchpad.net/bugs/1820153
This change alters the parser so that the table load does not abort
upon an error.

Notable changes:

If there is an error while parsing an element of the termlist, we
will skip parsing the current termlist element and continue parsing
to the next opcode in the termlist.

If we get an error while parsing the conditional of If/Else/While or
the device name of Scope, we will skip the body of the statement all
together and pop the parser_state.

If we get an error while parsing the base offset and length of an
operation region declaration, we will remove the operation region
from the namespace.

Signed-off-by: Erik Schmauss <erik.schmauss@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(backported from commit 5088814a6e931350e5bd29f5d59fa40c6dbbdf10)
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

powerpc/powernv/npu: Fault user page into the hypervisor's pagetable

BugLink: https://bugs.launchpad.net/bugs/1819989
When a page fault happens in a GPU, the GPU signals the OS and the GPU
driver calls the fault handler which populated a page table; this allows
the GPU to complete an ATS request.

On the bare metal get_user_pages() is enough as it adds a pte to
the kernel page table but under KVM the partition scope tree does not get
updated so ATS will still fail.

This reads a byte from an effective address which causes HV storage
interrupt and KVM updates the partition scope tree.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 58629c0dc34904d135af944d120eb23165ec3b61)
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

powerpc/powernv/npu: Check mmio_atsd array bounds when populating

BugLink: https://bugs.launchpad.net/bugs/1819989
A broken device tree might contain more than 8 values and introduce hard
to debug memory corruption bug. This adds the boundary check.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 135ef954051b102870a8d47a8eb822af1f1b1ec1)
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

powerpc/pseries: Remove IOMMU API support for non-LPAR systems

BugLink: https://bugs.launchpad.net/bugs/1819989
The pci_dma_bus_setup_pSeries and pci_dma_dev_setup_pSeries hooks are
registered for the pseries platform which does not have FW_FEATURE_LPAR;
these would be pre-powernv platforms which we never supported PCI pass
through for anyway so remove it.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit c409c6316166993163e29312aeaaf1c0c300a04a)
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

powerpc/pseries/npu: Enable platform support

BugLink: https://bugs.launchpad.net/bugs/1819989
We already changed NPU API for GPUs to not to call OPAL and the remaining
bit is initializing NPU structures.

This searches for POWER9 NVLinks attached to any device on a PHB and
initializes an NPU structure if any found.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(backported from commit 3be2df00e299821ad255498ac4411906a8d59cfa)
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

powerpc/pseries/iommu: Use memory@ nodes in max RAM address calculation

BugLink: https://bugs.launchpad.net/bugs/1819989
We might have memory@ nodes with "linux,usable-memory" set to zero
(for example, to replicate powernv's behaviour for GPU coherent memory)
which means that the memory needs an extra initialization but since
it can be used afterwards, the pseries platform will try mapping it
for DMA so the DMA window needs to cover those memory regions too;
if the window cannot cover new memory regions, the memory onlining fails.

This walks through the memory nodes to find the highest RAM address to
let a huge DMA window cover that too in case this memory gets onlined
later.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 68c0449ea16d775e762b532afddb4d6a5f161877)
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

powerpc/powernv/npu: Move OPAL calls away from context manipulation

BugLink: https://bugs.launchpad.net/bugs/1819989
When introduced, the NPU context init/destroy helpers called OPAL which
enabled/disabled PID (a userspace memory context ID) filtering in an NPU
per a GPU; this was a requirement for P9 DD1.0. However newer chip
revision added a PID wildcard support so there is no more need to
call OPAL every time a new context is initialized. Also, since the PID
wildcard support was added, skiboot does not clear wildcard entries
in the NPU so these remain in the hardware till the system reboot.

This moves LPID and wildcard programming to the PE setup code which
executes once during the booting process so NPU2 context init/destroy
won't need to do additional configuration.

This replaces the check for FW_FEATURE_OPAL with a check for npu!=NULL as
this is the way to tell if the NPU support is present and configured.

This moves pnv_npu2_init() declaration as pseries should be able to use it.
This keeps pnv_npu2_map_lpar() in powernv as pseries is not allowed to
call that. This exports pnv_npu2_map_lpar_dev() as following patches
will use it from the VFIO driver.

While at it, replace redundant list_for_each_entry_safe() with
a simpler list_for_each_entry().

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 0e759bd75285e96fbb4013d1303b08fdb8ba58e1)
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

powerpc/powernv: Move npu struct from pnv_phb to pci_controller

BugLink: https://bugs.launchpad.net/bugs/1819989
The powernv PCI code stores NPU data in the pnv_phb struct. The latter
is referenced by pci_controller::private_data. We are going to have NPU2
support in the pseries platform as well but it does not store any
private_data in in the pci_controller struct; and even if it did,
it would be a different data structure.

This makes npu a pointer and stores it one level higher in
the pci_controller struct.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(backported from commit 46a1449d9e39478a35d35d9d9025776f6cee24fb)
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn

BugLink: https://bugs.launchpad.net/bugs/1819989
The pcidev value stored in pci_dn is only used for NPU/NPU2
initialization. We can easily drop the cached pointer and
use an ancient helper - pci_get_domain_bus_and_slot() instead in order
to reduce complexity.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 902bdc57451c2c64aa139bbe24067f70a186db0a)
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

powerpc/powernv: Make possible for user to force a full ipl cec reboot

BugLink: https://bugs.launchpad.net/bugs/1819989
Ever since fast reboot is enabled by default in opal,
opal_cec_reboot() will use fast-reset instead of full IPL to perform
system reboot. This leaves the user with no direct way to force a full
IPL reboot except changing an nvram setting that persistently disables
fast-reset for all subsequent reboots.

This patch provides a more direct way for the user to force a one-shot
full IPL reboot by passing the command line argument 'full' to the
reboot command. So the user will be able to tweak the reboot behavior
via:

  $ sudo reboot full # Force a full ipl reboot skipping fast-reset

  or
  $ sudo reboot   # default reboot path (usually fast-reset)

The reboot command passes the un-parsed command argument to the kernel
via the 'Reboot' syscall which is then passed on to the arch function
pnv_restart(). The patch updates pnv_restart() to handle this cmd-arg
and issues opal_cec_reboot2 with OPAL_REBOOT_FULL_IPL to force a full
IPL reset.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 8139046a5a34787849df81f4a5875cf4b404a7a1)
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

powerpc/powernv: Export opal_check_token symbol

BugLink: https://bugs.launchpad.net/bugs/1819989
Export opal_check_token symbol for modules to check the availability
of OPAL calls before using them.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 6e708000ec2c93c2bde6a46aa2d6c3e80d4eaeb9)
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

powerpc/powernv: call OPAL_QUIESCE before OPAL_SIGNAL_SYSTEM_RESET

BugLink: https://bugs.launchpad.net/bugs/1819989
Although it is often possible to recover a CPU that was interrupted
from OPAL with a system reset NMI, it's undesirable to interrupt them
for a few reasons. Firstly because dump/debug code itself needs to
call firmware, so it could hang on a lock or possibly corrupt a
per-cpu data structure if it or another CPU was interrupted from
OPAL. Secondly, the kexec crash dump code will not return from
interrupt to unwind the OPAL call.

Call OPAL_QUIESCE with QUIESCE_HOLD before sending an NMI IPI to
another CPU, which wait for it to leave firmware (or time out) to
avoid this problem in normal conditions. Firmware bugs may still
result in a timeout and interrupting OPAL, but that is the best
option (stops the CPU, and possibly allows firmware to be debugged).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit ee03b9b4479d1302d01cebedda3518dc967697b7)
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

powerpc/powernv/npu: Do not try invalidating 32bit table when 64bit table is enabled

BugLink: https://bugs.launchpad.net/bugs/1819989
GPUs and the corresponding NVLink bridges get different PEs as they
have separate translation validation entries (TVEs). We put these PEs
to the same IOMMU group so they cannot be passed through separately.
So the iommu_table_group_ops::set_window/unset_window for GPUs do set
tables to the NPU PEs as well which means that iommu_table's list of
attached PEs (iommu_table_group_link) has both GPU and NPU PEs linked.
This list is used for TCE cache invalidation.

The problem is that NPU PE has just a single TVE and can be programmed
to point to 32bit or 64bit windows while GPU PE has two (as any other
PCI device). So we end up having an 32bit iommu_table struct linked to
both PEs even though only the 64bit TCE table cache can be invalidated
on NPU. And a relatively recent skiboot detects this and prints
errors.

This changes GPU's iommu_table_group_ops::set_window/unset_window to
make sure that NPU PE is only linked to the table actually used by the
hardware. If there are two tables used by an IOMMU group, the NPU PE
will use the last programmed one which with the current use scenarios
is expected to be a 64bit one.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit d41ce7b1bcc3e1d02cc9da3b83c0fe355fcb68e0)
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

net-next/hinic: replace disable_irq_nosync/enable_irq

BugLink: https://bugs.launchpad.net/bugs/1820187
In order to avoid frequent system interrupts when sending and
receiving packets. we replace disable_irq_nosync/enable_irq
with hinic_set_msix_state(), hinic_set_msix_state is used to
access memory mapped hinic devices.

Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 905b464ad9008905db099f90ae20f373c7051804)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

net-next/hinic:add shutdown callback

BugLink: https://bugs.launchpad.net/bugs/1820187
If there is no shutdown callback, our board will report pcie UNF errors
after restarting. This patch add shutdown callback for hinic.

Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 53fe3ed19df0bca6ce752fae8e483910b6f112f6)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

hinic: optmize rx refill buffer mechanism

BugLink: https://bugs.launchpad.net/bugs/1820187
There is no need to schedule a different tasklet for refill,
This patch remove it.

Suggested-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e1a76515b0c20c3477200c1345c477cc0e68c4ad)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

net: hinic: fix null pointer dereference on pointer hwdev

BugLink: https://bugs.launchpad.net/bugs/1820187
Pointer hwdev is being dereferenced when declaring hwif , however, later
on hwdev is being null checked, hence we have dereference before null
check error. Fix this by assigning hwif and pdef only once hwdev has
been null checked.

Detected by CoverityScan, CID#1485581 ("Dereference before null check")

Fixes: 4a61abb100c8 ("net-next/hinic:add rx checksum offload for HiNIC")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e159e592872edc0536f55ec242dbc5b70a593265)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

net-next/hinic: fix a bug in rx data flow

BugLink: https://bugs.launchpad.net/bugs/1820187
In rx_alloc_pkts(), there is a loop call of tasklet, which causes
100% cpu utilization, even no packets are being received. This patch
fixes this bug.

Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b1a200484143a727ce293e0f200a543cc7584152)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

net-next/hinic:fix a bug in set mac address

BugLink: https://bugs.launchpad.net/bugs/1820187
In add_mac_addr(), if the MAC address is a muliticast address,
it will not be set, which causes the network card fail to receive
the multicast packet. This patch fixes this bug.

Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9ea72dc9430306b77c73a8a21beb51437cde1d6d)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

net-next/hinic:add rx checksum offload for HiNIC

BugLink: https://bugs.launchpad.net/bugs/1820187
In order to improve performance, this patch adds rx checksum offload
for the HiNIC driver. Performance test(Iperf) shows more than 80%
improvement in TCP streams.

Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4a61abb100c8a647959147034f60e9fce17ce9af)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

net-next/hinic:replace multiply and division operators

BugLink: https://bugs.launchpad.net/bugs/1820187
To improve performance, this patch uses bit operations to replace
multiply and division operators.

Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ebda9b46cebc9c1245fcfe96c76525717ef984cc)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

hinic: Fix l4_type parameter in hinic_task_set_tunnel_l4

BugLink: https://bugs.launchpad.net/bugs/1820187
Clang warns:

drivers/net/ethernet/huawei/hinic/hinic_tx.c:392:34: error: implicit
conversion from enumeration type 'enum hinic_l4_tunnel_type' to
different enumeration type 'enum hinic_l4_offload_type'
[-Werror,-Wenum-conversion]
hinic_task_set_tunnel_l4(task, TUNNEL_UDP_NO_CSUM,
~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~~~~~~~~~~~~~~~
1 error generated.

It seems that hinic_task_set_tunnel_l4 was meant to take an enum of type
hinic_l4_tunnel_type, not hinic_l4_offload_type, given both the name of
the functions and the values used.

Fixes: cc18a7543d2f ("net-next/hinic: add checksum offload and TSO support")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6e29464b8a72e74ec7c3f816f53bfe46a43601bc)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

net-next/hinic: add checksum offload and TSO support

BugLink: https://bugs.launchpad.net/bugs/1820187
This patch adds checksum offload and TSO support for the HiNIC
driver. Perfomance test (Iperf) shows more than 100% improvement
in TCP streams.

Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit cc18a7543d2f63a2c93fc61cfa7fd8be5464f75e)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

hinic: remove ndo_poll_controller

BugLink: https://bugs.launchpad.net/bugs/1820187
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.

hinic uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.

Note that hinic_netpoll() was incorrectly scheduling NAPI
on both RX and TX queues.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e71fb423e0dea3c9f98f0101e965426edfe849cd)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

net-next: hinic: fix a problem in free_tx_poll()

BugLink: https://bugs.launchpad.net/bugs/1820187
This patch fixes the problem below. The problem can be reproduced by the
following steps:
1) Connecting all HiNIC interfaces
2) On server side
    # sudo ifconfig eth0 192.168.100.1 up #Using MLX CX4 card
    # iperf -s
3) On client side
    # sudo ifconfig eth0 192.168.100.2 up #Using our HiNIC card
    # iperf -c 192.168.101.1 -P 10 -t 100000

after hours of testing, we will see errors:

    hinic 0000:05:00.0: No MGMT msg handler, mod = 0
    hinic 0000:05:00.0: No MGMT msg handler, mod = 0
    hinic 0000:05:00.0: No MGMT msg handler, mod = 0
    hinic 0000:05:00.0: No MGMT msg handler, mod = 0

The errors are caused by the following problem.
1) The hinic_get_wqe() checks the "wq->delta" to allocate new WQEs:

if (atomic_sub_return(num_wqebbs, &wq->delta) <= 0) {
atomic_add(num_wqebbs, &wq->delta);
return ERR_PTR(-EBUSY);
}

If the WQE occupies multiple pages, the shadow WQE will be used. Then the
hinic_xmit_frame() fills the WQE.

2) While in parallel with 1), the free_tx_poll() checks the "wq->delta"
to free old WQEs:

if ((atomic_read(&wq->delta) + num_wqebbs) > wq->q_depth)
return ERR_PTR(-EBUSY);

There is a probability that the shadow WQE which hinic_xmit_frame() is
using will be damaged by copy_wqe_to_shadow():

if (curr_pg != end_pg) {
void *shadow_addr = &wq->shadow_wqe[curr_pg * wq->max_wqe_size];

copy_wqe_to_shadow(wq, shadow_addr, num_wqebbs, *cons_idx);
return shadow_addr;
}

This can cause WQE data error and you will see the above error messages.
This patch fixes the problem.

Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9c2956d2ad9e0e7d5827290ba9a716ed3fb83bcd)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

UBUNTU: [Config]: enable highdpi Terminus 16x32 font support

BugLink: https://bugs.launchpad.net/bugs/1819881
Enable the Hi-DPI 16x32 font support.

Also enable the compiled-in font support.

Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

Fonts: New Terminus large console font

BugLink: https://bugs.launchpad.net/bugs/1819881
This patch adds an option to compile-in a high resolution
and large Terminus (ter16x32) bitmap console font for use with
HiDPI and Retina screens.

The font was convereted from standard Terminus ter-i32b.psf
(size 16x32) with the help of psftools and minor hand editing
deleting useless characters.

This patch is non-intrusive, no options are enabled by default so most
users won't notice a thing.

I am placing my changes under the GPL 2.0 just as source Terminus font.

Signed-off-by: Amanoel Dawod <amanoeladawod@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ac8b6f148fc97e9e10b48bd337ef571b1d1136aa)
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

s390/qeth: report 25Gbit link speed

BugLink: https://bugs.launchpad.net/bugs/1814892
This adds the various identifiers for 25Gbit cards, and wires them up
into sysfs and ethtool.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 54e049c227d9968ff6a7d80aae5fec27b54d39da)
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
[ klebers: removed duplicated subject line from the commit message ]
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

x86/dump_pagetables: Fix LDT remap address marker

CVE-2017-5754

The LDT remap placement has been changed. It's now placed before the direct
mapping in the kernel virtual address space for both paging modes.

Change address markers order accordingly.

Fixes: d52888aa2753 ("x86/mm: Move LDT remap out of KASLR region on 5-level paging")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: dave.hansen@linux.intel.com
Cc: luto@kernel.org
Cc: peterz@infradead.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: bhe@redhat.com
Cc: hans.van.kranenburg@mendix.com
Cc: linux-mm@kvack.org
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20181130202328.65359-3-kirill.shutemov@linux.intel.com
(cherry picked from commit 254eb5505ca0ca749d3a491fc6668b6c16647a99)
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

x86/mm: Fix guard hole handling

CVE-2017-5754

There is a guard hole at the beginning of the kernel address space, also
used by hypervisors. It occupies 16 PGD entries.

This reserved range is not defined explicitely, it is calculated relative
to other entities: direct mapping and user space ranges.

The calculation got broken by recent changes of the kernel memory layout:
LDT remap range is now mapped before direct mapping and makes the
calculation invalid.

The breakage leads to crash on Xen dom0 boot[1].

Define the reserved range explicitely. It's part of kernel ABI (hypervisors
expect it to be stable) and must not depend on changes in the rest of
kernel memory layout.

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03313.html

Fixes: d52888aa2753 ("x86/mm: Move LDT remap out of KASLR region on 5-level paging")
Reported-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: dave.hansen@linux.intel.com
Cc: luto@kernel.org
Cc: peterz@infradead.org
Cc: boris.ostrovsky@oracle.com
Cc: bhe@redhat.com
Cc: linux-mm@kvack.org
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20181130202328.65359-2-kirill.shutemov@linux.intel.com
(backported from commit 16877a5570e0c5f4270d5b17f9bab427bcae9514)
[juergh: Adjusted context.]
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

x86/ldt: Remove unused variable in map_ldt_struct()

CVE-2017-5754

Splitting out the sanity check in map_ldt_struct() moved page table syncing
into a separate function, which made the pgd variable unused. Remove it.

[ tglx: Massaged changelog ]

Fixes: 9bae3197e15d ("x86/ldt: Split out sanity check in map_ldt_struct()")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: dave.hansen@linux.intel.com
Cc: peterz@infradead.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: bhe@redhat.com
Cc: willy@infradead.org
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181026122856.66224-4-kirill.shutemov@linux.intel.com
(cherry picked from commit b082f2dd80612015cd6d9d84e52099734ec9a0e1)
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

x86/ldt: Unmap PTEs for the slot before freeing LDT pages

CVE-2017-5754

modify_ldt(2) leaves the old LDT mapped after switching over to the new
one. The old LDT gets freed and the pages can be re-used.

Leaving the mapping in place can have security implications. The mapping is
present in the userspace page tables and Meltdown-like attacks can read
these freed and possibly reused pages.

It's relatively simple to fix: unmap the old LDT and flush TLB before
freeing the old LDT memory.

This further allows to avoid flushing the TLB in map_ldt_struct() as the
slot is unmapped and flushed by unmap_ldt_struct() or has never been mapped
at all.

[ tglx: Massaged changelog and removed the needless line breaks ]

Fixes: f55f0501cbf6 ("x86/pti: Put the LDT in its own PGD if PTI is on")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: dave.hansen@linux.intel.com
Cc: luto@kernel.org
Cc: peterz@infradead.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: bhe@redhat.com
Cc: willy@infradead.org
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181026122856.66224-3-kirill.shutemov@linux.intel.com
(backported from commit a0e6e0831c516860fc7f9be1db6c081fe902ebcf)
[juergh: Adjusted for flush_tlb_mm_range() having fewer arguments.]
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

x86/mm: Move LDT remap out of KASLR region on 5-level paging

CVE-2017-5754

On 5-level paging the LDT remap area is placed in the middle of the KASLR
randomization region and it can overlap with the direct mapping, the
vmalloc or the vmap area.

The LDT mapping is per mm, so it cannot be moved into the P4D page table
next to the CPU_ENTRY_AREA without complicating PGD table allocation for
5-level paging.

The 4 PGD slot gap just before the direct mapping is reserved for
hypervisors, so it cannot be used.

Move the direct mapping one slot deeper and use the resulting gap for the
LDT remap area. The resulting layout is the same for 4 and 5 level paging.

[ tglx: Massaged changelog ]

Fixes: f55f0501cbf6 ("x86/pti: Put the LDT in its own PGD if PTI is on")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: dave.hansen@linux.intel.com
Cc: peterz@infradead.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: bhe@redhat.com
Cc: willy@infradead.org
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181026122856.66224-2-kirill.shutemov@linux.intel.com
(backported from commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15)
[juergh: Adjusted for non-existing pgtable_l5_enabled().]
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>