Michael Ellerman [Thu, 21 Mar 2019 04:24:33 +0000 (15:24 +1100)]
powerpc/security: Fix spectre_v2 reporting
When I updated the spectre_v2 reporting to handle software count cache
flush I got the logic wrong when there's no software count cache
enabled at all.
The result is that on systems with the software count cache flush
disabled we print:
Which correctly indicates that the count cache is disabled, but
incorrectly says the software count cache flush is enabled.
The root of the problem is that we are trying to handle all
combinations of options. But we know now that we only expect to see
the software count cache flush enabled if the other options are false.
So split the two cases, which simplifies the logic and fixes the bug.
We were also missing a space before "(hardware accelerated)".
Michael Ellerman [Mon, 23 Jul 2018 15:07:56 +0000 (01:07 +1000)]
powerpc/powernv: Query firmware for count cache flush settings
Look for fw-features properties to determine the appropriate settings
for the count cache flush, and then call the generic powerpc code to
set it up based on the security feature flags.
BugLink: https://bugs.launchpad.net/bugs/1822870 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 99d54754d3d5f896a8f616b0b6520662bc99d66b) Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Michael Ellerman [Mon, 23 Jul 2018 15:07:55 +0000 (01:07 +1000)]
powerpc/pseries: Query hypervisor for count cache flush settings
Use the existing hypercall to determine the appropriate settings for
the count cache flush, and then call the generic powerpc code to set
it up based on the security feature flags.
BugLink: https://bugs.launchpad.net/bugs/1822870 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit ba72dc171954b782a79d25e0f4b3ed91090c3b1e) Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Michael Ellerman [Mon, 23 Jul 2018 15:07:54 +0000 (01:07 +1000)]
powerpc/64s: Add support for software count cache flush
Some CPU revisions support a mode where the count cache needs to be
flushed by software on context switch. Additionally some revisions may
have a hardware accelerated flush, in which case the software flush
sequence can be shortened.
If we detect the appropriate flag from firmware we patch a branch
into _switch() which takes us to a count cache flush sequence.
That sequence in turn may be patched to return early if we detect that
the CPU supports accelerating the flush sequence in hardware.
Add debugfs support for reporting the state of the flush, as well as
runtime disabling it.
And modify the spectre_v2 sysfs file to report the state of the
software flush.
BugLink: https://bugs.launchpad.net/bugs/1822870 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(backported from commit ee13cb249fabdff8b90aaff61add347749280087) Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Michael Ellerman [Mon, 23 Jul 2018 15:07:53 +0000 (01:07 +1000)]
powerpc/64s: Add new security feature flags for count cache flush
Add security feature flags to indicate the need for software to flush
the count cache on context switch, and for the presence of a hardware
assisted count cache flush.
BugLink: https://bugs.launchpad.net/bugs/1822870 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit dc8c6cce9a26a51fc19961accb978217a3ba8c75) Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Michael Ellerman [Mon, 23 Jul 2018 15:07:52 +0000 (01:07 +1000)]
powerpc/asm: Add a patch_site macro & helpers for patching instructions
Add a macro and some helper C functions for patching single asm
instructions.
The gas macro means we can do something like:
1: nop
patch_site 1b, patch__foo
Which is less visually distracting than defining a GLOBAL symbol at 1,
and also doesn't pollute the symbol table which can confuse eg. perf.
These are obviously similar to our existing feature sections, but are
not automatically patched based on CPU/MMU features, rather they are
designed to be manually patched by C code at some arbitrary point.
BugLink: https://bugs.launchpad.net/bugs/1822870 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 06d0bbc6d0f56dacac3a79900e9a9a0d5972d818) Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Christophe Leroy [Fri, 24 Nov 2017 07:31:09 +0000 (08:31 +0100)]
powerpc/lib/feature-fixups: use raw_patch_instruction()
feature fixups need to use patch_instruction() early in the boot,
even before the code is relocated to its final address, requiring
patch_instruction() to use PTRRELOC() in order to address data.
But feature fixups applies on code before it is set to read only,
even for modules. Therefore, feature fixups can use
raw_patch_instruction() instead.
BugLink: https://bugs.launchpad.net/bugs/1822870 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 8183d99f4a22c2abbc543847a588df3666ef0c0c) Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
powerpc/64: Make meltdown reporting Book3S 64 specific
In a subsequent patch we will enable building security.c for Book3E.
However the NXP platforms are not vulnerable to Meltdown, so make the
Meltdown vulnerability reporting PPC_BOOK3S_64 specific.
BugLink: https://bugs.launchpad.net/bugs/1822870 Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
[mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 406d2b6ae3420f5bb2b3db6986dc6f0b6dbb637b) Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Michael Ellerman [Fri, 27 Jul 2018 23:06:35 +0000 (09:06 +1000)]
powerpc/64: Call setup_barrier_nospec() from setup_arch()
Currently we require platform code to call setup_barrier_nospec(). But
if we add an empty definition for the !CONFIG_PPC_BARRIER_NOSPEC case
then we can call it in setup_arch().
BugLink: https://bugs.launchpad.net/bugs/1822870 Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit af375eefbfb27cbb5b831984e66d724a40d26b5c) Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Michael Ellerman [Fri, 27 Jul 2018 23:06:34 +0000 (09:06 +1000)]
powerpc/64: Add CONFIG_PPC_BARRIER_NOSPEC
Add a config symbol to encode which platforms support the
barrier_nospec speculation barrier. Currently this is just Book3S 64
but we will add Book3E in a future patch.
BugLink: https://bugs.launchpad.net/bugs/1822870 Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 179ab1cbf883575c3a585bcfc0f2160f1d22a149) Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
powerpc64s: Show ori31 availability in spectre_v1 sysfs file not v2
When I added the spectre_v2 information in sysfs, I included the
availability of the ori31 speculation barrier.
Although the ori31 barrier can be used to mitigate v2, it's primarily
intended as a spectre v1 mitigation. Spectre v2 is mitigated by
hardware changes.
So rework the sysfs files to show the ori31 information in the
spectre_v1 file, rather than v2.
Michael Ellerman [Tue, 24 Apr 2018 04:15:58 +0000 (14:15 +1000)]
powerpc: Use barrier_nospec in copy_from_user()
Based on the x86 commit doing the same.
See commit 304ec1b05031 ("x86/uaccess: Use __uaccess_begin_nospec()
and uaccess_try_nospec") and b3bbfb3fb5d2 ("x86: Introduce
__uaccess_begin_nospec() and uaccess_try_nospec") for more detail.
In all cases we are ordering the load from the potentially
user-controlled pointer vs a previous branch based on an access_ok()
check or similar.
Base on a patch from Michal Suchanek.
BugLink: https://bugs.launchpad.net/bugs/1822870 Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit ddf35cf3764b5a182b178105f57515b42e2634f8) Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Michal Suchanek [Tue, 24 Apr 2018 04:15:55 +0000 (14:15 +1000)]
powerpc/64s: Add support for ori barrier_nospec patching
Based on the RFI patching. This is required to be able to disable the
speculation barrier.
Only one barrier type is supported and it does nothing when the
firmware does not enable it. Also re-patching modules is not supported
So the only meaningful thing that can be done is patching out the
speculation barrier at boot when the user says it is not wanted.
BugLink: https://bugs.launchpad.net/bugs/1822870 Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 2eea7f067f495e33b8b116b35b5988ab2b8aec55) Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Xin Long [Thu, 18 Apr 2019 07:50:00 +0000 (09:50 +0200)]
sctp: implement memory accounting on rx path
sk_forward_alloc's updating is also done on rx path, but to be consistent
we change to use sk_mem_charge() in sctp_skb_set_owner_r().
In sctp_eat_data(), it's not enough to check sctp_memory_pressure only,
which doesn't work for mem_cgroup_sockets_enabled, so we change to use
sk_under_memory_pressure().
When it's under memory pressure, sk_mem_reclaim() and sk_rmem_schedule()
should be called on both RENEGE or CHUNK DELIVERY path exit the memory
pressure status as soon as possible.
Note that sk_rmem_schedule() is using datalen to make things easy there.
Reported-by: Matteo Croce <mcroce@redhat.com> Tested-by: Matteo Croce <mcroce@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
CVE-2019-3874
(cherry picked from commit 9dde27de3e5efa0d032f3c891a0ca833a0d31911 linux-next) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Xin Long [Thu, 18 Apr 2019 07:50:00 +0000 (09:50 +0200)]
sctp: implement memory accounting on tx path
Now when sending packets, sk_mem_charge() and sk_mem_uncharge() have been
used to set sk_forward_alloc. We just need to call sk_wmem_schedule() to
check if the allocated should be raised, and call sk_mem_reclaim() to
check if the allocated should be reduced when it's under memory pressure.
If sk_wmem_schedule() returns false, which means no memory is allowed to
allocate, it will block and wait for memory to become available.
Note different from tcp, sctp wait_for_buf happens before allocating any
skb, so memory accounting check is done with the whole msg_len before it
too.
Reported-by: Matteo Croce <mcroce@redhat.com> Tested-by: Matteo Croce <mcroce@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
CVE-2019-3874
(backported from commit 1033990ac5b2ab6cee93734cb6d301aa3a35bcaa linux-next)
[tyhicks: Backport to 4.15:
- sctp_sendmsg_to_asoc() does not yet exist and its code is still in
sctp_sendmsg()
- sctp_sendmsg() has slight context differences due to timeo being
unconditionally assigned] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Xin Long [Thu, 18 Apr 2019 07:50:00 +0000 (09:50 +0200)]
sctp: use sk_wmem_queued to check for writable space
sk->sk_wmem_queued is used to count the size of chunks in out queue
while sk->sk_wmem_alloc is for counting the size of chunks has been
sent. sctp is increasing both of them before enqueuing the chunks,
and using sk->sk_wmem_alloc to check for writable space.
However, sk_wmem_alloc is also increased by 1 for the skb allocked
for sending in sctp_packet_transmit() but it will not wake up the
waiters when sk_wmem_alloc is decreased in this skb's destructor.
If msg size is equal to sk_sndbuf and sendmsg is waiting for sndbuf,
the check 'msg_len <= sctp_wspace(asoc)' in sctp_wait_for_sndbuf()
will keep waiting if there's a skb allocked in sctp_packet_transmit,
and later even if this skb got freed, the waiting thread will never
get waked up.
This issue has been there since very beginning, so we change to use
sk->sk_wmem_queued to check for writable space as sk_wmem_queued is
not increased for the skb allocked for sending, also as TCP does.
SOCK_SNDBUF_LOCK check is also removed here as it's for tx buf auto
tuning which I will add in another patch.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
CVE-2019-3874
(backported from commit cd305c74b0f8b49748a79a8f67fc8e5e3e0c4794)
[tyhicks: Backport to 4.15:
- sctp_sendmsg_to_asoc() does not yet exist and its code is still in
sctp_sendmsg()
- sctp_sendmsg() has slight context differences due to timeo being
unconditionally assigned] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1814874
Reclaim and free can race on an object which is basically fine but in
order for reclaim to be able to map "freed" object we need to encode
object length in the handle. handle_to_chunks() is then introduced to
extract object length from a handle and use it during mapping.
Moreover, to avoid racing on a z3fold "headless" page release, we should
not try to free that page in z3fold_free() if the reclaim bit is set.
Also, in the unlikely case of trying to reclaim a page being freed, we
should not proceed with that page.
While at it, fix the page accounting in reclaim function.
This patch supersedes "[PATCH] z3fold: fix reclaim lock-ups".
Link: http://lkml.kernel.org/r/20181105162225.74e8837d03583a9b707cf559@gmail.com Signed-off-by: Vitaly Wool <vitaly.vul@sony.com> Signed-off-by: Jongseok Kim <ks77sj@gmail.com> Reported-by-by: Jongseok Kim <ks77sj@gmail.com> Reviewed-by: Snild Dolkow <snild@sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit ca0246bb97c23da9d267c2107c07fb77e38205c9) Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
misc: rtsx: Enable OCP for rts522a rts524a rts525a rts5260
BugLink: https://bugs.launchpad.net/bugs/1825487
this enables and adds OCP function for Realtek A series cardreader chips
and fixes some OCP flow in rts5260.c
Signed-off-by: RickyWu <ricky_wu@realtek.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit bede03a579b3b4a036003c4862cc1baa4ddc351f) Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-by: You-Sheng Yang <vicamo.yang@canonical.com> Acked-by: AceLan Kao <acelan.kao@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Colin Ian King [Fri, 19 Apr 2019 09:10:00 +0000 (11:10 +0200)]
misc: rtsx: make various functions static
BugLink: https://bugs.launchpad.net/bugs/1825487
The functions rts5260_get_ocpstat, rts5260_get_ocpstat2,
rts5260_clear_ocpstat, rts5260_process_ocp, rts5260_init_hw and
rts5260_set_aspm are local to the source and do not need to be
in global scope, so make them static.
Cleans up sparse warnings:
symbol 'rts5260_get_ocpstat' was not declared. Should it be static?
symbol 'rts5260_get_ocpstat2' was not declared. Should it be static?
symbol 'rts5260_clear_ocpstat' was not declared. Should it be static?
symbol 'rts5260_process_ocp' was not declared. Should it be static?
symbol 'rts5260_init_hw' was not declared. Should it be static?
symbol 'rts5260_set_aspm' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 75a898051d3d2a105b3a0ca8be6e356429a68457) Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-by: You-Sheng Yang <vicamo.yang@canonical.com> Acked-by: AceLan Kao <acelan.kao@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Hui Wang [Thu, 18 Apr 2019 05:52:00 +0000 (07:52 +0200)]
ALSA: hda/realtek - add two more pin configuration sets to quirk table
BugLink: https://bugs.launchpad.net/bugs/1825272
We have two Dell laptops which have the codec 10ec0236 and 10ec0256
respectively, the headset mic on them can't work, need to apply the
quirk of ALC255_FIXUP_DELL1_MIC_NO_PRESENCE. So adding their pin
configurations in the pin quirk table.
Cc: <stable@vger.kernel.org> Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit b26e36b7ef36a8a3a147b1609b2505f8a4ecf511
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git) Signed-off-by: Hui Wang <hui.wang@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: AceLan Kao <acelan.kao@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
if node have NFSv41+ mounts inside several net namespaces
it can lead to use-after-free in svc_process_common()
svc_process_common()
/* Setup reply header */
rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); <<< HERE
svc_process_common() can use incorrect rqstp->rq_xprt,
its caller function bc_svc_process() takes it from serv->sv_bc_xprt.
The problem is that serv is global structure but sv_bc_xprt
is assigned per-netnamespace.
According to Trond, the whole "let's set up rqstp->rq_xprt
for the back channel" is nothing but a giant hack in order
to work around the fact that svc_process_common() uses it
to find the xpt_ops, and perform a couple of (meaningless
for the back channel) tests of xpt_flags.
All we really need in svc_process_common() is to be able to run
rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr()
Bruce J Fields points that this xpo_prep_reply_hdr() call
is an awfully roundabout way just to do "svc_putnl(resv, 0);"
in the tcp case.
This patch does not initialiuze rqstp->rq_xprt in bc_svc_process(),
now it calls svc_process_common() with rqstp->rq_xprt = NULL.
To adjust reply header svc_process_common() just check
rqstp->rq_prot and calls svc_tcp_prep_reply_hdr() for tcp case.
To handle rqstp->rq_xprt = NULL case in functions called from
svc_process_common() patch intruduces net namespace pointer
svc_rqst->rq_bc_net and adjust SVC_NET() definition.
Some other function was also adopted to properly handle described case.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Cc: stable@vger.kernel.org Fixes: 23c20ecd4475 ("NFS: callback up - users counting cleanup") Signed-off-by: J. Bruce Fields <bfields@redhat.com>
(backported from commit d4b09acf924b84bae77cad090a9d108e70b43643)
[ kmously: The upstream patch had 2 additional NULL-checks in
include/trace/events/sunrpc.h that aren't applicable to 4.4.
Also added a declaration for svc_tcp_prep_reply_hdr() in svc.c to
avoid implicit-function-declaration warnings/errors ] Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: You-Sheng Yang <vicamo.yang@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Another platform requires even longer delay to make the device work
correctly after S3.
So increase the delay to 300ms.
BugLink: https://bugs.launchpad.net/bugs/1798921 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1765f5dcd00963e33f1b8a4e0f34061fbc0e2f7f) Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
brcmfmac: assure SSID length from firmware is limited
The SSID length as received from firmware should not exceed
IEEE80211_MAX_SSID_LEN as that would result in heap overflow.
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
CVE-2019-9500
(cherry picked from commit 1b5e2423164b3670e8bc9174e4762d297990deff) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
brcmfmac: add subtype check for event handling in data path
For USB there is no separate channel being used to pass events
from firmware to the host driver and as such are passed over the
data path. In order to detect mock event messages an additional
check is needed on event subtype. This check is added conditionally
using unlikely() keyword.
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
CVE-2019-9503
(cherry picked from commit a4176ec356c73a46c07c181c6d04039fafa34a9f) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Alex Williamson [Thu, 18 Apr 2019 07:25:58 +0000 (07:25 +0000)]
vfio/type1: Limit DMA mappings per container
Memory backed DMA mappings are accounted against a user's locked
memory limit, including multiple mappings of the same memory. This
accounting bounds the number of such mappings that a user can create.
However, DMA mappings that are not backed by memory, such as DMA
mappings of device MMIO via mmaps, do not make use of page pinning
and therefore do not count against the user's locked memory limit.
These mappings still consume memory, but the memory is not well
associated to the process for the purpose of oom killing a task.
To add bounding on this use case, we introduce a limit to the total
number of concurrent DMA mappings that a user is allowed to create.
This limit is exposed as a tunable module option where the default
value of 64K is expected to be well in excess of any reasonable use
case (a large virtual machine configuration would typically only make
use of tens of concurrent mappings).
This fixes CVE-2019-3882.
Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
CVE-2019-3882
(cherry picked from commit 492855939bdb59c6f947b0b5b44af9ad82b7e38c) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1818490
The runtime_suspend device callbacks are not supposed to save
configuration state or change the power state. Commit fb29f76cc566
("igb: Fix an issue that PME is not enabled during runtime suspend")
changed the driver to not save configuration state during runtime
suspend, however the driver callback still put the device into a
low-power state. This causes a warning in the pci pm core and results in
pci_pm_runtime_suspend not calling pci_save_state or pci_finish_runtime_suspend.
Fix this by not changing the power state either, leaving that to pci pm
core, and make the same change for suspend callback as well.
Also move a couple of defines into the appropriate header file instead
of inline in the .c file.
Fixes: fb29f76cc566 ("igb: Fix an issue that PME is not enabled during runtime suspend") Signed-off-by: Arvind Sankar <niveditas98@gmail.com> Reviewed-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit dabb8338be533c18f50255cf39ff4f66d4dabdbe) Signed-off-by: You-Sheng Yang <vicamo.yang@canonical.com> Acked-by: AceLan Kao <acelan.kao@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
cleanups:
1,should memset all wb buffer
2,set max wb number to 128 (total 4KB) is big enough
Signed-off-by: Monk Liu <Monk.Liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 73469585510d5161368c899b7eacd58c824b2b24) Signed-off-by: You-Sheng Yang <vicamo.yang@canonical.com> Acked-by: AceLan Kao <acelan.kao@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Kailang Yang [Wed, 17 Apr 2019 08:25:43 +0000 (16:25 +0800)]
ALSA: hda/realtek - Disable headset Mic VREF for headset mode of ALC225
BugLink: https://launchpad.net/bugs/1821290
Disable Headset Mic VREF for headset mode of ALC225.
This will be controlled by coef bits of headset mode functions.
[ Fixed a compile warning and code simplification -- tiwai ]
Signed-off-by: Kailang Yang <kailang@realtek.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
(backported from commit d1dd42110d2727e81b9265841a62bc84c454c3a2) Signed-off-by: Wen-chien Jesse Sung <jesse.sung@canonical.com> Acked-by: AceLan Kao <acelan.kao@canonical.com> Acked-by: hwang4 <hui.wang@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
mac80211_hwsim: Timer should be initialized before device registered
BugLink: https://bugs.launchpad.net/bugs/1825058
Otherwise if network manager starts configuring Wi-Fi interface
immidiatelly after getting notification of its creation, we will get
NULL pointer dereference:
Hui Wang [Thu, 18 Apr 2019 03:55:45 +0000 (11:55 +0800)]
ALSA: hda - Add two more machines to the power_save_blacklist
BugLink: https://bugs.launchpad.net/bugs/1821663
Recently we set CONFIG_SND_HDA_POWER_SAVE_DEFAULT to 1 when
configuring the kernel, then two machines were reported to have noise
after installing the new kernel. Put them in the blacklist, the
noise disappears.
https://bugs.launchpad.net/bugs/1821663 Cc: <stable@vger.kernel.org> Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
(backported from commit cae30527901d9590db0e12ace994c1d58bea87fd) Signed-off-by: Hui Wang <hui.wang@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: You-Sheng Yang <vicamo.yang@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Josef Bacik [Tue, 2 Apr 2019 16:32:44 +0000 (17:32 +0100)]
nbd: fix how we set bd_invalidated
BugLink: https://bugs.launchpad.net/bugs/1822247
bd_invalidated is kind of a pain wrt partitions as it really only
triggers the partition rescan if it is set after bd_ops->open() runs, so
setting it when we reset the device isn't useful. We also sporadically
would still have partitions left over in some disconnect cases, so fix
this by always setting bd_invalidated on open if there's no
configuration or if we've had a disconnect action happen, that way the
partition table gets invalidated and rescanned properly.
Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from fe1f9e6659ca6124f500a0f829202c7c902fab0c) Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1822821
Inside a nested guest, access to hardware can be slow enough that
tsc_read_refs always return ULLONG_MAX, causing tsc_refine_calibration_work
to be called periodically and the nested guest to spend a lot of time
reading the ACPI timer.
However, if the TSC frequency is available from the pvclock page,
we can just set X86_FEATURE_TSC_KNOWN_FREQ and avoid the recalibration.
'refine' operation.
Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
[Commit message rewritten. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit e10f7805032365cc11c739a97f226ebb48aee042) Signed-off-by: Heitor R. Alves de Siqueira <halves@canonical.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1819786
Connections in One-packet scheduling mode (-o, --ops) are
removed with refcnt=0 because they are not hashed in conn table.
To avoid refcount_dec reporting this as error, change them to be
removed with refcount_dec_if_one as all other connections.
Andrea Righi [Fri, 5 Apr 2019 07:31:53 +0000 (09:31 +0200)]
openvswitch: fix flow actions reallocation
BugLink: https://bugs.launchpad.net/bugs/1813244
The flow action buffer can be resized if it's not big enough to contain
all the requested flow actions. However, this resize doesn't take into
account the new requested size, the buffer is only increased by a factor
of 2x. This might be not enough to contain the new data, causing a
buffer overflow, for example:
Fix by making sure the new buffer is properly resized to contain all the
requested data.
BugLink: https://bugs.launchpad.net/bugs/1813244 Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f28cd2af22a0c134e4aa1c64a70f70d815d473fb) Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Acked-by: Juerg Haefliger <juergh@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Andrea Righi [Thu, 28 Mar 2019 17:09:00 +0000 (18:09 +0100)]
btrfs: raid56: properly unmap parity page in finish_parity_scrub()
Buglink: https://bugs.launchpad.net/bugs/1812845
Parity page is incorrectly unmapped in finish_parity_scrub(), triggering
a reference counter bug on i386, i.e.:
The reason is that kunmap(p_page) was completely left out, so we never
did an unmap for the p_page and the loop unmapping the rbio page was
iterating over the wrong number of stripes: unmapping should be done
with nr_data instead of rbio->real_stripes.
cpupower : Fix header name to read idle state name
BugLink: https://bugs.launchpad.net/bugs/1719545
The names of the idle states in the output of cpupower monitor command are
truncated to 4 characters. On POWER9, this creates ambiguity as the states
are named "stop0", "stop1", etc.
drm/amdgpu: Free VGA stolen memory as soon as possible.
BugLink: https://launchpad.net/bugs/1818617
Reserved VRAM is used to avoid overriding pre OS FB.
Once our display stack takes over we don't need the reserved
VRAM anymore.
v2:
Remove comment, we know actually why we need to reserve the stolen VRAM.
Fix return type for amdgpu_ttm_late_init.
v3:
Return 0 in amdgpu_bo_late_init, rebase on changes to previous patch
v4: rebase
v5:
For GMC9 reserve always just 9M and keep the stolem memory around
until GART table curruption on S3 resume is resolved.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(backported from commit 6f752ec2c20c6a575da29d5b297980f376830e6b) Signed-off-by: Timo Aaltonen <timo.aaltonen@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Alex Deucher [Thu, 7 Mar 2019 06:51:45 +0000 (08:51 +0200)]
drm/amdgpu/gmc: steal the appropriate amount of vram for fw hand-over (v3)
BugLink: https://launchpad.net/bugs/1818617
Steal 9 MB for vga emulation and fb if vga is enabled, otherwise,
steal enough to cover the current display size as set by the vbios.
If no memory is used (e.g., secondary or headless card), skip
stolen memory reserve.
v2: skip reservation if vram is limited, address Christian's comments
v3: squash in fix from Harry
Reviewed-and-Tested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> (v2) Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
(cherry picked from commit ebdef28ebbcf767d9fa687acb1d02d97d834c628) Signed-off-by: Timo Aaltonen <timo.aaltonen@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Lu Baolu [Fri, 29 Mar 2019 07:30:51 +0000 (15:30 +0800)]
iommu/vt-d: Disable ATS support on untrusted devices
BugLink: https://bugs.launchpad.net/bugs/1820153
Commit fb58fdcd295b9 ("iommu/vt-d: Do not enable ATS for untrusted
devices") disables ATS support on the devices which have been marked
as untrusted. Unfortunately this is not enough to fix the DMA attack
vulnerabiltiies because IOMMU driver allows translated requests as
long as a device advertises the ATS capability. Hence a malicious
peripheral device could use this to bypass IOMMU.
This disables the ATS support on untrusted devices by clearing the
internal per-device ATS mark. As the result, IOMMU driver will block
any translated requests from any device marked as untrusted.
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Suggested-by: Kevin Tian <kevin.tian@intel.com> Suggested-by: Ashok Raj <ashok.raj@intel.com> Fixes: fb58fdcd295b9 ("iommu/vt-d: Do not enable ATS for untrusted devices") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
(backported from commit d8b8591054575f33237556c32762d54e30774d28) Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Acked-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Mika Westerberg [Fri, 15 Mar 2019 05:00:08 +0000 (13:00 +0800)]
thunderbolt: Export IOMMU based DMA protection support to userspace
BugLink: https://bugs.launchpad.net/bugs/1820153
Recent systems with Thunderbolt ports may support IOMMU natively. In
practice this means that Thunderbolt connected devices are placed behind
an IOMMU during the whole time it is connected (including during boot)
making Thunderbolt security levels redundant. This is called Kernel DMA
protection [1] by Microsoft.
Some of these systems still have Thunderbolt security level set to
"user" in order to support OS downgrade (the older version of the OS
might not support IOMMU based DMA protection so connecting a device
still relies on user approval).
Export this information to userspace by introducing a new sysfs
attribute (iommu_dma_protection). Based on it userspace tools can make
more accurate decision whether or not authorize the connected device.
In addition update Thunderbolt documentation regarding IOMMU based DMA
protection.
Mika Westerberg [Fri, 15 Mar 2019 05:00:07 +0000 (13:00 +0800)]
iommu/vt-d: Do not enable ATS for untrusted devices
BugLink: https://bugs.launchpad.net/bugs/1820153
Currently Linux automatically enables ATS (Address Translation Service)
for any device that supports it (and IOMMU is turned on). ATS is used to
accelerate DMA access as the device can cache translations locally so
there is no need to do full translation on IOMMU side. However, as
pointed out in [1] ATS can be used to bypass IOMMU based security
completely by simply sending PCIe read/write transaction with AT
(Address Translation) field set to "translated".
To mitigate this modify the Intel IOMMU code so that it does not enable
ATS for any device that is marked as being untrusted. In case this turns
out to cause performance issues we may selectively allow ATS based on
user decision but currently use big hammer and disable it completely to
be on the safe side.
Lu Baolu [Fri, 15 Mar 2019 05:00:06 +0000 (13:00 +0800)]
iommu/vt-d: Force IOMMU on for platform opt in hint
BugLink: https://bugs.launchpad.net/bugs/1820153
Intel VT-d spec added a new DMA_CTRL_PLATFORM_OPT_IN_FLAG flag in DMAR
ACPI table [1] for BIOS to report compliance about platform initiated
DMA restricted to RMRR ranges when transferring control to the OS. This
means that during OS boot, before it enables IOMMU none of the connected
devices can bypass DMA protection for instance by overwriting the data
structures used by the IOMMU. The OS also treats this as a hint that the
IOMMU should be enabled to prevent DMA attacks from possible malicious
devices.
A use of this flag is Kernel DMA protection for Thunderbolt [2] which in
practice means that IOMMU should be enabled for PCIe devices connected
to the Thunderbolt ports. With IOMMU enabled for these devices, all DMA
operations are limited in the range reserved for it, thus the DMA
attacks are prevented. All these devices are enumerated in the PCI/PCIe
module and marked with an untrusted flag.
This forces IOMMU to be enabled if DMA_CTRL_PLATFORM_OPT_IN_FLAG is set
in DMAR ACPI table and there are PCIe devices marked as untrusted in the
system. This can be turned off by adding "intel_iommu=off" in the kernel
command line, if any problems are found.
Mika Westerberg [Fri, 15 Mar 2019 05:00:05 +0000 (13:00 +0800)]
PCI / ACPI: Identify untrusted PCI devices
BugLink: https://bugs.launchpad.net/bugs/1820153
A malicious PCI device may use DMA to attack the system. An external
Thunderbolt port is a convenient point to attach such a device. The OS
may use IOMMU to defend against DMA attacks.
Some BIOSes mark these externally facing root ports with this
ACPI _DSD [1]:
If we find such a root port, mark it and all its children as untrusted.
The rest of the OS may use this information to enable DMA protection
against malicious devices. For instance the device may be put behind an
IOMMU to keep it from accessing memory outside of what the driver has
allocated for it.
While at it, add a comment on top of prp_guids array explaining the
possible caveat resulting when these GUIDs are treated equivalent.
BugLink: https://bugs.launchpad.net/bugs/1820153
It is possible to have _DSD entries where the data is compatible with
device properties format but are using different GUID for various reasons.
In addition to that there can be many such _DSD entries for a single device
such as for PCIe root port used to host a Thunderbolt hierarchy:
To make these available for drivers via unified device property APIs,
modify ACPI property core so that it supports multiple _DSD entries
organized in a linked list. We also store GUID of each _DSD entry in struct
acpi_device_properties in case there is need to differentiate between
entries. The supported GUIDs are then listed in prp_guids array.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
(cherry picked from commit 5f5e4890d57a8af5da72c9d73a4efa9bad43a7a3) Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Acked-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
If there is an error while parsing an element of the termlist, we
will skip parsing the current termlist element and continue parsing
to the next opcode in the termlist.
If we get an error while parsing the conditional of If/Else/While or
the device name of Scope, we will skip the body of the statement all
together and pop the parser_state.
If we get an error while parsing the base offset and length of an
operation region declaration, we will remove the operation region
from the namespace.
Signed-off-by: Erik Schmauss <erik.schmauss@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(backported from commit 5088814a6e931350e5bd29f5d59fa40c6dbbdf10) Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Acked-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
powerpc/powernv/npu: Fault user page into the hypervisor's pagetable
BugLink: https://bugs.launchpad.net/bugs/1819989
When a page fault happens in a GPU, the GPU signals the OS and the GPU
driver calls the fault handler which populated a page table; this allows
the GPU to complete an ATS request.
On the bare metal get_user_pages() is enough as it adds a pte to
the kernel page table but under KVM the partition scope tree does not get
updated so ATS will still fail.
This reads a byte from an effective address which causes HV storage
interrupt and KVM updates the partition scope tree.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 58629c0dc34904d135af944d120eb23165ec3b61) Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
powerpc/powernv/npu: Check mmio_atsd array bounds when populating
BugLink: https://bugs.launchpad.net/bugs/1819989
A broken device tree might contain more than 8 values and introduce hard
to debug memory corruption bug. This adds the boundary check.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 135ef954051b102870a8d47a8eb822af1f1b1ec1) Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
powerpc/pseries: Remove IOMMU API support for non-LPAR systems
BugLink: https://bugs.launchpad.net/bugs/1819989
The pci_dma_bus_setup_pSeries and pci_dma_dev_setup_pSeries hooks are
registered for the pseries platform which does not have FW_FEATURE_LPAR;
these would be pre-powernv platforms which we never supported PCI pass
through for anyway so remove it.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit c409c6316166993163e29312aeaaf1c0c300a04a) Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1819989
We already changed NPU API for GPUs to not to call OPAL and the remaining
bit is initializing NPU structures.
This searches for POWER9 NVLinks attached to any device on a PHB and
initializes an NPU structure if any found.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(backported from commit 3be2df00e299821ad255498ac4411906a8d59cfa) Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
powerpc/pseries/iommu: Use memory@ nodes in max RAM address calculation
BugLink: https://bugs.launchpad.net/bugs/1819989
We might have memory@ nodes with "linux,usable-memory" set to zero
(for example, to replicate powernv's behaviour for GPU coherent memory)
which means that the memory needs an extra initialization but since
it can be used afterwards, the pseries platform will try mapping it
for DMA so the DMA window needs to cover those memory regions too;
if the window cannot cover new memory regions, the memory onlining fails.
This walks through the memory nodes to find the highest RAM address to
let a huge DMA window cover that too in case this memory gets onlined
later.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 68c0449ea16d775e762b532afddb4d6a5f161877) Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
powerpc/powernv/npu: Move OPAL calls away from context manipulation
BugLink: https://bugs.launchpad.net/bugs/1819989
When introduced, the NPU context init/destroy helpers called OPAL which
enabled/disabled PID (a userspace memory context ID) filtering in an NPU
per a GPU; this was a requirement for P9 DD1.0. However newer chip
revision added a PID wildcard support so there is no more need to
call OPAL every time a new context is initialized. Also, since the PID
wildcard support was added, skiboot does not clear wildcard entries
in the NPU so these remain in the hardware till the system reboot.
This moves LPID and wildcard programming to the PE setup code which
executes once during the booting process so NPU2 context init/destroy
won't need to do additional configuration.
This replaces the check for FW_FEATURE_OPAL with a check for npu!=NULL as
this is the way to tell if the NPU support is present and configured.
This moves pnv_npu2_init() declaration as pseries should be able to use it.
This keeps pnv_npu2_map_lpar() in powernv as pseries is not allowed to
call that. This exports pnv_npu2_map_lpar_dev() as following patches
will use it from the VFIO driver.
While at it, replace redundant list_for_each_entry_safe() with
a simpler list_for_each_entry().
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 0e759bd75285e96fbb4013d1303b08fdb8ba58e1) Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
powerpc/powernv: Move npu struct from pnv_phb to pci_controller
BugLink: https://bugs.launchpad.net/bugs/1819989
The powernv PCI code stores NPU data in the pnv_phb struct. The latter
is referenced by pci_controller::private_data. We are going to have NPU2
support in the pseries platform as well but it does not store any
private_data in in the pci_controller struct; and even if it did,
it would be a different data structure.
This makes npu a pointer and stores it one level higher in
the pci_controller struct.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(backported from commit 46a1449d9e39478a35d35d9d9025776f6cee24fb) Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn
BugLink: https://bugs.launchpad.net/bugs/1819989
The pcidev value stored in pci_dn is only used for NPU/NPU2
initialization. We can easily drop the cached pointer and
use an ancient helper - pci_get_domain_bus_and_slot() instead in order
to reduce complexity.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 902bdc57451c2c64aa139bbe24067f70a186db0a) Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Vaibhav Jain [Thu, 14 Mar 2019 17:56:44 +0000 (14:56 -0300)]
powerpc/powernv: Make possible for user to force a full ipl cec reboot
BugLink: https://bugs.launchpad.net/bugs/1819989
Ever since fast reboot is enabled by default in opal,
opal_cec_reboot() will use fast-reset instead of full IPL to perform
system reboot. This leaves the user with no direct way to force a full
IPL reboot except changing an nvram setting that persistently disables
fast-reset for all subsequent reboots.
This patch provides a more direct way for the user to force a one-shot
full IPL reboot by passing the command line argument 'full' to the
reboot command. So the user will be able to tweak the reboot behavior
via:
$ sudo reboot full # Force a full ipl reboot skipping fast-reset
or
$ sudo reboot # default reboot path (usually fast-reset)
The reboot command passes the un-parsed command argument to the kernel
via the 'Reboot' syscall which is then passed on to the arch function
pnv_restart(). The patch updates pnv_restart() to handle this cmd-arg
and issues opal_cec_reboot2 with OPAL_REBOOT_FULL_IPL to force a full
IPL reset.
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 8139046a5a34787849df81f4a5875cf4b404a7a1) Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Nicholas Piggin [Thu, 14 Mar 2019 17:56:42 +0000 (14:56 -0300)]
powerpc/powernv: call OPAL_QUIESCE before OPAL_SIGNAL_SYSTEM_RESET
BugLink: https://bugs.launchpad.net/bugs/1819989
Although it is often possible to recover a CPU that was interrupted
from OPAL with a system reset NMI, it's undesirable to interrupt them
for a few reasons. Firstly because dump/debug code itself needs to
call firmware, so it could hang on a lock or possibly corrupt a
per-cpu data structure if it or another CPU was interrupted from
OPAL. Secondly, the kexec crash dump code will not return from
interrupt to unwind the OPAL call.
Call OPAL_QUIESCE with QUIESCE_HOLD before sending an NMI IPI to
another CPU, which wait for it to leave firmware (or time out) to
avoid this problem in normal conditions. Firmware bugs may still
result in a timeout and interrupting OPAL, but that is the best
option (stops the CPU, and possibly allows firmware to be debugged).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit ee03b9b4479d1302d01cebedda3518dc967697b7) Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
powerpc/powernv/npu: Do not try invalidating 32bit table when 64bit table is enabled
BugLink: https://bugs.launchpad.net/bugs/1819989
GPUs and the corresponding NVLink bridges get different PEs as they
have separate translation validation entries (TVEs). We put these PEs
to the same IOMMU group so they cannot be passed through separately.
So the iommu_table_group_ops::set_window/unset_window for GPUs do set
tables to the NPU PEs as well which means that iommu_table's list of
attached PEs (iommu_table_group_link) has both GPU and NPU PEs linked.
This list is used for TCE cache invalidation.
The problem is that NPU PE has just a single TVE and can be programmed
to point to 32bit or 64bit windows while GPU PE has two (as any other
PCI device). So we end up having an 32bit iommu_table struct linked to
both PEs even though only the 64bit TCE table cache can be invalidated
on NPU. And a relatively recent skiboot detects this and prints
errors.
This changes GPU's iommu_table_group_ops::set_window/unset_window to
make sure that NPU PE is only linked to the table actually used by the
hardware. If there are two tables used by an IOMMU group, the NPU PE
will use the last programmed one which with the current use scenarios
is expected to be a 64bit one.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit d41ce7b1bcc3e1d02cc9da3b83c0fe355fcb68e0) Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1820187
In order to avoid frequent system interrupts when sending and
receiving packets. we replace disable_irq_nosync/enable_irq
with hinic_set_msix_state(), hinic_set_msix_state is used to
access memory mapped hinic devices.
Signed-off-by: Xue Chaojing <xuechaojing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 905b464ad9008905db099f90ae20f373c7051804) Signed-off-by: Ike Panhc <ike.pan@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Xue Chaojing [Mon, 18 Mar 2019 08:30:26 +0000 (16:30 +0800)]
net-next/hinic:add shutdown callback
BugLink: https://bugs.launchpad.net/bugs/1820187
If there is no shutdown callback, our board will report pcie UNF errors
after restarting. This patch add shutdown callback for hinic.
Signed-off-by: Xue Chaojing <xuechaojing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 53fe3ed19df0bca6ce752fae8e483910b6f112f6) Signed-off-by: Ike Panhc <ike.pan@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Suggested-by: Neil Horman <nhorman@redhat.com> Signed-off-by: Xue Chaojing <xuechaojing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e1a76515b0c20c3477200c1345c477cc0e68c4ad) Signed-off-by: Ike Panhc <ike.pan@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Colin Ian King [Mon, 18 Mar 2019 08:30:05 +0000 (16:30 +0800)]
net: hinic: fix null pointer dereference on pointer hwdev
BugLink: https://bugs.launchpad.net/bugs/1820187
Pointer hwdev is being dereferenced when declaring hwif , however, later
on hwdev is being null checked, hence we have dereference before null
check error. Fix this by assigning hwif and pdef only once hwdev has
been null checked.
Detected by CoverityScan, CID#1485581 ("Dereference before null check")
Fixes: 4a61abb100c8 ("net-next/hinic:add rx checksum offload for HiNIC") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e159e592872edc0536f55ec242dbc5b70a593265) Signed-off-by: Ike Panhc <ike.pan@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Xue Chaojing [Mon, 18 Mar 2019 08:29:56 +0000 (16:29 +0800)]
net-next/hinic: fix a bug in rx data flow
BugLink: https://bugs.launchpad.net/bugs/1820187
In rx_alloc_pkts(), there is a loop call of tasklet, which causes
100% cpu utilization, even no packets are being received. This patch
fixes this bug.
Signed-off-by: Xue Chaojing <xuechaojing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b1a200484143a727ce293e0f200a543cc7584152) Signed-off-by: Ike Panhc <ike.pan@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Xue Chaojing [Mon, 18 Mar 2019 08:29:47 +0000 (16:29 +0800)]
net-next/hinic:fix a bug in set mac address
BugLink: https://bugs.launchpad.net/bugs/1820187
In add_mac_addr(), if the MAC address is a muliticast address,
it will not be set, which causes the network card fail to receive
the multicast packet. This patch fixes this bug.
Signed-off-by: Xue Chaojing <xuechaojing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9ea72dc9430306b77c73a8a21beb51437cde1d6d) Signed-off-by: Ike Panhc <ike.pan@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Xue Chaojing [Mon, 18 Mar 2019 08:29:37 +0000 (16:29 +0800)]
net-next/hinic:add rx checksum offload for HiNIC
BugLink: https://bugs.launchpad.net/bugs/1820187
In order to improve performance, this patch adds rx checksum offload
for the HiNIC driver. Performance test(Iperf) shows more than 80%
improvement in TCP streams.
Signed-off-by: Xue Chaojing <xuechaojing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4a61abb100c8a647959147034f60e9fce17ce9af) Signed-off-by: Ike Panhc <ike.pan@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Xue Chaojing <xuechaojing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ebda9b46cebc9c1245fcfe96c76525717ef984cc) Signed-off-by: Ike Panhc <ike.pan@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
drivers/net/ethernet/huawei/hinic/hinic_tx.c:392:34: error: implicit
conversion from enumeration type 'enum hinic_l4_tunnel_type' to
different enumeration type 'enum hinic_l4_offload_type'
[-Werror,-Wenum-conversion]
hinic_task_set_tunnel_l4(task, TUNNEL_UDP_NO_CSUM,
~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~~~~~~~~~~~~~~~
1 error generated.
It seems that hinic_task_set_tunnel_l4 was meant to take an enum of type
hinic_l4_tunnel_type, not hinic_l4_offload_type, given both the name of
the functions and the values used.
Fixes: cc18a7543d2f ("net-next/hinic: add checksum offload and TSO support") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6e29464b8a72e74ec7c3f816f53bfe46a43601bc) Signed-off-by: Ike Panhc <ike.pan@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Zhao Chen [Mon, 18 Mar 2019 08:28:32 +0000 (16:28 +0800)]
net-next/hinic: add checksum offload and TSO support
BugLink: https://bugs.launchpad.net/bugs/1820187
This patch adds checksum offload and TSO support for the HiNIC
driver. Perfomance test (Iperf) shows more than 100% improvement
in TCP streams.
Signed-off-by: Zhao Chen <zhaochen6@huawei.com> Signed-off-by: Xue Chaojing <xuechaojing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit cc18a7543d2f63a2c93fc61cfa7fd8be5464f75e) Signed-off-by: Ike Panhc <ike.pan@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Eric Dumazet [Mon, 18 Mar 2019 08:28:22 +0000 (16:28 +0800)]
hinic: remove ndo_poll_controller
BugLink: https://bugs.launchpad.net/bugs/1820187
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
hinic uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Note that hinic_netpoll() was incorrectly scheduling NAPI
on both RX and TX queues.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Aviad Krawczyk <aviad.krawczyk@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e71fb423e0dea3c9f98f0101e965426edfe849cd) Signed-off-by: Ike Panhc <ike.pan@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Zhao Chen [Mon, 18 Mar 2019 08:28:12 +0000 (16:28 +0800)]
net-next: hinic: fix a problem in free_tx_poll()
BugLink: https://bugs.launchpad.net/bugs/1820187
This patch fixes the problem below. The problem can be reproduced by the
following steps:
1) Connecting all HiNIC interfaces
2) On server side
# sudo ifconfig eth0 192.168.100.1 up #Using MLX CX4 card
# iperf -s
3) On client side
# sudo ifconfig eth0 192.168.100.2 up #Using our HiNIC card
# iperf -c 192.168.101.1 -P 10 -t 100000
after hours of testing, we will see errors:
hinic 0000:05:00.0: No MGMT msg handler, mod = 0
hinic 0000:05:00.0: No MGMT msg handler, mod = 0
hinic 0000:05:00.0: No MGMT msg handler, mod = 0
hinic 0000:05:00.0: No MGMT msg handler, mod = 0
The errors are caused by the following problem.
1) The hinic_get_wqe() checks the "wq->delta" to allocate new WQEs:
This can cause WQE data error and you will see the above error messages.
This patch fixes the problem.
Signed-off-by: Zhao Chen <zhaochen6@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9c2956d2ad9e0e7d5827290ba9a716ed3fb83bcd) Signed-off-by: Ike Panhc <ike.pan@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Amanoel Dawod [Tue, 19 Mar 2019 08:34:37 +0000 (16:34 +0800)]
Fonts: New Terminus large console font
BugLink: https://bugs.launchpad.net/bugs/1819881
This patch adds an option to compile-in a high resolution
and large Terminus (ter16x32) bitmap console font for use with
HiDPI and Retina screens.
The font was convereted from standard Terminus ter-i32b.psf
(size 16x32) with the help of psftools and minor hand editing
deleting useless characters.
This patch is non-intrusive, no options are enabled by default so most
users won't notice a thing.
I am placing my changes under the GPL 2.0 just as source Terminus font.
Signed-off-by: Amanoel Dawod <amanoeladawod@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ac8b6f148fc97e9e10b48bd337ef571b1d1136aa) Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 54e049c227d9968ff6a7d80aae5fec27b54d39da) Signed-off-by: Frank Heimes <frank.heimes@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
[ klebers: removed duplicated subject line from the commit message ] Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
There is a guard hole at the beginning of the kernel address space, also
used by hypervisors. It occupies 16 PGD entries.
This reserved range is not defined explicitely, it is calculated relative
to other entities: direct mapping and user space ranges.
The calculation got broken by recent changes of the kernel memory layout:
LDT remap range is now mapped before direct mapping and makes the
calculation invalid.
The breakage leads to crash on Xen dom0 boot[1].
Define the reserved range explicitely. It's part of kernel ABI (hypervisors
expect it to be stable) and must not depend on changes in the rest of
kernel memory layout.
x86/ldt: Unmap PTEs for the slot before freeing LDT pages
CVE-2017-5754
modify_ldt(2) leaves the old LDT mapped after switching over to the new
one. The old LDT gets freed and the pages can be re-used.
Leaving the mapping in place can have security implications. The mapping is
present in the userspace page tables and Meltdown-like attacks can read
these freed and possibly reused pages.
It's relatively simple to fix: unmap the old LDT and flush TLB before
freeing the old LDT memory.
This further allows to avoid flushing the TLB in map_ldt_struct() as the
slot is unmapped and flushed by unmap_ldt_struct() or has never been mapped
at all.
[ tglx: Massaged changelog and removed the needless line breaks ]
Fixes: f55f0501cbf6 ("x86/pti: Put the LDT in its own PGD if PTI is on") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: luto@kernel.org Cc: peterz@infradead.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: bhe@redhat.com Cc: willy@infradead.org Cc: linux-mm@kvack.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181026122856.66224-3-kirill.shutemov@linux.intel.com
(backported from commit a0e6e0831c516860fc7f9be1db6c081fe902ebcf)
[juergh: Adjusted for flush_tlb_mm_range() having fewer arguments.] Signed-off-by: Juerg Haefliger <juergh@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
x86/mm: Move LDT remap out of KASLR region on 5-level paging
CVE-2017-5754
On 5-level paging the LDT remap area is placed in the middle of the KASLR
randomization region and it can overlap with the direct mapping, the
vmalloc or the vmap area.
The LDT mapping is per mm, so it cannot be moved into the P4D page table
next to the CPU_ENTRY_AREA without complicating PGD table allocation for
5-level paging.
The 4 PGD slot gap just before the direct mapping is reserved for
hypervisors, so it cannot be used.
Move the direct mapping one slot deeper and use the resulting gap for the
LDT remap area. The resulting layout is the same for 4 and 5 level paging.
[ tglx: Massaged changelog ]
Fixes: f55f0501cbf6 ("x86/pti: Put the LDT in its own PGD if PTI is on") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: peterz@infradead.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: bhe@redhat.com Cc: willy@infradead.org Cc: linux-mm@kvack.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181026122856.66224-2-kirill.shutemov@linux.intel.com
(backported from commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15)
[juergh: Adjusted for non-existing pgtable_l5_enabled().] Signed-off-by: Juerg Haefliger <juergh@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>