Alex Williamson [Tue, 16 Jan 2018 23:40:40 +0000 (17:40 -0600)]
vfio/pci: Virtualize Maximum Read Request Size
BugLink: https://launchpad.net/bugs/1732804
MRRS defines the maximum read request size a device is allowed to
make. Drivers will often increase this to allow more data transfer
with a single request. Completions to this request are bound by the
MPS setting for the bus. Aside from device quirks (none known), it
doesn't seem to make sense to set an MRRS value less than MPS, yet
this is a likely scenario given that user drivers do not have a
system-wide view of the PCI topology. Virtualize MRRS such that the
user can set MRRS >= MPS, but use MPS as the floor value that we'll
write to hardware.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
(cherry picked from commit cf0d53ba4947aad6e471491d5b20a567cbe92e56) Acked-by: Paolo Pisati <paolo.pisati@canonical.com> Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Alex Williamson [Tue, 16 Jan 2018 23:40:39 +0000 (17:40 -0600)]
vfio/pci: Virtualize Maximum Payload Size
BugLink: https://launchpad.net/bugs/1732804
With virtual PCI-Express chipsets, we now see userspace/guest drivers
trying to match the physical MPS setting to a virtual downstream port.
Of course a lone physical device surrounded by virtual interconnects
cannot make a correct decision for a proper MPS setting. Instead,
let's virtualize the MPS control register so that writes through to
hardware are disallowed. Userspace drivers like QEMU assume they can
write anything to the device and we'll filter out anything dangerous.
Since mismatched MPS can lead to AER and other faults, let's add it
to the kernel side rather than relying on userspace virtualization to
handle it.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
(cherry picked from commit 523184972b282cd9ca17a76f6ca4742394856818) Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com> Acked-by: Paolo Pisati <paolo.pisati@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Xiaofei Tan [Thu, 4 Jan 2018 17:52:00 +0000 (18:52 +0100)]
scsi: hisi_sas: support zone management commands
BugLink: https://bugs.launchpad.net/bugs/1739891
Add two ATA commands, ATA_CMD_ZAC_MGMT_IN and ATA_CMD_ZAC_MGMT_OUT in
hisi_sas_get_ata_protocol(), to support SATA SMR disk.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit c3fe8a2bbbc22bd4945ea69ab5a29913baeb35e4) Signed-off-by: dann frazier <dann.frazier@canonical.com> Acked-by: Paolo Pisati <paolo.pisati@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Jayachandran C [Thu, 4 Jan 2018 17:37:10 +0000 (10:37 -0700)]
i2c: xlp9xx: Get clock frequency with clk API
BugLink: https://bugs.launchpad.net/bugs/1738073
Get the input clock frequency to the controller from the linux clk
API, if it is available. This allows us to pass in the block input
frequency either from ACPI (using APD) or from device tree.
The old hardcoded frequency is used as default for backwards compatibility.
Signed-off-by: Jayachandran C <jnair@caviumnetworks.com> Signed-off-by: Kamlakant Patel <kamlakant.patel@cavium.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
(cherry picked from commit c347b8fc22b21899154cc153a4951aaf226b4e1a) Signed-off-by: dann frazier <dann.frazier@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
arm64: Add software workaround for Falkor erratum 1041
BugLink: https://bugs.launchpad.net/bugs/1738497
The ARM architecture defines the memory locations that are permitted
to be accessed as the result of a speculative instruction fetch from
an exception level for which all stages of translation are disabled.
Specifically, the core is permitted to speculatively fetch from the
4KB region containing the current program counter 4K and next 4K.
When translation is changed from enabled to disabled for the running
exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
Falkor core may errantly speculatively access memory locations outside
of the 4KB region permitted by the architecture. The errant memory
access may lead to one of the following unexpected behaviors.
1) A System Error Interrupt (SEI) being raised by the Falkor core due
to the errant memory access attempting to access a region of memory
that is protected by a slave-side memory protection unit.
2) Unpredictable device behavior due to a speculative read from device
memory. This behavior may only occur if the instruction cache is
disabled prior to or coincident with translation being changed from
enabled to disabled.
The conditions leading to this erratum will not occur when either of the
following occur:
1) A higher exception level disables translation of a lower exception level
(e.g. EL2 changing SCTLR_EL1[M] from a value of 1 to 0).
2) An exception level disabling its stage-1 translation if its stage-2
translation is enabled (e.g. EL1 changing SCTLR_EL1[M] from a value of 1
to 0 when HCR_EL2[VM] has a value of 1).
To avoid the errant behavior, software must execute an ISB immediately
prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
(backported from commit 932b50c7c1c65e6f23002e075b97ee083c4a9e71)
[ dannf: trivial offset fix in Kconfig ] Signed-off-by: dann frazier <dann.frazier@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
[kleber.souza: solve conflicts during cherry-pick] Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1738497 Signed-off-by: dann frazier <dann.frazier@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
[kleber.souza: fixed BugLink on the commit message.] Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
This fixes a previous patch which disabled checksum offloading
for both IPv4 and IPv6 packets. So L3 checksum offload was
getting disabled for IPv4 pkts. And HW is dropping these pkts
for some reason.
Without this patch, IPv4 TSO appears to be broken:
WIthout this patch I get ~16kbyte/s, with patch close to 2mbyte/s
when copying files via scp from test box to my home workstation.
Looking at tcpdump on sender it looks like hardware drops IPv4 TSO skbs.
This patch restores performance for me, ipv6 looks good too.
Fixes: fa6d7cb5d76c ("net: thunderx: Fix TCP/UDP checksum offload for IPv6 pkts") Cc: Sunil Goutham <sgoutham@cavium.com> Cc: Aleksey Makarov <aleksey.makarov@auriga.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 134059fd2775be79e26c2dff87d25cc2f6ea5626) Signed-off-by: dann frazier <dann.frazier@canonical.com> Acked-by: Paolo Pisati <paolo.pisati@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
This fixes a previous patch which enabled checksum offloading
for both IPv4 and IPv6 packets. So L3 checksum offload was
getting enabled for IPv6 pkts. And HW is dropping these pkts
as it assumes the pkt is IPv4 when IP csum offload is set
in the SQ descriptor.
Fixes: 3a9024f52c2e ("net: thunderx: Enable TSO and checksum offloads for ipv6") Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@auriga.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit fa6d7cb5d76cf0467c61420fc9238045aedfd379) Signed-off-by: dann frazier <dann.frazier@canonical.com> Acked-by: Paolo Pisati <paolo.pisati@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Vadim Lomovtsev [Thu, 4 Jan 2018 17:31:00 +0000 (18:31 +0100)]
PCI: Set Cavium ACS capability quirk flags to assert RR/CR/SV/UF
BugLink: https://bugs.launchpad.net/bugs/1736774
The Cavium ThunderX (CN8XXX) family of PCIe Root Ports does not advertise
an ACS capability. However, the RTL internally implements similar
protection as if ACS had Request Redirection, Completion Redirection,
Source Validation, and Upstream Forwarding features enabled.
Will Deacon [Fri, 5 Jan 2018 01:48:37 +0000 (18:48 -0700)]
locking/qrwlock: Prevent slowpath writers getting held up by fastpath
BugLink: https://bugs.launchpad.net/bugs/1732238
When a prospective writer takes the qrwlock locking slowpath due to the
lock being held, it attempts to cmpxchg the wmode field from 0 to
_QW_WAITING so that concurrent lockers also take the slowpath and queue
on the spinlock accordingly, allowing the lockers to drain.
Unfortunately, this isn't fair, because a fastpath writer that comes in
after the lock is made available but before the _QW_WAITING flag is set
can effectively jump the queue. If there is a steady stream of prospective
writers, then the waiter will be held off indefinitely.
This patch restores fairness by separating _QW_WAITING and _QW_LOCKED
into two distinct fields: _QW_LOCKED continues to occupy the bottom byte
of the lockword so that it can be cleared unconditionally when unlocking,
but _QW_WAITING now occupies what used to be the bottom bit of the reader
count. This then forces the slow-path for concurrent lockers.
Tested-by: Waiman Long <longman@redhat.com> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Tested-by: Adam Wallis <awallis@codeaurora.org> Tested-by: Jan Glauber <jglauber@cavium.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Jeremy.Linton@arm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1507810851-306-6-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit d133166146333e1f13fc81c0e6c43c8d99290a8a) Signed-off-by: dann frazier <dann.frazier@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Will Deacon [Fri, 5 Jan 2018 01:48:36 +0000 (18:48 -0700)]
locking/qrwlock, arm64: Move rwlock implementation over to qrwlocks
BugLink: https://bugs.launchpad.net/bugs/1732238
Now that the qrwlock can make use of WFE, remove our homebrewed rwlock
code in favour of the generic queued implementation.
Tested-by: Waiman Long <longman@redhat.com> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Tested-by: Adam Wallis <awallis@codeaurora.org> Tested-by: Jan Glauber <jglauber@cavium.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Jeremy.Linton@arm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boqun.feng@gmail.com Cc: linux-arm-kernel@lists.infradead.org Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1507810851-306-5-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 087133ac90763cd339b6b67f2998f87dcc136c52) Signed-off-by: dann frazier <dann.frazier@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Will Deacon [Fri, 5 Jan 2018 01:48:35 +0000 (18:48 -0700)]
locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwlock
BugLink: https://bugs.launchpad.net/bugs/1732238
The qrwlock slowpaths involve spinning when either a prospective reader
is waiting for a concurrent writer to drain, or a prospective writer is
waiting for concurrent readers to drain. In both of these situations,
atomic_cond_read_acquire() can be used to avoid busy-waiting and make use
of any backoff functionality provided by the architecture.
This patch replaces the open-code loops and rspin_until_writer_unlock()
implementation with atomic_cond_read_acquire(). The write mode transition
zero to _QW_WAITING is left alone, since (a) this doesn't need acquire
semantics and (b) should be fast.
Tested-by: Waiman Long <longman@redhat.com> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Tested-by: Adam Wallis <awallis@codeaurora.org> Tested-by: Jan Glauber <jglauber@cavium.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Jeremy.Linton@arm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1507810851-306-4-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit b519b56e378ee82caf9b079b04f5db87dedc3251) Signed-off-by: dann frazier <dann.frazier@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Will Deacon [Fri, 5 Jan 2018 01:48:34 +0000 (18:48 -0700)]
locking/atomic: Add atomic_cond_read_acquire()
BugLink: https://bugs.launchpad.net/bugs/1732238
smp_cond_load_acquire() provides a way to spin on a variable with acquire
semantics until some conditional expression involving the variable is
satisfied. Architectures such as arm64 can potentially enter a low-power
state, waking up only when the value of the variable changes, which
reduces the system impact of tight polling loops.
This patch makes the same interface available to users of atomic_t,
atomic64_t and atomic_long_t, rather than require messy accesses to the
structure internals.
Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Jeremy.Linton@arm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1507810851-306-3-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 4df714be4dcf40bfb0d4af0f851a6e1977afa02e) Signed-off-by: dann frazier <dann.frazier@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Will Deacon [Fri, 5 Jan 2018 01:48:33 +0000 (18:48 -0700)]
locking/qrwlock: Use 'struct qrwlock' instead of 'struct __qrwlock'
BugLink: https://bugs.launchpad.net/bugs/1732238
There's no good reason to keep the internal structure of struct qrwlock
hidden from qrwlock.h, particularly as it's actually needed for unlock
and ends up being abstracted independently behind the __qrwlock_write_byte()
function.
Stop pretending we can hide this stuff, and move the __qrwlock definition
into qrwlock, removing the __qrwlock_write_byte() nastiness and using the
same struct definition everywhere instead.
Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Jeremy.Linton@arm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1507810851-306-2-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit e0d02285f16e8d5810f3d5d5e8a5886ca0015d3b) Signed-off-by: dann frazier <dann.frazier@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
scsi: libiscsi: Allow sd_shutdown on bad transport
BugLink: https://bugs.launchpad.net/bugs/1569925
If, for any reason, userland shuts down iscsi transport interfaces
before proper logouts - like when logging in to LUNs manually, without
logging out on server shutdown, or when automated scripts can't
umount/logout from logged LUNs - kernel will hang forever on its
sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all
still existent paths.
This happens because iscsi_eh_cmd_timed_out(), the transport layer
timeout helper, would tell the queue timeout function (scsi_times_out)
to reset the request timer over and over, until the session state is
back to logged in state. Unfortunately, during server shutdown, this
might never happen again.
Other option would be "not to handle" the issue in the transport
layer. That would trigger the error handler logic, which would also need
the session state to be logged in again.
Best option, for such case, is to tell upper layers that the command was
handled during the transport layer error handler helper, marking it as
DID_NO_CONNECT, which will allow completion and inform about the
problem.
After the session was marked as ISCSI_STATE_FAILED, due to the first
timeout during the server shutdown phase, all subsequent cmds will fail
to be queued, allowing upper logic to fail faster.
Signed-off-by: Rafael David Tinoco <rafael.tinoco@canonical.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry-picked from commit d754941225a7dbc61f6dd2173fa9498049f9a7ee next-20180117) Signed-off-by: Rafael David Tinoco <rafael.tinoco@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Jens Axboe [Fri, 19 Jan 2018 12:43:12 +0000 (10:43 -0200)]
blk-mq-tag: check for NULL rq when iterating tags
BugLink: https://bugs.launchpad.net/bugs/1744300
Since we introduced blk-mq-sched, the tags->rqs[] array has been
dynamically assigned. So we need to check for NULL when iterating,
since there's a window of time where the bit is set, but we haven't
dynamically assigned the tags->rqs[] array position yet.
This is perfectly safe, since the memory backing of the request is
never going away while the device is alive.
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry-pick from commit 7f5562d5ecc44c757599b201df928ba52fa05047) Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
This fixes crash in ttm_bo_vm_fault when hibmc driver is used.
Signed-off-by: Michal Srb <msrb@suse.com>
(cherry-picked from https://lists.freedesktop.org/archives/dri-devel/2017-November/159002.html)
[upstream fix at
https://lists.freedesktop.org/archives/dri-devel/2017-December/161184.html
is quite intrusive - this is the minimal fix] Signed-off-by: Daniel Axtens <daniel.axtens@canonical.com> Acked-by: Paolo Pisati <paolo.pisati@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
On x86-32, after decoding a frame pointer to get a regs address,
regs_size() dereferences the regs pointer when it checks regs->cs to see
if the regs are user mode. This is dangerous because it's possible that
what looks like a decoded frame pointer is actually a corrupt value, and
we don't want the unwinder to make things worse.
Instead of calling regs_size() on an unsafe pointer, just assume they're
kernel regs to start with. Later, once it's safe to access the regs, we
can do the user mode check and corresponding safety check for the
remaining two regs.
Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: LKP <lkp@01.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 5ed8d8bb38c5 ("x86/unwind: Move common code into update_stack_state()") Link: http://lkml.kernel.org/r/7f95b9a6993dec7674b3f3ab3dcd3294f7b9644d.1507597785.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 62dd86ac01f9fb6386d7f8c6b389c3ea4582a50a) BugLink: http://bugs.launchpad.net/bugs/1747263 Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Elena Reshetova [Fri, 15 Dec 2017 10:29:09 +0000 (02:29 -0800)]
userns: prevent speculative execution
CVE-2017-5753 (Spectre v1 Intel)
Since the pos value in function m_start()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
map->extent, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Elena Reshetova [Wed, 13 Dec 2017 08:15:30 +0000 (10:15 +0200)]
udf: prevent speculative execution
CVE-2017-5753 (Spectre v1 Intel)
Since the eahd->appAttrLocation value in function
udf_add_extendedattr() seems to be controllable by
userspace and later on conditionally (upon bound check)
used in following memmove, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Elena Reshetova [Wed, 30 Aug 2017 10:55:54 +0000 (13:55 +0300)]
net: mpls: prevent speculative execution
CVE-2017-5753 (Spectre v1 Intel)
Since the index value in function mpls_route_input_rcu()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
platform_label, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Elena Reshetova [Wed, 30 Aug 2017 10:52:22 +0000 (13:52 +0300)]
fs: prevent speculative execution
CVE-2017-5753 (Spectre v1 Intel)
Since the fd value in function __fcheck_files()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
fdt->fd, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Elena Reshetova [Wed, 30 Aug 2017 10:48:35 +0000 (13:48 +0300)]
ipv6: prevent speculative execution
CVE-2017-5753 (Spectre v1 Intel)
Since the offset value in function raw6_getfrag()
seems to be controllable by userspace and later on
conditionally (upon bound check) used in the
following memcpy, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Elena Reshetova [Wed, 13 Dec 2017 08:16:07 +0000 (10:16 +0200)]
ipv4: prevent speculative execution
CVE-2017-5753 (Spectre v1 Intel)
Since the offset value in function raw_getfrag()
seems to be controllable by userspace and later on
conditionally (upon bound check) used in the following
memcpy, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Elena Reshetova [Wed, 30 Aug 2017 10:47:12 +0000 (13:47 +0300)]
Thermal/int340x: prevent speculative execution
CVE-2017-5753 (Spectre v1 Intel)
Since the trip value in function int340x_thermal_get_trip_temp()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
d->aux_trips, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Elena Reshetova [Wed, 30 Aug 2017 10:46:21 +0000 (13:46 +0300)]
cw1200: prevent speculative execution
CVE-2017-5753 (Spectre v1 Intel)
Since the queue value in function cw1200_conf_tx()
seems to be controllable by userspace and later on
conditionally (upon bound check) used in
WSM_TX_QUEUE_SET, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Elena Reshetova [Wed, 30 Aug 2017 10:45:35 +0000 (13:45 +0300)]
qla2xxx: prevent speculative execution
CVE-2017-5753 (Spectre v1 Intel)
Since the handle value in functions qlafx00_status_entry()
and qlafx00_multistatus_entry() seems to be controllable
by userspace and later on conditionally (upon bound check)
used to resolve req->outstanding_cmds, insert an observable
speculation barrier before its usage. This should prevent
observable speculation on that branch and avoid kernel
memory leak.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Elena Reshetova [Wed, 30 Aug 2017 10:44:38 +0000 (13:44 +0300)]
p54: prevent speculative execution
CVE-2017-5753 (Spectre v1 Intel)
Since the queue value in function p54_conf_tx()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
priv->qos_params, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Elena Reshetova [Wed, 30 Aug 2017 10:43:39 +0000 (13:43 +0300)]
carl9170: prevent speculative execution
CVE-2017-5753 (Spectre v1 Intel)
Since the queue value in function carl9170_op_conf_tx()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
ar9170_qmap and following ar->edcf, insert an observable
speculation barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Elena Reshetova [Wed, 30 Aug 2017 10:41:27 +0000 (13:41 +0300)]
uvcvideo: prevent speculative execution
CVE-2017-5753 (Spectre v1 Intel)
Since the index value in function uvc_ioctl_enum_input()
seems to be controllable by userspace and later on
conditionally (upon bound check) used to resolve
selector->baSourceID, insert an observable speculation
barrier before its usage. This should prevent
observable speculation on that branch and avoid
kernel memory leak.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Elena Reshetova [Tue, 8 Aug 2017 09:06:58 +0000 (12:06 +0300)]
x86, bpf, jit: prevent speculative execution when JIT is enabled
CVE-2017-5753 (Spectre v1 Intel)
When constant blinding is enabled (bpf_jit_harden = 1), this adds
an observable speculation barrier before emitting x86 jitted code
for the BPF_ALU(64)_OR_X and BPF_ALU_LHS_X
(for BPF_REG_AX register) eBPF instructions. This is needed in order
to prevent speculative execution on out of bounds BPF_MAP array
indexes when JIT is enabled. This way an arbitary kernel memory is
not exposed through side-channel attacks.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Elena Reshetova [Mon, 7 Aug 2017 08:10:28 +0000 (11:10 +0300)]
bpf: prevent speculative execution in eBPF interpreter
CVE-2017-5753 (Spectre v1 Intel)
This adds an observable speculation barrier before LD_IMM_DW and
LDX_MEM_B/H/W/DW eBPF instructions during eBPF program
execution in order to prevent speculative execution on out
of bound BFP_MAP array indexes. This way an arbitary kernel
memory is not exposed through side channel attacks.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Elena Reshetova [Mon, 7 Aug 2017 08:03:42 +0000 (11:03 +0300)]
locking/barriers: introduce new observable speculation barrier
CVE-2017-5753 (Spectre v1 Intel)
The new observable speculation barrier, osb(), ensures
that any user observable speculation doesn't cross the boundary.
Any user observable speculative activity on this CPU
thread before this point either completes, reaches a
state it can no longer cause an observable activity, or
is aborted before instructions after the barrier execute.
In x86 case, osb() resolves in lfence if X86_FEATURE_LFENCE_RDTSC
is present. Other architectures can define their variants.
Suggested-by: Arjan van de Ven <arjan@linux.intel.com> Suggested-by: Alan Cox <alan.cox@intel.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Elena Reshetova [Thu, 14 Dec 2017 08:09:03 +0000 (10:09 +0200)]
x86/cpu/AMD: Remove now unused definition of MFENCE_RDTSC feature
CVE-2017-5753 (Spectre v1 Intel)
With the switch to using LFENCE_RDTSC on AMD platforms there is no longer
a need for the MFENCE_RDTSC feature. Remove its usage and definition.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Remove the compile time warning when CONFIG_RETPOLINE=y and the compiler
does not have retpoline support. Linus rationale for this is:
It's wrong because it will just make people turn off RETPOLINE, and the
asm updates - and return stack clearing - that are independent of the
compiler are likely the most important parts because they are likely the
ones easiest to target.
And it's annoying because most people won't be able to do anything about
it. The number of people building their own compiler? Very small. So if
their distro hasn't got a compiler yet (and pretty much nobody does), the
warning is just annoying crap.
It is already properly reported as part of the sysfs interface. The
compile-time warning only encourages bad things.
Fixes: 76b043848fd2 ("x86/retpoline: Add initial retpoline support") Requested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Link: https://lkml.kernel.org/r/CA+55aFzWgquv4i6Mab6bASqYXg3ErV3XDFEYf=GEcCDQg5uAtw@mail.gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a864eaee08dce8b81ba369f1af83d7dfefad5098) Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The intel_bts driver does not use the 'normal' BTS buffer which is exposed
through the cpu_entry_area but instead uses the memory allocated for the
perf AUX buffer.
This obviously comes apart when using PTI because then the kernel mapping;
which includes that AUX buffer memory; disappears. Fixing this requires to
expose a mapping which is visible in all context and that's not trivial.
As a quick fix disable this driver when PTI is enabled to prevent
malfunction.
Fixes: 385ce0ea4c07 ("x86/mm/pti: Add Kconfig") Reported-by: Vince Weaver <vincent.weaver@maine.edu> Reported-by: Robert Święcki <robert@swiecki.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: greg@kroah.com Cc: hughd@google.com Cc: luto@amacapital.net Cc: Vince Weaver <vince@deater.net> Cc: torvalds@linux-foundation.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180114102713.GB6166@worktop.programming.kicks-ass.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 556a656f2ff4eb6ce016f26cf184fb17608787c3) Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
When the config option for PTI was added a reference to documentation was
added as well. But the documentation did not exist at that point. The final
documentation has a different file name.
Fix it up to point to the proper file.
Fixes: 385ce0ea ("x86/mm/pti: Add Kconfig") Signed-off-by: W. Trevor King <wking@tremily.us> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-mm@kvack.org Cc: linux-security-module@vger.kernel.org Cc: James Morris <james.l.morris@oracle.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/3009cc8ccbddcd897ec1e0cb6dda524929de0d14.1515799398.git.wking@tremily.us Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 26ef466810d78c2dc80e1d8b71fc259d9da2936b) Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The switch to the user space page tables in the low level ASM code sets
unconditionally bit 12 and bit 11 of CR3. Bit 12 is switching the base
address of the page directory to the user part, bit 11 is switching the
PCID to the PCID associated with the user page tables.
This fails on a machine which lacks PCID support because bit 11 is set in
CR3. Bit 11 is reserved when PCID is inactive.
While the Intel SDM claims that the reserved bits are ignored when PCID is
disabled, the AMD APM states that they should be cleared.
This went unnoticed as the AMD APM was not checked when the code was
developed and reviewed and test systems with Intel CPUs never failed to
boot. The report is against a Centos 6 host where the guest fails to boot,
so it's not yet clear whether this is a virt issue or can happen on real
hardware too, but thats irrelevant as the AMD APM clearly ask for clearing
the reserved bits.
Make sure that on non PCID machines bit 11 is not set by the page table
switching code.
Andy suggested to rename the related bits and masks so they are clearly
describing what they should be used for, which is done as well for clarity.
That split could have been done with alternatives but the macro hell is
horrible and ugly. This can be done on top if someone cares to remove the
extra orq. For now it's a straight forward fix.
Fixes: 6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches") Reported-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable <stable@vger.kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Willy Tarreau <w@1wt.eu> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801140009150.2371@nanos Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 58c7987713214f8c62e37b29878b3a620ad277b9) Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
This tests that the vsyscall entries do what they're expected to do.
It also confirms that attempts to read the vsyscall page behave as
expected.
If changes are made to the vsyscall code or its memory map handling,
running this test in all three of vsyscall=none, vsyscall=emulate,
and vsyscall=native are helpful.
(Because it's easy, this also compares the vsyscall results to their
vDSO equivalents.)
Note to KAISER backporters: please test this under all three
vsyscall modes. Also, in the emulate and native modes, make sure
that test_vsyscall_64 agrees with the command line or config
option as to which mode you're in. It's quite easy to mess up
the kernel such that native mode accidentally emulates
or vice versa.
Greg, etc: please backport this to all your Meltdown-patched
kernels. It'll help make sure the patches didn't regress
vsyscalls.
CSigned-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/2b9c5a174c1d60fd7774461d518aa75598b1d8fd.1515719552.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b971f34cc536db2eef2c87594c2d34294ca936c2) Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
In accordance with the Intel and AMD documentation, we need to overwrite
all entries in the RSB on exiting a guest, to prevent malicious branch
target predictions from affecting the host kernel. This is needed both
for retpoline and for IBRS.
[ak: numbers again for the RSB stuffing labels]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515755487-8524-1-git-send-email-dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 542b09d5fde1f3f4f77602ffdaa18da9773c3f03) Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Convert indirect jumps in core 32/64bit entry assembler code to use
non-speculative sequences when CONFIG_RETPOLINE is enabled.
Don't use CALL_NOSPEC in entry_SYSCALL_64_fastpath because the return
address after the 'call' instruction must be *precisely* at the
.Lentry_SYSCALL_64_after_fastpath label for stub_ptregs_64 to work,
and the use of alternatives will mess that up unless we play horrid
games to prepend with NOPs and make the variants the same length. It's
not worth it; in the case where we ALTERNATIVE out the retpoline, the
first instruction at __x86.indirect_thunk.rax is going to be a bare
jmp *%rax anyway.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-7-git-send-email-dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9c829a0efd356d4844bd4acda9a0d053692eef98) Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Add a spectre_v2= option to select the mitigation used for the indirect
branch speculation vulnerability.
Currently, the only option available is retpoline, in its various forms.
This will be expanded to cover the new IBRS/IBPB microcode features.
The RETPOLINE_AMD feature relies on a serializing LFENCE for speculation
control. For AMD hardware, only set RETPOLINE_AMD if LFENCE is a
serializing instruction, which is indicated by the LFENCE_RDTSC feature.
[ tglx: Folded back the LFENCE/AMD fixes and reworked it so IBRS
integration becomes simple ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-5-git-send-email-dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 1404da13a17d8d09cee7c400659671ecb0651ba0) Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Enable the use of -mindirect-branch=thunk-extern in newer GCC, and provide
the corresponding thunks. Provide assembler macros for invoking the thunks
in the same way that GCC does, from native and inline assembler.
This adds X86_FEATURE_RETPOLINE and sets it by default on all CPUs. In
some circumstances, IBRS microcode features may be used instead, and the
retpoline can be disabled.
On AMD CPUs if lfence is serialising, the retpoline can be dramatically
simplified to a simple "lfence; jmp *\reg". A future patch, after it has
been verified that lfence really is serialising in all circumstances, can
enable this by setting the X86_FEATURE_RETPOLINE_AMD feature bit in addition
to X86_FEATURE_RETPOLINE.
Do not align the retpoline in the altinstr section, because there is no
guarantee that it stays aligned when it's copied over the oldinstr during
alternative patching.
[ Andi Kleen: Rename the macros, add CONFIG_RETPOLINE option, export thunks]
[ tglx: Put actual function CALL/JMP in front of the macros, convert to
symbolic labels ]
[ dwmw2: Convert back to numeric labels, merge objtool fixes ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-4-git-send-email-dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(backported from commit 92db6bfd8470c977aa9ab723c4f244bcd6ec253c) Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Getting objtool to understand retpolines is going to be a bit of a
challenge. For now, take advantage of the fact that retpolines are
patched in with alternatives. Just read the original (sane)
non-alternative instruction, and ignore the patched-in retpoline.
This allows objtool to understand the control flow *around* the
retpoline, even if it can't yet follow what's inside. This means the
ORC unwinder will fail to unwind from inside a retpoline, but will work
fine otherwise.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-3-git-send-email-dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 7442d455e7d642731474c8d607edc76b56144b75) Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
This is another case similar to what EFI does: create a new set of
page tables, map some code at a low address, and jump to it. PTI
mistakes this low address for userspace and mistakenly marks it
non-executable in an effort to make it unusable for userspace.
Undo the poison to allow execution.
Fixes: 385ce0ea4c07 ("x86/mm/pti: Add Kconfig") Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jeff Law <law@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: David" <dwmw@amazon.co.uk> Cc: Nick Clifton <nickc@redhat.com> Link: https://lkml.kernel.org/r/20180108102805.GK25546@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d9dbc4b5deed072a4b7c73dc756ae71103bdf0fd) Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
As the meltdown/spectre problem affects several CPU architectures, it makes
sense to have common way to express whether a system is affected by a
particular vulnerability or not. If affected the way to express the
mitigation should be common as well.
Create /sys/devices/system/cpu/vulnerabilities folder and files for
meltdown, spectre_v1 and spectre_v2.
Allow architectures to override the show function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linuxfoundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/20180107214913.096657732@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 039bb6382d905dad43104eb0086462ed12b17eb1) Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Jim Mattson [Wed, 3 Jan 2018 22:31:38 +0000 (14:31 -0800)]
kvm: vmx: Scrub hardware GPRs at VM-exit
CVE-2017-5715 (Spectre v2 retpoline)
Guest GPR values are live in the hardware GPRs at VM-exit. Do not
leave any guest values in hardware GPRs after the guest GPR values are
saved to the vcpu_vmx structure.
This is a partial mitigation for CVE 2017-5715 and CVE 2017-5753.
Specifically, it defeats the Project Zero PoC for CVE 2017-5715.
Suggested-by: Eric Northup <digitaleric@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Eric Northup <digitaleric@google.com> Reviewed-by: Benjamin Serebrin <serebrin@google.com> Reviewed-by: Andrew Honig <ahonig@google.com>
[Paolo: Add AMD bits, Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 0cb5b30698fdc8f6b4646012e3acb4ddce430788) Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
x86/asm: Fix inline asm call constraints for Clang
CVE-2017-5715 (Spectre v2 retpoline)
For inline asm statements which have a CALL instruction, we list the
stack pointer as a constraint to convince GCC to ensure the frame
pointer is set up first:
With GCC 7.2, however, GCC's behavior has changed. It now changes its
behavior based on the conversion of the register variable to a global.
That somehow convinces it to *always* set up the frame pointer before
inserting *any* inline asm. (Therefore, listing the variable as an
output constraint is a no-op and is no longer necessary.) It's a bit
overkill, but the performance impact should be negligible. And in fact,
there's a nice improvement with frame pointers disabled:
So in summary, while listing the stack pointer as an output constraint
is no longer necessary for newer versions of GCC, it's still needed for
older versions.
Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/3db862e970c432ae823cf515c52b54fec8270e0e.1505942196.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
(backported from commit f5caf621ee357279e759c0911daf6d55c7d36f03) Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>