]> git.proxmox.com Git - mirror_ubuntu-bionic-kernel.git/log
mirror_ubuntu-bionic-kernel.git
6 years agoALSA: usb-audio: Skip broken EU on Dell dock USB-audio
Takashi Iwai [Tue, 24 Apr 2018 09:11:48 +0000 (11:11 +0200)]
ALSA: usb-audio: Skip broken EU on Dell dock USB-audio

BugLink: http://bugs.launchpad.net/bugs/1778265
commit 1d8d6428d1da642ddd75b0be2d1bb1123ff8e017 upstream.

The Dell Dock USB-audio device with 0bda:4014 is behaving notoriously
bad, and we have already applied some workaround to avoid the firmware
hiccup.  Yet we still need to skip one thing, the Extension Unit at ID
4, which doesn't react correctly to the mixer ctl access.

Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1090658
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agoUSB: Increment wakeup count on remote wakeup.
Ravi Chandra Sadineni [Fri, 20 Apr 2018 18:08:21 +0000 (11:08 -0700)]
USB: Increment wakeup count on remote wakeup.

BugLink: http://bugs.launchpad.net/bugs/1778265
commit 83a62c51ba7b3c0bf45150c4eac7aefc6c785e94 upstream.

On chromebooks we depend on wakeup count to identify the wakeup source.
But currently USB devices do not increment the wakeup count when they
trigger the remote wake. This patch addresses the same.

Resume condition is reported differently on USB 2.0 and USB 3.0 devices.

On USB 2.0 devices, a wake capable device, if wake enabled, drives
resume signal to indicate a remote wake (USB 2.0 spec section 7.1.7.7).
The upstream facing port then sets C_PORT_SUSPEND bit and reports a
port change event (USB 2.0 spec section 11.24.2.7.2.3). Thus if a port
has resumed before driving the resume signal from the host and
C_PORT_SUSPEND is set, then the device attached to the given port might
be the reason for the last system wakeup. Increment the wakeup count for
the same.

On USB 3.0 devices, a function may signal that it wants to exit from device
suspend by sending a Function Wake Device Notification to the host (USB3.0
spec section 8.5.6.4) Thus on receiving the Function Wake, increment the
wakeup count.

Signed-off-by: Ravi Chandra Sadineni <ravisadineni@chromium.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agousb: core: Add quirk for HP v222w 16GB Mini
Kamil Lulko [Thu, 19 Apr 2018 23:54:02 +0000 (16:54 -0700)]
usb: core: Add quirk for HP v222w 16GB Mini

BugLink: http://bugs.launchpad.net/bugs/1778265
commit 3180dabe08e3653bf0a838553905d88f3773f29c upstream.

Add DELAY_INIT quirk to fix the following problem with HP
v222w 16GB Mini:

usb 1-3: unable to read config index 0 descriptor/start: -110
usb 1-3: can't read configurations, error -110
usb 1-3: can't set config #1, error -110

Signed-off-by: Kamil Lulko <kamilx.lulko@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agoUSB: serial: cp210x: add ID for NI USB serial console
Kyle Roeschley [Mon, 9 Apr 2018 15:23:55 +0000 (10:23 -0500)]
USB: serial: cp210x: add ID for NI USB serial console

BugLink: http://bugs.launchpad.net/bugs/1778265
commit 1e23aace21515a8f7615a1de016c0ea8d4e0cc6e upstream.

Added the USB VID and PID for the USB serial console on some National
Instruments devices.

Signed-off-by: Kyle Roeschley <kyle.roeschley@ni.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agoUSB: serial: ftdi_sio: use jtag quirk for Arrow USB Blaster
Vasyl Vavrychuk [Wed, 11 Apr 2018 14:05:13 +0000 (17:05 +0300)]
USB: serial: ftdi_sio: use jtag quirk for Arrow USB Blaster

BugLink: http://bugs.launchpad.net/bugs/1778265
commit 470b5d6f0cf4674be2d1ec94e54283a1770b6a1a upstream.

Arrow USB Blaster integrated on MAX1000 board uses the same vendor ID
(0x0403) and product ID (0x6010) as the "original" FTDI device.

This patch avoids picking up by ftdi_sio of the first interface of this
USB device. After that this device can be used by Arrow user-space JTAG
driver.

Signed-off-by: Vasyl Vavrychuk <vvavrychuk@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agoUSB: serial: simple: add libtransistor console
Collin May [Sat, 7 Apr 2018 21:32:48 +0000 (14:32 -0700)]
USB: serial: simple: add libtransistor console

BugLink: http://bugs.launchpad.net/bugs/1778265
commit fe710508b6ba9d28730f3021fed70e7043433b2e upstream.

Add simple driver for libtransistor USB console.
This device is implemented in software:
https://github.com/reswitched/libtransistor/blob/development/lib/usb_serial.c

Signed-off-by: Collin May <collin@collinswebsite.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agoRevert "xhci: plat: Register shutdown for xhci_plat"
Greg Kroah-Hartman [Sun, 22 Apr 2018 12:31:03 +0000 (14:31 +0200)]
Revert "xhci: plat: Register shutdown for xhci_plat"

BugLink: http://bugs.launchpad.net/bugs/1778265
commit c20f53c58261b121d0989e147368803b9773b413 upstream.

This reverts commit b07c12517f2aed0add8ce18146bb426b14099392

It is incomplete and causes hangs on devices when shutting down.  It
needs a much more "complete" fix in order to work properly.  As that fix
has not been merged, revert this patch for now before it causes any more
problems.

Cc: Greg Hackmann <ghackmann@google.com>
Cc: Adam Wallis <awallis@codeaurora.org>
Cc: Mathias Nyman <mathias.nyman@intel.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agousbip: vhci_hcd: check rhport before using in vhci_hub_control()
Shuah Khan [Thu, 5 Apr 2018 22:31:49 +0000 (16:31 -0600)]
usbip: vhci_hcd: check rhport before using in vhci_hub_control()

BugLink: http://bugs.launchpad.net/bugs/1778265
commit 5b22f676118ff25049382041da0db8012e57c9e8 upstream.

Validate !rhport < 0 before using it to access port_status array.

Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agousbip: vhci_hcd: Fix usb device and sockfd leaks
Shuah Khan [Mon, 2 Apr 2018 20:52:32 +0000 (14:52 -0600)]
usbip: vhci_hcd: Fix usb device and sockfd leaks

BugLink: http://bugs.launchpad.net/bugs/1778265
commit 9020a7efe537856eb3e826ebebdf38a5d07a7857 upstream.

vhci_hcd fails to do reset to put usb device and sockfd in the
module remove/stop paths. Fix the leak.

Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agousbip: usbip_host: fix to hold parent lock for device_attach() calls
Shuah Khan [Thu, 5 Apr 2018 22:29:04 +0000 (16:29 -0600)]
usbip: usbip_host: fix to hold parent lock for device_attach() calls

BugLink: http://bugs.launchpad.net/bugs/1778265
commit 4bfb141bc01312a817d36627cc47c93f801c216d upstream.

usbip_host calls device_attach() without holding dev->parent lock.
Fix it.

Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agousbip: usbip_event: fix to not print kernel pointer address
Shuah Khan [Thu, 5 Apr 2018 22:29:50 +0000 (16:29 -0600)]
usbip: usbip_event: fix to not print kernel pointer address

BugLink: http://bugs.launchpad.net/bugs/1778265
commit 4c982482341c64f55daf69b6caa5a2bcd9b43824 upstream.

Fix it to not print kernel pointer address. Remove the conditional
and debug message as it isn't very useful.

Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agorandom: rate limit unseeded randomness warnings
Theodore Ts'o [Wed, 25 Apr 2018 05:12:32 +0000 (01:12 -0400)]
random: rate limit unseeded randomness warnings

BugLink: http://bugs.launchpad.net/bugs/1778265
commit 4e00b339e264802851aff8e73cde7d24b57b18ce upstream.

On systems without sufficient boot randomness, no point spamming dmesg.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agorandom: fix possible sleeping allocation from irq context
Theodore Ts'o [Mon, 23 Apr 2018 22:51:28 +0000 (18:51 -0400)]
random: fix possible sleeping allocation from irq context

BugLink: http://bugs.launchpad.net/bugs/1778265
commit 6c1e851c4edc13a43adb3ea4044e3fc8f43ccf7d upstream.

We can do a sleeping allocation from an irq context when CONFIG_NUMA
is enabled.  Fix this by initializing the NUMA crng instances in a
workqueue.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot+9de458f6a5e713ee8c1a@syzkaller.appspotmail.com
Fixes: 8ef35c866f8862df ("random: set up the NUMA crng instances...")
Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agorandom: set up the NUMA crng instances after the CRNG is fully initialized
Theodore Ts'o [Wed, 11 Apr 2018 19:23:56 +0000 (15:23 -0400)]
random: set up the NUMA crng instances after the CRNG is fully initialized

BugLink: http://bugs.launchpad.net/bugs/1778265
commit 8ef35c866f8862df074a49a93b0309725812dea8 upstream.

Until the primary_crng is fully initialized, don't initialize the NUMA
crng nodes.  Otherwise users of /dev/urandom on NUMA systems before
the CRNG is fully initialized can get very bad quality randomness.  Of
course everyone should move to getrandom(2) where this won't be an
issue, but there's a lot of legacy code out there.  This related to
CVE-2018-1108.

Reported-by: Jann Horn <jannh@google.com>
Fixes: 1e7f583af67b ("random: make /dev/urandom scalable for silly...")
Cc: stable@kernel.org # 4.8+
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agoext4: fix bitmap position validation
Lukas Czerner [Tue, 24 Apr 2018 15:31:44 +0000 (11:31 -0400)]
ext4: fix bitmap position validation

BugLink: http://bugs.launchpad.net/bugs/1778265
commit 22be37acce25d66ecf6403fc8f44df9c5ded2372 upstream.

Currently in ext4_valid_block_bitmap() we expect the bitmap to be
positioned anywhere between 0 and s_blocksize clusters, but that's
wrong because the bitmap can be placed anywhere in the block group. This
causes false positives when validating bitmaps on perfectly valid file
system layouts. Fix it by checking whether the bitmap is within the group
boundary.

The problem can be reproduced using the following

mkfs -t ext3 -E stride=256 /dev/vdb1
mount /dev/vdb1 /mnt/test
cd /mnt/test
wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.3.tar.xz
tar xf linux-4.16.3.tar.xz

This will result in the warnings in the logs

EXT4-fs error (device vdb1): ext4_validate_block_bitmap:399: comm tar: bg 84: block 2774529: invalid block bitmap

[ Changed slightly for clarity and to not drop a overflow test -- TYT ]

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Ilya Dryomov <idryomov@gmail.com>
Fixes: 7dac4a1726a9 ("ext4: add validity checks for bitmap block numbers")
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agoext4: add validity checks for bitmap block numbers
Theodore Ts'o [Tue, 27 Mar 2018 03:54:10 +0000 (23:54 -0400)]
ext4: add validity checks for bitmap block numbers

BugLink: http://bugs.launchpad.net/bugs/1778265
commit 7dac4a1726a9c64a517d595c40e95e2d0d135f6f upstream.

An privileged attacker can cause a crash by mounting a crafted ext4
image which triggers a out-of-bounds read in the function
ext4_valid_block_bitmap() in fs/ext4/balloc.c.

This issue has been assigned CVE-2018-1093.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=199181
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1560782
Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agoext4: add MODULE_SOFTDEP to ensure crc32c is included in the initramfs
Theodore Ts'o [Thu, 26 Apr 2018 04:44:46 +0000 (00:44 -0400)]
ext4: add MODULE_SOFTDEP to ensure crc32c is included in the initramfs

BugLink: http://bugs.launchpad.net/bugs/1778265
commit 7ef79ad52136712172eb0525bf0b462516bf2f93 upstream.

Fixes: a45403b51582 ("ext4: always initialize the crc32c checksum driver")
Reported-by: François Valenduc <francoisvalenduc@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agoext4: set h_journal if there is a failure starting a reserved handle
Theodore Ts'o [Wed, 18 Apr 2018 15:49:31 +0000 (11:49 -0400)]
ext4: set h_journal if there is a failure starting a reserved handle

BugLink: http://bugs.launchpad.net/bugs/1778265
commit b2569260d55228b617bd82aba6d0db2faeeb4116 upstream.

If ext4 tries to start a reserved handle via
jbd2_journal_start_reserved(), and the journal has been aborted, this
can result in a NULL pointer dereference.  This is because the fields
h_journal and h_transaction in the handle structure share the same
memory, via a union, so jbd2_journal_start_reserved() will clear
h_journal before calling start_this_handle().  If this function fails
due to an aborted handle, h_journal will still be NULL, and the call
to jbd2_journal_free_reserved() will pass a NULL journal to
sub_reserve_credits().

This can be reproduced by running "kvm-xfstests -c dioread_nolock
generic/475".

Cc: stable@kernel.org # 3.11
Fixes: 8f7d89f36829b ("jbd2: transaction reservation support")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agoxhci: Fix USB ports for Dell Inspiron 5775
Kai-Heng Feng [Fri, 22 Jun 2018 18:50:47 +0000 (02:50 +0800)]
xhci: Fix USB ports for Dell Inspiron 5775

BugLink: https://bugs.launchpad.net/bugs/1756700
The Dell Inspiron 5775 is a Raven Ridge. The Enable Slot command timed
out when a USB device gets plugged:
[ 212.156326] xhci_hcd 0000:03:00.3: Error while assigning device slot ID
[ 212.156340] xhci_hcd 0000:03:00.3: Max number of devices this xHCI host supports is 64.
[ 212.156348] usb usb2-port3: couldn't allocate usb_device

AMD suggests that a delay before xHC suspends can fix the issue.

I can confirm it fixes the issue, so use the suspend delay quirk for
Raven Ridge's xHC.

Cc: stable@vger.kernel.org
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 621faf4f6a181b6e012c1d1865213f36f4159b7f)
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Anthony Wong <anthony.wong@canonical.com>
Acked-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agoixgbe/ixgbevf: Free IRQ when PCI error recovery removes the device
Mauro S M Rodrigues [Tue, 12 Jun 2018 13:27:37 +0000 (10:27 -0300)]
ixgbe/ixgbevf: Free IRQ when PCI error recovery removes the device

BugLink: http://bugs.launchpad.net/bugs/1776389
Since commit f7f37e7ff2b9 ("ixgbe: handle close/suspend race with
netif_device_detach/present") ixgbe_close_suspend is called, from
ixgbe_close, only if the device is present, i.e. if it isn't detached.
That exposed a situation where IRQs weren't freed if a PCI error
recovery system opts to remove the device. For such case the pci channel
state is set to pci_channel_io_perm_failure and ixgbe_io_error_detected
was returning PCI_ERS_RESULT_DISCONNECT before calling
ixgbe_close_suspend consequentially not freeing IRQ and crashing when
the remove handler calls pci_disable_device, hitting a BUG_ON at
free_msi_irqs, which asserts that there is no non-free IRQ associated
with the device to be removed:

BUG_ON(irq_has_action(entry->irq + i));

The issue is fixed by calling the ixgbe_close_suspend before evaluate
the pci channel state.

Reported-by: Naresh Bannoth <nbannoth@in.ibm.com>
Reported-by: Abdul Haleem <abdhalee@in.ibm.com>
Signed-off-by: Mauro S M Rodrigues <maurosr@linux.vnet.ibm.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit b212d815e77c72be921979119c715166cc8987b1)
Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agoscsi: aacraid: Correct hba_send to include iu_type
Dave Carroll [Wed, 13 Jun 2018 15:21:09 +0000 (11:21 -0400)]
scsi: aacraid: Correct hba_send to include iu_type

BugLink: http://bugs.launchpad.net/bugs/1770095
commit b60710ec7d7a ("scsi: aacraid: enable sending of TMFs from
aac_hba_send()") allows aac_hba_send() to send scsi commands, and TMF
requests, but the existing code only updates the iu_type for scsi
commands. For TMF requests we are sending an unknown iu_type to
firmware, which causes a fault.

Include iu_type prior to determining the validity of the command

Reported-by: Noah Misner <nmisner@us.ibm.com>
Fixes: b60710ec7d7ab ("aacraid: enable sending of TMFs from aac_hba_send()")
Fixes: 423400e64d377 ("aacraid: Include HBA direct interface")
Tested-by: Noah Misner <nmisner@us.ibm.com>
cc: stable@vger.kernel.org
Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit 7d3af7d96af7b9f51e1ef67b6f4725f545737da2)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agos390/archrandom: Rework arch random implementation.
Harald Freudenberger [Wed, 13 Jun 2018 16:04:04 +0000 (12:04 -0400)]
s390/archrandom: Rework arch random implementation.

BugLink: http://bugs.launchpad.net/bugs/1775391
The arch_get_random_seed_long() invocation done by the random device
driver is done in interrupt context and may be invoked very very
frequently. The existing s390 arch_get_random_seed*() implementation
uses the PRNO(TRNG) instruction which produces excellent high quality
entropy but is relatively slow and thus expensive.

This fix reworks the arch_get_random_seed* implementation. It
introduces a buffer concept to decouple the delivery of random data
via arch_get_random_seed*() from the generation of new random
bytes. The buffer of random data is filled asynchronously by a
workqueue thread.
If there are enough bytes in the buffer the s390_arch_random_generate()
just delivers these bytes. Otherwise false is returned until the worker
thread refills the buffer.
The worker fills the rng buffer by pulling fresh entropy from the
high quality (but slow) true hardware random generator. This entropy
is then spread over the buffer with an pseudo random generator.
As the arch_get_random_seed_long() fetches 8 bytes and the calling
function add_interrupt_randomness() counts this as 1 bit entropy the
distribution needs to make sure there is in fact 1 bit entropy
contained in 8 bytes of the buffer. The current values pull 32 byte
entropy and scatter this into a 2048 byte buffer. So 8 byte in the
buffer will contain 1 bit of entropy.
The worker thread is rescheduled based on the charge level of the
buffer but at least with 500 ms delay to avoid too much cpu consumption.
So the max. amount of rng data delivered via arch_get_random_seed is
limited to 4Kb per second.

Signed-off-by: Harald Freudenberger <freude@de.ibm.com>
Reviewed-by: Patrick Steuer <patrick.steuer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit 966f53e750aedc5f59f9ccae6bbfb8f671c7c842)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agos390/zcrypt: Fix CCA and EP11 CPRB processing failure memory leak.
Harald Freudenberger [Wed, 13 Jun 2018 16:12:59 +0000 (12:12 -0400)]
s390/zcrypt: Fix CCA and EP11 CPRB processing failure memory leak.

BugLink: http://bugs.launchpad.net/bugs/1775390
Tests showed, that the zcrypt device driver produces memory
leaks when a valid CCA or EP11 CPRB can't get delivered or has
a failure during processing within the zcrypt device driver.

This happens when a invalid domain or adapter number is used
or the lower level software or hardware layers produce any
kind of failure during processing of the request.

Only CPRBs send to CCA or EP11 cards can produce this memory
leak. The accelerator and the CPRBs processed by this type
of crypto card is not affected.

The two fields message and private within the ap_message struct
are allocated with pulling the function code for the CPRB but
only freed when processing of the CPRB succeeds. So for example
an invalid domain or adapter field causes the processing to
fail, leaving these two memory areas allocated forever.

Signed-off-by: Harald Freudenberger <freude@de.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit 89a0c0ec0d2e3ce0ee9caa00f60c0c26ccf11c21)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agocxl: Disable prefault_mode in Radix mode
Vaibhav Jain [Fri, 18 May 2018 09:42:23 +0000 (15:12 +0530)]
cxl: Disable prefault_mode in Radix mode

BugLink: http://bugs.launchpad.net/bugs/1774471
Currently we see a kernel-oops reported on Power-9 while attaching a
context to an AFU, with radix-mode and sysfs attr 'prefault_mode' set
to anything other than 'none'. The backtrace of the oops is of this
form:

  Unable to handle kernel paging request for data at address 0x00000080
  Faulting instruction address: 0xc00800000bcf3b20
  cpu 0x1: Vector: 300 (Data Access) at [c00000037f003800]
      pc: c00800000bcf3b20: cxl_load_segment+0x178/0x290 [cxl]
      lr: c00800000bcf39f0: cxl_load_segment+0x48/0x290 [cxl]
      sp: c00000037f003a80
     msr: 9000000000009033
     dar: 80
   dsisr: 40000000
    current = 0xc00000037f280000
    paca    = 0xc0000003ffffe600   softe: 3        irq_happened: 0x01
      pid   = 3529, comm = afp_no_int
  <snip>
  cxl_prefault+0xfc/0x248 [cxl]
  process_element_entry_psl9+0xd8/0x1a0 [cxl]
  cxl_attach_dedicated_process_psl9+0x44/0x130 [cxl]
  native_attach_process+0xc0/0x130 [cxl]
  afu_ioctl+0x3f4/0x5e0 [cxl]
  do_vfs_ioctl+0xdc/0x890
  ksys_ioctl+0x68/0xf0
  sys_ioctl+0x40/0xa0
  system_call+0x58/0x6c

The issue is caused as on Power-8 the AFU attr 'prefault_mode' was
used to improve initial storage fault performance by prefaulting
process segments. However on Power-9 with radix mode we don't have
Storage-Segments that we can prefault. Also prefaulting process Pages
will be too costly and fine-grained.

Hence, since the prefaulting mechanism doesn't makes sense of
radix-mode, this patch updates prefault_mode_store() to not allow any
other value apart from CXL_PREFAULT_NONE when radix mode is enabled.

Fixes: f24be42aab37 ("cxl: Add psl9 specific code")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from linux-next commit b6c84ba22ff3a198eb8d5552cf9b8fda1d792e54)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agocxl: Configure PSL to not use APC virtual machines
Vaibhav Jain [Tue, 17 Apr 2018 05:11:02 +0000 (10:41 +0530)]
cxl: Configure PSL to not use APC virtual machines

BugLink: http://bugs.launchpad.net/bugs/1774471
APC virtual machines arent used on POWER-9 chips and are already
disabled in on-chip CAPP. They also need to be disabled on the PSL via
'PSL Data Send Control Register' by setting bit(47). This forces the
PSL to send commands to CAPP with queue.id == 0.

Fixes: 5632874311db ("cxl: Add support for POWER9 DD2")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Alastair D'Silva <alastair@d-silva.org>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from linux-next commit 9a6d2022bacd8fca0be6297459a02dfd28dad6ba)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agocxl: Report the tunneled operations status
Philippe Bergheaud [Mon, 14 May 2018 08:27:36 +0000 (10:27 +0200)]
cxl: Report the tunneled operations status

BugLink: http://bugs.launchpad.net/bugs/1774471
Failure to synchronize the tunneled operations does not prevent
the initialization of the cxl card. This patch reports the tunneled
operations status via /sys.

Signed-off-by: Philippe Bergheaud <felix@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 497a0790e2c604366b9e35dcb41310319e9bca13)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agocxl: Set the PBCQ Tunnel BAR register when enabling capi mode
Philippe Bergheaud [Mon, 14 May 2018 08:27:35 +0000 (10:27 +0200)]
cxl: Set the PBCQ Tunnel BAR register when enabling capi mode

BugLink: http://bugs.launchpad.net/bugs/1774471
Skiboot used to set the default Tunnel BAR register value when capi
mode was enabled. This approach was ok for the cxl driver, but
prevented other drivers from choosing different values.

Skiboot versions > 5.11 will not set the default value any longer.
This patch modifies the cxl driver to set/reset the Tunnel BAR
register when entering/exiting the cxl mode, with
pnv_pci_set_tunnel_bar().

That should work with old skiboot (since we are re-writing the value
already set) and new skiboot.

mpe: The tunnel support was only merged into Linux recently, in commit
d6a90bb83b50 ("powerpc/powernv: Enable tunneled operations")
(v4.17-rc1), so with new skiboot kernels between that commit and this
will not work correctly.

Fixes: d6a90bb83b50 ("powerpc/powernv: Enable tunneled operations")
Signed-off-by: Philippe Bergheaud <felix@linux.ibm.com>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 401dca8cbd14fc4b32d93499dcd12a1711a73ecc)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agocxl: Remove function write_timebase_ctrl_psl9() for PSL9
Vaibhav Jain [Thu, 15 Feb 2018 06:19:36 +0000 (11:49 +0530)]
cxl: Remove function write_timebase_ctrl_psl9() for PSL9

BugLink: http://bugs.launchpad.net/bugs/1774471
For PSL9 the contents of PSL_TB_CTLSTAT register have changed in PSL9
and all of the register is now readonly. Hence we don't need an sl_ops
implementation for 'write_timebase_ctrl' for to populate this register
for PSL9.

Hence this patch removes function write_timebase_ctrl_psl9() and its
references from the code.

Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 02b63b420223db3e33e19cc0aaf346371e8efe48)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agoBluetooth: btusb: Apply QCA Rome patches for some ATH3012 models
Takashi Iwai [Wed, 13 Jun 2018 16:53:58 +0000 (12:53 -0400)]
Bluetooth: btusb: Apply QCA Rome patches for some ATH3012 models

BugLink: http://bugs.launchpad.net/bugs/1764645
In commit f44cb4b19ed4 ("Bluetooth: btusb: Fix quirk for Atheros
1525/QCA6174") we tried to address the non-working Atheros BT devices
by changing the quirk from BTUSB_ATH3012 to BTUSB_QCA_ROME.  This made
such devices working while it turned out to break other existing chips
with the very same USB ID, hence it was reverted afterwards.

This is another attempt to tackle the issue.  The essential point to
use BTUSB_QCA_ROME is to apply the btusb_setup_qca() and do RAM-
patching.  And the previous attempt failed because btusb_setup_qca()
returns -ENODEV if the ROM version doesn't match with the expected
ones.  For some devices that have already the "correct" ROM versions,
we may just skip the setup procedure and continue the rest.

So, the first fix we'll need is to add a check of the ROM version in
the function to skip the setup if the ROM version looks already sane,
so that it can be applied for all ath devices.

However, the world is a bit more complex than that simple solution.
Since BTUSB_ATH3012 quirk checks the bcdDevice and bails out when it's
0x0001 at the beginning of probing, so the device probe always aborts
here.

In this patch, we add another check of ROM version again, and if the
device needs patching, the probe continues.  For that, a slight
refactoring of btusb_qca_send_vendor_req() was required so that the
probe function can pass usb_device pointer directly before allocating
hci_dev stuff.

Fixes: commit f44cb4b19ed4 ("Bluetooth: btusb: Fix quirk for Atheros 1525/QCA6174")
Bugzilla: http://bugzilla.opensuse.org/show_bug.cgi?id=1082504
Tested-by: Ivan Levshin <ivan.levshin@microfocus.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
(cherry picked from commit 803cdb8ce584198cd45825822910cac7de6378cb)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agoUBUNTU: SAUCE: wcn36xx: read MAC from file or randomly generate one
Paolo Pisati [Thu, 14 Jun 2018 10:08:56 +0000 (12:08 +0200)]
UBUNTU: SAUCE: wcn36xx: read MAC from file or randomly generate one

BugLink: http://bugs.launchpad.net/bugs/1776491
By default, wcn36xx initializes itself with a dummy 00:00:00:00:00:00 MAC
address, preventing the interface from working until a valid MAC address was set.

While not an issue on Ubuntu Classic (where the user can always set it
later on the command line or via /etc/network/interfaces), it became a problem
on Ubuntu Core where the wifi interface is probed during installation,
before the user has any chance to set a new MAC address.

To overcome this scenario, the wcn36xx driver in Xenial had a couple of features:

1) during probe, if /lib/firmware/wlan/macaddr0 was present, its content was
used as the new MAC address

2) if that failed, a pseudo-random MAC addres was generated and set

and this is a port of a the corresponding Xenial code to Bionic:
see xenial/snapdragon tree,
drivers/net/wireless/ath/wcn36xx/wcn36xx-msm.c::wcn36xx_msm_get_hw_mac().

Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agoUBUNTU: [Config] arm64: snapdragon: WCN36XX_SNAPDRAGON_HACKS=y
Paolo Pisati [Thu, 14 Jun 2018 10:08:57 +0000 (12:08 +0200)]
UBUNTU: [Config] arm64: snapdragon: WCN36XX_SNAPDRAGON_HACKS=y

BugLink: http://bugs.launchpad.net/bugs/1776491
Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agofscache: Fix hanging wait on page discarded by writeback
David Howells [Fri, 15 Jun 2018 02:33:29 +0000 (12:33 +1000)]
fscache: Fix hanging wait on page discarded by writeback

BugLink: https://bugs.launchpad.net/bugs/1777029
If the fscache asynchronous write operation elects to discard a page that's
pending storage to the cache because the page would be over the store limit
then it needs to wake the page as someone may be waiting on completion of
the write.

The problem is that the store limit may be updated by a different
asynchronous operation - and so may miss the write - and that the store
limit may not even get updated until later by the netfs.

Fix the kernel hang by making fscache_write_op() mark as written any pages
that are over the limit.

Signed-off-by: David Howells <dhowells@redhat.com>
(cherry picked from commit 2c98425720233ae3e135add0c7e869b32913502f)
Signed-off-by: Daniel Axtens <daniel.axtens@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
6 years agoUBUNTU: Start new release
Stefan Bader [Tue, 14 Aug 2018 10:19:58 +0000 (12:19 +0200)]
UBUNTU: Start new release

Ignore: yes
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agoUBUNTU: Ubuntu-4.15.0-32.35 Ubuntu-4.15.0-32.35
Stefan Bader [Thu, 9 Aug 2018 16:09:26 +0000 (18:09 +0200)]
UBUNTU: Ubuntu-4.15.0-32.35

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agocpu: Fix per-cpu regression on ARM64
Tyler Hicks [Fri, 10 Aug 2018 14:42:27 +0000 (14:42 +0000)]
cpu: Fix per-cpu regression on ARM64

this_cpu_write() cannot be used on ARM64 until smp_prepare_boot_cpu()
has been called. boot_cpu_state_init() is called right before
smp_prepare_boot_cpu() so it can't use this_cpu_write().

CVE-2018-3620
CVE-2018-3646

Fixes: 956aab6172ee ("cpu/hotplug: Boot HT siblings at least once")
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
6 years agoRevert "net: increase fragment memory usage limits"
Tyler Hicks [Fri, 3 Aug 2018 21:53:15 +0000 (21:53 +0000)]
Revert "net: increase fragment memory usage limits"

This reverts commit c2a936600f78aea00d3312ea4b66a79a4619f9b4. It
made denial of service attacks on the IP fragment handling easier to
carry out.

CVE-2018-5391

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/mm/pat: Make set_memory_np() L1TF safe
Andi Kleen [Tue, 7 Aug 2018 22:09:39 +0000 (15:09 -0700)]
x86/mm/pat: Make set_memory_np() L1TF safe

set_memory_np() is used to mark kernel mappings not present, but it has
it's own open coded mechanism which does not have the L1TF protection of
inverting the address bits.

Replace the open coded PTE manipulation with the L1TF protecting low level
PTE routines.

Passes the CPA self test.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/speculation/l1tf: Make pmd/pud_mknotpresent() invert
Andi Kleen [Tue, 7 Aug 2018 22:09:37 +0000 (15:09 -0700)]
x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert

Some cases in THP like:
  - MADV_FREE
  - mprotect
  - split

mark the PMD non present for temporarily to prevent races. The window for
an L1TF attack in these contexts is very small, but it wants to be fixed
for correctness sake.

Use the proper low level functions for pmd/pud_mknotpresent() to address
this.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/speculation/l1tf: Invert all not present mappings
Andi Kleen [Tue, 7 Aug 2018 22:09:36 +0000 (15:09 -0700)]
x86/speculation/l1tf: Invert all not present mappings

For kernel mappings PAGE_PROTNONE is not necessarily set for a non present
mapping, but the inversion logic explicitely checks for !PRESENT and
PROT_NONE.

Remove the PROT_NONE check and make the inversion unconditional for all not
present mappings.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agocpu/hotplug: Fix SMT supported evaluation
Thomas Gleixner [Tue, 7 Aug 2018 06:19:57 +0000 (08:19 +0200)]
cpu/hotplug: Fix SMT supported evaluation

Josh reported that the late SMT evaluation in cpu_smt_state_init() sets
cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied
on the kernel command line as it cannot differentiate between SMT disabled
by BIOS and SMT soft disable via 'nosmt'. That wreckages the state and
makes the sysfs interface unusable.

Rework this so that during bringup of the non boot CPUs the availability of
SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a
'primary' thread then set the local cpu_smt_available marker and evaluate
this explicitely right after the initial SMP bringup has finished.

SMT evaulation on x86 is a trainwreck as the firmware has all the
information _before_ booting the kernel, but there is no interface to query
it.

Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agoKVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry
Paolo Bonzini [Sun, 5 Aug 2018 14:07:47 +0000 (16:07 +0200)]
KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry

When nested virtualization is in use, VMENTER operations from the nested
hypervisor into the nested guest will always be processed by the bare metal
hypervisor, and KVM's "conditional cache flushes" mode in particular does a
flush on nested vmentry.  Therefore, include the "skip L1D flush on
vmentry" bit in KVM's suggested ARCH_CAPABILITIES setting.

Add the relevant Documentation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

[tyhicks: Adjust for the missing MSR_F10H_DECFG and MSR_IA32_UCODE_REV
 feature MSRs which do not exist in 4.15]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agoKVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR
Paolo Bonzini [Mon, 25 Jun 2018 12:04:37 +0000 (14:04 +0200)]
KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR

This lets userspace read the MSR_IA32_ARCH_CAPABILITIES and check that all
requested features are available on the host.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CVE-2018-3620
CVE-2018-3646

(backported from commit cd28325249a1ca0d771557ce823e0308ad629f98)
[tyhicks: Adjust for the missing MSR_F10H_DECFG and MSR_IA32_UCODE_REV
 feature MSRs which do not exist in 4.15]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agoKVM: X86: Introduce kvm_get_msr_feature()
Wanpeng Li [Wed, 28 Feb 2018 06:03:30 +0000 (14:03 +0800)]
KVM: X86: Introduce kvm_get_msr_feature()

Introduce kvm_get_msr_feature() to handle the msrs which are supported
by different vendors and sharing the same emulation logic.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
CVE-2018-3620
CVE-2018-3646

(cherry picked from commit 66421c1ec340096b291af763ed5721314cdd9c5c)
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agoKVM: x86: Add a framework for supporting MSR-based features
Tom Lendacky [Wed, 21 Feb 2018 19:39:51 +0000 (13:39 -0600)]
KVM: x86: Add a framework for supporting MSR-based features

Provide a new KVM capability that allows bits within MSRs to be recognized
as features.  Two new ioctls are added to the /dev/kvm ioctl routine to
retrieve the list of these MSRs and then retrieve their values. A kvm_x86_ops
callback is used to determine support for the listed MSR-based features.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Tweaked documentation. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
CVE-2018-3620
CVE-2018-3646

(backported from commit 801e459a6f3a63af9d447e6249088c76ae16efc4)
[tyhicks: Adjust context for missing mem_enc_* kvm_x86_ops]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry
Paolo Bonzini [Sun, 5 Aug 2018 14:07:46 +0000 (16:07 +0200)]
x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry

Bit 3 of ARCH_CAPABILITIES tells a hypervisor that L1D flush on vmentry is
not needed.  Add a new value to enum vmx_l1d_flush_state, which is used
either if there is no L1TF bug at all, or if bit 3 is set in ARCH_CAPABILITIES.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/speculation: Simplify sysfs report of VMX L1TF vulnerability
Paolo Bonzini [Sun, 5 Aug 2018 14:07:45 +0000 (16:07 +0200)]
x86/speculation: Simplify sysfs report of VMX L1TF vulnerability

Three changes to the content of the sysfs file:

 - If EPT is disabled, L1TF cannot be exploited even across threads on the
   same core, and SMT is irrelevant.

 - If mitigation is completely disabled, and SMT is enabled, print "vulnerable"
   instead of "vulnerable, SMT vulnerable"

 - Reorder the two parts so that the main vulnerability state comes first
   and the detail on SMT is second.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agoDocumentation/l1tf: Remove Yonah processors from not vulnerable list
Thomas Gleixner [Sun, 5 Aug 2018 15:06:12 +0000 (17:06 +0200)]
Documentation/l1tf: Remove Yonah processors from not vulnerable list

Dave reported, that it's not confirmed that Yonah processors are
unaffected. Remove them from the list.

Reported-by: ave Hansen <dave.hansen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr()
Nicolai Stange [Sun, 22 Jul 2018 11:38:18 +0000 (13:38 +0200)]
x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr()

For VMEXITs caused by external interrupts, vmx_handle_external_intr()
indirectly calls into the interrupt handlers through the host's IDT.

It follows that these interrupts get accounted for in the
kvm_cpu_l1tf_flush_l1d per-cpu flag.

The subsequently executed vmx_l1d_flush() will thus be aware that some
interrupts have happened and conduct a L1d flush anyway.

Setting l1tf_flush_l1d from vmx_handle_external_intr() isn't needed
anymore. Drop it.

Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d
Nicolai Stange [Sun, 29 Jul 2018 11:06:04 +0000 (13:06 +0200)]
x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d

The last missing piece to having vmx_l1d_flush() take interrupts after
VMEXIT into account is to set the kvm_cpu_l1tf_flush_l1d per-cpu flag on
irq entry.

Issue calls to kvm_set_cpu_l1tf_flush_l1d() from entering_irq(),
ipi_entering_ack_irq(), smp_reschedule_interrupt() and
uv_bau_message_interrupt().

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86: Don't include linux/irq.h from asm/hardirq.h
Nicolai Stange [Sun, 29 Jul 2018 10:15:33 +0000 (12:15 +0200)]
x86: Don't include linux/irq.h from asm/hardirq.h

The next patch in this series will have to make the definition of
irq_cpustat_t available to entering_irq().

Inclusion of asm/hardirq.h into asm/apic.h would cause circular header
dependencies like

  asm/smp.h
    asm/apic.h
      asm/hardirq.h
        linux/irq.h
          linux/topology.h
            linux/smp.h
              asm/smp.h

or

  linux/gfp.h
    linux/mmzone.h
      asm/mmzone.h
        asm/mmzone_64.h
          asm/smp.h
            asm/apic.h
              asm/hardirq.h
                linux/irq.h
                  linux/irqdesc.h
                    linux/kobject.h
                      linux/sysfs.h
                        linux/kernfs.h
                          linux/idr.h
                            linux/gfp.h

and others.

This causes compilation errors because of the header guards becoming
effective in the second inclusion: symbols/macros that had been defined
before wouldn't be available to intermediate headers in the #include chain
anymore.

A possible workaround would be to move the definition of irq_cpustat_t
into its own header and include that from both, asm/hardirq.h and
asm/apic.h.

However, this wouldn't solve the real problem, namely asm/harirq.h
unnecessarily pulling in all the linux/irq.h cruft: nothing in
asm/hardirq.h itself requires it. Also, note that there are some other
archs, like e.g. arm64, which don't have that #include in their
asm/hardirq.h.

Remove the linux/irq.h #include from x86' asm/hardirq.h.

Fix resulting compilation errors by adding appropriate #includes to *.c
files as needed.

Note that some of these *.c files could be cleaned up a bit wrt. to their
set of #includes, but that should better be done from separate patches, if
at all.

Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

[tyhicks: Adjust context in smpboot.c, add asm/sections.h include in
 kprobes/core.c, i915_pmu.c doesn't exist in 4.15,
 controller/pci-hyperv.c is host/pci-hyperv.c in 4.15]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d
Nicolai Stange [Fri, 27 Jul 2018 11:22:16 +0000 (13:22 +0200)]
x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d

Part of the L1TF mitigation for vmx includes flushing the L1D cache upon
VMENTRY.

L1D flushes are costly and two modes of operations are provided to users:
"always" and the more selective "conditional" mode.

If operating in the latter, the cache would get flushed only if a host side
code path considered unconfined had been traversed. "Unconfined" in this
context means that it might have pulled in sensitive data like user data
or kernel crypto keys.

The need for L1D flushes is tracked by means of the per-vcpu flag
l1tf_flush_l1d. KVM exit handlers considered unconfined set it. A
vmx_l1d_flush() subsequently invoked before the next VMENTER will conduct a
L1d flush based on its value and reset that flag again.

Currently, interrupts delivered "normally" while in root operation between
VMEXIT and VMENTER are not taken into account. Part of the reason is that
these don't leave any traces and thus, the vmx code is unable to tell if
any such has happened.

As proposed by Paolo Bonzini, prepare for tracking all interrupts by
introducing a new per-cpu flag, "kvm_cpu_l1tf_flush_l1d". It will be in
strong analogy to the per-vcpu ->l1tf_flush_l1d.

A later patch will make interrupt handlers set it.

For the sake of cache locality, group kvm_cpu_l1tf_flush_l1d into x86'
per-cpu irq_cpustat_t as suggested by Peter Zijlstra.

Provide the helpers kvm_set_cpu_l1tf_flush_l1d(),
kvm_clear_cpu_l1tf_flush_l1d() and kvm_get_cpu_l1tf_flush_l1d(). Make them
trivial resp. non-existent for !CONFIG_KVM_INTEL as appropriate.

Let vmx_l1d_flush() handle kvm_cpu_l1tf_flush_l1d in the same way as
l1tf_flush_l1d.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/irq: Demote irq_cpustat_t::__softirq_pending to u16
Nicolai Stange [Fri, 27 Jul 2018 10:46:29 +0000 (12:46 +0200)]
x86/irq: Demote irq_cpustat_t::__softirq_pending to u16

An upcoming patch will extend KVM's L1TF mitigation in conditional mode
to also cover interrupts after VMEXITs. For tracking those, stores to a
new per-cpu flag from interrupt handlers will become necessary.

In order to improve cache locality, this new flag will be added to x86's
irq_cpustat_t.

Make some space available there by shrinking the ->softirq_pending bitfield
from 32 to 16 bits: the number of bits actually used is only NR_SOFTIRQS,
i.e. 10.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush()
Nicolai Stange [Sat, 21 Jul 2018 20:35:28 +0000 (22:35 +0200)]
x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush()

Currently, vmx_vcpu_run() checks if l1tf_flush_l1d is set and invokes
vmx_l1d_flush() if so.

This test is unncessary for the "always flush L1D" mode.

Move the check to vmx_l1d_flush()'s conditional mode code path.

Notes:
- vmx_l1d_flush() is likely to get inlined anyway and thus, there's no
  extra function call.

- This inverts the (static) branch prediction, but there hadn't been any
  explicit likely()/unlikely() annotations before and so it stays as is.

Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

[tyhicks: Minor context adjustment]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond'
Nicolai Stange [Sat, 21 Jul 2018 20:25:00 +0000 (22:25 +0200)]
x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond'

The vmx_l1d_flush_always static key is only ever evaluated if
vmx_l1d_should_flush is enabled. In that case however, there are only two
L1d flushing modes possible: "always" and "conditional".

The "conditional" mode's implementation tends to require more sophisticated
logic than the "always" mode.

Avoid inverted logic by replacing the 'vmx_l1d_flush_always' static key
with a 'vmx_l1d_flush_cond' one.

There is no change in functionality.

Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush()
Nicolai Stange [Sat, 21 Jul 2018 20:16:56 +0000 (22:16 +0200)]
x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush()

vmx_l1d_flush() gets invoked only if l1tf_flush_l1d is true. There's no
point in setting l1tf_flush_l1d to true from there again.

Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agocpu/hotplug: detect SMT disabled by BIOS
Josh Poimboeuf [Tue, 24 Jul 2018 16:17:40 +0000 (18:17 +0200)]
cpu/hotplug: detect SMT disabled by BIOS

If SMT is disabled in BIOS, the CPU code doesn't properly detect it.
The /sys/devices/system/cpu/smt/control file shows 'on', and the 'l1tf'
vulnerabilities file shows SMT as vulnerable.

Fix it by forcing 'cpu_smt_control' to CPU_SMT_NOT_SUPPORTED in such a
case.  Unfortunately the detection can only be done after bringing all
the CPUs online, so we have to overwrite any previous writes to the
variable.

Reported-by: Joe Mario <jmario@redhat.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Fixes: f048c399e0f7 ("x86/topology: Provide topology_smt_supported()")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agoDocumentation/l1tf: Fix typos
Tony Luck [Thu, 19 Jul 2018 20:49:58 +0000 (13:49 -0700)]
Documentation/l1tf: Fix typos

Fix spelling and other typos

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/KVM/VMX: Initialize the vmx_l1d_flush_pages' content
Nicolai Stange [Wed, 18 Jul 2018 17:07:38 +0000 (19:07 +0200)]
x86/KVM/VMX: Initialize the vmx_l1d_flush_pages' content

The slow path in vmx_l1d_flush() reads from vmx_l1d_flush_pages in order
to evict the L1d cache.

However, these pages are never cleared and, in theory, their data could be
leaked.

More importantly, KSM could merge a nested hypervisor's vmx_l1d_flush_pages
to fewer than 1 << L1D_CACHE_ORDER host physical pages and this would break
the L1d flushing algorithm: L1D on x86_64 is tagged by physical addresses.

Fix this by initializing the individual vmx_l1d_flush_pages with a
different pattern each.

Rename the "empty_zp" asm constraint identifier in vmx_l1d_flush() to
"flush_pages" to reflect this change.

Fixes: a47dd5f06714 ("x86/KVM/VMX: Add L1D flush algorithm")
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/speculation/l1tf: Unbreak !__HAVE_ARCH_PFN_MODIFY_ALLOWED architectures
Jiri Kosina [Sat, 14 Jul 2018 19:56:13 +0000 (21:56 +0200)]
x86/speculation/l1tf: Unbreak !__HAVE_ARCH_PFN_MODIFY_ALLOWED architectures

pfn_modify_allowed() and arch_has_pfn_modify_check() are outside of the
!__ASSEMBLY__ section in include/asm-generic/pgtable.h, which confuses
assembler on archs that don't have __HAVE_ARCH_PFN_MODIFY_ALLOWED (e.g.
ia64) and breaks build:

    include/asm-generic/pgtable.h: Assembler messages:
    include/asm-generic/pgtable.h:538: Error: Unknown opcode `static inline bool pfn_modify_allowed(unsigned long pfn,pgprot_t prot)'
    include/asm-generic/pgtable.h:540: Error: Unknown opcode `return true'
    include/asm-generic/pgtable.h:543: Error: Unknown opcode `static inline bool arch_has_pfn_modify_check(void)'
    include/asm-generic/pgtable.h:545: Error: Unknown opcode `return false'
    arch/ia64/kernel/entry.S:69: Error: `mov' does not fit into bundle

Move those two static inlines into the !__ASSEMBLY__ section so that they
don't confuse the asm build pass.

Fixes: 42e4089c7890 ("x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agoDocumentation: Add section about CPU vulnerabilities
Thomas Gleixner [Fri, 13 Jul 2018 14:23:26 +0000 (16:23 +0200)]
Documentation: Add section about CPU vulnerabilities

Add documentation for the L1TF vulnerability and the mitigation mechanisms:

  - Explain the problem and risks
  - Document the mitigation mechanisms
  - Document the command line controls
  - Document the sysfs files

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20180713142323.287429944@linutronix.de
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/bugs, kvm: Introduce boot-time control of L1TF mitigations
Jiri Kosina [Fri, 13 Jul 2018 14:23:25 +0000 (16:23 +0200)]
x86/bugs, kvm: Introduce boot-time control of L1TF mitigations

Introduce the 'l1tf=' kernel command line option to allow for boot-time
switching of mitigation that is used on processors affected by L1TF.

The possible values are:

  full
Provides all available mitigations for the L1TF vulnerability. Disables
SMT and enables all mitigations in the hypervisors. SMT control via
/sys/devices/system/cpu/smt/control is still possible after boot.
Hypervisors will issue a warning when the first VM is started in
a potentially insecure configuration, i.e. SMT enabled or L1D flush
disabled.

  full,force
Same as 'full', but disables SMT control. Implies the 'nosmt=force'
command line option. sysfs control of SMT and the hypervisor flush
control is disabled.

  flush
Leaves SMT enabled and enables the conditional hypervisor mitigation.
Hypervisors will issue a warning when the first VM is started in a
potentially insecure configuration, i.e. SMT enabled or L1D flush
disabled.

  flush,nosmt
Disables SMT and enables the conditional hypervisor mitigation. SMT
control via /sys/devices/system/cpu/smt/control is still possible
after boot. If SMT is reenabled or flushing disabled at runtime
hypervisors will issue a warning.

  flush,nowarn
Same as 'flush', but hypervisors will not warn when
a VM is started in a potentially insecure configuration.

  off
Disables hypervisor mitigations and doesn't emit any warnings.

Default is 'flush'.

Let KVM adhere to these semantics, which means:

  - 'lt1f=full,force' : Performe L1D flushes. No runtime control
       possible.

  - 'l1tf=full'
  - 'l1tf-flush'
  - 'l1tf=flush,nosmt' : Perform L1D flushes and warn on VM start if
  SMT has been runtime enabled or L1D flushing
  has been run-time enabled

  - 'l1tf=flush,nowarn' : Perform L1D flushes and no warnings are emitted.

  - 'l1tf=off' : L1D flushes are not performed and no warnings
  are emitted.

KVM can always override the L1D flushing behavior using its 'vmentry_l1d_flush'
module parameter except when lt1f=full,force is set.

This makes KVM's private 'nosmt' option redundant, and as it is a bit
non-systematic anyway (this is something to control globally, not on
hypervisor level), remove that option.

Add the missing Documentation entry for the l1tf vulnerability sysfs file
while at it.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142323.202758176@linutronix.de
CVE-2018-3620
CVE-2018-3646

[tyhicks: Adjust context for changes to vmx_vm_init()]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agocpu/hotplug: Set CPU_SMT_NOT_SUPPORTED early
Thomas Gleixner [Fri, 13 Jul 2018 14:23:24 +0000 (16:23 +0200)]
cpu/hotplug: Set CPU_SMT_NOT_SUPPORTED early

The CPU_SMT_NOT_SUPPORTED state is set (if the processor does not support
SMT) when the sysfs SMT control file is initialized.

That was fine so far as this was only required to make the output of the
control file correct and to prevent writes in that case.

With the upcoming l1tf command line parameter, this needs to be set up
before the L1TF mitigation selection and command line parsing happens.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142323.121795971@linutronix.de
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agocpu/hotplug: Expose SMT control init function
Jiri Kosina [Fri, 13 Jul 2018 14:23:23 +0000 (16:23 +0200)]
cpu/hotplug: Expose SMT control init function

The L1TF mitigation will gain a commend line parameter which allows to set
a combination of hypervisor mitigation and SMT control.

Expose cpu_smt_disable() so the command line parser can tweak SMT settings.

[ tglx: Split out of larger patch and made it preserve an already existing
   force off state ]

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142323.039715135@linutronix.de
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/kvm: Allow runtime control of L1D flush
Thomas Gleixner [Fri, 13 Jul 2018 14:23:22 +0000 (16:23 +0200)]
x86/kvm: Allow runtime control of L1D flush

All mitigation modes can be switched at run time with a static key now:

 - Use sysfs_streq() instead of strcmp() to handle the trailing new line
   from sysfs writes correctly.
 - Make the static key management handle multiple invocations properly.
 - Set the module parameter file to RW

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.954525119@linutronix.de
CVE-2018-3620
CVE-2018-3646

[tyhicks: Minor context adjustment]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/kvm: Serialize L1D flush parameter setter
Thomas Gleixner [Fri, 13 Jul 2018 14:23:21 +0000 (16:23 +0200)]
x86/kvm: Serialize L1D flush parameter setter

Writes to the parameter files are not serialized at the sysfs core
level, so local serialization is required.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.873642605@linutronix.de
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/kvm: Add static key for flush always
Thomas Gleixner [Fri, 13 Jul 2018 14:23:20 +0000 (16:23 +0200)]
x86/kvm: Add static key for flush always

Avoid the conditional in the L1D flush control path.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.790914912@linutronix.de
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/kvm: Move l1tf setup function
Thomas Gleixner [Fri, 13 Jul 2018 14:23:19 +0000 (16:23 +0200)]
x86/kvm: Move l1tf setup function

In preparation of allowing run time control for L1D flushing, move the
setup code to the module parameter handler.

In case of pre module init parsing, just store the value and let vmx_init()
do the actual setup after running kvm_init() so that enable_ept is having
the correct state.

During run-time invoke it directly from the parameter setter to prepare for
run-time control.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.694063239@linutronix.de
CVE-2018-3620
CVE-2018-3646

[tyhicks: Minor context adjustment]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/l1tf: Handle EPT disabled state proper
Thomas Gleixner [Fri, 13 Jul 2018 14:23:18 +0000 (16:23 +0200)]
x86/l1tf: Handle EPT disabled state proper

If Extended Page Tables (EPT) are disabled or not supported, no L1D
flushing is required. The setup function can just avoid setting up the L1D
flush for the EPT=n case.

Invoke it after the hardware setup has be done and enable_ept has the
correct state and expose the EPT disabled state in the mitigation status as
well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.612160168@linutronix.de
CVE-2018-3620
CVE-2018-3646

[tyhicks: Adjust the vmx_exit() contents]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/kvm: Drop L1TF MSR list approach
Thomas Gleixner [Fri, 13 Jul 2018 14:23:17 +0000 (16:23 +0200)]
x86/kvm: Drop L1TF MSR list approach

The VMX module parameter to control the L1D flush should become
writeable.

The MSR list is set up at VM init per guest VCPU, but the run time
switching is based on a static key which is global. Toggling the MSR list
at run time might be feasible, but for now drop this optimization and use
the regular MSR write to make run-time switching possible.

The default mitigation is the conditional flush anyway, so for extra
paranoid setups this will add some small overhead, but the extra code
executed is in the noise compared to the flush itself.

Aside of that the EPT disabled case is not handled correctly at the moment
and the MSR list magic is in the way for fixing that as well.

If it's really providing a significant advantage, then this needs to be
revisited after the code is correct and the control is writable.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.516940445@linutronix.de
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/litf: Introduce vmx status variable
Thomas Gleixner [Fri, 13 Jul 2018 14:23:16 +0000 (16:23 +0200)]
x86/litf: Introduce vmx status variable

Store the effective mitigation of VMX in a status variable and use it to
report the VMX state in the l1tf sysfs file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.433098358@linutronix.de
CVE-2018-3620
CVE-2018-3646

[tyhicks: Adjust context in vmx_exit()]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agocpu/hotplug: Online siblings when SMT control is turned on
Thomas Gleixner [Sat, 7 Jul 2018 09:40:18 +0000 (11:40 +0200)]
cpu/hotplug: Online siblings when SMT control is turned on

Writing 'off' to /sys/devices/system/cpu/smt/control offlines all SMT
siblings. Writing 'on' merily enables the abilify to online them, but does
not online them automatically.

Make 'on' more useful by onlining all offline siblings.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/KVM/VMX: Use MSR save list for IA32_FLUSH_CMD if required
Konrad Rzeszutek Wilk [Thu, 28 Jun 2018 21:10:36 +0000 (17:10 -0400)]
x86/KVM/VMX: Use MSR save list for IA32_FLUSH_CMD if required

If the L1D flush module parameter is set to 'always' and the IA32_FLUSH_CMD
MSR is available, optimize the VMENTER code with the MSR save list.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/KVM/VMX: Extend add_atomic_switch_msr() to allow VMENTER only MSRs
Konrad Rzeszutek Wilk [Thu, 21 Jun 2018 02:01:22 +0000 (22:01 -0400)]
x86/KVM/VMX: Extend add_atomic_switch_msr() to allow VMENTER only MSRs

The IA32_FLUSH_CMD MSR needs only to be written on VMENTER. Extend
add_atomic_switch_msr() with an entry_only parameter to allow storing the
MSR only in the guest (ENTRY) MSR array.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/KVM/VMX: Separate the VMX AUTOLOAD guest/host number accounting
Konrad Rzeszutek Wilk [Thu, 21 Jun 2018 02:00:47 +0000 (22:00 -0400)]
x86/KVM/VMX: Separate the VMX AUTOLOAD guest/host number accounting

This allows to load a different number of MSRs depending on the context:
VMEXIT or VMENTER.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/KVM/VMX: Add find_msr() helper function
Konrad Rzeszutek Wilk [Thu, 21 Jun 2018 00:11:39 +0000 (20:11 -0400)]
x86/KVM/VMX: Add find_msr() helper function

.. to help find the MSR on either the guest or host MSR list.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/KVM/VMX: Split the VMX MSR LOAD structures to have an host/guest numbers
Konrad Rzeszutek Wilk [Wed, 20 Jun 2018 17:58:37 +0000 (13:58 -0400)]
x86/KVM/VMX: Split the VMX MSR LOAD structures to have an host/guest numbers

There is no semantic change but this change allows an unbalanced amount of
MSRs to be loaded on VMEXIT and VMENTER, i.e. the number of MSRs to save or
restore on VMEXIT or VMENTER may be different.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

[tyhicks: Backport around prepare_vmcs02_full() not yet existing]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/KVM/VMX: Add L1D flush logic
Paolo Bonzini [Mon, 2 Jul 2018 11:07:14 +0000 (13:07 +0200)]
x86/KVM/VMX: Add L1D flush logic

Add the logic for flushing L1D on VMENTER. The flush depends on the static
key being enabled and the new l1tf_flush_l1d flag being set.

The flags is set:
 - Always, if the flush module parameter is 'always'

 - Conditionally at:
   - Entry to vcpu_run(), i.e. after executing user space

   - From the sched_in notifier, i.e. when switching to a vCPU thread.

   - From vmexit handlers which are considered unsafe, i.e. where
     sensitive data can be brought into L1D:

     - The emulator, which could be a good target for other speculative
       execution-based threats,

     - The MMU, which can bring host page tables in the L1 cache.

     - External interrupts

     - Nested operations that require the MMU (see above). That is
       vmptrld, vmptrst, vmclear,vmwrite,vmread.

     - When handling invept,invvpid

[ tglx: Split out from combo patch and reduced to a single flag ]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

[tyhicks: Backported around context changes]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/KVM/VMX: Add L1D MSR based flush
Paolo Bonzini [Mon, 2 Jul 2018 11:03:48 +0000 (13:03 +0200)]
x86/KVM/VMX: Add L1D MSR based flush

336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines a new MSR
(IA32_FLUSH_CMD aka 0x10B) which has similar write-only semantics to other
MSRs defined in the document.

The semantics of this MSR is to allow "finer granularity invalidation of
caching structures than existing mechanisms like WBINVD. It will writeback
and invalidate the L1 data cache, including all cachelines brought in by
preceding instructions, without invalidating all caches (eg. L2 or
LLC). Some processors may also invalidate the first level level instruction
cache on a L1D_FLUSH command. The L1 data and instruction caches may be
shared across the logical processors of a core."

Use it instead of the loop based L1 flush algorithm.

A copy of this document is available at
   https://bugzilla.kernel.org/show_bug.cgi?id=199511

[ tglx: Avoid allocating pages when the MSR is available ]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/KVM/VMX: Add L1D flush algorithm
Paolo Bonzini [Mon, 2 Jul 2018 10:47:38 +0000 (12:47 +0200)]
x86/KVM/VMX: Add L1D flush algorithm

To mitigate the L1 Terminal Fault vulnerability it's required to flush L1D
on VMENTER to prevent rogue guests from snooping host memory.

CPUs will have a new control MSR via a microcode update to flush L1D with a
single MSR write, but in the absence of microcode a fallback to a software
based flush algorithm is required.

Add a software flush loop which is based on code from Intel.

[ tglx: Split out from combo patch ]
[ bpetkov: Polish the asm code ]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

[tyhicks: Backported around context changes]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/KVM/VMX: Add module argument for L1TF mitigation
Konrad Rzeszutek Wilk [Mon, 2 Jul 2018 10:29:30 +0000 (12:29 +0200)]
x86/KVM/VMX: Add module argument for L1TF mitigation

Add a mitigation mode parameter "vmentry_l1d_flush" for CVE-2018-3620, aka
L1 terminal fault. The valid arguments are:

 - "always"  L1D cache flush on every VMENTER.
 - "cond" Conditional L1D cache flush, explained below
 - "never" Disable the L1D cache flush mitigation

"cond" is trying to avoid L1D cache flushes on VMENTER if the code executed
between VMEXIT and VMENTER is considered safe, i.e. is not bringing any
interesting information into L1D which might exploited.

[ tglx: Split out from a larger patch ]

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

[tyhicks: Backported around context changes]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/KVM: Warn user if KVM is loaded SMT and L1TF CPU bug being present
Konrad Rzeszutek Wilk [Wed, 20 Jun 2018 15:29:53 +0000 (11:29 -0400)]
x86/KVM: Warn user if KVM is loaded SMT and L1TF CPU bug being present

If the L1TF CPU bug is present we allow the KVM module to be loaded as the
major of users that use Linux and KVM have trusted guests and do not want a
broken setup.

Cloud vendors are the ones that are uncomfortable with CVE 2018-3620 and as
such they are the ones that should set nosmt to one.

Setting 'nosmt' means that the system administrator also needs to disable
SMT (Hyper-threading) in the BIOS, or via the 'nosmt' command line
parameter, or via the /sys/devices/system/cpu/smt/control. See commit
05736e4ac13c ("cpu/hotplug: Provide knobs to control SMT").

Other mitigations are to use task affinity, cpu sets, interrupt binding,
etc - anything to make sure that _only_ the same guests vCPUs are running
on sibling threads.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

[tyhicks: Backport around missing commit b31c114 to add vmx_vm_init()]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agocpu/hotplug: Boot HT siblings at least once
Thomas Gleixner [Fri, 29 Jun 2018 14:05:48 +0000 (16:05 +0200)]
cpu/hotplug: Boot HT siblings at least once

Due to the way Machine Check Exceptions work on X86 hyperthreads it's
required to boot up _all_ logical cores at least once in order to set the
CR4.MCE bit.

So instead of ignoring the sibling threads right away, let them boot up
once so they can configure themselves. After they came out of the initial
boot stage check whether its a "secondary" sibling and cancel the operation
which puts the CPU back into offline state.

Reported-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tony Luck <tony.luck@intel.com>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agoRevert "x86/apic: Ignore secondary threads if nosmt=force"
Thomas Gleixner [Fri, 29 Jun 2018 14:05:47 +0000 (16:05 +0200)]
Revert "x86/apic: Ignore secondary threads if nosmt=force"

Dave Hansen reported, that it's outright dangerous to keep SMT siblings
disabled completely so they are stuck in the BIOS and wait for SIPI.

The reason is that Machine Check Exceptions are broadcasted to siblings and
the soft disabled sibling has CR4.MCE = 0. If a MCE is delivered to a
logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or
reboots the machine. The MCE chapter in the SDM contains the following
blurb:

    Because the logical processors within a physical package are tightly
    coupled with respect to shared hardware resources, both logical
    processors are notified of machine check errors that occur within a
    given physical processor. If machine-check exceptions are enabled when
    a fatal error is reported, all the logical processors within a physical
    package are dispatched to the machine-check exception handler. If
    machine-check exceptions are disabled, the logical processors enter the
    shutdown state and assert the IERR# signal. When enabling machine-check
    exceptions, the MCE flag in control register CR4 should be set for each
    logical processor.

Reverting the commit which ignores siblings at enumeration time solves only
half of the problem. The core cpuhotplug logic needs to be adjusted as
well.

This thoughtful engineered mechanism also turns the boot process on all
Intel HT enabled systems into a MCE lottery. MCE is enabled on the boot CPU
before the secondary CPUs are brought up. Depending on the number of
physical cores the window in which this situation can happen is smaller or
larger. On a HSW-EX it's about 750ms:

MCE is enabled on the boot CPU:

[    0.244017] mce: CPU supports 22 MCE banks

The corresponding sibling #72 boots:

[    1.008005] .... node  #0, CPUs:    #72

That means if an MCE hits on physical core 0 (logical CPUs 0 and 72)
between these two points the machine is going to shutdown. At least it's a
known safe state.

It's obvious that the early boot can be hit by an MCE as well and then runs
into the same situation because MCEs are not yet enabled on the boot CPU.
But after enabling them on the boot CPU, it does not make any sense to
prevent the kernel from recovering.

Adjust the nosmt kernel parameter documentation as well.

Reverts: 2207def700f9 ("x86/apic: Ignore secondary threads if nosmt=force")
Reported-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tony Luck <tony.luck@intel.com>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/speculation/l1tf: Fix up pte->pfn conversion for PAE
Michal Hocko [Wed, 27 Jun 2018 15:46:50 +0000 (17:46 +0200)]
x86/speculation/l1tf: Fix up pte->pfn conversion for PAE

Jan has noticed that pte_pfn and co. resp. pfn_pte are incorrect for
CONFIG_PAE because phys_addr_t is wider than unsigned long and so the
pte_val reps. shift left would get truncated. Fix this up by using proper
types.

Fixes: 6b28baca9b1f ("x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation")
Reported-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/speculation/l1tf: Protect PAE swap entries against L1TF
Vlastimil Babka [Fri, 22 Jun 2018 15:39:33 +0000 (17:39 +0200)]
x86/speculation/l1tf: Protect PAE swap entries against L1TF

The PAE 3-level paging code currently doesn't mitigate L1TF by flipping the
offset bits, and uses the high PTE word, thus bits 32-36 for type, 37-63 for
offset. The lower word is zeroed, thus systems with less than 4GB memory are
safe. With 4GB to 128GB the swap type selects the memory locations vulnerable
to L1TF; with even more memory, also the swap offfset influences the address.
This might be a problem with 32bit PAE guests running on large 64bit hosts.

By continuing to keep the whole swap entry in either high or low 32bit word of
PTE we would limit the swap size too much. Thus this patch uses the whole PAE
PTE with the same layout as the 64bit version does. The macros just become a
bit tricky since they assume the arch-dependent swp_entry_t to be 32bit.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michal Hocko <mhocko@suse.com>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/CPU/AMD: Move TOPOEXT reenablement before reading smp_num_siblings
Borislav Petkov [Fri, 22 Jun 2018 09:34:11 +0000 (11:34 +0200)]
x86/CPU/AMD: Move TOPOEXT reenablement before reading smp_num_siblings

The TOPOEXT reenablement is a workaround for broken BIOSen which didn't
enable the CPUID bit. amd_get_topology_early(), however, relies on
that bit being set so that it can read out the CPUID leaf and set
smp_num_siblings properly.

Move the reenablement up to early_init_amd(). While at it, simplify
amd_get_topology_early().

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

[tyhicks: Backport around missing commit 18c71ce]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/cpufeatures: Add detection of L1D cache flush support.
Konrad Rzeszutek Wilk [Wed, 20 Jun 2018 20:42:58 +0000 (16:42 -0400)]
x86/cpufeatures: Add detection of L1D cache flush support.

336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines a new MSR
(IA32_FLUSH_CMD) which is detected by CPUID.7.EDX[28]=1 bit being set.

This new MSR "gives software a way to invalidate structures with finer
granularity than other architectual methods like WBINVD."

A copy of this document is available at
  https://bugzilla.kernel.org/show_bug.cgi?id=199511

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/speculation/l1tf: Extend 64bit swap file size limit
Vlastimil Babka [Thu, 21 Jun 2018 10:36:29 +0000 (12:36 +0200)]
x86/speculation/l1tf: Extend 64bit swap file size limit

The previous patch has limited swap file size so that large offsets cannot
clear bits above MAX_PA/2 in the pte and interfere with L1TF mitigation.

It assumed that offsets are encoded starting with bit 12, same as pfn. But
on x86_64, offsets are encoded starting with bit 9.

Thus the limit can be raised by 3 bits. That means 16TB with 42bit MAX_PA
and 256TB with 46bit MAX_PA.

Fixes: 377eeaa8e11f ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/apic: Ignore secondary threads if nosmt=force
Thomas Gleixner [Tue, 5 Jun 2018 12:00:11 +0000 (14:00 +0200)]
x86/apic: Ignore secondary threads if nosmt=force

nosmt on the kernel command line merely prevents the onlining of the
secondary SMT siblings.

nosmt=force makes the APIC detection code ignore the secondary SMT siblings
completely, so they even do not show up as possible CPUs. That reduces the
amount of memory allocations for per cpu variables and saves other
resources from being allocated too large.

This is not fully equivalent to disabling SMT in the BIOS because the low
level SMT enabling in the BIOS can result in partitioning of resources
between the siblings, which is not undone by just ignoring them. Some CPUs
can use the full resources when their sibling is not onlined, but this is
depending on the CPU family and model and it's not well documented whether
this applies to all partitioned resources. That means depending on the
workload disabling SMT in the BIOS might result in better performance.

Linus analysis of the Intel manual:

  The intel optimization manual is not very clear on what the partitioning
  rules are.

  I find:

    "In general, the buffers for staging instructions between major pipe
     stages  are partitioned. These buffers include µop queues after the
     execution trace cache, the queues after the register rename stage, the
     reorder buffer which stages instructions for retirement, and the load
     and store buffers.

     In the case of load and store buffers, partitioning also provided an
     easier implementation to maintain memory ordering for each logical
     processor and detect memory ordering violations"

  but some of that partitioning may be relaxed if the HT thread is "not
  active":

    "In Intel microarchitecture code name Sandy Bridge, the micro-op queue
     is statically partitioned to provide 28 entries for each logical
     processor,  irrespective of software executing in single thread or
     multiple threads. If one logical processor is not active in Intel
     microarchitecture code name Ivy Bridge, then a single thread executing
     on that processor  core can use the 56 entries in the micro-op queue"

  but I do not know what "not active" means, and how dynamic it is. Some of
  that partitioning may be entirely static and depend on the early BIOS
  disabling of HT, and even if we park the cores, the resources will just be
  wasted.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/cpu/AMD: Evaluate smp_num_siblings early
Thomas Gleixner [Tue, 5 Jun 2018 22:57:38 +0000 (00:57 +0200)]
x86/cpu/AMD: Evaluate smp_num_siblings early

To support force disabling of SMT it's required to know the number of
thread siblings early. amd_get_topology() cannot be called before the APIC
driver is selected, so split out the part which initializes
smp_num_siblings and invoke it from amd_early_init().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
CVE-2018-3620
CVE-2018-3646

[tyhicks: Backport around missing commit 18c71ce]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/CPU/AMD: Do not check CPUID max ext level before parsing SMP info
Borislav Petkov [Fri, 15 Jun 2018 18:48:39 +0000 (20:48 +0200)]
x86/CPU/AMD: Do not check CPUID max ext level before parsing SMP info

Old code used to check whether CPUID ext max level is >= 0x80000008 because
that last leaf contains the number of cores of the physical CPU.  The three
functions called there now do not depend on that leaf anymore so the check
can go.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/cpu/intel: Evaluate smp_num_siblings early
Thomas Gleixner [Tue, 5 Jun 2018 23:00:55 +0000 (01:00 +0200)]
x86/cpu/intel: Evaluate smp_num_siblings early

Make use of the new early detection function to initialize smp_num_siblings
on the boot cpu before the MP-Table or ACPI/MADT scan happens. That's
required for force disabling SMT.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/cpu/topology: Provide detect_extended_topology_early()
Thomas Gleixner [Tue, 5 Jun 2018 22:55:39 +0000 (00:55 +0200)]
x86/cpu/topology: Provide detect_extended_topology_early()

To support force disabling of SMT it's required to know the number of
thread siblings early. detect_extended_topology() cannot be called before
the APIC driver is selected, so split out the part which initializes
smp_num_siblings.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/cpu/common: Provide detect_ht_early()
Thomas Gleixner [Tue, 5 Jun 2018 22:53:57 +0000 (00:53 +0200)]
x86/cpu/common: Provide detect_ht_early()

To support force disabling of SMT it's required to know the number of
thread siblings early. detect_ht() cannot be called before the APIC driver
is selected, so split out the part which initializes smp_num_siblings.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/cpu/AMD: Remove the pointless detect_ht() call
Thomas Gleixner [Tue, 5 Jun 2018 22:47:10 +0000 (00:47 +0200)]
x86/cpu/AMD: Remove the pointless detect_ht() call

Real 32bit AMD CPUs do not have SMT and the only value of the call was to
reach the magic printout which got removed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/cpu: Remove the pointless CPU printout
Thomas Gleixner [Tue, 5 Jun 2018 22:36:15 +0000 (00:36 +0200)]
x86/cpu: Remove the pointless CPU printout

The value of this printout is dubious at best and there is no point in
having it in two different places along with convoluted ways to reach it.

Remove it completely.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agocpu/hotplug: Provide knobs to control SMT
Thomas Gleixner [Tue, 29 May 2018 15:48:27 +0000 (17:48 +0200)]
cpu/hotplug: Provide knobs to control SMT

Provide a command line and a sysfs knob to control SMT.

The command line options are:

 'nosmt': Enumerate secondary threads, but do not online them

 'nosmt=force': Ignore secondary threads completely during enumeration
  via MP table and ACPI/MADT.

The sysfs control file has the following states (read/write):

 'on':  SMT is enabled. Secondary threads can be freely onlined
 'off':  SMT is disabled. Secondary threads, even if enumerated
   cannot be onlined
 'forceoff':  SMT is permanentely disabled. Writes to the control
   file are rejected.
 'notsupported': SMT is not supported by the CPU

The command line option 'nosmt' sets the sysfs control to 'off'. This
can be changed to 'on' to reenable SMT during runtime.

The command line option 'nosmt=force' sets the sysfs control to
'forceoff'. This cannot be changed during runtime.

When SMT is 'on' and the control file is changed to 'off' then all online
secondary threads are offlined and attempts to online a secondary thread
later on are rejected.

When SMT is 'off' and the control file is changed to 'on' then secondary
threads can be onlined again. The 'off' -> 'on' transition does not
automatically online the secondary threads.

When the control file is set to 'forceoff', the behaviour is the same as
setting it to 'off', but the operation is irreversible and later writes to
the control file are rejected.

When the control status is 'notsupported' then writes to the control file
are rejected.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
CVE-2018-3620
CVE-2018-3646

[tyhicks: Backport around missing "select NEED_SG_DMA_LENGTH" in Kconfig]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agocpu/hotplug: Split do_cpu_down()
Thomas Gleixner [Tue, 29 May 2018 15:49:05 +0000 (17:49 +0200)]
cpu/hotplug: Split do_cpu_down()

Split out the inner workings of do_cpu_down() to allow reuse of that
function for the upcoming SMT disabling mechanism.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agocpu/hotplug: Make bringup/teardown of smp threads symmetric
Thomas Gleixner [Tue, 29 May 2018 17:05:25 +0000 (19:05 +0200)]
cpu/hotplug: Make bringup/teardown of smp threads symmetric

The asymmetry caused a warning to trigger if the bootup was stopped in state
CPUHP_AP_ONLINE_IDLE. The warning no longer triggers as kthread_park() can
now be invoked on already or still parked threads. But there is still no
reason to have this be asymmetric.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
6 years agox86/topology: Provide topology_smt_supported()
Thomas Gleixner [Thu, 21 Jun 2018 08:37:20 +0000 (10:37 +0200)]
x86/topology: Provide topology_smt_supported()

Provide information whether SMT is supoorted by the CPUs. Preparatory patch
for SMT control mechanism.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
CVE-2018-3620
CVE-2018-3646

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>