get_user_pages_fast is supposed to be a faster drop-in equivalent of
get_user_pages. As such, callers expect it to return a negative return
code when passed an invalid address, and never expect it to return 0
when passed a positive number of pages, since its documentation says:
* Returns number of pages pinned. This may be fewer than the number
* requested. If nr_pages is 0 or negative, returns 0. If no pages
* were pinned, returns -errno.
When get_user_pages_fast fall back on get_user_pages this is exactly
what happens. Unfortunately the implementation is inconsistent: it
returns 0 if passed a kernel address, confusing callers: for example,
the following is pretty common but does not appear to do the right thing
with a kernel address:
ret = get_user_pages_fast(addr, 1, writeable, &page);
if (ret < 0)
return ret;
Change get_user_pages_fast to return -EFAULT when supplied a kernel
address to make it match expectations.
All callers have been audited for consistency with the documented
semantics.
Link: http://lkml.kernel.org/r/1522962072-182137-4-git-send-email-mst@redhat.com Fixes: 5b65c4677a57 ("mm, x86/mm: Fix performance regression in get_user_pages_fast()") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reported-by: syzbot+6304bf97ef436580fede@syzkaller.appspotmail.com Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
On an Output queue, both EMPTY and PENDING buffer states imply that the
buffer is ready for completion-processing by the upper-layer drivers.
So for a non-QEBSM Output queue, get_buf_states() merges mixed
batches of PENDING and EMPTY buffers into one large batch of EMPTY
buffers. The upper-layer driver (ie. qeth) later distuingishes PENDING
from EMPTY by inspecting the slsb_state for
QDIO_OUTBUF_STATE_FLAG_PENDING.
But the merge logic in get_buf_states() contains a bug that causes us to
erronously also merge ERROR buffers into such a batch of EMPTY buffers
(ERROR is 0xaf, EMPTY is 0xa1; so ERROR & EMPTY == EMPTY).
Effectively, most outbound ERROR buffers are currently discarded
silently and processed as if they had succeeded.
Note that this affects _all_ non-QEBSM device types, not just IQD with CQ.
Fix it by explicitly spelling out the exact conditions for merging.
For extracting the "get initial state" part out of the loop, this relies
on the fact that get_buf_states() is never called with a count of 0. The
QEBSM path already strictly requires this, and the two callers with
variable 'count' make sure of it.
Fixes: 104ea556ee7f ("qdio: support asynchronous delivery of storage blocks") Cc: <stable@vger.kernel.org> #v3.2+ Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Immediate retry of EQBS after CCQ 96 means that we potentially misreport
the state of buffers inspected during the first EQBS call.
This occurs when
1. the first EQBS finds all inspected buffers still in the initial state
set by the driver (ie INPUT EMPTY or OUTPUT PRIMED),
2. the EQBS terminates early with CCQ 96, and
3. by the time that the second EQBS comes around, the state of those
previously inspected buffers has changed.
If the state reported by the second EQBS is 'driver-owned', all we know
is that the previous buffers are driver-owned now as well. But we can't
tell if they all have the same state. So for instance
- the second EQBS reports OUTPUT EMPTY, but any number of the previous
buffers could be OUTPUT ERROR by now,
- the second EQBS reports OUTPUT ERROR, but any number of the previous
buffers could be OUTPUT EMPTY by now.
Effectively, this can result in both over- and underreporting of errors.
If the state reported by the second EQBS is 'HW-owned', that doesn't
guarantee that the previous buffers have not been switched to
driver-owned in the mean time. So for instance
- the second EQBS reports INPUT EMPTY, but any number of the previous
buffers could be INPUT PRIMED (or INPUT ERROR) by now.
This would result in failure to process pending work on the queue. If
it's the final check before yielding initiative, this can cause
a (temporary) queue stall due to IRQ avoidance.
Fixes: 25f269f17316 ("[S390] qdio: EQBS retry after CCQ 96") Cc: <stable@vger.kernel.org> #v3.2+ Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Commit 1cf03c00e7c1 "nfit: scrub and register regions in a workqueue"
mistakenly attempts to register a region per BLK aperture. There is
nothing to register for individual apertures as they belong as a set to
a BLK aperture group that are registered with a corresponding
DIMM-control-region. Filter them for registration to prevent some
needless devm_kzalloc() allocations.
Cc: <stable@vger.kernel.org> Fixes: 1cf03c00e7c1 ("nfit: scrub and register regions in a workqueue") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
[ 92.931922] 1 lock held by systemd-udevd/525:
[ 92.933642] #0: 00000000a2849e25 (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x73/0x4f0
----------------------------------------
The reason of deadlock turned out that wait_event_interruptible() in
blk_queue_enter() got stuck with bdev->bd_mutex held at __blkdev_put()
due to q->mq_freeze_depth == 1.
[ 92.943530] 1 lock held by a.out/634:
[ 92.945105] #0: 00000000a2849e25 (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0x3c/0x1e0
----------------------------------------
The reason of q->mq_freeze_depth == 1 turned out that loop_set_status()
forgot to call blk_mq_unfreeze_queue() at error paths for
info->lo_encrypt_type != NULL case.
The code that fixes the crashes in the following commit introduced a small
memory leak:
commit 6a2cf8d3663e ("scsi: qla2xxx: Fix crashes in qla2x00_probe_one on probe failure")
Fixing this requires a bit of reworking, which I've explained. Also provide
some code cleanup.
There is a small window in qla2x00_probe_one where if qla2x00_alloc_queues
fails, we end up never freeing req and rsp and leak 0xc0 and 0xc8 bytes
respectively (the sizes of req and rsp).
I originally put in checks to test for this condition which were based on
the incorrect assumption that if ha->rsp_q_map and ha->req_q_map were
allocated, then rsp and req were allocated as well. This is incorrect.
There is a window between these allocations:
ret = qla2x00_mem_alloc(ha, req_length, rsp_length, &req, &rsp);
goto probe_hw_failed;
ret = qla2x00_request_irqs(ha, rsp);
goto probe_failed;
if (qla2x00_alloc_queues(ha, req, rsp)) {
goto probe_failed;
[if successful, now ha->rsp_q_map and ha->req_q_map allocated]
To simplify this, we should just set req and rsp to NULL after we free
them. Sounds simple enough? The problem is that req and rsp are pointers
defined in the qla2x00_probe_one and they are not always passed by reference
to the routines that free them.
Here are paths which can free req and rsp:
PATH 1:
qla2x00_probe_one
ret = qla2x00_mem_alloc(ha, req_length, rsp_length, &req, &rsp);
[req and rsp are passed by reference, but if this fails, we currently
do not NULL out req and rsp. Easily fixed]
PATH 2:
qla2x00_probe_one
failing in qla2x00_request_irqs or qla2x00_alloc_queues
probe_failed:
qla2x00_free_device(base_vha);
qla2x00_free_req_que(ha, req)
qla2x00_free_rsp_que(ha, rsp)
PATH 3:
qla2x00_probe_one:
failing in qla2x00_mem_alloc or qla2x00_create_host
probe_hw_failed:
qla2x00_free_req_que(ha, req)
qla2x00_free_rsp_que(ha, rsp)
PATH 1: This should currently work, but it doesn't because rsp and rsp are
not set to NULL in qla2x00_mem_alloc. Easily remedied.
PATH 2: req and rsp aren't passed in at all to qla2x00_free_device but are
derived from ha->req_q_map[0] and ha->rsp_q_map[0]. These are only set up if
qla2x00_alloc_queues succeeds.
In qla2x00_free_queues, we are protected from crashing if these don't exist
because req_qid_map and rsp_qid_map are only set on their allocation. We are
guarded in this way:
for (cnt = 0; cnt < ha->max_req_queues; cnt++) {
if (!test_bit(cnt, ha->req_qid_map))
continue;
PATH 3: This works. We haven't freed req or rsp yet (or they were never
allocated if qla2x00_mem_alloc failed), so we'll attempt to free them here.
To summarize, there are a few small changes to make this work correctly and
(and for some cleanup):
1) (For PATH 1) Set *rsp and *req to NULL in case of failure in
qla2x00_mem_alloc so these are correctly set to NULL back in
qla2x00_probe_one
2) After jumping to probe_failed: and calling qla2x00_free_device,
explicitly set rsp and req to NULL so further calls with these pointers do
not crash, i.e. the free queue calls in the probe_hw_failed section we fall
through to.
3) Fix return code check in the call to qla2x00_alloc_queues. We currently
drop the return code on the floor. The probe fails but the caller of the
probe doesn't have an error code, so it attaches to pci. This can result in
a crash on module shutdown.
4) Remove unnecessary NULL checks in qla2x00_free_req_que,
qla2x00_free_rsp_que, and the egregious NULL checks before kfrees and vfrees
in qla2x00_mem_free.
I tested this out running a scenario where the card breaks at various times
during initialization. I made sure I forced every error exit path in
qla2x00_probe_one.
Cc: <stable@vger.kernel.org> # v4.16 Fixes: 6a2cf8d3663e ("scsi: qla2xxx: Fix crashes in qla2x00_probe_one on probe failure") Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com> Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
We're neglecting to clear the umask after it's set, which can cause a
later unrelated rpc to (incorrectly) use the same umask if it happens to
be processed by the same thread.
There's a more subtle problem here too:
An NFSv4 compound request is decoded all in one pass before any
operations are executed.
Currently we're setting current->fs->umask at the time we decode the
compound. In theory a single compound could contain multiple creates
each setting a umask. In that case we'd end up using whichever umask
was passed in the *last* operation as the umask for all the creates,
whether that was correct or not.
So, we should just be saving the umask at decode time and waiting to set
it until we actually process the corresponding operation.
In practice it's unlikely any client would do multiple creates in a
single compound. And even if it did they'd likely be from the same
process (hence carry the same umask). So this is a little academic, but
we should get it right anyway.
Fixes: 47057abde515 (nfsd: add support for the umask attribute) Cc: stable@vger.kernel.org Reported-by: Lucash Stach <l.stach@pengutronix.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
This is a fix for a regression in 32 bit kernels caused by an invalid
check for pgoff overflow in hugetlbfs mmap setup. The check incorrectly
specified that the size of a loff_t was the same as the size of a long.
The regression prevents mapping hugetlbfs files at offsets greater than
4GB on 32 bit kernels.
On 32 bit kernels conversion from a page based unsigned long can not
overflow a loff_t byte offset. Therefore, skip this check if
sizeof(unsigned long) != sizeof(loff_t).
Link: http://lkml.kernel.org/r/20180330145402.5053-1-mike.kravetz@oracle.com Fixes: 63489f8e8211 ("hugetlbfs: check for pgoff value overflow") Reported-by: Dan Rue <dan.rue@linaro.org> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Tested-by: Anders Roxell <anders.roxell@linaro.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Yisheng Xie <xieyisheng1@huawei.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Nic Losby <blurbdust@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Commit fd8aa9095a95 ("xen: optimize xenbus driver for multiple
concurrent xenstore accesses") made a subtle change to the semantic of
xenbus_dev_request_and_reply() and xenbus_transaction_end().
Before on an error response to XS_TRANSACTION_END
xenbus_dev_request_and_reply() would not decrement the active
transaction counter. But xenbus_transaction_end() has always counted the
transaction as finished regardless of the response.
The new behavior is that xenbus_dev_request_and_reply() and
xenbus_transaction_end() will always count the transaction as finished
regardless the response code (handled in xs_request_exit()).
But xenbus_dev_frontend tries to end a transaction on closing of the
device if the XS_TRANSACTION_END failed before. Trying to close the
transaction twice corrupts the reference count. So fix this by also
considering a transaction closed if we have sent XS_TRANSACTION_END once
regardless of the return code.
Cc: <stable@vger.kernel.org> # 4.11 Fixes: fd8aa9095a95 ("xen: optimize xenbus driver for multiple concurrent xenstore accesses") Signed-off-by: Simon Gaiser <simon@invisiblethingslab.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
As of now if we encounter an opaque dir while looking for a dentry, we set
d->last=true. This means that there is no need to look further in any of
the lower layers. This works fine as long as there are no redirets or
relative redircts. But what if there is an absolute redirect on the
children dentry of opaque directory. We still need to continue to look into
next lower layer. This patch fixes it.
Here is an example to demonstrate the issue. Say you have following setup.
upper: /redirect (redirect=/a/b/c)
lower1: /a/[b]/c ([b] is opaque) (c has absolute redirect=/a/b/d/)
lower0: /a/b/d/foo
Now "redirect" dir should merge with lower1:/a/b/c/ and lower0:/a/b/d.
Note, despite the fact lower1:/a/[b] is opaque, we need to continue to look
into lower0 because children c has an absolute redirect.
From commit 4b855ad37194 ("blk-mq: Create hctx for each present CPU),
blk-mq doesn't remap queue after CPU topo is changed, that said when
some of these offline CPUs become online, they are still mapped to
hctx 0, then hctx 0 may become the bottleneck of IO dispatch and
completion.
This patch sets up the mapping from the beginning, and aligns to
queue mapping for PCI device (blk_mq_pci_map_queues()).
Cc: Stefan Haberland <sth@linux.vnet.ibm.com> Cc: Keith Busch <keith.busch@intel.com> Cc: stable@vger.kernel.org Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU) Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
This patch orders getting budget and driver tag by making sure to acquire
driver tag after budget is got, this way can help to avoid the following
race:
1) before dispatch request from scheduler queue, get one budget first, then
dequeue a request, call it request A.
2) in another IO path for dispatching request B which is from hctx->dispatch,
driver tag is got, then try to get budget in blk_mq_dispatch_rq_list(),
unfortunately the budget is held by request A.
3) meantime blk_mq_dispatch_rq_list() is called for dispatching request
A, and try to get driver tag first, unfortunately no driver tag is
available because the driver tag is held by request B
4) both two IO pathes can't move on, and IO stall is caused.
This issue can be observed when running dbench on USB storage.
This patch fixes this issue by always getting budget before getting
driver tag.
Cc: stable@vger.kernel.org Fixes: de1482974080ec9e ("blk-mq: introduce .get_budget and .put_budget in blk_mq_ops") Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Omar Sandoval <osandov@fb.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
If a task is holding a reference to a namespace on a removed controller,
the head will not be released. If the same controller is added again
later, its namespaces may not be successfully added. Instead, the user
will see kernel message "Duplicate IDs for nsid <X>".
This patch fixes that by skipping heads that don't have namespaces when
considering if a new namespace is safe to add.
Reported-by: Alex Gagniuc <Alex_Gagniuc@Dellteam.com> Cc: stable@vger.kernel.org Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
scsi_device_quiesce() uses synchronize_rcu() to guarantee that the
effect of blk_set_preempt_only() will be visible for percpu_ref_tryget()
calls that occur after the queue unfreeze by using the approach
explained in https://lwn.net/Articles/573497/. The rcu read lock and
unlock calls in blk_queue_enter() form a pair with the synchronize_rcu()
call in scsi_device_quiesce(). Both scsi_device_quiesce() and
blk_queue_enter() must either use regular RCU or RCU-sched.
Since neither the RCU-protected code in blk_queue_enter() nor
blk_queue_usage_counter_release() sleeps, regular RCU protection
is sufficient. Note: scsi_device_quiesce() does not have to be
modified since it already uses synchronize_rcu().
Reported-by: Tejun Heo <tj@kernel.org> Fixes: 3a0a529971ec ("block, scsi: Make SCSI quiesce and resume work reliably") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: Martin Steigerwald <martin@lichtvoll.de> Cc: stable@vger.kernel.org # v4.15 Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Commit 7a20b8a61eff81bdb7097a578752a74860e9d142 ("f2fs: allocate node
and hot data in the beginning of partition") introduces another mount
option, heap, to reset it back. But it does not do anything for heap
mode, so fix it.
Cc: stable@vger.kernel.org Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The APIC ID as parsed from ACPI MADT is validity checked with the
apic->apic_id_valid() callback, which depends on the selected APIC type.
For non X2APIC types APIC IDs >= 0xFF are invalid, but values > 0x7FFFFFFF
are detected as valid. This happens because the 'apicid' argument of the
apic_id_valid() callback is type 'int'. So the resulting comparison
apicid < 0xFF
evaluates to true for all unsigned int values > 0x7FFFFFFF which are handed
to default_apic_id_valid(). As a consequence, invalid APIC IDs in !X2APIC
mode are considered valid and accounted as possible CPUs.
Change the apicid argument type of the apic_id_valid() callback to u32 so
the evaluation is unsigned and returns the correct result.
[ tglx: Massaged changelog ]
Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: jgross@suse.com Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/1523322966-10296-1-git-send-email-lirongqing@baidu.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
When ath9k was switched over to use the mac80211 intermediate queues,
node cleanup now drains the mac80211 queues. However, this call path is
not protected by rcu_read_lock() as it was previously entirely internal
to the driver which uses its own locking.
This leads to a possible rcu_dereference() without holding
rcu_read_lock(); but only if a station is cleaned up while having
packets queued on the TXQ. Fix this by adding the rcu_read_lock() to the
caller in ath9k.
Fixes: 50f08edf9809 ("ath9k: Switch to using mac80211 intermediate software queues.") Cc: stable@vger.kernel.org Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Initialize data->config_lock mutex before it is used by the driver code.
This fixes following warning on Odroid XU3 boards:
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc7-next-20180115-00001-gb75575dee3f2 #107
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[<c0111504>] (unwind_backtrace) from [<c010dbec>] (show_stack+0x10/0x14)
[<c010dbec>] (show_stack) from [<c09b3f74>] (dump_stack+0x90/0xc8)
[<c09b3f74>] (dump_stack) from [<c0179528>] (register_lock_class+0x1c0/0x59c)
[<c0179528>] (register_lock_class) from [<c017bd1c>] (__lock_acquire+0x78/0x1850)
[<c017bd1c>] (__lock_acquire) from [<c017de30>] (lock_acquire+0xc8/0x2b8)
[<c017de30>] (lock_acquire) from [<c09ca59c>] (__mutex_lock+0x60/0xa0c)
[<c09ca59c>] (__mutex_lock) from [<c09cafd0>] (mutex_lock_nested+0x1c/0x24)
[<c09cafd0>] (mutex_lock_nested) from [<c068b0d0>] (ina2xx_set_shunt+0x70/0xb0)
[<c068b0d0>] (ina2xx_set_shunt) from [<c068b218>] (ina2xx_probe+0x88/0x1b0)
[<c068b218>] (ina2xx_probe) from [<c0673d90>] (i2c_device_probe+0x1e0/0x2d0)
[<c0673d90>] (i2c_device_probe) from [<c053a268>] (driver_probe_device+0x2b8/0x4a0)
[<c053a268>] (driver_probe_device) from [<c053a54c>] (__driver_attach+0xfc/0x120)
[<c053a54c>] (__driver_attach) from [<c05384cc>] (bus_for_each_dev+0x58/0x7c)
[<c05384cc>] (bus_for_each_dev) from [<c0539590>] (bus_add_driver+0x174/0x250)
[<c0539590>] (bus_add_driver) from [<c053b5e0>] (driver_register+0x78/0xf4)
[<c053b5e0>] (driver_register) from [<c0675ef0>] (i2c_register_driver+0x38/0xa8)
[<c0675ef0>] (i2c_register_driver) from [<c0102b40>] (do_one_initcall+0x48/0x18c)
[<c0102b40>] (do_one_initcall) from [<c0e00df0>] (kernel_init_freeable+0x110/0x1d4)
[<c0e00df0>] (kernel_init_freeable) from [<c09c8120>] (kernel_init+0x8/0x114)
[<c09c8120>] (kernel_init) from [<c01010b4>] (ret_from_fork+0x14/0x20)
Fixes: 5d389b125186 ("hwmon: (ina2xx) Make calibration register value fixed") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The block address is saved after the block is initialized when
threshold_init_device() is called.
Use the saved block address, if available, rather than trying to
rediscover it.
This will avoid a call trace, when resuming from suspend, due to the
rdmsr_safe_on_cpu() call in get_block_address(). The rdmsr_safe_on_cpu()
call issues an IPI but we're running with interrupts disabled. This
triggers:
WARNING: CPU: 0 PID: 11523 at kernel/smp.c:291 smp_call_function_single+0xdc/0xe0
A use-after-free bug was caught by KASAN while running usdt related
code (BCC project. bcc/tests/python/test_usdt2.py):
==================================================================
BUG: KASAN: use-after-free in uprobe_perf_close+0x222/0x3b0
Read of size 4 at addr ffff880384f9b4a4 by task test_usdt2.py/870
The buggy address belongs to the object at ffff880384f9b480
which belongs to the cache task_struct of size 12928
It occurs because task_struct is freed before perf_event which refers
to the task and task flags are checked while teardown of the event.
perf_event_alloc() assigns task_struct to hw.target of perf_event,
but there is no reference counting for it.
As a fix we get_task_struct() in perf_event_alloc() at above mentioned
assignment and put_task_struct() in _free_event().
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 63b6da39bb38e8f1a1ef3180d32a39d6 ("perf: Fix perf_event_exit_task() race") Link: http://lkml.kernel.org/r/20180409100346.6416-1-bhole_prashant_q7@lab.ntt.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
I suspect the reason is the per-cpu data is not in the linear chunk.
This could be restored if that was able to be fixed, but for now,
just remove the tracepoints.
Fixes: 0428491cba92 ("powerpc/mm: Trace tlbie(l) instructions") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
1. With the patch "x86/vector/msi: Switch to global reservation mode",
the recent v4.15 and newer kernels always hang for 1-vCPU Hyper-V VM
with SR-IOV. This is because when we reach hv_compose_msi_msg() by
request_irq() -> request_threaded_irq() ->__setup_irq()->irq_startup()
-> __irq_startup() -> irq_domain_activate_irq() -> ... ->
msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is
disabled in __setup_irq().
Note: when we reach hv_compose_msi_msg() by another code path:
pci_enable_msix_range() -> ... -> irq_domain_activate_irq() -> ... ->
hv_compose_msi_msg(), local irq is not disabled.
hv_compose_msi_msg() depends on an interrupt from the host.
With interrupts disabled, a UP VM always hangs in the busy loop in
the function, because the interrupt callback hv_pci_onchannelcallback()
can not be called.
We can do nothing but work it around by polling the channel. This
is ugly, but we don't have any other choice.
2. If the host is ejecting the VF device before we reach
hv_compose_msi_msg(), in a UP VM, we can hang in hv_compose_msi_msg()
forever, because at this time the host doesn't respond to the
CREATE_INTERRUPT request. This issue exists the first day the
pci-hyperv driver appears in the kernel.
Luckily, this can also by worked around by polling the channel
for the PCI_EJECT message and hpdev->state, and by checking the
PCI vendor ID.
Note: actually the above 2 issues also happen to a SMP VM, if
"hbus->hdev->channel->target_cpu == smp_processor_id()" is true.
Fixes: 4900be83602b ("x86/vector/msi: Switch to global reservation mode") Tested-by: Adrian Suhov <v-adsuho@microsoft.com> Tested-by: Chris Valean <v-chvale@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Haiyang Zhang <haiyangz@microsoft.com> Cc: <stable@vger.kernel.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Jack Morgenstein <jackm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
When we hot-remove the device, we first receive a PCI_EJECT message and
then receive a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.
The first message is offloaded to hv_eject_device_work(), and the second
is offloaded to pci_devices_present_work(). Both the paths can be running
list_del(&hpdev->list_entry), causing general protection fault, because
system_wq can run them concurrently.
The patch eliminates the race condition.
Since access to present/eject work items is serialized, we do not need the
hbus->enum_sem anymore, so remove it.
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs") Link: https://lkml.kernel.org/r/KL1P15301MB00064DA6B4D221123B5241CFBFD70@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM Tested-by: Adrian Suhov <v-adsuho@microsoft.com> Tested-by: Chris Valean <v-chvale@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com>
[lorenzo.pieralisi@arm.com: squashed semaphore removal patch] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Haiyang Zhang <haiyangz@microsoft.com> Cc: <stable@vger.kernel.org> # v4.6+ Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Jack Morgenstein <jackm@mellanox.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The pci-hyperv driver's channel callback hv_pci_onchannelcallback() is not
really a hot path, so we don't need to mark it as a perf_device, meaning
with this patch all HV_PCIE channels' target_cpu will be CPU0.
Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: stable@vger.kernel.org Cc: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Make sure that the HPMC (High Priority Machine Check) handler is 16-byte
aligned and that it's length in the IVT is a multiple of 16 bytes.
Otherwise PDC may decide not to call the HPMC crash handler.
Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
As found by the ubsan checker, the value of the 'index' variable can be
out of range for the bc[] array:
UBSAN: Undefined behaviour in arch/parisc/kernel/drivers.c:655:21
index 6 is out of range for type 'char [6]'
Backtrace:
[<104fa850>] __ubsan_handle_out_of_bounds+0x68/0x80
[<1019d83c>] check_parent+0xc0/0x170
[<1019d91c>] descend_children+0x30/0x6c
[<1059e164>] device_for_each_child+0x60/0x98
[<1019cd54>] parse_tree_node+0x40/0x54
[<1019d86c>] check_parent+0xf0/0x170
[<1019d91c>] descend_children+0x30/0x6c
[<1059e164>] device_for_each_child+0x60/0x98
[<1019d938>] descend_children+0x4c/0x6c
[<1059e164>] device_for_each_child+0x60/0x98
[<1019cd54>] parse_tree_node+0x40/0x54
[<1019cffc>] hwpath_to_device+0xa4/0xc4
Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
device_remove_group() was called on any cleanup, even if the
device attrs had not been added yet. That can occur in certain
error scenarios, so add a flag to know if it has been added.
Also make sure we remove the dev if we added it ourselves.
Signed-off-by: Corey Minyard <cminyard@mvista.com> Cc: stable@vger.kernel.org # 4.15 Cc: Laura Abbott <labbott@redhat.com> Tested-by: Bill Perkins <wmp@grnwood.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
To allow dual pipelines utilising two WPF entities when available, the
VSP was updated to support header-mode display list in continuous
pipelines.
A small bug in the status check of the command register causes the
second pipeline to be directly afflicted by the running of the first;
appearing as a perceived performance issue with stuttering display.
Fix the vsp1_dl_list_hw_update_pending() call to ensure that the read
comparison corresponds to the correct pipeline.
Fixes: eaf4bfad6ad8 ("v4l: vsp1: Add support for header display lists in continuous mode") Cc: "Stable v4.14+" <stable@vger.kernel.org> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
At put_v4l2_window32(), it tries to access kp->clips. However,
kp points to an userspace pointer. So, it should be obtained
via get_user(), otherwise it can OOPS:
lan78xx_read_otp tries to return -EINVAL in the event of invalid OTP
content, but the value gets overwritten before it is returned and the
read goes ahead anyway. Make the read conditional as it should be
and preserve the error code.
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Signed-off-by: Phil Elwell <phil@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
vhost_copy_to_user is used to copy vring used elements to userspace.
We should use VHOST_ADDR_USED instead of VHOST_ADDR_DESC.
Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache") Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Commit dd9d598c6657 ("ip_gre: add the support for i/o_flags update via
netlink") added the ability to change o_flags, but missed that the
GSO/LLTX features are disabled by default, and only enabled some gre
features are unused. Thus we also need to disable the GSO/LLTX features
on the device when the TUNNEL_SEQ or TUNNEL_CSUM flags are set.
These two examples should result in the same features being set:
ip link add gre_none type gre local 192.168.0.10 remote 192.168.0.20 ttl 255 key 0
ip link set gre_none type gre seq
ip link add gre_seq type gre local 192.168.0.10 remote 192.168.0.20 ttl 255 key 1 seq
Fixes: dd9d598c6657 ("ip_gre: add the support for i/o_flags update via netlink") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Xin Long <lucien.xin@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
We can't use l2tp_tunnel_find() to prevent l2tp_nl_cmd_tunnel_create()
from creating a duplicate tunnel. A tunnel can be concurrently
registered after l2tp_tunnel_find() returns. Therefore, searching for
duplicates must be done at registration time.
Finally, remove l2tp_tunnel_find() entirely as it isn't use anywhere
anymore.
Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
l2tp_tunnel_create() inserts the new tunnel into the namespace's tunnel
list and sets the socket's ->sk_user_data field, before returning it to
the caller. Therefore, there are two ways the tunnel can be accessed
and freed, before the caller even had the opportunity to take a
reference. In practice, syzbot could crash the module by closing the
socket right after a new tunnel was returned to pppol2tp_create().
This patch moves tunnel registration out of l2tp_tunnel_create(), so
that the caller can safely hold a reference before publishing the
tunnel. This second step is done with the new l2tp_tunnel_register()
function, which is now responsible for associating the tunnel to its
socket and for inserting it into the namespace's list.
While moving the code to l2tp_tunnel_register(), a few modifications
have been done. First, the socket validation tests are done in a helper
function, for clarity. Also, modifying the socket is now done after
having inserted the tunnel to the namespace's tunnels list. This will
allow insertion to fail, without having to revert theses modifications
in the error path (a followup patch will check for duplicate tunnels
before insertion). Either the socket is a kernel socket which we
control, or it is a user-space socket for which we have a reference on
the file descriptor. In any case, the socket isn't going to be closed
from under us.
Reported-by: syzbot+fbeeb5c3b538e8545644@syzkaller.appspotmail.com Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
After the patch the short-circuit logic for A was inverted:
if (A || vq->iotlb)
return A;
return B;
This patch fixes the regression by rewriting the checks in the obvious
way, no longer returning A when vq->iotlb is non-NULL (which is hard to
understand).
Reported-by: syzbot+65a84dde0214b0387ccd@syzkaller.appspotmail.com Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
On receiving a packet the state index points to the rstate which must be
used to fill up IP and TCP headers. But if the state index points to a
rstate which is unitialized, i.e. filled with zeros, it gets stuck in an
infinite loop inside ip_fast_csum trying to compute the ip checsum of a
header with zero length.
./scripts/faddr2line vmlinux slhc_uncompress+0x464/0x468 output:
ip_fast_csum at arch/arm64/include/asm/checksum.h:40
(inlined by) slhc_uncompress at drivers/net/slip/slhc.c:615
Adding a variable to indicate if the current rstate is initialized. If
such a packet arrives, move to toss state.
Signed-off-by: Tejaswi Tanikella <tejaswit@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
rds_sendmsg() calls rds_send_mprds_hash() to find a c_path to use to
send a message. Suppose the RDS connection is not yet up. In
rds_send_mprds_hash(), it does
if (conn->c_npaths == 0)
wait_event_interruptible(conn->c_hs_waitq,
(conn->c_npaths != 0));
If it is interrupted before the connection is set up,
rds_send_mprds_hash() will return a non-zero hash value. Hence
rds_sendmsg() will use a non-zero c_path to send the message. But if
the RDS connection ends up to be non-MP capable, the message will be
lost as only the zero c_path can be used.
Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The Cinterion AHS8 is a 3G device with one embedded WWAN interface
using cdc_ether as a driver.
The modem is controlled via AT commands through the exposed TTYs.
AT+CGDCONT write command can be used to activate or deactivate a WWAN
connection for a PDP context defined with the same command. UE
supports one WWAN adapter.
Signed-off-by: Bassem Boubaker <bassem.boubaker@actia.fr> Acked-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Zhengjun Xing [Mon, 14 May 2018 08:21:00 +0000 (10:21 +0200)]
xhci: Fix Kernel oops in xhci dbgtty
BugLink: https://bugs.launchpad.net/bugs/1768852
tty_unregister_driver may be called more than 1 time in some
hotplug cases,it will cause the kernel oops. This patch checked
dbc_tty_driver to make sure it is unregistered only 1 time.
UBUNTU: [Packaging] Fix missing watchdog for Raspberry Pi
BugLink: http://bugs.launchpad.net/bugs/1766052
The bcm2835_wdt.ko module is required for Raspberry Pi systems to actually
reboot and shutdown. It should not get automatically blacklisted. See
https://github.com/raspberrypi/linux/issues/2523
ming_qian [Mon, 14 May 2018 05:37:00 +0000 (07:37 +0200)]
UBUNTU: SAUCE: media: uvcvideo: Support realtek's UVC 1.5 device
BugLink: https://bugs.launchpad.net/bugs/1763748
The length of UVC 1.5 video control is 48, and it id 34 for UVC 1.1.
Change it to 48 for UVC 1.5 device,
and the UVC 1.5 device can be recognized.
More changes to the driver are needed for full UVC 1.5 compatibility.
However, at least the UVC 1.5 Realtek RTS5847/RTS5852 cameras have
been reported to work well.
Signed-off-by: ming_qian <ming_qian@realsil.com.cn> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-by: AceLan Kao <acelan.kao@canonical.com> Acked-by: Andy Whitcroft <andy.whitcroft@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1769721
his adds support for the P950ER, which has the same required fixup as the P950HR, but has a different PCI ID.
Signed-off-by: Jeremy Soller <jeremy@system76.com> Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Mika Westerberg [Thu, 3 May 2018 05:13:00 +0000 (07:13 +0200)]
thunderbolt: Prevent crash when ICM firmware is not running
BugLink: https://bugs.launchpad.net/bugs/1768292
On Lenovo ThinkPad Yoga 370 (and possibly some other Lenovo models as
well) the Thunderbolt host controller sometimes comes up in such way
that the ICM firmware is not running properly. This is most likely an
issue in BIOS/firmware but as side-effect driver crashes the kernel due
to NULL pointer dereference:
Paolo Pisati [Thu, 3 May 2018 08:25:00 +0000 (10:25 +0200)]
UBUNTU: [Config] snapdragon: DRM_I2C_ADV7511=y
BugLink: http://bugs.launchpad.net/bugs/1768761 Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Igor Russkikh [Wed, 2 May 2018 04:59:00 +0000 (06:59 +0200)]
net: aquantia: oops when shutdown on already stopped device
BugLink: https://bugs.launchpad.net/bugs/1767088
In case netdev is closed at the moment of pci shutdown, aq_nic_stop
gets called second time. napi_disable in that case hangs indefinitely.
In other case, if device was never opened at all, we get oops because
of null pointer access.
We should invoke aq_nic_stop conditionally, only if device is running
at the moment of shutdown.
Reported-by: David Arcari <darcari@redhat.com> Fixes: 90869ddfefeb ("net: aquantia: Implement pci shutdown callback") Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9a11aff25fd43d5bd2660ababdc9f564b0ba183a) Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Igor Russkikh [Wed, 2 May 2018 04:59:00 +0000 (06:59 +0200)]
net: aquantia: Regression on reset with 1.x firmware
BugLink: https://bugs.launchpad.net/bugs/1767088
On ASUS XG-C100C with 1.5.44 firmware a special mode called "dirty wake"
is active. With this mode when motherboard gets powered (but no poweron
happens yet), NIC automatically enables powersave link and watches
for WOL packet.
This normally allows to powerup the PC after AC power failures.
Not all motherboards or bios settings gives power to PCI slots,
so this mode is not enabled on all the hardware.
4.16 linux driver introduced full hardware reset sequence
This is required since before that we had no NIC hardware
reset implemented and there were side effects of "not clean start".
But this full reset is incompatible with "dirty wake" WOL feature
it keeps the PHY link in a special mode forever. As a consequence,
driver sees no link and no traffic.
To fix this we forcibly change FW state to idle state before doing
the full reset. This makes FW to restore link state.
Fixes: c8c82eb net: aquantia: Introduce global AQC hardware reset sequence Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit cce96d1883dae4b79f44890e5118243d806da286) Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Benjamin Poirier [Fri, 27 Apr 2018 18:31:00 +0000 (20:31 +0200)]
e1000e: Remove Other from EIAC
BugLink: http://bugs.launchpad.net/bugs/1764892
It was reported that emulated e1000e devices in vmware esxi 6.5 Build 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
icr=0x80000004 (_INT_ASSERTED | _LSC) in the same situation.
Some experimentation showed that this flaw in vmware e1000e emulation can
be worked around by not setting Other in EIAC. This is how it was before 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).
Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts") Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 745d0bd3af99ccc8c5f5822f808cd133eadad6ac) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
UBUNTU: SAUCE: platform/x86: acer-wmi: add another KEY_POWER keycode
BugLink: http://bugs.launchpad.net/bugs/1766054
Now that we have informed the firmware that the Power Button driver is
active, laptops such as the Acer Swift 3 will generate a WMI key event with
code 0x87 when the power button key is pressed. Add this keycode to the
table so that it is converted to an appropriate input event.
Signed-off-by: Antonio Rosario Intilisano <antonio.intilisano@gmail.com> Acked-by: Gianfranco Costamagna <locutusofborg@debian.org> Cc: Chris Chiu <chiu@endlessm.com> Cc: Daniel Drake <drake@endlessm.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Hui Wang [Tue, 24 Apr 2018 05:10:00 +0000 (07:10 +0200)]
ALSA: hda/realtek - set PINCFG_HEADSET_MIC to parse_flags
BugLink: https://bugs.launchpad.net/bugs/1766398
Otherwise, the pin will be regarded as microphone, and the jack name
is "Mic Phantom", it is always on in the pulseaudio even nothing is
plugged into the jack. So the UI is confusing to users since the
microphone always shows up in the UI even there is no microphone
plugged.
After adding this flag, the jack name is "Headset Mic Phantom", then
the pulseaudio can handle its detection correctly.
Fixes: f0ba9d699e5c ("ALSA: hda/realtek - Fix Dell headset Mic can't record") Cc: <stable@vger.kernel.org> Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit 3ce0d5aa265bcc0a4b281cb0cabf92491276101b) Signed-off-by: Hui Wang <hui.wang@canonical.com> Acked-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Acked-by: Aaron Ma <aaron.ma@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Hui Wang [Tue, 24 Apr 2018 05:44:00 +0000 (07:44 +0200)]
ALSA: hda/realtek - adjust the location of one mic
BugLink: https://bugs.launchpad.net/bugs/1766477
There are two front mics on this machine, if we don't adjust the
location for one of them, they will have the same mixer name,
pulseaudio can't handle this situation.
After applying this FIXUP, they will have different mixer name,
then pulseaudio can handle them correctly.
Cc: <stable@vger.kernel.org> Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit a3dafb2200bf3c13905a088e82ae11f1eb275a83) Signed-off-by: Hui Wang <hui.wang@canonical.com> Acked-by: Aaron Ma <aaron.ma@canonical.com> Acked-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Colin Ian King [Tue, 17 Apr 2018 11:04:00 +0000 (13:04 +0200)]
UBUNTU: SAUCE: (noup) Update zfs to 0.7.5-1ubuntu15 (LP: #1764690)
BugLink: http://bugs.launchpad.net/bugs/1764690
This sync's SRU fixes in ZFS 0.7.5-1ubuntu15 to the kernel ZFS driver.
Fixes zfsonlinux issues fix in upstream ZFS repository:
- OpenZFS 8373 - TXG_WAIT in ZIL commit path
Closes zfsonlinux #6403
- zfs promote|rename .../%recv should be an error
Closes zfsonlinux #4843, #6339
- Fix parsable 'zfs get' for compressratios
Closes zfsonlinux #6436, #6449
- Fix zpool events scripted mode tab separator
Closes zfsonlinux #6444, #6445
- zv_suspend_lock in zvol_open()/zvol_release()
Closes zfsonlinux #6342
- Allow longer SPA names in stats, allows bigger pool names
Closes zfsonlinux #6481
- vdev_mirror: load balancing fixes
Closes zfsonlinux #6461
- Fix zfs_ioc_pool_sync should not use fnvlist
Closes zfsonlinux #6529
- OpenZFS 8375 - Kernel memory leak in nvpair code
Closes zfsonlinux #6578
- OpenZFS 7261 - nvlist code should enforce name length limit
Closes zfsonlinux #6579
- OpenZFS 5778 - nvpair_type_is_array() does not recognize
DATA_TYPE_INT8_ARRAY
Closes zfsonlinux #6580
- dmu_objset: release bonus buffer in failure path
Closes zfsonlinux #6575
- Fix false config_cache_write events
Closes zfsonlinux #6617
- Fix printk() calls missing log level
Closes zfsonlinux #6672
- Fix abdstats kstat on 32-bit systems
Closes zfsonlinux #6721
- Relax ASSERT for #6526
Closes zfsonlinux #6526
- Fix coverity defects: 147480, 147584 (Logically dead code)
Closes zfsonlinux #6745
- Fix coverity defects: CID 161388 (Resource Leak)
Closes zfsonlinux #6755
- Use ashift=12 by default on SSDSC2BW48 disks
Closes zfsonlinux #6774
- OpenZFS 8558, 8602 - lwp_create() returns EAGAIN
Closes zfsonlinux #6779
- ZFS send fails to dump objects larger than 128PiB
Closes zfsonlinux #6760
- Sort output of tunables in arc_summary.py
Closes zfsonlinux #6828
- Fix data on evict_skips in arc_summary.py
Closes zfsonlinux #6882, #6883
- Fix segfault in zpool iostat when adding VDEVs
Closes zfsonlinux #6748, #6872
- ZTS: Fix create-o_ashift test case
Closes zfsonlinux #6924, #6877
- Handle invalid options in arc_summary
Closes zfsonlinux #6983
- Call commit callbacks from the tail of the list
Closes zfsonlinux #6986
- Fix 'zpool add' handling of nested interior VDEVs
Closes zfsonlinux #6678, #6996
- Fix -fsanitize=address memory leak
kmem_alloc(0, ...) in userspace returns a leakable pointer.
Closes zfsonlinux #6941
- Revert raidz_map and _col structure types
Closes zfsonlinux #6981, #7023
- Use zap_count instead of cached z_size for unlink
Closes zfsonlinux #7019
- OpenZFS 8897 - zpool online -e fails assertion when run on non-leaf
vdevs
Closes zfsonlinux #7030
- OpenZFS 8898 - creating fs with checksum=skein on the boot pools
fails ungracefully
Closes zfsonlinux #7031
- Emit an error message before MMP suspends pool
Closes zfsonlinux #7048
- OpenZFS 8641 - "zpool clear" and "zinject" don't work on "spare"
or "replacing" vdevs
Closes zfsonlinux #7060
- OpenZFS 8835 - Speculative prefetch in ZFS not working for
misaligned reads
Closes zfsonlinux #7062
- OpenZFS 8972 - zfs holds: In scripted mode, do not pad columns with
spaces
Closes zfsonlinux #7063
- Revert "Remove wrong ASSERT in annotate_ecksum"
Closes zfsonlinux #7079
- OpenZFS 8731 - ASSERT3U(nui64s, <=, UINT16_MAX) fails for large
blocks
Closes zfsonlinux #7079
- Prevent zdb(8) from occasionally hanging on I/O
Closes zfsonlinux #6999
- Fix 'zfs receive -o' when used with '-e|-d'
Closes zfsonlinux #7088
- Change movaps to movups in AES-NI code
Closes zfsonlinux #7065, #7108
- tx_waited -> tx_dirty_delayed in trace_dmu.h
Closes zfsonlinux #7096
- OpenZFS 8966 - Source file zfs_acl.c, function
Closes zfsonlinux #7141
- Fix zdb -c traverse stop on damaged objset root
Closes zfsonlinux #7099
- Fix zle_decompress out of bound access
Closes zfsonlinux #7099
- Fix racy assignment of zcb.zcb_haderrors
Closes zfsonlinux #7099
- Fix zdb -R decompression
Closes zfsonlinux #7099, #4984
- Fix zdb -E segfault
Closes zfsonlinux #7099
- Fix zdb -ed on objset for exported pool
Closes zfsonlinux #7099, #6464
Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
drm/i915/edp: Do not do link training fallback or prune modes on EDP
BugLink: https://bugs.launchpad.net/bugs/1763271
In case of eDP because the panel has a fixed mode, the link rate
and lane count at which it is trained corresponds to the link BW
required to support the native resolution of the panel. In case of
panles with lower resolutions where fewer lanes are hooked up internally,
that number is reflected in the MAX_LANE_COUNT DPCD register of the panel.
So it is pointless to fallback to lower link rate/lane count in case
of link training failure on eDP connector since the lower link BW
will not support the native resolution of the panel and we cannot
prune the preferred mode on the eDP connector.
In case of Link training failure on the eDP panel, something is wrong
in the HW internally and hence driver errors out with a loud
and clear DRM_ERROR message.
v2:
* Fix the DEBUG_ERROR and add {} in else (Ville Syrjala)
Cc: Clinton Taylor <clinton.a.taylor@intel.com> Cc: Jim Bride <jim.bride@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Reviewed-by: Ville Syrjala <ville.syrjala@linux.intel.com>
Reference: https://bugs.freedesktop.org/show_bug.cgi?id=103369 Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1507835618-23051-1-git-send-email-manasi.d.navare@intel.com
(cherry picked from commit c0cfb10d9e1de490e36d3b9d4228c0ea0ca30677) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit a306343bcd7df89d9d45a601929e26866e7b7a81) Signed-off-by: AceLan Kao <acelan.kao@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Michael Neuling [Fri, 27 Apr 2018 00:16:11 +0000 (10:16 +1000)]
stf-barrier: set eieio instruction bit 6 for future optimisations
Enable reserved instruction bit 6 in the eieio instruction; later a
faster version might be provided for this particular purpose (store
forward barrier) based on this hint.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Breno Leitao <brenohl@br.ibm.com>
CVE-2018-3639 (powerpc)
Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Nicholas Piggin [Tue, 24 Apr 2018 06:55:14 +0000 (16:55 +1000)]
powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit
This patch prevents a possible store forwarding between privilege
domains, by inserting a store forwarding barrier in kernel entry and
exit paths.
Barriers must be inserted generally before the first load after moving
to a higher privilege, and after the last store before moving to a
lower privilege, HV and PR privilege transitions must be protected.
Barriers are added as patch sections, with all kernel/hypervisor entry
points patched, and the exit points to lower privilge levels patched
similarly to the RFI flush patching.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Breno Leitao <brenohl@br.ibm.com>
CVE-2018-3639 (powerpc)
Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Jiri Kosina [Thu, 10 May 2018 20:47:18 +0000 (22:47 +0200)]
x86/bugs: Fix __ssb_select_mitigation() return type
__ssb_select_mitigation() returns one of the members of enum ssb_mitigation,
not ssb_mitigation_cmd; fix the prototype to reflect that.
Fixes: 24f7fc83b9204 ("x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation") Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3639 (x86)
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
The style for the 'status' file is CamelCase or this. _.
Fixes: fae1fa0fc ("proc: Provide details on speculation flaw mitigations") Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3639 (x86)
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Intel collateral will reference the SSB mitigation bit in IA32_SPEC_CTL[2]
as SSBD (Speculative Store Bypass Disable).
Hence changing it.
It is unclear yet what the MSR_IA32_ARCH_CAPABILITIES (0x10a) Bit(4) name
is going to be. Following the rename it would be SSBD_NO but that rolls out
to Speculative Store Bypass Disable No.
Also fixed the missing space in X86_FEATURE_AMD_SSBD.
[ tglx: Fixup x86_amd_rds_enable() and rds_tif_to_amd_ls_cfg() as well ]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3639 (x86)
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Thomas Gleixner [Fri, 4 May 2018 13:12:06 +0000 (15:12 +0200)]
seccomp: Move speculation migitation control to arch code
The migitation control is simpler to implement in architecture code as it
avoids the extra function call to check the mode. Aside of that having an
explicit seccomp enabled mode in the architecture mitigations would require
even more workarounds.
Move it into architecture code and provide a weak function in the seccomp
code. Remove the 'which' argument as this allows the architecture to decide
which mitigations are relevant for seccomp.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3639 (x86)
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Kees Cook [Thu, 3 May 2018 21:56:12 +0000 (14:56 -0700)]
seccomp: Add filter flag to opt-out of SSB mitigation
If a seccomp user is not interested in Speculative Store Bypass mitigation
by default, it can set the new SECCOMP_FILTER_FLAG_SPEC_ALLOW flag when
adding filters.
Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3639 (x86)
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Thomas Gleixner [Thu, 3 May 2018 20:09:15 +0000 (22:09 +0200)]
prctl: Add force disable speculation
For certain use cases it is desired to enforce mitigations so they cannot
be undone afterwards. That's important for loader stubs which want to
prevent a child from disabling the mitigation again. Will also be used for
seccomp(). The extra state preserving of the prctl state for SSB is a
preparatory step for EBPF dymanic speculation control.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3639 (x86)
[smb: minor context adaption in prctl.h] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Kees Cook [Tue, 1 May 2018 22:07:31 +0000 (15:07 -0700)]
seccomp: Enable speculation flaw mitigations
When speculation flaw mitigations are opt-in (via prctl), using seccomp
will automatically opt-in to these protections, since using seccomp
indicates at least some level of sandboxing is desired.
Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2018-3639 (x86)
Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Thomas Gleixner [Sun, 29 Apr 2018 13:26:40 +0000 (15:26 +0200)]
x86/speculation: Add prctl for Speculative Store Bypass mitigation
Add prctl based control for Speculative Store Bypass mitigation and make it
the default mitigation for Intel and AMD.
Andi Kleen provided the following rationale (slightly redacted):
There are multiple levels of impact of Speculative Store Bypass:
1) JITed sandbox.
It cannot invoke system calls, but can do PRIME+PROBE and may have call
interfaces to other code
2) Native code process.
No protection inside the process at this level.
3) Kernel.
4) Between processes.
The prctl tries to protect against case (1) doing attacks.
If the untrusted code can do random system calls then control is already
lost in a much worse way. So there needs to be system call protection in
some way (using a JIT not allowing them or seccomp). Or rather if the
process can subvert its environment somehow to do the prctl it can already
execute arbitrary code, which is much worse than SSB.
To put it differently, the point of the prctl is to not allow JITed code
to read data it shouldn't read from its JITed sandbox. If it already has
escaped its sandbox then it can already read everything it wants in its
address space, and do much worse.
The ability to control Speculative Store Bypass allows to enable the
protection selectively without affecting overall system performance.
Based on an initial patch from Tim Chen. Completely rewritten.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CVE-2018-3639 (x86)
Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Thomas Gleixner [Sun, 29 Apr 2018 13:21:42 +0000 (15:21 +0200)]
x86/process: Allow runtime control of Speculative Store Bypass
The Speculative Store Bypass vulnerability can be mitigated with the
Reduced Data Speculation (RDS) feature. To allow finer grained control of
this eventually expensive mitigation a per task mitigation control is
required.
Add a new TIF_RDS flag and put it into the group of TIF flags which are
evaluated for mismatch in switch_to(). If these bits differ in the previous
and the next task, then the slow path function __switch_to_xtra() is
invoked. Implement the TIF_RDS dependent mitigation control in the slow
path.
If the prctl for controlling Speculative Store Bypass is disabled or no
task uses the prctl then there is no overhead in the switch_to() fast
path.
Update the KVM related speculation control functions to take TID_RDS into
account as well.
Based on a patch from Tim Chen. Completely rewritten.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CVE-2018-3639 (x86)
Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Thomas Gleixner [Sun, 29 Apr 2018 13:20:11 +0000 (15:20 +0200)]
prctl: Add speculation control prctls
Add two new prctls to control aspects of speculation related vulnerabilites
and their mitigations to provide finer grained control over performance
impacting mitigations.
PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature
which is selected with arg2 of prctl(2). The return value uses bit 0-2 with
the following meaning:
Bit Define Description
0 PR_SPEC_PRCTL Mitigation can be controlled per task by
PR_SET_SPECULATION_CTRL
1 PR_SPEC_ENABLE The speculation feature is enabled, mitigation is
disabled
2 PR_SPEC_DISABLE The speculation feature is disabled, mitigation is
enabled
If all bits are 0 the CPU is not affected by the speculation misfeature.
If PR_SPEC_PRCTL is set, then the per task control of the mitigation is
available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation
misfeature will fail.
PR_SET_SPECULATION_CTRL allows to control the speculation misfeature, which
is selected by arg2 of prctl(2) per task. arg3 is used to hand in the
control value, i.e. either PR_SPEC_ENABLE or PR_SPEC_DISABLE.
The common return values are:
EINVAL prctl is not implemented by the architecture or the unused prctl()
arguments are not 0
ENODEV arg2 is selecting a not supported speculation misfeature
PR_SET_SPECULATION_CTRL has these additional return values:
ERANGE arg3 is incorrect, i.e. it's not either PR_SPEC_ENABLE or PR_SPEC_DISABLE
ENXIO prctl control of the selected speculation misfeature is disabled
The first supported controlable speculation misfeature is
PR_SPEC_STORE_BYPASS. Add the define so this can be shared between
architectures.
Based on an initial patch from Tim Chen and mostly rewritten.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CVE-2018-3639 (x86)
[tyhicks: Minor backport for SAUCE patch context] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Thomas Gleixner [Sun, 29 Apr 2018 13:01:37 +0000 (15:01 +0200)]
x86/speculation: Create spec-ctrl.h to avoid include hell
Having everything in nospec-branch.h creates a hell of dependencies when
adding the prctl based switching mechanism. Move everything which is not
required in nospec-branch.h to spec-ctrl.h and fix up the includes in the
relevant files.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Ingo Molnar <mingo@kernel.org>
CVE-2018-3639 (x86)
[tyhicks: Minor backport for context] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
x86/bugs/AMD: Add support to disable RDS on Fam[15,16,17]h if requested
AMD does not need the Speculative Store Bypass mitigation to be enabled.
The parameters for this are already available and can be done via MSR
C001_1020. Each family uses a different bit in that MSR for this.
[ tglx: Expose the bit mask via a variable and move the actual MSR fiddling
into the bugs code as that's the right thing to do and also required
to prepare for dynamic enable/disable ]
Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org>
CVE-2018-3639 (x86)
[tyhicks: Minor backport for context] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
x86/bugs/intel: Set proper CPU features and setup RDS
Intel CPUs expose methods to:
- Detect whether RDS capability is available via CPUID.7.0.EDX[31],
- The SPEC_CTRL MSR(0x48), bit 2 set to enable RDS.
- MSR_IA32_ARCH_CAPABILITIES, Bit(4) no need to enable RRS.
With that in mind if spec_store_bypass_disable=[auto,on] is selected set at
boot-time the SPEC_CTRL MSR to enable RDS if the platform requires it.
Note that this does not fix the KVM case where the SPEC_CTRL is exposed to
guests which can muck with it, see patch titled :
KVM/SVM/VMX/x86/spectre_v2: Support the combination of guest and host IBRS.
And for the firmware (IBRS to be set), see patch titled:
x86/spectre_v2: Read SPEC_CTRL MSR during boot and re-use reserved bits
[ tglx: Distangled it from the intel implementation and kept the call order ]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ingo Molnar <mingo@kernel.org>
CVE-2018-3639 (x86)
Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation
Contemporary high performance processors use a common industry-wide
optimization known as "Speculative Store Bypass" in which loads from
addresses to which a recent store has occurred may (speculatively) see an
older value. Intel refers to this feature as "Memory Disambiguation" which
is part of their "Smart Memory Access" capability.
Memory Disambiguation can expose a cache side-channel attack against such
speculatively read values. An attacker can create exploit code that allows
them to read memory outside of a sandbox environment (for example,
malicious JavaScript in a web page), or to perform more complex attacks
against code running within the same privilege level, e.g. via the stack.
As a first step to mitigate against such attacks, provide two boot command
line control knobs:
By default affected x86 processors will power on with Speculative
Store Bypass enabled. Hence the provided kernel parameters are written
from the point of view of whether to enable a mitigation or not.
The parameters are as follows:
- auto - Kernel detects whether your CPU model contains an implementation
of Speculative Store Bypass and picks the most appropriate
mitigation.
- on - disable Speculative Store Bypass
- off - enable Speculative Store Bypass
[ tglx: Reordered the checks so that the whole evaluation is not done
when the CPU does not support RDS ]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ingo Molnar <mingo@kernel.org>
CVE-2018-3639 (x86)
Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
x86/bugs, KVM: Support the combination of guest and host IBRS
A guest may modify the SPEC_CTRL MSR from the value used by the
kernel. Since the kernel doesn't use IBRS, this means a value of zero is
what is needed in the host.
But the 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to
the other bits as reserved so the kernel should respect the boot time
SPEC_CTRL value and use that.
This allows to deal with future extensions to the SPEC_CTRL interface if
any at all.
Note: This uses wrmsrl() instead of native_wrmsl(). I does not make any
difference as paravirt will over-write the callq *0xfff.. with the wrmsrl
assembler code.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ingo Molnar <mingo@kernel.org>
CVE-2018-3639 (x86)
[tyhicks: Minor backport for context] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
x86/bugs: Read SPEC_CTRL MSR during boot and re-use reserved bits
The 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to all
the other bits as reserved. The Intel SDM glossary defines reserved as
implementation specific - aka unknown.
As such at bootup this must be taken it into account and proper masking for
the bits in use applied.
A copy of this document is available at
https://bugzilla.kernel.org/show_bug.cgi?id=199511
[ tglx: Made x86_spec_ctrl_base __ro_after_init ]
Suggested-by: Jon Masters <jcm@redhat.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ingo Molnar <mingo@kernel.org>
CVE-2018-3639 (x86)
Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Linus Torvalds [Tue, 1 May 2018 13:55:51 +0000 (15:55 +0200)]
x86/nospec: Simplify alternative_msr_write()
The macro is not type safe and I did look for why that "g" constraint for
the asm doesn't work: it's because the asm is more fundamentally wrong.
It does
movl %[val], %%eax
but "val" isn't a 32-bit value, so then gcc will pass it in a register,
and generate code like
movl %rsi, %eax
and gas will complain about a nonsensical 'mov' instruction (it's moving a
64-bit register to a 32-bit one).
Passing it through memory will just hide the real bug - gcc still thinks
the memory location is 64-bit, but the "movl" will only load the first 32
bits and it all happens to work because x86 is little-endian.
Convert it to a type safe inline function with a little trick which hands
the feature into the ALTERNATIVE macro.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
[smb: Added u32 casting to val to fix i386 build] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Tyler Hicks [Fri, 4 May 2018 20:30:12 +0000 (20:30 +0000)]
UBUNTU: SAUCE: LSM stacking: adjust prctl values
Since LSM Stacking is provided as an early preview in the Ubuntu
kernels, we should use unusually high values for the LSM Stacking prctls
to reduce the chances of colliding with an upstream feature.
UBUNTU: Packaging: Depends on linux-base that provides the necessary tools
BugLink: http://bugs.launchpad.net/bugs/1767133
We Depends on linux-base because of linux-update-symlinks and
linux-check-removal. Those have been introduced in a version later than
4.0ubuntu1, which shipped on Xenial.
The unversioned Depends makes some upgrades to Bionic and installations of
linux-hwe-edge to break because those tools are not found during the configure
phase.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Andy Whitcroft <andy.whitcroft@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
UBUNTU: [Packaging] fix invocation of header postinst hooks
BugLink: http://bugs.launchpad.net/bugs/1766391
There's a typo in the headers postinst which prevents triggering
of dkms builds on installation. Change this to use the correct
path, /etc/kernel/header_postinst.d.