]> git.proxmox.com Git - mirror_ubuntu-bionic-kernel.git/log
mirror_ubuntu-bionic-kernel.git
4 years agolocking/qspinlock/x86: Increase _Q_PENDING_LOOPS upper bound
Will Deacon [Tue, 18 Dec 2018 17:13:59 +0000 (18:13 +0100)]
locking/qspinlock/x86: Increase _Q_PENDING_LOOPS upper bound

BugLink: https://bugs.launchpad.net/bugs/1837257
commit b247be3fe89b6aba928bf80f4453d1c4ba8d2063 upstream.

On x86, atomic_cond_read_relaxed will busy-wait with a cpu_relax() loop,
so it is desirable to increase the number of times we spin on the qspinlock
lockword when it is found to be transitioning from pending to locked.

According to Waiman Long:

 | Ideally, the spinning times should be at least a few times the typical
 | cacheline load time from memory which I think can be down to 100ns or
 | so for each cacheline load with the newest systems or up to several
 | hundreds ns for older systems.

which in his benchmarking corresponded to 512 iterations.

Suggested-by: Waiman Long <longman@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1524738868-31318-5-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agolocking/qspinlock: Re-order code
Peter Zijlstra [Tue, 18 Dec 2018 17:13:58 +0000 (18:13 +0100)]
locking/qspinlock: Re-order code

BugLink: https://bugs.launchpad.net/bugs/1837257
commit 53bf57fab7321fb42b703056a4c80fc9d986d170 upstream.

Flip the branch condition after atomic_fetch_or_acquire(_Q_PENDING_VAL)
such that we loose the indent. This also result in a more natural code
flow IMO.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andrea.parri@amarulasolutions.com
Cc: longman@redhat.com
Link: https://lkml.kernel.org/r/20181003130257.156322446@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agolocking/qspinlock: Kill cmpxchg() loop when claiming lock from head of queue
Will Deacon [Tue, 18 Dec 2018 17:13:57 +0000 (18:13 +0100)]
locking/qspinlock: Kill cmpxchg() loop when claiming lock from head of queue

BugLink: https://bugs.launchpad.net/bugs/1837257
commit c61da58d8a9ba9238250a548f00826eaf44af0f7 upstream.

When a queued locker reaches the head of the queue, it claims the lock
by setting _Q_LOCKED_VAL in the lockword. If there isn't contention, it
must also clear the tail as part of this operation so that subsequent
lockers can avoid taking the slowpath altogether.

Currently this is expressed as a cmpxchg() loop that practically only
runs up to two iterations. This is confusing to the reader and unhelpful
to the compiler. Rewrite the cmpxchg() loop without the loop, so that a
failed cmpxchg() implies that there is contention and we just need to
write to _Q_LOCKED_VAL without considering the rest of the lockword.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1524738868-31318-7-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agolocking/qspinlock: Remove duplicate clear_pending() function from PV code
Will Deacon [Tue, 18 Dec 2018 17:13:56 +0000 (18:13 +0100)]
locking/qspinlock: Remove duplicate clear_pending() function from PV code

BugLink: https://bugs.launchpad.net/bugs/1837257
commit 3bea9adc96842b8a7345c7fb202c16ae9c8d5b25 upstream.

The native clear_pending() function is identical to the PV version, so the
latter can simply be removed.

This fixes the build for systems with >= 16K CPUs using the PV lock implementation.

Reported-by: Waiman Long <longman@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/20180427101619.GB21705@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agolocking/qspinlock: Remove unbounded cmpxchg() loop from locking slowpath
Will Deacon [Tue, 18 Dec 2018 17:13:55 +0000 (18:13 +0100)]
locking/qspinlock: Remove unbounded cmpxchg() loop from locking slowpath

BugLink: https://bugs.launchpad.net/bugs/1837257
commit 59fb586b4a07b4e1a0ee577140ab4842ba451acd upstream.

The qspinlock locking slowpath utilises a "pending" bit as a simple form
of an embedded test-and-set lock that can avoid the overhead of explicit
queuing in cases where the lock is held but uncontended. This bit is
managed using a cmpxchg() loop which tries to transition the uncontended
lock word from (0,0,0) -> (0,0,1) or (0,0,1) -> (0,1,1).

Unfortunately, the cmpxchg() loop is unbounded and lockers can be starved
indefinitely if the lock word is seen to oscillate between unlocked
(0,0,0) and locked (0,0,1). This could happen if concurrent lockers are
able to take the lock in the cmpxchg() loop without queuing and pass it
around amongst themselves.

This patch fixes the problem by unconditionally setting _Q_PENDING_VAL
using atomic_fetch_or, and then inspecting the old value to see whether
we need to spin on the current lock owner, or whether we now effectively
hold the lock. The tricky scenario is when concurrent lockers end up
queuing on the lock and the lock becomes available, causing us to see
a lockword of (n,0,0). With pending now set, simply queuing could lead
to deadlock as the head of the queue may not have observed the pending
flag being cleared. Conversely, if the head of the queue did observe
pending being cleared, then it could transition the lock from (n,0,0) ->
(0,0,1) meaning that any attempt to "undo" our setting of the pending
bit could race with a concurrent locker trying to set it.

We handle this race by preserving the pending bit when taking the lock
after reaching the head of the queue and leaving the tail entry intact
if we saw pending set, because we know that the tail is going to be
updated shortly.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1524738868-31318-6-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agolocking/qspinlock: Merge 'struct __qspinlock' into 'struct qspinlock'
Will Deacon [Tue, 18 Dec 2018 17:13:54 +0000 (18:13 +0100)]
locking/qspinlock: Merge 'struct __qspinlock' into 'struct qspinlock'

BugLink: https://bugs.launchpad.net/bugs/1837257
commit 625e88be1f41b53cec55827c984e4a89ea8ee9f9 upstream.

'struct __qspinlock' provides a handy union of fields so that
subcomponents of the lockword can be accessed by name, without having to
manage shifts and masks explicitly and take endianness into account.

This is useful in qspinlock.h and also potentially in arch headers, so
move the 'struct __qspinlock' into 'struct qspinlock' and kill the extra
definition.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1524738868-31318-3-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agolocking/qspinlock: Bound spinning on pending->locked transition in slowpath
Will Deacon [Tue, 18 Dec 2018 17:13:53 +0000 (18:13 +0100)]
locking/qspinlock: Bound spinning on pending->locked transition in slowpath

BugLink: https://bugs.launchpad.net/bugs/1837257
commit 6512276d97b160d90b53285bd06f7f201459a7e3 upstream.

If a locker taking the qspinlock slowpath reads a lock value indicating
that only the pending bit is set, then it will spin whilst the
concurrent pending->locked transition takes effect.

Unfortunately, there is no guarantee that such a transition will ever be
observed since concurrent lockers could continuously set pending and
hand over the lock amongst themselves, leading to starvation. Whilst
this would probably resolve in practice, it means that it is not
possible to prove liveness properties about the lock and means that lock
acquisition time is unbounded.

Rather than removing the pending->locked spinning from the slowpath
altogether (which has been shown to heavily penalise a 2-threaded
locking stress test on x86), this patch replaces the explicit spinning
with a call to atomic_cond_read_relaxed and allows the architecture to
provide a bound on the number of spins. For architectures that can
respond to changes in cacheline state in their smp_cond_load implementation,
it should be sufficient to use the default bound of 1.

Suggested-by: Waiman Long <longman@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1524738868-31318-4-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agolocking/qspinlock: Ensure node is initialised before updating prev->next
Will Deacon [Tue, 18 Dec 2018 17:13:52 +0000 (18:13 +0100)]
locking/qspinlock: Ensure node is initialised before updating prev->next

BugLink: https://bugs.launchpad.net/bugs/1837257
commit 95bcade33a8af38755c9b0636e36a36ad3789fe6 upstream.

When a locker ends up queuing on the qspinlock locking slowpath, we
initialise the relevant mcs node and publish it indirectly by updating
the tail portion of the lock word using xchg_tail. If we find that there
was a pre-existing locker in the queue, we subsequently update their
->next field to point at our node so that we are notified when it's our
turn to take the lock.

This can be roughly illustrated as follows:

  /* Initialise the fields in node and encode a pointer to node in tail */
  tail = initialise_node(node);

  /*
   * Exchange tail into the lockword using an atomic read-modify-write
   * operation with release semantics
   */
  old = xchg_tail(lock, tail);

  /* If there was a pre-existing waiter ... */
  if (old & _Q_TAIL_MASK) {
prev = decode_tail(old);
smp_read_barrier_depends();

/* ... then update their ->next field to point to node.
WRITE_ONCE(prev->next, node);
  }

The conditional update of prev->next therefore relies on the address
dependency from the result of xchg_tail ensuring order against the
prior initialisation of node. However, since the release semantics of
the xchg_tail operation apply only to the write portion of the RmW,
then this ordering is not guaranteed and it is possible for the CPU
to return old before the writes to node have been published, consequently
allowing us to point prev->next to an uninitialised node.

This patch fixes the problem by making the update of prev->next a RELEASE
operation, which also removes the reliance on dependency ordering.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1518528177-19169-2-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agolocking: Remove smp_read_barrier_depends() from queued_spin_lock_slowpath()
Paul E. McKenney [Tue, 18 Dec 2018 17:13:51 +0000 (18:13 +0100)]
locking: Remove smp_read_barrier_depends() from queued_spin_lock_slowpath()

BugLink: https://bugs.launchpad.net/bugs/1837257
commit 548095dea63ffc016d39c35b32c628d033638aca upstream.

Queued spinlocks are not used by DEC Alpha, and furthermore operations
such as READ_ONCE() and release/relaxed RMW atomics are being changed
to imply smp_read_barrier_depends().  This commit therefore removes the
now-redundant smp_read_barrier_depends() from queued_spin_lock_slowpath(),
and adjusts the comments accordingly.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agox86/build: Fix compiler support check for CONFIG_RETPOLINE
Masahiro Yamada [Wed, 5 Dec 2018 06:27:19 +0000 (15:27 +0900)]
x86/build: Fix compiler support check for CONFIG_RETPOLINE

BugLink: https://bugs.launchpad.net/bugs/1837257
commit 25896d073d8a0403b07e6dec56f58e6c33678207 upstream.

It is troublesome to add a diagnostic like this to the Makefile
parse stage because the top-level Makefile could be parsed with
a stale include/config/auto.conf.

Once you are hit by the error about non-retpoline compiler, the
compilation still breaks even after disabling CONFIG_RETPOLINE.

The easiest fix is to move this check to the "archprepare" like
this commit did:

  829fe4aa9ac1 ("x86: Allow generating user-space headers without a compiler")

Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Fixes: 4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support")
Link: http://lkml.kernel.org/r/1543991239-18476-1-git-send-email-yamada.masahiro@socionext.com
Link: https://lkml.org/lkml/2018/12/4/206
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Cc: Gi-Oh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agodrm/amdgpu: update SMC firmware image for polaris10 variants
Junwei Zhang [Fri, 7 Dec 2018 07:15:03 +0000 (15:15 +0800)]
drm/amdgpu: update SMC firmware image for polaris10 variants

BugLink: https://bugs.launchpad.net/bugs/1837257
commit d55d8be0747c96db28a1d08fc24d22ccd9b448ac upstream.

Some new variants require different firmwares.

Signed-off-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agodrm/i915/execlists: Apply a full mb before execution for Braswell
Chris Wilson [Thu, 6 Dec 2018 08:44:31 +0000 (08:44 +0000)]
drm/i915/execlists: Apply a full mb before execution for Braswell

BugLink: https://bugs.launchpad.net/bugs/1837257
commit cf66b8a0ba142fbd1bf10ac8f3ae92d1b0cb7b8f upstream.

Braswell is really picky about having our writes posted to memory before
we execute or else the GPU may see stale values. A wmb() is insufficient
as it only ensures the writes are visible to other cores, we need a full
mb() to ensure the writes are in memory and visible to the GPU.

The most frequent failure in flushing before execution is that we see
stale PTE values and execute the wrong pages.

References: 987abd5c62f9 ("drm/i915/execlists: Force write serialisation into context image vs execution")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181206084431.9805-3-chris@chris-wilson.co.uk
(cherry picked from commit 490b8c65b9db45896769e1095e78725775f47b3e)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agodrm/nouveau/kms: Fix memory leak in nv50_mstm_del()
Lyude Paul [Tue, 11 Dec 2018 23:56:20 +0000 (18:56 -0500)]
drm/nouveau/kms: Fix memory leak in nv50_mstm_del()

BugLink: https://bugs.launchpad.net/bugs/1837257
commit 24199c5436f267399afed0c4f1f57663c0408f57 upstream.

Noticed this while working on redoing the reference counting scheme in
the DP MST helpers. Nouveau doesn't attempt to call
drm_dp_mst_topology_mgr_destroy() at all, which leaves it leaking all of
the resources for drm_dp_mst_topology_mgr and it's children mstbs+ports.

Fixes: f479c0ba4a17 ("drm/nouveau/kms/nv50: initial support for DP 1.2 multi-stream")
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: <stable@vger.kernel.org> # v4.10+
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agopowerpc/msi: Fix NULL pointer access in teardown code
Radu Rendec [Wed, 28 Nov 2018 03:20:48 +0000 (22:20 -0500)]
powerpc/msi: Fix NULL pointer access in teardown code

BugLink: https://bugs.launchpad.net/bugs/1837257
commit 78e7b15e17ac175e7eed9e21c6f92d03d3b0a6fa upstream.

The arch_teardown_msi_irqs() function assumes that controller ops
pointers were already checked in arch_setup_msi_irqs(), but this
assumption is wrong: arch_teardown_msi_irqs() can be called even when
arch_setup_msi_irqs() returns an error (-ENOSYS).

This can happen in the following scenario:
  - msi_capability_init() calls pci_msi_setup_msi_irqs()
  - pci_msi_setup_msi_irqs() returns -ENOSYS
  - msi_capability_init() notices the error and calls free_msi_irqs()
  - free_msi_irqs() calls pci_msi_teardown_msi_irqs()

This is easier to see when CONFIG_PCI_MSI_IRQ_DOMAIN is not set and
pci_msi_setup_msi_irqs() and pci_msi_teardown_msi_irqs() are just
aliases to arch_setup_msi_irqs() and arch_teardown_msi_irqs().

The call to free_msi_irqs() upon pci_msi_setup_msi_irqs() failure
seems legit, as it does additional cleanup; e.g.
list_del(&entry->list) and kfree(entry) inside free_msi_irqs() do
happen (MSI descriptors are allocated before pci_msi_setup_msi_irqs()
is called and need to be cleaned up if that fails).

Fixes: 6b2fd7efeb88 ("PCI/MSI/PPC: Remove arch_msi_check_device()")
Cc: stable@vger.kernel.org # v3.18+
Signed-off-by: Radu Rendec <radu.rendec@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agotracing: Fix memory leak of instance function hash filters
Steven Rostedt (VMware) [Tue, 11 Dec 2018 04:58:01 +0000 (23:58 -0500)]
tracing: Fix memory leak of instance function hash filters

BugLink: https://bugs.launchpad.net/bugs/1837257
commit 2840f84f74035e5a535959d5f17269c69fa6edc5 upstream.

The following commands will cause a memory leak:

 # cd /sys/kernel/tracing
 # mkdir instances/foo
 # echo schedule > instance/foo/set_ftrace_filter
 # rmdir instances/foo

The reason is that the hashes that hold the filters to set_ftrace_filter and
set_ftrace_notrace are not freed if they contain any data on the instance
and the instance is removed.

Found by kmemleak detector.

Cc: stable@vger.kernel.org
Fixes: 591dffdade9f ("ftrace: Allow for function tracing instance to filter functions")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agotracing: Fix memory leak in set_trigger_filter()
Steven Rostedt (VMware) [Mon, 10 Dec 2018 02:17:30 +0000 (21:17 -0500)]
tracing: Fix memory leak in set_trigger_filter()

BugLink: https://bugs.launchpad.net/bugs/1837257
commit 3cec638b3d793b7cacdec5b8072364b41caeb0e1 upstream.

When create_event_filter() fails in set_trigger_filter(), the filter may
still be allocated and needs to be freed. The caller expects the
data->filter to be updated with the new filter, even if the new filter
failed (we could add an error message by setting set_str parameter of
create_event_filter(), but that's another update).

But because the error would just exit, filter was left hanging and
nothing could free it.

Found by kmemleak detector.

Cc: stable@vger.kernel.org
Fixes: bac5fb97a173a ("tracing: Add and use generic set_trigger_filter() implementation")
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agodm cache metadata: verify cache has blocks in blocks_are_clean_separate_dirty()
Mike Snitzer [Fri, 9 Nov 2018 16:56:03 +0000 (11:56 -0500)]
dm cache metadata: verify cache has blocks in blocks_are_clean_separate_dirty()

BugLink: https://bugs.launchpad.net/bugs/1837257
commit 687cf4412a343a63928a5c9d91bdc0f522939d43 upstream.

Otherwise dm_bitset_cursor_begin() return -ENODATA.  Other calls to
dm_bitset_cursor_begin() have similar negative checks.

Fixes inability to create a cache in passthrough mode (even though doing
so makes no sense).

Fixes: 0d963b6e65 ("dm cache metadata: fix metadata2 format's blocks_are_clean_separate_dirty")
Cc: stable@vger.kernel.org
Reported-by: David Teigland <teigland@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agodm thin: send event about thin-pool state change _after_ making it
Mike Snitzer [Tue, 11 Dec 2018 18:31:40 +0000 (13:31 -0500)]
dm thin: send event about thin-pool state change _after_ making it

BugLink: https://bugs.launchpad.net/bugs/1837257
commit f6c367585d0d851349d3a9e607c43e5bea993fa1 upstream.

Sending a DM event before a thin-pool state change is about to happen is
a bug.  It wasn't realized until it became clear that userspace response
to the event raced with the actual state change that the event was
meant to notify about.

Fix this by first updating internal thin-pool state to reflect what the
DM event is being issued about.  This fixes a long-standing racey/buggy
userspace device-mapper-test-suite 'resize_io' test that would get an
event but not find the state it was looking for -- so it would just go
on to hang because no other events caused the test to reevaluate the
thin-pool's state.

Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoARM: mmp/mmp2: fix cpu_is_mmp2() on mmp2-dt
Lubomir Rintel [Sun, 2 Dec 2018 11:12:24 +0000 (12:12 +0100)]
ARM: mmp/mmp2: fix cpu_is_mmp2() on mmp2-dt

BugLink: https://bugs.launchpad.net/bugs/1837257
commit 76f4e2c3b6a560cdd7a75b87df543e04d05a9e5f upstream.

cpu_is_mmp2() was equivalent to cpu_is_pj4(), wouldn't be correct for
multiplatform kernels. Fix it by also considering mmp_chip_id, as is
done for cpu_is_pxa168() and cpu_is_pxa910() above.

Moreover, it is only available with CONFIG_CPU_MMP2 and thus doesn't work
on DT-based MMP2 machines. Enable it on CONFIG_MACH_MMP2_DT too.

Note: CONFIG_CPU_MMP2 is only used for machines that use board files
instead of DT. It should perhaps be renamed. I'm not doing it now, because
I don't have a better idea.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: stable@vger.kernel.org
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agommc: sdhci: fix the timeout check window for clock and reset
Alek Du [Thu, 6 Dec 2018 09:24:59 +0000 (17:24 +0800)]
mmc: sdhci: fix the timeout check window for clock and reset

BugLink: https://bugs.launchpad.net/bugs/1837257
commit b704441e38f645dcfba1348ca3cc1ba43d1a9f31 upstream.

We observed some premature timeouts on a virtualization platform, the log
is like this:

case 1:
[159525.255629] mmc1: Internal clock never stabilised.
[159525.255818] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[159525.256049] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
...
[159525.257205] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x0000fa03
From the clock control register dump, we are pretty sure the clock was
stablized.

case 2:
[  914.550127] mmc1: Reset 0x2 never completed.
[  914.550321] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[  914.550608] mmc1: sdhci: Sys addr:  0x00000010 | Version:  0x00001002

After checking the sdhci code, we found the timeout check actually has a
little window that the CPU can be scheduled out and when it comes back,
the original time set or check is not valid.

Fixes: 5a436cc0af62 ("mmc: sdhci: Optimize delay loops")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Alek Du <alek.du@intel.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoMMC: OMAP: fix broken MMC on OMAP15XX/OMAP5910/OMAP310
Aaro Koskinen [Mon, 19 Nov 2018 23:14:00 +0000 (01:14 +0200)]
MMC: OMAP: fix broken MMC on OMAP15XX/OMAP5910/OMAP310

BugLink: https://bugs.launchpad.net/bugs/1837257
commit e8cde625bfe8a714a856e1366bcbb259d7346095 upstream.

Since v2.6.22 or so there has been reports [1] about OMAP MMC being
broken on OMAP15XX based hardware (OMAP5910 and OMAP310). The breakage
seems to have been caused by commit 46a6730e3ff9 ("mmc-omap: Fix
omap to use MMC_POWER_ON") that changed clock enabling to be done
on MMC_POWER_ON. This can happen multiple times in a row, and on 15XX
the hardware doesn't seem to like it and the MMC just stops responding.
Fix by memorizing the power mode and do the init only when necessary.

Before the patch (on Palm TE):

mmc0: new SD card at address b368
mmcblk0: mmc0:b368 SDC   977 MiB
mmci-omap mmci-omap.0: command timeout (CMD18)
mmci-omap mmci-omap.0: command timeout (CMD13)
mmci-omap mmci-omap.0: command timeout (CMD13)
mmci-omap mmci-omap.0: command timeout (CMD12) [x 6]
mmci-omap mmci-omap.0: command timeout (CMD13) [x 6]
mmcblk0: error -110 requesting status
mmci-omap mmci-omap.0: command timeout (CMD8)
mmci-omap mmci-omap.0: command timeout (CMD18)
mmci-omap mmci-omap.0: command timeout (CMD13)
mmci-omap mmci-omap.0: command timeout (CMD13)
mmci-omap mmci-omap.0: command timeout (CMD12) [x 6]
mmci-omap mmci-omap.0: command timeout (CMD13) [x 6]
mmcblk0: error -110 requesting status
mmcblk0: recovery failed!
print_req_error: I/O error, dev mmcblk0, sector 0
Buffer I/O error on dev mmcblk0, logical block 0, async page read
 mmcblk0: unable to read partition table

After the patch:

mmc0: new SD card at address b368
mmcblk0: mmc0:b368 SDC   977 MiB
 mmcblk0: p1

The patch is based on a fix and analysis done by Ladislav Michl.

Tested on OMAP15XX/OMAP310 (Palm TE), OMAP1710 (Nokia 770)
and OMAP2420 (Nokia N810).

[1] https://marc.info/?t=123175197000003&r=1&w=2

Fixes: 46a6730e3ff9 ("mmc-omap: Fix omap to use MMC_POWER_ON")
Reported-by: Ladislav Michl <ladis@linux-mips.org>
Reported-by: Andrzej Zaborowski <balrogg@gmail.com>
Tested-by: Ladislav Michl <ladis@linux-mips.org>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoarm64: dma-mapping: Fix FORCE_CONTIGUOUS buffer clearing
Robin Murphy [Mon, 10 Dec 2018 19:33:31 +0000 (19:33 +0000)]
arm64: dma-mapping: Fix FORCE_CONTIGUOUS buffer clearing

BugLink: https://bugs.launchpad.net/bugs/1837257
commit 3238c359acee4ab57f15abb5a82b8ab38a661ee7 upstream.

We need to invalidate the caches *before* clearing the buffer via the
non-cacheable alias, else in the worst case __dma_flush_area() may
write back dirty lines over the top of our nice new zeros.

Fixes: dd65a941f6ba ("arm64: dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag")
Cc: <stable@vger.kernel.org> # 4.18.x-
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agouserfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered
Andrea Arcangeli [Fri, 14 Dec 2018 22:17:17 +0000 (14:17 -0800)]
userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered

BugLink: https://bugs.launchpad.net/bugs/1837257
commit 01e881f5a1fca4677e82733061868c6d6ea05ca7 upstream.

Calling UFFDIO_UNREGISTER on virtual ranges not yet registered in uffd
could trigger an harmless false positive WARN_ON.  Check the vma is
already registered before checking VM_MAYWRITE to shut off the false
positive warning.

Link: http://lkml.kernel.org/r/20181206212028.18726-2-aarcange@redhat.com
Cc: <stable@vger.kernel.org>
Fixes: 29ec90660d68 ("userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: syzbot+06c7092e7d71218a2c16@syzkaller.appspotmail.com
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agopinctrl: sunxi: a83t: Fix IRQ offset typo for PH11
Chen-Yu Tsai [Tue, 4 Dec 2018 09:04:57 +0000 (17:04 +0800)]
pinctrl: sunxi: a83t: Fix IRQ offset typo for PH11

BugLink: https://bugs.launchpad.net/bugs/1837257
commit 478b6767ad26ab86d9ecc341027dd09a87b1f997 upstream.

Pin PH11 is used on various A83T board to detect a change in the OTG
port's ID pin, as in when an OTG host cable is plugged in.

The incorrect offset meant the gpiochip/irqchip was activating the wrong
pin for interrupts.

Fixes: 4730f33f0d82 ("pinctrl: sunxi: add allwinner A83T PIO controller support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoUBUNTU: SAUCE: ALSA: hda - Add a conexant codec entry to let mute led work
Hui Wang [Fri, 26 Jul 2019 03:54:54 +0000 (11:54 +0800)]
UBUNTU: SAUCE: ALSA: hda - Add a conexant codec entry to let mute led work

BugLink: https://bugs.launchpad.net/bugs/1837963
This conexant codec isn't in the supported codec list yet, the hda
generic driver can drive this codec well, but on a Lenovo machine
with mute/mic-mute leds, we need to apply CXT_FIXUP_THINKPAD_ACPI
to make the leds work. After adding this codec to the list, the
driver patch_conexant.c will apply THINKPAD_ACPI to this machine.

Cc: stable@vger.kernel.org
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit 3f8809499bf02ef7874254c5e23fc764a47a21a0
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git)
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Kai Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoUBUNTU: Start new release
Kleber Sacilotto de Souza [Tue, 13 Aug 2019 12:02:27 +0000 (14:02 +0200)]
UBUNTU: Start new release

Ignore: yes
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agoUBUNTU: Ubuntu-4.15.0-58.64 Ubuntu-4.15.0-58.64
Stefan Bader [Tue, 6 Aug 2019 10:45:37 +0000 (12:45 +0200)]
UBUNTU: Ubuntu-4.15.0-58.64

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
4 years agoRevert "new primitive: discard_new_inode()"
Stefan Bader [Tue, 6 Aug 2019 10:39:04 +0000 (12:39 +0200)]
Revert "new primitive: discard_new_inode()"

BugLink: https://bugs.launchpad.net/bugs/1838982
This reverts commit a5b1d6edcceaccbad1f964641992f820794fdfae.

This change was only added as a pre-req for "ovl: set I_CREATING
on inode being created" which got reverted from 4.15 as it was
causing regressions.

Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Khaled Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
4 years agoRevert "ovl: set I_CREATING on inode being created"
Aaron Ma [Mon, 5 Aug 2019 17:20:27 +0000 (01:20 +0800)]
Revert "ovl: set I_CREATING on inode being created"

BugLink: https://bugs.launchpad.net/bugs/1838982
This reverts commit b7fe2730d202b8e8fab1884af4f954d793082fdd.

This upstream commit depends on the following commits:
e950564b97fd ("vfs: don't evict uninitialized inode")
80ea09a002bf ("vfs: factor out inode_insert5()")

With this commit only,
Crash happens during the setup (dpkg) phase of sbuild using a overlayfs
based chroot with the current bionic-proposed kernel (4.15.0-56-generic

[25553.379381] ? ovl_get_origin_fh+0x23/0x150 [overlay]
[25553.379386] ? ovl_inode_test+0x20/0x20 [overlay]
[25553.379390] ? ovl_lock_rename_workdir+0x50/0x50 [overlay]
[25553.379396] ovl_get_inode+0xa2/0x450 [overlay]
[25553.379401] ovl_lookup+0x275/0x760 [overlay]
[25553.379406] lookup_slow+0xab/0x170
[25553.379409] ? lookup_slow+0xab/0x170
[25553.379413] walk_component+0x1c3/0x470
[25553.379416] ? path_init+0x177/0x2f0
[25553.379419] path_lookupat+0x84/0x1f0
[25553.379423] ? __put_cred+0x3d/0x50
[25553.379426] ? revert_creds+0x2f/0x40
[25553.379429] filename_lookup+0xb6/0x190
[25553.379434] ? __check_object_size+0xaf/0x1b0
[25553.379438] user_path_at_empty+0x36/0x40
[25553.379441] ? user_path_at_empty+0x36/0x40
[25553.379444] SyS_chown+0x4d/0xe0
[25553.379449] do_syscall_64+0x73/0x130
[25553.379453] entry_SYSCALL_64_after_hwframe+0x3d/0xa2

commit 80ea09a002bf is not a fix, no need to backport.

Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoUBUNTU: Start new release
Stefan Bader [Tue, 6 Aug 2019 10:29:17 +0000 (12:29 +0200)]
UBUNTU: Start new release

Ignore: yes
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
4 years agoUBUNTU: Ubuntu-4.15.0-57.63
Kleber Sacilotto de Souza [Thu, 1 Aug 2019 10:25:26 +0000 (12:25 +0200)]
UBUNTU: Ubuntu-4.15.0-57.63

Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agox86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS
Thomas Gleixner [Wed, 17 Jul 2019 19:18:59 +0000 (21:18 +0200)]
x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS

Intel provided the following information:

 On all current Atom processors, instructions that use a segment register
 value (e.g. a load or store) will not speculatively execute before the
 last writer of that segment retires. Thus they will not use a
 speculatively written segment value.

That means on ATOMs there is no speculation through SWAPGS, so the SWAPGS
entry paths can be excluded from the extra LFENCE if PTI is disabled.

Create a separate bug flag for the through SWAPGS speculation and mark all
out-of-order ATOMs and AMD/HYGON CPUs as not affected. The in-order ATOMs
are excluded from the whole mitigation mess anyway.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
CVE-2019-1125

(backported from commit f36cf386e3fec258a341d446915862eded3e13d8)
[tyhicks: Dropped VULNWL_HYGON() change since this kernel version
 doesn't know about Hygon processors]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agox86/entry/64: Use JMP instead of JMPQ
Josh Poimboeuf [Mon, 15 Jul 2019 16:51:39 +0000 (11:51 -0500)]
x86/entry/64: Use JMP instead of JMPQ

Somehow the swapgs mitigation entry code patch ended up with a JMPQ
instruction instead of JMP, where only the short jump is needed.  Some
assembler versions apparently fail to optimize JMPQ into a two-byte JMP
when possible, instead always using a 7-byte JMP with relocation.  For
some reason that makes the entry code explode with a #GP during boot.

Change it back to "JMP" as originally intended.

Fixes: 18ec54fdd6d1 ("x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CVE-2019-1125

(backported from commit 64dbc122b20f75183d8822618c24f85144a5a94d)
[tyhicks: Adjust context in entry_64.S]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agox86/speculation: Enable Spectre v1 swapgs mitigations
Josh Poimboeuf [Mon, 8 Jul 2019 16:52:26 +0000 (11:52 -0500)]
x86/speculation: Enable Spectre v1 swapgs mitigations

The previous commit added macro calls in the entry code which mitigate the
Spectre v1 swapgs issue if the X86_FEATURE_FENCE_SWAPGS_* features are
enabled.  Enable those features where applicable.

The mitigations may be disabled with "nospectre_v1" or "mitigations=off".

There are different features which can affect the risk of attack:

- When FSGSBASE is enabled, unprivileged users are able to place any
  value in GS, using the wrgsbase instruction.  This means they can
  write a GS value which points to any value in kernel space, which can
  be useful with the following gadget in an interrupt/exception/NMI
  handler:

if (coming from user space)
swapgs
mov %gs:<percpu_offset>, %reg1
// dependent load or store based on the value of %reg
// for example: mov %(reg1), %reg2

  If an interrupt is coming from user space, and the entry code
  speculatively skips the swapgs (due to user branch mistraining), it
  may speculatively execute the GS-based load and a subsequent dependent
  load or store, exposing the kernel data to an L1 side channel leak.

  Note that, on Intel, a similar attack exists in the above gadget when
  coming from kernel space, if the swapgs gets speculatively executed to
  switch back to the user GS.  On AMD, this variant isn't possible
  because swapgs is serializing with respect to future GS-based
  accesses.

  NOTE: The FSGSBASE patch set hasn't been merged yet, so the above case
doesn't exist quite yet.

- When FSGSBASE is disabled, the issue is mitigated somewhat because
  unprivileged users must use prctl(ARCH_SET_GS) to set GS, which
  restricts GS values to user space addresses only.  That means the
  gadget would need an additional step, since the target kernel address
  needs to be read from user space first.  Something like:

if (coming from user space)
swapgs
mov %gs:<percpu_offset>, %reg1
mov (%reg1), %reg2
// dependent load or store based on the value of %reg2
// for example: mov %(reg2), %reg3

  It's difficult to audit for this gadget in all the handlers, so while
  there are no known instances of it, it's entirely possible that it
  exists somewhere (or could be introduced in the future).  Without
  tooling to analyze all such code paths, consider it vulnerable.

  Effects of SMAP on the !FSGSBASE case:

  - If SMAP is enabled, and the CPU reports RDCL_NO (i.e., not
    susceptible to Meltdown), the kernel is prevented from speculatively
    reading user space memory, even L1 cached values.  This effectively
    disables the !FSGSBASE attack vector.

  - If SMAP is enabled, but the CPU *is* susceptible to Meltdown, SMAP
    still prevents the kernel from speculatively reading user space
    memory.  But it does *not* prevent the kernel from reading the
    user value from L1, if it has already been cached.  This is probably
    only a small hurdle for an attacker to overcome.

Thanks to Dave Hansen for contributing the speculative_smap() function.

Thanks to Andrew Cooper for providing the inside scoop on whether swapgs
is serializing on AMD.

[ tglx: Fixed the USER fence decision and polished the comment as suggested
   by Dave Hansen ]

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
CVE-2019-1125

(backported from commit a2059825986a1c8143fd6698774fa9d83733bb11)
[tyhicks: Adjust context in kernel-parameters.txt and bugs.c]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agox86/speculation: Prepare entry code for Spectre v1 swapgs mitigations
Josh Poimboeuf [Mon, 8 Jul 2019 16:52:25 +0000 (11:52 -0500)]
x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations

Spectre v1 isn't only about array bounds checks.  It can affect any
conditional checks.  The kernel entry code interrupt, exception, and NMI
handlers all have conditional swapgs checks.  Those may be problematic in
the context of Spectre v1, as kernel code can speculatively run with a user
GS.

For example:

if (coming from user space)
swapgs
mov %gs:<percpu_offset>, %reg
mov (%reg), %reg1

When coming from user space, the CPU can speculatively skip the swapgs, and
then do a speculative percpu load using the user GS value.  So the user can
speculatively force a read of any kernel value.  If a gadget exists which
uses the percpu value as an address in another load/store, then the
contents of the kernel value may become visible via an L1 side channel
attack.

A similar attack exists when coming from kernel space.  The CPU can
speculatively do the swapgs, causing the user GS to get used for the rest
of the speculative window.

The mitigation is similar to a traditional Spectre v1 mitigation, except:

  a) index masking isn't possible; because the index (percpu offset)
     isn't user-controlled; and

  b) an lfence is needed in both the "from user" swapgs path and the
     "from kernel" non-swapgs path (because of the two attacks described
     above).

The user entry swapgs paths already have SWITCH_TO_KERNEL_CR3, which has a
CR3 write when PTI is enabled.  Since CR3 writes are serializing, the
lfences can be skipped in those cases.

On the other hand, the kernel entry swapgs paths don't depend on PTI.

To avoid unnecessary lfences for the user entry case, create two separate
features for alternative patching:

  X86_FEATURE_FENCE_SWAPGS_USER
  X86_FEATURE_FENCE_SWAPGS_KERNEL

Use these features in entry code to patch in lfences where needed.

The features aren't enabled yet, so there's no functional change.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
CVE-2019-1125

(backported from commit 18ec54fdd6d18d92025af097cd042a75cf0ea24c)
[tyhicks: Adjust context in calling.h and entry_64.S]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agox86/cpufeatures: Combine word 11 and 12 into a new scattered features word
Fenghua Yu [Wed, 19 Jun 2019 16:51:09 +0000 (18:51 +0200)]
x86/cpufeatures: Combine word 11 and 12 into a new scattered features word

It's a waste for the four X86_FEATURE_CQM_* feature bits to occupy two
whole feature bits words. To better utilize feature words, re-define
word 11 to host scattered features and move the four X86_FEATURE_CQM_*
features into Linux defined word 11. More scattered features can be
added in word 11 in the future.

Rename leaf 11 in cpuid_leafs to CPUID_LNX_4 to reflect it's a
Linux-defined leaf.

Rename leaf 12 as CPUID_DUMMY which will be replaced by a meaningful
name in the next patch when CPUID.7.1:EAX occupies world 12.

Maximum number of RMID and cache occupancy scale are retrieved from
CPUID.0xf.1 after scattered CQM features are enumerated. Carve out the
code into a separate function.

KVM doesn't support resctrl now. So it's safe to move the
X86_FEATURE_CQM_* features to scattered features word 11 for KVM.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aaron Lewis <aaronlewis@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Babu Moger <babu.moger@amd.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: "Sean J Christopherson" <sean.j.christopherson@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: x86 <x86@kernel.org>
Link: https://lkml.kernel.org/r/1560794416-217638-2-git-send-email-fenghua.yu@intel.com
CVE-2019-1125

(cherry picked from commit acec0ce081de0c36459eea91647faf99296445a3)
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agox86/cpufeatures: Carve out CQM features retrieval
Borislav Petkov [Wed, 19 Jun 2019 15:24:34 +0000 (17:24 +0200)]
x86/cpufeatures: Carve out CQM features retrieval

... into a separate function for better readability. Split out from a
patch from Fenghua Yu <fenghua.yu@intel.com> to keep the mechanical,
sole code movement separate for easy review.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: x86@kernel.org
CVE-2019-1125

(cherry picked from commit 45fc56e629caa451467e7664fbd4c797c434a6c4)
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agoUBUNTU: update dkms package versions
Kleber Sacilotto de Souza [Thu, 1 Aug 2019 10:16:25 +0000 (12:16 +0200)]
UBUNTU: update dkms package versions

BugLink: http://bugs.launchpad.net/bugs/1786013
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agoUBUNTU: Start new release
Kleber Sacilotto de Souza [Thu, 1 Aug 2019 10:14:10 +0000 (12:14 +0200)]
UBUNTU: Start new release

Ignore: yes
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agoUBUNTU: Ubuntu-4.15.0-56.62 Ubuntu-4.15.0-56.62
Sultan Alsawaf [Wed, 24 Jul 2019 15:50:49 +0000 (09:50 -0600)]
UBUNTU: Ubuntu-4.15.0-56.62

Signed-off-by: Sultan Alsawaf <sultan.alsawaf@canonical.com>
4 years agoUBUNTU: link-to-tracker: update tracking bug
Juerg Haefliger [Wed, 24 Jul 2019 02:55:37 +0000 (20:55 -0600)]
UBUNTU: link-to-tracker: update tracking bug

BugLink: https://bugs.launchpad.net/bugs/1837626
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
4 years agoUBUNTU: [Packaging] update helper scripts
Juerg Haefliger [Wed, 24 Jul 2019 00:35:49 +0000 (18:35 -0600)]
UBUNTU: [Packaging] update helper scripts

BugLink: http://bugs.launchpad.net/bugs/1786013
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
4 years agoUBUNTU: [Packaging] resync git-ubuntu-log
Juerg Haefliger [Wed, 24 Jul 2019 00:35:48 +0000 (18:35 -0600)]
UBUNTU: [Packaging] resync git-ubuntu-log

BugLink: http://bugs.launchpad.net/bugs/1786013
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
4 years agomedia: uvcvideo: Fix 'type' check leading to overflow
Alistair Strachan [Thu, 18 Jul 2019 09:27:00 +0000 (11:27 +0200)]
media: uvcvideo: Fix 'type' check leading to overflow

When initially testing the Camera Terminal Descriptor wTerminalType
field (buffer[4]), no mask is used. Later in the function, the MSB is
overloaded to store the descriptor subtype, and so a mask of 0x7fff
is used to check the type.

If a descriptor is specially crafted to set this overloaded bit in the
original wTerminalType field, the initial type check will fail (falling
through, without adjusting the buffer size), but the later type checks
will pass, assuming the buffer has been made suitably large, causing an
overflow.

Avoid this problem by checking for the MSB in the wTerminalType field.
If the bit is set, assume the descriptor is bad, and abort parsing it.

Originally reported here:
https://groups.google.com/forum/#!topic/syzkaller/Ot1fOE6v1d8
A similar (non-compiling) patch was provided at that time.

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Alistair Strachan <astrachan@google.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
CVE-2019-2101

(cherry picked from commit 47bb117911b051bbc90764a8bff96543cbd2005f)
Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agoUBUNTU: [Packaging] remove hibmc-drm from built modules list
Kleber Sacilotto de Souza [Tue, 23 Jul 2019 13:41:08 +0000 (15:41 +0200)]
UBUNTU: [Packaging] remove hibmc-drm from built modules list

BugLink: https://bugs.launchpad.net/bugs/1762940
CONFIG_DRM_HISI_HIBMC is not enabled only for arm64, so remove hibmc-drm
from the modules list from the previous ABI for all architectures except
arm64.

Ignore: yes
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agoUBUNTU: SAUCE: Make CONFIG_DRM_HISI_HIBMC depend on ARM64
Matthew Ruffell [Tue, 16 Jul 2019 01:08:00 +0000 (03:08 +0200)]
UBUNTU: SAUCE: Make CONFIG_DRM_HISI_HIBMC depend on ARM64

BugLink: https://bugs.launchpad.net/bugs/1762940
Hisilicon developed hibmc_drm for their arm64 based soc and did not
intend for this driver to be used on any other architecture than arm64.

Using it on amd64 leads to the screen being unreadable, forcing users to
manually blacklist the module on the kernel command line to use the d-i
server installer.

Make CONFIG_DRM_HISI_HIBMC firmly depend on arm64 to ensure it is not
built for other architectures.

Signed-off-by: Matthew Ruffell <matthew.ruffell@canonical.com>
Acked-by: Paolo Pisati <paolo.pisati@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agoUBUNTU: [Config] Set CONFIG_DRM_HISI_HIBMC to arm64 only
Matthew Ruffell [Tue, 16 Jul 2019 01:08:00 +0000 (03:08 +0200)]
UBUNTU: [Config] Set CONFIG_DRM_HISI_HIBMC to arm64 only

BugLink: https://bugs.launchpad.net/bugs/1762940
Hisilicon say that the hibmc_drm driver is for use on arm64 arch only,
and is not meant for amd64.

When hibmc_drm is used with amd64 hardware, multiple issues occur which
lead to the screen being unreadable, most significant is the inability
to use the d-i server installer due to the problem.

This patch removes CONFIG_DRM_HISI_HIBMC from all architectures other
than arm64.

Signed-off-by: Matthew Ruffell <matthew.ruffell@canonical.com>
Acked-by: Paolo Pisati <paolo.pisati@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
[ kleber: added bug note on annotation file ]
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: stop the TX queue before pushing new buffers
Martin Habets [Thu, 24 May 2018 09:14:00 +0000 (10:14 +0100)]
sfc: stop the TX queue before pushing new buffers

BugLink: https://bugs.launchpad.net/bugs/1836635
efx_enqueue_skb() can push new buffers for the xmit_more functionality.
We must stops the TX queue before this or else the TX queue does not get
restarted and we get a netdev watchdog.

In the error handling we may now need to unwind more than 1 packet, and
we may need to push the new buffers onto the partner queue.

v2: In the error leg also push this queue if xmit_more is set

Fixes: e9117e5099ea ("sfc: Firmware-Assisted TSO version 2")
Reported-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Martin Habets <mhabets@solarflare.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0c235113b3c42197dba66baf76697359b03a5046)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: remove ctpio_dmabuf_start from stats
Bert Kenward [Wed, 4 Apr 2018 15:40:30 +0000 (16:40 +0100)]
sfc: remove ctpio_dmabuf_start from stats

BugLink: https://bugs.launchpad.net/bugs/1836635
The ctpio_dmabuf_start entry is not actually a stat and shouldn't
be exposed to ethtool.

Fixes: 2c0b6ee837db ("sfc: expose CTPIO stats on NICs that support them")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 458bd99e49742c225f75501591573959c7ef50a2)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: support FEC configuration through ethtool
Edward Cree [Wed, 14 Mar 2018 14:21:26 +0000 (14:21 +0000)]
sfc: support FEC configuration through ethtool

BugLink: https://bugs.launchpad.net/bugs/1836635
As well as 'auto' and the forced 'off', 'rs' and 'baser' states, we also
 handle combinations of settings (since the fecparam->fec field is a
 bitmask), where auto|rs and auto|baser specify a preferred FEC mode but
 will fall back to the other if the cable or link partner doesn't support
 it.  rs|baser (with or without auto bit) means prefer FEC even where
 auto wouldn't use it, but let FW choose which encoding to use.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7f61e6c6279bcb340489ab6b781835da700f1c4b)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: update MCDI protocol headers
Edward Cree [Wed, 14 Mar 2018 14:21:00 +0000 (14:21 +0000)]
sfc: update MCDI protocol headers

BugLink: https://bugs.launchpad.net/bugs/1836635
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f215347cc0fcf76933d6bdab95e253724b823625)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: mark some unexported symbols as static
kbuild test robot [Fri, 26 Jan 2018 17:00:39 +0000 (17:00 +0000)]
sfc: mark some unexported symbols as static

BugLink: https://bugs.launchpad.net/bugs/1836635
efx_default_channel_want_txqs() is only used in efx.c, while
 efx_ptp_want_txqs() and efx_ptp_channel_type (a struct) are only used
 in ptp.c.  In all cases these symbols should be static.

Fixes: 2935e3c38228 ("sfc: on 8000 series use TX queues for TX timestamps")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
[ecree@solarflare.com: rewrote commit message]
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e7345ba352d15c88cf9d8698a6f5bff9c25670eb)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: add suffix to large constant in ptp
Bert Kenward [Fri, 26 Jan 2018 08:51:47 +0000 (08:51 +0000)]
sfc: add suffix to large constant in ptp

BugLink: https://bugs.launchpad.net/bugs/1836635
Fixes: 1280c0f8aafc ("sfc: support second + quarter ns time format for receive datapath")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5b09179e7fa2849a0c95d14bb69416693e0ed0c3)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: support Medford2 frequency adjustment format
Laurence Evans [Thu, 25 Jan 2018 17:28:04 +0000 (17:28 +0000)]
sfc: support Medford2 frequency adjustment format

BugLink: https://bugs.launchpad.net/bugs/1836635
Support increased precision frequency adjustment format (FP44) used
 by Medford2 adapters.

Signed-off-by: Laurence Evans <levans@solarflare.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 88a4fb5fce303c1ffd0e7863c01fc9e38f2e1717)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: support second + quarter ns time format for receive datapath
Edward Cree [Thu, 25 Jan 2018 17:27:40 +0000 (17:27 +0000)]
sfc: support second + quarter ns time format for receive datapath

BugLink: https://bugs.launchpad.net/bugs/1836635
The time_format that we stash in the PTP data structure is never
 referenced, so we can remove it.  Instead, store the information needed
 to interpret sync event timestamps.
Also rolls in a couple of other related minor PTP fixes.

Based on patches by Bert Kenward <bkenward@solarflare.com> and Laurence
 Evans <levans@solarflare.com>.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1280c0f8aafc4c09c59c576c8d50f367070b2619)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: support separate PTP and general timestamping
Laurence Evans [Thu, 25 Jan 2018 17:27:22 +0000 (17:27 +0000)]
sfc: support separate PTP and general timestamping

BugLink: https://bugs.launchpad.net/bugs/1836635
Support MC_CMD_PTP_OUT_GET_TIMESTAMP_CORRECTIONS_V2.  Extract general
 timestamp corrections in addition to PTP corrections.  Apply receive
 timestamp corrections for general datapath receive timestamping, and
 correspondingly for transmit.

Signed-off-by: Laurence Evans <levans@solarflare.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 04796f4c4dc4ac4c4f405c22e20dc9ae1068eea5)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: simplify RX datapath timestamping
Laurence Evans [Thu, 25 Jan 2018 17:27:02 +0000 (17:27 +0000)]
sfc: simplify RX datapath timestamping

BugLink: https://bugs.launchpad.net/bugs/1836635
Use timestamp conversion function with correction to avoid duplicate
 correction handling.

Signed-off-by: Laurence Evans <levans@solarflare.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c4f64fcc4d31e7f773cb4eec9d90c40ebb049c14)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: only advertise TX timestamping if we have the license for it
Martin Habets [Thu, 25 Jan 2018 17:26:31 +0000 (17:26 +0000)]
sfc: only advertise TX timestamping if we have the license for it

BugLink: https://bugs.launchpad.net/bugs/1836635
We check the license for TX hardware timestamping capability.
The PTP probe will have enabled PTP sync events from the adapter.  If
 later, at TX queue init, it turns out we do not have the license, we
 don't need the sync events either.

Signed-off-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6aa47c87cb053670bb636fb2001deb4a868f9486)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: on 8000 series use TX queues for TX timestamps
Edward Cree [Thu, 25 Jan 2018 17:26:06 +0000 (17:26 +0000)]
sfc: on 8000 series use TX queues for TX timestamps

BugLink: https://bugs.launchpad.net/bugs/1836635
For this we create and use one or more new TX queues on the PTP channel,
 and enable sync events for it.
Based on a patch by Martin Habets <mhabets@solarflare.com>.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2935e3c38228ad9bf073eeb0eedff5849eea63db)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: MAC TX timestamp handling on the 8000 series
Martin Habets [Thu, 25 Jan 2018 17:25:50 +0000 (17:25 +0000)]
sfc: MAC TX timestamp handling on the 8000 series

BugLink: https://bugs.launchpad.net/bugs/1836635
TX timestamps on 8000 series are supplied from the MAC. This timestamp is
 only 48 bits long. The high order bits from the last time sync event are
 used for the top 16 bits.

Signed-off-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c1d0d33946725775be1c68515c07d0ff8237d222)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: only enable TX timestamping if the adapter is licensed for it
Martin Habets [Thu, 25 Jan 2018 17:25:33 +0000 (17:25 +0000)]
sfc: only enable TX timestamping if the adapter is licensed for it

BugLink: https://bugs.launchpad.net/bugs/1836635
If we try to enable the feature and do not have the license for it, the
 MCPU will refuse and fail our TX queue init.

Signed-off-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 50663fe1808fcd08cc60c3adfa3692b27a51161d)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: use main datapath for HW timestamps if available
Martin Habets [Thu, 25 Jan 2018 17:25:15 +0000 (17:25 +0000)]
sfc: use main datapath for HW timestamps if available

BugLink: https://bugs.launchpad.net/bugs/1836635
We can now transmit SKBs in 2 ways:
1. Via the MC (for the 7XXX series and earlier), using
   efx_ptp_xmit_skb_mc().
2. Via the TX queues on the dedicated PTP channel (8XXX series and later),
   using efx_ptp_xmit_skb_queue().
The PTP worker thread uses the method set up at probe time. It never
 checked the return code from the old efx_ptp_xmit_skb(), so it now
 returns void.
We increment the TX dropped counter of the device if the transmit fails.

As a result of the probe per channel the remove gets called multiple times.
 Clean up efx->ptp_data properly to avoid the 2nd call blowing up.

Signed-off-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 23418dc131464ffe29c9ac2d71cf95bf2883fc4f)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: add function to determine which TX timestamping method to use
Martin Habets [Thu, 25 Jan 2018 17:24:56 +0000 (17:24 +0000)]
sfc: add function to determine which TX timestamping method to use

BugLink: https://bugs.launchpad.net/bugs/1836635
Use MC capability MC_CMD_GET_CAPABILITIES_V2_OUT_TX_MAC_TIMESTAMPING to
 detect whether the NIC supports timestamping packets sent out the main
 datapath.

Signed-off-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9c3afb33ae587723d2acda044a352670ec8d5b82)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: handle TX timestamps in the normal data path
Martin Habets [Thu, 25 Jan 2018 17:24:43 +0000 (17:24 +0000)]
sfc: handle TX timestamps in the normal data path

BugLink: https://bugs.launchpad.net/bugs/1836635
Before this work, TX timestamping is done by sending each SKB to the MC.
On the 8000 series (Medford1) we have high speed timestamping via the
 MAC, which means we can use normal TX queues for this without a
 significant drop in bandwidth.  On the X2000 series (Medford2) support
 for transmitting via the MC is removed, so the new way must be used.

This patch enables timestamping on a TX queue, if requested.
It also enhances TX event handling to process the extra completion events,
 and puts the time in the SKB.

Signed-off-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b9b603d46d5aad1fb66fa007759193e82a50c680)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: remove tx and MCDI handling from NAPI budget consideration
Bert Kenward [Thu, 25 Jan 2018 17:24:20 +0000 (17:24 +0000)]
sfc: remove tx and MCDI handling from NAPI budget consideration

BugLink: https://bugs.launchpad.net/bugs/1836635
The NAPI budget is only for RX processing work, not other work such as
 TX or MCDI completion handling.

Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5227ecccea2d645d253d243ad287169335a4ae64)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: add bits for 25/50/100G supported/advertised speeds
Edward Cree [Wed, 10 Jan 2018 18:00:25 +0000 (18:00 +0000)]
sfc: add bits for 25/50/100G supported/advertised speeds

BugLink: https://bugs.launchpad.net/bugs/1836635
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5abb5e7f916ee8d2d2543fb70edb2817284203cc)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: support the ethtool ksettings API properly so that 25/50/100G works
Edward Cree [Wed, 10 Jan 2018 18:00:14 +0000 (18:00 +0000)]
sfc: support the ethtool ksettings API properly so that 25/50/100G works

BugLink: https://bugs.launchpad.net/bugs/1836635
Store and handle ethtool link mode masks within the driver instead of
 just a single u32.  However, quite a significant amount of existing code
 wants to manipulate the masks directly, and thus now uses the first
 unsigned long (i.e. mask[0]) as though it were a legacy u32 mask.  This
 is ok because all the bits that code is interested in are in the first
 32 bits of the mask; but it might be a good idea to change them in
 future to use the proper bitmap API.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c2ab85d2daef42b1cdfd35f564cc40a392c88849)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: basic MCDI mapping of 25/50/100G link speeds
Edward Cree [Wed, 10 Jan 2018 17:59:59 +0000 (17:59 +0000)]
sfc: basic MCDI mapping of 25/50/100G link speeds

BugLink: https://bugs.launchpad.net/bugs/1836635
Only handles direct speed setting, not autoneg, because the driver is
 still trying to pretend it uses the legacy ethtool API which doesn't
 have advertised/supported bits for 25/50/100G.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 702b3d51369779a1ad5b03b24911ef6b0a6caa6b)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: expose CTPIO stats on NICs that support them
Bert Kenward [Thu, 21 Dec 2017 09:00:41 +0000 (09:00 +0000)]
sfc: expose CTPIO stats on NICs that support them

BugLink: https://bugs.launchpad.net/bugs/1836635
While the Linux driver doesn't use CTPIO ('cut-through programmed I/O'),
 other drivers on the same port might, so if we're responsible for
 reporting per-port stats we need to include the CTPIO stats.

Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2c0b6ee837dba6034ace78fcc58d2bc4f5d063c1)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: expose FEC stats on Medford2
Edward Cree [Thu, 21 Dec 2017 09:00:36 +0000 (09:00 +0000)]
sfc: expose FEC stats on Medford2

BugLink: https://bugs.launchpad.net/bugs/1836635
There's no explicit capability bit, so we just condition them on having
 efx->num_mac_stats >= MC_CMD_MAC_NSTATS_V2.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f411b54d6b60f7db97190fa378de4c147fa055c5)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: support variable number of MAC stats
Edward Cree [Thu, 21 Dec 2017 09:00:26 +0000 (09:00 +0000)]
sfc: support variable number of MAC stats

BugLink: https://bugs.launchpad.net/bugs/1836635
Medford2 NICs support more than MC_CMD_MAC_NSTATS stats, and report the new
 count in a field of MC_CMD_GET_CAPABILITIES_V4.  This also means that the
 end generation count moves (it is, as before, the last 64 bits of the DMA
 buffer, but that is no longer MC_CMD_MAC_GENERATION_END).
So read num_mac_stats from the GET_CAPABILITIES response, if present;
 otherwise assume MC_CMD_MAC_NSTATS; and always use num_mac_stats - 1 rather
 than MC_CMD_MAC_GENERATION_END.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c1be48214543c4e5267c43d2c00ac2d9bb671381)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: update MCDI protocol headers
Edward Cree [Thu, 21 Dec 2017 09:00:14 +0000 (09:00 +0000)]
sfc: update MCDI protocol headers

BugLink: https://bugs.launchpad.net/bugs/1836635
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d31a59662529f48d03a0d09d1c2ffb1197f6a1ca)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: populate the timer reload field
Bert Kenward [Mon, 18 Dec 2017 16:57:41 +0000 (16:57 +0000)]
sfc: populate the timer reload field

BugLink: https://bugs.launchpad.net/bugs/1836635
The timer mode register now has a separate field for the reload value.
Since we always use this timer with the reload (for interrupt moderation)
we set this to the same as the initial value.

Previous hardware ignores this field, so we can safely set these bits
on all hardware that uses this register.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0bc959a95e8c1ee0295d2b85538a2a32b7b87880)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: update EF10 register definitions
Bert Kenward [Mon, 18 Dec 2017 16:57:18 +0000 (16:57 +0000)]
sfc: update EF10 register definitions

BugLink: https://bugs.launchpad.net/bugs/1836635
The RX_L4_CLASS field has shrunk from 3 bits to 2 bits. The upper
bit was never used in previous hardware, so we can use the new
definition throughout.

The TSO OUTER_IPID field was previously spelt differently from the
external definitions.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d8d8ccf277419b6feb281a2d08d9f881b2b724be)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: improve PTP error reporting
Edward Cree [Mon, 18 Dec 2017 16:56:58 +0000 (16:56 +0000)]
sfc: improve PTP error reporting

BugLink: https://bugs.launchpad.net/bugs/1836635
Log a message if PTP probing fails; if we then, unexpectedly, get PTP
 events, only log a message for the first one on each device.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit acaef3c15612d7b0f5a4835f57e87a290e054839)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: add Medford2 (SFC9250) PCI Device IDs
Edward Cree [Mon, 18 Dec 2017 16:56:34 +0000 (16:56 +0000)]
sfc: add Medford2 (SFC9250) PCI Device IDs

BugLink: https://bugs.launchpad.net/bugs/1836635
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit aae5a31663fe2683a6ec1bce00b1f8ac9c7fb249)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: support VI strides other than 8k
Edward Cree [Mon, 18 Dec 2017 16:56:19 +0000 (16:56 +0000)]
sfc: support VI strides other than 8k

BugLink: https://bugs.launchpad.net/bugs/1836635
Medford2 can also have 16k or 64k VI stride.  This is reported by MCDI in
 GET_CAPABILITIES, which fortunately is called before the driver does
 anything sensitive to the VI stride (such as accessing or even allocating
 VIs past the zeroth).

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 71827443017789da691b402090c6be6138f43157)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agosfc: make mem_bar a function rather than a constant
Edward Cree [Mon, 18 Dec 2017 16:55:50 +0000 (16:55 +0000)]
sfc: make mem_bar a function rather than a constant

BugLink: https://bugs.launchpad.net/bugs/1836635
Support using BAR 0 on SFC9250, even though the driver doesn't bind to such
 devices yet.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 03714bbb22ebe00bc07d83c526b16377c67daa3f)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agoKVM: s390: enable MSA9 keywrapping functions depending on cpu model
Christian Borntraeger [Tue, 16 Jul 2019 06:54:00 +0000 (08:54 +0200)]
KVM: s390: enable MSA9 keywrapping functions depending on cpu model

BugLink: https://bugs.launchpad.net/bugs/1836153
Instead of adding a new machine option to disable/enable the keywrapping
options of pckmo (like for AES and DEA) we can now use the CPU model to
decide. As ECC is also wrapped with the AES key we need that to be
enabled.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
(cherry picked from commit 8ec2fa52eac53bff7ef1cedbc4ad8af650ec937c)
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agoKVM: s390: add deflate conversion facilty to cpu model
Christian Borntraeger [Tue, 16 Jul 2019 06:54:00 +0000 (08:54 +0200)]
KVM: s390: add deflate conversion facilty to cpu model

BugLink: https://bugs.launchpad.net/bugs/1836153
This enables stfle.151 and adds the subfunctions for DFLTCC. Bit 151 is
added to the list of facilities that will be enabled when there is no
cpu model involved as DFLTCC requires no additional handling from
userspace, e.g. for migration.

Please note that a cpu model enabled user space can and will have the
final decision on the facility bits for a guests.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
(cherry picked from commit 4f45b90e1c03466202fca7f62eaf32243f220830)
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agoKVM: s390: add enhanced sort facilty to cpu model
Christian Borntraeger [Tue, 16 Jul 2019 06:54:00 +0000 (08:54 +0200)]
KVM: s390: add enhanced sort facilty to cpu model

BugLink: https://bugs.launchpad.net/bugs/1836153
This enables stfle.150 and adds the subfunctions for SORTL. Bit 150 is
added to the list of facilities that will be enabled when there is no
cpu model involved as sortl requires no additional handling from
userspace, e.g. for migration.

Please note that a cpu model enabled user space can and will have the
final decision on the facility bits for a guests.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
(cherry picked from commit 173aec2d5a9fa5f40e462661a8283fcafe04764f)
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agoKVM: s390: provide query function for instructions returning 32 byte
Christian Borntraeger [Tue, 16 Jul 2019 06:54:00 +0000 (08:54 +0200)]
KVM: s390: provide query function for instructions returning 32 byte

BugLink: https://bugs.launchpad.net/bugs/1836153
Some of the new features have a 32byte response for the query function.
Provide a new wrapper similar to __cpacf_query. We might want to factor
this out if other users come up, as of today there is none. So let us
keep the function within KVM.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
(cherry picked from commit d668139718a9e2260702777bd8d86d71c30b6539)
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agoKVM: s390: add MSA9 to cpumodel
Christian Borntraeger [Tue, 16 Jul 2019 06:54:00 +0000 (08:54 +0200)]
KVM: s390: add MSA9 to cpumodel

BugLink: https://bugs.launchpad.net/bugs/1836153
This enables stfle.155 and adds the subfunctions for KDSA. Bit 155 is
added to the list of facilities that will be enabled when there is no
cpu model involved as MSA9 requires no additional handling from
userspace, e.g. for migration.

Please note that a cpu model enabled user space can and will have the
final decision on the facility bits for a guests.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
(cherry picked from commit 13209ad0395c4de7fa48108b1dac72e341d5c089)
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agoKVM: s390: add vector BCD enhancements facility to cpumodel
Christian Borntraeger [Tue, 16 Jul 2019 06:54:00 +0000 (08:54 +0200)]
KVM: s390: add vector BCD enhancements facility to cpumodel

BugLink: https://bugs.launchpad.net/bugs/1836153
If vector support is enabled, the vector BCD enhancements facility
might also be enabled.
We can directly forward this facility to the guest if available
and VX is requested by user space.

Please note that user space can and will have the final decision
on the facility bits for a guests.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
(cherry picked from commit d5cb6ab1e3d4d7e0648a167f6290e89f6e86964e)
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agoKVM: s390: add vector enhancements facility 2 to cpumodel
Christian Borntraeger [Tue, 16 Jul 2019 06:54:00 +0000 (08:54 +0200)]
KVM: s390: add vector enhancements facility 2 to cpumodel

BugLink: https://bugs.launchpad.net/bugs/1836153
If vector support is enabled, the vector enhancements facility 2
might also be enabled.
We can directly forward this facility to the guest if available
and VX is requested by user space.

Please note that user space can and will have the final decision
on the facility bits for a guests.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
(cherry picked from commit 7832e91cd33f21f3cf82b003478c292915a1ec14)
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agoKVM: s390: implement subfunction processor calls
Christian Borntraeger [Tue, 16 Jul 2019 06:54:00 +0000 (08:54 +0200)]
KVM: s390: implement subfunction processor calls

BugLink: https://bugs.launchpad.net/bugs/1836153
While we will not implement interception for query functions yet, we can
and should disable functions that have a control bit based on the given
CPU model.

Let us start with enabling the subfunction interface.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 346fa2f891c71a9b98014f8f62c15f4c7dd95ec1)
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agoKVM: s390: add debug logging for cpu model subfunctions
Christian Borntraeger [Tue, 16 Jul 2019 06:54:00 +0000 (08:54 +0200)]
KVM: s390: add debug logging for cpu model subfunctions

BugLink: https://bugs.launchpad.net/bugs/1836153
As userspace can now get/set the subfunctions we want to trace those.
This will allow to also check QEMUs cpu model vs. what the real
hardware provides.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
(cherry picked from commit 11ba5961a2156a4f210627ed8421387e2531b100)
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
4 years agoUBUNTU: SAUCE: e1000e: disable force K1-off feature
Kai-Heng Feng [Thu, 11 Jul 2019 07:16:46 +0000 (15:16 +0800)]
UBUNTU: SAUCE: e1000e: disable force K1-off feature

BugLink: https://bugs.launchpad.net/bugs/1836152
Forwardport from http://mails.dpdk.org/archives/dev/2016-November/050658.html

MAC-PHY desync may occur causing misdetection of link up event.
Disabling K1-off feature can work around the problem.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204057

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
(cherry picked from commit 3a818fd5094bd988b371228b12ed33531d727d15 git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git dev-queue)
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoUBUNTU: SAUCE: e1000e: add workaround for possible stalled packet
Kai-Heng Feng [Thu, 11 Jul 2019 07:16:45 +0000 (15:16 +0800)]
UBUNTU: SAUCE: e1000e: add workaround for possible stalled packet

BugLink: https://bugs.launchpad.net/bugs/1836152
Forwardport from http://mails.dpdk.org/archives/dev/2016-November/050657.html

This works around a possible stalled packet issue, which may occur due to
clock recovery from the PCH being too slow, when the LAN is transitioning
from K1 at 1G link speed.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204057

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
(cherry picked from commit 82f7de996433eee486f1acb37ad9047b431ec13d git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git dev-queue)
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoUBUNTU: SAUCE: e1000e: Make watchdog use delayed work
Detlev Casanova [Thu, 11 Jul 2019 09:43:17 +0000 (17:43 +0800)]
UBUNTU: SAUCE: e1000e: Make watchdog use delayed work

BugLink: https://bugs.launchpad.net/bugs/1836177
Use delayed work instead of timers to run the watchdog of the e1000e
driver.

Simplify the code with one less middle function.

Signed-off-by: Detlev Casanova <detlev.casanova@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(backported from commit 59653e6497d16f7ac1d9db088f3959f57ee8c3db git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git dev-queue)
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoPCI: Enable NVIDIA HDA controllers
Lukas Wunner [Fri, 12 Jul 2019 06:34:30 +0000 (14:34 +0800)]
PCI: Enable NVIDIA HDA controllers

BugLink: https://bugs.launchpad.net/bugs/1836308
Many NVIDIA GPUs can be configured as either a single-function video device
or a multi-function device with video at function 0 and an HDA audio
controller at function 1.  The HDA controller can be enabled or disabled by
a bit in the function 0 config space.

Some BIOSes leave the HDA disabled, which means the HDMI connector from the
NVIDIA GPU may not work.  Sometimes the BIOS enables the HDA if an HDMI
cable is connected at boot time, but that doesn't handle hotplug cases.

Enable the HDA controller on device enumeration and resume and re-read the
header type, which tells us whether the GPU is a multi-function device.

This quirk is limited to NVIDIA PCI devices with the VGA Controller device
class.  This is expected to correspond to product configurations where the
NVIDIA GPU has connectors attached.  Other products where the device class
is 3D Controller are expected to correspond to configurations where the
NVIDIA GPU is dedicated (dGPU) and has no connectors.  See original post
(URL below) for more details.

This commit takes inspiration from an earlier patch by Daniel Drake.

Link: https://lore.kernel.org/r/20190708051744.24039-1-drake@endlessm.com
Link: https://devtalk.nvidia.com/default/topic/1024022
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75985
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Daniel Drake <drake@endlessm.com>
[bhelgaas: commit log, log message, return early if already enabled]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Aaron Plattner <aplattner@nvidia.com>
Cc: Peter Wu <peter@lekensteyn.nl>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Maik Freudenberg <hhfeuer@gmx.de>
(backported from commit b678f90a1a6f8899542ea7bb190e268fb2d6ea0d git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git pci/misc)
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoselftests/powerpc: Fix Makefiles for headers_install change
Michael Ellerman [Tue, 16 Jul 2019 11:00:29 +0000 (19:00 +0800)]
selftests/powerpc: Fix Makefiles for headers_install change

BugLink: https://bugs.launchpad.net/bugs/1836715
Commit b2d35fa5fc80 ("selftests: add headers_install to lib.mk")
introduced a requirement that Makefiles more than one level below the
selftests directory need to define top_srcdir, but it didn't update
any of the powerpc Makefiles.

This broke building all the powerpc selftests with eg:

  make[1]: Entering directory '/src/linux/tools/testing/selftests/powerpc'
  BUILD_TARGET=/src/linux/tools/testing/selftests/powerpc/alignment; mkdir -p $BUILD_TARGET; make OUTPUT=$BUILD_TARGET -k -C alignment all
  make[2]: Entering directory '/src/linux/tools/testing/selftests/powerpc/alignment'
  ../../lib.mk:20: ../../../../scripts/subarch.include: No such file or directory
  make[2]: *** No rule to make target '../../../../scripts/subarch.include'.
  make[2]: Failed to remake makefile '../../../../scripts/subarch.include'.
  Makefile:38: recipe for target 'alignment' failed

Fix it by setting top_srcdir in the affected Makefiles.

Fixes: b2d35fa5fc80 ("selftests: add headers_install to lib.mk")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 7e0cf1c983b5b24426d130fd949a055d520acc9a)
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoselftests/powerpc: Remove Power9 paste tests
Michael Ellerman [Tue, 16 Jul 2019 11:00:28 +0000 (19:00 +0800)]
selftests/powerpc: Remove Power9 paste tests

BugLink: https://bugs.launchpad.net/bugs/1836715
Paste on POWER9 only works to accelerators and not on real memory. So
these tests just generate a SIGILL.

So just delete them.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 525661ef8040e099d191f5f35defaed342e6859b)
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoixgbevf: Use cached link state instead of re-reading the value for ethtool
Alexander Duyck [Tue, 16 Jul 2019 20:13:42 +0000 (17:13 -0300)]
ixgbevf: Use cached link state instead of re-reading the value for ethtool

BugLink: https://bugs.launchpad.net/bugs/1836760
Change the ethtool link settings call to just read the cached state out of
the adapter structure instead of trying to recheck the value from the PF.
Doing this should prevent excessive reading of the mailbox.

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Reviewed-by: "Guilherme G. Piccoli" <gpiccoli@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 1e1b0c658d9bb364b4a2a4b08a760d3e4c239bdc)
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agonetfilter: nf_nat: skip nat clash resolution for same-origin entries
Martynas Pumputis [Wed, 17 Jul 2019 00:30:41 +0000 (12:30 +1200)]
netfilter: nf_nat: skip nat clash resolution for same-origin entries

BugLink: https://bugs.launchpad.net/bugs/1836816
It is possible that two concurrent packets originating from the same
socket of a connection-less protocol (e.g. UDP) can end up having
different IP_CT_DIR_REPLY tuples which results in one of the packets
being dropped.

To illustrate this, consider the following simplified scenario:

1. Packet A and B are sent at the same time from two different threads
   by same UDP socket.  No matching conntrack entry exists yet.
   Both packets cause allocation of a new conntrack entry.
2. get_unique_tuple gets called for A.  No clashing entry found.
   conntrack entry for A is added to main conntrack table.
3. get_unique_tuple is called for B and will find that the reply
   tuple of B is already taken by A.
   It will allocate a new UDP source port for B to resolve the clash.
4. conntrack entry for B cannot be added to main conntrack table
   because its ORIGINAL direction is clashing with A and the REPLY
   directions of A and B are not the same anymore due to UDP source
   port reallocation done in step 3.

This patch modifies nf_conntrack_tuple_taken so it doesn't consider
colliding reply tuples if the IP_CT_DIR_ORIGINAL tuples are equal.

[ Florian: simplify patch to not use .allow_clash setting
  and always ignore identical flows ]

Signed-off-by: Martynas Pumputis <martynas@weave.works>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit 4e35c1cb9460240e983a01745b5f29fe3a4d8e39)
Signed-off-by: Matthew Ruffell <matthew.ruffell@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agonetfilter: nf_conntrack: resolve clash for matching conntracks
Martynas Pumputis [Wed, 17 Jul 2019 00:30:40 +0000 (12:30 +1200)]
netfilter: nf_conntrack: resolve clash for matching conntracks

BugLink: https://bugs.launchpad.net/bugs/1836816
This patch enables the clash resolution for NAT (disabled in
"590b52e10d41") if clashing conntracks match (i.e. both tuples are equal)
and a protocol allows it.

The clash might happen for a connections-less protocol (e.g. UDP) when
two threads in parallel writes to the same socket and consequent calls
to "get_unique_tuple" return the same tuples (incl. reply tuples).

In this case it is safe to perform the resolution, as the losing CT
describes the same mangling as the winning CT, so no modifications to
the packet are needed, and the result of rules traversal for the loser's
packet stays valid.

Signed-off-by: Martynas Pumputis <martynas@weave.works>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit ed07d9a021df6da53456663a76999189badc432a)
Signed-off-by: Matthew Ruffell <matthew.ruffell@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agocrypto: ecdh - add public key verification test
Stephan Mueller [Wed, 17 Jul 2019 09:02:42 +0000 (11:02 +0200)]
crypto: ecdh - add public key verification test

According to SP800-56A section 5.6.2.1, the public key to be processed
for the ECDH operation shall be checked for appropriateness. When the
public key is considered to be an ephemeral key, the partial validation
test as defined in SP800-56A section 5.6.2.3.4 can be applied.

The partial verification test requires the presence of the field
elements of a and b. For the implemented NIST curves, b is defined in
FIPS 186-4 appendix D.1.2. The element a is implicitly given with the
Weierstrass equation given in D.1.2 where a = p - 3.

Without the test, the NIST ACVP testing fails. After adding this check,
the NIST ACVP testing passes.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
CVE-2018-5383

(cherry picked from commit ea169a30a6bf6782a05a51d2b9cf73db151eab8b)
Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agosched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
Phil Auld [Thu, 18 Jul 2019 03:05:59 +0000 (15:05 +1200)]
sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup

BugLink: https://bugs.launchpad.net/bugs/1836971
[ Upstream commit 2e8e19226398db8265a8e675fcc0118b9e80c9e8 ]

With extremely short cfs_period_us setting on a parent task group with a large
number of children the for loop in sched_cfs_period_timer() can run until the
watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
will ever return 0.  The large number of children can make
do_sched_cfs_period_timer() take longer than the period.

 NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
 RIP: 0010:tg_nop+0x0/0x10
  <IRQ>
  walk_tg_tree_from+0x29/0xb0
  unthrottle_cfs_rq+0xe0/0x1a0
  distribute_cfs_runtime+0xd3/0xf0
  sched_cfs_period_timer+0xcb/0x160
  ? sched_cfs_slack_timer+0xd0/0xd0
  __hrtimer_run_queues+0xfb/0x270
  hrtimer_interrupt+0x122/0x270
  smp_apic_timer_interrupt+0x6a/0x140
  apic_timer_interrupt+0xf/0x20
  </IRQ>

To prevent this we add protection to the loop that detects when the loop has run
too many times and scales the period and quota up, proportionally, so that the timer
can complete before then next period expires.  This preserves the relative runtime
quota while preventing the hard lockup.

A warning is issued reporting this state and the new values.

Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from d069fe4844f8d799d771659a745fe91870c93fda 4.14.y)
Signed-off-by: Matthew Ruffell <matthew.ruffell@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
4 years agoUBUNTU: update dkms package versions
Stefan Bader [Wed, 17 Jul 2019 10:20:41 +0000 (12:20 +0200)]
UBUNTU: update dkms package versions

BugLink: https://bugs.launchpad.net/bugs/1834479
Adjusting package versions to bionic/updates.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
4 years agoUBUNTU: [Packaging] Add update-version-dkms
Stefan Bader [Wed, 17 Jul 2019 10:19:04 +0000 (12:19 +0200)]
UBUNTU: [Packaging] Add update-version-dkms

BugLink: https://bugs.launchpad.net/bugs/1834479
This script is used to update debian/dkms-versions according
to versions in a given pocket. Copied directly from Disco.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>