git.proxmox.com Git - mirror_ubuntu-bionic-kernel.git/log

UBUNTU: SAUCE: net: ena: fix crash during ena_remove()

BugLink: http://bugs.launchpad.net/bugs/1802341
In ena_remove() we have the following stack call:
ena_remove()
  unregister_netdev()
  ena_destroy_device()
    netif_carrier_off()

Calling netif_carrier_off() causes linkwatch to try to handle the
link change event on the already unregistered netdev, which leads
to a read from an unreadable memory address.

This patch switches the order of the two functions, so that
netif_carrier_off() is called on a regiestered netdev.

To accomplish this fix we also had to:
1. Remove the set bit ENA_FLAG_TRIGGER_RESET
2. Add a sanitiy check in ena_close()
both to prevent double device reset (when calling unregister_netdev()
ena_close is called, but the device was already deleted in
ena_destroy_device()).
3. Set the admin_queue running state to false to avoid using it after
device was reset (for example when calling ena_destroy_all_io_queues()
right after ena_com_dev_reset() in ena_down)

Finally, driver version is also updated.

Change-Id: I3cc1aafe9cb3701a6eaee44e00add0e175c93148

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

s390/qeth: sanitize strings in debug messages

BugLink: https://bugs.launchpad.net/bugs/1797367
As Documentation/s390/s390dbf.txt states quite clearly, using any
pointer in sprinf-formatted s390dbf debug entries is dangerous.
The pointers are dereferenced whenever the trace file is read from.
So if the referenced data has a shorter life-time than the trace file,
any read operation can result in a use-after-free.

So rip out all hazardous use of indirect data, and replace any usage of
dev_name() and such by the Bus ID number.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(backported from commit e19e5be8b4cafa8b3f8b0cd1b1dfe20fa0145b83)
[Adjusted for different text in last hunk of qeth_l2_main.c]
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

s390/qeth: reduce hard-coded access to ccw channels

BugLink: https://bugs.launchpad.net/bugs/1797367
Where possible use accessor macros and local pointers to access the ccw
channels. This makes it less likely to miss a spot.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 750b162598ec5b65cdb44d18f050b45cb7f8d31b)
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

s390/qeth: remove outdated portname debug msg

BugLink: https://bugs.launchpad.net/bugs/1797367
The 'portname' attribute is deprecated and setting it has no effect.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d857e11193a24d6623bb562e9b26cde582bd877f)
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

s390/qeth: avoid using is_multicast_ether_addr_64bits on (u8 *)[6]

BugLink: https://bugs.launchpad.net/bugs/1797367
*ether_addr*_64bits functions have been introduced to optimize
performance critical paths, which access 6-byte ethernet address as u64
value to get "nice" assembly. A harmless hack works nicely on ethernet
addresses shoved into a structure or a larger buffer, until busted by
Kasan on smth like plain (u8 *)[6].

qeth_l2_set_mac_address calls qeth_l2_remove_mac passing
u8 old_addr[ETH_ALEN] as an argument.

Adding/removing macs for an ethernet adapter is not that performance
critical. Moreover is_multicast_ether_addr_64bits itself on s390 is not
faster than is_multicast_ether_addr:

is_multicast_ether_addr(%r2) -> %r2
llc %r2,0(%r2)
risbg %r2,%r2,63,191,0

is_multicast_ether_addr_64bits(%r2) -> %r2
llgc %r2,0(%r2)
risbg %r2,%r2,63,191,0

So, let's just use is_multicast_ether_addr instead of
is_multicast_ether_addr_64bits.

Fixes: bcacfcbc82b4 ("s390/qeth: fix MAC address update sequence")
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9d0a58fb9747afd27d490c02a97889a1b59f6be4)
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

s390/qeth: consolidate qeth MAC address helpers

BugLink: https://bugs.launchpad.net/bugs/1797367
For adding/removing a MAC address, use just one helper each that
handles both unicast and multicast.
Saves one level of indirection for multicast addresses, while improving
the error reporting for unicast addresses.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8174aa8aceefd3f97aebe6cc428cc3fd7b6ac2fa)
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

s390/qeth: don't keep track of MAC address's cast type

BugLink: https://bugs.launchpad.net/bugs/1797367
Instead of tracking the uc/mc state in each MAC address object, just
check the multicast bit in the address itself.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4641b027f7c32ea51db3acd6dcf97435c2385970)
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

s390/zcrypt: Show load of cards and queues in sysfs

BugLink: https://bugs.launchpad.net/bugs/1799184
Show the current load value of cards and queues in sysfs.
The load value for each card and queue is maintained by
the zcrypt device driver for dispatching and load
balancing requests over the available devices.

This patch provides the load value to userspace via a
new read only sysfs attribute 'load' per card and queue.

Signed-off-by: Harald Freudenberger <freude@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(backported from commit 4a07750ba8f3f45f0be730f7370c2c21a7491cd7)
[Minor context adjustments]
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

s390/zcrypt: remove unused functions and declarations

BugLink: https://bugs.launchpad.net/bugs/1799184
The AP bus code is not available as kernel module any more.
There was some leftover code dealing with kernel module
exit which has been removed with this patch.

Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(backported from commit 2c957a8ad45991f3ef71da5c75ed2299f3d46a31)
[Context adjustments]
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

s390/zcrypt: hex string mask improvements for apmask and aqmask.

BugLink: https://bugs.launchpad.net/bugs/1799184
The sysfs attributes /sys/bus/ap/apmask and /sys/bus/ap/aqmask
and the kernel command line arguments ap.apm and ap.aqm get
an improvement of the value parsing with this patch:

The mask values are bitmaps in big endian order starting with bit 0.
So adapter number 0 is the leftmost bit, mask is 0x8000... The sysfs
attributes and the kernel command line accept 2 different formats:
- Absolute hex string starting with 0x like "0x12345678" does set
   the mask starting from left to right. If the given string is shorter
   than the mask it is padded with 0s on the right. If the string is
   longer than the mask an error comes back (EINVAL).
- Relative format - a concatenation (done with ',') of the terms
   +<bitnr>[-<bitnr>] or -<bitnr>[-<bitnr>]. <bitnr> may be any
   valid number (hex, decimal or octal) in the range 0...255.
   Here are some examples:
     "+0-15,+32,-128,-0xFF"
     "-0-255,+1-16,+0x128"

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit 3d8f60d38e249f989a7fca9c2370c31c3d5487e1)
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

s390/zcrypt: AP bus support for alternate driver(s)

BugLink: https://bugs.launchpad.net/bugs/1799184
The current AP bus, AP devices and AP device drivers implementation
uses a clearly defined mapping for binding AP devices to AP device
drivers. So for example a CEX6C queue will always be bound to the
cex4queue device driver.

The Linux Device Driver model has no sensitivity for more than one
device driver eligible for one device type. If there exist more than
one drivers matching to the device type, simple all drivers are tried
consecutively.  There is no way to determine and influence the probing
order of the drivers.

With KVM there is a need to provide additional device drivers matching
to the very same type of AP devices. With a simple implementation the
KVM drivers run in competition to the regular drivers. Whichever
'wins' a device depends on build order and implementation details
within the common Linux Device Driver Model and is not
deterministic. However, a userspace process could figure out which
device should be bound to which driver and sort out the correct
binding by manipulating attributes in the sysfs.

If for security reasons a AP device must not get bound to the 'wrong'
device driver the sorting out has to be done within the Linux kernel
by the AP bus code. This patch modifies the behavior of the AP bus
for probing drivers for devices in a way that two sets of drivers are
usable. Two new bitmasks 'apmask' and 'aqmask' are used to mark a
subset of the APQN range for 'usable by the ap bus and the default
drivers' or 'not usable by the default drivers and thus available for
alternate drivers like vfio-xxx'. So an APQN which is addressed by
this masking only the default drivers will be probed. In contrary an
APQN which is not addressed by the masks will never be probed and
bound to default drivers but onny to alternate drivers.

Eventually the two masks give a way to divide the range of APQNs into
two pools: one pool of APQNs used by the AP bus and the default
drivers and thus via zcrypt drivers available to the userspace of the
system. And another pool where no zcrypt drivers are bound to and
which can be used by alternate drivers (like vfio-xxx) for their
needs. This division is hot-plug save and makes sure a APQN assigned
to an alternate driver is at no time somehow exploitable by the wrong
party.

The two masks are located in sysfs at /sys/bus/ap/apmask and
/sys/bus/ap/aqmask.  The mask syntax is exactly the same as the
already existing mask attributes in the /sys/bus/ap directory (for
example ap_usage_domain_mask and ap_control_domain_mask).

By default all APQNs belong to the ap bus and the default drivers:

  cat /sys/bus/ap/apmask
  0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
  cat /sys/bus/ap/aqmask
  0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

The masks can be changed at boot time with the kernel command line
like this:

  ... ap.apmask=0xffff ap.aqmask=0x40

This would give these two pools:

  default drivers pool:    adapter 0 - 15, domain 1
  alternate drivers pool:  adapter 0 - 15, all but domain 1
   adapter 16-255, all domains

The sysfs attributes for this two masks are writeable and an
administrator is able to reconfigure the assignements on the fly by
writing new mask values into.  With changing the mask(s) a revision of
the existing queue to driver bindings is done. So all APQNs which are
bound to the 'wrong' driver are reprobed via kernel function
device_reprobe() and thus the new correct driver will be assigned with
respect of the changed apmask and aqmask bits.

The mask values are bitmaps in big endian order starting with bit 0.
So adapter number 0 is the leftmost bit, mask is 0x8000... The sysfs
attributes accept 2 different formats:
- Absolute hex string starting with 0x like "0x12345678" does set
  the mask starting from left to right. If the given string is shorter
  than the mask it is padded with 0s on the right. If the string is
  longer than the mask an error comes back (EINVAL).
- '+' or '-' followed by a numerical value. Valid examples are "+1",
  "-13", "+0x41", "-0xff" and even "+0" and "-0". Only the addressed
  bit in the mask is switched on ('+') or off ('-').

This patch will also be the base for an upcoming extension to the
zcrypt drivers to be able to provide additional zcrypt device nodes
with filtering based on ap and aq masks.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(backported from commit 7e0bdbe5c21cb8316a694e46ad5aad339f6894a6)
[Minor context adjustments]
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

s390/zcrypt: code beautify

BugLink: https://bugs.launchpad.net/bugs/1799184
Code beautify by following most of the checkpatch suggestions:
- SPDX license identifier line complains by checkpatch
- missing space or newline complains by checkpatch
- octal numbers for permssions complains by checkpatch
- renaming of static sysfs functions complains by checkpatch
- fix of block comment complains by checkpatch
- fix printf like calls where function name instead of %s __func__
was used
- __packed instead of __attribute__((packed))
- init to zero for static variables removed
- use of DEVICE_ATTR_RO and DEVICE_ATTR_RW macros

No functional code changes or API changes!

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(backported from commit ac2b96f351d7d222c46e524feca03005f3fa8d75)
[zcrypt_queue.c,zcrypt_card.c: dropped changes around load_show
as that function does not exist in Bionic]
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

x86/speculation: Support Enhanced IBRS on future CPUs

BugLink: https://launchpad.net/bugs/1786139
Future Intel processors will support "Enhanced IBRS" which is an "always
on" mode i.e. IBRS bit in SPEC_CTRL MSR is enabled once and never
disabled.

From the specification [1]:

"With enhanced IBRS, the predicted targets of indirect branches
  executed cannot be controlled by software that was executed in a less
  privileged predictor mode or on another logical processor. As a
  result, software operating on a processor with enhanced IBRS need not
  use WRMSR to set IA32_SPEC_CTRL.IBRS after every transition to a more
  privileged predictor mode. Software can isolate predictor modes
  effectively simply by setting the bit once. Software need not disable
  enhanced IBRS prior to entering a sleep state such as MWAIT or HLT."

If Enhanced IBRS is supported by the processor then use it as the
preferred spectre v2 mitigation mechanism instead of Retpoline. Intel's
Retpoline white paper [2] states:

"Retpoline is known to be an effective branch target injection (Spectre
  variant 2) mitigation on Intel processors belonging to family 6
  (enumerated by the CPUID instruction) that do not have support for
  enhanced IBRS. On processors that support enhanced IBRS, it should be
  used for mitigation instead of retpoline."

The reason why Enhanced IBRS is the recommended mitigation on processors
which support it is that these processors also support CET which
provides a defense against ROP attacks. Retpoline is very similar to ROP
techniques and might trigger false positives in the CET defense.

If Enhanced IBRS is selected as the mitigation technique for spectre v2,
the IBRS bit in SPEC_CTRL MSR is set once at boot time and never
cleared. Kernel also has to make sure that IBRS bit remains set after
VMEXIT because the guest might have cleared the bit. This is already
covered by the existing x86_spec_ctrl_set_guest() and
x86_spec_ctrl_restore_host() speculation control functions.

Enhanced IBRS still requires IBPB for full mitigation.

[1] Speculative-Execution-Side-Channel-Mitigations.pdf
[2] Retpoline-A-Branch-Target-Injection-Mitigation.pdf
Both documents are available at:
https://bugzilla.kernel.org/show_bug.cgi?id=199511

Originally-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim C Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/1533148945-24095-1-git-send-email-sai.praneeth.prakhya@intel.com
(backported from commit 706d51681d636a0c4a5ef53395ec3b803e45ed4d)
[tyhicks: Minor context change and properly place the check in cpu_set_bug_bits()]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

x86/speculation: Remove SPECTRE_V2_IBRS in enum spectre_v2_mitigation

BugLink: https://launchpad.net/bugs/1786139
SPECTRE_V2_IBRS in enum spectre_v2_mitigation is never used. Remove it.

Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Cc: dwmw2@amazon.co.uk
Cc: konrad.wilk@oracle.com
Cc: bp@suse.de
Cc: zhong.weidong@zte.com.cn
Link: https://lkml.kernel.org/r/1531872194-39207-1-git-send-email-jiang.biao2@zte.com.cn
(cherry picked from commit d9f4426c73002957be5dd39936f44a09498f7560)
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

Fix kexec forbidding kernels signed with keys in the secondary keyring to boot

BugLink: https://bugs.launchpad.net/bugs/1798441
The split of .system_keyring into .builtin_trusted_keys and
.secondary_trusted_keys broke kexec, thereby preventing kernels signed by
keys which are now in the secondary keyring from being kexec'd.

Fix this by passing VERIFY_USE_SECONDARY_KEYRING to
verify_pefile_signature().

Fixes: d3bfe84129f6 ("certs: Add a secondary system keyring that can be added to dynamically")
Signed-off-by: Yannik Sembritzki <yannik@sembritzki.me>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: kexec@lists.infradead.org
Cc: keyrings@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(backported from commit ea93102f32244e3f45c8b26260be77ed0cc1d16c)
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

UBUNTU: SAUCE: overlayfs: ensure mounter privileges when reading directories

BugLink: https://launchpad.net/bugs/1793458
When reading directory contents ensure the mounter has permissions for
the operation over the constituent parts (lower and upper). Where we are
in a namespace this ensures that the mounter (root in that namespace)
has permissions over the files and directories, preventing exposure of
protected files and directory contents.

CVE-2018-6559

Signed-off-by: Andy Whitcroft <apw@canonical.com>
[tyhicks: make use of new upstream check in ovl_permission() for copy-ups]
[tyhicks: make use of creator (mounter) creds hanging off the super block]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

net: ena: fix compilation error in xtensa architecture

BugLink: http://bugs.launchpad.net/bugs/1798182
linux/prefetch.h is never explicitly included in ena_com, although
functions from it, such as prefetchw(), are used throughout ena_com.
This is an inclusion bug, and we fix it here by explicitly including
linux/prefetch.h. The bug was exposed when the driver was compiled
for the xtensa architecture.

Fixes: 689b2bdaaa14 ("net: ena: add functions for handling Low Latency Queues in ena_com")
Fixes: 8c590f977638 ("ena: Fix Kconfig dependency on X86")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 00f17a8219f02139119d8b4547e032bf4888fa0d net-next)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

net: ena: enable Low Latency Queues

BugLink: http://bugs.launchpad.net/bugs/1798182
Use the new API to enable usage of LLQ.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9fd255928d7ffb56d8466fab3331d0b2f40aa8c7 linux-next)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

net: ena: Fix Kconfig dependency on X86

BugLink: http://bugs.launchpad.net/bugs/1798182
The Kconfig limitation of X86 is to too wide.
The ENA driver only requires a little endian dependency.

Change the dependency to be on little endian CPU.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8c590f9776386b8f697fd0b7ed6142ae6e3de79e linux-next)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

net: ena: fix indentations in ena_defs for better readability

BugLink: http://bugs.launchpad.net/bugs/1798182
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit be26667cb3947c90322467f1d15ad86b02350e00 linux-next)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

net: ena: update driver version to 2.0.1

BugLink: http://bugs.launchpad.net/bugs/1798182
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3a7b9d8ddd200bdafaa3ef75b8544d2403eaa03b linux-next)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

net: ena: remove redundant parameter in ena_com_admin_init()

BugLink: http://bugs.launchpad.net/bugs/1798182
Remove redundant spinlock acquire parameter from ena_com_admin_init()

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f1e90f6e2c1fb0e491f910540314015324fed1e2 linux-next)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

net: ena: change rx copybreak default to reduce kernel memory pressure

BugLink: http://bugs.launchpad.net/bugs/1798182
Improves socket memory utilization when receiving packets larger
than 128 bytes (the previous rx copybreak) and smaller than 256 bytes.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 87731f0c681c9682c5521e5197d89e561b7da395 linux-next)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

net: ena: limit refill Rx threshold to 256 to avoid latency issues

BugLink: http://bugs.launchpad.net/bugs/1798182
Currently Rx refill is done when the number of required descriptors is
above 1/8 queue size. With a default of 1024 entries per queue the
threshold is 128 descriptors.
There is intention to increase the queue size to 8196 entries.
In this case threshold of 1024 descriptors is too large and can hurt
latency.
Add another limitation to Rx threshold to be at most 256 descriptors.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0574bb806dad29a3dada0ee42b01645477d48282 linux-next)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

net: ena: explicit casting and initialization, and clearer error handling

BugLink: http://bugs.launchpad.net/bugs/1798182
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bd791175a6432d24fc5d7b348304276027372545 linux-next)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

net: ena: use CSUM_CHECKED device indication to report skb's checksum status

BugLink: http://bugs.launchpad.net/bugs/1798182
Set skb->ip_summed to the correct value as reported by the device.
Add counter for the case where rx csum offload is enabled but
device didn't check it.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit cb36bb36e1f17d2a7b9a9751e5cfec4235b46c93 linux-next)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

net: ena: add functions for handling Low Latency Queues in ena_netdev

BugLink: http://bugs.launchpad.net/bugs/1798182
This patch includes all code changes necessary in ena_netdev to enable
packet sending via the LLQ placemnt mode.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 38005ca816a7ef5516dc8e59ae95716739aa75b0 linux-next)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

net: ena: add functions for handling Low Latency Queues in ena_com

BugLink: http://bugs.launchpad.net/bugs/1798182
This patch introduces APIs for detection, initialization, configuration
and actual usage of low latency queues(LLQ). It extends transmit API with
creation of LLQ descriptors in device memory (which include host buffers
descriptors as well as packet header)

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 689b2bdaaa1480ad2c14bdc4c6eaf38284549022 linux-next)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

net: ena: introduce Low Latency Queues data structures according to ENA spec

BugLink: http://bugs.launchpad.net/bugs/1798182
Low Latency Queues(LLQ) allow usage of device's memory for descriptors
and headers. Such queues decrease processing time since data is already
located on the device when driver rings the doorbell.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a7982b8ec947052df6d4467b3a81571f02f528e0 linux-next)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

net: ena: complete host info to match latest ENA spec

BugLink: http://bugs.launchpad.net/bugs/1798182
Add new fields and definitions to host info and fill them
according to the latest ENA spec version.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 095f2f1facba0c78f23750dba65c78cef722c1ea linux-next)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

net: ena: minor performance improvement

BugLink: http://bugs.launchpad.net/bugs/1798182
Reduce fastpath overhead by making ena_com_tx_comp_req_id_get() inline.
Also move it to ena_eth_com.h file with its dependency function
ena_com_cq_inc_head().

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0e575f8542d1f4d74df30b5a9ba419c5373d01a1 linux-next)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

net: ena: fix auto casting to boolean

BugLink: http://bugs.launchpad.net/bugs/1798182
Eliminate potential auto casting compilation error.

Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 248ab77342d0453f067b666b36f0f517ea66c361)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

net: ena: fix NULL dereference due to untimely napi initialization

BugLink: http://bugs.launchpad.net/bugs/1798182
napi poll functions should be initialized before running request_irq(),
to handle a rare condition where there is a pending interrupt, causing
the ISR to fire immediately while the poll function wasn't set yet,
causing a NULL dereference.

Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 78a55d05def95144ca5fa9a64c49b2a0636a9866)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

net: ena: fix rare bug when failed restart/resume is followed by driver removal

BugLink: http://bugs.launchpad.net/bugs/1798182
In a rare scenario when ena_device_restore() fails, followed by device
remove, an FLR will not be issued. In this case, the device will keep
sending asynchronous AENQ keep-alive events, even after driver removal,
leading to memory corruption.

Fixes: 8c5c7abdeb2d ("net: ena: add power management ops to the ENA driver")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d7703ddbd7c9cb1ab7c08e1b85b314ff8cea38e9)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

net: ena: fix warning in rmmod caused by double iounmap

BugLink: http://bugs.launchpad.net/bugs/1798182
Memory mapped with devm_ioremap is automatically freed when the driver
is disconnected from the device. Therefore there is no need to
explicitly call devm_iounmap.

Fixes: 0857d92f71b6 ("net: ena: add missing unmap bars on device removal")
Fixes: 411838e7b41c ("net: ena: fix rare kernel crash when bar memory remap fails")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d79c3888bde6581da7ff9f9d6f581900ecb5e632)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

net: ena: remove ndo_poll_controller

BugLink: http://bugs.launchpad.net/bugs/1798182
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.

ena uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Netanel Belgazal <netanel@amazon.com>
Cc: Saeed Bishara <saeedb@amazon.com>
Cc: Zorik Machulsky <zorik@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 21627982e4fff76a053f4d08d7fb56e532e08d52)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

genirq/migration: Avoid out of line call if pending is not set

BugLink: http://bugs.launchpad.net/bugs/1800537
commit d340ebd696f921d3ad01b8c0c29dd38f2ad2bf3e upstream.

The upcoming fix for the -EBUSY return from affinity settings requires to
use the irq_move_irq() functionality even on irq remapped interrupts. To
avoid the out of line call, move the check for the pending bit into an
inline helper.

Preparatory change for the real fix. No functional change.

Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <liu.song.a23@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Link: https://lkml.kernel.org/r/20180604162224.471925894@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

genirq/affinity: Defer affinity setting if irq chip is busy

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 12f47073a40f6aa75119d8f5df4077b7f334cced upstream.

The case that interrupt affinity setting fails with -EBUSY can be handled
in the kernel completely by using the already available generic pending
infrastructure.

If a irq_chip::set_affinity() fails with -EBUSY, handle it like the
interrupts for which irq_chip::set_affinity() can only be invoked from
interrupt context. Copy the new affinity mask to irq_desc::pending_mask and
set the affinity pending bit. The next raised interrupt for the affected
irq will check the pending bit and try to set the new affinity from the
handler. This avoids that -EBUSY is returned when an affinity change is
requested from user space and the previous change has not been cleaned
up. The new affinity will take effect when the next interrupt is raised
from the device.

Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Song Liu <songliubraving@fb.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <liu.song.a23@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tariq Toukan <tariqt@mellanox.com>
Link: https://lkml.kernel.org/r/20180604162224.819273597@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

genirq/generic_pending: Do not lose pending affinity update

BugLink: http://bugs.launchpad.net/bugs/1800537
commit a33a5d2d16cb84bea8d5f5510f3a41aa48b5c467 upstream.

The generic pending interrupt mechanism moves interrupts from the interrupt
handler on the original target CPU to the new destination CPU. This is
required for x86 and ia64 due to the way the interrupt delivery and
acknowledge works if the interrupts are not remapped.

However that update can fail for various reasons. Some of them are valid
reasons to discard the pending update, but the case, when the previous move
has not been fully cleaned up is not a legit reason to fail.

Check the return value of irq_do_set_affinity() for -EBUSY, which indicates
a pending cleanup, and rearm the pending move in the irq dexcriptor so it's
tried again when the next interrupt arrives.

Fixes: 996c591227d9 ("x86/irq: Plug vector cleanup race")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Song Liu <songliubraving@fb.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <liu.song.a23@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tariq Toukan <tariqt@mellanox.com>
Link: https://lkml.kernel.org/r/20180604162224.386544292@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

irq_remapping: Use apic_ack_irq()

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 8a2b7d142e7ac477d52f5f92251e59fc136d7ddd upstream.

To address the EBUSY fail of interrupt affinity settings in case that the
previous setting has not been cleaned up yet, use the new apic_ack_irq()
function instead of the special ir_ack_apic_edge() implementation which is
merily a wrapper around ack_APIC_irq().

Preparatory change for the real fix

Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Song Liu <songliubraving@fb.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <liu.song.a23@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tariq Toukan <tariqt@mellanox.com>
Link: https://lkml.kernel.org/r/20180604162224.555716895@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

x86/platform/uv: Use apic_ack_irq()

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 839b0f1c4ef674cd929a42304c078afca278581a upstream.

To address the EBUSY fail of interrupt affinity settings in case that the
previous setting has not been cleaned up yet, use the new apic_ack_irq()
function instead of the special uv_ack_apic() implementation which is
merily a wrapper around ack_APIC_irq().

Preparatory change for the real fix

Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup")
Reported-by: Song Liu <liu.song.a23@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Song Liu <songliubraving@fb.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tariq Toukan <tariqt@mellanox.com>
Link: https://lkml.kernel.org/r/20180604162224.721691398@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

x86/ioapic: Use apic_ack_irq()

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 2b04e46d8d0b9b7ac08ded672e3eab823f01d77a upstream.

To address the EBUSY fail of interrupt affinity settings in case that the
previous setting has not been cleaned up yet, use the new apic_ack_irq()
function instead of directly invoking ack_APIC_irq().

Preparatory change for the real fix

Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Song Liu <songliubraving@fb.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <liu.song.a23@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tariq Toukan <tariqt@mellanox.com>
Link: https://lkml.kernel.org/r/20180604162224.639011135@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

x86/apic: Provide apic_ack_irq()

BugLink: http://bugs.launchpad.net/bugs/1800537
commit c0255770ccdc77ef2184d2a0a2e0cde09d2b44a4 upstream.

apic_ack_edge() is explicitely for handling interrupt affinity cleanup when
interrupt remapping is not available or disable.

Remapped interrupts and also some of the platform specific special
interrupts, e.g. UV, invoke ack_APIC_irq() directly.

To address the issue of failing an affinity update with -EBUSY the delayed
affinity mechanism can be reused, but ack_APIC_irq() does not handle
that. Adding this to ack_APIC_irq() is not possible, because that function
is also used for exceptions and directly handled interrupts like IPIs.

Create a new function, which just contains the conditional invocation of
irq_move_irq() and the final ack_APIC_irq().

Reuse the new function in apic_ack_edge().

Preparatory change for the real fix.

Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Song Liu <songliubraving@fb.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <liu.song.a23@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tariq Toukan <tariqt@mellanox.com>
Link: https://lkml.kernel.org/r/20180604162224.471925894@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

x86/apic/vector: Prevent hlist corruption and leaks

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 80ae7b1a918e78b0bae88b0c0ad413d3fdced968 upstream.

Several people observed the WARN_ON() in irq_matrix_free() which triggers
when the caller tries to free an vector which is not in the allocation
range. Song provided the trace information which allowed to decode the root
cause.

The rework of the vector allocation mechanism failed to preserve a sanity
check, which prevents setting a new target vector/CPU when the previous
affinity change has not fully completed.

As a result a half finished affinity change can be overwritten, which can
cause the leak of a irq descriptor pointer on the previous target CPU and
double enqueue of the hlist head into the cleanup lists of two or more
CPUs. After one CPU cleaned up its vector the next CPU will invoke the
cleanup handler with vector 0, which triggers the out of range warning in
the matrix allocator.

Prevent this by checking the apic_data of the interrupt whether the
move_in_progress flag is false and the hlist node is not hashed. Return
-EBUSY if not.

This prevents the damage and restores the behaviour before the vector
allocation rework, but due to other changes in that area it also widens the
chance that user space can observe -EBUSY. In theory this should be fine,
but actually not all user space tools handle -EBUSY correctly. Addressing
that is not part of this fix, but will be addressed in follow up patches.

Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment")
Reported-by: Dmitry Safonov <0x7f454c46@gmail.com>
Reported-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Song Liu <liu.song.a23@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Song Liu <songliubraving@fb.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20180604162224.303870257@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

x86/vector: Fix the args of vector_alloc tracepoint

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 838d76d63ec4eaeaa12bedfa50f261480f615200 upstream.

The vector_alloc tracepont reversed the reserved and ret aggs, that made
the trace print wrong. Exchange them.

Fixes: 8d1e3dca7de6 ("x86/vector: Add tracepoints for vector management")
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180601065031.21872-1-douly.fnst@cn.fujitsu.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

cpufreq: ti-cpufreq: Fix an incorrect error return value

BugLink: http://bugs.launchpad.net/bugs/1800537
commit e5d295b06d69a1924665a16a4987be475addd00f upstream.

Commit 05829d9431df (cpufreq: ti-cpufreq: kfree opp_data when
failure) has fixed a memory leak in the failure path, however
the patch returned a positive value on get_cpu_device() failure
instead of the previous negative value. Fix this incorrect error
return value properly.

Fixes: 05829d9431df (cpufreq: ti-cpufreq: kfree opp_data when failure)
Cc: 4.14+ <stable@vger.kernel.org> # v4.14+
Signed-off-by: Suman Anna <s-anna@ti.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

net: phy: dp83822: use BMCR_ANENABLE instead of BMSR_ANEGCAPABLE for DP83620

BugLink: http://bugs.launchpad.net/bugs/1800537
[ Upstream commit b718e8c8f4f5920aaddc2e52d5e32f494c91129c ]

DP83620 register set is compatible with the DP83848, but it also supports
100base-FX. When the hardware is configured such as that fiber mode is
enabled, autonegotiation is not possible.

The chip, however, doesn't expose this information via BMSR_ANEGCAPABLE.
Instead, this bit is always set high, even if the particular hardware
configuration makes it so that auto negotiation is not possible [1]. Under
these circumstances, the phy subsystem keeps trying for autonegotiation to
happen, without success.

Hereby, we inspect BMCR_ANENABLE bit after genphy_config_init, which on
reset is set to 0 when auto negotiation is disabled, and so we use this
value instead of BMSR_ANEGCAPABLE.

[1] https://e2e.ti.com/support/interface/ethernet/f/903/p/697165/2571170

Signed-off-by: Alvaro Gamez Machado <alvaro.gamez@hazent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

mm, page_alloc: do not break __GFP_THISNODE by zonelist reset

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 7810e6781e0fcbca78b91cf65053f895bf59e85f upstream.

In __alloc_pages_slowpath() we reset zonelist and preferred_zoneref for
allocations that can ignore memory policies.  The zonelist is obtained
from current CPU's node.  This is a problem for __GFP_THISNODE
allocations that want to allocate on a different node, e.g.  because the
allocating thread has been migrated to a different CPU.

This has been observed to break SLAB in our 4.4-based kernel, because
there it relies on __GFP_THISNODE working as intended.  If a slab page
is put on wrong node's list, then further list manipulations may corrupt
the list because page_to_nid() is used to determine which node's
list_lock should be locked and thus we may take a wrong lock and race.

Current SLAB implementation seems to be immune by luck thanks to commit
511e3a058812 ("mm/slab: make cache_grow() handle the page allocated on
arbitrary node") but there may be others assuming that __GFP_THISNODE
works as promised.

We can fix it by simply removing the zonelist reset completely.  There
is actually no reason to reset it, because memory policies and cpusets
don't affect the zonelist choice in the first place.  This was different
when commit 183f6371aac2 ("mm: ignore mempolicies when using
ALLOC_NO_WATERMARK") introduced the code, as mempolicies provided their
own restricted zonelists.

We might consider this for 4.17 although I don't know if there's
anything currently broken.

SLAB is currently not affected, but in kernels older than 4.7 that don't
yet have 511e3a058812 ("mm/slab: make cache_grow() handle the page
allocated on arbitrary node") it is.  That's at least 4.4 LTS.  Older
ones I'll have to check.

So stable backports should be more important, but will have to be
reviewed carefully, as the code went through many changes.  BTW I think
that also the ac->preferred_zoneref reset is currently useless if we
don't also reset ac->nodemask from a mempolicy to NULL first (which we
probably should for the OOM victims etc?), but I would leave that for a
separate patch.

Link: http://lkml.kernel.org/r/20180525130853.13915-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Fixes: 183f6371aac2 ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK")
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

HID: wacom: Correct logical maximum Y for 2nd-gen Intuos Pro large

BugLink: http://bugs.launchpad.net/bugs/1800537
commit d471b6b22d37bf9928c6d0202bdaaf76583b8b61 upstream.

The HID descriptor for the 2nd-gen Intuos Pro large (PTH-860) contains
a typo which defines an incorrect logical maximum Y value. This causes
a small portion of the bottom of the tablet to become unusable (both
because the area is below the "bottom" of the tablet and because
'wacom_wac_event' ignores out-of-range values). It also results in a
skewed aspect ratio.

To fix this, we add a quirk to 'wacom_usage_mapping' which overwrites
the data with the correct value.

Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
CC: stable@vger.kernel.org # v4.10+
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

HID: intel_ish-hid: ipc: register more pm callbacks to support hibernation

BugLink: http://bugs.launchpad.net/bugs/1800537
commit ebeaa367548e9e92dd9374b9464ff6e7d157117b upstream.

Current ISH driver only registers suspend/resume PM callbacks which don't
support hibernation (suspend to disk). Basically after hiberation, the ISH
can't resume properly and user may not see sensor events (for example: screen
rotation may not work).

User will not see a crash or panic or anything except the following message
in log:

hid-sensor-hub 001F:8086:22D8.0001: timeout waiting for response from ISHTP device

So this patch adds support for S4/hiberbation to ISH by using the
SIMPLE_DEV_PM_OPS() MACRO instead of struct dev_pm_ops directly. The suspend
and resume functions will now be used for both suspend to RAM and hibernation.

If power management is disabled, SIMPLE_DEV_PM_OPS will do nothing, the suspend
and resume related functions won't be used, so mark them as __maybe_unused to
clarify that this is the intended behavior, and remove #ifdefs for power
management.

Cc: stable@vger.kernel.org
Signed-off-by: Even Xu <even.xu@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

orangefs: report attributes_mask and attributes for statx

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 7f54910fa8dfe504f2e1563f4f6ddc3294dfbf3a upstream.

OrangeFS formerly failed to set attributes_mask with the result that
software could not see immutable and append flags present in the
filesystem.

Reported-by: Becky Ligon <ligon@clemson.edu>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Fixes: 68a24a6cc4a6 ("orangefs: implement statx")
Cc: stable@vger.kernel.org
Cc: hubcap@omnibond.com
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

orangefs: set i_size on new symlink

BugLink: http://bugs.launchpad.net/bugs/1800537
commit f6a4b4c9d07dda90c7c29dae96d6119ac6425dca upstream.

As long as a symlink inode remains in-core, the destination (and
therefore size) will not be re-fetched from the server, as it cannot
change.  The original implementation of the attribute cache assumed that
setting the expiry time in the past was sufficient to cause a re-fetch
of all attributes on the next getattr.  That does not work in this case.

The bug manifested itself as follows.  When the command sequence

touch foo; ln -s foo bar; ls -l bar

is run, the output was

lrwxrwxrwx. 1 fedora fedora 4906 Apr 24 19:10 bar -> foo

However, after a re-mount, ls -l bar produces

lrwxrwxrwx. 1 fedora fedora    3 Apr 24 19:10 bar -> foo

After this commit, even before a re-mount, the output is

lrwxrwxrwx. 1 fedora fedora    3 Apr 24 19:10 bar -> foo

Reported-by: Becky Ligon <ligon@clemson.edu>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Fixes: 71680c18c8f2 ("orangefs: Cache getattr results.")
Cc: stable@vger.kernel.org
Cc: hubcap@omnibond.com
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

iwlwifi: fw: harden page loading code

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 9039d985811d5b109b58b202b7594fd24e433fed upstream.

The page loading code trusts the data provided in the firmware images
a bit too much and may cause a buffer overflow or copy unknown data if
the block sizes don't match what we expect.

To prevent potential problems, harden the code by checking if the
sizes we are copying are what we expect.

Cc: stable@vger.kernel.org
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

x86/intel_rdt: Enable CMT and MBM on new Skylake stepping

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 1d9f3e20a56d33e55748552aeec597f58542f92d upstream.

New stepping of Skylake has fixes for cache occupancy and memory
bandwidth monitoring.

Update the code to enable these by default on newer steppings.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: stable@vger.kernel.org # v4.14
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Link: https://lkml.kernel.org/r/20180608160732.9842-1-tony.luck@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

w1: mxc_w1: Enable clock before calling clk_get_rate() on it

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 955bc61328dc0a297fb3baccd84e9d3aee501ed8 upstream.

According to the API, you may only call clk_get_rate() after actually
enabling it.

Found by Linux Driver Verification project (linuxtesting.org).

Fixes: a5fd9139f74c ("w1: add 1-wire master driver for i.MX27 / i.MX31")
Signed-off-by: Stefan Potyra <Stefan.Potyra@elektrobit.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

libata: Drop SanDisk SD7UB3Q*G1001 NOLPM quirk

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 2cfce3a86b64b53f0a70e92a6a659c720c319b45 upstream.

Commit 184add2ca23c ("libata: Apply NOLPM quirk for SanDisk
SD7UB3Q*G1001 SSDs") disabled LPM for SanDisk SD7UB3Q*G1001 SSDs.

This has lead to several reports of users of that SSD where LPM
was working fine and who know have a significantly increased idle
power consumption on their laptops.

Likely there is another problem on the T450s from the original
reporter which gets exposed by the uncore reaching deeper sleep
states (higher PC-states) due to LPM being enabled. The problem as
reported, a hardfreeze about once a day, already did not sound like
it would be caused by LPM and the reports of the SSD working fine
confirm this. The original reporter is ok with dropping the quirk.

A X250 user has reported the same hard freeze problem and for him
the problem went away after unrelated updates, I suspect some GPU
driver stack changes fixed things.

TL;DR: The original reporters problem were triggered by LPM but not
an LPM issue, so drop the quirk for the SSD in question.

BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1583207
Cc: stable@vger.kernel.org
Cc: Richard W.M. Jones <rjones@redhat.com>
Cc: Lorenzo Dalrio <lorenzo.dalrio@gmail.com>
Reported-by: Lorenzo Dalrio <lorenzo.dalrio@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Richard W.M. Jones" <rjones@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

libata: zpodd: small read overflow in eject_tray()

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 18c9a99bce2a57dfd7e881658703b5d7469cc7b9 upstream.

We read from the cdb[] buffer in ata_exec_internal_sg(). It has to be
ATAPI_CDB_LEN (16) bytes long, but this buffer is only 12 bytes.

Fixes: 213342053db5 ("libata: handle power transition of ODD")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

cpufreq: governors: Fix long idle detection logic in load calculation

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 7592019634f8473f0b0973ce79297183077bdbc2 upstream.

According to current code implementation, detecting the long
idle period is done by checking if the interval between two
adjacent utilization update handlers is long enough. Although
this mechanism can detect if the idle period is long enough
(no utilization hooks invoked during idle period), it might
not cover a corner case: if the task has occupied the CPU
for too long which causes no context switches during that
period, then no utilization handler will be launched until this
high prio task is scheduled out. As a result, the idle_periods
field might be calculated incorrectly because it regards the
100% load as 0% and makes the conservative governor who uses
this field confusing.

Change the detection to compare the idle_time with sampling_rate
directly.

Reported-by: Artem S. Tashkinov <t.artem@mailcity.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

cpufreq: Fix new policy initialization during limits updates via sysfs

BugLink: http://bugs.launchpad.net/bugs/1800537
commit c7d1f119c48f64bebf0fa1e326af577c6152fe30 upstream.

If the policy limits are updated via cpufreq_update_policy() and
subsequently via sysfs, the limits stored in user_policy may be
set incorrectly.

For example, if both min and max are set via sysfs to the maximum
available frequency, user_policy.min and user_policy.max will also
be the maximum. If a policy notifier triggered by
cpufreq_update_policy() lowers both the min and the max at this
point, that change is not reflected by the user_policy limits, so
if the max is updated again via sysfs to the same lower value,
then user_policy.max will be lower than user_policy.min which
shouldn't happen. In particular, if one of the policy CPUs is
then taken offline and back online, cpufreq_set_policy() will
fail for it due to a failing limits check.

To prevent that from happening, initialize the min and max fields
of the new_policy object to the ones stored in user_policy that
were previously set via sysfs.

Signed-off-by: Kevin Wangtao <kevin.wangtao@hisilicon.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject & changelog ]
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueue

BugLink: http://bugs.launchpad.net/bugs/1800537
commit f183464684190bacbfb14623bd3e4e51b7575b4c upstream.

cgwb_release() punts the actual release to cgwb_release_workfn() on
system_wq.  Depending on the number of cgroups or block devices, there
can be a lot of cgwb_release_workfn() in flight at the same time.

We're periodically seeing close to 256 kworkers getting stuck with the
following stack trace and overtime the entire system gets stuck.

  [<ffffffff810ee40c>] _synchronize_rcu_expedited.constprop.72+0x2fc/0x330
  [<ffffffff810ee634>] synchronize_rcu_expedited+0x24/0x30
  [<ffffffff811ccf23>] bdi_unregister+0x53/0x290
  [<ffffffff811cd1e9>] release_bdi+0x89/0xc0
  [<ffffffff811cd645>] wb_exit+0x85/0xa0
  [<ffffffff811cdc84>] cgwb_release_workfn+0x54/0xb0
  [<ffffffff810a68d0>] process_one_work+0x150/0x410
  [<ffffffff810a71fd>] worker_thread+0x6d/0x520
  [<ffffffff810ad3dc>] kthread+0x12c/0x160
  [<ffffffff81969019>] ret_from_fork+0x29/0x40
  [<ffffffffffffffff>] 0xffffffffffffffff

The events leading to the lockup are...

1. A lot of cgwb_release_workfn() is queued at the same time and all
   system_wq kworkers are assigned to execute them.

2. They all end up calling synchronize_rcu_expedited().  One of them
   wins and tries to perform the expedited synchronization.

3. However, that invovles queueing rcu_exp_work to system_wq and
   waiting for it.  Because #1 is holding all available kworkers on
   system_wq, rcu_exp_work can't be executed.  cgwb_release_workfn()
   is waiting for synchronize_rcu_expedited() which in turn is waiting
   for cgwb_release_workfn() to free up some of the kworkers.

We shouldn't be scheduling hundreds of cgwb_release_workfn() at the
same time.  There's nothing to be gained from that.  This patch
updates cgwb release path to use a dedicated percpu workqueue with
@max_active of 1.

While this resolves the problem at hand, it might be a good idea to
isolate rcu_exp_work to its own workqueue too as it can be used from
various paths and is prone to this sort of indirect A-A deadlocks.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

blk-mq: reinit q->tag_set_list entry only after grace period

BugLink: http://bugs.launchpad.net/bugs/1800537
commit a347c7ad8edf4c5685154f3fdc3c12fc1db800ba upstream.

It is not allowed to reinit q->tag_set_list list entry while RCU grace
period has not completed yet, otherwise the following soft lockup in
blk_mq_sched_restart() happens:

[ 1064.252652] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [fio:9270]
[ 1064.254445] task: ffff99b912e8b900 task.stack: ffffa6d54c758000
[ 1064.254613] RIP: 0010:blk_mq_sched_restart+0x96/0x150
[ 1064.256510] Call Trace:
[ 1064.256664]  <IRQ>
[ 1064.256824]  blk_mq_free_request+0xea/0x100
[ 1064.256987]  msg_io_conf+0x59/0xd0 [ibnbd_client]
[ 1064.257175]  complete_rdma_req+0xf2/0x230 [ibtrs_client]
[ 1064.257340]  ? ibtrs_post_recv_empty+0x4d/0x70 [ibtrs_core]
[ 1064.257502]  ibtrs_clt_rdma_done+0xd1/0x1e0 [ibtrs_client]
[ 1064.257669]  ib_create_qp+0x321/0x380 [ib_core]
[ 1064.257841]  ib_process_cq_direct+0xbd/0x120 [ib_core]
[ 1064.258007]  irq_poll_softirq+0xb7/0xe0
[ 1064.258165]  __do_softirq+0x106/0x2a2
[ 1064.258328]  irq_exit+0x92/0xa0
[ 1064.258509]  do_IRQ+0x4a/0xd0
[ 1064.258660]  common_interrupt+0x7a/0x7a
[ 1064.258818]  </IRQ>

Meanwhile another context frees other queue but with the same set of
shared tags:

[ 1288.201183] INFO: task bash:5910 blocked for more than 180 seconds.
[ 1288.201833] bash            D    0  5910   5820 0x00000000
[ 1288.202016] Call Trace:
[ 1288.202315]  schedule+0x32/0x80
[ 1288.202462]  schedule_timeout+0x1e5/0x380
[ 1288.203838]  wait_for_completion+0xb0/0x120
[ 1288.204137]  __wait_rcu_gp+0x125/0x160
[ 1288.204287]  synchronize_sched+0x6e/0x80
[ 1288.204770]  blk_mq_free_queue+0x74/0xe0
[ 1288.204922]  blk_cleanup_queue+0xc7/0x110
[ 1288.205073]  ibnbd_clt_unmap_device+0x1bc/0x280 [ibnbd_client]
[ 1288.205389]  ibnbd_clt_unmap_dev_store+0x169/0x1f0 [ibnbd_client]
[ 1288.205548]  kernfs_fop_write+0x109/0x180
[ 1288.206328]  vfs_write+0xb3/0x1a0
[ 1288.206476]  SyS_write+0x52/0xc0
[ 1288.206624]  do_syscall_64+0x68/0x1d0
[ 1288.206774]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

What happened is the following:

1. There are several MQ queues with shared tags.
2. One queue is about to be freed and now task is in
   blk_mq_del_queue_tag_set().
3. Other CPU is in blk_mq_sched_restart() and loops over all queues in
   tag list in order to find hctx to restart.

Because linked list entry was modified in blk_mq_del_queue_tag_set()
without proper waiting for a grace period, blk_mq_sched_restart()
never ends, spining in list_for_each_entry_rcu_rr(), thus soft lockup.

Fix is simple: reinit list entry after an RCU grace period elapsed.

Fixes: Fixes: 705cda97ee3a ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list")
Cc: stable@vger.kernel.org
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-block@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

nbd: use bd_set_size when updating disk size

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 9e2b19675d1338d2a38e99194756f2db44a081df upstream.

When we stopped relying on the bdev everywhere I broke updating the
block device size on the fly, which ceph relies on. We can't just do
set_capacity, we also have to do bd_set_size so things like parted will
notice the device size change.

Fixes: 29eaadc ("nbd: stop using the bdev everywhere")
cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

nbd: update size when connected

BugLink: http://bugs.launchpad.net/bugs/1800537
commit c3f7c9397609705ef848cc98a5fb429b3e90c3c4 upstream.

I messed up changing the size of an NBD device while it was connected by
not actually updating the device or doing the uevent. Fix this by
updating everything if we're connected and we change the size.

cc: stable@vger.kernel.org
Fixes: 639812a ("nbd: don't set the device size until we're connected")
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

nbd: fix nbd device deletion

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 8364da4751cf22201d74933d5e634176f44ed407 upstream.

This fixes a use after free bug, we shouldn't be doing disk->queue right
after we do del_gendisk(disk). Save the queue and do the cleanup after
the del_gendisk.

Fixes: c6a4759ea0c9 ("nbd: add device refcounting")
cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

cifs: For SMB2 security informaion query, check for minimum sized security descriptor instead of sizeof FileAllInformation class

BugLink: http://bugs.launchpad.net/bugs/1800537
commit ee25c6dd7b05113783ce1f4fab6b30fc00d29b8d upstream.

Validate_buf () function checks for an expected minimum sized response
passed to query_info() function.
For security information, the size of a security descriptor can be
smaller (one subauthority, no ACEs) than the size of the structure
that defines FileInfoClass of FileAllInformation.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199725
Cc: <stable@vger.kernel.org>
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Noah Morrison <noah.morrison@rubrik.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

CIFS: 511c54a2f69195b28afb9dd119f03787b1625bb4 adds a check for session expiry

BugLink: http://bugs.launchpad.net/bugs/1800537
commit d81243c697ffc71f983736e7da2db31a8be0001f upstream.

Handle this additional status in the same way as SESSION_EXPIRED.

Signed-off-by: Mark Syms <mark.syms@citrix.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

smb3: fix various xid leaks

BugLink: http://bugs.launchpad.net/bugs/1800537
commit cfe89091644c441a1ade6dae6d2e47b715648615 upstream.

Fix a few cases where we were not freeing the xid which led to
active requests being non-zero at unmount time.

Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

x86/MCE: Fix stack out-of-bounds write in mce-inject.c: Flags_read()

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 985c78d3ff8e9c74450fa2bb08eb55e680d999ca upstream.

Each of the strings that we want to put into the buf[MAX_FLAG_OPT_SIZE]
in flags_read() is two characters long. But the sprintf() adds
a trailing newline and will add a terminating NUL byte. So
MAX_FLAG_OPT_SIZE needs to be 4.

sprintf() calls vsnprintf() and *that* does return:

" * The return value is the number of characters which would
* be generated for the given input, excluding the trailing
* '\0', as per ISO C99."

Note the "excluding".

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180427163707.ktaiysvbk3yhk4wm@agluck-desk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ALSA: hda: add dock and led support for HP ProBook 640 G4

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 7eef32c1ef895a3a96463f9cbd04203007cd5555 upstream.

This patch adds missing initialisation for HP 2013 UltraSlim Dock
Line-In/Out PINs and activates keyboard mute/micmute leds
for HP ProBook 640 G4

Signed-off-by: Dennis Wassenberg <dennis.wassenberg@secunet.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ALSA: hda: add dock and led support for HP EliteBook 830 G5

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 2861751f67b91e1d24e68010ced96614fb3140f4 upstream.

This patch adds missing initialisation for HP 2013 UltraSlim Dock
Line-In/Out PINs and activates keyboard mute/micmute leds
for HP EliteBook 830 G5

Signed-off-by: Dennis Wassenberg <dennis.wassenberg@secunet.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ALSA: hda - Handle kzalloc() failure in snd_hda_attach_pcm_stream()

BugLink: http://bugs.launchpad.net/bugs/1800537
commit a3aa60d511746bd6c0d0366d4eb90a7998bcde8b upstream.

When 'kzalloc()' fails in 'snd_hda_attach_pcm_stream()', a new pcm instance is
created without setting its operators via 'snd_pcm_set_ops()'. Following
operations on the new pcm instance can trigger kernel null pointer dereferences
and cause kernel oops.

This bug was found with my work on building a gray-box fault-injection tool for
linux-kernel-module binaries. A kernel null pointer dereference was confirmed
from line 'substream->ops->open()' in function 'snd_pcm_open_substream()' in
file 'sound/core/pcm_native.c'.

This patch fixes the bug by calling 'snd_device_free()' in the error handling
path of 'kzalloc()', which removes the new pcm instance from the snd card before
returns with an error code.

Signed-off-by: Bo Chen <chenbo@pdx.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ALSA: hda/conexant - Add fixup for HP Z2 G4 workstation

BugLink: http://bugs.launchpad.net/bugs/1800537
commit f16041df4c360eccacfe90f96673b37829e4c959 upstream.

HP Z2 G4 requires the same workaround as other HP machines that have
no mic-pin detection.

Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

btrfs: scrub: Don't use inode pages for device replace

BugLink: http://bugs.launchpad.net/bugs/1800537
commit ac0b4145d662a3b9e34085dea460fb06ede9b69b upstream.

[BUG]
Btrfs can create compressed extent without checksum (even though it
shouldn't), and if we then try to replace device containing such extent,
the result device will contain all the uncompressed data instead of the
compressed one.

Test case already submitted to fstests:
https://patchwork.kernel.org/patch/10442353/

[CAUSE]
When handling compressed extent without checksum, device replace will
goe into copy_nocow_pages() function.

In that function, btrfs will get all inodes referring to this data
extents and then use find_or_create_page() to get pages direct from that
inode.

The problem here is, pages directly from inode are always uncompressed.
And for compressed data extent, they mismatch with on-disk data.
Thus this leads to corrupted compressed data extent written to replace
device.

[FIX]
In this attempt, we could just remove the "optimization" branch, and let
unified scrub_pages() to handle it.

Although scrub_pages() won't bother reusing page cache, it will be a
little slower, but it does the correct csum checking and won't cause
such data corruption caused by "optimization".

Note about the fix: this is the minimal fix that can be backported to
older stable trees without conflicts. The whole callchain from
copy_nocow_pages() can be deleted, and will be in followup patches.

Fixes: ff023aac3119 ("Btrfs: add code to scrub to copy read data to another disk")
CC: stable@vger.kernel.org # 4.4+
Reported-by: James Harvey <jamespharvey20@gmail.com>
Reviewed-by: James Harvey <jamespharvey20@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ remove code removal, add note why ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

btrfs: return error value if create_io_em failed in cow_file_range

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 090a127afa8f73e9618d4058d6755f7ec7453dd6 upstream.

In cow_file_range(), create_io_em() may fail, but its return value is
not recorded. Then return value may be 0 even it failed which is a
wrong behavior.

Let cow_file_range() return PTR_ERR(em) if create_io_em() failed.

Fixes: 6f9994dbabe5 ("Btrfs: create a helper to create em for IO")
CC: stable@vger.kernel.org # 4.11+
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

Btrfs: fix memory and mount leak in btrfs_ioctl_rm_dev_v2()

BugLink: http://bugs.launchpad.net/bugs/1800537
commit fd4e994bd1f9dc9628e168a7f619bf69f6984635 upstream.

If we have invalid flags set, when we error out we must drop our writer
counter and free the buffer we allocated for the arguments. This bug is
trivially reproduced with the following program on 4.7+:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <linux/btrfs.h>
#include <linux/btrfs_tree.h>

int main(int argc, char **argv)
{
struct btrfs_ioctl_vol_args_v2 vol_args = {
.flags = UINT64_MAX,
};
int ret;
int fd;

if (argc != 2) {
fprintf(stderr, "usage: %s PATH\n", argv[0]);
return EXIT_FAILURE;
}

fd = open(argv[1], O_WRONLY);
if (fd == -1) {
perror("open");
return EXIT_FAILURE;
}

ret = ioctl(fd, BTRFS_IOC_RM_DEV_V2, &vol_args);
if (ret == -1)
perror("ioctl");

close(fd);
return EXIT_SUCCESS;
}

When unmounting the filesystem, we'll hit the
WARN_ON(mnt_get_writers(mnt)) in cleanup_mnt() and also may prevent the
filesystem to be remounted read-only as the writer count will stay
lifted.

Fixes: 6b526ed70cf1 ("btrfs: introduce device delete by devid")
CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

Btrfs: fix clone vs chattr NODATASUM race

BugLink: http://bugs.launchpad.net/bugs/1800537
commit b5c40d598f5408bd0ca22dfffa82f03cd9433f23 upstream.

In btrfs_clone_files(), we must check the NODATASUM flag while the
inodes are locked. Otherwise, it's possible that btrfs_ioctl_setflags()
will change the flags after we check and we can end up with a party
checksummed file.

The race window is only a few instructions in size, between the if and
the locks which is:

3834         if (S_ISDIR(src->i_mode) || S_ISDIR(inode->i_mode))
3835                 return -EISDIR;

where the setflags must be run and toggle the NODATASUM flag (provided
the file size is 0).  The clone will block on the inode lock, segflags
takes the inode lock, changes flags, releases log and clone continues.

Not impossible but still needs a lot of bad luck to hit unintentionally.

Fixes: 0e7b824c4ef9 ("Btrfs: don't make a file partly checksummed through file clone")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

driver core: Don't ignore class_dir_create_and_add() failure.

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 84d0c27d6233a9ba0578b20f5a09701eb66cee42 upstream.

syzbot is hitting WARN() at kernfs_add_one() [1].
This is because kernfs_create_link() is confused by previous device_add()
call which continued without setting dev->kobj.parent field when
get_device_parent() failed by memory allocation fault injection.
Fix this by propagating the error from class_dir_create_and_add() to
the calllers of get_device_parent().

[1] https://syzkaller.appspot.com/bug?id=fae0fb607989ea744526d1c082a5b8de6529116f

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+df47f81c226b31d89fb1@syzkaller.appspotmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ext4: fix fencepost error in check for inode count overflow during resize

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 4f2f76f751433908364ccff82f437a57d0e6e9b7 upstream.

ext4_resize_fs() has an off-by-one bug when checking whether growing of
a filesystem will not overflow inode count. As a result it allows a
filesystem with 8192 inodes per group to grow to 64TB which overflows
inode count to 0 and makes filesystem unusable. Fix it.

Cc: stable@vger.kernel.org
Fixes: 3f8a6411fbada1fa482276591e037f3b1adcf55b
Reported-by: Jaco Kroon <jaco@uls.co.za>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ext4: bubble errors from ext4_find_inline_data_nolock() up to ext4_iget()

BugLink: http://bugs.launchpad.net/bugs/1800537
commit eb9b5f01c33adebc31cbc236c02695f605b0e417 upstream.

If ext4_find_inline_data_nolock() returns an error it needs to get
reflected up to ext4_iget(). In order to fix this,
ext4_iget_extra_inode() needs to return an error (and not return
void).

This is related to "ext4: do not allow external inodes for inline
data" (which fixes CVE-2018-11412) in that in the errors=continue
case, it would be useful to for userspace to receive an error
indicating that file system is corrupted.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ext4: update mtime in ext4_punch_hole even if no blocks are released

BugLink: http://bugs.launchpad.net/bugs/1800537
commit eee597ac931305eff3d3fd1d61d6aae553bc0984 upstream.

Currently in ext4_punch_hole we're going to skip the mtime update if
there are no actual blocks to release. However we've actually modified
the file by zeroing the partial block so the mtime should be updated.

Moreover the sync and datasync handling is skipped as well, which is
also wrong. Fix it.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Joe Habermann <joe.habermann@quantum.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ext4: fix hole length detection in ext4_ind_map_blocks()

BugLink: http://bugs.launchpad.net/bugs/1800537
commit 2ee3ee06a8fd792765fa3267ddf928997797eec5 upstream.

When ext4_ind_map_blocks() computes a length of a hole, it doesn't count
with the fact that mapped offset may be somewhere in the middle of the
completely empty subtree. In such case it will return too large length
of the hole which then results in lseek(SEEK_DATA) to end up returning
an incorrect offset beyond the end of the hole.

Fix the problem by correctly taking offset within a subtree into account
when computing a length of a hole.

Fixes: facab4d9711e7aa3532cb82643803e8f1b9518e8
CC: stable@vger.kernel.org
Reported-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

tls: fix use-after-free in tls_push_record

BugLink: http://bugs.launchpad.net/bugs/1800537
[ Upstream commit a447da7d00410278c90d3576782a43f8b675d7be ]

syzkaller managed to trigger a use-after-free in tls like the
following:

  BUG: KASAN: use-after-free in tls_push_record.constprop.15+0x6a2/0x810 [tls]
  Write of size 1 at addr ffff88037aa08000 by task a.out/2317

  CPU: 3 PID: 2317 Comm: a.out Not tainted 4.17.0+ #144
  Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
  Call Trace:
   dump_stack+0x71/0xab
   print_address_description+0x6a/0x280
   kasan_report+0x258/0x380
   ? tls_push_record.constprop.15+0x6a2/0x810 [tls]
   tls_push_record.constprop.15+0x6a2/0x810 [tls]
   tls_sw_push_pending_record+0x2e/0x40 [tls]
   tls_sk_proto_close+0x3fe/0x710 [tls]
   ? tcp_check_oom+0x4c0/0x4c0
   ? tls_write_space+0x260/0x260 [tls]
   ? kmem_cache_free+0x88/0x1f0
   inet_release+0xd6/0x1b0
   __sock_release+0xc0/0x240
   sock_close+0x11/0x20
   __fput+0x22d/0x660
   task_work_run+0x114/0x1a0
   do_exit+0x71a/0x2780
   ? mm_update_next_owner+0x650/0x650
   ? handle_mm_fault+0x2f5/0x5f0
   ? __do_page_fault+0x44f/0xa50
   ? mm_fault_error+0x2d0/0x2d0
   do_group_exit+0xde/0x300
   __x64_sys_exit_group+0x3a/0x50
   do_syscall_64+0x9a/0x300
   ? page_fault+0x8/0x30
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This happened through fault injection where aead_req allocation in
tls_do_encryption() eventually failed and we returned -ENOMEM from
the function. Turns out that the use-after-free is triggered from
tls_sw_sendmsg() in the second tls_push_record(). The error then
triggers a jump to waiting for memory in sk_stream_wait_memory()
resp. returning immediately in case of MSG_DONTWAIT. What follows is
the trim_both_sgl(sk, orig_size), which drops elements from the sg
list added via tls_sw_sendmsg(). Now the use-after-free gets triggered
when the socket is being closed, where tls_sk_proto_close() callback
is invoked. The tls_complete_pending_work() will figure that there's
a pending closed tls record to be flushed and thus calls into the
tls_push_pending_closed_record() from there. ctx->push_pending_record()
is called from the latter, which is the tls_sw_push_pending_record()
from sw path. This again calls into tls_push_record(). And here the
tls_fill_prepend() will panic since the buffer address has been freed
earlier via trim_both_sgl(). One way to fix it is to move the aead
request allocation out of tls_do_encryption() early into tls_push_record().
This means we don't prep the tls header and advance state to the
TLS_PENDING_CLOSED_RECORD before allocation which could potentially
fail happened. That fixes the issue on my side.

Fixes: 3c4d7559159b ("tls: kernel TLS support")
Reported-by: syzbot+5c74af81c547738e1684@syzkaller.appspotmail.com
Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

net: in virtio_net_hdr only add VLAN_HLEN to csum_start if payload holds vlan

BugLink: http://bugs.launchpad.net/bugs/1800537
[ Upstream commit fd3a88625844907151737fc3b4201676effa6d27 ]

Tun, tap, virtio, packet and uml vector all use struct virtio_net_hdr
to communicate packet metadata to userspace.

For skbuffs with vlan, the first two return the packet as it may have
existed on the wire, inserting the VLAN tag in the user buffer. Then
virtio_net_hdr.csum_start needs to be adjusted by VLAN_HLEN bytes.

Commit f09e2249c4f5 ("macvtap: restore vlan header on user read")
added this feature to macvtap. Commit 3ce9b20f1971 ("macvtap: Fix
csum_start when VLAN tags are present") then fixed up csum_start.

Virtio, packet and uml do not insert the vlan header in the user
buffer.

When introducing virtio_net_hdr_from_skb to deduplicate filling in
the virtio_net_hdr, the variant from macvtap which adds VLAN_HLEN was
applied uniformly, breaking csum offset for packets with vlan on
virtio and packet.

Make insertion of VLAN_HLEN optional. Convert the callers to pass it
when needed.

Fixes: e858fae2b0b8f4 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
Fixes: 1276f24eeef2 ("packet: use common code for virtio_net_hdr and skb GSO conversion")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

udp: fix rx queue len reported by diag and proc interface

BugLink: http://bugs.launchpad.net/bugs/1800537
[ Upstream commit 6c206b20092a3623184cff9470dba75d21507874 ]

After commit 6b229cf77d68 ("udp: add batching to udp_rmem_release()")
the sk_rmem_alloc field does not measure exactly anymore the
receive queue length, because we batch the rmem release. The issue
is really apparent only after commit 0d4a6608f68c ("udp: do rmem bulk
free even if the rx sk queue is empty"): the user space can easily
check for an empty socket with not-0 queue length reported by the 'ss'
tool or the procfs interface.

We need to use a custom UDP helper to report the correct queue length,
taking into account the forward allocation deficit.

Reported-by: trevor.francis@46labs.com
Fixes: 6b229cf77d68 ("UDP: add batching to udp_rmem_release()")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

tcp: verify the checksum of the first data segment in a new connection

BugLink: http://bugs.launchpad.net/bugs/1800537
[ Upstream commit 4fd44a98ffe0d048246efef67ed640fdf2098a62 ]

commit 079096f103fa ("tcp/dccp: install syn_recv requests into ehash
table") introduced an optimization for the handling of child sockets
created for a new TCP connection.

But this optimization passes any data associated with the last ACK of the
connection handshake up the stack without verifying its checksum, because it
calls tcp_child_process(), which in turn calls tcp_rcv_state_process()
directly. These lower-level processing functions do not do any checksum
verification.

Insert a tcp_checksum_complete call in the TCP_NEW_SYN_RECEIVE path to
fix this.

Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

net/sched: act_simple: fix parsing of TCA_DEF_DATA

BugLink: http://bugs.launchpad.net/bugs/1800537
[ Upstream commit 8d499533e0bc02d44283dbdab03142b599b8ba16 ]

use nla_strlcpy() to avoid copying data beyond the length of TCA_DEF_DATA
netlink attribute, in case it is less than SIMP_MAX_DATA and it does not
end with '\0' character.

v2: fix errors in the commit message, thanks Hangbin Liu

Fixes: fa1b1cff3d06 ("net_cls_act: Make act_simple use of netlink policy.")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

net: dsa: add error handling for pskb_trim_rcsum

BugLink: http://bugs.launchpad.net/bugs/1800537
[ Upstream commit 349b71d6f427ff8211adf50839dbbff3f27c1805 ]

When pskb_trim_rcsum fails, the lack of error-handling code may
cause unexpected results.

This patch adds error-handling code after calling pskb_trim_rcsum.

Signed-off-by: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ipv6: allow PMTU exceptions to local routes

BugLink: http://bugs.launchpad.net/bugs/1800537
[ Upstream commit 0975764684487bf3f7a47eef009e750ea41bd514 ]

IPVS setups with local client and remote tunnel server need
to create exception for the local virtual IP. What we do is to
change PMTU from 64KB (on "lo") to 1460 in the common case.

Suggested-by: Martin KaFai Lau <kafai@fb.com>
Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Fixes: 7343ff31ebf0 ("ipv6: Don't create clones of host routes.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

cdc_ncm: avoid padding beyond end of skb

BugLink: http://bugs.launchpad.net/bugs/1800537
[ Upstream commit 49c2c3f246e2fc3009039e31a826333dcd0283cd ]

Commit 4a0e3e989d66 ("cdc_ncm: Add support for moving NDP to end
of NCM frame") added logic to reserve space for the NDP at the
end of the NTB/skb.  This reservation did not take the final
alignment of the NDP into account, causing us to reserve too
little space. Additionally the padding prior to NDP addition did
not ensure there was enough space for the NDP.

The NTB/skb with the NDP appended would then exceed the configured
max size. This caused the final padding of the NTB to use a
negative count, padding to almost INT_MAX, and resulting in:

[60103.825970] BUG: unable to handle kernel paging request at ffff9641f2004000
[60103.825998] IP: __memset+0x24/0x30
[60103.826001] PGD a6a06067 P4D a6a06067 PUD 4f65a063 PMD 72003063 PTE 0
[60103.826013] Oops: 0002 [#1] SMP NOPTI
[60103.826018] Modules linked in: (removed(
[60103.826158] CPU: 0 PID: 5990 Comm: Chrome_DevTools Tainted: G           O 4.14.0-3-amd64 #1 Debian 4.14.17-1
[60103.826162] Hardware name: LENOVO 20081 BIOS 41CN28WW(V2.04) 05/03/2012
[60103.826166] task: ffff964193484fc0 task.stack: ffffb2890137c000
[60103.826171] RIP: 0010:__memset+0x24/0x30
[60103.826174] RSP: 0000:ffff964316c03b68 EFLAGS: 00010216
[60103.826178] RAX: 0000000000000000 RBX: 00000000fffffffd RCX: 000000001ffa5000
[60103.826181] RDX: 0000000000000005 RSI: 0000000000000000 RDI: ffff9641f2003ffc
[60103.826184] RBP: ffff964192f6c800 R08: 00000000304d434e R09: ffff9641f1d2c004
[60103.826187] R10: 0000000000000002 R11: 00000000000005ae R12: ffff9642e6957a80
[60103.826190] R13: ffff964282ff2ee8 R14: 000000000000000d R15: ffff9642e4843900
[60103.826194] FS:  00007f395aaf6700(0000) GS:ffff964316c00000(0000) knlGS:0000000000000000
[60103.826197] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[60103.826200] CR2: ffff9641f2004000 CR3: 0000000013b0c000 CR4: 00000000000006f0
[60103.826204] Call Trace:
[60103.826212]  <IRQ>
[60103.826225]  cdc_ncm_fill_tx_frame+0x5e3/0x740 [cdc_ncm]
[60103.826236]  cdc_ncm_tx_fixup+0x57/0x70 [cdc_ncm]
[60103.826246]  usbnet_start_xmit+0x5d/0x710 [usbnet]
[60103.826254]  ? netif_skb_features+0x119/0x250
[60103.826259]  dev_hard_start_xmit+0xa1/0x200
[60103.826267]  sch_direct_xmit+0xf2/0x1b0
[60103.826273]  __dev_queue_xmit+0x5e3/0x7c0
[60103.826280]  ? ip_finish_output2+0x263/0x3c0
[60103.826284]  ip_finish_output2+0x263/0x3c0
[60103.826289]  ? ip_output+0x6c/0xe0
[60103.826293]  ip_output+0x6c/0xe0
[60103.826298]  ? ip_forward_options+0x1a0/0x1a0
[60103.826303]  tcp_transmit_skb+0x516/0x9b0
[60103.826309]  tcp_write_xmit+0x1aa/0xee0
[60103.826313]  ? sch_direct_xmit+0x71/0x1b0
[60103.826318]  tcp_tasklet_func+0x177/0x180
[60103.826325]  tasklet_action+0x5f/0x110
[60103.826332]  __do_softirq+0xde/0x2b3
[60103.826337]  irq_exit+0xae/0xb0
[60103.826342]  do_IRQ+0x81/0xd0
[60103.826347]  common_interrupt+0x98/0x98
[60103.826351]  </IRQ>
[60103.826355] RIP: 0033:0x7f397bdf2282
[60103.826358] RSP: 002b:00007f395aaf57d8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff6e
[60103.826362] RAX: 0000000000000000 RBX: 00002f07bc6d0900 RCX: 00007f39752d7fe7
[60103.826365] RDX: 0000000000000022 RSI: 0000000000000147 RDI: 00002f07baea02c0
[60103.826368] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[60103.826371] R10: 00000000ffffffff R11: 0000000000000000 R12: 00002f07baea02c0
[60103.826373] R13: 00002f07bba227a0 R14: 00002f07bc6d090c R15: 0000000000000000
[60103.826377] Code: 90 90 90 90 90 90 90 0f 1f 44 00 00 49 89 f9 48 89 d1 83
e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48
ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1
[60103.826442] RIP: __memset+0x24/0x30 RSP: ffff964316c03b68
[60103.826444] CR2: ffff9641f2004000

Commit e1069bbfcf3b ("net: cdc_ncm: Reduce memory use when kernel
memory low") made this bug much more likely to trigger by reducing
the NTB size under memory pressure.

Link: https://bugs.debian.org/893393
Reported-by: Горбешко Богдан <bodqhrohro@gmail.com>
Reported-and-tested-by: Dennis Wassenberg <dennis.wassenberg@secunet.com>
Cc: Enrico Mioso <mrkiko.rs@gmail.com>
Fixes: 4a0e3e989d66 ("cdc_ncm: Add support for moving NDP to end of NCM frame")
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

bonding: re-evaluate force_primary when the primary slave name changes

BugLink: http://bugs.launchpad.net/bugs/1800537
[ Upstream commit eb55bbf865d9979098c6a7a17cbdb41237ece951 ]

There is a timing issue under active-standy mode, when bond_enslave() is
called, bond->params.primary might not be initialized yet.

Any time the primary slave string changes, bond->force_primary should be
set to true to make sure the primary becomes the active slave.

Signed-off-by: Xiangning Yu <yuxiangning@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

net/mlx5e: Don't attempt to dereference the ppriv struct if not being eswitch manager

BugLink: http://bugs.launchpad.net/bugs/1799049

The check for cpu hit statistics was not returning immediate false for
any non vport rep netdev and hence we crashed (say on mlx5 probed VFs) if
user-space tool was calling into any possible netdev in the system.

Fix that by doing a proper check before dereferencing.

Fixes: 1d447a39142e ('net/mlx5e: Extendable vport representor netdev private data')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Eli Cohen <eli@melloanox.com>
Reviewed-by: Eli Cohen <eli@melloanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
(cherry picked from commit 8ffd569aaa818f2624ca821d9a246342fa8b8c50)
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ipmi: Fix timer race with module unload

Please note that below oops is from an older kernel, but the same
race seems to be present in the upstream kernel too.

---8<---

The following panic was encountered during removing the ipmi_ssif
module:

[ 526.352555] Unable to handle kernel paging request at virtual address ffff000006923090
[ 526.360464] Mem abort info:
[ 526.363257] ESR = 0x86000007
[ 526.366304] Exception class = IABT (current EL), IL = 32 bits
[ 526.372221] SET = 0, FnV = 0
[ 526.375269] EA = 0, S1PTW = 0
[ 526.378405] swapper pgtable: 4k pages, 48-bit VAs, pgd = 000000008ae60416
[ 526.385185] [ffff000006923090] *pgd=000000bffcffe803, *pud=000000bffcffd803, *pmd=0000009f4731a003, *pte=0000000000000000
[ 526.396141] Internal error: Oops: 86000007 [#1] SMP
[ 526.401008] Modules linked in: nls_iso8859_1 ipmi_devintf joydev input_leds ipmi_msghandler shpchp sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear i2c_smbus hid_generic usbhid uas hid usb_storage ast aes_ce_blk i2c_algo_bit aes_ce_cipher qede ttm crc32_ce ptp crct10dif_ce drm_kms_helper ghash_ce syscopyarea sha2_ce sysfillrect sysimgblt pps_core fb_sys_fops sha256_arm64 sha1_ce mpt3sas qed drm raid_class ahci scsi_transport_sas libahci gpio_xlp i2c_xlp9xx aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64 [last unloaded: ipmi_ssif]
[ 526.468085] CPU: 125 PID: 0 Comm: swapper/125 Not tainted 4.15.0-35-generic #38~lp1775396+build.1
[ 526.476942] Hardware name: To be filled by O.E.M. Saber/Saber, BIOS 0ACKL022 08/14/2018
[ 526.484932] pstate: 00400009 (nzcv daif +PAN -UAO)
[ 526.489713] pc : 0xffff000006923090
[ 526.493198] lr : call_timer_fn+0x34/0x178
[ 526.497194] sp : ffff000009b0bdd0
[ 526.500496] x29: ffff000009b0bdd0 x28: 0000000000000082
[ 526.505796] x27: 0000000000000002 x26: ffff000009515188
[ 526.511096] x25: ffff000009515180 x24: ffff0000090f1018
[ 526.516396] x23: ffff000009519660 x22: dead000000000200
[ 526.521696] x21: ffff000006923090 x20: 0000000000000100
[ 526.526995] x19: ffff809eeb466a40 x18: 0000000000000000
[ 526.532295] x17: 000000000000000e x16: 0000000000000007
[ 526.537594] x15: 0000000000000000 x14: 071c71c71c71c71c
[ 526.542894] x13: 0000000000000000 x12: 0000000000000000
[ 526.548193] x11: 0000000000000001 x10: ffff000009b0be88
[ 526.553493] x9 : 0000000000000000 x8 : 0000000000000005
[ 526.558793] x7 : ffff80befc1f8528 x6 : 0000000000000020
[ 526.564092] x5 : 0000000000000040 x4 : 0000000020001b20
[ 526.569392] x3 : 0000000000000000 x2 : ffff809eeb466a40
[ 526.574692] x1 : ffff000006923090 x0 : ffff809eeb466a40
[ 526.579992] Process swapper/125 (pid: 0, stack limit = 0x000000002eb50acc)
[ 526.586854] Call trace:
[ 526.589289] 0xffff000006923090
[ 526.592419] expire_timers+0xc8/0x130
[ 526.596070] run_timer_softirq+0xec/0x1b0
[ 526.600070] __do_softirq+0x134/0x328
[ 526.603726] irq_exit+0xc8/0xe0
[ 526.606857] __handle_domain_irq+0x6c/0xc0
[ 526.610941] gic_handle_irq+0x84/0x188
[ 526.614679] el1_irq+0xe8/0x180
[ 526.617822] cpuidle_enter_state+0xa0/0x328
[ 526.621993] cpuidle_enter+0x34/0x48
[ 526.625564] call_cpuidle+0x44/0x70
[ 526.629040] do_idle+0x1b8/0x1f0
[ 526.632256] cpu_startup_entry+0x2c/0x30
[ 526.636174] secondary_start_kernel+0x11c/0x130
[ 526.640694] Code: bad PC value
[ 526.643800] ---[ end trace d020b0b8417c2498 ]---
[ 526.648404] Kernel panic - not syncing: Fatal exception in interrupt
[ 526.654778] SMP: stopping secondary CPUs
[ 526.658734] Kernel Offset: disabled
[ 526.662211] CPU features: 0x5800c38
[ 526.665688] Memory Limit: none
[ 526.668768] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

Prevent mod_timer from arming a timer that was already removed by
del_timer during module unload.

BugLink: http://launchpad.net/bugs/1799281
Signed-off-by: Jan Glauber <jglauber@cavium.com>
Cc: <stable@vger.kernel.org> # 3.19
Signed-off-by: Corey Minyard <cminyard@mvista.com>
(cherry picked from commit 0711e8c1b4572d076264e71b0002d223f2666ed7)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ipmi: Remove ACPI SPMI probing from the SSIF (I2C) driver

The IPMI spec states:

  The purpose of the SPMI Table is to provide a mechanism that can
  be used by the OSPM (an ACPI term for “OS Operating System-directed
  configuration and Power Management” essentially meaning an ACPI-aware
  OS or OS loader) very early in the boot process, e.g., before the
  ability to execute ACPI control methods in the OS is available.

When we are probing IPMI in Linux, ACPI control methods are available,
so we shouldn't be probing using SPMI.  It could cause some confusion
during the probing process.

BugLink: http://launchpad.net/bugs/1799276
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Tested-by: Jiandi An <anjiandi@codeaurora.org>
(cherry picked from commit 4866b1dce0389013a268f0ab63f7229b30c6e5fe)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

cap_inode_getsecurity: use d_find_any_alias() instead of d_find_alias()

BugLink: https://bugs.launchpad.net/bugs/1786729
The code in cap_inode_getsecurity(), introduced by commit 8db6c34f1dbc
("Introduce v3 namespaced file capabilities"), should use
d_find_any_alias() instead of d_find_alias() do handle unhashed dentry
correctly. This is needed, for example, if execveat() is called with an
open but unlinked overlayfs file, because overlayfs unhashes dentry on
unlink.
This is a regression of real life application, first reported at
https://www.spinics.net/lists/linux-unionfs/msg05363.html

Below reproducer and setup can reproduce the case.
  const char* exec="echo";
  const char *newargv[] = { "echo", "hello", NULL};
  const char *newenviron[] = { NULL };
  int fd, err;

  fd = open(exec, O_PATH);
  unlink(exec);
  err = syscall(322/*SYS_execveat*/, fd, "", newargv, newenviron,
AT_EMPTY_PATH);
  if(err<0)
    fprintf(stderr, "execveat: %s\n", strerror(errno));

gcc compile into ~/test/a.out
mount -t overlay -orw,lowerdir=/mnt/l,upperdir=/mnt/u,workdir=/mnt/w
none /mnt/m
cd /mnt/m
cp /bin/echo .
~/test/a.out

Expected result:
hello
Actually result:
execveat: Invalid argument
dmesg:
Invalid argument reading file caps for /dev/fd/3

The 2nd reproducer and setup emulates similar case but for
regular filesystem:
  const char* exec="echo";
  int fd, err;
  char buf[256];

  fd = open(exec, O_RDONLY);
  unlink(exec);
  err = fgetxattr(fd, "security.capability", buf, 256);
  if(err<0)
    fprintf(stderr, "fgetxattr: %s\n", strerror(errno));

gcc compile into ~/test_fgetxattr

cd /tmp
cp /bin/echo .
~/test_fgetxattr

Result:
fgetxattr: Invalid argument

On regular filesystem, for example, ext4 read xattr from
disk and return to execveat(), will not trigger this issue, however,
the overlay attr handler pass real dentry to vfs_getxattr() will.
This reproducer calls fgetxattr() with an unlinked fd, involkes
vfs_getxattr() then reproduced the case that d_find_alias() in
cap_inode_getsecurity() can't find the unlinked dentry.

Suggested-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Fixes: 8db6c34f1dbc ("Introduce v3 namespaced file capabilities")
Cc: <stable@vger.kernel.org> # v4.14
Signed-off-by: Eddie Horng <eddie.horng@mediatek.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
(cherry picked from commit 355139a8dba446cc11a424cddbf7afebc3041ba1)
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

ipmi:ssif: Add support for multi-part transmit messages > 2 parts

The spec was fairly confusing about how multi-part transmit messages
worked, so the original implementation only added support for two
part messages.  But after talking about it with others and finding
something I missed, I think it makes more sense.

The spec mentions smbus command 8 in a table at the end of the
section on SSIF support as the end transaction.  If that works,
then all is good and as it should be.  However, some implementations
seem to use a middle transaction <32 bytes tomark the end because of the
confusion in the spec, even though that is an SMBus violation if
the number of bytes is zero.

So this change adds some tests, if command=8 works, it uses that,
otherwise if an empty end transaction works, it uses a middle
transaction <32 bytes to mark the end.  If neither works, then
it limits the size to 63 bytes as it is now.

BugLink: https://bugs.launchpad.net/bugs/1799794
Cc: Harri Hakkarainen <harri@cavium.com>
Cc: Bazhenov, Dmitry <dmitry.bazhenov@auriga.com>
Cc: Mach, Dat <Dat.Mach@cavium.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
(cherry picked from commit 10042504ed92c06077b8a20a4edd67ba784847d4)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

sysfs: Fix regression when adding a file to an existing group

BugLink: https://launchpad.net/bugs/1784501
Commit 5f81880d5204 ("sysfs, kobject: allow creating kobject belonging
to arbitrary users") incorrectly changed the argument passed as the
parent parameter when calling sysfs_add_file_mode_ns(). This caused some
sysfs attribute files to not be added correctly to certain groups.

Fixes: 5f81880d5204 ("sysfs, kobject: allow creating kobject belonging to arbitrary users")
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d1753390274f7760e5b593cb657ea34f0617e559)
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Kleber Souza <kleber.souza@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>