git.proxmox.com Git - mirror_ubuntu-focal-kernel.git/log

UBUNTU: [Config]: built-in VFIO_PCI for amd64

BugLink: https://bugs.launchpad.net/bugs/1770845
This allows vfio-pci to be bound to certain devices during boot, preventing
other drivers from binding them.

In particular, USB host drivers, like xhci_hcd, are hard to unbind, as USB
devices may end up being used by applications.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
[ saf: add removed modules to modules.ignore ]
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: [Packaging]: config-check: some enforcement is ignored

Whenever config-check finds an entry like:

CONFIG_VFIO mark<ENFORCED>note<LP#1636733>

it will consider mark<ENFORCED>note as the option name, as the regexp allows
any characters not in the whitespace class (\S). By using a class like [^\s<],
that is, not whitespace and not '<', it will be correctly parsed.

We could fix the entries that have this problem, but fixing the parser is more
robust.

Ignore: yes

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: md/raid0: Link to wiki with guidance on multi-zone RAID0 layout migration

BugLink: https://bugs.launchpad.net/bugs/1850540
Helping an administrator understand this issue and how to deal with it
requires more text than achievable in a kernel error message. Let's
clarify the issue in the Ubuntu wiki, and have the kernel emit a link
to it.

I've submitted a similar change upstream:
https://marc.info/?l=linux-raid&m=157360088014027&w=2
Should it get merged, we should consider replacing this patch with that one.
Otherwise, it is probably safe to drop this SAUCE patch after focal.

Signed-off-by: dann frazier <dann.frazier@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: [Config] disable PCI_MESON

Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Acked-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: [Config] Enable HISI crypto drivers and update previous module

BugLink: https://bugs.launchpad.net/bugs/1850117
BugLink: https://bugs.launchpad.net/bugs/1854549
+CONFIG_CRYPTO_DEV_HISI_HPRE=m
+CONFIG_CRYPTO_DEV_HISI_SEC2=m
+CONFIG_HW_RANDOM_HISI_V2=m
-CONFIG_CRYPTO_HISI_SGL=m

Previous module list has to be updated because of module renamed.

Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

Documentation: Add debugfs doc for hisi_hpre

BugLink: https://bugs.launchpad.net/bugs/1850117
Add debugfs descriptions for HiSilicon HPRE driver.

Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit b492f82fcee1d1c6cdb54ce6e8134438e651b3cf)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

Documentation: add DebugFS doc for HiSilicon SEC

BugLink: https://bugs.launchpad.net/bugs/1854549
This Documentation is for HiSilicon SEC DebugFS.

Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Signed-off-by: Kai Ye <yekai13@huawei.com>
Reviewed-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit cbfe56e6938b2f7ca5e78b04417ee07f7c8d87fb)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

MAINTAINERS: Add maintainer for HiSilicon SEC V2 driver

BugLink: https://bugs.launchpad.net/bugs/1854549
Here adds maintainer information for security engine driver.

Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit a30583fcfb86ebf332573598359c189a4e02c2da)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

MAINTAINERS: Add maintainer for HiSilicon TRNG V2 driver

BugLink: https://bugs.launchpad.net/bugs/1854549
Here adds maintainer information for HiSilicon TRNG V2 driver.

Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 6a101349f8a71ec5f9466e61f22306b68ef49600)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

MAINTAINERS: Add maintainer for HiSilicon HPRE driver

BugLink: https://bugs.launchpad.net/bugs/1850117
Here adds maintainer information for high performance RSA
engine (HPRE) driver.

Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 75451f871cf5ed735c96118894fb9de418cd8a79)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

hwrng: hisi - add HiSilicon TRNG driver support

BugLink: https://bugs.launchpad.net/bugs/1854549
This series adds HiSilicon true random number generator(TRNG)
driver in hw_random subsystem.

Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 3e90efd129593cf693d721e13f031f760d5a6343)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - fix a NULL vs IS_ERR() bug in sec_create_qp_ctx()

BugLink: https://bugs.launchpad.net/bugs/1854549
The hisi_acc_create_sgl_pool() function returns error pointers, it never
returns NULL pointers.

Fixes: 416d82204df4 ("crypto: hisilicon - add HiSilicon SEC V2 driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 8a6b8f4d7a891ac66db4f97900a86b55c84a5802)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - add DebugFS for HiSilicon SEC

BugLink: https://bugs.launchpad.net/bugs/1854549
The HiSilicon SEC engine driver uses DebugFS
to provide main debug information for user space.

Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 1e9bc276f8f19ea65b617d7c9458ead14da4ef60)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - add SRIOV for HiSilicon SEC

BugLink: https://bugs.launchpad.net/bugs/1854549
HiSilicon SEC engine supports PCI SRIOV. This patch enable this feature.
User can enable VFs and pass through them to VM, same SEC driver can work
in VM to provide skcipher algorithms.

Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 73bcb049a77ba75b694cb4142b3a3ef09584a77c)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - add HiSilicon SEC V2 driver

BugLink: https://bugs.launchpad.net/bugs/1854549
SEC driver provides PCIe hardware device initiation with
AES, SM4, and 3DES skcipher algorithms registered to Crypto.
It uses Hisilicon QM as interface to CPU.

Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 416d82204df44ef727de6eafafeaa4d12fdc78dc)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - no need to check return value of debugfs_create functions

BugLink: https://bugs.launchpad.net/bugs/1854549
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.

Cc: Zhou Wang <wangzhou1@hisilicon.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 4a97bfc79619c40d400f2a7b763a0d9cd1d33891)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - add vfs_num module param for zip

BugLink: https://bugs.launchpad.net/bugs/1854549
Currently the VF can be enabled only through sysfs interface
after module loaded, but this also needs to be done when the
module loaded in some scenarios.

This patch adds module param vfs_num, adds hisi_zip_sriov_enable()
in probe, and also adjusts the position of probe.

Signed-off-by: Hao Fang <fanghao11@huawei.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 39977f4b51cdc544de4e5950751655f6693654a7)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - replace #ifdef with IS_ENABLED for CONFIG_NUMA

BugLink: https://bugs.launchpad.net/bugs/1854549
Replace #ifdef CONFIG_NUMA with IS_ENABLED(CONFIG_NUMA) to fix kbuild error.

Fixes: 700f7d0d29c7 ("crypto: hisilicon - fix to return...")
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Reported-by: kbuild test robot <lkp@intel.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 1e67ee9344abbbd35ee286d641461faecf43933f)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - fix to return sub-optimal device when best device has no qps

BugLink: https://bugs.launchpad.net/bugs/1854549
Currently find_zip_device() finds zip device which has the min NUMA
distance with current CPU.

This patch modifies find_zip_device to return sub-optimal device when best
device has no qps. This patch sorts all devices by NUMA distance, then
finds the best zip device which has free qp.

Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Shukun Tan <tanshukun1@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 700f7d0d29c795c36517dcd3541e4432a76c2efc)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - use sgl API to get sgl dma addr and len

BugLink: https://bugs.launchpad.net/bugs/1854549
Use sgl API to get sgl dma addr and len, this will help to avoid compile
error in some platforms. So NEED_SG_DMA_LENGTH can be removed here, which
can only be selected by arch code.

Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit f0c8b6a1e1454f1645463e8ffb3e027fc597867c)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - fix endianness verification problem of QM

BugLink: https://bugs.launchpad.net/bugs/1854549
This patch fixes following sparse warning:

qm.c:345:33: warning: cast removes address space '<asn:2>' of expression
qm.c:359:20: warning: incorrect type in assignment (different base types)
qm.c:359:20:    expected restricted __le16 [usertype] w0
qm.c:359:20:    got int
qm.c:362:27: warning: incorrect type in assignment (different base types)
qm.c:362:27:    expected restricted __le16 [usertype] queue_num
qm.c:362:27:    got unsigned short [usertype] queue
qm.c:363:24: warning: incorrect type in assignment (different base types)
qm.c:363:24:    expected restricted __le32 [usertype] base_l
qm.c:363:24:    got unsigned int [usertype]
qm.c:364:24: warning: incorrect type in assignment (different base types)
qm.c:364:24:    expected restricted __le32 [usertype] base_h
qm.c:364:24:    got unsigned int [usertype]
qm.c:451:22: warning: restricted __le32 degrades to integer
qm.c:471:24: warning: restricted __le16 degrades to integer
......
qm.c:1617:19: warning: incorrect type in assignment (different base types)
qm.c:1617:19:    expected restricted __le32 [usertype] dw6
qm.c:1617:19:    got int
qm.c:1891:24: warning: incorrect type in return expression (different base types)
qm.c:1891:24:    expected int
qm.c:1891:24:    got restricted pci_ers_result_t
qm.c:1894:40: warning: incorrect type in return expression (different base types)
qm.c:1894:40:    expected int
qm.c:1894:40:    got restricted pci_ers_result_t

Signed-off-by: Shukun Tan <tanshukun1@huawei.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 9a8641a7ffbf6f896bcd2bb2c6c0f4b403831c18)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - fix param should be static when not external.

BugLink: https://bugs.launchpad.net/bugs/1854549
This patch fixes following sparse warning:
zip_main.c:87:1: warning: symbol 'hisi_zip_list' was not declared.
Should it be static?
zip_main.c:88:1: warning: symbol 'hisi_zip_list_lock' was not declared.
Should it be static?
zip_main.c:948:68: warning: Using plain integer as NULL pointer

Signed-off-by: Shukun Tan <tanshukun1@huawei.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 719181f39a1045674b04256f54492f7fd97deddb)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - Fix using plain integer as NULL pointer

BugLink: https://bugs.launchpad.net/bugs/1854549
This patch fix sparse warning:
zip_crypto.c:425:26: warning: Using plain integer as NULL pointer

Replaces assignment of 0 to pointer with NULL assignment.

Signed-off-by: Shukun Tan <tanshukun1@huawei.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit e10966981f7258dd7283f3028f414dd127bb5bfc)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - tiny fix about QM/ZIP error callback print

BugLink: https://bugs.launchpad.net/bugs/1854549
Tiny fix to make QM/ZIP error callback print clear and right. If one version
hardware does not support error handling, we directly print this.

And QM is embedded in ZIP, we can use ZIP print only, so remove unnecessary
QM print.

Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit ee1788c61546b04763df608f8333ebd827119a02)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon: Fix misuse of GENMASK macro

BugLink: https://bugs.launchpad.net/bugs/1854549
Arguments are supposed to be ordered high then low.

Fixes: c8b4b477079d ("crypto: hisilicon - add HiSilicon HPRE accelerator")
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 5b243b6c4aa2114ab84bb8a4b604c892a6ffd391)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - select NEED_SG_DMA_LENGTH in qm Kconfig

BugLink: https://bugs.launchpad.net/bugs/1854549
To avoid compile error in some platforms, select NEED_SG_DMA_LENGTH in
qm Kconfig.

Fixes: dfed0098ab91 ("crypto: hisilicon - add hardware SGL support")
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit b981744ef04f7e8cb6931edab50021fff3c8077e)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - Add debugfs for HPRE

BugLink: https://bugs.launchpad.net/bugs/1850117
HiSilicon HPRE engine driver uses debugfs to provide debug information,
the usage can be found in /Documentation/ABI/testing/debugfs-hisi-hpre.

Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 8489741516182d8ac57a69e9f4ca963450607351)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - add SRIOV support for HPRE

BugLink: https://bugs.launchpad.net/bugs/1850117
HiSilicon HPRE engine supports PCI SRIOV. This patch enable
this feature. User can enable VFs and pass through them to VM,
same HPRE driver can work in VM to provide RSA and DH algorithms
by crypto akcipher and kpp interfaces.

Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Hui tang <tanghui20@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 5ec302a364bfd95be29a9784b2fabd8e2ddf0476)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - add HiSilicon HPRE accelerator

BugLink: https://bugs.launchpad.net/bugs/1850117
The HiSilicon HPRE accelerator implements RSA and DH algorithms. It
uses Hisilicon QM as interface to CPU.

This patch provides PCIe driver to the accelerator and registers its
algorithms to crypto akcipher and kpp interfaces.

Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit c8b4b477079d1995cc0a1c10d5cdfd02be938cdf)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - misc fix about sgl

BugLink: https://bugs.launchpad.net/bugs/1854549
This patch fixes some misc problems in sgl codes, e.g. missing static,
sparse error and input parameter check.

Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Shukun Tan <tanshukun1@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit a92a00f809503c6db9dac518951e060ab3d6f6ee)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - fix large sgl memory allocation problem when disable smmu

BugLink: https://bugs.launchpad.net/bugs/1854549
When disabling SMMU, it may fail to allocate large continuous memory. This
patch fixes this by allocating memory as blocks.

Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Shukun Tan <tanshukun1@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit d8ac7b85236b04d14fa80328726cd4d098b4a2a7)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - add sgl_sge_nr module param for zip

BugLink: https://bugs.launchpad.net/bugs/1854549
Add a module parameter for zip driver to set the number of SGE in one SGL.

Signed-off-by: Shukun Tan <tanshukun1@huawei.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit f081fda293ffba54216a7dab66faba7275475006)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - merge sgl support to hisi_qm module

BugLink: https://bugs.launchpad.net/bugs/1854549
As HW SGL can be seen as a data format of QM's sqe, we merge sgl code into
qm module and rename it as hisi_qm, which reduces the number of module and
make the name less generic.

This patch also modify the interface of SGL:
- Create/free hisi_acc_sgl_pool inside.
- Let user to pass the SGE number in one SGL when creating sgl pool, which
is better than a unified module parameter for sgl module before.
- Modify zip driver according to sgl interface change.

Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Shukun Tan <tanshukun1@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 48c1cd40fae31aa39e33930e7d28a0d96f01ea17)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

crypto: hisilicon - allow compile-testing on x86

BugLink: https://bugs.launchpad.net/bugs/1854549
To avoid missing arm64 specific warnings that get introduced
in this driver, allow compile-testing on all 64-bit architectures.

The only actual arm64 specific code in this driver is an open-
coded 128 bit MMIO write. On non-arm64 the same can be done
using memcpy_toio. What I also noticed is that the mmio store
(either one) is not endian-safe, this will only work on little-
endian configurations, so I also add a Kconfig dependency on
that, regardless of the architecture.
Finally, a depenndecy on CONFIG_64BIT is needed because of the
writeq().

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit a7174f978563e112fcd8601c9f8b4a9ddef3388d)
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: Relocate call to hisi_sas_debugfs_exit()

BugLink: https://bugs.launchpad.net/bugs/1854548
Currently we call function hisi_sas_debugfs_exit() to remove debugfs_dir
before freeing interrupt irqs and destroying workqueue in the driver remove
path.

If a dump is triggered before function hisi_sas_debugfs_exit() but
debugfs_work may be called after it, so it may refer to already removed
debugfs_dir which will cause NULL pointer dereference.

To avoid it, put function hisi_sas_debugfs_exit() after free_irqs and
destroy workqueue when removing hisi_sas driver.

Link: https://lore.kernel.org/r/1573551059-107873-4-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7c0ecd40c31246a2d0de81879966436ff07505a4)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: Return directly if init hardware failed

BugLink: https://bugs.launchpad.net/bugs/1855958
Need to return directly if init hardware failed.

Fixes: 73a4925d154c ("scsi: hisi_sas: Update all the registers after suspend and resume")
Link: https://lore.kernel.org/r/1573551059-107873-3-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 547fde8b5a1923050f388caae4f76613b5a620e0)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: Record the phy down event in debugfs

BugLink: https://bugs.launchpad.net/bugs/1854548
The number of phy down reflects the quality of the link between SAS
controller and disk. In order to allow the user to confirm the link quality
of the system, we record the number of phy down for each phy.

The user can check the current phy down count by reading the debugfs file
corresponding to the specific phy, or clear the phy down count by writing 0
to the debugfs file.

Link: https://lore.kernel.org/r/1571926105-74636-19-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f873b66119f2d6fc7b932a68df8d77a26357bab6)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: Add ability to have multiple debugfs dumps

BugLink: https://bugs.launchpad.net/bugs/1854548
We use the module parameter debugfs_dump_count to manage the upper limit of
the memory block for multiple dumps.

Link: https://lore.kernel.org/r/1571926105-74636-17-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8f6432986e610612688dc77c2683657d7289546f)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: Add module parameter for debugfs dump count

BugLink: https://bugs.launchpad.net/bugs/1854548
We still only use dump index #0 however.

Link: https://lore.kernel.org/r/1571926105-74636-16-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 905ab01faf5fc81ba2fc46dddcd21ad5a2dd137b)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: Allocate memory for multiple dumps of debugfs

BugLink: https://bugs.launchpad.net/bugs/1854548
We add multiple dumps for debugfs, but only allocate memory this time and
only dump #0.

Link: https://lore.kernel.org/r/1571926105-74636-15-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a70e33eae363e6f3e2ad9498daaccd231790f7f5)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: Add debugfs file structure for ITCT cache

BugLink: https://bugs.launchpad.net/bugs/1854548
Create a file structure which was used to save the memory address for
ITCT cache at debugfs. This structure is bound to the corresponding debugfs
file, it can help callback function of debugfs file to get what it needs.

Link: https://lore.kernel.org/r/1571926105-74636-14-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 357e4fc7a933ed5bfbf1eb2fad9c198afe6a11e1)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: Add debugfs file structure for IOST cache

BugLink: https://bugs.launchpad.net/bugs/1854548
Create a file structure which was used to save the memory address for IOST
cache at debugfs. This structure is bound to the corresponding debugfs
file, it can help callback function of debugfs file to get what it needs.

Link: https://lore.kernel.org/r/1571926105-74636-13-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b714dd8f36dc609dd4b0078cf5563978134838ed)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: Add debugfs file structure for ITCT

BugLink: https://bugs.launchpad.net/bugs/1854548
Create a file structure which was used to save the memory address for ITCT
at debugfs. This structure is bound to the corresponding debugfs file, it
can help callback function of debugfs file to get what it needs.

Link: https://lore.kernel.org/r/1571926105-74636-12-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 0161d55f23a1e020e5e6892177caf92a61e5c161)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: Add debugfs file structure for IOST

BugLink: https://bugs.launchpad.net/bugs/1854548
Create a file structure which was used to save the memory address for IOST
at debugfs. This structure is bound to the corresponding debugfs file, it
can help callback function of debugfs file to get what it needs.

Link: https://lore.kernel.org/r/1571926105-74636-11-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e15f2e2dff5b809dce923839f21362d6b0d06b1e)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: Add debugfs file structure for port

BugLink: https://bugs.launchpad.net/bugs/1854548
Create a file structure which was used to save the memory address and phy
pointer for port at debugfs. This structure is bound to the corresponding
debugfs file, it can help callback function of debugfs file to get what it
need.

Link: https://lore.kernel.org/r/1571926105-74636-10-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 1f66e1fd26bddb4c9275b61934dbaaf4b0b0bd79)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: Add debugfs file structure for registers

BugLink: https://bugs.launchpad.net/bugs/1854548
Create a file structure which was used to save the memory address and
hisi_hba pointer for REGS at debugfs. This structure is bound to the
corresponding debugfs file, it can help callback function of debugfs file
to get what it need.

Link: https://lore.kernel.org/r/1571926105-74636-9-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit c61163981076476e0bcf2d453dcddf8db605f115)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: Add debugfs file structure for DQ

BugLink: https://bugs.launchpad.net/bugs/1854548
Create a file structure which was used to save the memory address and DQ
pointer for DQ at debugfs. This structure is bound to the corresponding
debugfs file, it can help callback function of debugfs file to get what it
need.

Link: https://lore.kernel.org/r/1571926105-74636-8-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 1b54c4db725d875dcae645a3da74625b9e4b3bdf)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: Add debugfs file structure for CQ

BugLink: https://bugs.launchpad.net/bugs/1854548
Create a file structure which was used to save the memory address and CQ
pointer for CQ at debugfs. This structure is bound to the corresponding
debugfs file, it can help callback function of debugfs file to get what it
need.

Link: https://lore.kernel.org/r/1571926105-74636-7-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 35ea630b2bad4fe9f7db34624eaab3663bb2cb42)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: Add timestamp for a debugfs dump

BugLink: https://bugs.launchpad.net/bugs/1854548
It's useful to know when the dump occurred, so add a timestamp file for
this.

Link: https://lore.kernel.org/r/1571926105-74636-6-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit d28ed83b769378deefa82456f962e14a4b0afadf)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: use wait_for_completion_timeout() when clearing ITCT

BugLink: https://bugs.launchpad.net/bugs/1853999
When injecting 2bit ecc errors, it will cause confusion inside SAS
controller which needs host reset to recover it. If a device is gone at the
same times inject 2bit ecc errors, we may not receive the ITCT interrupt so
it will wait for completion in clear_itct_v3_hw() all the time. And host
reset will also not occur because it can't require hisi_hba->sem, so the
system will be suspended.

To solve the issue, use wait_for_completion_timeout() instead of
wait_for_completion(), and also don't mark the gone device as
SAS_PHY_UNUSED when device gone.

Link: https://lore.kernel.org/r/1571926105-74636-4-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8fa9a7bd3099a96194d767ce681c68dbcb8a957e)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: Set the BIST init value before enabling BIST

BugLink: https://bugs.launchpad.net/bugs/1854548
If set the BIST init value after enabling BIST, there may be still some few
error bits. According to the process, need to set the BIST init value
before enabling BIST.

Fixes: 97b151e75861 ("scsi: hisi_sas: Add BIST support for phy loopback")
Link: https://lore.kernel.org/r/1571926105-74636-3-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 65a3b8bd56942dc988b8c05615bd3f510a10012b)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: Don't create debugfs dump folder twice

BugLink: https://bugs.launchpad.net/bugs/1854548
Due to a merge error, we attempt to create 2x debugfs dump folders, which
fails:
[ 861.101914] debugfs: Directory 'dump' with parent '0000:74:02.0'
already present!

This breaks the dump function.

To fix, remove the superfluous attempt to create the folder.

Fixes: 7ec7082c57ec ("scsi: hisi_sas: Add hisi_sas_debugfs_alloc() to centralise allocation")
Link: https://lore.kernel.org/r/1571926105-74636-2-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 35160421b63d4753a72e9f72ebcdd9d6f88f84b9)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: [Config] update annotations to match config changes

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

Linux 5.4.8

BugLink: https://bugs.launchpad.net/bugs/1858429
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

mm/hugetlbfs: fix for_each_hstate() loop in init_hugetlbfs_fs()

BugLink: https://bugs.launchpad.net/bugs/1858429
commit 15f0ec941f4f908fefa23a30ded8358977cc1cc0 upstream.

LTP memfd_create04 started failing for some huge page sizes
after v5.4-10135-gc3bfc5dd73c6.

The problem is the check introduced to for_each_hstate() loop that
should skip default_hstate_idx. Since it doesn't update 'i' counter,
all subsequent huge page sizes are skipped as well.

Fixes: 8fc312b32b25 ("mm/hugetlbfs: fix error handling when setting up mounts")
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

mmc: sdhci-of-esdhc: re-implement erratum A-009204 workaround

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit f667216c5c7c967c3e568cdddefb51fe606bfe26 ]

The erratum A-009204 workaround patch was reverted because of
incorrect implementation.

8b6dc6b mmc: sdhci-of-esdhc: Revert "mmc: sdhci-of-esdhc: add
erratum A-009204 support"

This patch is to re-implement the workaround (add a 5 ms delay
before setting SYSCTL[RSTD] to make sure all the DMA transfers
are finished).

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Link: https://lore.kernel.org/r/20191219032335.26528-1-yangbo.lu@nxp.com
Fixes: 5dd195522562 ("mmc: sdhci-of-esdhc: add erratum A-009204 support")
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

mmc: sdhci-of-esdhc: fix up erratum A-008171 workaround

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 22dc132d5448db1b1c021de0c34aa8033ca7d98f ]

A previous patch implemented an incomplete workaround of erratum
A-008171. The complete workaround is as below. This patch is to
implement the complete workaround which uses SW tuning if HW tuning
fails, and retries both HW/SW tuning once with reduced clock if
workaround fails. This is suggested by hardware team, and the patch
had been verified on LS1046A eSDHC + Phison 32G eMMC which could
trigger the erratum.

Workaround:
/* For T1040, T2080, LS1021A, T1023 Rev 1: */
1. Program TBPTR[TB_WNDW_END_PTR] = 3*DIV_RATIO.
2. Program TBPTR[TB_WNDW_START_PTR] = 5*DIV_RATIO.
3. Program the software tuning mode by setting TBCTL[TB_MODE] = 2'h3.
4. Set SYSCTL2[EXTN] and SYSCTL2[SAMPCLKSEL].
5. Issue SEND_TUNING_BLK Command (CMD19 for SD, CMD21 for MMC).
6. Wait for IRQSTAT[BRR], buffer read ready, to be set.
7. Clear IRQSTAT[BRR].
8. Check SYSCTL2[EXTN] to be cleared.
9. Check SYSCTL2[SAMPCLKSEL], Sampling Clock Select. It's set value
   indicate tuning procedure success, and clear indicate failure.
   In case of tuning failure, fixed sampling scheme could be used by
   clearing TBCTL[TB_EN].
/* For LS1080A Rev 1, LS2088A Rev 1.0, LA1575A Rev 1.0: */
1. Read the TBCTL[31:0] register. Write TBCTL[11:8]=4'h8 and wait for
   1ms.
2. Read the TBCTL[31:0] register and rewrite again. Wait for 1ms second.
3. Read the TBSTAT[31:0] register twice.
3.1 Reset data lines by setting ESDHCCTL[RSTD] bit.
3.2 Check ESDHCCTL[RSTD] bit.
3.3 If ESDHCCTL[RSTD] is 0, go to step 3.4 else go to step 3.2.
3.4 Write 32'hFFFF_FFFF to IRQSTAT register.
4. if TBSTAT[15:8]-TBSTAT[7:0] > 4*DIV_RATIO or TBSTAT[7:0]-TBSTAT[15:8]
   > 4*DIV_RATIO , then program TBPTR[TB_WNDW_END_PTR] = 4*DIV_RATIO and
   program TBPTR[TB_WNDW_START_PTR] = 8*DIV_RATIO.
/* For LS1012A Rev1, LS1043A Rev 1.x, LS1046A 1.0: */
1. Read the TBCTL[0:31] register. Write TBCTL[20:23]=4'h8 and wait for
   1ms.
2. Read the TBCTL[0:31] register and rewrite again. Wait for 1ms second.
3. Read the TBSTAT[0:31] register twice.
3.1 Reset data lines by setting ESDHCCTL[RSTD] bit.
3.2 Check ESDHCCTL[RSTD] bit.
3.3 If ESDHCCTL[RSTD] is 0, go to step 3.4 else go to step 3.2.
3.4 Write 32'hFFFF_FFFF to IRQSTAT register.
4. if TBSTAT[16:23]-TBSTAT[24:31] > 4*DIV_RATIO or TBSTAT[24:31]-
   TBSTAT[16:23] > 4* DIV_RATIO , then program TBPTR[TB_WNDW_END_PTR] =
   4*DIV_RATIO and program TBPTR[TB_WNDW_START_PTR] = 8*DIV_RATIO.
/* For LS1080A Rev 1, LS2088A Rev 1.0, LA1575A Rev 1.0 LS1012A Rev1,
* LS1043A Rev 1.x, LS1046A 1.0:
*/
5. else program TBPTR[TB_WNDW_END_PTR] = 3*DIV_RATIO and program
   TBPTR[TB_WNDW_START_PTR] = 5*DIV_RATIO.
6. Program the software tuning mode by setting TBCTL[TB_MODE] = 2'h3.
7. Set SYSCTL2[EXTN], wait 1us and SYSCTL2[SAMPCLKSEL].
8. Issue SEND_TUNING_BLK Command (CMD19 for SD, CMD21 for MMC).
9. Wait for IRQSTAT[BRR], buffer read ready, to be set.
10. Clear IRQSTAT[BRR].
11. Check SYSCTL2[EXTN] to be cleared.
12. Check SYSCTL2[SAMPCLKSEL], Sampling Clock Select. It's set value
    indicate tuning procedure success, and clear indicate failure.
    In case of tuning failure, fixed sampling scheme could be used by
    clearing TBCTL[TB_EN].

Fixes: b1f378ab5334 ("mmc: sdhci-of-esdhc: add erratum A008171 support")
Signed-off-by: Yinbo Zhu <yinbo.zhu@nxp.com>
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

vhost/vsock: accept only packets with the right dst_cid

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 8a3cc29c316c17de590e3ff8b59f3d6cbfd37b0a ]

When we receive a new packet from the guest, we check if the
src_cid is correct, but we forgot to check the dst_cid.

The host should accept only packets where dst_cid is
equal to the host CID.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

net: ena: fix napi handler misbehavior when the napi budget is zero

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 24dee0c7478d1a1e00abdf5625b7f921467325dc ]

In netpoll the napi handler could be called with budget equal to zero.
Current ENA napi handler doesn't take that into consideration.

The napi handler handles Rx packets in a do-while loop.
Currently, the budget check happens only after decrementing the
budget, therefore the napi handler, in rare cases, could run over
MAX_INT packets.

In addition to that, this moves all budget related variables to int
calculation and stop mixing u32 to avoid ambiguity

Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

net: phylink: fix interface passed to mac_link_up

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 9b2079c046a9d6c9c73a4ec33816678565ee01f3 ]

A mismerge between the following two commits:

c678726305b9 ("net: phylink: ensure consistent phy interface mode")
27755ff88c0e ("net: phylink: Add phylink_mac_link_{up, down} wrapper functions")

resulted in the wrong interface being passed to the mac_link_up()
function. Fix this up.

Fixes: b4b12b0d2f02 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

ipv6/addrconf: only check invalid header values when NETLINK_F_STRICT_CHK is set

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 2beb6d2901a3f73106485d560c49981144aeacb1 ]

In commit 4b1373de73a3 ("net: ipv6: addr: perform strict checks also for
doit handlers") we add strict check for inet6_rtm_getaddr(). But we did
the invalid header values check before checking if NETLINK_F_STRICT_CHK
is set. This may break backwards compatibility if user already set the
ifm->ifa_prefixlen, ifm->ifa_flags, ifm->ifa_scope in their netlink code.

I didn't move the nlmsg_len check because I thought it's a valid check.

Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: 4b1373de73a3 ("net: ipv6: addr: perform strict checks also for doit handlers")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

bnxt: apply computed clamp value for coalece parameter

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 6adc4601c2a1ac87b4ab8ed0cb55db6efd0264e8 ]

After executing "ethtool -C eth0 rx-usecs-irq 0", the box becomes
unresponsive, likely due to interrupt livelock. It appears that
a minimum clamp value for the irq timer is computed, but is never
applied.

Fix by applying the corrected clamp value.

Fixes: 74706afa712d ("bnxt_en: Update interrupt coalescing logic.")
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

gtp: do not allow adding duplicate tid and ms_addr pdp context

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 6b01b1d9b2d38dc84ac398bfe9f00baff06a31e5 ]

GTP RX packet path lookups pdp context with TID. If duplicate TID pdp
contexts are existing in the list, it couldn't select correct pdp context.
So, TID value should be unique.
GTP TX packet path lookups pdp context with ms_addr. If duplicate ms_addr pdp
contexts are existing in the list, it couldn't select correct pdp context.
So, ms_addr value should be unique.

Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

gtp: fix an use-after-free in ipv4_pdp_find()

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 94dc550a5062030569d4aa76e10e50c8fc001930 ]

ipv4_pdp_find() is called in TX packet path of GTP.
ipv4_pdp_find() internally uses gtp->tid_hash to lookup pdp context.
In the current code, gtp->tid_hash and gtp->addr_hash are freed by
->dellink(), which is gtp_dellink().
But gtp_dellink() would be called while packets are processing.
So, gtp_dellink() should not free gtp->tid_hash and gtp->addr_hash.
Instead, dev->priv_destructor() would be used because this callback
is called after all packet processing safely.

Test commands:
    ip link add veth1 type veth peer name veth2
    ip a a 172.0.0.1/24 dev veth1
    ip link set veth1 up
    ip a a 172.99.0.1/32 dev lo

    gtp-link add gtp1 &

    gtp-tunnel add gtp1 v1 200 100 172.99.0.2 172.0.0.2
    ip r a  172.99.0.2/32 dev gtp1
    ip link set gtp1 mtu 1500

    ip netns add ns2
    ip link set veth2 netns ns2
    ip netns exec ns2 ip a a 172.0.0.2/24 dev veth2
    ip netns exec ns2 ip link set veth2 up
    ip netns exec ns2 ip a a 172.99.0.2/32 dev lo
    ip netns exec ns2 ip link set lo up

    ip netns exec ns2 gtp-link add gtp2 &
    ip netns exec ns2 gtp-tunnel add gtp2 v1 100 200 172.99.0.1 172.0.0.1
    ip netns exec ns2 ip r a 172.99.0.1/32 dev gtp2
    ip netns exec ns2 ip link set gtp2 mtu 1500

    hping3 172.99.0.2 -2 --flood &
    ip link del gtp1

Splat looks like:
[   72.568081][ T1195] BUG: KASAN: use-after-free in ipv4_pdp_find.isra.12+0x130/0x170 [gtp]
[   72.568916][ T1195] Read of size 8 at addr ffff8880b9a35d28 by task hping3/1195
[   72.569631][ T1195]
[   72.569861][ T1195] CPU: 2 PID: 1195 Comm: hping3 Not tainted 5.5.0-rc1 #199
[   72.570547][ T1195] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   72.571438][ T1195] Call Trace:
[   72.571764][ T1195]  dump_stack+0x96/0xdb
[   72.572171][ T1195]  ? ipv4_pdp_find.isra.12+0x130/0x170 [gtp]
[   72.572761][ T1195]  print_address_description.constprop.5+0x1be/0x360
[   72.573400][ T1195]  ? ipv4_pdp_find.isra.12+0x130/0x170 [gtp]
[   72.573971][ T1195]  ? ipv4_pdp_find.isra.12+0x130/0x170 [gtp]
[   72.574544][ T1195]  __kasan_report+0x12a/0x16f
[   72.575014][ T1195]  ? ipv4_pdp_find.isra.12+0x130/0x170 [gtp]
[   72.575593][ T1195]  kasan_report+0xe/0x20
[   72.576004][ T1195]  ipv4_pdp_find.isra.12+0x130/0x170 [gtp]
[   72.576577][ T1195]  gtp_build_skb_ip4+0x199/0x1420 [gtp]
[ ... ]
[   72.647671][ T1195] BUG: unable to handle page fault for address: ffff8880b9a35d28
[   72.648512][ T1195] #PF: supervisor read access in kernel mode
[   72.649158][ T1195] #PF: error_code(0x0000) - not-present page
[   72.649849][ T1195] PGD a6c01067 P4D a6c01067 PUD 11fb07067 PMD 11f939067 PTE 800fffff465ca060
[   72.652958][ T1195] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[   72.653834][ T1195] CPU: 2 PID: 1195 Comm: hping3 Tainted: G    B             5.5.0-rc1 #199
[   72.668062][ T1195] RIP: 0010:ipv4_pdp_find.isra.12+0x86/0x170 [gtp]
[ ... ]
[   72.679168][ T1195] Call Trace:
[   72.679603][ T1195]  gtp_build_skb_ip4+0x199/0x1420 [gtp]
[   72.681915][ T1195]  ? ipv4_pdp_find.isra.12+0x170/0x170 [gtp]
[   72.682513][ T1195]  ? lock_acquire+0x164/0x3b0
[   72.682966][ T1195]  ? gtp_dev_xmit+0x35e/0x890 [gtp]
[   72.683481][ T1195]  gtp_dev_xmit+0x3c2/0x890 [gtp]
[ ... ]

Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

hv_netvsc: Fix tx_table init in rndis_set_subchannel()

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit c39ea5cba5a2e97fc01b78c85208bf31383b399c ]

Host can provide send indirection table messages anytime after RSS is
enabled by calling rndis_filter_set_rss_param(). So the host provided
table values may be overwritten by the initialization in
rndis_set_subchannel().

To prevent this problem, move the tx_table initialization before calling
rndis_filter_set_rss_param().

Fixes: a6fb6aa3cfa9 ("hv_netvsc: Set tx_table to equal weight after subchannels open")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

tcp/dccp: fix possible race __inet_lookup_established()

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 8dbd76e79a16b45b2ccb01d2f2e08dbf64e71e40 ]

Michal Kubecek and Firo Yang did a very nice analysis of crashes
happening in __inet_lookup_established().

Since a TCP socket can go from TCP_ESTABLISH to TCP_LISTEN
(via a close()/socket()/listen() cycle) without a RCU grace period,
I should not have changed listeners linkage in their hash table.

They must use the nulls protocol (Documentation/RCU/rculist_nulls.txt),
so that a lookup can detect a socket in a hash list was moved in
another one.

Since we added code in commit d296ba60d8e2 ("soreuseport: Resolve
merge conflict for v4/v6 ordering fix"), we have to add
hlist_nulls_add_tail_rcu() helper.

Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Reported-by: Firo Yang <firo.yang@suse.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Link: https://lore.kernel.org/netdev/20191120083919.GH27852@unicorn.suse.cz/
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

tcp: do not send empty skb from tcp_write_xmit()

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 1f85e6267caca44b30c54711652b0726fadbb131 ]

Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from
write queue in error cases") in linux-4.14 stable triggered
various bugs. One of them has been fixed in commit ba2ddb43f270
("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but
we still have crashes in some occasions.

Root-cause is that when tcp_sendmsg() has allocated a fresh
skb and could not append a fragment before being blocked
in sk_stream_wait_memory(), tcp_write_xmit() might be called
and decide to send this fresh and empty skb.

Sending an empty packet is not only silly, it might have caused
many issues we had in the past with tp->packets_out being
out of sync.

Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Christoph Paasch <cpaasch@apple.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

bonding: fix active-backup transition after link failure

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 5d485ed88d48f8101a2067348e267c0aaf4ed486 ]

After the recent fix in commit 1899bb325149 ("bonding: fix state
transition issue in link monitoring"), the active-backup mode with
miimon initially come-up fine but after a link-failure, both members
transition into backup state.

Following steps to reproduce the scenario (eth1 and eth2 are the
slaves of the bond):

    ip link set eth1 up
    ip link set eth2 down
    sleep 1
    ip link set eth2 up
    ip link set eth1 down
    cat /sys/class/net/eth1/bonding_slave/state
    cat /sys/class/net/eth2/bonding_slave/state

Fixes: 1899bb325149 ("bonding: fix state transition issue in link monitoring")
CC: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

gtp: avoid zero size hashtable

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 6a902c0f31993ab02e1b6ea7085002b9c9083b6a ]

GTP default hashtable size is 1024 and userspace could set specific
hashtable size with IFLA_GTP_PDP_HASHSIZE. If hashtable size is set to 0
from userspace, hashtable will not work and panic will occur.

Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

gtp: fix wrong condition in gtp_genl_dump_pdp()

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 94a6d9fb88df43f92d943c32b84ce398d50bf49f ]

gtp_genl_dump_pdp() is ->dumpit() callback of GTP module and it is used
to dump pdp contexts. it would be re-executed because of dump packet size.

If dump packet size is too big, it saves current dump pointer
(gtp interface pointer, bucket, TID value) then it restarts dump from
last pointer.
Current GTP code allows adding zero TID pdp context but dump code
ignores zero TID value. So, last dump pointer will not be found.

In addition, this patch adds missing rcu_read_lock() in
gtp_genl_dump_pdp().

Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

net: marvell: mvpp2: phylink requires the link interrupt

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit f3f2364ea14d1cf6bf966542f31eadcf178f1577 ]

phylink requires the MAC to report when its link status changes when
operating in inband modes.  Failure to report link status changes
means that phylink has no idea when the link events happen, which
results in either the network interface's carrier remaining up or
remaining permanently down.

For example, with a fiber module, if the interface is brought up and
link is initially established, taking the link down at the far end
will cut the optical power.  The SFP module's LOS asserts, we
deactivate the link, and the network interface reports no carrier.

When the far end is brought back up, the SFP module's LOS deasserts,
but the MAC may be slower to establish link.  If this happens (which
in my tests is a certainty) then phylink never hears that the MAC
has established link with the far end, and the network interface is
stuck reporting no carrier.  This means the interface is
non-functional.

Avoiding the link interrupt when we have phylink is basically not
an option, so remove the !port->phylink from the test.

Fixes: 4bb043262878 ("net: mvpp2: phylink support")
Tested-by: Sven Auhagen <sven.auhagen@voleatech.de>
Tested-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

net: dsa: sja1105: Reconcile the meaning of TPID and TPID2 for E/T and P/Q/R/S

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 54fa49ee88138756df0fcf867cb1849904710a8c ]

For first-generation switches (SJA1105E and SJA1105T):
- TPID means C-Tag (typically 0x8100)
- TPID2 means S-Tag (typically 0x88A8)

While for the second generation switches (SJA1105P, SJA1105Q, SJA1105R,
SJA1105S) it is the other way around:
- TPID means S-Tag (typically 0x88A8)
- TPID2 means C-Tag (typically 0x8100)

In other words, E/T tags untagged traffic with TPID, and P/Q/R/S with
TPID2.

So the patch mentioned below fixed VLAN filtering for P/Q/R/S, but broke
it for E/T.

We strive for a common code path for all switches in the family, so just
lie in the static config packing functions that TPID and TPID2 are at
swapped bit offsets than they actually are, for P/Q/R/S. This will make
both switches understand TPID to be ETH_P_8021Q and TPID2 to be
ETH_P_8021AD. The meaning from the original E/T was chosen over P/Q/R/S
because E/T is actually the one with public documentation available
(UM10944.pdf).

Fixes: f9a1a7646c0d ("net: dsa: sja1105: Reverse TPID and TPID2")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

net/dst: do not confirm neighbor for vxlan and geneve pmtu update

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit f081042d128a0c7acbd67611def62e1b52e2d294 ]

When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

So disable the neigh confirm for vxlan and geneve pmtu update.

v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
dst_ops.update_pmtu to control whether we should do neighbor confirm.
Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path")
Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path")
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Tested-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

sit: do not confirm neighbor when do pmtu update

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 4d42df46d6372ece4cb4279870b46c2ea7304a47 ]

When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
dst_ops.update_pmtu to control whether we should do neighbor confirm.
Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

vti: do not confirm neighbor when do pmtu update

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 8247a79efa2f28b44329f363272550c1738377de ]

When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

Although vti and vti6 are immune to this problem because they are IFF_NOARP
interfaces, as Guillaume pointed. There is still no sense to confirm neighbour
here.

v5: Update commit description.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
dst_ops.update_pmtu to control whether we should do neighbor confirm.
Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

tunnel: do not confirm neighbor when do pmtu update

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 7a1592bcb15d71400a98632727791d1e68ea0ee8 ]

When do tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

v5: No Change.
v4: Update commit description
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
dst_ops.update_pmtu to control whether we should do neighbor confirm.
Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Fixes: 0dec879f636f ("net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP")
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Tested-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

net/dst: add new function skb_dst_update_pmtu_no_confirm

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 07dc35c6e3cc3c001915d05f5bf21f80a39a0970 ]

Add a new function skb_dst_update_pmtu_no_confirm() for callers who need
update pmtu but should not do neighbor confirm.

v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
dst_ops.update_pmtu to control whether we should do neighbor confirm.
Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

gtp: do not confirm neighbor when do pmtu update

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 6e9105c73f8d2163d12d5dfd762fd75483ed30f5 ]

When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

Although GTP only support ipv4 right now, and __ip_rt_update_pmtu() does not
call dst_confirm_neigh(), we still set it to false to keep consistency with
IPv6 code.

v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
dst_ops.update_pmtu to control whether we should do neighbor confirm.
Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

ip6_gre: do not confirm neighbor when do pmtu update

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 675d76ad0ad5bf41c9a129772ef0aba8f57ea9a7 ]

When we do ipv6 gre pmtu update, we will also do neigh confirm currently.
This will cause the neigh cache be refreshed and set to REACHABLE before
xmit.

But if the remote mac address changed, e.g. device is deleted and recreated,
we will not able to notice this and still use the old mac address as the neigh
cache is REACHABLE.

Fix this by disable neigh confirm when do pmtu update

v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
dst_ops.update_pmtu to control whether we should do neighbor confirm.
Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Reported-by: Jianlin Shi <jishi@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

net: add bool confirm_neigh parameter for dst_ops.update_pmtu

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit bd085ef678b2cc8c38c105673dfe8ff8f5ec0c57 ]

The MTU update code is supposed to be invoked in response to real
networking events that update the PMTU. In IPv6 PMTU update function
__ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor
confirmed time.

But for tunnel code, it will call pmtu before xmit, like:
  - tnl_update_pmtu()
    - skb_dst_update_pmtu()
      - ip6_rt_update_pmtu()
        - __ip6_rt_update_pmtu()
          - dst_confirm_neigh()

If the tunnel remote dst mac address changed and we still do the neigh
confirm, we will not be able to update neigh cache and ping6 remote
will failed.

So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
should not be invoking dst_confirm_neigh() as we have no evidence
of successful two-way communication at this point.

On the other hand it is also important to keep the neigh reachability fresh
for TCP flows, so we cannot remove this dst_confirm_neigh() call.

To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu
to choose whether we should do neigh update or not. I will add the parameter
in this patch and set all the callers to true to comply with the previous
way, and fix the tunnel code one by one on later patches.

v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Suggested-by: David Miller <davem@davemloft.net>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

mlxsw: spectrum: Use dedicated policer for VRRP packets

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit acca789a358cc960be3937851d7de6591c79d6c2 ]

Currently, VRRP packets and packets that hit exceptions during routing
(e.g., MTU error) are policed using the same policer towards the CPU.
This means, for example, that misconfiguration of the MTU on a routed
interface can prevent VRRP packets from reaching the CPU, which in turn
can cause the VRRP daemon to assume it is the Master router.

Fix this by using a dedicated policer for VRRP packets.

Fixes: 11566d34f895 ("mlxsw: spectrum: Add VRRP traps")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alex Veber <alexve@mellanox.com>
Tested-by: Alex Veber <alexve@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

mlxsw: spectrum_router: Skip loopback RIFs during MAC validation

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 314bd842d98e1035cc40b671a71e07f48420e58f ]

When a router interface (RIF) is created the MAC address of the backing
netdev is verified to have the same MSBs as existing RIFs. This is
required in order to avoid changing existing RIF MAC addresses that all
share the same MSBs.

Loopback RIFs are special in this regard as they do not have a MAC
address, given they are only used to loop packets from the overlay to
the underlay.

Without this change, an error is returned when trying to create a RIF
after the creation of a GRE tunnel that is represented by a loopback
RIF. 'rif->dev->dev_addr' points to the GRE device's local IP, which
does not share the same MSBs as physical interfaces. Adding an IP
address to any physical interface results in:

Error: mlxsw_spectrum: All router interface MAC addresses must have the
same prefix.

Fix this by skipping loopback RIFs during MAC validation.

Fixes: 74bc99397438 ("mlxsw: spectrum_router: Veto unsupported RIF MAC addresses")
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

bnxt_en: Add missing devlink health reporters for VFs.

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 7e334fc8003c7a38372cc98e7be6082670a47d29 ]

The VF driver also needs to create the health reporters since
VFs are also involved in firmware reset and recovery.  Modify
bnxt_dl_register() and bnxt_dl_unregister() so that they can
be called by the VFs to register/unregister devlink.  Only the PF
will register the devlink parameters.  With devlink registered,
we can now create the health reporters on the VFs.

Fixes: 6763c779c2d8 ("bnxt_en: Add new FW devlink_health_reporter")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

bnxt_en: Fix the logic that creates the health reporters.

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 937f188c1f4f89b3fa93ba31fc8587dc1fb14a22 ]

Fix the logic to properly check the fw capabilities and create the
devlink health reporters only when needed.  The current code creates
the reporters unconditionally as long as bp->fw_health is valid, and
that's not correct.

Call bnxt_dl_fw_reporters_create() directly from the init and reset
code path instead of from bnxt_dl_register().  This allows the
reporters to be adjusted when capabilities change.  The same
applies to bnxt_dl_fw_reporters_destroy().

Fixes: 6763c779c2d8 ("bnxt_en: Add new FW devlink_health_reporter")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

bnxt_en: Remove unnecessary NULL checks for fw_health

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 0797c10d2d1fa0d6f14612404781b348fc757c3e ]

After fixing the allocation of bp->fw_health in the previous patch,
the driver will not go through the fw reset and recovery code paths
if bp->fw_health allocation fails. So we can now remove the
unnecessary NULL checks.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

bnxt_en: Fix bp->fw_health allocation and free logic.

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 8280b38e01f71e0f89389ccad3fa43b79e57c604 ]

bp->fw_health needs to be allocated for either the firmware initiated
reset feature or the driver initiated error recovery feature.  The
current code is not allocating bp->fw_health for all the necessary cases.
This patch corrects the logic to allocate bp->fw_health correctly when
needed.  If allocation fails, we clear the feature flags.

We also add the the missing kfree(bp->fw_health) when the driver is
unloaded.  If we get an async reset message from the firmware, we also
need to make sure that we have a valid bp->fw_health before proceeding.

Fixes: 07f83d72d238 ("bnxt_en: Discover firmware error recovery capabilities.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

bnxt_en: Return error if FW returns more data than dump length

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit c74751f4c39232c31214ec6a3bc1c7e62f5c728b ]

If any change happened in the configuration of VF in VM while
collecting live dump, there could be a race and firmware can return
more data than allocated dump length. Fix it by keeping track of
the accumulated core dump length copied so far and abort the copy
with error code if the next chunk of core dump will exceed the
original dump length.

Fixes: 6c5657d085ae ("bnxt_en: Add support for ethtool get dump.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

bnxt_en: Free context memory in the open path if firmware has been reset.

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 325f85f37e5b35807d86185bdf2c64d2980c44ba ]

This will trigger new context memory to be rediscovered and allocated
during the re-probe process after a firmware reset. Without this, the
newly reset firmware does not have valid context memory and the driver
will eventually fail to allocate some resources.

Fixes: ec5d31e3c15d ("bnxt_en: Handle firmware reset status during IF_UP.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

bnxt_en: Fix MSIX request logic for RDMA driver.

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 0c722ec0a289c7f6b53f89bad1cfb7c4db3f7a62 ]

The logic needs to check both bp->total_irqs and the reserved IRQs in
hw_resc->resv_irqs if applicable and see if both are enough to cover
the L2 and RDMA requested vectors. The current code is only checking
bp->total_irqs and can fail in some code paths, such as the TX timeout
code path with the RDMA driver requesting vectors after recovery. In
this code path, we have not reserved enough MSIX resources for the
RDMA driver yet.

Fixes: 75720e6323a1 ("bnxt_en: Keep track of reserved IRQs.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

udp: fix integer overflow while computing available space in sk_rcvbuf

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit feed8a4fc9d46c3126fb9fcae0e9248270c6321a ]

When the size of the receive buffer for a socket is close to 2^31 when
computing if we have enough space in the buffer to copy a packet from
the queue to the buffer we might hit an integer overflow.

When an user set net.core.rmem_default to a value close to 2^31 UDP
packets are dropped because of this overflow. This can be visible, for
instance, with failure to resolve hostnames.

This can be fixed by casting sk_rcvbuf (which is an int) to unsigned
int, similarly to how it is done in TCP.

Signed-off-by: Antonio Messina <amessina@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

tcp: Fix highest_sack and highest_sack_seq

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 853697504de043ff0bfd815bd3a64de1dce73dc7 ]

>From commit 50895b9de1d3 ("tcp: highest_sack fix"), the logic about
setting tp->highest_sack to the head of the send queue was removed.
Of course the logic is error prone, but it is logical. Before we
remove the pointer to the highest sack skb and use the seq instead,
we need to set tp->highest_sack to NULL when there is no skb after
the last sack, and then replace NULL with the real skb when new skb
inserted into the rtx queue, because the NULL means the highest sack
seq is tp->snd_nxt. If tp->highest_sack is NULL and new data sent,
the next ACK with sack option will increase tp->reordering unexpectedly.

This patch sets tp->highest_sack to the tail of the rtx queue if
it's NULL and new data is sent. The patch keeps the rule that the
highest_sack can only be maintained by sack processing, except for
this only case.

Fixes: 50895b9de1d3 ("tcp: highest_sack fix")
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

ptp: fix the race between the release of ptp_clock and cdev

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit a33121e5487b424339636b25c35d3a180eaa5f5e ]

In a case when a ptp chardev (like /dev/ptp0) is open but an underlying
device is removed, closing this file leads to a race. This reproduces
easily in a kvm virtual machine:

ts# cat openptp0.c
int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); }
ts# uname -r
5.5.0-rc3-46cf053e
ts# cat /proc/cmdline
... slub_debug=FZP
ts# modprobe ptp_kvm
ts# ./openptp0 &
[1] 670
opened /dev/ptp0, sleeping 10s...
ts# rmmod ptp_kvm
ts# ls /dev/ptp*
ls: cannot access '/dev/ptp*': No such file or directory
ts# ...woken up
[   48.010809] general protection fault: 0000 [#1] SMP
[   48.012502] CPU: 6 PID: 658 Comm: openptp0 Not tainted 5.5.0-rc3-46cf053e #25
[   48.014624] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
[   48.016270] RIP: 0010:module_put.part.0+0x7/0x80
[   48.017939] RSP: 0018:ffffb3850073be00 EFLAGS: 00010202
[   48.018339] RAX: 000000006b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: ffff89a476c00ad0
[   48.018936] RDX: fffff65a08d3ea08 RSI: 0000000000000247 RDI: 6b6b6b6b6b6b6b6b
[   48.019470] ...                                              ^^^ a slub poison
[   48.023854] Call Trace:
[   48.024050]  __fput+0x21f/0x240
[   48.024288]  task_work_run+0x79/0x90
[   48.024555]  do_exit+0x2af/0xab0
[   48.024799]  ? vfs_write+0x16a/0x190
[   48.025082]  do_group_exit+0x35/0x90
[   48.025387]  __x64_sys_exit_group+0xf/0x10
[   48.025737]  do_syscall_64+0x3d/0x130
[   48.026056]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   48.026479] RIP: 0033:0x7f53b12082f6
[   48.026792] ...
[   48.030945] Modules linked in: ptp i6300esb watchdog [last unloaded: ptp_kvm]
[   48.045001] Fixing recursive fault but reboot is needed!

This happens in:

static void __fput(struct file *file)
{   ...
    if (file->f_op->release)
        file->f_op->release(inode, file); <<< cdev is kfree'd here
    if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
             !(mode & FMODE_PATH))) {
        cdev_put(inode->i_cdev); <<< cdev fields are accessed here

Namely:

__fput()
  posix_clock_release()
    kref_put(&clk->kref, delete_clock) <<< the last reference
      delete_clock()
        delete_ptp_clock()
          kfree(ptp) <<< cdev is embedded in ptp
  cdev_put
    module_put(p->owner) <<< *p is kfree'd, bang!

Here cdev is embedded in posix_clock which is embedded in ptp_clock.
The race happens because ptp_clock's lifetime is controlled by two
refcounts: kref and cdev.kobj in posix_clock. This is wrong.

Make ptp_clock's sysfs device a parent of cdev with cdev_device_add()
created especially for such cases. This way the parent device with its
ptp_clock is not released until all references to the cdev are released.
This adds a requirement that an initialized but not exposed struct
device should be provided to posix_clock_register() by a caller instead
of a simple dev_t.

This approach was adopted from the commit 72139dfa2464 ("watchdog: Fix
the race between the release of watchdog_core_data and cdev"). See
details of the implementation in the commit 233ed09d7fda ("chardev: add
helper function to register char devs with a struct device").

Link: https://lore.kernel.org/linux-fsdevel/20191125125342.6189-1-vdronov@redhat.com/T/#u
Analyzed-by: Stephen Johnston <sjohnsto@redhat.com>
Analyzed-by: Vern Lovejoy <vlovejoy@redhat.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

net: stmmac: dwmac-meson8b: Fix the RGMII TX delay on Meson8b/8m2 SoCs

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit bd6f48546b9cb7a785344fc78058c420923d7ed8 ]

GXBB and newer SoCs use the fixed FCLK_DIV2 (1GHz) clock as input for
the m250_sel clock. Meson8b and Meson8m2 use MPLL2 instead, whose rate
can be adjusted at runtime.

So far we have been running MPLL2 with ~250MHz (and the internal
m250_div with value 1), which worked enough that we could transfer data
with an TX delay of 4ns. Unfortunately there is high packet loss with
an RGMII PHY when transferring data (receiving data works fine though).
Odroid-C1's u-boot is running with a TX delay of only 2ns as well as
the internal m250_div set to 2 - no lost (TX) packets can be observed
with that setting in u-boot.

Manual testing has shown that the TX packet loss goes away when using
the following settings in Linux (the vendor kernel uses the same
settings):
- MPLL2 clock set to ~500MHz
- m250_div set to 2
- TX delay set to 2ns on the MAC side

Update the m250_div divider settings to only accept dividers greater or
equal 2 to fix the TX delay generated by the MAC.

iperf3 results before the change:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   182 MBytes   153 Mbits/sec  514      sender
[  5]   0.00-10.00  sec   182 MBytes   152 Mbits/sec           receiver

iperf3 results after the change (including an updated TX delay of 2ns):
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-10.00  sec   927 MBytes   778 Mbits/sec    0      sender
[  5]   0.00-10.01  sec   927 MBytes   777 Mbits/sec           receiver

Fixes: 4f6a71b84e1afd ("net: stmmac: dwmac-meson8b: fix internal RGMII clock configuration")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

net_sched: sch_fq: properly set sk->sk_pacing_status

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit bb3d0b8bf5be61ab1d6f472c43cbf34de17e796b ]

If fq_classify() recycles a struct fq_flow because
a socket structure has been reallocated, we do not
set sk->sk_pacing_status immediately, but later if the
flow becomes detached.

This means that any flow requiring pacing (BBR, or SO_MAX_PACING_RATE)
might fallback to TCP internal pacing, which requires a per-socket
high resolution timer, and therefore more cpu cycles.

Fixes: 218af599fa63 ("tcp: internal implementation for pacing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

net/sched: add delete_empty() to filters and use it in cls_flower

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit a5b72a083da197b493c7ed1e5730d62d3199f7d6 ]

Revert "net/sched: cls_u32: fix refcount leak in the error path of
u32_change()", and fix the u32 refcount leak in a more generic way that
preserves the semantic of rule dumping.
On tc filters that don't support lockless insertion/removal, there is no
need to guard against concurrent insertion when a removal is in progress.
Therefore, for most of them we can avoid a full walk() when deleting, and
just decrease the refcount, like it was done on older Linux kernels.
This fixes situations where walk() was wrongly detecting a non-empty
filter, like it happened with cls_u32 in the error path of change(), thus
leading to failures in the following tdc selftests:

6aa7: (filter, u32) Add/Replace u32 with source match and invalid indev
6658: (filter, u32) Add/Replace u32 with custom hash table and invalid handle
74c2: (filter, u32) Add/Replace u32 filter with invalid hash table id

On cls_flower, and on (future) lockless filters, this check is necessary:
move all the check_empty() logic in a callback so that each filter
can have its own implementation. For cls_flower, it's sufficient to check
if no IDRs have been allocated.

This reverts commit 275c44aa194b7159d1191817b20e076f55f0e620.

Changes since v1:
- document the need for delete_empty() when TCF_PROTO_OPS_DOIT_UNLOCKED
is used, thanks to Vlad Buslov
- implement delete_empty() without doing fl_walk(), thanks to Vlad Buslov
- squash revert and new fix in a single patch, to be nice with bisect
tests that run tdc on u32 filter, thanks to Dave Miller

Fixes: 275c44aa194b ("net/sched: cls_u32: fix refcount leak in the error path of u32_change()")
Fixes: 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty")
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Suggested-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

net/sched: act_mirred: Pull mac prior redir to non mac_header_xmit device

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 70cf3dc7313207816255b9acb0dffb19dae78144 ]

There's no skb_pull performed when a mirred action is set at egress of a
mac device, with a target device/action that expects skb->data to point
at the network header.

As a result, either the target device is errornously given an skb with
data pointing to the mac (egress case), or the net stack receives the
skb with data pointing to the mac (ingress case).

E.g:
# tc qdisc add dev eth9 root handle 1: prio
# tc filter add dev eth9 parent 1: prio 9 protocol ip handle 9 basic \
action mirred egress redirect dev tun0

(tun0 is a tun device. result: tun0 errornously gets the eth header
instead of the iph)

Revise the push/pull logic of tcf_mirred_act() to not rely on the
skb_at_tc_ingress() vs tcf_mirred_act_wants_ingress() comparison, as it
does not cover all "pull" cases.

Instead, calculate whether the required action on the target device
requires the data to point at the network header, and compare this to
whether skb->data points to network header - and make the push/pull
adjustments as necessary.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Shmulik Ladkani <sladkani@proofpoint.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

net: phy: aquantia: add suspend / resume ops for AQR105

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 1c93fb45761e79b3c00080e71523886cefaf351c ]

The suspend/resume code for AQR107 works on AQR105 too.
This patch fixes issues with the partner not seeing the link down
when the interface using AQR105 is brought down.

Fixes: bee8259dd31f ("net: phy: add driver for aquantia phy")
Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

net/mlxfw: Fix out-of-memory error in mfa2 flash burning

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit a5bcd72e054aabb93ddc51ed8cde36a5bfc50271 ]

The burning process requires to perform internal allocations of large
chunks of memory. This memory doesn't need to be contiguous and can be
safely allocated by vzalloc() instead of kzalloc(). This patch changes
such allocation to avoid possible out-of-memory failure.

Fixes: 410ed13cae39 ("Add the mlxfw module for Mellanox firmware flash process")
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

net: dsa: bcm_sf2: Fix IP fragment location and behavior

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 7c3125f0a6ebc17846c5908ad7d6056d66c1c426 ]

The IP fragment is specified through user-defined field as the first
bit of the first user-defined word. We were previously trying to extract
it from the user-defined mask which could not possibly work. The ip_frag
is also supposed to be a boolean, if we do not cast it as such, we risk
overwriting the next fields in CFP_DATA(6) which would render the rule
inoperative.

Fixes: 7318166cacad ("net: dsa: bcm_sf2: Add support for ethtool::rxnfc")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

cxgb4/cxgb4vf: fix flow control display for auto negotiation

BugLink: https://bugs.launchpad.net/bugs/1858429
[ Upstream commit 0caeaf6ad532f9be5a768a158627cb31921cc8b7 ]

As per 802.3-2005, Section Two, Annex 28B, Table 28B-2 [1], when
_only_ Rx pause is enabled, both symmetric and asymmetric pause
towards local device must be enabled. Also, firmware returns the local
device's flow control pause params as part of advertised capabilities
and negotiated params as part of current link attributes. So, fix up
ethtool's flow control pause params fetch logic to read from acaps,
instead of linkattr.

[1] https://standards.ieee.org/standard/802_3-2005.html

Fixes: c3168cabe1af ("cxgb4/cxgbvf: Handle 32-bit fw port capabilities")
Signed-off-by: Surendra Mobiya <surendra@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>