git.proxmox.com Git - mirror_ubuntu-kernels.git/log
3 years agoionic: ignore EBUSY on queue start
Shannon Nelson [Wed, 7 Apr 2021 23:19:59 +0000 (16:19 -0700)]
ionic: ignore EBUSY on queue start

When starting the queues in the link-check, don't go into
the BROKEN state if the return was EBUSY.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoionic: re-start ptp after queues up
Shannon Nelson [Wed, 7 Apr 2021 23:19:58 +0000 (16:19 -0700)]
ionic: re-start ptp after queues up

When returning after a firmware reset, re-start the
PTP after we've restarted the general queues.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoionic: add SKBTX_IN_PROGRESS
Shannon Nelson [Wed, 7 Apr 2021 23:19:57 +0000 (16:19 -0700)]
ionic: add SKBTX_IN_PROGRESS

Set the SKBTX_IN_PROGRESS when offloading the Tx timestamp.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoionic: check for valid tx_mode on SKBTX_HW_TSTAMP xmit
Shannon Nelson [Wed, 7 Apr 2021 23:19:56 +0000 (16:19 -0700)]
ionic: check for valid tx_mode on SKBTX_HW_TSTAMP xmit

Make sure the device is in a Tx offload mode before calling the
hwstamp offload xmit.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoionic: remove unnecessary compat ifdef
Shannon Nelson [Wed, 7 Apr 2021 23:19:55 +0000 (16:19 -0700)]
ionic: remove unnecessary compat ifdef

We don't need to look for HAVE_HWSTAMP_TX_ONESTEP_P2P in the
upstream kernel.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoionic: fix up a couple of code style nits
Shannon Nelson [Wed, 7 Apr 2021 23:19:54 +0000 (16:19 -0700)]
ionic: fix up a couple of code style nits

Clean up variable declarations.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'marvell10g-updates'
David S. Miller [Thu, 8 Apr 2021 20:15:34 +0000 (13:15 -0700)]
Merge branch 'marvell10g-updates'

Marek Behún says:

====================
net: phy: marvell10g updates

Here are some updates for marvell10g PHY driver.

I am still working on some more changes for this driver, but I would
like to have at least something reviewed / applied.

Changes since v3:
- added Andrew's Reviewed-by tags
- removed patches adding variadic-macro library and bitmap
  initialization macro - they cause a warning that we are not currently
  able to fix easily. Instead the supported_interfaces bitmap is now
  initialized via a chip specific method
- added explanation of mactype initialization to commit message of patch
  07/16
- fixed repeated word in commit message of second to last patch

Changes since v2:
- code refactored to use an additional structure mv3310_chip describing
  mv3310 specific properties / operations for PHYs supported by this
  driver
- added separate phy_driver structures for 88X3340 and 88E2111
- removed 88E2180 specific code (dual-port and quad-port SXGMII modes
  are ignored for now)

Changes since v1:
- added various MACTYPEs support also for 88E21XX
- differentiate between specific models with same PHY_ID
- better check for compatible interface
- print exact model
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMAINTAINERS: add myself as maintainer of marvell10g driver
Marek Behún [Wed, 7 Apr 2021 20:22:54 +0000 (22:22 +0200)]
MAINTAINERS: add myself as maintainer of marvell10g driver

Add myself as maintainer of the marvell10g ethernet PHY driver, in
addition to Russell King.

Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: marvell10g: change module description
Marek Behún [Wed, 7 Apr 2021 20:22:53 +0000 (22:22 +0200)]
net: phy: marvell10g: change module description

This module supports not only Alaska X, but also Alaska M.

Change module description appropriately.

Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: marvell10g: differentiate 88E2110 vs 88E2111
Marek Behún [Wed, 7 Apr 2021 20:22:52 +0000 (22:22 +0200)]
net: phy: marvell10g: differentiate 88E2110 vs 88E2111

88E2111 is a variant of 88E2110 which does not support 5 gigabit speeds.

Differentiate these variants via the match_phy_device() method, since
they have the same PHY ID.

Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: add constants for 2.5G and 5G speed in PCS speed register
Marek Behún [Wed, 7 Apr 2021 20:22:51 +0000 (22:22 +0200)]
net: phy: add constants for 2.5G and 5G speed in PCS speed register

Add constants for 2.5G and 5G speed in PCS speed register into mdio.h.

Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: marvell10g: fix driver name for mv88e2110
Marek Behún [Wed, 7 Apr 2021 20:22:50 +0000 (22:22 +0200)]
net: phy: marvell10g: fix driver name for mv88e2110

The driver name "mv88x2110" should be instead "mv88e2110".

Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: marvell10g: add separate structure for 88X3340
Marek Behún [Wed, 7 Apr 2021 20:22:49 +0000 (22:22 +0200)]
net: phy: marvell10g: add separate structure for 88X3340

The 88X3340 contains 4 cores similar to 88X3310, but there is a
difference: it does not support xaui host mode. Instead the
corresponding MACTYPE means
  rxaui / 5gbase-r / 2500base-x / sgmii without AN

Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: marvell10g: support other MACTYPEs
Marek Behún [Wed, 7 Apr 2021 20:22:48 +0000 (22:22 +0200)]
net: phy: marvell10g: support other MACTYPEs

Currently the only "changing" MACTYPE we support is when the PHY changes
between
  10gbase-r / 5gbase-r / 2500base-x / sgmii

Add support for
  usxgmii
  xaui / 5gbase-r / 2500base-x / sgmii
  rxaui / 5gbase-r / 2500base-x / sgmii
and also
  5gbase-r / 2500base-x / sgmii
for 88E2110.

Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: marvell10g: store temperature read method in chip strucutre
Marek Behún [Wed, 7 Apr 2021 20:22:47 +0000 (22:22 +0200)]
net: phy: marvell10g: store temperature read method in chip strucutre

Now that we have a chip structure, we can store the temperature reading
method in this structure (OOP style).

Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: marvell10g: check for correct supported interface mode
Marek Behún [Wed, 7 Apr 2021 20:22:46 +0000 (22:22 +0200)]
net: phy: marvell10g: check for correct supported interface mode

The 88E2110 does not support xaui nor rxaui modes. Check for correct
interface mode for different chips.

Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: marvell10g: support all rate matching modes
Marek Behún [Wed, 7 Apr 2021 20:22:45 +0000 (22:22 +0200)]
net: phy: marvell10g: support all rate matching modes

Add support for all rate matching modes for 88X3310 (currently only
10gbase-r is supported, but xaui and rxaui can also be used).

Add support for rate matching for 88E2110 (on 88E2110 the MACTYPE
register is at a different place).

Currently rate matching mode is selected by strapping pins (by setting
the MACTYPE register). There is work in progress to enable this driver
to deduce the best MACTYPE from the knowledge of which interface modes
are supported by the host, but this work is not finished yet.

Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: marvell10g: add MACTYPE definitions for 88E21xx
Marek Behún [Wed, 7 Apr 2021 20:22:44 +0000 (22:22 +0200)]
net: phy: marvell10g: add MACTYPE definitions for 88E21xx

Add all MACTYPE definitions for 88E2110, 88E2180, 88E2111 and 88E2181.

Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: marvell10g: add all MACTYPE definitions for 88X33x0
Marek Behún [Wed, 7 Apr 2021 20:22:43 +0000 (22:22 +0200)]
net: phy: marvell10g: add all MACTYPE definitions for 88X33x0

Add all MACTYPE definitions for 88X3310, 88X3310P, 88X3340 and 88X3340P.

In order to have consistent naming, rename
MV_V2_33X0_PORT_CTRL_MACTYPE_RATE_MATCH to
MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_RATE_MATCH.

Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: marvell10g: indicate 88X33x0 only port control registers
Marek Behún [Wed, 7 Apr 2021 20:22:42 +0000 (22:22 +0200)]
net: phy: marvell10g: indicate 88X33x0 only port control registers

Rename port control registers to indicate that they are valid only for
88X33x0, not for 88E21x0.

Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: marvell10g: allow 5gbase-r and usxgmii
Marek Behún [Wed, 7 Apr 2021 20:22:41 +0000 (22:22 +0200)]
net: phy: marvell10g: allow 5gbase-r and usxgmii

These modes are also supported by these PHYs.

Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: marvell10g: fix typo
Marek Behún [Wed, 7 Apr 2021 20:22:40 +0000 (22:22 +0200)]
net: phy: marvell10g: fix typo

This space should be a tab instead.

Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: marvell10g: rename register
Marek Behún [Wed, 7 Apr 2021 20:22:39 +0000 (22:22 +0200)]
net: phy: marvell10g: rename register

The MV_V2_PORT_MAC_TYPE_* is part of the CTRL register. Rename to
MV_V2_PORT_CTRL_MACTYPE_*.

Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: fealnx: use module_pci_driver to simplify the code
Wei Yongjun [Wed, 7 Apr 2021 15:07:12 +0000 (15:07 +0000)]
net: fealnx: use module_pci_driver to simplify the code

Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: atheros: atl2: use module_pci_driver to simplify the code
Wei Yongjun [Wed, 7 Apr 2021 15:07:11 +0000 (15:07 +0000)]
net: atheros: atl2: use module_pci_driver to simplify the code

Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: sundance: use module_pci_driver to simplify the code
Wei Yongjun [Wed, 7 Apr 2021 15:07:09 +0000 (15:07 +0000)]
net: sundance: use module_pci_driver to simplify the code

Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotulip: de2104x: use module_pci_driver to simplify the code
Wei Yongjun [Wed, 7 Apr 2021 15:07:08 +0000 (15:07 +0000)]
tulip: de2104x: use module_pci_driver to simplify the code

Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotulip: windbond-840: use module_pci_driver to simplify the code
Wei Yongjun [Wed, 7 Apr 2021 15:07:07 +0000 (15:07 +0000)]
tulip: windbond-840: use module_pci_driver to simplify the code

Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoenic: use module_pci_driver to simplify the code
Wei Yongjun [Wed, 7 Apr 2021 15:07:05 +0000 (15:07 +0000)]
enic: use module_pci_driver to simplify the code

Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: encx24j600: use module_spi_driver to simplify the code
Wei Yongjun [Wed, 7 Apr 2021 15:07:04 +0000 (15:07 +0000)]
net: encx24j600: use module_spi_driver to simplify the code

module_spi_driver() makes the code simpler by eliminating
boilerplate code.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wan: z85230: drop unused async state
Johan Hovold [Wed, 7 Apr 2021 10:48:56 +0000 (12:48 +0200)]
net: wan: z85230: drop unused async state

According to the changelog, asynchronous mode was dropped sometime
before v2.2. Let's get rid of the unused driver-specific async state as
well so that it doesn't show up when doing tree-wide tty work.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoliquidio: Fix unintented sign extension of a left shift of a u16
Colin Ian King [Wed, 7 Apr 2021 10:12:48 +0000 (11:12 +0100)]
liquidio: Fix unintented sign extension of a left shift of a u16

The macro CN23XX_PEM_BAR1_INDEX_REG is being used to shift oct->pcie_port
(a u16) left 24 places. There are two subtle issues here. First, the
u16 is promoted to a signed int before the shift and the result is then
sign extended to a u64, so if oct->pcie_port is 0x80 or more the upper
bits are all set to 1. Second, shifting a u16 left by 24 bits can overflow
the 32-bit intermediate, so it needs to be cast to a u64 to keep all the bits.

It is entirely possible that the u16 port value is never large enough
for this to fail, but it is useful to fix unintended overflows such
as this.

Fix this by casting the port parameter to the macro to a u64 before
the shift.

Addresses-Coverity: ("Unintended sign extension")
Fixes: 5bc67f587ba7 ("liquidio: CN23XX register definitions")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
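
The sign extension described in the liquidio commit above can be reproduced in user space. The sketch below is only an illustration: the function names are hypothetical and the real macro is not reproduced. The 32-bit signed intermediate is modelled explicitly, relying on the wrapping conversion that gcc and clang implement, so that the example itself avoids undefined behaviour.

```c
#include <stdint.h>

/*
 * Hypothetical model of the buggy form: the u16 is promoted to a
 * 32-bit signed int, shifted, and the signed result is then widened
 * to 64 bits. The intermediate is built from an unsigned shift so
 * this demo does not itself overflow a signed int.
 */
uint64_t demo_index_buggy(uint16_t port)
{
	int32_t shifted = (int32_t)((uint32_t)port << 24);

	return (uint64_t)shifted;	/* sign extends if bit 31 is set */
}

/* Fixed form: widen to 64 bits before shifting, as the commit does. */
uint64_t demo_index_fixed(uint16_t port)
{
	return (uint64_t)port << 24;
}
```

With a port value of 0x80 the buggy form yields 0xffffffff80000000 while the fixed form yields 0x80000000; port values of 0x100 and above lose their upper bits entirely in the buggy form.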
3 years agoxircom: remove redundant error check on variable err
Colin Ian King [Wed, 7 Apr 2021 09:39:22 +0000 (10:39 +0100)]
xircom: remove redundant error check on variable err

The error check on err is always false as err is always 0 at the
port_found label. The code is redundant and can be removed.

Addresses-Coverity: ("Logically dead code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'linux-can-next-for-5.13-20210407' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Wed, 7 Apr 2021 21:44:52 +0000 (14:44 -0700)]
Merge tag 'linux-can-next-for-5.13-20210407' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2021-04-07

this is a pull request of 6 patches for net-next/master.

The first patch targets the CAN driver infrastructure, it improves the
alloc_can{,fd}_skb() function to set the pointer to the CAN frame to
NULL if skb allocation fails.

The next patch adds missing error handling to the m_can driver's RX
path (the code was introduced in -next, no need to backport).

In the next patch an unused constant is removed from an enum in the
c_can driver.

The last 3 patches target the mcp251xfd driver. They add BQL support
and try to work around a sometimes broken CRC when reading the TBC
register.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: remove the new_ifindex argument from dev_change_net_namespace
Andrei Vagin [Wed, 7 Apr 2021 06:40:51 +0000 (23:40 -0700)]
net: remove the new_ifindex argument from dev_change_net_namespace

There is only one place where we want to specify new_ifindex. In all
other cases, callers pass 0 as new_ifindex. It looks reasonable to add a
low-level function with new_ifindex and to convert
dev_change_net_namespace to a static inline wrapper.

Fixes: eeb85a14ee34 ("net: Allow to specify ifindex when device is moved to another namespace")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
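
The wrapper pattern described in the commit above (a low-level function that takes new_ifindex, plus a static inline wrapper for the common case) might look roughly like this. All names and the struct are hypothetical stand-ins for this user-space sketch, and treating 0 as "keep the current ifindex" is an assumption made for illustration, not the kernel's exact semantics.

```c
#include <stddef.h>

/* Hypothetical stand-in for a network device. */
struct demo_dev {
	int ifindex;
	const char *ns;
};

/*
 * Low-level variant: the caller may request a specific ifindex.
 * In this sketch 0 means "keep the current one" (an assumption).
 */
int demo_change_ns_ifindex(struct demo_dev *dev, const char *ns,
			   int new_ifindex)
{
	if (!dev || !ns || new_ifindex < 0)
		return -1;

	dev->ns = ns;
	if (new_ifindex > 0)
		dev->ifindex = new_ifindex;
	return 0;
}

/* Thin wrapper for the many callers that never pass an ifindex. */
static inline int demo_change_ns(struct demo_dev *dev, const char *ns)
{
	return demo_change_ns_ifindex(dev, ns, 0);
}
```

Keeping the extra argument out of the common entry point means existing callers do not have to pass a literal 0.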
3 years agonet: introduce nla_policy for IFLA_NEW_IFINDEX
Andrei Vagin [Wed, 7 Apr 2021 06:40:03 +0000 (23:40 -0700)]
net: introduce nla_policy for IFLA_NEW_IFINDEX

In this case, we don't need to check that new_ifindex is positive in
validate_linkmsg.

Fixes: eeb85a14ee34 ("net: Allow to specify ifindex when device is moved to another namespace")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'mlx5-updates-2021-04-06' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Wed, 7 Apr 2021 21:38:24 +0000 (14:38 -0700)]
Merge tag 'mlx5-updates-2021-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-04-06

Introduce TC sample offload

Background
----------

The tc sample action allows a user to sample traffic matched by a tc
classifier. The sampling consists of choosing packets randomly and
sampling them using the psample module.

The tc sample parameters include group id, sampling rate and packet's
truncation (to save kernel-user traffic).

Sample in TC SW
---------------

The user must specify the rate and group id for the sample action;
truncation is optional.

tc filter add dev enp4s0f0_0 ingress protocol ip prio 1 flower \
src_mac 02:25:d0:14:01:02 dst_mac 02:25:d0:14:01:03 \
action sample rate 10 group 5 trunc 60 \
action mirred egress redirect dev enp4s0f0_1

The tc sample action kernel module 'act_sample' will call another
kernel module 'psample' to send sampled packets to userspace.

MLX5 sample HW offload - MLX5 driver patches
--------------------------------------------

The sample action is translated to a goto flow table object
destination which samples packets according to the provided
sample ratio. Sampled packets are duplicated. One copy is
processed by a termination table, named the sample table,
which sends the packet to the eswitch manager port (that will
be processed by software).

The second copy is processed by the default table which executes
the subsequent actions. The default table is created per <vport,
chain, prio> tuple as rules with different prios and chains may
overlap.

For example, for the following typical flow table:

+-------------------------------+
+       original flow table     +
+-------------------------------+
+         original match        +
+-------------------------------+
+ sample action + other actions +
+-------------------------------+

We translate the tc filter with sample action to the following HW model:

        +---------------------+
        + original flow table +
        +---------------------+
        +   original match    +
        +---------------------+
                   |
                   v
+------------------------------------------------+
+                Flow Sampler Object             +
+------------------------------------------------+
+                    sample ratio                +
+------------------------------------------------+
+    sample table id    |    default table id    +
+------------------------------------------------+
           |                            |
           v                            v
+-----------------------------+  +----------------------------------------+
+        sample table         +  + default table per <vport, chain, prio> +
+-----------------------------+  +----------------------------------------+
+ forward to management vport +  +            original match              +
+-----------------------------+  +----------------------------------------+
                                 +            other actions               +
                                 +----------------------------------------+

Flow sampler object
-------------------

Hardware introduces a flow sampler object to perform sampling. It is a
new destination type. The driver needs to specify two flow table ids in
it: the sample table id and the default table id.
The sample table samples the packets according to the sample rate and
forwards the sampled packets to the eswitch manager port. The default
table finishes the subsequent actions.

Group id and reg_c0
-------------------

A userspace program takes different actions for sampled packets
according to the tc sample action group id, so hardware must pass the
group id to software for each sampled packet. In Paul Blakey's "Introduce
connection tracking offload" patch set, reg_c0 lower 16 bits are used
for miss packet chain id restore. We convert reg_c0 lower 16 bits to
a common object pool, so other features can also use it.

Since sample group id is 32 bits, create a 16 bits object id to map
the group id and write the object id to reg_c0 lower 16 bits. reg_c0
can only be used for matching. Write reg_c0 to flow_tag, so software
can get the object id via flow_tag and find group id via the common
object pool.
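
The mapping described above, from a 32-bit group id to a 16-bit object id stored in the lower half of reg_c0, can be sketched as follows. This is a simplified illustration and not the mlx5 implementation: the pool size, the struct and every function name here are hypothetical.

```c
#include <stdint.h>

#define DEMO_POOL_SIZE 8	/* hypothetical; far smaller than a real pool */

struct demo_pool {
	uint32_t group[DEMO_POOL_SIZE];
	uint8_t used[DEMO_POOL_SIZE];
};

/* Return a non-zero 16-bit object id for group_id, or 0 if the pool is full. */
uint16_t demo_pool_get_id(struct demo_pool *p, uint32_t group_id)
{
	int free_slot = -1;

	for (int i = 0; i < DEMO_POOL_SIZE; i++) {
		if (p->used[i] && p->group[i] == group_id)
			return (uint16_t)(i + 1);	/* re-use existing mapping */
		if (!p->used[i] && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return 0;

	p->used[free_slot] = 1;
	p->group[free_slot] = group_id;
	return (uint16_t)(free_slot + 1);
}

/* Software side: recover the 32-bit group id from the object id. */
int demo_pool_find_group(const struct demo_pool *p, uint16_t id,
			 uint32_t *group_id)
{
	if (id == 0 || id > DEMO_POOL_SIZE || !p->used[id - 1])
		return -1;

	*group_id = p->group[id - 1];
	return 0;
}

/* Write the object id into the lower 16 bits, keeping the upper 16. */
uint32_t demo_reg_c0_set_id(uint32_t reg_c0, uint16_t id)
{
	return (reg_c0 & 0xffff0000u) | id;
}
```

Because the same group id always maps to the same object id, identical sample parameters can share one mapping.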

Sampler restore handle
----------------------

Use common object pool to create an object id to map sample parameters.
Allocate a modify header action to write the object id to reg_c0 lower
16 bits. Create a restore rule to pass the object id to software. So
software can identify sampled packets via the object id and send it to
userspace.

Aggregate the modify header action, restore rule and object id to a
sample restore handle. Re-use identical sample restore handle for
the same object id.

Send sampled packets to userspace
---------------------------------

The destination for sampled packets is eswitch manager port, so
representors can receive sampled packets together with the group id.
Driver will send sampled packets and group id to userspace via psample.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonfc/fdp: remove unnecessary assignment and label
wengjianfeng [Wed, 7 Apr 2021 03:16:38 +0000 (11:16 +0800)]
nfc/fdp: remove unnecessary assignment and label

In the functions fdp_nci_patch_otp and fdp_nci_patch_ram, many goto out
statements are used, and the out label just returns the variable r.
In some places the code jumps straight to the out label, and in others
it assigns a value to r and then jumps to the out label.
This is unnecessary; use return statements in place of the goto
statements and delete the out label.

Signed-off-by: wengjianfeng <wengjianfeng@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlxsw: core: Remove critical trip points from thermal zones
Vadim Pasternak [Tue, 6 Apr 2021 12:27:33 +0000 (15:27 +0300)]
mlxsw: core: Remove critical trip points from thermal zones

Disable software thermal protection by removing critical trip points
from all thermal zones.

The software thermal protection is redundant given there are two layers
of protection below it in firmware and hardware. The first layer is
performed by firmware, the second, in case firmware was not able to
perform protection, by hardware.
The temperature threshold set for hardware protection is always higher
than for firmware.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agostmmac: intel: Enable SERDES PHY rx clk for PSE
Voon Weifeng [Tue, 6 Apr 2021 01:32:50 +0000 (09:32 +0800)]
stmmac: intel: Enable SERDES PHY rx clk for PSE

EHL PSE SGMII mode requires the SERDES PHY rx clk to be ungated during
the power up sequence and gated again during power down.

Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoethtool: document PHY tunable callbacks
Jakub Kicinski [Wed, 7 Apr 2021 00:23:59 +0000 (17:23 -0700)]
ethtool: document PHY tunable callbacks

Add missing kdoc for phy tunable callbacks.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'mptcp-next'
David S. Miller [Wed, 7 Apr 2021 21:09:40 +0000 (14:09 -0700)]
Merge branch 'mptcp-next'

Mat Martineau says:

====================
mptcp: Cleanup, a new test case, and header trimming

Some more patches to include from the MPTCP tree:

Patches 1-6 refactor an address-related data structure and reduce some
duplicate code that handles IPv4 and IPv6 addresses.

Patch 7 adds a test case for the MPTCP netlink interface, passing a
specific ifindex to the kernel.

Patch 8 drops extra header options from IPv4 address echo packets,
improving consistency and testability between IPv4 and IPv6.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomptcp: drop all sub-options except ADD_ADDR when the echo bit is set
Davide Caratti [Wed, 7 Apr 2021 00:16:04 +0000 (17:16 -0700)]
mptcp: drop all sub-options except ADD_ADDR when the echo bit is set

Current Linux carries echo-ed ADD_ADDR over pure TCP ACKs, so there is no
need to add a DSS element that would fit only ADD_ADDR with IPv4 address.
Drop the DSS from echo-ed ADD_ADDR, regardless of the IP version.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: mptcp: add the net device name testcase
Geliang Tang [Wed, 7 Apr 2021 00:16:03 +0000 (17:16 -0700)]
selftests: mptcp: add the net device name testcase

This patch added a new testcase for setting the net device name. In it,
pass the net device name to pm_nl_ctl to set the ifindex field of struct
mptcp_pm_addr_entry.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomptcp: unify add_addr(6)_generate_hmac
Geliang Tang [Wed, 7 Apr 2021 00:16:02 +0000 (17:16 -0700)]
mptcp: unify add_addr(6)_generate_hmac

The length of the IPv4 address is 4 octets and IPv6 is 16. That's the only
difference between add_addr_generate_hmac and add_addr6_generate_hmac.

This patch drops the duplicate code and unifies them into one.

Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
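
Since the address length is the only difference between the two helpers, a unified version can select the length from the address family. The sketch below only shows how an input buffer for the HMAC might be assembled; the struct, the function name and the buffer layout (id, address, port) are assumptions made for illustration, and the HMAC computation itself is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical address record used only in this sketch. */
struct demo_addr {
	sa_family_t family;
	uint8_t id;
	uint16_t port;		/* host byte order in this sketch */
	uint8_t addr[16];	/* IPv4 uses the first 4 octets */
};

/*
 * Fill msg with id, address and port, and return the number of
 * bytes used: 7 for IPv4 and 19 for IPv6.
 */
size_t demo_build_hmac_msg(const struct demo_addr *a, uint8_t msg[19])
{
	size_t addr_len = (a->family == AF_INET) ? 4 : 16;
	size_t i = 0;

	msg[i++] = a->id;
	memcpy(&msg[i], a->addr, addr_len);
	i += addr_len;
	msg[i++] = (uint8_t)(a->port >> 8);
	msg[i++] = (uint8_t)(a->port & 0xff);
	return i;
}
```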
3 years agomptcp: drop MPTCP_ADDR_IPVERSION_4/6
Geliang Tang [Wed, 7 Apr 2021 00:16:01 +0000 (17:16 -0700)]
mptcp: drop MPTCP_ADDR_IPVERSION_4/6

Since the type of the address family in struct mptcp_options_received
became sa_family_t, we should set AF_INET/AF_INET6 to it, instead of
using MPTCP_ADDR_IPVERSION_4/6.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomptcp: use mptcp_addr_info in mptcp_options_received
Geliang Tang [Wed, 7 Apr 2021 00:16:00 +0000 (17:16 -0700)]
mptcp: use mptcp_addr_info in mptcp_options_received

This patch added a new struct mptcp_addr_info member addr in struct
mptcp_options_received, and dropped the original family, addr_id, addr,
addr6 and port fields in it. Then we can pass the parameter mp_opt.addr
directly to mptcp_pm_add_addr_received and mptcp_pm_add_addr_echoed.

Since the port number became big-endian now, use htons to convert the
incoming port number to it. Also use ntohs to convert it when passing
it to add_addr_generate_hmac or printing it out.

Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomptcp: drop OPTION_MPTCP_ADD_ADDR6
Geliang Tang [Wed, 7 Apr 2021 00:15:59 +0000 (17:15 -0700)]
mptcp: drop OPTION_MPTCP_ADD_ADDR6

Since the family field was added in struct mptcp_out_options, no need to
use OPTION_MPTCP_ADD_ADDR6 to identify the IPv6 address. Drop it.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomptcp: use mptcp_addr_info in mptcp_out_options
Geliang Tang [Wed, 7 Apr 2021 00:15:58 +0000 (17:15 -0700)]
mptcp: use mptcp_addr_info in mptcp_out_options

This patch moved the mptcp_addr_info struct from protocol.h to mptcp.h,
added a new struct mptcp_addr_info member addr in struct mptcp_out_options,
and dropped the original addr, addr6, addr_id and port fields in it. Then
we can use opts->addr to get the adding address from PM directly using
mptcp_pm_add_addr_signal.

Since the port number became big-endian now, use ntohs to convert it
before sending it out with the ADD_ADDR suboption. Also convert it
when passing it to add_addr_generate_hmac or printing it out.

Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomptcp: move flags and ifindex out of mptcp_addr_info
Geliang Tang [Wed, 7 Apr 2021 00:15:57 +0000 (17:15 -0700)]
mptcp: move flags and ifindex out of mptcp_addr_info

This patch moved the flags and ifindex fields from struct mptcp_addr_info
to struct mptcp_pm_addr_entry. Add the flags and ifindex values as two new
parameters to __mptcp_subflow_connect.

In mptcp_pm_create_subflow_or_signal_addr, pass the local address entry's
flags and ifindex fields to __mptcp_subflow_connect.

In mptcp_pm_nl_add_addr_received, just pass two zeros to it.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agocan: mcp251xfd: mcp251xfd_regmap_crc_read(): work around broken CRC on TBC register
Marc Kleine-Budde [Mon, 15 Mar 2021 09:59:09 +0000 (10:59 +0100)]
can: mcp251xfd: mcp251xfd_regmap_crc_read(): work around broken CRC on TBC register

MCP251XFD_REG_TBC is the time base counter register. It increments
once per SYS clock tick, which is 20 or 40 MHz. Observation shows that
if the lowest byte (which is transferred first on the SPI bus) of that
register is 0x00 or 0x80 the calculated CRC doesn't always match the
transferred one.

To reproduce this problem let the driver read the TBC register at a
high frequency. This can be done by attaching only the mcp251xfd CAN
controller to a properly terminated CAN bus and sending a single CAN frame.
As there are no other CAN controllers on the bus, the sent CAN frame is
not ACKed and the mcp251xfd repeats it. If user space enables the bus
error reporting, each of the NACK errors is reported with a time
stamp (which is read from the TBC register) to user space.

$ ip link set can0 down
$ ip link set can0 up type can bitrate 500000 berr-reporting on
$ cansend can0 4FF#ff.01.00.00.00.00.00.00

This leads to several error messages per second:

| mcp251xfd spi0.0 can0: CRC read error at address 0x0010 (length=4, data=00 3a 86 da, CRC=0x7753) retrying.
| mcp251xfd spi0.0 can0: CRC read error at address 0x0010 (length=4, data=80 01 b4 da, CRC=0x5830) retrying.
| mcp251xfd spi0.0 can0: CRC read error at address 0x0010 (length=4, data=00 e9 23 db, CRC=0xa723) retrying.
| mcp251xfd spi0.0 can0: CRC read error at address 0x0010 (length=4, data=00 8a 30 db, CRC=0x4a9c) retrying.
| mcp251xfd spi0.0 can0: CRC read error at address 0x0010 (length=4, data=80 f3 43 db, CRC=0x66d2) retrying.

If the highest bit in the lowest byte is flipped the transferred CRC
matches the calculated one. We assume for now the CRC calculation in
the chip works on wrong data and the transferred data is correct.

This patch implements the following workaround:

- If a CRC read error on the TBC register is detected and the lowest
  byte is 0x00 or 0x80, the highest bit of the lowest byte is flipped
  and the CRC is calculated again.
- If the CRC now matches, the _original_ data is passed to the reader.
  For now we assume transferred data was OK.
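
The two workaround steps can be sketched as follows. This is a simplified userspace illustration, not the driver code: crc16_sketch() is a generic bitwise CRC-16 chosen for the example, the CRC is computed over the four data bytes only, and tbc_crc_ok() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative bitwise CRC-16 (poly 0x8005, init 0xffff, MSB first). */
static uint16_t crc16_sketch(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0xffff;

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)buf[i] << 8;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8005)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}

/* Returns true if the 4-byte TBC value should be accepted. */
static bool tbc_crc_ok(const uint8_t data[4], uint16_t crc_rx)
{
	uint8_t tmp[4];

	if (crc16_sketch(data, 4) == crc_rx)
		return true;

	/* The workaround applies only if the lowest byte is 0x00 or 0x80. */
	if (data[0] != 0x00 && data[0] != 0x80)
		return false;

	memcpy(tmp, data, sizeof(tmp));
	tmp[0] ^= 0x80;	/* flip the highest bit of the lowest byte */

	/* On a match the caller keeps using the original, unmodified data. */
	return crc16_sketch(tmp, 4) == crc_rx;
}
```

The flipped copy is used only for the CRC comparison, so the reader still receives the bytes exactly as they were transferred.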

Link: https://lore.kernel.org/r/20210406110617.1865592-5-mkl@pengutronix.de
Cc: Manivannan Sadhasivam <mani@kernel.org>
Cc: Thomas Kopp <thomas.kopp@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp251xfd: mcp251xfd_regmap_crc_read_one(): Factor out crc check into separate...
Marc Kleine-Budde [Mon, 15 Mar 2021 07:59:15 +0000 (08:59 +0100)]
can: mcp251xfd: mcp251xfd_regmap_crc_read_one(): Factor out crc check into separate function

This patch factors out the crc check into a separate function. This is
preparation for the next patch.

Link: https://lore.kernel.org/r/20210406110617.1865592-4-mkl@pengutronix.de
Cc: Manivannan Sadhasivam <mani@kernel.org>
Cc: Thomas Kopp <thomas.kopp@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp251xfd: add BQL support
Marc Kleine-Budde [Sun, 13 Dec 2020 16:25:15 +0000 (17:25 +0100)]
can: mcp251xfd: add BQL support

This patch re-adds BQL support to the driver. Support for
netdev_xmit_more() will be added in a separate patch series.

Link: https://lore.kernel.org/r/20210406110617.1865592-3-mkl@pengutronix.de
Cc: Manivannan Sadhasivam <mani@kernel.org>
Cc: Thomas Kopp <thomas.kopp@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: c_can: remove unused enum BOSCH_C_CAN_PLATFORM
Marc Kleine-Budde [Wed, 24 Jul 2019 09:51:32 +0000 (11:51 +0200)]
can: c_can: remove unused enum BOSCH_C_CAN_PLATFORM

This patch removes the unused enum BOSCH_C_CAN_PLATFORM.

Link: https://lore.kernel.org/r/20210406110617.1865592-2-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: m_can: m_can_receive_skb(): add missing error handling to can_rx_offload_queue_s...
Marc Kleine-Budde [Thu, 1 Apr 2021 08:37:31 +0000 (10:37 +0200)]
can: m_can: m_can_receive_skb(): add missing error handling to can_rx_offload_queue_sorted() call

In commit 1be37d3b0414 ("can: m_can: fix periph RX path: use
rx-offload to ensure skbs are sent from softirq context") the RX path
for peripherals (i.e. SPI based m_can controllers) was converted to
the rx-offload infrastructure. However, the error handling for
can_rx_offload_queue_sorted() was forgotten.
can_rx_offload_queue_sorted() will return with an error if the
internal queue is full.

This patch adds the missing error handling, by increasing the
rx_fifo_errors.
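
A minimal sketch of this kind of error handling, assuming a bounded queue that reports failure when full. The types, the QUEUE_MAX limit and the -ENOBUFS return value are illustrative assumptions, not taken from the rx-offload code.

```c
#include <errno.h>
#include <stddef.h>

#define QUEUE_MAX 2	/* hypothetical queue depth */

struct rx_queue_sketch { size_t len; };
struct rx_stats_sketch { unsigned long rx_fifo_errors; };

/* Stand-in for the queueing call: fails once the queue is full. */
static int queue_sorted_sketch(struct rx_queue_sketch *q)
{
	if (q->len >= QUEUE_MAX)
		return -ENOBUFS;
	q->len++;
	return 0;
}

/* The return value is checked and a dropped frame is counted. */
static void receive_frame_sketch(struct rx_queue_sketch *q,
				 struct rx_stats_sketch *stats)
{
	if (queue_sorted_sketch(q))
		stats->rx_fifo_errors++;
}
```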

Fixes: 1be37d3b0414 ("can: m_can: fix periph RX path: use rx-offload to ensure skbs are sent from softirq context")
Link: https://lore.kernel.org/r/20210401084515.1455013-1-mkl@pengutronix.de
Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1503583 ("Error handling issues")
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Torin Cooper-Bennun <torin@maxiluxsystems.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: skb: alloc_can{,fd}_skb(): set "cf" to NULL if skb allocation fails
Marc Kleine-Budde [Fri, 2 Apr 2021 10:05:39 +0000 (12:05 +0200)]
can: skb: alloc_can{,fd}_skb(): set "cf" to NULL if skb allocation fails

The handling of CAN bus errors typically consists of allocating a CAN
error SKB using alloc_can_err_skb() followed by stats handling and
filling the error details in the newly allocated CAN error SKB. Even
if the allocation of the SKB fails the stats handling should not be
skipped.

The common pattern in CAN drivers is to allocate the skb and work on
the struct can_frame pointer "cf", if it has been assigned by
alloc_can_err_skb().

| skb = alloc_can_err_skb(priv->ndev, &cf);
|
| /* RX errors */
| if (bdiag1 & (MCP251XFD_REG_BDIAG1_DCRCERR |
|               MCP251XFD_REG_BDIAG1_NCRCERR)) {
|         netdev_dbg(priv->ndev, "CRC error\n");
|
|         stats->rx_errors++;
|         if (cf)
|                 cf->data[3] |= CAN_ERR_PROT_LOC_CRC_SEQ;
| }

In case of an OOM alloc_can_err_skb() returns NULL, but doesn't set
"cf" to NULL as well. For the above pattern to work the "cf" has to be
initialized to NULL, which is easily forgotten.

To solve this kind of problem, set "cf" to NULL if
alloc_can_err_skb() returns NULL.
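
The idea can be sketched with a mock allocator in userspace. All names ending in _sketch are hypothetical, and the simulate_oom flag exists only to exercise the failure path.

```c
#include <stdlib.h>

struct can_frame_sketch { unsigned char data[8]; };
struct skb_sketch { struct can_frame_sketch frame; };

/* Mock of an error-skb allocator; simulate_oom forces the failure path. */
static struct skb_sketch *alloc_err_skb_sketch(struct can_frame_sketch **cf,
					       int simulate_oom)
{
	struct skb_sketch *skb = simulate_oom ? NULL : calloc(1, sizeof(*skb));

	if (!skb) {
		*cf = NULL;	/* never leave the caller's pointer stale */
		return NULL;
	}

	*cf = &skb->frame;
	return skb;
}
```

With this behavior the caller's "if (cf)" checks are safe even when the local pointer was never initialized.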

Link: https://lore.kernel.org/r/20210402102245.1512583-1-mkl@pengutronix.de
Suggested-by: Vincent MAILHOL <mailhol.vincent@wanadoo.fr>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agonet/mlx5e: TC, Add support to offload sample action
Chris Mi [Mon, 21 Sep 2020 08:45:07 +0000 (16:45 +0800)]
net/mlx5e: TC, Add support to offload sample action

The following diagram illustrates the hardware model for tc sample action:

        +---------------------+
        + original flow table +
        +---------------------+
        +   original match    +
        +---------------------+
                   |
                   v
+------------------------------------------------+
+                Flow Sampler Object             +
+------------------------------------------------+
+                    sample ratio                +
+------------------------------------------------+
+    sample table id    |    default table id    +
+------------------------------------------------+
           |                            |
           v                            v
+-----------------------------+  +----------------------------------------+
+        sample table         +  + default table per <vport, chain, prio> +
+-----------------------------+  +----------------------------------------+
+ forward to management vport +  +            original match              +
+-----------------------------+  +----------------------------------------+
                                 +            other actions               +
                                 +----------------------------------------+

The sample action is translated to a goto flow table object
destination which samples packets according to the provided
sample ratio. Sampled packets are duplicated. One copy is
processed by a termination table, named the sample table,
which sends the packet to the eswitch manager port (that will
be processed by software).

The second copy is processed by the default table which executes
the subsequent actions. The default table is created per <vport,
chain, prio> tuple as rules with different prios and chains may
overlap.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: TC, Handle sampled packets
Chris Mi [Tue, 26 Jan 2021 03:15:46 +0000 (11:15 +0800)]
net/mlx5e: TC, Handle sampled packets

Mark the sampled packets with a sample restore object. Send sampled
packets using the psample api.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: TC, Refactor tc update skb function
Chris Mi [Fri, 15 Jan 2021 08:03:47 +0000 (16:03 +0800)]
net/mlx5e: TC, Refactor tc update skb function

Refactor as a pre-step to processing sampled packets in this function.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: TC, Add sampler restore handle API
Chris Mi [Tue, 26 Jan 2021 03:08:28 +0000 (11:08 +0800)]
net/mlx5e: TC, Add sampler restore handle API

Use common object pool to create an object ID to map sample parameters.
Allocate a modify header action to write the object ID to reg_c0 lower
16 bits. Create a restore rule to pass the object ID to software. So
software can identify sampled packets via the object ID and send it to
userspace.
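
Writing an object ID into the lower 16 bits of a 32-bit register value while preserving the upper bits can be sketched as below. The helper names and the mask macro are hypothetical; only the "lower 16 bits" layout comes from the description above.

```c
#include <stdint.h>

#define OBJ_ID_MASK 0xffffu	/* lower 16 bits carry the object ID */

/* Replace the lower 16 bits, keep whatever the upper 16 bits hold. */
static uint32_t reg_c0_set_obj_id(uint32_t reg_c0, uint16_t obj_id)
{
	return (reg_c0 & ~OBJ_ID_MASK) | obj_id;
}

/* Recover the object ID on the software side. */
static uint16_t reg_c0_get_obj_id(uint32_t reg_c0)
{
	return (uint16_t)(reg_c0 & OBJ_ID_MASK);
}
```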

Aggregate the modify header action, restore rule and object ID to a
sample restore handle. Re-use identical sample restore handle for
the same object ID.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: TC, Add sampler object API
Chris Mi [Mon, 21 Sep 2020 05:25:54 +0000 (13:25 +0800)]
net/mlx5e: TC, Add sampler object API

In order to offload the sample action, HW introduces a sampler object. The
sampler object samples packets according to the provided sample ratio.
Sampled packets are duplicated. One copy is processed by a termination
table, named the sample table, which sends the packet up to software.
The second copy is processed by the default table.

Instantiate sampler object. Re-use identical sampler object for
the same sample ratio, sample table and default table as a prestep for
offloading tc sample actions.
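
Re-using an identical object for the same key can be sketched with a small refcounted table. This is a userspace illustration with a fixed-size array and hypothetical names; the real driver's data structures and locking are not shown here.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SAMPLERS 8	/* hypothetical table size */

struct sampler_sketch {
	uint32_t ratio;
	uint32_t sample_table_id;
	uint32_t default_table_id;
	unsigned int refcount;	/* 0 means the slot is free */
};

static struct sampler_sketch samplers[MAX_SAMPLERS];

/* Return an existing sampler with the same key, or claim a free slot. */
static struct sampler_sketch *sampler_get(uint32_t ratio, uint32_t sample_tbl,
					  uint32_t default_tbl)
{
	struct sampler_sketch *free_slot = NULL;

	for (size_t i = 0; i < MAX_SAMPLERS; i++) {
		struct sampler_sketch *s = &samplers[i];

		if (s->refcount && s->ratio == ratio &&
		    s->sample_table_id == sample_tbl &&
		    s->default_table_id == default_tbl) {
			s->refcount++;
			return s;
		}
		if (!s->refcount && !free_slot)
			free_slot = s;
	}

	if (!free_slot)
		return NULL;

	*free_slot = (struct sampler_sketch){
		.ratio = ratio,
		.sample_table_id = sample_tbl,
		.default_table_id = default_tbl,
		.refcount = 1,
	};
	return free_slot;
}

/* Drop one reference; the slot becomes reusable when it reaches zero. */
static void sampler_put(struct sampler_sketch *s)
{
	if (s->refcount)
		s->refcount--;
}
```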

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: TC, Add sampler termination table API
Chris Mi [Fri, 18 Sep 2020 09:46:57 +0000 (17:46 +0800)]
net/mlx5e: TC, Add sampler termination table API

Sampled packets are sent to software using termination tables. That
table holds a single rule, which forwards sampled packets to
the e-switch management vport.

Create a sampler termination table and rule for each eswitch.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: TC, Parse sample action
Chris Mi [Mon, 31 Aug 2020 05:28:35 +0000 (13:28 +0800)]
net/mlx5e: TC, Parse sample action

Parse TC sample action and save sample parameters in flow attribute
data structure.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Instantiate separate mapping objects for FDB and NIC tables
Chris Mi [Mon, 31 Aug 2020 05:27:53 +0000 (13:27 +0800)]
net/mlx5: Instantiate separate mapping objects for FDB and NIC tables

Currently, the u32 chain id is mapped to u16 value which is stored on
the lower 16 bits of reg_c0 for FDB and reg_b for NIC tables. The
mapping is internally maintained by the chains object. However, with
the introduction of reg_c0 objects the fdb may store more than just
the chain id on reg_c0. This is not relevant for NIC tables.

Separate the chains mapping instantiation for FDB and NIC tables.
Remove the mapping from the chains object. For FDB tables, create
the mapping per eswitch. For NIC tables, create the mapping per tc
table. Pass the corresponding mapping pointer when creating the
chains object.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Map register values to restore objects
Chris Mi [Thu, 10 Sep 2020 07:28:02 +0000 (15:28 +0800)]
net/mlx5: Map register values to restore objects

Currently reg_c0 lower 16 bits and reg_b are used to store the chain
id that missed in FDB and NIC tables respectively. However, the
registers' values may index a restore object, rather than a single u32
value. Different object types can be used to restore mutually exclusive
contexts such as chain id and sample group id.

Use the mapping object to associate an index with a restore object
as a prestep for supporting additional restore types.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: E-switch, Set per vport table default group number
Chris Mi [Fri, 9 Oct 2020 03:06:33 +0000 (11:06 +0800)]
net/mlx5: E-switch, Set per vport table default group number

Each per vport table is created using its own per vport table
namespace. Because we can't use a variable to set the namespace member
value, a max group number of 0 in the namespace means that the eswitch
default max group number is used.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: E-switch, Generalize per vport table API
Chris Mi [Mon, 31 Aug 2020 05:23:25 +0000 (13:23 +0800)]
net/mlx5: E-switch, Generalize per vport table API

Currently, per vport table was used only for port mirroring actions.
However, sample action will also require a per vport table instance.

Generalize the vport table API to work with multiple namespaces where
each namespace manages its own vport table instance.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: E-switch, Rename functions to follow naming convention.
Chris Mi [Thu, 14 Jan 2021 07:12:36 +0000 (15:12 +0800)]
net/mlx5: E-switch, Rename functions to follow naming convention.

Public API names start with mlx5; remove the mlx5 prefix from non-public API names.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: E-switch, Move vport table functions to a new file
Chris Mi [Mon, 31 Aug 2020 05:22:20 +0000 (13:22 +0800)]
net/mlx5: E-switch, Move vport table functions to a new file

Currently, the vport table functions are in common eswitch offload
file. This file is too big. Move the vport table create, delete and
lookup functions to a separate file. Put the file in esw directory.

Pre-step for generalizing its functionality for serving both the
mirroring and the sample features.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agodocs: ethtool: correct quotes
Jakub Kicinski [Tue, 6 Apr 2021 22:59:31 +0000 (15:59 -0700)]
docs: ethtool: correct quotes

Quotes to backticks. All commands use backticks since the names
are constants.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonfp: flower: add support for packet-per-second policing
Peng Zhang [Tue, 6 Apr 2021 15:54:52 +0000 (17:54 +0200)]
nfp: flower: add support for packet-per-second policing

Allow hardware offload of a policer action attached to a matchall filter
which enforces a packets-per-second rate-limit.

e.g.
tc filter add dev tap1 parent ffff: u32 match \
        u32 0 0 police pkts_rate 3000 pkts_burst 1000

Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
David S. Miller [Tue, 6 Apr 2021 23:36:41 +0000 (16:36 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains Netfilter/IPVS updates for your net-next tree:

1) Simplify log infrastructure modularity: Merge ipv4, ipv6, bridge,
   netdev and ARP families to nf_log_syslog.c. Add module softdeps.
   This fixes a rare deadlock condition that might occur when log
   module autoload is required. From Florian Westphal.

2) Move part of the netfilter-related pernet data from struct net to
   net_generic() infrastructure. All of these users can be modules,
   so if they are not loaded there is no need to waste space. Size
   reduction is 7 cachelines on x86_64, also from Florian.

3) Update nftables audit support to report events once per table,
   to get it aligned with iptables. From Richard Guy Briggs.

4) Check for stale routes from the flowtable garbage collector path.
   This fixes IPv6, which breaks due to a missing check for the dst_cookie.

5) Add a nfnl_fill_hdr() function to simplify netlink + nfnetlink
   headers setup.

6) Remove documentation on several statified functions.

7) Remove printk on netns creation for the FTP IPVS tracker,
   from Florian Westphal.

8) Remove unnecessary nf_tables_destroy_list_lock spinlock
   initialization, from Yang Yingliang.

9) Remove a duplicated forward declaration in ipset,
   from Wan Jiabing.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotime64.h: Consolidated PSEC_PER_SEC definition
Andy Shevchenko [Tue, 6 Apr 2021 10:22:51 +0000 (13:22 +0300)]
time64.h: Consolidated PSEC_PER_SEC definition

We currently have three users of PSEC_PER_SEC, each of them defining it
individually. Instead, move it to time64.h to be available for everyone.

There is a new user coming with the same constant in use. It will also
make its life easier.
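
As an illustration of how such a constant is typically used, the sketch below converts a clock rate into a period in picoseconds. The value 10^12 is simply the number of picoseconds in a second; clk_period_ps() is a hypothetical helper, not one of the existing users.

```c
#include <stdint.h>

#define PSEC_PER_SEC 1000000000000LL	/* picoseconds per second */

/* Hypothetical helper: period in picoseconds of a clock given in Hz. */
static int64_t clk_period_ps(int64_t rate_hz)
{
	return PSEC_PER_SEC / rate_hz;
}
```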

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agostmmac: intel: Drop duplicate ID in the list of PCI device IDs
Andy Shevchenko [Tue, 6 Apr 2021 10:13:06 +0000 (13:13 +0300)]
stmmac: intel: Drop duplicate ID in the list of PCI device IDs

The PCI device IDs are defined with a prefix PCI_DEVICE_ID.
There is no need to repeat the ID part at the end of each definition.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonetdevsim: remove unneeded semicolon
Qiheng Lin [Tue, 6 Apr 2021 03:18:13 +0000 (11:18 +0800)]
netdevsim: remove unneeded semicolon

Eliminate the following coccicheck warning:
 drivers/net/netdevsim/fib.c:569:2-3: Unneeded semicolon

Signed-off-by: Qiheng Lin <linqiheng@huawei.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ethernet: mtk_eth_soc: remove unneeded semicolon
Qiheng Lin [Tue, 6 Apr 2021 03:04:33 +0000 (11:04 +0800)]
net: ethernet: mtk_eth_soc: remove unneeded semicolon

Eliminate the following coccicheck warning:
 drivers/net/ethernet/mediatek/mtk_ppe.c:270:2-3: Unneeded semicolon

Signed-off-by: Qiheng Lin <linqiheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonfc: s3fwrn5: remove unnecessary label
wengjianfeng [Tue, 6 Apr 2021 01:59:54 +0000 (09:59 +0800)]
nfc: s3fwrn5: remove unnecessary label

In function s3fwrn5_nci_post_setup, the variable ret is assigned and
then control jumps to the out label, which just returns ret, so use
return instead. The other goto statements are similar: replace them
with return statements and delete the out label.
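
The shape of this cleanup can be sketched as a before/after pair. do_step() and both function names are hypothetical stand-ins, not the s3fwrn5 code.

```c
/* Hypothetical step, a stand-in for a command sent to the device. */
static int do_step(int fail)
{
	return fail ? -5 : 0;
}

/* Before: every error path jumps to a label that only returns. */
static int post_setup_goto(int fail_first, int fail_second)
{
	int ret;

	ret = do_step(fail_first);
	if (ret < 0)
		goto out;

	ret = do_step(fail_second);
out:
	return ret;
}

/* After: return directly, so the label is no longer needed. */
static int post_setup_return(int fail_first, int fail_second)
{
	int ret;

	ret = do_step(fail_first);
	if (ret < 0)
		return ret;

	return do_step(fail_second);
}
```

Both variants return the same value for every input; the second just has fewer jumps to follow.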

Signed-off-by: wengjianfeng <wengjianfeng@yulong.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'usbnet-speed'
David S. Miller [Tue, 6 Apr 2021 23:22:37 +0000 (16:22 -0700)]
Merge branch 'usbnet-speed'

Grant Grundler says:

====================
usbnet: speed reporting for devices without MDIO

This series introduces support for USB network devices that report
speed as a part of their protocol, not emulating an MII to be accessed
over MDIO.

v2: rebased on recent upstream changes
v3: incorporated hints on naming and comments
v4: fix misplaced hunks; reword some commit messages;
    add same change for cdc_ether
v4-repost: added "net-next" to subject and Andrew Lunn's Reviewed-by

I'm reposting Oliver Neukum's <oneukum@suse.com> patch series with
fix ups for "misplaced hunks" (landed in the wrong patches).
Please fixup the "author" if "git am" fails to attribute the
patches 1-3 (of 4) to Oliver.

I've tested v4 series with "5.12-rc3+" kernel on an Intel NUC6i5SYB
+ Sabrent NT-S25G. Google Pixelbook Go (chromeos-4.4 kernel)
+ Alpha Network AUE2500C were connected directly to the NT-S25G
to get 2.5Gbps link rate:
Settings for enx002427880815:
        Supported ports: [  ]
        Supported link modes:   Not reported
        Supported pause frame use: No
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: 2500Mb/s
        Duplex: Half
        Auto-negotiation: off
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        MDI-X: Unknown
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes

"Duplex" is a lie since we get no information about it.

I expect "Auto-Negotiation" is always true for cdc_ncm and
cdc_ether devices and perhaps someone knows offhand how
to have ethtool report "true" instead.

But this is a good step in the right direction.

base-commit: 1c273e10bc0cc7efb933e0ca10e260cdfc9f0b8c
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: cdc_ether: record speed in status method
Grant Grundler [Mon, 5 Apr 2021 23:13:44 +0000 (16:13 -0700)]
net: cdc_ether: record speed in status method

Until very recently, the usbnet framework only had support functions
for devices which reported the link speed by explicitly querying the
PHY over a MDIO interface. However, the cdc_ether devices send
notifications when the link state or link speeds change and do not
expose the PHY (or modem) directly.

Support functions (e.g. usbnet_get_link_ksettings_internal()) to directly
query state recorded by the cdc_ether driver were added in a previous patch.

Instead of cdc_ether spewing the link speed into the dmesg buffer,
record the link speed encoded in these notifications and tell the
usbnet framework to use the new functions to get link speed/state.

User space can now get the most recent link speed/state using ethtool.

v4: added to series since cdc_ether uses same notifications
    as cdc_ncm driver.

Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: cdc_ncm: record speed in status method
Oliver Neukum [Mon, 5 Apr 2021 23:13:43 +0000 (16:13 -0700)]
net: cdc_ncm: record speed in status method

Until very recently, the usbnet framework only had support functions
for devices which reported the link speed by explicitly querying the
PHY over a MDIO interface. However, the cdc_ncm devices send
notifications when the link state or link speeds change and do not
expose the PHY (or modem) directly.

Support functions (e.g. usbnet_get_link_ksettings_internal()) to directly
query state recorded by the cdc_ncm driver were added in a previous patch.

So instead of cdc_ncm spewing the link speed into the dmesg buffer,
record the link speed encoded in these notifications and tell the
usbnet framework to use the new functions to get link speed/state.
Link speed/state is now available via ethtool.
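
Recording the speed from a notification instead of logging it can be sketched as below. The payload layout (two little-endian 32-bit bit rates, downlink first, then uplink) and all names are assumptions made for this example, not a statement about the driver or the CDC specification.

```c
#include <stdint.h>

/* Hypothetical per-device state, in bits per second. */
struct link_state_sketch {
	uint32_t rx_speed;
	uint32_t tx_speed;
};

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t get_le32_sketch(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Assumed layout: downlink (rx) bit rate first, uplink (tx) second. */
static void record_speed_sketch(struct link_state_sketch *st,
				const uint8_t payload[8])
{
	st->rx_speed = get_le32_sketch(payload);
	st->tx_speed = get_le32_sketch(payload + 4);
}

/* What a later query would report, in Mb/s. */
static uint32_t speed_mbps_sketch(uint32_t bps)
{
	return bps / 1000000;
}
```

Storing the value costs two assignments per notification, whereas logging each notification would flood the log on devices that send them frequently.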

This is especially useful given all current RTL8156 devices emit
a connection/speed status notification every 32ms and this would
fill the dmesg buffer. This implementation replaces the one
recently submitted in de658a195ee23ca6aaffe197d1d2ea040beea0a2 :
   "net: usb: cdc_ncm: don't spew notifications"

v2: rebased on upstream
v3: changed variable names
v4: rewrote commit message

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Tested-by: Roland Dreier <roland@kernel.org>
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agousbnet: add method for reporting speed without MII
Oliver Neukum [Mon, 5 Apr 2021 23:13:42 +0000 (16:13 -0700)]
usbnet: add method for reporting speed without MII

The old method for reporting link speed assumed a driver uses the
generic phy (mii) MDIO read/write functions. CDC devices don't
expose the phy.

Add a primitive internal version reporting back directly what
the CDC notification/status operations recorded.

v2: rebased on upstream
v3: changed names and made clear which units are used
v4: moved hunks to correct patch; rewrote commit messages

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Tested-by: Roland Dreier <roland@kernel.org>
Reviewed-by: Grant Grundler <grundler@chromium.org>
Tested-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agousbnet: add _mii suffix to usbnet_set/get_link_ksettings
Oliver Neukum [Mon, 5 Apr 2021 23:13:41 +0000 (16:13 -0700)]
usbnet: add _mii suffix to usbnet_set/get_link_ksettings

The generic functions assumed devices provided an MDIO interface (accessed
via older mii code, not phylib). This is true only for genuine ethernet.

Devices with a higher level of abstraction or based on different
technologies do not have MDIO. To support this case, first rename
the existing functions with _mii suffix.

v2: rebased on changed upstream
v3: changed names to clearly say that this does NOT use phylib
v4: moved hunks to correct patch; reworded commit messages

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Tested-by: Roland Dreier <roland@kernel.org>
Reviewed-by: Grant Grundler <grundler@chromium.org>
Tested-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotcp: Reset tcp connections in SYN-SENT state
Manoj Basapathi [Mon, 5 Apr 2021 17:02:42 +0000 (22:32 +0530)]
tcp: Reset tcp connections in SYN-SENT state

Userspace sends tcp connection (sock) destroy on network switch,
i.e. switching the default network of the device between multiple
networks (Cellular/Wifi/Ethernet).

The kernel, however, doesn't send a reset for connections in the
SYN-SENT state, so these connections continue to remain.
Even as per RFC 793, there is no hard rule against sending RST on
ABORT in this state.

Modify tcp_abort and tcp_disconnect behavior to send RST for connections
in syn-sent state to avoid lingering connections on network switch.
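
The behavior change can be sketched as a state predicate. The enum and the set of states that already triggered a reset are assumptions made for illustration; only the addition of the SYN-SENT state comes from the description above.

```c
#include <stdbool.h>

/* Hypothetical subset of TCP connection states. */
enum tcp_state_sketch {
	ST_ESTABLISHED,
	ST_SYN_SENT,
	ST_SYN_RECV,
	ST_FIN_WAIT1,
	ST_FIN_WAIT2,
	ST_CLOSE_WAIT,
	ST_LISTEN,
	ST_CLOSE,
};

/* Assumed earlier rule: reset only connections that got past the handshake. */
static bool needs_reset_old(enum tcp_state_sketch s)
{
	return s == ST_ESTABLISHED || s == ST_SYN_RECV ||
	       s == ST_FIN_WAIT1 || s == ST_FIN_WAIT2 ||
	       s == ST_CLOSE_WAIT;
}

/* New rule on abort/disconnect: also reset while still in SYN-SENT. */
static bool needs_reset_on_abort(enum tcp_state_sketch s)
{
	return needs_reset_old(s) || s == ST_SYN_SENT;
}
```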

Signed-off-by: Manoj Basapathi <manojbm@codeaurora.org>
Signed-off-by: Sauvik Saha <ssaha@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: remove obsolete members from struct net
Florian Westphal [Thu, 1 Apr 2021 14:11:14 +0000 (16:11 +0200)]
net: remove obsolete members from struct net

All have been moved to net_generic infra. On x86_64, this reduces
struct net size from 70 to 63 cache lines (4480 to 4032 bytes).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: conntrack: move ecache dwork to net_generic infra
Florian Westphal [Thu, 1 Apr 2021 14:11:13 +0000 (16:11 +0200)]
netfilter: conntrack: move ecache dwork to net_generic infra

dwork struct is large (>128 bytes) and not needed when conntrack module
is not loaded.

Place it in net_generic data instead.  The struct net dwork member is now
obsolete and will be removed in a followup patch.
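
The general idea of keeping per-namespace data outside the core structure can be sketched in userspace: the namespace holds only an array of pointers, and a module allocates its private data when it attaches. Every name here is hypothetical; this is not the kernel's net_generic implementation.

```c
#include <stdlib.h>

#define MAX_GENERIC_IDS 4	/* hypothetical number of module slots */

/* The core structure carries one pointer per module, nothing more. */
struct netns_sketch {
	void *gen[MAX_GENERIC_IDS];
};

/* Private data that exists only while the module is attached. */
struct conntrack_pernet_sketch {
	long ecache_work_pending;
};

/* Look up a module's private data; NULL if the module is not attached. */
static void *netns_generic_sketch(const struct netns_sketch *net,
				  unsigned int id)
{
	return id < MAX_GENERIC_IDS ? net->gen[id] : NULL;
}

/* Allocate zeroed private data for a module in this namespace. */
static int module_attach_sketch(struct netns_sketch *net, unsigned int id,
				size_t size)
{
	if (id >= MAX_GENERIC_IDS || net->gen[id])
		return -1;

	net->gen[id] = calloc(1, size);
	return net->gen[id] ? 0 : -1;
}

/* Release the private data again when the module goes away. */
static void module_detach_sketch(struct netns_sketch *net, unsigned int id)
{
	if (id >= MAX_GENERIC_IDS)
		return;

	free(net->gen[id]);
	net->gen[id] = NULL;
}
```

The cost of an unloaded module is one NULL pointer per namespace rather than the full size of its data.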

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: conntrack: move sysctl pointer to net_generic infra
Florian Westphal [Thu, 1 Apr 2021 14:11:12 +0000 (16:11 +0200)]
netfilter: conntrack: move sysctl pointer to net_generic infra

No need to keep this in struct net, place it in the net_generic data.
The sysctl pointer is removed from struct net in a followup patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: x_tables: move known table lists to net_generic infra
Florian Westphal [Thu, 1 Apr 2021 14:11:11 +0000 (16:11 +0200)]
netfilter: x_tables: move known table lists to net_generic infra

Will reduce struct net size by 208 bytes.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: nf_tables: use net_generic infra for transaction data
Florian Westphal [Thu, 1 Apr 2021 14:11:10 +0000 (16:11 +0200)]
netfilter: nf_tables: use net_generic infra for transaction data

This moves all nf_tables pernet data from struct net to a net_generic
extension, with the exception of the gencursor.

The latter is used in the data path and also outside of the nf_tables
core. All others are only used from the configuration plane.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: ebtables: use net_generic infra
Florian Westphal [Thu, 1 Apr 2021 14:11:09 +0000 (16:11 +0200)]
netfilter: ebtables: use net_generic infra

ebtables currently uses net->xt.tables[BRIDGE], but an upcoming
patch will move net->xt.tables away from struct net.

To avoid exposing x_tables internals to ebtables, use a private list
instead.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: nf_defrag_ipv4: use net_generic infra
Florian Westphal [Thu, 1 Apr 2021 14:11:08 +0000 (16:11 +0200)]
netfilter: nf_defrag_ipv4: use net_generic infra

This allows a followup patch to remove the defrag_ipv4 member from struct
net.  It also allows auto-removing the hooks later on by adding a
_disable() function.  This will be done later in a followup patch series.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: nf_defrag_ipv6: use net_generic infra
Florian Westphal [Thu, 1 Apr 2021 14:11:07 +0000 (16:11 +0200)]
netfilter: nf_defrag_ipv6: use net_generic infra

This allows a followup patch to remove these members from struct net.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: cttimeout: use net_generic infra
Florian Westphal [Thu, 1 Apr 2021 14:11:06 +0000 (16:11 +0200)]
netfilter: cttimeout: use net_generic infra

Reduce size of struct net and make this self-contained.
The member in struct net is kept to minimize changes to struct net
layout; it will be removed in a separate patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: nfnetlink: use net_generic infra
Florian Westphal [Thu, 1 Apr 2021 14:11:05 +0000 (16:11 +0200)]
netfilter: nfnetlink: use net_generic infra

There is no need to place it in struct net: nfnetlink is a module and
it is not used in the fast path.

Also remove rcu usage:

Not a single reader of net->nfnl uses rcu accessors.

When exit_batch callbacks are executed the net namespace is already dead
so no calls to these functions are possible anymore (else we'd get NULL
deref crash too).

If the module is removed, then modules that call any of those functions
have been removed too so no calls to nfnl functions are possible either.

The nfnl and nfnl_stash pointers in struct net are no longer used. They
will be removed in a followup patch to minimize changes to struct net
(changing it causes a rebuild of the entire network stack).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago netfilter: nfnetlink: add and use nfnetlink_broadcast
Florian Westphal [Thu, 1 Apr 2021 14:11:04 +0000 (16:11 +0200)]
netfilter: nfnetlink: add and use nfnetlink_broadcast

This removes the only reference to net->nfnl outside of the nfnetlink
module, which allows net->nfnl to be moved to the net_generic infra.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago net: smsc911x: skip acpi_device_id table when !CONFIG_ACPI
Krzysztof Kozlowski [Mon, 5 Apr 2021 18:15:48 +0000 (20:15 +0200)]
net: smsc911x: skip acpi_device_id table when !CONFIG_ACPI

The driver can match via multiple methods.  Its acpi_device_id table is
referenced via ACPI_PTR() so it will be unused for !CONFIG_ACPI builds:

  drivers/net/ethernet/smsc/smsc911x.c:2652:36: warning:
    ‘smsc911x_acpi_match’ defined but not used [-Wunused-const-variable=]
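
The situation can be reproduced in miniature with the userspace mock below.
All MOCK_* and mock_* names, including the id string, are hypothetical.
MOCK_ACPI_PTR() imitates the idea of ACPI_PTR(): it yields the pointer when
the option is set and NULL otherwise, so the table has to sit under the
same guard or it is defined without any user.

```c
/* Userspace mock; MOCK_* and mock_* names are hypothetical stand-ins. */
#include <assert.h>
#include <stddef.h>

struct mock_acpi_device_id {
	const char *id;
};

/* The pointer when the option is enabled, NULL when it is not. */
#ifdef MOCK_CONFIG_ACPI
#define MOCK_ACPI_PTR(ptr) (ptr)
#else
#define MOCK_ACPI_PTR(ptr) (NULL)
#endif

/* Guarding the table keeps it out of builds where nothing refers to it. */
#ifdef MOCK_CONFIG_ACPI
static const struct mock_acpi_device_id mock_acpi_match[] = {
	{ "MOCK0001" },
	{ NULL },
};
#endif

struct mock_driver {
	const char *name;
	const struct mock_acpi_device_id *acpi_match_table;
};

static const struct mock_driver mock_smsc_driver = {
	.name = "mock-smsc",
	.acpi_match_table = MOCK_ACPI_PTR(mock_acpi_match),
};

/* Returns 1 when the driver can match through the mock ACPI table. */
static int mock_can_match_acpi(const struct mock_driver *drv)
{
	assert(drv != NULL);
	return drv->acpi_match_table != NULL;
}
```

Built without -DMOCK_CONFIG_ACPI, the table is not compiled at all and the
driver simply carries a NULL match table.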

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: hns3: Limiting the scope of vector_ring_chain variable
Salil Mehta [Mon, 5 Apr 2021 17:28:25 +0000 (18:28 +0100)]
net: hns3: Limiting the scope of vector_ring_chain variable

Limit the scope of the variable vector_ring_chain to the block where it
is used.

Fixes: 424eb834a9be ("net: hns3: Unified HNS3 {VF|PF} Ethernet Driver for hip08 SoC")
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: Allow to specify ifindex when device is moved to another namespace
Andrei Vagin [Mon, 5 Apr 2021 07:12:23 +0000 (00:12 -0700)]
net: Allow to specify ifindex when device is moved to another namespace

Currently, ifindex can be specified on link creation. This change also
allows specifying ifindex when a device is moved to another network
namespace.

Even now, a device ifindex can be changed if there is another device
with the same ifindex in the target namespace. So this change doesn't
introduce completely new behavior; it adds more control to the process.

CRIU users want to restore containers with pre-created network devices.
A user provides the network devices and instructions on where they have
to be restored; CRIU then restores the network namespaces and moves the
devices into them. The problem is that the devices have to be restored
with the same indexes that they had before checkpoint/restore (C/R).
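
The behavior described above can be modelled with the small userspace
sketch below. It is an illustration under assumptions, not the kernel
code: all mock_* names are hypothetical, a request of 0 stands for "no
preference", and refusing a request for an index that is already taken is
an assumption of this model.

```c
/* Userspace model; every mock_* name is hypothetical, not kernel code. */
#include <assert.h>
#include <stdbool.h>

#define MOCK_MAX_IFINDEX 16

struct mock_netns {
	bool used[MOCK_MAX_IFINDEX + 1];	/* used[i]: ifindex i is taken */
};

/* Lowest free ifindex in the namespace, or -1 if none is left. */
static int mock_new_index(const struct mock_netns *ns)
{
	for (int i = 1; i <= MOCK_MAX_IFINDEX; i++)
		if (!ns->used[i])
			return i;
	return -1;
}

/*
 * Move the device with index cur from src to dst.
 * requested == 0 means "no preference". Returns the ifindex the device
 * ends up with in dst, or -1 if the move is refused.
 */
static int mock_move_dev(struct mock_netns *src, struct mock_netns *dst,
			 int cur, int requested)
{
	int idx;

	assert(cur >= 1 && cur <= MOCK_MAX_IFINDEX && src->used[cur]);
	assert(requested >= 0 && requested <= MOCK_MAX_IFINDEX);

	if (requested) {
		if (dst->used[requested])
			return -1;	/* assumed: a taken index is refused */
		idx = requested;
	} else if (dst->used[cur]) {
		idx = mock_new_index(dst);	/* conflict: pick another */
		if (idx < 0)
			return -1;
	} else {
		idx = cur;		/* no conflict: keep the index */
	}

	src->used[cur] = false;
	dst->used[idx] = true;
	return idx;
}
```

The first two branches correspond to the two cases in the message: the
caller choosing the index, and the existing reassignment on conflict.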

Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Suggested-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago net: nfc: Fix spelling errors in net/nfc module
Zheng Yongjun [Mon, 5 Apr 2021 10:54:35 +0000 (18:54 +0800)]
net: nfc: Fix spelling errors in net/nfc module

These patches fix a series of spelling errors in the net/nfc module.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago tipc: Fix a kernel-doc warning in name_table.c
Wu XiangCheng [Sun, 4 Apr 2021 14:23:15 +0000 (22:23 +0800)]
tipc: Fix a kernel-doc warning in name_table.c

Fix kernel-doc warning:

Documentation/networking/tipc:66: /home/sfr/next/next/net/tipc/name_table.c
  :558: WARNING: Unexpected indentation.
Documentation/networking/tipc:66: /home/sfr/next/next/net/tipc/name_table.c
  :559: WARNING: Block quote ends without a blank line; unexpected unindent.

The warnings are caused by a missing blank line.

Fixes: 908148bc5046 ("tipc: refactor tipc_sendmsg() and tipc_lookup_anycast()")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/netdev/20210318172255.63185609@canb.auug.org.au/
Signed-off-by: Wu XiangCheng <bobwxc@email.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago mld: change lockdep annotation for ip6_sf_socklist and ipv6_mc_socklist
Taehee Yoo [Sun, 4 Apr 2021 13:38:23 +0000 (13:38 +0000)]
mld: change lockdep annotation for ip6_sf_socklist and ipv6_mc_socklist

struct ip6_sf_socklist and struct ipv6_mc_socklist are per-socket MLD data.
These data are protected by the rtnl lock, the socket lock, and RCU.
Whenever they are accessed, the current lockdep annotation verifies that
the rtnl lock is held.

ip6_mc_msfget() is called by do_ipv6_getsockopt(), but the caller doesn't
acquire the rtnl lock, so lockdep warns when these data are used in
ip6_mc_msfget().
Accessing them is actually safe, because do_ipv6_getsockopt() has acquired
the socket lock.

So, change the lockdep annotation from the rtnl lock to the socket lock
(rtnl_dereference -> sock_dereference).

The locking graph for mld data is as follows:

When writing mld data:
do_ipv6_setsockopt()
    rtnl_lock
    lock_sock
    (mld functions)
        idev->mc_lock (if per-interface mld data is modified)

When reading mld data:
do_ipv6_getsockopt()
    lock_sock
    ip6_mc_msfget()
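
The reader path above can be mimicked in userspace to show why the
annotation matters. The sketch below is a mock with hypothetical mock_*
names, not the kernel macros. A "checked dereference" returns the pointer
unchanged and only records a warning when the lock it claims as protection
is not held.

```c
/* Userspace mock of a checked dereference; mock_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool mock_rtnl_held;	/* stand-in for "rtnl lock is held" */
static int mock_warnings;	/* counts would-be lockdep splats */

struct mock_sock {
	bool locked;		/* stand-in for "socket lock is held" */
	int *mc_list;		/* stand-in for per-socket MLD data */
};

/* Return p, recording a warning when the claimed protection is absent. */
static int *mock_deref_protected(int *p, bool lock_held)
{
	if (!lock_held)
		mock_warnings++;
	return p;
}

/* Old annotation: claims the rtnl lock protects the pointer. */
static int *mock_rtnl_dereference(const struct mock_sock *sk)
{
	assert(sk != NULL);
	return mock_deref_protected(sk->mc_list, mock_rtnl_held);
}

/* New annotation: claims the socket lock protects the pointer. */
static int *mock_sock_dereference(const struct mock_sock *sk)
{
	assert(sk != NULL);
	return mock_deref_protected(sk->mc_list, sk->locked);
}
```

With only the socket lock held, as on the getsockopt path, the old
annotation records a warning while the new one does not, although both
return the same pointer.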

The splat looks like:
=============================
WARNING: suspicious RCU usage
5.12.0-rc4+ #503 Not tainted
-----------------------------
net/ipv6/mcast.c:610 suspicious rcu_dereference_protected() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by mcast-listener-/923:
 #0: ffff888007958a70 (sk_lock-AF_INET6){+.+.}-{0:0}, at:
ipv6_get_msfilter+0xaf/0x190

stack backtrace:
CPU: 1 PID: 923 Comm: mcast-listener- Not tainted 5.12.0-rc4+ #503
Call Trace:
 dump_stack+0xa4/0xe5
 ip6_mc_msfget+0x553/0x6c0
 ? ipv6_sock_mc_join_ssm+0x10/0x10
 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
 ? mark_held_locks+0xb7/0x120
 ? lockdep_hardirqs_on_prepare+0x27c/0x3e0
 ? __local_bh_enable_ip+0xa5/0xf0
 ? lock_sock_nested+0x82/0xf0
 ipv6_get_msfilter+0xc3/0x190
 ? compat_ipv6_get_msfilter+0x300/0x300
 ? lock_downgrade+0x690/0x690
 do_ipv6_getsockopt.isra.6.constprop.13+0x1809/0x29e0
 ? do_ipv6_mcast_group_source+0x150/0x150
 ? register_lock_class+0x1750/0x1750
 ? kvm_sched_clock_read+0x14/0x30
 ? sched_clock+0x5/0x10
 ? sched_clock_cpu+0x18/0x170
 ? find_held_lock+0x3a/0x1c0
 ? lock_downgrade+0x690/0x690
 ? ipv6_getsockopt+0xdb/0x1b0
 ipv6_getsockopt+0xdb/0x1b0
[ ... ]

Fixes: 88e2ca308094 ("mld: convert ifmcaddr6 to RCU")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>