git.proxmox.com Git - mirror_ubuntu-kernels.git/log

dpaa2-switch: add support for configuring learning state per port

Add support for configuring the learning state of a switch port.
When the user requests the HW learning to be disabled, a fast-age
procedure on that specific port is run so that previously learnt
addresses do not linger.

At device probe as well as on a bridge leave action, the ports are
configured with HW learning disabled since they are basically a
standalone port.

At the same time, at bridge join we inherit the bridge port BR_LEARNING
flag state and configure it on the switch port.

There were already some MC firmware ABI functions for changing the
learning state, but those were per FDB (bridging domain) and not per
port so we need to adjust those to use the new MC fw command which is
per port.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-switch: refactor the egress flooding domain setup

Extract the code that determines the list of egress flood interfaces for
a specific flood type into a new function -
dpaa2_switch_fdb_get_flood_cfg().

This will help us to not duplicate code when the broadcast and
unknown ucast/mcast flooding domains will be individually configurable.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-switch: move the dpaa2_switch_fdb_set_egress_flood function

In order to avoid a forward declaration in the next patches, move the
dpaa2_switch_fdb_set_egress_flood() function to the top of the file.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'lantiq-xrx300-xrx330'

Aleksander Jan Bajkowski says:

====================
net: dsa: lantiq: add support for xRX300 and xRX330

Changed since v3:
* fixed last compilation warning

Changed since v2:
* fixed compilation warnings
* removed example bindings for xrx330
* patches has been refactored due to upstream changes

Changed since v1:
* gswip_mii_mask_cfg() can now change port 3 on xRX330
* changed alowed modes on port 0 and 5 for xRX300 and xRX330
* moved common part of phylink validation into gswip_phylink_set_capab()
* verify the compatible string against the hardware
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: dsa: lantiq: add xRx300 and xRX330 switch bindings

Add compatible string for xRX300 and xRX330 SoCs.

Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: lantiq: verify compatible strings against hardware

Verify compatible string against hardware.

Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: lantiq: allow to use all GPHYs on xRX300 and xRX330

This patch allows to use all PHYs on GRX300 and GRX330. The ARX300
has 3 and the GRX330 has 4 integrated PHYs connected to different
ports compared to VRX200. Each integrated PHY can work as single
Gigabit Ethernet PHY (GMII) or as double Fast Ethernet PHY (MII).

Allowed port configurations:

xRX200:
GMAC0: RGMII, MII, REVMII or RMII port
GMAC1: RGMII, MII, REVMII or RMII port
GMAC2: GPHY0 (GMII)
GMAC3: GPHY0 (MII)
GMAC4: GPHY1 (GMII)
GMAC5: GPHY1 (MII) or RGMII port

xRX300:
GMAC0: RGMII port
GMAC1: GPHY2 (GMII)
GMAC2: GPHY0 (GMII)
GMAC3: GPHY0 (MII)
GMAC4: GPHY1 (GMII)
GMAC5: GPHY1 (MII) or RGMII port

xRX330:
GMAC0: RGMII, GMII or RMII port
GMAC1: GPHY2 (GMII)
GMAC2: GPHY0 (GMII)
GMAC3: GPHY0 (MII) or GPHY3 (GMII)
GMAC4: GPHY1 (GMII)
GMAC5: GPHY1 (MII), RGMII or RMII port

Tested on D-Link DWR966 (xRX330) with OpenWRT.

Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2021-03-22

This series contains updates to ice and iavf drivers.

Haiyue Wang says:

The Intel E810 Series supports a programmable pipeline for a domain
specific protocols classification, for example GTP by Dynamic Device
Personalization (DDP) profile.

The E810 PF has introduced flex-bytes support by ethtool user-def option
allowing for packet deeper matching based on an offset and value for DDP
usage.

For making VF also benefit from this flexible protocol classification,
some new virtchnl messages are defined and handled by PF, so VF can
query this new flow director capability, and use ethtool with extending
the user-def option to configure Rx flow classification.

The new user-def 0xAAAABBBBCCCCDDDD: BBBB is the 2 byte pattern while
AAAA corresponds to its offset in the packet. Similarly DDDD is the 2
byte pattern with CCCC being the corresponding offset. The offset ranges
from 0x0 to 0x1F7 (up to 504 bytes into the packet). The offset starts
from the beginning of the packet.

This feature can be used to allow customers to set flow director rules
for protocols headers that are beyond standard ones supported by
ethtool (e.g. PFCP or GTP-U).

Like for matching GTP-U's TEID value 0x10203040:
ethtool -N ens787f0v0 flow-type udp4 dst-port 2152 \
user-def 0x002e102000303040 action 13
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-resil-nexthop-groups-prep'

Ido Schimmel says:

====================
mlxsw: Preparations for resilient nexthop groups

This patchset contains preparations for resilient nexthop groups support in
mlxsw. A follow-up patchset will add support and selftests. Most of the
patches are trivial and small to make review easier.

Patchset overview:

Patch #1 removes RTNL assertion in nexthop notifier block since it is
not needed. The assertion will trigger when mlxsw starts processing
notifications related to resilient groups as not all are emitted with
RTNL held.

Patches #2-#9 gradually add support for nexthops with trap action. Up
until now mlxsw did not program nexthops whose neighbour entry was not
resolved. This will not work with resilient groups as their size is
fixed and the nexthop mapped to each bucket is determined by the nexthop
code. Therefore, nexthops whose neighbour entry is not resolved will be
programmed to trap packets to the CPU in order to trigger neighbour
resolution.

Patch #10 is a non-functional change to allow for code reuse between
regular nexthop groups and resilient ones.

Patch #11 avoids unnecessary neighbour updates in hardware. See the
commit message for a detailed explanation.

Patches #12-#14 add support for additional nexthop group sizes that are
supported by Spectrum-{2,3} ASICs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Add Spectrum-{2, 3} adjacency group size ranges

Spectrum-{2,3} support different adjacency group size ranges compared to
Spectrum-1. Add an array describing these ranges and change the common
code to use the array which was set during the per-ASIC initialization.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Encode adjacency group size ranges in an array

The device supports a fixed set of adjacency group sizes. Encode these
sizes in an array, so that the next patch will be able to split it
between Spectrum-1 and Spectrum-{2,3}, which support different size
ranges.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Create per-ASIC router operations

There are several differences in the router module between Spectrum-1
and Spectrum-{2,3}. Currently, this is only apparent in the router
interface (RIF) operations that are split between these ASICs.

A subsequent patch is going to introduce another difference between
these ASICs.

Create per-ASIC router operations that will encapsulate all these
differences. For now, these operations are only used to set the per-ASIC
RIF operations in 'mlxsw_sp->router->rif_ops_arr'. Note that this fields
was unused since commit 1f5b23033937 ("mlxsw: spectrum: Set RIF ops per
ASIC type").

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Avoid unnecessary neighbour updates

Avoid updating neighbour and adjacency entries in hardware when the
neighbour is already connected and its MAC address did not change. This
can happen, for example, when neighbour transitions between valid states
such as 'NUD_REACHABLE' and 'NUD_DELAY'.

This is especially important for resilient hashing as these updates will
result in adjacency entries being marked as active.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Break nexthop group entry validation to a separate function

The validation of a nexthop group entry is also necessary for resilient
nexthop groups, so break the validation to a separate function to allow
for code reuse in subsequent patches.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Encapsulate nexthop update in a function

Encapsulate this functionality in a separate function, so that it could
be invoked by follow-up patches, when replacing a nexthop bucket that is
part of a resilient nexthop group.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Rename nexthop update function to reflect its type

mlxsw_sp_nexthop_update() is used to update the configuration of
Ethernet-type nexthops, as opposed to mlxsw_sp_nexthop_ipip_update(),
which is used to update IPinIP-type nexthops.

Rename the function to mlxsw_sp_nexthop_eth_update(), so that it is
consistent with mlxsw_sp_nexthop_ipip_update().

It will allow us to introduce mlxsw_sp_nexthop_update() in a follow-up
patch, which calls either of above mentioned function based on the
nexthop's type.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Add nexthop trap action support

Currently, nexthops are programmed with either forward or discard action
(for blackhole nexthops). Nexthops that do not have a valid MAC address
(neighbour) or router interface (RIF) are simply not written to the
adjacency table.

In resilient nexthop groups, the size of the group must remain fixed and
the kernel is in complete control of the layout of the adjacency table.
A nexthop without a valid MAC or RIF will therefore be written with a
trap action, to trigger neighbour resolution.

Allow such nexthops to be programmed to the adjacency table to enable
above mentioned use case.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Prepare for nexthops with trap action

Nexthops that need to be programmed with a trap action might not have a
valid router interface (RIF) associated with them. Therefore, use the
loopback RIF created during initialization to program them to the
device.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Introduce nexthop action field

Currently, the action associated with the nexthop is assumed to be
'forward' unless the 'discard' bit is set.

Instead, simplify this by introducing a dedicated field to represent the
action of the nexthop. This will allow us to more easily introduce more
actions, such as trap.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Adjust comments on nexthop fields

The comments assume that nexthops are simple Ethernet nexthops
that are programmed to forward packets to the associated neighbour. This
is no longer the case, as both IPinIP and blackhole nexthops are now
supported.

Adjust the comments to reflect these changes.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Only provide MAC address for valid nexthops

The helper returns the MAC address associated with the nexthop. It is
only valid when the nexthop forwards packets and when it is an Ethernet
nexthop. Reflect this in the checks the helper is performing.

This is not an issue because the sole caller of the function only
invokes it for such nexthops.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Consolidate nexthop helpers

The helper mlxsw_sp_nexthop_offload() is actually interested in finding
out if the nexthop is both written to the adjacency table and forwarding
packets (as opposed to discarding them).

Rename it to mlxsw_sp_nexthop_is_forward() and remove
mlxsw_sp_nexthop_is_discard().

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Remove RTNL assertion

Remove the RTNL assertion in the nexthop notifier block. The assertion
is not needed given RTNL is never assumed to be taken.

This is a preparation for future patches where mlxsw will start handling
nexthop events that are not always sent with RTNL held.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: when suppression is enabled exclude RARP packets

Recently we had an interop issue where RARP packets got suppressed with
bridge neigh suppression enabled, but the check in the code was meant to
suppress GARP. Exclude RARP packets from it which would allow some VMWare
setups to work, to quote the report:
"Those RARP packets usually get generated by vMware to notify physical
switches when vMotion occurs. vMware may use random sip/tip or just use
sip=tip=0. So the RARP packet sometimes get properly flooded by the vtep
and other times get dropped by the logic"

Reported-by: Amer Abdalamer <amer@nvidia.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-sysfs: remove possible sleep from an RCU read-side critical section

xps_queue_show is mostly made of an RCU read-side critical section and
calls bitmap_zalloc with GFP_KERNEL in the middle of it. That is not
allowed as this call may sleep and such behaviours aren't allowed in RCU
read-side critical sections. Fix this by using GFP_NOWAIT instead.

Fixes: 5478fcd0f483 ("net: embed nr_ids in the xps maps")
Reported-by: kernel test robot <oliver.sang@intel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: platform: fix build error with !CONFIG_PM_SLEEP

Get rid of the CONFIG_PM_SLEEP ifdefery to fix the build error
and use __maybe_unused for the suspend()/resume() hooks to avoid
build warning:

drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c:769:21:
error: 'stmmac_runtime_suspend' undeclared here (not in a function); did you mean 'stmmac_suspend'?
  769 |  SET_RUNTIME_PM_OPS(stmmac_runtime_suspend, stmmac_runtime_resume, NULL)
      |                     ^~~~~~~~~~~~~~~~~~~~~~
./include/linux/pm.h:342:21: note: in definition of macro 'SET_RUNTIME_PM_OPS'
  342 |  .runtime_suspend = suspend_fn, \
      |                     ^~~~~~~~~~
drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c:769:45:
error: 'stmmac_runtime_resume' undeclared here (not in a function)
  769 |  SET_RUNTIME_PM_OPS(stmmac_runtime_suspend, stmmac_runtime_resume, NULL)
      |                                             ^~~~~~~~~~~~~~~~~~~~~
./include/linux/pm.h:343:20: note: in definition of macro 'SET_RUNTIME_PM_OPS'
  343 |  .runtime_resume = resume_fn, \
      |                    ^~~~~~~~~

Fixes: 5ec55823438e ("net: stmmac: add clocks management for gmac driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: l2tp: Fix a typo

s/verifed/verified/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

misdn: avoid -Wempty-body warning

gcc warns about a pointless condition:

drivers/isdn/hardware/mISDN/hfcmulti.c: In function 'hfcmulti_interrupt':
drivers/isdn/hardware/mISDN/hfcmulti.c:2752:17: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
2752 | ; /* external IRQ */

As the check has no effect, just remove it.

Suggested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: move the ptype_all and ptype_base declarations to include/linux/netdevice.h

ptype_all and ptype_base are declared in net/core/dev.c as non-static,
because they are used by net-procfs.c too. However, a "make W=1" build
complains that there was no previous declaration of ptype_all and
ptype_base in a header file, so this way of declaring things constitutes
a violation of coding style.

Let's move the extern declarations of ptype_all and ptype_base to the
linux/netdevice.h file, which is included by net-procfs.c too.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: make xps_needed and xps_rxqs_needed static

Since their introduction in commit 04157469b7b8 ("net: Use static_key
for XPS maps"), xps_needed and xps_rxqs_needed were never used outside
net/core/dev.c, so I don't really understand why they were exported as
symbols in the first place.

This is needed in order to silence a "make W=1" warning about these
static keys not being declared as static variables, but not having a
previous declaration in a header file nonetheless.

Cc: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: declare br_vlan_tunnel_lookup argument tunnel_id as __be64

The only caller of br_vlan_tunnel_lookup, br_handle_ingress_vlan_tunnel,
extracts the tunnel_id from struct ip_tunnel_info::struct ip_tunnel_key::
tun_id which is a __be64 value.

The exact endianness does not seem to matter, because the tunnel id is
just used as a lookup key for the VLAN group's tunnel hash table, and
the value is not interpreted directly per se. Moreover,
rhashtable_lookup_fast treats the key argument as a const void *.

Therefore, there is no functional change associated with this patch,
just one to silence "make W=1" builds.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Remove redundant NULL check

Fix the following coccicheck warnings:

./drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c:3540:2-8: WARNING: NULL
check before some freeing functions is not needed.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: fix up kerneldoc some more

Commit 0b5294483c35 ("net: dsa: mv88e6xxx: scratch: Fixup kerneldoc")
has addressed some but not all kerneldoc warnings for the Global 2
Scratch register accessors. Namely, we have some mismatches between
the function names in the kerneldoc and the ones in the actual code.
Let's adjust the comments so that they match the functions they're
sitting next to.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bnxt_en-Error-recovery-improvements'

Michael Chan says:

====================
bnxt_en: Error recovery improvements.

This series contains some improvements for error recovery. The main
changes are:

1. Keep better track of the health register mappings with the
"status_reliable" flag.

2. Don't wait for firmware responses if firmware is not healthy.

3. Better retry logic of the first firmware message.

4. Set the proper flag early to let the RDMA driver know that firmware
reset has been detected.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Enhance retry of the first message to the firmware.

Two enhancements:

1. Read the health status first before sending the first
HWRM_VER_GET message to firmware instead of the other way around.
This guarantees we got the accurate health status before we attempt
to send the message.

2. We currently only retry sending the first HWRM_VER_GET message to
the firmware if the firmware is in the process of booting.  If the
firmware is in error state and is doing core dump for example, the
driver should also retry if the health register has the RECOVERING
flag set.  This flag indicates the firmware will undergo recovery
soon.  Modify the retry logic to retry for this case as well.

Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Remove the read of BNXT_FW_RESET_INPROG_REG after firmware reset.

Once the chip goes through reset, the register mapping may be lost
and any read of the mapped health registers may return garbage value
until the registers are mapped again in the init path.

Reading BNXT_FW_RESET_INPROG_REG after firmware reset will likely
return garbage value due to the above reason. Reading this register
is for information purpose only so remove it.

Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Set BNXT_STATE_FW_RESET_DET flag earlier for the RDMA driver.

During ifup, if the driver detects that firmware has gone through a
reset, it will go through a re-probe sequence. If the RDMA driver is
loaded, the re-probe sequence includes calling the RDMA driver to stop.
We need to set the BNXT_STATE_FW_RESET_DET flag earlier so that it is
visible to the RDMA driver. The RDMA driver's stop sequence is
different if firmware has gone through a reset.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: P B S Naresh Kumar <nareshkumar.pbs@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: check return value of bnxt_hwrm_func_resc_qcaps

Check return value of call to bnxt_hwrm_func_resc_qcaps in
bnxt_hwrm_if_change and return failure on error.

Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: don't fake firmware response success when PCI is disabled

The original intent here is to allow commands during reset to succeed
without error when the device is disabled, to ensure that cleanup
completes normally during NIC close, where firmware is not necessarily
expected to respond.

The problem with faking success during reset's PCI disablement is that
unrelated ULP commands will also see inadvertent success during reset
when failure would otherwise be appropriate. It is better to return
a different error result such that reset related code can detect
this unique condition and ignore as appropriate.

Note, the pci_disable_device() when firmware is fatally wounded in
bnxt_fw_reset_close() does not need to be addressed, as subsequent
commands are already expected to fail due to the BNXT_NO_FW_ACCESS()
check in bnxt_hwrm_do_send_msg().

Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Improve wait for firmware commands completion

In situations where FW has crashed, the bnxt_hwrm_do_send_msg() call
will have to wait until timeout for each firmware message.  This
generally takes about half a second for each firmware message.  If we
try to unload the driver n this state, the unload sequence will take
a long time to complete.

Improve this by checking the health register if it is available and
abort the wait for the firmware response if the register shows that
firmware is not healthy.  The very first message HWRM_VER_GET is
excluded from this check because that message is used to poll for
firmware to come out of reset during error recovery.

Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Improve the status_reliable flag in bp->fw_health.

In order to read the firmware health status, we first need to determine
the register location and then the register may need to be mapped.
There are 2 code paths to do this.  The first one is done early as a
best effort attempt by the function bnxt_try_map_fw_health_reg().  The
second one is done later in the function bnxt_map_fw_health_regs()
after establishing communications with the firmware.  We currently
only set fw_health->status_reliable if we can successfully set up the
health register in the first code path.

Improve the scheme by setting the fw_health->status_reliable flag if
either (or both) code paths can successfully set up the health
register.  This flag is relied upon during run-time when we need to
check the health status.  So this will make it work better.

During ifdown, if the health register is mapped, we need to invalidate
the health register mapping because a potential fw reset will reset
the mapping.  Similarly, we need to do the same after firmware reset
during recovery.  We'll remap it during ifup.

Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'hns3-flow-director'

Huazhong Tan says:

====================
net: hns3: refactor and new features for flow director

This patchset refactor some functions and add some new features for
flow director.

patch 1~3: refactor large functions
patch 4, 7: add traffic class and user-def field support for ethtool
patch 5: refactor flow director configuration
patch 6: clean up for hns3_del_all_fd_entries()

change log:
V1->V2: modifies patch 5 as Jakub suggested, keep configuring
ethtool/tc flower rules synchronously while aRFS
asynchronously.
changes the usecnt of user-def rule checking in patch 7.
removes previous patches 8 and 9 from this series, since
there are issues that need further discussion.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: add support for user-def data of flow director

For DEVICE_VERSION_V3, the hardware supports to match specified
data in the specified offset of packet payload. Each layer can
have one offset, and can't be masked when configure flow director
rule by ethtool command. The layer is selected based on the
flow-type, ether for L2, ip4/ipv6 for L3, and tcp4/tcp6/udp4/udp6
for L4. For example, tcp4/tcp6/udp4/udp6 rules share the same
user-def offset, but each rule can have its own user-def value.

For the user-def field of ethtool -N/U command is 64 bits long.
The bit 0~15 is used for user-def value, and bit 32~47 for user-def
offset in HNS3 driver.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refine for hns3_del_all_fd_entries()

For only PF driver can configure flow director rule, it's
better to call hclge_del_all_fd_entries() directly in hclge
layer, rather than call hns3_del_all_fd_entries() in hns3
layer. Then the ae_algo->ops.del_all_fd_entries can be removed.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor flow director configuration

Currently, the flow director rule of aRFS is configured in
the IO path. It's time-consuming. So move out the configuration,
and configure it asynchronously. And keep ethtool and tc flower
rule using synchronous way, otherwise the application maybe
unable to know the rule is installed or pending.

Add a state member for each flow director rule to indicate the
rule state. There are 4 states:
TO_ADD: the rule is waiting to add to hardware
TO_DEL: the rule is waiting to remove from hardware
DELETED: the rule has been removed from hardware. It's a middle
state, used to remove the rule node in the fd_rule_list.
ACTIVE: the rule is already added in hardware

For asynchronous way, when receive a new request to add or delete
flow director rule by aRFS, update the rule list, then request to
schedule the service task to finish the configuration.

For synchronous way, when receive a new request to add or delete
flow director rule by ethtool or tc flower, configure hardware
directly, then update the rule list if success.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: add support for traffic class tuple support for flow director by ethtool

The hardware supports to parse and match the traffic class field
of IPv6 packet for flow director, uses the same tuple as ip tos.
So removes the limitation of configure 'tclass' by driver.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor for function hclge_fd_convert_tuple

Currently, there are too many branches for hclge_fd_convert_tuple().
And it may be more when add new tuples. Refactor it by sorting the
tuples according to their length. So it only needs several KEY_OPT
now, and being flexible to add new tuples.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor out hclge_fd_get_tuple()

The process of function hclge_fd_get_tuple() is complex and
prolix. To make it more readable, extract the process of each
flow-type tuple to a single function.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor out hclge_add_fd_entry()

The process of function hclge_add_fd_entry() is complex and
prolix. To make it more readable, extract the process of
fs->ring_cookie to a single function.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

linux/qed: Mundane spelling fixes throughout the file

s/unrequired/"not required"/
s/consme/consume/ .....two different places
s/accros/across/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Fix a typo

s/subsytem/subsystem/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

NFC: Fix a typo

s/packaet/packet/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'actions-semi-ethernet-mac'

Cristian Ciocaltea says:

====================
Add support for Actions Semi Owl Ethernet MAC

This patch series adds support for the Ethernet MAC found on the Actions
Semi Owl family of SoCs.

For the moment I have only tested the driver on RoseapplePi SBC, which is
based on the S500 SoC variant. It might work on S900 as well, but I cannot
tell for sure since the S900 datasheet I currently have doesn't provide
any information regarding the MAC registers - so I couldn't check the
compatibility with S500.

Similar story for S700: the datasheet I own is incomplete, but it seems
the MAC is advertised with Gigabit capabilities. For that reason most
probably we need to extend the current implementation in order to support
this SoC variant as well.

Please note that for testing the driver it is also necessary to update the
S500 clock subsystem:

https://lore.kernel.org/lkml/cover.1615221459.git.cristian.ciocaltea@gmail.com/

The DTS changes for the S500 SBCs will be provided separately.

Thanks,
Cristi

Changes in v3:
- Dropped the 'debug' module parameter and passed the default NETIF_MSG flags
to netif_msg_init(), according to David's review

- Removed the owl_emac_generate_mac_addr() function and the related
OWL_EMAC_GEN_ADDR_SYS_SN config option until a portable solution to get
the system serial number is found - when building on arm64 the following
error is thrown (as reported by Rob's kernel bot):
'[...]/owl-emac.c:9:10: fatal error: asm/system_info.h: No such file or directory'

- Rebased patchset on v5.12-rc4

Changes in v2:
* According to Philipp's review
- Requested exclusive control over serial line via
   devm_reset_control_get_exclusive()
- Optimized error handling by using dev_err_probe()

* According to Andrew's review
- Dropped the inline keywords
- Applied Reverse Christmas Tree format to local variable declarations
- Renamed owl_emac_phy_config() to owl_emac_update_link_state()
- Documented the purpose of the special descriptor used in the context of
   owl_emac_setup_frame_xmit()
- Updated comment inside owl_emac_mdio_clock_enable() regarding the MDC
   clock divider setup
- Indicated MAC support for symmetric pause via phy_set_sym_pause()
   in owl_emac_phy_init()
- Changed the MAC addr generation algorithm in owl_emac_generate_mac_addr()
   by setting the locally administered bit in byte 0 and replacing bytes 1 & 2
   with additional entries from enc_sn
- Moved devm_add_action_or_reset() before clk_set_rate() in owl_emac_probe()

* Other
- Added SMII interface support: updated owl_emac_core_sw_reset(), added
   owl_emac_clk_set_rate(), updated description in the YAML binding
- Changed OWL_EMAC_TX_TIMEOUT from 0.05*HZ to 2*HZ
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: Add entries for Actions Semi Owl Ethernet MAC

Add entries for Actions Semi Owl Ethernet MAC binding and driver.

Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: actions: Add Actions Semi Owl Ethernet MAC driver

Add new driver for the Ethernet MAC used on the Actions Semi Owl
family of SoCs.

Currently this has been tested only on the Actions Semi S500 SoC
variant.

Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: Add Actions Semi Owl Ethernet MAC binding

Add devicetree binding for the Ethernet MAC present on the Actions
Semi Owl family of SoCs.

For the moment advertise only the support for the Actions Semi S500 SoC
variant.

Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: cls_flower: use nla_get_be32 for TCA_FLOWER_KEY_FLAGS

The existing code is functionally correct: iproute2 parses the ip_flags
argument for tc-flower and really packs it as big endian into the
TCA_FLOWER_KEY_FLAGS netlink attribute. But there is a problem in the
fact that W=1 builds complain:

net/sched/cls_flower.c:1047:15: warning: cast to restricted __be32

This is because we should use the dedicated helper for obtaining a
__be32 pointer to the netlink attribute, not a u32 one. This ensures
type correctness for be32_to_cpu.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: cls_flower: use ntohs for struct flow_dissector_key_ports

A make W=1 build complains that:

net/sched/cls_flower.c:214:20: warning: cast from restricted __be16
net/sched/cls_flower.c:214:20: warning: incorrect type in argument 1 (different base types)
net/sched/cls_flower.c:214:20: expected unsigned short [usertype] val
net/sched/cls_flower.c:214:20: got restricted __be16 [usertype] dst

This is because we use htons on struct flow_dissector_key_ports members
src and dst, which are defined as __be16, so they are already in network
byte order, not host. The byte swap function for the other direction
should have been used.

Because htons and ntohs do the same thing (either both swap, or none
does), this change has no functional effect except to silence the
warnings.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdev: add netdev_queue_set_dql_min_limit()

Add a function to set the dynamic queue limit minimum value.

Some specific drivers might have legitimate reasons to configure
dql.min_limit to a given value. Typically, this is the case when the
PDU of the protocol is smaller than the packet size to used to
carry those frames to the device.

Concrete example: a CAN (Control Area Network) device with an USB 2.0
interface. The PDU of classical CAN protocol are roughly 16 bytes but
the USB packet size (which is used to carry the CAN frames to the
device) might be up to 512 bytes. Wen small traffic burst occurs, BQL
algorithm is not able to immediately adjust and this would result in
having to send many small USB packets (i.e packet of 16 bytes for each
CAN frame). Filling up the USB packet with CAN frames is relatively
fast (small latency issue) but the gain of not having to send several
small USB packets is huge (big throughput increase). In this case,
forcing dql.min_limit to a given value that would allow to stuff the
USB packet is always a win.

This function is to be used by network drivers which are able to prove
through a rationale and through empirical tests on several environment
(with other applications, heavy context switching, virtualization...),
that they constantly reach better performances with a specific
predefined dql.min_limit value with no noticeable latency impact.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: simplify Kconfig symbols and dependencies

1. Remove CONFIG_HAVE_NET_DSA.

CONFIG_HAVE_NET_DSA is a legacy leftover from the times when drivers
should have selected CONFIG_NET_DSA manually.
Currently, all drivers has explicit 'depends on NET_DSA', so this is
no more needed.

2. CONFIG_HAVE_NET_DSA dependencies became CONFIG_NET_DSA's ones.

- dropped !S390 dependency which was introduced to be sure NET_DSA
   can select CONFIG_PHYLIB. DSA migrated to Phylink almost 3 years
   ago and the PHY library itself doesn't depend on !S390 since
   commit 870a2b5e4fcd ("phylib: remove !S390 dependeny from Kconfig");
- INET dependency is kept to be sure we can select NET_SWITCHDEV;
- NETDEVICES dependency is kept to be sure we can select PHYLINK.

3. DSA drivers menu now depends on NET_DSA.

Instead on 'depends on NET_DSA' on every single driver, the entire
menu now depends on it. This eliminates a lot of duplicated lines
from Kconfig with no loss (when CONFIG_NET_DSA=m, drivers also can
be only m or n).
This also has a nice side effect that there's no more empty menu on
configurations without DSA.

4. Kbuild will now descend into 'drivers/net/dsa' only when
   CONFIG_NET_DSA is y or m.

This is safe since no objects inside this folder can be built without
DSA core, as well as when CONFIG_NET_DSA=m, no objects can be
built-in.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

iavf: Enable flex-bytes support

Flex-bytes allows for packet matching based on an offset and value. This
is supported via the ethtool user-def option.

The user-def 0xAAAABBBBCCCCDDDD: BBBB is the 2 byte pattern while AAAA
corresponds to its offset in the packet. Similarly DDDD is the 2 byte
pattern with CCCC being the corresponding offset. The offset ranges from
0x0 to 0x1F7 (up to 504 bytes into the packet). The offset starts from
the beginning of the packet.

This feature can be used to allow customers to set flow director rules
for protocols headers that are beyond standard ones supported by ethtool
(e.g. PFCP or GTP-U).

Like for matching GTP-U's TEID value 0x10203040:
ethtool -N ens787f0v0 flow-type udp4 dst-port 2152 \
user-def 0x002e102000303040 action 13

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

iavf: Support Ethernet Type Flow Director filters

Support the addition and deletion of Ethernet filters.

Supported fields are: proto
Supported flow-types are: ether

Example usage:
ethtool -N ens787f0v0 flow-type ether proto 0x8863 action 6
ethtool -N ens787f0v0 flow-type ether proto 0x8864 action 7

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

iavf: Support IPv6 Flow Director filters

Support the addition and deletion of IPv6 filters.

Supported fields are: src-ip, dst-ip, src-port, dst-port and l4proto
Supported flow-types are: tcp6, udp6, sctp6, ip6, ah6, esp6

Example usage:
ethtool -N ens787f0v0 flow-type tcp6 src-ip 2001::2 \
dst-ip CDCD:910A:2222:5498:8475:1111:3900:2020 \
tclass 1 src-port 22 dst-port 23 action 7

L2TPv3 over IP with 'Session ID' 17:
ethtool -N ens787f0v0 flow-type ip6 l4proto 115 l4data 17 action 7

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

iavf: Support IPv4 Flow Director filters

Support the addition and deletion of IPv4 filters.

Supported fields are: src-ip, dst-ip, src-port, dst-port and l4proto
Supported flow-types are: tcp4, udp4, sctp4, ip4, ah4, esp4

Example usage:
ethtool -N ens787f0v0 flow-type tcp4 src-ip 192.168.0.20 \
dst-ip 192.168.0.21 tos 4 src-port 22 dst-port 23 action 8

L2TPv3 over IP with 'Session ID' 17:
ethtool -N ens787f0v0 flow-type ip4 l4proto 115 l4data 17 action 3

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

iavf: Add framework to enable ethtool ntuple filters

Enable ethtool ntuple filter support on the VF driver using the virtchnl
interface to the PF driver and the Flow director functionality in the
hardware.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Check FDIR program status for AVF

Enable returning FDIR completion status by checking the
ctrl_vsi Rx queue descriptor value.

To enable returning FDIR completion status from ctrl_vsi Rx queue,
COMP_Queue and COMP_Report of FDIR filter programming descriptor
needs to be properly configured. After program request sent to ctrl_vsi
Tx queue, ctrl_vsi Rx queue interrupt will be triggered and
completion status will be returned.

Driver will first issue request in ice_vc_fdir_add_fltr(), then
pass FDIR context to the background task in interrupt service routine
ice_vc_fdir_irq_handler() and finally deal with them in
ice_flush_fdir_ctx(). ice_flush_fdir_ctx() will check the descriptor's
value, fdir context, and then send back virtual channel message to VF
by calling ice_vc_add_fdir_fltr_post(). An additional timer will be
setup in case of hardware interrupt timeout.

Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Add more FDIR filter type for AVF

FDIR for AVF can forward
- L2TPV3 packets by matching session id.
- IPSEC ESP packets by matching security parameter index.
- IPSEC AH packets by matching security parameter index.
- NAT_T ESP packets by matching security parameter index.
- Any PFCP session packets(s field is 1).

Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Add GTPU FDIR filter for AVF

Add new FDIR filter type to forward GTPU packets by matching TEID or QFI.
The filter is only enabled when COMMS DDP package is downloaded.

Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Add non-IP Layer2 protocol FDIR filter for AVF

Add new filter type that allow forward non-IP Ethernet packets base on its
ethertype. The filter is only enabled when COMMS DDP package is loaded.

Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Add new actions support for VF FDIR

Add two new actions support for VF FDIR:

A passthrough action does not specify the destination queue, but
just allow the packet go to next pipeline stage, a typical use
cases is combined with a software mark (FDID) action.

Allow specify a 2^n continuous queues as the destination of a FDIR rule.
Packet distribution is based on current RSS configure.

Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Add FDIR pattern action parser for VF

Add basic FDIR flow list and pattern / action parse functions for VF.

Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Enable FDIR Configure for AVF

The virtual channel is going to be extended to support FDIR and
RSS configure from AVF. New data structures and OP codes will be
added, the patch enable the FDIR part.

To support above advanced AVF feature, we need to figure out
what kind of data structure should be passed from VF to PF to describe
an FDIR rule or RSS config rule. The common part of the requirement is
we need a data structure to represent the input set selection of a rule's
hash key.

An input set selection is a group of fields be selected from one or more
network protocol layers that could be identified as a specific flow.
For example, select dst IP address from an IPv4 header combined with
dst port from the TCP header as the input set for an IPv4/TCP flow.

The patch adds a new data structure virtchnl_proto_hdrs to abstract
a network protocol headers group which is composed of layers of network
protocol header(virtchnl_proto_hdr).

A protocol header contains a 32 bits mask (field_selector) to describe
which fields are selected as input sets, as well as a header type
(enum virtchnl_proto_hdr_type). Each bit is mapped to a field in
enum virtchnl_proto_hdr_field guided by its header type.

+------------+-----------+------------------------------+
|            | Proto Hdr | Header Type A                |
|            |           +------------------------------+
|            |           | BIT 31 | ... | BIT 1 | BIT 0 |
|            |-----------+------------------------------+
|Proto Hdrs  | Proto Hdr | Header Type B                |
|            |           +------------------------------+
|            |           | BIT 31 | ... | BIT 1 | BIT 0 |
|            |-----------+------------------------------+
|            | Proto Hdr | Header Type C                |
|            |           +------------------------------+
|            |           | BIT 31 | ... | BIT 1 | BIT 0 |
|            |-----------+------------------------------+
|            |    ....                                  |
+-------------------------------------------------------+

All fields in enum virtchnl_proto_hdr_fields are grouped with header type
and the value of the first field of a header type is always 32 aligned.

enum proto_hdr_type {
        header_type_A = 0;
        header_type_B = 1;
        ....
}

enum proto_hdr_field {
        /* header type A */
        header_A_field_0 = 0,
        header_A_field_1 = 1,
        header_A_field_2 = 2,
        header_A_field_3 = 3,

        /* header type B */
        header_B_field_0 = 32, // = header_type_B << 5
        header_B_field_0 = 33,
        header_B_field_0 = 34
        header_B_field_0 = 35,
        ....
};

So we have:
proto_hdr_type = proto_hdr_field / 32
bit offset = proto_hdr_field % 32

To simply the protocol header's operations, couple help macros are added.
For example, to select src IP and dst port as input set for an IPv4/UDP
flow.

we have:
struct virtchnl_proto_hdr hdr[2];

VIRTCHNL_SET_PROTO_HDR_TYPE(&hdr[0], IPV4)
VIRTCHNL_ADD_PROTO_HDR_FIELD(&hdr[0], IPV4, SRC)

VIRTCHNL_SET_PROTO_HDR_TYPE(&hdr[1], UDP)
VIRTCHNL_ADD_PROTO_HDR_FIELD(&hdr[1], UDP, DST)

The byte array is used to store the protocol header of a training package.
The byte array must be network order.

The patch added virtual channel support for iAVF FDIR add/validate/delete
filter. iAVF FDIR is Flow Director for Intel Adaptive Virtual Function
which can direct Ethernet packets to the queues of the Network Interface
Card. Add/delete command is adding or deleting one rule for each virtual
channel message, while validate command is just verifying if this rule
is valid without any other operations.

To add or delete one rule, driver needs to config TCAM and Profile,
build training packets which contains the input set value, and send
the training packets through FDIR Tx queue. In addition, driver needs to
manage the software context to avoid adding duplicated rules, deleting
non-existent rule, input set conflicts and other invalid cases.

NOTE:
Supported pattern/actions and their parse functions are not be included in
this patch, they will be added in a separate one.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Simei Su <simei.su@intel.com>
Signed-off-by: Beilei Xing <beilei.xing@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Add support for per VF ctrl VSI enabling

We are going to enable FDIR configure for AVF through virtual channel.
The first step is to add helper functions to support control VSI setup.
A control VSI will be allocated for a VF when AVF creates its
first FDIR rule through ice_vf_ctrl_vsi_setup().
The patch will also allocate FDIR rule space for VF's control VSI.
If a VF asks for flow director rules, then those should come entirely
from the best effort pool and not from the guaranteed pool. The patch
allow a VF VSI to have only space in the best effort rules.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Enhanced IPv4 and IPv6 flow filter

Separate IPv4 and IPv6 ptype bit mask table into 2 tables:
with or without L4 protocols.

When a flow filter without any l4 type is specified, the
ICE_FLOW_SEG_HDR_IPV_OTHER flag can be used to describe if user
want to create a IP rule target for all IP packet or just IP
packet without l4 header.

Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Support to separate GTP-U uplink and downlink

To apply different input set for GTP-U packet with or without extend
header as well as GTP-U uplink and downlink, we need to add TCAM mask
matching capability. This allows comprehending different PTYPE
attributes by examining flags from the parser. Using this method,
different profiles can be used by examining flag values from the parser.

Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Add more advanced protocol support in flow filter

Add more protocol support in flow filter, these
include PPPoE, L2TPv3, GTP, PFCP, ESP and AH.

Signed-off-by: Ting Xu <ting.xu@intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Support non word aligned input set field

To support FDIR input set with protocol field like DSCP, TTL,
PROT, etc. which is not word aligned, we need to enable field
vector masking.

Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Add more basic protocol support for flow filter

Add more protocol and field support for flow filter include:
ETH, VLAN, ICMP, ARP and TCP flag.

Signed-off-by: Kevin Scott <kevin.c.scott@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Revert "net: dsa: sja1105: Clear VLAN filtering offload netdev feature"

This reverts commit e9bf96943b408e6c99dd13fb01cb907335787c61.

The topic of the reverted patch is the support for switches with global
VLAN filtering, added by commit 061f6a505ac3 ("net: dsa: Add
ndo_vlan_rx_{add, kill}_vid implementation"). Be there a switch with 4
ports swp0 -> swp3, and the following setup:

ip link add br0 type bridge vlan_filtering 1
ip link set swp0 master br0
ip link set swp1 master br0

What would happen with VLAN-tagged traffic received on standalone ports
swp2 and swp3? Well, it would get dropped, were it not for the
.ndo_vlan_rx_add_vid and .ndo_vlan_rx_kill_vid implementations (called
from vlan_vid_add and vlan_vid_del respectively). Basically, for DSA
switches where VLAN filtering is a global attribute, we enforce the
standalone ports to have 'rx-vlan-filter: off [fixed]' in their ethtool
features, which lets the user know that all VLAN-tagged packets that are
not explicitly added in the RX filtering list are dropped.

As for the sja1105 driver, at the time of the reverted patch, it was
operating in a pretty handicapped mode when it had ports under a bridge
with vlan_filtering=1. Specifically, it was unable to terminate traffic
through the CPU port (for further explanation see "Traffic support" in
Documentation/networking/dsa/sja1105.rst).

However, since then, the sja1105 driver has made considerable progress,
and that limitation is no longer as severe now. Specifically, since
commit 2cafa72e516f ("net: dsa: sja1105: add a new
best_effort_vlan_filtering devlink parameter"), the driver is able to
perform CPU termination even when some ports are under bridges with
vlan_filtering=1. Then, since commit 8841f6e63f2c ("net: dsa: sja1105:
make devlink property best_effort_vlan_filtering true by default"), this
even became the default operating mode.

So we can now take advantage of the logic in the DSA core.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: add support for ethtool get_ringparam

Add support for the ethtool get_ringparam operation.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipa-cfg-data-updates'

Alex Elder says:

====================
net: ipa: more configuration data updates

This series starts with two patches that should have been included
in an earlier series.  With these in place, QSB settings are
programmed from information found in the data files rather than
being embedded in code.  Support is then added for reprenting
another QSB property (supported for IPA v4.0+).

The third patch updates the definition of the sequencer type used
for an endpoint.  Previously a set of 2-byte symbols with fairly
long names defined the sequencer type, but now those are broken into
1-byte halves whose names are a little more informative.

The fourth patch moves the sequencer type definition so it only
applies to TX endpoints (they aren't valid for RX endpoints).  And
the last makes some minor documentation updates.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: update some comments in "ipa_data.h"

Fix/expand some comments in "ipa_data.h".

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: sequencer type is for TX endpoints only

We only program the sequencer type for TX endpoints.  So move the
definition of the sequencer type fields into the TX-specific portion
of the endpoint configuration data.  There's no need to maintain
this in the IPA structure; we can extract it from the configuration
data it points to in the one spot it's needed.

We previously specified the sequencer type for RX endpoints with
INVALID values.  These are no longer needed, so get rid of them.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: split sequencer type in two

An IPA endpoint has a sequencer that must be configured based on how
the endpoint is to be used.  Currently the IPA code programs the
sequencer type by splitting a value into four 4-bit nibbles.  Doing
that doesn't really add much value, and regardless, a better way of
splitting the sequencer type is into two halves--the lower byte
describing how normal packet processing is handled, and the next
byte describing information about processing replicas.

So split the sequencer type into two sub-parts:  the sequencer type
and the replication sequencer type.  Define the values supported for
the "main" sequencer type, and define the values supported for the
replication part separately.

In addition, the sequencer type names are quite verbose, encoding
what the type includes, but also what it *excludes*.  Rename the
sequencer types in a way that mainly describes the number of passes
that a packet takes through the IPA processing pipeline, and how
many of those passes end by supplying the processed packet to the
microprocessor.

The result expands the supported types beyond what is required for
now, but simplifies the way these are defined.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: implement MAX_READS_BEATS QSB data

Starting with IPA v4.0, a limit is placed on the number of bytes
outstanding in a transaction, to reduce latency. The limit is
imposed only if this value is non-zero.

We don't use a non-zero value for SC7180, but newer versions of IPA
do. Prepare for that by allowing a programmed value to be specified
in the platform configuration data.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: use configuration data for QSB settings

Use the QSB configuration data in ipa_hardware_config_qsb(), rather
than determining in code what values to use based on IPA version.
Pass configuration data to ipa_hardware_config() so it can be passed
to ipa_hardware_config_qsb().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: decnet: Fixed multiple coding style issues

Made changes to coding style as suggested by checkpatch.pl
changes are of the type:
open brace '{' following struct go on the same line
do not use assignment in if condition

Signed-off-by: Sai Kalyaan Palla <saikalyaan63@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
1GbE Intel Wired LAN Driver Updates 2021-03-19

This series contains updates to igc and e1000e drivers.

Sasha removes unused defines in igc driver.

Jiapeng Zhong changes bool assignments from 0/1 to false/true for igc.

Wei Yongjun marks e1000e_pm_prepare() as __maybe_unused to resolve a
defined but not used warning under certain configurations.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: Mark e1000e_pm_prepare() as __maybe_unused

The function e1000e_pm_prepare() may have no callers depending
on configuration, so it must be marked __maybe_unused to avoid
harmless warning:

drivers/net/ethernet/intel/e1000e/netdev.c:6926:12:
warning: 'e1000e_pm_prepare' defined but not used [-Wunused-function]
6926 | static int e1000e_pm_prepare(struct device *dev)
| ^~~~~~~~~~~~~~~~~

Fixes: ccf8b940e5fd ("e1000e: Leverage direct_complete to speed up s2ram")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igc: Assign boolean values to a bool variable

Fix the following coccicheck warnings:

./drivers/net/ethernet/intel/igc/igc_main.c:4961:2-14: WARNING:
Assignment of 0/1 to bool variable.

./drivers/net/ethernet/intel/igc/igc_main.c:4955:2-14: WARNING:
Assignment of 0/1 to bool variable.

./drivers/net/ethernet/intel/igc/igc_main.c:4933:1-13: WARNING:
Assignment of 0/1 to bool variable.

./drivers/net/ethernet/intel/igc/igc_main.c:4592:1-24: WARNING:
Assignment of 0/1 to bool variable.

./drivers/net/ethernet/intel/igc/igc_main.c:4438:2-25: WARNING:
Assignment of 0/1 to bool variable.

./drivers/net/ethernet/intel/igc/igc_main.c:4396:2-25: WARNING:
Assignment of 0/1 to bool variable.

./drivers/net/ethernet/intel/igc/igc_main.c:4018:2-25: WARNING:
Assignment of 0/1 to bool variable.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igc: Remove unused MII_CR_LOOPBACK

MII_CR_LOOPBACK masks not in use in i225 device and can be removed.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igc: Remove unused MII_CR_SPEED

Force PHY speed not supported for i225 devices.
MII_CR_SPEED masks not in use in i225 device and can be removed.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

net: add CONFIG_PCPU_DEV_REFCNT

I was working on a syzbot issue, claiming one device could not be
dismantled because its refcount was -1

unregister_netdevice: waiting for sit0 to become free. Usage count = -1

It would be nice if syzbot could trigger a warning at the time
this reference count became negative.

This patch adds CONFIG_PCPU_DEV_REFCNT options which defaults
to per cpu variables (as before this patch) on SMP builds.

v2: free_dev label in alloc_netdev_mqs() is moved to avoid
a compiler warning (-Wunused-label), as reported
by kernel test robot <lkp@intel.com>

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipa-update-config-data'

Alex Elder says:

====================
net: ipa: update configuration data

Each IPA version has a "data" file defining how various things are
configured.  This series gathers a few updates to this information:
  - The first patch makes all configuration data constant
  - The second fixes an incorrect (but seemingly harmless) value
  - The third simplifies things a bit by using implicit zero
    initialization for memory regions that are empty
  - The fourth adds definitions for memory regions that exist but
    are not yet used
  - The fifth use configuration data rather than conditional code to
    set some bus parameters
====================

net: ipa: define QSB limits in configuration data

Define the maximum number of reads and writes to configure for the
QSB masters used for IPA in configuration data.

We don't use these values yet; the next commit takes care of that.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: define some new memory regions

There are several memory regions that are defined starting with IPA
v4.0, but which were not used for the SC7180 SoC (IPA v4.2). Even
though they're not used (yet), define them so they are ready to be
used for SoCs when they become supported.

There are two QUOTA statistics memory regions, one for the modem and
one for the AP. Define distinct names for these regions, and get
rid of the definition of IPA_MEM_STATS_QUOTA.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: don't define empty memory regions

The AP_HEADER memory region for both the SDM845 and SC7180 SoCs has
zero size, and has no canaries.  Defining an offset for such a
zero-length region is not meaningful, so it's better not to define
it at all.  The size of this region is used in the code, but its
value will still be zero because the memory regions are defined in
statically initialized memory.

For the SC7180, the STATS_DROP memory region has a zero size and no
canaries as well.

These regions are the only place where a zero-sized region is
defined despite having no canaries.  Remove them.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: fix canary count for SC7180 UC_INFO region

There should be no canary values written before the beginning of the
UC_INFO memory region. This was correct for SDM845, but somehow was
committed with the wrong value for SC7180.

This bug seems to cause no harm, so we'll just correct it without
back-porting.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: make all configuration data constant

All of the platform configuration data should be constant, but
that isn't the case for the memory regions, interconnects, and
clocks. Fix this.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

igc: Remove unused MII_CR_RESET

MII_CR_RESET mask not in use in i225 device and can be removed

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>