]> git.proxmox.com Git - mirror_ubuntu-kernels.git/log
mirror_ubuntu-kernels.git
22 months agonet: ethernet: mtk_wed: add reset/reset_complete callbacks
Lorenzo Bianconi [Sat, 14 Jan 2023 17:01:32 +0000 (18:01 +0100)]
net: ethernet: mtk_wed: add reset/reset_complete callbacks

Introduce reset and reset_complete wlan callback to schedule WLAN driver
reset when ethernet/wed driver is resetting.

Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agonet: ethernet: mtk_eth_soc: add dma checks to mtk_hw_reset_check
Lorenzo Bianconi [Sat, 14 Jan 2023 17:01:31 +0000 (18:01 +0100)]
net: ethernet: mtk_eth_soc: add dma checks to mtk_hw_reset_check

Introduce mtk_hw_check_dma_hang routine to monitor possible dma hangs.

Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agonet: ethernet: mtk_eth_soc: align reset procedure to vendor sdk
Lorenzo Bianconi [Sat, 14 Jan 2023 17:01:30 +0000 (18:01 +0100)]
net: ethernet: mtk_eth_soc: align reset procedure to vendor sdk

Avoid to power-down the ethernet chip during hw reset and align reset
procedure to vendor sdk.

Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agonet: ethernet: mtk_eth_soc: introduce mtk_hw_warm_reset support
Lorenzo Bianconi [Sat, 14 Jan 2023 17:01:29 +0000 (18:01 +0100)]
net: ethernet: mtk_eth_soc: introduce mtk_hw_warm_reset support

Introduce mtk_hw_warm_reset utility routine. This is a preliminary patch
to align reset procedure to vendor sdk and avoid to power down the chip
during hw reset.

Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agonet: ethernet: mtk_eth_soc: introduce mtk_hw_reset utility routine
Lorenzo Bianconi [Sat, 14 Jan 2023 17:01:28 +0000 (18:01 +0100)]
net: ethernet: mtk_eth_soc: introduce mtk_hw_reset utility routine

This is a preliminary patch to add Wireless Ethernet Dispatcher reset
support.

Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agoMerge branch 'net-use-kmem_cache_free_bulk-in-kfree_skb_list'
Paolo Abeni [Tue, 17 Jan 2023 09:27:31 +0000 (10:27 +0100)]
Merge branch 'net-use-kmem_cache_free_bulk-in-kfree_skb_list'

Jesper Dangaard Brouer says:

====================
net: use kmem_cache_free_bulk in kfree_skb_list

The kfree_skb_list function walks SKB (via skb->next) and frees them
individually to the SLUB/SLAB allocator (kmem_cache). It is more
efficient to bulk free them via the kmem_cache_free_bulk API.

Netstack NAPI fastpath already uses kmem_cache bulk alloc and free
APIs for SKBs.

The kfree_skb_list call got an interesting optimization in commit
520ac30f4551 ("net_sched: drop packets after root qdisc lock is
released") that can create a list of SKBs "to_free" e.g. when qdisc
enqueue fails or deliberately chooses to drop . It isn't a normal data
fastpath, but the situation will likely occur when system/qdisc are
under heavy workloads, thus it makes sense to use a faster API for
freeing the SKBs.

E.g. the (often distro default) qdisc fq_codel will drop batches of
packets from fattest elephant flow, default capped at 64 packets (but
adjustable via tc argument drop_batch).

Performance measurements done in [1]:
 [1] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/kfree_skb_list01.org
====================

Link: https://lore.kernel.org/r/167361788585.531803.686364041841425360.stgit@firesoul
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agonet: kfree_skb_list use kmem_cache_free_bulk
Jesper Dangaard Brouer [Fri, 13 Jan 2023 13:52:04 +0000 (14:52 +0100)]
net: kfree_skb_list use kmem_cache_free_bulk

The kfree_skb_list function walks SKB (via skb->next) and frees them
individually to the SLUB/SLAB allocator (kmem_cache). It is more
efficient to bulk free them via the kmem_cache_free_bulk API.

This patches create a stack local array with SKBs to bulk free while
walking the list. Bulk array size is limited to 16 SKBs to trade off
stack usage and efficiency. The SLUB kmem_cache "skbuff_head_cache"
uses objsize 256 bytes usually in an order-1 page 8192 bytes that is
32 objects per slab (can vary on archs and due to SLUB sharing). Thus,
for SLUB the optimal bulk free case is 32 objects belonging to same
slab, but runtime this isn't likely to occur.

The expected gain from using kmem_cache bulk alloc and free API
have been assessed via a microbencmark kernel module[1].

The module 'slab_bulk_test01' results at bulk 16 element:
 kmem-in-loop Per elem: 109 cycles(tsc) 30.532 ns (step:16)
 kmem-bulk    Per elem: 64 cycles(tsc) 17.905 ns (step:16)

More detailed description of benchmarks avail in [2].

[1] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm
[2] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/kfree_skb_list01.org

V2: rename function to kfree_skb_add_bulk.

Reviewed-by: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agonet: fix call location in kfree_skb_list_reason
Jesper Dangaard Brouer [Fri, 13 Jan 2023 13:51:59 +0000 (14:51 +0100)]
net: fix call location in kfree_skb_list_reason

The SKB drop reason uses __builtin_return_address(0) to give the call
"location" to trace_kfree_skb() tracepoint skb:kfree_skb.

To keep this stable for compilers kfree_skb_reason() is annotated with
__fix_address (noinline __noclone) as fixed in commit c205cc7534a9
("net: skb: prevent the split of kfree_skb_reason() by gcc").

The function kfree_skb_list_reason() invoke kfree_skb_reason(), which
cause the __builtin_return_address(0) "location" to report the
unexpected address of kfree_skb_list_reason.

Example output from 'perf script':
 kpktgend_0  1337 [000]    81.002597: skb:kfree_skb: skbaddr=0xffff888144824700 protocol=2048 location=kfree_skb_list_reason+0x1e reason: QDISC_DROP

Patch creates an __always_inline __kfree_skb_reason() helper call that
is called from both kfree_skb_list() and kfree_skb_list_reason().
Suggestions for solutions that shares code better are welcome.

As preparation for next patch move __kfree_skb() invocation out of
this helper function.

Reviewed-by: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agodevlink: remove some unnecessary code
Dan Carpenter [Fri, 13 Jan 2023 07:35:43 +0000 (10:35 +0300)]
devlink: remove some unnecessary code

This code checks if (attrs[DEVLINK_ATTR_TRAP_POLICER_ID]) twice.  Once
at the start of the function and then a couple lines later.  Delete the
second check since that one must be true.

Because the second condition is always true, it means the:

policer_item = group_item->policer_item;

assignment is immediately over-written.  Delete that as well.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/Y8EJz8oxpMhfiPUb@kili
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agosched: add new attr TCA_EXT_WARN_MSG to report tc extact message
Hangbin Liu [Fri, 13 Jan 2023 03:43:53 +0000 (11:43 +0800)]
sched: add new attr TCA_EXT_WARN_MSG to report tc extact message

We will report extack message if there is an error via netlink_ack(). But
if the rule is not to be exclusively executed by the hardware, extack is not
passed along and offloading failures don't get logged.

In commit 81c7288b170a ("sched: cls: enable verbose logging") Marcelo
made cls could log verbose info for offloading failures, which helps
improving Open vSwitch debuggability when using flower offloading.

It would also be helpful if userspace monitor tools, like "tc monitor",
could log this kind of message, as it doesn't require vswitchd log level
adjusment. Let's add a new tc attributes to report the extack message so
the monitor program could receive the failures. e.g.

  # tc monitor
  added chain dev enp3s0f1np1 parent ffff: chain 0
  added filter dev enp3s0f1np1 ingress protocol all pref 49152 flower chain 0 handle 0x1
    ct_state +trk+new
    not_in_hw
          action order 1: gact action drop
           random type none pass val 0
           index 1 ref 1 bind 1

  Warning: mlx5_core: matching on ct_state +new isn't supported.

In this patch I only report the extack message on add/del operations.
It doesn't look like we need to report the extack message on get/dump
operations.

Note this message not only reporte to multicast groups, it could also
be reported unicast, which may affect the current usersapce tool's behaivor.

Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20230113034353.2766735-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agoMerge branch 'dt-bindings-ocelot-switches'
David S. Miller [Mon, 16 Jan 2023 18:42:55 +0000 (18:42 +0000)]
Merge branch 'dt-bindings-ocelot-switches'

Colin Foster says

====================
dt-binding preparation for ocelot switches

Ocelot switches have the abilitiy to be used internally via
memory-mapped IO or externally via SPI or PCIe. This brings up issues
for documentation, where the same chip might be accessed internally in a
switchdev manner, or externally in a DSA configuration. This patch set
is perparation to bring DSA functionality to the VSC7512, utilizing as
much as possible with an almost identical VSC7514 chip.

This patch set changed quite a bit from v2, so I'll omit the background
of how those sets came to be. Rob offered a lot of very useful guidance.
My thanks.

At the end of the day, with this patch set, there should be a framework
to document Ocelot switches (and any switch) in scenarios where they can
be controlled internally (ethernet-switch) or externally (dsa-switch).

---

v6 -> v7
  * Add Reviewed / Acked on patch 1
  * Clean up descriptions on Ethernet / DSA switch port bindings

v5 -> v6
  * Rebase so it applies to net-next cleanly.
  * No other changes - during the last submission round I said I'd
    submit v6 with a change to move $dsa-port.yaml to outside the allOf
    list. In retrospect that wasn't the right thing to do, because later
    in the patch series the $dsa-port.yaml is removed outright. So I
    believe the submission in v5 to keep "type: object" was correct.

v4 -> v5
  * Sync DSA maintainers with MAINTAINERS file (new patch 1)
  * Undo move of port description of mediatek,mt7530.yaml (patch 4)
  * Move removal of "^(ethernet-)?switch(@.*)?$" in dsa.yaml from patch 4
    to patch 8
  * Add more consistent capitalization in title lines and better Ethernet
    switch port description. (patch 8)

v3 -> v4
  * Renamed "base" to "ethernet-ports" to avoid confusion with the concept
    of a base class.
  * Squash ("dt-bindings: net: dsa: mediatek,mt7530: fix port description location")
    patch into ("dt-bindings: net: dsa: utilize base definitions for standard dsa
    switches")
  * Corrections to fix confusion about additonalProperties vs unevaluatedProperties.
    See specific patches for details.

v2 -> v3
  * Restructured everything to use a "base" iref for devices that don't
    have additional properties, and simply a "ref" for devices that do.
  * New patches to fix up brcm,sf2, qca8k, and mt7530
  * Fix unevaluatedProperties errors from previous sets (see specific
    patches for more detail)
  * Removed redundant "Device Tree Binding" from titles, where applicable.

v1 -> v2
  * Two MFD patches were brought into the MFD tree, so are dropped
  * Add first patch 1/6 to allow DSA devices to add ports and port
    properties
  * Test qca8k against new dt-bindings and fix warnings. (patch 2/6)
  * Add tags (patch 3/6)
  * Fix vsc7514 refs and properties
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agodt-bindings: net: mscc,vsc7514-switch: utilize generic ethernet-switch.yaml
Colin Foster [Thu, 12 Jan 2023 17:56:13 +0000 (07:56 -1000)]
dt-bindings: net: mscc,vsc7514-switch: utilize generic ethernet-switch.yaml

Several bindings for ethernet switches are available for non-dsa switches
by way of ethernet-switch.yaml. Remove these duplicate entries and utilize
the common bindings for the VSC7514.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agodt-bindings: net: add generic ethernet-switch-port binding
Colin Foster [Thu, 12 Jan 2023 17:56:12 +0000 (07:56 -1000)]
dt-bindings: net: add generic ethernet-switch-port binding

The dsa-port.yaml binding had several references that can be common to all
ethernet ports, not just dsa-specific ones. Break out the generic bindings
to ethernet-switch-port.yaml they can be used by non-dsa drivers.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agodt-bindings: net: add generic ethernet-switch
Colin Foster [Thu, 12 Jan 2023 17:56:11 +0000 (07:56 -1000)]
dt-bindings: net: add generic ethernet-switch

The dsa.yaml bindings had references that can apply to non-dsa switches. To
prevent duplication of this information, keep the dsa-specific information
inside dsa.yaml and move the remaining generic information to the newly
created ethernet-switch.yaml.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agodt-bindings: net: dsa: mediatek,mt7530: remove unnecessary dsa-port reference
Colin Foster [Thu, 12 Jan 2023 17:56:10 +0000 (07:56 -1000)]
dt-bindings: net: dsa: mediatek,mt7530: remove unnecessary dsa-port reference

dsa.yaml contains a reference to dsa-port.yaml, so a duplicate reference to
the binding isn't necessary. Remove this unnecessary reference.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agodt-bindings: net: dsa: qca8k: utilize shared dsa.yaml
Colin Foster [Thu, 12 Jan 2023 17:56:09 +0000 (07:56 -1000)]
dt-bindings: net: dsa: qca8k: utilize shared dsa.yaml

The dsa.yaml binding contains duplicated bindings for address and size
cells, as well as the reference to dsa-port.yaml. Instead of duplicating
this information, remove the reference to dsa-port.yaml and include the
full reference to dsa.yaml.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agodt-bindings: net: dsa: allow additional ethernet-port properties
Colin Foster [Thu, 12 Jan 2023 17:56:08 +0000 (07:56 -1000)]
dt-bindings: net: dsa: allow additional ethernet-port properties

Explicitly allow additional properties for both the ethernet-port and
ethernet-ports properties. This specifically will allow the qca8k.yaml
binding to use shared properties.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agodt-bindings: net: dsa: utilize base definitions for standard dsa switches
Colin Foster [Thu, 12 Jan 2023 17:56:07 +0000 (07:56 -1000)]
dt-bindings: net: dsa: utilize base definitions for standard dsa switches

DSA switches can fall into one of two categories: switches where all ports
follow standard '(ethernet-)?port' properties, and switches that have
additional properties for the ports.

The scenario where DSA ports are all standardized can be handled by
switches with a reference to the new 'dsa.yaml#/$defs/ethernet-ports'.

The scenario where DSA ports require additional properties can reference
'$dsa.yaml#' directly. This will allow switches to reference these standard
definitions of the DSA switch, but add additional properties under the port
nodes.

Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Alvin Šipraga <alsi@bang-olufsen.dk> # realtek
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agodt-bindings: net: dsa: qca8k: remove address-cells and size-cells from switch node
Colin Foster [Thu, 12 Jan 2023 17:56:06 +0000 (07:56 -1000)]
dt-bindings: net: dsa: qca8k: remove address-cells and size-cells from switch node

The children of the switch node don't have a unit address, and therefore
should not need the #address-cells or #size-cells entries. Fix the example
schemas accordingly.

Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agodt-bindings: net: dsa: sf2: fix brcm,use-bcm-hdr documentation
Colin Foster [Thu, 12 Jan 2023 17:56:05 +0000 (07:56 -1000)]
dt-bindings: net: dsa: sf2: fix brcm,use-bcm-hdr documentation

The property use-bcm-hdr was documented as an entry under the ports node
for the bcm_sf2 DSA switch. This property is actually evaluated for each
port. Correct the documentation to match the actual behavior and properly
reference dsa-port.yaml for additional properties of the node.

Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agodt-bindings: dsa: sync with maintainers
Colin Foster [Thu, 12 Jan 2023 17:56:04 +0000 (07:56 -1000)]
dt-bindings: dsa: sync with maintainers

The MAINTAINERS file has Andrew Lunn, Florian Fainelli, and Vladimir Oltean
listed as the maintainers for generic dsa bindings. Update dsa.yaml and
dsa-port.yaml accordingly.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoMerge branch 'net-microchip-vcap-rules'
David S. Miller [Mon, 16 Jan 2023 13:45:17 +0000 (13:45 +0000)]
Merge branch 'net-microchip-vcap-rules'

Steen Hegelund says:

====================
net: microchip: Add support for two classes of VCAP rules

This adds support for two classes of VCAP rules:

- Permanent rules (added e.g. for PTP support)
- TC user rules (added by the TC userspace tool)

For this to work the VCAP Loopups must be enabled from boot, so that the
"internal" clients like PTP can add rules that are always active.

When the TC tool add a flower filter the VCAP rule corresponding to this
filter will be disabled (kept in memory) until a TC matchall filter creates
a link from chain 0 to the chain (lookup) where the flower filter was
added.

When the flower filter is enabled it will be written to the appropriate
VCAP lookup and become active in HW.

Likewise the flower filter will be disabled if there is no link from chain
0 to the chain of the filter (lookup), and when that happens the
corresponding VCAP rule will be read from the VCAP instance and stored in
memory until it is deleted or enabled again.

Version History:
================
v4      Removed a leftover 'Fixes' tag from v2.  No functional changes.

v3      Removed the change that allowed rules to always be added in the
        LAN996x even though the lookups are not enabled (Horatiu Vultur).
        This was sent separately to net instead.

        Removed the 'Fixes' tags due to the patch sent to net by Horatiu
        Vultur.

        Added a check for validity of the chain source when enabling a
        lookup.

v2      Adding a missing goto exit in vcap_add_rule (Dan Carpenter).
        Added missing checks for error returns in vcap_enable_rule.

v1      Initial version
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: microchip: vcap api: Enable/Disable rules via chains in VCAP HW
Steen Hegelund [Sat, 14 Jan 2023 13:42:42 +0000 (14:42 +0100)]
net: microchip: vcap api: Enable/Disable rules via chains in VCAP HW

This supports that individual rules are enabled and disabled via chain
information.
This is done by keeping disabled rules in the VCAP list (cached) until they
are enabled, and only at this time are the rules written to the VCAP HW.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: microchip: vcap api: Add a storage state to a VCAP rule
Steen Hegelund [Sat, 14 Jan 2023 13:42:41 +0000 (14:42 +0100)]
net: microchip: vcap api: Add a storage state to a VCAP rule

This allows a VCAP rule to be in one of 3 states:

- permanently stored in the VCAP HW (for rules that must always be present)
- enabled (stored in HW) when the corresponding lookup has been enabled
- disabled (stored in SW) when the lookup is disabled

This way important VCAP rules can be added even before the user enables the
VCAP lookups using a TC matchall filter.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: microchip: vcap api: Check chains when adding a tc flower filter
Steen Hegelund [Sat, 14 Jan 2023 13:42:40 +0000 (14:42 +0100)]
net: microchip: vcap api: Check chains when adding a tc flower filter

This changes the way the chain information verified when adding a new tc
flower filter.

When adding a flower filter it is now checked that the filter contains a
goto action to one of the IS2 VCAP lookups, except for the last lookup
which may omit this goto action.

It is also checked if you attempt to add multiple matchall filters to
enable the same VCAP lookup.  This will be rejected.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: microchip: vcap api: Use src and dst chain id to chain VCAP lookups
Steen Hegelund [Sat, 14 Jan 2023 13:42:39 +0000 (14:42 +0100)]
net: microchip: vcap api: Use src and dst chain id to chain VCAP lookups

This adds both the source and destination chain id to the information kept
for enabled port lookups.
This allows enabling and disabling a chain of lookups by walking the chain
information for a port.

This changes the way that VCAP lookups are enabled from userspace: instead
of one matchall rule that enables all the 4 Sparx5 IS2 lookups, you need a
matchall rule per lookup.

In practice that is done by adding one matchall rule in chain 0 to goto IS2
Lookup 0, and then for each lookup you add a rule per lookup (low priority)
that does a goto to the next lookup chain.

Examples:

If you want IS2 Lookup 0 to be enabled you add the same matchall filter as
before:

tc filter add dev eth12 ingress chain 0 prio 1000 handle 1000 matchall \
       skip_sw action goto chain 8000000

If you also want to enable lookup 1 to 3 in IS2 and chain them you need
to add the following matchall filters:

tc filter add dev eth12 ingress chain 8000000 prio 1000 handle 1000 \
    matchall skip_sw action goto chain 8100000

tc filter add dev eth12 ingress chain 8100000 prio 1000 handle 1000 \
    matchall skip_sw action goto chain 8200000

tc filter add dev eth12 ingress chain 8200000 prio 1000 handle 1000 \
    matchall skip_sw action goto chain 8300000

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: microchip: vcap api: Convert multi-word keys/actions when encoding
Steen Hegelund [Sat, 14 Jan 2023 13:42:38 +0000 (14:42 +0100)]
net: microchip: vcap api: Convert multi-word keys/actions when encoding

The conversion to the platform specific multi-word format is moved from the
key/action add functions to the encoding key/action.
This allows rules that are disabled (not in VCAP HW) to use the same format
for keys/actions as rules that have just been read from VCAP HW.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: microchip: vcap api: Always enable VCAP lookups
Steen Hegelund [Sat, 14 Jan 2023 13:42:37 +0000 (14:42 +0100)]
net: microchip: vcap api: Always enable VCAP lookups

This changes the VCAP lookups state to always be enabled so that it is
possible to add "internal" VCAP rules that must be available even though
the user has not yet enabled the VCAP chains via a TC matchall filter.

The API callback to enable and disable VCAP lookups is therefore removed.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: microchip: sparx5: Reset VCAP counter for new rules
Steen Hegelund [Sat, 14 Jan 2023 13:42:36 +0000 (14:42 +0100)]
net: microchip: sparx5: Reset VCAP counter for new rules

When a rule counter is external to the VCAP such as the Sparx5 IS2 counters
are, then this counter must be reset when a new rule is created.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: microchip: vcap api: Erase VCAP cache before encoding rule
Steen Hegelund [Sat, 14 Jan 2023 13:42:35 +0000 (14:42 +0100)]
net: microchip: vcap api: Erase VCAP cache before encoding rule

For consistency the VCAP cache area is erased just before the new rule is
being encoded.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agor8169: reset bus if NIC isn't accessible after tx timeout
Heiner Kallweit [Fri, 13 Jan 2023 22:46:19 +0000 (23:46 +0100)]
r8169: reset bus if NIC isn't accessible after tx timeout

ASPM issues may result in the NIC not being accessible any longer.
In this case disabling ASPM may not work. Therefore detect this case
by checking whether register reads return ~0, and try to make the
NIC accessible again by resetting the secondary bus.

v2:
- add exception handling for the case that pci_reset_bus() fails

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: dsa: mv88e6xxx: Enable PTP receive for mv88e6390
Kurt Kanzenbach [Fri, 13 Jan 2023 15:12:58 +0000 (16:12 +0100)]
net: dsa: mv88e6xxx: Enable PTP receive for mv88e6390

The switch receives management traffic such as STP and LLDP. However, PTP
messages are not received, only transmitted.

Ideally, the switch would trap all PTP messages to the management CPU. This
particular switch has a PTP block which identifies PTP messages and traps them
to a dedicated port. There is a register to program this destination. This is
not used at the moment.

Therefore, program it to the same port as the MGMT traffic is trapped to. This
allows to receive PTP messages as soon as timestamping is enabled.

In addition, the datasheet mentions that this register is not valid e.g., for
6190 variants. So, add a new PTP operation which is added for the 6390 and 6290
devices.

Tested simply like this on Marvell 88E6390, revision 1:

|/ # ptp4l -2 -i lan4 --tx_timestamp_timeout=40 -m
|[...]
|ptp4l[147.450]: master offset         56 s2 freq   +1262 path delay       413
|ptp4l[148.450]: master offset         22 s2 freq   +1244 path delay       434
|ptp4l[149.450]: master offset          5 s2 freq   +1234 path delay       446
|ptp4l[150.451]: master offset          3 s2 freq   +1233 path delay       451
|ptp4l[151.451]: master offset          1 s2 freq   +1232 path delay       451
|ptp4l[152.451]: master offset         -3 s2 freq   +1229 path delay       451
|ptp4l[153.451]: master offset          9 s2 freq   +1240 path delay       451

Link: https://lore.kernel.org/r/CAFSKS=PJBpvtRJxrR4sG1hyxpnUnQpiHg4SrUNzAhkWnyt9ivg@mail.gmail.com
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agovirtio/vsock: replace virtio_vsock_pkt with sk_buff
Bobby Eshleman [Fri, 13 Jan 2023 22:21:37 +0000 (22:21 +0000)]
virtio/vsock: replace virtio_vsock_pkt with sk_buff

This commit changes virtio/vsock to use sk_buff instead of
virtio_vsock_pkt. Beyond better conforming to other net code, using
sk_buff allows vsock to use sk_buff-dependent features in the future
(such as sockmap) and improves throughput.

This patch introduces the following performance changes:

Tool: Uperf
Env: Phys Host + L1 Guest
Payload: 64k
Threads: 16
Test Runs: 10
Type: SOCK_STREAM
Before: commit b7bfaa761d760 ("Linux 6.2-rc3")

Before
------
g2h: 16.77Gb/s
h2g: 10.56Gb/s

After
-----
g2h: 21.04Gb/s
h2g: 10.76Gb/s

Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoMerge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
David S. Miller [Mon, 16 Jan 2023 13:21:11 +0000 (13:21 +0000)]
Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-01-13 (ixgbe)

This series contains updates to ixgbe driver only.

Jesse resolves warning for RCU pointer by no longer restoring old
pointer.

Sebastian adds waiting for updating of link info on devices utilizing
crosstalk fix to avoid false link state.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agounix: Improve locking scheme in unix_show_fdinfo()
Kirill Tkhai [Sat, 14 Jan 2023 09:35:02 +0000 (12:35 +0300)]
unix: Improve locking scheme in unix_show_fdinfo()

After switching to TCP_ESTABLISHED or TCP_LISTEN sk_state, alive SOCK_STREAM
and SOCK_SEQPACKET sockets can't change it anymore (since commit 3ff8bff704f4
"unix: Fix race in SOCK_SEQPACKET's unix_dgram_sendmsg()").

Thus, we do not need to take lock here.

Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoMerge branch 'virtio-net-xdp-multi-buffer'
David S. Miller [Mon, 16 Jan 2023 11:15:49 +0000 (11:15 +0000)]
Merge branch 'virtio-net-xdp-multi-buffer'

Heng Qi says:

====================
virtio-net: support multi buffer xdp

Changes since PATCH v4:
- Make netdev_warn() in [PATCH 2/10] independent from [PATCH 3/10].

Changes since PATCH v3:
- Separate fix patch [2/10] for MTU calculation of single buffer xdp.
  Note that this patch needs to be backported to the stable branch.

Changes since PATCH v2:
- Even if single buffer xdp has a hole mechanism, there will be no
  problem (limiting mtu and turning off GUEST GSO), so there is no
  need to backport "[PATCH 1/9]";
- Modify calculation of MTU for single buffer xdp in virtnet_xdp_set();
- Make truesize in mergeable mode return to literal meaning;
- Add some comments for legibility;

Changes since RFC:
- Using headroom instead of vi->xdp_enabled to avoid re-reading
  in add_recvbuf_mergeable();
- Disable GRO_HW and keep linearization for single buffer xdp;
- Renamed to virtnet_build_xdp_buff_mrg();
- pr_debug() to netdev_dbg();
- Adjusted the order of the patch series.

Currently, virtio net only supports xdp for single-buffer packets
or linearized multi-buffer packets. This patchset supports xdp for
multi-buffer packets, then larger MTU can be used if xdp sets the
xdp.frags. This does not affect single buffer handling.

In order to build multi-buffer xdp neatly, we integrated the code
into virtnet_build_xdp_buff_mrg() for xdp. The first buffer is used
for prepared xdp buff, and the rest of the buffers are added to
its skb_shared_info structure. This structure can also be
conveniently converted during XDP_PASS to get the corresponding skb.

Since virtio net uses comp pages, and bpf_xdp_frags_increase_tail()
is based on the assumption of the page pool,
(rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag))
is negative in most cases. So we didn't set xdp_rxq->frag_size in
virtnet_open() to disable the tail increase.

====================

Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agovirtio-net: support multi-buffer xdp
Heng Qi [Sat, 14 Jan 2023 08:22:29 +0000 (16:22 +0800)]
virtio-net: support multi-buffer xdp

Driver can pass the skb to stack by build_skb_from_xdp_buff().

Driver forwards multi-buffer packets using the send queue
when XDP_TX and XDP_REDIRECT, and clears the reference of multi
pages when XDP_DROP.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agovirtio-net: remove xdp related info from page_to_skb()
Heng Qi [Sat, 14 Jan 2023 08:22:28 +0000 (16:22 +0800)]
virtio-net: remove xdp related info from page_to_skb()

For the clear construction of xdp_buff, we remove the xdp processing
interleaved with page_to_skb(). Now, the logic of xdp and building
skb from xdp are separate and independent.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agovirtio-net: build skb from multi-buffer xdp
Heng Qi [Sat, 14 Jan 2023 08:22:27 +0000 (16:22 +0800)]
virtio-net: build skb from multi-buffer xdp

This converts the xdp_buff directly to a skb, including
multi-buffer and single buffer xdp. We'll isolate the
construction of skb based on xdp from page_to_skb().

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agovirtio-net: transmit the multi-buffer xdp
Heng Qi [Sat, 14 Jan 2023 08:22:26 +0000 (16:22 +0800)]
virtio-net: transmit the multi-buffer xdp

This serves as the basis for XDP_TX and XDP_REDIRECT
to send a multi-buffer xdp_frame.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agovirtio-net: construct multi-buffer xdp in mergeable
Heng Qi [Sat, 14 Jan 2023 08:22:25 +0000 (16:22 +0800)]
virtio-net: construct multi-buffer xdp in mergeable

Build multi-buffer xdp using virtnet_build_xdp_buff_mrg().

For the prefilled buffer before xdp is set, we will probably use
vq reset in the future. At the same time, virtio net currently
uses comp pages, and bpf_xdp_frags_increase_tail() needs to calculate
the tailroom of the last frag, which will involve the offset of the
corresponding page and cause a negative value, so we disable tail
increase by not setting xdp_rxq->frag_size.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agovirtio-net: build xdp_buff with multi buffers
Heng Qi [Sat, 14 Jan 2023 08:22:24 +0000 (16:22 +0800)]
virtio-net: build xdp_buff with multi buffers

Support xdp for multi buffer packets in mergeable mode.

Putting the first buffer as the linear part for xdp_buff,
and the rest of the buffers as non-linear fragments to struct
skb_shared_info in the tailroom belonging to xdp_buff.

Let 'truesize' return to its literal meaning, that is, when
xdp is set, it includes the length of headroom and tailroom.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agovirtio-net: update bytes calculation for xdp_frame
Heng Qi [Sat, 14 Jan 2023 08:22:23 +0000 (16:22 +0800)]
virtio-net: update bytes calculation for xdp_frame

Update relative record value for xdp_frame as basis
for multi-buffer xdp transmission.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agovirtio-net: set up xdp for multi buffer packets
Heng Qi [Sat, 14 Jan 2023 08:22:22 +0000 (16:22 +0800)]
virtio-net: set up xdp for multi buffer packets

When the xdp program sets xdp.frags, which means it can process
multi-buffer packets over larger MTU, so we continue to support xdp.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agovirtio-net: fix calculation of MTU for single-buffer xdp
Heng Qi [Sat, 14 Jan 2023 08:22:21 +0000 (16:22 +0800)]
virtio-net: fix calculation of MTU for single-buffer xdp

When single-buffer xdp is loaded, the size of the buffer filled each time
is 'sz = (PAGE_SIZE - headroom - tailroom)', which is the maximum packet
length that the driver allows the device to pass in. Otherwise, the packet
with a length greater than sz will come in, so num_buf will be greater than
or equal to 2, and xdp_linearize_page() will be performed and the packet
will be dropped because the total length is greater than PAGE_SIZE. So the
maximum value of MTU for single-buffer xdp is 'max_sz = sz - ETH_HLEN'.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agovirtio-net: disable the hole mechanism for xdp
Heng Qi [Sat, 14 Jan 2023 08:22:20 +0000 (16:22 +0800)]
virtio-net: disable the hole mechanism for xdp

XDP core assumes that the frame_size of xdp_buff and the length of
the frag are PAGE_SIZE. The hole may cause the processing of xdp to
fail, so we disable the hole mechanism when xdp is set.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoocteontx2-af: update CPT inbound inline IPsec config mailbox
Srujana Challa [Wed, 11 Jan 2023 12:23:41 +0000 (17:53 +0530)]
octeontx2-af: update CPT inbound inline IPsec config mailbox

Updates CPT inbound inline IPsec configure mailbox to take
CPT credit, opcode, credit_th and bpid from VF.
This patch also adds a mailbox to read inbound IPsec
configuration.

Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoMerge branch 'mlxbf_gige-add-bluefield-3-support'
Jakub Kicinski [Sat, 14 Jan 2023 05:59:11 +0000 (21:59 -0800)]
Merge branch 'mlxbf_gige-add-bluefield-3-support'

David Thompson says:

====================
mlxbf_gige: add BlueField-3 support

This patch series adds driver logic to the "mlxbf_gige"
Ethernet driver in order to support the third generation
BlueField SoC (BF3).  The existing "mlxbf_gige" driver is
extended with BF3-specific logic and run-time decisions
are made by the driver depending on the SoC generation
(BF2 vs. BF3).

The BF3 SoC is similar to BF2 SoC with regards to transmit
and receive packet processing:
       * Driver rings usage; consumer & producer indices
       * Single queue for receive and transmit
       * DMA operation

The differences between BF3 and BF2 SoC are:
       * In addition to supporting 1Gbps interface speed, the BF3 SoC
         adds support for 10Mbps and 100Mbps interface speeds
       * BF3 requires SerDes config logic to support its SGMII interface
       * BF3 adds support for "ethtool -s" for interface speed config
       * BF3 utilizes different MDIO logic for accessing the
         board-level PHY device

Testing
  - Successful build of kernel for ARM64, ARM32, X86_64
  - Tested ARM64 build on FastModels, Palladium, SoC
====================

Link: https://lore.kernel.org/r/20230112202609.21331-1-davthompson@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agomlxbf_gige: fix white space in mlxbf_gige_eth_ioctl
David Thompson [Thu, 12 Jan 2023 20:26:09 +0000 (15:26 -0500)]
mlxbf_gige: fix white space in mlxbf_gige_eth_ioctl

This patch fixes the white space issue raised by checkpatch:
CHECK: Alignment should match open parenthesis
+static int mlxbf_gige_eth_ioctl(struct net_device *netdev,
+                              struct ifreq *ifr, int cmd)

Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agomlxbf_gige: add "set_link_ksettings" ethtool callback
David Thompson [Thu, 12 Jan 2023 20:26:08 +0000 (15:26 -0500)]
mlxbf_gige: add "set_link_ksettings" ethtool callback

This patch extends the "ethtool_ops" data structure to
include the "set_link_ksettings" callback. This change
enables configuration of the various interface speeds
that the BlueField-3 supports (10Mbps, 100Mbps, and 1Gbps).

Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agomlxbf_gige: support 10M/100M/1G speeds on BlueField-3
David Thompson [Thu, 12 Jan 2023 20:26:07 +0000 (15:26 -0500)]
mlxbf_gige: support 10M/100M/1G speeds on BlueField-3

The BlueField-3 OOB interface supports 10Mbps, 100Mbps, and 1Gbps speeds.
The external PHY is responsible for autonegotiating the speed with the
link partner. Once the autonegotiation is done, the BlueField PLU needs
to be configured accordingly.

This patch does two things:
1) Initialize the advertised control flow/duplex/speed in the probe
   based on the BlueField SoC generation (2 or 3)
2) Adjust the PLU speed config in the PHY interrupt handler

Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agomlxbf_gige: add MDIO support for BlueField-3
David Thompson [Thu, 12 Jan 2023 20:26:06 +0000 (15:26 -0500)]
mlxbf_gige: add MDIO support for BlueField-3

This patch adds initial MDIO support for the BlueField-3
SoC. Separate header files for the BlueField-2 and the
BlueField-3 SoCs have been created.  These header files
hold the SoC-specific MDIO macros since the register
offsets and bit fields have changed.  Also, in BlueField-3
there is a separate register for writing and reading the
MDIO data.  Finally, instead of having "if" statements
everywhere to differentiate between SoC-specific logic,
a mlxbf_gige_mdio_gw_t struct was created for this purpose.

Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: pcs: pcs-lynx: use phylink_get_link_timer_ns() helper
Russell King (Oracle) [Thu, 12 Jan 2023 14:37:06 +0000 (14:37 +0000)]
net: pcs: pcs-lynx: use phylink_get_link_timer_ns() helper

Use the phylink_get_link_timer_ns() helper to get the period for the
link timer.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1pFyhW-0067jq-Fh@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoplca.c: fix obvious mistake in checking retval
Piergiorgio Beruto [Fri, 13 Jan 2023 13:26:35 +0000 (14:26 +0100)]
plca.c: fix obvious mistake in checking retval

Revert a wrong fix that was done during the review process. The
intention was to substitute "if(ret < 0)" with "if(ret)".
Unfortunately, the intended fix did not meet the code.
Besides, after additional review, it was decided that "if(ret < 0)"
was actually the right thing to do.

Fixes: 8580e16c28f3 ("net/ethtool: add netlink interface for the PLCA RS")
Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/f2277af8951a51cfee2fb905af8d7a812b7beaf4.1673616357.git.piergiorgio.beruto@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge branch 'net-mdio-continue-separating-c22-and-c45'
Jakub Kicinski [Sat, 14 Jan 2023 05:40:56 +0000 (21:40 -0800)]
Merge branch 'net-mdio-continue-separating-c22-and-c45'

Michael Walle says:

====================
net: mdio: Continue separating C22 and C45

I've picked this older series from Andrew up and rebased it onto
the latest net-next.

This is the second patch set in the series which separates the C22
and C45 MDIO bus transactions at the API level to the MDIO bus drivers.
====================

Link: https://lore.kernel.org/r/20230112-net-next-c45-seperation-part-2-v1-0-5eeaae931526@walle.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoenetc: Separate C22 and C45 transactions
Andrew Lunn [Thu, 12 Jan 2023 15:15:16 +0000 (16:15 +0100)]
enetc: Separate C22 and C45 transactions

The enetc MDIO bus driver can perform both C22 and C45 transfers.
Create separate functions for each and register the C45 versions using
the new API calls where appropriate.

This driver is shared with the Felix DSA switch, so update that at the
same time.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: stmmac: Separate C22 and C45 transactions for xgmac
Andrew Lunn [Thu, 12 Jan 2023 15:15:15 +0000 (16:15 +0100)]
net: stmmac: Separate C22 and C45 transactions for xgmac

The stmmac MDIO bus driver in variant gmac4 can perform both C22 and
C45 transfers. Create separate functions for each and register the
C45 versions using the new API calls where appropriate.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: stmmac: Separate C22 and C45 transactions for xgmac2
Andrew Lunn [Thu, 12 Jan 2023 15:15:14 +0000 (16:15 +0100)]
net: stmmac: Separate C22 and C45 transactions for xgmac2

The stmicro stmmac xgmac2 MDIO bus driver can perform both C22 and C45
transfers. Create separate functions for each and register the C45
versions using the new API calls where appropriate.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: lan743x: Separate C22 and C45 transactions
Andrew Lunn [Thu, 12 Jan 2023 15:15:13 +0000 (16:15 +0100)]
net: lan743x: Separate C22 and C45 transactions

The microchip lan743x MDIO bus driver can perform both C22 and C45
transfers in some variants. Create separate functions for each and
register the C45 versions using the new API calls where appropriate.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: ethernet: mtk_eth_soc: Separate C22 and C45 transactions
Andrew Lunn [Thu, 12 Jan 2023 15:15:12 +0000 (16:15 +0100)]
net: ethernet: mtk_eth_soc: Separate C22 and C45 transactions

The mediatek bus driver can perform both C22 and C45 transfers.
Create separate functions for each and register the C45 versions using
the new API calls.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: mdio: ipq4019: Separate C22 and C45 transactions
Andrew Lunn [Thu, 12 Jan 2023 15:15:11 +0000 (16:15 +0100)]
net: mdio: ipq4019: Separate C22 and C45 transactions

The ipq4019 driver can perform both C22 and C45 transfers.  Create
separate functions for each and register the C45 versions using the
new driver API calls.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: mdio: aspeed: Separate C22 and C45 transactions
Andrew Lunn [Thu, 12 Jan 2023 15:15:10 +0000 (16:15 +0100)]
net: mdio: aspeed: Separate C22 and C45 transactions

The aspeed MDIO bus driver can perform both C22 and C45 transfers.
Modify the existing C45 functions to take the devad as a parameter,
and remove the wrappers so there are individual C22 and C45 functions. Add
the C45 functions to the new API calls.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: mdio: mux-bcm-iproc: Separate C22 and C45 transactions
Andrew Lunn [Thu, 12 Jan 2023 15:15:09 +0000 (16:15 +0100)]
net: mdio: mux-bcm-iproc: Separate C22 and C45 transactions

The MDIO mux broadcom iproc can perform both C22 and C45 transfers.
Create separate functions for each and register the C45 versions using
the new API calls.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: mdio: i2c: Separate C22 and C45 transactions
Andrew Lunn [Thu, 12 Jan 2023 15:15:08 +0000 (16:15 +0100)]
net: mdio: i2c: Separate C22 and C45 transactions

The MDIO over I2C bus driver can perform both C22 and C45 transfers.
Create separate functions for each and register the C45 versions using
the new API calls.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: mdio: cavium: Separate C22 and C45 transactions
Andrew Lunn [Thu, 12 Jan 2023 15:15:07 +0000 (16:15 +0100)]
net: mdio: cavium: Separate C22 and C45 transactions

The cavium IP can perform both C22 and C45 transfers.  Create separate
functions for each and register the C45 versions in both the octeon
and thunder bus driver.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonfp: add DCB IEEE support
Bin Chen [Thu, 12 Jan 2023 12:11:02 +0000 (13:11 +0100)]
nfp: add DCB IEEE support

Add basic DCB IEEE support. This includes support for ETS, max-rate,
and DSCP to user priority mapping.

DCB may be configured using iproute2's dcb command.
Example usage:
  dcb ets set dev $dev tc-tsa 0:ets 1:ets 2:ets 3:ets 4:ets 5:ets \
    6:ets 7:ets tc-bw 0:0 1:80 2:0 3:0 4:0 5:0 6:20 7:0
  dcb maxrate set dev $dev tc-maxrate 1:1000bit

And DCB configuration can be shown using:
  dcb ets show dev $dev
  dcb maxrate show dev $dev

Signed-off-by: Bin Chen <bin.chen@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230112121102.469739-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: ethernet: mtk_wed: get rid of queue lock for tx queue
Lorenzo Bianconi [Thu, 12 Jan 2023 09:21:29 +0000 (10:21 +0100)]
net: ethernet: mtk_wed: get rid of queue lock for tx queue

Similar to MTK Wireless Ethernet Dispatcher (WED) MCU rx queue,
we do not need to protect WED MCU tx queue with a spin lock since
the tx queue is accessed in the two following routines:
- mtk_wed_wo_queue_tx_skb():
  it is run at initialization and during mt7915 normal operation.
  Moreover MCU messages are serialized through MCU mutex.
- mtk_wed_wo_queue_tx_clean():
  it runs just at mt7915 driver module unload when no more messages
  are sent to the MCU.

Remove tx queue spinlock.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/7bd0337b2a13ab1a63673b7c03fd35206b3b284e.1673515140.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoipv6: remove max_size check inline with ipv4
Jon Maxwell [Thu, 12 Jan 2023 01:25:32 +0000 (12:25 +1100)]
ipv6: remove max_size check inline with ipv4

In ip6_dst_gc() replace:

  if (entries > gc_thresh)

With:

  if (entries > ops->gc_thresh)

Sending Ipv6 packets in a loop via a raw socket triggers an issue where a
route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly
consumes the Ipv6 max_size threshold which defaults to 4096 resulting in
these warnings:

[1]   99.187805] dst_alloc: 7728 callbacks suppressed
[2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
.
.
[300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.

When this happens the packet is dropped and sendto() gets a network is
unreachable error:

remaining pkt 200557 errno 101
remaining pkt 196462 errno 101
.
.
remaining pkt 126821 errno 101

Implement David Aherns suggestion to remove max_size check seeing that Ipv6
has a GC to manage memory usage. Ipv4 already does not check max_size.

Here are some memory comparisons for Ipv4 vs Ipv6 with the patch:

Test by running 5 instances of a program that sends UDP packets to a raw
socket 5000000 times. Compare Ipv4 and Ipv6 performance with a similar
program.

Ipv4:

Before test:

MemFree:        29427108 kB
Slab:             237612 kB

ip6_dst_cache       1912   2528    256   32    2 : tunables    0    0    0
xfrm_dst_cache         0      0    320   25    2 : tunables    0    0    0
ip_dst_cache        2881   3990    192   42    2 : tunables    0    0    0

During test:

MemFree:        29417608 kB
Slab:             247712 kB

ip6_dst_cache       1912   2528    256   32    2 : tunables    0    0    0
xfrm_dst_cache         0      0    320   25    2 : tunables    0    0    0
ip_dst_cache       44394  44394    192   42    2 : tunables    0    0    0

After test:

MemFree:        29422308 kB
Slab:             238104 kB

ip6_dst_cache       1912   2528    256   32    2 : tunables    0    0    0
xfrm_dst_cache         0      0    320   25    2 : tunables    0    0    0
ip_dst_cache        3048   4116    192   42    2 : tunables    0    0    0

Ipv6 with patch:

Errno 101 errors are not observed anymore with the patch.

Before test:

MemFree:        29422308 kB
Slab:             238104 kB

ip6_dst_cache       1912   2528    256   32    2 : tunables    0    0    0
xfrm_dst_cache         0      0    320   25    2 : tunables    0    0    0
ip_dst_cache        3048   4116    192   42    2 : tunables    0    0    0

During Test:

MemFree:        29431516 kB
Slab:             240940 kB

ip6_dst_cache      11980  12064    256   32    2 : tunables    0    0    0
xfrm_dst_cache         0      0    320   25    2 : tunables    0    0    0
ip_dst_cache        3048   4116    192   42    2 : tunables    0    0    0

After Test:

MemFree:        29441816 kB
Slab:             238132 kB

ip6_dst_cache       1902   2432    256   32    2 : tunables    0    0    0
xfrm_dst_cache         0      0    320   25    2 : tunables    0    0    0
ip_dst_cache        3048   4116    192   42    2 : tunables    0    0    0

Tested-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230112012532.311021-1-jmaxwell37@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agocaif: don't assume iov_iter type
Keith Busch [Wed, 11 Jan 2023 18:42:45 +0000 (10:42 -0800)]
caif: don't assume iov_iter type

The details of the iov_iter types are appropriately abstracted, so
there's no need to check for specific type fields. Just let the
abstractions handle it.

This is preparing for io_uring/net's io_send to utilize the more
efficient ITER_UBUF.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20230111184245.3784393-1-kbusch@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agodt-bindings: net: rockchip-dwmac: fix rv1126 compatible warning
Anand Moon [Wed, 11 Jan 2023 17:24:31 +0000 (17:24 +0000)]
dt-bindings: net: rockchip-dwmac: fix rv1126 compatible warning

Fix compatible string for RV1126 gmac, and constrain it to
be compatible with Synopsys dwmac 4.20a.

fix below warning
$ make CHECK_DTBS=y rv1126-edgeble-neu2-io.dtb
arch/arm/boot/dts/rv1126-edgeble-neu2-io.dtb: ethernet@ffc40000:
 compatible: 'oneOf' conditional failed, one must be fixed:
        ['rockchip,rv1126-gmac', 'snps,dwmac-4.20a'] is too long
        'rockchip,rv1126-gmac' is not one of ['rockchip,rk3568-gmac', 'rockchip,rk3588-gmac']

Fixes: b36fe2f43662 ("dt-bindings: net: rockchip-dwmac: add rv1126 compatible")
Reviewed-by: Jagan Teki <jagan@edgeble.ai>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Anand Moon <anand@edgeble.ai>
Link: https://lore.kernel.org/r/20230111172437.5295-1-anand@edgeble.ai
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoixgbe: Filter out spurious link up indication
Sebastian Czapla [Mon, 12 Dec 2022 09:59:38 +0000 (10:59 +0100)]
ixgbe: Filter out spurious link up indication

Add delayed link state recheck to filter false link up indication
caused by transceiver with no fiber cable attached.

Signed-off-by: Sebastian Czapla <sebastianx.czapla@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
22 months agoixgbe: XDP: fix checker warning from rcu pointer
Jesse Brandeburg [Tue, 22 Nov 2022 23:48:25 +0000 (15:48 -0800)]
ixgbe: XDP: fix checker warning from rcu pointer

The ixgbe driver uses an older style failure mode when initializing the
XDP program and the queues. It causes some warnings when running C=2
checking builds (and it's the last one in the ethernet/intel tree).

$ make W=1 C=2 M=`pwd`/drivers/net/ethernet/intel modules
.../ixgbe_main.c:10301:25: error: incompatible types in comparison expression (different address spaces):
.../ixgbe_main.c:10301:25:    struct bpf_prog [noderef] __rcu *
.../ixgbe_main.c:10301:25:    struct bpf_prog *

Fix the problem by removing the line that tried to re-xchg "the old_prog
pointer" if there was an error, to make this driver act like the other
drivers which return the error code without "pointer restoration."

Also, update the "copy the pointer" logic to use WRITE_ONCE as many/all
the other drivers do, which required making a change in two separate
functions that write the xdp_prog variable in the ring.

The code here was modeled after the code in i40e/i40e_xdp_setup().

NOTE: Compile-tested only.

CC: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
CC: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
22 months agosock: add tracepoint for send recv length
Yunhui Cui [Wed, 11 Jan 2023 06:59:30 +0000 (14:59 +0800)]
sock: add tracepoint for send recv length

Add 2 tracepoints to monitor the tcp/udp traffic
of per process and per cgroup.

Regarding monitoring the tcp/udp traffic of each process, there are two
existing solutions, the first one is https://www.atoptool.nl/netatop.php.
The second is via kprobe/kretprobe.

Netatop solution is implemented by registering the hook function at the
hook point provided by the netfilter framework.

These hook functions may be in the soft interrupt context and cannot
directly obtain the pid. Some data structures are added to bind packets
and processes. For example, struct taskinfobucket, struct taskinfo ...

Every time the process sends and receives packets it needs multiple
hashmaps,resulting in low performance and it has the problem fo inaccurate
tcp/udp traffic statistics(for example: multiple threads share sockets).

We can obtain the information with kretprobe, but as we know, kprobe gets
the result by trappig in an exception, which loses performance compared
to tracepoint.

We compared the performance of tracepoints with the above two methods, and
the results are as follows:

ab -n 1000000 -c 1000 -r http://127.0.0.1/index.html
without trace:
Time per request: 39.660 [ms] (mean)
Time per request: 0.040 [ms] (mean, across all concurrent requests)

netatop:
Time per request: 50.717 [ms] (mean)
Time per request: 0.051 [ms] (mean, across all concurrent requests)

kr:
Time per request: 43.168 [ms] (mean)
Time per request: 0.043 [ms] (mean, across all concurrent requests)

tracepoint:
Time per request: 41.004 [ms] (mean)
Time per request: 0.041 [ms] (mean, across all concurrent requests

It can be seen that tracepoint has better performance.

Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
Signed-off-by: Xiongchun Duan <duanxiongchun@bytedance.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoMerge branch 'rmnet-tx-pkt-aggregation'
David S. Miller [Fri, 13 Jan 2023 10:23:52 +0000 (10:23 +0000)]
Merge branch 'rmnet-tx-pkt-aggregation'

Daniele Palmas says:

====================
net: add tx packets aggregation to ethtool and rmnet

Hello maintainers and all,

this patchset implements tx qmap packets aggregation in rmnet and generic
ethtool support for that.

Some low-cat Thread-x based modems are not capable of properly reaching the maximum
allowed throughput both in tx and rx during a bidirectional test if tx packets
aggregation is not enabled.

I verified this problem with rmnet + qmi_wwan by using a MDM9207 Cat. 4 based modem
(50Mbps/150Mbps max throughput). What is actually happening is pictured at
https://drive.google.com/file/d/1gSbozrtd9h0X63i6vdkNpN68d-9sg8f9/view

Testing with iperf TCP, when rx and tx flows are tested singularly there's no issue
in tx and minor issues in rx (not able to reach max throughput). When there are concurrent
tx and rx flows, tx throughput has an huge drop. rx a minor one, but still present.

The same scenario with tx aggregation enabled is pictured at
https://drive.google.com/file/d/1jcVIKNZD7K3lHtwKE5W02mpaloudYYih/view
showing a regular graph.

This issue does not happen with high-cat modems (e.g. SDX20), or at least it
does not happen at the throughputs I'm able to test currently: maybe the same
could happen when moving close to the maximum rates supported by those modems.
Anyway, having the tx aggregation enabled should not hurt.

The first attempt to solve this issue was in qmi_wwan qmap implementation,
see the discussion at https://lore.kernel.org/netdev/20221019132503.6783-1-dnlplm@gmail.com/

However, it turned out that rmnet was a better candidate for the implementation.

Moreover, Greg and Jakub suggested also to use ethtool for the configuration:
not sure if I got their advice right, but this patchset add also generic ethtool
support for tx aggregation.

The patches have been tested mainly against an MDM9207 based modem through USB
and SDX55 through PCI (MHI).

v2 should address the comments highlighted in the review: the implementation is
still in rmnet, due to Subash's request of keeping tx aggregation there.

v3 fixes ethtool-netlink.rst content out of table bounds and a W=1 build warning
for patch 2.

v4 solves a race related to egress_agg_params.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: qualcomm: rmnet: add ethtool support for configuring tx aggregation
Daniele Palmas [Wed, 11 Jan 2023 13:05:20 +0000 (14:05 +0100)]
net: qualcomm: rmnet: add ethtool support for configuring tx aggregation

Add support for ETHTOOL_COALESCE_TX_AGGR for configuring the tx
aggregation settings.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: qualcomm: rmnet: add tx packets aggregation
Daniele Palmas [Wed, 11 Jan 2023 13:05:19 +0000 (14:05 +0100)]
net: qualcomm: rmnet: add tx packets aggregation

Add tx packets aggregation.

Bidirectional TCP throughput tests through iperf with low-cat
Thread-x based modems revelead performance issues both in tx
and rx.

The Windows driver does not show this issue: inspecting USB
packets revealed that the only notable change is the driver
enabling tx packets aggregation.

Tx packets aggregation is by default disabled and can be enabled
by increasing the value of ETHTOOL_A_COALESCE_TX_MAX_AGGR_FRAMES.

The maximum aggregated size is by default set to a reasonably low
value in order to support the majority of modems.

This implementation is based on patches available in Code Aurora
repositories (msm kernel) whose main authors are

Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Sean Tranchetti <stranche@codeaurora.org>

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Reviewed-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoethtool: add tx aggregation parameters
Daniele Palmas [Wed, 11 Jan 2023 13:05:18 +0000 (14:05 +0100)]
ethtool: add tx aggregation parameters

Add the following ethtool tx aggregation parameters:

ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES
Maximum size in bytes of a tx aggregated block of frames.

ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES
Maximum number of frames that can be aggregated into a block.

ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS
Time in usecs after the first packet arrival in an aggregated
block for the block to be sent.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoMerge branch 'dsa-microchip-ptp'
David S. Miller [Fri, 13 Jan 2023 08:40:41 +0000 (08:40 +0000)]
Merge branch 'dsa-microchip-ptp'

Arun Ramadoss says:

====================
net: dsa: microchip: add PTP support for KSZ9563/KSZ8563 and LAN937x

KSZ9563/KSZ8563 and  LAN937x switch are capable for supporting IEEE 1588 PTP
protocol.  LAN937x has the same PTP register set similar to KSZ9563, hence the
implementation has been made common for the KSZ switches.  KSZ9563 does not
support two step timestamping but LAN937x supports both.  Tested the 1step &
2step p2p timestamping in LAN937x and p2p1step timestamping in KSZ9563.

This patch series is based on the Christian Eggers PTP support for KSZ9563.
Applied the Christian patch and updated as per the latest refactoring of KSZ
series code. The features added on top are PTP packet Interrupt
implementation based on nested handler, LAN937x two step timestamping and
programmable per_out pins.

Link: https://www.spinics.net/lists/netdev/msg705531.html
Patch v7 -> v8
- set skb->ip_summed = CHECKSUM_NONE after updating the checksum

Patch v6 -> v7
- Corrected the misplaced spaces and tabs
- Added mutex lock in do_aux_work
- Replaced 0/1 with false/true for ts_en
- SKB_TX_INPROGRESS flag is set before dsa_enqueue_skb
- Removed the fallthrough keyword
- pdelay_resp header correction is performed based on
  KSZ_SKB_CB(skb)->update_correction instead of clone

Patch v5 -> v6
- Rebased to latest net-next and renamed from RFC to patch net-next.

Patch v4 -> v5
- Replaced irq_domain_add_simple with irq_doamin_add_linear
- Used the helper diff_by_scaled_ppm() for adjfine.

Patch v3 -> v4
- removed IRQF_TRIGGER_FALLING from the request_threaded_irq of ptp msg
- addressed review comments on patch 10 periodic output
- added sign off in patch 6 & 9
- reverted to set PTP_1STEP bit for lan937x which is missed during v3 regression

Patch v2-> v3
- used port_rxtstamp for reconstructing the absolute timestamp instead of
tagger function pointer.
- Reverted to setting of 802.1As bit.

Patch v1 -> v2
- GPIO perout enable bit is different for LAN937x and KSZ9x. Added new patch
for configuring LAN937x programmable pins.
- PTP enabled in hardware based on both tx and rx timestamping of all the user
ports.
- Replaced setting of 802.1AS bit with P2P bit in PTP_MSG_CONF1 register.

RFC v2 -> Patch v1
- Changed the patch author based on past patch submission
- Changed the commit message prefix as net: dsa: microchip: ptp
Individual patch changes are listed in correspondig commits.

RFC v1 -> v2
- Added the p2p1step timestamping and conditional execution of 2 step for
  LAN937x only.
- Added the periodic output support
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: dsa: microchip: ptp: lan937x: Enable periodic output in LED pins
Arun Ramadoss [Tue, 10 Jan 2023 08:49:30 +0000 (14:19 +0530)]
net: dsa: microchip: ptp: lan937x: Enable periodic output in LED pins

There is difference in implementation of per_out pins between KSZ9563
and LAN937x. In KSZ9563, Timestamping control register (0x052C) bit 6,
if 1 - timestamp input and 0 - trigger output. But it is opposite for
LAN937x 1 - trigger output and 0 - timestamp input.
As per per_out gpio pins, KSZ9563 has four Led pins and two dedicated
gpio pins. But in LAN937x dedicated gpio pins are removed instead there
are up to 10 LED pins out of which LED_0 and LED_1 can be mapped to PTP
tou 0, 1 or 2. This patch sets the bit 6 in 0x052C register and
configure the LED override and source register for LAN937x series of
switches alone.

Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: dsa: microchip: ptp: lan937x: add 2 step timestamping
Arun Ramadoss [Tue, 10 Jan 2023 08:49:29 +0000 (14:19 +0530)]
net: dsa: microchip: ptp: lan937x: add 2 step timestamping

LAN937x series of switches support 2 step timestamping mechanism. There
are timestamp correction calculation performed in ksz_rcv_timestamp and
ksz_xmit_timestamp which are applicable only for p2p1step. To check
whether the 2 step is enabled or not in tag_ksz.c introduced the helper
function in taggger_data to query it from ksz_ptp.c. Based on whether 2
step is enabled or not, timestamp calculation are performed.

Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: dsa: microchip: ptp: add support for perout programmable pins
Arun Ramadoss [Tue, 10 Jan 2023 08:49:28 +0000 (14:19 +0530)]
net: dsa: microchip: ptp: add support for perout programmable pins

There are two programmable pins available for Trigger output unit to
generate periodic pulses. This patch add verify_pin for the available 2
pins and configure it with respect to GPIO index for the TOU unit.

Tested using testptp
./testptp -i 0 -L 0,2
./testptp -i 0 -d /dev/ptp0 -p 1000000000
./testptp -i 1 -L 1,2
./testptp -i 1 -d /dev/ptp0 -p 100000000

Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: dsa: microchip: ptp: add periodic output signal
Christian Eggers [Tue, 10 Jan 2023 08:49:27 +0000 (14:19 +0530)]
net: dsa: microchip: ptp: add periodic output signal

LAN937x and KSZ PTP supported switches has Three Trigger output unit.
This TOU can used to generate the periodic signal for PTP. TOU has the
cycle width register of 32 bit in size and period width register of 24
bit, each value is of 8ns so the pulse width can be maximum 125ms.

Tested using ./testptp -d /dev/ptp0 -p 1000000000 -w 100000000 for
generating the 10ms pulse width

Signed-off-by: Christian Eggers <ceggers@arri.de>
Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: dsa: microchip: ptp: move pdelay_rsp correction field to tail tag
Christian Eggers [Tue, 10 Jan 2023 08:49:26 +0000 (14:19 +0530)]
net: dsa: microchip: ptp: move pdelay_rsp correction field to tail tag

For PDelay_Resp messages we will likely have a negative value in the
correction field. The switch hardware cannot correctly update such
values (produces an off by one error in the UDP checksum), so it must be
moved to the time stamp field in the tail tag. Format of the correction
field is 48 bit ns + 16 bit fractional ns.  After updating the
correction field, clone is no longer required hence it is freed.

Signed-off-by: Christian Eggers <ceggers@arri.de>
Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: dsa: microchip: ptp: add packet transmission timestamping
Christian Eggers [Tue, 10 Jan 2023 08:49:25 +0000 (14:19 +0530)]
net: dsa: microchip: ptp: add packet transmission timestamping

This patch adds the routines for transmission of ptp packets. When the
ptp pdelay_req packet to be transmitted, it uses the deferred xmit
worker to schedule the packets.
During irq_setup, interrupt for Sync, Pdelay_req and Pdelay_rsp are
enabled. So interrupt is triggered for all three packets. But for
p2p1step, we require only time stamp of Pdelay_req packet. Hence to
avoid posting of the completion from ISR routine for Sync and
Pdelay_resp packets, ts_en flag is introduced. This controls which
packets need to processed for timestamp.
After the packet is transmitted, ISR is triggered. The time at which
packet transmitted is recorded to separate register.
This value is reconstructed to absolute time and posted to the user
application through socket error queue.

Signed-off-by: Christian Eggers <ceggers@arri.de>
Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: dsa: microchip: ptp: add packet reception timestamping
Christian Eggers [Tue, 10 Jan 2023 08:49:24 +0000 (14:19 +0530)]
net: dsa: microchip: ptp: add packet reception timestamping

Rx Timestamping is done through 4 additional bytes in tail tag.
Whenever the ptp packet is received, the 4 byte hardware time stamped
value is added before 1 byte tail tag. Also, bit 7 in tail tag indicates
it as PTP frame. This 4 byte value is extracted from the tail tag and
reconstructed to absolute time and assigned to skb hwtstamp.
If the packet received in PDelay_Resp, then partial ingress timestamp
is subtracted from the correction field. Since user space tools expects
to be done in hardware.

Signed-off-by: Christian Eggers <ceggers@arri.de>
Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: ptp: add helper for one-step P2P clocks
Christian Eggers [Tue, 10 Jan 2023 08:49:23 +0000 (14:19 +0530)]
net: ptp: add helper for one-step P2P clocks

For P2P delay measurement, the ingress time stamp of the PDelay_Req is
required for the correction field of the PDelay_Resp. The application
echoes back the correction field of the PDelay_Req when sending the
PDelay_Resp.

Some hardware (like the ZHAW InES PTP time stamping IP core) subtracts
the ingress timestamp autonomously from the correction field, so that
the hardware only needs to add the egress timestamp on tx. Other
hardware (like the Microchip KSZ9563) reports the ingress time stamp via
an interrupt and requires that the software provides this time stamp via
tail-tag on tx.

In order to avoid introducing a further application interface for this,
the driver can simply emulate the behavior of the InES device and
subtract the ingress time stamp in software from the correction field.

On egress, the correction field can either be kept as it is (and the
time stamp field in the tail-tag is set to zero) or move the value from
the correction field back to the tail-tag.

Changing the correction field requires updating the UDP checksum (if UDP
is used as transport).

Signed-off-by: Christian Eggers <ceggers@arri.de>
Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: dsa: microchip: ptp: enable interrupt for timestamping
Arun Ramadoss [Tue, 10 Jan 2023 08:49:22 +0000 (14:19 +0530)]
net: dsa: microchip: ptp: enable interrupt for timestamping

PTP Interrupt mask and status register differ from the global and port
interrupt mechanism by two methods. One is that for global/port
interrupt enabling we have to clear the bit but for ptp interrupt we
have to set the bit. And other is bit12:0 is reserved in ptp interrupt
registers. This forced to not use the generic implementation of
global/port interrupt method routine. This patch implement the ptp
interrupt mechanism to read the timestamp register for sync, pdelay_req
and pdelay_resp.

Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: dsa: microchip: ptp: manipulating absolute time using ptp hw clock
Christian Eggers [Tue, 10 Jan 2023 08:49:21 +0000 (14:19 +0530)]
net: dsa: microchip: ptp: manipulating absolute time using ptp hw clock

This patch is used for reconstructing the absolute time from the 32bit
hardware time stamping value. The do_aux ioctl is used for reading the
ptp hardware clock and store it to global variable.
The timestamped value in tail tag during rx and register during tx are
32 bit value (2 bit seconds and 30 bit nanoseconds). The time taken to
read entire ptp clock will be time consuming. In order to speed up, the
software clock is maintained. This clock time will be added to 32 bit
timestamp to get the absolute time stamp.

Signed-off-by: Christian Eggers <ceggers@arri.de>
Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: dsa: microchip: ptp: add 4 bytes in tail tag when ptp enabled
Arun Ramadoss [Tue, 10 Jan 2023 08:49:20 +0000 (14:19 +0530)]
net: dsa: microchip: ptp: add 4 bytes in tail tag when ptp enabled

When the PTP is enabled in hardware bit 6 of PTP_MSG_CONF1 register, the
transmit frame needs additional 4 bytes before the tail tag. It is
needed for all the transmission packets irrespective of PTP packets or
not.
The 4-byte timestamp field is 0 for frames other than Pdelay_Resp. For
the one-step Pdelay_Resp, the switch needs the receive timestamp of the
Pdelay_Req message so that it can put the turnaround time in the
correction field.
Since PTP has to be enabled for both Transmission and reception
timestamping, driver needs to track of the tx and rx setting of the all
the user ports in the switch.
Two flags hw_tx_en and hw_rx_en are added in ksz_port to track the
timestampping setting of each port. When any one of ports has tx or rx
timestampping enabled, bit 6 of PTP_MSG_CONF1 is set and it is indicated
to tag_ksz.c through tagger bytes. This flag adds 4 additional bytes to
the tail tag.  When tx and rx timestamping of all the ports are disabled,
then 4 bytes are not added.

Tested using hwstamp -i <interface>

Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com> # mostly api
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: dsa: microchip: ptp: Initial hardware time stamping support
Christian Eggers [Tue, 10 Jan 2023 08:49:19 +0000 (14:19 +0530)]
net: dsa: microchip: ptp: Initial hardware time stamping support

This patch adds the routine for get_ts_info, hwstamp_get, set. This enables
the PTP support towards userspace applications such as linuxptp.

Signed-off-by: Christian Eggers <ceggers@arri.de>
Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: dsa: microchip: ptp: add the posix clock support
Christian Eggers [Tue, 10 Jan 2023 08:49:18 +0000 (14:19 +0530)]
net: dsa: microchip: ptp: add the posix clock support

This patch implement routines (adjfine, adjtime, gettime and settime)
for manipulating the chip's PTP clock. It registers the ptp caps
to posix clock register.

Signed-off-by: Christian Eggers <ceggers@arri.de>
Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com> # mostly api
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoMerge branch 'add-support-to-offload-macsec-using-netlink-update'
Jakub Kicinski [Fri, 13 Jan 2023 05:43:37 +0000 (21:43 -0800)]
Merge branch 'add-support-to-offload-macsec-using-netlink-update'

Emeel Hakim says:

====================
Add support to offload macsec using netlink update

This series adds support for offloading macsec as part of the netlink
update routine, command example:

  $ ip link set link eth2 macsec0 type macsec offload mac

The above is done using the IFLA_MACSEC_OFFLOAD attribute hence
the second patch of dumping this attribute as part of the macsec
dump.
====================

Link: https://lore.kernel.org/r/20230111150210.8246-1-ehakim@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agomacsec: dump IFLA_MACSEC_OFFLOAD attribute as part of macsec dump
Emeel Hakim [Wed, 11 Jan 2023 15:02:10 +0000 (17:02 +0200)]
macsec: dump IFLA_MACSEC_OFFLOAD attribute as part of macsec dump

Support dumping offload netlink attribute in macsec's device
attributes dump.
Change macsec_get_size to consider the offload attribute in
the calculations of the required room for dumping the device
netlink attributes.

Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agomacsec: add support for IFLA_MACSEC_OFFLOAD in macsec_changelink
Emeel Hakim [Wed, 11 Jan 2023 15:02:09 +0000 (17:02 +0200)]
macsec: add support for IFLA_MACSEC_OFFLOAD in macsec_changelink

Add support for changing Macsec offload selection through the
netlink layer by implementing the relevant changes in
macsec_changelink.

Since the handling in macsec_changelink is similar to macsec_upd_offload,
update macsec_upd_offload to use a common helper function to avoid
duplication.

Example for setting offload for a macsec device:
    ip link set macsec0 type macsec offload mac

Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: remove redundant config PCI dependency for some network driver configs
Lukas Bulwahn [Wed, 11 Jan 2023 12:58:55 +0000 (13:58 +0100)]
net: remove redundant config PCI dependency for some network driver configs

While reviewing dependencies in some Kconfig files, I noticed the redundant
dependency "depends on PCI && PCI_MSI". The config PCI_MSI has always,
since its introduction, been dependent on the config PCI. So, it is
sufficient to just depend on PCI_MSI, and know that the dependency on PCI
is implicitly implied.

Reduce the dependencies of some network driver configs.
No functional change and effective change of Kconfig dependendencies.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Dimitris Michailidis <dmichail@fungible.com>
Link: https://lore.kernel.org/r/20230111125855.19020-1-lukas.bulwahn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: ngbe: Add ngbe mdio bus driver.
Mengyuan Lou [Wed, 11 Jan 2023 11:17:18 +0000 (19:17 +0800)]
net: ngbe: Add ngbe mdio bus driver.

Add mdio bus register for ngbe.
The internal phy and external phy need to be handled separately.
Add phy changed event detection.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230111111718.40745-1-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge branch 'net-thunderbolt-add-tracepoints'
Jakub Kicinski [Fri, 13 Jan 2023 05:19:33 +0000 (21:19 -0800)]
Merge branch 'net-thunderbolt-add-tracepoints'

Mika Westerberg says:

====================
net: thunderbolt: Add tracepoints

This series adds tracepoints and additional logging to the
Thunderbolt/USB4 networking driver. These are useful when debugging
possible issues.

Before that we move the driver into its own directory under drivers/net
so that we can add additional files without trashing the network drivers
main directory, and update the MAINTAINERS accordingly.

v1: https://lore.kernel.org/netdev/20230104081731.45928-1-mika.westerberg@linux.intel.com/
====================

Link: https://lore.kernel.org/r/20230111062633.1385-1-mika.westerberg@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: thunderbolt: Add tracepoints
Mika Westerberg [Wed, 11 Jan 2023 06:26:33 +0000 (08:26 +0200)]
net: thunderbolt: Add tracepoints

These are useful when debugging various performance issues.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Yehezkel Bernat <YehezkelShB@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: thunderbolt: Add debugging when sending/receiving control packets
Mika Westerberg [Wed, 11 Jan 2023 06:26:32 +0000 (08:26 +0200)]
net: thunderbolt: Add debugging when sending/receiving control packets

These can be useful when debugging possible issues around USB4NET
control packet exchange.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Yehezkel Bernat <YehezkelShB@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: thunderbolt: Move into own directory
Mika Westerberg [Wed, 11 Jan 2023 06:26:31 +0000 (08:26 +0200)]
net: thunderbolt: Move into own directory

We will be adding tracepoints to the driver so instead of littering the
main network driver directory, move the driver into its own directory.
While there, rename the module to thunderbolt_net (with underscore) to
match with the thunderbolt_dma_test convention.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Yehezkel Bernat <YehezkelShB@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>