Jakub Kicinski [Thu, 23 Mar 2023 04:56:33 +0000 (21:56 -0700)]
Merge tag 'ipsec-libreswan-mlx5' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Leon Romanovsky says:
====================
Extend packet offload to fully support libreswan
The following patches are an outcome of Raed's work to add packet
offload support to libreswan [1].
The series includes:
* Priority support to IPsec policies
* Statistics per-SA (visible through "ip -s xfrm state ..." command)
* Support to IKE policy holes
* Fine tuning to acquire logic.
[1] https://github.com/libreswan/libreswan/pull/986 Link: https://lore.kernel.org/all/cover.1678714336.git.leon@kernel.org
* tag 'ipsec-libreswan-mlx5' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5e: Update IPsec per SA packets/bytes count
net/mlx5e: Use one rule to count all IPsec Tx offloaded traffic
net/mlx5e: Support IPsec acquire default SA
net/mlx5e: Allow policies with reqid 0, to support IKE policy holes
xfrm: copy_to_user_state fetch offloaded SA packets/bytes statistics
xfrm: add new device offload acquire flag
net/mlx5e: Use chains for IPsec policy priority offload
net/mlx5: fs_core: Allow ignore_flow_level on TX dest
net/mlx5: fs_chains: Refactor to detach chains from tc usage
====================
====================
net: dsa: b53: configure 6318 and 63268 RGMII ports
BCM6318 and BCM63268 need special configuration for their RGMII ports, so we
need to be able to identify them as a special BCM63xx switch.
In the meantime, let's add some missing BCM63xx SoCs to B53 MMAP device table.
This should be applied after "net: dsa: b53: add support for BCM63xx RGMIIs":
https://patchwork.kernel.org/project/netdevbpf/patch/20230319220805.124024-1-noltari@gmail.com/
Álvaro Fernández Rojas (4):
dt-bindings: net: dsa: b53: add more 63xx SoCs
net: dsa: b53: mmap: add more 63xx SoCs
net: dsa: b53: mmap: allow passing a chip ID
net: dsa: b53: add BCM63268 RGMII configuration
Pavan Chebbi [Tue, 21 Mar 2023 14:44:49 +0000 (07:44 -0700)]
bnxt: Enforce PTP software freq adjustments only when in non-RTC mode
Currently driver performs software based frequency adjustments
when RTC capability is not discovered or when in shared PHC mode.
But there may be some old firmware versions that still support
hardware freq adjustments without RTC capability being exposed.
In this situation driver will use non-realtime mode even on single
host NICs.
Hence enforce software frequency adjustments only when running in
shared PHC mode. Make suitable changes for cyclecounter for the
same.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pavan Chebbi [Tue, 21 Mar 2023 14:44:48 +0000 (07:44 -0700)]
bnxt: Defer PTP initialization to after querying function caps
Driver uses the flag BNXT_FLAG_MULTI_HOST to determine whether
to use non-realtime mode PHC when running on a multi host NIC.
However when ptp initializes on a NIC with shared PHC, we still
don't have this flag set yet because HWRM_FUNC_QCFG is issued
much later.
Move the ptp initialization code after we have issued func_qcfg.
The next patch will use the BNXT_FLAG_MULTI_HOST flag during PTP
initialization.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xiaoyan Li [Tue, 21 Mar 2023 08:12:01 +0000 (16:12 +0800)]
net-zerocopy: Reduce compound page head access
When compound pages are enabled, although the mm layer still
returns an array of page pointers, a subset (or all) of them
may have the same page head since a max 180kb skb can span 2
hugepages if it is on the boundary, be a mix of pages and 1 hugepage,
or fit completely in a hugepage. Instead of referencing page head
on all page pointers, use page length arithmetic to only call page
head when referencing a known different page head to avoid touching
a cold cacheline.
Tested:
See next patch with changes to tcp_mmap
Correntess:
On a pair of separate hosts as send with MSG_ZEROCOPY will
force a copy on tx if using loopback alone, check that the SHA
on the message sent is equivalent to checksum on the message received,
since the current program already checks for the length.
net: ethernet: ti: am65-cpts: adjust estf following ptp changes
When the CPTS clock is synced/adjusted by running linuxptp (ptp4l/phc2sys),
it will cause the TSN EST schedule to drift away over time. This is because
the schedule is driven by the EstF periodic counter whose pulse length is
defined in ref_clk cycles and it does not automatically sync to CPTS clock.
_______
_|
^
expected cycle start time boundary
_______________
_|_|___|_|
^
EstF drifted away -> direction
To fix it, the same PPM adjustment has to be applied to EstF as done to the
PHC CPTS clock, in order to correct the TSN EST cycle length and keep them
in sync.
Jason Xing [Tue, 21 Mar 2023 01:57:46 +0000 (09:57 +0800)]
net-sysfs: display two backlog queue len separately
Sometimes we need to know which one of backlog queue can be exactly
long enough to cause some latency when debugging this part is needed.
Thus, we can then separate the display of both.
Jakub Kicinski [Tue, 21 Mar 2023 04:41:59 +0000 (21:41 -0700)]
tools: ynl: skip the explicit op array size when not needed
Jiri suggests it reads more naturally to skip the explicit
array size when possible. When we export the symbol we want
to make sure that the size is right but for statics its
not needed.
Tom Rix [Mon, 20 Mar 2023 23:23:17 +0000 (19:23 -0400)]
net: atheros: atl1c: remove unused atl1c_irq_reset function
clang with W=1 reports
drivers/net/ethernet/atheros/atl1c/atl1c_main.c:214:20: error:
unused function 'atl1c_irq_reset' [-Werror,-Wunused-function]
static inline void atl1c_irq_reset(struct atl1c_adapter *adapter)
^
This function is not used, so remove it.
net: pasemi: Fix return type of pasemi_mac_start_tx()
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
warning in clang aims to catch these at compile time, which reveals:
drivers/net/ethernet/pasemi/pasemi_mac.c:1665:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
.ndo_start_xmit = pasemi_mac_start_tx,
^~~~~~~~~~~~~~~~~~~
1 error generated.
->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of
pasemi_mac_start_tx() to match the prototype's to resolve the warning.
While PowerPC does not currently implement support for kCFI, it could in
the future, which means this warning becomes a fatal CFI failure at run
time.
Josef Miegl [Sun, 19 Mar 2023 22:09:54 +0000 (23:09 +0100)]
net: geneve: accept every ethertype
The Geneve encapsulation, as defined in RFC 8926, has a Protocol Type
field, which states the Ethertype of the payload appearing after the
Geneve header.
Commit 435fe1c0c1f7 ("net: geneve: support IPv4/IPv6 as inner protocol")
introduced a new IFLA_GENEVE_INNER_PROTO_INHERIT flag that allowed the
use of other Ethertypes than Ethernet. However, it did not get rid of a
restriction that prohibits receiving payloads other than Ethernet,
instead the commit white-listed additional Ethertypes, IPv4 and IPv6.
This patch removes this restriction, making it possible to receive any
Ethertype as a payload, if the IFLA_GENEVE_INNER_PROTO_INHERIT flag is
set.
The restriction was set in place back in commit 0b5e8b8eeae4
("net: Add Geneve tunneling protocol driver"), which implemented a
protocol layer driver for Geneve to be used with Open vSwitch. The
relevant discussion about introducing the Ethertype white-list can be
found here:
https://lore.kernel.org/netdev/CAEP_g=_1q3ACX5NTHxLDnysL+dTMUVzdLpgw1apLKEdDSWPztw@mail.gmail.com/
<quote>
>> + if (unlikely(geneveh->proto_type != htons(ETH_P_TEB)))
>
> Why? I thought the point of geneve carrying protocol field was to
> allow protocols other than Ethernet... is this temporary maybe?
Yes, it is temporary. Currently OVS only handles Ethernet packets but
this restriction can be lifted once we have a consumer that is capable
of handling other protocols.
</quote>
This white-list was then ported to a generic Geneve netdevice in commit 371bd1061d29 ("geneve: Consolidate Geneve functionality in single
module."). Preserving the Ethertype white-list at this point made sense,
as the Geneve device could send out only Ethernet payloads anyways.
However, now that the Geneve netdevice supports encapsulating other
payloads with IFLA_GENEVE_INNER_PROTO_INHERIT and we have a consumer
capable of other protocols, it seems appropriate to lift the restriction
and allow any Geneve payload to be received.
Signed-off-by: Josef Miegl <josef@miegl.cz> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20230319220954.21834-1-josef@miegl.cz Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Marek Behún [Sun, 19 Mar 2023 14:02:38 +0000 (15:02 +0100)]
net: dsa: mv88e6xxx: fix mdio bus' phy_mask member
Commit 2c7e46edbd03 ("net: dsa: mv88e6xxx: mask apparently non-existing
phys during probing") added non-trivial bus->phy_mask in
mv88e6xxx_mdio_register() in order to avoid excessive mdio bus
transactions during probing.
But the mask is incorrect for switches with non-zero phy_base_addr (such
as 88E6341).
Fix this.
Fixes: 2c7e46edbd03 ("net: dsa: mv88e6xxx: mask apparently non-existing phys during probing") Signed-off-by: Marek Behún <kabel@kernel.org> Tested-by: Klaus Kudielka <klaus.kudielka@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20230319140238.9470-1-kabel@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tom Rix [Sun, 19 Mar 2023 17:24:33 +0000 (13:24 -0400)]
net: cxgb3: remove unused fl_to_qset function
clang with W=1 reports
drivers/net/ethernet/chelsio/cxgb3/sge.c:169:32: error: unused function
'fl_to_qset' [-Werror,-Wunused-function]
static inline struct sge_qset *fl_to_qset(const struct sge_fl *q, int qidx)
^
This function is not used, so remove it.
====================
net: ethernet: mtk_eth_soc: various enhancements
This series brings a variety of fixes and enhancements for mtk_eth_soc,
adds support for the MT7981 SoC and facilitates sharing the SGMII PCS
code between mtk_eth_soc and mt7530.
The whole series has been tested on MT7622+MT7531 (BPi-R64),
MT7623+MT7530 (BPi-R2), MT7981+GPY211 (GL.iNet GL-MT3000) and
MT7986+MT7531 (BPi-R3). On the BananaPi R3 a variete of SFP modules
have been tested, all of them (some SGMII with PHY, others 2500Base-X
or 1000Base-X without PHY) are working well now, however, some of them
need manually disabling of autonegotiation for the link to come up.
====================
Daniel Golle [Sun, 19 Mar 2023 12:58:43 +0000 (12:58 +0000)]
net: dsa: mt7530: use external PCS driver
Implement regmap access wrappers, for now only to be used by the
pcs-mtk-lynxi driver.
Make use of this external PCS driver and drop the now reduntant
implementation in mt7530.c.
As a nice side effect the SGMII registers can now also more easily be
inspected for debugging via /sys/kernel/debug/regmap.
Tested-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Golle [Sun, 19 Mar 2023 12:58:02 +0000 (12:58 +0000)]
net: ethernet: mtk_eth_soc: switch to external PCS driver
Now that we got a PCS driver, use it and remove the now redundant
PCS code and it's header macros from the Ethernet driver.
Signed-off-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Golle [Sun, 19 Mar 2023 12:57:50 +0000 (12:57 +0000)]
net: pcs: add driver for MediaTek SGMII PCS
The SGMII core found in several MediaTek SoCs is identical to what can
also be found in MediaTek's MT7531 Ethernet switch IC.
As this has not always been clear, both drivers developed different
implementations to deal with the PCS.
Recently Alexander Couzens pointed out this fact which lead to the
development of this shared driver.
Add a dedicated driver, mostly by copying the code now found in the
Ethernet driver. The now redundant code will be removed by a follow-up
commit.
Suggested-by: Alexander Couzens <lynxis@fe80.eu> Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Golle [Sun, 19 Mar 2023 12:57:35 +0000 (12:57 +0000)]
net: ethernet: mtk_eth_soc: ppe: add support for flow accounting
The PPE units found in MT7622 and newer support packet and byte
accounting of hw-offloaded flows. Add support for reading those counters
as found in MediaTek's SDK[1].
[1]: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/bc6a6a375c800dc2b80e1a325a2c732d1737df92 Tested-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Golle [Sun, 19 Mar 2023 12:56:52 +0000 (12:56 +0000)]
dt-bindings: arm: mediatek: sgmiisys: Convert to DT schema
Convert mediatek,sgmiiisys bindings to DT schema format.
Add maintainer Matthias Brugger, no maintainers were listed in the
original documentation.
As this node is also referenced by the Ethernet controller and used
as SGMII PCS add this fact to the description.
Move the file to Documentation/devicetree/bindings/net/pcs/ which seems
more appropriate given that the great majority of registers are related
to SGMII PCS functionality and only one register represents clock bits.
Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lukas Bulwahn [Mon, 20 Mar 2023 07:32:01 +0000 (08:32 +0100)]
MAINTAINERS: remove file entry in NFC SUBSYSTEM after platform_data movement
Commit 053fdaa841bd ("nfc: mrvl: Move platform_data struct into driver")
moves the nfcmrvl.h header file from include/linux/platform_data to the
driver's directory, but misses to adjust MAINTAINERS.
Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a
broken reference.
Just remove the file entry in NFC SUBSYSTEM, as the new location of the
code is already covered by another pattern in that section.
Fixes: 053fdaa841bd ("nfc: mrvl: Move platform_data struct into driver") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 20 Mar 2023 10:24:09 +0000 (10:24 +0000)]
Merge branch 'reuse-smsc-phy-functionality'
Heiner Kallweit says:
====================
net: phy: reuse SMSC PHY driver functionality in the meson-gxl PHY driver
The Amlogic Meson internal PHY's have the same register layout as
certain SMSC PHY's (also for non-c22-standard registers). This seems
to be more than just coincidence. Apparently they also need the same
workaround for EDPD mode (energy detect power down). Therefore let's
reuse SMSC PHY driver functionality in the meson-gxl PHY driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 18 Mar 2023 20:36:04 +0000 (21:36 +0100)]
net: phy: meson-gxl: reuse functionality of the SMSC PHY driver
The Amlogic Meson internal PHY's have the same register layout as
certain SMSC PHY's (also for non-c22-standard registers). This seems
to be more than just coincidence. Apparently they also need the same
workaround for EDPD mode (energy detect power down). Therefore let's
reuse SMSC PHY driver functionality in the meson-gxl PHY driver.
Tested with a G12A internal PHY. I don't have GXL test hw,
therefore I replace only the callbacks that are identical in
the SMSC PHY driver.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 18 Mar 2023 20:32:41 +0000 (21:32 +0100)]
net: phy: smsc: export functions for use by meson-gxl PHY driver
The Amlogic Meson internal PHY's have the same register layout as
certain SMSC PHY's (also for non-c22-standard registers). This seems
to be more than just coincidence. Apparently they also need the same
workaround for EDPD mode (energy detect power down). Therefore let's
export SMSC PHY driver functionality for use by the meson-gxl PHY
driver.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Chris Healy <healych@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Raed Salem [Tue, 14 Mar 2023 08:58:44 +0000 (10:58 +0200)]
net/mlx5e: Update IPsec per SA packets/bytes count
Providing per SA packets/bytes statistics mandates creating unique
counter per SA flow for Rx/Tx, whenever offloaded SA statistics is
desired query the specific SA counter to provide the stack with the
needed data.
Raed Salem [Tue, 14 Mar 2023 08:58:43 +0000 (10:58 +0200)]
net/mlx5e: Use one rule to count all IPsec Tx offloaded traffic
Currently one counter is shared between all IPsec Tx offloaded
rules to count the total amount of packets/bytes that was IPsec
Tx offloaded, replace this scheme by adding a new flow table (ft)
with one rule that counts all flows that passes through this
table (like Rx status ft), this ft is pointed by all IPsec Tx
offloaded rules. The above allows to have a counter per tx flow
rule in while keeping a separate global counter that store the
aggregation outcome of all these per flow counters.
Raed Salem [Tue, 14 Mar 2023 08:58:42 +0000 (10:58 +0200)]
net/mlx5e: Support IPsec acquire default SA
During XFRM stack acquire flow, a default SA is created to be updated
later, once acquire netlink message is handled in user space.
This SA is also passed to IPsec offload supporting driver, however this
SA acts only as placeholder and does not have context suitable for
offloading in HW yet. Identify this kind of SA by special offload flag
(XFRM_DEV_OFFLOAD_FLAG_ACQ), and create a SW only context.
In such cases with special mark so it won't be installed in HW in addition
flow and on remove/delete free this SW only context.
Raed Salem [Tue, 14 Mar 2023 08:58:41 +0000 (10:58 +0200)]
net/mlx5e: Allow policies with reqid 0, to support IKE policy holes
IKE policies hole, is special policy that exists to allow for IKE
traffic to bypass IPsec encryption even though there is already a
policies and SA(s) configured on same endpoints, these policies
does not nessecarly have the reqid configured, so need to add
an exception for such policies. These kind of policies are allowed
under the condition that at least upper protocol and/or ips
are not 0.
Raed Salem [Tue, 14 Mar 2023 08:58:40 +0000 (10:58 +0200)]
xfrm: copy_to_user_state fetch offloaded SA packets/bytes statistics
Both in RX and TX, the traffic that performs IPsec packet offload
transformation is accounted by HW only. Consequently, the HW should
be queried for packets/bytes statistics when user asks for such
transformation data.
Raed Salem [Tue, 14 Mar 2023 08:58:39 +0000 (10:58 +0200)]
xfrm: add new device offload acquire flag
During XFRM acquire flow, a default SA is created to be updated later,
once acquire netlink message is handled in user space. When the relevant
policy is offloaded this default SA is also offloaded to IPsec offload
supporting driver, however this SA does not have context suitable for
offloading in HW, nor is interesting to offload to HW, consequently needs
a special driver handling apart from other offloaded SA(s).
Add a special flag that marks such SA so driver can handle it correctly.
David S. Miller [Mon, 20 Mar 2023 09:08:48 +0000 (09:08 +0000)]
Merge branch 'ocelot-external-ports'
Colin Foster says:
====================
add support for ocelot external ports
This is the start of part 3 of what is hopefully a 3-part series to add
Ethernet switching support to Ocelot chips.
Part 1 of the series (A New Chip) added general support for Ocelot chips
that were controlled externally via SPI.
https://lore.kernel.org/all/20220815005553.1450359-1-colin.foster@in-advantage.com/
Part 2 of the series (The Ethernet Strikes Back) added DSA Ethernet
support for ports 0-3, which are the four copper ports that are internal
to the chip.
https://lore.kernel.org/all/20230127193559.1001051-1-colin.foster@in-advantage.com/
Part 3 will, at a minimum, add support for ports 4-7, which are
configured to use QSGMII to an external phy (Return Of The QSGMII). With
any luck, and some guidance, support for SGMII, SFPs, etc. will also be
part of this series.
V1 was submitted as an RFC - and that was rightly so. I suspected I
wasn't doing something right, and that was certainly the case. V2 is
much cleaner, so hopefully upgrading it to PATCH status is welcomed.
Thanks to Russell and Vladimir for correcting my course from V1.
In V1 I included a device tree snippet. I won't repeat that here, but
I will include a boot log snippet, in case it is of use:
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Foster [Fri, 17 Mar 2023 18:54:15 +0000 (11:54 -0700)]
net: dsa: ocelot: add support for external phys
The VSC7512 has four ports with internal phys that are already supported.
There are additional ports that can be configured to work with external
phys.
Add support for these additional ethernet ports.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Foster [Fri, 17 Mar 2023 18:54:14 +0000 (11:54 -0700)]
net: dsa: felix: allow serdes configuration for dsa ports
Ports for Ocelot devices (VSC7511, VSC7512, VSC7513 and VSC7514) support
external phys. When external phys are used, additional configuration on
each port is required to enable QSGMII mode and set external phy modes.
Add a configurable hook into these routines, so the external ports can be
used.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Foster [Fri, 17 Mar 2023 18:54:12 +0000 (11:54 -0700)]
net: dsa: felix: attempt to initialize internal hsio plls
The VSC7512 and VSC7514 have internal PLLs that can be used to control
different peripherals. Initialize these high speed I/O (HSIO) PLLs when
they exist, so that dependent peripherals like QSGMII can function.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Foster [Fri, 17 Mar 2023 18:54:11 +0000 (11:54 -0700)]
net: mscc: ocelot: expose serdes configuration function
During chip initialization, ports that use SGMII / QSGMII to interface to
external phys need to be configured on the VSC7513 and VSC7514. Expose this
configuration routine, so it can be used by DSA drivers.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The ocelot-switch driver can utilize the phylink_mac_config routine. Move
this to the ocelot library location and export the symbol to make this
possible.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ocelot chips have an internal PLL that must be used when communicating
through external phys. Expose the init routine, so it can be used by other
drivers.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Foster [Fri, 17 Mar 2023 18:54:07 +0000 (11:54 -0700)]
phy: phy-ocelot-serdes: add ability to be used in a non-syscon configuration
The phy-ocelot-serdes module has exclusively been used in a syscon setup,
from an internal CPU. The addition of external control of ocelot switches
via an existing MFD implementation means that syscon is no longer the only
interface that phy-ocelot-serdes will see.
In the MFD configuration, an IORESOURCE_REG resource will exist for the
device. Utilize this resource to be able to function in both syscon and
non-syscon configurations.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 19 Mar 2023 15:21:48 +0000 (15:21 +0000)]
Merge branch 'lan966x-tx-rx-improve'
Horatiu Vultur says:
====================
net: lan966x: Improve TX/RX of frames from/to CPU
The first patch of this series improves the RX side. As it seems to be
an expensive operation to read the RX timestamp for every frame, then
read it only if it is required. This will give an improvement of ~70mbit
on the RX side.
The second patch stops using the packing library. This improves mostly
the TX side as this library is used to set diffent bits in the IFH. If
this library is replaced with a more simple/shorter implementation,
this gives an improvement of more than 100mbit on TX side.
All the measurements were done using iperf3.
v1->v2:
- update lan966x_ifh_set to set the bytes and not each bit individually
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Horatiu Vultur [Fri, 17 Mar 2023 15:27:13 +0000 (16:27 +0100)]
net: lan966x: Stop using packing library
When a frame is injected from CPU, it is required to create an IFH(Inter
frame header) which sits in front of the frame that is transmitted.
This IFH, contains different fields like destination port, to bypass the
analyzer, priotity, etc. Lan966x it is using packing library to set and
get the fields of this IFH. But this seems to be an expensive
operations.
If this is changed with a simpler implementation, the RX will be
improved with ~5Mbit while on the TX is a much bigger improvement as it
is required to set more fields. Below are the numbers for TX.
Horatiu Vultur [Fri, 17 Mar 2023 15:27:12 +0000 (16:27 +0100)]
net: lan966x: Don't read RX timestamp if not needed
Whenever a frame was received to the CPU, the HW is timestamping the
frame. In the IFH(Inter Frame Header) it is found the nanosecond part
of the timestamps the SW is required to read from HW the second part.
But reading the second part it seems to be a expensive operations, so
so change this such to read the second part only when rx filter is
enabled.
Doing this change gives the RX a performance boost of ~70mbit.
Eric Dumazet [Fri, 17 Mar 2023 16:20:02 +0000 (16:20 +0000)]
net/packet: remove po->xmit
Use PACKET_SOCK_QDISC_BYPASS atomic bit instead of a pointer.
This removes one indirect call in fast path,
and READ_ONCE()/WRITE_ONCE() annotations as well.
Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Willem de Bruijn <willemb@google.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Harini Katakam [Fri, 17 Mar 2023 11:39:43 +0000 (17:09 +0530)]
net: macb: Reset TX when TX halt times out
Reset TX when halt times out i.e. disable TX, clean up TX BDs,
interrupts (already done) and enable TX.
This addresses the issue observed when iperf is run at 10Mps Half
duplex where, after multiple collisions and retries, TX halts.
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com> Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen [Fri, 17 Mar 2023 20:09:03 +0000 (13:09 -0700)]
ixgb: Remove ixgb driver
There are likely no users of this driver as the hardware has been
discontinued since 2010. Remove the driver and all references to it
in documentation.
Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Thu, 16 Mar 2023 12:08:26 +0000 (14:08 +0200)]
net: phy: at803x: Replace of_gpio.h with what indeed is used
of_gpio.h in this driver is solely used as a proxy to other headers.
This is incorrect usage of the of_gpio.h. Replace it .h with what
indeed is used in the code.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Thu, 16 Mar 2023 12:04:19 +0000 (14:04 +0200)]
net: smc91x: Replace of_gpio.h with what indeed is used
of_gpio.h in this driver is solely used as a proxy to other headers.
This is incorrect usage of the of_gpio.h. Replace it .h with what
indeed is used in the code.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: macb: Set MDIO clock divisor for pclk higher than 160MHz
Currently macb sets clock divisor for pclk up to 160 MHz.
Function gem_mdc_clk_div was updated to enable divisor
for higher values of pclk.
Signed-off-by: Bartosz Wawrzyniak <bwawrzyn@cisco.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
This is a follow-up of d27d367d3b78 ("inet: better const qualifier awareness")
Adopting container_of_const() to perform (struct sock *)->(protocol sock *)
operation is allowing us to propagate const qualifier and thus detect
misuses at compile time.
Most conversions are trivial, because most protocols did not adopt yet
const sk pointers where it could make sense.
Only mptcp and tcp patches (end of this series) are requiring small
adjustments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 17 Mar 2023 15:55:39 +0000 (15:55 +0000)]
tcp: preserve const qualifier in tcp_sk()
We can change tcp_sk() to propagate its argument const qualifier,
thanks to container_of_const().
We have two places where a const sock pointer has to be upgraded
to a write one. We have been using const qualifier for lockless
listeners to clearly identify points where writes could happen.
Add tcp_sk_rw() helper to better document these.
tcp_inbound_md5_hash(), __tcp_grow_window(), tcp_reset_check()
and tcp_rack_reo_wnd() get an additional const qualififer
for their @tp local variables.
smc_check_reset_syn_req() also needs a similar change.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 17 Mar 2023 15:55:38 +0000 (15:55 +0000)]
mptcp: preserve const qualifier in mptcp_sk()
We can change mptcp_sk() to propagate its argument const qualifier,
thanks to container_of_const().
We need to change few things to avoid build errors:
mptcp_set_datafin_timeout() and mptcp_rtx_head() have to accept
non-const sk pointers.
@msk local variable in mptcp_pending_tail() must be const.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 17 Mar 2023 15:55:37 +0000 (15:55 +0000)]
x25: preserve const qualifier in [a]x25_sk()
We can change [a]x25_sk() to propagate their argument const qualifier,
thanks to container_of_const().
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 17 Mar 2023 15:55:36 +0000 (15:55 +0000)]
smc: preserve const qualifier in smc_sk()
We can change smc_sk() to propagate its argument const qualifier,
thanks to container_of_const().
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Karsten Graul <kgraul@linux.ibm.com> Cc: Wenjia Zhang <wenjia@linux.ibm.com> Cc: Jan Karcher <jaka@linux.ibm.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 17 Mar 2023 15:55:35 +0000 (15:55 +0000)]
af_unix: preserve const qualifier in unix_sk()
We can change unix_sk() to propagate its argument const qualifier,
thanks to container_of_const().
We need to change dump_common_audit_data() 'struct unix_sock *u'
local var to get a const attribute.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 17 Mar 2023 15:55:34 +0000 (15:55 +0000)]
dccp: preserve const qualifier in dccp_sk()
We can change dccp_sk() to propagate its argument const qualifier,
thanks to container_of_const().
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 17 Mar 2023 15:55:33 +0000 (15:55 +0000)]
ipv6: raw: preserve const qualifier in raw6_sk()
We can change raw6_sk() to propagate its argument const qualifier,
thanks to container_of_const().
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 17 Mar 2023 15:55:32 +0000 (15:55 +0000)]
raw: preserve const qualifier in raw_sk()
We can change raw_sk() to propagate const qualifier of its argument,
thanks to container_of_const()
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 17 Mar 2023 15:55:31 +0000 (15:55 +0000)]
af_packet: preserve const qualifier in pkt_sk()
We can change pkt_sk() to propagate const qualifier of its argument,
thanks to container_of_const()
This should avoid some potential errors caused by accidental
(const -> not_const) promotion.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 17 Mar 2023 15:55:30 +0000 (15:55 +0000)]
udp: preserve const qualifier in udp_sk()
We can change udp_sk() to propagate const qualifier of its argument,
thanks to container_of_const()
This should avoid some potential errors caused by accidental
(const -> not_const) promotion.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net/mlx5e: Add GBP VxLAN HW offload support
Patch-1: Remove unused argument from functions.
Patch-2: Expose helper function vxlan_build_gbp_hdr.
Patch-3: Add helper function for encap_info_equal for tunnels with options.
Patch-4: Preserving the const-ness of the pointer in ip_tunnel_info_opts.
Patch-5: Add HW offloading support for TC flows with VxLAN GBP encap/decap
in mlx ethernet driver.
====================
Gavin Li [Thu, 16 Mar 2023 07:07:58 +0000 (09:07 +0200)]
net/mlx5e: TC, Add support for VxLAN GBP encap/decap flows offload
Add HW offloading support for TC flows with VxLAN GBP encap/decap.
Example of encap rule:
tc filter add dev eth0 protocol ip ingress flower \
action tunnel_key set id 42 vxlan_opts 512 \
action mirred egress redirect dev vxlan1
Example of decap rule:
tc filter add dev vxlan1 protocol ip ingress flower \
enc_key_id 42 enc_dst_port 4789 vxlan_opts 1024 \
action tunnel_key unset action mirred egress redirect dev eth0
Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Gavi Teitz <gavi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gavin Li [Thu, 16 Mar 2023 07:07:57 +0000 (09:07 +0200)]
ip_tunnel: Preserve pointer const in ip_tunnel_info_opts
Change ip_tunnel_info_opts( ) from static function to macro to cast return
value and preserve the const-ness of the pointer.
Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gavin Li [Thu, 16 Mar 2023 07:07:56 +0000 (09:07 +0200)]
net/mlx5e: Add helper for encap_info_equal for tunnels with options
For tunnels with options, eg, geneve and vxlan with gbp, they share the
same way to compare the headers and options. Extract the code as a common
function for them.
Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Gavi Teitz <gavi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
haozhe chang [Thu, 16 Mar 2023 09:58:20 +0000 (17:58 +0800)]
wwan: core: Support slicing in port TX flow of WWAN subsystem
wwan_port_fops_write inputs the SKB parameter to the TX callback of
the WWAN device driver. However, the WWAN device (e.g., t7xx) may
have an MTU less than the size of SKB, causing the TX buffer to be
sliced and copied once more in the WWAN device driver.
This patch implements the slicing in the WWAN subsystem and gives
the WWAN devices driver the option to slice(by frag_len) or not. By
doing so, the additional memory copy is reduced.
Meanwhile, this patch gives WWAN devices driver the option to reserve
headroom in fragments for the device-specific metadata.
Starting with commit 1a136ca2e089 ("net: mdio: scan bus based on bus
capabilities for C22 and C45"), mdiobus_scan_bus_c45() is being called on
buses with MDIOBUS_NO_CAP. On a Turris Omnia (Armada 385, 88E6176 switch),
this causes a significant increase of boot time, from 1.6 seconds, to 6.3
seconds. The boot time stated here is until start of /init.
Further testing revealed that the C45 scan is indeed expensive (around
2.7 seconds, due to a huge number of bus transactions), and called twice.
Two things were suggested:
(1) to move the expensive call of mv88e6xxx_mdios_register() from
mv88e6xxx_probe() to mv88e6xxx_setup().
(2) to mask apparently non-existing phys during probing.
Before that:
Patch #1 prepares the driver to handle the movement of
mv88e6xxx_mdios_register() to mv88e6xxx_setup() for cross-chip DSA trees.
Patch #2 is preparatory code movement, without functional change.
With those changes, boot time on the Turris Omnia is back to normal.
Klaus Kudielka [Wed, 15 Mar 2023 16:38:46 +0000 (17:38 +0100)]
net: dsa: mv88e6xxx: mask apparently non-existing phys during probing
To avoid excessive mdio bus transactions during probing, mask all phy
addresses that do not exist (there is a 1:1 mapping between switch port
number and phy address).
Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Klaus Kudielka <klaus.kudielka@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Klaus Kudielka [Wed, 15 Mar 2023 16:38:45 +0000 (17:38 +0100)]
net: dsa: mv88e6xxx: move call to mv88e6xxx_mdios_register()
Call the rather expensive mv88e6xxx_mdios_register() at the beginning of
mv88e6xxx_setup(). This avoids the double call via mv88e6xxx_probe()
during boot.
For symmetry, call mv88e6xxx_mdios_unregister() at the end of
mv88e6xxx_teardown().
Link: https://lore.kernel.org/lkml/449bde236c08d5ab5e54abd73b645d8b29955894.camel@gmail.com/ Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Klaus Kudielka <klaus.kudielka@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Klaus Kudielka [Wed, 15 Mar 2023 16:38:44 +0000 (17:38 +0100)]
net: dsa: mv88e6xxx: re-order functions
Move mv88e6xxx_setup() below mv88e6xxx_mdios_register(), so that we are
able to call the latter one from here. Do the same thing for the
inverse functions.
Signed-off-by: Klaus Kudielka <klaus.kudielka@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 15 Mar 2023 16:38:43 +0000 (17:38 +0100)]
net: dsa: mv88e6xxx: don't dispose of Global2 IRQ mappings from mdiobus code
irq_find_mapping() does not need irq_dispose_mapping(), only
irq_create_mapping() does.
Calling irq_dispose_mapping() from mv88e6xxx_g2_irq_mdio_free() and from
the error path of mv88e6xxx_g2_irq_mdio_setup() effectively means that
the mdiobus logic (for internal PHY interrupts) is disposing of a
hwirq->virq mapping which it is not responsible of (but instead, the
function pair mv88e6xxx_g2_irq_setup() + mv88e6xxx_g2_irq_free() is).
With the current code structure, this isn't such a huge problem, because
mv88e6xxx_g2_irq_mdio_free() is called relatively close to the real
owner of the IRQ mappings:
and the switch isn't 'live' in any way such that it would be able of
generating interrupts at this point (mv88e6xxx_unregister_switch() has
been called).
However, there is a desire to split mv88e6xxx_mdios_unregister() and
mv88e6xxx_g2_irq_free() such that mv88e6xxx_mdios_unregister() only gets
called from mv88e6xxx_teardown(). This is much more problematic, as can
be seen below.
In a cross-chip scenario (say 3 switches d0032004.mdio-mii:10, d0032004.mdio-mii:11 and d0032004.mdio-mii:12 which form a single DSA
tree), it is possible to unbind the device driver from a single switch
(say d0032004.mdio-mii:10).
When that happens, mv88e6xxx_remove() will be called for just that one
switch, and this will call mv88e6xxx_unregister_switch() which will tear
down the entire tree (calling mv88e6xxx_teardown() for all 3 switches).
Assuming mv88e6xxx_mdios_unregister() was moved to mv88e6xxx_teardown(),
at this stage, all 3 switches will have called irq_dispose_mapping() on
their mdiobus virqs.
When we bind again the device driver to d0032004.mdio-mii:10,
mv88e6xxx_probe() is called for it, which calls dsa_register_switch().
The DSA tree is now complete again, and mv88e6xxx_setup() is called for
all 3 switches.
Also assuming that mv88e6xxx_mdios_register() is moved to
mv88e6xxx_setup() (the 2 assumptions go together), at this point, d0032004.mdio-mii:11 and d0032004.mdio-mii:12 don't have an IRQ mapping
for the internal PHYs anymore, as they've disposed of it in
mv88e6xxx_teardown(). Whereas switch d0032004.mdio-mii:10 has re-created
it, because its code path comes from mv88e6xxx_probe().
Simply put, this change prepares the driver to handle the movement of
mv88e6xxx_mdios_register() to mv88e6xxx_setup() for cross-chip DSA trees.
Also, the code being deleted was partially wrong anyway (in a way which
may have hidden this other issue). mv88e6xxx_g2_irq_mdio_setup()
populates bus->irq[] starting with offset chip->info->phy_base_addr, but
the teardown path doesn't apply that offset too. So it disposes of virq
0 for phy = [ 0, phy_base_addr ).
All switch families have phy_base_addr = 0, except for MV88E6141 and
MV88E6341 which have it as 0x10. I guess those families would have
happened to work by mistake in cross-chip scenarios too.
I'm deleting the body of mv88e6xxx_g2_irq_mdio_free() but leaving its
call sites and prototype in place. This is because, if we ever need to
add back some teardown procedure in the future, it will be perhaps
error-prone to deduce the proper call sites again. Whereas like this,
no extra code should get generated, it shouldn't bother anybody.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Klaus Kudielka <klaus.kudielka@gmail.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ptp: kvm: Use decrypted memory in confidential guest on x86
KVM_HC_CLOCK_PAIRING currently fails inside SEV-SNP guests because the
guest passes an address to static data to the host. In confidential
computing the host can't access arbitrary guest memory so handling the
hypercall runs into an "rmpfault". To make the hypercall work, the guest
needs to explicitly mark the memory as decrypted. Do that in
kvm_arch_ptp_init(), but retain the previous behavior for
non-confidential guests to save us from having to allocate memory.
Add a new arch-specific function (kvm_arch_ptp_exit()) to free the
allocation and mark the memory as encrypted again.