Oleksij Rempel [Mon, 20 Jun 2022 11:56:01 +0000 (13:56 +0200)]
net: phy: dp83td510: add SQI support
Convert MSE (mean-square error) values to SNR and split them into SQI (Signal Quality
Indicator) ranges. The ranges used are taken from the "OPEN ALLIANCE - Advanced
diagnostic features for 100BASE-T1 automotive Ethernet PHYs"
specification.
Lukas Wunner [Mon, 20 Jun 2022 09:28:39 +0000 (11:28 +0200)]
net: phy: smsc: Deduplicate interrupt acknowledgement upon phy_init_hw()
Since commit 4c0d2e96ba05 ("net: phy: consider that suspend2ram may cut
off PHY power"), phy_init_hw() invokes both the ->config_init() and
->config_intr() callbacks.
In the SMSC PHY driver, the latter acknowledges stale interrupts, hence
there's no longer a need to acknowledge them in the former as well.
There are no other callers of ->config_init() besides phy_init_hw().
====================
net: dsa: microchip: common spi probe for the ksz series switches - part 1
This patch series refactors the ksz_switch_register routine so that all ksz
series switches share a common flow. At present, ksz8795.c and ksz9477.c each
have their own dsa_switch_ops and switch detect functionality.
In ksz_switch_register, ksz_dev_ops is assigned based on the function parameter
passed by the individual ksz8/ksz9477 switch register function, and switch
detect is then performed through the ksz_dev_ops.detect hook. This series
modifies ksz_switch_register so that switch detect is performed first and
ksz_dev_ops is then assigned to the ksz_device structure based on the detected
chip. This ensures a common flow for the existing switches as well as the
LAN937x switches.
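The reordered flow can be sketched roughly as below. This is only an
illustration: every sketch_ name, the chip family enum, the ops table and the
fake detect functions are hypothetical and are not the driver's actual types.

```c
#include <stddef.h>

/* Hypothetical chip families; the real driver identifies chips by chip id. */
enum sketch_chip_family {
	SKETCH_FAMILY_UNKNOWN,
	SKETCH_FAMILY_KSZ8,
	SKETCH_FAMILY_KSZ9477,
};

/* Hypothetical stand-in for ksz_dev_ops. */
struct sketch_dev_ops {
	const char *name;
};

struct sketch_device {
	enum sketch_chip_family family;
	const struct sketch_dev_ops *dev_ops;
};

const struct sketch_dev_ops sketch_ksz8_ops = { .name = "ksz8" };
const struct sketch_dev_ops sketch_ksz9477_ops = { .name = "ksz9477" };

/* Fake detect routines standing in for a real register read. */
enum sketch_chip_family sketch_fake_detect_ksz9477(void)
{
	return SKETCH_FAMILY_KSZ9477;
}

enum sketch_chip_family sketch_fake_detect_unknown(void)
{
	return SKETCH_FAMILY_UNKNOWN;
}

/*
 * Detect first, then pick the ops from the detected chip, instead of
 * receiving the ops as a parameter and detecting through them.
 * Returns 0 on success, -1 if the chip is not recognized.
 */
int sketch_switch_register(struct sketch_device *dev,
			   enum sketch_chip_family (*detect)(void))
{
	dev->family = detect();

	switch (dev->family) {
	case SKETCH_FAMILY_KSZ8:
		dev->dev_ops = &sketch_ksz8_ops;
		return 0;
	case SKETCH_FAMILY_KSZ9477:
		dev->dev_ops = &sketch_ksz9477_ops;
		return 0;
	default:
		dev->dev_ops = NULL;
		return -1;
	}
}
```

With this ordering, a caller no longer needs to know which ops table applies
before the chip has been identified.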
The next patch series will move ksz_dsa_ops and dsa_switch_ops
from ksz8795.c and ksz9477.c to ksz_common.c and introduce a common spi
probe for all the ksz based switches.
Changes in v1
- Split the patch series into two.
- Replaced all occurrences of REG_PORT_STATUS_0 and PORT_FIBER_MODE with
KSZ8_PORT_STATUS_0 and KSZ8_PORT_FIBER_MODE.
- Separated the tag protocol and phy read/write patch into two.
- Assigned the DSA_TAG_PROTO_NONE as the default value for get_tag_protocol hook.
- Reduced the indentation level by using the if(!dev->dev_ops->mirror_add).
- Added the stp_ctrl_reg as a member in ksz_chip_data and removed the member
in ksz_dev_ops.
- Removed the r_dyn_mac_table, r_sta_mac_table and w_sta_mac_table from the
ksz_dev_ops since they are used only in the ksz8795.c.
Changes in RFC v2
- Fixed the compilation issue.
- Reduced the patch set to 15.
====================
Arun Ramadoss [Fri, 17 Jun 2022 08:42:55 +0000 (14:12 +0530)]
net: dsa: microchip: move get_phy_flags & mtu to ksz_common
This patch moves the get_phy_flags and mtu hooks of ksz8795 and ksz9477
in dsa_switch_ops to ksz_common. The get_phy_flags hook checks whether
the chip is ksz8863/kss8793 and, if so, returns an error for port1.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Arun Ramadoss [Fri, 17 Jun 2022 08:42:54 +0000 (14:12 +0530)]
net: dsa: microchip: update fdb add/del/dump in ksz_common
This patch makes the dsa_switch_ops hooks for fdbs use the ksz_common.c file.
From ksz_common, the individual switches' fdb functions are called through
dev->dev_ops. It also removes r_dyn_mac_table, r_sta_mac_table and
w_sta_mac_table from ksz_dev_ops, as they are used only in ksz8795.c.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Arun Ramadoss [Fri, 17 Jun 2022 08:42:53 +0000 (14:12 +0530)]
net: dsa: microchip: update the ksz_port_mdb_add/del
The ksz_mdb_add/del functions in ksz_common.c are specific to ksz8795.c. The
ksz9477 has its own separate ksz9477_port_mdb_add/del functions. This patch
moves the ksz8795 specific mdb functionality from ksz_common to ksz8795.
The dsa_switch_ops hooks for ksz8795/ksz9477 are now invoked through
ksz_port_mdb_add/del.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Arun Ramadoss [Fri, 17 Jun 2022 08:42:52 +0000 (14:12 +0530)]
net: dsa: microchip: update the ksz_phylink_get_caps
This patch assigns the phylink_get_caps hook in ksz8795 and ksz9477 to
ksz_phylink_get_caps and updates their mac_capabilities in the
respective ksz_dev_ops.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Arun Ramadoss [Fri, 17 Jun 2022 08:42:51 +0000 (14:12 +0530)]
net: dsa: microchip: get P_STP_CTRL in ksz_port_stp_state by ksz_dev_ops
At present, the P_STP_CTRL register value is passed as a parameter to
ksz_port_stp_state from the individual dsa_switch_ops hooks. This patch
updates the function to retrieve the register value through the
ksz_chip_data member.
It also makes ksz_update_port_member static, since it is not called
outside ksz_common.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Arun Ramadoss [Fri, 17 Jun 2022 08:42:50 +0000 (14:12 +0530)]
net: dsa: microchip: move the port mirror to ksz_common
This patch moves the port mirror add/del dsa_switch_ops to
ksz_common.c. The individual switch implementations are executed through
the ksz_dev_ops function pointers.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Arun Ramadoss [Fri, 17 Jun 2022 08:42:49 +0000 (14:12 +0530)]
net: dsa: microchip: move vlan functionality to ksz_common
This patch moves the vlan dsa_switch_ops such as vlan_add, vlan_del and
vlan_filtering from the individual files ksz8795.c, ksz9477.c to
ksz_common.c file.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Arun Ramadoss [Fri, 17 Jun 2022 08:42:48 +0000 (14:12 +0530)]
net: dsa: microchip: ksz9477: use ksz_read_phy16 & ksz_write_phy16
The ksz8795 and ksz9477 implementations of the phy read/write hooks are
different. This patch makes the ksz9477 implementation the same as
ksz8795 by updating the ksz9477_dev_ops structure.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Arun Ramadoss [Fri, 17 Jun 2022 08:42:46 +0000 (14:12 +0530)]
net: dsa: microchip: move switch chip_id detection to ksz_common
KSZ87xx and KSZ88xx have their chip_id at register location 0, while
KSZ9477 compatible switches and LAN937x switches have their chip_id
at locations 0x01 and 0x02. To provide common switch detect
functionality for ksz switches, the ksz_switch_detect function is
introduced.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Arun Ramadoss [Fri, 17 Jun 2022 08:42:45 +0000 (14:12 +0530)]
net: dsa: microchip: ksz9477: cleanup the ksz9477_switch_detect
ksz9477_switch_detect reads the chip id from
location 0x00 and also checks gigabit compatibility and the number of
ports based on the global_options0 register. To prepare for the common ksz
switch detect function, everything other than the chip id read is moved to
ksz9477_switch_init.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Yu Xiao [Mon, 20 Jun 2022 10:39:12 +0000 (12:39 +0200)]
nfp: compose firmware file name with new hwinfo "nffw.partno"
During initialization of the NFP driver, a file name for loading
application firmware is composed using the NIC's AMDA information and port
type (count and speed). E.g.: "nic_AMDA0145-1012_2x10.nffw".
In practice there may be many variants for each NIC type, and many of the
variants relate to assembly components which do not concern the driver and
application firmware implementation. Yet the current scheme leads to a
different application firmware file name for each variant, because they
have different AMDA information.
To reduce proliferation of content-duplicated application firmware images
or symlinks, the NIC's management firmware will only expose differences
between variants that need different application firmware via a newly
introduced hwinfo, "nffw.partno".
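As a rough sketch of the naming scheme (not the driver's actual code), the file
name could be composed as below, preferring "nffw.partno" when the management
firmware exposes it and otherwise falling back to "assembly.partno". The helper
name sketch_nfp_fw_name() and its parameters are hypothetical; only the
"nic_<partno>_<count>x<speed>.nffw" pattern and the two hwinfo keys come from
this commit message.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: compose the application firmware file name.
 * 'nffw_partno' is NULL when the management firmware does not expose it.
 * Returns the number of characters snprintf() would have written.
 */
int sketch_nfp_fw_name(char *buf, size_t len,
		       const char *nffw_partno,
		       const char *assembly_partno,
		       unsigned int port_count,
		       unsigned int port_speed_gbps)
{
	const char *partno = nffw_partno ? nffw_partno : assembly_partno;

	return snprintf(buf, len, "nic_%s_%ux%u.nffw",
			partno, port_count, port_speed_gbps);
}
```

Called with no "nffw.partno", an "assembly.partno" of "AMDA0145-1012" and two
10G ports, this yields the example name quoted above.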
Use of the existing hwinfo, "assembly.partno", is maintained in order to
support NICs with management firmware that does not expose
"nffw.partno".
====================
mlxsw: Unified bridge conversion - part 1/6
This set starts converting mlxsw to the unified bridge model and mainly
adds new device registers and extends existing ones that will be used in
follow-up patchsets.
High-level summary
==================
The unified bridge model is a new way of managing low-level device
objects such as filtering identifiers (FIDs). The conversion moves a lot
of logic out of the device's firmware towards the driver, but its main
selling point is that it overcomes various scalability issues
related to the number of entries that need to be programmed to the
device.
The only (intended) user visible changes of the conversion are
improved resource utilization and the ability to support more router
interfaces (RIFs) in Spectrum-{2,3}.
Details
=======
Commit 50853808ff4a ("Merge branch
'mlxsw-Prepare-for-VLAN-aware-bridge-w-VxLAN'") converted mlxsw to
emulate 802.1Q FIDs (represent VLANs in a VLAN-aware bridge) using
802.1D FIDs (represent VLAN-unaware bridges). This was necessary because
at that time VNI could not be assigned to 802.1Q FIDs, which effectively
meant that mlxsw could not support VXLAN with VLAN-aware bridges.
The downside of this approach is that multiple {Port,VID}->FID entries
are required in order to classify incoming traffic to a FID, as opposed
to a single VID->FID entry that can be used with actual 802.1Q FIDs.
For example, if 10 ports are members in the same VLAN-aware bridge and
the same 100 VLANs are configured on each port, then only 100 VID->FID
entries are required with 802.1Q FIDs, whereas 1000 {Port,VID}->FID
entries are required with emulated 802.1Q FIDs.
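The arithmetic in this example can be written out as a small sketch. The
function names are made up for illustration and do not exist in mlxsw.

```c
/* One VID->FID entry per VLAN, independent of the number of ports. */
unsigned int sketch_entries_8021q(unsigned int ports, unsigned int vlans)
{
	(void)ports;
	return vlans;
}

/* One {Port,VID}->FID entry per port and VLAN with emulated 802.1Q FIDs. */
unsigned int sketch_entries_emulated(unsigned int ports, unsigned int vlans)
{
	return ports * vlans;
}
```

For 10 ports and 100 VLANs these return 100 and 1000 respectively, matching
the figures above.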
The above limitation is the result of various assumptions that were made
in the design of the API that was exposed to software. In the unified
bridge model the API is much more "raw" and therefore avoids these
assumptions, allowing software to configure the device in a more
efficient manner.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 19 Jun 2022 10:29:21 +0000 (13:29 +0300)]
mlxsw: reg: Add support for VLAN RIF as part of RITR register
Router interfaces (RIFs) constructed on top of VLAN-aware bridges are of
"VLAN" type, whereas RIFs constructed on top of VLAN-unaware bridges are of
"FID" type.
In other words, the RIF type is derived from the underlying FID type.
VLAN RIFs are used on top of 802.1Q FIDs, whereas FID RIFs are used on
top of 802.1D FIDs.
Currently 802.1Q FIDs are emulated using 802.1D FIDs, and therefore VLAN
RIFs are emulated using FID RIFs.
As part of converting the driver to use unified bridge, 802.1Q FIDs and
VLAN RIFs will be used.
Add the relevant fields to RITR register, add pack() function for VLAN
RIF and rename one field to fit the internal name.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 19 Jun 2022 10:29:20 +0000 (13:29 +0300)]
mlxsw: Add support for egress FID classification after decapsulation
As preparation for unified bridge model, add support for VNI->FID mapping
via SVFA register.
When performing VXLAN encapsulation, the VXLAN header needs to contain a
VNI. This VNI is derived from the FID classification performed on
ingress, through which the ingress RIF is also determined.
Similarly, when performing VXLAN decapsulation, the FID of the packet
needs to be determined. This FID is derived from VNI classification
performed during decapsulation.
In the old model, both entries (i.e., FID->VNI and VNI->FID) were
configured via SFMR.vni.
In the new model, where ingress is separated from egress, ingress
configuration (VNI->FID) is performed via SVFA, while SFMR only
configures egress (FID->VNI).
Add 'vni' field to SVFA, add new mapping table - VNI to FID, add new
pack() function for VNI mapping and edit the comment in SFMR.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 19 Jun 2022 10:29:19 +0000 (13:29 +0300)]
mlxsw: reg: Add egress FID field to RITR register
RITR configures the router interface table. As preparation for unified
bridge model, add egress FID field to RITR.
After routing, a packet has to perform a layer-2 lookup using the
destination MAC it got from the routing and a FID.
In the new model, the egress FID is configured by RITR for both sub-port
and FID RIFs.
Add 'efid' field to sub-port router interface and update FID router
interface related comment.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 19 Jun 2022 10:29:18 +0000 (13:29 +0300)]
mlxsw: reg: Add Router Egress Interface to VID Register
The REIV maps {egress router interface (eRIF), egress_port} -> {vlan ID}.
As preparation for unified bridge model, add REIV register for future use.
In the past, firmware would take care of the above mentioned mapping,
but in the new model this should be done by software using REIV register.
REIV register supports a simultaneous update of 256 ports using
'port_page' field. When 'port_page'=0 the records represent ports
0-255, when 'port_page'=1 the records represent ports 256-511 and so
on.
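The paging scheme can be illustrated with a minimal sketch. The macro and
function names are hypothetical; only the 256-ports-per-page layout comes
from the description above.

```c
#define SKETCH_REIV_PORTS_PER_PAGE 256	/* ports covered by one update */

/* Value of 'port_page' that covers the given port. */
unsigned int sketch_reiv_port_page(unsigned int port)
{
	return port / SKETCH_REIV_PORTS_PER_PAGE;
}

/* Index of the port's record within that page. */
unsigned int sketch_reiv_record_index(unsigned int port)
{
	return port % SKETCH_REIV_PORTS_PER_PAGE;
}
```

Port 255 therefore maps to page 0, record 255, while port 256 maps to page 1,
record 0.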
The register is reserved while using the legacy model.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 19 Jun 2022 10:29:17 +0000 (13:29 +0300)]
mlxsw: reg: Replace MID related fields in SFGC register
SFGC register maps {packet type, bridge type} -> {MID base, table type}.
As preparation for unified bridge model, remove 'mid' field and add
'mid_base' field.
The MID index (index to PGT table which maps MID to local port list and
SMPE index) is a result of 'mid_base' + 'fid_offset'. Using the legacy
bridge model, firmware configures 'mid_base'. However, using the new model,
software is responsible for configuring it via SFGC register.
The 'mid_base' is configured per {packet type, bridge type}, for
example, for {Unicast, .1Q}, {Broadcast, .1D}.
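A minimal sketch of the MID index computation described above follows. The
enum values, the table layout and the function name are assumptions made for
illustration; they do not reflect the actual register layout.

```c
/* Hypothetical subsets of the packet and bridge types. */
enum sketch_pkt_type { SKETCH_PKT_UNICAST, SKETCH_PKT_BROADCAST, SKETCH_PKT_MAX };
enum sketch_bridge_type { SKETCH_BRIDGE_1Q, SKETCH_BRIDGE_1D, SKETCH_BRIDGE_MAX };

/* One 'mid_base' per {packet type, bridge type}, as configured via SFGC. */
struct sketch_sfgc {
	unsigned int mid_base[SKETCH_PKT_MAX][SKETCH_BRIDGE_MAX];
};

/* MID index = 'mid_base' + 'fid_offset'. */
unsigned int sketch_mid_index(const struct sketch_sfgc *sfgc,
			      enum sketch_pkt_type pkt,
			      enum sketch_bridge_type bridge,
			      unsigned int fid_offset)
{
	return sfgc->mid_base[pkt][bridge] + fid_offset;
}
```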
Add the field 'mid_base' to SFGC register and increase the length of the
register accordingly.
Remove the field 'mid', as it is currently ignored by the device; its use
is an old leftover.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 19 Jun 2022 10:29:16 +0000 (13:29 +0300)]
mlxsw: reg: Add flood related field to SFMR register
SFMR register creates and configures FIDs. As preparation for unified
bridge model, add a required field for future use.
The PGT (Port Group) table maps multicast ID (MID) to
{local port list, SMPE index} on Spectrum-1 and to {local port list} on
the other ASICs.
In the legacy model, software did not interact with this table directly.
Instead, it was accessed by firmware in response to registers such as
SFTR and SMID.
In the new model, the SFTR register is deprecated and software has full
control over the PGT table using the SMID register.
The configuration of MDB entries (using SFD) is unchanged, but flooding
configuration is completely different.
SFGC register maps {packet type, bridge type} -> {MID base, table type},
then with FID and FID-offset which are configured via SFMR, the MID index
is obtained.
Add the field 'flood_bridge_type' to SFMR so that software can distinguish
between 802.1q FIDs and vFIDs using the two supported types.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 19 Jun 2022 10:29:15 +0000 (13:29 +0300)]
mlxsw: reg: Add VID related fields to SFD register
SFD register configures FDB table. As preparation for unified bridge model,
add some required fields for future use.
In the new model, firmware no longer configures the egress VID; this
responsibility is moved to software. For layer 2 this means that software
needs to determine the egress VID for both unicast and multicast.
For unicast FDB records and unicast LAG FDB records, the VID needs to be
set via new fields in SFD - 'set_vid' and 'vid'.
Add the two mentioned fields for future use.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 19 Jun 2022 10:29:14 +0000 (13:29 +0300)]
mlxsw: reg: Add SMPE related fields to SFMR register
SFMR register creates and configures FIDs. As preparation for unified bridge
model, add some required fields for future use.
The device includes two main tables to support layer 2 multicast (i.e.,
MDB and flooding). These are the PGT (Port Group Table) and the
MPE (Multicast Port Egress) table.
- PGT is {MID -> (bitmap of local_port, SMPE index)}
- MPE is {(Local port, SMPE index) -> eVID}
In Spectrum-2 and later ASICs, the SMPE index is an attribute of the FID
and programmed via new fields in SFMR register - 'smpe_valid' and 'smpe'.
Add the two mentioned fields for future use and increase the length of
the register accordingly.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 19 Jun 2022 10:29:13 +0000 (13:29 +0300)]
mlxsw: Add SMPE related fields to SMID2 register
SMID register maps multicast ID (MID) into a list of local ports.
As preparation for unified bridge model, add some required fields for
future use.
The device includes two main tables to support layer 2 multicast (i.e.,
MDB and flooding). These are the PGT (Port Group Table) and the
MPE (Multicast Port Egress) table.
- PGT is {MID -> (bitmap of local_port, SMPE index)}
- MPE is {(Local port, SMPE index) -> eVID}
In Spectrum-1, both indexes into the MPE table (local port and SMPE) are
derived from the PGT table. Therefore, the SMPE index needs to be
programmed as part of the PGT entry via new fields in SMID - 'smpe_valid'
and 'smpe'.
Add the two mentioned fields for future use and align the callers of
mlxsw_reg_smid2_pack() to pass zeros for SMPE fields.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 19 Jun 2022 10:29:12 +0000 (13:29 +0300)]
mlxsw: reg: Add Switch Multicast Port to Egress VID Register
The SMPE register maps {egress_port, SMPE index} -> VID.
The device includes two main tables to support layer 2 multicast (i.e.,
MDB and flooding). These are the PGT (Port Group Table) and the
MPE (Multicast Port Egress) table.
- PGT is {MID -> (bitmap of local_port, SMPE index)}
- MPE is {(Local port, SMPE index) -> eVID}
In Spectrum-1, the index into the MPE table - called switch multicast to
port egress VID (SMPE) - is derived from the PGT entry, whereas in
Spectrum-2 and later ASICs it is derived from the FID.
In the legacy model, software did not interact with this table as it was
completely hidden in firmware. In the new model, software needs to
populate the table itself in order to map from {Local port, SMPE index} to
an egress VID. This is done using the SMPE register.
Add the register for future use.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 19 Jun 2022 10:29:11 +0000 (13:29 +0300)]
mlxsw: reg: Add ingress RIF related fields to SVFA register
SVFA register controls the VID to FID mapping and {Port, VID} to FID
mapping for virtualized ports. As preparation for unified bridge model,
add some required fields for future use.
On ingress, after ingress ACL, a packet needs to be classified to a FID.
The key for this lookup can be one of:
1. VID. When port is not in virtual mode.
2. {RQ, VID}. When port is in virtual mode.
3. FID. When FID was set by ingress ACL.
Since RITR no longer performs ingress configuration, the ingress RIF for
the first two entry types needs to be set via new fields in SVFA -
'irif_v' and 'irif'.
Add the two mentioned fields for future use and increase the length of
the register accordingly.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 19 Jun 2022 10:29:10 +0000 (13:29 +0300)]
mlxsw: reg: Add ingress RIF related fields to SFMR register
SFMR register creates and configures FIDs. As preparation for unified
bridge model, add some required fields for future use.
On ingress, after ingress ACL, a packet needs to be classified to a FID.
The key for this lookup can be one of:
1. VID. When port is not in virtual mode.
2. {RQ, VID}. When port is in virtual mode.
3. FID. When FID was set by ingress ACL.
For example, via VR_AND_FID_ACTION.
Since RITR no longer performs ingress configuration, the ingress RIF for
the last entry type needs to be set via new fields in SFMR - 'irif_v'
and 'irif'.
Add the two mentioned fields for future use.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 19 Jun 2022 10:29:09 +0000 (13:29 +0300)]
mlxsw: reg: Add 'flood_rsp' field to SFMR register
SFMR register creates and configures FIDs. As preparation for unified
bridge model, add a field for future use.
In the new model, RITR no longer configures the rFID used for sub-port RIFs
and it has to be created by software via SFMR. Such FIDs need to be created
with special flood indication using 'flood_rsp' field. When set, this bit
instructs the device to manage the flooding entries for this FID in a
reserved part of the port group table (PGT).
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ronak Doshi [Mon, 20 Jun 2022 00:10:13 +0000 (17:10 -0700)]
vmxnet3: disable overlay offloads if UPT device does not support
'Commit 6f91f4ba046e ("vmxnet3: add support for capability registers")'
added support for capability registers. These registers are used
to advertise capabilities of the device.
The patch updated the dev_caps to disable outer checksum offload if
PTCR register does not support it. However, it failed to update
other overlay offloads. This patch fixes this issue.
Fixes: 6f91f4ba046e ("vmxnet3: add support for capability registers") Signed-off-by: Ronak Doshi <doshir@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 20 Jun 2022 08:10:13 +0000 (09:10 +0100)]
Merge branch 'raw-rcu-fixes'
Kuniyuki Iwashima says:
====================
raw: Fix nits of RCU conversion series.
The first patch fixes a build error introduced by commit ba44f8182ec2 ("raw: use
more conventional iterators"). That commit has not landed in the net tree,
so this series is targeted to net-next. The second patch replaces some
hlist functions with sk's helper macros.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
raw: Fix mixed declarations error in raw_icmp_error().
The trailing semicolon causes a compiler error, so let's remove it.
net/ipv4/raw.c: In function ‘raw_icmp_error’:
net/ipv4/raw.c:266:2: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
266 | struct hlist_nulls_head *hlist;
| ^~~~~~
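For illustration, the pattern behind this class of error can be reduced to the
sketch below. The function and variable names are made up and are not taken
from raw_icmp_error(). A doubled semicolon after a declaration is parsed as an
empty statement, so the declaration that follows it counts as a declaration
after a statement.

```c
int sketch_sum(void)
{
	/*
	 * Broken form: "int a = 1;;". The second ';' is an empty
	 * statement, so the declaration of 'b' below would trigger
	 * -Wdeclaration-after-statement. Dropping it fixes the build.
	 */
	int a = 1;
	int b = 2;

	return a + b;
}
```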
Fixes: ba44f8182ec2 ("raw: use more conventional iterators") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Sun, 19 Jun 2022 08:15:30 +0000 (10:15 +0200)]
Revert "nfp: update nfp_X logging definitions"
This reverts commit 9386ebccfc59 ("nfp: update nfp_X logging definitions")
The reverted patch was intended to improve logging for the NFP driver by
including information such as the source code file and number in log
messages.
Unfortunately our experience is that this has not improved things as
we had hoped. The resulting logs are inconsistent with (most) other
kernel log messages and rely on knowledge of the source code version
in order for the extra information to be useful.
Thus, revert the change.
We acknowledge that Jakub Kicinski <kuba@kernel.org> foresaw this problem.
Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
While converting the mv88e6xxx driver to phylink pcs, it has been
noticed that we've started to have repeated cases where we convert a
speed and duplex to a BMCR value.
Rather than open coding this in multiple locations, let's provide a
helper for this - in linux/mii.h. This helper not only takes care of
the standard 10, 100 and 1000Mbps encodings, but also includes
2500Mbps (which is the same as 1000Mbps) for those users who require
that encoding as well. Unknown speeds will be encoded to 10Mbps, and
non-full duplexes will be encoded as half duplex.
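A userspace sketch of the encoding rules just described is shown below. It is
not the helper added to linux/mii.h: the function name sketch_bmcr_encode() is
hypothetical, and the BMCR bit values and SPEED_/DUPLEX_ constants are
redefined locally so the example stands alone.

```c
#include <stdint.h>

/* Standard MII BMCR bits, redefined here for a self-contained example. */
#define BMCR_SPEED1000	0x0040
#define BMCR_FULLDPLX	0x0100
#define BMCR_SPEED100	0x2000
#define BMCR_SPEED10	0x0000

#define SPEED_10	10
#define SPEED_100	100
#define SPEED_1000	1000
#define SPEED_2500	2500

#define DUPLEX_HALF	0
#define DUPLEX_FULL	1

uint16_t sketch_bmcr_encode(int speed, int duplex)
{
	uint16_t bmcr;

	switch (speed) {
	case SPEED_2500:	/* encoded the same as 1000Mbps */
	case SPEED_1000:
		bmcr = BMCR_SPEED1000;
		break;
	case SPEED_100:
		bmcr = BMCR_SPEED100;
		break;
	case SPEED_10:
	default:		/* unknown speeds fall back to 10Mbps */
		bmcr = BMCR_SPEED10;
		break;
	}

	/* Anything other than full duplex is encoded as half duplex. */
	if (duplex == DUPLEX_FULL)
		bmcr |= BMCR_FULLDPLX;

	return bmcr;
}
```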
This series converts the existing users to the new helper, and the
mv88e6xxx conversion will add further users in the 6352 and 639x PCS
code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiaoliang Yang [Fri, 17 Jun 2022 03:24:23 +0000 (11:24 +0800)]
net: dsa: felix: update base time of time-aware shaper when adjusting PTP time
When adjusting the PTP clock, the base time of the TAS configuration
becomes unreliable. We need to reset the TAS configuration by using a
new base time.
For example, suppose the driver gets a base time of 0 for the Qbv
configuration from the user and the current time is 20000. The driver will
set the TAS base time to 20000. After the PTP clock adjustment, the current
time becomes 10000. If the TAS base time is still 20000, it is now in the
future, and the TAS entry list will stop running. As another example, if the
current time becomes 10000000 after the PTP clock adjustment, the large time
offset can cause the hardware to hang.
This patch introduces a tas_clock_adjust() function to reset the TAS
module by using a new base time after the PTP clock adjustment. This
avoids the issues described above.
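One possible way to derive a fresh base time is sketched below. It is not
claimed to be what tas_clock_adjust() does: the function name is hypothetical,
and the 'cycle_time' parameter is an assumption of this sketch that does not
appear in the description above. The idea is to always recompute from the base
time the user configured, rather than from the value last written to hardware.

```c
#include <stdint.h>

/*
 * Hypothetical helper: starting from the user's base time, return the first
 * cycle boundary that is not in the past relative to 'now'. All values use
 * the same time unit. 'cycle_time' is an assumption of this sketch.
 */
uint64_t sketch_tas_new_base_time(uint64_t user_base_time,
				  uint64_t cycle_time,
				  uint64_t now)
{
	uint64_t cycles;

	if (user_base_time >= now || cycle_time == 0)
		return user_base_time;

	/* Round up to the first cycle boundary at or after 'now'. */
	cycles = (now - user_base_time + cycle_time - 1) / cycle_time;
	return user_base_time + cycles * cycle_time;
}
```

With a user base time of 0 and an assumed cycle time of 1000, a current time
of 20000 gives 20000, and re-running the helper after the clock steps back to
10000 gives 10000 instead of leaving the stale value 20000 in place.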
Because a PTP clock adjustment can occur at any time, it may conflict with
the TAS configuration. We introduce a new TAS lock to serialize the
access to the TAS registers.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ethernet: stmmac: remove select QCOM_SOCINFO and make it optional
QCOM_SOCINFO depends on QCOM_SMEM, but QCOM_SMEM is not selected. This causes
problems when QCOM_SOCINFO gets selected while its dependency on
QCOM_SMEM is not met.
To fix this remove the select in Kconfig and add additional info in the
DWMAC_IPQ806X config description.
Reported-by: kernel test robot <lkp@intel.com> Fixes: 9ec092d2feb6 ("net: ethernet: stmmac: add missing sgmii configure for ipq806x") Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 18 Jun 2022 04:04:15 +0000 (21:04 -0700)]
ping: convert to RCU lookups, get rid of rwlock
Using rwlock in networking code is extremely risky.
Writers can starve if enough readers are constantly
grabbing the rwlock.
I thought rwlocks were at fault and sent this patch:
https://lkml.org/lkml/2022/6/17/272
But Peter and Linus essentially told me rwlock had to be unfair.
We need to get rid of rwlock in networking code.
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Lafreniere [Thu, 16 Jun 2022 15:23:33 +0000 (11:23 -0400)]
ax25: use GFP_KERNEL in ax25_dev_device_up()
ax25_dev_device_up() is only called during device setup, which is
done in user context. In addition, ax25_dev_device_up()
unconditionally calls ax25_register_dev_sysctl(), which already
allocates with GFP_KERNEL.
Since it is allowed to sleep in this function, here we change
ax25_dev_device_up() to use GFP_KERNEL to reduce unnecessary
out-of-memory errors.
We've added 72 non-merge commits during the last 15 day(s) which contain
a total of 92 files changed, 4582 insertions(+), 834 deletions(-).
The main changes are:
1) Add 64 bit enum value support to BTF, from Yonghong Song.
2) Implement support for sleepable BPF uprobe programs, from Delyan Kratunov.
3) Add new BPF helpers to issue and check TCP SYN cookies without binding to a
socket especially useful in synproxy scenarios, from Maxim Mikityanskiy.
4) Fix libbpf's internal USDT address translation logic for shared libraries as
well as uprobe's symbol file offset calculation, from Andrii Nakryiko.
5) Extend libbpf to provide an API for textual representation of the various
map/prog/attach/link types and use it in bpftool, from Daniel Müller.
6) Provide BTF line info for RV64 and RV32 JITs, and fix a put_user bug in the
core seen in 32 bit when storing BPF function addresses, from Pu Lehui.
7) Fix libbpf's BTF pointer size guessing by adding a list of various aliases
for 'long' types, from Douglas Raillard.
8) Fix bpftool to re-add setting rlimit since probing for memcg-based accounting
has been unreliable and caused a regression on COS, from Quentin Monnet.
9) Fix UAF in BPF cgroup's effective program computation triggered upon BPF link
detachment, from Tadeusz Struk.
10) Fix bpftool build bootstrapping during cross compilation which was pointing
to the wrong AR process, from Shahab Vahedi.
11) Fix logic bug in libbpf's is_pow_of_2 implementation, from Yuze Chi.
12) BPF hash map optimization to avoid grabbing spinlocks of all CPUs when there
is no free element. Also add a benchmark as reproducer, from Feng Zhou.
13) Fix bpftool's codegen to bail out when there's no BTF, from Michael Mullin.
14) Various minor cleanup and improvements all over the place.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (72 commits)
bpf: Fix bpf_skc_lookup comment wrt. return type
bpf: Fix non-static bpf_func_proto struct definitions
selftests/bpf: Don't force lld on non-x86 architectures
selftests/bpf: Add selftests for raw syncookie helpers in TC mode
bpf: Allow the new syncookie helpers to work with SKBs
selftests/bpf: Add selftests for raw syncookie helpers
bpf: Add helpers to issue and check SYN cookies in XDP
bpf: Allow helpers to accept pointers with a fixed size
bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie
selftests/bpf: add tests for sleepable (uk)probes
libbpf: add support for sleepable uprobe programs
bpf: allow sleepable uprobe programs to attach
bpf: implement sleepable uprobes by chaining gps
bpf: move bpf_prog to bpf.h
libbpf: Fix internal USDT address translation logic for shared libraries
samples/bpf: Check detach prog exist or not in xdp_fwd
selftests/bpf: Avoid skipping certain subtests
selftests/bpf: Fix test_varlen verification failure with latest llvm
bpftool: Do not check return value from libbpf_set_strict_mode()
Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
...
====================
Oleksij Rempel [Fri, 17 Jun 2022 07:16:07 +0000 (09:16 +0200)]
net: macb: fix negative max_mtu size for sama5d3
JML register on probe will return zero. This register is configured
later on macb_init_hw() which is called on open.
Since we have zero, after header and FCS length subtraction we will get
negative max_mtu size. This issue was affecting DSA drivers with MTU support
(for example KSZ9477).
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
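The arithmetic behind the negative value can be sketched as follows. This is an illustration under assumptions, not the driver code: the function name and the macro names are hypothetical, and only the standard Ethernet header (14 bytes) and FCS (4 bytes) lengths are taken as given.

```c
/* Standard Ethernet framing overheads, in bytes. */
#define ETH_HLEN_SKETCH    14	/* destination MAC + source MAC + EtherType */
#define ETH_FCS_LEN_SKETCH 4	/* frame check sequence */

/*
 * Hypothetical helper: derive a maximum MTU from a maximum frame
 * length, such as a value read back from a jumbo-length register.
 */
int max_mtu_from_frame_len(int max_frame_len)
{
	return max_frame_len - ETH_HLEN_SKETCH - ETH_FCS_LEN_SKETCH;
}
```

With a register that still reads zero at probe time, the subtraction yields a negative number, which is the symptom the commit message describes.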
Kees Cook [Thu, 16 Jun 2022 05:23:12 +0000 (22:23 -0700)]
hinic: Replace memcpy() with direct assignment
Under CONFIG_FORTIFY_SOURCE=y and CONFIG_UBSAN_BOUNDS=y, Clang is bugged
here for calculating the size of the destination buffer (0x10 instead of
0x14). This copy is a fixed size (sizeof(struct fw_section_info_st)), with
the source and dest being struct fw_section_info_st, so the memcpy should
be safe, assuming the index is within bounds, which is UBSAN_BOUNDS's
responsibility to figure out.
Avoid the whole thing and just do a direct assignment. This results in
no change to the executable code.
Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Tom Rix <trix@redhat.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Jiri Pirko <jiri@nvidia.com> Cc: Vladimir Oltean <olteanv@gmail.com> Cc: Simon Horman <simon.horman@corigine.com> Cc: netdev@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://github.com/ClangBuiltLinux/linux/issues/1592 Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> # build Signed-off-by: David S. Miller <davem@davemloft.net>
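The difference between the two copy styles can be sketched as below. The struct is a hypothetical stand-in, not the real struct fw_section_info_st, and both function names are invented for illustration.

```c
#include <string.h>

/* Hypothetical stand-in for a fixed-size firmware section record. */
struct section_info_sketch {
	unsigned int offset;
	unsigned int len;
	unsigned int crc;
};

/* Copy with memcpy(): the size is passed separately from the types. */
void copy_with_memcpy(struct section_info_sketch *dst,
		      const struct section_info_sketch *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Copy by assignment: the compiler derives the size from the type. */
void copy_with_assignment(struct section_info_sketch *dst,
			  const struct section_info_sketch *src)
{
	*dst = *src;
}
```

Both functions produce the same result; the assignment form simply leaves no separate size argument for a bounds checker to reason about.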
Current kernel will compile this driver with warnings. This patch will
fix it.
drivers/net/ethernet/atheros/ag71xx.c: In function 'ag71xx_fast_reset':
drivers/net/ethernet/atheros/ag71xx.c:996:31: warning: passing argument 2 of 'ag71xx_hw_set_macaddr' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
996 | ag71xx_hw_set_macaddr(ag, dev->dev_addr);
| ~~~^~~~~~~~~~
drivers/net/ethernet/atheros/ag71xx.c:951:69: note: expected 'unsigned char *' but argument is of type 'const unsigned char *'
951 | static void ag71xx_hw_set_macaddr(struct ag71xx *ag, unsigned char *mac)
| ~~~~~~~~~~~~~~~^~~
drivers/net/ethernet/atheros/ag71xx.c: In function 'ag71xx_open':
drivers/net/ethernet/atheros/ag71xx.c:1441:32: warning: passing argument 2 of 'ag71xx_hw_set_macaddr' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
1441 | ag71xx_hw_set_macaddr(ag, ndev->dev_addr);
| ~~~~^~~~~~~~~~
drivers/net/ethernet/atheros/ag71xx.c:951:69: note: expected 'unsigned char *' but argument is of type 'const unsigned char *'
951 | static void ag71xx_hw_set_macaddr(struct ag71xx *ag, unsigned char *mac)
| ~~~~~~~~~~~~~~~^~~
Fixes: adeef3e32146 ("net: constify netdev->dev_addr") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
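A function that only reads a buffer can take a pointer to const, which is the usual way to silence a -Wdiscarded-qualifiers warning of this kind. The sketch below is a generic illustration; mac_low_word is a hypothetical name and the body is not taken from the ag71xx driver.

```c
/*
 * Hypothetical reader of a 6-byte MAC address. It only reads the
 * buffer, so the parameter is a pointer to const, and callers may pass
 * either const or non-const storage without a warning.
 */
unsigned int mac_low_word(const unsigned char *mac)
{
	return ((unsigned int)mac[2] << 24) | ((unsigned int)mac[3] << 16) |
	       ((unsigned int)mac[4] << 8) | (unsigned int)mac[5];
}
```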
====================
pcs-xpcs, stmmac: add 1000BASE-X AN for network switch
Thanks for v4 review feedback in [1] and [2]. I have changed the v5
implementation as follows.
v5 changes:
1/5 - No change from v4.
2/5 - No change from v4.
3/5 - [Fix] make xpcs_modify_changed() static and use
mdiodev_modify_changed() for cleaner code as suggested by
Russell King.
4/5 - [Fix] Use fwnode_get_phy_mode() as recommended by Andrew Lunn.
5/5 - [Fix] Make fwnode = of_fwnode_handle(priv->plat->phylink_node)
order after priv = netdev_priv(dev).
v4 changes:
1/5 - Squash v3:1/7 & 2/7 patches into v4:1/6 so that it passes build.
2/5 - [No change] same as v3:3/7
3/5 - [Fix] Fix issues identified by Russell in [1]
4/5 - [Fix] Drop v3:5/7 patch per input by Russell in [2] and make
dwmac-intel clear the ovr_an_inband flag if fixed-link
is used in ACPI _DSD.
5/5 - [No change] same as v3:7/7
For the steps to setup ACPI _DSD and checking, they are the same
as in [3]
Ong Boon Leong [Wed, 15 Jun 2022 08:39:08 +0000 (16:39 +0800)]
net: stmmac: make mdio register skips PHY scanning for fixed-link
stmmac_mdio_register() lacks fixed-link consideration and only skips PHY
scanning if it has done DT style PHY discovery. So, for DT or ACPI _DSD
setting of fixed-link, the PHY scanning should not happen.
v2: fix incorrect order related to fwnode that is not caught in non-DT
platform.
Tested-by: Emilio Riva <emilio.riva@ericsson.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ong Boon Leong [Wed, 15 Jun 2022 08:39:07 +0000 (16:39 +0800)]
stmmac: intel: add phy-mode and fixed-link ACPI _DSD setting support
Currently, phy_interface for TSN controller instance is set based on its
PCI Device ID. For SGMII PHY interface, phy_interface defaults to
PHY_INTERFACE_MODE_SGMII. As C37 AN supports both SGMII and 1000BASE-X
mode, we add support for 'phy-mode' ACPI _DSD for port-specific
and customer platform specific customization.
v3: use fwnode_get_phy_mode() as suggested by Andrew Lunn in
https://patchwork.kernel.org/comment/24895330/
v2:
For platform that sets 'fixed-link' using ACPI _DSD, we will unset
xpcs_an_inband within stmmac. Thanks to Russell King for his comment in
https://patchwork.kernel.org/comment/24890222/
v1:
Thanks to Andrew Lunn's guidance in
https://patchwork.kernel.org/comment/24827101/
Tested-by: Emilio Riva <emilio.riva@ericsson.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ong Boon Leong [Wed, 15 Jun 2022 08:39:06 +0000 (16:39 +0800)]
net: pcs: xpcs: add CL37 1000BASE-X AN support
For CL37 1000BASE-X AN, DW xPCS does not support C22 method but offers
C45 vendor-specific MII MMD for programming.
We also add the ability to disable Autoneg (through ethtool) for certain
network switches that support 1000BASE-X (1000Mbps and Full-Duplex) but
not Autoneg capability.
v4: Fixes to comment from Russell King. Thanks!
https://patchwork.kernel.org/comment/24894239/
Make xpcs_modify_changed() as private, change to use
mdiodev_modify_changed() for cleaner code.
v3: Fixes to issues spotted by Russell King. Thanks!
https://patchwork.kernel.org/comment/24890210/
Use phylink_mii_c22_pcs_decode_state(), remove unnecessary
interrupt clearing and skip speed & duplex setting if AN
is enabled.
v2: Fixes to issues spotted by Russell King in v1. Thanks!
https://patchwork.kernel.org/comment/24826650/
Use phylink_mii_c22_pcs_encode_advertisement() and implement
C45 MII ADV handling since IP only supports C45 access.
Tested-by: Emilio Riva <emilio.riva@ericsson.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Ong Boon Leong [Wed, 15 Jun 2022 08:39:05 +0000 (16:39 +0800)]
stmmac: intel: prepare to support 1000BASE-X phy interface setting
Currently, intel_speed_mode_2500() redundantly fixes up phy_interface to
PHY_INTERFACE_MODE_SGMII if the underlying controller is in 1000Mbps
SGMII mode. The value of phy_interface has been initialized earlier.
This patch removes such redundancy to prepare for setting 1000BASE-X
mode for certain hardware platform configuration.
Also update the intel_mgbe_common_data() to include 1000BASE-X setup.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ong Boon Leong [Wed, 15 Jun 2022 08:39:04 +0000 (16:39 +0800)]
net: make xpcs_do_config to accept advertising for pcs-xpcs and sja1105
xpcs_config() has 'advertising' input that is required for C37 1000BASE-X
AN in later patch series. So, we prepare xpcs_do_config() for it.
For sja1105, xpcs_do_config() is used for xpcs configuration without
depending on advertising input, so set to NULL.
Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
While testing L3 HW stats [1] on top of mlxsw, two issues were found:
1. Stats cannot be enabled for more than 205 netdevs. This was fixed in
commit 4b7a632ac4e7 ("mlxsw: spectrum_cnt: Reorder counter pools").
2. ARP packets are counted as errors. Patch #1 takes care of that. See
the commit message for details.
The goal of the majority of the rest of the patches is to add selftests
that would have discovered that only about 205 netdevs can have L3 HW
stats supported, despite the HW supporting much more. The obvious place
to plug this in is the scale test framework.
The scale tests are currently testing two things: that some number of
instances of a given resource can actually be created; and that when an
attempt is made to create more than the supported amount, the failures
are noted and handled gracefully.
However the ability to allocate the resource does not mean that the
resource actually works when passing traffic. For that, make it possible
for a given scale to also test traffic.
To that end, this patchset adds traffic tests. The goal of these is to
run traffic and observe whether a sample of the allocated resource
instances actually perform their task. Traffic tests are only run on the
positive leg of the scale test (no point trying to pass traffic when the
expected outcome is that the resource will not be allocated). They are
opt-in: if a given test does not expose one, it is not run.
The patchset proceeds as follows:
- Patches #2 and #3 add to "devlink resource" support for number of
allocated RIFs, and the capacity. This is necessary, because when
evaluating how many L3 HW stats instances it should be possible to
allocate, the limiting resource on Spectrum-2 and above currently is
not the counters themselves, but actually the RIFs.
- Patch #6 adds support for invocation of a traffic test, if a given scale
test exposes it.
- Patch #7 adds support for skipping a given scale test. Because on
Spectrum-2 and above, the limiting factor to L3 HW stats instances is
actually the number of RIFs, there is no point in running the failing leg
of a scale test, because it would test exhaustion of RIFs, not of RIF
counters.
- With patch #8, the scale tests drivers pass the target number to the
cleanup function of a scale test.
- In patch #9, add a traffic test to the tc_flower selftests. This makes
sure that the flow counters installed with the ACLs actually do count as
they are supposed to.
- In patch #10, add a new scale selftest for RIF counter scale, including a
traffic test.
- In patch #11, the scale target for the tc_flower selftest is
dynamically set instead of being hard coded.
Petr Machata [Thu, 16 Jun 2022 10:42:44 +0000 (13:42 +0300)]
selftests: mlxsw: Add a RIF counter scale test
This test creates as many RIFs as possible, ideally more than there can be
RIF counters (though that is currently only possible on Spectrum-1). It
then tries to enable L3 HW stats on each of the RIFs. It also contains the
traffic test, which tries to run traffic through a log2 of those counters
and checks that the traffic is shown in the counter values.
Like with tc_flower traffic test, take a log2 subset of rules. The logic
behind picking log2 rules is that then every bit of the instantiated item's
number is exercised. This should catch issues whether they happen at the
high end, low end, or somewhere in between.
Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Thu, 16 Jun 2022 10:42:43 +0000 (13:42 +0300)]
selftests: mlxsw: tc_flower_scale: Add a traffic test
Add a test that checks that the created filters do actually trigger on
matching traffic.
Exercising all the rules would be a very lengthy process. Instead, take a
log2 subset of rules. The logic behind picking log2 rules is that then
every bit of the instantiated item's number is exercised. This should catch
issues whether they happen at the high end, low end, or somewhere in
between.
Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
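One way to pick roughly log2(count) items is to start at the highest index and keep halving. The sketch below shows that idea only; it is an assumption about how such a subset could be chosen, not necessarily what the selftest does, and log2_subset is a hypothetical name.

```c
#include <stddef.h>

/*
 * Hypothetical sampler: starting from the highest index (count - 1),
 * keep halving until zero is reached. Returns the number of indices
 * written to out (at most out_len).
 */
size_t log2_subset(unsigned int count, unsigned int *out, size_t out_len)
{
	size_t n = 0;
	unsigned int i;

	if (count == 0)
		return 0;

	for (i = count - 1; i > 0 && n < out_len; i /= 2)
		out[n++] = i;

	return n;
}
```

For 1000 items this selects 10 indices, from 999 down to 1, so the sample touches the high end, the low end, and points in between while staying short.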
Petr Machata [Thu, 16 Jun 2022 10:42:42 +0000 (13:42 +0300)]
selftests: mlxsw: resource_scale: Pass target count to cleanup
The scale tests are verifying behavior of mlxsw when number of instances of
some resource reaches the ASIC capacity. The number of instances is
referred to as "target" number.
No scale tests so far needed to know this target number to clean up. E.g.
the tc_flower simply removes the clsact qdisc that all the tested filters
are hooked onto, and that takes care of collecting all the filters.
However, for the RIF counter test, which is being added in a future patch,
VLAN netdevices are created. These are created as part of the test, but of
course the cleanup needs to undo them again. For that it needs to know how
many there were. To support this usage, pass the target number to the
cleanup callback.
Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Thu, 16 Jun 2022 10:42:41 +0000 (13:42 +0300)]
selftests: mlxsw: resource_scale: Allow skipping a test
The scale tests are currently testing two things: that some number of
instances of a given resource can actually be created; and that when an
attempt is made to create more than the supported amount, the failures are
noted and handled gracefully.
Sometimes the scale test depends on more than one resource. In particular,
a following patch will add a RIF counter scale test, which depends on the
number of RIF counters that can be bound, and also on the number of RIFs
that can be created.
When the test is limited by the auxiliary resource and not by the primary
one, there's no point trying to run the overflow test, because it would be
testing exhaustion of the wrong resource.
To support this use case, when the $test_get_target yields 0, skip the test
instead.
Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The scale tests are currently testing two things: that some number of
instances of a given resource can actually be created; and that when an
attempt is made to create more than the supported amount, the failures are
noted and handled gracefully.
However the ability to allocate the resource does not mean that the
resource actually works when passing traffic. For that, make it possible
for a given scale to also test traffic.
Traffic test is only run on the positive leg of the scale test (no point
trying to pass traffic when the expected outcome is that the resource will
not be allocated). Traffic tests are opt-in: if a given test does not
expose one, it is not run.
To this end, delay the test cleanup until after the traffic test is run.
Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 16 Jun 2022 10:42:39 +0000 (13:42 +0300)]
selftests: mlxsw: resource_scale: Update scale target after test setup
The scale of each resource is tested in the following manner:
1. The scale target is queried.
2. The test setup is prepared.
3. The test is invoked.
In some cases, the occupancy of a resource changes as part of the second
step, requiring the test to return a scale target that takes this change
into account.
Make this more robust by re-querying the scale target after the second
step.
Another possible solution is to swap the first and second steps, but
when a test needs to be skipped (i.e., scale target is zero), the setup
would have been in vain.
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
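The ordering described in the resource_scale entries (query the target, skip on zero, set up, re-query, run, optionally run traffic, then clean up with the target) can be sketched in C as below. The real selftests are shell scripts; every type and function name here is hypothetical, and the handling of a zero target after setup is an assumption.

```c
#include <stddef.h>

/* Hypothetical callbacks describing one scale test. */
struct scale_test_sketch {
	unsigned int (*get_target)(void *ctx);
	void (*setup)(void *ctx);
	int (*run)(void *ctx, unsigned int target);	/* 0 on success */
	int (*traffic)(void *ctx, unsigned int target);	/* opt-in, may be NULL */
	void (*cleanup)(void *ctx, unsigned int target);
};

enum scale_result { SCALE_PASS, SCALE_FAIL, SCALE_SKIP };

/* Positive leg of a scale test. */
enum scale_result run_positive_leg(const struct scale_test_sketch *t, void *ctx)
{
	unsigned int target = t->get_target(ctx);
	int err;

	/* A zero target means "skip", decided before any setup work. */
	if (target == 0)
		return SCALE_SKIP;

	/* Setup may change the occupancy of the resource under test. */
	t->setup(ctx);

	/* Re-query so the target accounts for what setup consumed. */
	target = t->get_target(ctx);
	if (target == 0) {
		/* Assumption: nothing left to test, so undo setup and skip. */
		t->cleanup(ctx, 0);
		return SCALE_SKIP;
	}

	err = t->run(ctx, target);
	if (err == 0 && t->traffic != NULL)
		err = t->traffic(ctx, target);

	/* Cleanup is delayed until after traffic and receives the target. */
	t->cleanup(ctx, target);

	return err == 0 ? SCALE_PASS : SCALE_FAIL;
}

/* Demo context: a resource with fixed capacity; setup consumes two units. */
struct demo_ctx {
	unsigned int capacity;
	unsigned int used;
	unsigned int cleaned_target;
	int traffic_ran;
};

unsigned int demo_get_target(void *ctx)
{
	struct demo_ctx *d = ctx;

	return d->capacity - d->used;
}

void demo_setup(void *ctx)
{
	struct demo_ctx *d = ctx;

	d->used += 2;
}

int demo_run(void *ctx, unsigned int target)
{
	(void)ctx;
	return target > 0 ? 0 : -1;
}

int demo_traffic(void *ctx, unsigned int target)
{
	struct demo_ctx *d = ctx;

	(void)target;
	d->traffic_ran = 1;
	return 0;
}

void demo_cleanup(void *ctx, unsigned int target)
{
	struct demo_ctx *d = ctx;

	d->cleaned_target = target;
}
```

Querying first keeps the skip case cheap, and the second query is what makes the target robust against setup-time occupancy changes.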
Amit Cohen [Thu, 16 Jun 2022 10:42:38 +0000 (13:42 +0300)]
selftests: mirror_gre_bridge_1q_lag: Enslave port to bridge before other configurations
With the mlxsw driver, configurations are offloaded only if a physical
port is enslaved to the virtual device (e.g., to a bridge). In the
'mirror_gre_bridge_1q_lag' test, the bridge gets an address and route
before there are ports in the bridge. This means that these
configurations are not offloaded.
Until now the test passed with the mlxsw driver even though the RIF of the
bridge was not in the hardware, because the ARP packets are trapped in
layer 2 and also mirrored, so there is no real need for the RIF in hardware.
The previous patch changed the traps 'ARP_REQUEST' and 'ARP_RESPONSE' to
be done at layer 3 instead of layer 2. With this change the ARP packets are
not trapped during the test, as the RIF is not in the hardware because of
the order of configurations.
Reorder the configurations so that they are offloaded; the test then
passes with the change of the traps.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Thu, 16 Jun 2022 10:42:37 +0000 (13:42 +0300)]
mlxsw: Add a resource describing number of RIFs
The Spectrum ASIC has a limit on how many L3 devices (called RIFs) can be
created. The limit depends on the ASIC and FW revision, and mlxsw reads it
from the FW. In order to communicate both the number of RIFs that there can
be, and how many are taken now (i.e. occupancy), introduce a corresponding
devlink resource.
Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Thu, 16 Jun 2022 10:42:36 +0000 (13:42 +0300)]
mlxsw: Keep track of number of allocated RIFs
In order to expose number of RIFs as a resource, it is going to be handy
to have the number of currently-allocated RIFs as a single number.
Introduce such.
Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
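A single running count next to a fixed capacity is enough to report both numbers that a devlink resource entry needs. The sketch below illustrates that bookkeeping only; the names are hypothetical, locking is omitted, and it does not reproduce the mlxsw data structures.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for a pool of router interfaces (RIFs). */
struct rif_pool_sketch {
	unsigned int max_rifs;		/* capacity, e.g. as reported by firmware */
	unsigned int rifs_count;	/* currently allocated, i.e. occupancy */
};

/* Account for one new RIF; fails when the pool is already full. */
bool rif_pool_alloc(struct rif_pool_sketch *pool)
{
	if (pool->rifs_count >= pool->max_rifs)
		return false;
	pool->rifs_count++;
	return true;
}

/* Account for one destroyed RIF. */
void rif_pool_free(struct rif_pool_sketch *pool)
{
	if (pool->rifs_count > 0)
		pool->rifs_count--;
}

/* Occupancy getter of the kind a resource query would call. */
unsigned int rif_pool_occupancy(const struct rif_pool_sketch *pool)
{
	return pool->rifs_count;
}
```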
Amit Cohen [Thu, 16 Jun 2022 10:42:35 +0000 (13:42 +0300)]
mlxsw: Trap ARP packets at layer 3 instead of layer 2
Currently, the traps 'ARP_REQUEST' and 'ARP_RESPONSE' occur at layer 2.
To allow the packets to be flooded, they are configured with the action
'MIRROR_TO_CPU' which means that the CPU receives a replica of the packet.
Today, Spectrum ASICs also support trapping ARP packets at layer 3. This
behavior is better, as the packets can then simply be trapped and there is
no need to mirror them. An additional motivation is that with the layer 2
traps, the ARP packets are dropped in the router as they do not have an
IP header, and are then counted as error packets, which might confuse
users.
Add the relevant traps for layer 3 and use them instead of the existing
traps. There is no visible change to user space.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 17 Jun 2022 09:11:04 +0000 (10:11 +0100)]
Merge branch 'tcp-mem-pressure-fixes'
Eric Dumazet says:
====================
tcp: final (?) round of mem pressure fixes
While working on prior patch series (e10b02ee5b6c "Merge branch
'net-reduce-tcp_memory_allocated-inflation'"), I found that we
could still have frozen TCP flows under memory pressure.
I thought we had solved this in 2015, but the fix was not complete.
v2: deal with zerocopy tx paths.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 14 Jun 2022 17:17:34 +0000 (10:17 -0700)]
tcp: fix possible freeze in tx path under memory pressure
Blamed commit only dealt with applications issuing small writes.
Issue here is that we allow to force memory schedule for the sk_buff
allocation, but we have no guarantee that sendmsg() is able to
copy some payload in it.
In this patch, I make sure the socket can use up to tcp_wmem[0] bytes.
For example, if we consider tcp_wmem[0] = 4096 (default on x86),
and initial skb->truesize being 1280, tcp_sendmsg() is able to
copy up to 2816 bytes under memory pressure.
Before this patch a sendmsg() sending more than 2816 bytes
would either block forever (if persistent memory pressure),
or return -EAGAIN.
For bigger MTU networks, it is advised to increase tcp_wmem[0]
to avoid sending too small packets.
v2: deal with zero copy paths.
Fixes: 8e4d980ac215 ("tcp: fix behavior for epoll edge trigger") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Wei Wang <weiwan@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
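The 2816-byte figure in the example is the minimum write budget minus the buffer's initial truesize. A minimal sketch of that subtraction, with a hypothetical function name and no claim to mirror the kernel's accounting code:

```c
/*
 * Hypothetical helper: bytes of payload that still fit when a socket
 * may use up to wmem_min bytes and the freshly allocated buffer already
 * accounts for truesize bytes.
 */
unsigned int payload_budget(unsigned int wmem_min, unsigned int truesize)
{
	return truesize < wmem_min ? wmem_min - truesize : 0;
}
```

With wmem_min = 4096 and truesize = 1280 this gives 4096 - 1280 = 2816, matching the numbers in the commit message.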
Eric Dumazet [Tue, 14 Jun 2022 17:17:33 +0000 (10:17 -0700)]
tcp: fix over estimation in sk_forced_mem_schedule()
sk_forced_mem_schedule() has a bug similar to ones fixed
in commit 7c80b038d23e ("net: fix sk_wmem_schedule() and
sk_rmem_schedule() errors")
While this bug has little chance to trigger in old kernels,
we need to fix it before the following patch.
Fixes: d83769a580f1 ("tcp: fix possible deadlock in tcp_send_fin()") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrii Nakryiko [Fri, 17 Jun 2022 04:55:12 +0000 (21:55 -0700)]
selftests/bpf: Don't force lld on non-x86 architectures
LLVM's lld linker doesn't have a universal architecture support (e.g.,
it definitely doesn't work on s390x), so be safe and force lld for
urandom_read and liburandom_read.so only on x86 architectures.
Merge branch 'New BPF helpers to accelerate synproxy'
Maxim Mikityanskiy says:
====================
The first patch of this series is a documentation fix.
The second patch allows BPF helpers to accept memory regions of fixed
size without doing runtime size checks.
The two next patches add new functionality that allows XDP to
accelerate iptables synproxy.
v1 of this series [1] used to include a patch that exposed conntrack
lookup to BPF using stable helpers. It was superseded by series [2] by
Kumar Kartikeya Dwivedi, which implements this functionality using
unstable helpers.
The third patch adds new helpers to issue and check SYN cookies without
binding to a socket, which is useful in the synproxy scenario.
The fourth patch adds a selftest, which includes an XDP program and a
userspace control application. The XDP program uses socketless SYN
cookie helpers and queries conntrack status instead of socket status.
The userspace control application allows tuning parameters of the XDP
program. This program also serves as a minimal example of usage of the
new functionality.
The last two patches expose the new helpers to TC BPF and extend the
selftest.
The draft of the new functionality was presented on Netdev 0x15 [3].
v2 changes:
Split into two series, submitted bugfixes to bpf, dropped the conntrack
patches, implemented the timestamp cookie in BPF using bpf_loop, dropped
the timestamp cookie patch.
v3 changes:
Moved some patches from bpf to bpf-next, dropped the patch that changed
error codes, split the new helpers into IPv4/IPv6, added verifier
functionality to accept memory regions of fixed size.
v4 changes:
Converted the selftest to the test_progs runner. Replaced some
deprecated functions in xdp_synproxy userspace helper.
v5 changes:
Fixed a bug in the selftest. Added questionable functionality to support
new helpers in TC BPF, added selftests for it.
v6 changes:
Wrap the new helpers themselves into #ifdef CONFIG_SYN_COOKIES, replaced
fclose with pclose and fixed the MSS for IPv6 in the selftest.
v7 changes:
Fixed the off-by-one error in indices, changed the section name to
"xdp", added missing kernel config options to vmtest in CI.
v8 changes:
Properly rebased, dropped the first patch (the same change was applied
by someone else), updated the cover letter.
v9 changes:
Fixed selftests for no_alu32.
v10 changes:
Selftests for s390x were blacklisted due to lack of support of kfunc,
rebased the series, split selftests to separate commits, created
ARG_PTR_TO_FIXED_SIZE_MEM and packed arg_size, addressed the rest of
comments.
selftests/bpf: Add selftests for raw syncookie helpers in TC mode
This commit extends selftests for the new BPF helpers
bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6} to also test the TC BPF
functionality added in the previous commit.
bpf: Allow the new syncookie helpers to work with SKBs
This commit allows the new BPF helpers to work in SKB context (in TC
BPF programs): bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6}.
Using these helpers in TC BPF programs is not recommended, because it's
unlikely that the BPF program will provide any substantial speedup
compared to regular SYN cookies or synproxy, after the SKB is already
created.
selftests/bpf: Add selftests for raw syncookie helpers
This commit adds selftests for the new BPF helpers:
bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6}.
xdp_synproxy_kern.c is a BPF program that generates SYN cookies on
allowed TCP ports and sends SYNACKs to clients, accelerating synproxy
iptables module.
xdp_synproxy.c is a userspace control application that allows
configuring the following options at runtime: list of allowed ports, MSS,
window scale, TTL.
A selftest is added to prog_tests that leverages the above programs to
test the functionality of the new helpers.
bpf: Add helpers to issue and check SYN cookies in XDP
The new helpers bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6} allow an XDP
program to generate SYN cookies in response to TCP SYN packets and to
check those cookies upon receiving the first ACK packet (the final
packet of the TCP handshake).
Unlike bpf_tcp_{gen,check}_syncookie these new helpers don't need a
listening socket on the local machine, which allows using them together
with synproxy to accelerate SYN cookie generation.
bpf: Allow helpers to accept pointers with a fixed size
Before this commit, the BPF verifier required ARG_PTR_TO_MEM arguments
to be followed by ARG_CONST_SIZE holding the size of the memory region.
The helpers had to check that size in runtime.
There are cases where the size expected by a helper is a compile-time
constant. Checking it in runtime is an unnecessary overhead and waste of
BPF registers.
This commit allows helpers to accept pointers to memory without the
corresponding ARG_CONST_SIZE, given that they define the memory region
size in struct bpf_func_proto and use ARG_PTR_TO_FIXED_SIZE_MEM type.
arg_size is unionized with arg_btf_id to reduce the kernel image size,
and it's valid because they are used by different argument types.
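The idea of storing a fixed size in the prototype and sharing its storage with another per-argument field can be sketched as below. These are simplified, hypothetical types, not the kernel's struct bpf_func_proto; the union is named here only to keep the sketch portable.

```c
#include <stddef.h>

/* Simplified, hypothetical argument kinds. */
enum arg_kind_sketch {
	ARG_KIND_MEM_WITH_SIZE,		/* pointer followed by a separate size argument */
	ARG_KIND_FIXED_SIZE_MEM,	/* pointer whose size is fixed in the prototype */
	ARG_KIND_BTF_ID,		/* pointer identified by a type ID */
};

/* One argument slot of a helper prototype. */
struct arg_slot_sketch {
	enum arg_kind_sketch kind;
	union {
		const unsigned int *btf_id;	/* used only for ARG_KIND_BTF_ID */
		size_t size;			/* used only for ARG_KIND_FIXED_SIZE_MEM */
	} u;
};

/*
 * Return the number of bytes a checker should require to be accessible
 * behind the pointer, or 0 if the size is not fixed by the prototype.
 */
size_t fixed_mem_size(const struct arg_slot_sketch *slot)
{
	return slot->kind == ARG_KIND_FIXED_SIZE_MEM ? slot->u.size : 0;
}
```

Because each kind reads at most one of the two union members, sharing the storage costs nothing in correctness and avoids growing every prototype.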
bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie
bpf_tcp_gen_syncookie expects the full length of the TCP header (with
all options), and bpf_tcp_check_syncookie accepts lengths bigger than
sizeof(struct tcphdr). Fix the documentation that says these lengths
should be exactly sizeof(struct tcphdr).
While at it, fix a typo in the name of struct ipv6hdr.
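For reference, the full TCP header length follows from the data offset field, which counts 32-bit words. A minimal sketch, with a hypothetical function name:

```c
/*
 * The TCP data offset field counts 32-bit words, so the full header
 * length including options is doff * 4 bytes. A header without options
 * has doff = 5, i.e. 20 bytes.
 */
unsigned int tcp_header_len(unsigned int doff)
{
	return doff * 4;
}
```

A header carrying options therefore has a length larger than the 20-byte base header, which is why a th_len fixed at the base size would be wrong for such packets.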
This patch series continues with the addition of supported features
for the Ethernet function of the PCI11010 / PCI11414 devices to
the LAN743x driver.
====================
====================
net: dsa: realtek: rtl8365mb: improve handling of PHY modes
This series introduces some minor cleanup of the driver and improves the
handling of PHY interface modes to break the assumption that CPU ports
are always over an external interface, and the assumption that user
ports are always using an internal PHY.
====================
Realtek switches in the rtl8365mb family always have at least one port
with a so-called external interface, supporting PHY interface modes such
as RGMII or SGMII. The purpose of this patch is to improve the driver's
handling of these ports.
A new struct rtl8365mb_chip_info is introduced together with a static
array of such structs. An instance of this struct is added for each
supported switch, distinguished by its chip ID and version. Embedded in
each chip_info struct is an array of struct rtl8365mb_extint, describing
the external interfaces available. This is more specific than the old
rtl8365mb_extint_port_map, which was only valid for switches with up to
6 ports.
The struct rtl8365mb_extint also contains a bitmask of supported PHY
interface modes, which allows the driver to distinguish which ports
support RGMII. This corrects a previous mistake in the driver whereby it
was assumed that any port with an external interface supports RGMII.
This is not actually the case: for example, the RTL8367S has two
external interfaces, only the second of which supports RGMII. The first
supports only SGMII and HSGMII. This new design will make it easier to
add support for other interface modes.
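A per-interface bitmask of supported modes can be sketched as follows. All names, mode bits, and port numbers are hypothetical placeholders and do not reproduce the driver's definitions; only the split between the two interfaces follows the RTL8367S description above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mode bits; not the driver's actual definitions. */
#define MODE_RGMII	(1u << 0)
#define MODE_SGMII	(1u << 1)
#define MODE_HSGMII	(1u << 2)

struct extint_sketch {
	int port;			/* placeholder port number */
	unsigned int supported_modes;	/* bitmask of MODE_* */
};

struct chip_info_sketch {
	const char *name;
	struct extint_sketch extints[2];
	size_t n_extints;
};

/*
 * Example entry: the first external interface supports only SGMII and
 * HSGMII, the second supports RGMII. Port numbers 0 and 1 are
 * placeholders.
 */
const struct chip_info_sketch example_chip = {
	.name = "example",
	.extints = {
		{ .port = 0, .supported_modes = MODE_SGMII | MODE_HSGMII },
		{ .port = 1, .supported_modes = MODE_RGMII },
	},
	.n_extints = 2,
};

/* True if the port has an external interface that supports the mode. */
bool port_supports_mode(const struct chip_info_sketch *chip, int port,
			unsigned int mode)
{
	size_t i;

	for (i = 0; i < chip->n_extints; i++) {
		if (chip->extints[i].port == port)
			return (chip->extints[i].supported_modes & mode) != 0;
	}
	return false;
}
```

Keeping the mask per interface, instead of assuming every external interface supports RGMII, lets a lookup reject unsupported combinations and makes it straightforward to add further mode bits later.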
Finally, rtl8365mb_phylink_get_caps() is fixed up to return supported
capabilities based on the external interface properties described above.
This addresses Vladimir's point in the linked thread that the
capabilities are not actually a function of the DSA port type: Although
most typical applications will treat the ports with internal PHY as user
ports, there is no actual hardware limitation preventing one from using
them as a CPU port. Equally, ports with external interface(s) may well
be treated as user ports, even though it is typical to use those ports
as CPU ports.