Add a hook for PCS to validate the link parameters. This avoids MAC
drivers having to have knowledge of their PCS in their validate()
method, thereby allowing several MAC drivers to be simplfied.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
net: phylink: add mac_select_pcs() method to phylink_mac_ops
mac_select_pcs() allows us to have an explicit point to query which
PCS the MAC wishes to use for a particular PHY interface mode, thereby
allowing us to add support to validate the link settings with the PCS.
Phylink will also use this to select the PCS to be used during a major
configuration event without the MAC driver needing to call
phylink_set_pcs().
Note that if mac_select_pcs() is present, the supported_interfaces
bitmap must be filled in; this avoids mac_select_pcs() being called
with PHY_INTERFACE_MODE_NA when we want to get support for all
interface types. Phylink will return an error in phylink_create()
unless this condition is satisfied.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 16 Dec 2021 10:23:34 +0000 (10:23 +0000)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/nex
t-queue
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2021-12-15
This series contains updates to ice driver only.
Jake makes changes to flash update. This includes the following:
* a new shadow-ram region similar to NVM region but for the device shadow
RAM contents. This is distinct from NVM region because shadow RAM is
built up during device init and may be different from the raw NVM flash
data.
* refactoring of the ice_flash_pldm_image to become the main flash update
entry point. This is simpler than having both an
ice_devlink_flash_update and an ice_flash_pldm_image. It will make
additions like dry-run easier in the future.
* reducing time to read Option ROM version information.
* adding support for firmware activation via devlink reload, when
possible.
The major new work is the reload support, which allows activating firmware
immediately without a reboot when possible. Reload support only supports
firmware activation.
Jesse improves transmit code: utilizing newer netif_tx* API, adding some
prefetch calls, correcting expected conditions when calling ice_vsi_down(),
and utilizing __netdev_tx_sent_queue() call.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 16 Dec 2021 10:22:20 +0000 (10:22 +0000)]
Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
mlx5-next branch 2021-12-15
Hi Dave, Jakub, Jason
This pulls mlx5-next branch into net-next and rdma branches.
All patches already reviewed on both rdma and netdev mailing lists.
Please pull and let me know if there's any problem.
1) Add multiple FDB steering priorities [1]
2) Introduce HW bits needed to configure MAC list size of VF/SF.
Required for ("net/mlx5: Memory optimizations") upcoming series [2].
Jesse Brandeburg [Tue, 26 Oct 2021 00:08:25 +0000 (17:08 -0700)]
ice: tighter control over VSI_DOWN state
The driver had comments to the effect of: This flag should be set before
calling this function. While reviewing code it was found that there were
several violations of this policy, which could introduce hard to find
bugs or races.
Fix the violations of the "VSI DOWN state must be set before calling
ice_down" and make checking the state into code with a WARN_ON.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jesse Brandeburg [Wed, 27 Oct 2021 19:38:36 +0000 (12:38 -0700)]
ice: use prefetch methods
The kernel provides some prefetch mechanisms to speed up commonly
cold cache line accesses during receive processing. Since these are
software structures it helps to have these strategically placed
prefetches.
Be careful to call BQL prefetch complete only for non XDP queues.
Co-developed-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jacob Keller [Wed, 27 Oct 2021 23:22:55 +0000 (16:22 -0700)]
ice: support immediate firmware activation via devlink reload
The ice hardware contains an embedded chip with firmware which can be
updated using devlink flash. The firmware which runs on this chip is
referred to as the Embedded Management Processor firmware (EMP
firmware).
Activating the new firmware image currently requires that the system be
rebooted. This is not ideal as rebooting the system can cause unwanted
downtime.
In practical terms, activating the firmware does not always require a
full system reboot. In many cases it is possible to activate the EMP
firmware immediately. There are a couple of different scenarios to
cover.
* The EMP firmware itself can be reloaded by issuing a special update
to the device called an Embedded Management Processor reset (EMP
reset). This reset causes the device to reset and reload the EMP
firmware.
* PCI configuration changes are only reloaded after a cold PCIe reset.
Unfortunately there is no generic way to trigger this for a PCIe
device without a system reboot.
When performing a flash update, firmware is capable of responding with
some information about the specific update requirements.
The driver updates the flash by programming a secondary inactive bank
with the contents of the new image, and then issuing a command to
request to switch the active bank starting from the next load.
The response to the final command for updating the inactive NVM flash
bank includes an indication of the minimum reset required to fully
update the device. This can be one of the following:
* A full power on is required
* A cold PCIe reset is required
* An EMP reset is required
The response to the command to switch flash banks includes an indication
of whether or not the firmware will allow an EMP reset request.
For most updates, an EMP reset is sufficient to load the new EMP
firmware without issues. In some cases, this reset is not sufficient
because the PCI configuration space has changed. When this could cause
incompatibility with the new EMP image, the firmware is capable of
rejecting the EMP reset request.
Add logic to ice_fw_update.c to handle the response data flash update
AdminQ commands.
For the reset level, issue a devlink status notification informing the
user of how to complete the update with a simple suggestion like
"Activate new firmware by rebooting the system".
Cache the status of whether or not firmware will restrict the EMP reset
for use in implementing devlink reload.
Implement support for devlink reload with the "fw_activate" flag. This
allows user space to request the firmware be activated immediately.
For the .reload_down handler, we will issue a request for the EMP reset
using the appropriate firmware AdminQ command. If we know that the
firmware will not allow an EMP reset, simply exit with a suitable
netlink extended ACK message indicating that the EMP reset is not
available.
For the .reload_up handler, simply wait until the driver has finished
resetting. Logic to handle processing of an EMP reset already exists in
the driver as part of its reset and rebuild flows.
Implement support for the devlink reload interface with the
"fw_activate" action. This allows userspace to request activation of
firmware without a reboot.
Note that support for indicating the required reset and EMP reset
restriction is not supported on old versions of firmware. The driver can
determine if the two features are supported by checking the device
capabilities report. I confirmed support has existed since at least
version 5.5.2 as reported by the 'fw.mgmt' version. Support to issue the
EMP reset request has existed in all version of the EMP firmware for the
ice hardware.
Check the device capabilities report to determine whether or not the
indications are reported by the running firmware. If the reset
requirement indication is not supported, always assume a full power on
is necessary. If the reset restriction capability is not supported,
always assume the EMP reset is available.
Users can verify if the EMP reset has activated the firmware by using
the devlink info report to check that the 'running' firmware version has
updated. For example a user might do the following:
# Check current version
$ devlink dev info
# Update the device
$ devlink dev flash pci/0000:af:00.0 file firmware.bin
# Confirm stored version updated
$ devlink dev info
# Reload to activate new firmware
$ devlink dev reload pci/0000:af:00.0 action fw_activate
# Confirm running version updated
$ devlink dev info
Finally, this change does *not* implement basic driver-only reload
support. I did look into trying to do this. However, it requires
significant refactor of how the ice driver probes and loads everything.
The ice driver probe and allocation flows were not designed with such
a reload in mind. Refactoring the flow to support this is beyond the
scope of this change.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jacob Keller [Wed, 27 Oct 2021 23:22:54 +0000 (16:22 -0700)]
ice: reduce time to read Option ROM CIVD data
During probe and device reset, the ice driver reads some data from the
NVM image as part of ice_init_nvm. Part of this data includes a section
of the Option ROM which contains version information.
The function ice_get_orom_civd_data is used to locate the '$CIV' data
section of the Option ROM.
Timing of ice_probe and ice_rebuild indicate that the
ice_get_orom_civd_data function takes about 10 seconds to finish
executing.
The function locates the section by scanning the Option ROM every 512
bytes. This requires a significant number of NVM read accesses, since
the Option ROM bank is 500KB. In the worst case it would take about 1000
reads. Worse, all PFs serialize this operation during reload because of
acquiring the NVM semaphore.
The CIVD section is located at the end of the Option ROM image data.
Unfortunately, the driver has no easy method to determine the offset
manually. Practical experiments have shown that the data could be at
a variety of locations, so simply reversing the scanning order is not
sufficient to reduce the overall read time.
Instead, copy the entire contents of the Option ROM into memory. This
allows reading the data using 4Kb pages instead of 512 bytes at a time.
This reduces the total number of firmware commands by a factor of 8. In
addition, reading the whole section together at once allows better
indication to firmware of when we're "done".
Re-write ice_get_orom_civd_data to allocate virtual memory to store the
Option ROM data. Copy the entire OptionROM contents at once using
ice_read_flash_module. Finally, use this memory copy to scan for the
'$CIV' section.
This change significantly reduces the time to read the Option ROM CIVD
section from ~10 seconds down to ~1 second. This has a significant
impact on the total time to complete a driver rebuild or probe.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jacob Keller [Fri, 15 Oct 2021 23:13:03 +0000 (16:13 -0700)]
ice: move ice_devlink_flash_update and merge with ice_flash_pldm_image
The ice_devlink_flash_update function performs a few upfront checks and
then calls ice_flash_pldm_image.
Most if these checks make more sense in the context of code within
ice_flash_pldm_image. Merge ice_devlink_flash_update and
ice_flash_pldm_image into one function, placing it in ice_fw_update.c
Since this is still the entry point for devlink, call the function
ice_devlink_flash_update instead of ice_flash_pldm_image. This leaves a
single function which handles the devlink parameters and then initiates
a PLDM update.
With this change, the ice_devlink_flash_update function in
ice_fw_update.c becomes the main entry point for flash update. It
elimintes some unnecessary boiler plate code between the two previous
functions. The ultimate motivation for this is that it eases supporting
a dry run with the PLDM library in a future change.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jacob Keller [Tue, 12 Oct 2021 00:41:11 +0000 (17:41 -0700)]
ice: move and rename ice_check_for_pending_update
The ice_devlink_flash_update function performs a few checks and then
calls ice_flash_pldm_image. One of these checks is to call
ice_check_for_pending_update. This function checks if the device has
a pending update, and cancels it if so. This is necessary to allow
a new flash update to proceed.
We want to refactor the ice code to eliminate ice_devlink_flash_update,
moving its checks into ice_flash_pldm_image.
To do this, ice_check_for_pending_update will become static, and only
called by ice_flash_pldm_image. To make this change easier to review,
first just move the function up within the ice_fw_update.c file.
While at it, note that the function has a misleading name. Its primary
action is to cancel a pending update. Using the verb "check" does not
imply this. Rename it to ice_cancel_pending_update.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jacob Keller [Tue, 12 Oct 2021 00:41:10 +0000 (17:41 -0700)]
ice: devlink: add shadow-ram region to snapshot Shadow RAM
We have a region for reading the contents of the NVM flash as
a snapshot. This region does not allow reading the Shadow RAM, as it
always passes the FLASH_ONLY bit to the low level firmware interface.
Add a separate shadow-ram region which will allow snapshot of the
current contents of the Shadow RAM. This data is built from the NVM
contents but is distinct as the device builds up the Shadow RAM during
initialization, so being able to snapshot its contents can be useful
when attempting to debug flash related issues.
Fix the comment description of the nvm-flash region which incorrectly
stated that it filled the shadow-ram region, and add a comment
explaining that the nvm-flash region does not actually read the Shadow
RAM.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jakub Kicinski [Tue, 14 Dec 2021 15:47:25 +0000 (07:47 -0800)]
ethtool: always write dev in ethnl_parse_header_dev_get
Commit 0976b888a150 ("ethtool: fix null-ptr-deref on ref tracker")
made the write to req_info.dev conditional, but as Eric points out
in a different follow up the structure is often allocated on the
stack and not kzalloc()'d so seems safer to always write the dev,
in case it's garbage on input.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 14 Dec 2021 15:09:33 +0000 (07:09 -0800)]
net: add net device refcount tracker to struct packet_type
Most notable changes are in af_packet, tipc ones are trivial.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jon Maloy <jmaloy@redhat.com> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 15 Dec 2021 15:05:44 +0000 (15:05 +0000)]
Merge branch 'mlxsw-ipv6-underlay'
Ido Schimmel says:
====================
mlxsw: Add support for VxLAN with IPv6 underlay
So far, mlxsw only supported VxLAN with IPv4 underlay. This patchset
extends mlxsw to also support VxLAN with IPv6 underlay. The main
difference is related to the way IPv6 addresses are handled by the
device. See patch #1 for a detailed explanation.
Patch #1 creates a common hash table to store the mapping from IPv6
addresses to KVDL indexes. This table is useful for both IP-in-IP and
VxLAN tunnels with an IPv6 underlay.
Patch #2 converts the IP-in-IP code to use the new hash table.
Patches #3-#6 are preparations.
Patch #7 finally adds support for VxLAN with IPv6 underlay.
Patch #8 removes a test case that checked that VxLAN configurations with
IPv6 underlay are vetoed by the driver.
A follow-up patchset will add forwarding selftests.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Tue, 14 Dec 2021 14:25:50 +0000 (16:25 +0200)]
mlxsw: Add support for VxLAN with IPv6 underlay
Currently, mlxsw driver supports VxLAN with IPv4 underlay only.
Add support for IPv6 underlay.
The main differences are:
* Learning is not supported for IPv6 FDB entries, use static entries and
do not allow 'learning' flag for IPv6 VxLAN.
* IPv6 addresses for FDB entries should be saved as part of KVDL.
Use the new API to allocate and release entries for IPv6 addresses.
* Spectrum ASICs do not fill UDP checksum, while in software IPv6 UDP
packets with checksum zero are dropped.
Force the relevant flags which allow the VxLAN device to generate UDP
packets with zero checksum and also receive them.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Tue, 14 Dec 2021 14:25:49 +0000 (16:25 +0200)]
mlxsw: spectrum_nve: Keep track of IPv6 addresses used by FDB entries
FDB entries that perform VxLAN encapsulation with an IPv6 underlay hold
a reference on a resource. Namely, the KVDL entry where the IPv6
underlay destination IP is stored. When such an FDB entry is deleted, it
needs to drop the reference from the corresponding KVDL entry.
To that end, maintain a hash table that maps an FDB entry (i.e., {MAC,
FID}) to the IPv6 address used by it.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Tue, 14 Dec 2021 14:25:47 +0000 (16:25 +0200)]
mlxsw: Split handling of FDB tunnel entries between address families
Currently, the function which adds/removes unicast tunnel FDB entries is
shared between IPv4 and IPv6, while for IPv6 it warns because there is
no support for it.
The code for IPv6 will be more complicated because it needs to
allocate/release a KVDL pointer for the underlay IPv6 address.
As a preparation for IPv6 underlay support, split the code according to
address family.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Tue, 14 Dec 2021 14:25:44 +0000 (16:25 +0200)]
mlxsw: spectrum: Add hash table for IPv6 address mapping
The device supports forwarding entries such as routes and FDBs that
perform tunnel (e.g., VXLAN, IP-in-IP) encapsulation or decapsulation.
When the underlay is IPv6, these entries do not encode the 128 bit IPv6
address used for encapsulation / decapsulation. Instead, these entries
encode a 24 bit pointer to an array called KVDL where the IPv6 address
is stored.
Currently, only IP-in-IP with IPv6 underlay is supported, but subsequent
patches will add support for VxLAN with IPv6 underlay. To avoid
duplicating the logic required to store and retrieve these IPv6
addresses, introduce a hash table that will store the mapping between
IPv6 addresses and their KVDL index.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 15 Dec 2021 14:46:33 +0000 (14:46 +0000)]
Merge tag 'mlx5-updates-2021-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saed Mahameed says:
====================
mlx5-updates-2021-12-14
Parsing Infrastructure for TC actions:
The series introduce a TC action infrastructure to help
parsing TC actions in a generic way for both FDB and NIC rules.
To help maintain the parsing code of TC actions, we the parsing code to
action parser per action TC type in separate files, instead of having one
big switch case loop, duplicated between FDB and NIC parsers as before this
patchset.
Each TC flow_action->id is represented by a dedicated mlx5e_tc_act handler
which has callbacks to check if the specific action is offload supported and
to parse the specific action.
We move each case (TC action) handling into the specific handler, which is
responsible for parsing and determining if the action is supported.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 15 Dec 2021 10:59:57 +0000 (10:59 +0000)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2021-12-14
This series contains updates to ice driver only.
Haiyue adds support to query hardware for supported PTYPEs.
Jeff changes PTYPE validation to utilize the capabilities queried from
the hardware instead of maintaining a per DDP support list.
Brett refactors promiscuous functions to provide common and clear
interfaces to call for configuration.
Wojciech modifies DDP package load to simplify determining the final
state of the load.
Tony removes the use of ice_status from the driver. This involves
removing string conversion functions, converting variables and values to
standard errors, and clean up. He also removes an unused define.
Dan Carpenter removes unneeded casts.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Joakim Zhang [Tue, 14 Dec 2021 10:50:46 +0000 (18:50 +0800)]
net: fec: fix system hang during suspend/resume
1. During normal suspend (WoL not enabled) process, system has posibility
to hang. The root cause is TXF interrupt coming after clocks disabled,
system hang when accessing registers from interrupt handler. To fix this
issue, disable all interrupts when system suspend.
2. System also has posibility to hang with WoL enabled during suspend,
after entering stop mode, then magic pattern coming after clocks
disabled, system will be waked up, and interrupt handler will be called,
system hang when access registers. To fix this issue, disable wakeup
irq in .suspend(), and enable it in .resume().
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Clément Léger [Tue, 14 Dec 2021 09:55:34 +0000 (10:55 +0100)]
net: ocelot: add support to get port mac from device-tree
Add support to get mac from device-tree using of_get_ethdev_address.
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Clément Léger <clement.leger@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Conley Lee [Tue, 14 Dec 2021 09:11:06 +0000 (17:11 +0800)]
sun4i-emac.c: remove unnecessary branch
According to the current implementation of emac_rx, every arrived packet
will be processed in the while loop. So, there is no remain packet last
time. The skb_last field and this branch for dealing with it is
unnecessary.
Signed-off-by: Conley Lee <conleylee@foxmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 14 Dec 2021 08:42:30 +0000 (00:42 -0800)]
ethtool: use ethnl_parse_header_dev_put()
It seems I missed that most ethnl_parse_header_dev_get() callers
declare an on-stack struct ethnl_req_info, and that they simply call
dev_put(req_info.dev) when about to return.
Add ethnl_parse_header_dev_put() helper to properly untrack
reference taken by ethnl_parse_header_dev_get().
Fixes: e4b8954074f6 ("netlink: add net device refcount tracker to struct ethnl_req_info") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roi Dayan [Sun, 15 Aug 2021 10:23:14 +0000 (13:23 +0300)]
net/mlx5e: Move goto action checks into tc_action goto post parse op
Move goto action checks from parse nic/fdb funcs into the tc action
infra goto post parse op.
While moving this part also use NL_SET_ERR_MSG_MOD() instead of
NL_SET_ERR_MSG().
Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Wed, 11 Aug 2021 11:19:56 +0000 (14:19 +0300)]
net/mlx5e: Add post_parse() op to tc action infrastructure
The post_parse() op should be called after the parse op was called
for all actions. It could be an action state is dependent on other
actions. In the new op an action can fail the parse if the state
is not valid anymore.
Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Thu, 12 Aug 2021 13:02:58 +0000 (16:02 +0300)]
net/mlx5e: Move sample attr allocation to tc_action sample parse op
There is no reason to wait with the kmalloc to after parsing all
other actions. There could still be a failure later and before
offloading the rule. So alloc the mem when parsing.
The memory is being released on mlx5e_flow_put() which is called
also on error flow.
Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Mon, 12 Jul 2021 11:13:10 +0000 (14:13 +0300)]
net/mlx5e: Add tc action infrastructure
Add an infrastructure to help parsing tc actions in a generic way.
Supporting an action parser means implementing struct mlx5e_tc_act
for that action.
The infrastructure will give the possibility to be generic when parsing tc
actions, i.e. parse_tc_nic_actions() and parse_tc_fdb_actions().
To parse tc actions a user needs to allocate a parse_state instance
and pass it when iterating over the tc actions parsers.
If a parser doesn't exists then a user can treat it as unsupported.
To add an action parser a user needs to implement two callbacks.
The can_offload() callback to quickly check if an action can be offloaded.
The parse_action() callback to do actual parsing and prepare for offload.
Add implementation for drop, trap, mark and accept action parsers with this
commit to act as examples and implement usage of the new infrastructure for
those actions.
Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
====================
net: dsa: hellcreek: Fix handling of MGMT protocols
this series fixes some minor issues with regards to management protocols
such as PTP and STP in the hellcreek DSA driver. Configure static FDB
for these protocols. The end result is:
|root@tsn:~# mv88e6xxx_dump --atu
|Using device <platform/ff240000.switch>
|ATU:
|FID MAC 0123 Age OBT Pass Static Reprio Prio
| 0 01:1b:19:00:00:00 1100 1 X X 6
| 1 01:00:5e:00:01:81 1100 1 X X 6
| 2 33:33:00:00:01:81 1100 1 X X 6
| 3 01:80:c2:00:00:0e 1100 1 X X X 6
| 4 01:00:5e:00:00:6b 1100 1 X X X 6
| 5 33:33:00:00:00:6b 1100 1 X X X 6
| 6 01:80:c2:00:00:00 1100 1 X X X 6
Changes since v1:
* Target net-next, as this never worked correctly and is not critical
* Add STP and PTP over UDP rules
* Use pass_blocked for PDelay messages only (Richard Cochran)
====================
Dan Carpenter [Wed, 10 Nov 2021 07:50:23 +0000 (10:50 +0300)]
ice: Remove unnecessary casts
The "bitmap" variable is already an unsigned long so there is no need
for this cast.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tony Nguyen [Thu, 7 Oct 2021 23:00:23 +0000 (16:00 -0700)]
ice: Remove excess error variables
ice_status previously had a variable to contain these values where other
error codes had a variable as well. With ice_status now being an int,
there is no need for two variables to hold error values. In cases where
this occurs, remove one of the excess variables and use a single one.
Some initialization of variables are no longer needed and have been
removed.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Tony Nguyen [Thu, 7 Oct 2021 22:58:01 +0000 (15:58 -0700)]
ice: Remove enum ice_status
Replace uses of ice_status to, as equivalent as possible, error codes.
Remove enum ice_status and its helper conversion function as they are no
longer needed.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Tony Nguyen [Thu, 7 Oct 2021 22:56:57 +0000 (15:56 -0700)]
ice: Use int for ice_status
To prepare for removal of ice_status, change the variables from
ice_status to int. This eases the transition when values are changed to
return standard int error codes over enum ice_status.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Tony Nguyen [Thu, 7 Oct 2021 22:56:02 +0000 (15:56 -0700)]
ice: Remove string printing for ice_status
Remove the ice_stat_str() function which prints the string
representation of the ice_status error code. With upcoming changes
moving away from ice_status, there will be no need for this function.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Wojciech Drewek [Thu, 7 Oct 2021 22:54:37 +0000 (15:54 -0700)]
ice: Refactor status flow for DDP load
Before this change, final state of the DDP pkg load process was
dependent on many variables such as: ice_status, pkg version,
ice_aq_err. The last one had be stored in hw->pkg_dwnld_status.
It was impossible to conclude this state just from ice_status, that's
why logging process of DDP pkg load in the caller was a little bit
complicated.
With this patch new status enum is introduced - ice_ddp_state.
It covers all the possible final states of the loading process.
What's tricky for ice_ddp_state is that not only
ICE_DDP_PKG_SUCCESS(=0) means that load was successful. Actually
three states mean that:
- ICE_DDP_PKG_SUCCESS
- ICE_DDP_PKG_SAME_VERSION_ALREADY_LOADED
- ICE_DDP_PKG_COMPATIBLE_ALREADY_LOADED
ice_is_init_pkg_successful can tell that information.
One ddp_state should not be used outside of ice_init_pkg which is
ICE_DDP_PKG_ALREADY_LOADED. It is more generic, it is used in
ice_dwnld_cfg_bufs to see if pkg is already loaded. At this point
we can't use one of the specific one (SAME_VERSION, COMPATIBLE,
NOT_SUPPORTED) because we don't have information on the package
currently loaded in HW (we are before calling ice_get_pkg_info).
We can get rid of hw->pkg_dwnld_status because we are immediately
mapping aq errors to ice_ddp_state in ice_dwnld_cfg_bufs.
Other errors like ICE_ERR_NO_MEMORY, ICE_ERR_PARAM are mapped the
generic ICE_DDP_PKG_ERR.
Suggested-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Brett Creeley [Tue, 2 Mar 2021 18:15:34 +0000 (10:15 -0800)]
ice: Refactor promiscuous functions
Some of the promiscuous mode functions take a boolean to indicate
set/clear, which affects readability. Refactor and provide an
interface for the promiscuous mode code with explicit set and clear
promiscuous mode operations.
Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jeff Guo [Fri, 16 Jul 2021 22:16:43 +0000 (15:16 -0700)]
ice: refactor PTYPE validating
Since the capability of a PTYPE within a specific package could be
negotiated by checking the HW bit map, it means that there's no need
to maintain a different PTYPE list for each type of the package when
parsing PTYPE. So refactor the PTYPE validating mechanism.
Signed-off-by: Jeff Guo <jia.guo@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Haiyue Wang [Fri, 16 Jul 2021 22:16:42 +0000 (15:16 -0700)]
ice: Add package PTYPE enable information
Scan the 'Marker Ptype TCAM' section to retrieve the Rx parser PTYPE
enable information from the current package.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Dany Madden [Tue, 14 Dec 2021 05:17:48 +0000 (00:17 -0500)]
ibmvnic: remove unused defines
IBMVNIC_STATS_TIMEOUT and IBMVNIC_INIT_FAILED are not used in the driver.
Remove them.
Suggested-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Changcheng Deng [Tue, 14 Dec 2021 11:44:47 +0000 (11:44 +0000)]
pktgen: use min() to make code cleaner
Use min() in order to make code cleaner. Issue found by coccinelle.
Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
It seems that the DSA tagger-owned storage changes were insufficiently
tested and do not work in all cases. Specifically, the NXP Bluebox 3
(arch/arm64/boot/dts/freescale/fsl-lx2160a-bluebox3.dts) got broken by
these changes, because
(a) I forgot that DSA_TAG_PROTO_SJA1110 exists and differs from
DSA_TAG_PROTO_SJA1105
(b) the Bluebox 3 uses a DSA switch tree with 2 switches, and the
tagger-owned storage patches don't cover that use case well, it
seems
Therefore, I'm sorry to say that there needs to be an API fixup: tagging
protocol drivers will from now on connect to individual switches from a
tree, rather than to the tree as a whole. This is more robust against
various ordering constraints in the DSA probe and teardown paths, and is
also symmetrical with the connection API exposed to the switch drivers
themselves, which is also per switch.
With these changes, the Bluebox 3 also works fine.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 14 Dec 2021 01:45:36 +0000 (03:45 +0200)]
net: dsa: make tagging protocols connect to individual switches from a tree
On the NXP Bluebox 3 board which uses a multi-switch setup with sja1105,
the mechanism through which the tagger connects to the switch tree is
broken, due to improper DSA code design. At the time when tag_ops->connect()
is called in dsa_port_parse_cpu(), DSA hasn't finished "touching" all
the ports, so it doesn't know how large the tree is and how many ports
it has. It has just seen the first CPU port by this time. As a result,
this function will call the tagger's ->connect method too early, and the
tagger will connect only to the first switch from the tree.
This could be perhaps addressed a bit more simply by just moving the
tag_ops->connect(dst) call a bit later (for example in dsa_tree_setup),
but there is already a design inconsistency at present: on the switch
side, the notification is on a per-switch basis, but on the tagger side,
it is on a per-tree basis. Furthermore, the persistent storage itself is
per switch (ds->tagger_data). And the tagger connect and disconnect
procedures (at least the ones that exist currently) could see a fair bit
of simplification if they didn't have to iterate through the switches of
a tree.
To fix the issue, this change transforms tag_ops->connect(dst) into
tag_ops->connect(ds) and moves it somewhere where we already iterate
over all switches of a tree. That is in dsa_switch_setup_tag_protocol(),
which is a good placement because we already have there the connection
call to the switch side of things.
As for the dsa_tree_bind_tag_proto() method (called from the code path
that changes the tag protocol), things are a bit more complicated
because we receive the tree as argument, yet when we unwind on errors,
it would be nice to not call tag_ops->disconnect(ds) where we didn't
previously call tag_ops->connect(ds). We didn't have this problem before
because the tag_ops connection operations passed the entire dst before,
and this is more fine grained now. To solve the error rewind case using
the new API, we have to create yet one more cross-chip notifier for
disconnection, and stay connected with the old tag protocol to all the
switches in the tree until we've succeeded to connect with the new one
as well. So if something fails half way, the whole tree is still
connected to the old tagger. But there may still be leaks if the tagger
fails to connect to the 2nd out of 3 switches in a tree: somebody needs
to tell the tagger to disconnect from the first switch. Nothing comes
for free, and this was previously handled privately by the tagging
protocol driver before, but now we need to emit a disconnect cross-chip
notifier for that, because DSA has to take care of the unwind path. We
assume that the tagging protocol has connected to a switch if it has set
ds->tagger_data to something, otherwise we avoid calling its
disconnection method in the error rewind path.
The rest of the changes are in the tagging protocol drivers, and have to
do with the replacement of dst with ds. The iteration is removed and the
error unwind path is simplified, as mentioned above.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 14 Dec 2021 01:45:35 +0000 (03:45 +0200)]
net: dsa: sja1105: fix broken connection with the sja1110 tagger
The driver was incorrectly converted assuming that "sja1105" is the only
tagger supported by this driver. This results in SJA1110 switches
failing to probe:
sja1105 spi1.0: Unable to connect to tag protocol "sja1110": -EPROTONOSUPPORT
sja1105: probe of spi1.2 failed with error -93
Add DSA_TAG_PROTO_SJA1110 to the list of supported taggers by the
sja1105 driver. The sja1105_tagger_data structure format is common for
the two tagging protocols.
Fixes: c79e84866d2a ("net: dsa: tag_sja1105: convert to tagger-owned data") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 14 Dec 2021 01:45:34 +0000 (03:45 +0200)]
net: dsa: tag_sja1105: fix zeroization of ds->priv on tag proto disconnect
The method was meant to zeroize ds->tagger_data but got the wrong
pointer. Fix this.
Fixes: c79e84866d2a ("net: dsa: tag_sja1105: convert to tagger-owned data") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 14 Dec 2021 01:39:02 +0000 (17:39 -0800)]
ethtool: fix null-ptr-deref on ref tracker
dev can be a NULL here, not all requests set require_dev.
Fixes: e4b8954074f6 ("netlink: add net device refcount tracker to struct ethnl_req_info") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Dec 2021 12:28:25 +0000 (12:28 +0000)]
Merge branch 'hwtstamp_bonding'
Hangbin Liu says:
====================
net: add new hwtstamp flag HWTSTAMP_FLAG_BONDED_PHC_INDEX
This patchset add a new hwtstamp_config flag HWTSTAMP_FLAG_BONDED_PHC_INDEX.
When user want to get bond active interface's PHC, they need to add this flag
and aware the PHC index may changed.
v3: Use bitwise test to check the flags validation
v2: rename the flag to HWTSTAMP_FLAG_BONDED_PHC_INDEX
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Fri, 10 Dec 2021 08:59:59 +0000 (16:59 +0800)]
Bonding: force user to add HWTSTAMP_FLAG_BONDED_PHC_INDEX when get/set HWTSTAMP
When there is a failover, the PHC index of bond active interface will be
changed. This may break the user space program if the author didn't aware.
By setting this flag, the user should aware that the PHC index get/set
by syscall is not stable. And the user space is able to deal with it.
Without this flag, the kernel will reject the request forwarding to
bonding.
Reported-by: Jakub Kicinski <kuba@kernel.org> Fixes: 94dd016ae538 ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Fri, 10 Dec 2021 08:59:58 +0000 (16:59 +0800)]
net_tstamp: add new flag HWTSTAMP_FLAG_BONDED_PHC_INDEX
Since commit 94dd016ae538 ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP
ioctl to active device") the user could get bond active interface's
PHC index directly. But when there is a failover, the bond active
interface will change, thus the PHC index is also changed. This may
break the user's program if they did not update the PHC timely.
This patch adds a new hwtstamp_config flag HWTSTAMP_FLAG_BONDED_PHC_INDEX.
When the user wants to get the bond active interface's PHC, they need to
add this flag and be aware the PHC index may be changed.
With the new flag. All flag checks in current drivers are removed. Only
the checking in net_hwtstamp_validate() is kept.
Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Maor Gottlieb [Wed, 1 Dec 2021 19:36:21 +0000 (11:36 -0800)]
RDMA/mlx5: Add support to multiple priorities for FDB rules
Currently, the driver ignores the user's priority for flow steering
rules in FDB namespace. Change it and create the rule in the right
priority.
It will allow to create FDB steering rules in up to 16 different
priorities.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Maor Gottlieb [Wed, 1 Dec 2021 19:36:18 +0000 (11:36 -0800)]
net/mlx5: Separate FDB namespace
This patch doesn't add an additional namespaces, but just separates the
naming to be used by each FDB user, bypass and kernel.
Downstream patches will actually split this up and allow to have more
than single priority for the bypass users.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Lorenzo Bianconi [Mon, 13 Dec 2021 11:23:22 +0000 (12:23 +0100)]
net: mtk_eth: add COMPILE_TEST support
Improve the build testing of mtk_eth drivers by enabling them when
COMPILE_TEST is selected. Moreover COMPILE_TEST will be useful
for the driver development.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dev: Always serialize on Qdisc::busylock in __dev_xmit_skb() on PREEMPT_RT.
The root-lock is dropped before dev_hard_start_xmit() is invoked and after
setting the __QDISC___STATE_RUNNING bit. If the Qdisc owner is preempted
by another sender/task with a higher priority then this new sender won't
be able to submit packets to the NIC directly instead they will be
enqueued into the Qdisc. The NIC will remain idle until the Qdisc owner
is scheduled again and finishes the job.
By serializing every task on the ->busylock then the task will be
preempted by a sender only after the Qdisc has no owner.
Always serialize on the busylock on PREEMPT_RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Li [Mon, 13 Dec 2021 09:54:13 +0000 (17:54 +0800)]
mt76: remove variable set but not used
The code that uses variable queued has been removed,
and "mt76_is_usb(dev) ? q->ndesc - q->queued : q->queued"
didn't do anything, so all they should be removed as well.
Eliminate the following clang warnings:
drivers/net/wireless/mediatek/mt76/debugfs.c:77:9: warning: variable
‘queued’ set but not used.
Reported-by: Abaci Robot <abaci@linux.alibaba.com> Fixes: 2d8be76c1674 ("mt76: debugfs: improve queue node readability") Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Horatiu Vultur [Sat, 11 Dec 2021 21:44:20 +0000 (22:44 +0100)]
net: lan966x: Fix the configuration of the pcs
When inserting a SFP that runs at 2.5G, then the Serdes was still
configured to run at 1G. Because the config->speed was 0, and then the
speed of the serdes was not configured at all, it was using the default
value which is 1G. This patch stop calling the serdes function set_speed
and allow the serdes to figure out the speed based on the interface
type.
Fixes: d28d6d2e37d10d ("net: lan966x: add port module support") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 13 Dec 2021 14:15:41 +0000 (14:15 +0000)]
Merge branch 'mse102x-support'
Stefan Wahren says:
====================
add Vertexcom MSE102x support
This patch series adds support for the Vertexcom MSE102x Homeplug GreenPHY
chips [1]. They can be connected either via RGMII, RMII or SPI to a host CPU.
These patches handles only the last one, with an Ethernet over SPI protocol
driver.
The code has been tested only on Raspberry Pi boards, but should work
on other platforms.
Changes in V3:
- drop IF_PORT_HOMEPLUG again, since it's actually not used
Changes in V2:
- improve lock handling for RX & TX path
- add new patch to introduce IF_PORT_HOMEPLUG as suggested by Andrew Lunn
- address all the comments by Jakub Kicinski, Andrew Lunn, Kernel test robot
net: mvneta: mark as a legacy_pre_march2020 driver
mvneta provides mac_an_restart and mac_pcs_get_state methods, so needs
to be marked as a legacy driver. Marek spotted that mvneta had stopped
working in 2500base-X mode - thanks for reporting.
Reported-by: Marek Behún <kabel@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>