Bin Chen [Wed, 11 May 2022 11:39:31 +0000 (13:39 +0200)]
rtnetlink: verify rate parameters for calls to ndo_set_vf_rate
When calling ndo_set_vf_rate() the max_tx_rate parameter may be zero,
in which case the setting is cleared, or it must be greater or equal to
min_tx_rate.
Enforce this requirement on all calls to ndo_set_vf_rate via a wrapper
which also only calls ndo_set_vf_rate() if defined by the driver.
Based on work by Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Bin Chen <bin.chen@corigine.com> Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean [Wed, 11 May 2022 09:42:00 +0000 (12:42 +0300)]
net: enetc: kill PHY-less mode for PFs
Right now, a PHY-less port (no phy-mode, no fixed-link, no phy-handle)
doesn't register with phylink, but calls netif_carrier_on() from
enetc_start().
This makes sense for a VF, but for a PF, this is braindead, because we
never call enetc_mac_enable() so the MAC is left inoperational.
Furthermore, commit 71b77a7a27a3 ("enetc: Migrate to PHYLINK and
PCS_LYNX") put the nail in the coffin because it removed the initial
netif_carrier_off() call done right after register_netdev().
Without that call, netif_carrier_on() does not call
linkwatch_fire_event(), so the operstate remains IF_OPER_UNKNOWN.
Just deny the broken configuration by requiring that a phy-mode is
present, and always register a PF with phylink.
Kees Cook [Wed, 11 May 2022 02:53:01 +0000 (19:53 -0700)]
fortify: Provide a memcpy trap door for sharp corners
As we continue to narrow the scope of what the FORTIFY memcpy() will
accept and build alternative APIs that give the compiler appropriate
visibility into more complex memcpy scenarios, there is a need for
"unfortified" memcpy use in rare cases where combinations of compiler
behaviors, source code layout, etc, result in cases where the stricter
memcpy checks need to be bypassed until appropriate solutions can be
developed (i.e. fix compiler bugs, code refactoring, new API, etc). The
intention is for this to be used only if there's no other reasonable
solution, for its use to include a justification that can be used
to assess future solutions, and for it to be temporary.
Example usage included, based on analysis and discussion from:
https://lore.kernel.org/netdev/CANn89iLS_2cshtuXPyNUGDPaic=sJiYfvTb_wNLgWrZRyBxZ_g@mail.gmail.com
Cc: Jakub Kicinski <kuba@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Coco Li <lixiaoyan@google.com> Cc: Tariq Toukan <tariqt@nvidia.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220511025301.3636666-1-keescook@chromium.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
Count tc-taprio window drops in enetc driver
This series includes a patch from Po Liu (no longer with NXP) which
counts frames dropped by the tc-taprio offload in ethtool -S and in
ndo_get_stats64. It also contains a preparation patch from myself.
====================
Po Liu [Tue, 10 May 2022 16:36:15 +0000 (19:36 +0300)]
net: enetc: count the tc-taprio window drops
The enetc scheduler for IEEE 802.1Qbv has 2 options (depending on
PTGCR[TG_DROP_DISABLE]) when we attempt to send an oversized packet
which will never fit in its allotted time slot for its traffic class:
either block the entire port due to head-of-line blocking, or drop the
packet and set a bit in the writeback format of the transmit buffer
descriptor, allowing other packets to be sent.
We obviously choose the second option in the driver, but we do not
detect the drop condition, so from the perspective of the network stack,
the packet is sent and no error counter is incremented.
This change checks the writeback of the TX BD when tc-taprio is enabled,
and increments a specific ethtool statistics counter and a generic
"tx_dropped" counter in ndo_get_stats64.
Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 10 May 2022 16:36:14 +0000 (19:36 +0300)]
net: enetc: manage ENETC_F_QBV in priv->active_offloads only when enabled
Future work in this driver would like to look at priv->active_offloads &
ENETC_F_QBV to determine whether a tc-taprio qdisc offload was
installed, but this does not produce the intended effect.
All the other flags in priv->active_offloads are managed dynamically,
except ENETC_F_QBV which is set statically based on the probed SI capability.
This change makes priv->active_offloads & ENETC_F_QBV really track the
presence of a tc-taprio schedule on the port.
Some existing users, like the enetc_sched_speed_set() call from
phylink_mac_link_up(), are best kept using the old logic: the tc-taprio
offload does not re-trigger another link mode resolve, so the scheduler
needs to be functional from the get go, as long as Qbv is supported at
all on the port. So to preserve functionality there, look at the static
station interface capability from pf->si->hw_features instead.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 11 May 2022 23:14:15 +0000 (16:14 -0700)]
Merge branch 'macb-napi-improvements'
Robert Hancock says:
====================
MACB NAPI improvements
Simplify the logic in the Cadence MACB/GEM driver for determining
when to reschedule NAPI processing, and update it to use NAPI for the
TX path as well as the RX path.
Changes since v1: Changed to use separate TX and RX NAPI instances and
poll functions to avoid unnecessary checks of the other ring (TX/RX)
states during polling and to use budget handling for both RX and TX.
Fixed locking to protect against concurrent access to TX ring on
TX transmit and TX poll paths.
====================
Robert Hancock [Mon, 9 May 2022 19:46:35 +0000 (13:46 -0600)]
net: macb: use NAPI for TX completion path
This driver was using the TX IRQ handler to perform all TX completion
tasks. Under heavy TX network load, this can cause significant irqs-off
latencies (found to be in the hundreds of microseconds using ftrace).
This can cause other issues, such as overrunning serial UART FIFOs when
using high baud rates with limited UART FIFO sizes.
Switch to using a NAPI poll handler to perform the TX completion work
to get this out of hard IRQ context and avoid the IRQ latency impact. A
separate NAPI instance is used for TX and RX to avoid checking the other
ring's state unnecessarily when doing the poll, and so that the NAPI
budget handling can work for both TX and RX packets.
A new per-queue tx_ptr_lock spinlock has been added to avoid using the
main device lock (with IRQs needing to be disabled) across the entire TX
mapping operation, and also to protect the TX queue pointers from
concurrent access between the TX start and TX poll operations.
The TX Used Bit Read interrupt (TXUBR) handling also needs to be moved into
the TX NAPI poll handler to maintain the proper order of operations. A flag
is used to notify the poll handler that a UBR condition needs to be
handled. The macb_tx_restart handler has had some locking added for global
register access, since this could now potentially happen concurrently on
different queues.
Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Robert Hancock [Mon, 9 May 2022 19:46:34 +0000 (13:46 -0600)]
net: macb: simplify/cleanup NAPI reschedule checking
Previously the macb_poll method was checking the RSR register after
completing its RX receive work to see if additional packets had been
received since IRQs were disabled, since this controller does not
maintain the pending IRQ status across IRQ disable. It also had to
double-check the register after re-enabling IRQs to detect if packets
were received after the first check but before IRQs were enabled.
Using the RSR register for this purpose is problematic since it reflects
the global device state rather than the per-queue state, so if packets
are being received on multiple queues it may end up retriggering receive
on a queue where the packets did not actually arrive and not on the one
where they did arrive. This will also cause problems with an upcoming
change to use NAPI for the TX path where use of multiple queues is more
likely.
Add a macb_rx_pending function to check the RX ring to see if more
packets have arrived in the queue, and use that to check if NAPI should
be rescheduled rather than the RSR register. By doing this, we can just
ignore the global RSR register entirely, and thus save some extra device
register accesses at the same time.
This also makes the previous first check for pending packets rather
redundant, since it would be checking the RX ring state which was just
checked in the receive work function. Therefore we can get rid of it and
just check after enabling interrupts whether packets are already
pending.
Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 10 May 2022 22:09:04 +0000 (01:09 +0300)]
selftests: forwarding: tc_actions: allow mirred egress test to run on non-offloaded h2
The host interfaces $h1 and $h2 don't have to be switchdev interfaces,
but due to the fact that we pass $tcflags which may have the value of
"skip_sw", we force $h2 to offload a drop rule for dst_ip, something
which it may not be able to do.
The selftest only wants to verify the hit count of this rule as a means
of figuring out whether the packet was received, so remove the $tcflags
for it and let it be done in software.
Jakub Kicinski [Wed, 11 May 2022 22:11:32 +0000 (15:11 -0700)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
1GbE Intel Wired LAN Driver Updates 2022-05-10
This series contains updates to igc driver only.
Sasha cleans up the code by removing an unused function and removing an
enum for PHY type as there is only one PHY. The return type for
igc_check_downshift() is changed to void as it always returns success.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
igc: Change type of the 'igc_check_downshift' method
igc: Remove unused phy_type enum
igc: Remove igc_set_spd_dplx method
====================
Jakub Kicinski [Mon, 9 May 2022 15:05:32 +0000 (08:05 -0700)]
eth: amd: remove NI6510 support (ni65)
Looks like all the changes to this driver had been tree-wide
refactoring since git era begun. The driver is using virt_to_bus()
we should make it use more modern DMA APIs but since it's unlikely
to be getting any use these days delete it instead. We can always
revert to bring it back.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 9 May 2022 15:01:30 +0000 (08:01 -0700)]
net: appletalk: remove Apple/Farallon LocalTalk PC support
Looks like all the changes to this driver had been tree-wide
refactoring since git era begun. The driver is using virt_to_bus()
we should make it use more modern DMA APIs but since it's unlikely
to be getting any use these days delete it instead. We can always
revert to bring it back.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 10 May 2022 03:57:40 +0000 (20:57 -0700)]
net: remove two BUG() from skb_checksum_help()
I have a syzbot report that managed to get a crash in skb_checksum_help()
If syzbot can trigger these BUG(), it makes sense to replace
them with more friendly WARN_ON_ONCE() since skb_checksum_help()
can instead return an error code.
Note that syzbot will still crash there, until real bug is fixed.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 10 May 2022 03:57:38 +0000 (20:57 -0700)]
net: add CONFIG_DEBUG_NET
This config option enables network debugging checks.
This patch adds DEBUG_NET_WARN_ON_ONCE(cond)
Note that this is not a replacement for WARN_ON_ONCE(cond)
as (cond) is not evaluated if CONFIG_DEBUG_NET is not set.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 May 2022 11:12:27 +0000 (12:12 +0100)]
Merge tag 'mlx5-updates-2022-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2022-05-09
1) Gavin Li, adds exit route from waiting for FW init on device boot and
increases FW init timeout on health recovery flow
2) Support 4 ports HCAs LAG mode
Mark Bloch Says:
================
This series adds to mlx5 drivers support for 4 ports HCAs.
Starting with ConnectX-7 HCAs with 4 ports are possible.
As most driver parts aren't affected by such configuration most driver
code is unchanged.
Specially the only affected areas are:
- Lag
- Devcom
- Merged E-Switch
- Single FDB E-Switch
Lag was chosen to be converted first. Creating hardware LAG when all 4
ports are added to the same bond device.
Devom, merge E-Switch and single FDB E-Switch, are marked as supporting
only 2 ports HCAs and future patches will add support for 4 ports HCAs.
In order to activate the hardware lag a user can execute the:
ip link add bond0 type bond
ip link set bond0 type bond miimon 100 mode 2
ip link set eth2 master bond0
ip link set eth3 master bond0
ip link set eth4 master bond0
ip link set eth5 master bond0
Where eth2, eth3, eth4 and eth5 are the PFs of the same HCA.
================
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net: phy: add comments for LAN8742 phy support
Add comments for 0xfffffff2 phy ID mask for the LAN8742 and the LAN88xx, explaining that they can coexist and allow future hardware revisions.
Also add one missing tab in smsc.c.
====================
====================
docs: document some aspects of struct sk_buff
This small set creates a place to render sk_buff documentation,
documents one random thing (data-only skbs) and converts the big
checksum comment to kdoc.
====================
Jakub Kicinski [Mon, 9 May 2022 16:04:56 +0000 (09:04 -0700)]
skbuff: render the checksum comment to documentation
Long time ago Tom added a giant comment to skbuff.h explaining
checksums. Now that we have a place in Documentation for skbuff
docs we should render it. Sprinkle some markup while at it.
Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 9 May 2022 16:04:54 +0000 (09:04 -0700)]
skbuff: add a basic intro doc
Add basic skb documentation. It's mostly an intro to the subsequent
patches - it would looks strange if we documented advanced topics
without covering the basics in any way.
Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Move Siena into a separate subdirectory
The Siena NICs (SFN5000 and SFN6000 series) went EOL in November 2021.
Most of these adapters have been remove from our test labs, and testing
has been reduced to a minimum.
This patch series creates a separate kernel module for the Siena architecture,
analogous to what was done for Falcon some years ago.
This reduces our maintenance for the sfc.ko module, and allows us to
enhance the EF10 and EF100 drivers without the risk of breaking Siena NICs.
After this series further enhancements are needed to differentiate the
new kernel module from sfc.ko, and the Siena code can be removed from sfc.ko.
Thes will be posted as a small follow-up series.
The Siena module is not built by default, but can be enabled
using Kconfig option SFC_SIENA. This will create module sfc-siena.ko.
Patches
Patches 1-3 establish the code base for the Siena driver.
Patches 4-10 ensure the allyesconfig build succeeds.
Patch 11 adds the basic Siena module.
I do not expect patch 1 through 3 to be reviewed, they are FYI only.
No checkpatch issues were resolved as part of these, but they
were fixed in the subsequent patches.
Testing
Various build tests were done such as allyesconfig, W=1 and sparse.
The new sfc-siena.ko and sfc.ko modules were tested on a machine with both
these NICs in them, and several tests were run on both drivers.
====================
Martin Habets [Mon, 9 May 2022 15:32:58 +0000 (16:32 +0100)]
sfc/siena: Rename functions in nic_common.h to avoid conflicts with sfc
For siena use efx_siena_ as the function prefix.
efx_nic_update_stats_atomic is only used in efx_common.c, so move
it there.
efx_nic_copy_stats is not used in Siena, so it is removed.
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin Habets [Mon, 9 May 2022 15:32:33 +0000 (16:32 +0100)]
sfc/siena: Rename peripheral functions to avoid conflicts with sfc
For siena use efx_siena_ as the function prefix.
This patch covers selftest.h, ptp.h, net_driver.h and ethtool_common.h.
efx_ethtool_fill_self_tests() can become static.
Some functions in ptp.c can also become static.
Rename loopback_mode in net_driver.h.
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin Habets [Mon, 9 May 2022 15:32:20 +0000 (16:32 +0100)]
sfc/siena: Rename RX/TX functions to avoid conflicts with sfc
For siena use efx_siena_ as the function prefix.
Several functions are not used in Siena, so they are removed.
Use a Siena specific variable name for module parameter
efx_separate_tx_channels.
Move efx_fini_tx_queue() to avoid a forward declaration of
efx_dequeue_buffer().
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin Habets [Mon, 9 May 2022 15:32:08 +0000 (16:32 +0100)]
sfc/siena: Rename functions in efx headers to avoid conflicts with sfc
When building with allyesconfig there are many identical
symbol names.
For siena use efx_siena_ as the function and variable prefix
to avoid build errors.
efx_mtd_remove_partition can become static as it is no longer called
from other files.
efx_ticks_to_usecs and efx_xmit_done_single are not used in Siena, so
they are removed.
Several functions are only used inside efx_channels.c for Siena so
they can become static.
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin Habets [Mon, 9 May 2022 15:31:55 +0000 (16:31 +0100)]
sfc/siena: Remove build references to missing functionality
Functionality not supported or needed on Siena includes:
- Anything for EF100
- EF10 specifics such as register access, PIO and TSO offload.
Also only bind to Siena NICs.
Remove EF10 specifics from nic.h.
The functions that start with efx_farch_ will be removed from sfc.ko
with a subsequent patch.
Add the efx_ prefix to siena_prepare_flush() to make it consistent
with the other APIs.
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Louis Peens [Tue, 10 May 2022 07:48:45 +0000 (09:48 +0200)]
nfp: flower: fix 'variable 'flow6' set but not used'
Kernel test robot reported an issue after a recent patch about an
unused variable when CONFIG_IPV6 is disabled. Move the variable
declaration to be inside the #ifdef, and do a bit more cleanup. There
is no need to use a temporary ipv6 bool value, it is just checked once,
remove the extra variable and just do the check directly.
Fixes: 9d5447ed44b5 ("nfp: flower: fixup ipv6/ipv4 route lookup for neigh events") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20220510074845.41457-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Complete to commit 8e153faf5827 ("igc: Remove unused phy type")
i225 parts have only one PHY. There is no point to use phy_type enum.
Clean up the code accordingly, and get rid of the unused enum lines.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Colin Ian King [Sun, 8 May 2022 21:45:00 +0000 (22:45 +0100)]
x25: remove redundant pointer dev
Pointer dev is being assigned a value that is never used, the assignment
and the variable are redundant and can be removed. Also replace null check
with the preferred !ptr idiom.
Cleans up clang scan warning:
net/x25/x25_proc.c:94:26: warning: Although the value stored to 'dev' is
used in the enclosing expression, the value is never actually read
from 'dev' [deadcode.DeadStores]
====================
This is a patch series for Ethernet driver of Sunplus SP7021 SoC.
Sunplus SP7021 is an ARM Cortex A7 (4 cores) based SoC. It integrates
many peripherals (ex: UART, I2C, SPI, SDIO, eMMC, USB, SD card and
etc.) into a single chip. It is designed for industrial control
applications.
Refer to:
https://sunplus.atlassian.net/wiki/spaces/doc/overview
https://tibbo.com/store/plus1.html
====================
====================
ptp: Support hardware clocks with additional free running cycle counter
ptp vclocks require a clock with free running time for the timecounter.
Currently only a physical clock forced to free running is supported.
If vclocks are used, then the physical clock cannot be synchronized
anymore. The synchronized time is not available in hardware in this
case. As a result, timed transmission with TAPRIO hardware support
is not possible anymore.
If hardware would support a free running time additionally to the
physical clock, then the physical clock does not need to be forced to
free running. Thus, the physical clocks can still be synchronized while
vclocks are in use.
The physical clock could be used to synchronize the time domain of the
TSN network and trigger TAPRIO. In parallel vclocks can be used to
synchronize other time domains.
One year ago I thought for two time domains within a TSN network also
two physical clocks are required. This would lead to new kernel
interfaces for asking for the second clock, ... . But actually for a
time triggered system like TSN there can be only one time domain that
controls the system itself. All other time domains belong to other
layers, but not to the time triggered system itself. So other time
domains can be based on a free running counter if similar mechanisms
like 2 step synchroisation are used.
Synchronisation was tested with two time domains between two directly
connected hosts. Each host run two ptp4l instances, the first used the
physical clock and the second used the virtual clock. I used my FPGA
based network controller as network device. ptp4l was used in
combination with the virtual clock support patches from Miroslav
Lichvar.
v4:
- if_index of 0 is invalid (Jonathan Lemon)
- set if_index to 0 in the SOF_TIMESTAMPING_RAW_HARDWARE block (Jonathan
Lemon)
- add helper function for netdev_get_tstamp() call (Jonathan Lemon)
- update SKBTX_ANY_TSTAMP (Paolo Abeni)
- use separate bits for new tx_flags (Richard Cochran)
v3:
- optimize ptp_convert_timestamp (Richard Cochran)
- call dev_get_by_napi_id() only if needed (Richard Cochran)
- use non-negated logical test (Richard Cochran)
- add comment for skipped output (Richard Cochran)
- add comment for SKBTX_HW_TSTAMP_USE_CYCLES masking (Richard Cochran)
The TSN endpoint Ethernet MAC supports a free running counter
additionally to its clock. This free running counter can be read and
hardware timestamps are supported. As the name implies, this counter
cannot be set and its frequency cannot be adjusted.
Add free running cycle counter support based on this free running
counter to physical clock. This also requires hardware time stamps
based on that free running counter.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
ptp_convert_timestamp() is called in the RX path of network messages.
The current implementation takes ~5000ns on 1.2GHz A53. This is too much
for the hot path of packet processing.
Introduce hash table for fast vclock lookup in ptp_convert_timestamp().
The execution time of ptp_convert_timestamp() is reduced to ~700ns on
1.2GHz A53.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
If a physical clock supports a free running cycle counter, then
timestamps shall be based on this time too. For TX it is known in
advance before the transmission if a timestamp based on the free running
cycle counter is needed. For RX it is impossible to know which timestamp
is needed before the packet is received and assigned to a socket.
Support late timestamp determination by a network device. Therefore, an
address/cookie is stored within the new netdev_data field of struct
skb_shared_hwtstamps. This address/cookie is provided to a new network
device function called ndo_get_tstamp(), which returns a timestamp based
on the normal/adjustable time or based on the free running cycle
counter. If function is not supported, then timestamp handling is not
changed.
This mechanism is intended for RX, but TX use is also possible.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
ptp_convert_timestamp() converts only the timestamp hwtstamp, which is
a field of the argument with the type struct skb_shared_hwtstamps *. So
a pointer to the hwtstamp field of this structure is sufficient.
Rework ptp_convert_timestamp() to use an argument of type ktime_t *.
This allows to add additional timestamp manipulation stages before the
call of ptp_convert_timestamp().
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The free running cycle counter of physical clocks called cycles shall be
used for hardware timestamps to enable synchronisation.
Introduce new flag SKBTX_HW_TSTAMP_USE_CYCLES, which signals driver to
provide a TX timestamp based on cycles if cycles are supported.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
ptp vclocks require a free running time for their timecounter.
Currently only a physical clock forced to free running is supported.
If vclocks are used, then the physical clock cannot be synchronized
anymore. The synchronized time is not available in hardware in this
case. As a result, timed transmission with TAPRIO hardware support
is not possible anymore.
If hardware would support a free running time additionally to the
physical clock, then the physical clock does not need to be forced to
free running. Thus, the physical clocks can still be synchronized
while vclocks are in use.
The physical clock could be used to synchronize the time domain of the
TSN network and trigger TAPRIO. In parallel vclocks can be used to
synchronize other time domains.
Introduce support for a free running cycle counter called cycles to
physical clocks. Rework ptp vclocks to use this free running cycle
counter. Default implementation is based on time of physical clock.
Thus, behavior of ptp vclocks based on physical clocks without free
running cycle counter is identical to previous behavior.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Fri, 6 May 2022 20:00:29 +0000 (13:00 -0700)]
eth: dpaa2-mac: remove a dead-code NULL check on fwnode parent
Since commit 4e30e98c4b4c ("dpaa2-mac: return -EPROBE_DEFER from dpaa2_mac_open in case the fwnode is not set")
@parent can't be NULL after the if. It's either the address
of the ->fwnode of @dpmacs or @fwnode in case of ACPI.
Mark Bloch [Tue, 15 Mar 2022 16:56:50 +0000 (16:56 +0000)]
net/mlx5: Lag, add debugfs to query hardware lag state
Lag state has become very complicated with many modes, flags, types and
port selections methods and future work will add additional features.
Add a debugfs to query the current lag state. A new directory named "lag"
will be created under the mlx5 debugfs directory. As the driver has
debugfs per pci function the location will be: <debugfs>/mlx5/<BDF>/lag
For example:
/sys/kernel/debug/mlx5/0000:08:00.0/lag
The following files are exposed:
- state: Returns "active" or "disabled". If "active" it means hardware
lag is active.
- members: Returns the BDFs of all the members of lag object.
- type: Returns the type of the lag currently configured. Valid only
if hardware lag is active.
* "roce" - Members are bare metal PFs.
* "switchdev" - Members are in switchdev mode.
* "multipath" - ECMP offloads.
- port_sel_mode: Returns the egress port selection method, valid
only if hardware lag is active.
* "queue_affinity" - Egress port is selected by
the QP/SQ affinity.
* "hash" - Egress port is selected by hash done on
each packet. Controlled by: xmit_hash_policy of the
bond device.
- flags: Returns flags that are specific per lag @type. Valid only if
hardware lag is active.
* "shared_fdb" - "on" or "off", if "on" single FDB is used.
- mapping: Returns the mapping which is used to select egress port.
Valid only if hardware lag is active.
If @port_sel_mode is "hash" returns the active egress ports.
The hash result will select only active ports.
if @port_sel_mode is "queue_affinity" returns the mapping
between the configured port affinity of the QP/SQ and actual
egress port. For example:
* 1:1 - Mapping means if the configured affinity is port 1
traffic will egress via port 1.
* 1:2 - Mapping means if the configured affinity is port 1
traffic will egress via port 2. This can happen
if port 1 is down or in active/backup mode and port 1
is backup.
Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Mark Bloch [Wed, 2 Mar 2022 15:38:50 +0000 (15:38 +0000)]
net/mlx5: Lag, use buckets in hash mode
When in hardware lag and the NIC has more than 2 ports when one port
goes down need to distribute the traffic between the remaining
active ports.
For better spread in such cases instead of using 1-to-1 mapping and only
4 slots in the hash, use many.
Each port will have many slots that point to it. When a port goes down
go over all the slots that pointed to that port and spread them between
the remaining active ports. Once the port comes back restore the default
mapping.
We will have number_of_ports * MLX5_LAG_MAX_HASH_BUCKETS slots.
Each MLX5_LAG_MAX_HASH_BUCKETS belong to a different port.
The native mapping is such that:
port 1: The first MLX5_LAG_MAX_HASH_BUCKETS slots are: [1, 1, .., 1]
which means if a packet is hased into one of this slots it will hit the
wire via port 1.
port 2: The second MLX5_LAG_MAX_HASH_BUCKETS slots are: [2, 2, .., 2]
which means if a packet is hased into one of this slots it will hit the
wire via port2.
and this mapping is the same of the rest of the ports.
On a failover, lets say port 2 goes down (port 1, 3, 4 are still up).
the new mapping for port 2 will be:
port 2: The second MLX5_LAG_MAX_HASH_BUCKETS are: [1, 3, 1, 4, .., 4]
which means the mapping was changed from the native mapping to a mapping
that consists of only the active ports.
With this if a port goes down the traffic will be split between the
active ports randomly
Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Mark Bloch [Tue, 1 Mar 2022 17:34:31 +0000 (17:34 +0000)]
net/mlx5: Lag, use hash when in roce lag on 4 ports
Downstream patches will add support for lag over 4 ports.
In that mode we will only use hash as the uplink selection method.
Using hash instead of queue affinity (before this patch) offers key
advantages like:
- Align ports selection method with the method used by the bond device
- Better packets distribution where a single queue can transmit from
multiple ports (with queue affinity a queue is bound to a single port
regardless of the packet being sent).
- In case of failover we traffic is split between multiple ports and not
a single one like in queue affinity.
Going forward it was decided that queue affinity will be deprecated
as using hash provides a better user experience which means on 4 ports
HCAs hash will always be used.
Future work will add hash support for 2 ports HCAs as well.
Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Mark Bloch [Tue, 1 Mar 2022 17:24:40 +0000 (17:24 +0000)]
net/mlx5: Lag, support single FDB only on 2 ports
E-Switch currently doesn't support more than 2 E-Switch managers
being aggregated under a single hardware lag. Have specific checks
to disallow creating lag when the code doesn't support it.
Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Mark Bloch [Sun, 27 Feb 2022 13:45:59 +0000 (13:45 +0000)]
net/mlx5: Lag, store number of ports inside lag object
Store the number of lag ports inside the lag object. Lag object is a single
shared object managing the lag state of multiple mlx5 devices on the same
physical HCA.
Downstream patches will allow hardware lag to be created over devices with
more than 2 ports.
Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Mark Bloch [Sun, 27 Feb 2022 12:40:39 +0000 (12:40 +0000)]
net/mlx5: Lag, filter non compatible devices
When search for a peer lag device we can filter based on that
device's capabilities.
Downstream patch will be less strict when filtering compatible devices
and remove the limitation where we require exact MLX5_MAX_PORTS and
change it to a range.
Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Gavin Li [Sun, 27 Mar 2022 14:36:44 +0000 (17:36 +0300)]
net/mlx5: Increase FW pre-init timeout for health recovery
Currently, health recovery will reload driver to recover it from fatal
errors. During the driver's load process, it would wait for FW to set the
pre-init bit for up to 120 seconds, beyond this threshold it would abort
the load process. In some cases, such as a FW upgrade on the DPU, this
timeout period is insufficient, and the user has no way to recover the
host device.
To solve this issue, introduce a new FW pre-init timeout for health
recovery, which is set to 2 hours.
The timeout for devlink reload and probe will use the original one because
they are user triggered flows, and therefore should not have a
significantly long timeout, during which the user command would hang.
Gavin Li [Sun, 27 Mar 2022 14:45:32 +0000 (17:45 +0300)]
net/mlx5: Add exit route when waiting for FW
Currently, removing a device needs to get the driver interface lock before
doing any cleanup. If the driver is waiting in a loop for FW init, there
is no way to cancel the wait, instead the device cleanup waits for the
loop to conclude and release the lock.
To allow immediate response to remove device commands, check the TEARDOWN
flag while waiting for FW init, and exit the loop if it has been set.
====================
nfp: support Corigine PCIE vendor ID
Historically the nfp driver has supported NFP chips with Netronome's
PCIE vendor ID. This patch extends the driver to also support NFP
chips, which at this point are assumed to be otherwise identical from
a software perspective, that have Corigine's PCIE vendor ID (0x1da8).
This patchset begins by cleaning up strings to make them:
* Vendor neutral for the NFP chip
* Relate to Corigine for the driver itself
It then adds support to the driver for the Corigine's PCIE vendor ID
====================
Yu Xiao [Sun, 8 May 2022 17:38:16 +0000 (19:38 +0200)]
nfp: support Corigine PCIE vendor ID
Historically the nfp driver has supported NFP chips with Netronome's
PCIE vendor ID. This patch extends the driver to also support NFP
chips, which at this point are assumed to be otherwise identical from
a software perspective, that have Corigine's PCIE vendor ID (0x1da8).
Also, Rename the macro definitions PCI_DEVICE_ID_NERTONEOME_NFPXXXX
to PCI_DEVICE_ID_NFPXXXX, as they are now used in conjunction with two
PCIE vendor IDs.
Signed-off-by: Yu Xiao <yu.xiao@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yu Xiao [Sun, 8 May 2022 17:38:15 +0000 (19:38 +0200)]
nfp: vendor neutral strings for chip and Corigne in strings for driver
Historically the nfp driver has supported NFP chips with Netronome's
PCIE vendor ID. In preparation for extending the to also support NFP
chips that have Corigine's PCIE vendor ID (0x1da8) make printk statements
relating to the chip vendor neutral.
An alternate approach is to set the string based on the PCI vendor ID.
In our judgement this proved to cumbersome so we have taken this simpler
approach.
Update strings relating to the driver to use Corigine, who have taken
over maintenance of the driver.
Signed-off-by: Yu Xiao <yu.xiao@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 10 May 2022 00:51:01 +0000 (17:51 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2022-05-06
Marcin Szycik says:
This patchset adds support for systemd defined naming scheme for port
representors, as well as re-enables displaying PCI bus-info in ethtool.
bus-info information has previously been removed from ethtool for port
representors, as a workaround for a bug in lshw tool, where the tool would
sometimes display wrong descriptions for port representors/PF. Now the bug
has been fixed in lshw tool [1].
Removing the workaround can be considered a regression (user might be
running an older, unpatched version of lshw) (see [2] for discussion).
However, calling SET_NETDEV_DEV also produces the same effect as removing
the workaround, i.e. lshw is able to access PCI bus-info (this time not
via ethtool, but in some other way) and the bug can occur.
Adding SET_NETDEV_DEV is important, as it greatly improves netdev naming -
- port representors are named based on PF name. Currently port representors
are named "ethX", which might be confusing, especially when spawning VFs on
multiple PFs. Furthermore, it's currently harder to determine to which PF
does a particular port representor belong, as bus-info is not shown in
ethtool.
Consider the following three cases:
Case 1: current code - driver workaround in place, no SET_NETDEV_DEV,
lshw with or without fix. Port representors are not displayed because they
don't have bus-info (the workaround), PFs are labelled correctly:
$ sudo ./lshw -c net -businfo
Bus info Device Class Description
========================================================
pci@0000:02:00.0 ens6f0 network Ethernet Controller E810-XXV for SFP <-- PF
pci@0000:02:00.1 ens6f1 network Ethernet Controller E810-XXV for SFP
pci@0000:02:01.0 ens6f0v0 network Ethernet Adaptive Virtual Function <-- VF
pci@0000:02:01.1 ens6f0v1 network Ethernet Adaptive Virtual Function
...
Case 2: driver workaround in place, SET_NETDEV_DEV, no lshw fix. Port
representors have predictable names. lshw is able to get bus-info because
of SET_NETDEV_DEV and netdevs CAN be mislabelled:
$ sudo ./lshw -c net -businfo
Bus info Device Class Description
=============================================================
pci@0000:02:00.0 ens6f0npf0vf60 network Ethernet Controller E810-XXV for SFP <-- mislabeled port representor
pci@0000:02:00.1 ens6f1 network Ethernet Controller E810-XXV for SFP
pci@0000:02:01.0 ens6f0v0 network Ethernet Adaptive Virtual Function
pci@0000:02:01.1 ens6f0v1 network Ethernet Adaptive Virtual Function
...
pci@0000:02:00.0 ens6f0npf0vf26 network Ethernet interface
pci@0000:02:00.0 ens6f0 network Ethernet interface <-- mislabeled PF
pci@0000:02:00.0 ens6f0npf0vf81 network Ethernet interface
...
$ sudo ethtool -i ens6f0npf0vf60
driver: ice
...
bus-info:
...
Output of lshw would be the same with workaround removed; it does not
change the fact that lshw labels netdevs incorrectly, while at the same
time it prevents ethtool from displaying potentially useful data
(bus-info).
Case 3: workaround removed, SET_NETDEV_DEV, lshw fix:
$ sudo ./lshw -c net -businfo
Bus info Device Class Description
=============================================================
pci@0000:02:00.0 ens6f0npf0vf73 network Ethernet Controller E810-XXV for SFP
pci@0000:02:00.1 ens6f1 network Ethernet Controller E810-XXV for SFP
pci@0000:02:01.0 ens6f0v0 network Ethernet Adaptive Virtual Function
pci@0000:02:01.1 ens6f0v1 network Ethernet Adaptive Virtual Function
...
pci@0000:02:00.0 ens6f0npf0vf5 network Ethernet Controller E810-XXV for SFP
pci@0000:02:00.0 ens6f0 network Ethernet Controller E810-XXV for SFP
pci@0000:02:00.0 ens6f0npf0vf60 network Ethernet Controller E810-XXV for SFP
...
$ sudo ethtool -i ens6f0npf0vf73
driver: ice
...
bus-info: 0000:02:00.0
...
In this case poort representors have predictable names, netdevs are not
mislabelled in lshw, and bus-info is shown in ethtool.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
Revert "ice: Hide bus-info in ethtool for PRs in switchdev mode"
ice: link representors to PCI device
====================
David S. Miller [Mon, 9 May 2022 13:30:38 +0000 (14:30 +0100)]
Merge branch 'hns3-next'
Guangbin Huang says:
====================
net: hns3: updates for -next
This series includes some updates for the HNS3 ethernet driver.
Change logs:
V1 -> V2:
- Fix some sparse warnings of patch 3# and 4#.
- Add patch #6 to fix sparse warnings of incorrect type of argument.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Guangbin Huang [Mon, 9 May 2022 07:55:32 +0000 (15:55 +0800)]
net: hns3: fix incorrect type of argument in declaration of function hclge_comm_get_rss_indir_tbl
The argument rss_ind_tbl_size is type u16 in function definition of
hclge_comm_get_rss_indir_tbl(), but it is set to type __le16 in function
declaration by mistake, so fix it.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jie Wang [Mon, 9 May 2022 07:55:30 +0000 (15:55 +0800)]
net: hns3: add byte order conversion for VF to PF mailbox message
This patch uses __le16/__32 to define mailbox data structures. Then byte
order conversion are added for mailbox messages from VF to PF.
Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jie Wang [Mon, 9 May 2022 07:55:29 +0000 (15:55 +0800)]
net: hns3: add byte order conversion for PF to VF mailbox message
Currently, hns3 mailbox processing between PF and VF missed to convert
message byte order and use data type u16 instead of __le16 for mailbox
data process. These processes may cause problems between different
architectures.
So this patch uses __le16/__le32 data type to define mailbox data
structures. To be compatible with old hns3 driver, these structures use
one-byte alignment. Then byte order conversions are added to mailbox
messages from PF to VF.
Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yufeng Mo [Mon, 9 May 2022 07:55:28 +0000 (15:55 +0800)]
net: hns3: remove the affinity settings of vector0
Vector0 is used for common interrupt control events and is
irrelevant to performance. Currently, the driver sets the
default affinity of vector0 to NUMA nodes, which is unnecessary.
Therefore, the default setting is removed, and the driver does
not set the affinity for vector0.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hao Chen [Mon, 9 May 2022 07:55:27 +0000 (15:55 +0800)]
net: hns3: fix access null pointer issue when set tx-buf-size as 0
When set tx-buf-size as 0 by ethtool, hns3_init_tx_spare_buffer()
will return directly and priv->ring->tx_spare->len is uninitialized,
then print function access priv->ring->tx_spare->len will cause
this issue.
When set tx-buf-size as 0 by ethtool, the print function will
print 0 directly and not access priv->ring->tx_spare->len.
Fixes: 2373b35c24ff ("net: hns3: add log for setting tx spare buf size") Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuiko Oshino [Thu, 5 May 2022 18:12:52 +0000 (11:12 -0700)]
net: phy: smsc: add LAN8742 phy support.
The current phy IDs on the available hardware.
LAN8742 0x0007C130, 0x0007C131
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuiko Oshino [Thu, 5 May 2022 18:12:51 +0000 (11:12 -0700)]
net: phy: microchip: update LAN88xx phy ID and phy ID mask.
update LAN88xx phy ID and phy ID mask because the existing code conflicts with the LAN8742 phy.
The current phy IDs on the available hardware.
LAN8742 0x0007C130, 0x0007C131
LAN88xx 0x0007C132
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
changes v3:
- export reusable code snippets and make use of it in the dp83td510
driver
changes v2:
- rewrite the driver reduce usage of common code and to reduce amount of
quirks.
- add genphy_c45_baset1_an_config_aneg fix
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Fri, 6 May 2022 04:23:55 +0000 (06:23 +0200)]
net: phy: genphy_c45_pma_baset1_read_master_slave: read actual configuration
Since MDIO_PMA_PMD_BT1_CTRL register shows actual configuration (and
forced state configuration is equal to the state), we should show
this configuration for ethtool.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Fri, 6 May 2022 04:23:51 +0000 (06:23 +0200)]
net: phy: genphy_c45_baset1_an_config_aneg: do no set unknown configuration
Do not change default master/slave autoneg configuration if no
changes was requested.
Fixes: 3da8ffd8545f ("net: phy: Add 10BASE-T1L support in phy-c45") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Alaa Mohamed [Thu, 5 May 2022 15:09:58 +0000 (17:09 +0200)]
net: vxlan: Add extack support to vxlan_fdb_delete
This patch adds extack msg support to vxlan_fdb_delete and vxlan_fdb_parse.
extack is used to propagate meaningful error msgs to the user of vxlan
fdb netlink api
Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>