net: dpaa2-mac: remove interface checks in dpaa2_mac_validate()
As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode, nor handle
PHY_INTERFACE_MODE_NA in the validation function. Remove these to
simplify the implementation.
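
As a rough illustration of why the per-driver check becomes redundant, the sketch below models a core that tests the requested interface against a supported-interfaces bitmap before any driver validation runs. It is a userspace model, not phylink code: every sketch_* name and the small enum are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for a few phy_interface_t values. */
enum sketch_iface {
	SKETCH_IFACE_SGMII,
	SKETCH_IFACE_RGMII,
	SKETCH_IFACE_10GBASER,
};

/* Hypothetical model of a config carrying a supported-interfaces bitmap. */
struct sketch_phylink_config {
	uint32_t supported_interfaces; /* one bit per sketch_iface value */
};

/* The MAC driver marks each interface mode it can handle. */
static void sketch_set_supported(struct sketch_phylink_config *cfg,
				 enum sketch_iface iface)
{
	cfg->supported_interfaces |= UINT32_C(1) << iface;
}

/*
 * Core-side check, done before the driver's validate callback is
 * reached. Because unsupported modes are rejected here, the callback
 * no longer needs its own switch on the interface mode.
 */
static bool sketch_iface_supported(const struct sketch_phylink_config *cfg,
				   enum sketch_iface iface)
{
	return (cfg->supported_interfaces >> iface) & 1u;
}
```

With only SGMII marked as supported, a request for RGMII is turned away by the core and never reaches the driver.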
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
This series converts ag71xx to fill in the supported_interfaces member
of phylink_config, cleans up the validate() implementation, and then
converts to phylink_generic_validate().
The question over the port linkmode restriction has been answered by
Oleksij - there is no reason for this restriction, so we can go the
whole hog with this conversion. Thanks!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ag71xx apparently only supports MII port type, which makes it different
from other implementations. However, Oleksij says there is no special
reason for this.
Convert the driver to use phylink_generic_validate(), which will allow
all ethtool port linkmodes instead of only MII, giving the driver
consistent behaviour with other drivers.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ag71xx: remove interface checks in ag71xx_mac_validate()
As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode, nor handle
PHY_INTERFACE_MODE_NA in the validation function. Remove these to
simplify the implementation.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Split clocks settings from init callback into clks_config callback,
which could support platform level clock management.
Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Bhupesh Sharma <bhupesh.sharma@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to inet v4 raw sockets for binding to nonlocal addresses
through the IP_FREEBIND and IP_TRANSPARENT socket options, as well as
the ipv4.ip_nonlocal_bind kernel parameter.
Add a helper function to inet_sock.h to check for bind address validity on
the basis of the address type and whether nonlocal addresses are enabled
for the socket via any of the sockopts/sysctl, deduplicating checks in
ipv4/ping.c, ipv4/af_inet.c, ipv6/af_inet6.c (for mapped v4->v6
addresses), and ipv4/raw.c.
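
The shape of such a helper might look like the userspace sketch below. It only models the decision described above; the sketch_* names, the address classes, and the option struct are hypothetical and do not reproduce the kernel's types or signature.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address classes, standing in for the kernel's route types. */
enum sketch_addr_type {
	SKETCH_ADDR_UNICAST_NONLOCAL,
	SKETCH_ADDR_LOCAL,
	SKETCH_ADDR_MULTICAST,
	SKETCH_ADDR_BROADCAST,
};

/* Hypothetical view of the per-socket options and the sysctl. */
struct sketch_bind_opts {
	bool freebind;         /* IP_FREEBIND set on the socket */
	bool transparent;      /* IP_TRANSPARENT set on the socket */
	bool ip_nonlocal_bind; /* ipv4.ip_nonlocal_bind sysctl enabled */
};

/* Any one of the three knobs allows binding to a nonlocal address. */
static bool sketch_can_nonlocal_bind(const struct sketch_bind_opts *o)
{
	return o->freebind || o->transparent || o->ip_nonlocal_bind;
}

/*
 * One shared check for all bind() paths: accept the address if nonlocal
 * binds are allowed, if it is the all-zero wildcard, or if it is of a
 * type that is always bindable.
 */
static bool sketch_addr_valid_or_nonlocal(const struct sketch_bind_opts *o,
					  uint32_t addr,
					  enum sketch_addr_type type)
{
	return sketch_can_nonlocal_bind(o) ||
	       addr == 0 ||
	       type == SKETCH_ADDR_LOCAL ||
	       type == SKETCH_ADDR_MULTICAST ||
	       type == SKETCH_ADDR_BROADCAST;
}
```

Keeping the decision in one place is what lets the ping, inet, inet6 and raw bind paths drop their individual copies of the check.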
Add test cases with IP[V6]_FREEBIND verifying that both v4 and v6 raw
sockets support binding to nonlocal addresses after the change. Add
necessary support for the test cases to nettest.
David S. Miller [Wed, 17 Nov 2021 14:56:16 +0000 (14:56 +0000)]
Merge branch 'dev_watchdog-less-intrusive'
Eric Dumazet says:
====================
net: make dev_watchdog() less intrusive
dev_watchdog() is used on many NICs to periodically monitor TX queues
to detect hangs.
The problem is that it stops all queues, then checks them, then 'unfreezes' them.
Not only does this stop feeding the NIC, it also migrates all qdiscs
to be serviced on the cpu calling netif_tx_unlock(), causing
a potential latency artifact.
With many TX queues, this is becoming more visible.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 17 Nov 2021 03:29:24 +0000 (19:29 -0800)]
net: no longer stop all TX queues in dev_watchdog()
There is no reason for stopping all TX queues from dev_watchdog().
Not only does this stop feeding the NIC, it also migrates all qdiscs
to be serviced on the cpu calling netif_tx_unlock(), causing
a potential latency artifact.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 17 Nov 2021 03:29:21 +0000 (19:29 -0800)]
net: use an atomic_long_t for queue->trans_timeout
tx_timeout_show() assumed dev_watchdog() would stop all
the queues, to fetch queue->trans_timeout under protection
of the queue->_xmit_lock.
As we want to no longer disrupt transmits, we use an
atomic_long_t instead.
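
The pattern can be sketched in userspace with C11 atomics, which here stand in for the kernel's atomic_long_t operations. The struct and function names are hypothetical; the point is only that the writer and the sysfs-style reader no longer need a shared lock.

```c
#include <stdatomic.h>

/* Hypothetical model of a TX queue with a lock-free timeout counter. */
struct sketch_txq {
	atomic_long trans_timeout;
};

/* Watchdog side: record one more timeout without taking the xmit lock. */
static void sketch_note_timeout(struct sketch_txq *q)
{
	atomic_fetch_add_explicit(&q->trans_timeout, 1, memory_order_relaxed);
}

/* Reader side: fetch the counter without stopping or locking the queue. */
static long sketch_read_timeouts(struct sketch_txq *q)
{
	return atomic_load_explicit(&q->trans_timeout, memory_order_relaxed);
}
```

Relaxed ordering is enough in this sketch because the counter is a standalone statistic and is not used to publish other data.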
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: david decotigny <david.decotigny@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kurt Kanzenbach [Tue, 16 Nov 2021 08:03:25 +0000 (09:03 +0100)]
net: ethernet: ti: cpsw: Enable PHY timestamping
If the PHYs in use also support hardware timestamping, all configuration requests
should be forwarded to the PHYs instead of being processed by the MAC driver
itself.
This enables PHY timestamping in combination with the cpsw driver.
Tested with an am335x based board with two DP83640 PHYs connected to the cpsw
switch.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Update net_failover documentation with missing and incomplete
details to get a proper working setup.
Signed-off-by: Vasudev Kamath <vasudev@copyninja.info> Reviewed-by: Krishna Kumar <krikku@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This series converts ocelot_net to fill in the supported_interfaces
member of phylink_config, cleans up the validate() implementation,
and then converts to phylink_generic_validate().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: ocelot_net: remove interface checks in macb_validate()
As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode in the
validation function. Remove this to simplify it.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
This series converts mtk_eth_soc to fill in the supported_interfaces
member of phylink_config, cleans up the validate() implementation, and
then converts to phylink_generic_validate().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: mtk_eth_soc: drop use of phylink_helper_basex_speed()
Now that we have a better method to select SFP interface modes, we
no longer need to use phylink_helper_basex_speed() in a driver's
validation function, and we can also get rid of our hack to indicate
both 1000base-X and 2500base-X if the comphy is present to make that
work. Remove this hack and use of phylink_helper_basex_speed().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
net: mtk_eth_soc: remove interface checks in mtk_validate()
As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode, nor handle
PHY_INTERFACE_MODE_NA in the validation function. Remove these to
simplify the implementation.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
This series converts sparx5 to fill in the supported_interfaces member
of phylink_config, cleans up the validate() implementation, and then
converts to phylink_generic_validate().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
sparx5_phylink_validate() no longer needs to check for
PHY_INTERFACE_MODE_NA as phylink will walk the supported interface
types to discover the link mode capabilities. Neither is it necessary
to check the device capabilities as we will not be called for
unsupported interface modes. Remove these checks.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
This series converts enetc to fill in the supported_interfaces member
of phylink_config, cleans up the validate() implementation, and then
converts to phylink_generic_validate().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: enetc: remove interface checks in enetc_pl_mac_validate()
As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode in the
validation function. Remove this to simplify it.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
This series converts axienet to fill in the supported_interfaces member
of phylink_config, cleans up the validate() implementation, and then
converts to phylink_generic_validate().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: axienet: remove interface checks in axienet_validate()
As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode in the
validation function. Remove this to simplify it.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 17 Nov 2021 11:03:43 +0000 (11:03 +0000)]
Merge tag 'mlx5-updates-2021-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-11-16
Updates for mlx5 driver:
1) Support ethtool cq mode
2) Static allocation of mod header object for the common case
3) TC support for when local and remote VTEPs are in the same host
4) Create E-Switch QoS objects on demand to save on resources
5) Minor code improvements
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmytro Linkin [Tue, 21 Sep 2021 16:08:38 +0000 (19:08 +0300)]
net/mlx5: E-switch, Create QoS on demand
Don't create eswitch QoS (root TSAR) on switch mode change. Create it on
first child TSAR object creation - vport or rate group. Keep track of
root TSAR references and release the root TSAR with the last object deletion.
No need to check whether QoS is enabled when installing a tc matchall filter.
Remove the related helper function as it no longer has users.
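
The create-on-first-use, release-on-last-put scheme can be sketched as below. This is a userspace model with hypothetical sketch_* names; the real driver issues firmware commands where this sketch only flips a flag.

```c
#include <stdbool.h>

/* Hypothetical model of the eswitch QoS root state. */
struct sketch_esw_qos {
	unsigned int refcnt; /* number of child objects using the root */
	bool root_created;   /* stands in for the root TSAR existing */
	int create_calls;    /* how often the root was (re)created */
};

/* Called when a child (vport or rate group) is created. */
static void sketch_qos_get(struct sketch_esw_qos *qos)
{
	if (qos->refcnt == 0) {
		/* First user: create the root on demand. */
		qos->root_created = true;
		qos->create_calls++;
	}
	qos->refcnt++;
}

/* Called when a child is deleted. */
static void sketch_qos_put(struct sketch_esw_qos *qos)
{
	if (qos->refcnt == 0)
		return; /* unbalanced put, nothing to release */
	if (--qos->refcnt == 0)
		qos->root_created = false; /* last user gone: destroy root */
}
```

A system that never configures QoS never calls the get side, so no root object is allocated at all.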
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Dmytro Linkin [Tue, 21 Sep 2021 15:45:42 +0000 (18:45 +0300)]
net/mlx5: E-switch, Enable vport QoS on demand
Vports' QoS is not commonly used but consumes SW/HW resources, which
becomes an issue on BlueField SoC systems.
Don't enable QoS on vports by default on eswitch mode change; enable it
only when it's going to be used by one of the top level users:
- configuring TC matchall filter with police action;
- setting rate with legacy NDO API;
- calling devlink ops->rate_leaf_*() callbacks.
Disable vport QoS on vport cleanup.
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Parav Pandit [Wed, 20 Oct 2021 04:56:01 +0000 (07:56 +0300)]
net/mlx5: E-switch, Remove vport enabled check
An eswitch vport of the devlink port is always enabled before a
devlink port is registered. And an eswitch vport is always disabled
after a devlink port is unregistered.
Hence avoid the vport enabled check in the devlink callback routine.
Such a check is only applicable in the legacy SR-IOV callbacks.
Chris Mi [Tue, 26 Oct 2021 09:08:24 +0000 (17:08 +0800)]
net/mlx5e: Specify out ifindex when looking up decap route
There is a use case in which the local and remote VTEPs are in the same
host. Currently, the out ifindex is not specified when looking up the
decap route for offloads. So in this case, a local route is returned
and the route dev is lo.
The actual tunnel interface can be created with a parameter "dev" [1],
which specifies the physical device to use for tunnel endpoint
communication. Pass this parameter to the driver when looking up the decap
route for offloads, so that a unicast route will be returned.
[1] ip link add name vxlan1 type vxlan id 100 dev enp4s0f0 remote 1.1.1.1 dstport 4789
Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Paul Blakey [Mon, 5 Jul 2021 08:31:47 +0000 (11:31 +0300)]
net/mlx5e: Refactor mod header management API
For all mod hdr related functions to reside in a single self contained
component (mod_hdr.c), refactor alloc() and add get_id() so that user
won't rely on internal implementation, and move both to mod_hdr
component.
Rename the prefix to mlx5e_mod_hdr_* to match the other mod hdr functions.
Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Aya Levin [Tue, 9 Nov 2021 13:44:58 +0000 (15:44 +0200)]
net/mlx5: Avoid printing health buffer when firmware is unavailable
Use the firmware version field as an indication of the health buffer's sanity.
When firmware version is 0xFFFFFFFF, deduce that firmware is unavailable
and avoid printing the health buffer to dmesg as it doesn't provide
debug info.
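
A minimal sketch of the check described above, assuming the version is read as a 32-bit value. The sketch_* names are hypothetical; only the 0xFFFFFFFF sentinel comes from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* All-ones version value, taken to mean the firmware is unavailable. */
#define SKETCH_FW_VER_UNAVAILABLE UINT32_C(0xFFFFFFFF)

/* Returns true when the health buffer is worth printing. */
static bool sketch_health_buffer_is_sane(uint32_t fw_ver)
{
	return fw_ver != SKETCH_FW_VER_UNAVAILABLE;
}
```

A caller would skip the dump entirely when this returns false, since the rest of the buffer is then unlikely to carry usable debug data.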
Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Saeed Mahameed [Wed, 3 Nov 2021 21:01:05 +0000 (14:01 -0700)]
net/mlx5: Fix format-security build warnings
Treat the string as an argument to avoid this.
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:482:5:
error: format string is not a string literal (potentially insecure)
name);
^~~~
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c:2079:4:
error: format string is not a string literal (potentially insecure)
ptp_ch_stats_desc[i].format);
^~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
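
The general form of this fix can be sketched as follows. The function name is hypothetical and the snippet uses plain snprintf() rather than the driver's own call sites; it shows the technique of passing a non-literal string as an argument to a literal "%s" format.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Copy a runtime-provided name into dst.
 *
 * Writing snprintf(dst, len, name) would use 'name' as the format
 * string, which is what -Wformat-security complains about: any '%'
 * in it would be interpreted as a conversion. Passing it as the
 * argument of a literal "%s" format copies it verbatim instead.
 */
static int sketch_copy_name(char *dst, size_t len, const char *name)
{
	return snprintf(dst, len, "%s", name);
}
```

With this form, a name that happens to contain conversion specifiers is copied unchanged rather than being parsed.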
net: document SMII and correct phylink's new validation mechanism
SMII has not been documented in the kernel, but information on this PHY
interface mode has been recently found. Document it, and correct the
recently introduced phylink handling for this interface mode.
====================
r8169: disable detection of further chip versions that didn't make it to the mass market
There's no sign of life from further chip versions. Seems they didn't
make it to the mass market. Let's disable detection and if nobody
complains remove support a few kernel versions later.
====================
Heiner Kallweit [Mon, 15 Nov 2021 20:17:56 +0000 (21:17 +0100)]
r8169: enable ASPM L1/L1.1 from RTL8168h
With newer chip versions ASPM-related issues seem to occur only if
L1.2 is enabled. I have a test system with RTL8168h that gives a
number of rx_missed errors when running iperf and L1.2 is enabled.
With L1.2 disabled (and L1 + L1.1 active) everything is fine.
See also [0]. Can't test this, but L1 + L1.1 being active should be
sufficient to reach higher package power saving states.
Archie Pusaka [Thu, 11 Nov 2021 05:20:53 +0000 (13:20 +0800)]
Bluetooth: Ignore HCI_ERROR_CANCELLED_BY_HOST on adv set terminated event
This event is received when the controller stops advertising,
specifically for these three reasons:
(a) Connection is successfully created (success).
(b) Timeout is reached (error).
(c) Number of advertising events is reached (error).
(*) This event is NOT generated when the host stops the advertisement.
Refer to the BT spec ver 5.3 vol 4 part E sec 7.7.65.18. Note that the
section was revised from BT spec ver 5.0 vol 2 part E sec 7.7.65.18
which was ambiguous about (*).
Some chips (e.g. RTL8822CE) send this event when the host stops the
advertisement with status = HCI_ERROR_CANCELLED_BY_HOST (due to (*)
above). This is treated as an error and the advertisement will be
removed and userspace will be informed via MGMT event.
On suspend, we are supposed to temporarily disable advertisements,
and continue advertising on resume. However, due to the behavior
above, the advertisements are removed instead.
This patch returns early if HCI_ERROR_CANCELLED_BY_HOST is received.
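
The early return can be sketched as below. This is a userspace model with hypothetical sketch_* names; the status value 0x44 matches the "Operation Cancelled by Host" status shown in the btmon trace, and the removal of the instance on other errors stands in for the existing error path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status reported by some controllers when the host stops advertising. */
#define SKETCH_HCI_ERROR_CANCELLED_BY_HOST 0x44

/* Hypothetical model of one advertising instance. */
struct sketch_adv_state {
	bool instance_present;
};

static void sketch_adv_set_terminated(struct sketch_adv_state *adv,
				      uint8_t status)
{
	/*
	 * The host asked for advertising to stop, so this is not a real
	 * failure: keep the instance so it can be re-enabled on resume.
	 */
	if (status == SKETCH_HCI_ERROR_CANCELLED_BY_HOST)
		return;

	/* Any other non-zero status is treated as an error, as before. */
	if (status)
		adv->instance_present = false;
}
```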
Btmon snippet of the unexpected behavior:
@ MGMT Command: Remove Advertising (0x003f) plen 1
        Instance: 1
< HCI Command: LE Set Extended Advertising Enable (0x08|0x0039) plen 6
        Extended advertising: Disabled (0x00)
        Number of sets: 1 (0x01)
        Entry 0
          Handle: 0x01
          Duration: 0 ms (0x00)
          Max ext adv events: 0
> HCI Event: LE Meta Event (0x3e) plen 6
      LE Advertising Set Terminated (0x12)
        Status: Operation Cancelled by Host (0x44)
        Handle: 1
        Connection handle: 0
        Number of completed extended advertising events: 5
> HCI Event: Command Complete (0x0e) plen 4
      LE Set Extended Advertising Enable (0x08|0x0039) ncmd 2
        Status: Success (0x00)
Randy Dunlap [Mon, 15 Nov 2021 03:05:17 +0000 (19:05 -0800)]
Bluetooth: btmrvl_main: repair a non-kernel-doc comment
Do not use "/**" to begin a non-kernel-doc comment.
Fixes this build warning:
drivers/bluetooth/btmrvl_main.c:2: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Cc: linux-bluetooth@vger.kernel.org Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Eric Dumazet [Mon, 15 Nov 2021 17:11:50 +0000 (09:11 -0800)]
net: drop nopreempt requirement on sock_prot_inuse_add()
This is distracting really, let's make this simpler,
because many callers had to take care of this
by themselves, even if on x86 this adds more
code than really needed.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 16 Nov 2021 13:10:35 +0000 (13:10 +0000)]
Merge branch 'tcp-optimizations'
Eric Dumazet says:
====================
tcp: optimizations for linux-5.17
Mostly small improvements in this series.
The notable change is in "defer skb freeing after
socket lock is released" in recvmsg() (and RX zerocopy)
The idea is to leave skb freeing to the BH handler
whenever possible, or at least perform the freeing
outside of the socket lock section, for much improved
performance. This idea can probably be extended
to other protocols.
Tests on a 100Gbit NIC
Max throughput for one TCP_STREAM flow, over 10 runs.
MTU : 1500 (1428 bytes of TCP payload per MSS)
Before: 55 Gbit
After: 66 Gbit
MTU : 4096+ (4096 bytes of TCP payload, plus TCP/IPv6 headers)
Before: 82 Gbit
After: 95 Gbit
====================
Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 15 Nov 2021 19:02:48 +0000 (11:02 -0800)]
tcp: do not call tcp_cleanup_rbuf() if we have a backlog
Under pressure, tcp recvmsg() has logic to process the socket backlog,
but calls tcp_cleanup_rbuf() right before.
Avoiding sending an ACK right before processing new segments makes
a lot of sense, as this decreases the number of ACK packets,
with no impact on effective ACK clocking.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 15 Nov 2021 19:02:46 +0000 (11:02 -0800)]
tcp: defer skb freeing after socket lock is released
tcp recvmsg() (or rx zerocopy) spends a fair amount of time
freeing skbs after their payload has been consumed.
A typical ~64KB GRO packet has to release ~45 page
references, eventually going to page allocator
for each of them.
Currently, this freeing is performed while socket lock
is held, meaning that there is a high chance that
BH handler has to queue incoming packets to tcp socket backlog.
This can cause additional latencies, because the user
thread has to process the backlog at release_sock() time,
and while doing so, additional frames can be added
by BH handler.
This patch adds logic to defer these frees after socket
lock is released, or directly from BH handler if possible.
Being able to free these skbs from the BH handler helps a lot,
because this avoids the usual alloc/free asymmetry
when the BH handler and user thread do not run on the same cpu or
NUMA node.
One cpu can now be fully utilized for the kernel->user copy,
and another cpu is handling BH processing and skb/page
allocs/frees (assuming RFS is not forcing use of a single CPU)
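
The deferral idea can be sketched in userspace as below: while the lock is held, consumed buffers are only linked onto a per-socket list, and the actual release happens in a separate flush step that runs once the lock is dropped. All sketch_* names are hypothetical, locking itself is not modelled, and setting a flag stands in for the real free.

```c
#include <stddef.h>

/* Hypothetical model of a consumed receive buffer. */
struct sketch_buf {
	struct sketch_buf *next;
	int freed; /* set when the buffer has been released */
};

/* Hypothetical model of a socket with a deferred-free list. */
struct sketch_sock {
	struct sketch_buf *defer_list;
};

/* Lock held: O(1) work, just queue the buffer for later release. */
static void sketch_defer_free(struct sketch_sock *sk, struct sketch_buf *b)
{
	b->next = sk->defer_list;
	sk->defer_list = b;
}

/*
 * Lock released: detach the whole list first, then release each entry.
 * Returns the number of buffers released.
 */
static int sketch_flush_deferred(struct sketch_sock *sk)
{
	struct sketch_buf *b = sk->defer_list;
	int n = 0;

	sk->defer_list = NULL;
	while (b) {
		struct sketch_buf *next = b->next;

		b->next = NULL;
		b->freed = 1; /* stands in for the real free */
		b = next;
		n++;
	}
	return n;
}
```

The expensive part of the work is thereby moved out of the section in which incoming packets would otherwise have to be queued to the backlog.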
Tested:
100Gbit NIC
Max throughput for one TCP_STREAM flow, over 10 runs
MTU : 1500
Before: 55 Gbit
After: 66 Gbit
MTU : 4096+(headers)
Before: 82 Gbit
After: 95 Gbit
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 15 Nov 2021 19:02:45 +0000 (11:02 -0800)]
tcp: avoid indirect calls to sock_rfree
TCP uses sk_eat_skb() when skbs can be removed from the receive queue.
However, the call to skb_orphan() from __kfree_skb() incurs
an indirect call to sock_rfree(), which is more expensive than
a direct call, especially for CONFIG_RETPOLINE=y.
Add tcp_eat_recv_skb() function to make the call before
__kfree_skb().
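
The technique is to compare the function pointer against the expected destructor and, on a match, call that function directly and clear the pointer so the generic path has nothing left to call indirectly. The sketch below models this in userspace; the sketch_* names are hypothetical stand-ins and not the kernel's structures.

```c
#include <stddef.h>

/* Hypothetical model of a buffer with a destructor callback. */
struct sketch_skb {
	void (*destructor)(struct sketch_skb *skb);
	int rfree_calls; /* counts how often the destructor ran */
};

/* Stands in for the common receive-side destructor. */
static void sketch_sock_rfree(struct sketch_skb *skb)
{
	skb->rfree_calls++;
}

/* Generic path: runs whatever destructor is set, via an indirect call. */
static void sketch_orphan(struct sketch_skb *skb)
{
	if (skb->destructor) {
		skb->destructor(skb);
		skb->destructor = NULL;
	}
}

/* Fast path: direct call for the expected destructor, then generic path. */
static void sketch_eat_recv_skb(struct sketch_skb *skb)
{
	if (skb->destructor == sketch_sock_rfree) {
		sketch_sock_rfree(skb); /* direct call, no retpoline cost */
		skb->destructor = NULL; /* generic path becomes a no-op */
	}
	sketch_orphan(skb);
}
```

The destructor still runs exactly once; only the way it is reached changes for the common case.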
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 15 Nov 2021 19:02:43 +0000 (11:02 -0800)]
tcp: annotate races around tp->urg_data
tcp_poll() and tcp_ioctl() are reading tp->urg_data without socket lock
owned.
Also, it is faster to first check tp->urg_data in tcp_poll(),
then tp->urg_seq == tp->copied_seq, because tp->urg_seq is
located in a different/cold cache line.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>