Aya Levin [Mon, 10 May 2021 07:13:06 +0000 (10:13 +0300)]
net/mlx5e: Use FW limitation for max MPW WQEBBs
Calculate maximal count of MPW WQEBBs on SQ's creation and store it
there. Remove MLX5E_TX_MPW_MAX_NUM_DS and MLX5E_TX_MPW_MAX_WQEBBS.
Update mlx5e_tx_mpwqe_is_full() and mlx5e_xdp_mpqwe_is_full() .
Aya Levin [Mon, 17 Jan 2022 09:46:35 +0000 (11:46 +0200)]
net/mlx5e: Read max WQEBBs on the SQ from firmware
Prior to this patch the maximal value for max WQEBBs (WQE Basic Blocks,
where WQE is a Work Queue Element) on the TX side was assumed to be 16
(fixed value). All firmware versions till today comply to this. In order
to be more flexible and resilient, read from FW the corresponding:
max_wqe_sz_sq. This value describes the maximum WQE size given in bytes,
thus max WQEBBs is given by the division in WQEBB's byte size. The
driver uses the top between 16 and the division result. This ensures
synchronization between driver and firmware and avoids unexpected
behavior. Store this value on the different SQs (Send Queues) for easy
access.
net: dsa: mv88e6xxx: Fix validation of built-in PHYs on 6095/6097
These chips have 8 built-in FE PHYs and 3 SERDES interfaces that can
run at 1G. With the blamed commit, the built-in PHYs could no longer
be connected to, using an MII PHY interface mode.
Create a separate .phylink_get_caps callback for these chips, which
takes the FE/GE split into consideration.
Fixes: 2ee84cfefb1e ("net: dsa: mv88e6xxx: convert to phylink_generic_validate()") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20220213185154.3262207-1-tobias@waldekranz.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Volodymyr Mytnyk [Mon, 14 Feb 2022 08:20:06 +0000 (10:20 +0200)]
net: prestera: acl: add multi-chain support offload
Add support of rule offloading added to the non-zero index chain,
which was previously forbidden. Also, goto action is offloaded
allowing to jump for processing of desired chain.
Note that only implicit chain 0 is bound to the device port(s) for
processing. The rest of chains have to be jumped by actions.
Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
M Chetan Kumar [Mon, 14 Feb 2022 07:16:52 +0000 (12:46 +0530)]
net: wwan: debugfs obtained dev reference not dropped
WWAN driver call's wwan_get_debugfs_dir() to obtain
WWAN debugfs dir entry. As part of this procedure it
returns a reference to a found device.
Since there is no debugfs interface available at WWAN
subsystem, it is not possible to drop dev reference post
debugfs use. This leads to side effects like post wwan
driver load and reload the wwan instance gets increment
from wwanX to wwanX+1.
A new debugfs interface is added in wwan subsystem so that
wwan driver can drop the obtained dev reference post debugfs
use.
void wwan_put_debugfs_dir(struct dentry *dir)
Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 Feb 2022 14:06:11 +0000 (14:06 +0000)]
Merge branch 'dsa-realtek-next'
Luiz Angelo Daros de Luca says:
====================
net: dsa: realtek: realtek-mdio: reset before setup
This patch series cleans the realtek-smi reset code and copy that to the
realtek-mdio.
v1-v2)
- do not run reset code block if GPIO is missing. It was printing "RESET
deasserted" even when there is no GPIO configured.
- reset switch after dsa_unregister_switch()
- demote reset messages to debug
v2-v3)
- do not assert the reset on gpiod_get. Do it explicitly aferwards.
- split the commit into two (one for each module)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When reset GPIO was missing, the driver was still printing an info
message and still trying to assert the reset. Although gpiod_set_value()
will silently ignore calls with NULL gpio_desc, it is better to make it
clear the driver might allow gpio_desc to be NULL.
The initial value for the reset pin was changed to GPIOD_OUT_LOW,
followed by a gpiod_set_value() asserting the reset. This way, it will
be easier to spot if and where the reset really happens.
A new "asserted RESET" message was added just after the reset is
asserted, similar to the existing "deasserted RESET" message. Both
messages were demoted to dbg. The code comment is not needed anymore.
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 14 Feb 2022 02:10:56 +0000 (18:10 -0800)]
ipv6: blackhole_netdev needs snmp6 counters
Whenever rt6_uncached_list_flush_dev() swaps rt->rt6_idev
to the blackhole device, parts of IPv6 stack might still need
to increment one SNMP counter.
Root cause, patch from Ido, changelog from Eric :)
This bug suggests that we need to audit rt->rt6_idev usages
and make sure they are properly using RCU protection.
Fixes: e5f80fcf869a ("ipv6: give an IPv6 dev to blackhole_netdev") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 Feb 2022 13:38:36 +0000 (13:38 +0000)]
Merge branch 'netdev-RT'
Sebastian Andrzej Siewior says:
====================
net: dev: PREEMPT_RT fixups.
this series removes or replaces preempt_disable() and local_irq_save()
sections which are problematic on PREEMPT_RT.
Patch 2 makes netif_rx() work from any context after I found suggestions
for it in an old thread. Should that work, then the context-specific
variants could be removed.
v2…v3:
- #2
- Export __netif_rx() so it can be used by everyone.
- Add a lockdep assert to check for interrupt context.
- Update the kernel doc and mention that the skb is posted to
backlog NAPI.
- Use __netif_rx() also in drivers/net/*.c.
- Added Toke''s review tag and kept Eric's desptite the changes
made.
v1…v2:
- #1 and #2
- merge patch 1 und 2 from the series (as per Toke).
- updated patch description and corrected the first commit number (as
per Eric).
- #2
- Provide netif_rx() as in v1 and additionally __netif_rx() without
local_bh disable()+enable() for the loopback driver. __netif_rx() is
not exported (loopback is built-in only) so it won't be used
drivers. If this doesn't work then we can still export/ define a
wrapper as Eric suggested.
- Added a comment that netif_rx() considered legacy.
- #3
- Moved ____napi_schedule() into rps_ipi_queued() and
renamed it napi_schedule_rps().
https://lore.kernel.org/all/20220204201259.1095226-1-bigeasy@linutronix.de/
Disabling interrupts and in the RPS case locking input_pkt_queue is
split into local_irq_disable() and optional spin_lock().
This breaks on PREEMPT_RT because the spinlock_t typed lock can not be
acquired with disabled interrupts.
The sections in which the lock is acquired is usually short in a sense that it
is not causing long und unbounded latiencies. One exception is the
skb_flow_limit() invocation which may invoke a BPF program (and may
require sleeping locks).
By moving local_irq_disable() + spin_lock() into rps_lock(), we can keep
interrupts disabled on !PREEMPT_RT and enabled on PREEMPT_RT kernels.
Without RPS on a PREEMPT_RT kernel, the needed synchronisation happens
as part of local_bh_disable() on the local CPU.
____napi_schedule() is only invoked if sd is from the local CPU. Replace
it with __napi_schedule_irqoff() which already disables interrupts on
PREEMPT_RT as needed. Move this call to rps_ipi_queued() and rename the
function to napi_schedule_rps as suggested by Jakub.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dev: Makes sure netif_rx() can be invoked in any context.
Dave suggested a while ago (eleven years by now) "Let's make netif_rx()
work in all contexts and get rid of netif_rx_ni()". Eric agreed and
pointed out that modern devices should use netif_receive_skb() to avoid
the overhead.
In the meantime someone added another variant, netif_rx_any_context(),
which behaves as suggested.
netif_rx() must be invoked with disabled bottom halves to ensure that
pending softirqs, which were raised within the function, are handled.
netif_rx_ni() can be invoked only from process context (bottom halves
must be enabled) because the function handles pending softirqs without
checking if bottom halves were disabled or not.
netif_rx_any_context() invokes on the former functions by checking
in_interrupts().
netif_rx() could be taught to handle both cases (disabled and enabled
bottom halves) by simply disabling bottom halves while invoking
netif_rx_internal(). The local_bh_enable() invocation will then invoke
pending softirqs only if the BH-disable counter drops to zero.
Eric is concerned about the overhead of BH-disable+enable especially in
regard to the loopback driver. As critical as this driver is, it will
receive a shortcut to avoid the additional overhead which is not needed.
Add a local_bh_disable() section in netif_rx() to ensure softirqs are
handled if needed.
Provide __netif_rx() which does not disable BH and has a lockdep assert
to ensure that interrupts are disabled. Use this shortcut in the
loopback driver and in drivers/net/*.c.
Make netif_rx_ni() and netif_rx_any_context() invoke netif_rx() so they
can be removed once they are no more users left.
Link: https://lkml.kernel.org/r/20100415.020246.218622820.davem@davemloft.net Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dev: Remove preempt_disable() and get_cpu() in netif_rx_internal().
The preempt_disable() () section was introduced in commit cece1945bffcf ("net: disable preemption before call smp_processor_id()")
and adds it in case this function is invoked from preemtible context and
because get_cpu() later on as been added.
The get_cpu() usage was added in commit b0e28f1effd1d ("net: netif_rx() must disable preemption")
because ip_dev_loopback_xmit() invoked netif_rx() with enabled preemption
causing a warning in smp_processor_id(). The function netif_rx() should
only be invoked from an interrupt context which implies disabled
preemption. The commit e30b38c298b55 ("ip: Fix ip_dev_loopback_xmit()")
was addressing this and replaced netif_rx() with in netif_rx_ni() in
ip_dev_loopback_xmit().
Based on the discussion on the list, the former patch (b0e28f1effd1d)
should not have been applied only the latter (e30b38c298b55).
Remove get_cpu() and preempt_disable() since the function is supposed to
be invoked from context with stable per-CPU pointers. Bottom halves have
to be disabled at this point because the function may raise softirqs
which need to be processed.
Link: https://lkml.kernel.org/r/20100415.013347.98375530.davem@davemloft.net Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Ertman [Fri, 11 Feb 2022 18:26:03 +0000 (10:26 -0800)]
ice: Simplify tracking status of RDMA support
The status of support for RDMA is currently being tracked with two
separate status flags. This is unnecessary with the current state of
the driver.
Simplify status tracking down to a single flag.
Rename the helper function to denote the RDMA specific status and
universally use the helper function to test the status bit.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Leszek Kaliszczuk <leszek.kaliszczuk@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 Feb 2022 13:24:29 +0000 (13:24 +0000)]
Merge branch 'ocelot-stats'
Colin Foster says:
====================
use bulk reads for ocelot statistics
Ocelot loops over memory regions to gather stats on different ports.
These regions are mostly continuous, and are ordered. This patch set
uses that information to break the stats reads into regions that can get
read in bulk.
The motiviation is for general cleanup, but also for SPI. Performing two
back-to-back reads on a SPI bus require toggling the CS line, holding,
re-toggling the CS line, sending 3 address bytes, sending N padding
bytes, then actually performing the read. Bulk reads could reduce almost
all of that overhead, but require that the reads are performed via
regmap_bulk_read.
v1 > v2: reword commit messages
v2 > v3: correctly mark this for net-next when sending
v3 > v4: calloc array instead of zalloc per review
v4 > v5:
Apply CR suggestions for whitespace
Fix calloc / zalloc mixup
Properly destroy workqueues
Add third commit to split long macros
v5 > v6:
Fix functionality - v5 was improperly tested
Add bugfix for ethtool mutex lock
Remove unnecessary ethtool stats reads
v6 > v7:
Remove mutex bug patch that was applied via net
Rename function based on CR
Add missed error check
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Foster [Sun, 13 Feb 2022 19:12:54 +0000 (11:12 -0800)]
net: mscc: ocelot: use bulk reads for stats
Create and utilize bulk regmap reads instead of single access for gathering
stats. The background reading of statistics happens frequently, and over
a few contiguous memory regions.
High speed PCIe buses and MMIO access will probably see negligible
performance increase. Lower speed buses like SPI and I2C could see
significant performance increase, since the bus configuration and register
access times account for a large percentage of data transfer time.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Foster [Sun, 13 Feb 2022 19:12:53 +0000 (11:12 -0800)]
net: mscc: ocelot: add ability to perform bulk reads
Regmap supports bulk register reads. Ocelot does not. This patch adds
support for Ocelot to invoke bulk regmap reads. That will allow any driver
that performs consecutive reads over memory regions to optimize that
access.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Foster [Sun, 13 Feb 2022 19:12:52 +0000 (11:12 -0800)]
net: ocelot: align macros for consistency
In the ocelot.h file, several read / write macros were split across
multiple lines, while others weren't. Split all macros that exceed the 80
character column width and match the style of the rest of the file.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Foster [Sun, 13 Feb 2022 19:12:51 +0000 (11:12 -0800)]
net: mscc: ocelot: remove unnecessary stat reading from ethtool
The ocelot_update_stats function only needs to read from one port, yet it
was updating the stats for all ports. Update to only read the stats that
are necessary.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 11 Feb 2022 17:15:07 +0000 (09:15 -0800)]
ipv6: Add reasons for skb drops to __udp6_lib_rcv
Add reasons to __udp6_lib_rcv for skb drops. The only twist is that the
NO_SOCKET takes precedence over the CSUM or other counters for that
path (motivation behind this patch - csum counter was misleading).
Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Lu [Fri, 11 Feb 2022 06:52:21 +0000 (14:52 +0800)]
net/smc: Add comment for smc_tx_pending
The previous patch introduces a lock-free version of smc_tx_work() to
solve unnecessary lock contention, which is expected to be held lock.
So this adds comment to remind people to keep an eye out for locks.
Suggested-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kalash Nainwal [Thu, 10 Feb 2022 22:09:35 +0000 (14:09 -0800)]
Generate netlink notification when default IPv6 route preference changes
Generate RTM_NEWROUTE netlink notification when the route preference
changes on an existing kernel generated default route in response to
RA messages. Currently netlink notifications are generated only when
this route is added or deleted but not when the route preference
changes, which can cause userspace routing application state to go
out of sync with kernel.
Signed-off-by: Kalash Nainwal <kalash@arista.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Thu, 10 Feb 2022 17:56:08 +0000 (18:56 +0100)]
net/sched: act_police: more accurate MTU policing
in current Linux, MTU policing does not take into account that packets at
the TC ingress have the L2 header pulled. Thus, the same TC police action
(with the same value of tcfp_mtu) behaves differently for ingress/egress.
In addition, the full GSO size is compared to tcfp_mtu: as a consequence,
the policer drops GSO packets even when individual segments have the L2 +
L3 + L4 + payload length below the configured valued of tcfp_mtu.
Improve the accuracy of MTU policing as follows:
- account for mac_len for non-GSO packets at TC ingress.
- compare MTU threshold with the segmented size for GSO packets.
Also, add a kselftest that verifies the correct behavior.
Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Sat, 12 Feb 2022 17:14:49 +0000 (09:14 -0800)]
etherdevice: Adjust ether_addr* prototypes to silence -Wstringop-overead
With GCC 12, -Wstringop-overread was warning about an implicit cast from
char[6] to char[8]. However, the extra 2 bytes are always thrown away,
alignment doesn't matter, and the risk of hitting the edge of unallocated
memory has been accepted, so this prototype can just be converted to a
regular char *. Silences:
net/core/dev.c: In function ‘bpf_prog_run_generic_xdp’: net/core/dev.c:4618:21: warning: ‘ether_addr_equal_64bits’ reading 8 bytes from a region of size 6 [-Wstringop-overread]
4618 | orig_host = ether_addr_equal_64bits(eth->h_dest, > skb->dev->dev_addr);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/core/dev.c:4618:21: note: referencing argument 1 of type ‘const u8[8]’ {aka ‘const unsigned char[8]’}
net/core/dev.c:4618:21: note: referencing argument 2 of type ‘const u8[8]’ {aka ‘const unsigned char[8]’}
In file included from net/core/dev.c:91: include/linux/etherdevice.h:375:20: note: in a call to function ‘ether_addr_equal_64bits’
375 | static inline bool ether_addr_equal_64bits(const u8 addr1[6+2],
| ^~~~~~~~~~~~~~~~~~~~~~~
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de> Tested-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/netdev/20220212090811.uuzk6d76agw2vv73@pengutronix.de Cc: Jakub Kicinski <kuba@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Horatiu Vultur [Sat, 12 Feb 2022 20:03:43 +0000 (21:03 +0100)]
net: lan966x: Fix when CONFIG_IPV6 is not set
When CONFIG_IPV6 is not set, then the linking of the lan966x driver
fails with the following error:
drivers/net/ethernet/microchip/lan966x/lan966x_main.c:444: undefined
reference to `ipv6_mc_check_mld'
The fix consists in adding a check also for IS_ENABLED(CONFIG_IPV6)
Fixes: 47aeea0d57e80c ("net: lan966x: Implement the callback SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Horatiu Vultur [Sat, 12 Feb 2022 20:45:44 +0000 (21:45 +0100)]
net: lan966x: Fix when CONFIG_PTP_1588_CLOCK is compiled as module
When CONFIG_PTP_1588_CLOCK is compiled as a module, then the linking of
the lan966x fails because it can't find references to the following
functions 'ptp_clock_index', 'ptp_clock_register' and
'ptp_clock_unregister'
The fix consists in adding CONFIG_PTP_1588_CLOCK_OPTIONAL as a
dependency for the driver.
Fixes: d096459494a887 ("net: lan966x: Add support for ptp clocks") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch series adds support of the Ethernet function of the PCI11010 / PCI11414 devices to the LAN743x driver.
The PCI1xxxx family of devices consists of a PCIe switch with a variety of embedded PCI endpoints on its downstream ports.
The PCI11010 / PCI11414 devices include an Ethernet 10/100/1000/2500 function as one of those embedded endpoints.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
M Chetan Kumar [Thu, 10 Feb 2022 15:34:45 +0000 (21:04 +0530)]
net: wwan: iosm: Enable M.2 7360 WWAN card support
This patch enables Intel M.2 7360 WWAN card support on
IOSM Driver.
Control path implementation is a reuse whereas data path
implementation it uses a different protocol called as MUX
Aggregation. The major portion of this patch covers the MUX
Aggregation protocol implementation used for IP traffic
communication.
For M.2 7360 WWAN card, driver exposes 2 wwan AT ports for
control communication. The user space application or the
modem manager to use wwan AT port for data path establishment.
During probe, driver reads the mux protocol device capability
register to know the mux protocol version supported by device.
Base on which the right mux protocol is initialized for data
path communication.
An overview of an Aggregation Protocol
1> An IP packet is encapsulated with 16 octet padding header
to form a Datagram & the start offset of the Datagram is
indexed into Datagram Header (DH).
2> Multiple such Datagrams are composed & the start offset of
each DH is indexed into Datagram Table Header (DTH).
3> The Datagram Table (DT) is IP session specific & table_length
item in DTH holds the number of composed datagram pertaining
to that particular IP session.
4> And finally the offset of first DTH is indexed into DBH (Datagram
Block Header).
So in TX/RX flow Datagram Block (Datagram Block Header + Payload)is
exchanged between driver & device.
Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 11 Feb 2022 14:19:23 +0000 (14:19 +0000)]
Merge tag 'wireless-next-2022-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
wireless-next patches for v5.18
First set of patches for v5.18, with both wireless and stack patches.
rtw89 now has AP mode support and wcn36xx has survey support. But
otherwise pretty normal.
Eric Dumazet [Thu, 10 Feb 2022 21:42:31 +0000 (13:42 -0800)]
ipv4: add (struct uncached_list)->quarantine list
This is an optimization to keep the per-cpu lists as short as possible:
Whenever rt_flush_dev() changes one rtable dst.dev
matching the disappearing device, it can can transfer the object
to a quarantine list, waiting for a final rt_del_uncached_list().
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 10 Feb 2022 21:42:30 +0000 (13:42 -0800)]
ipv6: add (struct uncached_list)->quarantine list
This is an optimization to keep the per-cpu lists as short as possible:
Whenever rt6_uncached_list_flush_dev() changes one rt6_info
matching the disappearing device, it can can transfer the object
to a quarantine list, waiting for a final rt6_uncached_list_del().
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 10 Feb 2022 21:42:29 +0000 (13:42 -0800)]
ipv6: give an IPv6 dev to blackhole_netdev
IPv6 addrconf notifiers wants the loopback device to
be the last device being dismantled at netns deletion.
This caused many limitations and work arounds.
Back in linux-5.3, Mahesh added a per host blackhole_netdev
that can be used whenever we need to make sure objects no longer
refer to a disappearing device.
If we attach to blackhole_netdev an ip6_ptr (allocate an idev),
then we can use this special device (which is never freed)
in place of the loopback_dev (which can be freed).
This will permit improvements in netdev_run_todo() and other parts
of the stack where had steps to make sure loopback_dev was
the last device to disappear.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Holger Brunck [Thu, 10 Feb 2022 17:48:23 +0000 (18:48 +0100)]
dsa: mv88e6xxx: make serdes SGMII/Fiber tx amplitude configurable
The mv88e6352, mv88e6240 and mv88e6176 have a serdes interface. This patch
allows to configure the output swing to a desired value in the
phy-handle of the port. The value which is peak to peak has to be
specified in microvolts. As the chips only supports eight dedicated
values we return EINVAL if the value in the DTS does not match one of
these values.
Signed-off-by: Holger Brunck <holger.brunck@hitachienergy.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Common PHYs and network PCSes often have the possibility to specify
peak-to-peak voltage on the differential pair - the default voltage
sometimes needs to be changed for a particular board.
Add properties `tx-p2p-microvolt` and `tx-p2p-microvolt-names` for this
purpose. The second property is needed to specify the mode for the
corresponding voltage in the `tx-p2p-microvolt` property, if the voltage
is to be used only for speficic mode. More voltage-mode pairs can be
specified.
Example usage with only one voltage (it will be used for all supported
PHY modes, the `tx-p2p-microvolt-names` property is not needed in this
case):
Guillaume Nault [Thu, 10 Feb 2022 15:08:08 +0000 (16:08 +0100)]
ipv6: Reject routes configurations that specify dsfield (tos)
The ->rtm_tos option is normally used to route packets based on both
the destination address and the DS field. However it's ignored for
IPv6 routes. Setting ->rtm_tos for IPv6 is thus invalid as the route
is going to work only on the destination address anyway, so it won't
behave as specified.
Suggested-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 11 Feb 2022 11:17:33 +0000 (11:17 +0000)]
Merge branch 'dsa-cleanup'
Vladimir Oltean says:
====================
More aggressive DSA cleanup
This series deletes some code which is apparently not needed.
I've had these patches in my tree for a while, and testing on my boards
didn't reveal any issues.
Compared to the RFC v1 series, the only change is the addition of patch 3.
https://patchwork.kernel.org/project/netdevbpf/cover/20220107184842.550334-1-vladimir.oltean@nxp.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 10 Feb 2022 13:45:00 +0000 (15:45 +0200)]
net: dsa: remove lockdep class for DSA slave address list
Since commit 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA
master to get rid of lockdep warnings"), suggested by Cong Wang, the
DSA interfaces and their master have different dev->nested_level, which
makes netif_addr_lock() stop complaining about potentially recursive
locking on the same lock class.
So we no longer need DSA slave interfaces to have their own lockdep
class.
Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 10 Feb 2022 13:44:59 +0000 (15:44 +0200)]
net: dsa: remove lockdep class for DSA master address list
Since commit 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA
master to get rid of lockdep warnings"), suggested by Cong Wang, the
DSA interfaces and their master have different dev->nested_level, which
makes netif_addr_lock() stop complaining about potentially recursive
locking on the same lock class.
So we no longer need DSA masters to have their own lockdep class.
Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 10 Feb 2022 13:44:58 +0000 (15:44 +0200)]
net: dsa: remove ndo_get_phys_port_name and ndo_get_port_parent_id
There are no legacy ports, DSA registers a devlink instance with ports
unconditionally for all switch drivers. Therefore, delete the old-style
ndo operations used for determining bridge forwarding domains.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Separate smc_tcp_listen_work() from smc_listen_work(), make them
independent of each other, the busy SMC handshake can not affect new TCP
connections visit any more. Avoid discarding a large number of TCP
connections after being overstock, which is undoubtedly raise the
connection establishment time.
Patch 2/5 (Limit SMC backlog connections):
Since patch 1 has separated smc_tcp_listen_work() from
smc_listen_work(), an unrestricted TCP accept have come into being. This
patch try to put a limit on SMC backlog connections refers to
implementation of TCP.
Patch 3/5 (Limit SMC visits when handshake workqueue congested):
Considering the complexity of SMC handshake right now, in short-lived
links scenarios, this may not be the main scenario of SMC though, it's
performance is still quite poor. This patch try to provide constraint on
SMC handshake when handshake workqueue congested, which is the sign of
SMC handshake stacking in our opinion.
Patch 4/5 (Dynamic control handshake limitation by socket options)
This patch allow applications dynamically control the ability of SMC
handshake limitation. Since SMC don't support set SMC socket option
before,
this patch also have to support SMC's owns socket options.
Patch 5/5 (Add global configure for handshake limitation by netlink)
This patch provides a way to get benefit of handshake limitation
without
modifying any code for applications, which is quite useful for most
existing applications.
After this patch set, performance figures like that:
Running 20s test @ http://11.213.45.6
4 threads and 10000 connections
693253 requests in 20.10s, 452.88MB read
Requests/sec: 34488.13
Transfer/sec: 22.53MB
That's a quite well performance improvement, about to 6 to 7 times in my
environment.
---
changelog:
v1 -> v2:
- fix compile warning
- fix invalid dependencies in kconfig
v2 -> v3:
- correct spelling mistakes
- fix useless variable declare
v3 -> v4
- make smc_tcp_ls_wq be static
v4 -> v5
- add dynamic control for SMC auto fallback by socket options
- add global configure for SMC auto fallback through netlink
v5 -> v6
- move auto fallback to net namespace scope
- remove auto fallback attribute in SMC_GEN_SYS_INFO
- add independent attributes for auto fallback
v6 -> v7
- fix wording and the naming issues, rename 'auto fallback' to handshake
limitation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
D. Wythe [Thu, 10 Feb 2022 09:11:38 +0000 (17:11 +0800)]
net/smc: Add global configure for handshake limitation by netlink
Although we can control SMC handshake limitation through socket options,
which means that applications who need it must modify their code. It's
quite troublesome for many existing applications. This patch modifies
the global default value of SMC handshake limitation through netlink,
providing a way to put constraint on handshake without modifies any code
for applications.
Suggested-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
D. Wythe [Thu, 10 Feb 2022 09:11:37 +0000 (17:11 +0800)]
net/smc: Dynamic control handshake limitation by socket options
This patch aims to add dynamic control for SMC handshake limitation for
every smc sockets, in production environment, it is possible for the
same applications to handle different service types, and may have
different opinion on SMC handshake limitation.
This patch try socket options to complete it, since we don't have socket
option level for SMC yet, which requires us to implement it at the same
time.
This patch does the following:
- add new socket option level: SOL_SMC.
- add new SMC socket option: SMC_LIMIT_HS.
- provide getter/setter for SMC socket options.
D. Wythe [Thu, 10 Feb 2022 09:11:36 +0000 (17:11 +0800)]
net/smc: Limit SMC visits when handshake workqueue congested
This patch intends to provide a mechanism to put constraint on SMC
connections visit according to the pressure of SMC handshake process.
At present, frequent visits will cause the incoming connections to be
backlogged in SMC handshake queue, raise the connections established
time. Which is quite unacceptable for those applications who base on
short lived connections.
There are two ways to implement this mechanism:
1. Put limitation after TCP established.
2. Put limitation before TCP established.
In the first way, we need to wait and receive CLC messages that the
client will potentially send, and then actively reply with a decline
message, in a sense, which is also a sort of SMC handshake, affect the
connections established time on its way.
In the second way, the only problem is that we need to inject SMC logic
into TCP when it is about to reply the incoming SYN, since we already do
that, it's seems not a problem anymore. And advantage is obvious, few
additional processes are required to complete the constraint.
This patch use the second way. After this patch, connections who beyond
constraint will not informed any SMC indication, and SMC will not be
involved in any of its subsequent processes.
D. Wythe [Thu, 10 Feb 2022 09:11:35 +0000 (17:11 +0800)]
net/smc: Limit backlog connections
Current implementation does not handling backlog semantics, one
potential risk is that server will be flooded by infinite amount
connections, even if client was SMC-incapable.
This patch works to put a limit on backlog connections, referring to the
TCP implementation, we divides SMC connections into two categories:
1. Half SMC connection, which includes all TCP established while SMC not
connections.
2. Full SMC connection, which includes all SMC established connections.
For half SMC connection, since all half SMC connections starts with TCP
established, we can achieve our goal by put a limit before TCP
established. Refer to the implementation of TCP, this limits will based
on not only the half SMC connections but also the full connections,
which is also a constraint on full SMC connections.
For full SMC connections, although we know exactly where it starts, it's
quite hard to put a limit before it. The easiest way is to block wait
before receive SMC confirm CLC message, while it's under protection by
smc_server_lgr_pending, a global lock, which leads this limit to the
entire host instead of a single listen socket. Another way is to drop
the full connections, but considering the cast of SMC connections, we
prefer to keep full SMC connections.
Even so, the limits of full SMC connections still exists, see commits
about half SMC connection below.
After this patch, the limits of backend connection shows like:
For SMC:
1. Client with SMC-capability can makes 2 * backlog full SMC connections
or 1 * backlog half SMC connections and 1 * backlog full SMC
connections at most.
2. Client without SMC-capability can only makes 1 * backlog half TCP
connections and 1 * backlog full TCP connections.
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
D. Wythe [Thu, 10 Feb 2022 09:11:34 +0000 (17:11 +0800)]
net/smc: Make smc_tcp_listen_work() independent
In multithread and 10K connections benchmark, the backend TCP connection
established very slowly, and lots of TCP connections stay in SYN_SENT
state.
Client: smc_run wrk -c 10000 -t 4 http://server
the netstate of server host shows like:
145042 times the listen queue of a socket overflowed
145042 SYNs to LISTEN sockets dropped
One reason of this issue is that, since the smc_tcp_listen_work() shared
the same workqueue (smc_hs_wq) with smc_listen_work(), while the
smc_listen_work() do blocking wait for smc connection established. Once
the workqueue became congested, it's will block the accept() from TCP
listen.
This patch creates a independent workqueue(smc_tcp_ls_wq) for
smc_tcp_listen_work(), separate it from smc_listen_work(), which is
quite acceptable considering that smc_tcp_listen_work() runs very fast.
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dt-bindings: net: dsa: realtek: convert to YAML schema, add MDIO
Schema changes:
- support for mdio-connected switches (mdio driver), recognized by
checking the presence of property "reg"
- new compatible strings for rtl8367s and rtl8367rb
- "interrupt-controller" was not added as a required property. It might
still work polling the ports when missing.
Examples changes:
- renamed "switch_intc" to make it unique between examples
- removed "dsa-mdio" from mdio compatible property
- renamed phy@0 to ethernet-phy@0 (not tested with real HW)
phy@ requires #phy-cells
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 11 Feb 2022 00:01:22 +0000 (16:01 -0800)]
Merge tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter and can.
Current release - new code bugs:
- sparx5: fix get_stat64 out-of-bound access and crash
- smc: fix netdev ref tracker misuse
Previous releases - regressions:
- eth: ixgbevf: require large buffers for build_skb on 82599VF, avoid
overflows
- eth: ocelot: fix all IP traffic getting trapped to CPU with PTP
over IP
- bonding: fix rare link activation misses in 802.3ad mode
Previous releases - always broken:
- tcp: fix tcp sock mem accounting in zero-copy corner cases
- remove the cached dst when uncloning an skb dst and its metadata,
since we only have one ref it'd lead to an UaF
- netfilter:
- conntrack: don't refresh sctp entries in closed state
- conntrack: re-init state for retransmitted syn-ack, avoid
connection establishment getting stuck with strange stacks
- ctnetlink: disable helper autoassign, avoid it getting lost
- nft_payload: don't allow transport header access for fragments
- dsa: fix use of devres for mdio throughout drivers
- eth: amd-xgbe: disable interrupts during pci removal
- eth: dpaa2-eth: unregister netdev before disconnecting the PHY
- eth: ice: fix IPIP and SIT TSO offload"
* tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister
net: mscc: ocelot: fix mutex lock error during ethtool stats read
ice: Avoid RTNL lock when re-creating auxiliary device
ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler
ice: fix IPIP and SIT TSO offload
ice: fix an error code in ice_cfg_phy_fec()
net: mpls: Fix GCC 12 warning
dpaa2-eth: unregister the netdev before disconnecting from the PHY
skbuff: cleanup double word in comment
net: macb: Align the dma and coherent dma masks
mptcp: netlink: process IPv6 addrs in creating listening sockets
selftests: mptcp: add missing join check
net: usb: qmi_wwan: Add support for Dell DW5829e
vlan: move dev_put into vlan_dev_uninit
vlan: introduce vlan_dev_free_egress_priority
ax25: fix UAF bugs of net_device caused by rebinding operation
net: dsa: fix panic when DSA master device unbinds on shutdown
net: amd-xgbe: disable interrupts during pci removal
tipc: rate limit warning for received illegal binding update
net: mdio: aspeed: Add missing MODULE_DEVICE_TABLE
...
Linus Torvalds [Thu, 10 Feb 2022 23:42:48 +0000 (15:42 -0800)]
Merge tag 'linux-kselftest-fixes-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
"Build and run-time fixes to pidfd, clone3, and ir tests"
* tag 'linux-kselftest-fixes-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/ir: fix build with ancient kernel headers
selftests: fixup build warnings in pidfd / clone3 tests
pidfd: fix test failure due to stack overflow on some arches
Linus Torvalds [Thu, 10 Feb 2022 23:39:59 +0000 (15:39 -0800)]
Merge tag 'linux-kselftest-kunit-fixes-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit fixes from Shuah Khan:
"Fixes to the test and usage documentation"
* tag 'linux-kselftest-kunit-fixes-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
Documentation: KUnit: Fix usage bug
kunit: fix missing f in f-string in run_checks.py
Vladimir Oltean [Thu, 10 Feb 2022 17:40:17 +0000 (19:40 +0200)]
net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister
Since struct mv88e6xxx_mdio_bus *mdio_bus is the bus->priv of something
allocated with mdiobus_alloc_size(), this means that mdiobus_free(bus)
will free the memory backing the mdio_bus as well. Therefore, the
mdio_bus->list element is freed memory, but we continue to iterate
through the list of MDIO buses using that list element.
To fix this, use the proper list iterator that handles element deletion
by keeping a copy of the list element next pointer.
Fixes: f53a2ce893b2 ("net: dsa: mv88e6xxx: don't use devres for mdiobus") Reported-by: Rafael Richter <rafael.richter@gin.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220210174017.3271099-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 10 Feb 2022 19:45:35 +0000 (11:45 -0800)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-02-10
Dan Carpenter propagates an error in FEC configuration.
Jesse fixes TSO offloads of IPIP and SIT frames.
Dave adds a dedicated LAG unregister function to resolve a KASAN error
and moves auxiliary device re-creation after LAG removal to the service
task to avoid issues with RTNL lock.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Avoid RTNL lock when re-creating auxiliary device
ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler
ice: fix IPIP and SIT TSO offload
ice: fix an error code in ice_cfg_phy_fec()
====================
An ongoing workqueue populates the stats buffer. At the same time, a user
might query the statistics. While writing to the buffer is mutex-locked,
reading from the buffer wasn't. This could lead to buggy reads by ethtool.
This patch fixes the former blamed commit, but the bug was introduced in
the latter.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Fixes: 1e1caa9735f90 ("ocelot: Clean up stats update deferred work") Fixes: a556c76adc052 ("net: mscc: Add initial Ocelot switch support") Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/all/20220210150451.416845-2-colin.foster@in-advantage.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dave Ertman [Fri, 21 Jan 2022 00:27:56 +0000 (16:27 -0800)]
ice: Avoid RTNL lock when re-creating auxiliary device
If a call to re-create the auxiliary device happens in a context that has
already taken the RTNL lock, then the call flow that recreates auxiliary
device can hang if there is another attempt to claim the RTNL lock by the
auxiliary driver.
To avoid this, any call to re-create auxiliary devices that comes from
an source that is holding the RTNL lock (e.g. netdev notifier when
interface exits a bond) should execute in a separate thread. To
accomplish this, add a flag to the PF that will be evaluated in the
service task and dealt with there.
Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Reviewed-by: Jonathan Toppins <jtoppins@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Dave Ertman [Tue, 18 Jan 2022 21:08:20 +0000 (13:08 -0800)]
ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler
Currently, the same handler is called for both a NETDEV_BONDING_INFO
LAG unlink notification as for a NETDEV_UNREGISTER call. This is
causing a problem though, since the netdev_notifier_info passed has
a different structure depending on which event is passed. The problem
manifests as a call trace from a BUG: KASAN stack-out-of-bounds error.
Fix this by creating a handler specific to NETDEV_UNREGISTER that only
is passed valid elements in the netdev_notifier_info struct for the
NETDEV_UNREGISTER event.
Also included is the removal of an unbalanced dev_put on the peer_netdev
and related braces.
Fixes: 6a8b357278f5 ("ice: Respond to a NETDEV_UNREGISTER event for LAG") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jesse Brandeburg [Fri, 14 Jan 2022 23:38:39 +0000 (15:38 -0800)]
ice: fix IPIP and SIT TSO offload
The driver was avoiding offload for IPIP (at least) frames due to
parsing the inner header offsets incorrectly when trying to check
lengths.
This length check works for VXLAN frames but fails on IPIP frames
because skb_transport_offset points to the inner header in IPIP
frames, which meant the subtraction of transport_header from
inner_network_header returns a negative value (-20).
With the code before this patch, everything continued to work, but GSO
was being used to segment, causing throughputs of 1.5Gb/s per thread.
After this patch, throughput is more like 10Gb/s per thread for IPIP
traffic.
Fixes: e94d44786693 ("ice: Implement filter sync, NDO operations and bump version") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Dan Carpenter [Fri, 7 Jan 2022 08:02:06 +0000 (11:02 +0300)]
ice: fix an error code in ice_cfg_phy_fec()
Propagate the error code from ice_get_link_default_override() instead
of returning success.
Fixes: ea78ce4dab05 ("ice: add link lenient and default override support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
net/switchdev: use struct_size over open coded arithmetic
Replace zero-length array with flexible-array member and make use
of the struct_size() helper in kmalloc(). For example:
struct switchdev_deferred_item {
...
unsigned long data[];
};
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes.
Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi (CGEL ZTE) <chi.minghao@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Thu, 10 Feb 2022 12:24:51 +0000 (13:24 +0100)]
ipv4: Reject again rules with high DSCP values
Commit 563f8e97e054 ("ipv4: Stop taking ECN bits into account in
fib4-rules") replaced the validation test on frh->tos. While the new
test is stricter for ECN bits, it doesn't detect the use of high order
DSCP bits. This would be fine if IPv4 could properly handle them. But
currently, most IPv4 lookups are done with the three high DSCP bits
masked. Therefore, using these bits doesn't lead to the expected
result.
Let's reject such configurations again, so that nobody starts to
use and make any assumption about how the stack handles the three high
order DSCP bits in fib4 rules.
Fixes: 563f8e97e054 ("ipv4: Stop taking ECN bits into account in fib4-rules") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds TC feature for VFs also. When MCAM
rules are allocated for a VF then either TC or ntuple
filters can be used. Below are the commands to use
TC feature for a VF(say lbk0):
devlink dev param set pci/0002:01:00.1 name mcam_count value 16 \
cmode runtime
ethtool -K lbk0 hw-tc-offload on
ifconfig lbk0 up
tc qdisc add dev lbk0 ingress
tc filter add dev lbk0 parent ffff: protocol ip flower skip_sw \
dst_mac 98:03:9b:83:aa:12 action police rate 100Mbit burst 5000
Also to modify any fields of the hardware context with
NIX_AQ_INSTOP_WRITE command then corresponding masks of those
fields must be set as per hardware. This was missing in
ingress ratelimiting context. This patch sets those masks also.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Victor Erminpour [Thu, 10 Feb 2022 00:28:38 +0000 (16:28 -0800)]
net: mpls: Fix GCC 12 warning
When building with automatic stack variable initialization, GCC 12
complains about variables defined outside of switch case statements.
Move the variable outside the switch, which silences the warning:
./net/mpls/af_mpls.c:1624:21: error: statement will never be executed [-Werror=switch-unreachable]
1624 | int err;
| ^~~
Signed-off-by: Victor Erminpour <victor.erminpour@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Device firmware can assert if the device shutdown path in driver
encounters an async. events from mfw (processed in
qed_mcp_handle_events()) after qed_mcp_unload_req() returns.
A call to qed_mcp_unload_req() currently marks the device as inactive
and thus stops any new events, but there is a windows where in-flight
events might still be received by the driver.
To prevent this race condition, atomically set QED_MCP_BYPASS_PROC_BIT
in qed_mcp_unload_req() to make sure qed_mcp_handle_events() ignores all
events. Wait for any event that might already be in-process to complete
by monitoring QED_MCP_IN_PROCESSING_BIT.
Signed-off-by: Pravin Kumar Ganesh Dhende <pdhende@marvell.com> Signed-off-by: Venkata Sudheer Kumar Bhavaraju <vbhavaraju@marvell.com> Signed-off-by: Alok Prasad <palok@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dpaa2-eth: unregister the netdev before disconnecting from the PHY
The netdev should be unregistered before we are disconnecting from the
MAC/PHY so that the dev_close callback is called and the PHY and the
phylink workqueues are actually stopped before we are disconnecting and
destroying the phylink instance.
Fixes: 719479230893 ("dpaa2-eth: add MAC/PHY support through phylink") Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marc St-Amand [Wed, 9 Feb 2022 09:43:25 +0000 (15:13 +0530)]
net: macb: Align the dma and coherent dma masks
Single page and coherent memory blocks can use different DMA masks
when the macb accesses physical memory directly. The kernel is clever
enough to allocate pages that fit into the requested address width.
When using the ARM SMMU, the DMA mask must be the same for single
pages and big coherent memory blocks. Otherwise the translation
tables turn into one big mess.
[ 74.959909] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK
[ 74.959989] arm-smmu fd800000.smmu: Unhandled context fault: fsr=0x402, iova=0x3165687460, fsynr=0x20001, cbfrsynra=0x877, cb=1
[ 75.173939] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK
[ 75.173955] arm-smmu fd800000.smmu: Unhandled context fault: fsr=0x402, iova=0x3165687460, fsynr=0x20001, cbfrsynra=0x877, cb=1
Since using the same DMA mask does not hurt direct 1:1 physical
memory mappings, this commit always aligns DMA and coherent masks.
Signed-off-by: Marc St-Amand <mstamand@ciena.com> Signed-off-by: Harini Katakam <harini.katakam@xilinx.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 10 Feb 2022 00:36:49 +0000 (16:36 -0800)]
selftests: net: test standard socket cmsgs across UDP and ICMP sockets
Test TIMESTAMPING and TXTIME across UDP / ICMP and IP versions.
Before ICMPv6 support:
# ./tools/testing/selftests/net/cmsg_time.sh
Case ICMPv6 - ts cnt returned '0', expected '2'
Case ICMPv6 - ts0 SCHED returned '', expected 'OK'
Case ICMPv6 - ts0 SND returned '', expected 'OK'
Case ICMPv6 - TXTIME abs returned '', expected 'OK'
Case ICMPv6 - TXTIME rel returned '', expected 'OK'
FAIL - 5/36 cases failed
After:
# ./tools/testing/selftests/net/cmsg_time.sh
OK
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 10 Feb 2022 00:36:41 +0000 (16:36 -0800)]
net: ping6: support setting socket options via cmsg
Minor reordering of the code and a call to sock_cmsg_send()
gives us support for setting the common socket options via
cmsg (the usual ones - SO_MARK, SO_TIMESTAMPING_OLD, SCM_TXTIME).
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Kalle Valo [Thu, 10 Feb 2022 14:36:03 +0000 (16:36 +0200)]
Merge tag 'mt76-for-kvalo-2022-02-04' of https://github.com/nbd168/wireless into main
mt76 patches for 5.18
- mt7915 mcu code cleanup
- mt7916 support
- fixes for SDIO support
- fixes for DFS
- power management fixes
- stability improvements
- background radar detection support
An update from ieee802154 for your *net-next* tree.
There is more ongoing in ieee802154 than usual. This will be the first pull
request for this cycle, but I expect one more. Depending on review and rework
times.
Pavel Skripkin ported the atusb driver over to the new USB api to avoid unint
problems as well as making use of the modern api without kmalloc() needs in he
driver.
Miquel Raynal landed some changes to ensure proper frame checksum checking with
hwsim, documenting our use of wake and stop_queue and eliding a magic value by
using the proper define.
David Girault documented the address struct used in ieee802154.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>