Petr Machata [Wed, 29 Mar 2023 15:24:53 +0000 (17:24 +0200)]
selftests: rtnetlink: Fix do_test_address_proto()
This selftest was introduced recently in the commit cited below. It misses
several check_err() invocations to actually verify that the previous
command succeeded. When these are added, the first one fails, because
besides the addresses added by hand, there can be a link-local address
added by the kernel. Adjust the check to expect at least three addresses
instead of exactly three, and add the missing check_err's.
Furthermore, the explanatory comments assume that the address with no
protocol is $addr2, when in fact it is $addr3. Update the comments.
net: ethernet: ti: Fix format specifier in netcp_create_interface()
After commit 3948b05950fd ("net: introduce a config option to tweak
MAX_SKB_FRAGS"), clang warns:
drivers/net/ethernet/ti/netcp_core.c:2085:4: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
MAX_SKB_FRAGS);
^~~~~~~~~~~~~
include/linux/dev_printk.h:144:65: note: expanded from macro 'dev_err'
dev_printk_index_wrap(_dev_err, KERN_ERR, dev, dev_fmt(fmt), ##__VA_ARGS__)
~~~ ^~~~~~~~~~~
include/linux/dev_printk.h:110:23: note: expanded from macro 'dev_printk_index_wrap'
_p_func(dev, fmt, ##__VA_ARGS__); \
~~~ ^~~~~~~~~~~
include/linux/skbuff.h:352:23: note: expanded from macro 'MAX_SKB_FRAGS'
#define MAX_SKB_FRAGS CONFIG_MAX_SKB_FRAGS
^~~~~~~~~~~~~~~~~~~~
./include/generated/autoconf.h:11789:30: note: expanded from macro 'CONFIG_MAX_SKB_FRAGS'
#define CONFIG_MAX_SKB_FRAGS 17
^~
1 warning generated.
Follow the pattern of the rest of the tree by changing the specifier to
'%u' and casting MAX_SKB_FRAGS explicitly to 'unsigned int', which
eliminates the warning.
Vladimir Oltean [Wed, 29 Mar 2023 13:38:19 +0000 (16:38 +0300)]
net: dsa: fix db type confusion in host fdb/mdb add/del
We have the following code paths:
Host FDB (unicast RX filtering):
dsa_port_standalone_host_fdb_add() dsa_port_bridge_host_fdb_add()
| |
+--------------+ +------------+
| |
v v
dsa_port_host_fdb_add()
dsa_port_standalone_host_fdb_del() dsa_port_bridge_host_fdb_del()
| |
+--------------+ +------------+
| |
v v
dsa_port_host_fdb_del()
Host MDB (multicast RX filtering):
dsa_port_standalone_host_mdb_add() dsa_port_bridge_host_mdb_add()
| |
+--------------+ +------------+
| |
v v
dsa_port_host_mdb_add()
dsa_port_standalone_host_mdb_del() dsa_port_bridge_host_mdb_del()
| |
+--------------+ +------------+
| |
v v
dsa_port_host_mdb_del()
The logic added by commit 5e8a1e03aa4d ("net: dsa: install secondary
unicast and multicast addresses as host FDB/MDB") zeroes out
db.bridge.num if the switch doesn't support ds->fdb_isolation
(the majority doesn't). This is done for a reason explained in commit c26933639b54 ("net: dsa: request drivers to perform FDB isolation").
Taking a single code path as example - dsa_port_host_fdb_add() - the
others are similar - the problem is that this function handles:
- DSA_DB_PORT databases, when called from
dsa_port_standalone_host_fdb_add()
- DSA_DB_BRIDGE databases, when called from
dsa_port_bridge_host_fdb_add()
So, if dsa_port_host_fdb_add() were to make any change on the
"bridge.num" attribute of the database, this would only be correct for a
DSA_DB_BRIDGE, and a type confusion for a DSA_DB_PORT bridge.
However, this bug is without consequences, for 2 reasons:
- dsa_port_standalone_host_fdb_add() is only called from code which is
(in)directly guarded by dsa_switch_supports_uc_filtering(ds), and that
function only returns true if ds->fdb_isolation is set. So, the code
only executed for DSA_DB_BRIDGE databases.
- Even if the code was not dead for DSA_DB_PORT, we have the following
memory layout:
So, the zeroization of dsa_db :: bridge :: num on a dsa_db structure of
type DSA_DB_PORT would access memory which is unused, because we only
use dsa_db :: dp for DSA_DB_PORT, and this is mapped at the same address
with dsa_db :: dev for DSA_DB_BRIDGE, thanks to the union definition.
It is correct to fix up dsa_db :: bridge :: num only from code paths
that come from the bridge / switchdev, so move these there.
Tom Rix [Wed, 29 Mar 2023 12:59:29 +0000 (08:59 -0400)]
net: ksz884x: remove unused change variable
clang with W=1 reports
drivers/net/ethernet/micrel/ksz884x.c:3216:6: error: variable
'change' set but not used [-Werror,-Wunused-but-set-variable]
int change = 0;
^
This variable is not used so remove it.
drivers/net/ethernet/mediatek/mtk_ppe.c 3fbe4d8c0e53 ("net: ethernet: mtk_eth_soc: ppe: add support for flow accounting") 924531326e2d ("net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow")
Linus Torvalds [Thu, 30 Mar 2023 21:05:21 +0000 (14:05 -0700)]
Merge tag 'net-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from CAN and WPAN.
Still quite a few bugs from this release. This pull is a bit smaller
because major subtrees went into the previous one. Or maybe people
took spring break off?
Current release - regressions:
- phy: micrel: correct KSZ9131RNX EEE capabilities and advertisement
Current release - new code bugs:
- eth: wangxun: fix vector length of interrupt cause
- vsock/loopback: consistently protect the packet queue with
sk_buff_head.lock
- virtio/vsock: fix header length on skb merging
- wpan: ca8210: fix unsigned mac_len comparison with zero
Previous releases - regressions:
- eth: stmmac: don't reject VLANs when IFF_PROMISC is set
- eth: smsc911x: avoid PHY being resumed when interface is not up
- eth: mtk_eth_soc: fix tx throughput regression with direct 1G links
- eth: bnx2x: use the right build_skb() helper after core rework
- wwan: iosm: fix 7560 modem crash on use on unsupported channel
Previous releases - always broken:
- eth: sfc: don't overwrite offload features at NIC reset
- eth: r8169: fix RTL8168H and RTL8107E rx crc error
- can: j1939: prevent deadlock by moving j1939_sk_errqueue()
- virt: vmxnet3: use GRO callback when UPT is enabled
- virt: xen: don't do grant copy across page boundary
- phy: dp83869: fix default value for tx-/rx-internal-delay
- dsa: ksz8: fix multiple issues with ksz8_fdb_dump
- eth: mvpp2: fix classification/RSS of VLAN and fragmented packets
* tag 'net-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits)
net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow
net: ethernet: mtk_eth_soc: fix L2 offloading with DSA untag offload
net: ethernet: mtk_eth_soc: fix flow block refcounting logic
net: mvneta: fix potential double-frees in mvneta_txq_sw_deinit()
net: dsa: sync unicast and multicast addresses for VLAN filters too
net: dsa: mv88e6xxx: Enable IGMP snooping on user ports only
xen/netback: use same error messages for same errors
test/vsock: new skbuff appending test
virtio/vsock: WARN_ONCE() for invalid state of socket
virtio/vsock: fix header length on skb merging
bnxt_en: Add missing 200G link speed reporting
bnxt_en: Fix typo in PCI id to device description string mapping
bnxt_en: Fix reporting of test result in ethtool selftest
i40e: fix registers dump after run ethtool adapter self test
bnx2x: use the right build_skb() helper
net: ipa: compute DMA pool size properly
net: wwan: iosm: fixes 7560 modem crash
net: ethernet: mtk_eth_soc: fix tx throughput regression with direct 1G links
ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg()
ice: add profile conflict check for AVF FDIR
...
Linus Torvalds [Thu, 30 Mar 2023 20:58:12 +0000 (13:58 -0700)]
Merge tag 'for-6.3/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix two DM core bugs in the code that handles splitting "abnormal" IO
(discards, write same and secure erase) and issuing that IO to the
correct underlying devices (and offsets within those devices).
* tag 'for-6.3/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: fix __send_duplicate_bios() to always allow for splitting IO
dm: fix improper splitting for abnormal bios
* tag 'drm-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm: (27 commits)
Revert "drm/scheduler: track GPU active time per entity"
Revert "drm/etnaviv: export client GPU usage statistics via fdinfo"
drm/etnaviv: fix reference leak when mmaping imported buffer
drm/amdgpu: allow more APUs to do mode2 reset when go to S4
drm/amd/display: Take FEC Overhead into Timeslot Calculation
drm/amd/display: Add DSC Support for Synaptics Cascaded MST Hub
drm: test: Fix 32-bit issue in drm_buddy_test
drm: buddy_allocator: Fix buddy allocator init on 32-bit systems
drm/nouveau/kms: Fix backlight registration
drm/i915/perf: Drop wakeref on GuC RC error
drm/i915/dpt: Treat the DPT BO as a framebuffer
drm/i915/gem: Flush lmem contents after construction
drm/i915/tc: Fix the ICL PHY ownership check in TC-cold state
drm/i915: Disable DC states for all commits
drm/i915: Workaround ICL CSC_MODE sticky arming
drm/i915: Add a .color_post_update() hook
drm/i915: Move CSC load back into .color_commit_arm() when PSR is enabled on skl/glk
drm/i915: Split icl_color_commit_noarm() from skl_color_commit_noarm()
drm/i915/pmu: Use functions common with sysfs to read actual freq
accel/ivpu: Fix IPC buffer header status field value
...
Mike Snitzer [Thu, 30 Mar 2023 19:09:29 +0000 (15:09 -0400)]
dm: fix __send_duplicate_bios() to always allow for splitting IO
Commit 7dd76d1feec70 ("dm: improve bio splitting and associated IO
accounting") only called setup_split_accounting() from
__send_duplicate_bios() if a single bio were being issued. But the case
where duplicate bios are issued must call it too.
Otherwise the bio won't be split and resubmitted (via recursion through
block core back to DM) to submit the later portions of a bio (which may
map to an entirely different target).
For example, when discarding an entire DM striped device with the
following DM table:
vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048
Fixes: 7dd76d1feec70 ("dm: improve bio splitting and associated IO accounting") Cc: stable@vger.kernel.org Reported-by: Orange Kao <orange@aiven.io> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mike Snitzer [Thu, 30 Mar 2023 18:56:38 +0000 (14:56 -0400)]
dm: fix improper splitting for abnormal bios
"Abnormal" bios include discards, write zeroes and secure erase. By no
longer passing the calculated 'len' pointer, commit 7dd06a2548b2 ("dm:
allow dm_accept_partial_bio() for dm_io without duplicate bios") took a
senseless approach to disallowing dm_accept_partial_bio() from working
for duplicate bios processed using __send_duplicate_bios().
It inadvertently and incorrectly stopped the use of 'len' when
initializing a target's io (in alloc_tio). As such the resulting tio
could address more area of a device than it should.
For example, when discarding an entire DM striped device with the
following DM table:
vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048
Before this fix:
device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=102400
blkdiscard: attempt to access beyond end of device
loop0: rw=2051, sector=2048, nr_sectors = 102400 limit=81920
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=102400
blkdiscard: attempt to access beyond end of device
loop1: rw=2051, sector=2048, nr_sectors = 102400 limit=81920
Fixes: 7dd06a2548b2 ("dm: allow dm_accept_partial_bio() for dm_io without duplicate bios") Cc: stable@vger.kernel.org Reported-by: Orange Kao <orange@aiven.io> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Felix Fietkau [Thu, 30 Mar 2023 12:08:40 +0000 (14:08 +0200)]
net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow
The cache needs to be flushed to ensure that the hardware stops offloading
the flow immediately.
Fixes: 33fc42de3327 ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries") Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230330120840.52079-3-nbd@nbd.name Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since we call flow_block_cb_decref on FLOW_BLOCK_UNBIND, we also need to
call flow_block_cb_incref for a newly allocated cb.
Also fix the accidentally inverted refcount check on unbind.
Fixes: 502e84e2382d ("net: ethernet: mtk_eth_soc: add flow offloading support") Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230330120840.52079-1-nbd@nbd.name Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: mvneta: fix potential double-frees in mvneta_txq_sw_deinit()
Reported on the Turris forum, mvneta provokes kernel warnings in the
architecture DMA mapping code when mvneta_setup_txqs() fails to
allocate memory. This happens because when mvneta_cleanup_txqs() is
called in the mvneta_stop() path, we leave pointers in the structure
that have been freed.
Then on mvneta_open(), we call mvneta_setup_txqs(), which starts
allocating memory. On memory allocation failure, mvneta_cleanup_txqs()
will walk all the queues freeing any non-NULL pointers - which includes
pointers that were previously freed in mvneta_stop().
Fix this by setting these pointers to NULL to prevent double-freeing
of the same memory.
Vladimir Oltean [Wed, 29 Mar 2023 15:18:21 +0000 (18:18 +0300)]
net: dsa: sync unicast and multicast addresses for VLAN filters too
If certain conditions are met, DSA can install all necessary MAC
addresses on the CPU ports as FDB entries and disable flooding towards
the CPU (we call this RX filtering).
There is one corner case where this does not work.
ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
ip link set swp0 master br0 && ip link set swp0 up
ip link add link swp0 name swp0.100 type vlan id 100
ip link set swp0.100 up && ip addr add 192.168.100.1/24 dev swp0.100
Traffic through swp0.100 is broken, because the bridge turns on VLAN
filtering in the swp0 port (causing RX packets to be classified to the
FDB database corresponding to the VID from their 802.1Q header), and
although the 8021q module does call dev_uc_add() towards the real
device, that API is VLAN-unaware, so it only contains the MAC address,
not the VID; and DSA's current implementation of ndo_set_rx_mode() is
only for VID 0 (corresponding to FDB entries which are installed in an
FDB database which is only hit when the port is VLAN-unaware).
It's interesting to understand why the bridge does not turn on
IFF_PROMISC for its swp0 bridge port, and it may appear at first glance
that this is a regression caused by the logic in commit 2796d0c648c9
("bridge: Automatically manage port promiscuous mode."). After all,
a bridge port needs to have IFF_PROMISC by its very nature - it needs to
receive and forward frames with a MAC DA different from the bridge
ports' MAC addresses.
While that may be true, when the bridge is VLAN-aware *and* it has a
single port, there is no real reason to enable promiscuity even if that
is an automatic port, with flooding and learning (there is nowhere for
packets to go except to the BR_FDB_LOCAL entries), and this is how the
corner case appears. Adding a second automatic interface to the bridge
would make swp0 promisc as well, and would mask the corner case.
Given the dev_uc_add() / ndo_set_rx_mode() API is what it is (it doesn't
pass a VLAN ID), the only way to address that problem is to install host
FDB entries for the cartesian product of RX filtering MAC addresses and
VLAN RX filters.
Fixes: 7569459a52c9 ("net: dsa: manage flooding on the CPU ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230329151821.745752-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Steffen Bätz [Wed, 29 Mar 2023 15:01:40 +0000 (12:01 -0300)]
net: dsa: mv88e6xxx: Enable IGMP snooping on user ports only
Do not set the MV88E6XXX_PORT_CTL0_IGMP_MLD_SNOOP bit on CPU or DSA ports.
This allows the host CPU port to be a regular IGMP listener by sending out
IGMP Membership Reports, which would otherwise not be forwarded by the
mv88exxx chip, but directly looped back to the CPU port itself.
Fixes: 54d792f257c6 ("net: dsa: Centralise global and port setup code into mv88e6xxx.") Signed-off-by: Steffen Bätz <steffen@innosonix.de> Signed-off-by: Fabio Estevam <festevam@denx.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230329150140.701559-1-festevam@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Vetter [Thu, 30 Mar 2023 16:07:12 +0000 (18:07 +0200)]
Merge tag 'drm-intel-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v6.3-rc5:
- Fix PMU support by reusing functions with sysfs
- Fix a number of issues related to color, PSR and arm/noarm
- Fix state check related to ICL PHY ownership check in TC-cold state
- Flush lmem contents after construction
- Fix hibernate oops related to DPT BO
- Fix perf stream error path wakeref balance
Linus Torvalds [Thu, 30 Mar 2023 16:04:04 +0000 (09:04 -0700)]
Merge tag 'sound-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes:
- A potential deadlock fix for USB-audio, involving some change in
PCM core side
- A regression fix for probes of USB-audio devices with the
vendor-specific PCM format bits
- Two regression fixes for the old YMFPCI driver
- A few HD-audio quirks as usual"
* tag 'sound-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Add quirk for Lenovo ZhaoYang CF4620Z
ALSA: ymfpci: Fix BUG_ON in probe function
ALSA: ymfpci: Create card with device-managed snd_devm_card_new()
ALSA: usb-audio: Fix regression on detection of Roland VS-100
ALSA: hda/realtek: Fix support for Dell Precision 3260
ALSA: usb-audio: Fix recursive locking at XRUN during syncing
ALSA: hda/conexant: Partial revert of a quirk for Lenovo
ALSA: hda/realtek: Add quirks for some Clevo laptops
Linus Torvalds [Thu, 30 Mar 2023 16:00:17 +0000 (09:00 -0700)]
Merge tag 'zonefs-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs
Pull zonefs fixes from Damien Le Moal:
- Make sure to always invalidate the last page of an inode straddling
inode->i_size to avoid data inconsistencies with appended data when
the device zone write granularity does not match the page size.
- Do not propagate iomap -ENOBLK error to userspace and use -EBUSY
instead.
* tag 'zonefs-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: Do not propagate iomap_dio_rw() ENOTBLK error to user space
zonefs: Always invalidate last cached page on append write
Lucas Stach [Thu, 30 Mar 2023 15:33:27 +0000 (17:33 +0200)]
Revert "drm/etnaviv: export client GPU usage statistics via fdinfo"
This reverts commit 97804a133c68, as it builds on top of df622729ddbf
("drm/scheduler: track GPU active time per entity") which needs to be
reverted, as it introduces a use-after-free.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Lucas Stach [Fri, 24 Feb 2023 17:21:54 +0000 (18:21 +0100)]
drm/etnaviv: fix reference leak when mmaping imported buffer
drm_gem_prime_mmap() takes a reference on the GEM object, but before that
drm_gem_mmap_obj() already takes a reference, which will be leaked as only
one reference is dropped when the mapping is closed. Drop the extra
reference when dma_buf_mmap() succeeds.
Cc: stable@vger.kernel.org Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Tim Huang [Thu, 30 Mar 2023 02:33:02 +0000 (10:33 +0800)]
drm/amdgpu: allow more APUs to do mode2 reset when go to S4
Skip mode2 reset only for IMU enabled APUs when do S4.
This patch is to fix the regression issue
https://gitlab.freedesktop.org/drm/amd/-/issues/2483
It is generated by commit b589626674de ("drm/amdgpu: skip ASIC reset
for APUs when go to S4").
Fixes: b589626674de ("drm/amdgpu: skip ASIC reset for APUs when go to S4") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2483 Tested-by: Yuan Perry <Perry.Yuan@amd.com> Signed-off-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
Damien Le Moal [Thu, 30 Mar 2023 00:47:58 +0000 (09:47 +0900)]
zonefs: Do not propagate iomap_dio_rw() ENOTBLK error to user space
The call to invalidate_inode_pages2_range() in __iomap_dio_rw() may
fail, in which case -ENOTBLK is returned and this error code is
propagated back to user space trhough iomap_dio_rw() ->
zonefs_file_dio_write() return chain. This error code is fairly obscure
and may confuse the user. Avoid this and be consistent with the behavior
of zonefs_file_dio_append() for similar invalidate_inode_pages2_range()
errors by returning -EBUSY to user space when iomap_dio_rw() returns
-ENOTBLK.
Suggested-by: Christoph Hellwig <hch@infradead.org> Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Damien Le Moal [Wed, 29 Mar 2023 04:16:01 +0000 (13:16 +0900)]
zonefs: Always invalidate last cached page on append write
When a direct append write is executed, the append offset may correspond
to the last page of a sequential file inode which might have been cached
already by buffered reads, page faults with mmap-read or non-direct
readahead. To ensure that the on-disk and cached data is consistant for
such last cached page, make sure to always invalidate it in
zonefs_file_dio_append(). If the invalidation fails, return -EBUSY to
userspace to differentiate from IO errors.
This invalidation will always be a no-op when the FS block size (device
zone write granularity) is equal to the page size (e.g. 4K).
Reported-by: Hans Holmberg <Hans.Holmberg@wdc.com> Fixes: 02ef12a663c7 ("zonefs: use REQ_OP_ZONE_APPEND for sync DIO") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Paolo Abeni [Thu, 30 Mar 2023 11:40:04 +0000 (13:40 +0200)]
Merge branch 'net-rps-rfs-improvements'
Eric Dumazet says:
====================
net: rps/rfs improvements
Jason Xing attempted to optimize napi_schedule_rps() by avoiding
unneeded NET_RX_SOFTIRQ raises: [1], [2]
This is quite complex to implement properly. I chose to implement
the idea, and added a similar optimization in ____napi_schedule()
Overall, in an intensive RPC workload, with 32 TX/RX queues with RFS
I was able to observe a ~10% reduction of NET_RX_SOFTIRQ
invocations.
While this had no impact on throughput or cpu costs on this synthetic
benchmark, we know that firing NET_RX_SOFTIRQ from softirq handler
can force __do_softirq() to wakeup ksoftirqd when need_resched() is true.
This can have a latency impact on stressed hosts.
Eric Dumazet [Tue, 28 Mar 2023 23:50:21 +0000 (23:50 +0000)]
net: optimize ____napi_schedule() to avoid extra NET_RX_SOFTIRQ
____napi_schedule() adds a napi into current cpu softnet_data poll_list,
then raises NET_RX_SOFTIRQ to make sure net_rx_action() will process it.
Idea of this patch is to not raise NET_RX_SOFTIRQ when being called indirectly
from net_rx_action(), because we can process poll_list from this point,
without going to full softirq loop.
This needs a change in net_rx_action() to make sure we restart
its main loop if sd->poll_list was updated without NET_RX_SOFTIRQ
being raised.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jason Xing <kernelxing@tencent.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Tested-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Eric Dumazet [Tue, 28 Mar 2023 23:50:20 +0000 (23:50 +0000)]
net: optimize napi_schedule_rps()
Based on initial patch from Jason Xing.
Idea is to not raise NET_RX_SOFTIRQ from napi_schedule_rps()
when we queued a packet into another cpu backlog.
We can do this only in the context of us being called indirectly
from net_rx_action(), to have the guarantee our rps_ipi_list
will be processed before we exit from net_rx_action().
Link: https://lore.kernel.org/lkml/20230325152417.5403-1-kerneljasonxing@gmail.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jason Xing <kernelxing@tencent.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Tested-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Eric Dumazet [Tue, 28 Mar 2023 23:50:19 +0000 (23:50 +0000)]
net: add softnet_data.in_net_rx_action
We want to make two optimizations in napi_schedule_rps() and
____napi_schedule() which require to know if these helpers are
called from net_rx_action(), instead of being called from
other contexts.
sd.in_net_rx_action is only read/written by the owning cpu.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Tested-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Eric Dumazet [Tue, 28 Mar 2023 23:50:18 +0000 (23:50 +0000)]
net: napi_schedule_rps() cleanup
napi_schedule_rps() return value is ignored, remove it.
Change the comment to clarify the intent.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Tested-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 30 Mar 2023 08:47:51 +0000 (10:47 +0200)]
Merge branch 'fix-header-length-on-skb-merging'
Arseniy Krasnov says:
====================
fix header length on skb merging
this patchset fixes appending newly arrived skbuff to the last skbuff of
the socket's queue during rx path. Problem fires when we are trying to
append data to skbuff which was already processed in dequeue callback
at least once. Dequeue callback calls function 'skb_pull()' which changes
'skb->len'. In current implementation 'skb->len' is used to update length
in header of last skbuff after new data was copied to it. This is bug,
because value in header is used to calculate 'rx_bytes'/'fwd_cnt' and
thus must be constant during skbuff lifetime. Here is example, we have
two skbuffs: skb0 with length 10 and skb1 with length 4.
1) skb0 arrives, hdr->len == skb->len == 10, rx_bytes == 10
2) Read 3 bytes from skb0, skb->len == 7, hdr->len == 10, rx_bytes == 10
3) skb1 arrives, hdr->len == skb->len == 4, rx_bytes == 14
4) Append skb1 to skb0, skb0 now has skb->len == 11, hdr->len == 11.
But value of 11 in header is invalid.
5) Read whole skb0, update rx_bytes by 11 from skb0's header.
6) At this moment rx_bytes == 3, but socket's queue is empty.
This bug starts to fire since:
commit 077706165717 ("virtio/vsock: don't use skbuff state to account credit")
In fact, it presents before, but didn't triggered due to a little bit
buggy implementation of credit calculation logic. So i'll use Fixes tag
for it.
I really forgot about this branch in rx path when implemented patch 077706165717.
This patchset contains 3 patches:
1) Fix itself.
2) Patch with WARN_ONCE() to catch such problems in future.
3) Patch with test which triggers skb appending logic. It looks like
simple test with several 'send()' and 'recv()', but it checks, that
skbuff appending works ok.
====================
Arseniy Krasnov [Tue, 28 Mar 2023 11:33:07 +0000 (14:33 +0300)]
test/vsock: new skbuff appending test
This adds test which checks case when data of newly received skbuff is
appended to the last skbuff in the socket's queue. It looks like simple
test with 'send()' and 'recv()', but internally it triggers logic which
appends one received skbuff to another. Test checks that this feature
works correctly.
Arseniy Krasnov [Tue, 28 Mar 2023 11:32:12 +0000 (14:32 +0300)]
virtio/vsock: WARN_ONCE() for invalid state of socket
This adds WARN_ONCE() and return from stream dequeue callback when
socket's queue is empty, but 'rx_bytes' still non-zero. This allows
the detection of potential bugs due to packet merging (see previous
patch).
Arseniy Krasnov [Tue, 28 Mar 2023 11:31:28 +0000 (14:31 +0300)]
virtio/vsock: fix header length on skb merging
This fixes appending newly arrived skbuff to the last skbuff of the
socket's queue. Problem fires when we are trying to append data to skbuff
which was already processed in dequeue callback at least once. Dequeue
callback calls function 'skb_pull()' which changes 'skb->len'. In current
implementation 'skb->len' is used to update length in header of the last
skbuff after new data was copied to it. This is bug, because value in
header is used to calculate 'rx_bytes'/'fwd_cnt' and thus must be not
be changed during skbuff's lifetime.
Bug starts to fire since:
commit 077706165717
("virtio/vsock: don't use skbuff state to account credit")
It presents before, but didn't triggered due to a little bit buggy
implementation of credit calculation logic. So use Fixes tag for it.
Fixes: 077706165717 ("virtio/vsock: don't use skbuff state to account credit") Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Thu, 30 Mar 2023 05:15:24 +0000 (22:15 -0700)]
Merge tag 'mlx5-updates-2023-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-03-28
Dragos Tatulea says:
====================
net/mlx5e: RX, Drop page_cache and fully use page_pool
For page allocation on the rx path, the mlx5e driver has been using an
internal page cache in tandem with the page pool. The internal page
cache uses a queue for page recycling which has the issue of head of
queue blocking.
This patch series drops the internal page_cache altogether and uses the
page_pool to implement everything that was done by the page_cache
before:
* Let the page_pool handle dma mapping and unmapping.
* Use fragmented pages with fragment counter instead of tracking via
page ref.
* Enable skb recycling.
The patch series has the following effects on the rx path:
* Improved performance for the cases when there was low page recycling
due to head of queue blocking in the internal page_cache. The test
for this was running a single iperf TCP stream to a rx queue
which is bound on the same cpu as the application.
This will be handled in a different patch series by adding support for
multi-packet per page.
* For other cases the performance is roughly the same.
The above numbers were obtained on the following system:
24 core Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
32 GB RAM
ConnectX-7 single port
The breakdown on the patch series is the following:
* Preparations for introducing the mlx5e_frag_page struct.
* Delete the mlx5e_page_cache struct.
* Enable dma mapping from page_pool.
* Enable skb recycling and fragment counting.
* Do deferred release of pages (just before alloc) to ensure better
page_pool cache utilization.
====================
* tag 'mlx5-updates-2023-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: RX, Remove unnecessary recycle parameter and page_cache stats
net/mlx5e: RX, Break the wqe bulk refill in smaller chunks
net/mlx5e: RX, Increase WQE bulk size for legacy rq
net/mlx5e: RX, Split off release path for xsk buffers for legacy rq
net/mlx5e: RX, Defer page release in legacy rq for better recycling
net/mlx5e: RX, Change wqe last_in_page field from bool to bit flags
net/mlx5e: RX, Defer page release in striding rq for better recycling
net/mlx5e: RX, Rename xdp_xmit_bitmap to a more generic name
net/mlx5e: RX, Enable skb page recycling through the page_pool
net/mlx5e: RX, Enable dma map and sync from page_pool allocator
net/mlx5e: RX, Remove internal page_cache
net/mlx5e: RX, Store SHAMPO header pages in array
net/mlx5e: RX, Remove alloc unit layout constraint for striding rq
net/mlx5e: RX, Remove alloc unit layout constraint for legacy rq
net/mlx5e: RX, Remove mlx5e_alloc_unit argument in page allocation
====================
Jakub Kicinski [Thu, 30 Mar 2023 04:48:18 +0000 (21:48 -0700)]
Merge branch 'bnxt_en-3-bug-fixes'
Michael Chan says:
====================
bnxt_en: 3 Bug fixes
This series contains 3 small bug fixes covering ethtool self test, PCI
ID string typos, and some missing 200G link speed ethtool reporting logic.
====================
Michael Chan [Wed, 29 Mar 2023 01:30:21 +0000 (18:30 -0700)]
bnxt_en: Add missing 200G link speed reporting
bnxt_fw_to_ethtool_speed() is missing the case statement for 200G
link speed reported by firmware. As a result, ethtool will report
unknown speed when the firmware reports 200G link speed.
Fixes: 532262ba3b84 ("bnxt_en: ethtool: support PAM4 link speeds up to 200G") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 30 Mar 2023 04:46:18 +0000 (21:46 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-03-28 (ice)
This series contains updates to ice driver only.
Jesse fixes mismatched header documentation reported when building with
W=1.
Brett restricts setting of VSI context to only applicable fields for the
given ICE_AQ_VSI_PROP_Q_OPT_VALID bit.
Junfeng adds check when adding Flow Director filters that conflict with
existing filter rules.
Jakob Koschel adds interim variable for iterating to prevent possible
misuse after looping.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg()
ice: add profile conflict check for AVF FDIR
ice: Fix ice_cfg_rdma_fltr() to only update relevant fields
ice: fix W=1 headers mismatch
====================
Jakub Kicinski [Thu, 30 Mar 2023 04:41:12 +0000 (21:41 -0700)]
Merge tag 'ieee802154-for-net-2023-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan
Stefan Schmidt says:
====================
ieee802154 for net 2023-03-29
Two small fixes this time.
Dongliang Mu removed an unnecessary null pointer check.
Harshit Mogalapalli fixed an int comparison unsigned against signed from a
recent other fix in the ca8210 driver.
* tag 'ieee802154-for-net-2023-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan:
net: ieee802154: remove an unnecessary null pointer check
ca8210: Fix unsigned mac_len comparison with zero in ca8210_skb_tx()
====================
Dan Carpenter [Wed, 29 Mar 2023 06:51:37 +0000 (09:51 +0300)]
octeon_ep: unlock the correct lock on error path
The h and the f letters are swapped so it unlocks the wrong lock.
Fixes: 577f0d1b1c5f ("octeon_ep: add separate mailbox command and response queues") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/251aa2a2-913e-4868-aac9-0a90fc3eeeda@kili.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 29 Mar 2023 00:00:13 +0000 (17:00 -0700)]
bnx2x: use the right build_skb() helper
build_skb() no longer accepts slab buffers. Since slab use is fairly
uncommon we prefer the drivers to call a separate slab_build_skb()
function appropriately.
bnx2x uses the old semantics where size of 0 meant buffer from slab.
It sets the fp->rx_frag_size to 0 for MTUs which don't fit in a page.
It needs to call slab_build_skb().
This fixes the WARN_ONCE() of incorrect API use seen with bnx2x.
Alex Elder [Tue, 28 Mar 2023 16:27:51 +0000 (11:27 -0500)]
net: ipa: compute DMA pool size properly
In gsi_trans_pool_init_dma(), the total size of a pool of memory
used for DMA transactions is calculated. However the calculation is
done incorrectly.
For 4KB pages, this total size is currently always more than one
page, and as a result, the calculation produces a positive (though
incorrect) total size. The code still works in this case; we just
end up with fewer DMA pool entries than we intended.
Bjorn Andersson tested booting a kernel with 16KB pages, and hit a
null pointer derereference in sg_alloc_append_table_from_pages(),
descending from gsi_trans_pool_init_dma(). The cause of this was
that a 16KB total size was going to be allocated, and with 16KB
pages the order of that allocation is 0. The total_size calculation
yielded 0, which eventually led to the crash.
Correcting the total_size calculation fixes the problem.
Reported-by: Bjorn Andersson <quic_bjorande@quicinc.com> Tested-by: Bjorn Andersson <quic_bjorande@quicinc.com> Fixes: 9dd441e4ed57 ("soc: qcom: ipa: GSI transactions") Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230328162751.2861791-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tianfei Zhang [Tue, 28 Mar 2023 14:24:55 +0000 (10:24 -0400)]
ptp: add ToD device driver for Intel FPGA cards
Adding a DFL (Device Feature List) device driver of ToD device for
Intel FPGA cards.
The Intel FPGA Time of Day(ToD) IP within the FPGA DFL bus is exposed
as PTP Hardware clock(PHC) device to the Linux PTP stack to synchronize
the system clock to its ToD information using phc2sys utility of the
Linux PTP stack. The DFL is a hardware List within FPGA, which defines
a linked list of feature headers within the device MMIO space to provide
an extensible way of adding subdevice features.
Signed-off-by: Raghavendra Khadatare <raghavendrax.anand.khadatare@intel.com> Signed-off-by: Tianfei Zhang <tianfei.zhang@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20230328142455.481146-1-tianfei.zhang@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fangzhi Zuo [Wed, 1 Mar 2023 02:34:58 +0000 (21:34 -0500)]
drm/amd/display: Take FEC Overhead into Timeslot Calculation
8b/10b encoding needs to add 3% fec overhead into the pbn.
In the Synapcis Cascaded MST hub, the first stage MST branch device
needs the information to determine the timeslot count for the
second stage MST branch device. Missing this overhead will leads to
insufficient timeslot allocation.
Cc: stable@vger.kernel.org Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Hersen Wu <hersenxs.wu@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fangzhi Zuo [Fri, 24 Feb 2023 18:45:21 +0000 (13:45 -0500)]
drm/amd/display: Add DSC Support for Synaptics Cascaded MST Hub
Traditional synaptics hub has one MST branch device without virtual dpcd.
Synaptics cascaded hub has two chained MST branch devices. DSC decoding
is performed via root MST branch device, instead of the second MST branch
device.
Linus Torvalds [Wed, 29 Mar 2023 17:24:07 +0000 (10:24 -0700)]
Merge tag 'xtensa-20230327' of https://github.com/jcmvbkbc/linux-xtensa
Pull xtensa fixes from Max Filippov:
- fix KASAN report in show_stack
- drop linux-xtensa mailing list from the MAINTAINERS file
* tag 'xtensa-20230327' of https://github.com/jcmvbkbc/linux-xtensa:
MAINTAINERS: xtensa: drop linux-xtensa@linux-xtensa.org mailing list
xtensa: fix KASAN report for show_stack
David Gow [Wed, 29 Mar 2023 06:55:34 +0000 (14:55 +0800)]
drm: test: Fix 32-bit issue in drm_buddy_test
The drm_buddy_test KUnit tests verify that returned blocks have sizes
which are powers of two using is_power_of_2(). However, is_power_of_2()
operations on a 'long', but the block size is a u64. So on systems where
long is 32-bit, this can sometimes fail even on correctly sized blocks.
This only reproduces randomly, as the parameters passed to the buddy
allocator in this test are random. The seed 0xb2e06022 reproduced it
fine here.
For now, just hardcode an is_power_of_2() implementation using
x & (x - 1).
Signed-off-by: David Gow <davidgow@google.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Arunpravin Paneer Selvam <arunpravin.paneerselvam@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230329065532.2122295-2-davidgow@google.com Signed-off-by: Christian König <christian.koenig@amd.com>
David Gow [Wed, 29 Mar 2023 06:55:32 +0000 (14:55 +0800)]
drm: buddy_allocator: Fix buddy allocator init on 32-bit systems
The drm buddy allocator tests were broken on 32-bit systems, as
rounddown_pow_of_two() takes a long, and the buddy allocator handles
64-bit sizes even on 32-bit systems.
This can be reproduced with the drm_buddy_allocator KUnit tests on i386:
./tools/testing/kunit/kunit.py run --arch i386 \
--kunitconfig ./drivers/gpu/drm/tests drm_buddy
(It results in kernel BUG_ON() when too many blocks are created, due to
the block size being too small.)
This was independently uncovered (and fixed) by Luís Mendes, whose patch
added a new u64 variant of rounddown_pow_of_two(). This version instead
recalculates the size based on the order.
Hans de Goede [Sun, 26 Mar 2023 20:54:33 +0000 (22:54 +0200)]
drm/nouveau/kms: Fix backlight registration
The nouveau code used to call drm_fb_helper_initial_config() from
nouveau_fbcon_init() before calling drm_dev_register(). This would
probe all connectors so that drm_connector->status could be used during
backlight registration which runs from nouveau_connector_late_register().
After commit 4a16dd9d18a0 ("drm/nouveau/kms: switch to drm fbdev helpers")
the fbdev emulation code, which now is a drm-client, can only run after
drm_dev_register(). So during backlight registration the connectors are
not probed yet and the drm_connector->status == connected check in
nv50_backlight_init() would now always fail.
Replace the drm_connector->status == connected check with
a drm_helper_probe_detect() == connected check to fix nv_backlight
no longer getting registered because of this.
M Chetan Kumar [Tue, 28 Mar 2023 06:28:44 +0000 (11:58 +0530)]
net: wwan: iosm: fixes 7560 modem crash
ModemManger/Apps probing the wwan0xmmrpc0 port for 7560 Modem results in
modem crash.
7560 Modem FW uses the MBIM interface for control command communication
whereas 7360 uses Intel RPC interface so disable wwan0xmmrpc0 port for
7560.
Fixes: d08b0f8f46e4 ("net: wwan: iosm: add rpc interface for xmm modems") Reported-and-tested-by: Martin <mwolf@adiumentum.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217200 Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Signed-off-by: Shane Parslow <shaneparslow808@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 29 Mar 2023 08:06:09 +0000 (09:06 +0100)]
Merge branch 'sfc-tc-decap-support'
Edward Cree says:
====================
sfc: support TC decap rules
This series adds support for offloading tunnel decapsulation TC rules to
ef100 NICs, allowing matching encapsulated packets to be decapsulated in
hardware and redirected to VFs.
For now an encap match must be on precisely the following fields:
ethertype (IPv4 or IPv6), source IP, destination IP, ipproto UDP,
UDP destination port. This simplifies checking for overlaps in the
driver; the hardware supports a wider range of match fields which
future driver work may expose.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 27 Mar 2023 10:36:08 +0000 (11:36 +0100)]
sfc: add offloading of 'foreign' TC (decap) rules
A 'foreign' rule is one for which the net_dev is not the sfc netdevice
or any of its representors. The driver registers indirect flow blocks
for tunnel netdevs so that it can offload decap rules. For example:
When notified of a rule like this, register an encap match on the IP
and dport tuple (creating an Outer Rule table entry) and insert an MAE
action rule to perform the decapsulation and deliver to the representee.
Moved efx_tc_delete_rule() below efx_tc_flower_release_encap_match() to
avoid the need for a forward declaration.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 27 Mar 2023 10:36:07 +0000 (11:36 +0100)]
sfc: add code to register and unregister encap matches
Add a hashtable to detect duplicate and conflicting matches. If match
is not a duplicate, call MAE functions to add/remove it from OR table.
Calling code not added yet, so mark the new functions as unused.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 27 Mar 2023 10:36:06 +0000 (11:36 +0100)]
sfc: add functions to insert encap matches into the MAE
An encap match corresponds to an entry in the exact-match Outer Rule
table; the lookup response includes the encap type (protocol) allowing
the hardware to continue parsing into the inner headers.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 27 Mar 2023 10:36:05 +0000 (11:36 +0100)]
sfc: handle enc keys in efx_tc_flower_parse_match()
Translate the fields from flow dissector into struct efx_tc_match.
In efx_tc_flower_replace(), reject filters that match on them, because
only 'foreign' filters (i.e. those for which the ingress dev is not
the sfc netdev or any of its representors, e.g. a tunnel netdev) can
use them.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 27 Mar 2023 10:36:04 +0000 (11:36 +0100)]
sfc: add notion of match on enc keys to MAE machinery
Extend the MAE caps check to validate that the hardware supports these
outer-header matches where used by the driver.
Extend efx_mae_populate_match_criteria() to fill in the outer rule ID
and VNI match fields.
Nothing yet populates these match fields, nor creates outer rules.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Includes an explanation of the lifetime of the 'cursor' action-set `act`.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 29 Mar 2023 08:03:33 +0000 (09:03 +0100)]
Merge branch 'macvlan-broadcast-queue-bypass'
Herbert Xu says:
====================
macvlan: Allow some packets to bypass broadcast queue
This patch series allows some packets to bypass the broadcast
queue on receive. Currently all multicast packets are queued
on receive and then processed in a work queue. This is to avoid
an unbounded amount of work occurring in the receive path, as
one broadcast packet could easily translate into 4,000 packets.
However, for multicast packets with just one receiver (possible
for IPv6 ND), this introduces unnecessary latency as the packet
will go to exactly one device.
This series allows such multicast packets to be processed inline.
It also adds a toggle which lets the admin control what threshold
to set between queueing and not queueing. A follow-up patch for
iproute will be posted.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Tue, 28 Mar 2023 02:57:59 +0000 (10:57 +0800)]
macvlan: Add netlink attribute for broadcast cutoff
Make the broadcast cutoff configurable through netlink. Note
that macvlan is weird because there is no central device for
us to configure (the lowerdev could be anything). So all the
options are duplicated over what could be thousands of child
devices.
IFLA_MACVLAN_BC_QUEUE_LEN took the approach of taking the maximum
of all child device settings. This is unnecessary as we could
simply store the option in the port device and take the last
child device that gets updated as the value to use.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Tue, 28 Mar 2023 02:57:56 +0000 (10:57 +0800)]
macvlan: Skip broadcast queue if multicast with single receiver
As it stands all broadcast and multicast packets are queued and
processed in a work queue. This is so that we don't overwhelm
the receive softirq path by generating thousands of packets or
more (see commit 412ca1550cbe "macvlan: Move broadcasts into a
work queue").
As such all multicast packets will be delayed, even if they will
be received by a single macvlan device. As using a workqueue
is not free in terms of latency, we should avoid this where possible.
This patch adds a new filter to determine which addresses should
be delayed and which ones won't. This is done using a crude
counter of how many times an address has been added to the macvlan
port (ha->synced). For now if an address has been added more than
once, then it will be considered to be broadcast. This could be
tuned further by making this threshold configurable.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 29 Mar 2023 08:01:28 +0000 (09:01 +0100)]
Merge branch 'mptcp-cleanups'
Matthieu Baerts says:
====================
mptcp: a couple of cleanups and improvements
Patch 1 removes an unneeded address copy in subflow_syn_recv_sock().
Patch 2 simplifies subflow_syn_recv_sock() to postpone some actions and
to avoid a bunch of conditionals.
Patch 3 stops reporting limits that are not taken into account when the
userspace PM is used.
Patch 4 adds a new test to validate that the 'subflows' field reported
by the kernel is correct. Such info can be retrieved via Netlink (e.g.
with ss) or getsockopt(SOL_MPTCP, MPTCP_INFO).
---
Changes in v2:
- Patch 3/4's commit message has been updated to use the correct SHA
- Rebased on latest net-next
- Link to v1: https://lore.kernel.org/r/20230324-upstream-net-next-20230324-misc-features-v1-0-5a29154592bd@tessares.net
====================
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Mon, 27 Mar 2023 10:22:24 +0000 (12:22 +0200)]
selftests: mptcp: add mptcp_info tests
This patch adds the mptcp_info fields tests in endpoint_tests(). Add a
new function chk_mptcp_info() to check the given number of the given
mptcp_info field.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/330 Signed-off-by: Geliang Tang <geliang.tang@suse.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Matthieu Baerts [Mon, 27 Mar 2023 10:22:23 +0000 (12:22 +0200)]
mptcp: do not fill info not used by the PM in used
Only the in-kernel PM uses the number of address and subflow limits
allowed per connection.
It then makes more sense not to display such info when other PMs are
used not to confuse the userspace by showing limits not being used.
While at it, we can get rid of the "val" variable and add indentations
instead.
It would have been good to have done this modification directly in
commit 4d25247d3ae4 ("mptcp: bypass in-kernel PM restrictions for non-kernel PMs")
but as we change a bit the behaviour, it is fine not to backport it to
stable.
Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 27 Mar 2023 10:22:22 +0000 (12:22 +0200)]
mptcp: simplify subflow_syn_recv_sock()
Postpone the msk cloning to the child process creation
so that we can avoid a bunch of conditionals.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/61 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 27 Mar 2023 10:22:21 +0000 (12:22 +0200)]
mptcp: avoid unneeded address copy
In the syn_recv fallback path, the msk is unused. We can skip
setting the socket address.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Some code defines the IPv6 wildcard address as a local variable and
use it with memcmp() or ipv6_addr_equal().
Let's use in6addr_any and ipv6_addr_any() instead.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 29 Mar 2023 07:19:38 +0000 (08:19 +0100)]
Merge branch 'vsock-sockmap-support'
Bobby Eshleman says:
====================
Add support for sockmap to vsock.
We're testing usage of vsock as a way to redirect guest-local UDS
requests to the host and this patch series greatly improves the
performance of such a setup.
Compared to copying packets via userspace, this improves throughput by
121% in basic testing.
Tested as follows.
Setup: guest unix dgram sender -> guest vsock redirector -> host vsock
server
Threads: 1
Payload: 64k
No sockmap:
- 76.3 MB/s
- The guest vsock redirector was
"socat VSOCK-CONNECT:2:1234 UNIX-RECV:/path/to/sock"
Using sockmap (this patch):
- 168.8 MB/s (+121%)
- The guest redirector was a simple sockmap echo server,
redirecting unix ingress to vsock 2:1234 egress.
- Same sender and server programs
*Note: these numbers are from RFC v1
Only the virtio transport has been tested. The loopback transport was
used in writing bpf/selftests, but not thoroughly tested otherwise.
This series requires the skb patch.
Changes in v4:
- af_vsock: fix parameter alignment in vsock_dgram_recvmsg()
- af_vsock: add TCP_ESTABLISHED comment in vsock_dgram_connect()
- vsock/bpf: change ret type to bool
Changes in v3:
- vsock/bpf: Refactor wait logic in vsock_bpf_recvmsg() to avoid
backwards goto
- vsock/bpf: Check psock before acquiring slock
- vsock/bpf: Return bool instead of int of 0 or 1
- vsock/bpf: Wrap macro args __sk/__psock in parens
- vsock/bpf: Place comment trailer */ on separate line
Changes in v2:
- vsock/bpf: rename vsock_dgram_* -> vsock_*
- vsock/bpf: change sk_psock_{get,put} and {lock,release}_sock() order
to minimize slock hold time
- vsock/bpf: use "new style" wait
- vsock/bpf: fix bug in wait log
- vsock/bpf: add check that recvmsg sk_type is one dgram, seqpacket, or
stream. Return error if not one of the three.
- virtio/vsock: comment __skb_recv_datagram() usage
- virtio/vsock: do not init copied in read_skb()
- vsock/bpf: add ifdef guard around struct proto in dgram_recvmsg()
- selftests/bpf: add vsock loopback config for aarch64
- selftests/bpf: add vsock loopback config for s390x
- selftests/bpf: remove vsock device from vmtest.sh qemu machine
- selftests/bpf: remove CONFIG_VIRTIO_VSOCKETS=y from config.x86_64
- vsock/bpf: move transport-related (e.g., if (!vsk->transport)) checks
out of fast path
====================
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bobby Eshleman [Mon, 27 Mar 2023 19:11:53 +0000 (19:11 +0000)]
selftests/bpf: add a test case for vsock sockmap
Add a test case testing the redirection from connectible AF_VSOCK
sockets to connectible AF_UNIX sockets.
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bobby Eshleman [Mon, 27 Mar 2023 19:11:52 +0000 (19:11 +0000)]
selftests/bpf: add vsock to vmtest.sh
Add vsock loopback to the test kernel.
This allows sockmap for vsock to be tested.
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bobby Eshleman [Mon, 27 Mar 2023 19:11:51 +0000 (19:11 +0000)]
vsock: support sockmap
This patch adds sockmap support for vsock sockets. It is intended to be
usable by all transports, but only the virtio and loopback transports
are implemented.
SOCK_STREAM, SOCK_DGRAM, and SOCK_SEQPACKET are all supported.
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
ynl: add support for user headers and struct attrs
Add support for user headers and struct attrs to YNL. This patchset adds
features to ynl and add a partial spec for openvswitch that demonstrates
use of the features.
Patch 1-4 add features to ynl
Patch 5 adds partial openvswitch specs that demonstrate the new features
Patch 6-7 add documentation for legacy structs and for sub-type
====================
Donald Hunter [Mon, 27 Mar 2023 08:31:36 +0000 (09:31 +0100)]
netlink: specs: add partial specification for openvswitch
The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:
Donald Hunter [Mon, 27 Mar 2023 08:31:35 +0000 (09:31 +0100)]
tools: ynl: Add fixed-header support to ynl
Add support for netlink families that add an optional fixed header structure
after the genetlink header and before any attributes. The fixed-header can be
specified on a per op basis, or once for all operations, which serves as a
default value that can be overridden.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 29 Mar 2023 06:52:12 +0000 (23:52 -0700)]
Merge tag 'mlx5-updates-2023-03-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-03-20
mlx5 dynamic msix
This patch series adds support for dynamic msix vectors allocation in mlx5.
Eli Cohen Says:
================
The following series of patches modifies mlx5_core to work with the
dynamic MSIX API. Currently, mlx5_core allocates all the interrupt
vectors it needs and distributes them amongst the consumers. With the
introduction of dynamic MSIX support, which allows for allocation of
interrupts more than once, we now allocate vectors as we need them.
This allows other drivers running on top of mlx5_core to allocate
interrupt vectors for their own use. An example for this is mlx5_vdpa,
which uses these vectors to propagate interrupts directly from the
hardware to the vCPU [1].
As a preparation for using this series, a use after free issue is fixed
in lib/cpu_rmap.c and the allocator for rmap entries has been modified.
A complementary API for irq_cpu_rmap_add() has also been introduced.
* tag 'mlx5-updates-2023-03-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Provide external API for allocating vectors
net/mlx5: Use one completion vector if eth is disabled
net/mlx5: Refactor calculation of required completion vectors
net/mlx5: Move devlink registration before mlx5_load
net/mlx5: Use dynamic msix vectors allocation
net/mlx5: Refactor completion irq request/release code
net/mlx5: Improve naming of pci function vectors
net/mlx5: Use newer affinity descriptor
net/mlx5: Modify struct mlx5_irq to use struct msi_map
net/mlx5: Fix wrong comment
net/mlx5e: Coding style fix, add empty line
lib: cpu_rmap: Add irq_cpu_rmap_remove to complement irq_cpu_rmap_add
lib: cpu_rmap: Use allocator for rmap entries
lib: cpu_rmap: Avoid use after free on rmap->obj array entries
====================
clang with W=1 reports
drivers/net/ethernet/8390/axnet_cs.c:653:9: error: variable
'xfer_count' set but not used [-Werror,-Wunused-but-set-variable]
int xfer_count = count;
^
This variable is not used so remove it.
Tasos Sahanidis [Wed, 29 Mar 2023 03:24:22 +0000 (06:24 +0300)]
ALSA: ymfpci: Create card with device-managed snd_devm_card_new()
snd_card_ymfpci_remove() was removed in commit c6e6bb5eab74 ("ALSA:
ymfpci: Allocate resources with device-managed APIs"), but the call to
snd_card_new() was not replaced with snd_devm_card_new().
Since there was no longer a call to snd_card_free, unloading the module
would eventually result in Oops:
Felix Fietkau [Fri, 24 Mar 2023 14:04:04 +0000 (15:04 +0100)]
net: ethernet: mtk_eth_soc: fix tx throughput regression with direct 1G links
Using the QDMA tx scheduler to throttle tx to line speed works fine for
switch ports, but apparently caused a regression on non-switch ports.
Based on a number of tests, it seems that this throttling can be safely
dropped without re-introducing the issues on switch ports that the
tx scheduling changes resolved.