MAINTAINERS: Drop Li Yang as their email address stopped working
When sending a patch to (among others) Li Yang the nxp MTA replied that
the address doesn't exist and so the mail couldn't be delivered. The
error code was 550, so at least technically that's not a temporal issue.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Inorder to support shaping and scheduling, Upon class creation
Netdev driver allocates trasmit schedulers.
The previous patch which added support for Round robin scheduling has
a bug due to which driver is not freeing transmit schedulers post
class deletion.
This patch fixes the same.
Fixes: 47a9656f168a ("octeontx2-pf: htb offload support for Round Robin scheduling") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
3) virtnet_commit_rss_command() populates 4 entries for the rss
scatter-gather
4) Since the command above does not have a key, then the last
scatter-gatter entry will be zeroed, since rss_key_size == 0.
sg_buf_size = vi->rss_key_size;
5) This buffer is passed to qemu, but qemu is not happy with a buffer
with zero length, and do the following in virtqueue_map_desc() (QEMU
function):
if (!sz) {
virtio_error(vdev, "virtio: zero sized buffers are not allowed");
6) virtio_error() (also QEMU function) set the device as broken
vdev->broken = true;
7) Qemu bails out, and do not repond this crazy kernel.
8) The kernel is waiting for the response to come back (function
virtnet_send_command())
9) The kernel is waiting doing the following :
while (!virtqueue_get_buf(vi->cvq, &tmp) &&
!virtqueue_is_broken(vi->cvq))
cpu_relax();
10) None of the following functions above is true, thus, the kernel
loops here forever. Keeping in mind that virtqueue_is_broken() does
not look at the qemu `vdev->broken`, so, it never realizes that the
vitio is broken at QEMU side.
Fix it by not sending RSS commands if the feature is not available in
the device.
Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.") Cc: stable@vger.kernel.org Cc: qemu-devel@nongnu.org Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 4 Apr 2024 20:27:38 +0000 (20:27 +0000)]
xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING
syzbot reported an illegal copy in xsk_setsockopt() [1]
Make sure to validate setsockopt() @optlen parameter.
[1]
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
BUG: KASAN: slab-out-of-bounds in xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420
Read of size 4 at addr ffff888028c6cde3 by task syz-executor.0/7549
The buggy address belongs to the object at ffff888028c6cde0
which belongs to the cache kmalloc-8 of size 8
The buggy address is located 1 bytes to the right of
allocated 2-byte region [ffff888028c6cde0, ffff888028c6cde2)
Memory state around the buggy address: ffff888028c6cc80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc ffff888028c6cd00: fa fc fc fc fa fc fc fc 00 fc fc fc 06 fc fc fc
>ffff888028c6cd80: fa fc fc fc fa fc fc fc fa fc fc fc 02 fc fc fc
^ ffff888028c6ce00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc ffff888028c6ce80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
Fixes: 423f38329d26 ("xsk: add umem fill queue support and mmap") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: "Björn Töpel" <bjorn@kernel.org> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Cc: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20240404202738.3634547-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Tesarik [Thu, 4 Apr 2024 07:57:40 +0000 (09:57 +0200)]
u64_stats: fix u64_stats_init() for lockdep when used repeatedly in one file
Fix bogus lockdep warnings if multiple u64_stats_sync variables are
initialized in the same file.
With CONFIG_LOCKDEP, seqcount_init() is a macro which declares:
static struct lock_class_key __key;
Since u64_stats_init() is a function (albeit an inline one), all calls
within the same file end up using the same instance, effectively treating
them all as a single lock-class.
Fixes: 9464ca650008 ("net: make u64_stats_init() a function") Closes: https://lore.kernel.org/netdev/ea1567d9-ce66-45e6-8168-ac40a47d1821@roeck-us.net/ Signed-off-by: Petr Tesarik <petr@tesarici.cz> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240404075740.30682-1-petr@tesarici.cz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: openvswitch: fix unwanted error log on timeout policy probing
On startup, ovs-vswitchd probes different datapath features including
support for timeout policies. While probing, it tries to execute
certain operations with OVS_PACKET_ATTR_PROBE or OVS_FLOW_ATTR_PROBE
attributes set. These attributes tell the openvswitch module to not
log any errors when they occur as it is expected that some of the
probes will fail.
For some reason, setting the timeout policy ignores the PROBE attribute
and logs a failure anyway. This is causing the following kernel log
on each re-start of ovs-vswitchd:
kernel: Failed to associated timeout policy `ovs_test_tp'
Fix that by using the same logging macro that all other messages are
using. The message will still be printed at info level when needed
and will be rate limited, but with a net rate limiter instead of
generic printk one.
The nf_ct_set_timeout() itself will still print some info messages,
but at least this change makes logging in openvswitch module more
consistent.
Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct action") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/20240403203803.2137962-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'net-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, bluetooth and bpf.
Fairly usual collection of driver and core fixes. The large selftest
accompanying one of the fixes is also becoming a common occurrence.
Current release - regressions:
- ipv6: fix infinite recursion in fib6_dump_done()
- net/rds: fix possible null-deref in newly added error path
Current release - new code bugs:
- net: do not consume a full cacheline for system_page_pool
- bpf: fix bpf_arena-related file descriptor leaks in the verifier
- drv: ice: fix freeing uninitialized pointers, fixing misuse of the
newfangled __free() auto-cleanup
Previous releases - regressions:
- x86/bpf: fixes the BPF JIT with retbleed=stuff
- xen-netfront: add missing skb_mark_for_recycle, fix page pool
accounting leaks, revealed by recently added explicit warning
- tcp: fix bind() regression for v6-only wildcard and v4-mapped-v6
non-wildcard addresses
- Bluetooth:
- replace "hci_qca: Set BDA quirk bit if fwnode exists in DT" with
better workarounds to un-break some buggy Qualcomm devices
- set conn encrypted before conn establishes, fix re-connecting to
some headsets which use slightly unusual sequence of msgs
- mptcp:
- prevent BPF accessing lowat from a subflow socket
- don't account accept() of non-MPC client as fallback to TCP
- drv: mana: fix Rx DMA datasize and skb_over_panic
- drv: i40e: fix VF MAC filter removal
Previous releases - always broken:
- gro: various fixes related to UDP tunnels - netns crossing
problems, incorrect checksum conversions, and incorrect packet
transformations which may lead to panics
- bpf: support deferring bpf_link dealloc to after RCU grace period
- nf_tables:
- release batch on table validation from abort path
- release mutex after nft_gc_seq_end from abort path
- flush pending destroy work before exit_net release
- drv: r8169: skip DASH fw status checks when DASH is disabled"
* tag 'net-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
netfilter: validate user input for expected length
net/sched: act_skbmod: prevent kernel-infoleak
net: usb: ax88179_178a: avoid the interface always configured as random address
net: dsa: sja1105: Fix parameters order in sja1110_pcs_mdio_write_c45()
net: ravb: Always update error counters
net: ravb: Always process TX descriptor ring
netfilter: nf_tables: discard table flag update with pending basechain deletion
netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
netfilter: nf_tables: reject new basechain after table flag update
netfilter: nf_tables: flush pending destroy work before exit_net release
netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
netfilter: nf_tables: release batch on table validation from abort path
Revert "tg3: Remove residual error handling in tg3_suspend"
tg3: Remove residual error handling in tg3_suspend
net: mana: Fix Rx DMA datasize and skb_over_panic
net/sched: fix lockdep splat in qdisc_tree_reduce_backlog()
net: phy: micrel: lan8814: Fix when enabling/disabling 1-step timestamping
net: stmmac: fix rx queue priority assignment
net: txgbe: fix i2c dev name cannot match clkdev
net: fec: Set mac_managed_pm during probe
...
Merge tag 'bcachefs-2024-04-03' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs repair code from Kent Overstreet:
"A couple more small fixes, and new repair code.
We can now automatically recover from arbitrary corrupted interior
btree nodes by scanning, and we can reconstruct metadata as needed to
bring a filesystem back into a working, consistent, read-write state
and preserve access to whatevver wasn't corrupted.
Meaning - you can blow away all metadata except for extents and
dirents leaf nodes, and repair will reconstruct everything else and
give you your data, and under the correct paths. If inodes are missing
i_size will be slightly off and permissions/ownership/timestamps will
be gone, and we do still need the snapshots btree if snapshots were in
use - in the future we'll be able to guess the snapshot tree structure
in some situations.
IOW - aside from shaking out remaining bugs (fuzz testing is still
coming), repair code should be complete and if repair ever doesn't
work that's the highest priority bug that I want to know about
immediately.
This patchset was kindly tested by a user from India who accidentally
wiped one drive out of a three drive filesystem with no replication on
the family computer - it took a couple weeks but we got everything
important back"
* tag 'bcachefs-2024-04-03' of https://evilpiepirate.org/git/bcachefs:
bcachefs: reconstruct_inode()
bcachefs: Subvolume reconstruction
bcachefs: Check for extents that point to same space
bcachefs: Reconstruct missing snapshot nodes
bcachefs: Flag btrees with missing data
bcachefs: Topology repair now uses nodes found by scanning to fill holes
bcachefs: Repair pass for scanning for btree nodes
bcachefs: Don't skip fake btree roots in fsck
bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake()
bcachefs: Etyzinger cleanups
bcachefs: bch2_shoot_down_journal_keys()
bcachefs: Clear recovery_passes_required as they complete without errors
bcachefs: ratelimit informational fsck errors
bcachefs: Check for bad needs_discard before doing discard
bcachefs: Improve bch2_btree_update_to_text()
mean_and_variance: Drop always failing tests
bcachefs: fix nocow lock deadlock
bcachefs: BCH_WATERMARK_interior_updates
bcachefs: Fix btree node reserve
Jakub Kicinski [Thu, 4 Apr 2024 18:37:39 +0000 (11:37 -0700)]
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-04-04
We've added 7 non-merge commits during the last 5 day(s) which contain
a total of 9 files changed, 75 insertions(+), 24 deletions(-).
The main changes are:
1) Fix x86 BPF JIT under retbleed=stuff which causes kernel panics due to
incorrect destination IP calculation and incorrect IP for relocations,
from Uros Bizjak and Joan Bruguera Micó.
2) Fix BPF arena file descriptor leaks in the verifier,
from Anton Protopopov.
3) Defer bpf_link deallocation to after RCU grace period as currently
running multi-{kprobes,uprobes} programs might still access cookie
information from the link, from Andrii Nakryiko.
4) Fix a BPF sockmap lock inversion deadlock in map_delete_elem reported
by syzkaller, from Jakub Sitnicki.
5) Fix resolve_btfids build with musl libc due to missing linux/types.h
include, from Natanael Copa.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, sockmap: Prevent lock inversion deadlock in map delete elem
x86/bpf: Fix IP for relocating call depth accounting
x86/bpf: Fix IP after emitting call depth accounting
bpf: fix possible file descriptor leaks in verifier
tools/resolve_btfids: fix build with musl libc
bpf: support deferring bpf_link dealloc to after RCU grace period
bpf: put uprobe link's path and task in release callback
====================
The buggy address belongs to the object at ffff88802cd73da0
which belongs to the cache kmalloc-8 of size 8
The buggy address is located 0 bytes inside of
allocated 1-byte region [ffff88802cd73da0, ffff88802cd73da1)
Memory state around the buggy address: ffff88802cd73c80: 07 fc fc fc 05 fc fc fc 05 fc fc fc fa fc fc fc ffff88802cd73d00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
>ffff88802cd73d80: fa fc fc fc 01 fc fc fc fa fc fc fc fa fc fc fc
^ ffff88802cd73e00: fa fc fc fc fa fc fc fc 05 fc fc fc 07 fc fc fc ffff88802cd73e80: 07 fc fc fc 07 fc fc fc 07 fc fc fc 07 fc fc fc
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://lore.kernel.org/r/20240404122051.2303764-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 4 Apr 2024 16:38:52 +0000 (09:38 -0700)]
Merge tag 'nf-24-04-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
Patch #1 unlike early commit path stage which triggers a call to abort,
an explicit release of the batch is required on abort, otherwise
mutex is released and commit_list remains in place.
Patch #2 release mutex after nft_gc_seq_end() in commit path, otherwise
async GC worker could collect expired objects.
Patch #3 flush pending destroy work in module removal path, otherwise UaF
is possible.
Patch #4 and #6 restrict the table dormant flag with basechain updates
to fix state inconsistency in the hook registration.
Patch #5 adds missing RCU read side lock to flowtable type to avoid races
with module removal.
* tag 'nf-24-04-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: discard table flag update with pending basechain deletion
netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
netfilter: nf_tables: reject new basechain after table flag update
netfilter: nf_tables: flush pending destroy work before exit_net release
netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
netfilter: nf_tables: release batch on table validation from abort path
====================
net: usb: ax88179_178a: avoid the interface always configured as random address
After the commit d2689b6a86b9 ("net: usb: ax88179_178a: avoid two
consecutive device resets"), reset is not executed from bind operation and
mac address is not read from the device registers or the devicetree at that
moment. Since the check to configure if the assigned mac address is random
or not for the interface, happens after the bind operation from
usbnet_probe, the interface keeps configured as random address, although the
address is correctly read and set during open operation (the only reset
now).
In order to keep only one reset for the device and to avoid the interface
always configured as random address, after reset, configure correctly the
suitable field from the driver, if the mac address is read successfully from
the device registers or the devicetree. Take into account if a locally
administered address (random) was previously stored.
cc: stable@vger.kernel.org # 6.6+ Fixes: d2689b6a86b9 ("net: usb: ax88179_178a: avoid two consecutive device resets") Reported-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240403132158.344838-1-jtornosm@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paul Barker [Tue, 2 Apr 2024 14:53:05 +0000 (15:53 +0100)]
net: ravb: Always update error counters
The error statistics should be updated each time the poll function is
called, even if the full RX work budget has been consumed. This prevents
the counts from becoming stuck when RX bandwidth usage is high.
This also ensures that error counters are not updated after we've
re-enabled interrupts as that could result in a race condition.
Paul Barker [Tue, 2 Apr 2024 14:53:04 +0000 (15:53 +0100)]
net: ravb: Always process TX descriptor ring
The TX queue should be serviced each time the poll function is called,
even if the full RX work budget has been consumed. This prevents
starvation of the TX queue when RX bandwidth usage is high.
netfilter: nf_tables: discard table flag update with pending basechain deletion
Hook unregistration is deferred to the commit phase, same occurs with
hook updates triggered by the table dormant flag. When both commands are
combined, this results in deleting a basechain while leaving its hook
still registered in the core.
Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Ziyang Xuan [Wed, 3 Apr 2024 07:22:04 +0000 (15:22 +0800)]
netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
nft_unregister_flowtable_type() within nf_flow_inet_module_exit() can
concurrent with __nft_flowtable_type_get() within nf_tables_newflowtable().
And thhere is not any protection when iterate over nf_tables_flowtables
list in __nft_flowtable_type_get(). Therefore, there is pertential
data-race of nf_tables_flowtables list entry.
Use list_for_each_entry_rcu() to iterate over nf_tables_flowtables list
in __nft_flowtable_type_get(), and use rcu_read_lock() in the caller
nft_flowtable_type_get() to protect the entire type query process.
Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_tables: flush pending destroy work before exit_net release
Similar to 2c9f0293280e ("netfilter: nf_tables: flush pending destroy
work before netlink notifier") to address a race between exit_net and
the destroy workqueue.
The trace below shows an element to be released via destroy workqueue
while exit_net path (triggered via module removal) has already released
the set that is used in such transaction.
netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
The commit mutex should not be released during the critical section
between nft_gc_seq_begin() and nft_gc_seq_end(), otherwise, async GC
worker could collect expired objects and get the released commit lock
within the same GC sequence.
nf_tables_module_autoload() temporarily releases the mutex to load
module dependencies, then it goes back to replay the transaction again.
Move it at the end of the abort phase after nft_gc_seq_end() is called.
netfilter: nf_tables: release batch on table validation from abort path
Unlike early commit path stage which triggers a call to abort, an
explicit release of the batch is required on abort, otherwise mutex is
released and commit_list remains in place.
Add WARN_ON_ONCE to ensure commit_list is empty from the abort path
before releasing the mutex.
After this patch, commit_list is always assumed to be empty before
grabbing the mutex, therefore
03c1f1ef1584 ("netfilter: Cleanup nft_net->module_list from nf_tables_exit_net()")
only needs to release the pending modules for registration.
Cc: stable@vger.kernel.org Fixes: c0391b6ab810 ("netfilter: nf_tables: missing validation from the abort path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
mana_get_rxbuf_cfg() aligns the RX buffer's DMA datasize to be
multiple of 64. So a packet slightly bigger than mtu+14, say 1536,
can be received and cause skb_over_panic.
As discussed internally, this alignment is not necessary. To fix
this bug, remove it from the code. So oversized packets will be
marked as CQE_RX_TRUNCATED by NIC, and dropped.
Cc: stable@vger.kernel.org Fixes: 2fbbd712baf1 ("net: mana: Enable RX path to handle various MTU sizes") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1712087316-20886-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: phy: micrel: lan8814: Fix when enabling/disabling 1-step timestamping
There are 2 issues with the blamed commit.
1. When the phy is initialized, it would enable the disabled of UDPv4
checksums. The UDPv6 checksum is already enabled by default. So when
1-step is configured then it would clear these flags.
2. After the 1-step is configured, then if 2-step is configured then the
1-step would be still configured because it is not clearing the flag.
So the sync frames will still have origin timestamps set.
Fix this by reading first the value of the register and then
just change bit 12 as this one determines if the timestamp needs to
be inserted in the frame, without changing any other bits.
Fixes: ece19502834d ("net: phy: micrel: 1588 support for LAN8814 phy") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Divya Koppera <divya.koppera@microchip.com> Link: https://lore.kernel.org/r/20240402071634.2483524-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Piotr Wejman [Mon, 1 Apr 2024 19:22:39 +0000 (21:22 +0200)]
net: stmmac: fix rx queue priority assignment
The driver should ensure that same priority is not mapped to multiple
rx queues. From DesignWare Cores Ethernet Quality-of-Service
Databook, section 17.1.29 MAC_RxQ_Ctrl2:
"[...]The software must ensure that the content of this field is
mutually exclusive to the PSRQ fields for other queues, that is,
the same priority is not mapped to multiple Rx queues[...]"
Previously rx_queue_priority() function was:
- clearing all priorities from a queue
- adding new priorities to that queue
After this patch it will:
- first assign new priorities to a queue
- then remove those priorities from all other queues
- keep other priorities previously assigned to that queue
Fixes: a8f5102af2a7 ("net: stmmac: TX and RX queue priority configuration") Fixes: 2142754f8b9c ("net: stmmac: Add MAC related callbacks for XGMAC2") Signed-off-by: Piotr Wejman <piotrwejman90@gmail.com> Link: https://lore.kernel.org/r/20240401192239.33942-1-piotrwejman90@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
net: fec: Fix to suspend / resume with mac_managed_pm
Since the introduction of mac_managed_pm in the FEC driver there were some
discrepancies regarding power management of the PHY.
This failed on our board that has a permanently powered Microchip LAN8700R
attached to the FEC. Although the root cause of the failure can be traced
back to f166f890c8f0 ("net: ethernet: fec: Replace interrupt driven MDIO
with polled IO") and probably even before that, we only started noticing
the problem going from 5.10 to 6.1.
Since 557d5dc83f68 ("net: fec: use mac-managed PHY PM") is actually a fix
to most of the power management sequencing problems that came with power
managing the MDIO bus which for the FEC meant adding a race with FEC
resume (and phy_start() if netif was running) and PHY resume.
That it worked before for us was probably just luck...
Thanks to Wei's response to my report at [1] I was able to pick up his
patch and start honing in on the remaining missing details.
Wei Fang [Thu, 28 Mar 2024 15:59:29 +0000 (15:59 +0000)]
net: fec: Set mac_managed_pm during probe
Setting mac_managed_pm during interface up is too late.
In situations where the link is not brought up yet and the system suspends
the regular PHY power management will run. Since the FEC ETHEREN control
bit is cleared (automatically) on suspend the controller is off in resume.
When the regular PHY power management resume path runs in this context it
will write to the MII_DATA register but nothing will be transmitted on the
MDIO bus.
The data written will however remain in the MII_DATA register.
When the link later is set to administrative up it will trigger a call to
fec_restart() which will restore the MII_SPEED register. This triggers the
quirk explained in f166f890c8f0 ("net: ethernet: fec: Replace interrupt
driven MDIO with polled IO") causing an extra MII_EVENT.
This extra event desynchronizes all the MDIO register reads, causing them
to complete too early. Leading all reads to read as 0 because
fec_enet_mdio_wait() returns too early.
When a Microchip LAN8700R PHY is connected to the FEC, the 0 reads causes
the PHY to be initialized incorrectly and the PHY will not transmit any
ethernet signal in this state. It cannot be brought out of this state
without a power cycle of the PHY.
Fixes: 557d5dc83f68 ("net: fec: use mac-managed PHY PM") Closes: https://lore.kernel.org/netdev/1f45bdbe-eab1-4e59-8f24-add177590d27@actia.se/ Signed-off-by: Wei Fang <wei.fang@nxp.com>
[jernberg: commit message] Signed-off-by: John Ernberg <john.ernberg@actia.se> Link: https://lore.kernel.org/r/20240328155909.59613-2-john.ernberg@actia.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joshua Hay [Thu, 21 Mar 2024 00:09:25 +0000 (17:09 -0700)]
idpf: fix kernel panic on unknown packet types
In the very rare case where a packet type is unknown to the driver,
idpf_rx_process_skb_fields would return early without calling
eth_type_trans to set the skb protocol / the network layer handler.
This is especially problematic if tcpdump is running when such a
packet is received, i.e. it would cause a kernel panic.
Instead, call eth_type_trans for every single packet, even when
the packet type is unknown.
Kent Overstreet [Sat, 30 Mar 2024 22:43:00 +0000 (18:43 -0400)]
bcachefs: Check for extents that point to same space
In backpointer repair, if we get a missing backpointer - but there's
already a backpointer that points to an existing extent - we've got
multiple extents that point to the same space and need to decide which
to keep.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 17 Mar 2024 02:45:30 +0000 (22:45 -0400)]
bcachefs: Topology repair now uses nodes found by scanning to fill holes
With the new btree node scan code, we can now recover from corrupt btree
roots - simply create a new fake root at depth 1, and then insert all
the leaves we found.
If the root wasn't corrupt but there's corruption elsewhere in the
btree, we can fill in holes as needed with the newest version of a given
node(s) from the scan; we also check if a given btree node is older than
what we found from the scan.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 22 Mar 2024 23:26:33 +0000 (19:26 -0400)]
bcachefs: Etyzinger cleanups
Pull out eytzinger.c and kill eytzinger_cmp_fn. We now provide
eytzinger0_sort and eytzinger0_sort_r, which use the standard cmp_func_t
and cmp_r_func_t callbacks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Merge tag 'vboxsf-v6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux
Pull vboxsf fixes from Hans de Goede:
- Compiler warning fixes
- Explicitly deny setlease attempts
* tag 'vboxsf-v6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux:
vboxsf: explicitly deny setlease attempts
vboxsf: Remove usage of the deprecated ida_simple_xx() API
vboxsf: Avoid an spurious warning if load_nls_xxx() fails
vboxsf: remove redundant variable out_len
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"ARM:
- Ensure perf events programmed to count during guest execution are
actually enabled before entering the guest in the nVHE
configuration
- Restore out-of-range handler for stage-2 translation faults
- Several fixes to stage-2 TLB invalidations to avoid stale
translations, possibly including partial walk caches
- Fix early handling of architectural VHE-only systems to ensure E2H
is appropriately set
- Correct a format specifier warning in the arch_timer selftest
- Make the KVM banner message correctly handle all of the possible
configurations
RISC-V:
- Remove redundant semicolon in num_isa_ext_regs()
- Fix APLIC setipnum_le/be write emulation
- Fix APLIC in_clrip[x] read emulation
x86:
- Fix a bug in KVM_SET_CPUID{2,} where KVM looks at the wrong CPUID
entries (old vs. new) and ultimately neglects to clear PV_UNHALT
from vCPUs with HLT-exiting disabled
- Documentation fixes for SEV
- Fix compat ABI for KVM_MEMORY_ENCRYPT_OP
- Fix a 14-year-old goof in a declaration shared by host and guest;
the enabled field used by Linux when running as a guest pushes the
size of "struct kvm_vcpu_pv_apf_data" from 64 to 68 bytes. This is
really unconsequential because KVM never consumes anything beyond
the first 64 bytes, but the resulting struct does not match the
documentation
Selftests:
- Fix spelling mistake in arch_timer selftest"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
KVM: arm64: Rationalise KVM banner output
arm64: Fix early handling of FEAT_E2H0 not being implemented
KVM: arm64: Ensure target address is granule-aligned for range TLBI
KVM: arm64: Use TLBI_TTL_UNKNOWN in __kvm_tlb_flush_vmid_range()
KVM: arm64: Don't pass a TLBI level hint when zapping table entries
KVM: arm64: Don't defer TLB invalidation when zapping table entries
KVM: selftests: Fix __GUEST_ASSERT() format warnings in ARM's arch timer test
KVM: arm64: Fix out-of-IPA space translation fault handling
KVM: arm64: Fix host-programmed guest events in nVHE
RISC-V: KVM: Fix APLIC in_clrip[x] read emulation
RISC-V: KVM: Fix APLIC setipnum_le/be write emulation
RISC-V: KVM: Remove second semicolon
KVM: selftests: Fix spelling mistake "trigged" -> "triggered"
Documentation: kvm/sev: clarify usage of KVM_MEMORY_ENCRYPT_OP
Documentation: kvm/sev: separate description of firmware
KVM: SEV: fix compat ABI for KVM_MEMORY_ENCRYPT_OP
KVM: selftests: Check that PV_UNHALT is cleared when HLT exiting is disabled
KVM: x86: Use actual kvm_cpuid.base for clearing KVM_FEATURE_PV_UNHALT
KVM: x86: Introduce __kvm_get_hypervisor_cpuid() helper
KVM: SVM: Return -EINVAL instead of -EBUSY on attempt to re-init SEV/SEV-ES
...
security: Place security_path_post_mknod() where the original IMA call was
Commit 08abce60d63f ("security: Introduce path_post_mknod hook")
introduced security_path_post_mknod(), to replace the IMA-specific call
to ima_post_path_mknod().
For symmetry with security_path_mknod(), security_path_post_mknod() was
called after a successful mknod operation, for any file type, rather
than only for regular files at the time there was the IMA call.
However, as reported by VFS maintainers, successful mknod operation does
not mean that the dentry always has an inode attached to it (for
example, not for FIFOs on a SAMBA mount).
If that condition happens, the kernel crashes when
security_path_post_mknod() attempts to verify if the inode associated to
the dentry is private.
Move security_path_post_mknod() where the ima_post_path_mknod() call was,
which is obviously correct from IMA/EVM perspective. IMA/EVM are the only
in-kernel users, and only need to inspect regular files.
Reported-by: Steve French <smfrench@gmail.com> Closes: https://lore.kernel.org/linux-kernel/CAH2r5msAVzxCUHHG8VKrMPUKQHmBpE6K9_vjhgDa1uAvwx4ppw@mail.gmail.com/ Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: 08abce60d63f ("security: Introduce path_post_mknod hook") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO
The srso_alias_untrain_ret() dummy thunk in the !CONFIG_MITIGATION_SRSO
case is there only for the altenative in CALL_UNTRAIN_RET to have
a symbol to resolve.
However, testing with kernels which don't have CONFIG_MITIGATION_SRSO
enabled, leads to the warning in patch_return() to fire:
missing return thunk: srso_alias_untrain_ret+0x0/0x10-0x0: eb 0e 66 66 2e
WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:826 apply_returns (arch/x86/kernel/alternative.c:826
Put in a plain "ret" there so that gcc doesn't put a return thunk in
in its place which special and gets checked.
Petr Oros [Mon, 25 Mar 2024 20:19:01 +0000 (21:19 +0100)]
ice: fix enabling RX VLAN filtering
ice_port_vlan_on/off() was introduced in commit 2946204b3fa8 ("ice:
implement bridge port vlan"). But ice_port_vlan_on() incorrectly assigns
ena_rx_filtering to inner_vlan_ops in DVM mode.
This causes an error when rx_filtering cannot be enabled in legacy mode.
Reproducer:
echo 1 > /sys/class/net/$PF/device/sriov_numvfs
ip link set $PF vf 0 spoofchk off trust on vlan 3
dmesg:
ice 0000:41:00.0: failed to enable Rx VLAN filtering for VF 0 VSI 9 during VF rebuild, error -95
Fixes: 2946204b3fa8 ("ice: implement bridge port vlan") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Dan Carpenter [Thu, 21 Mar 2024 14:42:12 +0000 (17:42 +0300)]
ice: Fix freeing uninitialized pointers
Automatically cleaned up pointers need to be initialized before exiting
their scope. In this case, they need to be initialized to NULL before
any return statement.
Fixes: 90f821d72e11 ("ice: avoid unnecessary devm_ usage") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jeff Layton [Tue, 19 Mar 2024 16:32:04 +0000 (12:32 -0400)]
vboxsf: explicitly deny setlease attempts
vboxsf does not break leases on its own, so it can't properly handle the
case where the hypervisor changes the data. Don't allow file leases on
vboxsf.
vboxsf: Avoid an spurious warning if load_nls_xxx() fails
If an load_nls_xxx() function fails a few lines above, the 'sbi->bdi_id' is
still 0.
So, in the error handling path, we will call ida_simple_remove(..., 0)
which is not allocated yet.
In order to prevent a spurious "ida_free called for id=0 which is not
allocated." message, tweak the error handling path and add a new label.
Colin Ian King [Thu, 29 Feb 2024 22:51:38 +0000 (22:51 +0000)]
vboxsf: remove redundant variable out_len
The variable out_len is being used to accumulate the number of
bytes but it is not being used for any other purpose. The variable
is redundant and can be removed.
Cleans up clang scan build warning:
fs/vboxsf/utils.c:443:9: warning: variable 'out_len' set but not
used [-Wunused-but-set-variable]
Phil Elwell [Mon, 1 Apr 2024 11:09:33 +0000 (13:09 +0200)]
net: bcmgenet: Reset RBUF on first open
If the RBUF logic is not reset when the kernel starts then there
may be some data left over from any network boot loader. If the
64-byte packet headers are enabled then this can be fatal.
Extend bcmgenet_dma_disable to do perform the reset, but not when
called from bcmgenet_resume in order to preserve a wake packet.
N.B. This different handling of resume is just based on a hunch -
why else wouldn't one reset the RBUF as well as the TBUF? If this
isn't the case then it's easy to change the patch to make the RBUF
reset unconditional.
Signed-off-by: Phil Elwell <phil@raspberrypi.com> Signed-off-by: Maarten Vanraes <maarten@rmail.be> Signed-off-by: David S. Miller <davem@davemloft.net>
Aleksandr Mishin [Thu, 28 Mar 2024 16:55:05 +0000 (19:55 +0300)]
octeontx2-af: Add array index check
In rvu_map_cgx_lmac_pf() the 'iter', which is used as an array index, can reach
value (up to 14) that exceed the size (MAX_LMAC_COUNT = 8) of the array.
Fix this bug by adding 'iter' value check.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 91c6945ea1f9 ("octeontx2-af: cn10k: Add RPM MAC support") Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'selinux-pr-20240402' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"A single patch for SELinux to fix a problem where we could potentially
dereference an error pointer if we failed to successfully mount
selinuxfs"
* tag 'selinux-pr-20240402' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: avoid dereference of garbage after mount failure
syzkaller reported infinite recursive calls of fib6_dump_done() during
netlink socket destruction. [1]
From the log, syzkaller sent an AF_UNSPEC RTM_GETROUTE message, and then
the response was generated. The following recvmmsg() resumed the dump
for IPv6, but the first call of inet6_dump_fib() failed at kzalloc() due
to the fault injection. [0]
Here, fib6_dump_done() was set to nlk_sk(sk)->cb.done, and the next call
of inet6_dump_fib() set it to nlk_sk(sk)->cb.args[3]. syzkaller stopped
receiving the response halfway through, and finally netlink_sock_destruct()
called nlk_sk(sk)->cb.done().
fib6_dump_done() calls fib6_dump_end() and nlk_sk(sk)->cb.done() if it
is still not NULL. fib6_dump_end() rewrites nlk_sk(sk)->cb.done() by
nlk_sk(sk)->cb.args[3], but it has the same function, not NULL, calling
itself recursively and hitting the stack guard page.
To avoid the issue, let's set the destructor after kzalloc().
[0]:
FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
CPU: 1 PID: 432110 Comm: syz-executor.3 Not tainted 6.8.0-12821-g537c2e91d354-dirty #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:117)
should_fail_ex (lib/fault-inject.c:52 lib/fault-inject.c:153)
should_failslab (mm/slub.c:3733)
kmalloc_trace (mm/slub.c:3748 mm/slub.c:3827 mm/slub.c:3992)
inet6_dump_fib (./include/linux/slab.h:628 ./include/linux/slab.h:749 net/ipv6/ip6_fib.c:662)
rtnl_dump_all (net/core/rtnetlink.c:4029)
netlink_dump (net/netlink/af_netlink.c:2269)
netlink_recvmsg (net/netlink/af_netlink.c:1988)
____sys_recvmsg (net/socket.c:1046 net/socket.c:2801)
___sys_recvmsg (net/socket.c:2846)
do_recvmmsg (net/socket.c:2943)
__x64_sys_recvmmsg (net/socket.c:3041 net/socket.c:3034 net/socket.c:3034)
Heiner Kallweit [Sat, 30 Mar 2024 11:49:02 +0000 (12:49 +0100)]
r8169: fix issue caused by buggy BIOS on certain boards with RTL8168d
On some boards with this chip version the BIOS is buggy and misses
to reset the PHY page selector. This results in the PHY ID read
accessing registers on a different page, returning a more or
less random value. Fix this by resetting the page selector first.
Marco Pinna [Fri, 29 Mar 2024 16:12:59 +0000 (17:12 +0100)]
vsock/virtio: fix packet delivery to tap device
Commit 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks") added
virtio_transport_deliver_tap_pkt() for handing packets to the
vsockmon device. However, in virtio_transport_send_pkt_work(),
the function is called before actually sending the packet (i.e.
before placing it in the virtqueue with virtqueue_add_sgs() and checking
whether it returned successfully).
Queuing the packet in the virtqueue can fail even multiple times.
However, in virtio_transport_deliver_tap_pkt() we deliver the packet
to the monitoring tap interface only the first time we call it.
This certainly avoids seeing the same packet replicated multiple times
in the monitoring interface, but it can show the packet sent with the
wrong timestamp or even before we succeed to queue it in the virtqueue.
Move virtio_transport_deliver_tap_pkt() after calling virtqueue_add_sgs()
and making sure it returned successfully.
Fixes: 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks") Cc: stable@vge.kernel.org Signed-off-by: Marco Pinna <marco.pinn95@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20240329161259.411751-1-marco.pinn95@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Duoming Zhou [Fri, 29 Mar 2024 01:50:23 +0000 (09:50 +0800)]
ax25: fix use-after-free bugs caused by ax25_ds_del_timer
When the ax25 device is detaching, the ax25_dev_device_down()
calls ax25_ds_del_timer() to cleanup the slave_timer. When
the timer handler is running, the ax25_ds_del_timer() that
calls del_timer() in it will return directly. As a result,
the use-after-free bugs could happen, one of the scenarios
is shown below:
Kent Overstreet [Tue, 2 Apr 2024 22:30:14 +0000 (18:30 -0400)]
bcachefs: Check for bad needs_discard before doing discard
In the discard worker, we were failing to validate the bucket state -
meaning a corrupt needs_discard btree could cause us to discard a bucket
that we shouldn't.
If check_alloc_info hasn't run yet we just want to bail out, otherwise
it's a filesystem inconsistent error.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Merge tag 'docs-6.9-fixes' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"Four small documentation fixes"
* tag 'docs-6.9-fixes' of git://git.lwn.net/linux:
docs: zswap: fix shell command format
tracing: Fix documentation on tp_printk cmdline option
docs: Fix bitfield handling in kernel-doc
Documentation: dev-tools: Add link to RV docs
Merge tag 'bcachefs-2024-04-01' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Lots of fixes for situations with extreme filesystem damage.
One fix ("Fix journal pins in btree write buffer") applicable to
normal usage; also a dio performance fix.
New repair/construction code is in the final stages, should be ready
in about a week. Anyone that lost btree interior nodes (or a variety
of other damage) as a result of the splitbrain bug will be able to
repair then"
* tag 'bcachefs-2024-04-01' of https://evilpiepirate.org/git/bcachefs: (32 commits)
bcachefs: On emergency shutdown, print out current journal sequence number
bcachefs: Fix overlapping extent repair
bcachefs: Fix remove_dirent()
bcachefs: Logged op errors should be ignored
bcachefs: Improve -o norecovery; opts.recovery_pass_limit
bcachefs: bch2_run_explicit_recovery_pass_persistent()
bcachefs: Ensure bch_sb_field_ext always exists
bcachefs: Flush journal immediately after replay if we did early repair
bcachefs: Resume logged ops after fsck
bcachefs: Add error messages to logged ops fns
bcachefs: Split out recovery_passes.c
bcachefs: fix backpointer for missing alloc key msg
bcachefs: Fix bch2_btree_increase_depth()
bcachefs: Kill bch2_bkey_ptr_data_type()
bcachefs: Fix use after free in check_root_trans()
bcachefs: Fix repair path for missing indirect extents
bcachefs: Fix use after free in bch2_check_fix_ptrs()
bcachefs: Fix btree node keys accounting in topology repair path
bcachefs: Check btree ptr min_key in .invalid
bcachefs: add REQ_SYNC and REQ_IDLE in write dio
...
Guenter Roeck [Sun, 25 Feb 2024 16:29:25 +0000 (08:29 -0800)]
mean_and_variance: Drop always failing tests
mean_and_variance_test_2 and mean_and_variance_test_4 always fail.
The input parameters to those tests are identical to the input parameters
to tests 1 and 3, yet the expected result for tests 2 and 4 is different
for the mean and stddev tests. That will always fail.
Jakub Sitnicki [Tue, 2 Apr 2024 10:46:21 +0000 (12:46 +0200)]
bpf, sockmap: Prevent lock inversion deadlock in map delete elem
syzkaller started using corpuses where a BPF tracing program deletes
elements from a sockmap/sockhash map. Because BPF tracing programs can be
invoked from any interrupt context, locks taken during a map_delete_elem
operation must be hardirq-safe. Otherwise a deadlock due to lock inversion
is possible, as reported by lockdep:
Locks in sockmap are hardirq-unsafe by design. We expects elements to be
deleted from sockmap/sockhash only in task (normal) context with interrupts
enabled, or in softirq context.
Detect when map_delete_elem operation is invoked from a context which is
_not_ hardirq-unsafe, that is interrupts are disabled, and bail out with an
error.
Note that map updates are not affected by this issue. BPF verifier does not
allow updating sockmap/sockhash from a BPF tracing program today.
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Reported-by: xingwei lee <xrivendell7@gmail.com> Reported-by: yue sun <samsun1006219@gmail.com> Reported-by: syzbot+bc922f476bd65abbd466@syzkaller.appspotmail.com Reported-by: syzbot+d4066896495db380182e@syzkaller.appspotmail.com Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: syzbot+d4066896495db380182e@syzkaller.appspotmail.com Acked-by: John Fastabend <john.fastabend@gmail.com> Closes: https://syzkaller.appspot.com/bug?extid=d4066896495db380182e Closes: https://syzkaller.appspot.com/bug?extid=bc922f476bd65abbd466 Link: https://lore.kernel.org/bpf/20240402104621.1050319-1-jakub@cloudflare.com
Ivan Vecera [Fri, 29 Mar 2024 18:06:37 +0000 (11:06 -0700)]
i40e: Fix VF MAC filter removal
Commit 73d9629e1c8c ("i40e: Do not allow untrusted VF to remove
administratively set MAC") fixed an issue where untrusted VF was
allowed to remove its own MAC address although this was assigned
administratively from PF. Unfortunately the introduced check
is wrong because it causes that MAC filters for other MAC addresses
including multi-cast ones are not removed.
if (i40e_del_mac_filter(vsi, al->list[i].addr)) {
...
</snip>
The else path with `continue` effectively skips any MAC filter
removal except one for primary MAC addr when VF is allowed to do so.
Fix the check condition so the `continue` is only done for primary
MAC address.
Fixes: 73d9629e1c8c ("i40e: Do not allow untrusted VF to remove administratively set MAC") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20240329180638.211412-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
mptcp: fix fallback MIB counter and wrong var in selftests
Here are two fixes related to MPTCP.
The first patch fixes when the MPTcpExtMPCapableFallbackACK MIB counter
is modified: it should only be incremented when a connection was using
MPTCP options, but then a fallback to TCP has been done. This patch also
checks the counter is not incremented by mistake during the connect
selftests. This counter was wrongly incremented since its introduction
in v5.7.
The second patch fixes a wrong parsing of the 'dev' endpoint options in
the selftests: the wrong variable was used. This option was not used
before, but it is going to be soon. This issue is visible since v5.18.
====================
Geliang Tang [Fri, 29 Mar 2024 12:08:53 +0000 (13:08 +0100)]
selftests: mptcp: join: fix dev in check_endpoint
There's a bug in pm_nl_check_endpoint(), 'dev' didn't be parsed correctly.
If calling it in the 2nd test of endpoint_tests() too, it fails with an
error like this:
creation [FAIL] expected '10.0.2.2 id 2 subflow dev dev' \
found '10.0.2.2 id 2 subflow dev ns2eth2'
The reason is '$2' should be set to 'dev', not '$1'. This patch fixes it.
Davide Caratti [Fri, 29 Mar 2024 12:08:52 +0000 (13:08 +0100)]
mptcp: don't account accept() of non-MPC client as fallback to TCP
Current MPTCP servers increment MPTcpExtMPCapableFallbackACK when they
accept non-MPC connections. As reported by Christoph, this is "surprising"
because the counter might become greater than MPTcpExtMPCapableSYNRX.
MPTcpExtMPCapableFallbackACK counter's name suggests it should only be
incremented when a connection was seen using MPTCP options, then a
fallback to TCP has been done. Let's do that by incrementing it when
the subflow context of an inbound MPC connection attempt is dropped.
Also, update mptcp_connect.sh kselftest, to ensure that the
above MIB does not increment in case a pure TCP client connects to a
MPTCP server.
Fixes: fc518953bc9c ("mptcp: add and use MIB counter infrastructure") Cc: stable@vger.kernel.org Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/449 Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-1-324a8981da48@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 29 Mar 2024 16:05:59 +0000 (09:05 -0700)]
selftests: reuseaddr_conflict: add missing new line at the end of the output
The netdev CI runs in a VM and captures serial, so stdout and
stderr get combined. Because there's a missing new line in
stderr the test ends up corrupting KTAP:
In lan8814_get_sig_rx() and lan8814_get_sig_tx() ptp_parse_header() may
return NULL as ptp_header due to abnormal packet type or corrupted packet.
Fix this bug by adding ptp_header check.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: ece19502834d ("net: phy: micrel: 1588 support for LAN8814 phy") Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240329061631.33199-1-amishin@t-argos.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
x86/bpf: Fixes for the BPF JIT with retbleed=stuff
From: Joan Bruguera Micó <joanbrugueram@gmail.com>
Fixes two issues that cause kernels panic when using the BPF JIT with
the call depth tracking / stuffing mitigation for Skylake processors
(`retbleed=stuff`). Both issues can be triggered by running simple
BPF programs (e.g. running the test suite should trigger both).
The first (resubmit) fixes a trivial issue related to calculating the
destination IP for call instructions with call depth tracking.
The second is related to using the correct IP for relocations, related
to the recently introduced %rip-relative addressing for PER_CPU_VAR.
Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net>
---
v2:
Simplify calculation of "ip".
Add more details to the commit message.
Joan Bruguera Micó (1):
x86/bpf: Fix IP for relocating call depth accounting
====================
x86/bpf: Fix IP for relocating call depth accounting
The commit:
59bec00ace28 ("x86/percpu: Introduce %rip-relative addressing to PER_CPU_VAR()")
made PER_CPU_VAR() to use rip-relative addressing, hence
INCREMENT_CALL_DEPTH macro and skl_call_thunk_template got rip-relative
asm code inside of it. A follow up commit:
17bce3b2ae2d ("x86/callthunks: Handle %rip-relative relocations in call thunk template")
changed x86_call_depth_emit_accounting() to use apply_relocation(),
but mistakenly assumed that the code is being patched in-place (where
the destination of the relocation matches the address of the code),
using *pprog as the destination ip. This is not true for the call depth
accounting, emitted by the BPF JIT, so the calculated address was wrong,
JIT-ed BPF progs on kernels with call depth tracking got broken and
usually caused a page fault.
Pass the destination IP when the BPF JIT emits call depth accounting.
x86/bpf: Fix IP after emitting call depth accounting
Adjust the IP passed to `emit_patch` so it calculates the correct offset
for the CALL instruction if `x86_call_depth_emit_accounting` emits code.
Otherwise we will skip some instructions and most likely crash.
Kent Overstreet [Mon, 1 Apr 2024 23:20:36 +0000 (19:20 -0400)]
bcachefs: BCH_WATERMARK_interior_updates
This adds a new watermark, higher priority than BCH_WATERMARK_reclaim,
for interior btree updates. We've seen a deadlock where journal replay
triggers a ton of btree node merges, and these use up all available open
buckets and then interior updates get stuck.
One cause of this is that we're currently lacking btree node merging on
write buffer btrees - that needs to be fixed as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Merge tag 'pwm/for-6.9-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux
Pull pwm fix from Uwe Kleine-König:
"This fixes a regression intoduced by an off-by-one in v6.9-rc1 making
the pwm-pxa and the pwm driver in ti-sn65dsi86 unusable for most
consumer drivers because the default period wasn't set"
* tag 'pwm/for-6.9-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
pwm: Fix setting period with #pwm-cells = <1> and of_pwm_single_xlate()
Marc Zyngier [Thu, 21 Mar 2024 17:37:06 +0000 (17:37 +0000)]
KVM: arm64: Rationalise KVM banner output
We are not very consistent when it comes to displaying which mode
we're in (VHE, {n,h}VHE, protected or not). For example, booting
in protected mode with hVHE results in:
Marc Zyngier [Thu, 21 Mar 2024 11:54:14 +0000 (11:54 +0000)]
arm64: Fix early handling of FEAT_E2H0 not being implemented
Commit 3944382fa6f2 introduced checks for the FEAT_E2H0 not being
implemented. However, the check is absolutely wrong and makes a
point it testing a bit that is guaranteed to be zero.
On top of that, the detection happens way too late, after the
init_el2_state has done its job.
This went undetected because the HW this was tested on has E2H being
RAO/WI, and not RES1. However, the bug shows up when run as a nested
guest, where HCR_EL2.E2H is not necessarily set to 1. As a result,
booting the kernel in hVHE mode fails with timer accesses being
cought in a trap loop (which was fun to debug).
Fix the check for ID_AA64MMFR4_EL1.E2H0, and set the HCR_EL2.E2H bit
early so that it can be checked by the rest of the init sequence.
With this, hVHE works again in a NV environment that doesn't have
FEAT_E2H0.
Fixes: 3944382fa6f2 ("arm64: Treat HCR_EL2.E2H as RES1 when ID_AA64MMFR4_EL1.E2H0 is negative") Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20240321115414.3169115-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Will Deacon [Wed, 27 Mar 2024 12:48:53 +0000 (12:48 +0000)]
KVM: arm64: Ensure target address is granule-aligned for range TLBI
When zapping a table entry in stage2_try_break_pte(), we issue range
TLB invalidation for the region that was mapped by the table. However,
we neglect to align the base address down to the granule size and so
if we ended up reaching the table entry via a misaligned address then
we will accidentally skip invalidation for some prefix of the affected
address range.
Align 'ctx->addr' down to the granule size when performing TLB
invalidation for an unmapped table in stage2_try_break_pte().
Cc: Raghavendra Rao Ananta <rananta@google.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Shaoqin Huang <shahuang@redhat.com> Cc: Quentin Perret <qperret@google.com> Fixes: defc8cc7abf0 ("KVM: arm64: Invalidate the table entries upon a range") Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240327124853.11206-5-will@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Will Deacon [Wed, 27 Mar 2024 12:48:52 +0000 (12:48 +0000)]
KVM: arm64: Use TLBI_TTL_UNKNOWN in __kvm_tlb_flush_vmid_range()
Commit c910f2b65518 ("arm64/mm: Update tlb invalidation routines for
FEAT_LPA2") updated the __tlbi_level() macro to take the target level
as an argument, with TLBI_TTL_UNKNOWN (rather than 0) indicating that
the caller cannot provide level information. Unfortunately, the two
implementations of __kvm_tlb_flush_vmid_range() were not updated and so
now ask for an level 0 invalidation if FEAT_LPA2 is implemented.
Fix the problem by passing TLBI_TTL_UNKNOWN instead of 0 as the level
argument to __flush_s2_tlb_range_op() in __kvm_tlb_flush_vmid_range().
Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Marc Zyngier <maz@kernel.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Fixes: c910f2b65518 ("arm64/mm: Update tlb invalidation routines for FEAT_LPA2") Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240327124853.11206-4-will@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Will Deacon [Wed, 27 Mar 2024 12:48:50 +0000 (12:48 +0000)]
KVM: arm64: Don't defer TLB invalidation when zapping table entries
Commit 7657ea920c54 ("KVM: arm64: Use TLBI range-based instructions for
unmap") introduced deferred TLB invalidation for the stage-2 page-table
so that range-based invalidation can be used for the accumulated
addresses. This works fine if the structure of the page-tables remains
unchanged, but if entire tables are zapped and subsequently freed then
we transiently leave the hardware page-table walker with a reference
to freed memory thanks to the translation walk caches. For example,
stage2_unmap_walker() will free page-table pages:
if (childp)
mm_ops->put_page(childp);
and issue the TLB invalidation later in kvm_pgtable_stage2_unmap():
if (stage2_unmap_defer_tlb_flush(pgt))
/* Perform the deferred TLB invalidations */
kvm_tlb_flush_vmid_range(pgt->mmu, addr, size);
For now, take the conservative approach and invalidate the TLB eagerly
when we clear a table entry. Note, however, that the existing level
hint passed to __kvm_tlb_flush_vmid_ipa() is incorrect and will be
fixed in a subsequent patch.
Cc: Raghavendra Rao Ananta <rananta@google.com> Cc: Shaoqin Huang <shahuang@redhat.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240327124853.11206-2-will@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
This adds opts.recovery_pass_limit, and redoes -o norecovery to make use
of it; this fixes some issues with -o norecovery so it can be safely
used for data recovery.
Norecovery means "don't do journal replay"; it's an important data
recovery tool when we're getting stuck in journal replay.
When using it this way we need to make sure we don't free journal keys
after startup, so we continue to overlay them: thus it needs to imply
retain_recovery_info, as well as nochanges.
recovery_pass_limit is an explicit option for telling recovery to exit
after a specific recovery pass; this is a much cleaner way of
implementing -o norecovery, as well as being a useful debug feature in
its own right.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>