Local variable buf.i225 created at:
smsc95xx_read_reg drivers/net/usb/smsc95xx.c:90 [inline]
smsc95xx_reset+0x203/0x25f0 drivers/net/usb/smsc95xx.c:892
smsc95xx_bind+0x9bc/0x22e0 drivers/net/usb/smsc95xx.c:1131
CPU: 1 PID: 773 Comm: kworker/1:2 Not tainted 6.6.0-rc1-syzkaller-00125-ge42bebf6db29 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Workqueue: usb_hub_wq hub_event
=====================================================
Similar to e9c65989920f ("net: usb: smsc75xx: Fix uninit-value access in
__smsc75xx_read_reg"), this issue is caused because usbnet_read_cmd() reads
less bytes than requested (zero byte in the reproducer). In this case,
'buf' is not properly filled.
This patch fixes the issue by returning -ENODATA if usbnet_read_cmd() reads
less bytes than requested.
sysbot reported similar uninit-value access issue [2]. The root cause is
the same as mentioned above, and this patch addresses it as well.
Fixes: 2f7ca802bdae ("net: Add SMSC LAN9500 USB2.0 10/100 ethernet adapter driver") Reported-and-tested-by: syzbot+c74c24b43c9ae534f0e0@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+2c97a98a5ba9ea9c23bd@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c74c24b43c9ae534f0e0 [1] Closes: https://syzkaller.appspot.com/bug?extid=2c97a98a5ba9ea9c23bd [2] Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ieee802154: adf7242: Fix some potential buffer overflow in adf7242_stats_show()
strncat() usage in adf7242_debugfs_init() is wrong.
The size given to strncat() is the maximum number of bytes that can be
written, excluding the trailing NULL.
Here, the size that is passed, DNAME_INLINE_LEN, does not take into account
the size of "adf7242-" that is already in the array.
In order to fix it, use snprintf() instead.
Fixes: 7302b9d90117 ("ieee802154/adf7242: Driver for ADF7242 MAC IEEE802154") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
The spi_transfer struct has to have all it's fields initialized to 0 in
this case, since not all of them are set before starting the transfer.
Otherwise, spi_sync_transfer() will sometimes return an error.
Fixes: a526a3cc9c8d ("net: ethernet: adi: adin1110: Fix SPI transfers") Signed-off-by: Dell Jin <dell.jin.code@outlook.com> Signed-off-by: Ciprian Regus <ciprian.regus@analog.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: stmmac: update MAC capabilities when tx queues are updated
Upon boot up, the driver will configure the MAC capabilities based on
the maximum number of tx and rx queues. When the user changes the
tx queues to single queue, the MAC should be capable of supporting Half
Duplex, but the driver does not update the MAC capabilities when it is
configured so.
Using the stmmac_reinit_queues() to check the number of tx queues
and set the MAC capabilities accordingly.
Fixes: 0366f7e06a6b ("net: stmmac: add ethtool support for get/set channels") Cc: <stable@vger.kernel.org> # 5.17+ Signed-off-by: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com> Signed-off-by: Gan, Yi Fang <yi.fang.gan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The fix is to drop of_match_ptr() which is not necessary because DT is
always used for this driver (well, it could in theory support ACPI only,
but CONFIG_OF is always enabled for arm64).
Fixes: b0377116decd ("net: ethernet: Use device_get_match_data()") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310170627.2Kvf6ZHY-lkp@intel.com/ Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Tirthendu Sarkar [Thu, 19 Oct 2023 20:38:52 +0000 (13:38 -0700)]
i40e: sync next_to_clean and next_to_process for programming status desc
When a programming status desc is encountered on the rx_ring,
next_to_process is bumped along with cleaned_count but next_to_clean is
not. This causes I40E_DESC_UNUSED() macro to misbehave resulting in
overwriting whole ring with new buffers.
Update next_to_clean to point to next_to_process on seeing a programming
status desc if not in the middle of handling a multi-frag packet. Also,
bump cleaned_count only for such case as otherwise next_to_clean buffer
may be returned to hardware on reaching clean_threshold.
Fixes: e9031f2da1ae ("i40e: introduce next_to_process to i40e_ring") Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reported-by: hq.dev+kernel@msdfc.xyz
Reported by: Solomon Peachy <pizza@shaftnet.org> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217678 Tested-by: hq.dev+kernel@msdfc.xyz
Tested by: Indrek Järve <incx@dustbite.net> Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20231019203852.3663665-1-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sasha Neftin [Thu, 19 Oct 2023 20:36:41 +0000 (13:36 -0700)]
igc: Fix ambiguity in the ethtool advertising
The 'ethtool_convert_link_mode_to_legacy_u32' method does not allow us to
advertise 2500M speed support and TP (twisted pair) properly. Convert to
'ethtool_link_ksettings_test_link_mode' to advertise supported speed and
eliminate ambiguity.
Eric Dumazet [Thu, 19 Oct 2023 11:24:57 +0000 (11:24 +0000)]
net: do not leave an empty skb in write queue
Under memory stress conditions, tcp_sendmsg_locked()
might call sk_stream_wait_memory(), thus releasing the socket lock.
If a fresh skb has been allocated prior to this,
we should not leave it in the write queue otherwise
tcp_write_xmit() could panic.
This apparently does not happen often, but a future change
in __sk_mem_raise_allocated() that Shakeel and others are
considering would increase chances of being hurt.
Under discussion is to remove this controversial part:
/* Fail only if socket is _under_ its sndbuf.
* In this case we cannot block, so that we have to fail.
*/
if (sk->sk_wmem_queued + size >= sk->sk_sndbuf) {
/* Force charge with __GFP_NOFAIL */
if (memcg_charge && !charged) {
mem_cgroup_charge_skmem(sk->sk_memcg, amt,
gfp_memcg_charge() | __GFP_NOFAIL);
}
return 1;
}
Fixes: fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Link: https://lore.kernel.org/r/20231019112457.1190114-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
igb: Fix potential memory leak in igb_add_ethtool_nfc_entry
Add check for return of igb_update_ethtool_nfc_entry so that in case
of any potential errors the memory alocated for input will be freed.
Fixes: 0e71def25281 ("igb: add support of RX network flow classification") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kunwu Chan [Fri, 20 Oct 2023 09:31:56 +0000 (17:31 +0800)]
treewide: Spelling fix in comment
reques -> request
Fixes: 09dde54c6a69 ("PS3: gelic: Add wireless support for PS3") Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Thu, 19 Oct 2023 16:37:20 +0000 (18:37 +0200)]
i40e: Fix I40E_FLAG_VF_VLAN_PRUNING value
Commit c87c938f62d8f1 ("i40e: Add VF VLAN pruning") added new
PF flag I40E_FLAG_VF_VLAN_PRUNING but its value collides with
existing I40E_FLAG_TOTAL_PORT_SHUTDOWN_ENABLED flag.
Move the affected flag at the end of the flags and fix its value.
Reproducer:
[root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 link-down-on-close on
[root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 vf-vlan-pruning on
[root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 link-down-on-close off
[ 6323.142585] i40e 0000:02:00.0: Setting link-down-on-close not supported on this port (because total-port-shutdown is enabled)
netlink error: Operation not supported
[root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 vf-vlan-pruning off
[root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 link-down-on-close off
The link-down-on-close flag cannot be modified after setting vf-vlan-pruning
because vf-vlan-pruning shares the same bit with total-port-shutdown flag
that prevents any modification of link-down-on-close flag.
Fixes: c87c938f62d8 ("i40e: Add VF VLAN pruning") Cc: Mateusz Palczewski <mateusz.palczewski@intel.com> Cc: Simon Horman <horms@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Thu, 19 Oct 2023 07:13:46 +0000 (09:13 +0200)]
iavf: initialize waitqueues before starting watchdog_task
It is not safe to initialize the waitqueues after queueing the
watchdog_task. It will be using them.
The chance of this causing a real problem is very small, because
there will be some sleeping before any of the waitqueues get used.
I got a crash only after inserting an artificial sleep in iavf_probe.
Queue the watchdog_task as the last step in iavf_probe. Add a comment to
prevent repeating the mistake.
Fixes: fe2647ab0c99 ("i40evf: prevent VF close returning before state transitions to DOWN") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
r8169: fix the KCSAN reported data race in rtl_rx while reading desc->opts1
KCSAN reported the following data-race bug:
==================================================================
BUG: KCSAN: data-race in rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4430 drivers/net/ethernet/realtek/r8169_main.c:4583) r8169
race at unknown origin, with read to 0xffff888117e43510 of 4 bytes by interrupt on cpu 21:
rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4430 drivers/net/ethernet/realtek/r8169_main.c:4583) r8169
__napi_poll (net/core/dev.c:6527)
net_rx_action (net/core/dev.c:6596 net/core/dev.c:6727)
__do_softirq (kernel/softirq.c:553)
__irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632)
irq_exit_rcu (kernel/softirq.c:647)
sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1074 (discriminator 14))
asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:645)
cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291)
cpuidle_enter (drivers/cpuidle/cpuidle.c:390)
call_cpuidle (kernel/sched/idle.c:135)
do_idle (kernel/sched/idle.c:219 kernel/sched/idle.c:282)
cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1))
start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294)
secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
value changed: 0x80003fff -> 0x3402805f
Reported by Kernel Concurrency Sanitizer on:
CPU: 21 PID: 0 Comm: swapper/21 Tainted: G L 6.6.0-rc2-kcsan-00143-gb5cbe7c00aa0 #41
Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
==================================================================
drivers/net/ethernet/realtek/r8169_main.c:
==========================================
4429
→ 4430 status = le32_to_cpu(desc->opts1);
4431 if (status & DescOwn)
4432 break;
4433
4434 /* This barrier is needed to keep us from reading
4435 * any other fields out of the Rx descriptor until
4436 * we know the status of DescOwn
4437 */
4438 dma_rmb();
4439
4440 if (unlikely(status & RxRES)) {
4441 if (net_ratelimit())
4442 netdev_warn(dev, "Rx ERROR. status = %08x\n",
Marco Elver explained that dma_rmb() doesn't prevent the compiler to tear up the access to
desc->opts1 which can be written to concurrently. READ_ONCE() should prevent that from
happening:
4429
→ 4430 status = le32_to_cpu(READ_ONCE(desc->opts1));
4431 if (status & DescOwn)
4432 break;
4433
As the consequence of this fix, this KCSAN warning was eliminated.
Fixes: 6202806e7c03a ("r8169: drop member opts1_mask from struct rtl8169_private") Suggested-by: Marco Elver <elver@google.com> Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: nic_swsd@realtek.com Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Link: https://lore.kernel.org/lkml/dc7fc8fa-4ea4-e9a9-30a6-7c83e6b53188@alu.unizg.hr/ Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> Acked-by: Marco Elver <elver@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Reported by Kernel Concurrency Sanitizer on:
CPU: 21 PID: 0 Comm: swapper/21 Tainted: G L 6.6.0-rc2-kcsan-00143-gb5cbe7c00aa0 #41
Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
==================================================================
The write side of drivers/net/ethernet/realtek/r8169_main.c is:
==================
4251 /* rtl_tx needs to see descriptor changes before updated tp->cur_tx */
4252 smp_wmb();
4253
→ 4254 WRITE_ONCE(tp->cur_tx, tp->cur_tx + frags + 1);
4255
4256 stop_queue = !netif_subqueue_maybe_stop(dev, 0, rtl_tx_slots_avail(tp),
4257 R8169_TX_STOP_THRS,
4258 R8169_TX_START_THRS);
The read side is the function rtl_tx():
4355 static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp,
4356 int budget)
4357 {
4358 unsigned int dirty_tx, bytes_compl = 0, pkts_compl = 0;
4359 struct sk_buff *skb;
4360
4361 dirty_tx = tp->dirty_tx;
4362
4363 while (READ_ONCE(tp->cur_tx) != dirty_tx) {
4364 unsigned int entry = dirty_tx % NUM_TX_DESC;
4365 u32 status;
4366
4367 status = le32_to_cpu(tp->TxDescArray[entry].opts1);
4368 if (status & DescOwn)
4369 break;
4370
4371 skb = tp->tx_skb[entry].skb;
4372 rtl8169_unmap_tx_skb(tp, entry);
4373
4374 if (skb) {
4375 pkts_compl++;
4376 bytes_compl += skb->len;
4377 napi_consume_skb(skb, budget);
4378 }
4379 dirty_tx++;
4380 }
4381
4382 if (tp->dirty_tx != dirty_tx) {
4383 dev_sw_netstats_tx_add(dev, pkts_compl, bytes_compl);
4384 WRITE_ONCE(tp->dirty_tx, dirty_tx);
4385
4386 netif_subqueue_completed_wake(dev, 0, pkts_compl, bytes_compl,
4387 rtl_tx_slots_avail(tp),
4388 R8169_TX_START_THRS);
4389 /*
4390 * 8168 hack: TxPoll requests are lost when the Tx packets are
4391 * too close. Let's kick an extra TxPoll request when a burst
4392 * of start_xmit activity is detected (if it is not detected,
4393 * it is slow enough). -- FR
4394 * If skb is NULL then we come here again once a tx irq is
4395 * triggered after the last fragment is marked transmitted.
4396 */
→ 4397 if (tp->cur_tx != dirty_tx && skb)
4398 rtl8169_doorbell(tp);
4399 }
4400 }
Obviously from the code, an earlier detected data-race for tp->cur_tx was fixed in the
line 4363:
4363 while (READ_ONCE(tp->cur_tx) != dirty_tx) {
but the same solution is required for protecting the other access to tp->cur_tx:
→ 4397 if (READ_ONCE(tp->cur_tx) != dirty_tx && skb)
4398 rtl8169_doorbell(tp);
The write in the line 4254 is protected with WRITE_ONCE(), but the read in the line 4397
might have suffered read tearing under some compiler optimisations.
The fix eliminated the KCSAN data-race report for this bug.
It is yet to be evaluated what happens if tp->cur_tx changes between the test in line 4363
and line 4397. This test should certainly not be cached by the compiler in some register
for such a long time, while asynchronous writes to tp->cur_tx might have occurred in line
4254 in the meantime.
Fixes: 94d8a98e6235c ("r8169: reduce number of workaround doorbell rings") Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: nic_swsd@realtek.com Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Marco Elver <elver@google.com> Cc: netdev@vger.kernel.org Link: https://lore.kernel.org/lkml/dc7fc8fa-4ea4-e9a9-30a6-7c83e6b53188@alu.unizg.hr/ Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> Acked-by: Marco Elver <elver@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cited commit introduced a neat way of updating next_to_clean that does
not require boundary checks on each increment. This was done by masking
the new value with (ring length - 1) mask. Problem is that this is
applicable only for power of 2 ring sizes, for every other size this
assumption can not be made. In turn, it leads to cleaning descriptors
out of order as well as splats:
It comes from picking wrong ring entries when cleaning xsk buffers
during pool detach.
Remove the count_mask logic and use they boundary check when updating
next_to_process (which used to be a next_to_clean).
Fixes: c8a8ca3408dc ("i40e: remove unnecessary memory writes of the next to clean pointer") Reported-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Tested-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231018163908.40841-1-maciej.fijalkowski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 19 Oct 2023 19:08:18 +0000 (12:08 -0700)]
Merge tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, netfilter, WiFi.
Feels like an up-tick in regression fixes, mostly for older releases.
The hfsc fix, tcp_disconnect() and Intel WWAN fixes stand out as
fairly clear-cut user reported regressions. The mlx5 DMA bug was
causing strife for 390x folks. The fixes themselves are not
particularly scary, tho. No open investigations / outstanding reports
at the time of writing.
Current release - regressions:
- eth: mlx5: perform DMA operations in the right locations, make
devices usable on s390x, again
- sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner
curve, previous fix of rejecting invalid config broke some scripts
- rfkill: reduce data->mtx scope in rfkill_fop_open, avoid deadlock
- revert "ethtool: Fix mod state of verbose no_mask bitset", needs
more work
Current release - new code bugs:
- tcp: fix listen() warning with v4-mapped-v6 address
Previous releases - regressions:
- tcp: allow tcp_disconnect() again when threads are waiting, it was
denied to plug a constant source of bugs but turns out .NET depends
on it
- eth: mlx5: fix double-free if buffer refill fails under OOM
- revert "net: wwan: iosm: enable runtime pm support for 7560", it's
causing regressions and the WWAN team at Intel disappeared
- tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a
single skb, fix single-stream perf regression on some devices
Previous releases - always broken:
- Bluetooth:
- fix issues in legacy BR/EDR PIN code pairing
- correctly bounds check and pad HCI_MON_NEW_INDEX name
- netfilter:
- more fixes / follow ups for the large "commit protocol" rework,
which went in as a fix to 6.5
- fix null-derefs on netlink attrs which user may not pass in
- tcp: fix excessive TLP and RACK timeouts from HZ rounding (bless
Debian for keeping HZ=250 alive)
- net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation, prevent
letting frankenstein UDP super-frames from getting into the stack
- net: fix interface altnames when ifc moves to a new namespace
- eth: qed: fix the size of the RX buffers
- mptcp: avoid sending RST when closing the initial subflow"
* tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
Revert "ethtool: Fix mod state of verbose no_mask bitset"
selftests: mptcp: join: no RST when rm subflow/addr
mptcp: avoid sending RST when closing the initial subflow
mptcp: more conservative check for zero probes
tcp: check mptcp-level constraints for backlog coalescing
selftests: mptcp: join: correctly check for no RST
net: ti: icssg-prueth: Fix r30 CMDs bitmasks
selftests: net: add very basic test for netdev names and namespaces
net: move altnames together with the netdevice
net: avoid UAF on deleted altname
net: check for altname conflicts when changing netdev's netns
net: fix ifname in netlink ntf during netns move
net: ethernet: ti: Fix mixed module-builtin object
net: phy: bcm7xxx: Add missing 16nm EPHY statistics
ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr
tcp_bpf: properly release resources on error paths
net/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve
net: mdio-mux: fix C45 access returning -EIO after API change
tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb
octeon_ep: update BQL sent bytes before ringing doorbell
...
Linus Torvalds [Thu, 19 Oct 2023 18:02:28 +0000 (11:02 -0700)]
Merge tag 'loongarch-fixes-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai ChenL
"Fix 4-level pagetable building, disable WUC for pgprot_writecombine()
like ioremap_wc(), use correct annotation for exception handlers, and
a trivial cleanup"
* tag 'loongarch-fixes-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Disable WUC for pgprot_writecombine() like ioremap_wc()
LoongArch: Replace kmap_atomic() with kmap_local_page() in copy_user_highpage()
LoongArch: Export symbol invalid_pud_table for modules building
LoongArch: Use SYM_CODE_* to annotate exception handlers
Linus Torvalds [Thu, 19 Oct 2023 17:53:31 +0000 (10:53 -0700)]
Merge tag 'slab-fixes-for-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:
- stable fix to prevent kernel warnings with KASAN_HW_TAGS on arm64
due to improperly resolved kmalloc alignment restrictions (Catalin
Marinas)
* tag 'slab-fixes-for-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm: slab: Do not create kmalloc caches smaller than arch_slab_minalign()
Linus Torvalds [Thu, 19 Oct 2023 16:37:41 +0000 (09:37 -0700)]
Merge tag 'v6.6-rc7.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fix from Christian Brauner:
"An openat() call from io_uring triggering an audit call can apparently
cause the refcount of struct filename to be incremented from multiple
threads concurrently during async execution, triggering a refcount
underflow and hitting a BUG_ON(). That bug has been lurking around
since at least v5.16 apparently.
Switch to an atomic counter to fix that. The underflow check is
downgraded from a BUG_ON() to a WARN_ON_ONCE() but we could easily
remove that check altogether tbh"
* tag 'v6.6-rc7.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
audit,io_uring: io_uring openat triggers audit reference count underflow
It was reported that this fix breaks the possibility to remove existing WoL
flags. For example:
~$ ethtool lan2
...
Supports Wake-on: pg
Wake-on: d
...
~$ ethtool -s lan2 wol gp
~$ ethtool lan2
...
Wake-on: pg
...
~$ ethtool -s lan2 wol d
~$ ethtool lan2
...
Wake-on: pg
...
This worked correctly before this commit because we were always updating
a zero bitmap (since commit 6699170376ab ("ethtool: fix application of
verbose no_mask bitset"), that is) so that the rest was left zero
naturally. But now the 1->0 change (old_val is true, bit not present in
netlink nest) no longer works.
Reported-by: Oleksij Rempel <o.rempel@pengutronix.de> Reported-by: Michal Kubecek <mkubecek@suse.cz> Closes: https://lore.kernel.org/netdev/20231019095140.l6fffnszraeb6iiw@lion.mk-sys.cz/ Cc: stable@vger.kernel.org Fixes: 108a36d07c01 ("ethtool: Fix mod state of verbose no_mask bitset") Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Link: https://lore.kernel.org/r/20231019-feature_ptp_bitset_fix-v1-1-70f3c429a221@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 19 Oct 2023 16:10:18 +0000 (09:10 -0700)]
Merge tag 'ntfs3_for_6.6' of https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 fixes from Konstantin Komarov:
- memory leak
- some logic errors, NULL dereferences
- some code was refactored
- more sanity checks
* tag 'ntfs3_for_6.6' of https://github.com/Paragon-Software-Group/linux-ntfs3:
fs/ntfs3: Avoid possible memory leak
fs/ntfs3: Fix directory element type detection
fs/ntfs3: Fix possible null-pointer dereference in hdr_find_e()
fs/ntfs3: Fix OOB read in ntfs_init_from_boot
fs/ntfs3: fix panic about slab-out-of-bounds caused by ntfs_list_ea()
fs/ntfs3: Fix NULL pointer dereference on error in attr_allocate_frame()
fs/ntfs3: Fix possible NULL-ptr-deref in ni_readpage_cmpr()
fs/ntfs3: Do not allow to change label if volume is read-only
fs/ntfs3: Add more info into /proc/fs/ntfs3/<dev>/volinfo
fs/ntfs3: Refactoring and comments
fs/ntfs3: Fix alternative boot searching
fs/ntfs3: Allow repeated call to ntfs3_put_sbi
fs/ntfs3: Use inode_set_ctime_to_ts instead of inode_set_ctime
fs/ntfs3: Fix shift-out-of-bounds in ntfs_fill_super
fs/ntfs3: fix deadlock in mark_as_free_ex
fs/ntfs3: Add more attributes checks in mi_enum_attr()
fs/ntfs3: Use kvmalloc instead of kmalloc(... __GFP_NOWARN)
fs/ntfs3: Write immediately updated ntfs state
fs/ntfs3: Add ckeck in ni_update_parent()
Geliang Tang [Wed, 18 Oct 2023 18:23:55 +0000 (11:23 -0700)]
mptcp: avoid sending RST when closing the initial subflow
When closing the first subflow, the MPTCP protocol unconditionally
calls tcp_disconnect(), which in turn generates a reset if the subflow
is established.
That is unexpected and different from what MPTCP does with MPJ
subflows, where resets are generated only on FASTCLOSE and other edge
scenarios.
We can't reuse for the first subflow the same code in place for MPJ
subflows, as MPTCP clean them up completely via a tcp_close() call,
while must keep the first subflow socket alive for later re-usage, due
to implementation constraints.
This patch adds a new helper __mptcp_subflow_disconnect() that
encapsulates, a logic similar to tcp_close, issuing a reset only when
the MPTCP_CF_FASTCLOSE flag is set, and performing a clean shutdown
otherwise.
Fixes: c2b2ae3925b6 ("mptcp: handle correctly disconnect() failures") Cc: stable@vger.kernel.org Reviewed-by: Matthieu Baerts <matttbe@kernel.org> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-4-17ecb002e41d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Wed, 18 Oct 2023 18:23:54 +0000 (11:23 -0700)]
mptcp: more conservative check for zero probes
Christoph reported that the MPTCP protocol can find the subflow-level
write queue unexpectedly not empty while crafting a zero-window probe,
hitting a warning:
Paolo Abeni [Wed, 18 Oct 2023 18:23:53 +0000 (11:23 -0700)]
tcp: check mptcp-level constraints for backlog coalescing
The MPTCP protocol can acquire the subflow-level socket lock and
cause the tcp backlog usage. When inserting new skbs into the
backlog, the stack will try to coalesce them.
Currently, we have no check in place to ensure that such coalescing
will respect the MPTCP-level DSS, and that may cause data stream
corruption, as reported by Christoph.
Address the issue by adding the relevant admission check for coalescing
in tcp_add_backlog().
Note the issue is not easy to reproduce, as the MPTCP protocol tries
hard to avoid acquiring the subflow-level socket lock.
Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path") Cc: stable@vger.kernel.org Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/420 Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-2-17ecb002e41d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts [Wed, 18 Oct 2023 18:23:52 +0000 (11:23 -0700)]
selftests: mptcp: join: correctly check for no RST
The commit mentioned below was more tolerant with the number of RST seen
during a test because in some uncontrollable situations, multiple RST
can be generated.
But it was not taking into account the case where no RST are expected:
this validation was then no longer reporting issues for the 0 RST case
because it is not possible to have less than 0 RST in the counter. This
patch fixes the issue by adding a specific condition.
Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-1-17ecb002e41d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The bitmasks for EMAC_PORT_DISABLE and EMAC_PORT_FORWARD r30 commands are
wrong in the driver.
Update the bitmasks of these commands to the correct ones as used by the
ICSSG firmware. These bitmasks are backwards compatible and work with
any ICSSG firmware version.
Fixes: e9b4ece7d74b ("net: ti: icssg-prueth: Add Firmware config and classification APIs.") Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20231018150715.3085380-1-danishanwar@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 19 Oct 2023 15:56:01 +0000 (08:56 -0700)]
Merge tag 'for-6.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"Fix a bug in chunk size decision that could lead to suboptimal
placement and filling patterns"
* tag 'for-6.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix stripe length calculation for non-zoned data chunk allocation
====================
net: fix bugs in device netns-move and rename
Daniel reported issues with the uevents generated during netdev
namespace move, if the netdev is getting renamed at the same time.
While the issue that he actually cares about is not fixed here,
there is a bunch of seemingly obvious other bugs in this code.
Fix the purely networking bugs while the discussion around
the uevent fix is still ongoing.
====================
Jakub Kicinski [Wed, 18 Oct 2023 01:38:16 +0000 (18:38 -0700)]
net: move altnames together with the netdevice
The altname nodes are currently not moved to the new netns
when netdevice itself moves:
[ ~]# ip netns add test
[ ~]# ip -netns test link add name eth0 type dummy
[ ~]# ip -netns test link property add dev eth0 altname some-name
[ ~]# ip -netns test link show dev some-name
2: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 1e:67:ed:19:3d:24 brd ff:ff:ff:ff:ff:ff
altname some-name
[ ~]# ip -netns test link set dev eth0 netns 1
[ ~]# ip link
...
3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff
altname some-name
[ ~]# ip li show dev some-name
Device "some-name" does not exist.
Remove them from the hash table when device is unlisted
and add back when listed again.
Fixes: 36fbf1e52bd3 ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Wed, 18 Oct 2023 01:38:15 +0000 (18:38 -0700)]
net: avoid UAF on deleted altname
Altnames are accessed under RCU (dev_get_by_name_rcu())
but freed by kfree() with no synchronization point.
Each node has one or two allocations (node and a variable-size
name, sometimes the name is netdev->name). Adding rcu_heads
here is a bit tedious. Besides most code which unlists the names
already has rcu barriers - so take the simpler approach of adding
synchronize_rcu(). Note that the one on the unregistration path
(which matters more) is removed by the next fix.
Fixes: ff92741270bf ("net: introduce name_node struct to be used in hashlist") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Wed, 18 Oct 2023 01:38:14 +0000 (18:38 -0700)]
net: check for altname conflicts when changing netdev's netns
It's currently possible to create an altname conflicting
with an altname or real name of another device by creating
it in another netns and moving it over:
[ ~]$ ip link add dev eth0 type dummy
[ ~]$ ip netns add test
[ ~]$ ip -netns test link add dev ethX netns test type dummy
[ ~]$ ip -netns test link property add dev ethX altname eth0
[ ~]$ ip -netns test link set dev ethX netns 1
[ ~]$ ip link
...
3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff
...
5: ethX: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 26:b7:28:78:38:0f brd ff:ff:ff:ff:ff:ff
altname eth0
Create a macro for walking the altnames, this hopefully makes
it clearer that the list we walk contains only altnames.
Which is otherwise not entirely intuitive.
Fixes: 36fbf1e52bd3 ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Wed, 18 Oct 2023 01:38:13 +0000 (18:38 -0700)]
net: fix ifname in netlink ntf during netns move
dev_get_valid_name() overwrites the netdev's name on success.
This makes it hard to use in prepare-commit-like fashion,
where we do validation first, and "commit" to the change
later.
Factor out a helper which lets us save the new name to a buffer.
Use it to fix the problem of notification on netns move having
incorrect name:
5: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff
6: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether 1e:4a:34:36:e3:cd brd ff:ff:ff:ff:ff:ff
[ ~]# ip link set dev eth0 netns 1 name eth1
ip monitor inside netns:
Deleted inet eth0
Deleted inet6 eth0
Deleted 5: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff new-netnsid 0 new-ifindex 7
Name is reported as eth1 in old netns for ifindex 5, already renamed.
Fixes: d90310243fd7 ("net: device name allocation cleanups") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
With CONFIG_TI_K3_AM65_CPSW_NUSS=y and CONFIG_TI_ICSSG_PRUETH=m,
k3-cppi-desc-pool.o is linked to a module and also to vmlinux even though
the expected CFLAGS are different between builtins and modules.
The build system is complaining about the following:
k3-cppi-desc-pool.o is added to multiple modules: icssg-prueth
ti-am65-cpsw-nuss
Introduce the new module, k3-cppi-desc-pool, to provide the common
functions to ti-am65-cpsw-nuss and icssg-prueth.
Jakub Kicinski [Thu, 19 Oct 2023 01:17:50 +0000 (18:17 -0700)]
Merge tag 'nf-23-10-18' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter: updates for net
First patch, from Phil Sutter, reduces number of audit notifications
when userspace requests to re-set stateful objects.
This change also comes with a selftest update.
Second patch, also from Phil, moves the nftables audit selftest
to its own netns to avoid interference with the init netns.
Third patch, from Pablo Neira, fixes an inconsistency with the "rbtree"
set backend: When set element X has expired, a request to delete element
X should fail (like with all other backends).
Finally, patch four, also from Pablo, reverts a recent attempt to speed
up abort of a large pending update with the "pipapo" set backend.
It could cause stray references to remain in the set, which then
results in a double-free.
* tag 'nf-23-10-18' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: revert do not remove elements if set backend implements .abort
netfilter: nft_set_rbtree: .deactivate fails if element has expired
selftests: netfilter: Run nft_audit.sh in its own netns
netfilter: nf_tables: audit log object reset once per table
====================
Jakub Kicinski [Thu, 19 Oct 2023 01:14:25 +0000 (18:14 -0700)]
Merge tag 'wireless-2023-10-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
A few more fixes:
* prevent value bounce/glitch in rfkill GPIO probe
* fix lockdep report in rfkill
* fix error path leak in mac80211 key handling
* use system_unbound_wq for wiphy work since it
can take longer
* tag 'wireless-2023-10-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
net: rfkill: reduce data->mtx scope in rfkill_fop_open
net: rfkill: gpio: prevent value glitch during probe
wifi: mac80211: fix error path key leak
wifi: cfg80211: use system_unbound_wq for wiphy work
====================
The .probe() function would allocate the necessary space and ensure that
the library call sizes the number of statistics but the callbacks
necessary to fetch the name and values were not wired up.
Reported-by: Justin Chen <justin.chen@broadcom.com> Fixes: f68d08c437f9 ("net: phy: bcm7xxx: Add EPHY entry for 72165") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231017205119.416392-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 6759 Comm: kworker/u4:15 Not tainted 6.6.0-rc4-syzkaller-00029-gcbf3a2cb156a #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023
Workqueue: wg-kex-wg1 wg_packet_handshake_send_worker
Fixes: 436c3b66ec98 ("ipv4: Invalidate nexthop cache nh_saddr more correctly.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231017192304.82626-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Tue, 17 Oct 2023 15:49:51 +0000 (17:49 +0200)]
tcp_bpf: properly release resources on error paths
In the blamed commit below, I completely forgot to release the acquired
resources before erroring out in the TCP BPF code, as reported by Dan.
Address the issues by replacing the bogus return with a jump to the
relevant cleanup code.
Fixes: 419ce133ab92 ("tcp: allow again tcp_disconnect() when threads are waiting") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/8f99194c698bcef12666f0a9a999c58f8b1cb52c.1697557782.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pedro Tammela [Tue, 17 Oct 2023 14:36:02 +0000 (11:36 -0300)]
net/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve
Christian Theune says:
I upgraded from 6.1.38 to 6.1.55 this morning and it broke my traffic shaping script,
leaving me with a non-functional uplink on a remote router.
A 'rt' curve cannot be used as a inner curve (parent class), but we were
allowing such configurations since the qdisc was introduced. Such
configurations would trigger a UAF as Budimir explains:
The parent will have vttree_insert() called on it in init_vf(),
but will not have vttree_remove() called on it in update_vf()
because it does not have the HFSC_FSC flag set.
The qdisc always assumes that inner classes have the HFSC_FSC flag set.
This is by design as it doesn't make sense 'qdisc wise' for an 'rt'
curve to be an inner curve.
Budimir's original patch disallows users to add classes with a 'rt'
parent, but this is too strict as it breaks users that have been using
'rt' as a inner class. Another approach, taken by this patch, is to
upgrade the inner 'rt' into a 'sc', warning the user in the process.
It avoids the UAF reported by Budimir while also being more permissive
to bad scripts/users/code using 'rt' as a inner class.
Users checking the `tc class ls [...]` or `tc class get [...]` dumps would
observe the curve change and are potentially breaking with this change.
v1->v2: https://lore.kernel.org/all/20231013151057.2611860-1-pctammela@mojatatu.com/
- Correct 'Fixes' tag and merge with revert (Jakub)
Cc: Christian Theune <ct@flyingcircus.io> Cc: Budimir Markovic <markovicbudimir@gmail.com> Fixes: b3d26c5702c7 ("net/sched: sch_hfsc: Ensure inner classes have fsc curve") Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20231017143602.3191556-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 17 Oct 2023 14:31:44 +0000 (17:31 +0300)]
net: mdio-mux: fix C45 access returning -EIO after API change
The mii_bus API conversion to read_c45() and write_c45() did not cover
the mdio-mux driver before read() and write() were made C22-only.
This broke arch/arm64/boot/dts/freescale/fsl-ls1028a-qds-13bb.dtso.
The -EOPNOTSUPP from mdiobus_c45_read() is transformed by
get_phy_c45_devs_in_pkg() into -EIO, is further propagated to
of_mdiobus_register() and this makes the mdio-mux driver fail to probe
the entire child buses, not just the PHYs that cause access errors.
Fix the regression by introducing special c45 read and write accessors
to mdio-mux which forward the operation to the parent MDIO bus.
Fixes: db1a63aed89c ("net: phy: Remove fallback to old C45 method") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20231017143144.3212657-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Tue, 17 Oct 2023 12:45:26 +0000 (12:45 +0000)]
tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb
In commit 75eefc6c59fd ("tcp: tsq: add a shortcut in tcp_small_queue_check()")
we allowed to send an skb regardless of TSQ limits being hit if rtx queue
was empty or had a single skb, in order to better fill the pipe
when/if TX completions were slow.
Then later, commit 75c119afe14f ("tcp: implement rb-tree based
retransmit queue") accidentally removed the special case for
one skb in rtx queue.
Stefan Wahren reported a regression in single TCP flow throughput
using a 100Mbit fec link, starting from commit 65466904b015 ("tcp: adjust
TSO packet sizes based on min_rtt"). This last commit only made the
regression more visible, because it locked the TCP flow on a particular
behavior where TSQ prevented two skbs being pushed downstream,
adding silences on the wire between each TSO packet.
Shinas Rasheed [Tue, 17 Oct 2023 10:50:30 +0000 (03:50 -0700)]
octeon_ep: update BQL sent bytes before ringing doorbell
Sometimes Tx is completed immediately after doorbell is updated, which
causes Tx completion routing to update completion bytes before the
same packet bytes are updated in sent bytes in transmit function, hence
hitting BUG_ON() in dql_completed(). To avoid this, update BQL
sent bytes before ringing doorbell.
perf/benchmark: fix seccomp_unotify benchmark for 32-bit
Commit 7d5cb68af638 (perf/benchmark: add a new benchmark for
seccom_unotify) added a reference to __NR_seccomp into perf. This is
fine as it added also a definition of __NR_seccomp for 64-bit. But it
failed to do so for 32-bit as instead of ifndef, ifdef was used.
Fix this typo (so fix the build of perf on 32-bit).
Fixes: 7d5cb68af638 (perf/benchmark: add a new benchmark for seccom_unotify) Cc: Andrei Vagin <avagin@google.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20231017083019.31733-1-jirislaby@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
Linus Torvalds [Wed, 18 Oct 2023 16:30:03 +0000 (09:30 -0700)]
Merge tag 'regmap-fix-v6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fix from Mark Brown:
"A straightforward fix from Johan for a long standing bug in cases
where we both have regmaps without devices and something is using
dev_get_regmap()"
* tag 'regmap-fix-v6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: fix NULL deref on lookup
netfilter: nf_tables: revert do not remove elements if set backend implements .abort
nf_tables_abort_release() path calls nft_set_elem_destroy() for
NFT_MSG_NEWSETELEM which releases the element, however, a reference to
the element still remains in the working copy.
Fixes: ebd032fa8818 ("netfilter: nf_tables: do not remove elements if set backend implements .abort") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
netfilter: nft_set_rbtree: .deactivate fails if element has expired
This allows to remove an expired element which is not possible in other
existing set backends, this is more noticeable if gc-interval is high so
expired elements remain in the tree. On-demand gc also does not help in
this case, because this is delete element path. Return NULL if element
has expired.
Phil Sutter [Fri, 13 Oct 2023 20:02:24 +0000 (22:02 +0200)]
selftests: netfilter: Run nft_audit.sh in its own netns
Don't mess with the host's firewall ruleset. Since audit logging is not
per-netns, add an initial delay of a second so other selftests' netns
cleanups have a chance to finish.
Fixes: e8dbde59ca3f ("selftests: netfilter: Test nf_tables audit logging") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>
Phil Sutter [Wed, 11 Oct 2023 15:06:59 +0000 (17:06 +0200)]
netfilter: nf_tables: audit log object reset once per table
When resetting multiple objects at once (via dump request), emit a log
message per table (or filled skb) and resurrect the 'entries' parameter
to contain the number of objects being logged for.
To test the skb exhaustion path, perform some bulk counter and quota
adds in the kselftest.
Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> (Audit) Signed-off-by: Florian Westphal <fw@strlen.de>
In file included from include/trace/define_trace.h:102,
from include/trace/events/neigh.h:255,
from net/core/net-traces.c:51:
include/trace/events/neigh.h: In function ‘trace_event_raw_event_neigh_create’:
include/trace/events/neigh.h:42:34: error: variable ‘pin6’ set but not used [-Werror=unused-but-set-variable]
42 | struct in6_addr *pin6;
| ^~~~
include/trace/trace_events.h:402:11: note: in definition of macro ‘DECLARE_EVENT_CLASS’
402 | { assign; } \
| ^~~~~~
include/trace/trace_events.h:44:30: note: in expansion of macro ‘PARAMS’
44 | PARAMS(assign), \
| ^~~~~~
include/trace/events/neigh.h:23:1: note: in expansion of macro ‘TRACE_EVENT’
23 | TRACE_EVENT(neigh_create,
| ^~~~~~~~~~~
include/trace/events/neigh.h:41:9: note: in expansion of macro ‘TP_fast_assign’
41 | TP_fast_assign(
| ^~~~~~~~~~~~~~
In file included from include/trace/define_trace.h:103,
from include/trace/events/neigh.h:255,
from net/core/net-traces.c:51:
include/trace/events/neigh.h: In function ‘perf_trace_neigh_create’:
include/trace/events/neigh.h:42:34: error: variable ‘pin6’ set but not used [-Werror=unused-but-set-variable]
42 | struct in6_addr *pin6;
| ^~~~
include/trace/perf.h:51:11: note: in definition of macro ‘DECLARE_EVENT_CLASS’
51 | { assign; } \
| ^~~~~~
include/trace/trace_events.h:44:30: note: in expansion of macro ‘PARAMS’
44 | PARAMS(assign), \
| ^~~~~~
include/trace/events/neigh.h:23:1: note: in expansion of macro ‘TRACE_EVENT’
23 | TRACE_EVENT(neigh_create,
| ^~~~~~~~~~~
include/trace/events/neigh.h:41:9: note: in expansion of macro ‘TP_fast_assign’
41 | TP_fast_assign(
| ^~~~~~~~~~~~~~
Indeed, the variable pin6 is declared and initialized unconditionally,
while it is only used and needlessly re-initialized when support for
IPv6 is enabled.
Fix this by dropping the unused variable initialization, and moving the
variable declaration inside the existing section protected by a check
for CONFIG_IPV6.
Fixes: fc651001d2c5ca4f ("neighbor: Add tracepoint to __neigh_create") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> # build-tested Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Furthermore, when shutting down with `poweroff` and modem attached, the
system rebooted instead of powering down as expected. The modem works
again only after power cycling.
Revert runtime power management support for IOSM driver as introduced by
commit e4f5073d53be6c ("net: wwan: iosm: enable runtime pm support for
7560").
Fixes: e4f5073d53be ("net: wwan: iosm: enable runtime pm support for 7560") Reported-by: Martin <mwolf@adiumentum.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217996 Link: https://lore.kernel.org/r/267abf02-4b60-4a2e-92cd-709e3da6f7d3@gmail.com/ Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Gavrilov Ilia [Mon, 16 Oct 2023 14:08:59 +0000 (14:08 +0000)]
net: pktgen: Fix interface flags printing
Device flags are displayed incorrectly:
1) The comparison (i == F_FLOW_SEQ) is always false, because F_FLOW_SEQ
is equal to (1 << FLOW_SEQ_SHIFT) == 2048, and the maximum value
of the 'i' variable is (NR_PKT_FLAG - 1) == 17. It should be compared
with FLOW_SEQ_SHIFT.
2) Similarly to the F_IPSEC flag.
3) Also add spaces to the print end of the string literal "spi:%u"
to prevent the output from merging with the flag that follows.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes: 99c6d3d20d62 ("pktgen: Remove brute-force printing of flags") Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
1) Fix a slab-use-after-free in xfrm_policy_inexact_list_reinsert.
From Dong Chenchen.
2) Fix data-races in the xfrm interfaces dev->stats fields.
From Eric Dumazet.
3) Fix a data-race in xfrm_gen_index.
From Eric Dumazet.
4) Fix an inet6_dev refcount underflow.
From Zhang Changzhong.
5) Check the return value of pskb_trim in esp_remove_trailer
for esp4 and esp6. From Ma Ke.
6) Fix a data-race in xfrm_lookup_with_ifid.
From Eric Dumazet.
* tag 'ipsec-2023-10-17' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: fix a data-race in xfrm_lookup_with_ifid()
net: ipv4: fix return value check in esp_remove_trailer
net: ipv6: fix return value check in esp_remove_trailer
xfrm6: fix inet6_dev refcount underflow problem
xfrm: fix a data-race in xfrm_gen_index()
xfrm: interface: use DEV_STATS_INC()
net: xfrm: skip policies marked as dead while reinserting policies
====================
Fixes: fb7589a16216 ("tun: Add ability to create tun device with given index") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20231016180851.3560092-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Icenowy Zheng [Wed, 18 Oct 2023 00:42:52 +0000 (08:42 +0800)]
LoongArch: Disable WUC for pgprot_writecombine() like ioremap_wc()
Currently the code disables WUC only disables it for ioremap_wc(), which
is only used when mapping writecombine pages like ioremap() (mapped to
the kernel space). But for VRAM mapped in TTM/GEM, it is mapped with a
crafted pgprot by the pgprot_writecombine() function, in which case WUC
isn't disabled now.
Disable WUC for pgprot_writecombine() (fallback to SUC) if needed, like
ioremap_wc().
This improves the AMDGPU driver's stability (solves some misrendering)
on Loongson-3A5000/3A6000 machines.
Huacai Chen [Wed, 18 Oct 2023 00:42:52 +0000 (08:42 +0800)]
LoongArch: Replace kmap_atomic() with kmap_local_page() in copy_user_highpage()
Replace kmap_atomic()/kunmap_atomic() calls with kmap_local_page()/
kunmap_local() in copy_user_highpage() which can be invoked from both
preemptible and atomic context [1].
Neal Cardwell [Sun, 15 Oct 2023 17:47:00 +0000 (13:47 -0400)]
tcp: fix excessive TLP and RACK timeouts from HZ rounding
We discovered from packet traces of slow loss recovery on kernels with
the default HZ=250 setting (and min_rtt < 1ms) that after reordering,
when receiving a SACKed sequence range, the RACK reordering timer was
firing after about 16ms rather than the desired value of roughly
min_rtt/4 + 2ms. The problem is largely due to the RACK reorder timer
calculation adding in TCP_TIMEOUT_MIN, which is 2 jiffies. On kernels
with HZ=250, this is 2*4ms = 8ms. The TLP timer calculation has the
exact same issue.
This commit fixes the TLP transmit timer and RACK reordering timer
floor calculation to more closely match the intended 2ms floor even on
kernels with HZ=250. It does this by adding in a new
TCP_TIMEOUT_MIN_US floor of 2000 us and then converting to jiffies,
instead of the current approach of converting to jiffies and then
adding th TCP_TIMEOUT_MIN value of 2 jiffies.
Our testing has verified that on kernels with HZ=1000, as expected,
this does not produce significant changes in behavior, but on kernels
with the default HZ=250 the latency improvement can be large. For
example, our tests show that for HZ=250 kernels at low RTTs this fix
roughly halves the latency for the RACK reorder timer: instead of
mostly firing at 16ms it mostly fires at 8ms.
Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Fixes: bb4d991a28cc ("tcp: adjust tail loss probe timeout") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231015174700.2206872-1-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 18 Oct 2023 00:14:22 +0000 (17:14 -0700)]
Merge tag 'fbdev-for-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev
Pull fbdev fixes and cleanups from Helge Deller:
"Various minor fixes, cleanups and annotations for atyfb, sa1100fb,
omapfb, uvesafb and mmp"
* tag 'fbdev-for-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: core: syscopyarea: fix sloppy typing
fbdev: core: cfbcopyarea: fix sloppy typing
fbdev: uvesafb: Call cn_del_callback() at the end of uvesafb_exit()
fbdev: uvesafb: Remove uvesafb_exec() prototype from include/video/uvesafb.h
fbdev: sa1100fb: mark sa1100fb_init() static
fbdev: omapfb: fix some error codes
fbdev: atyfb: only use ioremap_uc() on i386 and ia64
fbdev: mmp: Annotate struct mmp_path with __counted_by
fbdev: mmp: Annotate struct mmphw_ctrl with __counted_by
Shailend Chand [Sat, 14 Oct 2023 01:41:21 +0000 (01:41 +0000)]
gve: Do not fully free QPL pages on prefill errors
The prefill function should have only removed the page count bias it
added. Fully freeing the page will cause gve_free_queue_page_list to
free a page the driver no longer owns.
Linus Torvalds [Tue, 17 Oct 2023 01:50:48 +0000 (18:50 -0700)]
Merge tag 'probes-fixes-v6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:
- Fix fprobe document to add a new ret_ip parameter for callback
functions. This has been introduced in v6.5 but the document was not
updated.
- Fix fprobe to check the number of active retprobes is not zero. This
number is passed from parameter or calculated by the parameter and it
can be zero which is not acceptable. But current code only check it
is not minus.
* tag 'probes-fixes-v6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
fprobe: Fix to ensure the number of active retprobes is not zero
Documentation: probes: Add a new ret_ip callback parameter
Linus Torvalds [Tue, 17 Oct 2023 01:34:17 +0000 (18:34 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix the handling of the phycal timer offset when FEAT_ECV and
CNTPOFF_EL2 are implemented
- Restore the functionnality of Permission Indirection that was
broken by the Fine Grained Trapping rework
- Cleanup some PMU event sharing code
MIPS:
- Fix W=1 build
s390:
- One small fix for gisa to avoid stalls
x86:
- Truncate writes to PMU counters to the counter's width to avoid
spurious overflows when emulating counter events in software
- Set the LVTPC entry mask bit when handling a PMI (to match
Intel-defined architectural behavior)
- Treat KVM_REQ_PMI as a wake event instead of queueing host IRQ work
to kick the guest out of emulated halt
- Fix for loading XSAVE state from an old kernel into a new one
- Fixes for AMD AVIC
selftests:
- Play nice with %llx when formatting guest printf and assert
statements
- Clean up stale test metadata
- Zero-initialize structures in memslot perf test to workaround a
suspected 'may be used uninitialized' false positives from GCC"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
KVM: arm64: timers: Correctly handle TGE flip with CNTPOFF_EL2
KVM: arm64: POR{E0}_EL1 do not need trap handlers
KVM: arm64: Add nPIR{E0}_EL1 to HFG traps
KVM: MIPS: fix -Wunused-but-set-variable warning
KVM: arm64: pmu: Drop redundant check for non-NULL kvm_pmu_events
KVM: SVM: Fix build error when using -Werror=unused-but-set-variable
x86: KVM: SVM: refresh AVIC inhibition in svm_leave_nested()
x86: KVM: SVM: add support for Invalid IPI Vector interception
x86: KVM: SVM: always update the x2avic msr interception
KVM: selftests: Force load all supported XSAVE state in state test
KVM: selftests: Load XSAVE state into untouched vCPU during state test
KVM: selftests: Touch relevant XSAVE state in guest for state test
KVM: x86: Constrain guest-supported xfeatures only at KVM_GET_XSAVE{2}
x86/fpu: Allow caller to constrain xfeatures when copying to uabi buffer
KVM: selftests: Zero-initialize entire test_result in memslot perf test
KVM: selftests: Remove obsolete and incorrect test case metadata
KVM: selftests: Treat %llx like %lx when formatting guest printf
KVM: x86/pmu: Synthesize at most one PMI per VM-exit
KVM: x86: Mask LVTPC when handling a PMI
KVM: x86/pmu: Truncate counter value to allowed width on write
...
Jakub Kicinski [Tue, 17 Oct 2023 01:02:19 +0000 (18:02 -0700)]
Merge tag 'for-net-2023-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix race when opening vhci device
- Avoid memcmp() out of bounds warning
- Correctly bounds check and pad HCI_MON_NEW_INDEX name
- Fix using memcmp when comparing keys
- Ignore error return for hci_devcd_register() in btrtl
- Always check if connection is alive before deleting
- Fix a refcnt underflow problem for hci_conn
* tag 'for-net-2023-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_sock: Correctly bounds check and pad HCI_MON_NEW_INDEX name
Bluetooth: avoid memcmp() out of bounds warning
Bluetooth: hci_sock: fix slab oob read in create_monitor_event
Bluetooth: btrtl: Ignore error return for hci_devcd_register()
Bluetooth: hci_event: Fix coding style
Bluetooth: hci_event: Fix using memcmp when comparing keys
Bluetooth: Fix a refcnt underflow problem for hci_conn
Bluetooth: hci_sync: always check if connection is alive before deleting
Bluetooth: Reject connection with the device which has same BD_ADDR
Bluetooth: hci_event: Ignore NULL link key
Bluetooth: ISO: Fix invalid context error
Bluetooth: vhci: Fix race when opening vhci device
====================
Christoph Paasch [Fri, 13 Oct 2023 04:14:48 +0000 (21:14 -0700)]
netlink: Correct offload_xstats size
rtnl_offload_xstats_get_size_hw_s_info_one() conditionalizes the
size-computation for IFLA_OFFLOAD_XSTATS_HW_S_INFO_USED based on whether
or not the device has offload_xstats enabled.
However, rtnl_offload_xstats_fill_hw_s_info_one() is adding the u8 for
that field uncondtionally.
Dust Li [Thu, 12 Oct 2023 12:37:29 +0000 (20:37 +0800)]
net/smc: return the right falback reason when prefix checks fail
In the smc_listen_work(), if smc_listen_prfx_check() failed,
the real reason: SMC_CLC_DECL_DIFFPREFIX was dropped, and
SMC_CLC_DECL_NOSMCDEV was returned.
Althrough this is also kind of SMC_CLC_DECL_NOSMCDEV, but return
the real reason is much friendly for debugging.
Fixes: e49300a6bf62 ("net/smc: add listen processing for SMC-Rv2") Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://lore.kernel.org/r/20231012123729.29307-1-dust.li@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sergey Shtylyov [Fri, 13 Oct 2023 20:50:24 +0000 (23:50 +0300)]
fbdev: core: syscopyarea: fix sloppy typing
In sys_copyarea(), the local variable bits_per_line is needlessly typed as
*unsigned long* -- which is a 32-bit type on the 32-bit arches and a 64-bit
type on the 64-bit arches; that variable's value is derived from the __u32
typed fb_fix_screeninfo::line_length field (multiplied by 8u) and a 32-bit
*unsigned int* type should still be enough to store the # of bits per line.
Found by Linux Verification Center (linuxtesting.org) with the Svace static
analysis tool.
Sergey Shtylyov [Fri, 13 Oct 2023 20:50:23 +0000 (23:50 +0300)]
fbdev: core: cfbcopyarea: fix sloppy typing
In cfb_copyarea(), the local variable bits_per_line is needlessly typed as
*unsigned long* -- which is a 32-bit type on the 32-bit arches and a 64-bit
type on the 64-bit arches; that variable's value is derived from the __u32
typed fb_fix_screeninfo::line_length field (multiplied by 8u) and a 32-bit
*unsigned int* type should still be enough to store the # of bits per line.
Found by Linux Verification Center (linuxtesting.org) with the Svace static
analysis tool.
Jorge Maidana [Fri, 6 Oct 2023 20:43:46 +0000 (17:43 -0300)]
fbdev: uvesafb: Remove uvesafb_exec() prototype from include/video/uvesafb.h
uvesafb_exec() is a static function defined and called only in
drivers/video/fbdev/uvesafb.c, remove the prototype from
include/video/uvesafb.h.
Fixes the warning:
./include/video/uvesafb.h:112:12: warning: 'uvesafb_exec' declared 'static' but never defined [-Wunused-function]
when including '<video/uvesafb.h>' in an external program.
David S. Miller [Sun, 15 Oct 2023 19:02:51 +0000 (20:02 +0100)]
Merge branch 'ovs-selftests'
From: Aaron Conole <aconole@redhat.com>
To: netdev@vger.kernel.org Cc: dev@openvswitch.org, linux-kselftest@vger.kernel.org,
linux-kernel@vger.kernel.org, Pravin B Shelar <pshelar@ovn.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Adrian Moreno <amorenoz@redhat.com>,
Eelco Chaudron <echaudro@redhat.com>,
shuah@kernel.org
Subject: [PATCH net v2 0/4] selftests: openvswitch: Minor fixes for some systems
Date: Wed, 11 Oct 2023 15:49:35 -0400 [thread overview]
Message-ID: <20231011194939.704565-1-aconole@redhat.com> (raw)
A number of corner cases were caught when trying to run the selftests on
older systems. Missed skip conditions, some error cases, and outdated
python setups would all report failures but the issue would actually be
related to some other condition rather than the selftest suite.
Address these individual cases.
====================
Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Aaron Conole [Wed, 11 Oct 2023 19:49:39 +0000 (15:49 -0400)]
selftests: openvswitch: Fix the ct_tuple for v4
The ct_tuple v4 data structure decode / encode routines were using
the v6 IP address decode and relying on default encode. This could
cause exceptions during encode / decode depending on how a ct4
tuple would appear in a netlink message.
Caught during code review.
Fixes: e52b07aa1a54 ("selftests: openvswitch: add flow dump support") Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Aaron Conole [Wed, 11 Oct 2023 19:49:38 +0000 (15:49 -0400)]
selftests: openvswitch: Skip drop testing on older kernels
Kernels that don't have support for openvswitch drop reasons also
won't have the drop counter reasons, so we should skip the test
completely. It previously wasn't possible to build a test case
for this without polluting the datapath, so we introduce a mechanism
to clear all the flows from a datapath allowing us to test for
explicit drop actions, and then clear the flows to build the
original test case.
Fixes: 4242029164d6 ("selftests: openvswitch: add explicit drop testcase") Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Aaron Conole [Wed, 11 Oct 2023 19:49:36 +0000 (15:49 -0400)]
selftests: openvswitch: Add version check for pyroute2
Paolo Abeni reports that on some systems the pyroute2 version isn't
new enough to run the test suite. Ensure that we support a minimum
version of 0.6 for all cases (which does include the existing ones).
The 0.6.1 version was released in May of 2021, so should be
propagated to most installations at this point.
The alternative that Paolo proposed was to only skip when the
add-flow is being run. This would be okay for most cases, except
if a future test case is added that needs to do flow dump without
an associated add (just guessing). In that case, it could also be
broken and we would need additional skip logic anyway. Just draw
a line in the sand now.
Fixes: 25f16c873fb1 ("selftests: add openvswitch selftest suite") Reported-by: Paolo Abeni <pabeni@redhat.com> Closes: https://lore.kernel.org/lkml/8470c431e0930d2ea204a9363a60937289b7fdbe.camel@redhat.com/ Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
3f874c9b2aae ("x86/smp: Don't send INIT to non-present and non-booted CPUs") b1472a60a584 ("x86/smp: Don't send INIT to boot CPU")
because it seems to result in hung machines at shutdown. Particularly
some Dell machines, but Thomas says
"The rest seems to be Lenovo and Sony with Alderlake/Raptorlake CPUs -
at least that's what I could figure out from the various bug reports.
I don't know which CPUs the DELL machines have, so I can't say it's a
pattern.
I agree with the revert for now"
Ashok Raj chimes in:
"There was a report (probably this same one), and it turns out it was a
bug in the BIOS SMI handler.
The client BIOS's were waiting for the lowest APICID to be the SMI
rendevous master. If this is MeteorLake, the BSP wasn't the one with
the lowest APIC and it triped here.
The BIOS change is also being pushed to others for assimilation :)
Server BIOS's had this correctly for a while now"
and it does look likely to be some bad interaction between SMI and the
non-BSP cores having put into INIT (and thus unresponsive until reset).
Older virtio types have historically had loose restrictions, leading
to many entirely impractical fuzzer generated packets causing
problems deep in the kernel stack. Ideally, we would have had strict
validation for all types from the start.
New virtio types can have tighter validation. Limit UDP GSO packets
inserted via virtio to the same limits imposed by the UDP_SEGMENT
socket interface:
1. must use checksum offload
2. checksum offload matches UDP header
3. no more segments than UDP_MAX_SEGMENTS
4. UDP GSO does not take modifier flags, notably SKB_GSO_TCP_ECN
Fixes: 860b7f27b8f7 ("linux/virtio_net.h: Support USO offload in vnet header.") Reported-by: syzbot+01cdbc31e9c0ae9b33ac@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/0000000000005039270605eb0b7f@google.com/ Reported-by: syzbot+c99d835ff081ca30f986@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/0000000000005426680605eb0b9f@google.com/ Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xuan Zhuo [Wed, 27 Sep 2023 05:52:46 +0000 (13:52 +0800)]
virtio_net: fix the missing of the dma cpu sync
Commit 295525e29a5b ("virtio_net: merge dma operations when filling
mergeable buffers") unmaps the buffer with DMA_ATTR_SKIP_CPU_SYNC when
the dma->ref is zero. We do that with DMA_ATTR_SKIP_CPU_SYNC, because we
do not want to do the sync for the entire page_frag. But that misses the
sync for the current area.
This patch does cpu sync regardless of whether the ref is zero or not.
Fixes: 295525e29a5b ("virtio_net: merge dma operations when filling mergeable buffers") Reported-by: Michael Roth <michael.roth@amd.com> Closes: http://lore.kernel.org/all/20230926130451.axgodaa6tvwqs3ut@amd.com Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zygo Blaxell [Sat, 7 Oct 2023 05:14:21 +0000 (01:14 -0400)]
btrfs: fix stripe length calculation for non-zoned data chunk allocation
Commit f6fca3917b4d "btrfs: store chunk size in space-info struct"
broke data chunk allocations on non-zoned multi-device filesystems when
using default chunk_size. Commit 5da431b71d4b "btrfs: fix the max chunk
size and stripe length calculation" partially fixed that, and this patch
completes the fix for that case.
After commit f6fca3917b4d and 5da431b71d4b, the sequence of events for
a data chunk allocation on a non-zoned filesystem is:
1. btrfs_create_chunk calls init_alloc_chunk_ctl, which copies
space_info->chunk_size (default 10 GiB) to ctl->max_stripe_len
unmodified. Before f6fca3917b4d, ctl->max_stripe_len value was
1 GiB for non-zoned data chunks and not configurable.
2. btrfs_create_chunk calls gather_device_info which consumes
and produces more fields of chunk_ctl.
3. gather_device_info multiplies ctl->max_stripe_len by
ctl->dev_stripes (which is 1 in all cases except dup)
and calls find_free_dev_extent with that number as num_bytes.
4. find_free_dev_extent locates the first dev_extent hole on
a device which is at least as large as num_bytes. With default
max_chunk_size from f6fca3917b4d, it finds the first hole which is
longer than 10 GiB, or the largest hole if that hole is shorter
than 10 GiB. This is different from the pre-f6fca3917b4d
behavior, where num_bytes is 1 GiB, and find_free_dev_extent
may choose a different hole.
5. gather_device_info repeats step 4 with all devices to find
the first or largest dev_extent hole that can be allocated on
each device.
6. gather_device_info sorts the device list by the hole size
on each device, using total unallocated space on each device to
break ties, then returns to btrfs_create_chunk with the list.
8. decide_stripe_size_regular finds the largest stripe_len that
fits across the first nr_devs device dev_extent holes that were
found by gather_device_info (and satisfies other constraints
on stripe_len that are not relevant here).
9. decide_stripe_size_regular caps the length of the stripe it
computed at 1 GiB. This cap appeared in 5da431b71d4b to correct
one of the other regressions introduced in f6fca3917b4d.
10. btrfs_create_chunk creates a new chunk with the above
computed size and number of devices.
At step 4, gather_device_info() has found a location where stripe up to
10 GiB in length could be allocated on several devices, and selected
which devices should have a dev_extent allocated on them, but at step
9, only 1 GiB of the space that was found on each device can be used.
This mismatch causes new suboptimal chunk allocation cases that did not
occur in pre-f6fca3917b4d kernels.
Consider a filesystem using raid1 profile with 3 devices. After some
balances, device 1 has 10x 1 GiB unallocated space, while devices 2
and 3 have 1x 10 GiB unallocated space, i.e. the same total amount of
space, but distributed across different numbers of dev_extent holes.
For visualization, let's ignore all the chunks that were allocated before
this point, and focus on the remaining holes:
Before f6fca3917b4d, the allocator would fill these optimally by
allocating chunks with dev_extents on devices 1 and 2 ([12]), 1 and 3
([13]), or 2 and 3 ([23]):
This allocates all of the space with no waste. The sorting function used
by gather_device_info considers free space holes above 1 GiB in length
to be equal to 1 GiB, so once find_free_dev_extent locates a sufficiently
long hole on each device, all the holes appear equal in the sort, and the
comparison falls back to sorting devices by total free space. This keeps
usable space on each device equal so they can all be filled completely.
After f6fca3917b4d, the allocator prefers the devices with larger holes
over the devices with more free space, so it makes bad allocation choices:
No further allocations are possible, with 8 GiB wasted (4 GiB of data
space). The sort in gather_device_info now considers free space in
holes longer than 1 GiB to be distinct, so it will prefer devices 2 and
3 over device 1 until all but 1 GiB is allocated on devices 2 and 3.
At that point, with only 1 GiB unallocated on every device, the largest
hole length on each device is equal at 1 GiB, so the sort finally moves
to ordering the devices with the most free space, but by this time it
is too late to make use of the free space on device 1.
Note that it's possible to contrive a case where the pre-f6fca3917b4d
allocator fails the same way, but these cases generally have extensive
dev_extent fragmentation as a precondition (e.g. many holes of 768M
in length on one device, and few holes 1 GiB in length on the others).
With the regression in f6fca3917b4d, bad chunk allocation can occur even
under optimal conditions, when all dev_extent holes are exact multiples
of stripe_len in length, as in the example above.
Also note that post-f6fca3917b4d kernels do treat dev_extent holes
larger than 10 GiB as equal, so the bad behavior won't show up on a
freshly formatted filesystem; however, as the filesystem ages and fills
up, and holes ranging from 1 GiB to 10 GiB in size appear, the problem
can show up as a failure to balance after adding or removing devices,
or an unexpected shortfall in available space due to unequal allocation.
To fix the regression and make data chunk allocation work
again, set ctl->max_stripe_len back to the original SZ_1G, or
space_info->chunk_size if that's smaller (the latter can happen if the
user set space_info->chunk_size to less than 1 GiB via sysfs, or it's
a 32 MiB system chunk with a hardcoded chunk_size and stripe_len).
While researching the background of the earlier commits, I found that an
identical fix was already proposed at:
The previous review missed one detail: ctl->max_stripe_len is used
before decide_stripe_size_regular() is called, when it is too late for
the changes in that function to have any effect. ctl->max_stripe_len is
not used directly by decide_stripe_size_regular(), but the parameter
does heavily influence the per-device free space data presented to
the function.
Linus Torvalds [Sun, 15 Oct 2023 16:16:30 +0000 (09:16 -0700)]
Merge tag 'usb-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt fixes from Greg KH:
"Here are some USB and Thunderbolt driver fixes for 6.6-rc6 to resolve
a number of small reported issues. Included in here are:
- thunderbolt driver fixes
- xhci driver fixes
- cdns3 driver fixes
- musb driver fixes
- a number of typec driver fixes
- a few other small driver fixes
All of these have been in linux-next with no reported issues"
* tag 'usb-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (22 commits)
usb: typec: ucsi: Use GET_CAPABILITY attributes data to set power supply scope
usb: typec: ucsi: Fix missing link removal
usb: typec: altmodes/displayport: Signal hpd low when exiting mode
xhci: Preserve RsvdP bits in ERSTBA register correctly
xhci: Clear EHB bit only at end of interrupt handler
xhci: track port suspend state correctly in unsuccessful resume cases
usb: xhci: xhci-ring: Use sysdev for mapping bounce buffer
usb: typec: ucsi: Clear EVENT_PENDING bit if ucsi_send_command fails
usb: misc: onboard_hub: add support for Microchip USB2412 USB 2.0 hub
usb: gadget: udc-xilinx: replace memcpy with memcpy_toio
usb: cdns3: Modify the return value of cdns_set_active () to void when CONFIG_PM_SLEEP is disabled
usb: dwc3: Soft reset phy on probe for host
usb: hub: Guard against accesses to uninitialized BOS descriptors
usb: typec: qcom: Update the logic of regulator enable and disable
usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call
usb: musb: Get the musb_qh poniter after musb_giveback
usb: musb: Modify the "HWVers" register address
usb: cdnsp: Fixes issue with dequeuing not queued requests
thunderbolt: Restart XDomain discovery handshake after failure
thunderbolt: Correct TMU mode initialization from hardware
...
Linus Torvalds [Sun, 15 Oct 2023 16:11:39 +0000 (09:11 -0700)]
Merge tag 'tty-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are some small tty/serial driver fixes for 6.6-rc6 that resolve
some reported issues. Included in here are:
- serial core pm runtime fix for issue reported by many
- 8250_omap driver fix
- rs485 spinlock fix for reported problem
- ams-delta bugfix for previous tty api changes in -rc1 that missed
this driver that never seems to get built in any test systems
All of these have been in linux-next for over a week with no reported
problems"
* tag 'tty-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
ASoC: ti: ams-delta: Fix cx81801_receive() argument types
serial: core: Fix checks for tx runtime PM state
serial: 8250_omap: Fix errors with no_console_suspend
serial: Reduce spinlocked portion of uart_rs485_config()
Linus Torvalds [Sun, 15 Oct 2023 16:07:27 +0000 (09:07 -0700)]
Merge tag 'char-misc-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here is a small set of char/misc and other smaller driver subsystem
fixes for 6.6-rc6. Included in here are:
- lots of iio driver fixes
- binder memory leak fix
- mcb driver fixes
- counter driver fixes
- firmware loader documentation fix
- documentation update for embargoed hardware issues
All of these have been in linux-next for over a week with no reported
issues"
* tag 'char-misc-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (22 commits)
iio: pressure: ms5611: ms5611_prom_is_valid false negative bug
dt-bindings: iio: adc: adi,ad7292: Fix additionalProperties on channel nodes
iio: adc: ad7192: Correct reference voltage
iio: light: vcnl4000: Don't power on/off chip in config
iio: addac: Kconfig: update ad74413r selections
iio: pressure: dps310: Adjust Timeout Settings
iio: imu: bno055: Fix missing Kconfig dependencies
iio: adc: imx8qxp: Fix address for command buffer registers
iio: cros_ec: fix an use-after-free in cros_ec_sensors_push_data()
iio: irsd200: fix -Warray-bounds bug in irsd200_trigger_handler
dt-bindings: iio: rohm,bu27010: add missing vdd-supply to example
binder: fix memory leaks of spam and pending work
firmware_loader: Update contact emails for ABI docs
Documentation: embargoed-hardware-issues.rst: Clarify prenotifaction
mcb: remove is_added flag from mcb_device struct
coresight: tmc-etr: Disable warnings for allocation failures
coresight: Fix run time warnings while reusing ETR buffer
iio: admv1013: add mixer_vgate corner cases
iio: pressure: bmp280: Fix NULL pointer exception
iio: dac: ad3552r: Correct device IDs
...
Linus Torvalds [Sun, 15 Oct 2023 15:55:51 +0000 (08:55 -0700)]
Merge tag 'ovl-fixes-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs fixes from Amir Goldstein:
- Various fixes for regressions due to conversion to new mount
api in v6.5
- Disable a new mount option syntax (append lowerdir) that was
added in v6.5 because we plan to add a different lowerdir
append syntax in v6.7
* tag 'ovl-fixes-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: temporarily disable appending lowedirs
ovl: fix regression in showing lowerdir mount option
ovl: fix regression in parsing of mount options with escaped comma
fs: factor out vfs_parse_monolithic_sep() helper
Linus Torvalds [Sun, 15 Oct 2023 15:48:53 +0000 (08:48 -0700)]
Merge tag 'powerpc-6.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix softlockup/crash when using hcall tracing
- Fix pte_access_permitted() for PAGE_NONE on 8xx
- Fix inverted pte_young() test in __ptep_test_and_clear_young()
on 64-bit BookE
- Fix unhandled math emulation exception on 85xx
- Fix kernel crash on syscall return on 476
Thanks to Athira Rajeev, Christophe Leroy, Eddie James, and Naveen N
Rao.
* tag 'powerpc-6.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/47x: Fix 47x syscall return crash
powerpc/85xx: Fix math emulation exception
powerpc/64e: Fix wrong test in __ptep_test_and_clear_young()
powerpc/8xx: Fix pte_access_permitted() for PAGE_NONE
powerpc/pseries: Remove unused r0 in the hcall tracing code
powerpc/pseries: Fix STK_PARAM access in the hcall tracing code
Manish Chopra [Fri, 13 Oct 2023 13:18:12 +0000 (18:48 +0530)]
qed: fix LL2 RX buffer allocation
Driver allocates the LL2 rx buffers from kmalloc()
area to construct the skb using slab_build_skb()
The required size allocation seems to have overlooked
for accounting both skb_shared_info size and device
placement padding bytes which results into the below
panic when doing skb_put() for a standard MTU sized frame.
This patch fixes this by accouting skb_shared_info and device
placement padding size bytes when allocating the buffers.
Cc: David S. Miller <davem@davemloft.net> Fixes: 0a7fb11c23c0 ("qed: Add Light L2 support") Signed-off-by: Manish Chopra <manishc@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>