net: stmmac: fix incorrect flag check in timestamp interrupt
The driver should continue to get the timestamp if the
STMMAC_FLAG_EXT_SNAPSHOT_EN flag is set.
Fixes: aa5513f5d95f ("net: stmmac: replace the ext_snapshot_en field with a flag")
Cc: <stable@vger.kernel.org> # 6.6
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Lai Peter Jun Ann <jun.ann.lai@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
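The polarity of the check described in the stmmac fix can be sketched as a small user-space model. This is not the driver code: the bit value of the flag and the `read_snapshot` callback are hypothetical and exist only for illustration.

```python
# Hypothetical bit value; the real definition lives in the stmmac headers.
STMMAC_FLAG_EXT_SNAPSHOT_EN = 1 << 0


def timestamp_interrupt(flags, read_snapshot):
    """Return the external snapshot if the feature is enabled, else None."""
    # Bail out only when the flag is NOT set; continue when it is set.
    if not (flags & STMMAC_FLAG_EXT_SNAPSHOT_EN):
        return None
    return read_snapshot()
```

With the flag set the handler goes on to read the timestamp; with the flag clear it returns early.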
David S. Miller [Wed, 20 Dec 2023 11:12:12 +0000 (11:12 +0000)]
Merge tag 'for-net-2023-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Add encryption key size check when acting as peripheral
- Shut up false-positive build warning
- Send reject if L2CAP command request is corrupted
- Fix Use-After-Free in bt_sock_recvmsg
- Fix not notifying when connection encryption changes
- Fix not checking if HCI_OP_INQUIRY has been sent
- Fix address type send over to the MGMT interface
- Fix deadlock in vhci_send_frame
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Liu Jian [Sat, 16 Dec 2023 07:52:18 +0000 (15:52 +0800)]
net: check vlan filter feature in vlan_vids_add_by_dev() and vlan_vids_del_by_dev()
I got the below warning trace:
WARNING: CPU: 4 PID: 4056 at net/core/dev.c:11066 unregister_netdevice_many_notify
CPU: 4 PID: 4056 Comm: ip Not tainted 6.7.0-rc4+ #15
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
RIP: 0010:unregister_netdevice_many_notify+0x9a4/0x9b0
Call Trace:
rtnl_dellink
rtnetlink_rcv_msg
netlink_rcv_skb
netlink_unicast
netlink_sendmsg
__sock_sendmsg
____sys_sendmsg
___sys_sendmsg
__sys_sendmsg
do_syscall_64
entry_SYSCALL_64_after_hwframe
It can be reproduced via:
ip netns add ns1
ip netns exec ns1 ip link add bond0 type bond mode 0
ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2
ip netns exec ns1 ip link set bond_slave_1 master bond0
[1] ip netns exec ns1 ethtool -K bond0 rx-vlan-filter off
[2] ip netns exec ns1 ip link add link bond_slave_1 name bond_slave_1.0 type vlan id 0
[3] ip netns exec ns1 ip link add link bond0 name bond0.0 type vlan id 0
[4] ip netns exec ns1 ip link set bond_slave_1 nomaster
[5] ip netns exec ns1 ip link del veth2
ip netns del ns1
This is all caused by command [1] turning off the rx-vlan-filter function
of bond0. The reason is the same as commit 01f4fd270870 ("bonding: Fix
incorrect deletion of ETH_P_8021AD protocol vid from slaves"). Commands
[2] [3] add the same vid to slave and master respectively, causing
command [4] to empty slave->vlan_info. The following command [5] triggers
this problem.
To fix this problem, we should add VLAN_FILTER feature checks in
vlan_vids_add_by_dev() and vlan_vids_del_by_dev() to prevent incorrect
addition or deletion of vlan_vid information.
Fixes: 348a1443cc43 ("vlan: introduce functions to do mass addition/deletion of vids by another device")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
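A rough user-space model of the feature check described above. The feature bits, the refcount dictionary and the function signatures are hypothetical, and the sketch assumes the check is made against the features of the device the vids are copied from (bond0 in the reproducer); the commit text does not spell that out.

```python
# Hypothetical stand-ins for the CTAG/STAG VLAN filter feature bits.
CTAG_FILTER = 1 << 0
STAG_FILTER = 1 << 1
FILTER_FOR_PROTO = {0x8100: CTAG_FILTER, 0x88A8: STAG_FILTER}


def hw_filter_capable(features, proto):
    """True if the device filters VLANs of this protocol in hardware."""
    return bool(features & FILTER_FOR_PROTO.get(proto, 0))


def vids_add_by_dev(dev_vids, by_dev_vids, by_dev_features):
    """Add by_dev's (proto, vid) pairs to dev, skipping unfiltered protocols."""
    for proto, vid in by_dev_vids:
        if not hw_filter_capable(by_dev_features, proto):
            continue
        dev_vids[(proto, vid)] = dev_vids.get((proto, vid), 0) + 1


def vids_del_by_dev(dev_vids, by_dev_vids, by_dev_features):
    """Drop by_dev's (proto, vid) pairs from dev, skipping unfiltered protocols."""
    for proto, vid in by_dev_vids:
        if not hw_filter_capable(by_dev_features, proto):
            continue
        key = (proto, vid)
        if key in dev_vids:
            dev_vids[key] -= 1
            if dev_vids[key] == 0:
                del dev_vids[key]
```

In the reproducer, the slave holds vid 0 only because of bond_slave_1.0. With rx-vlan-filter off on bond0, the delete path leaves that entry alone instead of emptying the slave's vlan information.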
Ronald Wahl [Thu, 14 Dec 2023 18:11:12 +0000 (19:11 +0100)]
net: ks8851: Fix TX stall caused by TX buffer overrun
There is a bug in the ks8851 Ethernet driver where more data is written
to the hardware TX buffer than is actually available. This is caused by
wrong accounting of the free TX buffer space.
The driver maintains a tx_space variable that represents the TX buffer
space that is deemed to be free. The ks8851_start_xmit_spi() function
adds an SKB to a queue if tx_space is large enough and reduces tx_space
by the amount of buffer space it will later need in the TX buffer and
then schedules a work item. If there is not enough space then the TX
queue is stopped.
The worker function ks8851_tx_work() dequeues all the SKBs and writes
the data into the hardware TX buffer. The last packet will trigger an
interrupt after it was sent. Here it is assumed that all data fits into
the TX buffer.
In the interrupt routine (which runs asynchronously because it is a
threaded interrupt) tx_space is updated with the current value from the
hardware. Also the TX queue is woken up again.
Now it could happen that after data was sent to the hardware and before
handling the TX interrupt new data is queued in ks8851_start_xmit_spi()
when the TX buffer space had still some space left. When the interrupt
is actually handled tx_space is updated from the hardware but now we
already have new SKBs queued that have not been written to the hardware
TX buffer yet. Since tx_space has been overwritten by the value from the
hardware the space is not accounted for.
Now we have more data queued than buffer space available in the hardware
and ks8851_tx_work() will potentially overrun the hardware TX buffer. In
many cases it will still work because often the buffer is written out
fast enough so that no overrun occurs but for example if the peer
throttles us via flow control then an overrun may happen.
This can be fixed in different ways. The simplest way would be to set
tx_space to 0 before writing data to the hardware TX buffer preventing
the queuing of more SKBs until the TX interrupt has been handled. I have
chosen a slightly more efficient (and still rather simple) way and
track the amount of data that is already queued and not yet written to
the hardware. When new SKBs are to be queued the already queued amount
of data is honoured when checking free TX buffer space.
I tested this with a setup of two linked KS8851 running iperf3 between
the two in bidirectional mode. Before the fix I got a stall after some
minutes. With the fix I saw no more issues after hours.
Fixes: 3ba81f3ece3c ("net: Micrel KS8851 SPI network driver")
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231214181112.76052-1-rwahl@gmx.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
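The accounting scheme described in the ks8851 fix can be modelled in user space roughly as follows. This is a simplified sketch, not the driver: the class, the `queued_len` field name and the `hw_free` parameters (the free space reported by the hardware) are assumptions made for illustration.

```python
from collections import deque


class TxAccounting:
    """Toy model of the TX buffer bookkeeping."""

    def __init__(self, hw_free):
        self.tx_space = hw_free   # last free-space value read from hardware
        self.queued_len = 0       # bytes queued but not yet written to hardware
        self.queue = deque()
        self.stopped = False

    def start_xmit(self, needed):
        # Honour data that is already queued but not yet in the TX buffer.
        if needed > self.tx_space - self.queued_len:
            self.stopped = True
            return False
        self.queued_len += needed
        self.queue.append(needed)
        return True

    def tx_work(self, hw_free):
        # Write everything queued so far, then refresh the free-space value.
        written = 0
        while self.queue:
            written += self.queue.popleft()
        self.queued_len -= written
        self.tx_space = hw_free
        return written

    def tx_irq(self, hw_free):
        # Overwriting tx_space is safe because queued_len is tracked separately.
        self.tx_space = hw_free
        self.stopped = False
```

In the race from the commit message (new data queued between the write and the TX interrupt), the interrupt still overwrites `tx_space`, but the next `start_xmit()` subtracts the bytes that are queued and unwritten, so it refuses a packet that would not fit.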
David S. Miller [Sun, 17 Dec 2023 20:54:22 +0000 (20:54 +0000)]
Merge branch 'mptcp-misc-fixes'
Matthieu Baerts says:
====================
mptcp: misc. fixes for v6.7
Here are a few fixes related to MPTCP:
Patch 1 avoids skipping some subtests of the MPTCP Join selftest by
mistake when using older versions of GCC. This fixes a patch introduced
in v6.4, backported up to v6.1.
Patch 2 fixes an inconsistent state when using MPTCP + FastOpen. A fix
for v6.2.
Patch 3 adds a description for MPTCP Kunit test modules to avoid a
warning.
Patch 4 adds an entry to the mailmap file for Geliang's email addresses.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Matthieu Baerts [Fri, 15 Dec 2023 16:04:26 +0000 (17:04 +0100)]
mptcp: fill in missing MODULE_DESCRIPTION()
W=1 builds warn on missing MODULE_DESCRIPTION, add them here in MPTCP.
Only two were missing: two modules with different KUnit tests for MPTCP.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
and transitioning such subflow to FIN_WAIT1 status before the syn-ack
packet is processed. The MPTCP code does not react to such state change,
leaving the connection in not-fallback status and the subflow handshake
uncompleted, triggering the following splat:
To address the issue, catch the racing subflow state change and
use it to cause the MPTCP fallback. Such fallback is also used to
cause the first subflow state propagation to the msk socket via
mptcp_set_connected(). After this change, the first subflow can
additionally propagate the TCP_FIN_WAIT1 state, so rename the
helper accordingly.
Finally, if the state propagation is delayed to the msk release
callback, the first subflow can change to a different state in between.
Cache the relevant target state in a new msk-level field and use
such value to update the msk state at release time.
Fixes: 1e777f39b4d7 ("mptcp: add MSG_FASTOPEN sendmsg flag support")
Cc: stable@vger.kernel.org
Reported-by: <syzbot+c53d4d3ddb327e80bc51@syzkaller.appspotmail.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/458
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
MPC backups tests will sometimes be skipped unexpectedly (for example, when
compiling the kernel with an older version of gcc, such as gcc-8), since
static functions like mptcp_subflow_send_ack are also listed in
/proc/kallsyms, with a 't' in front of them, not 'T' ('T' is for a global
function):
In this case, mptcp_lib_kallsyms_doesnt_have "mptcp_subflow_send_ack$"
will be false, and the MPC backups tests will be skipped. This is not what
we expected.
The correct logic here should be: if mptcp_subflow_send_ack is not a
global function in /proc/kallsyms, do these MPC backups tests. So a 'T'
must be added in front of mptcp_subflow_send_ack.
Fixes: 632978f0a961 ("selftests: mptcp: join: skip MPC backups tests if not supported")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <geliang.tang@linux.dev>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
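The difference between the loose and the strict symbol match can be illustrated with a short sketch. The sample kallsyms lines are made up, `kallsyms_has` is a hypothetical helper, and the exact pattern used by the selftest may be spelled differently.

```python
import re


def kallsyms_has(kallsyms_text, pattern):
    """Return True if any line of a kallsyms dump matches the regex."""
    return any(re.search(pattern, line) for line in kallsyms_text.splitlines())


# Made-up sample: a static ('t') symbol and an unrelated global ('T') one.
SAMPLE = (
    "0000000000000000 t mptcp_subflow_send_ack\n"
    "0000000000000000 T mptcp_subflow_create_socket\n"
)

# Loose pattern: also matches the static entry, so the tests get skipped.
loose = kallsyms_has(SAMPLE, r"mptcp_subflow_send_ack$")
# Strict pattern: only matches when the symbol is global.
strict = kallsyms_has(SAMPLE, r" T mptcp_subflow_send_ack$")
```

With the sample above the loose pattern matches and the strict one does not, which is the behaviour the commit message asks for.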
Daniel Golle [Tue, 12 Dec 2023 00:05:35 +0000 (00:05 +0000)]
net: phy: skip LED triggers on PHYs on SFP modules
Calling led_trigger_register() when attaching a PHY located on an SFP
module potentially (and practically) leads to a deadlock.
Fix this by not calling led_trigger_register() for PHYs located on SFP
modules, as such modules never actually have any LEDs.
======================================================
WARNING: possible circular locking dependency detected
6.7.0-rc4-next-20231208+ #0 Tainted: G O
------------------------------------------------------
kworker/u8:2/43 is trying to acquire lock: ffffffc08108c4e8 (triggers_list_lock){++++}-{3:3}, at: led_trigger_register+0x4c/0x1a8
but task is already holding lock: ffffff80c5c6f318 (&sfp->sm_mutex){+.+.}-{3:3}, at: cleanup_module+0x2ba8/0x3120 [sfp]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
Andy Gospodarek [Thu, 14 Dec 2023 21:31:38 +0000 (13:31 -0800)]
bnxt_en: do not map packet buffers twice
Remove double-mapping of DMA buffers as it can prevent page pool entries
from being freed. Mapping is managed by page pool infrastructure and
was previously managed by the driver in __bnxt_alloc_rx_page before
allowing the page pool infrastructure to manage it.
Fixes: 578fcfd26e2a ("bnxt_en: Let the page pool manage the DMA mapping")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: David Wei <dw@davidwei.uk>
Link: https://lore.kernel.org/r/20231214213138.98095-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hyunwoo Kim [Sat, 9 Dec 2023 10:55:18 +0000 (05:55 -0500)]
Bluetooth: af_bluetooth: Fix Use-After-Free in bt_sock_recvmsg
This can cause a race with bt_sock_ioctl() because
bt_sock_recvmsg() gets the skb from sk->sk_receive_queue
and then frees it without holding lock_sock.
A use-after-free for a skb occurs with the following flow.
```
bt_sock_recvmsg() -> skb_recv_datagram() -> skb_free_datagram()
bt_sock_ioctl() -> skb_peek()
```
Add lock_sock to bt_sock_recvmsg() to fix this issue.
Cc: stable@vger.kernel.org
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Alex Lu [Tue, 12 Dec 2023 02:30:34 +0000 (10:30 +0800)]
Bluetooth: Add more enc key size check
When we are in the slave role and receive an L2CAP connection request
after encryption has started, we should check the enc key size to avoid
a KNOB attack or BLUFFS attack.
Per the SIG recommendation, implementations are advised to reject
service-level connections on an encrypted baseband link with key
strengths below 7 octets.
A simple and clear way to achieve this is to place the enc key size
check in hci_cc_read_enc_key_size().
The btmon log below shows a case that lacks the enc key size check.
> ACL Data RX: Handle 1 flags 0x02 dlen 12
L2CAP: Connection Request (0x02) ident 3 len 4
PSM: 25 (0x0019)
Source CID: 64
< ACL Data TX: Handle 1 flags 0x00 dlen 16
L2CAP: Connection Response (0x03) ident 3 len 8
Destination CID: 64
Source CID: 64
Result: Connection pending (0x0001)
Status: Authorization pending (0x0002)
> HCI Event: Number of Completed Packets (0x13) plen 5
Num handles: 1
Handle: 1 Address: BB:22:33:44:55:99 (OUI BB-22-33)
Count: 1
#35: len 16 (25 Kb/s)
Latency: 5 msec (2-7 msec ~4 msec)
< ACL Data TX: Handle 1 flags 0x00 dlen 16
L2CAP: Connection Response (0x03) ident 3 len 8
Destination CID: 64
Source CID: 64
Result: Connection successful (0x0000)
Status: No further information available (0x0000)
Cc: stable@vger.kernel.org
Signed-off-by: Alex Lu <alex_lu@realsil.com.cn>
Signed-off-by: Max Chou <max.chou@realtek.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
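The rule from the SIG recommendation quoted above can be written down as a tiny predicate. This is only a sketch: the constant name and the function are hypothetical, and the real check sits in the HCI event handling rather than in a standalone helper.

```python
MIN_ENC_KEY_SIZE = 7  # octets, per the recommendation quoted above


def accept_service_connection(encrypted, enc_key_size):
    """Reject service-level connections on weakly encrypted links."""
    if encrypted and enc_key_size < MIN_ENC_KEY_SIZE:
        return False
    return True
```

A link encrypted with a 1-octet key, as a KNOB-style downgrade would produce, is rejected, while a 16-octet key passes.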
Xiao Yao [Mon, 11 Dec 2023 16:27:18 +0000 (00:27 +0800)]
Bluetooth: MGMT/SMP: Fix address type when using SMP over BREDR/LE
If two Bluetooth devices both support BR/EDR and BLE, and also
support Secure Connections, then they only need to pair once.
The LTK generated during the LE pairing process may be converted
into a BR/EDR link key for BR/EDR transport, and conversely, a
link key generated during the BR/EDR SSP pairing process can be
converted into an LTK for LE transport. Hence, the link type of
the link key and LTK is not fixed; it can be either an LE LINK
or an ACL LINK.
Currently, in the mgmt_new_irk/ltk/crsk/link_key functions, the
link type is fixed, which could lead to incorrect address types
being reported to the application layer. Therefore, it is necessary
to add link_type/addr_type to the smp_irk/ltk/crsk and link_key,
to ensure the generation of the correct address type.
SMP over BREDR:
Before Fix:
> ACL Data RX: Handle 11 flags 0x02 dlen 12
BR/EDR SMP: Identity Address Information (0x09) len 7
Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
@ MGMT Event: New Identity Resolving Key (0x0018) plen 30
Random address: 00:00:00:00:00:00 (Non-Resolvable)
LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
@ MGMT Event: New Long Term Key (0x000a) plen 37
LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
Key type: Authenticated key from P-256 (0x03)
After Fix:
> ACL Data RX: Handle 11 flags 0x02 dlen 12
BR/EDR SMP: Identity Address Information (0x09) len 7
Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
@ MGMT Event: New Identity Resolving Key (0x0018) plen 30
Random address: 00:00:00:00:00:00 (Non-Resolvable)
BR/EDR Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
@ MGMT Event: New Long Term Key (0x000a) plen 37
BR/EDR Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
Key type: Authenticated key from P-256 (0x03)
SMP over LE:
Before Fix:
@ MGMT Event: New Identity Resolving Key (0x0018) plen 30
Random address: 5F:5C:07:37:47:D5 (Resolvable)
LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
@ MGMT Event: New Long Term Key (0x000a) plen 37
LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
Key type: Authenticated key from P-256 (0x03)
@ MGMT Event: New Link Key (0x0009) plen 26
BR/EDR Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
Key type: Authenticated Combination key from P-256 (0x08)
After Fix:
@ MGMT Event: New Identity Resolving Key (0x0018) plen 30
Random address: 5E:03:1C:00:38:21 (Resolvable)
LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
@ MGMT Event: New Long Term Key (0x000a) plen 37
LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
Key type: Authenticated key from P-256 (0x03)
@ MGMT Event: New Link Key (0x0009) plen 26
Store hint: Yes (0x01)
LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
Key type: Authenticated Combination key from P-256 (0x08)
Cc: stable@vger.kernel.org
Signed-off-by: Xiao Yao <xiaoyao@rock-chips.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
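The idea of deriving the reported address type from the link type stored with the key, instead of hard-coding it, can be sketched as below. The numeric constants reflect my understanding of the kernel headers and should be treated as illustrative, and the helper is a user-space stand-in rather than the kernel function itself.

```python
# Illustrative constants (assumed values, check the kernel headers).
ACL_LINK, LE_LINK = 0x01, 0x80
ADDR_LE_DEV_PUBLIC, ADDR_LE_DEV_RANDOM = 0x00, 0x01
BDADDR_BREDR, BDADDR_LE_PUBLIC, BDADDR_LE_RANDOM = 0x00, 0x01, 0x02


def link_to_bdaddr(link_type, addr_type):
    """Map the link a key was created on to the MGMT address type."""
    if link_type == LE_LINK:
        if addr_type == ADDR_LE_DEV_PUBLIC:
            return BDADDR_LE_PUBLIC
        return BDADDR_LE_RANDOM
    # Anything created over BR/EDR is reported with a BR/EDR address.
    return BDADDR_BREDR
```

This matches the logs above: keys distributed by SMP over BR/EDR are reported with a BR/EDR address, and a link key derived during LE pairing is reported with an LE address.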
Frédéric Danis [Fri, 8 Dec 2023 17:41:50 +0000 (18:41 +0100)]
Bluetooth: L2CAP: Send reject on command corrupted request
The L2CAP/COS/CED/BI-02-C PTS test sends a malformed L2CAP signaling packet
with 2 commands in it (a connection request and an unknown command) and
expects to get a connection response packet and a command reject packet.
The second is currently not sent.
Cc: stable@vger.kernel.org
Signed-off-by: Frédéric Danis <frederic.danis@collabora.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
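The expected behaviour for a signaling packet that carries one valid and one corrupted command can be sketched with a small parser. This is a user-space model under assumptions: it only knows the Connection Request code, it returns symbolic tuples instead of building real response PDUs, and the function name is hypothetical.

```python
import struct

# L2CAP signaling command header: code (1), ident (1), length (2, little endian).
CMD_HDR = struct.Struct("<BBH")
KNOWN_CODES = {0x02}  # Connection Request only, for the sake of the sketch


def process_signaling(data):
    """Walk all commands in one signaling packet and list the replies."""
    replies = []
    offset = 0
    while len(data) - offset >= CMD_HDR.size:
        code, ident, length = CMD_HDR.unpack_from(data, offset)
        offset += CMD_HDR.size
        if length > len(data) - offset or ident == 0:
            # Corrupted command: answer with a reject instead of dropping it.
            replies.append(("command_reject", ident))
            break
        if code in KNOWN_CODES:
            replies.append(("connection_response", ident))
        else:
            replies.append(("command_reject", ident))
        offset += length
    return replies
```

For a packet like the one in the PTS test, the model yields a connection response for the first command and a command reject for the second.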
hci_conn_hash_lookup_cis shall always match the requested CIG and CIS
ids even when they are unset. Otherwise it is not possible to
bind/connect different sockets to the same address, as that would
result in multiple sockets mapping to the same hci_conn, which
doesn't really work and prevents BAP audio configurations such as
AC 6(i) when CIG and CIS are left unset.
Fixes: c14516faede3 ("Bluetooth: hci_conn: Fix not matching by CIS ID")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Arnd Bergmann [Wed, 22 Nov 2023 22:17:44 +0000 (23:17 +0100)]
Bluetooth: hci_event: shut up a false-positive warning
Turning on -Wstringop-overflow globally exposed a misleading compiler
warning in bluetooth:
net/bluetooth/hci_event.c: In function 'hci_cc_read_class_of_dev':
net/bluetooth/hci_event.c:524:9: error: 'memcpy' writing 3 bytes into a
region of size 0 overflows the destination [-Werror=stringop-overflow=]
524 | memcpy(hdev->dev_class, rp->dev_class, 3);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The problem here is the check for hdev being NULL in bt_dev_dbg() that
leads the compiler to conclude that hdev->dev_class might be an invalid
pointer access.
Add another explicit check for the same condition to make sure gcc sees
this cannot happen.
Fixes: a9de9248064b ("[Bluetooth] Switch from OGF+OCF to using only opcodes")
Fixes: 1b56c90018f0 ("Makefile: Enable -Wstringop-overflow globally")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth: hci_event: Fix not checking if HCI_OP_INQUIRY has been sent
Before setting the HCI_INQUIRY bit, check if HCI_OP_INQUIRY was really
sent; otherwise the controller may be generating invalid events or, more
likely, it is a result of fuzzing tools attempting to test the right
behavior of the stack when unexpected events are generated.
This change removes the need for acquiring the open_mutex in
vhci_send_frame, thus eliminating the potential deadlock while
maintaining the required packet ordering.
Fixes: 92d4abd66f70 ("Bluetooth: vhci: Fix race when opening vhci device")
Signed-off-by: Ying Hsu <yinghsu@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth: Fix not notifying when connection encryption changes
Some layers such as SMP depend on getting notified about encryption
changes immediately, as they only allow certain PDUs to be transmitted
over an encrypted link. Without the notification, the SMP implementation
may reject valid PDUs, causing pairing to fail when it shouldn't.
Fixes: 7aca0ac4792e ("Bluetooth: Wait for HCI_OP_WRITE_AUTH_PAYLOAD_TO to complete")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Eric Dumazet [Thu, 14 Dec 2023 15:27:47 +0000 (15:27 +0000)]
net/rose: fix races in rose_kill_by_device()
syzbot found an interesting netdev refcounting issue in
net/rose/af_rose.c, thanks to CONFIG_NET_DEV_REFCNT_TRACKER=y [1]
The problem is that rose_kill_by_device() can change rose->device
while other threads do not expect the pointer to be changed.
We have to first collect sockets in a temporary array,
then perform the changes while holding the socket
lock and the rose_list_lock spinlock (in this order).
Change rose_release() to also acquire rose_list_lock
before releasing the netdev refcount.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
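The two-pass locking pattern described above (snapshot under the list lock, then modify each socket with the socket lock taken before the list lock) can be sketched with Python threading primitives. All names here are hypothetical stand-ins, and the model ignores reference counting and the fixed-size temporary array of the kernel code.

```python
import threading


class Sock:
    def __init__(self, device):
        self.lock = threading.Lock()   # stand-in for the socket lock
        self.device = device


list_lock = threading.Lock()           # stand-in for rose_list_lock
sock_list = []


def kill_by_device(dev):
    """Detach every socket bound to dev; return how many were detached."""
    # Pass 1: collect matching sockets while holding only the list lock.
    with list_lock:
        batch = [s for s in sock_list if s.device == dev]
    # Pass 2: socket lock first, then the list lock, in that order.
    detached = 0
    for s in batch:
        with s.lock:
            with list_lock:
                # Re-check: the socket may have changed between the passes.
                if s.device == dev:
                    s.device = None
                    detached += 1
    return detached
```

Taking the locks in one fixed order on every path is what keeps the model free of lock-order inversions; the re-check in the second pass covers sockets that were released in between.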
Zhipeng Lu [Thu, 14 Dec 2023 13:04:04 +0000 (21:04 +0800)]
ethernet: atheros: fix a memleak in atl1e_setup_ring_resources
In the error handling of 'offset > adapter->ring_size', the
tx_ring->tx_buffer allocated by kzalloc should be freed
instead of going straight to 'failed'.
Fixes: a6a5325239c2 ("atl1e: Atheros L1E Gigabit Ethernet driver")
Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
Reviewed-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 14 Dec 2023 11:30:38 +0000 (11:30 +0000)]
net: sched: ife: fix potential use-after-free
ife_decode() calls pskb_may_pull() twice; we need to reload
ifehdr after the second call, or risk a use-after-free as reported
by syzbot:
BUG: KASAN: slab-use-after-free in __ife_tlv_meta_valid net/ife/ife.c:108 [inline]
BUG: KASAN: slab-use-after-free in ife_tlv_meta_decode+0x1d1/0x210 net/ife/ife.c:131
Read of size 2 at addr ffff88802d7300a4 by task syz-executor.5/22323
The buggy address belongs to the object at ffff88802d730000
which belongs to the cache kmalloc-8k of size 8192
The buggy address is located 164 bytes inside of
freed 8192-byte region [ffff88802d730000, ffff88802d732000)
Fixes: d57493d6d1be ("net: sched: ife: check on metadata length")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Alexander Aring <aahringo@redhat.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_wait_event() returns an error (-EPIPE) if disconnect() is called on the
socket waiting for the event. However, sk_stream_wait_connect() returns
success, i.e. zero, even if sk_wait_event() returns -EPIPE, so a function
that waits for a connection with sk_stream_wait_connect() may misbehave.
In the case of the above DCCP issue, dccp_sendmsg() is waiting for the
connection. If disconnect() is called concurrently, the above issue
occurs.
This patch fixes the issue by returning an error from
sk_stream_wait_connect() if sk_wait_event() fails.
Fixes: 419ce133ab92 ("tcp: allow again tcp_disconnect() when threads are waiting")
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reported-by: syzbot+c71bc336c5061153b502@syzkaller.appspotmail.com
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
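The error propagation described above can be sketched as follows. Both functions are hypothetical user-space stand-ins named after the kernel helpers; the positive return value of the wait is an arbitrary choice for the model.

```python
import errno


def wait_event(disconnected):
    """Stand-in for sk_wait_event(): negative errno if disconnect() raced in."""
    return -errno.EPIPE if disconnected else 1


def stream_wait_connect(disconnected):
    """Stand-in for sk_stream_wait_connect() with the fix applied."""
    ret = wait_event(disconnected)
    if ret < 0:
        # Propagate the failure instead of reporting success.
        return ret
    return 0
```

A caller such as a sendmsg path can then stop on the negative return value rather than continuing on a socket that was disconnected underneath it.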
duanqiangwen [Thu, 14 Dec 2023 02:33:37 +0000 (10:33 +0800)]
net: libwx: fix memory leak on free page
'ifconfig ethx up' sets page->refcount larger than 1, and
'ifconfig ethx down' then calls __page_frag_cache_drain()
to free the pages, which is not compatible with page pool.
So delete the code that changes page->refcount.
Fixes: 3c47e8ae113a ("net: libwx: Support to receive packets in NAPI")
Signed-off-by: duanqiangwen <duanqiangwen@net-swift.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 15 Dec 2023 03:00:30 +0000 (19:00 -0800)]
Merge tag 'mlx5-fixes-2023-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2023-12-13
This series provides bug fixes to mlx5 driver.
* tag 'mlx5-fixes-2023-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Correct snprintf truncation handling for fw_version buffer used by representors
net/mlx5e: Correct snprintf truncation handling for fw_version buffer
net/mlx5e: Fix error codes in alloc_branch_attr()
net/mlx5e: Fix error code in mlx5e_tc_action_miss_mapping_get()
net/mlx5: Refactor mlx5_flow_destination->rep pointer to vport num
net/mlx5: Fix fw tracer first block check
net/mlx5e: XDP, Drop fragmented packets larger than MTU size
net/mlx5e: Decrease num_block_tc when unblock tc offload
net/mlx5e: Fix overrun reported by coverity
net/mlx5e: fix a potential double-free in fs_udp_create_groups
net/mlx5e: Fix a race in command alloc flow
net/mlx5e: Fix slab-out-of-bounds in mlx5_query_nic_vport_mac_list()
net/mlx5e: fix double free of encap_header
Revert "net/mlx5e: fix double free of encap_header"
Revert "net/mlx5e: fix double free of encap_header in update funcs"
====================
Jakub Kicinski [Fri, 15 Dec 2023 02:57:39 +0000 (18:57 -0800)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-12-13 (ice, i40e)
This series contains updates to ice and i40e drivers.
Michal Schmidt prevents possible out-of-bounds access for ice.
Ivan Vecera corrects value for MDIO clause 45 on i40e.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: Fix ST code value for Clause 45
ice: fix theoretical out-of-bounds access in ethtool link modes
====================
Vladimir Oltean [Thu, 14 Dec 2023 00:09:02 +0000 (02:09 +0200)]
net: mscc: ocelot: fix pMAC TX RMON stats for bucket 256-511 and above
The typo from ocelot_port_rmon_stats_cb() was also carried over to
ocelot_port_pmac_rmon_stats_cb(), leading to incorrect TX RMON
stats for the pMAC too.
Fixes: ab3f97a9610a ("net: mscc: ocelot: export ethtool MAC Merge stats for Felix VSC9959")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20231214000902.545625-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 14 Dec 2023 00:09:01 +0000 (02:09 +0200)]
net: mscc: ocelot: fix eMAC TX RMON stats for bucket 256-511 and above
There is a typo in the driver due to which we report incorrect TX RMON
counters for the 256-511 octet bucket and all the other buckets larger
than that.
Bug found with the selftest at
https://patchwork.kernel.org/project/netdevbpf/patch/20231211223346.2497157-9-tobias@waldekranz.com/
Fixes: e32036e1ae7b ("net: mscc: ocelot: add support for all sorts of standardized counters present in DSA")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20231214000902.545625-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 14 Dec 2023 21:11:49 +0000 (13:11 -0800)]
Merge tag 'net-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Current release - regressions:
- tcp: fix tcp_disordered_ack() vs usec TS resolution
Current release - new code bugs:
- dpll: sanitize possible null pointer dereference in
dpll_pin_parent_pin_set()
- eth: octeon_ep: initialise control mbox tasks before using APIs
Previous releases - regressions:
- io_uring/af_unix: disable sending io_uring over sockets
- eth: mlx5e:
- TC, don't offload post action rule if not supported
- fix possible deadlock on mlx5e_tx_timeout_work
- eth: iavf: fix iavf_shutdown to call iavf_remove instead iavf_close
- eth: bnxt_en: fix skb recycling logic in bnxt_deliver_skb()
- eth: ena: fix DMA syncing in XDP path when SWIOTLB is on
- eth: team: fix use-after-free when an option instance allocation
fails
Previous releases - always broken:
- neighbour: don't let neigh_forced_gc() disable preemption for long
- net: prevent mss overflow in skb_segment()
- ipv6: support reporting otherwise unknown prefix flags in
RTM_NEWPREFIX
- tcp: remove acked SYN flag from packet in the transmit queue
correctly
- eth: octeontx2-af:
- fix a use-after-free in rvu_nix_register_reporters
- fix promisc mcam entry action
- eth: dwmac-loongson: make sure MDIO is initialized before use
- eth: atlantic: fix double free in ring reinit logic"
* tag 'net-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
net: atlantic: fix double free in ring reinit logic
appletalk: Fix Use-After-Free in atalk_ioctl
net: stmmac: Handle disabled MDIO busses from devicetree
net: stmmac: dwmac-qcom-ethqos: Fix drops in 10M SGMII RX
dpaa2-switch: do not ask for MDB, VLAN and FDB replay
dpaa2-switch: fix size of the dma_unmap
net: prevent mss overflow in skb_segment()
vsock/virtio: Fix unsigned integer wrap around in virtio_transport_has_space()
Revert "tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set"
MIPS: dts: loongson: drop incorrect dwmac fallback compatible
stmmac: dwmac-loongson: drop useless check for compatible fallback
stmmac: dwmac-loongson: Make sure MDIO is initialized before use
tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set
dpll: sanitize possible null pointer dereference in dpll_pin_parent_pin_set()
net: ena: Fix XDP redirection error
net: ena: Fix DMA syncing in XDP path when SWIOTLB is on
net: ena: Fix xdp drops handling due to multibuf packets
net: ena: Destroy correct number of xdp queues upon failure
net: Remove acked SYN flag from packet in the transmit queue correctly
qed: Fix a potential use-after-free in qed_cxt_tables_alloc
...
Linus Torvalds [Thu, 14 Dec 2023 19:53:00 +0000 (11:53 -0800)]
Merge tag 'for-6.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Some fixes to quota accounting code, mostly around error handling and
correctness:
- free reserves on various error paths, after IO errors or
transaction abort
- don't clear reserved range at the folio release time, it'll be
properly cleared after final write
- fix integer overflow due to int used when passing around size of
freed reservations
- fix a regression in squota accounting that missed some cases with
delayed refs"
* tag 'for-6.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: ensure releasing squota reserve on head refs
btrfs: don't clear qgroup reserved bit in release_folio
btrfs: free qgroup pertrans reserve on transaction abort
btrfs: fix qgroup_free_reserved_data int overflow
btrfs: free qgroup reserve when ORDERED_IOERR is set
Igor Russkikh [Wed, 13 Dec 2023 09:40:44 +0000 (10:40 +0100)]
net: atlantic: fix double free in ring reinit logic
The driver has a logic leak in ring data allocation/free,
where a double free may happen in aq_ring_free if the system is under
stress and driver init/deinit is happening.
The probability of hitting this is higher during a suspend/resume cycle.
Verification was done simulating same conditions with
Hyunwoo Kim [Wed, 13 Dec 2023 04:10:56 +0000 (23:10 -0500)]
appletalk: Fix Use-After-Free in atalk_ioctl
Because atalk_ioctl() accesses sk->sk_receive_queue
without holding a sk->sk_receive_queue.lock, it can
cause a race with atalk_recvmsg().
A use-after-free for skb occurs with the following flow.
```
atalk_ioctl() -> skb_peek()
atalk_recvmsg() -> skb_recv_datagram() -> skb_free_datagram()
```
Add sk->sk_receive_queue.lock to atalk_ioctl() to fix this issue.
Sneh Shah [Tue, 12 Dec 2023 09:22:08 +0000 (14:52 +0530)]
net: stmmac: dwmac-qcom-ethqos: Fix drops in 10M SGMII RX
In 10M SGMII mode all packets are dropped due to a wrong Rx clock.
SGMII 10 Mbps mode needs the RX clock divider programmed to avoid drops in Rx.
Update the SGMII configuration function to program the Rx clock divider.
Fixes: 463120c31c58 ("net: stmmac: dwmac-qcom-ethqos: add support for SGMII")
Tested-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Sneh Shah <quic_snehshah@quicinc.com>
Reviewed-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Link: https://lore.kernel.org/r/20231212092208.22393-1-quic_snehshah@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Johannes Berg [Thu, 14 Dec 2023 08:08:16 +0000 (09:08 +0100)]
wifi: cfg80211: fix certs build to not depend on file order
The file for the new certificate (Chen-Yu Tsai's) didn't
end with a comma, so depending on the file order in the
build rule, we'd end up with invalid C when concatenating
the (now two) certificates. Fix that.
Cc: stable@vger.kernel.org Reported-by: Biju Das <biju.das.jz@bp.renesas.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Fixes: fb768d3b13ff ("wifi: cfg80211: Add my certificate") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Jakub Kicinski [Thu, 14 Dec 2023 06:03:01 +0000 (22:03 -0800)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-12-12 (iavf)
This series contains updates to iavf driver only.
Piotr reworks Flow Director states to deal with issues in restoring
filters.
Slawomir fixes shutdown processing as it was missing needed calls.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: Fix iavf_shutdown to call iavf_remove instead iavf_close
iavf: Handle ntuple on/off based on new state machines for flow director
iavf: Introduce new state machines for flow director
====================
Jakub Kicinski [Thu, 14 Dec 2023 02:38:56 +0000 (18:38 -0800)]
Merge branch 'dpaa2-switch-various-fixes'
Ioana Ciornei says:
====================
dpaa2-switch: various fixes
The first patch fixes the size passed to two dma_unmap_single() calls
which was wrongly put as the size of the pointer.
The second patch is new to this series and reverts the behavior of the
dpaa2-switch driver to not ask for object replay upon offloading so that
we avoid the errors encountered when a VLAN is installed multiple times
on the same port.
====================
Ioana Ciornei [Tue, 12 Dec 2023 16:43:26 +0000 (18:43 +0200)]
dpaa2-switch: do not ask for MDB, VLAN and FDB replay
Starting with commit 4e51bf44a03a ("net: bridge: move the switchdev
object replay helpers to "push" mode") the switchdev_bridge_port_offload()
helper was extended with the intention to provide switchdev drivers easy
access to object addition and deletion replays. This works by calling
the replay helpers with non-NULL notifier blocks.
In the same commit, the dpaa2-switch driver was updated so that it
passes valid notifier blocks to the helper. At that moment, no
regression was identified through testing.
In the meantime, the blamed commit changed the behavior in terms of
which ports get hit by the replay. Before this commit, only the initial
port which identified itself as offloaded through
switchdev_bridge_port_offload() got a replay of all port objects and
FDBs. After this, the newly joining port will trigger a replay of
objects on all bridge ports and on the bridge itself.
This behavior leads to errors in dpaa2_switch_port_vlans_add() when a
VLAN gets installed on the same interface multiple times.
The intended mechanism to address this is to pass a non-NULL ctx to the
switchdev_bridge_port_offload() helper and then check it against the
port's private structure. But since the driver does not have any use for
the replayed port objects and FDBs until it gains support for LAG
offload, it's better to fix the issue by reverting the dpaa2-switch
driver to not ask for replay. The pointers will be added back when we
are prepared to ignore replays on unrelated ports.
Ioana Ciornei [Tue, 12 Dec 2023 16:43:25 +0000 (18:43 +0200)]
dpaa2-switch: fix size of the dma_unmap
The size of the DMA unmap was wrongly put as a sizeof of a pointer.
Change the value of the DMA unmap to be the actual macro used for the
allocation and the DMA map.
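The mistake described above is easy to reproduce in plain userspace C: applying sizeof to a pointer yields the pointer's width, not the size of the buffer it points to. The following is only a minimal sketch of that pitfall; DEMO_BUF_SIZE and both helper names are hypothetical stand-ins and are not taken from the dpaa2-switch driver.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the macro used at allocation/map time. */
#define DEMO_BUF_SIZE 256

/* Buggy pattern: sizeof on the pointer gives the pointer width. */
static size_t demo_unmap_len_buggy(void)
{
	unsigned char *buf = malloc(DEMO_BUF_SIZE);
	size_t len = sizeof(buf);	/* e.g. 8 on a 64-bit build, not 256 */

	free(buf);
	return len;
}

/* Fixed pattern: reuse the same size that was used for the allocation. */
static size_t demo_unmap_len_fixed(void)
{
	unsigned char *buf = malloc(DEMO_BUF_SIZE);
	size_t len = DEMO_BUF_SIZE;

	free(buf);
	return len;
}
```

Using one named size for allocation, map and unmap keeps the three from drifting apart.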
Fixes: 3953c46c3ac7 ("sk_buff: allow segmenting based on frag sizes") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20231212164621.4131800-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rahul Rameshbabu [Tue, 21 Nov 2023 23:00:22 +0000 (15:00 -0800)]
net/mlx5e: Correct snprintf truncation handling for fw_version buffer used by representors
snprintf returns the length of the formatted string, excluding the trailing
null, without accounting for truncation. This means that if the return
value is greater than or equal to the size parameter, the fw_version string
was truncated.
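The truncation rule stated above can be checked with the standard C snprintf(), which follows the same return-value convention. This is a userspace sketch under that assumption; the helper name and the "%u.%u.%04u" version format are hypothetical and are not copied from the mlx5e code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Returns true when the formatted firmware version did not fit in buf.
 * snprintf() reports the length the full string would have had, so a
 * return value >= size means the output was cut short.
 */
static bool demo_fw_version_truncated(char *buf, size_t size,
				      unsigned int major, unsigned int minor,
				      unsigned int sub)
{
	int ret = snprintf(buf, size, "%u.%u.%04u", major, minor, sub);

	return ret < 0 || (size_t)ret >= size;
}
```

A return value exactly equal to the size also counts as truncation, because one byte of the buffer is always taken by the terminating null.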
Link: https://docs.kernel.org/core-api/kernel-api.html#c.snprintf Fixes: 1b2bd0c0264f ("net/mlx5e: Check return value of snprintf writing to fw_version buffer for representors") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Rahul Rameshbabu [Tue, 21 Nov 2023 23:00:21 +0000 (15:00 -0800)]
net/mlx5e: Correct snprintf truncation handling for fw_version buffer
snprintf returns the length of the formatted string, excluding the trailing
null, without accounting for truncation. This means that if the return
value is greater than or equal to the size parameter, the fw_version string
was truncated.
Reported-by: David Laight <David.Laight@ACULAB.COM> Closes: https://lore.kernel.org/netdev/81cae734ee1b4cde9b380a9a31006c1a@AcuMS.aculab.com/ Link: https://docs.kernel.org/core-api/kernel-api.html#c.snprintf Fixes: 41e63c2baa11 ("net/mlx5e: Check return value of snprintf writing to fw_version buffer") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Dan Carpenter [Wed, 13 Dec 2023 14:08:57 +0000 (17:08 +0300)]
net/mlx5e: Fix error codes in alloc_branch_attr()
Set the error code if set_branch_dest_ft() fails.
Fixes: ccbe33003b10 ("net/mlx5e: TC, Don't offload post action rule if not supported") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Dan Carpenter [Wed, 13 Dec 2023 14:08:17 +0000 (17:08 +0300)]
net/mlx5e: Fix error code in mlx5e_tc_action_miss_mapping_get()
Preserve the error code if esw_add_restore_rule() fails. Don't return
success.
Fixes: 6702782845a5 ("net/mlx5e: TC, Set CT miss to the specific ct action instance") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Vlad Buslov [Fri, 6 Oct 2023 13:22:22 +0000 (15:22 +0200)]
net/mlx5: Refactor mlx5_flow_destination->rep pointer to vport num
Currently the destination rep pointer is only used for comparisons or to
obtain vport number from it. Since it is used both during flow creation and
deletion it may point to representor of another eswitch instance which can
be deallocated during driver unload even when there are rules pointing to
it[0]. Refactor the code to store vport number and 'valid' flag instead of
the representor pointer.
[0]:
[176805.886303] ==================================================================
[176805.889433] BUG: KASAN: slab-use-after-free in esw_cleanup_dests+0x390/0x440 [mlx5_core]
[176805.892981] Read of size 2 at addr ffff888155090aa0 by task modprobe/27280
[176806.005317] The buggy address belongs to the object at ffff888155090a80
which belongs to the cache kmalloc-64 of size 64
[176806.006774] The buggy address is located 32 bytes inside of
freed 64-byte region [ffff888155090a80, ffff888155090ac0)
[176806.014935] Memory state around the buggy address:
[176806.015601] ffff888155090980: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[176806.016568] ffff888155090a00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[176806.017497] >ffff888155090a80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[176806.018438] ^
[176806.019007] ffff888155090b00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[176806.020001] ffff888155090b80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[176806.020996] ==================================================================
Moshe Shemesh [Thu, 30 Nov 2023 09:30:34 +0000 (11:30 +0200)]
net/mlx5: Fix fw tracer first block check
While handling new traces, last_timestamp is checked to verify that the
block being written is not the first one. But instead of checking that it
is non-zero, the code verifies that it is zero. Fix the check to verify
that last_timestamp is not zero.
Carolina Jubran [Thu, 23 Nov 2023 14:11:20 +0000 (16:11 +0200)]
net/mlx5e: XDP, Drop fragmented packets larger than MTU size
XDP transmits fragmented packets that are larger than MTU size instead of
dropping those packets. The drop check that checks whether a packet is larger
than MTU is comparing MTU size against the linear part length only.
Adjust the drop check to compare MTU size against both linear and non-linear
part lengths to avoid transmitting fragmented packets larger than MTU size.
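The difference between the two drop checks can be sketched in a few lines of C. Everything here is a simplified model: struct demo_pkt, its fields and the helper names are hypothetical and do not mirror the mlx5e or XDP data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified packet: a linear part plus up to four fragments. */
struct demo_pkt {
	unsigned int linear_len;
	unsigned int frag_len[4];
	size_t nr_frags;
};

/* Total length across the linear part and every fragment. */
static unsigned int demo_pkt_total_len(const struct demo_pkt *pkt)
{
	unsigned int total = pkt->linear_len;
	size_t i;

	for (i = 0; i < pkt->nr_frags; i++)
		total += pkt->frag_len[i];
	return total;
}

/* Buggy check: only the linear part is compared against the limit. */
static bool demo_should_drop_buggy(const struct demo_pkt *pkt,
				   unsigned int max_len)
{
	return pkt->linear_len > max_len;
}

/* Fixed check: the whole packet is compared against the limit. */
static bool demo_should_drop_fixed(const struct demo_pkt *pkt,
				   unsigned int max_len)
{
	return demo_pkt_total_len(pkt) > max_len;
}
```

With a 1000-byte linear part and one 800-byte fragment, the buggy check lets the packet through a 1500-byte limit while the fixed check drops it.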
Chris Mi [Wed, 29 Nov 2023 02:53:32 +0000 (04:53 +0200)]
net/mlx5e: Decrease num_block_tc when unblock tc offload
The cited commit increases num_block_tc when unblocking tc offload.
It should decrease it instead.
Fixes: c8e350e62fc5 ("net/mlx5e: Make TC and IPsec offloads mutually exclusive on a netdev") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Jianbo Liu [Tue, 14 Nov 2023 01:25:21 +0000 (01:25 +0000)]
net/mlx5e: Fix overrun reported by coverity
Coverity Scan reports the following issue. But it's impossible that
mlx5_get_dev_index returns 7 for PF, even if the index is calculated
from PCI FUNC ID. So add the check to silence Coverity.
CID 610894 (#2 of 2): Out-of-bounds write (OVERRUN)
Overrunning array esw->fdb_table.offloads.peer_miss_rules of 4 8-byte
elements at element index 7 (byte offset 63) using index
mlx5_get_dev_index(peer_dev) (which evaluates to 7).
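A guard of the kind that satisfies such a report usually validates the index before the array write. The sketch below assumes a four-entry table, as in the report; the structure and function names are hypothetical and are not the mlx5 code.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_PORTS 4	/* hypothetical table size, matching the report */

struct demo_fdb {
	void *peer_rules[DEMO_MAX_PORTS];
};

/* Stores rule at idx only when idx is inside the table. */
static bool demo_set_peer_rule(struct demo_fdb *fdb, size_t idx, void *rule)
{
	if (idx >= DEMO_MAX_PORTS)
		return false;
	fdb->peer_rules[idx] = rule;
	return true;
}
```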
Fixes: 9bee385a6e39 ("net/mlx5: E-switch, refactor FDB miss rule add/remove") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Dinghao Liu [Tue, 28 Nov 2023 09:40:53 +0000 (17:40 +0800)]
net/mlx5e: fix a potential double-free in fs_udp_create_groups
When kcalloc() for ft->g succeeds but kvzalloc() for in fails,
fs_udp_create_groups() will free ft->g. However, its caller
fs_udp_create_table() will free ft->g again through calling
mlx5e_destroy_flow_table(), which will lead to a double-free.
Fix this by setting ft->g to NULL in fs_udp_create_groups().
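The pattern behind this fix can be shown in userspace C, where free(NULL) is defined to do nothing (kfree(NULL) behaves the same way in the kernel). This is a sketch only: struct demo_flow_table and both functions are hypothetical, and the second allocation failure is simulated rather than real.

```c
#include <stdlib.h>

struct demo_flow_table {
	int *groups;
};

/*
 * Simulates the inner helper failing after groups was allocated.
 * It frees groups and clears the pointer so later cleanup is harmless.
 */
static int demo_create_groups(struct demo_flow_table *ft)
{
	ft->groups = calloc(4, sizeof(*ft->groups));
	if (!ft->groups)
		return -1;

	/* Pretend a second allocation failed at this point. */
	free(ft->groups);
	ft->groups = NULL;	/* without this, the caller would free it again */
	return -1;
}

/* Caller-side cleanup: freeing a NULL pointer is a no-op. */
static void demo_destroy_table(struct demo_flow_table *ft)
{
	free(ft->groups);
	ft->groups = NULL;
}
```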
Shifeng Li [Sat, 2 Dec 2023 08:01:26 +0000 (00:01 -0800)]
net/mlx5e: Fix a race in command alloc flow
Fix a cmd->ent use-after-free due to a race on the command entry.
Such a race occurs when one of the commands releases its last refcount and
frees its index and entry while another process running the command flush
flow takes a refcount on this command entry. The process which handles the
command flush may see this command as needing to be flushed if the other
process allocated an ent->idx but didn't set ent in cmd->ent_arr in
cmd_work_handler(). Fix it by moving the assignment of cmd->ent_arr into
the spin lock.
Shifeng Li [Thu, 30 Nov 2023 09:46:56 +0000 (01:46 -0800)]
net/mlx5e: Fix slab-out-of-bounds in mlx5_query_nic_vport_mac_list()
Out_sz, the size of the out buffer, is calculated using the
query_nic_vport_context_in structure when the driver queries the MAC list.
However, the query_nic_vport_context_in structure is smaller than
query_nic_vport_context_out. When allowed_list_size is greater than 96,
calling ether_addr_copy() will trigger a slab-out-of-bounds access.
Vlad Buslov [Tue, 21 Nov 2023 13:15:30 +0000 (14:15 +0100)]
net/mlx5e: fix double free of encap_header
Cited commit introduced potential double free since encap_header can be
destroyed twice in some cases - once by error cleanup sequence in
mlx5e_tc_tun_{create|update}_header_ipv{4|6}(), once by generic
mlx5e_encap_put() that user calls as a result of getting an error from
tunnel create|update. At the same time the point where e->encap_header is
assigned can't be delayed because the function can still return non-error
code 0 as a result of checking for NUD_VALID flag, which will cause
neighbor update to dereference NULL encap_header.
Fix the issue by:
- Nulling local encap_header variables in
mlx5e_tc_tun_{create|update}_header_ipv{4|6}() to make kfree(encap_header)
call in error cleanup sequence noop after that point.
- Assigning reformat_params.data from e->encap_header instead of local
variable encap_header that was set to NULL pointer by previous step. Also
assign reformat_params.size from e->encap_size for uniformity and in order
to make the code less error-prone in the future.
Yusong Gao [Wed, 13 Dec 2023 10:31:10 +0000 (10:31 +0000)]
sign-file: Fix incorrect return values check
Some return value checks in sign-file are wrong when calling the OpenSSL
API. The ERR() condition is wrong because the program only checks whether
the return value is < 0, which ignores a return value of 0. For example:
1. CMS_final() returns 1 for success or 0 for failure.
2. i2d_CMS_bio_stream() returns 1 for success or 0 for failure.
3. i2d_TYPEbio() returns 1 for success and 0 for failure.
4. BIO_free() returns 1 for success and 0 for failure.
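The effect of the wrong comparison is shown below with a stub in place of the OpenSSL calls. demo_api_call() is a hypothetical stand-in that follows the 1-for-success, 0-for-failure convention listed above. The stricter check treats anything other than 1 as failure, which is one reasonable choice for this convention and not necessarily the exact condition used by the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an API that returns 1 on success, 0 on failure. */
static int demo_api_call(bool succeed)
{
	return succeed ? 1 : 0;
}

/* Buggy check: a return value of 0 (failure) is not detected. */
static bool demo_failed_buggy(int ret)
{
	return ret < 0;
}

/* Stricter check: anything other than 1 counts as failure. */
static bool demo_failed_fixed(int ret)
{
	return ret != 1;
}
```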
Linus Torvalds [Wed, 13 Dec 2023 19:09:58 +0000 (11:09 -0800)]
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ufs fix from Al Viro:
"ufs got broken this merge window on folio conversion - calling
conventions for filemap_lock_folio() are not the same as for
find_lock_page()"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix ufs_get_locked_folio() breakage
Linus Torvalds [Wed, 13 Dec 2023 18:54:50 +0000 (10:54 -0800)]
Merge tag 'efi-urgent-for-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
- Deal with a regression in the recently refactored x86 EFI stub code
on older Dell systems by disabling randomization of the physical load
address
- Use the correct load address for relocatable Loongarch kernels
* tag 'efi-urgent-for-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi/x86: Avoid physical KASLR on older Dell systems
efi/loongarch: Use load address to calculate kernel entry address
Ivan Vecera [Wed, 29 Nov 2023 16:17:10 +0000 (17:17 +0100)]
i40e: Fix ST code value for Clause 45
ST code value for clause 45 that has been changed by
commit 8196b5fd6c73 ("i40e: Refactor I40E_MDIO_CLAUSE* macros")
is currently wrong.
The mentioned commit refactored ..MDIO_CLAUSE??_STCODE_MASK so
their value is the same for both clauses. The value is correct
for clause 22 but not for clause 45.
Fix the issue by adding a parameter to I40E_GLGEN_MSCA_STCODE_MASK
macro that specifies required value.
Fixes: 8196b5fd6c73 ("i40e: Refactor I40E_MDIO_CLAUSE* macros") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Michal Schmidt [Thu, 30 Nov 2023 16:58:06 +0000 (17:58 +0100)]
ice: fix theoretical out-of-bounds access in ethtool link modes
To map phy types reported by the hardware to ethtool link mode bits,
ice uses two lookup tables (phy_type_low_lkup, phy_type_high_lkup).
The "low" table has 64 elements to cover every possible bit the hardware
may report, but the "high" table has only 13. If the hardware reports a
higher bit in phy_types_high, the driver would access memory beyond the
lookup table's end.
Instead of iterating through all 64 bits of phy_types_{low,high}, use
the sizes of the respective lookup tables.
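Bounding the loop by the table size rather than by the bit width of the word can be sketched as follows. The 13-entry table mirrors the size mentioned above, but its contents, the macro and the function are hypothetical and are not the ice driver's tables.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical "high" table with fewer entries than a 64-bit word has bits. */
static const int demo_high_lkup[13] = {
	1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
};

/*
 * Sums the table entries selected by the set bits in phy_types_high.
 * The loop is bounded by the table size, so bits 13..63 are ignored
 * instead of indexing past the end of the table.
 */
static int demo_map_high(uint64_t phy_types_high)
{
	int sum = 0;
	size_t i;

	for (i = 0; i < DEMO_ARRAY_SIZE(demo_high_lkup); i++) {
		if (phy_types_high & (UINT64_C(1) << i))
			sum += demo_high_lkup[i];
	}
	return sum;
}
```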
Fixes: 9136e1f1e5c3 ("ice: refactor PHY type to ethtool link mode") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
David S. Miller [Wed, 13 Dec 2023 10:57:01 +0000 (10:57 +0000)]
Merge branch 'stmmac-bug-fixes'
Yanteng Si says:
====================
stmmac: Some bug fixes
* Put Krzysztof's patch into my thread, pick Conor's Reviewed-by
tag and Jiaxun's Acked-by tag.(prev version is RFC patch)
* I fixed an Oops related to mdio, mainly to ensure that
mdio is initialized before use, because it will be used
in a series of patches I am working on.
see <https://lore.kernel.org/loongarch/cover.1699533745.git.siyanteng@loongson.cn/T/#t>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
MIPS: dts: loongson: drop incorrect dwmac fallback compatible
Device binds to proper PCI ID (LOONGSON, 0x7a03), already listed in DTS,
so checking for some other compatible does not make sense. It cannot be
bound to unsupported platform.
Drop useless, incorrect (space in between) and undocumented compatible.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac: dwmac-loongson: drop useless check for compatible fallback
Device binds to proper PCI ID (LOONGSON, 0x7a03), already listed in DTS,
so checking for some other compatible does not make sense. It cannot be
bound to unsupported platform.
Drop useless, incorrect (space in between) and undocumented compatible.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yanteng Si [Mon, 11 Dec 2023 10:33:11 +0000 (18:33 +0800)]
stmmac: dwmac-loongson: Make sure MDIO is initialized before use
Generic code will use mdio. If it is not initialized before use,
the kernel will Oops.
Fixes: 30bba69d7db4 ("stmmac: pci: Add dwmac support for Loongson") Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set
Based on the tcp man page, if TCP_NODELAY is set, it disables Nagle's algorithm
and packets are sent as soon as possible. However in the `tcp_push` function
where autocorking is evaluated the `nonagle` value set by TCP_NODELAY is not
considered which can trigger unexpected corking of packets and induce delays.
For example, if two packets are generated as part of a server's reply, if the
first one is not transmitted on the wire quickly enough, the second packet can
trigger the autocorking in `tcp_push` and be delayed instead of sent as soon as
possible. It will either wait for additional packets to be coalesced or an ACK
from the client before transmitting the corked packet. This can interact badly
if the receiver has tcp delayed acks enabled, introducing 40ms extra delay in
completion times. It is not always possible to control who has delayed acks
set, but it is possible to adjust when and how autocorking is triggered.
Patch prevents autocorking if the TCP_NODELAY flag is set on the socket.
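The change in decision logic can be modelled with a small predicate. This is a heavily simplified sketch: struct demo_sock, its fields and both functions are hypothetical and are not the kernel's tcp_push() or its helpers. It only illustrates that the nodelay state now takes part in the decision.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified socket state for illustration only. */
struct demo_sock {
	bool autocorking;	/* models the tcp_autocorking setting */
	bool nodelay;		/* models TCP_NODELAY being set */
	bool unacked_in_flight;	/* earlier data has not completed yet */
	unsigned int skb_len;
	unsigned int size_goal;
};

/* Model of the old behaviour: the nodelay state is not consulted. */
static bool demo_should_autocork_old(const struct demo_sock *sk)
{
	return sk->autocorking && sk->unacked_in_flight &&
	       sk->skb_len < sk->size_goal;
}

/* Model of the new behaviour: never autocork a TCP_NODELAY socket. */
static bool demo_should_autocork_new(const struct demo_sock *sk)
{
	return !sk->nodelay && demo_should_autocork_old(sk);
}
```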
Patch has been tested using an AWS c7g.2xlarge instance with Ubuntu 22.04 and
Apache Tomcat 9.0.83 running the basic servlet below:
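import java.io.IOException;
import java.io.OutputStreamWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class HelloWorldServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/html;charset=utf-8");
        OutputStreamWriter osw = new OutputStreamWriter(response.getOutputStream(), "UTF-8");
        String s = "a".repeat(3096);
        osw.write(s, 0, s.length());
        osw.flush();
    }
}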
Load was applied using wrk2 (https://github.com/kinvolk/wrk2) from an AWS
c6i.8xlarge instance. With the current auto-corking behavior and TCP_NODELAY
set, an additional 40ms of latency is observed at P99.99 and above. With the
patch applied we see no occurrences of 40ms latencies. The patch has also been
tested with iperf and uperf benchmarks and no regression was observed.
# No patch with tcp_autocorking=1 and TCP_NODELAY set on all sockets
./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.49.177:8080/hello/hello
...
50.000% 0.91ms
75.000% 1.12ms
90.000% 1.46ms
99.000% 1.73ms
99.900% 1.96ms
99.990% 43.62ms <<< 40+ ms extra latency
99.999% 48.32ms
100.000% 49.34ms
# With patch
./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.49.177:8080/hello/hello
...
50.000% 0.89ms
75.000% 1.13ms
90.000% 1.44ms
99.000% 1.67ms
99.900% 1.78ms
99.990% 2.27ms <<< no 40+ ms extra latency
99.999% 3.71ms
100.000% 4.57ms
Fixes: f54b311142a9 ("tcp: auto corking") Signed-off-by: Salvatore Dipietro <dipiets@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tag 'hid-for-linus-2023121201' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: apple: Add "hfd.cn" and "WKB603" to the list of non-apple keyboards
HID: lenovo: Restrict detection of patched firmware only to USB cptkbd
HID: Add quirk for Labtec/ODDOR/aikeec handbrake
HID: i2c-hid: Add IDEA5002 to i2c_hid_acpi_blacklist[]
mailmap: add address mapping for Jiri Kosina
Jiri Pirko [Mon, 11 Dec 2023 08:37:58 +0000 (09:37 +0100)]
dpll: sanitize possible null pointer dereference in dpll_pin_parent_pin_set()
User may not pass DPLL_A_PIN_STATE attribute in the pin set operation
message. Sanitize that by checking if the attr pointer is not null
and process the passed state attribute value only in that case.
Reported-by: Xingyuan Mo <hdthky0@gmail.com> Fixes: 9d71b54b65b1 ("dpll: netlink: Add DPLL framework base functions") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://lore.kernel.org/r/20231211083758.1082853-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Arinzon [Mon, 11 Dec 2023 06:28:00 +0000 (06:28 +0000)]
net: ena: Fix DMA syncing in XDP path when SWIOTLB is on
This patch fixes two issues:
Issue 1
-------
Description
```````````
Current code does not call dma_sync_single_for_cpu() to sync data from
the device side memory to the CPU side memory before the XDP code path
uses the CPU side data.
This causes the XDP code path to read the unset garbage data in the CPU
side memory, resulting in incorrect handling of the packet by XDP.
Solution
````````
1. Add a call to dma_sync_single_for_cpu() before the XDP code starts to
use the data in the CPU side memory.
2. The XDP code verdict can be XDP_PASS, in which case there is a
fallback to the non-XDP code, which also calls
dma_sync_single_for_cpu().
To avoid calling dma_sync_single_for_cpu() twice:
2.1. Put the dma_sync_single_for_cpu() in the code in such a place where
it happens before XDP and non-XDP code.
2.2. Remove the calls to dma_sync_single_for_cpu() in the non-XDP code
for the first buffer only (rx_copybreak and non-rx_copybreak
cases), since the new call that was added covers these cases.
The call to dma_sync_single_for_cpu() for the second buffer and on
stays because only the first buffer is handled by the newly added
dma_sync_single_for_cpu(). And there is no need for special
handling of the second buffer and on for the XDP path since
currently the driver supports only single buffer packets.
Issue 2
-------
Description
```````````
In case the XDP code forwarded the packet (ENA_XDP_FORWARDED),
ena_unmap_rx_buff_attrs() is called with attrs set to 0.
This means that before unmapping the buffer, the internal function
dma_unmap_page_attrs() will also call dma_sync_single_for_cpu() on
the whole buffer (not only on the data part of it).
This sync is both wasteful (since a sync was already explicitly
called before) and also causes a bug, which will be explained
using the below diagram.
The following diagram shows the flow of events causing the bug.
The order of events is (1)-(4) as shown in the diagram.
                        CPU side memory area
   (3)convert_to_xdp_frame() initializes the
      headroom with xdpf metadata
                     ||
                     \/
         ___________________________________
        |                                   |
0       |                                   V                      4K
---------------------------------------------------------------------
| xdpf->data      | other xdpf       | < data >     | tailroom ||...|
|                 | fields           |              | GARBAGE  ||   |
---------------------------------------------------------------------
                  /\                        /\
                  ||                        ||
  (4)ena_unmap_rx_buff_attrs() calls    (2)dma_sync_single_for_cpu()
     dma_sync_single_for_cpu() on the      copies data from device
     whole buffer page, overwriting        side to CPU side memory
     the xdpf->data with GARBAGE.                   ||
0                                                                  4K
---------------------------------------------------------------------
| headroom                           | < data >     | tailroom ||...|
| GARBAGE                            |              | GARBAGE  ||   |
---------------------------------------------------------------------
                   Device side memory area  /\
                                            ||
                         (1) device writes RX packet data
After the call to ena_unmap_rx_buff_attrs() in (4), the xdpf->data
becomes corrupted, and so when it is later accessed in
ena_clean_xdp_irq()->xdp_return_frame(), it causes a page fault,
crashing the kernel.
Solution
````````
Explicitly tell ena_unmap_rx_buff_attrs() not to call
dma_sync_single_for_cpu() by passing it the ENA_DMA_ATTR_SKIP_CPU_SYNC
flag.
Fixes: f7d625adeb7b ("net: ena: Add dynamic recycling mechanism for rx buffers") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20231211062801.27891-4-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Arinzon [Mon, 11 Dec 2023 06:27:59 +0000 (06:27 +0000)]
net: ena: Fix xdp drops handling due to multibuf packets
Current xdp code drops packets larger than ENA_XDP_MAX_MTU.
This is an incorrect condition since the problem is not the
size of the packet, rather the number of buffers it contains.
This commit:
1. Identifies and drops XDP multi-buffer packets at the
beginning of the function.
2. Increases the xdp drop statistic when this drop occurs.
3. Adds a one-time print that such drops are happening to
give better indication to the user.
Fixes: 838c93dc5449 ("net: ena: implement XDP drop support") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20231211062801.27891-3-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Arinzon [Mon, 11 Dec 2023 06:27:58 +0000 (06:27 +0000)]
net: ena: Destroy correct number of xdp queues upon failure
The ena_setup_and_create_all_xdp_queues() function freed all the
resources upon failure, after creating only xdp_num_queues queues,
instead of freeing just the created ones.
In this patch, the only resources that are freed are the ones
allocated right before the failure occurs.
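Unwinding only what was created is a common error-path pattern, sketched here in userspace C. The device structure, the queue count and all function names are hypothetical, and the failure is injected through fail_at. None of this is the ena driver's code.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NUM_QUEUES 4	/* hypothetical queue count */

struct demo_dev {
	bool created[DEMO_NUM_QUEUES];
	size_t fail_at;		/* creating this index fails */
};

static int demo_create_queue(struct demo_dev *dev, size_t i)
{
	if (i == dev->fail_at)
		return -1;
	dev->created[i] = true;
	return 0;
}

/* Tears down queues [0, count) and returns how many were actually live. */
static size_t demo_destroy_queues(struct demo_dev *dev, size_t count)
{
	size_t i, n = 0;

	for (i = 0; i < count; i++) {
		if (dev->created[i]) {
			dev->created[i] = false;
			n++;
		}
	}
	return n;
}

/*
 * Creates all queues. On failure at index i, only the i queues created
 * so far are destroyed; *destroyed reports that number.
 */
static int demo_setup_all_queues(struct demo_dev *dev, size_t *destroyed)
{
	size_t i;

	*destroyed = 0;
	for (i = 0; i < DEMO_NUM_QUEUES; i++) {
		if (demo_create_queue(dev, i) < 0) {
			*destroyed = demo_destroy_queues(dev, i);
			return -1;
		}
	}
	return 0;
}
```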
Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action") Signed-off-by: Shahar Itzko <itzko@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20231211062801.27891-2-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
// TCP B: not send challenge ack for ack limit or packet loss
// TCP A: close
    tcp_close
       tcp_send_fin
          if (!tskb && tcp_under_memory_pressure(sk))
              tskb = skb_rb_last(&sk->tcp_rtx_queue); //pick SYN_ACK packet
          TCP_SKB_CB(tskb)->tcp_flags |= TCPHDR_FIN; // set FIN flag
       __tcp_retransmit_skb //skb->len=0
          tcp_trim_head
             len = tp->snd_una - TCP_SKB_CB(skb)->seq // len=101-100
             __pskb_trim_head
                skb->data_len -= len // skb->len=-1, wrap around
       ... ...
       ip_fragment
          icmp_glue_bits //BUG_ON
If we use tcp_trim_head() to remove acked SYN from packet that contains data
or other flags, skb->len will be incorrectly decremented. We can remove SYN
flag that has been acked from rtx_queue earlier than tcp_trim_head(), which
can fix the problem mentioned above.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Co-developed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Link: https://lore.kernel.org/r/20231210020200.1539875-1-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dinghao Liu [Sun, 10 Dec 2023 04:52:55 +0000 (12:52 +0800)]
qed: Fix a potential use-after-free in qed_cxt_tables_alloc
qed_ilt_shadow_alloc() will call qed_ilt_shadow_free() to
free p_hwfn->p_cxt_mngr->ilt_shadow on error. However,
qed_cxt_tables_alloc() accesses the freed pointer on failure
of qed_ilt_shadow_alloc() through calling qed_cxt_mngr_free(),
which may lead to use-after-free. Fix this issue by setting
p_mngr->ilt_shadow to NULL in qed_ilt_shadow_free().
Fixes: fe56b9e6a8d9 ("qed: Add module with basic common support") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Link: https://lore.kernel.org/r/20231210045255.21383-1-dinghao.liu@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 12 Dec 2023 19:37:04 +0000 (11:37 -0800)]
Merge tag 'ext4_for_linus-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix various bugs / regressions for ext4, including a soft lockup, a
WARN_ON, and a BUG"
* tag 'ext4_for_linus-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd2: fix soft lockup in journal_finish_inode_data_buffers()
ext4: fix warning in ext4_dio_write_end_io()
jbd2: increase the journal IO's priority
jbd2: correct the printing of write_flags in jbd2_write_superblock()
ext4: prevent the normalized size from exceeding EXT_MAX_BLOCKS
Slawomir Laba [Wed, 29 Nov 2023 15:35:26 +0000 (10:35 -0500)]
iavf: Fix iavf_shutdown to call iavf_remove instead iavf_close
Make the PCI shutdown flow the same as the PCI remove flow.
iavf_shutdown was implementing an incomplete version
of iavf_remove. It misses several calls, such as
iavf_free_misc_irq, iavf_reset_interrupt_capability and iounmap,
whose absence might break the system on reboot or hibernation.
Implement the call of iavf_remove directly in iavf_shutdown to
close this gap.
Fixes below error messages (dmesg) during shutdown stress tests -
[685814.900917] ice 0000:88:00.0: MAC 02:d0:5f:82:43:5d does not exist for
VF 0
[685814.900928] ice 0000:88:00.0: MAC 33:33:00:00:00:01 does not exist for
VF 0
Reproduction:
1. Create one VF interface:
echo 1 > /sys/class/net/<interface_name>/device/sriov_numvfs
2. Run live dmesg on the host:
dmesg -wH
3. On SUT, script below steps into vf_namespace_assignment.sh
<#!/bin/sh> // Remove <>. Git removes # line
if=<VF name> (edit this per VF name)
loop=0
while true; do
    echo test round $loop
    let loop++
    ip netns add ns$loop
    ip link set dev $if up
    ip link set dev $if netns ns$loop
    ip netns exec ns$loop ip link set dev $if up
    ip netns exec ns$loop ip link set dev $if netns 1
    ip netns delete ns$loop
done
4. Run the script for at least 1000 iterations on SUT:
./vf_namespace_assignment.sh
Expected result:
No errors in dmesg.
Fixes: 129cf89e5856 ("iavf: rename functions and structs to new name") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Co-developed-by: Ranganatha Rao <ranganatha.rao@intel.com> Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Piotr Gardocki [Wed, 22 Nov 2023 03:47:16 +0000 (22:47 -0500)]
iavf: Handle ntuple on/off based on new state machines for flow director
ntuple-filter feature on/off:
Default is on. If turned off, the filters will be removed from both
PF and iavf list. The removal is irrespective of current filter state.
Steps to reproduce:
-------------------
1. Ensure ntuple is on.
ethtool -K enp8s0 ntuple-filters on
2. Create a filter to receive the traffic into non-default rx-queue like 15
and ensure traffic is flowing into queue 15.
Now, turn off ntuple. Traffic should not flow to configured queue 15.
It should flow to default RX queue.
Fixes: 0dbfbabb840d ("iavf: Add framework to enable ethtool ntuple filters") Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Current FDIR state machines (SM) are not adequate to handle a few
scenarios in the link DOWN/UP event, reset event and ntuple-feature.
For example, when VF link goes DOWN and comes back UP administratively,
the expectation is that previously installed filters should also be
restored. But with current SM, filters are not restored.
So with new SM, during link DOWN filters are marked as INACTIVE in
the iavf list but removed from PF. After link UP, SM will transition
from INACTIVE to ADD_REQUEST to restore the filter.
Similarly, with VF reset, filters will be removed from the PF, but
marked as INACTIVE in the iavf list. Filters will be restored after
reset completion.
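The transitions described above can be sketched as a small userspace model.
This is only an illustration under stated assumptions: the state names, the
struct and both helper functions are hypothetical and are not the driver's
actual identifiers, and marking every filter INACTIVE on DOWN is a
simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical filter states; names are illustrative, not the driver's. */
enum fltr_state {
	FLTR_ADD_REQUEST,	/* queued to be (re)installed on the PF */
	FLTR_ACTIVE,		/* installed on the PF */
	FLTR_INACTIVE,		/* kept in the VF list, removed from the PF */
};

struct fltr {
	enum fltr_state state;
};

/* Link DOWN or VF reset: keep each filter in the list, mark it INACTIVE. */
void fltrs_on_down(struct fltr *list, size_t n)
{
	assert(list != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		list[i].state = FLTR_INACTIVE;
}

/*
 * Link UP or reset completion: move INACTIVE filters to ADD_REQUEST so
 * they get restored. Returns the number of filters that were re-queued.
 */
size_t fltrs_on_up(struct fltr *list, size_t n)
{
	size_t requeued = 0;

	assert(list != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (list[i].state == FLTR_INACTIVE) {
			list[i].state = FLTR_ADD_REQUEST;
			requeued++;
		}
	}
	return requeued;
}
```

Calling fltrs_on_down() followed by fltrs_on_up() on a list leaves every
filter in ADD_REQUEST, which models the "restore after link UP" behaviour.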
Steps to reproduce:
-------------------
1. Create a VF. Here VF is enp8s0.
2. Assign IP addresses to VF and link partner and ping continuously
from remote. Here remote IP is 1.1.1.1.
5. Ensure filter gets added and traffic is received on RX queue 15 now.
Link event testing:
-------------------
6. Bring VF link down and up. If traffic flows to configured queue 15,
the test passes; otherwise it fails.
Reset event testing:
--------------------
7. Reset the VF. If traffic flows to configured queue 15, the test passes;
otherwise it fails.
Fixes: 0dbfbabb840d ("iavf: Add framework to enable ethtool ntuple filters") Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Linus Torvalds [Tue, 12 Dec 2023 19:06:41 +0000 (11:06 -0800)]
Merge tag 'fuse-fixes-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
- Fix a couple of potential crashes, one introduced in 6.6 and one
in 5.10
- Fix misbehavior of virtiofs submounts on memory pressure
- Clarify naming in the uAPI for a recent feature
* tag 'fuse-fixes-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: disable FOPEN_PARALLEL_DIRECT_WRITES with FUSE_DIRECT_IO_ALLOW_MMAP
fuse: dax: set fc->dax to NULL in fuse_dax_conn_free()
fuse: share lookup state between submount and its parent
docs/fuse-io: Document the usage of DIRECT_IO_ALLOW_MMAP
fuse: Rename DIRECT_IO_RELAX to DIRECT_IO_ALLOW_MMAP
Linus Torvalds [Tue, 12 Dec 2023 18:30:10 +0000 (10:30 -0800)]
Merge tag '6.7-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Memory leak fix (in lock error path)
- Two fixes for create with allocation size
- Fix for potential UAF in lease break error path
- Five directory lease (caching) fixes found during additional recent
testing
* tag '6.7-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix wrong name of SMB2_CREATE_ALLOCATION_SIZE
ksmbd: fix wrong allocation size update in smb2_open()
ksmbd: avoid duplicate opinfo_put() call on error of smb21_lease_break_ack()
ksmbd: lazy v2 lease break on smb2_write()
ksmbd: send v2 lease break notification for directory
ksmbd: downgrade RWH lease caching state to RH for directory
ksmbd: set v2 lease capability
ksmbd: set epoch in create context v2 lease
ksmbd: fix memory leak in smb2_lock()
Information gathered from the vmcore:
(1) There are about 5k+ jbd2_inode entries in 'commit_transaction->t_inode_list';
(2) The 855th jbd2_inode is currently being processed;
(3) The JBD2 task has the TIF_NEED_RESCHED flag set;
(4) There are no pages in the address_space around the 855th jbd2_inode;
(5) Some processes are dropping caches;
(6) The filesystem is mounted with the 'nodioread_nolock' option;
(7) The machine has 128 CPUs;
According to the vmcore, competition for the 'journal->j_list_lock' spin lock
is fierce, so journal_finish_inode_data_buffers() may run slowly.
In theory there is a scheduling point in filemap_fdatawait_range_keep_errors().
However, if the inode's address_space has no pages tagged with PAGECACHE_TAG_WRITEBACK,
cond_resched() is never called, which may lead to a soft lockup.
journal_finish_inode_data_buffers
filemap_fdatawait_range_keep_errors
__filemap_fdatawait_range
while (index <= end)
nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, PAGECACHE_TAG_WRITEBACK);
if (!nr_pages)
break; --> if 'nr_pages' is zero, the loop breaks and cond_resched() is never called
for (i = 0; i < nr_pages; i++)
wait_on_page_writeback(page);
cond_resched();
To solve the above issue, add a scheduling point in journal_finish_inode_data_buffers().
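The shape of the fix can be sketched in userspace as follows. This is a
minimal model under stated assumptions, not the jbd2 code: struct fake_inode,
yield_count and every function name are hypothetical, and sched_yield()
stands in for cond_resched().

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for one inode on the transaction's inode list. */
struct fake_inode {
	int pages_under_writeback;
};

/* Counts scheduling points so the behaviour can be observed. */
unsigned long yield_count;

/* Userspace stand-in for cond_resched(). */
void resched_point(void)
{
	yield_count++;
	sched_yield();
}

/*
 * Mirrors the inner wait loop: it yields only while there are pages to
 * wait on, so an inode with no such pages never reaches a yield here.
 */
void wait_for_writeback(struct fake_inode *inode)
{
	while (inode->pages_under_writeback > 0) {
		inode->pages_under_writeback--;
		resched_point();
	}
}

/* Outer loop over all inodes, with the added unconditional yield. */
void finish_inode_data(struct fake_inode *inodes, size_t n)
{
	assert(inodes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		wait_for_writeback(&inodes[i]);
		resched_point();	/* the added scheduling point */
	}
}
```

Without the call in finish_inode_data(), a long run of inodes with no pages
under writeback would pass through the loop without ever yielding.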
Felix Fietkau [Fri, 8 Dec 2023 07:50:04 +0000 (08:50 +0100)]
wifi: mt76: fix crash with WED rx support enabled
If WED rx is enabled, rx buffers are added to a buffer pool that can be
filled from multiple page pools. Because buffers freed from rx poll are
not guaranteed to belong to the processed queue's page pool, lockless
caching must not be used in this case.
Cc: stable@vger.kernel.org Fixes: 2f5c3c77fc9b ("wifi: mt76: switch to page_pool allocator") Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231208075004.69843-1-nbd@nbd.name
Yan Jun [Sun, 3 Dec 2023 11:50:58 +0000 (19:50 +0800)]
HID: apple: Add "hfd.cn" and "WKB603" to the list of non-apple keyboards
The JingZao (京造) WKB603 keyboard is a rebranded Jamesdonkey RS2
keyboard, identified as "hfd.cn WKB603" in wired mode and "WKB603" in Bluetooth
mode. Adding both names to the list of non-Apple keyboards fixes the function keys.
Signed-off-by: Yan Jun <jerrysteve1101@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
HID: lenovo: Restrict detection of patched firmware only to USB cptkbd
Commit 46a0a2c96f0f ("HID: lenovo: Detect quirk-free fw on cptkbd and
stop applying workaround") introduced a regression for the ThinkPad
TrackPoint Keyboard II, which has quirks similar to cptkbd (so it uses
the same workarounds) but different enough that detection of
well-behaving firmware produces false positives. This commit
restricts that detection to the only model that is
known to have such firmware and whose quirks are stable enough not to
cause false positives.
Hyunwoo Kim [Sat, 9 Dec 2023 10:05:38 +0000 (05:05 -0500)]
net/rose: Fix Use-After-Free in rose_ioctl
Because rose_ioctl() accesses sk->sk_receive_queue
without holding sk->sk_receive_queue.lock, it can
race with rose_accept().
A use-after-free of the skb occurs in the following flow:
```
rose_ioctl() -> skb_peek()
rose_accept() -> skb_dequeue() -> kfree_skb()
```
Add sk->sk_receive_queue.lock to rose_ioctl() to fix this issue.
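The locking pattern can be sketched in userspace as below. This is an analogy
under stated assumptions, not the kernel code: a pthread mutex stands in for
the receive queue lock, and struct buf, struct queue and all function names
are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an skb and its receive queue. */
struct buf {
	struct buf *next;
	size_t len;
};

struct queue {
	pthread_mutex_t lock;
	struct buf *head;
};

/* Append a buffer of the given length; returns 0 on success, -1 on OOM. */
int queue_push(struct queue *q, size_t len)
{
	struct buf *b = malloc(sizeof(*b));
	struct buf **pp;

	assert(q != NULL);
	if (!b)
		return -1;
	b->next = NULL;
	b->len = len;
	pthread_mutex_lock(&q->lock);
	for (pp = &q->head; *pp; pp = &(*pp)->next)
		;
	*pp = b;
	pthread_mutex_unlock(&q->lock);
	return 0;
}

/*
 * Reader side (the ioctl-like path): peek and read the length while
 * holding the lock, so the consumer cannot free the buffer in between.
 */
size_t queue_peek_len(struct queue *q)
{
	size_t len = 0;

	pthread_mutex_lock(&q->lock);
	if (q->head)
		len = q->head->len;
	pthread_mutex_unlock(&q->lock);
	return len;
}

/* Consumer side: unlink under the lock, then free. Returns 1 if freed. */
int queue_dequeue_free(struct queue *q)
{
	struct buf *b;
	int had_buf;

	pthread_mutex_lock(&q->lock);
	b = q->head;
	if (b)
		q->head = b->next;
	pthread_mutex_unlock(&q->lock);
	had_buf = (b != NULL);
	free(b);
	return had_buf;
}
```

The point of the sketch is that the peek and the read of the peeked buffer
happen in one critical section. Peeking without the lock and dereferencing
afterwards is the window in which the consumer can free the buffer.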
Hyunwoo Kim [Sat, 9 Dec 2023 09:42:10 +0000 (04:42 -0500)]
atm: Fix Use-After-Free in do_vcc_ioctl
Because do_vcc_ioctl() accesses sk->sk_receive_queue
without holding sk->sk_receive_queue.lock, it can
race with vcc_recvmsg().
A use-after-free of the skb occurs in the following flow:
```
do_vcc_ioctl() -> skb_peek()
vcc_recvmsg() -> skb_recv_datagram() -> skb_free_datagram()
```
Add sk->sk_receive_queue.lock to do_vcc_ioctl() to fix this issue.
Avraham Stern [Thu, 7 Dec 2023 02:50:17 +0000 (04:50 +0200)]
wifi: iwlwifi: pcie: avoid a NULL pointer dereference
It is possible that while the rx rb is being handled, the transport has
been stopped and re-started. In this case the tx queue pointer is not
yet initialized, which will lead to a NULL pointer dereference.
Fix it.