Jakub Kicinski [Tue, 5 Sep 2023 23:42:02 +0000 (16:42 -0700)]
net: phylink: fix sphinx complaint about invalid literal
sphinx complains about the use of "%PHYLINK_PCS_NEG_*":
Documentation/networking/kapi:144: ./include/linux/phylink.h:601: WARNING: Inline literal start-string without end-string.
Documentation/networking/kapi:144: ./include/linux/phylink.h:633: WARNING: Inline literal start-string without end-string.
These are not valid symbols so drop the '%' prefix.
Alternatively we could use %PHYLINK_PCS_NEG_\* (escape the *)
or use normal literal ``PHYLINK_PCS_NEG_*`` but there is already
a handful of un-adorned DEFINE_* in this file.
Fixes: f99d471afa03 ("net: phylink: add PCS negotiation mode") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/all/20230626162908.2f149f98@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 5 Sep 2023 21:53:38 +0000 (00:53 +0300)]
net: dsa: sja1105: complete tc-cbs offload support on SJA1110
The blamed commit left this delta behind:
struct sja1105_cbs_entry {
- u64 port;
- u64 prio;
+ u64 port; /* Not used for SJA1110 */
+ u64 prio; /* Not used for SJA1110 */
u64 credit_hi;
u64 credit_lo;
u64 send_slope;
u64 idle_slope;
};
but did not actually implement tc-cbs offload fully for the new switch.
The offload is accepted, but it doesn't work.
The difference compared to earlier switch generations is that now, the
table of CBS shapers is sparse, because there are many more shapers, so
the mapping between a {port, prio} and a table index is static, rather
than requiring us to store the port and prio into the sja1105_cbs_entry.
So, the problem is that the code programs the CBS shaper parameters at a
dynamic table index which is incorrect.
All that needs to be done for SJA1110 CBS shapers to work is to bypass
the logic which allocates shapers in a dense manner, as for SJA1105, and
use the fixed mapping instead.
Fixes: 3e77e59bf8cf ("net: dsa: sja1105: add support for the SJA1110 switch family") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Error: Specified device failed to setup cbs hardware offload.
This comes from the fact that ndo_setup_tc(TC_SETUP_QDISC_CBS) presents
the same API for the qdisc create and replace cases, and the sja1105
driver fails to distinguish between the 2. Thus, it always thinks that
it must allocate the same shaper for a {port, queue} pair, when it may
instead have to replace an existing one.
Fixes: 4d7525085a9b ("net: dsa: sja1105: offload the Credit-Based Shaper qdisc") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 5 Sep 2023 21:53:36 +0000 (00:53 +0300)]
net: dsa: sja1105: fix bandwidth discrepancy between tc-cbs software and offload
More careful measurement of the tc-cbs bandwidth shows that the stream
bandwidth (effectively idleslope) increases, there is a larger and
larger discrepancy between the rate limit obtained by the software
Qdisc, and the rate limit obtained by its offloaded counterpart.
The discrepancy becomes so large, that e.g. at an idleslope of 40000
(40Mbps), the offloaded cbs does not actually rate limit anything, and
traffic will pass at line rate through a 100 Mbps port.
The reason for the discrepancy is that the hardware documentation I've
been following is incorrect. UM11040.pdf (for SJA1105P/Q/R/S) states
about IDLE_SLOPE that it is "the rate (in unit of bytes/sec) at which
the credit counter is increased".
Cross-checking with UM10944.pdf (for SJA1105E/T) and UM11107.pdf
(for SJA1110), the wording is different: "This field specifies the
value, in bytes per second times link speed, by which the credit counter
is increased".
So there's an extra scaling for link speed that the driver is currently
not accounting for, and apparently (empirically), that link speed is
expressed in Kbps.
I've pondered whether to pollute the sja1105_mac_link_up()
implementation with CBS shaper reprogramming, but I don't think it is
worth it. IMO, the UAPI exposed by tc-cbs requires user space to
recalculate the sendslope anyway, since the formula for that depends on
port_transmit_rate (see man tc-cbs), which is not an invariant from tc's
perspective.
So we use the offload->sendslope and offload->idleslope to deduce the
original port_transmit_rate from the CBS formula, and use that value to
scale the offload->sendslope and offload->idleslope to values that the
hardware understands.
Some numerical data points:
40Mbps stream, max interfering frame size 1500, port speed 100M
---------------------------------------------------------------
Before (doesn't work) After (works)
credit_hi 77 77
credit_lo 1444 1444
send_slope 11860000 118
idle_slope 640000 6
Tested on SJA1105T, SJA1105S and SJA1110A, at 1Gbps and 100Mbps.
Fixes: 4d7525085a9b ("net: dsa: sja1105: offload the Credit-Based Shaper qdisc") Reported-by: Yanan Yang <yanan.yang@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 6 Sep 2023 05:20:33 +0000 (06:20 +0100)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Change MIN_TXD and MIN_RXD to allow set rx/tx value between 64 and 80
Olga Zaborska says:
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx
value between 64 and 80. All igb, igbvf and igc devices can use as low as 64
descriptors.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Bodong Wang [Tue, 5 Sep 2023 17:48:46 +0000 (10:48 -0700)]
mlx5/core: E-Switch, Create ACL FT for eswitch manager in switchdev mode
ACL flow table is required in switchdev mode when metadata is enabled,
driver creates such table when loading each vport. However, not every
vport is loaded in switchdev mode. Such as ECPF if it's the eswitch manager.
In this case, ACL flow table is still needed.
To make it modularized, create ACL flow table for eswitch manager as
default and skip such operations when loading manager vport.
Also, there is no need to load the eswitch manager vport in switchdev mode.
This means there is no need to load it on regular connect-x HCAs where
the PF is the eswitch manager. This will avoid creating duplicate ACL
flow table for host PF vport.
Fixes: 29bcb6e4fe70 ("net/mlx5e: E-Switch, Use metadata for vport matching in send-to-vport rules") Fixes: eb8e9fae0a22 ("mlx5/core: E-Switch, Allocate ECPF vport if it's an eswitch manager") Fixes: 5019833d661f ("net/mlx5: E-switch, Introduce helper function to enable/disable vports") Signed-off-by: Bodong Wang <bodong@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jianbo Liu [Tue, 5 Sep 2023 17:48:45 +0000 (10:48 -0700)]
net/mlx5e: Clear mirred devices array if the rule is split
In the cited commit, the mirred devices are recorded and checked while
parsing the actions. In order to avoid system crash, the duplicate
action in a single rule is not allowed.
But the rule is actually break down into several FTEs in different
tables, for either mirroring, or the specified types of actions which
use post action infrastructure.
It will reject certain action list by mistake, for example:
actions:enp8s0f0_1,set(ipv4(ttl=63)),enp8s0f0_0,enp8s0f0_1.
Here the rule is split to two FTEs because of pedit action.
To fix this issue, when parsing the rule actions, reset if_count to
clear the mirred devices array if the rule is split to multiple
FTEs, and then the duplicate checking is restarted.
Fixes: 554fe75c1b3f ("net/mlx5e: Avoid duplicating rule destinations") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 039f50629b7f ("ip_tunnel: Move stats update to iptunnel_xmit()") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
team interface has used a dynamic lockdep key to avoid false-positive
lockdep deadlock detection. Virtual interfaces such as team usually
have their own lock for protecting private data.
These interfaces can be nested.
team0
|
team1
Each interface's lock is actually different(team0->lock and team1->lock).
So,
mutex_lock(&team0->lock);
mutex_lock(&team1->lock);
mutex_unlock(&team1->lock);
mutex_unlock(&team0->lock);
The above case is absolutely safe. But lockdep warns about deadlock.
Because the lockdep understands these two locks are same. This is a
false-positive lockdep warning.
So, in order to avoid this problem, the team interfaces started to use
dynamic lockdep key. The false-positive problem was fixed, but it
introduced a new problem.
When the new team virtual interface is created, it registers a dynamic
lockdep key(creates dynamic lockdep key) and uses it. But there is the
limitation of the number of lockdep keys.
So, If so many team interfaces are created, it consumes all lockdep keys.
Then, the lockdep stops to work and warns about it.
In order to fix this problem, team interfaces use the subclass instead
of the dynamic key. So, when a new team interface is created, it doesn't
register(create) a new lockdep, but uses existed subclass key instead.
It is already used by the bonding interface for a similar case.
As the bonding interface does, the subclass variable is the same as
the 'dev->nested_level'. This variable indicates the depth in the stacked
interface graph.
The 'dev->nested_level' is protected by RTNL and RCU.
So, 'mutex_lock_nested()' for 'team->lock' requires RTNL or RCU.
In the current code, 'team->lock' is usually acquired under RTNL, there is
no problem with using 'dev->nested_level'.
The 'team_nl_team_get()' and The 'lb_stats_refresh()' functions acquire
'team->lock' without RTNL.
But these don't iterate their own ports nested so they don't need nested
lock.
Reproducer:
for i in {0..1000}
do
ip link add team$i type team
ip link add dummy$i master team$i type dummy
ip link set dummy$i up
ip link set team$i up
done
Splat looks like:
BUG: MAX_LOCKDEP_ENTRIES too low!
turning off the locking correctness validator.
Please attach the output of /proc/lock_stat to the bug report
CPU: 0 PID: 4104 Comm: ip Not tainted 6.5.0-rc7+ #45
Call Trace:
<TASK>
dump_stack_lvl+0x64/0xb0
add_lock_to_list+0x30d/0x5e0
check_prev_add+0x73a/0x23a0
...
sock_def_readable+0xfe/0x4f0
netlink_broadcast+0x76b/0xac0
nlmsg_notify+0x69/0x1d0
dev_open+0xed/0x130
...
Reported-by: syzbot+9bbbacfbf1e04d5221f7@syzkaller.appspotmail.com Fixes: 369f61bee0f5 ("team: fix nested locking lockdep warning") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quan Tian [Tue, 5 Sep 2023 10:36:10 +0000 (10:36 +0000)]
net/ipv6: SKB symmetric hash should incorporate transport ports
__skb_get_hash_symmetric() was added to compute a symmetric hash over
the protocol, addresses and transport ports, by commit eb70db875671
("packet: Use symmetric hash for PACKET_FANOUT_HASH."). It uses
flow_keys_dissector_symmetric_keys as the flow_dissector to incorporate
IPv4 addresses, IPv6 addresses and ports. However, it should not specify
the flag as FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL, which stops further
dissection when an IPv6 flow label is encountered, making transport
ports not being incorporated in such case.
As a consequence, the symmetric hash is based on 5-tuple for IPv4 but
3-tuple for IPv6 when flow label is present. It caused a few problems,
e.g. when nft symhash and openvswitch l4_sym rely on the symmetric hash
to perform load balancing as different L4 flows between two given IPv6
addresses would always get the same symmetric hash, leading to uneven
traffic distribution.
Removing the use of FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL makes sure the
symmetric hash is based on 5-tuple for both IPv4 and IPv6 consistently.
Fixes: eb70db875671 ("packet: Use symmetric hash for PACKET_FANOUT_HASH.") Reported-by: Lars Ekman <uablrek@gmail.com> Closes: https://github.com/antrea-io/antrea/issues/5457 Signed-off-by: Quan Tian <qtian@vmware.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Olga Zaborska [Tue, 25 Jul 2023 08:10:58 +0000 (10:10 +0200)]
igb: Change IGB_MIN to allow set rx/tx value between 64 and 80
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx
value between 64 and 80. All igb devices can use as low as 64 descriptors.
This change will unify igb with other drivers.
Based on commit 7b1be1987c1e ("e1000e: lower ring minimum size to 64")
Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver") Signed-off-by: Olga Zaborska <olga.zaborska@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Olga Zaborska [Tue, 25 Jul 2023 08:10:57 +0000 (10:10 +0200)]
igbvf: Change IGBVF_MIN to allow set rx/tx value between 64 and 80
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx
value between 64 and 80. All igbvf devices can use as low as 64 descriptors.
This change will unify igbvf with other drivers.
Based on commit 7b1be1987c1e ("e1000e: lower ring minimum size to 64")
Fixes: d4e0fe01a38a ("igbvf: add new driver to support 82576 virtual functions") Signed-off-by: Olga Zaborska <olga.zaborska@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Olga Zaborska [Tue, 25 Jul 2023 08:10:56 +0000 (10:10 +0200)]
igc: Change IGC_MIN to allow set rx/tx value between 64 and 80
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx
value between 64 and 80. All igc devices can use as low as 64 descriptors.
This change will unify igc with other drivers.
Based on commit 7b1be1987c1e ("e1000e: lower ring minimum size to 64")
Fixes: 0507ef8a0372 ("igc: Add transmit and receive fastpath and interrupt handlers") Signed-off-by: Olga Zaborska <olga.zaborska@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
octeontx2-af: Fix truncation of smq in CN10K NIX AQ enqueue mbox handler
The smq value used in the CN10K NIX AQ instruction enqueue mailbox
handler was truncated to 9-bit value from 10-bit value because of
typecasting the CN10K mbox request structure to the CN9K structure.
Though this hasn't caused any problems when programming the NIX SQ
context to the HW because the context structure is the same size.
However, this causes a problem when accessing the structure parameters.
This patch reads the right smq value for each platform.
Fixes: 30077d210c83 ("octeontx2-af: cn10k: Update NIX/NPA context structure") Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 5 Sep 2023 04:23:38 +0000 (04:23 +0000)]
igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU
This is a follow up of commit 915d975b2ffa ("net: deal with integer
overflows in kmalloc_reserve()") based on David Laight feedback.
Back in 2010, I failed to realize malicious users could set dev->mtu
to arbitrary values. This mtu has been since limited to 0x7fffffff but
regardless of how big dev->mtu is, it makes no sense for igmpv3_newpack()
to allocate more than IP_MAX_MTU and risk various skb fields overflows.
Fixes: 57e1ab6eaddc ("igmp: refine skb allocations") Link: https://lore.kernel.org/netdev/d273628df80f45428e739274ab9ecb72@AcuMS.aculab.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: David Laight <David.Laight@ACULAB.COM> Cc: Kyle Zeng <zengyhkyle@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
It was trying to work around an issue at the crypto layer by excluding
ASYNC implementations of gcm(aes), because a bug in the AESNI version
caused reordering when some requests bypassed the cryptd queue while
older requests were still pending on the queue.
This was fixed by commit 38b2f68b4264 ("crypto: aesni - Fix cryptd
reordering problem on gcm"), which pre-dates ab046a5d4be4.
Herbert Xu confirmed that all ASYNC implementations are expected to
maintain the ordering of completions wrt requests, so we can use them
in MACsec.
On my test machine, this restores the performance of a single netperf
instance, from 1.4Gbps to 4.4Gbps.
valis [Fri, 1 Sep 2023 16:22:37 +0000 (12:22 -0400)]
net: sched: sch_qfq: Fix UAF in qfq_dequeue()
When the plug qdisc is used as a class of the qfq qdisc it could trigger a
UAF. This issue can be reproduced with following commands:
tc qdisc add dev lo root handle 1: qfq
tc class add dev lo parent 1: classid 1:1 qfq weight 1 maxpkt 512
tc qdisc add dev lo parent 1:1 handle 2: plug
tc filter add dev lo parent 1: basic classid 1:1
ping -c1 127.0.0.1
and boom:
[ 285.353793] BUG: KASAN: slab-use-after-free in qfq_dequeue+0xa7/0x7f0
[ 285.354910] Read of size 4 at addr ffff8880bad312a8 by task ping/144
[ 285.355903]
[ 285.356165] CPU: 1 PID: 144 Comm: ping Not tainted 6.5.0-rc3+ #4
[ 285.357112] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 285.358376] Call Trace:
[ 285.358773] <IRQ>
[ 285.359109] dump_stack_lvl+0x44/0x60
[ 285.359708] print_address_description.constprop.0+0x2c/0x3c0
[ 285.360611] kasan_report+0x10c/0x120
[ 285.361195] ? qfq_dequeue+0xa7/0x7f0
[ 285.361780] qfq_dequeue+0xa7/0x7f0
[ 285.362342] __qdisc_run+0xf1/0x970
[ 285.362903] net_tx_action+0x28e/0x460
[ 285.363502] __do_softirq+0x11b/0x3de
[ 285.364097] do_softirq.part.0+0x72/0x90
[ 285.364721] </IRQ>
[ 285.365072] <TASK>
[ 285.365422] __local_bh_enable_ip+0x77/0x90
[ 285.366079] __dev_queue_xmit+0x95f/0x1550
[ 285.366732] ? __pfx_csum_and_copy_from_iter+0x10/0x10
[ 285.367526] ? __pfx___dev_queue_xmit+0x10/0x10
[ 285.368259] ? __build_skb_around+0x129/0x190
[ 285.368960] ? ip_generic_getfrag+0x12c/0x170
[ 285.369653] ? __pfx_ip_generic_getfrag+0x10/0x10
[ 285.370390] ? csum_partial+0x8/0x20
[ 285.370961] ? raw_getfrag+0xe5/0x140
[ 285.371559] ip_finish_output2+0x539/0xa40
[ 285.372222] ? __pfx_ip_finish_output2+0x10/0x10
[ 285.372954] ip_output+0x113/0x1e0
[ 285.373512] ? __pfx_ip_output+0x10/0x10
[ 285.374130] ? icmp_out_count+0x49/0x60
[ 285.374739] ? __pfx_ip_finish_output+0x10/0x10
[ 285.375457] ip_push_pending_frames+0xf3/0x100
[ 285.376173] raw_sendmsg+0xef5/0x12d0
[ 285.376760] ? do_syscall_64+0x40/0x90
[ 285.377359] ? __static_call_text_end+0x136578/0x136578
[ 285.378173] ? do_syscall_64+0x40/0x90
[ 285.378772] ? kasan_enable_current+0x11/0x20
[ 285.379469] ? __pfx_raw_sendmsg+0x10/0x10
[ 285.380137] ? __sock_create+0x13e/0x270
[ 285.380673] ? __sys_socket+0xf3/0x180
[ 285.381174] ? __x64_sys_socket+0x3d/0x50
[ 285.381725] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 285.382425] ? __rcu_read_unlock+0x48/0x70
[ 285.382975] ? ip4_datagram_release_cb+0xd8/0x380
[ 285.383608] ? __pfx_ip4_datagram_release_cb+0x10/0x10
[ 285.384295] ? preempt_count_sub+0x14/0xc0
[ 285.384844] ? __list_del_entry_valid+0x76/0x140
[ 285.385467] ? _raw_spin_lock_bh+0x87/0xe0
[ 285.386014] ? __pfx__raw_spin_lock_bh+0x10/0x10
[ 285.386645] ? release_sock+0xa0/0xd0
[ 285.387148] ? preempt_count_sub+0x14/0xc0
[ 285.387712] ? freeze_secondary_cpus+0x348/0x3c0
[ 285.388341] ? aa_sk_perm+0x177/0x390
[ 285.388856] ? __pfx_aa_sk_perm+0x10/0x10
[ 285.389441] ? check_stack_object+0x22/0x70
[ 285.390032] ? inet_send_prepare+0x2f/0x120
[ 285.390603] ? __pfx_inet_sendmsg+0x10/0x10
[ 285.391172] sock_sendmsg+0xcc/0xe0
[ 285.391667] __sys_sendto+0x190/0x230
[ 285.392168] ? __pfx___sys_sendto+0x10/0x10
[ 285.392727] ? kvm_clock_get_cycles+0x14/0x30
[ 285.393328] ? set_normalized_timespec64+0x57/0x70
[ 285.393980] ? _raw_spin_unlock_irq+0x1b/0x40
[ 285.394578] ? __x64_sys_clock_gettime+0x11c/0x160
[ 285.395225] ? __pfx___x64_sys_clock_gettime+0x10/0x10
[ 285.395908] ? _copy_to_user+0x3e/0x60
[ 285.396432] ? exit_to_user_mode_prepare+0x1a/0x120
[ 285.397086] ? syscall_exit_to_user_mode+0x22/0x50
[ 285.397734] ? do_syscall_64+0x71/0x90
[ 285.398258] __x64_sys_sendto+0x74/0x90
[ 285.398786] do_syscall_64+0x64/0x90
[ 285.399273] ? exit_to_user_mode_prepare+0x1a/0x120
[ 285.399949] ? syscall_exit_to_user_mode+0x22/0x50
[ 285.400605] ? do_syscall_64+0x71/0x90
[ 285.401124] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 285.401807] RIP: 0033:0x495726
[ 285.402233] Code: ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 09
[ 285.404683] RSP: 002b:00007ffcc25fb618 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 285.405677] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 0000000000495726
[ 285.406628] RDX: 0000000000000040 RSI: 0000000002518750 RDI: 0000000000000000
[ 285.407565] RBP: 00000000005205ef R08: 00000000005f8838 R09: 000000000000001c
[ 285.408523] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000002517634
[ 285.409460] R13: 00007ffcc25fb6f0 R14: 0000000000000003 R15: 0000000000000000
[ 285.410403] </TASK>
[ 285.410704]
[ 285.410929] Allocated by task 144:
[ 285.411402] kasan_save_stack+0x1e/0x40
[ 285.411926] kasan_set_track+0x21/0x30
[ 285.412442] __kasan_slab_alloc+0x55/0x70
[ 285.412973] kmem_cache_alloc_node+0x187/0x3d0
[ 285.413567] __alloc_skb+0x1b4/0x230
[ 285.414060] __ip_append_data+0x17f7/0x1b60
[ 285.414633] ip_append_data+0x97/0xf0
[ 285.415144] raw_sendmsg+0x5a8/0x12d0
[ 285.415640] sock_sendmsg+0xcc/0xe0
[ 285.416117] __sys_sendto+0x190/0x230
[ 285.416626] __x64_sys_sendto+0x74/0x90
[ 285.417145] do_syscall_64+0x64/0x90
[ 285.417624] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 285.418306]
[ 285.418531] Freed by task 144:
[ 285.418960] kasan_save_stack+0x1e/0x40
[ 285.419469] kasan_set_track+0x21/0x30
[ 285.419988] kasan_save_free_info+0x27/0x40
[ 285.420556] ____kasan_slab_free+0x109/0x1a0
[ 285.421146] kmem_cache_free+0x1c2/0x450
[ 285.421680] __netif_receive_skb_core+0x2ce/0x1870
[ 285.422333] __netif_receive_skb_one_core+0x97/0x140
[ 285.423003] process_backlog+0x100/0x2f0
[ 285.423537] __napi_poll+0x5c/0x2d0
[ 285.424023] net_rx_action+0x2be/0x560
[ 285.424510] __do_softirq+0x11b/0x3de
[ 285.425034]
[ 285.425254] The buggy address belongs to the object at ffff8880bad31280
[ 285.425254] which belongs to the cache skbuff_head_cache of size 224
[ 285.426993] The buggy address is located 40 bytes inside of
[ 285.426993] freed 224-byte region [ffff8880bad31280, ffff8880bad31360)
[ 285.428572]
[ 285.428798] The buggy address belongs to the physical page:
[ 285.429540] page:00000000f4b77674 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xbad31
[ 285.430758] flags: 0x100000000000200(slab|node=0|zone=1)
[ 285.431447] page_type: 0xffffffff()
[ 285.431934] raw: 0100000000000200ffff88810094a8c0dead0000000001220000000000000000
[ 285.432757] raw: 000000000000000000000000800c000c00000001ffffffff0000000000000000
[ 285.433562] page dumped because: kasan: bad access detected
[ 285.434144]
[ 285.434320] Memory state around the buggy address:
[ 285.434828] ffff8880bad31180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 285.435580] ffff8880bad31200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 285.436264] >ffff8880bad31280: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 285.436777] ^
[ 285.437106] ffff8880bad31300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 285.437616] ffff8880bad31380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 285.438126] ==================================================================
[ 285.438662] Disabling lock debugging due to kernel taint
Fix this by:
1. Changing sch_plug's .peek handler to qdisc_peek_dequeued(), a
function compatible with non-work-conserving qdiscs
2. Checking the return value of qdisc_dequeue_peeked() in sch_qfq.
Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost") Reported-by: valis <sec@valis.email> Signed-off-by: valis <sec@valis.email> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20230901162237.11525-1-jhs@mojatatu.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
As with sk->sk_shutdown shown in the previous patch, sk->sk_err can be
read locklessly by unix_dgram_sendmsg().
Let's use READ_ONCE() for sk_err as well.
Note that the writer side is marked by commit cc04410af7de ("af_unix:
annotate lockless accesses to sk->sk_err").
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sk->sk_shutdown is changed under unix_state_lock(sk), but
unix_dgram_sendmsg() calls two functions to read sk_shutdown locklessly.
sock_alloc_send_pskb
`- sock_wait_for_wmem
Let's use READ_ONCE() there.
Note that the writer side was marked by commit e1d09c2c2f57 ("af_unix:
Fix data races around sk->sk_shutdown.").
BUG: KCSAN: data-race in sock_alloc_send_pskb / unix_release_sock
write (marked) to 0xffff8880069af12c of 1 bytes by task 1 on cpu 1:
unix_release_sock+0x75c/0x910 net/unix/af_unix.c:631
unix_release+0x59/0x80 net/unix/af_unix.c:1053
__sock_release+0x7d/0x170 net/socket.c:654
sock_close+0x19/0x30 net/socket.c:1386
__fput+0x2a3/0x680 fs/file_table.c:384
____fput+0x15/0x20 fs/file_table.c:412
task_work_run+0x116/0x1a0 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297
do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
read to 0xffff8880069af12c of 1 bytes by task 28650 on cpu 0:
sock_alloc_send_pskb+0xd2/0x620 net/core/sock.c:2767
unix_dgram_sendmsg+0x2f8/0x14f0 net/unix/af_unix.c:1944
unix_seqpacket_sendmsg net/unix/af_unix.c:2308 [inline]
unix_seqpacket_sendmsg+0xba/0x130 net/unix/af_unix.c:2292
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg+0x148/0x160 net/socket.c:748
____sys_sendmsg+0x4e4/0x610 net/socket.c:2494
___sys_sendmsg+0xc6/0x140 net/socket.c:2548
__sys_sendmsg+0x94/0x140 net/socket.c:2577
__do_sys_sendmsg net/socket.c:2586 [inline]
__se_sys_sendmsg net/socket.c:2584 [inline]
__x64_sys_sendmsg+0x45/0x50 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
value changed: 0x00 -> 0x03
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 28650 Comm: systemd-coredum Not tainted 6.4.0-11989-g6843306689af #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
unix_tot_inflight is changed under spin_lock(unix_gc_lock), but
unix_release_sock() reads it locklessly.
Let's use READ_ONCE() for unix_tot_inflight.
Note that the writer side was marked by commit 9d6d7f1cb67c ("af_unix:
annote lockless accesses to unix_tot_inflight & gc_in_progress")
BUG: KCSAN: data-race in unix_inflight / unix_release_sock
write (marked) to 0xffffffff871852b8 of 4 bytes by task 123 on cpu 1:
unix_inflight+0x130/0x180 net/unix/scm.c:64
unix_attach_fds+0x137/0x1b0 net/unix/scm.c:123
unix_scm_to_skb net/unix/af_unix.c:1832 [inline]
unix_dgram_sendmsg+0x46a/0x14f0 net/unix/af_unix.c:1955
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x148/0x160 net/socket.c:747
____sys_sendmsg+0x4e4/0x610 net/socket.c:2493
___sys_sendmsg+0xc6/0x140 net/socket.c:2547
__sys_sendmsg+0x94/0x140 net/socket.c:2576
__do_sys_sendmsg net/socket.c:2585 [inline]
__se_sys_sendmsg net/socket.c:2583 [inline]
__x64_sys_sendmsg+0x45/0x50 net/socket.c:2583
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
read to 0xffffffff871852b8 of 4 bytes by task 4891 on cpu 0:
unix_release_sock+0x608/0x910 net/unix/af_unix.c:671
unix_release+0x59/0x80 net/unix/af_unix.c:1058
__sock_release+0x7d/0x170 net/socket.c:653
sock_close+0x19/0x30 net/socket.c:1385
__fput+0x179/0x5e0 fs/file_table.c:321
____fput+0x15/0x20 fs/file_table.c:349
task_work_run+0x116/0x1a0 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297
do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc
value changed: 0x00000000 -> 0x00000001
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 4891 Comm: systemd-coredum Not tainted 6.4.0-rc5-01219-gfa0e21fa4443 #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Fixes: 9305cfa4443d ("[AF_UNIX]: Make unix_tot_inflight counter non-atomic") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
af_unix: Fix msg_controllen test in scm_pidfd_recv() for MSG_CMSG_COMPAT.
Heiko Carstens reported that SCM_PIDFD does not work with MSG_CMSG_COMPAT
because scm_pidfd_recv() always checks msg_controllen against sizeof(struct
cmsghdr).
We need to use sizeof(struct compat_cmsghdr) for the compat case.
Fixes: 5e2ff6704a27 ("scm: add SO_PASSPIDFD and SCM_PIDFD") Reported-by: Heiko Carstens <hca@linux.ibm.com> Closes: https://lore.kernel.org/netdev/20230901200517.8742-A-hca@linux.ibm.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Tested-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 1 Sep 2023 21:17:18 +0000 (14:17 -0700)]
docs: netdev: update the netdev infra URLs
Some corporate proxies block our current NIPA URLs because
they use a free / shady DNS domain. As suggested by Jesse
we got a new DNS entry from Konstantin - netdev.bots.linux.dev,
use it.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net: phy: micrel: Correct bit assignments for phy_device flags
Previously, the defines for phy_device flags in the Micrel driver were
ambiguous in their representation. They were intended to be bit masks
but were mistakenly defined as bit positions. This led to the following
issues:
- MICREL_KSZ8_P1_ERRATA, designated for KSZ88xx switches, overlapped
with MICREL_PHY_FXEN and MICREL_PHY_50MHZ_CLK.
- Due to this overlap, the code path for MICREL_PHY_FXEN, tailored for
the KSZ8041 PHY, was not executed for KSZ88xx PHYs.
- Similarly, the code associated with MICREL_PHY_50MHZ_CLK wasn't
triggered for KSZ88xx.
To rectify this, all three flags have now been explicitly converted to
use the `BIT()` macro, ensuring they are defined as bit masks and
preventing potential overlaps in the future.
Fixes: 49011e0c1555 ("net: phy: micrel: ksz886x/ksz8081: add cabletest support") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Henrie [Fri, 1 Sep 2023 04:41:27 +0000 (22:41 -0600)]
net: ipv6/addrconf: avoid integer underflow in ipv6_create_tempaddr
The existing code incorrectly casted a negative value (the result of a
subtraction) to an unsigned value without checking. For example, if
/proc/sys/net/ipv6/conf/*/temp_prefered_lft was set to 1, the preferred
lifetime would jump to 4 billion seconds. On my machine and network the
shortest lifetime that avoided underflow was 3 seconds.
Fixes: 76506a986dc3 ("IPv6: fix DESYNC_FACTOR") Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
veth: Fixing transmit return status for dropped packets
The veth_xmit function returns NETDEV_TX_OK even when packets are dropped.
This behavior leads to incorrect calculations of statistics counts, as
well as things like txq->trans_start updates.
Fixes: e314dbdc1c0d ("[NET]: Virtual ethernet device driver.") Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 9b8dd5e5ea48 ("gve: DQO: Add RX path") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Bailey Forrest <bcf@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Catherine Sullivan <csully@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
to:
size = kmalloc_size_roundup(size);
ptr = kmalloc(size);
This allowed various crash as reported by syzbot [1]
and Kyle Zeng.
Problem is that if @size is bigger than 0x80000001,
kmalloc_size_roundup(size) returns 2^32.
kmalloc_reserve() uses a 32bit variable (obj_size),
so 2^32 is truncated to 0.
kmalloc(0) returns ZERO_SIZE_PTR which is not handled by
skb allocations.
Following trace can be triggered if a netdev->mtu is set
close to 0x7fffffff
We might in the future limit netdev->mtu to more sensible
limit (like KMALLOC_MAX_SIZE).
This patch is based on a syzbot report, and also a report
and tentative fix from Kyle Zeng.
[1]
BUG: KASAN: user-memory-access in __build_skb_around net/core/skbuff.c:294 [inline]
BUG: KASAN: user-memory-access in __alloc_skb+0x3c4/0x6e8 net/core/skbuff.c:527
Write of size 32 at addr 00000000fffffd10 by task syz-executor.4/22554
Corinna Vinschen [Thu, 31 Aug 2023 12:19:13 +0000 (14:19 +0200)]
igb: disable virtualization features on 82580
Disable virtualization features on 82580 just as on i210/i211.
This avoids that virt functions are acidentally called on 82850.
Fixes: 55cac248caa4 ("igb: Add full support for 82580 devices") Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Thu, 31 Aug 2023 16:58:11 +0000 (17:58 +0100)]
sfc: check for zero length in EF10 RX prefix
When EF10 RXDP firmware is operating in cut-through mode, packet length
is not known at the time the RX prefix is generated, so it is left as
zero and RX event merging is inhibited to ensure that the length is
available in the RX event. However, it has been found that in certain
circumstances the RX events for these packets still get merged,
meaning the driver cannot read the length from the RX event, and tries
to use the length from the prefix.
The resulting zero-length SKBs cause crashes in GRO since commit 1d11fa696733 ("net-gro: remove GRO_DROP"), so add a check to the driver
to detect these zero-length RX events and discard the packet.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 1 Sep 2023 07:11:51 +0000 (08:11 +0100)]
Merge branch 'dst-hint-multipath'
Sriram Yagnaraman says:
====================
Avoid TCP resets when using ECMP for load-balancing between multiple servers.
All packets in the same flow (L3/L4 depending on multipath hash policy)
should be directed to the same target, but after [0]/[1] we see stray
packets directed towards other targets. This, for instance, causes RST
to be sent on TCP connections.
The first two patches solve the problem by ignoring route hints for
destinations that are part of multipath group, by using new SKB flags
for IPv4 and IPv6. The third patch is a selftest that tests the
scenario.
Thanks to Ido, for reviewing and suggesting a way forward in [2] and
also suggesting how to write a selftest for this.
v4->v5:
- Fixed review comments from Ido
v3->v4:
- Remove single path test
- Rebase to latest
v2->v3:
- Add NULL check for skb in fib6_select_path (Ido Schimmel)
- Use fib_tests.sh for selftest instead of the forwarding suite (Ido
Schimmel)
v1->v2:
- Update to commit messages describing the solution (Ido Schimmel)
- Use perf stat to count fib table lookups in selftest (Ido Schimmel)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
selftests: fib_tests: Add multipath list receive tests
The test uses perf stat to count the number of fib:fib_table_lookup
tracepoint hits for IPv4 and the number of fib6:fib6_table_lookup for
IPv6. The measured count is checked to be within 5% of the total number
of packets sent via veth1.
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Signed-off-by: David S. Miller <davem@davemloft.net>
Route hints when the nexthop is part of a multipath group causes packets
in the same receive batch to be sent to the same nexthop irrespective of
the multipath hash of the packet. So, do not extract route hint for
packets whose destination is part of a multipath group.
A new SKB flag IP6SKB_MULTIPATH is introduced for this purpose, set the
flag when route is looked up in fib6_select_path() and use it in
ip6_can_use_hint() to check for the existence of the flag.
Fixes: 197dbf24e360 ("ipv6: introduce and uses route look hints for list input.") Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Route hints when the nexthop is part of a multipath group causes packets
in the same receive batch to be sent to the same nexthop irrespective of
the multipath hash of the packet. So, do not extract route hint for
packets whose destination is part of a multipath group.
A new SKB flag IPSKB_MULTIPATH is introduced for this purpose, set the
flag when route is looked up in ip_mkroute_input() and use it in
ip_extract_route_hint() to check for the existence of the flag.
Fixes: 02b24941619f ("ipv4: use dst hint for ipv4 list receive") Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
skbuff: skb_segment, Call zero copy functions before using skbuff frags
Commit bf5c25d60861 ("skbuff: in skb_segment, call zerocopy functions
once per nskb") added the call to zero copy functions in skb_segment().
The change introduced a bug in skb_segment() because skb_orphan_frags()
may possibly change the number of fragments or allocate new fragments
altogether leaving nrfrags and frag to point to the old values. This can
cause a panic with stacktrace like the one below.
In this case calling skb_orphan_frags() updated nr_frags leaving nrfrags
local variable in skb_segment() stale. This resulted in the code hitting
i >= nrfrags prematurely and trying to move to next frag_skb using
list_skb pointer, which was NULL, and caused kernel panic. Move the call
to zero copy functions before using frags and nr_frags.
Fixes: bf5c25d60861 ("skbuff: in skb_segment, call zerocopy functions once per nskb") Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com> Reported-by: Amit Goyal <agoyal@purestorage.com> Cc: stable@vger.kernel.org Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 31 Aug 2023 13:52:12 +0000 (13:52 +0000)]
net: annotate data-races around sk->sk_bind_phc
sk->sk_bind_phc is read locklessly. Add corresponding annotations.
Fixes: d463126e23f1 ("net: sock: extend SO_TIMESTAMPING for PHC binding") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 31 Aug 2023 13:52:11 +0000 (13:52 +0000)]
net: annotate data-races around sk->sk_tsflags
sk->sk_tsflags can be read locklessly, add corresponding annotations.
Fixes: b9f40e21ef42 ("net-timestamp: move timestamp flags out of sk_flags") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 31 Aug 2023 13:52:10 +0000 (13:52 +0000)]
mptcp: annotate data-races around msk->rmem_fwd_alloc
msk->rmem_fwd_alloc can be read locklessly.
Add mptcp_rmem_fwd_alloc_add(), similar to sk_forward_alloc_add(),
and appropriate READ_ONCE()/WRITE_ONCE() annotations.
Fixes: 6511882cdd82 ("mptcp: allocate fwd memory separately on the rx and tx path") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 31 Aug 2023 13:52:08 +0000 (13:52 +0000)]
net: use sk_forward_alloc_get() in sk_get_meminfo()
inet_sk_diag_fill() has been changed to use sk_forward_alloc_get(),
but sk_get_meminfo() was forgotten.
Fixes: 292e6077b040 ("net: introduce sk_forward_alloc_get()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Björn Töpel [Thu, 31 Aug 2023 16:29:54 +0000 (18:29 +0200)]
selftests/bpf: Include build flavors for install target
When using the "install" or targets depending on install, e.g. "gen_tar",
the BPF machine flavors weren't included.
A command like:
| make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- O=/workspace/kbuild \
| HOSTCC=gcc FORMAT= SKIP_TARGETS="arm64 ia64 powerpc sparc64 x86 sgx" \
| -C tools/testing/selftests gen_tar
would not include bpf/no_alu32, bpf/cpuv4, or bpf/bpf-gcc.
Include the BPF machine flavors for "install" make target.
Daniel Borkmann [Tue, 29 Aug 2023 20:53:52 +0000 (22:53 +0200)]
bpf: Annotate bpf_long_memcpy with data_race
syzbot reported a data race splat between two processes trying to
update the same BPF map value via syscall on different CPUs:
BUG: KCSAN: data-race in bpf_percpu_array_update / bpf_percpu_array_update
write to 0xffffe8fffe7425d8 of 8 bytes by task 8257 on cpu 1:
bpf_long_memcpy include/linux/bpf.h:428 [inline]
bpf_obj_memcpy include/linux/bpf.h:441 [inline]
copy_map_value_long include/linux/bpf.h:464 [inline]
bpf_percpu_array_update+0x3bb/0x500 kernel/bpf/arraymap.c:380
bpf_map_update_value+0x190/0x370 kernel/bpf/syscall.c:175
generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1749
bpf_map_do_batch+0x2df/0x3d0 kernel/bpf/syscall.c:4648
__sys_bpf+0x28a/0x780
__do_sys_bpf kernel/bpf/syscall.c:5241 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5239 [inline]
__x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5239
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
write to 0xffffe8fffe7425d8 of 8 bytes by task 8268 on cpu 0:
bpf_long_memcpy include/linux/bpf.h:428 [inline]
bpf_obj_memcpy include/linux/bpf.h:441 [inline]
copy_map_value_long include/linux/bpf.h:464 [inline]
bpf_percpu_array_update+0x3bb/0x500 kernel/bpf/arraymap.c:380
bpf_map_update_value+0x190/0x370 kernel/bpf/syscall.c:175
generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1749
bpf_map_do_batch+0x2df/0x3d0 kernel/bpf/syscall.c:4648
__sys_bpf+0x28a/0x780
__do_sys_bpf kernel/bpf/syscall.c:5241 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5239 [inline]
__x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5239
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x0000000000000000 -> 0xfffffff000002788
The bpf_long_memcpy is used with 8-byte aligned pointers, power-of-8 size
and forced to use long read/writes to try to atomically copy long counters.
It is best-effort only and no barriers are here since it _will_ race with
concurrent updates from BPF programs. The bpf_long_memcpy() is called from
bpf(2) syscall. Marco suggested that the best way to make this known to
KCSAN would be to use data_race() annotation.
Jiri Olsa [Thu, 31 Aug 2023 14:11:03 +0000 (16:11 +0200)]
selftests/bpf: Fix d_path test
Recent commit [1] broke d_path test, because now filp_close is not called
directly from sys_close, but eventually later when the file is finally
released.
As suggested by Hou Tao we don't need to re-hook the bpf program, but just
instead we can use sys_close_range to trigger filp_close synchronously.
[1] 021a160abf62 ("fs: use __fput_sync in close(2)")
Vishal Chourasia [Tue, 29 Aug 2023 07:49:31 +0000 (13:19 +0530)]
bpf, docs: Fix invalid escape sequence warnings in bpf_doc.py
The script bpf_doc.py generates multiple SyntaxWarnings related to invalid
escape sequences when executed with Python 3.12. These warnings do not appear
in Python 3.10 and 3.11 and do not affect the kernel build, which completes
successfully.
This patch resolves these SyntaxWarnings by converting the relevant string
literals to raw strings or by escaping backslashes. This ensures that
backslashes are interpreted as literal characters, eliminating the warnings.
Magnus Karlsson [Thu, 31 Aug 2023 10:01:17 +0000 (12:01 +0200)]
xsk: Fix xsk_diag use-after-free error during socket cleanup
Fix a use-after-free error that is possible if the xsk_diag interface
is used after the socket has been unbound from the device. This can
happen either due to the socket being closed or the device
disappearing. In the early days of AF_XDP, the way we tested that a
socket was not bound to a device was to simply check if the netdevice
pointer in the xsk socket structure was NULL. Later, a better system
was introduced by having an explicit state variable in the xsk socket
struct. For example, the state of a socket that is on the way to being
closed and has been unbound from the device is XSK_UNBOUND.
The commit in the Fixes tag below deleted the old way of signalling
that a socket is unbound, setting dev to NULL. This in the belief that
all code using the old way had been exterminated. That was
unfortunately not true as the xsk diagnostics code was still using the
old way and thus does not work as intended when a socket is going
down. Fix this by introducing a test against the state variable. If
the socket is in the state XSK_UNBOUND, simply abort the diagnostic's
netlink operation.
Fixes: 18b1ab7aa76b ("xsk: Fix race at socket teardown") Reported-by: syzbot+822d1359297e2694f873@syzkaller.appspotmail.com Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: syzbot+822d1359297e2694f873@syzkaller.appspotmail.com Tested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20230831100119.17408-1-magnus.karlsson@gmail.com
Florian Westphal [Wed, 30 Aug 2023 11:00:37 +0000 (13:00 +0200)]
net: fib: avoid warn splat in flow dissector
New skbs allocated via nf_send_reset() have skb->dev == NULL.
fib*_rules_early_flow_dissect helpers already have a 'struct net'
argument but its not passed down to the flow dissector core, which
will then WARN as it can't derive a net namespace to use:
read to 0xffff888150f31744 of 1 bytes by task 21839 on cpu 1:
fib_table_lookup+0x2bf/0xd50 net/ipv4/fib_trie.c:1585
fib_lookup include/net/ip_fib.h:383 [inline]
ip_route_output_key_hash_rcu+0x38c/0x12c0 net/ipv4/route.c:2751
ip_route_output_key_hash net/ipv4/route.c:2641 [inline]
__ip_route_output_key include/net/route.h:134 [inline]
ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2869
send4+0x1e7/0x500 drivers/net/wireguard/socket.c:61
wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175
wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200
wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51
process_one_work+0x434/0x860 kernel/workqueue.c:2600
worker_thread+0x5f2/0xa10 kernel/workqueue.c:2751
kthread+0x1d7/0x210 kernel/kthread.c:389
ret_from_fork+0x2e/0x40 arch/x86/kernel/process.c:145
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
value changed: 0x00 -> 0x01
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 21839 Comm: kworker/u4:18 Tainted: G W 6.5.0-syzkaller #0
Fixes: dccd9ecc3744 ("ipv4: Do not use dead fib_info entries.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230830095520.1046984-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
read to 0xffff888149d77810 of 4 bytes by task 17828 on cpu 1:
sctp_writeable net/sctp/socket.c:9304 [inline]
sctp_poll+0x265/0x410 net/sctp/socket.c:8671
sock_poll+0x253/0x270 net/socket.c:1374
vfs_poll include/linux/poll.h:88 [inline]
do_pollfd fs/select.c:873 [inline]
do_poll fs/select.c:921 [inline]
do_sys_poll+0x636/0xc00 fs/select.c:1015
__do_sys_ppoll fs/select.c:1121 [inline]
__se_sys_ppoll+0x1af/0x1f0 fs/select.c:1101
__x64_sys_ppoll+0x67/0x80 fs/select.c:1101
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x00019e80 -> 0x0000cc80
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 17828 Comm: syz-executor.1 Not tainted 6.5.0-rc7-syzkaller-00185-g28f20a19294d #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20230830094519.950007-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Eric Dumazet [Tue, 29 Aug 2023 12:35:41 +0000 (12:35 +0000)]
net/sched: fq_pie: avoid stalls in fq_pie_timer()
When setting a high number of flows (limit being 65536),
fq_pie_timer() is currently using too much time as syzbot reported.
Add logic to yield the cpu every 2048 flows (less than 150 usec
on debug kernels).
It should also help by not blocking qdisc fast paths for too long.
Worst case (65536 flows) would need 31 jiffies for a complete scan.
net: stmmac: failure to probe without MAC interface specified
Alexander Stein reports that commit a014c35556b9 ("net: stmmac: clarify
difference between "interface" and "phy_interface"") caused breakage,
because plat->mac_interface will never be negative. Fix this by using
the "rc" temporary variable in stmmac_probe_config_dt().
Reported-by: Alexander Stein <alexander.stein@ew.tq-group.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com> Link: https://lore.kernel.org/r/E1qayn0-006Q8J-GE@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Phil Sutter [Tue, 29 Aug 2023 17:51:58 +0000 (19:51 +0200)]
netfilter: nf_tables: Audit log rule reset
Resetting rules' stateful data happens outside of the transaction logic,
so 'get' and 'dump' handlers have to emit audit log entries themselves.
Fixes: 8daa8fde3fc3f ("netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESET") Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Tue, 29 Aug 2023 17:51:57 +0000 (19:51 +0200)]
netfilter: nf_tables: Audit log setelem reset
Since set element reset is not integrated into nf_tables' transaction
logic, an explicit log call is needed, similar to NFT_MSG_GETOBJ_RESET
handling.
For the sake of simplicity, catchall element reset will always generate
a dedicated log entry. This relieves nf_tables_dump_set() from having to
adjust the logged element count depending on whether a catchall element
was found or not.
Fixes: 079cd633219d7 ("netfilter: nf_tables: Introduce NFT_MSG_GETSETELEM_RESET") Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The xt_u32 module doesn't validate the fields in the xt_u32 structure.
An attacker may take advantage of this to trigger an OOB read by setting
the size fields with a value beyond the arrays boundaries.
Add a checkentry function to validate the structure.
This was originally reported by the ZDI project (ZDI-CAN-18408).
Fixes: 1b50b8a371e9 ("[NETFILTER]: Add u32 match") Cc: stable@vger.kernel.org Signed-off-by: Wander Lairson Costa <wander@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
David Vernet [Mon, 28 Aug 2023 15:59:48 +0000 (10:59 -0500)]
bpf, docs: s/eBPF/BPF in standards documents
There isn't really anything other than just "BPF" at this point,
so referring to it as "eBPF" in our standards document just causes
unnecessary confusion. Let's just be consistent and use "BPF".
David Vernet [Mon, 28 Aug 2023 15:59:47 +0000 (10:59 -0500)]
bpf, docs: Add abi.rst document to standardization subdirectory
As specified in the IETF BPF charter, the BPF working group has plans to
add one or more informational documents that recommend conventions and
guidelines for producing portable BPF program binaries. The
instruction-set.rst document currently contains a "Registers and calling
convention" subsection which dictates a calling convention that belongs
in an ABI document, rather than an instruction set document. Let's move
it to a new abi.rst document so we can clean it up. The abi.rst document
will of course be significantly changed and expanded upon over time. For
now, it's really just a placeholder which will contain ABI-specific
language that doesn't belong in other documents.
David Vernet [Mon, 28 Aug 2023 15:59:46 +0000 (10:59 -0500)]
bpf, docs: Move linux-notes.rst to root bpf docs tree
In commit 4d496be9ca05 ("bpf,docs: Create new standardization
subdirectory"), I added a standardization/ directory to the BPF
documentation, which will contain the docs that will be standardized
as part of the effort with the IETF.
I included linux-notes.rst in that directory, but I shouldn't have. It
doesn't contain anything that will be standardized. Let's move it back
to Documentation/bpf.
commit edf391ff1723 ("snmp: add missing counters for RFC 4293") had
already added OutOctets for RFC 4293. In commit 2d8dbb04c63e ("snmp: fix
OutOctets counter to include forwarded datagrams"), OutOctets was
counted again, but not removed from ip_output().
According to RFC 4293 "3.2.3. IP Statistics Tables",
ipipIfStatsOutTransmits is not equal to ipIfStatsOutForwDatagrams. So
"IPSTATS_MIB_OUTOCTETS must be incremented when incrementing" is not
accurate. And IPSTATS_MIB_OUTOCTETS should be counted after fragment.
This patch reverts commit 2d8dbb04c63e ("snmp: fix OutOctets counter to
include forwarded datagrams") and move IPSTATS_MIB_OUTOCTETS to
ip_finish_output2 for ipv4.
Reviewed-by: Filip Pudak <filip.pudak@windriver.com> Signed-off-by: Heng Guo <heng.guo@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Wed, 30 Aug 2023 05:35:17 +0000 (22:35 -0700)]
bpf, sockmap: Fix preempt_rt splat when using raw_spin_lock_t
Sockmap and sockhash maps are a collection of psocks that are
objects representing a socket plus a set of metadata needed
to manage the BPF programs associated with the socket. These
maps use the stab->lock to protect from concurrent operations
on the maps, e.g. trying to insert to objects into the array
at the same time in the same slot. Additionally, a sockhash map
has a bucket lock to protect iteration and insert/delete into
the hash entry.
Each psock has a psock->link which is a linked list of all the
maps that a psock is attached to. This allows a psock (socket)
to be included in multiple sockmap and sockhash maps. This
linked list is protected the psock->link_lock.
They _must_ be nested correctly to avoid deadlock:
lock(stab->lock)
: do BPF map operations and psock insert/delete
lock(psock->link_lock)
: add map to psock linked list of maps
unlock(psock->link_lock)
unlock(stab->lock)
For non PREEMPT_RT kernels both raw_spin_lock_t and spin_lock_t
are guaranteed to not sleep. But, with PREEMPT_RT kernels the
spin_lock_t variants may sleep. In the current code we have
many patterns like this:
rcu_critical_section:
raw_spin_lock(stab->lock)
spin_lock(psock->link_lock) <- may sleep ouch
spin_unlock(psock->link_lock)
raw_spin_unlock(stab->lock)
rcu_critical_section
Nesting spin_lock() inside a raw_spin_lock() violates locking
rules for PREEMPT_RT kernels. And additionally we do alloc(GFP_ATOMICS)
inside the stab->lock, but those might sleep on PREEMPT_RT kernels.
The result is splats like this:
./test_progs -t sockmap_basic
[ 33.344330] bpf_testmod: loading out-of-tree module taints kernel.
[ 33.441933]
[ 33.442089] =============================
[ 33.442421] [ BUG: Invalid wait context ]
[ 33.442763] 6.5.0-rc5-01731-gec0ded2e0282 #4958 Tainted: G O
[ 33.443320] -----------------------------
[ 33.443624] test_progs/2073 is trying to lock:
[ 33.443960] ffff888102a1c290 (&psock->link_lock){....}-{3:3}, at: sock_map_update_common+0x2c2/0x3d0
[ 33.444636] other info that might help us debug this:
[ 33.444991] context-{5:5}
[ 33.445183] 3 locks held by test_progs/2073:
[ 33.445498] #0: ffff88811a208d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: sock_map_update_elem_sys+0xff/0x330
[ 33.446159] #1: ffffffff842539e0 (rcu_read_lock){....}-{1:3}, at: sock_map_update_elem_sys+0xf5/0x330
[ 33.446809] #2: ffff88810d687240 (&stab->lock){+...}-{2:2}, at: sock_map_update_common+0x177/0x3d0
[ 33.447445] stack backtrace:
[ 33.447655] CPU: 10 PID
To fix observe we can't readily remove the allocations (for that
we would need to use/create something similar to bpf_map_alloc). So
convert raw_spin_lock_t to spin_lock_t. We note that sock_map_update
that would trigger the allocate and potential sleep is only allowed
through sys_bpf ops and via sock_ops which precludes hw interrupts
and low level atomic sections in RT preempt kernel. On non RT
preempt kernel there are no changes here and spin locks sections
and alloc(GFP_ATOMIC) are still not sleepable.
Eduard Zingerman [Sat, 26 Aug 2023 22:29:12 +0000 (01:29 +0300)]
docs/bpf: Add description for CO-RE relocations
Add a section on CO-RE relocations to llvm_relo.rst. Describe relevant .BTF.ext
structure, `enum bpf_core_relo_kind` and `struct bpf_core_relo` in some detail.
Finally, I decided to do some investigation since the test is introduced
by myself. It turns out the reason is due to cgroup_fd with value 0.
In cgroup_iter, a cgroup_fd of value 0 means the root cgroup.
/* from cgroup_iter.c */
if (fd)
cgrp = cgroup_v1v2_get_from_fd(fd);
else if (id)
cgrp = cgroup_get_from_id(id);
else /* walk the entire hierarchy by default. */
cgrp = cgroup_get_from_path("/");
That is why we got cgroup_id 1 instead of expected 2812.
Why we got a cgroup_fd 0? Nobody should really touch 'stdin' (fd 0) in
test_progs. I traced 'close' syscall with stack trace and found the root
cause, which is a bug in bpf_obj_pinning.c. Basically, the code closed
fd 0 although it should not. Fixing the bug in bpf_obj_pinning.c also
resolved the above cgroup_iter_sleepable subtest failure.
Tirthendu Sarkar [Wed, 23 Aug 2023 14:47:13 +0000 (20:17 +0530)]
xsk: Fix xsk_build_skb() error: 'skb' dereferencing possible ERR_PTR()
Currently, xsk_build_skb() is a function that builds skb in two possible
ways and then is ended with common error handling.
We can distinguish four possible error paths and handling in xsk_build_skb():
1. sock_alloc_send_skb fails: Retry (skb is NULL).
2. skb_store_bits fails : Free skb and retry.
3. MAX_SKB_FRAGS exceeded: Free skb, cleanup and drop packet.
4. alloc_page fails for frag: Retry page allocation w/o freeing skb
1] and 3] can happen in xsk_build_skb_zerocopy(), which is one of the
two code paths responsible for building skb. Common error path in
xsk_build_skb() assumes that in case errno != -EAGAIN, skb is a valid
pointer, which is wrong as kernel test robot reports that in
xsk_build_skb_zerocopy() other errno values are returned for skb being
NULL.
To fix this, set -EOVERFLOW as error when MAX_SKB_FRAGS are exceeded
and packet needs to be dropped in both xsk_build_skb() and
xsk_build_skb_zerocopy() and use this to distinguish against all other
error cases. Also, add explicit kfree_skb() for 3] so that handling
of 1], 2], and 3] becomes identical where allocation needs to be retried.
Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Closes: https://lore.kernel.org/r/202307210434.OjgqFcbB-lkp@intel.com Link: https://lore.kernel.org/bpf/20230823144713.2231808-1-tirthendu.sarkar@intel.com
Yafang Shao [Wed, 30 Aug 2023 03:03:25 +0000 (03:03 +0000)]
bpftool: Fix build warnings with -Wtype-limits
Quentin reported build warnings when building bpftool :
link.c: In function ‘perf_config_hw_cache_str’:
link.c:86:18: warning: comparison of unsigned expression in ‘>= 0’ is always true [-Wtype-limits]
86 | if ((id) >= 0 && (id) < ARRAY_SIZE(array)) \
| ^~
link.c:320:20: note: in expansion of macro ‘perf_event_name’
320 | hw_cache = perf_event_name(evsel__hw_cache, config & 0xff);
| ^~~~~~~~~~~~~~~
[... more of the same for the other calls to perf_event_name ...]
He also pointed out the reason and the solution:
We're always passing unsigned, so it should be safe to drop the check on
(id) >= 0.
[...]
kprobe_multi_test_run:FAIL:kprobe_test7_result unexpected kprobe_test7_result: actual 0 != expected 1
[...]
kprobe_multi_test_run:FAIL:kretprobe_test7_result unexpected kretprobe_test7_result: actual 0 != expected 1
clang17 does not have this issue.
Further investigation shows that kernel func bpf_fentry_test7(), used in
the above tests, is inlined by the compiler although it is marked as
noinline.
int noinline bpf_fentry_test7(struct bpf_fentry_test_t *arg)
{
return (long)arg;
}
It is known that for simple functions like the above (e.g. just returning
a constant or an input argument), the clang compiler may still do inlining
for a noinline function. Adding 'asm volatile ("")' in the beginning of the
bpf_fentry_test7() can prevent inlining.
Linus Torvalds [Tue, 29 Aug 2023 18:33:01 +0000 (11:33 -0700)]
Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Increase size limits for to-be-sent skb frag allocations. This
allows tun, tap devices and packet sockets to better cope with
large writes operations
- Store netdevs in an xarray, to simplify iterating over netdevs
- Refactor nexthop selection for multipath routes
- Improve sched class lifetime handling
- Add backup nexthop ID support for bridge
- Implement drop reasons support in openvswitch
- Several data races annotations and fixes
- Constify the sk parameter of routing functions
- Prepend kernel version to netconsole message
Protocols:
- Implement support for TCP probing the peer being under memory
pressure
- Remove hard coded limitation on IPv6 specific info placement inside
the socket struct
- Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per
socket scaling factor
- Scaling-up the IPv6 expired route GC via a separated list of
expiring routes
- In-kernel support for the TLS alert protocol
- Better support for UDP reuseport with connected sockets
- Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR
header size
- Get rid of additional ancillary per MPTCP connection struct socket
- Implement support for BPF-based MPTCP packet schedulers
- Format MPTCP subtests selftests results in TAP
- Several new SMC 2.1 features including unique experimental options,
max connections per lgr negotiation, max links per lgr negotiation
BPF:
- Multi-buffer support in AF_XDP
- Add multi uprobe BPF links for attaching multiple uprobes and usdt
probes, which is significantly faster and saves extra fds
- Implement an fd-based tc BPF attach API (TCX) and BPF link support
on top of it
- Add SO_REUSEPORT support for TC bpf_sk_assign
- Support new instructions from cpu v4 to simplify the generated code
and feature completeness, for x86, arm64, riscv64
- Support defragmenting IPv(4|6) packets in BPF
- Teach verifier actual bounds of bpf_get_smp_processor_id() and fix
perf+libbpf issue related to custom section handling
- Introduce bpf map element count and enable it for all program types
- Add a BPF hook in sys_socket() to change the protocol ID from
IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy
- Introduce bpf_me_mcache_free_rcu() and fix OOM under stress
- Add uprobe support for the bpf_get_func_ip helper
- Check skb ownership against full socket
- Support for up to 12 arguments in BPF trampoline
- Extend link_info for kprobe_multi and perf_event links
Netfilter:
- Speed-up process exit by aborting ruleset validation if a fatal
signal is pending
- Allow NLA_POLICY_MASK to be used with BE16/BE32 types
Driver API:
- Page pool optimizations, to improve data locality and cache usage
- Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the
need for raw ioctl() handling in drivers
- Simplify genetlink dump operations (doit/dumpit) providing them the
common information already populated in struct genl_info
- Extend and use the yaml devlink specs to [re]generate the split ops
- Introduce devlink selective dumps, to allow SF filtering SF based
on handle and other attributes
- Add yaml netlink spec for netlink-raw families, allow route, link
and address related queries via the ynl tool
- Bluetooth:
- Intel Gale Peak
- Qualcomm WCN3988 and WCN7850
- NXP AW693 and IW624
- Mediatek MT2925
Drivers:
- Ethernet NICs:
- nVidia/Mellanox:
- mlx5:
- support UDP encapsulation in packet offload mode
- IPsec packet offload support in eswitch mode
- improve aRFS observability by adding new set of counters
- extends MACsec offload support to cover RoCE traffic
- dynamic completion EQs
- mlx4:
- convert to use auxiliary bus instead of custom interface
logic
- Intel
- ice:
- implement switchdev bridge offload, even for LAG
interfaces
- implement SRIOV support for LAG interfaces
- igc:
- add support for multiple in-flight TX timestamps
- Broadcom:
- bnxt:
- use the unified RX page pool buffers for XDP and non-XDP
- use the NAPI skb allocation cache
- OcteonTX2:
- support Round Robin scheduling HTB offload
- TC flower offload support for SPI field
- Freescale:
- add XDP_TX feature support
- AMD:
- ionic: add support for PCI FLR event
- sfc:
- basic conntrack offload
- introduce eth, ipv4 and ipv6 pedit offloads
- ST Microelectronics:
- stmmac: maximze PTP timestamping resolution
- Virtual NICs:
- Microsoft vNIC:
- batch ringing RX queue doorbell on receiving packets
- add page pool for RX buffers
- Virtio vNIC:
- add per queue interrupt coalescing support
- Google vNIC:
- add queue-page-list mode support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add port range matching tc-flower offload
- permit enslavement to netdevices with uppers
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- convert to phylink_pcs
- Renesas:
- r8A779fx: add speed change support
- rzn1: enables vlan support
- Ethernet PHYs:
- convert mv88e6xxx to phylink_pcs
- WiFi:
- Qualcomm Wi-Fi 7 (ath12k):
- extremely High Throughput (EHT) PHY support
- RealTek (rtl8xxxu):
- enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
RTL8192EU and RTL8723BU
- RealTek (rtw89):
- Introduce Time Averaged SAR (TAS) support
- Connector:
- support for event filtering"
* tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits)
net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show
net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler
net: stmmac: clarify difference between "interface" and "phy_interface"
r8152: add vendor/device ID pair for D-Link DUB-E250
devlink: move devlink_notify_register/unregister() to dev.c
devlink: move small_ops definition into netlink.c
devlink: move tracepoint definitions into core.c
devlink: push linecard related code into separate file
devlink: push rate related code into separate file
devlink: push trap related code into separate file
devlink: use tracepoint_enabled() helper
devlink: push region related code into separate file
devlink: push param related code into separate file
devlink: push resource related code into separate file
devlink: push dpipe related code into separate file
devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper
devlink: push shared buffer related code into separate file
devlink: push port related code into separate file
devlink: push object register/unregister notifications into separate helpers
inet: fix IP_TRANSPARENT error handling
...
Linus Torvalds [Tue, 29 Aug 2023 18:23:29 +0000 (11:23 -0700)]
Merge tag 'v6.6-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Move crypto engine callback from tfm ctx into algorithm object
- Fix atomic sleep bug in crypto_destroy_instance
- Move lib/mpi into lib/crypto
Algorithms:
- Add chacha20 and poly1305 implementation for powerpc p10
Drivers:
- Add AES skcipher and aead support to starfive
- Add Dynamic Boost Control support to ccp
- Add support for STM32P13 platform to stm32"
* tag 'v6.6-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (149 commits)
Revert "dt-bindings: crypto: qcom,prng: Add SM8450"
crypto: chelsio - Remove unused declarations
X.509: if signature is unsupported skip validation
crypto: qat - fix crypto capability detection for 4xxx
crypto: drivers - Explicitly include correct DT includes
crypto: engine - Remove crypto_engine_ctx
crypto: zynqmp - Use new crypto_engine_op interface
crypto: virtio - Use new crypto_engine_op interface
crypto: stm32 - Use new crypto_engine_op interface
crypto: jh7110 - Use new crypto_engine_op interface
crypto: rk3288 - Use new crypto_engine_op interface
crypto: omap - Use new crypto_engine_op interface
crypto: keembay - Use new crypto_engine_op interface
crypto: sl3516 - Use new crypto_engine_op interface
crypto: caam - Use new crypto_engine_op interface
crypto: aspeed - Remove non-standard sha512 algorithms
crypto: aspeed - Use new crypto_engine_op interface
crypto: amlogic - Use new crypto_engine_op interface
crypto: sun8i-ss - Use new crypto_engine_op interface
crypto: sun8i-ce - Use new crypto_engine_op interface
...
Linus Torvalds [Tue, 29 Aug 2023 17:21:56 +0000 (10:21 -0700)]
Merge tag 'gpio-updates-for-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio updates from Bartosz Golaszewski:
"We have a lot of code refactoring using common helpers and ended up
removing more lines then we're adding this release cycle.
Nothing really stands out, just small updates all over the place.
Core GPIOLIB updates:
- wake-up poll() in user-space on device unbind
- improve fwnode usage
- interrupt domain handling improvements
- correctly handle the ngpios property in gpio-mmio
Driver cleanups:
- remove unneeded calls to platform_set_drvdata() all around the
place
- remove unneeded of_match_ptr() expansions whenever a driver depends
on CONFIG_OF
- remove redundant calls to dev_err_probe() from gpio-omap and
gpio-davinci
Driver improvements:
- use autopointers and guards from cleanup.h in gpio-sim
- shrink code in gpio-sim using some common helpers
- convert the idio family of drivers to using gpio-regmap
- convert gpio-ws16c48 to using gpio-regmap
- use devres to simplify code in gpio-pisosr and gpio-mxc
- update gpio-sifive: support IRQ wake, improve interrupt handling,
allow building as module
- make gpio-ge and gpio-bcm-kona OF-independent (plus some minor
tweaks)
- add support for new models in gpio-pca953x and gpio-ds4520
- add runtime PM support to gpio-mxc
- fix a build warning in gpio-mxs
- add support for adding pin ranges to gpio-mlxbf3
- add counter/timer support to gpio-104-dio-48e
- switch to dynamic GPIO base allocation in gpio-vf610
- minor oneliners here and there
Device-tree bindings updates:
- enable the gpio-line-names property in snps,dw-apb and STMPE GPIO
- document new models in fsl-imx-gpio, ds4520 and pca95xx
- convert the bindings for brcm,kona-gpio to YAML"
* tag 'gpio-updates-for-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (94 commits)
gpio: pca953x: add support for TCA9538
dt-bindings: gpio: pca95xx: document new tca9538 chip
gpio: pca953x: Use i2c_get_match_data()
gpio: mlxbf3: use capital "OR" for multiple licenses in SPDX
gpio: pcf857x: Extend match data support for OF tables
gpio: vf610: switch to dynamic allocat GPIO base
gpiolib: provide and use gpiod_line_state_notify()
gpio: cdev: wake up lineevent poll() on device unbind
gpio: cdev: wake up linereq poll() on device unbind
gpio: cdev: wake up chardev poll() on device unbind
gpiolib: add a second blocking notifier to struct gpio_device
gpio: cdev: open-code to_gpio_chardev_data()
gpiolib: rename the gpio_device notifier
gpio: mlxbf3: Support add_pin_ranges()
gpio: mxc: Use helper function devm_clk_get_optional_enabled()
gpio: pca9570: fix kerneldoc
gpio: sim: simplify code with cleanup helpers
gpio: sim: replace memmove() + strstrip() with skip_spaces() + strim()
gpio: sim: simplify gpio_sim_device_config_live_store()
gpio: mxc: release the parent IRQ in runtime suspend
...
Linus Torvalds [Tue, 29 Aug 2023 16:56:24 +0000 (09:56 -0700)]
Merge tag 'mmc-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC updates from Ulf Hansson:
"MMC core:
- Convert drivers to use the ->remove_new() callback
- Propagate the removable attribute for the card's device
MMC host:
- Convert drivers to use the ->remove_new() callback
- atmel-mci: Convert to gpio descriptors and cleanup the code
- davinci: Make SDIO irq truly optional
- renesas_sdhi: Register irqs before registering controller
- sdhci: Simplify the sdhci_pltfm_* interface a bit
- sdhci-esdhc-imx: Improve support for the 1.8V errata
- sdhci-of-at91: Add support for the microchip sam9x7 variant
- sdhci-of-dwcmshc: Add support for runtime PM
- sdhci-pci-o2micro: Add support for the new Bayhub GG8 variant
- sdhci-sprd: Add support for SD high-speed mode tuning
- uniphier-sd: Register irqs before registering controller"
* tag 'mmc-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (108 commits)
mmc: atmel-mci: Move card detect gpio polarity quirk to gpiolib
mmc: atmel-mci: move atmel MCI header file
mmc: atmel-mci: Convert to gpio descriptors
mmc: sdhci-sprd: Add SD HS mode online tuning
mmc: core: Add host specific tuning support for SD HS mode
mmc: sdhci-of-dwcmshc: Add runtime PM operations
mmc: sdhci-of-dwcmshc: Add error handling in dwcmshc_resume
mmc: sdhci-esdhc-imx: improve ESDHC_FLAG_ERR010450
mmc: sdhci-pltfm: Rename sdhci_pltfm_register()
mmc: sdhci-pltfm: Remove sdhci_pltfm_unregister()
mmc: sdhci-st: Use sdhci_pltfm_remove()
mmc: sdhci-pxav2: Use sdhci_pltfm_remove()
mmc: sdhci-of-sparx5: Use sdhci_pltfm_remove()
mmc: sdhci-of-hlwd: Use sdhci_pltfm_remove()
mmc: sdhci-of-esdhc: Use sdhci_pltfm_remove()
mmc: sdhci-of-at91: Use sdhci_pltfm_remove()
mmc: sdhci-of-arasan: Use sdhci_pltfm_remove()
mmc: sdhci-iproc: Use sdhci_pltfm_remove()
mmc: sdhci_f_sdh30: Use sdhci_pltfm_remove()
mmc: sdhci-dove: Use sdhci_pltfm_remove()
...
Linus Torvalds [Tue, 29 Aug 2023 16:47:33 +0000 (09:47 -0700)]
Merge tag 'spi-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"There's been quite a lot of generic activity here, but more
administrative than featuers. We also have a bunch of new drivers,
including one that's part of a MFD so we pulled in the core parts of
that:
- Lots of work from both Yang Yingliang and Andy Shevchenko on moving
to host/device/controller based terminology for devices.
- QuadSPI SPI support for Allwinner sun6i.
- New device support Cirrus Logic CS43L43, Longsoon, Qualcomm GENI
QuPv3 and StarFive JH7110 QSPI"
* tag 'spi-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (151 commits)
spi: at91-usart: Use PTR_ERR_OR_ZERO() to simplify code
spi: spi-sn-f-ospi: switch to use modern name
spi: sifive: switch to use modern name
spi: sh: switch to use modern name
spi: sh-sci: switch to use modern name
spi: sh-msiof: switch to use modern name
spi: sh-hspi: switch to use modern name
spi: sc18is602: switch to use modern name
spi: s3c64xx: switch to use modern name
spi: rzv2m-csi: switch to use devm_spi_alloc_host()
spi: rspi: switch to use spi_alloc_host()
spi: rockchip: switch to use modern name
spi: rockchip-sfc: switch to use modern name
spi: realtek-rtl: switch to use devm_spi_alloc_host()
spi: rb4xx: switch to use modern name
spi: qup: switch to use modern name
spi: spi-qcom-qspi: switch to use modern name
spi: pxa2xx: switch to use modern name
spi: ppc4xx: switch to use modern name
spi: spl022: switch to use modern name
...
Linus Torvalds [Tue, 29 Aug 2023 16:40:16 +0000 (09:40 -0700)]
Merge tag 'regulator-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator updates from Mark Brown:
"Other than new device support and some minor fixes this has been a
really quiet release, the only notable things are the new drivers.
There's a couple of MFDs among the new devices so the generic parts
are pulled in:
- Support for Analog Devices MAX77831/57/59, Awinc AW37503, Qualcom
PMX75 and RFGEN, RealTek RT5733, RichTek RTQ2208 and Texas
Instruments TPS65086"
* tag 'regulator-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (68 commits)
regulator: userspace-consumer: Drop event support for this cycle
regulator: aw37503: Switch back to use struct i2c_driver's .probe()
dt-bindings: regulator: qcom,rpmh-regulator: allow i, j, l, m & n as RPMh resource name suffix
regulator: dt-bindings: Add Awinic AW37503
regulator: aw37503: add regulator driver for Awinic AW37503
regulator: tps65086: Select dedicated regulator config for chip variant
mfd: tps65086: Read DEVICE ID register 1 from device
regulator: raa215300: Update help description
regulator: raa215300: Add missing blank space
regulator: raa215300: Change rate from 32000->32768
regulator: db8500-prcmu: Remove unused declaration power_state_active_is_enabled()
regulator: raa215300: Add const definition
regulator: raa215300: Fix resource leak in case of error
regulator: rtq2208: Switch back to use struct i2c_driver's .probe()
regulator: lp872x: Fix Wvoid-pointer-to-enum-cast warning
regulator: max77857: Fix Wvoid-pointer-to-enum-cast warning
regulator: ltc3589: Fix Wvoid-pointer-to-enum-cast warning
regulator: qcom_rpm-regulator: Use devm_kmemdup to replace devm_kmalloc + memcpy
regulator: tps6286x-regulator: Remove redundant of_match_ptr() macros
regulator: pfuze100-regulator: Remove redundant of_match_ptr() macro
...
Linus Torvalds [Tue, 29 Aug 2023 16:26:04 +0000 (09:26 -0700)]
Merge tag 'regmap-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"This is a much quieter release than the past few, there's one small
API addition that I noticed a user for in ALSA and a bunch of
cleanups:
- Provide an interface for determining if a register is present in
the cache and add a user of it in ALSA.
- Full support for dynamic allocations, following the temporary
bodges that were done as fixes in the previous release.
- Remove the unused and questionably working 64 bit support"
* tag 'regmap-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: Fix the type used for a bitmap pointer
regmap: Remove dynamic allocation warnings for rbtree and maple
regmap: rbtree: Use alloc_flags for memory allocations
regmap: maple: Use alloc_flags for memory allocations
regmap: Reject fast_io regmap configurations with RBTREE and MAPLE caches
ALSA: hda: Use regcache_reg_cached() rather than open coding
regmap: Provide test for regcache_reg_present()
regmap: Let users check if a register is cached
regmap: Provide user selectable option to enable regmap
regmap: mmio: Remove unused 64-bit support code
regmap: cache: Revert "Add 64-bit mode support"
regmap: Revert "add 64-bit mode support" and Co.
Linus Torvalds [Tue, 29 Aug 2023 15:19:46 +0000 (08:19 -0700)]
Merge tag 'rust-6.6' of https://github.com/Rust-for-Linux/linux
Pull rust updates from Miguel Ojeda:
"In terms of lines, most changes this time are on the pinned-init API
and infrastructure. While we have a Rust version upgrade, and thus a
bunch of changes from the vendored 'alloc' crate as usual, this time
those do not account for many lines.
Toolchain and infrastructure:
- Upgrade to Rust 1.71.1. This is the second such upgrade, which is a
smaller jump compared to the last time.
This version allows us to remove the '__rust_*' allocator functions
-- the compiler now generates them as expected, thus now our
'KernelAllocator' is used.
It also introduces the 'offset_of!' macro in the standard library
(as an unstable feature) which we will need soon. So far, we were
using a declarative macro as a prerequisite in some not-yet-landed
patch series, which did not support sub-fields (i.e. nested
structs):
#[repr(C)]
struct S {
a: u16,
b: (u8, u8),
}
assert_eq!(offset_of!(S, b.1), 3);
- Upgrade to bindgen 0.65.1. This is the first time we upgrade its
version.
Given it is a fairly big jump, it comes with a fair number of
improvements/changes that affect us, such as a fix needed to
support LLVM 16 as well as proper support for '__noreturn' C
functions, which are now mapped to return the '!' type in Rust:
- 'scripts/rust_is_available.sh' improvements and fixes.
This series takes care of all the issues known so far and adds a
few new checks to cover for even more cases, plus adds some more
help texts. All this together will hopefully make problematic
setups easier to identify and to be solved by users building the
kernel.
In addition, it adds a test suite which covers all branches of the
shell script, as well as tests for the issues found so far.
- Support rust-analyzer for out-of-tree modules too.
- Give 'cfg's to rust-analyzer for the 'core' and 'alloc' crates.
- Drop 'scripts/is_rust_module.sh' since it is not needed anymore.
Macros crate:
- New 'paste!' proc macro.
This macro is a more flexible version of 'concat_idents!': it
allows the resulting identifier to be used to declare new items and
it allows to transform the identifiers before concatenating them,
e.g.
- New '#[derive(Zeroable)]' proc macro for the 'Zeroable' trait,
which allows 'unsafe' implementations for structs where every field
implements the 'Zeroable' trait, e.g.:
- Support arbitrary paths in init macros, instead of just identifiers
and generic types.
- Implement the 'Zeroable' trait for the 'UnsafeCell<T>' and
'Opaque<T>' types.
- Make initializer values inaccessible after initialization.
- Make guards in the init macros hygienic.
'allocator' module:
- Use 'krealloc_aligned()' in 'KernelAllocator::alloc' preventing
misaligned allocations when the Rust 1.71.1 upgrade is applied
later in this pull.
The equivalent fix for the previous compiler version (where
'KernelAllocator' is not yet used) was merged into 6.5 already,
which added the 'krealloc_aligned()' function used here.
- Implement 'KernelAllocator::{realloc, alloc_zeroed}' for
performance, using 'krealloc_aligned()' too, which forwards the
call to the C API.
'types' module:
- Make 'Opaque' be '!Unpin', removing the need to add a
'PhantomPinned' field to Rust structs that contain C structs which
must not be moved.
- Make 'Opaque' use 'UnsafeCell' as the outer type, rather than
inner.
Documentation:
- Suggest obtaining the source code of the Rust's 'core' library
using the tarball instead of the repository.
MAINTAINERS:
- Andreas and Alice, from Samsung and Google respectively, are
joining as reviewers of the "RUST" entry.
As well as a few other minor changes and cleanups"
* tag 'rust-6.6' of https://github.com/Rust-for-Linux/linux: (42 commits)
rust: init: update expanded macro explanation
rust: init: add `{pin_}chain` functions to `{Pin}Init<T, E>`
rust: init: make `PinInit<T, E>` a supertrait of `Init<T, E>`
rust: init: implement `Zeroable` for `UnsafeCell<T>` and `Opaque<T>`
rust: init: add support for arbitrary paths in init macros
rust: init: add functions to create array initializers
rust: init: add `..Zeroable::zeroed()` syntax for zeroing all missing fields
rust: init: make initializer values inaccessible after initializing
rust: init: wrap type checking struct initializers in a closure
rust: init: make guards in the init macros hygienic
rust: add derive macro for `Zeroable`
rust: init: make `#[pin_data]` compatible with conditional compilation of fields
rust: init: consolidate init macros
docs: rust: clarify what 'rustup override' does
docs: rust: update instructions for obtaining 'core' source
docs: rust: add command line to rust-analyzer section
scripts: generate_rust_analyzer: provide `cfg`s for `core` and `alloc`
rust: bindgen: upgrade to 0.65.1
rust: enable `no_mangle_with_rust_abi` Clippy lint
rust: upgrade to Rust 1.71.1
...
Linus Torvalds [Tue, 29 Aug 2023 15:05:18 +0000 (08:05 -0700)]
Merge tag 'tpmdd-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm updates from Jarkko Sakkinen:
- Restrict linking of keys to .ima and .evm keyrings based on
digitalSignature attribute in the certificate
- PowerVM: load machine owner keys into the .machine [1] keyring
- PowerVM: load module signing keys into the secondary trusted keyring
(keys blessed by the vendor)
- tpm_tis_spi: half-duplex transfer mode
- tpm_tis: retry corrupted transfers
- Apply revocation list (.mokx) to an all system keyrings (e.g.
.machine keyring)
Link: https://blogs.oracle.com/linux/post/the-machine-keyring
* tag 'tpmdd-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
certs: Reference revocation list for all keyrings
tpm/tpm_tis_synquacer: Use module_platform_driver macro to simplify the code
tpm: remove redundant variable len
tpm_tis: Resend command to recover from data transfer errors
tpm_tis: Use responseRetry to recover from data transfer errors
tpm_tis: Move CRC check to generic send routine
tpm_tis_spi: Add hardware wait polling
KEYS: Replace all non-returning strlcpy with strscpy
integrity: PowerVM support for loading third party code signing keys
integrity: PowerVM machine keyring enablement
integrity: check whether imputed trust is enabled
integrity: remove global variable from machine_keyring.c
integrity: ignore keys failing CA restrictions on non-UEFI platform
integrity: PowerVM support for loading CA keys on machine keyring
integrity: Enforce digitalSignature usage in the ima and evm keyrings
KEYS: DigitalSignature link restriction
tpm_tis: Revert "tpm_tis: Disable interrupts on ThinkPad T490s"
Linus Torvalds [Tue, 29 Aug 2023 02:03:24 +0000 (19:03 -0700)]
Merge tag 'linux-kselftest-nolibc-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull nolibc updates from Shuah Khan:
"Nolibc:
- improved portability by removing build errors with -ENOSYS
- added syscall6() on MIPS to support pselect6() and mmap()
- added setvbuf(), rmdir(), pipe(), pipe2()
- add support for ppc/ppc64
- environ is no longer optional
- fixed frame pointer issues at -O0
- dropped sys_stat() in favor of sys_statx()
- centralized _start_c() to remove lots of asm code
- switched size_t to __SIZE_TYPE__
Selftests:
- improved status reporting (success/warning/failure counts, path to
log file)
- various code cleanups (indent, unused variables, ...)
- more consistent test numbering
- enabled compiler warnings
- dropped unreliable chmod_net test
- improved reliability (create /dev/zero & /tmp, rely less on /proc)
- new tests (brk/sbrk/mmap/munmap)
- improved compatibility with musl
- new run-nolibc-test target to build and run natively
- new run-libc-test target to build and run against native libc
- made the cmdline parser more reliable against boolean arguments
- dropped dependency on memfd for vfprintf() test
- nolibc-test is no longer stripped
- added support for extending ARCH via XARCH
Other:
- add Thomas as co-maintainer"
* tag 'linux-kselftest-nolibc-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (103 commits)
tools/nolibc: avoid undesired casts in the __sysret() macro
tools/nolibc: keep brk(), sbrk(), mmap() away from __sysret()
tools/nolibc: silence ppc64 compile warnings
selftests/nolibc: libc-test: use HOSTCC instead of CC
tools/nolibc: stackprotector.h: make __stack_chk_init static
selftests/nolibc: allow report with existing test log
selftests/nolibc: add test support for ppc64
selftests/nolibc: add test support for ppc64le
selftests/nolibc: add test support for ppc
selftests/nolibc: add XARCH and ARCH mapping support
tools/nolibc: add support for powerpc64
tools/nolibc: add support for powerpc
MAINTAINERS: nolibc: add myself as co-maintainer
selftests/nolibc: enable compiler warnings
selftests/nolibc: don't strip nolibc-test
selftests/nolibc: prevent out of bounds access in expect_vfprintf
selftests/nolibc: use correct return type for read() and write()
selftests/nolibc: avoid sign-compare warnings
selftests/nolibc: avoid unused parameter warnings
selftests/nolibc: make functions static if possible
...
Linus Torvalds [Tue, 29 Aug 2023 01:56:38 +0000 (18:56 -0700)]
Merge tag 'linux-kselftest-kunit-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit updates from Shuah Khan:
- add support for running Rust documentation tests as KUnit tests
- make init, str, sync, types doctests compilable/testable
- add support for attributes API which include speed, modules
attributes, ability to filter and report attributes
- add support for marking tests slow using attributes API
- add attributes API documentation
- fix a wild-memory-access bug in kunit_filter_suites() and a possible
memory leak in kunit_filter_suites()
- add support for counting number of test suites in a module, list
action to kunit test modules, and test filtering on module tests
* tag 'linux-kselftest-kunit-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (25 commits)
kunit: fix struct kunit_attr header
kunit: replace KUNIT_TRIGGER_STATIC_STUB maro with KUNIT_STATIC_STUB_REDIRECT
kunit: Allow kunit test modules to use test filtering
kunit: Make 'list' action available to kunit test modules
kunit: Report the count of test suites in a module
kunit: fix uninitialized variables bug in attributes filtering
kunit: fix possible memory leak in kunit_filter_suites()
kunit: fix wild-memory-access bug in kunit_filter_suites()
kunit: Add documentation of KUnit test attributes
kunit: add tests for filtering attributes
kunit: time: Mark test as slow using test attributes
kunit: memcpy: Mark tests as slow using test attributes
kunit: tool: Add command line interface to filter and report attributes
kunit: Add ability to filter attributes
kunit: Add module attribute
kunit: Add speed attribute
kunit: Add test attributes API structure
MAINTAINERS: add Rust KUnit files to the KUnit entry
rust: support running Rust documentation tests as KUnit ones
rust: types: make doctests compilable/testable
...
Linus Torvalds [Tue, 29 Aug 2023 01:46:47 +0000 (18:46 -0700)]
Merge tag 'linux-kselftest-next-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest updates from Shuah Khan:
"A mix of fixes, enhancements, and new tests. Bulk of the changes
enhance and fix rseq and resctrl tests.
In addition, user_events, dmabuf-heaps and perf_events are added to
default kselftest build and test coverage. A futex test fix, enhance
prctl test coverage, and minor fixes are included in this update"
* tag 'linux-kselftest-next-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (32 commits)
selftests: cachestat: use proper syscall number macro
selftests: cachestat: properly link in librt
selftests/futex: Order calls to futex_lock_pi
selftests: Hook more tests into the build infrastructure
selftests/user_events: Reenable build
selftests/filesystems: Add six consecutive 'x' characters to mktemp
selftests/rseq: Use rseq_unqual_scalar_typeof in macros
selftests/rseq: Fix arm64 buggy load-acquire/store-release macros
selftests/rseq: Implement rseq_unqual_scalar_typeof
selftests/rseq: Fix CID_ID typo in Makefile
selftests:prctl: add set-process-name to .gitignore
selftests:prctl: Fix make clean override warning
selftests/resctrl: Remove test type checks from cat_val()
selftests/resctrl: Pass the real number of tests to show_cache_info()
selftests/resctrl: Move CAT/CMT test global vars to function they are used in
selftests/resctrl: Don't use variable argument list for ->setup()
selftests/resctrl: Don't pass test name to fill_buf
selftests/resctrl: Improve parameter consistency in fill_buf
selftests/resctrl: Remove unnecessary startptr global from fill_buf
selftests/resctrl: Remove "malloc_and_init_memory" param from run_fill_buf()
...
Linus Torvalds [Tue, 29 Aug 2023 01:26:45 +0000 (18:26 -0700)]
Merge tag 'thermal-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"These rework the Intel DTS IOSF and the ACPI thermal drivers to pass
tables of generic trip point structures to the core during
initialization and make some requisite modifications in the thermal
core, fix a few issues elsewhere and clean up code.
This includes changes that are present in the ACPI updates too,
because they involve both ACPI and the thermal core. The list of
specific changes below is limited to thermal control, however.
Specifics:
- Make the ACPI thermal driver use its own Notify() handler (Michal
Wilczynski)
- Rework the ACPI thermal driver to use a table of generic trip point
structures on top of the internal representation of trip points and
remove thermal zone callbacks that are not necessary any more from
that driver (Rafael Wysocki)
- Fix a few issues in the Intel DTS IOSF thermal driver, clean up
code in it and make it pass tables of generic trip point structures
to the core during thermal zone registration (Rafael Wysocki)
- Drop a redundant check from the Intel DTS IOSF thermal driver's
"remove" routine (Zhang Rui)
- Use module_platform_driver() to replace an open-coded counterpart
of it in the int340x thermal driver (Yang Yingliang)
- Fix possible uninitialized value access in __thermal_of_bind() and
__thermal_of_unbind() (Peng Fan)
- Make the int3400 driver use thermal zone device wrappers (Daniel
Lezcano)
- Remove redundant thermal zone state check from the int340x thermal
driver (Daniel Lezcano)
- Drop non-functional nocrt parameter from ACPI thermal (Mario
Limonciello)
- Explicitly include correct DT includes in the thermal core and
drivers (Rob Herring)"
* tag 'thermal-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel: intel_soc_dts_iosf: Remove redundant check
thermal: intel: int340x: simplify the code with module_platform_driver()
thermal/of: Fix potential uninitialized value access
thermal: intel: intel_soc_dts_iosf: Use struct thermal_trip
thermal: intel: intel_soc_dts_iosf: Rework critical trip setup
thermal: intel: intel_soc_dts_iosf: Add helper for resetting trip points
thermal: intel: intel_soc_dts_iosf: Change initialization ordering
thermal: intel: intel_soc_dts_iosf: Pass sensors to update_trip_temp()
thermal: intel: intel_soc_dts_iosf: Untangle update_trip_temp()
thermal: intel: intel_soc_dts_iosf: Always assume notification support
thermal: intel: intel_soc_dts_iosf: Drop redundant symbol definition
thermal: intel: intel_soc_dts_iosf: Always use 2 trips
thermal: Explicitly include correct DT includes
thermal/drivers/int340x: Do not check the thermal zone state
thermal/drivers/int3400: Use thermal zone device wrappers
Linus Torvalds [Tue, 29 Aug 2023 01:04:39 +0000 (18:04 -0700)]
Merge tag 'pm-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These rework cpuidle governors to call tick_nohz_get_sleep_length()
less often and fix one of them, rework hibernation to avoid storing
pages filled with zeros in hibernation images, switch over some
cpufreq drivers to use void remove callbacks, fix and clean up
multiple cpufreq drivers, fix the devfreq core, update the cpupower
utility and make other assorted improvements.
Specifics:
- Rework the menu and teo cpuidle governors to avoid calling
tick_nohz_get_sleep_length(), which is likely to become quite
expensive going forward, too often and improve making decisions
regarding whether or not to stop the scheduler tick in the teo
governor (Rafael Wysocki)
- Improve the performance of cpufreq_stats_create_table() in some
cases (Liao Chang)
- Fix two issues in the amd-pstate-ut cpufreq driver (Swapnil Sapkal)
- Use clamp() helper macro to improve the code readability in
cpufreq_verify_within_limits() (Liao Chang)
- Set stale CPU frequency to minimum in intel_pstate (Doug Smythies)
- Migrate cpufreq drivers for various platforms to use void remove
callback (Yangtao Li)
- Add online/offline/exit hooks for Tegra driver (Sumit Gupta)
- Explicitly include correct DT includes in cpufreq (Rob Herring)
- Frequency domain updates for qcom-hw driver (Neil Armstrong)
- Modify AMD pstate driver return the highest_perf value (Meng Li)
- Generic cleanups for cppc, mediatek and powernow driver (Liao
Chang, Konrad Dybcio)
- Add more platforms to cpufreq-arm driver's blocklist
(AngeloGioacchino Del Regno and Konrad Dybcio)
- brcmstb-avs-cpufreq: Fix -Warray-bounds bug (Gustavo A. R. Silva)
- Add device PM helpers to allow a device to remain powered-on during
system-wide transitions (Ulf Hansson)
- Rework hibernation memory snapshotting to avoid storing pages
filled with zeros in hibernation image files (Brian Geffon)
- Add check to make sure that CPU latency QoS constraints do not use
negative values (Clive Lin)
- Optimize rp->domains memory allocation in the Intel RAPL power
capping driver (xiongxin)
- Remove recursion while parsing zones in the arm_scmi power capping
driver (Cristian Marussi)
- Fix memory leak in devfreq_dev_release() (Boris Brezillon)
- Add turbo-boost support to cpupower (Wyes Karny)
- Add support for amd_pstate mode change to cpupower (Wyes Karny)
- Fix 'cpupower idle_set' command to accept only numeric values of
arguments (Likhitha Korrapati)
- Clean up OPP code and add new frequency related APIs to it (Viresh
Kumar, Manivannan Sadhasivam)
- Convert ti cpufreq/opp bindings to json schema (Nishanth Menon)"
* tag 'pm-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (74 commits)
cpufreq: tegra194: remove opp table in exit hook
cpufreq: powernow-k8: Use related_cpus instead of cpus in driver.exit()
cpufreq: tegra194: add online/offline hooks
cpuidle: teo: Avoid unnecessary variable assignments
cpufreq: qcom-cpufreq-hw: add support for 4 freq domains
dt-bindings: cpufreq: qcom-hw: add a 4th frequency domain
cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver
cpufreq: amd-pstate-ut: Remove module parameter access
cpufreq: Use clamp() helper macro to improve the code readability
PM: sleep: Add helpers to allow a device to remain powered-on
PM: QoS: Add check to make sure CPU latency is non-negative
PM: runtime: Remove unsued extern declaration of pm_runtime_update_max_time_suspended()
cpufreq: intel_pstate: set stale CPU frequency to minimum
cpufreq: stats: Improve the performance of cpufreq_stats_create_table()
dt-bindings: cpufreq: Convert ti-cpufreq to json schema
dt-bindings: opp: Convert ti-omap5-opp-supply to json schema
OPP: Fix argument name in doc comment
cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases
cpufreq: cppc: Set fie_disabled to FIE_DISABLED if fails to create kworker_fie
cpufreq: cppc: cppc_cpufreq_get_rate() returns zero in all error cases.
...
Linus Torvalds [Tue, 29 Aug 2023 00:58:39 +0000 (17:58 -0700)]
Merge tag 'acpi-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These include new ACPICA material, a rework of the ACPI thermal
driver, a switch-over of the ACPI processor driver to using _OSC
instead of (long deprecated) _PDC for CPU initialization, a rework of
firmware notifications handling in several drivers, fixes and cleanups
for suspend-to-idle handling on AMD systems, ACPI backlight driver
updates and more.
Specifics:
- Update the ACPICA code in the kernel to upstream revision 20230628
including the following changes:
- Suppress a GCC 12 dangling-pointer warning (Philip Prindeville)
- Reformat the ACPI_STATE_COMMON macro and its users (George Guo)
- Replace the ternary operator with ACPI_MIN() (Jiangshan Yi)
- Add support for _DSC as per ACPI 6.5 (Saket Dumbre)
- Remove a duplicate macro from zephyr header (Najumon B.A)
- Add data structures for GED and _EVT tracking (Jose Marinho)
- Fix misspelled CDAT DSMAS define (Dave Jiang)
- Simplify an error message in acpi_ds_result_push() (Christophe
Jaillet)
- Add a struct size macro related to SRAT (Dave Jiang)
- Add AML_NO_OPERAND_RESOLVE flag to Timer (Abhishek Mainkar)
- Add support for RISC-V external interrupt controllers in MADT
(Sunil V L)
- Add RHCT flags, CMO and MMU nodes (Sunil V L)
- Change ACPICA version to 20230628 (Bob Moore)
- Introduce new wrappers for ACPICA notify handler install/remove and
convert multiple drivers to using their own Notify() handlers
instead of the ACPI bus type .notify() slated for removal (Michal
Wilczynski)
- Add backlight=native DMI quirk for Apple iMac12,1 and iMac12,2
(Hans de Goede)
- Put ACPI video and its child devices explicitly into D0 on boot to
avoid platform firmware confusion (Kai-Heng Feng)
- Support obtaining physical CPU ID from MADT on LoongArch (Bibo Mao)
- Convert ACPI CPU initialization to using _OSC instead of _PDC that
has been depreceted since 2018 and dropped from the specification
in ACPI 6.5 (Michal Wilczynski, Rafael Wysocki)
- Drop non-functional nocrt parameter from ACPI thermal (Mario
Limonciello)
- Clean up the ACPI thermal driver, rework the handling of firmware
notifications in it and make it provide a table of generic trip
point structures to the core during initialization (Rafael Wysocki)
- Defer enumeration of devices with _DEP pointing to IVSC (Wentong
Wu)
- Install SystemCMOS address space handler for ACPI000E (TAD) to meet
platform firmware expectations on some platforms (Zhang Rui)
- Fix finding the generic error data in the ACPi extlog driver for
compatibility with old and new firmware interface versions
(Xiaochun Lee)
- Remove assorted unused declarations of functions (Yue Haibing)
- Move AMBA bus scan handling into arm64 specific directory (Sudeep
Holla)
- Fix and clean up suspend-to-idle interface for AMD systems (Mario
Limonciello, Andy Shevchenko)
- Fix string truncation warning in pnpacpi_add_device() (Sunil V L)"
* tag 'acpi-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (66 commits)
ACPI: x86: s2idle: Add a function to get LPS0 constraint for a device
ACPI: x86: s2idle: Add for_each_lpi_constraint() helper
ACPI: x86: s2idle: Add more debugging for AMD constraints parsing
ACPI: x86: s2idle: Fix a logic error parsing AMD constraints table
ACPI: x86: s2idle: Catch multiple ACPI_TYPE_PACKAGE objects
ACPI: x86: s2idle: Post-increment variables when getting constraints
ACPI: Adjust #ifdef for *_lps0_dev use
ACPI: TAD: Install SystemCMOS address space handler for ACPI000E
ACPI: Remove assorted unused declarations of functions
ACPI: extlog: Fix finding the generic error data for v3 structure
PNP: ACPI: Fix string truncation warning
ACPI: Remove unused extern declaration acpi_paddr_to_node()
ACPI: video: Add backlight=native DMI quirk for Apple iMac12,1 and iMac12,2
ACPI: video: Put ACPI video and its child devices into D0 on boot
ACPI: processor: LoongArch: Get physical ID from MADT
ACPI: scan: Defer enumeration of devices with a _DEP pointing to IVSC device
ACPI: thermal: Eliminate code duplication from acpi_thermal_notify()
ACPI: thermal: Drop unnecessary thermal zone callbacks
ACPI: thermal: Rework thermal_get_trend()
ACPI: thermal: Use trip point table to register thermal zones
...
Linus Torvalds [Tue, 29 Aug 2023 00:56:03 +0000 (17:56 -0700)]
Merge tag 'tag-chrome-platform-firmware-for-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform firmware update from Tzung-Bi Shih:
- Add MAINTAINERS entry for chrome platform firmware
* tag 'tag-chrome-platform-firmware-for-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
MAINTAINERS: Add drivers/firmware/google/ entry
Linus Torvalds [Tue, 29 Aug 2023 00:49:33 +0000 (17:49 -0700)]
Merge tag 'tag-chrome-platform-for-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform updates from Tzung-Bi Shih:
"Improvements:
- Remove shutdown timeout on EC panic for offering the filesystem a
chance to sync
- Support official HID "GOOG0016" for ChromeOS ACPI
Fixes:
- Print hex string instead of using "%s" for ACPI_TYPE_BUFFER
Misc:
- Update MAINTAINERS"
* tag 'tag-chrome-platform-for-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: chromeos_acpi: print hex string for ACPI_TYPE_BUFFER
platform/chrome: chromeos_acpi: support official HID GOOG0016
platform/chrome: cros_ec_lpc: Remove EC panic shutdown timeout
MAINTAINERS: update maintainers of chrome-platform
Linus Torvalds [Tue, 29 Aug 2023 00:41:30 +0000 (17:41 -0700)]
Merge tag 'for-linus-6.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- a bunch of minor cleanups
- a patch adding irqfd support for virtio backends running as user
daemon supporting Xen guests
* tag 'for-linus-6.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: privcmd: Add support for irqfd
xen/xenbus: Avoid a lockdep warning when adding a watch
xen: Fix one kernel-doc comment
xen: xenbus: Use helper function IS_ERR_OR_NULL()
xen: Switch to use kmemdup() helper
xen-pciback: Remove unused function declarations
x86/xen: Make virt_to_pfn() a static inline
xen: remove a confusing comment on auto-translated guest I/O
xen/evtchn: Remove unused function declaration xen_set_affinity_evtchn()