Jianbo Liu [Mon, 31 Jul 2023 11:58:40 +0000 (14:58 +0300)]
net/mlx5: fs_core: Make find_closest_ft more generic
As find_closest_ft_recursive is called to find the closest FT, the
first parameter of find_closest_ft can be changed from fs_prio to
fs_node. Thus this function is extended to find the closest FT for the
nodes of any type, not only prios, but also the sub namespaces.
Benjamin Poirier [Mon, 31 Jul 2023 20:02:08 +0000 (16:02 -0400)]
vxlan: Fix nexthop hash size
The nexthop code expects a 31 bit hash, such as what is returned by
fib_multipath_hash() and rt6_multipath_hash(). Passing the 32 bit hash
returned by skb_get_hash() can lead to problems related to the fact that
'int hash' is a negative number when the MSB is set.
In the case of hash threshold nexthop groups, nexthop_select_path_hthr()
will disproportionately select the first nexthop group entry. In the case
of resilient nexthop groups, nexthop_select_path_res() may do an out of
bounds access in nh_buckets[], for example:
hash = -912054133
num_nh_buckets = 2
bucket_index = 65535
Fix this problem by ensuring the MSB of hash is 0 using a right shift - the
same approach used in fib_multipath_hash() and rt6_multipath_hash().
Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries") Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
When setup a vlan device on dev pim6reg, DAD ns packet may sent on reg_vif_xmit().
reg_vif_xmit()
ip6mr_cache_report()
skb_push(skb, -skb_network_offset(pkt));//skb_network_offset(pkt) is 4
And skb_push declared as:
void *skb_push(struct sk_buff *skb, unsigned int len);
skb->data -= len;
//0xffff88805f86a84c - 0xfffffffc = 0xffff887f5f86a850
skb->data is set to 0xffff887f5f86a850, which is invalid mem addr, lead to skb_push() fails.
Fixes: 14fb64e1f449 ("[IPV6] MROUTE: Support PIM-SM (SSM).") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dev_close() and dev_open() are issued to change the interface state to DOWN
or UP (dev->flags IFF_UP). When the netdev is set DOWN it loses e.g its
Ipv6 addresses and routes. We don't want this in cases of device recovery
(triggered by hardware or software) or when the qeth device is set
offline.
Setting a qeth device offline or online and device recovery actions call
netif_device_detach() and/or netif_device_attach(). That will reset or
set the LOWER_UP indication i.e. change the dev->state Bit
__LINK_STATE_PRESENT. That is enough to e.g. cause bond failovers, and
still preserves the interface settings that are handled by the network
stack.
Don't call dev_open() nor dev_close() from the qeth device driver. Let the
network stack handle this.
Fixes: d4560150cb47 ("s390/qeth: call dev_close() during recovery") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 2 Aug 2023 09:06:06 +0000 (10:06 +0100)]
Merge branch 'tun-tap-uid'
Laszlo Ersek says:
====================
tun/tap: set sk_uid from current_fsuid()
The original patches fixing CVE-2023-1076 are incorrect in my opinion.
This small series fixes them up; see the individual commit messages for
explanation.
I have a very elaborate test procedure demonstrating the problem for
both tun and tap; it involves libvirt, qemu, and "crash". I can share
that procedure if necessary, but it's indeed quite long (I wrote it
originally for our QE team).
The patches in this series are supposed to "re-fix" CVE-2023-1076; given
that said CVE is classified as Low Impact (CVSSv3=5.5), I'm posting this
publicly, and not suggesting any embargo. Red Hat Product Security may
assign a new CVE number later.
I've tested the patches on top of v6.5-rc4, with "crash" built at commit c74f375e0ef7.
Cc: Eric Dumazet <edumazet@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pietro Borrello <borrello@diag.uniroma1.it> Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 66b2c338adce initializes the "sk_uid" field in the protocol socket
(struct sock) from the "/dev/tapX" device node's owner UID. Per original
commit 86741ec25462 ("net: core: Add a UID field to struct sock.",
2016-11-04), that's wrong: the idea is to cache the UID of the userspace
process that creates the socket. Commit 86741ec25462 mentions socket() and
accept(); with "tap", the action that creates the socket is
open("/dev/tapX").
Therefore the device node's owner UID is irrelevant. In most cases,
"/dev/tapX" will be owned by root, so in practice, commit 66b2c338adce has
no observable effect:
- before, "sk_uid" would be zero, due to undefined behavior
(CVE-2023-1076),
- after, "sk_uid" would be zero, due to "/dev/tapX" being owned by root.
What matters is the (fs)UID of the process performing the open(), so cache
that in "sk_uid".
Cc: Eric Dumazet <edumazet@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pietro Borrello <borrello@diag.uniroma1.it> Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 66b2c338adce ("tap: tap_open(): correctly initialize socket uid")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435 Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: tun_chr_open(): set sk_uid from current_fsuid()
Commit a096ccca6e50 initializes the "sk_uid" field in the protocol socket
(struct sock) from the "/dev/net/tun" device node's owner UID. Per
original commit 86741ec25462 ("net: core: Add a UID field to struct
sock.", 2016-11-04), that's wrong: the idea is to cache the UID of the
userspace process that creates the socket. Commit 86741ec25462 mentions
socket() and accept(); with "tun", the action that creates the socket is
open("/dev/net/tun").
Therefore the device node's owner UID is irrelevant. In most cases,
"/dev/net/tun" will be owned by root, so in practice, commit a096ccca6e50
has no observable effect:
- before, "sk_uid" would be zero, due to undefined behavior
(CVE-2023-1076),
- after, "sk_uid" would be zero, due to "/dev/net/tun" being owned by root.
What matters is the (fs)UID of the process performing the open(), so cache
that in "sk_uid".
Cc: Eric Dumazet <edumazet@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pietro Borrello <borrello@diag.uniroma1.it> Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org Fixes: a096ccca6e50 ("tun: tun_chr_open(): correctly initialize socket uid")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435 Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lin Ma [Tue, 1 Aug 2023 01:32:48 +0000 (09:32 +0800)]
net: dcb: choose correct policy to parse DCB_ATTR_BCN
The dcbnl_bcn_setcfg uses erroneous policy to parse tb[DCB_ATTR_BCN],
which is introduced in commit 859ee3c43812 ("DCB: Add support for DCB
BCN"). Please see the comment in below code
static int dcbnl_bcn_setcfg(...)
{
...
ret = nla_parse_nested_deprecated(..., dcbnl_pfc_up_nest, .. )
// !!! dcbnl_pfc_up_nest for attributes
// DCB_PFC_UP_ATTR_0 to DCB_PFC_UP_ATTR_ALL in enum dcbnl_pfc_up_attrs
...
for (i = DCB_BCN_ATTR_RP_0; i <= DCB_BCN_ATTR_RP_7; i++) {
// !!! DCB_BCN_ATTR_RP_0 to DCB_BCN_ATTR_RP_7 in enum dcbnl_bcn_attrs
...
value_byte = nla_get_u8(data[i]);
...
}
...
for (i = DCB_BCN_ATTR_BCNA_0; i <= DCB_BCN_ATTR_RI; i++) {
// !!! DCB_BCN_ATTR_BCNA_0 to DCB_BCN_ATTR_RI in enum dcbnl_bcn_attrs
...
value_int = nla_get_u32(data[i]);
...
}
...
}
That is, the nla_parse_nested_deprecated uses dcbnl_pfc_up_nest
attributes to parse nlattr defined in dcbnl_pfc_up_attrs. But the
following access code fetch each nlattr as dcbnl_bcn_attrs attributes.
By looking up the associated nla_policy for dcbnl_bcn_attrs. We can find
the beginning part of these two policies are "same".
Therefore, the current code is buggy and this
nla_parse_nested_deprecated could overflow the dcbnl_pfc_up_nest and use
the adjacent nla_policy to parse attributes from DCB_BCN_ATTR_BCNA_0.
Hence use the correct policy dcbnl_bcn_nest to parse the nested
tb[DCB_ATTR_BCN] TLV.
Jakub Kicinski [Tue, 1 Aug 2023 22:05:00 +0000 (15:05 -0700)]
Merge branch 'bnxt_en-2-xdp-bug-fixes'
Michael Chan says:
====================
bnxt_en: 2 XDP bug fixes
The first patch fixes XDP page pool logic on systems with page size >=
64K. The second patch fixes the max_mtu setting when an XDP program
supporting multi buffers is attached.
====================
Michael Chan [Mon, 31 Jul 2023 14:20:43 +0000 (07:20 -0700)]
bnxt_en: Fix max_mtu setting for multi-buf XDP
The existing code does not allow the MTU to be set to the maximum even
after an XDP program supporting multiple buffers is attached. Fix it
to set the netdev->max_mtu to the maximum value if the attached XDP
program supports mutiple buffers, regardless of the current MTU value.
Also use a local variable dev instead of repeatedly using bp->dev.
Fixes: 1dc4c557bfed ("bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20230731142043.58855-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The RXBD length field on all bnxt chips is 16-bit and so we cannot
support a full page when the native page size is 64K or greater.
The non-XDP (non page pool) code path has logic to handle this but
the XDP page pool code path does not handle this. Add the missing
logic to use page_pool_dev_alloc_frag() to allocate 32K chunks if
the page size is 64K or greater.
selftest: net: Assert on a proper value in so_incoming_cpu.c.
Dan Carpenter reported an error spotted by Smatch.
./tools/testing/selftests/net/so_incoming_cpu.c:163 create_clients()
error: uninitialized symbol 'ret'.
The returned value of sched_setaffinity() should be checked with
ASSERT_EQ(), but the value was not saved in a proper variable,
resulting in an error above.
Let's save the returned value of with sched_setaffinity().
Fixes: 6df96146b202 ("selftest: Add test for SO_INCOMING_CPU.") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/linux-kselftest/fe376760-33b6-4fc9-88e8-178e809af1ac@moroto.mountain/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230731181553.5392-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mark Brown [Mon, 31 Jul 2023 10:48:32 +0000 (11:48 +0100)]
net: netsec: Ignore 'phy-mode' on SynQuacer in DT mode
As documented in acd7aaf51b20 ("netsec: ignore 'phy-mode' device
property on ACPI systems") the SocioNext SynQuacer platform ships with
firmware defining the PHY mode as RGMII even though the physical
configuration of the PHY is for TX and RX delays. Since bbc4d71d63549bc
("net: phy: realtek: fix rtl8211e rx/tx delay config") this has caused
misconfiguration of the PHY, rendering the network unusable.
This was worked around for ACPI by ignoring the phy-mode property but
the system is also used with DT. For DT instead if we're running on a
SynQuacer force a working PHY mode, as well as the standard EDK2
firmware with DT there are also some of these systems that use u-boot
and might not initialise the PHY if not netbooting. Newer firmware
imagaes for at least EDK2 are available from Linaro so print a warning
when doing this.
Fixes: 533dd11a12f6 ("net: socionext: Add Synquacer NetSec driver") Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230731-synquacer-net-v3-1-944be5f06428@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yuanjun Gong [Mon, 31 Jul 2023 09:05:35 +0000 (17:05 +0800)]
net: korina: handle clk prepare error in korina_probe()
in korina_probe(), the return value of clk_prepare_enable()
should be checked since it might fail. we can use
devm_clk_get_optional_enabled() instead of devm_clk_get_optional()
and clk_prepare_enable() to automatically handle the error.
Ross Maynard [Mon, 31 Jul 2023 05:42:04 +0000 (15:42 +1000)]
USB: zaurus: Add ID for A-300/B-500/C-700
The SL-A300, B500/5600, and C700 devices no longer auto-load because of
"usbnet: Remove over-broad module alias from zaurus."
This patch adds IDs for those 3 devices.
Dan Carpenter [Mon, 31 Jul 2023 07:42:32 +0000 (10:42 +0300)]
net: ll_temac: fix error checking of irq_of_parse_and_map()
Most kernel functions return negative error codes but some irq functions
return zero on error. In this code irq_of_parse_and_map(), returns zero
and platform_get_irq() returns negative error codes. We need to handle
both cases appropriately.
Fixes: 8425c41d1ef7 ("net: ll_temac: Extend support to non-device-tree platforms") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Esben Haabendal <esben@geanix.com> Reviewed-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Harini Katakam <harini.katakam@amd.com> Link: https://lore.kernel.org/r/3d0aef75-06e0-45a5-a2a6-2cc4738d4143@moroto.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tomas Glozar [Fri, 28 Jul 2023 06:44:11 +0000 (08:44 +0200)]
bpf: sockmap: Remove preempt_disable in sock_map_sk_acquire
Disabling preemption in sock_map_sk_acquire conflicts with GFP_ATOMIC
allocation later in sk_psock_init_link on PREEMPT_RT kernels, since
GFP_ATOMIC might sleep on RT (see bpf: Make BPF and PREEMPT_RT co-exist
patchset notes for details).
This causes calling bpf_map_update_elem on BPF_MAP_TYPE_SOCKMAP maps to
BUG (sleeping function called from invalid context) on RT kernels.
preempt_disable was introduced together with lock_sk and rcu_read_lock
in commit 99ba2b5aba24e ("bpf: sockhash, disallow bpf_tcp_close and update
in parallel"), probably to match disabled migration of BPF programs, and
is no longer necessary.
Remove preempt_disable to fix BUG in sock_map_update_common on RT.
====================
net/sched Bind logic fixes for cls_fw, cls_u32 and cls_route
Three classifiers (cls_fw, cls_u32 and cls_route) always copy
tcf_result struct into the new instance of the filter on update.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
This patch set fixes this issue in all affected classifiers by no longer
copying the tcf_result struct from the old filter.
====================
valis [Sat, 29 Jul 2023 12:32:02 +0000 (08:32 -0400)]
net/sched: cls_route: No longer copy tcf_result on update to avoid use-after-free
When route4_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
Fixes: 1109c00547fc ("net: sched: RCU cls_route") Reported-by: valis <sec@valis.email> Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg> Signed-off-by: valis <sec@valis.email> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Victor Nogueira <victor@mojatatu.com> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg> Link: https://lore.kernel.org/r/20230729123202.72406-4-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
valis [Sat, 29 Jul 2023 12:32:01 +0000 (08:32 -0400)]
net/sched: cls_fw: No longer copy tcf_result on update to avoid use-after-free
When fw_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
Fixes: e35a8ee5993b ("net: sched: fw use RCU") Reported-by: valis <sec@valis.email> Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg> Signed-off-by: valis <sec@valis.email> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Victor Nogueira <victor@mojatatu.com> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg> Link: https://lore.kernel.org/r/20230729123202.72406-3-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
valis [Sat, 29 Jul 2023 12:32:00 +0000 (08:32 -0400)]
net/sched: cls_u32: No longer copy tcf_result on update to avoid use-after-free
When u32_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
Fixes: de5df63228fc ("net: sched: cls_u32 changes to knode must appear atomic to readers") Reported-by: valis <sec@valis.email> Reported-by: M A Ramdhan <ramdhan@starlabs.sg> Signed-off-by: valis <sec@valis.email> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Victor Nogueira <victor@mojatatu.com> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg> Link: https://lore.kernel.org/r/20230729123202.72406-2-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michal Schmidt [Sat, 29 Jul 2023 15:15:16 +0000 (17:15 +0200)]
octeon_ep: initialize mbox mutexes
The two mbox-related mutexes are destroyed in octep_ctrl_mbox_uninit(),
but the corresponding mutex_init calls were missing.
A "DEBUG_LOCKS_WARN_ON(lock->magic != lock)" warning was emitted with
CONFIG_DEBUG_MUTEXES on.
Initialize the two mutexes in octep_ctrl_mbox_init().
Fixes: 577f0d1b1c5f ("octeon_ep: add separate mailbox command and response queues") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230729151516.24153-1-mschmidt@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 28 Jul 2023 20:50:20 +0000 (13:50 -0700)]
bnxt: don't handle XDP in netpoll
Similarly to other recently fixed drivers make sure we don't
try to access XDP or page pool APIs when NAPI budget is 0.
NAPI budget of 0 may mean that we are in netpoll.
This may result in running software IRQs in hard IRQ context,
leading to deadlocks or crashes.
To make sure bnapi->tx_pkts don't get wiped without handling
the events, move clearing the field into the handler itself.
Remember to clear tx_pkts after reset (bnxt_enable_napi())
as it's technically possible that netpoll will accumulate
some tx_pkts and then a reset will happen, leaving tx_pkts
out of sync with reality.
Fixes: 322b87ca55f2 ("bnxt_en: add page_pool support") Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20230728205020.2784844-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Edward Cree [Fri, 28 Jul 2023 16:55:28 +0000 (17:55 +0100)]
sfc: fix field-spanning memcpy in selftest
Add a struct_group for the whole packet body so we can copy it in one
go without triggering FORTIFY_SOURCE complaints.
Fixes: cf60ed469629 ("sfc: use padding to fix alignment in loopback test") Fixes: 30c24dd87f3f ("sfc: siena: use padding to fix alignment in loopback test") Fixes: 1186c6b31ee1 ("sfc: falcon: use padding to fix alignment in loopback test") Reviewed-by: Andy Moreton <andy.moreton@amd.com> Tested-by: Andy Moreton <andy.moreton@amd.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230728165528.59070-1-edward.cree@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: usb: lan78xx: reorder cleanup operations to avoid UAF bugs
The timer dev->stat_monitor can schedule the delayed work dev->wq and
the delayed work dev->wq can also arm the dev->stat_monitor timer.
When the device is detaching, the net_device will be deallocated. but
the net_device private data could still be dereferenced in delayed work
or timer handler. As a result, the UAF bugs will happen.
Although we use cancel_delayed_work_sync() to cancel the delayed work
in lan78xx_disconnect(), it could still be scheduled in timer handler
lan78xx_stat_monitor().
Although we use del_timer_sync() to delete the timer, the function
timer_pending() returns 0 when the timer is activated. As a result,
the del_timer_sync() will not be executed and the timer could be
re-armed.
In order to mitigate this bug, We use timer_shutdown_sync() to shutdown
the timer and then use cancel_delayed_work_sync() to cancel the delayed
work. As a result, the net_device could be deallocated safely.
What's more, the dev->flags is set to EVENT_DEV_DISCONNECT in
lan78xx_disconnect(). But it could still be set to EVENT_STAT_UPDATE
in lan78xx_stat_monitor(). So this patch put the set_bit() behind
timer_shutdown_sync().
Fixes: 77dfff5bb7e2 ("lan78xx: Fix race condition in disconnect handling") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
1. Use unevaluatedProperties
It's needed to allow ethernet-controller.yaml properties work correctly.
2. Drop unneeded phy-handle/phy-mode
3. Don't require phy-handle
Some SoCs may use fixed link.
For in-kernel MT7621 DTS files this fixes following errors:
arch/mips/boot/dts/ralink/mt7621-tplink-hc220-g5-v1.dtb: ethernet@1e100000: mac@0: 'fixed-link' does not match any of the regexes: 'pinctrl-[0-9]+'
From schema: Documentation/devicetree/bindings/net/mediatek,net.yaml
arch/mips/boot/dts/ralink/mt7621-tplink-hc220-g5-v1.dtb: ethernet@1e100000: mac@0: 'phy-handle' is a required property
From schema: Documentation/devicetree/bindings/net/mediatek,net.yaml
arch/mips/boot/dts/ralink/mt7621-tplink-hc220-g5-v1.dtb: ethernet@1e100000: mac@1: 'fixed-link' does not match any of the regexes: 'pinctrl-[0-9]+'
From schema: Documentation/devicetree/bindings/net/mediatek,net.yaml
arch/mips/boot/dts/ralink/mt7621-tplink-hc220-g5-v1.dtb: ethernet@1e100000: mac@1: 'phy-handle' is a required property
From schema: Documentation/devicetree/bindings/net/mediatek,net.yaml
Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Co-developed-by: Eric Dumazet <edumazet@google.com> Co-developed-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 28 Jul 2023 15:03:17 +0000 (15:03 +0000)]
net: add missing data-race annotation for sk_ll_usec
In a prior commit I forgot that sk_getsockopt() reads
sk->sk_ll_usec without holding a lock.
Fixes: 0dbffbb5335a ("net: annotate data race around sk_ll_usec") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 28 Jul 2023 15:03:16 +0000 (15:03 +0000)]
net: add missing data-race annotations around sk->sk_peek_off
sk_getsockopt() runs locklessly, thus we need to annotate the read
of sk->sk_peek_off.
While we are at it, add corresponding annotations to sk_set_peek_off()
and unix_set_peek_off().
Fixes: b9bb53f3836f ("sock: convert sk_peek_offset functions to WRITE_ONCE") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 28 Jul 2023 15:03:15 +0000 (15:03 +0000)]
net: annotate data-races around sk->sk_mark
sk->sk_mark is often read while another thread could change the value.
Fixes: 4a19ec5800fc ("[NET]: Introducing socket mark socket option.") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In a prior commit, I forgot to change sk_getsockopt()
when reading sk->sk_rcvbuf locklessly.
Fixes: ebb3b78db7bf ("tcp: annotate sk->sk_rcvbuf lockless reads") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In a prior commit, I forgot to change sk_getsockopt()
when reading sk->sk_sndbuf locklessly.
Fixes: e292f05e0df7 ("tcp: annotate sk->sk_sndbuf lockless reads") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In a prior commit, I forgot to change sk_getsockopt()
when reading sk->sk_rcvlowat locklessly.
Fixes: eac66402d1c3 ("net: annotate sk->sk_rcvlowat lockless reads") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 28 Jul 2023 15:03:10 +0000 (15:03 +0000)]
net: annotate data-races around sk->sk_max_pacing_rate
sk_getsockopt() runs locklessly. This means sk->sk_max_pacing_rate
can be read while other threads are changing its value.
Fixes: 62748f32d501 ("net: introduce SO_MAX_PACING_RATE") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 28 Jul 2023 15:03:09 +0000 (15:03 +0000)]
net: annotate data-race around sk->sk_txrehash
sk_getsockopt() runs locklessly. This means sk->sk_txrehash
can be read while other threads are changing its value.
Other locations were handled in commit cb6cd2cec799
("tcp: Change SYN ACK retransmit behaviour to account for rehash")
Fixes: 26859240e4ee ("txhash: Add socket option to control TX hash rethink behavior") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Akhmat Karakotov <hmukos@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 28 Jul 2023 15:03:08 +0000 (15:03 +0000)]
net: annotate data-races around sk->sk_reserved_mem
sk_getsockopt() runs locklessly. This means sk->sk_reserved_mem
can be read while other threads are changing its value.
Add missing annotations where they are needed.
Fixes: 2bb2f5fb21b0 ("net: add new socket option SO_RESERVE_MEM") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Gobert [Thu, 27 Jul 2023 15:33:56 +0000 (17:33 +0200)]
net: gro: fix misuse of CB in udp socket lookup
This patch fixes a misuse of IP{6}CB(skb) in GRO, while calling to
`udp6_lib_lookup2` when handling udp tunnels. `udp6_lib_lookup2` fetch the
device from CB. The fix changes it to fetch the device from `skb->dev`.
l3mdev case requires special attention since it has a master and a slave
device.
Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket") Reported-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Fix this by making caller to provide the context whether it could be in
atomic context flow or not when getting stats from QED driver.
QED driver based on the context provided decide to schedule out or not
when acquiring the PTT BAR window.
We faced the BUG_ON() while getting vport stats, but according to the
code same issue could happen for fcoe and iscsi statistics as well, so
fixing them too.
Fixes: 6c75424612a7 ("qed: Add support for NCSI statistics.") Fixes: 1e128c81290a ("qed: Add support for hardware offloaded FCoE.") Fixes: 2f2b2614e893 ("qed: Provide iSCSI statistics to management") Cc: Sudarsana Kalluru <skalluru@marvell.com> Cc: David Miller <davem@davemloft.net> Cc: Manish Chopra <manishc@marvell.com> Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: microchip: KSZ9477 register regmap alignment to 32 bit boundaries
The commit (SHA1: 5c844d57aa7894154e49cf2fc648bfe2f1aefc1c) provided code
to apply "Module 6: Certain PHY registers must be written as pairs instead
of singly" errata for KSZ9477 as this chip for certain PHY registers
(0xN120 to 0xN13F, N=1,2,3,4,5) must be accesses as 32 bit words instead
of 16 or 8 bit access.
Otherwise, adjacent registers (no matter if reserved or not) are
overwritten with 0x0.
Without this patch some registers (e.g. 0x113c or 0x1134) required for 32
bit access are out of valid regmap ranges.
As a result, following error is observed and KSZ9477 is not properly
configured:
ksz-switch spi1.0: can't rmw 32bit reg 0x113c: -EIO
ksz-switch spi1.0: can't rmw 32bit reg 0x1134: -EIO
ksz-switch spi1.0 lan1 (uninitialized): failed to connect to PHY: -EIO
ksz-switch spi1.0 lan1 (uninitialized): error -5 setting up PHY for tree 0, switch 0, port 0
The solution is to modify regmap_reg_range to allow accesses with 4 bytes
boundaries.
Signed-off-by: Lukasz Majewski <lukma@denx.de> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net: stmmac: tegra: Properly allocate clock bulk data
The clock data is an array of struct clk_bulk_data, so make sure to
allocate enough memory.
Fixes: d8ca113724e7 ("net: stmmac: tegra: Add MGBE support") Signed-off-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Chengfeng Ye [Thu, 27 Jul 2023 08:56:19 +0000 (08:56 +0000)]
mISDN: hfcpci: Fix potential deadlock on &hc->lock
As &hc->lock is acquired by both timer _hfcpci_softirq() and hardirq
hfcpci_int(), the timer should disable irq before lock acquisition
otherwise deadlock could happen if the timmer is preemtped by the hadr irq.
A match entry is uniquely identified with an "address" or "path" in the
form of: hashtable ID(12b):bucketid(8b):nodeid(12b).
When creating table match entries all of hash table id, bucket id and
node (match entry id) are needed to be either specified by the user or
reasonable in-kernel defaults are used. The in-kernel default for a table id is
0x800(omnipresent root table); for bucketid it is 0x0. Prior to this fix there
was none for a nodeid i.e. the code assumed that the user passed the correct
nodeid and if the user passes a nodeid of 0 (as Mingi Cho did) then that is what
was used. But nodeid of 0 is reserved for identifying the table. This is not
a problem until we dump. The dump code notices that the nodeid is zero and
assumes it is referencing a table and therefore references table struct
tc_u_hnode instead of what was created i.e match entry struct tc_u_knode.
Ming does an equivalent of:
tc filter add dev dummy0 parent 10: prio 1 handle 0x1000 \
protocol ip u32 match ip src 10.0.0.1/32 classid 10:1 action ok
Essentially specifying a table id 0, bucketid 1 and nodeid of zero
Tableid 0 is remapped to the default of 0x800.
Bucketid 1 is ignored and defaults to 0x00.
Nodeid was assumed to be what Ming passed - 0x000
dumping before fix shows:
~$ tc filter ls dev dummy0 parent 10:
filter protocol ip pref 1 u32 chain 0
filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1
filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor -30591
Note that the last line reports a table instead of a match entry
(you can tell this because it says "ht divisor...").
As a result of reporting the wrong data type (misinterpretting of struct
tc_u_knode as being struct tc_u_hnode) the divisor is reported with value
of -30591. Ming identified this as part of the heap address
(physmap_base is 0xffff8880 (-30591 - 1)).
The fix is to ensure that when table entry matches are added and no
nodeid is specified (i.e nodeid == 0) then we get the next available
nodeid from the table's pool.
After the fix, this is what the dump shows:
$ tc filter ls dev dummy0 parent 10:
filter protocol ip pref 1 u32 chain 0
filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1
filter protocol ip pref 1 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 10:1 not_in_hw
match 0a000001/ffffffff at 12
action order 1: gact action pass
random type none pass val 0
index 1 ref 1 bind 1
Eugen Hristev [Wed, 26 Jul 2023 07:06:15 +0000 (10:06 +0300)]
dt-bindings: net: rockchip-dwmac: fix {tx|rx}-delay defaults/range in schema
The range and the defaults are specified in the description instead of
being specified in the schema.
Fix it by adding the default value in the `default` field and specifying
the range as `minimum` and `maximum`.
Fixes: b331b8ef86f0 ("dt-bindings: net: convert rockchip-dwmac to json-schema") Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 28 Jul 2023 03:03:40 +0000 (20:03 -0700)]
Merge tag 'mlx5-fixes-2023-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2023-07-26
This series provides bug fixes to mlx5 driver.
* tag 'mlx5-fixes-2023-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Unregister devlink params in case interface is down
net/mlx5: DR, Fix peer domain namespace setting
net/mlx5: fs_chains: Fix ft prio if ignore_flow_level is not supported
net/mlx5e: kTLS, Fix protection domain in use syndrome when devlink reload
net/mlx5: Bridge, set debugfs access right to root-only
net/mlx5e: xsk: Fix crash on regular rq reactivation
net/mlx5e: xsk: Fix invalid buffer access for legacy rq
net/mlx5e: Move representor neigh cleanup to profile cleanup_tx
net/mlx5e: Fix crash moving to switchdev mode when ntuple offload is set
net/mlx5e: Don't hold encap tbl lock if there is no encap action
net/mlx5: Honor user input for migratable port fn attr
net/mlx5e: fix return value check in mlx5e_ipsec_remove_trailer()
net/mlx5: fix potential memory leak in mlx5e_init_rep_rx
net/mlx5: DR, fix memory leak in mlx5dr_cmd_create_reformat_ctx
net/mlx5e: fix double free in macsec_fs_tx_create_crypto_table_groups
====================
Eric Dumazet [Wed, 26 Jul 2023 14:58:15 +0000 (14:58 +0000)]
net: flower: fix stack-out-of-bounds in fl_set_key_cfm()
Typical misuse of
nla_parse_nested(array, XXX_MAX, ...);
array must be declared as
struct nlattr *array[XXX_MAX + 1];
v2: Based on feedbacks from Ido Schimmel and Zahari Doychev,
I also changed TCA_FLOWER_KEY_CFM_OPT_MAX and cfm_opt_policy
definitions.
syzbot reported:
BUG: KASAN: stack-out-of-bounds in __nla_validate_parse+0x136/0x2bd0 lib/nlattr.c:588
Write of size 32 at addr ffffc90003a0ee20 by task syz-executor296/5014
The buggy address belongs to stack of task syz-executor296/5014
and is located at offset 32 in frame:
fl_set_key_cfm+0x0/0x440 net/sched/cls_flower.c:374
This frame has 1 object:
[32, 56) 'nla_cfm_opt'
The buggy address belongs to the virtual mapping at
[ffffc90003a08000, ffffc90003a11000) created by:
copy_process+0x5c8/0x4290 kernel/fork.c:2330
Fixes: 7cfffd5fed3e ("net: flower: add support for matching cfm fields") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Simon Horman <simon.horman@corigine.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Zahari Doychev <zdoychev@maxlinear.com> Link: https://lore.kernel.org/r/20230726145815.943910-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 26 Jul 2023 15:11:20 +0000 (08:11 -0700)]
MAINTAINERS: stmmac: retire Giuseppe Cavallaro
I tried to get stmmac maintainers to be more active by agreeing with
them off-list on a review rotation. I pinged Peppe 3 times over 2 weeks
during his "shift month", no reviews are flowing.
All the contributions are much appreciated! But stmmac is quite
active, we need participating maintainers :(
Older DSA drivers that do not provide an dsa_ops adjust_link method end
up using phylink. Unfortunately, a recent phylink change that requires
its supported_interfaces bitmap to be filled breaks these drivers
because the bitmap remains empty.
Rather than fixing each driver individually, fix it in the core code so
we have a sensible set of defaults.
Reported-by: Sergei Antonov <saproj@gmail.com> Fixes: de5c9bf40c45 ("net: phylink: require supported_interfaces to be filled") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> # dsa_loop Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/E1qOflM-001AEz-D3@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lin Ma [Wed, 26 Jul 2023 07:53:14 +0000 (15:53 +0800)]
rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length
There are totally 9 ndo_bridge_setlink handlers in the current kernel,
which are 1) bnxt_bridge_setlink, 2) be_ndo_bridge_setlink 3)
i40e_ndo_bridge_setlink 4) ice_bridge_setlink 5)
ixgbe_ndo_bridge_setlink 6) mlx5e_bridge_setlink 7)
nfp_net_bridge_setlink 8) qeth_l2_bridge_setlink 9) br_setlink.
By investigating the code, we find that 1-7 parse and use nlattr
IFLA_BRIDGE_MODE but 3 and 4 forget to do the nla_len check. This can
lead to an out-of-attribute read and allow a malformed nlattr (e.g.,
length 0) to be viewed as a 2 byte integer.
To avoid such issues, also for other ndo_bridge_setlink handlers in the
future. This patch adds the nla_len check in rtnl_bridge_setlink and
does an early error return if length mismatches. To make it works, the
break is removed from the parsing for IFLA_BRIDGE_FLAGS to make sure
this nla_for_each_nested iterates every attribute.
Fixes: b1edc14a3fbf ("ice: Implement ice_bridge_getlink and ice_bridge_setlink") Fixes: 51616018dd1b ("i40e: Add support for getlink, setlink ndo ops") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Lin Ma <linma@zju.edu.cn> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20230726075314.1059224-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'net-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can, netfilter.
Current release - regressions:
- core: fix splice_to_socket() for O_NONBLOCK socket
- af_unix: fix fortify_panic() in unix_bind_bsd().
- can: raw: fix lockdep issue in raw_release()
Previous releases - regressions:
- tcp: reduce chance of collisions in inet6_hashfn().
- netfilter: skip immediate deactivate in _PREPARE_ERROR
- tipc: stop tipc crypto on failure in tipc_node_create
- eth: igc: fix kernel panic during ndo_tx_timeout callback
- eth: iavf: fix potential deadlock on allocation failure
Previous releases - always broken:
- ipv6: fix bug where deleting a mngtmpaddr can create a new
temporary address
- eth: ice: fix memory management in ice_ethtool_fdir.c
- eth: hns3: fix the imp capability bit cannot exceed 32 bits issue
- eth: vxlan: calculate correct header length for GPE
- eth: stmmac: apply redundant write work around on 4.xx too"
* tag 'net-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
tipc: stop tipc crypto on failure in tipc_node_create
af_unix: Terminate sun_path when bind()ing pathname socket.
tipc: check return value of pskb_trim()
benet: fix return value check in be_lancer_xmit_workarounds()
virtio-net: fix race between set queues and probe
net/sched: mqprio: Add length check for TCA_MQPRIO_{MAX/MIN}_RATE64
splice, net: Fix splice_to_socket() for O_NONBLOCK socket
net: fec: tx processing does not call XDP APIs if budget is 0
mptcp: more accurate NL event generation
selftests: mptcp: join: only check for ip6tables if needed
tools: ynl-gen: fix parse multi-attr enum attribute
tools: ynl-gen: fix enum index in _decode_enum(..)
netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID
netfilter: nf_tables: skip immediate deactivate in _PREPARE_ERROR
netfilter: nft_set_rbtree: fix overlap expiration walk
igc: Fix Kernel Panic during ndo_tx_timeout callback
net: dsa: qca8k: fix mdb add/del case with 0 VID
net: dsa: qca8k: fix broken search_and_del
net: dsa: qca8k: fix search_and_insert wrong handling of new rule
net: dsa: qca8k: enable use_single_write for qca8xxx
...
Merge tag 'soundwire-6.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire fixes from Vinod Koul:
- Core fix for enumeration completion
- Qualcomm driver fix to update status
- AMD driver fix for probe error check
* tag 'soundwire-6.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: amd: Fix a check for errors in probe()
soundwire: qcom: update status correctly with mask
soundwire: fix enumeration completion
Merge tag 'phy-fixes-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Pull phy fixes from Vinod Koul:
- Out of bound fix for hisilicon phy
- Qualcomm synopsis femto phy for keeping clock enabled during suspend
and enabling ref clocks
- Mediatek driver fixes for upper limit test and error code
* tag 'phy-fixes-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
phy: hisilicon: Fix an out of bounds check in hisi_inno_phy_probe()
phy: qcom-snps-femto-v2: use qcom_snps_hsphy_suspend/resume error code
phy: qcom-snps-femto-v2: properly enable ref clock
phy: qcom-snps-femto-v2: keep cfg_ahb_clk enabled during runtime suspend
phy: mediatek: hdmi: mt8195: fix prediv bad upper limit test
phy: phy-mtk-dp: Fix an error code in probe()
Merge tag 'for-6.5-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix accounting of global block reserve size when block group tree is
enabled
- the async discard has been enabled in 6.2 unconditionally, but for
zoned mode it does not make that much sense to do it asynchronously
as the zones are reset as needed
- error handling and proper error value propagation fixes
* tag 'for-6.5-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: check for commit error at btrfs_attach_transaction_barrier()
btrfs: check if the transaction was aborted at btrfs_wait_for_commit()
btrfs: remove BUG_ON()'s in add_new_free_space()
btrfs: account block group tree when calculating global reserve size
btrfs: zoned: do not enable async discard
Merge tag 'fixes-2023-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock fix from Mike Rapoport:
"A call to memblock_free() or memblock_phys_free() issued after
memblock data is discarded will result in use after free in
memblock_isolate_range().
Avoid those issues by making sure that memblock_discard points
memblock.reserved.regions back at the static buffer"
* tag 'fixes-2023-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
mm,memblock: reset memblock.reserved to system init state to prevent UAF
mm: lock_vma_under_rcu() must check vma->anon_vma under vma lock
lock_vma_under_rcu() tries to guarantee that __anon_vma_prepare() can't
be called in the VMA-locked page fault path by ensuring that
vma->anon_vma is set.
However, this check happens before the VMA is locked, which means a
concurrent move_vma() can concurrently call unlink_anon_vmas(), which
disassociates the VMA's anon_vma.
This means we can get UAF in the following scenario:
This is a security bug if you can hit it, although an attacker would
have to win two races at once where the first race window is only a few
instructions wide.
This patch is based on some previous discussion with Linus Torvalds on
the security list.
Cc: stable@vger.kernel.org Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
tipc: stop tipc crypto on failure in tipc_node_create
If tipc_link_bc_create() fails inside tipc_node_create() for a newly
allocated tipc node then we should stop its tipc crypto and free the
resources allocated with a call to tipc_crypto_start().
As the node ref is initialized to one to that point, just put the ref on
tipc_link_bc_create() error case that would lead to tipc_node_free() be
eventually executed and properly clean the node and its crypto resources.
Found by Linux Verification Center (linuxtesting.org).
Fixes: cb8092d70a6f ("tipc: move bc link creation back to tipc_node_create") Suggested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20230725214628.25246-1-pchelkin@ispras.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
af_unix: Terminate sun_path when bind()ing pathname socket.
kernel test robot reported slab-out-of-bounds access in strlen(). [0]
Commit 06d4c8a80836 ("af_unix: Fix fortify_panic() in unix_bind_bsd().")
removed unix_mkname_bsd() call in unix_bind_bsd().
If sunaddr->sun_path is not terminated by user and we don't enable
CONFIG_INIT_STACK_ALL_ZERO=y, strlen() will do the out-of-bounds access
during file creation.
Let's go back to strlen()-with-sockaddr_storage way and pack all 108
trickiness into unix_mkname_bsd() with bold comments.
[0]:
BUG: KASAN: slab-out-of-bounds in strlen (lib/string.c:?)
Read of size 1 at addr ffff000015492777 by task fortify_strlen_/168
The buggy address belongs to the object at ffff000015492700
which belongs to the cache kmalloc-128 of size 128
The buggy address is located 0 bytes to the right of
allocated 119-byte region [ffff000015492700, ffff000015492777)
Memory state around the buggy address: ffff000015492600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff000015492680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff000015492700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 07 fc
^ ffff000015492780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff000015492800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Fixes: 06d4c8a80836 ("af_unix: Fix fortify_panic() in unix_bind_bsd().") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/netdev/202307262110.659e5e8-oliver.sang@intel.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230726190828.47874-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Thu, 27 Jul 2023 05:18:00 +0000 (22:18 -0700)]
Merge tag 'nf-23-07-26' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter fixes for net
1. On-demand overlap detection in 'rbtree' set can cause memory leaks.
This is broken since 6.2.
2. An earlier fix in 6.4 to address an imbalance in refcounts during
transaction error unwinding was incomplete, from Pablo Neira.
3. Disallow adding a rule to a deleted chain, also from Pablo.
Broken since 5.9.
* tag 'nf-23-07-26' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID
netfilter: nf_tables: skip immediate deactivate in _PREPARE_ERROR
netfilter: nft_set_rbtree: fix overlap expiration walk
====================
Jason Wang [Tue, 25 Jul 2023 07:20:49 +0000 (03:20 -0400)]
virtio-net: fix race between set queues and probe
A race were found where set_channels could be called after registering
but before virtnet_set_queues() in virtnet_probe(). Fixing this by
moving the virtnet_set_queues() before netdevice registering. While at
it, use _virtnet_set_queues() to avoid holding rtnl as the device is
not even registered at that time.
Cc: stable@vger.kernel.org Fixes: a220871be66f ("virtio-net: correctly enable multiqueue") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20230725072049.617289-1-jasowang@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lin Ma [Tue, 25 Jul 2023 02:42:27 +0000 (10:42 +0800)]
net/sched: mqprio: Add length check for TCA_MQPRIO_{MAX/MIN}_RATE64
The nla_for_each_nested parsing in function mqprio_parse_nlattr() does
not check the length of the nested attribute. This can lead to an
out-of-attribute read and allow a malformed nlattr (e.g., length 0) to
be viewed as 8 byte integer and passed to priv->max_rate/min_rate.
This patch adds the check based on nla_len() when check the nla_type(),
which ensures that the length of these two attribute must equals
sizeof(u64).
Fixes: 4e8b86c06269 ("mqprio: Introduce new hardware offload mode and shaper in mqprio") Reviewed-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Lin Ma <linma@zju.edu.cn> Link: https://lore.kernel.org/r/20230725024227.426561-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jan Stancek [Mon, 24 Jul 2023 17:39:04 +0000 (19:39 +0200)]
splice, net: Fix splice_to_socket() for O_NONBLOCK socket
LTP sendfile07 [1], which expects sendfile() to return EAGAIN when
transferring data from regular file to a "full" O_NONBLOCK socket,
started failing after commit 2dc334f1a63a ("splice, net: Use
sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()").
sendfile() no longer immediately returns, but now blocks.
Removed sock_sendpage() handled this case by setting a MSG_DONTWAIT
flag, fix new splice_to_socket() to do the same for O_NONBLOCK sockets.
net: fec: tx processing does not call XDP APIs if budget is 0
According to the clarification [1] in the latest napi.rst, the tx
processing cannot call any XDP (or page pool) APIs if the "budget"
is 0. Because NAPI is called with the budget of 0 (such as netpoll)
indicates we may be in an IRQ context, however, we cannot use the
page pool from IRQ context.
Paolo Abeni [Tue, 25 Jul 2023 18:34:56 +0000 (11:34 -0700)]
mptcp: more accurate NL event generation
Currently the mptcp code generate a "new listener" event even
if the actual listen() syscall fails. Address the issue moving
the event generation call under the successful branch.
Cc: stable@vger.kernel.org Fixes: f8c9dfbd875b ("mptcp: add pm listener events") Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230725-send-net-20230725-v1-2-6f60fe7137a9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
selftests: mptcp: join: only check for ip6tables if needed
If 'iptables-legacy' is available, 'ip6tables-legacy' command will be
used instead of 'ip6tables'. So no need to look if 'ip6tables' is
available in this case.
Cc: stable@vger.kernel.org Fixes: 0c4cd3f86a40 ("selftests: mptcp: join: use 'iptables-legacy' if available") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230725-send-net-20230725-v1-1-6f60fe7137a9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shay Drory [Sun, 25 Jun 2023 08:07:38 +0000 (11:07 +0300)]
net/mlx5: Unregister devlink params in case interface is down
Currently, in case an interface is down, mlx5 driver doesn't
unregister its devlink params, which leads to this WARN[1].
Fix it by unregistering devlink params in that case as well.
Shay Drory [Wed, 14 Jun 2023 06:03:32 +0000 (09:03 +0300)]
net/mlx5: DR, Fix peer domain namespace setting
The offending patch is based on the assumption that for PFs,
mlx5_get_dev_index() is the same as vhca_id. However, this assumption
is wrong in case of DPU (ECPF).
Fix it by using vhca_id directly, and switch the array of peers to
xarray.
Fixes: 6d5b7321d8af ("net/mlx5: DR, handle more than one peer domain") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Chris Mi [Mon, 17 Jul 2023 05:32:51 +0000 (08:32 +0300)]
net/mlx5: fs_chains: Fix ft prio if ignore_flow_level is not supported
The cited commit sets ft prio to fs_base_prio. But if
ignore_flow_level it not supported, ft prio must be set based on
tc filter prio. Otherwise, all the ft prio are the same on the same
chain. It is invalid if ignore_flow_level is not supported.
Fix it by setting ft prio based on tc filter prio and setting
fs_base_prio to 0 for fdb.
Fixes: 8e80e5648092 ("net/mlx5: fs_chains: Refactor to detach chains from tc usage") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Jianbo Liu [Mon, 8 May 2023 03:36:10 +0000 (03:36 +0000)]
net/mlx5e: kTLS, Fix protection domain in use syndrome when devlink reload
There are DEK objects cached in DEK pool after kTLS is used, and they
are freed only in mlx5e_ktls_cleanup().
mlx5e_destroy_mdev_resources() is called in mlx5e_suspend() to
free mdev resources, including protection domain (PD). However, PD is
still referenced by the cached DEK objects in this case, because
profile->cleanup() (and therefore mlx5e_ktls_cleanup()) is called
after mlx5e_suspend() during devlink reload. So the following FW
syndrome is generated:
mlx5_cmd_out_err:803:(pid 12948): DEALLOC_PD(0x801) op_mod(0x0) failed,
status bad resource state(0x9), syndrome (0xef0c8a), err(-22)
To avoid this syndrome, move DEK pool destruction to
mlx5e_ktls_cleanup_tx(), which is called by profile->cleanup_tx(). And
move pool creation to mlx5e_ktls_init_tx() for symmetry.
Fixes: f741db1a5171 ("net/mlx5e: kTLS, Improve connection rate by using fast update encryption key") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
net/mlx5e: xsk: Fix crash on regular rq reactivation
When the regular rq is reactivated after the XSK socket is closed
it could be reading stale cqes which eventually corrupts the rq.
This leads to no more traffic being received on the regular rq and a
crash on the next close or deactivation of the rq.
Kal Cuttler Conely reported this issue as a crash on the release
path when the xdpsock sample program is stopped (killed) and restarted
in sequence while traffic is running.
This patch flushes all cqes when during the rq flush. The cqe flushing
is done in the reset state of the rq. mlx5e_rq_to_ready code is moved
into the flush function to allow for this.
net/mlx5e: xsk: Fix invalid buffer access for legacy rq
The below crash can be encountered when using xdpsock in rx mode for
legacy rq: the buffer gets released in the XDP_REDIRECT path, and then
once again in the driver. This fix sets the flag to avoid releasing on
the driver side.
XSK handling of buffers for legacy rq was relying on the caller to set
the skip release flag. But the referenced fix started using fragment
counts for pages instead of the skip flag.
Jianbo Liu [Mon, 3 Jul 2023 08:28:16 +0000 (08:28 +0000)]
net/mlx5e: Move representor neigh cleanup to profile cleanup_tx
For IP tunnel encapsulation in ECMP (Equal-Cost Multipath) mode, as
the flow is duplicated to the peer eswitch, the related neighbour
information on the peer uplink representor is created as well.
In the cited commit, eswitch devcom unpair is moved to uplink unload
API, specifically the profile->cleanup_tx. If there is a encap rule
offloaded in ECMP mode, when one eswitch does unpair (because of
unloading the driver, for instance), and the peer rule from the peer
eswitch is going to be deleted, the use-after-free error is triggered
while accessing neigh info, as it is already cleaned up in uplink's
profile->disable, which is before its profile->cleanup_tx.
To fix this issue, move the neigh cleanup to profile's cleanup_tx
callback, and after mlx5e_cleanup_uplink_rep_tx is called. The neigh
init is moved to init_tx for symmeter.
[ 2453.376299] BUG: KASAN: slab-use-after-free in mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
[ 2453.379125] Read of size 4 at addr ffff888127af9008 by task modprobe/2496
Amir Tzin [Tue, 30 May 2023 17:11:14 +0000 (20:11 +0300)]
net/mlx5e: Fix crash moving to switchdev mode when ntuple offload is set
Moving to switchdev mode with ntuple offload on causes the kernel to
crash since fs->arfs is freed during nic profile cleanup flow.
Ntuple offload is not supported in switchdev mode and it is already
unset by mlx5 fix feature ndo in switchdev mode. Verify fs->arfs is
valid before disabling it.
net/mlx5: Honor user input for migratable port fn attr
Currently, whenever a user is setting migratable port fn attr, the
driver is always turn migratable capability on.
Fix it by honor the user input
Fixes: e5b9642a33be ("net/mlx5: E-Switch, Implement devlink port function cmds to control migratable") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
net/mlx5: fix potential memory leak in mlx5e_init_rep_rx
The memory pointed to by the priv->rx_res pointer is not freed in the error
path of mlx5e_init_rep_rx, which can lead to a memory leak. Fix by freeing
the memory in the error path, thereby making the error path identical to
mlx5e_cleanup_rep_rx().
Fixes: af8bbf730068 ("net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
net/mlx5: DR, fix memory leak in mlx5dr_cmd_create_reformat_ctx
when mlx5_cmd_exec failed in mlx5dr_cmd_create_reformat_ctx, the memory
pointed by 'in' is not released, which will cause memory leak. Move memory
release after mlx5_cmd_exec.
net/mlx5e: fix double free in macsec_fs_tx_create_crypto_table_groups
In function macsec_fs_tx_create_crypto_table_groups(), when the ft->g
memory is successfully allocated but the 'in' memory fails to be
allocated, the memory pointed to by ft->g is released once. And in function
macsec_fs_tx_create(), macsec_fs_tx_destroy() is called to release the
memory pointed to by ft->g again. This will cause double free problem.
When attribute is enum type and marked as multi-attr, the netlink
respond is not parsed, fails with stack trace:
Traceback (most recent call last):
File "/net-next/tools/net/ynl/./test.py", line 520, in <module>
main()
File "/net-next/tools/net/ynl/./test.py", line 488, in main
dplls=dplls_get(282574471561216)
File "/net-next/tools/net/ynl/./test.py", line 48, in dplls_get
reply=act(args)
File "/net-next/tools/net/ynl/./test.py", line 41, in act
reply = ynl.dump(args.dump, attrs)
File "/net-next/tools/net/ynl/lib/ynl.py", line 598, in dump
return self._op(method, vals, dump=True)
File "/net-next/tools/net/ynl/lib/ynl.py", line 584, in _op
rsp_msg = self._decode(gm.raw_attrs, op.attr_set.name)
File "/net-next/tools/net/ynl/lib/ynl.py", line 451, in _decode
self._decode_enum(rsp, attr_spec)
File "/net-next/tools/net/ynl/lib/ynl.py", line 408, in _decode_enum
value = enum.entries_by_val[raw].name
TypeError: unhashable type: 'list'
error: 1
Redesign _decode_enum(..) to take a enum int value and translate
it to either a bitmask or enum name as expected.
tools: ynl-gen: fix enum index in _decode_enum(..)
Remove wrong index adjustment, which is leftover from adding
support for sparse enums.
enum.entries_by_val() function shall not subtract the start-value, as
it is indexed with real enum value.
Fixes: c311aaa74ca1 ("tools: ynl: fix enum-as-flags in the generic CLI") Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20230725101642.267248-2-arkadiusz.kubalewski@intel.com Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'platform-drivers-x86-v6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Misc small fixes and hw-id additions"
* tag 'platform-drivers-x86-v6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: huawei-wmi: Silence ambient light sensor
platform/x86: msi-laptop: Fix rfkill out-of-sync on MSI Wind U100
platform/x86: asus-wmi: Fix setting RGB mode on some TUF laptops
platform/x86: think-lmi: Use kfree_sensitive instead of kfree
platform/x86/intel/hid: Add HP Dragonfly G2 to VGBS DMI quirks
platform/x86: intel: hid: Always call BTNL ACPI method
platform/x86/amd/pmf: Notify OS power slider update
platform/x86/amd/pmf: reduce verbosity of apmf_get_system_params
platform/x86: serial-multi-instantiate: Auto detect IRQ resource for CSC3551
platform/x86/amd: pmc: Use release_mem_region() to undo request_mem_region_muxed()
platform/x86: touchscreen_dmi.c: small changes for Archos 101 Cesium Educ tablet
Merge tag '6.5-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull ksmbd server fixes from Steve French:
- fixes for two possible out of bounds access (in negotiate, and in
decrypt msg)
- fix unsigned compared to zero warning
- fix path lookup crossing a mountpoint
- fix case when first compound request is a tree connect
- fix memory leak if reads are compounded
* tag '6.5-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix out of bounds in init_smb2_rsp_hdr()
ksmbd: no response from compound read
ksmbd: validate session id and tree id in compound request
ksmbd: fix out of bounds in smb3_decrypt_req()
ksmbd: check if a mount point is crossed during path lookup
ksmbd: Fix unsigned expression compared with zero
mm: suppress mm fault logging if fatal signal already pending
Commit eda0047296a1 ("mm: make the page fault mmap locking killable")
intentionally made it much easier to trigger the "page fault fails
because a fatal signal is pending" situation, by having the mmap locking
fail early in that case.
We have long aborted page faults in other fatal cases when the actual IO
for a page is interrupted by SIGKILL - which is particularly useful for
the traditional case of NFS hanging due to network issues, but local
filesystems could cause it too if you happened to get the SIGKILL while
waiting for a page to be faulted in (eg lock_folio_maybe_drop_mmap()).
So aborting the page fault wasn't a new condition - but it now triggers
earlier, before we even get to 'handle_mm_fault()'. And as a result the
error doesn't go through our 'fault_signal_pending()' logic, and doesn't
get filtered away there.
Normally you'd never even notice, because if a fatal signal is pending,
the new SIGSEGV we send ends up being ignored anyway.
But it turns out that there is one very noticeable exception: if you
enable 'show_unhandled_signals', the aborted page fault will be logged
in the kernel messages, and you'll get a scary line looking something
like this in your logs:
which is rather misleading. It's not really a segfault at all, it's
just "the thread was killed before the page fault completed, so we
aborted the page fault".
Fix this by just making it clear that a pending fatal signal means that
any new signal coming in after that is implicitly handled. This will
avoid the misleading logging, since now the signal isn't 'unhandled' any
more.
Reported-and-tested-by: Fiona Ebner <f.ebner@proxmox.com> Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Link: https://lore.kernel.org/lkml/8d063a26-43f5-0bb7-3203-c6a04dc159f8@proxmox.com/ Acked-by: Oleg Nesterov <oleg@redhat.com> Fixes: eda0047296a1 ("mm: make the page fault mmap locking killable") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID
Bail out with EOPNOTSUPP when adding rule to bound chain via
NFTA_RULE_CHAIN_ID. The following warning splat is shown when
adding a rule to a deleted bound chain:
netfilter: nft_set_rbtree: fix overlap expiration walk
The lazy gc on insert that should remove timed-out entries fails to release
the other half of the interval, if any.
Can be reproduced with tests/shell/testcases/sets/0044interval_overlap_0
in nftables.git and kmemleak enabled kernel.
Second bug is the use of rbe_prev vs. prev pointer.
If rbe_prev() returns NULL after at least one iteration, rbe_prev points
to element that is not an end interval, hence it should not be removed.
Lastly, check the genmask of the end interval if this is active in the
current generation.
Fixes: c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection") Signed-off-by: Florian Westphal <fw@strlen.de>
Filipe Manana [Fri, 21 Jul 2023 09:49:21 +0000 (10:49 +0100)]
btrfs: check for commit error at btrfs_attach_transaction_barrier()
btrfs_attach_transaction_barrier() is used to get a handle pointing to the
current running transaction if the transaction has not started its commit
yet (its state is < TRANS_STATE_COMMIT_START). If the transaction commit
has started, then we wait for the transaction to commit and finish before
returning - however we completely ignore if the transaction was aborted
due to some error during its commit, we simply return ERR_PT(-ENOENT),
which makes the caller assume everything is fine and no errors happened.
This could make an fsync return success (0) to user space when in fact we
had a transaction abort and the target inode changes were therefore not
persisted.
Fix this by checking for the return value from btrfs_wait_for_commit(),
and if it returned an error, return it back to the caller.
Fixes: d4edf39bd5db ("Btrfs: fix uncompleted transaction") CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
igc: Fix Kernel Panic during ndo_tx_timeout callback
The Xeon validation group has been carrying out some loaded tests
with various HW configurations, and they have seen some transmit
queue time out happening during the test. This will cause the
reset adapter function to be called by igc_tx_timeout().
Similar race conditions may arise when the interface is being brought
down and up in igc_reinit_locked(), an interrupt being generated, and
igc_clean_tx_irq() being called to complete the TX.
When the igc_tx_timeout() function is invoked, this patch will turn
off all TX ring HW queues during igc_down() process. TX ring HW queues
will be activated again during the igc_configure_tx_ring() process
when performing the igc_up() procedure later.
This patch also moved existing igc_disable_tx_ring_hw() to avoid using
forward declaration.
Fixes: 9b275176270e ("igc: Add ndo_tx_timeout support") Tested-by: Alejandra Victoria Alcaraz <alejandra.victoria.alcaraz@intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The qca8k switch doesn't support using 0 as VID and require a default
VID to be always set. MDB add/del function doesn't currently handle
this and are currently setting the default VID.
Fix this by correctly handling this corner case and internally use the
default VID for VID 0 case.
Fixes: ba8f870dfa63 ("net: dsa: qca8k: add support for mdb_add/del") Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
On deleting an MDB entry for a port, fdb_search_and_del is used.
An FDB entry can't be modified so it needs to be deleted and readded
again with the new portmap (and the port deleted as requested)
We use the SEARCH operator to search the entry to edit by vid and mac
address and then we check the aging if we actually found an entry.
Currently the code suffer from a bug where the searched fdb entry is
never read again with the found values (if found) resulting in the code
always returning -EINVAL as aging was always 0.
Fix this by correctly read the fdb entry after it was searched.
Fixes: ba8f870dfa63 ("net: dsa: qca8k: add support for mdb_add/del") Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>