Piotr Krysiuk [Tue, 16 Mar 2021 07:26:25 +0000 (08:26 +0100)]
bpf: Simplify alu_limit masking for pointer arithmetic
Instead of having the mov32 with aux->alu_limit - 1 immediate, move this
operation to retrieve_ptr_limit() instead to simplify the logic and to
allow for subsequent sanity boundary checks inside retrieve_ptr_limit().
This avoids in future that at the time of the verifier masking rewrite
we'd run into an underflow which would not sign extend due to the nature
of mov32 instruction.
Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
Piotr Krysiuk [Tue, 16 Mar 2021 07:20:16 +0000 (08:20 +0100)]
bpf: Fix off-by-one for area size in creating mask to left
retrieve_ptr_limit() computes the ptr_limit for registers with stack and
map_value type. ptr_limit is the size of the memory area that is still
valid / in-bounds from the point of the current position and direction
of the operation (add / sub). This size will later be used for masking
the operation such that attempting out-of-bounds access in the speculative
domain is redirected to remain within the bounds of the current map value.
When masking to the right the size is correct, however, when masking to
the left, the size is off-by-one which would lead to an incorrect mask
and thus incorrect arithmetic operation in the non-speculative domain.
Piotr found that if the resulting alu_limit value is zero, then the
BPF_MOV32_IMM() from the fixup_bpf_calls() rewrite will end up loading
0xffffffff into AX instead of sign-extending to the full 64 bit range,
and as a result, this allows abuse for executing speculatively out-of-
bounds loads against 4GB window of address space and thus extracting the
contents of kernel memory via side-channel.
Fixes: 979d63d50c0c ("bpf: prevent out of bounds speculation on pointer arithmetic") Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
Piotr Krysiuk [Tue, 16 Mar 2021 08:47:02 +0000 (09:47 +0100)]
bpf: Prohibit alu ops for pointer types not defining ptr_limit
The purpose of this patch is to streamline error propagation and in particular
to propagate retrieve_ptr_limit() errors for pointer types that are not defining
a ptr_limit such that register-based alu ops against these types can be rejected.
The main rationale is that a gap has been identified by Piotr in the existing
protection against speculatively out-of-bounds loads, for example, in case of
ctx pointers, unprivileged programs can still perform pointer arithmetic. This
can be abused to execute speculatively out-of-bounds loads without restrictions
and thus extract contents of kernel memory.
Fix this by rejecting unprivileged programs that attempt any pointer arithmetic
on unprotected pointer types. The two affected ones are pointer to ctx as well
as pointer to map. Field access to a modified ctx' pointer is rejected at a
later point in time in the verifier, and 7c6967326267 ("bpf: Permit map_ptr
arithmetic with opcode add and offset 0") only relevant for root-only use cases.
Risk of unprivileged program breakage is considered very low.
Fixes: 7c6967326267 ("bpf: Permit map_ptr arithmetic with opcode add and offset 0") Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation") Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
The following sequence of commands:
register_ftrace_direct(ip, addr1);
modify_ftrace_direct(ip, addr1, addr2);
unregister_ftrace_direct(ip, addr2);
will cause the kernel to warn:
[ 30.179191] WARNING: CPU: 2 PID: 1961 at kernel/trace/ftrace.c:5223 unregister_ftrace_direct+0x130/0x150
[ 30.180556] CPU: 2 PID: 1961 Comm: test_progs W O 5.12.0-rc2-00378-g86bc10a0a711-dirty #3246
[ 30.182453] RIP: 0010:unregister_ftrace_direct+0x130/0x150
When modify_ftrace_direct() changes the addr from old to new it should update
the addr stored in ftrace_direct_funcs. Otherwise the final
unregister_ftrace_direct() won't find the address and will cause the splat.
Hangbin Liu [Tue, 9 Mar 2021 03:22:14 +0000 (11:22 +0800)]
selftests/bpf: Set gopt opt_class to 0 if get tunnel opt failed
When fixing the bpf test_tunnel.sh geneve failure. I only fixed the IPv4
part but forgot the IPv6 issue. Similar with the IPv4 fixes 557c223b643a
("selftests/bpf: No need to drop the packet when there is no geneve opt"),
when there is no tunnel option and bpf_skb_get_tunnel_opt() returns error,
there is no need to drop the packets and break all geneve rx traffic.
Just set opt_class to 0 and keep returning TC_ACT_OK at the end.
Fixes: 557c223b643a ("selftests/bpf: No need to drop the packet when there is no geneve opt") Fixes: 933a741e3b82 ("selftests/bpf: bpf tunnel test.") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: William Tu <u9012063@gmail.com> Link: https://lore.kernel.org/bpf/20210309032214.2112438-1-liuhangbin@gmail.com
flow_dissector: fix byteorder of dissected ICMP ID
flow_dissector_key_icmp::id is of type u16 (CPU byteorder),
ICMP header has its ID field in network byteorder obviously.
Sparse says:
net/core/flow_dissector.c:178:43: warning: restricted __be16 degrades to integer
Convert ID value to CPU byteorder when storing it into
flow_dissector_key_icmp.
Fixes: 5dec597e5cd0 ("flow_dissector: extract more ICMP information") Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
Local variable ----addr@____sys_recvmsg created at:
____sys_recvmsg+0x168/0xd50 net/socket.c:2550
____sys_recvmsg+0x168/0xd50 net/socket.c:2550
Bytes 2-3 of 12 are uninitialized
Memory access of size 12 starts at ffff88817c627b40
Data copied to user address 0000000020000140
Fixes: bdabad3e363d ("net: Add Qualcomm IPC router") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Courtney Cavin <courtney.cavin@sonymobile.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tong Zhang [Sun, 14 Mar 2021 18:08:36 +0000 (14:08 -0400)]
net: arcnet: com20020 fix error handling
There are two issues when handling error case in com20020pci_probe()
1. priv might be not initialized yet when calling com20020pci_remove()
from com20020pci_probe(), since the priv is set at the very last but it
can jump to error handling in the middle and priv remains NULL.
2. memory leak - the net device is allocated in alloc_arcdev but not
properly released if error happens in the middle of the big for loop
Alex Elder [Fri, 12 Mar 2021 15:12:48 +0000 (09:12 -0600)]
net: ipa: terminate message handler arrays
When a QMI handle is initialized, an array of message handler
structures is provided, defining how any received message should
be handled based on its type and message ID. The QMI core code
traverses this array when a message arrives and calls the function
associated with the (type, msg_id) found in the array.
The array is supposed to be terminated with an empty (all zero)
entry though. Without it, an unsupported message will cause
the QMI core code to go past the end of the array.
Fix this bug, by properly terminating the message handler arrays
provided when QMI handles are set up by the IPA driver.
Fixes: 530f9216a9537 ("soc: qcom: ipa: AP/modem communications") Reported-by: Sujit Kautkar <sujitka@chromium.org> Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 13 Mar 2021 02:02:28 +0000 (18:02 -0800)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git
/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-03-12
This series contains updates to ice, i40e, ixgbe and igb drivers.
Magnus adjusts the return value for xsk allocation for ice. This fixes
reporting of napi work done and matches the behavior of other Intel NIC
drivers for xsk allocations.
Maciej moves storing of the rx_offset value to after the build_skb flag
is set as this flag affects the offset value for ice, i40e, and ixgbe.
Li RongQing resolves an issue where an Rx buffer can be reused
prematurely with XDP redirect for igb.
====================
Mat Martineau [Sat, 13 Mar 2021 00:43:52 +0000 (16:43 -0800)]
selftests: mptcp: Restore packet capture option in join tests
The join self tests previously used the '-c' command line option to
enable creation of pcap files for the tests that run, but the change to
allow running a subset of the join tests made overlapping use of that
option.
Restore the capture functionality with '-c' and move the syncookie test
option to '-k'.
Fixes: 1002b89f23ea ("selftests: mptcp: add command line arguments for mptcp_join.sh") Acked-and-tested-by: Geliang Tang <geliangtang@gmail.com> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilya Lipnitskiy [Fri, 12 Mar 2021 08:07:03 +0000 (00:07 -0800)]
net: dsa: mt7530: setup core clock even in TRGMII mode
A recent change to MIPS ralink reset logic made it so mt7530 actually
resets the switch on platforms such as mt7621 (where bit 2 is the reset
line for the switch). That exposed an issue where the switch would not
function properly in TRGMII mode after a reset.
Reconfigure core clock in TRGMII mode to fix the issue.
Tested on Ubiquiti ER-X (MT7621) with TRGMII mode enabled.
Fixes: 3f9ef7785a9c ("MIPS: ralink: manage low reset lines") Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 12 Mar 2021 07:41:12 +0000 (10:41 +0300)]
mptcp: fix bit MPTCP_PUSH_PENDING tests
The MPTCP_PUSH_PENDING define is 6 and these tests should be testing if
BIT(6) is set.
Fixes: c2e6048fa1cf ("mptcp: fix race in release_cb") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 12 Mar 2021 00:52:50 +0000 (16:52 -0800)]
net: phy: broadcom: Fix RGMII delays for BCM50160 and BCM50610M
The PHY driver entry for BCM50160 and BCM50610M calls
bcm54xx_config_init() but does not call bcm54xx_config_clock_delay() in
order to configuration appropriate clock delays on the PHY, fix that.
Fixes: 733336262b28 ("net: phy: Allow BCM5481x PHYs to setup internal TX/RX clock delay") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dylan Hung [Fri, 12 Mar 2021 00:34:05 +0000 (11:04 +1030)]
ftgmac100: Restart MAC HW once
The interrupt handler may set the flag to reset the mac in the future,
but that flag is not cleared once the reset has occurred.
Fixes: 10cbd6407609 ("ftgmac100: Rework NAPI & interrupts handling") Signed-off-by: Dylan Hung <dylan_hung@aspeedtech.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Hancock [Thu, 11 Mar 2021 20:05:18 +0000 (14:05 -0600)]
net: axienet: Fix probe error cleanup
The driver did not always clean up all allocated resources when probe
failed. Fix the probe cleanup path to clean up everything that was
allocated.
Fixes: 57baf8cc70ea ("net: axienet: Handle deferred probe on clock properly") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net>
liuyacan [Thu, 11 Mar 2021 16:32:25 +0000 (00:32 +0800)]
net: correct sk_acceptq_is_full()
The "backlog" argument in listen() specifies
the maximom length of pending connections,
so the accept queue should be considered full
if there are exactly "backlog" elements.
Signed-off-by: liuyacan <yacanliu@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Thu, 21 Jan 2021 21:54:23 +0000 (13:54 -0800)]
igb: avoid premature Rx buffer reuse
Igb needs a similar fix as commit 75aab4e10ae6a ("i40e: avoid
premature Rx buffer reuse")
The page recycle code, incorrectly, relied on that a page fragment
could not be freed inside xdp_do_redirect(). This assumption leads to
that page fragments that are used by the stack/XDP redirect can be
reused and overwritten.
To avoid this, store the page count prior invoking xdp_do_redirect().
Longer explanation:
Intel NICs have a recycle mechanism. The main idea is that a page is
split into two parts. One part is owned by the driver, one part might
be owned by someone else, such as the stack.
t0: Page is allocated, and put on the Rx ring
+---------------
used by NIC ->| upper buffer
(rx_buffer) +---------------
| lower buffer
+---------------
page count == USHRT_MAX
rx_buffer->pagecnt_bias == USHRT_MAX
t1: Buffer is received, and passed to the stack (e.g.)
+---------------
| upper buff (skb)
+---------------
used by NIC ->| lower buffer
(rx_buffer) +---------------
page count == USHRT_MAX
rx_buffer->pagecnt_bias == USHRT_MAX - 1
t2: Buffer is received, and redirected
+---------------
| upper buff (skb)
+---------------
used by NIC ->| lower buffer
(rx_buffer) +---------------
This means that buffer *cannot* be flipped/reused, because the skb is
still using it.
The problem arises when xdp_do_redirect() actually frees the
segment. Then we get:
page count == USHRT_MAX - 1
rx_buffer->pagecnt_bias == USHRT_MAX - 2
From a recycle perspective, the buffer can be flipped and reused,
which means that the skb data area is passed to the Rx HW ring!
To work around this, the page count is stored prior calling
xdp_do_redirect().
Fixes: 9cbc948b5a20 ("igb: add XDP support") Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
ixgbe: move headroom initialization to ixgbe_configure_rx_ring
ixgbe_rx_offset(), that is supposed to initialize the Rx buffer headroom,
relies on __IXGBE_RX_BUILD_SKB_ENABLED flag.
Currently, the callsite of mentioned function is placed incorrectly
within ixgbe_setup_rx_resources() where Rx ring's build skb flag is not
set yet. This causes the XDP_REDIRECT to be partially broken due to
inability to create xdp_frame in the headroom space, as the headroom is
0.
Fix this by moving ixgbe_rx_offset() to ixgbe_configure_rx_ring() after
the flag setting, which happens to be set in ixgbe_set_rx_buffer_len.
Fixes: c0d4e9d223c5 ("ixgbe: store the result of ixgbe_rx_offset() onto ixgbe_ring") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
ice: move headroom initialization to ice_setup_rx_ctx
ice_rx_offset(), that is supposed to initialize the Rx buffer headroom,
relies on ICE_RX_FLAGS_RING_BUILD_SKB flag as well as XDP prog presence.
Currently, the callsite of mentioned function is placed incorrectly
within ice_setup_rx_ring() where Rx ring's build skb flag is not
set yet. This causes the XDP_REDIRECT to be partially broken due to
inability to create xdp_frame in the headroom space, as the headroom is
0.
Fix this by moving ice_rx_offset() to ice_setup_rx_ctx() after the flag
setting.
Fixes: f1b1f409bf79 ("ice: store the result of ice_rx_offset() onto ice_ring") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
i40e: move headroom initialization to i40e_configure_rx_ring
i40e_rx_offset(), that is supposed to initialize the Rx buffer headroom,
relies on I40E_RXR_FLAGS_BUILD_SKB_ENABLED flag.
Currently, the callsite of mentioned function is placed incorrectly
within i40e_setup_rx_descriptors() where Rx ring's build skb flag is not
set yet. This causes the XDP_REDIRECT to be partially broken due to
inability to create xdp_frame in the headroom space, as the headroom is
0.
For the record, below is the call graph:
i40e_vsi_open
i40e_vsi_setup_rx_resources
i40e_setup_rx_descriptors
i40e_rx_offset() <-- sets offset to 0 as build_skb flag is set below
i40e_vsi_configure_rx
i40e_configure_rx_ring
set_ring_build_skb_enabled(ring) <-- set build_skb flag
Fix this by moving i40e_rx_offset() to i40e_configure_rx_ring() after
the flag setting.
Fixes: f7bb0d71d658 ("i40e: store the result of i40e_rx_offset() onto i40e_ring") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Magnus Karlsson [Fri, 5 Feb 2021 09:09:04 +0000 (10:09 +0100)]
ice: fix napi work done reporting in xsk path
Fix the wrong napi work done reporting in the xsk path of the ice
driver. The code in the main Rx processing loop was written to assume
that the buffer allocation code returns true if all allocations where
successful and false if not. In contrast with all other Intel NIC xsk
drivers, the ice_alloc_rx_bufs_zc() has the inverted logic messing up
the work done reporting in the napi loop.
This can be fixed either by inverting the return value from
ice_alloc_rx_bufs_zc() in the function that uses this in an incorrect
way, or by changing the return value of ice_alloc_rx_bufs_zc(). We
chose the latter as it makes all the xsk allocation functions for
Intel NICs behave in the same way. My guess is that it was this
unexpected discrepancy that gave rise to this bug in the first place.
Fixes: 5bb0c4b5eb61 ("ice, xsk: Move Rx allocation out of while-loop") Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
David S. Miller [Fri, 12 Mar 2021 02:30:32 +0000 (18:30 -0800)]
Merge branch 'htb-fixes'
Maxim Mikityanskiy says:
====================
Bugfixes for HTB
The HTB offload feature introduced a few bugs in HTB. One affects the
non-offload mode, preventing attaching qdiscs to HTB classes, and the
other affects the error flow, when the netdev doesn't support the
offload, but it was requested. This short series fixes them.
====================
Acked-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sch_htb: Fix offload cleanup in htb_destroy on htb_init failure
htb_init may fail to do the offload if it's not supported or if a
runtime error happens when allocating direct qdiscs. In those cases
TC_HTB_CREATE command is not sent to the driver, however, htb_destroy
gets called anyway and attempts to send TC_HTB_DESTROY.
It shouldn't happen, because the driver didn't receive TC_HTB_CREATE,
and also because the driver may not support ndo_setup_tc at all, while
q->offload is true, and htb_destroy mistakenly thinks the offload is
supported. Trying to call ndo_setup_tc in the latter case will lead to a
NULL pointer dereference.
This commit fixes the issues with htb_destroy by deferring assignment of
q->offload until after the TC_HTB_CREATE command. The necessary cleanup
of the offload entities is already done in htb_init.
Reported-by: syzbot+b53a709f04722ca12a3c@syzkaller.appspotmail.com Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
htb_select_queue assumes it's always the offload mode, and it ends up in
calling ndo_setup_tc without any checks. It may lead to a NULL pointer
dereference if ndo_setup_tc is not implemented, or to an error returned
from the driver, which will prevent attaching qdiscs to HTB classes in
the non-offload mode.
This commit fixes the bug by adding the missing check to
htb_select_queue. In the non-offload mode it will return sch->dev_queue,
mimicking tc_modify_qdisc's behavior for the case where select_queue is
not implemented.
Reported-by: syzbot+b53a709f04722ca12a3c@syzkaller.appspotmail.com Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 11 Mar 2021 04:53:42 +0000 (20:53 -0800)]
net: phy: broadcom: Add power down exit reset state delay
Per the datasheet, when we clear the power down bit, the PHY remains in
an internal reset state for 40us and then resume normal operation.
Account for that delay to avoid any issues in the future if
genphy_resume() changes.
Fixes: fe26821fa614 ("net: phy: broadcom: Wire suspend/resume for BCM54810") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tong Zhang [Thu, 11 Mar 2021 04:27:35 +0000 (23:27 -0500)]
mISDN: fix crash in fritzpci
setup_fritz() in avmfritz.c might fail with -EIO and in this case the
isac.type and isac.write_reg is not initialized and remains 0(NULL).
A subsequent call to isac_release() will dereference isac->write_reg and
crash.
Lv Yunlong [Thu, 11 Mar 2021 04:01:40 +0000 (20:01 -0800)]
net/qlcnic: Fix a use after free in qlcnic_83xx_get_minidump_template
In qlcnic_83xx_get_minidump_template, fw_dump->tmpl_hdr was freed by
vfree(). But unfortunately, it is used when extended is true.
Fixes: 7061b2bdd620e ("qlogic: Deletion of unnecessary checks before two function calls") Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
Tonghao Zhang [Thu, 11 Mar 2021 02:57:36 +0000 (10:57 +0800)]
net: sock: simplify tw proto registration
Introduce the new function tw_prot_init (inspired by
req_prot_init) to simplify "proto_register" function.
tw_prot_cleanup will take care of a partially initialized
timewait_sock_ops.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dinghao Liu [Sun, 28 Feb 2021 09:44:23 +0000 (17:44 +0800)]
e1000e: Fix error handling in e1000_set_d0_lplu_state_82571
There is one e1e_wphy() call in e1000_set_d0_lplu_state_82571
that we have caught its return value but lack further handling.
Check and terminate the execution flow just like other e1e_wphy()
in this function.
Fixes: bc7f75fa9788 ("[E1000E]: New pci-express e1000 driver (currently for ICH9 devices only)") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Vitaly Lifshits [Wed, 21 Oct 2020 11:59:37 +0000 (14:59 +0300)]
e1000e: add rtnl_lock() to e1000_reset_task
A possible race condition was found in e1000_reset_task,
after discovering a similar issue in igb driver via
commit 024a8168b749 ("igb: reinit_locked() should be called
with rtnl_lock").
Added rtnl_lock() and rtnl_unlock() to avoid this.
Fixes: bc7f75fa9788 ("[E1000E]: New pci-express e1000 driver (currently for ICH9 devices only)") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Andre Guedes [Wed, 10 Mar 2021 06:42:56 +0000 (22:42 -0800)]
igc: Fix igc_ptp_rx_pktstamp()
The comment describing the timestamps layout in the packet buffer is
wrong and the code is actually retrieving the timestamp in Timer 1
reference instead of Timer 0. This hasn't been a big issue so far
because hardware is configured to report both timestamps using Timer 0
(see IGC_SRRCTL register configuration in igc_ptp_enable_rx_timestamp()
helper). This patch fixes the comment and the code so we retrieve the
timestamp in Timer 0 reference as expected.
This patch also takes the opportunity to get rid of the hw.mac.type check
since it is not required.
Fixes: 81b055205e8ba ("igc: Add support for RX timestamping") Signed-off-by: Andre Guedes <andre.guedes@intel.com> Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: Jithu Joseph <jithu.joseph@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The Supported Pause Frame always display "No" even though the Advertised
pause frame showing the correct setting based on the pause parameters via
ethtool. Set bit in link_ksettings to "Supported" for Pause Frame.
Fix Pause Frame Advertising when getting the advertisement via ethtool.
Remove setting the "advertising" bit in link_ksettings during default
case when Tx and Rx are in off state with Auto Negotiate off.
Below is the original output of advertisement link during Tx and Rx off:
Advertised pause frame use: Symmetric Receive-only
Expected output:
Advertised pause frame use: No
Fixes: 8c5ad0dae93c ("igc: Add ethtool support") Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Reviewed-by: Malli C <mallikarjuna.chilakala@intel.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Sasha Neftin [Tue, 20 Oct 2020 13:34:00 +0000 (16:34 +0300)]
igc: reinit_locked() should be called with rtnl_lock
This commit applies to the igc_reset_task the same changes that
were applied to the igb driver in commit 024a8168b749 ("igb:
reinit_locked() should be called with rtnl_lock")
and fix possible race in reset subtask.
Fixes: 0507ef8a0372 ("igc: Add transmit and receive fastpath and interrupt handlers") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Florian Fainelli [Wed, 10 Mar 2021 22:17:58 +0000 (14:17 -0800)]
net: dsa: bcm_sf2: Qualify phydev->dev_flags based on port
Similar to commit 92696286f3bb37ba50e4bd8d1beb24afb759a799 ("net:
bcmgenet: Set phydev->dev_flags only for internal PHYs") we need to
qualify the phydev->dev_flags based on whether the port is connected to
an internal or external PHY otherwise we risk having a flags collision
with a completely different interpretation depending on the driver.
Fixes: aa9aef77c761 ("net: dsa: bcm_sf2: communicate integrated PHY revision to PHY driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 10 Mar 2021 18:46:10 +0000 (10:46 -0800)]
net: dsa: b53: VLAN filtering is global to all users
The bcm_sf2 driver uses the b53 driver as a library but does not make
usre of the b53_setup() function, this made it fail to inherit the
vlan_filtering_is_global attribute. Fix this by moving the assignment to
b53_switch_alloc() which is used by bcm_sf2.
Fixes: 7228b23e68f7 ("net: dsa: b53: Let DSA handle mismatched VLAN filtering settings") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 8afa10cbe281 ("net_sched: red: Avoid illegal values") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Rafaล Miลecki [Wed, 10 Mar 2021 12:51:59 +0000 (13:51 +0100)]
net: dsa: bcm_sf2: use 2 Gbps IMP port link on BCM4908
BCM4908 uses 2 Gbps link between switch and the Ethernet interface.
Without this BCM4908 devices were able to achieve only 2 x ~895 Mb/s.
This allows handling e.g. NAT traffic with 940 Mb/s.
Signed-off-by: Rafaล Miลecki <rafal@milecki.pl> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Andrianov [Wed, 10 Mar 2021 08:10:46 +0000 (11:10 +0300)]
net: pxa168_eth: Fix a potential data race in pxa168_eth_remove
pxa168_eth_remove() firstly calls unregister_netdev(),
then cancels a timeout work. unregister_netdev() shuts down a device
interface and removes it from the kernel tables. If the timeout occurs
in parallel, the timeout work (pxa168_eth_tx_timeout_task) performs stop
and open of the device. It may lead to an inconsistent state and memory
leaks.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Pavel Andrianov <andrianov@ispras.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 10 Mar 2021 09:56:36 +0000 (01:56 -0800)]
macvlan: macvlan_count_rx() needs to be aware of preemption
macvlan_count_rx() can be called from process context, it is thus
necessary to disable preemption before calling u64_stats_update_begin()
syzbot was able to spot this on 32bit arch:
WARNING: CPU: 1 PID: 4632 at include/linux/seqlock.h:271 __seqprop_assert include/linux/seqlock.h:271 [inline]
WARNING: CPU: 1 PID: 4632 at include/linux/seqlock.h:271 __seqprop_assert.constprop.0+0xf0/0x11c include/linux/seqlock.h:269
Modules linked in:
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 4632 Comm: kworker/1:3 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: ARM-Versatile Express
Workqueue: events macvlan_process_broadcast
Backtrace:
[<82740468>] (dump_backtrace) from [<827406dc>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252)
r7:00000080 r6:60000093 r5:00000000 r4:8422a3c4
[<827406c4>] (show_stack) from [<82751b58>] (__dump_stack lib/dump_stack.c:79 [inline])
[<827406c4>] (show_stack) from [<82751b58>] (dump_stack+0xb8/0xe8 lib/dump_stack.c:120)
[<82751aa0>] (dump_stack) from [<82741270>] (panic+0x130/0x378 kernel/panic.c:231)
r7:830209b4 r6:84069ea4 r5:00000000 r4:844350d0
[<82741140>] (panic) from [<80244924>] (__warn+0xb0/0x164 kernel/panic.c:605)
r3:8404ec8c r2:00000000 r1:00000000 r0:830209b4
r7:0000010f
[<80244874>] (__warn) from [<82741520>] (warn_slowpath_fmt+0x68/0xd4 kernel/panic.c:628)
r7:81363f70 r6:0000010f r5:83018e50 r4:00000000
[<827414bc>] (warn_slowpath_fmt) from [<81363f70>] (__seqprop_assert include/linux/seqlock.h:271 [inline])
[<827414bc>] (warn_slowpath_fmt) from [<81363f70>] (__seqprop_assert.constprop.0+0xf0/0x11c include/linux/seqlock.h:269)
r8:5a109000 r7:0000000f r6:a568dac0 r5:89802300 r4:00000001
[<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (u64_stats_update_begin include/linux/u64_stats_sync.h:128 [inline])
[<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (macvlan_count_rx include/linux/if_macvlan.h:47 [inline])
[<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (macvlan_broadcast+0x154/0x26c drivers/net/macvlan.c:291)
r5:89802300 r4:8a927740
[<8136499c>] (macvlan_broadcast) from [<81365020>] (macvlan_process_broadcast+0x258/0x2d0 drivers/net/macvlan.c:317)
r10:81364f78 r9:8a86d000 r8:8a9c7e7c r7:8413aa5c r6:00000000 r5:00000000
r4:89802840
[<81364dc8>] (macvlan_process_broadcast) from [<802696a4>] (process_one_work+0x2d4/0x998 kernel/workqueue.c:2275)
r10:00000008 r9:8404ec98 r8:84367a02 r7:ddfe6400 r6:ddfe2d40 r5:898dac80
r4:8a86d43c
[<802693d0>] (process_one_work) from [<80269dcc>] (worker_thread+0x64/0x54c kernel/workqueue.c:2421)
r10:00000008 r9:8a9c6000 r8:84006d00 r7:ddfe2d78 r6:898dac94 r5:ddfe2d40
r4:898dac80
[<80269d68>] (worker_thread) from [<80271f40>] (kthread+0x184/0x1a4 kernel/kthread.c:292)
r10:85247e64 r9:898dac80 r8:80269d68 r7:00000000 r6:8a9c6000 r5:89a2ee40
r4:8a97bd00
[<80271dbc>] (kthread) from [<80200114>] (ret_from_fork+0x14/0x20 arch/arm/kernel/entry-common.S:158)
Exception stack(0x8a9c7fb0 to 0x8a9c7ff8)
Fixes: 412ca1550cbe ("macvlan: Move broadcasts into a work queue") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 10 Mar 2021 10:28:01 +0000 (12:28 +0200)]
drop_monitor: Perform cleanup upon probe registration failure
In the rare case that drop_monitor fails to register its probe on the
'napi_poll' tracepoint, it will not deactivate its hysteresis timer as
part of the error path. If the hysteresis timer was armed by the shortly
lived 'kfree_skb' probe and user space retries to initiate tracing, a
warning will be emitted for trying to initialize an active object [1].
Fix this by properly undoing all the operations that were done prior to
probe registration, in both software and hardware code paths.
Note that syzkaller managed to fail probe registration by injecting a
slab allocation failure [2].
[2]
FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 1
CPU: 1 PID: 8645 Comm: syz-executor.0 Not tainted 5.11.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
dump_stack+0xfa/0x151
should_fail.cold+0x5/0xa
should_failslab+0x5/0x10
__kmalloc+0x72/0x3f0
tracepoint_add_func+0x378/0x990
tracepoint_probe_register+0x9c/0xe0
net_dm_cmd_trace+0x7fc/0x1220
genl_family_rcv_msg_doit+0x228/0x320
genl_rcv_msg+0x328/0x580
netlink_rcv_skb+0x153/0x420
genl_rcv+0x24/0x40
netlink_unicast+0x533/0x7d0
netlink_sendmsg+0x856/0xd90
sock_sendmsg+0xcf/0x120
____sys_sendmsg+0x6e8/0x810
___sys_sendmsg+0xf3/0x170
__sys_sendmsg+0xe5/0x1b0
do_syscall_64+0x2d/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: 70c69274f354 ("drop_monitor: Initialize timer and work item upon tracing enable") Fixes: 8ee2267ad33e ("drop_monitor: Convert to using devlink tracepoint") Reported-by: syzbot+779559d6503f3a56213d@syzkaller.appspotmail.com Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The following pull-request contains BPF updates for your *net* tree.
We've added 8 non-merge commits during the last 5 day(s) which contain
a total of 11 files changed, 136 insertions(+), 17 deletions(-).
The main changes are:
1) Reject bogus use of vmlinux BTF as map/prog creation BTF, from Alexei Starovoitov.
2) Fix allocation failure splat in x86 JIT for large progs. Also fix overwriting
percpu cgroup storage from tracing programs when nested, from Yonghong Song.
3) Fix rx queue retrieval in XDP for multi-queue veth, from Maciej Fijalkowski.
4) Fix bpf_check_mtu() helper API before freeze to have mtu_len as custom skb/xdp
L3 input length, from Jesper Dangaard Brouer.
5) Fix inode_storage's lookup_elem return value upon having bad fd, from Tal Lossos.
6) Fix bpftool and libbpf cross-build on MacOS, from Georgi Valkov.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Wang [Wed, 10 Mar 2021 02:20:35 +0000 (18:20 -0800)]
ipv6: fix suspecious RCU usage warning
Syzbot reported the suspecious RCU usage in nexthop_fib6_nh() when
called from ipv6_route_seq_show(). The reason is ipv6_route_seq_start()
calls rcu_read_lock_bh(), while nexthop_fib6_nh() calls
rcu_dereference_rtnl().
The fix proposed is to add a variant of nexthop_fib6_nh() to use
rcu_dereference_bh_rtnl() for ipv6_route_seq_show().
The reported trace is as follows:
./include/net/nexthop.h:416 suspicious rcu_dereference_check() usage!
Fixes: f88d8ea67fbdb ("ipv6: Plumb support for nexthop object in a fib6_info") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Wei Wang <weiwan@google.com> Cc: David Ahern <dsahern@kernel.org> Cc: Ido Schimmel <idosch@idosch.org> Cc: Petr Machata <petrm@nvidia.com> Cc: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 10 Mar 2021 20:24:19 +0000 (12:24 -0800)]
Merge branch 'ip6ip6-crash'
Daniel Borkmann says:
====================
Fix ip6ip6 crash for collect_md skbs
Fix a NULL pointer deref panic I ran into for regular ip6ip6 tunnel devices
when collect_md populated skbs were redirected to them for xmit. See patches
for further details, thanks!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 10 Mar 2021 00:38:10 +0000 (01:38 +0100)]
net, bpf: Fix ip6ip6 crash with collect_md populated skbs
I ran into a crash where setting up a ip6ip6 tunnel device which was /not/
set to collect_md mode was receiving collect_md populated skbs for xmit.
The BPF prog was populating the skb via bpf_skb_set_tunnel_key() which is
assigning special metadata dst entry and then redirecting the skb to the
device, taking ip6_tnl_start_xmit() -> ipxip6_tnl_xmit() -> ip6_tnl_xmit()
and in the latter it performs a neigh lookup based on skb_dst(skb) where
we trigger a NULL pointer dereference on dst->ops->neigh_lookup() since
the md_dst_ops do not populate neigh_lookup callback with a fake handler.
Transform the md_dst_ops into generic dst_blackhole_ops that can also be
reused elsewhere when needed, and use them for the metadata dst entries as
callback ops.
Also, remove the dst_md_discard{,_out}() ops and rely on dst_discard{,_out}()
from dst_init() which free the skb the same way modulo the splat. Given we
will be able to recover just fine from there, avoid any potential splats
iff this gets ever triggered in future (or worse, panic on warns when set).
Fixes: f38a9eb1f77b ("dst: Metadata destinations") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 10 Mar 2021 00:38:09 +0000 (01:38 +0100)]
net: Consolidate common blackhole dst ops
Move generic blackhole dst ops to the core and use them from both
ipv4_dst_blackhole_ops and ip6_dst_blackhole_ops where possible. No
functional change otherwise. We need these also in other locations
and having to define them over and over again is not great.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Shay Drory [Thu, 11 Feb 2021 13:41:35 +0000 (15:41 +0200)]
net/mlx5: SF: Fix error flow of SFs allocation flow
When SF id is unavailable, code jumps to wrong label that accesses
sw id array outside of its range.
Hence, when SF id is not allocated, avoid accessing such array.
Maor Gottlieb [Wed, 3 Mar 2021 12:39:20 +0000 (14:39 +0200)]
RDMA/mlx5: Fix timestamp default mode
1. Don't set the ts_format bit to default when it reserved - device is
running in the old mode (free running).
2. XRC doesn't have a CQ therefore the ts format in the QP
context should be default / free running.
3. Set ts_format to WQ.
Fixes: 2fe8d4b87802 ("RDMA/mlx5: Fail QP creation if the device can not support the CQE TS") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Maor Gottlieb [Wed, 3 Mar 2021 12:36:16 +0000 (14:36 +0200)]
net/mlx5: Set QP timestamp mode to default
QPs which don't care from timestamp mode, should set the ts_format
to default, otherwise the QP creation could be failed if the timestamp
mode is not supported.
Fixes: 2fe8d4b87802 ("RDMA/mlx5: Fail QP creation if the device can not support the CQE TS") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Wed, 10 Feb 2021 08:33:13 +0000 (10:33 +0200)]
net/mlx5e: Fix error flow in change profile
Move priv memset from init to cleanup to avoid double priv cleanup
that can happen on profile change if also roolback fails.
Add missing cleanup flow in mlx5e_netdev_attach_profile().
Maor Dickman [Mon, 1 Mar 2021 11:07:08 +0000 (13:07 +0200)]
net/mlx5: Disable VF tunnel TX offload if ignore_flow_level isn't supported
VF tunnel TX traffic offload is adding flow which forward to flow
tables with lower level, which isn't support on all FW versions
and may cause firmware to fail with syndrome.
Fixed by enabling VF tunnel TX offload only if flow table capability
ignore_flow_level is enabled.
Roi Dayan [Tue, 2 Mar 2021 09:31:08 +0000 (11:31 +0200)]
net/mlx5e: Check correct ip_version in decapsulation route resolution
flow_attr->ip_version has the matching that should be done inner/outer.
When working with chains, decapsulation is done on chain0 and next chain
match on outer header which is the original inner which could be ipv4.
So in tunnel route resolution we cannot use that to know which ip version
we are at so save tun_ip_version when parsing the tunnel match and use
that.
Fixes: a508728a4c8b ("net/mlx5e: VF tunnel RX traffic offloading") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Aya Levin [Mon, 1 Mar 2021 15:54:21 +0000 (17:54 +0200)]
net/mlx5: Fix turn-off PPS command
Fix a bug of uninitialized pin index when trying to turn off PPS out.
Fixes: de19cd6cc977 ("net/mlx5: Move some PPS logic into helper functions") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Maor Dickman [Tue, 16 Feb 2021 11:39:18 +0000 (13:39 +0200)]
net/mlx5e: Don't match on Geneve options in case option masks are all zero
The cited change added offload support for Geneve options without verifying
the validity of the options masks, this caused offload of rules with match
on Geneve options with class,type and data masks which are zero to fail.
Fix by ignoring the match on Geneve options in case option masks are
all zero.
Fixes: 9272e3df3023 ("net/mlx5e: Geneve, Add support for encap/decap flows offload") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
net/mlx5e: Revert parameters on errors when changing PTP state without reset
Port timestamping for PTP can be enabled/disabled while the channels are
closed. In that case mlx5e_safe_switch_channels is skipped, and the
preactivate hook is called directly. However, if that hook returns an
error, the channel parameters must be reverted back to their old values.
This commit adds missing handling on this case.
net/mlx5e: When changing XDP program without reset, take refs for XSK RQs
Each RQ (including XSK RQs) takes a reference to the XDP program. When
an XDP program is attached or detached, the channels and queues are
recreated, however, there is a special flow for changing an active XDP
program to another one. In that flow, channels and queues stay alive,
but the refcounts of the old and new XDP programs are adjusted. This
flow didn't increment refcount by the number of active XSK RQs, and this
commit fixes it.
Aya Levin [Tue, 12 Jan 2021 11:34:29 +0000 (13:34 +0200)]
net/mlx5e: Set PTP channel pointer explicitly to NULL
When closing the PTP channel, set its pointer explicitly to NULL. PTP
channel is opened on demand, the code verify the pointer validity before
access. Nullify it when closing the PTP channel to avoid unexpected
behavior.
Tariq Toukan [Tue, 12 Jan 2021 11:21:17 +0000 (13:21 +0200)]
net/mlx5e: RX, Mind the MPWQE gaps when calculating offsets
Since cited patch, MLX5E_REQUIRED_WQE_MTTS is not a power of two.
Hence, usage of MLX5E_LOG_ALIGNED_MPWQE_PPW should be replaced,
as it lost some accuracy. Use the designated macro to calculate
the number of required MTTs.
This makes sure the solution in cited patch works properly.
While here, un-inline mlx5e_get_mpwqe_offset(), and remove the
unused RQ parameter.
Fixes: c3c9402373fe ("net/mlx5e: Add resiliency in Striding RQ mode for packets larger than MTU") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
1) Fix transmissions in dynamic SMPS mode in ath9k, from Felix Fietkau.
2) TX skb error handling fix in mt76 driver, also from Felix.
3) Fix BPF_FETCH atomic in x86 JIT, from Brendan Jackman.
4) Avoid double free of percpu pointers when freeing a cloned bpf prog.
From Cong Wang.
5) Use correct printf format for dma_addr_t in ath11k, from Geert
Uytterhoeven.
6) Fix resolve_btfids build with older toolchains, from Kun-Chuan
Hsieh.
7) Don't report truncated frames to mac80211 in mt76 driver, from
Lorenzop Bianconi.
8) Fix watcdog timeout on suspend/resume of stmmac, from Joakim Zhang.
9) mscc ocelot needs NET_DEVLINK selct in Kconfig, from Arnd Bergmann.
10) Fix sign comparison bug in TCP_ZEROCOPY_RECEIVE getsockopt(), from
Arjun Roy.
11) Ignore routes with deleted nexthop object in mlxsw, from Ido
Schimmel.
12) Need to undo tcp early demux lookup sometimes in nf_nat, from
Florian Westphal.
13) Fix gro aggregation for udp encaps with zero csum, from Daniel
Borkmann.
14) Make sure to always use imp*_ndo_send when necessaey, from Jason A.
Donenfeld.
15) Fix TRSCER masks in sh_eth driver from Sergey Shtylyov.
16) prevent overly huge skb allocationsd in qrtr, from Pavel Skripkin.
17) Prevent rx ring copnsumer index loss of sync in enetc, from Vladimir
Oltean.
18) Make sure textsearch copntrol block is large enough, from Wilem de
Bruijn.
19) Revert MAC changes to r8152 leading to instability, from Hates Wang.
20) Advance iov in 9p even for empty reads, from Jissheng Zhang.
21) Double hook unregister in nftables, from PabloNeira Ayuso.
22) Fix memleak in ixgbe, fropm Dinghao Liu.
23) Avoid dups in pkt scheduler class dumps, from Maximilian Heyne.
24) Various mptcp fixes from Florian Westphal, Paolo Abeni, and Geliang
Tang.
25) Fix DOI refcount bugs in cipso, from Paul Moore.
26) One too many irqsave in ibmvnic, from Junlin Yang.
27) Fix infinite loop with MPLS gso segmenting via virtio_net, from
Balazs Nemeth.
* git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net: (164 commits)
s390/qeth: fix notification for pending buffers during teardown
s390/qeth: schedule TX NAPI on QAOB completion
s390/qeth: improve completion of pending TX buffers
s390/qeth: fix memory leak after failed TX Buffer allocation
net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
net: check if protocol extracted by virtio_net_hdr_set_proto is correct
net: dsa: xrs700x: check if partner is same as port in hsr join
net: lapbether: Remove netif_start_queue / netif_stop_queue
atm: idt77252: fix null-ptr-dereference
atm: uPD98402: fix incorrect allocation
atm: fix a typo in the struct description
net: qrtr: fix error return code of qrtr_sendmsg()
mptcp: fix length of ADD_ADDR with port sub-option
net: bonding: fix error return code of bond_neigh_init()
net: enetc: allow hardware timestamping on TX queues with tc-etf enabled
net: enetc: set MAC RX FIFO to recommended value
net: davicom: Use platform_get_irq_optional()
net: davicom: Fix regulator not turned off on driver removal
net: davicom: Fix regulator not turned off on failed probe
net: dsa: fix switchdev objects on bridge master mistakenly being applied on ports
...
After my patch there is CONFIG_ATA defined twice.
Remove the duplicate one.
Same problem for CONFIG_HAPPYMEAL, except I added as builtin for boot
test with NFS.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: a57cdeb369ef ("sparc: sparc64_defconfig: add necessary configs for qemu") Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Rob Gardner [Mon, 1 Mar 2021 05:48:16 +0000 (22:48 -0700)]
sparc64: Fix opcode filtering in handling of no fault loads
is_no_fault_exception() has two bugs which were discovered via random
opcode testing with stress-ng. Both are caused by improper filtering
of opcodes.
The first bug can be triggered by a floating point store with a no-fault
ASI, for instance "sta %f0, [%g0] #ASI_PNF", opcode C1A01040.
The code first tests op3[5] (0x1000000), which denotes a floating
point instruction, and then tests op3[2] (0x200000), which denotes a
store instruction. But these bits are not mutually exclusive, and the
above mentioned opcode has both bits set. The intent is to filter out
stores, so the test for stores must be done first in order to have
any effect.
The second bug can be triggered by a floating point load with one of
the invalid ASI values 0x8e or 0x8f, which pass this check in
is_no_fault_exception():
if ((asi & 0xf2) == ASI_PNF)
An example instruction is "ldqa [%l7 + %o7] #ASI 0x8f, %f38",
opcode CF95D1EF. Asi values greater than 0x8b (ASI_SNFL) are fatal
in handle_ldf_stq(), and is_no_fault_exception() must not allow these
invalid asi values to make it that far.
In both of these cases, handle_ldf_stq() reacts by calling
sun4v_data_access_exception() or spitfire_data_access_exception(),
which call is_no_fault_exception() and results in an infinite
recursion.
Signed-off-by: Rob Gardner <rob.gardner@oracle.com> Tested-by: Anatoly Pugachev <matorola@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 10 Mar 2021 00:14:54 +0000 (16:14 -0800)]
Merge branch 's390-qeth-fixes'
Julian Wiedmann says:
====================
s390/qeth: fixes 2021-03-09
please apply the following patch series to netdev's net tree.
This brings one fix for a memleak in an error path of the setup code.
Also several fixes for dealing with pending TX buffers - two for old
bugs in their completion handling, and one recent regression in a
teardown path.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 9 Mar 2021 16:52:21 +0000 (17:52 +0100)]
s390/qeth: fix notification for pending buffers during teardown
The cited commit reworked the state machine for pending TX buffers.
In qeth_iqd_tx_complete() it turned PENDING into a transient state, and
uses NEED_QAOB for buffers that get parked while waiting for their QAOB
completion.
But it missed to adjust the check in qeth_tx_complete_buf(). So if
qeth_tx_complete_pending_bufs() is called during teardown to drain
the parked TX buffers, we no longer raise a notification for af_iucv.
Instead of updating the checked state, just move this code into
qeth_tx_complete_pending_bufs() itself. This also gets rid of the
special-case in the common TX completion path.
Fixes: 8908f36d20d8 ("s390/qeth: fix af_iucv notification race") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 9 Mar 2021 16:52:20 +0000 (17:52 +0100)]
s390/qeth: schedule TX NAPI on QAOB completion
When a QAOB notifies us that a pending TX buffer has been delivered, the
actual TX completion processing by qeth_tx_complete_pending_bufs()
is done within the context of a TX NAPI instance. We shouldn't rely on
this instance being scheduled by some other TX event, but just do it
ourselves.
qeth_qdio_handle_aob() is called from qeth_poll(), ie. our main NAPI
instance. To avoid touching the TX queue's NAPI instance
before/after it is (un-)registered, reorder the code in qeth_open()
and qeth_stop() accordingly.
Fixes: 0da9581ddb0f ("qeth: exploit asynchronous delivery of storage blocks") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 9 Mar 2021 16:52:19 +0000 (17:52 +0100)]
s390/qeth: improve completion of pending TX buffers
The current design attaches a pending TX buffer to a custom
single-linked list, which is anchored at the buffer's slot on the
TX ring. The buffer is then checked for final completion whenever
this slot is processed during a subsequent TX NAPI poll cycle.
But if there's insufficient traffic on the ring, we might never make
enough progress to get back to this ring slot and discover the pending
buffer's final TX completion. In particular if this missing TX
completion blocks the application from sending further traffic.
So convert the custom single-linked list code to a per-queue list_head,
and scan this list on every TX NAPI cycle.
Fixes: 0da9581ddb0f ("qeth: exploit asynchronous delivery of storage blocks") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 9 Mar 2021 16:52:18 +0000 (17:52 +0100)]
s390/qeth: fix memory leak after failed TX Buffer allocation
When qeth_alloc_qdio_queues() fails to allocate one of the buffers that
back an Output Queue, the 'out_freeoutqbufs' path will free all
previously allocated buffers for this queue. But it misses to free the
half-finished queue struct itself.
Move the buffer allocation into qeth_alloc_output_queue(), and deal with
such errors internally.
Fixes: 0da9581ddb0f ("qeth: exploit asynchronous delivery of storage blocks") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 10 Mar 2021 00:12:20 +0000 (16:12 -0800)]
Merge branch 'virtio_net-infinite-loop'
Balazs Nemeth says:
====================
net: prevent infinite loop caused by incorrect proto from virtio_net_hdr_set_proto
These patches prevent an infinite loop for gso packets with a protocol
from virtio net hdr that doesn't match the protocol in the packet.
Note that packets coming from a device without
header_ops->parse_protocol being implemented will not be caught by
the check in virtio_net_hdr_to_skb, but the infinite loop will still
be prevented by the check in the gso layer.
Changes from v2 to v3:
- Remove unused *eth.
- Use MPLS_HLEN to also check if the MPLS header length is a multiple
of four.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Balazs Nemeth [Tue, 9 Mar 2021 11:31:01 +0000 (12:31 +0100)]
net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
A packet with skb_inner_network_header(skb) == skb_network_header(skb)
and ETH_P_MPLS_UC will prevent mpls_gso_segment from pulling any headers
from the packet. Subsequently, the call to skb_mac_gso_segment will
again call mpls_gso_segment with the same packet leading to an infinite
loop. In addition, ensure that the header length is a multiple of four,
which should hold irrespective of the number of stacked labels.
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Balazs Nemeth [Tue, 9 Mar 2021 11:31:00 +0000 (12:31 +0100)]
net: check if protocol extracted by virtio_net_hdr_set_proto is correct
For gso packets, virtio_net_hdr_set_proto sets the protocol (if it isn't
set) based on the type in the virtio net hdr, but the skb could contain
anything since it could come from packet_snd through a raw socket. If
there is a mismatch between what virtio_net_hdr_set_proto sets and
the actual protocol, then the skb could be handled incorrectly later
on.
An example where this poses an issue is with the subsequent call to
skb_flow_dissect_flow_keys_basic which relies on skb->protocol being set
correctly. A specially crafted packet could fool
skb_flow_dissect_flow_keys_basic preventing EINVAL to be returned.
Avoid blindly trusting the information provided by the virtio net header
by checking that the protocol in the packet actually matches the
protocol set by virtio_net_hdr_set_proto. Note that since the protocol
is only checked if skb->dev implements header_ops->parse_protocol,
packets from devices without the implementation are not checked at this
stage.
Fixes: 9274124f023b ("net: stricter validation of untrusted gso packets") Signed-off-by: Balazs Nemeth <bnemeth@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: xrs700x: check if partner is same as port in hsr join
Don't assign dp to partner if it's the same port that xrs700x_hsr_join
was called with. The partner port is supposed to be the other port in
the HSR/PRP redundant pair not the same port. This fixes an issue
observed in testing where forwarding between redundant HSR ports on this
switch didn't work depending on the order the ports were added to the
hsr device.
Fixes: bd62e6f5e6a9 ("net: dsa: xrs700x: add HSR offloading support") Signed-off-by: George McCollister <george.mccollister@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yonghong Song [Tue, 9 Mar 2021 01:56:47 +0000 (17:56 -0800)]
bpf, x86: Use kvmalloc_array instead kmalloc_array in bpf_jit_comp
x86 bpf_jit_comp.c used kmalloc_array to store jited addresses
for each bpf insn. With a large bpf program, we have see the
following allocation failures in our production server:
Yonghong Song [Tue, 9 Mar 2021 18:50:28 +0000 (10:50 -0800)]
bpf: Don't do bpf_cgroup_storage_set() for kuprobe/tp programs
For kuprobe and tracepoint bpf programs, kernel calls
trace_call_bpf() which calls BPF_PROG_RUN_ARRAY_CHECK()
to run the program array. Currently, BPF_PROG_RUN_ARRAY_CHECK()
also calls bpf_cgroup_storage_set() to set percpu
cgroup local storage with NULL value. This is
due to Commit 394e40a29788 ("bpf: extend bpf_prog_array to store
pointers to the cgroup storage") which modified
__BPF_PROG_RUN_ARRAY() to call bpf_cgroup_storage_set()
and this macro is also used by BPF_PROG_RUN_ARRAY_CHECK().
kuprobe and tracepoint programs are not allowed to call
bpf_get_local_storage() helper hence does not
access percpu cgroup local storage. Let us
change BPF_PROG_RUN_ARRAY_CHECK() not to
modify percpu cgroup local storage.
The issue is observed when I tried to debug [1] where
percpu data is overwritten due to
preempt_disable -> migration_disable
change. This patch does not completely fix the above issue,
which will be addressed separately, e.g., multiple cgroup
prog runs may preempt each other. But it does fix
any potential issue caused by tracing program
overwriting percpu cgroup storage:
- in a busy system, a tracing program is to run between
bpf_cgroup_storage_set() and the cgroup prog run.
- a kprobe program is triggered by a helper in cgroup prog
before bpf_get_local_storage() is called.
Fixes: 394e40a29788 ("bpf: extend bpf_prog_array to store pointers to the cgroup storage") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Roman Gushchin <guro@fb.com> Link: https://lore.kernel.org/bpf/20210309185028.3763817-1-yhs@fb.com
Linus Torvalds [Tue, 9 Mar 2021 20:02:18 +0000 (12:02 -0800)]
Merge tag 'gpio-fixes-for-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
"A bunch of fixes for the GPIO subsystem. We have two regressions in
the core code spotted right after the merge window, a series of fixes
for ACPI GPIO and a subsequent fix for a related regression in
gpio-pca953x + a minor tweak in .gitignore and a rework of handling of
the gpio-line-names to remedy a regression in stm32mp151.
Summary:
- fix two regressions in core GPIO subsystem code: one NULL-pointer
dereference and one list corruption
- read GPIO line names from fwnode instead of using the generic
device properties to fix a regression on stm32mp151
- fixes to ACPI GPIO and gpio-pca953x to handle a regression in IRQ
handling on Intel Galileo
- update .gitignore in GPIO selftests"
* tag 'gpio-fixes-for-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: Read "gpio-line-names" from a firmware node
gpio: pca953x: Set IRQ type when handle Intel Galileo Gen 2
gpiolib: acpi: Allow to find GpioInt() resource by name and index
gpiolib: acpi: Add ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk
gpiolib: acpi: Add missing IRQF_ONESHOT
gpio: fix gpio-device list corruption
gpio: fix NULL-deref-on-deregistration regression
selftests: gpio: update .gitignore
Linus Torvalds [Tue, 9 Mar 2021 19:56:41 +0000 (11:56 -0800)]
Merge tag 'mips-fixes_5.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- fixes for boot breakage because of misaligned FDTs
- fix for overwritten exception handlers
- enable MIPS optimized crypto for all MIPS CPUs to improve wireguard
performance
* tag 'mips-fixes_5.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: kernel: Reserve exception base early to prevent corruption
MIPS: vmlinux.lds.S: align raw appended dtb to 8 bytes
crypto: mips/poly1305 - enable for all MIPS processors
MIPS: boot/compressed: Copy DTB to aligned address
For the devices in this driver, the default qdisc is "noqueue",
because their "tx_queue_len" is 0.
In function "__dev_queue_xmit" in "net/core/dev.c", devices with the
"noqueue" qdisc are specially handled. Packets are transmitted without
being queued after a "dev->flags & IFF_UP" check. However, it's possible
that even if this check succeeds, "ops->ndo_stop" may still have already
been called. This is because in "__dev_close_many", "ops->ndo_stop" is
called before clearing the "IFF_UP" flag.
If we call "netif_stop_queue" in "ops->ndo_stop", then it's possible in
"__dev_queue_xmit", it sees the "IFF_UP" flag is present, and then it
checks "netif_xmit_stopped" and finds that the queue is already stopped.
In this case, it will complain that:
"Virtual device ... asks to queue packet!"
To prevent "__dev_queue_xmit" from generating this complaint, we should
not call "netif_stop_queue" in "ops->ndo_stop".
We also don't need to call "netif_start_queue" in "ops->ndo_open",
because after a netdev is allocated and registered, the
"__QUEUE_STATE_DRV_XOFF" flag is initially not set, so there is no need
to call "netif_start_queue" to clear it.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Xie He <xie.he.0141@gmail.com> Acked-by: Martin Schiller <ms@dev.tdt.de> Signed-off-by: David S. Miller <davem@davemloft.net>
MIPS: kernel: Reserve exception base early to prevent corruption
BMIPS is one of the few platforms that do change the exception base.
After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
with kernel_end") we started seeing BMIPS boards fail to boot with the
built-in FDT being corrupted.
Before the cited commit, early allocations would be in the [kernel_end,
RAM_END] range, but after commit they would be within [RAM_START +
PAGE_SIZE, RAM_END].
The custom exception base handler that is installed by
bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
memory region allocated by unflatten_and_copy_device_tree() thus
corrupting the FDT used by the kernel.
To fix this, we need to perform an early reservation of the custom
exception space. Additional we reserve the first 4k (1k for R3k) for
either normal exception vector space (legacy CPUs) or special vectors
like cache exceptions.
Huge thanks to Serge for analysing and proposing a solution to this
issue.
Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end") Reported-by: Kamal Dasu <kdasu.kdev@gmail.com> Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Pull sparc updates from David Miller:
"Just some more random bits from Al, including a conversion over to
generic extables"
* git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc:
sparc32: take ->thread.flags out
sparc32: get rid of fake_swapper_regs
sparc64: get rid of fake_swapper_regs
sparc32: switch to generic extables
sparc32: switch copy_user.S away from range exception table entries
sparc32: get rid of range exception table entries in checksum_32.S
sparc32: switch __bzero() away from range exception table entries
sparc32: kill lookup_fault()
sparc32: don't bother with lookup_fault() in __bzero()
Tong Zhang [Mon, 8 Mar 2021 03:25:30 +0000 (22:25 -0500)]
atm: idt77252: fix null-ptr-dereference
this one is similar to the phy_data allocation fix in uPD98402, the
driver allocate the idt77105_priv and store to dev_data but later
dereference using dev->dev_data, which will cause null-ptr-dereference.
fix this issue by changing dev_data to phy_data so that PRIV(dev) can
work correctly.
Signed-off-by: Tong Zhang <ztong0001@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tong Zhang [Mon, 8 Mar 2021 03:25:29 +0000 (22:25 -0500)]
atm: uPD98402: fix incorrect allocation
dev->dev_data is set in zatm.c, calling zatm_start() will overwrite this
dev->dev_data in uPD98402_start() and a subsequent PRIV(dev)->lock
(i.e dev->phy_data->lock) will result in a null-ptr-dereference.
I believe this is a typo and what it actually want to do is to allocate
phy_data instead of dev_data.
Signed-off-by: Tong Zhang <ztong0001@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jia-Ju Bai [Mon, 8 Mar 2021 09:13:55 +0000 (01:13 -0800)]
net: qrtr: fix error return code of qrtr_sendmsg()
When sock_alloc_send_skb() returns NULL to skb, no error return code of
qrtr_sendmsg() is assigned.
To fix this bug, rc is assigned with -ENOMEM in this case.
Fixes: 194ccc88297a ("net: qrtr: Support decoding incoming v2 packets") Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Mon, 8 Mar 2021 09:00:04 +0000 (10:00 +0100)]
mptcp: fix length of ADD_ADDR with port sub-option
in current Linux, MPTCP peers advertising endpoints with port numbers use
a sub-option length that wrongly accounts for the trailing TCP NOP. Also,
receivers will only process incoming ADD_ADDR with port having such wrong
sub-option length. Fix this, making ADD_ADDR compliant to RFC8684 ยง3.4.1.
this can be verified running tcpdump on the kselftests artifacts:
patched kernel:
[root@bottarga mptcp]# tcpdump -tnnr patched.pcap | grep add-addr
reading from file patched.pcap, link-type LINUX_SLL (Linux cooked v1), snapshot length 65535
IP 10.0.1.1.10000 > 10.0.1.2.38178: Flags [.], ack 101, win 509, options [nop,nop,TS val 3728873902 ecr 2732713192,mptcp add-addr v1 id 1 10.0.2.1:10100 hmac 0xbccdfcbe59292a1f,nop,nop], length 0
IP 10.0.1.2.38178 > 10.0.1.1.10000: Flags [.], ack 101, win 502, options [nop,nop,TS val 2732713195 ecr 3728873902,mptcp add-addr v1-echo id 1 10.0.2.1:10100,nop,nop], length 0
Fixes: 22fb85ffaefb ("mptcp: add port support for ADD_ADDR suboption writing") CC: stable@vger.kernel.org # 5.11+ Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-and-tested-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jia-Ju Bai [Mon, 8 Mar 2021 03:11:02 +0000 (19:11 -0800)]
net: bonding: fix error return code of bond_neigh_init()
When slave is NULL or slave_ops->ndo_neigh_setup is NULL, no error
return code of bond_neigh_init() is assigned.
To fix this bug, ret is assigned with -EINVAL in these cases.
Fixes: 9e99bfefdbce ("bonding: fix bond_neigh_init()") Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sun, 7 Mar 2021 13:23:39 +0000 (15:23 +0200)]
net: enetc: allow hardware timestamping on TX queues with tc-etf enabled
The txtime is passed to the driver in skb->skb_mstamp_ns, which is
actually in a union with skb->tstamp (the place where software
timestamps are kept).
Since commit b50a5c70ffa4 ("net: allow simultaneous SW and HW transmit
timestamping"), __sock_recv_timestamp has some logic for making sure
that the two calls to skb_tstamp_tx:
skb_tx_timestamp(skb) # Software timestamp in the driver
-> skb_tstamp_tx(skb, NULL)
and
skb_tstamp_tx(skb, &shhwtstamps) # Hardware timestamp in the driver
will both do the right thing and in a race-free manner, meaning that
skb_tx_timestamp will deliver a cmsg with the software timestamp only,
and skb_tstamp_tx with a non-NULL hwtstamps argument will deliver a cmsg
with the hardware timestamp only.
Why are races even possible? Well, because although the software timestamp
skb->tstamp is private per skb, the hardware timestamp skb_hwtstamps(skb)
lives in skb_shinfo(skb), an area which is shared between skbs and their
clones. And skb_tstamp_tx works by cloning the packets when timestamping
them, therefore attempting to perform hardware timestamping on an skb's
clone will also change the hardware timestamp of the original skb. And
the original skb might have been yet again cloned for software
timestamping, at an earlier stage.
So the logic in __sock_recv_timestamp can't be as simple as saying
"does this skb have a hardware timestamp? if yes I'll send the hardware
timestamp to the socket, otherwise I'll send the software timestamp",
precisely because the hardware timestamp is shared.
Instead, it's quite the other way around: __sock_recv_timestamp says
"does this skb have a software timestamp? if yes, I'll send the software
timestamp, otherwise the hardware one". This works because the software
timestamp is not shared with clones.
But that means we have a problem when we attempt hardware timestamping
with skbs that don't have the skb->tstamp == 0. __sock_recv_timestamp
will say "oh, yeah, this must be some sort of odd clone" and will not
deliver the hardware timestamp to the socket. And this is exactly what
is happening when we have txtime enabled on the socket: as mentioned,
that is put in a union with skb->tstamp, so it is quite easy to mistake
it.
Do what other drivers do (intel igb/igc) and write zero to skb->tstamp
before taking the hardware timestamp. It's of no use to us now (we're
already on the TX confirmation path).
Fixes: 0d08c9ec7d6e ("enetc: add support time specific departure base on the qos etf") Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Marginean [Sun, 7 Mar 2021 13:23:38 +0000 (15:23 +0200)]
net: enetc: set MAC RX FIFO to recommended value
On LS1028A, the MAC RX FIFO defaults to the value 2, which is too high
and may lead to RX lock-up under traffic at a rate higher than 6 Gbps.
Set it to 1 instead, as recommended by the hardware design team and by
later versions of the ENETC block guide.
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Jason Liu <jason.hui.liu@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>