Aya Levin [Sun, 27 Dec 2020 14:33:19 +0000 (16:33 +0200)]
net/mlx5e: ethtool, Fix restriction of autoneg with 56G
Prior to this patch, configuring speed to 50G with autoneg off over
devices supporting 50G per lane failed.
Support for 50G per lane introduced a new set of link-modes, on which
driver always performed a speed validation as if only legacy link-modes
were configured. Fix driver speed validation to force setting autoneg
over 56G only if in legacy link-mode.
Fixes: 3d7cadae51f1 ("net/mlx5e: ethtool, Fix analysis of speed setting") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Maor Dickman [Mon, 14 Dec 2020 11:53:03 +0000 (13:53 +0200)]
net/mlx5e: In skb build skip setting mark in switchdev mode
sop_drop_qpn field in the cqe is used by two features, in SWITCHDEV mode
to restore the chain id in case of a miss and in LEGACY mode to support
skbedit mark action. In build RX skb, the skb mark field is set regardless
of the configured mode which cause a corruption of the mark field in case
of switchdev mode.
Fix by overriding the mark value back to 0 in the representor tc update
skb flow.
Fixes: 8f1e0b97cc70 ("net/mlx5: E-Switch, Mark miss packets with new chain id mapping") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Alaa Hleihel [Mon, 4 Jan 2021 10:54:40 +0000 (12:54 +0200)]
net/mlx5: E-Switch, fix changing vf VLANID
Adding vf VLANID for the first time, or after having cleared previously
defined VLANID works fine, however, attempting to change an existing vf
VLANID clears the rules on the firmware, but does not add new rules for
the new vf VLANID.
Fix this by changing the logic in function esw_acl_egress_lgcy_setup()
so that it will always configure egress rules.
Moshe Shemesh [Fri, 13 Nov 2020 04:06:28 +0000 (06:06 +0200)]
net/mlx5e: Fix SWP offsets when vlan inserted by driver
In case WQE includes inline header the vlan is inserted by driver even
if vlan offload is set. On geneve over vlan interface where software
parser is used the SWP offsets should be updated according to the added
vlan.
Oz Shlomo [Mon, 7 Dec 2020 08:15:18 +0000 (08:15 +0000)]
net/mlx5e: CT: Use per flow counter when CT flow accounting is enabled
Connection counters may be shared for both directions when the counter
is used for connection aging purposes. However, if TC flow
accounting is enabled then a unique counter is required per direction.
Instantiate a unique counter per direction if the conntrack accounting
extension is enabled. Use a shared counter when the connection accounting
extension is disabled.
Fixes: 1edae2335adf ("net/mlx5e: CT: Use the same counter for both directions") Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Aya Levin [Tue, 24 Nov 2020 20:16:23 +0000 (22:16 +0200)]
net/mlx5e: Add missing capability check for uplink follow
Expose firmware indication that it supports setting eswitch uplink state
to follow (follow the physical link). Condition setting the eswitch
uplink admin-state with this capability bit. Older FW may not support
the uplink state setting.
Fixes: 7d0314b11cdd ("net/mlx5e: Modify uplink state on interface up/down") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Fixes: 9b412cc35f00 ("net/mlx5e: Add LAG warning if bond slave is not lag master") Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Sean Tranchetti [Wed, 6 Jan 2021 00:22:26 +0000 (16:22 -0800)]
tools: selftests: add test for changing routes with PTMU exceptions
Adds new 2 new tests to the PTMU script: pmtu_ipv4/6_route_change.
These tests explicitly test for a recently discovered problem in the
IPv6 routing framework where PMTU exceptions were not properly released
when replacing a route via "ip route change ...".
After creating PMTU exceptions, the route from the device A to R1 will be
replaced with a new route, then device A will be deleted. If the PMTU
exceptions were properly cleaned up by the kernel, this device deletion
will succeed. Otherwise, the unregistration of the device will stall, and
messages such as the following will be logged in dmesg:
unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 4
Sean Tranchetti [Wed, 6 Jan 2021 00:22:25 +0000 (16:22 -0800)]
net: ipv6: fib: flush exceptions when purging route
Route removal is handled by two code paths. The main removal path is via
fib6_del_route() which will handle purging any PMTU exceptions from the
cache, removing all per-cpu copies of the DST entry used by the route, and
releasing the fib6_info struct.
The second removal location is during fib6_add_rt2node() during a route
replacement operation. This path also calls fib6_purge_rt() to handle
cleaning up the per-cpu copies of the DST entries and releasing the
fib6_info associated with the older route, but it does not flush any PMTU
exceptions that the older route had. Since the older route is removed from
the tree during the replacement, we lose any way of accessing it again.
As these lingering DSTs and the fib6_info struct are holding references to
the underlying netdevice struct as well, unregistering that device from the
kernel can never complete.
Jakub Kicinski [Thu, 7 Jan 2021 19:08:08 +0000 (11:08 -0800)]
Merge tag 'linux-can-fixes-for-5.11-20210107' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2021-01-07
The first patch is by me for the m_can driver and removes an erroneous
m_can_clk_stop() from the driver's unregister function.
The second patch targets the tcan4x5x driver, is by me, and fixes the bit
timing constant parameters.
The next two patches are by me, target the mcp251xfd driver, and fix a race
condition in the optimized TEF path (which was added in net-next for v5.11).
The similar code in the RX path is changed to look the same, although it
doesn't suffer from the race condition.
A patch by Lad Prabhakar updates the description and help text for the rcar CAN
driver to reflect all supported SoCs.
In the last patch Sriram Dash transfers the maintainership of the m_can driver
to Pankaj Sharma.
* tag 'linux-can-fixes-for-5.11-20210107' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
MAINTAINERS: Update MCAN MMIO device driver maintainer
can: rcar: Kconfig: update help description for CAN_RCAR config
can: mcp251xfd: mcp251xfd_handle_rxif_ring(): first increment RX tail pointer in HW, then in driver
can: mcp251xfd: mcp251xfd_handle_tefif(): fix TEF vs. TX race condition
can: tcan4x5x: fix bittiming const, use common bittiming from m_can driver
can: m_can: m_can_class_unregister(): remove erroneous m_can_clk_stop()
====================
can: mcp251xfd: mcp251xfd_handle_rxif_ring(): first increment RX tail pointer in HW, then in driver
The previous patch fixes a TEF vs. TX race condition, by first updating the TEF
tail pointer in hardware, and then updating the driver internal pointer.
The same pattern exists in the RX-path, too. This should be no problem, as the
driver accesses the RX-FIFO from the interrupt handler only, thus the access is
properly serialized. Fix the order here, too, so that the TEF- and RX-path look
similar.
can: mcp251xfd: mcp251xfd_handle_tefif(): fix TEF vs. TX race condition
The mcp251xfd driver uses a TX FIFO for sending CAN frames and a TX Event FIFO
(TEF) for completed TX-requests.
The TEF event handling in the mcp251xfd_handle_tefif() function has a race
condition. It first increments the tx-ring's tail counter to signal that
there's room in the TX and TEF FIFO, then it increments the TEF FIFO in
hardware.
A running mcp251xfd_start_xmit() on a different CPU might not stop the txqueue
(as the tx-ring still shows free space). The next mcp251xfd_start_xmit() will
push a message into the chip and the TX complete event might overflow the TEF
FIFO.
can: tcan4x5x: fix bittiming const, use common bittiming from m_can driver
According to the TCAN4550 datasheet "SLLSF91 - DECEMBER 2018" the tcan4x5x has
the same bittiming constants as a m_can revision 3.2.x/3.3.0.
The tcan4x5x chip I'm using identifies itself as m_can revision 3.2.1, so
remove the tcan4x5x specific bittiming values and rely on the values in the
m_can driver, which are selected according to core revision.
Fixes: 5443c226ba91 ("can: tcan4x5x: Add tcan4x5x driver to the kernel") Cc: Dan Murphy <dmurphy@ti.com> Reviewed-by: Sean Nyekjaer <sean@geanix.com> Link: https://lore.kernel.org/r/20201215103238.524029-3-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Randy Dunlap [Wed, 6 Jan 2021 04:25:31 +0000 (20:25 -0800)]
ptp: ptp_ines: prevent build when HAS_IOMEM is not set
ptp_ines.c uses devm_platform_ioremap_resource(), which is only
built/available when CONFIG_HAS_IOMEM is enabled.
CONFIG_HAS_IOMEM is not enabled for arch/s390/, so builds on S390
have a build error:
s390-linux-ld: drivers/ptp/ptp_ines.o: in function `ines_ptp_ctrl_probe':
ptp_ines.c:(.text+0x17e6): undefined reference to `devm_platform_ioremap_resource'
Prevent builds of ptp_ines.c when HAS_IOMEM is not set.
Fixes: bad1eaa6ac31 ("ptp: Add a driver for InES time stamping IP core.") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com>
Link: lore.kernel.org/r/202101031125.ZEFCUiKi-lkp@intel.com Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://lore.kernel.org/r/20210106042531.1351-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Randy Dunlap [Wed, 6 Jan 2021 02:18:15 +0000 (18:18 -0800)]
net: dsa: fix led_classdev build errors
Fix build errors when LEDS_CLASS=m and NET_DSA_HIRSCHMANN_HELLCREEK=y.
This limits the latter to =m when LEDS_CLASS=m.
microblaze-linux-ld: drivers/net/dsa/hirschmann/hellcreek_ptp.o: in function `hellcreek_ptp_setup':
(.text+0xf80): undefined reference to `led_classdev_register_ext'
microblaze-linux-ld: (.text+0xf94): undefined reference to `led_classdev_register_ext'
microblaze-linux-ld: drivers/net/dsa/hirschmann/hellcreek_ptp.o: in function `hellcreek_ptp_free':
(.text+0x1018): undefined reference to `led_classdev_unregister'
microblaze-linux-ld: (.text+0x1024): undefined reference to `led_classdev_unregister'
Qinglang Miao [Tue, 5 Jan 2021 05:57:54 +0000 (13:57 +0800)]
net: qrtr: fix null-ptr-deref in qrtr_ns_remove
A null-ptr-deref bug is reported by Hulk Robot like this:
--------------
KASAN: null-ptr-deref in range [0x0000000000000128-0x000000000000012f]
Call Trace:
qrtr_ns_remove+0x22/0x40 [ns]
qrtr_proto_fini+0xa/0x31 [qrtr]
__x64_sys_delete_module+0x337/0x4e0
do_syscall_64+0x34/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x468ded
--------------
When qrtr_ns_init fails in qrtr_proto_init, qrtr_ns_remove which would
be called later on would raise a null-ptr-deref because qrtr_ns.workqueue
has been destroyed.
Fix it by making qrtr_ns_init have a return value and adding a check in
qrtr_proto_init.
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: cdc_ncm: correct overhead in delayed_ndp_size
Aligning to tx_ndp_modulus is not sufficient because the next align
call can be cdc_ncm_align_tail, which can add up to ctx->tx_modulus +
ctx->tx_remainder - 1 bytes. This used to lead to occasional crashes
on a Huawei 909s-120 LTE module as follows:
- the condition marked /* if there is a remaining skb [...] */ is true
so the swaps happen
- skb_out is set from ctx->tx_curr_skb
- skb_out->len is exactly 0x3f52
- ctx->tx_curr_size is 0x4000 and delayed_ndp_size is 0xac
(note that the sum of skb_out->len and delayed_ndp_size is 0x3ffe)
- the for loop over n is executed once
- the cdc_ncm_align_tail call marked /* align beginning of next frame */
increases skb_out->len to 0x3f56 (the sum is now 0x4002)
- the condition marked /* check if we had enough room left [...] */ is
false so we break out of the loop
- the condition marked /* If requested, put NDP at end of frame. */ is
true so the NDP is written into skb_out
- now skb_out->len is 0x4002, so padding_count is minus two interpreted
as an unsigned number, which is used as the length argument to memset,
leading to a crash with various symptoms but usually including
The cdc_ncm_align_tail call first aligns on a ctx->tx_modulus
boundary (adding at most ctx->tx_modulus-1 bytes), then adds
ctx->tx_remainder bytes. Alternatively, the next alignment call can
occur in cdc_ncm_ndp16 or cdc_ncm_ndp32, in which case at most
ctx->tx_ndp_modulus-1 bytes are added.
A similar problem has occurred before, and the code is nontrivial to
reason about, so add a guard before the crashing call. By that time it
is too late to prevent any memory corruption (we'll have written past
the end of the buffer already) but we can at least try to get a warning
written into an on-disk log by avoiding the hard crash caused by padding
past the buffer with a huge number of zeros.
Signed-off-by: Jouni K. Seppänen <jks@iki.fi> Fixes: 4a0e3e989d66 ("cdc_ncm: Add support for moving NDP to end of NCM frame") Link: https://bugzilla.kernel.org/show_bug.cgi?id=209407 Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Tue, 5 Jan 2021 03:37:28 +0000 (11:37 +0800)]
net: hns3: fix incorrect handling of sctp6 rss tuple
For DEVICE_VERSION_V2, the hardware only supports src-ip,
dst-ip and verification-tag for rss tuple set of sctp6
packet. For DEVICE_VERSION_V3, the hardware supports
src-port and dst-port as well.
Currently, when user queries the sctp6 rss tuples info,
some unsupported information will be showed on V2. So add
a check for hardware version when initializing and queries
sctp6 rss tuple to fix this issue.
Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yufeng Mo [Tue, 5 Jan 2021 03:37:27 +0000 (11:37 +0800)]
net: hns3: fix the number of queues actually used by ARQ
HCLGE_MBX_MAX_ARQ_MSG_NUM is used to apply memory for the number
of queues used by ARQ(Asynchronous Receive Queue), so the head
and tail pointers should also use this macro.
Fixes: 07a0556a3a73 ("net: hns3: Changes to support ARQ(Asynchronous Receive Queue)") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yonglong Liu [Tue, 5 Jan 2021 03:37:26 +0000 (11:37 +0800)]
net: hns3: fix a phy loopback fail issue
When phy driver does not implement the set_loopback interface,
phy loopback test will return -EOPNOTSUPP, and the loopback test
will fail. So when phy driver does not implement the set_loopback
interface, don't do phy loopback test.
Fixes: c9765a89d142 ("net: hns3: add phy selftest function") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 6 Jan 2021 00:32:08 +0000 (16:32 -0800)]
Merge branch 'stmmac-fixes'
Samuel Holland says:
====================
Fixes for dwmac-sun8i suspend/resume
This series fixes issues preventing dwmac-sun8i from working after a
suspend/resume cycle. Those issues include the PHY being left powered
off, the MAC syscon configuration being reset, and the reference to the
reset controller being improperly dropped. They also fix related issues
in probe error handling and driver removal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, sun8i_dwmac_set_syscon was called from a chain of functions
in several different files:
sun8i_dwmac_probe
stmmac_dvr_probe
stmmac_hw_init
stmmac_hwif_init
sun8i_dwmac_setup
sun8i_dwmac_set_syscon
which made the lifetime of the syscon values hard to reason about. Part
of the problem is that there is no similar platform driver callback from
stmmac_dvr_remove. As a result, the driver unset the syscon value in
sun8i_dwmac_exit, but this leaves it uninitialized after a suspend/
resume cycle. It was also unset a second time (outside sun8i_dwmac_exit)
in the probe error path.
Move the init to the earliest available place in sun8i_dwmac_probe
(after stmmac_probe_config_dt, which initializes plat_dat), and the
deinit to the corresponding position in the cleanup order.
Since priv is not filled in until stmmac_dvr_probe, this requires
changing the sun8i_dwmac_set_syscon parameters to priv's two relevant
members.
Fixes: 9f93ac8d4085 ("net-next: stmmac: Add dwmac-sun8i") Fixes: 634db83b8265 ("net: stmmac: dwmac-sun8i: Handle integrated/external MDIOs") Signed-off-by: Samuel Holland <samuel@sholland.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Samuel Holland [Sun, 3 Jan 2021 11:17:43 +0000 (05:17 -0600)]
net: stmmac: dwmac-sun8i: Balance internal PHY power
sun8i_dwmac_exit calls sun8i_dwmac_unpower_internal_phy, but
sun8i_dwmac_init did not call sun8i_dwmac_power_internal_phy. This
caused PHY power to remain off after a suspend/resume cycle. Fix this by
recording if PHY power should be restored, and if so, restoring it.
Fixes: 634db83b8265 ("net: stmmac: dwmac-sun8i: Handle integrated/external MDIOs") Signed-off-by: Samuel Holland <samuel@sholland.org> Signed-off-by: David S. Miller <davem@davemloft.net>
While stmmac_pltfr_remove calls sun8i_dwmac_exit, the sun8i_dwmac_init
and sun8i_dwmac_exit functions are also called by the stmmac_platform
suspend/resume callbacks. They may be called many times during the
device's lifetime and should not release resources used by the driver.
Furthermore, there was no error handling in case registering the MDIO
mux failed during probe, and the EPHY clock was never released at all.
Fix all of these issues by moving the deinitialization code to a driver
removal callback. Also ensure the EPHY is powered down before removal.
Fixes: 634db83b8265 ("net: stmmac: dwmac-sun8i: Handle integrated/external MDIOs") Signed-off-by: Samuel Holland <samuel@sholland.org> Reviewed-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac_pltfr_remove does three things in one function, making it
inapproprate for unwinding the steps in the probe function. Currently,
a failure before the call to stmmac_dvr_probe would leak OF node
references due to missing a call to stmmac_remove_config_dt. And an
error in stmmac_dvr_probe would cause the driver to attempt to remove a
netdevice that was never added. Fix these by reordering the init and
splitting out the error handling steps.
Fixes: 9f93ac8d4085 ("net-next: stmmac: Add dwmac-sun8i") Fixes: 40a1dcee2d18 ("net: ethernet: dwmac-sun8i: Use the correct function in exit path") Signed-off-by: Samuel Holland <samuel@sholland.org> Reviewed-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 31 Dec 2020 03:40:27 +0000 (19:40 -0800)]
net: vlan: avoid leaks on register_vlan_dev() failures
VLAN checks for NETREG_UNINITIALIZED to distinguish between
registration failure and unregistration in progress.
Since commit cb626bf566eb ("net-sysfs: Fix reference count leak")
registration failure may, however, result in NETREG_UNREGISTERED
as well as NETREG_UNINITIALIZED.
This fix is similer to cebb69754f37 ("rtnetlink: Fix
memory(net_device) leak when ->newlink fails")
Fixes: cb626bf566eb ("net-sysfs: Fix reference count leak") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net/sonic: Fix some resource leaks in error handling paths
A call to dma_alloc_coherent() is wrapped by sonic_alloc_descriptors().
This is correctly freed in the remove function, but not in the error
handling path of the probe function. Fix this by adding the missing
dma_free_coherent() call.
While at it, rename a label in order to be slightly more informative.
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Chris Zankel <chris@zankel.net>
References: commit 10e3cc180e64 ("net/sonic: Fix a resource leak in an error handling path in 'jazz_sonic_probe()'") Fixes: 74f2a5f0ef64 ("xtensa: Add support for the Sonic Ethernet device for the XT2000 board.") Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Sun, 3 Jan 2021 21:36:23 +0000 (22:36 +0100)]
wan: ds26522: select CONFIG_BITREVERSE
Without this, the driver runs into a link failure
arm-linux-gnueabi-ld: drivers/net/wan/slic_ds26522.o: in function `slic_ds26522_probe':
slic_ds26522.c:(.text+0x100c): undefined reference to `byte_rev_table'
arm-linux-gnueabi-ld: slic_ds26522.c:(.text+0x1cdc): undefined reference to `byte_rev_table'
arm-linux-gnueabi-ld: drivers/net/wan/slic_ds26522.o: in function `slic_write':
slic_ds26522.c:(.text+0x1e4c): undefined reference to `byte_rev_table'
Fixes: c37d4a0085c5 ("Maxim/driver: Add driver for maxim ds26522") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Sun, 3 Jan 2021 21:36:22 +0000 (22:36 +0100)]
misdn: dsp: select CONFIG_BITREVERSE
Without this, we run into a link error
arm-linux-gnueabi-ld: drivers/isdn/mISDN/dsp_audio.o: in function `dsp_audio_generate_law_tables':
(.text+0x30c): undefined reference to `byte_rev_table'
arm-linux-gnueabi-ld: drivers/isdn/mISDN/dsp_audio.o:(.text+0x5e4): more undefined references to `byte_rev_table' follow
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Sun, 3 Jan 2021 21:36:21 +0000 (22:36 +0100)]
cfg80211: select CONFIG_CRC32
Without crc32 support, this fails to link:
arm-linux-gnueabi-ld: net/wireless/scan.o: in function `cfg80211_scan_6ghz':
scan.c:(.text+0x928): undefined reference to `crc32_le'
Fixes: c8cb5b854b40 ("nl80211/cfg80211: support 6 GHz scanning") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Sun, 3 Jan 2021 21:36:20 +0000 (22:36 +0100)]
wil6210: select CONFIG_CRC32
Without crc32, the driver fails to link:
arm-linux-gnueabi-ld: drivers/net/wireless/ath/wil6210/fw.o: in function `wil_fw_verify':
fw.c:(.text+0x74c): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/wireless/ath/wil6210/fw.o:fw.c:(.text+0x758): more undefined references to `crc32_le' follow
Fixes: 151a9706503f ("wil6210: firmware download") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Sun, 3 Jan 2021 21:36:19 +0000 (22:36 +0100)]
can: kvaser_pciefd: select CONFIG_CRC32
Without crc32, this driver fails to link:
arm-linux-gnueabi-ld: drivers/net/can/kvaser_pciefd.o: in function `kvaser_pciefd_probe':
kvaser_pciefd.c:(.text+0x2b0): undefined reference to `crc32_be'
Fixes: 26ad340e582d ("can: kvaser_pciefd: Add driver for Kvaser PCIEcan devices") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Sun, 3 Jan 2021 21:36:18 +0000 (22:36 +0100)]
phy: dp83640: select CONFIG_CRC32
Without crc32, this driver fails to link:
arm-linux-gnueabi-ld: drivers/net/phy/dp83640.o: in function `match':
dp83640.c:(.text+0x476c): undefined reference to `crc32_le'
Fixes: 539e44d26855 ("dp83640: Include hash in timestamp/packet matching") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Sun, 3 Jan 2021 21:36:17 +0000 (22:36 +0100)]
qed: select CONFIG_CRC32
Without this, the driver fails to link:
lpc_eth.c:(.text+0x1934): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/ethernet/qlogic/qed/qed_debug.o: in function `qed_grc_dump':
qed_debug.c:(.text+0x4068): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/ethernet/qlogic/qed/qed_debug.o: in function `qed_idle_chk_dump':
qed_debug.c:(.text+0x51fc): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/ethernet/qlogic/qed/qed_debug.o: in function `qed_mcp_trace_dump':
qed_debug.c:(.text+0x6000): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/ethernet/qlogic/qed/qed_debug.o: in function `qed_dbg_reg_fifo_dump':
qed_debug.c:(.text+0x66cc): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/ethernet/qlogic/qed/qed_debug.o:qed_debug.c:(.text+0x6aa4): more undefined references to `crc32_le' follow
Fixes: 7a4b21b7d1f0 ("qed: Add nvram selftest") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 5 Jan 2021 20:46:27 +0000 (12:46 -0800)]
Merge tag 'arc-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC updates from Vineet Gupta:
"Things are quieter on upstreaming front as we are mostly focusing on
ARCv3/ARC64 port.
This contains just build system updates from Masahiro Yamada"
* tag 'arc-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: build: use $(READELF) instead of hard-coded readelf
ARC: build: remove unneeded extra-y
ARC: build: move symlink creation to arch/arc/Makefile to avoid race
ARC: build: add boot_targets to PHONY
ARC: build: add uImage.lzma to the top-level target
ARC: build: remove non-existing bootpImage from KBUILD_IMAGE
Linus Torvalds [Tue, 5 Jan 2021 20:38:56 +0000 (12:38 -0800)]
Merge tag 'net-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from netfilter, wireless and bpf
trees.
Current release - regressions:
- mt76: fix NULL pointer dereference in mt76u_status_worker and
mt76s_process_tx_queue
- net: ipa: fix interconnect enable bug
Current release - always broken:
- netfilter: fixes possible oops in mtype_resize in ipset
- ath11k: fix number of coding issues found by static analysis tools
and spurious error messages
Previous releases - regressions:
- e1000e: re-enable s0ix power saving flows for systems with the
Intel i219-LM Ethernet controllers to fix power use regression
- virtio_net: fix recursive call to cpus_read_lock() to avoid a
deadlock
- ipv4: ignore ECN bits for fib lookups in fib_compute_spec_dst()
- sysfs: take the rtnl lock around XPS configuration
- xsk: fix memory leak for failed bind and rollback reservation at
NETDEV_TX_BUSY
- r8169: work around power-saving bug on some chip versions
Previous releases - always broken:
- dcb: validate netlink message in DCB handler
- tun: fix return value when the number of iovs exceeds MAX_SKB_FRAGS
to prevent unnecessary retries
- vhost_net: fix ubuf refcount when sendmsg fails
- bpf: save correct stopping point in file seq iteration
- ncsi: use real net-device for response handler
- neighbor: fix div by zero caused by a data race (TOCTOU)
- bareudp: fix use of incorrect min_headroom size and a false
positive lockdep splat from the TX lock
- mvpp2:
- clear force link UP during port init procedure in case
bootloader had set it
- add TCAM entry to drop flow control pause frames
- fix PPPoE with ipv6 packet parsing
- fix GoP Networking Complex Control config of port 3
- fix pkt coalescing IRQ-threshold configuration
- xsk: fix race in SKB mode transmit with shared cq
- ionic: account for vlan tag len in rx buffer len
- stmmac: ignore the second clock input, current clock framework does
not handle exclusive clock use well, other drivers may reconfigure
the second clock
Misc:
- ppp: change PPPIOCUNBRIDGECHAN ioctl request number to follow
existing scheme"
* tag 'net-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits)
net: dsa: lantiq_gswip: Fix GSWIP_MII_CFG(p) register access
net: dsa: lantiq_gswip: Enable GSWIP_MII_CFG_EN also for internal PHYs
net: lapb: Decrease the refcount of "struct lapb_cb" in lapb_device_event
r8169: work around power-saving bug on some chip versions
net: usb: qmi_wwan: add Quectel EM160R-GL
selftests: mlxsw: Set headroom size of correct port
net: macb: Correct usage of MACB_CAPS_CLK_HW_CHG flag
ibmvnic: fix: NULL pointer dereference.
docs: networking: packet_mmap: fix old config reference
docs: networking: packet_mmap: fix formatting for C macros
vhost_net: fix ubuf refcount incorrectly when sendmsg fails
bareudp: Fix use of incorrect min_headroom size
bareudp: set NETIF_F_LLTX flag
net: hdlc_ppp: Fix issues when mod_timer is called while timer is running
atlantic: remove architecture depends
erspan: fix version 1 check in gre_parse_header()
net: hns: fix return value check in __lb_other_process()
net: sched: prevent invalid Scell_log shift count
net: neighbor: fix a crash caused by mod zero
ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst()
...
Linus Torvalds [Tue, 5 Jan 2021 19:55:46 +0000 (11:55 -0800)]
Merge tag 'afs-fixes-04012021' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS fixes from David Howells:
"Two fixes.
The first is the fix for the strnlen() array limit check and the
second fixes the calculation of the number of dirent records used to
represent any particular filename length"
* tag 'afs-fixes-04012021' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix directory entry size calculation
afs: Work around strnlen() oops with CONFIG_FORTIFIED_SOURCE=y
Linus Torvalds [Tue, 5 Jan 2021 19:33:00 +0000 (11:33 -0800)]
mm: make wait_on_page_writeback() wait for multiple pending writebacks
Ever since commit 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common()
logic") we've had some very occasional reports of BUG_ON(PageWriteback)
in write_cache_pages(), which we thought we already fixed in commit 073861ed77b6 ("mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)").
But syzbot just reported another one, even with that commit in place.
And it turns out that there's a simpler way to trigger the BUG_ON() than
the one Hugh found with page re-use. It all boils down to the fact that
the page writeback is ostensibly serialized by the page lock, but that
isn't actually really true.
Yes, the people _setting_ writeback all do so under the page lock, but
the actual clearing of the bit - and waking up any waiters - happens
without any page lock.
This gives us this fairly simple race condition:
CPU1 = end previous writeback
CPU2 = start new writeback under page lock
CPU3 = write_cache_pages()
wake_up_page(page, PG_writeback);
.. wakes up CPU3 ..
BUG_ON(PageWriteback(page));
where the BUG_ON() happens because we woke up the PG_writeback bit
becasue of the _previous_ writeback, but a new one had already been
started because the clearing of the bit wasn't actually atomic wrt the
actual wakeup or serialized by the page lock.
The reason this didn't use to happen was that the old logic in waiting
on a page bit would just loop if it ever saw the bit set again.
The nice proper fix would probably be to get rid of the whole "wait for
writeback to clear, and then set it" logic in the writeback path, and
replace it with an atomic "wait-to-set" (ie the same as we have for page
locking: we set the page lock bit with a single "lock_page()", not with
"wait for lock bit to clear and then set it").
However, out current model for writeback is that the waiting for the
writeback bit is done by the generic VFS code (ie write_cache_pages()),
but the actual setting of the writeback bit is done much later by the
filesystem ".writepages()" function.
IOW, to make the writeback bit have that same kind of "wait-to-set"
behavior as we have for page locking, we'd have to change our roughly
~50 different writeback functions. Painful.
Instead, just make "wait_on_page_writeback()" loop on the very unlikely
situation that the PG_writeback bit is still set, basically re-instating
the old behavior. This is very non-optimal in case of contention, but
since we only ever set the bit under the page lock, that situation is
controlled.
The following patchset contains Netfilter fixes for net:
1) Missing sanitization of rateest userspace string, bug has been
triggered by syzbot, patch from Florian Westphal.
2) Report EOPNOTSUPP on missing set features in nft_dynset, otherwise
error reporting to userspace via EINVAL is misleading since this is
reserved for malformed netlink requests.
3) New binaries with old kernels might silently accept several set
element expressions. New binaries set on the NFT_SET_EXPR and
NFT_DYNSET_F_EXPR flags to request for several expressions per
element, hence old kernels which do not support for this bail out
with EOPNOTSUPP.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
netfilter: nftables: add set expression flags
netfilter: nft_dynset: report EOPNOTSUPP on missing set feature
netfilter: xt_RATEEST: reject non-null terminated string from userspace
====================
====================
net: dsa: lantiq_gswip: two fixes for -net/-stable
While testing the lantiq_gswip driver in OpenWrt at least one board had
a non-working Ethernet port connected to an internal 100Mbit/s PHY22F
GPHY. The problem which could be observed:
- the PHY would detect the link just fine
- ethtool stats would see the TX counter rise
- the RX counter in ethtool was stuck at zero
It turns out that two independent patches are needed to fix this:
- first we need to enable the MII data lines also for internal PHYs
- second we need to program the GSWIP_MII_CFG registers for all ports
except the CPU port
These two patches have also been tested by back-porting them on top of
Linux 5.4.86 in OpenWrt.
Special thanks to Hauke for debugging and brainstorming this on IRC
with me!
====================
There is one GSWIP_MII_CFG register for each switch-port except the CPU
port. The register offset for the first port is 0x0, 0x02 for the
second, 0x04 for the third and so on.
Update the driver to not only restrict the GSWIP_MII_CFG registers to
ports 0, 1 and 5. Handle ports 0..5 instead but skip the CPU port. This
means we are not overwriting the configuration for the third port (port
two since we start counting from zero) with the settings for the sixth
port (with number five) anymore.
The GSWIP_MII_PCDU(p) registers are not updated because there's really
only three (one for each of the following ports: 0, 1, 5).
Fixes: 14fceff4771e51 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: dsa: lantiq_gswip: Enable GSWIP_MII_CFG_EN also for internal PHYs
Enable GSWIP_MII_CFG_EN also for internal PHYs to make traffic flow.
Without this the PHY link is detected properly and ethtool statistics
for TX are increasing but there's no RX traffic coming in.
Xie He [Thu, 31 Dec 2020 17:43:31 +0000 (09:43 -0800)]
net: lapb: Decrease the refcount of "struct lapb_cb" in lapb_device_event
In lapb_device_event, lapb_devtostruct is called to get a reference to
an object of "struct lapb_cb". lapb_devtostruct increases the refcount
of the object and returns a pointer to it. However, we didn't decrease
the refcount after we finished using the pointer. This patch fixes this
problem.
Heiner Kallweit [Wed, 30 Dec 2020 18:33:34 +0000 (19:33 +0100)]
r8169: work around power-saving bug on some chip versions
A user reported failing network with RTL8168dp (a quite rare chip
version). Realtek confirmed that few chip versions suffer from a PLL
power-down hw bug.
Charles Keepax [Mon, 4 Jan 2021 10:38:02 +0000 (10:38 +0000)]
net: macb: Correct usage of MACB_CAPS_CLK_HW_CHG flag
A new flag MACB_CAPS_CLK_HW_CHG was added and all callers of
macb_set_tx_clk were gated on the presence of this flag.
- if (!clk)
+ if (!bp->tx_clk || !(bp->caps & MACB_CAPS_CLK_HW_CHG))
However the flag was not added to anything other than the new
sama7g5_gem, turning that function call into a no op for all other
systems. This breaks the networking on Zynq.
The commit message adding this states: a new capability so that
macb_set_tx_clock() to not be called for IPs having this
capability
This strongly implies that present of the flag was intended to skip
the function not absence of the flag. Update the if statement to
this effect, which repairs the existing users.
Fixes: daafa1d33cc9 ("net: macb: add capability to not set the clock rate") Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20210104103802.13091-1-ckeepax@opensource.cirrus.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Baruch Siach [Tue, 29 Dec 2020 09:08:39 +0000 (11:08 +0200)]
docs: networking: packet_mmap: fix old config reference
Before commit 889b8f964f2f ("packet: Kill CONFIG_PACKET_MMAP.") there
used to be a CONFIG_PACKET_MMAP config symbol that depended on
CONFIG_PACKET. The text still implies that PACKET_MMAP can be disabled.
Remove that from the text, as well as reference to old kernel versions.
Also, drop reference to broken link to information for pre 2.6.5
kernels.
Make a slight working improvement (s/In/On/) while at it.
Yunjian Wang [Tue, 29 Dec 2020 02:01:48 +0000 (10:01 +0800)]
vhost_net: fix ubuf refcount incorrectly when sendmsg fails
Currently the vhost_zerocopy_callback() maybe be called to decrease
the refcount when sendmsg fails in tun. The error handling in vhost
handle_tx_zerocopy() will try to decrease the same refcount again.
This is wrong. To fix this issue, we only call vhost_net_ubuf_put()
when vq->heads[nvq->desc].len == VHOST_DMA_IN_PROGRESS.
Fixes: bab632d69ee4 ("vhost: vhost TX zero-copy support") Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/1609207308-20544-1-git-send-email-wangyunjian@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Taehee Yoo [Mon, 28 Dec 2020 15:21:46 +0000 (15:21 +0000)]
bareudp: Fix use of incorrect min_headroom size
In the bareudp6_xmit_skb(), it calculates min_headroom.
At that point, it uses struct iphdr, but it's not correct.
So panic could occur.
The struct ipv6hdr should be used.
Test commands:
ip netns add A
ip netns add B
ip link add veth0 netns A type veth peer name veth1 netns B
ip netns exec A ip link set veth0 up
ip netns exec A ip a a 2001:db8:0::1/64 dev veth0
ip netns exec B ip link set veth1 up
ip netns exec B ip a a 2001:db8:0::2/64 dev veth1
for i in {10..1}
do
let A=$i-1
ip netns exec A ip link add bareudp$i type bareudp dstport $i \
ethertype 0x86dd
ip netns exec A ip link set bareudp$i up
ip netns exec A ip -6 a a 2001:db8:$i::1/64 dev bareudp$i
ip netns exec A ip -6 r a 2001:db8:$i::2 encap ip6 src \
2001:db8:$A::1 dst 2001:db8:$A::2 via 2001:db8:$i::2 \
dev bareudp$i
ip netns exec B ip link add bareudp$i type bareudp dstport $i \
ethertype 0x86dd
ip netns exec B ip link set bareudp$i up
ip netns exec B ip -6 a a 2001:db8:$i::2/64 dev bareudp$i
ip netns exec B ip -6 r a 2001:db8:$i::1 encap ip6 src \
2001:db8:$A::2 dst 2001:db8:$A::1 via 2001:db8:$i::1 \
dev bareudp$i
done
ip netns exec A ping 2001:db8:7::2
Taehee Yoo [Mon, 28 Dec 2020 15:21:36 +0000 (15:21 +0000)]
bareudp: set NETIF_F_LLTX flag
Like other tunneling interfaces, the bareudp doesn't need TXLOCK.
So, It is good to set the NETIF_F_LLTX flag to improve performance and
to avoid lockdep's false-positive warning.
Test commands:
ip netns add A
ip netns add B
ip link add veth0 netns A type veth peer name veth1 netns B
ip netns exec A ip link set veth0 up
ip netns exec A ip a a 10.0.0.1/24 dev veth0
ip netns exec B ip link set veth1 up
ip netns exec B ip a a 10.0.0.2/24 dev veth1
for i in {2..1}
do
let A=$i-1
ip netns exec A ip link add bareudp$i type bareudp \
dstport $i ethertype ip
ip netns exec A ip link set bareudp$i up
ip netns exec A ip a a 10.0.$i.1/24 dev bareudp$i
ip netns exec A ip r a 10.0.$i.2 encap ip src 10.0.$A.1 \
dst 10.0.$A.2 via 10.0.$i.2 dev bareudp$i
ip netns exec B ip link add bareudp$i type bareudp \
dstport $i ethertype ip
ip netns exec B ip link set bareudp$i up
ip netns exec B ip a a 10.0.$i.2/24 dev bareudp$i
ip netns exec B ip r a 10.0.$i.1 encap ip src 10.0.$A.2 \
dst 10.0.$A.1 via 10.0.$i.1 dev bareudp$i
done
ip netns exec A ping 10.0.2.2
Linus Torvalds [Mon, 4 Jan 2021 18:55:19 +0000 (10:55 -0800)]
Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU fix from Paul McKenney:
"This is a fix for a regression in the v5.10 merge window, but it was
reported quite late in the v5.10 process, plus generating and testing
the fix took some time.
The regression is due to commit 36dadef23fcc ("kprobes: Init kprobes
in early_initcall") which on powerpc can use RCU Tasks before
initialization, resulting in boot failures.
The fix is straightforward, simply moving initialization of RCU Tasks
before the early_initcall()s. The fix has been exposed to -next and
kbuild test robot testing, and has been tested by the PowerPC guys"
* 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
rcu-tasks: Move RCU-tasks initialization to before early_initcall()
David Howells [Wed, 23 Dec 2020 10:39:57 +0000 (10:39 +0000)]
afs: Fix directory entry size calculation
The number of dirent records used by an AFS directory entry should be
calculated using the assumption that there is a 16-byte name field in the
first block, rather than a 20-byte name field (which is actually the case).
This miscalculation is historic and effectively standard, so we have to use
it.
The calculation we need to use is:
1 + (((strlen(name) + 1) + 15) >> 5)
where we are adding one to the strlen() result to account for the NUL
termination.
Fix this by the following means:
(1) Create an inline function to do the calculation for a given name
length.
(2) Use the function to calculate the number of records used for a dirent
in afs_dir_iterate_block().
Use this to move the over-end check out of the loop since it only
needs to be done once.
Further use this to only go through the loop for the 2nd+ records
composing an entry. The only test there now is for if the record is
allocated - and we already checked the first block at the top of the
outer loop.
(3) Add a max name length check in afs_dir_iterate_block().
(4) Make afs_edit_dir_add() and afs_edit_dir_remove() use the function
from (1) to calculate the number of blocks rather than doing it
incorrectly themselves.
Fixes: 63a4681ff39c ("afs: Locally edit directory data for mkdir/create/unlink/...") Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Marc Dionne <marc.dionne@auristor.com>
David Howells [Mon, 21 Dec 2020 22:37:58 +0000 (22:37 +0000)]
afs: Work around strnlen() oops with CONFIG_FORTIFIED_SOURCE=y
AFS has a structured layout in its directory contents (AFS dirs are
downloaded as files and parsed locally by the client for lookup/readdir).
The slots in the directory are defined by union afs_xdr_dirent. This,
however, only directly allows a name of a length that will fit into that
union. To support a longer name, the next 1-8 contiguous entries are
annexed to the first one and the name flows across these.
afs_dir_iterate_block() uses strnlen(), limited to the space to the end of
the page, to find out how long the name is. This worked fine until 6a39e62abbaf. With that commit, the compiler determines the size of the
array and asserts that the string fits inside that array. This is a
problem for AFS because we *expect* it to overflow one or more arrays.
A similar problem also occurs in afs_dir_scan_block() when a directory file
is being locally edited to avoid the need to redownload it. There strlen()
was being used safely because each page has the last byte set to 0 when the
file is downloaded and validated (in afs_dir_check_page()).
Fix this by changing the afs_xdr_dirent union name field to an
indeterminate-length array and dropping the overflow field.
(Note that whilst looking at this, I realised that the calculation of the
number of slots a dirent used is non-standard and not quite right, but I'll
address that in a separate patch.)
Linus Torvalds [Sat, 2 Jan 2021 20:22:46 +0000 (12:22 -0800)]
Merge tag 's390-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 cleanups from Vasily Gorbik:
"Update defconfigs and sort config select list"
* tag 's390-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/Kconfig: sort config S390 select list once again
s390: update defconfigs
Linus Torvalds [Sat, 2 Jan 2021 19:53:05 +0000 (11:53 -0800)]
Merge tag 'pm-5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a crash in intel_pstate during resume from suspend-to-RAM
that may occur after recent changes and two resource leaks in error
paths in the operating performance points (OPP) framework, add a new
C-states table to intel_idle and update the cpuidle MAINTAINERS entry
to cover the governors too.
Specifics:
- Fix recently introduced crash in the intel_pstate driver that
occurs if scale-invariance is disabled during resume from
suspend-to-RAM due to inconsistent changes of APERF or MPERF MSR
values made by the platform firmware (Rafael Wysocki).
- Fix a memory leak and add a missing clk_put() in error paths in the
OPP framework (Quanyang Wang, Viresh Kumar).
- Add new C-states table for SnowRidge processors to the intel_idle
driver (Artem Bityutskiy).
- Update the MAINTAINERS entry for cpuidle to make it clear that the
governors are covered by it too (Lukas Bulwahn)"
* tag 'pm-5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
intel_idle: add SnowRidge C-state table
cpufreq: intel_pstate: Fix fast-switch fallback path
opp: Call the missing clk_put() on error
opp: fix memory leak in _allocate_opp_table
MAINTAINERS: include governors into CPU IDLE TIME MANAGEMENT FRAMEWORK
Linus Torvalds [Fri, 1 Jan 2021 20:58:07 +0000 (12:58 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a load of driver fixes (12 ufs, 1 mpt3sas, 1 cxgbi).
The big core two fixes are for power management ("block: Do not accept
any requests while suspended" and "block: Fix a race in the runtime
power management code") which finally sorts out the resume problems
we've occasionally been having.
To make the resume fix, there are seven necessary precursors which
effectively renames REQ_PREEMPT to REQ_PM, so every "special" request
in block is automatically a power management exempt one.
All of the non-PM preempt cases are removed except for the one in the
SCSI Parallel Interface (spi) domain validation which is a genuine
case where we have to run requests at high priority to validate the
bus so this becomes an autopm get/put protected request"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (22 commits)
scsi: cxgb4i: Fix TLS dependency
scsi: ufs: Un-inline ufshcd_vops_device_reset function
scsi: ufs: Re-enable WriteBooster after device reset
scsi: ufs-mediatek: Use correct path to fix compile error
scsi: mpt3sas: Signedness bug in _base_get_diag_triggers()
scsi: block: Do not accept any requests while suspended
scsi: block: Remove RQF_PREEMPT and BLK_MQ_REQ_PREEMPT
scsi: core: Only process PM requests if rpm_status != RPM_ACTIVE
scsi: scsi_transport_spi: Set RQF_PM for domain validation commands
scsi: ide: Mark power management requests with RQF_PM instead of RQF_PREEMPT
scsi: ide: Do not set the RQF_PREEMPT flag for sense requests
scsi: block: Introduce BLK_MQ_REQ_PM
scsi: block: Fix a race in the runtime power management code
scsi: ufs-pci: Enable UFSHCD_CAP_RPM_AUTOSUSPEND for Intel controllers
scsi: ufs-pci: Fix recovery from hibernate exit errors for Intel controllers
scsi: ufs-pci: Ensure UFS device is in PowerDown mode for suspend-to-disk ->poweroff()
scsi: ufs-pci: Fix restore from S4 for Intel controllers
scsi: ufs-mediatek: Keep VCC always-on for specific devices
scsi: ufs: Allow regulators being always-on
scsi: ufs: Clear UAC for RPMB after ufshcd resets
...
Linus Torvalds [Fri, 1 Jan 2021 20:49:09 +0000 (12:49 -0800)]
Merge tag 'block-5.11-2021-01-01' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Two minor block fixes from this last week that should go into 5.11:
- Add missing NOWAIT debugfs definition (Andres)
- Fix kerneldoc warning introduced this merge window (Randy)"
* tag 'block-5.11-2021-01-01' of git://git.kernel.dk/linux-block:
block: add debugfs stanza for QUEUE_FLAG_NOWAIT
fs: block_dev.c: fix kernel-doc warnings from struct block_device changes
Linus Torvalds [Fri, 1 Jan 2021 20:29:49 +0000 (12:29 -0800)]
Merge tag 'io_uring-5.11-2021-01-01' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"A few fixes that should go into 5.11, all marked for stable as well:
- Fix issue around identity COW'ing and users that share a ring
across processes
- Fix a hang associated with unregistering fixed files (Pavel)
- Move the 'process is exiting' cancelation a bit earlier, so
task_works aren't affected by it (Pavel)"
* tag 'io_uring-5.11-2021-01-01' of git://git.kernel.dk/linux-block:
kernel/io_uring: cancel io_uring before task works
io_uring: fix io_sqe_files_unregister() hangs
io_uring: add a helper for setting a ref node
io_uring: don't assume mm is constant across submits
Linus Torvalds [Mon, 28 Dec 2020 19:40:22 +0000 (11:40 -0800)]
depmod: handle the case of /sbin/depmod without /sbin in PATH
Commit 436e980e2ed5 ("kbuild: don't hardcode depmod path") stopped
hard-coding the path of depmod, but in the process caused trouble for
distributions that had that /sbin location, but didn't have it in the
PATH (generally because /sbin is limited to the super-user path).
Work around it for now by just adding /sbin to the end of PATH in the
depmod.sh script.
Pavel Begunkov [Wed, 30 Dec 2020 21:34:16 +0000 (21:34 +0000)]
kernel/io_uring: cancel io_uring before task works
For cancelling io_uring requests it needs either to be able to run
currently enqueued task_works or having it shut down by that moment.
Otherwise io_uring_cancel_files() may be waiting for requests that won't
ever complete.
Go with the first way and do cancellations before setting PF_EXITING and
so before putting the task_work infrastructure into a transition state
where task_work_run() would better not be called.
Pavel Begunkov [Wed, 30 Dec 2020 21:34:15 +0000 (21:34 +0000)]
io_uring: fix io_sqe_files_unregister() hangs
io_sqe_files_unregister() uninterruptibly waits for enqueued ref nodes,
however requests keeping them may never complete, e.g. because of some
userspace dependency. Make sure it's interruptible otherwise it would
hang forever.
Linus Torvalds [Wed, 30 Dec 2020 20:02:12 +0000 (12:02 -0800)]
Merge tag 'ceph-for-5.11-rc2' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A fix for an edge case in MClientRequest encoding and a couple of
trivial fixups for the new msgr2 support"
* tag 'ceph-for-5.11-rc2' of git://github.com/ceph/ceph-client:
libceph: add __maybe_unused to DEFINE_MSGR2_FEATURE
libceph: align session_key and con_secret to 16 bytes
libceph: fix auth_signature buffer allocation in secure mode
ceph: reencode gid_list when reconnecting
Artem Bityutskiy [Sun, 27 Dec 2020 10:11:16 +0000 (12:11 +0200)]
intel_idle: add SnowRidge C-state table
Add C-state table for the SnowRidge SoC which is found on Intel Jacobsville
platforms.
The following has been changed.
1. C1E latency changed from 10us to 15us. It was measured using the
open source "wult" tool (the "nic" method, 15us is the 99.99th
percentile).
2. C1E power break even changed from 20us to 25us, which may result
in less C1E residency in some workloads.
3. C6 latency changed from 50us to 130us. Measured the same way as C1E.
The C6 C-state is supported only by some SnowRidge revisions, so add a C-state
table commentary about this.
On SnowRidge, C6 support is enumerated via the usual mechanism: "mwait" leaf of
the "cpuid" instruction. The 'intel_idle' driver does check this leaf, so even
though C6 is present in the table, the driver will only use it if the CPU does
support it.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When sugov_update_single_perf() falls back to the "frequency"
path due to the missing scale-invariance, it will call
cpufreq_driver_fast_switch() via sugov_fast_switch()
and the driver's ->fast_switch() callback will be invoked,
so it must not be NULL.
However, after commit a365ab6b9dfb ("cpufreq: intel_pstate: Implement
the ->adjust_perf() callback") intel_pstate sets ->fast_switch() to
NULL when it is going to use intel_cpufreq_adjust_perf(), which is a
mistake, because on x86 the scale-invariance may be turned off
dynamically, so modify it to retain the original ->adjust_perf()
callback pointer.
Fixes: a365ab6b9dfb ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback") Reported-by: Kenneth R. Crudup <kenny@panix.com> Tested-by: Kenneth R. Crudup <kenny@panix.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge branch 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull operating performance points (OPP) framework fixes for 5.11-rc2
from Viresh Kumar:
"This contains two patches to fix freeing of resources in error paths."
* 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
opp: Call the missing clk_put() on error
opp: fix memory leak in _allocate_opp_table
Randy Dunlap [Tue, 29 Dec 2020 03:47:06 +0000 (19:47 -0800)]
fs: block_dev.c: fix kernel-doc warnings from struct block_device changes
Fix new kernel-doc warnings in fs/block_dev.c:
../fs/block_dev.c:1066: warning: Excess function parameter 'whole' description in 'bd_abort_claiming'
../fs/block_dev.c:1837: warning: Function parameter or member 'dev' not described in 'lookup_bdev'
Fixes: 4e7b5671c6a8 ("block: remove i_bdev") Fixes: 37c3fc9abb25 ("block: simplify the block device claiming interface") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: linux-fsdevel@vger.kernel.org Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Tue, 29 Dec 2020 23:45:49 +0000 (15:45 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"16 patches
Subsystems affected by this patch series: mm (selftests, hugetlb,
pagecache, mremap, kasan, and slub), kbuild, checkpatch, misc, and
lib"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: slub: call account_slab_page() after slab page initialization
zlib: move EXPORT_SYMBOL() and MODULE_LICENSE() out of dfltcc_syms.c
lib/zlib: fix inflating zlib streams on s390
lib/genalloc: fix the overflow when size is too big
kdev_t: always inline major/minor helper functions
sizes.h: add SZ_8G/SZ_16G/SZ_32G macros
local64.h: make <asm/local64.h> mandatory
kasan: fix null pointer dereference in kasan_record_aux_stack
mm: generalise COW SMC TLB flushing race comment
mm/mremap.c: fix extent calculation
mm: memmap defer init doesn't work as expected
mm: add prototype for __add_to_page_cache_locked()
checkpatch: prefer strscpy to strlcpy
Revert "kbuild: avoid static_assert for genksyms"
mm/hugetlb: fix deadlock in hugetlb_cow error path
selftests/vm: fix building protection keys test
Roman Gushchin [Tue, 29 Dec 2020 23:15:07 +0000 (15:15 -0800)]
mm: slub: call account_slab_page() after slab page initialization
It's convenient to have page->objects initialized before calling into
account_slab_page(). In particular, this information can be used to
pre-alloc the obj_cgroup vector.
Let's call account_slab_page() a bit later, after the initialization of
page->objects.
This commit doesn't bring any functional change, but is required for
further optimizations.
[akpm@linux-foundation.org: undo changes needed by forthcoming mm-memcg-slab-pre-allocate-obj_cgroups-for-slab-caches-with-slab_account.patch]
Link: https://lkml.kernel.org/r/20201110195753.530157-1-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Tue, 29 Dec 2020 23:15:04 +0000 (15:15 -0800)]
zlib: move EXPORT_SYMBOL() and MODULE_LICENSE() out of dfltcc_syms.c
In commit 11fb479ff5d9 ("zlib: export S390 symbols for zlib modules"), I
added EXPORT_SYMBOL()s to dfltcc_inflate.c but then Mikhail said that
these should probably be in dfltcc_syms.c with the other
EXPORT_SYMBOL()s.
However, that is contrary to the current kernel style, which places
EXPORT_SYMBOL() immediately after the function that it applies to, so
move all EXPORT_SYMBOL()s to their respective function locations and
drop the dfltcc_syms.c file. Also move MODULE_LICENSE() from the
deleted file to dfltcc.c.
Ilya Leoshkevich [Tue, 29 Dec 2020 23:15:01 +0000 (15:15 -0800)]
lib/zlib: fix inflating zlib streams on s390
Decompressing zlib streams on s390 fails with "incorrect data check"
error.
Userspace zlib checks inflate_state.flags in order to byteswap checksums
only for zlib streams, and s390 hardware inflate code, which was ported
from there, tries to match this behavior. At the same time, kernel zlib
does not use inflate_state.flags, so it contains essentially random
values. For many use cases either zlib stream is zeroed out or checksum
is not used, so this problem is masked, but at least SquashFS is still
affected.
Fix by always passing a checksum to and from the hardware as is, which
matches zlib_inflate()'s expectations.
Randy Dunlap [Tue, 29 Dec 2020 23:14:49 +0000 (15:14 -0800)]
local64.h: make <asm/local64.h> mandatory
Make <asm-generic/local64.h> mandatory in include/asm-generic/Kbuild and
remove all arch/*/include/asm/local64.h arch-specific files since they
only #include <asm-generic/local64.h>.
This fixes build errors on arch/c6x/ and arch/nios2/ for
block/blk-iocost.c.
Build-tested on 21 of 25 arch-es. (tools problems on the others)
Yes, we could even rename <asm-generic/local64.h> to
<linux/local64.h> and change all #includes to use
<linux/local64.h> instead.
Link: https://lkml.kernel.org/r/20201227024446.17018-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Aurelien Jacquiot <jacquiot.aurelien@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicholas Piggin [Tue, 29 Dec 2020 23:14:43 +0000 (15:14 -0800)]
mm: generalise COW SMC TLB flushing race comment
I'm not sure if I'm completely missing something here, but AFAIKS the
reference to the mysterious "COW SMC race" confuses the issue. The
original changelog and mailing list thread didn't help me either.
This SMC race is where the problem was detected, but isn't the general
problem bigger and more obvious: that the new PTE could be picked up at
any time by any TLB while entries for the old PTE exist in other TLBs
before the TLB flush takes effect?
The case where the iTLB and dTLB of a CPU are pointing at different pages
is an interesting one but follows from the general problem.
The other (minor) thing with the comment I think it makes it a bit clearer
to say what the old code was doing (i.e., it avoids the race as opposed to
what?).
References: 4ce072f1faf29 ("mm: fix a race condition under SMC + COW") Link: https://lkml.kernel.org/r/20201215121119.351650-1-npiggin@gmail.com Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suresh Siddha <sbsiddha@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Baoquan He [Tue, 29 Dec 2020 23:14:37 +0000 (15:14 -0800)]
mm: memmap defer init doesn't work as expected
VMware observed a performance regression during memmap init on their
platform, and bisected to commit 73a6e474cb376 ("mm: memmap_init:
iterate over memblock regions rather that check each PFN") causing it.
Before the commit:
[0.033176] Normal zone: 1445888 pages used for memmap
[0.033176] Normal zone: 89391104 pages, LIFO batch:63
[0.035851] ACPI: PM-Timer IO Port: 0x448
With commit
[0.026874] Normal zone: 1445888 pages used for memmap
[0.026875] Normal zone: 89391104 pages, LIFO batch:63
[2.028450] ACPI: PM-Timer IO Port: 0x448
The root cause is the current memmap defer init doesn't work as expected.
Before, memmap_init_zone() was used to do memmap init of one whole zone,
to initialize all low zones of one numa node, but defer memmap init of
the last zone in that numa node. However, since commit 73a6e474cb376,
function memmap_init() is adapted to iterater over memblock regions
inside one zone, then call memmap_init_zone() to do memmap init for each
region.
E.g, on VMware's system, the memory layout is as below, there are two
memory regions in node 2. The current code will mistakenly initialize the
whole 1st region [mem 0xab00000000-0xfcffffffff], then do memmap defer to
iniatialize only one memmory section on the 2nd region [mem
0x10000000000-0x1033fffffff]. In fact, we only expect to see that there's
only one memory section's memmap initialized. That's why more time is
costed at the time.
Now, let's add a parameter 'zone_end_pfn' to memmap_init_zone() to pass
down the real zone end pfn so that defer_init() can use it to judge
whether defer need be taken in zone wide.
Link: https://lkml.kernel.org/r/20201223080811.16211-1-bhe@redhat.com Link: https://lkml.kernel.org/r/20201223080811.16211-2-bhe@redhat.com Fixes: commit 73a6e474cb376 ("mm: memmap_init: iterate over memblock regions rather that check each PFN") Signed-off-by: Baoquan He <bhe@redhat.com> Reported-by: Rahul Gopakumar <gopakumarr@vmware.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Souptick Joarder [Tue, 29 Dec 2020 23:14:34 +0000 (15:14 -0800)]
mm: add prototype for __add_to_page_cache_locked()
Otherwise it causes a gcc warning:
mm/filemap.c:830:14: warning: no previous prototype for `__add_to_page_cache_locked' [-Wmissing-prototypes]
A previous attempt to make this function static led to compilation
errors when CONFIG_DEBUG_INFO_BTF is enabled because
__add_to_page_cache_locked() is referred to by BPF code.
Adding a prototype will silence the warning.
Link: https://lkml.kernel.org/r/1608693702-4665-1-git-send-email-jrdr.linux@gmail.com Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>