git.proxmox.com Git - mirror_ubuntu-jammy-kernel.git/log

net: tulip: Remove the repeated declaration

Function 'pnic2_lnk_change' is declared twice, so remove the
repeated declaration.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: check return value after calling platform_get_resource()

It will cause null-ptr-deref if platform_get_resource() returns NULL,
we need check the return value.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'hns3-error-handling'

Guangbin Huang says:

====================
net: hns3: refactors and decouples the error handling logic

This patchset refactors and decouples the error handling logic from reset
logic, it is the preset patch of the RAS feature. It mainly implements the
function that reset logic remains independent of the error handling logic,
this will ensure that common misellaneous MSI-X interrupt are re-enabled
quickly.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: remove now redundant logic related to HNAE3_UNKNOWN_RESET

Earlier patches have decoupled the MSI-X conveyed error handling
and recovery logic. This earlier concept code is no longer required.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: add scheduling logic for error handling task

Error handling & recovery is done in context of reset task which
gets scheduled from misc interrupt handler in existing code. But
since error handling has been moved to new task, it should get
scheduled instead of the reset task from the interrupt handler.

Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: add a separate error handling task

Error handling and recovery logic are intertwined. Error handling (i.e.
error identification, clearing error sources and initiation of recovery)
is done in context of reset task. If certain hardware errors get
delivered during driver init time, which can cause driver init/loading
to fail.

Introduce a separate error handling task to ensure below:

1. Reset logic remains independent of the error handling logic.
2. Add the hclge_errhand_task_schedule to schedule error recovery
tasks, This will ensure that common misellaneous MSI-X interrupt are
re-enabled quickly.

Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Fix duplicate included linux/kernel.h

Clean up the following includecheck warning:

./drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h: linux/kernel.h
is included more than once.

No functional change.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2021-06-07

This series contains updates to virtchnl header file and ice driver.

Brett adds capability bits to virtchnl to specify whether a primary or
secondary MAC address is being requested and adds the implementation to
ice. He also adds storing of VF MAC address so that it will be preserved
across reboots of VM and refactors VF queue configuration to remove the
expectation that configuration be done all at once.

Krzysztof refactors ice_setup_rx_ctx() to remove configuration not
related to Rx context into a new function, ice_vsi_cfg_rxq().

Liwei Song extends the wait time for the global config timeout.

Salil Mehta refactors code in ice_vsi_set_num_qs() to remove an
unnecessary call when the user has requested specific number of Rx or Tx
queues.

Jesse converts define macros to static inlines for NOP configurations.

Jake adds messaging when devlink fails to read device capabilities and
when pldmfw cannot find the requested firmware. Adds a wait for reset
completion when reporting devlink info and reinitializes NVM during
rebuild to ensure values are current.

Ani adds detection and reporting of modules exceeding supported power
levels and changes an error message to a debug message.

Paul fixes a clang warning for deadcode.DeadStores.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ax88772-phylib'

Oleksij Rempel says:

====================
port asix ax88772 to the PHYlib

changes v2:
- add Reviewed-by: Andrew Lunn <andrew@lunn.ch> to some patches
- refactor asix_read_phy_addr() and add error handling for all callers
- refactor asix_mdio_bus_read()

Port ax88772 part of asix driver to the phylib to be able to use more
advanced external PHY attached to this controller.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

usbnet: run unbind() before unregister_netdev()

unbind() is the proper place to disconnect PHY, but it will fail if
netdev is already unregistered.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: do not print dump stack if device was removed

In case phy_state_machine() works on top of USB device, we can get -ENODEV
at any point. So, be less noisy if device was removed.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: asix: add error handling for asix_mdio_* functions

This usb devices can be removed at any time, so we need to forward
correct error value if device was detached.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: asix: ax88772: add generic selftest support

With working phylib support we are able now to use generic selftests.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: asix: ax88772: add phylib support

To be able to use ax88772 with external PHYs and use advantage of
existing PHY drivers, we need to port at least ax88772 part of asix
driver to the phylib framework.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb/phy: asix: add support for ax88772A/C PHYs

Add support for build-in x88772A/C PHYs

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: asix: refactor asix_read_phy_addr() and handle errors on return

Refactor asix_read_phy_addr() to return usable error value directly and
make sure all callers handle this error.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: asix: ax88772_bind: use devm_kzalloc() instead of kzalloc()

Make resource management easier, use devm_kzalloc().

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: gemini: Use devm_platform_get_and_ioremap_resource()

Use devm_platform_get_and_ioremap_resource() to simplify
code.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

atm: [br2864] fix spelling mistakes

interrupt should be changed to interrupting.

Signed-off-by: gushengxian <gushengxian@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: hellcreek: Use is_zero_ether_addr() instead of memcmp()

Using is_zero_ether_addr() instead of directly use
memcmp() to determine if the ethernet address is all
zeros.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: add pktgen_handle_all_threads() for the same code

The pktgen_{run, reset, stop}_all_threads() has the same code,
so add pktgen_handle_all_threads() for it.

Signed-off-by: Yejune Deng <yejune.deng@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio_net: Remove BUG() to avoid machine dead

We should not directly BUG() when there is hdr error, it is
better to output a print when such error happens. Currently,
the caller of xmit_skb() already did it.

Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ixp4xx_eth: Use devm_platform_get_and_ioremap_resource()

Use devm_platform_get_and_ioremap_resource() to simplify
code.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: lantiq: Use devm_platform_get_and_ioremap_resource()

Use devm_platform_get_and_ioremap_resource() to simplify
code.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net

Bug fixes overlapping feature additions and refactoring, mostly.

Signed-off-by: David S. Miller <davem@davemloft.net>

sch_htb: fix doc warning in htb_lookup_leaf()

Add description for parameters of htb_lookup_leaf()
to fix gcc W=1 warnings:

net/sched/sch_htb.c:773: warning: Function parameter or member 'hprio' not described in 'htb_lookup_leaf'
net/sched/sch_htb.c:773: warning: Function parameter or member 'prio' not described in 'htb_lookup_leaf'

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_htb: fix doc warning in htb_do_events()

Add description for parameters of htb_do_events()
to fix gcc W=1 warnings:

net/sched/sch_htb.c:708: warning: Function parameter or member 'q' not described in 'htb_do_events'
net/sched/sch_htb.c:708: warning: Function parameter or member 'level' not described in 'htb_do_events'
net/sched/sch_htb.c:708: warning: Function parameter or member 'start' not described in 'htb_do_events'

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_htb: fix doc warning in htb_charge_class()

Add description for parameters of htb_charge_class()
to fix gcc W=1 warnings:

net/sched/sch_htb.c:663: warning: Function parameter or member 'q' not described in 'htb_charge_class'
net/sched/sch_htb.c:663: warning: Function parameter or member 'cl' not described in 'htb_charge_class'
net/sched/sch_htb.c:663: warning: Function parameter or member 'level' not described in 'htb_charge_class'
net/sched/sch_htb.c:663: warning: Function parameter or member 'skb' not described in 'htb_charge_class'

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_htb: fix doc warning in htb_deactivate()

Add description for parameters of htb_deactivate()
to fix gcc W=1 warnings:

net/sched/sch_htb.c:578: warning: Function parameter or member 'q' not described in 'htb_deactivate'
net/sched/sch_htb.c:578: warning: Function parameter or member 'cl' not described in 'htb_deactivate'

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_htb: fix doc warning in htb_activate()

Add description for parameters of htb_activate()
to fix gcc W=1 warnings:

net/sched/sch_htb.c:562: warning: Function parameter or member 'q' not described in 'htb_activate'
net/sched/sch_htb.c:562: warning: Function parameter or member 'cl' not described in 'htb_activate'

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_htb: fix doc warning in htb_change_class_mode()

Add description for parameters of htb_change_class_mode()
to fix gcc W=1 warnings:

net/sched/sch_htb.c:533: warning: Function parameter or member 'q' not described in 'htb_change_class_mode'
net/sched/sch_htb.c:533: warning: Function parameter or member 'cl' not described in 'htb_change_class_mode'
net/sched/sch_htb.c:533: warning: Function parameter or member 'diff' not described in 'htb_change_class_mode'

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_htb: fix doc warning in htb_class_mode()

Add description for parameters of htb_class_mode()
to fix gcc W=1 warnings:

net/sched/sch_htb.c:507: warning: Function parameter or member 'cl' not described in 'htb_class_mode'
net/sched/sch_htb.c:507: warning: Function parameter or member 'diff' not described in 'htb_class_mode'

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_htb: fix doc warning in htb_deactivate_prios()

Add description for parameters of htb_deactivate_prios()
to fix gcc W=1 warnings:

net/sched/sch_htb.c:442: warning: Function parameter or member 'q' not described in 'htb_deactivate_prios'
net/sched/sch_htb.c:442: warning: Function parameter or member 'cl' not described in 'htb_deactivate_prios'

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_htb: fix doc warning in htb_activate_prios()

Add description for parameters of htb_activate_prios()
to fix gcc W=1 warnings:

net/sched/sch_htb.c:407: warning: Function parameter or member 'q' not described in 'htb_activate_prios'
net/sched/sch_htb.c:407: warning: Function parameter or member 'cl' not described in 'htb_activate_prios'

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_htb: fix doc warning in htb_remove_class_from_row()

Add description for parameters of htb_remove_class_from_row()
to fix gcc W=1 warnings:

net/sched/sch_htb.c:380: warning: Function parameter or member 'q' not described in 'htb_remove_class_from_row'
net/sched/sch_htb.c:380: warning: Function parameter or member 'cl' not described in 'htb_remove_class_from_row'
net/sched/sch_htb.c:380: warning: Function parameter or member 'mask' not described in 'htb_remove_class_from_row'

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_htb: fix doc warning in htb_add_class_to_row()

Add description for parameters of htb_add_class_to_row() to fix
gcc W=1 warnings:

net/sched/sch_htb.c:351: warning: Function parameter or member 'q' not described in 'htb_add_class_to_row'
net/sched/sch_htb.c:351: warning: Function parameter or member 'cl' not described in 'htb_add_class_to_row'
net/sched/sch_htb.c:351: warning: Function parameter or member 'mask' not described in 'htb_add_class_to_row'

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_htb: fix doc warning in htb_next_rb_node()

Add description for parameters of htb_next_rb_node() to fix
gcc W=1 warnings:

net/sched/sch_htb.c:339: warning: Function parameter or member 'n' not described in 'htb_next_rb_node'

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_htb: fix doc warning in htb_add_to_wait_tree()

Add description for parameters of htb_add_to_wait_tree() to fix
gcc W=1 warnings:

net/sched/sch_htb.c:308: warning: Function parameter or member 'q' not described in 'htb_add_to_wait_tree'
net/sched/sch_htb.c:308: warning: Function parameter or member 'cl' not described in 'htb_add_to_wait_tree'
net/sched/sch_htb.c:308: warning: Function parameter or member 'delay' not described in 'htb_add_to_wait_tree'

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'hd6470-cleanups'

Peng Li says:

====================
net: hd64570: clean up some code style issues

This patchset clean up some code style issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: hd64570: add some required spaces

Add space required before the open parenthesis '('.
Add space required after that close brace '}'.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hd64570: remove redundant parentheses

Remove redundant parentheses around 'cda >= desc_off'.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hd64570: fix the comments style issue

Block comments use * on subsequent lines.
Block comments use a trailing */ on a separate line.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hd64570: add braces {} to all arms of the statement

Braces {} should be used on all arms of this statement.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hd64570: fix the code style issue about trailing statements

Trailing statements should be on next line.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hd64570: fix the code style issue about "foo* bar"

Fix the checkpatch error as "foo* bar" and should be "foo *bar",
and "(foo*)" should be "(foo *)".

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hd64570: add blank line after declarations

This patch fixes the checkpatch error about missing a blank line
after declarations.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hd64570: remove redundant blank lines

This patch removes some redundant blank lines.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sja1105-yaml'

Vladimir Oltean says:

====================
Convert NXP SJA1105 DSA driver to YAML

This is an attempt to convert the SJA1105 driver to the YAML schema.

The SJA1105 driver has some custom device tree properties which caused
validation problems in the previous attempt:
https://patchwork.kernel.org/project/netdevbpf/patch/20210531234735.1582031-1-olteanv@gmail.com/

So now we are removing them, hoping that this will make the conversion
easier to accept.

In order to do that, we introduce a new PHY interface type, "reverse RMII",
which is like "reverse MII" (aka MII as a PHY) but for the reduced data
width version of the protocol. This is a direct replacement for an rmii
fixed-link. Now, rmii fixed-link interfaces behave as a MAC, and rev-rmii
fixed-link interfaces behave as a PHY.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: dsa: sja1105: convert to YAML schema

Since the sja1105 driver no longer has any custom device tree
properties, the conversion is trivial.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: determine PHY/MAC role from PHY interface type

Now that both RevMII as well as RevRMII exist, we can deprecate the
sja1105,role-mac and sja1105,role-phy properties and simply let the user
select that a port operates in MII PHY role by using
phy-mode = "rev-mii";
or in RMII PHY role by using
phy-mode = "rev-rmii";

There are no fixed-link MII or RMII properties in mainline device trees,
and the setup itself is fairly uncommon, so there shouldn't be risks of
breaking compatibility.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: apply RGMII delays based on the fixed-link property

The sja1105 driver has an intermediate way of determining whether the
RGMII delays should be applied by the PHY or by itself: by looking at
the port role (PHY or MAC). The port can be put in the PHY role either
explicitly (sja1105,role-phy) or implicitly (fixed-link).

We want to deprecate the sja1105,role-phy property, so all that remains
is the fixed-link property. Introduce a "fixed_link" array of booleans
in the driver, and use that to determine whether RGMII delays must be
applied or not.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: introduce PHY_INTERFACE_MODE_REVRMII

The "reverse RMII" protocol name is a personal invention, derived from
"reverse MII".

Just like MII, RMII is an asymmetric protocol in that a PHY behaves
differently than a MAC. In the case of RMII, for example:
- the 50 MHz clock signals are either driven by the MAC or by an
external oscillator (but never by the PHY).
- the PHY can transmit extra in-band control symbols via RXD[1:0] which
the MAC is supposed to understand, but a PHY isn't.

The "reverse MII" protocol is not standardized either, except for this
web document:
https://www.eetimes.com/reverse-media-independent-interface-revmii-block-architecture/#

In short, it means that the Ethernet controller speaks the 4-bit data
parallel protocol from the perspective of a PHY (it acts like a PHY).
This might mean that it implements clause 22 compatible registers,
although that is optional - the important bit is that its pins can be
connected to an MII MAC and it will 'just work'.

In this discussion thread:
https://lore.kernel.org/netdev/20210201214515.cx6ivvme2tlquge2@skbuf/

we agreed that it would be an abuse of terms to use the "RevMII" name
for anything than the 4-bit parallel MII protocol. But since all the
same concepts can be applied to the 2-bit Reduced MII protocol as well,
here we are introducing a "Reverse RMII" protocol. This means: "behave
like an RMII PHY".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ice: fix clang warning regarding deadcode.DeadStores

clang generates deadcode.DeadStores warnings when a variable
is used to read a value, but then that value isn't used later
in the code. Fix this warning.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: downgrade error print to debug print

Failing to add or remove LLDP filter doesn't seem to be a fatal
error, so downgrade the dev_err message to a dev_dbg message.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Detect and report unsupported module power levels

Determine whether an unsupported power configuration is preventing link
establishment by storing and checking the link_cfg_err_byte. Print error
messages when module power levels are unsupported. Also add a new flag
bit to prevent spamming said error messages.

Co-developed-by: Jeb Cramer <jeb.j.cramer@intel.com>
Signed-off-by: Jeb Cramer <jeb.j.cramer@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: (re)initialize NVM fields when rebuilding

After performing a flash update, a device EMP reset may occur. This
reset will cause the newly downloaded firmware to be initialized. When
this happens, the driver still reports the previous NVM version
information.

This is because the NVM versions are cached within the hw structure.
This can be confusing, as the new firmware is in fact running in this
case.

Handle this by calling ice_init_nvm when rebuilding the driver state.
This will update the flash version information and ensures that the
current values are displayed when reporting the NVM versions to the
stack.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: wait for reset before reporting devlink info

Requesting device firmware information while the device is busy cleaning
up after a reset can result in an unexpected failure:

This occurs because the command is attempting to access the device
AdminQ while it is down. Resolve this by having the command wait for
a while until the reset is complete. To do this, introduce
a reset_wait_queue and associated helper function "ice_wait_for_reset".

This helper will use the wait queue to sleep until the driver is done
rebuilding. Use of a wait queue is preferred because the potential sleep
duration can be several seconds.

To ensure that the thread wakes up properly, a new wake_up call is added
during all code paths which clear the reset state bits associated with
the driver rebuild flow.

Using this ensures that tools can request device information without
worrying about whether the driver is cleaning up from a reset.
Specifically, it is expected that a flash update could result in
a device reset, and it is better to delay the response for information
until the reset is complete rather than exit with an immediate failure.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: add error message when pldmfw_flash_image fails

When flashing a new firmware image onto the device, the pldmfw library
parses the image contents looking for a matching record. If no record
can be found, the function reports an error of -ENOENT. This can produce
a very confusing error message and experience for the user:

  $devlink dev flash pci/0000:ab:00.0 file image.bin
  devlink answers: No such file or directory

This is because the ENOENT error code is interpreted as a missing file
or directory. The pldmfw library does not have direct access to the
extack pointer as it is generic and non-netdevice specific. The only way
that ENOENT is returned by the pldmfw library is when no record matches.

Catch this specific error and report a suitable extended ack message:

  $devlink dev flash pci/0000:ab:00.0 file image.bin
  Error: ice: Firmware image has no record matching this device
  devlink answers: No such file or directory

In addition, ensure that we log an error message to the console whenever
this function fails. Because our driver specific PLDM operation
functions potentially set the extended ACK message, avoid overwriting
this with a generic message.

This change should result in an improved experience when attempting to
flash an image that does not have a compatible record.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: add extack when unable to read device caps

When filling out information for the DEVLINK_CMD_INFO_GET, the driver
needs to read some device capabilities. Add an extack message to
properly inform the caller of the failure, as we do for other failures
in this function.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: use static inline for dummy functions

Trivial:
The driver had previously attempted to use #define
macros to make functions that have no use in certain
configs disappear. Using static inlines instead allows
for certain static checkers to process the code better,
and results in no functional change.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Re-organizes reqstd/avail {R, T}XQ check/code for efficiency

If user has explicitly requested the number of {R,T}XQs, then it is
unnecessary to get the count of already available {R,T}XQs from the
PF avail_{r,t}xqs bitmap. This value will get overridden by user specified
value in any case.

Re-organize this code for improving the flow, readability and efficiency.
This scope of improvement was found during the review of the ICE driver
code.

Fixes: 87324e747fde ("ice: Implement ethtool ops for channels")
Cc: intel-wired-lan@lists.osuosl.org
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: set the value of global config lock timeout longer

It may need hold Global Config Lock a longer time when download DDP
package file, extend the timeout value to 5000ms to ensure that
download can be finished before other AQ command got time to run,
this will fix the issue below when probe the device, 5000ms is a test
value that work with both Backplane and BreakoutCable NVM image:

ice 0000:f4:00.0: VSI 12 failed lan queue config, error ICE_ERR_CFG
ice 0000:f4:00.0: Failed to delete VSI 12 in FW - error: ICE_ERR_AQ_TIMEOUT
ice 0000:f4:00.0: probe failed due to setup PF switch: -12
ice: probe of 0000:f4:00.0 failed with error -12

Signed-off-by: Liwei Song <liwei.song@windriver.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Refactor VIRTCHNL_OP_CONFIG_VSI_QUEUES handling

Currently, when a VF requests queue configuration via
VIRTCHNL_OP_CONFIG_VSI_QUEUES the PF driver expects that this message
will only be called once and we always assume the queues being
configured start from 0. This is incorrect and is causing issues when
a VF tries to send this message for multiple queue blocks. Fix this by
using the queue_id specified in the virtchnl message and allowing for
individual Rx and/or Tx queues to be configured.

Also, reduce the duplicated for loops for configuring the queues by
moving all the logic into a single for loop.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Refactor ice_setup_rx_ctx

Move AF_XDP logic and buffer allocation out of ice_setup_rx_ctx() to a
new function ice_vsi_cfg_rxq(), so the function actually sets up the Rx
context.

Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
Co-developed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>

ice: Save VF's MAC across reboot

If a VM reboots and/or VF driver is unloaded, its cached hardware MAC
address (hw_lan_addr.addr) is cleared in some cases. If the VF is
trusted, then the PF driver allows the VF to clear its old MAC address
even if this MAC was configured by a host administrator. If the VF is
untrusted, then the PF driver allows the VF to clear its old MAC
address only if the host admin did not set it.

For the trusted VF case, this is unexpected and will cause issues
because the host configured MAC (i.e. via XML) will be cleared on VM
reboot. For the untrusted VF case, this is done to be consistent and it
will allow the VF to keep the same MAC across VM reboot.

Fix this by introducing dev_lan_addr to the VF structure. This will be
the VF's MAC address when it's up and running and in most cases will be
the same as the hw_lan_addr. However, to address the VM reboot and
unload/reload problem, the driver will never allow the hw_lan_addr to be
cleared via VIRTCHNL_OP_DEL_ETH_ADDR. When the VF's MAC is changed, the
dev_lan_addr and hw_lan_addr will always be updated with the same value.
The only ways the VF's MAC can change are the following:

- Set the VF's MAC administratively on the host via iproute2.
- If the VF is trusted and changes/sets its own MAC.
- If the VF is untrusted and the host has not set the MAC via iproute2.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Manage VF's MAC address for both legacy and new cases

Currently there is no way for a VF driver to specify if it wants to
change it's hardware address. New bits are being added to virtchnl.h
in struct virtchnl_ether_addr that allow for the VF to correctly
communicate this information. However, legacy VF drivers that don't
support the new virtchnl.h bits still need to be supported. Make a
best effort attempt at saving the VF's primary/device address in the
legacy case and depend on the VIRTCHNL_ETHER_ADDR_PRIMARY type for
the new case.

Legacy case - If a unicast MAC is being added and the
hw_lan_addr.addr is empty, then populate it. This assumes that the
address is the VF's hardware address. If a unicast MAC is being
added and the hw_lan_addr.addr is not empty, then cache it in the
legacy_last_added_umac.addr. If a unicast MAC is being deleted and it
matches the hw_lan_addr.addr, then zero the hw_lan_addr.addr.
Also, if the legacy_last_added_umac.addr has not expired, copy the
legacy_last_added_umac.addr into the hw_lan_addr.addr. This is done
because we cannot guarantee the order of VIRTCHNL_OP_ADD_ETH_ADDR and
VIRTCHNL_OP_DEL_ETH_ADDR.

New case - If a unicast MAC is being added and it's specified as
VIRTCHNL_ETHER_ADDR_PRIMARY, then replace the current
hw_lan_addr.addr. If a unicast MAC is being deleted and it's type
is specified as VIRTCHNL_ETHER_ADDR_PRIMARY, then zero the
hw_lan_addr.addr.

Untrusted VFs - Only allow above legacy/new changes to their
hardware address if the PF has not set it administratively via
iproute2.

Trusted VFs - Always allow above legacy/new changes to their
hardware address even if the PF has administratively set it via
iproute2.

Also, change the variable dflt_lan_addr to hw_lan_addr to clearly
represent the purpose of this variable since it's purpose is to
act as a hardware programmed MAC address for the VF.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

virtchnl: Use pad byte in virtchnl_ether_addr to specify MAC type

Currently, there is no way for a VF driver to specify that it wants to
change its device/primary unicast MAC address. This makes it
difficult/impossible for the PF driver to track the VF's device/primary
unicast MAC address, which is used for VM/VF reboot and displaying on
the host. Fix this by using 2 bits of a pad byte in the
virtchnl_ether_addr structure so the VF can specify what type of MAC
it's adding/deleting.

Below are the values that should be used by all VF drivers going
forward.

VIRTCHNL_ETHER_ADDR_LEGACY(0):
- The type should only ever be 0 for legacy AVF drivers (i.e.
  drivers that don't support the new type bits). The PF drivers
  will track VF's device/primary unicast MAC, but this will only
  be a best effort.

VIRTCHNL_ETHER_ADDR_PRIMARY(1):
- This type should only be used when the VF is changing their
  device/primary unicast MAC. It should be used for both delete
  and add cases related to the device/primary unicast MAC.

VIRTCHNL_ETHER_ADDR_EXTRA(2):
- This type should be used when the VF is adding and/or deleting
  MAC addresses that are not the device/primary unicast MAC. For
  example, extra unicast addresses and multicast addresses
  assuming the PF supports "extra" addresses at all.

If a PF is parsing the type field of the virtchnl_ether_addr, then it
should use the VIRTCHNL_ETHER_ADDR_TYPE_MASK to mask the first two bits
of the type field since 0, 1, and 2 are the only valid values.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

net: dsa: xrs700x: allow HSR/PRP supervision dupes for node_table

Add an inbound policy filter which matches the HSR/PRP supervision
MAC range and forwards to the CPU port without discarding duplicates.
This is required to correctly populate time_in[A] and time_in[B] in the
HSR/PRP node_table. Leave the policy disabled by default and
enable/disable it when joining/leaving hsr.

Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net:cxgb3: fix incorrect work cancellation

In my last changes in commit 5e0b8928927f I introduced a copy-paste bug,
leading to cancel twice qresume_task work for OFLD queue, and never the
one for CTRL queue. This patch cancels correctly both works.

Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: avoid link re-train during TC-MQPRIO configuration

When configuring TC-MQPRIO offload, only turn off netdev carrier and
don't bring physical link down in hardware. Otherwise, when the
physical link is brought up again after configuration, it gets
re-trained and stalls ongoing traffic.

Also, when firmware is no longer accessible or crashed, avoid sending
FLOWC and waiting for reply that will never come.

Fix following hung_task_timeout_secs trace seen in these cases.

INFO: task tc:20807 blocked for more than 122 seconds.
Tainted: G S 5.13.0-rc3+ #122
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:tc state:D stack:14768 pid:20807 ppid: 19366 flags:0x00000000
Call Trace:
__schedule+0x27b/0x6a0
schedule+0x37/0xa0
schedule_preempt_disabled+0x5/0x10
__mutex_lock.isra.14+0x2a0/0x4a0
? netlink_lookup+0x120/0x1a0
? rtnl_fill_ifinfo+0x10f0/0x10f0
__netlink_dump_start+0x70/0x250
rtnetlink_rcv_msg+0x28b/0x380
? rtnl_fill_ifinfo+0x10f0/0x10f0
? rtnl_calcit.isra.42+0x120/0x120
netlink_rcv_skb+0x4b/0xf0
netlink_unicast+0x1a0/0x280
netlink_sendmsg+0x216/0x440
sock_sendmsg+0x56/0x60
__sys_sendto+0xe9/0x150
? handle_mm_fault+0x6d/0x1b0
? do_user_addr_fault+0x1c5/0x620
__x64_sys_sendto+0x1f/0x30
do_syscall_64+0x3c/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f7f73218321
RSP: 002b:00007ffd19626208 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 000055b7c0a8b240 RCX: 00007f7f73218321
RDX: 0000000000000028 RSI: 00007ffd19626210 RDI: 0000000000000003
RBP: 000055b7c08680ff R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000055b7c085f5f6
R13: 000055b7c085f60a R14: 00007ffd19636470 R15: 00007ffd196262a0

Fixes: b1396c2bd675 ("cxgb4: parse and configure TC-MQPRIO offload")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_htb: fix refcount leak in htb_parent_to_leaf_offload

The commit ae81feb7338c ("sch_htb: fix null pointer dereference
on a null new_q") fixes a NULL pointer dereference bug, but it
is not correct.

Because htb_graft_helper properly handles the case when new_q
is NULL, and after the previous patch by skipping this call
which creates an inconsistency : dev_queue->qdisc will still
point to the old qdisc, but cl->parent->leaf.q will point to
the new one (which will be noop_qdisc, because new_q was NULL).
The code is based on an assumption that these two pointers are
the same, so it can lead to refcount leaks.

The correct fix is to add a NULL pointer check to protect
qdisc_refcount_inc inside htb_parent_to_leaf_offload.

Fixes: ae81feb7338c ("sch_htb: fix null pointer dereference on a null new_q")
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Suggested-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: mrp: Update ring transitions.

According to the standard IEC 62439-2, the number of transitions needs
to be counted for each transition 'between' ring state open and ring
state closed and not from open state to closed state.

Therefore fix this for both ring and interconnect ring.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: use get/put_unaligned helpers for MAC address handling

The supplied buffer for the MAC address might not be aligned. Thus
doing a 32bit (or 16bit) access could be on an unaligned address. For
now, enetc is only used on aarch64 which can do unaligned accesses, thus
there is no error. In any case, be correct and use the get/put_unaligned
helpers.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'hdlc_x25-cleanups'

Peng Li says:

====================
net: hdlc_x25: clean up some code style issues

This patchset clean up some code style issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: hdlc_x25: fix the alignment issue

Alignment should match open parenthesis.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hdlc_x25: fix the code issue about "if..else.."

According to the chackpatch.pl, else should follow close brace '}'.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hdlc_x25: add some required spaces

Add spaces required around that '='.
Add space required after that ','.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hdlc_x25: move out assignment in if condition

Should not use assignment in if condition.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hdlc_x25: remove unnecessary out of memory message

This patch removes unnecessary out of memory message,
to fix the following checkpatch.pl warning:
"WARNING: Possible unnecessary 'out of memory' message"

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hdlc_x25: remove redundant blank lines

This patch removes some redundant blank lines.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
1GbE Intel Wired LAN Driver Updates 2021-06-04

This series contains updates to igc driver only.

Sasha utilizes the newly introduced ethtool_sprintf() function, removes
unused defines, and fixes indentation.

Muhammad adds support for hardware VLAN insertion and stripping.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2021-06-04

This series contains updates to virtchnl header file and ice driver.

Brett fixes VF being unable to request a different number of queues then
allocated and adds clearing of VF_MBX_ATQLEN register for VF reset.

Haiyue handles error of rebuilding VF VSI during reset.

Paul fixes reporting of autoneg to use the PHY capabilities.

Dave allows LLDP packets without priority of TC_PRIO_CONTROL to be
transmitted.

Geert Uytterhoeven adds explicit padding to virtchnl_proto_hdrs
structure in the virtchnl header file.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'wireguard-fixes'

Jason A. Donenfeld says:

====================
wireguard fixes for 5.13-rc5

Here are bug fixes to WireGuard for 5.13-rc5:

1-2,6) These are small, trivial tweaks to our test harness.

3) Linus thinks -O3 is still dangerous to enable. The code gen wasn't so
   much different with -O2 either.

4) We were accidentally calling synchronize_rcu instead of
   synchronize_net while holding the rtnl_lock, resulting in some rather
   large stalls that hit production machines.

5) Peer allocation was wasting literally hundreds of megabytes on real
   world deployments, due to oddly sized large objects not fitting
   nicely into a kmalloc slab.

7-9) We move from an insanely expensive O(n) algorithm to a fast O(1)
     algorithm, and cleanup a massive memory leak in the process, in
     which allowed ips churn would leave danging nodes hanging around
     without cleanup until the interface was removed. The O(1) algorithm
     eliminates packet stalls and high latency issues, in addition to
     bringing operations that took as much as 10 minutes down to less
     than a second.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

wireguard: allowedips: free empty intermediate nodes when removing single node

When removing single nodes, it's possible that that node's parent is an
empty intermediate node, in which case, it too should be removed.
Otherwise the trie fills up and never is fully emptied, leading to
gradual memory leaks over time for tries that are modified often. There
was originally code to do this, but was removed during refactoring in
2016 and never reworked. Now that we have proper parent pointers from
the previous commits, we can implement this properly.

In order to reduce branching and expensive comparisons, we want to keep
the double pointer for parent assignment (which lets us easily chain up
to the root), but we still need to actually get the parent's base
address. So encode the bit number into the last two bits of the pointer,
and pack and unpack it as needed. This is a little bit clumsy but is the
fastest and less memory wasteful of the compromises. Note that we align
the root struct here to a minimum of 4, because it's embedded into a
larger struct, and we're relying on having the bottom two bits for our
flag, which would only be 16-bit aligned on m68k.

The existing macro-based helpers were a bit unwieldy for adding the bit
packing to, so this commit replaces them with safer and clearer ordinary
functions.

We add a test to the randomized/fuzzer part of the selftests, to free
the randomized tries by-peer, refuzz it, and repeat, until it's supposed
to be empty, and then then see if that actually resulted in the whole
thing being emptied. That combined with kmemcheck should hopefully make
sure this commit is doing what it should. Along the way this resulted in
various other cleanups of the tests and fixes for recent graphviz.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

wireguard: allowedips: allocate nodes in kmem_cache

The previous commit moved from O(n) to O(1) for removal, but in the
process introduced an additional pointer member to a struct that
increased the size from 60 to 68 bytes, putting nodes in the 128-byte
slab. With deployed systems having as many as 2 million nodes, this
represents a significant doubling in memory usage (128 MiB -> 256 MiB).
Fix this by using our own kmem_cache, that's sized exactly right. This
also makes wireguard's memory usage more transparent in tools like
slabtop and /proc/slabinfo.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

wireguard: allowedips: remove nodes in O(1)

Previously, deleting peers would require traversing the entire trie in
order to rebalance nodes and safely free them. This meant that removing
1000 peers from a trie with a half million nodes would take an extremely
long time, during which we're holding the rtnl lock. Large-scale users
were reporting 200ms latencies added to the networking stack as a whole
every time their userspace software would queue up significant removals.
That's a serious situation.

This commit fixes that by maintaining a double pointer to the parent's
bit pointer for each node, and then using the already existing node list
belonging to each peer to go directly to the node, fix up its pointers,
and free it with RCU. This means removal is O(1) instead of O(n), and we
don't use gobs of stack.

The removal algorithm has the same downside as the code that it fixes:
it won't collapse needlessly long runs of fillers. We can enhance that
in the future if it ever becomes a problem. This commit documents that
limitation with a TODO comment in code, a small but meaningful
improvement over the prior situation.

Currently the biggest flaw, which the next commit addresses, is that
because this increases the node size on 64-bit machines from 60 bytes to
68 bytes. 60 rounds up to 64, but 68 rounds up to 128. So we wind up
using twice as much memory per node, because of power-of-two
allocations, which is a big bummer. We'll need to figure something out
there.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

wireguard: allowedips: initialize list head in selftest

The randomized trie tests weren't initializing the dummy peer list head,
resulting in a NULL pointer dereference when used. Fix this by
initializing it in the randomized trie test, just like we do for the
static unit test.

While we're at it, all of the other strings like this have the word
"self-test", so add it to the missing place here.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

wireguard: peer: allocate in kmem_cache

With deployments having upwards of 600k peers now, this somewhat heavy
structure could benefit from more fine-grained allocations.
Specifically, instead of using a 2048-byte slab for a 1544-byte object,
we can now use 1544-byte objects directly, thus saving almost 25%
per-peer, or with 600k peers, that's a savings of 303 MiB. This also
makes wireguard's memory usage more transparent in tools like slabtop
and /proc/slabinfo.

Fixes: 8b5553ace83c ("wireguard: queueing: get rid of per-peer ring buffers")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

wireguard: use synchronize_net rather than synchronize_rcu

Many of the synchronization points are sometimes called under the rtnl
lock, which means we should use synchronize_net rather than
synchronize_rcu. Under the hood, this expands to using the expedited
flavor of function in the event that rtnl is held, in order to not stall
other concurrent changes.

This fixes some very, very long delays when removing multiple peers at
once, which would cause some operations to take several minutes.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

wireguard: do not use -O3

Apparently, various versions of gcc have O3-related miscompiles. Looking
at the difference between -O2 and -O3 for gcc 11 doesn't indicate
miscompiles, but the difference also doesn't seem so significant for
performance that it's worth risking.

Link: https://lore.kernel.org/lkml/CAHk-=wjuoGyxDhAF8SsrTkN0-YfCx7E6jUN3ikC_tn2AKWTTsA@mail.gmail.com/
Link: https://lore.kernel.org/lkml/CAHmME9otB5Wwxp7H8bR_i2uH2esEMvoBMC8uEXBMH9p0q1s6Bw@mail.gmail.com/
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

wireguard: selftests: make sure rp_filter is disabled on vethc

Some distros may enable strict rp_filter by default, which will prevent
vethc from receiving the packets with an unrouteable reverse path address.

Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

wireguard: selftests: remove old conntrack kconfig value

On recent kernels, this config symbol is no longer used.

Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: Return the correct errno code

When kalloc or kmemdup failed, should return ENOMEM rather than ENOBUF.

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mptcp-timestamps'

Mat Martineau says:

====================
mptcp: Add timestamp support

Enable the SO_TIMESTAMP and SO_TIMESTAMPING socket options for MPTCP
sockets and add receive path cmsg support for timestamps.

Patches 1, 2, and 5 expose existing sock and tcp helpers for timestamps
(no new EXPORT_SYMBOLS()s).

Patch 3 propagates timestamp options to subflows.

Patch 4 cleans up MPTCP handling of SOL_SOCKET options.

Patch 6 adds timestamp csmg data when receiving on sockets that have
been configured for timestamps.

Patch 7 adds self test coverage for timestamps.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mptcp_connect: add SO_TIMESTAMPNS cmsg support

This extends the existing setsockopt test case to also check for cmsg
timestamps.

mptcp_connect will abort/fail if the setockopt was passed but the
timestamp cmsg isn't present after successful recvmsg().

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: receive path cmsg support

This adds support for SO_TIMESTAMP(NS). Timestamps are passed to
userspace in the same way as for plain tcp sockets.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: export timestamp helpers for mptcp

MPTCP is builtin, so no need to add EXPORT_SYMBOL()s.

It will be used to support SO_TIMESTAMP(NS) ancillary
messages in the mptcp receive path.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: setsockopt: handle SOL_SOCKET in one place only

Move the pre-check to the function that handles all SOL_SOCKET values.

At this point there is complete coverage for all values that were
accepted by the pre-check.

BUSYPOLL functions are accepted but will not have any functionality
yet until its clear how the expected mptcp behaviour should look like.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: sockopt: propagate timestamp request to subflows

This adds support for TIMESTAMP(NS) setsockopt.

This doesn't make things work yet, because the mptcp receive path
doesn't convert the skb timestamps to cmsgs for userspace consumption.

receive path cmsg support is added ina followup patch.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sock: expose so_timestamping options for mptcp

Similar to previous patch: expose SO_TIMESTAMPING helper so we do not
have to copy & paste this into the mptcp core.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>