git.proxmox.com Git - mirror_ubuntu-jammy-kernel.git/log

mac80211: add STBC encoding to ieee80211_parse_tx_radiotap

This patch adds support for STBC encoding to the radiotap tx parse
function. Prior to this change adding the STBC flag to the radiotap
header did not encode frames with STBC.

Signed-off-by: Philipp Borgers <borgers@mi.fu-berlin.de>
Link: https://lore.kernel.org/r/20210125150744.83065-1-borgers@mi.fu-berlin.de
[use u8_get_bits/u32_encode_bits instead of manually shifting]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: minstrel_ht: remove sample rate switching code for constrained devices

This was added to mitigate the effects of too much sampling on devices that
use a static global fallback table instead of configurable multi-rate retry.
Now that the sampling algorithm is improved, this code path no longer performs
any better than the standard probing on affected devices.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-6-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: minstrel_ht: show sampling rates in debugfs

This makes it easier to see what rates are going to be tested next

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-5-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: minstrel_ht: significantly redesign the rate probing strategy

The biggest flaw in current minstrel_ht is the fact that it needs way too
many probing packets to be able to quickly find the best rate.
Depending on the wifi hardware and operating mode, this can significantly
reduce throughput when not operating at the highest available data rate.

In order to be able to significantly reduce the amount of rate sampling,
we need a much smarter selection of probing rates.

The new approach introduced by this patch maintains a limited set of
available rates to be tested during a statistics window.

They are split into distinct categories:
- MINSTREL_SAMPLE_TYPE_INC - incremental rate upgrade:
  Pick the next rate group and find the first rate that is faster than
  the current max. throughput rate
- MINSTREL_SAMPLE_TYPE_JUMP - random testing of higher rates:
  Pick a random rate from the next group that is faster than the current
  max throughput rate. This allows faster adaptation when the link changes
  significantly
- MINSTREL_SAMPLE_TYPE_SLOW - test a rate between max_prob, max_tp2 and
  max_tp in order to reduce the gap between them

In order to prioritize sampling, every 6 attempts are split into 3x INC,
2x JUMP, 1x SLOW.

Available rates are checked and refilled on every stats window update.

With this approach, we finally get a very small delta in throughput when
comparing setting the optimal data rate as a fixed rate vs normal rate
control operation.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-4-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: minstrel_ht: reduce the need to sample slower rates

In order to more gracefully be able to fall back to lower rates without too
much throughput fluctuations, initialize all untested rates below tested ones
to the maximum probabilty of higher rates.
Usually this leads to untested lower rates getting initialized with a
probability value of 100%, making them better candidates for fallback without
having to rely on random probing

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-3-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: minstrel_ht: update total packets counter in tx status path

Keep the update in one place and prepare for further rework

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-2-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: minstrel_ht: use bitfields to encode rate indexes

Get rid of a lot of divisions and modulo operations
Reduces code size and improves performance

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

cfg80211: initialize reg_rule in __freq_reg_info()

Sparse started warning on this function because we can potentially
return an uninitialized value. The reason is that if the caller
passes a min_bw value that is higher then the last value in bws[], we
will not go into the loop and reg_rule will remain initialized. This
cannot happen because the only caller of this function uses either 1
or 20 in min_bw, but the function will be more robust if we
pre-initialize the value.

Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20210204154439.6c884ea7281c.I257278d03b0c1ae0aa6631672cfa48f1a95d5996@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: fix potential overflow when multiplying to u32 integers

The multiplication of the u32 variables tx_time and estimated_retx is
performed using a 32 bit multiplication and the result is stored in
a u64 result. This has a potential u32 overflow issue, so avoid this
by casting tx_time to a u64 to force a 64 bit multiply.

Addresses-Coverity: ("Unintentional integer overflow")
Fixes: 050ac52cbe1f ("mac80211: code for on-demand Hybrid Wireless Mesh Protocol")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210205175352.208841-1-colin.king@canonical.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: enable QoS support for nl80211 ctrl port

This patch unifies sending control port frames
over nl80211 and AF_PACKET sockets a little more.

Before this patch, EAPOL frames got QoS prioritization
only when using AF_PACKET sockets.

__ieee80211_select_queue only selects a QoS-enabled queue
for control port frames, when the control port protocol
is set correctly on the skb. For the AF_PACKET path this
works, but the nl80211 path used ETH_P_802_3.

Another check for injected frames in wme.c then prevented
the QoS TID to be copied in the frame.

In order to fix this, get rid of the frame injection marking
for nl80211 ctrl port and set the correct ethernet protocol.

Please note:
An erlier version of this path tried to prevent
frame aggregation for control port frames in order to speed up
the initial connection setup a little. This seemed to cause
issues on my older Intel dvm-based hardware, and was therefore
removed again. Future commits which try to reintroduce this
have to check carefully how hw behaves with aggregated and
non-aggregated traffic for the same TID.
My NIC: Intel(R) Centrino(R) Ultimate-N 6300 AGN, REV=0x74

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Link: https://lore.kernel.org/r/20210206115112.567881-1-markus.theil@tu-ilmenau.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

cfg80211: remove unused callback

The ieee80211 class registers a callback which actually does nothing.
Given that the callback is optional, and all its accesses are protected
by a NULL check, remove it entirely.

Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Link: https://lore.kernel.org/r/20210208113356.4105-1-mcroce@linux.microsoft.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

tcp: Sanitize CMSG flags and reserved args in tcp_zerocopy_receive.

Explicitly define reserved field and require it and any subsequent
fields to be zero-valued for now. Additionally, limit the valid CMSG
flags that tcp_zerocopy_receive accepts.

Fixes: 7eeba1706eba ("tcp: Add receive timestamp support for receive zerocopy.")
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Suggested-by: David Ahern <dsahern@gmail.com>
Suggested-by: Leon Romanovsky <leon@kernel.org>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: handle tx before rx in napi poll

Cleaning up tx descriptors first increases the chance that
rtl_rx() can allocate new skb's from the cache.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix dev_ifsioc_locked() race condition

dev_ifsioc_locked() is called with only RCU read lock, so when
there is a parallel writer changing the mac address, it could
get a partially updated mac address, as shown below:

Thread 1 Thread 2
// eth_commit_mac_addr_change()
memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
// dev_ifsioc_locked()
memcpy(ifr->ifr_hwaddr.sa_data,
dev->dev_addr,...);

Close this race condition by guarding them with a RW semaphore,
like netdev_get_name(). We can not use seqlock here as it does not
allow blocking. The writers already take RTNL anyway, so this does
not affect the slow path. To avoid bothering existing
dev_set_mac_address() callers in drivers, introduce a new wrapper
just for user-facing callers on ioctl and rtnetlink paths.

Note, bonding also changes slave mac addresses but that requires
a separate patch due to the complexity of bonding code.

Fixes: 3710becf8a58 ("net: RCU locking for simple ioctl()")
Reported-by: "Gong, Sishuai" <sishuai@purdue.edu>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: fix interrupt mask/unmask skip condition

The condition should be skipped if CPU ID equal to nthreads.
The patch doesn't fix any actual issue since
nthreads = min_t(unsigned int, num_present_cpus(), MVPP2_MAX_THREADS).
On all current Armada platforms, the number of CPU's is
less than MVPP2_MAX_THREADS.

Fixes: e531f76757eb ("net: mvpp2: handle cases where more CPUs are available than s/w threads")
Reported-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'am65-cpsw-nuss-switchdev-driver'

Vignesh Raghavendra says:

====================
net: ti: am65-cpsw-nuss: Add switchdev driver

This series adds switchdev support for AM65 CPSW NUSS driver to support
multi port CPSW present on J721e and AM64 SoCs.
It adds devlink hook to switch b/w switch mode and multi mac mode.

v2:
Rebased on latest net-next
Update patch 1/4 with rationale for using devlink
====================

docs: networking: ti: Add driver doc for AM65 NUSS switch driver

J721e, J7200 and AM64 have multi port switches which can work in multi
mac mode and in switch mode. Add documentation explaining how to use
different modes.

Borrowed from:
Documentation/networking/device_drivers/ethernet/ti/cpsw_switchdev.rst

Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ti: am65-cpsw-nuss: Add switchdev support

J721e, J7200 and AM64 have multi port switches which can work in multi
mac mode and in switch mode. Add support for configuring this HW in
switch mode using devlink and switchdev notifiers.

Support is similar to existing CPSW switchdev implementation of TI's 32 bit
platform like AM33/AM43/AM57.

To enable switch mode:
devlink dev param set platform/8000000.ethernet name switch_mode value true cmode runtime

All configuration is implemented via switchdev API and notifiers.
Supported:
      - SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS
      - SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS
      - SWITCHDEV_ATTR_ID_PORT_STP_STATE
      - SWITCHDEV_OBJ_ID_PORT_VLAN
      - SWITCHDEV_OBJ_ID_PORT_MDB
      - SWITCHDEV_OBJ_ID_HOST_MDB

Hence AM65 CPSW switchdev driver supports:
     - FDB offloading
     - MDB offloading
     - VLAN filtering and offloading
     - STP

Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ti: am65-cpsw-nuss: Add netdevice notifiers

Register netdevice notifiers in order to receive notification when
individual MAC ports are added to the HW bridge.

Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ti: am65-cpsw-nuss: Add devlink support

AM65 NUSS ethernet switch on K3 devices can be configured to work either
in independent mac mode where each port acts as independent network
interface (multi mac) or switch mode.

Add devlink hooks to provide a way to switch b/w these modes.

Rationale to use devlink instead of defaulting to bridge mode is that
SoC use cases require to support multiple independent MAC ports with no
switching so that users can use software bridges with multi-mac
configuration (e.g: to support LAG, HSR/PRP, etc). Also, switching
between multi mac and switch mode requires significant Port and ALE
reconfiguration, therefore is easier to be made as part of mode change
devlink hooks. It also allows to keep user interface similar to what
was implemented for the previous generation of TI CPSW IP
(on AM33/AM43/AM57 SoCs).

Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bcm4908_enet-post-review-fixes'

Rafał Miłecki says:

====================
bcm4908_enet: post-review fixes

V2 of my BCM4908 Ethernet patchset was applied to the net-next.git and
it was later that is received some extra reviews. I'm sending patches
that handle pointed out issues.

David: earler I missed that V2 was applied and I sent V3 and V4 of my
inital patchset. Sorry for that. I think it's the best to ignore V3 and
V4 I sent and proceed with this fixes patchset instead.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: broadcom: bcm4908_enet: fix endianness in xmit code

Use le32_to_cpu() for reading __le32 struct field filled by hw.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: broadcom: bcm4908_enet: fix received skb length

Use ETH_FCS_LEN instead of magic value and drop incorrect + 2

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: broadcom: bcm4908_enet: fix minor typos

1. Fix "ensable" typo noticed by Andrew
2. Fix chipset name in the struct net_device_ops variable

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: broadcom: bcm4908_enet: drop "inline" from C functions

It seems preferred to let compiler optimize code if applicable.
While at it drop unused enet_umac_maskset().

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: broadcom: bcm4908_enet: drop unneeded memset()

dma_alloc_coherent takes care of zeroing allocated memory

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: broadcom: rename BCM4908 driver & update DT binding

compatible string was updated to match normal naming convention so
update driver as well

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: bcm4908-enet: include ethernet-controller.yaml

It should be /included/ by every Ethernet controller binding. It adds
support for various generic properties.

Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: rename BCM4908 Ethernet binding

Rob pointed out that a normal convention is "brcm,bcm4908-enet" so
update whole binding to match it.

Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kern
el/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2021-02-11

Here's the main bluetooth-next pull request for 5.12:

- Add support for advertising monitor offliading using Microsoft
vendor extensions
- Add firmware download support for MediaTek MT7921U USB devices
- Suspend-related fixes for Qualcomm devices
- Add support for Intel GarfieldPeak controller
- Various other smaller fixes & cleanups

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'marvell-cn10k'

Geetha sowjanya says:

====================
Add Marvell CN10K support

The current admin function (AF) driver and the netdev driver supports
OcteonTx2 silicon variants. The same OcteonTx2's
Resource Virtualization Unit (RVU) is carried forward to the next-gen
silicon ie OcteonTx3, with some changes and feature enhancements.

This patch set adds support for OcteonTx3 (CN10K) silicon and gets
the drivers to the same level as OcteonTx2. No new OcteonTx3 specific
features are added.

Changes cover below HW level differences
- PCIe BAR address changes wrt shared mailbox memory region
- Receive buffer freeing to HW
- Transmit packet's descriptor submission to HW
- Programmable HW interface identifiers (channels)
- Increased MTU support
- A Serdes MAC block (RPM) configuration

v5-v6
Rebased on top of latest net-next branch.

v4-v5
Fixed sparse warnings.

v3-v4
Fixed compiler warnings.

v2-v3
Reposting as a single thread.
Rebased on top latest net-next branch.

v1-v2
Fixed check-patch reported issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: cn10k: MAC internal loopback support

MAC on CN10K silicon support loopback for selftest or debug purposes.
This patch does necessary configuration to loopback packets upon receiving
request from LMAC mapped RVU PF's netdev via mailbox.

Also MAC (CGX) on OcteonTx2 silicon variants and MAC (RPM) on
OcteonTx3 CN10K are different and loopback needs to be configured
differently. Upper layer interface between RVU AF and PF netdev is
kept same. Based on silicon variant appropriate fn() pointer is
called to config the MAC.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: cn10k: Add RPM Rx/Tx stats support

RPM supports below list of counters as an extension to existing counters
*  class based flow control pause frames
*  vlan/jabber/fragmented packets
*  fcs/alignment/oversized error packets

This patch adds support to display supported RPM counters via debugfs
and define new mbox rpm_stats to read all support counters.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <Sunil.Goutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: cn10k: Add RPM LMAC pause frame support

Flow control configuration is different for CGX(Octeontx2)
and RPM(CN10K) functional blocks. This patch adds the necessary
changes for RPM to support 802.3 pause frames configuration on
cn10k platforms.

Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <Sunil.Goutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-pf: cn10k: Get max mtu supported from admin function

CN10K supports max MTU of 16K on LMAC links and 64k on LBK
links and Octeontx2 silicon supports 9K mtu on both links.
Get the same from nix_get_hw_info mbox message in netdev probe.

This patch also calculates receive buffer size required based
on the MTU set.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: cn10K: Add MTU configuration

OcteonTx3 CN10K silicon supports bigger MTU when compared
to 9216 MTU supported by OcteonTx2 silicon variants. Lookback
interface supports upto 64K and RPM LMAC interfaces support
upto 16K.

This patch does the necessary configuration and adds support
for PF/VF drivers to retrieve max packet size supported via mbox

This patch also configures tx link credit by considering supported
fifo size and max packet length for Octeontx3 silicon.

This patch also removes platform specific name from the driver name.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: cn10k: Add support for programmable channels

NIX uses unique channel numbers to identify the packet sources/sinks
like CGX,LBK and SDP. The channel numbers assigned to each block are
hardwired in CN9xxx silicon.
The fixed channel numbers in CN9xxx are:

0x0 | a << 8 | b            - LBK(0..3)_CH(0..63)
0x0 | a << 8                - Reserved
0x700 | a                   - SDP_CH(0..255)
0x800 | a << 8 | b << 4 | c - CGX(0..7)_LMAC(0..3)_CH(0..15)

All the channels in the above fixed enumerator(with maximum
number of blocks) are not required since some chips
have less number of blocks.
For CN10K silicon the channel numbers need to be programmed by
software in each block with the base channel number and range of
channels. This patch calculates and assigns the channel numbers
to efficiently distribute the channel number range(0-4095) among
all the blocks. The assignment is made based on the actual number of
blocks present and also contiguously leaving no holes.
The channel numbers remaining after the math are used as new CPT
replay channels present in CN10K. Also since channel numbers are
not fixed the transmit channel link number needed by AF consumers
is calculated by AF and sent along with nix_lf_alloc mailbox response.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: cn10k: Add RPM MAC support

OcteonTx2's next gen platform the CN10K has RPM MAC which has a
different serdes when compared to CGX MAC. Though the underlying
HW is different, the CSR interface has been designed largely inline
with CGX MAC, with few exceptions though. So we are using the same
CGX driver for RPM MAC as well and will have a different set of APIs
for RPM where ever necessary.

This patch adds initial support for CN10K's RPM MAC i.e. the driver
registration, communication with firmware etc. For communication with
firmware, RPM provides a different IRQ when compared to CGX.
The CGX and RPM blocks support different features. Currently few
features like ptp, flowcontrol and higig are not supported by RPM. This
patch adds new mailbox message "CGX_FEATURES_GET" to get the list of
features supported by underlying MAC.

RPM has different implementations for RX/TX stats. Unlike CGX,
bar offset of stat registers are different. This patch adds
support to access the same and dump the values in debugfs.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-pf: cn10k: Use LMTST lines for NPA/NIX operations

This patch adds support to use new LMTST lines for NPA batch free
and burst SQE flush. Adds new dev_hw_ops structure to hold platform
specific functions and create new files cn10k.c and cn10k.h.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-pf: cn10k: Map LMTST region

On CN10K platform transmit/receive buffer alloc and free from/to hardware
had changed to support burst operation. Whereas pervious silicon's only
support single buffer free at a time.
To Support the same firmware allocates a DRAM region for each PF/VF for
storing LMTLINES. These LMTLINES are used for NPA batch free and for
flushing SQE to the hardware.
PF/VF LMTST region is accessed via BAR4. PFs LMTST region is followed
by its VFs mbox memory. The size of region varies from 2KB to 256KB based
on number of LMTLINES configured.

This patch adds support for
- Mapping PF/VF LMTST region.
- Reserves 0-71 (RX + TX + XDP) LMTST lines for NPA batch
free operation.
- Reserves 72-512 LMTST lines for NIX SQE flush.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-pf: cn10k: Initialise NIX context

On CN10K platform NIX RQ and SQ context structure got changed.
This patch uses new mbox message "NIX_CN10K_AQ_ENQ" for NIX
context initialization on CN10K platform.

This patch also updates the nix_rx_parse_s and nix_sqe_sg_s
structures to add packet steering bit feilds.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: cn10k: Update NIX and NPA context in debugfs

On CN10K platform NPA and NIX context structure bit fields
had changed to support new features like bandwidth steering etc.
This patch dumps approprate context for CN10K platform.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: cn10k: Update NIX/NPA context structure

NIX hardware context structure got changed to accommodate new
features like bandwidth steering, L3/L4 outer/inner checksum
enable/disable etc., on CN10K platform.
This patch defines new mbox message NIX_CN10K_AQ_INST for new
NIX context initialization.

This patch also updates the NPA context structures to accommodate
bit field changes made for CN10K platform.

This patch also removes Big endian bit fields from existing
structures as its support got deprecated in current and upcoming silicons.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-pf: cn10k: Add mbox support for CN10K

Firmware allocates memory regions for PFs and VFs in DRAM.
The PFs memory region is used for AF-PF and PF-VF mailbox.
This mbox facilitate communication between AF-PF and PF-VF.

On CN10K platform:
The DRAM region allocated to PF is enumerated as PF BAR4 memory.
PF BAR4 contains AF-PF mbox region followed by its VFs mbox region.
AF-PF mbox region base address is configured at RVU_AF_PFX_BAR4_ADDR
PF-VF mailbox base address is configured at
RVU_PF(x)_VF_MBOX_ADDR = RVU_AF_PF()_BAR4_ADDR+64KB. PF access its
mbox region via BAR4, whereas VF accesses PF-VF DRAM mailboxes via
BAR2 indirect access.

On CN9XX platform:
Mailbox region in DRAM is divided into two parts AF-PF mbox region and
PF-VF mbox region i.e all PFs mbox region is contiguous similarly all
VFs.
The base address of the AF-PF mbox region is configured at
RVU_AF_PF_BAR4_ADDR.
AF-PF1 mbox address can be calculated as RVU_AF_PF_BAR4_ADDR * mbox
size.
The base address of PF-VF mbox region for each PF is configure at
RVU_AF_PF(0..15)_VF_BAR4_ADDR.PF access its mbox region via BAR4 and its
VF mbox regions from RVU_PF_VF_BAR4_ADDR register, whereas VF access its
mbox region via BAR4.

This patch changes mbox initialization to support both CN9XX and CN10K
platform.
The patch also adds new hw_cap flag to setting hw features like TSO etc
and removes platform specific name from the PF/VF driver name to make it
appropriate for all supported platforms

This patch also removes platform specific name from the PF/VF driver name
to make it appropriate for all supported platforms

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: cn10k: Add mbox support for CN10K platform

Firmware allocates memory regions for PFs and VFs in DRAM.
The PFs memory region is used for AF-PF and PF-VF mailbox.
This mbox facilitates communication between AF-PF and PF-VF.

On CN10K platform:
The DRAM region allocated to PF is enumerated as PF BAR4 memory.
PF BAR4 contains AF-PF mbox region followed by its VFs mbox region.
AF-PF mbox region base address is configured at RVU_AF_PFX_BAR4_ADDR
PF-VF mailbox base address is configured at
RVU_PF(x)_VF_MBOX_ADDR = RVU_AF_PF()_BAR4_ADDR+64KB. PF access its
mbox region via BAR4, whereas VF accesses PF-VF DRAM mailboxes via
BAR2 indirect access.

On CN9XX platform:
Mailbox region in DRAM is divided into two parts AF-PF mbox region and
PF-VF mbox region i.e all PFs mbox region is contiguous similarly all
VFs.
The base address of the AF-PF mbox region is configured at
RVU_AF_PF_BAR4_ADDR.
AF-PF1 mbox address can be calculated as RVU_AF_PF_BAR4_ADDR * mbox
size.
The base address of PF-VF mbox region for each PF is configure at
RVU_AF_PF(0..15)_VF_BAR4_ADDR.PF access its mbox region via BAR4 and its
VF mbox regions from RVU_PF_VF_BAR4_ADDR register, whereas VF access its
mbox region via BAR4.

This patch changes mbox initialization to support both CN9XX and CN10K
platform.

This patch also adds CN10K PTP subsystem and device IDs to ptp
driver id table.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mvpp2-tx-flow-control'

Stefan Chulski says:

====================
net: mvpp2: Add TX Flow Control support

Armada hardware has a pause generation mechanism in GOP (MAC).
The GOP generate flow control frames based on an indication programmed in Ports Control 0 Register. There is a bit per port.
However assertion of the PortX Pause bits in the ports control 0 register only sends a one time pause.
To complement the function the GOP has a mechanism to periodically send pause control messages based on periodic counters.
This mechanism ensures that the pause is effective as long as the Appropriate PortX Pause is asserted.

Problem is that Packet Processor that actually can drop packets due to lack of resources not connected to the GOP flow control generation mechanism.
To solve this issue Armada has firmware running on CM3 CPU dedicated for Flow Control support.
Firmware monitors Packet Processor resources and asserts XON/XOFF by writing to Ports Control 0 Register.

MSS shared SRAM memory used to communicate between CM3 firmware and PP2 driver.
During init PP2 driver informs firmware about used BM pools, RXQs, congestion and depletion thresholds.

The pause frames are generated whenever congestion or depletion in resources is detected.
The back pressure is stopped when the resource reaches a sufficient level.
So the congestion/depletion and sufficient level implement a hysteresis that reduces the XON/XOFF toggle frequency.

Packet Processor v23 hardware introduces support for RX FIFO fill level monitor.
Patch "add PPv23 version definition" to differ between v23 and v22 hardware.
Patch "add TX FC firmware check" verifies that CM3 firmware supports Flow Control monitoring.

v12 --> v13
- Remove bm_underrun_protect module_param

v11 --> v12
- Improve warning message in "net: mvpp2: add TX FC firmware check" patch

v10 --> v11
- Improve "net: mvpp2: add CM3 SRAM memory map" comment
- Move condition check to 'net: mvpp2: always compare hw-version vs MVPP21' patch

v9 --> v10
- Add CM3 SRAM description to PPv2 documentation

v8 --> v9
- Replace generic pool allocation with devm_ioremap_resource

v7 --> v8
- Reorder "always compare hw-version vs MVPP21" and "add PPv23 version definition" commits
- Typo fixes
- Remove condition fix from "add RXQ flow control configurations"

v6 --> v7
- Reduce patch set from 18 to 15 patches
- Documentation change combined into a single patch
- RXQ and BM size change combined into a single patch
- Ring size change check moved into "add RXQ flow control configurations" commit

v5 --> v6
- No change

v4 --> v5
- Add missed Signed-off
- Fix warnings in patches 3 and 12
- Add revision requirement to warning message
- Move mss_spinlock into RXQ flow control configurations patch
- Improve FCA RXQ non occupied descriptor threshold commit message

v3 --> v4
- Remove RFC tag

v2 --> v3
- Remove inline functions
- Add PPv2.3 description into marvell-pp2.txt
- Improve mvpp2_interrupts_mask/unmask procedure
- Improve FC enable/disable procedure
- Add priv->sram_pool check
- Remove gen_pool_destroy call
- Reduce Flow Control timer to x100 faster

v1 --> v2
- Add memory requirements information
- Add EPROBE_DEFER if of_gen_pool_get return NULL
- Move Flow control configuration to mvpp2_mac_link_up callback
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: add TX FC firmware check

Patch check that TX FC firmware is running in CM3.
If not, global TX FC would be disabled.

Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: set 802.3x GoP Flow Control mode

This patch fix GMAC TX flow control autoneg.
Flow control autoneg wrongly were disabled with enabled TX
flow control.

Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: add PPv23 RX FIFO flow control

New FIFO flow control feature was added in PPv23.
PPv2 FIFO polled by HW and trigger pause frame if FIFO
fill level is below threshold.
FIFO HW flow control enabled with CM3 RXQ&BM flow
control with ethtool.
Current FIFO thresholds is:
9KB for port with maximum speed 10Gb/s port
4KB for port with maximum speed 5Gb/s port
2KB for port with maximum speed 1Gb/s port

Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: add BM protection underrun feature support

The PP2v23 hardware supports a feature allowing to double the
size of BPPI by decreasing number of pools from 16 to 8.
Increasing of BPPI size protect BM drop from BPPI underrun.
Underrun could occurred due to stress on DDR and as result slow buffer
transition from BPPE to BPPI.
New BPPI threshold recommended by spec is:
BPPI low threshold - 640 buffers
BPPI high threshold - 832 buffers
Supported only in PPv23.

Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: add ethtool flow control configuration support

This patch add ethtool flow control configuration support.

Tx flow control retrieved correctly by ethtool get function.
FW per port ethtool configuration capability added.

Patch also takes care about mtu change procedure, if PPv2 switch
BM pools during mtu change.

Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: add RXQ flow control configurations

This patch adds RXQ flow control configurations.
Flow control disabled by default.
Minimum ring size limited to 1024 descriptors.

Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: enable global flow control

This patch enables global flow control in FW and in the phylink validate mask.

Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: add FCA RXQ non occupied descriptor threshold

The firmware needs to monitor the RX Non-occupied descriptor
bits for flow control to move to XOFF mode.
These bits need to be unmasked to be functional, but they will
not raise interrupts as we leave the RX exception summary
bit in MVPP2_ISR_RX_TX_MASK_REG clear.

Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: add FCA periodic timer configurations

Flow Control periodic timer would be used if port in
XOFF to transmit periodic XOFF frames.

Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: increase BM pool and RXQ size

BM pool and RXQ size increased to support Firmware Flow Control.
Minimum depletion thresholds to support FC are 1024 buffers.
BM pool size increased to 2048 to have some 1024 buffers
space between depletion thresholds and BM pool size.

Jumbo frames require a 9888B buffer, so memory requirements
for data buffers increased from 7MB to 24MB.

Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: add PPv23 version definition

This patch add PPv23 version definition.
PPv23 is new packet processor in CP115.
Everything that supported by PPv22, also supported by PPv23.
No functional changes in this stage.

Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: always compare hw-version vs MVPP21

Currently we have PP2v1 and PP2v2 hw-versions, with some different
handlers depending upon condition hw_version = MVPP21/MVPP22.
In a future there will be also PP2v3. Let's use now the generic
"if equal/notEqual MVPP21" for all cases instead of "if MVPP22".

This patch does not change any functionality.
It is not intended to introduce PP2v3.
It just modifies MVPP21/MVPP22 check-condition
bringing it to generic and unified form correct for new-code
introducing and PP2v3 net-next generation.

Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: add CM3 SRAM memory map

This patch adds CM3 memory map.

Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dts: marvell: add CM3 SRAM memory to cp11x ethernet device tree

CM3 SRAM address space will be used for Flow Control configuration.

Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: Konstantin Porotchkin <kostap@marvell.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

doc: marvell: add CM3 address space and PPv2.3 description

Patch adds CM3 address space and PPv2.3 description.

Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen/events: link interdomain events to associated xenbus device

In order to support the possibility of per-device event channel
settings (e.g. lateeoi spurious event thresholds) add a xenbus device
pointer to struct irq_info() and modify the related event channel
binding interfaces to take the pointer to the xenbus device as a
parameter instead of the domain id of the other side.

While at it remove the stale prototype of bind_evtchn_to_irq_lateeoi().

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen/netback: fix spurious event detection for common event case

In case of a common event for rx and tx queue the event should be
regarded to be spurious if no rx and no tx requests are pending.

Unfortunately the condition for testing that is wrong causing to
decide a event being spurious if no rx OR no tx requests are
pending.

Fix that plus using local variables for rx/tx pending indicators in
order to split function calls and if condition.

Fixes: 23025393dbeb3b ("xen/netback: use lateeoi irq binding")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Wei Liu <wl@xen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fib_notifier: don't return positive values on fib registration

The function fib6_walk_continue() cannot return a positive value when
called from register_fib_notifier(), but ignoring causes static analyzer to
generate warnings in users of register_fib_notifier() that try to convert
returned error code to pointer with ERR_PTR(). Handle such case by
explicitly checking for positive error values and converting them to
-EINVAL in fib6_tables_dump().

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5-for-upstream-2021-02-10' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-for-upstream-2021-02-10

Misc cleanups and trivial fixes for net-next

1) spelling mistakes
2) error path checks fixes
3) unused includes and struct fields cleanup
4) build error when MLX5_ESWITCH=no
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipconfig: avoid use-after-free in ic_close_devs

Due to the fact that ic_dev->dev is kept open in ic_close_dev, I had
thought that ic_dev will not be freed either. But that is not the case,
but instead "everybody dies" when ipconfig cleans up, and just the
net_device behind ic_dev->dev remains allocated but not ic_dev itself.

This is a problem because in ic_close_devs, for every net device that
we're about to close, we compare it against the list of lower interfaces
of ic_dev, to figure out whether we should close it or not. But since
ic_dev itself is subject to freeing, this means that at some point in
the middle of the list of ipconfig interfaces, ic_dev will have been
freed, and we would be still attempting to iterate through its list of
lower interfaces while checking whether to bring down the remaining
ipconfig interfaces.

There are multiple ways to avoid the use-after-free: we could delay
freeing ic_dev until the very end (outside the while loop). Or an even
simpler one: we can observe that we don't need ic_dev when iterating
through its lowers, only ic_dev->dev, structure which isn't ever freed.
So, by keeping ic_dev->dev in a variable assigned prior to freeing
ic_dev, we can avoid all use-after-free issues.

Fixes: 46acf7bdbc72 ("Revert "net: ipv4: handle DSA enabled master network devices"")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: disable detection of bogus xid's 308/388

Several years ago these two entries have been added, but it's not clear
why. There's no trace that there has ever been such a chip version, and
not even the r8101 vendor driver knows these id's. So let's disable
detection, and if nobody complains remove them completely later.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bond-3ad-200g-400g'

Nikolay Aleksandrov says:

====================
bonding: 3ad: support for 200G/400G ports and more verbose warning
xk
We'd like to have proper 200G and 400G support with 3ad bond mode, so we
need to add new definitions for them in order to have separate oper keys,
aggregated bandwidth and proper operation (patches 01 and 02). In
patch 03 Ido changes the code to use pr_err_once instead of
pr_warn_once which would help future detection of unsupported speeds.

v2: patch 03: use pr_err_once instead of WARN_ONCE
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: 3ad: Print an error for unknown speeds

The bond driver needs to be patched to support new ethtool speeds.
Currently it emits a single warning [1] when it encounters an unknown
speed. As evident by the two previous patches, this is not explicit
enough. Instead, promote it to an error.

[1]
bond10: (slave swp1): unknown ethtool speed (200000) for port 1 (set it to 0)

v2:
* Use pr_err_once() instead of WARN_ONCE()

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: 3ad: add support for 400G speed

In order to be able to use 3ad mode with 400G devices we need to extend
the supported speeds.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: 3ad: add support for 200G speed

In order to be able to use 3ad mode with 200G devices we need to extend
the supported speeds.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qede-netpoll-coalesce'

Bhaskar Upadhaya says:

====================
qede: add netpoll and per-queue coalesce support

This is a followup implementation after series

https://patchwork.kernel.org/project/netdevbpf/cover/1610701570-29496-1-git-send-email-bupadhaya@marvell.com/

Patch 1: Add net poll controller support to transmit kernel printks
         over UDP
Patch 2: QLogic card support multiple queues and each queue can be
         configured with respective coalescing parameters, this patch
         add per queue rx-usecs, tx-usecs coalescing parameters
Patch 3: set default per queue rx-usecs, tx-usecs coalescing parameters and
         preserve coalesce parameters across interface up and down

v3: fixed warnings reported by Dan Carpenter
v2: comments from jakub
- p1: remove poll_controller ndo and add budget 0 support in qede_poll
- p3: preserve coalesce parameters across interface up and down
===================

Signed-off-by: David S. Miller <davem@davemloft.net>

qede: preserve per queue stats across up/down of interface

Here we do the initialization of coalescing values on load.
per queue coalesce values are also restored across up/down of
ethernet interface.

Signed-off-by: Bhaskar Upadhaya <bupadhaya@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qede: add per queue coalesce support for qede driver

per queue coalescing allows better and more finegrained control
over interrupt rates.

Signed-off-by: Bhaskar Upadhaya <bupadhaya@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qede: add netpoll support for qede driver

handle netpoll case when qede_poll is called by
netpoll layer with budget 0

Signed-off-by: Bhaskar Upadhaya <bupadhaya@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: fix return of random stack value

Currently, a random stack value is being returned because variable
_ret_ is not properly initialized. This variable is actually not
used anymore and it should be removed.

Fix this by removing all instances of variable ret and return 0.

Fixes: 64749c9c38a9 ("net: hns3: remove redundant return value of hns3_uninit_all_ring()")
Addresses-Coverity-ID: 1501700 ("Uninitialized scalar variable")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: dwmac-intel-plat: remove unnecessary initialization

plat_dat is initialized by stmmac_probe_config_dt().
So, initialization is not required by priv->plat.
This removes unnecessary initialization and variables.

Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: initialize net->net_cookie at netns setup

It is simpler to make net->net_cookie a plain u64
written once in setup_net() instead of looping
and using atomic64 helpers.

Lorenz Bauer wants to add SO_NETNS_COOKIE socket option
and this patch would makes his patch series simpler.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: re-configure WOL settings on resume from hibernation

So far we don't re-configure WOL-related register bits when waking up
from hibernation. I'm not aware of any problem reports, but better
play safe and call __rtl8169_set_wol() in the resume() path too.
To achieve this move calling __rtl8169_set_wol() to
rtl8169_net_resume() and rename the function to rtl8169_runtime_resume().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'phy-icplus-next'

Michael Walle says:

====================
net: phy: icplus: cleanups and new features

Cleanup the PHY drivers for IPplus devices and add PHY counters and MDIX
support for the IP101A/G.

Patch 5 adds a model detection based on the behavior of the PHY.
Unfortunately, the IP101A shares the PHY ID with the IP101G. But the latter
provides more features. Try to detect the newer model by accessing the page
selection register. If it is writeable, it is assumed, that it is a IP101G.

With this detection in place, we can now access registers >= 16 in a
correct way on the IP101G; that is by first selecting the correct page.
This might previouly worked, because no one ever set another active page
before booting linux.

The last two patches add the new features.
===================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: icplus: add MDI/MDIX support for IP101A/G

Implement the operations to set desired mode and retrieve the current
mode.

This feature was tested with an IP101G.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: icplus: add PHY counter for IP101G

The IP101G provides three counters: RX packets, CRC errors and symbol
errors. The error counters can be configured to clear automatically on
read. Unfortunately, this isn't true for the RX packet counter. Because
of this and because the RX packet counter is more likely to overflow,
than the error counters implement only support for the error counters.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: icplus: fix paged register access

Registers >= 16 are paged. Be sure to set the page. It seems this was
working for now, because the default is correct for the registers used
in the driver at the moment. But this will also assume, nobody will
change the page select register before linux is started. The page select
register is _not_ reset with a soft reset of the PHY.

To ease the function reuse between the non-paged register space of the
IP101A and the IP101G, add noop read_page()/write_page() callbacks so
the IP101G functions can also be used for the IP101A.

Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: icplus: don't set APS_EN bit on IP101G

This bit is reserved as 'always-write-1'. While this is not a particular
error, because we are only setting it, guard it by checking the model to
prevent errors in the future.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: icplus: split IP101A/G driver

Unfortunately, the IP101A and IP101G share the same PHY identifier.
While most of the functions are somewhat backwards compatible, there is
for example the APS_EN bit on the IP101A but on the IP101G this bit
reserved. Also, the IP101G has many more functionalities.

Deduce the model by accessing the page select register which - according
to the datasheet - is not available on the IP101A. If this register is
writable, assume we have an IP101G.

Split the combined IP101A/G driver into two separate drivers.

Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: icplus: use the .soft_reset() of the phy-core

The PHY core already resets the PHY before .config_init() if a
.soft_reset() op is registered. Drop the open-coded ip1xx_reset().

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: icplus: drop address operator for functions

Don't sometimes use the address operator and sometimes not. Drop it and
make the code look uniform.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: icplus: use PHY_ID_MATCH_EXACT() for IP101A/G

According to the datasheet of the IP101A/G there is no revision field
and MII_PHYSID2 always reads as 0x0c54. Use PHY_ID_MATCH_EXACT() then.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: icplus: use PHY_ID_MATCH_MODEL() macro

Simpify the initializations of the structures. There is no functional
change.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-hsr-offload'

George McCollister says:

====================
add HSR offloading support for DSA switches

Add support for offloading HSR/PRP (IEC 62439-3) tag insertion, tag
removal, forwarding and duplication on DSA switches.
This series adds offloading to the xrs700x DSA driver.

Changes since RFC:
* Split hsr and dsa patches. (Florian Fainelli)

Changes since v1:
* Fixed some typos/wording. (Vladimir Oltean)
* eliminate IFF_HSR and use is_hsr_master instead. (Vladimir Oltean)
* Make hsr_handle_sup_frame handle skb_std as well (required when offloading)
* Don't add hsr tag for HSR v0 supervisory frames.
* Fixed tag insertion offloading for PRP.

Changes since v2:
* Return -EOPNOTSUPP instead of 0 in dsa_switch_hsr_join and
dsa_switch_hsr_leave. (Vladimir Oltean)
* Only allow ports 1 and 2 to be HSR/PRP redundant ports. (Tobias Waldekranz)
* Set and remove HSR features for both redundant ports. (Vladimir Oltean)
* Change port_hsr_leave() to return int instead of void.
* Remove hsr_init_skb() proto argument. (Vladimir Oltean)
===================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: xrs700x: add HSR offloading support

Add offloading for HSR/PRP (IEC 62439-3) tag insertion, tag removal
forwarding and duplication supported by the xrs7000 series switches.

Only HSR v1 and PRP v1 are supported by the xrs7000 series switches (HSR
v0 is not).

Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: add support for offloading HSR

Add support for offloading of HSR/PRP (IEC 62439-3) tag insertion
tag removal, duplicate generation and forwarding on DSA switches.

Add DSA_NOTIFIER_HSR_JOIN and DSA_NOTIFIER_HSR_LEAVE which trigger calls
to .port_hsr_join and .port_hsr_leave in the DSA driver for the switch.

The DSA switch driver should then set netdev feature flags for the
HSR/PRP operation that it offloads.
    NETIF_F_HW_HSR_TAG_INS
    NETIF_F_HW_HSR_TAG_RM
    NETIF_F_HW_HSR_FWD
    NETIF_F_HW_HSR_DUP

Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hsr: add offloading support

Add support for offloading of HSR/PRP (IEC 62439-3) tag insertion
tag removal, duplicate generation and forwarding.

For HSR, insertion involves the switch adding a 6 byte HSR header after
the 14 byte Ethernet header. For PRP it adds a 6 byte trailer.

Tag removal involves automatically stripping the HSR/PRP header/trailer
in the switch. This is possible when the switch also performs auto
deduplication using the HSR/PRP header/trailer (making it no longer
required).

Forwarding involves automatically forwarding between redundant ports in
an HSR. This is crucial because delay is accumulated as a frame passes
through each node in the ring.

Duplication involves the switch automatically sending a single frame
from the CPU port to both redundant ports. This is required because the
inserted HSR/PRP header/trailer must contain the same sequence number
on the frames sent out both redundant ports.

Export is_hsr_master so DSA can tell them apart from other devices in
dsa_slave_changeupper.

Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hsr: generate supervision frame without HSR/PRP tag

For a switch to offload insertion of HSR/PRP tags, frames must not be
sent to the CPU facing switch port with a tag. Generate supervision frames
(eth type ETH_P_PRP) without HSR v1 (ETH_P_HSR)/PRP tag and rely on
create_tagged_frame which inserts it later. This will allow skipping the
tag insertion for all outgoing frames in the future which is required for
HSR v1/PRP tag insertions to be offloaded.

HSR v0 supervision frames always contain tag information so insertion of
the tag can't be offloaded. IEC 62439-3 Ed.2.0 (HSR v1) specifically
notes that this was changed since v0 to allow offloading.

Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: xrs700x: use of_match_ptr() on xrs700x_mdio_dt_ids

Use of_match_ptr() on xrs700x_mdio_dt_ids so that NULL is substituted
when CONFIG_OF isn't defined. This will prevent unnecessary use of
xrs700x_mdio_dt_ids when CONFIG_OF isn't defined.

Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: xrs700x: fix unused warning for of_device_id

Fix unused variable warning that occurs when CONFIG_OF isn't defined by
adding __maybe_unused.

>> drivers/net/dsa/xrs700x/xrs700x_i2c.c:127:34: warning: unused
variable 'xrs700x_i2c_dt_ids' [-Wunused-const-variable]
static const struct of_device_id xrs700x_i2c_dt_ids[] = {

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp-rfc-6056'

Eric Dumazet says:

====================
tcp: RFC 6056 induced changes

This is based on a report from David Dworken.

First patch implements RFC 6056 3.3.4 proposal.

Second patch is adding a little bit of noise to make
attacker life a bit harder.
===================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add some entropy in __inet_hash_connect()

Even when implementing RFC 6056 3.3.4 (Algorithm 4: Double-Hash
Port Selection Algorithm), a patient attacker could still be able
to collect enough state from an otherwise idle host.

Idea of this patch is to inject some noise, in the
cases __inet_hash_connect() found a candidate in the first
attempt.

This noise should not significantly reduce the collision
avoidance, and should be zero if connection table
is already well used.

Note that this is not implementing RFC 6056 3.3.5
because we think Algorithm 5 could hurt typical
workloads.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Dworken <ddworken@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: change source port randomizarion at connect() time

RFC 6056 (Recommendations for Transport-Protocol Port Randomization)
provides good summary of why source selection needs extra care.

David Dworken reminded us that linux implements Algorithm 3
as described in RFC 6056 3.3.3

Quoting David :
   In the context of the web, this creates an interesting info leak where
   websites can count how many TCP connections a user's computer is
   establishing over time. For example, this allows a website to count
   exactly how many subresources a third party website loaded.
   This also allows:
   - Distinguishing between different users behind a VPN based on
       distinct source port ranges.
   - Tracking users over time across multiple networks.
   - Covert communication channels between different browsers/browser
       profiles running on the same computer
   - Tracking what applications are running on a computer based on
       the pattern of how fast source ports are getting incremented.

Section 3.3.4 describes an enhancement, that reduces
attackers ability to use the basic information currently
stored into the shared 'u32 hint'.

This change also decreases collision rate when
multiple applications need to connect() to
different destinations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: David Dworken <ddworken@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "net-loopback: set lo dev initial state to UP"

In commit c9dca822c729 ("net-loopback: set lo dev initial state to UP"),
linux started automatically bringing up the loopback device of a newly
created namespace. However, an existing user script might reasonably have
the following stanza when creating a new namespace -- and in fact at least
tools/testing/selftests/net/fib_nexthops.sh in Linux's very own testsuite
does:

# set -e
# ip netns add foo
# ip -netns foo addr add 127.0.0.1/8 dev lo
# ip -netns foo link set lo up
# set +e

This will now fail, because the kernel reasonably rejects "ip addr add" of
a duplicate address. The described change of behavior therefore constitutes
a breakage. Revert it.

Fixes: c9dca822c729 ("net-loopback: set lo dev initial state to UP")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>