git.proxmox.com Git - mirror_ubuntu-jammy-kernel.git/log

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2020-05-22

This series contains updates to e1000e, igc and igb.

Many of the patches in this series are fixes, but many of the igc fixes
are based on the recent filter rule handling Andre has been working,
which will not backport to earlier/stable kernels.  The remaining fixes
for e1000e and igb have CC'd stable where applicable.

Andre continue with his refactoring of the filter rule code to help with
reducing the complexity, in multiple patches.  Fix the inconsistent size
of a struct field.  Fixed an issue where filter rules stay active in the
hardware, even after it was deleted, so make sure to disable the filter
rule before deleting.  Fixed an issue with NFC rules which were dropping
valid multicast MAC address.  Fixed how the NFC rules are restored after
the NIC is reset or brought up, so that they are restored in the same order
they were initially setup in.  Fix a potential memory leak when the
driver is unloaded and the NFC rules are not flushed from memory
properly.  Fixed how NFC rule validation handles when a request to
overwrite an existing rule.  Changed the locking around the NFC rule API
calls from spin_locks to mutex locks to avoid unnecessary busy waiting
on lock contention.

Sasha clean up more unused code in the igc driver.

Kai-Heng Feng from Canonical provides three fixes, first has igb report
the speed and duplex as unknown when in runtime suspend.  Fixed e1000e
to pass up the error when disabling ULP mode.  Fixed e1000e performance
by disabling TSO by default for certain MACs.

Vitaly disables S0ix entry and exit flows for ME systems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5-updates-2020-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-05-22

This series includes two updates and one cleanup patch

1) Tang Bim, clean-up with IS_ERR() usage

2) Vlad introduces a new mlx5 kconfig flag for TC support

   This is required due to the high volume of current and upcoming
   development in the eswitch and representors areas where some of the
   feature are TC based such as the downstream patches of MPLSoUDP and
   the following representor bonding support for VF live migration and
   uplink representor dynamic loading.
   For this Vlad kept TC specific code in tc.c and rep/tc.c and
   organized non TC code in representors specific files.

3) Eli Cohen adds support for MPLS over UPD encap and decap TC offloads.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: psample: fix build error when CONFIG_INET is not enabled

Fix psample build error when CONFIG_INET is not set/enabled by
bracketing the tunnel code in #ifdef CONFIG_NET / #endif.

../net/psample/psample.c: In function ‘__psample_ip_tun_to_nlattr’:
../net/psample/psample.c:216:25: error: implicit declaration of function ‘ip_tunnel_info_opts’; did you mean ‘ip_tunnel_info_opts_set’? [-Werror=implicit-function-declaration]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Yotam Gigi <yotam.gi@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: at803x: fix PHY ID masks

Ever since its first commit 0ca7111a38f05 ("phy: add AT803x driver") the
PHY ID mask was set to 0xffffffef. It is unclear to me why this mask was
chosen in the first place. Both the AR8031/AR8033 and the AR8035
datasheets mention it is always the given value:
- for AR8031/AR8033 its 0x004d/0xd074
- for AR8035 its 0x004d/0xd072

Unfortunately, I don't have a datasheet for the AR8030. Therefore, we
leave its PHY ID mask untouched. For the PHYs mentioned before use the
handy PHY_ID_MATCH_EXACT() macro.

I've tried to contact the author of the initial commit, but received no
answer so far.

Cc: Matus Ujhelyi <ujhelyi.m@gmail.com>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2020-05-23

The following pull-request contains BPF updates for your *net-next* tree.

We've added 50 non-merge commits during the last 8 day(s) which contain
a total of 109 files changed, 2776 insertions(+), 2887 deletions(-).

The main changes are:

1) Add a new AF_XDP buffer allocation API to the core in order to help
   lowering the bar for drivers adopting AF_XDP support. i40e, ice, ixgbe
   as well as mlx5 have been moved over to the new API and also gained a
   small improvement in performance, from Björn Töpel and Magnus Karlsson.

2) Add getpeername()/getsockname() attach types for BPF sock_addr programs
   in order to allow for e.g. reverse translation of load-balancer backend
   to service address/port tuple from a connected peer, from Daniel Borkmann.

3) Improve the BPF verifier is_branch_taken() logic to evaluate pointers
   being non-NULL, e.g. if after an initial test another non-NULL test on
   that pointer follows in a given path, then it can be pruned right away,
   from John Fastabend.

4) Larger rework of BPF sockmap selftests to make output easier to understand
   and to reduce overall runtime as well as adding new BPF kTLS selftests
   that run in combination with sockmap, also from John Fastabend.

5) Batch of misc updates to BPF selftests including fixing up test_align
   to match verifier output again and moving it under test_progs, allowing
   bpf_iter selftest to compile on machines with older vmlinux.h, and
   updating config options for lirc and v6 segment routing helpers, from
   Stanislav Fomichev, Andrii Nakryiko and Alan Maguire.

6) Conversion of BPF tracing samples outdated internal BPF loader to use
   libbpf API instead, from Daniel T. Lee.

7) Follow-up to BPF kernel test infrastructure in order to fix a flake in
   the XDP selftests, from Jesper Dangaard Brouer.

8) Minor improvements to libbpf's internal hashmap implementation, from
   Ian Rogers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: disable s0ix entry and exit flows for ME systems

Since ME systems do not support SLP_S0 in S0ix state, and S0ix entry
and exit flows may cause errors on them it is best to avoid using
e1000e_s0ix_entry_flow and e1000e_s0ix_exit_flow functions.

This was done by creating a struct of all devices that comes with ME
and by checking if the current device has ME.

Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Disable TSO for buffer overrun workaround

Commit b10effb92e27 ("e1000e: fix buffer overrun while the I219 is
processing DMA transactions") imposes roughly 30% performance penalty.

The commit log states that "Disabling TSO eliminates performance loss
for TCP traffic without a noticeable impact on CPU performance", so
let's disable TSO by default to regain the loss.

CC: stable <stable@vger.kernel.org>
Fixes: b10effb92e27 ("e1000e: fix buffer overrun while the I219 is processing DMA transactions")
BugLink: https://bugs.launchpad.net/bugs/1802691
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Warn if disabling ULP failed

The hardware may stop working if driver failed to disable ULP mode.

Take the return value of e1000_disable_ulp_lpt_lp() into account, and
pass up the error if it fails.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: Report speed and duplex as unknown when device is runtime suspended

igb device gets runtime suspended when there's no link partner. We can't
get correct speed under that state:
$ cat /sys/class/net/enp3s0/speed
1000

In addition to that, an error can also be spotted in dmesg:
[ 385.991957] igb 0000:03:00.0 enp3s0: PCIe link lost

Since device can only be runtime suspended when there's no link partner,
we can skip reading register and let the following logic set speed and
duplex with correct status.

The more generic approach will be wrap get_link_ksettings() with begin()
and complete() callbacks. However, for this particular issue, begin()
calls igb_runtime_resume() , which tries to rtnl_lock() while the lock
is already hold by upper ethtool layer.

So let's take this approach until the igb_runtime_resume() no longer
needs to hold rtnl_lock.

CC: stable <stable@vger.kernel.org>
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igc: Remove unused descriptor's flags

Enable Tidv register, Report Packet Sent, Report Status and
Ethernet CRC flags not in use.
This patch comes to clean up these flags.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igc: Remove igc_nfc_rule_exit()

During igc_down(), we call igc_nfc_rule_exit() which traverse the NFC
rule list disabling filters one by one. Later on in igc_down() flow
we issue an hardware reset which also clear all filters. Since we
already reset the hardware, we don't actually need to disable each
filter manually. In order to simplify the code, this patch removes
igc_nfc_rule() altogether.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igc: Change adapter->nfc_rule_lock to mutex

This patch changes adapter->nfc_rule_lock type from spin_lock to mutex
so we avoid unnecessary busy waiting on lock contention.

A closer look at the execution context of NFC rule API users shows that
all of them run in process context. The API users are: ethtool ops,
igc_configure(), called when interface is brought up by user or reset
workequeue thread, igc_down(), called when interface is brought down,
and igc_remove(), called when driver is unloaded.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igc: Change return type from igc_disable_nfc_rule()

None of igc_disable_nfc_rule() callers actually check its returning
value. A closer look at why this function would fail shows that the
only situation is when we try to delete an Ethertype or MAC filter that
doesn't exist.

That situation is very unlikely so we can change igc_del_etype_filter()
and igc_del_mac_filter() logic to "if the filter doesn't exist, we are
done", and keep the logic in igc_disable_nfc_rule() callers simple.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igc: Fix NFC rule validation

If we try to overwrite an existing rule with the same filter but
different action, we get EEXIST error as shown below.

$ ethtool -N eth0 flow-type ether dst <MACADDR> action 1 loc 10
$ ethtool -N eth0 flow-type ether dst <MACADDR> action 2 loc 10
rmgr: Cannot insert RX class rule: File exists

The second command is expected to overwrite the previous rule in location
10 and succeed.

This patch fixes igc_ethtool_check_nfc_rule() so it also checks the
rules location. In case they match, the rule under evaluation should not
be considered invalid.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igc: Fix NFC rules leak when driver is unloaded

If we have RFC rules in adapter->nfc_rule_list when the IGC driver
is unloaded, all rules are leaked. This patch fixes the issue by
introducing the helper igc_flush_nfc_rules() and calling it in
igc_remove(). It also updates igc_set_features() so is reuses the
new helper instead of re-implementing it.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igc: Refactor igc_ethtool_update_nfc_rule()

Current implementation of igc_ethtool_update_nfc_rule() is a bit
convoluted since it handles too many things: rule lookup, deletion
and addition. This patch breaks it into three functions so we simplify
the code and improve code reuse.

Code related to rule lookup is refactored out to a new function called
igc_get_nfc_rule().

Code related to rule addition is refactored out to a new function called
igc_add_nfc_rule(). This function enables the rule in hardware and adds
it to the adapter's list.

Code related to rule deletion is refactored out to a new function called
igc_del_nfc_rule(). This function disables the rule in hardware, removes
it from adapter's list, and deletes it.

As a byproduct of this refactoring, igc_enable_nfc_rule() and
igc_disable_nfc_rule() are moved to igc_main.c since they are not used
in igc_ethtool.c anymore, and igc_restore_nfc_rules() and igc_nfc_rule_
exit() are moved around to avoid forward declaration.

Also, since this patch already touches igc_ethtool_get_nfc_rule(), it
takes the opportunity to remove the 'match_flags' check. Empty flags
are not allowed to be added so no need to check that.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igc: Fix NFC rules restoration

When network interface is brought up, the driver re-enables the NFC
rules previously configured. However, this is done in reverse order
the rules were added and hardware filters are configured differently.

For example, consider the following rules:

$ ethtool -N eth0 flow-type ether dst 00:00:00:00:00:AA queue 0
$ ethtool -N eth0 flow-type ether dst 00:00:00:00:00:BB queue 1
$ ethtool -N eth0 flow-type ether dst 00:00:00:00:00:CC queue 2
$ ethtool -N eth0 flow-type ether dst 00:00:00:00:00:DD queue 3

RAL/RAH registers are configure so filter index 1 has address ending
with AA, filter index 2 has address ending in BB, and so on.

If we bring the interface down and up again, RAL/RAH registers are
configured so filter index 1 has address ending in DD, filter index 2
has CC, and so on. IOW, in reverse order we had before bringing the
interface down.

This issue can be fixed by traversing adapter->nfc_rule_list in
backwards when restoring the rules. Since hlist doesn't support
backwards traversal, this patch replaces it by list_head and fixes
igc_restore_nfc_rules() accordingly.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igc: Fix NFC rules with multicast addresses

Multicast MAC addresses are valid address for NFC rules but
igc_add_mac_filter() is currently rejecting them. In fact, the I225
controller doesn't impose any constraint on the address value so this
patch gets rid of the address validation check in MAC filter APIs.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igc: Fix NFC rule overwrite cases

When the 'loc' argument is passed in ethtool, the input rule overwrites
any rule present in that location. In this situation we must disable the
old rule otherwise it is left enabled in hardware. This patch fixes
the issue by always calling igc_disable_nfc_rule() when deleting the
old rule, no matter the value of 'input' argument.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igc: Fix locking issue when retrieving NFC rules

Access to NFC rules stored in adapter->nfc_rule_list is protect by
adapter->nfc_rule_lock. The functions igc_ethtool_get_nfc_rule()
and igc_ethtool_get_nfc_rules() are missing to hold the lock while
accessing rule objects.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igc: Fix 'sw_idx' type in struct igc_nfc_rule

The 'sw_idx' field from 'struct igc_nfc_rule' is u16 type but it is
assigned an u32 value in igc_ethtool_init_nfc_rule(). This patch changes
'sw_idx' type to u32 so they match. Also, it makes more sense to call
this field 'location' since it holds the NFC rule location.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igc: Refactor igc_ethtool_add_nfc_rule()

Current implementation of igc_ethtool_add_nfc_rule() is quite long and a
bit convoluted so this patch does a code refactoring to improve the
code.

Code related to NFC rule object initialization is refactored out to the
local helper function igc_ethtool_init_nfc_rule(). Likewise, code
related to NFC rule validation is refactored out to another local
helper, igc_ethtool_is_nfc_rule_valid().

RX_CLS_FLOW_DISC check is removed since it is redundant. The macro is
defined as the max value fsp->ring_cookie can have, so checking if
fsp->ring_cookie >= adapter->num_rx_queues is already sufficient.

Finally, some log messages are improved or added, and obvious comments
are removed.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

net/mlx5e: Support pedit on mpls over UDP decap

Allow to modify ethernet headers while decapsulating mpls over UDP
packets. This is implemented using the same reformat object used for
decapsulation.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add support for hw decapsulation of MPLS over UDP

MPLS over UDP is supported in hardware by using a packet reformat object
with reformat type equal L3_TUNNEL_TO_L2 which both decapsulates the
outer L3, L4 and MPLS headers, and allows for setting the L2 headers of
the resulting decapsulated packet. For the hardware to operate
correctly, the configuration of the firmware must have
FLEX_PARSER_PROFILE_ENABLE = 1.

Example tc rule:
  tc filter add dev bareudp0 protocol all prio 1 root flower enc_dst_port \
      6635 enc_src_ip 8.8.8.23 action mpls pop protocol ip pipe \
      action pedit ex munge eth dst set 00:11:22:33:44:21 pipe action \
      mirred egress redirect dev enp59s0f0_0

We use pedit to set the correct destination MAC.

For MPLS over UDP decapsulation to take place, the driver logic requires
the following:

1. flower filter added on bareudp device.
2. action mpls pop
3. zero or more pedit munge actions
4. one redirect action

Current implementation supports only IPv4 and no VLAN.

tc filter show output looks like this:
   filter protocol all pref 1 flower chain 0
   filter protocol all pref 1 flower chain 0 handle 0x1
     enc_src_ip 8.8.8.24
     enc_dst_port 6635
     in_hw in_hw_count 1
            action order 1: mpls  pop protocol ip pipe
             index 2 ref 1 bind 1

            action order 2:  pedit action pipe keys 2
             index 1 ref 1 bind 1
             key #0  at eth+0: val 00112233 mask 00000000
             key #1  at eth+4: val 44210000 mask 0000ffff

            action order 3: mirred (Egress Redirect to device enp59s0f0_0) stolen
            index 2 ref 1 bind 1

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Allow to match on mpls parameters

Support matching on MPLS over UDP parameters using misc2 section of
match parameters.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add support for hw encapsulation of MPLS over UDP

MPLS over UDP is supported by adding a rule on a representor net device
which does tunnel_key set, push mpls and forward to a baredup device. At
the hardware level we use a packet_reformat_context object to do the
encapsulation of the packet.

The resulting packet looks as follows (left side transmitted first):
outer L2 | outer IP | UDP | MPLS | inner L3 and data |

Example usage:
  tc filter add dev $rep0 protocol ip prio 1 root flower skip_sw  \
     action tunnel_key set src_ip 8.8.8.21 dst_ip 8.8.8.24 id 555 \
     dst_port 6635 tos 4 ttl 6 csum action mpls push protocol 0x8847 \
     label 555 tc 3 action mirred egress redirect dev bareudp0

This is how the filter is shown with tc filter show:
tc filter show dev enp59s0f0_0 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  skip_sw
  in_hw in_hw_count 1
        action order 1: tunnel_key  set
        src_ip 8.8.8.21
        dst_ip 8.8.8.24
        key_id 555
        dst_port 6635
        csum
        tos 0x4
        ttl 6 pipe
         index 1 ref 1 bind 1

        action order 2: mpls  push protocol mpls_uc label 555 tc 3 ttl 255 pipe
         index 1 ref 1 bind 1

        action order 3: mirred (Egress Redirect to device bareudp0) stolen
        index 1 ref 1 bind 1

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net: Add netif_is_bareudp() API to identify bareudp devices

Add netif_is_bareudp() so the device can be identified as a bareudp one.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Introduce kconfig var for TC support

In order to improve code maintainability and readability, introduce new
CONFIG_MLX5_CLS_ACT kconfig variable to control compilation of TC hardware
offloads implementation. This allows distinguishing between features that
require TC support (MPLSoUDP, etc.) and features that just rely on
representor functionality (rep_bond for live migration, etc.).

Modify rep_tc.h, rep_neigh.h, en_tc.h and chains.h files to provide stubs
for functions that are called from generic code.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Move TC-specific code from en_main.c to en_tc.c

As a preparation for introducing new kconfig option that controls
compilation of all TC offloads code in mlx5, extract TC-specific code from
en_main.c to en_tc.c. This allows easily compiling out the code by
only including new source in make file when corresponding kconfig is
enabled instead of adding multiple ifdef blocks to en_main.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Extract neigh-specific code from en_rep.c to rep/neigh.c

As a preparation for introducing new kconfig option that controls
compilation of all TC offloads code in mlx5, extract neigh-specific code
from en_rep.c to standalone file. This allows easily compiling out the code
by only including new source in make file when corresponding kconfig is
enabled instead of adding multiple ifdef blocks to en_rep.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Extract TC-specific code from en_rep.c to rep/tc.c

As a preparation for introducing new kconfig option that controls
compilation of all TC offloads code in mlx5, extract TC-specific code from
en_rep.c to standalone file. This allows easily compiling out the code by
only including new source in make file when corresponding kconfig is
enabled instead of adding multiple ifdef blocks to en_rep.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Use IS_ERR() to check and simplify code

Use IS_ERR() and PTR_ERR() instead of PTR_ERR_OR_ZERO() to
simplify code, avoid redundant judgements.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

Merge branch 'bridge-mrp-Add-br_mrp_unique_ifindex-function'

Horatiu Vultur says:

====================
bridge: mrp: Add br_mrp_unique_ifindex function

This patch series adds small fixes to MRP implementation.
The following are fixed in this patch series:
- now is not allow to add the same port to multiple MRP rings
- remove unused variable
- restore the port state according to the bridge state when the MRP instance
is deleted

v2:
- use rtnl_dereference instead of rcu_dereference in the first patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: mrp: Restore port state when deleting MRP instance

When a MRP instance is deleted, then restore the port according to the
bridge state. If the bridge is up then the ports will be in forwarding
state otherwise will be in disabled state.

Fixes: 9a9f26e8f7ea ("bridge: mrp: Connect MRP API with the switchdev API")
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: mrp: Remove the variable mrp_ring_state

Remove the variable mrp_ring_state from switchdev_attr because is not
used anywhere.
The ring state is set using SWITCHDEV_OBJ_ID_RING_STATE_MRP.

Fixes: c284b5459008 ("switchdev: mrp: Extend switchdev API to offload MRP")
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: mrp: Add br_mrp_unique_ifindex function

It is not allow to have the same net bridge port part of multiple MRP
rings. Therefore add a check if the port is used already in a different
MRP. In that case return failure.

Fixes: 9a9f26e8f7ea ("bridge: mrp: Connect MRP API with the switchdev API")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'DP83869-Enhancements'

Dan Murphy says:

====================
DP83869 Enhancements

These are improvements to the DP83869 Ethernet PHY driver. OP-mode and port
mirroring may be strapped on the device but the software only retrives these
settings from the device tree. Reading the straps and initializing the
associated stored variables so when setting the PHY up and down the PHY's
configuration values will be retained.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: dp83869: Set opmode from straps

If the op-mode for the device is not set in the device tree then set
the strapped op-mode and store it for later configuration.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: dp83869: Update port-mirroring to read straps

The device tree may not have the property set for port mirroring
because the hardware may have it strapped. If the property is not in the
DT then check the straps and set the port mirroring bit appropriately.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests/bpf: CONFIG_LIRC required for test_lirc_mode2.sh

test_lirc_mode2.sh assumes presence of /sys/class/rc/rc0/lirc*/uevent
which will not be present unless CONFIG_LIRC=y

Fixes: 6bdd533cee9a ("bpf: add selftest for lirc_mode2 type program")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1590147389-26482-3-git-send-email-alan.maguire@oracle.com

selftests/bpf: CONFIG_IPV6_SEG6_BPF required for test_seg6_loop.o

test_seg6_loop.o uses the helper bpf_lwt_seg6_adjust_srh();
it will not be present if CONFIG_IPV6_SEG6_BPF is not specified.

Fixes: b061017f8b4d ("selftests/bpf: add realistic loop tests")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1590147389-26482-2-git-send-email-alan.maguire@oracle.com

selftests/bpf: Add general instructions for test execution

Getting a clean BPF selftests run involves ensuring latest trunk LLVM/clang
are used, pahole is recent (>=1.16) and config matches the specified
config file as closely as possible. Add to bpf_devel_QA.rst and point
tools/testing/selftests/bpf/README.rst to it.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/1590146674-25485-1-git-send-email-alan.maguire@oracle.com

Revert "net: mvneta: speed down the PHY, if WoL used, to save energy"

This reverts commit 5e3768a436bb70c9c3e27aaba6b73f8ef8f5dcf3.

On request from Russell King, this is a layering violation.

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: add adapter hotplug support for ULDs

Upon adapter hotplug, cxgb4 registers ULD devices for all the ULDs that
are already loaded, ensuring that ULD's can enumerate the hotplugged
adapter without reloading the ULD.

Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: flow_offload: simplify hw stats check handling

Make FLOW_ACTION_HW_STATS_DONT_CARE be all bits, rather than none, so that
drivers and __flow_action_hw_stats_check can use simple bitwise checks.

Pre-fill all actions with DONT_CARE in flow_rule_alloc(), rather than
relying on implicit semantics of zero from kzalloc, so that callers which
don't configure action stats themselves (i.e. netfilter) get the correct
behaviour by default.

Only the kernel's internal API semantics change; the TC uAPI is unaffected.

v4: move DONT_CARE setting to flow_rule_alloc() for robustness and simplicity.

v3: set DONT_CARE in nft and ct offload.

v2: rebased on net-next, removed RFC tags.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ip6_tunnel-add-MPLS-support'

Vadim Fedorenko says:

====================
ip6_tunnel: add MPLS support

The support for MPLS-in-IPv4 was added earlier. This patchset adds
support for MPLS-in-IPv6.

Changes in v2:
- Eliminate ifdefs IS_ENABLE(CONFIG_MPLS)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mpls: Add support for IPv6 tunnels

Add support for IPv6 tunnel devices in AF_MPLS.

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_tunnel: add generic MPLS receive support

Add support for MPLS in receive side.

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

tunnel6: support for IPPROTO_MPLS

This patch is just preparation for MPLS support in ip6_tunnel

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_tunnel: add MPLS transmit support

Add ETH_P_MPLS_UC as supported protocol.

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_tunnel: simplify transmit path

Merge ip{4,6}ip6_tnl_xmit functions into one universal
ipxip6_tnl_xmit in preparation for adding MPLS support.

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mediatek-add-support-for-MediaTek-Ethernet-MAC'

Bartosz Golaszewski says:

====================
mediatek: add support for MediaTek Ethernet MAC

This series adds support for the STAR Ethernet Controller present on MediaTeK
SoCs from the MT8* family.

First we convert the existing DT bindings for the PERICFG controller to YAML
and add a new compatible string for mt8516 variant of it. Then we add the DT
bindings for the MAC.

Next we do some cleanup of the mediatek ethernet drivers directory.

The largest patch in the series adds the actual new driver.

The rest of the patches add DT fixups for the boards already supported
upstream.

v1 -> v2:
- add a generic helper for retrieving the net_device associated with given
  private data
- fix several typos in commit messages
- remove MTK_MAC_VERSION and don't set the driver version
- use NET_IP_ALIGN instead of a magic number (2) but redefine it as it defaults
  to 0 on arm64
- don't manually turn the carrier off in mtk_mac_enable()
- process TX cleanup in napi poll callback
- configure pause in the adjust_link callback
- use regmap_read_poll_timeout() instead of handcoding the polling
- use devres_find() to verify that struct net_device is managed by devres in
  devm_register_netdev()
- add a patch moving all networking devres helpers into net/devres.c
- tweak the dma barriers: remove where unnecessary and add comments to the
  remaining barriers
- don't reset internal counters when enabling the NIC
- set the net_device's mtu size instead of checking the framesize in
  ndo_start_xmit() callback
- fix a race condition in waking up the netif queue
- don't emit log messages on OOM errors
- use dma_set_mask_and_coherent()
- use eth_hw_addr_random()
- rework the receive callback so that we reuse the previous skb if unmapping
  fails, like we already do if skb allocation fails
- rework hash table operations: add proper timeout handling and clear bits when
  appropriate

v2 -> v3:
- drop the patch adding priv_to_netdev() and store the netdev pointer in the
  driver private data
- add an additional dma_wmb() after reseting the descriptor in
  mtk_mac_ring_pop_tail()
- check the return value of dma_set_mask_and_coherent()
- improve the DT bindings for mtk-eth-mac: make the reg property in the example
  use single-cell address and size, extend the description of the PERICFG
  phandle and document the mdio sub-node
- add a patch converting the old .txt bindings for PERICFG to yaml
- limit reading the DMA memory by storing the mapped addresses in the driver
  private structure
- add a patch documenting the existing networking devres helpers

v3 -> v4:
- drop the devres patches: they will be sent separately
- call netdev_sent_queue() & netdev_completed_queue() where appropriate
- don't redefine NET_IP_ALIGN: define a private constant in the driver
- fix a couple typos
- only disabe/enable the MAC in suspend/resume if netif is running
- drop the count field from the ring structure and instead calculate the number
  of used descriptors from the tail and head indicies
- rework the locking used to protect the ring structures from concurrent
  access: use cheaper spin_lock_bh() and completely disable the internal
  spinlock used by regmap
- rework the interrupt handling to make it more fine-grained: onle re-enable
  TX and RX interrupts while they're needed, process the stats updates in a
  workqueue, not in napi context
- shrink the code responsible for unmapping and freeing skb memory
- rework the barriers as advised by Arnd

v4 -> v5:
- rename the driver to make it less confusing with the existing mtk_eth_soc
  ethernet driver
- unregister the mdiobus at device's detachment
- open-code spin lock calls to avoid calling the _bh variants where unnecessary
- limit read-modify-write operations where possible when accessing descriptor
  memory
- use READ_ONCE/WRITE_ONCE when modifying the status and data_ptr descriptor
  fields
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ARM64: dts: mediatek: enable ethernet on pumpkin boards

Add remaining properties to the ethernet node and enable it.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM64: dts: mediatek: add ethernet pins for pumpkin boards

Setup the pin control for the Ethernet MAC.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM64: dts: mediatek: add an alias for ethernet0 for pumpkin boards

Add the ethernet0 alias for ethernet so that u-boot can find this node
and fill in the MAC address.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM64: dts: mediatek: add the ethernet node to mt8516.dtsi

Add the Ethernet MAC node to mt8516.dtsi. This defines parameters common
to all the boards based on this SoC.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM64: dts: mediatek: add pericfg syscon to mt8516.dtsi

This adds support for the PERICFG register range as a syscon. This will
soon be used by the MediaTek Ethernet MAC driver for NIC configuration.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk-star-emac: new driver

This adds the driver for the MediaTek STAR Ethernet MAC currently used
on the MT8* SoC family. For now we only support full-duplex.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mediatek: remove unnecessary spaces from Makefile

The Makefile formatting in the kernel tree usually doesn't use tabs,
so remove them before we add a second driver.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mediatek: rename Kconfig prompt

We'll soon by adding a second MediaTek Ethernet driver so modify the
Kconfig prompt.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: add a binding document for MediaTek STAR Ethernet MAC

This adds yaml DT bindings for the MediaTek STAR Ethernet MAC present
on the mt8* family of SoCs.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: add new compatible to mediatek,pericfg

The PERICFG controller is present on the MT8516 SoC. Add an appropriate
compatible variant.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: convert the binding document for mediatek PERICFG to yaml

Convert the DT binding .txt file for MediaTek's peripheral configuration
controller to YAML. There's one special case where the compatible has
three positions. Otherwise, it's a pretty normal syscon.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ENA-features-and-cosmetic-changes'

Arthur Kiyanovski says:

====================
ENA features and cosmetic changes

Diff from V1 of this patchset:
Removed error prints patch

This patchset includes:
1. new rx offset feature
2. reduction of the driver load time
3. multiple cosmetic changes to the code
====================

Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: reduce driver load time

This commit reduces the driver load time by using usec resolution
instead of msec when polling for hardware state change.

Also add back-off mechanism to handle cases where minimal sleep
time is not enough.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: cosmetic: minor code changes

1. Use BIT macro instead of shift operator for code clarity
2. Replace multiple flag assignments to a single assignment of multiple
flags in ena_com_add_single_rx_desc()
3. Move ENA_HASH_KEY_SIZE from ena_netdev.h to ena_com.h

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: cosmetic: fix spacing issues

1. Add leading and trailing spaces to several comments for better
readability
2. Make tabs and spaces uniform in enum defines in ena_admin_defs.h

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: cosmetic: code reorderings

1. Reorder sanity checks in get_comp_ctxt() to make more sense
2. Reorder variables in ena_com_fill_hash_function() and
ena_calc_io_queue_size() in reverse christmas tree.
3. Move around member initializations.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: cosmetic: remove unnecessary code

1. Remove unused definition of DRV_MODULE_VERSION
2. Remove {} from single line-of-code ifs
3. Remove unnecessary comments from ena_get/set_coalesce()
4. Remove unnecessary extra spaces and newlines

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: cosmetic: fix line break issues

1. Join unnecessarily broken short lines in ena_com.c ena_netdev.c
2. Fix Indentations of broken lines

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: cosmetic: fix spelling and grammar mistakes in comments

fix spelling and grammar mistakes in comments in ena_com.h,
ena_com.c and ena_netdev.c

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: cosmetic: set queue sizes to u32 for consistency

Make all types of variables that convey the number and sizeof queues to
be u32, for consistency with the API between the driver and device via
ena_admin_defs.h:ena_admin_get_feat_resp.max_queue_ext fields. Current
code sometimes uses int and there are multiple assignments between these
variables with different types.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: cosmetic: rename ena_update_tx/rx_rings_intr_moderation()

Rename ena_update_tx/rx_rings_intr_moderation() to
ena_update_tx/rx_rings_nonadaptive_intr_moderation()
to distinguish between adaptive and non adaptive interrupt moderaion.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: simplify ena_com_update_intr_delay_resolution()

Initialize prev_intr_delay_resolution with ena_dev->intr_delay_resolution
unconditionally, since it is initialized with
ENA_DEFAULT_INTR_DELAY_RESOLUTION in ena_probe(). This approach makes much
more sense than handling errors of not initializing it.

Also added unlikely to if condition.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: fix ena_com_comp_status_to_errno() return value

Default return value should be -EINVAL since the input
in this case was unexpected.
Also remove the now redundant check in the beginning
of the function.

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: use explicit variable size for clarity

Use u64 instead of unsigned long long for clarity

Signed-off-by: Shai Brandes <shaibran@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: rename ena_com_free_desc to make API more uniform

Rename ena_com_free_desc to ena_com_free_q_entries to match
the LLQ mode.

In non-LLQ mode, an entry in an IO ring corresponds to a
a descriptor. In LLQ mode an entry may correspond to several
descriptors (per LLQ definition).

Signed-off-by: Igor Chauskin <igorch@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: add support for the rx offset feature

Newer ENA devices can write data to rx buffers with an offset
from the beginning of the buffer.

This commit adds support for this feature in the driver.

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-atlantic-QoS-implementation'

Igor Russkikh says:

====================
net: atlantic: QoS implementation

This patch series adds support for mqprio rate limiters and multi-TC:
* max_rate is supported on both A1 and A2;
* min_rate is supported on A2 only;

This is a joint work of Mark and Dmitry.

To implement this feature, a couple of rearrangements and code
improvements were done, in areas of TC/ring management, allocation
control, etc.

One of the problems we faced is conflicting ptp functionality, which
consumes a whole traffic class due to hardware limitations.
Patches below have a more detailed description on how PTP and multi-TC
co-exist right now.

v2:
* accommodated review comments (-Wmissing-prototypes and
   -Wunused-but-set-variable findings);
* added user notification in case of conflicting multi-TC<->PTP
   configuration;
* added automatic PTP disabling, if a conflicting configuration is
   detected;
* removed module param, which was used for PTP disabling in v1;

v1: https://patchwork.ozlabs.org/cover/1294380/
====================

Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: proper rss_ctrl1 (54c0) initialization

This patch fixes an inconsistency between code and spec, which
was found while working on the QoS implementation.

When 8TCs are used, 2 is the maximum supported number of index bits.
In a 4TC mode, we do support 3, but we shouldn't really use the bytes,
which are intended for the 8TC mode.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: QoS implementation: min_rate

This patch adds support for mqprio min_rate limiters.

A2 HW supports Weighted Strict Priority (WSP) arbitration for Tx Descriptor
Queue scheduling among TCs, which can be used for min_rate shaping.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: change the order of arguments for TC weight/credit setters

This patch changes the order of arguments for TC weight/credit setter
functions.
Having the "value to be set" on the right is slightly more robust in
a sense that it's more natural for the humans, so it's a bit more
error-proof this way.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: always use random TC-queue mapping for TX on A2.

This patch changes the TC-queue mapping mechanism used on A2.
Configure the A2 HW in such a way that we can keep queue index mapping
exactly as it was on A1.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: automatically downgrade the number of queues if necessary

This patch adds support for automatic queue number downgrade.

On A2: this is a must have, because only TC0/TC1 support more than 4Q.
Other TCs support 4Qs maximum.
Thus, on A2 we must downgrade the number of queues per TC to 4, if more
than 2 TCs are requested.

On A1: this allows using 8TCs even on systems with cpu count >= 8, when
we have 8 queues by default.
We will just automatically switch to 8TCx4Q mode in this case.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: QoS implementation: max_rate

This patch adds initial support for mqprio rate limiters (max_rate only).

Atlantic HW supports Rate-Shaping for time-sensitive traffic at per
Traffic Class (TC) granularity.
Target rate is defined by:
* nominal link rate (always 10G);
* rate factor (ratio between nominal rate and max allowed).

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: make TCVEC2RING accept nic_cfg

This patch updates TCVEC2RING to accept nic_cfg, which is needed to be able
to use it from hw_atl.
The name is updated to reflect the changes.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: per-TC queue statistics

This patch adds support for per-TC queue statistics.

By default (single TC), the output is the same as it used to be, e.g.:
     Queue[0] InPackets: 2
     Queue[0] OutPackets: 8
     Queue[0] Restarts: 0
     Queue[0] InJumboPackets: 0
     Queue[0] InLroPackets: 0
     Queue[0] InErrors: 0

If several TCs are enabled, then each queue statistics line is prefixed
with TC number, e.g.:
     TC0 Queue[0] InPackets: 6
     TC0 Queue[0] OutPackets: 11
Queue numbering is end-to-end, so:
     TC1 Queue[4] InPackets: 0
     TC1 Queue[4] OutPackets: 22

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: QoS implementation: multi-TC support

This patch adds multi-TC support.

PTP is automatically disabled when the user enables more than 2 TCs,
otherwise traffic on TC2 won't quite work, because it's reserved for PTP.

Signed-off-by: Dmitry Bezrukov <dbezrukov@marvell.com>
Co-developed-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Co-developed-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: changes for multi-TC support

This patch contains the following changes:
* add cfg->is_ptp (used for PTP enable/disable switch, which
is described in more details below);
* add cfg->tc_mode (A1 supports 2 HW modes only);
* setup queue to TC mapping based on TC mode on A2;
* remove hw_tx_tc_mode_get / hw_rx_tc_mode_get hw_ops.

In the first generation of our hardware (A1), a whole traffic class is
consumed for PTP handling in FW (FW uses it to send the ptp data and to
send back timestamps).
The 'is_ptp' flag introduced in this patch will be used in to automatically
disable PTP when a conflicting configuration is detected, e.g. when
multiple TCs are enabled.

Signed-off-by: Dmitry Bezrukov <dbezrukov@marvell.com>
Co-developed-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: move PTP TC initialization to a separate function

This patch moves the PTP TC initialization into a separate function.

Signed-off-by: Dmitry Bezrukov <dbezrukov@marvell.com>
Co-developed-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: changes for multi-TC support

This patch contains the following changes:
* access cfg via aq_nic_get_cfg() in aq_nic_start() and aq_nic_map_skb();
* call aq_nic_get_dev() just once in aq_nic_map_skb();
* move ring allocation/deallocation out of aq_vec_alloc()/aq_vec_free();
* add the missing aq_nic_deinit() in atl_resume_common();
* rename 'tcs' field to 'tcs_max' in aq_hw_caps_s to differentiate it from
  the 'tcs' field in aq_nic_cfg_s, which is used for the current number of
  TCs;
* update _TC_MAX defines to the actual number of supported TCs;
* move tx_tc_mode register defines slightly higher (just to keep the order
  of definitions);
* separate variables for TX/RX buff_size in hw_atl*_hw_qos_set();
* use AQ_HW_*_TC instead of hardcoded magic numbers;
* actually use the 'ret' value in aq_mdo_add_secy();

Signed-off-by: Dmitry Bezrukov <dbezrukov@marvell.com>
Co-developed-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2020-05-21

This series contains updates to ice driver only.  Several of the changes
are fixes, which could be backported to stable, of which, only one was
marked for stable because of the memory leak potential.

Jake exposes the information in the flash memory used for link
management, which is called the netlist module.

Henry and Tony add support for tunnel offloads.

Brett adds promiscuous support in VF's which is based on VF trust and
the new vf-true-promisc flag.

Avinash fixes an issue where a transmit timeout for a queue that belongs
to a PFC enabled TC is not a true transmit timeout, but because the PFC
is in action.

Dave fixes the check for contiguous TCs to allow for various UP2TC
mapping configurations.  Also fixed an issue when changing the pause
parameters would could multiple link drop/down's in succession, which in
turn caused the firmware to not generate a link interrupt for the driver
to respond to.

Anirudh (Ani) fixed a potential race condition in probe/open due to a
bit being cleared too early.

Lihong updates an error message to make it more meaningful instead of
just printing out the numerical value of the status/error code.  Also
fixed an incorrect return value if deleting a filter does not find a
match to delete or when adding a filter that already exists.

Karol fixes casting issues and precision loss in the driver.

Jesse make the sign usage more consistent in the driver by making sure
all instances of vf_id are unsigned, since it can never be negative.

Eric fixes a potential memory leak in ice_add_prof_id_vsig() where was
not cleaning up resources properly when an error occurs.

Michal to help organize the filtering code in the driver, refactor the
code into a separate file and add functions to prepare the filter
information.

Bruce cleaned up a conditional statement that always resulted in true
and provided a comment to make it more obvious.  Also cleaned up
redundant code checks.

Tony helps with potential namespace issues by renaming a 'ice' specific
function with the driver name prepended.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Support-for-fdb-ECMP-nexthop-groups'

Roopa Prabhu says:

====================
Support for fdb ECMP nexthop groups

This series introduces ecmp nexthops and nexthop groups
for mac fdb entries. In subsequent patches this is used
by the vxlan driver fdb entries. The use case is
E-VPN multihoming [1,2,3] which requires bridged vxlan traffic
to be load balanced to remote switches (vteps) belonging to
the same multi-homed ethernet segment (This is analogous to
a multi-homed LAG but over vxlan).

Changes include new nexthop flag NHA_FDB for nexthops
referenced by fdb entries. These nexthops only have ip.
The patches make sure that routes dont reference such nexthops.

example:
$ip nexthop add id 12 via 172.16.1.2 fdb
$ip nexthop add id 13 via 172.16.1.3 fdb
$ip nexthop add id 102 group 12/13 fdb

$bridge fdb add 02:02:00:00:00:13 dev vxlan1000 nhid 101 self

[1] E-VPN https://tools.ietf.org/html/rfc7432
[2] E-VPN VxLAN: https://tools.ietf.org/html/rfc8365
[3] LPC talk with mention of nexthop groups for L2 ecmp
http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

v4 -
    - fix error path free_skb in vxlan_xmit_nh
    - fix atomic notifier initialization issue
      (Reported-by: kernel test robot <rong.a.chen@intel.com>)
      The reported error was easy to locate and fix, but i was not
      able to re-test with the robot reproducer script due to some
      other issues with running the script on my test system.

v3 - fix wording in selftest print as pointed out by davidA

v2 -
- dropped nikolays fixes for nexthop multipath null pointer deref
  (he will send those separately)
- added negative tests for route add with fdb nexthop + a few more
- Fixes for a few  fdb replace conditions found during more testing
- Moved to rcu_dereference_rtnl in vxlan_fdb_info and consolidate rcu
  dereferences
- Fixes to build failures Reported-by: kbuild test robot <lkp@intel.com>
- DavidA, I am going to send a separate patch for the neighbor code validation
  for NDA_NH_ID if thats ok.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: net: add fdb nexthop tests

This commit adds ipv4 and ipv6 fdb nexthop api tests to fib_nexthops.sh.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: support for nexthop notifiers

vxlan driver registers for nexthop add/del notifiers to
cleanup fdb entries pointing to such nexthops.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nexthop: add support for notifiers

This patch adds nexthop add/del notifiers. To be used by
vxlan driver in a later patch. Could possibly be used by
switchdev drivers in the future.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: ecmp support for mac fdb entries

Todays vxlan mac fdb entries can point to multiple remote
ips (rdsts) with the sole purpose of replicating
broadcast-multicast and unknown unicast packets to those remote ips.

E-VPN multihoming [1,2,3] requires bridged vxlan traffic to be
load balanced to remote switches (vteps) belonging to the
same multi-homed ethernet segment (E-VPN multihoming is analogous
to multi-homed LAG implementations, but with the inter-switch
peerlink replaced with a vxlan tunnel). In other words it needs
support for mac ecmp. Furthermore, for faster convergence, E-VPN
multihoming needs the ability to update fdb ecmp nexthops independent
of the fdb entries.

New route nexthop API is perfect for this usecase.
This patch extends the vxlan fdb code to take a nexthop id
pointing to an ecmp nexthop group.

Changes include:
- New NDA_NH_ID attribute for fdbs
- Use the newly added fdb nexthop groups
- makes vxlan rdsts and nexthop handling code mutually
exclusive
- since this is a new use-case and the requirement is for ecmp
nexthop groups, the fdb add and update path checks that the
nexthop is really an ecmp nexthop group. This check can be relaxed
in the future, if we want to introduce replication fdb nexthop groups
and allow its use in lieu of current rdst lists.
- fdb update requests with nexthop id's only allowed for existing
fdb's that have nexthop id's
- learning will not override an existing fdb entry with nexthop
group
- I have wrapped the switchdev offload code around the presence of
rdst

[1] E-VPN RFC https://tools.ietf.org/html/rfc7432
[2] E-VPN with vxlan https://tools.ietf.org/html/rfc8365
[3] http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

Includes a null check fix in vxlan_xmit from Nikolay

v2 - Fixed build issue:
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nexthop: support for fdb ecmp nexthops

This patch introduces ecmp nexthops and nexthop groups
for mac fdb entries. In subsequent patches this is used
by the vxlan driver fdb entries. The use case is
E-VPN multihoming [1,2,3] which requires bridged vxlan traffic
to be load balanced to remote switches (vteps) belonging to
the same multi-homed ethernet segment (This is analogous to
a multi-homed LAG but over vxlan).

Changes include new nexthop flag NHA_FDB for nexthops
referenced by fdb entries. These nexthops only have ip.
This patch includes appropriate checks to avoid routes
referencing such nexthops.

example:
$ip nexthop add id 12 via 172.16.1.2 fdb
$ip nexthop add id 13 via 172.16.1.3 fdb
$ip nexthop add id 102 group 12/13 fdb

$bridge fdb add 02:02:00:00:00:13 dev vxlan1000 nhid 101 self

[1] E-VPN https://tools.ietf.org/html/rfc7432
[2] E-VPN VxLAN: https://tools.ietf.org/html/rfc8365
[3] LPC talk with mention of nexthop groups for L2 ecmp
http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

v4 - fixed uninitialized variable reported by kernel test robot
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2020-05-21

This series contains updates to igc and e1000.

Andre cleans up code that was left over from the igb driver that handled
MAC address filters based on the source address, which is not currently
supported.  Simplifies the MAC address filtering code and prepare the
igc driver for future source address support.  Updated the MAC address
filter internal APIs to support filters based on source address.  Added
support for Network Flow Classification (NFC) rules based on source MAC
address.  Cleaned up the 'cookie' field which is not used anywhere in
the code and cleaned up a wrapper function that was not needed.
Simplified the filtering code for readability and aligned the ethtool
functions, so that function names were consistent.

Alex provides a fix for e1000 to resolve a deadlock issue when NAPI is
being disabled.

Sasha does additional cleanup of the igc driver of dead code that is not
used or needed.

v2: Fix the function header comment in patch 3 of the series, based on
    the feedback from Jakub Kicinski.
====================

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ice: Rename build_ctob to ice_build_ctob

To make the function easier to identify as being part of the ice driver,
prepend ice to the function name.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>