git.proxmox.com Git - mirror_ubuntu-hirsute-kernel.git/log

netfilter: nf_tables: pass ctx to nf_tables_expr_destroy()

nft_set_elem_destroy() can be called from call_rcu context. Annotate
netns and table in set object so we can populate the context object.
Moreover, pass context object to nf_tables_set_elem_destroy() from the
commit phase, since it is already available from there.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conncount: expose connection list interface

This patch provides an interface to maintain the list of connections and
the lookup function to obtain the number of connections in the list.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: pass context to object destroy indirection

The new connlimit object needs this to properly deal with conntrack
dependencies.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: Libify xt_TPROXY

The extracted functions will likely be usefull to implement tproxy
support in nf_tables.

Extrancted functions:
- nf_tproxy_sk_is_transparent
- nf_tproxy_laddr4
- nf_tproxy_handle_time_wait4
- nf_tproxy_get_sock_v4
- nf_tproxy_laddr6
- nf_tproxy_handle_time_wait6
- nf_tproxy_get_sock_v6

(nf_)tproxy_handle_time_wait6 also needed some refactor as its current
implementation was xtables-specific.

Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: Decrease code duplication regarding transparent socket option

There is a function in include/net/netfilter/nf_socket.h to decide if a
socket has IP(V6)_TRANSPARENT socket option set or not. However this
does the same as inet_sk_transparent() in include/net/tcp.h

include/net/tcp.h:1733
/* This helper checks if socket has IP_TRANSPARENT set */
static inline bool inet_sk_transparent(const struct sock *sk)
{
switch (sk->sk_state) {
case TCP_TIME_WAIT:
return inet_twsk(sk)->tw_transparent;
case TCP_NEW_SYN_RECV:
return inet_rsk(inet_reqsk(sk))->no_srccheck;
}
return inet_sk(sk)->transparent;
}

tproxy_sk_is_transparent has also been refactored to use this function
instead of reimplementing it.

Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next
tree, the most relevant things in this batch are:

1) Compile masquerade infrastructure into NAT module, from Florian Westphal.
   Same thing with the redirection support.

2) Abort transaction if early initialization of the commit phase fails.
   Also from Florian.

3) Get rid of synchronize_rcu() by using rule array in nf_tables, from
   Florian.

4) Abort nf_tables batch if fatal signal is pending, from Florian.

5) Use .call_rcu nfnetlink from nf_tables to make dumps fully lockless.
   From Florian Westphal.

6) Support to match transparent sockets from nf_tables, from Máté Eckl.

7) Audit support for nf_tables, from Phil Sutter.

8) Validate chain dependencies from commit phase, fall back to fine grain
   validation only in case of errors.

9) Attach dst to skbuff from netfilter flowtable packet path, from
   Jason A. Donenfeld.

10) Use artificial maximum attribute cap to remove VLA from nfnetlink.
    Patch from Kees Cook.

11) Add extension to allow to forward packets through neighbour layer.

12) Add IPv6 conntrack helper support to IPVS, from Julian Anastasov.

13) Add IPv6 FTP conntrack support to IPVS, from Julian Anastasov.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5e-updates-2018-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5e-updates-2018-06-01

1) From Tariq, Two patches to Fix IPoIB issues introduced in
   "net/mlx5e: TX, Use actual WQE size for SQ edge fill"

2) From Eran, Additional improvements to mlx5e statistics reporting

3) From Maor, Increase aRFS flow tables size

4) From Adi, Support MTU change for ethernet representors

5) From Ilan and Adi, Handle QP error events in FPGA

6) From Tariq, last 10 patches mainly deals with RX buffer scheme improvements for legacy RQ
   to use only order-0 pages and fragmented SKBs for large MTUs.

-  Tariq starts with some refactoring and removing HW LRO support from traditional
   (legacy) RQ, since it complicates the buffer scheme and removing it makes it smoother
   to move to cyclic descriptor buffer for traditional RQ.

- Use cyclic WQ in legacy RQ, which has many benefits and paves the way for fragmented SKBs
  for large MTUs.

- Enhance legacy Receive Queue memory scheme, such that only order-0 pages are used.
  Whenever possible, prefer using a linear SKB, and build it wrapping the WQE buffer.
  Otherwise (for example, jumbo frames on x86), use non-linear SKB, with as many frags
  as needed. In this case, multiple WQE scatter entries are used, up to a maximum of 4
  frags and 10KB of MTU.

- TX statistics access improvements.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: TX, Separate cachelines of xmit and completion stats

Avoid false sharing of cachelines by separating the cachelines of
TX stats that are dertied in xmit flow and in completion flow.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: RX, Always prefer Linear SKB configuration

Prefer the linear SKB configuration of Legacy RQ over the
non-linear one of Striding RQ.

This implies that ConnectX-4 LX now uses legacy RQ by default,
as it does not support the linear configuration of Striding RQ.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: RX, Enhance legacy Receive Queue memory scheme

Enhance the memory scheme of the legacy RQ, such that
only order-0 pages are used.

Whenever possible, prefer using a linear SKB, and build it
wrapping the WQE buffer.

Otherwise (for example, jumbo frames on x86), use non-linear SKB,
with as many frags as needed. In this case, multiple WQE
scatter entries are used, up to a maximum of 4 frags and 10KB of MTU.

This implied to remove support of HW LRO in legacy RQ, as it would
require large number of page allocations and scatter entries per WQE
on archs with PAGE_SIZE = 4KB, yielding bad performance.

In earlier patches, we guaranteed that all completions are in-order,
and that we use a cyclic WQ.
This creates an oppurtunity for a performance optimization:
The mapping between a "struct mlx5e_dma_info", and the
WQEs (struct mlx5e_wqe_frag_info) pointing to it, is constant
across different cycles of a WQ. This allows initializing
the mapping in the time of RQ creation, and not handle it
in datapath.

A struct mlx5e_dma_info that is shared between different WQEs
is allocated by the first WQE, and freed by the last one.
This implies an important requirement: WQEs that share the same
struct mlx5e_dma_info must be posted within the same NAPI.
Otherwise, upon completion, struct mlx5e_wqe_frag_info would mistakenly
point to the new struct mlx5e_dma_info, not the one that was posted
(and the HW wrote to).
This bulking requirement is actually good also for performance reasons,
hence we extend the bulk beyong the minimal requirement above.

With this memory scheme, the RQs memory footprint is reduce by a
factor of 2 on x86, and by a factor of 32 on PowerPC.
Same factors apply for the number of pages in a GRO session.

Performance tests:
ConnectX-4, single core, single RX ring, default MTU.

x86:
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

Packet rate (early drop in TC): no degradation
TCP streams: ~5% improvement

PowerPC:
CPU: POWER8 (raw), altivec supported

Packet rate (early drop in TC): 20% gain
TCP streams: 25% gain

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: RX, Use cyclic WQ in legacy RQ

Now that LRO is not supported for Legacy RQ, there is no source of
out-of-order completions in the WQ, and we can use a cyclic one.
This has multiple advantages:
- reduces the WQE size (smaller PCI transactions).
- lower overhead in datapath (no handling of 'next' pointers).
- no reserved WQE for the WQ head (was need in linked-list).
- allows using a constant map between frag and dma_info struct, in downstream patch.

Performance tests:
ConnectX-4, single core, single RX ring.
Major gain in packet rate of single ring XDP drop.
Bottleneck is shifted form HW (at 16Mpps) to SW (at 20Mpps).

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: RX, Split WQ objects for different RQ types

Replace the common RQ WQ object with two separate ones for the
different RQ types.
This is in preparation for switching to using a cyclic WQ type
in Legacy RQ.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: RX, Remove HW LRO support in legacy RQ

Current LRO implementation in Legacy RQ uses high-order pages.
In downstream patches of this series we complete the transition
to using only order-0 pages in RX datapath (which was already done
in Striding RQ).

Unlike the more advanced Striding RQ, Legacy RQ does not make reuse
of any non-consumed buffers of non-full LRO sessions, and combining
it with order-0 pages has many performance drawbacks.

Hence, here we totally remove LRO support in Legacy RQ.
This guarantees having no out-of-order completions, which allows using
a cyclic work queue (instead of a linked-list) in a downstream patch.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: RX, Dedicate a function for copying SKB header

Get the logic of copying the packet header into the SKB linear part
into a generic function. Function does copy length alignment
and dma buffer sync.

It is currently called only within the MPWQE flow.
In a downstream patch, it will be called within the legacy RQ flow
as well.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: RX, Generalise function of SKB frag addition

Rename it and pass truesize as an extra argument, as it will be used also
in Legacy RQ in a downstream patch.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: RX, Generalise name of non-linear SKB head size

Make name more generic by dropping MPWRQ from it, as it will be
used also in Legacy RQ in a downstream patch.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: TX, Obsolete maintaining local copies of skb->len/data

Instead of maintaining a local copy of skb->len/data and updating
it upon every copy to the WQE inline part, just calculate it once
when needed, using the ihs.

This obsoletes the function mlx5e_tx_skb_pull_inline.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: FPGA, Handle QP error event

Add handlers for this event to perform graceful teardown of the device.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Support configurable MTU for vport representors

The representor MTU was hard coded to 1500 bytes.
Allow setting arbitrary MTU values up to the max supported by the FW.

Signed-off-by: Adi Nissim <adin@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Increase aRFS flow tables size

Increase the aRFS flow table size to 64k so it could contain up to 64k
different streams.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Remove redundant active_channels indication

Now, when all channels stats are saved regardless of the channel's state
{open, closed}, we can safely remove this indication and the stats spin
lock which protects it.

Fixes: 76c3810bade3 ("net/mlx5e: Avoid reset netdev stats on configuration changes")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Present SW stats when state is not opened

The driver can present all SW stats even when the state not opened.
Fixed get strings, count and stats to support it.

In addition, fix tc2txq to hold a static mapping which doesn't depend on
the amount of open channels, and cannot have the same value on two
different cells while moving between configurations.
Example:
- OOB 16 channels
- Change to 2 channels, 8 TCs
- tc2txq[15][0] == tc2txq[1][7] == 15
This will cause multiple appearances of the same TX index in statistics
output.

Fixes: 76c3810bade3 ("net/mlx5e: Avoid reset netdev stats on configuration changes")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: IPOIB, Add a missing skb_pull

A call to mlx5e_tx_skb_pull_inline was mistakenly dropped
in the cited patch. Get it back.

Fixes: 043dc78ecf07 ("net/mlx5e: TX, Use actual WQE size for SQ edge fill")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: IPOIB, Fix overflowing SQ WQE memset

IPoIB WQE size is larger than a single WQEBB. Must not fetch the WQE,
and surely not memset it, until it is guaranteed that there are enough
WQEBBs available before getting to SQ/frag edge.

Fixes: 043dc78ecf07 ("net/mlx5e: TX, Use actual WQE size for SQ edge fill")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

Merge branch 'hns3-next'

Salil Mehta says:

====================
Misc. bug fixes & optimizations for HNS3 driver

This patch-set presents some bug fixes found out during the internal
review and system testing and some small optimizations.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Optimize the VF's process of updating multicast MAC

In the update flow of the new PF driver, if a multicast address is in mta
table, the VF deletion action will not take effect.

This patch adds the VF adaptation according to the new flow of PF'driver.

Signed-off-by: Xi Wang <wangxi11@huawei.com>
Reviewed-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Optimize the PF's process of updating multicast MAC

In the current process, the multicast MAC is added to both MAC_VLAN
table and MTA table, this will reduce the utilization of the resource.

This patch improves the process of adding multicast MAC address, the
new process starts using the MTA table to add multicast MAC after the
MAC_VLAN table is full, and the MTA is disable if it is no longer used.

Signed-off-by: Xi Wang <wangxi11@huawei.com>
Reviewed-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Fix for vxlan tx checksum bug

when skb->encapsulation is 0, skb->ip_summed is CHECKSUM_PARTIAL
and it is udp packet, which has a dest port as the IANA assigned.
the hardware is expected to do the checksum offload, but the
hardware will not do the checksum offload when udp dest port is
4789.

This patch fixes it by doing the checksum in software.

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Add missing break in misc_irq_handle

There is a break missing in the switch/case handling in
hclge_misc_irq_handle, which causes the log to output
uncorrectly.

This patch adds the missing break, and change the dev_dbg
to dev_warn in order to better catch the error.

Fixes: c1a81619d73a ("net: hns3: Add mailbox interrupt handling to PF driver")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Fix for phy not link up problem after resetting

When resetting, phy_state_machine may be accessing the phy through
firmware if the phy is not stopped or disconnected, which will
cause firemware timeout problem because the firmware is busy
processing the reset request.

This patch fixes it by disabling the phy when resetting.

Fixes: b940aeae0ed6 ("net: hns3: never send command queue message to IMP when reset")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Fix for hclge_reset running repeatly problem

When hardware sends the HCLGE_VECTOR0_EVENT_RST event through
hclge_misc_irq_handle, currently driver enables misc_vector in
the interrupt handle, and hardware generates the same interrupt
for the same reset event again and again until the reset is
complete, which causes hclge_reset running repeatly problem.

This patch fixes by enabling the misc_vector after reset is
complete.

Fixes: 4ed340ab8f49 ("net: hns3: Add reset process in hclge_main")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Fix for service_task not running problem after resetting

When hclge_ae_stop is called during resetting, it will cancel the
service_task by calling cancel_work_sync, which may cause the
service_task to exit without clearing HCLGE_STATE_SERVICE_SCHED
bit. If this happens, the service_task will never run again.

This patch fixes this problem by clearing it after calling
cancel_work_sync in hclge_ae_stop.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Fix setting mac address error

When doing function reset or insmod hns3 dirver after rmmod,
the entries of mac vlan table are not cleared, which may cause
init mac address failed. This patch fixes it by clearing the
old mac address when doing function reset or rmmod hns3 driver.

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Add repeat address checking for setting mac address

Add checking for new mac address. It doesn't need to config
the mac vlan table if it's already in use.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Add support for IFF_ALLMULTI flag

This patch adds support for IFF_ALLMULTI flag to HNS3 PF and VF
driver.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Disable vf vlan filter when vf vlan table is full

This is only 128 entries for hardware's vf vlan table, when
the vf table is full, the firmware will disable the vf vlan
filter and return a resp_code of HCLGE_VF_VLAN_NO_ENTRY to
driver.

This patch checks the if resp_code from firmware is
HCLGE_VF_VLAN_NO_ENTRY, if yes, then print a warning and
return ok to the caller.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mirror-to-gretap-tests'

Petr Machata says:

====================
Test mirror-to-gretap with bridge in UL

This patchset adds more tests to the mirror-to-gretap suite where bridge
is present in the underlay. Specifically it adds tests for bridge VLAN
handling, FDB, and bridge port STP status.

In patches #1-#3, the codebase is refactored to support the new tests.

In patch #4, an STP test is added to the mirroring library, that will
later be called from bridge tests.

In patches #5-#8, the test for mirror-to-gretap with an 802.1q bridge in
underlay is adapted and more tests are added.

In patch #9, an STP test is added to the test suite for mirror-to-gretap
with an 802.1d bridge in underlay.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: mirror_gre_bridge_1d_vlan: Add STP test

To test offloading of mirror-to-gretap in mlxsw for cases that a
VLAN-unaware bridge is in underlay packet path, test that the STP status
of bridge egress port is reflected.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: mirror_gre_vlan_bridge_1q: Add more tests

Offloading of mirror-to-gretap in mlxsw is tricky especially in cases
when the gretap underlay involves bridges. Add more tests that exercise
the bridge handling code:

- forbidden_egress tests that check vlan removal on bridge port in the
underlay packet path
- untagged_egress tests that similarly check "egress untagged"
- fdb_roaming tests that check whether learning FDB on a different port
is reflected
- stp tests for handling port STP status of bridge egress port

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: mirror_gre_vlan_bridge_1q: Rename two tests

Rename test_gretap_forbidden() and test_ip6gretap_forbidden() to a more
specific test_gretap_forbidden_cpu() and test_ip6gretap_forbidden_cpu().
This will make it clearer which is which when further down a patch is
introduced that forbids a VLAN on regular bridge port.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: mirror_gre_vlan_bridge_1q: Test final config

After the final change reestablishes the original configuration, make
sure the traffic flows again as it should.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: mirror_gre_vlan_bridge_1q: Fix tunnel name

The "ip6gretap" in the test name refers to the tunnel device type that
the test is supposed to be testing. However test_ip6gretap_forbidden()
tests, due to a typo, a gretap tunnel. Fix the typo.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: mirror_gre_lib: Add STP test

Add a reusable full test that toggles STP state of a given bridge port
and checks that the mirroring reacts appropriately. The test will be
used by bridge tests in follow-up patches.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: mirror_lib: skip_hw the VLAN capture

When the VLAN capture is installed on a front panel device and not a
soft device, the packets are counted twice: once in fast path, and once
after they are trapped to the kernel. Resolve the problem by passing
skip_hw flag to vlan_capture_install().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: mirror_lib: Move here do_test_span_vlan_dir_ips()

Move the function do_test_span_vlan_dir_ips() from mirror_vlan.sh test
to a library file mirror_lib.sh to allow reuse. Fill in other entry
points similar to other testing functions in mirror_lib.sh, they will be
useful in following patches.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: lib: Move here vlan_capture_{, un}install()

Move vlan_capture_install() and vlan_capture_uninstall() from
mirror_vlan.sh test to lib.sh so that it can be reused in other tests.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: Split the PPv2 driver to a dedicated directory

As the mvpp2 driver is growing, move this driver to a dedicated
directory and split it into several files.

Since this driver has a lot of register defines and structure
definitions, it can benefit from having all of this into a dedicated
header file, named mvpp2.h.

A good chunk of the mvpp2 code is dedicated to Header Parser handling, so
we introduce mvpp2_prs.h where all Header Parser definitions are located,
and mvpp2_prs.c containing the related code.

In the same way, mvpp2_cls.h and mvpp2_cls.c are created to contain
Classifier and RSS related code.

The former 'mvpp2.c' file is renamed 'mvpp2_main.c' so that we can keep
the driver binary named 'mvpp2'.

This commit is only about spliting the driver into multiple files and
doesn't introduce any new function, feature or fix besides removing
'static' keywords when needed.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: split tc_ctl_tfilter into three handlers

tc_ctl_tfilter handles three netlink message types: RTM_NEWTFILTER,
RTM_DELTFILTER, RTM_GETTFILTER. However, implementation of this function
involves a lot of branching on specific message type because most of the
code is message-specific. This significantly complicates adding new
functionality and doesn't provide much benefit of code reuse.

Split tc_ctl_tfilter to three standalone functions that handle filter new,
delete and get requests.

The only truly protocol independent part of tc_ctl_tfilter is code that
looks up queue, class, and block. Refactor this code to standalone
tcf_block_find function that is used by all three new handlers.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: Fix null-ptr-deref in rtnl_newlink

In rtnl_newlink(), NULL check is performed on m_ops however member of
ops is accessed. Fixed by accessing member of m_ops instead of ops.

[  345.432629] BUG: KASAN: null-ptr-deref in rtnl_newlink+0x400/0x1110
[  345.432629] Read of size 4 at addr 0000000000000088 by task ip/986
[  345.432629]
[  345.432629] CPU: 1 PID: 986 Comm: ip Not tainted 4.17.0-rc6+ #9
[  345.432629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  345.432629] Call Trace:
[  345.432629]  dump_stack+0xc6/0x150
[  345.432629]  ? dump_stack_print_info.cold.0+0x1b/0x1b
[  345.432629]  ? kasan_report+0xb4/0x410
[  345.432629]  kasan_report.cold.4+0x8f/0x91
[  345.432629]  ? rtnl_newlink+0x400/0x1110
[  345.432629]  rtnl_newlink+0x400/0x1110
[...]

Fixes: ccf8dbcd062a ("rtnetlink: Remove VLA usage")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipvs: add ipv6 support to ftp

Add support for FTP commands with extended format (RFC 2428):

- FTP EPRT: IPv4 and IPv6, active mode, similar to PORT
- FTP EPSV: IPv4 and IPv6, passive mode, similar to PASV.
EPSV response usually contains only port but we allow real
server to provide different address

We restrict control and data connection to be from same
address family.

Allow the "(" and ")" to be optional in PASV response.

Also, add ipvsh argument to the pkt_in/pkt_out handlers to better
access the payload after transport header.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipvs: add full ipv6 support to nfct

Prepare NFCT to support IPv6 for FTP:

- Do not restrict the expectation callback to PF_INET

- Split the debug messages, so that the 160-byte limitation
in IP_VS_DBG_BUF is not exceeded when printing many IPv6
addresses. This means no more than 3 addresses in one message,
i.e. 1 tuple with 2 addresses or 1 connection with 3 addresses.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_fwd_netdev: allow to forward packets via neighbour layer

This allows us to forward packets from the netdev family via neighbour
layer, so you don't need an explicit link-layer destination when using
this expression from rules. The ttl/hop_limit field is decremented.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink: Remove VLA usage

In the quest to remove all stack VLA usage from the kernel[1], this
allocates the maximum size expected for all possible attrs and adds
sanity-checks at both registration and usage to make sure nothing
gets out of sync.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_flow_table: attach dst to skbs

Some drivers, such as vxlan and wireguard, use the skb's dst in order to
determine things like PMTU. They therefore loose functionality when flow
offloading is enabled. So, we ensure the skb has it before xmit'ing it
in the offloading path.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix chain dependency validation

The following ruleset:

add table ip filter
add chain ip filter input { type filter hook input priority 4; }
add chain ip filter ap
add rule ip filter input jump ap
add rule ip filter ap masquerade

results in a panic, because the masquerade extension should be rejected
from the filter chain. The existing validation is missing a chain
dependency check when the rule is added to the non-base chain.

This patch fixes the problem by walking down the rules from the
basechains, searching for either immediate or lookup expressions, then
jumping to non-base chains and again walking down the rules to perform
the expression validation, so we make sure the full ruleset graph is
validated. This is done only once from the commit phase, in case of
problem, we abort the transaction and perform fine grain validation for
error reporting. This patch requires 003087911af2 ("netfilter:
nfnetlink: allow commit to fail") to achieve this behaviour.

This patch also adds a cleanup callback to nfnl batch interface to reset
the validate state from the exit path.

As a result of this patch, nf_tables_check_loops() doesn't use
->validate to check for loops, instead it just checks for immediate
expressions.

Reported-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: Add audit support to log statement

This extends log statement to support the behaviour achieved with
AUDIT target in iptables.

Audit logging is enabled via a pseudo log level 8. In this case any
other settings like log prefix are ignored since audit log format is
fixed.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add support for native socket matching

Now it can only match the transparent flag of an ip/ipv6 socket.

Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: fix ptr_ret.cocci warnings

net/netfilter/nft_numgen.c:117:1-3: WARNING: PTR_ERR_OR_ZERO can be used
net/netfilter/nft_hash.c:180:1-3: WARNING: PTR_ERR_OR_ZERO can be used
net/netfilter/nft_hash.c:223:1-3: WARNING: PTR_ERR_OR_ZERO can be used

Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci

Fixes: b9ccc07e3f31 ("netfilter: nft_hash: add map lookups for hashing operations")
Fixes: d734a2888922 ("netfilter: nft_numgen: add map lookups for numgen statements")
CC: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
Acked-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

virtio_net: fix error return code in virtnet_probe()

Fix to return a negative error code from the failover create fail error
handling case instead of 0, as done elsewhere in this function.

Fixes: ba5e4426e80e ("virtio_net: Extend virtio to use VF datapath when available")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: Remove VLA usage

In the quest to remove all stack VLA usage from the kernel[1], this
allocates the maximum size expected for all possible types and adds
sanity-checks at both registration and usage to make sure nothing gets
out of sync. This matches the proposed VLA solution for nfnetlink[2]. The
values chosen here were based on finding assignments for .maxtype and
.slave_maxtype and manually counting the enums:

slave_maxtype (max 33):
IFLA_BRPORT_MAX     33
IFLA_BOND_SLAVE_MAX  9

maxtype (max 45):
IFLA_BOND_MAX       28
IFLA_BR_MAX         45
__IFLA_CAIF_HSI_MAX  8
IFLA_CAIF_MAX        4
IFLA_CAN_MAX        16
IFLA_GENEVE_MAX     12
IFLA_GRE_MAX        25
IFLA_GTP_MAX         5
IFLA_HSR_MAX         7
IFLA_IPOIB_MAX       4
IFLA_IPTUN_MAX      21
IFLA_IPVLAN_MAX      3
IFLA_MACSEC_MAX     15
IFLA_MACVLAN_MAX     7
IFLA_PPP_MAX         2
__IFLA_RMNET_MAX     4
IFLA_VLAN_MAX        6
IFLA_VRF_MAX         2
IFLA_VTI_MAX         7
IFLA_VXLAN_MAX      28
VETH_INFO_MAX        2
VXCAN_INFO_MAX       2

This additionally changes maxtype and slave_maxtype fields to unsigned,
since they're only ever using positive values.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
[2] https://patchwork.kernel.org/patch/10439647/

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Be explicit about DT or pdata

Make it explicit that either device tree is used or platform data. If
neither is available, abort the probe.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 877b7cb0b6f2 ("net: dsa: mv88e6xxx: Add minimal platform_data support")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ti: cpsw: include gpio/consumer.h

On platforms that don't always enable CONFIG_GPIOLIB, we run into
a build failure:

drivers/net/ethernet/ti/cpsw.c: In function 'cpsw_probe':
drivers/net/ethernet/ti/cpsw.c:3006:9: error: implicit declaration of function 'devm_gpiod_get_array_optional' [-Werror=implicit-function-declaration]
  mode = devm_gpiod_get_array_optional(&pdev->dev, "mode", GPIOD_OUT_LOW);
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/ti/cpsw.c:3006:59: error: 'GPIOD_OUT_LOW' undeclared (first use in this function); did you mean 'GPIOF_INIT_LOW'?
  mode = devm_gpiod_get_array_optional(&pdev->dev, "mode", GPIOD_OUT_LOW);

Since we cannot rely on this to be visible from gpio.h, we have to include
gpio/consumer.h directly.

Fixes: 2652113ff043 ("net: ethernet: ti: Allow most drivers with COMPILE_TEST")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx5-new-device-events'

Saeed Mahameed says:

====================
Mellanox, mlx5 new device events

The following series is for mlx5-next tree [1], it adds the support of two
new device events, from Ilan Tayari:

1. High temperature warnings.
2. FPGA QP error event.

In case of no objection this series will be applied to mlx5-next tree
and will be sent later as a pull request to both rdma and net trees.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-next

v1->v2:
- improve commit message of the FPGA QP error event patch.
====================

Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Add FPGA QP error event

The FPGA queue pair (QP) event fires whenever a QP on the FPGA
transitions to the error state.

At this stage, this event is unrecoverable, it may become recoverable
in the future.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Add temperature warning event to log

Temperature warning event is sent by FW to indicate high temperature
as detected by one of the sensors on the board.
Add handling of this event by writing the numbers of the alert sensors
to the kernel log.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: Add more well known protocol values

FRRouting installs routes into the kernel associated with
the originating protocol. Add these values to the well
known values in rtnetlink.h.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Add FORCE_PAUSE bit to 32 bit port caps

Add FORCE_PAUSE bit to force local pause settings instead
of using auto negotiated values.

Signed-off-by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bridge-vlan-notify'

Petr Machata says:

====================
net: bridge: Notify about bridge VLANs

In commit 946a11e7408e ("mlxsw: spectrum_span: Allow bridge for gretap
mirror"), mlxsw got support for offloading mirror-to-gretap such that
the underlay packet path involves a bridge. In that case, the offload is
also influenced by PVID setting of said bridge. However, changes to VLAN
configuration of the bridge itself do not generate switchdev
notifications, so there's no mechanism to prod mlxsw to update the
offload when these settings change.

In this patchset, the problem is resolved by distributing the switchdev
notification SWITCHDEV_OBJ_ID_PORT_VLAN also for configuration changes
on bridge VLANs. Since stacked devices distribute the notification to
lower devices, such event eventually reaches the driver, which can
determine whether it's a bridge or port VLAN by inspecting orig_dev.

To keep things consistent, the newly-distributed notifications observe
the same protocol as the existing ones: dual prepare/commit, with
-EOPNOTSUPP indicating lack of support, even though there's currently
nothing to prepare for and nothing to support. Correspondingly, all
switchdev drivers have been updated to return -EOPNOTSUPP for bridge
VLAN notifications.

In patches #1 and #2, the code base is changed to support the following
additions: functions br_switchdev_port_vlan_add() and
br_switchdev_port_vlan_del() are introduced to simplify sending
notifications; and br_vlan_add_existing() is introduced to later make it
simpler to add error-handling code for the case of configuring a
preexisting VLAN on bridge CPU port.

In patches #3-#6, respectively for mlxsw, rocker, DSA and DPAA2 ethsw,
the new notifications (which are not enabled yet) are ignored to
maintain the current behavior.

In patch #7, the notification is actually enabled.

In patch #8, mlxsw is changed to update offloads of mirror-to-gre also
for bridge-related notifications.

Changes from v3 to v4:

- In patch #1, separate variable declarations from program logic.
- Add patch #2.
- In patch #7, add error handling around a newly-introduced call to
  br_switchdev_port_vlan_add().
- Rephrase commit messages of patches #3-#6 to explain motivation for
  the change.

Changes from v2 to v3:

- Add a fallback definition for br_switchdev_port_obj_add() and
  br_switchdev_port_obj_del() when !CONFIG_NET_SWITCHDEV.

Changes from v1 to v2:

- Rename br_switchdev_port_obj_add() and br_switchdev_port_obj_del() to
  br_switchdev_port_vlan_add() and br_switchdev_port_vlan_del(), and
  move from br_vlan.c to br_switchdev.c.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_switchdev: Schedule respin during trans prepare

Since there's no special support for the bridge events, the driver
returns -EOPNOTSUPP, and thus the commit never happens. Therefore
schedule respin during the prepare stage: there's no real difference one
way or another.

This fixes the problem that mirror-to-gretap offload wouldn't adapt to
changes in bridge vlan configuration right away and another notification
would have to arrive for mlxsw to catch up.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: Notify about bridge VLANs

A driver might need to react to changes in settings of brentry VLANs.
Therefore send switchdev port notifications for these as well. Reuse
SWITCHDEV_OBJ_ID_PORT_VLAN for this purpose. Listeners should use
netif_is_bridge_master() on orig_dev to determine whether the
notification is about a bridge port or a bridge.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

staging: fsl-dpaa2: ethsw: Ignore bridge VLAN events

A follow-up patch enables emitting VLAN notifications for the bridge CPU
port in addition to the existing slave port notifications. These
notifications have orig_dev set to the bridge in question.

Because there's no specific support for these VLANs, just ignore the
notifications to maintain the current behavior.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: port: Ignore bridge VLAN events

A follow-up patch enables emitting VLAN notifications for the bridge CPU
port in addition to the existing slave port notifications. These
notifications have orig_dev set to the bridge in question.

Because there's no specific support for these VLANs, just ignore the
notifications to maintain the current behavior.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: rocker_main: Ignore bridge VLAN events

A follow-up patch enables emitting VLAN notifications for the bridge CPU
port in addition to the existing slave port notifications. These
notifications have orig_dev set to the bridge in question.

Because there's no specific support for these VLANs, just ignore the
notifications to maintain the current behavior.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_switchdev: Ignore bridge VLAN events

A follow-up patch enables emitting VLAN notifications for the bridge CPU
port in addition to the existing slave port notifications. These
notifications have orig_dev set to the bridge in question.

Because there's no specific support for these VLANs, just ignore the
notifications to maintain the current behavior.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: Extract br_vlan_add_existing()

Extract the code that deals with adding a preexisting VLAN to bridge CPU
port to a separate function. A follow-up patch introduces a need to roll
back operations in this block due to an error, and this split will make
the error-handling code clearer.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: Extract boilerplate around switchdev_port_obj_*()

A call to switchdev_port_obj_add() or switchdev_port_obj_del() involves
initializing a struct switchdev_obj_port_vlan, a piece of code that
repeats on each call site almost verbatim. While in the current codebase
there is just one duplicated add call, the follow-up patches add more of
both add and del calls.

Thus to remove the duplication, extract the repetition into named
functions and reuse.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed*: Add link change count value to ethtool statistics display.

This patch adds driver changes for capturing the link change count in
ethtool statistics display.

Please consider applying this to "net-next".

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5e-updates-2018-05-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5e-updates-2018-05-29

This series includes mlx5 FPGA and mlx5e netdevice updates:

1) Print FPGA info such as device name, vendor id, etc.., from Ilan Tayari.
2) Abort FPGA if some essential capabilities are not supported, from Yevgeny Kliteynik.
3) Two FPGA dma related minor fixes, from Ilya Lesokhin.
4) Use the right table to report offloaded TC rules, from Or Gerlitz.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: remove bypassed check in sch_direct_xmit()

Checking netif_xmit_frozen_or_stopped() at the end of sch_direct_xmit()
is being bypassed. This is because "ret" from sch_direct_xmit() will be
either NETDEV_TX_OK or NETDEV_TX_BUSY, and only ret == NETDEV_TX_OK == 0
will reach the condition:

    if (ret && netif_xmit_frozen_or_stopped(txq))
        return false;

This patch cleans up the code by removing the whole condition.

For more discussion about this, please refer to
   https://marc.info/?t=152727195700008

Signed-off-by: Song Liu <songliubraving@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: minor optimization around tcp_hdr() usage in receive path

This is additional to the
commit ea1627c20c34 ("tcp: minor optimizations around tcp_hdr() usage").
At this point, skb->data is same with tcp_hdr() as tcp header has not
been pulled yet. So use the less expensive one to get the tcp header.

Remove the third parameter of tcp_rcv_established() and put it into
the function body.

Furthermore, the local variables are listed as a reverse christmas tree :)

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: add myself as maintainer for QorIQ PTP clock driver

Added myself as maintainer for QorIQ PTP clock driver.
Since gianfar_ptp.c was renamed to ptp_qoriq.c, let's
maintain it under QorIQ PTP clock driver.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net: Fix various unnecessary characters after logging newlines

Remove and coalesce formats when there is an unnecessary
character after a logging newline. These extra characters
cause logging defects.

Miscellanea:

o Coalesce formats

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: davinci: fix building davinci mdio code without CONFIG_OF

Test-building this driver on targets without CONFIG_OF revealed a build
failure:

drivers/net/ethernet/ti/davinci_mdio.c: In function 'davinci_mdio_probe':
drivers/net/ethernet/ti/davinci_mdio.c:380:9: error: implicit declaration of function 'davinci_mdio_probe_dt'; did you mean 'davinci_mdio_probe'? [-Werror=implicit-function-declaration]

This adjusts the #ifdef logic in the driver to make it build in
all configurations.

Fixes: 2652113ff043 ("net: ethernet: ti: Allow most drivers with COMPILE_TEST")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: freescale: fix false-positive string overflow warning

While compile-testing on arm64 with gcc-8.1, I ran into a build diagnostic:

drivers/net/ethernet/freescale/fec_main.c: In function 'fec_probe':
drivers/net/ethernet/freescale/fec_main.c:3517:25: error: '%d' directive writing between 1 and 10 bytes into a region of size 5 [-Werror=format-overflow=]
   sprintf(irq_name, "int%d", i);
                         ^~
drivers/net/ethernet/freescale/fec_main.c:3517:21: note: directive argument in the range [0, 2147483646]
   sprintf(irq_name, "int%d", i);
                     ^~~~~~~
drivers/net/ethernet/freescale/fec_main.c:3517:3: note: 'sprintf' output between 5 and 14 bytes into a destination of size 8
   sprintf(irq_name, "int%d", i);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It appears this has never shown on ppc32 or arm32 for an unknown reason, but
now gcc fails to identify that the 'irq_cnt' loop index has an upper bound
of 3, and instead uses a bogus range.

To work around the warning, this changes the sprintf to snprintf with the
correct buffer length.

Fixes: 78cc6e7ef957 ("net: ethernet: freescale: Allow FEC with COMPILE_TEST")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Get the number of offloaded TC rules from the correct table

As we keep the offloaded TC rules for NIC and e-switch in two different
places, make sure to return the number of offloaded flows according
to the use-case and not blindly from the priv.

Fixes: 655dc3d2b91b ('net/mlx5e: Use shared table for offloaded TC eswitch flows')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: FPGA, Call DMA unmap with the right size

When mlx5_fpga_conn_unmap_buf is called buf->sg[0].size
should equal the actual buffer size, not the message size.
Otherwise we will trigger the following dma debug warning
"DMA-API: device driver frees DMA memory with different size"

Fixes: 537a50574175 ('net/mlx5: FPGA, Add high-speed connection routines')
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: FPGA, Properly initialize dma direction on fpga conn send

Properly initialize dma direction on fpga conn send.
Do not rely on dma_dir == 0 (DMA_BIDIRECTIONAL).

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: FPGA, Abort FPGA init if the device reports no QP capability

In the case that the reported max number of QPs capability
equals to zero, abort FPGA init.

Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: FPGA, print SBU identification on init

Add print of the following values on init:
1. ieee vendor id
2. sandbox product id
3. sandbox product version

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: FPGA, Add device name

Add device name for Mellanox FPGA devices.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: FPGA, Add doxygen for access type enum

Add doxygen comments for enum mlx5_fpga_access_type.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

bpfilter: fix a build err

gcc-7.3.0 report following err:

HOSTCC net/bpfilter/main.o
In file included from net/bpfilter/main.c:9:0:
./include/uapi/linux/bpf.h:12:10: fatal error: linux/bpf_common.h: No such file or directory
#include <linux/bpf_common.h>

remove it by adding a include path.
Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: use data length instead of skb->len in tcp_probe

skb->len is meaningless to user.
data length could be more helpful, with which we can easily filter out
the packet without payload.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

crypto: chtls: free beyond end rspq_skb_cache

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

crypto: chtls: kbuild warnings

- unindented continue
- check for null page
- signed return

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

crypto: chtls: dereference null variable

skb dereferenced before check in sendpage

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

crypto: chtls: wait for memory sendmsg, sendpage

address suspicious code <gustavo@embeddedor.com>

1210       set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
1211       }

The issue is that in the code above, set_bit is never reached
due to the 'continue' statement at line 1208.

Also reported by bug report:<dan.carpenter@oracle.com>
1210       set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Not reachable.

Its required to wait for buffer in the send path and takes care of
unaddress and un-handled SOCK_NOSPACE.

v2: use csk_mem_free where appropriate
    proper indent of goto do_nonblock
    replace out with do_rm_wq

Reported-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

crypto:chtls: key len correction

corrected the key length to copy 128b key. Removed 192b and 256b
key as user input supports key of size 128b in gcm_ctx

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-Add-address-attribute-to-control-metric-of-prefix-route'

David Ahern says:

====================
net: Add address attribute to control metric of prefix route

For use cases such as VRR (Virtual Router Redundancy) interface managers
want efficient control over the order of prefix routes when multiple
interfaces have addresses with overlapping/duplicate subnets.

Currently, if two interfaces have addresses in the same subnet, the order
of the prefix route entries is determined by the order in which the
addresses are assigned or the links brought up. Any actions like cycling
an interface up and down changes that order. This set adds a new attribute
for addresses to allow a user to specify the metric of the prefix route
associated with an address giving interface managers better and more
efficient control of the order of prefix routes.

Patches 1-3 refactor IPv6 address add functions to pass an ifa6_config
struct. The functions currently have a long list of arguments and adding
the metric just makes it worse. Because of the overall diff size in
moving the arguments to a struct, the change is done in stages to make
it easier to review starting with the bottom function and pushing the
struct up to callers in each successive patch.

Patch 4 introduces the new attribute.

Patches 5 and 6 add support for the new attribute to IPv4 and IPv6
addresses.

Patch 7 adds a set of test cases.

Patch 8 adds support to iproute2

Changes since RFC
- collapsed patches 1 and 3 into patch 2
- simplified stack variables in fib_modify_prefix_metric in patch 5
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: fib_tests: Add prefix route tests with metric

Add tests verifying prefix routes are inserted with expected metric.

IPv6 prefix route tests
    TEST: Default metric                                      [ OK ]
    TEST: User specified metric on first device               [ OK ]
    TEST: User specified metric on second device              [ OK ]
    TEST: Delete of address on first device                   [ OK ]
    TEST: Modify metric of address                            [ OK ]
    TEST: Prefix route removed on link down                   [ OK ]
    TEST: Prefix route with metric on link up                 [ OK ]

IPv4 prefix route tests
    TEST: Default metric                                      [ OK ]
    TEST: User specified metric on first device               [ OK ]
    TEST: User specified metric on second device              [ OK ]
    TEST: Delete of address on first device                   [ OK ]
    TEST: Modify metric of address                            [ OK ]
    TEST: Prefix route removed on link down                   [ OK ]
    TEST: Prefix route with metric on link up                 [ OK ]

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>