Vlad Buslov [Thu, 8 Aug 2019 14:01:33 +0000 (17:01 +0300)]
net/mlx5e: Allow concurrent creation of encap entries
Encap entries creation is fully synchronized by encap_tbl_lock. In order to
allow concurrent allocation of hardware resources used to offload
encapsulation, extend mlx5e_encap_entry with 'res_ready' completion. Move
call to mlx5e_tc_tun_create_header_ipv{4|6}() out of encap_tbl_lock
critical section. Modify code that attaches new flows to existing encap to
wait for 'res_ready' completion before using the entry. Insert encap entry
to table before provisioning it to hardware and modify all users of the
encap table to verify that encap was fully initialized by checking
completion result for non-zero value (and to wait for 'res_ready'
completion, if necessary).
Vlad Buslov [Fri, 2 Aug 2019 19:21:56 +0000 (22:21 +0300)]
net/mlx5e: Protect encap hash table with mutex
To remove dependency on rtnl lock, protect encap hash table from concurrent
modifications with new "encap_tbl_lock" mutex. Use the mutex to protect
internal encap entry state from concurrent modification. This is necessary
because a flow can be attached to multiple encap entries simultaneously,
which significantly complicates using finer grained per-entry lock.
Vlad Buslov [Sun, 3 Jun 2018 17:31:47 +0000 (20:31 +0300)]
net/mlx5e: Extend encap entry with reference counter
List of flows attached to encap entry is used as implicit reference
counter (encap entry is deallocated when list becomes free) and as a
mechanism to obtain encap entry that flow is attached to (through list
head). This is not safe when concurrent modification of list of flows
attached to encap entry is possible. Proper atomic reference counter is
required to support concurrent access.
As a preparation for extending encap with reference counting, extract code
that lookups and deletes encap entry into standalone put/get helpers. In
order to remove this dependency on external locking, extend encap entry
with reference counter to manage its lifetime and extend flow structure
with direct pointer to encap entry that flow is attached to.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Vlad Buslov [Thu, 8 Aug 2019 13:53:15 +0000 (16:53 +0300)]
net/mlx5e: Allow concurrent creation of mod_hdr entries
Mod_hdr entries creation is fully synchronized by mod_hdr_tbl->lock. In
order to allow concurrent allocation of hardware resources used to offload
header rewrite, extend mlx5e_mod_hdr_entry with 'res_ready' completion.
Move call to mlx5_modify_header_alloc() out of mod_hdr_tbl->lock critical
section. Modify code that attaches new flows to existing mh to wait for
'res_ready' completion before using the entry. Insert mh to mod_hdr table
before provisioning it to hardware and modify all users of mod_hdr table to
verify that mh was fully initialized by checking completion result for
negative value (and to wait for 'res_ready' completion, if necessary).
Vlad Buslov [Fri, 8 Jun 2018 19:10:09 +0000 (22:10 +0300)]
net/mlx5e: Protect mod header entry flows list with spinlock
To remove dependency on rtnl lock, extend mod header entry with spinlock
and use it to protect list of flows attached to mod header entry from
concurrent modifications.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Vlad Buslov [Fri, 1 Jun 2018 18:47:43 +0000 (21:47 +0300)]
net/mlx5e: Extend mod header entry with reference counter
List of flows attached to mod header entry is used as implicit reference
counter (mod header entry is deallocated when list becomes free) and as a
mechanism to obtain mod header entry that flow is attached to (through list
head). This is not safe when concurrent modification of list of flows
attached to mod header entry is possible. Proper atomic reference counter
is required to support concurrent access.
As a preparation for extending mod header with reference counting, extract
code that lookups and deletes mod header entry into standalone put/get
helpers. In order to remove this dependency on external locking, extend mod
header entry with reference counter to manage its lifetime and extend flow
structure with direct pointer to mod header entry that flow is attached to.
To remove code duplication between legacy and switchdev mode
implementations that both support mod_hdr functionality, store mod_hdr
table in dedicated structure used by both fdb and kernel namespaces. New
table structure is extended with table lock by one of the following patches
in this series. Implement helper function to get correct mod_hdr table
depending on flow namespace.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Vlad Buslov [Thu, 1 Aug 2019 13:54:54 +0000 (16:54 +0300)]
net/mlx5e: Allow concurrent creation of hairpin entries
Hairpin entries creation is fully synchronized by hairpin_tbl_lock. In
order to allow concurrent initialization of mlx5e_hairpin structure
instances and provisioning of hairpin entries to hardware, extend
mlx5e_hairpin_entry with 'res_ready' completion. Move call to
mlx5e_hairpin_create() out of hairpin_tbl_lock critical section. Modify
code that attaches new flows to existing hpe to wait for 'res_ready'
completion before using the hpe. Insert hpe to hairpin table before
provisioning it to hardware and modify all users of hairpin table to verify
that hpe was fully initialized by checking hpe->hp pointer (and to wait for
'res_ready' completion, if necessary).
Modify dead peer update event handling function to save hpe's to temporary
list with their reference counter incremented. Wait for completion of hpe's
in temporary list and update their 'peer_gone' flag outside of
hairpin_tbl_lock critical section.
Vlad Buslov [Thu, 7 Jun 2018 20:01:40 +0000 (23:01 +0300)]
net/mlx5e: Protect hairpin entry flows list with spinlock
To remove dependency on rtnl lock, extend hairpin entry with spinlock and
use it to protect list of flows attached to hairpin entry from concurrent
modifications.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Vlad Buslov [Fri, 8 Jun 2018 16:26:50 +0000 (19:26 +0300)]
net/mlx5e: Extend hairpin entry with reference counter
List of flows attached to hairpin entry is used as implicit reference
counter (hairpin entry is deallocated when list becomes free) and as a
mechanism to obtain hairpin entry that flow is attached to (through list
head). This is not safe when concurrent modification of list of flows
attached to hairpin entry is possible. Proper atomic reference counter is
required to support concurrent access.
As a preparation for extending hairpin with reference counting, extract
code that deletes hairpin entry into standalone function. In order to
remove this dependency on external locking, extend hairpin entry with
reference counter to manage its lifetime and extend flow structure with
direct pointer to hairpin entry that flow is attached to.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
net/sched/sch_taprio.c:680:32: warning:
entry_list_policy defined but not used [-Wunused-const-variable=]
One of the points of commit a3d43c0d56f1 ("taprio: Add support adding
an admin schedule") is that it removes support (it now returns "not
supported") for schedules using the TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY
attribute (which were never used), the parsing of those types of schedules
was the only user of this policy. So removing this policy should be fine.
Reported-by: Hulk Robot <hulkci@huawei.com> Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Disabling TSO but leaving SG active results is a significant
performance drop. Therefore disable also SG on RTL8168evl.
This restores the original performance.
Fixes: 93681cd7d94f ("r8169: enable HW csum and TSO") Signed-off-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Josh Hunt [Wed, 7 Aug 2019 23:52:29 +0000 (19:52 -0400)]
tcp: add new tcp_mtu_probe_floor sysctl
The current implementation of TCP MTU probing can considerably
underestimate the MTU on lossy connections allowing the MSS to get down to
48. We have found that in almost all of these cases on our networks these
paths can handle much larger MTUs meaning the connections are being
artificially limited. Even though TCP MTU probing can raise the MSS back up
we have seen this not to be the case causing connections to be "stuck" with
an MSS of 48 when heavy loss is present.
Prior to pushing out this change we could not keep TCP MTU probing enabled
b/c of the above reasons. Now with a reasonble floor set we've had it
enabled for the past 6 months.
The new sysctl will still default to TCP_MIN_SND_MSS (48), but gives
administrators the ability to control the floor of MSS probing.
Signed-off-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 9 Aug 2019 13:27:15 +0000 (15:27 +0200)]
devlink: remove pointless data_len arg from region snapshot create
The size of the snapshot has to be the same as the size of the region,
therefore no need to pass it again during snapshot creation. Remove the
arg and use region->size instead.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 9 Aug 2019 12:04:47 +0000 (05:04 -0700)]
tcp: batch calls to sk_flush_backlog()
Starting from commit d41a69f1d390 ("tcp: make tcp_sendmsg() aware of socket backlog")
loopback flows got hurt, because for each skb sent, the socket receives an
immediate ACK and sk_flush_backlog() causes extra work.
Intent was to not let the backlog grow too much, but we went a bit too far.
We can check the backlog every 16 skbs (about 1MB chunks)
to increase TCP over loopback performance by about 15 %
Note that the call to sk_flush_backlog() handles a single ACK,
thanks to coalescing done on backlog, but cleans the 16 skbs
found in rtx rb-tree.
Reported-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Denis Efremov [Thu, 8 Aug 2019 04:57:53 +0000 (07:57 +0300)]
liquidio: Use pcie_flr() instead of reimplementing it
octeon_mbox_process_cmd() directly writes the PCI_EXP_DEVCTL_BCR_FLR
bit, which bypasses timing requirements imposed by the PCIe spec.
This patch fixes the function to use the pcie_flr() interface instead.
Signed-off-by: Denis Efremov <efremov@linux.com> Reviewed-by: Andrew Murray <andrew.murray@arm.com> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Wed, 7 Aug 2019 19:38:22 +0000 (21:38 +0200)]
r8169: allocate rx buffers using alloc_pages_node
We allocate 16kb per rx buffer, so we can avoid some overhead by using
alloc_pages_node directly instead of bothering kmalloc_node. Due to
this change buffers are page-aligned now, therefore the alignment check
can be removed.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Wed, 7 Aug 2019 13:10:55 +0000 (21:10 +0800)]
fq_codel: remove set but not used variables 'prev_ecn_mark' and 'prev_drop_count'
Fixes gcc '-Wunused-but-set-variable' warning:
net/sched/sch_fq_codel.c: In function fq_codel_dequeue:
net/sched/sch_fq_codel.c:288:23: warning: variable prev_ecn_mark set but not used [-Wunused-but-set-variable]
net/sched/sch_fq_codel.c:288:6: warning: variable prev_drop_count set but not used [-Wunused-but-set-variable]
They are not used since commit 77ddaff218fc ("fq_codel: Kill
useless per-flow dropped statistic")
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 7 Aug 2019 10:42:31 +0000 (13:42 +0300)]
mlxsw: spectrum: Extend to support Spectrum-3 ASIC
Extend existing driver for Spectrum and Spectrum-2 ASICs
to support Spectrum-3 ASIC as well.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Rutherford [Wed, 7 Aug 2019 02:52:29 +0000 (12:52 +1000)]
tipc: add loopback device tracking
Since node internal messages are passed directly to the socket, it is not
possible to observe those messages via tcpdump or wireshark.
We now remedy this by making it possible to clone such messages and send
the clones to the loopback interface. The clones are dropped at reception
and have no functional role except making the traffic visible.
The feature is enabled if network taps are active for the loopback device.
pcap filtering restrictions require the messages to be presented to the
receiving side of the loopback device.
v3 - Function dev_nit_active used to check for network taps.
- Procedure netif_rx_ni used to send cloned messages to loopback device.
Signed-off-by: John Rutherford <john.rutherford@dektech.com.au> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
flow_offload: add indr-block in nf_table_offload
This series patch make nftables offload support the vlan and
tunnel device offload through indr-block architecture.
The first four patches mv tc indr block to flow offload and
rename to flow-indr-block.
Because the new flow-indr-block can't get the tcf_block
directly. The fifth patch provide a callback list to get
flow_block of each subsystem immediately when the device
register and contain a block.
The last patch make nf_tables_offload support flow-indr-block.
This version add a mutex lock for add/del flow_indr_block_ing_cb
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 9 Aug 2019 01:22:29 +0000 (18:22 -0700)]
Merge branch 'net-batched-receive-in-GRO-path'
Edward Cree says:
====================
net: batched receive in GRO path
This series listifies part of GRO processing, in a manner which allows those
packets which are not GROed (i.e. for which dev_gro_receive returns
GRO_NORMAL) to be passed on to the listified regular receive path.
dev_gro_receive() itself is not listified, nor the per-protocol GRO
callback, since GRO's need to hold packets on lists under napi->gro_hash
makes keeping the packets on other lists awkward, and since the GRO control
block state of held skbs can refer only to one 'new' skb at a time.
Instead, when napi_frags_finish() handles a GRO_NORMAL result, stash the skb
onto a list in the napi struct, which is received at the end of the napi
poll or when its length exceeds the (new) sysctl net.core.gro_normal_batch.
Performance figures with this series, collected on a back-to-back pair of
Solarflare sfn8522-r2 NICs with 120-second NetPerf tests. In the stats,
sample size n for old and new code is 6 runs each; p is from a Welch t-test.
Tests were run both with GRO enabled and disabled, the latter simulating
uncoalesceable packets (e.g. due to IP or TCP options). The receive side
(which was the device under test) had the NetPerf process pinned to one CPU,
and the device interrupts pinned to a second CPU. CPU utilisation figures
(used in cases of line-rate performance) are summed across all CPUs.
net.core.gro_normal_batch was left at its default value of 8.
TCP 4 streams, GRO on: all results line rate (9.415Gbps)
net-next: 210.3% cpu
after #1: 181.5% cpu (-13.7%, p=0.031 vs net-next)
after #3: 196.7% cpu (- 8.4%, p=0.136 vs net-next)
TCP 4 streams, GRO off:
net-next: 8.017 Gbps
after #1: 7.785 Gbps (- 2.9%, p=0.385 vs net-next)
after #3: 7.604 Gbps (- 5.1%, p=0.282 vs net-next. But note *)
TCP 1 stream, GRO off:
net-next: 6.553 Gbps
after #1: 6.444 Gbps (- 1.7%, p=0.302 vs net-next)
after #3: 6.790 Gbps (+ 3.6%, p=0.169 vs net-next)
TCP 1 stream, GRO on, busy_read = 50: all results line rate
net-next: 156.0% cpu
after #1: 174.5% cpu (+11.9%, p=0.015 vs net-next)
after #3: 165.0% cpu (+ 5.8%, p=0.147 vs net-next)
TCP 1 stream, GRO off, busy_read = 50:
net-next: 6.488 Gbps
after #1: 6.625 Gbps (+ 2.1%, p=0.059 vs net-next)
after #3: 7.351 Gbps (+13.3%, p=0.026 vs net-next)
TCP_RR 100 streams, GRO off, 8000 byte payload
net-next: 995.083 us
after #1: 969.167 us (- 2.6%, p=0.204 vs net-next)
after #3: 976.433 us (- 1.9%, p=0.254 vs net-next)
TCP_RR 100 streams, GRO off, 8000 byte payload, busy_read = 50:
net-next: 2.851 ms
after #1: 2.871 ms (+ 0.7%, p=0.134 vs net-next)
after #3: 2.937 ms (+ 3.0%, p<0.001 vs net-next)
TCP_RR 100 streams, GRO off, 1 byte payload, busy_read = 50:
net-next: 867.317 us
after #1: 865.717 us (- 0.2%, p=0.334 vs net-next)
after #3: 868.517 us (+ 0.1%, p=0.414 vs net-next)
(*) These tests produced a mixture of line-rate and below-line-rate results,
meaning that statistically speaking the results were 'censored' by the
upper bound, and were thus not normally distributed, making a Welch t-test
mathematically invalid. I therefore also calculated estimators according
to [1], which gave the following:
net-next: 8.133 Gbps
after #1: 8.130 Gbps (- 0.0%, p=0.499 vs net-next)
after #3: 7.680 Gbps (- 5.6%, p=0.285 vs net-next)
(though my procedure for determining ν wasn't mathematically well-founded
either, so take that p-value with a grain of salt).
A further check came from dividing the bandwidth figure by the CPU usage for
each test run, giving:
net-next: 3.461
after #1: 3.198 (- 7.6%, p=0.145 vs net-next)
after #3: 3.641 (+ 5.2%, p=0.280 vs net-next)
The above results are fairly mixed, and in most cases not statistically
significant. But I think we can roughly conclude that the series
marginally improves non-GROable throughput, without hurting latency
(except in the large-payload busy-polling case, which in any case yields
horrid performance even on net-next (almost triple the latency without
busy-poll). Also, drivers which, unlike sfc, pass UDP traffic to GRO
would expect to see a benefit from gaining access to batching.
Changed in v3:
* gro_normal_batch sysctl now uses SYSCTL_ONE instead of &one
* removed RFC tags (no comments after a week means no-one objects, right?)
Changed in v2:
* During busy poll, call gro_normal_list() to receive batched packets
after each cycle of the napi busy loop. See comments in Patch #3 for
complications of doing the same in busy_poll_stop().
Edward Cree [Tue, 6 Aug 2019 13:53:55 +0000 (14:53 +0100)]
net: use listified RX for handling GRO_NORMAL skbs
When GRO decides not to coalesce a packet, in napi_frags_finish(), instead
of passing it to the stack immediately, place it on a list in the napi
struct. Then, at flush time (napi_complete_done(), napi_poll(), or
napi_busy_loop()), call netif_receive_skb_list_internal() on the list.
We'd like to do that in napi_gro_flush(), but it's not called if
!napi->gro_bitmask, so we have to do it in the callers instead. (There are
a handful of drivers that call napi_gro_flush() themselves, but it's not
clear why, or whether this will affect them.)
Because a full 64 packets is an inefficiently large batch, also consume the
list whenever it exceeds gro_normal_batch, a new net/core sysctl that
defaults to 8.
Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 6 Aug 2019 13:53:20 +0000 (14:53 +0100)]
sfc: don't score irq moderation points for GRO
We already scored points when handling the RX event, no-one else does this,
and looking at the history it appears this was originally meant to only
score on merges, not on GRO_NORMAL. Moreover, it gets in the way of
changing GRO to not immediately pass GRO_NORMAL skbs to the stack.
Performance testing with four TCP streams received on a single CPU (where
throughput was line rate of 9.4Gbps in all tests) showed a 13.7% reduction
in RX CPU usage (n=6, p=0.03).
Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Verma [Tue, 6 Aug 2019 06:59:50 +0000 (23:59 -0700)]
qed: Add new ethtool supported port types based on media.
Supported ports in ethtool <eth1> are displayed based on media type.
For media type fibre and twinaxial, port type is "FIBRE". Media type
Base-T is "TP" and media KR is "Backplane".
V1->V2:
Corrected the subject.
Signed-off-by: Rahul Verma <rahulv@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Chuhong Yuan [Tue, 6 Aug 2019 02:58:46 +0000 (10:58 +0800)]
cxgb4: smt: Add lock for atomic_dec_and_test
The atomic_dec_and_test() is not safe because it is
outside of locks.
Move the locks of t4_smte_free() to its caller,
cxgb4_smt_release() to protect the atomic decrement.
Fixes: 3bdb376e6944 ("cxgb4: introduce SMT ops to prepare for SMAC rewrite support") Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"Yeah I should have sent a pull request last week, so there is a lot
more here than usual:
1) Fix memory leak in ebtables compat code, from Wenwen Wang.
2) Several kTLS bug fixes from Jakub Kicinski (circular close on
disconnect etc.)
3) Force slave speed check on link state recovery in bonding 802.3ad
mode, from Thomas Falcon.
4) Clear RX descriptor bits before assigning buffers to them in
stmmac, from Jose Abreu.
5) Several missing of_node_put() calls, mostly wrt. for_each_*() OF
loops, from Nishka Dasgupta.
6) Double kfree_skb() in peak_usb can driver, from Stephane Grosjean.
7) Need to hold sock across skb->destructor invocation, from Cong
Wang.
8) IP header length needs to be validated in ipip tunnel xmit, from
Haishuang Yan.
9) Use after free in ip6 tunnel driver, also from Haishuang Yan.
10) Do not use MSI interrupts on r8169 chips before RTL8168d, from
Heiner Kallweit.
11) Upon bridge device init failure, we need to delete the local fdb.
From Nikolay Aleksandrov.
12) Handle erros from of_get_mac_address() properly in stmmac, from
Martin Blumenstingl.
13) Handle concurrent rename vs. dump in netfilter ipset, from Jozsef
Kadlecsik.
14) Setting NETIF_F_LLTX on mac80211 causes complete breakage with
some devices, so revert. From Johannes Berg.
15) Fix deadlock in rxrpc, from David Howells.
16) Fix Kconfig deps of enetc driver, we must have PHYLIB. From Yue
Haibing.
17) Fix mvpp2 crash on module removal, from Matteo Croce.
18) Fix race in genphy_update_link, from Heiner Kallweit.
19) bpf_xdp_adjust_head() stopped working with generic XDP when we
fixes generic XDP to support stacked devices properly, fix from
Jesper Dangaard Brouer.
20) Unbalanced RCU locking in rt6_update_exception_stamp_rt(), from
David Ahern.
21) Several memory leaks in new sja1105 driver, from Vladimir Oltean"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (214 commits)
net: dsa: sja1105: Fix memory leak on meta state machine error path
net: dsa: sja1105: Fix memory leak on meta state machine normal path
net: dsa: sja1105: Really fix panic on unregistering PTP clock
net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as well
net: dsa: sja1105: Fix broken learning with vlan_filtering disabled
net: dsa: qca8k: Add of_node_put() in qca8k_setup_mdio_bus()
net: sched: sample: allow accessing psample_group with rtnl
net: sched: police: allow accessing police->params with rtnl
net: hisilicon: Fix dma_map_single failed on arm64
net: hisilicon: fix hip04-xmit never return TX_BUSY
net: hisilicon: make hip04_tx_reclaim non-reentrant
tc-testing: updated vlan action tests with batch create/delete
net sched: update vlan action for batched events operations
net: stmmac: tc: Do not return a fragment entry
net: stmmac: Fix issues when number of Queues >= 4
net: stmmac: xgmac: Fix XGMAC selftests
be2net: disable bh with spin_lock in be_process_mcc
net: cxgb3_main: Fix a resource leak in a error path in 'init_one()'
net: ethernet: sun4i-emac: Support phy-handle property for finding PHYs
net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
...
David S. Miller [Tue, 6 Aug 2019 21:41:45 +0000 (14:41 -0700)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2019-08-05
This series contains updates to i40e driver only.
Dmitrii adds missing statistic counters for VEB and VEB TC's.
Slawomir adds support for logging the "Disable Firmware LLDP" flag
option and its current status.
Jake fixes an issue where VF's being notified of their link status
before their queues are enabled which was causing issues. So always
report link status down when the VF queues are not enabled. Also adds
future proofing when statistics are added or removed by adding checks to
ensure the data pointer for the strings lines up with the expected
statistics count.
Czeslaw fixes the advertised mode reported in ethtool for FEC, where the
"None BaseR RS" was always being displayed no matter what the mode it
was in. Also added logging information when the PF is entering or
leaving "allmulti" (or promiscuous) mode. Fixed up the logging logic
for VF's when leaving multicast mode to not include unicast as well.
v2: drop Aleksandr's patch (previously patch #2 in the series) to
display the VF MAC address that is set by the VF while community
feedback is addressed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yifeng Sun [Mon, 5 Aug 2019 02:56:11 +0000 (19:56 -0700)]
openvswitch: Print error when ovs_execute_actions() fails
Currently in function ovs_dp_process_packet(), return values of
ovs_execute_actions() are silently discarded. This patch prints out
an debug message when error happens so as to provide helpful hints
for debugging. Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 6 Aug 2019 21:37:02 +0000 (14:37 -0700)]
Merge branch 'sja1105-fixes'
Vladimir Oltean says:
====================
Fixes for SJA1105 DSA: FDBs, Learning and PTP
This is an assortment of functional fixes for the sja1105 switch driver
targeted for the "net" tree (although they apply on net-next just as
well).
Patch 1/5 ("net: dsa: sja1105: Fix broken learning with vlan_filtering
disabled") repairs a breakage introduced in the early development stages
of the driver: support for traffic from the CPU has broken "normal"
frame forwarding (based on DMAC) - there is connectivity through the
switch only because all frames are flooded.
I debated whether this patch qualifies as a fix, since it puts the
switch into a mode it has never operated in before (aka SVL). But
"normal" forwarding did use to work before the "Traffic support for
SJA1105 DSA driver" patchset, and arguably this patch should have been
part of that.
Also, it would be strange for this feature to be broken in the 5.2 LTS.
Patch 2/5 ("net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as
well") is a simplification of a previous FDB-related patch that is
currently in the 5.3 rc's.
Patches 3/5 - 5/5 fix various crashes found while running linuxptp over the
switch ports for extended periods of time, or in conjunction with other
error conditions. The fixed-up commits were all introduced in 5.2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sun, 4 Aug 2019 22:38:48 +0000 (01:38 +0300)]
net: dsa: sja1105: Fix memory leak on meta state machine error path
When RX timestamping is enabled and two link-local (non-meta) frames are
received in a row, this constitutes an error.
The tagger is always caching the last link-local frame, in an attempt to
merge it with the meta follow-up frame when that arrives. To recover
from the above error condition, the initial cached link-local frame is
dropped and the second frame in a row is cached (in expectance of the
second meta frame).
However, when dropping the initial link-local frame, its backing memory
was being leaked.
Fixes: f3097be21bf1 ("net: dsa: sja1105: Add a state machine for RX timestamping") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sun, 4 Aug 2019 22:38:47 +0000 (01:38 +0300)]
net: dsa: sja1105: Fix memory leak on meta state machine normal path
After a meta frame is received, it is associated with the cached
sp->data->stampable_skb from the DSA tagger private structure.
Cached means its refcount is incremented with skb_get() in order for
dsa_switch_rcv() to not free it when the tagger .rcv returns NULL.
The mistake is that skb_unref() is not the correct function to use. It
will correctly decrement the refcount (which will go back to zero) but
the skb memory will not be freed. That is the job of kfree_skb(), which
also calls skb_unref().
But it turns out that freeing the cached stampable_skb is in fact not
necessary. It is still a perfectly valid skb, and now it is even
annotated with the partial RX timestamp. So remove the skb_copy()
altogether and simply pass the stampable_skb with a refcount of 1
(incremented by us, decremented by dsa_switch_rcv) up the stack.
Fixes: f3097be21bf1 ("net: dsa: sja1105: Add a state machine for RX timestamping") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sun, 4 Aug 2019 22:38:46 +0000 (01:38 +0300)]
net: dsa: sja1105: Really fix panic on unregistering PTP clock
The IS_ERR_OR_NULL(priv->clock) check inside
sja1105_ptp_clock_unregister() is preventing cancel_delayed_work_sync
from actually being run.
Additionally, sja1105_ptp_clock_unregister() does not actually get run,
when placed in sja1105_remove(). The DSA switch gets torn down, but the
sja1105 module does not get unregistered. So sja1105_ptp_clock_unregister
needs to be moved to sja1105_teardown, to be symmetrical with
sja1105_ptp_clock_register which is called from the DSA sja1105_setup.
It is strange to fix a "fixes" patch, but the probe failure can only be
seen when the attached PHY does not respond to MDIO (issue which I can't
pinpoint the reason to) and it goes away after I power-cycle the board.
This time the patch was validated on a failing board, and the kernel
panic from the fixed commit's message can no longer be seen.
Fixes: 29dd908d355f ("net: dsa: sja1105: Cancel PTP delayed work on unregister") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sun, 4 Aug 2019 22:38:45 +0000 (01:38 +0300)]
net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as well
It looks like the FDB dump taken from first-generation switches also
contains information on whether entries are static or not. So use that
instead of searching through the driver's tables.
Fixes: d763778224ea ("net: dsa: sja1105: Implement is_static for FDB entries on E/T") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sun, 4 Aug 2019 22:38:44 +0000 (01:38 +0300)]
net: dsa: sja1105: Fix broken learning with vlan_filtering disabled
When put under a bridge with vlan_filtering 0, the SJA1105 ports will
flood all traffic as if learning was broken. This is because learning
interferes with the rx_vid's configured by dsa_8021q as unique pvid's.
So learning technically still *does* work, it's just that the learnt
entries never get matched due to their unique VLAN ID.
The setting that saves the day is Shared VLAN Learning, which on this
switch family works exactly as desired: VLAN tagging still works
(untagged traffic gets the correct pvid) and FDB entries are still
populated with the correct contents including VID. Also, a frame cannot
violate the forwarding domain restrictions enforced by its classified
VLAN. It is just that the VID is ignored when looking up the FDB for
taking a forwarding decision (selecting the egress port).
This patch activates SVL, and the result is that frames with a learnt
DMAC are no longer flooded in the scenario described above.
Now exactly *because* SVL works as desired, we have to revisit some
earlier patches:
- It is no longer necessary to manipulate the VID of the 'bridge fdb
{add,del}' command when vlan_filtering is off. This is because now,
SVL is enabled for that case, so the actual VID does not matter*.
- It is still desirable to hide dsa_8021q VID's in the FDB dump
callback. But right now the dump callback should no longer hide
duplicates (one per each front panel port's pvid, plus one for the
VLAN that the CPU port is going to tag a TX frame with), because there
shouldn't be any (the switch will match a single FDB entry no matter
its VID anyway).
* Not really... It's no longer necessary to transform a 'bridge fdb add'
into 5 fdb add operations, but the user might still add a fdb entry with
any vid, and all of them would appear as duplicates in 'bridge fdb
show'. So force a 'bridge fdb add' to insert the VID of 0**, so that we
can prune the duplicates at insertion time.
** The VID of 0 is better than 1 because it is always guaranteed to be
in the ports' hardware filter. DSA also avoids putting the VID inside
the netlink response message towards the bridge driver when we return
this particular VID, which makes it suitable for FDB entries learnt
with vlan_filtering off.
Fixes: 227d07a07ef1 ("net: dsa: sja1105: Add support for traffic through standalone ports") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Georg Waibel <georg.waibel@sensor-technik.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Nishka Dasgupta [Sun, 4 Aug 2019 15:30:18 +0000 (21:00 +0530)]
net: dsa: qca8k: Add of_node_put() in qca8k_setup_mdio_bus()
Each iteration of for_each_available_child_of_node() puts the previous
node, but in the case of a return from the middle of the loop, there
is no put, thus causing a memory leak. Hence add an of_node_put() before
the return.
Additionally, the local variable ports in the function
qca8k_setup_mdio_bus() takes the return value of of_get_child_by_name(),
which gets a node but does not put it. If the function returns without
putting ports, it may cause a memory leak. Hence put ports before the
mid-loop return statement, and also outside the loop after its last usage
in this function.
Issues found with Coccinelle.
Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 6 Aug 2019 21:24:22 +0000 (14:24 -0700)]
Merge branch 'Support-tunnels-over-VLAN-in-NFP'
John Hurley says:
====================
Support tunnels over VLAN in NFP
This patchset deals with tunnel encap and decap when the end-point IP
address is on an internal port (for example and OvS VLAN port). Tunnel
encap without VLAN is already supported in the NFP driver. This patchset
extends that to include a push VLAN along with tunnel header push.
Patches 1-4 extend the flow_offload IR API to include actions that use
skbedit to set the ptype of an SKB and that send a packet to port ingress
from the act_mirred module. Such actions are used in flower rules that
forward tunnel packets to internal ports where they can be decapsulated.
OvS and its TC API is an example of a user-space app that produces such
rules.
Patch 5 modifies the encap offload code to allow the pushing of a VLAN
header after a tunnel header push.
Patches 6-10 deal with tunnel decap when the end-point is on an internal
port. They detect 'pre-tunnel rules' which do not deal with tunnels
themselves but, rather, forward packets to internal ports where they
can be decapped if required. Such rules are offloaded to a table in HW
along with an indication of whether packets need to be passed to this
table of not (based on their destination MAC address). Matching against
this table prior to decapsulation in HW allows the correct parsing and
handling of outer VLANs on tunnelled packets and the correct updating of
stats for said 'pre-tunnel' rules.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Sun, 4 Aug 2019 15:10:49 +0000 (16:10 +0100)]
nfp: flower: encode mac indexes with pre-tunnel rule check
When a tunnel packet arrives on the NFP card, its destination MAC is
looked up and MAC index returned for it. This index can help verify the
tunnel by, for example, ensuring that the packet arrived on the expected
port. If the packet is destined for a known MAC that is not connected to a
given physical port then the mac index can have a global value (e.g. when
a series of bonded ports shared the same MAC).
If the packet is to be detunneled at a bridge device or internal port like
an Open vSwitch VLAN port, then it should first match a 'pre-tunnel' rule
to direct it to that internal port.
Use the MAC index to indicate if a packet should match a pre-tunnel rule
before decap is allowed. Do this by tracking the number of internal ports
associated with a MAC address and, if the number if >0, set a bit in the
mac_index to forward the packet to the pre-tunnel table before continuing
with decap.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Sun, 4 Aug 2019 15:09:11 +0000 (16:09 +0100)]
nfp: flower: remove offloaded MACs when reprs are applied to OvS bridges
MAC addresses along with an identifying index are offloaded to firmware to
allow tunnel decapsulation. If a tunnel packet arrives with a matching
destination MAC address and a verified index, it can continue on the
decapsulation process. This replicates the MAC verifications carried out
in the kernel network stack.
When a netdev is added to a bridge (e.g. OvS) then packets arriving on
that dev are directed through the bridge datapath instead of passing
through the network stack. Therefore, tunnelled packets matching the MAC
of that dev will not be decapped here.
Replicate this behaviour on firmware by removing offloaded MAC addresses
when a MAC representer is added to an OvS bridge. This can prevent any
false positive tunnel decaps.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Sun, 4 Aug 2019 15:09:10 +0000 (16:09 +0100)]
nfp: flower: offload pre-tunnel rules
Pre-tunnel rules are TC flower and OvS rules that forward a packet to the
tunnel end point where it can then pass through the network stack and be
decapsulated. These are required if the tunnel end point is, say, an OvS
internal port.
Currently, firmware determines that a packet is in a tunnel and decaps it
if it has a known destination IP and MAC address. However, this bypasses
the flower pre-tunnel rule and so does not update the stats. Further to
this it ignores VLANs that may exist outside of the tunnel header.
Offload pre-tunnel rules to the NFP. This embeds the pre-tunnel rule into
the tunnel decap process based on (firmware) mac index and VLAN. This
means that decap can be carried out correctly with VLANs and that stats
can be updated for all kernel rules correctly.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Sun, 4 Aug 2019 15:09:09 +0000 (16:09 +0100)]
nfp: flower: verify pre-tunnel rules
Pre-tunnel rules must direct packets to an internal port based on L2
information. Rules that egress to an internal port are already indicated
by a non-NULL device in its nfp_fl_payload struct. Verfiy the rest of the
match fields indicate that the rule is a pre-tunnel rule. This requires a
full match on the destination MAC address, an option VLAN field, and no
specific matches on other lower layer fields (with the exception of L4
proto and flags).
If a rule is identified as a pre-tunnel rule then mark it for offload to
the pre-tunnel table. Similarly, remove it from the pre-tunnel table on
rule deletion. The actual offloading of these commands is left to a
following patch.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Sun, 4 Aug 2019 15:09:08 +0000 (16:09 +0100)]
nfp: flower: detect potential pre-tunnel rules
Pre-tunnel rules are used when the tunnel end-point is on an 'internal
port'. These rules are used to direct the tunnelled packets (based on outer
header fields) to the internal port where they can be detunnelled. The
rule must send the packet to ingress the internal port at the TC layer.
Currently FW does not support an action to send to ingress so cannot
offload such rules. However, in preparation for populating the pre-tunnel
table to represent such rules, check for rules that send to the ingress of
an internal port and mark them as such. Further validation of such rules
is left to subsequent patches.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Sun, 4 Aug 2019 15:09:07 +0000 (16:09 +0100)]
nfp: flower: push vlan after tunnel in merge
NFP allows the merging of 2 flows together into a single offloaded flow.
In the kernel datapath the packet must match 1 flow, impliment its
actions, recirculate, match the 2nd flow and also impliment its actions.
Merging creates a single flow with all actions from the 2 original flows.
Firmware impliments a tunnel header push as the packet is about to egress
the card. Therefore, if the first merge rule candiate pushes a tunnel,
then the second rule can only have an egress action for a valid merge to
occur (or else the action ordering will be incorrect). This prevents the
pushing of a tunnel header followed by the pushing of a vlan header.
In order to support this behaviour, firmware allows VLAN information to
be encoded in the tunnel push action. If this is non zero then the fw will
push a VLAN after the tunnel header push meaning that 2 such flows with
these actions can be merged (with action order being maintained).
Support tunnel in VLAN pushes by encoding VLAN information in the tunnel
push action of any merge flow requiring this.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Sun, 4 Aug 2019 15:09:06 +0000 (16:09 +0100)]
net: sched: add ingress mirred action to hardware IR
TC mirred actions (redirect and mirred) can send to egress or ingress of a
device. Currently only egress is used for hw offload rules.
Modify the intermediate representation for hw offload to include mirred
actions that go to ingress. This gives drivers access to such rules and
can decide whether or not to offload them.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Sun, 4 Aug 2019 15:09:05 +0000 (16:09 +0100)]
net: tc_act: add helpers to detect ingress mirred actions
TC mirred actions can send to egress or ingress on a given netdev. Helpers
exist to detect actions that are mirred to egress. Extend the header file
to include helpers to detect ingress mirred actions.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Sun, 4 Aug 2019 15:09:04 +0000 (16:09 +0100)]
net: sched: add skbedit of ptype action to hardware IR
TC rules can impliment skbedit actions. Currently actions that modify the
skb mark are passed to offloading drivers via the hardware intermediate
representation in the flow_offload API.
Extend this to include skbedit actions that modify the packet type of the
skb. Such actions may be used to set the ptype to HOST when redirecting a
packet to ingress.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Sun, 4 Aug 2019 15:09:03 +0000 (16:09 +0100)]
net: tc_act: add skbedit_ptype helper functions
The tc_act header file contains an inline function that checks if an
action is changing the skb mark of a packet and a further function to
extract the mark.
Add similar functions to check for and get skbedit actions that modify
the packet type of the skb.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 6 Aug 2019 21:21:21 +0000 (14:21 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2019-08-04
This series contains more updates to fm10k from Jake Keller.
Jake removes the unnecessary initialization of some variables to help
resolve static code checker warnings. Explicitly return success during
resume, since the value of 'err' is always success. Fixed a issue with
incrementing a void pointer, which can produce undefined behavior. Used
the __always_unused macro for function templates that are passed as
parameters in functions, but are not used. Simplified the code by
removing an unnecessary macro in determining the value of NON_Q_VECTORS.
Fixed an issue, using bitwise operations to prevent the low address
overwriting the high portion of the address.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sun, 4 Aug 2019 07:42:57 +0000 (09:42 +0200)]
r8169: remove access to legacy register MultiIntr
This code piece was inherited from RTL8139 code, the register at
address 0x5c however has a different meaning on RTL8169 and is unused.
So we can remove this.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 6 Aug 2019 21:18:20 +0000 (14:18 -0700)]
Merge branch 'fq_codel-small-optimizations'
Dave Taht says:
====================
Two small fq_codel optimizations
These two patches improve fq_codel performance
under extreme network loads. The first patch
more rapidly escalates the codel count under
overload, the second just kills a totally useless
statistic.
(sent together because they'd otherwise conflict)
====================
Signed-off-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Taht [Sat, 3 Aug 2019 23:37:29 +0000 (16:37 -0700)]
fq_codel: Kill useless per-flow dropped statistic
It is almost impossible to get anything other than a 0 out of
flow->dropped statistic with a tc class dump, as it resets to 0
on every round.
It also conflates ecn marks with drops.
It would have been useful had it kept a cumulative drop count, but
it doesn't. This patch doesn't change the API, it just stops
tracking a stat and state that is impossible to measure and nobody
uses.
Signed-off-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Taht [Sat, 3 Aug 2019 23:37:28 +0000 (16:37 -0700)]
Increase fq_codel count in the bulk dropper
In the field fq_codel is often used with a smaller memory or
packet limit than the default, and when the bulk dropper is hit,
the drop pattern bifircates into one that more slowly increases
the codel drop rate and hits the bulk dropper more than it should.
The scan through the 1024 queues happens more often than it needs to.
This patch increases the codel count in the bulk dropper, but
does not change the drop rate there, relying on the next codel round
to deliver the next packet at the original drop rate
(after that burst of loss), then escalate to a higher signaling rate.
Signed-off-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 6 Aug 2019 21:15:39 +0000 (14:15 -0700)]
Merge branch 'flow_offload-action-fixes'
Vlad Buslov says:
====================
action fixes for flow_offload infra compatibility
Fix rcu warnings due to usage of action helpers that expect rcu read lock
protection from rtnl-protected context of flow_offload infra.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Change tcf_sample_psample_group() helper to allow using it from both rtnl
and rcu protected contexts.
Fixes: a7a7be6087b0 ("net/sched: add sample action to the hardware intermediate representation") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Change tcf_police_rate_bytes_ps() and tcf_police_tcfp_burst() helpers to
allow using them from both rtnl and rcu protected contexts.
Fixes: 8c8cfc6ed274 ("net/sched: add police action to the hardware intermediate representation") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 6 Aug 2019 21:14:01 +0000 (14:14 -0700)]
Merge branch 'hisilicon-fixes'
Jiangfeng Xiao says:
====================
net: hisilicon: Fix a few problems with hip04_eth
During the use of the hip04_eth driver,
several problems were found,
which solved the hip04_tx_reclaim reentry problem,
fixed the problem that hip04_mac_start_xmit never
returns NETDEV_TX_BUSY
and the dma_map_single failed on the arm64 platform.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiangfeng Xiao [Sat, 3 Aug 2019 12:31:41 +0000 (20:31 +0800)]
net: hisilicon: Fix dma_map_single failed on arm64
On the arm64 platform, executing "ifconfig eth0 up" will fail,
returning "ifconfig: SIOCSIFFLAGS: Input/output error."
ndev->dev is not initialized, dma_map_single->get_dma_ops->
dummy_dma_ops->__dummy_map_page will return DMA_ERROR_CODE
directly, so when we use dma_map_single, the first parameter
is to use the device of platform_device.
Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiangfeng Xiao [Sat, 3 Aug 2019 12:31:40 +0000 (20:31 +0800)]
net: hisilicon: fix hip04-xmit never return TX_BUSY
TX_DESC_NUM is 256, in tx_count, the maximum value of
mod(TX_DESC_NUM - 1) is 254, the variable "count" in
the hip04_mac_start_xmit function is never equal to
(TX_DESC_NUM - 1), so hip04_mac_start_xmit never
return NETDEV_TX_BUSY.
tx_count is modified to mod(TX_DESC_NUM) so that
the maximum value of tx_count can reach
(TX_DESC_NUM - 1), then hip04_mac_start_xmit can reurn
NETDEV_TX_BUSY.
Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiangfeng Xiao [Sat, 3 Aug 2019 12:31:39 +0000 (20:31 +0800)]
net: hisilicon: make hip04_tx_reclaim non-reentrant
If hip04_tx_reclaim is interrupted while it is running
and then __napi_schedule continues to execute
hip04_rx_poll->hip04_tx_reclaim, reentrancy occurs
and oops is generated. So you need to mask the interrupt
during the hip04_tx_reclaim run.
net: mdio-octeon: Fix Kconfig warnings and build errors
After commit 171a9bae68c7 ("staging/octeon: Allow test build on
!MIPS"), the following combination of configs cause a few Kconfig
warnings and build errors (distilled from arm allyesconfig and Randy's
randconfig builds):
In file included from ../drivers/net/phy/mdio-octeon.c:14:
../drivers/net/phy/mdio-cavium.h:111:36: error: implicit declaration of
function ‘writeq’; did you mean ‘writel’?
[-Werror=implicit-function-declaration]
111 | #define oct_mdio_writeq(val, addr) writeq(val, (void *)addr)
| ^~~~~~
CONFIG_64BIT is not strictly necessary if the proper readq/writeq
definitions are included from io-64-nonatomic-lo-hi.h.
CONFIG_OF_MDIO is not needed when CONFIG_COMPILE_TEST is enabled because
of commit f9dc9ac51610 ("of/mdio: Add dummy functions in of_mdio.h.").
Fixes: 171a9bae68c7 ("staging/octeon: Allow test build on !MIPS") Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Mark Brown <broonie@kernel.org> Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 2 Aug 2019 19:34:55 +0000 (15:34 -0400)]
net: dsa: dump CPU port regs through master
Merge the CPU port registers dump into the master interface registers
dump through ethtool, by nesting the ethtool_drvinfo and ethtool_regs
structures of the CPU port into the dump.
drvinfo->regdump_len will contain the full data length, while regs->len
will contain only the master interface registers dump length.
This allows for example to dump the CPU port registers on a ZII Dev
C board like this:
# ethtool -d eth1
0x004: 0x00000000
0x008: 0x0a8000aa
0x010: 0x01000000
0x014: 0x00000000
0x024: 0xf0000102
0x040: 0x6d82c800
0x044: 0x00000020
0x064: 0x40000000
0x084: RCR (Receive Control Register) 0x47c00104
MAX_FL (Maximum frame length) 1984
FCE (Flow control enable) 0
BC_REJ (Broadcast frame reject) 0
PROM (Promiscuous mode) 0
DRT (Disable receive on transmit) 0
LOOP (Internal loopback) 0
0x0c4: TCR (Transmit Control Register) 0x00000004
RFC_PAUSE (Receive frame control pause) 0
TFC_PAUSE (Transmit frame control pause) 0
FDEN (Full duplex enable) 1
HBC (Heartbeat control) 0
GTS (Graceful transmit stop) 0
0x0e4: 0x76735d6d
0x0e8: 0x7e9e8808
0x0ec: 0x00010000
.
.
. 88E6352 Switch Port Registers
------------------------------
00: Port Status 0x4d04
Pause Enabled 0
My Pause 1
802.3 PHY Detected 0
Link Status Up
Duplex Full
Speed 100 or 200 Mbps
EEE Enabled 0
Transmitter Paused 0
Flow Control 0
Config Mode 0x4
01: Physical Control 0x003d
RGMII Receive Timing Control Default
RGMII Transmit Timing Control Default
200 BASE Mode 100
Flow Control's Forced value 0
Force Flow Control 0
Link's Forced value Up
Force Link 1
Duplex's Forced value Full
Force Duplex 1
Force Speed 100 or 200 Mbps
.
.
.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
Fix batched event generation for vlan action
When adding or deleting a batch of entries, the kernel sends up to
TCA_ACT_MAX_PRIO (defined to 32 in kernel) entries in an event to user
space. However it does not consider that the action sizes may vary and
require different skb sizes.
For example, consider the following script adding 32 entries with all
supported vlan parameters (in order to maximize netlink messages size):
$TC actions flush action vlan
for i in `seq 1 $1`;
do
cmd="action vlan push protocol 802.1q id 4094 priority 7 pipe \
index $i cookie aabbccddeeff112233445566778800a1 "
args=$args$cmd
done
$TC actions add $args
%
% ./tc-batch.sh 32
Error: Failed to fill netlink attributes while adding TC action.
We have an error talking to the kernel
%
patch 1 adds callback in tc_action_ops of vlan action, which calculates
the action size, and passes size to tcf_add_notify()/tcf_del_notify().
patch 2 updates the TDC test suite with relevant vlan test cases.
====================
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 6 Aug 2019 21:01:08 +0000 (14:01 -0700)]
Merge tag 'mips_fixes_5.3_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Paul Burton:
"A few MIPS fixes for 5.3:
- Various switch fall through annotations to fixup warnings & errors
resulting from -Wimplicit-fallthrough.
- A fix for systems (at least jazz) using an i8253 PIT as clocksource
when it's not suitably configured.
- Set struct cacheinfo's cpu_map_populated field to true, indicating
that we filled in cache info detected from cop0 registers &
avoiding complaints about that info being (intentionally) missing
in devicetree"
* tag 'mips_fixes_5.3_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: BCM63XX: Mark expected switch fall-through
MIPS: OProfile: Mark expected switch fall-throughs
MIPS: Annotate fall-through in Cavium Octeon code
MIPS: Annotate fall-through in kvm/emulate.c
mips: fix cacheinfo
MIPS: kernel: only use i8253 clocksource with periodic clockevent
====================
drop_monitor: Various improvements and cleanups
This patchset performs various improvements and cleanups in drop monitor
with no functional changes intended. There are no changes in these
patches relative to the RFC I sent two weeks ago [1].
A followup patchset will extend drop monitor with a packet alert mode in
which the dropped packet is notified to user space instead of just a
summary of recent drops. Subsequent patchsets will add the ability to
monitor hardware originated drops via drop monitor.
Ido Schimmel [Tue, 6 Aug 2019 13:19:56 +0000 (16:19 +0300)]
drop_monitor: Use pre_doit / post_doit hooks
Each operation from user space should be protected by the global drop
monitor mutex. Use the pre_doit / post_doit hooks to take / release the
lock instead of doing it explicitly in each function.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>