Quentin Schulz [Fri, 5 May 2017 13:50:31 +0000 (15:50 +0200)]
can: m_can: make m_can_start and m_can_stop symmetric
This moves clocks gating outside of the m_can_stop function as the
m_can_start function does not (and cannot, at least in current
implementation) ungate clocks. This way, both functions can now be used
symmetrically.
Signed-off-by: Quentin Schulz <quentin.schulz@free-electrons.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Quentin Schulz [Fri, 5 May 2017 13:50:30 +0000 (15:50 +0200)]
can: m_can: move Message RAM initialization to function
To avoid possible ECC/parity checksum errors when reading an
uninitialized buffer, the entire Message RAM is initialized when probing
the driver. This initialization is done in the same function reading the
Device Tree properties.
This patch moves the RAM initialization to a separate function so it can
be called separately from device initialization from Device Tree.
Signed-off-by: Quentin Schulz <quentin.schulz@free-electrons.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
I will be contributing a few new features to the Marvell PHY driver
soon. Start by making the code mostly checkpatch clean. There should
not be any functional changes. Just comments set into the correct
format, missing blank lines, turn some comparisons around, and
refactoring to reduce indentation depth.
There is still one camel in the code, but it actually makes sense, so
leave it in piece.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 16 May 2017 21:00:13 +0000 (14:00 -0700)]
tcp: replace misc tcp_time_stamp to tcp_jiffies32
After this patch, all uses of tcp_time_stamp will require
a change when we introduce 1 ms and/or 1 us TCP TS option.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 16 May 2017 21:00:12 +0000 (14:00 -0700)]
tcp_lp: cache tcp_time_stamp
tcp_time_stamp will become slightly more expensive soon,
cache its value.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 16 May 2017 21:00:11 +0000 (14:00 -0700)]
tcp_westwood: use tcp_jiffies32 instead of tcp_time_stamp
This CC does not need 1 ms tcp_time_stamp and can use
the jiffy based 'timestamp'.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 16 May 2017 21:00:10 +0000 (14:00 -0700)]
tcp: use tcp_jiffies32 in __tcp_oow_rate_limited()
This place wants to use tcp_jiffies32, this is good enough.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 16 May 2017 21:00:09 +0000 (14:00 -0700)]
tcp: uses jiffies_32 to feed tp->chrono_start
tcp_time_stamp will no longer be tied to jiffies.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 16 May 2017 21:00:08 +0000 (14:00 -0700)]
tcp: use tcp_jiffies32 to feed probe_timestamp
Use tcp_jiffies32 instead of tcp_time_stamp, since
tcp_time_stamp will soon be only used for TCP TS option.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 16 May 2017 21:00:07 +0000 (14:00 -0700)]
tcp: use tcp_jiffies32 for rcv_tstamp and lrcvtime
Use tcp_jiffies32 instead of tcp_time_stamp, since
tcp_time_stamp will soon be only used for TCP TS option.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 16 May 2017 21:00:06 +0000 (14:00 -0700)]
tcp: bic, cubic: use tcp_jiffies32 instead of tcp_time_stamp
Use tcp_jiffies32 instead of tcp_time_stamp, since
tcp_time_stamp will soon be only used for TCP TS option.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 16 May 2017 21:00:05 +0000 (14:00 -0700)]
tcp_bbr: use tcp_jiffies32 instead of tcp_time_stamp
Use tcp_jiffies32 instead of tcp_time_stamp, since
tcp_time_stamp will soon be only used for TCP TS option.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 16 May 2017 21:00:04 +0000 (14:00 -0700)]
tcp: use tcp_jiffies32 to feed tp->snd_cwnd_stamp
Use tcp_jiffies32 instead of tcp_time_stamp to feed
tp->snd_cwnd_stamp.
tcp_time_stamp will soon be a litle bit more expensive
than simply reading 'jiffies'.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 16 May 2017 21:00:03 +0000 (14:00 -0700)]
tcp: use tcp_jiffies32 to feed tp->lsndtime
Use tcp_jiffies32 instead of tcp_time_stamp to feed
tp->lsndtime.
tcp_time_stamp will soon be a litle bit more expensive
than simply reading 'jiffies'.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 16 May 2017 21:00:02 +0000 (14:00 -0700)]
dccp: do not use tcp_time_stamp
Use our own macro instead of abusing tcp_time_stamp
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 16 May 2017 21:00:01 +0000 (14:00 -0700)]
tcp: introduce tcp_jiffies32
We abuse tcp_time_stamp for two different cases :
1) base to generate TCP Timestamp options (RFC 7323)
2) A 32bit version of jiffies since some TCP fields
are 32bit wide to save memory.
Since we want in the future to have 1ms TCP TS clock,
regardless of HZ value, we want to cleanup things.
tcp_jiffies32 is the truncated jiffies value,
which will be used only in places where we want a 'host'
timestamp.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 16 May 2017 21:00:00 +0000 (14:00 -0700)]
tcp: use tp->tcp_mstamp in output path
Idea is to later convert tp->tcp_mstamp to a full u64 counter
using usec resolution, so that we can later have fine
grained TCP TS clock (RFC 7323), regardless of HZ value.
We try to refresh tp->tcp_mstamp only when necessary.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 17 May 2017 19:22:15 +0000 (15:22 -0400)]
Merge branch 'net-sched-multichain-filters'
Jiri Pirko says:
====================
net: sched: introduce multichain support for filters
Currently, each classful qdisc holds one chain of filters.
This chain is traversed and each filter could be matched on, which
may lead to execution of list of actions. One of such action
could be "reclassify", which would "reset" the processing of the
filter chain.
So this filter chain could be looked at as a flat table.
Sometimes it is convenient for user to configure a hierarchy
of tables. Example usecase is encapsulation.
Hierarchy of tables is a common way how it is done in HW pipelines.
So it is much more convenient to offload this.
This patchset contains two major patches:
8/10 - This patch introduces the support for having multiple
chains of filters.
10/10 - This patch adds new control action to allow going to specified chain
The rest of the patches are smaller or bigger depencies of those 2.
Please see individual patch descriptions for details.
Corresponding iproute2 patches are appended as a reply to this cover letter.
Simple example:
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 parent ffff: protocol ip pref 33 flower dst_mac 52:54:00:3d:c7:6d action goto chain 11
$ tc filter add dev eth0 parent ffff: protocol ip pref 22 chain 11 flower dst_ip 192.168.40.1 action drop
$ tc filter show dev eth0 root
filter parent ffff: protocol ip pref 33 flower chain 0
filter parent ffff: protocol ip pref 33 flower chain 0 handle 0x1
dst_mac 52:54:00:3d:c7:6d
eth_type ipv4
action order 1: gact action goto chain 11
random type none pass val 0
index 2 ref 1 bind 1
filter parent ffff: protocol ip pref 22 flower chain 11
filter parent ffff: protocol ip pref 22 flower chain 11 handle 0x1
eth_type ipv4
dst_ip 192.168.40.1
action order 1: gact action drop
random type none pass val 0
index 3 ref 1 bind 1
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 17 May 2017 09:08:03 +0000 (11:08 +0200)]
net: sched: add termination action to allow goto chain
Introduce new type of termination action called "goto_chain". This allows
user to specify a chain to be processed. This action type is
then processed as a return value in tcf_classify loop in similar
way as "reclassify" is, only it does not reset to the first filter
in chain but rather reset to the first filter of the desired chain.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 17 May 2017 09:08:01 +0000 (11:08 +0200)]
net: sched: introduce multichain support for filters
Instead of having only one filter per block, introduce a list of chains
for every block. Create chain 0 by default. UAPI is extended so the user
can specify which chain he wants to change. If the new attribute is not
specified, chain 0 is used. That allows to maintain backward
compatibility. If chain does not exist and user wants to manipulate with
it, new chain is created with specified index. Also, when last filter is
removed from the chain, the chain is destroyed.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 17 May 2017 09:07:57 +0000 (11:07 +0200)]
net: sched: replace nprio by a bool to make the function more readable
The use of "nprio" variable in tc_ctl_tfilter is a bit cryptic and makes
a reader wonder what is going on for a while. So help him to understand
this priority allocation dance a litte bit better.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 17 May 2017 09:07:55 +0000 (11:07 +0200)]
net: sched: introduce tcf block infractructure
Currently, the filter chains are direcly put into the private structures
of qdiscs. In order to be able to have multiple chains per qdisc and to
allow filter chains sharing among qdiscs, there is a need for common
object that would hold the chains. This introduces such object and calls
it "tcf_block".
Helpers to get and put the blocks are provided to be called from
individual qdisc code. Also, the original filter_list pointers are left
in qdisc privs to allow the entry into tcf_block processing without any
added overhead of possible multiple pointer dereference on fast path.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 16 May 2017 20:40:08 +0000 (22:40 +0200)]
drivers: net: DSA: Sort drivers
With more drivers being added, it is time to sort the drivers to
impose some order.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 16 May 2017 20:40:07 +0000 (22:40 +0200)]
net: dsa: Sort DSA tagging protocol drivers
With more tag protocols being added, regain some order by sorting the
entries in various places.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Manlunas [Tue, 16 May 2017 18:28:00 +0000 (11:28 -0700)]
liquidio: fix PF falsely indicating success at setting MAC address of a nonexistent VF
In the function assigned to .ndo_set_vf_mac, check the validity of the
vfidx argument before proceeding to tell the firmware to set the VF MAC
address.
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Rick Farrington [Tue, 16 May 2017 18:14:50 +0000 (11:14 -0700)]
liquidio: fix insmod failure when multiple NICs are plugged in
When multiple liquidio NICs are plugged in, the first insmod of the PF
driver succeeds. But after an rmmod, a subsequent insmod fails. Reason is
during rmmod, the PF driver resets the Octeon of only one of the NICs; it
neglects to reset the Octeons of the other NICs.
Fix the insmod failure by adding the missing Octeon resets at rmmod. Keep
a per-NIC refcount that indicates the number of active PFs in a given NIC.
When the refcount goes to zero, then reset the Octeon of that NIC.
Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 16 May 2017 18:10:33 +0000 (14:10 -0400)]
net: dsa: store CPU port pointer in the tree
A dsa_switch_tree instance holds a dsa_switch pointer and a port index
to identify the switch port to which the CPU is attached.
Now that the DSA layer has a dsa_port structure to hold this data, use
it to point the switch CPU port.
This patch simply substitutes s/dst->cpu_switch/dst->cpu_dp->ds/ and
s/dst->cpu_port/dst->cpu_dp->index/.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
mlxsw: Preparations for restructuring
This patchset doesn't introduce any functional changes and merely meant
to make the code base more receptive for upcoming restructuring.
The first six patches mainly shuffle code in order to reduce the scope of
structs that shouldn't be defined in the main driver header. Most of them
will be later expanded, so it makes sense to correctly place them now.
The last patches mostly simplify bridge-related functions, so that they
could be more easily modified later on.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 16 May 2017 17:38:35 +0000 (19:38 +0200)]
mlxsw: spectrum: Default ports to non-virtual mode
In virtual mode, packets are classified to FIDs based on their ingress
port and VLAN whereas in non-virtual mode only the VLAN is taken into
account.
Currently ports are initialized to use virtual mode due to the presence
of the PVID vPort. However, we're going to transition ports between both
modes based on the FIDs they use and not merely based on the presence on
a VLAN upper. Therefore, during initialization, no mode will be
explicitly set.
Since the Programmer's Reference Manual (PRM) doesn't specify a default,
explicitly set the port to non-virtual mode and later transition the
port between both modes based on the FIDs it uses.
In a follow-up patchset, this step will be moved to the common FID core
where it logically belongs.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify the code by using the common function that sets an STP state
for a Port-VLAN and remove the existing one that tries to batch it for
several VLANs.
This will help us in a follow-up patchset to introduce a unified
infrastructure for bridge ports, regardless if the bridge is VLAN-aware
or not.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
switchdev's VLAN object has the ability to describe a range of VLAN IDs,
but this is only used when VLAN operations are done using the SELF flag,
which is something we would like to remove as it allows one to bypass
the bridge driver.
Do VLAN operations on a per-VLAN basis, thereby simplifying the code and
preparing it for refactoring in a follow-up patchset.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 16 May 2017 17:38:30 +0000 (19:38 +0200)]
mlxsw: spectrum_switchdev: Remove redundant check
Since commit 97c242902c20 ("switchdev: Execute bridge ndos only for
bridge ports") switchdev code checks that port is bridged, so no need to
perform the same check in the driver.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 16 May 2017 17:38:29 +0000 (19:38 +0200)]
mlxsw: spectrum_router: Initialize RIFs in a separate function
The router interfaces (RIFs) array is currently initialized together
with the general router configuration. However, in a follow-up patchset
we're going to introduce a common RIF core that will require us to
initialize more RIF constructs, so move the RIF initialization to its
own function.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 16 May 2017 17:38:27 +0000 (19:38 +0200)]
mlxsw: spectrum_router: Move RIFs array to its rightful place
The router interfaces (RIFs) array is of no interest to code outside the
routing realm, so declare it inside the router specific struct instead
of the chip-wide one.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 16 May 2017 17:38:26 +0000 (19:38 +0200)]
mlxsw: spectrum_switchdev: Reduce scope of bridge struct
Some attributes in the global chip struct are only relevant for bridge
operation, so encapsulate them in their own struct that isn't exposed to
non-bridge code.
This will also help us later, when we add more bridge-specific
attributes.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 16 May 2017 17:38:25 +0000 (19:38 +0200)]
mlxsw: spectrum_router: Reduce scope of router struct
In a similar fashion to previous patch, the router structure
('mlxsw_sp_router') doesn't need to be accessible to anyone, but the
router code located at spectrum_router.c
Make this apparent and reduce its scope by defining it there.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 16 May 2017 16:29:11 +0000 (18:29 +0200)]
net: phy: Remove residual magic from PHY drivers
commit fa8cddaf903c ("net phylib: Remove unnecessary condition check in phy")
removed the only place where the PHY flag PHY_HAS_MAGICANEG was
checked. But it left the flag being set in the drivers. Remove the flag.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Tue, 16 May 2017 12:20:56 +0000 (15:20 +0300)]
bnx2x: Remove open coded carrier check
There is inline function to test if carrier present,
so it makes open-coded solution redundant.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 16 May 2017 11:24:36 +0000 (04:24 -0700)]
tcp: internal implementation for pacing
BBR congestion control depends on pacing, and pacing is
currently handled by sch_fq packet scheduler for performance reasons,
and also because implemening pacing with FQ was convenient to truly
avoid bursts.
However there are many cases where this packet scheduler constraint
is not practical.
- Many linux hosts are not focusing on handling thousands of TCP
flows in the most efficient way.
- Some routers use fq_codel or other AQM, but still would like
to use BBR for the few TCP flows they initiate/terminate.
This patch implements an automatic fallback to internal pacing.
Pacing is requested either by BBR or use of SO_MAX_PACING_RATE option.
If sch_fq happens to be in the egress path, pacing is delegated to
the qdisc, otherwise pacing is done by TCP itself.
One advantage of pacing from TCP stack is to get more precise rtt
estimations, and less work done from TX completion, since TCP Small
queue limits are not generally hit. Setups with single TX queue but
many cpus might even benefit from this.
Note that unlike sch_fq, we do not take into account header sizes.
Taking care of these headers would add additional complexity for
no practical differences in behavior.
Some performance numbers using 800 TCP_STREAM flows rate limited to
~48 Mbit per second on 40Gbit NIC.
As expected, number of interrupts per second is very different.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Van Jacobson <vanj@google.com> Cc: Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch series implement an idea suggested by Eric Dumazet to
reduce the contention of the udp sk_receive_queue lock when the socket is
under flood.
An ancillary queue is added to the udp socket, and the socket always
tries first to read packets from such queue. If it's empty, we splice
the content from sk_receive_queue into the ancillary queue.
The first patch introduces some helpers to keep the udp code small, and the
following two implement the ancillary queue strategy. The code is split
to hopefully help the reviewing process.
The measured overall gain under udp flood is up to the 30% depending on
the numa layout and the number of ingress queue used by the relevant nic.
The performance numbers have been gathered using pktgen as sender, with 64
bytes packets, random src port on a host b2b connected via a 10Gbs link
with the dut.
The receiver used the udp_sink program by Jesper [1] and an h/w l4 rx hash on
the ingress nic, so that the number of ingress nic rx queues hit by the udp
traffic could be controlled via ethtool -L.
The udp_sink program was bound to the first idle cpu, to get more
stable numbers.
When using a single nic rx queue, busy polling was also enabled,
elsewhere, in the above scenario, the bh processing becomes the bottle-neck
and this produces large artifacts in the measured performances (e.g.
improving the udp sink run time, decreases the overall tput, since more
action from the scheduler comes into play).
Paolo Abeni [Tue, 16 May 2017 09:20:15 +0000 (11:20 +0200)]
udp: keep the sk_receive_queue held when splicing
On packet reception, when we are forced to splice the
sk_receive_queue, we can keep the related lock held, so
that we can avoid re-acquiring it, if fwd memory
scheduling is required.
v1 -> v2:
the rx_queue_lock_held param in udp_rmem_release() is
now a bool
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Tue, 16 May 2017 09:20:14 +0000 (11:20 +0200)]
udp: use a separate rx queue for packet reception
under udp flood the sk_receive_queue spinlock is heavily contended.
This patch try to reduce the contention on such lock adding a
second receive queue to the udp sockets; recvmsg() looks first
in such queue and, only if empty, tries to fetch the data from
sk_receive_queue. The latter is spliced into the newly added
queue every time the receive path has to acquire the
sk_receive_queue lock.
The accounting of forward allocated memory is still protected with
the sk_receive_queue lock, so udp_rmem_release() needs to acquire
both locks when the forward deficit is flushed.
On specific scenarios we can end up acquiring and releasing the
sk_receive_queue lock multiple times; that will be covered by
the next patch
Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Tue, 16 May 2017 09:20:13 +0000 (11:20 +0200)]
net/sock: factor out dequeue/peek with offset code
And update __sk_queue_drop_skb() to work on the specified queue.
This will help the udp protocol to use an additional private
rx queue in a later patch.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
nfp: LSO, checksum and XDP datapath updates
This series introduces a number of refinements to standard features
like LSO and checksum offload. Three major features are support for
CHECKSUM_COMPLETE, refinement of TSO handling and another small speed
up for XDP TX. This series also switches from depending on some
app FW<>driver ABI versions to heavier use of capabilities.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 16 May 2017 00:55:23 +0000 (17:55 -0700)]
nfp: eliminate an if statement in calculation of completed frames
Given that our rings are always a power of 2, we can simplify the
calculation of number of completed TX descriptors by using masking
instead of if statement based on whether the index have wrapped
or not.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 16 May 2017 00:55:22 +0000 (17:55 -0700)]
nfp: add a helper for wrapping descriptor index
We have a number of places where we calculate the descriptor
index based on a value which may have overflown. Create a
macro for masking with the ring size.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 16 May 2017 00:55:21 +0000 (17:55 -0700)]
nfp: complete the XDP TX ring only when it's full
Since XDP TX ring holds "spare" RX buffers anyway, we don't have to
rush the completion. We can wait until ring fills up completely
before trying to reclaim buffers. If RX poll has ended an no
buffer has been queued for XDP TX we have no guarantee we will see
another interrupt, so run the reclaim there as well, to make sure
TX statistics won't become stale.
This should help us reclaim more buffers per single queue controller
register read.
Note that the XDP completion is very trivial, it only adds up
the sizes of transmitted frames for statistics so the latency
spike should be acceptable. In case user sets the ring sizes
to something crazy, limit the completion to 2k entries.
The check if the ring is empty at the beginning of xdp_complete()
is no longer needed - the callers will perform it.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 16 May 2017 00:55:20 +0000 (17:55 -0700)]
nfp: add CHECKSUM_COMPLETE support
Introduce NFP_NET_CFG_CTRL_CSUM_COMPLETE capability and implement parsing
of CHECKSUM_COMPLETE metadata.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Edwin Peer <edwin.peer@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Edwin Peer [Tue, 16 May 2017 00:55:19 +0000 (17:55 -0700)]
nfp: version independent support for chained RSS metadata
ABI version 4 introduced metadata chaining. Using the ABI version to signal
metadata chaining precludes firmware that advertises new capabilities which
rely on prepended metadata from working on older kernels.
Capability bits are thus better suited to signalling the chained metadata
format. A new version of the RSS capability is introduced to distinguish
between the differing metadata formats for ABI versions other than 4.
Signed-off-by: Edwin Peer <edwin.peer@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 16 May 2017 00:55:18 +0000 (17:55 -0700)]
nfp: don't assume RSS and IRQ moderation are always enabled
Even if capability for RSS and IRQ moderation are present we may
have not initialized them for control vNIC. Depend on selected
features mask (ctrl) rather than capabilities (cap) to determine
which features should be enabled.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Edwin Peer [Tue, 16 May 2017 00:55:17 +0000 (17:55 -0700)]
nfp: support LSO2 capability
Firmware advertising the LSO2 capability exploits driver provided L3 and L4
offsets in order to avoid parsing packet headers in the TX path. The vlan
field in struct nfp_net_tx_desc is repurposed, making TXVLAN a mutually
exclusive configuration to LSO2.
Signed-off-by: Edwin Peer <edwin.peer@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Edwin Peer [Tue, 16 May 2017 00:55:16 +0000 (17:55 -0700)]
nfp: rename l4_offset in struct nfp_net_tx_desc to lso_hdrlen
The l4_offset field referred to by NFD is confusingly named. It is not the
offset of the L4 transport header, but rather the L4 payload.
The LSO2 capability supported by alternative device firmware requires
the actual L4 offset, thus the rename seems prudent.
Signed-off-by: Edwin Peer <edwin.peer@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 16 May 2017 00:55:15 +0000 (17:55 -0700)]
nfp: don't enable TSO on the device when disabled
We advertise TSO to the stack but leave it disabled by default.
Make sure it's not only disabled in the netdev features but
also on the device itself.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Wed, 10 May 2017 01:30:12 +0000 (18:30 -0700)]
bnxt: add dma mapping attributes
On the SPARC platform we need to use the DMA_ATTR_WEAK_ORDERING attribute
in our Rx path dma mapping in order to get the expected performance out
of the receive path. Adding it to the Tx path has little effect, so
that's not a part of this patch.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Tushar Dave <tushar.n.dave@oracle.com> Reviewed-by: Tom Saeger <tom.saeger@oracle.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
drivers: net: xgene: Add ethtool stats and bug fixes
This patch set,
- adds ethtool extended statistics support
- addresses errata workarounds
- fixes bugs related to statistics
v2: Address review comments from v1
- Adds lock to protect mdio-xgene indirect MAC access
- Refactors xgene-enet indirect MAC read/write functions
- Uses mdio-xgene MAC access routines, if xgene-enet port
use the same HW.
v1:
- Initial version
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: Quan Nguyen <qnguyen@apm.com>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Quan Nguyen [Wed, 10 May 2017 20:45:09 +0000 (13:45 -0700)]
drivers: net: xgene: Workaround for HW errata 10GE_10/ENET_15
This patch adds workaround for HW errata 10GE_10 and ENET_15:
"HW statistic counters value are duplicated".
- RFCS duplicates RALN counter
- RFLR duplicates RUND counter
- TFCS duplicates TFRG counter
- RALN should be intepreted as 0 in 10G mode
Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds statistic counter for frames recovered from HW errata
10GE_8 and ENET_11:
"HW reports Length error for valid 64 byte frames with len <46 bytes".
Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quan Nguyen [Wed, 10 May 2017 20:45:07 +0000 (13:45 -0700)]
drivers: net: xgene: Workaround for HW errata 10GE_4
This patch adds workaround for HW errata 10GE_4:
"XGENET_ICM_ECM_DROP_COUNT_REG_0 reg not clear on read".
Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds rx_overrun and tx_underrun ethtool statistic counters.
Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quan Nguyen [Wed, 10 May 2017 20:45:05 +0000 (13:45 -0700)]
drivers: net: xgene: Extend ethtool statistics
This patch adds extended ethtool statistics support.
Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quan Nguyen [Wed, 10 May 2017 20:45:04 +0000 (13:45 -0700)]
drivers: net: xgene: Remove unused macros
This patch cleans up unused macros to improve readability.
Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the tx error counters and adds more rx error counters.
Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quan Nguyen [Wed, 10 May 2017 20:45:02 +0000 (13:45 -0700)]
drivers: net: xgene: Remove redundant local stats
Commit 5944701df90d ("net: remove useless memset's in drivers get_stats64")
makes the pdata->stats redundant. This patch removes pdata->stats and
updates get_stats64() callback accordingly.
Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quan Nguyen [Wed, 10 May 2017 20:45:01 +0000 (13:45 -0700)]
drivers: net: xgene: Use rgmii mdio mac access
This patch switches to use rgmii mdio mac access routines if available,
as they share the same HW.
Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quan Nguyen [Wed, 10 May 2017 20:45:00 +0000 (13:45 -0700)]
drivers: net: phy: xgene: Add lock to protect mac access
This patch,
- refactors mac access routine
- adds lock to protect mac indirect access
Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
- refactors mac read/write functions
- adds lock to protect indirect mac access
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
1) Track alignment in BPF verifier so that legitimate programs won't be
rejected on !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS architectures.
2) Make tail calls work properly in arm64 BPF JIT, from Deniel
Borkmann.
3) Make the configuration and semantics Generic XDP make more sense and
don't allow both generic XDP and a driver specific instance to be
active at the same time. Also from Daniel.
4) Don't crash on resume in xen-netfront, from Vitaly Kuznetsov.
5) Fix use-after-free in VRF driver, from Gao Feng.
6) Use netdev_alloc_skb_ip_align() to avoid unaligned IP headers in
qca_spi driver, from Stefan Wahren.
7) Always run cleanup routines in BPF samples when we get SIGTERM, from
Andy Gospodarek.
8) The mdio phy code should bring PHYs out of reset using the shared
GPIO lines before invoking bus->reset(). From Florian Fainelli.
9) Some USB descriptor access endian fixes in various drivers from
Johan Hovold.
10) Handle PAUSE advertisements properly in mlx5 driver, from Gal
Pressman.
11) Fix reversed test in mlx5e_setup_tc(), from Saeed Mahameed.
12) Cure netdev leak in AF_PACKET when using timestamping via control
messages. From Douglas Caetano dos Santos.
13) netcp doesn't support HWTSTAMP_FILTER_ALl, reject it. From Miroslav
Lichvar.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
ldmvsw: stop the clean timer at beginning of remove
ldmvsw: unregistering netdev before disable hardware
net: netcp: fix check of requested timestamping filter
ipv6: avoid dad-failures for addresses with NODAD
qed: Fix uninitialized data in aRFS infrastructure
mdio: mux: fix device_node_continue.cocci warnings
net/packet: fix missing net_device reference release
net/mlx4_core: Use min3 to select number of MSI-X vectors
macvlan: Fix performance issues with vlan tagged packets
net: stmmac: use correct pointer when printing normal descriptor ring
net/mlx5: Use underlay QPN from the root name space
net/mlx5e: IPoIB, Only support regular RQ for now
net/mlx5e: Fix setup TC ndo
net/mlx5e: Fix ethtool pause support and advertise reporting
net/mlx5e: Use the correct pause values for ethtool advertising
vmxnet3: ensure that adapter is in proper state during force_close
sfc: revert changes to NIC revision numbers
net: ch9200: add missing USB-descriptor endianness conversions
net: irda: irda-usb: fix firmware name on big-endian hosts
net: dsa: mv88e6xxx: add default case to switch
...
Linus Torvalds [Mon, 15 May 2017 22:27:02 +0000 (15:27 -0700)]
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"A set of minor cifs fixes"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
[CIFS] Minor cleanup of xattr query function
fs: cifs: transport: Use time_after for time comparison
SMB2: Fix share type handling
cifs: cifsacl: Use a temporary ops variable to reduce code length
Don't delay freeing mids when blocked on slow socket write of request
CIFS: silence lockdep splat in cifs_relock_file()
David S. Miller [Mon, 15 May 2017 19:36:09 +0000 (15:36 -0400)]
Merge branch 'ldmsw-fixes'
Shannon Nelson says:
====================
ldmvsw: port removal stability
Under heavy reboot stress testing we found a couple of timing issues
when removing the device that could cause the kernel great heartburn,
addressed by these two patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Tai [Mon, 15 May 2017 17:51:07 +0000 (10:51 -0700)]
ldmvsw: unregistering netdev before disable hardware
When running LDom binding/unbinding test, kernel may panic
in ldmvsw_open(). It is more likely that because we're removing
the ldc connection before unregistering the netdev in vsw_port_remove(),
we set up a window of time where one process could be removing the
device while another trying to UP the device. This also sometimes causes
vio handshake error due to opening a device without closing it completely.
We should unregister the netdev before we disable the "hardware".
Signed-off-by: Thomas Tai <thomas.tai@oracle.com> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Miroslav Lichvar [Mon, 15 May 2017 14:04:36 +0000 (16:04 +0200)]
net: netcp: fix check of requested timestamping filter
The driver doesn't support timestamping of all received packets and
should return error when trying to enable the HWTSTAMP_FILTER_ALL
filter.
Cc: WingMan Kwok <w-kwok2@ti.com> Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This series contains some mlx5 fixes for net.
Please pull and let me know if there's any problem.
For -stable:
("net/mlx5e: Fix ethtool pause support and advertise reporting") kernels >= 4.8
("net/mlx5e: Use the correct pause values for ethtool advertising") kernels >= 4.8
v1->v2:
Dropped statistics spinlock patch, it needs some extra work.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mahesh Bandewar [Sat, 13 May 2017 00:03:39 +0000 (17:03 -0700)]
ipv6: avoid dad-failures for addresses with NODAD
Every address gets added with TENTATIVE flag even for the addresses with
IFA_F_NODAD flag and dad-work is scheduled for them. During this DAD process
we realize it's an address with NODAD and complete the process without
sending any probe. However the TENTATIVE flags stays on the
address for sometime enough to cause misinterpretation when we receive a NS.
While processing NS, if the address has TENTATIVE flag, we mark it DADFAILED
and endup with an address that was originally configured as NODAD with
DADFAILED.
We can't avoid scheduling dad_work for addresses with NODAD but we can
avoid adding TENTATIVE flag to avoid this racy situation.
Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>