Xin Long [Thu, 1 Mar 2018 15:05:16 +0000 (23:05 +0800)]
sctp: remove the unnecessary transport looking up from sctp_sendmsg
Now sctp_assoc_lookup_paddr can only be called only if daddr has
been set. But if daddr has been set, sctp_endpoint_lookup_assoc
would be done, where it could already have the transport.
So this unnecessary transport looking up should be removed, but
only reset transport as NULL when SCTP_ADDR_OVER is not set for
UDP type socket.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 1 Mar 2018 15:05:14 +0000 (23:05 +0800)]
sctp: factor out sctp_sendmsg_parse from sctp_sendmsg
This patch is to move the codes for parsing msghdr and checking
sk into sctp_sendmsg_parse.
Note that different from before, 'sinfo' in sctp_sendmsg won't
be NULL any more. It gets the value either from cmsgs->srinfo,
cmsgs->sinfo or asoc. With it, the 'sinfo' and 'fill_sinfo_ttl'
check can be removed from sctp_sendmsg.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Extend bpftool to build up CFG information of eBPF programs and add an
option to dump this in DOT format such that this can later be used with
DOT graphic tools (xdot, graphviz, etc) to visualize it. Part of the
analysis performed is sub-program detection and basic-block partitioning,
from Jiong.
2) Multiple enhancements for bpftool's batch mode, more specifically the
parser now understands comments (#), continuation lines (\), and arguments
enclosed between quotes. Also, allow to read from stdin via '-' as input
file, all from Quentin.
3) Improve BPF kselftests by i) unifying the rlimit handling into a helper
that is then used by all tests, and ii) add support for testing tail calls
to test_verifier plus add tests covering all corner cases. The latter is
especially useful for testing JITs, from Daniel.
4) Remove x64 JIT's bpf_flush_icache() since flush_icache_range() is a noop
on x64, from Daniel.
5) Fix one more occasion in BPF samples where we do not detach the BPF program
from the cgroup after completion, from Prashant.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Fri, 2 Mar 2018 13:42:39 +0000 (13:42 +0000)]
net/usb/kalmia: use ARRAY_SIZE for various array sizing calculations
Use the ARRAY_SIZE macro on a couple of arrays to determine
size of the arrays. Also fix up alignment to clean up a checkpatch
warning. Improvement suggested by Coccinelle.
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Fri, 2 Mar 2018 10:27:07 +0000 (15:57 +0530)]
cxgb4: Add TP Congestion map entry for single-port
Add TP Congestion Map entry for single-port T6 cards.
Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 2 Mar 2018 14:50:21 +0000 (09:50 -0500)]
Merge tag 'mac80211-next-for-davem-2018-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Only a few new things:
* hwsim net namespace stuff from Kirill Tkhai
* A-MSDU support in fast-RX
* 4-addr mode support in fast-RX
* support for a spec quirk in Add-BA negotiation
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Fri, 2 Mar 2018 09:05:49 +0000 (14:35 +0530)]
cxgb4: remove dead code when allocating filter
Error code is already returned earlier if filter exists
at specified location. So, remove dead code trying to
free existing filter.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Thu, 1 Mar 2018 11:30:17 +0000 (14:30 +0300)]
net: Convert hwsim_net_ops
These pernet_operations allocate and destroy IDA identifier,
and these actions are synchronized by IDA subsystem locks.
Exit method removes mac80211_hwsim_data enteries from the lists,
and this is synchronized by hwsim_radio_lock with the rest
parallel pernet_operations. Also it queues destroy_radio()
work, and these work already may be executed in parallel
with any pernet_operations (as it's a work :). So, we may
mark these pernet_operations as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Kirill Tkhai [Thu, 1 Mar 2018 11:30:09 +0000 (14:30 +0300)]
mac80211_hwsim: Make hwsim_netgroup IDA
hwsim_netgroup counter is declarated as int, and it is incremented
every time a new net is created. After sizeof(int) net are created,
it will overflow, and different net namespaces will have the same
identifier. This patch fixes the problem by introducing IDA instead
of int counter. IDA guarantees, all the net namespaces have the uniq
identifier.
Note, that after we do ida_simple_remove() in hwsim_exit_net(),
and we destroy the ID, later there may be executed destroy_radio()
from the workqueue. But destroy_radio() does not use the ID, so it's OK.
Out of bounds of this patch, just as a report to wireless subsystem
maintainer, destroy_radio() increaments hwsim_radios_generation
without hwsim_radio_lock, so this may need one more patch to fix.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Daniel Borkmann [Fri, 2 Mar 2018 08:46:41 +0000 (09:46 +0100)]
Merge branch 'bpf-bpftool-batch-improvements'
Quentin Monnet says:
====================
Several enhancements for bpftool batch mode are introduced in this series.
More specifically, input files for batch mode gain support for:
* comments (starting with '#'),
* continuation lines (after a line ending with '\'),
* arguments enclosed between quotes.
Also, make bpftool able to read from standard input when "-" is provided as
input file name.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Quentin Monnet [Fri, 2 Mar 2018 04:20:11 +0000 (20:20 -0800)]
tools: bpftool: add support for quotations in batch files
Improve argument parsing from batch input files in order to support
arguments enclosed between single (') or double quotes ("). For example,
this command can now be parsed in batch mode:
bpftool prog dump xlated id 1337 file "/tmp/my file with spaces"
The function responsible for parsing command arguments is copied from
its counterpart in lib/utils.c in iproute2 package.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Quentin Monnet [Fri, 2 Mar 2018 04:20:09 +0000 (20:20 -0800)]
tools: bpftool: support continuation lines in batch files
Add support for continuation lines, such as in the following example:
prog show
prog dump xlated \
id 1337 opcodes
This patch is based after the code for support for continuation lines
from file lib/utils.c from package iproute2.
"Lines" in error messages are renamed as "commands", as we count the
number of commands (but we ignore empty lines, comments, and do not add
continuation lines to the count).
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Quentin Monnet [Fri, 2 Mar 2018 04:20:08 +0000 (20:20 -0800)]
tools: bpftool: support comments in batch files
Replace '#' by '\0' in commands read from batch files in order to avoid
processing the remaining part of the line, thus allowing users to use
comments in the files.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
David S. Miller [Fri, 2 Mar 2018 02:44:29 +0000 (21:44 -0500)]
Merge branch 'tcp_bbr-more-GSO-work'
Eric Dumazet says:
====================
tcp_bbr: more GSO work
Playing with r8152 USB 1Gbit NIC, on both USB2 and USB3 slots, I found
that BBR was performing poorly, because of TSO being limited to 16KB
This patch series makes sure BBR is not under estimating number of
packets that are needed to fill the pipe when a device has suboptimal
TSO limits.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 28 Feb 2018 22:40:46 +0000 (14:40 -0800)]
tcp_bbr: better deal with suboptimal GSO (II)
This is second part of dealing with suboptimal device gso parameters.
In first patch (350c9f484bde "tcp_bbr: better deal with suboptimal GSO")
we dealt with devices having low gso_max_segs
Some devices lower gso_max_size from 64KB to 16 KB (r8152 is an example)
In order to probe an optimal cwnd, we want BBR being not sensitive
to whatever GSO constraint a device can have.
This patch removes tso_segs_goal() CC callback in favor of
min_tso_segs() for CC wanting to override sysctl_tcp_min_tso_segs
Next patch will remove bbr->tso_segs_goal since it does not have
to be persistent.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch set is an application of CFG information on eBPF program
visualization. It presents some initial code for building CFG information
from eBPF instruction sequences.
After we get eBPF program bytecode, we do sub-program detection and
basic-block partition. These information then are visualized into DOT
graph.
The user could use any DOT graphic tools (xdot, graphviz etc) to view it.
For example:
bpftool prog dump xlated id 2 visual &>output.dot
[xdot | dotty] output.dot
dot -Tpng -o output.png
This initial patch set hasn't tuned much on the dot description layout
nor decoration, we could improve them later once the direction of the patch
set is agreed on. We could also visualize some static analysis performance
data.
Quentin Monnet [Fri, 2 Mar 2018 02:01:23 +0000 (18:01 -0800)]
tools: bpftool: add bash completion for CFG dump
Add bash completion for the "visual" keyword used for dumping the CFG of
eBPF programs with bpftool. Make sure we only complete with this keyword
when we dump "xlated" (and not "jited") instructions.
Acked-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jiong Wang [Fri, 2 Mar 2018 02:01:19 +0000 (18:01 -0800)]
tools: bpftool: partition basic-block for each function in the CFG
This patch partition basic-block for each function in the CFG. The
algorithm is simple, we identify basic-block head in a first traversal,
then second traversal to identify the tail.
We could build extended basic-block (EBB) in next steps. EBB could make the
graph more readable when the eBPF sequence is big.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jiong Wang [Fri, 2 Mar 2018 02:01:18 +0000 (18:01 -0800)]
tools: bpftool: detect sub-programs from the eBPF sequence
This patch detect all sub-programs from the eBPF sequence and keep the
information in the new CFG data structure.
The detection algorithm is basically the same as the one in verifier except
we need to use insn->off instead of insn->imm to get the pc-relative call
offset. Because verifier has modified insn->off/insn->imm during finishing
the verification.
Also, we don't need to do some sanity checks as verifier has done them.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
socket: skip checking sk_err for recvmmsg(MSG_ERRQUEUE)
recvmmsg does not call ___sys_recvmsg when sk_err is set.
That is fine for normal reads but, for MSG_ERRQUEUE, recvmmsg
should always call ___sys_recvmsg regardless of sk->sk_err to
be able to clear error queue. Otherwise, users are not able to
drain the error queue using recvmmsg.
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 2 Mar 2018 02:23:42 +0000 (21:23 -0500)]
Merge branch 'net-phy-Reduce-duplication'
Florian Fainelli says:
====================
net: phy: Reduce duplication
This patch series reduces the duplication among 10G PHY drivers that just
essentially stub most functions, but do that while replicating what the existing
generic functions do.
Changes in v3:
- removed unused "reg" variable in teranetics.c
- fixed subject for patch 5 since we actually use gen10g_no_soft_reset()
Changes in v2:
- rename gen10g_soft_reset() to gen10g_no_soft_reset() to better illustrate
what it does (or does not)
- removed stray comment in marvell10g.c
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
cortina_soft_reset() does the same thing as gen10g_soft_reset(), and
cortina_config_aneg() is actually doing what gen10g_config_init() does
for 10G capable PHYs.
Update teranetics_aneg_done() to use genphy_c45_aneg_done() instead of
duplicating that code, and switch to gen10g_* functions where
appropriate instead of maintaining identical copies doing nothing.
In order to remove a fair amount of duplication in the different 10G PHY
drivers, export all gen10g_* functions to be able to make use of those.
While we are at it, rename gen10g_soft_reset() to gen10g_no_soft_reset()
to illustrate what it does.
Finn Thain [Thu, 1 Mar 2018 23:29:28 +0000 (18:29 -0500)]
net/mac89x0: Replace custom debug logging with netif_* calls
Adopt the conventional style of debug logging because it is both
shorter and more flexible.
Remove the 'version_printed' flag as the version will be printed
only once anyway (when the module loads).
Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Finn Thain [Thu, 1 Mar 2018 23:29:28 +0000 (18:29 -0500)]
net/mac89x0: Fix and modernize log messages
Fix log message fragments that no longer produce the desired output
since the behaviour of printk() was changed.
Add missing printk severity levels.
Drop deprecated "out of memory" message as per checkpatch advice.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Finn Thain [Thu, 1 Mar 2018 23:29:28 +0000 (18:29 -0500)]
net/mac89x0: Convert to platform_driver
Apparently these Dayna cards don't have a pseudoslot declaration ROM
which means they can't be probed like NuBus cards.
Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 2 Mar 2018 02:19:03 +0000 (21:19 -0500)]
Merge branch 'forwarding-selftest-fixes'
David Ahern says:
====================
selftests: forwarding: misc bug fixes and enhancements
Bug fixes and an enhancement for the recent forwarding tests:
- only check tc version on tc tests
- handle multipath tests failing with 0 packet count
- fix ping command for IPv6 on Debian jessie
- improve summary of multipath tests
v2
- add CHECK_TC to bridge_vlan_aware.sh (Ido)
- dropped patch 2; always check for mz given its use
- fixed commit message for the last patch (Multipath: was dropped)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Mar 2018 21:49:31 +0000 (13:49 -0800)]
selftests: forwarding: Handle 0 for packet difference in multipath tests
If the packet stats have a difference of 0, the test output shows:
INFO: Expected ratio 2.00 Measured ratio
Runtime error (func=(main), adr=9): Divide by zero
(standard_in) 2: syntax error
(standard_in) 1: syntax error
./router_multipath.sh: line 187: test: : integer expression expected
TEST: Multipath [FAIL]
Too large discrepancy between expected and measured ratios
Handle the 0 and display a cleaner message:
INFO: Running IPv6 multipath tests
TEST: Multipath [FAIL]
Packet difference is 0
Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Fri, 2 Mar 2018 01:55:37 +0000 (17:55 -0800)]
fib_rules: FRA_GENERIC_POLICY updates for ip proto, sport and dport attrs
Fixes: bfff4862653b ("net: fib_rules: support for match on ip_proto, sport and dport") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Prashant Bhole [Thu, 1 Mar 2018 05:47:40 +0000 (14:47 +0900)]
samples/bpf: detach prog from cgroup
test_cgrp2_sock.sh and test_cgrp2_sock2.sh tests keep the program
attached to cgroup even after completion.
Using detach functionality of test_cgrp2_sock in both scripts.
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Sabrina Dubroca [Wed, 28 Feb 2018 15:40:08 +0000 (16:40 +0100)]
ipv6: allow userspace to add IFA_F_OPTIMISTIC addresses
According to RFC 4429 (section 3.1), adding new IPv6 addresses as
optimistic addresses is acceptable, as long as the implementation
follows some rules:
* Optimistic DAD SHOULD only be used when the implementation is aware
that the address is based on a most likely unique interface
identifier (such as in [RFC2464]), generated randomly [RFC3041],
or by a well-distributed hash function [RFC3972] or assigned by
Dynamic Host Configuration Protocol for IPv6 (DHCPv6) [RFC3315].
Optimistic DAD SHOULD NOT be used for manually entered
addresses.
Thus, it seems reasonable to allow userspace to set the optimistic flag
when adding new addresses.
We must not let userspace set NODAD + OPTIMISTIC, since if the kernel is
not performing DAD we would never clear the optimistic flag. We must
also ignore userspace's request to add OPTIMISTIC flag to addresses that
have already completed DAD (addresses that don't have the TENTATIVE
flag, or that have the DADFAILED flag).
Then we also need to clear the OPTIMISTIC flag on permanent addresses
when DAD fails. Otherwise, IFA_F_OPTIMISTIC addresses added by userspace
can still be used after DAD has failed, because in
ipv6_chk_addr_and_flags(), IFA_F_OPTIMISTIC overrides IFA_F_TENTATIVE.
Setting IFA_F_OPTIMISTIC from userspace is conditional on
CONFIG_IPV6_OPTIMISTIC_DAD and the optimistic_dad sysctl.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
As a part of working on MII time stamping infrastructure, I was trying
to figure out how netdev->phydev gets assigned, and I stumbled across
this. Ever since the new phylink code came in, the field is assigned
twice.
The function, phylink_connect_phy(), calls
phy_attach_direct()
phylink_bringup_phy()
and phy_attach_direct() sets
dev->phydev = phydev;
but phylink_bringup_phy() then sets the same field again:
pl->netdev->phydev = phy;
Similarly, the function, phylink_of_phy_connect(), calls
Karsten Graul [Thu, 1 Mar 2018 12:51:33 +0000 (13:51 +0100)]
net/smc: prevent new connections on link group
When the processing of a DELETE LINK message has started,
new connections should not be added to the link group that
is about to terminate.
Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Thu, 1 Mar 2018 12:51:32 +0000 (13:51 +0100)]
net/smc: process add/delete link messages
Add initial support for the LLC messages ADD LINK and DELETE LINK.
Introduce a link state field. Extend the initial LLC handshake with
ADD LINK processing.
Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Thu, 1 Mar 2018 12:51:31 +0000 (13:51 +0100)]
net/smc: do not allow eyecatchers in rmbe
SMC does not support eyecatchers in RMB elements,
decline peers requesting this support.
Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Thu, 1 Mar 2018 12:51:30 +0000 (13:51 +0100)]
net/smc: process confirm/delete rkey messages
Process and respond to CONFIRM RKEY and DELETE RKEY messages.
Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Thu, 1 Mar 2018 12:51:29 +0000 (13:51 +0100)]
net/smc: respond to test link messages
Add TEST LINK message responses, which also serves as preparation for
support of sockopt TCP_KEEPALIVE.
Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Thu, 1 Mar 2018 12:51:28 +0000 (13:51 +0100)]
net/smc: remove unused fields from smc structures
The daddr field holds the destination IPv4 address. The field was set but
never used and can be removed. The addr field was a left-over from an
earlier version of non-blocking connects and can be removed.
The result of the call to kernel_getpeername is not used, the call can be
removed. Non-blocking connects are working, so remove restriction comment.
Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Thu, 1 Mar 2018 12:51:27 +0000 (13:51 +0100)]
net/smc: move netinfo function to file smc_clc.c
The function smc_netinfo_by_tcpsk() belongs to CLC handling.
Move it to smc_clc.c and rename to smc_clc_netinfo_by_tcpsk.
Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Raspl [Thu, 1 Mar 2018 12:51:26 +0000 (13:51 +0100)]
net/smc: cleanup smc_llc.h and smc_clc.h headers
Remove structures used internal only from headers.
And remove an extra function parameter.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 1 Mar 2018 18:13:24 +0000 (13:13 -0500)]
Merge branch 'ipv4-ipv6-mcast-align'
Yuval Mintz says:
====================
ipmr, ip6mr: Align multicast routing for IPv4 & IPv6
Historically ip6mr was based [cut-n-paste] on ipmr and the two have not
diverged too much. Apparently as ipv4 multicast routing is more common
than its ipv6 brethren modifications since then are mostly one-way,
affecting ipmr while leaving ip6mr unchanged.
This series is meant to re-factor both ipmr and ip6mr into having common
structures [and some functionality], adding 2 new common files -
mroute_base.h and ipmr_base.c.
The series begins by bringing ip6mr up to speed to some of the changes
applied in the past to ipmr [#2, #3].
It is then possible to re-factor a lot of the common structures -
vif devices [#1], mr_table [#4] mfc_cache [#6], and use the common
structures in both ipmr and ip6mr.
The rest of the patches re-factor some choice flows used by both ipmr
and ip6mr and eliminates duplicity.
This series would later allow for easy extension of ipmr offloading
to support ip6mr offloading as well, as almost all structures
related to the offloading would be shared between the two protocols.
Changes from previous versions
------------------------------
v2:
- #6 Corrected reporting logic when hitting an unresolved cache
- #7 Addressed kernel doc style [Thanks Nikolay]
RFC -> v1:
- Corrected support for CONFIG_IP{,V6}_MROUTE_MULTIPLE_TABLES
- Addressed a couple of kbuild test robot issues
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 28 Feb 2018 21:29:39 +0000 (23:29 +0200)]
ipmr, ip6mr: Unite dumproute flows
The various MFC entries are being held in the same kind of mr_tables
for both ipmr and ip6mr, and their traversal logic is identical.
Also, with the exception of the addresses [and other small tidbits]
the major bulk of the nla setting is identical.
Unite as much of the dumping as possible between the two.
Notice this requires creating an mr_table iterator for each, as the
for-each preprocessor macro can't be used by the common logic.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 28 Feb 2018 21:29:38 +0000 (23:29 +0200)]
ip6mr: Remove MFC_NOTIFY and refactor flags
MFC_NOTIFY exists in ip6mr, probably as some legacy code
[was already removed for ipmr in commit 06bd6c0370bb ("net: ipmr: remove unused MFC_NOTIFY flag and make the flags enum").
Remove it from ip6mr as well, and move the enum into a common file;
Notice MFC_OFFLOAD is currently only used by ipmr.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 28 Feb 2018 21:29:37 +0000 (23:29 +0200)]
ipmr, ip6mr: Unite vif seq functions
Same as previously done with the mfc seq, the logic for the vif seq is
refactored to be shared between ipmr and ip6mr.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 28 Feb 2018 21:29:36 +0000 (23:29 +0200)]
ipmr, ip6mr: Unite mfc seq logic
With the exception of the final dump, ipmr and ip6mr have the exact same
seq logic for traversing a given mr_table. Refactor that code and make
it common.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 28 Feb 2018 21:29:35 +0000 (23:29 +0200)]
ipmr, ip6mr: Unite logic for searching in MFC cache
ipmr and ip6mr utilize the exact same methods for searching the
hashed resolved connections, difference being only in the construction
of the hash comparison key.
In order to unite the flow, introduce an mr_table operation set that
would contain the protocol specific information required for common
flows, in this case - the hash parameters and a comparison key
representing a (*,*) route.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 28 Feb 2018 21:29:34 +0000 (23:29 +0200)]
ipmr, ip6mr: Make mfc_cache a common structure
mfc_cache and mfc6_cache are almost identical - the main difference is
in the origin/group addresses and comparison-key. Make a common
structure encapsulating most of the multicast routing logic - mr_mfc
and convert both ipmr and ip6mr into using it.
For easy conversion [casting, in this case] mr_mfc has to be the first
field inside every multicast routing abstraction utilizing it.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 28 Feb 2018 21:29:33 +0000 (23:29 +0200)]
ipmr, ip6mr: Unite creation of new mr_table
Now that both ipmr and ip6mr are using the same mr_table structure,
we can have a common function to allocate & initialize a new instance.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 28 Feb 2018 21:29:32 +0000 (23:29 +0200)]
mroute*: Make mr_table a common struct
Following previous changes to ip6mr, mr_table and mr6_table are
basically the same [up to mr6_table having additional '6' suffixes to
its variable names].
Move the common structure definition into a common header; This
requires renaming all references in ip6mr to variables that had the
distinct suffix.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 28 Feb 2018 21:29:31 +0000 (23:29 +0200)]
ip6mr: Align hash implementation to ipmr
Since commit 8fb472c09b9d ("ipmr: improve hash scalability") ipmr has
been using rhashtable as a basis for its mfc routes, but ip6mr is
currently still using the old private MFC hash implementation.
Align ip6mr to the current ipmr implementation.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 28 Feb 2018 21:29:30 +0000 (23:29 +0200)]
ip6mr: Make mroute_sk rcu-based
In ipmr the mr_table socket is handled under RCU. Introduce the same
for ip6mr.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 28 Feb 2018 21:29:29 +0000 (23:29 +0200)]
ipmr,ipmr6: Define a uniform vif_device
The two implementations have almost identical structures - vif_device and
mif_device. As a step toward uniforming the mr_tables, eliminate the
mif_device and relocate the vif_device definition into a new common
header file.
Also, introduce a common initializing function for setting most of the
vif_device fields in a new common source file. This requires modifying
the ipv{4,6] Kconfig and ipv4 makefile as we're introducing a new common
config option - CONFIG_IP_MROUTE_COMMON.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
fib_rules: support sport, dport and proto match
This series extends fib rule match support to include sport, dport
and ip proto match (to complete the 5-tuple match support).
Common use-cases of Policy based routing in the data center require
5-tuple match. The last 2 patches in the series add a call to flow dissect
in the fwd path if required by the installed fib rules (controlled by a flag).
v1:
- Fix errors reported by kbuild and feedback on RFC
- extend port match uapi to accomodate port ranges
v2:
- address comments from Nikolay, David Ahern and Paolo (Thanks!)
Pending things I will submit separate patches for:
- extack for fib rules
- fib rules test (as requested by david ahern)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Thu, 1 Mar 2018 03:43:22 +0000 (22:43 -0500)]
ipv6: route: dissect flow in input path if fib rules need it
Dissect flow in fwd path if fib rules require it. Controlled by
a flag to avoid penatly for the common case. Flag is set when fib
rules with sport, dport and proto match that require flow dissect
are installed. Also passes the dissected hash keys to the multipath
hash function when applicable to avoid dissecting the flow again.
icmp packets will continue to use inner header for hash
calculations.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Thu, 1 Mar 2018 03:42:41 +0000 (22:42 -0500)]
ipv6: route: dissect flow in input path if fib rules need it
Dissect flow in fwd path if fib rules require it. Controlled by
a flag to avoid penatly for the common case. Flag is set when fib
rules with sport, dport and proto match that require flow dissect
are installed. Also passes the dissected hash keys to the multipath
hash function when applicable to avoid dissecting the flow again.
icmp packets will continue to use inner header for hash
calculations (Thanks to Nikolay Aleksandrov for some review here).
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Thu, 1 Mar 2018 03:41:37 +0000 (22:41 -0500)]
ipv6: fib6_rules: support for match on sport, dport and ip proto
support to match on src port, dst port and ip protocol.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Thu, 1 Mar 2018 03:41:06 +0000 (22:41 -0500)]
ipv4: fib_rules: support match on sport, dport and ip proto
support to match on src port, dst port and ip protocol.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Thu, 1 Mar 2018 03:40:16 +0000 (22:40 -0500)]
net: fib_rules: support for match on ip_proto, sport and dport
uapi for ip_proto, sport and dport range match
in fib rules.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Wed, 28 Feb 2018 19:43:38 +0000 (20:43 +0100)]
r8169: fix interrupt number after adding support for MSI-X interrupts
In case of MSI-X the interrupt number may differ from pcidev->irq.
Fix this by using pci_irq_vector().
Fixes: 6c6aa15fdea5 ("r8169: improve interrupt handling") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 28 Feb 2018 17:34:20 +0000 (12:34 -0500)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2018-02-28
This series contains updates to fm10k only.
Jake provides all the changes in this series, starting with making the
function header comments consistent and to align with how the kernel
documentation expects it. Also cleaned up code comment as well as bump
the driver version.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the nice things about network namespaces is that they allow one
to easily create and test complex environments.
Unfortunately, these namespaces can not be used with actual switching
ASICs, as their ports can not be migrated to other network namespaces
(NETIF_F_NETNS_LOCAL) and most of them probably do not support the
L1-separation provided by namespaces.
However, a similar kind of flexibility can be achieved by using VRFs and
by looping the switch ports together. For example:
The VRFs act as lightweight namespaces representing hosts connected to
the switch.
This approach for testing switch ASICs has several advantages over the
traditional method that requires multiple physical machines, to name a
few:
1. Only the device under test (DUT) is being tested without noise from
other system.
2. Ability to easily provision complex topologies. Testing bridging
between 4-ports LAGs or 8-way ECMP requires many physical links that are
not always available. With the VRF-based approach one merely needs to
loopback more ports.
These tests are written with switch ASICs in mind, but they can be run
on any Linux box using veth pairs to emulate physical loopbacks.
v2:
* Order local variables declaration according to function arguments
order (Petr)
v1:
* Change location to net/forwarding instead of forwarding/
* Add ability to pause on failure
* Add ability to pause on cleanup
* Make configuration file optional
* Make ping/ping6/mz configurable
* Add more tc tests
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 28 Feb 2018 10:25:12 +0000 (12:25 +0200)]
selftests: forwarding: Test IPv6 weighted nexthops
Have one host generate 16K IPv6 echo requests with a random flow label
and check that they are distributed between both multipath links
according to the provided weights.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 28 Feb 2018 10:25:11 +0000 (12:25 +0200)]
selftests: forwarding: Test IPv4 weighted nexthops
Use different weights for the multipath route configured on the first
router and check that the different flows generated by the first host
are distributed according to the provided weights.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 28 Feb 2018 10:25:10 +0000 (12:25 +0200)]
selftests: forwarding: Create test topology for multipath routing
Create a topology with two hosts, each directly connected to a different
router. Both routers are connected using two links, enabling multipath
routing.
Test IPv4 and IPv6 ping using default MTU and large MTU.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 28 Feb 2018 10:25:08 +0000 (12:25 +0200)]
selftests: forwarding: Add a test for flooded traffic
Add test cases for unknown unicast and unregistered multicast flooding.
For each traffic type, turn off flooding on one bridged port and inject
a packet of the specified type through the second bridged port. Make
sure the packet was not received by checking the ACL counters on the
other end. Later, turn on flooding and make sure the packet was
received.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 28 Feb 2018 09:59:27 +0000 (10:59 +0100)]
ipvlan: use per device spinlock to protect addrs list updates
This changeset moves ipvlan address under RCU protection, using
a per ipvlan device spinlock to protect list mutation and RCU
read access to protect list traversal.
Also explicitly use RCU read lock to traverse the per port
ipvlans list, so that we can now perform a full address lookup
without asserting the RTNL lock.
Overall this allows the ipvlan driver to check fully for duplicate
addresses - before this commit ipv6 addresses assigned by autoconf
via prefix delegation where accepted without any check - and avoid
the following rntl assertion failure still in the same code path:
v1 -> v2: drop unneeded in_softirq check in ipvlan_addr6_validator_event()
Fixes: e9997c2938b2 ("ipvlan: fix check for IP addresses in control path") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>