David Ahern [Wed, 7 Dec 2016 20:55:09 +0000 (12:55 -0800)]
Makefile: really suppress printing of directories
Makefile adds --no-print-directory to MAKEFLAGS if VERBOSE is not
defined however Config always defines VERBOSE. Update the check to
whether VERBOSE is 0.
Fixes: 57bdf8b76451 ("Make builds default to quiet mode") Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Daniel Borkmann [Tue, 6 Dec 2016 01:21:57 +0000 (02:21 +0100)]
bpf: add initial support for attaching xdp progs
Now that we made the BPF loader generic as a library, reuse it
for loading XDP programs as well. This basically adds a minimal
start of a facility for iproute2 to load XDP programs. There
currently only exists the xdp1_user.c sample code in the kernel
tree that sets up netlink directly and an iovisor/bcc front-end.
Since we have all the necessary infrastructure in place already
from tc side, we can just reuse its loader back-end and thus
facilitate migration and usability among the two for people
familiar with tc/bpf already. Sharing maps, performing tail calls,
etc works the same way as with tc. Naturally, once kernel
configuration API evolves, we will extend new features for XDP
here as well, resp. extend dumping of related netlink attributes.
Minimal example:
clang -target bpf -O2 -Wall -c prog.c -o prog.o
ip [-force] link set dev em1 xdp obj prog.o # attaching
ip [-d] link # dumping
ip link set dev em1 xdp off # detaching
For the dump, intention is that in the first line for each ip
link entry, we'll see "xdp" to indicate that this device has an
XDP program attached. Once we dump some more useful information
via netlink (digest, etc), idea is that 'ip -d link' will then
display additional relevant program information below the "link/
ether [...]" output line for such devices, for example.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Tue, 6 Dec 2016 01:17:58 +0000 (02:17 +0100)]
bpf: check for owner_prog_type and notify users when differ
Kernel commit 21116b7068b9 ("bpf: add owner_prog_type and accounted mem
to array map's fdinfo") added support for telling the owner prog type in
case of prog arrays. Give a notification to the user when they differ,
and the program eventually fails to load.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
Thomas Graf [Wed, 7 Dec 2016 09:47:59 +0000 (10:47 +0100)]
bpf: Fix number of retries when growing log buffer
The log buffer is automatically grown when the verifier output does not
fit into the default buffer size. The number of growing attempts was
not sufficient to reach the maximum buffer size so far.
Perform 9 iterations to reach max and let the 10th one fail.
Simon Horman [Sat, 3 Dec 2016 08:52:40 +0000 (09:52 +0100)]
tc: flower: make use of flower_port_attr_type() safe and silent
Make use of flower_port_attr_type() safe:
* flower_port_attr_type() may return a valid index into tb[] or -1.
Only access tb[] in the case of the former.
* Do not access null entries in tb[]
Also make usage silent - it is valid for ip_proto to be invalid,
for example if it is not specified as part of the filter.
Fixes: a1fb0d484237 ("tc: flower: Support matching on SCTP ports") Signed-off-by: Simon Horman <simon.horman@netronome.com>
Simon Horman [Fri, 2 Dec 2016 22:59:43 +0000 (14:59 -0800)]
tc: flower: remove references to eth_type in manpage
Remove references to eth_type and ether_type (spelling error) in
the tc flower manpage.
Also correct formatting of boldface text with whitespace.
Cc: Paul Blakey <paulb@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Simon Horman [Fri, 2 Dec 2016 11:56:05 +0000 (12:56 +0100)]
ss: initialise variables outside of for loop
Initialise for loops outside of for loops. GCC flags this as being
out of spec unless C99 or C11 mode is used.
With this change the entire tree appears to compile cleanly with -Wall.
$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2
...
$ make
...
ss.c: In function ‘unix_show_sock’:
ss.c:3128:4: error: ‘for’ loop initial declarations are only allowed in C99 or C11 mode
...
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Amir Vadai [Fri, 2 Dec 2016 11:25:15 +0000 (13:25 +0200)]
tc/act_tunnel: Introduce ip tunnel action
This action could be used before redirecting packets to a shared tunnel
device, or when redirecting packets arriving from a such a device.
The 'unset' action is optional. It is used to explicitly unset the
metadata created by the tunnel device during decap. If not used, the
metadata will be released automatically by the kernel.
The 'set' operation, will set the metadata with the specified values for
the encap.
For example, the following flower filter will forward all ICMP packets
destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
redirecting, a metadata for the vxlan tunnel is created using the
tunnel_key action and it's arguments:
$ tc filter add dev net0 protocol ip parent ffff: \
flower \
ip_proto 1 \
dst_ip 11.11.11.2 \
action tunnel_key set \
src_ip 11.11.0.1 \
dst_ip 11.11.0.2 \
id 11 \
action mirred egress redirect dev vxlan0
Amir Vadai [Fri, 2 Dec 2016 11:25:14 +0000 (13:25 +0200)]
tc/cls_flower: Classify packet in ip tunnels
Introduce classifying by metadata extracted by the tunnel device.
Outer header fields - source/dest ip and tunnel id, are extracted from
the metadata when classifying.
For example, the following will add a filter on the ingress Qdisc of shared
vxlan device named 'vxlan0'. To forward packets with outer src ip
11.11.0.2, dst ip 11.11.0.1 and tunnel id 11. The packets will be
forwarded to tap device 'vnet0':
Phil Sutter [Fri, 2 Dec 2016 10:39:51 +0000 (11:39 +0100)]
ss: Eliminate unix_use_proc()
This function is used only at a single place anymore, so replace the
call to it by it's content, which makes that specific part of
unix_show() consistent with e.g. tcp_show().
Phil Sutter [Fri, 2 Dec 2016 10:39:50 +0000 (11:39 +0100)]
ss: Drop list traversal from unix_stats_print()
Although this complicates the dedicated procfs-based code path in
unix_show() a bit, it's the only sane way to get rid of unix_show_sock()
output diverging from other socket types in that it prints all socket
details in a new line.
As a side effect, it allows to eliminate all procfs specific code in
the same function.
Phil Sutter [Fri, 2 Dec 2016 10:39:49 +0000 (11:39 +0100)]
ss: introduce proc_ctx_print()
This consolidates identical code in three places. While the function
name is not quite perfect as there is different proc_ctx printing code
in netlink_show_one() as well, I sadly didn't find a more suitable one.
Phil Sutter [Fri, 2 Dec 2016 10:39:48 +0000 (11:39 +0100)]
ss: Use sockstat->type in all socket types
Unix sockets used that field already to hold info about the socket type.
By replicating this approach in all other socket types, we can get rid
of protocol parameter in inet_stats_print() and have sock_state_print()
figure things out by itself.
Phil Sutter [Fri, 2 Dec 2016 10:39:47 +0000 (11:39 +0100)]
ss: Add missing tab when printing UNIX details
When dumping UNIX sockets and show_details is active but not show_mem
(ss -xne), the socket details are printed without being prefixed by tab.
Fix this by printing the tab character when either one of '-e' or '-m'
has been specified.
Phil Sutter [Fri, 2 Dec 2016 10:39:46 +0000 (11:39 +0100)]
ss: Drop empty lines in UDP output
When dumping UDP sockets and show_tcpinfo (-i) is active but not
show_mem (-m), print_tcpinfo() does not output anything leading to an
empty line being printed after every socket. Fix this by skipping the
call to print_tcpinfo() and the previous newline printing in that case.
Cyrill Gorcunov [Wed, 2 Nov 2016 13:14:55 +0000 (16:14 +0300)]
libnetlink: Add test for error code returned from netlink reply
In case if some diag module is not present in the system,
say the kernel is not modern enough, we simply skip the
error code reported. Instead we should check for data
length in NLMSG_DONE and process unsupported case.
l2tp: read IPv6 UDP checksum attributes from kernel
In case of an older kernel that doesn't set L2TP_ATTR_UDP_ZERO_CSUM6_{RX,TX}
the old hard-coded value is being preserved, since the attribute flag will be
missing.
L2TP_ATTR_UDP_CSUM is read by the kernel as a NLA_FLAG value,
but is validated as a NLA_U8, so we will write it as an u8,
but the value isn't actually being read by the kernel.
It is written by the kernel as a NLA_U8, so we will read as
such.
Zhang Shengju [Sat, 19 Nov 2016 15:50:13 +0000 (23:50 +0800)]
libnetlink: reduce size of message sent to kernel
Fixes commit 246f57c4086d99fa ("ip link: Add support for kernel
side filtering").
This patch reduce the size of message sent to kernel space. Before this
patch, for command: 'ip link show', we will sent 1056 bytes. With this
patch, we only need to send 40 bytes.
Zhang Shengju [Fri, 18 Nov 2016 01:12:53 +0000 (09:12 +0800)]
iproute2: fix the link group name getting error
In the situation where more than one entry live in the same hash bucket,
loop to get the correct one.
Before:
$ cat /etc/iproute2/group
0 default
256 test
$ sudo ip link set group test dummy1
$ ip link show type dummy
11: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group 0 qlen 1000
link/ether 4e:3b:d3:6c:f0:e6 brd ff:ff:ff:ff:ff:ff
12: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group test qlen 1000
link/ether d6:9c:a4:1f:e7:e5 brd ff:ff:ff:ff:ff:ff
After:
$ ip link show type dummy
11: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 4e:3b:d3:6c:f0:e6 brd ff:ff:ff:ff:ff:ff
12: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group test qlen 1000
link/ether d6:9c:a4:1f:e7:e5 brd ff:ff:ff:ff:ff:ff
Adjusting iproute2 utility to support new macvlan link type mode called
"source".
Example of commands that can be applied:
ip link add link eth0 name macvlan0 type macvlan mode source
ip link set link dev macvlan0 type macvlan macaddr add 00:11:11:11:11:11
ip link set link dev macvlan0 type macvlan macaddr del 00:11:11:11:11:11
ip link set link dev macvlan0 type macvlan macaddr flush
ip -details link show dev macvlan0
Based on previous work of Stefan Gula <steweg@gmail.com>
Signed-off-by: Michael Braun <michael-dev@fami-braun.de> Cc: steweg@gmail.com
v5:
- rebase and fix checkpatch
v4:
- add MACADDR_SET support
- skip FLAG_UNICAST / FLAG_UNICAST_ALL as this is not upstream
- fix man page
Daniel Borkmann [Thu, 10 Nov 2016 00:20:59 +0000 (01:20 +0100)]
bpf: make tc's bpf loader generic and move into lib
This work moves the bpf loader into the iproute2 library and reworks
the tc specific parts into generic code. It's useful as we can then
more easily support new program types by just having the same ELF
loader backend. Joint work with Thomas Graf. I hacked a rough start
of a test suite to make sure nothing breaks [1] and looks all good.
Phil Sutter [Tue, 8 Nov 2016 21:29:11 +0000 (22:29 +0100)]
ipaddress: Simplify vf_info parsing
Commit 7b8179c780a1a ("iproute2: Add new command to ip link to
enable/disable VF spoof check") tried to add support for
IFLA_VF_SPOOFCHK in a backwards-compatible manner, but aparently overdid
it: parse_rtattr_nested() handles missing attributes perfectly fine in
that it will leave the relevant field unassigned so calling code can
just compare against NULL. There is no need to layback from the previous
(IFLA_VF_TX_RATE) attribute to the next to check if IFLA_VF_SPOOFCHK is
present or not. To the contrary, it establishes a potentially incorrect
assumption of these two attributes directly following each other which
may not be the case (although up to now, kernel aligns them this way).
This patch cleans up the code to adhere to the common way of checking
for attribute existence. It has been tested to return correct results
regardless of whether the kernel exports IFLA_VF_SPOOFCHK or not.
Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Greg Rose <grose@lightfleet.com>
Paul Blakey [Wed, 2 Nov 2016 15:09:58 +0000 (17:09 +0200)]
tc: flower: Fix usage message
Remove left over usage from removal of eth_type argument.
Fixes: 488b41d020fb ('tc: flower no need to specify the ethertype') Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com>
Isaac Boukris [Sat, 29 Oct 2016 19:20:19 +0000 (22:20 +0300)]
iproute2: ss: escape all null bytes in abstract unix domain socket
Abstract unix domain socket may embed null characters,
these should be translated to '@' when printed by ss the
same way the null prefix is currently being translated.
Daniel Borkmann [Tue, 18 Oct 2016 12:13:09 +0000 (14:13 +0200)]
tc, ipt: don't enforce iproute2 dependency on iptables-devel
Since 5cd1adba79d3 ("Update to current iptables headers") compilation
of iproute2 broke for systems without iptables-devel package [1].
Reason is that even though we fall back to build m_ipt.c, the include
depends on a xtables-version.h header, which only ships with
iptables-devel. Machines not having this package fail compilation with:
[...]
CC m_ipt.o
In file included from ../include/iptables.h:5:0,
from m_ipt.c:17:
../include/xtables.h:34:29: fatal error: xtables-version.h: No such file or directory
compilation terminated.
../Config:31: recipe for target 'm_ipt.o' failed
make[1]: *** [m_ipt.o] Error 1
The configure script only barks that package xtables was not found in
the pkg-config search path. The generated Config then only contains f.e.
TC_CONFIG_IPSET. In tc's Makefile we thus fall back to adding m_ipt.o
to TCMODULES. m_ipt.c then includes the local include/iptables.h header
copy, which includes the include/xtables.h copy. Latter then includes
xtables-version.h, which only ships with iptables-devel.
One way to resolve this is to skip this whole mess when pkg-config has
no xtables config available. I've carried something along these lines
locally for a while now, but it's just too annyoing. :/ Build works fine
now also when xtables.pc is not available.
Hangbin Liu [Sun, 9 Oct 2016 02:14:18 +0000 (10:14 +0800)]
devlink: Convert conditional in dl_argv_handle_port() to switch()
Discovered by Phil's covscan. The final return statement is never reached.
This is not inherently clear from looking at the code, so change the
conditional to a switch() statement which should clarify this.
CC: Phil Sutter <phil@nwl.cc> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Phil Sutter <phil@nwl.cc>