Phil Sutter [Thu, 12 Jan 2017 14:22:49 +0000 (15:22 +0100)]
tc: m_xt: Fix segfault with iptables-1.6.0
Said iptables version introduced struct xtables_globals field
'compat_rev', a function pointer. Initializing it is mandatory as
libxtables calls it without existence check.
Without this, tc segfaults when using the xt action like so:
| tc filter add dev d0 parent ffff: u32 match u32 0 0 \
| action xt -j MARK --set-mark 20
David Ahern [Mon, 9 Jan 2017 23:43:09 +0000 (15:43 -0800)]
Add support for rt_protos.d
Add support for reading proto id/name mappings from rt_protos.d
directory. Allows users to have custom protocol values converted
to human friendly names.
Each file under rt_protos.d has the 'id name' format used by
rt_protos. Only .conf files are read and parsed.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Baruch Siach [Thu, 22 Dec 2016 18:52:48 +0000 (20:52 +0200)]
tc: add missing limits.h header
This fixes under musl build issues like:
f_matchall.c: In function ‘matchall_parse_opt’:
f_matchall.c:48:12: error: ‘LONG_MIN’ undeclared (first use in this function)
if (h == LONG_MIN || h == LONG_MAX) {
^
f_matchall.c:48:12: note: each undeclared identifier is reported only once for each function it appears in
f_matchall.c:48:29: error: ‘LONG_MAX’ undeclared (first use in this function)
if (h == LONG_MIN || h == LONG_MAX) {
^
Hadar Hen Zion [Thu, 22 Dec 2016 08:14:41 +0000 (10:14 +0200)]
tc/m_tunnel_key: Add to the usage encapsulation dest UDP port
tunnel key set parameters includes also dest UDP port, add it to the
usage.
Fixes: 449c709c3868 ("tc/m_tunnel_key: Add dest UDP port to tunnel key action") Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reported-by: Simon Horman <simon.horman@netronome.com>
Hadar Hen Zion [Thu, 22 Dec 2016 08:14:40 +0000 (10:14 +0200)]
tc/cls_flower: Add to the usage encapsulation dest UDP port
Encapsulation dest UDP port is part of the classifier matching
parameters, add it to the usage.
Fixes: 41aa17ff4668 ("tc/cls_flower: Add dest UDP port to tunnel params") Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reported-by: Simon Horman <simon.horman@netronome.com>
Simon Horman [Fri, 16 Dec 2016 13:54:37 +0000 (14:54 +0100)]
tc: flower: Allow *_mac options to accept a mask
* The argument to src_mac and dst_mac may now take an optional mask
to limit the scope of matching.
* This address is is documented as a LLADDR in keeping with ip-link(8).
* The formats accepted match those already output when dumping flower
filters from the kernel.
Example of use of LLADDR with and without a mask:
tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
src_mac 52:54:01:00:00:00/ff:ff:00:00:00:01 action drop
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
src_mac 52:54:00:00:00:00/23 action drop
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
src_mac 52:54:00:00:00:00 action drop
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Simon Horman [Fri, 16 Dec 2016 13:54:36 +0000 (14:54 +0100)]
tc: flower: document that *_ip parameters take a PREFIX as an argument.
* The argument to src_ip, dst_ip, enc_src_ip and enc_dst_ip take an
optional prefix length which is used to provide a mask to limit the scope
of matching.
* This is documented as a PREFIX in keeping with ip-route(8).
Example of uses of IPv4 and IPv6 prefixes
tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower \
indev eth0 dst_ip 192.168.1.1 action drop
tc filter add dev eth0 protocol ip parent ffff: flower \
indev eth0 src_ip 10.0.0.0/8 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
indev eth0 src_ip 2001:DB8:1::/48 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
indev eth0 dst_ip 2001:DB8::1 action drop
Signed-off-by: Simon Horman <simon.horman@netronome.com>
David Ahern [Thu, 15 Dec 2016 20:07:01 +0000 (12:07 -0800)]
ip vrf: Fix reset to default VRF
Path in vrf_switch for "default" VRF is supposed to be MNT/vrf not
MNT/default. Also, default_vrf flag is redundant with ifindex. Remove
the flag in favor of ifindex != 0.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
David Ahern [Thu, 15 Dec 2016 20:07:00 +0000 (12:07 -0800)]
ip vrf: Refactor ipvrf_identify
Split ipvrf_identify into arg processing and a function that does the
actual cgroup file parsing. The latter function is used in a follow
on patch.
In the process, convert the reading of the cgroups file to use fopen
and fgets just in case the file ever grows beyond 4k. Move printing
of any error message and the vrf name to the caller of the new
vrf_identify.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
David Ahern [Tue, 13 Dec 2016 23:34:32 +0000 (15:34 -0800)]
Fix compile warning in get_addr_1
A recent cleanup causes a compile warning on Debian jessie:
CC utils.o
utils.c: In function ‘get_addr_1’:
utils.c:486:21: warning: passing argument 1 of ‘ll_addr_a2n’ from incompatible pointer type
len = ll_addr_a2n(&addr->data, sizeof(addr->data), name);
^
In file included from utils.c:34:0:
../include/rt_names.h:27:5: note: expected ‘char *’ but argument is of type ‘__u32 (*)[8]’
int ll_addr_a2n(char *lladdr, int len, const char *arg);
^
Revert the removal of the typecast
Fixes: e1933b928125 ("utils: cleanup style") Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
David Ahern [Mon, 12 Dec 2016 00:53:15 +0000 (16:53 -0800)]
Introduce ip vrf command
'ip vrf' follows the user semnatics established by 'ip netns'.
The 'ip vrf' subcommand supports 3 usages:
1. Run a command against a given vrf:
ip vrf exec NAME CMD
Uses the recently committed cgroup/sock BPF option. vrf directory
is added to cgroup2 mount. Individual vrfs are created under it. BPF
filter attached to vrf/NAME cgroup2 to set sk_bound_dev_if to the VRF
device index. From there the current process (ip's pid) is addded to
the cgroups.proc file and the given command is exected. In doing so
all AF_INET/AF_INET6 (ipv4/ipv6) sockets are automatically bound to
the VRF domain.
The association is inherited parent to child allowing the command to
be a shell from which other commands are run relative to the VRF.
2. Show the VRF a process is bound to:
ip vrf id
This command essentially looks at /proc/pid/cgroup for a "::/vrf/"
entry with the VRF name following.
3. Show process ids bound to a VRF
ip vrf pids NAME
This command dumps the file MNT/vrf/NAME/cgroup.procs since that file
shows the process ids in the particular vrf cgroup.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
David Ahern [Mon, 12 Dec 2016 00:53:14 +0000 (16:53 -0800)]
libnetlink: Add variant of rtnl_talk that does not display RTNETLINK answers error
iplink_vrf has 2 functions used to validate a user given device name is
a VRF device and to return the table id. If the user string is not a
device name ip commands with a vrf keyword show a confusing error
message: "RTNETLINK answers: No such device".
Add a variant of rtnl_talk that does not display the "RTNETLINK answers"
message and update iplink_vrf to use it.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
David Ahern [Mon, 12 Dec 2016 00:53:12 +0000 (16:53 -0800)]
Add filesystem APIs to lib
Add make_path to recursively call mkdir as needed to create a given
path with the given mode.
Add find_cgroup2_mount to lookup path where cgroup2 is mounted. If it
is not already mounted, cgroup2 is mounted under /var/run/cgroup2 for
use by iproute2.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
David Ahern [Wed, 7 Dec 2016 20:55:09 +0000 (12:55 -0800)]
Makefile: really suppress printing of directories
Makefile adds --no-print-directory to MAKEFLAGS if VERBOSE is not
defined however Config always defines VERBOSE. Update the check to
whether VERBOSE is 0.
Fixes: 57bdf8b76451 ("Make builds default to quiet mode") Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Daniel Borkmann [Tue, 6 Dec 2016 01:21:57 +0000 (02:21 +0100)]
bpf: add initial support for attaching xdp progs
Now that we made the BPF loader generic as a library, reuse it
for loading XDP programs as well. This basically adds a minimal
start of a facility for iproute2 to load XDP programs. There
currently only exists the xdp1_user.c sample code in the kernel
tree that sets up netlink directly and an iovisor/bcc front-end.
Since we have all the necessary infrastructure in place already
from tc side, we can just reuse its loader back-end and thus
facilitate migration and usability among the two for people
familiar with tc/bpf already. Sharing maps, performing tail calls,
etc works the same way as with tc. Naturally, once kernel
configuration API evolves, we will extend new features for XDP
here as well, resp. extend dumping of related netlink attributes.
Minimal example:
clang -target bpf -O2 -Wall -c prog.c -o prog.o
ip [-force] link set dev em1 xdp obj prog.o # attaching
ip [-d] link # dumping
ip link set dev em1 xdp off # detaching
For the dump, intention is that in the first line for each ip
link entry, we'll see "xdp" to indicate that this device has an
XDP program attached. Once we dump some more useful information
via netlink (digest, etc), idea is that 'ip -d link' will then
display additional relevant program information below the "link/
ether [...]" output line for such devices, for example.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Tue, 6 Dec 2016 01:17:58 +0000 (02:17 +0100)]
bpf: check for owner_prog_type and notify users when differ
Kernel commit 21116b7068b9 ("bpf: add owner_prog_type and accounted mem
to array map's fdinfo") added support for telling the owner prog type in
case of prog arrays. Give a notification to the user when they differ,
and the program eventually fails to load.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
Thomas Graf [Wed, 7 Dec 2016 09:47:59 +0000 (10:47 +0100)]
bpf: Fix number of retries when growing log buffer
The log buffer is automatically grown when the verifier output does not
fit into the default buffer size. The number of growing attempts was
not sufficient to reach the maximum buffer size so far.
Perform 9 iterations to reach max and let the 10th one fail.
Simon Horman [Sat, 3 Dec 2016 08:52:40 +0000 (09:52 +0100)]
tc: flower: make use of flower_port_attr_type() safe and silent
Make use of flower_port_attr_type() safe:
* flower_port_attr_type() may return a valid index into tb[] or -1.
Only access tb[] in the case of the former.
* Do not access null entries in tb[]
Also make usage silent - it is valid for ip_proto to be invalid,
for example if it is not specified as part of the filter.
Fixes: a1fb0d484237 ("tc: flower: Support matching on SCTP ports") Signed-off-by: Simon Horman <simon.horman@netronome.com>
Simon Horman [Fri, 2 Dec 2016 22:59:43 +0000 (14:59 -0800)]
tc: flower: remove references to eth_type in manpage
Remove references to eth_type and ether_type (spelling error) in
the tc flower manpage.
Also correct formatting of boldface text with whitespace.
Cc: Paul Blakey <paulb@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Simon Horman [Fri, 2 Dec 2016 11:56:05 +0000 (12:56 +0100)]
ss: initialise variables outside of for loop
Initialise for loops outside of for loops. GCC flags this as being
out of spec unless C99 or C11 mode is used.
With this change the entire tree appears to compile cleanly with -Wall.
$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2
...
$ make
...
ss.c: In function ‘unix_show_sock’:
ss.c:3128:4: error: ‘for’ loop initial declarations are only allowed in C99 or C11 mode
...
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Amir Vadai [Fri, 2 Dec 2016 11:25:15 +0000 (13:25 +0200)]
tc/act_tunnel: Introduce ip tunnel action
This action could be used before redirecting packets to a shared tunnel
device, or when redirecting packets arriving from a such a device.
The 'unset' action is optional. It is used to explicitly unset the
metadata created by the tunnel device during decap. If not used, the
metadata will be released automatically by the kernel.
The 'set' operation, will set the metadata with the specified values for
the encap.
For example, the following flower filter will forward all ICMP packets
destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
redirecting, a metadata for the vxlan tunnel is created using the
tunnel_key action and it's arguments:
$ tc filter add dev net0 protocol ip parent ffff: \
flower \
ip_proto 1 \
dst_ip 11.11.11.2 \
action tunnel_key set \
src_ip 11.11.0.1 \
dst_ip 11.11.0.2 \
id 11 \
action mirred egress redirect dev vxlan0
Amir Vadai [Fri, 2 Dec 2016 11:25:14 +0000 (13:25 +0200)]
tc/cls_flower: Classify packet in ip tunnels
Introduce classifying by metadata extracted by the tunnel device.
Outer header fields - source/dest ip and tunnel id, are extracted from
the metadata when classifying.
For example, the following will add a filter on the ingress Qdisc of shared
vxlan device named 'vxlan0'. To forward packets with outer src ip
11.11.0.2, dst ip 11.11.0.1 and tunnel id 11. The packets will be
forwarded to tap device 'vnet0':
Phil Sutter [Fri, 2 Dec 2016 10:39:51 +0000 (11:39 +0100)]
ss: Eliminate unix_use_proc()
This function is used only at a single place anymore, so replace the
call to it by it's content, which makes that specific part of
unix_show() consistent with e.g. tcp_show().
Phil Sutter [Fri, 2 Dec 2016 10:39:50 +0000 (11:39 +0100)]
ss: Drop list traversal from unix_stats_print()
Although this complicates the dedicated procfs-based code path in
unix_show() a bit, it's the only sane way to get rid of unix_show_sock()
output diverging from other socket types in that it prints all socket
details in a new line.
As a side effect, it allows to eliminate all procfs specific code in
the same function.
Phil Sutter [Fri, 2 Dec 2016 10:39:49 +0000 (11:39 +0100)]
ss: introduce proc_ctx_print()
This consolidates identical code in three places. While the function
name is not quite perfect as there is different proc_ctx printing code
in netlink_show_one() as well, I sadly didn't find a more suitable one.
Phil Sutter [Fri, 2 Dec 2016 10:39:48 +0000 (11:39 +0100)]
ss: Use sockstat->type in all socket types
Unix sockets used that field already to hold info about the socket type.
By replicating this approach in all other socket types, we can get rid
of protocol parameter in inet_stats_print() and have sock_state_print()
figure things out by itself.
Phil Sutter [Fri, 2 Dec 2016 10:39:47 +0000 (11:39 +0100)]
ss: Add missing tab when printing UNIX details
When dumping UNIX sockets and show_details is active but not show_mem
(ss -xne), the socket details are printed without being prefixed by tab.
Fix this by printing the tab character when either one of '-e' or '-m'
has been specified.
Phil Sutter [Fri, 2 Dec 2016 10:39:46 +0000 (11:39 +0100)]
ss: Drop empty lines in UDP output
When dumping UDP sockets and show_tcpinfo (-i) is active but not
show_mem (-m), print_tcpinfo() does not output anything leading to an
empty line being printed after every socket. Fix this by skipping the
call to print_tcpinfo() and the previous newline printing in that case.
Cyrill Gorcunov [Wed, 2 Nov 2016 13:14:55 +0000 (16:14 +0300)]
libnetlink: Add test for error code returned from netlink reply
In case if some diag module is not present in the system,
say the kernel is not modern enough, we simply skip the
error code reported. Instead we should check for data
length in NLMSG_DONE and process unsupported case.