When using the tc flower filter, rules marked with "protocol all" do not
actually match all packets. This is due to a bug in f_flower.c that passes
in ETH_P_ALL in the TCA_FLOWER_KEY_ETH_TYPE attribute when adding a rule.
Fix this by omitting TCA_FLOWER_KEY_ETH_TYPE if the protocol is set to
ETH_P_ALL.
Fixes: 488b41d020fb ("tc: flower no need to specify the ethertype") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: Roi Dayan <roid@mellanox.com>
This patch adds a new field that is printed in the end of the line which
denotes the real entry state. Before this patch an entry's IIF could
disappear and it would look like an unresolved one (iif = unresolved):
(3.0.16.1, 225.11.16.1) Iif: unresolved
with no way to really distinguish it from an unresolved entry.
After the patch if the dumped entry has RTNH_F_UNRESOLVED set we get:
(3.0.16.1, 225.11.16.1) Iif: unresolved State: unresolved
for resolved entries after the OIF list. Note that "State:" has ':' in
it so it cannot be mistaken for an interface name.
And for the example above, we'd get:
(0.0.0.0, 225.11.11.11) Iif: unresolved State: resolved
Also when dumping all routes via ip route show table all,
it will show up as:
multicast 225.11.16.1/32 from 3.0.16.1/32 table default proto 17 unresolved
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
David Ahern [Thu, 19 Jan 2017 17:08:21 +0000 (09:08 -0800)]
ip route: error out on multiple via without nexthop keyword
To specify multiple nexthops in a route the user is expected to use the
"nexthop" keyword which ip route uses to create the RTA_MULTIPATH.
However, ip route always accepts multiple 'via' keywords where only the
last one is used in the route leading to confusion. For example, ip
accepts this syntax:
$ ip ro add vrf red 1.1.1.0/24 via 10.100.1.18 via 10.100.2.18
but the route entered inserted by the kernel is just the last gateway:
1.1.1.0/24 via 10.100.2.18 dev eth2
which is not the full request from the user. Detect the presense of
multiple 'via' and give the user a hint to add nexthop:
$ ip ro add vrf red 1.1.1.0/24 via 10.100.1.18 via 10.100.2.18
Error: argument "via" is wrong: use nexthop syntax to specify multiple via
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Roi Dayan [Thu, 19 Jan 2017 12:31:20 +0000 (14:31 +0200)]
tc: flower: Fix incorrect error msg about eth type
addattr16 may return an error about the nl msg size
but not about incorrect eth type.
Fixes: 488b41d020fb ("tc: flower no need to specify the ethertype") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com>
Petr Vorel [Mon, 16 Jan 2017 23:25:50 +0000 (00:25 +0100)]
ip: fix igmp parsing when iface is long
Entries with long vhost names in /proc/net/igmp have no whitespace
between name and colon, so sscanf() adds it to vhost and
'ip maddr show iface' doesn't include inet result.
Simon Horman [Wed, 11 Jan 2017 13:10:16 +0000 (14:10 +0100)]
tc: ife: correct spelling of prio in example
Correct typo in example in ife man page.
Fixes: 06f9a59170c0 ("man: tc-ife.8: man page for ife action") Cc: Lucas Bates <lucasb@mojatatu.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
This patch adds a new argument to the bridge fdb show command that allows
to filter by entry state.
Also update the man page to include all available show arguments.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Phil Sutter [Thu, 12 Jan 2017 14:22:49 +0000 (15:22 +0100)]
tc: m_xt: Fix segfault with iptables-1.6.0
Said iptables version introduced struct xtables_globals field
'compat_rev', a function pointer. Initializing it is mandatory as
libxtables calls it without existence check.
Without this, tc segfaults when using the xt action like so:
| tc filter add dev d0 parent ffff: u32 match u32 0 0 \
| action xt -j MARK --set-mark 20
David Ahern [Mon, 9 Jan 2017 23:43:09 +0000 (15:43 -0800)]
Add support for rt_protos.d
Add support for reading proto id/name mappings from rt_protos.d
directory. Allows users to have custom protocol values converted
to human friendly names.
Each file under rt_protos.d has the 'id name' format used by
rt_protos. Only .conf files are read and parsed.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Simon Horman [Wed, 4 Jan 2017 11:02:18 +0000 (12:02 +0100)]
tc: flower: Update dest UDP port documentation
Since 41aa17ff4668 ("tc/cls_flower: Add dest UDP port to tunnel params")
tc flower supports setting the dest UDP port.
* Use "port_number" to be consistent with other man-page text
* Re-add "enc_dst_port" documentation to manpage which was
accidently removed by b2a1f740aa4d ("tc: flower: document that *_ip
parameters take a PREFIX as an argument.")
Cc: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Paul Blakey [Thu, 29 Dec 2016 18:42:08 +0000 (10:42 -0800)]
tc: flower: support matching flags
Enhance flower to support matching on flags.
The 1st flag allows to match on whether the packet is
an IP fragment.
Example:
# add a flower filter that will drop fragmented packets
# (bit 0 of control flags)
tc filter add dev ens4f0 protocol ip parent ffff: \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
matching_flags 0x1/0x1 \
action drop
Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com>
Baruch Siach [Thu, 22 Dec 2016 18:52:48 +0000 (20:52 +0200)]
tc: add missing limits.h header
This fixes under musl build issues like:
f_matchall.c: In function ‘matchall_parse_opt’:
f_matchall.c:48:12: error: ‘LONG_MIN’ undeclared (first use in this function)
if (h == LONG_MIN || h == LONG_MAX) {
^
f_matchall.c:48:12: note: each undeclared identifier is reported only once for each function it appears in
f_matchall.c:48:29: error: ‘LONG_MAX’ undeclared (first use in this function)
if (h == LONG_MIN || h == LONG_MAX) {
^
Hadar Hen Zion [Thu, 22 Dec 2016 08:14:41 +0000 (10:14 +0200)]
tc/m_tunnel_key: Add to the usage encapsulation dest UDP port
tunnel key set parameters includes also dest UDP port, add it to the
usage.
Fixes: 449c709c3868 ("tc/m_tunnel_key: Add dest UDP port to tunnel key action") Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reported-by: Simon Horman <simon.horman@netronome.com>
Hadar Hen Zion [Thu, 22 Dec 2016 08:14:40 +0000 (10:14 +0200)]
tc/cls_flower: Add to the usage encapsulation dest UDP port
Encapsulation dest UDP port is part of the classifier matching
parameters, add it to the usage.
Fixes: 41aa17ff4668 ("tc/cls_flower: Add dest UDP port to tunnel params") Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reported-by: Simon Horman <simon.horman@netronome.com>
Simon Horman [Fri, 16 Dec 2016 13:54:37 +0000 (14:54 +0100)]
tc: flower: Allow *_mac options to accept a mask
* The argument to src_mac and dst_mac may now take an optional mask
to limit the scope of matching.
* This address is is documented as a LLADDR in keeping with ip-link(8).
* The formats accepted match those already output when dumping flower
filters from the kernel.
Example of use of LLADDR with and without a mask:
tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
src_mac 52:54:01:00:00:00/ff:ff:00:00:00:01 action drop
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
src_mac 52:54:00:00:00:00/23 action drop
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
src_mac 52:54:00:00:00:00 action drop
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Simon Horman [Fri, 16 Dec 2016 13:54:36 +0000 (14:54 +0100)]
tc: flower: document that *_ip parameters take a PREFIX as an argument.
* The argument to src_ip, dst_ip, enc_src_ip and enc_dst_ip take an
optional prefix length which is used to provide a mask to limit the scope
of matching.
* This is documented as a PREFIX in keeping with ip-route(8).
Example of uses of IPv4 and IPv6 prefixes
tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower \
indev eth0 dst_ip 192.168.1.1 action drop
tc filter add dev eth0 protocol ip parent ffff: flower \
indev eth0 src_ip 10.0.0.0/8 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
indev eth0 src_ip 2001:DB8:1::/48 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
indev eth0 dst_ip 2001:DB8::1 action drop
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Simon Horman [Fri, 16 Dec 2016 13:54:37 +0000 (14:54 +0100)]
tc: flower: Allow *_mac options to accept a mask
* The argument to src_mac and dst_mac may now take an optional mask
to limit the scope of matching.
* This address is is documented as a LLADDR in keeping with ip-link(8).
* The formats accepted match those already output when dumping flower
filters from the kernel.
Example of use of LLADDR with and without a mask:
tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
src_mac 52:54:01:00:00:00/ff:ff:00:00:00:01 action drop
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
src_mac 52:54:00:00:00:00/23 action drop
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
src_mac 52:54:00:00:00:00 action drop
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Simon Horman [Fri, 16 Dec 2016 13:54:36 +0000 (14:54 +0100)]
tc: flower: document that *_ip parameters take a PREFIX as an argument.
* The argument to src_ip, dst_ip, enc_src_ip and enc_dst_ip take an
optional prefix length which is used to provide a mask to limit the scope
of matching.
* This is documented as a PREFIX in keeping with ip-route(8).
Example of uses of IPv4 and IPv6 prefixes
tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower \
indev eth0 dst_ip 192.168.1.1 action drop
tc filter add dev eth0 protocol ip parent ffff: flower \
indev eth0 src_ip 10.0.0.0/8 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
indev eth0 src_ip 2001:DB8:1::/48 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
indev eth0 dst_ip 2001:DB8::1 action drop
Signed-off-by: Simon Horman <simon.horman@netronome.com>
David Ahern [Thu, 15 Dec 2016 20:07:01 +0000 (12:07 -0800)]
ip vrf: Fix reset to default VRF
Path in vrf_switch for "default" VRF is supposed to be MNT/vrf not
MNT/default. Also, default_vrf flag is redundant with ifindex. Remove
the flag in favor of ifindex != 0.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
David Ahern [Thu, 15 Dec 2016 20:07:00 +0000 (12:07 -0800)]
ip vrf: Refactor ipvrf_identify
Split ipvrf_identify into arg processing and a function that does the
actual cgroup file parsing. The latter function is used in a follow
on patch.
In the process, convert the reading of the cgroups file to use fopen
and fgets just in case the file ever grows beyond 4k. Move printing
of any error message and the vrf name to the caller of the new
vrf_identify.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
David Ahern [Tue, 13 Dec 2016 23:34:32 +0000 (15:34 -0800)]
Fix compile warning in get_addr_1
A recent cleanup causes a compile warning on Debian jessie:
CC utils.o
utils.c: In function ‘get_addr_1’:
utils.c:486:21: warning: passing argument 1 of ‘ll_addr_a2n’ from incompatible pointer type
len = ll_addr_a2n(&addr->data, sizeof(addr->data), name);
^
In file included from utils.c:34:0:
../include/rt_names.h:27:5: note: expected ‘char *’ but argument is of type ‘__u32 (*)[8]’
int ll_addr_a2n(char *lladdr, int len, const char *arg);
^
Revert the removal of the typecast
Fixes: e1933b928125 ("utils: cleanup style") Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
David Ahern [Mon, 12 Dec 2016 00:53:15 +0000 (16:53 -0800)]
Introduce ip vrf command
'ip vrf' follows the user semnatics established by 'ip netns'.
The 'ip vrf' subcommand supports 3 usages:
1. Run a command against a given vrf:
ip vrf exec NAME CMD
Uses the recently committed cgroup/sock BPF option. vrf directory
is added to cgroup2 mount. Individual vrfs are created under it. BPF
filter attached to vrf/NAME cgroup2 to set sk_bound_dev_if to the VRF
device index. From there the current process (ip's pid) is addded to
the cgroups.proc file and the given command is exected. In doing so
all AF_INET/AF_INET6 (ipv4/ipv6) sockets are automatically bound to
the VRF domain.
The association is inherited parent to child allowing the command to
be a shell from which other commands are run relative to the VRF.
2. Show the VRF a process is bound to:
ip vrf id
This command essentially looks at /proc/pid/cgroup for a "::/vrf/"
entry with the VRF name following.
3. Show process ids bound to a VRF
ip vrf pids NAME
This command dumps the file MNT/vrf/NAME/cgroup.procs since that file
shows the process ids in the particular vrf cgroup.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
David Ahern [Mon, 12 Dec 2016 00:53:14 +0000 (16:53 -0800)]
libnetlink: Add variant of rtnl_talk that does not display RTNETLINK answers error
iplink_vrf has 2 functions used to validate a user given device name is
a VRF device and to return the table id. If the user string is not a
device name ip commands with a vrf keyword show a confusing error
message: "RTNETLINK answers: No such device".
Add a variant of rtnl_talk that does not display the "RTNETLINK answers"
message and update iplink_vrf to use it.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
David Ahern [Mon, 12 Dec 2016 00:53:12 +0000 (16:53 -0800)]
Add filesystem APIs to lib
Add make_path to recursively call mkdir as needed to create a given
path with the given mode.
Add find_cgroup2_mount to lookup path where cgroup2 is mounted. If it
is not already mounted, cgroup2 is mounted under /var/run/cgroup2 for
use by iproute2.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
David Ahern [Wed, 7 Dec 2016 20:55:09 +0000 (12:55 -0800)]
Makefile: really suppress printing of directories
Makefile adds --no-print-directory to MAKEFLAGS if VERBOSE is not
defined however Config always defines VERBOSE. Update the check to
whether VERBOSE is 0.
Fixes: 57bdf8b76451 ("Make builds default to quiet mode") Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Daniel Borkmann [Tue, 6 Dec 2016 01:21:57 +0000 (02:21 +0100)]
bpf: add initial support for attaching xdp progs
Now that we made the BPF loader generic as a library, reuse it
for loading XDP programs as well. This basically adds a minimal
start of a facility for iproute2 to load XDP programs. There
currently only exists the xdp1_user.c sample code in the kernel
tree that sets up netlink directly and an iovisor/bcc front-end.
Since we have all the necessary infrastructure in place already
from tc side, we can just reuse its loader back-end and thus
facilitate migration and usability among the two for people
familiar with tc/bpf already. Sharing maps, performing tail calls,
etc works the same way as with tc. Naturally, once kernel
configuration API evolves, we will extend new features for XDP
here as well, resp. extend dumping of related netlink attributes.
Minimal example:
clang -target bpf -O2 -Wall -c prog.c -o prog.o
ip [-force] link set dev em1 xdp obj prog.o # attaching
ip [-d] link # dumping
ip link set dev em1 xdp off # detaching
For the dump, intention is that in the first line for each ip
link entry, we'll see "xdp" to indicate that this device has an
XDP program attached. Once we dump some more useful information
via netlink (digest, etc), idea is that 'ip -d link' will then
display additional relevant program information below the "link/
ether [...]" output line for such devices, for example.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Tue, 6 Dec 2016 01:17:58 +0000 (02:17 +0100)]
bpf: check for owner_prog_type and notify users when differ
Kernel commit 21116b7068b9 ("bpf: add owner_prog_type and accounted mem
to array map's fdinfo") added support for telling the owner prog type in
case of prog arrays. Give a notification to the user when they differ,
and the program eventually fails to load.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
Thomas Graf [Wed, 7 Dec 2016 09:47:59 +0000 (10:47 +0100)]
bpf: Fix number of retries when growing log buffer
The log buffer is automatically grown when the verifier output does not
fit into the default buffer size. The number of growing attempts was
not sufficient to reach the maximum buffer size so far.
Perform 9 iterations to reach max and let the 10th one fail.