Phil Sutter [Tue, 24 Nov 2015 14:31:02 +0000 (15:31 +0100)]
ipaddress: fix ipaddr_flush for Linux >= 3.1
Linux version 3.1 introduced a consistency check for netlink dumps in
commit 670dc28 ("netlink: advertise incomplete dumps"). This bites
iproute2 when flushing more addresses than can fit into a single
RTM_GETADDR response. To silence the spurious error message "Dump was
interrupted and may be inconsistent.", advise rtnl_dump_filter_l() to
not care about NLM_F_DUMP_INTR.
Neil Horman [Thu, 5 Nov 2015 19:54:17 +0000 (14:54 -0500)]
iproute2: Ignore EADDRNOTAVAIL errors during address flush operation
I found recently that, if I disabled address promotion in the kernel, that
ip addr flush dev <dev>
would fail with an EADDRNOTAVAIL errno (though the flush operation would in fact
flush all addresses from an interface properly)
Whats happening is that, if I add a primary and multiple secondary addresses to
an interface, the flush operation first ennumerates them all with a GETADDR |
DUMP operation, then sends a delete request for each address. But the kernel,
having promotion disabled, deletes all secondary addresses when the primary is
removed. That means, that several delete requests may still be pending in the
netlink request for addresses that have been removed on our behalf, resulting in
EADDRNOTAVAIL return codes.
It seems the simplest thing to do is to understand that EADDRUNAVAIL isn't a
fatal outcome on a flush operation, as it just indicates that an address which
you want to remove is already removed, so it can safely be ignored.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Stephen Hemminger <stephen@networkplumber.org> CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Phil Sutter [Wed, 18 Nov 2015 15:57:47 +0000 (16:57 +0100)]
lnstat: fix header displaying mechanism
The algorithm depends on the loop counter ('i') to increment by one in
each iteration. Though if running endlessly (count==0), the counter was
not incremented at all.
Also change formatting of the header printing conditional a bit so it's
hopefully easier to read.
Fixes: e7e2913 ("lnstat: run indefinitely by default") Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Fri, 6 Nov 2015 17:54:08 +0000 (18:54 +0100)]
ip_common.h header cleanup
- Drop 'extern' keyword from all function prototypes.
- Make line breaking of print_* functions consistent.
- Make print_ntable() and ipntable_reset_filter() static and remove
their declaration.
- Drop declaration of non-existent ipaddr_list() and iproute_monitor().
Phil Sutter [Fri, 13 Nov 2015 17:09:02 +0000 (18:09 +0100)]
iptunnel: simplify parsing TTL, allow 'hlim' as identifier
Instead of parsing an unsigned integer and checking boundaries, simply
parse u8. This and the added ttl alias 'hlim' provide consistency with
ip6tunnel.
Phil Sutter [Fri, 13 Nov 2015 17:08:58 +0000 (18:08 +0100)]
ip{,6}tunnel: align do_tunnels_list() a bit
In iptunnel, declare loop variables inside the loop as done in
ip6tunnel.
Fix and simplify goto logic in ip6tunnel:
- Failure to read over header lines would have left fp opened.
- By returning directly upon fopen() failure, fp can be closed
unconditionally in the end.
Phil Sutter [Fri, 13 Nov 2015 17:08:55 +0000 (18:08 +0100)]
ip/tunnel: introduce tnl_parse_key()
Instead of duplicating the same code six times (key, ikey and okey in
iptunnel and ip6tunnel), have a common parsing routine. This has the
added benefit of having the same verbose error message in ip6tunnel as
well as iptunnel.
I'm not sure if parsing an IPv4 address as key makes sense for
ip6tunnel, but the code was there before so this patch at least doesn't
make it worse.
Phil Sutter [Fri, 23 Oct 2015 17:21:23 +0000 (19:21 +0200)]
tc: u32 filter coding style cleanup
Add missing spaces around operators to increase readability. Aside from
that, make "preference" match a real synonym for "tos" and "dsfield" as
it's effect was identical to them.
Daniel Borkmann [Thu, 8 Oct 2015 10:22:39 +0000 (12:22 +0200)]
ip, realms: also allow to pass in raw realms value
If get_rt_realms() fails, try to get a possible raw u32 realms
value for the u32 RTA_FLOW/FRA_FLOW attribute, as it might be
useful to directly configure the hex value itself. And only if
that fails, then bail out.
The source realm is provided in the upper u16 (mask: 0xffff0000)
and the destination realm through the lower u16 part (mask:
0x0000ffff). This can be useful for tc's bpf realm matcher, but
also a full hex/mask param can be provided already for matching
through iptables' --realm cmdline option, for example.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Phil Sutter [Thu, 15 Oct 2015 20:32:17 +0000 (22:32 +0200)]
ip-rule: neither prohibit nor reject or unreachable flags exist
This has been inconsistent since the beginning of Git and seems to be
merely a documentation leftover, therefore just remove it from help
output and man page.
Phil Sutter [Thu, 15 Oct 2015 19:01:16 +0000 (21:01 +0200)]
ss: return -1 if an unrecognized option was given
When getopt_long encounters an option which has not been registered, it
returns '?'. React upon that and call usage() instead of help() so ss
returns with a non-zero exit status.
Roopa Prabhu [Thu, 15 Oct 2015 11:13:39 +0000 (13:13 +0200)]
lwtunnel: Add encapsulation support to ip route
This patch adds support to parse and print lwtunnel
encapsulation attributes attached to routes for MPLS
and IP tunnels.
example:
Add ipv4 route with mpls encap attributes:
Examples:
MPLS:
$ ip route add 40.1.2.0/30 encap mpls 200 via inet 40.1.1.1 dev eth3
$ ip route show
40.1.2.0/30 encap mpls 200 via 40.1.1.1 dev eth3
Add ipv4 multipath route with mpls encap attributes:
$ ip route add 10.1.1.0/30 nexthop encap mpls 200 via 10.1.1.1 dev eth0 \
nexthop encap mpls 700 via 40.1.1.2 dev eth3
$ ip route show
10.1.1.0/30
nexthop encap mpls 200 via 10.1.1.1 dev eth0 weight 1
nexthop encap mpls 700 via 40.1.1.2 dev eth3 weight 1
IP:
$ ip route add 10.1.1.1/24 encap ip id 200 dst 20.1.1.1 dev vxlan0
Roopa Prabhu [Thu, 15 Oct 2015 18:47:43 +0000 (11:47 -0700)]
ip monitor neigh: Change 'delete' to 'Deleted' to be consistent with ip route
It helps to grep for one string "Deleted" when monitoring all events.
Fixes: 6ea3ebafe077 ("iproute2: inform user when a neighbor is removed") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Roopa Prabhu [Thu, 15 Oct 2015 11:13:38 +0000 (13:13 +0200)]
libnetlink: introduce rta_nest and u8, u16, u64 helpers for nesting within rtattr
This patch introduces two new api's rta_nest and rta_nest_end to
nest attributes inside a rta attribute represented by 'struct rtattr'
as required to construct a nexthop. Also adds rta_addattr* variants
for u8, u16 and u64 as needed to support encapsulation.
willy tarreau [Tue, 6 Oct 2015 10:09:33 +0000 (12:09 +0200)]
fix "ss -p" segfaults
I've updated Jose's patch to make it slightly simpler (eg: calloc instead
of malloc+memset), and ported it to 4.2.0 which requires it as well, and
attached it to this e-mail.
I can confirm that with this patch 4.1.1 doesn't segfault on me anymore.
The commit message should be reworked I guess though everything's in it
and I didn't want to modify his description.
Can it be merged as-is or should I reword the commit message and reference
Jose as the fix reporter ? We should not let this bug live forever.
Essentially all that is needed to get rid of this issue is the
addition of:
memset(u, 0, sizeof(*u));
after:
if (!(u = malloc(sizeof(*u))))
break;
Also patched some other situations (strcpy and sprintf uses) that
potentially produce the same results.
Signed-off-by: Jose P Santos <j.ps@openmailbox.org>
[ wt: made Jose's patch slightly simpler, all credits to him for the diag ] Signed-off-by: Willy Tarreau <w@1wt.eu>
Phil Sutter [Fri, 25 Sep 2015 12:09:49 +0000 (14:09 +0200)]
ip: link: consolidate macvlan and macvtap
After eliminating the minor differences in both files which existed
solely because features/fixes were applied to only one of them and not
the other, the remaining differences were in function naming and error
messages. The latter is addressed by using the 'id' field of struct
link_util.
Fold both files into one in order to share common code and eliminate the
chance of having fixes/enhancements applied to only one of them.
Daniel Borkmann [Thu, 8 Oct 2015 13:22:05 +0000 (15:22 +0200)]
m_bpf: don't require default opcode on ebpf actions
After the patch, the most minimal command to load an eBPF action
for late binding with auto index selection through tc is:
tc actions add action bpf obj prog.o
We already set TC_ACT_PIPE in tc as default opcode, so if nothing
further has been specified, just use it. Also, allow "ok" next to
"pass" for matching cmdline on TC_ACT_OK.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
David Ahern [Wed, 7 Oct 2015 17:23:24 +0000 (10:23 -0700)]
ip neigh: Add ifindex to request when filtering dumps by device
Add ifindex to dump request when filtering by device. If the kernel
supports it adding the index to the request limits the amount of data
the kernel pushes to userpsace.
The feature exists in userspace already, so no need to warn the user
if kernel side support does not exist. Using the kernel side filter
makes the request more efficient.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Daniel Borkmann [Fri, 25 Sep 2015 10:32:41 +0000 (12:32 +0200)]
f_bpf: allow for optional classid and add flags
When having optional classid, most minimal command can be sth
like:
tc filter add dev foo parent X: bpf obj prog.o
Therefore, adapt the code so that a next argument will not be
enforced as the case currently.
Also, minor cleanup on the classid, where we should rather
have used addattr32(), and add flags for exec configuration,
for example (using short notation):
tc filter add dev foo parent X: bpf da obj prog.o
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com>
David Ahern [Fri, 2 Oct 2015 16:42:27 +0000 (09:42 -0700)]
ip neigh: Add support for filtering dumps by master device
Add support for filtering neighbor dumps by master device. Kernel side
support provided by commit 21fdd092acc7. Since the feature is not
available in older kernels the user is given a warning message if the
kernel does not support the request.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Christoph Schulz [Fri, 25 Sep 2015 06:44:07 +0000 (08:44 +0200)]
ip: allow using a device "help" (or a prefix thereof)
Device names that match "help" or a prefix thereof should be allowed anywhere
a device name can be used. Note that a suitable keyword ("dev" or "name", the
latter for "ip tunnel") has to be used in these cases to resolve ambiguities.
Signed-off-by: Christoph Schulz <develop@kristov.de> Reported-by: Leonhard Preis <leonhard@pre.is> Reported-by: Wilhelm Wijkander <lists@0x5e.se>
Richard Alpe [Fri, 2 Oct 2015 08:15:21 +0000 (10:15 +0200)]
tipc: add man pages
This patch adds man pages for the TIPC tool. There is one main page
and one page for each top level sub-command. These pages mainly aims
to help a user of the tipc tool. In addition to this they describe
a bit about what TIPC is and some of its features as a protocol.
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Dan Webster [Thu, 24 Sep 2015 07:36:53 +0000 (09:36 +0200)]
ss: fix file-based filtering segfault
Commit 1527a17 introduced a change where the second of two ssfilter_parse()
calls in ss.c was moved outside of a conditional block (ss.c: ~3575). This
commit enabled the parsing of services, such as 'sport = :ssh', but
inadvertently broke the '-F' file-based filtering:
David Ahern [Wed, 23 Sep 2015 22:44:56 +0000 (16:44 -0600)]
ip: Add type and master filters to brief output
The brief format does not honer the master and type filters:
$ ip link show master vrf-mgmt
7: dummy0: <BROADCAST,NOARP,SLAVE> mtu 1500 qdisc noop master vrf-mgmt state DOWN mode DEFAULT group default qlen 1000
link/ether 66:39:cc:2b:e9:bd brd ff:ff:ff:ff:ff:ff
$ ip -br link show master vrf-mgmt
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
eth0 UP 08:00:27:de:14:c8 <BROADCAST,MULTICAST,UP,LOWER_UP>
eth1 UP 08:00:27:87:02:f1 <BROADCAST,MULTICAST,UP,LOWER_UP>
eth2 UP 08:00:27:61:1e:fd <BROADCAST,MULTICAST,UP,LOWER_UP>
vrf-blue UNKNOWN a6:3f:09:34:7e:74 <NOARP,MASTER,UP,LOWER_UP>
vrf-red DOWN fe:a2:2d:e1:bc:ac <NOARP,MASTER>
dummy0 DOWN 66:39:cc:2b:e9:bd <BROADCAST,NOARP,SLAVE>
dummy1 DOWN 4a:4f:13:91:64:b1 <BROADCAST,NOARP,SLAVE>
dummy2 DOWN b2:4f:b6:cd:bd:a6 <BROADCAST,NOARP>
dummy3 DOWN 1e:06:3d:40:b8:c2 <BROADCAST,NOARP,SLAVE>
vrf-mgmt DOWN ce:b2:74:41:21:df <NOARP,MASTER>
With this patch the expected output is shown:
$ ip -br link show master vrf-mgmt
dummy0 DOWN 66:39:cc:2b:e9:bd <BROADCAST,NOARP,SLAVE>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
David Ahern [Mon, 21 Sep 2015 18:19:48 +0000 (11:19 -0700)]
ip route: Add RTM_F_LOOKUP_TABLE flag and show table id
Currently 'ip route get' does not show the table the lookup result comes
from and prior to kernel commit c36ba6603a11 the response from the kernel
was hardcoded to the main table. From the discussion this appears to be
a leftover from the route cache where the cached entry lost the table id
and so the result was hardcoded to main table.
c36ba6603a11 added the RTM_F_LOOKUP_TABLE flag to maintain that behavior
but to allow new tools to ask for the actual table id for the lookup.
This patch adds that flag to ip route get request and if the result is
not the main table shows the table id.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Andrew Vagin [Wed, 23 Sep 2015 11:43:46 +0000 (14:43 +0300)]
route: filter routes by family if it's specified
Currently when we specify AF_INET6 when it is disabled, we will get
all routes.
For example, we can boot kernel with ipv6.disable=1 and try to get ipv6
routes:
$ ip -6 route show
default via 192.168.122.1 dev eth0 proto static metric 100
192.168.122.0/24 dev eth0 proto kernel scope link src 192.168.122.141 metric 100
Here are ipv4 routes and this is unexpected behaviour.
Phil Sutter [Thu, 10 Sep 2015 14:25:47 +0000 (16:25 +0200)]
tc: fq: allow setting and retrieving flow refill delay
Code to parse and export this tuneable via netlink is already present in
sched_fq.c of the kernel, so not making it accessible for users would be
a waste of resources.