git.proxmox.com Git - mirror_iproute2.git/log
Hangbin Liu [Fri, 25 Dec 2015 03:12:16 +0000 (11:12 +0800)]
iproute2: ip-route.8.in: Add expires option for ip route

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Hangbin Liu [Fri, 25 Dec 2015 03:12:15 +0000 (11:12 +0800)]
iproute2: ip-route.8.in: Add missing '[' before 'pref'

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Hangbin Liu [Mon, 21 Dec 2015 08:29:36 +0000 (16:29 +0800)]
route: allow routes to be configured with expire values

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Stephen Hemminger [Tue, 22 Dec 2015 05:37:21 +0000 (21:37 -0800)]
Merge branch 'master' into net-next

Phil Sutter [Mon, 21 Dec 2015 19:42:56 +0000 (20:42 +0100)]
iptunnel: Fix compile error in ip/tunnel.c

I repeatedly failed to get this right, so now I have to clean up my mess
afterwards.

Fixes: 7d6aadcd0a1dc ("ip{,6}tunnel: have a shared stats parser/printer")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Fri, 18 Dec 2015 10:58:06 +0000 (11:58 +0100)]
ip{,6}tunnel: have a shared stats parser/printer

This has a slight side-effect of not aborting when /proc/net/dev is
malformed, but OTOH stats are not parsed for uninteresting interfaces.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Paolo Abeni [Fri, 18 Dec 2015 09:50:38 +0000 (10:50 +0100)]
lwtunnel: implement support for ip6 encap

Currently, ip6 encap support for lwtunnel is missing.
This patch implements it, mostly duplicating the IPv4 parts.

Also insert a space after the encap type when showing lwtunnel,
so that the tunnel type and the following argument are not
merged into a single word.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Fri, 18 Dec 2015 09:50:37 +0000 (10:50 +0100)]
gre: add support for collect metadata flag

This patch adds support for IFLA_GRE_COLLECT_METADATA via the
'external' keyword to the gre link.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Fri, 18 Dec 2015 09:50:36 +0000 (10:50 +0100)]
vxlan: add support for collect metadata flag

This patch adds support for IFLA_VXLAN_COLLECT_METADATA via the
'external' keyword to the vxlan link.

Also enforce mutual exclusion between 'vni' and 'external'.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hannes Frederic Sowa [Wed, 16 Dec 2015 09:52:36 +0000 (10:52 +0100)]
iproute: print addrgenmode stable_secret and fallback otherwise

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Daniel Borkmann [Mon, 14 Dec 2015 15:57:32 +0000 (16:57 +0100)]
bpf: minor fix in api and bpf_dump_error() usage

Fix a whitespace issue in bpf_dump_error() usage, and a missing closing
parenthesis in the ntohl() macro for eBPF programs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Stephen Hemminger [Fri, 18 Dec 2015 01:21:53 +0000 (17:21 -0800)]
include: update kernel headers

Current headers for net-next

Stephen Hemminger [Fri, 18 Dec 2015 01:21:15 +0000 (17:21 -0800)]
Merge branch 'master' into net-next

Paolo Abeni [Tue, 15 Dec 2015 11:18:04 +0000 (12:18 +0100)]
lwtunnel: fix argument parsing

Currently, parse_encap_ip() does not update argv/argc correctly;
if multiple lwtunnel arguments are provided, parsing fails after
the first one, e.g.

 ip route add 172.16.101.0/24 dev vxlan1 encap ip id 42 dst 192.168.255.1

fails with:

 Error: either "to" is duplicate, or "dst" is a garbage.

This commit addresses the issue by stepping to the next argument at each
iteration of the parsing loop.

Fixes: 1e5293056a02 ("lwtunnel: Add encapsulation support to ip route")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Phil Sutter [Sat, 12 Dec 2015 13:09:48 +0000 (14:09 +0100)]
route: Fix printing of locked entries

Commit 0f7543322c5fd ("route: ignore RTAX_HOPLIMIT of value -1")
accidentally reordered fprintf statements. This patch restores the
original ordering.

Fixes: 0f7543322c5fd ("route: ignore RTAX_HOPLIMIT of value -1")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Konstantin Khlebnikov [Mon, 30 Nov 2015 22:17:06 +0000 (01:17 +0300)]
ip neigh: device is optional for proxy entries

Note, however, that dumping such entries crashes current kernels.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Tom Herbert [Mon, 30 Nov 2015 22:57:28 +0000 (14:57 -0800)]
ila: Add support for ILA lwtunnels

This patch:
 - Adds a utility function for parsing a 64 bit address
 - Adds a utility function for converting a 64 bit address to ASCII
 - Adds an ILA encap type to lwt tunnels

Signed-off-by: Tom Herbert <tom@herbertland.com>
Daniel Borkmann [Tue, 1 Dec 2015 23:25:36 +0000 (00:25 +0100)]
examples, bpf: further improve examples

Improve the example files further and add a more generic set of
helpers that they can use.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Stephen Hemminger [Thu, 10 Dec 2015 16:56:18 +0000 (08:56 -0800)]
Merge branch 'master' into net-next

Stephen Hemminger [Thu, 10 Dec 2015 16:52:10 +0000 (08:52 -0800)]
ip: fix format string when reading statistics

The tunnel code was doing sscanf(buf, "%ld", &x) where x was an unsigned
long; the matching conversion specifier for unsigned long is %lu.

Phil Sutter [Thu, 10 Dec 2015 12:24:51 +0000 (13:24 +0100)]
tc.8: Fix reference to tc-tcindex.8

Just a typo; it is spelled correctly in the SEE ALSO section.

Signed-off-by: Phil Sutter <phil@nwl.cc>
David Ahern [Tue, 8 Dec 2015 20:24:44 +0000 (12:24 -0800)]
vrf: Add support for table names

Currently, the table id for VRF devices must be an integer. Convert
it to use rtnl_rttable_a2n(), which also handles table names defined in
the iproute2 configuration directory.

This also fixes a bug in the original commit where table names were not
properly handled.

Fixes: 15faa0a30bed ("add support for VRF device")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Nicolas Dichtel [Thu, 3 Dec 2015 16:13:48 +0000 (17:13 +0100)]
libnetlink: don't confuse variables in rtnl_talk()

There are two variables named 'len' in rtnl_talk(). As a result, commit
c079e121a73a did not work. For example, it was possible to trigger
a segfault with this command:
$ ip link set gre2 type ip6gre hoplimit 32

Let's rename the argument len to maxlen.

Fixes: c079e121a73a ("libnetlink: add size argument to rtnl_talk")
Reported-by: Thomas Faivre <thomas.faivre@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Phil Sutter [Wed, 2 Dec 2015 12:50:22 +0000 (13:50 +0100)]
route: ignore RTAX_HOPLIMIT of value -1

Older kernels use -1 internally as an indicator to use the sysctl default,
but they still export the setting. Newer kernels use 0 to indicate that
(which is why the conversion from -1 to 0 was done here), but they also
stopped exporting the value. Since the meaning of -1 is clear, treat it
the same way as the default on newer kernels, i.e. print nothing.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Stephen Hemminger [Sun, 29 Nov 2015 20:05:39 +0000 (12:05 -0800)]
iptunnel: cleanup code

Make iptunnel pass checkpatch (mostly).

Konstantin Shemyak [Thu, 26 Nov 2015 16:22:05 +0000 (18:22 +0200)]
ip_tunnel: determine tunnel address family from the tunnel type

On 24.11.2015 02:26, Stephen Hemminger wrote:
> On Thu, 12 Nov 2015 21:10:08 +0000
> Konstantin Shemyak <konstantin@shemyak.com> wrote:
>
>> When creating an IP tunnel over IPv6, the address family must be passed in
>> the option, e.g.
>>
>> ip -6 tunnel add mode ip6gre local 1::1 remote 2::2
>>
>> This makes it impossible to create both IPv4 and IPv6 tunnels in one batch.
>>
>> In fact the address family option is redundant here, as each tunnel mode is
>> relevant for only one address family.
>> The patch determines whether the applicable address family is AF_INET6
>> instead of the default AF_INET and makes the "-6" option unnecessary for
>> "ip tunnel add".
>>
>> Signed-off-by: Konstantin Shemyak <konstantin@shemyak.com>
>> ---
>>   ip/iptunnel.c                          | 26 ++++++++++++++++++++++++++
>>   testsuite/tests/ip/tunnel/add_tunnel.t | 14 ++++++++++++++
>>   2 files changed, 40 insertions(+)
>>   create mode 100755 testsuite/tests/ip/tunnel/add_tunnel.t
>>
>> diff --git a/ip/iptunnel.c b/ip/iptunnel.c
>> index 78fa988..7826a37 100644
>> --- a/ip/iptunnel.c
>> +++ b/ip/iptunnel.c
>> @@ -629,8 +629,34 @@ static int do_6rd(int argc, char **argv)
>>          return tnl_6rd_ioctl(cmd, medium, &ip6rd);
>>   }
>>
>> +static int tunnel_mode_is_ipv6(char *tunnel_mode) {
>> +       char *ipv6_modes[] = {
>> +               "ipv6/ipv6", "ip6ip6",
>> +               "vti6",
>> +               "ip/ipv6", "ipv4/ipv6", "ipip6", "ip4ip6",
>> +               "ip6gre", "gre/ipv6",
>> +               "any/ipv6", "any"
>> +       };
>> +       int i;
>> +
>> +       for (i = 0; i < sizeof(ipv6_modes) / sizeof(char *); i++) {
>> +               if (strcmp(ipv6_modes[i], tunnel_mode) == 0)
>> +                       return 1;
>> +       }
>> +       return 0;
>> +}
>> +
>
> The ipv6_modes table should be static const.

Thank you for the note! Attached is the corrected patch.

> Also is it possible to use strstr for ipv6 and ip6 or even strchr(tunnel_mode, '6')
> to simplify this?

There is IPv6 tunnel mode 'any', and IPv4 tunnel mode 'ipv6/ip' (aka
'sit'). It looks to me that attempts to find some substring match
would not make the code much shorter, but definitely less readable.

Konstantin Shemyak.

From 42d27db0055c3a114fe6eb86d680bef9ec098ad4 Mon Sep 17 00:00:00 2001
From: Konstantin Shemyak <konstantin@shemyak.com>
Date: Thu, 12 Nov 2015 20:52:02 +0200
Subject: [PATCH] Tunnel address family is determined from the tunnel mode

When the tunnel mode already implies the IP address family, the "ip tunnel"
command determines it and does not require the "-4"/"-6" option.

This makes it possible to create both IPv4 and IPv6 tunnels in one batch.

Signed-off-by: Konstantin Shemyak <konstantin@shemyak.com>
Daniel Borkmann [Thu, 26 Nov 2015 14:38:46 +0000 (15:38 +0100)]
{f,m}_bpf: add more example code

I've added three examples to examples/bpf/ that demonstrate how one can
implement eBPF tail calls in tc with, e.g., multiple levels of nesting.
They should act as a good starting point, but also as test cases for the
ELF loader and kernel. A real test suite for {f,m,e}_bpf is still to be
developed in future work.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Thu, 26 Nov 2015 14:38:45 +0000 (15:38 +0100)]
{f,m}_bpf: allow updates on program arrays

Since we have all infrastructure in place now, allow atomic live updates
on program arrays. This can be very useful when programs that are being
tail-called need to be replaced, e.g. when classifier functionality needs
to change or protocols are added/removed at runtime.

Thus, provide a way for in-place code updates. Minimal example: given an
object file cls.o that contains the entry point in section 'classifier',
has a globally pinned program array 'jmp' with 2 slots and id 0, and
contains two tail-called programs under sections '0/0' (prog array key 0)
and '0/1' (prog array key 1), where the section encoding for the loader is
<id/key>, adding the filter loads everything into cls_bpf:

  tc filter add dev foo parent ffff: bpf da obj cls.o

Now, the program under section '0/1' needs to be replaced with an updated
version that resides in the same section (the full path to tc's subfolder
of the mount point can also be passed, e.g. /sys/fs/bpf/tc/globals/jmp):

  tc exec bpf graft m:globals/jmp obj cls.o sec 0/1

In case the program resides under a different section 'foo', it can also
be injected into the program array like:

  tc exec bpf graft m:globals/jmp key 1 obj cls.o sec foo

If the new tail called classifier program is already available as a pinned
object somewhere (here: /sys/fs/bpf/tc/progs/parser), it can be injected
into the prog array like:

  tc exec bpf graft m:globals/jmp key 1 fd m:progs/parser

In the kernel, the program on key 1 is atomically replaced and the
old program's refcount is dropped.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Thu, 26 Nov 2015 14:38:44 +0000 (15:38 +0100)]
{f, m}_bpf: allow for user-defined object pinnings

The recently introduced object pinning can be further extended in order
to allow sharing maps beyond the tc namespace. For example, maps that are
pinned from the tracing side can be accessed through this facility as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Thu, 26 Nov 2015 14:38:43 +0000 (15:38 +0100)]
{f, m}_bpf: check map attributes when fetching as pinned

Make use of the new show_fdinfo() facility and verify that, when a
pinned map is fetched, its basic attributes are the same as those of
the map declared in the ELF file. When maps are placed into the
global namespace, collisions could occur; in such a case, warn the
user and bail out.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Thu, 26 Nov 2015 14:38:42 +0000 (15:38 +0100)]
{f,m}_bpf: make tail calls working

Now that we have the possibility of sharing maps, it's time we get the
ELF loader fully working with regard to tail calls. Since program array
maps are pinned, we can finally keep them alive. This patch also fixes
two bugs I noticed in bpf_fill_prog_arrays(). Example code follows as a
separate patch.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Stephen Hemminger [Sun, 29 Nov 2015 19:53:43 +0000 (11:53 -0800)]
Merge branch 'master' into net-next

Tom Herbert [Fri, 27 Nov 2015 18:23:43 +0000 (10:23 -0800)]
vxlan: Add support for remote checksum offload

This patch adds support for remote checksum offload to VXLAN. It adds
the remcsumtx and remcsumrx options to ip vxlan configuration to enable
remote checksum offload for transmit and receive on the VXLAN tunnel.

https://tools.ietf.org/html/draft-herbert-vxlan-rco-00

Example:

ip link add name vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0 \
    udpcsum remcsumtx remcsumrx

Testing:

Ran a single netperf over mlnx4 to illustrate the effect:

- Without RCO (UDP csum set to zero)
  4335.99 Mbps
- With RCO enabled
  7661.81 Mbps

Signed-off-by: Tom Herbert <tom@herbertland.com>
Phil Sutter [Sat, 28 Nov 2015 00:00:05 +0000 (01:00 +0100)]
get rid of unnecessary fgets() buffer size limitation

fgets() will read at most size-1 bytes into the buffer and add a
terminating null-char at the end. Therefore it is not necessary to pass
a reduced buffer size when calling it.

This change was generated using the following semantic patch:

@@
identifier buf, fp;
@@
- fgets(buf, sizeof(buf) - 1, fp)
+ fgets(buf, sizeof(buf), fp)

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Sat, 28 Nov 2015 00:00:04 +0000 (01:00 +0100)]
get rid of remaining -Wunused-result warnings

Although checking return codes in these spots is not strictly necessary,
silencing the warnings will make new ones stand out.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Sat, 28 Nov 2015 00:00:03 +0000 (01:00 +0100)]
ss: review is_ephemeral()

There is no need to keep the static port boundaries global, as they are
not used directly. Keeping them local also allows their names to be safely
shortened. Also assign the hardcoded fallback values if fscanf() fails,
and get rid of unnecessary parentheses around the return value.

Instead of more or less duplicating is_ephemeral() in run_ssfilter(),
simply call the function.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Sat, 28 Nov 2015 00:00:02 +0000 (01:00 +0100)]
ss: reduce max indentation level in init_service_resolver()

Exit early or continue on error instead of nesting conditionals, to make
the code a bit easier to read.

Also, the call to memcpy() can be skipped by initialising prog with the
desired prefix.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Sat, 28 Nov 2015 00:00:01 +0000 (01:00 +0100)]
lnstat: review lnstat_update()

Instead of calling rewind() and fgets() before every call to
scan_lines(), move them into scan_lines() itself.

This should also fix compat mode, as before the second call to
scan_lines() the first line was skipped unconditionally.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Tue, 24 Nov 2015 14:50:00 +0000 (15:50 +0100)]
bridge.8: minor formatting cleanup

- Replace commas at the end of subsections with periods.
- Replace double spaces with single ones.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Tue, 24 Nov 2015 14:45:31 +0000 (15:45 +0100)]
iproute: restrict hoplimit values to be in range [0; 255]

Technically, the range of possible hoplimit values is defined by the IPv4
and IPv6 header formats. Both define the field to be eight bits in size,
which leads to a value range of [0;255]. Setting a packet's hoplimit
field to 0 makes little sense, though, as the next hop would
immediately drop the packet. Therefore Linux uses 0 as a special value
indicating that the system's default hoplimit (configurable via
sysctl) should be used. In iproute, setting the hoplimit of a route to 0
is equivalent to omitting the hoplimit parameter altogether, so it is not
actually necessary to allow that value to be specified, but it is kept
anyway for backwards compatibility.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Tue, 24 Nov 2015 14:31:04 +0000 (15:31 +0100)]
iptoken: simplify iptoken_list a bit

Since it uses only a single filter, rtnl_dump_filter() can be used.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Tue, 24 Nov 2015 14:31:03 +0000 (15:31 +0100)]
ipaddress: drop unnecessary check in ipaddr_list_flush_or_save()

Right after ipaddr_reset_filter(), filter.family is always AF_UNSPEC.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Tue, 24 Nov 2015 14:31:02 +0000 (15:31 +0100)]
ipaddress: fix ipaddr_flush for Linux >= 3.1

Linux version 3.1 introduced a consistency check for netlink dumps in
commit 670dc28 ("netlink: advertise incomplete dumps"). This bites
iproute2 when flushing more addresses than can fit into a single
RTM_GETADDR response. To silence the spurious error message "Dump was
interrupted and may be inconsistent.", advise rtnl_dump_filter_l() to
not care about NLM_F_DUMP_INTR.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Tue, 24 Nov 2015 14:31:01 +0000 (15:31 +0100)]
libnetlink: introduce nc_flags

Allow for a filter to ignore certain nlmsg_flags.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Tue, 24 Nov 2015 14:31:00 +0000 (15:31 +0100)]
ipaddress: simplify ipaddr_flush()

Since it's no longer relevant whether an IP address is primary or
secondary when flushing, ipaddr_flush() can be simplified a bit.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Stephen Hemminger [Sun, 29 Nov 2015 19:41:23 +0000 (11:41 -0800)]
rt_names: style cleanup

Clean up all checkpatch complaints about whitespace in rt_names.

David Ahern [Tue, 24 Nov 2015 21:20:01 +0000 (13:20 -0800)]
Add support for rt_tables.d

Add support for reading table id/name mappings from rt_tables.d
directory.

Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
John W. Linville [Thu, 24 Sep 2015 18:39:39 +0000 (14:39 -0400)]
geneve: add support for IPv6 link partners

Signed-off-by: John W. Linville <linville@tuxdriver.com>
John W. Linville [Thu, 24 Sep 2015 18:39:39 +0000 (14:39 -0400)]
geneve: add support for IPv6 link partners

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Daniel Borkmann [Thu, 12 Nov 2015 23:39:29 +0000 (00:39 +0100)]
{f,m}_bpf: allow for sharing maps

This larger work addresses one of the bigger remaining issues in
tc's eBPF frontend, namely persistent file descriptors.
Whenever tc parses the ELF object, extracts and loads maps into the
kernel, these file descriptors are out of reach after the tc
instance exits.

That is, for simple (unnested) programs which contain one or
multiple maps, the kernel holds a reference, and the maps live
on inside the kernel until the program holding them is unloaded,
but they are out of reach for user space, and worse still with
(possibly multiple, nested) tail calls.

For this issue, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, in order to be able to further inspect/update map data for
a specific use case. However, while that is more tied towards
specific applications, it still doesn't easily allow for sharing
maps across multiple tc instances and would require a daemon to
be running in the background. For example, when a map should be shared
by two eBPF programs, one attached to ingress, one to egress, this
currently doesn't work with the tc frontend.

This work solves exactly that, i.e. if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object across different program sections (PIN_OBJECT_NS)
without "losing" the file descriptor set. To make that happen, we use
eBPF object pinning, introduced in kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs"), for exactly this purpose.

The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:

 - classifier-classifier shared:

  tc filter add dev foo parent 1: bpf obj shared.o sec egress
  tc filter add dev foo parent ffff: bpf obj shared.o sec ingress

 - classifier-action shared (here: late binding to a dummy classifier):

  tc actions add action bpf obj shared.o sec egress pass index 42
  tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
  tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
     action bpf index 42

The toy example increments a shared counter on egress and dumps its
value on ingress (had no sharing (PIN_NONE) been chosen, the map value
would, of course, be 0, since two separate map instances would be created):

  [...]
          <idle>-0     [002] ..s. 38264.788234: : map val: 4
          <idle>-0     [002] ..s. 38264.788919: : map val: 4
          <idle>-0     [002] ..s. 38264.789599: : map val: 5
  [...]

... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.

The patch has been tested extensively on both the classifier and
action sides.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Neil Horman [Thu, 5 Nov 2015 19:54:17 +0000 (14:54 -0500)]
iproute2: Ignore EADDRNOTAVAIL errors during address flush operation

I recently found that, with address promotion disabled in the kernel,
ip addr flush dev <dev>

would fail with an EADDRNOTAVAIL errno (although the flush operation did in
fact remove all addresses from the interface properly).

What's happening is that, if I add a primary and multiple secondary addresses to
an interface, the flush operation first enumerates them all with a GETADDR |
DUMP operation, then sends a delete request for each address. But the kernel,
having promotion disabled, deletes all secondary addresses when the primary is
removed. That means several delete requests may still be pending in the
netlink request for addresses that have already been removed on our behalf,
resulting in EADDRNOTAVAIL return codes.

It seems the simplest thing to do is to recognize that EADDRNOTAVAIL isn't a
fatal outcome for a flush operation, as it just indicates that an address which
you want to remove has already been removed, so it can safely be ignored.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Phil Sutter [Wed, 18 Nov 2015 11:46:42 +0000 (12:46 +0100)]
bridge.8: document fdb replace command

Despite commit 45a82e5 ("iproute vxlan add support for fdb replace
command"), the 'fdb replace' command was not mentioned in bridge.8.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Wed, 18 Nov 2015 15:57:47 +0000 (16:57 +0100)]
lnstat: fix header displaying mechanism

The algorithm relies on the loop counter ('i') being incremented by one in
each iteration. However, when running endlessly (count==0), the counter was
not incremented at all.

Also change the formatting of the header printing conditional a bit so it's
hopefully easier to read.

Fixes: e7e2913 ("lnstat: run indefinitely by default")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Wed, 18 Nov 2015 15:57:46 +0000 (16:57 +0100)]
lnstat: describe -s option in help output

Signed-off-by: Phil Sutter <phil@nwl.cc>
Stephen Hemminger [Mon, 23 Nov 2015 23:53:04 +0000 (15:53 -0800)]
update kernel headers to 4.4-rc1

Post merge window changes

Phil Sutter [Fri, 6 Nov 2015 17:54:08 +0000 (18:54 +0100)]
ip_common.h header cleanup

- Drop 'extern' keyword from all function prototypes.
- Make line breaking of print_* functions consistent.
- Make print_ntable() and ipntable_reset_filter() static and remove
  their declaration.
- Drop declaration of non-existent ipaddr_list() and iproute_monitor().

Signed-off-by: Phil Sutter <phil@nwl.cc>
Stephen Hemminger [Mon, 23 Nov 2015 23:42:34 +0000 (15:42 -0800)]
misc: remove extra blank line

Stephen Hemminger [Mon, 23 Nov 2015 23:41:37 +0000 (15:41 -0800)]
man8: scrub trailing whitespace

Remove extraneous whitespace

Ville Skyttä [Sat, 7 Nov 2015 09:53:00 +0000 (11:53 +0200)]
man: Spelling fixes

Signed-off-by: Ville Skyttä <ville.skytta@iki.fi>
Ville Skyttä [Sat, 7 Nov 2015 09:52:59 +0000 (11:52 +0200)]
man: Syntax and warning fixes

Fix syntax issues and warnings highlighted by `man --warnings=w' from
man-db 2.7.1.

Signed-off-by: Ville Skyttä <ville.skytta@iki.fi>
Phil Sutter [Fri, 13 Nov 2015 17:09:05 +0000 (18:09 +0100)]
ip{,6}tunnel: put spaces around non-unary operators

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Fri, 13 Nov 2015 17:09:04 +0000 (18:09 +0100)]
iptunnel: sanitize copying tunnel name

Since p->name is only IFNAMSIZ bytes, do not copy more than IFNAMSIZ - 1
bytes into it so there remains at least a single null byte in the end.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Fri, 13 Nov 2015 17:09:03 +0000 (18:09 +0100)]
iptunnel: share common code when determining the default interface name

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Fri, 13 Nov 2015 17:09:02 +0000 (18:09 +0100)]
iptunnel: simplify parsing TTL, allow 'hlim' as identifier

Instead of parsing an unsigned integer and checking boundaries, simply
parse u8. This and the added ttl alias 'hlim' provide consistency with
ip6tunnel.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Fri, 13 Nov 2015 17:09:01 +0000 (18:09 +0100)]
iptunnel: share common code when setting tunnel mode

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Fri, 13 Nov 2015 17:09:00 +0000 (18:09 +0100)]
ip6tunnel: fix coding style: no newline between brace and else

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Fri, 13 Nov 2015 17:08:59 +0000 (18:08 +0100)]
ip6tunnel: print local/remote addresses like iptunnel does

This makes output consistent with iptunnel, also supporting reverse DNS
lookup for remote address if requested.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Fri, 13 Nov 2015 17:08:58 +0000 (18:08 +0100)]
ip{,6}tunnel: align do_tunnels_list() a bit

In iptunnel, declare loop variables inside the loop as done in
ip6tunnel.

Fix and simplify goto logic in ip6tunnel:
- Failure to read over header lines would have left fp opened.
- By returning directly upon fopen() failure, fp can be closed
  unconditionally in the end.

Use the same goto logic in iptunnel, as well.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Fri, 13 Nov 2015 17:08:57 +0000 (18:08 +0100)]
iptunnel: use ll_name_to_index() for physical interface lookup

Although the cache is only initialized in do_show(), this way it is at
least consistent with ip6tunnel.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Fri, 13 Nov 2015 17:08:56 +0000 (18:08 +0100)]
ip{, 6}tunnel: unify behaviour if physical device is not found

Make ip6tunnel print an error message as well. While there, get rid of
unnecessary line breaking.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Fri, 13 Nov 2015 17:08:55 +0000 (18:08 +0100)]
ip/tunnel: introduce tnl_parse_key()

Instead of duplicating the same code six times (key, ikey and okey in
iptunnel and ip6tunnel), have a common parsing routine. This has the
added benefit of giving ip6tunnel the same verbose error message as
iptunnel.

I'm not sure if parsing an IPv4 address as key makes sense for
ip6tunnel, but the code was there before so this patch at least doesn't
make it worse.
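
A shared key parser of this kind might look roughly like the sketch below. It assumes, as the message above implies, that a key containing a dot is read as an IPv4 address and anything else as an unsigned integer; the name parse_tunnel_key() and its return-code interface are hypothetical and differ from the real tnl_parse_key(), which is not reproduced here.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical common parser for "key", "ikey" and "okey": on success it
 * stores the key in network byte order and returns 0; otherwise it prints
 * one verbose error message and returns -1. */
static int parse_tunnel_key(uint32_t *out, const char *name, const char *key)
{
	struct in_addr addr;
	unsigned long uval;
	char *end;

	/* A dotted key such as "10.0.0.1" is taken as an IPv4 address */
	if (strchr(key, '.')) {
		if (inet_pton(AF_INET, key, &addr) != 1)
			goto invalid;
		*out = addr.s_addr;	/* already in network byte order */
		return 0;
	}

	/* Otherwise expect an unsigned integer (decimal, hex or octal) */
	errno = 0;
	uval = strtoul(key, &end, 0);
	if (errno || end == key || *end != '\0' || strchr(key, '-') ||
	    uval > UINT32_MAX)
		goto invalid;
	*out = htonl((uint32_t)uval);
	return 0;

invalid:
	fprintf(stderr, "invalid value for \"%s\": \"%s\"\n", name, key);
	return -1;
}
```

Calling the same routine for all three keywords in both tools is what keeps the error text identical between them.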

Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agoip{,6}tunnel: get rid of extraneous whitespace when printing
Phil Sutter [Fri, 13 Nov 2015 17:08:54 +0000 (18:08 +0100)]
ip{,6}tunnel: get rid of extraneous whitespace when printing

Put whitespace at the beginning of optional parts, never as a suffix.
Also drop double spaces between words.
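
The "leading separator" convention can be illustrated with a small formatter. This is a hypothetical sketch (format_tunnel() and its fields are made up for illustration, not iproute2's printing code): each optional part carries its own leading space, so omitting a part can never leave a trailing or doubled space.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter: every optional part starts with its own
 * separator, so no trailing or doubled spaces can appear. */
static void format_tunnel(char *buf, size_t len, const char *name,
			  const char *remote, int ttl)
{
	int n = snprintf(buf, len, "%s:", name);

	if (remote && n >= 0 && (size_t)n < len)
		n += snprintf(buf + n, len - n, " remote %s", remote);
	if (ttl && n >= 0 && (size_t)n < len)
		n += snprintf(buf + n, len - n, " ttl %d", ttl);
}
```

For example, a tunnel with neither a remote address nor a TTL prints just "tun0:", with no dangling space.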

Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agomisc/Makefile: use PKG_CONFIG
Aaro Koskinen [Tue, 17 Nov 2015 14:08:00 +0000 (16:08 +0200)]
misc/Makefile: use PKG_CONFIG

Use PKG_CONFIG from Config - it works better when cross-compiling.

Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
8 years agoMerge branch 'master' into net-next
Stephen Hemminger [Wed, 4 Nov 2015 00:38:15 +0000 (16:38 -0800)]
Merge branch 'master' into net-next

8 years agov4.3.0
Stephen Hemminger [Wed, 4 Nov 2015 00:34:46 +0000 (16:34 -0800)]
v4.3.0

8 years agoMerge branch 'master' into net-next
Stephen Hemminger [Wed, 4 Nov 2015 00:31:57 +0000 (16:31 -0800)]
Merge branch 'master' into net-next

8 years agolib/utils: improve error messages of get_addr() and get_prefix()
Phil Sutter [Thu, 29 Oct 2015 16:20:56 +0000 (17:20 +0100)]
lib/utils: improve error messages of get_addr() and get_prefix()

Instead of always complaining about an illegal inet address, use
get_family() to report the correct address family.

Based on a patch by Hangbin Liu that printed "inet6" for AF_INET6, made
more generic by me.
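
The idea of naming the expected family in the error, rather than hard-coding "inet", can be sketched like this. The helpers family_label() and parse_addr() are hypothetical, only AF_INET and AF_INET6 are handled, and the message wording is illustrative rather than a quote of iproute2's output.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: a printable name for the address family */
static const char *family_label(int family)
{
	switch (family) {
	case AF_INET:
		return "inet";
	case AF_INET6:
		return "inet6";
	default:
		return "unknown";
	}
}

/* Parse arg as an address of the given family; on failure, write an
 * error naming that family into msg instead of a fixed "inet" text. */
static int parse_addr(const char *arg, int family, char *msg, size_t len)
{
	unsigned char buf[sizeof(struct in6_addr)];

	if (inet_pton(family, arg, buf) == 1)
		return 0;
	snprintf(msg, len, "Error: %s address is expected rather than \"%s\".",
		 family_label(family), arg);
	return -1;
}
```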

Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agobridge: fdb: minor syntax fix in help text
Phil Sutter [Thu, 29 Oct 2015 09:55:24 +0000 (10:55 +0100)]
bridge: fdb: minor syntax fix in help text

8 years agoifstat: add manpage
Phil Sutter [Thu, 29 Oct 2015 09:55:23 +0000 (10:55 +0100)]
ifstat: add manpage

8 years agogenl: add manpage
Phil Sutter [Thu, 29 Oct 2015 09:55:22 +0000 (10:55 +0100)]
genl: add manpage

8 years agoifcfg: add manpage
Phil Sutter [Thu, 29 Oct 2015 09:55:21 +0000 (10:55 +0100)]
ifcfg: add manpage

8 years agoadd new IFLA_VF_TRUST netlink attribute
Stephen Hemminger [Fri, 23 Oct 2015 22:47:07 +0000 (15:47 -0700)]
add new IFLA_VF_TRUST netlink attribute

8 years agoMerge branch 'master' into net-next
Stephen Hemminger [Fri, 23 Oct 2015 22:46:08 +0000 (15:46 -0700)]
Merge branch 'master' into net-next

8 years agomisc: cleanup extra whitespace
Stephen Hemminger [Fri, 23 Oct 2015 22:44:30 +0000 (15:44 -0700)]
misc: cleanup extra whitespace

No blank lines at end of file

8 years agotc: remove extra whitespace
Stephen Hemminger [Fri, 23 Oct 2015 22:43:28 +0000 (15:43 -0700)]
tc: remove extra whitespace

No blank lines at EOF, or trailing whitespace.

8 years agoip: remove extra newlines at end-of-file
Stephen Hemminger [Fri, 23 Oct 2015 22:41:58 +0000 (15:41 -0700)]
ip: remove extra newlines at end-of-file

Shouldn't have extra blank lines.

8 years agotc: ship filter man pages and refer to them in tc.8
Phil Sutter [Fri, 23 Oct 2015 17:47:16 +0000 (19:47 +0200)]
tc: ship filter man pages and refer to them in tc.8

Cc: Thomas Graf <tgraf@suug.ch>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Werner Almesberger <werner@almesberger.net>
Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agotc: add a man page for u32 filter
Phil Sutter [Fri, 23 Oct 2015 17:47:15 +0000 (19:47 +0200)]
tc: add a man page for u32 filter

Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agotc: add a man page for tcindex filter
Phil Sutter [Fri, 23 Oct 2015 17:47:14 +0000 (19:47 +0200)]
tc: add a man page for tcindex filter

Cc: Werner Almesberger <werner@almesberger.net>
Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agotc: add a man page for route filter
Phil Sutter [Fri, 23 Oct 2015 17:47:13 +0000 (19:47 +0200)]
tc: add a man page for route filter

Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agotc: add a man page for fw filter
Phil Sutter [Fri, 23 Oct 2015 17:47:12 +0000 (19:47 +0200)]
tc: add a man page for fw filter

Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agotc: add a man page for flower filter
Phil Sutter [Fri, 23 Oct 2015 17:47:11 +0000 (19:47 +0200)]
tc: add a man page for flower filter

Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agotc: add a man page for flow filter
Phil Sutter [Fri, 23 Oct 2015 17:47:10 +0000 (19:47 +0200)]
tc: add a man page for flow filter

Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agotc: add a man page for cgroup filter
Phil Sutter [Fri, 23 Oct 2015 17:47:09 +0000 (19:47 +0200)]
tc: add a man page for cgroup filter

Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agotc: add a man page for basic filter
Phil Sutter [Fri, 23 Oct 2015 17:47:08 +0000 (19:47 +0200)]
tc: add a man page for basic filter

Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agotc: u32 filter coding style cleanup
Phil Sutter [Fri, 23 Oct 2015 17:21:23 +0000 (19:21 +0200)]
tc: u32 filter coding style cleanup

Add missing spaces around operators to increase readability. Aside from
that, make the "preference" match a true synonym for "tos" and "dsfield",
as its effect was identical to theirs.

Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agotc: improve filter help texts a bit
Phil Sutter [Fri, 23 Oct 2015 17:21:17 +0000 (19:21 +0200)]
tc: improve filter help texts a bit

This fixes a few syntax errors and changes the route filter help text to
use classid instead of flowid, consistent with the other filters' help
texts.

Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agoupdate bpf kernel header
Stephen Hemminger [Fri, 23 Oct 2015 06:43:35 +0000 (23:43 -0700)]
update bpf kernel header

8 years agoMerge branch 'master' into net-next
Stephen Hemminger [Fri, 23 Oct 2015 06:42:37 +0000 (23:42 -0700)]
Merge branch 'master' into net-next

8 years agoip, realms: also allow to pass in raw realms value
Daniel Borkmann [Thu, 8 Oct 2015 10:22:39 +0000 (12:22 +0200)]
ip, realms: also allow to pass in raw realms value

If get_rt_realms() fails, try to parse the argument as a raw u32 realms
value for the u32 RTA_FLOW/FRA_FLOW attribute, since it can be useful to
configure the hex value directly. Only if that also fails, bail out.

The source realm is provided in the upper u16 (mask: 0xffff0000)
and the destination realm in the lower u16 (mask: 0x0000ffff).
This can be useful for tc's bpf realm matcher; iptables' --realm
command-line option, for example, already accepts a full hex
value/mask for matching.
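
The bit layout described above can be made concrete with a few helpers. These are hypothetical functions written for illustration (they are not part of iproute2): realms_pack() and friends apply the stated masks, and parse_raw_realms() sketches the raw-u32 fallback that would only be tried after named-realm parsing has failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers, not iproute2 functions: source realm in the upper
 * u16 (mask 0xffff0000), destination realm in the lower u16 (0x0000ffff). */
static uint32_t realms_pack(uint16_t src, uint16_t dst)
{
	return ((uint32_t)src << 16) | dst;
}

static uint16_t realms_src(uint32_t realms)
{
	return (uint16_t)((realms & 0xffff0000u) >> 16);
}

static uint16_t realms_dst(uint32_t realms)
{
	return (uint16_t)(realms & 0x0000ffffu);
}

/* Fallback used only after named-realm parsing failed: accept a raw
 * u32 such as "0x00010002" (decimal and octal are accepted as well). */
static int parse_raw_realms(uint32_t *out, const char *arg)
{
	unsigned long v;
	char *end;

	if (strchr(arg, '-'))	/* strtoul() would silently negate */
		return -1;
	errno = 0;
	v = strtoul(arg, &end, 0);
	if (errno || end == arg || *end != '\0' || v > UINT32_MAX)
		return -1;
	*out = (uint32_t)v;
	return 0;
}
```

For example, source realm 1 and destination realm 2 pack to 0x00010002, which is also the raw value the fallback accepts.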

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>