Thomas Egerer [Mon, 30 Oct 2017 18:11:46 +0000 (19:11 +0100)]
xfrm_{state, policy}: Allow to deleteall polices/states with marks
Using 'ip deleteall' with policies that have marks fails unless you
explicitly specify the mark values. This is very inconvenient when
bulk-deleting policies and states. With this patch, all relevant states
and policies are wiped by 'ip deleteall' regardless of their mark
values.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
Thomas Egerer [Mon, 30 Oct 2017 18:11:45 +0000 (19:11 +0100)]
xfrm_policy: Do not attempt to deleteall a socket policy
Socket policies are added to a socket using setsockopt(2). They cannot be
deleted by iproute2. The attempt to delete them causes an error
(EINVAL).
To avoid this unnecessary error message all socket policies are skipped
in xfrm_policy_keep.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
Thomas Egerer [Mon, 30 Oct 2017 18:11:44 +0000 (19:11 +0100)]
xfrm_policy: Add filter option for socket policies
Listing policies on systems with a lot of socket policies can be
confusing due to the number of returned policies. Even if socket policies
are not of interest, they cannot be filtered. This patch adds an option
to filter all socket policies from the output.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
Stefano Brivio [Tue, 31 Oct 2017 17:47:56 +0000 (18:47 +0100)]
ss: Fix width calculations when Netid or State columns are missing
If Netid or State columns are missing, we must not subtract one
for each of these two columns from the remaining screen width,
while distributing available space to columns. This one
character corresponding to one delimiting space has to be
subtracted only if the columns are actually printed.
Further, in the existing implementation, if the screen width is
an odd number, one additional character is added to the width of
one of the two columns.
But if both are not printed, this filling character needs to be
added somewhere else, in order to have the right spacing
allowing us to fill lines completely.
Address and port fields are printed in pairs (local and remote),
so we can't distribute the space to any of them, because it
would be doubled. Instead, print this additional space to the
right of the Send-Q column, to keep code changes to a minimum.
This is particularly visible with 'ss -f netlink -Z'. Before
this patch, with an 80 column terminal, we have:
$ ss -f netlink -Z|head -n3
Recv-Q Send-Q Local Address:Port Peer Address:Port
0 0 rtnl:evolution-calen/2049 * pro
c_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
0 0 rtnl:clock-applet/1944 * pro
c_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
After this patch, in both cases, the output is:
$ ss -f netlink -Z|head -n3
Recv-Q Send-Q Local Address:Port Peer Address:Port
0 0 rtnl:evolution-calen/2049 *
proc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
0 0 rtnl:clock-applet/1944 *
proc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
Stefano Brivio [Tue, 31 Oct 2017 17:47:54 +0000 (18:47 +0100)]
ss: Remove useless width specifier in process context print
Both local address and service, and remote address and service
fields are already printed out in netlink_show_one() before we
start printing process context, by calling sock_addr_print()
twice.
At this point, sock_addr_print() has already forced the remote
service field to be 'serv_width' wide -- that is, 'serv_width'
width has already been consumed, before we print process
context.
Hence, it makes no sense to force the display width of process
context to be 'serv_width' wide again: previous prints have
filled up the line already. Remove the width specifier and
prefix with a space instead, to keep this consistent with fields
which are displayed after the first output line.
Roman Mashak [Tue, 31 Oct 2017 18:24:19 +0000 (14:24 -0400)]
ip netns: use strtol() instead of atoi()
Use strtol-based API to parse and validate integer input; atoi() does
not detect errors and may yield undefined behaviour if result can't be
represented.
v2: use get_unsigned() since network namespace is really an unsigned value.
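A minimal sketch of strtoul()-based validation, for illustration only. The helper name parse_unsigned() is hypothetical and this is not the body of iproute2's get_unsigned(); it only shows the kinds of input errors that atoi() cannot report.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not iproute2 code): parse an unsigned integer.
 * Returns 0 and stores the value in *val on success, -1 on any error.
 */
int parse_unsigned(unsigned int *val, const char *arg)
{
	unsigned long res;
	char *end;

	/* Reject NULL, empty strings, signs and leading whitespace up front;
	 * strtoul() itself would silently accept "-1" and " 1". */
	if (arg == NULL || !isdigit((unsigned char)*arg))
		return -1;

	errno = 0;
	res = strtoul(arg, &end, 0);	/* base 0: decimal, 0x hex, 0 octal */

	if (end == arg || *end != '\0')
		return -1;		/* no digits or trailing garbage */
	if (errno == ERANGE || res > UINT_MAX)
		return -1;		/* value does not fit */

	*val = (unsigned int)res;
	return 0;
}
```

With atoi(), inputs such as "12abc" or an out-of-range number would be accepted or misparsed without any indication; here they all yield -1.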
... if we exceed rate of 1kbps (burst of 90K), do an absolute jump of 2 actions
sudo $TC actions add action police rate 1kbit burst 90k conform-exceed jump 2 / pipe
... let's add a couple of marks so we can use them to mark exceed/not exceed
sudo $TC actions add action skbedit mark 11 ok index 11
sudo $TC actions add action skbedit mark 12 ok index 12
... if we don't exceed our rate we get a mark of 11, else mark of 12
sudo $TC filter add dev $ETH parent ffff: protocol ip prio 8 u32 \
match ip dst 127.0.0.8/32 flowid 1:10 \
action police index 4 \
action skbedit index 11 \
action skbedit index 12
OK, let's keep this thing a little busy..
sudo ping -f -c 10000 127.0.0.8
... now let's see the filters..
sudo $TC -s filter ls dev $ETH parent ffff: protocol ip
filter pref 8 u32 chain 0
filter pref 8 u32 chain 0 fh 800: ht divisor 1
filter pref 8 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 not_in_hw (rule hit 20000 success 10000)
match 7f000008/ffffffff at 16 (success 10000 )
action order 1: police 0x4 rate 1Kbit burst 23440b mtu 2Kb action jump 2/pipe overhead 0b
ref 2 bind 1 installed 198 sec used 2 sec
Action statistics:
Sent 840000 bytes 10000 pkt (dropped 0, overlimits 9721 requeues 0)
backlog 0b 0p requeues 0
action order 2: skbedit mark 11 pass
index 11 ref 2 bind 1 installed 127 sec used 2 sec
Action statistics:
Sent 23436 bytes 279 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
action order 3: skbedit mark 12 pass
index 12 ref 2 bind 1 installed 127 sec used 2 sec
Action statistics:
Sent 816564 bytes 9721 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
As can be seen 97.21% of the packets were marked as exceeding the allocated
rate; you could do something clever with the skb mark after this.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Michal Kubecek [Thu, 19 Oct 2017 08:21:08 +0000 (10:21 +0200)]
ip maddr: fix filtering by device
Commit 530903dd9003 ("ip: fix igmp parsing when iface is long") uses
variable len to keep the trailing colon out of the interface name comparison.
This variable is local to the loop body, but we set it in one pass and use it
in the following one(s), so we are actually using a (pseudo)random length for
comparison. This became apparent since commit b48a1161f5f9 ("ipmaddr: Avoid
accessing uninitialized data") always initializes len to zero so that the
name comparison is always true. As a result, "ip maddr show dev eth0" shows
IPv4 multicast addresses for all interfaces.
Instead of keeping the length, let's simply replace the trailing colon with
a null byte. The bonus is that we get correct interface name in ma.name.
Fixes: 530903dd9003 ("ip: fix igmp parsing when iface is long")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Phil Sutter <phil@nwl.cc>
Acked-by: Petr Vorel <pvorel@suse.cz>
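The colon handling described in the ip maddr fix can be sketched as follows. This is an illustrative stand-alone helper with a hypothetical name, not the code from ipmaddr.c: it terminates the name in place, so later comparisons can use plain strcmp() and no length needs to be carried between loop iterations.

```c
#include <string.h>

/* Hypothetical helper: replace a trailing ':' (if any) with a NUL byte. */
void strip_trailing_colon(char *name)
{
	size_t len = strlen(name);

	if (len > 0 && name[len - 1] == ':')
		name[len - 1] = '\0';
}
```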
Phil Sutter [Wed, 18 Oct 2017 17:58:13 +0000 (19:58 +0200)]
ss: Distinguish between IPv4 and IPv6 wildcard sockets
Commit aba9c23a6e1cb ("ss: enclose IPv6 address in brackets") unified
display of wildcard sockets in IPv4 and IPv6 to print the unspecified
address as '*'. Users then complained that they can't distinguish
between address families anymore, so change this again to what Stephen
Hemminger suggested:
| *:80 << both IPV6 and IPV4
| [::]:80 << IPV6_ONLY
| 0.0.0.0:80 << IPV4_ONLY
Note that on older kernels which don't support INET_DIAG_SKV6ONLY
attribute, pure IPv6 sockets will still show as '*'.
Cc: Humberto Alves <hjalves@live.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Petr Vorel [Fri, 13 Oct 2017 13:57:17 +0000 (15:57 +0200)]
color: Fix another ip segfault when using --color switch
Commit 959f1428 ("color: add new COLOR_NONE and disable_color function")
introduced the color enum COLOR_NONE, which not only duplicates
COLOR_CLEAR but also caused a segfault when running ip with the --color
switch, as 'attr + 8' in color_fprintf() accesses an array item out of
bounds. Thus remove it and restore the "magic" offset + 7.
Petr Vorel [Fri, 13 Oct 2017 13:57:16 +0000 (15:57 +0200)]
color: Fix ip segfault when using --color switch
Commit d0e72011 ("ip: ipaddress.c: add support for json output")
introduced passing -1 as enum color_attr. This is not only wrong, as no
color_attr has value -1, but also causes another segfault in color_fprintf()
on this setup, as there is no item with index -1 in the array of enum attr_colors[].
Using COLOR_CLEAR is a valid option.
Reproduce with:
$ COLORFGBG='0;15' ip -c a
NOTE: COLORFGBG is an environment variable used to define whether the user
has a light or dark background.
COLORFGBG="0;15" is used to ask for color set suitable for light background,
COLORFGBG="15;0" is used to ask for color set suitable for dark background.
Ivan Delalande [Fri, 6 Oct 2017 23:48:20 +0000 (16:48 -0700)]
ss: print MD5 signature keys configured on TCP sockets
These keys are reported by kernel 4.14 and later under the
INET_DIAG_MD5SIG attribute, when INET_DIAG_INFO is requested (ss -i)
and we have CAP_NET_ADMIN. The additional output looks like:
Ivan Delalande [Fri, 6 Oct 2017 23:48:19 +0000 (16:48 -0700)]
utils: add print_escape_buf to format and print arbitrary bytes
Keep it as simple as possible for now: just escape anything that is not
isprint-able, is among the "escape" parameter or '\' as an octal escape
sequence. This should be pretty easy to extend if any other user needs
something more complex in the future.
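The escaping rule described for print_escape_buf can be sketched as below. This is an assumption-laden illustration rather than the iproute2 function: the name escape_buf() and its buffer-based signature are hypothetical (the commit describes a function that prints), chosen here so the result is easy to inspect.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: copy buf into out, writing every byte that is not
 * printable, is a backslash, or is listed in 'escape' as a three-digit
 * octal escape. Output is always NUL-terminated and truncated on a whole
 * unit if out is too small. Returns the number of characters written.
 */
size_t escape_buf(char *out, size_t outsz,
		  const unsigned char *buf, size_t len, const char *escape)
{
	size_t i, pos = 0;

	if (outsz == 0)
		return 0;

	for (i = 0; i < len; i++) {
		unsigned char c = buf[i];
		int plain = isprint(c) && c != '\\' && !strchr(escape, c);
		size_t need = plain ? 1 : 4;	/* "\ooo" is four characters */

		if (pos + need >= outsz)
			break;			/* keep room for the NUL */
		if (plain)
			out[pos++] = (char)c;
		else
			pos += (size_t)snprintf(out + pos, outsz - pos,
						"\\%03o", (unsigned int)c);
	}
	out[pos] = '\0';
	return pos;
}
```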
Baruch Siach [Mon, 9 Oct 2017 05:49:44 +0000 (08:49 +0300)]
lib: fix multiple strlcpy definition
Some C libraries, like uClibc and musl, provide BSD compatible
strlcpy(). Add check_strlcpy() to configure, and avoid defining strlcpy
and strlcat when the C library provides them.
This fixes the following static link error with uClibc-ng:
.../sysroot/usr/lib/libc.a(strlcpy.os): In function `strlcpy':
strlcpy.c:(.text+0x0): multiple definition of `strlcpy'
../lib/libutil.a(utils.o):utils.c:(.text+0x1ddc): first defined here
collect2: error: ld returned 1 exit status
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Lorenzo Colitti [Mon, 2 Oct 2017 17:03:37 +0000 (02:03 +0900)]
iproute: build more easily on Android
iproute2 contains a bunch of kernel headers, including uapi ones.
Android's libc uses uapi headers almost directly, and uses a
script to fix kernel types that don't match what userspace
expects.
For example: https://issuetracker.google.com/36987220 reports
that our struct ip_mreq_source contains "__be32 imr_multiaddr"
rather than "struct in_addr imr_multiaddr". The script addresses
this by replacing the uapi struct definition with a #include
<bits/ip_mreq.h> which contains the traditional userspace
definition.
Unfortunately, when we compile iproute2, this definition
conflicts with the one in iproute2's linux/in.h.
Historically we've just solved this problem by running "git rm"
on all the iproute2 include/linux headers that break Android's
libc. However, deleting the files in this way makes it harder to
keep up with upstream, because every upstream change to
an include file causes a merge conflict with the delete.
This patch fixes the problem by moving the iproute2 linux headers
from include/linux to include/uapi/linux.
Tested: compiles on ubuntu trusty (glibc)
Signed-off-by: Elliott Hughes <enh@google.com>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Phil Sutter [Mon, 2 Oct 2017 11:46:37 +0000 (13:46 +0200)]
Check user supplied interface name lengths
The original problem was that something like:
| strncpy(ifr.ifr_name, *argv, IFNAMSIZ);
might leave ifr.ifr_name unterminated if length of *argv exceeds
IFNAMSIZ. In order to fix this, I thought about replacing all those
cases with (equivalent) calls to snprintf() or even introducing
strlcpy(). But as Ulrich Drepper correctly pointed out when rejecting
the latter from being added to glibc, truncating a string without
notifying the user is not to be considered good practice. So let's
exercise what he suggested and reject empty, overlong or otherwise
invalid interface names right from the start - this way calls to
strncpy() like shown above become safe and the user has a chance to
reconsider what he was trying to do.
Note that this doesn't add calls to check_ifname() to all places where
user supplied interface name is parsed. In many cases, the interface
must exist already and is therefore looked up using ll_name_to_index(),
so if_nametoindex() will perform the necessary checks already.
Phil Sutter [Mon, 2 Oct 2017 11:46:35 +0000 (13:46 +0200)]
ip{6, }tunnel: Avoid copying user-supplied interface name around
In both files' parse_args() functions as well as in iptunnel's do_prl()
and do_6rd() functions, a user-supplied 'dev' parameter is uselessly
copied into a temporary buffer before passing it to ll_name_to_index()
or copying into a struct ifreq. Avoid this by just caching the argv
pointer value until the later lookup/strcpy.
Michal Kubecek [Fri, 29 Sep 2017 11:41:05 +0000 (13:41 +0200)]
ip xfrm: use correct key length for netlink message
When SA is added manually using "ip xfrm state add", xfrm_state_modify()
uses alg_key_len field of struct xfrm_algo for the length of key passed to
kernel in the netlink message. However alg_key_len is bit length of the key
while we need byte length here. This is usually harmless as kernel ignores
the excess data but when the bit length of the key exceeds 512
(XFRM_ALGO_KEY_BUF_SIZE), it can result in buffer overflow.
We can simply divide by 8 here as the only place setting alg_key_len is in
xfrm_algo_parse() where it is always set to a multiple of 8 (and there are
already multiple places using "algo->alg_key_len / 8").
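The bit-versus-byte mix-up can be illustrated with a simplified stand-in for struct xfrm_algo. The struct demo_algo and the helper demo_algo_msg_len() below are hypothetical and exist only for this sketch; the point is that a key length kept in bits must be divided by 8 before it is used as a payload size.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-in for struct xfrm_algo. */
struct demo_algo {
	char alg_name[64];
	unsigned int alg_key_len;	/* key length in bits */
	char alg_key[];			/* key bytes follow the header */
};

/* Number of bytes to put into the message: header plus key in bytes. */
size_t demo_algo_msg_len(const struct demo_algo *algo)
{
	return sizeof(*algo) + algo->alg_key_len / 8;
}
```

For a 256-bit key this yields the header size plus 32 bytes, whereas using alg_key_len directly would claim 256 bytes of key material.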
lib: json_print: rework 'new_json_obj' drop FILE* argument
As Stephen Hemminger mentioned on the last submission, the new_json_obj
function is always called with fp == stdout, so right now there's no
need for this extra argument.
The background for the rework is the following:
The ip monitor didn't call `new_json_obj` (even in a non-JSON context),
so the static FILE* _fp variable wasn't initialized, thus raising a
SIGSEGV in ipaddress.c. This patch should fix this issue for good; new
paths won't have to call `new_json_obj`.
How to reproduce:
$ ip -t mon label link
(gdb) bt
.#0 _IO_vfprintf_internal (s=s@entry=0x0, format=format@entry=0x45460d "%d: ", ap=ap@entry=0x7fffffff7f18) at vfprintf.c:1278
.#1 0x0000000000451310 in color_fprintf (fp=0x0, attr=<optimized out>, fmt=0x45460d "%d: ") at color.c:108
.#2 0x000000000044a856 in print_color_int (t=t@entry=PRINT_ANY, color=color@entry=4294967295, key=key@entry=0x4545fc "ifindex",
fmt=fmt@entry=0x45460d "%d: ", value=<optimized out>) at ip_print.c:132
.#3 0x000000000040ccd2 in print_int (value=<optimized out>, fmt=0x45460d "%d: ", key=0x4545fc "ifindex", t=PRINT_ANY) at ip_common.h:189
.#4 print_linkinfo (who=<optimized out>, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipaddress.c:1107
.#5 0x0000000000422e13 in accept_msg (who=0x7fffffff8320, ctrl=0x7fffffff8310, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipmonitor.c:89
.#6 0x000000000044c58f in rtnl_listen (rtnl=0x672160 <rth>, handler=handler@entry=0x422c70 <accept_msg>, jarg=0x7ffff77a82a0 <_IO_2_1_stdout_>)
at libnetlink.c:761
.#7 0x00000000004233db in do_ipmonitor (argc=<optimized out>, argv=0x7fffffffe5a0) at ipmonitor.c:310
.#8 0x0000000000408f74 in do_cmd (argv0=0x7fffffffe7f5 "mon", argc=3, argv=0x7fffffffe588) at ip.c:116
.#9 0x0000000000408a94 in main (argc=4, argv=0x7fffffffe580) at ip.c:311
Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
Daniel Borkmann [Thu, 21 Sep 2017 08:42:29 +0000 (10:42 +0200)]
bpf: properly output json for xdp
After merging net-next branch into master, Stephen asked
to fix up json dump for XDP. Thus, rework the json dump a
bit, such that 'ip -json l' looks as below.
Daniel Borkmann [Thu, 21 Sep 2017 08:42:28 +0000 (10:42 +0200)]
json: move json printer to common library
Move the json printer which is based on json writer into the
iproute2 library, so it can be used by library code and tools
other than ip. Should probably have been done from the beginning
like that given json writer is in the library already anyway.
No functional changes.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Julien Fortin <julien@cumulusnetworks.com>
Phil Sutter [Tue, 12 Sep 2017 14:58:12 +0000 (16:58 +0200)]
ipaddress: Fix segfault in 'addr showdump'
Obviously, 'addr showdump' feature wasn't adjusted to json output
support. As a consequence, calls to print_string() in print_addrinfo()
tried to dereference a NULL FILE pointer.
Fixes: d0e720111aad2 ("ip: ipaddress.c: add support for json output")
Signed-off-by: Phil Sutter <phil@nwl.cc>
devlink: Add support for special format protocol headers
In case of global header (protocol header), the header:field ids are used
to look up a special format printer. If no such printer exists,
fall back to plain value printing.
This patch decouples the match/action parsing from printing. This is
done as a preparation for adding the ability to print global header
values, for example print IPv4 address, which require special formatting.
Phil Sutter [Wed, 6 Sep 2017 16:51:42 +0000 (18:51 +0200)]
utils: strlcpy() and strlcat() don't clobber dst
As David Laight correctly pointed out, the first version of strlcpy()
modified dst buffer behind the string copied into it. Fix this by
writing NUL to the byte immediately following src string instead of to
the last byte in dst. Doing so also allows reducing overhead by using
memcpy().
Improve strlcat() by avoiding the call to strlcpy() if dst string is
already full, not just as sanity check.
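A sketch of the strlcpy() behaviour described above, under a different name so it cannot clash with a C library that already provides strlcpy(). The name demo_strlcpy() is hypothetical and this is an illustration of the technique, not a copy of the code in lib/utils.c.

```c
#include <string.h>

/*
 * Hypothetical sketch: copy at most size - 1 bytes of src into dst and
 * write the NUL right after the copied bytes. Bytes of dst beyond that
 * NUL are left untouched. Returns strlen(src), so callers can detect
 * truncation by comparing the result against size.
 */
size_t demo_strlcpy(char *dst, const char *src, size_t size)
{
	size_t srclen = strlen(src);

	if (size > 0) {
		size_t n = srclen < size - 1 ? srclen : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return srclen;
}
```

A return value greater than or equal to size tells the caller that the string was truncated.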
Daniel Borkmann [Tue, 5 Sep 2017 00:24:32 +0000 (02:24 +0200)]
bpf: consolidate dumps to use bpf_dump_prog_info
Consolidate dump of prog info to use bpf_dump_prog_info() when possible.
Moving forward, we want to have a consistent output for BPF progs when
being dumped. E.g. in cls/act case we used to dump tag as a separate
netlink attribute before we had BPF_OBJ_GET_INFO_BY_FD bpf(2) command.
Move dumping tag into bpf_dump_prog_info() as well, and only dump the
netlink attribute for older kernels. Also, reuse bpf_dump_prog_info()
for XDP case, so we can dump tag and whether program was jited, which
we currently don't show.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Simon Horman [Tue, 5 Sep 2017 11:06:24 +0000 (13:06 +0200)]
tc actions: store and dump correct length of user cookies
Correct two errors which cancel each other out:
* Do not send twice the length of the actual cookie provided by the user to the kernel
* Do not dump half the length of the cookie provided by the kernel
As the cookie is now stored in the kernel at its correct length rather
than double that length, cookies of up to the maximum size of 16 bytes
may now be stored rather than a maximum of half that length.
Output of dump is the same before and after this change,
but the data stored in the kernel is now exactly the cookie
rather than the cookie plus as many trailing zeros.
Before:
# tc filter add dev eth0 protocol ip parent ffff: \
flower ip_proto udp action drop \
cookie 0123456789abcdef0123456789abcdef
RTNETLINK answers: Invalid argument
After:
# tc filter add dev eth0 protocol ip parent ffff: \
flower ip_proto udp action drop \
cookie 0123456789abcdef0123456789abcdef
# tc filter show dev eth0 ingress
eth_type ipv4
ip_proto udp
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1 installed 1 sec used 1 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie len 16 0123456789abcdef0123456789abcdef
Fixes: fd8b3d2c1b9b ("actions: Add support for user cookies")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
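The length relationship behind the cookie fix (two hex digits per stored byte) can be sketched as follows. The helper hex_to_bytes() is hypothetical and is not the parser tc uses; it only shows that a 32-digit cookie string corresponds to 16 bytes, the maximum mentioned above.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical sketch: decode a hex string into out. Returns the number
 * of bytes written (strlen(hex) / 2), or -1 on odd length, invalid
 * digits, or if the result would not fit into outsz bytes.
 */
int hex_to_bytes(const char *hex, unsigned char *out, size_t outsz)
{
	size_t len = strlen(hex), i;

	if (len % 2 != 0 || len / 2 > outsz)
		return -1;

	for (i = 0; i < len / 2; i++) {
		int hi = hex_digit(hex[2 * i]);
		int lo = hex_digit(hex[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		out[i] = (unsigned char)((hi << 4) | lo);
	}
	return (int)(len / 2);
}
```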
Phil Sutter [Tue, 29 Aug 2017 15:09:45 +0000 (17:09 +0200)]
lib/bpf: Fix bytecode-file parsing
The signedness of char type is implementation dependent, and there are
architectures on which it is unsigned by default. In that case, the
check whether fgetc() returned EOF failed because the return value was
assigned to an (unsigned) char variable prior to comparison with EOF (which
is typically defined as -1). Fix this by using int as type for 'c' variable, which
also matches the declaration of fgetc().
While at it, fix the parser logic to correctly handle multiple
empty lines and consecutive whitespace and tab characters to further
improve the parser's robustness. Note that this will still detect double
separator characters, so doesn't soften up the parser too much.
Fixes: 3da3ebfca85b8 ("bpf: Make bytecode-file reading a little more robust")
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
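The EOF pitfall fixed in lib/bpf can be shown with a tiny reader loop. The function count_bytes() is a hypothetical example, not part of iproute2: because 'c' is an int, a data byte of 0xff and the EOF marker remain distinguishable regardless of whether plain char is signed or unsigned on the target.

```c
#include <stdio.h>

/* Hypothetical example: count the bytes remaining in a stream. */
size_t count_bytes(FILE *fp)
{
	int c;		/* int, matching the return type of fgetc() */
	size_t n = 0;

	while ((c = fgetc(fp)) != EOF)
		n++;
	return n;
}
```

With an unsigned char variable the comparison against EOF would never be true and the loop would not terminate; with a signed char a 0xff data byte would end it early.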
Michal Kubecek [Fri, 1 Sep 2017 16:39:16 +0000 (18:39 +0200)]
iplink: double the buffer size also in iplink_get()
Commit 72b365e8e0fd ("libnetlink: Double the dump buffer size") increased
the buffer size for "ip link show" command to 32 KB to handle NICs with
large number of VFs. With "dev" filter, a different code path is taken and
iplink_get() still uses only 16 KB buffer.
The size of 32768 is not very future-proof as NICs supporting 120-128 VFs
are already in use so that a single RTM_NEWLINK message in the dump can
exceed 30000 bytes. But it's what rtnl_talk() and rtnl_dump_filter_l() use
so let's be consistent. Once this proves insufficient, all three sizes
should be increased.
Michal Kubecek [Fri, 1 Sep 2017 16:39:11 +0000 (18:39 +0200)]
iplink: check for message truncation in iplink_get()
If message length exceeds maxlen argument of rtnl_talk(), it is truncated
to maxlen but unlike in the case of truncation to the length of local
buffer in rtnl_talk(), the caller doesn't get any indication of a problem.
In particular, iplink_get() passes the truncated message on and parsing it
results in various warnings and sometimes even a segfault (observed with
"ip link show dev ..." for a NIC with 125 VFs).
Handle message truncation in iplink_get() the same way as truncation in
rtnl_talk() would be handled: return an error.
Phil Sutter [Fri, 1 Sep 2017 14:08:08 +0000 (16:08 +0200)]
link_gre6: Fix for changing tclass/flowlabel
When trying to change tclass or flowlabel of a GREv6 tunnel which has
the respective value set already, the code accidentally bitwise OR'ed
the old and the new value, leading to unexpected results. Fix this by
clearing the relevant bits of flowinfo variable prior to assigning the
new value.
Fixes: af89576d7a8c4 ("iproute2: GRE over IPv6 tunnel support.")
Signed-off-by: Phil Sutter <phil@nwl.cc>
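The clear-then-set pattern from the link_gre6 fix, sketched in host byte order for readability. The DEMO_* masks and the helper names are hypothetical; the real code works on a network-byte-order flowinfo value, which this sketch deliberately ignores.

```c
#include <stdint.h>

/* Assumed layout: 8 bits of traffic class above 20 bits of flow label. */
#define DEMO_TCLASS_MASK	0x0FF00000u
#define DEMO_FLOWLABEL_MASK	0x000FFFFFu

/* Replace the traffic class, leaving all other bits untouched. */
uint32_t set_tclass(uint32_t flowinfo, uint8_t tclass)
{
	flowinfo &= ~DEMO_TCLASS_MASK;		/* drop the old value first */
	flowinfo |= ((uint32_t)tclass << 20) & DEMO_TCLASS_MASK;
	return flowinfo;
}

/* Replace the flow label, leaving all other bits untouched. */
uint32_t set_flowlabel(uint32_t flowinfo, uint32_t label)
{
	flowinfo &= ~DEMO_FLOWLABEL_MASK;
	flowinfo |= label & DEMO_FLOWLABEL_MASK;
	return flowinfo;
}
```

Without the clearing step, changing tclass from 0x0f to 0xf0 would leave 0xff in the field, which is the kind of unexpected result the commit describes.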
Phil Sutter [Mon, 28 Aug 2017 17:31:22 +0000 (19:31 +0200)]
ss: Fix for added diag support check
Commit 9f66764e308e9 ("libnetlink: Add test for error code returned from
netlink reply") changed rtnl_dump_filter_l() to return an error in case
NLMSG_DONE would contain one, even if it was ENOENT.
This in turn breaks ss when it tries to dump DCCP sockets on a system
without support for it: The function tcp_show(), which is shared between
TCP and DCCP, will start parsing /proc since inet_show_netlink() returns
an error - yet it parses /proc/net/tcp which doesn't make sense for DCCP
sockets at all.
On my system, a call to 'ss' without further arguments prints the list
of connected TCP sockets twice.
Fix this by introducing a dedicated function dccp_show() which does not
have a fallback to /proc, just like sctp_show(). And since tcp_show()
is no longer "multi-purpose", drop its socktype parameter.
Fixes: 9f66764e308e9 ("libnetlink: Add test for error code returned from netlink reply")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Thu, 24 Aug 2017 09:41:30 +0000 (11:41 +0200)]
lib/fs: Fix and simplify make_path()
Calling stat() before mkdir() is racy: The entry might change in
between. Also, the call to stat() seems to exist only to check if the
directory exists already. So simply call mkdir() unconditionally and
catch only errors other than EEXIST.
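The mkdir()-without-stat() approach might look roughly like this for a single path component. The helper ensure_dir() is a hypothetical name for this sketch and is not the make_path() implementation itself.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical sketch: create a directory, treating "already exists" as
 * success. Note that EEXIST is also reported when the path exists but is
 * not a directory; this sketch does not distinguish that case.
 * Returns 0 on success, -1 on any other error (errno is left set).
 */
int ensure_dir(const char *path, mode_t mode)
{
	if (mkdir(path, mode) != 0 && errno != EEXIST)
		return -1;
	return 0;
}
```

Because there is no separate existence check, there is no window in which another process can create or remove the entry between the check and the mkdir() call.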