Jon Maloy [Thu, 17 May 2018 14:02:42 +0000 (16:02 +0200)]
tipc: fixed node and name table listings
We make it easier for users to correlate between 128-bit node
identities and 32-bit node hash number by extending the 'node list'
command to also show the hash number.
We also improve the 'nametable show' command to show the node identity
instead of the node hash number. Since the former potentially is much
longer than the latter, we make room for it by eliminating the (to the
user) irrelevant publication key. We also reorder some of the columns so
that the node id comes last, since this looks nicer and is more logical.
Currently there is no way to log offloading errors if the rule is not
explicitly marked as skip_sw, making it hard for other applications such
as Open vSwitch to log why a given could not be offloaded.
This patch adds support for signaling the kernel that more verbose
logging is wanted, which now will include such messages.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Luca Boccassi [Fri, 11 May 2018 12:39:56 +0000 (13:39 +0100)]
ip: do not drop capabilities if net_admin=i is set
Users have reported a regression due to ip now dropping capabilities
unconditionally.
zerotier-one VPN and VirtualBox use ambient capabilities in their
binary and then fork out to ip to set routes and links, and this
does not work anymore.
As a workaround, do not drop caps if CAP_NET_ADMIN (the most common
capability used by ip) is set with the INHERITABLE flag.
Users that want ip vrf exec to work do not need to set INHERITABLE,
which will then only set when the calling program had privileges to
give itself the ambient capability.
Fixes: ba2fc55b99f8 ("Drop capabilities if not running ip exec vrf with libcap") Signed-off-by: Luca Boccassi <bluca@debian.org>
Ss was using slabinfo to try and intuit TCP statistics.
The slabinfo changed several times since 2.4 and all these statistics
are broken by renames and slab merging. Plus slabinfo does not exist
at all if kernel is compiled with SLUB option.
Rather than trying to fix kernel, just trim away the no longer
valid statistics.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Baruch Siach [Tue, 1 May 2018 12:43:08 +0000 (15:43 +0300)]
arpd: remove pthread dependency
Explicit link with pthread is not needed when linking dynamically. Even
static link with recent libdb does not pull in the code that uses
pthread. Finally, the configure check introduced in commit a25df4887d7
(configure: Check for Berkeley DB for arpd compilation) does not add
-lpthread to its link command.
This change allows arpd build with toolchains that do not provide
threads support.
Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
print_uint() will silently promote its variable type to uint64_t, but there
is nothing that ensures that the format string specifier passed along with
it fits (and the function name suggest to pass "%u").
Fix this by changing print_uint() to use a native 'unsigned int' type, and
introduce a separate print_u64() function for printing 64-bit values. All
call sites that were actually printing 64-bit values using print_uint() are
converted to use print_u64() instead.
Since print_int() was already using native int types, just add a
print_s64() to match, but don't convert any call sites. For symmetry,
also add a print_luint() method (with no users).
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Currently, iproute allows setting those flags, but it's impossible to
clear them, since their current value is fetched from the kernel and
then we OR in the additional flags passed on the command line.
Add no* variants to allow clearing them.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David Ahern <dsahern@gmail.com>
GRE tunnels are currently only documented together with IPIP and SIT
tunnels, but they actually have very different configuration
options. Let's separate them.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David Ahern <dsahern@gmail.com>
Jakub Kicinski [Wed, 18 Apr 2018 18:06:07 +0000 (11:06 -0700)]
iplink_geneve: correct size of message to avoid spurious errors
Commit 6c4b672738ac ("iplink_geneve: Get rid of inet_get_addr()")
inadvertently changed the parameter to addattr_l() resulting in:
addattr_l ERROR: message exceeded bound of 4
when remote is specified.
Fixes: 6c4b672738ac ("iplink_geneve: Get rid of inet_get_addr()") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Hangbin Liu [Wed, 18 Apr 2018 05:05:48 +0000 (13:05 +0800)]
vxlan: fix ttl inherit behavior
Like kernel net-next commit 72f6d71e491e6 ("vxlan: add ttl inherit support"),
vxlan ttl inherit should means inherit the inner protocol's ttl value.
But currently when we add vxlan with "ttl inherit", we only set ttl 0,
which is actually use whatever default value instead of inherit the inner
protocol's ttl value.
To make a difference with ttl inherit and ttl == 0, we add an attribute
IFLA_VXLAN_TTL_INHERIT when "ttl inherit" specified. And use "ttl auto"
to means "use whatever default value", the same behavior with ttl == 0.
Reported-by: Jianlin Shi <jishi@redhat.com> Suggested-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com>
David Ahern [Fri, 13 Apr 2018 16:36:33 +0000 (09:36 -0700)]
utils: Do not reset family for default, any, all addresses
Thomas reported a change in behavior with respect to autodectecting
address families. Specifically, 'ip ro add default via fe80::1'
syntax was failing to treat fe80::1 as an IPv6 address as it did in
prior releases. The root causes appears to be a change in family when
the default keyword is parsed.
'default', 'any' and 'all' are relevant outside of AF_INET. Leave the
family arg as is for these when setting addr.
Fixes: 93fa12418dc6 ("utils: Always specify family and ->bytelen in get_prefix_1()") Reported-by: Thomas Deutschmann <whissi@gentoo.org> Signed-off-by: David Ahern <dsahern@gmail.com> Cc: Serhey Popovych <serhe.popovych@gmail.com>
Jakub Sitnicki [Wed, 11 Apr 2018 09:43:11 +0000 (11:43 +0200)]
iproute: Abort if nexthop cannot be parsed
Attempt to add a multipath route where a nexthop definition refers to a
non-existent device causes 'ip' to crash and burn due to stack buffer
overflow:
Fixes: 64108901b737 ("bridge: Add support for setting bridge port attributes") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Leon Romanovsky [Tue, 3 Apr 2018 04:29:14 +0000 (07:29 +0300)]
rdma: Print net device name and index for RDMA device
The RDMA devices are operated in RoCE and iWARP modes have net device
underneath. Present their names in regular output and their net index
in detailed mode.
[root@nps ~]# rdma link show mlx5_3/1
4/1: mlx5_3/1: state ACTIVE physical_state LINK_UP netdev ens7
[root@nps ~]# rdma link show mlx5_3/1 -d
4/1: mlx5_3/1: state ACTIVE physical_state LINK_UP netdev ens7 netdev_index 7
caps: <CM, IP_BASED_GIDS>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David Ahern <dsahern@gmail.com>
l2tp: no need to export session offsets in JSON output
The offset and peer_offset parameters are only printed to avoid
confusing external scripts that may parse "ip l2tp show session"
output. There's no reason to keep them in JSON.
Commit 9fd3f0b255d9 ("tc: enable json output for actions") added JSON
support for tc-actions at the expense of breaking other use cases that
reach tc_print_action(), as the latter don't expect the 'actions' array
to be a new object.
Consider the following taken duringrun of tc_chain.sh selftest,
and see the latter command output is broken:
Leon Romanovsky [Tue, 3 Apr 2018 07:28:42 +0000 (10:28 +0300)]
rdma: Ignore unknown netlink attributes
The check if netlink attributes supplied more than maximum supported
is to strict and may lead to backward compatibility issues with old
application with a newer kernel that supports new attribute.
CC: Steve Wise <swise@opengridcomputing.com> Fixes: 74bd75c2b68d ("rdma: Add basic infrastructure for RDMA tool") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
David Ahern [Sun, 1 Apr 2018 15:19:21 +0000 (08:19 -0700)]
Merge branch 'rdma-res-tracking' into iproute2-next
Steve Wise says:
====================
This series enhances the iproute2 rdma tool to include dumping of
connection manager id (cm_id), completion queue (cq), memory region (mr),
and protection domain (pd) rdma resources. It is the user-space part of
the kernel resource tracking series merged into rdma-next for 4.17 [1]
and [2].
Changes since v3:
- replaced rdma_cma.h inclusion with UAPI rdma_user_cm.h
- display only device names instead of device/port for cq, mr, and pd
since they are not associated with a specific port.
Changes since v2:
- pull in rdma-core:include/rdma/rdma_cma.h
- 80 column reformat
- add reviewed-by tags
Changes since v1/RFC:
- removed RFC tag
- initialize rd properly to avoid passing a garbage port number
- revert accidental change to qp_valid_filters
- removed cm_id dev/network/transport types
- cm_id ip addrs now passed up as __kernel_sockaddr_storage
- cm_id ip address ports printed as "address:port" strings
- only parse/display memory keys and iova if available
- filter on "users" for cqs and pds
- fixed memory leaks
- removed PD_FLAGS attribute
- filter on "mrlen" for mrs
- filter on "poll-ctx" for cqs
- don't require addrs or qp_type for parsing cm_ids
- only filter optional attrs if they are present
- remove PGSIZE MR attr to match kernel
Steve Wise [Thu, 29 Mar 2018 16:10:44 +0000 (09:10 -0700)]
rdma: Add PD resource tracking information
Sample output:
Without CAP_NET_ADMIN capability:
dev mlx4_0 users 0 pid 0 comm [ib_srpt]
dev mlx4_0 users 0 pid 0 comm [ib_srp]
dev mlx4_0 users 1 pid 0 comm [ib_core]
dev cxgb4_0 users 0 pid 0 comm [ib_srp]
With CAP_NET_ADMIN capability:
dev mlx4_0 local_dma_lkey 0x8000 users 0 pid 0 comm [ib_srpt]
dev mlx4_0 local_dma_lkey 0x8000 users 0 pid 0 comm [ib_srp]
dev mlx4_0 local_dma_lkey 0x8000 users 1 pid 0 comm [ib_core]
dev cxgb4_0 local_dma_lkey 0x0 users 0 pid 0 comm [ib_srp]
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Steve Wise [Thu, 29 Mar 2018 16:10:41 +0000 (09:10 -0700)]
rdma: Add MR resource tracking information
Sample output:
Without CAP_NET_ADMIN:
$ rdma resource show mr mrlen 65536
dev mlx4_0 mrlen 65536 pid 0 comm [nvme_rdma]
dev cxgb4_0 mrlen 65536 pid 0 comm [nvme_rdma]
With CAP_NET_ADMIN:
# rdma resource show mr mrlen 65536
dev mlx4_0 rkey 0x12702 lkey 0x12702 iova 0x85724a000 mrlen 65536 pid 0 comm [nvme_rdma]
dev cxgb4_0 rkey 0x68fe4e9 lkey 0x68fe4e9 iova 0x835b91000 mrlen 65536 pid 0 comm [nvme_rdma]
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Steve Wise [Thu, 29 Mar 2018 16:10:39 +0000 (09:10 -0700)]
rdma: Add CQ resource tracking information
Sample output:
# rdma resource show cq
dev cxgb4_0 cqe 46 users 2 pid 30503 comm rping
dev cxgb4_0 cqe 46 users 2 pid 30498 comm rping
dev mlx4_0 cqe 63 users 2 pid 30494 comm rping
dev mlx4_0 cqe 63 users 2 pid 30489 comm rping
dev mlx4_0 cqe 1023 users 2 poll_ctx WORKQUEUE pid 0 comm [ib_core]
# rdma resource show cq pid 30489
dev mlx4_0 cqe 63 users 2 pid 30489 comm rping
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
# rdma resource show cm_id
link cxgb4_0/- lqpn 0 qp-type RC state LISTEN ps TCP pid 30485 comm rping src-addr 0.0.0.0:7174
link cxgb4_0/2 lqpn 1048 qp-type RC state CONNECT ps TCP pid 30503 comm rping src-addr 172.16.2.1:7174 dst-addr 172.16.2.1:38246
link cxgb4_0/2 lqpn 1040 qp-type RC state CONNECT ps TCP pid 30498 comm rping src-addr 172.16.2.1:38246 dst-addr 172.16.2.1:7174
link mlx4_0/- lqpn 0 qp-type RC state LISTEN ps TCP pid 30485 comm rping src-addr 0.0.0.0:7174
link mlx4_0/1 lqpn 539 qp-type RC state CONNECT ps TCP pid 30494 comm rping src-addr 172.16.99.1:7174 dst-addr 172.16.99.1:43670
link mlx4_0/1 lqpn 538 qp-type RC state CONNECT ps TCP pid 30492 comm rping src-addr 172.16.99.1:43670 dst-addr 172.16.99.1:7174
# rdma resource show cm_id dst-port 7174
link cxgb4_0/2 lqpn 1040 qp-type RC state CONNECT ps TCP pid 30498 comm rping src-addr 172.16.2.1:38246 dst-addr 172.16.2.1:7174
link mlx4_0/1 lqpn 538 qp-type RC state CONNECT ps TCP pid 30492 comm rping src-addr 172.16.99.1:43670 dst-addr 172.16.99.1:7174
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Steve Wise [Thu, 29 Mar 2018 16:10:35 +0000 (09:10 -0700)]
rdma: initialize the rd struct
Initialize the rd struct so port_idx is 0 unless set otherwise.
Otherwise, strict_port queries end up passing an uninitialized PORT
nlattr.
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Steve Wise [Thu, 29 Mar 2018 16:10:30 +0000 (09:10 -0700)]
rdma: update rdma_netlink.h
Pull in the latest rdma_netlink.h which has support for
the rdma nldev resource tracking objects being added
with this patch series.
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
David Ahern [Thu, 29 Mar 2018 17:50:30 +0000 (10:50 -0700)]
Merge branch 'tipc-addr' into iproute2-next
Jon Maloy says:
====================
1: We introduce ability to set/get 128-bit node identities
2: We rename 'net id' to 'cluster id' in the command API,
of course in a compatible way.
3: We print out all 32-bit node addresses as an integer in hex format,
i.e., we remove the assumption about an internal structure.
====================
Alexander Zubkov [Tue, 27 Mar 2018 23:57:13 +0000 (01:57 +0200)]
arrange prefix parsing code after redundant patches
A problem was reported with parsing of prefixes all/any/default.
Commit 7696f1097f79be2ce5984a8a16103fd17391cac2 fixes the problem,
but there were also other pathces applied: 00b31a6b2ecf73ee477f701098164600a2bfe227, which were intended to
fix the same problem. And they became redundant now. This patch
reverts changes introduced by those redundant patches.
This fixes new gcc warning about possible string overflow.
mdb.c: In function ‘__print_router_port_stats’:
mdb.c:61:11: warning: ‘%.2i’ directive output may be truncated
writing between 2 and 7 bytes into a region of size
between 0 and 4 [-Wformat-truncation=]
"%4i.%.2i", (int)tv.tv_sec,
^~~~
Note: already fixed in iproute2-next.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Jon Maloy [Wed, 28 Mar 2018 16:52:13 +0000 (18:52 +0200)]
tipc: introduce command for handling a new 128-bit node identity
We add the possibility to set and get a 128 bit node identifier, as
an alternative to the legacy 32-bit node address we are using now.
We also add an option to set and get 'clusterid' in the node. This
is the same as what we have so far called 'netid' and performs the
same operations. For compatibility the old 'netid' commands are
retained, -we just remove them from the help texts.
Acked-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David Ahern <dsahern@gmail.com>
David Ahern [Thu, 29 Mar 2018 03:28:58 +0000 (20:28 -0700)]
Merge branch 'tipc-stats' into iproute2-next
GhantaKrishnamurthy MohanKrishna
says:
====================
The following patchset add user space TIPC socket diagnostics support
in ss tool of iproute2. It requires the sock_diag framework
for AF_TIPC support in the kernel, commit id: c30b70deb5f
(tipc: implement socket diagnostics for AF_TIPC).
tipc socket stats are requested with the "--tipc" option. Additional
tipc specific info are requested with "--tipcinfo" option.
This patchset is based on top of iproute2 v4.15.0-100-g4f63187
commitid: f85adc6. It has been co-authored by
Parthasarathy Bhuvaragan.
Example output (the first socket is the internal topology server)
Phil Sutter [Tue, 27 Mar 2018 23:51:55 +0000 (01:51 +0200)]
ss: Put filter DB parsing into a separate function
Use a table for database name parsing. The tricky bit is to allow for
association of a (nearly) arbitrary number of DBs with each name.
Luckily the number is not fully arbitrary as there is an upper bound of
MAX_DB items. Since it is not possible to have a variable length
array inside a variable length array, use this knowledge to make the
inner array of fixed length. But since DB values start from zero, an
explicit end entry needs to be present as well, so the inner array has
to be MAX_DB + 1 in size.
Phil Sutter [Tue, 27 Mar 2018 23:51:54 +0000 (01:51 +0200)]
ss: Allow excluding a socket table from being queried
The original problem was that a simple call to 'ss' leads to loading of
sctp_diag kernel module which might not be desired. While searching for
a workaround, it became clear how inconvenient it is to exclude a single
socket table from being queried.
This patch allows to prefix an item passed to '-A' parameter with an
exclamation mark to inverse its meaning.
Luca Boccassi [Tue, 27 Mar 2018 17:48:55 +0000 (18:48 +0100)]
Drop capabilities if not running ip exec vrf with libcap
ip vrf exec requires root or CAP_NET_ADMIN, CAP_SYS_ADMIN and
CAP_DAC_OVERRIDE. It is not possible to run unprivileged commands like
ping as non-root or non-cap-enabled due to this requirement.
To allow users and administrators to safely add the required
capabilities to the binary, drop all capabilities on start if not
invoked with "vrf exec".
Update the manpage with the requirements.
Signed-off-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Phil Sutter [Sat, 24 Mar 2018 17:45:14 +0000 (18:45 +0100)]
ssfilter: Eliminate shift/reduce conflicts
The problematic bit was the 'expr: expr expr' rule. Fix this by making
'expr' token represent a single filter only and introduce a new token
'exprlist' to represent a combination of filters.
Stefano Brivio [Fri, 23 Mar 2018 08:37:05 +0000 (09:37 +0100)]
ss: Fix rendering of continuous output (-E, --events)
Roman Mashak reported that ss currently shows no output when it
should continuously report information about terminated sockets
(-E, --events switch).
This happens because I missed this case in 691bd854bf4a ("ss:
Buffer raw fields first, then render them as a table") and the
rendering function is simply not called.
To fix this, we need to:
- call render() every time we need to display new socket events
from generic_show_sock(), which is only used to follow events.
Always call it even if specific socket display functions
return errors to ensure we clean up buffers
- get the screen width every time we have new events to display,
thus factor out getting the screen width from main() into a
function we'll call whenever we calculate columns width
- reset the current field pointer after rendering, more output
might come after render() is called
Reported-by: Roman Mashak <mrv@mojatatu.com> Fixes: 691bd854bf4a ("ss: Buffer raw fields first, then render them as a table") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Tested-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Alexander Zubkov [Sun, 18 Mar 2018 16:50:25 +0000 (17:50 +0100)]
treat "default" and "all"/"any" addresses differenty
Debian maintainer found that basic command:
# ip route flush all
No longer worked as expected which breaks user scripts and
expectations. It no longer flushed all IPv4 routes.
Recently behavior of "default" prefix parameter was corrected. But at
the same time behavior of "all"/"any" was altered too, because they
were the same branch of the code. As those parameters mean different,
they need to be treated differently in code too. This patch reflects
the difference.
Also after mentioned change, address parsing code was changed more
and address family was set explicitly even for "all"/"any" addresses.
And that broke matching conditions further. This patch fixes that too
and returns AF_UNSPEC to "all"/"any" address.
Now "default" is treated as top-level prefix (for example 0.0.0.0/0 in
IPv4) and "all"/"any" always matches anything in exact, root and match
modes.
Reported-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Alexander Zubkov <green@msu.ru>
Leon Romanovsky [Sun, 25 Mar 2018 06:38:56 +0000 (09:38 +0300)]
rdma: Move RDMA UAPI header file to be under RDMA responsibility
In iproute2 package, the updates of UAPIs files are performed
after the needed feature lands in kernel's net-next tree.
Such development flow created delays to the rdma tool developers,
who uses rdma-next tree as a basis for their work.
Move RDMA UAPI file to be under rdma/ folder, so whole responsibility
of syncing this file will be on them.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David Ahern <dsahern@gmail.com>