Steve Wise [Thu, 29 Mar 2018 16:10:30 +0000 (09:10 -0700)]
rdma: update rdma_netlink.h
Pull in the latest rdma_netlink.h which has support for
the rdma nldev resource tracking objects being added
with this patch series.
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
David Ahern [Thu, 29 Mar 2018 17:50:30 +0000 (10:50 -0700)]
Merge branch 'tipc-addr' into iproute2-next
Jon Maloy says:
====================
1: We introduce ability to set/get 128-bit node identities
2: We rename 'net id' to 'cluster id' in the command API,
of course in a compatible way.
3: We print out all 32-bit node addresses as an integer in hex format,
i.e., we remove the assumption about an internal structure.
====================
Jon Maloy [Wed, 28 Mar 2018 16:52:13 +0000 (18:52 +0200)]
tipc: introduce command for handling a new 128-bit node identity
We add the possibility to set and get a 128 bit node identifier, as
an alternative to the legacy 32-bit node address we are using now.
We also add an option to set and get 'clusterid' in the node. This
is the same as what we have so far called 'netid' and performs the
same operations. For compatibility the old 'netid' commands are
retained, -we just remove them from the help texts.
Acked-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David Ahern <dsahern@gmail.com>
David Ahern [Thu, 29 Mar 2018 03:28:58 +0000 (20:28 -0700)]
Merge branch 'tipc-stats' into iproute2-next
GhantaKrishnamurthy MohanKrishna
says:
====================
The following patchset add user space TIPC socket diagnostics support
in ss tool of iproute2. It requires the sock_diag framework
for AF_TIPC support in the kernel, commit id: c30b70deb5f
(tipc: implement socket diagnostics for AF_TIPC).
tipc socket stats are requested with the "--tipc" option. Additional
tipc specific info are requested with "--tipcinfo" option.
This patchset is based on top of iproute2 v4.15.0-100-g4f63187
commitid: f85adc6. It has been co-authored by
Parthasarathy Bhuvaragan.
Example output (the first socket is the internal topology server)
Luca Boccassi [Tue, 27 Mar 2018 17:48:55 +0000 (18:48 +0100)]
Drop capabilities if not running ip exec vrf with libcap
ip vrf exec requires root or CAP_NET_ADMIN, CAP_SYS_ADMIN and
CAP_DAC_OVERRIDE. It is not possible to run unprivileged commands like
ping as non-root or non-cap-enabled due to this requirement.
To allow users and administrators to safely add the required
capabilities to the binary, drop all capabilities on start if not
invoked with "vrf exec".
Update the manpage with the requirements.
Signed-off-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Phil Sutter [Sat, 24 Mar 2018 17:45:14 +0000 (18:45 +0100)]
ssfilter: Eliminate shift/reduce conflicts
The problematic bit was the 'expr: expr expr' rule. Fix this by making
'expr' token represent a single filter only and introduce a new token
'exprlist' to represent a combination of filters.
Stefano Brivio [Fri, 23 Mar 2018 08:37:05 +0000 (09:37 +0100)]
ss: Fix rendering of continuous output (-E, --events)
Roman Mashak reported that ss currently shows no output when it
should continuously report information about terminated sockets
(-E, --events switch).
This happens because I missed this case in 691bd854bf4a ("ss:
Buffer raw fields first, then render them as a table") and the
rendering function is simply not called.
To fix this, we need to:
- call render() every time we need to display new socket events
from generic_show_sock(), which is only used to follow events.
Always call it even if specific socket display functions
return errors to ensure we clean up buffers
- get the screen width every time we have new events to display,
thus factor out getting the screen width from main() into a
function we'll call whenever we calculate columns width
- reset the current field pointer after rendering, more output
might come after render() is called
Reported-by: Roman Mashak <mrv@mojatatu.com> Fixes: 691bd854bf4a ("ss: Buffer raw fields first, then render them as a table") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Tested-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Alexander Zubkov [Sun, 18 Mar 2018 16:50:25 +0000 (17:50 +0100)]
treat "default" and "all"/"any" addresses differenty
Debian maintainer found that basic command:
# ip route flush all
No longer worked as expected which breaks user scripts and
expectations. It no longer flushed all IPv4 routes.
Recently behavior of "default" prefix parameter was corrected. But at
the same time behavior of "all"/"any" was altered too, because they
were the same branch of the code. As those parameters mean different,
they need to be treated differently in code too. This patch reflects
the difference.
Also after mentioned change, address parsing code was changed more
and address family was set explicitly even for "all"/"any" addresses.
And that broke matching conditions further. This patch fixes that too
and returns AF_UNSPEC to "all"/"any" address.
Now "default" is treated as top-level prefix (for example 0.0.0.0/0 in
IPv4) and "all"/"any" always matches anything in exact, root and match
modes.
Reported-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Alexander Zubkov <green@msu.ru>
Leon Romanovsky [Sun, 25 Mar 2018 06:38:56 +0000 (09:38 +0300)]
rdma: Move RDMA UAPI header file to be under RDMA responsibility
In iproute2 package, the updates of UAPIs files are performed
after the needed feature lands in kernel's net-next tree.
Such development flow created delays to the rdma tool developers,
who uses rdma-next tree as a basis for their work.
Move RDMA UAPI file to be under rdma/ folder, so whole responsibility
of syncing this file will be on them.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Roopa Prabhu [Mon, 19 Mar 2018 17:20:10 +0000 (10:20 -0700)]
bridge: add option extern_learn to set NTF_EXT_LEARNED on fdb entries
NTF_EXT_LEARNED can be set by a user on bridge fdb entry.
Provide a bridge command option to allow a user to set
NTF_EXT_LEARNED on a bridge fdb entry.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Alexander Zubkov [Sun, 18 Mar 2018 16:50:25 +0000 (17:50 +0100)]
treat "default" and "all"/"any" addresses differenty
Debian maintainer found that basic command:
# ip route flush all
No longer worked as expected which breaks user scripts and
expectations. It no longer flushed all IPv4 routes.
Recently behavior of "default" prefix parameter was corrected. But at
the same time behavior of "all"/"any" was altered too, because they
were the same branch of the code. As those parameters mean different,
they need to be treated differently in code too. This patch reflects
the difference.
Also after mentioned change, address parsing code was changed more
and address family was set explicitly even for "all"/"any" addresses.
And that broke matching conditions further. This patch fixes that too
and returns AF_UNSPEC to "all"/"any" address.
Now "default" is treated as top-level prefix (for example 0.0.0.0/0 in
IPv4) and "all"/"any" always matches anything in exact, root and match
modes.
Reported-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Alexander Zubkov <green@msu.ru>
tc: f_flower: Add support for matching first frag packets
Add matching support for distinguishing between first and later fragmented
packets.
# tc filter add dev eth0 protocol ip parent ffff: \
flower indev eth0 \
ip_flags firstfrag \
ip_proto udp \
action mirred egress redirect dev eth1
# tc filter add dev eth0 protocol ip parent ffff: \
flower indev eth0 \
ip_flags nofirstfrag \
ip_proto udp \
action mirred egress redirect dev eth1
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Debian maintainer found that basic command:
# ip route flush all
No longer worked as expected which breaks user scripts and
expectations. It no longer flushed all IPv4 routes.
Reported-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
ipmroute: better error message if no kernel mroute
If kernel does not support the IP multicast address family,
then it will report all routes (PF_UNSPEC).
Give the user a better error message and abort the command.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David Ahern <dsahern@gmail.com>
David Ahern [Mon, 12 Mar 2018 01:46:07 +0000 (18:46 -0700)]
Merge branch 'iplink-parse' into iproute2-next
Serhey Popovych says:
====================
This is main routine to parse ip-link(8) configuration parameters.
Move all code related to command line parsing and validation to it from
iptables_modify(). As benefit we reduce number of arguments as well as
checking for most of weired cases in single place to give benefit to
iptables_parse() users.
See individual patch description message for more information.
v4
Drop patches intended to reduce number of arguments to
iptables_parse(): postpone to the series with real use cases.
Save only ifi_index in iplink_vxcan.c and link_veth.c: no need
to save whole ifinfomsg data structure.
Note that there is no sense to introduce custom version of
iplink_parse() to use in iplink_vxcan.c and link_veth.c because
there is too much parameters we need to support (except VF and
few others) making huge code duplication.
v3
Move vxlan/veth ifinfomsg save/restore to separate patch to
make clear change that perform most of request buffer setups
and checks in iplink_parse().
Update commit message descriptions and extra new line from
"utils: Introduce and use nodev() helper routine" patch.
v2
Terminate via exit() when failing to parse command line arguments
to help identify failing line in batch mode.
Serhey Popovych [Wed, 7 Mar 2018 08:40:39 +0000 (10:40 +0200)]
iplink: Perform most of request buffer setups and checks in iplink_parse()
To benefit other users (e.g. link_veth.c) of iplink_parse() from
additional attribute checks and setups made in iplink_modify(). This
catches most of weired cobination of parameters to peer device
configuration.
Drop @name, @dev, @link, @group and @index from iplink_parse() parameters
list: they are not needed outside.
While there change return -1 to exit(-1) for group parsing errors: we
want to stop further command processing unless -force option is given
to get error line easily.
Serhey Popovych [Wed, 7 Mar 2018 08:40:38 +0000 (10:40 +0200)]
iplink: Follow documented behaviour when "index" is given
Both ip-link(8) and error message when "index" parameter is given for
set/delete case says that index can only be given during network
device creation.
Follow this documented behaviour and get rid of ambiguous behaviour in
case of both "dev" and "index" specified for ip link delete scenario
(actually "index" being ignored in favor to "dev").
Prohibit "index" when configuring/deleting group of network devices.
Serhey Popovych [Wed, 7 Mar 2018 08:40:37 +0000 (10:40 +0200)]
iplink: Use "dev" and "name" parameters interchangeable when possible
Both of them accept network device name as argument, but have different
meaning:
dev - is a device by it's name,
name - name for specific device.
The only case where they treated separately is network device rename
case where need to specify both ifindex and new name. In rest of the
cases we can assume that dev == name.
With this change we do following:
1) Kill ambiguity with both "dev" and "name" parameters given the same
name:
ip link {add|set} dev veth100a name veth100a ...
2) Make sure we do not accept "name" more than once.
3) For VF and XDP treat "name" as "dev". Fail in case of "dev" is
given after VF and/or XDP parsing.
4) Make veth and vxcan to accept both "name" and "dev" as their peer
parameters, effectively following general ip-link(8) utility
behaviour on link create:
ip link add {name|dev} veth1a type veth peer {name|dev} veth1b
Serhey Popovych [Wed, 7 Mar 2018 08:40:36 +0000 (10:40 +0200)]
utils: Introduce and use nodev() helper routine
There is a couple of places where we report error in case of no network
device is found. In all of them we output message in the same format to
stderr and either return -1 or 1 to the caller or exit with -1.
Introduce new helper function nodev() that takes name of the network
device caused error and returns -1 to it's caller. Either call exit()
or return to the caller to preserve behaviour before change.
Use -nodev() in traffic control (tc) code to return 1.
Simplify expression for checking for argument being 0/NULL in @if
statement.
Fixes: 3fd86630876a ("iproute2: rework SR-IOV VF support") Fixes: 8c29ae7cc249 ("ip link: Fix crash on older kernels when show VF dev") Fixes: f89a2a05ffa9 ("Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool") Fixes: ae7229d5f99e ("ip: Add support for setting and showing SR-IOV virtual funtion link params") Fixes: d0e720111aad ("ip: ipaddress.c: add support for json output") Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Roopa Prabhu [Thu, 8 Mar 2018 18:06:47 +0000 (10:06 -0800)]
iprule: support for ip_proto, sport and dport match options
add support to match on ip_proto, sport and dport ranges.
For ip_proto, this patch currently enumerates, tcp, udp and sctp.
This list can be extended in the future.
example:
$ip rule add sport 666-777 dport 999 ip_proto tcp table 100
$ip rule show
0: from all lookup local
32765: from all ip_proto 6 sport 666-777 dport 999 lookup 100
32766: from all lookup main
32767: from all lookup default
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Leon Romanovsky [Wed, 7 Mar 2018 09:05:35 +0000 (11:05 +0200)]
rdma: Update device capabilities flags
In kernel commit e1d2e8873369 ("IB/core: Add PCI write
end padding flags for WQ and QP"), we introduced new
device capability to advertise PCI write end padding.
PCI write end padding is the device's ability to pad the ending of
incoming packets (scatter) to full cache line such that the last
upstream write generated by an incoming packet will be a full cache
line.
This commit updates RDMAtool to present this field.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
David Ahern [Tue, 6 Mar 2018 23:48:22 +0000 (15:48 -0800)]
Merge branch 'more-json' into iproute2-next
Stephen Hemminger says:
====================
The ip command implementation of JSON was very spotty. Only address
and link were originally implemented. After doing route for next,
went ahead and implemented it for a bunch of the other sub commands.
The JSON output does scale values differently since there is no good
way to indicate units. The rtt values are displayed in seconds in
JSON and microseconds in the original (non JSON) mode. In the example
above the output in without the -j flag, the output would be
... rtt 39176us rttvar 39176us
I did this since all the other values in the JSON record are also in
floating point seconds.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David Ahern <dsahern@gmail.com>
Davide Caratti [Fri, 2 Mar 2018 18:36:16 +0000 (19:36 +0100)]
tc: fix parsing of the control action
If the user didn't specify any control action, don't pop the command line
arguments: otherwise, parsing of the next argument (tipically the 'index'
keyword) results in an error, causing the following 'tc-testing' failures:
Test a6d6: Add skbedit action with index
Test 38f3: Delete skbedit action
Test a568: Add action with ife type
Test b983: Add action without ife type
Test 7d50: Add skbmod action to set destination mac
Test 9b29: Add skbmod action to set source mac
Test e93a: Delete an skbmod action
Also, add missing parse for 'ok' control action to m_police, to fix the
following 'tc-testing' failure:
Test 8dd5: Add police action with control ok
tested with:
# ./tdc.py
test results:
all tests ok using kernel 4.16-rc2, except 9aa8 "Get a single skbmod
action from a list" (which is failing also before this commit)
Fixes: 3572e01a090a ("tc: util: Don't call NEXT_ARG_FWD() in __parse_action_control()") Cc: Michal Privoznik <mprivozn@redhat.com> Cc: Wolfgang Bumiller <w.bumiller@proxmox.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
ss: fix NULL dereference when rendering without header
When ss is invoked with the no-header flag, if the query doesn't return
any result, render() is called with 'buffer' uninitialized. This
currently leads to a segfault. Ensure that buffer is initialized before
rendering.
The bug can be triggered with: ss -H sport = 100000
David Ahern [Thu, 1 Mar 2018 22:43:08 +0000 (14:43 -0800)]
libnetlink: __rtnl_talk_iov should only loop max iovlen times
William reported ip hanging and bisected to a recent commit for batching
allowing more than 1 command to be sent per message. The loop over
recvmsg should never cycle more than iovlen times -- 1 response for
each command in the message.
Fixes: 72a2ff3916e5 ("lib/libnetlink: Add a new function rtnl_talk_iov") Signed-off-by: David Ahern <dsahern@gmail.com>
Phil Sutter [Thu, 1 Mar 2018 09:35:12 +0000 (10:35 +0100)]
ip-link: Fix use after free in nl_get_ll_addr_len()
Immediately after freeing the buffer returned from rtnl_talk(), it is
accessed again via pointer in struct rtattr array. This leads to some
builds not allowing to set an interface's MAC address because the
expected length value is garbage.
Fixes: 86bf43c7c2fdc ("lib/libnetlink: update rtnl_talk to support malloc buff at run time") Signed-off-by: Phil Sutter <phil@nwl.cc>
Joe Stringer [Wed, 28 Feb 2018 22:16:42 +0000 (14:16 -0800)]
bpf: Print section name when hitting non ld64 issue
It's useful to be able to tell which section is being processed in the
ELF when this error is triggered, so print that detail.
Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Donald Sharp [Wed, 28 Feb 2018 23:43:58 +0000 (18:43 -0500)]
ip: Use the `struct fib_rule_hdr` for rules
The iprule.c code was using `struct rtmsg` as the data
type to pass into the kernel for the netlink message.
While 'struct rtmsg' and `struct fib_rule_hdr` are
the same size and mostly the same, we should use
the correct data structure. This commit translates
the data structures to have iprule.c use the correct
one.
Additionally copy over the modified fib_rules.h file
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Eyal Birger [Fri, 23 Feb 2018 11:12:25 +0000 (13:12 +0200)]
tc: add em_ipt ematch for calling xtables matches from tc matching context
The commit calls a new tc ematch for using netfilter xtable matches.
This allows early classification as well as mirroning/redirecting traffic
based on logic implemented in netfilter extensions.
Current supported use case is classification based on the incoming IPSec
state used during decpsulation using the 'policy' iptables extension
(xt_policy).
The matcher uses libxtables for parsing the input parameters.
Example use for matching an IPSec state with reqid 1:
tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: \
basic match 'ipt(-m policy --dir in --pol ipsec --reqid 1)' \
action drop
This is the user-space counter part of kernel commit ccc007e4a746
("net: sched: add em_ipt ematch for calling xtables matches")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Petr Machata [Wed, 21 Feb 2018 11:18:37 +0000 (12:18 +0100)]
ip: link_gre6.c: Support IP6_TNL_F_ALLOW_LOCAL_REMOTE flag
For IP-in-IP tunnels, one can specify the [no]allow-localremote command
when configuring a device. Under the hood, this flips the
IP6_TNL_F_ALLOW_LOCAL_REMOTE flag on the netdevice. However, ip6gretap
and ip6erspan devices, where the flag is also relevant, are not IP-in-IP
tunnels, and thus there's no way to configure the flag on these
netdevices. Therefore introduce the command to link_gre6 as well.
The original support was introduced in commit 21440d19d957
("ip: link_ip6tnl.c/ip6tunnel.c: Support IP6_TNL_F_ALLOW_LOCAL_REMOTE flag")
Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Dpipe - Each dpipe table can have one resource which is mapped to it.
The resource is presented via its full path. Furthermore, the number
of units consumed by single table entry is presented.
Resource - Each resource presents the dpipe tables that use it.
devlink: Add support for devlink resource abstraction
Add support for devlink resource abstraction. The resources are
represented by a tree based structure and are identified by a name and
a size. Some resources can present their real time occupancy.
First the resources exposed by the driver can be observed, for example:
$devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
name kvd size 245760 unit entry
resources:
name linear size 98304 occ 0 unit entry size_min 0 size_max 147456 size_gran 128
name hash_double size 60416 unit entry size_min 32768 size_max 180224 size_gran 128
name hash_single size 87040 unit entry size_min 65536 size_max 212992 size_gran 128
Some resource's size can be changed. Examples:
$devlink resource set pci/0000:03:00.0 path /kvd/hash_single size 73088
$devlink resource set pci/0000:03:00.0 path /kvd/hash_double size 74368
The changes do not apply immediately, this can be validate by the 'size_new'
attribute, which represents the pending changed size. For example
$devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
name kvd size 245760 unit entry size_valid false
resources:
name linear size 98304 size_new 147456 occ 0 unit entry size_min 0 size_max 147456 size_gran 128
name hash_double size 60416 unit entry size_min 32768 size_max 180224 size_gran 128
name hash_single size 87040 unit entry size_min 65536 size_max 212992 size_gran 128
In case of a pending change the nested resources present an indication
for a valid configuration of its children (sum of its children sizes
doesn't exceed the parent's size).
In order for the changes to take place hot reload is needed. The hot
reload through devlink will be introduced in the following patch.
Quentin Monnet [Thu, 22 Feb 2018 03:22:14 +0000 (19:22 -0800)]
README: re-add updated information link
The "Information" link was removed from README file in commit d7843207e6fd ("README: update location of git repositories, remove
broken info link"), because it redirected to a page that no longer
existed on the Linux Foundation wiki.
This page has just been restored, so we can add the link back again.
Since the previous link was a redirection, use the updated link instead.
Thanks to Luca Boccassi for investigating this issue, restoring and
updating the page.
Vincent Bernat [Tue, 20 Feb 2018 23:28:04 +0000 (00:28 +0100)]
color: disable color when json output is requested
Instead of declaring -color and -json exclusive, ignore -color when
-json is provided. The rationale is to allow to put -color in an alias
for ip while still being able to use -json. -color is merely a
presentation suggestion and we can assume there is nothing to color in
the JSON output.
Signed-off-by: Vincent Bernat <vincent@bernat.im> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>