git.proxmox.com Git - mirror_iproute2.git/log
mirror_iproute2.git
Merge branch 'revert'
Stephen Hemminger [Tue, 27 Mar 2018 15:58:36 +0000 (08:58 -0700)]

treat "default" and "all"/"any" addresses differently
Alexander Zubkov [Sun, 18 Mar 2018 16:50:25 +0000 (17:50 +0100)]

The Debian maintainer found that the basic command:
# ip route flush all
no longer worked as expected, which breaks user scripts and
expectations: it no longer flushed all IPv4 routes.

Recently the behavior of the "default" prefix parameter was corrected, but
at the same time the behavior of "all"/"any" was altered too, because they
were handled by the same branch of the code. Since those parameters mean
different things, they need to be treated differently in the code too. This
patch reflects the difference.

Also, after the mentioned change, the address parsing code was changed
further and the address family was set explicitly even for "all"/"any"
addresses, which broke the matching conditions later on. This patch fixes
that too and restores AF_UNSPEC for "all"/"any" addresses.

Now "default" is treated as the top-level prefix (for example, 0.0.0.0/0 in
IPv4) and "all"/"any" always matches anything in exact, root and match
modes.
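The selection rules can be modeled in a few lines of C. This is a hypothetical, IPv4-only sketch, not the iproute2 implementation: filter_prefix, match_mode and prefix_selects() are invented names, and the root/match semantics are taken from the ip-route(8) description (root selects routes whose prefix is not shorter than PREFIX, match selects routes whose prefix is not longer than PREFIX). An AF_UNSPEC filter models "all"/"any"; an AF_INET /0 filter models "default".

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h> /* AF_UNSPEC, AF_INET */

/* Hypothetical IPv4-only model of a route-selection filter. */
enum match_mode { MODE_EXACT, MODE_ROOT, MODE_MATCH };

struct filter_prefix {
	int family;    /* AF_UNSPEC for "all"/"any", AF_INET otherwise */
	uint32_t addr; /* host byte order */
	int bitlen;    /* 0 for "default" (0.0.0.0/0) */
};

/* Netmask for a prefix length; avoids the undefined 32-bit shift for /0. */
static uint32_t mask_of(int bitlen)
{
	return bitlen == 0 ? 0 : 0xffffffffu << (32 - bitlen);
}

/* True if prefix a/alen lies inside prefix b/blen. */
static bool inside(uint32_t a, int alen, uint32_t b, int blen)
{
	return alen >= blen && ((a ^ b) & mask_of(blen)) == 0;
}

bool prefix_selects(const struct filter_prefix *f, enum match_mode mode,
		    uint32_t dst, int dst_len)
{
	/* "all"/"any": no family set, so every route matches in every mode. */
	if (f->family == AF_UNSPEC)
		return true;

	switch (mode) {
	case MODE_EXACT: /* exactly the same prefix */
		return dst_len == f->bitlen &&
		       ((dst ^ f->addr) & mask_of(f->bitlen)) == 0;
	case MODE_ROOT:  /* route prefix not shorter than the filter prefix */
		return inside(dst, dst_len, f->addr, f->bitlen);
	case MODE_MATCH: /* route prefix not longer than the filter prefix */
		return inside(f->addr, f->bitlen, dst, dst_len);
	}
	return false;
}
```

In this model the AF_UNSPEC filter behind "ip route flush all" selects every route, while "default" in exact or match mode selects only a /0 route and in root mode selects every IPv4 route.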

Reported-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Alexander Zubkov <green@msu.ru>
tc: use get_u32() in psample action to match types
Roman Mashak [Tue, 13 Mar 2018 21:16:23 +0000 (17:16 -0400)]

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Acked-by: Yotam Gigi <yotam.gi@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
tc: print actual action for sample action
Roman Mashak [Tue, 13 Mar 2018 13:57:10 +0000 (09:57 -0400)]

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Revert "iproute: "list/flush/save default" selected all of the routes"
Stephen Hemminger [Mon, 12 Mar 2018 20:58:17 +0000 (13:58 -0700)]

This reverts commit 9135c4d6037ff9f1818507bac0049fc44db8c3d2.

The Debian maintainer found that the basic command:
# ip route flush all
no longer worked as expected, which breaks user scripts and
expectations: it no longer flushed all IPv4 routes.

Reported-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
ip-address: Fix negative prints of large TX rate limits
Tariq Toukan [Thu, 8 Mar 2018 16:08:26 +0000 (18:08 +0200)]

TX rate limit fields are unsigned (__u32).
Use %u and print_uint when printing.

Tested:
$ ip link set ens1 vf 1 rate 2294967296
$ ip link show |grep -iE "vf 1" | grep rate

before:
vf 1 MAC 00:00:00:00:00:00, tx rate -2000000000 (Mbps), max_tx_rate -2000000000Mbps, ...

after:
vf 1 MAC 00:00:00:00:00:00, tx rate 2294967296 (Mbps), max_tx_rate 2294967296Mbps, ...
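The before/after output follows from reading the same 32 bits as signed versus unsigned. A minimal C sketch, assuming a plain uint32_t in place of __u32; format_rate() is a hypothetical helper, and the fix in the commit itself is the switch to %u and print_uint:

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a 32-bit rate the buggy way and the fixed way. */
void format_rate(uint32_t rate, char *bad, char *good, size_t len)
{
	/* Reading the value as int32_t wraps anything above INT32_MAX
	 * (implementation-defined; two's complement on common compilers). */
	snprintf(bad, len, "%d", (int32_t)rate);
	/* Unsigned conversion prints the value as stored. */
	snprintf(good, len, "%" PRIu32, rate);
}
```

With rate 2294967296 the first buffer holds -2000000000 and the second 2294967296, matching the before/after lines above.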

Fixes: 3fd86630876a ("iproute2: rework SR-IOV VF support")
Fixes: 8c29ae7cc249 ("ip link: Fix crash on older kernels when show VF dev")
Fixes: f89a2a05ffa9 ("Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool")
Fixes: ae7229d5f99e ("ip: Add support for setting and showing SR-IOV virtual funtion link params")
Fixes: d0e720111aad ("ip: ipaddress.c: add support for json output")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
tc: updated tc-bpf man page
Roman Mashak [Wed, 7 Mar 2018 14:35:39 +0000 (09:35 -0500)]

Added description of direct-action parameter.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
json_writer: add SPDX Identifier (GPL-2/BSD-2)
Stephen Hemminger [Tue, 6 Mar 2018 22:39:19 +0000 (14:39 -0800)]

I wrote this code, so put an SPDX license identifier on it and
intentionally allow its use in BSD code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
tc: added tc monitor description in man page
Roman Mashak [Mon, 5 Mar 2018 16:36:16 +0000 (11:36 -0500)]

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
tc: fix parsing of the control action
Davide Caratti [Fri, 2 Mar 2018 18:36:16 +0000 (19:36 +0100)]

If the user didn't specify any control action, don't pop the command line
arguments: otherwise, parsing of the next argument (typically the 'index'
keyword) results in an error, causing the following 'tc-testing' failures:

 Test a6d6: Add skbedit action with index
 Test 38f3: Delete skbedit action
 Test a568: Add action with ife type
 Test b983: Add action without ife type
 Test 7d50: Add skbmod action to set destination mac
 Test 9b29: Add skbmod action to set source mac
 Test e93a: Delete an skbmod action

Also, add missing parse for 'ok' control action to m_police, to fix the
following 'tc-testing' failure:

 Test 8dd5: Add police action with control ok
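A sketch of the parsing rule described above: consume an argument only when it really is a control action, so a following keyword such as 'index' is left for the caller. parse_control_sketch() and its keyword table are hypothetical and cover only a subset of what tc accepts; the real parsing is done by tc's own helpers, such as the __parse_action_control() named in the Fixes tag below.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of tc control-action keywords (including 'ok'). */
static const char *const control_words[] = {
	"reclassify", "pipe", "drop", "continue", "pass", "ok",
};

/*
 * Return the number of arguments consumed: 1 if argv[0] is a control
 * action, 0 otherwise. Nothing is popped when no control action is
 * given, so the caller can still parse e.g. "index 5".
 */
int parse_control_sketch(int argc, char **argv, const char **out)
{
	size_t i;

	*out = NULL;
	if (argc < 1)
		return 0;
	for (i = 0; i < sizeof(control_words) / sizeof(control_words[0]); i++) {
		if (strcmp(argv[0], control_words[i]) == 0) {
			*out = control_words[i];
			return 1;
		}
	}
	return 0;
}
```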

tested with:
 # ./tdc.py

test results:
 all tests ok using kernel 4.16-rc2, except 9aa8 "Get a single skbmod
 action from a list" (which is failing also before this commit)

Fixes: 3572e01a090a ("tc: util: Don't call NEXT_ARG_FWD() in __parse_action_control()")
Cc: Michal Privoznik <mprivozn@redhat.com>
Cc: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
ss: fix NULL dereference when rendering without header
Jean-Philippe Brucker [Sat, 3 Mar 2018 16:59:44 +0000 (16:59 +0000)]

When ss is invoked with the no-header flag, if the query doesn't return
any result, render() is called with 'buffer' uninitialized. This
currently leads to a segfault. Ensure that buffer is initialized before
rendering.

The bug can be triggered with: ss -H sport = 100000

Signed-off-by: Jean-Philippe Brucker <jphilippe.brucker@gmail.com>
Acked-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
libnetlink: __rtnl_talk_iov should only loop max iovlen times
David Ahern [Thu, 1 Mar 2018 22:43:08 +0000 (14:43 -0800)]

William reported ip hanging and bisected it to a recent commit that added
batching, allowing more than one command to be sent per message. The loop
over recvmsg should never cycle more than iovlen times: one response for
each command in the message.
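The bound can be illustrated with a stand-in for recvmsg(). In this hypothetical sketch fake_sock, fake_recv() and read_replies() are invented names; the point is only that the receive loop runs at most iovlen times, one reply per batched command, even when more data is queued:

```c
#include <stddef.h>

/* Hypothetical stand-in for a netlink socket with queued replies. */
struct fake_sock {
	int pending; /* replies still queued */
};

/* Hypothetical stand-in for recvmsg(): returns 1 if a reply was read. */
static int fake_recv(struct fake_sock *s)
{
	if (s->pending == 0)
		return 0;
	s->pending--;
	return 1;
}

/* Read at most one reply per command in the batch (iovlen commands). */
size_t read_replies(struct fake_sock *s, size_t iovlen)
{
	size_t got = 0;

	for (size_t i = 0; i < iovlen; i++) {
		if (!fake_recv(s))
			break;
		got++;
	}
	return got;
}
```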

Fixes: 72a2ff3916e5 ("lib/libnetlink: Add a new function rtnl_talk_iov")
Signed-off-by: David Ahern <dsahern@gmail.com>
ip-link: Fix use after free in nl_get_ll_addr_len()
Phil Sutter [Thu, 1 Mar 2018 09:35:12 +0000 (10:35 +0100)]

Immediately after the buffer returned from rtnl_talk() is freed, it is
accessed again via a pointer in the struct rtattr array. As a result, some
builds cannot set an interface's MAC address because the expected length
value is garbage.
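The safe ordering is to copy whatever is needed out of the reply before releasing it. A hypothetical sketch; struct reply and get_len_then_free() are invented, and nl_get_ll_addr_len() itself is not reproduced here:

```c
#include <stdlib.h>

/* Hypothetical reply buffer carrying one attribute length. */
struct reply {
	unsigned short attr_len;
};

/*
 * Buggy order: free(r), then read r->attr_len (use after free).
 * Fixed order, shown here: copy the value out first, then free.
 */
int get_len_then_free(struct reply *r)
{
	int len = r->attr_len; /* read while the buffer is still valid */

	free(r);
	return len;
}
```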

Fixes: 86bf43c7c2fdc ("lib/libnetlink: update rtnl_talk to support malloc buff at run time")
Signed-off-by: Phil Sutter <phil@nwl.cc>
bpf: Print section name when hitting non ld64 issue
Joe Stringer [Wed, 28 Feb 2018 22:16:42 +0000 (14:16 -0800)]

It's useful to be able to tell which section is being processed in the
ELF when this error is triggered, so print that detail.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
devlink: Fix error reporting
Arkadi Sharshevsky [Wed, 28 Feb 2018 09:24:22 +0000 (11:24 +0200)]

The current code doesn't set errno in case of extended ack.

Fixes: 049c58539f5d ("devlink: mnlg: Add support for extended ack")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
ip: Properly display AF_BRIDGE address information for neighbor events
Donald Sharp [Fri, 23 Feb 2018 19:10:09 +0000 (14:10 -0500)]

When a neighbor add/delete event occurs, the vxlan driver sends NDA_DST
filled with a union:

union vxlan_addr {
	struct sockaddr_in sin;
	struct sockaddr_in6 sin6;
	struct sockaddr sa;
};

This eventually calls rt_addr_n2a_r which had no handler for the
AF_BRIDGE family and "???" was being printed.

Add code to properly display this data when requested.
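A minimal sketch of decoding such a payload, assuming the attribute data is laid out exactly as the union above and that the member to use is chosen from the embedded sa_family. vxlan_addr_to_str() is a hypothetical helper; the actual change was made in rt_addr_n2a_r():

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Union as quoted from the vxlan driver. */
union vxlan_addr {
	struct sockaddr_in sin;
	struct sockaddr_in6 sin6;
	struct sockaddr sa;
};

/* Format the address using the member selected by sa_family; NULL if unknown. */
const char *vxlan_addr_to_str(const union vxlan_addr *a, char *buf, socklen_t len)
{
	switch (a->sa.sa_family) {
	case AF_INET:
		return inet_ntop(AF_INET, &a->sin.sin_addr, buf, len);
	case AF_INET6:
		return inet_ntop(AF_INET6, &a->sin6.sin6_addr, buf, len);
	default:
		return NULL;
	}
}
```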

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
rdma: Avoid memory leak for skipper resource
Leon Romanovsky [Tue, 20 Feb 2018 12:47:18 +0000 (14:47 +0200)]

The call to get_task_name() allocates memory which is not freed
in case of skipping the object.

Fixes: 8ecac46a60ff ("rdma: Add QP resource tracking information")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
devlink: Update man pages and add resource man
Arkadi Sharshevsky [Wed, 14 Feb 2018 08:55:22 +0000 (10:55 +0200)]

Add a resource man page, and update the dev man page for the reload command.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
devlink: Add support for resource/dpipe relation
Arkadi Sharshevsky [Wed, 14 Feb 2018 08:55:21 +0000 (10:55 +0200)]

Dpipe - Each dpipe table can have one resource mapped to it. The
resource is presented via its full path. Furthermore, the number of
units consumed by a single table entry is presented.

Resource - Each resource presents the dpipe tables that use it.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
devlink: Move dpipe context from heap to stack
Arkadi Sharshevsky [Wed, 14 Feb 2018 08:55:20 +0000 (10:55 +0200)]

Allocate the dpipe context on the stack instead of dynamically.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
devlink: Add support for hot reload
Arkadi Sharshevsky [Wed, 14 Feb 2018 08:55:19 +0000 (10:55 +0200)]

Add support for hot reload. It should be used in order for resource
updates to take place.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
devlink: Add support for devlink resource abstraction
Arkadi Sharshevsky [Wed, 14 Feb 2018 08:55:18 +0000 (10:55 +0200)]

Add support for devlink resource abstraction. The resources are
represented by a tree-based structure and are identified by a name and
a size. Some resources can present their real-time occupancy.

First the resources exposed by the driver can be observed, for example:

$ devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
  name kvd size 245760 unit entry
    resources:
      name linear size 98304 occ 0 unit entry size_min 0 size_max 147456 size_gran 128
      name hash_double size 60416 unit entry size_min 32768 size_max 180224 size_gran 128
      name hash_single size 87040 unit entry size_min 65536 size_max 212992 size_gran 128

The size of some resources can be changed. Examples:

$ devlink resource set pci/0000:03:00.0 path /kvd/hash_single size 73088
$ devlink resource set pci/0000:03:00.0 path /kvd/hash_double size 74368

The changes do not apply immediately; this can be validated by the 'size_new'
attribute, which represents the pending changed size. For example:

$ devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
  name kvd size 245760 unit entry size_valid false
  resources:
    name linear size 98304 size_new 147456 occ 0 unit entry size_min 0 size_max 147456 size_gran 128
    name hash_double size 60416 unit entry size_min 32768 size_max 180224 size_gran 128
    name hash_single size 87040 unit entry size_min 65536 size_max 212992 size_gran 128

In case of a pending change, the nested resources indicate whether the
configuration of their children is valid (the sum of the children's sizes
doesn't exceed the parent's size).
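The validity rule above can be checked against the numbers in the two outputs. A hypothetical sketch with invented names (struct res, children_fit()), treating size_new as the pending size and assuming it equals size when nothing is pending:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of a resource. */
struct res {
	uint64_t size;     /* current size */
	uint64_t size_new; /* pending size; equals size if unchanged */
};

/* Valid if the children's pending sizes do not exceed the parent's size. */
bool children_fit(const struct res *parent, const struct res *kids, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += kids[i].size_new;
	return sum <= parent->size;
}
```

With the figures shown, the original children sum to 98304 + 60416 + 87040 = 245760, which fits the kvd size of 245760, while the pending linear size of 147456 raises the sum to 294912, consistent with "size_valid false".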

In order for the changes to take place hot reload is needed. The hot
reload through devlink will be introduced in the following patch.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
devlink: mnlg: Add support for extended ack
Arkadi Sharshevsky [Wed, 14 Feb 2018 08:55:17 +0000 (10:55 +0200)]

Add support for extended ack.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
devlink: Change empty line indication with indentations
Arkadi Sharshevsky [Wed, 14 Feb 2018 08:55:16 +0000 (10:55 +0200)]

Currently multi-line objects are separated by new-lines. This patch
changes this behavior by using indentations for separation.

Signed-off-by: Arkadi Sharhsevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
ss: prepare rth when killing inet sock
Masatake YAMATO [Thu, 15 Feb 2018 19:11:20 +0000 (04:11 +0900)]

kill_inet_sock() expects an rtnl_handle instance to be passed
via the inet_diag_arg argument. However, on the following calling path:

    generic_show_sock
    => show_one_inet_sock
       => kill_inet_sock

the rth field of inet_diag_arg is not filled with the address of an
rtnl_handle instance. As a result, ss crashes.

This commit fills the field with a newly created rtnl_handle
instance.

Changes in v2:
Instead of creating rtnl_handle instances for each socket, create
one in the upper layer and reuse it.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
README: re-add updated information link
Quentin Monnet [Thu, 22 Feb 2018 03:22:14 +0000 (19:22 -0800)]

The "Information" link was removed from README file in commit
d7843207e6fd ("README: update location of git repositories, remove
broken info link"), because it redirected to a page that no longer
existed on the Linux Foundation wiki.

This page has just been restored, so we can add the link back again.
Since the previous link was a redirection, use the updated link instead.

Thanks to Luca Boccassi for investigating this issue, restoring and
updating the page.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
color: disable color when json output is requested
Vincent Bernat [Tue, 20 Feb 2018 23:28:04 +0000 (00:28 +0100)]

Instead of declaring -color and -json exclusive, ignore -color when
-json is provided. The rationale is to allow putting -color in an alias
for ip while still being able to use -json. -color is merely a
presentation suggestion, and we can assume there is nothing to color in
the JSON output.

Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
tc: fix an off-by-one error while printing tc actions
Adam Vyskovsky [Sun, 18 Feb 2018 19:50:10 +0000 (20:50 +0100)]

The tc_print_action() function did not print all tc actions
when e.g. TCA_ACT_MAX_PRIO actions were defined for a single
tc filter.
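The commit message does not show the loop, but an off-by-one of this kind typically comes from 1-based attribute numbering: if nested action attributes are indexed from 1 to TCA_ACT_MAX_PRIO, a loop bounded by "i < TCA_ACT_MAX_PRIO" never visits the last slot. A hypothetical sketch under that assumption; SKETCH_MAX_PRIO stands in for TCA_ACT_MAX_PRIO, and its value of 32 is an assumption here:

```c
#include <stddef.h>

/* Assumed stand-in for TCA_ACT_MAX_PRIO (value assumed to be 32). */
#define SKETCH_MAX_PRIO 32

/* tb[] has SKETCH_MAX_PRIO + 1 slots; attributes use indices 1..SKETCH_MAX_PRIO. */

/* Buggy bound: stops before tb[SKETCH_MAX_PRIO], so the last action is lost. */
int count_buggy(const void *tb[])
{
	int n = 0;

	for (int i = 0; i < SKETCH_MAX_PRIO; i++)
		if (tb[i])
			n++;
	return n;
}

/* Fixed bound: visits every slot up to and including SKETCH_MAX_PRIO. */
int count_fixed(const void *tb[])
{
	int n = 0;

	for (int i = 0; i <= SKETCH_MAX_PRIO; i++)
		if (tb[i])
			n++;
	return n;
}
```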

Signed-off-by: Adam Vyskovsky <adamvyskovsky@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
bridge: Prevent a double space in bridge mdb show
Timothy Redaelli [Mon, 19 Feb 2018 16:13:06 +0000 (17:13 +0100)]

Prevent a double space in "bridge mdb show" when the MDB entry is not
marked as "offload".

Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
lib/namespace: don't try to mount rw /sys over a ro one
Lubomir Rintel [Mon, 12 Feb 2018 19:23:12 +0000 (20:23 +0100)]

It will fail with EPERM on Linux 4.15.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
ip: remove dead code
Stephen Hemminger [Wed, 21 Feb 2018 00:01:46 +0000 (16:01 -0800)]

Remove long dead code (in #if 0) from original iproute2
for numeric names.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
uapi: update if_ether compat headers
Stephen Hemminger [Tue, 20 Feb 2018 18:48:32 +0000 (10:48 -0800)]

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Remove leftovers from removed Latex documentation
Phil Sutter [Fri, 9 Feb 2018 17:49:38 +0000 (18:49 +0100)]

Since there is no documentation in Latex format left, there is no need
to check for commands to build it. Also there is no need to ignore any
of the temporary files which were created by them.

Signed-off-by: Phil Sutter <phil@nwl.cc>
README: update location of git repositories, remove broken info link
Quentin Monnet [Fri, 9 Feb 2018 17:11:09 +0000 (09:11 -0800)]

Reflect the recent change of location for the git repositories, and the
creation of the -next development repo, in README and README.devel.

Also remove the link to the Linux Foundation wiki that contained
information about iproute2. The link is now broken, and I did not find
any alternative page to point to.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: David Ahern <dsahern@gmail.com>
include: update rdma header from 4.16-rc1
Stephen Hemminger [Wed, 14 Feb 2018 00:42:00 +0000 (16:42 -0800)]

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
netns: allow negative nsid
Christian Brauner [Tue, 6 Feb 2018 18:39:31 +0000 (19:39 +0100)]

If the kernel receives a negative nsid, it will automatically assign
the next available nsid. In this case alloc_netid() will set min and
max to 0 for idr_alloc(). When max == 0, idr_alloc() interprets this
as the maximum range; specifically for nsids, it will try to find an
id in the range [0, INT_MAX). This is intentionally supported in the
kernel for nsids.

Commit acbe9118ce80 ("ip netns: use strtol() instead of atoi()")
regressed ip netns in that respect, although previously the use case
was supported either accidentally or opaquely, in a way that triggered
the original commit. From what I can gather, it previously went as
follows: atoi() was called with a string indicating a negative value,
which caused it to return -1, which was then passed to the kernel.
Let's make it less opaque by introducing the keyword "auto":

ip netns set <netns-name> auto

will cause nsid to be set to -1 and the kernel will select an available
nsid.
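A sketch of argument parsing along these lines, assuming "auto" maps to -1 as described and any other value must be a complete decimal integer that fits in an int. parse_nsid_sketch() is hypothetical; the commit does not say how ip netns validates other negative numbers:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical nsid parser: "auto" -> -1 (the kernel picks a free id),
 * otherwise a decimal integer that fits in an int. Returns 0 on success.
 */
int parse_nsid_sketch(const char *arg, int *nsid)
{
	char *end;
	long v;

	if (strcmp(arg, "auto") == 0) {
		*nsid = -1;
		return 0;
	}
	errno = 0;
	v = strtol(arg, &end, 10);
	if (errno || end == arg || *end != '\0' || v < INT_MIN || v > INT_MAX)
		return -1;
	*nsid = (int)v;
	return 0;
}
```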

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
rdma: Check return value of strdup call
Leon Romanovsky [Wed, 31 Jan 2018 08:11:56 +0000 (10:11 +0200)]

Fixes: 74bd75c2b68d ("rdma: Add basic infrastructure for RDMA tool")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
rdma: Document resource tracking
Leon Romanovsky [Wed, 31 Jan 2018 08:11:55 +0000 (10:11 +0200)]

Spartan version of resource tracking documentation.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
rdma: Add QP resource tracking information
Leon Romanovsky [Wed, 31 Jan 2018 08:11:54 +0000 (10:11 +0200)]

This patch adds an ss-like interface to view various tracked resource
objects. At this stage, only QPs are presented.

1. Get all QPs for the specific device:
$ rdma res show qp link mlx5_4
link mlx5_4/- lqpn 8 type UD state RESET sq-psn 0 pid 0 comm [ib_ipoib]
link mlx5_4/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 0 type SMI state RTS sq-psn 0 pid 0 comm [ib_core]

$ rdma res show qp link mlx5_4/
link mlx5_4/- lqpn 8 type UD state RESET sq-psn 0 pid 0 comm [ib_ipoib]
link mlx5_4/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 0 type SMI state RTS sq-psn 0 pid 0 comm [ib_core]

2. Provide illegal port number (0 is illegal):
$ rdma res show qp link mlx5_4/0
Wrong device name

3. Get QPs of specific port:
$ rdma res show qp link mlx5_4/1
link mlx5_4/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 0 type SMI state RTS sq-psn 0 pid 0 comm [ib_core]

4. Get QPs which have not been assigned a port yet:
link mlx5_4/- lqpn 8 type UD state RESET sq-psn 0 pid 0 comm [ib_ipoib]

5. Limit to specific Local QPNs:
$ rdma res show qp link mlx5_4/1 lqpn 1-3,7
link mlx5_4/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 0 type SMI state RTS sq-psn 0 pid 0 comm [ib_core]

6. Filter types (strings):
$ rdma res show qp link mlx5_4/1 type UD,gSi
link mlx5_4/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
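The last example suggests that string filters match case-insensitively ("UD,gSi" selects both UD and GSI QPs). A hypothetical sketch of such a comma-separated, case-insensitive match; str_filter_match() is an invented name, not the rdmatool function, and it relies on POSIX strncasecmp():

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX) */

/* True if value equals one of the comma-separated entries, ignoring case. */
bool str_filter_match(const char *list, const char *value)
{
	size_t vlen = strlen(value);
	const char *p = list;

	while (*p) {
		const char *comma = strchr(p, ',');
		size_t len = comma ? (size_t)(comma - p) : strlen(p);

		if (len == vlen && strncasecmp(p, value, len) == 0)
			return true;
		if (!comma)
			break;
		p = comma + 1;
	}
	return false;
}
```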

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
rdma: Add resource tracking summary
Leon Romanovsky [Wed, 31 Jan 2018 08:11:53 +0000 (10:11 +0200)]

Add global resource summary information. The object names, current
utilization and maximum numbers are received as is from the kernel.

$ rdma res
1: mlx5_0: pd 3 cq 5 qp 4
2: mlx5_1: pd 3 cq 5 qp 4
3: mlx5_2: pd 3 cq 5 qp 4
4: mlx5_3: pd 2 cq 3 qp 2
5: mlx5_4: pd 3 cq 5 qp 4

$ rdma res show mlx5_4
5: mlx5_4: pd 3 cq 5 qp 44

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
rdma: Allow external usage of compare string routine
Leon Romanovsky [Wed, 31 Jan 2018 08:11:51 +0000 (10:11 +0200)]

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
rdma: Set pointer to device name position
Leon Romanovsky [Wed, 31 Jan 2018 08:11:50 +0000 (10:11 +0200)]

The dev and link execution callbacks expect that the next
command line argument is a device or port name.

Set the pointer to the device or port name position prior to calling
rd_exec_dev()/rd_exec_link().

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
rdma: Add filtering infrastructure
Leon Romanovsky [Wed, 31 Jan 2018 08:11:49 +0000 (10:11 +0200)]

This patch adds general infrastructure to RDMAtool to handle various
filtering options needed for the downstream resource tracking patches.

The infrastructure is generic and stores filters in a list of key<->value
entries. There are three types of filters:

1. Numeric - the values are intended to be digits combined with '-' to
mark a range and ',' to mark multiple entries, e.g. pid 1-100,234,400-401
is a perfectly valid filter to limit process ids.

2. String - the values consist of strings with ',' as a delimiter.

3. Link - a special case that allows '/' in the string to provide a link
name, e.g. link mlx4_1/2.
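A minimal sketch of matching a value against a numeric filter such as "1-100,234,400-401". num_filter_match() is hypothetical, does only light validation, and is not the RDMAtool parser:

```c
#include <stdbool.h>
#include <stdlib.h>

/* True if value falls in one of the comma-separated numbers or lo-hi ranges. */
bool num_filter_match(const char *spec, unsigned long value)
{
	const char *p = spec;

	while (*p) {
		char *end;
		unsigned long lo = strtoul(p, &end, 10);
		unsigned long hi = lo;

		if (end == p)
			return false; /* not a number */
		if (*end == '-') {  /* range: lo-hi */
			p = end + 1;
			hi = strtoul(p, &end, 10);
			if (end == p)
				return false;
		}
		if (value >= lo && value <= hi)
			return true;
		if (*end != ',')
			return false; /* end of spec (or junk): no entry matched */
		p = end + 1;
	}
	return false;
}
```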

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
rdma: Make visible the number of arguments
Leon Romanovsky [Wed, 31 Jan 2018 08:11:48 +0000 (10:11 +0200)]

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
rdma: Add option to provide "-" sign for the port number
Leon Romanovsky [Wed, 31 Jan 2018 08:11:47 +0000 (10:11 +0200)]

According to the IBTA spec [1], the physically connected port is provided
for the QP in the RTR-to-INIT stage performed by modify_qp(). As a result,
newly created QPs do not have a port number.

This patch adds a "-" sign to represent the absence of a port, because
QPs are going to be associated with the rdmatool link object, which needs
the port number as an index.
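A minimal sketch of both sides of this convention, assuming port 0 is used internally to mean "not assigned yet" (consistent with the rejection of "mlx5_4/0" and the "mlx5_4/-" output in the QP tracking commit, but not stated here). format_link() and parse_port() are hypothetical, and the 8-bit port limit is an assumption:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print "dev/port", or "dev/-" when no port has been assigned (port 0). */
void format_link(char *buf, size_t len, const char *dev, unsigned int port)
{
	if (port == 0)
		snprintf(buf, len, "%s/-", dev);
	else
		snprintf(buf, len, "%s/%u", dev, port);
}

/*
 * Parse the text after '/'. Returns the port number, or 0 if the text is
 * not a number, has trailing characters, is 0 (illegal), or exceeds the
 * assumed 8-bit port range.
 */
unsigned int parse_port(const char *s)
{
	char *end;
	unsigned long v = strtoul(s, &end, 10);

	if (end == s || *end != '\0' || v == 0 || v > 255)
		return 0;
	return (unsigned int)v;
}
```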

[1] InfiniBand Architecture Release 1.3 -
"Table 96 QP State Transition Properties"

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
include: update UAPI types.h
Stephen Hemminger [Tue, 6 Feb 2018 01:21:27 +0000 (17:21 -0800)]

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
include: update interface UAPI from 4.15-rc1
Stephen Hemminger [Tue, 6 Feb 2018 01:21:01 +0000 (17:21 -0800)]

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
include: update rdma uapi from 4.15-rc1
Stephen Hemminger [Tue, 6 Feb 2018 01:20:14 +0000 (17:20 -0800)]

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
include: update netfilter headers from 4.15-rc1
Stephen Hemminger [Tue, 6 Feb 2018 01:19:32 +0000 (17:19 -0800)]

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
include: update uapi with BPF from 4.15-rc1
Stephen Hemminger [Tue, 6 Feb 2018 01:18:53 +0000 (17:18 -0800)]

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Merge branch 'iproute2-master' into iproute2-next
David Ahern [Mon, 29 Jan 2018 16:24:57 +0000 (08:24 -0800)]

Signed-off-by: David Ahern <dsahern@gmail.com>
v4.15.0 (tag: v4.15.0)
Stephen Hemminger [Mon, 29 Jan 2018 16:08:52 +0000 (08:08 -0800)]

tc: fix second printing of requeues
Jakub Kicinski [Sat, 27 Jan 2018 09:19:04 +0000 (01:19 -0800)]

Non-JSON tc qdisc output used to print the "requeues" statistic
twice.  Commit 4fcec7f3665b ("tc: jsonify stats2") tried to preserve
this behaviour for both standard output and JSON, but used the wrong
statistic (q.qlen).  Also duplicating keys in JSON is not allowed,
so the second occurrence should be completely skipped with JSON.

Fixes: 4fcec7f3665b ("tc: jsonify stats2")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
ip: address: fix stats64 JSON object name
Jakub Kicinski [Fri, 26 Jan 2018 19:30:35 +0000 (11:30 -0800)]

The JSON object name for statistics in ip link show is "stats644".
Looks like a typo, commit d0e720111aad ("ip: ipaddress.c: add support
for json output") contains an example with the expected "stats64" name.

The fact that no one has noticed until now is probably an indication
that no one is using this object.  Hopefully it's not too late to fix
this, although IIUC this has already been in 4.13 and 4.14 releases :S

Fixes: d0e720111aad ("ip: ipaddress.c: add support for json output")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
tc: prio: JSON-ify prio output
Jakub Kicinski [Fri, 26 Jan 2018 19:27:57 +0000 (11:27 -0800)]

Make JSON output work with prio Qdiscs.  This will also make JSON
output work for other qdiscs which reuse print_qopt, like mqprio or
pfifo_fast.

Note that there is a double space between "priomap" and first
prio number.  Keep this original behaviour.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
tc: red: JSON-ify RED output
Jakub Kicinski [Fri, 26 Jan 2018 19:27:56 +0000 (11:27 -0800)]

Make JSON output work with RED Qdiscs.  Float/double printing
helpers have to be added/uncommented to print the probability.
Since TC stats in general are not split out to a separate object
the xstats printed by this patch are not separated either.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Merge branch 'get_addr_rta' into iproute2-next
David Ahern [Thu, 25 Jan 2018 17:32:27 +0000 (09:32 -0800)]

Serhey Popovych says:

====================

Now we enhance get_addr() to return additional information about address
(e.g. if it unspecified or multicast) we want to have same functionality
for attributes in netlink message.

Introduce and use get_addr_rta() that parses given netlink attribute
into @inet_prefix data structure in the same way similar get_addr()
parses address from it's string representation.

Use attribute length to guess address family: force it by giving non
AF_UNSPEC @family to get_addr_rta() to ensure address is of expected
family.

Introduce and use inet_addr_match_rta() to further simplify and unify
code where get_addr_rta() is intended to be used together with
inet_addr_match().

This is the next step in unifying the ipv4 and ipv6 modules to prepare
for a merge in the future.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoip/tunnel: Unify local/remote endpoint address printing
Serhey Popovych [Wed, 24 Jan 2018 18:56:40 +0000 (20:56 +0200)]
ip/tunnel: Unify local/remote endpoint address printing

Introduce and use the tnl_print_endpoint() helper to print tunnel
endpoint addresses.

Note that for AF_INET and AF_INET6 inet_ntop(3) is used, which may return
NULL on failure, and, while unlikely, format_host_rta() might
return NULL too. Handle this case when passing local/remote to
print_string().
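
A hedged sketch of the NULL check: inet_ntop(3) returns NULL on failure,
so its result is tested before being handed to a printing helper. The
function endpoint_str_sketch() and the "(unprintable)" placeholder are
hypothetical; the commit does not say what the real helper prints in
that case.

```c
#include <arpa/inet.h>
#include <sys/socket.h>

/*
 * inet_ntop() returns NULL on failure (unsupported family, buffer too
 * small), so check it before passing the result to a printing helper.
 */
static const char *endpoint_str_sketch(int family, const void *addr,
				       char *buf, socklen_t len)
{
	const char *s = inet_ntop(family, addr, buf, len);

	return s ? s : "(unprintable)";   /* hypothetical placeholder */
}
```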

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agotcp_metric: Use get_addr_rta()
Serhey Popovych [Wed, 24 Jan 2018 18:56:39 +0000 (20:56 +0200)]
tcp_metric: Use get_addr_rta()

While here, remove & from inet_prefix.data since it is an array.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoipl2tp: Use get_addr_rta()
Serhey Popovych [Wed, 24 Jan 2018 18:56:38 +0000 (20:56 +0200)]
ipl2tp: Use get_addr_rta()

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoipneigh: Use inet_addr_match_rta()
Serhey Popovych [Wed, 24 Jan 2018 18:56:37 +0000 (20:56 +0200)]
ipneigh: Use inet_addr_match_rta()

While here, check the return value of get_prefix() for the filter address.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoipmroute: Use inet_addr_match_rta()
Serhey Popovych [Wed, 24 Jan 2018 18:56:36 +0000 (20:56 +0200)]
ipmroute: Use inet_addr_match_rta()

While here, check the return value of get_prefix() for the filter address.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoiprule: Use inet_addr_match_rta()
Serhey Popovych [Wed, 24 Jan 2018 18:56:35 +0000 (20:56 +0200)]
iprule: Use inet_addr_match_rta()

While here, check the return value of get_prefix() for the filter address.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoipaddress: Use inet_addr_match_rta()
Serhey Popovych [Wed, 24 Jan 2018 18:56:34 +0000 (20:56 +0200)]
ipaddress: Use inet_addr_match_rta()

While here, check the return value of get_prefix() for the filter address.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoutils: Introduce get_addr_rta() and inet_addr_match_rta()
Serhey Popovych [Wed, 24 Jan 2018 18:56:33 +0000 (20:56 +0200)]
utils: Introduce get_addr_rta() and inet_addr_match_rta()

The first is used to get an address from a netlink attribute
into the inet_prefix data structure. Use memcpy() with a
constant size to let the compiler optimize by replacing the
call with inlined load/store instructions.

The second is used to match the address in a given netlink
attribute against a reference address. It matches successfully
if no attribute is given (@rta is NULL), the reference address
family is AF_UNSPEC or its length isn't given; it fails if
get_addr_rta() can't get the attribute or its family does
not match the reference; otherwise it calls inet_addr_match()
to get the final verdict.
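
A sketch of the matching order described above, written against a plain
byte buffer instead of a struct rtattr. All names here are hypothetical,
and the payload-length family guess is an assumption carried over from
the get_addr_rta() description. As with inet_addr_match(), 0 means
"match".

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical reference prefix, loosely modelled on inet_prefix. */
struct pfx_sketch {
	int family;             /* AF_INET, AF_INET6 or AF_UNSPEC */
	int bitlen;             /* prefix length; <= 0 means "not given" */
	unsigned char data[16];
};

/* Compare the first 'bits' bits of two addresses; 0 means they match. */
static int prefix_bits_match_sketch(const unsigned char *a,
				    const unsigned char *b, int bits)
{
	int bytes = bits / 8;
	int rem = bits % 8;

	if (memcmp(a, b, (size_t)bytes) != 0)
		return -1;
	if (rem) {
		unsigned char mask = (unsigned char)(0xff << (8 - rem));

		if ((a[bytes] ^ b[bytes]) & mask)
			return -1;
	}
	return 0;
}

/* 'payload' == NULL models a missing attribute. Returns 0 on match. */
static int addr_match_attr_sketch(const unsigned char *payload, int len,
				  const struct pfx_sketch *ref)
{
	int family;

	/* No attribute, wildcard family or no prefix length: match. */
	if (!payload || ref->family == AF_UNSPEC || ref->bitlen <= 0)
		return 0;

	/* Guess the family from the payload size; it must agree. */
	if (len == 4)
		family = AF_INET;
	else if (len == 16)
		family = AF_INET6;
	else
		return -1;
	if (family != ref->family || ref->bitlen > len * 8)
		return -1;

	/* Final verdict: prefix comparison. */
	return prefix_bits_match_sketch(payload, ref->data, ref->bitlen);
}
```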

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoMerge branch 'unify_external' into iproute2-next
David Ahern [Wed, 24 Jan 2018 18:02:27 +0000 (10:02 -0800)]
Merge branch 'unify_external' into iproute2-next

Serhey Popovych  says:

====================

With this series I want to unify collect metadata
handling in tunnels:

  1) Use the "external" name for both JSON and
     non-JSON output.

     Do not *print* any options when the tunnel is
     in collect metadata mode: gre6 already does
     this, so just apply it to the others.

  2) Do not *add* any attributes when configuring
     a gre tunnel in collect metadata mode.

     Other tunnels (e.g. gre6, iptnl, ip6tnl)
     already do that.

This is the next step in ipv4 and ipv6 modules
unification to prepare for a merge in the future.

Any comments, suggestions and criticism are, as
always, welcome.

v2
  For all tunnels implementing collect metadata
  use the "external" keyword for both JSON and
  non-JSON output. Thanks to Jiri Benc for the
  detailed explanation.
====================

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agogre/gre6: Unify attribute addition to netlink buffer
Serhey Popovych [Mon, 22 Jan 2018 17:23:46 +0000 (19:23 +0200)]
gre/gre6: Unify attribute addition to netlink buffer

There are a couple of minor improvements:

  1) Check erspan_ver == 2 in gre6. It could still
     be 1 if erspan_idx is 0.

  2) Add tunnel encapsulation attributes only when
     collect metadata is not in effect in gre.

  3) Trivial: address checkpatch issues.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoip/tunnel: Be consistent when printing tunnel collect metadata
Serhey Popovych [Mon, 22 Jan 2018 17:23:45 +0000 (19:23 +0200)]
ip/tunnel: Be consistent when printing tunnel collect metadata

Print only "external" if the collect metadata attribute
is given: the rest of the parameters are irrelevant. This
follows gre6.

For both JSON and non-JSON output use "external" for
all tunnels including vxlan and geneve.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoMerge branch 'iproute2-master' into iproute2-next
David Ahern [Wed, 24 Jan 2018 17:59:03 +0000 (09:59 -0800)]
Merge branch 'iproute2-master' into iproute2-next

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agotc/lexer: let quotes actually start strings
Wolfgang Bumiller [Mon, 22 Jan 2018 10:53:46 +0000 (11:53 +0100)]
tc/lexer: let quotes actually start strings

The lexer will go with the longest match, so previously
the opening double quote of a string would be swallowed by
the [^ \t\r\n()]+ pattern, leaving the user no way to
actually use strings with escape sequences.
Fix this by not allowing this pattern to start with a double
quote.
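
For illustration only, a hand-written scanner that applies the same rule:
a token beginning with a double quote is read as a quoted string, and the
bare-word branch is only taken for tokens that do not begin with one.
tc's lexer is generated from flex rules, so next_token_sketch() is a
hypothetical model of the behaviour, not the actual code; its escape
handling (keep the escaped character literally) is a simplification.

```c
#include <stddef.h>
#include <string.h>

/*
 * Read one token from *sp into out (capacity cap). Returns 1 when a
 * token was read, 0 at end of input or on an unterminated string.
 */
static int next_token_sketch(const char **sp, char *out, size_t cap)
{
	const char *s = *sp;
	size_t n = 0;

	while (*s == ' ' || *s == '\t' || *s == '\r' || *s == '\n')
		s++;
	if (*s == '\0' || cap == 0)
		return 0;

	if (*s == '"') {
		/* Quoted string: the opening quote selects this branch. */
		for (s++; *s != '\0' && *s != '"'; s++) {
			if (*s == '\\' && s[1] != '\0')
				s++;            /* keep the escaped char as-is */
			if (n + 1 < cap)
				out[n++] = *s;
		}
		if (*s != '"')
			return 0;               /* unterminated string */
		s++;                            /* skip the closing quote */
	} else if (*s == '(' || *s == ')') {
		if (cap > 1)
			out[n++] = *s;
		s++;
	} else {
		/* Bare word [^ \t\r\n()]+, only reached when not starting with '"'. */
		while (*s != '\0' && strchr(" \t\r\n()", *s) == NULL) {
			if (n + 1 < cap)
				out[n++] = *s;
			s++;
		}
	}

	out[n] = '\0';
	*sp = s;
	return 1;
}
```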

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
6 years agoiplink: Use ll_name_to_index() instead of if_nametoindex()
Serhey Popovych [Fri, 19 Jan 2018 16:44:03 +0000 (18:44 +0200)]
iplink: Use ll_name_to_index() instead of if_nametoindex()

While the benefit of using ll_name_to_index() with a populated
cache can potentially be exploited only in a few places
(e.g. bridge fdb/mdb/vlan show routines), there is another
advantage of ll_name_to_index() over plain if_nametoindex():

  in case of if_nametoindex() failure, ll_name_to_index()
  will attempt to get the index from a common name in the form
  "if%d" that may be returned by ll_index_to_name().

This makes output from ip(8) coherent with its input.

Note that most of the code has already switched from plain
if_nametoindex() to the cached ll_name_to_index() variant.
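
A minimal sketch of the fallback described above, using if_nametoindex(3)
from <net/if.h>. name_to_index_sketch() is hypothetical and keeps no
cache; it only shows how an "if%d" name can be mapped back to an index
when the kernel lookup fails.

```c
#include <ctype.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>

/*
 * Resolve an interface name to an index. If the kernel lookup fails,
 * accept names of the exact form "if<digits>" and return the number.
 * Returns 0 when neither works.
 */
static unsigned int name_to_index_sketch(const char *name)
{
	unsigned int idx = if_nametoindex(name);
	int used = 0;

	if (idx != 0)
		return idx;

	if (strncmp(name, "if", 2) == 0 &&
	    isdigit((unsigned char)name[2]) &&
	    sscanf(name + 2, "%u%n", &idx, &used) == 1 &&
	    name[2 + used] == '\0')
		return idx;

	return 0;
}
```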

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agovti/vti6: Minor improvements
Serhey Popovych [Fri, 19 Jan 2018 16:44:02 +0000 (18:44 +0200)]
vti/vti6: Minor improvements

In preparation for merging link_vti.c and link_vti6.c:

  1) Make @fwmark of __u32 type instead of unsigned int
     in vti to match the rest of the tunneling code.

  2) Report when unable to translate the @link network device
     name to an index instead of silently exiting in vti6.

  3) Remove the newline separating local/remote attributes
     from the ikey/okey in vti6 to match the vti module.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoiptnl/ip6tnl: Unify ttl/hoplimit parsing routines
Serhey Popovych [Fri, 19 Jan 2018 16:44:01 +0000 (18:44 +0200)]
iptnl/ip6tnl: Unify ttl/hoplimit parsing routines

Handle the "inherit" case properly for gre6 and ip6tnl.

Use get_u8() in gre to parse ttl/hoplimit.

Be consistent about supporting "hlim" as an alias for
ttl/hoplimit.
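
A sketch of the parsing rules described above: "inherit" becomes 0 and a
number must fit in 8 bits, with "hlim" accepted as an alias keyword.
parse_ttl_sketch() and is_ttl_keyword_sketch() are hypothetical;
iproute2 itself uses its get_u8() helper, whose exact behaviour (for
example which number bases it accepts) is not reproduced here.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Keywords accepted for the same option; "hlim" is the short alias. */
static int is_ttl_keyword_sketch(const char *kw)
{
	return strcmp(kw, "ttl") == 0 || strcmp(kw, "hoplimit") == 0 ||
	       strcmp(kw, "hlim") == 0;
}

/* "inherit" maps to 0; otherwise parse a number in 0..255. */
static int parse_ttl_sketch(const char *arg, unsigned char *ttl)
{
	char *end;
	unsigned long v;

	if (strcmp(arg, "inherit") == 0) {
		*ttl = 0;       /* 0: take the value from the inner packet */
		return 0;
	}

	errno = 0;
	v = strtoul(arg, &end, 0);
	if (errno || end == arg || *end != '\0' || v > 255)
		return -1;

	*ttl = (unsigned char)v;
	return 0;
}
```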

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agotunnel: Add space between encap-dport and encap-sport in non-JSON output
Serhey Popovych [Fri, 19 Jan 2018 16:44:00 +0000 (18:44 +0200)]
tunnel: Add space between encap-dport and encap-sport in non-JSON output

Fixes: bad76e6b1f44 ("ip/tunnel: Abstract tunnel encapsulation options printing")
Fixes: e2d4588331fc ("ip: link_gre.c: add json output support")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agogre/gre6: Post merge fixes
Serhey Popovych [Mon, 22 Jan 2018 14:50:08 +0000 (16:50 +0200)]
gre/gre6: Post merge fixes

A few minor changes after the merge of 'master' into the 'net-next' branch:

  1) Respect the 80-column line limit when printing the erspan_index
     parameter, as we did in master with commit 2a8d0f6e9c3f ("gre/tunnel:
     Print erspan_index using print_uint()").

  2) Remove remnants of encapsulation option printing: it is now
     done by the tnl_print_encap() helper introduced in commit bad76e6b1f44
     ("ip/tunnel: Abstract tunnel encapsulation options printing").

Fixes: 8c75f69411bc ("Merge branch 'master' into net-next")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
6 years agoMerge branch 'shared_block' into net-next
David Ahern [Sun, 21 Jan 2018 19:20:56 +0000 (11:20 -0800)]
Merge branch 'shared_block' into net-next

Jiri Pirko  says:

====================

From: Jiri Pirko <jiri@mellanox.com>

The kernel allows qdiscs to share all filters by using
a shared block.

Example:

Use block number 22 ("22" is just an identifier):
$ tc qdisc add dev ens7 ingress_block 22 ingress
                        ^^^^^^^^^^^^^^^^
$ tc qdisc add dev ens8 ingress_block 22 ingress
                        ^^^^^^^^^^^^^^^^

If we don't specify the "ingress_block" option, no shared block is
created:
$ tc qdisc add dev ens9 ingress

Now if we list the qdiscs, we will see the block index in the output:

$ tc qdisc
qdisc ingress ffff: dev ens7 parent ffff:fff1 ingress_block 22
qdisc ingress ffff: dev ens8 parent ffff:fff1 ingress_block 22
qdisc ingress ffff: dev ens9 parent ffff:fff1

To make it more visual, the situation looks like this:

   ens7 ingress qdisc                 ens8 ingress qdisc
          |                                  |
          |                                  |
          +---------->  block 22  <----------+

An unlimited number of qdiscs may share the same block.

Block sharing is also supported for clsact qdisc:
$ tc qdisc add dev ens10 ingress_block 23 egress_block 24 clsact
$ tc qdisc show dev ens10
qdisc clsact ffff: dev ens10 parent ffff:fff1 ingress_block 23 egress_block 24

We can add a filter using the block index:

$ tc filter add block 22 protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop

Note that we cannot use the qdisc to manipulate the filters of a shared block:

$ tc filter add dev ens8 ingress protocol ip pref 1 flower dst_ip 192.168.100.2 action drop
Error: This filter block is shared. Please use the block index to manipulate the filters.

We will see the same filters if we list them for the ingress qdisc of
ens7 or ens8, or for block 22 itself:

$ tc filter show block 22
filter protocol ip pref 25 flower chain 0
filter protocol ip pref 25 flower chain 0 handle 0x1
...

$ tc filter show dev ens7 ingress
filter block 22 protocol ip pref 25 flower chain 0
filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
...

$ tc filter show dev ens8 ingress
filter block 22 protocol ip pref 25 flower chain 0
filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
...

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agotc: implement ingress/egress block index attributes for qdiscs
Jiri Pirko [Sat, 20 Jan 2018 10:00:29 +0000 (11:00 +0100)]
tc: implement ingress/egress block index attributes for qdiscs

During qdisc creation it is possible to specify a shared block for both
ingress and egress. Pass these values to the kernel according to the
command line options.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agotc: introduce support for block-handle for filter operations
Jiri Pirko [Sat, 20 Jan 2018 10:00:28 +0000 (11:00 +0100)]
tc: introduce support for block-handle for filter operations

So far, the qdisc was the only handle that could be used to manipulate
filters. The kernel added support for using a block to manipulate them,
so add support for using a block index to manipulate filters. The magic
value TCM_IFINDEX_MAGIC_BLOCK indicates that a block index is in use.
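
A hedged sketch of how a filter request might name a block rather than a
device. It assumes kernel UAPI headers providing struct tcmsg in
<linux/rtnetlink.h>; newer headers also define TCM_IFINDEX_MAGIC_BLOCK
(all bits set) and alias tcm_block_index to tcm_parent, and the fallback
#define below assumes that value. tcmsg_for_block_sketch() is
hypothetical, and building the rest of the netlink message is omitted.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/rtnetlink.h>

#ifndef TCM_IFINDEX_MAGIC_BLOCK
#define TCM_IFINDEX_MAGIC_BLOCK (0xFFFFFFFFU)   /* assumed value */
#endif

/* Fill the tc message header so that it names a shared block, not a device. */
static void tcmsg_for_block_sketch(struct tcmsg *t, __u32 block_index)
{
	memset(t, 0, sizeof(*t));
	t->tcm_family = AF_UNSPEC;
	/* Magic "ifindex" telling the kernel that a block index follows. */
	t->tcm_ifindex = (int)TCM_IFINDEX_MAGIC_BLOCK;
	/* tcm_parent carries the block index (tcm_block_index in newer headers). */
	t->tcm_parent = block_index;
}
```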

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agotc: introduce tc_qdisc_block_exists helper
Jiri Pirko [Sat, 20 Jan 2018 10:00:27 +0000 (11:00 +0100)]
tc: introduce tc_qdisc_block_exists helper

This helper uses a qdisc dump to list all qdiscs and find out whether
the block index in question is used by any of them, which means that a
block with the specified index exists.
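
As a toy model of the check, here is a sketch that scans an
already-dumped list of qdiscs. The record struct qdisc_entry_sketch (with
0 meaning "no block bound") and block_exists_sketch() are hypothetical;
the real helper walks a netlink qdisc dump instead of an array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-qdisc record: 0 means "no shared block bound". */
struct qdisc_entry_sketch {
	unsigned int ingress_block;
	unsigned int egress_block;
};

/* A block exists if at least one qdisc is bound to it in either direction. */
static bool block_exists_sketch(const struct qdisc_entry_sketch *q, size_t n,
				unsigned int block)
{
	if (block == 0)
		return false;   /* 0 is the "unbound" marker in this model */

	for (size_t i = 0; i < n; i++) {
		if (q[i].ingress_block == block || q[i].egress_block == block)
			return true;
	}
	return false;
}
```

Using the qdiscs from the shared block example (ens7 and ens8 on block 22,
ens9 unbound, ens10 on blocks 23 and 24), blocks 22 and 24 exist while 25
does not.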

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoMerge branch 'inet_get_addr' into net-next
David Ahern [Sun, 21 Jan 2018 18:11:07 +0000 (10:11 -0800)]
Merge branch 'inet_get_addr' into net-next

Serhey Popovych  says:

====================

It is confusing to have multiple independent
routines to get an internet address from its string
representation: get_addr() and inet_get_addr().

The most complicated users of inet_get_addr() are
iplink_geneve.c and iplink_vxlan.c because they
are required to handle both AF_INET and AF_INET6
for their local/remote endpoints.

On the other hand, get_addr() does not provide
additional information like the address type; we
need to address this to get rid of current and
possible future code duplication. Note that
this functionality is the first step toward
protocol-independent handling of local/remote
endpoints in the ip/tunnel code (there will be an
additional series based on this one).

Also fix get_addr_1() and get_prefix() to make
sure they always provide the correct ->family and
->bitlen.

As always, comments, suggestions and criticism
are welcome.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoip: Get rid of inet_get_addr()
Serhey Popovych [Thu, 18 Jan 2018 18:13:47 +0000 (20:13 +0200)]
ip: Get rid of inet_get_addr()

Now that both the geneve and vxlan modules have been
converted to use get_addr(), we can replace
inet_get_addr() in the less problematic places and
finally get rid of inet_get_addr().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoiplink_vxlan: Get rid of inet_get_addr()
Serhey Popovych [Thu, 18 Jan 2018 18:13:46 +0000 (20:13 +0200)]
iplink_vxlan: Get rid of inet_get_addr()

Now that get_addr() provides additional information
about the address class, we can use it in place of
inet_get_addr().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoiplink_geneve: Get rid of inet_get_addr()
Serhey Popovych [Thu, 18 Jan 2018 18:13:45 +0000 (20:13 +0200)]
iplink_geneve: Get rid of inet_get_addr()

Now that get_addr() provides additional information
about the address class, we can use it in place of
inet_get_addr().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoutils: Fast inet address classification after get_addr()
Serhey Popovych [Thu, 18 Jan 2018 18:13:44 +0000 (20:13 +0200)]
utils: Fast inet address classification after get_addr()

It is very useful to receive additional information
about the address from get_addr_1() and get_addr() to
simplify callers and get rid of code duplication.

For now, the following information can be returned:

  1) address is unspecified (zero)
  2) address is multicast
  3) address is internet: family is either AF_INET or
     AF_INET6.

More information can be added in the future.

Introduce inline helpers to make code using this new
address classification interface more self explaining:

  bool is_addrtype_inet(inet_prefix *addr)
    true if @addr is inet address

  bool is_addrtype_inet_unspec(inet_prefix *addr)
    true if @addr is unspecified inet address

  bool is_addrtype_inet_multi(inet_prefix *addr)
    true if @addr is multicast inet address

  bool is_addrtype_inet_not_unspec(inet_prefix *addr)
    true if @addr is not unspecified inet address
    false if @addr is not inet or unspecified inet

  bool is_addrtype_inet_not_multi(inet_prefix *addr)
    true if @addr is not multicast inet address
    false if @addr is not inet or multicast inet

Last two are useful for case when we need inet address
that is not unspecified or multicast.
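
A sketch of how such helpers can be built on classification bits set by
the parser. The flag names and values (ADDRTYPE_SKETCH_*), struct
addr_sketch and the *_sketch() functions are hypothetical; the real
helpers take inet_prefix and their bit layout may differ. The point is
the semantics of the last two helpers: they are false for any non-inet
address.

```c
#include <stdbool.h>

/* Hypothetical classification bits set by an address parser. */
enum {
	ADDRTYPE_SKETCH_UNSPEC = 0x01,  /* address is all zeros */
	ADDRTYPE_SKETCH_MULTI  = 0x02,  /* address is multicast */
	ADDRTYPE_SKETCH_INET   = 0x80,  /* family is AF_INET or AF_INET6 */
};

struct addr_sketch {
	unsigned int flags;
};

static inline bool is_inet_sketch(const struct addr_sketch *a)
{
	return a->flags & ADDRTYPE_SKETCH_INET;
}

static inline bool is_inet_unspec_sketch(const struct addr_sketch *a)
{
	return is_inet_sketch(a) && (a->flags & ADDRTYPE_SKETCH_UNSPEC);
}

static inline bool is_inet_multi_sketch(const struct addr_sketch *a)
{
	return is_inet_sketch(a) && (a->flags & ADDRTYPE_SKETCH_MULTI);
}

/* Inet and not unspecified; false for non-inet addresses. */
static inline bool is_inet_not_unspec_sketch(const struct addr_sketch *a)
{
	return is_inet_sketch(a) && !(a->flags & ADDRTYPE_SKETCH_UNSPEC);
}

/* Inet and not multicast; false for non-inet addresses. */
static inline bool is_inet_not_multi_sketch(const struct addr_sketch *a)
{
	return is_inet_sketch(a) && !(a->flags & ADDRTYPE_SKETCH_MULTI);
}
```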

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoutils: Always specify family and ->bytelen in get_prefix_1()
Serhey Popovych [Thu, 18 Jan 2018 18:13:43 +0000 (20:13 +0200)]
utils: Always specify family and ->bytelen in get_prefix_1()

Handle the default/all/any special case in get_addr_1() to set up
->family and ->bytelen correctly.

Make get_addr_1() return ->bitlen == -2 instead of -1 to
distinguish the default/all/any special case from the rest:
this is safe because all callers check ->bitlen < 0, not
the explicit value -1.

Reduce indentation by one level and get rid of goto/label
to make the code more readable.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoutils: Always specify family for address in get_addr_1()
Serhey Popovych [Thu, 18 Jan 2018 18:13:42 +0000 (20:13 +0200)]
utils: Always specify family for address in get_addr_1()

Set ->family correctly when the string representing the address
is "default", "all" or "any": get_addr_1() might be called
with AF_UNSPEC (e.g. get_addr() -> get_addr_1()).

Extend support for the zero address to all address families,
not only AF_INET and AF_INET6, when one is explicitly given
as @family: use af_byte_len() to set the address length correctly.

Still assume AF_INET when @family is AF_UNSPEC.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoMerge branch 'master' into net-next
David Ahern [Sun, 21 Jan 2018 17:37:01 +0000 (09:37 -0800)]
Merge branch 'master' into net-next

Conflicts:
ip/link_gre.c
ip/link_gre6.c

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agobpf: support map offload
Jakub Kicinski [Wed, 17 Jan 2018 07:50:54 +0000 (23:50 -0800)]
bpf: support map offload

When program is loaded with a specified ifindex, use that
ifindex also when creating maps.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agotc: red: allow setting th_min and th_max to the same value
Jakub Kicinski [Tue, 16 Jan 2018 23:08:50 +0000 (15:08 -0800)]
tc: red: allow setting th_min and th_max to the same value

Setting th_min and th_max to the same value may be useful for DCTCP
deployments.  The original DCTCP paper describes it as the simplest way
of achieving ECN threshold marking.  Indeed, there doesn't seem
to be any simpler qdisc in Linux which would allow such a setup today.
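
A simplified floating-point model of the RED marking curve shows why equal
thresholds give plain threshold marking: the linear region between
th_min and th_max disappears. Linux's RED uses scaled fixed-point
arithmetic and additional state, so red_mark_prob_sketch() is only a
hypothetical illustration, not the kernel algorithm.

```c
/*
 * Simplified RED marking probability as a function of the average queue
 * length 'avg'. Below th_min nothing is marked, at or above th_max
 * everything is marked, and in between the probability ramps up to max_p.
 */
static double red_mark_prob_sketch(double avg, double th_min, double th_max,
				   double max_p)
{
	if (avg < th_min)
		return 0.0;
	if (avg >= th_max)
		return 1.0;
	/* Only reached when th_min < th_max, so no division by zero. */
	return max_p * (avg - th_min) / (th_max - th_min);
}
```

With th_min == th_max == K the function becomes a step at K: below K
nothing is marked and from K on everything is, which is the DCTCP-style
threshold marking described above.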

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoUpdate kernel headers to 4.15-rc8
David Ahern [Fri, 19 Jan 2018 20:33:41 +0000 (12:33 -0800)]
Update kernel headers to 4.15-rc8

Update kernel headers to commit 30c3e9d47035
("l2tp: remove switch block in l2tp_nl_cmd_session_create()")

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agotunnel: Return constant string without copying it
Serhey Popovych [Thu, 18 Jan 2018 14:04:36 +0000 (16:04 +0200)]
tunnel: Return constant string without copying it

tnl_strproto() returns constant strings, so there is no need
to copy them to a temporary buffer and then return that
buffer as const: return the constant string instead.
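
A minimal sketch of the pattern: string literals have static storage
duration, so they can be returned directly as const char * without a
scratch buffer. strproto_sketch() is hypothetical and the protocol names
it returns are illustrative, not necessarily the exact strings
tnl_strproto() uses.

```c
#include <netinet/in.h>

/* String literals live for the whole program: no buffer or copy needed. */
static const char *strproto_sketch(int proto)
{
	switch (proto) {
	case IPPROTO_IPIP:
		return "ip";
	case IPPROTO_GRE:
		return "gre";
	case IPPROTO_IPV6:
		return "ipv6";
	case 0:
		return "any";
	default:
		return "unknown";
	}
}
```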

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agovti6/tunnel: Unify and simplify link type help functions
Serhey Popovych [Thu, 18 Jan 2018 14:04:35 +0000 (16:04 +0200)]
vti6/tunnel: Unify and simplify link type help functions

Both of these changes are missing from link_vti6.c:

  commit 8b47135474cd ("ip: link: Unify link type help functions a bit")
  commit 561e650eff67 ("ip link: Shortify printing the usage of link type")

Replay them on link_vti6.c to bring its link type help functions
in line with the other tunneling code.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agovti/tunnel: Unify ikey/okey printing
Serhey Popovych [Thu, 18 Jan 2018 14:04:34 +0000 (16:04 +0200)]
vti/tunnel: Unify ikey/okey printing

For the vti6 tunnel we print [io]key in dotted-quad notation
(like an ipv4 address) while for vti we do that in hex format.

For the vti tunnel we print [io]key only if the value is not
zero, while for vti6 this check is missing.

Unify vti and vti6 tunnel [io]key output.

While here, enlarge the s2 buffer to the same size as in the rest
of the tunnel support code (64 bytes) and check the return value of
inet_ntop().
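
A sketch of the two output styles for a 32-bit key, with the zero check
and the inet_ntop() return value both handled. key_str_sketch() is
hypothetical, and since the commit text does not say which of the two
formats the unified code settled on, both are shown.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>

/*
 * Format a 32-bit key stored in network byte order, either as a dotted
 * quad or as hex. Returns NULL for a zero key (nothing to print) or when
 * inet_ntop() fails.
 */
static const char *key_str_sketch(uint32_t key_be, int dotted,
				  char *buf, size_t len)
{
	if (key_be == 0)
		return NULL;
	if (dotted)
		return inet_ntop(AF_INET, &key_be, buf, (socklen_t)len);
	snprintf(buf, len, "0x%x", (unsigned int)ntohl(key_be));
	return buf;
}
```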

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agogre/tunnel: Print erspan_index using print_uint()
Serhey Popovych [Thu, 18 Jan 2018 14:04:33 +0000 (16:04 +0200)]
gre/tunnel: Print erspan_index using print_uint()

It is missing from the JSON output because fprintf()
is used instead of print_uint().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoip/tunnel: Abstract tunnel encapsulation options printing
Serhey Popovych [Thu, 18 Jan 2018 14:04:32 +0000 (16:04 +0200)]
ip/tunnel: Abstract tunnel encapsulation options printing

Get rid of code duplication and consolidate encapsulation
options printing in a single function, tnl_print_encap().

Introduce and use tnl_encap_str() to format the encapsulation
option string according to a template and the given values to
avoid code duplication and simplify it.

Use print_string() instead of fputs() and fprintf() to
print encapsulation for !is_json_context().

Print the "unknown" parameter for the "encap" type in PRINT_FP
context using the "%s " format specifier and benefit from
compile-time string merging.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoip/tunnel: Use print_0xhex() instead of print_string()
Serhey Popovych [Thu, 18 Jan 2018 14:04:31 +0000 (16:04 +0200)]
ip/tunnel: Use print_0xhex() instead of print_string()

There is no need for a custom SPRINT_BUF() and snprintf() of
the 0x%x value into this buffer: we can use print_0xhex()
instead of print_string().

In link_iptnl.c use the s2 buffer instead of s1 and remove
s1.

While here, adjust the fwmark option print order in iptnl
and ip6tnl so that they match each other.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoip/tunnel: Simplify and unify tos printing
Serhey Popovych [Thu, 18 Jan 2018 14:04:30 +0000 (16:04 +0200)]
ip/tunnel: Simplify and unify tos printing

For ip tunnels, tos is 0 when not configured, 1 when
inherited from the encapsulated packet, and any other value
specifies diffserv (rfc2474) or tos (rfc1349) bits. It is
stored in the packet tos/diffserv field and returned to
userspace in the tos netlink attribute.
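
A sketch of the tos rule as described above: 0 is not printed, 1 is shown
as "inherit", and any other value is shown as hex. tos_str_sketch() is
hypothetical and the exact strings the real code prints may differ.

```c
#include <stddef.h>
#include <stdio.h>

/* 0: not configured, print nothing; 1: inherited; otherwise raw bits in hex. */
static const char *tos_str_sketch(unsigned char tos, char *buf, size_t len)
{
	if (tos == 0)
		return NULL;
	if (tos == 1)
		return "inherit";
	snprintf(buf, len, "0x%x", (unsigned int)tos);
	return buf;
}
```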

Simplify and unify tos printing by using print_0xhex()
and print_string() instead of fprintf() to output values.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoip/tunnel: Correct and unify ttl/hoplimit printing
Serhey Popovych [Thu, 18 Jan 2018 14:04:29 +0000 (16:04 +0200)]
ip/tunnel: Correct and unify ttl/hoplimit printing

Both ttl and hoplimit range from 1 to 255. Zero has a special meaning:
use the encapsulated packet's value. In ip-link(8) -d output this
looks like "ttl/hoplimit inherit". In JSON we have the "int" type
for ttl and therefore values from 0 (inherit) to 255.

To handle ttl/hoplimit as well as possible we need to accept
both cases: a missing attribute in the netlink dump and a zero value
for the "inherit" case. The latter has been broken since JSON output
was introduced for gre/iptnl and never worked for
gre6/ip6tnl.

For all tunnels except ip6tnl, change the JSON type from "int" to
"uint" to reflect the true nature of ttl.
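
A sketch of treating a missing attribute and a zero value the same way,
as described above. A NULL pointer stands in for a missing netlink
attribute; ttl_str_sketch() is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* 'ttl' == NULL models a missing netlink attribute. */
static const char *ttl_str_sketch(const unsigned char *ttl, char *buf, size_t len)
{
	/* Both "attribute absent" and 0 mean: inherit from the inner packet. */
	if (ttl == NULL || *ttl == 0)
		return "inherit";
	snprintf(buf, len, "%u", (unsigned int)*ttl);
	return buf;
}
```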

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoiplink: Use ll_index_to_name() instead of if_indextoname()
Serhey Popovych [Thu, 18 Jan 2018 14:04:28 +0000 (16:04 +0200)]
iplink: Use ll_index_to_name() instead of if_indextoname()

There are two reasons for switching to the cached variant:

  1) ll_index_to_name() may return the result from the cache,
     eliminating an expensive ioctl() to the kernel.

     Note that most of the code has already switched from plain
     if_indextoname() to the cached ll_index_to_name() variant
     in the print path, because in most cases the cache is populated.

  2) It always returns a name, in the form "if%d" if the
     entry is not in the cache and ioctl() fails. This drops
     "link_index" from the JSON output.
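
The fallback in point 2 can be sketched with if_indextoname(3) from
<net/if.h>. index_to_name_sketch() is hypothetical and keeps no cache; it
only shows the "if%d"-style name produced when the lookup fails.

```c
#include <net/if.h>
#include <stdio.h>

/* Return the name of interface 'idx', or "if<idx>" when it cannot be resolved. */
static const char *index_to_name_sketch(unsigned int idx, char buf[IF_NAMESIZE])
{
	if (if_indextoname(idx, buf) != NULL)
		return buf;
	snprintf(buf, IF_NAMESIZE, "if%u", idx);
	return buf;
}
```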

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>