]> git.proxmox.com Git - mirror_iproute2.git/log
mirror_iproute2.git
3 years agodevlink: Add devlink port health command
Vladyslav Tarasiuk [Sun, 19 Jul 2020 13:36:02 +0000 (16:36 +0300)]
devlink: Add devlink port health command

Add devlink port health show subcommand which displays information about
specified port reporter or all present port reporters as in the example.
Device and port reporters can be distinguished by a handle being used.

Make other devlink-health subcommands be aliased by devlink port health.
Refactor devlink-health commands for usage of port handles in order to
interact with port reporters.

Change devlink health show output to dump information about both device
and port reporters with correct handles.

Example:
$ devlink health show
pci/0000:00:0b.0:
  reporter fw
    state healthy error 0 recover 0 auto_dump true
  reporter fw_fatal
    state healthy error 0 recover 0 grace_period 1200000 auto_recover true auto_dump true
pci/0000:00:0b.0/1:
  reporter tx
    state healthy error 0 recover 0 grace_period 10000 auto_recover true auto_dump true
  reporter rx
    state healthy error 0 recover 0 grace_period 10000 auto_recover true auto_dump true

$ devlink health show pci/0000:00:0b.0/1 reporter rx
Which is equivalent to:
$ devlink port health show pci/0000:00:0b.0/1 reporter rx
pci/0000:00:0b.0/1:
  reporter rx
    state healthy error 0 recover 0 grace_period 10000 auto_recover true auto_dump true

$ devlink port health show pci/0000:00:0b.0/1 reporter rx -j --pretty
{
    "health": {
         "pci/0000:00:0b.0/1": [ {
                 "reporter": "rx",
                 "state": "healthy",
                 "error": 0,
                 "recover": 0,
                 "grace_period": 500,
                 "auto_recover": true,
                 "auto_dump": true
              } ]
    }
}

$ devlink health set pci/0000:00:0b.0/1 reporter rx grace_period 5000
Which is equivalent to:
$ devlink port health set pci/0000:00:0b.0/1 reporter rx grace_period 5000

$ devlink port health show pci/0000:00:0b.0/1 reporter rx
pci/0000:00:0b.0/1:
  reporter rx
    state healthy error 0 recover 0 grace_period 5000 auto_recover true auto_dump true

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodevlink: Add a possibility to print arrays of devlink port handles
Vladyslav Tarasiuk [Sun, 19 Jul 2020 13:36:01 +0000 (16:36 +0300)]
devlink: Add a possibility to print arrays of devlink port handles

Add a capability of printing port handles for arrays in non-JSON format
in devlink-health manner.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoMerge branch 'tc-qevent-block' into next
David Ahern [Mon, 20 Jul 2020 16:36:41 +0000 (16:36 +0000)]
Merge branch 'tc-qevent-block' into next

Petr Machata  says:

====================

When a list of filters at a given block is requested, tc first validates
that the block exists before doing the filter query. Currently the
validation routine checks ingress and egress blocks. But now that blocks
can be bound to qevents as well, qevent blocks should be looked for as
well:

    # ip link add up type dummy
    # tc qdisc add dev dummy1 root handle 1: \
         red min 30000 max 60000 avpkt 1000 qevent early_drop block 100
    # tc filter add block 100 pref 1234 handle 102 matchall action drop
    # tc filter show block 100
    Cannot find block "100"

This patchset fixes this issue:

    # tc filter show block 100
    filter protocol all pref 1234 matchall chain 0
    filter protocol all pref 1234 matchall chain 0 handle 0x66
      not_in_hw
            action order 1: gact action drop
             random type none pass val 0
             index 2 ref 1 bind 1

In patch #1, the helpers and necessary infrastructure is introduced,
including a new qdisc_util callback that implements sniffing out bound
blocks in a given qdisc.

In patch #2, RED implements the new callback.

v3:
- Patch #1:
    - Do not pass &ctx->found directly to has_block. Do it through a
      helper variable, so that the callee does not overwrite the result
      already stored in ctx->found.

v2:
- Patch #1:
    - In tc_qdisc_block_exists_cb(), do not initialize 'q'.
    - Propagate upwards errors from q->has_block.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agotc: q_red: Implement has_block for RED
Petr Machata [Thu, 16 Jul 2020 16:47:08 +0000 (19:47 +0300)]
tc: q_red: Implement has_block for RED

In order for "tc filter show block X" to find a given block, implement the
has_block callback.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agotc: Look for blocks in qevents
Petr Machata [Thu, 16 Jul 2020 16:47:07 +0000 (19:47 +0300)]
tc: Look for blocks in qevents

When a list of filters at a given block is requested, tc first validates
that the block exists before doing the filter query. Currently the
validation routine checks ingress and egress blocks. But now that blocks
can be bound to qevents as well, qevent blocks should be looked for as
well.

In order to support that, extend struct qdisc_util with a new callback,
has_block. That should report whether, give the attributes in TCA_OPTIONS,
a blocks with a given number is bound to a qevent. In
tc_qdisc_block_exists_cb(), invoke that callback when set.

Add a helper to the tc_qevent module that walks the list of qevents and
looks for a given block. This is meant to be used by the individual qdiscs.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoss: mptcp: add msk diag interface support
Paolo Abeni [Fri, 10 Jul 2020 13:52:35 +0000 (15:52 +0200)]
ss: mptcp: add msk diag interface support

This implement support for MPTCP sockets type, comprising
extended socket info. Note that we need to add an extended
attribute carrying the actual protocol number to the diag
request.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoUpdate kernel headers
David Ahern [Tue, 14 Jul 2020 23:56:53 +0000 (23:56 +0000)]
Update kernel headers

Update kernel headers to commit:
    81adcd65b685 ("ksz884x: switch from 'pci_' to 'dma_' API")

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoMerge branch 'main' into next
David Ahern [Tue, 14 Jul 2020 23:52:43 +0000 (23:52 +0000)]
Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoip xfrm: policy: support policies with IF_ID in get/delete/deleteall
Eyal Birger [Thu, 9 Jul 2020 06:29:48 +0000 (09:29 +0300)]
ip xfrm: policy: support policies with IF_ID in get/delete/deleteall

The XFRMA_IF_ID attribute is set in policies for them to be
associated with an XFRM interface (4.19+).

Add support for getting/deleting policies with this attribute.

For supporting 'deleteall' the XFRMA_IF_ID attribute needs to be
explicitly copied.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoip xfrm: update man page on setting/printing XFRMA_IF_ID in states/policies
Eyal Birger [Thu, 9 Jul 2020 06:29:47 +0000 (09:29 +0300)]
ip xfrm: update man page on setting/printing XFRMA_IF_ID in states/policies

In commit aed63ae1acb9 ("ip xfrm: support setting/printing XFRMA_IF_ID attribute in states/policies")
I added the ability to set/print the xfrm interface ID without updating
the man page.

Fixes: aed63ae1acb9 ("ip xfrm: support setting/printing XFRMA_IF_ID attribute in states/policies")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agotipc: fixed a compile warning in tipc/link.c
Hoang Huu Le [Thu, 9 Jul 2020 04:25:55 +0000 (11:25 +0700)]
tipc: fixed a compile warning in tipc/link.c

Fixes: 5027f233e35b ("tipc: add link broadcast get")
Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agobridge: fdb get: add missing json init (new_json_obj)
Julien Fortin [Fri, 10 Jul 2020 00:53:02 +0000 (02:53 +0200)]
bridge: fdb get: add missing json init (new_json_obj)

'bridge fdb get' has json support but the json object is never initialized

before patch:

$ bridge -j fdb get 56:23:28:4f:4f:e5 dev vx0
56:23:28:4f:4f:e5 dev vx0 master br0 permanent
$

after patch:

$ bridge -j fdb get 56:23:28:4f:4f:e5 dev vx0 | \
python -c \
'import sys,json;print(json.dumps(json.loads(sys.stdin.read()),indent=4))'
[
    {
        "master": "br0",
        "mac": "56:23:28:4f:4f:e5",
        "flags": [],
        "ifname": "vx0",
        "state": "permanent"
    }
]
$

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoconfigure: support ipset version 7 with kernel version 5
Tony Ambardar [Tue, 7 Jul 2020 07:58:33 +0000 (00:58 -0700)]
configure: support ipset version 7 with kernel version 5

The configure script checks for ipset v6 availability but doesn't test
for v7, which is backward compatible and used on kernel v5.x systems.
Update the script to test for both ipset versions. Without this change,
the tc ematch function em_ipset will be disabled.

Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoip address: remove useless include
Andrea Claudi [Tue, 7 Jul 2020 19:49:47 +0000 (21:49 +0200)]
ip address: remove useless include

utils.h is included two times in ipaddress.c, there is no need for that.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agogenl: use <> for system includes
Stephen Hemminger [Wed, 8 Jul 2020 15:41:24 +0000 (08:41 -0700)]
genl: use <> for system includes

Be consistent about local versus system headers.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agortacct: drop unused header
Stephen Hemminger [Wed, 8 Jul 2020 15:40:20 +0000 (08:40 -0700)]
rtacct: drop unused header

3 years agoiplink_bareudp: use common include syntax
Stephen Hemminger [Wed, 8 Jul 2020 15:38:58 +0000 (08:38 -0700)]
iplink_bareudp: use common include syntax

Follow the precedent of other parts of iproute2 follow the example of:
  Standard libc headers
  Linux headers

  Iproute2 support headers

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodevlink: add 'disk' to 'fw_load_policy' string validation
Louis Peens [Fri, 19 Jun 2020 11:50:07 +0000 (13:50 +0200)]
devlink: add 'disk' to 'fw_load_policy' string validation

The 'fw_load_policy' devlink parameter supports the 'disk' value
since kernel v5.4, seems like there was some oversight in adding
this to iproute, fixed by this patch.

Signed-off-by: Louis Peens <louis.peens@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodevlink: Document zero policer identifier
Ido Schimmel [Tue, 23 Jun 2020 20:06:09 +0000 (23:06 +0300)]
devlink: Document zero policer identifier

When setting a policer to a trap group, a value of "0" will unbind the
currently bound policer from the group.

The behavior is intentional and tested in kernel selftests, so document
it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Suggested-by: Alex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agotc: flower: support multiple MPLS LSE match
Guillaume Nault [Wed, 1 Jul 2020 19:49:18 +0000 (21:49 +0200)]
tc: flower: support multiple MPLS LSE match

Add the new "mpls" keyword that can be used to match MPLS fields in
arbitrary Label Stack Entries.
LSEs are introduced by the "lse" keyword and followed by LSE options:
"depth", "label", "tc", "bos" and "ttl". The depth is manadtory, the
other options are optionals.

For example, the following filter drops MPLS packets having two labels,
where the first label is 21 and has TTL 64 and the second label is 22:

$ tc filter add dev ethX ingress proto mpls_uc flower mpls \
    lse depth 1 label 21 ttl 64 \
    lse depth 2 label 22 bos 1 \
    action drop

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoip link: initial support for bareudp devices
Guillaume Nault [Wed, 1 Jul 2020 19:45:04 +0000 (21:45 +0200)]
ip link: initial support for bareudp devices

Bareudp devices provide a generic L3 encapsulation for tunnelling
different protocols like MPLS, IP, NSH, etc. inside a UDP tunnel.

This patch is based on original work from Martin Varghese:
https://lore.kernel.org/netdev/1570532361-15163-1-git-send-email-martinvarghesenokia@gmail.com/

Examples:

  - ip link add dev bareudp0 type bareudp dstport 6635 ethertype mpls_uc

This creates a bareudp tunnel device which tunnels L3 traffic with
ethertype 0x8847 (unicast MPLS traffic). The destination port of the
UDP header will be set to 6635. The device will listen on UDP port 6635
to receive traffic.

  - ip link add dev bareudp0 type bareudp dstport 6635 ethertype ipv4 multiproto

Same as the MPLS example, but for IPv4. The "multiproto" keyword allows
the device to also tunnel IPv6 traffic.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agolib: fix checking of returned file handle size for cgroup
Dmitry Yakunin [Sun, 5 Jul 2020 16:18:12 +0000 (19:18 +0300)]
lib: fix checking of returned file handle size for cgroup

Before this patch check is happened only in case when we try to find
cgroup at cgroup2 mount point.

v2:
  - add Fixes line before Signed-off-by (David Ahern)

Fixes: d5e6ee0dac64 ("ss: introduce cgroup2 cache and helper functions")
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoip fou: respect preferred_family for IPv6
Sorah Fukumori [Thu, 25 Jun 2020 21:07:12 +0000 (06:07 +0900)]
ip fou: respect preferred_family for IPv6

ip(8) accepts -family ipv6 (-6) option at the toplevel. It is
straightforward to support the existing option for modifying listener
on IPv6 addresses.

Maintain the backward compatibility by leaving ip fou -6 flag
implemented, while it's removed from the usage message.

Signed-off-by: Sorah Fukumori <her@sorah.jp>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agotc: improve the qdisc show command
Anton Danilov [Fri, 3 Jul 2020 15:39:22 +0000 (18:39 +0300)]
tc: improve the qdisc show command

Before can be possible show only all qeueue disciplines on an interface.
There wasn't a way to get the qdisc info by handle or parent, only full
dump of the disciplines with a following grep/sed usage.

Now new and old options work as expected to filter a qdisc by handle or
parent.

Full syntax of the qdisc show command:

tc qdisc { show | list } [ dev STRING ] [ QDISC_ID ] [ invisible ]
  QDISC_ID := { root | ingress | handle QHANDLE | parent CLASSID }

This change doesn't require any changes in the kernel.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agouapi: update bpf.h
Stephen Hemminger [Mon, 6 Jul 2020 17:54:35 +0000 (10:54 -0700)]
uapi: update bpf.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodevlint-health.8: use a single-font macro for a single argument
Bjarni Ingi Gislason [Mon, 29 Jun 2020 00:42:48 +0000 (00:42 +0000)]
devlint-health.8: use a single-font macro for a single argument

Use a single font macro for a single argument.

  Remove unnecessary quotes for a single-font macro.

  Join two lines into one.

  The output of "nroff" and "groff" is unchanged.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodevlink-dev.8: use a single-font macro for one argument
Bjarni Ingi Gislason [Sun, 28 Jun 2020 23:58:40 +0000 (23:58 +0000)]
devlink-dev.8: use a single-font macro for one argument

Use a single-font macro for one argument.

  Remove unnecessary quotes for a single font macro.

  Join some lines into one.

  The output of "nroff" and "groff" is unchanged, except for a font
change in two lines.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodevlink.8: Use a single-font macro for a single argument
Bjarni Ingi Gislason [Sun, 28 Jun 2020 23:26:12 +0000 (23:26 +0000)]
devlink.8: Use a single-font macro for a single argument

Use a single-font macro for a single argument

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoman8/bridge.8: fix misuse of two-fonts macros
Bjarni Ingi Gislason [Sun, 28 Jun 2020 22:46:26 +0000 (22:46 +0000)]
man8/bridge.8: fix misuse of two-fonts macros

Use a single-font macro for a single argument.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agolibnetlink.3: display section numbers in roman font, not boldface
Bjarni Ingi Gislason [Sun, 28 Jun 2020 16:26:15 +0000 (16:26 +0000)]
libnetlink.3: display section numbers in roman font, not boldface

Typeset section numbers in roman font, see man-pages(7).

###

  Details:

Output is from: test-groff -b -mandoc -T utf8 -rF0 -t -w w -z

  [ "test-groff" is a developmental version of "groff" ]

<./man/man3/libnetlink.3>:53 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:132 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:134 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:197 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:198 (macro BR): only 1 argument, but more are expected

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
3 years agoMerge branch 'rdma-raw-format-dumps' into next
David Ahern [Sun, 5 Jul 2020 18:11:49 +0000 (18:11 +0000)]
Merge branch 'rdma-raw-format-dumps' into next

Leon Romanovsky  says:

====================

The following series adds support to get the RDMA resource data in RAW
format. The main motivation for doing this is to enable vendors to
return the entire QP/CQ/MR data without a need from the vendor to set
each field separately.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agordma: Add support to get MR in raw format
Maor Gottlieb [Wed, 24 Jun 2020 10:40:12 +0000 (13:40 +0300)]
rdma: Add support to get MR in raw format

Add the required support to print MR data in raw format.
Example:

$rdma res show mr dev mlx5_1 mrn 2 -r -j
[{"ifindex":7,"ifname":"mlx5_1",
"data":[0,4,255,254,0,0,0,0,0,0,0,0,16,28,0,216,...]}]

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agordma: Add support to get CQ in raw format
Maor Gottlieb [Wed, 24 Jun 2020 10:40:11 +0000 (13:40 +0300)]
rdma: Add support to get CQ in raw format

Add the required support to print CQ data in raw format.
Example:

$rdma res show cq dev mlx5_2 cqn 1 -r -j
[{"ifindex":8,"ifname":"mlx5_2",
"data":[0,4,255,254,0,0,0,0,0,0,0,0,16,28,...]}]

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agordma: Add support to get QP in raw format
Maor Gottlieb [Wed, 24 Jun 2020 10:40:10 +0000 (13:40 +0300)]
rdma: Add support to get QP in raw format

Add 'raw' argument to get the resource in raw format.
When RDMA_NLDEV_ATTR_RES_RAW is set in the netlink message,
then the resource fields are in raw format, print it as byte array.

Example:
$rdma res show qp link rocep0s12f0/1 lqpn 1137 -j -r
[{"ifindex":7,"ifname":"mlx5_1","port":1,
"data":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...]}]

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agordma: update uapi headers
Maor Gottlieb [Wed, 24 Jun 2020 10:40:09 +0000 (13:40 +0300)]
rdma: update uapi headers

Update rdma_netlink.h file upto kernel commit 65959522f806
("RDMA: Add support to dump resource tracker in RAW format")

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoMerge branch 'tc-qevents' into next
David Ahern [Sun, 5 Jul 2020 15:45:48 +0000 (15:45 +0000)]
Merge branch 'tc-qevents' into next

Petr Machata  says:

====================

To allow configuring user-defined actions as a result of inner workings of
a qdisc, a concept of qevents was recently introduced to the kernel.
Qevents are attach points for TC blocks, where filters can be put that are
executed as the packet hits well-defined points in the qdisc algorithms.
The attached blocks can be shared, in a manner similar to clsact ingress
and egress blocks, arbitrary classifiers with arbitrary actions can be put
on them, etc.

For example:

 # tc qdisc add dev eth0 root handle 1: \
red limit 500K avpkt 1K qevent early_drop block 10
 # tc filter add block 10 \
matchall action mirred egress mirror dev eth1

This patch set introduces the corresponding iproute2 support. Patch #1 adds
the new netlink attribute enumerators. Patch #2 adds a set of helpers to
implement qevents, and #3 adds a generic documentation to tc.8. Patch #4
then adds two new qevents to the RED qdisc: mark and early_drop.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agotc: q_red: Add support for qevents "mark" and "early_drop"
Petr Machata [Tue, 30 Jun 2020 10:14:52 +0000 (13:14 +0300)]
tc: q_red: Add support for qevents "mark" and "early_drop"

The "early_drop" qevent matches packets that have been early-dropped. The
"mark" qevent matches packets that have been ECN-marked.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoman: tc: Describe qevents
Petr Machata [Tue, 30 Jun 2020 10:14:51 +0000 (13:14 +0300)]
man: tc: Describe qevents

Add some general remarks about qevents.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agotc: Add helpers to support qevent handling
Petr Machata [Tue, 30 Jun 2020 10:14:50 +0000 (13:14 +0300)]
tc: Add helpers to support qevent handling

Introduce a set of helpers to make it easy to add support for qevents into
qdisc.

The idea behind this is that qevent types will be generally reused between
qdiscs, rather than each having a completely idiosyncratic set of qevents.
The qevent module holds functions for parsing, dumping and formatting of
these common qevent types, and for dispatch to the appropriate set of
handlers based on the qevent name.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoaction police: make 'mtu' could be set independently in police action
Po Liu [Mon, 29 Jun 2020 02:04:20 +0000 (10:04 +0800)]
action police: make 'mtu' could be set independently in police action

Current police action must set 'rate' and 'burst'. 'mtu' parameter
set the max frame size and could be set alone without 'rate' and 'burst'
in some situation. Offloading to hardware for example, 'mtu' could limit
the flow max frame size.

Signed-off-by: Po Liu <po.liu@nxp.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoaction police: change the print message quotes style
Po Liu [Mon, 29 Jun 2020 02:04:19 +0000 (10:04 +0800)]
action police: change the print message quotes style

Change the double quotes to single quotes in fprintf message to make it
more readable.

Signed-off-by: Po Liu <po.liu@nxp.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoadd support to keepalived rtm_protocol
Alexandre Cassen [Wed, 24 Jun 2020 16:21:25 +0000 (18:21 +0200)]
add support to keepalived rtm_protocol

Following inclusion in net-next, extend rtnl_rtprot_tab and rt_protos
to support Keepalived.

Signed-off-by: Alexandre Cassen <acassen@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoMerge branch 'devlink-port-mac-addr' into next
David Ahern [Sun, 5 Jul 2020 14:49:53 +0000 (14:49 +0000)]
Merge branch 'devlink-port-mac-addr' into next

Parav Pandit  says:

====================

Currently ip link set dev <pfndev> vf <vf_num> <param> <value> has
few below limitations.

1. Command is limited to set VF parameters only.
It cannot set the default MAC address for the PCI PF.

2. It can be set only on system where PCI SR-IOV is supported.
In smartnic based system, eswitch of a NIC resides on a different
embedded cpu which has the VF and PF representors for the SR-IOV
support on a host system in which this smartnic is plugged-in.

3. It cannot setup the function attributes of sub-function described
in detail in comprehensive RFC [1] and [2].

This series covers the first small part to let user query and set MAC
address (hardware address) of a PCI PF/VF which is represented by
devlink port.

[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
[2] https://marc.info/?l=linux-netdev&m=158555928517777&w=2

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodevlink: Support setting port function hardware address
Parav Pandit [Tue, 23 Jun 2020 10:44:25 +0000 (10:44 +0000)]
devlink: Support setting port function hardware address

Support setting devlink port function hardware address.

Example of a PCI VF port which supports a port function:
Set hardware address of the VF's port function.

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port function set pci/0000:06:00.0/2 hw_addr 00:11:22:33:44:55

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:11:22:33:44:55

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodevlink: Support querying hardware address of port function
Parav Pandit [Tue, 23 Jun 2020 10:44:24 +0000 (10:44 +0000)]
devlink: Support querying hardware address of port function

Add support to query the hardware address of function represented
by devlink port function.

Example of a PCI VF port which supports a port function:
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:11:22:33:44:66

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "enp6s0pf0vf1",
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "function": {
                "hw_addr": "00:11:22:33:44:66"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodevlink: Move devlink port code at start to reuse
Parav Pandit [Tue, 23 Jun 2020 10:44:23 +0000 (10:44 +0000)]
devlink: Move devlink port code at start to reuse

To reuse print routines for port function in subsequent patch, move
print routine specific to devlink device at start of the file.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoUpdate kernel headers
David Ahern [Sun, 5 Jul 2020 14:33:15 +0000 (14:33 +0000)]
Update kernel headers

Update kernel headers to commit:
   e1f046704404 ("Merge branch 'qlogic-use-generic-power-management'")

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoman/tc: remove obsolete reference to ipchains
Stephen Hemminger [Wed, 24 Jun 2020 19:13:46 +0000 (12:13 -0700)]
man/tc: remove obsolete reference to ipchains

It isn't Linux 2.2 anymore.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoip address: Fix loop initial declarations are only allowed in C99
Roi Dayan [Thu, 11 Jun 2020 17:35:43 +0000 (20:35 +0300)]
ip address: Fix loop initial declarations are only allowed in C99

On some distros, i.e. rhel 7.6, compilation fails with the following:

ipaddress.c: In function â€˜lookup_flag_data_by_name’:
ipaddress.c:1260:2: error: â€˜for’ loop initial declarations are only allowed in C99 mode
  for (int i = 0; i < ARRAY_SIZE(ifa_flag_data); ++i) {
  ^
ipaddress.c:1260:2: note: use option -std=c99 or -std=gnu99 to compile your code

This commit fixes the single place needed for compilation to pass.

Fixes: 9d59c86e575b ("iproute2: ip addr: Organize flag properties structurally")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agouapi: update to magic.h
Stephen Hemminger [Thu, 11 Jun 2020 16:52:38 +0000 (09:52 -0700)]
uapi: update to magic.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodevlink: Add 'mirror' trap action
Ido Schimmel [Sun, 7 Jun 2020 08:36:47 +0000 (11:36 +0300)]
devlink: Add 'mirror' trap action

Allow setting 'mirror' trap action for traps that support it. Extend the
devlink-trap man page and bash completion accordingly.

Example:

# devlink -jp trap show netdevsim/netdevsim10 trap igmp_query
{
    "trap": {
        "netdevsim/netdevsim10": [ {
                "name": "igmp_query",
                "type": "control",
                "generic": true,
                "action": "mirror",
                "group": "mc_snooping"
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodevlink: Add 'control' trap type
Ido Schimmel [Sun, 7 Jun 2020 08:36:46 +0000 (11:36 +0300)]
devlink: Add 'control' trap type

This type is used for traps that trap control packets such as ARP
request and IGMP query to the CPU.

Example:

# devlink -jp trap show netdevsim/netdevsim10 trap igmp_v1_report
{
    "trap": {
        "netdevsim/netdevsim10": [ {
                "name": "igmp_v1_report",
                "type": "control",
                "generic": true,
                "action": "trap",
                "group": "mc_snooping"
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodevlink: update include files
Stephen Hemminger [Thu, 11 Jun 2020 16:46:46 +0000 (09:46 -0700)]
devlink: update include files

Use the tool iwyu to get more complete list of includes for
all the bits used by devlink.

This should also fix build with musl libc.

Fixes: c4dfddccef4e ("fix JSON output of mon command")
Reported-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agobridge: support for nexthop id in fdb entries
Roopa Prabhu [Thu, 11 Jun 2020 00:35:21 +0000 (17:35 -0700)]
bridge: support for nexthop id in fdb entries

This patch adds support to assign a nexthop group
id to an fdb entry.

$bridge fdb add 02:02:00:00:00:13 dev vx10 nhid 102 self

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agoipnexthop: support for fdb nexthops
Roopa Prabhu [Thu, 11 Jun 2020 00:35:20 +0000 (17:35 -0700)]
ipnexthop: support for fdb nexthops

This patch adds support to add and delete
ecmp nexthops of type fdb. Such nexthops can
be linked to vxlan fdb entries.

$ip nexthop add id 12 via 172.16.1.2 fdb
$ip nexthop add id 13 via 172.16.1.3 fdb
$ip nexthop add id 102 group 12/13 fdb

$bridge fdb add 02:02:00:00:00:13 dev vx10 nhid 102 self

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agoMerge branch 'master' into next
David Ahern [Mon, 8 Jun 2020 14:40:54 +0000 (14:40 +0000)]
Merge branch 'master' into next

Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agouapi: update headers
Stephen Hemminger [Fri, 5 Jun 2020 15:36:54 +0000 (08:36 -0700)]
uapi: update headers

Update kernel headers from 5.8.0 merge

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoMerge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next
Stephen Hemminger [Fri, 5 Jun 2020 15:33:29 +0000 (08:33 -0700)]
Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next

3 years agov5.7.0
Stephen Hemminger [Wed, 3 Jun 2020 03:35:00 +0000 (20:35 -0700)]
v5.7.0

3 years agonexthop: Fix Deletion display
Donald Sharp [Sat, 30 May 2020 12:16:37 +0000 (08:16 -0400)]
nexthop: Fix Deletion display

Actually display that deletions are happening
when monitoring nexthops.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agouapi: fix comment in xfrm.h
Stephen Hemminger [Mon, 1 Jun 2020 15:07:02 +0000 (08:07 -0700)]
uapi: fix comment in xfrm.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoiproute2: ip addr: Add support for setting 'optimistic'
Ian K. Coolidge [Wed, 27 May 2020 18:03:46 +0000 (11:03 -0700)]
iproute2: ip addr: Add support for setting 'optimistic'

optimistic DAD is controllable via sysctl for an interface
or all interfaces on the system. This would affect addresses
added by the kernel only.

Recent kernels, however, have enabled support for adding optimistic
address via userspace. This plumbs that support.

Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agoiproute2: ip addr: Organize flag properties structurally
Ian K. Coolidge [Wed, 27 May 2020 18:03:45 +0000 (11:03 -0700)]
iproute2: ip addr: Organize flag properties structurally

This creates a nice systematic way to check that the various flags are
mutable from userspace and that the address family is valid.

Mutability properties are preserved to avoid introducing any behavioral
change in this CL. However, previously, immutable flags were ignored and
fell through to this confusing error:

Error: either "local" is duplicate, or "dadfailed" is a garbage.

But now, they just warn more explicitly:

Warning: dadfailed option is not mutable from userspace
Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agotc: report time an action was first used
Roman Mashak [Thu, 28 May 2020 01:22:47 +0000 (21:22 -0400)]
tc: report time an action was first used

Have print_tm() dump firstuse value along with install, lastuse
and expires.

v2: Resubmit after 'master' merged into next

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agobpf: Fixes a snprintf truncation warning
Andrea Claudi [Tue, 26 May 2020 16:04:11 +0000 (18:04 +0200)]
bpf: Fixes a snprintf truncation warning

gcc v9.3.1 reports:

bpf.c: In function â€˜bpf_get_work_dir’:
bpf.c:784:49: warning: â€˜snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
  784 |  snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
      |                                                 ^
bpf.c:784:2: note: â€˜snprintf’ output between 2 and 4097 bytes into a destination of size 4096
  784 |  snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this simply checking snprintf return code and properly handling the error.

Fixes: e42256699cac ("bpf: make tc's bpf loader generic and move into lib")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoRevert "bpf: replace snprintf with asprintf when dealing with long buffers"
Andrea Claudi [Tue, 26 May 2020 16:04:10 +0000 (18:04 +0200)]
Revert "bpf: replace snprintf with asprintf when dealing with long buffers"

This reverts commit c0325b06382cb4f7ebfaf80c29c8800d74666fd9.
It introduces a segfault in bpf_make_custom_path() when custom pinning is used.

This happens because asprintf allocates exactly the space needed to hold a
string in the buffer passed as its first argument, but if this buffer is later
used in strcat() or similar we have a buffer overrun.

As the aim of commit c0325b06382c is simply to fix a compiler warning, it
seems safe and reasonable to revert it.

Fixes: c0325b06382c ("bpf: replace snprintf with asprintf when dealing with long buffers")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoMerge branch 'master' into next
David Ahern [Wed, 27 May 2020 02:08:27 +0000 (02:08 +0000)]
Merge branch 'master' into next

Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agotipc: enable printing of broadcast rcv link stats
Tuong Lien [Tue, 26 May 2020 09:40:55 +0000 (16:40 +0700)]
tipc: enable printing of broadcast rcv link stats

This commit allows printing the statistics of a broadcast-receiver link
using the same tipc command, but with additional 'link' options:

$ tipc link stat show --help
Usage: tipc link stat show [ link { LINK | SUBSTRING | all } ]

With:
+ 'LINK'      : print the stats of the specific link 'LINK';
+ 'SUBSTRING' : print the stats of all the links having the 'SUBSTRING'
                in name;
+ 'all'       : print all the links' stats incl. the broadcast-receiver
                ones;

Also, a link stats can be reset in the usual way by specifying the link
name in command.

For example:

$ tipc l st sh l br
Link <broadcast-link>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:5011125 fragments:4968774/149643 bundles:38402/307061
  RX naks:781484 defs:0 dups:0
  TX naks:0 acks:0 retrans:330259
  Congestion link:50657  Send queue max:0 avg:0

Link <broadcast-link:1001001>
  Window:50 packets
  RX packets:95146 fragments:95040/1980 bundles:1/2
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:380938 defs:83962 dups:403
  TX naks:8362 acks:0 retrans:170662
  Congestion link:0  Send queue max:0 avg:0

Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:400546 defs:0 dups:0
  TX naks:0 acks:0 retrans:159597
  Congestion link:0  Send queue max:0 avg:0

$ tipc l st sh l 1001002
Link <1001003:data0-1001002:data0>
  ACTIVE  MTU:1500  Priority:10  Tolerance:1500 ms  Window:50 packets
  RX packets:99546 fragments:0/0 bundles:33/877
  TX packets:629 fragments:0/0 bundles:35/828
  TX profile sample:8 packets average:390 octets
  0-64:75% -256:0% -1024:0% -4096:25% -16384:0% -32768:0% -66000:0%
  RX states:488714 probes:7397 naks:0 defs:4 dups:5
  TX states:27734 probes:18016 naks:5 acks:2305 retrans:0
  Congestion link:0  Send queue max:0 avg:0

Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:400546 defs:0 dups:0
  TX naks:0 acks:0 retrans:159597
  Congestion link:0  Send queue max:0 avg:0

$ tipc l st re l broadcast-link:1001002

$ tipc l st sh l broadcast-link:1001002
Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:0 defs:0 dups:0
  TX naks:0 acks:0 retrans:0
  Congestion link:0  Send queue max:0 avg:0

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agolwtunnel: add support for rpl segment routing
Alexander Aring [Wed, 20 May 2020 23:57:27 +0000 (19:57 -0400)]
lwtunnel: add support for rpl segment routing

This patch adds support for rpl segment routing settings.
Example:

ip -n ns0 -6 route add 2001::3 encap rpl segs \
fe80::c8fe:beef:cafe:cafe,fe80::c8fe:beef:cafe:beef dev lowpan0

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agotc: action: fix time values output in JSON format
Roman Mashak [Wed, 20 May 2020 00:59:44 +0000 (20:59 -0400)]
tc: action: fix time values output in JSON format

Report tcf_t values in seconds, not jiffies, in JSON format as it is now
for stdout.

v2: use PRINT_ANY, drop the useless casts and fix the style (Stephen Hemminger)

Fixes: 2704bd625583 ("tc: jsonify actions core")
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agouapi: update to bpf.h
Stephen Hemminger [Tue, 19 May 2020 21:31:54 +0000 (14:31 -0700)]
uapi: update to bpf.h

Part of the zero-length array changes

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoutils: remove trailing zeros in print_time() and print_time64()
Eric Dumazet [Tue, 5 May 2020 19:46:18 +0000 (12:46 -0700)]
utils: remove trailing zeros in print_time() and print_time64()

Before :

tc qd sh dev eth1

... refill_delay 40.0ms timer_slack 10.000us horizon 10.000s

After :
... refill_delay 40ms timer_slack 10us horizon 10s

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoman: tc-ct.8: Add manual page for ct tc action
Paul Blakey [Thu, 14 May 2020 14:10:20 +0000 (17:10 +0300)]
man: tc-ct.8: Add manual page for ct tc action

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agotc: mqprio: reject queues count/offset pair count higher than num_tc
Maciej Fijalkowski [Wed, 13 May 2020 19:47:17 +0000 (21:47 +0200)]
tc: mqprio: reject queues count/offset pair count higher than num_tc

Provide a sanity check that will make sure whether queues count/offset
pair count will not exceed the actual number of TCs being created.

Example command that is invalid because there are 4 count/offset pairs
whereas num_tc is only 2.

 # tc qdisc add dev enp96s0f0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1
queues 4@0 4@4 4@8 4@12 hw 1 mode channel

Store the parsed count/offset pair count onto a dedicated variable that
will be compared against opt.num_tc after all of the command line
arguments were parsed. Bail out if this count is higher than opt.num_tc
and let user know about it.

Drivers were swallowing such commands as they were iterating over
count/offset pairs where num_tc was used as a delimiter, so this is not
a big deal, but better catch such misconfiguration at the command line
argument parsing level.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoss: add checks for bc filter support
Dmitry Yakunin [Sat, 9 May 2020 16:52:02 +0000 (19:52 +0300)]
ss: add checks for bc filter support

As noted by David Ahern, now if some bytecode filter is not supported
by running kernel printed error message is not clear. This patch is attempt to
detect such case and print correct message. This is done by providing checking
function for new filter types. As example check function for cgroup filter
is implemented. It sends correct lightweight request (idiag_states = 0)
with zero cgroup condition to the kernel and checks returned errno. If filter
is not supported EINVAL is returned. Result of checking is cached to
avoid extra checks if several same filters are specified.

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoss: add support for cgroup v2 information and filtering
Dmitry Yakunin [Sat, 9 May 2020 16:52:01 +0000 (19:52 +0300)]
ss: add support for cgroup v2 information and filtering

This patch introduces two new features: obtaining cgroup information and
filtering sockets by cgroups. These features work based on cgroup v2 ID
field in the socket (kernel should be compiled with CONFIG_SOCK_CGROUP_DATA).

Cgroup information can be obtained by specifying --cgroup flag and now contains
only pathname. For faster pathname lookups cgroup cache is implemented. This
cache is filled on ss startup and missed entries are resolved and saved
on the fly.

Cgroup filter extends EXPRESSION and allows to specify cgroup pathname
(relative or absolute) to obtain sockets attached only to this cgroup.
Filter syntax: ss [ cgroup PATHNAME ]
Examples:
    ss -a cgroup /sys/fs/cgroup/unified (or ss -a cgroup .)
    ss -a cgroup /sys/fs/cgroup/unified/cgroup1 (or ss -a cgroup cgroup1)

v2:
  - style fixes (David Ahern)

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoss: introduce cgroup2 cache and helper functions
Dmitry Yakunin [Sat, 9 May 2020 16:52:00 +0000 (19:52 +0300)]
ss: introduce cgroup2 cache and helper functions

This patch prepares infrastructure for matching sockets by cgroups.
Two helper functions are added for transformation between cgroup v2 ID
and pathname. Cgroup v2 cache is implemented as hash table indexed by ID.
This cache is needed for faster lookups of socket cgroup.

v2:
  - style fixes (David Ahern)

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoiproute2-next: add gate action man page
Po Liu [Fri, 8 May 2020 07:02:47 +0000 (15:02 +0800)]
iproute2-next: add gate action man page

This patch is to add the man page for the tc gate action.

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoiproute2-next:tc:action: add a gate control action
Po Liu [Fri, 8 May 2020 07:02:46 +0000 (15:02 +0800)]
iproute2-next:tc:action: add a gate control action

Introduce a ingress frame gate control flow action.
Tc gate action does the work like this:
Assume there is a gate allow specified ingress frames can pass at
specific time slot, and also drop at specific time slot. Tc filter
chooses the ingress frames, and tc gate action would specify what slot
does these frames can be passed to device and what time slot would be
dropped.
Tc gate action would provide an entry list to tell how much time gate
keep open and how much time gate keep state close. Gate action also
assign a start time to tell when the entry list start. Then driver would
repeat the gate entry list cyclically.
For the software simulation, gate action require the user assign a time
clock type.

Below is the setting example in user space. Tc filter a stream source ip
address is 192.168.0.20 and gate action own two time slots. One is last
200ms gate open let frame pass another is last 100ms gate close let
frames dropped.

 # tc qdisc add dev eth0 ingress
 # tc filter add dev eth0 parent ffff: protocol ip \

            flower src_ip 192.168.0.20 \
            action gate index 2 clockid CLOCK_TAI \
            sched-entry open 200000000ns -1 8000000b \
            sched-entry close 100000000ns

 # tc chain del dev eth0 ingress chain 0

"sched-entry" follow the name taprio style. Gate state is
"open"/"close". Follow the period nanosecond. Then next -1 is internal
priority value means which ingress queue should put to. "-1" means
wildcard. The last value optional specifies the maximum number of
MSDU octets that are permitted to pass the gate during the specified
time interval, the overlimit frames would be dropped.

Below example shows filtering a stream with destination mac address is
10:00:80:00:00:00 and ip type is ICMP, follow the action gate. The gate
action would run with one close time slot which means always keep close.
The time cycle is total 200000000ns. The base-time would calculate by:

     1357000000000 + (N + 1) * cycletime

When the total value is the future time, it will be the start time.
The cycletime here would be 200000000ns for this case.

 #tc filter add dev eth0 parent ffff:  protocol ip \
           flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
           action gate index 12 base-time 1357000000000ns \
           sched-entry CLOSE 200000000ns \
           clockid CLOCK_TAI

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoUpdate kernel headers and import tc_gate.h
David Ahern [Wed, 13 May 2020 02:18:15 +0000 (02:18 +0000)]
Update kernel headers and import tc_gate.h

Update kernel headers to commit:
    fb9f2e92864f ("net: dsa: tag_sja1105: appease sparse checks for ethertype accessors")
and import tc_act/tc_gate.h

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agotc: fq: fix two issues
Eric Dumazet [Tue, 5 May 2020 15:43:48 +0000 (08:43 -0700)]
tc: fq: fix two issues

My latest patch missed the fact that this file got JSON support.

Also fixes a spelling error added during JSON change.

Fixes: be9ca9d54123 ("tc: fq: add timer_slack parameter")
Fixes: d15e2bfc042b ("tc: fq: add support for JSON output")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoss: update to bw print
Stephen Hemminger [Tue, 5 May 2020 17:18:58 +0000 (10:18 -0700)]
ss: update to bw print

Display kilobit with the standard suffix.
Add comment to describe where data rate suffixes come from.
Add support for terrabit.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agodevlink: support kernel-side snapshot id allocation
Jakub Kicinski [Tue, 5 May 2020 17:04:57 +0000 (10:04 -0700)]
devlink: support kernel-side snapshot id allocation

Make ID argument optional and read the snapshot info
that kernel sends us.

$ devlink region new netdevsim/netdevsim1/dummy
netdevsim/netdevsim1/dummy: snapshot 0
$ devlink -jp region new netdevsim/netdevsim1/dummy
{
    "regions": {
        "netdevsim/netdevsim1/dummy": {
            "snapshot": [ 1 ]
        }
    }
}
$ devlink region show netdevsim/netdevsim1/dummy
netdevsim/netdevsim1/dummy: size 32768 snapshot [0 1]

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoss: add support for Gbit speeds in sprint_bw()
Eric Dumazet [Tue, 5 May 2020 15:37:41 +0000 (08:37 -0700)]
ss: add support for Gbit speeds in sprint_bw()

Also use 'g' specifier instead of 'f' to remove trailing zeros,
and increase precision.

Examples of output :
 Before        After
 8.0Kbps       8Kbps
 9.9Mbps       9.92Mbps
 55001Mbps     55Gbps

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoMerge branch 'master' into next
David Ahern [Tue, 5 May 2020 16:49:38 +0000 (16:49 +0000)]
Merge branch 'master' into next

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoImport rpl.h and rpl_iptunnel.h uapi headers
David Ahern [Tue, 5 May 2020 16:23:14 +0000 (16:23 +0000)]
Import rpl.h and rpl_iptunnel.h uapi headers

Import rpl.h and rpl_iptunnel.h as of kernel commit:
    354d86141796 ("Merge branch 'net-reduce-dynamic-lockdep-keys'")

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agotc: full JSON support for 'bpf' filter
Davide Caratti [Thu, 30 Apr 2020 18:03:30 +0000 (20:03 +0200)]
tc: full JSON support for 'bpf' filter

example using eBPF:

 # tc filter add dev dummy0 ingress bpf \
 > direct-action obj ./bpf/filter.o sec tc-ingress
 # tc  -j filter show dev dummy0 ingress | jq
 [
   {
     "protocol": "all",
     "pref": 49152,
     "kind": "bpf",
     "chain": 0
   },
   {
     "protocol": "all",
     "pref": 49152,
     "kind": "bpf",
     "chain": 0,
     "options": {
       "handle": "0x1",
       "bpf_name": "filter.o:[tc-ingress]",
       "direct-action": true,
       "not_in_hw": true,
       "prog": {
         "id": 101,
         "tag": "a04f5eef06a7f555",
         "jited": 1
       }
     }
   }
 ]

v2:
 - use print_nl(), thanks to Andrea Claudi
 - use print_0xhex() for filter handle, thanks to Stephen Hemminger

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoUpdate kernel headers
David Ahern [Tue, 5 May 2020 16:11:22 +0000 (16:11 +0000)]
Update kernel headers

Update kernel headers to commit:
    354d86141796 ("Merge branch 'net-reduce-dynamic-lockdep-keys'")

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoReplace open-coded instances of print_nl()
Benjamin Poirier [Fri, 1 May 2020 08:47:20 +0000 (17:47 +0900)]
Replace open-coded instances of print_nl()

Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agobridge: Align output columns
Benjamin Poirier [Fri, 1 May 2020 08:47:19 +0000 (17:47 +0900)]
bridge: Align output columns

Use fixed column widths to improve readability.

Before:
root@vsid:/src/iproute2# ./bridge/bridge vlan tunnelshow
port    vlan-id tunnel-id
vx0      2       2
         1010-1020       1010-1020
         1030    65556
vx-longname      2       2

After:
root@vsid:/src/iproute2# ./bridge/bridge vlan tunnelshow
port              vlan-id    tunnel-id
vx0               2          2
                  1010-1020  1010-1020
                  1030       65556
vx-longname       2          2

Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agojson_print: Return number of characters printed
Benjamin Poirier [Fri, 1 May 2020 08:47:18 +0000 (17:47 +0900)]
json_print: Return number of characters printed

When outputting in normal mode, forward the return value from
color_fprintf().

Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agobridge: Fix output with empty vlan lists
Benjamin Poirier [Fri, 1 May 2020 08:47:17 +0000 (17:47 +0900)]
bridge: Fix output with empty vlan lists

Consider this configuration:

ip link add br0 type bridge
ip link add vx0 type vxlan dstport 4789 external
ip link set dev vx0 master br0
bridge vlan del vid 1 dev vx0
ip link add vx1 type vxlan dstport 4790 external
ip link set dev vx1 master br0

root@vsid:/src/iproute2# ./bridge/bridge vlan
port    vlan-id
br0      1 PVID Egress Untagged

vx0     None
vx1      1 PVID Egress Untagged

root@vsid:/src/iproute2#

Note the useless and inconsistent empty lines.

root@vsid:/src/iproute2# ./bridge/bridge vlan tunnelshow
port    vlan-id tunnel-id
br0
vx0     None
vx1

What's the difference between "None" and ""?

root@vsid:/src/iproute2# ./bridge/bridge -j -p vlan tunnelshow
[ {
"ifname": "br0",
"tunnels": [ ]
    },{
"ifname": "vx1",
"tunnels": [ ]
    } ]

Why does vx0 appear in normal output and not json output?
Why output an empty list for br0 and vx1?

Fix these inconsistencies and avoid outputting entries with no values. This
makes the behavior consistent with other iproute2 commands, for example
`ip -6 addr`: if an interface doesn't have any ipv6 addresses, it is not
part of the listing.

Fixes: 8652eeb3ab12 ("bridge: vlan: support for per vlan tunnel info")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agobridge: Fix typo
Benjamin Poirier [Fri, 1 May 2020 08:47:16 +0000 (17:47 +0900)]
bridge: Fix typo

Fixes: 7abf5de677e3 ("bridge: vlan: add support to display per-vlan statistics")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agobridge: Use consistent column names in vlan output
Benjamin Poirier [Fri, 1 May 2020 08:47:15 +0000 (17:47 +0900)]
bridge: Use consistent column names in vlan output

Fix singular vs plural. Add a hyphen to clarify that each of those are
single fields.

Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agotc: f_flower: add options support for erspan
Xin Long [Mon, 27 Apr 2020 10:27:51 +0000 (18:27 +0800)]
tc: f_flower: add options support for erspan

This patch is to add TCA_FLOWER_KEY_ENC_OPTS_ERSPAN's parse and
print to implement erspan options support in m_tunnel_key, like
Commit 56155d4df86d ("tc: f_flower: add geneve option match
support to flower") for geneve options support.

Option is expressed as version:index:dir:hwid, dir and hwid will
be parsed when version is 2, while index will be parsed when
version is 1. erspan doesn't support multiple options.

With this patch, users can add and dump erspan options like:

  # ip link add name erspan1 type erspan external
  # tc qdisc add dev erspan1 ingress
  # tc filter add dev erspan1 protocol ip parent ffff: \
      flower \
        enc_src_ip 10.0.99.192 \
        enc_dst_ip 10.0.99.193 \
        enc_key_id 11 \
        erspan_opts 1:2:0:0/1:255:0:0 \
        ip_proto udp \
        action mirred egress redirect dev eth1
  # tc -s filter show dev erspan1 parent ffff:

     filter protocol ip pref 49152 flower chain 0 handle 0x1
       eth_type ipv4
       ip_proto udp
       enc_dst_ip 10.0.99.193
       enc_src_ip 10.0.99.192
       enc_key_id 11
       erspan_opts 1:2:0:0/1:255:0:0
       not_in_hw
         action order 1: mirred (Egress Redirect to device eth1) stolen
         index 1 ref 1 bind 1
         Action statistics:
         Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
         backlog 0b 0p requeues 0

v1->v2:
  - no change.
v2->v3:
  - no change.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print version, index, dir and hwid as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agotc: f_flower: add options support for vxlan
Xin Long [Mon, 27 Apr 2020 10:27:50 +0000 (18:27 +0800)]
tc: f_flower: add options support for vxlan

This patch is to add TCA_FLOWER_KEY_ENC_OPTS_VXLAN's parse and
print to implement vxlan options support in m_tunnel_key, like
Commit 56155d4df86d ("tc: f_flower: add geneve option match
support to flower") for geneve options support.

Option is expressed a 32bit number for gbp only, and vxlan
doesn't support multiple options.

With this patch, users can add and dump vxlan options like:

  # ip link add name vxlan1 type vxlan dstport 0 external
  # tc qdisc add dev vxlan1 ingress
  # tc filter add dev vxlan1 protocol ip parent ffff: \
      flower \
        enc_src_ip 10.0.99.192 \
        enc_dst_ip 10.0.99.193 \
        enc_key_id 11 \
        vxlan_opts 65793/4008635966 \
        ip_proto udp \
        action mirred egress redirect dev eth1
  # tc -s filter show dev vxlan1 parent ffff:

     filter protocol ip pref 49152 flower chain 0 handle 0x1
       eth_type ipv4
       ip_proto udp
       enc_dst_ip 10.0.99.193
       enc_src_ip 10.0.99.192
       enc_key_id 11
       vxlan_opts 65793/4008635966
       not_in_hw
         action order 1: mirred (Egress Redirect to device eth1) stolen
         index 3 ref 1 bind 1
         Action statistics:
         Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
         backlog 0b 0p requeues 0

v1->v2:
  - get_u32 with base = 0 for gbp.
v2->v3:
  - implement proper JSON array for opts.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print gbp as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agotc: m_tunnel_key: add options support for erpsan
Xin Long [Mon, 27 Apr 2020 10:27:49 +0000 (18:27 +0800)]
tc: m_tunnel_key: add options support for erpsan

This patch is to add TCA_TUNNEL_KEY_ENC_OPTS_ERSPAN's parse and
print to implement erspan options support in m_tunnel_key, like
Commit 6217917a3826 ("tc: m_tunnel_key: Add tunnel option support
to act_tunnel_key") for geneve options support.

Option is expressed as version:index:dir:hwid, dir and hwid will
be parsed when version is 2, while index will be parsed when
version is 1. erspan doesn't support multiple options.

With this patch, users can add and dump erspan options like:

  # ip link add name erspan1 type erspan external
  # tc qdisc add dev eth0 ingress
  # tc filter add dev eth0 protocol ip parent ffff: \
      flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
          set src_ip 10.0.99.192 \
          dst_ip 10.0.99.193 \
          dst_port 6081 \
          id 11 \
          erspan_opts 1:2:0:0 \
      action mirred egress redirect dev erspan1
  # tc -s filter show dev eth0 parent ffff:

     filter protocol ip pref 49151 flower chain 0 handle 0x1
       indev eth0
       eth_type ipv4
       ip_proto udp
       not_in_hw
         action order 1: tunnel_key  set
         src_ip 10.0.99.192
         dst_ip 10.0.99.193
         key_id 11
         dst_port 6081
         erspan_opts 1:2:0:0
         csum pipe
           index 2 ref 1 bind 1
         ...
v1->v2:
  - no change.
v2->v3:
  - no change.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print version, index, dir and hwid as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agotc: m_tunnel_key: add options support for vxlan
Xin Long [Mon, 27 Apr 2020 10:27:48 +0000 (18:27 +0800)]
tc: m_tunnel_key: add options support for vxlan

This patch is to add TCA_TUNNEL_KEY_ENC_OPTS_VXLAN's parse and
print to implement vxlan options support in m_tunnel_key, like
Commit 6217917a3826 ("tc: m_tunnel_key: Add tunnel option support
to act_tunnel_key") for geneve options support.

Option is expressed a 32bit number for gbp only, and vxlan
doesn't support multiple options.

With this patch, users can add and dump vxlan options like:

  # ip link add name vxlan1 type vxlan dstport 0 external
  # tc qdisc add dev eth0 ingress
  # tc filter add dev eth0 protocol ip parent ffff: \
      flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
          set src_ip 10.0.99.192 \
          dst_ip 10.0.99.193 \
          dst_port 6081 \
          id 11 \
          vxlan_opts 65793 \
      action mirred egress redirect dev vxlan1
  # tc -s filter show dev eth0 parent ffff:

     filter protocol ip pref 49152 flower chain 0 handle 0x1
       indev eth0
       eth_type ipv4
       ip_proto udp
       not_in_hw
         action order 1: tunnel_key  set
         src_ip 10.0.99.192
         dst_ip 10.0.99.193
         key_id 11
         dst_port 6081
         vxlan_opts 65793
         ...

v1->v2:
  - get_u32 with base = 0 for gbp.
  - use to print_unint("0x%x") to print gbp.
v2->v3:
  - implement proper JSON array for opts.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print gbp as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoiproute_lwtunnel: add options support for erspan metadata
Xin Long [Mon, 27 Apr 2020 10:27:47 +0000 (18:27 +0800)]
iproute_lwtunnel: add options support for erspan metadata

This patch is to add LWTUNNEL_IP_OPTS_ERSPAN's parse and print to implement
erspan options support in iproute_lwtunnel.

Option is expressed as version:index:dir:hwid, dir and hwid will be parsed
when version is 2, while index will be parsed when version is 1. All of
these are numbers. erspan doesn't support multiple options.

With this patch, users can add and dump erspan options like:

  # ip netns add a
  # ip netns add b
  # ip -n a link add eth0 type veth peer name eth0 netns b
  # ip -n a link set eth0 up
  # ip -n b link set eth0 up
  # ip -n a addr add 10.1.0.1/24 dev eth0
  # ip -n b addr add 10.1.0.2/24 dev eth0
  # ip -n b link add erspan1 type erspan key 1 seq erspan 123 \
    local 10.1.0.2 remote 10.1.0.1
  # ip -n b addr add 1.1.1.1/24 dev erspan1
  # ip -n b link set erspan1 up
  # ip -n b route add 2.1.1.0/24 dev erspan1
  # ip -n a link add erspan1 type erspan key 1 seq local 10.1.0.1 external
  # ip -n a addr add 2.1.1.1/24 dev erspan1
  # ip -n a link set erspan1 up
  # ip -n a route add 1.1.1.0/24 encap ip id 1 \
    erspan_opts 2:123:1:2 dst 10.1.0.2 dev erspan1
  # ip -n a route show
  # ip netns exec a ping 1.1.1.1 -c 1

   1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 tos 0
     erspan_opts 2:0:1:2 dev erspan1 scope link

   PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
   64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.124 ms

v1->v2:
  - improve the changelog.
  - use PRINT_ANY to support dumping with json format.
v2->v3:
  - implement proper JSON object for opts instead of just bunch of strings.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print version, index, dir and hwid as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoiproute_lwtunnel: add options support for vxlan metadata
Xin Long [Mon, 27 Apr 2020 10:27:46 +0000 (18:27 +0800)]
iproute_lwtunnel: add options support for vxlan metadata

This patch is to add LWTUNNEL_IP_OPTS_VXLAN's parse and print to implement
vxlan options support in iproute_lwtunnel.

Option is expressed a number for gbp only, and vxlan doesn't support
multiple options.

With this patch, users can add and dump vxlan options like:

  # ip netns add a
  # ip netns add b
  # ip -n a link add eth0 type veth peer name eth0 netns b
  # ip -n a link set eth0 up
  # ip -n b link set eth0 up
  # ip -n a addr add 10.1.0.1/24 dev eth0
  # ip -n b addr add 10.1.0.2/24 dev eth0
  # ip -n b link add vxlan1 type vxlan id 1 local 10.1.0.2 \
    remote 10.1.0.1 dev eth0 ttl 64 gbp
  # ip -n b addr add 1.1.1.1/24 dev vxlan1
  # ip -n b link set vxlan1 up
  # ip -n b route add 2.1.1.0/24 dev vxlan1
  # ip -n a link add vxlan1 type vxlan local 10.1.0.1 dev eth0 ttl 64 \
    gbp external
  # ip -n a addr add 2.1.1.1/24 dev vxlan1
  # ip -n a link set vxlan1 up
  # ip -n a route add 1.1.1.0/24 encap ip id 1 \
    vxlan_opts 1110 dst 10.1.0.2 dev vxlan1
  # ip -n a route show
  # ip netns exec a ping 1.1.1.1 -c 1

   1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 tos 0
     vxlan_opts 1110 dev vxlan1 scope link

   PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
   64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.111 ms

v1->v2:
  - improve the changelog.
  - get_u32 with base = 0 for gbp.
  - use PRINT_ANY to support dumping with json format.
v2->v3:
  - implement proper JSON array for opts.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print gbp as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>