]> git.proxmox.com Git - mirror_iproute2.git/log
mirror_iproute2.git
3 years agoip link: initial support for bareudp devices
Guillaume Nault [Wed, 1 Jul 2020 19:45:04 +0000 (21:45 +0200)]
ip link: initial support for bareudp devices

Bareudp devices provide a generic L3 encapsulation for tunnelling
different protocols like MPLS, IP, NSH, etc. inside a UDP tunnel.

This patch is based on original work from Martin Varghese:
https://lore.kernel.org/netdev/1570532361-15163-1-git-send-email-martinvarghesenokia@gmail.com/

Examples:

  - ip link add dev bareudp0 type bareudp dstport 6635 ethertype mpls_uc

This creates a bareudp tunnel device which tunnels L3 traffic with
ethertype 0x8847 (unicast MPLS traffic). The destination port of the
UDP header will be set to 6635. The device will listen on UDP port 6635
to receive traffic.

  - ip link add dev bareudp0 type bareudp dstport 6635 ethertype ipv4 multiproto

Same as the MPLS example, but for IPv4. The "multiproto" keyword allows
the device to also tunnel IPv6 traffic.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agolib: fix checking of returned file handle size for cgroup
Dmitry Yakunin [Sun, 5 Jul 2020 16:18:12 +0000 (19:18 +0300)]
lib: fix checking of returned file handle size for cgroup

Before this patch check is happened only in case when we try to find
cgroup at cgroup2 mount point.

v2:
  - add Fixes line before Signed-off-by (David Ahern)

Fixes: d5e6ee0dac64 ("ss: introduce cgroup2 cache and helper functions")
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoip fou: respect preferred_family for IPv6
Sorah Fukumori [Thu, 25 Jun 2020 21:07:12 +0000 (06:07 +0900)]
ip fou: respect preferred_family for IPv6

ip(8) accepts -family ipv6 (-6) option at the toplevel. It is
straightforward to support the existing option for modifying listener
on IPv6 addresses.

Maintain the backward compatibility by leaving ip fou -6 flag
implemented, while it's removed from the usage message.

Signed-off-by: Sorah Fukumori <her@sorah.jp>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agotc: improve the qdisc show command
Anton Danilov [Fri, 3 Jul 2020 15:39:22 +0000 (18:39 +0300)]
tc: improve the qdisc show command

Before can be possible show only all qeueue disciplines on an interface.
There wasn't a way to get the qdisc info by handle or parent, only full
dump of the disciplines with a following grep/sed usage.

Now new and old options work as expected to filter a qdisc by handle or
parent.

Full syntax of the qdisc show command:

tc qdisc { show | list } [ dev STRING ] [ QDISC_ID ] [ invisible ]
  QDISC_ID := { root | ingress | handle QHANDLE | parent CLASSID }

This change doesn't require any changes in the kernel.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agouapi: update bpf.h
Stephen Hemminger [Mon, 6 Jul 2020 17:54:35 +0000 (10:54 -0700)]
uapi: update bpf.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodevlint-health.8: use a single-font macro for a single argument
Bjarni Ingi Gislason [Mon, 29 Jun 2020 00:42:48 +0000 (00:42 +0000)]
devlint-health.8: use a single-font macro for a single argument

Use a single font macro for a single argument.

  Remove unnecessary quotes for a single-font macro.

  Join two lines into one.

  The output of "nroff" and "groff" is unchanged.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodevlink-dev.8: use a single-font macro for one argument
Bjarni Ingi Gislason [Sun, 28 Jun 2020 23:58:40 +0000 (23:58 +0000)]
devlink-dev.8: use a single-font macro for one argument

Use a single-font macro for one argument.

  Remove unnecessary quotes for a single font macro.

  Join some lines into one.

  The output of "nroff" and "groff" is unchanged, except for a font
change in two lines.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodevlink.8: Use a single-font macro for a single argument
Bjarni Ingi Gislason [Sun, 28 Jun 2020 23:26:12 +0000 (23:26 +0000)]
devlink.8: Use a single-font macro for a single argument

Use a single-font macro for a single argument

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoman8/bridge.8: fix misuse of two-fonts macros
Bjarni Ingi Gislason [Sun, 28 Jun 2020 22:46:26 +0000 (22:46 +0000)]
man8/bridge.8: fix misuse of two-fonts macros

Use a single-font macro for a single argument.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agolibnetlink.3: display section numbers in roman font, not boldface
Bjarni Ingi Gislason [Sun, 28 Jun 2020 16:26:15 +0000 (16:26 +0000)]
libnetlink.3: display section numbers in roman font, not boldface

Typeset section numbers in roman font, see man-pages(7).

###

  Details:

Output is from: test-groff -b -mandoc -T utf8 -rF0 -t -w w -z

  [ "test-groff" is a developmental version of "groff" ]

<./man/man3/libnetlink.3>:53 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:132 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:134 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:197 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:198 (macro BR): only 1 argument, but more are expected

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
3 years agoman/tc: remove obsolete reference to ipchains
Stephen Hemminger [Wed, 24 Jun 2020 19:13:46 +0000 (12:13 -0700)]
man/tc: remove obsolete reference to ipchains

It isn't Linux 2.2 anymore.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoip address: Fix loop initial declarations are only allowed in C99
Roi Dayan [Thu, 11 Jun 2020 17:35:43 +0000 (20:35 +0300)]
ip address: Fix loop initial declarations are only allowed in C99

On some distros, i.e. rhel 7.6, compilation fails with the following:

ipaddress.c: In function ‘lookup_flag_data_by_name’:
ipaddress.c:1260:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
  for (int i = 0; i < ARRAY_SIZE(ifa_flag_data); ++i) {
  ^
ipaddress.c:1260:2: note: use option -std=c99 or -std=gnu99 to compile your code

This commit fixes the single place needed for compilation to pass.

Fixes: 9d59c86e575b ("iproute2: ip addr: Organize flag properties structurally")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agouapi: update to magic.h
Stephen Hemminger [Thu, 11 Jun 2020 16:52:38 +0000 (09:52 -0700)]
uapi: update to magic.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodevlink: Add 'mirror' trap action
Ido Schimmel [Sun, 7 Jun 2020 08:36:47 +0000 (11:36 +0300)]
devlink: Add 'mirror' trap action

Allow setting 'mirror' trap action for traps that support it. Extend the
devlink-trap man page and bash completion accordingly.

Example:

# devlink -jp trap show netdevsim/netdevsim10 trap igmp_query
{
    "trap": {
        "netdevsim/netdevsim10": [ {
                "name": "igmp_query",
                "type": "control",
                "generic": true,
                "action": "mirror",
                "group": "mc_snooping"
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodevlink: Add 'control' trap type
Ido Schimmel [Sun, 7 Jun 2020 08:36:46 +0000 (11:36 +0300)]
devlink: Add 'control' trap type

This type is used for traps that trap control packets such as ARP
request and IGMP query to the CPU.

Example:

# devlink -jp trap show netdevsim/netdevsim10 trap igmp_v1_report
{
    "trap": {
        "netdevsim/netdevsim10": [ {
                "name": "igmp_v1_report",
                "type": "control",
                "generic": true,
                "action": "trap",
                "group": "mc_snooping"
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodevlink: update include files
Stephen Hemminger [Thu, 11 Jun 2020 16:46:46 +0000 (09:46 -0700)]
devlink: update include files

Use the tool iwyu to get more complete list of includes for
all the bits used by devlink.

This should also fix build with musl libc.

Fixes: c4dfddccef4e ("fix JSON output of mon command")
Reported-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agouapi: update headers
Stephen Hemminger [Fri, 5 Jun 2020 15:36:54 +0000 (08:36 -0700)]
uapi: update headers

Update kernel headers from 5.8.0 merge

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoMerge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next
Stephen Hemminger [Fri, 5 Jun 2020 15:33:29 +0000 (08:33 -0700)]
Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next

4 years agov5.7.0
Stephen Hemminger [Wed, 3 Jun 2020 03:35:00 +0000 (20:35 -0700)]
v5.7.0

4 years agonexthop: Fix Deletion display
Donald Sharp [Sat, 30 May 2020 12:16:37 +0000 (08:16 -0400)]
nexthop: Fix Deletion display

Actually display that deletions are happening
when monitoring nexthops.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agouapi: fix comment in xfrm.h
Stephen Hemminger [Mon, 1 Jun 2020 15:07:02 +0000 (08:07 -0700)]
uapi: fix comment in xfrm.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoiproute2: ip addr: Add support for setting 'optimistic'
Ian K. Coolidge [Wed, 27 May 2020 18:03:46 +0000 (11:03 -0700)]
iproute2: ip addr: Add support for setting 'optimistic'

optimistic DAD is controllable via sysctl for an interface
or all interfaces on the system. This would affect addresses
added by the kernel only.

Recent kernels, however, have enabled support for adding optimistic
address via userspace. This plumbs that support.

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoiproute2: ip addr: Organize flag properties structurally
Ian K. Coolidge [Wed, 27 May 2020 18:03:45 +0000 (11:03 -0700)]
iproute2: ip addr: Organize flag properties structurally

This creates a nice systematic way to check that the various flags are
mutable from userspace and that the address family is valid.

Mutability properties are preserved to avoid introducing any behavioral
change in this CL. However, previously, immutable flags were ignored and
fell through to this confusing error:

Error: either "local" is duplicate, or "dadfailed" is a garbage.

But now, they just warn more explicitly:

Warning: dadfailed option is not mutable from userspace
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agotc: report time an action was first used
Roman Mashak [Thu, 28 May 2020 01:22:47 +0000 (21:22 -0400)]
tc: report time an action was first used

Have print_tm() dump firstuse value along with install, lastuse
and expires.

v2: Resubmit after 'master' merged into next

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agobpf: Fixes a snprintf truncation warning
Andrea Claudi [Tue, 26 May 2020 16:04:11 +0000 (18:04 +0200)]
bpf: Fixes a snprintf truncation warning

gcc v9.3.1 reports:

bpf.c: In function ‘bpf_get_work_dir’:
bpf.c:784:49: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
  784 |  snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
      |                                                 ^
bpf.c:784:2: note: ‘snprintf’ output between 2 and 4097 bytes into a destination of size 4096
  784 |  snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this simply checking snprintf return code and properly handling the error.

Fixes: e42256699cac ("bpf: make tc's bpf loader generic and move into lib")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoRevert "bpf: replace snprintf with asprintf when dealing with long buffers"
Andrea Claudi [Tue, 26 May 2020 16:04:10 +0000 (18:04 +0200)]
Revert "bpf: replace snprintf with asprintf when dealing with long buffers"

This reverts commit c0325b06382cb4f7ebfaf80c29c8800d74666fd9.
It introduces a segfault in bpf_make_custom_path() when custom pinning is used.

This happens because asprintf allocates exactly the space needed to hold a
string in the buffer passed as its first argument, but if this buffer is later
used in strcat() or similar we have a buffer overrun.

As the aim of commit c0325b06382c is simply to fix a compiler warning, it
seems safe and reasonable to revert it.

Fixes: c0325b06382c ("bpf: replace snprintf with asprintf when dealing with long buffers")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoMerge branch 'master' into next
David Ahern [Wed, 27 May 2020 02:08:27 +0000 (02:08 +0000)]
Merge branch 'master' into next

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agotipc: enable printing of broadcast rcv link stats
Tuong Lien [Tue, 26 May 2020 09:40:55 +0000 (16:40 +0700)]
tipc: enable printing of broadcast rcv link stats

This commit allows printing the statistics of a broadcast-receiver link
using the same tipc command, but with additional 'link' options:

$ tipc link stat show --help
Usage: tipc link stat show [ link { LINK | SUBSTRING | all } ]

With:
+ 'LINK'      : print the stats of the specific link 'LINK';
+ 'SUBSTRING' : print the stats of all the links having the 'SUBSTRING'
                in name;
+ 'all'       : print all the links' stats incl. the broadcast-receiver
                ones;

Also, a link stats can be reset in the usual way by specifying the link
name in command.

For example:

$ tipc l st sh l br
Link <broadcast-link>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:5011125 fragments:4968774/149643 bundles:38402/307061
  RX naks:781484 defs:0 dups:0
  TX naks:0 acks:0 retrans:330259
  Congestion link:50657  Send queue max:0 avg:0

Link <broadcast-link:1001001>
  Window:50 packets
  RX packets:95146 fragments:95040/1980 bundles:1/2
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:380938 defs:83962 dups:403
  TX naks:8362 acks:0 retrans:170662
  Congestion link:0  Send queue max:0 avg:0

Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:400546 defs:0 dups:0
  TX naks:0 acks:0 retrans:159597
  Congestion link:0  Send queue max:0 avg:0

$ tipc l st sh l 1001002
Link <1001003:data0-1001002:data0>
  ACTIVE  MTU:1500  Priority:10  Tolerance:1500 ms  Window:50 packets
  RX packets:99546 fragments:0/0 bundles:33/877
  TX packets:629 fragments:0/0 bundles:35/828
  TX profile sample:8 packets average:390 octets
  0-64:75% -256:0% -1024:0% -4096:25% -16384:0% -32768:0% -66000:0%
  RX states:488714 probes:7397 naks:0 defs:4 dups:5
  TX states:27734 probes:18016 naks:5 acks:2305 retrans:0
  Congestion link:0  Send queue max:0 avg:0

Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:400546 defs:0 dups:0
  TX naks:0 acks:0 retrans:159597
  Congestion link:0  Send queue max:0 avg:0

$ tipc l st re l broadcast-link:1001002

$ tipc l st sh l broadcast-link:1001002
Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:0 defs:0 dups:0
  TX naks:0 acks:0 retrans:0
  Congestion link:0  Send queue max:0 avg:0

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agolwtunnel: add support for rpl segment routing
Alexander Aring [Wed, 20 May 2020 23:57:27 +0000 (19:57 -0400)]
lwtunnel: add support for rpl segment routing

This patch adds support for rpl segment routing settings.
Example:

ip -n ns0 -6 route add 2001::3 encap rpl segs \
fe80::c8fe:beef:cafe:cafe,fe80::c8fe:beef:cafe:beef dev lowpan0

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agotc: action: fix time values output in JSON format
Roman Mashak [Wed, 20 May 2020 00:59:44 +0000 (20:59 -0400)]
tc: action: fix time values output in JSON format

Report tcf_t values in seconds, not jiffies, in JSON format as it is now
for stdout.

v2: use PRINT_ANY, drop the useless casts and fix the style (Stephen Hemminger)

Fixes: 2704bd625583 ("tc: jsonify actions core")
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agouapi: update to bpf.h
Stephen Hemminger [Tue, 19 May 2020 21:31:54 +0000 (14:31 -0700)]
uapi: update to bpf.h

Part of the zero-length array changes

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoutils: remove trailing zeros in print_time() and print_time64()
Eric Dumazet [Tue, 5 May 2020 19:46:18 +0000 (12:46 -0700)]
utils: remove trailing zeros in print_time() and print_time64()

Before :

tc qd sh dev eth1

... refill_delay 40.0ms timer_slack 10.000us horizon 10.000s

After :
... refill_delay 40ms timer_slack 10us horizon 10s

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoman: tc-ct.8: Add manual page for ct tc action
Paul Blakey [Thu, 14 May 2020 14:10:20 +0000 (17:10 +0300)]
man: tc-ct.8: Add manual page for ct tc action

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agotc: mqprio: reject queues count/offset pair count higher than num_tc
Maciej Fijalkowski [Wed, 13 May 2020 19:47:17 +0000 (21:47 +0200)]
tc: mqprio: reject queues count/offset pair count higher than num_tc

Provide a sanity check that will make sure whether queues count/offset
pair count will not exceed the actual number of TCs being created.

Example command that is invalid because there are 4 count/offset pairs
whereas num_tc is only 2.

 # tc qdisc add dev enp96s0f0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1
queues 4@0 4@4 4@8 4@12 hw 1 mode channel

Store the parsed count/offset pair count onto a dedicated variable that
will be compared against opt.num_tc after all of the command line
arguments were parsed. Bail out if this count is higher than opt.num_tc
and let user know about it.

Drivers were swallowing such commands as they were iterating over
count/offset pairs where num_tc was used as a delimiter, so this is not
a big deal, but better catch such misconfiguration at the command line
argument parsing level.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoss: add checks for bc filter support
Dmitry Yakunin [Sat, 9 May 2020 16:52:02 +0000 (19:52 +0300)]
ss: add checks for bc filter support

As noted by David Ahern, now if some bytecode filter is not supported
by running kernel printed error message is not clear. This patch is attempt to
detect such case and print correct message. This is done by providing checking
function for new filter types. As example check function for cgroup filter
is implemented. It sends correct lightweight request (idiag_states = 0)
with zero cgroup condition to the kernel and checks returned errno. If filter
is not supported EINVAL is returned. Result of checking is cached to
avoid extra checks if several same filters are specified.

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoss: add support for cgroup v2 information and filtering
Dmitry Yakunin [Sat, 9 May 2020 16:52:01 +0000 (19:52 +0300)]
ss: add support for cgroup v2 information and filtering

This patch introduces two new features: obtaining cgroup information and
filtering sockets by cgroups. These features work based on cgroup v2 ID
field in the socket (kernel should be compiled with CONFIG_SOCK_CGROUP_DATA).

Cgroup information can be obtained by specifying --cgroup flag and now contains
only pathname. For faster pathname lookups cgroup cache is implemented. This
cache is filled on ss startup and missed entries are resolved and saved
on the fly.

Cgroup filter extends EXPRESSION and allows to specify cgroup pathname
(relative or absolute) to obtain sockets attached only to this cgroup.
Filter syntax: ss [ cgroup PATHNAME ]
Examples:
    ss -a cgroup /sys/fs/cgroup/unified (or ss -a cgroup .)
    ss -a cgroup /sys/fs/cgroup/unified/cgroup1 (or ss -a cgroup cgroup1)

v2:
  - style fixes (David Ahern)

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoss: introduce cgroup2 cache and helper functions
Dmitry Yakunin [Sat, 9 May 2020 16:52:00 +0000 (19:52 +0300)]
ss: introduce cgroup2 cache and helper functions

This patch prepares infrastructure for matching sockets by cgroups.
Two helper functions are added for transformation between cgroup v2 ID
and pathname. Cgroup v2 cache is implemented as hash table indexed by ID.
This cache is needed for faster lookups of socket cgroup.

v2:
  - style fixes (David Ahern)

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoiproute2-next: add gate action man page
Po Liu [Fri, 8 May 2020 07:02:47 +0000 (15:02 +0800)]
iproute2-next: add gate action man page

This patch is to add the man page for the tc gate action.

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoiproute2-next:tc:action: add a gate control action
Po Liu [Fri, 8 May 2020 07:02:46 +0000 (15:02 +0800)]
iproute2-next:tc:action: add a gate control action

Introduce a ingress frame gate control flow action.
Tc gate action does the work like this:
Assume there is a gate allow specified ingress frames can pass at
specific time slot, and also drop at specific time slot. Tc filter
chooses the ingress frames, and tc gate action would specify what slot
does these frames can be passed to device and what time slot would be
dropped.
Tc gate action would provide an entry list to tell how much time gate
keep open and how much time gate keep state close. Gate action also
assign a start time to tell when the entry list start. Then driver would
repeat the gate entry list cyclically.
For the software simulation, gate action require the user assign a time
clock type.

Below is the setting example in user space. Tc filter a stream source ip
address is 192.168.0.20 and gate action own two time slots. One is last
200ms gate open let frame pass another is last 100ms gate close let
frames dropped.

 # tc qdisc add dev eth0 ingress
 # tc filter add dev eth0 parent ffff: protocol ip \

            flower src_ip 192.168.0.20 \
            action gate index 2 clockid CLOCK_TAI \
            sched-entry open 200000000ns -1 8000000b \
            sched-entry close 100000000ns

 # tc chain del dev eth0 ingress chain 0

"sched-entry" follow the name taprio style. Gate state is
"open"/"close". Follow the period nanosecond. Then next -1 is internal
priority value means which ingress queue should put to. "-1" means
wildcard. The last value optional specifies the maximum number of
MSDU octets that are permitted to pass the gate during the specified
time interval, the overlimit frames would be dropped.

Below example shows filtering a stream with destination mac address is
10:00:80:00:00:00 and ip type is ICMP, follow the action gate. The gate
action would run with one close time slot which means always keep close.
The time cycle is total 200000000ns. The base-time would calculate by:

     1357000000000 + (N + 1) * cycletime

When the total value is the future time, it will be the start time.
The cycletime here would be 200000000ns for this case.

 #tc filter add dev eth0 parent ffff:  protocol ip \
           flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
           action gate index 12 base-time 1357000000000ns \
           sched-entry CLOSE 200000000ns \
           clockid CLOCK_TAI

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoUpdate kernel headers and import tc_gate.h
David Ahern [Wed, 13 May 2020 02:18:15 +0000 (02:18 +0000)]
Update kernel headers and import tc_gate.h

Update kernel headers to commit:
    fb9f2e92864f ("net: dsa: tag_sja1105: appease sparse checks for ethertype accessors")
and import tc_act/tc_gate.h

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agotc: fq: fix two issues
Eric Dumazet [Tue, 5 May 2020 15:43:48 +0000 (08:43 -0700)]
tc: fq: fix two issues

My latest patch missed the fact that this file got JSON support.

Also fixes a spelling error added during JSON change.

Fixes: be9ca9d54123 ("tc: fq: add timer_slack parameter")
Fixes: d15e2bfc042b ("tc: fq: add support for JSON output")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoss: update to bw print
Stephen Hemminger [Tue, 5 May 2020 17:18:58 +0000 (10:18 -0700)]
ss: update to bw print

Display kilobit with the standard suffix.
Add comment to describe where data rate suffixes come from.
Add support for terrabit.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agodevlink: support kernel-side snapshot id allocation
Jakub Kicinski [Tue, 5 May 2020 17:04:57 +0000 (10:04 -0700)]
devlink: support kernel-side snapshot id allocation

Make ID argument optional and read the snapshot info
that kernel sends us.

$ devlink region new netdevsim/netdevsim1/dummy
netdevsim/netdevsim1/dummy: snapshot 0
$ devlink -jp region new netdevsim/netdevsim1/dummy
{
    "regions": {
        "netdevsim/netdevsim1/dummy": {
            "snapshot": [ 1 ]
        }
    }
}
$ devlink region show netdevsim/netdevsim1/dummy
netdevsim/netdevsim1/dummy: size 32768 snapshot [0 1]

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoss: add support for Gbit speeds in sprint_bw()
Eric Dumazet [Tue, 5 May 2020 15:37:41 +0000 (08:37 -0700)]
ss: add support for Gbit speeds in sprint_bw()

Also use 'g' specifier instead of 'f' to remove trailing zeros,
and increase precision.

Examples of output :
 Before        After
 8.0Kbps       8Kbps
 9.9Mbps       9.92Mbps
 55001Mbps     55Gbps

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoMerge branch 'master' into next
David Ahern [Tue, 5 May 2020 16:49:38 +0000 (16:49 +0000)]
Merge branch 'master' into next

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoImport rpl.h and rpl_iptunnel.h uapi headers
David Ahern [Tue, 5 May 2020 16:23:14 +0000 (16:23 +0000)]
Import rpl.h and rpl_iptunnel.h uapi headers

Import rpl.h and rpl_iptunnel.h as of kernel commit:
    354d86141796 ("Merge branch 'net-reduce-dynamic-lockdep-keys'")

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agotc: full JSON support for 'bpf' filter
Davide Caratti [Thu, 30 Apr 2020 18:03:30 +0000 (20:03 +0200)]
tc: full JSON support for 'bpf' filter

example using eBPF:

 # tc filter add dev dummy0 ingress bpf \
 > direct-action obj ./bpf/filter.o sec tc-ingress
 # tc  -j filter show dev dummy0 ingress | jq
 [
   {
     "protocol": "all",
     "pref": 49152,
     "kind": "bpf",
     "chain": 0
   },
   {
     "protocol": "all",
     "pref": 49152,
     "kind": "bpf",
     "chain": 0,
     "options": {
       "handle": "0x1",
       "bpf_name": "filter.o:[tc-ingress]",
       "direct-action": true,
       "not_in_hw": true,
       "prog": {
         "id": 101,
         "tag": "a04f5eef06a7f555",
         "jited": 1
       }
     }
   }
 ]

v2:
 - use print_nl(), thanks to Andrea Claudi
 - use print_0xhex() for filter handle, thanks to Stephen Hemminger

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoUpdate kernel headers
David Ahern [Tue, 5 May 2020 16:11:22 +0000 (16:11 +0000)]
Update kernel headers

Update kernel headers to commit:
    354d86141796 ("Merge branch 'net-reduce-dynamic-lockdep-keys'")

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoReplace open-coded instances of print_nl()
Benjamin Poirier [Fri, 1 May 2020 08:47:20 +0000 (17:47 +0900)]
Replace open-coded instances of print_nl()

Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agobridge: Align output columns
Benjamin Poirier [Fri, 1 May 2020 08:47:19 +0000 (17:47 +0900)]
bridge: Align output columns

Use fixed column widths to improve readability.

Before:
root@vsid:/src/iproute2# ./bridge/bridge vlan tunnelshow
port    vlan-id tunnel-id
vx0      2       2
         1010-1020       1010-1020
         1030    65556
vx-longname      2       2

After:
root@vsid:/src/iproute2# ./bridge/bridge vlan tunnelshow
port              vlan-id    tunnel-id
vx0               2          2
                  1010-1020  1010-1020
                  1030       65556
vx-longname       2          2

Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agojson_print: Return number of characters printed
Benjamin Poirier [Fri, 1 May 2020 08:47:18 +0000 (17:47 +0900)]
json_print: Return number of characters printed

When outputting in normal mode, forward the return value from
color_fprintf().

Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agobridge: Fix output with empty vlan lists
Benjamin Poirier [Fri, 1 May 2020 08:47:17 +0000 (17:47 +0900)]
bridge: Fix output with empty vlan lists

Consider this configuration:

ip link add br0 type bridge
ip link add vx0 type vxlan dstport 4789 external
ip link set dev vx0 master br0
bridge vlan del vid 1 dev vx0
ip link add vx1 type vxlan dstport 4790 external
ip link set dev vx1 master br0

root@vsid:/src/iproute2# ./bridge/bridge vlan
port    vlan-id
br0      1 PVID Egress Untagged

vx0     None
vx1      1 PVID Egress Untagged

root@vsid:/src/iproute2#

Note the useless and inconsistent empty lines.

root@vsid:/src/iproute2# ./bridge/bridge vlan tunnelshow
port    vlan-id tunnel-id
br0
vx0     None
vx1

What's the difference between "None" and ""?

root@vsid:/src/iproute2# ./bridge/bridge -j -p vlan tunnelshow
[ {
"ifname": "br0",
"tunnels": [ ]
    },{
"ifname": "vx1",
"tunnels": [ ]
    } ]

Why does vx0 appear in normal output and not json output?
Why output an empty list for br0 and vx1?

Fix these inconsistencies and avoid outputting entries with no values. This
makes the behavior consistent with other iproute2 commands, for example
`ip -6 addr`: if an interface doesn't have any ipv6 addresses, it is not
part of the listing.

Fixes: 8652eeb3ab12 ("bridge: vlan: support for per vlan tunnel info")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agobridge: Fix typo
Benjamin Poirier [Fri, 1 May 2020 08:47:16 +0000 (17:47 +0900)]
bridge: Fix typo

Fixes: 7abf5de677e3 ("bridge: vlan: add support to display per-vlan statistics")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agobridge: Use consistent column names in vlan output
Benjamin Poirier [Fri, 1 May 2020 08:47:15 +0000 (17:47 +0900)]
bridge: Use consistent column names in vlan output

Fix singular vs plural. Add a hyphen to clarify that each of those are
single fields.

Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agotc: f_flower: add options support for erspan
Xin Long [Mon, 27 Apr 2020 10:27:51 +0000 (18:27 +0800)]
tc: f_flower: add options support for erspan

This patch is to add TCA_FLOWER_KEY_ENC_OPTS_ERSPAN's parse and
print to implement erspan options support in m_tunnel_key, like
Commit 56155d4df86d ("tc: f_flower: add geneve option match
support to flower") for geneve options support.

Option is expressed as version:index:dir:hwid, dir and hwid will
be parsed when version is 2, while index will be parsed when
version is 1. erspan doesn't support multiple options.

With this patch, users can add and dump erspan options like:

  # ip link add name erspan1 type erspan external
  # tc qdisc add dev erspan1 ingress
  # tc filter add dev erspan1 protocol ip parent ffff: \
      flower \
        enc_src_ip 10.0.99.192 \
        enc_dst_ip 10.0.99.193 \
        enc_key_id 11 \
        erspan_opts 1:2:0:0/1:255:0:0 \
        ip_proto udp \
        action mirred egress redirect dev eth1
  # tc -s filter show dev erspan1 parent ffff:

     filter protocol ip pref 49152 flower chain 0 handle 0x1
       eth_type ipv4
       ip_proto udp
       enc_dst_ip 10.0.99.193
       enc_src_ip 10.0.99.192
       enc_key_id 11
       erspan_opts 1:2:0:0/1:255:0:0
       not_in_hw
         action order 1: mirred (Egress Redirect to device eth1) stolen
         index 1 ref 1 bind 1
         Action statistics:
         Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
         backlog 0b 0p requeues 0

v1->v2:
  - no change.
v2->v3:
  - no change.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print version, index, dir and hwid as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agotc: f_flower: add options support for vxlan
Xin Long [Mon, 27 Apr 2020 10:27:50 +0000 (18:27 +0800)]
tc: f_flower: add options support for vxlan

This patch is to add TCA_FLOWER_KEY_ENC_OPTS_VXLAN's parse and
print to implement vxlan options support in m_tunnel_key, like
Commit 56155d4df86d ("tc: f_flower: add geneve option match
support to flower") for geneve options support.

Option is expressed a 32bit number for gbp only, and vxlan
doesn't support multiple options.

With this patch, users can add and dump vxlan options like:

  # ip link add name vxlan1 type vxlan dstport 0 external
  # tc qdisc add dev vxlan1 ingress
  # tc filter add dev vxlan1 protocol ip parent ffff: \
      flower \
        enc_src_ip 10.0.99.192 \
        enc_dst_ip 10.0.99.193 \
        enc_key_id 11 \
        vxlan_opts 65793/4008635966 \
        ip_proto udp \
        action mirred egress redirect dev eth1
  # tc -s filter show dev vxlan1 parent ffff:

     filter protocol ip pref 49152 flower chain 0 handle 0x1
       eth_type ipv4
       ip_proto udp
       enc_dst_ip 10.0.99.193
       enc_src_ip 10.0.99.192
       enc_key_id 11
       vxlan_opts 65793/4008635966
       not_in_hw
         action order 1: mirred (Egress Redirect to device eth1) stolen
         index 3 ref 1 bind 1
         Action statistics:
         Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
         backlog 0b 0p requeues 0

v1->v2:
  - get_u32 with base = 0 for gbp.
v2->v3:
  - implement proper JSON array for opts.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print gbp as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agotc: m_tunnel_key: add options support for erpsan
Xin Long [Mon, 27 Apr 2020 10:27:49 +0000 (18:27 +0800)]
tc: m_tunnel_key: add options support for erpsan

This patch is to add TCA_TUNNEL_KEY_ENC_OPTS_ERSPAN's parse and
print to implement erspan options support in m_tunnel_key, like
Commit 6217917a3826 ("tc: m_tunnel_key: Add tunnel option support
to act_tunnel_key") for geneve options support.

Option is expressed as version:index:dir:hwid, dir and hwid will
be parsed when version is 2, while index will be parsed when
version is 1. erspan doesn't support multiple options.

With this patch, users can add and dump erspan options like:

  # ip link add name erspan1 type erspan external
  # tc qdisc add dev eth0 ingress
  # tc filter add dev eth0 protocol ip parent ffff: \
      flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
          set src_ip 10.0.99.192 \
          dst_ip 10.0.99.193 \
          dst_port 6081 \
          id 11 \
          erspan_opts 1:2:0:0 \
      action mirred egress redirect dev erspan1
  # tc -s filter show dev eth0 parent ffff:

     filter protocol ip pref 49151 flower chain 0 handle 0x1
       indev eth0
       eth_type ipv4
       ip_proto udp
       not_in_hw
         action order 1: tunnel_key  set
         src_ip 10.0.99.192
         dst_ip 10.0.99.193
         key_id 11
         dst_port 6081
         erspan_opts 1:2:0:0
         csum pipe
           index 2 ref 1 bind 1
         ...
v1->v2:
  - no change.
v2->v3:
  - no change.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print version, index, dir and hwid as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agotc: m_tunnel_key: add options support for vxlan
Xin Long [Mon, 27 Apr 2020 10:27:48 +0000 (18:27 +0800)]
tc: m_tunnel_key: add options support for vxlan

This patch is to add TCA_TUNNEL_KEY_ENC_OPTS_VXLAN's parse and
print to implement vxlan options support in m_tunnel_key, like
Commit 6217917a3826 ("tc: m_tunnel_key: Add tunnel option support
to act_tunnel_key") for geneve options support.

Option is expressed a 32bit number for gbp only, and vxlan
doesn't support multiple options.

With this patch, users can add and dump vxlan options like:

  # ip link add name vxlan1 type vxlan dstport 0 external
  # tc qdisc add dev eth0 ingress
  # tc filter add dev eth0 protocol ip parent ffff: \
      flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
          set src_ip 10.0.99.192 \
          dst_ip 10.0.99.193 \
          dst_port 6081 \
          id 11 \
          vxlan_opts 65793 \
      action mirred egress redirect dev vxlan1
  # tc -s filter show dev eth0 parent ffff:

     filter protocol ip pref 49152 flower chain 0 handle 0x1
       indev eth0
       eth_type ipv4
       ip_proto udp
       not_in_hw
         action order 1: tunnel_key  set
         src_ip 10.0.99.192
         dst_ip 10.0.99.193
         key_id 11
         dst_port 6081
         vxlan_opts 65793
         ...

v1->v2:
  - get_u32 with base = 0 for gbp.
  - use to print_unint("0x%x") to print gbp.
v2->v3:
  - implement proper JSON array for opts.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print gbp as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoiproute_lwtunnel: add options support for erspan metadata
Xin Long [Mon, 27 Apr 2020 10:27:47 +0000 (18:27 +0800)]
iproute_lwtunnel: add options support for erspan metadata

This patch is to add LWTUNNEL_IP_OPTS_ERSPAN's parse and print to implement
erspan options support in iproute_lwtunnel.

Option is expressed as version:index:dir:hwid, dir and hwid will be parsed
when version is 2, while index will be parsed when version is 1. All of
these are numbers. erspan doesn't support multiple options.

With this patch, users can add and dump erspan options like:

  # ip netns add a
  # ip netns add b
  # ip -n a link add eth0 type veth peer name eth0 netns b
  # ip -n a link set eth0 up
  # ip -n b link set eth0 up
  # ip -n a addr add 10.1.0.1/24 dev eth0
  # ip -n b addr add 10.1.0.2/24 dev eth0
  # ip -n b link add erspan1 type erspan key 1 seq erspan 123 \
    local 10.1.0.2 remote 10.1.0.1
  # ip -n b addr add 1.1.1.1/24 dev erspan1
  # ip -n b link set erspan1 up
  # ip -n b route add 2.1.1.0/24 dev erspan1
  # ip -n a link add erspan1 type erspan key 1 seq local 10.1.0.1 external
  # ip -n a addr add 2.1.1.1/24 dev erspan1
  # ip -n a link set erspan1 up
  # ip -n a route add 1.1.1.0/24 encap ip id 1 \
    erspan_opts 2:123:1:2 dst 10.1.0.2 dev erspan1
  # ip -n a route show
  # ip netns exec a ping 1.1.1.1 -c 1

   1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 tos 0
     erspan_opts 2:0:1:2 dev erspan1 scope link

   PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
   64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.124 ms

v1->v2:
  - improve the changelog.
  - use PRINT_ANY to support dumping with json format.
v2->v3:
  - implement proper JSON object for opts instead of just bunch of strings.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print version, index, dir and hwid as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoiproute_lwtunnel: add options support for vxlan metadata
Xin Long [Mon, 27 Apr 2020 10:27:46 +0000 (18:27 +0800)]
iproute_lwtunnel: add options support for vxlan metadata

This patch is to add LWTUNNEL_IP_OPTS_VXLAN's parse and print to implement
vxlan options support in iproute_lwtunnel.

Option is expressed a number for gbp only, and vxlan doesn't support
multiple options.

With this patch, users can add and dump vxlan options like:

  # ip netns add a
  # ip netns add b
  # ip -n a link add eth0 type veth peer name eth0 netns b
  # ip -n a link set eth0 up
  # ip -n b link set eth0 up
  # ip -n a addr add 10.1.0.1/24 dev eth0
  # ip -n b addr add 10.1.0.2/24 dev eth0
  # ip -n b link add vxlan1 type vxlan id 1 local 10.1.0.2 \
    remote 10.1.0.1 dev eth0 ttl 64 gbp
  # ip -n b addr add 1.1.1.1/24 dev vxlan1
  # ip -n b link set vxlan1 up
  # ip -n b route add 2.1.1.0/24 dev vxlan1
  # ip -n a link add vxlan1 type vxlan local 10.1.0.1 dev eth0 ttl 64 \
    gbp external
  # ip -n a addr add 2.1.1.1/24 dev vxlan1
  # ip -n a link set vxlan1 up
  # ip -n a route add 1.1.1.0/24 encap ip id 1 \
    vxlan_opts 1110 dst 10.1.0.2 dev vxlan1
  # ip -n a route show
  # ip netns exec a ping 1.1.1.1 -c 1

   1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 tos 0
     vxlan_opts 1110 dev vxlan1 scope link

   PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
   64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.111 ms

v1->v2:
  - improve the changelog.
  - get_u32 with base = 0 for gbp.
  - use PRINT_ANY to support dumping with json format.
v2->v3:
  - implement proper JSON array for opts.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print gbp as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoiproute_lwtunnel: add options support for geneve metadata
Xin Long [Mon, 27 Apr 2020 10:27:45 +0000 (18:27 +0800)]
iproute_lwtunnel: add options support for geneve metadata

This patch is to add LWTUNNEL_IP(6)_OPTS and LWTUNNEL_IP_OPTS_GENEVE's
parse and print to implement geneve options support in iproute_lwtunnel.

Options are expressed as class:type:data and multiple options may be
listed using a comma delimiter, class and type are numbers and data
is a hex string.

With this patch, users can add and dump geneve options like:

  # ip netns add a
  # ip netns add b
  # ip -n a link add eth0 type veth peer name eth0 netns b
  # ip -n a link set eth0 up; ip -n b link set eth0 up
  # ip -n a addr add 10.1.0.1/24 dev eth0
  # ip -n b addr add 10.1.0.2/24 dev eth0
  # ip -n b link add geneve1 type geneve id 1 remote 10.1.0.1 ttl 64
  # ip -n b addr add 1.1.1.1/24 dev geneve1
  # ip -n b link set geneve1 up
  # ip -n b route add 2.1.1.0/24 dev geneve1
  # ip -n a link add geneve1 type geneve external
  # ip -n a addr add 2.1.1.1/24 dev geneve1
  # ip -n a link set geneve1 up
  # ip -n a route add 1.1.1.0/24 encap ip id 1 geneve_opts \
    1:1:1212121234567890,1:1:1212121234567890,1:1:1212121234567890 \
    dst 10.1.0.2 dev geneve1
  # ip -n a route show
  # ip netns exec a ping 1.1.1.1 -c 1

   1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 tos 0
     geneve_opts 1:1:1212121234567890,1:1:1212121234567890 ...

   PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
   64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.079 ms

v1->v2:
  - improve the changelog.
  - use PRINT_ANY to support dumping with json format.
v2->v3:
  - implement proper JSON array for opts instead of just bunch of strings.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print class and type as uint and print data as hex string.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agodevlink: add support for DEVLINK_CMD_REGION_NEW
Jacob Keller [Tue, 28 Apr 2020 20:43:24 +0000 (13:43 -0700)]
devlink: add support for DEVLINK_CMD_REGION_NEW

Add support to request that a new snapshot be taken immediately for
a devlink region. To avoid confusion, the desired snapshot id must be
provided.

Note that if a region does not support snapshots on demand, the kernel
will reject the request with -EOPNOTSUP.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agouapi: update bpf.h
Stephen Hemminger [Thu, 30 Apr 2020 05:30:48 +0000 (22:30 -0700)]
uapi: update bpf.h

Minor spelling in comment
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agotc: pedit: Support JSON dumping
Petr Machata [Tue, 28 Apr 2020 11:44:33 +0000 (14:44 +0300)]
tc: pedit: Support JSON dumping

The action pedit does not currently support dumping to JSON. Convert
print_pedit() to the print_* family of functions so that dumping is correct
both in plain and JSON mode. In plain mode, the output is character for
character the same as it was before. In JSON mode, this is an example dump:

$ tc filter add dev dummy0 ingress prio 125 flower \
         action pedit ex munge udp dport set 12345 \
                 munge ip ttl add 1        \
 munge offset 10 u8 clear
$ tc -j filter show dev dummy0 ingress | jq
[
  {
    "protocol": "all",
    "pref": 125,
    "kind": "flower",
    "chain": 0
  },
  {
    "protocol": "all",
    "pref": 125,
    "kind": "flower",
    "chain": 0,
    "options": {
      "handle": 1,
      "keys": {},
      "not_in_hw": true,
      "actions": [
        {
          "order": 1,
          "kind": "pedit",
          "control_action": {
            "type": "pass"
          },
          "nkeys": 3,
          "index": 1,
          "ref": 1,
          "bind": 1,
          "keys": [
            {
              "htype": "udp",
              "offset": 0,
              "cmd": "set",
              "val": "3039",
              "mask": "ffff0000"
            },
            {
              "htype": "ipv4",
              "offset": 8,
              "cmd": "add",
              "val": "1000000",
              "mask": "ffffff"
            },
            {
              "htype": "network",
              "offset": 8,
              "cmd": "set",
              "val": "0",
              "mask": "ffff00ff"
            }
          ]
        }
      ]
    }
  }
]

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoerspan: Add type I version 0 support.
William Tu [Sun, 26 Apr 2020 15:04:15 +0000 (08:04 -0700)]
erspan: Add type I version 0 support.

The Type I ERSPAN frame format is based on the barebones
IP + GRE(4-byte) encapsulation on top of the raw mirrored frame.
Both type I and II use 0x88BE as protocol type. Unlike type II
and III, no sequence number or key is required.

To creat a type I erspan tunnel device:
$ ip link add dev erspan11 type erspan \
local 172.16.1.100 remote 172.16.1.200 \
erspan_ver 0

CC: Dmitriy Andreyevskiy <dandreye@cisco.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoman: ip.8: add reference to mptcp man-page
Paolo Abeni [Wed, 29 Apr 2020 17:17:22 +0000 (19:17 +0200)]
man: ip.8: add reference to mptcp man-page

While at it, additionally fix a mandoc warning in mptcp.8

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoMerge branch 'mptcp' into next
David Ahern [Wed, 29 Apr 2020 16:50:25 +0000 (16:50 +0000)]
Merge branch 'mptcp' into next

Paolo Abeni  says:

====================

This introduces support for the MPTCP PM netlink interface, allowing admins
to configure several aspects of the MPTCP path manager. The subcommand is
documented with a newly added man-page.

This series also includes support for MPTCP subflow diag.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoman: mptcp man page
Paolo Abeni [Thu, 23 Apr 2020 13:37:10 +0000 (15:37 +0200)]
man: mptcp man page

describe the mptcp subcommands implemented so far.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoss: allow dumping MPTCP subflow information
Davide Caratti [Thu, 23 Apr 2020 13:37:09 +0000 (15:37 +0200)]
ss: allow dumping MPTCP subflow information

 [root@f31 packetdrill]# ss -tni

 ESTAB    0        0           192.168.82.247:8080           192.0.2.1:35273
          cubic wscale:7,8 [...] tcp-ulp-mptcp flags:Mec token:0000(id:0)/5f856c60(id:0) seq:b810457db34209a5 sfseq:1 ssnoff:0 maplen:190

Additionally extends ss manpage to describe the new entry layout.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoadd support for mptcp netlink interface
Paolo Abeni [Thu, 23 Apr 2020 13:37:08 +0000 (15:37 +0200)]
add support for mptcp netlink interface

Implement basic commands to:
- manipulate MPTCP endpoints list
- manipulate MPTCP connection limits

Examples:
1. Allows multiple subflows per MPTCP connection
   $ ip mptcp limits set subflows 2

2. Accept ADD_ADDR announcement from the peer (server):
   $ ip mptcp limits set add_addr_accepted 2

3. Add a ipv4 address to be annunced for backup subflows:
   $ ip mptcp endpoint add 10.99.1.2 signal backup

4. Add an ipv6 address used as source for additional subflows:
   $ ip mptcp endpoint add 2001::2 subflow

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoUpdate kernel headers and import mptcp.h
David Ahern [Wed, 29 Apr 2020 16:41:39 +0000 (16:41 +0000)]
Update kernel headers and import mptcp.h

Update kernel headers to commit
    790ab249b55d ("net: ethernet: fec: Prevent MII event after MII_SPEED write")

and import mptcp.h

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agotc: fq: add timer_slack parameter
Eric Dumazet [Mon, 27 Apr 2020 18:05:43 +0000 (11:05 -0700)]
tc: fq: add timer_slack parameter

Commit 583396f4ca4d ("net_sched: sch_fq: enable use of hrtimer slack")
added TCA_FQ_TIMER_SLACK parameter, with a default value of 10 usec.

Add the corresponding tc support to get/set this tunable.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agotc: fq_codel: add drop_batch parameter
Eric Dumazet [Mon, 27 Apr 2020 17:51:55 +0000 (10:51 -0700)]
tc: fq_codel: add drop_batch parameter

Commit 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
added the new TCA_FQ_CODEL_DROP_BATCH_SIZE parameter, set by default to 64.

Add to tc command the ability to get/set the drop_batch

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoxfrm: also check for ipv6 state in xfrm_state_keep
Xin Long [Mon, 27 Apr 2020 07:14:24 +0000 (15:14 +0800)]
xfrm: also check for ipv6 state in xfrm_state_keep

As commit f9d696cf414c ("xfrm: not try to delete ipcomp states when using
deleteall") does, this patch is to fix the same issue for ip6 state where
xsinfo->id.proto == IPPROTO_IPV6.

  # ip xfrm state add src 2000::1 dst 2000::2 spi 0x1000 \
    proto comp comp deflate mode tunnel sel src 2000::1 dst \
    2000::2 proto gre
  # ip xfrm sta deleteall
  Failed to send delete-all request
  : Operation not permitted

Note that the xsinfo->proto in common states can never be IPPROTO_IPV6.

Fixes: f9d696cf414c ("xfrm: not try to delete ipcomp states when using deleteall")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agotc: m_action: check cookie hex string len
Jiri Pirko [Mon, 27 Apr 2020 06:10:55 +0000 (08:10 +0200)]
tc: m_action: check cookie hex string len

Check the cookie hex string len is dividable by 2 as the valid hex
string always should be.

Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoMerge branch 'macsec-offload' into next
David Ahern [Sun, 26 Apr 2020 18:32:20 +0000 (18:32 +0000)]
Merge branch 'macsec-offload' into next

Igor Russkikh  says:

====================

From: Mark Starovoytov <mstarovoitov@marvell.com>

This series adds support for selecting the offloading mode of a MACsec
interface at link creation time.
Available modes are for now 'off', 'phy' and 'mac', 'off' being the default
when an interface is created.

First patch adds support for MAC offloading.

Last patch allows a user to change the offloading mode at runtime
through a new attribute, `ip link add link ... offload`:

  # ip link add link enp1s0 type macsec encrypt on offload off
  # ip link add link enp1s0 type macsec encrypt on offload phy
  # ip link add link enp1s0 type macsec encrypt on offload mac

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agomacsec: add support for specifying offload at link add time
Mark Starovoytov [Fri, 24 Apr 2020 08:38:57 +0000 (11:38 +0300)]
macsec: add support for specifying offload at link add time

This patch adds support for configuring offload mode upon MACsec
device creation.

If offload mode is not specified, then netlink attribute is not
added. Default behavior on the kernel side in this case is
backward-compatible (offloading is disabled by default).

Example:
$ ip link add link eth0 macsec0 type macsec port 11 encrypt on offload mac

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agomacsec: add support for MAC offload
Mark Starovoytov [Fri, 24 Apr 2020 08:38:56 +0000 (11:38 +0300)]
macsec: add support for MAC offload

This patch enables MAC HW offload usage in iproute, since MACSec
implementation supports it now.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agobridge: man page spelling fixes
Stephen Hemminger [Mon, 20 Apr 2020 16:48:57 +0000 (09:48 -0700)]
bridge: man page spelling fixes

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoState of bridge STP port are now case insensitive
Bastien Roucariès [Sun, 12 Apr 2020 23:50:38 +0000 (01:50 +0200)]
State of bridge STP port are now case insensitive

Improve use experience

Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoDocument root_block option
Bastien Roucariès [Sun, 12 Apr 2020 23:50:37 +0000 (01:50 +0200)]
Document root_block option

Root_block is also called root port guard, document it.

Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoBetter documentation of BDPU guard
Bastien Roucariès [Sun, 12 Apr 2020 23:50:36 +0000 (01:50 +0200)]
Better documentation of BDPU guard

Document that guard disable the port and how to reenable it

Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoDocument BPDU filter option
Bastien Roucariès [Sun, 12 Apr 2020 23:50:35 +0000 (01:50 +0200)]
Document BPDU filter option

Disabled state is also BPDU filter

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoImprove hairpin mode description
Bastien Roucariès [Sun, 12 Apr 2020 23:50:34 +0000 (01:50 +0200)]
Improve hairpin mode description

Mention VEPA and reflective relay.

Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoBetter documentation of mcast_to_unicast option
Bastien Roucariès [Sun, 12 Apr 2020 23:50:33 +0000 (01:50 +0200)]
Better documentation of mcast_to_unicast option

This option is useful for Wifi bridge but need some tweak.

Document it from kernel patches documentation

Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoman: replace $(NETNS_ETC_DIR) and $(NETNS_RUN_DIR) in ip-netns(8)
Brian Norris [Tue, 7 Apr 2020 17:43:06 +0000 (10:43 -0700)]
man: replace $(NETNS_ETC_DIR) and $(NETNS_RUN_DIR) in ip-netns(8)

These can be configured to different paths. Reflect that in the
generated documentation.

Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoman: add ip-netns(8) as generation target
Brian Norris [Tue, 7 Apr 2020 17:43:05 +0000 (10:43 -0700)]
man: add ip-netns(8) as generation target

Prepare for adding new variable substitutions. Unify the sed rules while
we're at it, since there's no need to write this out 4 times.

Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agotc: fq_codel: fix class stat deficit is signed int
Benjamin Lee [Wed, 15 Apr 2020 04:11:12 +0000 (21:11 -0700)]
tc: fq_codel: fix class stat deficit is signed int

The fq_codel class stat deficit is a signed int.  This is a regression
from when JSON output was added.

Fixes: 997f2dc19378 ("tc: Add JSON output of fq_codel stats")
Signed-off-by: Benjamin Lee <ben@b1c1l1.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoq_cake: properly print memlimit
Odin Ugedal [Wed, 15 Apr 2020 14:39:35 +0000 (16:39 +0200)]
q_cake: properly print memlimit

Load memlimit so that it will be printed if it isn't set to zero.

Also add a space to properly print it.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoq_cake: Make fwmark uint instead of int
Odin Ugedal [Wed, 15 Apr 2020 14:39:34 +0000 (16:39 +0200)]
q_cake: Make fwmark uint instead of int

This will help avoid overflow, since setting it to 0xffffffff would
result in -1 when converted to integer, resulting in being "-1", setting
the fwmark to 0x00.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agotc_util: detect overflow in get_size
Odin Ugedal [Thu, 16 Apr 2020 14:08:14 +0000 (16:08 +0200)]
tc_util: detect overflow in get_size

This detects overflow during parsing of value using get_size:

eg. running:

$ tc qdisc add dev lo root cake memlimit 11gb

currently gives a memlimit of "3072Mb", while with this patch it errors
with 'illegal value for "memlimit": "11gb"', since memlinit is an
unsigned integer.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agodevlink: Add devlink health auto_dump command support
Eran Ben Elisha [Tue, 14 Apr 2020 06:57:52 +0000 (09:57 +0300)]
devlink: Add devlink health auto_dump command support

Add support for configuring auto_dump attribute per reporter.
With this attribute, one can indicate whether the devlink kernel core
should execute automatic dump on error.

The change will be reflected in show, set and man commands.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoMerge branch 'master' into next
David Ahern [Sun, 19 Apr 2020 22:26:27 +0000 (22:26 +0000)]
Merge branch 'master' into next

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoman: tc-htb.8: fix class prio is not mandatory
Benjamin Lee [Thu, 9 Apr 2020 05:12:15 +0000 (22:12 -0700)]
man: tc-htb.8: fix class prio is not mandatory

Fix description for htb class prio parameter to indicate it's not
mandatory.

Signed-off-by: Benjamin Lee <ben@b1c1l1.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoman: tc-htb.8: add missing class parameter quantum
Benjamin Lee [Thu, 9 Apr 2020 05:12:14 +0000 (22:12 -0700)]
man: tc-htb.8: add missing class parameter quantum

Add description for htb class parameter quantum.

Signed-off-by: Benjamin Lee <ben@b1c1l1.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoman: tc-htb.8: add missing qdisc parameter r2q
Benjamin Lee [Thu, 9 Apr 2020 05:12:13 +0000 (22:12 -0700)]
man: tc-htb.8: add missing qdisc parameter r2q

Add description for htb qdisc parameter r2q.

Signed-off-by: Benjamin Lee <ben@b1c1l1.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoip: link_gre: Do not send ERSPAN attributes to GRE tunnels
Petr Machata [Fri, 3 Apr 2020 22:55:34 +0000 (01:55 +0300)]
ip: link_gre: Do not send ERSPAN attributes to GRE tunnels

In the commit referenced below, ip link started sending ERSPAN-specific
attributes even for GRE and gretap tunnels. Fix by more carefully
distinguishing between the GRE/tap and ERSPAN modes. Do not show
ERSPAN-related help in GRE/tap mode, likewise do not accept ERSPAN
arguments, or send ERSPAN attributes.

Fixes: 83c543af872e ("erspan: set erspan_ver to 1 by default")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agodevlink: fix JSON output of mon command
Jiri Pirko [Thu, 9 Apr 2020 18:29:51 +0000 (20:29 +0200)]
devlink: fix JSON output of mon command

The current JSON output of mon command is broken. Fix it and make sure
that the output is a valid JSON. Also, handle SIGINT gracefully to allow
to end the JSON properly.

Example:
$ devlink mon -j -p
{
    "mon": [ {
            "command": "new",
            "dev": {
                "netdevsim/netdevsim10": {}
            }
        },{
            "command": "new",
            "port": {
                "netdevsim/netdevsim10/0": {
                    "type": "notset",
                    "flavour": "physical",
                    "port": 1
                }
            }
        },{
            "command": "new",
            "port": {
                "netdevsim/netdevsim10/0": {
                    "type": "eth",
                    "netdev": "eth0",
                    "flavour": "physical",
                    "port": 1
                }
            }
        },{
            "command": "new",
            "port": {
                "netdevsim/netdevsim10/0": {
                    "type": "notset",
                    "flavour": "physical",
                    "port": 1
                }
            }
        },{
            "command": "del",
            "port": {
                "netdevsim/netdevsim10/0": {
                    "type": "notset",
                    "flavour": "physical",
                    "port": 1
                }
            }
        },{
            "command": "del",
            "dev": {
                "netdevsim/netdevsim10": {}
            }
        } ]
}

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoMerge branch 'master' into next
David Ahern [Thu, 9 Apr 2020 14:42:33 +0000 (14:42 +0000)]
Merge branch 'master' into next

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoman: tc-pedit: Drop the claim that pedit ex is only for IPv4
Petr Machata [Fri, 3 Apr 2020 23:05:31 +0000 (02:05 +0300)]
man: tc-pedit: Drop the claim that pedit ex is only for IPv4

This sentence predates addition of extended pedit for IPv6 packets.

Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>