]> git.proxmox.com Git - mirror_iproute2.git/log
mirror_iproute2.git
6 years agodevlink: Add support for devlink-region access
Alex Vesker [Tue, 17 Jul 2018 08:34:27 +0000 (11:34 +0300)]
devlink: Add support for devlink-region access

Devlink region allows access to driver defined address regions.
Each device can create its supported address regions and register
them. A device which exposes a region will allow access to it
using devlink.

This support allows reading and dumping regions snapshots as well
as presenting information such as region size and current available
snapshots.

A snapshot represents a memory image of a region taken by the driver.
If a device collects a snapshot of an address region it can be later
exposed using devlink region read or dump commands.
This functionality allows for future analyses on the snapshots.

The dump command is designed to read the full address space of a
region or of a snapshot unlike the read command which allows
reading only a specific section in a region/snapshot indicated by
an address and a length, current support is for reading and dumping
for a previously taken snapshot ID.

New commands added:
 devlink region show [ DEV/REGION ]
 devlink region delete DEV/REGION snapshot SNAPSHOT_ID
 devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ]
 devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ]
                                address ADDRESS length length

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agonet:sched: add action inheritdsfield to skbedit
Qiaobin Fu [Thu, 19 Jul 2018 16:07:18 +0000 (12:07 -0400)]
net:sched: add action inheritdsfield to skbedit

The new action inheritdsfield copies the field DS of
IPv4 and IPv6 packets into skb->priority. This enables
later classification of packets based on the DS field.

v4:
* Make tc use netlink helper functions

v3:
* Make flag represented in JSON output as a null value

v2:
* Align the output syntax with the input syntax

* Fix the style issues

Original idea by Jamal Hadi Salim <jhs@mojatatu.com>

Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoip: add support for seg6local End.BPF action
Mathieu Xhonneux [Tue, 17 Jul 2018 14:49:52 +0000 (14:49 +0000)]
ip: add support for seg6local End.BPF action

This patch adds support for the End.BPF action of the seg6local
lightweight tunnel. Functions from the BPF lightweight tunnel are
re-used in this patch. Example:

$ ip -6 route add fc00::18 encap seg6local action End.BPF endpoint
obj my_bpf.o sec my_func dev eth0

$ ip -6 route show fc00::18
fc00::18  encap seg6local action End.BPF endpoint my_bpf.o:[my_func]
dev eth0 metric 1024 pref medium

v2: - re-use of print_encap_bpf_prog instead of fprintf
    - introduction of "endpoint" keyword for more consistency with
      others parameters

Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoipaddress: Fix and make consistent label match handling
Serhey Popovych [Sat, 14 Jul 2018 21:36:34 +0000 (00:36 +0300)]
ipaddress: Fix and make consistent label match handling

Since commit 9516823051ce ("ipaddress: Improve print_linkinfo()") we
return -1 instead of 0 when ip-address(8) label does not match network
device name as we did before change. This causes regression when trying
to output ip address matching label:

     # ip addr add 192.168.192.1/24 dev lo label lo:1
     # ip addr show label lo:1
     <no output>

This is special case and return 0 from print_linkinfo() earlier to match
only filter.ifindex and filter.up if given, but not rest fields in
@filter. Then call print_selected_addrinfo() without calling
print_link_stats() in ipaddr_list_flush_or_save().

Later print_selected_addrinfo() calls print_addrinfo() that finally
matches IFA_LABEL attribute in netlink buffer with filter.label using
ifa_label_match_rta().

On the other hand there is three conditions checked in print_linkinfo()
to determine label special case:

    1) filter.label != NULL
    2) filter.family == AF_UNSPEC || filter.family == AF_PACKET
    3) fnmatch(filter.label, name, 0)

With 1) it is ok to check if filtering by label is on by given pattern
in @filter.label.

Since label is IPv4 specific and AF_PACKET is for printing ip-link(8)
information (see ipaddr_link_list()::ipaddress.c as example) checking
for AF_PACKET in 2) doesn't take much sense: better to defer these
checks to print_addrinfo() determine valid combinations before calling
ifa_label_match_rta() to finally match IFA_LABEL to pattern in
filter.label.

For 3) we have following call for test case:

    fnmatch(pattern, string, flags) ->
      fnmatch(filter.label, name, 0) ->
        fnmatch("lo:1", "lo", 0) == FNM_NOMATCH (1) or non-zero on error

To support special case in print_linkinfo() for filtering by label we
only need to check if label pattern is given in filter.label and return
0 to skip print_link_stats() in ipaddr_list_flush_or_save(): actual
filtering will be done in print_addrinfo().

Before commit 9516823051ce ("ipaddress: Improve print_linkinfo()"):
-------------------------------------------------------------------

$ ip addr sh label lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN \
group default qlen 1000
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                          fnmatch("lo", "lo", 0) == 0
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
$ ip addr show label 'lo:*'
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever
$ ip addr sh label lo:1
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever
$ ip -4 addr sh label lo:1
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN \
group default qlen 1000
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                             filter.family == AF_INET
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever

After this change applied:
--------------------------

$ ip/ip addr show label lo
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
$ ip/ip addr show label 'lo:*'
    inet 192.168.192.1/24 scope global lo:1
        valid_lft forever preferred_lft forever
$ ip/ip addr show label lo:1
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever
$ ip/ip -4 addr show label lo:1
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever

Note that we no longer show link information as we did previously:
    we are filtering by "label" pattern, not showing by "dev".

Fixes: commit 9516823051ce ("ipaddress: Improve print_linkinfo()")
Reported-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoMerge branch 'bpf-btf' into iproute2-next
David Ahern [Wed, 18 Jul 2018 02:39:06 +0000 (19:39 -0700)]
Merge branch 'bpf-btf' into iproute2-next

Daniel Borkmann  says:

====================

Main part of this set is to: i) avoid strict af_alg kernel dependency,
ii) add loader support for bpf to bpf calls and iii) add btf loader
support with an option to annotate maps. For details please see the
individual patches. Thanks!

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agobpf: implement btf handling and map annotation
Daniel Borkmann [Tue, 17 Jul 2018 23:31:22 +0000 (01:31 +0200)]
bpf: implement btf handling and map annotation

Implement loading of .BTF section from object file and build up
internal table for retrieving key/value id related to maps in
the BPF program. Latter is done by setting up struct btf_type
table.

One of the issues is that there's a disconnect between the data
types used in the map and struct bpf_elf_map, meaning the underlying
types are unknown from the map description. One way to overcome
this is to add a annotation such that the loader will recognize
the relation to both. BPF_ANNOTATE_KV_PAIR(map_foo, struct key,
struct val); has been added to the API that programs can use.

The loader will then pick the corresponding key/value type ids and
attach it to the maps for creation. This can later on be dumped via
bpftool for introspection.

Example with test_xdp_noinline.o from kernel selftests:

  [...]

  struct ctl_value {
        union {
                __u64 value;
                __u32 ifindex;
                __u8 mac[6];
        };
  };

  struct bpf_map_def __attribute__ ((section("maps"), used)) ctl_array = {
        .type = BPF_MAP_TYPE_ARRAY,
        .key_size = sizeof(__u32),
        .value_size = sizeof(struct ctl_value),
        .max_entries = 16,
        .map_flags = 0,
  };
  BPF_ANNOTATE_KV_PAIR(ctl_array, __u32, struct ctl_value);

  [...]

Above could also further be wrapped in a macro. Compiling through LLVM and
converting to BTF:

  # llc --version
  LLVM (http://llvm.org/):
    LLVM version 7.0.0svn
    Optimized build.
    Default target: x86_64-unknown-linux-gnu
    Host CPU: skylake

    Registered Targets:
      bpf    - BPF (host endian)
      bpfeb  - BPF (big endian)
      bpfel  - BPF (little endian)
  [...]

  # clang [...] -O2 -target bpf -g -emit-llvm -c test_xdp_noinline.c -o - |
    llc -march=bpf -mcpu=probe -mattr=dwarfris -filetype=obj -o test_xdp_noinline.o
  # pahole -J test_xdp_noinline.o

Checking pahole dump of BPF object file:

  # file test_xdp_noinline.o
  test_xdp_noinline.o: ELF 64-bit LSB relocatable, *unknown arch 0xf7* version 1 (SYSV), with debug_info, not stripped
  # pahole test_xdp_noinline.o
  [...]
  struct ctl_value {
union {
__u64              value;                /*     0     8 */
__u32              ifindex;              /*     0     4 */
__u8               mac[0];               /*     0     0 */
};                                               /*     0     8 */

/* size: 8, cachelines: 1, members: 1 */
/* last cacheline: 8 bytes */
  };

Now loading into kernel and dumping the map via bpftool:

  # ip -force link set dev lo xdp obj test_xdp_noinline.o sec xdp-test
  # ip a
  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric/id:227 qdisc noqueue state UNKNOWN group default qlen 1000
      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      inet 127.0.0.1/8 scope host lo
         valid_lft forever preferred_lft forever
      inet6 ::1/128 scope host
         valid_lft forever preferred_lft forever
  [...]
  # bpftool prog show id 227
  227: xdp  tag a85e060c275c5616  gpl
      loaded_at 2018-07-17T14:41:29+0000  uid 0
      xlated 8152B  not jited  memlock 12288B  map_ids 381,385,386,382,384,383
  # bpftool map dump id 386
   [{
        "key": 0,
        "value": {
            "": {
                "value": 0,
                "ifindex": 0,
                "mac": []
            }
        }
    },{
        "key": 1,
        "value": {
            "": {
                "value": 0,
                "ifindex": 0,
                "mac": []
            }
        }
    },{
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agobpf: implement bpf to bpf calls support
Daniel Borkmann [Tue, 17 Jul 2018 23:31:21 +0000 (01:31 +0200)]
bpf: implement bpf to bpf calls support

Implement missing bpf to bpf calls support. The loader will
recognize .text section and handle relocation entries that
are emitted by LLVM.

First step is processing of map related relocation entries
for .text section, and in a second step loader will copy .text
section into program section and adjust call instruction
offset accordingly.

Example with test_xdp_noinline.o from kernel selftests:

 1) Every function as __attribute__ ((always_inline)), rest
    left unchanged:

  # ip -force link set dev lo xdp obj test_xdp_noinline.o sec xdp-test
  # ip a
  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric/id:233 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
  [...]
  # bpftool prog dump xlated id 233
  [...]
  1669: (2d) if r3 > r2 goto pc+4
  1670: (79) r2 = *(u64 *)(r10 -136)
  1671: (61) r2 = *(u32 *)(r2 +0)
  1672: (63) *(u32 *)(r1 +0) = r2
  1673: (b7) r0 = 1
  1674: (95) exit        <-- 1674 insns total

 2) Every function as __attribute__ ((noinline)), rest
    left unchanged:

  # ip -force link set dev lo xdp obj test_xdp_noinline.o sec xdp-test
  # ip a
  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric/id:236 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
  [...]
  # bpftool prog dump xlated id 236
  [...]
  1000: (bf) r1 = r6
  1001: (b7) r2 = 24
  1002: (85) call pc+3   <-- pc-relative call insns
  1003: (1f) r7 -= r0
  1004: (bf) r0 = r7
  1005: (95) exit
  1006: (bf) r0 = r1
  1007: (bf) r1 = r2
  1008: (67) r1 <<= 32
  1009: (77) r1 >>= 32
  1010: (bf) r3 = r0
  1011: (6f) r3 <<= r1
  1012: (87) r2 = -r2
  1013: (57) r2 &= 31
  1014: (67) r0 <<= 32
  1015: (77) r0 >>= 32
  1016: (7f) r0 >>= r2
  1017: (4f) r0 |= r3
  1018: (95) exit        <-- 1018 insns total

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agobpf: remove strict dependency on af_alg
Daniel Borkmann [Tue, 17 Jul 2018 23:31:20 +0000 (01:31 +0200)]
bpf: remove strict dependency on af_alg

Do not bail out when AF_ALG is not supported by the kernel and
only do so when a map is requested in object ns where we're
calculating the hash. Otherwise, the loader can operate just
fine, therefore lets not fail early when it's not needed.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agobpf: move bpf_elf_map fixup notification under verbose
Daniel Borkmann [Tue, 17 Jul 2018 23:31:19 +0000 (01:31 +0200)]
bpf: move bpf_elf_map fixup notification under verbose

No need to spam the user with this if it can be fixed gracefully
anyway. Therefore, move it under verbose option.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoImport btf.h from kernel headers
David Ahern [Wed, 18 Jul 2018 02:37:50 +0000 (19:37 -0700)]
Import btf.h from kernel headers

Import btf.h from kernel headers at commit
    2aa4a3378ad0 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next")
which is the last sync point.

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoipneigh: exclude NTF_EXT_LEARNED from default filter
Roopa Prabhu [Mon, 16 Jul 2018 22:19:52 +0000 (15:19 -0700)]
ipneigh: exclude NTF_EXT_LEARNED from default filter

NUD_NOARP entries are filtered out by default by iproute2.
We dont want NUD_NOARP with NTF_EXT_LEARNED flag filtered out.
This patch extends the default filter check for ip neigh show
to include the NTF_EXT_LEARNED flag.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoiplink: add support for reporting multiple XDP programs
Jakub Kicinski [Fri, 13 Jul 2018 22:54:51 +0000 (15:54 -0700)]
iplink: add support for reporting multiple XDP programs

Kernel now supports attaching XDP programs in the driver
and hardware at the same time.  Print that information
correctly.

In case there are multiple programs attached kernel will
not provide IFLA_XDP_PROG_ID, so don't expect it to be
there (this also improves the printing for very old kernels
slightly, as it avoids unnecessary "prog/xdp" line).

In short mode preserve the current outputs but don't print
IDs if there are multiple.

6: netdevsim0: <BROADCAST,NOARP> mtu 1500 xdpoffload/id:11 qdisc [...]

and:

6: netdevsim0: <BROADCAST,NOARP> mtu 1500 xdpmulti qdisc [...]

ip link output will keep using prog/xdp prefix if only one program
is attached, but can also print multiple program lines:

    prog/xdp id 8 tag fc7a51d1a693a99e jited

vs:

    prog/xdpdrv id 8 tag fc7a51d1a693a99e jited
    prog/xdpoffload id 9 tag fc7a51d1a693a99e

JSON output gains a new array called "attached" which will
contain the full list of attached programs along with their
attachment modes:

        "xdp": {
            "mode": 3,
            "prog": {
                "id": 11,
                "tag": "fc7a51d1a693a99e",
                "jited": 0
            },
            "attached": [ {
                    "mode": 3,
                    "prog": {
                        "id": 11,
                        "tag": "fc7a51d1a693a99e",
                        "jited": 0
                    }
                } ]
        },

In case there are multiple programs attached the general "xdp"
section will not contain program information:

        "xdp": {
            "mode": 4,
            "attached": [ {
                    "mode": 1,
                    "prog": {
                        "id": 10,
                        "tag": "fc7a51d1a693a99e",
                        "jited": 1
                    }
                },{
                    "mode": 3,
                    "prog": {
                        "id": 11,
                        "tag": "fc7a51d1a693a99e",
                        "jited": 0
                    }
                } ]
        },

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agotc: flower: Add support for QinQ
Jianbo Liu [Sat, 30 Jun 2018 10:01:33 +0000 (10:01 +0000)]
tc: flower: Add support for QinQ

To support matching on both outer and inner vlan headers,
we add new cvlan_id/cvlan_prio/cvlan_ethtype for inner vlan header.

Example:
# tc filter add dev eth0 protocol 802.1ad parent ffff: \
    flower vlan_id 1000 vlan_ethtype 802.1q \
        cvlan_id 100 cvlan_ethtype ipv4 \
    action vlan pop \
    action vlan pop \
    action mirred egress redirect dev eth1

# tc filter show dev eth0 ingress
filter protocol 802.1ad pref 1 flower chain 0
filter protocol 802.1ad pref 1 flower chain 0 handle 0x1
  vlan_id 1000
  vlan_ethtype 802.1Q
  cvlan_id 100
  cvlan_ethtype ip
  eth_type ipv4
  in_hw

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoUpdate kernel headers
David Ahern [Sun, 15 Jul 2018 20:02:51 +0000 (13:02 -0700)]
Update kernel headers

Update kernel headers to commit
2aa4a3378ad0 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next")

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoMerge branch 'tc-etf' into iproute2-next
David Ahern [Thu, 12 Jul 2018 00:51:46 +0000 (17:51 -0700)]
Merge branch 'tc-etf' into iproute2-next

Jesus Sanchez-Palencia  says:

====================

fixes since v3:
 - Add support for clock names with the "CLOCK_" prefix;
 - Print clock name on print_opt();
 - Use strcasecmp() instead of strncasecmp().

The ETF (earliest txtime first) qdisc was recently merged into net-next
[1], so this patchset adds support for it through the tc command line
tool.

An initial man page is also provided.

The first commit in this series is adding an updated version of
include/uapi/linux/pkt_sched.h and is not meant to be merged. It's
provided here just as a convenience for those who want to easily build
this patchset.

[1] https://patchwork.ozlabs.org/cover/938991/

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoman: Add initial manpage for tc-etf(8)
Jesus Sanchez-Palencia [Mon, 9 Jul 2018 23:56:13 +0000 (16:56 -0700)]
man: Add initial manpage for tc-etf(8)

Add an initial manpage for tc-etf covering all config options, basic
concepts and operation modes.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agotc: Add support for the ETF Qdisc
Vinicius Costa Gomes [Mon, 9 Jul 2018 23:56:12 +0000 (16:56 -0700)]
tc: Add support for the ETF Qdisc

The "Earliest TxTime First" (ETF) queueing discipline allows precise
control of the transmission time of packets by providing a sorted
time-based scheduling of packets.

The syntax is:

tc qdisc add dev DEV parent NODE etf delta <DELTA>
                     clockid <CLOCKID> [offload] [deadline_mode]

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agotc: don't double print rate
Stephen Hemminger [Mon, 9 Jul 2018 16:53:45 +0000 (09:53 -0700)]
tc: don't double print rate

Conversion to print stats in JSON forgot to remove existing
fprintf.

Fixes: 4fcec7f3665b ("tc: jsonify stats2")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoman: Fix typos on tc-cbs
Jesus Sanchez-Palencia [Thu, 5 Jul 2018 15:20:09 +0000 (08:20 -0700)]
man: Fix typos on tc-cbs

Fix 2 typos on the man page of the CBS qdisc.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agotc: Fix the bug not to display prio and quantum options of htb
fumihiko kakuma [Wed, 4 Jul 2018 03:32:33 +0000 (12:32 +0900)]
tc: Fix the bug not to display prio and quantum options of htb

A commandline like 'tc -d class show dev dev-name' does not
display value of prio and quantum option when we use htb qdisc.
This patch fixes the bug.

Signed-off-by: Fumihiko Kakuma <kakuma@valinux.co.jp>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agotc: Fix output of ip attributes
Roi Dayan [Tue, 3 Jul 2018 12:54:32 +0000 (15:54 +0300)]
tc: Fix output of ip attributes

Example output is of tos and ttl.
Befoe this fix the format used %x caused output of the pointer
instead of the intended string created in the out variable.

Fixes: e28b88a464c4 ("tc: jsonify flower filter")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agouapi: update bpf.h
Stephen Hemminger [Sat, 7 Jul 2018 16:56:14 +0000 (09:56 -0700)]
uapi: update bpf.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agotc: m_tunnel_key: Add tunnel option support to act_tunnel_key
Simon Horman [Fri, 6 Jul 2018 00:12:00 +0000 (17:12 -0700)]
tc: m_tunnel_key: Add tunnel option support to act_tunnel_key

Allow setting tunnel options using the act_tunnel_key action.

Options are expressed as class:type:data and multiple options
may be listed using a comma delimiter.

 # ip link add name geneve0 type geneve dstport 0 external
 # tc qdisc add dev eth0 ingress
 # tc filter add dev eth0 protocol ip parent ffff: \
     flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
            set src_ip 10.0.99.192 \
            dst_ip 10.0.99.193 \
            dst_port 6081 \
            id 11 \
            geneve_opts 0102:80:00800022,0102:80:00800022 \
    action mirred egress redirect dev geneve0

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agodevlink: Add param command support
Moshe Shemesh [Wed, 4 Jul 2018 14:12:06 +0000 (17:12 +0300)]
devlink: Add param command support

Add support for configuration parameters set and show.
Each parameter can be either generic or driver-specific.
The user can retrieve data on these configuration parameters by devlink
param show command and can set new value to a configuration parameter
by devlink param set command.
The configuration parameters can be set in different configuration
modes:
  runtime - set while driver is running, no reset required.
  driverinit - applied while driver initializes, requires restart
               driver by devlink reload command.
  permanent - written to device's non-volatile memory, hard reset
              required to apply.

New commands added:
  devlink dev param show [DEV name PARAMETER]
  devlink dev param set DEV name PARAMETER value VALUE
    cmode { permanent | driverinit | runtime }

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoUpdate kernel headers
David Ahern [Fri, 6 Jul 2018 15:42:22 +0000 (08:42 -0700)]
Update kernel headers

Update kernel headers to commit
ab8565af68001 ("Merge branch 'IP-listification-follow-ups'")

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agobridge: add support for isolated option
Nikolay Aleksandrov [Tue, 3 Jul 2018 12:42:42 +0000 (15:42 +0300)]
bridge: add support for isolated option

This patch adds support for the new isolated port option which, if set,
would allow the isolated ports to communicate only with non-isolated
ports and the bridge device. The option can be set via the bridge or ip
link type bridge_slave commands, e.g.:
$ ip link set dev eth0 type bridge_slave isolated on
$ bridge link set dev eth0 isolated on

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoMerge branch 'iproute2-master' into iproute2-next
David Ahern [Thu, 21 Jun 2018 15:12:39 +0000 (08:12 -0700)]
Merge branch 'iproute2-master' into iproute2-next

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agotc: jsonify nat action
Keara Leibovitz [Mon, 18 Jun 2018 18:57:49 +0000 (14:57 -0400)]
tc: jsonify nat action

Add json output support for nat action

Example output:

~$ $TC actions add action nat egress 10.10.10.1 20.20.20.2 index 2
~$ $TC actions add action nat ingress 100.100.100.1/32 200.200.200.2 \
continue index 99
~$ $TC -j actions ls action nat

[{
"total acts": 2
}, {
"actions": [{
"order": 0,
"type": "nat",
"direction": "egress",
"old_addr": "10.10.10.1/32",
"new_addr": "20.20.20.2",
"control_action": {
"type": "pass"
},
"index": 2,
"ref": 1,
"bind": 0
}, {
"order": 1,
"type": "nat",
"direction": "ingress",
"old_addr": "100.100.100.1/32",
"new_addr": "200.200.200.2",
"control_action": {
"type": "continue"
},
"index": 99,
"ref": 1,
"bind": 0
}]
}]

Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agodevlink.8, translate unparseable callout syntax to parseable form.
Eric S. Raymond [Wed, 13 Jun 2018 21:31:12 +0000 (17:31 -0400)]
devlink.8, translate unparseable callout syntax to parseable form.

Signed-off-by: Eric S. Raymond <esr@thyrsus.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agotc: fix batch force option
Vlad Buslov [Wed, 20 Jun 2018 07:24:21 +0000 (10:24 +0300)]
tc: fix batch force option

When sending accumulated compound command results an error, check 'force'
option before exiting. Move return code check after putting batch bufs and
freeing iovs to prevent memory leak. Break from loop, instead of returning
error code to allow cleanup at the end of batch function. Don't reset ret
code on each iteration.

Fixes: 485d0c6001c4 ("tc: Add batchsize feature for filter and actions")
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoip-xfrm: Add support for OUTPUT_MARK
Subash Abhinov Kasiviswanathan [Sat, 16 Jun 2018 02:32:55 +0000 (20:32 -0600)]
ip-xfrm: Add support for OUTPUT_MARK

This patch adds support for OUTPUT_MARK in xfrm state to exercise the
functionality added by kernel commit 077fbac405bf
("net: xfrm: support setting an output mark.").

Sample output-

(with mark and output-mark)
src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        mark 0x10000/0x3ffff output-mark 0x20000
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

(with mark only)
src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        mark 0x10000/0x3ffff
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

(with output-mark only)
src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        output-mark 0x20000
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

(no mark and output-mark)
src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

v1->v2: Moved the XFRMA_OUTPUT_MARK print after XFRMA_MARK in
xfrm_xfrma_print() as mentioned by Lorenzo

v2->v3: Fix one help formatting error as mentioned by Lorenzo.
Keep mark and output-mark on the same line and add man page info as
mentioned by David.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoip: add rmnet initial support
Daniele Palmas [Fri, 15 Jun 2018 08:23:05 +0000 (10:23 +0200)]
ip: add rmnet initial support

This patch adds basic support for Qualcomm rmnet devices.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoipaddress: strengthen check on 'label' input
Patrick Talbert [Thu, 14 Jun 2018 13:46:57 +0000 (15:46 +0200)]
ipaddress: strengthen check on 'label' input

As mentioned in the ip-address man page, an address label must
be equal to the device name or prefixed by the device name
followed by a colon. Currently the only check on this input is
to see if the device name appears at the beginning of the label
string.

This commit adds an additional check to ensure label == dev or
continues with a colon.

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agordma: sync some IP headers with glibc
Hoang Le [Wed, 13 Jun 2018 04:09:56 +0000 (11:09 +0700)]
rdma: sync some IP headers with glibc

In the commit 9a362cc71a45, new userspace header:
  (i.e rdma/rdma_user_cm.h -> linux/in6.h)
is included before the kernel space header:
  (i.e utils.h -> resolv.h -> netinet/in.h).

This leads to unsynchronous some IP headers and compiler got failure
with error: redefinition of some structs IP.

In this commit, just reorder this including to make them in-sync.

Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agotipc: JSON support for tipc link printouts
Hoang Le [Tue, 12 Jun 2018 02:32:29 +0000 (09:32 +0700)]
tipc: JSON support for tipc link printouts

Add json output support for tipc link command

Example output:
$tipc -j -p link list

[ {
        "broadcast-link": "up",
        "1.1.1:bridge-1.1.104:eth0": "up",
        "1.1.1:bridge-1.1.105:eth0": "up",
        "1.1.1:bridge-1.1.106:eth0": "up"
    } ]

--------------------
$tipc -j -p link stat show link broadcast-link

[ {
        "link": "broadcast-link",
        "window": 50,
        "rx packets": {
            "rx packets": 0,
            "fragments": 0,
            "fragmented": 0,
            "bundles": 0,
            "bundled": 0
        },
        "tx packets": {
            "tx packets": 0,
            "fragments": 0,
            "fragmented": 0,
            "bundles": 0,
            "bundled": 0
        },
        "rx naks": {
            "rx naks": 0,
            "defs": 0,
            "dups": 0
        },
        "tx naks": {
            "tx naks": 0,
            "acks": 0,
            "retrans": 0
        },
        "congestion link": 0,
        "send queue max": 0,
        "avg": 0
    } ]

v2:
    Replace variable 'json_flag' by 'json' declared in include/utils.h

v3:
    Update manual page

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agotipc: JSON support for showing nametable
Hoang Le [Tue, 12 Jun 2018 02:32:28 +0000 (09:32 +0700)]
tipc: JSON support for showing nametable

Add json output support for nametable show

Example output:
$tipc -j -p nametable show

[ {
        "type": 0,
        "lower": 16781313,
        "upper": 16781313,
        "scope": "zone",
        "port": 0,
        "node": ""
    },{
        "type": 0,
        "lower": 16781416,
        "upper": 16781416,
        "scope": "cluster",
        "port": 0,
        "node": ""
    } ]

v2:
    Replace variable 'json_flag' by 'json' declared in include/utils.h
    Add new parameter '-pretty' to support pretty output

v3:
    Update manual page

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoiproute2: Add support for a few routing protocols
Donald Sharp [Fri, 8 Jun 2018 17:47:13 +0000 (13:47 -0400)]
iproute2: Add support for a few routing protocols

Add support for:

BGP
ISIS
OSPF
RIP
EIGRP

Routing protocols to iproute2.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoMerge branch 'iproute2-master' into iproute2-next
David Ahern [Sun, 10 Jun 2018 14:30:32 +0000 (07:30 -0700)]
Merge branch 'iproute2-master' into iproute2-next

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agouapi: update headers from linux-net
Stephen Hemminger [Fri, 8 Jun 2018 17:27:13 +0000 (10:27 -0700)]
uapi: update headers from linux-net

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoMerge ../iproute2-next
Stephen Hemminger [Fri, 8 Jun 2018 17:27:04 +0000 (10:27 -0700)]
Merge ../iproute2-next

6 years agov4.17.0
Stephen Hemminger [Fri, 8 Jun 2018 17:11:50 +0000 (10:11 -0700)]
v4.17.0

6 years agouapi: update bpf.h to include padding
Stephen Hemminger [Fri, 8 Jun 2018 17:09:21 +0000 (10:09 -0700)]
uapi: update bpf.h to include padding

Last minute upstream 4.17 change.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agotipc: TIPC_NLA_LINK_NAME value pass on nesting entry TIPC_NLA_LINK
Hoang Le [Fri, 8 Jun 2018 02:19:28 +0000 (09:19 +0700)]
tipc: TIPC_NLA_LINK_NAME value pass on nesting entry TIPC_NLA_LINK

In the commit 94f6a80 on next-net, TIPC_NLA_LINK_NAME attribute should be
retrieved and validated via TIPC_NLA_LINK nesting entry in
tipc_nl_node_get_link().
According to that commit, TIPC_NLA_LINK_NAME value passing via
tipc link get command must follow above hierachy.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoiplink: enable to specify a name for the link-netns
Nicolas Dichtel [Tue, 5 Jun 2018 13:08:31 +0000 (15:08 +0200)]
iplink: enable to specify a name for the link-netns

The 'link-netnsid' argument needs a number. Add 'link-netns' when the user
wants to use the iproute2 netns name instead of the nsid.

Example:
ip link add ipip1 link-netns foo type ipip remote 10.16.0.121 local 10.16.0.249

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoip: display netns name instead of nsid
Nicolas Dichtel [Tue, 5 Jun 2018 13:08:30 +0000 (15:08 +0200)]
ip: display netns name instead of nsid

When iproute2 has a name for the nsid, let's display it. It's more
user friendly than a number.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agotc: add json support in csum action
Keara Leibovitz [Tue, 5 Jun 2018 20:44:19 +0000 (16:44 -0400)]
tc: add json support in csum action

Add json output support for checksum action.

Example output:

~$ $TC actions add action csum udp continue index 7
~$ $TC actions add action csum icmp iph igmp pipe index 200 cookie 112233
~$ $TC -j actions ls action csum

[{
    "total acts":2
}, {
    "actions": [{
        "order":0,
        "csum":"udp",
        "control_action": {
            "type":"continue"
        },
        "index":7,
        "ref":1,
        "bind":0
    }, {
        "order":1,
        "csum":"iph, icmp, igmp",
        "control_action": {
            "type":"pipe"
        },
        "index":200,
        "ref":1,
        "bind":0,
        "cookie":"112233"
    }]
}]

v2:
    Don't initialized char buf[64];
    Add output example

Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoMerge branch 'iproute2-master' into iproute2-next
David Ahern [Tue, 5 Jun 2018 21:22:15 +0000 (14:22 -0700)]
Merge branch 'iproute2-master' into iproute2-next

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agodevlink: don't enforce NETLINK_{CAP,EXT}_ACK sock opts
Ivan Vecera [Fri, 1 Jun 2018 08:18:49 +0000 (10:18 +0200)]
devlink: don't enforce NETLINK_{CAP,EXT}_ACK sock opts

Since commit 049c58539f5d ("devlink: mnlg: Add support for extended ack")
devlink requires NETLINK_{CAP,EXT}_ACK. This prevents devlink from
working with older kernels that don't support these features.

host # ./devlink/devlink
Failed to connect to devlink Netlink

Fixes: 049c58539f5d ("devlink: mnlg: Add support for extended ack")
Cc: Arkadi Sharshevsky <arkadis@mellanox.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
6 years agoip: IFLA_NEW_NETNSID/IFLA_NEW_IFINDEX support
Nicolas Dichtel [Thu, 31 May 2018 14:28:48 +0000 (16:28 +0200)]
ip: IFLA_NEW_NETNSID/IFLA_NEW_IFINDEX support

Parse and display those attributes.
Example:
ip l a type dummy
ip netns add foo
ip monitor link&
ip l s dummy1 netns foo
Deleted 6: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 66:af:3a:3f:a0:89 brd ff:ff:ff:ff:ff:ff new-nsid 0 new-ifindex 6

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoiproute2: fix 'ip xfrm monitor all' command
Nathan Harold [Wed, 30 May 2018 19:11:32 +0000 (12:11 -0700)]
iproute2: fix 'ip xfrm monitor all' command

Currently, calling 'ip xfrm monitor all' will
actually invoke the 'all-nsid' command because the
soft-match for 'all-nsid' occurs before the precise
match for 'all'. This patch rearranges the checks
so that the 'all' command, itself an alias for
invoking 'ip xfrm monitor' with no argument, can
be called consistent with the syntax for other ip
commands that accept an 'all'.

Signed-off-by: Nathan Harold <nharold@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoiplink_vrf: Save device index from response for return code
David Ahern [Fri, 1 Jun 2018 15:50:16 +0000 (08:50 -0700)]
iplink_vrf: Save device index from response for return code

A recent commit changed rtnl_talk_* to return the response message in
allocated memory so callers need to free it. The change to name_is_vrf
did not save the device index which is pointing to a struct inside the
now allocated and freed memory resulting in garbage getting returned
in some cases.

Fix by using a stack variable to save the return value and only set
it to ifi->ifi_index after all checks are done and before the answer
buffer is freed.

Fixes: 86bf43c7c2fdc ("lib/libnetlink: update rtnl_talk to support malloc buff at run time")
Cc: Hangbin Liu <liuhangbin@gmail.com>
Cc: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agort_protos: drop old experimental gated names
Stephen Hemminger [Fri, 1 Jun 2018 19:44:14 +0000 (15:44 -0400)]
rt_protos: drop old experimental gated names

No longer need these petroglyph values.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoiproute: ip route get support for sport, dport and ipproto match
Roopa Prabhu [Wed, 30 May 2018 17:06:18 +0000 (10:06 -0700)]
iproute: ip route get support for sport, dport and ipproto match

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoip route: print RTA_CACHEINFO if it exists
David Ahern [Wed, 30 May 2018 15:30:09 +0000 (08:30 -0700)]
ip route: print RTA_CACHEINFO if it exists

RTA_CACHEINFO can be sent for non-cloned routes. If the attribute is
present print it. Allows route dumps to print expires times for example
which can exist on FIB entries.

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoMerge branch 'iproute2-master' into iproute2-next
David Ahern [Fri, 1 Jun 2018 15:17:23 +0000 (08:17 -0700)]
Merge branch 'iproute2-master' into iproute2-next

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoipaddress: Add support for address metric
David Ahern [Sun, 27 May 2018 15:10:00 +0000 (08:10 -0700)]
ipaddress: Add support for address metric

Add support for IFA_RT_PRIORITY using the same keywords as iproute for
RTA_PRIORITY.

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoUpdate kernel headers
David Ahern [Wed, 30 May 2018 15:06:19 +0000 (08:06 -0700)]
Update kernel headers

Update kernel headers to commit
ae40832e53c3 ("bpfilter: fix a build err")

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoip: defer lookup interface index
Stephen Hemminger [Fri, 25 May 2018 14:48:40 +0000 (07:48 -0700)]
ip: defer lookup interface index

The ip command would always lookup the network device index
even when not necessary. This slows down operations like creating
lots of VLAN's.

David reported the original issue, this is an alternative patch
that solves it in a slightly more general method.

Using iproute2 to create a bridge and add 4094 vlans to it can take from
2 to 3 *minutes*. The reason is the extraneous call to ll_name_to_index.
ll_name_to_index results in an ioctl(SIOCGIFINDEX) call which in turn
invokes dev_load. If the index does not exist, which it won't when
creating a new link, dev_load calls modprobe twice -- once for
netdev-NAME and again for NAME. This is unnecessary overhead for each
link create.

When ip link is invoked for a new device, there is no reason to
call ll_name_to_index for the new device. With this patch, creating
a bridge and adding 4094 vlans takes less than 3 *seconds*.

old:
# time ip -batch ip-vlan.batch
real    3m13.727s
user    0m0.076s
sys     0m1.959s

new:
# time ip -batch ip-vlan.batch
real    0m3.222s
user    0m0.044s
sys     0m1.777s

Reported-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoip route: Print expires as signed int
David Ahern [Wed, 23 May 2018 18:50:01 +0000 (11:50 -0700)]
ip route: Print expires as signed int

rta_expires is a signed int; print it as one.

Fixes: 663c3cb23103f ("iproute: implement JSON and color output")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoAllow to configure /var/run/netns directory
Pavel Maltsev [Fri, 18 May 2018 22:44:00 +0000 (15:44 -0700)]
Allow to configure /var/run/netns directory

Currently NETNS_RUN_DIR is hardcoded and refers to /var/run/netns.
However, some systems (e.g. Android) doesn't have /var
which results in error attempts to create network namespaces on these
systems.  This change makes NETNS_RUN_DIR configurable at build time
by allowing to pass environment variable to make command.
Also, this change makes /etc/netns directory configurable through
NETNS_ETC_DIR environment variable.

For example: ./configure && NETNS_RUN_DIR=/mnt/vendor/netns make

Tested: verified that iproute2 with configuration mentioned above
creates namespaces in /mnt/vendor/netns

Signed-off-by: Pavel Maltsev <pavelm@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agodevlink: introduce support for showing port flavours
Jiri Pirko [Sun, 20 May 2018 08:15:38 +0000 (10:15 +0200)]
devlink: introduce support for showing port flavours

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoUpdate kernel headers
David Ahern [Wed, 23 May 2018 19:58:34 +0000 (12:58 -0700)]
Update kernel headers

Update kernel headers to commit
e89e59c08d1b ("Merge branch 'net-sfp-small-improvements'")

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoMerge branch 'rdma-resource-tracking' into iproute2-next
David Ahern [Fri, 18 May 2018 16:21:42 +0000 (09:21 -0700)]
Merge branch 'rdma-resource-tracking' into iproute2-next

Steve Wise  says:

====================

This series enhances the iproute2 rdma tool to include displaying
driver-specific resource attributes.  It is the user-space part of the
kernel driver resource tracking series that has been accepted for merging
into linux-4.18 [1]

If there are no additional review comments, it can now be merged, I think.

Changes since v2:
- resync rdma_netlink.h to fix uapi break

Changes since v1:
- commit log editorial fixes
- cite kernel commits that updated rdma_netlink.h in the
  iproute2 commit syncing this header
- reorder stack definitions ala "reverse christmas tree"
- correctly handle unknown driver attributes when printing

Changes since v0/rfc:
- changed "provider" to "driver" based on kernel side changes
- updated man pages
- removed "RFC" tag

Thanks,

Steve.

[1] https://www.spinics.net/lists/linux-rdma/msg64199.html

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agordma: update man pages
Steve Wise [Tue, 15 May 2018 20:41:15 +0000 (13:41 -0700)]
rdma: update man pages

Update the man pages for the resource attributes as well
as the driver-specific attributes.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agordma: print driver resource attributes
Steve Wise [Tue, 15 May 2018 20:41:09 +0000 (13:41 -0700)]
rdma: print driver resource attributes

This enhancement allows printing rdma device-specific state, if provided
by the kernel. This is done in a generic manner, so rdma tool doesn't
need to know about the details of every type of rdma device.

Driver attributes for a rdma resource are in the form of <key,
[print_type], value> tuples, where the key is a string and the value can
be any supported driver attribute. The print_type attribute, if present,
provides a print format to use vs the standard print format for the type.
For example, the default print type for a PROVIDER_S32 value is "%d ",
but "0x%x " if the print_type of PRINT_TYPE_HEX is included inthe tuple.

Driver resources are only printed when the -dd flag is present.
If -p is present, then the output is formatted to not exceed 80 columns,
otherwise it is printed as a single row to be grep/awk friendly.

Example output:

# rdma resource show qp lqpn 1028 -dd -p
link cxgb4_0/- lqpn 1028 rqpn 0 type RC state RTS rq-psn 0 sq-psn 0 path-mig-state MIGRATED pid 0 comm [nvme_rdma]
    sqid 1028 flushed 0 memsize 123968 cidx 85 pidx 85 wq_pidx 106 flush_cidx 85 in_use 0
    size 386 flags 0x0 rqid 1029 memsize 16768 cidx 43 pidx 41 wq_pidx 171 msn 44 rqt_hwaddr 0x2a8a5d00
    rqt_size 256 in_use 128 size 130

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agordma: update rdma_netlink.h to get new driver attributes
Steve Wise [Tue, 15 May 2018 20:41:02 +0000 (13:41 -0700)]
rdma: update rdma_netlink.h to get new driver attributes

Pull in the rdma_netlink.h changes from kernel
commits:

25a0ad85156a ("RDMA/nldev: Add explicit pad attribute")
da5c85078215 ("RDMA/nldev: add driver-specific resource tracking)"
0d52d803767e ("RDMA/uapi: Fix uapi breakage")

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agotipc: fixed node and name table listings
Jon Maloy [Thu, 17 May 2018 14:02:42 +0000 (16:02 +0200)]
tipc: fixed node and name table listings

We make it easier for users to correlate between 128-bit node
identities and 32-bit node hash number by extending the 'node list'
command to also show the hash number.

We also improve the 'nametable show' command to show the node identity
instead of the node hash number. Since the former potentially is much
longer than the latter, we make room for it by eliminating the (to the
user) irrelevant publication key. We also reorder some of the columns so
that the node id comes last, since this looks nicer and is more logical.

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agotc: add missing space symbol in ife output
Roman Mashak [Thu, 17 May 2018 13:28:02 +0000 (09:28 -0400)]
tc: add missing space symbol in ife output

In order to make TDC tests match the output patterns, the missing space
character must be added in the mode output string.

Fixes: 8744c5d3388e3 ("tc: jsonify ife action")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agotc: flower: add support for verbose logging
Marcelo Ricardo Leitner [Sun, 13 May 2018 20:44:28 +0000 (17:44 -0300)]
tc: flower: add support for verbose logging

Currently there is no way to log offloading errors if the rule is not
explicitly marked as skip_sw, making it hard for other applications such
as Open vSwitch to log why a given could not be offloaded.

This patch adds support for signaling the kernel that more verbose
logging is wanted, which now will include such messages.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoUpdate kernel headers
David Ahern [Fri, 18 May 2018 16:05:07 +0000 (09:05 -0700)]
Update kernel headers

Update kernel headers to commit
64a2658b58ab ("net: mscc: Add SPDX identifier")

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agotc: allow 0% for percent options
Stephen Hemminger [Thu, 17 May 2018 23:20:50 +0000 (16:20 -0700)]
tc: allow 0% for percent options

Allowing 0% is sometimes useful for example in netem loss and drop
or perhaps dropping all traffic in a HTB bin.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199745
Reported-by: stuartmarsden@gmail.com
Fixes: 927e3cfb52b5 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agotc-netem: fix limit description in man page
Marcelo Ricardo Leitner [Wed, 16 May 2018 00:49:55 +0000 (21:49 -0300)]
tc-netem: fix limit description in man page

As the kernel code says, limit is actually the amount of packets it can
hold queued at a time, as per:

static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
                         struct sk_buff **to_free)
{
...
        if (unlikely(sch->q.qlen >= sch->limit))
                return qdisc_drop_all(skb, sch, to_free);

So lets fix the description of the field in the man page.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agotc-netem: fix limit description in man page
Marcelo Ricardo Leitner [Wed, 16 May 2018 00:49:55 +0000 (21:49 -0300)]
tc-netem: fix limit description in man page

As the kernel code says, limit is actually the amount of packets it can
hold queued at a time, as per:

static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
                         struct sk_buff **to_free)
{
...
        if (unlikely(sch->q.qlen >= sch->limit))
                return qdisc_drop_all(skb, sch, to_free);

So lets fix the description of the field in the man page.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoMerge branch 'iproute2-master' into iproute2-next
David Ahern [Wed, 16 May 2018 21:10:27 +0000 (14:10 -0700)]
Merge branch 'iproute2-master' into iproute2-next

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoip: do not drop capabilities if net_admin=i is set
Luca Boccassi [Fri, 11 May 2018 12:39:56 +0000 (13:39 +0100)]
ip: do not drop capabilities if net_admin=i is set

Users have reported a regression due to ip now dropping capabilities
unconditionally.
zerotier-one VPN and VirtualBox use ambient capabilities in their
binary and then fork out to ip to set routes and links, and this
does not work anymore.

As a workaround, do not drop caps if CAP_NET_ADMIN (the most common
capability used by ip) is set with the INHERITABLE flag.
Users that want ip vrf exec to work do not need to set INHERITABLE,
which will then only set when the calling program had privileges to
give itself the ambient capability.

Fixes: ba2fc55b99f8 ("Drop capabilities if not running ip exec vrf with libcap")
Signed-off-by: Luca Boccassi <bluca@debian.org>
6 years agoMerge branch 'iproute2-master' into iproute2-next
David Ahern [Thu, 10 May 2018 04:04:16 +0000 (21:04 -0700)]
Merge branch 'iproute2-master' into iproute2-next

 Conflicts:
rdma/include/uapi/rdma/rdma_netlink.h

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agotipc: Add support to set and get MTU for UDP bearer
GhantaKrishnamurthy MohanKrishna [Tue, 8 May 2018 11:55:28 +0000 (13:55 +0200)]
tipc: Add support to set and get MTU for UDP bearer

In this commit we introduce the ability to set and get
MTU for UDP media and bearer.

For set and get properties such as tolerance, window and priority,
we already do:

    $ tipc media set PPROPERTY media MEDIA
    $ tipc media get PPROPERTY media MEDIA

    $ tipc bearer set OPTION media MEDIA ARGS
    $ tipc bearer get [OPTION] media MEDIA ARGS

The same has been extended for MTU, with an exception to support
only media type UDP.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoUpdate kernel headers
David Ahern [Thu, 10 May 2018 03:52:52 +0000 (20:52 -0700)]
Update kernel headers

Update kernel headers to commit 53a7bdfb2a27
("dt-bindings: dsa: Remove unnecessary #address/#size-cells")

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoss: remove non-functional slabinfo
Stephen Hemminger [Wed, 9 May 2018 20:57:08 +0000 (13:57 -0700)]
ss: remove non-functional slabinfo

Ss was using slabinfo to try and intuit TCP statistics.
The slabinfo changed several times since 2.4 and all these statistics
are broken by renames and slab merging. Plus slabinfo does not exist
at all if kernel is compiled with SLUB option.

Rather than trying to fix kernel, just trim away the no longer
valid statistics.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agordma: add ib header files
Stephen Hemminger [Wed, 9 May 2018 15:14:55 +0000 (08:14 -0700)]
rdma: add ib header files

The iproute2 header files must be complete to allow builds on
other places where some of the headers are not present.

For example, iproute2 is built on Windows Services for Linux
as a test tool. With the partial addition of rdma it was broken.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agordma: align headers with upstream
Stephen Hemminger [Wed, 9 May 2018 15:12:13 +0000 (08:12 -0700)]
rdma: align headers with upstream

This makes rdma/include/uapi/rdma headers align with those produced
by doing make headers_install from upstream (Linus) tree.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agobpf: don't offload perf array maps
Jakub Kicinski [Sat, 5 May 2018 00:37:51 +0000 (17:37 -0700)]
bpf: don't offload perf array maps

Perf arrays are handled specially by the kernel, don't request
offload even when used by an offloaded program.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoMerge branch 'iproute2-master' into iproute2-next
David Ahern [Sat, 5 May 2018 18:07:47 +0000 (11:07 -0700)]
Merge branch 'iproute2-master' into iproute2-next

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoiproute: Parse last nexthop in a multipath route
Ido Schimmel [Tue, 1 May 2018 13:16:35 +0000 (16:16 +0300)]
iproute: Parse last nexthop in a multipath route

Continue parsing a multipath payload as long as another nexthop can fit
in the payload.

# ip route add 192.0.2.0/24 nexthop dev dummy0 nexthop dev dummy1

Before:
# ip route show 192.0.2.0/24
192.0.2.0/24
        nexthop dev dummy0 weight 1

After:
# ip route show 192.0.2.0/24
192.0.2.0/24
        nexthop dev dummy0 weight 1
        nexthop dev dummy1 weight 1

Fixes: f48e14880a0e ("iproute: refactor multipath print")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoarpd: remove pthread dependency
Baruch Siach [Tue, 1 May 2018 12:43:08 +0000 (15:43 +0300)]
arpd: remove pthread dependency

Explicit link with pthread is not needed when linking dynamically. Even
static link with recent libdb does not pull in the code that uses
pthread. Finally, the configure check introduced in commit a25df4887d7
(configure: Check for Berkeley DB for arpd compilation) does not add
-lpthread to its link command.

This change allows arpd build with toolchains that do not provide
threads support.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoREADME: update libdb build dependency information
Baruch Siach [Tue, 1 May 2018 12:43:07 +0000 (15:43 +0300)]
README: update libdb build dependency information

Debian does not distribute libdb4.x-dev for quite some time now. Current
stable carries libdb5.3-dev. Update the wording accordingly.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agojson_print: Fix hidden 64-bit type promotion
Toke Høiland-Jørgensen [Wed, 25 Apr 2018 15:28:57 +0000 (17:28 +0200)]
json_print: Fix hidden 64-bit type promotion

print_uint() will silently promote its variable type to uint64_t, but there
is nothing that ensures that the format string specifier passed along with
it fits (and the function name suggest to pass "%u").

Fix this by changing print_uint() to use a native 'unsigned int' type, and
introduce a separate print_u64() function for printing 64-bit values. All
call sites that were actually printing 64-bit values using print_uint() are
converted to use print_u64() instead.

Since print_int() was already using native int types, just add a
print_s64() to match, but don't convert any call sites. For symmetry,
also add a print_luint() method (with no users).

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoingress: Don't break JSON output
Toke Høiland-Jørgensen [Wed, 25 Apr 2018 09:29:46 +0000 (11:29 +0200)]
ingress: Don't break JSON output

The dash printed by the ingress qdisc breaks JSON output, so only print it
in regular output mode.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agovxlan: add ttl auto in help message
Hangbin Liu [Tue, 24 Apr 2018 02:40:17 +0000 (10:40 +0800)]
vxlan: add ttl auto in help message

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agogre/gre6: allow clearing {,i,o}{key,seq,csum} flags
Sabrina Dubroca [Fri, 20 Apr 2018 08:32:00 +0000 (10:32 +0200)]
gre/gre6: allow clearing {,i,o}{key,seq,csum} flags

Currently, iproute allows setting those flags, but it's impossible to
clear them, since their current value is fetched from the kernel and
then we OR in the additional flags passed on the command line.

Add no* variants to allow clearing them.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoman: ip link: document GRE tunnels
Sabrina Dubroca [Fri, 20 Apr 2018 08:31:59 +0000 (10:31 +0200)]
man: ip link: document GRE tunnels

GRE tunnels are currently only documented together with IPIP and SIT
tunnels, but they actually have very different configuration
options. Let's separate them.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoMerge branch 'master' into iproute2-next
David Ahern [Tue, 24 Apr 2018 02:42:21 +0000 (19:42 -0700)]
Merge branch 'master' into iproute2-next

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoiplink_geneve: correct size of message to avoid spurious errors
Jakub Kicinski [Wed, 18 Apr 2018 18:06:07 +0000 (11:06 -0700)]
iplink_geneve: correct size of message to avoid spurious errors

Commit 6c4b672738ac ("iplink_geneve: Get rid of inet_get_addr()")
inadvertently changed the parameter to addattr_l() resulting in:

addattr_l ERROR: message exceeded bound of 4

when remote is specified.

Fixes: 6c4b672738ac ("iplink_geneve: Get rid of inet_get_addr()")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
6 years agobpf: fix warnings on gcc-8 about string truncation
Stephen Hemminger [Fri, 20 Apr 2018 17:38:00 +0000 (10:38 -0700)]
bpf: fix warnings on gcc-8 about string truncation

In theory, the path for BPF could exceed the 4K PATH_MAX.
In practice, not really possible. But shut up gcc.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agotc: return on invalid smac or dmac in ife action
Roman Mashak [Fri, 20 Apr 2018 13:52:18 +0000 (09:52 -0400)]
tc: return on invalid smac or dmac in ife action

Return on invalid smac/dmac and use invarg consistently for invalid
arguments report.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
6 years agoflower: use 16 bit format where possible
Stephen Hemminger [Fri, 20 Apr 2018 17:04:14 +0000 (10:04 -0700)]
flower: use 16 bit format where possible

Should use print_hu not print_uint for 16 bit value.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoipneigh: fix missing format specifier
Stephen Hemminger [Fri, 20 Apr 2018 16:29:13 +0000 (09:29 -0700)]
ipneigh: fix missing format specifier

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agovxlan: fix ttl inherit behavior
Hangbin Liu [Wed, 18 Apr 2018 05:05:48 +0000 (13:05 +0800)]
vxlan: fix ttl inherit behavior

Like kernel net-next commit 72f6d71e491e6 ("vxlan: add ttl inherit support"),
vxlan ttl inherit should means inherit the inner protocol's ttl value.

But currently when we add vxlan with "ttl inherit", we only set ttl 0,
which is actually use whatever default value instead of inherit the inner
protocol's ttl value.

To make a difference with ttl inherit and ttl == 0, we add an attribute
IFLA_VXLAN_TTL_INHERIT when "ttl inherit" specified. And use "ttl auto"
to means "use whatever default value", the same behavior with ttl == 0.

Reported-by: Jianlin Shi <jishi@redhat.com>
Suggested-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoUpdate kernel headers
David Ahern [Thu, 19 Apr 2018 18:10:27 +0000 (11:10 -0700)]
Update kernel headers

Update kernel headers to commit 292eba02dbb4
("net-next/hinic: add arm64 support")

Signed-off-by: David Ahern <dsahern@gmail.com>
6 years agoutils: Do not reset family for default, any, all addresses
David Ahern [Fri, 13 Apr 2018 16:36:33 +0000 (09:36 -0700)]
utils: Do not reset family for default, any, all addresses

Thomas reported a change in behavior with respect to autodectecting
address families. Specifically, 'ip ro add default via fe80::1'
syntax was failing to treat fe80::1 as an IPv6 address as it did in
prior releases. The root causes appears to be a change in family when
the default keyword is parsed.

'default', 'any' and 'all' are relevant outside of AF_INET. Leave the
family arg as is for these when setting addr.

Fixes: 93fa12418dc6 ("utils: Always specify family and ->bytelen in get_prefix_1()")
Reported-by: Thomas Deutschmann <whissi@gentoo.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Serhey Popovych <serhe.popovych@gmail.com>