git.proxmox.com Git - mirror_iproute2.git/log
mirror_iproute2.git
3 years agovdpa: add .gitignore master
Stephen Hemminger [Wed, 24 Feb 2021 07:12:14 +0000 (23:12 -0800)]
vdpa: add .gitignore

Ignore the resulting binary vdpa.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoUpdate kernel headers from 5.12-pre rc
Stephen Hemminger [Wed, 24 Feb 2021 07:10:51 +0000 (23:10 -0800)]
Update kernel headers from 5.12-pre rc

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoMerge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next
Stephen Hemminger [Wed, 24 Feb 2021 07:03:42 +0000 (23:03 -0800)]
Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next

3 years agov5.11.0
Stephen Hemminger [Tue, 23 Feb 2021 17:34:11 +0000 (09:34 -0800)]
v5.11.0

3 years agolib/fs: Fix single return points for get_cgroup2_*
Andrea Claudi [Mon, 22 Feb 2021 18:14:32 +0000 (19:14 +0100)]
lib/fs: Fix single return points for get_cgroup2_*

Functions get_cgroup2_id() and get_cgroup2_path() may call close() with
a negative argument.
Avoid that by making the calls conditional on the file descriptors being
valid.

get_cgroup2_path() may also return NULL while leaking a file descriptor.
Ensure this does not happen by using a single return point.
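
The pattern described above can be sketched in isolation. This is a minimal illustration, not the code from lib/fs.c; the helper name read_small_file() and its behavior are assumptions made for the example.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read up to len-1 bytes of a file into a new string.
 * Every exit goes through "out", so the descriptor cannot leak. */
char *read_small_file(const char *path, size_t len)
{
	char *buf = NULL;
	ssize_t n;
	int fd = -1;

	if (len == 0)
		goto out;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		goto out;		/* nothing to close yet */

	buf = malloc(len);
	if (!buf)
		goto out;

	n = read(fd, buf, len - 1);
	if (n < 0) {
		free(buf);
		buf = NULL;
		goto out;
	}
	buf[n] = '\0';

out:
	if (fd >= 0)			/* never call close() on a negative fd */
		close(fd);
	return buf;			/* NULL on failure, fd already released */
}
```

The caller owns the returned string and releases it with free().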

Fixes: d5e6ee0dac64 ("ss: introduce cgroup2 cache and helper functions")
Fixes: 8f1cd119b377 ("lib: fix checking of returned file handle size for cgroup")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agolib/fs: avoid double call to mkdir on make_path()
Andrea Claudi [Mon, 22 Feb 2021 18:14:31 +0000 (19:14 +0100)]
lib/fs: avoid double call to mkdir on make_path()

The make_path() function calls mkdir() twice in a row: the first time it
stores the mkdir() return code, and then it calls it again to check errno.

This seems unnecessary, as we can use the return code from the first
call and check for errno if not 0.
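
A rough sketch of the single-call approach follows. It is not the code from lib/fs.c; make_path_sketch() is a hypothetical name, and the loop is an assumption about how such a helper might walk the path.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical make_path()-style helper: create every component of path,
 * calling mkdir() exactly once per component. */
int make_path_sketch(const char *path, mode_t mode)
{
	size_t len = strlen(path);
	char *dir, *delim;
	int rc = -1;

	dir = malloc(len + 1);
	if (!dir)
		return -1;
	memcpy(dir, path, len + 1);

	delim = dir;
	if (*delim == '/')		/* the root directory always exists */
		delim++;

	while (*delim) {
		delim = strchr(delim, '/');
		if (delim)
			*delim = '\0';	/* cut the string at this component */

		/* One call: use its return code, check errno only on failure. */
		if (mkdir(dir, mode) != 0 && errno != EEXIST)
			goto out;

		if (!delim)
			break;
		*delim++ = '/';		/* restore the separator and move on */
	}
	rc = 0;
out:
	free(dir);
	return rc;
}
```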

Fixes: ac3415f5c1b1d ("lib/fs: Fix and simplify make_path()")
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agolib/bpf: Fix and simplify bpf_mnt_check_target()
Andrea Claudi [Mon, 22 Feb 2021 17:43:10 +0000 (18:43 +0100)]
lib/bpf: Fix and simplify bpf_mnt_check_target()

As stated in commit ac3415f5c1b1 ("lib/fs: Fix and simplify make_path()"),
calling stat() before mkdir() is racy, because the entry might change in
between.

As the call to stat() seems to only check for target existence, we can
simply call mkdir() unconditionally and catch all errors but EEXIST.
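
As a minimal sketch of that idea (ensure_dir() is a hypothetical helper, not a function from lib/bpf), the directory is created unconditionally and only EEXIST is tolerated:

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: create dir without a preceding stat(), treating
 * "already exists" as success. Returns 0 or a negative errno value. */
int ensure_dir(const char *path, mode_t mode)
{
	if (mkdir(path, mode) == 0)
		return 0;
	return errno == EEXIST ? 0 : -errno;
}
```

Note that mkdir() also reports EEXIST when the path exists but is not a directory, so a caller that needs a directory specifically would have to verify that afterwards.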

Fixes: 95ae9a4870e7 ("bpf: fix mnt path when from env")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
3 years agolib/namespace: fix ip -all netns return code
Andrea Claudi [Mon, 22 Feb 2021 11:40:36 +0000 (12:40 +0100)]
lib/namespace: fix ip -all netns return code

When ip -all netns {del,exec} are called and no netns is present, ip
exits with status 0. However, this does not happen if no netns has been
created since boot time: in that case, the NETNS_RUN_DIR is not
present and netns_foreach() exits with code 1.

$ ls /var/run/netns
ls: cannot access '/var/run/netns': No such file or directory
$ ip -all netns exec ip link show
$ echo $?
1
$ ip -all netns del
$ echo $?
1
$ ip netns add test
$ ip netns del test
$ ip -all netns del
$ echo $?
0
$ ls -a /var/run/netns
.  ..

This leaves us in the unpleasant situation where the same command, when
no netns is present, does the same thing (in this case, nothing), but
exits with two different statuses.

Fix this by treating ENOENT differently from other errors, similarly
to what we already do in ipnetns.c netns_identify_pid().
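
The ENOENT special case can be sketched as follows. This is an illustration only; foreach_entry() and count_entry() are hypothetical names and do not reproduce netns_foreach().

```c
#include <dirent.h>
#include <errno.h>
#include <string.h>

/* Hypothetical iterator: call fn(name, arg) for each entry in dir. */
int foreach_entry(const char *dir,
		  int (*fn)(const char *name, void *arg), void *arg)
{
	struct dirent *ent;
	DIR *d = opendir(dir);

	if (!d) {
		/* A missing directory just means "no entries": not an error. */
		if (errno == ENOENT)
			return 0;
		return -1;
	}

	while ((ent = readdir(d)) != NULL) {
		if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
			continue;
		if (fn(ent->d_name, arg))
			break;
	}
	closedir(d);
	return 0;
}

/* Example callback: count the entries that were visited. */
int count_entry(const char *name, void *arg)
{
	(void)name;
	(*(int *)arg)++;
	return 0;
}
```

With this shape, an absent directory and an empty directory both yield a zero return, while other opendir() failures are still reported.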

Fixes: e998e118ddc3 ("lib: Exec func on each netns")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoip: lwtunnel: seg6: bail out if table ids are invalid
Andrea Claudi [Mon, 22 Feb 2021 20:23:01 +0000 (21:23 +0100)]
ip: lwtunnel: seg6: bail out if table ids are invalid

When table and vrftable are used in SRv6, ip should bail out if table
ids are not valid, and return a proper error message to the user.

Achieve this by simply checking the rtnl_rttable_a2n() return value, as we
already do in the rest of iproute.

Fixes: 0486388a877a ("add support for table name in SRv6 End.DT* behaviors")
Fixes: 69629b4e43c4 ("seg6: add support for vrftable attribute in SRv6 End.DT4/DT6 behaviors")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agotc: m_gate: use SPRINT_BUF when needed
Andrea Claudi [Mon, 22 Feb 2021 20:22:47 +0000 (21:22 +0100)]
tc: m_gate: use SPRINT_BUF when needed

sprint_time64() uses SPRINT_BSIZE-1 as a constant buffer length in its
implementation, however m_gate uses shorter buffers when calling it.

Fix this by using the SPRINT_BUF macro to declare the buffer, thus getting a
SPRINT_BSIZE-long buffer.
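
To illustrate why the buffer size matters, here is a small sketch. FMT_BSIZE, FMT_BUF and fmt_nsec() are hypothetical stand-ins for SPRINT_BSIZE, SPRINT_BUF and sprint_time64(); the value 64 and the output format are assumptions of this example.

```c
#include <stdio.h>

#define FMT_BSIZE 64			/* hypothetical stand-in for SPRINT_BSIZE */
#define FMT_BUF(x) char x[FMT_BSIZE]	/* hypothetical stand-in for SPRINT_BUF */

/* The formatter hard-codes the buffer length, so every caller must pass
 * a buffer of at least FMT_BSIZE bytes. */
char *fmt_nsec(long long nsec, char *buf)
{
	snprintf(buf, FMT_BSIZE - 1, "%.3fus", (double)nsec / 1000.0);
	return buf;
}
```

A caller that writes FMT_BUF(b); and then calls fmt_nsec(value, b) always satisfies the size assumption, whereas a hand-declared shorter array could be overrun.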

Fixes: 07d5ee70b5b3 ("iproute2-next:tc:action: add a gate control action")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoman8/bridge.8: be explicit that "flood" is an egress setting
Vladimir Oltean [Thu, 11 Feb 2021 10:45:02 +0000 (12:45 +0200)]
man8/bridge.8: be explicit that "flood" is an egress setting

Talking to various people, it became apparent that there is a certain
ambiguity in the description of these flags. They refer to egress
flooding, which should perhaps be stated more clearly.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoman8/bridge.8: explain self vs master for "bridge fdb add"
Vladimir Oltean [Thu, 11 Feb 2021 10:45:01 +0000 (12:45 +0200)]
man8/bridge.8: explain self vs master for "bridge fdb add"

The "usually hardware" and "usually software" distinctions make no
sense. Try to clarify what these do based on the actual kernel behavior.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoman8/bridge.8: fix which one of self/master is default for "bridge fdb"
Vladimir Oltean [Thu, 11 Feb 2021 10:45:00 +0000 (12:45 +0200)]
man8/bridge.8: fix which one of self/master is default for "bridge fdb"

The bridge program does:

fdb_modify:
/* Assume self */
if (!(req.ndm.ndm_flags&(NTF_SELF|NTF_MASTER)))
req.ndm.ndm_flags |= NTF_SELF;

which is clearly against the documented behavior. The only thing we can
do, sadly, is update the documentation.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoman8/bridge.8: explain what a local FDB entry is
Vladimir Oltean [Thu, 11 Feb 2021 10:44:59 +0000 (12:44 +0200)]
man8/bridge.8: explain what a local FDB entry is

Explaining the "local" flag by saying that it is "a local permanent fdb
entry" is not very helpful. Be more specific.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoman8/bridge.8: document that "local" is default for "bridge fdb add"
Vladimir Oltean [Thu, 11 Feb 2021 10:44:58 +0000 (12:44 +0200)]
man8/bridge.8: document that "local" is default for "bridge fdb add"

The bridge does this:

fdb_modify:
/* Assume permanent */
if (!(req.ndm.ndm_state&(NUD_PERMANENT|NUD_REACHABLE)))
req.ndm.ndm_state |= NUD_PERMANENT;

So let's make the user aware of the fact that if they don't want local
entries, they need to specify some other flag like "static".
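
The defaulting rule quoted above can be exercised on its own. In this sketch the ST_* constants are hypothetical placeholders for the kernel's NUD_* values, and fdb_default_state() is a name made up for the example.

```c
#include <stdint.h>

/* Hypothetical flag values for illustration; the real NUD_* constants
 * come from the kernel's neighbour header. */
#define ST_REACHABLE 0x02
#define ST_PERMANENT 0x80

/* If the user selected neither state, fall back to "permanent". */
uint16_t fdb_default_state(uint16_t state)
{
	if (!(state & (ST_PERMANENT | ST_REACHABLE)))
		state |= ST_PERMANENT;
	return state;
}
```

An empty state therefore comes back as permanent, while a state that already carries one of the two flags is returned unchanged.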

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoman8/bridge.8: document the "permanent" flag for "bridge fdb add"
Vladimir Oltean [Thu, 11 Feb 2021 10:44:57 +0000 (12:44 +0200)]
man8/bridge.8: document the "permanent" flag for "bridge fdb add"

The bridge program parses "local" and "permanent" in just the same way,
so it makes sense to tell that to users:

fdb_modify:
} else if (matches(*argv, "local") == 0 ||
   matches(*argv, "permanent") == 0) {
req.ndm.ndm_state |= NUD_PERMANENT;

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agordma: Fix statistics bind/unbind argument handling
Ido Kalir [Sun, 14 Feb 2021 08:33:35 +0000 (10:33 +0200)]
rdma: Fix statistics bind/unbind argument handling

The dump isn't supported for the statistics bind/unbind commands
because they operate on specific QP counters. This is different
from query commands that can operate on many objects at the same
time.

Let's check the user input and ensure that arguments are valid.

Fixes: a6d0773ebecc ("rdma: Add stat manual mode support")
Signed-off-by: Ido Kalir <idok@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoss: Make leading ":" always optional for sport and dport
Thayne McCombs [Sun, 14 Feb 2021 08:09:13 +0000 (01:09 -0700)]
ss: Make leading ":" always optional for sport and dport

The sport and dport conditions in expressions were inconsistent on
whether there should be a ":" at the beginning of the port when only a
port was provided depending on the family. The link and netlink
families required a ":" to work. The vsock family required the ":"
to be absent. The inet and inet6 families work with or without a leading
":".

This makes the leading ":" optional in all cases, so if sport or dport
are used, then it works with a leading ":" or without one, as inet and
inet6 did.
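
A minimal sketch of the "optional leading colon" rule is shown below. parse_port() is a hypothetical helper that handles numeric ports only; it is not the parser used by ss, which also deals with service names and per-family formats.

```c
#include <stdlib.h>

/* Hypothetical parser: accept "443" or ":443"; return the port, or -1. */
int parse_port(const char *s)
{
	char *end;
	long v;

	if (*s == ':')		/* the leading colon is optional */
		s++;
	if (*s < '0' || *s > '9')
		return -1;
	v = strtol(s, &end, 10);
	if (*end != '\0' || v > 65535)
		return -1;
	return (int)v;
}
```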

Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoip route: Print "rt_offload_failed" indication
Amit Cohen [Tue, 9 Feb 2021 09:12:00 +0000 (11:12 +0200)]
ip route: Print "rt_offload_failed" indication

The kernel signals when offload fails using the 'RTM_F_OFFLOAD_FAILED'
flag. Print it to help users understand the offload state of the route.
The "rt_" prefix is used in order to distinguish it from the offload state
of nexthops, similar to "rt_offload" and "rt_trap".

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoUpdate kernel headers
David Ahern [Sun, 14 Feb 2021 00:48:05 +0000 (17:48 -0700)]
Update kernel headers

Update kernel headers to commit:
    c4762993129f ("Merge branch 'skbuff-introduce-skbuff_heads-bulking-and-reusing'")

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodevlink: add support for port params get/set
Oleksandr Mazur [Tue, 9 Feb 2021 10:31:51 +0000 (12:31 +0200)]
devlink: add support for port params get/set

Add implementation for the port parameters
getting/setting.
Add bash completion for port param.
Add man description for port param.

Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoMerge branch 'vdpa' into next
David Ahern [Thu, 11 Feb 2021 16:16:49 +0000 (09:16 -0700)]
Merge branch 'vdpa' into next

Parav Pandit  says:

====================

Linux vdpa interface allows vdpa device management functionality.
This includes adding, removing, querying vdpa devices.

vdpa interface also includes showing supported management devices
which support such operations.

This patchset includes kernel uapi headers and a vdpa tool.

examples:

$ vdpa mgmtdev show
vdpasim:
  supported_classes net

$ vdpa mgmtdev show -jp
{
    "show": {
        "vdpasim": {
            "supported_classes": [ "net" ]
        }
    }
}

Create a vdpa device of type networking named as "foo2" from
the management device vdpasim_net:

$ vdpa dev add mgmtdev vdpasim_net name foo2

Show the newly created vdpa device by its name:
$ vdpa dev show foo2
foo2: type network mgmtdev vdpasim_net vendor_id 0 max_vqs 2 max_vq_size 256

$ vdpa dev show foo2 -jp
{
    "dev": {
        "foo2": {
            "type": "network",
            "mgmtdev": "vdpasim_net",
            "vendor_id": 0,
            "max_vqs": 2,
            "max_vq_size": 256
        }
    }
}

Delete the vdpa device after its use:
$ vdpa dev del foo2

An example of PCI PF, VF and SF management device:
pci/0000:03.00:0
  supported_classes
    net
pci/0000:03.00:4
  supported_classes
    net
auxiliary/mlx5_core.sf.8
  supported_classes
    net

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agovdpa: Add vdpa tool
Parav Pandit [Wed, 10 Feb 2021 18:34:45 +0000 (20:34 +0200)]
vdpa: Add vdpa tool

The vdpa tool is added to create, delete and query vdpa devices.

Examples:

Show the vdpa management devices that support creating and deleting vdpa devices:

$ vdpa mgmtdev show
vdpasim:
  supported_classes net

$ vdpa mgmtdev show -jp
{
    "show": {
        "vdpasim": {
            "supported_classes": [ "net" ]
        }
    }
}

Create a vdpa device of type networking named as "foo2" from
the management device vdpasim_net:

$ vdpa dev add mgmtdev vdpasim_net name foo2

Show the newly created vdpa device by its name:
$ vdpa dev show foo2
foo2: type network mgmtdev vdpasim_net vendor_id 0 max_vqs 2 max_vq_size 256

$ vdpa dev show foo2 -jp
{
    "dev": {
        "foo2": {
            "type": "network",
            "mgmtdev": "vdpasim_net",
            "vendor_id": 0,
            "max_vqs": 2,
            "max_vq_size": 256
        }
    }
}

Delete the vdpa device after its use:
$ vdpa dev del foo2

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoutils: Add helper to map string to unsigned int
Parav Pandit [Wed, 10 Feb 2021 18:34:44 +0000 (20:34 +0200)]
utils: Add helper to map string to unsigned int

A subsequent patch needs to map a string to an unsigned int.
Hence, add an API to map a string to an unsigned int.
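
Such a lookup might look like the sketch below. The struct name_num type and name_to_num() function are hypothetical names chosen for illustration, not the API added by this commit.

```c
#include <string.h>

/* Hypothetical table entry: a name and the number it maps to. */
struct name_num {
	const char *name;
	unsigned int num;
};

/* Look up name in a table terminated by an entry with a NULL name.
 * Returns 0 and stores the number in *out, or -1 if name is unknown. */
int name_to_num(const struct name_num *map, const char *name,
		unsigned int *out)
{
	for (; map->name; map++) {
		if (strcmp(map->name, name) == 0) {
			*out = map->num;
			return 0;
		}
	}
	return -1;
}
```

Returning the status separately from the value keeps every unsigned int, including 0, available as a valid mapping.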

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoutils: Add generic socket helpers
Parav Pandit [Wed, 10 Feb 2021 18:34:43 +0000 (20:34 +0200)]
utils: Add generic socket helpers

A subsequent patch needs to
(a) query and use socket family
(b) send/receive messages using this family

Hence add helper routines to open, close, query family and to perform
send/receive operations.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoutils: Add helper routines for indent handling
Parav Pandit [Wed, 10 Feb 2021 18:34:42 +0000 (20:34 +0200)]
utils: Add helper routines for indent handling

A subsequent patch needs to use 2-character indentation for nested objects.
Hence introduce generic helpers to allocate, deallocate, increment,
decrement and print an indent block.
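
One possible shape for such helpers is sketched here. All names (struct indent, indent_alloc() and so on) and the 32-character cap are assumptions of this example rather than the interface added to utils.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define INDENT_STEP 2		/* two characters per nesting level */
#define INDENT_MAX 32		/* hypothetical cap for this sketch */

struct indent {
	int level;			/* current width in characters */
	char str[INDENT_MAX + 1];	/* that many spaces, NUL-terminated */
};

struct indent *indent_alloc(void)
{
	return calloc(1, sizeof(struct indent));
}

void indent_free(struct indent *in)
{
	free(in);
}

void indent_inc(struct indent *in)
{
	if (in->level + INDENT_STEP > INDENT_MAX)
		return;			/* silently refuse to grow past the cap */
	in->level += INDENT_STEP;
	memset(in->str, ' ', in->level);
	in->str[in->level] = '\0';
}

void indent_dec(struct indent *in)
{
	if (in->level < INDENT_STEP)
		return;			/* already at the outermost level */
	in->level -= INDENT_STEP;
	in->str[in->level] = '\0';
}

void indent_print(const struct indent *in, FILE *fp)
{
	if (in->level)
		fputs(in->str, fp);
}
```

A printer would call indent_inc() when entering a nested object, indent_print() at the start of each line, and indent_dec() when leaving the object.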

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoAdd kernel headers
Parav Pandit [Wed, 10 Feb 2021 18:34:41 +0000 (20:34 +0200)]
Add kernel headers

Add kernel headers to commit from kernel tree [1].
  6acba4951632 ("vdpa_sim_net: Add support for user supported devices")

[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agotc: flower: Add support for ct_state reply flag
Paul Blakey [Tue, 2 Feb 2021 12:24:42 +0000 (14:24 +0200)]
tc: flower: Add support for ct_state reply flag

Matches on conntrack rpl ct_state.

Example:
$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
  ct_state +trk+est+rpl \
  action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_1 ingress prio 1 chain 1 proto ip flower \
  ct_state +trk+est-rpl \
  action mirred egress redirect dev ens1f0_0

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agotc/htb: Hierarchical QoS hardware offload
Maxim Mikityanskiy [Thu, 4 Feb 2021 14:51:37 +0000 (16:51 +0200)]
tc/htb: Hierarchical QoS hardware offload

This commit adds support for configuring HTB in offload mode. HTB
offload eliminates the single qdisc lock in the datapath and offloads
the algorithm to the NIC. The new 'offload' parameter is added to
enable this mode:

    # tc qdisc replace dev eth0 root handle 1: htb offload

Classes are created as usual, but filters should be moved to clsact for
lock-free classification (filters attached to HTB itself are not
supported in the offload mode):

    # tc filter add dev eth0 egress protocol ip flower dst_port 80 \
        action skbedit priority 1:10

tc qdisc show and tc class show will indicate whether the offload is
enabled. Example output:

$ tc qdisc show dev eth1
qdisc htb 1: root offloaded r2q 10 default 0 direct_packets_stat 0 direct_qlen 1000 offload
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
$ tc class show dev eth1
class htb 1:101 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:1 root rate 100Gbit ceil 100Gbit burst 0b cburst 0b  offload
class htb 1:103 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:102 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:105 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:104 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:107 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:106 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:108 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
$ tc -j qdisc show dev eth1
[{"kind":"htb","handle":"1:","root":true,"offloaded":true,"options":{"r2q":10,"default":"0","direct_packets_stat":0,"direct_qlen":1000,"offload":null}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}}]

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoss: always prefer family as part of host condition to default family
Thayne McCombs [Tue, 2 Feb 2021 03:32:10 +0000 (20:32 -0700)]
ss: always prefer family as part of host condition to default family

ss accepts an address family both with the -f option and as part of a
host condition. However, if the family in the host condition is
different from the last -f option, then which family is actually
used depends on the order that different families are checked.

This changes parse_hostcond to check all family prefixes before parsing
the rest of the address, so that the host condition's family always has
a higher priority than the "preferred" family.
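
The ordering described here (check every family prefix first, fall back to the preferred family only when none matches) can be sketched as follows. hostcond_family() and its prefix table are hypothetical and cover only three families; they are not taken from parse_hostcond().

```c
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: if addr starts with a known "family:" prefix,
 * return that family and point *rest past the prefix; otherwise leave
 * the address untouched and return the fallback family. */
int hostcond_family(const char *addr, int fallback, const char **rest)
{
	static const struct {
		const char *prefix;
		int family;
	} tbl[] = {
		{ "inet6:", AF_INET6 },
		{ "inet:",  AF_INET  },
		{ "unix:",  AF_UNIX  },
	};
	size_t i;

	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		size_t n = strlen(tbl[i].prefix);

		if (strncmp(addr, tbl[i].prefix, n) == 0) {
			*rest = addr + n;
			return tbl[i].family;
		}
	}
	*rest = addr;
	return fallback;
}
```

Because the prefix scan runs before any address parsing, an explicit prefix wins regardless of which family was passed as the fallback.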

Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agouapi: pick up rpl.h fix
Stephen Hemminger [Wed, 3 Feb 2021 16:16:16 +0000 (08:16 -0800)]
uapi: pick up rpl.h fix

Upstream change to fix byte order issues.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoiproute: force rtm_dst_len to 32/128
Luca Boccassi [Sun, 24 Jan 2021 17:36:58 +0000 (17:36 +0000)]
iproute: force rtm_dst_len to 32/128

Since NETLINK_GET_STRICT_CHK was enabled, the kernel rejects commands
that pass a prefix length, eg:

 ip route get 1.0.0.0/1
  Error: ipv4: Invalid values in header for route get request.
 ip route get 0.0.0.0/0
  Error: ipv4: rtm_src_len and rtm_dst_len must be 32 for IPv4

Since there's no point in setting a rtm_dst_len that we know is going
to be rejected, just force it to the right value if it's passed on
the command line. Print a warning to stderr to notify users.
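
In outline, the forcing logic could look like the sketch below. route_get_dst_len() is a hypothetical helper and the warning text is invented for the example; neither is taken from the patch.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: return the destination prefix length to send in a
 * "route get" request, warning when the user asked for something else. */
int route_get_dst_len(int family, int requested)
{
	int full = (family == AF_INET6) ? 128 : 32;

	if (requested != full)
		fprintf(stderr,
			"Warning: /%d is not valid here, using /%d instead\n",
			requested, full);
	return full;
}
```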

Bug-Debian: https://bugs.debian.org/944730
Reported-By: Clément 'wxcafé' Hertling <wxcafe@wxcafe.net>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoss: Add clarification about host conditions with multiple families to man
Thayne McCombs [Tue, 2 Feb 2021 22:30:40 +0000 (14:30 -0800)]
ss: Add clarification about host conditions with multiple families to man

In creating documentation for expressions I ran into an interesting case
where if you use two different family types in the expression, such as
in `ss 'sport inet:ssh or src unix:/run/*'`, then you would only get the
results for one address family (in this case unix sockets).

The reason is that in parse_hostcond if the family is specified we
remove any previously added families from filter->families, and
preserve the "states" if any states are set. I tried changing this to
not reset the families, but ran into some issues with Invalid Argument
errors in inet_show_netlink, I think related to the state.

I can dig into that more if supporting this is useful, but I'm not sure
if these types of expressions would actually be useful in practice. Or
perhaps an error should be given if an expression contains conditions
with multiple families (besides inet and inet6)?

Anyway, for now, this patch just notes the limitation in the man page.

Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoAdd documentation of ss filter to man page
Thayne McCombs [Thu, 28 Jan 2021 08:10:18 +0000 (01:10 -0700)]
Add documentation of ss filter to man page

This adds some documentation of the syntax for the FILTER argument to
the ss command to the ss (8) man page.

Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoiplink: print warning for missing VF data
Edwin Peer [Tue, 26 Jan 2021 17:40:53 +0000 (09:40 -0800)]
iplink: print warning for missing VF data

The kernel might truncate VF info in IFLA_VFINFO_LIST. Compare the
expected number of VFs in IFLA_NUM_VF to how many were found in the
list and warn accordingly.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoss: do not emit warn while dumping MPTCP on old kernels
Paolo Abeni [Mon, 25 Jan 2021 16:02:07 +0000 (17:02 +0100)]
ss: do not emit warn while dumping MPTCP on old kernels

Prior to this commit, running 'ss' on a kernel older than v5.9
emits an error message:

RTNETLINK answers: Invalid argument

When asked to dump protocol number > 255 - that is: MPTCP - 'ss'
adds an INET_DIAG_REQ_PROTOCOL attribute, unsupported by the older
kernel.

Avoid the warning by ignoring filter issues when INET_DIAG_REQ_PROTOCOL
is used.

Additionally, older kernels end up invoking tcpdiag_send(), which
in turn will try to dump DCCP socks. Bail out early in that function,
as the kernel does not implement an MPTCPDIAG_GET request.

Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com>
Fixes: 9c3be2c0eee0 ("ss: mptcp: add msk diag interface support")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoman: tc-taprio.8: document the full offload feature
Vladimir Oltean [Fri, 22 Jan 2021 01:22:11 +0000 (03:22 +0200)]
man: tc-taprio.8: document the full offload feature

Since this feature's introduction in commit 9c66d1564676 ("taprio: Add
support for hardware offloading") from kernel v5.4, it never got
documented in the man pages. As a result, we see customer reports
of seemingly contradictory information: the community manpages claim
there is no support for full offload, nonetheless many silicon vendors
have already implemented it.

This patch documents the full offload feature (enabled by specifying
"flags 2" to the taprio qdisc) and gives one more example that tries to
illustrate some of the finer points related to the usage.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoiplink_bareudp: cleanup help message and man page
Guillaume Nault [Mon, 1 Feb 2021 17:44:07 +0000 (18:44 +0100)]
iplink_bareudp: cleanup help message and man page

 * Fix PROTO description in help message (mpls isn't a valid argument).

 * Remove SRCPORTMIN description from help message since it doesn't
   appear in the syntax string.

 * Use same keywords in help message and in man page.

 * Use the "ethertype" option name (.B ethertype) rather than the
   option value (.I ETHERTYPE) in the man page description of
   [no]multiproto.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoMerge branch 'devlink-port-mgmt' into next
David Ahern [Tue, 2 Feb 2021 02:45:49 +0000 (02:45 +0000)]
Merge branch 'devlink-port-mgmt' into next

Parav Pandit  says:

====================

This patchset implements devlink port add, delete and function state
management commands.

An example sequence for a PCI SF:

Set the device in switchdev mode:
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

View ports in switchdev mode:
$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

Add a subfunction port for PCI PF 0 with sfnumber 88:
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

Show a newly added port:
$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

Set the function state to active:
$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active

Show the port in JSON format:
$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Set the function state to inactive:
$ devlink port function set pci/0000:06:00.0/32768 state inactive

Delete the port after use:
$ devlink port del pci/0000:06:00.0/32768

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodevlink: Support set of port function state
Parav Pandit [Mon, 1 Feb 2021 21:35:51 +0000 (23:35 +0200)]
devlink: Support set of port function state

Support set operation of the devlink port function state.

Example of a PCI SF port function which supports the state:

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodevlink: Support get port function state
Parav Pandit [Mon, 1 Feb 2021 21:35:50 +0000 (23:35 +0200)]
devlink: Support get port function state

Print port function state and operational state whenever reported by
kernel.

Example of a PCI SF port function which supports the state:

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "inactive",
                "opstate": "detached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodevlink: Supporting add and delete of devlink port
Parav Pandit [Mon, 1 Feb 2021 21:35:49 +0000 (23:35 +0200)]
devlink: Supporting add and delete of devlink port

Enable user to add and delete the devlink port.

Examples for adding and deleting one SF port:

Examples of add, show and delete commands:
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

Add devlink port of flavour 'pcisf' for PF number 0 SF number 88:

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

Delete newly added devlink port
$ devlink port del pci/0000:06:00.0/32768

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodevlink: Introduce PCI SF port flavour and attribute
Parav Pandit [Mon, 1 Feb 2021 21:35:48 +0000 (23:35 +0200)]
devlink: Introduce PCI SF port flavour and attribute

Introduce PCI SF port flavour and port attributes such as PF
number and SF number.

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodevlink: Introduce and use string to number mapper
Parav Pandit [Mon, 1 Feb 2021 21:35:47 +0000 (23:35 +0200)]
devlink: Introduce and use string to number mapper

Instead of using static mapping in code, introduce a helper routine to
map a value to string.
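
A table-driven mapper of this kind can be sketched as follows. This is
only an illustration, not the code added by the patch: the struct, the
function names and the numeric values in the table are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a symbolic name paired with its number. */
struct name_num {
	const char *name;
	unsigned int num;
};

/* Illustrative table (placeholder numbers), terminated by a NULL name. */
static const struct name_num flavour_table[] = {
	{ "physical", 0 },
	{ "pcisf", 7 },
	{ NULL, 0 },
};

/* Store the number for name in *out and return 0, or return -1 if absent. */
static int map_name_to_num(const struct name_num *map, const char *name,
			   unsigned int *out)
{
	for (; map->name; map++) {
		if (strcmp(map->name, name) == 0) {
			*out = map->num;
			return 0;
		}
	}
	return -1;
}

/* Return the name for num, or NULL if the table has no such number. */
static const char *map_num_to_name(const struct name_num *map,
				   unsigned int num)
{
	for (; map->name; map++)
		if (map->num == num)
			return map->name;
	return NULL;
}
```

Keeping both directions on one table means a new name only has to be
added in a single place.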

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoUpdate kernel headers
David Ahern [Tue, 2 Feb 2021 01:58:51 +0000 (01:58 +0000)]
Update kernel headers

Update kernel headers to commit:
    14e8e0f60088 ("tcp: shrink inet_connection_sock icsk_mtup enabled and probe_size")

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoiplink_can: add Classical CAN frame LEN8_DLC support
Oliver Hartkopp [Mon, 25 Jan 2021 10:40:55 +0000 (11:40 +0100)]
iplink_can: add Classical CAN frame LEN8_DLC support

The len8_dlc element is filled by the CAN interface driver and used for CAN
frame creation by the CAN driver when the CAN_CTRLMODE_CC_LEN8_DLC flag is
supported by the driver and enabled via netlink configuration interface.

Add the command line support for cc-len8-dlc for Linux 5.11+

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agobond: support xmit_hash_policy=vlan+srcmac
Jarod Wilson [Fri, 15 Jan 2021 19:21:37 +0000 (14:21 -0500)]
bond: support xmit_hash_policy=vlan+srcmac

There's a new transmit hash policy being added to the bonding driver that
is a simple XOR of vlan ID and source MAC, xmit_hash_policy vlan+srcmac.
This trivial patch makes it configurable and queryable via iproute2.
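
As a rough illustration of what "a simple XOR of vlan ID and source MAC"
could look like, here is a minimal sketch. How the six MAC bytes are
folded into integers, and the modulo used to pick a slave, are
assumptions made for this example and are not taken from the bonding
driver; the function names are hypothetical.

```c
#include <stdint.h>

/*
 * Fold the source MAC into two 24-bit halves and XOR them with the
 * VLAN ID. The folding scheme is an assumption for illustration.
 */
static uint32_t vlan_srcmac_hash(uint16_t vlan_id, const uint8_t mac[6])
{
	uint32_t hi = ((uint32_t)mac[0] << 16) | ((uint32_t)mac[1] << 8) | mac[2];
	uint32_t lo = ((uint32_t)mac[3] << 16) | ((uint32_t)mac[4] << 8) | mac[5];

	return vlan_id ^ hi ^ lo;
}

/* Map a hash onto one of slave_count links (slave_count must be > 0). */
static unsigned int pick_slave(uint32_t hash, unsigned int slave_count)
{
	return hash % slave_count;
}
```

With a hash like this, all frames from one source MAC on one VLAN map to
the same link.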

$ sudo modprobe bonding mode=2 max_bonds=1 xmit_hash_policy=0

$ sudo ip link set bond0 type bond xmit_hash_policy vlan+srcmac

$ ip -d link show bond0
11: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ce:85:5e:24:ce:90 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    bond mode balance-xor miimon 0 updelay 0 downdelay 0 peer_notify_delay 0 use_carrier 1 arp_interval 0 arp_validate none arp_all_targets any
primary_reselect always fail_over_mac none xmit_hash_policy vlan+srcmac resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1
packets_per_slave 1 lacp_rate slow ad_select stable tlb_dynamic_lb 1 addrgenmode eui64 numtxqueues 16 numrxqueues 16 gso_max_size 65536 gso_max_segs
65535

$ grep Hash /proc/net/bonding/bond0
Transmit Hash Policy: vlan+srcmac (5)

$ sudo ip link add test type bond help
Usage: ... bond [ mode BONDMODE ] [ active_slave SLAVE_DEV ]
                [ clear_active_slave ] [ miimon MIIMON ]
                [ updelay UPDELAY ] [ downdelay DOWNDELAY ]
                [ peer_notify_delay DELAY ]
                [ use_carrier USE_CARRIER ]
                [ arp_interval ARP_INTERVAL ]
                [ arp_validate ARP_VALIDATE ]
                [ arp_all_targets ARP_ALL_TARGETS ]
                [ arp_ip_target [ ARP_IP_TARGET, ... ] ]
                [ primary SLAVE_DEV ]
                [ primary_reselect PRIMARY_RESELECT ]
                [ fail_over_mac FAIL_OVER_MAC ]
                [ xmit_hash_policy XMIT_HASH_POLICY ]
                [ resend_igmp RESEND_IGMP ]
                [ num_grat_arp|num_unsol_na NUM_GRAT_ARP|NUM_UNSOL_NA ]
                [ all_slaves_active ALL_SLAVES_ACTIVE ]
                [ min_links MIN_LINKS ]
                [ lp_interval LP_INTERVAL ]
                [ packets_per_slave PACKETS_PER_SLAVE ]
                [ tlb_dynamic_lb TLB_DYNAMIC_LB ]
                [ lacp_rate LACP_RATE ]
                [ ad_select AD_SELECT ]
                [ ad_user_port_key PORTKEY ]
                [ ad_actor_sys_prio SYSPRIO ]
                [ ad_actor_system LLADDR ]

BONDMODE := balance-rr|active-backup|balance-xor|broadcast|802.3ad|balance-tlb|balance-alb
ARP_VALIDATE := none|active|backup|all
ARP_ALL_TARGETS := any|all
PRIMARY_RESELECT := always|better|failure
FAIL_OVER_MAC := none|active|follow
XMIT_HASH_POLICY := layer2|layer2+3|layer3+4|encap2+3|encap3+4|vlan+srcmac
LACP_RATE := slow|fast
AD_SELECT := stable|bandwidth|count

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agotc: flower: add tc conntrack inv ct_state support
wenxu [Wed, 20 Jan 2021 02:52:12 +0000 (10:52 +0800)]
tc: flower: add tc conntrack inv ct_state support

Matches on conntrack inv ct_state.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoUpdate kernel headers
David Ahern [Sat, 23 Jan 2021 18:15:57 +0000 (18:15 +0000)]
Update kernel headers

Update kernel headers to commit:
    59a49d9617e2 ("Merge branch 'mlxsw-expose-number-of-physical-ports'")

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agovrf: fix ip vrf exec with libbpf
Luca Boccassi [Sun, 17 Jan 2021 22:54:27 +0000 (22:54 +0000)]
vrf: fix ip vrf exec with libbpf

The size of bpf_insn is passed to bpf_load_program instead of the number
of elements as it expects, so ip vrf exec fails with:

$ sudo ip link add vrf-blue type vrf table 10
$ sudo ip link set dev vrf-blue up
$ sudo ip/ip vrf exec vrf-blue ls
Failed to load BPF prog: 'Invalid argument'
last insn is not an exit or jmp
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
Kernel compiled with CGROUP_BPF enabled?

https://bugs.debian.org/980046
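
The size/count mix-up can be sketched in isolation. The struct below is
a hypothetical stand-in for struct bpf_insn so the example needs no
kernel or libbpf headers, and the helper names are made up for this
illustration; it only shows the two quantities that got confused.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte stand-in for struct bpf_insn. */
struct insn {
	uint8_t code;
	uint8_t regs;
	int16_t off;
	int32_t imm;
};

#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

static const struct insn prog[] = {
	{ 0xb7, 0, 0, 1 },	/* r0 = 1 */
	{ 0x95, 0, 0, 0 },	/* exit */
};

/* Size of the program in bytes: wrong where a count is expected. */
static size_t prog_size_bytes(void)
{
	return sizeof(prog);
}

/* Number of instructions: what a count parameter should receive. */
static size_t prog_insn_count(void)
{
	return ARRAY_COUNT(prog);
}
```

Passing the byte size where a count is expected makes the program look
eight times longer than it is, so the loader is told about instructions
that are not there.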

Reported-by: Emmanuel DECAEN <Emmanuel.Decaen@xsalto.com>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agovrf: print BPF log buffer if bpf_program_load fails
Luca Boccassi [Sun, 17 Jan 2021 22:54:26 +0000 (22:54 +0000)]
vrf: print BPF log buffer if bpf_program_load fails

This is necessary to understand what is going on when bpf_program_load fails.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agobuild: Fix link errors on some systems
Roi Dayan [Tue, 12 Jan 2021 10:33:17 +0000 (12:33 +0200)]
build: Fix link errors on some systems

Since moving get_rate() and get_size() from tc to lib, linking fails on
some systems because the math library is missing.
Move the functions that require the math library to their own C file
and add -lm to dcb, which now uses those functions.

../lib/libutil.a(utils.o): In function `get_rate':
utils.c:(.text+0x10dc): undefined reference to `floor'
../lib/libutil.a(utils.o): In function `get_size':
utils.c:(.text+0x1394): undefined reference to `floor'
../lib/libutil.a(json_print.o): In function `sprint_size':
json_print.c:(.text+0x14c0): undefined reference to `rint'
json_print.c:(.text+0x14f4): undefined reference to `rint'
json_print.c:(.text+0x157c): undefined reference to `rint'

Fixes: f3be0e6366ac ("lib: Move get_rate(), get_rate64() from tc here")
Fixes: 44396bdfcc0a ("lib: Move get_size() from tc here")
Fixes: adbe5de96662 ("lib: Move sprint_size() from tc here, add print_size()")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Tested-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoMerge branch 'dcb-app-dcbx' into next
David Ahern [Mon, 18 Jan 2021 04:10:27 +0000 (04:10 +0000)]
Merge branch 'dcb-app-dcbx' into next

Petr Machata says:

====================

Add support to the dcb tool for the following two DCB objects:

- APP, which allows configuration of traffic prioritization rules based on
  several possible packet headers.

- DCBX, which is a 1-byte bitfield of flags that configure whether the DCBX
  protocol is implemented in the device or in the host, and which version
  of the protocol should be used.

Patch #1 adds a new helper for finding a name of a given dsfield value.
This is useful for APP DSCP-to-priority rules, which can use human-readable
DSCP names.

Patches #2, #3 and #4 extend existing interfaces for, respectively, parsing
of the X:Y mappings, for setting a DCB object, and for getting a DCB
object.

In patch #5, support for the command line argument -N / --Numeric is
added. The APP tool later uses it to decide whether to format DSCP values
as human-readable strings or as plain numbers.

Patches #6 and #7 add the subtools themselves and their man pages.

v2:
- Two patches dropped and sent to iproute2 branch as "dcb: Fixes".
  This patch set now depends on that one.
- Patch #5:
    - Make it -N / --Numeric instead of -n / --no-nice-names
    - Rename the flag from no_nice_names to numeric as well
- Patch #6:
    - Adjust to s/no_nice_names/numeric/ from another patch.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodcb: Add a subtool for the DCBX object
Petr Machata [Sat, 2 Jan 2021 00:03:41 +0000 (01:03 +0100)]
dcb: Add a subtool for the DCBX object

The Linux DCBX object is a 1-byte bitfield of flags that configure whether
the DCBX protocol is implemented in the device or in the host, and which
version of the protocol should be used. Add a tool to access the per-port
Linux DCBX object.

For example:

# dcb dcbx set dev eni1np1 host ieee
# dcb dcbx show dev eni1np1
host ieee
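
To illustrate how a 1-byte flag field maps to keywords such as "host
ieee", here is a minimal sketch. The macro names, the bit values and the
keyword spellings are assumptions for this example; the authoritative
definitions are in dcbnl.h and the dcb sources.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative flag bits; see dcbnl.h for the real definitions. */
#define DCBX_HOST        0x01
#define DCBX_LLD_MANAGED 0x02
#define DCBX_VER_CEE     0x04
#define DCBX_VER_IEEE    0x08
#define DCBX_STATIC      0x10

/* Write the space-separated keywords for the set bits into buf (len >= 1). */
static void dcbx_format(uint8_t flags, char *buf, size_t len)
{
	static const struct {
		uint8_t bit;
		const char *name;
	} names[] = {
		{ DCBX_HOST, "host" },
		{ DCBX_LLD_MANAGED, "lld-managed" },
		{ DCBX_VER_CEE, "cee" },
		{ DCBX_VER_IEEE, "ieee" },
		{ DCBX_STATIC, "static" },
	};
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (!(flags & names[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, names[i].name, len - strlen(buf) - 1);
	}
}
```

Under these assumed values, "host ieee" corresponds to the byte 0x09.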

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodcb: Add a subtool for the DCB APP object
Petr Machata [Sat, 2 Jan 2021 00:03:40 +0000 (01:03 +0100)]
dcb: Add a subtool for the DCB APP object

DCB APP interfaces are standardized in 802.1q-2018, and allow configuration
of traffic prioritization rules based on several possible headers.

Add a dcb subtool for maintenance and display of the APP table. For
example:

    # dcb app add dev eni1np1 dscp-prio 0:0 CS3:3 CS6:6
    # dcb app show dev eni1np1
    dscp-prio 0:0 CS3:3 CS6:6
    # dcb app add dev eni1np1 dscp-prio CS3:4
    # dcb app show dev eni1np1
    dscp-prio 0:0 CS3:3 CS3:4 CS6:6
    # dcb app replace dev eni1np1 dscp-prio CS3:5
    # dcb app show dev eni1np1
    dscp-prio 0:0 CS3:5 CS6:6

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodcb: Support -N to suppress translation to human-readable names
Petr Machata [Sat, 2 Jan 2021 00:03:39 +0000 (01:03 +0100)]
dcb: Support -N to suppress translation to human-readable names

Some DSCP values can be translated to symbolic names. That may not be
always desirable. Introduce a command-line option similar to other tools,
-N or --Numeric, to suppress this translation.
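
The effect of such a flag can be sketched as follows. The function names
and the two-entry name table are hypothetical and only serve to show the
switch between symbolic and numeric output; CS3 and CS6 are the standard
class selector code points 24 and 48.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Tiny illustrative name table; returns NULL for values without a name. */
static const char *dscp_name(unsigned int dscp)
{
	switch (dscp) {
	case 24:
		return "CS3";
	case 48:
		return "CS6";
	default:
		return NULL;
	}
}

/* Format dscp as a name when one exists and numeric is false, else as a number. */
static void format_dscp(unsigned int dscp, bool numeric, char *buf, size_t len)
{
	const char *name = numeric ? NULL : dscp_name(dscp);

	if (name)
		snprintf(buf, len, "%s", name);
	else
		snprintf(buf, len, "%u", dscp);
}
```

Values without a symbolic name are printed as numbers in both modes.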

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodcb: Generalize dcb_get_attribute()
Petr Machata [Sat, 2 Jan 2021 00:03:38 +0000 (01:03 +0100)]
dcb: Generalize dcb_get_attribute()

The function dcb_get_attribute() assumes that the caller knows the exact
size of the looked-for payload. It also assumes that the response comes
wrapped in a DCB_ATTR_IEEE nest. The former assumption does not hold for
the IEEE APP table, which has variable size. The latter one does not hold
for DCBX, which is not IEEE-nested, and also for any CEE attributes, which
would come CEE-nested.

Factor out the payload extractor from the current dcb_get_attribute() code,
and put into a helper. Then rewrite dcb_get_attribute() compatibly in terms
of the new function. Introduce dcb_get_attribute_va() as a thin wrapper for
IEEE-nested access, and dcb_get_attribute_bare() for access to attributes
that are not nested.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodcb: Generalize dcb_set_attribute()
Petr Machata [Sat, 2 Jan 2021 00:03:37 +0000 (01:03 +0100)]
dcb: Generalize dcb_set_attribute()

The function dcb_set_attribute() takes a fully-formed payload as an
argument. For callers that need to build a nested attribute, such as is the
case for DCB APP table, this is not great, because with libmnl, they would
need to construct a separate netlink message just to pluck out the payload
and hand it over to this function.

Currently, dcb_set_attribute() also always wraps the payload in an
DCB_ATTR_IEEE container, because that is what all the dcb subtools so far
needed. But that is not appropriate for DCBX in particular, nor for a
handful of other attributes or for any CEE payloads.

Instead, generalize this code by adding parameters for constructing a
custom payload and for fetching the response from a custom response
attribute. Then add dcb_set_attribute_va(), which takes a callback to
invoke in the right place for the nest to be built, and
dcb_set_attribute_bare(), which is similar to dcb_set_attribute(), but does
not encapsulate the payload in an IEEE container. Rewrite
dcb_set_attribute() compatibly in terms of the new functions.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agolib: Generalize parse_mapping()
Petr Machata [Sat, 2 Jan 2021 00:03:36 +0000 (01:03 +0100)]
lib: Generalize parse_mapping()

The function parse_mapping() assumes the key is a number, with a single
configurable exception, which is using "all" to mean "all possible keys".
If a caller wishes to use symbolic names instead of numbers, they cannot
reuse this function.

To facilitate reuse in these situations, convert parse_mapping() into a
helper, parse_mapping_gen(), which instead of an allow-all boolean takes a
generic key-parsing callback. Rewrite parse_mapping() in terms of this
newly-added helper and add a pair of key parsers, one for just numbers,
another for numbers and the keyword "all". Publish the latter as well.
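
The idea of a key-parsing callback can be sketched like this. It is a
simplified illustration that parses a single "KEY:VALUE" token; the
names, signatures and the DSCP table are hypothetical and do not
reproduce the actual parse_mapping_gen() interface.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Callback type: turn a key string into a number, 0 on success. */
typedef int (*key_parser_t)(const char *str, uint32_t *key);

/* Key parser that accepts plain non-negative numbers only. */
static int parse_num_key(const char *str, uint32_t *key)
{
	unsigned long v;
	char *end;

	if (*str < '0' || *str > '9')
		return -EINVAL;
	errno = 0;
	v = strtoul(str, &end, 0);
	if (errno || *end != '\0' || v > UINT32_MAX)
		return -EINVAL;
	*key = (uint32_t)v;
	return 0;
}

/* Key parser that also accepts a few symbolic names (illustrative table). */
static int parse_dscp_key(const char *str, uint32_t *key)
{
	if (strcmp(str, "CS3") == 0) {
		*key = 24;
		return 0;
	}
	if (strcmp(str, "CS6") == 0) {
		*key = 48;
		return 0;
	}
	return parse_num_key(str, key);
}

/* Split "KEY:VALUE", parse KEY via the callback and VALUE as a number. */
static int parse_mapping_sketch(const char *arg, key_parser_t parse_key,
				uint32_t *key, uint32_t *value)
{
	char buf[64];
	char *colon;
	size_t len = strlen(arg);

	if (len >= sizeof(buf))
		return -EINVAL;
	memcpy(buf, arg, len + 1);
	colon = strchr(buf, ':');
	if (!colon)
		return -EINVAL;
	*colon = '\0';
	if (parse_key(buf, key))
		return -EINVAL;
	return parse_num_key(colon + 1, value);
}
```

The splitting logic stays the same for every caller; only the callback
decides which key spellings are valid.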

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agolib: rt_names: Add rtnl_dsfield_get_name()
Petr Machata [Sat, 2 Jan 2021 00:03:35 +0000 (01:03 +0100)]
lib: rt_names: Add rtnl_dsfield_get_name()

For formatting DSCP (not full dsfield), it would be handy to be able to
just get the name from the name table, and not get any of the remaining
cruft related to formatting. Add a new entry point to just fetch the
name table string uninterpreted. Use it from rtnl_dsfield_n2a().

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoMerge branch 'main' into next
David Ahern [Mon, 18 Jan 2021 03:57:29 +0000 (03:57 +0000)]
Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agotc: flower: fix json output with mpls lse
Guillaume Nault [Tue, 12 Jan 2021 10:30:53 +0000 (11:30 +0100)]
tc: flower: fix json output with mpls lse

The json output of the TCA_FLOWER_KEY_MPLS_OPTS attribute was invalid.

Example:

  $ tc filter add dev eth0 ingress protocol mpls_uc flower mpls \
      lse depth 1 label 100                                     \
      lse depth 2 label 200

  $ tc -json filter show dev eth0 ingress
    ...{"eth_type":"8847",
        "  mpls":["    lse":["depth":1,"label":100],
                  "    lse":["depth":2,"label":200]]}...

This is invalid because arrays, introduced by "[", can't contain raw
string:value pairs. Those must be enclosed in "{}" to form valid json
objects. Also, there is spurious whitespace before the mpls and lse
strings because of the indentation used for normal output.

Fix this by putting all LSE parameters (depth, label, tc, bos and ttl)
into the same json object. The "mpls" key now directly contains a list
of such objects.

Also, handle strings differently for normal and json output, so that
json strings don't get spurious indentation whitespaces.

Normal output isn't modified.
The json output now looks like:

  $ tc -json filter show dev eth0 ingress
    ...{"eth_type":"8847",
        "mpls":[{"depth":1,"label":100},
                {"depth":2,"label":200}]}...

Fixes: eb09a15c12fb ("tc: flower: support multiple MPLS LSE match")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodcb: Change --Netns/-N to --netns/-n
Petr Machata [Sun, 3 Jan 2021 10:57:24 +0000 (11:57 +0100)]
dcb: Change --Netns/-N to --netns/-n

This is to stay compatible with the major tools, ip and tc. Also
document the option in the man page, where it was previously missing.

Fixes: 67033d1c1c8a ("Add skeleton of a new tool, dcb")
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodcb: Plug a leaking DCB socket buffer
Petr Machata [Sun, 3 Jan 2021 10:57:23 +0000 (11:57 +0100)]
dcb: Plug a leaking DCB socket buffer

The DCB socket buffer is allocated in dcb_init(), but never freed. Free it
in dcb_fini().

Fixes: 67033d1c1c8a ("Add skeleton of a new tool, dcb")
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodcb: Set values with RTM_SETDCB type
Petr Machata [Sun, 3 Jan 2021 10:57:22 +0000 (11:57 +0100)]
dcb: Set values with RTM_SETDCB type

dcb currently sends all netlink messages with a type RTM_GETDCB, even the
set ones. Change to the appropriate type.

Fixes: 67033d1c1c8a ("Add skeleton of a new tool, dcb")
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agouapi: update if_link.h from upstream
Stephen Hemminger [Sat, 16 Jan 2021 17:09:35 +0000 (09:09 -0800)]
uapi: update if_link.h from upstream

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoinclude: uapi: Carry dcbnl.h
Petr Machata [Tue, 12 Jan 2021 12:04:55 +0000 (13:04 +0100)]
include: uapi: Carry dcbnl.h

To allow building a new suite of DCB tools on an older kernel, carry a copy
of dcbnl.h.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agordma: Add support for the netlink extack
Patrisious Haddad [Sun, 3 Jan 2021 06:17:06 +0000 (08:17 +0200)]
rdma: Add support for the netlink extack

Add support in rdma for receiving netlink extack errors in
userspace, so that extack error messages sent by the kernel
are printed for the user.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoipmonitor: Mention "nexthop" object in help and man page
Ido Schimmel [Thu, 7 Jan 2021 15:23:27 +0000 (17:23 +0200)]
ipmonitor: Mention "nexthop" object in help and man page

Before:

 # ip monitor help
 Usage: ip monitor [ all | LISTofOBJECTS ] [ FILE ] [ label ] [all-nsid] [dev DEVICE]
 LISTofOBJECTS := link | address | route | mroute | prefix |
                  neigh | netconf | rule | nsid
 FILE := file FILENAME

After:

 # ip monitor help
 Usage: ip monitor [ all | LISTofOBJECTS ] [ FILE ] [ label ] [all-nsid] [dev DEVICE]
 LISTofOBJECTS := link | address | route | mroute | prefix |
                  neigh | netconf | rule | nsid | nexthop
 FILE := file FILENAME

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agonexthop: Fix usage output
Ido Schimmel [Thu, 7 Jan 2021 15:23:26 +0000 (17:23 +0200)]
nexthop: Fix usage output

Before:

 # ip nexthop help
 Usage: ip nexthop { list | flush } [ protocol ID ] SELECTOR
        ip nexthop { add | replace } id ID NH [ protocol ID ]
        ip nexthop { get| del } id ID
 SELECTOR := [ id ID ] [ dev DEV ] [ vrf NAME ] [ master DEV ]
             [ groups ] [ fdb ]
 NH := { blackhole | [ via ADDRESS ] [ dev DEV ] [ onlink ]
       [ encap ENCAPTYPE ENCAPHDR ] | group GROUP ] }
 GROUP := [ id[,weight]>/<id[,weight]>/... ]
 ENCAPTYPE := [ mpls ]
 ENCAPHDR := [ MPLSLABEL ]

After:

 # ip nexthop help
 Usage: ip nexthop { list | flush } [ protocol ID ] SELECTOR
        ip nexthop { add | replace } id ID NH [ protocol ID ]
        ip nexthop { get | del } id ID
 SELECTOR := [ id ID ] [ dev DEV ] [ vrf NAME ] [ master DEV ]
             [ groups ] [ fdb ]
 NH := { blackhole | [ via ADDRESS ] [ dev DEV ] [ onlink ]
         [ encap ENCAPTYPE ENCAPHDR ] | group GROUP [ fdb ] }
 GROUP := [ <id[,weight]>/<id[,weight]>/... ]
 ENCAPTYPE := [ mpls ]
 ENCAPHDR := [ MPLSLABEL ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agouapi: update kernel headers to 5.11 pre rc1
Stephen Hemminger [Fri, 25 Dec 2020 03:38:35 +0000 (19:38 -0800)]
uapi: update kernel headers to 5.11 pre rc1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoMerge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next into main
Stephen Hemminger [Fri, 25 Dec 2020 03:29:15 +0000 (19:29 -0800)]
Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next into main

3 years ago5.10.0
Stephen Hemminger [Mon, 21 Dec 2020 18:28:53 +0000 (10:28 -0800)]
5.10.0

3 years agotestsuite: Add mpls packet matching tests for tc flower
Guillaume Nault [Mon, 14 Dec 2020 17:14:22 +0000 (18:14 +0100)]
testsuite: Add mpls packet matching tests for tc flower

Match all MPLS fields using smallest and highest possible values.
Test the two ways of specifying MPLS header matching:

  * with the basic mpls_{label,tc,bos,ttl} keywords (match only on the
    first LSE),

  * with the more generic "lse" keyword (allows matching at different
    depth of the MPLS label stack).

This test file makes it possible to find problems like the one fixed by
Linux commit 7fdd375e3830 ("net: sched: Fix dump of MPLS_OPT_LSE_LABEL
attribute in cls_flower").

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agoMerge branch 'main' into next
David Ahern [Wed, 16 Dec 2020 04:06:06 +0000 (04:06 +0000)]
Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agoiplink:macvlan: Added bcqueuelen parameter
Thomas Karlsson [Mon, 14 Dec 2020 10:42:18 +0000 (11:42 +0100)]
iplink:macvlan: Added bcqueuelen parameter

This patch allows the user to set and retrieve the
IFLA_MACVLAN_BC_QUEUE_LEN parameter via the bcqueuelen
command line argument.

This parameter controls the requested size of the queue for
broadcast and multicast packets in the macvlan driver.

If not specified, the driver default (1000) will be used.

Note: The request is per macvlan but the actually used queue
length per port is the maximum of any request to any macvlan
connected to the same port.

For this reason, the used queue length IFLA_MACVLAN_BC_QUEUE_LEN_USED
is also retrieved and displayed in order to aid in the understanding
of the setting. However, it can of course not be directly set.
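
The relationship between the requested and the used queue length can be
sketched as a plain maximum. The function name is hypothetical and the
sketch only restates the rule described above; it is not the driver
code.

```c
#include <stddef.h>
#include <stdint.h>

/* Used queue length for a port: the largest request of any macvlan on it. */
static uint32_t bc_queue_len_used(const uint32_t *requested, size_t n)
{
	uint32_t used = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (requested[i] > used)
			used = requested[i];
	return used;
}
```

For example, if three macvlans on one port request 1000, 5000 and 200,
the used length is 5000 for all of them.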

Signed-off-by: Thomas Karlsson <thomas.karlsson@paneda.se>
Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agoss: mptcp: fix add_addr_accepted stat print
Andrea Claudi [Wed, 18 Nov 2020 14:24:18 +0000 (15:24 +0100)]
ss: mptcp: fix add_addr_accepted stat print

The add_addr_accepted value is not printed if the add_addr_signal value is 0.
Fix this by checking the add_addr_accepted value instead.

Fixes: 9c3be2c0eee01 ("ss: mptcp: add msk diag interface support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agotc: pedit: fix memory leak in print_pedit
Andrea Claudi [Fri, 11 Dec 2020 18:53:03 +0000 (19:53 +0100)]
tc: pedit: fix memory leak in print_pedit

keys_ex is dynamically allocated with calloc on line 770, but
is not freed in case of error at line 823.

Fixes: 081d6c310d3a ("tc: pedit: Support JSON dumping")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodevlink: fix memory leak in cmd_dev_flash()
Andrea Claudi [Fri, 11 Dec 2020 18:53:02 +0000 (19:53 +0100)]
devlink: fix memory leak in cmd_dev_flash()

nlg_ntf is dynamically allocated in mnlg_socket_open(), and is freed on
the out: return path. However, some error paths do not free it,
resulting in a memory leak.

This commit fixes this using mnlg_socket_close(), and reports the
correct error number when required.

Fixes: 9b13cddfe268 ("devlink: implement flash status monitoring")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoman: tc-flower: fix manpage
Andrea Claudi [Fri, 11 Dec 2020 18:26:40 +0000 (19:26 +0100)]
man: tc-flower: fix manpage

Commit 924c43778a84 ("man: tc-ct.8: Add manual page for ct tc action")
added a man page for tc-ct, but it brought with it a bogus block of text
at the beginning of the tc-flower man page.

This commit simply removes it.

Fixes: 924c43778a84 ("man: tc-ct.8: Add manual page for ct tc action")
Reported-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoMerge branch 'dcb-pfc-buffer-maxrate' into next
David Ahern [Mon, 14 Dec 2020 16:48:38 +0000 (16:48 +0000)]
Merge branch 'dcb-pfc-buffer-maxrate' into next

Petr Machata says:
====================

Add support to the dcb tool for the following three DCB objects:

- PFC, for "Priority-based Flow Control", allows configuration of priority
  lossiness, and related toggles.

- DCBNL buffer interfaces are an extension to the 802.1q DCB interfaces and
  allow configuration of port headroom buffers.

- DCBNL maxrate interfaces are an extension to the 802.1q DCB interfaces
  and allow configuration of rate with which traffic in a given traffic
  class is sent.

Patches #1-#4 fix small issues in the current DCB code and man pages.

Patch #5 adds new helpers to the DCB dispatcher.

Patches #6 and #7 add support for command line arguments -s and -i. These
enable, respectively, display of statistical counters, and ISO/IEC mode of
rate units.

Patches #8-#10 add the subtools themselves and their man pages.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agodcb: Add a subtool for the DCB maxrate object
Petr Machata [Thu, 10 Dec 2020 23:02:24 +0000 (00:02 +0100)]
dcb: Add a subtool for the DCB maxrate object

DCBNL maxrate interfaces are an extension to the 802.1q DCB interfaces and
allow configuration of rate with which traffic in a given traffic class is
sent.

Add a dcb subtool to allow showing and tweaking of this per-TC maximum
rate. For example:

    # dcb maxrate show dev eni1np1
    tc-maxrate 0:25Gbit 1:25Gbit 2:25Gbit 3:25Gbit 4:25Gbit 5:25Gbit 6:100Gbit 7:25Gbit

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agodcb: Add a subtool for the DCB buffer object
Petr Machata [Thu, 10 Dec 2020 23:02:23 +0000 (00:02 +0100)]
dcb: Add a subtool for the DCB buffer object

DCBNL buffer interfaces are an extension to the 802.1q DCB interfaces and
allow configuration of port headroom buffers.

Add a dcb subtool to allow showing and tweaking of buffer priority mapping
and buffer sizes. For example:

    # dcb buf show dev eni1np1
    prio-buffer 0:0 1:0 2:0 3:3 4:0 5:0 6:6 7:0
    buffer-size 0:10000 1:0 2:0 3:70000 4:0 5:0 6:10000 7:0
    total-size 221072

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agodcb: Add a subtool for the DCB PFC object
Petr Machata [Thu, 10 Dec 2020 23:02:22 +0000 (00:02 +0100)]
dcb: Add a subtool for the DCB PFC object

PFC, for "Priority-based Flow Control", allows configuration of priority
lossiness, and related toggles.

Add a dcb subtool to allow showing and tweaking of individual PFC
configuration options, and querying statistics. For example:

    # dcb pfc show dev eni1np1
    pfc-cap 8 macsec-bypass on delay 0
    pg-pfc 0:off 1:on 2:off 3:off 4:off 5:off 6:off 7:on
    requests 0:0 1:217 2:0 3:0 4:0 5:0 6:0 7:28
    indications 0:0 1:179 2:0 3:0 4:0 5:0 6:0 7:18

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agodcb: Add -i to enable IEC mode
Petr Machata [Thu, 10 Dec 2020 23:02:21 +0000 (00:02 +0100)]
dcb: Add -i to enable IEC mode

Allow switching "dcb" into the ISO/IEC mode of units by passing -i.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agodcb: Add -s to enable statistics
Petr Machata [Thu, 10 Dec 2020 23:02:20 +0000 (00:02 +0100)]
dcb: Add -s to enable statistics

Allow selective display of statistical counters by passing -s.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agodcb: Add dcb_set_u32(), dcb_set_u64()
Petr Machata [Thu, 10 Dec 2020 23:02:19 +0000 (00:02 +0100)]
dcb: Add dcb_set_u32(), dcb_set_u64()

The DCB buffer object has a settable array of 32-bit quantities, and the
maxrate object of 64-bit ones. Adjust dcb_parse_mapping() and related
helpers to support 64-bit values in mappings, and add appropriate helpers.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agoman: dcb-ets: Remove an unnecessary empty line
Petr Machata [Thu, 10 Dec 2020 23:02:18 +0000 (00:02 +0100)]
man: dcb-ets: Remove an unnecessary empty line

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agodcb: ets: Change the way show parameters are given in synopsis
Petr Machata [Thu, 10 Dec 2020 23:02:17 +0000 (00:02 +0100)]
dcb: ets: Change the way show parameters are given in synopsis

None, one, or many parameters can be given on the command line, but
the current synopsis allows only none or one. Fix it.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agodcb: ets: Fix help display for "show" subcommand
Petr Machata [Thu, 10 Dec 2020 23:02:16 +0000 (00:02 +0100)]
dcb: ets: Fix help display for "show" subcommand

"dcb ets show dev X help" currently shows full "ets" help instead of just
help for the show command. Fix it.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agodcb: Remove unsupported command line arguments from getopt_long()
Petr Machata [Thu, 10 Dec 2020 23:02:15 +0000 (00:02 +0100)]
dcb: Remove unsupported command line arguments from getopt_long()

getopt_long() currently includes "c" and "n" in the short option string.
These probably slipped in as a cut'n'paste, and are not actually accepted.
Remove them.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
3 years agouapi: merge in change to bpf.h
Stephen Hemminger [Mon, 14 Dec 2020 16:07:06 +0000 (08:07 -0800)]
uapi: merge in change to bpf.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoMerge branch 'devlink-reload' into next
David Ahern [Wed, 9 Dec 2020 02:43:41 +0000 (02:43 +0000)]
Merge branch 'devlink-reload' into next

Signed-off-by: David Ahern <dsahern@gmail.com>
Moshe Shemesh [Mon, 7 Dec 2020 05:35:22 +0000 (07:35 +0200)]
devlink: Add reload stats to dev show

Show reload statistics through devlink dev show using the devlink stats
flag. The reload statistics show the history per reload action type and
limit. Add remote reload statistics to show the history of actions
performed due to devlink reload commands initiated by a remote host.

Output examples:
$ devlink dev show -s
pci/0000:82:00.0:
  stats:
      reload:
          driver_reinit:
            unspecified 2
          fw_activate:
            unspecified 1 no_reset 0
      remote_reload:
          driver_reinit:
            unspecified 0
          fw_activate:
            unspecified 0 no_reset 0
pci/0000:82:00.1:
  stats:
      reload:
          driver_reinit:
            unspecified 0
          fw_activate:
            unspecified 0 no_reset 0
      remote_reload:
          driver_reinit:
            unspecified 1
          fw_activate:
            unspecified 1 no_reset 0

$ devlink dev show -s -jp
{
    "dev": {
        "pci/0000:82:00.0": {
            "stats": {
                "reload": {
                    "driver_reinit": {
                        "unspecified": 2
                    },
                    "fw_activate": {
                        "unspecified": 1,
                        "no_reset": 0
                    }
                },
                "remote_reload": {
                    "driver_reinit": {
                        "unspecified": 0
                    },
                    "fw_activate": {
                        "unspecified": 0,
                        "no_reset": 0
                    }
                }
            }
        },
        "pci/0000:82:00.1": {
            "stats": {
                "reload": {
                    "driver_reinit": {
                        "unspecified": 0
                    },
                    "fw_activate": {
                        "unspecified": 0,
                        "no_reset": 0
                    }
                },
                "remote_reload": {
                    "driver_reinit": {
                        "unspecified": 1
                    },
                    "fw_activate": {
                        "unspecified": 1,
                        "no_reset": 0
                    }
                }
            }
        }
    }
}

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Moshe Shemesh [Mon, 7 Dec 2020 05:35:21 +0000 (07:35 +0200)]
devlink: Add pr_out_dev() helper function

Add a pr_out_dev() helper function and use it in both cmd_dev_show_cb()
and cmd_mon_show_cb().

Dev stats will be added to the dev context in the next patch, so
cmd_mon_show_cb() should print the whole dev context and not just the dev
handle.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Moshe Shemesh [Mon, 7 Dec 2020 05:35:20 +0000 (07:35 +0200)]
devlink: Add devlink reload action and limit options

Add reload action and reload limit options to the devlink reload command,
so that the user can select the required reload action and constrain how
that action may be performed.

The following reload actions are supported:
  driver_reinit: driver entities re-initialization, applying
                 devlink-param and devlink-resource values.
  fw_activate: firmware activate.

The uAPI is backward compatible: if the reload action option is omitted
from the reload command, the driver_reinit action will be used.
Note that when required to do firmware activation some drivers may need
to reload the driver. On the other hand some drivers may need to reset
the firmware to reinitialize the driver entities. Therefore, the devlink
reload command returns the actions which were actually performed.

By default, reload actions are not limited and the driver implementation
may include reset or downtime as needed to perform the actions. However,
if a reload limit is selected, the driver should perform the action only
if it can do so within the limit's constraints.

Reload limit added:
  no_reset: No reset allowed, no down time allowed, no link flap and no
            configuration is lost.

Command examples:
$ devlink dev reload pci/0000:82:00.0 action driver_reinit
reload_actions_performed:
  driver_reinit

$ devlink dev reload pci/0000:82:00.0 action fw_activate
reload_actions_performed:
  driver_reinit fw_activate

$ devlink dev reload pci/0000:82:00.1 action driver_reinit -jp
{
    "reload": {
        "reload_actions_performed": [ "driver_reinit" ]
    }
}

$ devlink dev reload pci/0000:82:00.0 action fw_activate -jp
{
    "reload": {
        "reload_actions_performed": [ "driver_reinit","fw_activate" ]
    }
}

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
David Ahern [Wed, 9 Dec 2020 02:32:17 +0000 (02:32 +0000)]
Merge branch 'rate-size-parsing-output' into next

Petr Machata says:
==================

The DCB tool will have commands that deal with buffer sizes and traffic
rates. TC is another tool that has a number of such commands, and functions
to support them: get_size(), get_rate/64(), s/print_size() and
s/print_rate(). In this patchset, these functions are moved from TC to lib/
for possible reuse and modernized.

s/print_rate() has a hidden parameter of a global variable use_iec, which
made the conversion non-trivial. The parameter was made explicit,
print_rate() converted to a mostly json_print-like function, and
sprint_rate() retired in favor of the new print_rate. Patches #1 and #2
deal with this.

The intention was to treat s/print_size() similarly, but unfortunately two
use cases of sprint_size() cannot be converted to a json_print-like
print_size(), and the function sprint_size() had to remain as a discouraged
backdoor to print_size(). This is done in patch #3.

Patch #4 then improves the code of sprint_size() a little bit.

Patch #5 fixes a buglet in formatting small rates in IEC mode.

Patches #6 and #7 handle a routine movement of, respectively,
get_rate/64() and get_size() from tc to lib.

This patchset does not actually add any new uses of these functions. A
follow-up patchset will add subtools for management of DCB buffer and DCB
maxrate objects that will make use of them.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
Petr Machata [Sat, 5 Dec 2020 21:13:35 +0000 (22:13 +0100)]
lib: Move get_size() from tc here

The function get_size() parses sizes using a handy notation that supports
units and their prefixes, such as 10Kbit. This will be useful
for the DCB buffer size parsing. Move the function from TC to the general
library, so that it can be reused.
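
As a rough illustration of this kind of notation, here is a minimal
sketch of a suffix-aware size parser. It is not the get_size()
implementation: the function name parse_size_sketch(), the suffix table
and the 1024-based multipliers are assumptions made for the example, and
the result is expressed in bytes.

```c
#include <limits.h>
#include <stdlib.h>
#include <strings.h>

/* Hypothetical helper; the suffix table is an assumption for illustration. */
int parse_size_sketch(unsigned int *size, const char *str)
{
	static const struct {
		const char *suffix;
		double scale;	/* multiplier that converts the value to bytes */
	} units[] = {
		{ "b",    1.0 },
		{ "k",    1024.0 },
		{ "kb",   1024.0 },
		{ "m",    1024.0 * 1024 },
		{ "mb",   1024.0 * 1024 },
		{ "kbit", 1024.0 / 8 },
		{ "mbit", 1024.0 * 1024 / 8 },
	};
	char *end;
	double value = strtod(str, &end);
	double scale = 1.0;
	size_t i;

	if (end == str || value < 0)
		return -1;	/* no usable leading number */

	if (*end != '\0') {
		/* Look the suffix up case-insensitively. */
		for (i = 0; i < sizeof(units) / sizeof(units[0]); i++) {
			if (strcasecmp(end, units[i].suffix) == 0)
				break;
		}
		if (i == sizeof(units) / sizeof(units[0]))
			return -1;	/* unknown suffix */
		scale = units[i].scale;
	}

	if (value * scale > (double)UINT_MAX)
		return -1;	/* result does not fit */

	*size = (unsigned int)(value * scale);
	return 0;
}
```

Under these assumed multipliers, "10Kbit" parses to 1280 bytes
(10 * 1024 / 8), and a bare number such as "5" is taken as bytes.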

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Petr Machata [Sat, 5 Dec 2020 21:13:34 +0000 (22:13 +0100)]
lib: Move get_rate(), get_rate64() from tc here

The functions get_rate() and get_rate64() are useful for parsing rate-like
values. The DCB tool will find these useful in the maxrate subtool.
Move them over to lib so that they can be easily reused.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Petr Machata [Sat, 5 Dec 2020 21:13:33 +0000 (22:13 +0100)]
lib: print_color_rate(): Fix formatting small rates in IEC mode

ISO/IEC units are distinguished from the decadic ones by using prefixes
like "Ki", "Mi" instead of "K" and "M". The current code inserts the letter
"i" after the decadic unit when in IEC mode. However it does so even when
the prefix is an empty string, formatting 1Kbit in IEC mode as "1000ibit".
Fix by omitting the letter if there is no prefix.
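
The fix can be sketched as follows. This is a simplified stand-in rather
than the print_color_rate() code: format_rate_sketch() is a hypothetical
helper that takes the rate in bits per second, only scales values that
divide evenly, and writes into a caller-supplied buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: format a rate given in bits per second into buf. */
int format_rate_sketch(bool use_iec, unsigned long long rate,
		       char *buf, size_t len)
{
	static const char *const prefixes[] = { "", "K", "M", "G", "T" };
	const unsigned long long kilo = use_iec ? 1024 : 1000;
	size_t i;

	assert(buf != NULL && len > 0);

	/* Scale down while the value divides evenly; stop at the last prefix. */
	for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]) - 1; i++) {
		if (rate < kilo || rate % kilo != 0)
			break;
		rate /= kilo;
	}

	/* Append "i" only when a prefix was actually chosen. */
	return snprintf(buf, len, "%llu%s%sbit", rate, prefixes[i],
			(use_iec && i > 0) ? "i" : "");
}
```

In IEC mode, a rate of 1000 stays below 1024, so no prefix is selected
and no "i" is emitted, whereas 2048 is scaled once and gets the "Ki"
prefix.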

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>