Andrea Claudi [Mon, 22 Feb 2021 18:14:32 +0000 (19:14 +0100)]
lib/fs: Fix single return points for get_cgroup2_*
Functions get_cgroup2_id() and get_cgroup2_path() may call close() with
a negative argument.
Avoid that making the calls conditional on the file descriptors.
get_cgroup2_path() may also return NULL leaking a file descriptor.
Ensure this does not happen using a single return point.
Fixes: d5e6ee0dac64 ("ss: introduce cgroup2 cache and helper functions") Fixes: 8f1cd119b377 ("lib: fix checking of returned file handle size for cgroup") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Mon, 22 Feb 2021 18:14:31 +0000 (19:14 +0100)]
lib/fs: avoid double call to mkdir on make_path()
make_path() function calls mkdir two times in a row. The first one it
stores mkdir return code, and then it calls it again to check for errno.
This seems unnecessary, as we can use the return code from the first
call and check for errno if not 0.
Fixes: ac3415f5c1b1d ("lib/fs: Fix and simplify make_path()") Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Mon, 22 Feb 2021 17:43:10 +0000 (18:43 +0100)]
lib/bpf: Fix and simplify bpf_mnt_check_target()
As stated in commit ac3415f5c1b1 ("lib/fs: Fix and simplify make_path()"),
calling stat() before mkdir() is racey, because the entry might change in
between.
As the call to stat() seems to only check for target existence, we can
simply call mkdir() unconditionally and catch all errors but EEXIST.
Fixes: 95ae9a4870e7 ("bpf: fix mnt path when from env") Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Andrea Claudi [Mon, 22 Feb 2021 11:40:36 +0000 (12:40 +0100)]
lib/namespace: fix ip -all netns return code
When ip -all netns {del,exec} are called and no netns is present, ip
exit with status 0. However this does not happen if no netns has been
created since boot time: in that case, indeed, the NETNS_RUN_DIR is not
present and netns_foreach() exit with code 1.
$ ls /var/run/netns
ls: cannot access '/var/run/netns': No such file or directory
$ ip -all netns exec ip link show
$ echo $?
1
$ ip -all netns del
$ echo $?
1
$ ip netns add test
$ ip netns del test
$ ip -all netns del
$ echo $?
0
$ ls -a /var/run/netns
. ..
This leaves us in the unpleasant situation where the same command, when
no netns is present, does the same stuff (in this case, nothing), but
exit with two different statuses.
Fix this treating ENOENT in a different way from other errors, similarly
to what we already do in ipnetns.c netns_identify_pid()
Fixes: e998e118ddc3 ("lib: Exec func on each netns") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Mon, 22 Feb 2021 20:23:01 +0000 (21:23 +0100)]
ip: lwtunnel: seg6: bail out if table ids are invalid
When table and vrftable are used in SRv6, ip should bail out if table
ids are not valid, and return a proper error message to the user.
Achieve this simply checking rtnl_rttable_a2n return value, as we
already do in the rest of iproute.
Fixes: 0486388a877a ("add support for table name in SRv6 End.DT* behaviors") Fixes: 69629b4e43c4 ("seg6: add support for vrftable attribute in SRv6 End.DT4/DT6 behaviors") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Mon, 22 Feb 2021 20:22:47 +0000 (21:22 +0100)]
tc: m_gate: use SPRINT_BUF when needed
sprint_time64() uses SPRINT_BSIZE-1 as a constant buffer lenght in its
implementation, however m_gate uses shorter buffers when calling it.
Fix this using SPRINT_BUF macro to get the buffer, thus getting a
SPRINT_BSIZE-long buffer.
Fixes: 07d5ee70b5b3 ("iproute2-next:tc:action: add a gate control action") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Vladimir Oltean [Thu, 11 Feb 2021 10:45:02 +0000 (12:45 +0200)]
man8/bridge.8: be explicit that "flood" is an egress setting
Talking to varios people, it became apparent that there is a certain
ambiguity in the description of these flags. They refer to egress
flooding, which should perhaps be stated more clearly.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The dump isn't supported for the statistics bind/unbind commands
because they operate on specific QP counters. This is different
from query commands that can operate on many objects at the same
time.
Let's check the user input and ensure that arguments are valid.
Fixes: a6d0773ebecc ("rdma: Add stat manual mode support") Signed-off-by: Ido Kalir <idok@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Thayne McCombs [Sun, 14 Feb 2021 08:09:13 +0000 (01:09 -0700)]
ss: Make leading ":" always optional for sport and dport
The sport and dport conditions in expressions were inconsistent on
whether there should be a ":" at the beginning of the port when only a
port was provided depending on the family. The link and netlink
families required a ":" to work. The vsock family required the ":"
to be absent. The inet and inet6 families work with or without a leading
":".
This makes the leading ":" optional in all cases, so if sport or dport
are used, then it works with a leading ":" or without one, as inet and
inet6 did.
Signed-off-by: Thayne McCombs <astrothayne@gmail.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Amit Cohen [Tue, 9 Feb 2021 09:12:00 +0000 (11:12 +0200)]
ip route: Print "rt_offload_failed" indication
The kernel signals when offload fails using the 'RTM_F_OFFLOAD_FAILED'
flag. Print it to help users understand the offload state of the route.
The "rt_" prefix is used in order to distinguish it from the offload state
of nexthops, similar to "rt_offload" and "rt_trap".
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Delete the vdpa device after its use:
$ vdpa dev del foo2
An example of PCI PF, VF and SF management device:
pci/0000:03.00:0
supported_classes
net
pci/0000:03.00:4
supported_classes
net
auxiliary/mlx5_core.sf.8
supported_classes
net
Parav Pandit [Wed, 10 Feb 2021 18:34:42 +0000 (20:34 +0200)]
utils: Add helper routines for indent handling
Subsequent patch needs to use 2 char indentation for nested objects.
Hence introduce a generic helpers to allocate, deallocate, increment,
decrement and to print indent block.
Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
This commit adds support for configuring HTB in offload mode. HTB
offload eliminates the single qdisc lock in the datapath and offloads
the algorithm to the NIC. The new 'offload' parameter is added to
enable this mode:
Classes are created as usual, but filters should be moved to clsact for
lock-free classification (filters attached to HTB itself are not
supported in the offload mode):
# tc filter add dev eth0 egress protocol ip flower dst_port 80
action skbedit priority 1:10
tc qdisc show and tc class show will indicate whether the offload is
enabled. Example output:
Thayne McCombs [Tue, 2 Feb 2021 03:32:10 +0000 (20:32 -0700)]
ss: always prefer family as part of host condition to default family
ss accepts an address family both with the -f option and as part of a
host condition. However, if the family in the host condition is
different than the the last -f option, then which family is actually
used depends on the order that different families are checked.
This changes parse_hostcond to check all family prefixes before parsing
the rest of the address, so that the host condition's family always has
a higher priority than the "preferred" family.
Signed-off-by: Thayne McCombs <astrothayne@gmail.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Luca Boccassi [Sun, 24 Jan 2021 17:36:58 +0000 (17:36 +0000)]
iproute: force rtm_dst_len to 32/128
Since NETLINK_GET_STRICT_CHK was enabled, the kernel rejects commands
that pass a prefix length, eg:
ip route get `1.0.0.0/1
Error: ipv4: Invalid values in header for route get request.
ip route get 0.0.0.0/0
Error: ipv4: rtm_src_len and rtm_dst_len must be 32 for IPv4
Since there's no point in setting a rtm_dst_len that we know is going
to be rejected, just force it to the right value if it's passed on
the command line. Print a warning to stderr to notify users.
Thayne McCombs [Tue, 2 Feb 2021 22:30:40 +0000 (14:30 -0800)]
ss: Add clarification about host conditions with multiple familes to man
In creating documentation for expressions I ran into an interesting case
where if you use two different familie types in the expression, such as
in `ss 'sport inet:ssh or src unix:/run/*'`, then you would only get the
results for one address family (in this case unix sockets).
The reason is that in parse_hostcond if the family is specified we
remove any previously added families from filter->families, and
preserve the "states" if any states are set. I tried changing this to
not reset the families, but ran into some issues with Invalid Argument
errors in inet_show_netlink, I think related to the state.
I can dig into that more if supporting this is useful, but I'm not sure
if these types of expressions would actually be useful in practice. Or
perhaps an error should be given if an expression contains conditions
with multiple families (besides inet and inet6)?
Anyway, for now, this patch just notes the limitation in the man page.
Signed-off-by: Thayne McCombs <astrothayne@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Edwin Peer [Tue, 26 Jan 2021 17:40:53 +0000 (09:40 -0800)]
iplink: print warning for missing VF data
The kernel might truncate VF info in IFLA_VFINFO_LIST. Compare the
expected number of VFs in IFLA_NUM_VF to how many were found in the
list and warn accordingly.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Paolo Abeni [Mon, 25 Jan 2021 16:02:07 +0000 (17:02 +0100)]
ss: do not emit warn while dumping MPTCP on old kernels
Prior to this commit, running 'ss' on a kernel older than v5.9
bumps an error message:
RTNETLINK answers: Invalid argument
When asked to dump protocol number > 255 - that is: MPTCP - 'ss'
adds an INET_DIAG_REQ_PROTOCOL attribute, unsupported by the older
kernel.
Avoid the warning ignoring filter issues when INET_DIAG_REQ_PROTOCOL
is used.
Additionally older kernel end-up invoking tcpdiag_send(), which
in turn will try to dump DCCP socks. Bail early in such function,
as the kernel does not implement an MPTCPDIAG_GET request.
Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com> Fixes: 9c3be2c0eee0 ("ss: mptcp: add msk diag interface support") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Vladimir Oltean [Fri, 22 Jan 2021 01:22:11 +0000 (03:22 +0200)]
man: tc-taprio.8: document the full offload feature
Since this feature's introduction in commit 9c66d1564676 ("taprio: Add
support for hardware offloading") from kernel v5.4, it never got
documented in the man pages. Due to this reason, we see customer reports
of seemingly contradictory information: the community manpages claim
there is no support for full offload, nonetheless many silicon vendors
have already implemented it.
This patch documents the full offload feature (enabled by specifying
"flags 2" to the taprio qdisc) and gives one more example that tries to
illustrate some of the finer points related to the usage.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
David Ahern [Tue, 2 Feb 2021 02:45:49 +0000 (02:45 +0000)]
Merge branch 'devlink-port-mgmt' into next
Parav Pandit says:
====================
This patchset implements devlink port add, delete and function state
management commands.
An example sequence for a PCI SF:
Set the device in switchdev mode:
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
View ports in switchdev mode:
$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 s=
plittable false
Add a subfunction port for PCI PF 0 with sfnumber 88:
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfn=
um 0 sfnum 88 splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
Show a newly added port:
$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf contro=
ller 0 pfnum 0 sfnum 88 splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
Set the function state to active:
$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:8=
8 state active
Show the port in JSON format:
$ devlink port show pci/0000:06:00.0/32768 -jp
{
"port": {
"pci/0000:06:00.0/32768": {
"type": "eth",
"netdev": "ens2f0npf0sf88",
"flavour": "pcisf",
"controller": 0,
"pfnum": 0,
"sfnum": 88,
"splittable": false,
"function": {
"hw_addr": "00:00:00:00:88:88",
"state": "active",
"opstate": "attached"
}
}
}
}
Set the function state to active:
$ devlink port function set pci/0000:06:00.0/32768 state inactive
Delete the port after use:
$ devlink port del pci/0000:06:00.0/32768
Oliver Hartkopp [Mon, 25 Jan 2021 10:40:55 +0000 (11:40 +0100)]
iplink_can: add Classical CAN frame LEN8_DLC support
The len8_dlc element is filled by the CAN interface driver and used for CAN
frame creation by the CAN driver when the CAN_CTRLMODE_CC_LEN8_DLC flag is
supported by the driver and enabled via netlink configuration interface.
Add the command line support for cc-len8-dlc for Linux 5.11+
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David Ahern <dsahern@kernel.org>
Jarod Wilson [Fri, 15 Jan 2021 19:21:37 +0000 (14:21 -0500)]
bond: support xmit_hash_policy=vlan+srcmac
There's a new transmit hash policy being added to the bonding driver that
is a simple XOR of vlan ID and source MAC, xmit_hash_policy vlan+srcmac.
This trivial patch makes it configurable and queryable via iproute2.
Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Luca Boccassi [Sun, 17 Jan 2021 22:54:27 +0000 (22:54 +0000)]
vrf: fix ip vrf exec with libbpf
The size of bpf_insn is passed to bpf_load_program instead of the number
of elements as it expects, so ip vrf exec fails with:
$ sudo ip link add vrf-blue type vrf table 10
$ sudo ip link set dev vrf-blue up
$ sudo ip/ip vrf exec vrf-blue ls
Failed to load BPF prog: 'Invalid argument'
last insn is not an exit or jmp
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
Kernel compiled with CGROUP_BPF enabled?
https://bugs.debian.org/980046
Reported-by: Emmanuel DECAEN <Emmanuel.Decaen@xsalto.com> Signed-off-by: Luca Boccassi <bluca@debian.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Roi Dayan [Tue, 12 Jan 2021 10:33:17 +0000 (12:33 +0200)]
build: Fix link errors on some systems
Since moving get_rate() and get_size() from tc to lib, on some
systems we fail to link because of missing math lib.
Move the functions that require math lib to their own c file
and add -lm to dcb that now use those functions.
../lib/libutil.a(utils.o): In function `get_rate':
utils.c:(.text+0x10dc): undefined reference to `floor'
../lib/libutil.a(utils.o): In function `get_size':
utils.c:(.text+0x1394): undefined reference to `floor'
../lib/libutil.a(json_print.o): In function `sprint_size':
json_print.c:(.text+0x14c0): undefined reference to `rint'
json_print.c:(.text+0x14f4): undefined reference to `rint'
json_print.c:(.text+0x157c): undefined reference to `rint'
Fixes: f3be0e6366ac ("lib: Move get_rate(), get_rate64() from tc here") Fixes: 44396bdfcc0a ("lib: Move get_size() from tc here") Fixes: adbe5de96662 ("lib: Move sprint_size() from tc here, add print_size()") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Tested-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
David Ahern [Mon, 18 Jan 2021 04:10:27 +0000 (04:10 +0000)]
Merge branch 'dcb-app-dcbx' into next
Petr Machata says:
====================
Add support to the dcb tool for the following two DCB objects:
- APP, which allows configuration of traffic prioritization rules based on
several possible packet headers.
- DCBX, which is a 1-byte bitfield of flags that configure whether the DCBX
protocol is implemented in the device or in the host, and which version
of the protocol should be used.
Patch #1 adds a new helper for finding a name of a given dsfield value.
This is useful for APP DSCP-to-priority rules, which can use human-readable
DSCP names.
Patches #2, #3 and #4 extend existing interfaces for, respectively, parsing
of the X:Y mappings, for setting a DCB object, and for getting a DCB
object.
In patch #5, support for the command line argument -N / --Numeric is
added. The APP tool later uses it to decide whether to format DSCP values
as human-readable strings or as plain numbers.
Patches #6 and #7 add the subtools themselves and their man pages.
v2:
- Two patches dropped and sent to iproute2 branch as "dcb: Fixes".
This patch set now depends on that one.
- Patch #5:
- Make it -N / --Numeric instead of -n / --no-nice-names
- Rename the flag from no_nice_names to numeric as well
- Patch #6:
- Adjust to s/no_nice_names/numeric/ from another patch.
Petr Machata [Sat, 2 Jan 2021 00:03:41 +0000 (01:03 +0100)]
dcb: Add a subtool for the DCBX object
The Linux DCBX object is a 1-byte bitfield of flags that configure whether
the DCBX protocol is implemented in the device or in the host, and which
version of the protocol should be used. Add a tool to access the per-port
Linux DCBX object.
For example:
# dcb dcbx set dev eni1np1 host ieee
# dcb dcbx show dev eni1np1
host ieee
Signed-off-by: Petr Machata <me@pmachata.org> Signed-off-by: David Ahern <dsahern@kernel.org>
Petr Machata [Sat, 2 Jan 2021 00:03:39 +0000 (01:03 +0100)]
dcb: Support -N to suppress translation to human-readable names
Some DSCP values can be translated to symbolic names. That may not be
always desirable. Introduce a command-line option similar to other tools,
-N or --Numeric, to suppress this translation.
Signed-off-by: Petr Machata <me@pmachata.org> Signed-off-by: David Ahern <dsahern@kernel.org>
Petr Machata [Sat, 2 Jan 2021 00:03:38 +0000 (01:03 +0100)]
dcb: Generalize dcb_get_attribute()
The function dcb_get_attribute() assumes that the caller knows the exact
size of the looked-for payload. It also assumes that the response comes
wrapped in an DCB_ATTR_IEEE nest. The former assumption does not hold for
the IEEE APP table, which has variable size. The latter one does not hold
for DCBX, which is not IEEE-nested, and also for any CEE attributes, which
would come CEE-nested.
Factor out the payload extractor from the current dcb_get_attribute() code,
and put into a helper. Then rewrite dcb_get_attribute() compatibly in terms
of the new function. Introduce dcb_get_attribute_va() as a thin wrapper for
IEEE-nested access, and dcb_get_attribute_bare() for access to attributes
that are not nested.
Signed-off-by: Petr Machata <me@pmachata.org> Signed-off-by: David Ahern <dsahern@kernel.org>
Petr Machata [Sat, 2 Jan 2021 00:03:37 +0000 (01:03 +0100)]
dcb: Generalize dcb_set_attribute()
The function dcb_set_attribute() takes a fully-formed payload as an
argument. For callers that need to build a nested attribute, such as is the
case for DCB APP table, this is not great, because with libmnl, they would
need to construct a separate netlink message just to pluck out the payload
and hand it over to this function.
Currently, dcb_set_attribute() also always wraps the payload in an
DCB_ATTR_IEEE container, because that is what all the dcb subtools so far
needed. But that is not appropriate for DCBX in particular, and in fact a
handful other attributes, as well as any CEE payloads.
Instead, generalize this code by adding parameters for constructing a
custom payload and for fetching the response from a custom response
attribute. Then add dcb_set_attribute_va(), which takes a callback to
invoke in the right place for the nest to be built, and
dcb_set_attribute_bare(), which is similar to dcb_set_attribute(), but does
not encapsulate the payload in an IEEE container. Rewrite
dcb_set_attribute() compatibly in terms of the new functions.
Signed-off-by: Petr Machata <me@pmachata.org> Signed-off-by: David Ahern <dsahern@kernel.org>
Petr Machata [Sat, 2 Jan 2021 00:03:36 +0000 (01:03 +0100)]
lib: Generalize parse_mapping()
The function parse_mapping() assumes the key is a number, with a single
configurable exception, which is using "all" to mean "all possible keys".
If a caller wishes to use symbolic names instead of numbers, they cannot
reuse this function.
To facilitate reuse in these situations, convert parse_mapping() into a
helper, parse_mapping_gen(), which instead of an allow-all boolean takes a
generic key-parsing callback. Rewrite parse_mapping() in terms of this
newly-added helper and add a pair of key parsers, one for just numbers,
another for numbers and the keyword "all". Publish the latter as well.
Signed-off-by: Petr Machata <me@pmachata.org> Signed-off-by: David Ahern <dsahern@kernel.org>
Petr Machata [Sat, 2 Jan 2021 00:03:35 +0000 (01:03 +0100)]
lib: rt_names: Add rtnl_dsfield_get_name()
For formatting DSCP (not full dsfield), it would be handy to be able to
just get the name from the name table, and not get any of the remaining
cruft related to formatting. Add a new entry point to just fetch the
name table string uninterpreted. Use it from rtnl_dsfield_n2a().
Signed-off-by: Petr Machata <me@pmachata.org> Signed-off-by: David Ahern <dsahern@kernel.org>
$ tc -json filter show dev eth0 ingress
...{"eth_type":"8847",
" mpls":[" lse":["depth":1,"label":100],
" lse":["depth":2,"label":200]]}...
This is invalid as the arrays, introduced by "[", can't contain raw
string:value pairs. Those must be enclosed into "{}" to form valid json
ojects. Also, there are spurious whitespaces before the mpls and lse
strings because of the indentation used for normal output.
Fix this by putting all LSE parameters (depth, label, tc, bos and ttl)
into the same json object. The "mpls" key now directly contains a list
of such objects.
Also, handle strings differently for normal and json output, so that
json strings don't get spurious indentation whitespaces.
Normal output isn't modified.
The json output now looks like:
$ tc -json filter show dev eth0 ingress
...{"eth_type":"8847",
"mpls":[{"depth":1,"label":100},
{"depth":2,"label":200}]}...
Fixes: eb09a15c12fb ("tc: flower: support multiple MPLS LSE match") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Petr Machata [Sun, 3 Jan 2021 10:57:24 +0000 (11:57 +0100)]
dcb: Change --Netns/-N to --netns/-n
This to keep compatible with the major tools, ip and tc. Also
document the option in the man page, which was neglected.
Fixes: 67033d1c1c8a ("Add skeleton of a new tool, dcb") Signed-off-by: Petr Machata <me@pmachata.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Petr Machata [Sun, 3 Jan 2021 10:57:23 +0000 (11:57 +0100)]
dcb: Plug a leaking DCB socket buffer
DCB socket buffer is allocated in dcb_init(), but never freed(). Free it
in dcb_fini().
Fixes: 67033d1c1c8a ("Add skeleton of a new tool, dcb") Signed-off-by: Petr Machata <me@pmachata.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Petr Machata [Sun, 3 Jan 2021 10:57:22 +0000 (11:57 +0100)]
dcb: Set values with RTM_SETDCB type
dcb currently sends all netlink messages with a type RTM_GETDCB, even the
set ones. Change to the appropriate type.
Fixes: 67033d1c1c8a ("Add skeleton of a new tool, dcb") Signed-off-by: Petr Machata <me@pmachata.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add support in rdma for extack errors to be received
in userspace when sent from kernel, so now netlink extack
error messages sent from kernel would be printed for the
user.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Ido Schimmel [Thu, 7 Jan 2021 15:23:26 +0000 (17:23 +0200)]
nexthop: Fix usage output
Before:
# ip nexthop help
Usage: ip nexthop { list | flush } [ protocol ID ] SELECTOR
ip nexthop { add | replace } id ID NH [ protocol ID ]
ip nexthop { get| del } id ID
SELECTOR := [ id ID ] [ dev DEV ] [ vrf NAME ] [ master DEV ]
[ groups ] [ fdb ]
NH := { blackhole | [ via ADDRESS ] [ dev DEV ] [ onlink ]
[ encap ENCAPTYPE ENCAPHDR ] | group GROUP ] }
GROUP := [ id[,weight]>/<id[,weight]>/... ]
ENCAPTYPE := [ mpls ]
ENCAPHDR := [ MPLSLABEL ]
After:
# ip nexthop help
Usage: ip nexthop { list | flush } [ protocol ID ] SELECTOR
ip nexthop { add | replace } id ID NH [ protocol ID ]
ip nexthop { get | del } id ID
SELECTOR := [ id ID ] [ dev DEV ] [ vrf NAME ] [ master DEV ]
[ groups ] [ fdb ]
NH := { blackhole | [ via ADDRESS ] [ dev DEV ] [ onlink ]
[ encap ENCAPTYPE ENCAPHDR ] | group GROUP [ fdb ] }
GROUP := [ <id[,weight]>/<id[,weight]>/... ]
ENCAPTYPE := [ mpls ]
ENCAPHDR := [ MPLSLABEL ]
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Guillaume Nault [Mon, 14 Dec 2020 17:14:22 +0000 (18:14 +0100)]
testsuite: Add mpls packet matching tests for tc flower
Match all MPLS fields using smallest and highest possible values.
Test the two ways of specifying MPLS header matching:
* with the basic mpls_{label,tc,bos,ttl} keywords (match only on the
first LSE),
* with the more generic "lse" keyword (allows matching at different
depth of the MPLS label stack).
This test file allows to find problems like the one fixed by
Linux commit 7fdd375e3830 ("net: sched: Fix dump of MPLS_OPT_LSE_LABEL
attribute in cls_flower").
Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Thomas Karlsson [Mon, 14 Dec 2020 10:42:18 +0000 (11:42 +0100)]
iplink:macvlan: Added bcqueuelen parameter
This patch allows the user to set and retrieve the
IFLA_MACVLAN_BC_QUEUE_LEN parameter via the bcqueuelen
command line argument
This parameter controls the requested size of the queue for
broadcast and multicast packages in the macvlan driver.
If not specified, the driver default (1000) will be used.
Note: The request is per macvlan but the actually used queue
length per port is the maximum of any request to any macvlan
connected to the same port.
For this reason, the used queue length IFLA_MACVLAN_BC_QUEUE_LEN_USED
is also retrieved and displayed in order to aid in the understanding
of the setting. However, it can of course not be directly set.
Signed-off-by: Thomas Karlsson <thomas.karlsson@paneda.se> Signed-off-by: David Ahern <dsahern@gmail.com>
Andrea Claudi [Fri, 11 Dec 2020 18:53:03 +0000 (19:53 +0100)]
tc: pedit: fix memory leak in print_pedit
keys_ex is dinamically allocated with calloc on line 770, but
is not freed in case of error at line 823.
Fixes: 081d6c310d3a ("tc: pedit: Support JSON dumping") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Fri, 11 Dec 2020 18:53:02 +0000 (19:53 +0100)]
devlink: fix memory leak in cmd_dev_flash()
nlg_ntf is dinamically allocated in mnlg_socket_open(), and is freed on
the out: return path. However, some error paths do not free it,
resulting in memory leak.
This commit fix this using mnlg_socket_close(), and reporting the
correct error number when required.
Fixes: 9b13cddfe268 ("devlink: implement flash status monitoring") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Fri, 11 Dec 2020 18:26:40 +0000 (19:26 +0100)]
man: tc-flower: fix manpage
Commit 924c43778a84 ("man: tc-ct.8: Add manual page for ct tc action")
add man page for tc-ct, but it brings with it a bogus block of text
in the benning of tc-flower man page.
This commit simply removes it.
Fixes: 924c43778a84 ("man: tc-ct.8: Add manual page for ct tc action") Reported-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
David Ahern [Mon, 14 Dec 2020 16:48:38 +0000 (16:48 +0000)]
Merge branch 'dcb-pfc-buffer-maxrate' into next
Petr Machata says:
====================
Add support to the dcb tool for the following three DCB objects:
- PFC, for "Priority-based Flow Control", allows configuration of priority
lossiness, and related toggles.
- DCBNL buffer interfaces are an extension to the 802.1q DCB interfaces and
allow configuration of port headroom buffers.
- DCBNL maxrate interfaces are an extension to the 802.1q DCB interfaces
and allow configuration of rate with which traffic in a given traffic
class is sent.
Patches #1-#4 fix small issues in the current DCB code and man pages.
Patch #5 adds new helpers to the DCB dispatcher.
Patches #6 and #7 add support for command line arguments -s and -i. These
enable, respectively, display of statistical counters, and ISO/IEC mode of
rate units.
Patches #8-#10 add the subtools themselves and their man pages.
Petr Machata [Thu, 10 Dec 2020 23:02:24 +0000 (00:02 +0100)]
dcb: Add a subtool for the DCB maxrate object
DCBNL maxrate interfaces are an extension to the 802.1q DCB interfaces and
allow configuration of rate with which traffic in a given traffic class is
sent.
Add a dcb subtool to allow showing and tweaking of this per-TC maximum
rate. For example:
# dcb maxrate show dev eni1np1
tc-maxrate 0:25Gbit 1:25Gbit 2:25Gbit 3:25Gbit 4:25Gbit 5:25Gbit 6:100Gbit 7:25Gbit
Signed-off-by: Petr Machata <me@pmachata.org> Signed-off-by: David Ahern <dsahern@gmail.com>
Petr Machata [Thu, 10 Dec 2020 23:02:19 +0000 (00:02 +0100)]
dcb: Add dcb_set_u32(), dcb_set_u64()
The DCB buffer object has a settable array of 32-bit quantities, and the
maxrate object of 64-bit ones. Adjust dcb_parse_mapping() and related
helpers to support 64-bit values in mappings, and add appropriate helpers.
Signed-off-by: Petr Machata <me@pmachata.org> Signed-off-by: David Ahern <dsahern@gmail.com>
Petr Machata [Thu, 10 Dec 2020 23:02:15 +0000 (00:02 +0100)]
dcb: Remove unsupported command line arguments from getopt_long()
getopt_long() currently includes "c" and "n" in the short option string.
These probably slipped in as a cut'n'paste, and are not actually accepted.
Remove them.
Signed-off-by: Petr Machata <me@pmachata.org> Signed-off-by: David Ahern <dsahern@gmail.com>
Moshe Shemesh [Mon, 7 Dec 2020 05:35:22 +0000 (07:35 +0200)]
devlink: Add reload stats to dev show
Show reload statistics through devlink dev show using devlink stats
flag. The reload statistics show the history per reload action type and
limit. Add remote reload statistics to show the history of actions
performed due devlink reload commands initiated by remote host.
Moshe Shemesh [Mon, 7 Dec 2020 05:35:20 +0000 (07:35 +0200)]
devlink: Add devlink reload action and limit options
Add reload action and reload limit to devlink reload command to enable
the user to select the reload action required and constrains limits on
these actions that he may want to ensure.
The following reload actions are supported:
driver_reinit: driver entities re-initialization, applying
devlink-param and devlink-resource values.
fw_activate: firmware activate.
The uAPI is backward compatible, if the reload action option is omitted
from the reload command, the driver reinit action will be used.
Note that when required to do firmware activation some drivers may need
to reload the driver. On the other hand some drivers may need to reset
the firmware to reinitialize the driver entities. Therefore, the devlink
reload command returns the actions which were actually performed.
By default reload actions are not limited and driver implementation may
include reset or downtime as needed to perform the actions. However, if
reload limit is selected, the driver should perform only if it can do it
while keeping the limit constraints.
Reload limit added:
no_reset: No reset allowed, no down time allowed, no link flap and no
configuration is lost.
Command examples:
$devlink dev reload pci/0000:82:00.0 action driver_reinit
reload_actions_performed:
driver_reinit
$devlink dev reload pci/0000:82:00.0 action fw_activate
reload_actions_performed:
driver_reinit fw_activate
David Ahern [Wed, 9 Dec 2020 02:32:17 +0000 (02:32 +0000)]
Merge branch 'rate-size-parsing-output' into next
Petr Machata says:
==================
The DCB tool will have commands that deal with buffer sizes and traffic
rates. TC is another tool that has a number of such commands, and functions
to support them: get_size(), get_rate/64(), s/print_size() and
s/print_rate(). In this patchset, these functions are moved from TC to lib/
for possible reuse and modernized.
s/print_rate() has a hidden parameter of a global variable use_iec, which
made the conversion non-trivial. The parameter was made explicit,
print_rate() converted to a mostly json_print-like function, and
sprint_rate() retired in favor of the new print_rate. Patches #1 and #2
deal with this.
The intention was to treat s/print_size() similarly, but unfortunately two
use cases of sprint_size() cannot be converted to a json_print-like
print_size(), and the function sprint_size() had to remain as a discouraged
backdoor to print_size(). This is done in patch #3.
Patch #4 then improves the code of sprint_size() a little bit.
Patch #5 fixes a buglet in formatting small rates in IEC mode.
Patches #6 and #7 handle a routine movement of, respectively,
get_rate/64() and get_size() from tc to lib.
This patchset does not actually add any new uses of these functions. A
follow-up patchset will add subtools for management of DCB buffer and DCB
maxrate objects that will make use of them.
Petr Machata [Sat, 5 Dec 2020 21:13:35 +0000 (22:13 +0100)]
lib: Move get_size() from tc here
The function get_size() serves for parsing of sizes using a handly notation
that supports units and their prefixes, such as 10Kbit. This will be useful
for the DCB buffer size parsing. Move the function from TC to the general
library, so that it can be reused.
Signed-off-by: Petr Machata <me@pmachata.org> Signed-off-by: David Ahern <dsahern@gmail.com>
Petr Machata [Sat, 5 Dec 2020 21:13:34 +0000 (22:13 +0100)]
lib: Move get_rate(), get_rate64() from tc here
The functions get_rate() and get_rate64() are useful for parsing rate-like
values. The DCB tool will find these useful in the maxrate subtool.
Move them over to lib so that they can be easily reused.
Signed-off-by: Petr Machata <me@pmachata.org> Signed-off-by: David Ahern <dsahern@gmail.com>
Petr Machata [Sat, 5 Dec 2020 21:13:33 +0000 (22:13 +0100)]
lib: print_color_rate(): Fix formatting small rates in IEC mode
ISO/IEC units are distinguished from the decadic ones by using a prefixes
like "Ki", "Mi" instead of "K" and "M". The current code inserts the letter
"i" after the decadic unit when in IEC mode. However it does so even when
the prefix is an empty string, formatting 1Kbit in IEC mode as "1000ibit".
Fix by omitting the letter if there is no prefix.
Signed-off-by: Petr Machata <me@pmachata.org> Signed-off-by: David Ahern <dsahern@gmail.com>