git.proxmox.com Git - mirror_iproute2.git/log
mirror_iproute2.git
4 years ago ss: sctp: Formatting tweak in sctp_show_info for locals
Patrick Talbert [Sat, 3 Aug 2019 08:47:08 +0000 (10:47 +0200)]
ss: sctp: Formatting tweak in sctp_show_info for locals

'locals' output does not include a leading space so it runs up against
skmem:() output. Add a leading space to fix it.

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago ss: sctp: fix typo for nodelay
Patrick Talbert [Sat, 3 Aug 2019 08:37:41 +0000 (10:37 +0200)]
ss: sctp: fix typo for nodelay

nodealy should be nodelay.

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago devlink: finish queue.h to list.h transition
Jiri Pirko [Mon, 5 Aug 2019 09:56:56 +0000 (11:56 +0200)]
devlink: finish queue.h to list.h transition

Lose the "q" from the names and name the structure fields the same
way the rest of the code does. Also, fix the list_add argument order,
which leads to a segfault.

Fixes: 33267017faf1 ("iproute2: devlink: port from sys/queue.h to list.h")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
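The argument-order problem can be illustrated with a minimal intrusive list. This is only a sketch in the style of a kernel-like list.h, not the devlink code; every name ending in _sketch is hypothetical. The point it shows is that the new entry comes first and the list head second.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, kernel list.h style.
 * All names are hypothetical and only illustrate argument order. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init_sketch(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert 'new_node' right after 'head': new entry first, head second. */
static void list_add_sketch(struct list_node *new_node, struct list_node *head)
{
	new_node->next = head->next;
	new_node->prev = head;
	head->next->prev = new_node;
	head->next = new_node;
}

/* Count entries by walking until we are back at the head. */
static size_t list_len_sketch(const struct list_node *head)
{
	size_t n = 0;
	const struct list_node *pos;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}
```

With the arguments swapped, the function would read the next pointer of an entry that was never initialised, which is the kind of mistake that can end in a segfault.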
4 years ago tc: fflush after each command in batch mode
Stephen Hemminger [Fri, 2 Aug 2019 16:33:39 +0000 (09:33 -0700)]
tc: fflush after each command in batch mode

Restore behaviour of tc batch mode.
Flush stdout after each command.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
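A minimal sketch of a batch loop that flushes stdout after every command, so output stays in order when stdout is a pipe or a file. The function names, the handler type and the fixed line buffer are assumptions made for illustration; this is not the tc batch code.

```c
#include <stdio.h>

/* Hypothetical per-line handler type; returns 0 on success. */
typedef int (*batch_cmd_fn)(const char *line);

/* Example handler: echoes the command, treats an empty line as an error. */
static int echo_cmd_sketch(const char *line)
{
	if (line[0] == '\n' || line[0] == '\0')
		return -1;
	fputs(line, stdout);
	return 0;
}

/* Run every line of 'in' through 'run' and flush stdout after each one.
 * Returns the number of commands that failed. */
static int batch_sketch(FILE *in, batch_cmd_fn run)
{
	char line[1024];
	int failures = 0;

	while (fgets(line, sizeof(line), in)) {
		if (run(line) != 0)
			failures++;
		fflush(stdout);
	}
	return failures;
}
```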
4 years ago Revert "tc: Add batchsize feature for filter and actions"
Stephen Hemminger [Thu, 1 Aug 2019 00:27:59 +0000 (17:27 -0700)]
Revert "tc: Add batchsize feature for filter and actions"

This reverts commit 485d0c6001c4aa134b99c86913d6a7089b7b2ab0.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago Revert "tc: fix batch force option"
Stephen Hemminger [Thu, 1 Aug 2019 00:19:33 +0000 (17:19 -0700)]
Revert "tc: fix batch force option"

This reverts commit b133392468d1f404077a8f3554d1f63d48bb45e8.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago Revert "tc: flush after each command in batch mode"
Stephen Hemminger [Thu, 1 Aug 2019 00:19:18 +0000 (17:19 -0700)]
Revert "tc: flush after each command in batch mode"

This reverts commit d66fdfda71e4a30c1ca0ddb7b1a048bef30fe79e.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago Revert "tc: Remove pointless assignments in batch()"
Stephen Hemminger [Thu, 1 Aug 2019 00:16:54 +0000 (17:16 -0700)]
Revert "tc: Remove pointless assignments in batch()"

This reverts commit 6358bbc381c6e38465838370bcbbdeb77ec3565a.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago rdma: Document adaptive-moderation
Yamin Friedman [Mon, 29 Jul 2019 07:42:26 +0000 (10:42 +0300)]
rdma: Document adaptive-moderation

Document how to set adaptive-moderation for the ib device.

Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
4 years ago rdma: Control CQ adaptive moderation (DIM)
Yamin Friedman [Mon, 29 Jul 2019 07:42:25 +0000 (10:42 +0300)]
rdma: Control CQ adaptive moderation (DIM)

In order to set adaptive-moderation for an ib device the command is:
rdma dev set [DEV] adaptive-moderation [on|off]

rdma dev show -d
0: mlx5_0: node_type ca fw 16.25.0319 node_guid 248a:0703:00a5:29d0
sys_image_guid 248a:0703:00a5:29d0 adaptive-moderation on
caps: <BAD_PKEY_CNTR, BAD_QKEY_CNTR, AUTO_PATH_MIG, CHANGE_PHY_PORT,
PORT_ACTIVE_EVENT, SYS_IMAGE_GUID, RC_RNR_NAK_GEN, MEM_WINDOW, XRC,
MEM_MGT_EXTENSIONS, BLOCK_MULTICAST_LOOPBACK, MEM_WINDOW_TYPE_2B,
RAW_IP_CSUM, CROSS_CHANNEL, MANAGED_FLOW_STEERING, SIGNATURE_HANDOVER,
ON_DEMAND_PAGING, SG_GAPS_REG, RAW_SCATTER_FCS, PCI_WRITE_END_PADDING>

rdma resource show cq
dev mlx5_0 cqn 0 cqe 1023 users 4 poll-ctx UNBOUND_WORKQUEUE
adaptive-moderation off comm [ib_core]

Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
4 years ago json_print: drop extra semi-colons
Stephen Hemminger [Mon, 29 Jul 2019 15:45:32 +0000 (08:45 -0700)]
json_print: drop extra semi-colons

The _PRINT_FUNC() macro expands to a function definition.
Following it with a semi-colon is unnecessary and causes warnings with -pedantic.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
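A sketch of why the extra semi-colon is flagged, assuming the macro expands to a complete function definition (an assumption about json_print.h; the macro and function names below are hypothetical). A semi-colon after the closing brace would be an empty declaration at file scope, which compilers may warn about under -pedantic.

```c
#include <stdio.h>

/* Hypothetical macro in the spirit of _PRINT_FUNC(): each use expands to a
 * complete function definition, closing brace included. */
#define DEFINE_PRINT_SKETCH(type_name, type, fmt)			\
	static int print_##type_name##_sketch(char *buf, size_t len,	\
					      type value)		\
	{								\
		return snprintf(buf, len, fmt, value);			\
	}

/* No trailing semi-colon after the macro invocations. */
DEFINE_PRINT_SKETCH(int, int, "%d")
DEFINE_PRINT_SKETCH(uint, unsigned int, "%u")
```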
4 years ago utils: Fix get_s64() function
Kurt Kanzenbach [Thu, 4 Jul 2019 12:24:27 +0000 (14:24 +0200)]
utils: Fix get_s64() function

get_s64() internally uses strtoll() to parse the value out of a given
string. strtoll() returns a long long. However, the intermediate variable is
only a long, which might be 32 bits wide on some systems. So, fix it.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
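A minimal sketch of a signed 64-bit parser that keeps the strtoll() result in a long long, so nothing is truncated on systems where long is 32 bits. The function name, signature and return convention are hypothetical and do not claim to match get_s64().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 0 on success, -1 on bad input or overflow. */
static int parse_s64_sketch(int64_t *val, const char *arg, int base)
{
	long long res;	/* same type that strtoll() returns */
	char *end;

	if (!arg || !*arg)
		return -1;

	errno = 0;
	res = strtoll(arg, &end, base);
	if (end == arg || *end != '\0')
		return -1;	/* no digits, or trailing garbage */
	if (errno == ERANGE)
		return -1;	/* out of range for long long */
	if (res < INT64_MIN || res > INT64_MAX)
		return -1;	/* only matters if long long exceeds 64 bits */

	*val = res;
	return 0;
}
```

Had res been declared as a 32-bit long, a value such as 4294967296 could not be stored without loss.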
4 years ago iplink: document 'change' option to ip link
Stephen Hemminger [Fri, 26 Jul 2019 21:59:59 +0000 (14:59 -0700)]
iplink: document 'change' option to ip link

Add the command alias "change" to man page.
Don't show it on usage, since it is not commonly used.

Reported-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Matteo Croce <mcroce@redhat.com>
4 years ago iplink_can: fix format output of clock with flag -details
Antonio Borneo [Fri, 26 Jul 2019 13:06:09 +0000 (15:06 +0200)]
iplink_can: fix format output of clock with flag -details

The command
ip -details link show can0
prints in the last line the value of the clock frequency attached
to the name of the following value "numtxqueues", e.g.
clock 49500000numtxqueues 1 numrxqueues 1 gso_max_size
 65536 gso_max_segs 65535

Add the missing space after the clock value.

Signed-off-by: Antonio Borneo <borneo.antonio@gmail.com>
4 years ago iproute2: devlink: port from sys/queue.h to list.h
Sergei Trofimovich [Fri, 26 Jul 2019 21:01:05 +0000 (22:01 +0100)]
iproute2: devlink: port from sys/queue.h to list.h

sys/queue.h does not exist on linux-musl targets, and the build fails with:

    devlink.c:28:10: fatal error: sys/queue.h: No such file or directory
       28 | #include <sys/queue.h>
          |          ^~~~~~~~~~~~~

The change ports to the list.h API and drops the dependency on 'sys/queue.h'.
The API maps one-to-one.

Build-tested on linux-musl and linux-glibc.

Bug: https://bugs.gentoo.org/690486
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: netdev@vger.kernel.org
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago uapi: update kernel headers from 5.3-rc1
Stephen Hemminger [Mon, 22 Jul 2019 16:45:09 +0000 (09:45 -0700)]
uapi: update kernel headers from 5.3-rc1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago rdma: Document counter statistic
Mark Zhang [Wed, 17 Jul 2019 14:31:56 +0000 (17:31 +0300)]
rdma: Document counter statistic

Document how to access QP counters, including binding/unbinding a QP
to a counter manually or automatically, and dumping counter statistics.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago rdma: Add default counter show support
Mark Zhang [Wed, 17 Jul 2019 14:31:55 +0000 (17:31 +0300)]
rdma: Add default counter show support

Show default counter statistics, the same ones exposed through the sysfs
interface: /sys/class/infiniband/<dev>/ports/<port>/hw_counters/

Example:
$ rdma stat show link mlx5_2/1
link mlx5_2/1 rx_write_requests 8 rx_read_requests 4 rx_atomic_requests 0
out_of_buffer 0 out_of_sequence 0 duplicate_request 0 rnr_nak_retry_err 0
packet_seq_err 0 implied_nak_seq_err 0 local_ack_timeout_err 0
resp_local_length_error 0 resp_cqe_error 0 req_cqe_error 0
req_remote_invalid_request 0 req_remote_access_errors 0
resp_remote_access_errors 0 resp_cqe_flush_error 0 req_cqe_flush_error 0
rp_cnp_ignored 0 rp_cnp_handled 0 np_ecn_marked_roce_packets 0
np_cnp_sent 0 rx_icrc_encapsulated 0

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago rdma: Add stat manual mode support
Mark Zhang [Wed, 17 Jul 2019 14:31:54 +0000 (17:31 +0300)]
rdma: Add stat manual mode support

In manual mode a QP can be manually bound to a counter. If the counter
id (cntn) is not specified, the kernel will allocate one. After a
successful bind, the cntn can be seen through "rdma statistic qp show".
On unbind, if lqpn is not specified, all QPs on this counter will
be unbound.
The manual and auto modes are mutually exclusive.

Examples:
$ rdma statistic qp bind link mlx5_2/1 lqpn 178
$ rdma statistic qp bind link mlx5_2/1 lqpn 178 cntn 4
$ rdma statistic qp unbind link mlx5_2/1 cntn 4
$ rdma statistic qp unbind link mlx5_2/1 cntn 4 lqpn 178

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago rdma: Make get_port_from_argv() returns valid port in strict port mode
Mark Zhang [Wed, 17 Jul 2019 14:31:53 +0000 (17:31 +0300)]
rdma: Make get_port_from_argv() returns valid port in strict port mode

When strict_port is set, make get_port_from_argv() return failure if
no valid port is specified.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago rdma: Add rdma statistic counter per-port auto mode support
Mark Zhang [Wed, 17 Jul 2019 14:31:52 +0000 (17:31 +0300)]
rdma: Add rdma statistic counter per-port auto mode support

With per-QP statistic counter support, a user is allowed to monitor
specific QP categories, which are bound to/unbound from counters that
are dynamically allocated/deallocated.

In per-port "auto" mode, QPs are bound to counters automatically
according to common criteria. For example, in a per "type" (qp type)
scheme, all QPs in a process that have the same qp type are bound
automatically to a single counter.
Currently only "type" (qp type) is supported. Examples:

$ rdma statistic qp set link mlx5_2/1 auto type on
$ rdma statistic qp set link mlx5_2/1 auto off

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago rdma: Add get per-port counter mode support
Mark Zhang [Wed, 17 Jul 2019 14:31:51 +0000 (17:31 +0300)]
rdma: Add get per-port counter mode support

Add an interface to show which mode is active. Two modes are supported:
- "auto": In this mode all QPs belonging to one category are bound automatically
  to a single counter set. Currently only "qp type" is supported;
- "manual": In this mode QPs are bound to a counter manually.

Examples:
$ rdma statistic qp mode
0/1: mlx5_0/1: qp auto off
1/1: mlx5_1/1: qp auto off
2/1: mlx5_2/1: qp auto type on
3/1: mlx5_3/1: qp auto off

$ rdma statistic qp mode link mlx5_0
0/1: mlx5_0/1: qp auto off

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago rdma: Add "stat qp show" support
Mark Zhang [Wed, 17 Jul 2019 14:31:50 +0000 (17:31 +0300)]
rdma: Add "stat qp show" support

This patch presents link, id, task name, lqpn, as well as all sub
counters of a QP counter.
A QP counter is a dynamically allocated statistic counter that is
bound to one or more QPs. It has several sub-counters, each
used for a different purpose.

Examples:
$ rdma stat qp show
link mlx5_2/1 cntn 5 pid 31609 comm client.1 rx_write_requests 0
rx_read_requests 0 rx_atomic_requests 0 out_of_buffer 0 out_of_sequence 0
duplicate_request 0 rnr_nak_retry_err 0 packet_seq_err 0
implied_nak_seq_err 0 local_ack_timeout_err 0 resp_local_length_error 0
resp_cqe_error 0 req_cqe_error 0 req_remote_invalid_request 0
req_remote_access_errors 0 resp_remote_access_errors 0
resp_cqe_flush_error 0 req_cqe_flush_error 0
    LQPN: <178>
$ rdma stat show link rocep1s0f5/1
link rocep1s0f5/1 rx_write_requests 0 rx_read_requests 0 rx_atomic_requests 0 out_of_buffer 0 duplicate_request 0
rnr_nak_retry_err 0 packet_seq_err 0 implied_nak_seq_err 0 local_ack_timeout_err 0 resp_local_length_error 0 resp_cqe_error 0
req_cqe_error 0 req_remote_invalid_request 0 req_remote_access_errors 0 resp_remote_access_errors 0 resp_cqe_flush_error 0
req_cqe_flush_error 0 rp_cnp_ignored 0 rp_cnp_handled 0 np_ecn_marked_roce_packets 0 np_cnp_sent 0
$ rdma stat show link rocep1s0f5/1 -p
link rocep1s0f5/1
    rx_write_requests 0
    rx_read_requests 0
    rx_atomic_requests 0
    out_of_buffer 0
    duplicate_request 0
    rnr_nak_retry_err 0
    packet_seq_err 0
    implied_nak_seq_err 0
    local_ack_timeout_err 0
    resp_local_length_error 0
    resp_cqe_error 0
    req_cqe_error 0
    req_remote_invalid_request 0
    req_remote_access_errors 0
    resp_remote_access_errors 0
    resp_cqe_flush_error 0
    req_cqe_flush_error 0
    rp_cnp_ignored 0
    rp_cnp_handled 0
    np_ecn_marked_roce_packets 0
    np_cnp_sent 0

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago uapi: fix bpf comment typo
Stephen Hemminger [Fri, 19 Jul 2019 17:49:36 +0000 (10:49 -0700)]
uapi: fix bpf comment typo

From upstream.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago json: fix backslash escape typo in jsonw_puts
Ivan Delalande [Thu, 18 Jul 2019 01:15:31 +0000 (18:15 -0700)]
json: fix backslash escape typo in jsonw_puts

Fixes: fcc16c22 ("provide common json output formatter")
Signed-off-by: Ivan Delalande <colona@arista.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago ip tunnel: warn when changing IPv6 tunnel without tunnel name
Andrea Claudi [Tue, 9 Jul 2019 13:16:51 +0000 (15:16 +0200)]
ip tunnel: warn when changing IPv6 tunnel without tunnel name

Tunnel change fails if a tunnel name is not specified while using
'ip -6 tunnel change'. However, no warning message is printed and
no error code is returned.

$ ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote fd::2 tos inherit ttl 127 encaplimit none dev dummy0
$ ip -6 tunnel change dev dummy0 local 2001:1234::1 remote 2001:1234::2
$ ip -6 tunnel show ip6tnl1
ip6tnl1: gre/ipv6 remote fd::2 local fd::1 dev dummy0 encaplimit none hoplimit 127 tclass inherit flowlabel 0x00000 (flowinfo 0x00000000)

This commit checks whether the tunnel interface name is an empty
string; if so, it prints a warning message to the user.
It intentionally avoids returning an error so as not to break existing
scripts.

This is the output after this commit:
$ ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote fd::2 tos inherit ttl 127 encaplimit none dev dummy0
$ ip -6 tunnel change dev dummy0 local 2001:1234::1 remote 2001:1234::2
Tunnel interface name not specified
$ ip -6 tunnel show ip6tnl1
ip6tnl1: gre/ipv6 remote fd::2 local fd::1 dev dummy0 encaplimit none hoplimit 127 tclass inherit flowlabel 0x00000 (flowinfo 0x00000000)

Reviewed-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
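A minimal sketch of the warn-but-do-not-fail behaviour described above. The function name, its parameters and its return convention are hypothetical; only the warning text is taken from the output shown in the commit message.

```c
#include <stdio.h>

/* Warn on 'err' when the tunnel name is empty.
 * Returns 1 if a warning was printed, 0 otherwise. The caller is expected
 * to carry on either way, so existing scripts keep their exit status. */
static int warn_if_unnamed_sketch(const char *tunnel_name, FILE *err)
{
	if (tunnel_name[0] == '\0') {
		fprintf(err, "Tunnel interface name not specified\n");
		return 1;
	}
	return 0;
}
```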
4 years ago Revert "ip6tunnel: fix 'ip -6 {show|change} dev <name>' cmds"
Andrea Claudi [Tue, 9 Jul 2019 13:16:50 +0000 (15:16 +0200)]
Revert "ip6tunnel: fix 'ip -6 {show|change} dev <name>' cmds"

This reverts commit ba126dcad20e6d0e472586541d78bdd1ac4f1123.
It breaks tunnel creation when using 'dev' parameter:

$ ip link add type dummy
$ ip -6 tunnel add ip6tnl1 mode ip6ip6 remote 2001:db8:ffff:100::2 local 2001:db8:ffff:100::1 hoplimit 1 tclass 0x0 dev dummy0
add tunnel "ip6tnl0" failed: File exists

The dev parameter must be used to specify the device to which
the tunnel is bound, and not the tunnel itself.

Reported-by: Jianwen Ji <jiji@redhat.com>
Reviewed-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago uapi: rdma netlink.h update
Stephen Hemminger [Tue, 16 Jul 2019 18:58:44 +0000 (11:58 -0700)]
uapi: rdma netlink.h update

From upstream 5.3-rc

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago uapi: update uapi/magic.h
Stephen Hemminger [Tue, 16 Jul 2019 18:56:58 +0000 (11:56 -0700)]
uapi: update uapi/magic.h

From upstream

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago devlink: Remove enclosing array brackets binary print with json format
Aya Levin [Wed, 10 Jul 2019 11:03:21 +0000 (14:03 +0300)]
devlink: Remove enclosing array brackets binary print with json format

Keep pr_out_binary_value function only for printing. Inner relations
like array grouping should be done outside the function.

Fixes: 844a61764c6f ("devlink: Add helper functions for name and value separately")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago devlink: Fix binary values print
Aya Levin [Wed, 10 Jul 2019 11:03:20 +0000 (14:03 +0300)]
devlink: Fix binary values print

Fix function pr_out_binary_value() to start printing the binary buffer
from offset 0 instead of offset 1. Remove the redundant newline at the
beginning of the output.

Example:
With patch:
 mlx5e_txqsq:
   05 00 00 00 05 00 00 00 01 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00 8e 6e 3a 13 07 00 00 00
   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   c0
Without patch
  mlx5e_txqsq:

  00 00 00 05 00 00 00 01 00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 8e 6e 3a 13 07 00 00 00 00
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0

Fixes: 844a61764c6f ("devlink: Add helper functions for name and value separately")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
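A minimal sketch of a hex dump that starts at offset 0 and emits no leading newline, breaking the line after every 16 bytes as in the "With patch" output above. The function name and exact spacing are illustrative assumptions, not the pr_out_binary_value() implementation.

```c
#include <stdio.h>
#include <stddef.h>

static void hexdump_sketch(FILE *out, const unsigned char *data, size_t len)
{
	size_t i;

	/* Start at index 0 so the first byte is not skipped. */
	for (i = 0; i < len; i++) {
		/* Separator before every byte except the first on a line. */
		if (i % 16)
			fputc(' ', out);
		fprintf(out, "%02x", data[i]);
		/* Newline after a full row or after the final byte. */
		if (i % 16 == 15 || i == len - 1)
			fputc('\n', out);
	}
}
```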
4 years ago devlink: Change devlink health dump show command to dumpit
Aya Levin [Wed, 10 Jul 2019 11:03:19 +0000 (14:03 +0300)]
devlink: Change devlink health dump show command to dumpit

Although devlink health dump show command is given per reporter, it
returns large amounts of data. Trying to use the doit cb results in
OUT-OF-BUFFER error. This complementary patch raises the DUMP flag in
order to invoke the dumpit cb. We're safe as no existing drivers
implement the dump health reporter option yet.

Fixes: 041e6e651a8e ("devlink: Add devlink health dump show command")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago utils: don't match empty strings as prefixes
Matteo Croce [Mon, 15 Jul 2019 18:04:30 +0000 (20:04 +0200)]
utils: don't match empty strings as prefixes

iproute has a utility function which checks if a string is a prefix of
another one, to allow use of abbreviated commands, e.g. 'addr' or 'a'
instead of 'address'.

This routine unfortunately considers an empty string as prefix
of any pattern, leading to undefined behaviour when an empty
argument is passed to ip:

    # ip ''
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever

    # tc ''
    qdisc noqueue 0: dev lo root refcnt 2

    # ip address add 192.0.2.0/24 '' 198.51.100.1 dev dummy0
    # ip addr show dev dummy0
    6: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether 02:9d:5e:e9:3f:c0 brd ff:ff:ff:ff:ff:ff
        inet 192.0.2.0/24 brd 198.51.100.1 scope global dummy0
           valid_lft forever preferred_lft forever

Rewrite matches() so it takes care of an empty input, and doesn't
scan the input strings three times: the current implementation
does two strlen() calls and a memcmp() to accomplish the same task.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
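A minimal sketch of a single-pass prefix check that rejects an empty argument. The function name is hypothetical, and the return convention (0 on match, non-zero otherwise) is an assumption chosen to resemble a comparison function; it is not presented as the actual matches() code.

```c
/* Returns 0 when 'prefix' is a non-empty prefix of 'string',
 * non-zero otherwise (including when 'prefix' is empty). */
static int prefix_mismatch_sketch(const char *prefix, const char *string)
{
	if (*prefix == '\0')
		return 1;	/* an empty argument never matches */

	/* Walk both strings once, stopping at the first difference. */
	while (*string != '\0' && *prefix == *string) {
		prefix++;
		string++;
	}

	/* Matched only if the whole prefix was consumed. */
	return *prefix != '\0';
}
```

Both strings are scanned at most once, and no length is computed up front.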
4 years ago tc: util: constrain percentage in 0-100 interval
Andrea Claudi [Sat, 13 Jul 2019 09:44:07 +0000 (11:44 +0200)]
tc: util: constrain percentage in 0-100 interval

parse_percent() currently allows negative percentages or values
above 100% to be specified. However, this does not seem to make sense,
as the function is used for probabilities or bandwidth rates.

Moreover, using negative values leads to erroneous results
(using Bernoulli loss model as example):

$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel -10% limit 10
$ tc qdisc show dev test
qdisc netem 800c: root refcnt 2 limit 10 loss gemodel p 90% r 10% 1-h 100% 1-k 0%

Using values above 100% we have instead:

$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel 140% limit 10
$ tc qdisc show dev test
qdisc netem 800f: root refcnt 2 limit 10 loss gemodel p 40% r 60% 1-h 100% 1-k 0%

This commit adds a check to parse_percent() to ensure that
percentage values stay between 0.0 and 1.0.
The parse_percent_rate() function, which already employs a similar
check, is adjusted accordingly.

With this check in place, we have:

$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel -10% limit 10
Illegal "loss gemodel p"

Fixes: 927e3cfb52b58 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
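A minimal sketch of a percentage parser with the range check described above. The function name and return convention are hypothetical, and accepting an optional trailing '%' is an assumption; this is not the parse_percent() source.

```c
#include <stdlib.h>

/* Accepts strings such as "25%" or "25" and stores 0.25.
 * Returns 0 on success, -1 on malformed or out-of-range input. */
static int parse_percent_sketch(double *val, const char *str)
{
	char *end;
	double p = strtod(str, &end);

	if (end == str)
		return -1;		/* no number at all */
	if (*end != '\0' && !(end[0] == '%' && end[1] == '\0'))
		return -1;		/* only a single trailing '%' allowed */

	p /= 100.0;
	if (!(p >= 0.0 && p <= 1.0))	/* also rejects NaN */
		return -1;

	*val = p;
	return 0;
}
```

With this check, inputs such as "-10%" and "140%" are rejected instead of wrapping around into an unrelated probability.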
4 years ago uapi: fix bpf.h link
Stephen Hemminger [Thu, 11 Jul 2019 22:36:29 +0000 (15:36 -0700)]
uapi: fix bpf.h link

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago tc: print all error messages to stderr
Stephen Hemminger [Tue, 9 Jul 2019 21:25:14 +0000 (14:25 -0700)]
tc: print all error messages to stderr

Many tc modules were printing error messages to stdout.
This is problematic if using JSON or other output formats.
Change all these places to use fprintf(stderr, ...) instead.

Also, remove unnecessary initialization and places
where else is used after error return.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
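A minimal sketch of keeping data and diagnostics on separate streams, so a JSON consumer reading the data stream never sees error text. The function, its parameters and the JSON shape are hypothetical and only illustrate the idea.

```c
#include <stdio.h>

/* Writes a JSON fragment to 'out' on success; on error writes a message
 * to 'err' and leaves 'out' untouched. Returns 0 or -1. */
static int show_limit_sketch(FILE *out, FILE *err, int limit)
{
	if (limit < 0) {
		fprintf(err, "Illegal \"limit\"\n");
		return -1;	/* early return, no else branch needed */
	}
	fprintf(out, "{\"limit\":%d}\n", limit);
	return 0;
}
```

In a real tool the two streams would be stdout and stderr.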
4 years ago Merge branch 'master' into next
David Ahern [Wed, 10 Jul 2019 21:41:13 +0000 (14:41 -0700)]
Merge branch 'master' into next

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years ago Merge branch 'tc-mpls-action' into next
David Ahern [Wed, 10 Jul 2019 21:07:42 +0000 (14:07 -0700)]
Merge branch 'tc-mpls-action' into next

John Hurley says:

====================

Recent kernel additions to TC allow the manipulation of MPLS headers as
filter actions.

The following patchset creates an iproute2 interface to the new actions
and includes documentation on how to use it.

v1->v2:
- change error from print_string() to fprintf(stderr,) (Stephen Hemminger)
- split long line in explain() message (David Ahern)
- use _SL_ instead of \n in print message (David Ahern)

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years ago man: update man pages for TC MPLS actions
John Hurley [Wed, 10 Jul 2019 12:40:40 +0000 (13:40 +0100)]
man: update man pages for TC MPLS actions

Add a man page describing the newly added TC mpls manipulation actions.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years ago tc: add mpls actions
John Hurley [Wed, 10 Jul 2019 12:40:39 +0000 (13:40 +0100)]
tc: add mpls actions

Create a new action type for TC that allows the pushing, popping, and
modifying of MPLS headers.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years ago lib: add mpls_uc and mpls_mc as link layer protocol names
John Hurley [Wed, 10 Jul 2019 12:40:38 +0000 (13:40 +0100)]
lib: add mpls_uc and mpls_mc as link layer protocol names

Update the llproto_names array to allow users to reference the mpls
protocol ids with the names 'mpls_uc' for unicast MPLS and 'mpls_mc' for
multicast.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years ago Import tc_mpls.h uapi header
David Ahern [Wed, 10 Jul 2019 21:05:19 +0000 (14:05 -0700)]
Import tc_mpls.h uapi header

Import tc_mpls.h uapi header from kernel headers at commit:
        1ff2f0fa450e ("net/mlx5e: Return in default case statement in tx_post_resync_params")

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years ago devlink: Introduce PCI PF and VF port flavour and attribute
Parav Pandit [Wed, 10 Jul 2019 12:39:52 +0000 (07:39 -0500)]
devlink: Introduce PCI PF and VF port flavour and attribute

Introduce PCI PF and VF port flavour and port attributes such as PF
number and VF number.

$ devlink port show
pci/0000:05:00.0/0: type eth netdev eth0 flavour pcipf pfnum 0
pci/0000:05:00.0/1: type eth netdev eth1 flavour pcivf pfnum 0 vfnum 0
pci/0000:05:00.0/2: type eth netdev eth2 flavour pcivf pfnum 0 vfnum 1

Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years ago ip: bond: add peer notification delay support
Vincent Bernat [Sun, 7 Jul 2019 17:51:15 +0000 (19:51 +0200)]
ip: bond: add peer notification delay support

Ability to tweak the delay between gratuitous ND/ARP packets has been
added in kernel commit 07a4ddec3ce9 ("bonding: add an option to
specify a delay between peer notifications"), through
IFLA_BOND_PEER_NOTIF_DELAY attribute. Add support to set and show this
value.

Example:

    $ ip -d link set bond0 type bond peer_notify_delay 1000
    $ ip -d link l dev bond0
    2: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
    state UP mode DEFAULT group default qlen 1000
        link/ether 50:54:33:00:00:01 brd ff:ff:ff:ff:ff:ff
        bond mode active-backup active_slave eth0 miimon 100 updelay 0
    downdelay 0 peer_notify_delay 1000 use_carrier 1 arp_interval 0
    arp_validate none arp_all_targets any primary eth0
    primary_reselect always fail_over_mac active xmit_hash_policy
    layer2 resend_igmp 1 num_grat_arp 5 all_slaves_active 0 min_links
    0 lp_interval 1 packets_per_slave 1 lacp_rate slow ad_select
    stable tlb_dynamic_lb 1 addrgenmode eu

Signed-off-by: Vincent Bernat <vincent@bernat.ch>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years ago Update kernel headers
David Ahern [Wed, 10 Jul 2019 20:52:48 +0000 (13:52 -0700)]
Update kernel headers

Update kernel headers to commit:
    1ff2f0fa450e ("net/mlx5e: Return in default case statement in tx_post_resync_params")

import include/uapi/linux/const.h per new dependency in
include/uapi/linux/pkt_cls.h.

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years ago tc: document 'mask' parameter in skbedit man page
Roman Mashak [Mon, 8 Jul 2019 16:06:18 +0000 (12:06 -0400)]
tc: document 'mask' parameter in skbedit man page

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago tc: added mask parameter in skbedit action
Roman Mashak [Mon, 8 Jul 2019 16:06:17 +0000 (12:06 -0400)]
tc: added mask parameter in skbedit action

Add the missing 32-bit mask attribute to iproute2/tc; it has long been
supported on the kernel side.

v2: print value in hex with print_hex() as suggested by Stephen Hemminger.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago ip-route: fix json formatting for metrics
Andrea Claudi [Mon, 8 Jul 2019 09:36:42 +0000 (11:36 +0200)]
ip-route: fix json formatting for metrics

Setting metrics for routes currently leads to non-parsable
JSON output. For example:

$ ip link add type dummy
$ ip route add 192.168.2.0 dev dummy0 metric 100 mtu 1000 rto_min 3
$ ip -j route | jq
parse error: ':' not as part of an object at line 1, column 319

Fix this by opening a JSON object in the metrics array and using
print_string() instead of fprintf().

This is the output of the above commands with this patch applied:

$ ip -j route | jq
[
  {
    "dst": "192.168.2.0",
    "dev": "dummy0",
    "scope": "link",
    "metric": 100,
    "flags": [],
    "metrics": [
      {
        "mtu": 1000,
        "rto_min": 3
      }
    ]
  }
]

Fixes: 663c3cb23103f ("iproute: implement JSON and color output")
Fixes: 968272e791710 ("iproute: refactor metrics print")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reported-by: Frank Hofmann <fhofmann@cloudflare.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years ago devlink: Show devlink port number
Parav Pandit [Tue, 9 Jul 2019 17:26:54 +0000 (12:26 -0500)]
devlink: Show devlink port number

Show devlink port number whenever kernel reports that attribute.

Example output for a physical port:
$ devlink port show
pci/0000:06:00.1/65535: type eth netdev eth1_p1 flavour physical port 1

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years ago ss: Change resolve_services to numeric
David Ahern [Tue, 9 Jul 2019 21:54:34 +0000 (14:54 -0700)]
ss: Change resolve_services to numeric

Commit ca697cee4cfc ("ip: add a new parameter -Numeric") changed
!resolve_services to numeric in ss.c.

A commit in master:
  d791e75d74ff ("ss: in --numeric mode, print raw numbers for data rates")
added another reference to !resolve_services. Convert it to numeric.

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years ago Merge branch 'master' into next
David Ahern [Tue, 9 Jul 2019 21:26:44 +0000 (14:26 -0700)]
Merge branch 'master' into next

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years ago v5.2.0
Stephen Hemminger [Mon, 8 Jul 2019 18:09:59 +0000 (11:09 -0700)]
v5.2.0

4 years ago tc: netem: fix r parameter in Bernoulli loss model
Andrea Claudi [Thu, 27 Jun 2019 14:47:45 +0000 (16:47 +0200)]
tc: netem: fix r parameter in Bernoulli loss model

As the man page for tc netem states:

    To use the Bernoulli model, the only needed parameter is p while the
    others will be set to the default values r=1-p, 1-h=1 and 1-k=0.

However, the r parameter is erroneously set to 1, and not to 1-p.
Fix this using the same approach as the 4-state loss model.

Fixes: 3c7950af598be ("netem: add support for 4 state and GE loss model")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
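A minimal sketch of the Bernoulli defaults quoted from the man page above: given only p, set r = 1-p, 1-h = 1 and 1-k = 0. The struct and function names are hypothetical, and values are plain probabilities in [0, 1] rather than whatever fixed-point form tc passes to the kernel.

```c
/* Gilbert-Elliott loss parameters as probabilities. */
struct ge_params_sketch {
	double p;		/* transition probability, good -> bad */
	double r;		/* transition probability, bad -> good */
	double one_minus_h;	/* loss probability in the bad state */
	double one_minus_k;	/* loss probability in the good state */
};

/* Bernoulli model: only p is supplied, the rest take the defaults. */
static struct ge_params_sketch bernoulli_defaults_sketch(double p)
{
	struct ge_params_sketch ge;

	ge.p = p;
	ge.r = 1.0 - p;		/* not 1.0, which was the bug */
	ge.one_minus_h = 1.0;
	ge.one_minus_k = 0.0;
	return ge;
}
```

With r = 1-p the chance of being in the bad state for the next packet is p regardless of the current state, so each packet is lost independently with probability p, which is what a Bernoulli model means.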
4 years ago ss: in --numeric mode, print raw numbers for data rates
Tomasz Torcz [Tue, 2 Jul 2019 06:53:39 +0000 (08:53 +0200)]
ss: in --numeric mode, print raw numbers for data rates

ss by default shows data rates in human-readable form, as Mbps/Gbps etc.
Enhance --numeric mode to show raw values in bps, without conversion.

Signed-off-by: Tomasz Torcz <tomasz.torcz@nordea.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
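A minimal sketch of the two output styles: a raw bps value in numeric mode, a scaled value otherwise. The function name, the thresholds and the exact unit strings are assumptions for illustration and do not claim to match what ss prints.

```c
#include <stdio.h>

/* Formats 'bps' into 'buf'. When 'numeric' is non-zero the raw value is
 * printed without conversion. Returns what snprintf() returns. */
static int format_rate_sketch(char *buf, size_t len, double bps, int numeric)
{
	if (numeric)
		return snprintf(buf, len, "%.0fbps", bps);
	if (bps >= 1e9)
		return snprintf(buf, len, "%.1fGbps", bps / 1e9);
	if (bps >= 1e6)
		return snprintf(buf, len, "%.1fMbps", bps / 1e6);
	if (bps >= 1e3)
		return snprintf(buf, len, "%.1fKbps", bps / 1e3);
	return snprintf(buf, len, "%.0fbps", bps);
}
```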
4 years ago man: tc-netem.8: fix URL for netem page
Andrea Claudi [Mon, 1 Jul 2019 14:04:41 +0000 (16:04 +0200)]
man: tc-netem.8: fix URL for netem page

The URL for the netem page in the sources section points to a resource
that no longer exists. Fix this by using the correct URL.

Fixes: cd72dcf13c8a4 ("netem: add man-page")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoipaddress: correctly print a VF hw address in the IPoIB case
Denis Kirjanov [Fri, 28 Jun 2019 09:54:25 +0000 (11:54 +0200)]
ipaddress: correctly print a VF hw address in the IPoIB case

The current code assumes that we print an Ethernet MAC address, and
that doesn't work in the IPoIB case with SRIOV-enabled hardware.

Before:
11: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast state UP mode DEFAULT group default qlen 256
        link/infiniband 80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
        vf 0 MAC 14:80:00:00:66:fe, spoof checking off, link-state disable, trust off, query_rss off
    ...

After:
11: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast state UP mode DEFAULT group default qlen 256
        link/infiniband 80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
        vf 0     link/infiniband 80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off

v1->v2: updated kernel headers to uapi commit
v2->v3: fixed alignment
v3->v4: aligned print statements as used through the source

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
[ committer note: flipped argument order for print_vfinfo to keep fp first
  and fixed alignment issues ]

4 years agoUpdate kernel headers
David Ahern [Fri, 28 Jun 2019 23:14:25 +0000 (16:14 -0700)]
Update kernel headers

Update kernel headers to commit:
    5cdda5f1d6ad ("ipv4: enable route flushing in network namespaces")

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoutils: move parse_percent() to tc_util
Andrea Claudi [Fri, 28 Jun 2019 16:03:45 +0000 (18:03 +0200)]
utils: move parse_percent() to tc_util

parse_percent() is used only in tc, so move it there.

This reduces the size of the ip, bridge and genl binaries:

$ bloat-o-meter -t bridge/bridge bridge/bridge.new
add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-109 (-109)
Total: Before=50973, After=50864, chg -0.21%

$ bloat-o-meter -t genl/genl genl/genl.new
add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-109 (-109)
Total: Before=30298, After=30189, chg -0.36%

$ bloat-o-meter ip/ip ip/ip.new
add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-109 (-109)
Total: Before=674164, After=674055, chg -0.02%

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agotipc: support interface name when activating UDP bearer
Hoang Le [Tue, 25 Jun 2019 04:34:39 +0000 (11:34 +0700)]
tipc: support interface name when activating UDP bearer

Allow the user to give the name of an interface that has an IP address
as an alternative to specifying the IP address itself when activating a
UDP bearer. This frees the user from keeping track of the current IP
address of each device.

Old command syntax:
$tipc bearer enable media udp name NAME localip IP

New command syntax:
$tipc bearer enable media udp name NAME [localip IP|dev DEVICE]

v2:
    - Removed initial value for fd
    - Fixed the return value of cmd_bearer_validate_and_get_addr
      to make it consistent with its use: zero or non-zero
v3: - Switch to use helper 'get_ifname' to retrieve interface name
v4: - Replace legacy SIOCGIFADDR by netlink
v5: - Fix leaky rtnl_handle

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agodevlink: fix libc and kernel headers collision
Baruch Siach [Thu, 27 Jun 2019 18:37:19 +0000 (21:37 +0300)]
devlink: fix libc and kernel headers collision

Since commit 2f1242efe9d ("devlink: Add devlink health show command") we
use the sys/sysinfo.h header for the sysinfo(2) system call. But since
iproute2 carries a local version of the kernel struct sysinfo, this
causes a collision with libcs that do not rely on the kernel-defined
sysinfo, like musl libc:

In file included from devlink.c:25:0:
.../sysroot/usr/include/sys/sysinfo.h:10:8: error: redefinition of 'struct sysinfo'
 struct sysinfo {
        ^~~~~~~
In file included from ../include/uapi/linux/kernel.h:5:0,
                 from ../include/uapi/linux/netlink.h:5,
                 from ../include/uapi/linux/genetlink.h:6,
                 from devlink.c:21:
../include/uapi/linux/sysinfo.h:8:8: note: originally defined here
 struct sysinfo {
        ^~~~~~~

Move the sys/sysinfo.h userspace header before kernel headers, and
suppress the indirect include of linux/sysinfo.h.

Cc: Aya Levin <ayal@mellanox.com>
Cc: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agodevlink: fix format string warning for 32bit targets
Baruch Siach [Thu, 27 Jun 2019 18:37:18 +0000 (21:37 +0300)]
devlink: fix format string warning for 32bit targets

32bit targets define uint64_t as long long unsigned. This leads to the
following build warning:

devlink.c: In function ‘pr_out_u64’:
devlink.c:1729:11: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
    pr_out("%s %lu", name, val);
           ^
devlink.c:59:21: note: in definition of macro ‘pr_out’
   fprintf(stdout, ##args);   \
                     ^~~~

Use uint64_t specific conversion specifiers in the format string to fix
that.

Cc: Aya Levin <ayal@mellanox.com>
Cc: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
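
A minimal sketch of the portable form of such a format string, assuming a hypothetical helper format_u64() rather than devlink's own pr_out_u64(): the PRIu64 macro from <inttypes.h> expands to the correct conversion specifier for uint64_t on both 32-bit and 64-bit targets.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Hypothetical helper: format "name value" into buf.
 * PRIu64 becomes "lu" or "llu" depending on how the target
 * defines uint64_t, so no -Wformat warning is triggered.
 */
static int format_u64(char *buf, size_t len, const char *name, uint64_t val)
{
    assert(buf != NULL && name != NULL);
    return snprintf(buf, len, "%s %" PRIu64, name, val);
}
```

The return value is the number of characters snprintf() would have written, which lets a caller detect truncation.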
4 years agoip address: do not set mngtmpaddr option for IPv4 addresses
Andrea Claudi [Tue, 25 Jun 2019 10:29:57 +0000 (12:29 +0200)]
ip address: do not set mngtmpaddr option for IPv4 addresses

The 'mngtmpaddr' option makes the kernel manage temporary addresses
created from the specified one as a template on behalf of Privacy
Extensions (RFC3041). This option should be available only for
IPv6 addresses, as correctly stated in the manpage.

However it is possible to set mngtmpaddr on IPv4 addresses, too:

$ ip link add dummy0 type dummy
$ ip -4 addr add 192.168.1.1 dev dummy0 mngtmpaddr
$ ip a
1: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
   link/ether 1a:6d:c6:96:ca:f8 brd ff:ff:ff:ff:ff:ff
   inet 192.168.1.1/32 scope global mngtmpaddr dummy0
      valid_lft forever preferred_lft forever

Fix this by adding a check on the protocol family before setting
IFA_F_MANAGETEMPADDR flag.

Fixes: 5b7e21c417bea ("add support for IFA_F_MANAGETEMPADDR")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
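
The family check described above could look roughly like this. It is a sketch under assumptions: set_ipv6_only_flag() is a hypothetical helper, and the commit message does not say whether the real code warns or fails when the option is given for an IPv4 address; here the caller simply gets -1 back.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <linux/if_addr.h>

/*
 * Hypothetical helper: set an IPv6-only address flag such as
 * IFA_F_MANAGETEMPADDR. Returns 0 on success, or -1 (leaving
 * *flags untouched) when the address family is not AF_INET6.
 */
static int set_ipv6_only_flag(int family, uint32_t *flags, uint32_t flag)
{
    if (family != AF_INET6)
        return -1;
    *flags |= flag;
    return 0;
}
```

The same kind of check would apply to other IPv6-only flags such as IFA_F_HOMEADDRESS and IFA_F_NODAD.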
4 years agoip address: do not set home option for IPv4 addresses
Andrea Claudi [Tue, 25 Jun 2019 10:29:56 +0000 (12:29 +0200)]
ip address: do not set home option for IPv4 addresses

The 'home' option designates an IPv6 address as "home address" as
defined in RFC 6275. This option should be available only for
IPv6 addresses, as correctly stated in the manpage.

However it is possible to set home on IPv4 addresses, too:

$ ip link add dummy0 type dummy
$ ip -4 addr add 192.168.1.1 dev dummy0 home
$ ip a
1: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
   link/ether 1a:6d:c6:96:ca:f8 brd ff:ff:ff:ff:ff:ff
   inet 192.168.1.1/32 scope global home dummy0
      valid_lft forever preferred_lft forever

Fix this by adding a check on the protocol family before setting
IFA_F_HOMEADDRESS flag.

Fixes: bac735c53a36d ("enabled to manipulate the flags of IFA_F_HOMEADDRESS or IFA_F_NODAD from ip.")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoip address: do not set nodad option for IPv4 addresses
Andrea Claudi [Tue, 25 Jun 2019 10:29:55 +0000 (12:29 +0200)]
ip address: do not set nodad option for IPv4 addresses

Duplicate Address Detection (RFC 4862) is available only for IPv6
addresses. As a consequence, 'nodad' option, turning it off, should
be available only for IPv6, and is defined like that in the man page.

However it is possible to set nodad on IPv4 addresses, too:

$ ip link add dummy0 type dummy
$ ip -4 addr add 192.168.1.1 dev dummy0 nodad
$ ip a
1: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
   link/ether 1a:6d:c6:96:ca:f8 brd ff:ff:ff:ff:ff:ff
   inet 192.168.1.1/32 scope global nodad dummy0
      valid_lft forever preferred_lft forever

Fix this by adding a check on the protocol family before setting
IFA_F_NODAD flag.

Fixes: bac735c53a36d ("enabled to manipulate the flags of IFA_F_HOMEADDRESS or IFA_F_NODAD from ip.")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoiproute: Set flags and attributes on dump to get IPv6 cached routes to be flushed
Stefano Brivio [Tue, 25 Jun 2019 11:41:24 +0000 (13:41 +0200)]
iproute: Set flags and attributes on dump to get IPv6 cached routes to be flushed

With a current (5.1) kernel version, IPv6 exception routes can't be listed
(ip -6 route list cache) or flushed (ip -6 route flush cache). Kernel
support for this is being added back. Relevant net-next commits:

  564c91f7e563 fib_frontend, ip6_fib: Select routes or exceptions dump from RTM_F_CLONED
  ef11209d4219 Revert "net/ipv6: Bail early if user only wants cloned entries"
  3401bfb1638e ipv6/route: Don't match on fc_nh_id if not set in ip6_route_del()
  bf9a8a061ddc ipv6/route: Change return code of rt6_dump_route() for partial node dumps
  1e47b4837f3b ipv6: Dump route exceptions if requested
  40cb35d5dc04 ip6_fib: Don't discard nodes with valid routing information in fib6_locate_1()

However, to allow the kernel to filter routes based on the RTM_F_CLONED
flag, we need to make sure this flag is always passed when we want cached
routes to be dumped, and we can also pass table and output interface
attributes to have the kernel filter on them, if requested by the user.

Use the existing iproute_dump_filter() as a filter for the dump request in
iproute_flush(). This way, 'ip -6 route flush cache' works again.

v2: Instead of creating a separate 'filter' function dealing with
    RTM_F_CLONED only, use the existing iproute_dump_filter() and get
    table and oif kernel filtering for free. Suggested by David Ahern.

Fixes: aba5acdfdb34 ("(Logical change 1.3)")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoip/iptoken: fix dump error when ipv6 disabled
Hangbin Liu [Wed, 26 Jun 2019 01:44:07 +0000 (09:44 +0800)]
ip/iptoken: fix dump error when ipv6 disabled

When IPv6 is disabled at boot (ipv6.disable=1), there is no IPv6 route
info in the dump message. If we return -1 when
ifi->ifi_family != AF_INET6, we get an error like:

$ ip token list
Dump terminated

which confuses the user. There is no need to return -1 if the dump
message does not match; returning 0 is enough.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agodevlink: replace print macros with functions
Stephen Hemminger [Wed, 26 Jun 2019 16:18:18 +0000 (09:18 -0700)]
devlink: replace print macros with functions

Using functions is safer, and printing is not performance
critical.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agotc: adjust xtables_match and xtables_target to changes in recent iptables
Eyal Birger [Mon, 24 Jun 2019 15:14:57 +0000 (18:14 +0300)]
tc: adjust xtables_match and xtables_target to changes in recent iptables

iptables commit 933400b37d09 ("nft: xtables: add the infrastructure to translate from iptables to nft")
added an additional member to struct xtables_match and struct xtables_target.

This change is available for libxtables12 and up.
Add these members conditionally to support both newer and older versions.

Fixes: dd29621578d2 ("tc: add em_ipt ematch for calling xtables matches from tc matching context")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoMerge branch 'master' into next
David Ahern [Fri, 21 Jun 2019 22:59:24 +0000 (15:59 -0700)]
Merge branch 'master' into next

Conflicts:
include/uapi/linux/snmp.h

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agotc: q_netem: JSON-ify the output
Jakub Kicinski [Tue, 18 Jun 2019 00:49:29 +0000 (17:49 -0700)]
tc: q_netem: JSON-ify the output

Add JSON output support to q_netem.

The normal output is untouched.

In JSON output always use seconds as the base of time units,
and non-percentage numbers (0.01 instead of 1%). Try to always
report the fields, even if they are zero.
All this should make the output more machine-friendly.

v2: fewer macros

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoip monitor: display interfaces from all groups
Nicolas Dichtel [Fri, 21 Jun 2019 09:21:32 +0000 (11:21 +0200)]
ip monitor: display interfaces from all groups

Only interfaces from group 0 were displayed.

ip monitor calls ipaddr_reset_filter() and there is no reason to not reset
the filter group in this function.

Fixes: c4fdf75d3def ("ip link: fix display of interface groups")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agonetns: make netns_{save,restore} static
Matteo Croce [Tue, 18 Jun 2019 14:49:35 +0000 (16:49 +0200)]
netns: make netns_{save,restore} static

The netns_{save,restore} functions are only used in ipnetns.c now, since
the restore is not needed anymore after the netns exec command.
Move them into ipnetns.c, and make them static.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoip vrf: use hook to change VRF in the child
Matteo Croce [Tue, 18 Jun 2019 14:49:34 +0000 (16:49 +0200)]
ip vrf: use hook to change VRF in the child

On vrf exec, reset the VRF associations in the child process, via the
new hook added to cmd_exec(). In this way, the parent doesn't have to
reset the VRF associations before spawning other processes.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agonetns: switch netns in the child when executing commands
Matteo Croce [Tue, 18 Jun 2019 14:49:33 +0000 (16:49 +0200)]
netns: switch netns in the child when executing commands

'ip netns exec' changes the current netns just before executing a child
process, and restores it after forking. This is needed if we're running
in batch or do_all mode.
Some cleanups must be done both in the parent and in the child: the
parent must restore the previous netns, while the child must reset any
VRF association.
Unfortunately, if do_all is set, the VRF associations are not reset in the child, and
the spawned processes are started with the wrong VRF context. This can
be triggered with this script:

# ip -b - <<-'EOF'
link add type vrf table 100
link set vrf0 up
link add type dummy
link set dummy0 vrf vrf0 up
netns add ns1
EOF
# ip -all -b - <<-'EOF'
vrf exec vrf0 true
netns exec setsid -f sleep 1h
EOF
# ip vrf pids vrf0
  314  sleep
# ps 314
  PID TTY      STAT   TIME COMMAND
  314 ?        Ss     0:00 sleep 1h

Refactor cmd_exec() and pass to it a function pointer which is called in
the child before the final exec. In the netns exec case the function just
resets the VRF and switches netns.

Doing it in the child is less error prone and safer, because the parent
environment is always kept unaltered.

After this refactor some utility functions became unused, so remove them.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
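
The idea of running a setup hook in the child before the final exec can be sketched as below. This is an illustration only: run_in_child() and child_setup_fn are hypothetical names, not the signature of iproute2's cmd_exec(), and the exit codes 126 and 127 are a convention chosen for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical hook type: returns 0 on success, non-zero on failure. */
typedef int (*child_setup_fn)(void *arg);

/*
 * Fork, run the optional setup hook in the child, then exec cmd.
 * Returns the child's exit status, or -1 on error.
 */
static int run_in_child(const char *cmd, char *const argv[],
                        child_setup_fn setup, void *arg)
{
    int status;
    pid_t pid;

    assert(cmd != NULL && argv != NULL);
    fflush(NULL);   /* do not duplicate buffered output in the child */

    pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: any context switch happens here, the parent is untouched. */
        if (setup && setup(arg) != 0)
            _exit(126);
        execvp(cmd, argv);
        perror("execvp");
        _exit(127);
    }

    /* Parent: only waits, its own environment was never altered. */
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A hook for the netns exec case would switch the namespace and reset the VRF association inside setup(); because that happens after fork(), the parent needs no restore step.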
4 years agoAdd support for configuring MACsec gcm-aes-256 cipher type.
Pete Morici [Fri, 14 Jun 2019 17:24:59 +0000 (13:24 -0400)]
Add support for configuring MACsec gcm-aes-256 cipher type.

Signed-off-by: Pete Morici <pmorici@dev295.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoMakefile: use make -C
Andrea Claudi [Thu, 13 Jun 2019 17:47:02 +0000 (19:47 +0200)]
Makefile: use make -C

make provides a handy -C option to change directory before reading
the makefiles or doing anything else.

Use that instead of the "cd dir && make && cd .." pattern, thus
simplifying the syntax of some makefiles.

Changes from v1:
- Drop an obviously wrong leftover on testsuite/iproute2/Makefile

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agouapi: update headers and add if_link.h and if_infiniband.h
Stephen Hemminger [Tue, 18 Jun 2019 16:46:33 +0000 (09:46 -0700)]
uapi: update headers and add if_link.h and if_infiniband.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoipmroute: Prevent overlapping storage of `filter` global
Michael Forney [Sun, 16 Jun 2019 21:46:02 +0000 (14:46 -0700)]
ipmroute: Prevent overlapping storage of `filter` global

This variable has the same name as `struct xfrm_filter filter` in
ip/ipxfrm.c, but overrides that definition since `struct rtfilter`
is larger.

This is visible when built with -Wl,--warn-common in LDFLAGS:

/usr/bin/ld: ipxfrm.o: warning: common of `filter' overridden by larger common from ipmroute.o

Signed-off-by: Michael Forney <mforney@mforney.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoip: add a new parameter -Numeric
Hangbin Liu [Wed, 12 Jun 2019 09:21:15 +0000 (17:21 +0800)]
ip: add a new parameter -Numeric

Add a new parameter '-Numeric' to show the numeric values of protocol, scope,
dsfield, etc. directly instead of converting them to human-readable names.
Do the same for tc and ss.

This patch is based on David Ahern's previous patch.

Suggested-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoMerge branch 'master' into next
David Ahern [Fri, 14 Jun 2019 14:29:40 +0000 (07:29 -0700)]
Merge branch 'master' into next

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agotools: Fix include path for generate_nlmsg
David Ahern [Fri, 14 Jun 2019 13:50:55 +0000 (06:50 -0700)]
tools: Fix include path for generate_nlmsg

Compile of tools directory fails with:

make -C tools
    CC       generate_nlmsg
../../lib/libnetlink.c:28:27: fatal error: linux/nexthop.h: No such file or directory
 #include <linux/nexthop.h>
                           ^
compilation terminated.

Add local uapi to build path.

Fixes: 74829ca7dd60 ("libnetlink: Add helper to create nexthop dump request")
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoMakefile: use make -C to change directory
Andrea Claudi [Thu, 13 Jun 2019 17:59:29 +0000 (19:59 +0200)]
Makefile: use make -C to change directory

make provides a handy -C option to change directory before reading
the makefiles or doing anything else.

Use that instead of the "cd dir && make && cd .." pattern, thus
simplifying the syntax of some makefiles.

Changes from v1:
- Drop an obviously wrong leftover in testsuite/iproute2/Makefile

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-and-tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agotestsuite: indent if/else in Makefile
Stephen Hemminger [Wed, 12 Jun 2019 15:48:33 +0000 (08:48 -0700)]
testsuite: indent if/else in Makefile

Indent both arms of if/else equally.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agodevlink: mnlg: Catch returned error value of dumpit commands
Moshe Shemesh [Tue, 11 Jun 2019 16:11:09 +0000 (19:11 +0300)]
devlink: mnlg: Catch returned error value of dumpit commands

Devlink commands that implement the dumpit callback may return an error.
The netlink function netlink_dump() sends the errno value as the payload
of the message, while answering user space with NLMSG_DONE.
To enable receiving errno value for dumpit commands we have to check for
it in the message. If it is a negative value then the dump returned an
error so we should set errno accordingly and check for ext_ack in case
it was set.

Fixes: 049c58539f5d ("devlink: mnlg: Add support for extended ack")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
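
The check described above can be sketched as follows. It is not the mnlg code: dump_done_status() is a hypothetical helper that only looks at the integer payload of an NLMSG_DONE message and leaves extended ack parsing out.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper: inspect the message that ends a dump.
 * Returns -1 and sets errno if the kernel reported an error in the
 * NLMSG_DONE payload, 0 otherwise.
 */
static int dump_done_status(const struct nlmsghdr *nlh)
{
    int err;

    assert(nlh != NULL);
    if (nlh->nlmsg_type != NLMSG_DONE)
        return 0;
    if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(int)))
        return 0;   /* no payload to inspect */

    /* The payload carries the (negative) errno value of the dump. */
    memcpy(&err, NLMSG_DATA(nlh), sizeof(err));
    if (err < 0) {
        errno = -err;
        return -1;
    }
    return 0;
}
```

A caller that gets -1 back would then look for an extended ack attribute in the same message, as the commit message describes.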
4 years agoMerge branch 'nexthop-objects' into next
David Ahern [Tue, 11 Jun 2019 17:32:07 +0000 (10:32 -0700)]
Merge branch 'nexthop-objects' into next

David Ahern says:

====================

This set adds support for nexthop objects to the ip command. The syntax
for nexthop objects is identical to the current 'ip route .. nexthop ...'
syntax making it easy to convert existing use cases.

v2
- Fixed header use in rtnl_nexthopdump_req as noted by roopa
- made rth_del static per Stephen's request and fixed coding style
- removed print_nh_gateway and exported print_rta_gateway to reuse
  the iproute.c code (keeps consistency in output)
- added examples to commit message
- fixed monitor use when specific groups requested
- fixed usage in 'ip nexthop'
- added manpage

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoipmonitor: Add nexthop option to monitor
David Ahern [Fri, 7 Jun 2019 22:38:16 +0000 (15:38 -0700)]
ipmonitor: Add nexthop option to monitor

Add capability to ip-monitor to listen for and dump nexthop messages.
Since the nexthop group is 32, which exceeds the maximum of the groups bit
field, 2 separate flags are needed: one that defaults on to indicate that the
nexthop group is joined by default, and a second that indicates a
specific selection by the user (e.g., ip mon nexthop route).

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoip route: Add option to use nexthop objects
David Ahern [Fri, 7 Jun 2019 22:38:15 +0000 (15:38 -0700)]
ip route: Add option to use nexthop objects

Add nhid option for routes to use nexthop objects by id.

Example:
  $ ip nexthop add id 1 via 10.99.1.2 dev veth1
  $ ip route add 10.100.1.0/24 nhid 1
  $ ip route ls
  ...
  10.100.1.0/24 nhid 1 via 10.99.1.2 dev veth1

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoip: Add man page for nexthop command
David Ahern [Fri, 7 Jun 2019 22:38:14 +0000 (15:38 -0700)]
ip: Add man page for nexthop command

Document 'ip nexthop' options in a man page with a few examples.

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoAdd support for nexthop objects
David Ahern [Fri, 7 Jun 2019 22:38:13 +0000 (15:38 -0700)]
Add support for nexthop objects

Add nexthop subcommand to ip. Implement basic commands for creating,
deleting and dumping nexthop objects. Syntax follows 'nexthop' syntax
from existing 'ip route' command.

Examples:
1. Single path
    $ ip nexthop add id 1 via 10.99.1.2 dev veth1
    $ ip nexthop ls
    id 1 via 10.99.1.2 src 10.99.1.1 dev veth1 scope link

2. ECMP
    $ ip nexthop add id 2 via 10.99.3.2 dev veth3
    $ ip nexthop add id 1001 group 1/2
      --> creates a nexthop group with 2 component nexthops:
          id 1 and id 2, both with the same weight

    $ ip nexthop ls
    id 1 via 10.99.1.2 src 10.99.1.1 dev veth1 scope link
    id 2 via 10.99.3.2 src 10.99.3.1 dev veth3 scope link
    id 1001 group 1/2

3. Weighted multipath
    $ ip nexthop add id 1002 group 1,10/2,20
      --> creates a nexthop group with 2 component nexthops:
          id 1 with a weight of 10 and id 2 with a weight of 20

    $ ip nexthop ls
    id 1 via 10.99.1.2 src 10.99.1.1 dev veth1 scope link
    id 2 via 10.99.3.2 src 10.99.3.1 dev veth3 scope link
    id 1001 group 1/2
    id 1002 group 1,10/2,20

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoip route: Export print_rt_flags, print_rta_if and print_rta_gateway
David Ahern [Fri, 7 Jun 2019 22:38:12 +0000 (15:38 -0700)]
ip route: Export print_rt_flags, print_rta_if and print_rta_gateway

Export print_rt_flags and print_rta_if for use by the nexthop
command.

Change print_rta_gateway to take the family versus rtmsg struct and
export for use by the nexthop command.

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agolibnetlink: Add helper to create nexthop dump request
David Ahern [Fri, 7 Jun 2019 22:38:11 +0000 (15:38 -0700)]
libnetlink: Add helper to create nexthop dump request

Add rtnl_nexthopdump_req to initiate a dump request of nexthop objects.

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agouapi: Import nexthop object API
David Ahern [Fri, 7 Jun 2019 22:38:10 +0000 (15:38 -0700)]
uapi: Import nexthop object API

Add nexthop.h from kernel with the uapi for nexthop objects.

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agolibnetlink: Add helper to add a group via setsockopt
David Ahern [Fri, 7 Jun 2019 22:38:09 +0000 (15:38 -0700)]
libnetlink: Add helper to add a group via setsockopt

Groups > 31 have to be joined using setsockopt. Since the nexthop
group is 32, add a helper to allow 'ip monitor' to listen for nexthop
messages.

Signed-off-by: David Ahern <dsahern@gmail.com>
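
Joining a netlink multicast group by number, rather than through the bind-time bit mask, can be sketched like this. nl_join_group() is a hypothetical name, not the helper added by this commit; SOL_NETLINK and NETLINK_ADD_MEMBERSHIP are the standard netlink socket option level and name.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270   /* fallback for libcs that do not define it */
#endif

/*
 * Hypothetical helper: join one multicast group on an already open
 * netlink socket. Group numbers start at 1. Returns 0 on success,
 * -1 with errno set on failure.
 */
static int nl_join_group(int fd, unsigned int group)
{
    assert(group > 0);
    return setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
                      &group, sizeof(group));
}
```

For the case described above, the caller would pass the nexthop group number (32, per the commit message) after opening its NETLINK_ROUTE socket.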
4 years agolwtunnel: Pass encap and encap_type attributes to lwt_parse_encap
David Ahern [Fri, 7 Jun 2019 22:38:08 +0000 (15:38 -0700)]
lwtunnel: Pass encap and encap_type attributes to lwt_parse_encap

lwt_parse_encap currently assumes the encap attribute is RTA_ENCAP
and the type is RTA_ENCAP_TYPE. Change lwt_parse_encap to take these
as input arguments for reuse by nexthop code which has the attributes
as NHA_ENCAP and NHA_ENCAP_TYPE.

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agolibnetlink: Set NLA_F_NESTED in rta_nest
David Ahern [Fri, 7 Jun 2019 22:38:07 +0000 (15:38 -0700)]
libnetlink: Set NLA_F_NESTED in rta_nest

Kernel now requires NLA_F_NESTED to be set on new nested
attributes. Set NLA_F_NESTED in rta_nest.

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agoip6tunnel: fix 'ip -6 {show|change} dev <name>' cmds
Mahesh Bandewar [Thu, 6 Jun 2019 23:44:26 +0000 (16:44 -0700)]
ip6tunnel: fix 'ip -6 {show|change} dev <name>' cmds

Inclusion of 'dev' is allowed by the syntax but not handled
correctly by the command. The show command produces no output,
and the change command reports success but does not make any
changes.

This can be verified with the following steps:
  # ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote fd::2 tos inherit ttl 127 encaplimit none
  # ip -6 tunnel show ip6tnl1
  <correct output>
  # ip -6 tunnel show dev ip6tnl1
  <no output but correct output after this change>
  # ip -6 tunnel change dev ip6tnl1 local 2001:1234::1 remote 2001:1234::2 encaplimit none ttl 127 tos inherit allow-localremote
  # echo $?
  0
  # ip -6 tunnel show ip6tnl1
  <no changes applied, but changes are correctly applied after this change>

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoip: reset netns after each command in batch mode
Matteo Croce [Fri, 7 Jun 2019 20:41:22 +0000 (22:41 +0200)]
ip: reset netns after each command in batch mode

When creating a new netns or executing a program into an existing one,
the unshare() or setns() calls will change the current netns.
In batch mode, this can run commands on the wrong interfaces, as the
ifindex value is meaningful only in the current netns. For example, this
command fails because veth-c doesn't exist in the init netns:

    # ip -b - <<-'EOF'
        netns add client
        link add name veth-c type veth peer veth-s netns client
        addr add 192.168.2.1/24 dev veth-c
    EOF
    Cannot find device "veth-c"
    Command failed -:7

But if there are two devices with the same name in the init and new netns,
ip will build a wrong ll_map with indexes belonging to the new netns,
and will execute actions in the init netns using this wrong mapping.
This script will flush all eth0 addresses and bring it down, as it has
the same ifindex as veth0 in the new netns:

    # ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
        link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
        inet 192.168.122.76/24 brd 192.168.122.255 scope global dynamic eth0
           valid_lft 3598sec preferred_lft 3598sec

    # ip -b - <<-'EOF'
        netns add client
        link add name veth0 type veth peer name veth1
        link add name veth-ns type veth peer name veth0 netns client
        link set veth0 down
        address flush veth0
    EOF

    # ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000
        link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
    3: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether c2:db:d0:34:13:4a brd ff:ff:ff:ff:ff:ff
    4: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether ca:9d:6b:5f:5f:8f brd ff:ff:ff:ff:ff:ff
    5: veth-ns@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether 32:ef:22:df:51:0a brd ff:ff:ff:ff:ff:ff link-netns client

The same issue can be triggered by the netns exec subcommand with a
slightly different script:

    # ip netns add client
    # ip -b - <<-'EOF'
        netns exec client true
        link add name veth0 type veth peer name veth1
        link add name veth-ns type veth peer name veth0 netns client
        link set veth0 down
        address flush veth0
    EOF

Fix this by adding two netns_{save,restore} functions, which are used
to get a file descriptor for the init netns, and restore it after
each batch command.
netns_save() is called before the unshare() or setns(),
while netns_restore() is called after each command.

Fixes: 0dc34c7713bb ("iproute2: Add processless network namespace support")
Reviewed-and-tested-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 years agoMerge branch 'master' into next
David Ahern [Mon, 10 Jun 2019 17:32:07 +0000 (10:32 -0700)]
Merge branch 'master' into next

Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agotc: add support for action act_ctinfo
Kevin Darbyshire-Bryant [Tue, 4 Jun 2019 13:52:09 +0000 (14:52 +0100)]
tc: add support for action act_ctinfo

ctinfo is a tc action restoring data stored in conntrack marks to
various fields.  At present it has two independent modes of operation,
restoration of DSCP into IPv4/v6 diffserv and restoration of conntrack
marks into packet skb marks.

It understands a number of parameters specific to this action in
addition to the usual action syntax.  Each operating mode is
independent of the other so all options are optional, however not
specifying at least one mode is a bit pointless.

Usage: ... ctinfo [dscp mask [statemask]] [cpmark [mask]] [zone ZONE]
  [CONTROL] [index <INDEX>]

DSCP mode

dscp enables copying of a DSCP stored in the conntrack mark into the
ipv4/v6 diffserv field.  The mask is a 32bit field and specifies where
in the conntrack mark the DSCP value is located.  It must be 6
contiguous bits long. eg. 0xfc000000 would restore the DSCP from the
upper 6 bits of the conntrack mark.

The DSCP copying may be optionally controlled by a statemask.  The
statemask is a 32bit field, usually with a single bit set and must not
overlap the dscp mask.  The DSCP restore operation will only take place
if the corresponding bit/s in conntrack mark ANDed with the statemask
yield a non zero result.

eg. dscp 0xfc000000 0x01000000 would retrieve the DSCP from the top 6
bits, whilst using bit 25 as a flag to do so.  Bit 26 is unused in this
example.

CPMARK mode

cpmark enables copying of the conntrack mark to the packet skb mark.  In
this mode it is completely equivalent to the existing act_connmark
action.  Additional functionality is provided by the optional mask
parameter, whereby the stored conntrack mark is logically ANDed with the
cpmark mask before being stored into skb mark.  This allows shared usage
of the conntrack mark between applications.

eg. cpmark 0x00ffffff would restore only the lower 24 bits of the
conntrack mark, thus may be useful in the event that the upper 8 bits
are used by the DSCP function.
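
The cpmark masking amounts to a single AND, sketched here with a hypothetical helper name; treating an omitted mask as 0xffffffff (copy the whole mark) is an assumption made for this example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: value stored into the skb mark.
 * Pass 0xffffffff as mask to copy the conntrack mark unchanged.
 */
static uint32_t ctinfo_cpmark(uint32_t ct_mark, uint32_t mask)
{
    return ct_mark & mask;
}
```

With cpmark 0x00ffffff, a conntrack mark of 0xb9123456 becomes an skb mark of 0x00123456, so the upper 8 bits stay private to the DSCP function.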

Usage: ... ctinfo [dscp mask [statemask]] [cpmark [mask]] [zone ZONE]
  [CONTROL] [index <INDEX>]
where :
dscp MASK is the bitmask to restore DSCP
     STATEMASK is the bitmask to determine conditional restoring
cpmark MASK mask applied to restored packet mark
ZONE is the conntrack zone
CONTROL := reclassify | pipe | drop | continue | ok |
   goto chain <CHAIN_INDEX>

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
4 years agouapi: Import tc_ctinfo uapi
David Ahern [Mon, 10 Jun 2019 17:23:32 +0000 (10:23 -0700)]
uapi: Import tc_ctinfo uapi

Add tc_ctinfo.h uapi file from kernel.

Signed-off-by: David Ahern <dsahern@gmail.com>