Ido Schimmel [Tue, 13 Aug 2019 08:31:41 +0000 (11:31 +0300)]
devlink: Add devlink trap set and show commands
The trap set command allows the user to set the action of an individual
trap. Example:
# devlink trap set netdevsim/netdevsim10 trap blackhole_route action trap
The trap show command allows the user to get the current status of an
individual trap or a dump of all traps in case one is not specified.
When '-s' is specified the trap's statistics are shown. When '-v' is
specified the metadata types the trap can provide are shown. Example:
Donald Sharp [Sat, 10 Aug 2019 00:18:42 +0000 (20:18 -0400)]
ip nexthop: Add space to display properly when showing a group
When displaying a nexthop group made up of other nexthops, the display
line shows this when you have additional data at the end:
id 42 group 43/44/45/46/47/48/49/50/51/52/53/54/55/56/57/58/59/60/61/62/63/64/65/66/67/68/69/70/71/72/73/74proto zebra
Modify code so that it shows:
id 42 group 43/44/45/46/47/48/49/50/51/52/53/54/55/56/57/58/59/60/61/62/63/64/65/66/67/68/69/70/71/72/73/74 proto zebra
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Ido Schimmel [Mon, 12 Aug 2019 10:17:06 +0000 (13:17 +0300)]
tc: Fix block-handle support for filter operations
The revert of batchsize accidentally reverted more than it should
and broke shared block functionality. Fix this by restoring the
original functionality.
To reproduce:
dst_ip 192.0.2.0/24 action drop
Unknown filter "block", hence option "10" is unparsable
Fixes: e991c04d64c0 ("Revert "tc: Add batchsize feature for filter and actions"") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Jiri Pirko [Mon, 5 Aug 2019 09:56:56 +0000 (11:56 +0200)]
devlink: finish queue.h to list.h transition
Lose the "q" from the names and name the structure fields in the same
way the rest of the code does. Also, fix the list_add argument order,
which leads to a segfault.
Fixes: 33267017faf1 ("iproute2: devlink: port from sys/queue.h to list.h") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Kurt Kanzenbach [Thu, 4 Jul 2019 12:24:27 +0000 (14:24 +0200)]
utils: Fix get_s64() function
get_s64() internally uses strtoll() to parse the value out of a given
string. strtoll() returns a long long. However, the intermediate variable is
only a long, which might be 32 bits wide on some systems. So, fix it.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
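The truncation described in the get_s64() fix above can be illustrated with a minimal sketch. The helper name parse_s64() and its error convention (0 on success, -1 on failure) are assumptions made for illustration and are not taken from the iproute2 source; the only point is that the intermediate variable has the same type strtoll() returns.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not the iproute2 code: parse a signed 64-bit
 * integer from a string. Returns 0 on success, -1 on any error. */
static int parse_s64(int64_t *val, const char *arg, int base)
{
	long long res;	/* same type strtoll() returns; a plain long may be 32 bits */
	char *end;

	if (!arg || !*arg)
		return -1;

	errno = 0;
	res = strtoll(arg, &end, base);
	if (end == arg || *end != '\0')
		return -1;	/* no digits, or trailing garbage */
	if (errno == ERANGE)
		return -1;	/* value does not fit in a long long */

	*val = res;
	return 0;
}
```

With a 32-bit long as the intermediate, a value such as 5000000000 would not survive the assignment; with long long it does.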
Antonio Borneo [Fri, 26 Jul 2019 13:06:09 +0000 (15:06 +0200)]
iplink_can: fix format output of clock with flag -details
The command
ip -details link show can0
prints in the last line the value of the clock frequency attached
to the name of the following value "numtxqueues", e.g.
clock 49500000numtxqueues 1 numrxqueues 1 gso_max_size
65536 gso_max_segs 65535
Add the missing space after the clock value.
Signed-off-by: Antonio Borneo <borneo.antonio@gmail.com>
Mark Zhang [Wed, 17 Jul 2019 14:31:56 +0000 (17:31 +0300)]
rdma: Document counter statistic
Add documentation for accessing the QP counter, including binding/unbinding
a QP to a counter manually or automatically, and dumping counter statistics.
Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Mark Zhang [Wed, 17 Jul 2019 14:31:54 +0000 (17:31 +0300)]
rdma: Add stat manual mode support
In manual mode a QP can be manually bound to a counter. If the counter
id (cntn) is not specified, the kernel will allocate one. After a
successful bind, the cntn can be seen through "rdma statistic qp show".
On unbind, if lqpn is not specified, all QPs on this counter will
be unbound.
The manual and auto modes are mutually exclusive.
Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Mark Zhang [Wed, 17 Jul 2019 14:31:53 +0000 (17:31 +0300)]
rdma: Make get_port_from_argv() returns valid port in strict port mode
When strict_port is set, make get_port_from_argv() return failure if
no valid port is specified.
Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Mark Zhang [Wed, 17 Jul 2019 14:31:52 +0000 (17:31 +0300)]
rdma: Add rdma statistic counter per-port auto mode support
With per-QP statistic counter support, a user is allowed to monitor
specific categories of QPs, which are bound to and unbound from
dynamically allocated and deallocated counters.
In per-port "auto" mode, QPs are bound to counters automatically
according to common criteria. For example, in a per-"type" (qp type)
scheme, all QPs in a process that have the same qp type are bound
automatically to a single counter.
Currently only "type" (qp type) is supported. Examples:
$ rdma statistic qp set link mlx5_2/1 auto type on
$ rdma statistic qp set link mlx5_2/1 auto off
Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Mark Zhang [Wed, 17 Jul 2019 14:31:51 +0000 (17:31 +0300)]
rdma: Add get per-port counter mode support
Add an interface to show which mode is active. Two modes are supported:
- "auto": In this mode all QPs belonging to one category are bound automatically
to a single counter set. Currently only "qp type" is supported;
- "manual": In this mode QPs are bound to a counter manually.
Examples:
$ rdma statistic qp mode
0/1: mlx5_0/1: qp auto off
1/1: mlx5_1/1: qp auto off
2/1: mlx5_2/1: qp auto type on
3/1: mlx5_3/1: qp auto off
$ rdma statistic qp mode link mlx5_0
0/1: mlx5_0/1: qp auto off
Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Mark Zhang [Wed, 17 Jul 2019 14:31:50 +0000 (17:31 +0300)]
rdma: Add "stat qp show" support
This patch presents link, id, task name, lqpn, as well as all sub
counters of a QP counter.
A QP counter is a dynamically allocated statistic counter that is
bound with one or more QPs. It has several sub-counters, each is
used for a different purpose.
Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Ivan Delalande [Thu, 18 Jul 2019 01:15:31 +0000 (18:15 -0700)]
json: fix backslash escape typo in jsonw_puts
Fixes: fcc16c22 ("provide common json output formatter") Signed-off-by: Ivan Delalande <colona@arista.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This allows a new parameter, flags, to be passed to taprio. Currently, it
only supports enabling the txtime-assist mode. But, we plan to add
different modes for taprio (e.g. hardware offloading) and this parameter
will be useful in enabling those modes.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: David Ahern <dsahern@gmail.com>
ETF Qdisc currently checks for a socket with SO_TXTIME socket option. If
either is not present, the packet is dropped. In future commits, we
want other Qdiscs to add packets with a launchtime to the ETF Qdisc. Also,
there are some packets (e.g. ICMP packets) which may not have a socket
associated with them. So, add an option to skip this check.
Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: David Ahern <dsahern@gmail.com>
David Ahern [Thu, 18 Jul 2019 22:42:13 +0000 (15:42 -0700)]
Merge branch 'tc-conntrack' into next
Paul Blakey says:
====================
This patch series adds connection tracking capabilities to tc.
It does so via a new tc action, called act_ct, and new tc flower classifier matching.
Act ct and the relevant flower matches are still under review on the net-next mailing list.
Usage is as follows:
$ tc qdisc add dev ens1f0_0 ingress
$ tc qdisc add dev ens1f0_1 ingress
$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp ct_state -trk \
action ct zone 2 pipe \
action goto chain 2
$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower ct_state +trk+new \
action ct zone 2 commit mark 0xbb nat src addr 5.5.5.7 pipe \
action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
action ct nat pipe \
action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_1 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp ct_state -trk \
action ct zone 2 pipe \
action goto chain 1
$ tc filter add dev ens1f0_1 ingress \
prio 1 chain 1 proto ip \
flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
action ct nat pipe \
action mirred egress redirect dev ens1f0_0
Andrea Claudi [Tue, 9 Jul 2019 13:16:51 +0000 (15:16 +0200)]
ip tunnel: warn when changing IPv6 tunnel without tunnel name
Tunnel change fails if a tunnel name is not specified while using
'ip -6 tunnel change'. However, no warning message is printed and
no error code is returned.
$ ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote fd::2 tos inherit ttl 127 encaplimit none dev dummy0
$ ip -6 tunnel change dev dummy0 local 2001:1234::1 remote 2001:1234::2
$ ip -6 tunnel show ip6tnl1
ip6tnl1: gre/ipv6 remote fd::2 local fd::1 dev dummy0 encaplimit none hoplimit 127 tclass inherit flowlabel 0x00000 (flowinfo 0x00000000)
This commit checks if tunnel interface name is equal to an empty
string: in this case, it prints a warning message to the user.
It intentionally avoids returning an error so as not to break existing
script setups.
This is the output after this commit:
$ ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote fd::2 tos inherit ttl 127 encaplimit none dev dummy0
$ ip -6 tunnel change dev dummy0 local 2001:1234::1 remote 2001:1234::2
Tunnel interface name not specified
$ ip -6 tunnel show ip6tnl1
ip6tnl1: gre/ipv6 remote fd::2 local fd::1 dev dummy0 encaplimit none hoplimit 127 tclass inherit flowlabel 0x00000 (flowinfo 0x00000000)
Reviewed-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
$ ip link add type dummy
$ ip -6 tunnel add ip6tnl1 mode ip6ip6 remote 2001:db8:ffff:100::2 local 2001:db8:ffff:100::1 hoplimit 1 tclass 0x0 dev dummy0
add tunnel "ip6tnl0" failed: File exists
The dev parameter must be used to specify the device to which
the tunnel is bound, and not the tunnel itself.
Reported-by: Jianwen Ji <jiji@redhat.com> Reviewed-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Aya Levin [Wed, 10 Jul 2019 11:03:21 +0000 (14:03 +0300)]
devlink: Remove enclosing array brackets binary print with json format
Keep pr_out_binary_value function only for printing. Inner relations
like array grouping should be done outside the function.
Fixes: 844a61764c6f ("devlink: Add helper functions for name and value separately") Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Aya Levin [Wed, 10 Jul 2019 11:03:20 +0000 (14:03 +0300)]
devlink: Fix binary values print
Fix function pr_out_binary_value() to start printing the binary buffer
from offset 0 instead of offset 1. Remove redundant new line at the
beginning of the output
Fixes: 844a61764c6f ("devlink: Add helper functions for name and value separately") Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Aya Levin [Wed, 10 Jul 2019 11:03:19 +0000 (14:03 +0300)]
devlink: Change devlink health dump show command to dumpit
Although devlink health dump show command is given per reporter, it
returns large amounts of data. Trying to use the doit cb results in
OUT-OF-BUFFER error. This complementary patch raises the DUMP flag in
order to invoke the dumpit cb. We're safe as no existing drivers
implement the dump health reporter option yet.
Fixes: 041e6e651a8e ("devlink: Add devlink health dump show command") Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
iproute has a utility function which checks if a string is a prefix of
another one, to allow use of abbreviated commands, e.g. 'addr' or 'a'
instead of 'address'.
This routine unfortunately considers an empty string to be a prefix
of any pattern, leading to undefined behaviour when an empty
argument is passed to ip:
# ip ''
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
# tc ''
qdisc noqueue 0: dev lo root refcnt 2
# ip address add 192.0.2.0/24 '' 198.51.100.1 dev dummy0
# ip addr show dev dummy0
6: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 02:9d:5e:e9:3f:c0 brd ff:ff:ff:ff:ff:ff
inet 192.0.2.0/24 brd 198.51.100.1 scope global dummy0
valid_lft forever preferred_lft forever
Rewrite matches() so it takes care of an empty input, and doesn't
scan the input strings three times: the current implementation
does 2 strlen and a memcmp to accomplish the same task.
Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
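The behaviour described in the matches() rewrite above (reject an empty argument, scan the strings only once) can be sketched as follows. The function name is_prefix() and its boolean return convention are assumptions chosen for readability; they are not claimed to be the signature or return values used in iproute2.

```c
#include <stdbool.h>

/* Hypothetical sketch: returns true when 'prefix' is a non-empty
 * prefix of 'string'. */
static bool is_prefix(const char *prefix, const char *string)
{
	if (*prefix == '\0')
		return false;	/* an empty argument abbreviates nothing */

	/* Single pass: advance both strings while they agree. */
	while (*string != '\0' && *prefix == *string) {
		prefix++;
		string++;
	}

	/* It is a match only if the whole prefix was consumed. */
	return *prefix == '\0';
}
```

With this shape, 'a' and 'addr' are accepted as abbreviations of 'address', while '' is refused instead of silently matching the first command in the table.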
Andrea Claudi [Sat, 13 Jul 2019 09:44:07 +0000 (11:44 +0200)]
tc: util: constrain percentage in 0-100 interval
parse_percent() currently allows negative percentages or values
above 100% to be specified. However, this does not seem to make sense,
as the function is used for probabilities or bandwidth rates.
Moreover, using negative values leads to erroneous results
(using Bernoulli loss model as example):
$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel -10% limit 10
$ tc qdisc show dev test
qdisc netem 800c: root refcnt 2 limit 10 loss gemodel p 90% r 10% 1-h 100% 1-k 0%
Using values above 100% we have instead:
$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel 140% limit 10
$ tc qdisc show dev test
qdisc netem 800f: root refcnt 2 limit 10 loss gemodel p 40% r 60% 1-h 100% 1-k 0%
This commit adds a check to parse_percent() to ensure
percentage values stay between 0.0 and 1.0.
The parse_percent_rate() function, which already employs a similar
check, is adjusted accordingly.
With this check in place, we have:
$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel -10% limit 10
Illegal "loss gemodel p"
Fixes: 927e3cfb52b58 ("tc: B.W limits can now be specified in %.") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
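The range check described in the parse_percent() fix above might look roughly like the following. This is a sketch under assumptions: the name parse_percent_checked(), the optional trailing '%', and the 0/-1 return convention are illustrative and not copied from tc.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: parse "40%" (or "40") into the fraction 0.4.
 * Returns 0 on success, -1 if the input is malformed or out of range. */
static int parse_percent_checked(double *val, const char *str)
{
	char *end;
	double res = strtod(str, &end) / 100.0;

	if (end == str)
		return -1;	/* no number at all */
	if (*end != '\0' && strcmp(end, "%") != 0)
		return -1;	/* only an optional '%' may follow */
	if (!(res >= 0.0 && res <= 1.0))
		return -1;	/* negative, above 100%, or NaN */

	*val = res;
	return 0;
}
```

Writing the range test as a negated conjunction also rejects NaN, which a plain "res < 0.0 || res > 1.0" test would let through.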
Many tc modules were printing error messages to stdout.
This is problematic if using JSON or other output formats.
Change all these places to use fprintf(stderr, ...) instead.
Also, remove unnecessary initialization and places
where else is used after error return.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
David Ahern [Wed, 10 Jul 2019 21:07:42 +0000 (14:07 -0700)]
Merge branch 'tc-mpls-action' into next
John Hurley says:
====================
Recent kernel additions to TC allow the manipulation of MPLS headers as
filter actions.
The following patchset creates an iproute2 interface to the new actions
and includes documentation on how to use it.
v1->v2:
- change error from print_string() to fprintf(stderr,) (Stephen Hemminger)
- split long line in explain() message (David Ahern)
- use _SL_ instead of \n in print message (David Ahern)
John Hurley [Wed, 10 Jul 2019 12:40:40 +0000 (13:40 +0100)]
man: update man pages for TC MPLS actions
Add a man page describing the newly added TC mpls manipulation actions.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David Ahern <dsahern@gmail.com>
John Hurley [Wed, 10 Jul 2019 12:40:39 +0000 (13:40 +0100)]
tc: add mpls actions
Create a new action type for TC that allows the pushing, popping, and
modifying of MPLS headers.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David Ahern <dsahern@gmail.com>
John Hurley [Wed, 10 Jul 2019 12:40:38 +0000 (13:40 +0100)]
lib: add mpls_uc and mpls_mc as link layer protocol names
Update the llproto_names array to allow users to reference the mpls
protocol ids with the names 'mpls_uc' for unicast MPLS and 'mpls_mc' for
multicast.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David Ahern <dsahern@gmail.com>
devlink: Introduce PCI PF and VF port flavour and attribute
Introduce PCI PF and VF port flavour and port attributes such as PF
number and VF number.
$ devlink port show
pci/0000:05:00.0/0: type eth netdev eth0 flavour pcipf pfnum 0
pci/0000:05:00.0/1: type eth netdev eth1 flavour pcivf pfnum 0 vfnum 0
pci/0000:05:00.0/2: type eth netdev eth2 flavour pcivf pfnum 0 vfnum 1
Vincent Bernat [Sun, 7 Jul 2019 17:51:15 +0000 (19:51 +0200)]
ip: bond: add peer notification delay support
Ability to tweak the delay between gratuitous ND/ARP packets has been
added in kernel commit 07a4ddec3ce9 ("bonding: add an option to
specify a delay between peer notifications"), through
IFLA_BOND_PEER_NOTIF_DELAY attribute. Add support to set and show this
value.
Example:
$ ip -d link set bond0 type bond peer_notify_delay 1000
$ ip -d link l dev bond0
2: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
state UP mode DEFAULT group default qlen 1000
link/ether 50:54:33:00:00:01 brd ff:ff:ff:ff:ff:ff
bond mode active-backup active_slave eth0 miimon 100 updelay 0
downdelay 0 peer_notify_delay 1000 use_carrier 1 arp_interval 0
arp_validate none arp_all_targets any primary eth0
primary_reselect always fail_over_mac active xmit_hash_policy
layer2 resend_igmp 1 num_grat_arp 5 all_slaves_active 0 min_links
0 lp_interval 1 packets_per_slave 1 lacp_rate slow ad_select
stable tlb_dynamic_lb 1 addrgenmode eu
Signed-off-by: Vincent Bernat <vincent@bernat.ch> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David Ahern <dsahern@gmail.com>
Andrea Claudi [Mon, 8 Jul 2019 09:36:42 +0000 (11:36 +0200)]
ip-route: fix json formatting for metrics
Setting metrics for routes currently leads to non-parsable
json output. For example:
$ ip link add type dummy
$ ip route add 192.168.2.0 dev dummy0 metric 100 mtu 1000 rto_min 3
$ ip -j route | jq
parse error: ':' not as part of an object at line 1, column 319
Fix this by opening a json object in the metrics array and using
print_string() instead of fprintf().
This is the output for the above commands applying this patch:
David Ahern [Tue, 9 Jul 2019 21:54:34 +0000 (14:54 -0700)]
ss: Change resolve_services to numeric
Commit ca697cee4cfc ("ip: add a new parameter -Numeric") changed
!resolve_services to numeric in ss.c.
A commit in master: d791e75d74ff ("ss: in --numeric mode, print raw numbers for data rates")
added another reference to !resolve_services. Convert it to numeric.
Andrea Claudi [Thu, 27 Jun 2019 14:47:45 +0000 (16:47 +0200)]
tc: netem: fix r parameter in Bernoulli loss model
As the man page for tc netem states:
To use the Bernoulli model, the only needed parameter is p while the
others will be set to the default values r=1-p, 1-h=1 and 1-k=0.
However, the r parameter is erroneously set to 1, and not to 1-p.
Fix this using the same approach as the 4-state loss model.
Fixes: 3c7950af598be ("netem: add support for 4 state and GE loss model") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
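The defaults quoted from the man page in the netem fix above (r=1-p, 1-h=1, 1-k=0) can be written out as a small sketch. The struct and function names are hypothetical, and plain doubles in the range 0.0 to 1.0 are assumed here purely for illustration; how tc scales probabilities internally is not covered.

```c
/* Hypothetical container for the loss model parameters, expressed as
 * fractions between 0.0 and 1.0. */
struct ge_loss {
	double p;		/* probability of moving to the bad state */
	double r;		/* probability of moving back to the good state */
	double one_minus_h;	/* loss probability in the bad state */
	double one_minus_k;	/* loss probability in the good state */
};

/* Bernoulli model: only p is given, the rest take their defaults. */
static struct ge_loss bernoulli_defaults(double p)
{
	struct ge_loss ge = {
		.p = p,
		.r = 1.0 - p,		/* the default is 1-p, not a constant 1 */
		.one_minus_h = 1.0,
		.one_minus_k = 0.0,
	};

	return ge;
}
```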
Denis Kirjanov [Fri, 28 Jun 2019 09:54:25 +0000 (11:54 +0200)]
ipaddress: correctly print a VF hw address in the IPoIB case
The current code assumes that we print an Ethernet MAC address, which
doesn't work in the IPoIB case with SRIOV-enabled hardware.
Before:
11: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 256
link/infiniband
80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
vf 0 MAC 14:80:00:00:66:fe, spoof checking off, link-state
disable,
trust off, query_rss off
...
After:
11: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 256
link/infiniband
80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
vf 0 link/infiniband
80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof
checking off, link-state disable, trust off, query_rss off
v1->v2: updated kernel headers to uapi commit
v2->v3: fixed alignment
v3->v4: aligned print statements as used through the source
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Signed-off-by: David Ahern <dsahern@gmail.com>
[ committer note: flipped argument order for print_vfinfo to keep fp first
and fixed alignment issues ]
Hoang Le [Tue, 25 Jun 2019 04:34:39 +0000 (11:34 +0700)]
tipc: support interface name when activating UDP bearer
Support indicating the name of an interface that has an ip address, in
parallel with specifying an ip address, when activating a UDP bearer.
This liberates the user from keeping track of the current ip address
for each device.
Old command syntax:
$tipc bearer enable media udp name NAME localip IP
New command syntax:
$tipc bearer enable media udp name NAME [localip IP|dev DEVICE]
v2:
- Removed initial value for fd
- Fixed the return value of cmd_bearer_validate_and_get_addr
to make it consistent with its use: zero or non-zero
v3: - Switch to use helper 'get_ifname' to retrieve interface name
v4: - Replace legacy SIOCGIFADDR by netlink
v5: - Fix leaky rtnl_handle
Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David Ahern <dsahern@gmail.com>
Baruch Siach [Thu, 27 Jun 2019 18:37:19 +0000 (21:37 +0300)]
devlink: fix libc and kernel headers collision
Since commit 2f1242efe9d ("devlink: Add devlink health show command") we
use the sys/sysinfo.h header for the sysinfo(2) system call. But since
iproute2 carries a local version of the kernel struct sysinfo, this
causes a collision with libcs that do not rely on the kernel-defined
sysinfo, like musl libc:
In file included from devlink.c:25:0:
.../sysroot/usr/include/sys/sysinfo.h:10:8: error: redefinition of 'struct sysinfo'
struct sysinfo {
^~~~~~~
In file included from ../include/uapi/linux/kernel.h:5:0,
from ../include/uapi/linux/netlink.h:5,
from ../include/uapi/linux/genetlink.h:6,
from devlink.c:21:
../include/uapi/linux/sysinfo.h:8:8: note: originally defined here
struct sysinfo {
^~~~~~~
Move the sys/sysinfo.h userspace header before kernel headers, and
suppress the indirect include of linux/sysinfo.h.
Cc: Aya Levin <ayal@mellanox.com> Cc: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Baruch Siach [Thu, 27 Jun 2019 18:37:18 +0000 (21:37 +0300)]
devlink: fix format string warning for 32bit targets
32bit targets define uint64_t as long long unsigned. This leads to the
following build warning:
devlink.c: In function ‘pr_out_u64’:
devlink.c:1729:11: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
pr_out("%s %lu", name, val);
^
devlink.c:59:21: note: in definition of macro ‘pr_out’
fprintf(stdout, ##args); \
^~~~
Use uint64_t specific conversion specifiers in the format string to fix
that.
Cc: Aya Levin <ayal@mellanox.com> Cc: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
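The fix named in the format string commit above, using a uint64_t-specific conversion specifier, relies on the PRIu64 macro from <inttypes.h>. A minimal sketch follows; format_u64() is a hypothetical helper written for this illustration and is not the devlink pr_out_u64() function.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format "name value" into buf.
 * PRIu64 expands to the right length modifier for uint64_t on the
 * target, so the same source builds cleanly on 32-bit and 64-bit. */
static int format_u64(char *buf, size_t len, const char *name, uint64_t val)
{
	return snprintf(buf, len, "%s %" PRIu64, name, val);
}
```

Hard-coding "%lu" is only correct where uint64_t happens to be unsigned long, and hard-coding "%llu" only where it is unsigned long long; the macro avoids choosing.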
Andrea Claudi [Tue, 25 Jun 2019 10:29:57 +0000 (12:29 +0200)]
ip address: do not set mngtmpaddr option for IPv4 addresses
The 'mngtmpaddr' option makes the kernel manage temporary addresses
created from the specified one as a template on behalf of Privacy
Extensions (RFC3041). This option should be available only for
IPv6 addresses, as correctly stated in the manpage.
However it is possible to set mngtmpaddr on IPv4 addresses, too:
$ ip link add dummy0 type dummy
$ ip -4 addr add 192.168.1.1 dev dummy0 mngtmpaddr
$ ip a
1: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 1a:6d:c6:96:ca:f8 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.1/32 scope global mngtmpaddr dummy0
valid_lft forever preferred_lft forever
Fix this by adding a check on the protocol family before setting
the IFA_F_MANAGETEMPADDR flag.
Fixes: 5b7e21c417bea ("add support for IFA_F_MANAGETEMPADDR") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Tue, 25 Jun 2019 10:29:56 +0000 (12:29 +0200)]
ip address: do not set home option for IPv4 addresses
The 'home' option designates an IPv6 address as "home address" as
defined in RFC 6275. This option should be available only for
IPv6 addresses, as correctly stated in the manpage.
However it is possible to set home on IPv4 addresses, too:
$ ip link add dummy0 type dummy
$ ip -4 addr add 192.168.1.1 dev dummy0 home
$ ip a
1: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 1a:6d:c6:96:ca:f8 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.1/32 scope global home dummy0
valid_lft forever preferred_lft forever
Fix this by adding a check on the protocol family before setting
the IFA_F_HOMEADDRESS flag.
Fixes: bac735c53a36d ("enabled to manipulate the flags of IFA_F_HOMEADDRESS or IFA_F_NODAD from ip.") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Tue, 25 Jun 2019 10:29:55 +0000 (12:29 +0200)]
ip address: do not set nodad option for IPv4 addresses
Duplicate Address Detection (RFC 4862) is available only for IPv6
addresses. As a consequence, the 'nodad' option, which turns it off, should
be available only for IPv6, and is defined like that in the man page.
However it is possible to set nodad on IPv4 addresses, too:
$ ip link add dummy0 type dummy
$ ip -4 addr add 192.168.1.1 dev dummy0 nodad
$ ip a
1: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 1a:6d:c6:96:ca:f8 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.1/32 scope global nodad dummy0
valid_lft forever preferred_lft forever
Fix this by adding a check on the protocol family before setting
the IFA_F_NODAD flag.
Fixes: bac735c53a36d ("enabled to manipulate the flags of IFA_F_HOMEADDRESS or IFA_F_NODAD from ip.") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
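The three 'ip address' fixes above share one pattern: check the protocol family before setting an IPv6-only flag. A minimal sketch of that pattern is given here. The EX_FLAG_* constants and the function name are hypothetical stand-ins, defined locally so the example does not depend on kernel headers; they are not the real IFA_F_* values.

```c
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical stand-ins for IPv6-only address flags. */
#define EX_FLAG_NODAD		0x02u
#define EX_FLAG_HOMEADDRESS	0x10u
#define EX_FLAG_MANAGETEMPADDR	0x100u

/* Set an IPv6-only flag, refusing it for any other address family.
 * Returns 0 on success, -1 if the flag does not apply to 'family'. */
static int set_ipv6_only_flag(int family, uint32_t *flags, uint32_t flag)
{
	if (family != AF_INET6)
		return -1;	/* leave *flags untouched for IPv4 */

	*flags |= flag;
	return 0;
}
```

Whether the caller should treat the -1 as a hard error or just warn is a separate design choice that the commit messages above do not spell out.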
Stefano Brivio [Tue, 25 Jun 2019 11:41:24 +0000 (13:41 +0200)]
iproute: Set flags and attributes on dump to get IPv6 cached routes to be flushed
With a current (5.1) kernel version, IPv6 exception routes can't be listed
(ip -6 route list cache) or flushed (ip -6 route flush cache). Kernel
support for this is being added back. Relevant net-next commits:
564c91f7e563 fib_frontend, ip6_fib: Select routes or exceptions dump from RTM_F_CLONED
ef11209d4219 Revert "net/ipv6: Bail early if user only wants cloned entries"
3401bfb1638e ipv6/route: Don't match on fc_nh_id if not set in ip6_route_del()
bf9a8a061ddc ipv6/route: Change return code of rt6_dump_route() for partial node dumps
1e47b4837f3b ipv6: Dump route exceptions if requested
40cb35d5dc04 ip6_fib: Don't discard nodes with valid routing information in fib6_locate_1()
However, to allow the kernel to filter routes based on the RTM_F_CLONED
flag, we need to make sure this flag is always passed when we want cached
routes to be dumped, and we can also pass table and output interface
attributes to have the kernel filtering on them, if requested by the user.
Use the existing iproute_dump_filter() as a filter for the dump request in
iproute_flush(). This way, 'ip -6 route flush cache' works again.
v2: Instead of creating a separate 'filter' function dealing with
RTM_F_CACHED only, use the existing iproute_dump_filter() and get
table and oif kernel filtering for free. Suggested by David Ahern.
Hangbin Liu [Wed, 26 Jun 2019 01:44:07 +0000 (09:44 +0800)]
ip/iptoken: fix dump error when ipv6 disabled
When we disable IPv6 at boot (ipv6.disable=1), there will be
no IPv6 route info in the dump message. If we return -1 when
ifi->ifi_family != AF_INET6, we will get an error like
$ ip token list
Dump terminated
which confuses the user. There is no need to return -1 if the
dump message does not match. Returning 0 is enough.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
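The return-value convention behind the iptoken fix above can be sketched with a toy dump loop. Everything here is hypothetical (the link_msg struct, count_ipv6_links(), walk_dump()); it only models the idea that a negative return from a per-message callback aborts the whole dump, so a message that merely does not match should return 0.

```c
#include <sys/socket.h>

/* Hypothetical, simplified stand-in for one message in a dump. */
struct link_msg {
	int family;
	int index;
};

/* Per-message callback: count IPv6 entries, skip everything else. */
static int count_ipv6_links(const struct link_msg *msg, int *count)
{
	if (msg->family != AF_INET6)
		return 0;	/* not a match: skip it, keep the dump going */

	(*count)++;
	return 0;
}

/* Toy dump loop: a negative return from the callback ends the dump early,
 * which is what surfaces to the user as "Dump terminated". */
static int walk_dump(const struct link_msg *msgs, int n, int *count)
{
	for (int i = 0; i < n; i++) {
		if (count_ipv6_links(&msgs[i], count) < 0)
			return -1;
	}
	return 0;
}
```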
Eyal Birger [Mon, 24 Jun 2019 15:14:57 +0000 (18:14 +0300)]
tc: adjust xtables_match and xtables_target to changes in recent iptables
iptables commit 933400b37d09 ("nft: xtables: add the infrastructure to translate from iptables to nft")
added an additional member to struct xtables_match and struct xtables_target.
This change is available for libxtables12 and up.
Add these members conditionally to support both newer and older versions.
Fixes: dd29621578d2 ("tc: add em_ipt ematch for calling xtables matches from tc matching context") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Jakub Kicinski [Tue, 18 Jun 2019 00:49:29 +0000 (17:49 -0700)]
tc: q_netem: JSON-ify the output
Add JSON output support to q_netem.
The normal output is untouched.
In JSON output always use seconds as the base of time units,
and non-percentage numbers (0.01 instead of 1%). Try to always
report the fields, even if they are zero.
All this should make the output more machine-friendly.
v2: fewer macros
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Matteo Croce [Tue, 18 Jun 2019 14:49:35 +0000 (16:49 +0200)]
netns: make netns_{save,restore} static
The netns_{save,restore} functions are only used in ipnetns.c now, since
the restore is not needed anymore after the netns exec command.
Move them into ipnetns.c, and make them static.
Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Matteo Croce [Tue, 18 Jun 2019 14:49:34 +0000 (16:49 +0200)]
ip vrf: use hook to change VRF in the child
On vrf exec, reset the VRF associations in the child process, via the
new hook added to cmd_exec(). In this way, the parent doesn't have to
reset the VRF associations before spawning other processes.
Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Matteo Croce [Tue, 18 Jun 2019 14:49:33 +0000 (16:49 +0200)]
netns: switch netns in the child when executing commands
'ip netns exec' changes the current netns just before executing a child
process, and restores it after forking. This is needed if we're running
in batch or do_all mode.
Some cleanups must be done both in the parent and in the child: the
parent must restore the previous netns, while the child must reset any
VRF association.
Unfortunately, if do_all is set, the VRF associations are not reset in the child, and
the spawned processes are started with the wrong VRF context. This can
be triggered with this script:
# ip -b - <<-'EOF'
link add type vrf table 100
link set vrf0 up
link add type dummy
link set dummy0 vrf vrf0 up
netns add ns1
EOF
# ip -all -b - <<-'EOF'
vrf exec vrf0 true
netns exec setsid -f sleep 1h
EOF
# ip vrf pids vrf0
314 sleep
# ps 314
PID TTY STAT TIME COMMAND
314 ? Ss 0:00 sleep 1h
Refactor cmd_exec() and pass to it a function pointer which is called in
the child before the final exec. In the netns exec case the function just
resets the VRF and switches netns.
Doing it in the child is less error prone and safer, because the parent
environment is always kept unaltered.
After this refactor some utility functions became unused, so remove them.
Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
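The refactor described in the netns commit above, passing a function pointer that runs in the child before the final exec, can be sketched as follows. The names run_with_child_setup() and child_setup_fn are hypothetical and the signature is an assumption; this is not the cmd_exec() prototype from iproute2, only an illustration of why doing the setup in the child leaves the parent untouched.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical hook type: runs in the child just before exec. */
typedef int (*child_setup_fn)(void *arg);

/* Fork, run the optional setup hook in the child, then exec 'cmd'.
 * Returns the child's exit status, or -1 on failure. */
static int run_with_child_setup(const char *cmd, char **argv,
				child_setup_fn setup, void *arg)
{
	int status;
	pid_t pid;

	fflush(NULL);	/* avoid duplicated stdio buffers after fork */

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: any namespace or VRF switch would happen here,
		 * so the parent's environment is never modified. */
		if (setup && setup(arg) != 0)
			_exit(1);
		execvp(cmd, argv);
		_exit(127);	/* exec failed */
	}

	/* Parent: nothing to restore, just wait for the child. */
	if (waitpid(pid, &status, 0) < 0)
		return -1;

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the hook only ever runs after fork(), a batch or do_all loop in the parent can call this repeatedly without saving and restoring its own state between commands.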