Add support for pipeline debug (dpipe). The headers are used both the
gain visibillity into the headers supported by the hardware, and to
build the headers/field database which is used by other commands.
Examples:
First we can see the headers supported by the hardware:
$devlink dpipe header show pci/0000:03:00.0
pci/0000:03:00.0:
name mlxsw_meta
field:
name erif_port bitwidth 32 mapping_type ifindex
name l3_forward bitwidth 1
name l3_drop bitwidth 1
Note that mapping_type is presented only if relevant. Also the header/
field id's are reported by the kernel they are not shown by default.
They can be observed by using the -v option. Also the headers scope
(global/local) is specified.
$devlink -v dpipe header show pci/0000:03:00.0
pci/0000:03:00.0:
name mlxsw_meta id 0 global false
field:
name erif_port id 0 bitwidth 32 mapping_type ifindex
name l3_forward id 1 bitwidth 1
name l3_drop id 2 bitwidth 1
Second we can examine the tables supported by the hardware. In order
to dump all the tables no table name should be provided:
$devlink dpipe table show pci/0000:03:00.0
In order to examine specific table its name have to be specified:
$devlink dpipe table show pci/0000:03:00.0 name erif
pci/0000:03:00.0:
name mlxsw_erif size 800 counters_enabled true
match:
type field_exact header mlxsw_meta field erif_port mapping ifindex
action:
type field_modify header mlxsw_meta field l3_forward
type field_modify header mlxsw_meta field l3_drop
To enable/disable counters on the table:
$devlink dpipe table set pci/0000:03:00.0 name erif counters enable
$devlink dpipe table set pci/0000:03:00.0 name erif counters disable
In order to see the current entries in the hardware for specific table:
$devlink dpipe table dump pci/0000:03:00.0 name erif
pci/0000:03:00.0:
index 0 counter 0
match_value:
type field_exact header mlxsw_meta field erif_port mapping ifindex mapping_value 383 value 0
action_value:
type field_modify header mlxsw_meta field l3_forward value 1
index 1 counter 0
match_value:
type field_exact header mlxsw_meta field erif_port mapping ifindex mapping_value 381 value 1
action_value:
type field_modify header mlxsw_meta field l3_forward value 1
In the above example the table contains two entries which does match
on erif port and forwards the packet or drop it (currently only the
forward count is implemented). The counter values are provided for
example. In case the counting is not enabled on the table the counters
will not be available.
Boris Pismenny [Sun, 30 Apr 2017 14:16:02 +0000 (17:16 +0300)]
ip xfrm: Add xfrm state crypto offload
syntax:
ip xfrm state .... offload dev <if-name> dir <in or out>
Example to add inbound offload:
ip xfrm state .... offload dev mlx0 dir in
Example to add outbound offload:
ip xfrm state .... offload dev mlx0 dir out
Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Daniel Borkmann [Fri, 28 Apr 2017 13:44:29 +0000 (15:44 +0200)]
bpf: add support for generic xdp
Follow-up to commit c7272ca72009 ("bpf: add initial support for
attaching xdp progs") to also support generic XDP. This adds an
indicator for loaded generic XDP programs when programs are loaded
as shown in c7272ca72009, but the driver still lacks native XDP
support.
# ip link
[...]
3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc [...]
link/ether 0c:c4:7a:03:f9:25 brd ff:ff:ff:ff:ff:ff
[...]
In case the driver does support native XDP, but the user wants
to load the program as generic XDP (e.g. for testing purposes),
then this can be done with the same semantics as in c7272ca72009,
but with 'xdpgeneric' instead of 'xdp' command for loading:
# ip -force link set dev eno1 xdpgeneric obj xdp.o
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Sun, 23 Apr 2017 12:53:56 +0000 (15:53 +0300)]
tc/pedit: p_udp: introduce pedit udp support
For example, forward udp traffic destined to port 999 to veth0 and set
tcp port to 888:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
flower \
ip_proto udp \
dst_port 999 \
action pedit ex munge \
udp dport set 888 \
action mirred egress \
redirect dev veth0
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Amir Vadai <amir@vadai.me>
Amir Vadai [Sun, 23 Apr 2017 12:53:55 +0000 (15:53 +0300)]
tc/pedit: p_tcp: introduce pedit tcp support
For example, forward tcp traffic destined to port 80 to veth0 and set
tcp port to 8080:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
flower \
ip_proto tcp \
dst_port 80 \
action pedit ex munge \
tcp dport set 8080 \
action mirred egress \
redirect dev veth0
Amir Vadai [Sun, 23 Apr 2017 12:53:54 +0000 (15:53 +0300)]
tc/pedit: p_eth: ETH header editor
For example, forward tcp traffic to veth0 and set
destination mac address to 11:22:33:44:55:66 :
$ tc filter add dev enp0s9 protocol ip parent ffff: \
flower \
ip_proto tcp \
action pedit ex munge \
eth dst set 11:22:33:44:55:66 \
action mirred egress \
redirect dev veth0
Amir Vadai [Sun, 23 Apr 2017 12:53:52 +0000 (15:53 +0300)]
tc/pedit: p_ip: introduce editing ttl header
Enable user to edit IP header ttl field.
For example, to forward any TCP packet and decrease its TTL by one:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
flower \
ip_proto tcp \
action pedit ex munge \
ip ttl add 0xff pipe \
action mirred egress \
redirect dev veth0
Amir Vadai [Sun, 23 Apr 2017 12:53:50 +0000 (15:53 +0300)]
tc/pedit: Extend pedit to specify offset relative to mac/transport headers
Utilize the extended pedit netlink to set an offset relative to a
specific header type. Old netlink only enabled the user to set
approximated offset relative to the IPv4 header.
To use this extended functionality need to use the 'ex' keyword after
'pedit' and before any 'munge'.
e.g:
$ tc filter add dev ens9 protocol ip parent ffff: \
flower \
ip_proto udp \
dst_port 80 \
action pedit ex munge \
ip dst set 1.1.1.1 \
pipe \
action mirred egress redirect dev veth0
Michal Kubeček [Thu, 27 Apr 2017 09:43:47 +0000 (11:43 +0200)]
routel: fix infinite loop in line parser
As noticed by one of the few users of routel script, it ends up in an
infinite loop when they pull out the cable from the NIC used for some
route. This is caused by its parser expecting the line of "ip route show"
output consists of "key value" pairs (except for the initial target range),
together with an old trap of Bourne style shells that "shift 2" does
nothing if there is only one argument left. Some keywords, e.g. "linkdown",
are not followed by a value.
Improve the parser to
(1) only set variables for keywords we care about
(2) recognize (currently) known keywords without value
This is still far from perfect (and certainly not future proof) but to
fully fix the script, one would probably have to rewrite the logic
completely (and I'm not sure it's worth the effort).
Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to
save user state that when retrieved serves as a correlator. The kernel
_should not_ intepret it. The user can store whatever they wish in the
128 bits.
Sample exercise(showing variable length use of cookie)
.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4
.. dump all gact actions..
sudo $TC -s actions ls action gact
action order 0: gact action pass
random type none pass val 0
index 1 ref 1 bind 0 installed 5 sec used 5 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4
.. bind the accept action to a filter..
sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1
... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
David Ahern [Fri, 14 Apr 2017 23:09:56 +0000 (16:09 -0700)]
ip vrf: Add command name next to pid
'ip vrf pids' is used to list processes bound to a vrf, but it only
shows the pid leaving a lot of work for the user. Add the command
name to the output. With this patch you get the more user friendly:
David Ahern [Fri, 14 Apr 2017 23:09:56 +0000 (16:09 -0700)]
ip vrf: Add command name next to pid
'ip vrf pids' is used to list processes bound to a vrf, but it only
shows the pid leaving a lot of work for the user. Add the command
name to the output. With this patch you get the more user friendly:
David Ahern [Fri, 24 Mar 2017 02:51:22 +0000 (19:51 -0700)]
ip netconf: show all families on dev request
Currently specifying a device to ip netconf and it dumps only values
for IPv4. Change this to dump data for all families unless a specific
family is given.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
David Ahern [Fri, 24 Mar 2017 02:51:21 +0000 (19:51 -0700)]
ip netconf: Show all address families by default in dumps
Currently, 'ip netconf' only shows ipv4 and ipv6 netconf settings. If IPv6
is not enabled, the dump ends with
RTNETLINK answers: Operation not supported
when IPv6 request is attempted. Further, if the mpls_router module is also
loaded a separate request is needed to get MPLS settings.
To make this better going forward, use the new PF_UNSPEC dump all option
if the kernel supports it. If the kernel does not, it sets NLMSG_ERROR and
returns EOPNOTSUPP which is trapped and we fall back to the existing output
to maintain compatibility with existing kernels.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
David Ahern [Fri, 24 Mar 2017 02:51:20 +0000 (19:51 -0700)]
netlink: Add flag to suppress print of nlmsg error
Allow callers of the dump API to handle nlmsg errors (e.g., an
unsupported feature). Setting RTNL_HANDLE_F_SUPPRESS_NLERR in the
rtnl_handle avoids unnecessary messages to the users in some case.
For example,
RTNETLINK answers: Operation not supported
when probing for support of a new feature.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
The maketable program used to generate one of the configuration
files at build time for netem would access past the end of the array
for one input value. This is a bug inherited from original NISTnet.
Just fold the value, like other code there.
This is not a runtime error security problem.
It only impacts the build process if the build machine
had extra hardening enabled.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Robert Shearman [Tue, 11 Apr 2017 08:37:20 +0000 (09:37 +0100)]
iproute: Add support for ttl-propagation attribute
Add support for setting and displaying the ttl-propagation attribute
initially used by MPLS to control propagation of MPLS TTL to IPv4/IPv6
TTL/hop-limit on popping final label on a per-route basis.
Signed-off-by: Robert Shearman <rshearma@brocade.com> Acked-by: David Ahern <dsa@cumulusnetworks.com>
Phil Sutter [Tue, 28 Mar 2017 21:19:39 +0000 (23:19 +0200)]
ip: link: Add missing link type help texts
These are basically stubs: The types which lacked their own help text
simply don't accept any options (yet). Still it might be a bit confusing
to users if they are presented with the generic 'ip link' help text
instead of something saying there are no type specific options.
Phil Sutter [Tue, 28 Mar 2017 21:19:38 +0000 (23:19 +0200)]
ip: link: Unify link type help functions a bit
Take help function in iplink_bridge.c as an example and make other link
types' help functions similar:
* Use a single fprintf() call (if possible).
* Don't state a full command line, just "... type OPTIONS".
* Put every option in it's own line, align options by column.
* List mandatory options first.
link_veth.c is intentionally left untouched because it's 'peer' option
eats all kinds of generic link options and the help text points this out
without duplicating all the options there again.
While generating PDFs from the man pages, I saw the warning below from
several files. Compared the tc-matchall.8 with bridge.8 and used .RI
instead of .R. It should have no effect on the man page rendering.
`R' is a string (producing the registered sign), not a macro.
Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Vincent Bernat [Thu, 9 Mar 2017 20:05:42 +0000 (21:05 +0100)]
vxlan: use preferred address family when neither group or remote is specified
When neither group or remote is specified (or if they are specified with
the any address), nothing is sent to the kernel. In this case, the
kernel defaults to IPv4. This makes impossible to use IPv6 with
unspecified unicast remote ("bridge fdb add" will return
EAFNOTSUPPORT).
If the user specifies a preferred address family (eg, "ip -6 link add"),
then send either IFLA_VXLAN_GROUP or IFLA_VXLAN_GROUP6 to enforce the
use of the appropriate family.
Having some examples in the top level man page might make it a little bit easier
for new users to get started. Reused some words / sentences from the existing
man pages.
Suggested-by: 積丹尼 Dan Jacobson <jidanni@jidanni.org> Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Jiri Kosina [Wed, 8 Mar 2017 12:04:42 +0000 (13:04 +0100)]
iproute2: add support for invisible qdisc dumping
Support the new TCA_DUMP_INVISIBLE netlink attribute that allows asking
kernel to perform 'full qdisc dump', as for historical reasons some of the
default qdiscs are being hidden by the kernel.
The command syntax is being extended by voluntary 'invisible' argument to
'tc qdisc show'.
Robert Shearman [Thu, 9 Mar 2017 12:43:36 +0000 (12:43 +0000)]
iplink: add support for afstats subcommand
Add support for new afstats subcommand. This uses the new
IFLA_STATS_AF_SPEC attribute of RTM_GETSTATS messages to show
per-device, AF-specific stats. At the moment the kernel only supports
MPLS AF stats, so that is all that's implemented here.
The print_num function is exposed from ipaddress.c to be used for
printing the new stats so that the human-readable option, if set, can
be respected.
Daniel Borkmann [Mon, 6 Mar 2017 12:06:00 +0000 (13:06 +0100)]
bpf: test for valid type in bpf_get_work_dir
Jan-Erik reported an assertion in bpf_prog_to_subdir() failed where
type was BPF_PROG_TYPE_UNSPEC, which is only used in bpf_init_env()
to auto-mount and cache the bpf fs mount point.
Therefore, make sure when bpf_init_env() is called multiple times
(f.e. eBPF classifier with eBPF action attached) and bpf_mnt_cached
is set already that the type is also valid. In bpf_init_env(), we're
only interested in the mount point and not a type-specific subdir.
Fixes: e42256699cac ("bpf: make tc's bpf loader generic and move into lib") Reported-by: Jan-Erik Rediger <janerik@rediger.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jiri Kosina [Fri, 24 Feb 2017 17:28:54 +0000 (18:28 +0100)]
iproute2: tc: introduce build dependency on libnetlink
Rebuilding libnetlink doesn't trigger rebuild of tc, which is wrong
(especially so for builds where libnetlink.a gets statically linked into
tc). Fix that by introducing an explicit dependency.
Use the new helper functions rta_getattr_u* instead of direct
cast of RTA_DATA(). Where RTA_DATA() is a structure, then remove
the unnecessary cast since RTA_DATA() is void *
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Daniel Borkmann [Thu, 23 Feb 2017 12:07:14 +0000 (13:07 +0100)]
{f,m}_bpf: dump tag over insns
We already export TCA_BPF_TAG resp. TCA_ACT_BPF_TAG from kernel commit f1f7714ea51c ("bpf: rework prog_digest into prog_tag"), thus also dump
it when filter/actions are shown.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Roi Dayan [Wed, 22 Feb 2017 14:05:01 +0000 (16:05 +0200)]
tc: flower: Fix parsing ip address
Fix order of arguments when passed to __flower_parse_ip_addr.
Fixes: ("f888f4e20534 tc: flower: Support matching ARP") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com>
David Ahern [Tue, 21 Feb 2017 17:23:31 +0000 (09:23 -0800)]
ip: Add support for MPLS netconf
Add support for MPLS netconf to ip monitor and ip netconf commands.
Changes to header files not included as those are typically pulled
in my a header sync with the kernel.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
This patch adds support for a new xstats link subcommand which uses the
specified link type's new parse/print_ifla_xstats callbacks to display
extended statistics.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Leon Romanovsky [Tue, 14 Feb 2017 05:29:38 +0000 (07:29 +0200)]
devlink: Call dl_free in early exit case
Prior to parsing command options, the devlink tool allocates memory
to store results. In case of early exit (wrong parameters or version
check), this memory wasn't freed.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com>
Lucas Bates [Fri, 10 Feb 2017 23:28:54 +0000 (18:28 -0500)]
man page: add page for skbmod action
Signed-off-by: Lucas Bates <lucasb@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Roman Mashak <mrv@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@netronome.com>