tc: f_flower: add geneve option match support to flower
Allow matching on options in Geneve tunnel headers.
The options can be described in the form
CLASS:TYPE:DATA/CLASS_MASK:TYPE_MASK:DATA_MASK, where CLASS is
represented as a 16bit hexadecimal value, TYPE as an 8bit
hexadecimal value and DATA as a variable length hexadecimal value.
e.g.
# ip link add name geneve0 type geneve dstport 0 external
# tc qdisc add dev geneve0 ingress
# tc filter add dev geneve0 protocol ip parent ffff: \
flower \
enc_src_ip 10.0.99.192 \
enc_dst_ip 10.0.99.193 \
enc_key_id 11 \
geneve_opts 0102:80:1122334421314151/ffff:ff:ffffffffffffffff \
ip_proto udp \
action mirred egress redirect dev eth1
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Fixes: c60389e4f9ea ("libnetlink: fix leak and using unused memory on error") Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Lorenzo Bianconi [Fri, 21 Sep 2018 13:34:25 +0000 (15:34 +0200)]
iplink_vxlan: take into account preferred_family creating vxlan device
Take into account the configured preferred_family if neither saddr or
daddr are provided since otherwise vxlan kernel module will use IPv4 as
default remote inet family neglecting the one provided by userspace.
This behaviour was originally in commit 97d564b90ccb ("vxlan: use
preferred address family when neither group or remote is specified").
The issue can be triggered with the following reproducer:
$ip -6 link add vxlan1 type vxlan id 42 dev enp0s2 \
proxy nolearning l2miss l3miss
$bridge fdb add 46:47:1f:a7:1c:25 dev vxlan1 dst 2000::2
RTNETLINK answers: Address family not supported by protocol
Fixes: 1e9b8072de2c ("iplink_vxlan: Get rid of inet_get_addr()") Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Hangbin Liu [Tue, 18 Sep 2018 09:48:40 +0000 (17:48 +0800)]
iplink: fix incorrect any address handling for ip tunnels
After commit d42c7891d26e4 ("utils: Do not reset family for default, any,
all addresses"), when call get_addr() for any/all addresses, we will set
addr->flags to ADDRTYPE_INET_UNSPEC if family is AF_INET/AF_INET6, which
makes is_addrtype_inet() checking passed and assigns incorrect address
to kernel. The ip link cmd will return error like:
]# ip link add ipip1 type ipip local any remote 1.1.1.1
RTNETLINK answers: Numerical result out of range
Fix it by using is_addrtype_inet_not_unspec() to avoid unspec addresses.
geneve, vxlan are not affected as they use AF_UNSPEC family when call
get_addr()
Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: d42c7891d26e4 ("utils: Do not reset family for default, any, all addresses") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Petr Vorel [Wed, 19 Sep 2018 23:36:22 +0000 (01:36 +0200)]
testsuite: Fix missing generate_nlmsg
Commit ad23e152 caused generate_nlmsg to be always missing:
$ make alltests
make: ./tools/generate_nlmsg: Command not found
Create testclean: to remove only results directory.
Fixes: ad23e152 testsuite: remove all temp files and implement make clean Signed-off-by: Petr Vorel <petr.vorel@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
David Ahern [Fri, 21 Sep 2018 00:50:55 +0000 (17:50 -0700)]
Merge branch 'iproute2-master' into iproute2-next
Conflicts:
ip/iproute_lwtunnel.c
In addition to merge conflict between bd59e5b1517b and 94a8722f2f78,
updated the code added by the latter commit based on the change of the
former (ie., added ret = to the new rta_addattr_l).
Leon Romanovsky [Sun, 16 Sep 2018 17:28:13 +0000 (20:28 +0300)]
rdma: Fix representation of PortInfo CapabilityMask
The port capability mask represents IBTA PortInfo specification,
but as it is written in description of kernel commit 2f944c0fbf58
("RDMA: Fix storage of PortInfo CapabilityMask in the kernel"),
the bit 26 was mistakenly overwritten.
The rdmatool followed it too and mislead users by presenting wrong
value. Since it never showed proper value, we update the whole
port_cap_mask to comply with IBTA and show real HW values.
Fixes: da990ab40a92 ("rdma: Add link object") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
libnetlink: fix leak and using unused memory on error
If an error happens in multi-segment message (tc only)
then report the error and stop processing further responses.
This also fixes refering to the buffer after free.
The sequence check is not necessary here because the
response message has already been validated to be in
the window of the sequence number of the iov.
Hangbin Liu [Wed, 12 Sep 2018 01:39:44 +0000 (09:39 +0800)]
bridge/mdb: fix missing new line when show bridge mdb
The bridge mdb show is broken on current iproute2. e.g.
]# bridge mdb show
34: br0 veth0_br 224.1.1.2 temp 34: br0 veth0_br 224.1.1.1 temp
After fix:
]# bridge mdb show
34: br0 veth0_br 224.1.1.2 temp
34: br0 veth0_br 224.1.1.1 temp
v2: Use json print lib as Stephen suggested.
v3: No need to use is_json_context() as print_string() could handle both cases.
v4: use new function print_nl() to print new line in non-json mode.
Reported-by: Ying Xu <yinxu@redhat.com> Fixes: c7c1a1ef51aea ("bridge: colorize output and use JSON print library") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
When the GSO splitting was turned into dual split-gso/no-split-gso options,
the printing of the latter was left out. Add that, so output is consistent
with the options passed.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Phil Sutter [Thu, 6 Sep 2018 13:31:51 +0000 (15:31 +0200)]
ip-route: Fix segfault with many nexthops
It was possible to crash ip-route by adding an IPv6 route with 37
nexthop statements. A simple reproducer is:
| for i in `seq 37`; do
| nhs="nexthop via 1111::$i "$nhs
| done
| ip -6 route add 3333::/64 $nhs
The related code was broken in multiple ways:
* parse_one_nh() assumed that rta points to 4kB of storage but caller
provided just 1kB. Fixed by passing 'len' parameter with the correct
value.
* Error checking of rta_addattr*() calls in parse_one_nh() and called
functions was completely absent, so with above fix in place output
flood would occur due to parser looping forever.
While being at it, increase message buffer sizes to 4k. This allows for
at most 144 nexthops.
Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
David Ahern [Thu, 30 Aug 2018 18:08:43 +0000 (11:08 -0700)]
Merge branch 'netem-slot-param' into iproute2-next
Yousuk Seung says:
====================
This series adds support for the new "slot" netem parameter for
slotting. Slotting is an approximation of shared media that gather up
packets within a varying delay window before delivering them nearly at
once.
Yousuk Seung [Mon, 27 Aug 2018 02:42:30 +0000 (19:42 -0700)]
q_netem: slotting with non-uniform distribution
Extend slotting with support for non-uniform distributions. This is
similar to netem's non-uniform distribution delay feature.
Syntax:
slot distribution DISTRIBUTION DELAY JITTER [packets MAX_PACKETS] \
[bytes MAX_BYTES]
The syntax and use of the distribution table is the same as in the
non-uniform distribution delay feature. A file DISTRIBUTION must be
present in TC_LIB_DIR (e.g. /usr/lib/tc) containing numbers scaled by
NETEM_DIST_SCALE. A random value x is selected from the table and it
takes DELAY + ( x * JITTER ) as delay. Correlation between values is not
supported.
Examples:
Normal distribution delay with mean = 800us and stdev = 100us.
> tc qdisc add dev eth0 root netem slot distribution normal \
800us 100us
Optionally set the max slot size in bytes and/or packets.
> tc qdisc add dev eth0 root netem slot distribution normal \
800us 100us bytes 64k packets 42
Signed-off-by: Yousuk Seung <ysseung@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Dave Taht [Mon, 27 Aug 2018 02:42:29 +0000 (19:42 -0700)]
q_netem: support delivering packets in delayed time slots
Slotting is a crude approximation of the behaviors of shared media such
as cable, wifi, and LTE, which gather up a bunch of packets within a
varying delay window and deliver them, relative to that, nearly all at
once.
It works within the existing loss, duplication, jitter and delay
parameters of netem. Some amount of inherent latency must be specified,
regardless.
The new "slot" parameter specifies a minimum and maximum delay between
transmission attempts.
The "bytes" and "packets" parameters can be used to limit the amount of
information transferred per slot.
A more correct example, using stacked netem instances and a packet limit
to emulate a tail drop wifi queue with slots and variable packet
delivery, with a 200Mbit isochronous underlying rate, and 20ms path
delay:
Dave Taht [Mon, 27 Aug 2018 02:42:28 +0000 (19:42 -0700)]
tc: support conversions to or from 64 bit nanosecond-based time
Using a 32 bit field to represent time in nanoseconds results in a
maximum value of about 4.3 seconds, which is well below many observed
delays in WiFi and LTE, and barely in the ballpark for a trip past the
Earth's moon, Luna.
Using 64 bit time fields in nanoseconds allows us to simulate
network diameters of several hundred light-years. However, only
conversions to and from ns, us, ms, and seconds are provided.
The iproute2 64 bit api uses signed values for time. Being able to
represent positive or negative time allows us to calculate +/- deltas
between, for example, the CLOCK_TAI and CLOCK_REALTIME clocks.
Time related utility functions in tc_util.c are moved to lib/utils.c.
Signed-off-by: Yousuk Seung <ysseung@google.com> Signed-off-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Mahesh Bandewar [Thu, 23 Aug 2018 01:01:37 +0000 (18:01 -0700)]
iproute: make clang happy
These are primarily fixes for "string is not string literal" warnings
/ errors (with -Werror -Wformat-nonliteral). This should be a no-op
change. I had to replace couple of print helper functions with the
code they call as it was becoming harder to eliminate these warnings,
however these helpers were used only at couple of places, so no
major change as such.
Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Mahesh Bandewar [Thu, 23 Aug 2018 01:01:34 +0000 (18:01 -0700)]
ipmaddr: use preferred_family when given
When creating socket() AF_INET is used irrespective of the family
that is given at the command-line (with -4, -6, or -0). This change
will open the socket with the preferred family.
Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Cong Wang [Wed, 29 Aug 2018 17:09:27 +0000 (10:09 -0700)]
ss: add UNIX_DIAG_VFS and UNIX_DIAG_ICONS for unix sockets
UNIX_DIAG_VFS and UNIX_DIAG_ICONS are never used by ss,
make them available in ss -e output.
Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Stefan Bader [Tue, 28 Aug 2018 14:27:29 +0000 (16:27 +0200)]
iprule: Fix destination prefix output
When adding support for JSON output the new code for printing
the destination prefix adds a stray blank character before
the bitmask. This causes some user-space parsing to fail.
Current output:
...: from x.x.x.x/l to y.y.y.y /l
Previous output:
...: from x.x.x.x/l to y.y.y.y/l
Fixes: 0dd4ccc5 "iprule: add json support" Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
q_cake: Add description of the tc filter override mechanism to man page
Since CAKE now has three different settings that can be overridden by tc
filters (priority and host and flow hashes), documenting how they work is
probably a good idea.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Luca Boccassi [Wed, 22 Aug 2018 18:09:02 +0000 (19:09 +0100)]
testsuite: let make compile build the netlink helper
The generate_nlmsg binary is required but make -C testsuite compile
does not build it. Add the necessary includes and C*FLAGS to the tools
Makefile and have the compile target build it.
Signed-off-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Luca Boccassi [Wed, 22 Aug 2018 18:09:01 +0000 (19:09 +0100)]
testsuite: remove all temp files and implement make clean
Some generated test files were not removed, including one executable in
the testsuite/tools directory.
Ensure make clean from the top level directory works for the testsuite
subdirs too, and that all the files are removed.
Signed-off-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Stefan Bader [Wed, 22 Aug 2018 08:31:38 +0000 (10:31 +0200)]
testsuite: Handle large number of kernel options
Once there are more than a certain number of kernel config options
set (this happened for us with kernel 4.17), the method of passing
those as command line arguments exceeds the maximum number of
arguments the shell supports. This causes the whole testsuite to
fail.
Instead, create a temporary file and modify its contents so that
the config option variables are exported. Then this file can be
sourced in before running the tests.
Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Phil Sutter [Fri, 17 Aug 2018 16:38:46 +0000 (18:38 +0200)]
lib: Make check_enable_color() return boolean
As suggested, turn return code into true/false although it's not checked
anywhere yet.
Fixes: 4d82962cccc6a ("Merge common code for conditionally colored output") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Phil Sutter [Tue, 14 Aug 2018 12:18:07 +0000 (14:18 +0200)]
testsuite: Prepare for ss tests
This merges the shared bits from ts_tc() and ts_ip() into a common
function for being wrapped by the first ones and adds a third ts_ss()
for testing ss commands.
Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Phil Sutter [Tue, 14 Aug 2018 12:18:06 +0000 (14:18 +0200)]
ss: Review ssfilter
The original problem was ssfilter rejecting single expressions if
enclosed in braces, such as:
| sport = 22 or ( dport = 22 )
This is fixed by allowing 'expr' to be an 'exprlist' enclosed in braces.
The no longer required recursion in 'exprlist' being an 'exprlist'
enclosed in braces is dropped.
In addition to that, a few other things are changed:
* Remove pointless 'null' prefix in 'appled' before 'exprlist'.
* For simple equals matches, '=' operator was required for ports but not
allowed for hosts. Make this consistent by making '=' operator
optional in both cases.
Reported-by: Samuel Mannehed <samuel@cendio.se> Fixes: b2038cc0b2403 ("ssfilter: Eliminate shift/reduce conflicts") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Phil Sutter [Wed, 15 Aug 2018 09:18:26 +0000 (11:18 +0200)]
man: ip-route: Clarify referenced versions are Linux ones
Versioning scheme of Linux and iproute2 is similar, therefore the
referenced kernel versions are likely to confuse readers. Clarify this
by prefixing each kernel version by 'Linux' prefix.
Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Phil Sutter [Wed, 15 Aug 2018 16:21:25 +0000 (18:21 +0200)]
bridge: Fix check for colored output
There is no point in calling enable_color() conditionally if it was
already called for each time '-color' flag was parsed. Align the
algorithm with that in ip and tc by actually making use of 'color'
variable.
Fixes: e9625d6aead11 ("Merge branch 'iproute2-master' into iproute2-next") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David Ahern <dsahern@gmail.com>
sch_skbprio is a qdisc that prioritizes packets according to their skb->priority
field. Under congestion, it drops already-enqueued lower priority packets to
make space available for higher priority packets. Skbprio was conceived as a
solution for denial-of-service defenses that need to route packets with
different priorities as a means to overcome DoS attacks.
Signed-off-by: Nishanth Devarajan <ndev2021@gmail.com> Reviewed-by: Michel Machado <michel@digirati.com.br> Signed-off-by: David Ahern <dsahern@gmail.com>
Tobias Klauser [Wed, 8 Aug 2018 12:33:40 +0000 (14:33 +0200)]
tc: bpf: update list of archs with eBPF support in manpage
Update the list of architectures supporting eBPF JIT as of Linux 4.18.
Also mention the Linux version where support for a particular
architecture was introduced. Finally, reformat the list of architectures
as a bullet list in order to make it more readable.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>