git.proxmox.com Git - mirror

ip/tunnel: Unify local/remote endpoint address printing

Introduce and use tnl_print_endpoint() helper to print of tunnel
endpoint address.

Note that for AF_INET and AF_INET6 inet_ntop(3) is used that may return
NULL in case of failure and while unlikely format_host_rta() might
return NULL too. Handle this case when passing local/remote to
print_string().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

tcp_metric: Use get_addr_rta()

While there remove & from inet_prefix.data when since it is array.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

ipl2tp: Use get_addr_rta()

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

ipneigh: Use inet_addr_match_rta()

While there check return from get_prefix() for filter address.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

ipmroute: Use inet_addr_match_rta()

While there check return from get_prefix() for filter address.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

iprule: Use inet_addr_match_rta()

While there check return from get_prefix() for filter address.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

ipaddress: Use inet_addr_match_rta()

While there check return from get_prefix() for filter address.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

utils: Introduce get_addr_rta() and inet_addr_match_rta()

First is used to get address from netlink attribute to
inet_prefix data structure. Use memcpy() with constant
value to let complier optimize by replacing a call by
inlining load/store instructions.

Second is used to match address in given netlink attribute
with one given as reference. It matches successfully if
no attribute is given (@rta is NULL), reference address
family is AF_UNSPEC or it's length isn't given; fails if
get_attr_rta() can't get attribute or it's family does
not match reference; calls inet_addr_match() to get final
verdict.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

Merge branch 'unify_external' into iproute2-next

Serhey Popovych  says:

====================

With this series I want to unify collect metadata
handling in tunnels:

  1) Use "external" name for JSON and non-JSON output.

     Do not *print* any options when tunnel in
     collect metadata mode: gre6 already do
     this, so just apply to others.

  2) Do not *add* any attributes when configuring
     gre tunnel in collect metadata mode.

     Other tunnels (e.g. gre6, iptnl, ip6tnl)
     alredy do that.

This is next step in ipv4 and ipv6 modules
unification to prepare for merge in the future.

Any comments, suggestions and criticism as always
welcome.

v2
  For all tunnels implementing collect metadata
  use "external" keyword for both JSON. Thanks
  to Jiri Benc for detailed explanation.
====================

Signed-off-by: David Ahern <dsahern@gmail.com>

gre/gre6: Unify attribute addition to netlink buffer

There are couple of minor improvements:

  1) Check erspan_ver == 2 in gre6. It still could
     be 1 if erspan_idx is 0.

  2) Add tunnel encapsulation attributes only when
     collect metadata not in effect in gre.

  3) Trivial: address checkpatch issues.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

ip/tunnel: Be consistent when printing tunnel collect metadata

Print only "external" if collect meta data attribute
is given: rest of parameters are irrelevant. This is
to follow gre6.

For both JSON and non-JSON output use "external" for
all tunnels including vxlan and geneve.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

Merge branch 'iproute2-master' into iproute2-next

Signed-off-by: David Ahern <dsahern@gmail.com>

tc/lexer: let quotes actually start strings

The lexer will go with the longest match, so previously
the starting double quotes of a string would be swallowed by
the [^ \t\r\n()]+ pattern leaving the user no way to
actually use strings with escape sequences.
Fix this by not allowing this case to start with double
quotes.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>

iplink: Use ll_name_to_index() instead of if_nametoindex()

While benefit from using ll_name_to_index() with populated
cache can potentially be exploited only in few places
(e.g. bridge fdb/mdb/vlan show routines) there is another
advantage of ll_name_to_index() over plain if_nametoindex():

  in case of if_nametoindex() failure ll_name_to_index()
  will attempt to get index from common name in form "if%d"
  that may be returned from ll_index_to_name().

This makes output from ip(8) coherent with it's input.

Note that most of the code already switched from plain
if_nametoindex() to ll_name_to_index() to cached variant.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

vti/vti6: Minor improvements

In prepare of link_vti.c and link_vti6.c merge:

  1) Make @fwmark of __u32 type instead of unsigned int
     in vti to match with rest tunneling code.

  2) Report when unable to translate @link network device
     name to index instead of silently exiting in vti6.

  3) Remove newline separating local/remote attributes
     from the ikey/okey in vti6 to match vti module.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

iptnl/ip6tnl: Unify ttl/hoplimit parsing routines

Handle "inherit" case properly for gre6 and ip6tnl.

Use get_u8() in gre to parse ttl/hoplimit.

Be consistent about "hlim" alias to ttl/hoplimit
support.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

tunnel: Add space between encap-dport and encap-sport in non-JSON output

Fixes: bad76e6b1f44 ("ip/tunnel: Abstract tunnel encapsulation options printing")
Fixes: e2d4588331fc ("ip: link_gre.c: add json output support")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

gre/gre6: Post merge fixes

Few minor changes after merge of 'master' into 'net-next' branch:

  1) Follow 80 line length for printing erspan_index parameter
     as we did in master with commit 2a8d0f6e9c3f ("gre/tunnel:
     Print erspan_index using print_uint()").

  2) Remove remnants of encapsulation option printing: now it
     is done using tnl_print_encap() helper in commit bad76e6b1f44
     ("ip/tunnel: Abstract tunnel encapsulation options printing").

Fixes: 8c75f69411bc ("Merge branch 'master' into net-next")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>

Merge branch 'shared_block' into net-next

Jiri Pirko  says:

====================

From: Jiri Pirko <jiri@mellanox.com>

Kernel allows to share all filters between qdiscs with use
of shared block.

Example:

block number 22. "22" is just an identification:
$ tc qdisc add dev ens7 ingress_block 22 ingress
                        ^^^^^^^^^^^^^^^^
$ tc qdisc add dev ens8 ingress_block 22 ingress
                        ^^^^^^^^^^^^^^^^

If we don't specify "block" command line option, no shared block would
be created:
$ tc qdisc add dev ens9 ingress

Now if we list the qdiscs, we will see the block index in the output:

$ tc qdisc
qdisc ingress ffff: dev ens7 parent ffff:fff1 ingress_block 22
qdisc ingress ffff: dev ens8 parent ffff:fff1 ingress_block 22
qdisc ingress ffff: dev ens9 parent ffff:fff1

To make is more visual, the situation looks like this:

   ens7 ingress qdisc                 ens7 ingress qdisc
          |                                  |
          |                                  |
          +---------->  block 22  <----------+

Unlimited number of qdiscs may share the same block.

Block sharing is also supported for clsact qdisc:
$ tc qdisc add dev ens10 ingress_block 23 egress_block 24 clsact
$ tc qdisc show dev ens10
qdisc clsact ffff: dev ens10 parent ffff:fff1 ingress_block 23 egress_block 24

We can add filter using the block index:

$ tc filter add block 22 protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop

Note we cannot use the qdisc for filter manipulations of shared blocks:

$ tc filter add dev ens8 ingress protocol ip pref 1 flower dst_ip 192.168.100.2 action drop
Error: This filter block is shared. Please use the block index to manipulate the filters.

We will see the same output if we list filters for ingress qdisc of
ens7 and ens8, also for the block 22:

$ tc filter show block 22
filter protocol ip pref 25 flower chain 0
filter protocol ip pref 25 flower chain 0 handle 0x1
...

$ tc filter show dev ens7 ingress
filter block 22 protocol ip pref 25 flower chain 0
filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
...

$ tc filter show dev ens8 ingress
filter block 22 protocol ip pref 25 flower chain 0
filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
...

====================

Signed-off-by: David Ahern <dsahern@gmail.com>

tc: implement ingress/egress block index attributes for qdiscs

During qdisc creation it is possible to specify shared block for bot
ingress and egress. Pass this values to kernel according to the command
line options.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

tc: introduce support for block-handle for filter operations

So far, qdisc was the only handle that could be used to manipulate
filters. Kernel added support for using block to manipulate it. So add
the support to use block index to manipulate filters. The magic
TCM_IFINDEX_MAGIC_BLOCK indicates the block index is in use.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

tc: introduce tc_qdisc_block_exists helper

This hepler used qdisc dump to list all qdisc and find if block index in
question is used by any of them. That means the block with specified
index exists.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

Merge branch 'inet_get_addr' into net-next

Serhey Popovych says:

====================

It looks confusing to have multiple independent
routines to get internet address from it's string
representation: get_addr() and inet_get_addr().

Most complicated users of inet_get_addr() is
iplink_geneve.c and iplink_vxlan.c because they
required to handle both AF_INET and AF_INET6
for their local/remote endpoints.

On the other hand get_addr() does not provide
additional information like address type: need
to address this. to get rid of current and
possible future code duplications. Note that
this functionality is first step to make proto
independent handling of local/remote endpoints
in ip/tunnel code (there will be additional
series based on this one).

Also fix get_addr_1() and get_prefix() to make
sure it always provide correct ->family and
->bitlen.

As always comments, suggestions and criticism
are welcome.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>

ip: Get rid of inet_get_addr()

Both geneve and vxlan modules are converted to
use get_addr() we can replace inet_get_addr()
in less problematic places and finally get
rid of inet_get_addr().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

iplink_vxlan: Get rid of inet_get_addr()

Now we have additional information about address
class from get_addr() we can use it in place of
inet_get_addr().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

iplink_geneve: Get rid of inet_get_addr()

Now we have additional information about address
class from get_addr() we can use it in place of
inet_get_addr().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

utils: Fast inet address classification after get_addr()

It looks very useful to receive additional information
from get_addr_1() and get_addr() about address to simplify
caller and get rid of code duplications.

For now following information can be returned:

  1) address is unspecified (zero)
  2) address is multicast
  3) address is internet: family is either AF_INET or
     AF_INET6.

More information can be added in the future.

Introduce inline helpers to make code using this new
address classification interface more self explaining:

  bool is_addrtype_inet(inet_prefix *addr)
    true if @addr is inet address

  bool is_addrtype_inet_unspec(inet_prefix *addr)
    true if @addr is unspecified inet address

  bool is_addrtype_inet_multi(inet_prefix *addr)
    true if @addr is multicast inet address

  bool is_addrtype_inet_not_unspec(inet_prefix *addr)
    true if @addr is not unspecified inet address
    false if @addr is not inet or unspecified inet

  bool is_addrtype_inet_not_multi(inet_prefix *addr)
    true if @addr is not multicast inet address
    false if @addr is not inet or multicast inet

Last two are useful for case when we need inet address
that is not unspecified or multicast.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

utils: Always specify family and ->bytelen in get_prefix_1()

Handle default/all/any special case in get_addr_1() to setup
->family and ->bytelen correctly.

Make get_addr_1() return ->bitlen == -2 instead of -1 to
distinguish default/all/any special case from the rest:
it is safe because all callers check ->bitlen < 0, not
explicit value -1.

Reduce intendation by one level and get rid of goto/label
to make code more readable.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

utils: Always specify family for address in get_addr_1()

Set ->family correctly when string representing address
is "default", "all" or "any": get_addr_1() might be called
with AF_UNSPEC (e.g. get_addr() -> get_addr_1()).

Extend support for zero address to all address families,
not only AF_INET and AF_INET6 when one explicitly given
as @family: use af_byte_len() to correctly set address length.

Still assume AF_INET when @family is AF_UNSPEC.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

Merge branch 'master' into net-next

Conflicts:
ip/link_gre.c
ip/link_gre6.c

Signed-off-by: David Ahern <dsahern@gmail.com>

bpf: support map offload

When program is loaded with a specified ifindex, use that
ifindex also when creating maps.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>

tc: red: allow setting th_min and th_max to the same value

Setting th_min and th_max to the same value may be useful for DCTCP
deployments. The original DCTCP paper describes it as a simplest way
of achieving simple ECN threshold marking. Indeed, there doesn't seem
to be any simpler qdisc in Linux which would allow such a setup today.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

Update kernel headers to 4.15-rc8

Update kernel headers to commit 30c3e9d47035
("l2tp: remove switch block in l2tp_nl_cmd_session_create()")

Signed-off-by: David Ahern <dsahern@gmail.com>

tunnel: Return constant string without copying it

We return constant string from tnl_strproto(), no need
to copy it to temporary buffer and then return such
buffer as const: return constant string instead.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

vti6/tunnel: Unify and simplify link type help functions

Both of these two changes are missing for link_vti6.c:

commit 8b47135474cd ("ip: link: Unify link type help functions a bit")
commit 561e650eff67 ("ip link: Shortify printing the usage of link type")

Replay them on link_vti6.c to bring link type help functions
inline with other tunneling code.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

vti/tunnel: Unify ikey/okey printing

For vti6 tunnel we print [io]key in dotted-quad notation
(ipv4 address) while in vti we do that in hex format.

For vti tunnel we print [io]key only if value is not
zero while for vti6 we miss such check.

Unify vti and vti6 tunnel [io]key output.

While here enlarge s2 buffer to the same size as in rest
of tunnel support code (64 bytes) and check return from
inet_ntop().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

gre/tunnel: Print erspan_index using print_uint()

One is missing in JSON output because fprintf()
is used instead of print_uint().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ip/tunnel: Abstract tunnel encapsulation options printing

Get rid of code duplications and consolidate encapsulation
options printing in single function - tnl_print_encap().

Introduce and use tnl_encap_str() to format encapsulation
option string according to tempate and given values to avoid
code duplication and simplify it.

Use print_string() instead of fputs() and fprintf() to
print encapsulation for !is_json_context().

Print "unknown" parameter for "encap" type in PRINT_FP
context using "%s " format specifier and benefit from
complite time string merge.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ip/tunnel: Use print_0xhex() instead of print_string()

No need for custom SPRINT_BUF() and snprintf() 0x%x
value to this buffer: we can use print_0xhex() instead
of print_string().

In link_iptnl.c use s2 instead of s1 buffer and remove
s1.

While there adjust fwmark option print order in iptnl
and ip6tnl to get it match each other.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ip/tunnel: Simplify and unify tos printing

For ip tunnels tos can be 0 when not configured, 1 when
inherited from encapsulated packet and rest specifying
diffserv (rfc2474) or tos (rfc1349) bits. It is stored
in packet tos/diffserv field and returned in tos
netlink attribute to userspace.

Simplify and unify tos printing by using print_0xhex()
and print_string() instead of fprintf() to output values.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ip/tunnel: Correct and unify ttl/hoplimit printing

Both ttl/hoplimit is from 1 to 255. Zero has special meaning:
use encapsulated packet value. In ip-link(8) -d output this
looks like "ttl/hoplimit inherit". In JSON we have "int" type
for ttl and therefore values from 0 (inherit) to 255.

To do the best in handling ttl/hoplimit we need to accept
both cases: missing attribute in netlink dump and zero value
for "inherit"ed case. Last one is broken since JSON output
introduction for gre/iptnl versions and was never true for
gre6/ip6tnl.

For all tunnels, except ip6tnl change JSON type from "int" to
"uint" to reflect true nature of the ttl.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

iplink: Use ll_index_to_name() instead of if_indextoname()

There are two reasons for switching to cached variant:

  1) ll_index_to_name() may return result from cache,
     eliminating expensive ioctl() to the kernel.

     Note that most of the code already switched from plain
     if_indextoname() to ll_index_to_name() to cached variant
     in print path because in most cases cache populated.

  2) It always return name in the form "if%d", even if
     entry is not in cache and ioctl() fails. This drops
     "link_index" from JSON output.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

devlink: Ignore unknown attributes

In case of extending the UAPI old packages would break.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

iplink: Fix "alias" parameter length calculations

We need NEXT_ARG() to get *argv pointing to "alias"
parameter value. Overwise we get and check "alias"
string length.

Fixes: f88becf35e08 ("iplink: Process "alias" parameter correctly")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

man: Document the meaning of zero in min/max_tx_rate parameters

Zero value in min/max_tx_rate has a special meaning of no rate limit,
document it.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>

ipaddress: Make sure VF min/max rate API is supported before using it

When using the new minimum rate API and providing only one parameter
(minimum rate/maximum rate), we query the VF min and max rate regardless
of kernel support.
This resulted in segmentation fault in ipaddr_loop_each_vf, which tries
to access NULL pointer.

This patch identifies such cases by testing the VF table for NULL
pointer in IFLA_VF_RATE, and aborts the operation.
Aborting on the first VF is valid since if the kernel does not support
the new API for the first VF, it will not support it for the other VFs
as well.

Fixes: f89a2a05ffa9 ("Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool")
Cc: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>

iplink: Validate minimum tx rate is less than maximum tx rate

According to the documentation (man ip-link), the minimum TXRATE should
be always <= Maximum TXRATE, but commit f89a2a05ffa9 ("Add support to
configure SR-IOV VF minimum and maximum Tx rate through ip tool") didn't
enforce it.

Fixes: f89a2a05ffa9 ("Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool")
Cc: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>

ipaddress: Use family_name() for better code reuse

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>

tc: Optimize gact action lookup

When adding a filter with a gact action such as 'drop', tc first tries
to open a shared object with equivalent name (m_drop.so in this case)
before trying gact. Avoid this by matching the action name against those
handled by gact prior to calling get_action_kind().

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>

Merge branch 'tc-batch' into net-next

Chris Mi says:

====================
Currently in tc batch mode, only one command is read from the batch
file and sent to kernel to process. With this patchset, at most 128
commands can be accumulated before sending to kernel.

We introduced a new function in patch 1 to support for sending
multiple messages. In patch 2, we add this support for filter
add/delete/change/replace and actions add/change/replace commands.

But please note that kernel still processes the requests one by one.
To process the requests in parallel in kernel is another effort.
The time we're saving in this patchset is the user mode and kernel mode
context switch. So this patchset works on top of the current kernel.

Using the following script in kernel, we can generate 1,000,000 rules.
tools/testing/selftests/tc-testing/tdc_batch.py

Without this patchset, 'tc -b $file' exection time is:

real    0m15.555s
user    0m7.211s
sys     0m8.284s

With this patchset, 'tc -b $file' exection time is:

real    0m12.360s
user    0m6.082s
sys     0m6.213s

The insertion rate is improved more than 10%.
====================

Signed-off-by: David Ahern <dsahern@gmail.com>

tc: Add batchsize feature for filter and actions

Currently in tc batch mode, only one command is read from the batch
file and sent to kernel to process. With this support, at most 128
commands can be accumulated before sending to kernel.

Now it only works for the following successive commands:
1. filter add/delete/change/replace
2. actions add/change/replace

Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

lib/libnetlink: Add a new function rtnl_talk_iov

rtnl_talk can only send a single message to kernel. Add a new function
rtnl_talk_iov that can send multiple messages to kernel.
rtnl_talk_iov takes struct iovec * and iovlen as arguments.

Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

tests: make sure rand_dev suffix has 6 chars

The change to limit the read size from /dev/urandom is a tradeoff.
Reading too much can trigger an issue, but so it could if the
suggested 250 random chars would not contain enough [:alpha:] chars.
If they contain:
a) >=6 all is ok
b) [1-5] the devname would be shorter than expected (non fatal).
c) 0 things would break

In loops of hundreds of thousands it always was (a) for my, but since
if occuring in a test it might be hard to track what happened avoid
this issue by retrying on the length condition.

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

tests: read limited amount from /dev/urandom

In some test environments like e.g. Ubuntu & Debian autopkgtest it
can happen that while generating random device names the pipes
between tr and head are considered dead while processing.
That prints (non fatal) issues like:
  Running ip/link/new_link.t [iproute2-this/4.13.0-17-generic]: tr:
write error: Broken pipe
  tr: write error
  PASS

This only happens if reading an infinite amount of chars with the
read from urandom, so reading a defined amount fixes the issue.

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ifcfg/rtpr: convert to POSIX shell

These files are already mostly written in POSIX shell, so convert their
shebangs to /bin/sh and tweak the few bashisms in here.

URL: https://crbug.com/756559
Reported-by: Pat Erley <perley@chromium.org>
Signed-off-by: Mike Frysinger <vapier@chromium.org>

mark shell scripts +x

This makes it easier to execute locally for testing.

Signed-off-by: Mike Frysinger <vapier@chromium.org>

tc: remove no longer relevant README

This document described how kernel and tc used to handle
timing. In last two years, kernel has switched over to using
ktime. Nothing to see here, move along.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ip6tnl/tunnel: Output hoplimit before encapsulation limit

To follow gre6 output print hoplimit before encapsulation
limit in link_ip6tnl.c.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

gre6/tunnel: Output flowlabel after tclass

To follow ip6tnl output print flowlabel after tclass
in link_gre6.c.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ip6/tunnel: Unify encap_limit printing

Use %u format specifier to print it in link_gre6.c and
make code more readable.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ip6/tunnel: Unify flowlabel printing

Use @s2 buffer to store string representation of
flowlabel and get rid of extra SPRINT_BUF(): no
need to preserve @s2 contents for later.

Use print_string(PRINT_ANY, ...) with prepared by
snprintf() string for both PRINT_JSON and PRINT_FP
cases.

Omit flowlabel from output if no flowinfo attribute
is given and IP6_TNL_F_USE_ORIG_FLOWLABEL isn't set.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ip6/tunnel: Unify tclass printing

Use @s2 buffer to store string representation of
tclass and get rid of extra SPRINT_BUF(): no
need to preserve @s2 contents for later.

Use print_string(PRINT_ANY, ...) with prepared by
snprintf() string for both PRINT_JSON and PRINT_FP
cases.

While there use __u32 for flowinfo in link_gre6.c
and check for IFLA_GRE_FLOWINFO attribute presense.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ip6tnl/tunnel: Do not print obscure flowinfo

It is implementation internal and main purpose
of printing it seems debugging.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ip6/tunnel: Fix tclass output

In link_gre6.c it seems copy paste error: tclass is 8 bits,
not 20 as flowlabel.

In link_iptnl.c rename "flowinfo_tclass" to "tclass" as it
correct name since flowinfo is implementation internal name
used to label combined within u32 attribute tclass and
flowlabel.

Fixes: 1facc1c61c07 ("ip: link_ip6tnl.c: add json output support")
Fixes: 2e706e12d9b0 ("Merge branch 'master' into net-next")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

tc: Fix filter protocol output

Fixes: 249284ff5a44 ("tc: jsonify filter core")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>

include: update ethernet headers

Incorporate upstream changes to fix compliation with MUSL.
See commit 6926e041a892
("uapi/if_ether.h: prevent redefinition of struct ethhdr")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ss: fix NULL pointer access when parsing unix sockets with oldformat

When parsing and printing the unix sockets in unix_show(),
if the oldformat is detected, the peer_name member of the sockstat
object is left uninitialized (NULL).
For this reason, if a filter has been specified on the command line,
a strcmp() will crash when trying to access it.

Avoid crash by checking that peer_name is not NULL before
passing it to strcmp().

Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ss: fix crash when skipping disabled header field

When the first header field is disabled (i.e. when passing the -t
option), field_flush() is invoked with the `buffer` global variable
still zero'd.
However, in field_flush() we try to access buffer.cur->len
during variables initialization, thus leading to a SIGSEGV.

It's interesting to note that this bug appears only when the code
is compiled with -O0, because the compiler is smart
enough to immediately jump to the return statement if optimizations
are enabled and skip the faulty instruction.

Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ip fou: pass family attribute as u8

This fixes fou on big-endian systems.

Signed-off-by: Filip Moc <dev@moc6.cz>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Restore --no-print-directory option for silent builds

Commit 69fed534a533 ("change how Config is used in Makefile's") removed
Config from Makefile. Config had the checks to set VERBOSE based on user
request and VERBOSE is used to add the --no-print-directory argument.
Since Config is gone, add the relevant setup for VERBOSE to Makefile
to restore quieter builds by default.

Fixes: 69fed534a533 ("change how Config is used in Makefile's")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Merge branch 'master' into net-next

Conflicts:
man/man8/ip-link.8.in

Signed-off-by: David Ahern <dsahern@gmail.com>

link_iptnl: Open "encap" JSON object

It seems missing pair of open_json_object()/close_json_object()
in iptnl implementation.

Note that we open "encap" JSON object in ip6tnl.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>

link_iptnl: Print tunnel mode

Tunnel mode does not appear in parameters print for iptnl
supported tunnels like ipip and sit, while printed for
ip6tnl.

Print tunnel mode as "proto" field name for JSON and
without any name when printing to cli to follow ip6tnl
behaviour.

For non JSON output we have:

   $ ip -d link show dev sit1

Before:
-------
17: sit1@NONE: <NOARP> mtu 1480 qdisc noop state DOWN ...
    link/sit X.X.X.X brd 0.0.0.0 promiscuity 0
    sit remote any local X.X.X.X ...
        ~~~

After:
------
17: sit1@NONE: <NOARP> mtu 1480 qdisc noop state DOWN ...
    link/sit X.X.X.X brd 0.0.0.0 promiscuity 0
    sit any remote any local X.X.X.X ...
        ^^^

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>

link_iptnl: Kill code duplication

Both sit and ipip "mode" parameter handling nearly the same.
Except for sit we have "ip6ip" mode: check it only when
configuring sit.

Note that there is no need strcmp(lu->id, "ipip"): if it is
not sit it is "ipip" because we have only these two link util
defined in module.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>

devlink, rdma, tipc: properly define TARGETS without HAVE_MNL

Leaving a variable with a generic name such as TARGETS undefined would lead
to Make picking up its value from the environment. Avoid this by always
defining TARGETS in the Makefiles.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>

ip: link: add support for netdevsim device type

netdevsim is a new software device for testing kernel APIs
without any hardware attached. Allow users to create such
devices.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>

man: fix small formatting errors

Lintian detected the following formatting errors:

man/man8/devlink-sb.8.gz 230: warning: macro `b' not defined
man/man8/ip-link.8.gz 1243: warning: macro `in-8' not defined
(possibly missing space after `in')
man/man8/tc-u32.8.gz `R' is a string (producing the registered sign),
not a macro.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

man: routel/routef: don't mention filesystem paths

The filesytem paths to these scripts might be different on various
distros, so don't mention it in the manpages. It is not really useful
information anyway.

Originally submitted as Debian bug:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561424

Reported-by: jidanni@jidanni.org
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

man: ip-address: document 15-char limit for LABEL

Trying to set a label longer than 15 characters returns an error:
RTNETLINK answers: Numerical result out of range

Document the limit in the manpage.

Originally reported as a Debian bug:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=661886

Reported-by: Gabor Kiss <kissg@ssg.ki.iif.hu>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

man: add more keywords to ip.8 short description

A Debian user suggested adding more network-related keywords to the
ip manpage, so that manpage-scraping and indexing software like
apropos can do a better job of categorizing the programs.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=877983

Suggested-by: Lynoure Braakman <lynoure@gmail.com>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

man: drop references to Debian-specific paths

Documentation should be distribution-agnostic - any specific quirks
should be handled by downstream maintainers, if necessary.
Remove mentions of Debian paths and package names.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ip/tunnel: Document "external" parameter

Add it to ip-link(8) "type gre" output help message
as well as to ip-link(8) page.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>

vxcan,veth: Forbid "type" for peer device

It is already given for original device we configure this
peer for.

Results from following command before/after change applied
are shown below:

$ ip link add dev veth1a type veth peer name veth1b \
type veth peer name veth1c

Before:
-------

<no output, no netdevs created>

After:
------

Error: duplicate "type": "veth" is the second value.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

qdisc: print offload indication

Use the newly added TCA_HW_OFFLOAD indication from kernel
to print a consistent 'offloaded' message to user when listing qdiscs.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

gre6/tunnel: Do not submit garbage in flowinfo

We always send flowinfo to the kernel. If flowlabel/tclass
was set first to non-inherit value and then reset to
inherit we do not clear flowlabel/tclass part in flowinfo,
send it to kernel and can get from the kernel back.

Even if we check for IP6_TNL_F_USE_ORIG_TCLASS and
IP6_TNL_F_USE_ORIG_FLOWLABEL when printing options
sending invalid flowlabel/tclass to the kernel seems
bad idea.

Note that ip6tnl always clean corresponding flowinfo
parts on inherit.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>

gre,ip6tnl/tunnel: Fix noencap- support

We must clear bit, not set all but given bit.

Fixes: 858dbb208e39 ("ip link: Add support for remote checksum offload to IP tunnels")
Fixes: 73516e128a5a ("ip6tnl: Support for fou encapsulation"
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>

rdma: Move link execution logic to common code

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

rdma: Rename rd_free_devmap to be rd_free

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

rdma: Rename free function to be rd_cleanup

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

rdma: Get rid of dev_map_free call

The dev_map_free() is called once only and it is short,
so it is better to integrate it into the caller's site.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

rdma: Print supplied device name in case of wrong name

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

rdma: Check that port index exists before operate on link layer

Link layer operates on port layer, hence it should check
it existence before execution commands.

Fixes: da990ab40a92 ("rdma: Add link object")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

rdma: Fix misspelled SYS_IMAGE_GUID

SYS_IMAGE_GUIG is actually SYS_IMAGE_GUID.

Fixes: da990ab40a92 ("rdma: Add link object")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

rdma: Move per-device handler function to generic code

Most of the proposed objects are working in the scope "dev"
and will implement the same logic. Move the code to utils.c,
so other objects will be able to reuse the code.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

rdma: Protect dev_map_lookup from wrong input

Despite the fact that all callers to dev_map_lookup are ensuring that
there is always device name prior to call to that function, it is better
and safer to check that in the dev_map_lookup itself.

Fixes: 40df8263a0f0 ("rdma: Add dev object")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

rdma: Reduce scope of _dev_map_lookup call

There is no external users of _dev_map_lookup function,
so let's limit its scope to be local.

Fixes: 40df8263a0f0 ("rdma: Add dev object")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

erspan: add erspan usage description

The patch adds erspan usage description, so 'ip link help erspan'
and 'ip link help ip6erspan' shows the options.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

ip/tunnel: No need to free answer after rtnl_talk() on error

Since rtnl_talk() never returns with answer buffer allocated
on error we do not need to release it manually. After this
initializing answer with NULL before rtnl_talk() is useless.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>

utils: ll_addr: Handle ARPHRD_IP6GRE in ll_addr_n2a()

ll_addr_n2a() correctly prints tunnel endpoints for gre, ipip, sit
and ip6tnl, but not for ip6gre. Fix this by adding ARPHRD_IP6GRE to
IPv6 tunnel endpoing address conversion.

Before:
-------

$ ip link show
...
18: ip6tnl0: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default
    link/tunnel6 :: brd ::
19: ip6gre0: <NOARP> mtu 1456 qdisc noop state DOWN mode DEFAULT group default
    link/gre6 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 brd \
00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00

After:
------

$ ip link show
...
18: ip6tnl0: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default
    link/tunnel6 :: brd ::
19: ip6gre0: <NOARP> mtu 1456 qdisc noop state DOWN mode DEFAULT group default
    link/gre6 :: brd ::

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>

erspan: add erspan version II support

The patch adds support for configuring the erspan v2, for both
ipv4 and ipv6 erspan implementation. Three additional fields
are added: 'erspan_ver' for distinguishing v1 or v2, 'erspan_dir'
for specifying direction of the mirrored traffic, and 'erspan_hwid'
for users to set ERSPAN engine ID within a system.

As for manpage, the ERSPAN descriptions used to be under GRE, IPIP,
SIT Type paragraph. Since IP6GRE/IP6GRETAP also supports ERSPAN,
the patch removes the old one, creates a separate ERSPAN paragrah,
and adds an example.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>