Stefano Brivio [Fri, 23 Mar 2018 08:37:05 +0000 (09:37 +0100)]
ss: Fix rendering of continuous output (-E, --events)
Roman Mashak reported that ss currently shows no output when it
should continuously report information about terminated sockets
(-E, --events switch).
This happens because I missed this case in 691bd854bf4a ("ss:
Buffer raw fields first, then render them as a table") and the
rendering function is simply not called.
To fix this, we need to:
- call render() every time we need to display new socket events
from generic_show_sock(), which is only used to follow events.
Always call it even if specific socket display functions
return errors to ensure we clean up buffers
- get the screen width every time we have new events to display,
thus factor out getting the screen width from main() into a
function we'll call whenever we calculate columns width
- reset the current field pointer after rendering, more output
might come after render() is called
Reported-by: Roman Mashak <mrv@mojatatu.com> Fixes: 691bd854bf4a ("ss: Buffer raw fields first, then render them as a table") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Tested-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Alexander Zubkov [Sun, 18 Mar 2018 16:50:25 +0000 (17:50 +0100)]
treat "default" and "all"/"any" addresses differenty
Debian maintainer found that basic command:
# ip route flush all
No longer worked as expected which breaks user scripts and
expectations. It no longer flushed all IPv4 routes.
Recently behavior of "default" prefix parameter was corrected. But at
the same time behavior of "all"/"any" was altered too, because they
were the same branch of the code. As those parameters mean different,
they need to be treated differently in code too. This patch reflects
the difference.
Also after mentioned change, address parsing code was changed more
and address family was set explicitly even for "all"/"any" addresses.
And that broke matching conditions further. This patch fixes that too
and returns AF_UNSPEC to "all"/"any" address.
Now "default" is treated as top-level prefix (for example 0.0.0.0/0 in
IPv4) and "all"/"any" always matches anything in exact, root and match
modes.
Reported-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Alexander Zubkov <green@msu.ru>
Alexander Zubkov [Sun, 18 Mar 2018 16:50:25 +0000 (17:50 +0100)]
treat "default" and "all"/"any" addresses differenty
Debian maintainer found that basic command:
# ip route flush all
No longer worked as expected which breaks user scripts and
expectations. It no longer flushed all IPv4 routes.
Recently behavior of "default" prefix parameter was corrected. But at
the same time behavior of "all"/"any" was altered too, because they
were the same branch of the code. As those parameters mean different,
they need to be treated differently in code too. This patch reflects
the difference.
Also after mentioned change, address parsing code was changed more
and address family was set explicitly even for "all"/"any" addresses.
And that broke matching conditions further. This patch fixes that too
and returns AF_UNSPEC to "all"/"any" address.
Now "default" is treated as top-level prefix (for example 0.0.0.0/0 in
IPv4) and "all"/"any" always matches anything in exact, root and match
modes.
Reported-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Alexander Zubkov <green@msu.ru>
Debian maintainer found that basic command:
# ip route flush all
No longer worked as expected which breaks user scripts and
expectations. It no longer flushed all IPv4 routes.
Reported-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Fixes: 3fd86630876a ("iproute2: rework SR-IOV VF support") Fixes: 8c29ae7cc249 ("ip link: Fix crash on older kernels when show VF dev") Fixes: f89a2a05ffa9 ("Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool") Fixes: ae7229d5f99e ("ip: Add support for setting and showing SR-IOV virtual funtion link params") Fixes: d0e720111aad ("ip: ipaddress.c: add support for json output") Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Davide Caratti [Fri, 2 Mar 2018 18:36:16 +0000 (19:36 +0100)]
tc: fix parsing of the control action
If the user didn't specify any control action, don't pop the command line
arguments: otherwise, parsing of the next argument (tipically the 'index'
keyword) results in an error, causing the following 'tc-testing' failures:
Test a6d6: Add skbedit action with index
Test 38f3: Delete skbedit action
Test a568: Add action with ife type
Test b983: Add action without ife type
Test 7d50: Add skbmod action to set destination mac
Test 9b29: Add skbmod action to set source mac
Test e93a: Delete an skbmod action
Also, add missing parse for 'ok' control action to m_police, to fix the
following 'tc-testing' failure:
Test 8dd5: Add police action with control ok
tested with:
# ./tdc.py
test results:
all tests ok using kernel 4.16-rc2, except 9aa8 "Get a single skbmod
action from a list" (which is failing also before this commit)
Fixes: 3572e01a090a ("tc: util: Don't call NEXT_ARG_FWD() in __parse_action_control()") Cc: Michal Privoznik <mprivozn@redhat.com> Cc: Wolfgang Bumiller <w.bumiller@proxmox.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
ss: fix NULL dereference when rendering without header
When ss is invoked with the no-header flag, if the query doesn't return
any result, render() is called with 'buffer' uninitialized. This
currently leads to a segfault. Ensure that buffer is initialized before
rendering.
The bug can be triggered with: ss -H sport = 100000
David Ahern [Thu, 1 Mar 2018 22:43:08 +0000 (14:43 -0800)]
libnetlink: __rtnl_talk_iov should only loop max iovlen times
William reported ip hanging and bisected to a recent commit for batching
allowing more than 1 command to be sent per message. The loop over
recvmsg should never cycle more than iovlen times -- 1 response for
each command in the message.
Fixes: 72a2ff3916e5 ("lib/libnetlink: Add a new function rtnl_talk_iov") Signed-off-by: David Ahern <dsahern@gmail.com>
Phil Sutter [Thu, 1 Mar 2018 09:35:12 +0000 (10:35 +0100)]
ip-link: Fix use after free in nl_get_ll_addr_len()
Immediately after freeing the buffer returned from rtnl_talk(), it is
accessed again via pointer in struct rtattr array. This leads to some
builds not allowing to set an interface's MAC address because the
expected length value is garbage.
Fixes: 86bf43c7c2fdc ("lib/libnetlink: update rtnl_talk to support malloc buff at run time") Signed-off-by: Phil Sutter <phil@nwl.cc>
Joe Stringer [Wed, 28 Feb 2018 22:16:42 +0000 (14:16 -0800)]
bpf: Print section name when hitting non ld64 issue
It's useful to be able to tell which section is being processed in the
ELF when this error is triggered, so print that detail.
Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Dpipe - Each dpipe table can have one resource which is mapped to it.
The resource is presented via its full path. Furthermore, the number
of units consumed by single table entry is presented.
Resource - Each resource presents the dpipe tables that use it.
devlink: Add support for devlink resource abstraction
Add support for devlink resource abstraction. The resources are
represented by a tree based structure and are identified by a name and
a size. Some resources can present their real time occupancy.
First the resources exposed by the driver can be observed, for example:
$devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
name kvd size 245760 unit entry
resources:
name linear size 98304 occ 0 unit entry size_min 0 size_max 147456 size_gran 128
name hash_double size 60416 unit entry size_min 32768 size_max 180224 size_gran 128
name hash_single size 87040 unit entry size_min 65536 size_max 212992 size_gran 128
Some resource's size can be changed. Examples:
$devlink resource set pci/0000:03:00.0 path /kvd/hash_single size 73088
$devlink resource set pci/0000:03:00.0 path /kvd/hash_double size 74368
The changes do not apply immediately, this can be validate by the 'size_new'
attribute, which represents the pending changed size. For example
$devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
name kvd size 245760 unit entry size_valid false
resources:
name linear size 98304 size_new 147456 occ 0 unit entry size_min 0 size_max 147456 size_gran 128
name hash_double size 60416 unit entry size_min 32768 size_max 180224 size_gran 128
name hash_single size 87040 unit entry size_min 65536 size_max 212992 size_gran 128
In case of a pending change the nested resources present an indication
for a valid configuration of its children (sum of its children sizes
doesn't exceed the parent's size).
In order for the changes to take place hot reload is needed. The hot
reload through devlink will be introduced in the following patch.
Quentin Monnet [Thu, 22 Feb 2018 03:22:14 +0000 (19:22 -0800)]
README: re-add updated information link
The "Information" link was removed from README file in commit d7843207e6fd ("README: update location of git repositories, remove
broken info link"), because it redirected to a page that no longer
existed on the Linux Foundation wiki.
This page has just been restored, so we can add the link back again.
Since the previous link was a redirection, use the updated link instead.
Thanks to Luca Boccassi for investigating this issue, restoring and
updating the page.
Vincent Bernat [Tue, 20 Feb 2018 23:28:04 +0000 (00:28 +0100)]
color: disable color when json output is requested
Instead of declaring -color and -json exclusive, ignore -color when
-json is provided. The rationale is to allow to put -color in an alias
for ip while still being able to use -json. -color is merely a
presentation suggestion and we can assume there is nothing to color in
the JSON output.
Signed-off-by: Vincent Bernat <vincent@bernat.im> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Phil Sutter [Fri, 9 Feb 2018 17:49:38 +0000 (18:49 +0100)]
Remove leftovers from removed Latex documentation
Since there is no documentation in Latex format left, there is no need
to check for commands to build it. Also there is no need to ignore any
of the temporary files which were created by them.
Quentin Monnet [Fri, 9 Feb 2018 17:11:09 +0000 (09:11 -0800)]
README: update location of git repositories, remove broken info link
Reflect the recent change of location for the git repositories, and the
creation of the -next development repo, in README and README.devel.
Also remove the link to the Linux Foundation wiki that contained
information about iproute2. The link is now broken, I did not find any
alternative page to point to.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: David Ahern <dsahern@gmail.com>
If the kernel receives a negative nsid it will automatically assign
the next available nsid. In this case alloc_netid() will set min and
max to 0 for ird_alloc(). And when max == 0 idr_alloc() will interpret
this as the maximum range, i.e. specific to nsids it will try to find
an id in the range [0,INT_MAX). This is intentionally supported in the
kernel for nsids.
Commit acbe9118ce80 ("ip netns: use strtol() instead of atoi()")
regressed ip netns in that respect although previously the use-case
was either accidentally supported or opaquely supported such that it
triggered the original commit. From what I can gather it went as
follows before: atoi() was called with a string indicating a negative
value which caused it to return -1 which was passed to the
kernel. Let's make it less opaque by introducing the keyword "auto":
ip netns set <netns-name> auto
will cause nsid to be set to -1 and the kernel will select an available
nsid.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Leon Romanovsky [Wed, 31 Jan 2018 08:11:54 +0000 (10:11 +0200)]
rdma: Add QP resource tracking information
This patch adds ss-similar interface to view various resource
tracked objects. At this stage, only QP is presented.
1. Get all QPs for the specific device:
$ rdma res show qp link mlx5_4
link mlx5_4/- lqpn 8 type UD state RESET sq-psn 0 pid 0 comm [ib_ipoib]
link mlx5_4/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 0 type SMI state RTS sq-psn 0 pid 0 comm [ib_core]
$ rdma res show qp link mlx5_4/
link mlx5_4/- lqpn 8 type UD state RESET sq-psn 0 pid 0 comm [ib_ipoib]
link mlx5_4/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 0 type SMI state RTS sq-psn 0 pid 0 comm [ib_core]
2. Provide illegal port number (0 is illegal):
$ rdma res show qp link mlx5_4/0
Wrong device name
3. Get QPs of specific port:
$ rdma res show qp link mlx5_4/1
link mlx5_4/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 0 type SMI state RTS sq-psn 0 pid 0 comm [ib_core]
4. Get QPs which have not assigned port yet:
link mlx5_4/- lqpn 8 type UD state RESET sq-psn 0 pid 0 comm [ib_ipoib]
5. Limit to specific Local QPNs:
$ rdma res show qp link mlx5_4/1 lqpn 1-3,7
link mlx5_4/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 0 type SMI state RTS sq-psn 0 pid 0 comm [ib_core]
. Filter types (strings):
$ rdma res show qp link mlx5_4/1 type UD,gSi
link mlx5_4/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Leon Romanovsky [Wed, 31 Jan 2018 08:11:49 +0000 (10:11 +0200)]
rdma: Add filtering infrastructure
This patch adds general infrastructure to RDMAtool to handle various
filtering options needed for the downstream resource tracking patches.
The infrastructure is generic and stores filters in list of key<->value
entries. There are three types of filters:
1. Numeric - the values are intended to be digits combined with '-' to
mark range and ',' to mark multiple entries, e.g. pid 1-100,234,400-401
is perfectly legit filter to limit process ids.
2. String - the values are consist from strings and "," as a denominator.
3. Link - special case to allow '/' in string to provide link name, e.g.
link mlx4_1/2.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Leon Romanovsky [Wed, 31 Jan 2018 08:11:47 +0000 (10:11 +0200)]
rdma: Add option to provide "-" sign for the port number
According to the IBTA spec [1], the physical connected port is provided
for the QP in RTR-to-INIT stage performed by modify_qp(). It causes
to do not have port number for newly created QPs.
The following patch adds "-" sign to present absence of port, because
QPs are going to be associated with rdmatool link object, which needs
port number as an index.
Jakub Kicinski [Sat, 27 Jan 2018 09:19:04 +0000 (01:19 -0800)]
tc: fix second printing of requeues
Non-JSON tc qdisc output used to print the "requeues" statistic
twice. Commit 4fcec7f3665b ("tc: jsonify stats2") tried to preserve
this behaviour for both standard output and JSON, but used the wrong
statistic (q.qlen). Also duplicating keys in JSON is not allowed,
so the second occurrence should be completely skipped with JSON.
Fixes: 4fcec7f3665b ("tc: jsonify stats2") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Jakub Kicinski [Fri, 26 Jan 2018 19:30:35 +0000 (11:30 -0800)]
ip: address: fix stats64 JSON object name
The JSON object name for statistics in ip link show is "stats644".
Looks like a typo, commit d0e720111aad ("ip: ipaddress.c: add support
for json output") contains an example with the expected "stats64" name.
The fact that no one has noticed until now is probably an indication
that no one is using this object. Hopefully it's not too late to fix
this, although IIUC this has already been in 4.13 and 4.14 releases :S
Fixes: d0e720111aad ("ip: ipaddress.c: add support for json output") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Jakub Kicinski [Fri, 26 Jan 2018 19:27:56 +0000 (11:27 -0800)]
tc: red: JSON-ify RED output
Make JSON output work with RED Qdiscs. Float/double printing
helpers have to be added/uncommented to print the probability.
Since TC stats in general are not split out to a separate object
the xstats printed by this patch are not separated either.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David Ahern <dsahern@gmail.com>
David Ahern [Thu, 25 Jan 2018 17:32:27 +0000 (09:32 -0800)]
Merge branch 'get_addr_rta' into iproute2-next
Serhey Popovych says:
====================
Now we enhance get_addr() to return additional information about address
(e.g. if it unspecified or multicast) we want to have same functionality
for attributes in netlink message.
Introduce and use get_addr_rta() that parses given netlink attribute
into @inet_prefix data structure in the same way similar get_addr()
parses address from it's string representation.
Use attribute length to guess address family: force it by giving non
AF_UNSPEC @family to get_addr_rta() to ensure address is of expected
family.
Introduce and use inet_addr_match_rta() to further simplify and unify
code where get_addr_rta() intended to be used together with
inet_addr_match().
This is next step in ipv4 and ipv6 modules unification to prepare for
merge in the future.
Introduce and use tnl_print_endpoint() helper to print of tunnel
endpoint address.
Note that for AF_INET and AF_INET6 inet_ntop(3) is used that may return
NULL in case of failure and while unlikely format_host_rta() might
return NULL too. Handle this case when passing local/remote to
print_string().
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Serhey Popovych [Wed, 24 Jan 2018 18:56:33 +0000 (20:56 +0200)]
utils: Introduce get_addr_rta() and inet_addr_match_rta()
First is used to get address from netlink attribute to
inet_prefix data structure. Use memcpy() with constant
value to let complier optimize by replacing a call by
inlining load/store instructions.
Second is used to match address in given netlink attribute
with one given as reference. It matches successfully if
no attribute is given (@rta is NULL), reference address
family is AF_UNSPEC or it's length isn't given; fails if
get_attr_rta() can't get attribute or it's family does
not match reference; calls inet_addr_match() to get final
verdict.
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com>
David Ahern [Wed, 24 Jan 2018 18:02:27 +0000 (10:02 -0800)]
Merge branch 'unify_external' into iproute2-next
Serhey Popovych says:
====================
With this series I want to unify collect metadata
handling in tunnels:
1) Use "external" name for JSON and non-JSON output.
Do not *print* any options when tunnel in
collect metadata mode: gre6 already do
this, so just apply to others.
2) Do not *add* any attributes when configuring
gre tunnel in collect metadata mode.
Other tunnels (e.g. gre6, iptnl, ip6tnl)
alredy do that.
This is next step in ipv4 and ipv6 modules
unification to prepare for merge in the future.
Any comments, suggestions and criticism as always
welcome.
v2
For all tunnels implementing collect metadata
use "external" keyword for both JSON. Thanks
to Jiri Benc for detailed explanation.
====================
The lexer will go with the longest match, so previously
the starting double quotes of a string would be swallowed by
the [^ \t\r\n()]+ pattern leaving the user no way to
actually use strings with escape sequences.
Fix this by not allowing this case to start with double
quotes.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Serhey Popovych [Fri, 19 Jan 2018 16:44:03 +0000 (18:44 +0200)]
iplink: Use ll_name_to_index() instead of if_nametoindex()
While benefit from using ll_name_to_index() with populated
cache can potentially be exploited only in few places
(e.g. bridge fdb/mdb/vlan show routines) there is another
advantage of ll_name_to_index() over plain if_nametoindex():
in case of if_nametoindex() failure ll_name_to_index()
will attempt to get index from common name in form "if%d"
that may be returned from ll_index_to_name().
This makes output from ip(8) coherent with it's input.
Note that most of the code already switched from plain
if_nametoindex() to ll_name_to_index() to cached variant.
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Serhey Popovych [Mon, 22 Jan 2018 14:50:08 +0000 (16:50 +0200)]
gre/gre6: Post merge fixes
Few minor changes after merge of 'master' into 'net-next' branch:
1) Follow 80 line length for printing erspan_index parameter
as we did in master with commit 2a8d0f6e9c3f ("gre/tunnel:
Print erspan_index using print_uint()").
2) Remove remnants of encapsulation option printing: now it
is done using tnl_print_encap() helper in commit bad76e6b1f44
("ip/tunnel: Abstract tunnel encapsulation options printing").
David Ahern [Sun, 21 Jan 2018 19:20:56 +0000 (11:20 -0800)]
Merge branch 'shared_block' into net-next
Jiri Pirko says:
====================
From: Jiri Pirko <jiri@mellanox.com>
Kernel allows to share all filters between qdiscs with use
of shared block.
Example:
block number 22. "22" is just an identification:
$ tc qdisc add dev ens7 ingress_block 22 ingress
^^^^^^^^^^^^^^^^
$ tc qdisc add dev ens8 ingress_block 22 ingress
^^^^^^^^^^^^^^^^
If we don't specify "block" command line option, no shared block would
be created:
$ tc qdisc add dev ens9 ingress
Now if we list the qdiscs, we will see the block index in the output:
$ tc qdisc
qdisc ingress ffff: dev ens7 parent ffff:fff1 ingress_block 22
qdisc ingress ffff: dev ens8 parent ffff:fff1 ingress_block 22
qdisc ingress ffff: dev ens9 parent ffff:fff1
To make is more visual, the situation looks like this:
Unlimited number of qdiscs may share the same block.
Block sharing is also supported for clsact qdisc:
$ tc qdisc add dev ens10 ingress_block 23 egress_block 24 clsact
$ tc qdisc show dev ens10
qdisc clsact ffff: dev ens10 parent ffff:fff1 ingress_block 23 egress_block 24
We can add filter using the block index:
$ tc filter add block 22 protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
Note we cannot use the qdisc for filter manipulations of shared blocks:
$ tc filter add dev ens8 ingress protocol ip pref 1 flower dst_ip 192.168.100.2 action drop
Error: This filter block is shared. Please use the block index to manipulate the filters.
We will see the same output if we list filters for ingress qdisc of
ens7 and ens8, also for the block 22:
$ tc filter show block 22
filter protocol ip pref 25 flower chain 0
filter protocol ip pref 25 flower chain 0 handle 0x1
...
$ tc filter show dev ens7 ingress
filter block 22 protocol ip pref 25 flower chain 0
filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
...
$ tc filter show dev ens8 ingress
filter block 22 protocol ip pref 25 flower chain 0
filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
...
Jiri Pirko [Sat, 20 Jan 2018 10:00:29 +0000 (11:00 +0100)]
tc: implement ingress/egress block index attributes for qdiscs
During qdisc creation it is possible to specify shared block for bot
ingress and egress. Pass this values to kernel according to the command
line options.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Jiri Pirko [Sat, 20 Jan 2018 10:00:28 +0000 (11:00 +0100)]
tc: introduce support for block-handle for filter operations
So far, qdisc was the only handle that could be used to manipulate
filters. Kernel added support for using block to manipulate it. So add
the support to use block index to manipulate filters. The magic
TCM_IFINDEX_MAGIC_BLOCK indicates the block index is in use.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Jiri Pirko [Sat, 20 Jan 2018 10:00:27 +0000 (11:00 +0100)]
tc: introduce tc_qdisc_block_exists helper
This hepler used qdisc dump to list all qdisc and find if block index in
question is used by any of them. That means the block with specified
index exists.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
David Ahern [Sun, 21 Jan 2018 18:11:07 +0000 (10:11 -0800)]
Merge branch 'inet_get_addr' into net-next
Serhey Popovych says:
====================
It looks confusing to have multiple independent
routines to get internet address from it's string
representation: get_addr() and inet_get_addr().
Most complicated users of inet_get_addr() is
iplink_geneve.c and iplink_vxlan.c because they
required to handle both AF_INET and AF_INET6
for their local/remote endpoints.
On the other hand get_addr() does not provide
additional information like address type: need
to address this. to get rid of current and
possible future code duplications. Note that
this functionality is first step to make proto
independent handling of local/remote endpoints
in ip/tunnel code (there will be additional
series based on this one).
Also fix get_addr_1() and get_prefix() to make
sure it always provide correct ->family and
->bitlen.
As always comments, suggestions and criticism
are welcome.
Serhey Popovych [Thu, 18 Jan 2018 18:13:47 +0000 (20:13 +0200)]
ip: Get rid of inet_get_addr()
Both geneve and vxlan modules are converted to
use get_addr() we can replace inet_get_addr()
in less problematic places and finally get
rid of inet_get_addr().
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Serhey Popovych [Thu, 18 Jan 2018 18:13:44 +0000 (20:13 +0200)]
utils: Fast inet address classification after get_addr()
It looks very useful to receive additional information
from get_addr_1() and get_addr() about address to simplify
caller and get rid of code duplications.
For now following information can be returned:
1) address is unspecified (zero)
2) address is multicast
3) address is internet: family is either AF_INET or
AF_INET6.
More information can be added in the future.
Introduce inline helpers to make code using this new
address classification interface more self explaining:
bool is_addrtype_inet(inet_prefix *addr)
true if @addr is inet address
bool is_addrtype_inet_unspec(inet_prefix *addr)
true if @addr is unspecified inet address
bool is_addrtype_inet_multi(inet_prefix *addr)
true if @addr is multicast inet address
bool is_addrtype_inet_not_unspec(inet_prefix *addr)
true if @addr is not unspecified inet address
false if @addr is not inet or unspecified inet
bool is_addrtype_inet_not_multi(inet_prefix *addr)
true if @addr is not multicast inet address
false if @addr is not inet or multicast inet
Last two are useful for case when we need inet address
that is not unspecified or multicast.
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Serhey Popovych [Thu, 18 Jan 2018 18:13:43 +0000 (20:13 +0200)]
utils: Always specify family and ->bytelen in get_prefix_1()
Handle default/all/any special case in get_addr_1() to setup
->family and ->bytelen correctly.
Make get_addr_1() return ->bitlen == -2 instead of -1 to
distinguish default/all/any special case from the rest:
it is safe because all callers check ->bitlen < 0, not
explicit value -1.
Reduce intendation by one level and get rid of goto/label
to make code more readable.
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Serhey Popovych [Thu, 18 Jan 2018 18:13:42 +0000 (20:13 +0200)]
utils: Always specify family for address in get_addr_1()
Set ->family correctly when string representing address
is "default", "all" or "any": get_addr_1() might be called
with AF_UNSPEC (e.g. get_addr() -> get_addr_1()).
Extend support for zero address to all address families,
not only AF_INET and AF_INET6 when one explicitly given
as @family: use af_byte_len() to correctly set address length.
Still assume AF_INET when @family is AF_UNSPEC.
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Jakub Kicinski [Wed, 17 Jan 2018 07:50:54 +0000 (23:50 -0800)]
bpf: support map offload
When program is loaded with a specified ifindex, use that
ifindex also when creating maps.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David Ahern <dsahern@gmail.com>
Jakub Kicinski [Tue, 16 Jan 2018 23:08:50 +0000 (15:08 -0800)]
tc: red: allow setting th_min and th_max to the same value
Setting th_min and th_max to the same value may be useful for DCTCP
deployments. The original DCTCP paper describes it as a simplest way
of achieving simple ECN threshold marking. Indeed, there doesn't seem
to be any simpler qdisc in Linux which would allow such a setup today.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Serhey Popovych [Thu, 18 Jan 2018 14:04:36 +0000 (16:04 +0200)]
tunnel: Return constant string without copying it
We return constant string from tnl_strproto(), no need
to copy it to temporary buffer and then return such
buffer as const: return constant string instead.
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Get rid of code duplications and consolidate encapsulation
options printing in single function - tnl_print_encap().
Introduce and use tnl_encap_str() to format encapsulation
option string according to tempate and given values to avoid
code duplication and simplify it.
Use print_string() instead of fputs() and fprintf() to
print encapsulation for !is_json_context().
Print "unknown" parameter for "encap" type in PRINT_FP
context using "%s " format specifier and benefit from
complite time string merge.
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>