Eli Britstein [Wed, 20 Nov 2019 12:42:45 +0000 (14:42 +0200)]
tc: flower: support masked port destination and source match
Extend destination and source port match to support masks, accepting
both decimal and hexadecimal formats.
Also add missing documentation to synopsis in manpage.
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 protocol ip parent ffff: prio 1 flower skip_hw \
ip_proto tcp dst_port 1234/0xff00 action drop
$ tc -s filter show dev eth0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
ip_proto tcp
dst_port 1234/0xff00
skip_hw
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1 installed 26 sec used 26 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
$ tc -p -j filter show dev eth0 parent ffff:
"options": {
"keys": {
"dst_port": 1234,
"dst_port_mask": 65280
...
Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Moshe Shemesh [Sun, 24 Nov 2019 16:02:38 +0000 (18:02 +0200)]
ip: fix oneline output
Ip tool oneline option should output each record on a single line. While
oneline option is active the variable _SL_ replaces line feeds with the
'\' character. However, at the end of print_linkinfo() the variable _SL_
shouldn't be used, otherwise the whole output is on a single line.
Before this fix:
$ip -o link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default qlen 1000\ link/loopback 00:00:00:00:00:00 brd
00:00:00:00:00:00\2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
qdisc fq_codel state UP mode DEFAULT group default qlen 1000\
link/ether 52:54:00:60:0a:db brd ff:ff:ff:ff:ff:ff\3: eth1:
<BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode
DEFAULT group default qlen 1000\ link/ether 00:50:56:1b:05:cd brd
ff:ff:ff:ff:ff:ff\
After this fix:
$ip -o link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default qlen 1000\ link/loopback 00:00:00:00:00:00 brd
00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state
UP mode DEFAULT group default qlen 1000\ link/ether 52:54:00:60:0a:db
brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state
UP mode DEFAULT group default qlen 1000\ link/ether 00:50:56:1b:05:cd
brd ff:ff:ff:ff:ff:ff
Fixes: 3aa0e51be64b ("ip: add support for alternative name addition/deletion/list") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Jakub Kicinski [Wed, 20 Nov 2019 17:56:06 +0000 (09:56 -0800)]
devlink: fix requiring either handle
devlink sb occupancy show requires device or port handle.
It passes both device and port handle bits as required to
dl_argv_parse() so since commit 1896b100af46 ("devlink: catch
missing strings in dl_args_required") devlink will now
complain that only one is present:
$ devlink sb occupancy show pci/0000:06:00.0/0
BUG: unknown argument required but not found
Drop the bit for the handle which was not found from required.
Ido Kalir [Mon, 18 Nov 2019 08:35:30 +0000 (10:35 +0200)]
rdma: Rewrite custom JSON and prints logic to use common API
Instead of doing open-coded solution to generate JSON and prints, let's
reuse existing infrastructure and APIs to do the same as ip/*.
Before this change:
if (rd->json_output)
jsonw_uint_field(rd->jw, "sm_lid", sm_lid);
else
pr_out("sm_lid %u ", sm_lid);
After this change:
print_uint(PRINT_ANY, "sm_lid", "sm_lid %u ", sm_lid);
All the print functions are converted to support color but for now the
type of color is COLOR_NONE. This is done as a preparation to addition
of color enable option. Such change will require rewrite of command line
arguments parser which is out-of-scope for this patch.
Signed-off-by: Ido Kalir <idok@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Danit Goldberg [Mon, 18 Nov 2019 07:49:12 +0000 (09:49 +0200)]
ip link: Add support to get SR-IOV VF node GUID and port GUID
Extend iplink to show VF GUIDs (IFLA_VF_IB_NODE_GUID, IFLA_VF_IB_PORT_GUID),
giving the ability for user-space application to print GUID values.
This ability is added to the one of setting new node GUID and port GUID values.
Suitable ip link command:
- ip link show <device>
For example:
- ip link set ib4 vf 0 node_guid 22:44:33:00:33:11:00:33
- ip link set ib4 vf 0 port_guid 10:21:33:12:00:11:22:10
- ip link show ib4
ib4: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256
link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
vf 0 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off
Signed-off-by: Danit Goldberg <danitg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Eli Britstein [Thu, 14 Nov 2019 12:44:41 +0000 (14:44 +0200)]
tc: flower: fix output for ip tos and ttl
Fix the output for ip tos and ttl to be numbers in JSON format.
Example:
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 protocol ip parent ffff: prio 1 flower skip_hw \
ip_tos 5/0xf action drop
Non JSON format remains the same:
$ tc filter show dev eth0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
ip_tos 5/0xf
skip_hw
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1
JSON format is changed (partial output):
$ tc -p -j filter show dev eth0 parent ffff:
Before:
"options": {
"keys": {
"ip_tos": "0x5/f",
...
After:
"options": {
"keys": {
"ip_tos": 5,
"ip_tos_mask": 15,
...
Fixes: 6ea2c2b1cff6 ("tc: flower: add support for matching on ip tos and ttl") Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Eli Britstein [Thu, 14 Nov 2019 12:44:37 +0000 (14:44 +0200)]
tc_util: introduce a function to print JSON/non-JSON masked numbers
Introduce a function to print masked number with a different output for
JSON or non-JSON methods, as a pre-step towards printing numbers using
this common function.
Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Guillaume Nault [Fri, 8 Nov 2019 21:28:04 +0000 (22:28 +0100)]
man: remove ppp from list of devices not allowed to change netns
PPP devices can be moved to different network namespaces. The feature
was added by commit 79c441ae505c ("ppp: implement x-netns support")
in Linux 4.3.
Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
David Ahern [Sat, 9 Nov 2019 01:34:24 +0000 (01:34 +0000)]
Merge branch 'nsid-cleanup' into next
Guillaume Nault says:
====================
It's currently hard to review ipnetns. The netns ids are inconsistently
treated as signed or unsigned and most helper functions aren't prepared
to use negative ids.
Netns id attributes can be negative: NETNSA_NSID_NOT_ASSIGNED =3D=3D -1.
So let's consistently treat nsids as signed and also reject negative
values in functions that are supposed to only handle assigned netns
ids.
While there, let's drop the extra blank line generated by some command
line parsing errors (patch 5/5).
Guillaume Nault [Fri, 8 Nov 2019 17:00:12 +0000 (18:00 +0100)]
ipnetns: treat NETNSA_NSID and NETNSA_CURRENT_NSID as signed
These attributes are signed (with -1 meaning NETNSA_NSID_NOT_ASSIGNED).
So let's use rta_getattr_s32() and print_int() instead of their
unsigned counterpart to avoid confusion.
Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Jakub Kicinski [Tue, 5 Nov 2019 21:17:05 +0000 (13:17 -0800)]
devlink: fix referencing namespace by PID
netns parameter for devlink reload is supposed to take PID
as well as string name. However, the PID parsing has two
bugs:
- the opts->netns member is unsigned so the < 0
condition is always false;
- the parameter list is not rewinded after parsing as
a name, so parsing as a pid uses the wrong argument.
Fixes: 08e8e1ca3e05 ("devlink: extend reload command to add support for network namespace change") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Jakub Kicinski [Tue, 5 Nov 2019 21:13:36 +0000 (13:13 -0800)]
devlink: require resource parameters
If devlink resource set parameters are not provided it crashes:
$ devlink resource set netdevsim/netdevsim0
Segmentation fault (core dumped)
This is because even though DL_OPT_RESOURCE_PATH and
DL_OPT_RESOURCE_SIZE are passed as o_required, the validation
table doesn't contain a relevant string.
Fixes: 8cd644095842 ("devlink: Add support for devlink resource abstraction") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Vlad Buslov [Wed, 30 Oct 2019 14:20:40 +0000 (16:20 +0200)]
tc: implement support for action flags
Implement setting and printing of action flags with single available flag
value "no_percpu" that translates to kernel UAPI TCA_ACT_FLAGS value
TCA_ACT_FLAGS_NO_PERCPU_STATS. Update man page with information regarding
usage of action flags.
Example usage:
# tc actions add action gact drop no_percpu
# sudo tc actions list action gact
total acts 1
action order 0: gact action drop
random type none pass val 0
index 1 ref 1 bind 0
no_percpu
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
fread(3) returns size_t data type which is unsigned, thus check
`if (fread(...) < 0)' is always false. To check if fread(3) has
failed, user should check error indicator with ferror(3).
This commit also changes read logic a little bit by being less
forgiving for errors. Previous logic was checking if fread(3)
read *at least* required ammount of data, now code checks if
fread(3) read *exactly* expected ammount of data. This makes
sense because code parses very specific binary file, and reading
even 1 less/more byte than expected, will later corrupt data anyway.
Signed-off-by: Michał Łyszczek <michal.lyszczek@bofc.pl> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Vlad Buslov [Tue, 29 Oct 2019 17:53:46 +0000 (19:53 +0200)]
tc: remove duplicated NEXT_ARG_FWD() in parse_ct()
Function parse_ct() manually calls NEXT_ARG_FWD() after
parse_action_control_dflt(). This is redundant because
parse_action_control_dflt() modifies argc and argv itself. Moreover, such
implementation parses out any following actions option. For example, adding
action ct with cookie errors:
$ sudo tc actions add action ct cookie 111111111111
Bad action type 111111111111
Usage: ... gact <ACTION> [RAND] [INDEX]
Where: ACTION := reclassify | drop | continue | pass | pipe |
goto chain <CHAIN_INDEX> | jump <JUMP_COUNT>
RAND := random <RANDTYPE> <ACTION> <VAL>
RANDTYPE := netrand | determ
VAL : = value not exceeding 10000
JUMP_COUNT := Absolute jump from start of action list
INDEX := index value used
With fix:
$ sudo tc actions add action ct cookie 111111111111
$ sudo tc actions list action ct
total acts 1
action order 0: ct zone 0 pipe
index 1 ref 1 bind 0
cookie 111111111111
Michał Łyszczek [Thu, 24 Oct 2019 21:20:43 +0000 (23:20 +0200)]
rdma/sys.c: fix possible out-of-bound array access
netns_modes_str[] array has 2 elements, when netns_mode is 2,
condition (2 <= 2) will be true and `mode_str = netns_modes_str[2]'
will be executed, which will result in out-of-bound read.
Signed-off-by: Michał Łyszczek <michal.lyszczek@bofc.pl> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Jiri Pirko [Thu, 24 Oct 2019 10:20:52 +0000 (12:20 +0200)]
ip: allow to use alternative names as handle
Extend ll_name_to_index() to get the index of a netdevice using
alternative interface name. Allow alternative long names to pass checks
in couple of ip link/addr commands.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Jiri Pirko [Thu, 24 Oct 2019 10:20:50 +0000 (12:20 +0200)]
lib/ll_map: cache alternative names
Alternative names are related to the "parent name". That means,
whenever ll_remember_index() is called to add/delete/update and it founds
the "parent name" im object by ifindex, processes related
alternative name im objects too. Put them in a list which holds the
relationship with the parent.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
David Ahern [Sun, 27 Oct 2019 17:28:49 +0000 (10:28 -0700)]
Merge branch 'rdma-mr-stats' into next
Leon Romanovsky says:
====================
This is supplementary part of "ODP information and statistics"
kernel series.
https://lore.kernel.org/linux-rdma/20191016062308.11886-1-leon@kernel.org
Michał Łyszczek [Tue, 22 Oct 2019 20:09:23 +0000 (22:09 +0200)]
ipnetns: do not check netns NAME when -all is specified
When `-all' argument is specified netns runs cmd on all namespaces
and NAME is not used, but netns nevertheless checks if argv[1] is a
valid namespace name ignoring the fact that argv[1] contains cmd
and not NAME. This results in bug where user cannot specify
absolute path to command.
# ip -all netns exec /usr/bin/whoami
Invalid netns name "/usr/bin/whoami"
This forces user to have his command in PATH.
Solution is simply to not validate argv[1] when `-all' argument is
specified.
Signed-off-by: Michał Łyszczek <michal.lyszczek@bofc.pl> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Davide Caratti [Mon, 7 Oct 2019 10:16:44 +0000 (12:16 +0200)]
ss: allow dumping kTLS info
now that INET_DIAG_INFO requests can dump TCP ULP information, extend 'ss'
to allow diagnosing kTLS when it is attached to a TCP socket. While at it,
import kTLS uAPI definitions from the latest net-next tree.
CC: Andrea Claudi <aclaudi@redhat.com> Co-developed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Nicolas Dichtel [Mon, 7 Oct 2019 13:44:47 +0000 (15:44 +0200)]
ipnetns: enable to dump nsid conversion table
This patch enables to dump/get nsid from a netns into another netns.
Example:
$ ./test.sh
+ ip netns add foo
+ ip netns add bar
+ touch /var/run/netns/init_net
+ mount --bind /proc/1/ns/net /var/run/netns/init_net
+ ip netns set init_net 11
+ ip netns set foo 12
+ ip netns set bar 13
+ ip netns
init_net (id: 11)
bar (id: 13)
foo (id: 12)
+ ip -n foo netns set init_net 21
+ ip -n foo netns set foo 22
+ ip -n foo netns set bar 23
+ ip -n foo netns
init_net (id: 21)
bar (id: 23)
foo (id: 22)
+ ip -n bar netns set init_net 31
+ ip -n bar netns set foo 32
+ ip -n bar netns set bar 33
+ ip -n bar netns
init_net (id: 31)
bar (id: 33)
foo (id: 32)
+ ip netns list-id target-nsid 12
nsid 21 current-nsid 11 (iproute2 netns name: init_net)
nsid 22 current-nsid 12 (iproute2 netns name: foo)
nsid 23 current-nsid 13 (iproute2 netns name: bar)
+ ip -n foo netns list-id target-nsid 21
nsid 11 current-nsid 21 (iproute2 netns name: init_net)
nsid 12 current-nsid 22 (iproute2 netns name: foo)
nsid 13 current-nsid 23 (iproute2 netns name: bar)
+ ip -n bar netns list-id target-nsid 33 nsid 32
nsid 32 current-nsid 32 (iproute2 netns name: foo)
+ ip -n bar netns list-id target-nsid 31 nsid 32
nsid 12 current-nsid 32 (iproute2 netns name: foo)
+ ip netns list-id nsid 13
nsid 13 (iproute2 netns name: bar)
CC: Petr Oros <poros@redhat.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Tested-by: Petr Oros <poros@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Aya Levin [Wed, 2 Oct 2019 14:35:16 +0000 (17:35 +0300)]
devlink: Fix inconsistency between command input and output
In devlink health show command the reporter's name parameter is called
reporter, but in the output the reporter's name is referred to as name
Before this patch:
$ devlink health show pci/0000:04:00.0 reporter tx
pci/0000:04:00.0:
name tx
state healthy error 0 recover 0 grace_period 500 auto_recover true
After this patch:
$ devlink health show pci/0000:04:00.0 reporter tx
pci/0000:04:00.0:
reporter tx
state healthy error 0 recover 0 grace_period 500 auto_recover true
Reported-by: Jiri Pirko <jiri@mellanox.com> Fixes: 2f1242efe9d0 ("devlink: Add devlink health show command") Signed-off-by: Aya Levin <ayal@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Aya Levin [Wed, 2 Oct 2019 14:35:15 +0000 (17:35 +0300)]
devlink: Left justification on FMSG output
FMSG output is dynamic, space separator must be on the left hand side of
the value. Otherwise output has redundant left indentation regardless
the hierarchy.
Fixes: 844a61764c6f ("devlink: Add helper functions for name and value separately") Signed-off-by: Aya Levin <ayal@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Tue, 1 Oct 2019 10:32:17 +0000 (12:32 +0200)]
tc: fix segmentation fault on gact action
tc segfaults if gact action is used without action or index:
$ ip link add type dummy
$ tc actions add action pipe index 1
$ tc filter add dev dummy0 parent ffff: protocol ip \
pref 10 u32 match ip src 127.0.0.2 flowid 1:10 action gact
Segmentation fault
We expect tc to fail gracefully with an error message.
This happens if gact is the last argument of the incomplete
command. In this case the "gact" action is parsed, the macro
NEXT_ARG_FWD() is executed and the next matches() crashes
because of null argv pointer.
To avoid this, simply use NEXT_ARG() instead.
With this change in place:
$ ip link add type dummy
$ tc actions add action pipe index 1
$ tc filter add dev dummy0 parent ffff: protocol ip \
pref 10 u32 match ip src 127.0.0.2 flowid 1:10 action gact
Command line is not complete. Try option "help"
Fixes: fa4958897314 ("tc: Fix binding of gact action by index.") Reported-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Damien Robert [Mon, 30 Sep 2019 21:11:37 +0000 (23:11 +0200)]
man: add reference to `ip route add encap ... src`
The ability to specify the source adresse for 'encap ip' / 'encap ip6'
was added in commit 94a8722f2f78f04c47678cf864ac234a38366709 but the man
page was not updated.
Also fixes a missing page in ip-route.8.in.
Signed-off-by: Damien Robert <damien.olivier.robert+git@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Jiri Pirko [Thu, 3 Oct 2019 09:51:15 +0000 (11:51 +0200)]
devlink: extend reload command to add support for network namespace change
Extend existing devlink reload command by adding option "netns" by which
user can instruct kernel to reload the devlink instance into specified
network namespace.
Example:
$ ip netns add testns1
$ devlink dev reload netdevsim/netdevsim10 netns testns1
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Leon Romanovsky [Wed, 2 Oct 2019 13:49:34 +0000 (16:49 +0300)]
rdma: Relax requirement to have PID for HW objects
RDMA has weak connection between PIDs and HW objects, because
the latter tied to file descriptors for their lifetime management.
The outcome of such connection is that for the following scenario,
the returned PID will be 0 (not-valid):
1. Create FD and context
2. Share it with ephemeral child
3. Create any object and exit that child
This flow was revealed in testing environment and of course real users
are not running such scenario, because it makes no sense at all in RDMA
world.
Let's do two changes in the code to support such workflow anyway:
1. Remove need to provide PID/kernel name. Code already supports it,
just need to remove extra validation.
2. Ball-out in case PID is 0.
Thomas Haller [Wed, 25 Sep 2019 10:24:03 +0000 (12:24 +0200)]
man: add note to ip-macsec manual about necessary key management
The man page of ip-macsec and the existance of the tool makes it seem like
the user could just configure static keys once, and be done with it. That is
not the case. Some form or key management must be done in user space.
Add a note about that.
Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Joe Stringer [Fri, 20 Sep 2019 02:04:47 +0000 (19:04 -0700)]
bpf: Fix race condition with map pinning
If two processes attempt to invoke bpf_map_attach() at the same time,
then they will both create maps, then the first will successfully pin
the map to the filesystem and the second will not pin the map, but will
continue operating with a reference to its own copy of the map. As a
result, the sharing of the same map will be broken from the two programs
that were concurrently loaded via loaders using this library.
Fix this by adding a retry in the case where the pinning fails because
the map already exists on the filesystem. In that case, re-attempt
opening a fd to the map on the filesystem as it shows that another
program already created and pinned a map at that location.
Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Mon, 16 Sep 2019 13:00:55 +0000 (15:00 +0200)]
bpf: replace snprintf with asprintf when dealing with long buffers
This reduces stack usage, as asprintf allocates memory on the heap.
This indirectly fixes a snprintf truncation warning (from gcc v9.2.1):
bpf.c: In function ‘bpf_get_work_dir’:
bpf.c:784:49: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
784 | snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
| ^
bpf.c:784:2: note: ‘snprintf’ output between 2 and 4097 bytes into a destination of size 4096
784 | snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fixes: e42256699cac ("bpf: make tc's bpf loader generic and move into lib") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Nicolas Dichtel [Mon, 16 Sep 2019 15:36:27 +0000 (17:36 +0200)]
link_xfrm: don't force to set phydev
Since linux commit 22d6552f827e ("xfrm interface: fix management of
phydev"), phydev is not mandatory anymore.
Note that it also could be useful before the above commit to not force the
user to put a phydev (the kernel was checking it anyway).
For example, it was useful to not set it in case of x-netns, because the
phydev is not available in the current netns:
Before the patch:
$ ip netns add foo
$ ip link add xfrm1 type xfrm dev eth1 if_id 1
$ ip link set xfrm1 netns foo
$ ip -n foo link set xfrm1 type xfrm dev eth1 if_id 2
Cannot find device "eth1"
$ ip -n foo link set xfrm1 type xfrm if_id 2
must specify physical device
Fixes: 286446c1e8c7 ("ip: support for xfrm interfaces") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Matt Ellison <matt@arroyo.io> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The 'fw_load_policy' devlink parameter now supports an unknown value.
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Add support for the new devlink parameter along with string to uint
conversion.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David Ahern <dsahern@gmail.com>
David Dai [Wed, 4 Sep 2019 15:06:51 +0000 (10:06 -0500)]
iproute2-next: police: support 64bit rate and peakrate in tc utility
For high speed adapter like Mellanox CX-5 card, it can reach upto
100 Gbits per second bandwidth. Currently htb already supports 64bit rate
in tc utility. However police action rate and peakrate are still limited
to 32bit value (upto 32 Gbits per second). Taking advantage of the 2 new
attributes TCA_POLICE_RATE64 and TCA_POLICE_PEAKRATE64 from kernel,
tc can use them to break the 32bit limit, and still keep the backward
binary compatibility.
Tested-by: David Dai <zdai@linux.vnet.ibm.com> Signed-off-by: David Dai <zdai@linux.vnet.ibm.com> Signed-off-by: David Ahern <dsahern@gmail.com>
David Ahern [Wed, 4 Sep 2019 15:09:52 +0000 (08:09 -0700)]
nexthop: Add space after blackhole
Add a space after 'blackhole' is missing to properly separate the
protocol when it is given.
Fixes: 63df8e8543b0 ("Add support for nexthop objects") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Wed, 4 Sep 2019 17:26:14 +0000 (19:26 +0200)]
devlink: fix segfault on health command
devlink segfaults when using grace_period without reporter
$ devlink health set pci/0000:00:09.0 grace_period 3500
Segmentation fault
devlink is instead supposed to gracefully fail printing a warning
message
$ devlink health set pci/0000:00:09.0 grace_period 3500
Reporter's name is expected.
This happens because DL_OPT_HEALTH_REPORTER_NAME and
DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD are both defined as BIT(27).
When dl_opts_put() parse options and grace_period is set, it erroneously
tries to set reporter name to null.
This is fixed simply shifting by 1 bit enumeration starting with
DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD.
Fixes: b18d89195b16 ("devlink: Add devlink health set command") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Donald Sharp [Sat, 10 Aug 2019 00:18:43 +0000 (20:18 -0400)]
ip nexthop: Allow flush|list operations to specify a specific protocol
In the case where we have a large number of nexthops from a specific
protocol, allow the flush and list operations to take a protocol
to limit the commands scopes.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com> Signed-off-by: David Ahern <dsahern@gmail.com>