git.proxmox.com Git - mirror_iproute2.git/log
5 years ago bridge: fdb: add support for src_vni option
Roopa Prabhu [Mon, 4 Mar 2019 05:26:32 +0000 (21:26 -0800)]
bridge: fdb: add support for src_vni option

We already print src_vni for an fdb entry when present.
This patch adds the ability to set src_vni on an fdb
entry. When not specified, the kernel will use the vni specified
on the vxlan device. This can be used on a vxlan fdb entry
when the vxlan device is in external or collect metadata
mode.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago Merge branch 'devlink-health' into next
David Ahern [Thu, 28 Feb 2019 16:00:19 +0000 (08:00 -0800)]
Merge branch 'devlink-health' into next

Aya Levin says:

====================

This series adds support for devlink health commands:
 devlink health show     [ DEV reporter REPORTER_NAME ]
 devlink health recover    DEV reporter REPORTER_NAME
 devlink health diagnose   DEV reporter REPORTER_NAME
 devlink health dump show  DEV reporter REPORTER_NAME
 devlink health dump clear DEV reporter REPORTER_NAME
 devlink health set        DEV reporter REPORTER_NAME { grace_period | auto_recover } { msec | boolean }

The first patch refactors the validation of input parameters, which
had grown way too long. The second and third patches fix bugs that were
discovered during the devlink health development. The fourth patch adds
helper functions which enable output of values and labels separately.
Patches 5-10 add the devlink health functionality by command; the last
is the man page.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago devlink: Add devlink-health man page
Aya Levin [Thu, 28 Feb 2019 12:13:04 +0000 (14:13 +0200)]
devlink: Add devlink-health man page

Add a man page describing devlink health's command set. Also add a
reference link from devlink main man page.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago devlink: Add devlink health set command
Aya Levin [Thu, 28 Feb 2019 12:13:03 +0000 (14:13 +0200)]
devlink: Add devlink health set command

Add devlink health set command which enables the user to configure parameters
related to the devlink health mechanism per reporter.
1) grace_period [msec] time interval between auto recoveries.
2) auto_recover [true/false] whether devlink should execute automatic
recovery on error.
Add a helper function to retrieve a boolean value as an input parameter.
Example:
$ devlink health set pci/0000:00:09.0 reporter tx grace_period 3500
$ devlink health set pci/0000:00:09.0 reporter tx auto_recover false

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
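
The boolean and millisecond inputs described in the set command above can be
sketched as follows. This is only an illustration in Python, not the devlink C
code: the helper names and the accepted spellings ("true", "1", "false", "0")
are assumptions.

```python
# Hypothetical helpers: the accepted spellings below are an assumption,
# not a copy of devlink's actual parser.
TRUE_WORDS = {"true", "1"}
FALSE_WORDS = {"false", "0"}


def parse_bool(text):
    """Return True/False for a boolean command-line word, else raise."""
    word = text.strip().lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    raise ValueError("invalid boolean value: %r" % text)


def parse_grace_period(text):
    """Parse a grace period given in milliseconds; must be non-negative."""
    msec = int(text)
    if msec < 0:
        raise ValueError("grace_period must not be negative")
    return msec
```

Rejecting unknown words instead of treating them as false keeps a typo such as
"flase" from silently disabling auto recovery.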
5 years ago devlink: Add devlink health dump clear command
Aya Levin [Thu, 28 Feb 2019 12:13:02 +0000 (14:13 +0200)]
devlink: Add devlink health dump clear command

Add devlink health dump clear command which deletes the last saved dump file.
Clearing the last saved dump enables a new dump file to be saved.
Example:
$ devlink health dump clear pci/0000:00:09.0 reporter tx

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago devlink: Add devlink health dump show command
Aya Levin [Thu, 28 Feb 2019 12:13:01 +0000 (14:13 +0200)]
devlink: Add devlink health dump show command

Add devlink health dump show command which displays the last saved dump.
Devlink health saves a single dump. If a dump is not already stored
by devlink for this reporter, devlink generates a new dump. The dump
can be generated automatically when a reporter reports an
error or manually at the user's request.
The dump's output is defined by the reporter. The command uses the
infrastructure for flexible format output introduced in a previous patch.
Example:
$ devlink health dump show pci/0000:00:09.0 reporter tx

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago devlink: Add devlink health diagnose command
Aya Levin [Thu, 28 Feb 2019 12:13:00 +0000 (14:13 +0200)]
devlink: Add devlink health diagnose command

Add devlink health diagnose command, which enables the user to retrieve
diagnostics data from a reporter on a device. The command's output is
free text defined by the reporter.

This patch also introduces an infrastructure for flexible format
output. This allows the command to display different data fields
according to the reporter.
Example:
$ devlink health diagnose pci/0000:00:0a.0 reporter tx
SQs:
  sqn: 4403 HW state: 1 stopped: false
  sqn: 4408 HW state: 1 stopped: false
  sqn: 4413 HW state: 1 stopped: false
  sqn: 4418 HW state: 1 stopped: false
  sqn: 4423 HW state: 1 stopped: false

$ devlink health diagnose pci/0000:00:0a.0 reporter tx -jp
{
 "SQs":[
      {
       "sqn":4403,
       "HW state":1,
       "stopped":false
     },
      {
       "sqn":4408,
       "HW state":1,
       "stopped":false
     },
      {
       "sqn":4413,
       "HW state":1,
       "stopped":false
     },
      {
       "sqn":4418,
       "HW state":1,
       "stopped":false
     },
      {
       "sqn":4423,
       "HW state":1,
       "stopped":false
     }
   ]
}

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
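
Because the diagnose fields are defined by the reporter, a script consuming the
-jp output has to be written per reporter. A minimal sketch in Python for the
"SQs" layout shown above; the sample is trimmed to two entries from that
output, and the variable names are illustrative only.

```python
import json

# Two entries copied from the "tx" reporter example output above.
SAMPLE = """
{
 "SQs":[
      {"sqn":4403, "HW state":1, "stopped":false},
      {"sqn":4408, "HW state":1, "stopped":false}
   ]
}
"""

data = json.loads(SAMPLE)

# Collect every send queue number, and separately the stopped ones.
all_sqns = [sq["sqn"] for sq in data["SQs"]]
stopped_sqns = [sq["sqn"] for sq in data["SQs"] if sq["stopped"]]
```

Another reporter may emit entirely different keys, so code like this should
not be reused blindly across reporters.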
5 years ago devlink: Add devlink health recover command
Aya Levin [Thu, 28 Feb 2019 12:12:59 +0000 (14:12 +0200)]
devlink: Add devlink health recover command

Add devlink health recover command which enables the user to initiate a
recovery on a reporter (if a recovery cb was supplied by the reporter).
This operation will increment the recoveries counter displayed in the
show command.
Example:
$ devlink health recover pci/0000:00:09.0 reporter tx

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago devlink: Add devlink health show command
Aya Levin [Thu, 28 Feb 2019 12:12:58 +0000 (14:12 +0200)]
devlink: Add devlink health show command

Add devlink health show command which displays status and configuration
info on a specific reporter on a device or dumps the info on all
reporters on all devices. Add helper functions to display status and
the dump's time stamp.
Example:
$ devlink health show pci/0000:00:09.0 reporter tx
pci/0000:00:09.0:
 name tx
  state healthy error 0 recover 1 last_dump_date 2019-02-14 last_dump_time 10:10:10 grace_period 600 auto_recover true
$ devlink health show pci/0000:00:09.0 reporter tx -jp
{
 "health":{
  "pci/0000:00:0a.0":[
     {
     "name":"tx",
     "state":"healthy",
     "error":0,
     "recover":1,
     "last_dump_date":"2019-Feb-14",
     "last_dump_time":"10:10:10",
     "grace_period":600,
     "auto_recover":true
    }
  ]
 }
}

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
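
The plain-text status line in the example above is a flat run of name/value
pairs. A rough Python sketch of splitting it, assuming no value contains
whitespace (the function name is hypothetical; scripts are usually better
served by the -j JSON output):

```python
STATUS_LINE = (
    "  state healthy error 0 recover 1 last_dump_date 2019-02-14 "
    "last_dump_time 10:10:10 grace_period 600 auto_recover true"
)


def parse_pairs(line):
    """Split 'name value name value ...' into a dict of strings."""
    tokens = line.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of tokens")
    # Even positions are names, odd positions are their values.
    return dict(zip(tokens[0::2], tokens[1::2]))


fields = parse_pairs(STATUS_LINE)
```

All values stay strings here; converting grace_period to an integer or
auto_recover to a boolean is left to the caller.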
5 years ago devlink: Add helper functions for name and value separately
Aya Levin [Thu, 28 Feb 2019 12:12:57 +0000 (14:12 +0200)]
devlink: Add helper functions for name and value separately

Add new helper functions which output only values (without a name
label) for different types: boolean, uint, uint64, string and binary.
In addition, add a helper function which prints only the name label.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago devlink: Fix boolean JSON print
Aya Levin [Thu, 28 Feb 2019 12:12:56 +0000 (14:12 +0200)]
devlink: Fix boolean JSON print

This patch removes the inverted commas from boolean values in JSON
format: true/false instead of "true"/"false".

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
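
Why the quoting matters to consumers can be seen with any JSON parser. A small
Python illustration; the key name auto_recover is borrowed from the health
examples and the two documents are made up for the comparison:

```python
import json

# Output style after the fix: a real JSON boolean.
unquoted = json.loads('{"auto_recover": false}')
# Output style before the fix: the word wrapped in inverted commas.
quoted = json.loads('{"auto_recover": "false"}')

is_real_bool = unquoted["auto_recover"] is False
# A non-empty string is truthy, so a naive check reads "false" as enabled.
quoted_looks_enabled = bool(quoted["auto_recover"])
```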
5 years ago devlink: Fix print of uint64_t
Aya Levin [Thu, 28 Feb 2019 12:12:55 +0000 (14:12 +0200)]
devlink: Fix print of uint64_t

This patch prints uint64_t with its corresponding format and avoids an implicit
cast to uint32_t.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
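
The effect of such an implicit narrowing can be shown with simple arithmetic.
The sketch below is Python, not the devlink code, and the function name is
made up; it keeps only the low 32 bits, which is what a conversion from
uint64_t to uint32_t retains.

```python
UINT32_MASK = 0xFFFFFFFF


def truncate_to_uint32(value):
    """Keep only the low 32 bits, as a uint64_t -> uint32_t conversion does."""
    return value & UINT32_MASK


# A counter just above the 32-bit range loses its high bits when narrowed.
counter = (1 << 32) + 5
narrowed = truncate_to_uint32(counter)
```

Values below 2^32 are unaffected, which is why this kind of bug tends to go
unnoticed until a large counter shows up.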
5 years ago devlink: Refactor validation of finding required arguments
Aya Levin [Thu, 28 Feb 2019 12:12:54 +0000 (14:12 +0200)]
devlink: Refactor validation of finding required arguments

Introduce an argument metadata structure which matches a bitmap flag per
required argument with an error message to print if it is missing. Use this
static array to refactor the validation of required arguments in the devlink
command line and to ease further maintenance.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
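
The table-driven validation described above can be sketched as follows. This
is a Python illustration of the idea only; the flag names, messages and
function name are hypothetical and are not taken from the devlink source.

```python
from typing import NamedTuple

# Hypothetical option bits, one per argument that may be required.
OPT_HANDLE = 1 << 0
OPT_REPORTER_NAME = 1 << 1
OPT_GRACE_PERIOD = 1 << 2


class ArgMeta(NamedTuple):
    flag: int      # bit identifying the argument
    err_msg: str   # message to print when a required argument is missing


# One static table replaces a long chain of per-argument if statements.
REQUIRED_ARGS = [
    ArgMeta(OPT_HANDLE, "Device handle expected."),
    ArgMeta(OPT_REPORTER_NAME, "Reporter name expected."),
    ArgMeta(OPT_GRACE_PERIOD, "Grace period expected."),
]


def missing_required(required, found):
    """Return the error messages for required bits absent from 'found'."""
    return [
        meta.err_msg
        for meta in REQUIRED_ARGS
        if (required & meta.flag) and not (found & meta.flag)
    ]
```

Adding a new required argument then means adding one table row rather than
another branch in the validation function.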
5 years ago rdma: Add the prefix for driver attributes
Leon Romanovsky [Wed, 27 Feb 2019 06:41:51 +0000 (08:41 +0200)]
rdma: Add the prefix for driver attributes

There is a need to distinguish between driver-specific and generally exposed
attributes. The most common use case is to expose some internal
garbage under an extremely common and sexy name, e.g. pi, ci, etc.

In order to achieve that, we will add "drv_" prefix to all strings
which were received through RDMA_NLDEV_ATTR_DRIVER_* attributes.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago devlink: add support for updating device flash
Jakub Kicinski [Tue, 26 Feb 2019 20:20:14 +0000 (12:20 -0800)]
devlink: add support for updating device flash

Add new command for updating flash of devices via devlink API.
Example:

$ cp flash-boot.bin /lib/firmware/
$ devlink dev flash pci/0000:05:00.0 file flash-boot.bin

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago Update kernel headers
David Ahern [Wed, 27 Feb 2019 16:23:22 +0000 (08:23 -0800)]
Update kernel headers

Update kernel headers to commit:
    ff8285f81822 ("net: sched: pie: fix 64-bit division")

Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago Merge branch 'rdma-object-ids' into next
David Ahern [Sun, 24 Feb 2019 15:13:21 +0000 (07:13 -0800)]
Merge branch 'rdma-object-ids' into next

Leon Romanovsky says:

====================

This series adds the ability to present and query all objects known to
rdmatool by their respective unique IDs (e.g. pdn, mrn, cqn, etc.).
All objects which have a "parent" object have this information too.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago rdma: Provide and reuse filter functions
Leon Romanovsky [Sat, 23 Feb 2019 09:15:28 +0000 (11:15 +0200)]
rdma: Provide and reuse filter functions

Globally replace all filter functions with safer is_filtered variants,
which take into account the availability/lack of netlink attributes.

This conversion fixed a number of places in the code where
the previous implementation didn't honor filter requests if the netlink
attribute wasn't present.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago rdma: Perform single .doit call to query specific objects
Leon Romanovsky [Sat, 23 Feb 2019 09:15:27 +0000 (11:15 +0200)]
rdma: Perform single .doit call to query specific objects

If the user provides a specific index, we can speed up the query
by using the .doit callback, which avoids a full dump and the
filtering after it.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago rdma: Unify netlink attribute checks prior to prints
Leon Romanovsky [Sat, 23 Feb 2019 09:15:26 +0000 (11:15 +0200)]
rdma: Unify netlink attribute checks prior to prints

Check whether a netlink attribute is available in one general place,
instead of doing the same check in many places.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago rdma: Move QP code to separate function
Leon Romanovsky [Sat, 23 Feb 2019 09:15:25 +0000 (11:15 +0200)]
rdma: Move QP code to separate function

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago rdma: Place PD parsing print routine into separate function
Leon Romanovsky [Sat, 23 Feb 2019 09:15:24 +0000 (11:15 +0200)]
rdma: Place PD parsing print routine into separate function

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago rdma: Move MR code to be suitable for per-line parsing
Leon Romanovsky [Sat, 23 Feb 2019 09:15:23 +0000 (11:15 +0200)]
rdma: Move MR code to be suitable for per-line parsing

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago rdma: Refactor CQ prints
Leon Romanovsky [Sat, 23 Feb 2019 09:15:22 +0000 (11:15 +0200)]
rdma: Refactor CQ prints

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago rdma: Simplify CM_ID print code
Leon Romanovsky [Sat, 23 Feb 2019 09:15:21 +0000 (11:15 +0200)]
rdma: Simplify CM_ID print code

Refactor out the CM_ID print code.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago rdma: Simplify code to reuse existing functions
Leon Romanovsky [Sat, 23 Feb 2019 09:15:20 +0000 (11:15 +0200)]
rdma: Simplify code to reuse existing functions

Remove duplicated functions in favour of the general res_print_uint() call.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago rdma: Properly mark RDMAtool license
Leon Romanovsky [Sat, 23 Feb 2019 09:15:19 +0000 (11:15 +0200)]
rdma: Properly mark RDMAtool license

The RDMA subsystem is properly dual-licensed as "GPL-2.0 OR Linux-OpenIB",
and Mellanox submissions are supposed to carry this type of license.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago rdma: Move resource QP logic to separate file
Leon Romanovsky [Sat, 23 Feb 2019 09:15:18 +0000 (11:15 +0200)]
rdma: Move resource QP logic to separate file

Logically separate resource QP logic to separate file,
in order to make QP specific logic self-contained.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago rdma: Move out resource CM-ID logic to separate file
Leon Romanovsky [Sat, 23 Feb 2019 09:15:17 +0000 (11:15 +0200)]
rdma: Move out resource CM-ID logic to separate file

Logically separate resource CM-ID logic to separate file,
in order to make CM-ID specific logic self-contained.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago rdma: Move out resource CQ logic to separate file
Leon Romanovsky [Sat, 23 Feb 2019 09:15:16 +0000 (11:15 +0200)]
rdma: Move out resource CQ logic to separate file

Logically separate resource CQ logic to separate file,
in order to make CQ specific logic self-contained.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago rdma: Refactor out resource MR logic to separate file
Leon Romanovsky [Sat, 23 Feb 2019 09:15:15 +0000 (11:15 +0200)]
rdma: Refactor out resource MR logic to separate file

Logically separate resource MR logic to separate file,
in order to make MR specific logic self-contained.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago rdma: Move resource PD logic to separate file
Leon Romanovsky [Sat, 23 Feb 2019 09:15:14 +0000 (11:15 +0200)]
rdma: Move resource PD logic to separate file

Logically separate resource PD logic to separate file,
in order to make PD specific logic self-contained.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago rdma: Provide parent context index for all objects except CM_ID
Leon Romanovsky [Sat, 23 Feb 2019 09:15:13 +0000 (11:15 +0200)]
rdma: Provide parent context index for all objects except CM_ID

Allow users to correlate an allocated object with its relevant parent.

[leonro@server ~]$ rdma res show pd
dev mlx5_0 users 5 pid 0 comm [ib_core] pdn 1
dev mlx5_0 users 7 pid 0 comm [ib_ipoib] pdn 2
dev mlx5_0 users 0 pid 0 comm [mlx5_ib] pdn 3
dev mlx5_0 users 2 pid 548 comm ibv_rc_pingpong ctxn 0 pdn 4

[leonro@server ~]$ rdma res show cq cqn 0-100
dev mlx5_0 cqe 2047 users 6 poll-ctx UNBOUND_WORKQUEUE pid 0 comm [ib_core] cqn 2
dev mlx5_0 cqe 255 users 2 poll-ctx SOFTIRQ pid 0 comm [mlx5_ib] cqn 3
dev mlx5_0 cqe 511 users 1 poll-ctx DIRECT pid 0 comm [ib_ipoib] cqn 4
dev mlx5_0 cqe 255 users 1 poll-ctx DIRECT pid 0 comm [ib_ipoib] cqn 5
dev mlx5_0 cqe 255 users 0 poll-ctx SOFTIRQ pid 0 comm [mlx5_ib] cqn 6
dev mlx5_0 cqe 511 users 2 pid 548 comm ibv_rc_pingpong cqn 7 ctxn 0

[leonro@server ~]$ rdma res show mr
dev mlx5_0 mrlen 4096 pid 548 comm ibv_rc_pingpong mrn 4 pdn 0

[leonro@nps-server-14-015 ~]$ /images/leonro/src/iproute2/rdma/rdma res show qp
link mlx5_0/1 lqpn 0 type SMI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_0/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_0/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_0/1 lqpn 8 type UD state RTS sq-psn 0 pid 0 comm [ib_ipoib]
link mlx5_0/1 lqpn 9 pdn 4 rqpn 0 type RC state INIT rq-psn 0 sq-psn 0 path-mig-state MIGRATED pid 548 comm ibv_rc_pingpong

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago rdma: Provide unique indexes for all visible objects
Leon Romanovsky [Sat, 23 Feb 2019 09:15:12 +0000 (11:15 +0200)]
rdma: Provide unique indexes for all visible objects

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago rdma: Remove duplicated print code
Leon Romanovsky [Sat, 23 Feb 2019 09:15:11 +0000 (11:15 +0200)]
rdma: Remove duplicated print code

There is no need to keep same print functions for
uint32_t and uint64_t, unify them into one function.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago rdma: update uapi headers
Leon Romanovsky [Sat, 23 Feb 2019 09:15:10 +0000 (11:15 +0200)]
rdma: update uapi headers

Update the rdma_netlink.h file up to kernel commit
f2a0e45f36b0 RDMA/nldev: Don't expose number of not-visible entries

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago Improve batch and dump times by caching link lookups
David Ahern [Wed, 13 Feb 2019 23:56:30 +0000 (15:56 -0800)]
Improve batch and dump times by caching link lookups

ip route uses ll_name_to_index and ll_index_to_name to convert between
device names and indices. At the moment both use the ioctl based glibc
functions if_nametoindex and if_indextoname and do not cache the result.
When using a batch file or dumping a large number of routes this means the
same device lookups can be done repeatedly, adding unnecessary overhead
(socket + ioctl + close for each device lookup).

Add a new function, ll_link_get, to send a netlink based RTM_GETLINK. If
successful, cache the result in idx_head and name_head so future lookups
can re-use the entry. Update ll_name_to_index and ll_index_to_name to use
ll_link_get and only fall back to the glibc functions if it fails.

With this change the time to install 720,022 routes with 2 ecmp nexthops
where the nexthop device is given is reduced from 31.4 seconds to 19.2
seconds. A dump of those routes drops from 13.3 to 2.8 seconds.

Signed-off-by: David Ahern <dsahern@gmail.com>
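
The caching idea can be sketched in a few lines. This is a Python illustration
under stated assumptions, not the ll_map.c implementation: the class and
method names are made up, the expensive lookup is injected as a callable, and
the two dictionaries merely play the role the message gives to name_head and
idx_head.

```python
class LinkCache:
    """Remember name <-> index mappings so each device is looked up once."""

    def __init__(self, lookup):
        self._lookup = lookup    # expensive path: callable(name) -> index
        self._by_name = {}       # name  -> index
        self._by_index = {}      # index -> name

    def name_to_index(self, name):
        index = self._by_name.get(name)
        if index is None:
            # Cache miss: do the expensive lookup once and store both ways.
            index = self._lookup(name)
            self._by_name[name] = index
            self._by_index[index] = name
        return index

    def index_to_name(self, index):
        # Only answers from the cache; returns None when nothing is cached.
        return self._by_index.get(index)

    def drop_by_index(self, index):
        # Invalidate an entry, e.g. after the link has been modified.
        name = self._by_index.pop(index, None)
        if name is not None:
            self._by_name.pop(name, None)
```

For scale, the timings quoted above amount to roughly 39% less time for the
install (31.4 to 19.2 seconds) and roughly 79% less for the dump (13.3 to 2.8
seconds).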
5 years ago ip link: Drop cache entry on any changes
David Ahern [Wed, 13 Feb 2019 23:53:21 +0000 (15:53 -0800)]
ip link: Drop cache entry on any changes

Remove any entry from the link cache when the link is modified.

Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago ll_map: Add function to remove link cache entry by index
David Ahern [Mon, 7 Jan 2019 22:29:15 +0000 (14:29 -0800)]
ll_map: Add function to remove link cache entry by index

Add ll_drop_by_index to remove an entry from the link cache.

Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago Merge branch 'iproute2-master' into next
David Ahern [Sat, 23 Feb 2019 02:50:39 +0000 (18:50 -0800)]
Merge branch 'iproute2-master' into next

Conflicts:
misc/ss.c

Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago ss: Render buffer to output every time a number of chunks are allocated
Stefano Brivio [Thu, 14 Feb 2019 00:58:32 +0000 (01:58 +0100)]
ss: Render buffer to output every time a number of chunks are allocated

Eric reported that, with 10 million sockets, ss -emoi (about 1000 bytes
output per socket) can easily lead to OOM (buffer would grow to 10GB of
memory).

Limit the maximum size of the buffer to five chunks, 1M each. Render and
flush buffers whenever we reach that.

This might make the resulting blocks slightly unaligned between them, with
occasional loss of readability on lines occurring every 5k to 50k sockets
approximately. Something like (from ss -tu):

[...]
CLOSE-WAIT   32       0           192.168.1.50:35232           10.0.0.1:https
ESTAB        0        0           192.168.1.50:53820           10.0.0.1:https
ESTAB       0        0           192.168.1.50:46924            10.0.0.1:https
CLOSE-WAIT  32       0           192.168.1.50:35228            10.0.0.1:https
[...]

However, I don't actually expect any human user to scroll through that
amount of sockets, so readability should be preserved when it matters.

The bulk of the diffstat comes from moving field_next() around, as we now
call render() from it. Functionally, this is implemented by six lines of
code, most of them in field_next().

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Fixes: 691bd854bf4a ("ss: Buffer raw fields first, then render them as a table")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
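
The memory figure follows from 10,000,000 sockets x about 1000 bytes, which is
about 10^10 bytes, i.e. 10GB. The bounded-buffer idea can be sketched as below.
This is a simplified Python model, not the ss.c code: it counts bytes rather
than allocated chunks, and every name in it is hypothetical.

```python
CHUNK_SIZE = 1 << 20   # 1M per chunk, as in the commit message
MAX_CHUNKS = 5         # flush once five chunks' worth is buffered


class RenderBuffer:
    """Buffer records and flush them whenever the size limit is reached."""

    def __init__(self, sink, chunk_size=CHUNK_SIZE, max_chunks=MAX_CHUNKS):
        self.sink = sink                       # callable receiving bytes
        self.limit = chunk_size * max_chunks   # upper bound on buffered bytes
        self.pending = []
        self.size = 0
        self.flushes = 0

    def add(self, record):
        self.pending.append(record)
        self.size += len(record)
        if self.size >= self.limit:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # Render what we have so far, then start over with an empty buffer.
        self.sink(b"".join(self.pending))
        self.pending.clear()
        self.size = 0
        self.flushes += 1
```

Since each flushed block is rendered on its own, column widths are computed
per block, which is where the slight misalignment shown above comes from.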
5 years ago ss: fix compilation under glibc < 2.18
Thomas De Schampheleire [Wed, 20 Feb 2019 14:41:51 +0000 (15:41 +0100)]
ss: fix compilation under glibc < 2.18

Commit c759116a0b2b6da8df9687b0a40ac69050132c77 introduced support for
AF_VSOCK. This define is only provided since glibc version 2.18, so
compilation fails when using older toolchains.

Provide the necessary definitions if needed.

Signed-off-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years ago uapi: update inet_diag_info.h
Stephen Hemminger [Thu, 21 Feb 2019 22:24:07 +0000 (14:24 -0800)]
uapi: update inet_diag_info.h

Upstream changes.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years ago bridge: make mcast_flood description consistent
Vivien Didelot [Wed, 20 Feb 2019 16:33:57 +0000 (11:33 -0500)]
bridge: make mcast_flood description consistent

This patch simply changes the description of the mcast_flood flag
with "flood" instead of "be flooded with" to avoid confusion, and be
consistent with the description of the flooding flag, which "Controls
whether a given port will *flood* unicast traffic for which there is
no FDB entry."

At the same time, fix the documentation for the "flood" flag which
is incorrectly described as "flooding on" or "flooding off".

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years ago devlink: relax dpipe table show dependency on resources
Jiri Pirko [Thu, 21 Feb 2019 10:55:56 +0000 (11:55 +0100)]
devlink: relax dpipe table show dependency on resources

Dpipe table show command has a dependency on getting resources.
If resource get command is not supported by the driver, dpipe table
show fails. However, resource is only additional information
in dpipe table show output. So relax the dependency and let
the dpipe tables be shown even if resources get command fails.

Fixes: ead180274caf ("devlink: Add support for resource/dpipe relation")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years ago ip-address: Use correct max attribute value in print_vf_stats64()
Phil Sutter [Thu, 21 Feb 2019 18:37:51 +0000 (19:37 +0100)]
ip-address: Use correct max attribute value in print_vf_stats64()

IFLA_VF_MAX is larger than the highest valid index in vf array.

Fixes: a1b99717c7cd7 ("Add displaying VF traffic statistics")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years ago ip-rule: fix json key "to_tbl" for unspecific rule action
Thomas Haller [Tue, 19 Feb 2019 20:50:19 +0000 (21:50 +0100)]
ip-rule: fix json key "to_tbl" for unspecific rule action

The key should not be called "to_tbl" precisely because it is
not an FR_ACT_TO_TBL action. Change it to "action".

    # ip rule add blackhole
    # ip -j rule | python -m json.tool
    ...
    {
        "priority": 0,
        "src": "all",
        "to_tbl": "blackhole"
    },

This is an API break of JSON output as it was added in v4.17.0.
Still change it as the API is relatively new and unstable.

Fixes: 0dd4ccc56c0e ("iprule: add json support")
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years ago ip route: get: allow zero-length subnet mask
Luca Boccassi [Thu, 14 Feb 2019 23:29:18 +0000 (23:29 +0000)]
ip route: get: allow zero-length subnet mask

A /0 subnet mask is theoretically valid, but ip route get doesn't allow
it:

$ ip route get 1.0.0.0/0
need at least a destination address

Change the check and remember whether we found an address or not, since
according to the documentation it's a mandatory parameter.

$ ip/ip route get 1.0.0.0/0
1.0.0.0 via 192.168.1.1 dev eth0 src 192.168.1.91 uid 1000
    cache

Reported-by: Clément Hertling <wxcafe@wxcafe.net>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years ago iplink: document XDP subcommand to force the XDP mode.
Matteo Croce [Wed, 13 Feb 2019 14:40:30 +0000 (15:40 +0100)]
iplink: document XDP subcommand to force the XDP mode.

When attaching an eBPF program to a device, ip link can force the XDP mode
by using the xdp{generic,drv,offload} keyword instead of just 'xdp'.
Document this behaviour also in the help output.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Fixes: 14683814 ("bpf: add xdpdrv for requesting XDP driver mode")
Fixes: 1b5e8094 ("bpf: allow requesting XDP HW offload")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years ago ss: add option --tos for requesting ipv4 tos and ipv6 tclass
Konstantin Khlebnikov [Wed, 13 Feb 2019 12:39:01 +0000 (15:39 +0300)]
ss: add option --tos for requesting ipv4 tos and ipv6 tclass

Also show the socket class_id/priority used by classful qdiscs.
The kernel reports this together with tclass since commit
("inet_diag: fix reporting cgroup classid and fallback to priority")

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years ago lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
Eric Dumazet [Wed, 13 Feb 2019 01:58:41 +0000 (17:58 -0800)]
lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()

In the past, we tried to increase the buffer size up to 32 KB in order
to reduce number of syscalls per dump.

Commit 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
brought the size back to 4KB because the kernel cannot know the application
is ready to receive bigger requests.

See kernel commits 9063e21fb026 ("netlink: autosize skb lengthes") and
d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()")
for more details.

Fixes: 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hangbin Liu <liuhangbin@gmail.com>
Cc: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
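
The sizing rule in the subject line amounts to taking a floor. A tiny Python
sketch, assuming the caller first learns the pending message length (for
example by peeking) and then allocates; the names are illustrative only:

```python
MIN_RECV_BUF = 32 * 1024   # 32KB floor named in the commit subject


def recv_buf_size(peeked_len):
    """Never allocate less than 32KB, but grow for larger messages."""
    return max(peeked_len, MIN_RECV_BUF)
```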
5 years ago use print_{,h}hu instead of print_uint when format specifier is %{,h}hu
Davide Caratti [Thu, 7 Feb 2019 10:51:27 +0000 (11:51 +0100)]
use print_{,h}hu instead of print_uint when format specifier is %{,h}hu

in this way, a useless cast to unsigned int is avoided in bpf_print_ops()
and print_tunnel().

Tested with:
 # ./tdc.py -c bpf

Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years ago tc: use bits not mbits/sec in rate percent
Marcos Antonio Moraes [Thu, 7 Feb 2019 15:29:54 +0000 (13:29 -0200)]
tc: use bits not mbits/sec in rate percent

As /sys/class/net/<iface>/speed indicates a value in Mbits/sec, the
conversion is necessary to create the correct limits.

This guarantees the same result for the following commands on a
1000Mbit/sec device:

tc class add ... htb rate 500Mbit
tc class add ... htb rate 50%

Fixes: 927e3cfb52b5 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Marcos Antonio Moraes <marcos.antonio@digirati.com.br>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
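
The unit conversion can be checked with a short calculation. The Python sketch
below is an illustration and not the tc code; the function name is made up,
and it only assumes what the message states, namely that the sysfs speed value
is in Mbits/sec.

```python
BITS_PER_MBIT = 1_000_000


def percent_to_bits_per_sec(percent, speed_mbit):
    """Convert a percentage of the link speed (in Mbit/s) to bits/s."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return int(speed_mbit * BITS_PER_MBIT * percent / 100)


# 50% of a 1000Mbit/sec device should equal "rate 500Mbit".
half_of_gigabit = percent_to_bits_per_sec(50, 1000)
```

Without the multiplication by 1,000,000 the same input would yield 500, which
is off by six orders of magnitude from the intended 500Mbit.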
5 years ago tc: avoid problems with hard coded rate string length
Stephen Hemminger [Wed, 6 Feb 2019 18:49:47 +0000 (10:49 -0800)]
tc: avoid problems with hard coded rate string length

The parse_percent_rate function assumed the buffer was 20 characters.
Better to pass length in case the size ever changes.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years ago tc: fix memory leak in error path
Stephen Hemminger [Wed, 6 Feb 2019 18:41:58 +0000 (10:41 -0800)]
tc: fix memory leak in error path

If value passed to parse_percent was not valid, it would
leak the dynamic allocation from sscanf.

Fixes: 927e3cfb52b5 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years ago devlink: add info subcommand
Jakub Kicinski [Mon, 4 Feb 2019 16:10:11 +0000 (08:10 -0800)]
devlink: add info subcommand

Add support for reading the device serial number, driver name
and various versions.  Example:

$ devlink dev info pci/0000:82:00.0
pci/0000:82:00.0:
  driver nfp
  serial_number 16240145
  versions:
      fixed:
        board.id AMDA0081-0001
        board.rev 15
        board.vendor SMA
        board.model hydrogen
      running:
        fw.mgmt 010181.010181.0101d4
        fw.cpld 0x1030000
        fw.app abm-d372b6
        fw.undi 0.0.2
        chip.init AMDA-0081-0001  20160318164536
      stored:
        fw.mgmt 010181.010181.0101d4
        fw.app abm-d372b6
        fw.undi 0.0.2
        chip.init AMDA-0081-0001  20160318164536

$ devlink -jp dev info pci/0000:82:00.0
{
    "info": {
        "pci/0000:82:00.0": {
            "driver": "nfp",
            "serial_number": "16240145",
            "versions": {
                "fixed": {
                    "board.id": "AMDA0081-0001",
                    "board.rev": "15",
                    "board.vendor": "SMA",
                    "board.model": "hydrogen"
                },
                "running": {
                    "fw.mgmt": "010181.010181.0101d4",
                    "fw.cpld": "0x1030000",
                    "fw.app": "abm-d372b6",
                    "fw.undi": "0.0.2",
                    "chip.init": "AMDA-0081-0001  20160318164536"
                },
                "stored": {
                    "fw.mgmt": "010181.010181.0101d4",
                    "fw.app": "abm-d372b6",
                    "fw.undi": "0.0.2",
                    "chip.init": "AMDA-0081-0001  20160318164536"
                }
            }
        }
    }
}
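
A script consuming the JSON form might look like the sketch below. The helper name running_versions is hypothetical, and SAMPLE is a trimmed copy of the output above rather than a complete dump.

```python
import json

# Trimmed copy of the JSON output shown above.
SAMPLE = """
{
    "info": {
        "pci/0000:82:00.0": {
            "driver": "nfp",
            "serial_number": "16240145",
            "versions": {
                "fixed": {"board.id": "AMDA0081-0001"},
                "running": {"fw.app": "abm-d372b6", "fw.undi": "0.0.2"},
                "stored": {"fw.app": "abm-d372b6", "fw.undi": "0.0.2"}
            }
        }
    }
}
"""

def running_versions(doc, dev):
    """Return the 'running' version map for one device, or {} if absent."""
    return doc["info"][dev]["versions"].get("running", {})

doc = json.loads(SAMPLE)
versions = running_versions(doc, "pci/0000:82:00.0")
```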

v5:
 - remove spurious new line.
v4:
 - more commit message improvements.
v3:
 - show up-to-date output in the commit message.
v2 (Jiri):
 - remove filtering;
 - add example in the commit message.
RFCv2:
 - make info subcommand of dev.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agodevlink: report cell size
Jakub Kicinski [Mon, 4 Feb 2019 15:28:59 +0000 (07:28 -0800)]
devlink: report cell size

Print the value of DEVLINK_ATTR_SB_POOL_CELL_SIZE, if reported.

Example:
pci/0000:82:00.0:
  sb 1 pool 0 type egress size 40945664 thtype static cell_size 2048
  sb 2 pool 0 type egress size 258867200 thtype static cell_size 10240
...
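
As a rough illustration of how the new field can be used, the sketch below derives the number of cells in a pool from one line of the output. It assumes that size and cell_size are both expressed in bytes; the helper name pool_cells is hypothetical.

```python
def pool_cells(line):
    """Parse 'key value' pairs from one pool line and return size / cell_size."""
    tokens = line.split()
    fields = dict(zip(tokens[0::2], tokens[1::2]))  # sb->1, pool->0, ...
    return int(fields["size"]) // int(fields["cell_size"])

line = "sb 1 pool 0 type egress size 40945664 thtype static cell_size 2048"
cells = pool_cells(line)  # 40945664 / 2048
```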

v3: - don't double space.
v2: - fix spelling.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agoUpdate kernel headers
David Ahern [Wed, 6 Feb 2019 16:45:41 +0000 (08:45 -0800)]
Update kernel headers

Update kernel headers to commit:
bfbae2eafe05 ("Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue")

Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agobpf: add btf func and func_proto kind support
Yonghong Song [Fri, 25 Jan 2019 00:41:07 +0000 (16:41 -0800)]
bpf: add btf func and func_proto kind support

The issue was discovered with the bpf selftest test_skb_cgroup_id.sh.
Currently we have,
  $ ./test_skb_cgroup_id.sh
  Wait for testing link-local IP to become available ... OK
  Object has unknown BTF type: 13!
  [PASS]

In the above the BTF type 13 refers to BTF kind
BTF_KIND_FUNC_PROTO.
This patch added support of BTF_KIND_FUNC_PROTO and
BTF_KIND_FUNC during type parsing.
With this patch, I got
  $ ./test_skb_cgroup_id.sh
  Wait for testing link-local IP to become available ... OK
  [PASS]

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years agobridge: fdb: Fix FDB dump with strict checking disabled
Ido Schimmel [Fri, 25 Jan 2019 17:09:17 +0000 (17:09 +0000)]
bridge: fdb: Fix FDB dump with strict checking disabled

While iproute2 correctly uses ifinfomsg struct as the ancillary header
when requesting an FDB dump on old kernels, it sets the message type to
RTM_GETLINK. This results in a wrong reply being returned.

Fix this by using RTM_GETNEIGH instead.

Before:
$ bridge fdb show brport dummy0
Not RTM_NEWNEIGH: 00000158 00000010 00000002

After:
$ bridge fdb show brport dummy0
2a:0b:41:1c:92:d3 vlan 1 master br0 permanent
2a:0b:41:1c:92:d3 master br0 permanent
33:33:00:00:00:01 self permanent
01:00:5e:00:00:01 self permanent

Fixes: 05880354c2cf ("bridge: fdb: Fix filtering with strict checking disabled")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: LiLiang <liali@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years agolibnetlink: linkdump_req: AF_PACKET family also expects ext_filter_mask
Chris Mi [Fri, 25 Jan 2019 10:37:07 +0000 (10:37 +0000)]
libnetlink: linkdump_req: AF_PACKET family also expects ext_filter_mask

Without this fix, the VF info can't be shown using the command
"ip link".

146: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 24:8a:07:ad:78:52 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 02:25:d0:12:01:01, spoof checking off, link-state auto, trust off, query_rss off
    vf 1 MAC 02:25:d0:12:01:02, spoof checking off, link-state auto, trust off, query_rss off

Fixes: d97b16b2c906 ("libnetlink: linkdump_req: Only AF_UNSPEC family expects an ext_filter_mask")
Signed-off-by: Chris Mi <chrism@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years agotc: add 'kind' property to 'csum' action
Davide Caratti [Thu, 31 Jan 2019 17:58:41 +0000 (18:58 +0100)]
tc: add 'kind' property to 'csum' action

unlike other TC actions already supporting JSON printout, 'csum' does not
print the value of TCA_KIND in the 'kind' property: remove 'csum' word
from 'csum' property, and add a separate 'kind' property containing the
action name. The human-readable printout is preserved.

Tested with:
 # ./tdc.py -c csum

Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agotc: full JSON support for 'bpf' actions
Davide Caratti [Thu, 31 Jan 2019 17:58:09 +0000 (18:58 +0100)]
tc: full JSON support for 'bpf' actions

Add full JSON output support in the dump of 'act_bpf'.

Example using eBPF:

 # tc actions flush action bpf
 # tc action add action bpf object bpf/action.o section 'action-ok'
 # tc -j action list action bpf | jq
 [
   {
     "total acts": 1
   },
   {
     "actions": [
       {
         "order": 0,
         "kind": "bpf",
         "bpf_name": "action.o:[action-ok]",
         "prog": {
           "id": 33,
           "tag": "a04f5eef06a7f555",
           "jited": 1
         },
         "control_action": {
           "type": "pipe"
         },
         "index": 1,
         "ref": 1,
         "bind": 0
       }
     ]
   }
 ]

Example using cBPF:

 # tc actions flush action bpf
 # a=$(mktemp)
 # tcpdump -ddd not ether proto 0x888e >$a
 # tc action add action bpf bytecode-file $a index 42
 # rm $a
 # tc -j action list action bpf | jq
 [
   {
     "total acts": 1
   },
   {
     "actions": [
       {
         "order": 0,
         "kind": "bpf",
         "bytecode": {
           "length": 4,
           "insns": [
             {
               "code": 40,
               "jt": 0,
               "jf": 0,
               "k": 12
             },
             {
               "code": 21,
               "jt": 0,
               "jf": 1,
               "k": 34958
             },
             {
               "code": 6,
               "jt": 0,
               "jf": 0,
               "k": 0
             },
             {
               "code": 6,
               "jt": 0,
               "jf": 0,
               "k": 262144
             }
           ]
         },
         "control_action": {
           "type": "pipe"
         },
         "index": 42,
         "ref": 1,
         "bind": 0
       }
     ]
   }
 ]
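
The "insns" array can be post-processed by a script. The sketch below is an illustration only: it decodes the instruction class from the low three bits of each opcode, following the classic BPF encoding, and prints k in hex. The helper name describe_insns is hypothetical.

```python
# The instruction class lives in the low three bits of a classic BPF opcode.
BPF_CLASS_NAMES = {
    0x00: "ld", 0x01: "ldx", 0x02: "st", 0x03: "stx",
    0x04: "alu", 0x05: "jmp", 0x06: "ret", 0x07: "misc",
}

def describe_insns(insns):
    """Return (class name, k in hex) for each instruction dict."""
    return [(BPF_CLASS_NAMES[i["code"] & 0x07], hex(i["k"])) for i in insns]

# The four instructions from the cBPF dump above.
insns = [
    {"code": 40, "jt": 0, "jf": 0, "k": 12},
    {"code": 21, "jt": 0, "jf": 1, "k": 34958},
    {"code": 6, "jt": 0, "jf": 0, "k": 0},
    {"code": 6, "jt": 0, "jf": 0, "k": 262144},
]
decoded = describe_insns(insns)
```

Read this way, the program loads the half-word at offset 12, compares it with 0x888e (the value given to tcpdump), and returns either 0 or 0x40000.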

Tested with:
 # ./tdc.py -c bpf

Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agoss: add AF_XDP support
Björn Töpel [Wed, 30 Jan 2019 06:57:32 +0000 (07:57 +0100)]
ss: add AF_XDP support

AF_XDP is an address family that is optimized for high performance
packet processing.

This patch adds AF_XDP support to ss(8) so that sockets can be queried
and monitored.

Example:
$ sudo ss --xdp -e -p -m
Recv-Q      Send-Q           Local Address:Port             Peer Address:Port

0           0                   enp134s0f0:q20                          *
 users:(("xdpsock",pid=17787,fd=3)) ino:39424 sk:4
        rx(entries:2048)
        tx(entries:2048)
        umem(id:1,size:8388608,num_pages:2048,chunk_size:2048,headroom:0,ifindex:7,
qid:20,zc:0,refs:1)
        fr(entries:2048)
        cr(entries:2048) skmem:(r0,rb212992,t0,tb212992,f0,w0,o0,bl0,d0)
0           0                    enp24s0f0:q0                           *
 users:(("xdpsock",pid=17780,fd=3)) ino:37384 sk:5
        rx(entries:2048)
        tx(entries:2048)
        umem(id:0,size:8388608,num_pages:2048,chunk_size:2048,headroom:0,ifindex:6,
qid:0,zc:1,refs:1)
        fr(entries:2048)
        cr(entries:2048) skmem:(r0,rb212992,t0,tb212992,f0,w0,o0,bl0,d0)

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agoUpdate kernel headers and add xdp_diag.h
David Ahern [Wed, 30 Jan 2019 02:34:40 +0000 (18:34 -0800)]
Update kernel headers and add xdp_diag.h

Update kernel headers to commit:
c829f5f52db9 ("cxgb4: cxgb4_tc_u32: use struct_size() in kvzalloc()")

and import xdp_diag.h for the next patch.

Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agonetns: add subcommand to attach an existing network namespace
Matteo Croce [Tue, 29 Jan 2019 15:01:15 +0000 (16:01 +0100)]
netns: add subcommand to attach an existing network namespace

ip tracks namespaces with dummy files in /var/run/netns/, but can't see
namespaces created with other tools.
Creating the dummy file and bind mounting the correct procfs entry will
make ip aware of that namespace.
Add an ip netns subcommand to automate this task.
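
The manual steps being automated can be sketched as below. This is an assumption-laden illustration, not the ip source: the helper name attach_commands is hypothetical, and the procfs entry is assumed to be /proc/<pid>/ns/net for a process living in the target namespace. The function only builds the commands; running them requires root.

```python
import os

NETNS_RUN_DIR = "/var/run/netns"

def attach_commands(name, pid):
    """Return the commands that make an existing namespace visible to ip."""
    source = "/proc/%d/ns/net" % pid             # namespace of a running process
    target = os.path.join(NETNS_RUN_DIR, name)   # dummy file tracked by ip
    return [
        ["touch", target],                       # create the dummy file
        ["mount", "--bind", source, target],     # bind mount the procfs entry
    ]

commands = attach_commands("blue", 1234)
```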

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Reviewed-by: Andrea Claudi <aclaudi@redhat.com>
Tested-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agotc: replace left side comparison
Stephen Hemminger [Thu, 24 Jan 2019 20:30:14 +0000 (09:30 +1300)]
tc: replace left side comparison

The kernel (and iproute2) don't use the if (NULL == x) style
and instead prefer if (!x)

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agof_flower: fix build with musl libc
Hans Dedecker [Wed, 23 Jan 2019 21:02:31 +0000 (22:02 +0100)]
f_flower: fix build with musl libc

XATTR_SIZE_MAX requires the usage of linux/limits.h; let's include it

Signed-off-by: Hans Dedecker <dedeckeh@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years agoiproute: Set ip/ip6 lwtunnel flags
wenxu [Wed, 2 Jan 2019 03:57:00 +0000 (11:57 +0800)]
iproute: Set ip/ip6 lwtunnel flags

ip l add dev tun type gretap external
ip r a 10.0.0.1 encap ip dst 192.168.152.171 id 1000 dev gretap

In the gretap example above, the command sets the id but does not set
the TUNNEL_KEY flag, so there is no key field in the sent packet.

Users can now set the flags with key, csum and seq:
ip r a 10.0.0.1 encap ip dst 192.168.152.171 id 1000 key csum dev gretap

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years agoMerge 'iproute2-master' into iproute2-next
David Ahern [Tue, 22 Jan 2019 16:30:38 +0000 (08:30 -0800)]
Merge 'iproute2-master' into iproute2-next

Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agoip route: get: only set RTM_F_LOOKUP_TABLE flag for IPv4
Jakub Kicinski [Sat, 12 Jan 2019 20:54:06 +0000 (12:54 -0800)]
ip route: get: only set RTM_F_LOOKUP_TABLE flag for IPv4

Kernel ignores the RTM_F_LOOKUP_TABLE flag for all families
but IPv4.  Don't set it, otherwise it may fall foul of
strict checking policies.
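
The resulting logic amounts to a per-family check, sketched here in Python rather than C. The helper name route_get_flags is hypothetical; the flag value is the one defined in linux/rtnetlink.h.

```python
import socket

RTM_F_LOOKUP_TABLE = 0x1000  # from linux/rtnetlink.h

def route_get_flags(family, flags=0):
    """Set RTM_F_LOOKUP_TABLE only for IPv4 route get requests."""
    if family == socket.AF_INET:
        flags |= RTM_F_LOOKUP_TABLE
    return flags
```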

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years agotc: m_tunnel_key: Allow key-less tunnels
Adi Nissim [Thu, 10 Jan 2019 13:03:50 +0000 (15:03 +0200)]
tc: m_tunnel_key: Allow key-less tunnels

Change the id parameter of the tunnel_key set action from mandatory to
optional.

Some tunneling protocols (e.g. GRE) specify the id as an optional field.

Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years agouapi: in.h change
Stephen Hemminger [Tue, 22 Jan 2019 03:03:31 +0000 (16:03 +1300)]
uapi: in.h change

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years agoxfrm: add option to hide keys in state output
Benedict Wong [Fri, 18 Jan 2019 19:12:17 +0000 (11:12 -0800)]
xfrm: add option to hide keys in state output

ip xfrm state show currently dumps keys unconditionally. This limits its
use in logging, as security information can be leaked.

This patch adds a nokeys option to ip xfrm ( state show | monitor ), which
prevents the printing of keys. This allows ip xfrm state show to be used
in logging without exposing keys.

Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agotc: add hit counter for matchall
Cong Wang [Thu, 17 Jan 2019 21:18:55 +0000 (13:18 -0800)]
tc: add hit counter for matchall

Cc: Martin Olsson <martin.olsson+netdev@sentorsecurity.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agoUpdate kernel headers
David Ahern [Mon, 21 Jan 2019 16:29:26 +0000 (08:29 -0800)]
Update kernel headers

Update kernel headers to commit
28f9d1a3d4fe ("Merge branch 'mlxsw-spectrum_router-Add-GRE-tunnel-support-for-Spectrum-2'")

Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agordma: Add unbound workqueue to list of poll context types
Leon Romanovsky [Thu, 17 Jan 2019 15:14:43 +0000 (17:14 +0200)]
rdma: Add unbound workqueue to list of poll context types

Kernel commit f794809a7259 ("IB/core: Add an unbound WQ type to the new CQ API")
added a new CQ poll context type; reflect this change in rdmatool.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agoclang-format: add configuration file
Leon Romanovsky [Thu, 17 Jan 2019 15:08:01 +0000 (17:08 +0200)]
clang-format: add configuration file

The codebase of iproute2 follows Linux kernel coding style,
so it will be very helpful to reuse existing clang configuration
file to reliably format code.

For more information see kernel commit d4ef8d3ff005
("clang-format: add configuration file").

Updated up to commit v5.0-rc1 with a small number of ForEachMacros.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agoMakefile: check manpages for syntax errors
Luca Boccassi [Sat, 12 Jan 2019 12:28:56 +0000 (12:28 +0000)]
Makefile: check manpages for syntax errors

Pass the same parameters Lintian uses in Debian.

$ make check
<...>
Checking manpages for syntax errors...
<standard input>:48: warning: macro `Q' not defined
Error in tc-taprio.8
Makefile:27: recipe for target 'check' failed

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years agoman: tc-taprio.8: fix syntax error
Luca Boccassi [Sat, 12 Jan 2019 12:28:55 +0000 (12:28 +0000)]
man: tc-taprio.8: fix syntax error

.Q does not exist so groff complains and the "queues" word is actually
not displayed.

Fixes: 579acb4bc52f ("taprio: Add manpage for tc-taprio(8)")
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years agoman: ss.8: more line breaks
Luca Boccassi [Sat, 12 Jan 2019 12:28:54 +0000 (12:28 +0000)]
man: ss.8: more line breaks

groff still complains about unbreakable lines:
  96: warning [p 2, 3.0i]: can't break line

Indent it some more.

Fixes: 7f5047524c99 ("man: ss.8: break and indent long line")
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years agoMerge branch 'master' into next
David Ahern [Tue, 8 Jan 2019 00:30:13 +0000 (16:30 -0800)]
Merge branch 'master' into next

Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agoconfigure: fix typo in check_xt_old_internal_h
Dmitry V. Levin [Mon, 7 Jan 2019 22:37:15 +0000 (01:37 +0300)]
configure: fix typo in check_xt_old_internal_h

Fixes: 377a09902a57 ("configure: Minor code cleanup")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years agordma: update uapi headers
Stephen Hemminger [Mon, 7 Jan 2019 19:41:39 +0000 (11:41 -0800)]
rdma: update uapi headers

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years agouapi: update headers from 4.21-rc1
Stephen Hemminger [Mon, 7 Jan 2019 19:39:26 +0000 (11:39 -0800)]
uapi: update headers from 4.21-rc1

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
5 years agoMerge ../iproute2-next
Stephen Hemminger [Mon, 7 Jan 2019 19:36:41 +0000 (11:36 -0800)]
Merge ../iproute2-next

5 years agov4.20.0
Stephen Hemminger [Mon, 7 Jan 2019 18:24:02 +0000 (10:24 -0800)]
v4.20.0

5 years agoipneigh: print dst for AF_BRIDGE
Tobias Jungel [Sat, 5 Jan 2019 12:36:43 +0000 (13:36 +0100)]
ipneigh: print dst for AF_BRIDGE

In case a neighbour message is of family AF_BRIDGE the NDA_DST attribute
was not printed so far. With this patch the family is evaluated to pass
the correct family to format_host_rta.

Signed-off-by: Tobias Jungel <tobias.jungel@bisdn.de>
5 years agolibnetlink: linkdump_req is done for AF_BRIDGE as well
David Ahern [Mon, 7 Jan 2019 00:17:13 +0000 (16:17 -0800)]
libnetlink: linkdump_req is done for AF_BRIDGE as well

The bridge command 'vlan show' calls rtnl_linkdump_req_filter for
family AF_BRIDGE. Update rtnl_linkdump_req_filter to send the filter
for that family as well.

Fixes: d97b16b2c906 ("libnetlink: linkdump_req: Only AF_UNSPEC family expects an ext_filter_mask")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
5 years agoMerge branch 'iproute2-master' into iproute2-next
David Ahern [Fri, 4 Jan 2019 20:22:47 +0000 (12:22 -0800)]
Merge branch 'iproute2-master' into iproute2-next

Conflicts:
ip/iprule.c

Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agoMerge branch 'strict-updates' into iproute2-next
David Ahern [Fri, 4 Jan 2019 20:19:37 +0000 (12:19 -0800)]
Merge branch 'strict-updates' into iproute2-next

Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agobridge: fdb: Fix filtering with strict checking disabled
David Ahern [Thu, 3 Jan 2019 00:33:42 +0000 (16:33 -0800)]
bridge: fdb: Fix filtering with strict checking disabled

Older kernels expect an ifinfomsg struct as the ancillary header, and
after kernel commit bd961c9bc664 ("rtnetlink: fix rtnl_fdb_dump() for ndmsg
header") can handle either ifinfomsg or ndmsg. Strict data checking only
allows ndmsg.

Use the new RTNL_HANDLE_F_STRICT_CHK flag to know which header to send.
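
The header selection can be modelled as below. This is a Python sketch, not the bridge source: the helper name fdb_dump_header is hypothetical, the numeric value of RTNL_HANDLE_F_STRICT_CHK is assumed for illustration, and the struct layouts follow the kernel uapi definitions of ifinfomsg and ndmsg.

```python
import struct

RTNL_HANDLE_F_STRICT_CHK = 0x04  # assumed value, for illustration only
AF_BRIDGE = 7

def fdb_dump_header(rth_flags, ifindex=0):
    """Build the ancillary header for an FDB dump request."""
    if rth_flags & RTNL_HANDLE_F_STRICT_CHK:
        # struct ndmsg: family, pad1, pad2, ifindex, state, flags, type
        return struct.pack("=BBHiHBB", AF_BRIDGE, 0, 0, ifindex, 0, 0, 0)
    # struct ifinfomsg: family, pad, type, index, flags, change
    return struct.pack("=BBHiII", AF_BRIDGE, 0, 0, ifindex, 0, 0)
```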

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
5 years agolibnetlink: Add RTNL_HANDLE_F_STRICT_CHK flag
David Ahern [Thu, 3 Jan 2019 00:31:38 +0000 (16:31 -0800)]
libnetlink: Add RTNL_HANDLE_F_STRICT_CHK flag

Add RTNL_HANDLE_F_STRICT_CHK flag and set it in rth flags to let
commands know if the kernel supports strict checking.

Extracted from patch from Ido to fix filtering with strict checking
enabled.

Cc: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agobridge: Update fdb show to use rtnl_neighdump_req
David Ahern [Mon, 31 Dec 2018 18:00:24 +0000 (10:00 -0800)]
bridge: Update fdb show to use rtnl_neighdump_req

Add fdb_dump_filter to set filter attributes in dump request
and convert fdb_show to use rtnl_neighdump_req.

Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agoip neigh: Convert do_show_or_flush to use rtnl_neighdump_req
David Ahern [Mon, 31 Dec 2018 17:55:45 +0000 (09:55 -0800)]
ip neigh: Convert do_show_or_flush to use rtnl_neighdump_req

Add ipneigh_dump_filter to add filter attributes to the neighbor
dump request and update do_show_or_flush to use rtnl_neighdump_req.

Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agolibnetlink: Add filter function to rtnl_neighdump_req
David Ahern [Mon, 31 Dec 2018 17:54:47 +0000 (09:54 -0800)]
libnetlink: Add filter function to rtnl_neighdump_req

Add filter function to rtnl_neighdump_req and a buffer to the
request for the filter functions to append attributes.

Signed-off-by: David Ahern <dsahern@gmail.com>
5 years agordma: Fix incorrectly handled NLA validation
Leon Romanovsky [Sun, 30 Dec 2018 13:34:09 +0000 (15:34 +0200)]
rdma: Fix incorrectly handled NLA validation

mnl_attr_type_valid() receives the maximum attribute type, which means that
we were supposed to supply the last valid netlink attribute and not
the number of attributes. This coding mistake caused failures when
NLA attributes were extended.
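
The off-by-one can be modelled in a few lines. This is a Python stand-in for the check, not libmnl itself: attr_type_valid mirrors only the "type must not exceed max" rule, and ATTR_COUNT is a made-up number of attributes.

```python
ATTR_COUNT = 5  # hypothetical number of attributes; valid types are 0..4

def attr_type_valid(attr_type, max_type):
    """Model of the check: a type is accepted when it does not exceed max."""
    return attr_type <= max_type

table = [None] * ATTR_COUNT  # one slot per known attribute type

# Passing the count as the maximum accepts a type with no slot in the table.
accepted_with_count = attr_type_valid(ATTR_COUNT, ATTR_COUNT)
# Passing the last valid type rejects it, as intended.
accepted_with_max = attr_type_valid(ATTR_COUNT, ATTR_COUNT - 1)
```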

Fixes: 74bd75c2b68d ("rdma: Add basic infrastructure for RDMA tool")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years agoiprule: Add tun_id field in the selector
wenxu [Mon, 24 Dec 2018 08:49:44 +0000 (16:49 +0800)]
iprule: Add tun_id field in the selector

ip rule add from all iif gretap tun_id 2000 lookup 200

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years agonstat: fix load_ugly_table() limits
Eric Dumazet [Sat, 22 Dec 2018 06:53:35 +0000 (22:53 -0800)]
nstat: fix load_ugly_table() limits

A recent change reduced max line length from 4096 to 2048 bytes,
but we already have lines above the 2048 threshold, and we keep
adding more SNMP counters in linux.

Switch to getline() and do not worry about future kernel changes.
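
The effect of a fixed line limit can be reproduced with a small model. The sketch below uses Python file objects in place of the C stdio calls, so it is an analogy rather than the nstat code: read_bounded mimics a reader with a fixed buffer, while read_unbounded plays the role of getline().

```python
import io

def read_bounded(stream, limit):
    """Read with a fixed buffer: at most limit - 1 characters per call."""
    chunks = []
    while True:
        chunk = stream.readline(limit - 1)
        if not chunk:
            break
        chunks.append(chunk)
    return chunks

def read_unbounded(stream):
    """Read whole lines regardless of their length."""
    return list(stream)

long_line = "x" * 3000 + "\n"  # longer than the 2048-byte limit
bounded = read_bounded(io.StringIO(long_line), 2048)
unbounded = read_unbounded(io.StringIO(long_line))
```

With the bounded reader the single long line arrives in two pieces, which a parser expecting one record per line would misread.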

Fixes: da8034a01904 ("misc: avoid snprintf warnings in ss and nstat")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 years agobridge: fdb: Use 'struct ndmsg' for FDB dumping
Ido Schimmel [Sun, 30 Dec 2018 17:14:54 +0000 (17:14 +0000)]
bridge: fdb: Use 'struct ndmsg' for FDB dumping

Since commit aea41afcfd6d ("ip bridge: Set NETLINK_GET_STRICT_CHK on
socket") iproute2 uses strict checking on kernels that support it. This
causes FDB dumping to fail [1], as iproute2 uses 'struct ifinfomsg'
whereas the kernel expects 'struct ndmsg'.

Note that with this change iproute2 continues to work on old kernels
that do not support strict checking, but contain the fix introduced in
kernel commit bd961c9bc664 ("rtnetlink: fix rtnl_fdb_dump() for ndmsg
header").

[1]
# bridge fdb show
[ 5365.137224] netlink: 4 bytes leftover after parsing attributes in process `bridge'.
Error: bytes leftover after parsing attributes.
Dump terminated
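
One plausible reading of the "4 bytes leftover" message is the size difference between the two headers. The sketch below computes both sizes from the kernel uapi layouts of ifinfomsg and ndmsg; treat the link to the error message as an inference, not something the commit states.

```python
import struct

# struct ifinfomsg: family, pad, type, index, flags, change
IFINFOMSG_LEN = struct.calcsize("=BBHiII")
# struct ndmsg: family, pad1, pad2, ifindex, state, flags, type
NDMSG_LEN = struct.calcsize("=BBHiHBB")

# Bytes left over when the kernel parses an ifinfomsg as an ndmsg.
leftover = IFINFOMSG_LEN - NDMSG_LEN
```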

Fixes: aea41afcfd6d ("ip bridge: Set NETLINK_GET_STRICT_CHK on socket")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>