git.proxmox.com Git - mirror_iproute2.git/log
mirror_iproute2.git
6 years agoMerge branch 'master' into net-next
Stephen Hemminger [Mon, 13 Nov 2017 18:35:17 +0000 (10:35 -0800)]
Merge branch 'master' into net-next

6 years agov4.14.1
Stephen Hemminger [Mon, 13 Nov 2017 18:09:57 +0000 (10:09 -0800)]
v4.14.1

6 years agoutils: remove duplicate include of ctype.h
Stephen Hemminger [Mon, 13 Nov 2017 18:08:39 +0000 (10:08 -0800)]
utils: remove duplicate include of ctype.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoip: Fix compilation break on old systems
Leon Romanovsky [Mon, 13 Nov 2017 10:21:19 +0000 (12:21 +0200)]
ip: Fix compilation break on old systems

As reported in [1], iproute2 fails to compile on old systems. In Cong's
case it was Fedora 19; in our case it was RedHat 7.2, which failed with
the following errors during compilation:

ipxfrm.c: In function ‘xfrm_selector_print’:
ipxfrm.c:479:7: error: ‘IPPROTO_MH’ undeclared (first use in this
function)
  case IPPROTO_MH:
       ^
ipxfrm.c:479:7: note: each undeclared identifier is reported only once
for each function it appears in
ipxfrm.c: In function ‘xfrm_selector_upspec_parse’:
ipxfrm.c:1345:8: error: ‘IPPROTO_MH’ undeclared (first use in this
function)
   case IPPROTO_MH:
        ^
make[1]: *** [ipxfrm.o] Error 1

The reason is the order of the header files. IPPROTO_MH is defined in the
kernel's UAPI header file (in6.h), but only if __UAPI_DEF_IPPROTO_V6 is
set beforehand. That define comes from another kernel header file
(libc-compat.h) and is set only when no relevant libc declarations have
been seen earlier.

In the ip code, including <netdb.h> indirectly includes <netinet/in.h>,
which sets __UAPI_DEF_IPPROTO_V6 to zero and so prevents the IPPROTO_MH
declaration.

This patch takes the simplest possible approach to fix the compilation
error by checking if IPPROTO_MH was defined before and in case it
wasn't, it defines it to be the same as in the kernel.
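
The approach described above can be sketched as follows. This is a minimal
illustration, not the patch itself: the helper is_mobility_header() is a
hypothetical name added for the example, and 135 is the IANA protocol number
of the IPv6 Mobility Header, which is also the value in the kernel's in6.h.

```c
#include <netinet/in.h>

/*
 * Fallback for old libc headers that do not provide IPPROTO_MH.
 * 135 is the protocol number of the IPv6 Mobility Header.
 */
#ifndef IPPROTO_MH
#define IPPROTO_MH 135
#endif

/* Hypothetical helper: true if proto is the Mobility Header protocol. */
int is_mobility_header(int proto)
{
    return proto == IPPROTO_MH;
}
```

Because the guard only defines the macro when it is missing, the code builds
the same way whether or not the system headers already declare IPPROTO_MH.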

[1] https://www.spinics.net/lists/netdev/msg463980.html

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Riad Abo Raed <riada@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoMerge branch 'master' into net-next
Stephen Hemminger [Mon, 13 Nov 2017 00:30:14 +0000 (16:30 -0800)]
Merge branch 'master' into net-next

6 years agov4.14.0
Stephen Hemminger [Mon, 13 Nov 2017 00:29:43 +0000 (16:29 -0800)]
v4.14.0

6 years agodrop unneeded include of syslog.h
Stephen Hemminger [Mon, 13 Nov 2017 00:22:12 +0000 (16:22 -0800)]
drop unneeded include of syslog.h

Only arpd uses syslog

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoMerge branch 'master' into net-next
Stephen Hemminger [Mon, 13 Nov 2017 00:17:37 +0000 (16:17 -0800)]
Merge branch 'master' into net-next

6 years agodevlink: add batch command support
Ivan Vecera [Fri, 10 Nov 2017 06:20:14 +0000 (07:20 +0100)]
devlink: add batch command support

The patch adds support to batch devlink commands.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
6 years agolib: make resolve_hosts variable common
Ivan Vecera [Fri, 10 Nov 2017 06:20:13 +0000 (07:20 +0100)]
lib: make resolve_hosts variable common

Any iproute utility that uses any function from lib/utils.c needs
to declare its own resolve_hosts variable instance even if it does
not use hostname resolution (currently only the 'ip' and 'ss'
commands use it).
The patch declares a single common instance of resolve_hosts directly
in utils.c so the existing ones can be removed (the same approach
that is used for timestamp_short).

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
6 years agoupdate kernel headers from 4.14 net-next
Stephen Hemminger [Sun, 12 Nov 2017 23:58:11 +0000 (15:58 -0800)]
update kernel headers from 4.14 net-next

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agotc: distinguish Add/Replace qdisc operations
Roman Mashak [Thu, 26 Oct 2017 21:30:08 +0000 (17:30 -0400)]
tc: distinguish Add/Replace qdisc operations

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
6 years agoupdate kernel headers
Stephen Hemminger [Sun, 12 Nov 2017 23:55:49 +0000 (15:55 -0800)]
update kernel headers

To 4.14 final kernel version
Note: SPDX tag added by upstream

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoman: Clarify idleslope calculation for tc-cbs
Jesus Sanchez-Palencia [Fri, 10 Nov 2017 22:34:36 +0000 (14:34 -0800)]
man: Clarify idleslope calculation for tc-cbs

In order to calculate the idleSlope parameter of CBS correctly, users
must take into account the entire packet size, including the overhead
from all layers.

Add some more details to the man page to clarify that, giving one
simple example and pointing users to the correct 802.1Q section for
further clarifications if needed.
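
The calculation described above can be sketched in a few lines. This is an
illustrative sketch only: the function names are hypothetical, rates are in
kbit/s with 1 kbit = 1000 bits, and the relation
sendslope = idleslope - port rate is the one given by the 802.1Q credit-based
shaper definition.

```c
/*
 * Bandwidth (kbit/s) used by a stream of fixed-size frames.
 * frame_bytes must be the full on-wire size, including the overhead
 * of all layers, not just the payload.
 */
long long idleslope_kbit(long long frame_bytes, long long frames_per_sec)
{
    return frame_bytes * 8 * frames_per_sec / 1000;
}

/* sendslope is the idleslope minus the port transmit rate (both kbit/s). */
long long sendslope_kbit(long long idleslope, long long port_rate_kbit)
{
    return idleslope - port_rate_kbit;
}
```

As an assumed example, a 256-byte payload plus 42 bytes of Ethernet overhead
(preamble and SFD 8, header 14, VLAN tag 4, FCS 4, inter-packet gap 12) sent
8000 times per second gives idleslope_kbit(298, 8000) = 19072, and on a
1 Gbit/s port sendslope_kbit(19072, 1000000) = -980928.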

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
6 years agoip6_gre: add support for ERSPAN tunnel
William Tu [Tue, 7 Nov 2017 02:27:18 +0000 (18:27 -0800)]
ip6_gre: add support for ERSPAN tunnel

The patch adds ERSPAN type II tunnel support for IPv6.

Signed-off-by: William Tu <u9012063@gmail.com>
6 years agolibnetlink: Handle extack messages for non-error case
David Ahern [Thu, 9 Nov 2017 00:46:50 +0000 (09:46 +0900)]
libnetlink: Handle extack messages for non-error case

The kernel can now return non-fatal messages through the extack facility.
Update iproute2 to dump them to the user if present.
- rename nl_dump_ext_err to nl_dump_ext_ack
- rename errmsg to msg
- add call to nl_dump_ext_ack in rtnl_dump_done and __rtnl_talk for
  non-error path

Signed-off-by: David Ahern <dsahern@gmail.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
6 years agoMerge branch 'master' into net-next
Stephen Hemminger [Thu, 9 Nov 2017 00:45:17 +0000 (09:45 +0900)]
Merge branch 'master' into net-next

6 years agonetem: use fixed rather than floating point for scaling
Stephen Hemminger [Tue, 7 Nov 2017 02:15:34 +0000 (11:15 +0900)]
netem: use fixed rather than floating point for scaling

Don't need to do floating point math to compute scaled random.
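
For illustration, one common way to scale a random value without floating
point is a 64-bit multiply followed by a shift. The sketch below shows that
general technique under stated assumptions; it is not taken from the netem
change itself, and scale_random() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Map a uniformly distributed 32-bit random value r onto [0, range)
 * using integer arithmetic only. The 64-bit product is at most
 * (2^32 - 1) * (2^32 - 1), so it cannot overflow uint64_t.
 */
uint32_t scale_random(uint32_t r, uint32_t range)
{
    return (uint32_t)(((uint64_t)r * range) >> 32);
}
```

The result is the integer part of r / 2^32 * range, which is what the
floating point version would compute before truncation.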

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoxfrm_{state, policy}: Allow to deleteall polices/states with marks
Thomas Egerer [Mon, 30 Oct 2017 18:11:46 +0000 (19:11 +0100)]
xfrm_{state, policy}: Allow to deleteall polices/states with marks

Using 'ip deleteall' with policies that have marks fails unless you
explicitly specify the mark values. This is very inconvenient when
bulk-deleting policies and states. With this patch all relevant states
and policies are wiped by 'ip deleteall' regardless of their mark
values.

Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
6 years agoxfrm_policy: Do not attempt to deleteall a socket policy
Thomas Egerer [Mon, 30 Oct 2017 18:11:45 +0000 (19:11 +0100)]
xfrm_policy: Do not attempt to deleteall a socket policy

Socket policies are added to a socket using setsockopt(2). They cannot be
deleted by iproute2. The attempt to delete them causes an error
(EINVAL).
To avoid this unnecessary error message, all socket policies are skipped
in xfrm_policy_keep.

Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
6 years agoxfrm_policy: Add filter option for socket policies
Thomas Egerer [Mon, 30 Oct 2017 18:11:44 +0000 (19:11 +0100)]
xfrm_policy: Add filter option for socket policies

Listing policies on systems with a lot of socket policies can be
confusing due to the number of returned policies. Even if socket policies
are not of interest, they cannot be filtered. This patch adds an option
to filter all socket policies from the output.

Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
6 years agoflower: Represent HW traffic classes as classid values
Amritha Nambiar [Fri, 3 Nov 2017 08:54:01 +0000 (01:54 -0700)]
flower: Represent HW traffic classes as classid values

This patch was previously submitted as RFC. Submitting this as
non-RFC now that the classid reservation scheme for hardware
traffic classes and offloads to route packets to a hardware
traffic class are accepted in net-next.

HW traffic classes 0 through 15 are represented using the
reserved classid values :ffe0 - :ffef.
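
The mapping stated above is a fixed offset, which can be sketched as follows.
The macro and function names are hypothetical and chosen for the example;
only the range :ffe0 - :ffef for classes 0 through 15 comes from the text.

```c
#include <stdint.h>

#define HW_TC_CLASSID_BASE 0xffe0u /* minor classid of HW traffic class 0 */
#define HW_TC_MAX          15u     /* highest HW traffic class */

/*
 * Convert a HW traffic class number to the reserved minor classid.
 * Returns 0 on success, -1 if tc is out of range.
 */
int hw_tc_to_classid_minor(unsigned int tc, uint16_t *minor)
{
    if (tc > HW_TC_MAX)
        return -1;
    *minor = (uint16_t)(HW_TC_CLASSID_BASE + tc);
    return 0;
}
```

With this mapping, 'hw_tc 1' in the example that follows corresponds to the
minor classid :ffe1.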

Example:
Match Dst IPv4,Dst Port and route to TC1:
# tc filter add dev eth0 protocol ip parent ffff:\
  prio 1 flower dst_ip 192.168.1.1/32\
  ip_proto udp dst_port 12000 skip_sw\
  hw_tc 1

# tc filter show dev eth0 parent ffff:
filter pref 1 flower chain 0
filter pref 1 flower chain 0 handle 0x1 hw_tc 1
  eth_type ipv4
  ip_proto udp
  dst_ip 192.168.1.1
  dst_port 12000
  skip_sw
  in_hw

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
6 years agoUpdate kernel headers with new SPDK identifier
Stephen Hemminger [Tue, 7 Nov 2017 02:02:41 +0000 (11:02 +0900)]
Update kernel headers with new SPDK identifier

The kernel header sanitization process now puts an SPDX GPLv2
license comment on files.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoUpdate kernel headers from 4.14-rc8 nete-next
Stephen Hemminger [Tue, 7 Nov 2017 02:02:08 +0000 (11:02 +0900)]
Update kernel headers from 4.14-rc8 nete-next

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agobridge: fdb: print NDA_SRC_VNI if available
Roopa Prabhu [Thu, 26 Oct 2017 17:12:55 +0000 (10:12 -0700)]
bridge: fdb: print NDA_SRC_VNI if available

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
6 years agoman: Add initial manpage for tc-cbs(8)
Vinicius Costa Gomes [Thu, 26 Oct 2017 17:17:49 +0000 (10:17 -0700)]
man: Add initial manpage for tc-cbs(8)

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agotc: Add support for the CBS qdisc
Vinicius Costa Gomes [Thu, 26 Oct 2017 17:17:48 +0000 (10:17 -0700)]
tc: Add support for the CBS qdisc

The Credit Based Shaper (CBS) queueing discipline allows bandwidth
reservation with sub-millisecond precision. It is defined by the
802.1Q-2014 specification (section 8.6.8.2 and Annex L).

The syntax is:

tc qdisc add dev DEV parent NODE cbs locredit <LOCREDIT>
    hicredit <HICREDIT> sendslope <SENDSLOPE>
    idleslope <IDLESLOPE>

(The order is not important)

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agotc/mqprio: Offload mode and shaper options in mqprio
Amritha Nambiar [Wed, 1 Nov 2017 07:45:42 +0000 (00:45 -0700)]
tc/mqprio: Offload mode and shaper options in mqprio

This patch was previously submitted as RFC. Submitting this as
non-RFC now that the tc/mqprio changes are accepted in net-next.

Adds new mqprio options for 'mode' and 'shaper'. The mode
option can take values for offload modes such as 'dcb' (default),
'channel' with the 'hw' option set to 1. The new 'channel' mode
supports offloading TCs and other queue configurations. The
'shaper' option is to support HW shapers ('dcb' default) and
takes the value 'bw_rlimit' for bandwidth rate limiting. The
parameters to the bw_rlimit shaper are minimum and maximum
bandwidth rates. New HW shapers in future can be supported
through the shaper attribute.

# tc qdisc add dev eth0 root mqprio num_tc 2  map 0 0 0 0 1 1 1 1\
  queues 4@0 4@4 hw 1 mode channel shaper bw_rlimit\
  min_rate 1Gbit 2Gbit max_rate 4Gbit 5Gbit

# tc qdisc show dev eth0

qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
             queues:(0:3) (4:7)
             mode:channel
             shaper:bw_rlimit   min_rate:1Gbit 2Gbit   max_rate:4Gbit 5Gbit

v2: Avoid buffer overrun and minor cleanup.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
6 years agoip/ipvlan: enhance ability to add mode flags to existing modes
Mahesh Bandewar [Mon, 30 Oct 2017 20:57:51 +0000 (13:57 -0700)]
ip/ipvlan: enhance ability to add mode flags to existing modes

IPvlan supported only bridge functionality prior to commits
a190d04db937 ('ipvlan: introduce 'private' attribute for all
existing modes.') and fe89aa6b250c ('ipvlan: implement VEPA mode').
These two commits make it possible to configure the VEPA and private modes.
This patch adds those options to the ip command.

e.g.
  bash:~# ip link add link eth0 name ipvl0 type ipvlan mode l2 private
  -or-
  bash:~# ip link add link eth0 name ipvl0 type ipvlan mode l2 vepa

Also the output will reflect the mode and the mode-flag accordingly.
e.g.
  bash:~# ip -details link show ipvl0
  4: ipvl0@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc ...
     link/ether 00:1a:11:44:a5:3e brd ff:ff:ff:ff:ff:ff promiscuity 0
     ipvlan  mode l2 private addrgenmode eui64 numtxqueues 1 ...

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
6 years agoupdate kernel headers from 4.14-rc7 net-next
Stephen Hemminger [Wed, 1 Nov 2017 21:15:50 +0000 (22:15 +0100)]
update kernel headers from 4.14-rc7 net-next

6 years agoMerge branch 'master' into net-next
Stephen Hemminger [Wed, 1 Nov 2017 21:15:00 +0000 (22:15 +0100)]
Merge branch 'master' into net-next

6 years agoss: Fix width calculations when Netid or State columns are missing
Stefano Brivio [Tue, 31 Oct 2017 17:47:56 +0000 (18:47 +0100)]
ss: Fix width calculations when Netid or State columns are missing

If Netid or State columns are missing, we must not subtract one
for each of these two columns from the remaining screen width,
while distributing available space to columns. This one
character corresponding to one delimiting space has to be
subtracted only if the columns are actually printed.

Further, in the existing implementation, if the screen width is
an odd number, one additional character is added to the width of
one of the two columns.

But if both are not printed, this filling character needs to be
added somewhere else, in order to have the right spacing
allowing us to fill lines completely.

Address and port fields are printed in pairs (local and remote),
so we can't distribute the space to any of them, because it
would be doubled. Instead, print this additional space to the
right of the Send-Q column, to keep code changes to a minimum.

This is particularly visible with 'ss -f netlink -Z'. Before
this patch, with an 80 column terminal, we have:

$ ss -f netlink -Z|head -n3
Recv-Q Send-Q Local Address:Port                 Peer Address:Port
0      0            rtnl:evolution-calen/2049           *                     pr
oc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
0      0            rtnl:clock-applet/1944              *                     pr
oc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

and with an 81 column terminal:

$ ss -f netlink -Z|head -n3
Recv-Q Send-Q Local Address:Port                 Peer Address:Port
0      0            rtnl:evolution-calen/2049           *                     pro
c_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
0      0            rtnl:clock-applet/1944              *                     pro
c_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

After this patch, in both cases, the output is:
$ ss -f netlink -Z|head -n3
Recv-Q Send-Q Local Address:Port                 Peer Address:Port
0      0             rtnl:evolution-calen/2049            *
 proc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
0      0             rtnl:clock-applet/1944               *
 proc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
6 years agoss: Streamline process context printing in netlink_show_one()
Stefano Brivio [Tue, 31 Oct 2017 17:47:55 +0000 (18:47 +0100)]
ss: Streamline process context printing in netlink_show_one()

There's no need to check 'pid_context' before calling free().
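
The reasoning is that the C standard defines free(NULL) as a no-op, so a
guard such as "if (p) free(p);" adds nothing. A minimal sketch, using a
hypothetical helper name rather than the actual ss code:

```c
#include <stdlib.h>

/*
 * Release a context string and clear the caller's pointer.
 * No NULL check is needed: free(NULL) does nothing.
 */
void release_context(char **ctx)
{
    free(*ctx);
    *ctx = NULL;
}
```

Clearing the pointer afterwards is an extra precaution in this sketch; it
makes a second call harmless.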

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
6 years agoss: Remove useless width specifier in process context print
Stefano Brivio [Tue, 31 Oct 2017 17:47:54 +0000 (18:47 +0100)]
ss: Remove useless width specifier in process context print

Both local address and service, and remote address and service
fields are already printed out in netlink_show_one() before we
start printing process context, by calling sock_addr_print()
twice.

At this point, sock_addr_print() has already forced the remote
service field to be 'serv_width' wide -- that is, 'serv_width'
width has already been consumed, before we print process
context.

Hence, it makes no sense to force the display width of process
context to be 'serv_width' wide again: previous prints have
filled up the line already. Remove the width specifier and
prefix with a space instead, to keep this consistent with fields
which are displayed after the first output line.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
6 years agoip: add fastopen_no_cookie option to ip route
Christoph Paasch [Tue, 31 Oct 2017 21:54:52 +0000 (14:54 -0700)]
ip: add fastopen_no_cookie option to ip route

This patch adds fastopen_no_cookie option to enable/disable TCP fastopen
without a cookie on a per-route basis.

Support in Linux was added with 71c02379c762 (tcp: Configure TFO without
cookie per socket and/or per route).

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
6 years agoip netns: use strtol() instead of atoi()
Roman Mashak [Tue, 31 Oct 2017 18:24:19 +0000 (14:24 -0400)]
ip netns: use strtol() instead of atoi()

Use strtol-based API to parse and validate integer input; atoi() does
not detect errors and may yield undefined behaviour if result can't be
represented.

v2: use get_unsigned() since network namespace is really an unsigned value.
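
The difference can be sketched with a small strtoul()-based parser. This is
an illustration in the spirit of the change, not the iproute2 helper itself:
parse_unsigned() is a hypothetical name, and the choice to accept decimal
digits only is an assumption made for the example.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a decimal unsigned integer with full validation.
 * Returns 0 and stores the value in *val on success, -1 on any error
 * (empty string, sign or leading space, trailing garbage, overflow).
 */
int parse_unsigned(const char *arg, unsigned int *val)
{
    char *end;
    unsigned long res;

    /* Require a leading digit so "-1" and " 5" are rejected. */
    if (!arg || !isdigit((unsigned char)*arg))
        return -1;

    errno = 0;
    res = strtoul(arg, &end, 10);
    if (*end != '\0')               /* trailing characters, e.g. "12abc" */
        return -1;
    if (errno == ERANGE || res > UINT_MAX)
        return -1;                  /* value does not fit */

    *val = (unsigned int)res;
    return 0;
}
```

By contrast, atoi("12abc") silently returns 12 and gives the caller no way to
tell that the input was malformed.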

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
6 years agoip: link_ip6tnl.c/ip6tunnel.c: Support IP6_TNL_F_ALLOW_LOCAL_REMOTE flag
Shmulik Ladkani [Sun, 29 Oct 2017 15:50:46 +0000 (17:50 +0200)]
ip: link_ip6tnl.c/ip6tunnel.c: Support IP6_TNL_F_ALLOW_LOCAL_REMOTE flag

IP6_TNL_F_ALLOW_LOCAL_REMOTE allows tunnel traffic on ip6tnl devices
where the remote endpoint is a local host address.

Specifying "[no]allow-localremote" controls the
IP6_TNL_F_ALLOW_LOCAL_REMOTE flag on ip6tnl interfaces.

This is the user-space counterpart for kernel
commit 908d140a87a7 ("ip6_tunnel: Allow rcv/xmit even if remote address is a local address")

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
6 years agobridge: vlan: support for per vlan tunnel info
Roopa Prabhu [Sat, 28 Oct 2017 05:13:50 +0000 (22:13 -0700)]
bridge: vlan: support for per vlan tunnel info

This patch uses kernel bridge vlan attribute
IFLA_BRIDGE_VLAN_TUNNEL_INFO to set/delete/show per vlan tunnel info.

$bridge vlan add dev vxlan0 vid 2000 tunnel_info id 2000
$bridge vlan add dev vxlan0 vid 1000-1001 tunnel_info id 2000-2001

$bridge vlan tunnelshow
port    vlan ids        tunnel id
vxlan0   1000-1001       1000-1001
         2000            2000

$bridge  -j vlan tunnelshow
{
    "dummy0": [],
    "dummy1": [],
    "bridge": [],
    "vxlan0": [{
            "vlan": 1000,
            "vlanEnd": 1001,
            "tunid": 1000,
            "tunidEnd": 1001
        },{
            "vlan": 2000,
            "tunid": 2000
        }
    ]
}

This patch also fixes a json termination bug in print_vlan
when filter vlan is provided by the user.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
6 years agoiplink: bridge: support bridge port vlan_tunnel attribute
Roopa Prabhu [Sat, 28 Oct 2017 05:13:49 +0000 (22:13 -0700)]
iplink: bridge: support bridge port vlan_tunnel attribute

This config maps to IFLA_BRPORT_VLAN_TUNNEL bridge port netlink
flag attribute. This flag enables vlan to tunnel mapping on a bridge
port. It is off by default.

set vlan_tunnel attribute on bridge port vxlan0:

$ip link set dev vxlan0 type bridge_slave vlan_tunnel on
$ip link set dev vxlan0 type bridge_slave vlan_tunnel off

or via bridge command

$bridge link set dev vxlan0 vlan_tunnel on
$bridge link set dev vxlan0 vlan_tunnel off

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
6 years agoUpdate kernel headers from net-next (4.14-rc6)
Stephen Hemminger [Tue, 31 Oct 2017 17:03:52 +0000 (18:03 +0100)]
Update kernel headers from net-next (4.14-rc6)

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoMerge branch 'master' into net-next
Stephen Hemminger [Tue, 31 Oct 2017 17:03:12 +0000 (18:03 +0100)]
Merge branch 'master' into net-next

6 years agoUpdate kernel headers based on 4.14-rc7
Stephen Hemminger [Tue, 31 Oct 2017 17:01:51 +0000 (18:01 +0100)]
Update kernel headers based on 4.14-rc7

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agotc: m_ife: fix match tcindex parsing
Alexander Aring [Mon, 30 Oct 2017 16:37:49 +0000 (12:37 -0400)]
tc: m_ife: fix match tcindex parsing

This patch changes ife_prio to ife_tcindex, which is the right variable to
assign for this argument.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
6 years agoip: added missing newline in man page
Roman Mashak [Fri, 27 Oct 2017 19:05:34 +0000 (15:05 -0400)]
ip: added missing newline in man page

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
6 years agoMerge branch 'master' into net-next
Stephen Hemminger [Fri, 27 Oct 2017 07:27:43 +0000 (09:27 +0200)]
Merge branch 'master' into net-next

6 years agobridge: checkpatch related cleanups
Stephen Hemminger [Fri, 27 Oct 2017 07:15:23 +0000 (09:15 +0200)]
bridge: checkpatch related cleanups

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoiproute: source code cleanup
Stephen Hemminger [Fri, 27 Oct 2017 06:38:25 +0000 (08:38 +0200)]
iproute: source code cleanup

Break long lines.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoupdate kernel headers
Stephen Hemminger [Fri, 27 Oct 2017 06:31:26 +0000 (08:31 +0200)]
update kernel headers

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoinclude: add TCP fastopen option
Stephen Hemminger [Fri, 27 Oct 2017 06:30:48 +0000 (08:30 +0200)]
include: add TCP fastopen option

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agobpf: update header file
Stephen Hemminger [Fri, 27 Oct 2017 06:28:36 +0000 (08:28 +0200)]
bpf: update header file

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agobridge: request vlans along with link information
Roman Mashak [Fri, 8 Sep 2017 21:52:23 +0000 (17:52 -0400)]
bridge: request vlans along with link information

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
6 years agobridge: dump vlan table information for link
Roman Mashak [Fri, 8 Sep 2017 21:52:22 +0000 (17:52 -0400)]
bridge: dump vlan table information for link

The kernel also reports the vlans a port is a member of, so print them.
Since the vlan table can be quite large, dump it only when detailed
information is requested.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
6 years agobridge: isolate vlans parsing code in a separate API
Roman Mashak [Fri, 8 Sep 2017 21:52:21 +0000 (17:52 -0400)]
bridge: isolate vlans parsing code in a separate API

IFLA_BRIDGE_VLAN_INFO parsing logic will be used in both link and vlan
processing code, so it makes sense to move it into a separate function.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
6 years agolib/libnetlink: update rtnl_talk to support malloc buff at run time
Hangbin Liu [Thu, 26 Oct 2017 01:41:47 +0000 (09:41 +0800)]
lib/libnetlink: update rtnl_talk to support malloc buff at run time

This is an update for 460c03f3f3cc ("iplink: double the buffer size also in
iplink_get()"). After this update, we no longer need to double the buffer
size every time the number of VFs increases.

For calls like rtnl_talk(&rth, &req.n, NULL, 0), we can simply remove the
length parameter.

For calls like rtnl_talk(&rth, nlh, nlh, sizeof(req)), add a new variable,
answer, to avoid overwriting data in nlh, because there may be more info
after nlh. This also avoids the problem of the nlh buffer being too small.

The answer must be freed after use.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
6 years agolib/libnetlink: re malloc buff if size is not enough
Hangbin Liu [Thu, 26 Oct 2017 01:41:46 +0000 (09:41 +0800)]
lib/libnetlink: re malloc buff if size is not enough

With commit 72b365e8e0fd ("libnetlink: Double the dump buffer size")
we doubled the buffer size to support more VFs. But the number of VFs keeps
increasing. Some customers even use more than 200 VFs now.

We cannot double it every time the buffer is not big enough. Let's just
not hard-code the buffer size and malloc the correct size at run time.

Introduce function rtnl_recvmsg() to always return a newly allocated buffer.
The caller needs to free it after use.
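
One way to size a receive buffer at run time is to peek at the pending
message length first and allocate exactly that much. The sketch below shows
this general technique on Linux; it is an assumption about the approach, not
a copy of rtnl_recvmsg(), and recv_alloc() is a hypothetical name.

```c
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Receive one datagram into a freshly allocated buffer.
 * MSG_PEEK | MSG_TRUNC makes recv() report the real message length
 * without consuming it, so the buffer can be sized exactly.
 * Returns the message length and stores the buffer in *answer;
 * the caller must free() it. Returns -1 on error.
 */
ssize_t recv_alloc(int fd, char **answer)
{
    char probe;
    ssize_t len, got;
    char *buf;

    len = recv(fd, &probe, sizeof(probe), MSG_PEEK | MSG_TRUNC);
    if (len < 0)
        return -1;

    buf = malloc(len > 0 ? (size_t)len : 1);
    if (!buf)
        return -1;

    got = recv(fd, buf, (size_t)len, 0);
    if (got < 0) {
        free(buf);
        return -1;
    }

    *answer = buf;
    return got;
}
```

The cost is one extra system call per message, in exchange for never having
to guess a maximum size in advance.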

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
6 years agoman: add additional explainations for ss
yupeng [Thu, 26 Oct 2017 07:15:31 +0000 (07:15 +0000)]
man: add additional explainations for ss

Add detailed explanations of the -m, -o, -e and -i options, which are not documented anywhere.

Signed-off-by: yupeng <yupeng0921@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
6 years agoupdate headers for TC and TIPC from net-next
Stephen Hemminger [Wed, 25 Oct 2017 10:40:47 +0000 (12:40 +0200)]
update headers for TC and TIPC from net-next

6 years agoMerge branch 'master' into net-next
Stephen Hemminger [Wed, 25 Oct 2017 10:39:18 +0000 (12:39 +0200)]
Merge branch 'master' into net-next

6 years agotc/actions: introduce support for jump action
Jamal Hadi Salim [Sun, 22 Oct 2017 14:48:10 +0000 (10:48 -0400)]
tc/actions: introduce support for jump action

Sample use case:

... add ingress qdisc
sudo $TC qdisc add dev $ETH ingress

 ... if we exceed rate of 1kbps (burst of 90K), do an absolute jump of 2 actions
sudo $TC actions add action police rate 1kbit burst 90k conform-exceed jump 2 / pipe

sudo $TC -s actions ls action police
 action order 0:  police 0x4 rate 1Kbit burst 23440b mtu 2Kb action jump 2/pipe overhead 0b
 ref 1 bind 0 installed 41 sec used 41 sec
 Action statistics:
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0

... lets add a couple of marks so we can use them to mark exceed/not exceed
sudo $TC actions add action skbedit mark 11 ok index 11
sudo $TC actions add action skbedit mark 12 ok index 12

... if we dont exceed our rate we get a mark of 11, else mark of 12
sudo $TC filter add dev $ETH parent ffff: protocol ip prio 8 u32 \
match ip dst 127.0.0.8/32 flowid 1:10 \
action police index 4 \
action skbedit index 11 \
action skbedit index 12

Ok, lets keep this thing a little busy..
sudo ping -f -c 10000 127.0.0.8

... now lets see the filters..
sudo $TC -s filter ls dev $ETH parent ffff: protocol ip
filter pref 8 u32 chain 0
filter pref 8 u32 chain 0 fh 800: ht divisor 1
filter pref 8 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 not_in_hw  (rule hit 20000 success 10000)
  match 7f000008/ffffffff at 16 (success 10000 )
action order 1:  police 0x4 rate 1Kbit burst 23440b mtu 2Kb action jump 2/pipe overhead 0b
ref 2 bind 1 installed 198 sec used 2 sec
Action statistics:
Sent 840000 bytes 10000 pkt (dropped 0, overlimits 9721 requeues 0)
backlog 0b 0p requeues 0

action order 2:  skbedit mark 11 pass
 index 11 ref 2 bind 1 installed 127 sec used 2 sec
  Action statistics:
Sent 23436 bytes 279 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

action order 3:  skbedit mark 12 pass
 index 12 ref 2 bind 1 installed 127 sec used 2 sec
  Action statistics:
Sent 816564 bytes 9721 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

As can be seen 97.21% of the packets were marked as exceeding the allocated
rate; you could do something clever with the skb mark after this.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years ago ip: bridge_slave: add neigh_suppress to the type help and
Nikolay Aleksandrov [Mon, 23 Oct 2017 12:46:24 +0000 (14:46 +0200)]
 ip: bridge_slave: add neigh_suppress to the type help and

Add neigh_suppress to the type help and document it in ip-link's man page.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoMerge branch 'master' into net-next
Stephen Hemminger [Mon, 23 Oct 2017 12:44:55 +0000 (14:44 +0200)]
Merge branch 'master' into net-next

6 years agoss: initialize 'fackets' member of tcpstat structure
Roman Mashak [Wed, 18 Oct 2017 19:44:01 +0000 (15:44 -0400)]
ss: initialize 'fackets' member of tcpstat structure

'fackets' has never been initialized with kernel extracted information, thus
never really printed.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
6 years agoip maddr: fix filtering by device
Michal Kubecek [Thu, 19 Oct 2017 08:21:08 +0000 (10:21 +0200)]
ip maddr: fix filtering by device

Commit 530903dd9003 ("ip: fix igmp parsing when iface is long") uses
variable len to keep trailing colon from interface name comparison.  This
variable is local to loop body but we set it in one pass and use it in
following one(s) so that we are actually using (pseudo)random length for
comparison. This became apparent since commit b48a1161f5f9 ("ipmaddr: Avoid
accessing uninitialized data") always initializes len to zero so that the
name comparison is always true. As a result, "ip maddr show dev eth0" shows
IPv4 multicast addresses for all interfaces.

Instead of keeping the length, let's simply replace the trailing colon with
a null byte. The bonus is that we get correct interface name in ma.name.
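
The fix described above can be sketched as follows. The function names are
hypothetical and the code is an illustration of the idea, not the ipmaddr.c
change itself.

```c
#include <string.h>

/* Replace a trailing ':' in an interface name with a null byte, in place. */
void strip_trailing_colon(char *name)
{
    size_t len = strlen(name);

    if (len > 0 && name[len - 1] == ':')
        name[len - 1] = '\0';
}

/* After stripping, a plain string comparison is enough for filtering. */
int name_matches(const char *name, const char *filter)
{
    return strcmp(name, filter) == 0;
}
```

Since no length has to be carried from one loop pass to the next, the
comparison can no longer depend on a stale or zero length.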

Fixes: 530903dd9003 ("ip: fix igmp parsing when iface is long")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Phil Sutter <phil@nwl.cc>
Acked-by: Petr Vorel <pvorel@suse.cz>
6 years agoss: Detect IPPROTO_ICMPV6 sockets
Phil Sutter [Wed, 18 Oct 2017 18:08:26 +0000 (20:08 +0200)]
ss: Detect IPPROTO_ICMPV6 sockets

Prefix IPPROTO_ICMPV6 sockets with 'icmp6' instead of '???'.

Signed-off-by: Phil Sutter <phil@nwl.cc>
6 years agoss: Distinguish between IPv4 and IPv6 wildcard sockets
Phil Sutter [Wed, 18 Oct 2017 17:58:13 +0000 (19:58 +0200)]
ss: Distinguish between IPv4 and IPv6 wildcard sockets

Commit aba9c23a6e1cb ("ss: enclose IPv6 address in brackets") unified
display of wildcard sockets in IPv4 and IPv6 to print the unspecified
address as '*'. Users then complained that they can't distinguish
between address families anymore, so change this again to what Stephen
Hemminger suggested:

| *:80    << both IPV6 and IPV4
| [::]:80 << IPV6_ONLY
| 0.0.0.0:80  << IPV4_ONLY

Note that on older kernels which don't support INET_DIAG_SKV6ONLY
attribute, pure IPv6 sockets will still show as '*'.
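
The display rule in the table above can be sketched as a small selection
function. The name wildcard_label() and the v6only flag parameter are
hypothetical; in practice the flag would be derived from the
INET_DIAG_SKV6ONLY attribute when the kernel provides it.

```c
#include <sys/socket.h>

/*
 * Choose how to print an unspecified (wildcard) local address.
 *   IPv4 socket                -> "0.0.0.0"
 *   IPv6 socket, v6-only       -> "[::]"
 *   IPv6 socket, dual-stack    -> "*"
 */
const char *wildcard_label(int family, int v6only)
{
    if (family == AF_INET)
        return "0.0.0.0";
    if (family == AF_INET6 && v6only)
        return "[::]";
    return "*";
}
```

On kernels that do not report the v6-only state, the flag stays 0 and a pure
IPv6 socket falls through to "*", which matches the note above.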

Cc: Humberto Alves <hjalves@live.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
6 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/shemminger...
Stephen Hemminger [Thu, 19 Oct 2017 00:11:50 +0000 (17:11 -0700)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2

6 years agoip: bridge_slave: add support for per-port group_fwd_mask
Nikolay Aleksandrov [Fri, 13 Oct 2017 13:12:53 +0000 (16:12 +0300)]
ip: bridge_slave: add support for per-port group_fwd_mask

This patch adds the iproute2 support for getting and setting the
per-port group_fwd_mask. It also tries to resolve the value into a more
human friendly format by printing the known protocols instead of only
the raw value.
The man page is also updated with the new option.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
6 years agoMerge branch 'master' into net-next
Stephen Hemminger [Mon, 16 Oct 2017 16:25:56 +0000 (09:25 -0700)]
Merge branch 'master' into net-next

6 years agocolor: Rename enum
Petr Vorel [Fri, 13 Oct 2017 13:57:19 +0000 (15:57 +0200)]
color: Rename enum

COLOR_NONE is more descriptive than COLOR_CLEAR.

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
6 years agocolor: Cleanup code to remove "magic" offset + 7
Petr Vorel [Fri, 13 Oct 2017 13:57:18 +0000 (15:57 +0200)]
color: Cleanup code to remove "magic" offset + 7

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
6 years agocolor: Fix another ip segfault when using --color switch
Petr Vorel [Fri, 13 Oct 2017 13:57:17 +0000 (15:57 +0200)]
color: Fix another ip segfault when using --color switch

Commit 959f1428 ("color: add new COLOR_NONE and disable_color function")
introduced the color enum COLOR_NONE, which not only duplicates
COLOR_CLEAR but also caused a segfault when running ip with the --color
switch, as 'attr + 8' in color_fprintf() accesses an array item out of
bounds. Remove it and restore the "magic" offset + 7.

Reproduce with:
$ ip -c a

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
6 years agocolor: Fix ip segfault when using --color switch
Petr Vorel [Fri, 13 Oct 2017 13:57:16 +0000 (15:57 +0200)]
color: Fix ip segfault when using --color switch

Commit d0e72011 ("ip: ipaddress.c: add support for json output")
introduced passing -1 as enum color_attr. This is not only wrong, as no
color_attr has the value -1, but also causes another segfault in
color_fprintf() on this setup, as there is no item with index -1 in the
array of enum attr_colors[]. Using COLOR_CLEAR is a valid option.
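
For illustration, a defensive lookup that tolerates an out-of-range
attribute such as -1 could look like the sketch below. The enum, the
table and color_code() are hypothetical and only loosely modelled on the
names used in this message; this is not the code in lib/color.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute list; the last entry counts the valid values. */
enum color_attr_sketch {
	SKETCH_COLOR_IFNAME,
	SKETCH_COLOR_MAC,
	SKETCH_COLOR_CLEAR,
	SKETCH_COLOR_MAX
};

/* Hypothetical table of terminal escape sequences, indexed by attribute. */
static const char *const sketch_codes[SKETCH_COLOR_MAX] = {
	[SKETCH_COLOR_IFNAME] = "\033[36m",
	[SKETCH_COLOR_MAC]    = "\033[33m",
	[SKETCH_COLOR_CLEAR]  = "\033[0m",
};

/* Return the escape sequence for attr, or "" for any out-of-range value. */
static const char *color_code(int attr)
{
	if (attr < 0 || attr >= SKETCH_COLOR_MAX)
		return "";
	return sketch_codes[attr];
}
```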

Reproduce with:
$ COLORFGBG='0;15' ip -c a

NOTE: COLORFGBG is an environment variable used to indicate whether the
user has a light or a dark background.
COLORFGBG="0;15" asks for a color set suitable for a light background,
COLORFGBG="15;0" asks for a color set suitable for a dark background.
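
A minimal sketch of interpreting that variable, under two assumptions
that are not stated in this message: the background is the last
semicolon-separated field, and the single-digit indices 0-6 and 8 count
as dark. background_is_dark() is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if the COLORFGBG value describes a dark
 * background, 0 otherwise (including when the variable is unset).
 */
static int background_is_dark(const char *colorfgbg)
{
	const char *bg;

	if (colorfgbg == NULL)
		return 0;
	bg = strrchr(colorfgbg, ';');
	if (bg == NULL)
		return 0;
	bg++;				/* step past the ';' */
	if (bg[0] == '\0' || bg[1] != '\0')
		return 0;		/* only single-digit indices are treated as dark */
	return (bg[0] >= '0' && bg[0] <= '6') || bg[0] == '8';
}
```

A caller would typically pass getenv("COLORFGBG") and pick the palette
from the result.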

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
6 years agotests: Revert back /bin/sh in shebang
Petr Vorel [Sun, 15 Oct 2017 09:59:45 +0000 (11:59 +0200)]
tests: Revert back /bin/sh in shebang

This was added by mistake in commit ecd44e68
("tests: Remove bashisms (s/source/.)")

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
6 years agoMerge branch 'master' into net-next
Stephen Hemminger [Thu, 12 Oct 2017 16:06:10 +0000 (09:06 -0700)]
Merge branch 'master' into net-next

6 years agonetem: fix code indentation
Stephen Hemminger [Thu, 12 Oct 2017 01:08:15 +0000 (18:08 -0700)]
netem: fix code indentation

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoMerge branch 'master' into net-next
Stephen Hemminger [Wed, 11 Oct 2017 18:07:20 +0000 (11:07 -0700)]
Merge branch 'master' into net-next

6 years agoss: print MD5 signature keys configured on TCP sockets
Ivan Delalande [Fri, 6 Oct 2017 23:48:20 +0000 (16:48 -0700)]
ss: print MD5 signature keys configured on TCP sockets

These keys are reported by kernel 4.14 and later under the
INET_DIAG_MD5SIG attribute, when INET_DIAG_INFO is requested (ss -i)
and we have CAP_NET_ADMIN. The additional output looks like:

md5keys:fe80::/64=signing_key,10.1.2.0/24=foobar,::1/128=Test

Signed-off-by: Ivan Delalande <colona@arista.com>
6 years agoutils: add print_escape_buf to format and print arbitrary bytes
Ivan Delalande [Fri, 6 Oct 2017 23:48:19 +0000 (16:48 -0700)]
utils: add print_escape_buf to format and print arbitrary bytes

Keep it as simple as possible for now: print as an octal escape sequence
any byte that is not isprint-able, is listed in the "escape" parameter, or
is '\'. This should be pretty easy to extend if any other user needs
something more complex in the future.
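
The escaping rule can be sketched as below. This is a buffer-based
variant written for illustration only: escape_to_buf() and its signature
are hypothetical, while the function added by the patch prints the
result instead.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the escaped form of buf[0..len) into out.
 * Returns the number of characters written, or -1 if out is too small.
 */
static int escape_to_buf(char *out, size_t outlen,
			 const unsigned char *buf, size_t len,
			 const char *escape)
{
	size_t i, pos = 0;

	assert(out != NULL && outlen > 0 && escape != NULL);
	for (i = 0; i < len; i++) {
		unsigned char c = buf[i];
		/* printable, not a backslash, and not listed in "escape" */
		int plain = isprint(c) && c != '\\' &&
			    strchr(escape, c) == NULL;
		size_t need = plain ? 1 : 4;	/* "\ooo" takes four characters */

		if (pos + need >= outlen)
			return -1;		/* keep room for the final NUL */
		if (plain)
			out[pos++] = (char)c;
		else
			pos += (size_t)snprintf(out + pos, outlen - pos,
						"\\%03o", (unsigned int)c);
	}
	out[pos] = '\0';
	return (int)pos;
}
```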

Signed-off-by: Ivan Delalande <colona@arista.com>
6 years agolib: fix multiple strlcpy definition
Baruch Siach [Mon, 9 Oct 2017 05:49:44 +0000 (08:49 +0300)]
lib: fix multiple strlcpy definition

Some C libraries, like uClibc and musl, provide BSD compatible
strlcpy(). Add check_strlcpy() to configure, and avoid defining strlcpy
and strlcat when the C library provides them.

This fixes the following static link error with uClibc-ng:

.../sysroot/usr/lib/libc.a(strlcpy.os): In function `strlcpy':
strlcpy.c:(.text+0x0): multiple definition of `strlcpy'
../lib/libutil.a(utils.o):utils.c:(.text+0x1ddc): first defined here
collect2: error: ld returned 1 exit status

Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
6 years agotests: Remove bashisms (s/source/.)
Petr Vorel [Sun, 8 Oct 2017 14:39:16 +0000 (16:39 +0200)]
tests: Remove bashisms (s/source/.)

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
6 years agoiplink: new option to set neigh suppression on a bridge port
Roopa Prabhu [Tue, 10 Oct 2017 04:42:13 +0000 (21:42 -0700)]
iplink: new option to set neigh suppression on a bridge port

neigh suppression can be used to suppress arp and nd flood
to bridge ports. It maps to the recently added
kernel support for bridge port flag IFLA_BRPORT_NEIGH_SUPPRESS.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
6 years agoip: mroute: Print offload indication
Yotam Gigi [Sun, 8 Oct 2017 14:43:04 +0000 (17:43 +0300)]
ip: mroute: Print offload indication

Since kernel net-next commit c7c0bbeae950 ("net: ipmr: Add MFC offload
indication") the kernel indicates on an MFC entry whether it was offloaded
using the RTNH_F_OFFLOAD flag. Update the "ip mroute show" command to
indicate when a route is offloaded, similarly to the "ip route show"
command.

Example output:
$ ip mroute
(0.0.0.0, 239.255.0.1)      Iif: sw1p7  Oifs: t_br0 State: resolved offload
(192.168.1.1, 239.255.0.1)  Iif: sw1p7  Oifs: sw1p4 State: resolved offload

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
6 years agoss: add AF_VSOCK support
Stefan Hajnoczi [Fri, 6 Oct 2017 15:48:41 +0000 (11:48 -0400)]
ss: add AF_VSOCK support

The AF_VSOCK address family is a host<->guest communications channel
supported by VMware, KVM, and Hyper-V.  Initial VMware support was
released in Linux 3.9 in 2013 and transports for other hypervisors were
added later.

AF_VSOCK addresses are <u32 cid, u32 port> tuples.  The 32-bit cid
integer is comparable to an IP address.  AF_VSOCK ports work like
TCP/UDP ports.
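
As a rough illustration of the address tuple, the sketch below formats
it in a cid:port notation. struct vsock_addr_sketch and
format_vsock_addr() are hypothetical, and the notation itself is an
assumption; the kernel's own address structure lives in
<linux/vm_sockets.h>.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the <u32 cid, u32 port> tuple. */
struct vsock_addr_sketch {
	uint32_t cid;	/* context ID, comparable to an IP address */
	uint32_t port;	/* works like a TCP/UDP port */
};

/* Hypothetical helper: write "cid:port" into buf, snprintf-style. */
static int format_vsock_addr(char *buf, size_t len,
			     const struct vsock_addr_sketch *a)
{
	assert(buf != NULL && a != NULL);
	return snprintf(buf, len, "%" PRIu32 ":%" PRIu32, a->cid, a->port);
}
```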

Both SOCK_STREAM and SOCK_DGRAM socket types are available.

This patch adds AF_VSOCK support to ss(8) so that sockets can be
observed.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6 years agoss: allow AF_FAMILY constants >32
Stefan Hajnoczi [Fri, 6 Oct 2017 15:48:39 +0000 (11:48 -0400)]
ss: allow AF_FAMILY constants >32

Linux has more than 32 address families defined in <bits/socket.h>.  Use
a 64-bit type so all of them can be represented in the filter->families
bitmask.

It's easy to introduce bugs when using (1 << AF_FAMILY) because the
value is 32-bit.  This can produce incorrect results from bitmask
operations so introduce the FAMILY_MASK() macro to eliminate these bugs.
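
A minimal sketch of such a macro, assuming it simply widens the constant
before shifting. FAMILY_MASK_SKETCH() and family_is_set() are
illustrative names, not necessarily what the patch defines.

```c
#include <assert.h>
#include <stdint.h>

/* Widen to 64 bits before shifting so families >= 32 get their own bit. */
#define FAMILY_MASK_SKETCH(family) ((uint64_t)1 << (family))

/* Hypothetical helper: test whether a family is present in the bitmask. */
static int family_is_set(uint64_t families, int family)
{
	assert(family >= 0 && family < 64);
	return (families & FAMILY_MASK_SKETCH(family)) != 0;
}
```

With a plain (1 << 40) the shift count exceeds the width of int, which
is undefined behaviour in C; the widened form yields a distinct bit.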

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6 years agouapi: add include linux/vm_sockets_diag.h
Stephen Hemminger [Wed, 11 Oct 2017 17:49:25 +0000 (10:49 -0700)]
uapi: add include linux/vm_sockets_diag.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoMerge branch 'master' into net-next
Stephen Hemminger [Wed, 11 Oct 2017 17:47:55 +0000 (10:47 -0700)]
Merge branch 'master' into net-next

6 years agordma: move headers to uapi
Stephen Hemminger [Wed, 11 Oct 2017 17:47:28 +0000 (10:47 -0700)]
rdma: move headers to uapi

And update with version from upstream.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoupdate uapi headers from 4.14-rc4 net-next
Stephen Hemminger [Wed, 11 Oct 2017 17:43:38 +0000 (10:43 -0700)]
update uapi headers from 4.14-rc4 net-next

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoMerge branch 'master' into net-next
Stephen Hemminger [Wed, 11 Oct 2017 17:43:13 +0000 (10:43 -0700)]
Merge branch 'master' into net-next

6 years agoiproute: build more easily on Android
Lorenzo Colitti [Mon, 2 Oct 2017 17:03:37 +0000 (02:03 +0900)]
iproute: build more easily on Android

iproute2 contains a bunch of kernel headers, including uapi ones.
Android's libc uses uapi headers almost directly, and uses a
script to fix kernel types that don't match what userspace
expects.

For example: https://issuetracker.google.com/36987220 reports
that our struct ip_mreq_source contains "__be32 imr_multiaddr"
rather than "struct in_addr imr_multiaddr". The script addresses
this by replacing the uapi struct definition with a #include
<bits/ip_mreq.h> which contains the traditional userspace
definition.

Unfortunately, when we compile iproute2, this definition
conflicts with the one in iproute2's linux/in.h.

Historically we've just solved this problem by running "git rm"
on all the iproute2 include/linux headers that break Android's
libc.  However, deleting the files in this way makes it harder to
keep up with upstream, because every upstream change to
an include file causes a merge conflict with the delete.

This patch fixes the problem by moving the iproute2 linux headers
from include/linux to include/uapi/linux.

Tested: compiles on ubuntu trusty (glibc)

Signed-off-by: Elliott Hughes <enh@google.com>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
6 years agotipc: don't need custom CFLAGS
Stephen Hemminger [Wed, 11 Oct 2017 17:35:00 +0000 (10:35 -0700)]
tipc: don't need custom CFLAGS

Since libmnl CFLAGS are now handled by config.mk

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoMerge branch 'master' into net-next
Stephen Hemminger [Mon, 2 Oct 2017 15:04:13 +0000 (08:04 -0700)]
Merge branch 'master' into net-next

6 years agoupdate headers from net-next rc
Stephen Hemminger [Mon, 2 Oct 2017 15:03:45 +0000 (08:03 -0700)]
update headers from net-next rc

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 years agoCheck user supplied interface name lengths
Phil Sutter [Mon, 2 Oct 2017 11:46:37 +0000 (13:46 +0200)]
Check user supplied interface name lengths

The original problem was that something like:

| strncpy(ifr.ifr_name, *argv, IFNAMSIZ);

might leave ifr.ifr_name unterminated if length of *argv exceeds
IFNAMSIZ. In order to fix this, I thought about replacing all those
cases with (equivalent) calls to snprintf() or even introducing
strlcpy(). But as Ulrich Drepper correctly pointed out when rejecting
the latter from being added to glibc, truncating a string without
notifying the user is not to be considered good practice. So let's
exercise what he suggested and reject empty, overlong or otherwise
invalid interface names right from the start - this way calls to
strncpy() like shown above become safe and the user has a chance to
reconsider what he was trying to do.

Note that this doesn't add calls to check_ifname() to all places where
a user-supplied interface name is parsed. In many cases, the interface
must exist already and is therefore looked up using ll_name_to_index(),
so if_nametoindex() will perform the necessary checks already.

Signed-off-by: Phil Sutter <phil@nwl.cc>
6 years agotc: flower: No need to cache indev arg
Phil Sutter [Mon, 2 Oct 2017 11:46:36 +0000 (13:46 +0200)]
tc: flower: No need to cache indev arg

Since addattrstrz() will copy the provided string into the attribute
payload, there is no need to cache the data.

Signed-off-by: Phil Sutter <phil@nwl.cc>
6 years agoip{6, }tunnel: Avoid copying user-supplied interface name around
Phil Sutter [Mon, 2 Oct 2017 11:46:35 +0000 (13:46 +0200)]
ip{6, }tunnel: Avoid copying user-supplied interface name around

In both files' parse_args() functions as well as in iptunnel's do_prl()
and do_6rd() functions, a user-supplied 'dev' parameter is uselessly
copied into a temporary buffer before passing it to ll_name_to_index()
or copying into a struct ifreq.  Avoid this by just caching the argv
pointer value until the later lookup/strcpy.

Signed-off-by: Phil Sutter <phil@nwl.cc>
6 years agoip xfrm: use correct key length for netlink message
Michal Kubecek [Fri, 29 Sep 2017 11:41:05 +0000 (13:41 +0200)]
ip xfrm: use correct key length for netlink message

When an SA is added manually using "ip xfrm state add", xfrm_state_modify()
uses the alg_key_len field of struct xfrm_algo for the length of the key
passed to the kernel in the netlink message. However, alg_key_len is the bit
length of the key while we need the byte length here. This is usually
harmless as the kernel ignores the excess data, but when the bit length of
the key exceeds 512 (XFRM_ALGO_KEY_BUF_SIZE), it can result in a buffer
overflow.

We can simply divide by 8 here as the only place setting alg_key_len is in
xfrm_algo_parse() where it is always set to a multiple of 8 (and there are
already multiple places using "algo->alg_key_len / 8").
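
The length arithmetic can be sketched as follows. Both helpers are
hypothetical and only illustrate the bits-to-bytes conversion; header_len
stands in for the size of the fixed part of the algorithm structure.

```c
#include <assert.h>
#include <stddef.h>

/* Convert a key length in bits (always a multiple of 8 here) to bytes. */
static size_t key_len_bytes(unsigned int key_len_bits)
{
	assert(key_len_bits % 8 == 0);
	return key_len_bits / 8;
}

/* Attribute payload: fixed header followed by the key bytes. */
static size_t algo_payload_len(size_t header_len, unsigned int key_len_bits)
{
	return header_len + key_len_bytes(key_len_bits);
}
```

For a 512-bit key this gives 64 key bytes, where using the bit length
directly would have claimed 512.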

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
6 years agotc: fix ipv6 filter selector attribute for some prefix lengths
Yulia Kartseva [Sun, 1 Oct 2017 03:18:40 +0000 (20:18 -0700)]
tc: fix ipv6 filter selector attribute for some prefix lengths

The TCA_U32_SEL attribute is packed incorrectly if (prefixLen & 0x1f)
equals 0x1f, i.e. for the /31, /63, /95 and /127 prefix lengths.

Example:
ip6 dst face:b00f::/31
filter parent b: protocol ipv6 pref 2307 u32
filter parent b: protocol ipv6 pref 2307 u32 fh 800: ht divisor 1
filter parent b: protocol ipv6 pref 2307 u32 fh 800::800 order 2048
key ht 800 bkt 0
  match faceb00f/ffffffff at 24
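
For reference, the expected per-word masks can be computed as in the
sketch below; the listing above appears to show ffffffff for /31, where
a 31-bit mask would be fffffffe. prefix_word_mask() is a hypothetical
helper working in host byte order, not the code from tc.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: mask for 32-bit word number `word` (0..3) of an
 * IPv6 address with prefix length plen, in host byte order.
 */
static uint32_t prefix_word_mask(unsigned int plen, unsigned int word)
{
	unsigned int start = word * 32;	/* first bit covered by this word */

	assert(plen <= 128 && word < 4);
	if (plen <= start)
		return 0;			/* prefix ends before this word */
	if (plen - start >= 32)
		return UINT32_MAX;		/* word fully covered */
	/* 1..31 leading bits set, e.g. 31 bits -> 0xfffffffe */
	return (uint32_t)(UINT32_MAX << (32 - (plen - start)));
}
```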

v2: previous patch was made against the wrong repo

Signed-off-by: Yulia Kartseva <hex@fb.com>
6 years agoMerge branch 'master' into net-next
Stephen Hemminger [Fri, 29 Sep 2017 19:03:16 +0000 (12:03 -0700)]
Merge branch 'master' into net-next

6 years agoip-route: Fix for listing routes with RTAX_LOCK attribute
Phil Sutter [Thu, 28 Sep 2017 17:33:56 +0000 (19:33 +0200)]
ip-route: Fix for listing routes with RTAX_LOCK attribute

This fixes a corner-case for routes with a certain metric locked to
zero:

| ip route add 192.168.7.0/24 dev eth0 window 0
| ip route add 192.168.7.0/24 dev eth0 window lock 0

Since the kernel doesn't dump the attribute if it is zero, both routes
added above would appear as if they were equal although they are not.

Fix this by taking mxlock value for the given metric into account before
skipping it if it is not present.
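
The decision described above can be sketched as follows.
metric_should_print() is a hypothetical helper: attr_present says
whether the kernel dumped the metric, mxlock is the lock bitmask, and
idx is the metric's index in it.

```c
#include <assert.h>

/*
 * Hypothetical helper: a metric is shown if the kernel dumped it, or if
 * it is locked (a locked metric that was not dumped has the value zero).
 */
static int metric_should_print(int attr_present, unsigned int mxlock,
			       unsigned int idx)
{
	assert(idx < 32);
	return attr_present || (mxlock & (1u << idx)) != 0;
}
```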

Reported-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>