git.proxmox.com Git - mirror_iproute2.git/log
8 years agodevlink: implement shared buffer occupancy control
Jiri Pirko [Sat, 14 May 2016 13:21:02 +0000 (15:21 +0200)]
devlink: implement shared buffer occupancy control

Use the kernel shared buffer occupancy control commands to take snapshots
and clear occupancy watermarks. Also, allow occupancy values to be shown
in a readable way.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
8 years agodevlink: implement shared buffer support
Jiri Pirko [Sat, 14 May 2016 13:21:01 +0000 (15:21 +0200)]
devlink: implement shared buffer support

Implement the kernel devlink shared buffer interface. Introduce a new
object "sb" that allows browsing the shared buffer parameters and
changing the configuration.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
8 years agoingress, clsact: don't add TCA_OPTIONS to nl msg
Daniel Borkmann [Sun, 15 May 2016 16:36:03 +0000 (18:36 +0200)]
ingress, clsact: don't add TCA_OPTIONS to nl msg

In the ingress and clsact qdiscs, TCA_OPTIONS is ignored, since both
are parameterless. In tc, we nevertheless add an empty
addattr_l(... TCA_OPTIONS, NULL, 0) to the netlink message. This has
the side effect that when someone tries a 'tc qdisc replace' while
such a qdisc is already present, tc fails with EINVAL.

The reason is that in the kernel, this invokes qdisc_change() when
the requested qdisc is already present. When TCA_OPTIONS is passed
to modify parameters, the kernel checks whether the qdisc implements
the .change() callback, and if it does not (as in both cases here),
it returns an error. Rather than adding an empty stub to the
kernel that ignores TCA_OPTIONS again, just don't add TCA_OPTIONS
to the netlink message in the first place.

Before:

  # tc qdisc replace dev foo clsact    # first try
  # tc qdisc replace dev foo clsact    # second one
  RTNETLINK answers: Invalid argument

After:

  # tc qdisc replace dev foo clsact
  # tc qdisc replace dev foo clsact
  # tc qdisc replace dev foo clsact
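
The failure mode described above can be modelled in a few lines. This is only a toy sketch of the check, not kernel code: the Qdisc class and the dictionary-based message are hypothetical stand-ins, and only the names TCA_OPTIONS, TCA_KIND, qdisc_change and EINVAL come from the text and the kernel headers.

```python
import errno


class Qdisc:
    """Toy stand-in for a kernel qdisc; change is None when .change() is absent."""

    def __init__(self, kind, change=None):
        self.kind = kind
        self.change = change


def qdisc_change(qdisc, attrs):
    """Return 0 on success or a negative errno, mirroring the described check."""
    if "TCA_OPTIONS" in attrs:
        if qdisc.change is None:
            # Options were passed but the qdisc cannot change parameters.
            return -errno.EINVAL
        return qdisc.change(attrs["TCA_OPTIONS"])
    return 0


clsact = Qdisc("clsact")  # parameterless, so no .change() callback
old_msg = {"TCA_KIND": "clsact", "TCA_OPTIONS": b""}  # empty attribute still present
new_msg = {"TCA_KIND": "clsact"}  # attribute omitted entirely

print(qdisc_change(clsact, old_msg))  # negative EINVAL: the "Before" behaviour
print(qdisc_change(clsact, new_msg))  # 0: the "After" behaviour
```

The point of the sketch is that an empty attribute is still an attribute: presence, not length, triggers the check.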

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
8 years agoMerge branch 'master' into net-next
Stephen Hemminger [Mon, 16 May 2016 18:20:40 +0000 (11:20 -0700)]
Merge branch 'master' into net-next

8 years agotc simple action update and breakage
Jamal Hadi Salim [Sun, 8 May 2016 15:02:06 +0000 (11:02 -0400)]
tc simple action update and breakage

Brings it closer to more serious actions (adding branching
and allowing for late binding).

Unfortunately, this breaks the old syntax of the simple action.
But because simple is a pedagogical example unlikely to be used
in production environments (i.e. its role is to serve as an example
of how to write actions), this is OK.

The new syntax for simple has a new keyword, "sdata". Example usage is:

sudo tc actions add action simple sdata "foobar" index 1
or
tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\
match ip dst 17.0.0.1/32 flowid 1:10 action simple sdata "foobar"

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
8 years agotc: don't ignore ok as an action branch
Jamal Hadi Salim [Sat, 7 May 2016 13:39:36 +0000 (09:39 -0400)]
tc: don't ignore ok as an action branch

This is what used to happen before:

tc filter add dev tap1 parent ffff: protocol 0xfefe prio 10 \
     u32 match u32 0 0 flowid 1:16 \
     action ife decode allow mark ok

tc -s filter ls dev tap1 parent ffff:
filter protocol [65278] pref 10 u32
filter protocol [65278] pref 10 u32 fh 800: ht divisor 1
filter protocol [65278] pref 10 u32 fh 800::800 order 2048 key ht 800
bkt 0 flowid 1:16
  match 00000000/00000000 at 0
        action order 1: ife decode action pipe
         index 2 ref 1 bind 1 installed 4 sec used 4 sec
         type: 0x0
         Metadata: allow mark
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

        action order 2: gact action pass
         random type none pass val 0
         index 1 ref 1 bind 1 installed 4 sec used 4 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

Note the extra action added at the end.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
8 years agotc: introduce IFE action
Jamal Hadi Salim [Sat, 7 May 2016 13:35:23 +0000 (09:35 -0400)]
tc: introduce IFE action

This action allows for a sending side to encapsulate arbitrary metadata
which is decapsulated by the receiving end.
The sender runs in encoding mode and the receiver in decode mode.
Both sender and receiver must specify the same ethertype.
At some point we hope to have a registered ethertype and we'll
then provide a default so the user doesn't have to specify it.
For now we require the user to specify it.

Described in netdev01 paper:
   "Distributing Linux Traffic Control Classifier-Action Subsystem"
    Authors: Jamal Hadi Salim and Damascene M. Joachimpillai

Also refer to IETF draft-ietf-forces-interfelfb-04.txt

Let's show example usage where we encode icmp from a sender towards
a receiver with an skbmark of 17; both sender and receiver use
ethertype of 0xdead to interop.

YYYY: Let's start with the receiver-side policy config:
xxx: add an ingress qdisc
sudo tc qdisc add dev $ETH ingress

xxx: any packets with ethertype 0xdead will be subjected to ife decoding
xxx: we then restart the classification so we can match on icmp at prio 3
sudo $TC filter add dev $ETH parent ffff: prio 2 protocol 0xdead \
u32 match u32 0 0 flowid 1:1 \
action ife decode reclassify

xxx: on restarting the classification from above if it was an icmp
xxx: packet, then match it here and continue to the next rule at prio 4
xxx: which will match based on skb mark of 17
sudo tc filter add dev $ETH parent ffff: prio 3 protocol ip \
u32 match ip protocol 1 0xff flowid 1:1 \
action continue

xxx: match on skbmark of 0x11 (decimal 17) and accept
sudo tc filter add dev $ETH parent ffff: prio 4 protocol ip \
handle 0x11 fw flowid 1:1 \
action ok

xxx: Let's show the decoding policy
sudo tc -s filter ls dev $ETH parent ffff: protocol 0xdead
xxx:
filter pref 2 u32
filter pref 2 u32 fh 800: ht divisor 1
filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1  (rule hit 0 success 0)
  match 00000000/00000000 at 0 (success 0 )
action order 1: ife decode action reclassify type 0x0
 allow mark allow prio
 index 11 ref 1 bind 1 installed 45 sec used 45 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

xxx:
Observe that the above lists all metadata it can decode. Typically these
submodules will already be compiled into a monolithic kernel or
loaded as modules.

YYYY: Let's show the sender side now.
xxx: Add an egress qdisc on the sender netdev
sudo tc qdisc add dev $ETH root handle 1: prio
xxx:
xxx: Match all icmp packets to 192.168.122.237/24, then
xxx: tag the packet with skb mark of decimal 17, then
xxx: Encode it with:
xxx:    ethertype 0xdead
xxx:    add skb->mark to whitelist of metadata to send
xxx:    rewrite target dst MAC address to 02:15:15:15:15:15
xxx:
sudo $TC filter add dev $ETH parent 1: protocol ip prio 10  u32 \
match ip dst 192.168.122.237/24 \
match ip protocol 1 0xff \
flowid 1:2 \
action skbedit mark 17 \
action ife encode \
type 0xDEAD \
allow mark \
dst 02:15:15:15:15:15

xxx: Let's show the encoding policy
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:2  (rule hit 118 success 0)
  match c0a87a00/ffffff00 at 16 (success 0 )
  match 00010000/00ff0000 at 8 (success 0 )
action order 1:  skbedit mark 17
 index 11 ref 1 bind 1 installed 3 sec used 3 sec
  Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

action order 2: ife encode action pipe type 0xDEAD
 allow mark dst 02:15:15:15:15:15
 index 12 ref 1 bind 1 installed 3 sec used 3 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
xxx:

Now test by sending a ping from the sender to the destination.
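
The hexadecimal u32 match lines in the listings above can be checked by hand. The sketch below is an illustration only; the helper names are hypothetical and are not part of tc. It assumes the standard IPv4 header layout, in which the destination address sits at byte offset 16 and the protocol field is the second byte of the 32-bit word at offset 8.

```python
import ipaddress


def u32_dst_match(cidr):
    """Value/mask pair for an IPv4 destination prefix (word at offset 16)."""
    net = ipaddress.ip_network(cidr, strict=False)
    return "%08x/%08x" % (int(net.network_address), int(net.netmask))


def u32_proto_match(proto, mask=0xFF):
    """Value/mask pair for the IP protocol byte (word at offset 8)."""
    # The word at offset 8 holds TTL, protocol and header checksum,
    # so the protocol byte is shifted left by 16 bits.
    return "%08x/%08x" % (proto << 16, mask << 16)


print(u32_dst_match("192.168.122.237/24"))  # c0a87a00/ffffff00
print(u32_proto_match(1))                   # 00010000/00ff0000 (icmp)
print(0x11)                                 # 17, the skb mark used above
```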

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
8 years agoadd tc_ife.h
Stephen Hemminger [Mon, 16 May 2016 18:13:05 +0000 (11:13 -0700)]
add tc_ife.h

8 years agoupdate kernel headers from net-next
Stephen Hemminger [Fri, 13 May 2016 21:56:31 +0000 (14:56 -0700)]
update kernel headers from net-next

Take sanitized headers for davem net-next

8 years agodevlink: update uapi header
Stephen Hemminger [Fri, 13 May 2016 21:49:40 +0000 (14:49 -0700)]
devlink: update uapi header

Get sanitized version from net-next

8 years agoMerge branch 'master' into net-next
Stephen Hemminger [Fri, 13 May 2016 21:48:53 +0000 (14:48 -0700)]
Merge branch 'master' into net-next

8 years agodevlink: remove more unused code
Stephen Hemminger [Fri, 13 May 2016 21:48:32 +0000 (14:48 -0700)]
devlink: remove more unused code

8 years agoss: Remove unused argument from kill_inet_sock
subashab@codeaurora.org [Mon, 9 May 2016 20:54:36 +0000 (14:54 -0600)]
ss: Remove unused argument from kill_inet_sock

addr is not used here.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
8 years agoMerge branch 'master' into net-next
Stephen Hemminger [Fri, 13 May 2016 21:44:48 +0000 (14:44 -0700)]
Merge branch 'master' into net-next

8 years agodevlink: remove unused code
Stephen Hemminger [Fri, 13 May 2016 21:42:06 +0000 (14:42 -0700)]
devlink: remove unused code

Unused code causes warnings, so remove it.

8 years agoupdate kernel headers to 4.6-rc6
Stephen Hemminger [Fri, 13 May 2016 21:41:45 +0000 (14:41 -0700)]
update kernel headers to 4.6-rc6

Close to final upstream headers

8 years agoRevert "devlink: implement shared buffer support"
Stephen Hemminger [Fri, 13 May 2016 21:38:47 +0000 (14:38 -0700)]
Revert "devlink: implement shared buffer support"

This reverts commit b56700bf8add4ebb2fe451c85f50602b58a886a2.

8 years agoRevert "devlink: implement shared buffer occupancy control"
Stephen Hemminger [Fri, 13 May 2016 21:38:38 +0000 (14:38 -0700)]
Revert "devlink: implement shared buffer occupancy control"

This reverts commit a60ebcb6f34f4c43cba092f52b1150d7fb1deec5.

8 years agogeneve: fix IPv6 remote address reporting
Edward Cree [Fri, 6 May 2016 14:28:25 +0000 (15:28 +0100)]
geneve: fix IPv6 remote address reporting

Since we can only configure unicast, we probably want to be able to
display unicast, rather than multicast.

Fixes: 906ac5437ab8 ("geneve: add support for IPv6 link partners")
Signed-off-by: Edward Cree <ecree@solarflare.com>
8 years agoip link gre: print only relevant info in external mode
Jiri Benc [Wed, 27 Apr 2016 14:11:14 +0000 (16:11 +0200)]
ip link gre: print only relevant info in external mode

Display only attributes that are relevant when a GRE interface is in
'external' mode instead of the default values (which are ignored by the
kernel even if passed back).

Fixes: 926b39e1feffd ("gre: add support for collect metadata flag")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
8 years agoip link gre: create interfaces in external mode correctly
Jiri Benc [Wed, 27 Apr 2016 14:11:13 +0000 (16:11 +0200)]
ip link gre: create interfaces in external mode correctly

For GRE interfaces in 'external' mode, the kernel ignores all manual
settings like remote IP address or TTL. However, for some of those
attributes, kernel checks their value and does not allow them to be zero
(even though they're ignored later).

Currently, 'ip link' always includes all attributes in the netlink message.
This leads to a problem when creating interfaces in 'external' mode. For
example, this command does not work:

ip link add gre1 type gretap external

and needs a bogus remote IP address to be specified, as the kernel requires
the remote IP address to be either absent or non-zero.

Ignore the parameters that do not make sense in 'external' mode.
Unfortunately, we cannot error out, as there may be existing deployments
that worked around the bug by specifying bogus values.

Fixes: 926b39e1feffd ("gre: add support for collect metadata flag")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
8 years agotc: add bash-completion function
Quentin Monnet [Tue, 3 May 2016 07:39:08 +0000 (09:39 +0200)]
tc: add bash-completion function

Add function for command completion for tc in bash, and update Makefile
to install it under /usr/share/bash-completion/completions/.

Inside iproute2 repository, the completion code is in a new
`bash-completion` toplevel directory.

v2: Remove `if` statement in Makefile: do not try to install in
    /etc/bash_completion.d/ if /usr/share/bash-completion/completions/
    is not found; instead, the user can override the installation path
    with the specific environment variable.

Signed-off-by: Quentin Monnet <quentin.monnet@6wind.com>
8 years agoupdate kernel headers from net-next
Stephen Hemminger [Mon, 25 Apr 2016 05:30:46 +0000 (22:30 -0700)]
update kernel headers from net-next

8 years agoss: add SK_MEMINFO_DROPS display
Eric Dumazet [Thu, 21 Apr 2016 12:19:04 +0000 (05:19 -0700)]
ss: add SK_MEMINFO_DROPS display

SK_MEMINFO_DROPS is added in linux-4.7 for TCP, UDP and SCTP.

skmem will display the socket drop count using the d prefix, as in:

$ ss -tm src :22 | more
State      Recv-Q Send-Q Local Address:Port    Peer Address:Port
ESTAB      0      52     10.246.7.151:ssh      172.20.10.101:50759
 skmem:(r0,rb8388608,t0,tb8388608,f1792,w2304,o0,bl0,d0)
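
For scripts that consume this output, the skmem tuple can be split into named counters. The following is a minimal sketch that assumes the exact layout shown in the sample above (lowercase letter prefix followed by a decimal value); parse_skmem is a hypothetical helper, not something shipped with ss.

```python
import re


def parse_skmem(line):
    """Split an skmem:(...) tuple into a {prefix: value} dictionary."""
    found = re.search(r"skmem:\(([^)]*)\)", line)
    if found is None:
        return {}
    fields = {}
    for item in found.group(1).split(","):
        # Each item is a letter prefix (e.g. "rb", "d") plus a number.
        key, value = re.fullmatch(r"([a-z]+)(\d+)", item.strip()).groups()
        fields[key] = int(value)
    return fields


sample = " skmem:(r0,rb8388608,t0,tb8388608,f1792,w2304,o0,bl0,d0)"
counters = parse_skmem(sample)
print(counters["d"])  # socket drop count reported by the new field
```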

Signed-off-by: Eric Dumazet <edumazet@google.com>
8 years agoupdate kernel headers from net-next
Stephen Hemminger [Fri, 22 Apr 2016 17:01:12 +0000 (10:01 -0700)]
update kernel headers from net-next

8 years agoupdate inet_diag.h header
Stephen Hemminger [Tue, 19 Apr 2016 15:06:11 +0000 (08:06 -0700)]
update inet_diag.h header

8 years agoMerge branch 'master' into net-next
Stephen Hemminger [Tue, 19 Apr 2016 15:01:55 +0000 (08:01 -0700)]
Merge branch 'master' into net-next

8 years agodevlink: add manpage for shared buffer
Jiri Pirko [Fri, 15 Apr 2016 07:51:53 +0000 (09:51 +0200)]
devlink: add manpage for shared buffer

Manpage for devlink "sb" object.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
8 years agodevlink: implement shared buffer occupancy control
Jiri Pirko [Fri, 15 Apr 2016 07:51:52 +0000 (09:51 +0200)]
devlink: implement shared buffer occupancy control

Use the kernel shared buffer occupancy control commands to take snapshots
and clear occupancy watermarks. Also, allow occupancy values to be shown
in a readable way.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
8 years agodevlink: implement shared buffer support
Jiri Pirko [Fri, 15 Apr 2016 07:51:51 +0000 (09:51 +0200)]
devlink: implement shared buffer support

Implement the kernel devlink shared buffer interface. Introduce a new
object "sb" that allows browsing the shared buffer parameters and
changing the configuration.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
8 years agodevlink: allow to parse both devlink and port handle in the same time
Jiri Pirko [Fri, 15 Apr 2016 07:51:50 +0000 (09:51 +0200)]
devlink: allow to parse both devlink and port handle in the same time

For filtering purposes, it makes sense for the user to specify either
a devlink handle or a port handle.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
8 years agodevlink: introduce dump filtering function
Jiri Pirko [Fri, 15 Apr 2016 07:51:49 +0000 (09:51 +0200)]
devlink: introduce dump filtering function

This function is to be used from dump callbacks to decide whether the
current output should be filtered out or not. Filtering is based on
previously parsed and stored command line options.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
8 years agodevlink: split dl_argv_parse_put to parse and put parts
Jiri Pirko [Fri, 15 Apr 2016 07:51:48 +0000 (09:51 +0200)]
devlink: split dl_argv_parse_put to parse and put parts

It is handy to have the parsed cmdline data stored so it can be used for
dump filtering. So split the original dl_argv_parse_put into parse and put
parts.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
8 years agodevlink: introduce helper to print out nice names (ifnames)
Jiri Pirko [Fri, 15 Apr 2016 07:51:47 +0000 (09:51 +0200)]
devlink: introduce helper to print out nice names (ifnames)

By default, ifnames will be printed out. User can turn that off using
"-n" option on the command line.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
8 years agodevlink: introduce pr_out_port_handle helper
Jiri Pirko [Fri, 15 Apr 2016 07:51:46 +0000 (09:51 +0200)]
devlink: introduce pr_out_port_handle helper

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
8 years agolist: add list_add_tail helper
Jiri Pirko [Fri, 15 Apr 2016 07:51:45 +0000 (09:51 +0200)]
list: add list_add_tail helper

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
8 years agolist: add list_for_each_entry_reverse macro
Jiri Pirko [Fri, 15 Apr 2016 07:51:44 +0000 (09:51 +0200)]
list: add list_for_each_entry_reverse macro

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
8 years agodevlink: fix "devlink port" help message
Jiri Pirko [Fri, 15 Apr 2016 07:51:43 +0000 (09:51 +0200)]
devlink: fix "devlink port" help message

"dl" -> "devlink"

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
8 years agoss: take care of unknown min_rtt
Eric Dumazet [Wed, 13 Apr 2016 22:18:38 +0000 (15:18 -0700)]
ss: take care of unknown min_rtt

Kernel sets info->tcpi_min_rtt to ~0U when no RTT sample was ever
taken for the session, thus min_rtt is unknown.
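
A reader of the tcp_info data therefore has to treat ~0U as a sentinel rather than as a measurement. The sketch below illustrates that check under stated assumptions: the field is a 32-bit unsigned value in microseconds, and the "minrtt:" label and millisecond conversion are chosen for illustration rather than copied from ss.

```python
TCPI_MIN_RTT_UNKNOWN = 0xFFFFFFFF  # ~0U for a 32-bit unsigned field


def format_min_rtt(tcpi_min_rtt_us):
    """Return a display string, or None when no RTT sample was ever taken."""
    if tcpi_min_rtt_us == TCPI_MIN_RTT_UNKNOWN:
        return None  # unknown: print nothing instead of a huge bogus value
    return "minrtt:%g" % (tcpi_min_rtt_us / 1000.0)


print(format_min_rtt(1500))
print(format_min_rtt(0xFFFFFFFF))
```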

Signed-off-by: Eric Dumazet <edumazet@google.com>
8 years agoss: Fix accidental state filter override
Phil Sutter [Wed, 13 Apr 2016 20:07:05 +0000 (22:07 +0200)]
ss: Fix accidental state filter override

Passing a filter expression and selecting an address family using the
'-f' flag would overwrite the state filter by accident. Therefore,
calling e.g. ss -nl -f inet '(sport = :22)' would print not only
listening sockets (as requested by the '-l' flag) but connected ones as
well.

Fix this by reusing the formerly ineffective call to filter_states_set()
to restore the state filter as it was before the call to
filter_af_set().

Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agoss: Drop silly assignment
Phil Sutter [Wed, 13 Apr 2016 20:07:04 +0000 (22:07 +0200)]
ss: Drop silly assignment

An expression of the form '(a | b) & b' will evaluate to the value of b
for any value of a or b.
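
This is the absorption law of Boolean algebra applied bitwise. A quick exhaustive check over small integers, written as a sketch with a hypothetical helper name, confirms it:

```python
def absorption_holds(bits=8):
    """Check (a | b) & b == b for every pair of values below 2**bits."""
    limit = 1 << bits
    return all(((a | b) & b) == b for a in range(limit) for b in range(limit))


print(absorption_holds())
```

Every bit set in b is also set in (a | b), so masking with b returns exactly b.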

Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agoip: neigh: Fix leftover attributes message during flush
Jeff Harris [Thu, 14 Apr 2016 18:15:03 +0000 (14:15 -0400)]
ip: neigh: Fix leftover attributes message during flush

Use the same rtnl_dump_request_n call as show does. rtnl_wilddump_request
assumes the message type uses an ifinfomsg, which is not the case for the
neighbor table.

Signed-off-by: Jeff Harris <jefftharris@gmail.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
8 years agovxlan: add support for VXLAN-GPE
Jiri Benc [Thu, 7 Apr 2016 12:36:29 +0000 (14:36 +0200)]
vxlan: add support for VXLAN-GPE

Adds support to create a VXLAN-GPE interface.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
8 years agoip-link.8: document "external" flag for vxlan
Jiri Benc [Thu, 7 Apr 2016 12:36:28 +0000 (14:36 +0200)]
ip-link.8: document "external" flag for vxlan

Signed-off-by: Jiri Benc <jbenc@redhat.com>
8 years agovxlan: 'external' implies 'nolearning'
Jiri Benc [Thu, 7 Apr 2016 12:36:27 +0000 (14:36 +0200)]
vxlan: 'external' implies 'nolearning'

It doesn't make sense to use external control plane and fill internal FDB at
the same time. It's even an illegal combination for VXLAN-GPE.

Just switch off learning when 'external' is specified.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
8 years agoMerge branch 'master' into net-next
Stephen Hemminger [Mon, 11 Apr 2016 22:15:41 +0000 (22:15 +0000)]
Merge branch 'master' into net-next

8 years agoip: whitespace cleanup
Stephen Hemminger [Mon, 11 Apr 2016 22:13:55 +0000 (22:13 +0000)]
ip: whitespace cleanup

Fix whitespace

8 years agoip-link: Support printing VF trust setting
Phil Sutter [Thu, 31 Mar 2016 12:43:32 +0000 (14:43 +0200)]
ip-link: Support printing VF trust setting

This adds a new item to VF lines of a PF, stating whether the VF is
trusted or not.

Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agoiproute2: tc_bpf.c: fix building with musl libc
Gustavo Zacarias [Fri, 8 Apr 2016 12:59:33 +0000 (09:59 -0300)]
iproute2: tc_bpf.c: fix building with musl libc

We need limits.h for PATH_MAX, fixes:

tc_bpf.c: In function ‘bpf_map_selfcheck_pinned’:
tc_bpf.c:222:12: error: ‘PATH_MAX’ undeclared (first use in this
function)
  char file[PATH_MAX], buff[4096];

Signed-off-by: Gustavo Zacarias <gustavo@zacarias.com.ar>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
8 years agoip: only display phys attributes with details option
Stephen Hemminger [Mon, 11 Apr 2016 22:07:51 +0000 (22:07 +0000)]
ip: only display phys attributes with details option

Since the output of ip commands is already cluttered, move the physical
port details under the show_details option.

8 years agoiplink: display IFLA_PHYS_PORT_NAME
Nicolas Dichtel [Fri, 1 Apr 2016 14:22:01 +0000 (16:22 +0200)]
iplink: display IFLA_PHYS_PORT_NAME

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
8 years agotc, bpf: add support for map pre/allocation
Daniel Borkmann [Fri, 8 Apr 2016 22:32:05 +0000 (00:32 +0200)]
tc, bpf: add support for map pre/allocation

Follow-up to kernel commit 6c9059817432 ("bpf: pre-allocate hash map
elements"). Add flags support, so that we can pass in BPF_F_NO_PREALLOC
flag for disallowing preallocation. Update examples accordingly and also
remove the BPF_* map helper macros from them as they were not very useful.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
8 years agotc, bpf: further improve error reporting
Daniel Borkmann [Fri, 8 Apr 2016 22:32:04 +0000 (00:32 +0200)]
tc, bpf: further improve error reporting

Make it easier to spot issues when loading the object file fails. This
includes reporting how pinned object specs differ and giving a better
indication when instruction limits have been reached. Don't retry loading
a non-relo program once bpf(2) has failed, and report out-of-bounds tail
call keys.

Also, add truncation of huge log outputs by default. Sometimes errors are
quite easy to spot by only looking at the tail of the verifier log, but
logs can get huge in size, e.g. up to a few MB (due to the verifier checking
all possible program paths). Thus, by default limit output to the last 4096
bytes and indicate that it's truncated. For the full log, the verbose option
can be used.
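
The truncation rule can be sketched as follows. This is an illustration of the behaviour described above, not the tc implementation; the function name and the marker text are hypothetical, and only the 4096-byte limit and the verbose override come from the commit message.

```python
LOG_TAIL_BYTES = 4096
TRUNCATION_MARKER = b"[...truncated...]\n"  # hypothetical marker text


def tail_of_log(log, verbose=False, limit=LOG_TAIL_BYTES):
    """Return the whole log if verbose or short, else its marked tail."""
    if verbose or len(log) <= limit:
        return log
    # Keep only the end of the log, where the error usually shows up.
    return TRUNCATION_MARKER + log[-limit:]


big_log = b"a" * 5000 + b"END"
print(len(tail_of_log(big_log)))
print(len(tail_of_log(big_log, verbose=True)))
```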

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
8 years agotc, bpf: add new csum and tunnel signatures
Daniel Borkmann [Fri, 8 Apr 2016 22:32:03 +0000 (00:32 +0200)]
tc, bpf: add new csum and tunnel signatures

Add new signatures for BPF_FUNC_csum_diff, BPF_FUNC_skb_get_tunnel_opt
and BPF_FUNC_skb_set_tunnel_opt.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
8 years agobridge: vlan: add support to filter by vlan id
Nikolay Aleksandrov [Mon, 11 Apr 2016 15:45:16 +0000 (17:45 +0200)]
bridge: vlan: add support to filter by vlan id

Add the optional keyword "vid" to bridge vlan show so the user can
request filtering by a specific vlan id. Currently the filtering is
implemented only in user-space. The argument name has been chosen to
match the add/del one - "vid". This filtering can also be used with the
"-compressvlans" option to see which range a vlan falls in (if any).
It will also be used later to show only specific per-vlan statistics,
once the kernel supports them.

Examples:
$ bridge vlan show vid 450
port vlan ids
eth2  450

$ bridge -c vlan show vid 450
port vlan ids
eth2  400-500

$ bridge vlan show vid 1
port vlan ids
eth1  1 PVID Egress Untagged
eth2  1 PVID
br0  1 PVID Egress Untagged
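
The user-space filtering shown in these examples amounts to a range membership test per port. The sketch below is a simplified model under assumed data structures: the dictionary of (start, end) ranges and the filter_vlans helper are hypothetical and do not mirror the bridge tool's internals.

```python
def filter_vlans(port_ranges, vid):
    """Keep, per port, only the vlan ranges that contain vid."""
    result = {}
    for port, ranges in port_ranges.items():
        hits = [(lo, hi) for lo, hi in ranges if lo <= vid <= hi]
        if hits:  # ports without a matching range are dropped entirely
            result[port] = hits
    return result


# Compressed view of a table consistent with the examples above.
table = {
    "eth1": [(1, 1)],
    "eth2": [(1, 1), (400, 500)],
    "br0": [(1, 1)],
}
print(filter_vlans(table, 450))
print(filter_vlans(table, 1))
```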

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
8 years agobridge: mdb: add support to filter by vlan id
Nikolay Aleksandrov [Mon, 11 Apr 2016 15:45:15 +0000 (17:45 +0200)]
bridge: mdb: add support to filter by vlan id

Add the optional keyword "vid" to bridge mdb show so the user can
request filtering by a specific vlan id. Currently the filtering is
implemented only in user-space. The argument name has been chosen to match
the add/del one - "vid".

Example:
$ bridge mdb show vid 200
dev br0 port eth2 grp 239.0.0.1 permanent vid 200

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
8 years agobridge: fdb: add support to filter by vlan id
Nikolay Aleksandrov [Mon, 11 Apr 2016 15:45:14 +0000 (17:45 +0200)]
bridge: fdb: add support to filter by vlan id

Add the optional keyword "vlan" to bridge fdb show so the user can request
filtering by a specific vlan id. Currently the filtering is implemented
only in user-space. The argument name has been chosen to match the
add/del one - "vlan".

Example:
$ bridge fdb show vlan 400
52:54:00:bf:57:16 dev eth2 vlan 400 master br0 permanent

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
8 years agoiplink: display number of rx/tx queues
Eric Dumazet [Thu, 7 Apr 2016 23:11:39 +0000 (16:11 -0700)]
iplink: display number of rx/tx queues

We can set the attributes, so it would be nice to display them when
provided by the kernel.

Signed-off-by: Eric Dumazet <edumazet@google.com>
8 years agoupdate kernel headers
Stephen Hemminger [Mon, 11 Apr 2016 20:44:50 +0000 (13:44 -0700)]
update kernel headers

Headers up to date with 4.6-net-next

8 years agoupdate kernel headers
Stephen Hemminger [Mon, 11 Apr 2016 20:40:40 +0000 (13:40 -0700)]
update kernel headers

Update from 4.6-rc3

8 years agodevlink: ignore build result
Stephen Hemminger [Mon, 11 Apr 2016 20:32:22 +0000 (13:32 -0700)]
devlink: ignore build result

devlink binary is built

8 years agogeneve: add support to set flow label
Daniel Borkmann [Thu, 24 Mar 2016 15:49:56 +0000 (16:49 +0100)]
geneve: add support to set flow label

Follow-up for kernel commit 8eb3b99554b8 ("geneve: support setting
IPv6 flow label") to allow setting the label for the device config.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
8 years agovxlan: add support to set flow label
Daniel Borkmann [Thu, 24 Mar 2016 15:49:55 +0000 (16:49 +0100)]
vxlan: add support to set flow label

Follow-up for kernel commit e7f70af111f0 ("vxlan: support setting
IPv6 flow label") to allow setting the label for the device config.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
8 years agoadd devlink tool
Jiri Pirko [Tue, 22 Mar 2016 09:02:21 +0000 (10:02 +0100)]
add devlink tool

Add a new tool called devlink, which is the userspace counterpart of the
devlink Netlink socket.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
8 years agoinclude: add linked list implementation from kernel
Jiri Pirko [Tue, 22 Mar 2016 09:02:20 +0000 (10:02 +0100)]
include: add linked list implementation from kernel

Rename hlist.h to list.h while adding it, to align with the kernel.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
8 years agogeneve: Add support for configuring UDP checksums.
Jesse Gross [Sat, 19 Mar 2016 00:51:09 +0000 (17:51 -0700)]
geneve: Add support for configuring UDP checksums.

Enable support for configuring outer UDP checksums on Geneve tunnels:

ip link add type geneve id 10 remote 10.0.0.2 udpcsum

Signed-off-by: Jesse Gross <jesse@kernel.org>
8 years agovxlan: Follow kernel defaults for outer UDP checksum.
Jesse Gross [Sat, 19 Mar 2016 00:51:08 +0000 (17:51 -0700)]
vxlan: Follow kernel defaults for outer UDP checksum.

On recent kernels, UDP checksum computation has become more efficient and
the default behavior was changed. However, the ip command overrides this
by always specifying a particular behavior.

If the user does not specify that UDP checksums should either be computed
or not then we don't need to send an explicit netlink message - the kernel
can just use its default behavior.
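
In other words, the option becomes tri-state: on, off, or unspecified. A minimal sketch of that idea follows, assuming a dictionary stands in for the netlink message; build_vxlan_attrs is a hypothetical helper, while IFLA_VXLAN_UDP_CSUM is the attribute name from the kernel's if_link.h.

```python
def build_vxlan_attrs(udpcsum=None):
    """Emit the checksum attribute only when the user made a choice."""
    attrs = {}
    if udpcsum is not None:
        attrs["IFLA_VXLAN_UDP_CSUM"] = 1 if udpcsum else 0
    # With udpcsum left as None, nothing is sent and the kernel default applies.
    return attrs


print(build_vxlan_attrs())       # user said nothing
print(build_vxlan_attrs(True))   # user asked for udpcsum
print(build_vxlan_attrs(False))  # user asked for noudpcsum
```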

Signed-off-by: Jesse Gross <jesse@kernel.org>
8 years agoscrub out whitespace issues
Stephen Hemminger [Sun, 27 Mar 2016 17:47:46 +0000 (10:47 -0700)]
scrub out whitespace issues

Run script that removes trailing whitespace everywhere.

8 years agofix get_addr() and get_prefix() error messages
Marco Varlese [Sun, 27 Mar 2016 17:45:51 +0000 (10:45 -0700)]
fix get_addr() and get_prefix() error messages

An attempt to add an invalid address to an interface would print the
"???" string instead of the address family name.

For example:
$ ip address add 256.10.166.1/24 dev ens8
Error: ??? prefix is expected rather than "256.10.166.1/24".

$ ip neighbor add proxy 2001:db8::g dev ens8
Error: ??? address is expected rather than "2001:db8::g".

With this patch the output will look like:
$ ip address add 256.10.166.1/24 dev ens8
Error: inet prefix is expected rather than "256.10.166.1/24".

$ ip neighbor add proxy 2001:db8::g dev ens8
Error: inet6 address is expected rather than "2001:db8::g".

Signed-off-by: Przemyslaw Szczerbik <przemyslawx.szczerbik@intel.com>
Signed-off-by: Marco Varlese <marco.varlese@intel.com>
8 years agolib/ll_addr: improve ll_addr_n2a() a bit
Phil Sutter [Tue, 22 Mar 2016 18:35:19 +0000 (19:35 +0100)]
lib/ll_addr: improve ll_addr_n2a() a bit

Apart from making the code a bit more compact and efficient, this also
prevents a potential buffer overflow if the passed buffer is really too
small: although the size parameter passed to snprintf was decremented
correctly, it could become negative, which would then wrap since snprintf
takes an (unsigned) size_t for that parameter.
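
The wrap-around is easy to reproduce outside of C. The sketch below uses ctypes to mimic the implicit conversion of a negative signed length to size_t; the variable names and the numbers are illustrative assumptions and are not taken from ll_addr_n2a().

```python
import ctypes


def as_size_t(value):
    """Convert a Python int the way C converts a signed value to size_t."""
    return ctypes.c_size_t(value).value


remaining = 4   # bytes believed to be left in the buffer
needed = 6      # bytes the previous write wanted
remaining -= needed          # now negative as a signed quantity
wrapped = as_size_t(remaining)

print(remaining)
print(wrapped > 4)  # the "size" is now enormous, so the bound no longer protects
```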

Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agolib/utils: introduce rt_addr_n2a_rta()
Phil Sutter [Tue, 22 Mar 2016 18:35:18 +0000 (19:35 +0100)]
lib/utils: introduce rt_addr_n2a_rta()

This simple macro eases calling rt_addr_n2a() with data from an rt_attr
pointer.

Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agolib/utils: introduce format_host_rta()
Phil Sutter [Tue, 22 Mar 2016 18:35:17 +0000 (19:35 +0100)]
lib/utils: introduce format_host_rta()

This simple macro eases calling format_host() with data from an rt_attr
pointer.

Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agoutils: make rt_addr_n2a() non-reentrant by default
Phil Sutter [Tue, 22 Mar 2016 18:35:16 +0000 (19:35 +0100)]
utils: make rt_addr_n2a() non-reentrant by default

There is only a single user who needs it to be reentrant (not really,
but it's safer like this), add rt_addr_n2a_r() for it to use.

Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agomake format_host non-reentrant by default
Phil Sutter [Tue, 22 Mar 2016 18:35:15 +0000 (19:35 +0100)]
make format_host non-reentrant by default

There are only three users which require it to be reentrant; the rest are
fine without. Instead, provide a reentrant format_host_r() for users
which need it.

Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agoipaddress: colorize peer, broadcast and anycast addresses as well
Phil Sutter [Tue, 22 Mar 2016 18:35:14 +0000 (19:35 +0100)]
ipaddress: colorize peer, broadcast and anycast addresses as well

Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agocolor: introduce color helpers and COLOR_CLEAR
Phil Sutter [Tue, 22 Mar 2016 18:35:13 +0000 (19:35 +0100)]
color: introduce color helpers and COLOR_CLEAR

This adds two helper functions which map a given data field to a color,
so color_fprintf() statements don't have to be duplicated with only a
different color value depending on that data field's value. In order for
this to work in a generic way, COLOR_CLEAR has been added to serve as a
fallback default of uncolored output.

Signed-off-by: Phil Sutter <phil@nwl.cc>
8 years agoman: tc-vlan.8: Describe CONTROL option
Phil Sutter [Tue, 22 Mar 2016 14:48:39 +0000 (15:48 +0100)]
man: tc-vlan.8: Describe CONTROL option

This should be made generic and part of a common tc-actions man page.
For now, though, leave it here so as not to confuse readers of the
example which uses it.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
8 years agotc/m_vlan.c: mention CONTROL option in help text
Phil Sutter [Tue, 22 Mar 2016 14:48:38 +0000 (15:48 +0100)]
tc/m_vlan.c: mention CONTROL option in help text

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
8 years agoman: tc-skbedit.8: Elaborate a bit on TX queues
Phil Sutter [Tue, 22 Mar 2016 14:48:37 +0000 (15:48 +0100)]
man: tc-skbedit.8: Elaborate a bit on TX queues

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
8 years agoman: tc-police.8: Emphasize on the two rate control mechanisms
Phil Sutter [Tue, 22 Mar 2016 14:48:36 +0000 (15:48 +0100)]
man: tc-police.8: Emphasize on the two rate control mechanisms

As Jamal pointed out, there are two different approaches to bandwidth
measurement. Try to make this clear by separating them in synopsis and
also documenting the way to fine-tune avrate.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
8 years agoman: tc-mirred.8: Reword man page a bit, add generic mirror example
Phil Sutter [Tue, 22 Mar 2016 14:48:35 +0000 (15:48 +0100)]
man: tc-mirred.8: Reword man page a bit, add generic mirror example

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
8 years agoman: tc-csum.8: Add an example
Phil Sutter [Tue, 22 Mar 2016 14:48:34 +0000 (15:48 +0100)]
man: tc-csum.8: Add an example

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
8 years agotc: connmark, pedit: Rename BRANCH to CONTROL
Phil Sutter [Tue, 22 Mar 2016 14:48:33 +0000 (15:48 +0100)]
tc: connmark, pedit: Rename BRANCH to CONTROL

As Jamal suggested, BRANCH is the wrong name, as these keywords go
beyond simple branch control - e.g. loops are possible, too. Therefore
rename the non-terminal to CONTROL instead which should be more
appropriate.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
8 years agodoc/tc-filters.tex: Drop overly subjective paragraphs
Phil Sutter [Tue, 22 Mar 2016 14:48:32 +0000 (15:48 +0100)]
doc/tc-filters.tex: Drop overly subjective paragraphs

Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
8 years agotestsuite: add a test for tc pedit action
Phil Sutter [Tue, 22 Mar 2016 14:16:24 +0000 (15:16 +0100)]
testsuite: add a test for tc pedit action

This is not a full test, since kernel functionality is not actually
tested. It only checks that the values the kernel returns when listing
the action are what one expects them to be.

Since this test succeeded on both a little-endian and a big-endian
system, it shows that any endianness issues have been resolved in
tc/p_ip.c at least.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
8 years agotc: pedit: Fix raw op
Phil Sutter [Tue, 22 Mar 2016 14:16:23 +0000 (15:16 +0100)]
tc: pedit: Fix raw op

The retain value was wrong for u16 and u8 types.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
8 years agotc: pedit: Fix for big-endian systems
Phil Sutter [Tue, 22 Mar 2016 14:16:22 +0000 (15:16 +0100)]
tc: pedit: Fix for big-endian systems

This was tricky to get right:
- The 'stride' value used for 8 and 16 bit values must behave inversely
  to the value's intra-word offset to work correctly with the big-endian
  data that act_pedit is editing.
- The 'm' array's values are in host byte order, so they have to be
  converted as well (and the ordering was just inverse, for some
  reason).
- The only sane way of getting this right is to manipulate value/mask in
  host byte order and convert the output.
- TIPV4 (i.e. 'munge ip src/dst') had its own pitfall: the address
  parser converts to network byte order automatically. This patch fixes
  this by converting it back before calling pack_key32, which is a hack
  but at least does not require implementing a completely separate code
  flow.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
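
The approach in the list above (build value and mask in host byte order,
convert once on output, and let the shift shrink as the intra-word offset
grows) can be sketched as follows. This is an illustrative reconstruction
under assumptions, not the code from tc/p_ip.c or tc/m_pedit.c: struct key32
and pack_field() are hypothetical names, and the mask is assumed to hold the
bits to retain.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical 32-bit edit key, both members in network byte order. */
struct key32 {
	uint32_t mask;	/* bits of the old word to retain */
	uint32_t val;	/* bits to set */
};

/*
 * Place a 1- or 2-byte value at byte offset 'off' inside a 32-bit word.
 * All arithmetic is done in host byte order; only the final result is
 * converted, so the same code works on little- and big-endian hosts.
 */
int pack_field(struct key32 *k, uint32_t value, unsigned int width,
	       unsigned int off)
{
	unsigned int shift;
	uint32_t field;

	if ((width != 1 && width != 2) || off + width > 4)
		return -1;

	/* Offset 0 is the most significant byte on the wire, so the
	 * shift decreases as the offset increases. */
	shift = (4 - width - off) * 8;
	field = (width == 1 ? 0xffu : 0xffffu) << shift;

	k->val = htonl((value << shift) & field);
	k->mask = htonl(~field);
	return 0;
}
```

For a 16-bit value at offset 0 this yields a shift of 16, and at offset 2 a
shift of 0, which is the inverse relationship the first bullet describes.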
8 years agotc/p_ip.c: Minor coding style cleanup
Phil Sutter [Tue, 22 Mar 2016 14:16:21 +0000 (15:16 +0100)]
tc/p_ip.c: Minor coding style cleanup

Break overlong function definitions and remove one extraneous
whitespace.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
8 years agonetconf: add support for ignore route attribute
Zhang Shengju [Mon, 21 Mar 2016 19:16:25 +0000 (12:16 -0700)]
netconf: add support for ignore route attribute

Add support for the ignore route attribute, and refine the code to use
the rta_getattr_* functions to get attribute values.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
8 years agoman: update netconf manual for new attributes
Zhang Shengju [Tue, 15 Mar 2016 02:32:12 +0000 (02:32 +0000)]
man: update netconf manual for new attributes

Update this manual to add attributes proxy_neigh and
ignore_routes_with_linkdown.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
8 years agonetconf: replace macro with a function
Stephen Hemminger [Mon, 21 Mar 2016 19:13:57 +0000 (12:13 -0700)]
netconf: replace macro with a function

The number of casts in the macro was excessive.

8 years agoupdate kernel headers to 4.6 (pre rc1)
Stephen Hemminger [Mon, 21 Mar 2016 19:02:32 +0000 (12:02 -0700)]
update kernel headers to 4.6 (pre rc1)

8 years agomisc: fix style issues
Stephen Hemminger [Mon, 21 Mar 2016 18:56:36 +0000 (11:56 -0700)]
misc: fix style issues

More checkpatch spring cleaning

8 years agobridge: code cleanup
Stephen Hemminger [Mon, 21 Mar 2016 18:56:01 +0000 (11:56 -0700)]
bridge: code cleanup

Use checkpatch auto fix to clean up lingering style issues

8 years agoip: code cleanup
Stephen Hemminger [Mon, 21 Mar 2016 18:52:19 +0000 (11:52 -0700)]
ip: code cleanup

Run all the ip code through checkpatch and have it fix the obvious stuff.

8 years agotc: code cleanup
Stephen Hemminger [Mon, 21 Mar 2016 18:48:36 +0000 (11:48 -0700)]
tc: code cleanup

Use checkpatch to fix whitespace and other style issues.

8 years agotc: q_{codel,fq_codel}: add missing space in help text
Luca Lemmo [Wed, 16 Mar 2016 16:56:14 +0000 (17:56 +0100)]
tc: q_{codel,fq_codel}: add missing space in help text

Signed-off-by: Luca Lemmo <luca@linux.com>
8 years agotc: f_u32: trivial coding style cleanups
Luca Lemmo [Wed, 16 Mar 2016 16:56:13 +0000 (17:56 +0100)]
tc: f_u32: trivial coding style cleanups

Signed-off-by: Luca Lemmo <luca@linux.com>
8 years agotc: f_u32: add missing spaces around operators
Luca Lemmo [Wed, 16 Mar 2016 16:56:12 +0000 (17:56 +0100)]
tc: f_u32: add missing spaces around operators

Signed-off-by: Luca Lemmo <luca@linux.com>
8 years agobridge: mdb: add support for extended router port information
Nikolay Aleksandrov [Mon, 14 Mar 2016 10:04:46 +0000 (11:04 +0100)]
bridge: mdb: add support for extended router port information

Recently a new temp router port mode was added, and with it the dumped
information was extended, similar to what was done for mdb entries. This
patch adds support for dumping the new information via the "-s" switch.
Example:
$ bridge -d -s mdb show
dev br0 port eth1 grp ff02::1:ffbf:5716 temp 234.39
dev br0 port eth1 grp 239.0.0.2 temp  97.17
dev br0 port eth1 grp 239.0.0.3 temp 105.36
router ports on br0: eth1    0.00 permanent
router ports on br0: eth2  254.87 temp

It also updates the bridge man page.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>