link dump filter

This patch avoids a full link wilddump request when the user has specified
a single link. It uses RTM_GETLINK without the NLM_F_DUMP flag.

This helps on a system with a large number of interfaces.

This patch currently only uses the link ifindex in the filter.
Hoping to provide a subsequent kernel patch to do link dump filtering on
other attributes in the kernel.
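
As a rough illustration of the difference between a dump and a single-link request, the sketch below packs an RTM_GETLINK message by hand. The helper name build_getlink is hypothetical and the sketch only builds the bytes; it is not how iplink_get itself is written.

```python
import struct

# Values from <linux/netlink.h> and <linux/rtnetlink.h>.
NLM_F_REQUEST = 0x01
NLM_F_ROOT = 0x100
NLM_F_MATCH = 0x200
NLM_F_DUMP = NLM_F_ROOT | NLM_F_MATCH
RTM_GETLINK = 18
AF_UNSPEC = 0


def build_getlink(ifindex=0, seq=1):
    """Pack nlmsghdr + ifinfomsg for RTM_GETLINK (hypothetical helper).

    ifindex == 0 requests a dump of every link; a non-zero ifindex
    requests that one link and leaves NLM_F_DUMP unset.
    """
    flags = NLM_F_REQUEST
    if ifindex == 0:
        flags |= NLM_F_DUMP
    # struct ifinfomsg: family, pad, type, index, flags, change
    body = struct.pack("=BBHiII", AF_UNSPEC, 0, 0, ifindex, 0, 0)
    # struct nlmsghdr: len, type, flags, seq, pid
    header = struct.pack("=IHHII", 16 + len(body), RTM_GETLINK, flags, seq, 0)
    return header + body
```

With a non-zero ifindex the kernel only has to answer for one interface, which is where the saving on systems with many interfaces comes from.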

In iplink_get, to be safe, this patch currently sets the answer buffer
size to the max size that libnetlink rtnl_talk can copy. The current api
does not seem to provide a way to indicate the answer buf size.

changelog from RFC to v1:
- incorporated comments from stephen (fixed comment and fixed if/else block)

changelog from v1 to v2:
- fix whitespace errors

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>

iplink: macvtap: fix man page

This patch adds description about macvtap to ip-link.8 man page.

Signed-off-by: Rami Rosen <ramirose@gmail.com>

fix ip tunnel for vti tunnels with ikey

Consider the following command:

ip tunnel add mode vti remote 12.0.0.1 local 12.0.0.3 ikey 15

i_flags will be GRE_KEY|VTI_ISVTI. So, in order to distinguish between ipip and
vti we have to check just VTI_ISVTI bit, not the equality of i_flags and
VTI_ISVTI.
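
A minimal sketch of why the equality test fails once a key is set. The numeric flag values below are illustrative assumptions (the real definitions live in the kernel's if_tunnel.h), and the two helper names are hypothetical.

```python
# Illustrative values only; see <linux/if_tunnel.h> for the real definitions.
GRE_KEY = 0x2000
VTI_ISVTI = 0x0001


def is_vti_by_equality(i_flags):
    # Old check: only true when VTI_ISVTI is the sole flag set.
    return i_flags == VTI_ISVTI


def is_vti_by_bit(i_flags):
    # Fixed check: true whenever the VTI_ISVTI bit is set,
    # regardless of other flags such as GRE_KEY.
    return bool(i_flags & VTI_ISVTI)
```

With ikey given, i_flags is GRE_KEY|VTI_ISVTI, so the equality check reports "not vti" and the tunnel is treated as ipip.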

* Note, that there also was a bug in ip_tunnel/ip_vti, see
commit 7c8e6b9c281(ip_vti: Fix 'ip tunnel add' with 'key' parameters),
https://lkml.org/lkml/2014/6/7/125.
Even patched iproute could be unable to create vti tunnels with non-zero keys.

1) Unpatched iproute2:
[root@vm ~]# ip tunnel show
[root@vm ~]# lsmod | egrep '(ipip|vti)'
[root@vm ~]# ip tunnel add mode vti ikey 1
[root@vm ~]# lsmod | egrep '(ipip|vti)'
ipip                    4197  0
tunnel4                 1659  1 ipip
ip_tunnel               9295  1 ipip
[root@vm ~]# ip tunnel show
tunl0: ip/ip  remote any  local any  ttl inherit
[root@vm ~]# ip tunnel add mode vti remote 1.2.3.4 ikey 2
[root@vm ~]# ip tunnel show
ipip0: ip/ip  remote 1.2.3.4  local any  ttl inherit
tunl0: ip/ip  remote any  local any  ttl inherit
[root@vm ~]# lsmod | egrep '(ipip|vti)'
ipip                    4197  0
tunnel4                 1659  1 ipip
ip_tunnel               9295  1 ipip

# ipip tunnels are created instead of vti

2) Patched iproute2:
[root@vm ~]# ip tunnel show
[root@vm ~]# lsmod | egrep '(ipip|vti)'
[root@vm ~]# ip tunnel add mode vti ikey 1
[root@vm ~]# lsmod | egrep '(ipip|vti)'
ip_vti                  5258  0
ip_tunnel               9295  1 ip_vti
[root@vm ~]# ip tunnel show
vti0: ip/ip  remote any  local any  ttl inherit  ikey 1  okey 0
ip_vti0: ip/ip  remote any  local any  ttl inherit  nopmtudisc key 0
[root@vm ~]# ip tunnel add mode vti remote 1.2.3.4 ikey 2
[root@vm ~]# ip tunnel show
vti0: ip/ip  remote any  local any  ttl inherit  ikey 1  okey 0
vti1: ip/ip  remote 1.2.3.4  local any  ttl inherit  ikey 2  okey 0
ip_vti0: ip/ip  remote any  local any  ttl inherit  nopmtudisc key 0

# Vti tunnels are created as expected
# * If you have unpatched kernel your vti tunnels will have ikey == okey == 0

Same story exists with ip tunnel show/del with non-zero [io]key: requests are
routed to tunl0 instead of ip_vti0.

Signed-off-by: Dmitry Popov <ixaphire@qrator.net>

ipnetns: fixed typo "seting" -> "setTing"

Signed-off-by: Vasily Averin <vvs@openvz.org>

man: token: fix couple of typos

Not sure how these typos slipped in back then, I suspect
too much coffee. ;) So let's fix them up properly.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>

ip: Added missing usage for netconf object

ip: add nlmon as a device type to help message

Though nlmon device can be added, it was not listed
in the output of "ip link help".

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>

ip: check for missing dev arg when doing VF rate

The new VF rate code was not handling the case where the device is not specified.
Caught by a GCC warning about an uninitialized variable.

ip: add paren to silence warning

Gcc doesn't like mixed || and && in same conditional.

Update kernel headers to 3.16-rc5

Merge branch 'net-next'

v3.15.0

bridge: Add master device name to bridge fdb show

This patch adds master dev name from NDA_MASTER netlink attribute
to bridge fdb show output

Currently, iproute2 tries to print 'master' in the output if NTF_MASTER
is present. But the kernel today does not set NTF_MASTER during dump
requests, which means I have never seen the iproute2 bridge command print 'master' at all.
This patch overrides the NTF_MASTER flag if the NDA_MASTER attribute is present.
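
The precedence described above can be sketched as follows. The function fdb_flag_words is a hypothetical stand-in for the printing logic, and the flag values are assumed to match the kernel's neighbour.h.

```python
# Assumed to match <linux/neighbour.h>.
NTF_SELF = 0x02
NTF_MASTER = 0x04


def fdb_flag_words(ndm_flags, master_name=None):
    """Return the words printed after 'dev X' for one fdb entry.

    master_name stands for the NDA_MASTER attribute resolved to a
    device name; None means the attribute was absent.
    """
    words = []
    if ndm_flags & NTF_SELF:
        words.append("self")
    if master_name is not None:
        # The attribute takes precedence over the bare flag.
        words += ["master", master_name]
    elif ndm_flags & NTF_MASTER:
        words.append("master")
    return words
```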

Example output:

before this patch:
# bridge fdb show
44:38:39:00:27:ba dev bond2.2003 permanent
44:38:39:00:27:bb dev bond4.2003 permanent
44:38:39:00:27:bc dev bond2.2004 permanent

After this patch:
# bridge fdb show
44:38:39:00:27:ba dev bond2.2003 master br-2003 permanent
44:38:39:00:27:bb dev bond4.2003 master br-2003 permanent
44:38:39:00:27:bc dev bond2.2004 master br-2004 permanent

For comparison with the above, below is the output for NTF_SELF today,
# bridge fdb show
33:33:00:00:00:01 dev eth0 self permanent
01:00:5e:00:00:01 dev eth0 self permanent
33:33:ff:00:01:cc dev eth0 self permanent

If change in output is a concern, 'master' can be put at the end of the fdb
output line or made optional with -d[etails] option.

change from v1 to v2:
    use 'bridge' instead of 'master' in fdb show output

change from v2 to v3:
    use 'master' instead of 'bridge' in fdb show output
    (master could also be a vxlan device)

Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>

Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool

o "min_tx_rate" option has been added for minimum Tx rate. Hence, for
  consistent naming, "max_tx_rate" option has been introduced for maximum
  Tx rate.

o Change in v2: "rate" can be used along with "max_tx_rate".
  When both are specified, "max_tx_rate" should override.

o Change in v3:
  * IFLA_VF_RATE: When IFLA_VF_RATE is used, and user has given only one of
    min_tx_rate or max_tx_rate, reading of previous rate limits is done in
    userspace instead of in kernel space before ndo_set_vf_rate.

  * IFLA_VF_TX_RATE: When IFLA_VF_TX_RATE is used, min_tx_rate is always read
    in kernel space. This takes care of below scenarios:
    (1) when old tool sends "rate" but kernel is new (expects min and max)
    (2) when new tool sends only "rate" but kernel is old (expects only "rate")

o Change in v4 as suggested by Stephen Hemminger:
  * As per iproute policy, input and output formats should match. Changing display
    of max_tx_rate and min_tx_rate options accordingly.
./ip/ip link show p3p1
8: p3p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
        link/ether 00:0e:1e:16:ce:40 brd ff:ff:ff:ff:ff:ff
        vf 0 MAC 2a:18:8f:4d:3d:d4, tx rate 700 (Mbps), max_tx_rate 700Mbps, min_tx_rate 200Mbps
        vf 1 MAC 72:dc:ba:f9:df:fd

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>

Update to current net-next kernel headers

Update sanitized headers

iproute2: utils: change hexstring_n2a and hexstring_a2n to not work with ":"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>

iproute2: arpd: use ll_addr_a2n and ll_addr_n2a

Signed-off-by: Jiri Pirko <jiri@resnulli.us>

fq: allow options of fair queue set to ~0U

Some fair queue options cannot be set to ~0U. As a result, maxrate
cannot be reset to unlimited, because unlimited is expressed as ~0U.
Allow these options to be set to ~0U.
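
One way to picture the problem, assuming ~0U served as the "option not given" marker: a value equal to the marker can never be sent. The sketch below contrasts that with tracking presence separately. Both helpers are hypothetical, and the message does not say that the patch is implemented this way.

```python
UNSET = 0xFFFFFFFF  # ~0U used as an "option not given" marker (assumption)


def attrs_with_sentinel(maxrate=UNSET):
    # ~0U doubles as "not given", so an explicit ~0U is silently dropped.
    return {} if maxrate == UNSET else {"maxrate": maxrate}


def attrs_with_flag(maxrate=None):
    # Presence is tracked separately (None), so every 32-bit value,
    # including ~0U, can be passed on.
    return {} if maxrate is None else {"maxrate": maxrate & 0xFFFFFFFF}
```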

Tested by the following command:
# tc qdisc add dev eth4 root handle 1: fq limit 2000 flow_limit 200 maxrate 100mbit quantum 2000 initial_quantum 1600
# tc -s -d qdisc show
qdisc fq 1: dev eth4 root refcnt 2 limit 2000p flow_limit 200p buckets 1024 quantum 2000 initial_quantum 1600 maxrate 100Mbit
Sent 1492 bytes 10 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
  1 flows (0 inactive, 0 throttled)
  0 gc, 0 highprio, 0 throttled

# tc qdisc change dev eth4 root handle 1: fq limit 4294967295 flow_limit 4294967295 maxrate 34359738360 quantum 4294967295 initial_quantum 4294967295
# tc -s -d qdisc show
qdisc fq 1: dev eth4 root refcnt 2 limit 4294967295p flow_limit 4294967295p buckets 1024 quantum 4294967295 initial_quantum 4294967295
Sent 38372 bytes 216 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
  2 flows (1 inactive, 0 throttled)
  0 gc, 2 highprio, 7 throttled

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

bridge: Make filter_index match in signedness

Michael Tautschnig wrote:

During a rebuild [...]. Please note that we use our research
compiler tool-chain (using tools from the cbmc package), which permits extended
reporting on type inconsistencies at link time.

[...]
gcc bridge.o fdb.o monitor.o link.o mdb.o vlan.o ../lib/libnetlink.a ../lib/libutil.a ../lib/libnetlink.a ../lib/libutil.a -o bridge
file link.c line 18: error: conflicting types for variable "filter_index"
old definition in module fdb file fdb.c line 29
signed int
new definition in module link file link.c line 18
unsigned int
<builtin>: recipe for target 'bridge' failed
make[3]: *** [bridge] Error 64
make[3]: Leaving directory '/srv/jenkins-slave/workspace/sid-goto-cc-iproute2/iproute2-3.14.0/bridge'
Makefile:45: recipe for target 'all' failed

While practical constraints may limit the value of filter_index to remain within
the bounds of a positive signed int, there is certainly no such guarantee here.
Also, a plain majority vote suggests that this is really just a wrong declaration
in link.c as several declarations of filter_index as signed int exist.

[...]

My followup on this was:

I think the majority is wrong.

filter_index is assigned exclusively from if_nametoindex or ll_name_to_index
which both return unsigned int.

Changing it to unsigned everywhere seems better.
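
To see why the declarations should agree, the sketch below reinterprets the same 32-bit value as signed and as unsigned using ctypes. It assumes a platform where C int is 32 bits wide; the helper names are hypothetical.

```python
import ctypes


def as_signed(index):
    # What a module declaring "int filter_index" would read.
    return ctypes.c_int(index).value


def as_unsigned(index):
    # What a module declaring "unsigned int filter_index" would read.
    return ctypes.c_uint(index).value


# An index above INT_MAX is legal for an unsigned int return value.
big_index = 2**31 + 5
```

For small indexes both views agree, which is why the mismatch goes unnoticed in practice.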

This has been minimally tested by using the bridge tool
to add vids and showing available vids on different devices.

Reported-by: Michael Tautschnig <mt@debian.org>
Signed-off-by: Andreas Henriksson <andreas@fatal.se>

do not exit silently when link is not found

When we create a tunnel on top of a link and the link specified
in cmdline doesn't exist, an error message should be shown.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

ss: display pacing_rate/max_pacing_rate

Since linux-3.15, kernel exports tcpi_pacing_rate and
tcpi_max_pacing_rate in tcp_info

Add TCP pacing_rate information on ss -i output :

lpaa23:~# ./ss -ti dst 10.246.7.151
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 325800 10.246.7.151:57614
10.246.7.152:46811
cubic wscale:7,7 rto:201 rtt:0.081/0.006 mss:1448 cwnd:90 ssthresh:63
send 12871.1Mbps pacing_rate 15397.8Mbps unacked:90 retrans:0/305
rcv_space:29200

If SO_MAX_PACING_RATE is set on the socket, we add /max_pacing_rate as
in :

... pacing_rate 1570.5Mbps/2.0Gbps ...

Signed-off-by: Eric Dumazet <edumazet@google.com>

Fix non-literal string format warnings

The lnstat program was building a format string, then using it.
This was safe, but simpler to just use format character * to
get width.
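
The same idea in a small sketch: Python's printf-style formatting also accepts * for the width, so the format string can stay a literal. The helper names are hypothetical and this is not the lnstat code.

```python
def pad_built_format(text, width):
    # Format string assembled at run time (what triggers the warning in C).
    fmt = "%" + str(width) + "s"
    return fmt % text


def pad_star(text, width):
    # Literal format string; the width is passed as an argument via '*'.
    return "%*s" % (width, text)
```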

fix format warnings

Enable format security, and fix the warning caused by printing
with string for format.

bridge: Add learning and flood support

Add ability to control learning and flood flags on bridge
ports.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>

Fixed 'tc qdisc show' for tbf when latency<0

When limit<burst, latency becomes <0, for example:
# tc qdisc add dev eth0 root handle 1: tbf limit 100K burst 256K rate 256kbit
# tc qdisc show
qdisc tbf 1: dev eth0 root refcnt 2 rate 256Kbit burst 256Kb lat 4290.0s

If latency<0 there is no reason to show it. Limit will be printed instead of
latency when latency<0:
# tc qdisc show
qdisc tbf 1: dev eth0 root refcnt 2 rate 256Kbit burst 256Kb limit 100Kb
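
A worked check of the "lat 4290.0s" figure, under two assumptions that are not stated in the message: latency is derived as (limit - burst) / rate, and the negative result is displayed through an unsigned 32-bit microsecond value. The helper names are hypothetical.

```python
def tbf_latency_us(limit_bytes, burst_bytes, rate_bytes_per_s):
    # Assumed relation: latency = limit/rate - burst/rate, in microseconds.
    return 1_000_000 * (limit_bytes - burst_bytes) // rate_bytes_per_s


def as_u32(value):
    # Reinterpret a (possibly negative) integer as unsigned 32-bit.
    return value & 0xFFFFFFFF


limit = 100 * 1024          # "limit 100K"
burst = 256 * 1024          # "burst 256K"
rate = 256_000 // 8         # "rate 256kbit" in bytes per second

latency = tbf_latency_us(limit, burst, rate)       # negative
shown = "%.1fs" % (as_u32(latency) / 1_000_000)    # wrapped value
```

Under these assumptions the latency is about -5 seconds, and the wrapped value matches the 4290.0s printed before the fix.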

Signed-off-by: Sergey V. Lobanov <sergey@lobanov.in>

iplink: can: fix help text and man page

Controller Area Network (CAN) interfaces are physical network interfaces.
They can't be 'created' like software devices by 'ip link add type can'.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>

iproute2: ipa: show port id

Signed-off-by: Jiri Pirko <jiri@resnulli.us>

actions: correctly report the number of actions flushed

This also fixes a long standing bug of not sanely reporting the
action chain ordering

Sample scenario test

on window 1(event window):
run "tc monitor" and observe events

on window 2:
sudo tc actions add action drop index 10
sudo tc actions add action ok index 12
sudo tc actions ls action gact
sudo tc actions flush action gact

See the event window reporting two entries
(doing another listing should show empty generic actions)

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

actions: keyword flowid or classid terminates action pipeline

scenario testcase:

TC="sudo ./tc/tc"
DEV="dev eth0"
$TC qdisc del $DEV ingress
$TC qdisc add $DEV ingress
$TC filter add $DEV parent ffff: protocol ip u32 match ip src 10.0.0.0/24 action police rate 6Mbit burst 6Mbit drop flowid :1
$TC filter add $DEV parent ffff: protocol ip u32 match ip dst 10.0.0.0/24 action police rate 1Gbit burst 1Gbit pass flowid :1
$TC -s filter ls $DEV parent ffff: protocol ip
$TC qdisc del $DEV ingress
$TC qdisc add $DEV ingress
$TC filter add $DEV parent ffff: protocol ip u32 match ip src 10.0.0.0/24 flowid 1:1 action police rate 6Mbit burst 6Mbit drop
$TC filter add $DEV parent ffff: protocol ip u32 match ip dst 10.0.0.0/24 flowid 1:2 action police rate 1Gbit burst 1Gbit pass

$TC -s filter ls $DEV parent ffff: protocol ip
$TC qdisc del $DEV ingress
$TC qdisc add $DEV ingress
$TC filter add $DEV parent ffff: protocol ip pref 10 \
u32 match ip protocol 1 0xff \
flowid 1:10 \
action skbedit mark 11 \
action police rate 10kbit burst 10k pipe index 1 \
action skbedit mark 12 \
action police rate 20kbit burst 20k pipe index 2 \
action mirred egress mirror dev dummy0

$TC -s filter ls $DEV parent ffff: protocol ip
$TC qdisc del $DEV ingress
$TC qdisc add $DEV ingress
$TC filter add $DEV parent ffff: protocol ip pref 10 \
u32 match ip protocol 1 0xff \
action skbedit mark 11 \
action police rate 10kbit burst 10k pipe index 1 \
action skbedit mark 12 \
action police rate 20kbit burst 20k pipe index 2 \
action mirred egress mirror dev dummy0 \
flowid 1:10

$TC -s filter ls $DEV parent ffff: protocol ip

Reported-by: Seann Herdejurgen <seann@herdejurgen.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

Remove unnecessary debug statement

Reported-by: Seann Herdejurgen <seann@herdejurgen.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

iproute2: various header include fixes for compiling with musl libc

We need limits.h for LONG_MIN and LONG_MAX, sys/param.h for MIN and
sys/select.h for struct timeval.

This fixes the following compile errors with musl libc:

f_bpf.c: In function 'bpf_parse_opt':
f_bpf.c:181:12: error: 'LONG_MIN' undeclared (first use in this function)
   if (h == LONG_MIN || h == LONG_MAX) {
            ^
...

tc_util.o: In function `print_tcstats2_attr':
tc_util.c:(.text+0x13fe): undefined reference to `MIN'
tc_util.c:(.text+0x1465): undefined reference to `MIN'
tc_util.c:(.text+0x14ce): undefined reference to `MIN'
tc_util.c:(.text+0x154c): undefined reference to `MIN'
tc_util.c:(.text+0x160a): undefined reference to `MIN'
tc_util.o:tc_util.c:(.text+0x174e): more undefined references to `MIN' follow
...

tc_stab.o: In function `print_size_table':
tc_stab.c:(.text+0x40f): undefined reference to `MIN'
...

fdb.c:247:30: error: 'ULONG_MAX' undeclared (first use in this function)
        (vni >> 24) || vni == ULONG_MAX)
                              ^

lnstat.h:28:17: error: field 'last_read' has incomplete type
  struct timeval last_read;  /* last time of read */
                 ^

Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>

fix print_ipt: segfault if more than one filter with action -j MARK.

BUG: tc filter show ... produce a segmentation fault if more than one
filter rule with action -j MARK exists.

Reason: In print_ipt(...) xtables is initialized with a
pointer to the static struct tcipt_globals at xtables_init_all().
Later on the fields .opts and .options_offset of tcipt_globals are
modified. The call of xtables_free_opts(1) at the end of print_ipt(...)
does not restore the original values of tcipt_globals for the
modified fields. It only frees some allocated memory and sets
.opts to NULL. This leads to a segmentation fault when print_ipt()
is called for the next filter rule with action -j MARK.

Fix: Clone tcipt_globals on the stack as tmp_tcipt_globals and
use it instead of tcipt_globals, so that tcipt_globals is not
modified.
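
The failure mode and the fix can be mimicked with a small sketch. TEMPLATE is a hypothetical stand-in for the static tcipt_globals, and the field handling is simplified; it only shows why a per-call copy keeps the shared structure intact.

```python
import copy

# Hypothetical stand-in for the static tcipt_globals structure.
TEMPLATE = {"opts": ["base"], "option_offset": 0}


def print_rule_shared(state):
    # Modifies the structure it is given, then "frees" opts without
    # restoring the original value (opts ends up as None).
    state["opts"] = state["opts"] + ["mark"]  # TypeError if opts is None
    state["option_offset"] += 256
    state["opts"] = None


def print_rule_cloned(template):
    # Work on a per-call copy so the template is never modified.
    local = copy.copy(template)
    print_rule_shared(local)
```

Calling print_rule_shared twice on the same structure fails on the second call, while print_rule_cloned can be called any number of times.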

Signed-off-by: Andreas Greve <andreas.greve@a-greve.de>

Document VF link state control in the ip-link man page

Document the support added by commit 07fa9c1 "Add VF link state
control" in the ip-link man page.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>

ipnetns: fix misprint in an error message

Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>

TBF man page fix (tbf is not classless)

TBF is not a classless qdisc. The man page is corrected and an example
describing the use of an inner qdisc is added.

Signed-off-by: Sergey V. Lobanov <sergey@lobanov.in>

Fix Linux priority and band for TOS==0x2 (man 8 tc-prio)

Due to commit 4a2b9c3(in Linux kernel) Linux priority(skb->priority)
changed for TOS==0x2

Signed-off-by: Sergey V. Lobanov <sergey@lobanov.in>

Whitespace and indentation cleanup

Need to go over whole source and scrub..

iproute2: show counter of carrier on<->off transitions

This patch allows displaying the current counter of carrier on<->off
transitions (IFLA_CARRIER_CHANGES, see kernel commit "expose number of
carrier on/off changes"):

  ip -s -s link show dev eth0
  32: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...
    link/ether ................. brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    125552461  258881   0       0       0       10150
    RX errors: length  crc     frame   fifo    missed
               0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    40426119   224444   0       0       0       0
    TX errors: aborted fifo    window  heartbeat transns
               0        0       0       0        3

Tested:
  - kernel with patch "net-sysfs: expose number of carrier on/off
    changes": see "transns" column above
  - kernel without the patch: "transns" not displayed (as expected)

Signed-off-by: David Decotigny <decot@googlers.com>

support for Heavy Hitter Filter (HHF) qdisc

$tc qdisc add dev eth0 hhf help
Usage: ... hhf [ limit PACKETS ] [ quantum BYTES]
               [ hh_limit NUMBER ]
               [ reset_timeout TIME ]
               [ admit_bytes BYTES ]
               [ evict_timeout TIME ]
               [ non_hh_weight NUMBER ]

$tc -s -d qdisc show dev eth0
qdisc hhf 8005: root refcnt 32 limit 1000p quantum 1514 hh_limit 2048
reset_timeout 40.0ms admit_bytes 131072 evict_timeout 1.0s non_hh_weight 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
    drop_overlimit 0 hh_overlimit 0 tot_hh 0 cur_hh 0

HHF qdisc parameters:
- limit: max number of packets in qdisc (default 1000)
- quantum: max deficit per RR round (default 1 MTU)
- hh_limit: max number of HHs to keep states (default 2048)
- reset_timeout: time to reset HHF counters (default 40ms)
- admit_bytes: counter thresh to classify as HH (default 128KB)
- evict_timeout: threshold to evict idle HHs (default 1s)
- non_hh_weight:  DRR weight for mice (default 2)

Signed-off-by: Terry Lam <vtlam@google.com>

tc/netem: fix loss state display and p14 parsing

The display of the entire netem loss state is shown as if it
were gemodel state, as the loss state information is assigned to the
wrong pointer. Correct this by assigning the loss state to the correct
pointer.

Additionally, attempting to set netem loss state will result in
random values in the p14 state probability because the option value
passed to the kernel by tc netem is not parsed or initialized. Fix this
by supplying a default value of 0 for p14 and parsing the p14 value if
one is supplied.

Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>

iproute2: can: support CAN FD control interface

For CAN FD a new set of bittiming configuration and enabling functions for the
data section is provided by the CAN driver infrastructure.

This patch allows configuring the newly introduced CAN FD properties.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>

iproute2: can: fix indentation white spaces

When preparing a patch for CAN FD support these white space issues showed up.
Fix them in the current code to be able to provide a proper follow-up patch.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>

Update to 3.15-rc2 headers

Merge branch 'net-next'

v3.14.0

ip: officially support flag mngtmpaddr also for "ip addr del"

Kernel is being extended to support flag IFA_F_MANAGETEMPADDR also for
deletion of addresses. This will allow a userspace application to indicate
that for a global address the kernel should delete all related temporary
addresses as well.

"ip addr del" internally calls ipaddr_modify which silently accepts
any flag provided on the command line already, independent of the
actual command.
Therefore only the usage documentation needs to be extended.

Signed-off-by: Heiner Kallweit <heiner.kallweit@web.de>

ipaddress: do not add IFA_FLAGS when not necessary

commit 37c9b94ed21d5779acc23d89a4 (add support for extended ifa_flags)
introduced a regression:

# ./ip/ip addr add 192.168.0.1/24 dev eth0
RTNETLINK answers: Invalid argument

This happens because old kernels do not support the IFA_FLAGS attribute; we should
not send it unless we use flags beyond the old 8-bit .ifa_flags field.
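
A minimal sketch of that rule: flags that fit in the 8-bit header field go there, and the 32-bit attribute is only added when a newer flag is set. The helper encode_addr_flags is hypothetical, and the flag values are assumed to match the kernel's if_addr.h.

```python
# Assumed to match <linux/if_addr.h>.
IFA_F_PERMANENT = 0x80
IFA_F_MANAGETEMPADDR = 0x100
IFA_F_NOPREFIXROUTE = 0x200


def encode_addr_flags(flags):
    """Return (header ifa_flags byte, IFA_FLAGS attribute value or None)."""
    if flags <= 0xFF:
        # Fits in the old 8-bit field: no attribute, old kernels are happy.
        return flags, None
    # A newer flag is set: the attribute carries the full 32-bit value.
    return 0, flags
```

A plain "ip addr add" carries no extended flags, so no IFA_FLAGS attribute is sent and the request is accepted by old kernels again.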

Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

veth: Handle flags correctly

Flags given for a peer override the flags of the other device and are
not applied to the peer itself.

before:
# ip link add up type veth peer down multicast off
# ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 2e:5c:cd:f5:63:d2 brd ff:ff:ff:ff:ff:ff
3: veth1: <BROADCAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 72:b0:fa:1e:76:7a brd ff:ff:ff:ff:ff:ff

after:
# ip link add up type veth peer down multicast off
# ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0: <BROADCAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 6e:db:03:b3:bd:ff brd ff:ff:ff:ff:ff:ff
3: veth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
    link/ether a6:62:d9:84:f0:73 brd ff:ff:ff:ff:ff:ff

Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>

fix indentation of ip neighbour man page

Formatting was awful and unclear on ip neighbour

ipxfrm: allow to setup filter when dumping SA

It's now possible to filter SAs directly in the kernel by specifying
XFRMA_PROTO and/or XFRMA_ADDRESS_FILTER.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

Merge branch 'master' into net-next

bridge: fix reporting of IPv6 addresses

Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>

iproute: Show default type, table, proto and scope of route

In "ip route show" output unicast type, main table, boot protocol and
universe scope are hidden as default labels.

Sometimes it is helpful to show the hidden labels so that people who are not
familiar enough with the routing subsystem can map the output of "ip route show" to
the kernel source code.

With this patch "ip route show" with -d option shows the default labels.

Example of difference of output with -d option:

    $ ./ip/ip -4   route show table all dev virbr1
    ...
    192.168.121.0/28  proto kernel  scope link  src 192.168.121.1
    ...
    $ ./ip/ip -4 -d  route show table all dev virbr1
    ...
    unicast 192.168.121.0/28  table main  proto kernel  scope link  src 192.168.121.1
    ...

Signed-off-by: Masatake YAMATO <yamato@redhat.com>

htb: Move direct_qlen code part to htb_parse_opt().

The direct_qlen command option is used with qdisc operation.
It happened to be implemented in htb_parse_class_opt() which is called
with class operation.

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>

Update headers to net-next

ss: Add support for retrieving SELinux contexts

The process SELinux contexts can be added to the output using the -Z
option. Using the -z option will show the process and socket contexts (see
the man page for details).
For netlink sockets: if valid process show process context, if pid = 0
show kernel initial context, if unknown show "unavailable".

Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>

iproute2: use named constants instead of number literals to fill rtnl_rttable_hash

Signed-off-by: Masatake YAMATO <yamato@redhat.com>

iproute2: use named constants instead of number literals to fill rtnl_rtscope_tab

Signed-off-by: Masatake YAMATO <yamato@redhat.com>

iproute2: add man page for mqprio

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>

iplink_bond_slave: show mii_status only once

With "ip -d link show", bonding slave mii status is displayed
twice, once as a number and once as a name.

Fixes: 730d3f61 ("iplink: add support for bonding slave")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>

iplink_bond: fix parameter value matching

Lookup function get_index() compares argument with table entries
only up to the length of the table entry so that if an entry
with lower index is a substring of a later one, earlier entry is
used even if the argument is equal to the other. For example,

ip link set bond0 type bond xmit_hash_policy layer2+3

sets xmit_hash_policy to 0 (layer2) as this is found before
"layer2+3" can be checked.

Use strcmp() to compare whole strings instead.
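
The difference between the two comparisons can be sketched as below. The table contents and the two function names are illustrative assumptions; the prefix variant mimics comparing only up to the length of each table entry.

```python
# Illustrative table; index 0 is a prefix of the entry at index 2.
XMIT_HASH_POLICY = ["layer2", "layer3+4", "layer2+3"]


def get_index_prefix(table, arg):
    # Compares only the first len(entry) characters of arg,
    # like strncmp(arg, entry, strlen(entry)).
    for i, entry in enumerate(table):
        if arg[:len(entry)] == entry:
            return i
    return -1


def get_index_exact(table, arg):
    # Whole-string comparison, like strcmp().
    for i, entry in enumerate(table):
        if arg == entry:
            return i
    return -1
```

With the prefix comparison, "layer2+3" matches "layer2" first and yields index 0; the exact comparison yields index 2.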

v2: look for an exact match only

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>

kill spaces before tabs

Remove trailing whitespace

iplink: add support for bonding slave

Signed-off-by: Jiri Pirko <jiri@resnulli.us>

introduce support for slave info data

Signed-off-by: Jiri Pirko <jiri@resnulli.us>

iplink_bond: fix arp_all_targets parameter name in output

Name of arp_all_targets parameter in output of "ip -d link show"
is missing trailing "s".

Fixes: 63d127b0 ("iproute2: finish support for bonding attributes")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>

ss: display interface name as zone index when needed

This change enables the ss command to display the interface name as zone index
for local addresses when needed.

For this enhanced display *_diag stuff is needed.

It is based on a first version by Bernd Eckenfels.

example:
Netid  State   Recv-Q Send-Q                 Local Address:Port    Peer Address:Port
udp    UNCONN  0      0      fe80::20c:29ff:fe1f:7406%eth1:9999              :::*
udp    UNCONN  0      0                                 :::domain            :::*
tcp    LISTEN  0      3                                 :::domain            :::*
tcp    LISTEN  0      5      fe80::20c:29ff:fe1f:7410%eth2:99                :::*

Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>

iproute: Fix Netid value for multi-families output

When requesting simultaneous output of TCP and UDP sockets
the netid field shows "tcp" always.

[root@xemvm1 iproute2]# ./misc/ss -a -tu
Netid State      Recv-Q Send-Q                            Local Address:Port                                Peer Address:Port
tcp   UNCONN     0      0                                             *:32713                                          *:*
tcp   UNCONN     0      0                                             *:bootpc                                         *:*
tcp   UNCONN     0      0                                            :::57879                                         :::*
tcp   LISTEN     0      128                                           *:ssh                                            *:*
tcp   ESTAB      0      48                                      1.2.3.5:ssh                                      1.2.3.4:45826
tcp   ESTAB      0      0                                       1.2.3.5:ssh                                      1.2.3.4:45814
tcp   LISTEN     0      128                                          :::ssh                                           :::*

While the 1st 3 sockets are UDP ones:

[root@xemvm1 iproute2]# ./misc/ss -a -u
State       Recv-Q Send-Q                              Local Address:Port                                  Peer Address:Port
UNCONN      0      0                                               *:32713                                            *:*
UNCONN      0      0                                               *:bootpc                                           *:*
UNCONN      0      0                                              :::57879                                           :::*

Reported-by: François-Xavier Le Bail <fx.lebail@yahoo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Tested-by: François-Xavier Le Bail <fx.lebail@yahoo.com>

tcp_metrics: Allow removal based on the source-IP

This patch allows adding the source-IP attribute to the netlink-command.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>

tcp_metrics: Display source-address

This patch allows to display the source-IP.
stype will be used in the next patch that allows to remove based on the
source-IP.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>

tcp_metrics: Rename addr to daddr and add local variable

Renaming addr to daddr, because we will introduce saddr later.

The local variable is necessary to store RTA_PAYLOAD(a) temporarily.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>

pedit: do not print debugging information by default

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

genl: fix a typo in help message of ctrl

Signed-off-by: Masatake YAMATO <yamato@redhat.com>

Update kernel headers to 3.13-rc2

Merge branch 'net-next-for-3.13'

PIE: Add man page

This adds the manpage for PIE: Proportional Integral controller Enhanced AQM
scheme.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Vijay Subramanian <vijaynsu@cisco.com>
CC: Dave Taht <dave.taht@bufferbloat.net>

netem: add 64bit rates support

netem supports 64bit rates starting from linux-3.13.
Add 64bit rate support to the tc tools.

tc qdisc show dev eth0
qdisc netem 1: dev eth4 root refcnt 2 limit 1000 rate 35Gbit

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Eric Dumazet <edumazet@google.com>

tbf: support sending burst/mtu to kernel directly

To avoid loss when transforming burst to buffer in userspace, send
burst/mtu to kernel directly.

Kernel commit 2e04ad424b("sch_tbf: add TBF_BURST/TBF_PBURST attribute")
makes the kernel able to handle burst/mtu.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

add support for IFA_F_NOPREFIXROUTE

Signed-off-by: Thomas Haller <thaller@redhat.com>

add support for IFA_F_MANAGETEMPADDR

Signed-off-by: Jiri Pirko <jiri@resnulli.us>

Update headers files from net-next

Revert "vxlan: remove dstport option"

This reverts commit 92deabcf29e1b3df99230e89acc84fd8de53c87f.

Conflicts:
ip/iplink_vxlan.c

Allow setting dst_port in 3.12

iproute2: finish support for bonding attributes

Add support for bonding attributes just added to net-next.
On set, allow string or number value for enumerated attributes.
On show, always use the string value for the attribute.
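
A minimal sketch of the set/show rule above, using the bonding mode as the
enumerated attribute. The function names are hypothetical and the code is
an illustration of the idea, not the parser from the patch.

```python
# Bonding modes in numeric order (index == mode number).
BOND_MODES = [
    "balance-rr", "active-backup", "balance-xor", "broadcast",
    "802.3ad", "balance-tlb", "balance-alb",
]


def parse_mode(arg):
    """On set: accept either the mode name or its number."""
    if arg in BOND_MODES:
        return BOND_MODES.index(arg)
    try:
        value = int(arg)
    except ValueError:
        raise ValueError("invalid mode: %r" % arg)
    if not 0 <= value < len(BOND_MODES):
        raise ValueError("mode out of range: %r" % arg)
    return value


def show_mode(value):
    """On show: always print the string form."""
    return BOND_MODES[value]


if __name__ == "__main__":
    print(show_mode(parse_mode("4")))
    print(show_mode(parse_mode("active-backup")))
```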

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>

Merge branch 'master' into net-next-for-3.13

ss: add unix_seqpacket to the help message and the man page

Signed-off-by: Masatake YAMATO <yamato@redhat.com>

ss: enable query by type in unix domain related socket

This patch enables the -A unix_stream, -A unix_dgram and
-A unix_seqpacket options even when ss gets socket information
via netlink.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>

ss: handle seqpacket type of unix domain socket

ss didn't distinguish the seqpacket type from the dgram type.
With this patch ss can distinguish them.

$ misc/ss -x -a | grep seq
u_seq  LISTEN     0      128    /run/udev/control 10966                 * 0
u_seq  ESTAB      0      0                    * 115103                * 115104
u_seq  ESTAB      0      0                    * 115104                * 115103
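
The distinction shown above amounts to mapping the socket type to a label.
A small sketch, assuming the u_str and u_dgr labels for the other two
types; the function name and the fallback label are hypothetical.

```python
import socket

# Label per unix domain socket type; u_seq is the one shown in the output above.
UNIX_TYPE_NAMES = {
    socket.SOCK_STREAM: "u_str",
    socket.SOCK_DGRAM: "u_dgr",
    socket.SOCK_SEQPACKET: "u_seq",
}


def unix_type_name(sock_type):
    """Return the display label for a unix socket type."""
    return UNIX_TYPE_NAMES.get(sock_type, "???")  # fallback label is made up


if __name__ == "__main__":
    print(unix_type_name(socket.SOCK_SEQPACKET))
```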

Signed-off-by: Masatake YAMATO <yamato@redhat.com>

PIE: Proportional Integral controller Enhanced

Proportional Integral controller Enhanced (PIE) is a scheduler to address the
bufferbloat problem.

"We present here a lightweight design, PIE (Proportional Integral controller
Enhanced), that can effectively control the average queueing latency to a target
value. Simulation results, theoretical analysis and Linux testbed results have
shown that PIE can ensure low latency and achieve high link utilization under
various congestion situations. The design does not require per-packet
timestamp, so it incurs very small overhead and is simple enough to implement
in both hardware and software."
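
As a rough illustration of how a proportional-integral controller can steer
queueing delay toward a target, the sketch below updates a drop probability
from the current and previous delay estimates. The function name and the
default gains are assumptions chosen for illustration; the paper and the
IETF draft remain the authoritative description, and the real scheme has
details that are not modelled here.

```python
def update_drop_prob(prob, qdelay, qdelay_old, target, alpha=0.125, beta=1.25):
    """One controller step; delays and target are in seconds."""
    # Proportional term: how far the delay is from the target.
    # Second term: whether the delay is rising or falling.
    prob += alpha * (qdelay - target) + beta * (qdelay - qdelay_old)
    # Keep the result a valid probability.
    return min(1.0, max(0.0, prob))


if __name__ == "__main__":
    # Delay above target and rising: the drop probability increases.
    print(update_drop_prob(0.0, qdelay=0.030, qdelay_old=0.020, target=0.015))
```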

For more information, please see the technical paper about PIE in the IEEE
Conference on High Performance Switching and Routing 2013. A copy of the paper
can be found at ftp://ftpeng.cisco.com/pie/.

Please also refer to the IETF draft submission at
http://tools.ietf.org/html/draft-pan-tsvwg-pie-00

All relevant code, documents and test scripts and results can be found at
ftp://ftpeng.cisco.com/pie/.

For problems with the iproute2/tc or Linux kernel code, please contact Vijay
Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) or Mythili
Prabhu (mysuryan@cisco.com).

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Mythili Prabhu <mysuryan@cisco.com>
CC: Dave Taht <dave.taht@bufferbloat.net>

add support for extended ifa_flags

Signed-off-by: Jiri Pirko <jiri@resnulli.us>

Update to 3.13-rc6 + net-next headers

Merge branch 'master' into net-next-for-3.13

iproute: Document the "ip link add index IDX" possibility

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>

iptunnel: Allow GRE_KEY for vti interface

The vti interface uses GRE_KEY to match the right policy in the kernel, so we
must not return failure when the tunnel is vti.
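
A sketch of the resulting check: a key is rejected for ipip/sit-style
tunnels unless the VTI_ISVTI bit is set, and the bit is tested with a mask
because i_flags may also carry GRE_KEY. The flag values and the function
name here are illustrative assumptions, not copied from the headers.

```python
GRE_KEY = 0x2000    # illustrative value
VTI_ISVTI = 0x0001  # illustrative value


def key_rejected(is_ipip_or_sit, i_flags, o_flags):
    """Return True if a key must be refused for this tunnel."""
    has_key = bool((i_flags | o_flags) & GRE_KEY)
    # Test the bit, not equality: a vti tunnel with ikey has
    # i_flags == GRE_KEY | VTI_ISVTI.
    is_vti = bool(i_flags & VTI_ISVTI)
    return is_ipip_or_sit and has_key and not is_vti


if __name__ == "__main__":
    print(key_rejected(True, GRE_KEY, 0))              # plain ipip with key
    print(key_rejected(True, GRE_KEY | VTI_ISVTI, 0))  # vti with ikey
```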

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>

iproute: Make it possible to specify index on link creation

The RTM_NEWLINK message accepts a non-zero ifi_index value and allows
creating links with the given index (if it's free, of course). This
functionality is available since linux-v3.5.

This patch makes this API available via ip tool.
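
For orientation, the index travels in the ifinfomsg header of the request.
The sketch below only packs that header with a chosen ifi_index; a real
RTM_NEWLINK request also needs the netlink header and link attributes,
which are omitted. The helper name is hypothetical.

```python
import struct

# struct ifinfomsg: ifi_family, padding, ifi_type, ifi_index, ifi_flags, ifi_change
IFINFOMSG = struct.Struct("=BBHiII")


def pack_ifinfomsg(index, family=0):
    """Pack an ifinfomsg header requesting the given interface index."""
    return IFINFOMSG.pack(family, 0, 0, index, 0, 0)


if __name__ == "__main__":
    header = pack_ifinfomsg(100)
    print(len(header), IFINFOMSG.unpack(header)[3])
```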

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>

update to latest net-next headers

dont skip action order

commit 58d78f9f6447df324cdeb99262442c5e3f1f924b
Author: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Sun Dec 22 10:34:18 2013 -0500

don't skip displaying of action chains or lists by TCA_ACT_MAX_PRIO

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

allow batch gets of actions

commit c5f30cabef14c951596210b96bc9b423b0d39592
Author: Jamal Hadi Salim <hadi@mojatatu.com>
Date:   Sun Dec 22 10:24:17 2013 -0500

    Allow batching of action gets
    Example:
    ----
    tc actions get \
    action gact index 100 \
    action gact index 4
    ----

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

simple print newline

commit d7869e6167c3553e93e254940b0647032b40fed8
Author: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Sun Dec 22 07:46:28 2013 -0500

print new line at the end for aesthetics

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

policer - retire old syntax

commit b82057d9ec851a8aba8a295b959190ef5098f330
Author: Jamal Hadi Salim <jhs@mojatatu.com>
Date:   Sat Dec 21 17:00:11 2013 -0500

    After a decade of trying to deprecate the old policer syntax,
    I believe it is time to kill it. The kernel build option for the old
    policer has been gone for at least 5 years now (although backward
    compatibility is still there). Being backward compatible meant
    hijacking the keyword "action" and was obstructing policies like:

    tc filter add dev eth0 parent ffff: protocol ip pref 10 \
    u32 match ip protocol 1 0xff flowid 1:10 \
    action skbedit mark 1 \
    action police rate 10kbit burst 10k pipe \
    action skbedit mark 2 \
    action police rate 20kbit burst 20k pipe \
    action mirred egress mirror dev dummy0

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

skbedit print missing metadata

skbedit should print the index and other generic metadata info

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>