This patch avoids a full link wildump request when the user has specified
a single link. Uses RTM_GETLINK without the NLM_F_DUMP flag.
This helps on a system with large number of interfaces.
This patch currently only uses the link ifindex in the filter.
Hoping to provide a subsequent kernel patch to do link dump filtering on
other attributes in the kernel.
In iplink_get, to be safe, this patch currently sets the answer buffer
size to the max size that libnetlink rtnl_talk can copy. The current api
does not seem to provide a way to indicate the answer buf size.
changelog from RFC to v1:
- incorporated comments from stephen (fixed comment and fixed if/else block)
Dmitry Popov [Fri, 6 Jun 2014 20:33:49 +0000 (00:33 +0400)]
fix ip tunnel for vti tunnels with ikey
Consider the following command:
ip tunnel add mode vti remote 12.0.0.1 local 12.0.0.3 ikey 15
i_flags will be GRE_KEY|VTI_ISVTI. So, in order to distinguish between ipip and
vti we have to check just VTI_ISVTI bit, not the equality of i_flags and
VTI_ISVTI.
* Note, that there also was a bug in ip_tunnel/ip_vti, see
commit 7c8e6b9c281(ip_vti: Fix 'ip tunnel add' with 'key' parameters),
https://lkml.org/lkml/2014/6/7/125.
Even patched iproute could be unable to create vti tunnels with non-zero keys.
1) Unpatched iproute2:
[root@vm ~]# ip tunnel show
[root@vm ~]# lsmod | egrep '(ipip|vti)'
[root@vm ~]# ip tunnel add mode vti ikey 1
[root@vm ~]# lsmod | egrep '(ipip|vti)'
ipip 4197 0
tunnel4 1659 1 ipip
ip_tunnel 9295 1 ipip
[root@vm ~]# ip tunnel show
tunl0: ip/ip remote any local any ttl inherit
[root@vm ~]# ip tunnel add mode vti remote 1.2.3.4 ikey 2
[root@vm ~]# ip tunnel show
ipip0: ip/ip remote 1.2.3.4 local any ttl inherit
tunl0: ip/ip remote any local any ttl inherit
[root@vm ~]# lsmod | egrep '(ipip|vti)'
ipip 4197 0
tunnel4 1659 1 ipip
ip_tunnel 9295 1 ipip
# ipip tunnels are created instead of vti
2) Patched iproute2:
[root@vm ~]# ip tunnel show
[root@vm ~]# lsmod | egrep '(ipip|vti)'
[root@vm ~]# ip tunnel add mode vti ikey 1
[root@vm ~]# lsmod | egrep '(ipip|vti)'
ip_vti 5258 0
ip_tunnel 9295 1 ip_vti
[root@vm ~]# ip tunnel show
vti0: ip/ip remote any local any ttl inherit ikey 1 okey 0
ip_vti0: ip/ip remote any local any ttl inherit nopmtudisc key 0
[root@vm ~]# ip tunnel add mode vti remote 1.2.3.4 ikey 2
[root@vm ~]# ip tunnel show
vti0: ip/ip remote any local any ttl inherit ikey 1 okey 0
vti1: ip/ip remote 1.2.3.4 local any ttl inherit ikey 2 okey 0
ip_vti0: ip/ip remote any local any ttl inherit nopmtudisc key 0
# Vti tunnels are created as expected
# * If you have unpatched kernel your vti tunnels will have ikey == okey == 0
Same story exists with ip tunnel show/del with non-zero [io]key: requests are
routed to tunl0 instead of ip_vti0.
Roopa Prabhu [Sun, 8 Jun 2014 05:23:42 +0000 (22:23 -0700)]
bridge: Add master device name to bridge fdb show
This patch adds master dev name from NDA_MASTER netlink attribute
to bridge fdb show output
current iproute2 tries to print 'master' in the output if NTF_MASTER
is present. But, kernel today does not set NTF_MASTER during dump
requests. Which means I have not seen iproute2 bridge cmd print 'master' atall.
This patch overrides the NTF_MASTER flag if NDA_MASTER attribute is present.
Example output:
before this patch:
# bridge fdb show
44:38:39:00:27:ba dev bond2.2003 permanent
44:38:39:00:27:bb dev bond4.2003 permanent
44:38:39:00:27:bc dev bond2.2004 permanent
After this patch:
# bridge fdb show
44:38:39:00:27:ba dev bond2.2003 master br-2003 permanent
44:38:39:00:27:bb dev bond4.2003 master br-2003 permanent
44:38:39:00:27:bc dev bond2.2004 master br-2004 permanent
For comparision with the above, below is the output for NTF_SELF today,
# bridge fdb show
33:33:00:00:00:01 dev eth0 self permanent
01:00:5e:00:00:01 dev eth0 self permanent
33:33:ff:00:01:cc dev eth0 self permanent
If change in output is a concern, 'master' can be put at the end of the fdb
output line or made optional with -d[etails] option.
change from v1 to v2:
use 'bridge' instead of 'master' in fdb show output
change from v2 to v3:
use 'master' instead of 'bridge' in fdb show output
(master could also be a vxlan device)
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool
o "min_tx_rate" option has been added for minimum Tx rate. Hence, for
consistent naming, "max_tx_rate" option has been introduced for maximum
Tx rate.
o Change in v2: "rate" can be used along with "max_tx_rate".
When both are specified, "max_tx_rate" should override.
o Change in v3:
* IFLA_VF_RATE: When IFLA_VF_RATE is used, and user has given only one of
min_tx_rate or max_tx_rate, reading of previous rate limits is done in
userspace instead of in kernel space before ndo_set_vf_rate.
* IFLA_VF_TX_RATE: When IFLA_VF_TX_RATE is used, min_tx_rate is always read
in kernel space. This takes care of below scenarios:
(1) when old tool sends "rate" but kernel is new (expects min and max)
(2) when new tool sends only "rate" but kernel is old (expects only "rate")
o Change in v4 as suggested by Stephen Hemminger:
* As per iproute policy, input and output formats should match. Changing display
of max_tx_rate and min_tx_rate options accordingly.
./ip/ip link show p3p1
8: p3p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
link/ether 00:0e:1e:16:ce:40 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 2a:18:8f:4d:3d:d4, tx rate 700 (Mbps), max_tx_rate 700Mbps, min_tx_rate 200Mbps
vf 1 MAC 72:dc:ba:f9:df:fd
During a rebuild [...]. Please note that we use our research
compiler tool-chain (using tools from the cbmc package), which permits extended
reporting on type inconsistencies at link time.
[...]
gcc bridge.o fdb.o monitor.o link.o mdb.o vlan.o ../lib/libnetlink.a ../lib/libutil.a ../lib/libnetlink.a ../lib/libutil.a -o bridge
file link.c line 18: error: conflicting types for variable "filter_index"
old definition in module fdb file fdb.c line 29
signed int
new definition in module link file link.c line 18
unsigned int
<builtin>: recipe for target 'bridge' failed
make[3]: *** [bridge] Error 64
make[3]: Leaving directory '/srv/jenkins-slave/workspace/sid-goto-cc-iproute2/iproute2-3.14.0/bridge'
Makefile:45: recipe for target 'all' failed
While practical constraints may limit the value of filter_index to remain within
the bounds of a positive signed int, there is certainly no such guarantee here.
Also, a plain majority vote suggests that this really just a wrong declaration
in link.c as several declarations of filter_index as signed int exist.
[...]
My followup on this was:
I think the majority is wrong.
filter_index is assigned exclusively from if_nametoindex or ll_name_to_index
which both return unsigned int.
Changing it to unsigned everywhere seems better.
This has been minimally tested by using the bridge tool
to add vids and showing available vids on different devices.
Reported-by: Michael Tautschnig <mt@debian.org> Signed-off-by: Andreas Henriksson <andreas@fatal.se>
When limit<burst latency becomes <0, for example:
# tc qdisc add dev eth0 root handle 1: tbf limit 100K burst 256K rate 256kbit
# tc qdisc show
qdisc tbf 1: dev eth0 root refcnt 2 rate 256Kbit burst 256Kb lat 4290.0s
If latency<0 there is no reason to show it. Limit will be printed instead of
latency when latency<0:
# tc qdisc show
qdisc tbf 1: dev eth0 root refcnt 2 rate 256Kbit burst 256Kb limit 100Kb
Signed-off-by: Sergey V. Lobanov <sergey@lobanov.in>
Natanael Copa [Tue, 27 May 2014 07:40:10 +0000 (07:40 +0000)]
iproute2: various header include fixes for compiling with musl libc
We need limits.h for LONG_MIN and LONG_MAX, sys/param.h for MIN and
sys/select for struct timeval.
This fixes the following compile errors with musl libc:
f_bpf.c: In function 'bpf_parse_opt':
f_bpf.c:181:12: error: 'LONG_MIN' undeclared (first use in this function)
if (h == LONG_MIN || h == LONG_MAX) {
^
...
tc_util.o: In function `print_tcstats2_attr':
tc_util.c:(.text+0x13fe): undefined reference to `MIN'
tc_util.c:(.text+0x1465): undefined reference to `MIN'
tc_util.c:(.text+0x14ce): undefined reference to `MIN'
tc_util.c:(.text+0x154c): undefined reference to `MIN'
tc_util.c:(.text+0x160a): undefined reference to `MIN'
tc_util.o:tc_util.c:(.text+0x174e): more undefined references to `MIN' follow
...
tc_stab.o: In function `print_size_table':
tc_stab.c:(.text+0x40f): undefined reference to `MIN'
...
fdb.c:247:30: error: 'ULONG_MAX' undeclared (first use in this function)
(vni >> 24) || vni == ULONG_MAX)
^
lnstat.h:28:17: error: field 'last_read' has incomplete type
struct timeval last_read; /* last time of read */
^
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Andreas Greve [Sat, 10 May 2014 09:19:18 +0000 (11:19 +0200)]
fix print_ipt: segfault if more then one filter with action -j MARK.
BUG: tc filter show ... produce a segmentation fault if more than one
filter rule with action -j MARK exists.
Reason: In print_ipt(...) xtables will be initialzed with a
pointer to the static struct tcipt_globals at xtables_init_all().
Later on the fields .opts and .options_offset of tcipt_globals are
modified. The call of xtables_free_opts(1) at the end of print(...)
does not restore the original values of tcipt_globals for the
modified fields. It only frees some allocated memory and sets
.opts to NULL. This leads to a segmentation fault when print_ipt()
is called for the next filter rule with action -j MARK.
Fix: Cloneing tcipt_globals on the stack as tmp_tcipt_globals and
use it instead of tcipt_globals, so tcipt_globals will be not
modified.
Signed-off-by: Andreas Greve <andreas.greve@a-greve.de>
david decotigny [Tue, 6 May 2014 03:38:18 +0000 (20:38 -0700)]
iproute2: show counter of carrier on<->off transitions
This patch allows to display the current counter of carrier on<->off
transitions (IFLA_CARRIER_CHANGES, see kernel commit "expose number of
carrier on/off changes"):
Tested:
- kernel with patch "net-sysfs: expose number of carrier on/off
changes": see "transns" column above
- kernel wthout the patch: "transns" not displayed (as expected)
Signed-off-by: David Decotigny <decot@googlers.com>
HHF qdisc parameters:
- limit: max number of packets in qdisc (default 1000)
- quantum: max deficit per RR round (default 1 MTU)
- hh_limit: max number of HHs to keep states (default 2048)
- reset_timeout: time to reset HHF counters (default 40ms)
- admit_bytes: counter thresh to classify as HH (default 128KB)
- evict_timeout: threshold to evict idle HHs (default 1s)
- non_hh_weight: DRR weight for mice (default 2)
Jay Vosburgh [Fri, 9 May 2014 02:02:20 +0000 (19:02 -0700)]
tc/netem: fix loss state display and p14 parsing
The display of the entire netem loss state is shown as if it
were gemodel state, as the loss state information is assigned to the
wrong pointer. Correct this by assigning the loss state to the correct
pointer.
Additionally, attempting to set netem loss state will result in
random values in the p14 state probability because the option value
passed to the kernel by tc netem is not parsed or initialized. Fix this
by supplying a default value of 0 for p14 and parsing the p14 value if
one is supplied.
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Oliver Hartkopp [Wed, 23 Apr 2014 17:27:26 +0000 (19:27 +0200)]
iproute2: can: fix indention white spaces
When preparing a patch for CAN FD support these white space issues showed up.
Fix it in the current code to be able to provide a proper follow up patch.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
ip: officially support flag mngtmpaddr also for "ip addr del"
Kernel is being extended to support flag IFA_F_MANAGETEMPADDR also for
deletion of addresses. This will allow a userspace application to indicate
that for a global address the kernel should delete all related temporary
addresses as well.
"ip addr del" internally calls ipaddr_modify which silently accepts
any flag provided on the command line already, independent of the
actual command.
Therefore only the usage documentation needs to be extended.
Flags for a peer override flags for the other and not used for the
peer.
before:
# ip link add up type veth peer down multicast off
# ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 2e:5c:cd:f5:63:d2 brd ff:ff:ff:ff:ff:ff
3: veth1: <BROADCAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 72:b0:fa:1e:76:7a brd ff:ff:ff:ff:ff:ff
after:
# ip link add up type veth peer down multicast off
# ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0: <BROADCAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 6e:db:03:b3:bd:ff brd ff:ff:ff:ff:ff:ff
3: veth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
link/ether a6:62:d9:84:f0:73 brd ff:ff:ff:ff:ff:ff
Masatake YAMATO [Wed, 12 Mar 2014 03:46:53 +0000 (12:46 +0900)]
iproute: Show default type, table, proto and scope of route
In "ip route show" output unicast type, main table, boot protocol and
universe scope are hidden as default labels.
Sometimes it is helpful to show the hidden label for people not enough
familiar with routing subsystem to map the output of "ip route show" and
kernel source code.
With this patch "ip route show" with -d option shows the default labels.
Example of difference of output with -d option:
$ ./ip/ip -4 route show table all dev virbr1
...
192.168.121.0/28 proto kernel scope link src 192.168.121.1
...
$ ./ip/ip -4 -d route show table all dev virbr1
...
unicast 192.168.121.0/28 table main proto kernel scope link src 192.168.121.1
...
Hiroaki SHIMODA [Sat, 8 Mar 2014 13:10:03 +0000 (22:10 +0900)]
htb: Move direct_qlen code part to htb_parse_opt().
The direct_qlen command option is used with qdisc operation.
It happened to be implemented in htb_parse_class_opt() which is called
with class operation.
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com>
Richard Haines [Fri, 7 Mar 2014 10:36:52 +0000 (10:36 +0000)]
ss: Add support for retrieving SELinux contexts
The process SELinux contexts can be added to the output using the -Z
option. Using the -z option will show the process and socket contexts (see
the man page for details).
For netlink sockets: if valid process show process context, if pid = 0
show kernel initial context, if unknown show "unavailable".
Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>
Michal Kubeček [Thu, 13 Feb 2014 16:31:59 +0000 (17:31 +0100)]
iplink_bond: fix parameter value matching
Lookup function get_index() compares argument with table entries
only up to the length of the table entry so that if an entry
with lower index is a substring of a later one, earlier entry is
used even if the argument is equal to the other. For example,
ip link set bond0 type bond xmit_hash_policy layer2+3
sets xmit_hash_policy to 0 (layer2) as this is found before
"layer2+3" can be checked.
[root@xemvm1 iproute2]# ./misc/ss -a -u
State Recv-Q Send-Q Local Address:Port Peer Address:Port
UNCONN 0 0 *:32713 *:*
UNCONN 0 0 *:bootpc *:*
UNCONN 0 0 :::57879 :::*
Reported-by: François-Xavier Le Bail <fx.lebail@yahoo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Tested-by: François-Xavier Le Bail <fx.lebail@yahoo.com>
Add support for bonding attributes just added to net-next.
On set, allow string or number value for enumerated attributes.
On show, use always use string value for attribute.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Proportional Integral controller Enhanced (PIE) is a scheduler to address the
bufferbloat problem.
We present here a lightweight design, PIE(Proportional Integral controller
Enhanced) that can effectively control the average queueing latency to a target
value. Simulation results, theoretical analysis and Linux testbed results have
shown that PIE can ensure low latency and achieve high link utilization under
various congestion situations. The design does not require per-packet
timestamp, so it incurs very small overhead and is simple enough to implement
in both hardware and software. "
For more information, please see technical paper about PIE in the IEEE
Conference on High Performance Switching and Routing 2013. A copy of the paper
can be found at ftp://ftpeng.cisco.com/pie/.
Please also refer to the IETF draft submission at
http://tools.ietf.org/html/draft-pan-tsvwg-pie-00
All relevant code, documents and test scripts and results can be found at
ftp://ftpeng.cisco.com/pie/.
For problems with the iproute2/tc or Linux kernel code, please contact Vijay
Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) Mythili Prabhu
(mysuryan@cisco.com)
Pavel Emelyanov [Thu, 26 Dec 2013 19:15:20 +0000 (23:15 +0400)]
iproute: Make it possible to specify index on link creation
The RTM_NEWLINK message accepts ifi_index non-zero value and lets
creation of links with given index (if it's free, or course). This
functionality is available since linux-v3.5.
This patch makes this API available via ip tool.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
After a decade of trying to deprecate the old policer syntax,
I believe it is time to kill it. The kernel build option for old
policer is gone for at least 5 years now (although backward
compatibility is still there). Being backward compatible meant
hijacking the keyword "action" and was obstructing policies like:
tc filter add dev eth0 parent ffff: protocol ip pref 10 \
u32 match ip protocol 1 0xff flowid 1:10 \
action skbedit mark 1 \
action police rate 10kbit burst 10k pipe \
action skbedit mark 2 \
action police rate 20kbit burst 20k pipe \
action action mirred egress mirror dev dummy0
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>