Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool
o "min_tx_rate" option has been added for minimum Tx rate. Hence, for
consistent naming, "max_tx_rate" option has been introduced for maximum
Tx rate.
o Change in v2: "rate" can be used along with "max_tx_rate".
When both are specified, "max_tx_rate" should override.
o Change in v3:
* IFLA_VF_RATE: When IFLA_VF_RATE is used, and user has given only one of
min_tx_rate or max_tx_rate, reading of previous rate limits is done in
userspace instead of in kernel space before ndo_set_vf_rate.
* IFLA_VF_TX_RATE: When IFLA_VF_TX_RATE is used, min_tx_rate is always read
in kernel space. This takes care of below scenarios:
(1) when old tool sends "rate" but kernel is new (expects min and max)
(2) when new tool sends only "rate" but kernel is old (expects only "rate")
o Change in v4 as suggested by Stephen Hemminger:
* As per iproute policy, input and output formats should match. Changing display
of max_tx_rate and min_tx_rate options accordingly.
./ip/ip link show p3p1
8: p3p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
link/ether 00:0e:1e:16:ce:40 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 2a:18:8f:4d:3d:d4, tx rate 700 (Mbps), max_tx_rate 700Mbps, min_tx_rate 200Mbps
vf 1 MAC 72:dc:ba:f9:df:fd
During a rebuild [...]. Please note that we use our research
compiler tool-chain (using tools from the cbmc package), which permits extended
reporting on type inconsistencies at link time.
[...]
gcc bridge.o fdb.o monitor.o link.o mdb.o vlan.o ../lib/libnetlink.a ../lib/libutil.a ../lib/libnetlink.a ../lib/libutil.a -o bridge
file link.c line 18: error: conflicting types for variable "filter_index"
old definition in module fdb file fdb.c line 29
signed int
new definition in module link file link.c line 18
unsigned int
<builtin>: recipe for target 'bridge' failed
make[3]: *** [bridge] Error 64
make[3]: Leaving directory '/srv/jenkins-slave/workspace/sid-goto-cc-iproute2/iproute2-3.14.0/bridge'
Makefile:45: recipe for target 'all' failed
While practical constraints may limit the value of filter_index to remain within
the bounds of a positive signed int, there is certainly no such guarantee here.
Also, a plain majority vote suggests that this really just a wrong declaration
in link.c as several declarations of filter_index as signed int exist.
[...]
My followup on this was:
I think the majority is wrong.
filter_index is assigned exclusively from if_nametoindex or ll_name_to_index
which both return unsigned int.
Changing it to unsigned everywhere seems better.
This has been minimally tested by using the bridge tool
to add vids and showing available vids on different devices.
Reported-by: Michael Tautschnig <mt@debian.org> Signed-off-by: Andreas Henriksson <andreas@fatal.se>
When limit<burst latency becomes <0, for example:
# tc qdisc add dev eth0 root handle 1: tbf limit 100K burst 256K rate 256kbit
# tc qdisc show
qdisc tbf 1: dev eth0 root refcnt 2 rate 256Kbit burst 256Kb lat 4290.0s
If latency<0 there is no reason to show it. Limit will be printed instead of
latency when latency<0:
# tc qdisc show
qdisc tbf 1: dev eth0 root refcnt 2 rate 256Kbit burst 256Kb limit 100Kb
Signed-off-by: Sergey V. Lobanov <sergey@lobanov.in>
Natanael Copa [Tue, 27 May 2014 07:40:10 +0000 (07:40 +0000)]
iproute2: various header include fixes for compiling with musl libc
We need limits.h for LONG_MIN and LONG_MAX, sys/param.h for MIN and
sys/select for struct timeval.
This fixes the following compile errors with musl libc:
f_bpf.c: In function 'bpf_parse_opt':
f_bpf.c:181:12: error: 'LONG_MIN' undeclared (first use in this function)
if (h == LONG_MIN || h == LONG_MAX) {
^
...
tc_util.o: In function `print_tcstats2_attr':
tc_util.c:(.text+0x13fe): undefined reference to `MIN'
tc_util.c:(.text+0x1465): undefined reference to `MIN'
tc_util.c:(.text+0x14ce): undefined reference to `MIN'
tc_util.c:(.text+0x154c): undefined reference to `MIN'
tc_util.c:(.text+0x160a): undefined reference to `MIN'
tc_util.o:tc_util.c:(.text+0x174e): more undefined references to `MIN' follow
...
tc_stab.o: In function `print_size_table':
tc_stab.c:(.text+0x40f): undefined reference to `MIN'
...
fdb.c:247:30: error: 'ULONG_MAX' undeclared (first use in this function)
(vni >> 24) || vni == ULONG_MAX)
^
lnstat.h:28:17: error: field 'last_read' has incomplete type
struct timeval last_read; /* last time of read */
^
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Andreas Greve [Sat, 10 May 2014 09:19:18 +0000 (11:19 +0200)]
fix print_ipt: segfault if more then one filter with action -j MARK.
BUG: tc filter show ... produce a segmentation fault if more than one
filter rule with action -j MARK exists.
Reason: In print_ipt(...) xtables will be initialzed with a
pointer to the static struct tcipt_globals at xtables_init_all().
Later on the fields .opts and .options_offset of tcipt_globals are
modified. The call of xtables_free_opts(1) at the end of print(...)
does not restore the original values of tcipt_globals for the
modified fields. It only frees some allocated memory and sets
.opts to NULL. This leads to a segmentation fault when print_ipt()
is called for the next filter rule with action -j MARK.
Fix: Cloneing tcipt_globals on the stack as tmp_tcipt_globals and
use it instead of tcipt_globals, so tcipt_globals will be not
modified.
Signed-off-by: Andreas Greve <andreas.greve@a-greve.de>
david decotigny [Tue, 6 May 2014 03:38:18 +0000 (20:38 -0700)]
iproute2: show counter of carrier on<->off transitions
This patch allows to display the current counter of carrier on<->off
transitions (IFLA_CARRIER_CHANGES, see kernel commit "expose number of
carrier on/off changes"):
Tested:
- kernel with patch "net-sysfs: expose number of carrier on/off
changes": see "transns" column above
- kernel wthout the patch: "transns" not displayed (as expected)
Signed-off-by: David Decotigny <decot@googlers.com>
HHF qdisc parameters:
- limit: max number of packets in qdisc (default 1000)
- quantum: max deficit per RR round (default 1 MTU)
- hh_limit: max number of HHs to keep states (default 2048)
- reset_timeout: time to reset HHF counters (default 40ms)
- admit_bytes: counter thresh to classify as HH (default 128KB)
- evict_timeout: threshold to evict idle HHs (default 1s)
- non_hh_weight: DRR weight for mice (default 2)
Jay Vosburgh [Fri, 9 May 2014 02:02:20 +0000 (19:02 -0700)]
tc/netem: fix loss state display and p14 parsing
The display of the entire netem loss state is shown as if it
were gemodel state, as the loss state information is assigned to the
wrong pointer. Correct this by assigning the loss state to the correct
pointer.
Additionally, attempting to set netem loss state will result in
random values in the p14 state probability because the option value
passed to the kernel by tc netem is not parsed or initialized. Fix this
by supplying a default value of 0 for p14 and parsing the p14 value if
one is supplied.
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Oliver Hartkopp [Wed, 23 Apr 2014 17:27:26 +0000 (19:27 +0200)]
iproute2: can: fix indention white spaces
When preparing a patch for CAN FD support these white space issues showed up.
Fix it in the current code to be able to provide a proper follow up patch.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
ip: officially support flag mngtmpaddr also for "ip addr del"
Kernel is being extended to support flag IFA_F_MANAGETEMPADDR also for
deletion of addresses. This will allow a userspace application to indicate
that for a global address the kernel should delete all related temporary
addresses as well.
"ip addr del" internally calls ipaddr_modify which silently accepts
any flag provided on the command line already, independent of the
actual command.
Therefore only the usage documentation needs to be extended.
Flags for a peer override flags for the other and not used for the
peer.
before:
# ip link add up type veth peer down multicast off
# ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 2e:5c:cd:f5:63:d2 brd ff:ff:ff:ff:ff:ff
3: veth1: <BROADCAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 72:b0:fa:1e:76:7a brd ff:ff:ff:ff:ff:ff
after:
# ip link add up type veth peer down multicast off
# ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0: <BROADCAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 6e:db:03:b3:bd:ff brd ff:ff:ff:ff:ff:ff
3: veth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
link/ether a6:62:d9:84:f0:73 brd ff:ff:ff:ff:ff:ff
Masatake YAMATO [Wed, 12 Mar 2014 03:46:53 +0000 (12:46 +0900)]
iproute: Show default type, table, proto and scope of route
In "ip route show" output unicast type, main table, boot protocol and
universe scope are hidden as default labels.
Sometimes it is helpful to show the hidden label for people not enough
familiar with routing subsystem to map the output of "ip route show" and
kernel source code.
With this patch "ip route show" with -d option shows the default labels.
Example of difference of output with -d option:
$ ./ip/ip -4 route show table all dev virbr1
...
192.168.121.0/28 proto kernel scope link src 192.168.121.1
...
$ ./ip/ip -4 -d route show table all dev virbr1
...
unicast 192.168.121.0/28 table main proto kernel scope link src 192.168.121.1
...
Hiroaki SHIMODA [Sat, 8 Mar 2014 13:10:03 +0000 (22:10 +0900)]
htb: Move direct_qlen code part to htb_parse_opt().
The direct_qlen command option is used with qdisc operation.
It happened to be implemented in htb_parse_class_opt() which is called
with class operation.
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com>
Richard Haines [Fri, 7 Mar 2014 10:36:52 +0000 (10:36 +0000)]
ss: Add support for retrieving SELinux contexts
The process SELinux contexts can be added to the output using the -Z
option. Using the -z option will show the process and socket contexts (see
the man page for details).
For netlink sockets: if valid process show process context, if pid = 0
show kernel initial context, if unknown show "unavailable".
Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>
Michal Kubeček [Thu, 13 Feb 2014 16:31:59 +0000 (17:31 +0100)]
iplink_bond: fix parameter value matching
Lookup function get_index() compares argument with table entries
only up to the length of the table entry so that if an entry
with lower index is a substring of a later one, earlier entry is
used even if the argument is equal to the other. For example,
ip link set bond0 type bond xmit_hash_policy layer2+3
sets xmit_hash_policy to 0 (layer2) as this is found before
"layer2+3" can be checked.
[root@xemvm1 iproute2]# ./misc/ss -a -u
State Recv-Q Send-Q Local Address:Port Peer Address:Port
UNCONN 0 0 *:32713 *:*
UNCONN 0 0 *:bootpc *:*
UNCONN 0 0 :::57879 :::*
Reported-by: François-Xavier Le Bail <fx.lebail@yahoo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Tested-by: François-Xavier Le Bail <fx.lebail@yahoo.com>
Add support for bonding attributes just added to net-next.
On set, allow string or number value for enumerated attributes.
On show, use always use string value for attribute.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Proportional Integral controller Enhanced (PIE) is a scheduler to address the
bufferbloat problem.
We present here a lightweight design, PIE(Proportional Integral controller
Enhanced) that can effectively control the average queueing latency to a target
value. Simulation results, theoretical analysis and Linux testbed results have
shown that PIE can ensure low latency and achieve high link utilization under
various congestion situations. The design does not require per-packet
timestamp, so it incurs very small overhead and is simple enough to implement
in both hardware and software. "
For more information, please see technical paper about PIE in the IEEE
Conference on High Performance Switching and Routing 2013. A copy of the paper
can be found at ftp://ftpeng.cisco.com/pie/.
Please also refer to the IETF draft submission at
http://tools.ietf.org/html/draft-pan-tsvwg-pie-00
All relevant code, documents and test scripts and results can be found at
ftp://ftpeng.cisco.com/pie/.
For problems with the iproute2/tc or Linux kernel code, please contact Vijay
Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) Mythili Prabhu
(mysuryan@cisco.com)
Pavel Emelyanov [Thu, 26 Dec 2013 19:15:20 +0000 (23:15 +0400)]
iproute: Make it possible to specify index on link creation
The RTM_NEWLINK message accepts ifi_index non-zero value and lets
creation of links with given index (if it's free, or course). This
functionality is available since linux-v3.5.
This patch makes this API available via ip tool.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
After a decade of trying to deprecate the old policer syntax,
I believe it is time to kill it. The kernel build option for old
policer is gone for at least 5 years now (although backward
compatibility is still there). Being backward compatible meant
hijacking the keyword "action" and was obstructing policies like:
tc filter add dev eth0 parent ffff: protocol ip pref 10 \
u32 match ip protocol 1 0xff flowid 1:10 \
action skbedit mark 1 \
action police rate 10kbit burst 10k pipe \
action skbedit mark 2 \
action police rate 20kbit burst 20k pipe \
action action mirred egress mirror dev dummy0
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>