iproute2: allow to change slave options via type_slave
This patch adds the necessary changes to allow altering a slave device's
options via ip link set <device> type <master type>_slave specific-option.
It also adds support to set the bonding slaves' queue_id.
Example:
ip link set eth0 type bond_slave queue_id 10
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Jiri Pirko <jiri@resnulli.us>
ip monitor: Dont print timestamp or banner-label for cloned routes
This is ugly fix but solves the case when timestamp
or banner-label is printed before the cloned route will be skipped
by iproute filter which filters out all cached routes by default.
In such case timestamp will be printed twice:
Update the rt_dsfield file to contain values defined in current RFC.
The days of TOS precedence are gone, even Cisco doesn't refer
to these in the documents.
vadimk [Sat, 30 Aug 2014 12:06:00 +0000 (15:06 +0300)]
ip link: Remove unnecessary device checking
The real checking is performed later in iplink_modify(..) func which
checks device existence if NLM_F_CREATE flag is set.
Also it fixes the case when impossible to add veth link which was
caused by 9a02651a87 (ip: check for missing dev arg when doing VF rate)
because these devices are not exist yet.
Signed-off-by: Vadim Kochan <vadim4j@gmail.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
vadimk [Thu, 28 Aug 2014 13:56:03 +0000 (16:56 +0300)]
ip netns: Show error message if mkdir failed to create /var/run/netns
Currently if mkdir failed with "Permission denied" error then "mount --make-shared ..."
error message will be showed because /var/run/netns does not exist.
Signed-off-by: Vadim Kochan <vadim4j@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Jay Vosburgh [Sat, 10 May 2014 20:34:58 +0000 (13:34 -0700)]
tc/netem: loss gemodel options fixes
First, the default value for 1-k is documented as being 0, but is
currently being set to 1. (100%). This causes all packets to be dropped
in the good state if 1-k is not explicitly specified. Fix this by setting
the default to 0.
Second, the 1-h option is parsed correctly, however, the kernel is
expecting "h", not 1-h. Fix this by inverting the "1-h" percentage before
sending to and after receiving from the kernel. This does change the
behavior, but makes it consistent with the netem documentation and the
literature on the Gilbert-Elliot model, which refer to "1-h" and "1-k,"
not "h" or "k" directly.
Last, fix a minor formatting issue for the options reporting.
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
iproute2 bridge: bring to above par with brctl show macs
root@moja-mojo:bridge# ./bridge fdb help
Usage: bridge fdb { add | append | del | replace } ADDR dev DEV {self|master} [ temp ]
[router] [ dst IPADDR] [ vlan VID ]
[ port PORT] [ vni VNI ] [via DEV]
bridge fdb {show} [ br BRDEV ] [ brport DEV ]
Lets start with two bridges each with a port...
root@moja-mojo:bridge# ./bridge link
10: sw1-p1 state DOWN : <BROADCAST,NOARP> mtu 1500 master sw1 state disabled priority 32 cost 100
11: eth1 state DOWN : <BROADCAST,NOARP> mtu 1500 master br0 state disabled priority 32 cost 100
show all...
root@moja-mojo:bridge# ./bridge fdb show
33:33:00:00:00:01 dev ifb0 self permanent
33:33:00:00:00:01 dev ifb1 self permanent
33:33:00:00:00:01 dev eth0 self permanent
01:00:5e:00:00:01 dev eth0 self permanent
33:33:ff:92:c0:60 dev eth0 self permanent
33:33:00:00:00:fb dev eth0 self permanent
01:00:5e:00:00:fb dev eth0 self permanent
01:00:5e:7f:ff:fd dev eth0 self permanent
01:00:5e:00:00:01 dev wlan0 self permanent
33:33:00:00:00:01 dev wlan0 self permanent
33:33:ff:c2:84:3b dev wlan0 self permanent
33:33:00:00:00:fb dev wlan0 self permanent
01:00:5e:00:00:01 dev virbr0 self permanent
01:00:5e:00:00:fb dev virbr0 self permanent
33:33:00:00:00:01 dev br0 self permanent
33:33:00:00:00:01 dev sw1 self permanent
33:33:00:00:00:01 dev dummy0 self permanent
5e:f4:03:44:da:9a dev sw1-p1 vlan 0 master sw1 permanent
33:33:00:00:00:01 dev sw1-p1 self permanent
b6:5e:dd:ce:d7:5e dev eth1 vlan 0 master br0 permanent
33:33:00:00:00:01 dev eth1 self permanent
Lets see a netdev that is *not* attached to a bridge
Lets see a netdev that is bridge port
root@moja-mojo:bridge# ./bridge fdb show brport eth1
hadi@jhs-1:/media/MT1/other-gits/iproute-jul04/bridge$ ./bridge fdb show brport eth1
b6:5e:dd:ce:d7:5e vlan 0 master br0 permanent
33:33:00:00:00:01 self permanent
Specify the correct bridge and you get good stuff
root@moja-mojo:bridge# ./bridge fdb show brport eth1 br br0
6:5e:dd:ce:d7:5e vlan 0 master br0 permanent
33:33:00:00:00:01 self permanent
Specify the wrong bridge and you get good nada
root@moja-mojo:bridge# ./bridge fdb show brport eth1 br sw1
dump br0
root@moja-mojo:bridge# ./bridge fdb show br br0
33:33:00:00:00:01 dev br0 self permanent
b6:5e:dd:ce:d7:5e dev eth1 vlan 0 master br0 permanent
33:33:00:00:00:01 dev eth1 self permanent
dump sw1
root@moja-mojo:bridge# ./bridge fdb show br sw1
33:33:00:00:00:01 dev sw1 self permanent
5e:f4:03:44:da:9a dev sw1-p1 vlan 0 master sw1 permanent
33:33:00:00:00:01 dev sw1-p1 self permanent
Lets move a port from one bridge to another for shits-and-giggles
(as the New Brunswickians like to say)
root@moja-mojo:bridge# ip link set sw1-p1 master br0
Now dump again br0
root@moja-mojo:bridge# ./bridge fdb show br br0
33:33:00:00:00:01 dev br0 self permanent
5e:f4:03:44:da:9a dev sw1-p1 vlan 0 master br0 permanent
33:33:00:00:00:01 dev sw1-p1 self permanent
b6:5e:dd:ce:d7:5e dev eth1 vlan 0 master br0 permanent
33:33:00:00:00:01 dev eth1 self permanent
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
This patch avoids a full link wildump request when the user has specified
a single link. Uses RTM_GETLINK without the NLM_F_DUMP flag.
This helps on a system with large number of interfaces.
This patch currently only uses the link ifindex in the filter.
Hoping to provide a subsequent kernel patch to do link dump filtering on
other attributes in the kernel.
In iplink_get, to be safe, this patch currently sets the answer buffer
size to the max size that libnetlink rtnl_talk can copy. The current api
does not seem to provide a way to indicate the answer buf size.
changelog from RFC to v1:
- incorporated comments from stephen (fixed comment and fixed if/else block)
Dmitry Popov [Fri, 6 Jun 2014 20:33:49 +0000 (00:33 +0400)]
fix ip tunnel for vti tunnels with ikey
Consider the following command:
ip tunnel add mode vti remote 12.0.0.1 local 12.0.0.3 ikey 15
i_flags will be GRE_KEY|VTI_ISVTI. So, in order to distinguish between ipip and
vti we have to check just VTI_ISVTI bit, not the equality of i_flags and
VTI_ISVTI.
* Note, that there also was a bug in ip_tunnel/ip_vti, see
commit 7c8e6b9c281(ip_vti: Fix 'ip tunnel add' with 'key' parameters),
https://lkml.org/lkml/2014/6/7/125.
Even patched iproute could be unable to create vti tunnels with non-zero keys.
1) Unpatched iproute2:
[root@vm ~]# ip tunnel show
[root@vm ~]# lsmod | egrep '(ipip|vti)'
[root@vm ~]# ip tunnel add mode vti ikey 1
[root@vm ~]# lsmod | egrep '(ipip|vti)'
ipip 4197 0
tunnel4 1659 1 ipip
ip_tunnel 9295 1 ipip
[root@vm ~]# ip tunnel show
tunl0: ip/ip remote any local any ttl inherit
[root@vm ~]# ip tunnel add mode vti remote 1.2.3.4 ikey 2
[root@vm ~]# ip tunnel show
ipip0: ip/ip remote 1.2.3.4 local any ttl inherit
tunl0: ip/ip remote any local any ttl inherit
[root@vm ~]# lsmod | egrep '(ipip|vti)'
ipip 4197 0
tunnel4 1659 1 ipip
ip_tunnel 9295 1 ipip
# ipip tunnels are created instead of vti
2) Patched iproute2:
[root@vm ~]# ip tunnel show
[root@vm ~]# lsmod | egrep '(ipip|vti)'
[root@vm ~]# ip tunnel add mode vti ikey 1
[root@vm ~]# lsmod | egrep '(ipip|vti)'
ip_vti 5258 0
ip_tunnel 9295 1 ip_vti
[root@vm ~]# ip tunnel show
vti0: ip/ip remote any local any ttl inherit ikey 1 okey 0
ip_vti0: ip/ip remote any local any ttl inherit nopmtudisc key 0
[root@vm ~]# ip tunnel add mode vti remote 1.2.3.4 ikey 2
[root@vm ~]# ip tunnel show
vti0: ip/ip remote any local any ttl inherit ikey 1 okey 0
vti1: ip/ip remote 1.2.3.4 local any ttl inherit ikey 2 okey 0
ip_vti0: ip/ip remote any local any ttl inherit nopmtudisc key 0
# Vti tunnels are created as expected
# * If you have unpatched kernel your vti tunnels will have ikey == okey == 0
Same story exists with ip tunnel show/del with non-zero [io]key: requests are
routed to tunl0 instead of ip_vti0.
Roopa Prabhu [Sun, 8 Jun 2014 05:23:42 +0000 (22:23 -0700)]
bridge: Add master device name to bridge fdb show
This patch adds master dev name from NDA_MASTER netlink attribute
to bridge fdb show output
current iproute2 tries to print 'master' in the output if NTF_MASTER
is present. But, kernel today does not set NTF_MASTER during dump
requests. Which means I have not seen iproute2 bridge cmd print 'master' atall.
This patch overrides the NTF_MASTER flag if NDA_MASTER attribute is present.
Example output:
before this patch:
# bridge fdb show
44:38:39:00:27:ba dev bond2.2003 permanent
44:38:39:00:27:bb dev bond4.2003 permanent
44:38:39:00:27:bc dev bond2.2004 permanent
After this patch:
# bridge fdb show
44:38:39:00:27:ba dev bond2.2003 master br-2003 permanent
44:38:39:00:27:bb dev bond4.2003 master br-2003 permanent
44:38:39:00:27:bc dev bond2.2004 master br-2004 permanent
For comparision with the above, below is the output for NTF_SELF today,
# bridge fdb show
33:33:00:00:00:01 dev eth0 self permanent
01:00:5e:00:00:01 dev eth0 self permanent
33:33:ff:00:01:cc dev eth0 self permanent
If change in output is a concern, 'master' can be put at the end of the fdb
output line or made optional with -d[etails] option.
change from v1 to v2:
use 'bridge' instead of 'master' in fdb show output
change from v2 to v3:
use 'master' instead of 'bridge' in fdb show output
(master could also be a vxlan device)
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool
o "min_tx_rate" option has been added for minimum Tx rate. Hence, for
consistent naming, "max_tx_rate" option has been introduced for maximum
Tx rate.
o Change in v2: "rate" can be used along with "max_tx_rate".
When both are specified, "max_tx_rate" should override.
o Change in v3:
* IFLA_VF_RATE: When IFLA_VF_RATE is used, and user has given only one of
min_tx_rate or max_tx_rate, reading of previous rate limits is done in
userspace instead of in kernel space before ndo_set_vf_rate.
* IFLA_VF_TX_RATE: When IFLA_VF_TX_RATE is used, min_tx_rate is always read
in kernel space. This takes care of below scenarios:
(1) when old tool sends "rate" but kernel is new (expects min and max)
(2) when new tool sends only "rate" but kernel is old (expects only "rate")
o Change in v4 as suggested by Stephen Hemminger:
* As per iproute policy, input and output formats should match. Changing display
of max_tx_rate and min_tx_rate options accordingly.
./ip/ip link show p3p1
8: p3p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
link/ether 00:0e:1e:16:ce:40 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 2a:18:8f:4d:3d:d4, tx rate 700 (Mbps), max_tx_rate 700Mbps, min_tx_rate 200Mbps
vf 1 MAC 72:dc:ba:f9:df:fd
During a rebuild [...]. Please note that we use our research
compiler tool-chain (using tools from the cbmc package), which permits extended
reporting on type inconsistencies at link time.
[...]
gcc bridge.o fdb.o monitor.o link.o mdb.o vlan.o ../lib/libnetlink.a ../lib/libutil.a ../lib/libnetlink.a ../lib/libutil.a -o bridge
file link.c line 18: error: conflicting types for variable "filter_index"
old definition in module fdb file fdb.c line 29
signed int
new definition in module link file link.c line 18
unsigned int
<builtin>: recipe for target 'bridge' failed
make[3]: *** [bridge] Error 64
make[3]: Leaving directory '/srv/jenkins-slave/workspace/sid-goto-cc-iproute2/iproute2-3.14.0/bridge'
Makefile:45: recipe for target 'all' failed
While practical constraints may limit the value of filter_index to remain within
the bounds of a positive signed int, there is certainly no such guarantee here.
Also, a plain majority vote suggests that this really just a wrong declaration
in link.c as several declarations of filter_index as signed int exist.
[...]
My followup on this was:
I think the majority is wrong.
filter_index is assigned exclusively from if_nametoindex or ll_name_to_index
which both return unsigned int.
Changing it to unsigned everywhere seems better.
This has been minimally tested by using the bridge tool
to add vids and showing available vids on different devices.
Reported-by: Michael Tautschnig <mt@debian.org> Signed-off-by: Andreas Henriksson <andreas@fatal.se>
When limit<burst latency becomes <0, for example:
# tc qdisc add dev eth0 root handle 1: tbf limit 100K burst 256K rate 256kbit
# tc qdisc show
qdisc tbf 1: dev eth0 root refcnt 2 rate 256Kbit burst 256Kb lat 4290.0s
If latency<0 there is no reason to show it. Limit will be printed instead of
latency when latency<0:
# tc qdisc show
qdisc tbf 1: dev eth0 root refcnt 2 rate 256Kbit burst 256Kb limit 100Kb
Signed-off-by: Sergey V. Lobanov <sergey@lobanov.in>
Natanael Copa [Tue, 27 May 2014 07:40:10 +0000 (07:40 +0000)]
iproute2: various header include fixes for compiling with musl libc
We need limits.h for LONG_MIN and LONG_MAX, sys/param.h for MIN and
sys/select for struct timeval.
This fixes the following compile errors with musl libc:
f_bpf.c: In function 'bpf_parse_opt':
f_bpf.c:181:12: error: 'LONG_MIN' undeclared (first use in this function)
if (h == LONG_MIN || h == LONG_MAX) {
^
...
tc_util.o: In function `print_tcstats2_attr':
tc_util.c:(.text+0x13fe): undefined reference to `MIN'
tc_util.c:(.text+0x1465): undefined reference to `MIN'
tc_util.c:(.text+0x14ce): undefined reference to `MIN'
tc_util.c:(.text+0x154c): undefined reference to `MIN'
tc_util.c:(.text+0x160a): undefined reference to `MIN'
tc_util.o:tc_util.c:(.text+0x174e): more undefined references to `MIN' follow
...
tc_stab.o: In function `print_size_table':
tc_stab.c:(.text+0x40f): undefined reference to `MIN'
...
fdb.c:247:30: error: 'ULONG_MAX' undeclared (first use in this function)
(vni >> 24) || vni == ULONG_MAX)
^
lnstat.h:28:17: error: field 'last_read' has incomplete type
struct timeval last_read; /* last time of read */
^
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Andreas Greve [Sat, 10 May 2014 09:19:18 +0000 (11:19 +0200)]
fix print_ipt: segfault if more then one filter with action -j MARK.
BUG: tc filter show ... produce a segmentation fault if more than one
filter rule with action -j MARK exists.
Reason: In print_ipt(...) xtables will be initialzed with a
pointer to the static struct tcipt_globals at xtables_init_all().
Later on the fields .opts and .options_offset of tcipt_globals are
modified. The call of xtables_free_opts(1) at the end of print(...)
does not restore the original values of tcipt_globals for the
modified fields. It only frees some allocated memory and sets
.opts to NULL. This leads to a segmentation fault when print_ipt()
is called for the next filter rule with action -j MARK.
Fix: Cloneing tcipt_globals on the stack as tmp_tcipt_globals and
use it instead of tcipt_globals, so tcipt_globals will be not
modified.
Signed-off-by: Andreas Greve <andreas.greve@a-greve.de>
david decotigny [Tue, 6 May 2014 03:38:18 +0000 (20:38 -0700)]
iproute2: show counter of carrier on<->off transitions
This patch allows to display the current counter of carrier on<->off
transitions (IFLA_CARRIER_CHANGES, see kernel commit "expose number of
carrier on/off changes"):
Tested:
- kernel with patch "net-sysfs: expose number of carrier on/off
changes": see "transns" column above
- kernel wthout the patch: "transns" not displayed (as expected)
Signed-off-by: David Decotigny <decot@googlers.com>
HHF qdisc parameters:
- limit: max number of packets in qdisc (default 1000)
- quantum: max deficit per RR round (default 1 MTU)
- hh_limit: max number of HHs to keep states (default 2048)
- reset_timeout: time to reset HHF counters (default 40ms)
- admit_bytes: counter thresh to classify as HH (default 128KB)
- evict_timeout: threshold to evict idle HHs (default 1s)
- non_hh_weight: DRR weight for mice (default 2)
Jay Vosburgh [Fri, 9 May 2014 02:02:20 +0000 (19:02 -0700)]
tc/netem: fix loss state display and p14 parsing
The display of the entire netem loss state is shown as if it
were gemodel state, as the loss state information is assigned to the
wrong pointer. Correct this by assigning the loss state to the correct
pointer.
Additionally, attempting to set netem loss state will result in
random values in the p14 state probability because the option value
passed to the kernel by tc netem is not parsed or initialized. Fix this
by supplying a default value of 0 for p14 and parsing the p14 value if
one is supplied.
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Oliver Hartkopp [Wed, 23 Apr 2014 17:27:26 +0000 (19:27 +0200)]
iproute2: can: fix indention white spaces
When preparing a patch for CAN FD support these white space issues showed up.
Fix it in the current code to be able to provide a proper follow up patch.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
ip: officially support flag mngtmpaddr also for "ip addr del"
Kernel is being extended to support flag IFA_F_MANAGETEMPADDR also for
deletion of addresses. This will allow a userspace application to indicate
that for a global address the kernel should delete all related temporary
addresses as well.
"ip addr del" internally calls ipaddr_modify which silently accepts
any flag provided on the command line already, independent of the
actual command.
Therefore only the usage documentation needs to be extended.
Flags for a peer override flags for the other and not used for the
peer.
before:
# ip link add up type veth peer down multicast off
# ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 2e:5c:cd:f5:63:d2 brd ff:ff:ff:ff:ff:ff
3: veth1: <BROADCAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 72:b0:fa:1e:76:7a brd ff:ff:ff:ff:ff:ff
after:
# ip link add up type veth peer down multicast off
# ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0: <BROADCAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 6e:db:03:b3:bd:ff brd ff:ff:ff:ff:ff:ff
3: veth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
link/ether a6:62:d9:84:f0:73 brd ff:ff:ff:ff:ff:ff
Masatake YAMATO [Wed, 12 Mar 2014 03:46:53 +0000 (12:46 +0900)]
iproute: Show default type, table, proto and scope of route
In "ip route show" output unicast type, main table, boot protocol and
universe scope are hidden as default labels.
Sometimes it is helpful to show the hidden label for people not enough
familiar with routing subsystem to map the output of "ip route show" and
kernel source code.
With this patch "ip route show" with -d option shows the default labels.
Example of difference of output with -d option:
$ ./ip/ip -4 route show table all dev virbr1
...
192.168.121.0/28 proto kernel scope link src 192.168.121.1
...
$ ./ip/ip -4 -d route show table all dev virbr1
...
unicast 192.168.121.0/28 table main proto kernel scope link src 192.168.121.1
...
Hiroaki SHIMODA [Sat, 8 Mar 2014 13:10:03 +0000 (22:10 +0900)]
htb: Move direct_qlen code part to htb_parse_opt().
The direct_qlen command option is used with qdisc operation.
It happened to be implemented in htb_parse_class_opt() which is called
with class operation.
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com>
Richard Haines [Fri, 7 Mar 2014 10:36:52 +0000 (10:36 +0000)]
ss: Add support for retrieving SELinux contexts
The process SELinux contexts can be added to the output using the -Z
option. Using the -z option will show the process and socket contexts (see
the man page for details).
For netlink sockets: if valid process show process context, if pid = 0
show kernel initial context, if unknown show "unavailable".
Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>
Michal Kubeček [Thu, 13 Feb 2014 16:31:59 +0000 (17:31 +0100)]
iplink_bond: fix parameter value matching
Lookup function get_index() compares argument with table entries
only up to the length of the table entry so that if an entry
with lower index is a substring of a later one, earlier entry is
used even if the argument is equal to the other. For example,
ip link set bond0 type bond xmit_hash_policy layer2+3
sets xmit_hash_policy to 0 (layer2) as this is found before
"layer2+3" can be checked.
[root@xemvm1 iproute2]# ./misc/ss -a -u
State Recv-Q Send-Q Local Address:Port Peer Address:Port
UNCONN 0 0 *:32713 *:*
UNCONN 0 0 *:bootpc *:*
UNCONN 0 0 :::57879 :::*
Reported-by: François-Xavier Le Bail <fx.lebail@yahoo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Tested-by: François-Xavier Le Bail <fx.lebail@yahoo.com>