Andrea Claudi [Thu, 27 Jun 2019 14:47:45 +0000 (16:47 +0200)]
tc: netem: fix r parameter in Bernoulli loss model
As the man page for tc netem states:
To use the Bernoulli model, the only needed parameter is p while the
others will be set to the default values r=1-p, 1-h=1 and 1-k=0.
However r parameter is erroneusly set to 1, and not to 1-p.
Fix this using the same approach of the 4-state loss model.
Fixes: 3c7950af598be ("netem: add support for 4 state and GE loss model") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Baruch Siach [Thu, 27 Jun 2019 18:37:19 +0000 (21:37 +0300)]
devlink: fix libc and kernel headers collision
Since commit 2f1242efe9d ("devlink: Add devlink health show command") we
use the sys/sysinfo.h header for the sysinfo(2) system call. But since
iproute2 carries a local version of the kernel struct sysinfo, this
causes a collision with libc that do not rely on kernel defined sysinfo
like musl libc:
In file included from devlink.c:25:0:
.../sysroot/usr/include/sys/sysinfo.h:10:8: error: redefinition of 'struct sysinfo'
struct sysinfo {
^~~~~~~
In file included from ../include/uapi/linux/kernel.h:5:0,
from ../include/uapi/linux/netlink.h:5,
from ../include/uapi/linux/genetlink.h:6,
from devlink.c:21:
../include/uapi/linux/sysinfo.h:8:8: note: originally defined here
struct sysinfo {
^~~~~~~
Move the sys/sysinfo.h userspace header before kernel headers, and
suppress the indirect include of linux/sysinfo.h.
Cc: Aya Levin <ayal@mellanox.com> Cc: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Baruch Siach [Thu, 27 Jun 2019 18:37:18 +0000 (21:37 +0300)]
devlink: fix format string warning for 32bit targets
32bit targets define uint64_t as long long unsigned. This leads to the
following build warning:
devlink.c: In function ‘pr_out_u64’:
devlink.c:1729:11: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
pr_out("%s %lu", name, val);
^
devlink.c:59:21: note: in definition of macro ‘pr_out’
fprintf(stdout, ##args); \
^~~~
Use uint64_t specific conversion specifiers in the format string to fix
that.
Cc: Aya Levin <ayal@mellanox.com> Cc: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Tue, 25 Jun 2019 10:29:57 +0000 (12:29 +0200)]
ip address: do not set mngtmpaddr option for IPv4 addresses
'mngtmpaddr' option make the kernel manage temporary addresses
created from the specified one as template on behalf of Privacy
Extensions (RFC3041). This option should be available only for
IPv6 addresses, as correctly stated in the manpage.
However it is possible to set mngtmpaddr on IPv4 addresses, too:
$ ip link add dummy0 type dummy
$ ip -4 addr add 192.168.1.1 dev dummy0 mngtmpaddr
$ ip a
1: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 1a:6d:c6:96:ca:f8 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.1/32 scope global mngtmpaddr dummy0
valid_lft forever preferred_lft forever
Fix this adding a check on the protocol family before setting
IFA_F_MANAGETEMPADDR flag.
Fixes: 5b7e21c417bea ("add support for IFA_F_MANAGETEMPADDR") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Tue, 25 Jun 2019 10:29:56 +0000 (12:29 +0200)]
ip address: do not set home option for IPv4 addresses
'home' option designates a IPv6 address as "home address" as
defined in RFC 6275. This option should be available only for
IPv6 addresses, as correctly stated in the manpage.
However it is possible to set home on IPv4 addresses, too:
$ ip link add dummy0 type dummy
$ ip -4 addr add 192.168.1.1 dev dummy0 home
$ ip a
1: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 1a:6d:c6:96:ca:f8 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.1/32 scope global home dummy0
valid_lft forever preferred_lft forever
Fix this adding a check on the protocol family before setting
IFA_F_HOMEADDRESS flag.
Fixes: bac735c53a36d ("enabled to manipulate the flags of IFA_F_HOMEADDRESS or IFA_F_NODAD from ip.") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Tue, 25 Jun 2019 10:29:55 +0000 (12:29 +0200)]
ip address: do not set nodad option for IPv4 addresses
Duplicate Address Detection (RFC 4862) is available only for IPv6
addresses. As a consequence, 'nodad' option, turning it off, should
be available only for IPv6, and is defined like that in the man page.
However it is possible to set nodad on IPv4 addresses, too:
$ ip link add dummy0 type dummy
$ ip -4 addr add 192.168.1.1 dev dummy0 nodad
$ ip a
1: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 1a:6d:c6:96:ca:f8 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.1/32 scope global nodad dummy0
valid_lft forever preferred_lft forever
Fix this adding a check on the protocol family before setting
IFA_F_NODAD flag.
Fixes: bac735c53a36d ("enabled to manipulate the flags of IFA_F_HOMEADDRESS or IFA_F_NODAD from ip.") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Stefano Brivio [Tue, 25 Jun 2019 11:41:24 +0000 (13:41 +0200)]
iproute: Set flags and attributes on dump to get IPv6 cached routes to be flushed
With a current (5.1) kernel version, IPv6 exception routes can't be listed
(ip -6 route list cache) or flushed (ip -6 route flush cache). Kernel
support for this is being added back. Relevant net-next commits:
564c91f7e563 fib_frontend, ip6_fib: Select routes or exceptions dump from RTM_F_CLONED ef11209d4219 Revert "net/ipv6: Bail early if user only wants cloned entries" 3401bfb1638e ipv6/route: Don't match on fc_nh_id if not set in ip6_route_del() bf9a8a061ddc ipv6/route: Change return code of rt6_dump_route() for partial node dumps 1e47b4837f3b ipv6: Dump route exceptions if requested 40cb35d5dc04 ip6_fib: Don't discard nodes with valid routing information in fib6_locate_1()
However, to allow the kernel to filter routes based on the RTM_F_CLONED
flag, we need to make sure this flag is always passed when we want cached
routes to be dumped, and we can also pass table and output interface
attributes to have the kernel filtering on them, if requested by the user.
Use the existing iproute_dump_filter() as a filter for the dump request in
iproute_flush(). This way, 'ip -6 route flush cache' works again.
v2: Instead of creating a separate 'filter' function dealing with
RTM_F_CACHED only, use the existing iproute_dump_filter() and get
table and oif kernel filtering for free. Suggested by David Ahern.
Hangbin Liu [Wed, 26 Jun 2019 01:44:07 +0000 (09:44 +0800)]
ip/iptoken: fix dump error when ipv6 disabled
When we disable IPv6 from the start up (ipv6.disable=1), there will be
no IPv6 route info in the dump message. If we return -1 when
ifi->ifi_family != AF_INET6, we will get error like
$ ip token list
Dump terminated
which will make user feel confused. There is no need to return -1 if the
dump message not match. Return 0 is enough.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Eyal Birger [Mon, 24 Jun 2019 15:14:57 +0000 (18:14 +0300)]
tc: adjust xtables_match and xtables_target to changes in recent iptables
iptables commit 933400b37d09 ("nft: xtables: add the infrastructure to translate from iptables to nft")
added an additional member to struct xtables_match and struct xtables_target.
This change is available for libxtables12 and up.
Add these members conditionally to support both newer and older versions.
Fixes: dd29621578d2 ("tc: add em_ipt ematch for calling xtables matches from tc matching context") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Matteo Croce [Tue, 18 Jun 2019 14:49:35 +0000 (16:49 +0200)]
netns: make netns_{save,restore} static
The netns_{save,restore} functions are only used in ipnetns.c now, since
the restore is not needed anymore after the netns exec command.
Move them in ipnetns.c, and make them static.
Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Matteo Croce [Tue, 18 Jun 2019 14:49:34 +0000 (16:49 +0200)]
ip vrf: use hook to change VRF in the child
On vrf exec, reset the VRF associations in the child process, via the
new hook added to cmd_exec(). In this way, the parent doesn't have to
reset the VRF associations before spawning other processes.
Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Matteo Croce [Tue, 18 Jun 2019 14:49:33 +0000 (16:49 +0200)]
netns: switch netns in the child when executing commands
'ip netns exec' changes the current netns just before executing a child
process, and restores it after forking. This is needed if we're running
in batch or do_all mode.
Some cleanups must be done both in the parent and in the child: the
parent must restore the previous netns, while the child must reset any
VRF association.
Unfortunately, if do_all is set, the VRF are not reset in the child, and
the spawned processes are started with the wrong VRF context. This can
be triggered with this script:
# ip -b - <<-'EOF'
link add type vrf table 100
link set vrf0 up
link add type dummy
link set dummy0 vrf vrf0 up
netns add ns1
EOF
# ip -all -b - <<-'EOF'
vrf exec vrf0 true
netns exec setsid -f sleep 1h
EOF
# ip vrf pids vrf0
314 sleep
# ps 314
PID TTY STAT TIME COMMAND
314 ? Ss 0:00 sleep 1h
Refactor cmd_exec() and pass to it a function pointer which is called in
the child before the final exec. In the netns exec case the function just
resets the VRF and switches netns.
Doing it in the child is less error prone and safer, because the parent
environment is always kept unaltered.
After this refactor some utility functions became unused, so remove them.
Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Moshe Shemesh [Tue, 11 Jun 2019 16:11:09 +0000 (19:11 +0300)]
devlink: mnlg: Catch returned error value of dumpit commands
Devlink commands which implements the dumpit callback may return error.
The netlink function netlink_dump() sends the errno value as the payload
of the message, while answering user space with NLMSG_DONE.
To enable receiving errno value for dumpit commands we have to check for
it in the message. If it is a negative value then the dump returned an
error so we should set errno accordingly and check for ext_ack in case
it was set.
Fixes: 049c58539f5d ("devlink: mnlg: Add support for extended ack") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Mahesh Bandewar [Thu, 6 Jun 2019 23:44:26 +0000 (16:44 -0700)]
ip6tunnel: fix 'ip -6 {show|change} dev <name>' cmds
Inclusion of 'dev' is allowed by the syntax but not handled
correctly by the command. It produces no output for show
command and falsely successful for change command but does
not make any changes.
can be verified with the following steps
# ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote fd::2 tos inherit ttl 127 encaplimit none
# ip -6 tunnel show ip6tnl1
<correct output>
# ip -6 tunnel show dev ip6tnl1
<no output but correct output after this change>
# ip -6 tunnel change dev ip6tnl1 local 2001:1234::1 remote 2001:1234::2 encaplimit none ttl 127 tos inherit allow-localremote
# echo $?
0
# ip -6 tunnel show ip6tnl1
<no changes applied, but changes are correctly applied after this change>
Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Matteo Croce [Fri, 7 Jun 2019 20:41:22 +0000 (22:41 +0200)]
ip: reset netns after each command in batch mode
When creating a new netns or executing a program into an existing one,
the unshare() or setns() calls will change the current netns.
In batch mode, this can run commands on the wrong interfaces, as the
ifindex value is meaningful only in the current netns. For example, this
command fails because veth-c doesn't exists in the init netns:
# ip -b - <<-'EOF'
netns add client
link add name veth-c type veth peer veth-s netns client
addr add 192.168.2.1/24 dev veth-c
EOF
Cannot find device "veth-c"
Command failed -:7
But if there are two devices with the same name in the init and new netns,
ip will build a wrong ll_map with indexes belonging to the new netns,
and will execute actions in the init netns using this wrong mapping.
This script will flush all eth0 addresses and bring it down, as it has
the same ifindex of veth0 in the new netns:
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.76/24 brd 192.168.122.255 scope global dynamic eth0
valid_lft 3598sec preferred_lft 3598sec
# ip -b - <<-'EOF'
netns add client
link add name veth0 type veth peer name veth1
link add name veth-ns type veth peer name veth0 netns client
link set veth0 down
address flush veth0
EOF
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
3: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether c2:db:d0:34:13:4a brd ff:ff:ff:ff:ff:ff
4: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether ca:9d:6b:5f:5f:8f brd ff:ff:ff:ff:ff:ff
5: veth-ns@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 32:ef:22:df:51:0a brd ff:ff:ff:ff:ff:ff link-netns client
The same issue can be triggered by the netns exec subcommand with a
sligthy different script:
# ip netns add client
# ip -b - <<-'EOF'
netns exec client true
link add name veth0 type veth peer name veth1
link add name veth-ns type veth peer name veth0 netns client
link set veth0 down
address flush veth0
EOF
Fix this by adding two netns_{save,reset} functions, which are used
to get a file descriptor for the init netns, and restore it after
each batch command.
netns_save() is called before the unshare() or setns(),
while netns_restore() is called after each command.
Fixes: 0dc34c7713bb ("iproute2: Add processless network namespace support") Reviewed-and-tested-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Davide Caratti [Tue, 4 Jun 2019 22:30:16 +0000 (00:30 +0200)]
tc: simple: don't hardcode the control action
the following TDC test case:
b776 - Replace simple action with invalid goto chain control
checks if the kernel correctly validates the 'goto chain' control action,
when it is specified in 'act_simple' rules. The test systematically fails
because the control action is hardcoded in parse_simple(), i.e. it is not
parsed by command line arguments, so its value is constantly TC_ACT_PIPE.
Because of that, the following command:
# tc action add action simple sdata "test" drop index 7
installs an 'act_simple' rule that never drops packets, and whose 'index'
is the first IDR available, plus an 'act_gact' rule with 'index' equal to
7, that drops packets.
Use parse_action_control_dflt(), like we did on many other TC actions, to
make the control action configurable also with 'act_simple'. The expected
results of test b776 are summarized below:
Roman Mashak [Thu, 6 Jun 2019 21:32:09 +0000 (17:32 -0400)]
tc: Fix binding of gact action by index.
The following operation fails:
% sudo tc actions add action pipe index 1
% sudo tc filter add dev lo parent ffff: \
protocol ip pref 10 u32 match ip src 127.0.0.2 \
flowid 1:10 action gact index 1
Bad action type index
Usage: ... gact <ACTION> [RAND] [INDEX]
Where: ACTION := reclassify | drop | continue | pass | pipe |
goto chain <CHAIN_INDEX> | jump <JUMP_COUNT>
RAND := random <RANDTYPE> <ACTION> <VAL>
RANDTYPE := netrand | determ
VAL : = value not exceeding 10000
JUMP_COUNT := Absolute jump from start of action list
INDEX := index value used
However, passing a control action of gact rule during filter binding works:
% sudo tc filter add dev lo parent ffff: \
protocol ip pref 10 u32 match ip src 127.0.0.2 \
flowid 1:10 action gact pipe index 1
Binding by reference, i.e. by index, has to consistently work with
any tc action.
Since tc is sensitive to the order of keywords passed on the command line,
we can teach gact to skip parsing arguments as soon as it sees 'gact'
followed by 'index' keyword.
Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Parav Pandit [Thu, 6 Jun 2019 11:49:19 +0000 (06:49 -0500)]
devlink: Increase bus, device buffer size to 64 bytes
Device name on mdev bus is 36 characters long which follow standard uuid
RFC 4122.
This is probably the longest name that a kernel will return for a
device.
Nicolas Dichtel [Wed, 29 May 2019 14:42:10 +0000 (16:42 +0200)]
iplink: don't try to get ll addr len when creating an iface
It will obviously fail. This is a follow up of the
commit 757837230a65 ("lib: suppress error msg when filling the cache").
Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
While I fixed the mdb json output, I did overlook the text output.
This patch returns the original text output format:
dev <bridge> port <port> grp <mcast group> <temp|permanent> <flags> <timer>
Example (old format, restored by this patch):
dev br0 port eth8 grp 239.1.1.11 temp
Example (changed format after the commit below):
23: br0 eth8 239.1.1.11 temp
We had some reports of failing scripts which were parsing the output.
Also the old format matches the bridge mdb command syntax which makes
it easier to build commands out of the output.
Fixes: c7c1a1ef51ae ("bridge: colorize output and use JSON print library") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Lukasz Czapnik [Mon, 27 May 2019 21:03:49 +0000 (23:03 +0200)]
tc: flower: fix port value truncation
sscanf truncates read port values silently without any error. As sscanf
man says:
(...) sscanf() conform to C89 and C99 and POSIX.1-2001. These standards
do not specify the ERANGE error.
Replace sscanf with safer get_be16 that returns error when value is out
of range.
Example:
tc filter add dev eth0 protocol ip parent ffff: prio 1 flower ip_proto
tcp dst_port 70000 hw_tc 1
Would result in filter for port 4464 without any warning.
Fixes: 8930840e678b ("tc: flower: Classify packets based port ranges") Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Nicolas Dichtel [Fri, 24 May 2019 08:59:10 +0000 (10:59 +0200)]
lib: suppress error msg when filling the cache
Before the patch:
$ ip netns add foo
$ ip link add name veth1 address 2a:a5:5c:b9:52:89 type veth peer name veth2 address 2a:a5:5c:b9:53:90 netns foo
RTNETLINK answers: No such device
RTNETLINK answers: No such device
But the command was successful. This may break script. Let's remove those
error messages.
Fixes: 55870dfe7f8b ("Improve batch and dump times by caching link lookups") Reported-by: Philippe Guibert <philippe.guibert@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Paolo Abeni [Mon, 20 May 2019 09:56:52 +0000 (11:56 +0200)]
m_mirred: don't bail if the control action is missing
The mirred act admits an optional control action, defaulting
to TC_ACT_PIPE. The parsing code currently emits an error message
if the control action is not provided on the command line, even
if the command itself completes with no error.
This change shuts down the error message, using the appropriate
parsing helper.
Fixes: e67aba559581 ("tc: actions: add helpers to parse and print control actions") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Phil Sutter [Mon, 6 May 2019 17:09:56 +0000 (19:09 +0200)]
ip-xfrm: Respect family in deleteall and list commands
Allow to limit 'ip xfrm {state|policy} list' output to a certain address
family and to delete all states/policies by family.
Although preferred_family was already set in filters, the filter
function ignored it. To enable filtering despite the lack of other
selectors, filter.use has to be set if family is not AF_UNSPEC.
Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Zhiqiang Liu [Sun, 5 May 2019 01:59:51 +0000 (09:59 +0800)]
ipnetns: use-after-free problem in get_netnsid_from_name func
Follow the following steps:
# ip netns add net1
# export MALLOC_MMAP_THRESHOLD_=0
# ip netns list
then Segmentation fault (core dumped) will occur.
In get_netnsid_from_name func, answer is freed before
rta_getattr_u32(tb[NETNSA_NSID]), where tb[] refers to answer`s
content. If we set MALLOC_MMAP_THRESHOLD_=0, mmap will be adoped to
malloc memory, which will be freed immediately after calling free
func. So reading tb[NETNSA_NSID] will access the released memory
after free(answer).
Here, we will call get_netnsid_from_name(tb[NETNSA_NSID]) before free(answer).
Fixes: 86bf43c7c2f ("lib/libnetlink: update rtnl_talk to support malloc buff at run time") Reported-by: Huiying Kou <kouhuiying@huawei.com> Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
taprio: Add support for cycle_time and cycle_time_extension
This allows a cycle-time and a cycle-time-extension to be specified.
Specifying a cycle-time will truncate that cycle, so when that instant
is reached, the cycle will start from its beginning.
A cycle-time-extension may cause the last entry of a cycle, just
before the start of a new schedule (the base-time of the "admin"
schedule) to be extended by at maximum "cycle-time-extension"
nanoseconds. The idea of this feauture, as described by the IEEE
802.1Q, is too avoid too narrow gate states.
Example:
tc qdisc change dev IFACE parent root handle 100 taprio \
sched-entry S 0x1 1000000 \
sched-entry S 0x0 2000000 \
sched-entry S 0x1 3000000 \
sched-entry S 0x0 4000000 \
cycle-time-extension 100000 \
cycle-time 9000000 \
base-time 12345678900000000
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David Ahern <dsahern@gmail.com>
This allows for a new schedule to be specified during runtime, without
removing the current one.
For that, the semantics of the 'tc qdisc change' operation in the
context of taprio is that if "change" is called and there is a running
schedule, a new schedule is created and the base-time (let's call it
X) of this new schedule is used so at instant X, it becomes the
"current" schedule. So, in short, "change" doesn't change the current
schedule, it creates a new one and sets it up to it becomes the
current one at some point.
In IEEE 802.1Q terms, it means that we have support for the
"Oper" (current and read-only) and "Admin" (future and mutable)
schedules.
Example of creating the first schedule, then adding a new one:
David Ahern [Thu, 2 May 2019 23:13:21 +0000 (16:13 -0700)]
uapi: wrap SIOCGSTAMP and SIOCGSTAMPNS in ifndef
These warnings:
../include/uapi/linux/sockios.h:42:0: warning: "SIOCGSTAMP" redefined
../include/uapi/linux/sockios.h:43:0: warning: "SIOCGSTAMPNS" redefined
are from kernel commit 0768e17073dc5 ("net: socket: implement 64-bit
timestamps"). This commit moved the definitions of SIOCGSTAMP and
SIOCGSTAMPNS from include/asm-generic/sockios.h to
include/uapi/linux/sockios.h. Older OS'es already define them in
/usr/include/asm-generic/sockios.h resulting in ugly compile errors now:
In file included from ll_types.c:24:0:
../include/uapi/linux/sockios.h:42:0: warning: "SIOCGSTAMP" redefined
#define SIOCGSTAMP SIOCGSTAMP_OLD
In file included from /usr/include/x86_64-linux-gnu/asm/sockios.h:1:0,
from /usr/include/asm-generic/socket.h:5,
from /usr/include/x86_64-linux-gnu/asm/socket.h:1,
from /usr/include/x86_64-linux-gnu/bits/socket.h:368,
from /usr/include/x86_64-linux-gnu/sys/socket.h:38,
from ll_types.c:17:
/usr/include/asm-generic/sockios.h:11:0: note: this is the location of the previous definition
#define SIOCGSTAMP 0x8906 /* Get stamp (timeval) */
Josh Hunt [Wed, 1 May 2019 01:38:38 +0000 (21:38 -0400)]
ss: add option to print socket information on one line
Multi-line output in ss makes it difficult to search for things with
grep. This new option will make it easier to find sockets matching
certain criteria with simple grep commands.
Example without option:
$ ss -emoitn
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 127.0.0.1:13265 127.0.0.1:36743 uid:1974 ino:48271 sk:1 <->
skmem:(r0,rb2227595,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:7,7 rto:211 rtt:10.245/16.616 ato:40 mss:65483 cwnd:10 bytes_acked:41865496 bytes_received:21580440 segs_out:242496 segs_in:351446 data_segs_out:242495 data_segs_in:242495 send 511.3Mbps lastsnd:2383 lastrcv:2383 lastack:2342 pacing_rate 1022.6Mbps rcv_rtt:92427.6 rcv_space:43725 minrtt:0.007
Example with new option:
$ ss -emoitnO
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 127.0.0.1:13265 127.0.0.1:36743 uid:1974 ino:48271 sk:1 <-> skmem:(r0,rb2227595,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:7,7 rto:211 rtt:10.067/16.429 ato:40 mss:65483 pmtu:65535 rcvmss:536 advmss:65483 cwnd:10 bytes_sent:41868244 bytes_acked:41868244 bytes_received:21581866 segs_out:242512 segs_in:351469 data_segs_out:242511 data_segs_in:242511 send 520.4Mbps lastsnd:14355 lastrcv:14355 lastack:14314 pacing_rate 1040.7Mbps delivery_rate 74837.7Mbps delivered:242512 app_limited busy:1861946ms rcv_rtt:92427.6 rcv_space:43725 rcv_ssthresh:43690 minrtt:0.007
Signed-off-by: Josh Hunt <johunt@akamai.com> Signed-off-by: David Ahern <dsahern@gmail.com>
This patch updates the tc-bpf.8 example application for changes to the
struct bpf_elf_map definition. In it's current form, things compile, but
the resulting object file is rejected by the verifier when attempting to
load it through tc.
Signed-off-by: Lucas Siba <lucas.siba@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com>
[ dropped the unnecessary flags initialization on commit ]
Mike Manning [Sat, 20 Apr 2019 10:45:37 +0000 (11:45 +0100)]
iplink_vlan: add support for VLAN bridge binding flag
This patch adds support for the VLAN bridge binding flag that is
provided in net-next kernel by the series merged by 1ab839281cf7
("net-support-binding-vlan-dev-link-state-to-vlan-member-bridge-ports")
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Thomas Haller [Tue, 23 Apr 2019 07:16:14 +0000 (09:16 +0200)]
iprule: refactor print_rule() to use leading space before printing attribute
When printing the actions, we avoid adding the trailing space after the
attribute. Possibly because we expect the action to be the last output
on the line and not end with a space.
But for FR_ACT_TO_TBL nothing is printed. That means, we add double
spaces if a protocol is printed as well:
# ip rule add priority 10 protocol 10 type 1
will be printed as
10: from all lookup 1 proto mrt
The only visible effect of the patch is to avoid the double-space and
avoid a trailing space if the action is FR_ACT_TO_TBL.
Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Thomas Haller [Tue, 23 Apr 2019 07:16:13 +0000 (09:16 +0200)]
iprule: avoid trailing space in print_rule() after printing protocol
It seems print_rule() tries to avoid a trailing space at the end
of the line. At least, when printing details about the actions,
they no longer append the space. Probably expecting to be the
last attribute that will be printed.
Don't let the protocol add the trailing space. The space at the end
of the line should be printed consistently (or not).
Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Kristian Evensen [Mon, 22 Apr 2019 15:27:41 +0000 (17:27 +0200)]
ip fou: Support binding FOU ports
This patch adds support for binding FOU ports using iproute2.
Kernel-support was added in 1713cb37bf67 ("fou: Support binding FoU
socket").
The parse function now handles new arguments for setting the
binding-related attributes, while the print function writes the new
attributes if they are set. Also, the man page has been updated.
v2->v3:
* Remove redundant ll_init_map()-calls (thanks David Ahern).
v1->v2 (all changes suggested by David Ahern):
* Fix reverse Christmas tree ordering.
* Remove redundant peer_port_set-variable, it is enough to check
peer_port.
* Add proper error handling of invalid local/peer addresses.
* Use interface name and not index.
* Remove updating fou-header file, it is already done.
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com>
iplink: bridge: add support for vlan_stats_per_port
Add support for manipulating and showing the vlan_stats_per_port bridge
option which can be toggled only when there are no port VLANs
configured. Also update the man page with the new option.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Each of the commits below broke the vlan stats output in a different
way:
- 45fca4ed9412 ("bridge: fix vlan show stats formatting")
Added a second print of an interface name (e.g. eth4eth4)
- c7c1a1ef51ae ("bridge: colorize output and use JSON print library")
Broke normal vlan stats output by not printing a new line after them
Also printed interfaces without any vlans when printing stats
This fix is not pretty but it brings back the previous behaviour.
Before this fix:
$ bridge -s vlan show
port vlan id
br0br0 1 PVID Egress Untagged
RX: 0 bytes 0 packets
TX: 0 bytes 0 packets 4
RX: 0 bytes 0 packets
TX: 0 bytes 0 packetseth4eth4 4
RX: 0 bytes 0 packets
TX: 0 bytes 0 packetsroot@debian:~/
After this fix:
$ bridge -s vlan show
port vlan id
br0 1 PVID Egress Untagged
RX: 0 bytes 0 packets
TX: 0 bytes 0 packets
4
RX: 0 bytes 0 packets
TX: 0 bytes 0 packets
eth4 4
RX: 0 bytes 0 packets
TX: 0 bytes 0 packets
Fixes: 45fca4ed9412 ("bridge: fix vlan show stats formatting") Fixes: c7c1a1ef51ae ("bridge: colorize output and use JSON print library") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Since the commit below mdb's json output has been invalid and also with
changed format. Restore it to a valid json like the previous format.
Also takes care of a double "Deleted" print when monitoring for changes.
Fixes: c7c1a1ef51ae ("bridge: colorize output and use JSON print library") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This adds support for the newly added fwmark option to CAKE, which allows
overriding the tin selection from the per-packet firewall marks. The fwmark
field is a bitmask that is applied to the fwmark to select the tin.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
David Ahern [Wed, 3 Apr 2019 19:05:17 +0000 (12:05 -0700)]
Merge branch 'rdma-dynamic-link-create' into next
Steve Wise says:
====================
This series adds rdmatool support for creating/deleting rdma links.
This will be used, mainly, by soft rdma drivers to allow adding/deleting
rdma links over netdev interfaces. It provides the user side for
the following kernel changes merged in linux-5.1.
Changes since v2:
- move checks for required parameters in the parameter handlers
- move final 'link add' processing to link_add_netdev()
- added reviewed-by tags
Changes since v1:
- move error receive checking from rd_sendrecv_msg() to rd_recv_msg().
- Add rd->suppress_errors to allow control over whether errors when
reading a response should be ignored. Namely: resource queries can
get errors like "none found" when querying for resources, and this
error should not be displayed. So on a rd object basis, error
suppression can be controlled.
- Rebased on rdma/for-next UABI (no need to sync rdma_netlink.h now)
- use chains of struct rd_cmd and rd_exec_cmd vs open coding the parsing
for the 'link add' command.
- minor nit resolution
- added .mailmap file. If this is not desired for iproute2, then please
drop the patch.
Changes since RFC:
- add rd_sendrecv_msg() and make use of it in dev_set as well
as the new link commands.
- fixed problems with the man pages
- changed the command line to use "netdev" as the keyword
for the network device, do avoid confused with the ib_device
name.
- got rid of the "type" parameter for link delete. Also pass
down the device index instead of the name, using the common
rd services for validating the device name and fetching the
index.
Steve Wise [Wed, 3 Apr 2019 17:10:32 +0000 (12:10 -0500)]
rdma: man page update for link add/delete
Update the 'rdma link' man page with 'link add/delete' info.
Signed-off-by: Steve Wise <larrystevenwise@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Steve Wise [Wed, 3 Apr 2019 17:10:31 +0000 (12:10 -0500)]
rdma: add 'link add/delete' commands
Add new 'link' subcommand 'add' and 'delete' to allow binding a soft-rdma
device to a netdev interface.
EG:
rdma link add rxe_eth0 type rxe netdev eth0
rdma link delete rxe_eth0
Signed-off-by: Steve Wise <larrystevenwise@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Steve Wise [Wed, 3 Apr 2019 17:10:30 +0000 (12:10 -0500)]
rdma: add helper rd_sendrecv_msg()
This function sends the constructed netlink message and then
receives the response.
Change rd_recv_msg() to display any error messages.
Change 'rdma dev set' to use rd_sendrecv_msg().
Signed-off-by: Steve Wise <larrystevenwise@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Steve Wise [Wed, 3 Apr 2019 17:10:29 +0000 (12:10 -0500)]
Add .mailmap file
.mailmap allows tracking multiple email addresses to the proper user name.
Signed-off-by: Steve Wise <larrystevenwise@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com>
Hoang Le [Fri, 22 Mar 2019 08:47:33 +0000 (15:47 +0700)]
tipc: add link broadcast set method and ratio
The command added here makes it possible to forcibly configure the
broadcast link to use either broadcast or replicast, in addition to
the already existing auto selection algorithm.
A sample usage is shown below:
$tipc link set broadcast BROADCAST
$tipc link set broadcast AUTOSELECT ratio 25
$tipc link set broadcast -h
Usage: tipc link set broadcast PROPERTY
PROPERTIES
BROADCAST - Forces all multicast traffic to be
transmitted via broadcast only,
irrespective of cluster size and number
of destinations
REPLICAST - Forces all multicast traffic to be
transmitted via replicast only,
irrespective of cluster size and number
of destinations
AUTOSELECT - Auto switching to broadcast or replicast
depending on cluster size and destination
node number
ratio SIZE - Set the AUTOSELECT criteria, percentage of
destination nodes vs cluster size
Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David Ahern <dsahern@gmail.com>