Ben Pfaff [Wed, 24 Oct 2018 21:23:38 +0000 (14:23 -0700)]
connmgr: Make treatment of active and passive connections more uniform.
Until now, connmgr has handled active and passive OpenFlow connections in
quite different ways. Any active connection, whether it was currently
connected or not, was always maintained as an ofconn. Whenever such a
connection (re)connected, its settings were cleared. On the other hand,
passive connections had a separate listener which created an ofconn when
a new connection came in, and these ofconns would be deleted when such a
connection was closed. This approach is inelegant and has occasionally
led to bugs when reconnection didn't clear all of the state that it
should have.
There's another motivation here. Currently, active connections are
always primary controllers and passive connections are always service
controllers (as documented in ovs-vswitchd.conf.db(5)). Sometimes it would
be useful to have passive primary controllers (maybe active service
controllers too but I haven't personally run into that use case). As is,
this is difficult to implement because there is so much different code in
use between active and passive connections. This commit will make it
easier.
Ilya Maximets [Tue, 5 Feb 2019 07:16:04 +0000 (10:16 +0300)]
travis: Speed up linux kernel downloads.
CDN links are much faster on average. https://www.kernel.org/
links usually show less than 10 MB/s, while https://cdn.kernel.org/
can give up to 200 MB/s and usually shows speeds much higher than
10 MB/s. Also, 'xz' archives are 30-50 MB smaller than gzip ones.
It takes a bit more time to unpack them, but that is negligible
compared with the download time.
For example,
linux-3.16.54.tar.gz - 122064395 (116M)
linux-3.16.54.tar.xz - 81057528 (77M)
Downloading the 'xz' archive via the CDN link is the default kernel
download method offered by kernel.org.
psiyengar [Thu, 17 Jan 2019 00:53:52 +0000 (16:53 -0800)]
Fix OpenFlow v1.3.4 Conf test failures: 430.500, 430.510
This commit adds additional verification to nx_pull_header__()
in lib/nx-match.c to distinguish between bad match and bad action
header conditions and return the appropriate error type/code.
Signed-off-by: Prashanth Iyengar <prashanth_iyengar@alliedtelesis.com> Reviewed-by: Tony van der Peet <tony.vanderpeet@alliedtelesis.co.nz> Reviewed-by: Rahul Gupta <Rahul_Gupta@alliedtelesis.com>
Ilya Maximets [Tue, 29 Jan 2019 13:09:55 +0000 (16:09 +0300)]
skiplist: Drop data comparison in skiplist_delete.
The current version of 'skiplist_delete' uses the data comparator to check
whether the node that we're removing exists on the current level, i.e.
whether our node 'x' is the next of update[i] on level i.
But it's enough to just compare pointers for that purpose.
Here is a small example of how the data structure looks at
this moment:
i a b c x d e f
0 [ ]>[ ]>[*] ---> [ ] ---> [#]>[ ]>[ ]
1 [ ]>[*] -------> [ ] -------> [#]>[ ]
2 [ ]>[*] -------> [ ] -----------> [#]
3 [ ]>[*] ------------------------> [ ]
4 [*] ----------------------------> [ ]
0 1 2 3 4
update[] = { c, b, b, b, a }
x.forward[] = { d, e, f }
c.forward[0] = x
b.forward[1] = x
b.forward[2] = x
b.forward[3] = f
a.forward[4] = f
Target:
i a b c d e f
0 [ ]>[ ]>[*] ------------> [#]>[ ]>[ ]
1 [ ]>[*] --------------------> [#]>[ ]
2 [ ]>[*] ------------------------> [#]
3 [ ]>[*] ------------------------> [ ]
4 [*] ----------------------------> [ ]
c.forward[0] = x.forward[0] = d
b.forward[1] = x.forward[1] = e
b.forward[2] = x.forward[2] = f
b.forward[3] = f
a.forward[4] = f
i.e. we're updating forward pointers while update[i].forward[i] == x.
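The pointer-only check described above can be sketched as follows. This is
a minimal illustration, not the actual lib/skiplist.c code: the struct
layout, the MAX_LEVEL constant and the unlink_node() name are assumptions
made for the example. It stops at the first level where update[i] no longer
points at 'x', without ever calling a data comparator.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEVEL 4              /* Hypothetical maximum level index. */

struct node {                    /* Hypothetical node layout. */
    int key;
    struct node *forward[MAX_LEVEL + 1];
};

/* Unlinks 'x' from every level on which it is linked.  update[i] is the
 * last node before 'x' on level i; 'level' is the list's top level. */
void
unlink_node(struct node *update[], struct node *x, int level)
{
    assert(x != NULL);
    for (int i = 0; i <= level; i++) {
        if (update[i]->forward[i] != x) {
            break;               /* 'x' is not present on this level. */
        }
        update[i]->forward[i] = x->forward[i];
    }
}
```

With the layout from the diagram, the loop rewrites levels 0 to 2 and stops
at level 3, where b.forward[3] already points at f.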
Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Fri, 25 Jan 2019 20:22:06 +0000 (12:22 -0800)]
skiplist: Remove 'height' from skiplist_node.
This member was write-only: it was initialized and never used later on.
Thanks to Esteban Rodriguez Betancourt <estebarb@hpe.com> for the
following additional rationale:
In this case you are right, the "height" member is not only not
used, it is in fact not required, and can be safely removed,
without causing security issues.
The code can't read past the end of the 'forward' array because of
the skiplist "level" member, which specifies the maximum height of
the whole skiplist.
The "level" field is updated in insertions and deletions, so that
in insertion the root node will point to the newly created item
(if there isn't a list there yet). At the deletions, if the
deleted item is the last one at that height then the root is
modified to point to NULL at that height, and the whole skiplist
height is decremented.
For the forward_to case:
- If a node is found in a list of level/height N, then it has
height N (that's why it was inserted in that list)
- forward_to travels through nodes in the same level, so it is
safe, as it doesn't go up.
- If a node has height N, then it belongs to all the lists
initiated at root->forward[n, n-1 ,n-2, ..., 0]
- forward_to may go to lower levels, but that is safe, because of
previous point.
So, the protection is given by the "level" field in skiplist root
node, and it is enough to guarantee that the code won't go off
limits at 'forward' array. But yes, the height field is unused,
unneeded, and can be removed safely.
CC: Esteban Rodriguez Betancourt <estebarb@hpe.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Darrell Ball [Tue, 5 Feb 2019 00:02:15 +0000 (16:02 -0800)]
conntrack: Fix possibly uninitialized memory.
There are a few cases where struct 'conn_key' padding may be unspecified
according to the C standard. In practice, implementations don't seem to
have an issue, but it is better to be safe. The code paths modified are not
hot ones. Fix this by doing a memcpy in these cases in lieu of a
structure copy.
Found by inspection.
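A minimal sketch of the difference, assuming a hypothetical 'demo_key'
struct that stands in for 'conn_key' (the real structure has other members).
With a structure assignment, the C standard lets padding bytes take
unspecified values; memcpy copies the whole object representation, padding
included, so later byte-wise hashing or memcmp sees the same bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct demo_key {        /* Hypothetical stand-in for struct conn_key. */
    uint8_t proto;       /* Typically followed by padding bytes. */
    uint32_t addr;
};

/* Copies 'src' to 'dst' byte for byte, padding included.  A plain
 * '*dst = *src' would leave the padding in 'dst' unspecified. */
void
demo_key_copy(struct demo_key *dst, const struct demo_key *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}
```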
Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ashish Varma [Mon, 4 Feb 2019 23:34:34 +0000 (15:34 -0800)]
ofproto-dpif-trace: Fix for the segmentation fault in ofproto_trace().
Add a check for NULL in the "next_ct_states" argument passed to the
"ofproto_trace()" function. In the normal scenario, this is non-NULL. A NULL
"next_ct_states" argument is passed from the "upcall_xlate()" function on
encountering an XLATE_RECURSION_TOO_DEEP or XLATE_TOO_MANY_RESUBMITS error.
VMware-BZ: #2282287 Signed-off-by: Ashish Varma <ashishvarma.ovs@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
The previous commit fa642f08839b
("openvswitch: Derive IP protocol number for IPv6 later frags")
introduces IP protocol number parsing for IPv6 later frags that can mess
up the network header length calculation logic, i.e. nh_len < 0.
However, the network header length calculation is mainly for deriving
the transport layer header in the key extraction process, which does not
apply to later fragments.
Therefore, this commit skips the network header length calculation to
fix the issue.
Reported-by: Chris Mi <chrism@mellanox.com> Reported-by: Greg Rose <gvrose8192@gmail.com> Fixes: fa642f08839b ("openvswitch: Derive IP protocol number for IPv6 later frags") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Fixes: 9a4ab6da01f7 ("datapath: Derive IP protocol number for IPv6 later frags") Cc: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
openvswitch: Derive IP protocol number for IPv6 later frags
Currently, OVS only parses the IP protocol number for the first
IPv6 fragment, but sets the IP protocol number for the later fragments
to be NEXTHDR_FRAGMENT. This patch tries to derive the IP protocol
number for the IPv6 later frags so that we can match on it.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> CC: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
openvswitch: Avoid OOB read when parsing flow nlattrs
For nested and variable attributes, the expected length of an attribute
is not known and marked by a negative number. This results in an OOB
read when the expected length is later used to check if the attribute is
all zeros. Fix this by using the actual length of the attribute rather
than the expected length.
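The idea can be sketched as below. This is only an illustration under
assumptions: the helper names and signatures are hypothetical and do not
mirror the kernel's flow_netlink.c. It shows why a negative "expected
length" must not be used as a byte count (converted to an unsigned size it
becomes huge) and how the actual payload length is used instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the 'len' bytes at 'p' are all zero. */
bool
bytes_all_zero(const void *p, size_t len)
{
    const uint8_t *b = p;

    for (size_t i = 0; i < len; i++) {
        if (b[i]) {
            return false;
        }
    }
    return true;
}

/* Hypothetical helper.  'expected_len' is negative for nested and
 * variable-length attributes; 'actual_len' is the payload length taken
 * from the attribute itself. */
bool
attr_payload_all_zero(const void *payload, size_t actual_len,
                      int expected_len)
{
    assert(payload != NULL || actual_len == 0);
    /* Passing 'expected_len' as the length would turn -1 into a huge
     * size_t and read far past the payload, so it is ignored here. */
    (void) expected_len;
    return bytes_all_zero(payload, actual_len);
}
```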
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Tue, 29 Jan 2019 22:18:08 +0000 (14:18 -0800)]
datapath: Add support for kernel 4.18.x
No code changes are necessary to support 4.18.x.
Only one kernel test failed and it is in the process of being fixed.
Updated .travis.yml to include 4.18.x and also use latest 4.17 version.
Updated test files to test 4.18 kernel.
Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Thu, 31 Jan 2019 23:10:00 +0000 (15:10 -0800)]
dpif-netlink: Fix a bug that causes duplicate key error in datapath
Kmod tests 122 and 123 failed and kernel reports a "Duplicate key of
type 6" error. Further debugging reveals that nl_attr_find__() should
start looking for OVS_KEY_ATTR_ETHERTYPE from the offset returned by
a previously called nl_msg_start_nested(). This patch fixes it.
Tests 122 and 123 were skipped by kernel 4.15 and older versions.
Kernel 4.16 and later kernels start showing this failure.
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Fri, 1 Feb 2019 17:56:53 +0000 (09:56 -0800)]
test: Fix failed test "flow resume with geneve tun_metadata"
Test "flow resume with geneve tun_metadata" failed because there is
no controller running to handle the continuation message. A previous
commit deleted the line that starts ovs-ofctl as a controller in
order to avoid a race condition on monitor log. This patch adds
back this line but omits the log file because this test doesn't
depend on the log file.
Fixes: e8833217914f9c071c49 ("system-traffic.at: avoid a race condition on monitor log") Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> CC: David Marchand <david.marchand@redhat.com> Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Support for match & set ICMPv6 reserved and options type fields
Currently OVS supports all ARP protocol fields as OXM match fields to
implement the relevant ARP procedures for IPv4. This includes support
for matching, copying and setting ARP fields. In IPv6, ARP has been
replaced by ICMPv6 neighbor discovery (ND) procedures, neighbor
advertisement and neighbor solicitation.
The support for ICMPv6 fields in OVS is not complete for the use cases
equivalent to ARP in IPv4. OVS lacks support for matching, copying and
setting the “ND option type” and “ND reserved” fields. Without these, users
cannot implement all ICMPv6 ND procedures for IPv6 support.
This commit adds additional OXM fields to OVS for ICMPv6 “ND option type”
and ICMPv6 “ND reserved” using the OXM extension mechanism. This allows
support for parsing these fields from an ICMPv6 packet header and extending
the OpenFlow protocol with specifications for these new OXM fields for
matching, copying and setting.
Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com> Co-authored-by: Ashvin Lakshmikantha <ashvin.lakshmikantha@ericsson.com> Signed-off-by: Ashvin Lakshmikantha <ashvin.lakshmikantha@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Fri, 1 Feb 2019 23:56:04 +0000 (15:56 -0800)]
odp-util: Stop parse odp actions if nlattr is overflow
`encap = nl_msg_start_nested(key, OVS_KEY_ATTR_ENCAP)` ensures that
key->size >= (encap + NLA_HDRLEN), so the `if` statement is safe.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11306 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Sat, 2 Feb 2019 00:44:26 +0000 (16:44 -0800)]
ofp-actions: Set an action depth limit to prevent stackoverflow by ofpacts_parse
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=12557 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Hyong Youb Kim [Sat, 2 Feb 2019 07:19:40 +0000 (23:19 -0800)]
ovs-tcpdump: Fix an undefined variable
Run ovs-tcpdump without --span, and it throws the following
exception. Define mirror_select_all to avoid the error.
Traceback (most recent call last):
File "/usr/local/bin/ovs-tcpdump", line 488, in <module>
main()
File "/usr/local/bin/ovs-tcpdump", line 454, in main
mirror_select_all)
UnboundLocalError: local variable 'mirror_select_all' referenced before assignment
Fixes: 0475db71c650 ("ovs-tcpdump: Add --span to mirror all ports on bridge.") Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Numan Siddique [Mon, 4 Feb 2019 16:50:31 +0000 (22:20 +0530)]
ovn-controller: Fix chassisredirect port flapping when ovs-vswitchd crashes
When ovs-vswitchd crashes on a chassis for some reason, the BFD status doesn't
get updated in the ovs db. ovn-controller keeps reading the old BFD status
even though ovs-vswitchd has crashed. If ovs-vswitchd crashes on the master
chassis, this results in the chassisredirect port claim flapping between the
master chassis and the chassis with the next higher priority.
All the other chassis notice that the BFD status with the master chassis is
down, and hence the chassis with the next higher priority claims the port. But
according to the master chassis, the BFD status is fine and it claims back the
chassisredirect port. This results in flapping. The issue gets resolved
when ovs-vswitchd comes back, but until then it leads to a lot of SB DB
transactions and high CPU usage in the ovn-controllers.
This patch fixes the issue by checking the OF connection status of
ovn-controller with ovs-vswitchd and calculating the active BFD tunnels
only if it is connected.
Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Acked-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Darrell Ball [Fri, 1 Feb 2019 07:35:40 +0000 (23:35 -0800)]
conntrack: fix ftp ipv4 address substitution.
When replacing the ipv4 address in repl_ftp_v4_addr(), the remaining size
was incorrectly calculated which could lead to the wrong replacement
adjustment.
This goes unnoticed most of the time, unless you carefully choose your
initial and replacement addresses.
An example failing address combination is 10.1.1.200 DNAT'd to 10.1.100.1.
Fix this by doing something similar to V6 and also splicing out common
code for better coverage and maintainability.
A test is updated to exercise different initial and replacement addresses
and another test is added.
Fixes: bd5e81a0e596 ("Userspace Datapath: Add ALG infra and FTP.") Reported-by: David Marchand <david.marchand@redhat.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ilya Maximets [Tue, 29 Jan 2019 08:11:50 +0000 (11:11 +0300)]
dpdk: Limit DPDK memory usage.
Since the 18.05 release, DPDK has moved to a dynamic memory model in which
hugepages can be allocated on demand. At the same time, the '--socket-mem'
option was redefined as the size of pre-allocated memory, i.e. memory
that is allocated at startup and cannot be freed.
So, DPDK with the new memory model can allocate more hugepage memory
than specified in the '--socket-mem' or '-m' options.
This change adds a new configurable, 'other_config:dpdk-socket-limit',
which can be used to limit the amount of memory DPDK can use.
It uses the new DPDK option '--socket-limit'.
Ex.:
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-limit="1024,1024"
Also, in order to preserve old behaviour, if '--socket-limit' is not
specified, it will be defaulted to the amount of memory specified by
'--socket-mem' option, i.e. OVS will not be able to allocate more.
This is needed, for example, to prevent OVS from allocating more memory
than is reserved for it by Nova in OpenStack installations.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
lib/tc: add set ipv6 traffic class action offload via pedit
Extend ovs-tc translation by allowing non-byte-aligned fields
for set actions. Use new boundary shifts and add set ipv6 traffic
class action offload via pedit.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Louis Peens <louis.peens@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
lib/tc: add set ipv4 dscp and ecn action offload via pedit
Add setting of ipv4 dscp and ecn fields in tc offload using pedit.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Louis Peens <louis.peens@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
lib/tc: fix 32 bits shift for pedit offset calculation
pedit allows setting entire words with an optional mask and OVS
makes use of such masks to allow setting fields that do not span
entire words. One mask for leading bytes that should not be
updated and another mask for trailing bytes that should not be
updated. The masks are created using bit shifts.
In the case of the mask to omit trailing bytes a right bit shift
is used. Currently the code can produce shifts of 1, 2, 3 or 4
bytes (8, 16, 24 or 32 bits) based on the alignment of the end
of field being set.
However, a shift of 32 bits on a 32bit value is undefined.
As it stands the code relies on the result of UINT32_MAX >> 32
being UINT32_MAX. Or in other words a mask that results in the
pedit action setting all bytes of the word under operation.
This patch adjusts the code to use a shift of 0 for this case,
which gives the same result as the undefined behaviour that was
relied on, and appears logically correct as the desire is for no
trailing bytes (or bits!) to be omitted from the set action.
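A minimal sketch of the adjustment, assuming a hypothetical helper name and
a byte count in the range 1 to 4 as described above; it is not the actual
lib/tc.c code. The 4-byte case is mapped to a shift of 0 so that the
expression never shifts a 32-bit value by 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns UINT32_MAX shifted right by 'shift_bytes'
 * bytes, where 'shift_bytes' is 1, 2, 3 or 4.  A 4-byte (32-bit) shift of
 * a 32-bit value is undefined in C, so that case uses a shift of 0, which
 * yields the all-ones mask the old code relied on. */
uint32_t
trailing_mask(unsigned int shift_bytes)
{
    assert(shift_bytes >= 1 && shift_bytes <= 4);
    unsigned int bits = shift_bytes == 4 ? 0 : shift_bytes * 8;
    return UINT32_MAX >> bits;
}
```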
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
lib/tc: make pedit mask calculations byte order agnostic
pedit allows setting entire words with an optional mask and OVS
makes use of such masks to allow setting fields that do not span
entire words.
The struct tc_pedit_key structure, which is part of the kernel
ABI, uses host byte order fields to store the mask and value for
a pedit action, however, these fields contain values in network
byte order.
In order to allow static analysis tools to check for endianness
problems this patch adds a local version of struct tc_pedit_key
which uses big endian types and refactors the relevant code as
appropriate.
In the course of making this change it became apparent that the
calculation of masks was occurring using host byte order although
the values are in network byte order. This patch also fixes that
problem by shifting values in host byte order and then converting
them to network byte order. It is believed that this fixes a bug on big
endian systems, although we are not in a position to test that.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Fix test 'testing ovn -- IP packet buffering' on Windows
The test fails on Windows because of:
<--cut-->
ovn-nbctl: sw0: invalid network address: 2001;1\64
ovn-nbctl: sw1: invalid network address: 2002;1\64
<--cut-->
This is due to the fact msys converts '::1' into ';1'.
Use IPv6 long form instead of its short variant.
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org> Acked-by: Numan Siddique <nusiddiq@redhat.com> Acked-by: Ben Pfaff <blp@ovn.org>
Anand Kumar [Fri, 11 Jan 2019 00:45:24 +0000 (16:45 -0800)]
datapath-windows: Add support for 'OVS_KEY_ATTR_ENCAP' key attribute.
Add a new structure in the l2 header to accommodate the vlan header,
based on commit "d7efce7beff25052bd9083419200e1a47f0d6066
datapath: 802.1AD Flow handling, actions, vlan parsing, netlink attributes"
Also reset the vlan header in the flow key after deleting the vlan tag from the nbl.
With this change a sample vlan flow would look like,
eth(src=0a:ea:8a:24:03:86,dst=0a:cd:fa:4d:15:5c),in_port(3),eth_type(0x8100),
vlan(vid=2239,pcp=0),encap(eth_type(0x0800),ipv4(src=13.12.11.149,dst=13.12.11.107,
proto=1,tos=0,ttl=128,frag=no),icmp(type=8,code=0))
Signed-off-by: Anand Kumar <kumaranand@vmware.com> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org> Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
This patch introduced a regression in OSP environments using internal
ports in other netns. Their networking configuration is lost when
the service is restarted because the ports are recreated now.
Before the patch it checked using netlink if the port with a specific
"name" was already there. The check is a lookup in all ports attached
to the DP regardless of the port's netns.
After the patch it relies on the kernel to identify that situation.
Unfortunately the only protection there is register_netdevice() which
fails only if the port with that name exists in the current netns.
If the port is in another netns, it will get a new dp_port and because
of that userspace will delete the old port. At this point the original
port is gone from the other netns and there is a fresh port in the current
netns.
Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
The original commit 7521e0cf9e88 ("ofproto-dpif: Let the dpif report
when a port is a duplicate.") relies on the kernel to check if the
port exists or not. However, the current kernel code doesn't handle
when the port is moved to another network namespace.
Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Li RongQing [Fri, 25 Jan 2019 11:08:33 +0000 (19:08 +0800)]
flow: fix udp checksum
As per RFC 768, if the calculated UDP checksum is 0, it should be
instead set as 0xFFFF in the frame. A value of 0 in the checksum
field indicates to the receiver that no checksum was calculated
and hence it should not verify the checksum.
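The rule can be sketched as a final step of checksum computation. The
function name and the choice to take an unfolded 32-bit one's-complement
sum as input are assumptions for the example, not the code in lib/flow.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: folds a 32-bit one's-complement sum into 16 bits,
 * complements it, and applies the RFC 768 rule that a computed checksum
 * of 0 is transmitted as 0xffff (0 means "no checksum"). */
uint16_t
udp_csum_finish(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }

    uint16_t csum = (uint16_t) ~sum;
    uint16_t result = csum ? csum : 0xffff;

    assert(result != 0);         /* A computed checksum is never sent as 0. */
    return result;
}
```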
Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
hash: Implement hash for aarch64 using CRC32c intrinsics.
This commit adds lib/hash-aarch64.h to implement hash for aarch64.
It is based on aarch64 built-in CRC32c intrinsics, which accelerate
the hash function for better datapath performance.
Test:
1. The "test-hash" case passed on the aarch64 platform.
2. An OVS-DPDK datapath performance test was run (NIC to NIC).
Test bed: aarch64 (Centriq 2400) platform.
Test case: DPCLS forwarding (EMC disabled + avg 10 subtable lookups)
Test result: improvement of around 10%.
Signed-off-by: Yanqin Wei <yanqin.wei@arm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ilya Maximets [Fri, 25 Jan 2019 13:21:22 +0000 (16:21 +0300)]
ovs-macros.at: Better hide 'exec -a' checking.
There is some issue with parsing of redirection options
on some shells. For example:
$ (exec -a name true) 2>&1 >/dev/null || echo "failed"
sh: 10: exec: -a: not found
failed
$ (exec -a name true) >/dev/null 2>&1 || echo "failed"
failed
So, the order of redirections matters for some reason.
Let's replace our current version with simple redirection of stderr.
This version seems to work in most shells except [t]csh. But it's
really tricky to write portable redirections that work with csh, and
this shell will not be used by the testsuite on most systems.
Aaron Conole [Thu, 24 Jan 2019 18:20:13 +0000 (10:20 -0800)]
stt: Fix return code during xmit.
In the case of an error, return the error code as opposed to
NETDEV_TX_OK.
Caught by compiler warning:
/home/travis/build/ovsrobot/ovs/datapath/linux/stt.c: In function ‘ovs_stt_xmit’:
/home/travis/build/ovsrobot/ovs/datapath/linux/stt.c:1005:6: warning: variable ‘err’ set but not used [-Wunused-but-set-variable]
int err;
^
Martin Xu [Tue, 22 Jan 2019 23:02:30 +0000 (15:02 -0800)]
rhel: bug fix upgrade path in kmod fedora spec file
This patch removes the "Conflicts" tag and adds "Obsoletes" tag.
With the "Conflicts" tag, when a user attempts to install or upgrade to
the same version as is already installed, the conflict kicks in. Without
it, such an operation is allowed with --replacepkgs.
Obsoletes is needed for the upgrade path from kmod-openvswitch to
openvswitch-kmod.
Fixes: 22c33c3039 (rhel: support kmod build against mulitple kernel
versions, fedora)
Greg Rose [Tue, 22 Jan 2019 23:42:55 +0000 (15:42 -0800)]
datapath: return -EEXIST if inet6_add_protocol fails
Our code to determine whether receive functionality will work with
ip6 gre depends on the return of -EEXIST but inet6_add_protocol()
returns a -1 on failure to grab the pointer via a cmpxchg op. Just
set the error return to -EEXIST to help out the vport init function.
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2019-January/048090.html Reported-by: Ken Ajiro <ken-ajiro@xr.jp.nec.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Greg Rose [Thu, 10 Jan 2019 22:09:51 +0000 (14:09 -0800)]
compat: Fixup ipv6 fragmentation on 4.9.135+ kernels
Upstream commit 648700f76b03 ("inet: frags: use rhashtables...") changed
how ipv6 fragmentation is implemented. This patch was backported to
the upstream stable 4.9.x kernel starting at 4.9.135.
This patch creates the compatibility layer changes required to both
compile and also operate correctly with ipv6 fragmentation on these
kernels. Check if the inet_frags 'rnd' field is present to key on
whether the upstream patch is present. Also update Travis to the
latest 4.9 kernel release so that this patch is compile tested.
Cc: William Tu <u9012063@gmail.com> Cc: Yi-Hung Wei <yihung.wei@gmail.com> Cc: Yifeng Sun <pkusunyifeng@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Anju Thomas [Mon, 7 May 2018 19:04:54 +0000 (00:34 +0530)]
Fix crash due to multiple tnl push action
During slow path packet processing, if the action is to output to a
tunnel port, the slow path processing of the encapsulated packet
continues on the underlay bridge and additional actions (e.g. optional
VLAN encapsulation, bond link selection and finally output to port) are
collected there.
To prepare for a continuation of the processing of the original packet
(e.g. output to other tunnel ports in a flooding scenario), the
“tunnel_push” action and the actions of the underlay bridge are
encapsulated in a clone() action to preserve the original packet.
If the underlay bridge decides to drop the tunnel packet (for example if
both bonded ports are down simultaneously), the clone(tunnel_push)
actions previously generated as part of translation of the output to
tunnel port are discarded and a stand-alone tunnel_push action is added
instead. Thus the tunnel header is pushed on to the original packet.
This is the bug.
Consequences: If packet processing continues with sending to further
tunnel ports, multiple tunnel header pushes will happen on the original
packet as typically the tunnels all traverse the same underlay bond
which is down. The packet may not have enough headroom to accommodate
all the tunnel headers. OVS crashes if it runs out of space while trying
to push the tunnel headers.
Even in case there is enough headroom, the packet will not be freed
since the accumulated action list contains only the tunnel header push
action without any output port action. Thus, we either have a crash or a
packet buffer leak.
Signed-off-by: Anju Thomas <anju.thomas@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Aaron Conole [Mon, 21 Jan 2019 18:05:25 +0000 (13:05 -0500)]
travis: enable testsuite with dpdk
The testsuite flag isn't currently being passed for DPDK. Let's pass it
and when a future DPDK supports running the check-dpdk suite, we can
turn that on then, too.
Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
David Marchand [Wed, 16 Jan 2019 02:58:15 +0000 (18:58 -0800)]
conntrack: fix tcp seq adjustments when mangling commands.
The ftp alg deals with packets in two ways for the command connection:
either they are inspected and can be mangled when nat is enabled
(CT_FTP_CTL_INTEREST) or they just go through without being modified
(CT_FTP_CTL_OTHER).
For CT_FTP_CTL_INTEREST packets, we must both adjust the packet tcp seq
number by the connection current offset, then prepare for the next
packets by setting an accumulated offset in the ct object. However,
this was not done for multiple CT_FTP_CTL_INTEREST packets for the same
connection.
This is relevant for handling multiple child data connections that also
need natting.
The tests are updated so that some ftp+NAT tests send multiple port
commands or other similar commands for a single control connection.
Wget is not able to do this, so switch to lftp.
Fixes: bd5e81a0e596 ("Userspace Datapath: Add ALG infra and FTP.") Co-authored-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Sat, 15 Dec 2018 02:16:54 +0000 (18:16 -0800)]
odp-util: Avoid revalidation error for masked NSH set action.
A masked NSH set action has mdtype 0 because the mdtype is not being
changed, but odp_nsh_key_from_attr() rejects this because mdtype 0 does
not match up with the OVS_NSH_KEY_ATTR_MD1 attribute being present. This
fixes the problem.
The kernel datapath in flow_netlink function nsh_key_put_from_nlattr() has
a similar exception.
Acked-by: Justin Pettit <jpettit@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Sat, 15 Dec 2018 02:16:53 +0000 (18:16 -0800)]
Fix bugs in L3 protocol support.
Test 854 "tunnel_push_pop - action" showed problems in revalidation for
L3 protocol support in its L3 GRE test. L3 packets (that is, packets
without an Ethernet header but only some L3 protocol such as IPv4 or IPv6)
have an Ethernet type that is kept in the dl_type member of the flow, and
the flows that they pass through can cause L3 and L4 fields to be matched.
However, the translation process incorrectly forced the dl_type to be
wildcarded, which caused a contradiction since it's not possible to match
on L3 and L4 fields if the dl_type is not known, and the code in
odp_flow_key_to_flow() and related functions therefore rejected these flows
at revalidation time.
This commit fixes the problem by treating dl_type the same for L2 and L3
flows in translation. It also makes odp_flow_key_to_flow__() copy the
Ethernet type that comes from a packet_type field into dl_type, which is
the expected behavior.
The actual error that this fixes is only visible after applying an upcoming
commit that improves logging for bad datapath flows.
Acked-by: Justin Pettit <jpettit@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yi-Hung Wei [Mon, 7 Jan 2019 23:48:19 +0000 (15:48 -0800)]
selinux: Add missing permissions for ovs-kmod-ctl
Starting from OVS 2.10, ovs-vswitchd may fail to run after system reboot
since it fails to load the ovs kernel module. This is because the conntrack
zone limit feature introduced in OVS 2.10 now depends on the
nf_conntrack_ipv4/6 kernel modules, and SELinux prevents it from loading
these two kernel modules.
Example log of the AVC violations:
type=AVC msg=audit(1546903594.735:29): avc: denied { execute_no_trans }
for pid=820 comm="modprobe" path="/usr/bin/bash" dev="dm-0" ino=50337111
scontext=system_u:system_r:openvswitch_load_module_t:s0
tcontext=system_u:object_r:shell_exec_t:s0 tclass=file
Ben Pfaff [Wed, 12 Dec 2018 20:28:34 +0000 (12:28 -0800)]
connmgr: Do not send asynchronous messages to rconns lacking protocols.
There are corner cases in which an rconn might not have a defined OpenFlow
protocol or version. These happen at connection startup, before the
protocol version has been negotiated, and can also happen when a connection
is being shut down. It's desirable to avoid these situations entirely,
but so far we haven't managed to do this. This commit avoids trying to
send messages to such connections, which is what really tends to get OVS in
trouble since there's no way to construct an OpenFlow message without
knowing what version of OpenFlow to use (with a few exceptions that don't
matter here).
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-December/047876.html Reported-by: Josh Bailey <joshb@google.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
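A minimal sketch of the guard described in the connmgr commit above,
written in Python rather than OVS's C. The Conn class and send_async()
are hypothetical names used only for illustration. The point is that a
connection without a negotiated version is skipped instead of being sent
a message that cannot be encoded.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Conn:
    """Hypothetical stand-in for an rconn; not the OVS structure."""
    name: str
    version: Optional[int] = None  # None until version negotiation completes
    sent: List[str] = field(default_factory=list)


def send_async(conns, event):
    """Queue an event only on connections with a known OpenFlow version."""
    skipped = []
    for conn in conns:
        if conn.version is None:
            # Without a version there is no way to encode the message.
            skipped.append(conn.name)
            continue
        conn.sent.append(f"v{conn.version}:{event}")
    return skipped


conns = [Conn("c1", version=4), Conn("c2")]
skipped = send_async(conns, "port-status")
```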
Yunjian Wang [Fri, 18 Jan 2019 05:38:58 +0000 (13:38 +0800)]
odp-util: Fix parsing QinQ packet in parse_8021q_onward.
The userspace datapath failed to create a new datapath flow
when dealing with QinQ packets (flows including ip, udp, etc.). The L2-L5
headers should be considered before parsing the second 802.1Q header.
Fixes: f0fb825a3785 ("Add support for 802.1ad (QinQ tunneling)") Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ilya Maximets [Fri, 18 Jan 2019 08:56:50 +0000 (11:56 +0300)]
dpif-netdev: Per-port configurable EMC.
Conditional EMC insertion helps a lot in scenarios with high numbers
of parallel flows, but in the current implementation this option affects
all threads and ports at once. There are scenarios where different
ports carry different numbers of flows. For example, if one
of the VMs encapsulates traffic using additional headers, it will
receive a large number of flows but only a few flows will come out of
this VM. In this scenario it's much faster to use the EMC instead of the
classifier for traffic from the VM, but it's better to disable the EMC
for the traffic that flows to the VM.
To handle this, introduce an 'emc-enable' option to
enable/disable the EMC on a per-port basis. Example:
ovs-vsctl set interface dpdk0 other_config:emc-enable=false
The EMC probability is kept as is and applies to all ports with
'emc-enable=true'.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
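As a rough model of how a per-port switch could combine with the existing
insertion probability, the Python sketch below gates insertion on the port
flag first and on a 1-in-N draw second. should_insert_emc(), its
parameters, and the treatment of a non-positive inv_prob as "never insert"
are assumptions for illustration, not the dpif-netdev code.

```python
def should_insert_emc(port_emc_enabled, inv_prob, draw):
    """Decide whether to insert a flow into the EMC (illustrative model).

    port_emc_enabled: the per-port 'emc-enable' setting.
    inv_prob: inverse insertion probability (insert about 1 in inv_prob).
    draw: a uniform random number in [0, 1), passed in for testability.
    """
    if not port_emc_enabled:
        return False              # the port opted out of the EMC entirely
    if inv_prob <= 0:
        return False              # assumption: treat as "never insert"
    return draw < 1.0 / inv_prob  # probabilistic insert for enabled ports
```

Passing the random draw in explicitly keeps the function deterministic,
which makes the two independent gates easy to check.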
Ben Pfaff [Thu, 10 Jan 2019 23:38:01 +0000 (15:38 -0800)]
poll-loop: Set poll loop initial deadline to LLONG_MAX.
This is consistent with the re-initialization value that poll_block() uses.
It is better than 0 because the monotonic clock can have a negative value,
even though that is rare and pathological.
Found by inspection.
Acked-by: Justin Pettit <jpettit@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
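The following Python sketch models why LLONG_MAX is a safer "no deadline"
value than 0 for the poll-loop commit above. The two helper functions are
hypothetical simplifications, not the poll-loop code: one keeps the
earliest requested wakeup, the other turns a deadline into a sleep time.

```python
LLONG_MAX = 2**63 - 1  # largest C 'long long', used to mean "no deadline"


def timer_wait_until(deadline, when):
    """Keep the earliest requested wakeup time."""
    return when if when < deadline else deadline


def poll_timeout_ms(deadline, now):
    """Milliseconds to sleep, or None to block indefinitely."""
    if deadline == LLONG_MAX:
        return None
    return max(0, deadline - now)


# With a negative clock, an initial deadline of 0 lies in the future and
# looks like a real timer even though nothing requested one.
negative_now = -5000
with_zero = poll_timeout_ms(0, negative_now)
with_max = poll_timeout_ms(LLONG_MAX, negative_now)
```

LLONG_MAX also acts as the identity for the "keep the earliest" rule, so
any real request replaces it, whereas an initial 0 hides every request for
a positive time.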
Yifeng Sun [Thu, 17 Jan 2019 18:22:12 +0000 (10:22 -0800)]
odp-util: Fix a bug in parse_odp_push_nsh_action
In this piece of code, 'struct ofpbuf b' should always point to
metadata so that metadata can be filled with values through ofpbuf
operations, like ofpbuf_put_hex and ofpbuf_push_zeros. However,
ofpbuf_push_zeros may change the data pointer of 'struct ofpbuf b'
and therefore, metadata will not contain the expected values.
This patch fixes it by changing ofpbuf_push_zeros to
ofpbuf_put_zeros.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10863 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
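To illustrate the difference between the two operations named in the
commit above, here is a small Python model of a buffer with headroom. The
Buf class is hypothetical and only mimics the general idea: "put" appends
at the tail and leaves the start of the data alone, while "push" prepends
and therefore moves the start of the data.

```python
class Buf:
    """Toy buffer with headroom; not the OVS ofpbuf implementation."""

    def __init__(self, headroom=8):
        self.store = bytearray(headroom)
        self.start = headroom          # offset of the first data byte

    def data(self):
        return bytes(self.store[self.start:])

    def put(self, payload):
        self.store += payload          # append at the tail

    def put_zeros(self, n):
        self.store += bytes(n)         # append n zero bytes at the tail

    def push_zeros(self, n):
        if n > self.start:             # grow headroom if there is too little
            grow = n - self.start
            self.store = bytearray(grow) + self.store
            self.start += grow
        self.start -= n                # the data start moves backwards
        self.store[self.start:self.start + n] = bytes(n)


appended = Buf()
appended.put(b"\xaa\xbb")
start_before = appended.start
appended.put_zeros(2)

prepended = Buf()
prepended.put(b"\xaa\xbb")
prepended.push_zeros(2)
```

Any code holding on to the original start of the data sees it shift after
a push but not after a put, which is the hazard the commit describes.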
Ophir Munk [Thu, 17 Jan 2019 18:42:35 +0000 (18:42 +0000)]
netdev-dpdk: support port representors
DPDK port representors were introduced in DPDK versions 18.xx.
Prior to port representors there was a one-to-one relationship
between an rte device (e.g. PCI bus) and an eth device (referenced as
dpdk port id in OVS). With port representors the relationship becomes
one-to-many rte device to eth devices.
For example in [3] there are two devices (representors) using the same
PCI physical address 0000:08:00.0: "0000:08:00.0,representor=[3]" and
"0000:08:00.0,representor=[5]".
This commit handles the new one-to-many relationship. For example,
when one of the device port representors in [3] is closed, the PCI bus
cannot be detached until the other device port representor is closed as
well. OVS remains backward compatible by supporting DPDK legacy PCI
ports, which do not include port representors.
DPDK port representor related commits are listed in [1]. DPDK port
representor documentation appears in [2]. A sample configuration
that uses two representor ports (the output of the "ovs-vsctl show"
command) is shown in [3].
[1] e0cb96204b71 ("net/i40e: add support for representor ports") cf80ba6e2038 ("net/ixgbe: add support for representor ports") 26c08b979d26 ("net/mlx5: add port representor awareness")
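The one-to-many rule in the netdev-dpdk commit above amounts to reference
counting per PCI device. The Python sketch below is a hypothetical model
(the PciDevices class is not part of OVS or DPDK): a device is detached
only when its last open port is closed.

```python
class PciDevices:
    """Hypothetical tracker of open eth ports per PCI address."""

    def __init__(self):
        self.open_ports = {}   # PCI address -> set of open port devargs
        self.detached = []     # PCI addresses detached so far

    def open(self, pci, devargs):
        self.open_ports.setdefault(pci, set()).add(devargs)

    def close(self, pci, devargs):
        """Close one port; detach the PCI device only when none remain."""
        ports = self.open_ports.get(pci)
        if ports is None or devargs not in ports:
            return False
        ports.remove(devargs)
        if ports:
            return False       # a sibling representor is still open
        del self.open_ports[pci]
        self.detached.append(pci)
        return True


devs = PciDevices()
devs.open("0000:08:00.0", "0000:08:00.0,representor=[3]")
devs.open("0000:08:00.0", "0000:08:00.0,representor=[5]")
first = devs.close("0000:08:00.0", "0000:08:00.0,representor=[3]")
second = devs.close("0000:08:00.0", "0000:08:00.0,representor=[5]")
```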
Yifeng Sun [Wed, 16 Jan 2019 22:37:08 +0000 (14:37 -0800)]
ofp-actions: Avoid overflow for ofpact_learn_spec->n_bits
ofpact_learn_spec->n_bits is the size of immediate data that is
following ofpact_learn_spec. Now it is defined as 'uint8_t'.
In many places, it gets its value directly from mf_subfield->n_bits,
whose type is 'unsigned int'. If input is large enough, there will
be uint8_t overflow.
For example, the following command will make ovs-ofctl crash:
ovs-ofctl add-flow br0 "table=0, priority=0, action=learn(limit=20 tun_metadata15=0x60ff00000000000003000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002fffffffffffffff0ffffffffffffffffffffffffffff)"
This patch fixes the issue by changing the type of ofpact_learn_spec->n_bits
from uint8_t to uint32_t.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11870 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
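The truncation behind the ofp-actions commit above can be reproduced with
plain arithmetic. This Python sketch models storing a bit count in 8-bit
and 32-bit unsigned fields by masking. The helper names and the sample
widths (256 and 992 bits) are made up for illustration.

```python
def store_u8(value):
    """Model assignment to a uint8_t: only the low 8 bits survive."""
    return value & 0xFF


def store_u32(value):
    """Model assignment to a uint32_t: only the low 32 bits survive."""
    return value & 0xFFFFFFFF


wide_field_bits = 992                     # hypothetical wide field
truncated = store_u8(wide_field_bits)     # 992 = 3 * 256 + 224
preserved = store_u32(wide_field_bits)
```

Any width of 256 bits or more no longer matches the amount of immediate
data that actually follows once it is stored in 8 bits. A 32-bit field
keeps the value intact.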
Han Zhou [Wed, 16 Jan 2019 21:45:10 +0000 (13:45 -0800)]
sandbox: Fix env for clustered OVN DBs.
When OVN clustered mode is specified, the environment variables
OVN_NB_DB/OVN_SB_DB are wrong. They should be something like
unix:nb1,unix:nb2,unix:nb3 but turn out to be unix:nb1,unix:nb1,unix:nb2.
So when nb3 becomes leader, the connection will always fail.
This is caused by using an undefined variable $n, which produces an
unexpected result from `seq 2 $n`. This patch fixes it by using the
correct variable $servers.
Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
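The effect of the undefined variable in the sandbox commit above can be
modeled in a few lines of Python. seq() below imitates `seq LAST` and
`seq FIRST LAST` for integers, and remotes() is a hypothetical
reconstruction of how the list might be built (first server, then the
rest). The real script's structure is assumed, not quoted.

```python
def seq(*args):
    """Model `seq LAST` and `seq FIRST LAST` for integer arguments."""
    if len(args) == 1:
        first, last = 1, args[0]
    else:
        first, last = args
    return list(range(first, last + 1))


def remotes(prefix, rest):
    """Join server 1 plus the remaining server numbers into a remote list."""
    return ",".join(f"unix:{prefix}{i}" for i in [1] + rest)


# With $n undefined, the shell runs `seq 2`, which counts 1 2.
buggy = remotes("nb", seq(2))
# With the intended variable set to 3, `seq 2 3` counts 2 3.
fixed = remotes("nb", seq(2, 3))
```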
Mark Michelson [Wed, 16 Jan 2019 15:37:06 +0000 (10:37 -0500)]
ovn: Add port addresses to IPAM later.
ipam_add_port_addresses() needs to be called after the peer field is set
on the ovn_port structures. This way, addresses taken by peered router
ports will be added to the logical switch's IPAM and therefore will be
barred from assignment to other ports.
Reported-by: Girish Moodalbail <gmoodalbail@nvidia.com> Signed-off-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ian Stokes [Tue, 6 Nov 2018 21:17:38 +0000 (21:17 +0000)]
travis: Add dpdk shared library build.
Add travis builds for DPDK as a shared library.
Currently the DPDK builds in travis only compile DPDK as a static library.
With static builds in DPDK there is a risk that if a function is not
exported then it will not be supported when DPDK is used as a shared library.
This commit adds the option to build DPDK as a shared library. Also two
build jobs are added to the travis.yml whereby a shared DPDK is built
with both static and shared OVS libraries.
Signed-off-by: Ian Stokes <ian.stokes@intel.com> Acked-by: Tiago Lam <tiago.lam@intel.com> Acked-by: Kevin Traynor <ktraynor@redhat.com>
Nitin Katiyar [Wed, 16 Jan 2019 05:41:43 +0000 (05:41 +0000)]
Adding support for PMD auto load balancing
Port rx queues that have not been statically assigned to PMDs are currently
assigned based on periodically sampled load measurements.
The assignment is performed at specific instances – port addition, port
deletion, upon reassignment request via CLI etc.
Changes in traffic pattern over time can cause uneven load among
the PMDs, resulting in lower overall throughput.
This patch enables the support of auto load balancing of PMDs based on
measured load of RX queues. Each PMD measures the processing load for each
of its associated queues every 10 seconds. If the aggregated PMD load reaches
95% for 6 consecutive intervals then the PMD considers itself to be overloaded.
If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
performed by OVS main thread. The dry-run does NOT change the existing
queue to PMD assignments.
If the resultant mapping of dry-run indicates an improved distribution
of the load then the actual reassignment will be performed.
Automatic rebalancing is disabled by default and has to be
enabled via a configuration option. The interval (in minutes) between
two consecutive rebalancing runs can also be configured via CLI; the
default is 1 min.
The following example commands can be used to set the auto-lb params:
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebalance-intvl="5"
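The overload test described in the PMD auto load balancing commit above
(95% load for 6 consecutive intervals) can be sketched in Python as
follows. The function and constant names are hypothetical, and the sketch
covers only the detection step, not the dry run or the reassignment.

```python
OVERLOAD_THRESHOLD = 95   # percent of PMD capacity
OVERLOAD_INTERVALS = 6    # consecutive measurement intervals


def is_overloaded(load_samples):
    """True if the most recent intervals all reached the threshold.

    load_samples: aggregated PMD load per interval, oldest first, in percent.
    """
    recent = load_samples[-OVERLOAD_INTERVALS:]
    return (len(recent) == OVERLOAD_INTERVALS
            and all(load >= OVERLOAD_THRESHOLD for load in recent))
```

A single interval below the threshold restarts the count, so short bursts
alone would not trigger the dry run in this model.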
Terry Wilson [Mon, 14 Jan 2019 14:15:36 +0000 (08:15 -0600)]
Un-revert Work around Python/C JSON unicode differences
This fix was reverted because it depended on a small bit of code
in a patch that was reverted that changed some python/ovs testing
and build. The fix is still necessary.
The OVS C-based JSON parser operates on bytes, so the parser_feed
function returns the number of bytes that are processed. The pure
Python JSON parser currently operates on unicode, so it expects
that Parser.feed() returns a number of characters. This difference
leads to parsing errors when unicode characters are passed to the
C JSON parser from Python.
Acked-by: Lucas Alvares Gomes <lucasagomes@gmail.com> Signed-off-by: Terry Wilson <twilson@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
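The byte/character mismatch in the JSON commit above is easy to see in
Python. In the sketch below, bytes_to_chars() is a hypothetical helper
that converts a count of consumed UTF-8 bytes into the character count a
unicode-based caller would expect. It is not the fix applied by the
commit.

```python
def bytes_to_chars(text, n_bytes):
    """Convert a count of consumed UTF-8 bytes into a count of characters."""
    consumed = text.encode("utf-8")[:n_bytes]
    # A partial trailing character (if any) is dropped rather than counted.
    return len(consumed.decode("utf-8", errors="ignore"))


doc = '{"a": "\u00e9"}'            # contains one non-ASCII character
n_chars = len(doc)
n_bytes = len(doc.encode("utf-8"))
```

For pure ASCII input the two counts agree, which is why the problem only
appears once non-ASCII characters are fed to the byte-based parser.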
Ilya Maximets [Mon, 14 Jan 2019 15:04:32 +0000 (18:04 +0300)]
checkpatch: Check style of FOREACH loops.
The current checkpatch rules match only OVS 'FOR_EACH' loops.
This change applies the same style checks to DPDK iterators
like 'RTE_ETH_FOREACH_MATCHING_DEV () {}'.
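One way to recognize both spellings is to make the underscore optional in
the pattern. The Python regular expressions below are assumptions written
for illustration. They are not the expressions used in checkpatch itself.

```python
import re

# Hypothetical patterns: the first requires FOR_EACH, the second also
# accepts FOREACH by making the underscore optional.
ovs_only = re.compile(r'\b\w*FOR_EACH\w*\s*\(')
ovs_and_dpdk = re.compile(r'\b\w*FOR_?EACH\w*\s*\(')

ovs_loop = "HMAP_FOR_EACH (node, hmap_node, &map) {"
dpdk_loop = "RTE_ETH_FOREACH_MATCHING_DEV (port_id, devargs, &it) {"

old_hits = [bool(ovs_only.search(s)) for s in (ovs_loop, dpdk_loop)]
new_hits = [bool(ovs_and_dpdk.search(s)) for s in (ovs_loop, dpdk_loop)]
```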
Ilya Maximets [Tue, 15 Jan 2019 14:03:00 +0000 (17:03 +0300)]
python: Escape backslashes while formatting logs.
Since Python 3.7 (and some 3.6+ versions), the regexp engine
treats bad escape sequences as errors. Previously,
if the replacement string had something like '\u0000', '\u' was
qualified as a bad escape sequence and treated just as a sequence
of characters '\' and 'u'. But now this triggers an error:
Traceback (most recent call last):
File "/usr/lib/python3.7/sre_parse.py", line 1021, in parse_template
this = chr(ESCAPES[this][1])
KeyError: '\\u'
From the documentation [1]:
Unknown escapes consisting of '\' and an ASCII letter in replacement
templates for re.sub() were deprecated in Python 3.5, and will now
cause an error.
We need to escape the backslash with another one to keep the regexp engine
from raising errors. In the case of '\\u000', '\\' is a valid escape sequence
and the 'u' is a simple character.
To be 100% safe we would need to use 're.escape(replace)', but it escapes
too many characters, making the logs hard to read.
This change fixes Python 3 tests on systems with python 3.7.
Should be backward compatible.
Reported-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
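The behavior described in the commit above can be reproduced with a short
Python snippet, assuming Python 3.7 or later. escape_backslashes() is a
hypothetical helper name. It doubles each backslash so that re.sub()
treats it as a literal character in the replacement template.

```python
import re


def escape_backslashes(replacement):
    """Double every backslash so re.sub() inserts it literally."""
    return replacement.replace('\\', '\\\\')


raw = r'value=\u0000'          # replacement text containing a backslash

try:
    re.sub('MSG', raw, 'log: MSG')
    failed = False
except re.error:               # bad escape \u in the replacement template
    failed = True

safe = re.sub('MSG', escape_backslashes(raw), 'log: MSG')
```

Unlike re.escape(), this touches only backslashes, so the rest of the
replacement text stays readable in the logs.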
Ben Pfaff [Thu, 10 Jan 2019 23:23:45 +0000 (15:23 -0800)]
python: Fix invalid escape sequences.
It appears that Python silently treats invalid escape sequences in
strings as literals, e.g. "\." is the same as "\\.". Newer versions of
checkpatch complain, and it does seem reasonable to me to fix these.
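A common way to avoid such sequences is a raw string or an explicitly
doubled backslash. The Python snippet below shows that both spell the same
two-character pattern. Whether the commit used raw strings or doubled
backslashes in each case is not stated here.

```python
import re

# Both spellings produce a backslash followed by a dot, without relying
# on Python passing an unrecognized escape through unchanged.
pattern_raw = r"\."
pattern_escaped = "\\."

same = pattern_raw == pattern_escaped
matches_dot = re.fullmatch(pattern_raw, ".") is not None
matches_letter = re.fullmatch(pattern_raw, "a") is not None
```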
Ilya Maximets [Fri, 11 Jan 2019 08:09:19 +0000 (11:09 +0300)]
vconn: Fix using of uninitialized deadline.
Typo introduced while making minor refactoring before applying the
patch.
Fixes logic and the clang build:
lib/vconn.c:707:47: error:
variable 'deadline' is uninitialized when
used within its own initialization [-Werror,-Wuninitialized]
? time_msec() + deadline
^~~~~~~~
Acked-by: Kevin Traynor <ktraynor@redhat.com> Fixes: 04895042e9f6 ("vconn: Allow timeout configuration for blocking connection.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Wed, 29 Aug 2018 18:14:31 +0000 (11:14 -0700)]
ofproto: Handle multipart requests with multiple parts.
OpenFlow has a concept of multipart messages, that is, messages that can be
broken into multiple pieces that are sent separately. Before OpenFlow 1.3,
only replies could actually have multiple pieces. OpenFlow 1.3 introduced
the idea that requests could have multiple pieces. This is only useful for
multipart requests that take an array as part of the request, which amounts
to only flow monitoring requests and table features requests. So far, OVS
hasn't implemented the multipart versions of these (it just reports an
error). This commit introduces the necessary infrastructure to implement
them properly.
Acked-by: Justin Pettit <jpettit@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
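A minimal Python sketch of reassembling a multipart request, as described
in the ofproto commit above. OFPMPF_REQ_MORE is the OpenFlow 1.3 "more
parts follow" flag. The MultipartAssembler class and the choice to key
pending parts by transaction id alone are assumptions for illustration,
not the ofproto data structures.

```python
OFPMPF_REQ_MORE = 0x1  # set on every part of a request except the last


class MultipartAssembler:
    """Hypothetical collector for the parts of multipart requests."""

    def __init__(self):
        self.pending = {}  # transaction id -> list of bodies seen so far

    def feed(self, xid, flags, body):
        """Add one part; return all bodies once the final part arrives."""
        parts = self.pending.setdefault(xid, [])
        parts.append(body)
        if flags & OFPMPF_REQ_MORE:
            return None            # more parts follow, keep waiting
        del self.pending[xid]
        return parts


asm = MultipartAssembler()
r1 = asm.feed(7, OFPMPF_REQ_MORE, b"part-1")
r2 = asm.feed(7, OFPMPF_REQ_MORE, b"part-2")
r3 = asm.feed(7, 0, b"part-3")
```

A request that fits in one message simply arrives with the flag clear and
is returned immediately as a single-element list.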