Joe Stringer [Thu, 24 Dec 2015 21:09:38 +0000 (13:09 -0800)]
datapath: Re-designate OVS_FRAGMENT_BACKPORT.
Typically the way that we include backported code is by testing for
existence of the feature in the upstream codebase via header checks,
then attempt to use the upstream code as much as possible. However, for
the IP fragmentation handling backport we have an additional constraint
which is that we cannot support kernels older than Linux-3.10.
To date, OVS_FRAGMENT_BACKPORT has been defined to include the backport
of the IP fragmentation code for all kernels from 3.10 to 4.2, rather
than attempting to use the upstream code as much as possible. This patch
relaxes OVS_FRAGMENT_BACKPORT to only check the lower bound so that the
upstream code may be used in more circumstances.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org>
Joe Stringer [Tue, 2 Feb 2016 23:19:02 +0000 (15:19 -0800)]
compat: Detect and use upstream ip_fragment().
Previously a version check was used to determine whether the upstream
ip_fragment() should be used or the backported version. The actual test
is for whether upstream commit d6b915e29f4a ("ip_fragment: don't forward
defragmented DF packet") is present, so test for that instead.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org>
Joe Stringer [Thu, 24 Dec 2015 18:41:35 +0000 (10:41 -0800)]
compat: Detect and use inet_frag_queue->list_evictor.
Kernels 3.17 to 4.2 have a work queue to evict old fragments, but do not
track these fragments in an eviction list. On these kernels, we detect
the absence of the list_evictor and provide one. This commit fixes the
reliance on kernel versions in the case that this functionality is
backported.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org>
Ilya Maximets [Wed, 3 Feb 2016 11:31:43 +0000 (14:31 +0300)]
dpif: Allow adding ukeys for same flow by different pmds.
In multiqueue mode, several pmd threads may process different queues of
the same port. A flow may not depend on the queue; this is true at least
for vhost-user ports.
When multiple pmd threads attempt to process upcalls for a particular
flow key, only the first will succeed. Any subsequent threads will
receive error = ENOSPC when attempting to insert a new udpif_key into
the umaps. This causes the latter threads to never insert a flow into
the datapath to handle the traffic, and as a result they will
consistently execute those flows through the slow path.
Fix that by mixing pmd_id with the bits from the ufid for ukey->hash
calculation. So, for a given flow key/UFID, each pmd thread will create
an independent udpif_key.
This also opens the possibility to reassign queues among pmd threads
without restarting them and deleting the megaflow cache.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Joe Stringer <joe@ovn.org>
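As a rough illustration of the fix described above, the sketch below mixes a pmd id into bits taken from a UFID so that each pmd thread gets its own hash for the same flow. The function name ukey_hash, the use of CRC32 as the mixing function, and the sample values are assumptions made for this example; they are not the actual OVS code.

```python
import zlib


def ukey_hash(ufid, pmd_id):
    """Hypothetical stand-in for the ukey->hash calculation."""
    # Take 32 bits from the UFID and use them as the CRC32 start value,
    # then fold in the pmd id so different pmd threads get different hashes.
    ufid_bits = int.from_bytes(ufid[:4], "little")
    return zlib.crc32(pmd_id.to_bytes(4, "little"), ufid_bits)


# Made-up 128-bit UFID shared by two pmd threads handling the same flow.
ufid = bytes(range(16))
hash_pmd1 = ukey_hash(ufid, 1)
hash_pmd2 = ukey_hash(ufid, 2)
print(hash_pmd1 != hash_pmd2)
```

With a hash that depends only on the UFID, both threads would compete for the same slot, which is the ENOSPC situation the commit message describes.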
Ben Pfaff [Wed, 3 Feb 2016 21:15:48 +0000 (13:15 -0800)]
vlog: Stop using explicit references to external log modules.
It's always risky to write "extern" declarations outside a header file,
since there's no way to ensure the type of what's being referenced is
correct. In these cases, we can easily avoid the extern reference, so do
so.
There is a little tradeoff here, in that referring to the log modules
through strings means that we catch an incorrect module name at runtime
instead of at link time, but I think that the risk here is minimal because
the mistake will be found by every test in "make check" that runs any of
the utilities, since they make these calls as one of their first tasks
during initialization.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Russell Bryant <russell@ovn.org>
Ben Pfaff [Wed, 3 Feb 2016 23:18:33 +0000 (15:18 -0800)]
vlog: Simplify module definition.
Until now, vlog had a macro VLOG_DEFINE_THIS_MODULE, which expanded using
VLOG_DEFINE_MODULE, which expanded using VLOG_DEFINE_MODULE__, and the
latter macros didn't have any other users. This commit combines them for
clarity.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Russell Bryant <russell@ovn.org>
Ben Pfaff [Wed, 3 Feb 2016 20:23:36 +0000 (12:23 -0800)]
vlog: Make 'vlog_modules' private to vlog.c.
I think we once used this variable from an inline function in vlog.h, so
that we had to make it "extern", but these days it's only used from vlog.c,
so it can be static now.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Russell Bryant <russell@ovn.org>
Ben Pfaff [Wed, 3 Feb 2016 01:57:46 +0000 (17:57 -0800)]
ofproto: Detect and handle errors in ofproto_port_add().
The update_port() function called in ofproto_port_add() can encounter
errors that prevent a port from being added, but nothing was checking for
the error and in fact update_port() didn't even pass the error along to
its caller. This commit fixes the problem.
The scenario that led me to examine this code can be triggered as follows
from the sandbox, as long as you change --enable-dummy=override to
--enable-dummy=system in ovs-sandbox:
The second add-port will fail due to the duplicate tunnel options, but
ofproto_port_add() will not return the error. Instead, it will report to
the caller that it succeeded and tell it that it has ofp_port OFPP_NONE
(65535), which is invalid and it obviously does not. The result is that
you get bizarre log messages like this:
tunnel|WARN|tun1: attempting to add tunnel port with same config as port 'tun0' (::->1.2.3.4, key=0, dp port=7471, pkt mark=0)
ofproto|WARN|br0: could not add port tun1 (File exists)
bridge|INFO|bridge br0: added interface tun1 on port 65535
ofproto|WARN|br0: cannot configure bfd on nonexistent port 65535
ofproto|WARN|br0: cannot configure LLDP on nonexistent port 65535
ofproto|WARN|br0: cannot get STP status on nonexistent port 65535
ofproto|WARN|br0: cannot get RSTP status on nonexistent port 65535
ofproto|WARN|br0: cannot get STP stats on nonexistent port 65535
ofproto|WARN|br0: cannot get STP stats on nonexistent port 65535
VMware-BZ: #1598643 Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
Russell Bryant [Wed, 3 Feb 2016 16:46:33 +0000 (11:46 -0500)]
ovn: Update comment about local datapath calculation.
ovn-controller has a simple optimization for excluding some logical flow
processing that we know is unnecessary. This update to a comment in the
code reflects a discussion we had on the mailing list about a better
algorithm that would let us exclude even more.
Alin Serdean [Tue, 12 Jan 2016 18:00:32 +0000 (18:00 +0000)]
datapath-windows: fix endless loop on reboot
Testing under 2012 gave some more insight into an old bug.
If a PNP event with the value of NetEventSwitchActivate was triggered
we were calling OvsQuerySwitchActivationComplete which does an OID request
to the underlying drivers, however this triggered a hang because as per
documentation:
https://msdn.microsoft.com/en-us/library/windows/hardware/ff561830%28v=vs.85%29.aspx
"A driver can call NdisFOidRequest when it is in the Restarting, Running,
Pausing, or Paused state."
This resulted in an endless booting cycle.
Looking at the documentation again:
https://msdn.microsoft.com/en-us/library/windows/hardware/ff568751%28v=vs.85%29.aspx
NetEventSwitchActivate indicates that the extensible switch has completed
activation so we can now safely query the switch itself.
Also, we were not forwarding the PNP event to the overlying drivers unless
we succeeded in the operation; this issue has been fixed as well.
Russell Bryant [Wed, 20 Jan 2016 18:29:25 +0000 (13:29 -0500)]
ovn-controller: Only process lflows for local datapaths.
Previously, ovn-controller translated logical flows into OpenFlow flows
for *every* logical datapath. This patch makes it so we skip doing so
for the egress pipeline if the datapath is a logical switch with no
logical ports bound locally. In that case, the flows have no effect.
This was the code path taking the most time in a large scale OVN
environment and was an easy optimization to make based on the existing
local_datapaths info.
In this environment, while idling, ovn-controller was taking up about
20% CPU with this patch, while other nodes were in the 40-70% range.
Reported-at: https://bugs.launchpad.net/networking-ovn/+bug/1536003 Signed-off-by: Russell Bryant <russell@ovn.org> Tested-by: Matt Mulsow <mamulsow@us.ibm.com> Acked-by: Ben Pfaff <blp@ovn.org> Acked-By: Kyle Mestery <mestery@mestery.com>
Russell Bryant [Mon, 25 Jan 2016 21:54:06 +0000 (16:54 -0500)]
ovn-controller: Allocate ct zones for localnet ports.
Previously, all ct() actions applied to localnet ports used the default
conntrack zone. We should allocate a ct zone ID for all localnet ports
just like we do for all local VIFs so that none of our connection
tracking interferes with any base system connection tracking in the
default zone.
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Han Zhou <zhouhan@gmail.com> Acked-by: Ben Pfaff <blp@ovn.org>
Russell Bryant [Fri, 15 Jan 2016 21:39:42 +0000 (16:39 -0500)]
ovn: Fix localnet ports on the same chassis.
Multiple logical ports on the same chassis that were connected to the
same physical network via localnet ports were not able to send packets
to each other. This was because ovn-controller created a single patch
port between br-int and the physical network access bridge and used it
for all localnet ports.
The fix implemented here is to create a separate patch port for every
logical port of type=localnet. An optimization is included where these
ports are only created if the localnet port is on a logical switch with
another logical port with an associated local VIF.
A nice side effect of this fix is that the code in physical.c got a lot
simpler, as localnet ports are now handled mostly like local VIFs.
Fixes: c02819293d52 ("ovn: Add "localnet" logical port type.") Reported-by: Han Zhou <zhouhan@gmail.com>
Reported-at: http://openvswitch.org/pipermail/dev/2016-January/064413.html Signed-off-by: Russell Bryant <russell@ovn.org> Tested-by: Kyle Mestery <mestery@mestery.com> Acked-By: Kyle Mestery <mestery@mestery.com> Tested-by: Han Zhou <zhouhan@gmail.com> Tested-by: Michael Arnaldi <michael.arnaldi@mymoneyex.com> Acked-by: Ben Pfaff <blp@ovn.org>
Russell Bryant [Fri, 15 Jan 2016 19:30:41 +0000 (14:30 -0500)]
ovn-controller: Move local_datapaths calculation.
Before this patch, physical.c built up the set of local datapaths for
its own use. I'd like to use it in another module in a later patch, so
pull it out of physical. It's now populated by the bindings module,
since that seems like a more appropriate place to do it, and it's also
done much earlier in the main loop, making it easier to re-use.
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Han Zhou <zhouhan@gmail.com> Acked-by: Ben Pfaff <blp@ovn.org>
It is ok to iterate a cmap with CMAP_FOR_EACH and remove elements with
cmap_remove(), but having quiescent states inside the loop might create
problems, since some of the postponed cleanup done inside the cmap might
be executed, freeing the memory that the iterator is using.
We had several of these errors in dpif-netdev, because when we rearrange
ports or threads we often need to wait on a condition variable (which
implies a quiescent state).
This problem caused iterations to skip elements or to list them twice,
resulting in the main thread waiting on a condition without anyone else
to signal.
Fix these cases by moving the possible quiescent states outside
CMAP_FOR_EACH loops.
When a group of packets arrives from a port, we loop through them to
initialize metadata and then we loop through them again to extract the
flow and perform the exact match classification.
This commit combines the two loops into one, and initializes packet->md
in emc_processing() to improve performance.
Since emc_processing() might also be called after recirculation (in
which case the metadata is already valid), an extra parameter is added
to support both cases.
This commit also implements simple prefetching of packet metadata,
to further improve performance.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Andy Zhou <azhou@ovn.org> Acked-by: Chandran, Sugesh <sugesh.chandran@intel.com>
Joe Stringer [Fri, 8 Jan 2016 01:47:23 +0000 (17:47 -0800)]
compat: Detect and use nf_ct_frag6_gather().
This function is a likely candidate for backporting, and currently
relies on version checks to include the source or not. Grep for the
appropriate functions instead, and include the backport based on that.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org>
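The idea of detecting a feature by searching the headers, rather than by comparing kernel versions, can be sketched as follows. This is only an illustration: the helper header_has and both header snippets are invented for the example and are not copied from any kernel or from the OVS build system.

```python
import re


def header_has(header_text, symbol):
    """Return True if the header text mentions symbol as a whole word."""
    pattern = r"\b" + re.escape(symbol) + r"\b"
    return re.search(pattern, header_text) is not None


# Invented header contents, standing in for a kernel that provides the
# function and one that does not.
header_with = "int nf_ct_frag6_gather(void); /* made-up declaration */\n"
header_without = "int some_other_function(void);\n"

# The backport is included only when the symbol is missing, regardless of
# which kernel version the header claims to come from.
need_backport_a = not header_has(header_with, "nf_ct_frag6_gather")
need_backport_b = not header_has(header_without, "nf_ct_frag6_gather")
print(need_backport_a, need_backport_b)
```

A check like this keeps working when a distribution backports the function to an older kernel, which a plain version comparison would miss.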
Joe Stringer [Thu, 24 Dec 2015 19:06:18 +0000 (11:06 -0800)]
compat: Detect and use inet_frags->lock.
Prior to ab1c724f6330 ("inet: frag: use seqlock for hash rebuild")
upstream, a rwlock was used when rebuilding inet_frags. Rather than
using a version check to detect this, search for it in the header and
enable the code based on whether it exists.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org>
Joe Stringer [Thu, 24 Dec 2015 18:54:37 +0000 (10:54 -0800)]
compat: Detect and use inet_frags->frags_work.
Kernels 3.17 and newer have a work queue to evict old fragments, while
older kernel versions use an LRU in the fast path; see upstream commit
b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue").
This commit fixes the version checking so that rather than enabling the
code for either of these approaches using version checks, it is
triggered based on the presence of the work queue in "struct inet_frags".
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org>
Joe Stringer [Thu, 24 Dec 2015 18:40:02 +0000 (10:40 -0800)]
compat: Detect and use inet_frag_queue->last_in.
Kernels 3.17 and older have this field, while newer kernels use the
'flags' field. Detect this in the build in case anyone backports this
change to an older kernel.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org>
Russell Bryant [Thu, 17 Dec 2015 14:45:58 +0000 (09:45 -0500)]
python: Deal with str and byte differences.
Python 3 has separate types for strings and bytes. Python 2 used the
same type for both. We need to convert strings to bytes before writing
them out to a socket. We also need to convert data read from the socket
to a string.
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
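A minimal sketch of the conversion described above, assuming UTF-8 as the encoding and using a local socket pair in place of a real connection. The helper names send_text and recv_text are hypothetical and do not come from the OVS Python library.

```python
import socket


def send_text(sock, text):
    """Encode a str to bytes before writing it to the socket."""
    sock.sendall(text.encode("utf-8"))


def recv_text(sock, bufsize=4096):
    """Decode bytes read from the socket back into a str."""
    return sock.recv(bufsize).decode("utf-8")


# A connected pair of sockets stands in for a client and a server.
writer, reader = socket.socketpair()
try:
    send_text(writer, '{"method": "echo"}')
    reply = recv_text(reader)
finally:
    writer.close()
    reader.close()

print(type(reply).__name__, reply)
```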
Russell Bryant [Thu, 17 Dec 2015 17:55:43 +0000 (12:55 -0500)]
python: Fix object comparisons in Python 3.
Python 3 no longer supports __cmp__. Instead, we have to implement the
"rich comparison" operators. We implement __eq__ and __lt__ and use
functools.total_ordering to implement the rest.
In one case, no __cmp__ method was provided and instead relied on the
default behavior provided in Python 2. We have to implement the
comparisons explicitly for Python 3.
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
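The pattern described above looks roughly like this. The Version class is a hypothetical example and is not taken from the OVS sources; only __eq__ and __lt__ are written by hand, and functools.total_ordering derives the other comparison operators.

```python
import functools


@functools.total_ordering
class Version(object):
    """Hypothetical class that previously would have defined __cmp__."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        # Defining __eq__ removes the default __hash__, so restore it.
        return hash(self._key())


ordered = sorted([Version(2, 0), Version(1, 3), Version(1, 2)])
print([v._key() for v in ordered])
```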
Russell Bryant [Wed, 16 Dec 2015 21:16:49 +0000 (16:16 -0500)]
python: Remove remaining direct type comparisons.
I've hit several bugs in this Python 3 work where the fix was some code
needed to be converted to use isinstance(). This has been primarily
around dealing with the changes to unicode handling. Go ahead and
convert the rest of the direct type comparisons to use isinstance(), as
it could avoid a bug I haven't hit yet and it's more Pythonic, anyway.
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
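One way to see why isinstance() is the safer check: a direct type comparison rejects subclasses, while isinstance() accepts them. The PortName class below is made up for this illustration.

```python
class PortName(str):
    """Hypothetical str subclass, used only for this illustration."""


value = PortName("br0")

# A direct type comparison fails for subclasses...
direct_match = type(value) == str
# ...while isinstance() accepts any subclass of the expected type.
isinstance_match = isinstance(value, str)

print(direct_match, isinstance_match)
```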
Russell Bryant [Tue, 15 Dec 2015 21:37:11 +0000 (16:37 -0500)]
python: Drop use of sys.maxint.
sys.maxint does not exist in Python 3, as an int does not have a max
value anymore (except as limited by implementation details and system
resources).
sys.maxsize works as a reasonable substitute as it's the same as
sys.maxint. The Python 3.0 release notes have this to say:
The sys.maxint constant was removed, since there is no longer a limit
to the value of integers. However, sys.maxsize can be used as an
integer larger than any practical list or string index. It conforms to
the implementation’s “natural” integer size and is typically the same
as sys.maxint in previous releases on the same platform (assuming the
same build options).
sys.maxsize is documented as:
An integer giving the maximum value a variable of type Py_ssize_t can
take. It’s usually 2**31 - 1 on a 32-bit platform and 2**63 - 1 on a
64-bit platform.
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
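A short sketch of the substitution under Python 3. The name UNLIMITED and the clamp helper are hypothetical; they only show sys.maxsize used as a "no practical limit" default, which is the role sys.maxint typically played.

```python
import sys

# Hypothetical "no limit" sentinel that would have been sys.maxint before.
UNLIMITED = sys.maxsize


def clamp(count, limit=UNLIMITED):
    """Return count, capped at limit."""
    return min(count, limit)


# Python 3 integers have no maximum, so going past sys.maxsize still
# yields an ordinary int.
beyond = sys.maxsize + 1
print(clamp(10), clamp(10, limit=3), isinstance(beyond, int))
```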
Russell Bryant [Tue, 15 Dec 2015 13:51:45 +0000 (08:51 -0500)]
python: Drop use of types.FunctionType.
This code asserted that the callback argument was of type
types.FunctionType. It's more pythonic to just check that the argument
is callable, and not specifically that it's a function. There are other
ways to implement a callback than types.FunctionType.
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
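The difference between the two checks can be sketched as follows. The register function, the Counter class, and log_event are hypothetical names for this example; the point is that callable() accepts partials and objects with __call__, which a types.FunctionType check would reject.

```python
import functools
import types


def register(callbacks, callback):
    """Accept anything callable, not only plain functions."""
    if not callable(callback):
        raise TypeError("callback must be callable")
    callbacks.append(callback)


class Counter(object):
    """Callable object that counts how often it is invoked."""

    def __init__(self):
        self.calls = 0

    def __call__(self, *args):
        self.calls += 1


def log_event(prefix, event):
    return "%s: %s" % (prefix, event)


callbacks = []
register(callbacks, log_event)
register(callbacks, functools.partial(log_event, "db"))
register(callbacks, Counter())

# Only the plain function would have passed the old FunctionType check.
strict = [isinstance(cb, types.FunctionType) for cb in callbacks]
print(strict)
```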
Russell Bryant [Mon, 14 Dec 2015 22:01:11 +0000 (17:01 -0500)]
python: Drop unicode type.
Python 2 had str and unicode. Python 3 only has str, which is always a
unicode string. Drop use of unicode with the help of six.text_type
(unicode in py2 and str in py3) and six.string_types ([str, unicode] in
py2 and [str] in py3).
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Russell Bryant [Mon, 14 Dec 2015 21:32:00 +0000 (16:32 -0500)]
python: Drop usage of long type.
Python 2 has both long and int types. Python 3 only has int, which
behaves like long.
In the case of needing a set of integer types, we can use
six.integer_types which includes int and long for Python 2 and just int
for Python 3.
We can convert all cases of long(value) to int(value), because as of
Python 2.4, when the result of an operation would be too big for an int,
the type is automatically converted to a long.
There were several places in this patch doing type comparisons. The
preferred way to do this is using the isinstance() or issubclass()
built-in functions, so I converted the similar checks nearby while I was
at it.
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Russell Bryant [Fri, 22 Jan 2016 19:49:25 +0000 (14:49 -0500)]
flake8: Fix use of --select and --ignore.
The flake8 command evolved over a series of patches and now includes the
use of both --select and --ignore. Unfortunately, this wasn't doing
what I thought. The use of --select completely overrides what --ignore
does, meaning that we were only currently enforcing a small number of
warnings specified in --select. This patch runs flake8 twice, once with
--select and once with --ignore to actually enforce the full desired
set of warnings.
No additional violations had been introduced, but I noticed this while
working on some other patches.
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Andy Zhou [Sat, 30 Jan 2016 01:48:50 +0000 (17:48 -0800)]
dpif-netdev: optimizing emc_processing()
Commit d262ac2c60ce1da7b477737f70e8efd38b32502d introduced a slight
performance drop for the fast path, where every packet hits the
emc cache. This patch removes that performance drop by only reloading
the key pointer on emc cache miss.
Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Andy Zhou [Sat, 30 Jan 2016 01:40:18 +0000 (17:40 -0800)]
dpif-netdev: Load packet pointer only once in emc_processing()
On the machines I have access to, reloading the same pointer from
memory seems to inhibit compiler optimization somewhat.
In emc_processing(), using a single packet pointer, instead of reloading
it from memory with packets[i], improves performance by 0.3 Mpps (tested
with 10G NIC pushing 64 byte packets, with the base line of 12.2 Mpps).
Besides improving performance, this patch should also improve code
readability.
Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Ilya Maximets [Tue, 2 Feb 2016 11:02:15 +0000 (14:02 +0300)]
netdev-dpdk: Unlink vhost-user sockets on fatal signals.
When OVS is killed, it may not call rte_vhost_driver_unregister()
for vhost-user ports. As a result, the corresponding socket will
remain in the system, and opening that port after a restart
will fail.
(Even after this patch this remains a problem for signals
that OVS does not or cannot catch, such as SIGSEGV and
SIGKILL.)
Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Russell Bryant [Mon, 1 Feb 2016 14:58:22 +0000 (09:58 -0500)]
ovn-northd: Don't set custom log level defaults.
ovn-northd set some custom log level defaults, which I believe were
copied from ovs-vsctl. Other daemons don't set this. The difference in
behavior in ovn-northd vs other daemons has caused some confusion during
OpenStack+OVN development and testing, so make it consistent.
Reported-by: Ryan Moats <rmoats@us.ibm.com>
Reported-at: https://bugs.launchpad.net/bugs/1539994 Signed-off-by: Russell Bryant <russell@ovn.org> Acked-By: Kyle Mestery <mestery@mestery.com> Acked-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Sun, 24 Jan 2016 16:32:36 +0000 (08:32 -0800)]
dpif-netdev: Avoid copying netdev_flow_key in emc_processing().
Before this commit, emc_processing() copied a netdev_flow_key if there was
no exact-match cache (EMC) hit. This commit eliminates the copy by
constructing the netdev_flow_key in the place it would be copied.
Found by inspection.
Shahbaz (CCed) reports that this reduces the cost of an EMC miss by 72
cycles in his test case in which the EMC is disabled. Presumably this
is similarly valuable in cases where the EMC merely has few hits.
For the original version of this patch, which was against a slightly
earlier version of OVS, Daniele reported that:
- With EMC disabled, this increases throughput from 4.8 Mpps to 5.4
Mpps.
- With EMC enabled, this decreases throughput from 12.4 to 12.0 Mpps.
CC: Muhammad Shahbaz <mshahbaz@cs.princeton.edu> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Jarno Rajahalme [Sat, 30 Jan 2016 01:28:08 +0000 (17:28 -0800)]
ofproto-dpif-xlate: Remove obsolete special case.
Bond recirculation used to insert a special rule that jumped from the
internal table to table 0 using GOTO_TABLE. Since the introduction of
the ofproto-dpif-rid this has not been necessary any more, so we can
remove the special case that allowed GOTO_TABLE to go backwards in
that case.
When the linux stack is an endpoint connected to OVS which is performing
IP fragmentation via conntrack actions, it's possible to hit a kernel
BUG. The following upstream commit fixes the issue inside ip_defrag().
For the backport, we provide this inside ip_defrag() for kernels for which
we currently backport that function, and also provide just the bugfix for
newer kernels, so we can continue to use upstream functionality as much
as possible.
Upstream commit:
Later parts of the stack (including fragmentation) expect that there is
never a socket attached to frag in a frag_list, however this invariant
was not enforced on all defrag paths. This could lead to the
BUG_ON(skb->sk) during ip_do_fragment(), as per the call stack at the
end of this commit message.
While the call could be added to openvswitch to fix this particular
error, the head and tail of the frags list are already orphaned
indirectly inside ip_defrag(), so it seems like the remaining fragments
should all be orphaned in all circumstances.
Russell Bryant [Fri, 29 Jan 2016 21:32:36 +0000 (16:32 -0500)]
MAINTAINERS: convert to .md format.
Most other doc files are in markdown format, so convert this one as
well. Instead of linking to the web site, just use relative links
to the policy documents in the tree.
This patch also gives me a chance to hide a fix for my failure to
accurately sort names in alphabetical order.
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
Russell Bryant [Thu, 28 Jan 2016 16:42:27 +0000 (11:42 -0500)]
rhel: Clarify instructions for RHEL 7.
The rpm build instructions did not clarify what spec files were to be
used for RHEL 7 and its derivatives. Clarify that you're actually
supposed to use the spec files called "fedora" for RHEL 7 right now.
Update references to Fedora versions to reflect the current release
(23), as neither 16 nor 17 is a supported release anymore.
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Gurucharan Shetty <guru@ovn.org>
Russell Bryant [Thu, 28 Jan 2016 20:23:00 +0000 (15:23 -0500)]
rhel: Make openvswitch-kmod-fedora.spec build.
I tried building this package on Fedora 23 and it failed for a couple of
different reasons.
This package tried to install modules without specifying the rpm build
root as an install prefix. The result was just attempting to install
the modules on the system, which luckily failed since I wasn't running
rpmbuild as root.
The package also then tried to manually install the modules into the rpm
build root, which is unnecessary once modules_install is pointed to the
right place.
Finally, the package build failed with a completely unhelpful error
which turned out to be because it didn't know how to generate
a debuginfo package. I turned off the debug package for now. At least
it builds now, which is an improvement.
Joe Stringer [Wed, 27 Jan 2016 00:49:36 +0000 (00:49 +0000)]
datapath: Fix IPv6 fragment expiry crash.
Prior to a series of commits in 3.17 like the following, the model
used to manage and expire fragments was different. We already backport
several of these functions (See datapath/compat/inet_fragment.c) to do
things like allocate/evict/destroy frags and frag queues. In the IPv4
code, we use these. In most of the IPv6 cases, we already reuse these
also. However, for timed frag expiration we instead call the upstream
version of the function, which proceeds to use the upstream versions
of the functions we backport in inet_fragment.c. There can be some
discrepancy between the offsets used in these upstream versions vs. the
backport versions, so if you mix/match them then it leads to invalid
dereferences.
b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue")
ab1c724f6330 ("inet: frag: use seqlock for hash rebuild")
Fixes the following kernel oops on kernels < 3.17 when IPv6 fragments
are expired without reassembling the frame.
Simon Horman [Fri, 27 Nov 2015 06:07:23 +0000 (22:07 -0800)]
datapath: test for netlink_set_err returning void
In v2.6.33 netlink_set_err returns void. However, 1a50307ba182 ("netlink:
fix NETLINK_RECV_NO_ENOBUFS in netlink_set_err()") was backported and
included in v2.6.33.2 and in that and subsequent v2.6.33 stable releases
netlink_set_err returns an int.
It seems plausible that there are other backports floating around. So check
for netlink_set_err returning void rather than including compatibility code
based on the version of the kernel.
Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Pravin B Shelar <pshelar@ovn.org>
Flavio Leitner [Tue, 26 Jan 2016 18:58:14 +0000 (16:58 -0200)]
netdev-dpdk: Add vhost-user multiqueue support
Most network cards today support multiple receive
and transmit queues (MQ). The core idea is that on packet
reception, a NIC can send different packets to different
queues to distribute processing among CPUs running in parallel.
The packet distribution is based on the result of a filter applied
to each packet's headers. The filter should keep all packets from
the same flow on the same queue to avoid re-ordering while
distributing different flows among all available queues.
This is how the packet moves in a typical vhost-user use-case:
NIC OVS
DPDK port ==== bridge --- vhost-user ==== qemu ==== virtio eth0
DPDK ports, OVS bridges, the virtio network driver and, recently,
QEMU (vhost-user) support MQ. This patch adds MQ
support to OVS that leverages DPDK vhost library to implement
vhost-user interfaces.
Signed-off-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Kevin Traynor <kevin.traynor@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
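The queue selection property described above (same flow, same queue) can be sketched with a simple hash over the flow's header fields. Real NICs use their own filter implementations; the CRC32 hash, the select_queue name, and the sample 5-tuple here are assumptions chosen only to show the property.

```python
import zlib


def select_queue(flow, n_queues):
    """Map a flow's header fields to a queue index (illustrative only)."""
    key = "|".join(str(field) for field in flow).encode("ascii")
    return zlib.crc32(key) % n_queues


# Made-up 5-tuple: source IP, destination IP, protocol, source and
# destination ports.
flow_a = ("10.0.0.1", "10.0.0.2", 6, 12345, 80)

# Every packet of the same flow lands on the same queue, so packets within
# a flow are not reordered across queues.
queues = [select_queue(flow_a, 4) for _ in range(3)]
print(queues)
```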
Ben Pfaff [Thu, 28 Jan 2016 21:25:50 +0000 (13:25 -0800)]
tests: Fix race in "ofproto-dpif - ofproto-dpif-monitor 1" test.
This test contained two commands that both read and overwrote
ovs-vswitchd.log, and then expected the running ovs-vswitchd to carry on
appending to it. Depending on the shell implementation and the speed of
execution, and the libc implementation, this might not have the desired
effect. This commit replaces this by a multi-step process that avoids
in-place replacement.
Found by inspection.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>
Ben Pfaff [Wed, 27 Jan 2016 00:23:30 +0000 (16:23 -0800)]
tests: Change ADD_OF_PORTS from macro to shell function.
This reduces the size of the generated testsuite and makes it possible
to pass arguments that vary at runtime instead of at the time of
translation from .at to shell script.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>
Russell Bryant [Wed, 27 Jan 2016 20:10:08 +0000 (15:10 -0500)]
Add MAINTAINERS file.
Previously, the list of committers was not written down publicly. There
was no reason for this other than it not being trivial to expose the
commiter group membership via github. This MAINTAINERS file lists the
members of the OVS committers group.
There's some outdated email addresses in here, but I just copied what
was currently in AUTHORS. This can be fixed when AUTHORS gets fixed,
too.
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Andy Zhou [Tue, 26 Jan 2016 02:48:19 +0000 (18:48 -0800)]
dpif-netdev: drop swapping
emc_processing() moves all the missed packets towards the beginning of
packet array; matched packets are queued up into flow queues. Since the
remainder of the packet array is not used anymore, don't bother swapping
packet pointers; this saves cycles and simplifies the logic.
Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
When debug_recirc triggers recirculation and we later resume processing,
only the output to port 2 should be executed, because the effects of
"resubmit" have already taken place. However, until now, the "resubmit"
was added to the actions to execute post-recirculation, resulting in an
infinite loop.
Now consider this flow table (as seen in the "MPLS handling" test in
ofproto-dpif.at):
Here, we do want to add the "resubmit" to the actions to execute
post-recirculation, since the "resubmit" cannot be processed until after
recirculation makes the nw_dst field available.
This commit fixes the problem in both cases.
Found when testing a feature based on recirculation.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>
William Tu [Wed, 27 Jan 2016 06:21:15 +0000 (22:21 -0800)]
tests: Fix unbalanced parentheses that caused build break in testcase.
The current build fails at this test case:
/usr/bin/m4:tests/ovs-vswitchd.at:171: recursion limit of 1024 exceeded,
use -L<N> to change it
autom4te: /usr/bin/m4 failed with exit status: 1
Observed on Centos 6.5 with m4 version 1.4.13.
Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ilya Maximets [Tue, 26 Jan 2016 06:12:34 +0000 (09:12 +0300)]
dpif-netdev: Unique and sequential tx_qids.
Currently tx_qid is equal to pmd->core_id. This leads to unexpected
behavior if pmd-cpu-mask is different from '/(0*)(1|3|7)?(f*)/',
e.g. if core_ids are not sequential, or don't start from 0, or both.
Example:
starting 2 pmd threads with 1 port, 2 rxqs per port,
pmd-cpu-mask = 00000014 and let dev->real_n_txq = 2
In that case pmd_1->tx_qid = 2, pmd_2->tx_qid = 4 and
txq_needs_locking = true (if the device doesn't have
ovs_numa_get_n_cores()+1 queues).
In that case, after truncating in netdev_dpdk_send__():
'qid = qid % dev->real_n_txq;'
pmd_1: qid = 2 % 2 = 0
pmd_2: qid = 4 % 2 = 0
So, both threads will call dpdk_queue_pkts() with the same qid = 0.
This is unexpected behavior if there are 2 tx queues in the device.
Queue #1 will not be used and both threads will lock queue #0
on each send.
Fix that by using sequential tx_qids.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
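The arithmetic in the tx_qid example above can be reproduced with a minimal sketch. The helper names below are hypothetical and are not taken from dpif-netdev; numbering the threads with enumerate() is only one assumed way to obtain sequential ids, not necessarily how the commit does it.

```python
def core_ids_from_mask(mask):
    """Return the core ids whose bits are set in a pmd-cpu-mask value."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def truncated_qid(tx_qid, real_n_txq):
    """Mirror the truncation quoted above: qid = qid % dev->real_n_txq."""
    return tx_qid % real_n_txq


real_n_txq = 2
core_ids = core_ids_from_mask(0x14)  # bits 2 and 4 are set

# Old scheme: tx_qid equals the core id, so both threads collapse to queue 0.
old_qids = [truncated_qid(core_id, real_n_txq) for core_id in core_ids]

# Sequential scheme (assumed): number the pmd threads 0, 1, ... instead.
new_qids = [truncated_qid(index, real_n_txq)
            for index, _ in enumerate(core_ids)]

print(core_ids, old_qids, new_qids)
```

With sequential ids the two threads land on different queues, so queue #1 is used and the threads no longer contend for queue #0.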
Ilya Maximets [Tue, 26 Jan 2016 06:12:33 +0000 (09:12 +0300)]
dpif-netdev: Rework of rx queue management.
The current rx queue management model is buggy and will not work properly
without additional barriers and other synchronization between PMD
threads and the main thread.
Known bugs of the current model:
* While reloading, two PMD threads, one already reloaded and
one not yet reloaded, can poll the same queue of the same port.
This behavior may lead to dpdk driver failure, because the
drivers are not thread-safe.
* Same bug as fixed in commit e4e74c3a2b
("dpif-netdev: Purge all ukeys when reconfigure pmd.") but
reproduced while only reconfiguring pmd threads without
restarting, because an addition may change the sequence of
other ports, which matters during reconfiguration.
Introduce a new model in which the distribution of queues is made by the
main thread with minimal synchronization and without data races between
pmd threads. Also, this model should work faster, because only the
needed threads will be interrupted for reconfiguration and the total
computational complexity of reconfiguration is lower.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Lance Richardson [Fri, 22 Jan 2016 15:12:29 +0000 (10:12 -0500)]
system-traffic: use appropriate nc options for installed version
Test cases using netcat ("ICMP related" and "ICMP related with NAT")
currently fail on systems using the nmap version of nc because this
version does not support the -q command-line option.
Fix this by detecting which version of netcat is in use and
using the "--send-only" command-line option when the nmap flavor
is detected, and using "-q 1" otherwise (openbsd and traditional
versions).
Tested via "make check-kernel" on RHEL7 (nmap version of nc),
Debian 8.2 (openbsd version of nc), and Ubuntu 14.04 ("traditional" nc).
Signed-off-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: Joe Stringer <joe@ovn.org>
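The option selection described above can be sketched as a small pure function. This is an illustration only: the function name is hypothetical, and keying on "ncat"/"nmap" in the tool's version text is an assumption about how the flavor could be detected, not the check the test suite actually performs.

```python
def nc_send_options(version_text):
    """Pick netcat options from the tool's self-description.

    Hypothetical helper: assumes the nmap flavor identifies itself with
    "Ncat" or "nmap" in its version text.
    """
    text = version_text.lower()
    if "ncat" in text or "nmap" in text:
        # The nmap flavor has no -q option; use --send-only instead.
        return ["--send-only"]
    # openbsd and traditional flavors accept -q 1.
    return ["-q", "1"]


# Illustrative strings, not captured tool output.
nmap_opts = nc_send_options("Ncat: Version 6.40 ( http://nmap.org/ncat )")
other_opts = nc_send_options("OpenBSD netcat")
print(nmap_opts, other_opts)
```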
Ben Pfaff [Fri, 22 Jan 2016 23:58:55 +0000 (15:58 -0800)]
ofproto-dpif-xlate: Fix recirculation for resubmit to current table.
When recirculation defers actions for processing later, it decides
based on the actions being saved whether it needs to record the table
and cookie from which they originated. Until now, it was thought that
this was only important for actions that send packets to the controller
(because those actions send the table ID and cookie). This overlooked
a special case of the "resubmit" action which also depends on the
current table ID, which meant that this special case malfunctioned if
it came after recirculation. This commit fixes the problem, and adds
a test.
Found while testing another feature under development.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>
Russell Bryant [Mon, 14 Dec 2015 20:13:20 +0000 (15:13 -0500)]
python: Convert dict iterators.
In Python 2, dict.items(), dict.keys(), and dict.values() returned a
list. dict.iteritems(), dict.iterkeys(), and dict.itervalues() returned
an iterator.
As of Python 3, dict.iteritems(), dict.itervalues(), and dict.iterkeys()
are gone. items(), keys(), and values() now return iterable view objects
rather than lists.
In the case where we want an iterator, we now use the six.iter*()
helpers. If we want a list, we explicitly create a list from the
iterator.
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
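A minimal sketch of the Python 3 behavior described above, using only the standard library. The commit itself uses the six.iter*() helpers; they are left out here so the example runs without extra dependencies, and the dictionary contents are made up for illustration.

```python
ports = {"eth0": 1, "eth1": 2}

# In Python 3, items() returns a view object, not a list.
items_view = ports.items()

# When a list is needed, build one explicitly.
items_list = list(ports.items())

# When only iteration is needed, iterate over the view directly.
total = sum(value for value in ports.values())

# A list snapshot of the keys makes it safe to delete entries in the loop.
for name in list(ports.keys()):
    if ports[name] > 1:
        del ports[name]

print(items_list, total, ports)
```

Taking the list snapshot before the loop matters: deleting from a dict while iterating over a live view raises RuntimeError in Python 3.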
Ben Pfaff [Thu, 21 Jan 2016 00:53:01 +0000 (16:53 -0800)]
ofproto-dpif-xlate: Put recirc_state, not recirc_id_node, in xlate_in.
This will make it possible, in an upcoming commit, to construct a
recirc_state locally on the stack to pass to xlate_actions(). It would
also be possible to construct and pass a recirc_id_node on the stack, but
the translation process only uses the recirc_state anyway. The alternative
here of having upcall_xlate() know that it can recover the recirc_id_node
from the recirc_state isn't great either; it's debatable which is the
better approach.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>