Jesse Gross [Mon, 22 Jun 2015 21:23:37 +0000 (14:23 -0700)]
tunneling: Userspace datapath support for Geneve options.
Currently the userspace datapath only supports Geneve in a
basic mode - without options - since the rest of userspace
previously didn't support options either. This enables the
userspace datapath to send and receive options as well.
The receive path for extracting the tunnel options isn't entirely
optimal because it does a lookup on the options on a per-packet
basis, rather than per-flow like the kernel does. This is not
as straightforward to do in the userspace datapath since there
is no translation step between packet formats used in packet vs.
flow lookup. This can be optimized in the future and in the
meantime option support is still useful for testing and simulation.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Tue, 23 Jun 2015 18:38:56 +0000 (11:38 -0700)]
ofproto: Fix use-after-free in bridge destruction with groups.
Groups were not destroyed until after lots of other important bridge
data had been destroyed, including the connection manager. There was an
indirect dependency on the connection manager for bridge destruction
because destroying a group also destroys all of the flows that reference
the group, which in turn causes the ofmonitor to be invoked to report that
the flows had been destroyed. This commit fixes the problem by destroying
groups earlier.
The problem can be observed by reverting the code changes in this commit
then running "make check-valgrind" with the test that this commit
introduces.
Reported-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Reviewed-by: Simon Horman <simon.horman@netronome.com>
Ben Pfaff [Fri, 26 Jun 2015 15:14:15 +0000 (08:14 -0700)]
ofp-actions: Support mixing "conjunction" and "note" actions.
It doesn't make sense to mix "conjunction" actions with most other kinds
of actions. That's because flows with "conjunction" actions aren't ever
actually executed, so any actions mixed up with them would never do
anything useful. "note" actions are a little different because they never
do anything useful anyway: they are just there to allow a controller to
annotate flows. It makes as much sense to annotate a flow with
"conjunction" actions as it does to annotate any other flow, so this
commit makes this possible.
Requested-by: Soner Sevinc <sevincs@vmware.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>
Jesse Gross [Wed, 24 Jun 2015 21:44:50 +0000 (14:44 -0700)]
tunneling: Don't match on source IP address for native tunnels.
When doing native tunneling, we look at packets destined to the
local port to see if they match tunnel protocols that we should
intercept. The criteria are IP protocol, destination UDP port, etc.
However, we also look at the source IP address of the packets. This
should be a function of the port-based tunnel layer and not the
tunnel receive code itself. For comparison, the kernel tunnel code
has no idea about the IP addresses of its link partners. If port
based tunnel is desired, it can be handled using the normal port
tunnel layer, regardless of whether the packets originally came
from userspace or the kernel.
For port based tunneling, this bug has no effect - the check is
simply redundant. However, it breaks flow-based native tunnels
because the remote IP address is not known at port creation time.
CC: Pravin Shelar <pshelar@nicira.com> Reported-by: David Griswold <David.Griswold@overturenetworks.com> Tested-by: David Griswold <David.Griswold@overturenetworks.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
Mark Kavanagh [Tue, 9 Jun 2015 14:49:18 +0000 (07:49 -0700)]
dpif-netdev: log port/core affinity
When using multiple PMDs and numerous ports, a performance gain
may be achieved in some use cases by pinning a PMD/port to a
particular (set of) core(s).
This patch provides a summary of the switch's port/core affinities
each time that the status of the switch's ports is modified.
Based on this information, a user may determine what affinity
modifications are required to optimise performance for their
particular use case.
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Signed-off-by: Wojciech Andralojc <wojciechx.andralojc@intel.com> Acked-by: Flavio Leitner <fbl@redhat.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
Wei li [Thu, 25 Jun 2015 09:45:08 +0000 (02:45 -0700)]
netdev-dpdk: Do not flush tx queue which is shared among CPUs since it is always flushed
When a tx queue is shared among CPUs, the packets are always flushed
in 'netdev_dpdk_eth_send', so it is unnecessary to flush them again
in netdev_dpdk_rxq_recv. Flushing there would also access the tx queue
without locking.
Signed-off-by: Wei li <liw@dtdream.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
Jesse Gross [Fri, 12 Jun 2015 19:49:23 +0000 (12:49 -0700)]
pkt-metadata: Avoid introducing overhead for userspace tunnels.
The addition of Geneve metadata requires a large amount of additional
space to handle the maximum set of options. In most cases, this is
not a big deal since it is only temporary storage on the stack or
can be automatically stripped out for miniflows. However, userspace
tunnels need to deal with this on a per-packet basis, so we should
avoid introducing additional overhead if possible. Two small changes
are aimed at this:
* Move struct flow_tnl to the end of the packet metadata. Since
the Geneve metadata is already at the end of flow_tnl and pkt_metadata
is at the end of struct dp_packet, this avoids putting a large
amount of metadata (which might be empty) in hot cache lines.
* Only push the new metadata into a miniflow if any options are present
during miniflow_extract(). This does not necessarily provide the
most fine-grained flow generation but it is a quick check and
the userspace implementation of Geneve does not currently support
options anyways.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Fri, 1 May 2015 01:09:57 +0000 (18:09 -0700)]
tunnel: Geneve TLV handling support for OpenFlow.
The current support for Geneve in OVS is exactly equivalent to VXLAN:
it is possible to set and match on the VNI but not on any options
contained in the header. This patch enables the use of options.
The goal for Geneve support is not to add support for any particular option
but to allow end users or controllers to specify what they would like to
match. That is, the full range of Geneve's capabilities should be exposed
without modifying the code (the one exception being options that require
per-packet computation in the fast path).
The main issue with supporting Geneve options is how to integrate the
fields into the existing OpenFlow pipeline. All existing operations
are referred to by their NXM/OXM field name - matches, action generation,
arithmetic operations (i.e. transfer to a register). However, the Geneve
option space is exactly the same as the OXM space, so a direct mapping
is not feasible. Instead, we create a pool of 64 NXMs that are then
dynamically mapped on Geneve option TLVs using OpenFlow. Once mapped,
these fields become first-class citizens in the OpenFlow pipeline.
An example of how to use Geneve options:
ovs-ofctl add-geneve-map br0 {class=0xffff,type=0,len=4}->tun_metadata0
ovs-ofctl add-flow br0 in_port=LOCAL,actions=set_field:0xffffffff->tun_metadata0,1
This will add a 4-byte option (filled with all 1's) to all packets
coming from the LOCAL port and then send them out to port 1.
A limitation of this patch is that although the option table is specified
for a particular switch over OpenFlow, it is currently global to all
switches. This will be addressed in a future patch.
Based on work originally done by Madhu Challa. Ben Pfaff also significantly
improved the comments.
Jesse Gross [Fri, 19 Jun 2015 20:54:13 +0000 (13:54 -0700)]
odp-util: Pass down flow netlink attributes when translating masks.
Sometimes we need to look at flow fields to understand how to parse
an attribute. However, masks don't have this information - just the
mask on the field. We already use the translated flow structure for
this purpose but this isn't always enough since sometimes we actually
need the raw netlink information. Fortunately, that is also readily
available so this passes it down from the appropriate callers.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Fri, 19 Jun 2015 20:39:03 +0000 (13:39 -0700)]
metaflow: Extend size of mf_value to 128 bytes.
Tunnel metadata can be substantially larger than our existing fields
(up to 124 bytes in a single Geneve option) so this extends the size
of the data that we can handle with metaflow fields. This also
breaks a few tests that assume that their max size is also the
maximum that can be handled in a field.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Tue, 2 Jun 2015 22:11:00 +0000 (15:11 -0700)]
openflow: Table maintenance commands for Geneve options.
In order to work with Geneve options, we need to maintain a mapping
table between an option (defined by <class, type, length>) and
an NXM field that can be operated on for the purposes of matches,
actions, etc. This mapping must be explicitly specified by the
user.
Conceptually, this table could be communicated using either OpenFlow
or OVSDB. Using OVSDB requires less code and definition of extensions
than OpenFlow but introduces the possibility that mapping table
updates and flow modifications are desynchronized from each other.
This is dangerous because the mapping table significantly impacts the
way that flows using Geneve options are installed and processed by
OVS. Therefore, the mapping table is maintained using OpenFlow commands
instead, which opens the possibility of using synchronization between
table changes and flow modifications through barriers, bundles, etc.
There are two primary groups of OpenFlow messages that are introduced
as Nicira extensions: modification commands (add, delete, clear mappings)
and table status request/reply to dump the current table along with switch
information.
Note that mappings should not be changed while they are in active use by
a flow. The result of doing so is undefined.
This only adds the OpenFlow infrastructure but doesn't actually
do anything with the information yet after the messages have been
decoded.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
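The mapping table described in this entry can be modelled roughly as below. This is a minimal Python sketch, not the OVS implementation: `GeneveMapTable` and its method names are hypothetical, the pool size of 64 is taken from the Geneve TLV commit in this log, and the length check (a multiple of 4, at most 124 bytes) is an assumption based on the Geneve option format.

```python
from typing import Dict, Tuple

# (class, type, length in bytes) identifies one Geneve option.
Option = Tuple[int, int, int]


class GeneveMapTable:
    """Toy model of the option <-> field mapping table (hypothetical API)."""

    N_FIELDS = 64  # size of the field pool, per the Geneve TLV commit

    def __init__(self) -> None:
        self._by_index: Dict[int, Option] = {}

    def add(self, opt_class: int, opt_type: int, opt_len: int,
            index: int) -> None:
        if not 0 <= index < self.N_FIELDS:
            raise ValueError("field index out of range")
        # Assumption: lengths follow the Geneve option format.
        if opt_len % 4 != 0 or not 0 <= opt_len <= 124:
            raise ValueError("invalid option length")
        if index in self._by_index:
            raise ValueError("field index already mapped")
        for mapped_class, mapped_type, _ in self._by_index.values():
            if (mapped_class, mapped_type) == (opt_class, opt_type):
                raise ValueError("option already mapped")
        self._by_index[index] = (opt_class, opt_type, opt_len)

    def delete(self, index: int) -> None:
        self._by_index.pop(index, None)

    def clear(self) -> None:
        self._by_index.clear()

    def dump(self) -> Dict[int, Option]:
        """Return the current table, ordered by field index."""
        return dict(sorted(self._by_index.items()))
```

In this model, `add(0xffff, 0, 4, 0)` corresponds to the `{class=0xffff,type=0,len=4}->tun_metadata0` mapping shown earlier in the log.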
Jesse Gross [Fri, 8 May 2015 03:11:57 +0000 (20:11 -0700)]
nx-match: Enable parsing string representations of variable fields.
When reading in hex strings that form NXM fields, we don't need to
enforce size constraints if the fields are variable length.
Instead, we can set the header size based on the string length.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Thu, 7 May 2015 01:05:18 +0000 (18:05 -0700)]
nx-match: Trim variable length fields when encoding as actions.
It is technically correct to send the entire maximum length of
a field when it is variable length. However, it is awkward to
do so and not what one would naively expect. Since receivers will
internally zero-extend fields, we can do the opposite and trim
off leading zeros. This results in encodings that are generally
sensible without specific knowledge of what is being transmitted.
(Of course, other implementations, such as controllers, may know
exactly the expected length of the field and are free to encode
it that way even if it has leading zeros.)
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
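The trimming rule can be illustrated with a short Python sketch. It is a model of the idea only: the function names are hypothetical, values are treated as big-endian byte strings, and keeping at least one byte for an all-zero value is an assumption rather than something the commit message states.

```python
def trim_leading_zeros(value: bytes) -> bytes:
    """Sender side: drop leading zero bytes before encoding."""
    trimmed = value.lstrip(b"\x00")
    # Assumption: an all-zero value keeps a single zero byte.
    return trimmed if trimmed else value[-1:]


def zero_extend(value: bytes, n_bytes: int) -> bytes:
    """Receiver side: pad back to the full field width."""
    return value.rjust(n_bytes, b"\x00")
```

Because the receiver zero-extends, trimming and then extending to the original width gives back the original value.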
Jesse Gross [Thu, 7 May 2015 01:04:11 +0000 (18:04 -0700)]
nx-match: Enable senders of NXM fields to specify length.
Currently when an NXM field is encoded, the caller must specify
the length of the data being provided. However, this data is
always placed into a field of standard length. In order to
support variable length options, the length field must also
alter the size in the header. The previous implementation
already required callers to pass in the exact (fixed) size of
the field or it would not work properly, so there is no danger
that this will change the behavior for non-variable length
fields.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
This adds support for receiving variable length fields encoded in
NXM/OXM and mapping them into OVS internal structures. In order
for this to make sense, we need to define some semantics:
There are three lengths that matter in this process: the maximum
size of the field (represented as the existing mf->n_bytes), the
size of the field in the incoming NXM (given by the length in the
NXM header), and the currently configured length of the field
(defined by the consumer of the field and outside the scope of
this patch).
Fields are modeled as being their maximum length and have the
characteristics expected by existing code (i.e. exact match fields
have masks that are all 1's for the whole field, etc.). Incoming
NXMs are stored in the field in the least significant bits. If
the NXM length is larger than the field, it is truncated; if it
is smaller, it is zero-extended. When the field is consumed, the
component that needs data picks the configured length out of the
generated field.
In most cases, the configured and NXM lengths will be equal and
these edge cases do not matter. However, since we cannot easily
enforce that the lengths match (and might not even know what the
right length is, such as in the case of a string parsed by
ovs-ofctl), these semantics should provide deterministic results
that are easy to understand and not require most existing code
to be aware of variable length fields.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
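The interaction of the three lengths can be sketched in Python. This is an illustrative model, not OVS code: `fit_to_field` and `pick_configured` are hypothetical names, and values are big-endian byte strings so that the least significant bits are the trailing bytes.

```python
def fit_to_field(nxm_value: bytes, n_bytes: int) -> bytes:
    """Store an incoming value in the least significant bytes of a field
    whose maximum size is n_bytes."""
    if len(nxm_value) >= n_bytes:
        # Larger than the field: keep only the least significant bytes.
        return nxm_value[len(nxm_value) - n_bytes:]
    # Smaller than the field: zero-extend on the most significant side.
    return bytes(n_bytes - len(nxm_value)) + nxm_value


def pick_configured(field: bytes, configured_len: int) -> bytes:
    """Consumer side: take the configured length from the low end."""
    if configured_len > len(field):
        raise ValueError("configured length exceeds the field size")
    return field[len(field) - configured_len:]
```

When the configured and NXM lengths are equal, `pick_configured(fit_to_field(v, n), len(v))` returns `v` unchanged, which is the common case the message refers to.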
Jesse Gross [Thu, 7 May 2015 00:59:24 +0000 (17:59 -0700)]
nx-match: Support variable length header lookup.
Currently we treat the entire NXM/OXM header, including length,
as an ID to define a field. However, this does not allow for
multiple lengths of a particular field.
If a field has been marked as variable, we should ignore the length
when looking up the field and only use the class and field. We
continue to use the length for non-variable fields to ensure that
we don't accept something that can never match.
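A rough Python model of this lookup rule follows. It assumes the standard 32-bit NXM/OXM wire layout (16-bit class, 7-bit field, 1-bit has-mask flag, 8-bit payload length, with the payload doubled when a mask is present); the field table and all names are hypothetical and do not mirror the OVS data structures.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Field(NamedTuple):
    name: str
    n_bytes: int    # maximum size of the field
    variable: bool  # True if shorter encodings are acceptable


def make_header(oxm_class: int, field: int, has_mask: bool,
                length: int) -> int:
    return (oxm_class << 16) | (field << 9) | (int(has_mask) << 8) | length


def decode_header(header: int) -> Tuple[int, int, bool, int]:
    return ((header >> 16) & 0xFFFF, (header >> 9) & 0x7F,
            bool((header >> 8) & 1), header & 0xFF)


# Hypothetical field table keyed on (class, field) only.
FIELDS: Dict[Tuple[int, int], Field] = {
    (0x0001, 0): Field("fixed_example", 4, False),
    (0x0001, 1): Field("variable_example", 8, True),
}


def lookup(header: int) -> Optional[Field]:
    oxm_class, field_id, has_mask, length = decode_header(header)
    field = FIELDS.get((oxm_class, field_id))
    if field is None:
        return None
    value_len = length // 2 if has_mask else length
    # Fixed-size fields must still match on length; variable ones need not.
    if not field.variable and value_len != field.n_bytes:
        return None
    return field
```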
Jesse Gross [Thu, 7 May 2015 00:57:03 +0000 (17:57 -0700)]
metaflow: Allow fields to be marked as variable length.
Until now, all fields that OVS can match against have been fixed
size (variable length headers can be skipped during parsing but
the match is fixed). However, Geneve options can vary in size
so we must not require the size of these fields to be known
at compile time.
This allows data types to be annotated with not only their size
but whether the field can be smaller than that. The following
patches will change OpenFlow parsing based on that.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Thu, 25 Jun 2015 16:18:38 +0000 (09:18 -0700)]
bitmap: Convert single bitmap functions to 64-bit.
Currently the functions to set, clear, and iterate over bitmaps
only operate over 32 bit values. If we convert them to handle
64 bit bitmaps, they can be used in more places.
Suggested-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
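The effect of widening the helpers can be seen in a small Python model. The function names are hypothetical and the masking only emulates fixed-width C integers; it is a sketch of the idea rather than the OVS macros.

```python
from typing import Iterator

MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def set_bit(bitmap: int, n: int) -> int:
    return (bitmap | (1 << n)) & MASK64


def clear_bit(bitmap: int, n: int) -> int:
    return bitmap & ~(1 << n) & MASK64


def iter_set_bits(bitmap: int) -> Iterator[int]:
    """Yield the index of each 1-bit, lowest first."""
    bitmap &= MASK64
    while bitmap:
        lowest = bitmap & -bitmap        # isolate the rightmost 1-bit
        yield lowest.bit_length() - 1
        bitmap &= bitmap - 1             # clear the rightmost 1-bit
```

Iterating over `bitmap & MASK32` instead shows what a 32-bit helper would miss: any bit at index 32 or above is silently dropped.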
Sumit Garg [Thu, 25 Jun 2015 13:24:54 +0000 (09:24 -0400)]
python: Fix issue with probes for JSONRPC connections
When opening a JSONRPC connection, the health probes
are incorrectly getting turned off for connections
that need probes.
In other words, when stream_or_pstream_needs_probes()
returns non-zero, the probes are getting disabled as
the probe interval is getting set to zero. This leads
to incorrect behavior such that probes are:
- not getting turned off for unix: connections
- getting turned off for tcp:/ssl: connections
The changes in this commit fix this issue.
Signed-off-by: Sumit Garg <sumit@extremenetworks.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
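The inverted condition can be sketched as follows. This is a simplified Python illustration, not the actual ovs.jsonrpc code: `needs_probes` is a hypothetical stand-in for stream_or_pstream_needs_probes(), and the default interval value is an assumption.

```python
DEFAULT_PROBE_INTERVAL_MS = 5000  # assumed default, for illustration only


def needs_probes(name: str) -> bool:
    """Hypothetical stand-in: network transports need probes, unix: does not."""
    return name.startswith(("tcp:", "ssl:"))


def probe_interval_buggy(name: str) -> int:
    # Inverted test: probes are disabled exactly where they are needed.
    return 0 if needs_probes(name) else DEFAULT_PROBE_INTERVAL_MS


def probe_interval_fixed(name: str) -> int:
    return DEFAULT_PROBE_INTERVAL_MS if needs_probes(name) else 0
```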
Sumit Garg [Thu, 25 Jun 2015 15:51:42 +0000 (08:51 -0700)]
python: Fix writing to non-"alert" column for newly inserted row.
When 'alert' was turned off on a column, the code was erroring out when
value for that column was being set in a newly inserted row. This is
because the row._data was None at this time.
It seems that new rows are not initialized to defaults and that's why the
NULL error happens. IMO a newly inserted row should automatically get
initialized to default values. This new behavior can be implemented as a
separate improvement sometime in the future.
For now, I don't see an issue with adding the additional check. This new
check can continue as-is even after the new behavior is implemented.
Signed-off-by: Sumit Garg <sumit@extremenetworks.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Alin Serdean [Thu, 25 Jun 2015 15:30:34 +0000 (15:30 +0000)]
tests: Reduce user burden for running "make check".
With this commit, users do not have to manually add the pthread-win32
DLL directory to their PATH.
Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com> Co-authored-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Russell Bryant [Tue, 23 Jun 2015 18:22:08 +0000 (14:22 -0400)]
ovn: Add logical port 'enabled' state.
This patch adds a new column to the Logical_Port table of the
OVN_Northbound database called 'enabled'. The purpose is to allow a
port to be administratively enabled or disabled. It is sometimes
useful to keep a port and its related configuration, but temporarily
disable it, which means no traffic is allowed in or out of the port.
The implementation is fairly non-invasive as it only required minor
changes to the logical pipeline.
Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Mon, 22 Jun 2015 22:42:29 +0000 (15:42 -0700)]
Makefiles: Move xml2nroff rule from ovn directory to top level.
Originally only the OVN documentation used the XML format, but now it's
used outside the ovn directory (initially for ovs-sim.1) so it's more
logical to have the xml->nroff rule at the top level.
Reported-by: Alex Wang <alexw@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Alex Wang <alexw@nicira.com>
Ben Pfaff [Wed, 24 Jun 2015 18:17:12 +0000 (11:17 -0700)]
nx-match: Fix distribution of hash function for NXM/OXM headers.
NXM/OXM headers as represented in this file are 64-bit long and the low
bits are essentially constant (almost always 0) so using hash_int(),
which takes an uint32_t, is going to be a useless hash function. This
commit fixes the problem.
Found by inspection.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
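Why truncation makes the hash useless can be shown with a small Python model. `mix32` is a stand-in mixer, not OVS's hash_int(), and hashing both halves is only one possible remedy; the commit message does not say which function the fix uses.

```python
def mix32(x: int) -> int:
    """Stand-in 32-bit mixer (not OVS's hash_int())."""
    x &= 0xFFFFFFFF
    x = (x * 0x9E3779B1) & 0xFFFFFFFF
    return x ^ (x >> 15)


def hash_truncated(header64: int) -> int:
    # Models passing a 64-bit header to a function taking uint32_t:
    # only the low 32 bits survive the implicit conversion.
    return mix32(header64 & 0xFFFFFFFF)


def hash_both_halves(header64: int) -> int:
    low = header64 & 0xFFFFFFFF
    high = (header64 >> 32) & 0xFFFFFFFF
    return mix32(mix32(high) ^ low)


# Sixteen headers that differ only in their high 32 bits.
headers = [i << 32 for i in range(16)]
truncated_buckets = {hash_truncated(h) for h in headers}
full_buckets = {hash_both_halves(h) for h in headers}
```

With the truncated hash every header lands in the same bucket, while hashing both halves keeps them apart.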
Sorin Vinturis [Wed, 24 Jun 2015 10:56:55 +0000 (10:56 +0000)]
datapath-windows: Wrong cleanup of newly created multiple NBLs
Bug found in OvsPartialCopyToMultipleNBLs function in the cleanup part of
the code. Before completing the current NBL (newNbl) the NEXT link for the
following NBL (firstNbl) was broken, instead of the current one (newNbl).
Andy Zhou [Thu, 11 Jun 2015 00:24:08 +0000 (17:24 -0700)]
ovsdb: Flush JSON cache only when necessary
Currently, JSON cache is always flushed whenever the monitor receives
a transaction. This patch improves the JSON cache efficiency by only
flushing the JSON cache when a transaction causes client visible
changes, avoiding unnecessary flushes.
Suggested-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
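The flush policy can be modelled in a few lines of Python. This is a toy sketch under stated assumptions: `UpdateCache`, its keying scheme, and the `client_visible` flag are hypothetical and are not taken from the ovsdb monitor code.

```python
from typing import Callable, Dict


class UpdateCache:
    """Toy cache of serialized updates (hypothetical, not the ovsdb code)."""

    def __init__(self) -> None:
        self._cache: Dict[int, str] = {}
        self.flushes = 0

    def get(self, key: int, build: Callable[[], str]) -> str:
        # Build the JSON text once and reuse it for later requests.
        if key not in self._cache:
            self._cache[key] = build()
        return self._cache[key]

    def on_transaction(self, client_visible: bool) -> None:
        # Only flush when the transaction changed something that a
        # monitoring client could observe.
        if client_visible:
            self._cache.clear()
            self.flushes += 1
```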
Sorin Vinturis [Thu, 18 Jun 2015 18:37:13 +0000 (18:37 +0000)]
datapath-windows: Return success for already existing WFP objects
There are cases when the WFP callout or sublayer, being persistent
objects, already exist when we try to register the OVS callout. In
these cases, when trying to add these WFP objects again, the return code
is STATUS_FWP_ALREADY_EXISTS, which we are interpreting as an error.
This is incorrect and this patch changes that.
Ben Pfaff [Wed, 10 Jun 2015 16:04:23 +0000 (09:04 -0700)]
Makefiles: Stop distributing files because building them requires Python.
A long time ago, the Open vSwitch build did not depend on Python (whereas
the runtime did), so the "make dist" based distribution included the
results of Python build tools. Later, the build began using Python,
but the distribution still included some of those results, because no one
had gone to the trouble of changing them. This commit changes the
Makefiles not to distribute Python-generated files but instead to just
generate them at build time.
Alex Wang [Fri, 12 Jun 2015 02:29:40 +0000 (19:29 -0700)]
db-ctl-base: Improve show command.
This commit improves the 'show' command logic and allows it
to print key->table_ref maps. The direct effect can be observed
from the tests/vtep-ctl.at change. The improvement will also be
used in the ovn-sbctl implementation.
Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Alex Wang [Thu, 28 May 2015 23:19:15 +0000 (16:19 -0700)]
db-ctl-base: Make common database command code into library.
This commit extracts common database command (e.g. ovs-vsctl, vtep-ctl)
code into a new library module, db-ctl-base. Specifically, the module
unifies the command syntax and common database-operating commands like
(get, list, find, set ...), and provides APIs which allow users to create
more specific commands.
Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
rpms: Exclude OVN files from openvswitch packages.
Currently rhel rpm does not build because of OVN files. This
patch only fixes the build failures. We eventually may have
to add OVN packages for RHEL, Xenserver and Debian.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Sorin Vinturis [Fri, 19 Jun 2015 16:33:56 +0000 (16:33 +0000)]
datapath-windows: Initialize reference count when enabling extension
When the extension is initialized the global reference count, used for
preventing early deallocation of the switch extension, is set to 1.
Enabling and then disabling the extension leaves this reference
count at zero. Because of this the switch context fails to be acquired,
i.e. OvsAcquireSwitchContext returns zero, and that affects the driver's
communication with userspace.
The solution is to initialize the reference count each time the extension
is enabled.
Nithin Raju [Fri, 19 Jun 2015 16:13:08 +0000 (09:13 -0700)]
datapath-windows: use correct dst port during Vxlan Tx
A previous commit used the wrong DST port in the UDP header during Vxlan
Tx, which caused Vxlan tunneling to break. Fixing it here.
Also included is a cosmetic fix in OvsDetectTunnelRxPkt() where we were
using htons() instead of ntohs(). Doesn't make a difference in practice
though.
One more change is, OvsIpHlprCbVxlan() has been nuked since it is not
used. Not sure if it is worth being resurrected.
Testing done: Ping across Vxlan tunnel and Stt tunnel.
Jesse Gross [Tue, 16 Jun 2015 18:15:28 +0000 (11:15 -0700)]
odp-util: Convert flow serialization parameters to a struct.
Serializing between userspace flows and netlink attributes currently
requires several additional parameters besides the flows themselves.
This will continue to grow in the future as well. This converts
the function arguments to a parameters struct, which makes the code
easier to read and allows irrelevant arguments to be omitted.
Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com>
Support IGMPv3 messages with multiple records. Make sure all IGMPv3
messages go through slow path, since they may carry multiple multicast
addresses, unlike IGMPv2.
Tests done:
* multiple addresses in IGMPv3 report are inserted in mdb;
* address is removed from IGMPv3 if record is INCLUDE_MODE;
* reports sent on a burst with same flow all go to userspace;
* IGMPv3 reports go to mrouters, i.e., ports that have issued a query.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ben Pfaff <blp@nicira.com>
ofproto-dpif-xlate: Make IGMP packets always take slow path.
IGMP packets need to take the slow path. Otherwise, packets that match
the same flow will not be processed by OVS. That might prevent OVS from
updating the expire time for entries already in the mdb, and also cause
it to lose packets with different addresses in the payload.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ben Pfaff <blp@nicira.com>
With this commit, the VTEP emulator detects the datapath_type of the
bridge used as a "physical" switch, and creates subsequent bridges
with the same type. This allows ovs-vtep to work with the userspace
datapath.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Gurucharan Shetty <gshetty@nicira.com>
Ben Pfaff [Tue, 16 Jun 2015 15:22:46 +0000 (08:22 -0700)]
xml2nroff: Add support for variable substitutions.
This allows XML-generated manpages in the source tree to include correct
directory names for the local configuration, instead of just the plain
nroff ones.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Alex Wang <alexw@nicira.com>
Ben Pfaff [Sat, 13 Jun 2015 23:58:49 +0000 (16:58 -0700)]
dummy: Introduce new --enable-dummy=system option.
Until now there have been two variants for --enable-dummy:
* --enable-dummy: This adds support for "dummy" dpif and netdev.
* --enable-dummy=override: In addition, this replaces *every* existing
dpif and netdev by the dummy type.
The latter is useful for testing but it defeats the possibility of using
the userspace native tunneling implementation (because all the tunnel
netdevs get replaced by dummy netdevs). Thus, this commit adds a third
variant:
* --enable-dummy=system: This replaces the "system" dpif and netdev
by dummies but leaves the others untouched.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Alex Wang <alexw@nicira.com>
Ben Pfaff [Sat, 13 Jun 2015 23:59:49 +0000 (16:59 -0700)]
netdev: Initialize at the beginning of netdev_unregister_provider().
Otherwise, if netdev_unregister_provider() is called before any other
netdev function, netdev_class_mutex is not initialized and the attempt to
lock it aborts.
This doesn't fix an existing bug but with the following commit
--enable-dummy=system will make netdev_unregister_provider() the first
netdev function to be called.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Alex Wang <alexw@nicira.com>
Ben Pfaff [Sun, 14 Jun 2015 18:03:23 +0000 (11:03 -0700)]
packets: Generalize compose_arp().
Until now, compose_arp() has only been able to compose ARP requests. This
extends it to composing general ARP packets, in particular replies.
An upcoming commit will make use of this capability.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Alex Wang <alexw@nicira.com>
Sorin Vinturis [Thu, 28 May 2015 20:30:57 +0000 (20:30 +0000)]
datapath-windows: BSOD when disabling the extension
When the filter detach routine is called while there are packets
still in processing, the OvsUninitSwitchContext function call will
decrement the switch context reference count without releasing the
switch context structure. This behaviour is correct and expected,
but the BSOD is caused in this case because the gOvsSwitchContext
variable is set to NULL, which is wrong.
The gOvsSwitchContext global variable must be set to NULL only when
the switch context structure is actually released.
Signed-off-by: Sorin Vinturis <svinturis@cloudbasesolutions.com> Reported-by: Sorin Vinturis <svinturis@cloudbasesolutions.com>
Reported-at: https://github.com/openvswitch/ovs-issues/issues/80 Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Mon, 15 Jun 2015 22:28:43 +0000 (15:28 -0700)]
ovn-sb: Remove redundant "attached_port" column from Gateway table.
The keys in the Chassis table's "gateway_ports" column report the same
information as the Gateway table's "attached_port" column, so this commit
removes the latter.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Alex Wang <alexw@nicira.com>
dpif-netdev: Prefetch next packet before miniflow_extract().
It appears that miniflow_extract() in emc_processing() spends a lot of
cycles waiting for the packet's data to be read.
Prefetching the next packet's data while parsing removes this delay.
For a single flow pipeline the throughput improves by ~10%. With a
more realistic pipeline the change has a much smaller effect (~0.5%
improvement)
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
ovs-ctl: let openvswitch startup to NOT hold up system boot upon error
Abort openvswitch startup script if ovsdb startup fails for
some reason. This helps in getting the system startup to NOT hang
indefinitely, as was seen in a recent report when ovsdb failed with
"I/O error: /etc/openvswitch/conf.db: failed to lock lockfile
(Resource temporarily unavailable)" and system remained in hung state
forever, unless manually rebooted from console.
Signed-off-by: Sabyasachi Sengupta <sabyasachi.sengupta@alcatel-lucent.com>
[blp@nicira.com changed an 'if' statement to '||'] Signed-off-by: Ben Pfaff <blp@nicira.com>
Russell Bryant [Sat, 13 Jun 2015 01:08:33 +0000 (21:08 -0400)]
Update location for Neutron plugin.
The git repository for the neutron plugin has been renamed to reflect
that it is now officially part of the OpenStack Neutron project. The
repo now lives in the "openstack" namespace.
Also remove the link to the todo file as those are now just tracked in
the networking-ovn bug tracker (launchpad bugs).
Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Russell Bryant [Fri, 12 Jun 2015 16:51:24 +0000 (12:51 -0400)]
fedora.spec: Create openvswitch-ovn package.
This patch creates a new subpackage for OVN, openvswitch-ovn. It also
installs systemd unit files for ovn-controller and ovn-northd.
If you want to run ovn-controller:
# systemctl start ovn-controller
If you want to run ovn-northd:
# systemctl start ovn-northd
Both systemd units are currently set to depend on openvswitch. If
further ovsdb initialization is required for the OVN databases before
ovn-northd can start, that will be handled automatically by ovn-ctl
when you start the ovn-northd service.
This currently assumes that ovn-northd runs on the same host as
ovsdb-server that is hosting the OVN databases. That seems like a
reasonable assumption in the current architecture and can be evolved
later when needed.
Signed-off-by: Russell Bryant <rbryant@redhat.com> CC: Flavio Leitner <fbl@redhat.com> CC: Ben Pfaff <blp@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Russell Bryant [Fri, 12 Jun 2015 16:51:23 +0000 (12:51 -0400)]
ovn: Add ovn-ctl to assist with OVN daemon lifecycle.
This patch introduces ovn-ctl, which is similar to ovs-ctl. I opted
for a new script as everything in OVN so far is nicely isolated, so a
new script seemed to make the most sense.
If you'd like to run ovn-controller on a host already running ovs:
# ovn-ctl start_controller
If you'd like to run ovn-northd:
# ovn-ctl start_northd
Note that ovn-ctl assumes that ovn-northd is running on the same
server as ovsdb-server hosting the OVN databases. Based on the
current architecture this seems like a completely reasonable
assumption. This can be improved later when needed.
There's some additional stuff happening in start_northd to make the
experience nicer and not require additional steps by the
administrator. It creates the OVN dbs if they don't exist. If
ovsdb-server hasn't loaded them, it tells it to load them, as well.
ovn-ctl also supports running everything on the same host. This would
be common in a test environment with a single host or small set of
hosts. That would simply be:
Ciara Loftus [Thu, 4 Jun 2015 13:51:40 +0000 (06:51 -0700)]
netdev-dpdk: add dpdk vhost-user ports
This patch adds support for a new port type to the userspace
datapath called dpdkvhostuser.
A new dpdkvhostuser port creates a Unix domain socket which, when
provided to QEMU, is used for communication between the virtio-net
device on the VM and the OVS port on the host.
vhost-cuse ('dpdkvhost') ports are still available as 'dpdkvhostcuse'
ports and will be enabled if vhost-cuse support is detected in the
DPDK build specified during compilation of the switch. Otherwise,
vhost-user ports are enabled.
Ben Pfaff [Sun, 14 Jun 2015 01:46:34 +0000 (18:46 -0700)]
ovn-controller: Verify bridge ports before changing.
OVSDB is transactional but it does not have built-in protection from dirty
reads. To avoid those, it's necessary to manually add verification to
transactions to ensure that any data reads whose values were essential to
later writes have not changed. ovn-controller didn't do that for
the "ports" column in the Bridge table, which means that if the set of
ports changed when it didn't expect it, it could revert changes made by
other database clients.
In particular this showed up in a scale test, where ovn-controller would
delete "vif" ports added via ovs-vsctl.
(It's easy to see exactly what happened by looking in the database log
with "ovsdb-tool -mm show-log".)
Reported-by: Russell Bryant <rbryant@redhat.com>
Reported-at: http://openvswitch.org/pipermail/dev/2015-June/056326.html Signed-off-by: Ben Pfaff <blp@nicira.com>
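The verification described in the commit above amounts to an optimistic concurrency check: a transaction remembers which version of a value it read, and the commit is refused if that value has changed in the meantime. The following is a minimal self-contained sketch of that idea only. `struct column`, `struct txn`, `txn_read()`, and `txn_commit()` are hypothetical names invented for illustration; they are not the OVSDB IDL API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a database column: a value plus a version
 * counter that is bumped on every committed write. */
struct column {
    int value;
    unsigned int version;
};

/* Hypothetical transaction: remembers the version it read so that the
 * commit can verify that the column has not changed since. */
struct txn {
    struct column *col;
    unsigned int seen_version;
    int new_value;
};

/* Reads the column and records the version the read was based on. */
static int
txn_read(struct txn *txn, struct column *col)
{
    txn->col = col;
    txn->seen_version = col->version;
    return col->value;
}

/* Commits only if the column still has the version that was read.
 * Returns false if another client wrote in between, so that the caller
 * can re-read and retry instead of reverting that client's change. */
static bool
txn_commit(struct txn *txn)
{
    assert(txn->col != NULL);
    if (txn->col->version != txn->seen_version) {
        return false;
    }
    txn->col->value = txn->new_value;
    txn->col->version++;
    return true;
}
```

If two transactions read the same column and both try to write, the first commit succeeds and the second returns false rather than silently overwriting the first.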
Jarno Rajahalme [Fri, 12 Jun 2015 23:12:56 +0000 (16:12 -0700)]
ofproto: Support port mods in bundles.
Add support for port mods in an OpenFlow 1.4 bundle, as required for
the minimum support level by the OpenFlow 1.4 specification. If the
bundle includes port mods, it may not specify the OFPBF_ATOMIC flag.
Port mods and flow mods in a bundle are always applied in order and
the consecutive flow mods between port mods are made available to
lookups atomically.
Note that ovs-ofctl does not support creating bundles with port mods.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
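The ordering rule in the commit above can be pictured as splitting a bundle into runs of consecutive flow mods, with each run published to lookups at once when a port mod or the end of the bundle is reached. The sketch below only counts those runs. `enum msg_type` and `count_atomic_batches()` are hypothetical names for illustration, not ofproto code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message kinds that may appear in a bundle. */
enum msg_type { MSG_FLOW_MOD, MSG_PORT_MOD };

/* Returns the number of runs of consecutive flow mods in 'msgs'.  Under
 * the assumption stated above, each run becomes visible to lookups as
 * one atomic batch, and port mods separate the batches. */
static size_t
count_atomic_batches(const enum msg_type *msgs, size_t n)
{
    size_t batches = 0;
    bool in_run = false;

    assert(msgs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (msgs[i] == MSG_FLOW_MOD) {
            in_run = true;
        } else {
            /* A port mod ends the current run of flow mods, if any. */
            if (in_run) {
                batches++;
            }
            in_run = false;
        }
    }
    if (in_run) {
        batches++;              /* The end of the bundle ends the last run. */
    }
    return batches;
}
```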
Jarno Rajahalme [Fri, 12 Jun 2015 23:12:56 +0000 (16:12 -0700)]
ofproto: Postpone sending flow removed messages.
The final flow stats are available only after there are no references
to the rule. Postpone sending the flow removed message until the
final stats are available.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jarno Rajahalme [Fri, 12 Jun 2015 23:12:56 +0000 (16:12 -0700)]
classifier: Simplify versioning.
After all, there are some cases in which both the insertion version
and removal version of a rule need to be considered. This makes the
cls_match a bit bigger, but makes classifier versioning much simpler
to understand.
Also, avoid using a type larger than int in an enum, as that is not
portable C.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
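One way to picture a rule that carries both versions is as a half-open interval: the rule is visible to a lookup at version v when its insertion version is at most v and its removal version is greater than v. The sketch below assumes that interval. The interval itself, `version_t`, `struct versioned_rule`, and `rule_visible_in_version()` are assumptions made for illustration and are not taken from the commit. The "not removed" marker is a macro rather than an enum constant because of the portability point made in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical version type.  A rule that has not been removed uses the
 * largest value as its removal version. */
typedef uint64_t version_t;
#define VERSION_NOT_REMOVED UINT64_MAX

/* Hypothetical rule record that stores both versions. */
struct versioned_rule {
    version_t add_version;      /* First version in which the rule exists. */
    version_t remove_version;   /* First version in which it is gone. */
};

/* Returns true if 'rule' is visible to a lookup made at 'version', that
 * is, if add_version <= version < remove_version. */
static bool
rule_visible_in_version(const struct versioned_rule *rule, version_t version)
{
    assert(rule->add_version <= rule->remove_version);
    return rule->add_version <= version && version < rule->remove_version;
}
```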
Jarno Rajahalme [Fri, 12 Jun 2015 00:28:37 +0000 (17:28 -0700)]
rculist: Remove postponed poisoning.
Postponed 'next' member poisoning was based on the faulty assumption
that postponed functions would be called in the order they were
postponed. This assumption holds only for the functions postponed by
any single thread. When functions are postponed by different
threads, there are no guarantees of the order in which the functions
may be called, or timing between those calls after the next grace
period has passed.
Given this, the postponed poisoning could have executed after
postponed destruction of the object containing the rculist element.
This bug was revealed after the memory leaks on rule deletion were
recently fixed.
This patch removes the postponed 'next' member poisoning and adds
documentation describing the ordering limitations in OVS RCU.
Alex Wang dug out the root cause of the resulting crashes, thanks!
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Alex Wang <alexw@nicira.com>
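The ordering hazard described in the commit above can be modeled with one callback queue per thread: callbacks within a queue run in the order they were postponed, but nothing fixes the order in which different queues are flushed. The sketch below is a single-threaded model of that behavior, not the OVS RCU implementation. `struct cb_queue`, `cb_postpone()`, `cb_flush()`, and the two callbacks are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CBS 4

typedef void callback_fn(void *aux);

/* Hypothetical per-thread queue of postponed callbacks.  Callbacks in
 * one queue run in FIFO order. */
struct cb_queue {
    callback_fn *fns[MAX_CBS];
    void *auxes[MAX_CBS];
    size_t n;
};

static void
cb_postpone(struct cb_queue *q, callback_fn *fn, void *aux)
{
    assert(q->n < MAX_CBS);
    q->fns[q->n] = fn;
    q->auxes[q->n] = aux;
    q->n++;
}

/* Runs every callback in 'q' in the order in which it was postponed. */
static void
cb_flush(struct cb_queue *q)
{
    for (size_t i = 0; i < q->n; i++) {
        q->fns[i](q->auxes[i]);
    }
    q->n = 0;
}

/* Hypothetical object that records what has happened to it. */
struct object {
    int destroyed;
    int poisoned_after_destroy;
};

static void
destroy_cb(void *obj_)
{
    struct object *obj = obj_;
    obj->destroyed = 1;
}

static void
poison_cb(void *obj_)
{
    struct object *obj = obj_;
    if (obj->destroyed) {
        /* In real code this would be a write to freed memory. */
        obj->poisoned_after_destroy = 1;
    }
}
```

Postponing `poison_cb` on one queue and then `destroy_cb` on a second queue, and flushing the second queue first, runs the poisoning after the destruction even though it was postponed earlier. Flushing the queues in the other order hides the problem.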
Alex Wang [Tue, 9 Jun 2015 05:57:09 +0000 (22:57 -0700)]
vtep-ctl: Fix a bug.
add_port_to_cache() uses 'cache_name' as the shash node name for
shash_add(). So del_cached_port() must also pass 'cache_name' as the
name argument to shash_find_and_delete().
This bug does not cause any issue currently but should be fixed.
Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Justin Pettit <jpettit@nicira.com>
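The rule behind the fix above is that an entry can only be deleted under the same key it was added with. The sketch below shows this with a tiny string-keyed table. `struct table`, `table_add()`, and `table_find_and_delete()` are hypothetical stand-ins written for illustration; they are not the OVS shash implementation, and the key strings in the usage note are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 8

/* Hypothetical string-keyed table standing in for a hash map. */
struct table {
    const char *keys[MAX_ENTRIES];
    void *values[MAX_ENTRIES];
    size_t n;
};

/* Stores 'value' under 'key'.  Returns false if the table is full. */
static bool
table_add(struct table *t, const char *key, void *value)
{
    assert(key != NULL);
    if (t->n >= MAX_ENTRIES) {
        return false;
    }
    t->keys[t->n] = key;
    t->values[t->n] = value;
    t->n++;
    return true;
}

/* Removes the entry stored under 'key' and returns its value, or
 * returns NULL and leaves the table unchanged if no entry has that key. */
static void *
table_find_and_delete(struct table *t, const char *key)
{
    for (size_t i = 0; i < t->n; i++) {
        if (!strcmp(t->keys[i], key)) {
            void *value = t->values[i];

            /* Fill the hole with the last entry. */
            t->n--;
            t->keys[i] = t->keys[t->n];
            t->values[i] = t->values[t->n];
            return value;
        }
    }
    return NULL;
}
```

Adding an entry under a composite key such as "ps0+eth0" and then deleting with only "eth0" finds nothing and leaves the stale entry in place. Deleting with the full composite key removes it.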
Jarno Rajahalme [Thu, 11 Jun 2015 22:53:43 +0000 (15:53 -0700)]
ofproto: Revertible eviction.
Handling evictions was broken in the previous patches. Eviction took
place early in the commit and bumped the version number too early.
Now eviction is treated much like a flow modification, where a new rule
replaces the old one, but without any 'inheritance' from the evicted
rule to the new rule. As a result, evictions take effect only when the
commit is successful, because they are reverted like any other changes
when the commit fails.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jarno Rajahalme [Thu, 11 Jun 2015 22:53:43 +0000 (15:53 -0700)]
ofproto: Accurate flow counts.
The classifier's rule count now contains temporary duplicates and rules
whose deletion has been deferred. Maintain a new 'n_flows' count in
struct oftable as the count of rules in the latest version.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>