Update the documentation with the information on the megaflow hits
observed with the default 'emc-insert-inv-prob' value. Also add the
recommended setting for achieving higher forwarding performance.
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> CC: Ciara Loftus <ciara.loftus@intel.com> CC: Georg Schmuecking <georg.schmuecking@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Kevin Traynor <ktraynor@redhat.com>
dpif-netdev: Skip EMC lookup when EMC is disabled.
The conditional EMC insert patch makes the probability of flow insertion
into the EMC configurable. It also allows the EMC to be disabled entirely by
setting 'emc-insert-inv-prob=0', which can be useful with a large number of
parallel flows.
This patch skips EMC lookup when EMC is disabled. This is useful to
avoid wasting CPU cycles and also improve performance considerably.
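A minimal sketch of the behaviour described above, written in Python rather than the C used by dpif-netdev. The function names, the dict standing in for the EMC, and the threshold arithmetic are illustrative assumptions, not the actual implementation:

```python
import random

UINT32_MAX = 0xFFFFFFFF


def emc_insert_threshold(inv_prob: int) -> int:
    """Map an inverse probability N to a 32-bit threshold (assumed scheme).

    N == 0 yields threshold 0, which this sketch treats as "EMC disabled".
    """
    if inv_prob < 0:
        raise ValueError("inverse probability must be non-negative")
    return UINT32_MAX // inv_prob if inv_prob else 0


def emc_enabled(inv_prob: int) -> bool:
    return emc_insert_threshold(inv_prob) != 0


def should_insert(inv_prob: int, rnd: int) -> bool:
    """Insert only when the EMC is enabled and the draw is under the threshold."""
    threshold = emc_insert_threshold(inv_prob)
    return threshold != 0 and rnd <= threshold


def lookup(flow_key, emc: dict, megaflow_lookup, inv_prob: int):
    """Skip the EMC entirely when it is disabled; otherwise try it first."""
    if emc_enabled(inv_prob):
        hit = emc.get(flow_key)
        if hit is not None:
            return hit
    # Fall back to the (hypothetical) megaflow classifier callback.
    actions = megaflow_lookup(flow_key)
    if should_insert(inv_prob, random.getrandbits(32)):
        emc[flow_key] = actions
    return actions
```

With inv_prob set to 0 the sketch never reads or writes the cache, which is the cycle saving the patch is after.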
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> CC: Ciara Loftus <ciara.loftus@intel.com> CC: Georg Schmuecking <georg.schmuecking@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Darrell Ball <dlu998@gmail.com>
netdev-dpdk: fix ifindex assignment for DPDK ports
In current implementation port_id is used as an ifindex for all netdev-dpdk
interfaces.
For physical DPDK interfaces, using port_id as the ifindex means that 'dpdk0'
gets ifindex '0', 'dpdk1' gets '1', and so on. DPDK vHost interfaces are not
assigned an ifindex at all (0 is used by default) because vHost ports don't
use the port_id field from the DPDK library.
This causes multiple negative side-effects. First of all 0 is an invalid
ifindex value. The other issue is possible overlapping of 'dpdkX' interfaces
ifindex values with the ifindexes of kernel space interfaces which may cause
problems in any external tools that use those values. Neither 'dpdk0', nor any
DPDK vHost interfaces are visible in sFlow collector tools, as all interfaces
with ifindexes smaller than 1 are ignored.
The proposed solution to these issues is to calculate a hash of the
interface's name and use the calculated value as the ifindex. This way
interfaces keep their ifindexes across OVS-DPDK restarts, port
re-initialization events, etc., show up in sFlow collectors, and meet the
RFC 2863 requirements regarding reuse of ifindex values by the same virtual
interfaces and the maximum ifindex value.
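The idea can be sketched as follows. This is Python, not the netdev-dpdk C code; zlib.crc32 stands in for whatever hash OVS actually uses, and the modulus is an assumed value chosen only to keep the result positive and well below the RFC 2863 maximum:

```python
import zlib

# Largest value of the InterfaceIndex type in RFC 2863.
IFINDEX_MAX = 2**31 - 1


def ifindex_from_name(name: str, modulus: int = 0xFFFFFE) -> int:
    """Derive a stable, non-zero ifindex from an interface name.

    The hash function and modulus are assumptions made for this sketch.
    """
    h = zlib.crc32(name.encode("utf-8"))
    # "+ 1" guarantees the result is never 0, which is an invalid ifindex.
    return h % modulus + 1
```

Because the value depends only on the name, it does not change across restarts. Two different names can still hash to the same value, so this sketch does not guarantee uniqueness.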
Joe Stringer [Fri, 21 Apr 2017 20:33:55 +0000 (13:33 -0700)]
configure: Reset libtool CURRENT version.
Since commit f12e09b7b2e5 ("libopenvswitch: Rename to libfoo-X.Y."), the
CURRENT libtool number is no longer derived from the OVS MINOR (from
vMAJOR.MINOR.MICRO) version, so it can be reset to 0.
Developers should attempt to avoid introducing ABI-breaking changes
within a particular OVS-X.Y release series. Occasionally due to the
nature of a particular bug, this is not possible. In such a case,
developers must update the libtool CURRENT version to indicate this
breakage to library users.
In most OVS library releases, this is expected to remain 0.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Joe Stringer [Wed, 26 Apr 2017 20:47:49 +0000 (13:47 -0700)]
libopenvswitch: Rename to libfoo-X.Y.
The current intent for Open vSwitch is to maintain libopenvswitch ABI
stability for minor versions, for example each release within the 2.7.z
series. According to the following documentation, no changes to exported
headers should be made.
However, it is occasionally necessary to make changes to
{include/openvswitch,lib}/*.h headers to fix issues within a given
release series. The current libtool tagging mechanism in the build
system does not allow for this without creating a conflict between the
libtool 'current' version and the next minor release of OVS.
This patch modifies libopenvswitch build to include the MAJOR.MINOR
release version in the libX name, and include the libtool CURRENT and
OVS MICRO release in the libtool versioning tags to indicate library
stability. The resulting format is "libfoo-X.Y.so.CURRENT.0.Z" for OVS
release "X.Y.Z".
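As an illustration of the naming format above, a small Python helper (the function name is hypothetical and is not part of the build system):

```python
def library_filename(lib: str, ovs_version: str, current: int = 0) -> str:
    """Build "lib<name>-X.Y.so.CURRENT.0.Z" for OVS release "X.Y.Z"."""
    major, minor, micro = ovs_version.split(".")
    return f"lib{lib}-{major}.{minor}.so.{current}.0.{micro}"
```

For example, library_filename("openvswitch", "2.7.1") gives "libopenvswitch-2.7.so.0.0.1", and bumping CURRENT to 1 after an ABI-breaking fix gives "libopenvswitch-2.7.so.1.0.1".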
Developers should still attempt to avoid introducing ABI-breaking changes
within a particular OVS-X.Y release series, but if this is not possible,
this patch introduces a mechanism to allow an ABI-breaking fix to be
introduced. In such a case, developers may update the libtool CURRENT
version to indicate this breakage to library users.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
William Tu [Wed, 10 May 2017 21:45:09 +0000 (14:45 -0700)]
tests: Test native tunneling underlay match.
Add a test that checks that native tunneling flow
matching is working. The test verifies that outer L2 and L3
flow fields populated in the overlay bridge can be
matched in the underlay bridge.
Co-Authored-by: Joe Stringer <joe@ovn.org> Co-Authored-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Joe Stringer <joe@ovn.org>
Without the fix, this test consistently fails when running
on Travis CI. Connecting to the controller can take more time than
running locally. Because the exact connecting time is variable, the
exact output should not be used for correctness checking.
Fixes: 85c55772a453(bridge: Fix controller status update) Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Shashank Ram [Wed, 17 May 2017 16:30:49 +0000 (09:30 -0700)]
datapath-windows: Set Version correctly for OVSExt
- Previously, the 'Version' property passed to MSBuild
was not being passed to the RcComplile section. To
use the value of 'Version' property in the rc file,
it needs to be passed.
- Adds a macro to convert the Version to a string literal.
Previously, the Version was simply being converted
to the literal text 'Version' instead of the version
number passed using the 'Version' property to MSBuild.
Andy Zhou [Wed, 10 May 2017 22:10:59 +0000 (15:10 -0700)]
bridge: Fix controller status update
When multiple bridges connect to the same controller, the controller
status should be maintained separately for each bridge. The current
logic pushes status updates based only on the connection string,
which is the same across multiple bridges when connecting to the
same controller.
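The problem can be illustrated with a small Python sketch. The data layout and function names are hypothetical and do not mirror the structures in bridge.c; the point is only the choice of lookup key:

```python
def status_key_by_target(bridge: str, target: str):
    """Key on the connection string alone (the problematic behaviour)."""
    return target


def status_key_per_bridge(bridge: str, target: str):
    """Key on (bridge, connection string) so each bridge keeps its own status."""
    return (bridge, target)


def collect_status(connections, key_fn) -> dict:
    """Build a status table from (bridge, target, status) tuples."""
    table = {}
    for bridge, target, status in connections:
        table[key_fn(bridge, target)] = status
    return table
```

With two bridges pointing at "tcp:10.0.0.1:6653", the first keying collapses both into one entry, while the second keeps one entry per bridge.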
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-May/044412.html Reported-by: Tulio Ribeiro <tribeiro@lasige.di.fc.ul.pt> Signed-off-by: Andy Zhou <azhou@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Joe Stringer [Mon, 8 May 2017 18:15:39 +0000 (11:15 -0700)]
Revert "tunneling: Avoid recirculation on datapath."
This reverts commit f1dac5128ce6db2e493f0d1c7a8b53fb9f34476f. When this
commit was introduced, it broke the 'make check-system-userspace'
testsuite. It appears that the new translation fails to modify the flow
in a way that would represent the flow as an encapsulated flow when the
traffic is patched through to the second bridge. As such, rather than
matching on, for example, "ip,proto=47" for gre, it would use the inner
packet's flow headers. It also results in problems reporting statistics,
as the tunnel's header is not reflected in subsequent statistics and
truncation is not properly applied during translation.
While a refreshed approach to solving the above problem is formed,
revert this patch.
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2017-May/331972.html Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Greg Rose <gvrose8192@gmail.com>
The code in checkpatch inconsistently stripped "a/" or "b/" from the
beginning of a file name, and the check for "datapath" only worked when
the prefix was not stripped. This fixes the problem.
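A sketch of consistent handling, assuming hypothetical helper names rather than the actual checkpatch code: normalize the path first, then apply the "datapath" check to the normalized form only.

```python
def normalize_patch_path(path: str) -> str:
    """Strip a leading "a/" or "b/" diff prefix, if present."""
    if path.startswith(("a/", "b/")):
        return path[2:]
    return path


def is_datapath_file(path: str) -> bool:
    """Check for the "datapath" prefix after normalization.

    Matching on a plain string prefix is an assumption of this sketch.
    """
    return normalize_patch_path(path).startswith("datapath")
```

Both "a/datapath/flow.c" and "datapath/flow.c" are then classified the same way.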
Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Jan Scheurich [Sat, 6 May 2017 15:49:43 +0000 (15:49 +0000)]
userspace: Support for push_eth and pop_eth actions
Add support for actions push_eth and pop_eth to the netdev datapath and
the supporting libraries. This patch relies on the support for these actions
in the kernel datapath to be present.
Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Jean Tourrilhes <jt@labs.hpe.com> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Anand Kumar [Thu, 4 May 2017 22:12:54 +0000 (15:12 -0700)]
datapath-windows: Fragment NBL based on MRU size
This patch adds support for fragmenting an NBL based on the MRU value.
The MRU value is updated only for IPv4 fragments; if it is nonzero, the
NBL is fragmented and the new NBL is sent out to the vnic.
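To make the arithmetic concrete, here is a Python sketch of splitting an IPv4 payload so that every fragment fits within the MRU. It is not the Windows datapath code; it assumes a 20-byte IPv4 header without options and works on lengths only:

```python
IPV4_HEADER_LEN = 20  # assumption: no IP options


def fragment_payload(payload_len: int, mru: int,
                     header_len: int = IPV4_HEADER_LEN):
    """Return a list of (offset_bytes, length, more_fragments) tuples.

    An MRU of 0 means "no MRU recorded", so the payload is left whole.
    """
    if mru == 0 or payload_len + header_len <= mru:
        return [(0, payload_len, False)]
    # Every fragment except the last must carry a multiple of 8 bytes,
    # because the IPv4 fragment offset field counts 8-byte units.
    max_data = (mru - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MRU too small to carry any payload")
    fragments = []
    offset = 0
    while offset < payload_len:
        length = min(max_data, payload_len - offset)
        more = offset + length < payload_len
        fragments.append((offset, length, more))
        offset += length
    return fragments
```

For a 3000-byte payload and an MRU of 1500, this yields two 1480-byte fragments followed by a 40-byte final fragment.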
Signed-off-by: Anand Kumar <kumaranand@vmware.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Anand Kumar [Thu, 4 May 2017 22:12:52 +0000 (15:12 -0700)]
datapath-windows: Retain MRU value in the _OVS_BUFFER_CONTEXT.
This patch introduces a new field, MRU (Maximum Received Unit), in the
_OVS_BUFFER_CONTEXT. It is used only for IP fragments, to retain the MRU for
the reassembled IP datagram when the packet is forwarded to userspace.
Signed-off-by: Anand Kumar <kumaranand@vmware.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Anand Kumar [Thu, 4 May 2017 22:12:51 +0000 (15:12 -0700)]
datapath-windows: Added Ipv4 fragments support in Conntrack
This patch adds support for tracking Ipv4 fragments in conntrack module.
Individual fragments are not tracked and are consumed by the
fragmentation/reassembly. Only the reassembled Ipv4 datagram is tracked and
treated as a single ct entry.
Signed-off-by: Anand Kumar <kumaranand@vmware.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Anand Kumar [Thu, 4 May 2017 22:12:50 +0000 (15:12 -0700)]
datapath-windows: Added a new file to support Ipv4 fragments.
This patch adds functionalities to support IPv4 fragments, which will be
used by Conntrack module.
Added a new structure to hold the Ipv4 fragments and a hash table to
hold Ipv4 datagram entries. Also added a clean up thread that runs
every minute to delete the expired IPv4 datagram entries.
The individual fragments are ignored by conntrack. Once all the
fragments are received, a new NBL is created out of the reassembled
fragments and conntrack executes actions on the new NBL.
Created new APIs OvsProcessIpv4Fragment() to process individual fragments,
OvsIpv4Reassemble() to reassemble Ipv4 fragments.
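The general shape of such a reassembly table can be sketched in Python. This is not the Windows datapath implementation: the class, the key layout, the 30-second expiry, and the refusal to handle overlapping fragments are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# (source address, destination address, IP identification, protocol)
FragKey = Tuple[str, str, int, int]


class Reassembler:
    def __init__(self, timeout: float = 30.0):
        self.timeout = timeout          # assumed entry lifetime in seconds
        self.table: Dict[FragKey, dict] = {}

    def process_fragment(self, key: FragKey, offset: int, payload: bytes,
                         more_fragments: bool, now: float) -> Optional[bytes]:
        """Store one fragment; return the full payload once it is complete."""
        entry = self.table.setdefault(
            key, {"frags": {}, "total": None, "expires": now + self.timeout})
        entry["frags"][offset] = payload
        if not more_fragments:
            # The last fragment tells us the total payload length.
            entry["total"] = offset + len(payload)
        datagram = self._try_reassemble(entry)
        if datagram is not None:
            del self.table[key]
        return datagram

    @staticmethod
    def _try_reassemble(entry: dict) -> Optional[bytes]:
        total = entry["total"]
        if total is None:
            return None
        data = bytearray()
        for offset in sorted(entry["frags"]):
            if offset != len(data):
                return None  # gap or overlap: treated as incomplete here
            data += entry["frags"][offset]
        return bytes(data) if len(data) == total else None

    def cleanup(self, now: float) -> int:
        """Drop expired entries, as a periodic cleanup thread would."""
        expired = [k for k, e in self.table.items() if e["expires"] <= now]
        for k in expired:
            del self.table[k]
        return len(expired)
```

Individual fragments return None and only the completed datagram is handed on, which matches the description that conntrack sees just the reassembled packet.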
Signed-off-by: Anand Kumar <kumaranand@vmware.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Ben Pfaff [Fri, 5 May 2017 22:18:09 +0000 (18:18 -0400)]
sparse: Add rte_memcpy.h replacement header.
Without this replacement header, building netdev-dpdk.c provokes several
"sparse" warnings on i386:
/usr/include/dpdk/rte_memcpy.h:515:33: warning: incorrect type in argument 1 (different type sizes)
/usr/include/dpdk/rte_memcpy.h:515:33: expected long long const [usertype] *__P
/usr/include/dpdk/rte_memcpy.h:515:33: got int const [usertype] *<noident>
/usr/lib/gcc/i686-linux-gnu/6//include/emmintrin.h:698:20: error: undefined identifier '__builtin_ia32_loaddqu'
/usr/lib/gcc/i686-linux-gnu/6//include/emmintrin.h:698:11: error: cast from unknown type
/usr/lib/gcc/i686-linux-gnu/6//include/emmintrin.h:716:3: error: undefined identifier '__builtin_ia32_storedqu'
/usr/lib/gcc/i686-linux-gnu/6//include/emmintrin.h:698:43: error: not a function <noident>
/usr/lib/gcc/i686-linux-gnu/6//include/emmintrin.h:716:27: error: not a function <noident>
...
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Darrell Ball <dlu998@gmail.com>
Ben Pfaff [Fri, 5 May 2017 12:54:43 +0000 (12:54 +0000)]
physical: Simplify updating localvif_to_ofport map in physical_run()
These two loops were updating 'localvif_to_ports' to be the same as
'new_localvif_to_ports' and setting physical_map_changed if there had
been any differences. This is more work than necessary, so this
commit simplifies it.
Port a variant of commit a6d1a2997db4:
ofproto.at: Workaround a race
While a barrier serializes requests from the same connection,
it doesn't wait for requests from other connections to the switch.
Replace the barrier with infamous "sleep 1" to work around the problem.
to the following tests:
"ofproto - asynchronous message control (OpenFlow 1.0)",
"ofproto - asynchronous message control (OpenFlow 1.3)",
"ofproto - asynchronous message control (OpenFlow 1.4)" and
"ofproto - asynchronous message control (OpenFlow 1.5)"
Do not use "sleep 1", but wait for log file to have (at least) the same
number of lines we expect (it generally waits less time).
Sometimes one of these tests fails because the OFPT_BARRIER_REPLY is
printed before the other message we expect to have.
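The "wait for the log to reach the expected length" approach can be sketched in Python as follows. The helper names, timeout, and polling interval are hypothetical; the real tests are written with the Autotest macros of the OVS testsuite, not Python:

```python
import time


def count_lines(path: str) -> int:
    """Number of lines currently in the file, or 0 if it does not exist yet."""
    try:
        with open(path, "r", errors="replace") as f:
            return sum(1 for _ in f)
    except FileNotFoundError:
        return 0


def wait_for_lines(path: str, expected: int,
                   timeout: float = 10.0, interval: float = 0.05) -> bool:
    """Poll until the file has at least `expected` lines or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if count_lines(path) >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Unlike a fixed "sleep 1", the loop returns as soon as the expected output is present, so it usually waits less while still tolerating slow runs.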
The message "Cannot bind port X, err=Y" creates only confusion. In metadata
based mode, failure of IPv6 socket creation is okay if IPv6 is disabled and
no error message should be printed. But when IPv6 tunnel was requested, such
failure is fatal. The vxlan_socket_create does not know when the error is
harmless and when it's not.
Instead of passing such information down to vxlan_socket_create, remove the
message completely. It's not useful. We propagate the error code up to the
user space and the port number comes from the user space. There's nothing in
the message that the process creating vxlan interface does not know.
Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Zong Kai LI [Thu, 4 May 2017 15:12:54 +0000 (20:42 +0530)]
lib: rename ovs_nd_opt to ovs_nd_lla_opt
Since ovs_nd_mtu_opt and ovs_nd_prefix_opt have been introduced, rename
ovs_nd_opt to ovs_nd_lla_opt to make clear that it is the Source/Target
Link-layer Address Option.
Signed-off-by: Zongkai LI <zealokii@gmail.com> Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Zong Kai LI [Thu, 4 May 2017 15:12:30 +0000 (20:42 +0530)]
packets: add compose_nd_ra
This patch introduces methods to compose a Router Advertisement (RA) packet
and introduces flags for RA. The RA packet structures follow the specification
in RFC 4861.
Callers can use compose_nd_ra_with_sll_mtu_opts to compose an RA packet with
the Source Link-layer Address Option and MTU Option.
Callers can use packet_put_ra_prefix_opt to append a Prefix Information Option
to an RA packet.
Signed-off-by: Zongkai LI <zealokii@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
ovs-pki: add option to suppress generated id in common name
For some applications, it is desirable to have full control of
the common name field in generated certificates. Add a command-line
option to suppress appending " id:<uuid-or-date>" to the user-
specified name.
Signed-off-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
ovsdb: refactor utility functions into separate file
Move local db access functions to a new file and give them
global scope so they can be included in the ovsdb library and used
by other ovsdb library functions.
Signed-off-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Sun, 30 Apr 2017 20:39:00 +0000 (13:39 -0700)]
ovn-trace: Display friendlier port and datapath names.
This makes ovn-trace use short names instead of UUIDs (etc.) in its own
output, by default. Since it's possible that there's software out there
parsing ovn-trace output, it also adds a --no-friendly-names option.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org>
Ben Pfaff [Sun, 30 Apr 2017 21:21:04 +0000 (14:21 -0700)]
ovn-trace: Accept human-friendly logical port and datapath names.
This allows the user to specify these names in a natural way, e.g.
"ovn-trace myswitch 'inport == "myport"'" instead of having to specify
whatever UUID or other horrible name the CMS invented.
Simon Horman [Wed, 3 May 2017 14:33:06 +0000 (16:33 +0200)]
tests: Only run python SSL test if SSL support is configured
Only run python SSL test, which invokes ovsdb with a --remote=pssl,
if SSL support is configured.
Without this change the following error appears when running
the test-suite when OVS is configured with --disable-ssl.
+ovsdb-server: Private key specified but Open vSwitch was built without SSL support
./ovsdb-idl.at:1215: exit code was 1, expected 0
Fixes: d90ed7d65ba8 ("python: Add SSL support to the python ovs client library") Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Ben Pfaff <blp@ovn.org>
When IPv6 is compiled but disabled at runtime, __vxlan_sock_add returns
-EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole
operation of bringing up the tunnel.
Ignore failure of IPv6 socket creation for metadata based tunnels caused by
IPv6 not being available.
Fixes: b1be00a6c39f ("vxlan: support both IPv4 and IPv6 sockets in a single vxlan device") Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Jan Scheurich [Tue, 25 Apr 2017 16:29:59 +0000 (16:29 +0000)]
userspace: Add packet_type in dp_packet and flow
This commit adds a packet_type attribute to the structs dp_packet and flow
to explicitly carry the type of the packet as preparation for the
introduction of the so-called packet type-aware pipeline (PTAP) in OVS.
The packet_type is a big-endian 32 bit integer with the encoding as
specified in OpenFlow version 1.5.
The upper 16 bits contain the packet type name space. Pre-defined values
are defined in openflow-common.h:
    enum ofp_header_type_namespaces {
        OFPHTN_ONF = 0,             /* ONF namespace. */
        OFPHTN_ETHERTYPE = 1,       /* ns_type is an Ethertype. */
        OFPHTN_IP_PROTO = 2,        /* ns_type is a IP protocol number. */
        OFPHTN_UDP_TCP_PORT = 3,    /* ns_type is a TCP or UDP port. */
        OFPHTN_IPV4_OPTION = 4,     /* ns_type is an IPv4 option number. */
    };
The lower 16 bits specify the actual type in the context of the name space.
Only name spaces 0 and 1 will be supported for now.
For name space OFPHTN_ONF the relevant packet type is 0 (Ethernet).
This is the default packet_type in OVS and the only one supported so far.
Packets of type (OFPHTN_ONF, 0) are called Ethernet packets.
In name space OFPHTN_ETHERTYPE the type is the Ethertype of the packet.
A packet of type (OFPHTN_ETHERTYPE, <Ethertype>) is a standard L2 packet
with the Ethernet header (and any VLAN tags) removed to expose the L3
(or L2.5) payload of the packet. These will simply be called L3 packets.
The Ethernet address fields dl_src and dl_dst in struct flow are not
applicable for an L3 packet and must be zero. However, to maintain
compatibility with the large code base, we have chosen to copy the
Ethertype of an L3 packet into the dl_type field of struct flow.
This does not mean that it will be possible to match on dl_type for L3
packets with PTAP later on. Matching must be done on packet_type instead.
New dp_packets are initialized with packet_type Ethernet. Ports that
receive L3 packets will have to explicitly adjust the packet_type.
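The encoding described above can be illustrated with a short Python sketch. The helper names are hypothetical; only the two namespace constants and the 16/16-bit split come from the text:

```python
import struct

OFPHTN_ONF = 0        # ONF namespace
OFPHTN_ETHERTYPE = 1  # ns_type is an Ethertype


def packet_type(ns: int, ns_type: int) -> int:
    """Combine a namespace (upper 16 bits) and a type (lower 16 bits)."""
    if not (0 <= ns <= 0xFFFF and 0 <= ns_type <= 0xFFFF):
        raise ValueError("namespace and type must each fit in 16 bits")
    return (ns << 16) | ns_type


def split_packet_type(pt: int):
    """Return (namespace, ns_type) for a host-order packet_type value."""
    return pt >> 16, pt & 0xFFFF


def to_wire(pt: int) -> bytes:
    """Serialize as a big-endian 32-bit integer."""
    return struct.pack("!I", pt)


PT_ETH = packet_type(OFPHTN_ONF, 0)  # the default: an Ethernet packet
```

For example, an L3 IPv4 packet is (OFPHTN_ETHERTYPE, 0x0800), which packs to the bytes 00 01 08 00.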
Signed-off-by: Jean Tourrilhes <jt@labs.hpe.com> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Sun, 30 Apr 2017 20:53:24 +0000 (13:53 -0700)]
ovn-sbctl: Get rid of redundant code by using function from db-ctl-base.
This renames get_row() to ctl_get_row() and makes it public. It's
unfortunate that it adds a cast, but getting rid of redundant code seems
worth it to me.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org>
Ben Pfaff [Sun, 30 Apr 2017 20:52:11 +0000 (13:52 -0700)]
ovn-sbctl: Allow database commands to refer to datapaths by name.
Until now, only the lflow-list command supported using UUIDs or names
for datapaths. This commit extends that support to all the database
commands, as well as adding support for matching "logical-switch" or
"logical-router" in addition to "name".
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org>
Ben Pfaff [Sun, 30 Apr 2017 05:57:31 +0000 (22:57 -0700)]
ovn-northd: Keep external-ids up-to-date in Datapath_Binding.
Without this, ovn-northd sets external-ids properly when it creates a
Datapath_Binding record, but fails to update the external-ids if they
later change.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org>
Ben Pfaff [Sun, 30 Apr 2017 21:24:18 +0000 (14:24 -0700)]
ovn-northd: Propagate Neutron datapath names to southbound database.
It's much easier to see what's going on in the southbound database if
human-friendly names are available.
Really it's too bad that we didn't put the human-friendly name in "name"
and the UUID in something like "external_ids:neutron-uuid", but it'll take
more coordination to change that at this point and it may not be worth it.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org>
Joe Stringer [Wed, 3 May 2017 18:53:29 +0000 (11:53 -0700)]
datapath: Remove untracked CT on newer kernels.
Upstream commits cc41c84b7e7f ("netfilter: kill the fake untracked
conntrack objects") and ab8bc7ed864b ("netfilter: remove
nf_ct_is_untracked") removed the 'untracked' conntrack objects and
functions. The latter commit removes the usage of nf_ct_is_untracked()
from OVS. However, older kernels still have a representation of
'untracked' CT objects so the code needs to remain until the kernel
support is bumped to Linux 4.12 or newer. Introduce a macro to detect
this symbol and wrap these lines in the macro check.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Greg Rose <gvrose8192@gmail.com>
Ben Pfaff [Wed, 3 May 2017 18:05:53 +0000 (11:05 -0700)]
ovs-atomic: Report error for contradictory configuration.
A user reported that GCC 5.x was using the atomic fallback for GCC 4.x
because the test
#elif __GNUC__ >= 4 && __GNUC_MINOR__ >= 7
didn't include GCC 5. However, GCC 5+ has <stdatomic.h> and shouldn't use
any of the GCC-specific cases at all. I think that this user was actually
pulling our atomics out into third-party code that probably didn't define
HAVE_STDATOMIC_H properly, so this commit both avoids that problem for
them in the future and clarifies the intent of the ovs-atomic header.
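Why the quoted test misses GCC 5 can be shown with a small Python sketch (function names are illustrative). It only demonstrates the comparison pitfall; it is not the change made to the ovs-atomic header, which reports an error for the contradictory configuration instead:

```python
def naive_at_least_4_7(major: int, minor: int) -> bool:
    """Mirrors `__GNUC__ >= 4 && __GNUC_MINOR__ >= 7`."""
    return major >= 4 and minor >= 7


def at_least(major: int, minor: int, want_major: int, want_minor: int) -> bool:
    """Compare (major, minor) pairs lexicographically."""
    return (major, minor) >= (want_major, want_minor)
```

For GCC 5.1 the naive test evaluates to false because the minor version 1 is less than 7, even though 5.1 is newer than 4.7; the tuple comparison returns true.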
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Andy Zhou [Tue, 25 Apr 2017 01:55:04 +0000 (18:55 -0700)]
vswitchd: Add --cleanup option to the 'appctl exit' command
'appctl exit' stops the running vswitchd daemon, without releasing
the datapath resources (such as bridges and ports) that vswitchd
has created. This is expected when vswitchd is to be relaunched, to
reduce the perturbation of existing traffic and connections.
However, when vswitchd is intended to be shutdown permanently, it
is desirable not to leak datapath resources. In theory, this can be
achieved by removing the corresponding configurations from
OVSDB before shutting down vswitchd. However it is not always
possible in practice. Sometimes it is convenient and robust for
vswitchd to release all datapath resources that it has configured.
Add 'appctl exit --cleanup' option for this use case.
Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>
DPDK 16.07 introduced support for mempool offload.
rte_pktmbuf_pool_create is the recommended method for creating pktmbuf
pools. Buffer pools created with rte_mempool_create may not get offloaded
to the underlying offloaded mempools.
This patch replaces the rte_mempool_create call with the helper wrapper
"rte_pktmbuf_pool_create" provided by DPDK, so that it can leverage
offloaded mempools.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Jianbo Liu <jianbo.liu@linaro.org> Acked-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Billy O'Mahony [Wed, 1 Mar 2017 12:36:43 +0000 (12:36 +0000)]
netdev-dpdk: Enable INDIRECT_DESC on DPDK vHostUser.
This gives much better performance for linux apps in the guest without
affecting dpdk applications in the guest.
I'm creating this patch on the basis of performance results outlined below.
In summary, it appears that enabling INDIRECT_DESC on DPDK vHostUser ports
leads to a very large increase in performance when using Linux stack
applications in the guest, with no noticeable performance drop for DPDK-based
applications in the guest.
Test#1 (VM-VM iperf3 performance)
VMs use DPDK vhostuser ports
OVS bridge is configured for normal action.
OVS version 603381a (on 2.7.0 branch but before release,
also seen on v2.6.0 and v2.6.1)
DPDK v16.11
QEMU v2.5.0 (also seen with v2.7.1)
Test#2 (Phy-VM-Phy RFC2544 Throughput)
DPDK PMDs are polling NIC, DPDK loopback app running in guest.
OVS bridge is configured with port forwarding to VM (via dpdkvhostuser ports).
OVS version 603381a (on 2.7.0 branch but before release),
other versions not tested.
DPDK v16.11
QEMU v2.5.0 (also seen with v2.7.1)
Mark Kavanagh [Mon, 13 Mar 2017 11:35:26 +0000 (11:35 +0000)]
netdev-dpdk: fix mempool_configure error state
netdev_dpdk_mempool_configure obtains a handle to a
DPDK memory pool via a call to dpdk_mp_get. If dpdk_mp_get
fails, the former informs the user that insufficient memory
is available, and returns ENOMEM. However, this is
potentially misleading, as there are a number of reasons why
creation of a mempool can fail (as per rte_mempool_create),
including:
- insufficient memory available
- mempool already exists
- other memory allocation error
Update the error log to reflect this fact, and return rte_errno
in the event of error, instead of ENOMEM.
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Fixes: 0072e931 ("netdev-dpdk: add support for jumbo frames") Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ian Stokes <ian.stokes@intel.com> Acked-by: Darrell Ball <dlu998@gmail.com>
Han Zhou [Tue, 2 May 2017 20:22:35 +0000 (13:22 -0700)]
ovn-controller: Disable probes by default for unix sockets.
Normally the OVS JSON-RPC library does not probe idle connections across
Unix domain sockets, since the kernel can tell OVS whether the connections
are truly connected without probes, but ovn-controller carelessly
overrode that.
(This should not be an issue in typical OVN deployments, because the OVN SB
database is normally accessed via TCP or SSL.)
CC: Nirapada Ghosh <nghosh@us.ibm.com> Fixes: 715038b6b222 ("ovn-controller: reload configured SB probe timer") Signed-off-by: Han Zhou <zhouhan@gmail.com> Co-authored-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yi-Hung Wei [Mon, 1 May 2017 17:24:35 +0000 (10:24 -0700)]
system-traffic: Add test for mpls actions
Add a ping test to verify the behavior of mpls_push/pop actions. In this
test, we use the resubmit action to trigger recirculation to make sure
the flow key is revalidated after mpls_push/pop. This test depends on
commit 5ba0c107c51e ("datapath: Fix ovs_flow_key_update()") to behave
correctly.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Acked-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
ovs_flow_key_update() is called when the flow key is invalid, and it is
used to update and revalidate the flow key. Commit 329f45bc4f19
("openvswitch: add mac_proto field to the flow key") introduced the mac_proto
field in the flow key and uses it to determine whether the flow key is valid.
However, the commit does not update the code path in ovs_flow_key_update()
to revalidate the flow key which may cause BUG_ON() on execute_recirc().
This patch addresses the aforementioned issue.
Fixes: 329f45bc4f19 ("openvswitch: add mac_proto field to the flow key") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Acked-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
openvswitch: correctly fragment packet with mpls headers
If mpls headers were pushed to a defragmented packet, the refragmentation no
longer works correctly after 48d2ab609b6b ("net: mpls: Fixups for GSO"). The
network header has to be shifted after the mpls headers for the
fragmentation and restored afterwards.
Fixes: 48d2ab609b6b ("net: mpls: Fixups for GSO") Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
openvswitch: mpls: set network header correctly on key extract
After the 48d2ab609b6b ("net: mpls: Fixups for GSO"), MPLS handling in
openvswitch was changed to have network header pointing to the start of the
MPLS headers and inner_network_header pointing after the MPLS headers.
However, key_extract was missed by the mentioned commit, causing incorrect
headers to be set when a MPLS packet just enters the bridge or after it is
recirculated.
Fixes: 48d2ab609b6b ("net: mpls: Fixups for GSO") Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Yi-Hung Wei [Mon, 1 May 2017 17:24:31 +0000 (10:24 -0700)]
datapath: Fixups for MPLS GSO
This patch backports the following two upstream commits to fix MPLS GSO in
ovs datapath. Starting from upstream commit 48d2ab609b6b ("net: mpls: Fixups
for GSO"), the mpls_gso kernel module relies on the fact that
skb_network_header() points to the mpls header and skb_inner_network_header()
points to the L3 header so that it can derive the length of mpls header
correctly, and the upstream commit updates how ovs datapath marks the skb
header when push and pop mpls. However, the old mpls_gso kernel module
assumes that the skb_network_header() points to the L3 header, and the old
mpls_gso kernel module will misbehave if the ovs datapath marks the
skb_network_header() in the new way since it will treat mpls header as the L3
header.
Because the function signature of mpls_gso_segment() does not change,
this backport patch uses the new mpls_hdr() to determine whether the kernel that
the ovs datapath is compiled against has the new or legacy mpls_gso kernel module.
It has been tested on kernel 4.4 and 4.9.
As reported by Lennert the MPLS GSO code is failing to properly segment
large packets. There are a couple of problems:
1. the inner protocol is not set so the gso segment functions for inner
protocol layers are not getting run, and
2. MPLS labels for packets that use the "native" (non-OVS) MPLS code
are not properly accounted for in mpls_gso_segment.
The MPLS GSO code was added for OVS. It is re-using skb_mac_gso_segment
to call the gso segment functions for the higher layer protocols. That
means skb_mac_gso_segment is called twice -- once with the network
protocol set to MPLS and again with the network protocol set to the
inner protocol.
This patch sets the inner skb protocol addressing item 1 above and sets
the network_header and inner_network_header to mark where the MPLS labels
start and end. The MPLS code in OVS is also updated to set the two
network markers.
From there the MPLS GSO code uses the difference between the network
header and the inner network header to know the size of the MPLS header
that was pushed. It then pulls the MPLS header, resets the mac_len and
protocol for the inner protocol and then calls skb_mac_gso_segment
to segment the skb.
Afterward the inner protocol segmentation is done the skb protocol
is set to mpls for each segment and the network and mac headers
restored.
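The header arithmetic described above can be illustrated with a small Python model. This is only a sketch: the `SkbOffsets` class and its field names are hypothetical stand-ins for the two skb markers, not kernel API. The only outside fact it relies on is that each MPLS label stack entry is 4 bytes long.

```python
from dataclasses import dataclass

MPLS_LABEL_ENTRY_LEN = 4  # bytes per MPLS label stack entry (RFC 3032)


@dataclass
class SkbOffsets:
    """Hypothetical model of the two header markers, as byte offsets."""
    network_header: int        # where the MPLS labels start
    inner_network_header: int  # where the MPLS labels end (inner L3 begins)


def mpls_header_len(skb: SkbOffsets) -> int:
    """Size of the pushed MPLS header: the gap between the two markers."""
    hlen = skb.inner_network_header - skb.network_header
    if hlen <= 0 or hlen % MPLS_LABEL_ENTRY_LEN != 0:
        raise ValueError("markers do not delimit a whole number of labels")
    return hlen


# A 14-byte Ethernet header followed by two MPLS labels.
skb = SkbOffsets(network_header=14, inner_network_header=22)
mpls_len = mpls_header_len(skb)
label_count = mpls_len // MPLS_LABEL_ENTRY_LEN
```

In this model the segmentation code would pull `mpls_len` bytes before handing the skb to the inner protocol's segmentation functions, then restore the markers on each resulting segment.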
Reported-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream commit:
commit 85de4a2101acb85c3b1dde465e84596ccca99f2c
Author: Jiri Benc <jbenc@redhat.com>
Date: Fri Sep 30 19:08:07 2016 +0200
openvswitch: use mpls_hdr
skb_mpls_header is equivalent to mpls_hdr now. Use the existing helper
instead.
Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Ben Pfaff [Sun, 30 Apr 2017 21:03:02 +0000 (14:03 -0700)]
ovn-nbctl: Display and accept Neutron network, router, port names.
The names of these neutron:* keys in external_ids are unfortunate, but
they are the keys that the OVN utilities need to support if we want users
to be able to work with OpenStack in a convenient fashion rather than
having to cut and paste UUIDs everywhere.
This commit documents the meaning of these keys, in the hopes that other
CMS integrations will simply use them instead of inventing new ones.
Perhaps at some point we can clean this up, since bad names are a bad idea,
but it also would take a lot of coordination and probably multiple
releases.
Port names are slightly less useful in practice than switch or router names
because Neutron doesn't by default give names to ports. (You can add them
with "openstack port set --name", though.)
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org>
Ben Pfaff [Thu, 27 Apr 2017 22:47:59 +0000 (15:47 -0700)]
db-ctl-base: Add support for identifying a row based on a value in a map.
This will be used in an upcoming commit to allow Datapath_Binding records
in the OVN southbound database to be identified based on external-ids:name
and other map values.
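As a rough illustration of identifying a row by a value stored in a map column, here is a minimal Python sketch. The function name, the dictionary-based row representation, and the sample data are all assumptions made for this example; they do not reflect the actual structures in lib/db-ctl-base.c.

```python
def find_rows_by_map_value(rows, column, key, value):
    """Return every row whose map `column` holds `value` under `key`."""
    return [row for row in rows if row.get(column, {}).get(key) == value]


# Hypothetical Datapath_Binding-like records.
datapaths = [
    {"tunnel_key": 1, "external_ids": {"name": "sw0"}},
    {"tunnel_key": 2, "external_ids": {"name": "lr0"}},
    {"tunnel_key": 3, "external_ids": {}},
]

matches = find_rows_by_map_value(datapaths, "external_ids", "name", "lr0")
```

Returning a list rather than a single row leaves it to the caller to decide how to handle an ambiguous identifier that matches more than one record.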
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org>
Ben Pfaff [Thu, 27 Apr 2017 20:33:12 +0000 (13:33 -0700)]
ovn-sbctl, ovn-nbctl, ovs-vsctl: Remove useless record id methods.
These only did anything if both the first two members of the struct were
nonnull, as you can see from the first test in get_row_by_id() in
lib/db-ctl-base.c, so these never did anything useful and I can't figure
out why they're there.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Russell Bryant <russell@ovn.org>
Ben Pfaff [Thu, 27 Apr 2017 16:36:36 +0000 (09:36 -0700)]
ovn-nbctl: Drop gratuitous indentation for "show" output.
"ovn-nbctl show" indented every line of output by at least 4 spaces, which
needlessly wastes horizontal space. This drops 4 spaces of indent from
each line of output.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org> Acked-by: Russell Bryant <russell@ovn.org>
Ben Pfaff [Sun, 30 Apr 2017 21:09:55 +0000 (14:09 -0700)]
uuid: Change semantics of uuid_is_partial_string().
Until now, uuid_is_partial_string() returned the number of characters at
the beginning of a string that were the beginning of a valid UUID. This
is useful, but all of the callers actually wanted to get a value of 0 if
the string contained a character that was invalid for a UUID. This makes
that change.
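The new semantics can be modelled in a few lines of Python. This is a sketch of the behaviour described in this commit message, not a translation of the C function; the template string and helper names are assumptions made for the example.

```python
UUID_TEMPLATE = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # 'x' marks a hex digit
HEX_DIGITS = set("0123456789abcdefABCDEF")


def uuid_is_partial_string(s: str) -> int:
    """Return len(s) if all of `s` is a prefix of a valid UUID, else 0."""
    if len(s) > len(UUID_TEMPLATE):
        return 0  # trailing characters after a complete UUID are invalid
    for ch, slot in zip(s, UUID_TEMPLATE):
        valid = ch in HEX_DIGITS if slot == "x" else ch == "-"
        if not valid:
            return 0  # any invalid character rejects the whole string
    return len(s)
```

With this model, a caller can treat any nonzero return value as "the whole string is a usable UUID prefix" without a separate length comparison.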
Examples:
"123" previously yielded 3 and still does.
"xyzzy" previously yielded 0 and still does.
"123xyzzy" previously yielded 3, now yields 0.
"e66250bb-9531-491b-b9c3-5385cabb0080" previously yielded 36, still does.
"e66250bb-9531-491b-b9c3-5385cabb0080xyzzy" previously yielded 36, now 0.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org> Acked-by: Russell Bryant <russell@ovn.org>
fedora: do not restart ovn svcs automatically on pkg upgrade
Similar to commit 5771f4765734 ("fedora: do not restart the
service on a pkg upgrade"), this change eliminates the
automatic restart of OVN services after upgrade.
Note that the post-uninstall scriptlet affected by this change
is executed from the previously installed package when upgrading,
so existing installations need to go through two package upgrades
before this change will take effect.
Signed-off-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: Russell Bryant <rbryant@redhat.com>
Russell Bryant [Fri, 31 Mar 2017 15:27:23 +0000 (11:27 -0400)]
build: Don't run tests in rpm makefile targets.
The RPM build makefile targets are helpful during development and testing,
but I personally almost never want the tests to run when I use them.
Leave tests on by default in the spec file for when the package is built by
distro build systems, but disable them by default in the Makefile targets and
update the documentation accordingly.
Joe Stringer [Mon, 1 May 2017 19:58:06 +0000 (12:58 -0700)]
revalidator: Revalidate ukeys created from flows.
If there is no active ukey for a particular datapath flow, and it is
dumped from the datapath, then the revalidator threads will assemble a
ukey based on the datapath flow. This will allow tracking of the stats
for proper attribution, and future validation of the flow.
However, until now when creating the ukey in this context, the ukey's
'reval_seq' has been set to the current udpif's reval_seq. This implies
that the flow has been validated against the current flow table.
However, this is not true - The flow appeared in the datapath without
any prior knowledge in this OVS instance so we should set up the
reval_seq of the ukey to ensure that the flow will be validated during
the current dump/revalidation cycle.
See also revalidate_ukey().
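The effect of the sequence number can be shown with a deliberately simplified Python model. The `Ukey` class and both functions are hypothetical, and the assumption that a ukey needs revalidation exactly when its 'reval_seq' differs from the current one is a simplification of what the real revalidator checks.

```python
from dataclasses import dataclass


@dataclass
class Ukey:
    reval_seq: int


def needs_revalidation(ukey: Ukey, current_seq: int) -> bool:
    """Simplified rule: revalidate when the ukey's sequence is stale."""
    return ukey.reval_seq != current_seq


def ukey_from_datapath_flow(current_seq: int) -> Ukey:
    """Create a ukey for a dumped flow that this instance never validated.

    Assumption: any value other than current_seq marks the ukey as stale;
    current_seq - 1 is used here purely for illustration.
    """
    return Ukey(reval_seq=current_seq - 1)


current = 42
old_behaviour = Ukey(reval_seq=current)          # looks already validated
new_behaviour = ukey_from_datapath_flow(current)  # forces validation
```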
Fixes: 23597df05226 ("upcall: Create ukeys in handler threads.") Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>
ovn-northd: Add logical flows to support native DNS
OVN implements native DNS resolution which can be used to resolve the
internal DNS names belonging to a logical datapath.
To support this, a new table 'DNS' is added in the NB DB. A new column
'dns_records' is added to the 'Logical_Switch' table, which references the
'DNS' table.
The following flows are added for each logical switch that is configured
with DNS records in the 'dns_records' column:
- A logical flow in DNS_LOOKUP stage which uses the action 'dns_lookup'
to transform the DNS query to DNS reply packet and advances
to the next stage - DNS_RESPONSE.
- A logical flow in DNS_RESPONSE stage which implements the DNS responder
by sending the DNS reply from previous stage back to the inport.
This patch adds a new OVN action 'dns_lookup' to support native DNS.
ovn-controller parses this action and adds a NXT_PACKET_IN2
OF flow with 'pause' flag set.
A new table 'DNS' is added in the SB DB to look up and resolve
the DNS queries. When a valid DNS packet is received by
ovn-controller, it looks up the DNS name in the 'DNS' table
and if successful, it frames a DNS reply, resumes the packet
and stores 1 in the 1-bit subfield. If the packet is invalid
or cannot be resolved, it resumes the packet without any
modifications and stores 0 in the 1-bit subfield.
reg0[4] = dns_lookup(); next;
An upcoming patch will use this action and add logical flows.
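The lookup behaviour described above can be sketched as a small Python function. Everything here is an assumption made for illustration: packets are modelled as dictionaries, the 'DNS' table as a plain mapping from name to addresses, and the field names are invented. It is not the ovn-controller implementation.

```python
def dns_lookup(dns_records, packet):
    """Return (bit, packet_out) following the rules in the commit message.

    bit is 1 when a reply was framed, 0 when the packet is resumed
    unmodified because it is invalid or the name cannot be resolved.
    """
    name = packet.get("query_name")
    if packet.get("type") != "dns_query" or not name:
        return 0, packet  # invalid DNS packet: resume untouched
    addresses = dns_records.get(name)
    if not addresses:
        return 0, packet  # unknown name: resume untouched
    # Frame a reply from a copy of the query so the original is unchanged.
    reply = dict(packet, type="dns_reply", answers=addresses)
    return 1, reply


records = {"vm1.ovn.org": ["10.0.0.4"]}
query = {"type": "dns_query", "query_name": "vm1.ovn.org"}
hit_bit, hit_packet = dns_lookup(records, query)
miss_bit, miss_packet = dns_lookup(
    records, {"type": "dns_query", "query_name": "unknown.ovn.org"})
```

The returned bit plays the role of the 1-bit subfield (reg0[4] in the example above) that later stages can match on.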
Aaron Conole [Tue, 2 May 2017 20:17:48 +0000 (16:17 -0400)]
rhel: fix the fedora spec
When commit d0c961a99f57 ("lib/automake.mk: don't install
runtime directories") landed, it broke RPM based builds since
the requisite directories were no longer available. This commit
adds those directories back when making RPMs so that the package
manager can see them.
Ben Pfaff [Mon, 1 May 2017 20:19:43 +0000 (13:19 -0700)]
ovs-macros: Add helper to make 'wc' use POSIX compliant output format.
Several times, we've had to fix tests that used 'wc' and expected a
particular output format. POSIX is specific about the output format, but
neither GNU nor BSD wc honors it. This commit makes whatever 'wc' is on
the system use the POSIX output format.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: YAMAMOTO Takashi <yamamoto@ovn.org>
Han Zhou [Sat, 22 Apr 2017 01:55:27 +0000 (18:55 -0700)]
ovn-controller: Avoid recomputing when there are in-flight msgs.
When there are in-flight msgs being sent to OVS, ofctrl_put will
skip, which makes all the flows computed in that main loop
iteration useless. To avoid the wasted CPU cycles, a check is added
before lflow/physical flow run in each iteration.
This yields a huge performance improvement in the following test:
- 1 lswitch with 10 lports bound locally
- Each lport has an ingress ACL, referencing the same address-set
- The address-set has 10,000 IPv4 addresses
For each IP address in the address-set, there will be 3
OpenFlow rules generated for each ACL. So the total number
of rules is 300k+.
Without the patch, it takes 50+ minutes to install all the
rules to ovs-vswitchd.
With the patch, it takes 16 seconds to install all the rules
to ovs-vswitchd.
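The figures above can be checked with a few lines of arithmetic. The variable names are made up for this example, and "50+ minutes" is taken as exactly 50 minutes, so the computed speedup is a lower bound.

```python
lports = 10                    # logical ports bound locally
acls_per_lport = 1             # one ingress ACL per port
addresses = 10_000             # IPv4 addresses in the shared address set
rules_per_address_per_acl = 3  # OpenFlow rules per address, per ACL

total_rules = (lports * acls_per_lport * addresses
               * rules_per_address_per_acl)

seconds_without_patch = 50 * 60  # "50+ minutes", taken as a minimum
seconds_with_patch = 16
min_speedup = seconds_without_patch / seconds_with_patch

print(total_rules)   # 300000
print(min_speedup)   # 187.5
```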
The reason is that the large number of rules are sent to
ovs-vswitchd gradually in many iterations of ovn-controller
main loop. Without the patch, CPU cycles are wasted in
lflow_run re-processing the large address set in every
main loop iteration. With the patch, this re-processing is
avoided in iterations when rules are still waiting to be sent.
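A toy simulation makes the saving concrete. This is a sketch under stated assumptions: `run_iterations` and its arguments are hypothetical, each list element stands for the number of in-flight messages at the start of one main loop iteration, and a flow computation counts as wasted whenever messages are still in flight.

```python
def run_iterations(in_flight_per_iteration, skip_when_in_flight):
    """Return (flow computations performed, computations that were wasted)."""
    computations = 0
    wasted = 0
    for in_flight in in_flight_per_iteration:
        if skip_when_in_flight and in_flight > 0:
            continue  # patched behaviour: do not compute flows at all
        computations += 1
        if in_flight > 0:
            wasted += 1  # result would be discarded by the skipped put
    return computations, wasted


pending = [0, 3, 2, 1, 0]
without_patch = run_iterations(pending, skip_when_in_flight=False)
with_patch = run_iterations(pending, skip_when_in_flight=True)
```

In this example the unpatched loop computes flows five times and throws three results away, while the patched loop computes them only twice.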
Signed-off-by: Han Zhou <zhouhan@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Aaron Conole [Mon, 1 May 2017 20:14:09 +0000 (16:14 -0400)]
checkpatch: fix pointer declaration
A common way of expressing 'raise to the power of' when authoring
comments uses **. This is currently getting caught by the pointer
spacing warning. So, catch it here.
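One way to keep a pointer spacing check from firing on '**' is a negative lookahead. The regular expression below is a hypothetical illustration written for this note, not the expression used in checkpatch, and it makes no attempt to cover every pointer declaration style.

```python
import re

# An identifier character directly followed by a single '*'.
# The lookahead skips '**', which is commonly used for exponentiation
# in comments.
PTR_MISSING_SPACE = re.compile(r"[A-Za-z0-9]\*(?!\*)")


def pointer_whitespace_warning(line):
    """True if `line` looks like 'struct foo*bar' (no space before '*')."""
    return PTR_MISSING_SPACE.search(line) is not None


flagged = pointer_whitespace_warning("struct foo*bar;")
comment_ok = pointer_whitespace_warning("/* holds 2**32 entries */")
spaced_ok = pointer_whitespace_warning("struct foo *bar;")
```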
Aaron Conole [Mon, 1 May 2017 20:14:08 +0000 (16:14 -0400)]
checkpatch: filename from hunks fix
Filenames that come from the hunks match include the git-ified 'b/'
prefix, which makes jumping to the error file that much harder. This
patch corrects that by simply skipping those bytes.
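The prefix handling can be sketched as follows. The regular expression and function name are assumptions made for this example rather than the code in checkpatch; the sketch only shows the idea of dropping the two-byte 'b/' prefix from a hunk's file header.

```python
import re

# Matches the "new file" header line of a unified diff.
HUNK_FILE = re.compile(r"^\+\+\+ (\S+)")


def filename_from_hunk_header(line):
    """Return the file path from a '+++ ' line without git's 'b/' prefix."""
    match = HUNK_FILE.match(line)
    if match is None:
        return None
    path = match.group(1)
    if path.startswith("b/"):
        path = path[2:]  # skip the two prefix bytes
    return path


name = filename_from_hunk_header("+++ b/lib/dpif-netdev.c")
```

The resulting path can be opened directly from the top of the source tree, which is what makes jumping to the reported file easier.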
Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Aaron Conole [Mon, 1 May 2017 20:14:07 +0000 (16:14 -0400)]
checkpatch: print conformance
Other utilities (notoriously the Linux kernel's checkpatch.pl) have a more
standardized form for printing file and lines. With this change, the
template used to print gains two enhancements:
1. Color
2. Conformance with the kernel's version of checkpatch.pl
Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Aaron Conole [Mon, 1 May 2017 20:14:06 +0000 (16:14 -0400)]
checkpatch: correct a parsing issue
Occasionally, characters will be sent which violate the
ascii decoder's sense of propriety. In fact, in-tree there are
a few such files (ex: tests/atlocal.in), and they cause an
exception to be raised when they are encountered.
Set the policy to ignore these cases. This means these bytes are
omitted from the text stream during processing.
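Python's standard decoding API shows the difference between the strict and the "ignore" policy. The sample bytes are invented for this example; whether checkpatch decodes input exactly this way is not stated in the commit message.

```python
raw = b"AT_CHECK caf\xc3\xa9 ok\n"  # contains two non-ASCII bytes

# Strict decoding (the default policy) raises on the first bad byte.
try:
    raw.decode("ascii")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With errors="ignore" the offending bytes are silently dropped.
text = raw.decode("ascii", errors="ignore")
```

The trade-off is that the dropped bytes no longer count toward anything computed from the text, such as line length.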
Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Aaron Conole [Mon, 1 May 2017 20:14:03 +0000 (16:14 -0400)]
checkpatch: introduce a flexible framework
Developers wishing to add checks to checkpatch currently sift through an
ad hoc mess. The process goes something like:
1. Figure out what to test in the patch
2. Write some code, quickly, that checks for that condition
3. Look through the state machine to find where the check should go
4. Ignore parts of the above and just throw something together
That worked fine for the initial development, but as interesting new tests
are developed, it is important to have a more flexible framework that lets
a developer just plug in a new test, easily.
This commit brings in a new framework that allows plugging in checks very
quickly. Hook up the line-length test as an initial demonstration.
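A table-driven design is one way to get this kind of pluggability. The sketch below is hypothetical: the dictionary keys, the helper names, and the 79-character limit are assumptions made for illustration and are not taken from the checkpatch source.

```python
MAX_LINE_LENGTH = 79  # assumed limit for this sketch


def line_length_check(line):
    """True if the line is too long."""
    return len(line) > MAX_LINE_LENGTH


# Each entry is one pluggable check; adding a test means adding an entry.
checks = [
    {"name": "line-length",
     "applies_to": lambda filename: True,
     "check": line_length_check,
     "message": "Line is longer than %d characters" % MAX_LINE_LENGTH},
]


def run_checks(filename, line):
    """Run every applicable check and return the messages that fired."""
    return [c["message"] for c in checks
            if c["applies_to"](filename) and c["check"](line)]


# Plugging in a new check needs no change to run_checks().
checks.append(
    {"name": "trailing-whitespace",
     "applies_to": lambda filename: filename.endswith((".c", ".h")),
     "check": lambda line: line != line.rstrip(),
     "message": "Line has trailing whitespace"})

messages = run_checks("lib/foo.c", "int x;   ")
```

Keeping the file filter, the predicate, and the message together in one entry means a new check does not need to know where in the parser it will be called.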
Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>