ovn-controller: Add datapath-type and iface-types in chassis:external_ids
This patch reads the 'Bridge.datapath_type' column value of the integration
bridge and 'Open_vSwitch.iface_types' column value and sets these in the
external_ids:datapath-type and external_ids:iface-types of Chassis table.
This will provide hints to the CMS or clients monitoring OVN SB DB to
determine the datapath type (DPDK or non-DPDK) configured and take some
actions based on it.
One usecase is, OVN neutron plugin can use this information to set the
vif_type (ovs or vhostuser) during the port binding.
Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
Mark Kavanagh [Tue, 9 Aug 2016 16:01:20 +0000 (17:01 +0100)]
netdev-dpdk: add support for jumbo frames
Add support for Jumbo Frames to DPDK-enabled port types,
using single-segment-mbufs.
Using this approach, the amount of memory allocated to each mbuf
to store frame data is increased to a value greater than 1518B
(typical Ethernet maximum frame length). The increased space
available in the mbuf means that an entire Jumbo Frame of a specific
size can be carried in a single mbuf, as opposed to partitioning
it across multiple mbuf segments.
The amount of space allocated to each mbuf to hold frame data is
defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
parameter.
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
[diproiettod@vmware.com rebased] Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
netdev: Make netdev_set_mtu() netdev parameter non-const.
Every provider silently drops the const attribute when converting the
parameter to the appropriate subclass. Might as well drop the const
attribute from the parameter, since this is a "set" function.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>
vswitchd: Introduce 'mtu_request' column in Interface.
The 'mtu_request' column can be used to set the MTU of a specific
interface.
This column is useful because it will allow changing the MTU of DPDK
devices (implemented in a future commit), which are not accessible
outside the ovs-vswitchd process, but it can be used for kernel
interfaces as well.
The current implementation of set_mtu() in netdev-dpdk is removed
because it's broken. It will be reintroduced by a subsequent commit on
this series.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>
dpif-netdev: Fix -Wformat warning on 32-bit build.
Use the appropriate format specifier for size_t, otherwise the 32-bit
build fails.
Reported-at: https://travis-ci.org/openvswitch/ovs/jobs/151938383 Fixes: 3453b4d62a98("dpif-netdev: dpcls per in_port with sorted
subtables") Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Joe Stringer <joe@ovn.org>
Ciara Loftus [Wed, 10 Aug 2016 14:28:27 +0000 (15:28 +0100)]
netdev-dpdk: add DPDK pdump capability
This commit provides the ability to 'listen' on DPDK ports and save
packets to a pcap file with a DPDK app that uses the librte_pdump
library. One such app is the 'pdump' app that can be found in the DPDK
'app' directory. Instructions on how to use this can be found in
INSTALL.DPDK-ADVANCED.md
Pdump capability in OVS with DPDK will only be initialised if the
CONFIG_RTE_LIBRTE_PMD_PCAP=y and CONFIG_RTE_LIBRTE_PDUMP=y options are
set in DPDK. libpcap is required if the above configuration is used.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
ovs-bugtool: Correct "rmdir" error messages during "make distcheck".
Remove duplicated delete attempts and error messages during distcheck
clean procedure.
The problem is that during clean up procedure of distcheck:
rmdir: failed to remove ‘/openvswitch-2.5.90/_inst/share/openvswitch/bugtool-plugins/’: Directory not empty
rmdir: failed to remove ‘/openvswitch-2.5.90/_inst/share/openvswitch/bugtool-plugins/ovn/network-status ’: No such file or directory
The first entry is caused by xml file which is kept flat in the directory
structure (not in the subdirectory as it is for other plugins), and rmdir
"tries" to remove folder which keeps all plugins files and folders. That is
why additional check if directory is not empty is added, to prevent that.
The second entry is cause by some other commit when ovs plugin has been added:
stem=`echo "$$plugin" | sed 's,ovn/,,'`; \
So in that sense directory path has been modified during removal of xml
file, but it hasn't been updated during directory removal.
I didn't want to really change this logic, as I'm not sure if there
something else can be stored in this directory, but it was very tempting to
remove everything just by:
rm -rf "$(DESTDIR)$(bugtoolpluginsdir)/*"
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Jan Scheurich [Thu, 11 Aug 2016 10:02:27 +0000 (12:02 +0200)]
dpif-netdev: dpcls per in_port with sorted subtables
The user-space datapath (dpif-netdev) consists of a first level "exact match
cache" (EMC) matching on 5-tuples and the normal megaflow classifier. With
many parallel packet flows (e.g. TCP connections) the EMC becomes inefficient
and the OVS forwarding performance is determined by the megaflow classifier.
The megaflow classifier (dpcls) consists of a variable number of hash tables
(aka subtables), each containing megaflow entries with the same mask of
packet header and metadata fields to match upon. A dpcls lookup matches a
given packet against all subtables in sequence until it hits a match. As
megaflow cache entries are by construction non-overlapping, the first match
is the only match.
Today the order of the subtables in the dpcls is essentially random so that
on average a dpcls lookup has to visit N/2 subtables for a hit, when N is the
total number of subtables. Even though every single hash-table lookup is
fast, the performance of the current dpcls degrades when there are many
subtables.
How does the patch address this issue:
In reality there is often a strong correlation between the ingress port and a
small subset of subtables that have hits. The entire megaflow cache typically
decomposes nicely into partitions that are hit only by packets entering from
a range of similar ports (e.g. traffic from Phy -> VM vs. traffic from VM ->
Phy).
Therefore, maintaining a separate dpcls instance per ingress port with its
subtable vector sorted by frequency of hits reduces the average number of
subtables lookups in the dpcls to a minimum, even if the total number of
subtables gets large. This is possible because megaflows always have an exact
match on in_port, so every megaflow belongs to unique dpcls instance.
For thread safety, the PMD thread needs to block out revalidators during the
periodic optimization. We use ovs_mutex_trylock() to avoid blocking the PMD.
To monitor the effectiveness of the patch we have enhanced the ovs-appctl
dpif-netdev/pmd-stats-show command with an extra line "avg. subtable lookups
per hit" to report the average number of subtable lookup needed for a
megaflow match. Ideally, this should be close to 1 and almost all cases much
smaller than N/2.
The PMD tests have been adjusted to the additional line in pmd-stats-show.
We have benchmarked a L3-VPN pipeline on top of a VXLAN overlay mesh.
With pure L3 tenant traffic between VMs on different nodes the resulting
netdev dpcls contains N=4 subtables. Each packet traversing the OVS
datapath is subject to dpcls lookup twice due to the tunnel termination.
Disabling the EMC, we have measured a baseline performance (in+out) of ~1.45
Mpps (64 bytes, 10K L4 packet flows). The average number of subtable lookups
per dpcls match is 2.5. With the patch the average number of subtable lookups
per dpcls match is reduced to 1 and the forwarding performance grows by ~50%
to 2.13 Mpps.
Even with EMC enabled, the patch improves the performance by 9% (for 1000 L4
flows) and 34% (for 50K+ L4 flows).
As the actual number of subtables will often be higher in reality, we can
assume that this is at the lower end of the speed-up one can expect from this
optimization. Just running a parallel ping between the VXLAN tunnel endpoints
increases the number of subtables and hence the average number of subtable
lookups from 2.5 to 3.5 on master with a corresponding decrease of throughput
to 1.2 Mpps. With the patch the parallel ping has no impact on average number
of subtable lookups and performance. The performance gain is then ~75%.
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
ovn-northd: Only warn about peer as switch port when it really is one.
At the end of join_logical_ports(), some ovn_ports might not have been
bound as logical switch ports or logical router ports, but the code assumed
that they were and gave a confusing warning when the assumption was
violated.
Russell Bryant [Fri, 12 Aug 2016 17:28:48 +0000 (13:28 -0400)]
release-process: Use markdown table format.
Update the release process document to use markdown formatting for the
table used to describe the 6 month release schedule. This will make it
be formatted correctly when converted to HTML on github and
openvswitch.org.
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Pravin B Shelar [Fri, 12 Aug 2016 02:27:12 +0000 (19:27 -0700)]
datapath: compat: keep skb mark across tunnel devices.
Older kernel skb_scrub_packet() has bug which resets skb mark for
all packet. It is fixed during 3.18 release where it is reset
only for packets crossing namespace. So OVS is forced to use
compat skb_scrub_packet() on older kernel.
This is related to upstream bug fix commit ca7c7b9059e3
("skbuff: Do not scrub skb mark within the same name space").
VMware-BZ: #1710701 Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Joe Stringer [Fri, 12 Aug 2016 00:54:08 +0000 (17:54 -0700)]
ofproto-dpif-xlate: Fix VLOG_ERR_RL() call.
a716ef9a7a73 ("ofproto-dpif-xlate: Log flow in XLATE_REPORT_ERROR.")
inadvertantly broke build on clang due to improper passing of the ds
cstring into the VLOG() function:
error: format string is not a string literal
(potentially insecure) [-Werror,-Wformat-security]
XLATE_REPORT_ERROR(ctx, "over max translation depth %d", MAX_DEPTH);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: expanded from macro
'XLATE_REPORT_ERROR'
VLOG_ERR_RL(&error_report_rl, ds_cstr(&ds)); \
Reported-by: Daniele Di Proietto <diproiettod@vmware.com> Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Joe Stringer [Thu, 11 Aug 2016 19:36:16 +0000 (12:36 -0700)]
ofproto-dpif-xlate: Log flow in XLATE_REPORT_ERROR.
To assist debugging pipelines when resubmit resource checks fail, print
the base_flow from the translation context. This base flow can then be
used from ofproto/trace to figure out which parts of the pipeline lead
to this translation error.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Thu, 11 Aug 2016 04:14:09 +0000 (21:14 -0700)]
ovs-bugtool: Switch from MD5 to SHA-256.
While going through a FIPS certification process we discovered that
ovs-bugtool uses MD5 to identify the contents of files. FIPS doesn't allow
use of the obsolete and broken MD5 algorithm, so this commit switches to
SHA-256.
In a way, this is a silly requirement. ovs-bugtool only uses MD5 to
identify file content, mostly to ensure that the contents of the bug report
have not been corrupted. MD5 is perfectly adequate for that purpose; in
fact a 16-bit CRC would probably be adequate. On the other hand, there is
basically no cost and no disadvantage to switching to SHA-256, so why not
do it? That's why I think that this is a reasonable change.
VMware-BZ: #1708786 Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
Ilya Maximets [Wed, 10 Aug 2016 09:43:03 +0000 (12:43 +0300)]
netdev-dpdk: vhost: Fix double free and use after free with QoS.
While using QoS with vHost interfaces 'netdev_dpdk_qos_run__()' will
free mbufs while executing 'netdev_dpdk_policer_run()'. After
that same mbufs will be freed at the end of '__netdev_dpdk_vhost_send()'
if 'may_steal == true'. This behaviour will break mempool.
Also 'netdev_dpdk_qos_run__()' will free packets even if we shouldn't
do this ('may_steal == false'). This will lead to using of already freed
packets by the upper layers.
Fix that by copying all packets that we can't steal like it done
for DPDK_DEV_ETH devices and freeing only packets not freed by QoS.
Fixes: 0bf765f753fd ("netdev_dpdk.c: Add QoS functionality.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Ian Stokes <ian.stokes@intel.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Darrell Ball [Tue, 9 Aug 2016 02:20:38 +0000 (19:20 -0700)]
ovn: Fix receive from vxlan in ovn-controller.
The changes enable source node replication in OVN for receive from vxlan
tunnels. OVN only supports source node replication mode. This is needed
for ovn-controller to interoperate with hardware switches.
Previously hardware vtep interaction, which uses service node
replication by default for multicast/broadcast/unknown unicast traffic
partially "worked" by happenstance. Because of limited vxlan
encapsulation metadata, received packets were resubmitted to find
the egress port(s). This is not correct for multicast, broadcast and
unknown unicast traffic as traffic will get resent on the tunnel mesh.
ovn-controller is changed not to send traffic received from vxlan
tunnels out the tunnel mesh again. Traffic received from vxlan tunnels is
now only sent locally as intended with obvious benefits. This behavior is
newly documented in ovn-architecture.7.xml.
To support keeping state for receipt from a vxlan tunnel, a MFF logical
flags register flag is allocated.
As part of this change ovn-controller-vtep is hard-coded to set the
replication mode of each logical switch to source node as OVN will only
support source node replication.
I failed to see that lib/dpif-netdev.c actually needs the concurrency
provided by pvector prior to this change. More specifically, when a
subtable is removed, concurrent lookups may skip over another subtable
swapped in to the place of the removed subtable in the vector.
Since this was the only use of the non-concurrent pvector, it is
cleaner to revert the whole patch.
Reported-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Ryan Moats [Thu, 28 Jul 2016 22:17:41 +0000 (22:17 +0000)]
ovn-controller: Persist desired conntrack groups.
With incremental processing of logical flows desired conntrack groups
are not being persisted. This patch adds this capability, with the
side effect of adding a ds_clone method that this capability leverages.
Signed-off-by: Ryan Moats <rmoats@us.ibm.com> Reported-by: Guru Shetty <guru@ovn.org>
Reported-at: http://openvswitch.org/pipermail/dev/2016-July/076320.html Fixes: 70c7cfe ("ovn-controller: Add incremental processing to lflow_run and physical_run") Acked-by: Flavio Fernandes <flavio@flaviof.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
This commit builds upon some of the recent ovs-ctl changes to build a
more integrated systemd setup. A new service (ovs-vswitchd) is
added to track the ovs-vswitchd, and ovsdb-server service is reserved
for the ovsdb-server daemon. The systemd scripts still use ovs-ctl to
actually initialize the daemons.
rhel/ovsdb-server.service: Rename the nonetwork service
Currently, openvswitch.service calls out to start
openvswitch-nonetwork.service. However, openvswitch-nonetwork.service
will be called ovsdb-server, so that it is a bit more reflective of
the dependencies. This commit does make the file a bit of a misnomer as
currently the ovsdb-server SERVICE will start the ovs-vswitchd service
as well. A future commit will clean this up, and change the ifup
configuration in the process.
Panu Matilainen [Wed, 10 Aug 2016 11:16:14 +0000 (14:16 +0300)]
ovs-ctl: support populating system info from /etc/os-release
On systemd-era hosts, OS name and version are available in sanitized
format from /etc/os-release(5) without resorting to calling (and thus
requiring) lsb_release. Support populating system-type and system-version
from /etc/os-release, prefer it over lsb_release, but permit overriding
via the OVS-specific system-type.conf and system-version.conf.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1350550 Signed-off-by: Panu Matilainen <pmatilai@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ilya Maximets [Mon, 8 Aug 2016 11:19:45 +0000 (14:19 +0300)]
netdev-dpdk: Avoid reconfiguration on reconnection of same vhost device.
Binding/unbinding of virtio driver inside VM leads to reconfiguration
of PMD threads. This behaviour may be abused by executing bind/unbind
in an infinite loop to break normal networking on all ports attached
to the same instance of Open vSwitch.
Fix that by avoiding reconfiguration if it's not necessary.
Number of queues will not be decreased to 1 on device disconnection but
it's not very important in comparison with possible DOS attack from the
inside of guest OS.
Fixes: 81acebdaaf27 ("netdev-dpdk: Obtain number of queues for vhost
ports from attached virtio.") Reported-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
When egress policer is set as a QoS type for a port, an error may occur during
setup if incorrect parameters are used for the rte_meter. If this occurs
the egress policer construct and set functions should free any allocated
memory relevant to the policer and set the QoS configuration pointer to
null. The netdev_dpdk_set_qos function should check the error value returned
for any QoS construct/set calls with an assertion to avoid segfault.
Also this commit modifies egress_policer_qos_set() to correctly lock the QoS
spinlock while the egress policer configuration is updated to avoid
segfault.
Signed-off-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
INSTALL.DPDK: Update documentation for DPDK 16.07 support
Replace 'dpdk_nic_bind.py' references with 'dpdk-devbind.py'. The script
name is changed in DPDK 16.07 as the script can be used also on crypto
devices along with NICs.
Update the command for setting packet forwarding mode in 'testpmd' app
from 'set fwd mac_retry' to 'set fwd mac retry'.
vxlan driver has bypass for local vxlan traffic, but that
depends on information about all VNIs on local system in
vxlan driver. This is not available in case of LWT.
Therefore following patch disable encap bypass for LWT
vxlan traffic.
Fixes: ee122c79d42 ("vxlan: Flow based tunneling"). Reported-by: Jakub Libosvar <jlibosva@redhat.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
net: vxlan: lwt: Use source ip address during route lookup.
LWT user can specify destination as well as source ip address
for given tunnel endpoint. But vxlan is ignoring given source
ip address. Following patch uses both ip address to route the
tunnel packet. This consistent with other LWT implementations,
like GENEVE and GRE.
Fixes: ee122c79d42 ("vxlan: Flow based tunneling"). Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
netdev_dpdk_vhost_destruct() calls rte_vhost_driver_unregister(), which
can trigger the destroy_device() callback. destroy_device() will try to
take two mutexes already held by netdev_dpdk_vhost_destruct(), causing a
deadlock.
This problem can be solved by dropping the mutexes before calling
rte_vhost_driver_unregister(). The netdev_dpdk_vhost_destruct() and
construct() call are already serialized by netdev_mutex.
This commit also makes clear that dev->vhost_id is constant and can be
accessed without taking any mutexes in the lifetime of the devices.
ofproto: Consider datapath_type when looking for internal ports.
Interfaces with type "internal" end up having a netdev with type "tap"
in the dpif-netdev datapath, so a strcmp will fail to match internal
interfaces.
We can translate the types with ofproto_port_open_type() before calling
strcmp to fix this.
This fixes a minor issue where internal interfaces are considered
non-internal in the userspace datapath for the purpose of adjusting the
MTU.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Kyle Mestery [Mon, 8 Aug 2016 13:48:40 +0000 (06:48 -0700)]
ovs-vsctl: Change log level of vsctl_parent_process_info
While running the ovn-scale-test [1] port-binding tests [2], I notice a
continual stream of messages such as this:
2016-08-04 13:05:28.705 547 INFO rally_ovs.plugins.ovs.scenarios.ovn [-] bind lport_0996bf_cikzNO to sandbox-172.16.200.24 on ovn-farm-node-uat-dal09-compute-325
2016-08-04 13:05:28.712 547 INFO paramiko.transport [-] Connected (version 2.0, client OpenSSH_6.6.1p1)
2016-08-04 13:05:28.805 547 INFO paramiko.transport [-] Authentication (publickey) successful!
2016-08-04T13:05:28Z|00002|vsctl|WARN|/proc/0/cmdline: open failed (No such file or directory)
2016-08-04T13:05:29Z|00002|vsctl|WARN|/proc/0/cmdline: open failed (No such file or directory)
2016-08-04 13:05:29.042 547 INFO rally_ovs.plugins.ovs.scenarios.ovn [-] bind lport_0996bf_tvovcK to sandbox-172.16.200.24 on ovn-farm-node-uat-dal09-compute-325
2016-08-04T13:05:29Z|00002|vsctl|WARN|/proc/0/cmdline: open failed (No such file or directory)
2016-08-04T13:05:29Z|00002|vsctl|WARN|/proc/0/cmdline: open failed (No such file or directory)
2016-08-04 13:05:29.285 547 INFO rally_ovs.plugins.ovs.scenarios.ovn [-] bind lport_0996bf_HwG7AK to sandbox-172.16.200.24 on ovn-farm-node-uat-dal09-compute-325
2016-08-04T13:05:29Z|00002|vsctl|WARN|/proc/0/cmdline: open failed (No such file or directory)
2016-08-04T13:05:29Z|00002|vsctl|WARN|/proc/0/cmdline: open failed (No such file or directory)
2016-08-04 13:05:29.505 547 INFO rally_ovs.plugins.ovs.scenarios.ovn [-] bind lport_0996bf_Lqbv92 to sandbox-172.16.200.24 on ovn-farm-node-uat-dal09-compute-325
2016-08-04T13:05:29Z|00002|vsctl|WARN|/proc/0/cmdline: open failed (No such file or directory)
2016-08-04T13:05:29Z|00002|vsctl|WARN|/proc/0/cmdline: open failed (No such file or directory)
2016-08-04 13:05:29.724 547 INFO rally_ovs.plugins.ovs.scenarios.ovn [-] bind lport_0996bf_6f8uQW to sandbox-172.16.200.24 on ovn-farm-node-uat-dal09-compute-325
2016-08-04T13:05:29Z|00002|vsctl|WARN|/proc/0/cmdline: open failed (No such file or directory)
2016-08-04T13:05:29Z|00002|vsctl|WARN|/proc/0/cmdline: open failed (No such file or directory)
2016-08-04 13:05:29.944 547 INFO rally_ovs.plugins.ovs.scenarios.ovn [-] bind lport_0996bf_nKl2XF to sandbox-172.16.200.24 on ovn-farm-node-uat-dal09-compute-325
Tracing these down, this is due to the check in vsctl_parent_process_info(),
which is verifying if the parent process can be opened. Since ovn-scale-test
runs sandboxes in containers, and these are run as root, there is no /proc/0
in the container. Thus, the check fails, and the error message is printed out.
It's unclear what value this log message provides, so removing it clears up
this problem and is probably the best option.
For the init process with pid of zero, this patch returns "init",
instead of trying to read from /proc/0/cmdline, which does not exist.
Ben Pfaff [Sat, 6 Aug 2016 06:47:59 +0000 (23:47 -0700)]
lflow: Correct register definitions to use subfields for overlaps.
OVN expressions need to know what fields overlap or alias one another.
This is supposed to be done via subfields: if two fields overlap, then the
smaller one should be defined as a subfield of the larger one. For
example, reg0 should be defined as xxreg0[96..127]. The symbol table in
lflow didn't do this, so it's possible for confusion to result. (I don't
have evidence of this actually happening, because it would only occur
in a case where the same bits of a field were referred to with different
names.)
This commit fixes the problem. It deserves a test, but that's somewhat
difficult at this point, so it will actually happen in a future commit.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
Ben Pfaff [Fri, 15 Jul 2016 22:31:42 +0000 (15:31 -0700)]
expr: Give a subfield a direct pointer to its parent in struct expr_symbol.
Until now, symbols that represent subfields and predicates were both
implemented as the same string member, named 'expansion', inside struct
expr. This makes it a little inconvenient to find the parent of a subfield
for two reasons. First, one must actually parse the string, e.g. to
convert "vlan.tci[13..15]" into a pointer to a struct. Second, and more
importantly, to parse the string it's necessary to have access to the
symbol table, which isn't always convenient to pass around. This commit
avoids the problem by breaking apart subfields and predicates and giving
the former a direct pointer to the parent symbol.
We could do the same thing for predicates by storing a pointer to a
pre-built struct expr, but so far it's not necessary.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
Ben Pfaff [Fri, 15 Jul 2016 21:27:55 +0000 (14:27 -0700)]
expr: Track writability as part of expr_symbol.
Until now it was only possible to find out whether an expr_symbol was
read/write or read-only, for subfields, by chasing down whether the
eventual parent field was read/write or read-only. This commit adds
a new 'rw' member that indicates directly.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
Ben Pfaff [Wed, 3 Aug 2016 05:46:18 +0000 (22:46 -0700)]
expr: Initialize 'relop' of allocated exprs in crush_and_string().
Every relop at this point is always EXPR_R_EQ, and therefore it seems that
no code actually examined it, so this doesn't appear to fix an existing
bug, but some code I was working on was affected by the uninitialized
member.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
Ben Pfaff [Wed, 3 Aug 2016 04:53:59 +0000 (21:53 -0700)]
expr: Refine handling of error parameter to expr_annotate().
In most cases expr_annotate() set '*errorp' to NULL if it was successful,
but there was one case where it did not. This corrects that and refines
the comment to better explain the intended behavior.
This didn't affect any existing users because all of them passed in a
pointer that was already NULL.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
Ben Pfaff [Sun, 31 Jul 2016 17:18:40 +0000 (10:18 -0700)]
expr: Fine-tune parser error message for common typo.
It's easy to type "=" in place of "==" in an expression but the expression
parser's error message was far from clear. For multibit numeric fields,
it said:
Explicit `!= 0' is required for inequality test of multibit field
against 0.
For string fields, the parser treated such an expression as "<name> != 0"
and thus it said:
String field <name> is not compatible with numeric constant.
This improves the error message in each case to:
Syntax error at `=' expecting relational operator.
which I hope to be clear.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
Ben Pfaff [Fri, 15 Jul 2016 21:13:02 +0000 (14:13 -0700)]
ofp-actions: Correct member name for write_actions.
For a variable-length action like write_actions, the member name is
supposed to be the name of the variable-length array at the end of the
action structure. It only makes a real difference if the beginning of the
array is not 64-bit aligned, so it did not matter in this case, but it's
better to get it right.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
Ben Pfaff [Wed, 27 Jul 2016 06:55:25 +0000 (23:55 -0700)]
ovsdb-idl: Wake up ovsdb_idl_loop when a transaction commits.
There is a fair amount of code that defers modifying the database when a
transaction cannot be created (because there is already one outstanding).
This code tends to assume that the main loop will wake up again when it
becomes possible again to modify the database, but the actual ovsdb_id_loop
implementation only did this if the database had changed. This is too
conservative a policy and may account for some failures I've seen in tests.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
Ben Pfaff [Mon, 8 Aug 2016 03:44:51 +0000 (20:44 -0700)]
ovn-nbctl: Add "sync" command to wait for previous changes to take effect.
It's slow to add --wait to every ovn-nbctl command; only the last command
needs it. But it's sometimes inconvenient to add it to the last command
if it's in a loop, etc. This makes it possible to separately wait for
the OVN southbound or hypervisors to catch up to the northbound.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
The '-d' flag tells autotest to always keep the testcase output, but
prevents '--recheck' from working. If a user wants to always keep the
output from the tests, the '-d' flag can be passed explicitly. This is
more in line with other test make target ('check',
'check-system-userspace').
CC: Andy Zhou <azhou@ovn.org> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Andy Zhou <azhou@ovn.org>
system-traffic: Flush conntrack after debug ping6.
We want to discard any state created by the initial ping6 (used to wait
for an available IP address). Otherwise some weird state can show up in
the connection tracking tables (such as ICMP connection from link-local
addresses).
Fixes: e5cf8cce2759("system-tests: Add ping through conntrack test.") Reported-by: Joe Stringer <joe@ovn.org> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Joe Stringer <joe@ovn.org>
system-userspace-macros: Check the exit code of ethtool.
If the ethtool command is not available on the system we should fail,
since the userspace testsuite cannot work properly without disabling
offloads.
Also, add ethtool to the list of installed packages on Vagrantfile, to
ensure that offloads don't cause test failures in the vagrant VM when
the kernel is updated.
Fixes: ddcf96d2dcc1 ("system-tests: Disable offloads in userspace tests.") Reported-by: Joe Stringer <joe@ovn.org> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Joe Stringer <joe@ovn.org>
This patch adds some comments to the dpcls_lookup() funtion,
which is one of the most important places where the Userspace
wildcard matching happens.
The purpose is to give some more explanations on its design
and also on how it works.
Joe Stringer [Fri, 5 Aug 2016 00:40:43 +0000 (17:40 -0700)]
system-traffic: Make ping6 vlan test more reliable.
Previously we checked on the underlying interfaces rather than the vlan
interfaces to verify whether IPv6 connectivity is available;
occasionally this would fail on some systems. Wait on the VLAN IP
instead.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: <diproiettod@vmware.com>
Maxime Coquelin [Tue, 2 Aug 2016 13:48:27 +0000 (15:48 +0200)]
bridge: No QoS configured is not an error
If no QoS is configured, type value is likely to be an empty
string.
This is not an error though, so use the regular command reply
function, not the error one.
For example, before this patch:
# ovs-appctl -t ovs-vswitchd qos/show vhost-user1
QoS not configured on vhost-user1
ovs-appctl: ovs-vswitchd: server returned an error
After the patch:
# ovs-appctl -t ovs-vswitchd qos/show vhost-user1
QoS not configured on vhost-user1
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Ian Stokes <ian.stokes@intel.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
#0 0x00007efcbadf18d7 in raise () from /lib64/libc.so.6
#1 0x00007efcbadf353a in abort () from /lib64/libc.so.6
#2 0x000000000068d5be in ovs_abort_valist at lib/util.c:335
#3 0x0000000000693d90 in vlog_abort_valist at lib/vlog.c:1204
#4 0x0000000000693e17 in vlog_abort at lib/vlog.c:1218
#5 0x000000000068d3ae in ovs_assert_failure at lib/util.c:72
#6 0x000000000060425c in ds_put_format_valist at lib/dynamic-string.c:168
#7 0x00000000006042e7 in ds_put_format at lib/dynamic-string.c:142
#8 0x00000000005a9e75 in qos_unixctl_show at vswitchd/bridge.c:3185
#9 0x000000000068cda1 in process_command at lib/unixctl.c:347
#11 unixctl_server_run at lib/unixctl.c:400
#12 0x000000000040a3ff in main at vswitchd/ovs-vswitchd.c:113
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Ciara Loftus [Thu, 4 Aug 2016 10:44:40 +0000 (11:44 +0100)]
netdev-dpdk: Make libnuma dependencies optional
Prior to this patch, OVS with DPDK required the libnuma packages to
build. This patch removes this dependency, making it only a requirement
when the CONFIG_RTE_LIBRTE_VHOST_NUMA option is detected as enabled in
the DPDK build.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Mark Kavanagh [Thu, 4 Aug 2016 09:49:12 +0000 (10:49 +0100)]
netdev-dpdk: fix memory leak
DPDK v16.07 introduces the ability to free memzones.
Up until this point, DPDK memory pools created in OVS could
not be destroyed, thus incurring a memory leak.
Leverage the DPDK v16.07 rte_mempool API to free DPDK
mempools when their associated reference count reaches 0 (this
indicates that the memory pool is no longer in use).
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
ovs_ct_find_existing() issues a warning if an existing conntrack entry
classified as IP_CT_NEW is found, with the premise that this should
not happen. However, a newly confirmed, non-expected conntrack entry
remains IP_CT_NEW as long as no reply direction traffic is seen. This
has resulted into somewhat confusing kernel log messages. This patch
removes this check and warning.
Fixes: 289f2253 ("openvswitch: Find existing conntrack entry after upcall.") Suggested-by: Joe Stringer <joe@ovn.org> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Ciara Loftus [Wed, 3 Aug 2016 12:29:24 +0000 (13:29 +0100)]
netdev-dpdk: Add support for DPDK 16.07
This commit introduces support for DPDK 16.07 and consequently breaks
compatibility with DPDK 16.04.
DPDK 16.07 introduces some changes to various APIs. These have been
updated in OVS, including:
* xstats API: changes to structure of xstats
* vhost API: replace virtio-net references with 'vid'
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Robert Wojciechowicz <robertx.wojciechowicz@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
With RCU in Open vSwitch it's very easy to protect objects accessed by
a pointer, but sometimes a pointer is not available.
One example is the vhost id for DPDK 16.07. Until DPDK 16.04 a pointer
was used to access a vhost device with RCU semantics. From DPDK 16.07
an integer id (which is an array index) is used to access a vhost
device. Ideally, we want the exact same RCU semantics that we had for
the pointer, on the integer (atomicity, memory barriers, behaviour
around quiescent states)
This commit implements a new type in ovs-rcu: ovsrcu_index. The newly
implemented ovsrcu_index_*() functions should be used to access the
type.
Even though we say "Do not, in general, declare a typedef for a struct,
union, or enum.", I think we're not in the "general" case.
Pravin B Shelar [Wed, 3 Aug 2016 21:37:44 +0000 (14:37 -0700)]
datapath: compat: Use checksum offload for outer header.
Following patch simplifies UDP-checksum routine by unconditionally
using checksum offload for non GSO packets. We might get some
performance improvement due to code simplification.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
For ipv6+udp+geneve encapsulation data, the max_mtu should subtract
sizeof(ipv6hdr), instead of sizeof(iphdr).
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
Only the first and last netlink message for a particular conntrack are
actually sent. The first message is sent through nf_conntrack_confirm when
the conntrack is committed. The last one is sent when the conntrack is
destroyed on timeout. The other conntrack state change messages are not
advertised.
When the conntrack subsystem is used from netfilter, nf_conntrack_confirm
is called for each packet, from the postrouting hook, which in turn calls
nf_ct_deliver_cached_events to send the state change netlink messages.
This commit fixes the problem by calling nf_ct_deliver_cached_events in the
non-commit case as well.
Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") CC: Joe Stringer <joestringer@nicira.com> CC: Justin Pettit <jpettit@nicira.com> CC: Andy Zhou <azhou@nicira.com> CC: Thomas Graf <tgraf@suug.ch> Signed-off-by: Samuel Gauthier <samuel.gauthier@6wind.com> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
Pravin B Shelar [Tue, 2 Aug 2016 06:47:12 +0000 (23:47 -0700)]
datapath: compat: Use udp-checksum function for compat case.
udp_set_csum() has bug fix that is not relevant for upstream
(commit c77d947191b0).
So OVS need to use compat function. This function is also
used from UDP xmit path so we have to check USE_UPSTREAM_TUNNEL.
Following patch couple this function to USE_UPSTREAM_TUNNEL symbol
rather than kernel version.
This is not bug, This patch help in code readability.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
On some systems I get a sparse warning when compiling
tests/test-netlink-conntrack.c
/usr/include/x86_64-linux-gnu/sys/cdefs.h:307:10: warning: preprocessor
token __always_inline redefined
/usr/include/linux/stddef.h:4:9: this was the original definition
The problem seems to be that Linux upstream commit 283d75737837("uapi/linux/stddef.h: Provide __always_inline to userspace
headers") introduced __always_inline in stddef.h, but glibc headers
didn't like that until e0835a5354ab("Bug 20215: Always undefine
__always_inline before defining it.").
This commit works around the issue by including a glibc header before a
kernel header.
Fixes: 2c06d9a927c5("ovstest: Add test-netlink-conntrack command.") Reported-by: Joe Stringer <joe@ovn.org> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Joe Stringer <joe@ovn.org>
Pravin B Shelar [Tue, 2 Aug 2016 03:12:06 +0000 (20:12 -0700)]
datapath: compat: Detect GSO support at ovs configure
OVS turns on tunnel GSO for statically for kernel older than 3.18.
Some distributions kernel could backport tunnel GSO. To make use
of device offload on such kernel detect the support at configure
stage.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
Paul Boca [Tue, 2 Aug 2016 17:45:42 +0000 (17:45 +0000)]
python tests: Skip TCP6 idl tests on Windows
The IPPROTO_IPV6 is not defined on Python for Windows because of
compatibility with older Windows versions.
Here is this issue discussed:https://bugs.python.org/issue6926
Paul Boca [Tue, 2 Aug 2016 17:45:41 +0000 (17:45 +0000)]
python tests: Added fcntl module for Windows
This is needed for lockf function used to lock the PID file on Windows.
ioctl and fcntl functions are not implemented at this time because they are
not used by any script.
Paul Boca [Tue, 2 Aug 2016 17:45:39 +0000 (17:45 +0000)]
python tests: Implemented signal.alarm for Windows
signal.alarm is not available in Windows and would trigger an exception
when called. Implemented this to mentain compatibility between
Windows and Linux for python tests.