The OVS compat layer can handle tunnel GSO packets, but it keeps
skb encapsulation on for packets handled in GSO. This can
confuse some NIC drivers. I have seen this issue on Intel devices:
In the upstream Linux kernel networking stack, udp_set_csum() is called
with only the UDP header applied, but in the compat layer it can
be called with the IP header applied. The following patch therefore
takes the offset into account.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
ovn-northd: Add logical flows to support native DHCPv4
OVN implements native DHCPv4 support which caters to the common
use case of providing an IP address to a booting instance by
providing stateless replies to DHCPv4 requests based on statically
configured address mappings. To do this it allows a short list of
DHCPv4 options to be configured and applied at each compute host
running ovn-controller.
A new table 'DHCP_Options' is added in OVN NB DB to store the DHCP
options. Logical ports refer to this table to configure the DHCPv4
options.
For each logical port configured with DHCPv4 options, the following flows
are added:
- A logical flow which copies the DHCPv4 options to the DHCPv4
request packets using the 'put_dhcp_opts' action and advances the
packet to the next stage.
- A logical flow which implements the DHCP responder by sending
the DHCPv4 reply back to the inport once the 'put_dhcp_opts' action
is applied.
Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Co-authored-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org> Tested-by: Ramu Ramamurthy <ramu.ramamurthy@us.ibm.com> Acked-by: Ramu Ramamurthy <ramu.ramamurthy@us.ibm.com>
Joe Stringer [Mon, 25 Jul 2016 21:09:26 +0000 (14:09 -0700)]
rhel/openvswitch.spec: Add SELinux policy.
Commit 9b897c9125ef ("rhel: provide our own SELinux custom policy
package") added the SELinux policy to the fedora packaging as a
subpackage. This patch makes the corresponding change to
openvswitch.spec, so that users of that specfile can generate the
selinux policy package without having to build all of the fedora
packages.
VMware-BZ: #1692972 Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Flavio Leitner <fbl@sysclose.org>
Joe Stringer [Fri, 22 Jul 2016 21:10:51 +0000 (14:10 -0700)]
selinux: Allow ovs-ctl force-reload-kmod.
When invoking ovs-ctl force-reload-kmod via '/etc/init.d/openvswitch
force-reload-kmod', spurious errors would output related to 'hostname'
and 'ip', and the system's selinux audit log would complain about some
of the invocations such as those listed at the end of this commit message.
This patch loosens restrictions for openvswitch_t (used for ovs-ctl, as
well as all of the OVS daemons) to allow it to execute 'hostname' and
'ip' commands, and also to execute temporary files created as
openvswitch_tmp_t. This allows force-reload-kmod to run correctly.
Example audit logs:
type=AVC msg=audit(1468515192.912:16720): avc: denied { getattr } for
pid=11687 comm="ovs-ctl" path="/usr/bin/hostname" dev="dm-1"
ino=33557805 scontext=system_u:system_r:openvswitch_t:s0
tcontext=system_u:object_r:hostname_exec_t:s0 tclass=file
Prevents the cloning of rows with outgoing or incoming weak references when
those rows aren't being modified.
It improves the OVSDB Server performance when many rows with weak references
are involved in a transaction.
In the original code (dst_refs is created from scratch):
old->dst_refs = all the rows that weak referenced old
new->dst_refs = all the rows that weak referenced old and are still weak
referencing new + rows in the transaction that weak referenced new
In the patch (dst_refs incrementally built):
old->dst_refs = all the rows that weak referenced old
Ideally, but expensive to calculate:
new->dst_refs = old->dst_refs - "weak references removed within this TXN" +
"weak references created within this TXN"
What this patch implements:
new->dst_refs = old->dst_refs - "weak references in old rows in TXN" +
"weak references in new rows in TXN"
The resulting sets should be equal in both cases.
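The claimed equality can be checked on a toy model. The sketch below is only an illustration, not the OVSDB code: weak references are modeled as a mapping from a source row to the set of rows it references, and every name in it (dst_refs, old_refs, txn_new, ideal, patched) is hypothetical.

```python
def dst_refs(refs, dst):
    """Rows whose weak references point at dst."""
    return {src for src, targets in refs.items() if dst in targets}

# Weak references held by each source row before the transaction.
old_refs = {"a": {"x"}, "b": {"x", "y"}, "c": {"y"}}
# The transaction rewrites rows "b" and "c" only.
txn_new = {"b": {"y"}, "c": {"x", "y"}}
new_refs = {**old_refs, **txn_new}

def ideal(dst):
    # old - "references removed in TXN" + "references created in TXN"
    old = dst_refs(old_refs, dst)
    removed = {s for s in txn_new if dst in old_refs[s] and dst not in txn_new[s]}
    created = {s for s in txn_new if dst not in old_refs[s] and dst in txn_new[s]}
    return (old - removed) | created

def patched(dst):
    # old - "references in old rows in TXN" + "references in new rows in TXN"
    old = dst_refs(old_refs, dst)
    in_old_rows = {s for s in txn_new if dst in old_refs[s]}
    in_new_rows = {s for s in txn_new if dst in txn_new[s]}
    return (old - in_old_rows) | in_new_rows

for dst in ("x", "y"):
    assert ideal(dst) == patched(dst) == dst_refs(new_refs, dst)
```

A reference that is present in both the old and the new version of a row is removed and then re-added by the second formula, which is why both formulas produce the same set.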
We do some more optimizations:
- If we know that the transactions must be successful at some point then,
instead of cloning dst_refs we could just move the elements between
the lists.
- At that point we lose the rollback feature, but we aren't going to need
it anyway (note that we didn't really touch the src_refs part).
- The references in dst_refs must point to new instead of old.
Previously we iterated over all the weak references in dst_refs
to change that pointer, but using a UUID is easier, and avoids
that iteration completely.
For some more commentary, see:
http://openvswitch.org/pipermail/dev/2016-July/074840.html
Signed-off-by: Esteban Rodriguez Betancourt <estebarb@hpe.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Fri, 22 Jul 2016 23:43:50 +0000 (16:43 -0700)]
flow: Verify that tot_len >= ip_len in miniflow_extract().
miniflow_extract() uses the following quantities when it examines an IPv4
header:
size, the number of bytes from the start of the IPv4 header onward
ip_len, the number of bytes in the IPv4 header (from the IHL field)
tot_len, same as size but taken from IPv4 header Total Length field
Until now, the code in miniflow_extract() verified these invariants:
size >= 20 (minimum IP header length)
ip_len >= 20 (ditto)
ip_len <= size (to avoid reading past end of packet)
tot_len <= size (ditto)
size - tot_len <= 255 (because this is stored in a 1-byte variable
internally and wouldn't normally be big)
It failed to verify the following, which is not implied by the conjunction
of the above:
ip_len <= tot_len (i.e. that the IP header fits in the packet)
This means that the code was willing to read past the end of an IP
packet's declared length, up to the actual end of the packet including any
L2 padding. For example, given:
size = 44
ip_len = 44
tot_len = 40
miniflow_extract() would successfully verify all the constraints, then:
* Trim off 4 bytes of tail padding (size - tot_len), reducing size to
40 to match tot_len.
* Pull 44 (ip_len) bytes of IP header, even though there are only 40
bytes left. This causes 'size' to wrap around to SIZE_MAX-3.
Given an IP protocol that OVS understands (such as TCP or UDP), this
integer wraparound could cause OVS to read past the end of the packet.
In turn, this could cause OVS to extract invalid port numbers, TCP flags,
or ICMPv4 or ICMPv6 or IGMP type and code from arbitrary heap data
past the end of a packet.
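The worked example above can be replayed in a few lines. This is a sketch of the arithmetic only, not the miniflow_extract() code; the function names are hypothetical and a 64-bit size_t is assumed.

```python
SIZE_MAX = 2**64 - 1  # assumption: 64-bit size_t

def old_checks(size, ip_len, tot_len):
    """The invariants verified before this fix."""
    return (size >= 20 and ip_len >= 20 and ip_len <= size
            and tot_len <= size and size - tot_len <= 255)

def new_checks(size, ip_len, tot_len):
    """The same invariants plus the missing ip_len <= tot_len."""
    return old_checks(size, ip_len, tot_len) and ip_len <= tot_len

def remaining_after_pull(ip_len, tot_len):
    size = tot_len                      # tail padding trimmed to tot_len
    return (size - ip_len) & SIZE_MAX   # unsigned subtraction, as in C

size, ip_len, tot_len = 44, 44, 40
accepted_before = old_checks(size, ip_len, tot_len)
accepted_after = new_checks(size, ip_len, tot_len)
wrapped = remaining_after_pull(ip_len, tot_len)
```

With these inputs the old checks accept the packet, the new check rejects it, and the unsigned subtraction 40 - 44 yields 2^64 - 4, which is SIZE_MAX - 3.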
This bug has common hallmarks of a security vulnerability, but we do not
know of a way to exploit this bug to cause an Open vSwitch crash, or to
extract sensitive data from Open vSwitch address space to an attacker's
benefit.
We do not have a specific example, but it is reasonable to suspect that
this bug could allow an attacker in some circumstances to bypass ACLs
implemented via Open vSwitch flow tables. However, any IP packet that
triggers this bug is invalid and should be rejected in an early stage of a
receiver's IP stack. For the same reason, any IP packet that triggers this
bug will also be dropped by any IP router, so an attacker would have to
share the same L2 segment as the victim. In conjunction with an IP stack
that has a similar bug, of course, this could cause some damage, but we do
not know of an IP stack with such a bug; neither Linux nor the OVS
userspace tunnel implementation appear to have such a bug.
Terry Wilson [Tue, 26 Jul 2016 00:17:11 +0000 (19:17 -0500)]
python: Serialize JSON via Python's json lib.
There is no particularly good reason to use our own Python JSON
serialization implementation when serialization can be done faster
with Python's built-in JSON library.
A few tests were changed due to Python's default JSON library
returning slightly more precise floating point numbers.
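As a rough illustration of the idea, serialization can be delegated to the standard library as sketched below. The to_string() wrapper and its parameters are hypothetical and are not claimed to match the signature used in the OVS Python library.

```python
import json

def to_string(obj, pretty=False, sort_keys=True):
    """Serialize obj with the standard json module (hypothetical wrapper)."""
    if pretty:
        return json.dumps(obj, indent=2, sort_keys=sort_keys)
    # Compact separators avoid the default ", " and ": " padding.
    return json.dumps(obj, separators=(",", ":"), sort_keys=sort_keys)

compact = to_string({"b": 1, "a": [1.5, "x"]})
```

The json module emits the shortest decimal string that round-trips each float, which is one plausible reason for floating-point output changing in tests.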
Signed-off-by: Terry Wilson <twilson@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
This compatibility code was only needed for Linux 2.6.36 and older. With the
support for versions older than 3.10 dropped, this code is not needed anymore.
The style for checking for mpls was kept in case some other protocol type is
added in the future.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org>
datapath: remove rtnl_delete_link support for older Linux
The changes from the upstream version of rtnl_delete_link were only there to support
Linux 2.6.33 or older. Removing this support makes it identical to the
upstream version as of 4.6.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org>
William Tu [Mon, 25 Jul 2016 15:14:24 +0000 (08:14 -0700)]
netdev-dpdk: Apply batch truncation API.
Instead of looping over each packet and checking whether to truncate, the
patch moves the check out of the loop and uses the batch API. If truncation
is not set, checking 'trunc' in 'struct dp_packet_batch' on a per-batch
basis skips the per-packet checking overhead.
Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Zong Kai LI [Thu, 21 Jul 2016 06:17:28 +0000 (14:17 +0800)]
python: add set type for ovs.idl.data.Datum.from_python
ovs.db.idl.Datum.from_python fails to handle set-type values, even though
a set is a common iterable collection, just like a list or tuple.
There is no reason an IDL caller should have to convert set-type parameters
to a list or tuple. Without this change, the insert fails but no exception
is raised.
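The kind of type check involved can be sketched as follows. This is a minimal illustration under the assumption that the fix amounts to accepting sets next to lists and tuples; normalize_value() is a hypothetical helper, not the Datum.from_python code.

```python
def normalize_value(value):
    """Return the elements of value as a list.

    Lists, tuples, sets and frozensets are treated as collections of
    elements; anything else is treated as a single element.
    """
    if isinstance(value, (list, tuple, set, frozenset)):
        return list(value)
    return [value]
```

A string is deliberately left out of the tuple of collection types, so normalize_value("a") yields one element and does not iterate over characters.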
Reported-at: https://bugs.launchpad.net/networking-ovn/+bug/1605573 Signed-off-by: Zong Kai LI <zealokii@gmail.com> Acked-by: Richard Theis <rtheis@us.ibm.com> Tested-by: Richard Theis <rtheis@us.ibm.com> Signed-off-by: Russell Bryant <russell@ovn.org>
Paul Boca [Mon, 25 Jul 2016 12:50:33 +0000 (12:50 +0000)]
windows: Added lockf function and lock PID file
If the PID file isn't locked then appctl.py detects it as stale and
bails out without doing anything. Because of this, lots of Python tests fail.
Locking also protects the PID file from being overwritten.
I used only a shared lock, in order to be compatible with the Python tests,
which try to acquire the lock exclusively. On Windows, if an exclusive lock
is used, then read access is denied too for other instances of this file.
Ryan Moats [Sun, 24 Jul 2016 18:36:35 +0000 (18:36 +0000)]
Explain initialization when using csum()
The checksum method csum() requires its output location to be
initialized to zero when that output location is part of the
checksum. Add comments to the various places where csum is
called documenting where the initialization has occurred.
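The requirement is easiest to see with a standalone Internet checksum (RFC 1071). The sketch below is a Python illustration, not the OVS csum() implementation, and the sample header bytes are an arbitrary example.

```python
import struct

def csum(data):
    """RFC 1071 Internet checksum of data, as a 16-bit integer."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole 16-bit word
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# A 20-byte IPv4 header; the checksum field is bytes 10-11.
header = bytearray.fromhex("450000730000400040110000c0a80001c0a800c7")
header[10:12] = b"\x00\x00"                  # field must be zero before summing
header[10:12] = struct.pack("!H", csum(header))
```

Once the field holds the computed value, running csum() over the whole header again gives 0, which is how a receiver verifies it. If the field had held stale data while summing, the stored result would be wrong.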
Signed-off-by: Ryan Moats <rmoats@us.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
ovn-controller: eliminate stall in ofctrl state machine
The "ovn -- 2 HVs, 3 LRs connected via LS, static routes"
test case currently exhibits frequent failures. These failures
occur because, at the time that the test packets are sent to
verify forwarding, no flows have been installed in the vswitch
for one of the hypervisors.
The state machine implemented by ofctrl_run() is intended to
iterate as long as progress is being made, either as long as
the state continues to change or as long as packets are being
received. Unfortunately, the code had a bug: if receiving a
packet caused the state to change, it didn't call the state's
run function again to try to see if it would change the state.
This caused a real problem in the following case:
1) The state is S_TLV_TABLE_MOD_SENT.
2) An OFPTYPE_NXT_TLV_TABLE_REPLY message is received.
3) No event (other than SB probe timer expiration) is expected
that would unblock poll_block() in the main ovn-controller
loop.
In such a case, ofctrl_run() would receive the packet and
advance the state, but not call the run function for the new
state, and then leave the state machine paused until the next
event (e.g. a timer event) occurred.
This commit fixes the problem by continuing to iterate the state
machine until the state remains the same and no packet is
received in the same iteration. Without this fix, around 40
failures are seen out of 100 attempts; with this fix, no failures
have been observed in several hundred attempts (using an earlier
version of this patch).
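The loop condition described above can be sketched on a toy state machine. This is not the ofctrl_run() code: the transition tables are hypothetical, and the state names other than TLV_TABLE_MOD_SENT are chosen only for illustration.

```python
def run_fsm(state, run_fns, recv_fns, inbox):
    """Iterate until the state is unchanged and no message was received."""
    while True:
        old_state = state
        state = run_fns[state]()                  # run function of current state
        msg = inbox.pop(0) if inbox else None
        if msg is not None:
            state = recv_fns[state](msg)          # may advance the state
        if state == old_state and msg is None:
            return state

# Toy transitions (hypothetical).
run_fns = {
    "TLV_TABLE_MOD_SENT": lambda: "TLV_TABLE_MOD_SENT",  # waits for a reply
    "CLEAR_FLOWS": lambda: "UPDATE_FLOWS",               # advances on its own
    "UPDATE_FLOWS": lambda: "UPDATE_FLOWS",
}
recv_fns = {
    "TLV_TABLE_MOD_SENT": lambda msg: (
        "CLEAR_FLOWS" if msg == "TLV_TABLE_REPLY" else "TLV_TABLE_MOD_SENT"),
    "CLEAR_FLOWS": lambda msg: "CLEAR_FLOWS",
    "UPDATE_FLOWS": lambda msg: "UPDATE_FLOWS",
}
final = run_fsm("TLV_TABLE_MOD_SENT", run_fns, recv_fns, ["TLV_TABLE_REPLY"])
```

Because the loop only exits when an iteration neither changes the state nor receives a message, the run function of the state entered by the reply is called in the same invocation, and the machine does not stall until the next unrelated event.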
Signed-off-by: Lance Richardson <lrichard@redhat.com>
[blp@ovn.org refactored for clarity] Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Lance Richardson <lrichard@redhat.com>
ovs-lib: Keep internal interface ip during upgrade.
Commit 9b5422a98f81 ("ovs-lib: Try to call exit before killing.")
introduced a problem where internal interfaces are destroyed and
recreated, losing their IP address.
Commit 9aad5a5a96ba ("ovs-vswitchd: Preserve datapath ports across
graceful shutdown.") fixed the problem by changing ovs-vswitchd
to preserve the ports on `ovs-appctl exit`. Unfortunately, this fix is
not enough during upgrade from <= 2.5.0, where an old ovs-vswitchd is
running (without the fix) and a new ovs-lib script is performing the
restart.
The problem seems to affect both RHEL and Ubuntu.
This commit fixes the upgrade by looking at the running daemon
version and avoiding `ovs-appctl exit` if it's < 2.5.90.
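The version test can be sketched as below. ovs-lib itself is a shell script, so this Python fragment only illustrates the comparison; both function names are hypothetical, and it assumes a purely numeric dotted version string.

```python
def version_tuple(version):
    """Parse a dotted version such as '2.5.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))

def can_use_appctl_exit(running_version):
    """True if the running daemon is new enough to preserve ports on exit."""
    return version_tuple(running_version) >= (2, 5, 90)
```

Comparing integer tuples avoids the pitfall of comparing version strings lexically, where "2.10.0" would sort before "2.5.90".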
Suggested-by: Gurucharan Shetty <guru@ovn.org> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Gurucharan Shetty <guru@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Terry Wilson [Tue, 12 Jul 2016 21:37:34 +0000 (16:37 -0500)]
json: Move from lib to include/openvswitch.
To easily allow both in- and out-of-tree building of the Python
wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to
include/openvswitch. This also requires moving lib/{hmap,shash}.h.
Both hmap.h and shash.h were #include-ing "util.h" even though the
headers themselves did not use anything from there, but rather from
include/openvswitch/util.h. Fixing that required including util.h
in several C files mostly due to OVS_NOT_REACHED and things like
xmalloc.
Signed-off-by: Terry Wilson <twilson@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
These failures were caused by a combination of problems in
handling physical changes:
1. When a vif was removed, the localvif_to_ofport entry was not
removed.
2. When a physical change was detected, ovn-controller would wait
a poll cycle before processing the logical flow table.
This patch set addresses both of these issues while simultaneously
cleaning up the code in physical.c. A side effect is a modification
of where OF flows are dumped in the gateway router case that allowed
the root causes of this issue to be found.
With these changes, all of the above tests had a 100/100 success rate.
Andy Zhou [Fri, 22 Jul 2016 20:49:09 +0000 (13:49 -0700)]
ovsdb: Add ovsdb-client options for testing lock
RFC 7047 lock operation has been fully implemented in ovsdb-server
for a while, but it is not well covered in unit testing. This
patch adds options for the ovsdb-client tool to issue lock operations.
The next patch will make use of those options.
Please see ovsdb-client(1) changes for more details.
Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Benli Ye [Thu, 7 Jul 2016 15:17:48 +0000 (23:17 +0800)]
tests: Fix IPFIX test cases issue.
The IPFIX statistic 'tx pkts' is the number of IPFIX packets sent
successfully, while 'tx errs' is the number of IPFIX packets that
failed to send. Both values can be affected by whether anything is
listening on port 4739 on the local host. This should be solved
properly by introducing PARSE_LISTENING_PORT, as is done for sFlow,
but that depends on implementing IPFIX packet analysis, which will
take some time. Disable these fields for now, because the IPFIX
statistics checks fail on Windows due to the 'tx pkts' and 'tx errs'
fields: Windows marks all packets as sent successfully, even if
nothing is listening on port 4739 on the local host.
Remove the XFAIL check for 'Flow IPFIX sanity check - tunnel set',
as this test had "UNEXPECTED PASS" on Windows.
For more detail, please refer to the following link:
https://www.mail-archive.com/dev@openvswitch.org/msg65229.html
Reported-by: Paul Boca <pboca@cloudbasesolutions.com> Acked-by: Paul Boca <pboca@cloudbasesolutions.com> Signed-off-by: Benli Ye <daniely@vmware.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
netdev-dummy: fix crash with more than one passive connection
Investigation found that some of the occasional failures in the
"ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS" test case are caused
by ovs-vswitchd crashing with SIGSEGV. It turns out that the
crash occurs when the number of netdev-dummy passive connections
transitions from 1 to 2. When xrealloc() copies the array of
dummy_packet_stream structures from the original buffer to a
newly allocated one, the struct ovs_list txq member of the structure
becomes corrupt (e.g. if ovs_list_is_empty() would have returned
true before the copy, it will return false after the copy, which
will lead to a crash when the bogus packet buffer on the list is
dereferenced).
Fix by taking a hint from David Wheeler and adding a level of
indirection.
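Why a bytewise copy corrupts an embedded list head can be shown with a small memory model. This is a simulation with made-up addresses, assuming the usual circular-list convention that an empty list head points at itself; it is not the netdev-dummy code.

```python
memory = {}  # address -> {"next": address, "prev": address}

def list_init(addr):
    """An empty circular list head points at itself."""
    memory[addr] = {"next": addr, "prev": addr}

def list_is_empty(addr):
    return memory[addr]["next"] == addr

def realloc_copy(old_addr, new_addr):
    """Bytewise move: the embedded pointers still hold old_addr."""
    memory[new_addr] = dict(memory.pop(old_addr))

list_init(0x1000)
before = list_is_empty(0x1000)       # head points at itself
realloc_copy(0x1000, 0x2000)
after = list_is_empty(0x2000)        # head now points at freed memory
```

In this model the empty list stops looking empty after the move, because its pointers refer to the old location. Storing pointers to separately allocated structures in the array, so that growing the array moves only the pointers, avoids relocating the list heads at all.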
Signed-off-by: Lance Richardson <lrichard@redhat.com>
[blp@ovn.org folded in an additional bug fix] Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Sat, 2 Jul 2016 01:05:40 +0000 (18:05 -0700)]
ovs-pki: Use SHA-512 instead of SHA-1 as message digest.
The upcoming OpenSSL 1.1.0 release disables use of SHA-1, which breaks the
OVS unit tests, which use SHA-1. We last tried to switch to SHA-512 in
2014 with commit 9ff33ca75e9fcc ("ovs-pki: Use SHA-512 instead of MD5 as
message digest."), but we had to downgrade to SHA-1 in commit 4a1f9610682d
("ovs-pki: Use SHA-1 instead of SHA-512 as message digest.") because
XenServer did not support SHA-512. It has been a few years, so let's try
again.
CC: 828478@bugs.debian.org
Reported-at: https://bugs.debian.org/828478 Reported-by: Kurt Roeckx <kurt@roeckx.be> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
Using sleeps is prone to runtime system dependent races, and indeed
this test started consistently failing on my dev VM after an unrelated
change to ovs-vswitchd. Get rid of the sleeps and explicitly wait for
the transaction on ovsdb1 to become visible on ovsdb2.
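The difference between a fixed sleep and an explicit wait can be sketched with a generic polling helper. The OVS test suite is written with Autotest macros, so this Python function is only an illustration of the pattern; wait_until() and its parameters are hypothetical.

```python
import time

def wait_until(predicate, timeout=10.0, interval=0.05):
    """Poll predicate until it returns true or timeout seconds have passed.

    Returns True on success and False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A wait of this kind returns as soon as the condition holds on a fast machine and keeps waiting on a slow one, whereas a fixed sleep is either too long or too short.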
Also fix the name of the test.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org> Tested-by: Joe Stringer <joe@ovn.org>
Ben Pfaff [Tue, 19 Jul 2016 15:36:35 +0000 (08:36 -0700)]
ovn-northd: Only peer router ports to other router ports.
A router port's "peer", if set, must point to another router port, but the
code as written also accepted switch ports. This caused problems when
switch ports were actually specified.
William Tu [Tue, 19 Jul 2016 00:05:35 +0000 (17:05 -0700)]
netdev-provider: Apply batch object to netdev provider.
Commit 1895cc8dbb64 ("dpif-netdev: create batch object") introduces
batch process functions and 'struct dp_packet_batch' to associate with
batch-level metadata. This patch applies the packet batch object to
the netdev provider interface (dummy, Linux, BSD, and DPDK) so that
batch APIs can be used in providers. With batch metadata visible in
providers, optimizations can be introduced at per-batch level instead
of per-packet.
Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/145694197 Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Joe Stringer [Fri, 17 Jun 2016 19:42:30 +0000 (12:42 -0700)]
debian: Fix OVS upgrade dependencies.
Commit 0dcc739e7a28 ("debian: Move ovs-lib to openvswitch-common.")
shifted a file between debian packages, but didn't update the
destination package annotations to indicate that it replaces a file
from earlier versions of the source package.
As a result, if one installs openvswitch-switch-2.5* (or earlier) and
then tries to upgrade to openvswitch-{switch,common}-2.5.90+, the
install of openvswitch-common will fail like the following:
dpkg: error processing archive
/tmp/openvswitch-common_2.5.90-1_amd64.deb (--install):
trying to overwrite '/usr/share/openvswitch/scripts/ovs-lib', which is
also in package openvswitch-switch 2.5.0-1
Fix the issue by adding "Replaces" and "Breaks" tags to the new
openvswitch-common section of debian/control.
Fixes: 0dcc739e7a28 ("debian: Move ovs-lib to openvswitch-common.") Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Joe Stringer [Tue, 19 Jul 2016 19:54:08 +0000 (12:54 -0700)]
system-traffic: Fix up FTP tests.
Prior to commit b87a5aacefe2 ("datapath: Fix cached ct with helper."),
we were relying on automatic helpers to ensure that FTP connections were
tracked correctly, regardless of the flows that existed in the datapath.
Now, we can drop the automatic helpers in the root namespace and still
have related connections work correctly. Also, the ALG should only be
specified when committing the connection. Update the rules.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>
Joe Stringer [Tue, 19 Jul 2016 19:54:06 +0000 (12:54 -0700)]
system-traffic: Update tests in flat tables.
A few of the earlier tests were written with all flows in a single flat
table. While this is a possible way to write your flows to use
connection tracking, it's easier to understand if the processing
proceeds forward from one table to the next. Update these tests.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>
tunneling: get skb marking to work properly with tunnels
There are two issues that this patch fixes:
1. it was impossible to set skb mark at all through
NXM_NX_PKT_MARK register for tunnel packets; AND
2. ipsec_xxx tunnels would not be marked with the default
IPsec mark (broken by d23df9a87 "lib/odp: Use masked set
actions.").
This patch also adds anti-regression tests to prevent such
breakages in the future.
IPsec: refactor out some code in OVS_MONITOR_IPSEC_START macro
This OVS_MONITOR_IPSEC_START macro will be helpful in the next
patch, where it will also be used from the tests/tunnel.at file to test
that skb marking happens correctly. Otherwise, without ovs-monitor-ipsec
running, ovs-vswitchd would refuse to configure ipsec_XXX tunnels.
Russell Bryant [Thu, 30 Jun 2016 20:14:05 +0000 (16:14 -0400)]
ovn: Apply ACL changes to existing connections.
Prior to this commit, once a connection had been committed to the
connection tracker, the connection would continue to be allowed, even
if the policy defined in the ACL table changed. This patch changes
the implementation so that existing connections are affected by policy
changes.
The implementation is based on the suggested approach in this mailing
list thread:
Instead of always allowing packets associated with an established
connection, we now put all packets in the request direction through
the flows generated based on OVN ACLs. If a packet associated with an
established connection hits a "drop" ACL, that means we have
encountered a policy change and should drop packets associated with
this connection from now on. We handle this by setting "ct_label" on
the associated connection tracking entry.
These changes also account for re-allowing a known connection after
ct_label had been set on it. This can happen if you delete an ACL and
then re-create it while connection state is still known.
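The behaviour described above can be modeled in a few lines. This is a toy model only: the real implementation is a set of logical flows, and the dictionary, function names and label values here are hypothetical.

```python
conntrack = {}  # connection id -> label (0 = allowed, 1 = blocked by policy)

def handle_request_packet(conn, acl_verdict):
    """Process a request-direction packet; return True if it is forwarded."""
    if acl_verdict == "drop":
        if conn in conntrack:
            conntrack[conn] = 1   # known connection hit a "drop" ACL
        return False
    conntrack[conn] = 0           # commit, or re-allow a known connection
    return True

def handle_reply_packet(conn):
    """Replies are forwarded only for known, unblocked connections."""
    return conntrack.get(conn) == 0
```

Sending a request through an "allow" verdict, then a "drop" verdict, then "allow" again takes the entry's label from 0 to 1 and back to 0, matching the delete-and-re-create case mentioned above.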
The proposal on the mailing list also discussed the idea that
ovn-controller could periodically sweep the connection tracker and
delete entries with ct_label set. That is not implemented in this
patch. Instead, we rely on a connection dying because we're dropping
its packets, after which its connection tracking entry
eventually times out. Clearing them out more proactively could be a
future enhancement.
As a realistic example of how this works, consider this security policy
from an OpenStack+OVN development environment.
The OpenStack Neutron plugin creates ACLs that drop traffic by default
and higher priority ACLs for each type of traffic that is allowed. In
this case, the ACLs for a port using the "default" security group are:
One way I tested this was by leaving ping running, ensuring that it was
blocked when the rule for ICMP was deleted, and then re-allowed when
the rule allowing ICMP was restored. In this case, the ICMP
connection is still known by the connection tracker, but the flows
ensure that ct_label gets reset back to 0.
Reported-by: Xiao Li Xu <xiaolixu@cn.ibm.com>
Reported-at: https://bugs.launchpad.net/networking-ovn/+bug/1536080 Suggested-by: Justin Pettit <jpettit@ovn.org> Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Han Zhou <zhouhan@gmail.com> Acked-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org> Tested-by: Babu Shanmugam <bschanmu@redhat.com>
Justin Pettit [Thu, 23 Jun 2016 01:20:08 +0000 (18:20 -0700)]
ovn-util: Add solicited node addresses to ipv6_netaddr.
Every IPv6 host has a link-local solicited node multicast address for
neighbor discovery. This commit defines the solicited node address for
each IPv6 address added to a logical switch or router port.
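For reference, the solicited-node multicast address is formed by appending the low 24 bits of the unicast address to the prefix ff02::1:ff00:0/104 (RFC 4291). The sketch below computes it with Python's ipaddress module; it is an illustration, not the ovn-util code, and the function name is hypothetical.

```python
import ipaddress

def solicited_node(addr):
    """Solicited-node multicast address for an IPv6 unicast address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)

example = str(solicited_node("fe80::f816:3eff:fe12:3456"))
```

Here example is "ff02::1:ff12:3456". Addresses that share their low 24 bits map to the same solicited-node group.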
Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
I presume the flags are supposed to map to neighbor discovery
advertisement "Router", "Solicited", and "Override" flags, which would
be "rso" instead of "rco".
Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Tue, 19 Jul 2016 16:07:13 +0000 (09:07 -0700)]
ovn-northd: Ensure that flows are added to correct types of datapaths.
A DP_TYPE_SWITCH_* flow should only be added to a logical switch datapath,
and a DP_TYPE_ROUTER_* flow should only be added to a logical router
datapath, but the code previously did not verify this and it caused a
problem in practice.
Suggested-by: Guru Shetty <guru@ovn.org>
Suggested-at: http://openvswitch.org/pipermail/dev/2016-July/075557.html Signed-off-by: Ben Pfaff <blp@ovn.org>
FreeBSD returns a socklen the size of sockaddr_storage when doing an accept on a Unix
STREAM socket. The current code assumes this means a sun_path longer than 0.
That breaks some tests like the one below, which don't expect to find "unix::" in
the logs.
As a Linux abstract address would not have a more useful name either, it's
better to check that sun_path starts with a non-zero byte and return 0 length in
case it doesn't.
402: ovs-ofctl replace-flows with --bundle FAILED (ovs-ofctl.at:2928)
2016-07-08T12:44:30.068Z|00020|vconn|DBG|unix:: sent (Success): OFPT_HELLO (OF1.6) (xid=0x1):
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
ovn-sbctl: eliminate a spurious test case error cause
The "ovn-sbctl" test fails occasionally due to log messages
similar to these:
jsonrpc|WARN|unix: receive error: Connection reset by peer
reconnect|WARN|unix: connection dropped (Connection reset by peer)
Since we're already ignoring "Broken pipe" messages in this test
case, and the difference between EPIPE and ECONNRESET on send
is simply a matter of whether the peer had unconsumed data
in its receive buffer when the peer socket was closed, it should
be OK to ignore "reset by peer" logs as well.
This same type of failure has been observed in ovn-nbctl and
ovn-vtep-controller tests, so fix it there as well.
Signed-off-by: Lance Richardson <lrichard@redhat.com> Acked-by: Ryan Moats <rmoats@us.ibm.com> Signed-off-by: Russell Bryant <russell@ovn.org>
Add an IDL API that allows the user to add and remove clauses on a table's condition
incrementally. The IDL maintains each table's condition and sends monitor_cond_change to the server
when the condition changes.
Add tests for conditional monitoring to IDL.
Signed-off-by: Liran Schour <lirans@il.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
The IDL now uses a UUID to identify a monitoring session; it is
sent to the server in the "monitor_cond" request.
This UUID is then used to issue subsequent "monitor_cond_change" requests
for this monitoring session.
Signed-off-by: Liran Schour <lirans@il.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Add monitor_cond method to ovsdb-client. Enable testing of monitor_cond_change
via unixctl command. Add unit tests for monitor_cond and monitor_cond_change.
See ovsdb-client(1) man page for details.
Replace monitor2 with monitor_cond.
Signed-off-by: Liran Schour <lirans@il.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Optimize ovsdb_condition_match_any_clause() to run in O(#columns in condition)
rather than O(#clauses) when the condition's clauses use a boolean or "==" function.
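One way to get a per-column cost for "==" clauses is to index the clause values by column, as sketched below. This illustrates the complexity argument only and is not the ovsdb-server data structure; the function names are hypothetical and values are assumed to be hashable.

```python
from collections import defaultdict

def index_eq_clauses(clauses):
    """Group the values of '==' clauses by column name."""
    index = defaultdict(set)
    for column, function, value in clauses:
        assert function == "=="          # this sketch handles "==" only
        index[column].add(value)
    return index

def match_any_clause(row, index):
    """True if row satisfies at least one clause.

    Costs one set lookup per column in the condition, however many
    clauses were indexed.
    """
    return any(row.get(column) in values for column, values in index.items())

clauses = [["name", "==", "a"], ["name", "==", "b"], ["port", "==", 1]]
index = index_eq_clauses(clauses)
```

With three clauses over two columns, match_any_clause() performs at most two lookups per row; adding more "name" clauses grows a set but not the number of lookups.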
Signed-off-by: Liran Schour <lirans@il.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
ovsdb: enable jsonrpc-server to service "monitor_cond_change" request
ovsdb-server now accepts the "monitor_cond_change" request. After the conditions change,
we compose the update notification from the current state of the
database, without using a change list, before sending the reply to the monitor_cond_change
request.
See the ovsdb-server(1) man page for details of monitor_cond_change.
Signed-off-by: Liran Schour <lirans@il.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
ovsdb: generate update notifications for monitor_cond session
Hold session's conditions in ovsdb_monitor_session_condition. Pass it
to ovsdb_monitor for generating "update2" notifications.
Add functions that can generate "update2" notification for a
"monitor_cond" session.
The JSON cache is enabled only for sessions with a true condition.
"monitor_cond" and "monitor_cond_change" are RFC 7047 extensions
described by ovsdb-server(1) manpage.
Performance evaluation:
OVN is the main candidate for conditional monitoring usage. It is clear that
conditional monitoring reduces computation on the ovn-controller (client) side
due to the reduced size of flow tables and update messages. Performance
evaluation shows up to 75% computation reduction.
However, performance evaluation shows also a reduction in computation on the SB
ovsdb-server side proportional to the degree that each logical network is
spread over physical hosts in the DC. Evaluation shows that in realistic
scenarios there is a computation reduction on the server side as well.
Evaluation on simulated environment of 50 hosts and 1000 logical ports shows
the following results (cycles #):
ovsdb: allow unmonitored columns in condition evaluation
This commit allows unmonitored columns to be added to a monitored table
as the result of a condition update.
They are used to evaluate conditions on unmonitored columns.
Update notifications include only monitored columns.
Because the number of columns is limited, we do not remove unused unmonitored
columns on condition update, for code simplicity.
Signed-off-by: Liran Schour <lirans@il.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
ovsdb: add conditions utilities to support monitor_cond
Change ovsdb_condition to be a 3-element json array or a boolean value (see ovsdb-server
man page).
Conditions utilities will be used later for conditional monitoring.
Signed-off-by: Liran Schour <lirans@il.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ryan Moats [Mon, 18 Jul 2016 21:21:16 +0000 (16:21 -0500)]
ovn-controller: Persist ovn flow tables
Ensure that ovn flow tables are persisted so that changes to
them can be applied incrementally. This is a prerequisite for
making lflow_run and physical_run incremental.
As part of this change, add a one-to-many hindex for finding
desired flows by their parent's UUID. Also extend the mapping
by match from one-to-one to one-to-many.
Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
[blp@ovn.org adjusted style and comments and added
HINDEX_FOR_EACH_WITH_HASH_SAFE] Signed-off-by: Ben Pfaff <blp@ovn.org>
Russell Bryant [Mon, 18 Jul 2016 20:25:20 +0000 (16:25 -0400)]
ovn-controller: Drop remove_local_datapath_by_binding().
ovn-controller has an hmap called 'local_datapaths' which tracks
all OVN datapaths that have at least one port binding on the local
chassis. This patch corrects the case where a port binding row is
deleted from the southbound DB while it's still bound to the chassis,
meaning it was deleted before the ovs interface was deleted.
The previous code tried to handle this case by calling
remove_local_datapath_by_binding(). The function appears to try
to look up local_datapath by the binding UUID. If it finds it,
it will delete the local datapath entry. On the surface, this
looks like a bug where it deletes a local datapath entry even
when there could be other ports still bound to the chassis.
The reality is that this function was always a no-op. It was
doing a lookup using a different hash value than how local_datapath
entries are actually hashed. In practice, this wasn't a big problem
because local_datapaths are correctly cleaned up in the
process_full_binding case after an ovs interface is added or removed.
The new change ensures that we run the process_full_binding code
in this case right away, even if the interface is not deleted.
Fixes: 263064aeaa31 ("Convert binding_run to incremental processing.") Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
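Why a lookup with the wrong hash is a silent no-op can be shown with a toy Python model of a map where the caller supplies the hash. The `Hmap` class is a hypothetical stand-in, and the choice of a tunnel key for insertion versus a UUID-derived hash for lookup is an assumption made for illustration; the commit message does not say which values were involved.

```python
class Hmap:
    """Toy hash map in which the caller provides the hash value."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def insert(self, hash_value, item):
        self.buckets[hash_value % len(self.buckets)].append((hash_value, item))

    def find(self, hash_value, predicate):
        # Only entries stored under the same hash are ever examined.
        for stored_hash, item in self.buckets[hash_value % len(self.buckets)]:
            if stored_hash == hash_value and predicate(item):
                return item
        return None


local_datapaths = Hmap()
entry = {"tunnel_key": 7, "uuid": "dp-uuid"}
local_datapaths.insert(7, entry)  # assumed: hashed by tunnel key


def same_uuid(e):
    return e["uuid"] == "dp-uuid"


found = local_datapaths.find(7, same_uuid)      # same hash: entry located
missed = local_datapaths.find(1234, same_uuid)  # different hash: nothing found
print(found is entry, missed)
```

The second lookup fails even though the predicate would accept the entry, so any deletion guarded by it never runs.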
datapath: compat: Do not use upstream fill-meta-data function for compat tunnel
The upstream dev_fill_metadata_dst() uses the upstream tunnel-dst, which
could be different from the OVS-defined tun-dst. Therefore use the
fill-meta-data function from the compat layer.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
openvswitch: allow output of MPLS packets on tunnel vports
Currently output of MPLS packets on tunnel vports is not allowed by Open
vSwitch. This is because historically encapsulation was done in such a way
that the inner_protocol field of the skb needed to hold the inner protocol
for both MPLS and tunnel encapsulation in order for GSO segmentation to be
performed correctly.
Since b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of
vport") Open vSwitch makes use of lwt to output to tunnel netdevs which
perform encapsulation. As no drivers expose support for MPLS offloads this
means that GSO packets are segmented in software by validate_xmit_skb(),
which is called from __dev_queue_xmit(), before tunnel encapsulation occurs.
This means that the inner protocol of MPLS is no longer needed by the time
encapsulation occurs and the contention on the inner_protocol field of the
skb no longer occurs.
Thus it is now safe to output MPLS to tunnel vports.
Signed-off-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jesse Gross <jesse@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
API changes are related commit:
openvswitch: Revert: "Enable memory mapped Netlink i/o"
Revert commit 795449d8b846 ("openvswitch: Enable memory mapped Netlink i/o").
Following the removal of mmapped netlink, this code can be removed.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
Russell Bryant [Fri, 15 Jul 2016 23:29:55 +0000 (19:29 -0400)]
ovn-controller: Remove local_datapaths_by_uuid.
binding.c included a static local_datapaths_by_uuid but it was not used
for anything. In fact, the hash node used when inserting into this hmap
is overwritten in another code path for a different hmap.
Fixes: 263064aeaa31 ("Convert binding_run to incremental processing.") Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
Russell Bryant [Tue, 12 Jul 2016 17:33:08 +0000 (13:33 -0400)]
ovn-controller: Clean up bindings handling.
Remove the global set of logical port IDs called 'all_lports'. This is
no longer used for anything after conntrack ID assignment was moved out
of binding.c.
Remove the global smap of logical port IDs to ovsrec_interface records.
We can't persist references to these records, as we may be holding
references to freed memory. Instead, replace it with a new global sset
of logical port IDs called 'local_ids'. This is used to track when
interfaces have been added or removed. We also build a temporary
shash of logical port IDs to ovs interfaces used for fast lookup
of the right interface as needed.
Found by inspection.
Fixes: a478c4efef4d ("ovn-controller: Refactor conntrack zone allocation.") Fixes: 263064aeaa31 ("Convert binding_run to incremental processing.") Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
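The bookkeeping described in this commit can be sketched in Python: persist only the set of logical port IDs and rebuild the ID-to-interface lookup on each run. The function name and data shapes are hypothetical, and this is not the binding.c code.

```python
def sync_local_ids(local_ids, interfaces):
    """Update the persistent ID set from this run's interfaces.

    interfaces: iterable of (logical_port_id, interface_record) pairs.
    Returns (added, removed, lookup); lookup is only valid for this run.
    """
    lookup = dict(interfaces)   # temporary, rebuilt on every run
    current = set(lookup)
    added = current - local_ids
    removed = local_ids - current
    local_ids.clear()
    local_ids.update(current)   # persist IDs only, never the records
    return added, removed, lookup


ids = {"lp1", "lp2"}
added, removed, lookup = sync_local_ids(
    ids, [("lp2", "iface-b"), ("lp3", "iface-c")]
)
print(sorted(added), sorted(removed))
```

Keeping only strings across runs avoids holding references to records that may have been freed in the meantime.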
ovn.at: A "peer" is only for interconnected routers.
We should not use "peer" when connecting a router to a switch.
(Doing so causes ovn-northd to constantly create and destroy
logical_flow records, which makes the CPU utilization of ovn-controller
spike.)
Fixes: 31114af758c7e6 ("ovn-nbctl: Update logical router port commands.") Signed-off-by: Gurucharan Shetty <guru@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org> Acked-by: Flavio Fernandes <flavio@flaviof.com>
system-ovn.at: Add an OVN NAT test using OVN gateway.
This unit test adds a basic OVN NAT test that covers north-south
DNAT, south-north SNAT, and east-west DNAT and SNAT. It uses network
namespaces connected to br-int using veth pairs to act as logical
ports. This test does not cover multi-host scenarios, so there is
a gap. However, the userspace OVN tests do cover multi-host scenarios
(without NAT testing), so overall coverage should still be decent.
Signed-off-by: Gurucharan Shetty <guru@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
In the case of CHECKSUM_COMPLETE the skb checksum should be updated in
{push,pop}_mpls() as they change the type in the ethernet header.
As suggested by Pravin Shelar.
Cc: Pravin Shelar <pshelar@ovn.org> Fixes: 25cd9ba0abc0 ("openvswitch: Add basic MPLS support to kernel") Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
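The arithmetic behind such an update can be sketched in Python: when one 16-bit word covered by a ones' complement sum changes, the sum can be adjusted without rescanning the data. This shows the general technique only, not the kernel code; the helper names and sample frame words are hypothetical, and which bytes the skb checksum actually covers is not stated here.

```python
def fold(total):
    """Fold carries back in to get a 16-bit ones' complement sum."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum(words):
    """Ones' complement sum of a sequence of 16-bit words."""
    return fold(sum(words))


def csum_replace(old_sum, old_word, new_word):
    """Adjust a sum after one word changes: add ~old, then add new."""
    return fold(old_sum + (~old_word & 0xFFFF) + new_word)


ETH_P_IP, ETH_P_MPLS_UC = 0x0800, 0x8847
frame = [0x0011, 0x2233, ETH_P_IP, 0x4500, 0x0054]  # made-up sample words
before = csum(frame)
frame[2] = ETH_P_MPLS_UC  # the ethertype changes, as on an MPLS push
updated = csum_replace(before, ETH_P_IP, ETH_P_MPLS_UC)
print(updated == csum(frame))
```

If the stored sum were left untouched after the ethertype change, it would no longer match the data it is supposed to describe.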