OVN currently supports multiple gateway routers (residing on
different chassis) connected to the same logical topology.
When external traffic enters the logical topology, it can enter
through any of the gateway routers and reach its eventual destination. This
is achieved with proper static routes configured on the gateway
routers.
But when traffic is initiated in the logical space by a logical
port, we do not have a good way to distribute that traffic across
multiple gateway routers.
This commit introduces one particular way to do it. Based on the
source IP address or source IP network of the packet, we can now
jump to a specific gateway router.
This is very useful for a specific use case of Kubernetes.
When traffic initiated inside a container is headed to the outside world,
we want to be able to send it out through the gateway router
residing on the same host as the container. Since each
host gets a specific subnet, we can use source IP address based
policy routing to decide on the gateway router.
Rationale for using the same routing table for both source and
destination IP address based routing:
Some hardware network vendors support policy routing in a different table
on arbitrary "match". And when a packet enters, if there is a match
in policy based routing table, the default routing table is not
consulted at all. In case of OVN, we mainly want policy based routing
for north-south traffic. We want east-west traffic to flow as-is. Creating
a separate table for policy based routing complicates the configuration
quite a bit. For example, if we add a source IP network based rule that
selects a particular gateway router as the next hop, we must also add rules at
a higher priority for all the connected routes to make sure that east-west
traffic is not affected in the policy based routing table itself.
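To make the single-table idea concrete, here is a minimal sketch. It is not
OVN's implementation. It assumes that routes are ranked by prefix length and
that, at equal prefix length, a destination-based route outranks a
source-based one. The types, the route_lookup() and example_next_hop()
helpers, the addresses, and the gateway identifiers are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Which packet field a route matches on. */
enum route_policy { ROUTE_DST_IP, ROUTE_SRC_IP };

struct route {
    enum route_policy policy;
    uint32_t prefix;     /* Network address, host byte order. */
    unsigned int plen;   /* Prefix length, 0..32. */
    int next_hop;        /* Hypothetical ID: 0 = connected, N = gateway N. */
};

static uint32_t
ip4(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    return (a << 24) | (b << 16) | (c << 8) | d;
}

static uint32_t
plen_to_mask(unsigned int plen)
{
    assert(plen <= 32);
    return plen ? UINT32_MAX << (32 - plen) : 0;
}

/* Returns the best matching route, or NULL if none matches. */
static const struct route *
route_lookup(const struct route *routes, size_t n, uint32_t src, uint32_t dst)
{
    const struct route *best = NULL;
    unsigned int best_prio = 0;

    for (size_t i = 0; i < n; i++) {
        const struct route *r = &routes[i];
        uint32_t addr = r->policy == ROUTE_SRC_IP ? src : dst;
        uint32_t mask = plen_to_mask(r->plen);

        if ((addr & mask) != (r->prefix & mask)) {
            continue;
        }
        /* Assumed ranking: longer prefix first, destination beats source. */
        unsigned int prio = r->plen * 2 + (r->policy == ROUTE_DST_IP);
        if (!best || prio > best_prio) {
            best = r;
            best_prio = prio;
        }
    }
    return best;
}

/* Example table: two connected subnets plus one source-based route each. */
static int
example_next_hop(uint32_t src, uint32_t dst)
{
    const struct route routes[] = {
        { ROUTE_DST_IP, ip4(10, 1, 1, 0), 24, 0 },
        { ROUTE_DST_IP, ip4(10, 1, 2, 0), 24, 0 },
        { ROUTE_SRC_IP, ip4(10, 1, 1, 0), 24, 1 },
        { ROUTE_SRC_IP, ip4(10, 1, 2, 0), 24, 2 },
    };
    const struct route *r = route_lookup(routes,
                                         sizeof routes / sizeof routes[0],
                                         src, dst);
    return r ? r->next_hop : -1;
}
```

With this table, traffic from 10.1.1.5 to 10.1.2.7 follows the connected
route, so east-west traffic is untouched, while traffic from 10.1.1.5 to
8.8.8.8 matches only the source-based route and is sent to gateway 1.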
Signed-off-by: Gurucharan Shetty <guru@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
ovn-controller: Container can have connection to a hosting VM.
A Container running inside a VM can have a connection to the
hosting VM (parent port) in the logical topology (e.g., via a router).
So we should be able to loop-back into the same VM, even if the
final packet delivered does not have any tags in it.
Add a connection table to the southbound db schema, similar
to the Open_vSwitch "Manager" table.
Add tests for pssl: and ptcp: read-only connection types.
Add support to ovn-sbctl for listing the SB Connection table.
Potential future work:
- Test cases for other connection types (punix, ssl, tcp, unix).
- SSL configuration table for southbound db.
- Connection table for NB schema.
- Add a way to specify a read-only connection as an ovsdb-server
command-line option.
Signed-off-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Pravin B Shelar [Tue, 1 Nov 2016 19:06:15 +0000 (12:06 -0700)]
datapath: geneve: Handle vlan tag
The compat vlan code ignores vlan tag for inner packet
on egress path. Following patch fixes this by inserting the
tag for inner packet before tunnel encapsulation.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Ben Pfaff [Mon, 31 Oct 2016 21:33:13 +0000 (14:33 -0700)]
ofproto-dpif: Log warning when ct action or its variants are not supported.
Some datapaths do not support the ct action, and others support only a
subset of its features. Until now, it has been difficult to tell why a
particular action is being rejected. This commit should make it clearer.
Reported-by: Kevin Lin <kevinlin@berkeley.edu>
Reported-at: http://openvswitch.org/pipermail/discuss/2016-October/023060.html Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
When vxlan device is closed vxlan socket is freed. This
operation can race with vxlan-xmit function which
dereferences vxlan socket. Following patch uses RCU
mechanism to avoid this situation.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Pravin B Shelar [Sun, 30 Oct 2016 04:33:06 +0000 (21:33 -0700)]
lisp: avoid using stale lisp socket.
This patch is similar to earlier vxlan patch.
Lisp device close operation frees lisp socket. This
operation can race with lisp-xmit function which
dereferences lisp socket. Following patch uses RCU
mechanism to avoid this situation.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
This patch is similar to earlier vxlan patch.
Geneve device close operation frees geneve socket. This
operation can race with geneve-xmit function which
dereferences geneve socket. Following patch uses RCU
mechanism to avoid this situation.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
ifnotifier: do not wake up when there is no db connection
When the bridge uses the interface notifier, it keeps waking up until a
reconfiguration takes place. However, if there is no connection to the
database, or there is contention for its lock, the check for reconfiguration
never takes place.
This uses a seq and only calls seq_wait when checking for interface changes.
This is easily reproduced by starting ovs-vswitchd without starting
ovsdb-server, and then creating a new system interface, like using
'ip link add type veth'. ovs-vswitchd will then consume 100% CPU.
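How a sequence number lets a consumer detect "something changed since I last
looked" can be illustrated with a toy counter. This is a sketch only: the
toy_seq and toy_waiter types and their helpers are hypothetical stand-ins,
not the OVS seq API, and a real implementation would block in the poll loop
rather than return a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A counter that is bumped whenever something of interest changes. */
struct toy_seq {
    uint64_t value;
};

/* Called by the notifier when an interface is added, removed, or changed. */
static void
toy_seq_change(struct toy_seq *seq)
{
    assert(seq);
    seq->value++;
}

/* The consumer remembers the last value it acted on. */
struct toy_waiter {
    uint64_t last_seen;
};

/* Returns true once per batch of changes.  Repeated calls return false
 * until toy_seq_change() is called again. */
static bool
toy_ifaces_changed(struct toy_waiter *w, const struct toy_seq *seq)
{
    assert(w && seq);
    if (seq->value == w->last_seen) {
        return false;
    }
    w->last_seen = seq->value;
    return true;
}
```

Because the waiter compares against the value it last saw, several changes
in a row are reported once, and nothing is reported when no change occurred.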
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Shashank Ram [Mon, 10 Oct 2016 22:15:05 +0000 (15:15 -0700)]
datapath-windows: Set isActivated flag only on success
@Switch.c: Modifies the OvsActivateSwitch() function
to mark the switch as activated only if the
status is success. The callers themselves
only call this function when the isActivated
flag is unset.
Mauricio Vasquez [Fri, 21 Oct 2016 04:51:24 +0000 (23:51 -0500)]
doc: v2: fix bad link to dpdk advance installation guide
Previous fix was also wrong.
Fixes: 167703d ("doc: Convert INSTALL.DPDK to rST") Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Stephen Finucane <stephen@that.guru> Signed-off-by: Russell Bryant <russell@ovn.org>
Jarno Rajahalme [Thu, 20 Oct 2016 22:22:14 +0000 (15:22 -0700)]
datapath: Support a fixed size of 128 distinct labels.
Port upstream change in conntrack labels extension. Add a new
configure macro HAVE_NF_CONN_LABELS_WITH_WORDS to detect the old
definition. Unfortunately there is no conntrack API to hide the
difference, so this makes conntrack.c deviate from upstream source
a bit.
netfilter: conntrack: support a fixed size of 128 distinct labels
The conntrack label extension is currently variable-sized, e.g. if
only 2 labels are used by iptables rules then the labels->bits[] array
will only contain one element.
We track size of each label storage area in the 'words' member.
But in nftables and openvswitch we always have to ask for worst-case
since we don't know what bit will be used at configuration time.
As most arches are 64bit we need to allocate 24 bytes in this case:
Make bits a fixed size and drop the words member, it simplifies
the code and only increases memory requirements on x86 when
less than 64bit labels are required.
We still only allocate the extension if it's needed.
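As a rough userspace illustration of a fixed-size label bitmap like the one
described above (a sketch, not the kernel definition: the toy_ names and the
helper functions are hypothetical, and unsigned long is used as the word type
by assumption):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define TOY_LABELS_MAX 128    /* Number of distinct labels. */
#define TOY_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/* Fixed-size storage: always room for all 128 label bits, with no separate
 * 'words' member to track how much of the array is in use. */
struct toy_conn_labels {
    unsigned long bits[TOY_LABELS_MAX / TOY_BITS_PER_LONG];
};

static void
toy_label_set(struct toy_conn_labels *l, unsigned int bit)
{
    assert(bit < TOY_LABELS_MAX);
    l->bits[bit / TOY_BITS_PER_LONG] |= 1UL << (bit % TOY_BITS_PER_LONG);
}

static bool
toy_label_test(const struct toy_conn_labels *l, unsigned int bit)
{
    assert(bit < TOY_LABELS_MAX);
    return (l->bits[bit / TOY_BITS_PER_LONG] >> (bit % TOY_BITS_PER_LONG)) & 1;
}
```

Because the array length is a compile-time constant, any label number below
128 can be set without first checking how many words were allocated.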
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org>
Flavio Leitner [Tue, 18 Oct 2016 17:04:42 +0000 (15:04 -0200)]
fedora: do not restart the service on a pkg upgrade
There is no reliable way to restore the previous networking
state after a service restart. Many things like firewall
configuration, traffic shaping, stacked devices, custom setups
are completely out of OVS control.
The OVS might be providing the network used for remote
administration, so do not automatically restart the service
during a package upgrade.
Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Russell Bryant <russell@ovn.org>
Ben Pfaff [Wed, 7 Sep 2016 16:04:46 +0000 (09:04 -0700)]
ovsdb-idl: Check internal graph in OVSDB tests.
Some upcoming tests will add extra trickiness to the IDL internal graph.
This worries me, because the IDL doesn't have any checks for its graph
consistency. This commit adds some.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org>
Ben Pfaff [Tue, 30 Aug 2016 23:58:44 +0000 (16:58 -0700)]
ovsdb-idlc: Remove special case for "sizeof bool".
The "sparse" checker used to warn about sizeof(bool). These days, it does
not warn (without -Wsizeof-bool), so remove this ugly special case.
If you have a version of "sparse" that still warns by default, please
upgrade to a version that includes commit 2667c2d4ab33 (sparse: Allow
override of sizeof(bool) warning).
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org>
Ben Pfaff [Wed, 31 Aug 2016 18:42:53 +0000 (11:42 -0700)]
ovsdb-idl: Sort and unique-ify datum in ovsdb_idl_txn_write().
I noticed that there were lots of calls to ovsdb_datum_sort_unique() from
"set" functions in generated IDL code. This moves that call into common
code, reducing redundancy.
There are more calls to the same function that are a little harder to
remove.
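The operation itself is simple to picture. The following sketch sorts an
array of integers and drops duplicates in place. It only illustrates the
technique and is not the implementation of ovsdb_datum_sort_unique(), which
works on OVSDB data rather than plain ints. The names toy_sort_unique() and
compare_ints() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int
compare_ints(const void *a_, const void *b_)
{
    int a = *(const int *) a_;
    int b = *(const int *) b_;
    return a < b ? -1 : a > b;
}

/* Sorts 'values' and removes duplicates in place.  Returns the new length. */
static size_t
toy_sort_unique(int *values, size_t n)
{
    assert(values || !n);
    if (n < 2) {
        return n;
    }
    qsort(values, n, sizeof *values, compare_ints);

    /* Keep each element that differs from the last one kept. */
    size_t dst = 1;
    for (size_t src = 1; src < n; src++) {
        if (values[src] != values[dst - 1]) {
            values[dst++] = values[src];
        }
    }
    return dst;
}
```

Doing this once in a common write path means that individual setters do not
each have to repeat the call.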
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org>
Ben Pfaff [Wed, 24 Aug 2016 19:38:39 +0000 (12:38 -0700)]
ovsdb-idlc: Declare loop variables in for statements in generated code.
This changes several instances of
size_t i;
for (i = 0; i < ...; i++)
into:
for (size_t i = 0; i < ...; i++)
in generated code, making it slightly more compact and easier to read.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org>
Ben Pfaff [Wed, 24 Aug 2016 19:32:59 +0000 (12:32 -0700)]
ovsdb-idlc: Make generated references to columns easier to read.
This replaces ovsrec_open_vswitch_columns[OVSREC_OPEN_VSWITCH_COL_CUR_CFG]
by the easier to read and equivalent ovsrec_open_vswitch_col_cur_cfg in
generated code.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org>
Ben Pfaff [Wed, 24 Aug 2016 18:47:56 +0000 (11:47 -0700)]
ovsdb-idlc: Simplify code generation to parse sets and maps of references.
This switches from code that looks like:
if (keyRow) {
...
}
to:
if (!keyRow) {
continue;
}
...
which is a little easier to generate because the indentation of ... is
constant.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org>
Ben Pfaff [Thu, 1 Sep 2016 05:03:27 +0000 (22:03 -0700)]
ovsdb-idlc: Use ovsdb_datum_from_smap() instead of open-coding it.
There's no reason to have three copies of this code for every smap-type
column.
The code wasn't a perfect match for ovsdb_datum_from_smap(), so this commit
also changes ovsdb_datum_from_smap() to better suit it. It only had one
caller and the new design is adequate for that caller.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org>
Mauricio Vasquez [Tue, 18 Oct 2016 22:23:12 +0000 (17:23 -0500)]
doc: fix bad link to dpdk advance installation guide
The link was pointing to a wrong place after the file was converted to rst.
Fixes: 167703d664fc ("doc: Convert INSTALL.DPDK to rST") Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Stephen Finucane <stephen@that.guru> Signed-off-by: Russell Bryant <russell@ovn.org>
YAMAMOTO Takashi [Tue, 18 Oct 2016 11:31:55 +0000 (11:31 +0000)]
ovn-controller.at: Stop hardcoding a list of iface types
The list of supported iface types hardcoded in the test
is wrong on NetBSD. (or any userland-only ports I guess)
Instead of adding another case for NetBSD following WIN32,
just get the list from ovsdb.
Signed-off-by: YAMAMOTO Takashi <yamamoto@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
This will eventually go away once Sphinx starts doing all this work for
us. For now, however, let's make sure we don't break the OVS website.
This introduces a new dependency for the dist-docs script - 'rst2html'.
This tool is packaged on Ubuntu, Fedora (via 'python-docutils'), etc.
and can be installed from pip using the 'docutils' package.
Signed-off-by: Stephen Finucane <stephen@that.guru> Signed-off-by: Russell Bryant <russell@ovn.org>
By reordering the data elements in the dpif_upcall structure, pad bytes can
be reduced and a cache line saved. Also, dp_packet should be the first
member of the structure because rte_mbuf, the first member of dp_packet,
should be aligned on at least a 64-byte boundary.
Before: structure size:768, holes:1, sum padbytes:60, cachelines:12
After: structure size:704, holes:1, sum padbytes:4, cachelines:11
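The effect of member ordering on padding can be reproduced with a small,
unrelated example. The structures below are hypothetical and much smaller
than dpif_upcall. Exact sizes depend on the ABI, so the sizes in the comments
are typical x86-64 values only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small members on either side of a large one: the compiler pads after 'a'
 * so that 'b' is aligned, and again after 'c' to round up the total size. */
struct toy_padded {
    uint8_t a;
    uint64_t b;
    uint8_t c;
};                          /* Typically 24 bytes on x86-64. */

/* Same members, largest first: 'a' and 'c' share the space after 'b'. */
struct toy_reordered {
    uint64_t b;
    uint8_t a;
    uint8_t c;
};                          /* Typically 16 bytes on x86-64. */

/* Returns how many bytes the reordering saves on this platform. */
static size_t
toy_bytes_saved(void)
{
    assert(sizeof(struct toy_reordered) <= sizeof(struct toy_padded));
    return sizeof(struct toy_padded) - sizeof(struct toy_reordered);
}
```

On real structures, a tool such as pahole reports holes, padding, and cache
lines in the same terms as the before and after figures quoted above.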
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Co-authored-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com> Acked-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
hash: Skip Invoking mhash_add__() with zero input.
mhash_add__() is expensive and should only be called with valid input.
A zero-valued 'data' does not affect the 'hash' value, so the expensive hash
computation can be skipped when the input is zero.
This patch will validate the input in mhash_add__ to save some cpu
cycles.
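Why zero input leaves the hash unchanged can be seen in a small sketch. It
assumes a MurmurHash3-style 32-bit mixing step (multiply, rotate, multiply,
then XOR into the hash). The toy_ function names are hypothetical, and
whether the extra branch actually saves cycles depends on the workload.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t
toy_rotl32(uint32_t x, unsigned int k)
{
    assert(k > 0 && k < 32);
    return (x << k) | (x >> (32 - k));
}

/* MurmurHash3-style mixing of one 32-bit word into 'hash'.  If 'data' is
 * zero, every intermediate value is zero and 'hash' is returned unchanged. */
static uint32_t
toy_mix(uint32_t hash, uint32_t data)
{
    data *= 0xcc9e2d51;
    data = toy_rotl32(data, 15);
    data *= 0x1b873593;
    return hash ^ data;
}

/* Same result as toy_mix(), but skips the multiplications and the rotation
 * when 'data' is zero. */
static uint32_t
toy_mix_skip_zero(uint32_t hash, uint32_t data)
{
    return data ? toy_mix(hash, data) : hash;
}
```

Both functions return identical values for every input, so the shortcut
changes only the cost of the call, not the resulting hash.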
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Co-authored-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Ben Pfaff [Fri, 14 Oct 2016 18:12:43 +0000 (11:12 -0700)]
stream-ssl: Fix memory leak on error path.
The commit that this fixes is from 2009.
Reported-by: Kai-Wei Fan <fank@vmware.com> Fixes: 9467fe624698 ("Add SSL support to "stream" library and OVSDB.") Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org>
Amitabha Biswas [Wed, 12 Oct 2016 21:36:57 +0000 (14:36 -0700)]
Python-IDL: getattr after mutate fix
This commit returns the updated column value when getattr is done
after a mutate operation is performed (but before the commit).
Signed-off-by: Amitabha Biswas <azbiswas@gmail.com> Reported-by: Richard Theis <rtheis@us.ibm.com>
Reported-at: http://openvswitch.org/pipermail/dev/2016-September/080120.html Fixes: a59912a0ee8e ("python: Add support for partial map and set updates") Signed-off-by: Russell Bryant <russell@ovn.org>
Aaron Conole [Fri, 7 Oct 2016 17:36:45 +0000 (13:36 -0400)]
rhel-systemd: Delay shutting down the services
During testing it was found that systemd would consider the openvswitch
service as part of the networking component, but the dependent services
ovs-vswitchd and ovsdb-server were not likewise considered. This leads
to some strange race conditions, observed when using NFS over TCP, while
shutting down systems.
Mark Kavanagh [Thu, 6 Oct 2016 10:25:33 +0000 (11:25 +0100)]
doc: Update DPDK pdump documentation
The DPDK pdump sample app was renamed from 'dpdk_pdump' to
'dpdk-pdump'. Update references to same within
INSTALL.DPDK-ADVANCED.md.
Add an additional sample command line that shows how to capture all
traffic traversing an interface within a single pcap file - a useful
tool for debugging DPDK-related issues.
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Ciara Loftus [Thu, 13 Oct 2016 17:27:51 +0000 (18:27 +0100)]
dpdk: Fix DPDK pdump compilation
The rte_pdump header file was not included in the file that requires it.
Fix this.
Fixes: 01961bbdd34a ("dpdk: New module with some code from netdev-dpdk.") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Since DPDK commit 30e639989227 ("mempool: support non-EAL thread"),
non-EAL threads can use the mempool API safely. In addition, access to
netdev from non-PMD threads is already serialized with 'non_pmd_mutex' in
dpif-netdev.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Tested-by: Aaron Conole <aconole@redhat.com>
netdev-dpdk: Do not abort if out of hugepage memory.
We can run out of hugepage memory coming from rte_*alloc() more easily
than heap coming from malloc().
Therefore:
* We should not use hugepage memory if we're going to access it only in
the slowpath.
* We shouldn't abort if we're out of hugepage memory.
* We should gracefully handle out of memory conditions.
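The last two points amount to returning an error instead of aborting when an
allocation fails. The sketch below shows that pattern in isolation. It does
not use the DPDK or netdev-dpdk APIs: the toy_dev structure, the allocator
callback, and the 64-byte per-queue size are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocator signature, so that a limited pool can be swapped in. */
typedef void *toy_alloc_fn(size_t size);

struct toy_dev {
    void *tx_queues;
};

/* Returns 0 on success or ENOMEM if the allocation fails.  On failure,
 * 'dev->tx_queues' is NULL and the caller decides how to proceed. */
static int
toy_dev_init(struct toy_dev *dev, size_t n_queues, toy_alloc_fn *alloc)
{
    assert(dev && alloc && n_queues > 0);
    dev->tx_queues = alloc(n_queues * 64);
    if (!dev->tx_queues) {
        return ENOMEM;      /* Report the error; do not abort(). */
    }
    return 0;
}

/* Stand-in for an exhausted hugepage pool: every allocation fails. */
static void *
toy_alloc_exhausted(size_t size)
{
    (void) size;
    return NULL;
}
```

Passing malloc as the allocator models the first point in the list: data that
is only touched in the slow path can live on the ordinary heap.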
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Tested-by: Aaron Conole <aconole@redhat.com>
I think it's clearer to use RCU than to check for a pointer twice in the
fast path (before and after taking the spinlock). Now the spinlock is
integrated into 'qos_conf'.
'qos_conf' objects cannot be modified, so, instead of having
'qos_set()', we now have 'qos_is_equal()', which tells us if an object
must be destroyed and recreated.
With this patch we also avoid passing the netdev parameter to qos ops,
since it was unused most of the time.
Lastly, some duplication is removed.
CC: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Tested-by: Aaron Conole <aconole@redhat.com>