]> git.proxmox.com Git - ovs.git/log
ovs.git
4 years agoRemove OVN.
Mark Michelson [Fri, 6 Sep 2019 14:33:03 +0000 (10:33 -0400)]
Remove OVN.

OVN is separated into its own repo. This commit removes the OVN source,
OVN tests, and OVN documentation. It also removes mentions of OVN from
most documentation. The only place where OVN has been left is in
changelogs/NEWS, since we shouldn't mess with the history of the
project.

There is an exception here. The ovsdb-cluster tests rely on ovn-nbctl
and ovn-sbctl to run. Therefore those ovn utilities, as well as their
dependencies remain in the repo with this commit.

Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agotests: Add track-origins flag to valgrind.
Ilya Maximets [Thu, 25 Jul 2019 15:45:44 +0000 (18:45 +0300)]
tests: Add track-origins flag to valgrind.

Useful for tracking where the uninitialized memory came from.
Report example:

  Thread 13 revalidator11:
  Conditional jump or move depends on uninitialised value(s)
     at 0x4C35D96: __memcmp_sse4_1 (in vgpreload_memcheck.so)
     by 0x9D4404: ofpbuf_equal (ofpbuf.h:273)
     by 0x9D4404: revalidate_ukey__ (ofproto-dpif-upcall.c:2219)
     <...>
     by 0x6AF488E: clone (clone.S:95)
   Uninitialised value was created by a stack allocation
     at 0x9D4450: compose_slow_path (ofproto-dpif-upcall.c:1062)

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agodpif-netdev: Add core id in the PMD thread name.
Ilya Maximets [Wed, 10 Jul 2019 10:38:14 +0000 (13:38 +0300)]
dpif-netdev: Add core id in the PMD thread name.

This is highly useful to see on which core PMD is running by
only looking at the thread name. Thread Id still allows to
distinguish different threads running on the same core over the time:

   |dpif_netdev(pmd-c10/id:53)|DBG|Creating 2. subtable <...>
   |dpif_netdev(pmd-c10/id:53)|DBG|flow_add: <...>, actions:2
   |dpif_netdev(pmd-c09/id:70)|DBG|Core 9 processing port <..>

In gdb, top or any other utility it's useful to quickly catch up
needed thread without parsing logs, memory or matching threads by port
names they're handling.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
4 years agodpdk: Use ovs-numa provided functions to manage thread affinity.
Ilya Maximets [Tue, 13 Aug 2019 11:28:26 +0000 (14:28 +0300)]
dpdk: Use ovs-numa provided functions to manage thread affinity.

This allows to decrease code duplication and avoid using Linux-specific
functions (this might be useful in the future if we'll try to allow
running OvS+DPDK on FreeBSD).

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: William Tu <u9012063@gmail.com>
4 years agodpif-netdev-perf: Fix TSC frequency for non-DPDK case.
Ilya Maximets [Fri, 5 Jul 2019 12:43:15 +0000 (08:43 -0400)]
dpif-netdev-perf: Fix TSC frequency for non-DPDK case.

Unlike 'rte_get_tsc_cycles()' which doesn't need any specific
initialization, 'rte_get_tsc_hz()' could be used only after successfull
call to 'rte_eal_init()'. 'rte_eal_init()' estimates the TSC frequency
for later use by 'rte_get_tsc_hz()'.  Fairly said, we're not allowed
to use 'rte_get_tsc_cycles()' before initializing DPDK too, but it
works this way for now and provides correct results.

This patch provides TSC frequency estimation code that will be used
in two cases:
  * DPDK is not compiled in, i.e. DPDK_NETDEV not defined.
  * DPDK compiled in but not initialized,
    i.e. other_config:dpdk-init=false

This change is mostly useful for AF_XDP netdev support, i.e. allows
to use dpif-netdev/pmd-perf-show command and various PMD perf metrics.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: William Tu <u9012063@gmail.com>
4 years agoovs-numa: Add dump based thread affinity functions.
Ilya Maximets [Tue, 13 Aug 2019 11:15:05 +0000 (14:15 +0300)]
ovs-numa: Add dump based thread affinity functions.

New functions to get and set CPU affinity using CPU dumps.
This will abstract OS specific implementation details from the
cross-platform code.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
4 years agoSet release date for 2.12.0.
Justin Pettit [Tue, 3 Sep 2019 21:56:53 +0000 (14:56 -0700)]
Set release date for 2.12.0.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
4 years agorhel: Revert RHEL 7.4 comp_ver change
Greg Rose [Tue, 3 Sep 2019 15:50:42 +0000 (08:50 -0700)]
rhel: Revert RHEL 7.4 comp_ver change

I looked at the wrong list of kernels when I changed the value for the
RHEL 7.4 comp_ver variable.  Revert that part of commit e64c2c1
("rhel: Fix ovs-kmod-manage.sh to work with RHEL 7.3").

Fixes: e64c2c1 ("rhel: Fix ovs-kmod-manage.sh to work with RHEL 7.3")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
4 years agorhel: Fix ovs-kmod-manage.sh to work with RHEL 7.3
Greg Rose [Thu, 29 Aug 2019 18:56:01 +0000 (11:56 -0700)]
rhel: Fix ovs-kmod-manage.sh to work with RHEL 7.3

Add case for RHEL 7.3.  This also fixes commit 22abff2 where I forgot to
update the comp_ver variable for RHEL 7.5 and while I was in there I
updated comp_ver for the RHEL 7.4 case as well.

Fixes: 22abff2 ("rhel: Add case for RHEL 7.5 major version to...")
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agopackets: Fix using outdated RSS hash after MPLS decapsulation.
Nitin Katiyar [Wed, 28 Aug 2019 16:42:07 +0000 (22:12 +0530)]
packets: Fix using outdated RSS hash after MPLS decapsulation.

When a packet is received, the RSS hash is calculated if it is not
already available. The Exact Match Cache (EMC) entry is then looked up
using this RSS hash.

When a MPLS encapsulated packet is received, the MPLS header is popped
and the packet is recirculated. Since the RSS hash has not been
invalidated here, the EMC lookup for all decapsulated packets will hit
the same entry even though these packets will have different tuple
values. This degrades performance severely as different inner packets
from the same MPLS tunnel would hit the same EMC entry.

This patch invalidates RSS hash (by resetting offload flags) after MPLS
header is popped.

Signed-off-by: Nitin Katiyar <nitin.katiyar@ericsson.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
4 years agodpif-netdev: Fail port addition if reconfiguration failed.
Ilya Maximets [Mon, 22 Jul 2019 14:56:50 +0000 (17:56 +0300)]
dpif-netdev: Fail port addition if reconfiguration failed.

If the port was destroyed during the initial reconfiguration, we should
report an error to the upper layers. Otherwise, successful addition of
the port will be logged and upper layers will continue to configure
this port. For example, the 'dpif' layer will try to initilaize flow
API for this device.

Fix that by checking for port existence after reconfiguration. We can't
get the real error code here, so let's assume EINVAL. 'ovs-vsctl' will
tell the user to check the logs for a real reason anyway.

Fixes: e32971b8ddb4 ("dpif-netdev: Centralized threads and queues handling code.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
4 years agoMake pid_exists() more robust against empty pid argument
Michele Baldessari [Wed, 14 Aug 2019 15:47:07 +0000 (17:47 +0200)]
Make pid_exists() more robust against empty pid argument

In some of our destructive testing of ovn-dbs inside containers managed
by pacemaker we reached a situation where /var/run/openvswitch had
empty .pid files. The current code does not deal well with them
and pidfile_is_running() returns true in such a case and this confuses
the OCF resource agent.

- Before this change:
Inside a container run:
  killall ovsdb-server;
  echo -n '' > /var/run/openvswitch/ovnnb_db.pid; echo -n '' > /var/run/openvswitch/ovnsb_db.pid

We will observe that the cluster is unable to ever recover because
it believes the ovn processes to be running when they really aren't and
eventually just fails:
 podman container set: ovn-dbs-bundle [192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest]
   ovn-dbs-bundle-0     (ocf::ovn:ovndb-servers):       Master controller-0
   ovn-dbs-bundle-1     (ocf::ovn:ovndb-servers):       Stopped controller-1
   ovn-dbs-bundle-2     (ocf::ovn:ovndb-servers):       Slave controller-2

Let's make sure pid_exists() returns false when the pid is an empty
string.

- After this change the cluster is able to recover from this state and
correctly start the resource:
 podman container set: ovn-dbs-bundle [192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest]
   ovn-dbs-bundle-0     (ocf::ovn:ovndb-servers):       Master controller-0
   ovn-dbs-bundle-1     (ocf::ovn:ovndb-servers):       Slave controller-1
   ovn-dbs-bundle-2     (ocf::ovn:ovndb-servers):       Slave controller-2

Fixes: 3028ce2595c8 ("ovs-lib: Allow "status" command to work as non-root.")
Signed-off-by: Michele Baldessari <michele@acksyn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoconntrack: Fix ICMPv4 error data L4 length check.
Darrell Ball [Tue, 27 Aug 2019 23:59:02 +0000 (16:59 -0700)]
conntrack: Fix ICMPv4 error data L4 length check.

The ICMPv4 error data L4 length check was found to be too strict for TCP,
expecting a minimum of 20 rather than 8 bytes.  This worked by
hapenstance for other inner protocols.  The approach is to explicitly
handle the ICMPv4 error data L4 length check and to do this for all
supported inner protocols in the same way.  Making the code common
between protocols also allows the existing ICMPv4 related UDP tests to
cover TCP and ICMP inner protocol cases.
Note that ICMPv6 does not have an 8 byte limit for error L4 data.

Fixes: a489b16854b5 ("conntrack: New userspace connection tracker.")
CC: Daniele Di Proietto <diproiettod@ovn.org>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/361949.html
Reported-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Co-authored-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath: compat: Backports bugfixes for nf_conncount
Yifeng Sun [Wed, 7 Aug 2019 22:25:33 +0000 (15:25 -0700)]
datapath: compat: Backports bugfixes for nf_conncount

This patch backports several critical bug fixes related to
locking and data consistency in nf_conncount code.

This backport is based on the following upstream net-next upstream commits.
a007232 ("netfilter: nf_conncount: fix argument order to find_next_bit")
c80f10b ("netfilter: nf_conncount: speculative garbage collection on empty lists")
2f971a8 ("netfilter: nf_conncount: move all list iterations under spinlock")
df4a902 ("netfilter: nf_conncount: merge lookup and add functions")
e8cfb37 ("netfilter: nf_conncount: restart search when nodes have been erased")
f7fcc98 ("netfilter: nf_conncount: split gc in two phases")
4cd273b ("netfilter: nf_conncount: don't skip eviction when age is negative")
c78e781 ("netfilter: nf_conncount: replace CONNCOUNT_LOCK_SLOTS with CONNCOUNT_SLOTS")
d4e7df1 ("netfilter: nf_conncount: use rb_link_node_rcu() instead of rb_link_node()")
53ca0f2 ("netfilter: nf_conncount: remove wrong condition check routine")
3c5cdb1 ("netfilter: nf_conncount: fix unexpected permanent node of list.")
31568ec ("netfilter: nf_conncount: fix list_del corruption in conn_free")
fd3e71a ("netfilter: nf_conncount: use spin_lock_bh instead of spin_lock")

This patch adds additional compat code so that it can build on
all supported kernel versions.

In addition, this patch helps OVS datapath to always choose bug-fixed
nf_conncount code. If kernel already has these fixes, then kernel's
nf_conncount is being used. Otherwise, OVS falls back to use compat
nf_conncount functions.

Travis tests are at
https://travis-ci.org/yifsun/ovs-travis/builds/569056850
On latest RHEL kernel, 'make check-kmod' runs good.

VMware-BZ: #2396471

Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath: Clear the L4 portion of the key for "later" fragments
Justin Pettit [Wed, 28 Aug 2019 23:50:07 +0000 (16:50 -0700)]
datapath: Clear the L4 portion of the key for "later" fragments

Upstream commit:
    commit 0754b4e8cdf3eec6e4122e79af26ed9bab20f8f8
    Author: Justin Pettit <jpettit@ovn.org>
    Date:   Tue Aug 27 07:58:10 2019 -0700

    openvswitch: Clear the L4 portion of the key for "later" fragments.

    Only the first fragment in a datagram contains the L4 headers.  When the
    Open vSwitch module parses a packet, it always sets the IP protocol
    field in the key, but can only set the L4 fields on the first fragment.
    The original behavior would not clear the L4 portion of the key, so
    garbage values would be sent in the key for "later" fragments.  This
    patch clears the L4 fields in that circumstance to prevent sending those
    garbage values as part of the upcall.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
4 years agodatapath: Properly set L4 keys on "later" IP fragments
Greg Rose [Wed, 28 Aug 2019 23:50:06 +0000 (16:50 -0700)]
datapath: Properly set L4 keys on "later" IP fragments

Upstream commit:
    commit ad06a566e118e57b852cab5933dbbbaebb141de3
    Author: Greg Rose <gvrose8192@gmail.com>
    Date:   Tue Aug 27 07:58:09 2019 -0700

    openvswitch: Properly set L4 keys on "later" IP fragments

    When IP fragments are reassembled before being sent to conntrack, the
    key from the last fragment is used.  Unless there are reordering
    issues, the last fragment received will not contain the L4 ports, so the
    key for the reassembled datagram won't contain them.  This patch updates
    the key once we have a reassembled datagram.

    The handle_fragments() function works on L3 headers so we pull the L3/L4
    flow key update code from key_extract into a new function
    'key_extract_l3l4'.  Then we add a another new function
    ovs_flow_key_update_l3l4() and export it so that it is accessible by
    handle_fragments() for conntrack packet reassembly.

Co-authored-by: Justin Pettit <jpettit@ovn.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
4 years agoutilities: Improve ovs-dpctl-top flow parsing
Jaime Caamaño Ruiz [Mon, 12 Aug 2019 16:52:27 +0000 (18:52 +0200)]
utilities: Improve ovs-dpctl-top flow parsing

* check that expected bytes and packets stats are correctly read from
  every flow.
* check that the expected elements are read for every field type
  aggregation.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoflow: save "vlan_hdrs" memset for untagged traffic
Yanqin Wei [Thu, 22 Aug 2019 10:09:50 +0000 (18:09 +0800)]
flow: save "vlan_hdrs" memset for untagged traffic

For untagged traffic, it is unnecessary to clear vlan_hdrs as it costs 32B
memset. So the patch improves it by postponing to clear vlan_hdrs until
ethtype check. It can benefit both untagged and single-tagged traffic. From
testing, it does not impact performance of dual-tagged traffic.

Reviewed-by: Gavin Hu <Gavin.Hu@arm.com>
Signed-off-by: Yanqin Wei <Yanqin.Wei@arm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoflow: Reduce metadata connection state branches in miniflow_extract
Malvika Gupta [Tue, 27 Aug 2019 18:21:08 +0000 (13:21 -0500)]
flow: Reduce metadata connection state branches in miniflow_extract

This patch merges two separate if-else branches for metadata connection state
into one if-else branch to improve performance. It gives an average performance
improvement of ~3% on arm platforms and ~4.5% on x86 platforms.

Signed-off-by: Malvika Gupta <malvika.gupta@arm.com>
Reviewed-by: Yanqin Wei <yanqin.wei@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agopinctrl: Fix DNS packet parsing
Dumitru Ceara [Fri, 16 Aug 2019 14:31:28 +0000 (16:31 +0200)]
pinctrl: Fix DNS packet parsing

Due to the use of a uint8_t to index inside the DNS payload we could end
up in an infinite loop when specific (invalid) DNS packets were
processed by ovn-controller. In the infinite loop we keep increasing the
query_name dynamic string until running out of memory.

One way to replicate the issue is to configure DNS on the logical switch
and then inject a manually crafted DNS-like packet. For example, with
Scapy:

>>> p = IP(dst='10.0.0.2',src='10.0.0.3')/UDP(dport=53)/('a'*364)
>>> send(p)

Also add a sanity check on minimum L4 size of packets.

(cherry-picked from ovn commit - 7fbdeaade826da299c20c05050627ebea65fe8c2)

CC: Numan Siddique <nusiddiq@redhat.com>
Fixes: 16cb4fb8ca49 ("ovn-controller: Add 'dns_lookup' action")
Reported-at: https://bugzilla.redhat.com/1740335
Reported-by: Priscila <pveiga@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoOVN: fix memory leak in build_pre_lb
Lorenzo Bianconi [Mon, 26 Aug 2019 09:19:32 +0000 (11:19 +0200)]
OVN: fix memory leak in build_pre_lb

Fix memory leak of ip_address string in build_pre_lb routine if we
install logical flows for empty_lb controller event

Fixes: f49b17a6cbe3 ("OVN: use trigger_event action to report 'empty_lb_rule' events")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovs-thread: Make struct spin lock cache aligned.
William Tu [Mon, 26 Aug 2019 23:00:31 +0000 (16:00 -0700)]
ovs-thread: Make struct spin lock cache aligned.

Make the spin lock struct 64-byte aligned to avoid false sharing issue.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agotnl-neigh: Use outgoing ofproto version.
Flavio Leitner [Tue, 13 Aug 2019 16:34:04 +0000 (13:34 -0300)]
tnl-neigh: Use outgoing ofproto version.

When a packet needs to be encapsulated in userspace, the endpoint
address needs to be resolved to fill in the headers. If it is not,
then currently OvS sends either a Neighbor Solicitation (IPv6)
or an ARP Query (IPv4) to resolve it.

The problem is that the NS/ARP packet will go through the flow
rules in the new bridge, but inheriting the ofproto table version
from the original packet to be encapsulated. When those versions
don't match, the result is unexpected because no flow rules might
be visible, which would cause the default table rule to be used
to drop the packet. Or only part of the flow rules would be visible
and so on.

Since the NS/ARP packet is created by OvS and will be injected in
the outgoing bridge, use the corresponding ofproto version instead.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-By: Vasu Dasari <vdasari@gmail.com>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agorhel: Add case for RHEL 7.5 major version to kmod manage script
Greg Rose [Tue, 27 Aug 2019 21:06:29 +0000 (14:06 -0700)]
rhel: Add case for RHEL 7.5 major version to kmod manage script

A Centos 7.5 kernel with an unencountered set of minor build numbers
caused an upgrade bug.  Adding the case for the rhel 7.5 kmod management
script fixes the problem.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
4 years agoovs-lib: Add timeout at ovs-check-dead-ifs.
William Tu [Tue, 27 Aug 2019 18:09:13 +0000 (11:09 -0700)]
ovs-lib: Add timeout at ovs-check-dead-ifs.

At SUSE12 SP3, we hit a case where ovs-check-dead-ifs tries to read
an entry in /proc/<pid>/fd/<some fd> but hangs forever.  The pid is
a qemu-system-x86_64 process and we suspect it's an issue related to
qemu, not ovs.  As a result, force-reload-kmod hangs and OVS bridge
never gets restarted. This patch adds a timeout of 5-seconds to
ovs-check-dead-ifs.

VMware-BZ: #2408059
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Ashish Varma <ashishvarma.ovs@gmail.com>
Cc: Gurucharan Shetty <guru@ovn.org>
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
4 years agonetdev-dpdk: Refactor vhost custom stats for extensibility.
Ilya Maximets [Tue, 6 Aug 2019 15:57:09 +0000 (18:57 +0300)]
netdev-dpdk: Refactor vhost custom stats for extensibility.

vHost interfaces currently has only one custom statistic, but there
might be others in the near future. This refactoring makes the code
work in the same way as it done for dpdk and afxdp stats to keep the
common style over the different code places and makes it easily
extensible for the new stats addition.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
4 years agonetdev-dpdk: Fix not reporting rx_oversize_errors in stats.
Ilya Maximets [Tue, 6 Aug 2019 15:46:36 +0000 (18:46 +0300)]
netdev-dpdk: Fix not reporting rx_oversize_errors in stats.

There is a big code duplication issue with DPDK xstats that led to
missed "rx_oversize_errors" statistics. It's defined but not used.
Fix that by actually using this stat along with code refactoring that
will allow us to not make same mistakes in the future.
Macro definitions are perfectly suitable to automate code generation
in such cases and already used in a couple of places in OVS for similar
purposes.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
4 years agoovsdb.7.rst: some corrections in ovsdb-client usage.
Aliasgar Ginwala [Fri, 23 Aug 2019 22:36:26 +0000 (15:36 -0700)]
ovsdb.7.rst: some corrections in ovsdb-client usage.

1. Correct typo where it should be ovsdb-client backup vs ovsdb-tool backup.
2. Update for which case will ovsdb-client not work.

Acked-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Aliasgar Ginwala <aginwala@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovsdb.5.rst: Fix minor format problem.
Han Zhou [Thu, 22 Aug 2019 21:08:23 +0000 (14:08 -0700)]
ovsdb.5.rst: Fix minor format problem.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoraft: Save and read new election timer in header snapshot.
Han Zhou [Thu, 22 Aug 2019 21:08:22 +0000 (14:08 -0700)]
raft: Save and read new election timer in header snapshot.

This patch store the latest election timer in snapshot during log
compression, and when server restarts it reads the value from the log.
Without this, any previous changes to election timer will be lost
in the log, and if server restarts, it will use the default value
instead of the changed value.

Fixes: commit 8e35461 ("ovsdb raft: Support leader election time change online.")
Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoraft.c: Election timer initial reset with value from log.
Han Zhou [Thu, 22 Aug 2019 21:08:21 +0000 (14:08 -0700)]
raft.c: Election timer initial reset with value from log.

After election timer is changed through cluster/change-election-timer
command, if a server restarts, it firstly initializes with the default
value and use it to reset the timer. Although it reads the latest
timer value later from the log, the first timeout may be much shorter
than expected by other servers that use latest timeout, and it would
start election before it receives the first heartbeat from the leader.

This patch fixes it by changing the order of reading log and resetting
timer so that the latest value is read from the log before the initial
resetting of the timer.

Fixes: commit 8e35461 ("ovsdb raft: Support leader election time change online.")
Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoofproto-dpif: Fix using uninitialised memory in user_action_cookie.
Ilya Maximets [Thu, 25 Jul 2019 15:11:13 +0000 (18:11 +0300)]
ofproto-dpif: Fix using uninitialised memory in user_action_cookie.

Designated initializers are not suitable for initializing non-packed
structures and unions which are subjects for comparison by memcmp().

Whole memory for 'struct user_action_cookie' must be explicitly cleared
before using because it will be copied with memcpy and later compared
by memcmp in ofpbuf_equal().

Few issues found be valgrind:

 Thread 13 revalidator11:
 Conditional jump or move depends on uninitialised value(s)
    at 0x4C35D96: __memcmp_sse4_1 (in vgpreload_memcheck.so)
    by 0x9D4404: ofpbuf_equal (ofpbuf.h:273)
    by 0x9D4404: revalidate_ukey__ (ofproto-dpif-upcall.c:2219)
    by 0x9D4404: revalidate_ukey (ofproto-dpif-upcall.c:2286)
    by 0x9D62AC: revalidate (ofproto-dpif-upcall.c:2685)
    by 0x9D62AC: udpif_revalidator (ofproto-dpif-upcall.c:942)
    by 0xA9C732: ovsthread_wrapper (ovs-thread.c:383)
    by 0x5FF86DA: start_thread (pthread_create.c:463)
    by 0x6AF488E: clone (clone.S:95)
  Uninitialised value was created by a stack allocation
    at 0x9D4450: compose_slow_path (ofproto-dpif-upcall.c:1062)

 Thread 11 revalidator16:
 Conditional jump or move depends on uninitialised value(s)
    at 0x4C35D96: __memcmp_sse4_1 (in vgpreload_memcheck.so)
    by 0x9D4404: ofpbuf_equal (ofpbuf.h:273)
    by 0x9D4404: revalidate_ukey__ (ofproto-dpif-upcall.c:2220)
    by 0x9D4404: revalidate_ukey (ofproto-dpif-upcall.c:2287)
    by 0x9D62BC: revalidate (ofproto-dpif-upcall.c:2686)
    by 0x9D62BC: udpif_revalidator (ofproto-dpif-upcall.c:942)
    by 0xA9C6D2: ovsthread_wrapper (ovs-thread.c:383)
    by 0x5FF86DA: start_thread (pthread_create.c:463)
    by 0x6AF488E: clone (clone.S:95)
  Uninitialised value was created by a stack allocation
    at 0x9DC4E0: compose_sflow_action (ofproto-dpif-xlate.c:3211)

The struct was never marked as 'packed', however it was manually
adjusted to be so in practice.
Old IPFIX related commit first made the structure non-contiguous.
Commit 8de6ff3ea864 ("ofproto-dpif: Use a fixed size userspace cookie.")
added uninitialized parts of the additional union space and the next
one introduced new holes between structure fields for all cases.

CC: Justin Pettit <jpettit@ovn.org>
Fixes: 8b7ea2d48033 ("Extend OVS IPFIX exporter to export tunnel headers")
Fixes: 8de6ff3ea864 ("ofproto-dpif: Use a fixed size userspace cookie.")
Fixes: fcb9579be3c7 ("ofproto: Add 'ofproto_uuid' and 'ofp_in_port' to user action cookie.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agoRemove ageing check in run_put_mac_binding
Lorenzo Bianconi [Thu, 22 Aug 2019 16:51:40 +0000 (18:51 +0200)]
Remove ageing check in run_put_mac_binding

Remove ageing check in run_put_mac_binding routine on mac-binding info
since if ovn-controller main thread is heavy loaded the info will be
discarded and the mac_binding table will not never be updated

Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
4 years agotest: do not require python2 for CHECK_CONNTRACK macro
Lorenzo Bianconi [Mon, 29 Jul 2019 11:48:33 +0000 (13:48 +0200)]
test: do not require python2 for CHECK_CONNTRACK macro

Do not strictly require python2 for CHECK_CONNTRACK macro definitions in
system-{kmod,userspace}-macros.at

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoOVN: fix DNAT/SNAT system-ovn unit tests
Lorenzo Bianconi [Mon, 29 Jul 2019 11:41:03 +0000 (13:41 +0200)]
OVN: fix DNAT/SNAT system-ovn unit tests

Fix conntrack checks in the following tests in tests/system-ovn.at:
- ovn -- DNAT and SNAT on distributed router - N/S
- ovn -- DNAT and SNAT on distributed router - E/W

Fixes: a6ee09882283 ("OVN: run local logical flows first in S_ROUTER_OUT_SNAT table")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoOVS: Containerize components
Aliasgar Ginwala [Sat, 17 Aug 2019 07:21:45 +0000 (00:21 -0700)]
OVS: Containerize components

 1. Start OVS components in containers so that building and shipping
    of OVS components is easy.
 2. Load OVS kernel modules on host from container to avoid installing ovs
    on host.
 3. Update documentation about how to build/run ovs in docker.

Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: aginwala <aginwala@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoAUTHORS: Add Surya Rudra.
Ben Pfaff [Thu, 22 Aug 2019 20:17:46 +0000 (13:17 -0700)]
AUTHORS: Add Surya Rudra.

Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoofproto-dpif: Fix for recirc issue with mpls traffic with dp_hash
Surya Rudra [Fri, 19 Jul 2019 05:35:19 +0000 (11:05 +0530)]
ofproto-dpif: Fix for recirc issue with mpls traffic with dp_hash

Fix infinite recirculation loop for MPLS packets sent to dp_hash-based
 select group

Issue:
When a MPLS encapsulated packet is received, the MPLS header is removed,
a recirculation id assigned and then recirculated into the pipeline.
If the flow rules require the packet to be then sent over DP-HASH based
select group buckets, the packet has to be recirculated again. However,
the same recirculation id was used and this resulted in the packet being
repeatedly recirculated until it got dropped because the maximum recirculation
limit was hit.

Fix:
Include the  “was_mpls” boolean which indicates whether the packet was MPLS
encapsulated when computing the hash. After popping the MPLS header this will
result in a  different hash value than before and new recirculation id will
get generated.

DPCTL flows with and without the fix are shown below
Without Fix:
recirc_id(0x1),dp_hash(0x5194bf18/0xf),in_port(2),packet_type(ns=0,id=0),
eth_type(0x0800),ipv4(frag=no), packets:20, bytes:1960,
used:0.329s, actions:1
recirc_id(0x1),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),
ipv4(frag=no), packets:20, bytes:1960, used:0.329s,
actions:hash(sym_l4(0)),recirc(0x1)
recirc_id(0),in_port(2),packet_type(ns=0,id=0),eth_type(0x8847),
mpls(label=22/0xfffff,tc=0/0,ttl=64/0x0,bos=1/1), packets:20, bytes:2040,
used:0.329s, actions:pop_mpls(eth_type=0x800),recirc(0x1)

With Fix:
recirc_id(0x2),dp_hash(0x5194bf18/0xf),in_port(3),packet_type(ns=0,id=0),
eth_type(0x0800),ipv4(frag=no), packets:12481, bytes:1223138,
used:0.588s, actions:1
recirc_id(0x1),in_port(3),packet_type(ns=0,id=0),eth_type(0x0800),
ipv4(frag=no), packets:74431, bytes:7294238, used:0.386s,
actions:hash(sym_l4(0)),recirc(0x2)
recirc_id(0x2),dp_hash(0xb952470d/0xf),in_port(3),packet_type(ns=0,id=0),
eth_type(0x0800),ipv4(frag=no), packets:12441, bytes:1219218,
used:0.482s, actions:1
recirc_id(0x2),dp_hash(0xeff6ad76/0xf),in_port(3),packet_type(ns=0,id=0),
eth_type(0x0800),ipv4(frag=no), packets:12385, bytes:1213730,
used:0.908s, actions:1
recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth_type(0x8847),
mpls(label=22/0xfffff,tc=0/0,ttl=64/0x0,bos=1/1), packets:74431,
bytes:7591962, used:0.386s, actions:pop_mpls(eth_type=0x800),recirc(0x1)
recirc_id(0x2),dp_hash(0xb6233fbe/0xf),in_port(3),packet_type(ns=0,id=0),
eth_type(0x0800),ipv4(frag=no), packets:12369, bytes:1212162,
used:0.386s, actions:1
recirc_id(0x2),dp_hash(0xa3670459/0xf),in_port(3),packet_type(ns=0,id=0),
eth_type(0x0800),ipv4(frag=no), packets:24751, bytes:2425598,
used:0.483s, actions:1

Signed-off-by: Surya Rudra <rudrasurya.r@altencalsoftlabs.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoupcall: Configure datapath min-revalidate-pps through ovs-vsctl.
Vlad Buslov [Sun, 21 Jul 2019 08:34:23 +0000 (11:34 +0300)]
upcall: Configure datapath min-revalidate-pps through ovs-vsctl.

This patch adds a new configuration option, "min-revalidate-pps" to the
Open_vSwitch "other-config" column. This sets minimum pps that flow must
have in order to be revalidated when revalidation duration exceeds half of
max-revalidator config variable.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoupcall: Change should_revalidate to use max-revalidator value
Vlad Buslov [Sun, 21 Jul 2019 08:34:22 +0000 (11:34 +0300)]
upcall: Change should_revalidate to use max-revalidator value

Revalidate if dump duration was longer than half of max-revalidator
timeout, instead of hardcoded 200msec value.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoupcall: Configure datapath max-revalidator through ovs-vsctl.
Vlad Buslov [Sun, 21 Jul 2019 08:34:21 +0000 (11:34 +0300)]
upcall: Configure datapath max-revalidator through ovs-vsctl.

This patch adds a new configuration option, "max-revalidator" to the
Open_vSwitch "other-config" column. This sets maximum allowed ravalidator
timeout. Actual timeout value is determined at runtime as minimum of
"max-idle" and "max-revalidator".

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovsdb monitor: Fix crash when using non-zero last-id with standalone DB.
Han Zhou [Mon, 19 Aug 2019 23:30:35 +0000 (16:30 -0700)]
ovsdb monitor: Fix crash when using non-zero last-id with standalone DB.

When a client uses monitor-cond-since with a non-zero last-id but the
server is not in cluster mode for the DB being monitored, it leads to
segmentation fault because the txn_history list is not initialized in
this case.

Program terminated with signal SIGSEGV, Segmentation fault.
1536            struct ovsdb_txn *txn = h_node->txn;
(gdb) bt
0  ovsdb_monitor_get_changes_after (txn_uuid=txn_uuid@entry=0x7ffe8605b7e0, dbmon=0x17c1b40, p_mcs=p_mcs@entry=0x17c4900) at ovsdb/monitor.c:1536
1  0x000000000040da2d in ovsdb_jsonrpc_monitor_create (request_id=0x1804630, version=<optimized out>, params=0x17ad330, db=0x18015b0, s=<optimized out>) at ovsdb/jsonrpc-server.c:1469
2  ovsdb_jsonrpc_session_got_request (request=0x17ad520, s=<optimized out>) at ovsdb/jsonrpc-server.c:1002
3  ovsdb_jsonrpc_session_run (s=<optimized out>) at ovsdb/jsonrpc-server.c:556
...

Although it doesn't happen in normal use cases, no one can prevent a
client to send this on purpose or in a corner case when a client firstly
connected to a clustered DB but later the server restarted with a
non-clustered DB.

This patch fixes it by always initialize the txn_history list to avoid
the undefined behavior in this case. It adds a test case to cover it, too.

Fixes: 695e815 ("ovsdb-server: Transaction history tracking.")
Reported-by: Aliasgar Ginwala <aginwala@ebay.com>
Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovsdb raft: Support leader election time change online.
Han Zhou [Mon, 19 Aug 2019 16:30:00 +0000 (09:30 -0700)]
ovsdb raft: Support leader election time change online.

A new unixctl command cluster/change-election-timer is implemented to
change leader election timeout base value according to the scale needs.

The change takes effect upon consensus of the cluster, implemented through
the append-request RPC.  A new field "election-timer" is added to raft log
entry for this purpose.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoraft.c: Set candidate_retrying if no leader elected since last election.
Han Zhou [Mon, 19 Aug 2019 16:29:59 +0000 (09:29 -0700)]
raft.c: Set candidate_retrying if no leader elected since last election.

candiate_retrying is used to determine if the current node is disconnected
from the cluster when the node is in candiate role. However, a node
can flap between candidate and follower role before a leader is elected
when majority of the cluster is down, so is_connected() will flap, too, which
confuses clients.

This patch avoids the flapping with the help of a new member had_leader,
so that if no leader was elected since last election, we know we are
still retrying, and keep as disconnected from the cluster.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoraft.c: Stale leader should disconnect from cluster.
Han Zhou [Mon, 19 Aug 2019 16:29:58 +0000 (09:29 -0700)]
raft.c: Stale leader should disconnect from cluster.

As mentioned in RAFT paper, section 6.2:

Leaders: A server might be in the leader state, but if it isn’t the current
leader, it could be needlessly delaying client requests. For example, suppose a
leader is partitioned from the rest of the cluster, but it can still
communicate with a particular client. Without additional mechanism, it could
delay a request from that client forever, being unable to replicate a log entry
to any other servers. Meanwhile, there might be another leader of a newer term
that is able to communicate with a majority of the cluster and would be able to
commit the client’s request. Thus, a leader in Raft steps down if an election
timeout elapses without a successful round of heartbeats to a majority of its
cluster; this allows clients to retry their requests with another server.

Reported-by: Aliasgar Ginwala <aginwala@ebay.com>
Tested-by: Aliasgar Ginwala <aginwala@ebay.com>
Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovsdb-idl.c: Allows retry even when using a single remote.
Han Zhou [Mon, 19 Aug 2019 16:29:57 +0000 (09:29 -0700)]
ovsdb-idl.c: Allows retry even when using a single remote.

When clustered mode is used, the client needs to retry connecting
to new servers when certain failures happen. Today it is allowed to
retry new connection only if multiple remotes are used, which prevents
using LB VIP with clustered nodes. This patch makes sure the retry
logic works when using LB VIP: although same IP is used for retrying,
the LB can actually redirect the connection to a new node.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoraft: Move raft_reset_ping_timer() out of the loop.
Han Zhou [Tue, 13 Aug 2019 16:23:19 +0000 (09:23 -0700)]
raft: Move raft_reset_ping_timer() out of the loop.

Fixes: commit 5a9b53a5 ("ovsdb raft: Fix duplicated transaction execution when leader failover.")
Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agosat-math: Do not use __builtin_s*_overflow() with sparse.
Ben Pfaff [Wed, 21 Aug 2019 17:17:21 +0000 (10:17 -0700)]
sat-math: Do not use __builtin_s*_overflow() with sparse.

Some versions of sparse do not understand __builtin_saddll_overflow() and
related GCC builtins for calculations with overflow detection.  This patch
avoids using them when sparse is in use.

Reported-by: Justin Pettit <jpettit@ovn.org>
Tested-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovs-lib: Fix standalone db migration to raft
Aliasgar Ginwala [Tue, 20 Aug 2019 01:36:13 +0000 (18:36 -0700)]
ovs-lib: Fix standalone db migration to raft

Current code of create-cluster from standalone db takes backup of existing
standalone db and then generates a new clustered dbs from backup dbs. Hence,
during migration if nb and sb  dbs are still present, create-cluster will fail
saying file exists and will not really convert  dbs to clustered dbs. This
patch fixes the same.

e.g message that pops up while migration from standalone to raft cluster:
 * Backing up database to /etc/openvswitch/ovnnb_db.db.backup5.13.0-1278623084
ovsdb-tool: I/O error: /etc/openvswitch/ovnnb_db.db: create failed (File exists)
 * Creating cluster database /etc/openvswitch/ovnnb_db.db from existing one

Signed-off-by: aginwala <aginwala@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath-windows: Fix updating ct label when mask is specified
Anand Kumar [Thu, 15 Aug 2019 16:39:06 +0000 (09:39 -0700)]
datapath-windows: Fix updating ct label when mask is specified

When an existing label needs to be changed by specifing bits to be
updated using mask, instead of updating only the masked bits,
new label was getting overridden. This patch fixes this issue.

Signed-off-by: Anand Kumar <kumaranand@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
4 years agotravis: Drop OSX workarounds.
Ilya Maximets [Mon, 5 Aug 2019 15:14:46 +0000 (18:14 +0300)]
travis: Drop OSX workarounds.

TravisCI currently uses xcode9.4 as a default image and it
it has good version of libtool out-of-the-box.
Removing these workarounds saves 4-6 minutes of OSX build.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
4 years agotravis: Combine kernel builds.
Ilya Maximets [Mon, 5 Aug 2019 12:26:02 +0000 (15:26 +0300)]
travis: Combine kernel builds.

Single kernel build job takes ~3 minutes in average. Most of
this time takes VM spawning and initial configuration.
Combining these 24 jobs in 4 allows us to better utilize workers
and not waste time on spawning VMs.

Before:

    24 jobs * 3 minutes = 72 minutes
    72 minutes / 5 workers = 14.4 minutes / worker

After:

    4 jobs * 10 minutes = 40 minutes
    40 minutes / 4 workers = 10 minutes / worker
    + 1 free worker that able to run other jobs at the same time.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
4 years agotravis: Cache DPDK build.
Ilya Maximets [Mon, 5 Aug 2019 07:58:18 +0000 (10:58 +0300)]
travis: Cache DPDK build.

This change enables cache for DPDK build directory, so we'll never
build same version of DPDK again. This speeds up each DPDK related
job by 4-6 minutes effectively saving 30-50 minutes of the total time.
Ex. Full TravisCI run on 'trusty' images:

  Without cache:
    Ran for 1 hr 9 min 29 sec
    Total time 4 hrs 55 min 13 sec

  With populated cache:
    Ran for 1 hr 2 min 18 sec
    Total time 4 hrs 20 min 9 sec

Saved:
  Real time: ~7 minutes.
  Total worker time: ~35 minutes.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
4 years agoSwitch from internal port to all ports defined
Alin Gabriel Serdean [Mon, 25 Mar 2019 10:12:49 +0000 (12:12 +0200)]
Switch from internal port to all ports defined

This patch changes the way we try to figure out if a port is defined on a given switch.

Instead of looking only in the internal ports defined switch to all ports defined.

This caused issues when trying to add a Hyper-V container port to a given OVS bridge.

Reported-by: Danting Liu <dantingl@vmware.com>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Anand Kumar <kumaranand@vmware.com>
4 years agocheckpatch: Fix regexp for if, while, etc inside macros.
Ilya Maximets [Fri, 9 Aug 2019 11:53:29 +0000 (14:53 +0300)]
checkpatch: Fix regexp for if, while, etc inside macros.

This allows to use a one-character expression inside the 'if'
statement and multiple spaces before the line continuation character.

Fixes false positive in case like this:

  #define MACRO(ARG)     \
      if (a) {           \
          do_work(ARG);  \
      }

Fixes: 16770c6d9179 ("checkpatch: support macro continuation")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
4 years agonetdev-afxdp: fix corner case where umem entries were not released
Eelco Chaudron [Thu, 8 Aug 2019 13:27:05 +0000 (09:27 -0400)]
netdev-afxdp: fix corner case where umem entries were not released

If for some reason the last element in the batch was already pushed on
the stack, none of the elements where pushed. This was leading to
buffer starvation.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
4 years agoovs-tc: offload MPLS set actions to TC datapath
John Hurley [Tue, 30 Jul 2019 11:05:17 +0000 (12:05 +0100)]
ovs-tc: offload MPLS set actions to TC datapath

Recent modifications to TC allows the modifying of fields within the
outermost MPLS header of a packet. OvS datapath rules impliment an MPLS
set action by supplying a new MPLS header that should overwrite the
current one.

Convert the OvS datapath MPLS set action to a TC modify action and allow
such rules to be offloaded to a TC datapath.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agoovs-tc: offload MPLS push actions to TC datapath
John Hurley [Tue, 30 Jul 2019 11:05:16 +0000 (12:05 +0100)]
ovs-tc: offload MPLS push actions to TC datapath

TC can now be used to push an MPLS header onto a packet. The MPLS label is
the only information that needs to be passed here with the rest reverting
to default values if none are supplied. OvS, however, gives the entire
MPLS header to be pushed along with the MPLS protocol to use. TC can
optionally accept these values so can be made replicate the OvS datapath
rule.

Convert OvS MPLS push datapath rules to TC format and offload to a TC
datapath.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agoovs-tc: offload MPLS pop actions to TC datapath
John Hurley [Tue, 30 Jul 2019 11:05:15 +0000 (12:05 +0100)]
ovs-tc: offload MPLS pop actions to TC datapath

TC now supports an action to pop the outer MPLS header from a packet. The
next protocol after the header is required alongside this. Currently, OvS
datapath rules also supply this information.

Offload OvS MPLS pop actions to TC along with the next protocol.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agocompat: add compatibility headers for tc mpls action
John Hurley [Tue, 30 Jul 2019 11:05:14 +0000 (12:05 +0100)]
compat: add compatibility headers for tc mpls action

OvS includes compat code for several TC actions including vlan, mirred and
tunnel key. MPLS actions have recently been added to TC in the kernel. In
preparation for adding TC offload code for MPLS, add the MPLS compat code.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agodpif-netlink: Allow offloading of flows with dl_type 0x1234.
Ilya Maximets [Tue, 30 Jul 2019 14:00:06 +0000 (17:00 +0300)]
dpif-netlink: Allow offloading of flows with dl_type 0x1234.

'dpif_probe_feature()' always has DPIF_FP_PROBE flag set. Other probing
code uses dpif_execute() with DPIF_OP_EXECUTE, hence never calls
parse_flow_put().
Thus, this 'if' statement is wrong and should be removed as it only
forbids offloading of the real legitimate flows with dl_type 0x1234.
Dummy flows never reach this code.

CC: Paul Blakey <paulb@mellanox.com>
Fixes: 8b668ee3f0cc ("dpif-netlink: Use netdev flow put api to insert a flow")
Reported-by: Eli Britstein <elibr@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
4 years agonetdev-afxdp: Error when no XDP program loaded.
William Tu [Wed, 24 Jul 2019 15:21:35 +0000 (08:21 -0700)]
netdev-afxdp: Error when no XDP program loaded.

netdev-afxdp requires XDP program to be loaded.  When prog_id == 0,
it indicates no XDP program, so return error and free resources.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
4 years agoovsdb-data: Don't put strings with digits in quotes.
Ilya Maximets [Thu, 25 Jul 2019 16:09:22 +0000 (19:09 +0300)]
ovsdb-data: Don't put strings with digits in quotes.

No need to use quotes for strings like "br0".

Keeping UUIDs always in quotes to avoid different treatment of those
that starts with digits and those that starts with letters.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agonetdev-afxdp: Convert AFXDP_DEBUG to custom stats.
Ilya Maximets [Fri, 5 Jul 2019 15:37:58 +0000 (11:37 -0400)]
netdev-afxdp: Convert AFXDP_DEBUG to custom stats.

These are valid statistics of a network interface and should be
exposed via custom stats.

The same MACRO trick as in vswitchd/bridge.c is used to reduce code
duplication and easily add new stats if necessary in the future.

Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
4 years agonetdev-afxdp: Fix use of unconfigured device.
Ilya Maximets [Mon, 22 Jul 2019 13:05:20 +0000 (09:05 -0400)]
netdev-afxdp: Fix use of unconfigured device.

In case of failure of 'xsk_configure_all()', 'n_rxq' and 'xdpmode'
will remain in a new state. This will result in successful
reconfiguration (immediate return, because configuration is already
applied) if 'netdev_reconfigure()' will be called again.

Same issue was fixed previously for netdev-dpdk using 'dev->started'
flag in commit:
606f66507250 ("netdev-dpdk: Don't use PMD driver if not configured successfully")

Let's use similar approach with checking the 'dev->xsks' which only
exists if configuration was successful.

Additionally implemented 'netdev_afxdp_construct()' function to
explicitly initialize all the specific fields and request the
reconfiguration.

CC: William Tu <u9012063@gmail.com>
Fixes: 0de1b425962d ("netdev-afxdp: add new netdev type for AF_XDP.")
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
4 years agoPrepare for post-2.12.0 (2.12.90).
Ben Pfaff [Mon, 22 Jul 2019 16:47:09 +0000 (09:47 -0700)]
Prepare for post-2.12.0 (2.12.90).

Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoPrepare for 2.12.0.
Ben Pfaff [Mon, 22 Jul 2019 16:47:08 +0000 (09:47 -0700)]
Prepare for 2.12.0.

Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agotest: Fix fragment-related tests that fail on 4.19+ due to small-sized packets
Yifeng Sun [Thu, 18 Jul 2019 22:59:45 +0000 (15:59 -0700)]
test: Fix fragment-related tests that fail on 4.19+ due to small-sized packets

These fragment-related tests are failing on later kernels (4.19.x)
because kernel quietly drops any packet fragment that is not the last
but has a size smaller than IPV6_MIN_MTU. This patch fixes
them by increasing their sizes to IPV6_MIN_MTU.

Reviewed-by: Darrell Ball <dlu998@gmail.com>
Reivewed-at: https://github.com/openvswitch/ovs/pull/278
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agotnl-neigh-cache: Purge learnt neighbors when port/bridge is deleted
Vasu Dasari [Tue, 16 Jul 2019 14:54:31 +0000 (10:54 -0400)]
tnl-neigh-cache: Purge learnt neighbors when port/bridge is deleted

Say an ARP entry is learnt on a OVS port and when such a port is deleted,
learnt entry should be removed from the port. It would have be aged out after
ARP ageout time. This code will clean up immediately.

Added test case(tunnel - neighbor entry add and deletion) in tunnel.at, to
verify neighbors are added and removed on deletion of a ports and bridges.

Discussion for this addition is at:
https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048754.html

Signed-off-by: Vasu Dasari <vdasari@gmail.com>
Reviewed-by: Flavio Fernandes <flavio@flaviof.com>
Reviewed-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agosystem-traffic: Make nsh test more robust.
William Tu [Fri, 19 Jul 2019 14:49:01 +0000 (07:49 -0700)]
system-traffic: Make nsh test more robust.

The patch adds '-n' to tcpdump to avoid address coverting. Add '-l' for rhel8
to avoid buffering. Since '-U' is used to output to stdout, simply use 'cat'
to search result.  Use OVS_WAIT_UNTIL instead of sleep, and also remove/add
some newlines. Finally, move tcpdump captured interface into the namespace,
(capture p1 instead of ovs-p1), and tested using af_xdp.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
4 years agonetdev-afxdp: add new netdev type for AF_XDP.
William Tu [Thu, 18 Jul 2019 20:11:14 +0000 (13:11 -0700)]
netdev-afxdp: add new netdev type for AF_XDP.

The patch introduces experimental AF_XDP support for OVS netdev.
AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
type built upon the eBPF and XDP technology.  It is aims to have comparable
performance to DPDK but cooperate better with existing kernel's networking
stack.  An AF_XDP socket receives and sends packets from an eBPF/XDP program
attached to the netdev, by-passing a couple of Linux kernel's subsystems
As a result, AF_XDP socket shows much better performance than AF_PACKET
For more details about AF_XDP, please see linux kernel's
Documentation/networking/af_xdp.rst. Note that by default, this feature is
not compiled in.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
4 years agoovs-thread: Add pthread spin lock support.
William Tu [Wed, 17 Jul 2019 20:23:33 +0000 (13:23 -0700)]
ovs-thread: Add pthread spin lock support.

The patch adds the basic spin lock functions:
ovs_spin_{lock, try_lock, unlock, init, destroy}.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
4 years agoAUTHORS: Add Harry Van Haaren.
Ian Stokes [Fri, 19 Jul 2019 11:29:47 +0000 (12:29 +0100)]
AUTHORS: Add Harry Van Haaren.

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agodpif-netdev: Add specialized generic scalar functions
Harry van Haaren [Thu, 18 Jul 2019 13:03:06 +0000 (14:03 +0100)]
dpif-netdev: Add specialized generic scalar functions

This commit adds a number of specialized functions, that handle
common miniflow fingerprints. This enables compiler optimization,
resulting in higher performance. Below a quick description of
how this optimization actually works;

"Specialized functions" are "instances" of the generic implementation,
but the compiler is given extra context when compiling. In the case of
iterating miniflow datastructures, the most interesting value to enable
compile time optimizations is the loop trip count per unit.

In order to create a specialized function, there is a generic implementation,
which uses a for() loop without the compiler knowing the loop trip count at
compile time. The loop trip count is passed in as an argument to the function:

uint32_t miniflow_impl_generic(struct miniflow *mf, uint32_t loop_count)
{
    for(uint32_t i = 0; i < loop_count; i++)
        // do work
}

In order to "specialize" the function, we call the generic implementation
with hard-coded numbers - these are compile time constants!

uint32_t miniflow_impl_loop5(struct miniflow *mf, uint32_t loop_count)
{
    // use hard coded constant for compile-time constant-propogation
    return miniflow_impl_generic(mf, 5);
}

Given the compiler is aware of the loop trip count at compile time,
it can perform an optimization known as "constant propogation". Combined
with inlining of the miniflow_impl_generic() function, the compiler is
now enabled to *compile time* unroll the loop 5x, and produce "flat" code.

The last step to using the specialized functions is to utilize a
function-pointer to choose the specialized (or generic) implementation.
The selection of the function pointer is performed at subtable creation
time, when miniflow fingerprint of the subtable is known. This technique
is known as "multiple dispatch" in some literature, as it uses multiple
items of information (miniflow bit counts) to select the dispatch function.

By pointing the function pointer at the optimized implementation, OvS
benefits from the compile time optimizations at runtime.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Tested-by: Malvika Gupta <malvika.gupta@arm.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agodpif-netdev: Refactor generic implementation
Harry van Haaren [Thu, 18 Jul 2019 13:03:05 +0000 (14:03 +0100)]
dpif-netdev: Refactor generic implementation

This commit refactors the generic implementation. The
goal of this refactor is to simplify the code to enable
"specialization" of the functions at compile time.

Given compile-time optimizations, the compiler is able
to unroll loops, and create optimized code sequences due
to compile time knowledge of loop-trip counts.

In order to enable these compiler optimizations, we must
refactor the code to pass the loop-trip counts to functions
as compile time constants.

This patch allows the number of miniflow-bits set per "unit"
in the miniflow to be passed around as a function argument.

Note that this patch does NOT yet take advantage of doing so,
this is only a refactor to enable it in the next patches.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Tested-by: Malvika Gupta <malvika.gupta@arm.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agodpif-netdev: Split out generic lookup function
Harry van Haaren [Thu, 18 Jul 2019 13:03:04 +0000 (14:03 +0100)]
dpif-netdev: Split out generic lookup function

This commit splits the generic hash-lookup-verify function to
its own file, for cleaner seperation between optimized versions.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Tested-by: Malvika Gupta <malvika.gupta@arm.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agodpif-netdev: Move dpcls lookup structures to .h
Harry van Haaren [Thu, 18 Jul 2019 13:03:03 +0000 (14:03 +0100)]
dpif-netdev: Move dpcls lookup structures to .h

This commit moves some data-structures to be available
in the dpif-netdev-private.h header. This allows specific
implementations of the subtable lookup function to include
just that header file, and not require that the code exists
in dpif-netdev.c

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Tested-by: Malvika Gupta <malvika.gupta@arm.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agodpif-netdev: Implement function pointers/subtable
Harry van Haaren [Thu, 18 Jul 2019 13:03:02 +0000 (14:03 +0100)]
dpif-netdev: Implement function pointers/subtable

This allows plugging-in of different subtable hash-lookup-verify
routines, and allows special casing of those functions based on
known context (eg: # of bits set) of the specific subtable.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Tested-by: Malvika Gupta <malvika.gupta@arm.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agoovsdb-idl: Mark row "parsed" in ovsdb_idl_txn_write__
Dumitru Ceara [Wed, 17 Jul 2019 19:05:04 +0000 (21:05 +0200)]
ovsdb-idl: Mark row "parsed" in ovsdb_idl_txn_write__

Once a column is set in a row using ovsdb_idl_txn_write__ we also mark
the row "parsed". Otherwise the memory allocated by
sbrec_<table>_parse_<col>() will never be freed. After marking the row
"parsed", the ovsdb_idl_txn_disassemble function will properly free the
newly allocated memory for the column (ovsdb_idl_row_unparse).

The problem is present only for rows that are inserted by the
running application because rows that are loaded from the database
will always have row->parsed == true.

One way to detect the leak is to start northd with valgrind:

valgrind --tool=memcheck --leak-check=yes ./ovn-northd

Then create a logical switch and bind a logical port to it:

ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 ls1-vm1

The valgrind report:

==9270== Memcheck, a memory error detector
==9270== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9270== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright
info
==9270== Command: ./ovn-northd
==9270==

<snip>

==9270==
==9270== 8 bytes in 1 blocks are definitely lost in loss record 30 of
292
==9270==    at 0x4C29BC3: malloc (vg_replace_malloc.c:299)
==9270==    by 0x4D31EF: xmalloc (util.c:138)
==9270==    by 0x45CB8E: sbrec_multicast_group_parse_ports (ovn-sb-idl.c:18141)
==9270==    by 0x4BB12D: ovsdb_idl_txn_write__ (ovsdb-idl.c:4489)
==9270==    by 0x4BB1B5: ovsdb_idl_txn_write (ovsdb-idl.c:4527)
==9270==    by 0x45D167: sbrec_multicast_group_set_ports (ovn-sb-idl.c:18561)
==9270==    by 0x40F416: ovn_multicast_update_sbrec (ovn-northd.c:2947)
==9270==    by 0x41FC55: build_lflows (ovn-northd.c:7981)
==9270==    by 0x421830: ovnnb_db_run (ovn-northd.c:8522)
==9270==    by 0x422B2D: ovn_db_run (ovn-northd.c:9089)
==9270==    by 0x423909: main (ovn-northd.c:9409)
==9270==
==9270== 157 (32 direct, 125 indirect) bytes in 1 blocks are definitely
lost in loss record 199 of 292
==9270==    at 0x4C29BC3: malloc (vg_replace_malloc.c:299)
==9270==    by 0x4D31EF: xmalloc (util.c:138)
==9270==    by 0x471E3D: resize (hmap.c:100)
==9270==    by 0x4720C8: hmap_expand_at (hmap.c:175)
==9270==    by 0x4C74F1: hmap_insert_at (hmap.h:277)
==9270==    by 0x4C825A: smap_add__ (smap.c:392)
==9270==    by 0x4C7783: smap_add (smap.c:55)
==9270==    by 0x451054: sbrec_datapath_binding_parse_external_ids (ovn-sb-idl.c:7181)
==9270==    by 0x4BB12D: ovsdb_idl_txn_write__ (ovsdb-idl.c:4489)
==9270==    by 0x4BB1B5: ovsdb_idl_txn_write (ovsdb-idl.c:4527)
==9270==    by 0x451436: sbrec_datapath_binding_set_external_ids (ovn-sb-idl.c:7444)
==9270==    by 0x4090F1: ovn_datapath_update_external_ids (ovn-northd.c:817)
==9270==

<snip>

==9270==
==9270== LEAK SUMMARY:
==9270==    definitely lost: 1,322 bytes in 47 blocks
==9270==    indirectly lost: 4,653 bytes in 240 blocks
==9270==      possibly lost: 0 bytes in 0 blocks
==9270==    still reachable: 254,004 bytes in 7,780 blocks
==9270==         suppressed: 0 bytes in 0 blocks
==9270== Reachable blocks (those to which a pointer was found) are not
shown.
==9270== To see them, rerun with: --leak-check=full
--show-leak-kinds=all
==9270==
==9270== For counts of detected and suppressed errors, rerun with: -v
==9270== ERROR SUMMARY: 9 errors from 9 contexts (suppressed: 0 from 0)

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodoc: Remove experimental tag for SMC cache.
Yipeng Wang [Wed, 17 Jul 2019 11:38:57 +0000 (04:38 -0700)]
doc: Remove experimental tag for SMC cache.

SMC cache was introduced in 2.10 with experimental tag.
SMC cache is a layer of software cache located after EMC
cache. The purpose is to improve the performance of use
cases that many flows missing the EMC cache.

One can enable SMC cache using smc-enable=true option.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agoflow: Avoid unsafe comparison of minimasks.
Ben Pfaff [Wed, 17 Jul 2019 17:55:16 +0000 (10:55 -0700)]
flow: Avoid unsafe comparison of minimasks.

The following, run inside the OVS sandbox, caused OVS to abort when
Address Sanitizer was used:

    ovs-vsctl add-br br-int
    ovs-ofctl add-flow br-int "table=0,cookie=0x1234,priority=10000,icmp,actions=drop"
    ovs-ofctl --strict del-flows br-int "table=0,cookie=0x1234/-1,priority=10000"

Sample report from Address Sanitizer:

==3029==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000043260 at pc 0x7f6b09c2459b bp 0x7ffcb67e7540 sp 0x7ffcb67e6cf0
READ of size 40 at 0x603000043260 thread T0
    #0 0x7f6b09c2459a  (/lib/x86_64-linux-gnu/libasan.so.5+0xb859a)
    #1 0x565110a748a5 in minimask_equal ../lib/flow.c:3510
    #2 0x565110a9ea41 in minimatch_equal ../lib/match.c:1821
    #3 0x56511091e864 in collect_rules_strict ../ofproto/ofproto.c:4516
    #4 0x56511093d526 in delete_flow_start_strict ../ofproto/ofproto.c:5959
    #5 0x56511093d526 in ofproto_flow_mod_start ../ofproto/ofproto.c:7949
    #6 0x56511093d77b in handle_flow_mod__ ../ofproto/ofproto.c:6122
    #7 0x56511093db71 in handle_flow_mod ../ofproto/ofproto.c:6099
    #8 0x5651109407f6 in handle_single_part_openflow ../ofproto/ofproto.c:8406
    #9 0x5651109407f6 in handle_openflow ../ofproto/ofproto.c:8587
    #10 0x5651109e40da in ofconn_run ../ofproto/connmgr.c:1318
    #11 0x5651109e40da in connmgr_run ../ofproto/connmgr.c:355
    #12 0x56511092b129 in ofproto_run ../ofproto/ofproto.c:1826
    #13 0x5651108f23cd in bridge_run__ ../vswitchd/bridge.c:2965
    #14 0x565110904887 in bridge_run ../vswitchd/bridge.c:3023
    #15 0x5651108e659c in main ../vswitchd/ovs-vswitchd.c:127
    #16 0x7f6b093b709a in __libc_start_main ../csu/libc-start.c:308
    #17 0x5651108e9009 in _start (/home/blp/nicira/ovs/_build/vswitchd/ovs-vswitchd+0x11d009)

This fixes the problem, which although largely theoretical could crop up
with odd implementations of memcmp(), perhaps ones optimized in various
"clever" ways.  All in all, it seems best to avoid the theoretical problem.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoAUTHORS: Add Michele Baldessari.
Ben Pfaff [Wed, 17 Jul 2019 18:12:10 +0000 (11:12 -0700)]
AUTHORS: Add Michele Baldessari.

Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoOVN resource agent - make promotion synchronous
Michele Baldessari [Tue, 9 Jul 2019 07:02:57 +0000 (09:02 +0200)]
OVN resource agent - make promotion synchronous

Currently inside the ovsdb_server_promote() function we call 'promote_ovnnb'
and 'promote_ovnsb' and then just record the new master state in the
CIB.

This creates a race because those two promote commands are asynchronous
so when we exit the ovsdb_server_promote() function the underlying DBs
are not guaranteed to be in master state. That means that clients might
connect to an instance that is in read-only mode.

We add a simple sleep loop where we wait for the underlying DB state to
confirm the master state. We do not need to add a timeout loop because
in case of an issue the resource timeout set within pacemaker will kick
in and the resource agent script will be killed by pacemaker.

Tested this within an openstack environment using ovn with roughly ~20
reboots and was unable to trigger the issue (before the patch we would
trigger the issue after a couple of reboots tops).

Acked-by: Numan Siddique <nusiddiq@redhat.com>
Acked-By: Daniel Alvarez <dalvarez@redhat.com>
Signed-off-by: Michele Baldessari <michele@acksyn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoofp-flow: Improve error message when cookie cannot be set.
Ben Pfaff [Tue, 16 Jul 2019 17:18:18 +0000 (10:18 -0700)]
ofp-flow: Improve error message when cookie cannot be set.

The "cookie" value has two meanings in "ovs-ofctl add-flow", etc.  With
a mask, it indicates a match; without a mask, it indicates that the
cookie should be set.  In some cases, the cookie cannot be set, which may
mean that the user meant to indicate a match.  The error message for this
case was poor; this improves it.

Suggested-by: "Yi Yang (杨燚)-云服务集团" <yangyi01@inspur.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoNEWS: Mention IGMP support for OVN
Dumitru Ceara [Wed, 17 Jul 2019 12:15:24 +0000 (14:15 +0200)]
NEWS: Mention IGMP support for OVN

NEWS update was missed while adding the IGMP code.

Fixes: 605535f9adf2 ("OVN: Add ovn-northd IGMP support")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoodp-util: Fix NSH mask parsing.
Ilya Maximets [Mon, 15 Jul 2019 13:31:11 +0000 (16:31 +0300)]
odp-util: Fix NSH mask parsing.

'odp_flow_key_to_flow__' is used to parse both keys and masks.
However, 'odp_nsh_key_from_attr' expects only 'key' as an argument
and fails to parse masks with ODP_FIT_ERROR which causes userspace
system tests failures:

 # make check-system-userspace TESTSUITEFLAGS='-k nsh'

 |odp_util(revalidator)|WARN|
 OVS_NSH_KEY_ATTR_MD1 present but declared mdtype 0 is not 1 (NSH_M_TYPE1)

 |odp_util(revalidator)|WARN|
 the flow mask in error is:
     <...> nsh(flags=0ttl=0,mdtype=0,np=255,spi=0x0,si=0),
 for the following flow key:
     <...> nsh_ttl=8,nsh_mdtype=1,nsh_np=3,nsh_spi=0x64,nsh_si=3,
     nsh_c1=0x1020304,nsh_c2=0x5060708,nsh_c3=0x90a0b0c,nsh_c4=0xd0e0f10
     <...>

Fix that by passing the additional argument 'is_mask' to make it be
like all other parsing functions.

Additionally fixed missing comma in the 'format_nsh_key'.

CC: Yi Yang <yi.y.yang@intel.com>
Fixes: f59cb331c481 ("nsh: rework NSH netlink keys and actions")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: William Tu <u9012063@gmail.com>
4 years agoflow: Wildcard UDP ports when using SYMMETRIC_L4 hash for select groups.
Vishal Deep Ajmera [Wed, 10 Jul 2019 13:32:30 +0000 (19:02 +0530)]
flow: Wildcard UDP ports when using SYMMETRIC_L4 hash for select groups.

UDP source and destination ports are not used to derive the hash index
used for selecting the bucket in case of SYMMETRIC_L4 hash based select
groups. However, they are un-wildcarded in the megaflow entry match criteria.
This results in distinct megaflow entry being created for each pair of UDP
source and destination ports unnecessarily and causes significant performance
deterioration when the megaflow cache limit is reached.

This patch wildcards UDP ports when using select group with SYMMETRIC_L4
hash function.

Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
CC: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoOVN: Add ovn-northd IGMP support
Dumitru Ceara [Mon, 15 Jul 2019 20:25:03 +0000 (22:25 +0200)]
OVN: Add ovn-northd IGMP support

New IP Multicast Snooping Options are added to the Northbound DB
Logical_Switch:other_config column. These allow enabling IGMP snooping and
querier on the logical switch and get translated by ovn-northd to rows in
the IP_Multicast Southbound DB table.

ovn-northd monitors for changes done by ovn-controllers in the Southbound DB
IGMP_Group table. Based on the entries in IGMP_Group ovn-northd creates
Multicast_Group entries in the Southbound DB, one per IGMP_Group address X,
containing the list of logical switch ports (aggregated from all controllers)
that have IGMP_Group entries for that datapath and address X. ovn-northd
also creates a logical flow that matches on IP multicast traffic destined
to address X and outputs it on the tunnel key of the corresponding
Multicast_Group entry.

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoOVN: Add IGMP SB definitions and ovn-controller support
Dumitru Ceara [Mon, 15 Jul 2019 20:24:49 +0000 (22:24 +0200)]
OVN: Add IGMP SB definitions and ovn-controller support

A new IP_Multicast table is added to Southbound DB. This table stores the
multicast related configuration for each datapath. Each row will be
populated by ovn-northd and will control:
- if IGMP Snooping is enabled or not, the snooping table size and multicast
  group idle timeout.
- if IGMP Querier is enabled or not (only if snooping is enabled too), query
  interval, query source addresses (Ethernet and IP) and the max-response
  field to be stored in outgoing queries.
- an additional "seq_no" column is added such that ovn-sbctl or if needed a
  CMS can flush currently learned groups. This can be achieved by incrementing
  the "seq_no" value.

A new IGMP_Group table is added to Southbound DB. This table stores all the
multicast groups learned by ovn-controllers. The table is indexed by
datapath, group address and chassis. For a learned multicast group on a
specific datapath each ovn-controller will store its own row in this table.
Each row contains the list of chassis-local ports on which the group was
learned. Rows in the IGMP_Group table are updated or deleted only by the
ovn-controllers that created them.

A new action ("igmp") is added to punt IGMP packets on a specific logical
switch datapath to ovn-controller if IGMP snooping is enabled.

Per datapath IGMP multicast snooping support is added to pinctrl:
- incoming IGMP reports are processed and multicast groups are maintained
  (using the OVS mcast-snooping library).
- each OVN controller syncs its in-memory IGMP groups to the Southbound DB
  in the IGMP_Group table.
- pinctrl also sends periodic IGMPv3 general queries for all datapaths where
  querier is enabled.

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Co-authored-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agopackets: Add IGMPv3 query packet definitions
Dumitru Ceara [Mon, 15 Jul 2019 20:24:37 +0000 (22:24 +0200)]
packets: Add IGMPv3 query packet definitions

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovn-northd: Fix the ovn-northd continuous looping
Numan Siddique [Tue, 16 Jul 2019 18:01:10 +0000 (23:31 +0530)]
ovn-northd: Fix the ovn-northd continuous looping

ovn-northd wakes up continuously from poll_block(). This issue can be reproduced
in the sandbox with the below commands

ovn-nbctl lr-add lr0
ovn-nbctl ls-add public
ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13 172.168.0.100/24
ovn-nbctl lsp-add public public-lr0
ovn-nbctl lsp-set-type public-lr0 router
ovn-nbctl lsp-set-addresses public-lr0 router
ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public
ovn-nbctl lrp-set-gateway-chassis lr0-public chassis-1 20

This issue is seen after the commit [1], which makes use of the function -
sbrec_port_binding_update_nat_addresses_addvalue() to add a value to
Port_Binding.nat_addresses column.

Looks like the IDL client code is sending the transactions to the ovsdb-server repeatedly
to update the Port_Binding.nat_addresses even though the Southbound DB has updated
the column when this function is used. The actual bug seems to be in the IDL client code
and that needs to be fixed. This patch as a quick fix, fixes ovn-northd's continuous loop
by not using this function, instead making use of sbrec_port_binding_set_nat_addresses().

The below messages are seen continuously when the ovn-nortdh debug logs are enabled.

****

2019-07-12T17:26:13.837Z|74512|jsonrpc|DBG|unix:sb1.ovsdb: received reply,
result=[{},{"count":1},{"count":1}], id=18628
2019-07-12T17:26:13.837Z|74513|poll_loop|DBG|wakeup due to 0-ms timeout at ../lib/ovsdb-idl.c:5397 (75% CPU usage)
2019-07-12T17:26:13.837Z|74514|jsonrpc|DBG|unix:sb1.ovsdb: send request,
method="transact", params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"},
{"where":[["_uuid","==",["uuid","56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],"row":
{"nat_addresses":["set",[]]},"op":"update","table":"Port_Binding"},{"mutations":[["nat_addresses",
"insert",["set",["00:00:20:20:12:13 172.168.0.100 is_chassis_resident(\"cr-lr0-public\")"]]]],
"where":[["_uuid","==",["uuid","56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],"op":"mutate","table":"Port_Binding"}], id=18629

2019-07-12T17:26:13.837Z|74516|jsonrpc|DBG|unix:sb1.ovsdb: received reply, result=[{},{"count":1},{"count":1}], id=18629
2019-07-12T17:26:13.837Z|74517|poll_loop|DBG|wakeup due to 0-ms timeout at ../lib/ovsdb-idl.c:5397 (75% CPU usage)
2019-07-12T17:26:13.837Z|74518|jsonrpc|DBG|unix:sb1.ovsdb: send request,
method="transact", params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"},
{"where":[["_uuid","==",["uuid","56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],
"row":{"nat_addresses":["set",[]]},"op":"update","table":"Port_Binding"},
{"mutations":[["nat_addresses","insert",["set",["00:00:20:20:12:13 172.168.0.100
is_chassis_resident(\"cr-lr0-public\")"]]]],"where":[["_uuid","==",["uuid",
"56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],"op":"mutate","table":"Port_Binding"}], id=18630
2019-07-12T17:26:13.837Z|74520|jsonrpc|DBG|unix:sb1.ovsdb: received reply, result=[{},{"count":1},{"count":1}], id=18630
******

The OpenStack CI tests for networking-ovn is frequently failing few tests after this
commit. The failure seems to be related to timing issues as ovn-northd is hogging
the CPU continuously. We are also seeing travis CI test failures after this commit.

[1] - ed198fb3b92e

Fixes: ed198fb3b92e("ovn: Send GARP for the router ports with reside-on-redirect-chassis options set")
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovs-macros: An option to suspend test execution on error
Vasu Dasari [Mon, 15 Jul 2019 21:15:01 +0000 (17:15 -0400)]
ovs-macros: An option to suspend test execution on error

Origins for this patch are captured at
https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048923.html.

Summarizing here, when a test fails, it would be good to pause test execution
and let the developer poke around the system to see current status of system.

As part of this patch, made a small tweaks to ovs-macros.at, so that when test
suite fails, ovs_on_exit() function will be called. And in this function, a check
is made to see if an environment variable to OVS_PAUSE_TEST is set. If it is
set, then test suite is paused and will continue to wait for user input
Ctrl-D. Meanwhile user can poke around the system to see why test case has
failed. Once done with investigation, user can press ctrl-d to cleanup the
test suite.

For example, to re-run test case 139:

export OVS_PAUSE_TEST=1
cd tests/system-userspace-testsuite.dir/139
sudo -E ./run

When error occurs, above command would display something like this:
=====================================================
Set environment variable to use various ovs utilities
export OVS_RUNDIR=/opt/vdasari/Developer/ovs/_build-gcc/tests/system-userspace-testsuite.dir/139
Press ENTER to continue:

=====================================================
And from another window, one can execute ovs-xxx commands like:
export OVS_RUNDIR=/opt/vdasari/Developer/ovs/_build-gcc/tests/system-userspace-testsuite.dir/139
$ ovs-ofctl dump-ports br0
.
.

To be able to pause while performing `make check`, one can do:
$ OVS_PAUSE_TEST=1 make check TESTSUITEFLAGS='-v'

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Vasu Dasari <vdasari@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoOVN: use trigger_event action to report 'empty_lb_rule' events
Lorenzo Bianconi [Thu, 11 Jul 2019 15:48:45 +0000 (17:48 +0200)]
OVN: use trigger_event action to report 'empty_lb_rule' events

Add northd logical flows in order to reports that the controller
received an IP packet for LB rule witn no backends.
This configuration is used by OpenShift to spin up a idle POD

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Co-authored-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoOVN: introduce trigger_event() action
Lorenzo Bianconi [Thu, 11 Jul 2019 15:48:44 +0000 (17:48 +0200)]
OVN: introduce trigger_event() action

Add trigger_event() ovn action in order to allow ovs-vswitchd to report
CMS related events.
This commit introduces a new event, empty_lb_backends. This event is
raised if a received packet is destined for a load balancer VIP that has
no configured backend destinations. For this event, the event info
includes the load balancer VIP, the load balancer UUID, and the
transport protocol.
The use case for this particular event is for the CMS to supply backend
resources to handle this traffic. For example, in Openshift, this event
can be used to spin up new containers to handle the incoming traffic.

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Co-authored-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoOVN: introduce Controller_Event table
Lorenzo Bianconi [Thu, 11 Jul 2019 15:48:43 +0000 (17:48 +0200)]
OVN: introduce Controller_Event table

Add Controller_Event table to OVN SBDB in order to
report CMS related event.
Introduce event_table hashmap array and controller_event related
structures to ovn-controller in order to track pending events
forwarded by ovs-vswitchd. Moreover integrate event_table hashmap
array with event_table ovn-sbdb table

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Co-authored-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoShutdown SSL connection before closing socket
Terry Wilson [Thu, 11 Jul 2019 13:00:20 +0000 (08:00 -0500)]
Shutdown SSL connection before closing socket

Without shutting down the SSL connection, log messages like:

stream_ssl|WARN|SSL_read: unexpected SSL connection close
jsonrpc|WARN|ssl:127.0.0.1:47052: receive error: Protocol error
reconnect|WARN|ssl:127.0.0.1:47052: connection dropped (Protocol error)

would occur whenever the socket is closed. This just adds an
SSLStream.close() that calls shutdown() and ignores SSL errors, the
same way that lib/stream-ssl.c does in ssl_close().

Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agotests: Add ovsdb-cluster-testsuite to gitignore.
Ilya Maximets [Fri, 12 Jul 2019 16:15:27 +0000 (19:15 +0300)]
tests: Add ovsdb-cluster-testsuite to gitignore.

CC: Han Zhou <hzhou8@ebay.com>
Fixes: 2bcb3b7052c8 ("ovsdb raft: Move ovsdb cluster tests to separate testsuite.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Han Zhou <hzhou8@ebay.com>
4 years agocheckpatch: Check FOR_EACH loops with numbers.
Ilya Maximets [Fri, 12 Jul 2019 12:57:02 +0000 (15:57 +0300)]
checkpatch: Check FOR_EACH loops with numbers.

OVS has defines for loops like 'BITMAP_FOR_EACH_1' or
'ULLONG_FOR_EACH_1', but the regexp in checkpatch doesn't match with
numbers and skips these loops while checking.

This patch adds numbers into regexp and adds some FER_EACH loops to
the unit tests.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
4 years agoovn: Fix the test failures in travis CI.
Numan Siddique [Thu, 11 Jul 2019 16:00:33 +0000 (21:30 +0530)]
ovn: Fix the test failures in travis CI.

After the commit [1], below test cases are failing repeatedly in travis CI.

2663: ovn -- 4 HV, 1 LS, 1 LR, packet test with HA distributed router gateway port FAILED (ovn.at:8597)
2664: ovn -- 4 HV, 3 LS, 2 LR, packet test with HA distributed router gateway port FAILED (ovn.at:8844)
2667: ovn -- vlan traffic for external network with distributed router gateway port FAILED (ovn.at:9580)
2691: ovn -- router - check packet length - icmp defrag FAILED (ovn.at:13624)

With the commit [1], ovn-controller sends GARPs for the IPs of the distributed
router ports. The failing tests did not handle the situation if multiple GARPs
are sent. The failures are mostly timing related. This patch fixes these issues.

[1] - d65586b6fa97 ("ovn: Send GARP for router port IPs of a router port connected to bridged logical switch")

Fixes: d65586b6fa97 ("ovn: Send GARP for router port IPs of a router port connected to bridged logical switch")
CC: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agonetdev-vport: Make ip6gre netdev type to use TC rules
Eli Britstein [Thu, 4 Jul 2019 07:36:42 +0000 (07:36 +0000)]
netdev-vport: Make ip6gre netdev type to use TC rules

The offload api functions already assigned to every tunnel class.
For ip6gre tunnel class only need to also assign the get_ifindex
function, similarly as done in commit 5e63eaa969a3 ("netdev-vport: Make
gre netdev type to use TC rules").

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>