git.proxmox.com Git - mirror_ovs.git/log
5 years ago coverage: Add command for reading counter value.
Jakub Sitnicki [Fri, 17 May 2019 19:56:39 +0000 (12:56 -0700)]
coverage: Add command for reading counter value.

From: Jakub Sitnicki <jkbs@redhat.com>

Facilitate checking coverage counters from scripts and tests with a new
coverage/read-counter command that gets the total count for a counter.

The same could be achieved by scraping the output of the coverage/show
command, but that output is in a human-readable format and does not
list zero-value counters.
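
A minimal sketch of how a test script might use the new command. It
assumes that `ovs-appctl coverage/read-counter <name>` prints the total
as a bare integer; the helper names `read_counter` and `_run_appctl`, and
the injectable `run` parameter, are hypothetical and exist only to keep
the example testable without a running daemon.

```python
import subprocess
from typing import Callable, Sequence


def _run_appctl(argv: Sequence[str]) -> str:
    """Run a command and return its stdout (hypothetical default runner)."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def read_counter(name: str,
                 run: Callable[[Sequence[str]], str] = _run_appctl) -> int:
    """Return the total count for one coverage counter.

    Assumption: the command prints the total as a single integer, so no
    scraping of the human-readable coverage/show output is needed.
    """
    out = run(["ovs-appctl", "coverage/read-counter", name])
    return int(out.strip())
```

Unlike scraping coverage/show, a script built this way gets a value even
for a counter that is still zero.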

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Acked-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn-controller: Incremental processing for port-group changes.
Han Zhou [Fri, 17 May 2019 19:56:38 +0000 (12:56 -0700)]
ovn-controller: Incremental processing for port-group changes.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn-controller: Split port_groups from runtime_data.
Han Zhou [Fri, 17 May 2019 19:56:37 +0000 (12:56 -0700)]
ovn-controller: Split port_groups from runtime_data.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn-controller: Incremental processing for address-set changes.
Han Zhou [Fri, 17 May 2019 19:56:36 +0000 (12:56 -0700)]
ovn-controller: Incremental processing for address-set changes.

When the content of an address set changes, ovn-controller will
not recompute all flows but only the ones related to the changed
address-set. The performance test result is discussed at [1].

[1] https://mail.openvswitch.org/pipermail/ovs-discuss/2018-June/046880.html

Tested-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn-controller: Maintain resource references for logical flows.
Han Zhou [Fri, 17 May 2019 19:56:35 +0000 (12:56 -0700)]
ovn-controller: Maintain resource references for logical flows.

This patch maintains the cross reference between logical flows and
the resources such as address sets and port groups that are used by
logical flows. This data will be needed in address set and port
group incremental processing.
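
The cross reference described above can be pictured as a two-way index.
This is a sketch only, not the data structure used in ovn-controller: the
class `LflowResourceRefs`, its methods and the string identifiers are all
hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Resource = Tuple[str, str]  # (kind, name), e.g. ("address_set", "as1")


class LflowResourceRefs:
    """Two-way index between logical flows and the resources they use."""

    def __init__(self) -> None:
        self._by_resource: Dict[Resource, Set[str]] = defaultdict(set)
        self._by_lflow: Dict[str, Set[Resource]] = defaultdict(set)

    def add(self, lflow: str, kind: str, name: str) -> None:
        """Record that `lflow` references resource (kind, name)."""
        self._by_resource[(kind, name)].add(lflow)
        self._by_lflow[lflow].add((kind, name))

    def remove_lflow(self, lflow: str) -> None:
        """Drop every reference held by a deleted logical flow."""
        for res in self._by_lflow.pop(lflow, set()):
            self._by_resource[res].discard(lflow)
            if not self._by_resource[res]:
                del self._by_resource[res]

    def lflows_using(self, kind: str, name: str) -> Set[str]:
        """Flows to reprocess when resource (kind, name) changes."""
        return set(self._by_resource.get((kind, name), ()))
```

When an address set or port group changes, only the flows returned by
`lflows_using` need to be reprocessed, not the whole flow table.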

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn-controller: Split addr_sets from runtime_data.
Han Zhou [Fri, 17 May 2019 19:56:34 +0000 (12:56 -0700)]
ovn-controller: Split addr_sets from runtime_data.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovsdb-idl: Tracking - preserve data for deleted rows.
Han Zhou [Fri, 17 May 2019 19:56:33 +0000 (12:56 -0700)]
ovsdb-idl: Tracking - preserve data for deleted rows.

OVSDB IDL can track changes, but for deleted rows, the data is
destroyed and only uuid is tracked. In some cases we need to
check the data of the deleted rows. This patch preserves data
for deleted rows until track clear is called.
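
A toy model of the behavior, assuming nothing about the real IDL
internals: the class `TrackedTable` and its methods are hypothetical and
only illustrate "keep the data of a deleted row until track clear".

```python
from typing import Dict


class TrackedTable:
    """Toy table that tracks deletions and keeps the deleted row data."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}
        self.deleted: Dict[str, dict] = {}  # uuid -> preserved row data

    def insert(self, row_uuid: str, data: dict) -> None:
        self.rows[row_uuid] = dict(data)

    def delete(self, row_uuid: str) -> None:
        # Move the row to the tracked set instead of destroying its data.
        self.deleted[row_uuid] = self.rows.pop(row_uuid)

    def track_clear(self) -> None:
        # Only now is the data of deleted rows released.
        self.deleted.clear()
```

A change handler can then inspect `deleted[uuid]` to see what the row
contained, which a bare uuid would not allow.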

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn-controller: incremental processing for multicast group changes
Han Zhou [Fri, 17 May 2019 19:56:32 +0000 (12:56 -0700)]
ovn-controller: incremental processing for multicast group changes

This patch handles SB Multicast_Group table changes incrementally.
The Multicast_Group table changes can be triggered by creating/deleting
a lport of a lswitch. They can also be triggered indirectly by an
update to a port-binding that is referenced by the multicast
group.

This patch together with previous incremental processing engine
related changes supports incremental processing for lflow changes
and port-binding changes of lports on other HVs, which are the most
common scenarios in a cloud where workloads come up and down.

In ovn-scale-test env [1], the total execution time of creating and
binding 10k ports on 1k HVs with 40 lswitches and 8 lrouters
(5 lswitches/lrouter) decreased from 3h40m to 1h50m because of the
lower CPU usage on HVs. The CPU time of ovn-controller for creating and
binding an additional 500 lports (on top of the 10k existing lports)
decreased by 90% compared with master.

Latency for end-to-end operations of one extra port on top of the
10k lports, from port creation until all flows are installed
on all related HVs, is also improved significantly:

before: 20.6s in total
- lsp-add: 0.4s
- wait-until port up=true: 4.8s
- --wait=hv sync: 15.4s

after: 7.3s in total
- lsp-add: 0.4s
- wait-until port up=true: 4.0s
- --wait=hv sync: 2.9s
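
As a quick check of the figures above, the per-step times can be summed
and compared. The step names and values are copied from the two lists;
the helper names `total` and `reduction` are ours.

```python
# Per-step latencies in seconds, copied from the commit message.
before = {"lsp-add": 0.4, "wait-until port up=true": 4.8, "--wait=hv sync": 15.4}
after = {"lsp-add": 0.4, "wait-until port up=true": 4.0, "--wait=hv sync": 2.9}


def total(steps: dict) -> float:
    """Sum the step latencies, rounded to the 0.1s precision of the data."""
    return round(sum(steps.values()), 1)


def reduction(old: float, new: float) -> float:
    """Fractional reduction going from `old` to `new`."""
    return 1.0 - new / old


total_cut = reduction(total(before), total(after))
sync_cut = reduction(before["--wait=hv sync"], after["--wait=hv sync"])
```

The sums match the stated totals (20.6s and 7.3s). The overall latency
drops by roughly 65%, and the `--wait=hv sync` step alone by roughly 81%,
which is where almost all of the gain comes from.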

[1] https://github.com/openvswitch/ovn-scale-test

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn-controller: port-binding incremental processing for physical flows
Han Zhou [Fri, 17 May 2019 19:56:31 +0000 (12:56 -0700)]
ovn-controller: port-binding incremental processing for physical flows

This patch implements a change handler for port-binding in flow_output
for physical flow computation, so that physical flow computation will
be incremental.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn-controller: runtime_data change handler for SB port-binding
Han Zhou [Fri, 17 May 2019 19:56:30 +0000 (12:56 -0700)]
ovn-controller: runtime_data change handler for SB port-binding

Evaluates changes to SB port-binding in the runtime_data node.
If a port-binding change has no impact on runtime_data, it will
not trigger a runtime_data change.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn-controller: Incremental logical flow processing
Han Zhou [Fri, 17 May 2019 19:56:29 +0000 (12:56 -0700)]
ovn-controller: Incremental logical flow processing

Implements change handler of flow_output for SB lflow changes.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn-controller: Initial use of incremental engine - quiet mode.
Han Zhou [Fri, 17 May 2019 19:56:28 +0000 (12:56 -0700)]
ovn-controller: Initial use of incremental engine - quiet mode.

The incremental processing engine is used to compute flows. In this
patch we create the following engine nodes:
    - Engine nodes for each OVSDB table in local OVS DB and SB DB.
    - runtime_data: compute and maintain intermediate result such
                    as local_datapath, etc.
    - mff_ovn_geneve: MFF_* field ID for our Geneve option, which
                      is provided by switch.
    - flow_output: compute and maintain computed flow table.

In this patch the ovn flow table is persistent across main loop
iterations, and a new index of SB uuid is maintained for the
desired flow table, which will be useful for next patches for
incremental processing.

This patch doesn't do any incremental processing yet, but it achieves
the "quiet mode": the flow computation won't be triggered by unrelated
events, such as pinctrl/ofctrl messages. The flow computation
(full compute) happens only when any of its related inputs are
changed.
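
The "quiet mode" described above can be sketched as a small dependency
graph in which a node is recomputed only when one of its inputs changed.
This is an illustration under our own assumptions, not the engine's real
API: `EngineNode`, `engine_run` and the `changed` flag are hypothetical.

```python
class EngineNode:
    """One node of the dependency graph (names here are illustrative)."""

    def __init__(self, name, inputs=(), compute=None):
        self.name = name
        self.inputs = list(inputs)
        self.compute = compute   # full recompute; None for input (table) nodes
        self.changed = False     # input nodes: set by whoever tracks the table
        self.data = None
        self.compute_count = 0


def engine_run(root):
    """One main-loop iteration: recompute a node only if an input changed."""
    visited = {}

    def visit(node):
        if id(node) in visited:
            return
        visited[id(node)] = node
        for inp in node.inputs:
            visit(inp)
        if node.compute is not None:
            node.changed = any(inp.changed for inp in node.inputs)
            if node.changed:
                node.data = node.compute(node.inputs)
                node.compute_count += 1

    visit(root)
    for node in visited.values():
        node.changed = False     # change flags last for one iteration only
```

An iteration woken up by an unrelated event leaves every `changed` flag
unset, so `flow_output` is not recomputed. Each recompute is still a full
one; only the trigger is selective.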

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn-controller: Track OVSDB changes
Han Zhou [Fri, 17 May 2019 19:56:27 +0000 (12:56 -0700)]
ovn-controller: Track OVSDB changes

Track OVSDB changes for future patches of incremental processing

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn-controller: Incremental processing engine
Han Zhou [Fri, 17 May 2019 19:56:26 +0000 (12:56 -0700)]
ovn-controller: Incremental processing engine

This patch implements the engine which will be used in future patches
for ovn-controller incremental processing.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ofproto-dpif-xlate: Add "always" mode to priority tags
Eli Britstein [Sun, 12 May 2019 05:51:00 +0000 (05:51 +0000)]
ofproto-dpif-xlate: Add "always" mode to priority tags

Configure "if-nonzero" priority tags to retain the 802.1Q header
when the VLAN ID is zero, except when both the VLAN ID and priority are
zero. Add an "always" configuration option to retain the 802.1Q header in
such frames as well.
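
The three modes can be summarized in a short decision function. This is
a sketch of the behavior as described in these commit messages, not the
ofproto-dpif-xlate code: `PriorityTagsMode`, `parse_priority_tags` and
`retain_priority_tag` are hypothetical names, and rejecting unknown
strings with an exception is our choice for the example.

```python
from enum import Enum


class PriorityTagsMode(Enum):
    NEVER = "never"
    IF_NONZERO = "if-nonzero"
    ALWAYS = "always"


def parse_priority_tags(value: str) -> PriorityTagsMode:
    """Map a configuration string to a mode.

    "true" is the legacy boolean spelling of "if-nonzero".
    """
    if value == "true":
        return PriorityTagsMode.IF_NONZERO
    return PriorityTagsMode(value)  # raises ValueError for unknown strings


def retain_priority_tag(mode: PriorityTagsMode, pcp: int) -> bool:
    """For a frame whose VLAN ID is zero, keep the 802.1Q header or not."""
    if mode is PriorityTagsMode.ALWAYS:
        return True
    if mode is PriorityTagsMode.IF_NONZERO:
        return pcp != 0
    return False
```

The only case where "always" and "if-nonzero" differ is a frame with
both VLAN ID and priority equal to zero.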

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ofproto-dpif-xlate: Change priority tags from boolean to enum
Eli Britstein [Sun, 12 May 2019 05:50:59 +0000 (05:50 +0000)]
ofproto-dpif-xlate: Change priority tags from boolean to enum

Priority tags is a port configuration that determines how the port treats
priority tags, e.g. zero VLAN ID. Change the type from boolean to enum
as a pre-step towards introducing additional modes. The new options are
"never", equivalent to the previous "false", and "if-nonzero",
equivalent to the previous "true". "true" is still supported for backwards
compatibility.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovsdb-client.1: Fix typo.
Justin Pettit [Thu, 23 May 2019 21:07:54 +0000 (14:07 -0700)]
ovsdb-client.1: Fix typo.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
5 years ago conntrack: add display support for sctp
Aaron Conole [Tue, 21 May 2019 18:16:31 +0000 (14:16 -0400)]
conntrack: add display support for sctp

Currently, only the netlink datapath supports SCTP connection tracking,
but at least this removes the warning message that will pop up when
running something like:

   ovs-appctl dpctl/dump-conntrack

This doesn't impact any conntrack functionality, just the display.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago compat: add SCTP netfilter states for older kernels
Aaron Conole [Tue, 21 May 2019 18:16:30 +0000 (14:16 -0400)]
compat: add SCTP netfilter states for older kernels

Bake in the SCTP states from the kernel UAPI.  This means an older
revision of the kernel headers won't interfere with the SCTP display
enhancement.  Additionally, if a newer version is available, or if
x-compiling the datapath module we defer to that version (since this
is just meant to provide the missing definitions).

This will be used in a future commit.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago AUTHORS: Add Liliia Butorina.
Ilya Maximets [Fri, 24 May 2019 13:27:07 +0000 (16:27 +0300)]
AUTHORS: Add Liliia Butorina.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
5 years ago netdev-dpdk: Post-copy Live Migration support for vhost-user-client.
Liliia Butorina [Tue, 14 May 2019 13:08:43 +0000 (16:08 +0300)]
netdev-dpdk: Post-copy Live Migration support for vhost-user-client.

Post-copy Live Migration for vHost has been supported since DPDK 18.11 and
QEMU 2.12. A new global config option 'vhost-postcopy-support' is added
to control this feature. Ex.:

  ovs-vsctl set Open_vSwitch . other_config:vhost-postcopy-support=true

Changing this value requires restarting the daemon. It's safe to
enable this knob even if QEMU doesn't support post-copy LM.

The feature is marked as experimental and disabled by default because it may
cause a PMD thread to hang on the destination host on a page fault while
the page is downloaded from the source.

The feature is not compatible with 'mlockall' and 'dequeue zero-copy'.
Support is added only for vhost-user-client.

Signed-off-by: Liliia Butorina <l.butorina@partner.samsung.com>
Co-authored-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
5 years ago vswitchd: Track status of memory locking.
Ilya Maximets [Tue, 14 May 2019 13:08:42 +0000 (16:08 +0300)]
vswitchd: Track status of memory locking.

Needed for the future post-copy live migration support for
vhost-user ports.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
5 years ago AUTHORS: Add Klemens Nanni.
Ben Pfaff [Thu, 23 May 2019 23:38:51 +0000 (16:38 -0700)]
AUTHORS: Add Klemens Nanni.

Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago sflow, ovn: Typofix: trafic -> traffic
Klemens Nanni [Thu, 23 May 2019 22:53:40 +0000 (00:53 +0200)]
sflow, ovn: Typofix: trafic -> traffic

Spotted at http://docs.openvswitch.org/en/latest/howto/sflow/;
grepping the tree found another instance in ovn.

Signed-off-by: Klemens Nanni <klemens@posteo.de>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn: Properly set the index for chassis lookup
Dumitru Ceara [Thu, 9 May 2019 08:09:23 +0000 (10:09 +0200)]
ovn: Properly set the index for chassis lookup

The chassis_lookup_by_name function now calls
sbrec_chassis_index_set_name instead of sbrec_chassis_set_name. Due to
the use of the wrong API, memory was leaked every time a chassis was
looked up by name. This was mostly visible when chassis lookups had to
be done continuously (e.g., when two chassis were misconfigured
with the same system-id).

Acked-by: Han Zhou <hzhou8@ebay.com>
Reported-at: https://bugzilla.redhat.com/1698462
Reported-by: Alexander <alerom@rambler.ru>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn-macros: Break the OVN macros into their own file.
Justin Pettit [Sat, 6 Apr 2019 16:02:05 +0000 (09:02 -0700)]
ovn-macros: Break the OVN macros into their own file.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn.at: Use ovn cleanup macros.
Justin Pettit [Sat, 6 Apr 2019 15:28:57 +0000 (08:28 -0700)]
ovn.at: Use ovn cleanup macros.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn.at: Clean up northd-backup in "ovn -- ipam" test.
Justin Pettit [Wed, 22 May 2019 23:07:50 +0000 (16:07 -0700)]
ovn.at: Clean up northd-backup in "ovn -- ipam" test.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
5 years ago travis: Fix typo in comment.
Ben Pfaff [Mon, 20 May 2019 22:49:28 +0000 (15:49 -0700)]
travis: Fix typo in comment.

Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reported-by: Numan Siddique <nusiddiq@redhat.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359144.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago travis: Retry kernel download on 503 first byte timeout.
Ilya Maximets [Thu, 16 May 2019 14:58:16 +0000 (17:58 +0300)]
travis: Retry kernel download on 503 first byte timeout.

Sometimes it takes too long for the CDN to reply when downloading
infrequently used kernels.
For example, even on my local PC it fails to download linux-4.19.29
at the first try:

  $ wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.19.29.tar.xz
  Resolving cdn.kernel.org (cdn.kernel.org)... 151.101.1.176
  Connecting to cdn.kernel.org |151.101.1.176|:443... connected.
  HTTP request sent, awaiting response... 503 first byte timeout

It seems that the CDN downloads the tar to a nearby server during that
time, and an instant retry usually succeeds immediately.

503 is not a "fatal error" for wget and, unfortunately, wget on
TravisCI is too old and we can't just use "--retry-on-http-error=503"
to avoid failures in this case. So, let's just retry unconditionally.
Fall back to the direct download if the CDN fails twice.
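
The retry policy amounts to "try the CDN twice, then fall back to the
direct download". A sketch of that control flow, which is not the travis
script itself: `download_with_fallback` and the injectable `fetch`
callback are hypothetical, and the direct-download URL is left as a
parameter because the commit message does not give it.

```python
from typing import Callable


def download_with_fallback(cdn_url: str,
                           direct_url: str,
                           fetch: Callable[[str], bool],
                           cdn_attempts: int = 2) -> str:
    """Return the URL that finally worked.

    `fetch(url)` must return True on success and False on any failure;
    the retry is unconditional, i.e. it does not look at the error code.
    """
    for _ in range(cdn_attempts):
        if fetch(cdn_url):
            return cdn_url
    if fetch(direct_url):
        return direct_url
    raise RuntimeError("download failed from both CDN and direct source")
```

Retrying unconditionally avoids depending on a wget new enough to
support "--retry-on-http-error=503".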

Fixes: ae6e4f12fcab ("travis: Speed up linux kernel downloads.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
5 years ago rhel: Add 4.12 kernel support in ovs-kmod-manage.sh
Yi-Hung Wei [Mon, 20 May 2019 19:39:10 +0000 (12:39 -0700)]
rhel: Add 4.12 kernel support in ovs-kmod-manage.sh

This patch extends c3570519ecaf ("rhel: add 4.4 kernel in kmod build
with mulitple versions, fedora") that updates ovs-kmod-manage.sh to
support SLES 12 SP4 kernel (4.12.x, x>=14).

For some distros, openvswitch-kmod rpm package may contain multiple
ovs kernel modules built against different kernels to deal with kernel
ABI changes and kernel module compatibility issues.  For an rpm that
packages multiple kernel modules, ovs-kmod-manage.sh is invoked
during the rpm post installation stage to 1) select the proper kernel
module to be used; 2) create symbolic links to the proper kernel module
in the weak-updates directory if needed.

For SLES 12 SP4, since the weak-modules utility is not available, even
though there are no ovs-related kernel ABI changes across its
5 currently available kernels from 4.12.14-94.41.1 to 4.12.14-95.16.1,
we still want to invoke ovs-kmod-manage.sh to create weak-updates
symbolic links if the kernel used to build the rpm package is different
from the installed kernel.

Notice that ovs-kmod-manage.sh assumes the oldest compatible kernel
is used to build the kernel module rpm. For example, on SLES 12 SP4
it would be,

$ rpmbuild -bb -D 'kversion 4.12.14-94.41-default' \
    rhel/openvswitch-kmod-fedora.spec

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
5 years ago datapath-windows: Copy mru information when cloning a nbl.
Anand Kumar [Fri, 17 May 2019 21:16:39 +0000 (14:16 -0700)]
datapath-windows: Copy mru information when cloning a nbl.

When a nbl is cloned, the mru value stored in the original nbl
context is lost, which skips refragmenting the cloned nbls.

This patch fixes it.

Signed-off-by: Anand Kumar <kumaranand@vmware.com>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
5 years ago doc: Add CirrusCI status badge to README.
Ilya Maximets [Thu, 16 May 2019 13:40:25 +0000 (16:40 +0300)]
doc: Add CirrusCI status badge to README.

Badge for CirrusCI just like for other CI systems.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
5 years ago doc: Fix cropped what-is-ovs page.
Ilya Maximets [Thu, 16 May 2019 13:40:24 +0000 (16:40 +0300)]
doc: Fix cropped what-is-ovs page.

Despite comments in both files, no one ever adjusted start/end-line
in the 'what-is-ovs' document. As a result, the current document contains
a truncated "tools" section.

Let's replace start/end-line with start-after/end-before, which requires
less attention. Additionally, 'make docs-check' will fail if the specified
lines are not found, i.e. it'll be harder to mess up the docs again.
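
To illustrate why marker-based inclusion needs less attention than line
numbers, here is a sketch that imitates the idea; it is not docutils'
implementation. `extract_between` is a hypothetical helper: it returns
the text after one marker and before the next, and fails loudly when a
marker is missing.

```python
def extract_between(text: str, start_after: str, end_before: str) -> str:
    """Return the part of `text` after `start_after` and before `end_before`.

    Raises ValueError if either marker is missing, so a stale marker is
    caught instead of silently producing a cropped document.
    """
    start = text.find(start_after)
    if start == -1:
        raise ValueError("start marker not found: %r" % start_after)
    start += len(start_after)
    end = text.find(end_before, start)
    if end == -1:
        raise ValueError("end marker not found: %r" % end_before)
    return text[start:end]
```

Adding or removing lines above the start marker does not change the
result, whereas a fixed start/end line range would silently drift.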

"Fixes" tag points to commit that broke the lines first.

Fixes: 602e24ee189b ("doc: Remove experimental warning for DPDK.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Stephen Finucane <stephen@that.guru>
5 years ago cirrus: Disable coredumps on FreeBSD.
Ilya Maximets [Tue, 14 May 2019 16:23:59 +0000 (19:23 +0300)]
cirrus: Disable coredumps on FreeBSD.

Some tests use 'kill -SEGV' to simulate a segfault of a child process.
This causes test failures on CirrusCI because the process hangs in DL state
for more than 10 seconds:

  ./daemon-py.at:69: kill -SEGV $child
  daemon-py.at:69: waiting while kill -0 $child...
  daemon-py.at:69: wait failed after 10 seconds
  ./ovs-macros.at:219: hard failure

Testing shows that on CirrusCI with FreeBSD 11.2 coredump takes 4+
seconds and with FreeBSD 12.0 it takes 8+ seconds for successful runs
and fails the testsuite frequently. It's hard to determine the root
cause, but most probably it happens because of overloaded CirrusCI
community cluster.

Let's just disable coredumps in 'prepare_script'. This does no harm
because we can't take them out of CI anyway.

Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
5 years ago fedora: Handle upgrades from rhel package.
Gurucharan Shetty [Fri, 3 May 2019 06:50:12 +0000 (23:50 -0700)]
fedora: Handle upgrades from rhel package.

Currently we have rhel/openvswitch.spec.in that provides
sysv scripts. The fedora package provides systemd scripts.
If you upgrade the openvswitch package from sysv to systemd,
you will end up in a situation where old OVS daemons are
running, but systemd does not know about them.  One "restart"
is needed for systemd to see the old daemons. Another "restart"
or "force-reload-kmod" is needed to actually use the new
daemons.

This commit just takes care of the first restart. The "real"
restart/force-reload-kmod will still have to be done outside
the package installation.

Signed-off-by: Gurucharan Shetty <guru@ovn.org>
Acked-by: Ansis Atteka <aatteka@ovn.org>
5 years ago fedora: Ability to auto enable openvswitch service.
Gurucharan Shetty [Fri, 3 May 2019 06:38:30 +0000 (23:38 -0700)]
fedora: Ability to auto enable openvswitch service.

We currently have rhel/openvswitch.spec.in that automatically
enables openvswitch service when the package is installed using
chkconfig.

But fedora rpm may not enable openvswitch service automatically.
The macro currently being used in fedora rpm (systemd_post) will
look for preset files in /etc/systemd/system-preset/ to figure
out whether openvswitch service needs to be automatically enabled.
But, the fedora package does not provide such a file. The argument
is that people may want to install the package for binaries and
not necessarily to run OVS.

If someone now wants to install the fedora package and automatically
enable openvswitch, they will have to create a new package that the OVS
package depends on to install the preset file. This is unwieldy.

This commit provides an rpm build-time option to enable the openvswitch
service automatically. If you now run the command below, the openvswitch
service will be automatically enabled during package installation.

make rpm-fedora RPMBUILD_OPT="--with autoenable"

Signed-off-by: Gurucharan Shetty <guru@ovn.org>
Acked-by: Ansis Atteka <aatteka@ovn.org>
5 years ago datapath: Support kernel version 4.19.x and 4.20.x
Yifeng Sun [Fri, 10 May 2019 19:30:14 +0000 (12:30 -0700)]
datapath: Support kernel version 4.19.x and 4.20.x

This patch updates acinclude.m4 so that OVS can be compiled on 4.19.x
and 4.20.x kernels.
This patch also updates the travis files so that the latest kernel versions
are used during travis test builds.

Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago netfilter: Remove useless param helper of nf_ct_helper_ext_add
Gao Feng [Fri, 10 May 2019 19:30:13 +0000 (12:30 -0700)]
netfilter: Remove useless param helper of nf_ct_helper_ext_add

Upstream commit:
    commit 440534d3c56be04abfb26850ee882d19d223557a
    Author: Gao Feng <gfree.wind@vip.163.com>
    Date:   Mon Jul 9 18:06:33 2018 +0800

    netfilter: Remove useless param helper of nf_ct_helper_ext_add

    The param helper of nf_ct_helper_ext_add is useless now, then remove
    it now.

    Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

This patch backports the above upstream patch to OVS.

Cc: Gao Feng <gfree.wind@vip.163.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago openvswitch: use nf_ct_get_tuplepr, invert_tuplepr
Florian Westphal [Fri, 10 May 2019 19:30:12 +0000 (12:30 -0700)]
openvswitch: use nf_ct_get_tuplepr, invert_tuplepr

Upstream commit:
    commit 60e3be94e6a1c5162a0763c9aafb5190b2b1fdce
    Author: Florian Westphal <fw@strlen.de>
    Date:   Mon Jun 25 17:55:32 2018 +0200

    openvswitch: use nf_ct_get_tuplepr, invert_tuplepr

    These versions deal with the l3proto/l4proto details internally.
    It removes only caller of nf_ct_get_tuple, so make it static.

    After this, l3proto->get_l4proto() can be removed in a followup patch.

    Signed-off-by: Florian Westphal <fw@strlen.de>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

This patch backports the above upstream kernel patch to OVS.

Cc: Florian Westphal <fw@strlen.de>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago datapath: Fix conntrack_count related compilation errors
Yifeng Sun [Fri, 10 May 2019 19:30:11 +0000 (12:30 -0700)]
datapath: Fix conntrack_count related compilation errors

This patch fixes the compilation errors of OVS on 4.19+ kernels.

Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago datapath: Use new header file net/ipv6_frag.h
Florian Westphal [Fri, 10 May 2019 19:30:10 +0000 (12:30 -0700)]
datapath: Use new header file net/ipv6_frag.h

Upstream commit:
    commit 70b095c84326640eeacfd69a411db8fc36e8ab1a
    Author: Florian Westphal <fw@strlen.de>
    Date:   Sat Jul 14 01:14:01 2018 +0200

    ipv6: remove dependency of nf_defrag_ipv6 on ipv6 module

    IPV6=m
    DEFRAG_IPV6=m
    CONNTRACK=y yields:

    net/netfilter/nf_conntrack_proto.o: In function `nf_ct_netns_do_get':
    net/netfilter/nf_conntrack_proto.c:802: undefined reference to `nf_defrag_ipv6_enable'
    net/netfilter/nf_conntrack_proto.o:(.rodata+0x640): undefined reference to `nf_conntrack_l4proto_icmpv6'

    Setting DEFRAG_IPV6=y causes undefined references to ip6_rhash_params
    ip6_frag_init and ip6_expire_frag_queue so it would be needed to force
    IPV6=y too.

    This patch gets rid of the 'followup linker error' by removing
    the dependency of ipv6.ko symbols from netfilter ipv6 defrag.

    Shared code is placed into a header, then used from both.

    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

This patch backports the above upstream patch to OVS.

Cc: Florian Westphal <fw@strlen.de>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago datapath: Pass nf_hook_state to nf_conntrack_in()
Florian Westphal [Fri, 10 May 2019 19:30:09 +0000 (12:30 -0700)]
datapath: Pass nf_hook_state to nf_conntrack_in()

Upstream Commit:
    commit 93e66024b0249cec81e91328c55a754efd3192e0
    Author: Florian Westphal <fw@strlen.de>
    Date:   Wed Sep 12 15:19:07 2018 +0200

    netfilter: conntrack: pass nf_hook_state to packet and error handlers

    nf_hook_state contains all the hook meta-information: netns, protocol family,
    hook location, and so on.

    Instead of only passing selected information, pass a pointer to entire
    structure.

    This will allow to merge the error and the packet handlers and remove
    the ->new() function in followup patches.

    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

This patch backports the above upstream patch to OVS and fixes compiling
errors on RHEL kernels.

Cc: Florian Westphal <fw@strlen.de>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago datapath: Handle removal of nf_conntrack_l3proto.h
Yifeng Sun [Fri, 10 May 2019 19:30:08 +0000 (12:30 -0700)]
datapath: Handle removal of nf_conntrack_l3proto.h

Upstream kernel commit a0ae2562 ("netfilter: conntrack: remove l3proto
abstraction") removed header file net/netfilter/nf_conntrack_l3proto.h.
This patch detects it and fixes compilation errors of OVS on 4.19+ kernels.

Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago AUTHORS: Add Nicolas J. Bouliane.
Ben Pfaff [Fri, 10 May 2019 19:41:26 +0000 (12:41 -0700)]
AUTHORS: Add Nicolas J. Bouliane.

Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovs-actions: Fix sentence.
Nicolas Bouliane [Fri, 10 May 2019 19:26:46 +0000 (15:26 -0400)]
ovs-actions: Fix sentence.

Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovs-vswitchd: Update limits section in manpage.
Ben Pfaff [Tue, 30 Apr 2019 23:51:50 +0000 (16:51 -0700)]
ovs-vswitchd: Update limits section in manpage.

Reported-by: William Konitzer <wkonitzer@mirantis.com>
Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovs-actions.xml: Better document the "bundle" and "bundle_load" actions.
Ben Pfaff [Thu, 2 May 2019 17:56:22 +0000 (10:56 -0700)]
ovs-actions.xml: Better document the "bundle" and "bundle_load" actions.

Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago conntrack: Add rcu support.
Darrell Ball [Thu, 9 May 2019 15:15:07 +0000 (08:15 -0700)]
conntrack: Add rcu support.

For performance and code simplification reasons, add rcu support for
conntrack. The array of hmaps is replaced by a cmap as part of this
conversion.  Using a single map also simplifies the handling of NAT
and allows the removal of the nat_conn map and friends.  Per connection
entry locks are introduced, which are needed in a few code paths.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovs-save: Handle cases of upgrades from very old OVS versions.
Gurucharan Shetty [Wed, 8 May 2019 13:55:27 +0000 (06:55 -0700)]
ovs-save: Handle cases of upgrades from very old OVS versions.

We have added code to ovs-save over the last few releases
which makes the following bad assumptions.

1. The default OpenFlow version of running daemon is OpenFlow14.

Impact: This causes upgrades from older OVS versions to end up with no
flows in their bridges (even the default 'NORMAL' ones) causing traffic
to stop.

2. That ovs-ofctl commands like dump-groups and dump-tlv-map
will just work with old OVS versions.

Impact: This does not appear to affect the upgrade in a bad way, except
that you get some errors.

Since OpenFlow14 was enabled by default in OVS 2.8, this commit makes
a lazy assumption that any upgrade of OVS from versions before 2.7
will not attempt to save and restore flows.

VMware-BZ: #2340482
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
5 years ago odp-util: extend usage of limit for parse functions
Toms Atteka [Thu, 9 May 2019 13:29:59 +0000 (06:29 -0700)]
odp-util: extend usage of limit for parse functions

This fixes stack overflow issues for odp_actions_from_string.
Added wrapper functions for recursion limitation.
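
The general technique, a recursive parser that carries a depth counter
and refuses to nest beyond a limit, can be sketched as follows. This is
not odp_actions_from_string: the toy grammar, `parse_nested`,
`DepthLimitError` and the default limit of 8 are all hypothetical.

```python
class DepthLimitError(ValueError):
    """Raised when the input nests deeper than the parser allows."""


def parse_nested(text: str, max_depth: int = 8):
    """Parse 'a(b(c),d)' into nested (name, children) tuples.

    Each '(' recurses one level; the depth check turns pathological
    input into an error instead of unbounded recursion.
    """
    pos = 0

    def parse_list(depth):
        nonlocal pos
        if depth > max_depth:
            raise DepthLimitError("nesting deeper than %d" % max_depth)
        items, name = [], ""
        while pos < len(text):
            ch = text[pos]
            pos += 1
            if ch == "(":
                items.append((name.strip(), parse_list(depth + 1)))
                name = ""
            elif ch in "),":
                if name.strip():
                    items.append((name.strip(), []))
                name = ""
                if ch == ")":
                    if depth == 0:
                        raise ValueError("unbalanced ')'")
                    return items
            else:
                name += ch
        if depth != 0:
            raise ValueError("missing ')'")
        if name.strip():
            items.append((name.strip(), []))
        return items

    return parse_list(0)
```

Passing the depth down through a wrapper keeps the limit in one place
and makes fuzzer-generated input with thousands of nested actions fail
fast.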

Basic manual testing was performed.

Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13808
Signed-off-by: Toms Atteka <cpp.code.lv@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago dpdk: Use DPDK 18.11.1 release.
Ian Stokes [Thu, 9 May 2019 16:53:40 +0000 (17:53 +0100)]
dpdk: Use DPDK 18.11.1 release.

Modify travis linux build script to use the latest
DPDK stable release 18.11.1. Update docs for latest
DPDK stable releases.

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
5 years ago datapath-windows: Add Win10Analyze target
Alin Gabriel Serdean [Wed, 3 Apr 2019 17:48:03 +0000 (20:48 +0300)]
datapath-windows: Add Win10Analyze target

This patch adds a new target called `Win10Analyze` to the driver solution.

It enables us to trigger static analysis over the Win10 target.

Since the location of the ruleset of drivers is somewhat random
starting from 1803:
https://www.osr.com/blog/2018/05/21/wdk-1803-ca/

Commit the ruleset inside our repository. This is the same ruleset used for
8, 8.1 and 10.

Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Anand Kumar <kumaranand@vmware.com>
5 years agoAUTHORS: Add Dumitru Ceara.
Numan Siddique [Thu, 9 May 2019 11:23:22 +0000 (16:53 +0530)]
AUTHORS: Add Dumitru Ceara.

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
5 years agostopwatch: Free stopwatch packets after processing
Dumitru Ceara [Wed, 8 May 2019 12:56:23 +0000 (14:56 +0200)]
stopwatch: Free stopwatch packets after processing

The free(pkt) call was missing inside the stopwatch_thread processing
loop.

Fixes: 484f7dbdaa2b ("stopwatch: Fix Windows incompatibility")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoOVN: convert buffered_mac_bindings to ovs_list
Lorenzo Bianconi [Wed, 8 May 2019 13:41:48 +0000 (15:41 +0200)]
OVN: convert buffered_mac_bindings to ovs_list

Convert buffered_mac_bindings from hashmap to a linked list since it is
used just for iteration (no lookups are performed on it)

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoOVN: fix pinctrl ip buffering for gw router port
Lorenzo Bianconi [Tue, 7 May 2019 13:08:52 +0000 (15:08 +0200)]
OVN: fix pinctrl ip buffering for gw router port

Use sb mac binding table to trigger ip buffer dequeueing instead of
the ARP/ND packet reception since the ARP reply can be managed on a
different chassis if a gw router port is scheduled on a different
node

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agotests: Fix IPv4 checksums in zone limit test.
Darrell Ball [Mon, 6 May 2019 16:47:47 +0000 (09:47 -0700)]
tests: Fix IPv4 checksums in zone limit test.

Userspace conntrack cares about IPv4 checksums, so this is a
prerequisite for adding zone limit support to userspace conntrack.

Fixes: 3f1087c70cf9 ("system-traffic: Add conntrack per zone limit test case.")
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoconntrack: Free conntrack context in 'conntrack_destroy()'.
Darrell Ball [Mon, 6 May 2019 14:37:18 +0000 (07:37 -0700)]
conntrack: Free conntrack context in 'conntrack_destroy()'.

Fixes: 57593fd24378 ( conntrack: Stop exporting internal datastructures.)
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoconntrack: Stop exporting internal datastructures.
Darrell Ball [Fri, 3 May 2019 04:34:04 +0000 (21:34 -0700)]
conntrack: Stop exporting internal datastructures.

Stop the exporting of the main internal conntrack datastructure.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn: Added missing --wait in ovn tests
Leonid Ryzhyk [Thu, 2 May 2019 17:37:57 +0000 (10:37 -0700)]
ovn: Added missing --wait in ovn tests

Several of the ovn tests did not use the `--wait` flag to wait for a
configuration change to propagate through the system. As a result,
these tests fail when `ovn-northd` is slow.

Fixed by adding `--wait=hv` or `--wait=sb` as appropriate.

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agooss-fuzz: fixed wrong lib path
Toms Atteka [Tue, 30 Apr 2019 11:57:45 +0000 (04:57 -0700)]
oss-fuzz: fixed wrong lib path

The logical-fields.h file was moved, which broke oss-fuzz builds. The
path has been updated accordingly.

CC: Numan Siddique <nusiddiq@redhat.com>
Fixes: 086470cdbe66 ("ovn: Add a new OVN field icmp4.frag_mtu")
Signed-off-by: Toms Atteka <cpp.code.lv@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovs-tcpdump: Fix E117 over-indented.
Ilya Maximets [Mon, 29 Apr 2019 12:19:08 +0000 (15:19 +0300)]
ovs-tcpdump: Fix E117 over-indented.

utilities/ovs-tcpdump.in:376:9: E117 over-indented
make[2]: *** [flake8-check] Error 1

CC: Liu Chang <txfh2007@aliyun.com>
Fixes: 2eeadf73d931 ("ovs-tcpdump: Improve error message when tcpdump is not available.")
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
5 years agosystem-offloads-traffic.at: Fix requesting HW offloaded flows from veth.
Ilya Maximets [Fri, 26 Apr 2019 12:45:20 +0000 (15:45 +0300)]
system-offloads-traffic.at: Fix requesting HW offloaded flows from veth.

A veth pair doesn't offload anything to HW, i.e. we should use the 'tc'
type while requesting flows. 'offloaded' is kept just in case, so that the
test does not need updating if veths are HW offloaded someday.

Additionally fixed 'ipv4' fields that were missing for an unknown reason.
Also dropped stripping of the errors from the log.

Fixes test:

  2: offloads - ping between two ports - offloads enabled ok

CC: Gavi Teitz <gavi@mellanox.com>
Fixes: d63ca5329ff9 ("dpctl: Properly reflect a rule's offloaded to HW state")
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
5 years agodatapath: Fix compiling error for 4.14.111+ kernel
Yifeng Sun [Fri, 26 Apr 2019 21:42:07 +0000 (14:42 -0700)]
datapath: Fix compiling error for 4.14.111+ kernel

Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Fixes: f72469405eec9 ("datapath: meter: Use struct_size() in kzalloc()")
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoMAINTAINERS: Add Ilya Maximets.
Ben Pfaff [Fri, 26 Apr 2019 17:52:50 +0000 (10:52 -0700)]
MAINTAINERS: Add Ilya Maximets.

Ilya was elected by the Open vSwitch committers on Thursday.  Welcome to
the team, Ilya!

CC: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn-northd: Fix the HA_Chassis sync issue in OVN SB DB
Numan Siddique [Thu, 25 Apr 2019 19:01:39 +0000 (00:31 +0530)]
ovn-northd: Fix the HA_Chassis sync issue in OVN SB DB

ovn-northd deletes and recreates HA_Chassis rows (which belong
to a HA_Chassis_Group) whenever the HA_Chassis_Group/Gateway_Chassis
rows in Northbound DB are out of sync. If a Chassis table row in
Southbound DB is deleted and if this row is referenced by HA_Chassis
row (in Southbound DB), then the present code syncs the HA_Chassis
rows continuously and this causes the ovn-controllers to wake up
and results in 100% cpu usage.

This was a simple case which the commit
1be1e0e5e0d1 ("ovn: Add generic HA chassis group") missed out addressing.

This patch fixes this issue.

Fixes: 1be1e0e5e0d1 ("ovn: Add generic HA chassis group")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2019-April/048580.html
Reported-by: Daniel Alvarez Sanchez (dalvarez@redhat.com)
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb-server.7: Describe message ordering between "update" and "transact".
Ben Pfaff [Thu, 25 Apr 2019 19:42:46 +0000 (12:42 -0700)]
ovsdb-server.7: Describe message ordering between "update" and "transact".

This comes up sometimes and it's best to document it.

Acked-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoDocumentation: Update documentation for OpenFlow support.
Ben Pfaff [Wed, 24 Apr 2019 16:37:21 +0000 (09:37 -0700)]
Documentation: Update documentation for OpenFlow support.

The commits that implemented these features forgot to update the
documentation.

Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agodatapath-windows: Do not send out nbls when cloned nbls are being accessed
Anand Kumar [Thu, 11 Apr 2019 16:14:21 +0000 (09:14 -0700)]
datapath-windows: Do not send out nbls when cloned nbls are being accessed

As per MSDN documentation, "As soon as a filter driver calls the
NdisFSendNetBufferLists function, it relinquishes ownership of
the NET_BUFFER_LIST structures and all associated resources.
A filter driver should never try to examine the NET_BUFFER_LIST
structures or any associated data after calling NdisFSendNetBufferLists".

https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/ndis/nf-ndis-ndisfsendnetbufferlists

When freeing up memory of a cloned nbl, the parent's nbl and context
are accessed, which is incorrect and can cause a BSOD.
With this patch, original nbl is sent out only when cloned nbl is done
with packet processing and its memory is freed.

Signed-off-by: Anand Kumar <kumaranand@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
5 years agosparse: Configure target operating system and fix fallout.
Ben Pfaff [Tue, 23 Apr 2019 23:42:32 +0000 (16:42 -0700)]
sparse: Configure target operating system and fix fallout.

cgcc, the "sparse" wrapper that OVS uses, can be told the host architecture
or the host OS or both.  Until now, OVS has told it the host architecture
because it is fairly common that it doesn't guess it automatically.  Until
now, OVS has not told it the host OS, assuming that it would get it right.
However, it doesn't--if you tell it the host OS or the host architecture,
it doesn't really have a default for the other.  This means that on Linux
(presumably the only OS where sparse works properly for OVS), it was not
defining __linux__, which caused some weird behavior.

This commit adds a flag to the cgcc invocation to make it define __linux__
on Linux, and it fixes some errors that this would otherwise cause.

Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agotravis: Fix checks skipping by sparse.
Ilya Maximets [Wed, 24 Apr 2019 13:00:22 +0000 (16:00 +0300)]
travis: Fix checks skipping by sparse.

Recent commit in "sparse" broke checking the OVS sources, because
'make' uses '-MD' flag to generate dependencies as a side effect
within compilation commands, but "sparse" skips all the build commands
that contain '-MD' and friends.
Let's revert the bad commit as a workaround before installing "sparse"
in TravisCI.

Additionally fixed a false-positive:
./lib/bitmap.h:64:29: error: shift too big (64) for type unsigned long

CC: Yi-Hung Wei <yihung.wei@gmail.com>
Fixes: 879e8238dfdf ("travis: Update sparse git repo")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoofproto: Return error codes for rule insertions.
Aravind Prasad S [Tue, 23 Apr 2019 19:00:59 +0000 (00:30 +0530)]
ofproto: Return error codes for rule insertions.

Currently, rule_insert() API does not have return value. There are some
possible scenarios where rule insertions can fail at run-time even though the
static checks during rule_construct() had passed previously.  Some possible
scenarios for failure of rule insertions:

**) Rule insertions can fail dynamically in Hybrid mode (both Openflow and
Normal switch functioning coexist) where the CAM space could get suddenly
filled up by Normal switch functioning and Openflow gets devoid of available
space.

**) Some deployments could have separate independent layers for HW rule
insertions and application layer to interact with OVS. HW layer could face any
dynamic issue during rule handling which application could not have
predicted/captured in rule-construction phase.

Rule-insert errors for bundles are handled too.

Testing: Tested failures of rule insertions and also with bundles.

Signed-off-by: Aravind Prasad S <aravind.sridharan at dell.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoDouble postponing to free subtables.
Zhantao Fu [Tue, 23 Apr 2019 11:04:25 +0000 (19:04 +0800)]
Double postponing to free subtables.

Subtable destruction should be double postponed because readers could
still obtain old values while iterating over the pvector implementation
before its new version is published.

Signed-off-by: Zhantao Fu <fuzhantao@huawei.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoOVN: Clarify docs about the default transport zone
Lucas Alvares Gomes [Tue, 23 Apr 2019 12:25:48 +0000 (13:25 +0100)]
OVN: Clarify docs about the default transport zone

This patch is extending the documentation about the new transport zones
feature to clarify that if no transport zones are set, the chassis will
belong to a default group.

Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agodebian: Notes for systemd-networkd integration with OVS.
Gurucharan Shetty [Fri, 19 Apr 2019 07:18:27 +0000 (00:18 -0700)]
debian: Notes for systemd-networkd integration with OVS.

Signed-off-by: Gurucharan Shetty <guru@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
5 years agoOVN: Add support for Transport Zones
Lucas Alvares Gomes [Thu, 18 Apr 2019 13:39:09 +0000 (14:39 +0100)]
OVN: Add support for Transport Zones

This patch adds support for Transport Zones. Transport zones (a.k.a.
TZs) are a way to enable users of OVN to separate Chassis into different
logical groups that will only form tunnels between members of the same
groups. Each Chassis can belong to one or more Transport Zones. If
not set, the Chassis will be considered part of a default group.

Configuring Transport Zones is done by creating a key called
"ovn-transport-zones" in the external_ids column of the Open_vSwitch
table from the local OVS instance. The value is a string with the name
of the Transport Zone that this instance is part of. Multiple TZs can
be specified with a comma-separated list. For example:

$ sudo ovs-vsctl set open . external-ids:ovn-transport-zones=tz1

or

$ sudo ovs-vsctl set open . external-ids:ovn-transport-zones=tz1,tz2,tz3

This configuration is also exposed in the Chassis table of the OVN
Southbound Database in a new column called "transport_zones".
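
The grouping rule described above can be sketched as a small predicate. This is a minimal illustration in Python, not the ovn-controller code: the helper names `parse_zones` and `should_tunnel` are hypothetical, and the sketch assumes that two chassis form a tunnel when they share at least one zone, or when neither has any zone set (the default group).

```python
def parse_zones(value):
    """Split an 'ovn-transport-zones' value such as 'tz1,tz2' into a set."""
    return {z.strip() for z in (value or "").split(",") if z.strip()}


def should_tunnel(zones_a, zones_b):
    """Return True if two chassis belong to a common group."""
    if not zones_a and not zones_b:
        # Neither chassis is configured: both sit in the default group.
        return True
    # Otherwise at least one shared Transport Zone is required.
    return bool(zones_a & zones_b)
```

For example, a chassis configured with "tz1,tz2" and one configured with "tz2" share a zone, so under these assumptions `should_tunnel` returns True for that pair.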

Uses for Transport Zones include, but are not limited to:

* Edge computing: As a way to prevent edge sites from trying to create
  tunnels with every node on every other edge site while still allowing
  these sites to create tunnels with the central node.

* Extra security layer: Where users want to create "trust zones"
  and prevent computes in a more secure zone from communicating with a
  less secure zone.

This patch is also backward compatible so the upgrade guide for OVN [0]
is still valid and the ovn-controller service can be upgraded before the
OVSDBs.

[0] http://docs.openvswitch.org/en/latest/intro/install/ovn-upgrades/

Reported-by: Daniel Alvarez Sanchez <dalvarez@redhat.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2019-February/048255.html
Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agotravis: Update sparse git repo
Yi-Hung Wei [Fri, 19 Apr 2019 18:12:15 +0000 (11:12 -0700)]
travis: Update sparse git repo

The old git tree git://git.kernel.org/pub/scm/devel/sparse/chrisl/sparse.git
has not been updated since 2016, and that triggers the following build error
on Ubuntu 18.04 host with 2.27-3 libc6-dev.  So update the sparse git repo
to the new one.

$ .travis/linux-prepare.sh
$  export PATH=$PATH:$HOME/bin
$ .travis/linux-build.sh

/usr/include/stdlib.h:140:17: error: Expected ; at end of declaration
/usr/include/stdlib.h:140:17: error: got strtof32
/usr/include/stdlib.h:146:17: error: Expected ; at end of declaration
/usr/include/stdlib.h:146:17: error: got strtof64
/usr/include/stdlib.h:158:18: error: Expected ; at end of declaration
/usr/include/stdlib.h:158:18: error: got strtof32x
/usr/include/stdlib.h:233:33: error: Expected ) in function declarator
/usr/include/stdlib.h:233:33: error: got __f
/usr/include/stdlib.h:239:33: error: Expected ) in function declarator
/usr/include/stdlib.h:239:33: error: got __f
/usr/include/stdlib.h:251:35: error: Expected ) in function declarator
/usr/include/stdlib.h:251:35: error: got __f
/usr/include/stdlib.h:316:17: error: Expected ; at end of declaration
/usr/include/stdlib.h:316:17: error: got strtof32_l
/usr/include/stdlib.h:323:17: error: Expected ; at end of declaration
/usr/include/stdlib.h:323:17: error: got strtof64_l
/usr/include/stdlib.h:337:18: error: Expected ; at end of declaration
/usr/include/stdlib.h:337:18: error: got strtof32x_l
Makefile:5288: recipe for target 'lib/aes128.lo' failed
make[2]: *** [lib/aes128.lo] Error 1
...

Tested on Travis: https://travis-ci.org/YiHungWei/ovs/builds/521979625

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb raft: Avoid unnecessary reconnecting during leader election.
Han Zhou [Fri, 19 Apr 2019 19:17:47 +0000 (12:17 -0700)]
ovsdb raft: Avoid unnecessary reconnecting during leader election.

If a server claims itself as "disconnected", all clients connected
to that server will try to reconnect to a new server in the cluster.

However, currently a server would claim itself as disconnected even
when it is itself the candidate trying to become the new leader (which
it most likely will), and all its clients will reconnect to another
node.

During a leader fail-over (e.g. due to a leader failure), it is
expected that all clients of the old leader will have to reconnect
to other nodes in the cluster, but it is unnecessary for all the
clients of a healthy node to reconnect, which could cause more
disturbance in a large scale environment.

This patch fixes the problem by slightly changing the condition that
a server regards itself as disconnected: if its role is candidate,
it is regarded as disconnected only if the election didn't succeed
at the first attempt. Related failure test cases are also unskipped
and all passed with this patch.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb-cluster-testsuite.at: Restores "clustered transactions" tests back.
Han Zhou [Fri, 19 Apr 2019 19:17:46 +0000 (12:17 -0700)]
ovsdb-cluster-testsuite.at: Restores "clustered transactions" tests back.

In commit-2bcb3b70 (ovsdb raft: Move ovsdb cluster tests to separate
testsuite.) the "clustered transactions" tests were left unexecuted
because they depend on "EXECUTION_EXAMPLES", which is defined in
ovsdb-execution.at.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn: Generate ICMPv4 packet in router pipeline for larger packets
Numan Siddique [Mon, 22 Apr 2019 19:23:58 +0000 (00:53 +0530)]
ovn: Generate ICMPv4 packet in router pipeline for larger packets

This patch adds 2 stages in router pipeline after ARP_RESOLVE
and adds the logical flows to check the packet length and
generate ICMPv4 packet.

   * S_ROUTER_IN_CHK_PKT_LEN - Which checks the packet length using
                               check_pkt_larger OVN action

   * S_ROUTER_IN_LARGER_PKTS - Which generates icmp packet with
                               type 3 (Destination Unreachable),
                               code 4 (Frag Needed and DF was Set)
                               icmp4.frag_mtu = gw_mtu

In order to add these logical flows, CMS should set the
option 'gateway_mtu' for the distributed logical router port.

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn: Support OVS action 'check_pkt_larger' in OVN
Numan Siddique [Mon, 22 Apr 2019 19:23:55 +0000 (00:53 +0530)]
ovn: Support OVS action 'check_pkt_larger' in OVN

Previous commit added a new OVS action 'check_pkt_larger'. This
patch supports that action in OVN. The syntax to use this would be

reg0[0] = check_pkt_larger(LEN)

Upcoming commit will make use of this action in ovn-northd and
will generate an ICMPv4 packet if the packet length is greater than
the specified length.

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn: Add a new OVN action 'icmp4_error'
Numan Siddique [Mon, 22 Apr 2019 19:23:51 +0000 (00:53 +0530)]
ovn: Add a new OVN action 'icmp4_error'

This action is similar to the existing 'icmp4' OVN action except that
this action is expected to be used to generate an ICMPv4 packet
in response to an error in original IP packet. When this action
injects the icmpv4 packet, it also copies the original IP datagram
following the icmp4 header as per RFC 1122: 3.2.2

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn: Add a new OVN field icmp4.frag_mtu
Numan Siddique [Mon, 22 Apr 2019 19:23:47 +0000 (00:53 +0530)]
ovn: Add a new OVN field icmp4.frag_mtu

In order to support OVN specific fields (which are not yet
supported in OpenvSwitch to set or modify values) a generic
OVN field support is added in this patch. These OVN fields
get translated to controller actions.

This patch adds only one field for now - icmp4.frag_mtu.
It should be fairly straightforward to add similar fields in the
near future.

Example usage.
action=(icmp4 {"eth.dst <-> eth.src; "
        "icmp4.type = 3; /* Destination Unreachable */ "
        "icmp4.code = 4; /* Fragmentation Needed */ "
         icmp4.frag_mtu = 1442;
         ...
         "next; };")

action=(icmp4.frag_mtu = 1500; ..)

pinctrl module of ovn-controller will set the specified value
in the low-order 16 bits of the ICMP4 header field that is
labelled "unused" in the ICMP specification as defined in the RFC 1191.
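
The header layout that RFC 1191 defines for this message can be sketched as below. This is a minimal illustration in Python, not the pinctrl code: the function names `icmp_checksum` and `build_frag_needed` are hypothetical, and the payload in the usage note is dummy data.

```python
import struct


def icmp_checksum(data):
    """Standard Internet checksum (one's complement sum of 16-bit words)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_frag_needed(next_hop_mtu, original_datagram):
    """Build an ICMPv4 type 3, code 4 message carrying the next-hop MTU.

    Layout: type (8 bits), code (8), checksum (16), unused (16, zero),
    next-hop MTU (16, the low-order half of the formerly unused field).
    """
    header = struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu)
    csum = icmp_checksum(header + original_datagram)
    header = struct.pack("!BBHHH", 3, 4, csum, 0, next_hop_mtu)
    return header + original_datagram
```

Calling `build_frag_needed(1442, payload)` places 1442 in bytes 6 and 7 of the ICMP header, which is where a sender performing path MTU discovery reads the value.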

Upcoming patch will use it to send an icmp4 packet if the
source IPv4 packet destined to go via external gateway needs to
be fragmented.

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agodatapath: Add a new action check_pkt_len
Numan Siddique [Mon, 22 Apr 2019 19:23:43 +0000 (00:53 +0530)]
datapath: Add a new action check_pkt_len

Upstream commit:
    commit 4d5ec89fc8d14dcdab7214a0c13a1c7321dc6ea9
    Author: Numan Siddique <nusiddiq@redhat.com>
    Date:   Tue Mar 26 06:13:46 2019 +0530

    net: openvswitch: Add a new action check_pkt_len

    This patch adds a new action - 'check_pkt_len' which checks the
    packet length and executes a set of actions if the packet
    length is greater than the specified length or executes
    another set of actions if the packet length is lesser or equal to.

    This action takes below nlattrs
      * OVS_CHECK_PKT_LEN_ATTR_PKT_LEN - 'pkt_len' to check for

      * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER - Nested actions
        to apply if the packet length is greater than the specified 'pkt_len'

      * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL - Nested
        actions to apply if the packet length is lesser or equal to the
        specified 'pkt_len'.

    The main use case for adding this action is to solve the packet
    drops because of MTU mismatch in OVN virtual networking solution.
    When a VM (which belongs to a logical switch of OVN) sends a packet
    destined to go via the gateway router and if the nic which provides
    external connectivity, has a lesser MTU, OVS drops the packet
    if the packet length is greater than this MTU.

    With the help of this action, OVN will check the packet length
    and if it is greater than the MTU size, it will generate an
    ICMP packet (type 3, code 4) and includes the next hop mtu in it
    so that the sender can fragment the packets.

    Reported-at:
    https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
CC: Gregory Rose <gvrose8192@gmail.com>
CC: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use of 'nla_parse_strict()' (in validate_and_copy_check_len()) is available
only in recent kernels. So changed it to 'nla_parse_nested()'.

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoAdd a new OVS action check_pkt_larger
Numan Siddique [Mon, 22 Apr 2019 19:23:38 +0000 (00:53 +0530)]
Add a new OVS action check_pkt_larger

This patch adds a new action 'check_pkt_larger' which checks if the
packet is larger than the given size and stores the result in the
destination register.

Usage: check_pkt_larger(len)->REGISTER
Eg. match=...,actions=check_pkt_larger(1442)->NXM_NX_REG0[0],next;

This patch makes use of the new datapath action - 'check_pkt_len'
which was recently added in the commit [1].
At the start of ovs-vswitchd, datapath is probed for this action.
If the datapath action is present, then 'check_pkt_larger'
makes use of this datapath action.

Datapath action 'check_pkt_len' takes these nlattrs
      * OVS_CHECK_PKT_LEN_ATTR_PKT_LEN - 'pkt_len' to check for
      * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER (optional) - Nested actions
        to apply if the packet length is greater than the specified 'pkt_len'
      * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL (optional) - Nested
        actions to apply if the packet length is lesser or equal to the
        specified 'pkt_len'.

Let's say we have these flows added to an OVS bridge br-int

table=0, priority=100 in_port=1,ip,actions=check_pkt_larger(100)->NXM_NX_REG0[0],resubmit(,1)
table=1, priority=200,in_port=1,ip,reg0=0x1/0x1 actions=output:3
table=1, priority=100,in_port=1,ip,actions=output:4

Then the action 'check_pkt_larger' will be translated as
  - check_pkt_len(size=100,gt(3),le(4))

datapath will check the packet length and if the packet length is greater than 100,
it will output to port 3, else it will output to port 4.
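
The behaviour of the three example flows can be modelled in a few lines. This is a minimal sketch in Python of the decision only, not of the ofproto translation; the function names are hypothetical, while the threshold (100) and port numbers (3 and 4) come from the flows above.

```python
def check_pkt_larger(pkt_len, limit):
    """Model of the action: the result bit is 1 only if pkt_len > limit."""
    return 1 if pkt_len > limit else 0


def output_port(pkt_len):
    """Model of the example pipeline on br-int."""
    # Table 0: store the comparison result in bit 0 of reg0.
    reg0_bit0 = check_pkt_larger(pkt_len, 100)
    # Table 1: the priority=200 flow matches reg0=0x1/0x1 and outputs to
    # port 3; otherwise the priority=100 flow outputs to port 4.
    return 3 if reg0_bit0 else 4
```

Note that the comparison is strict: a packet of exactly 100 bytes takes the "less or equal" branch and leaves on port 4.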

In case the datapath doesn't support the 'check_pkt_len' action, the OVS
action 'check_pkt_larger' sets SLOW_ACTION so that the datapath flow is
not added.

This OVS action is intended to be used by OVN to check the packet length
and generate an ICMP packet with type 3, code 4 and next hop mtu
in the logical router pipeline if the MTU of the physical interface
is lesser than the packet length. More information can be found here [2]

[1] - https://kernel.googlesource.com/pub/scm/linux/kernel/git/davem/net-next/+/4d5ec89fc8d14dcdab7214a0c13a1c7321dc6ea9
[2] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html

Reported-at:
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
CC: Ben Pfaff <blp@ovn.org>
CC: Gregory Rose <gvrose8192@gmail.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agonetdev-linux: Add coverage counters for netdev_set_policing when ingress tc-offload
Tonghao Zhang [Sat, 20 Apr 2019 00:25:08 +0000 (17:25 -0700)]
netdev-linux: Add coverage counters for netdev_set_policing when ingress tc-offload

When tc-offload is enabled, we should add coverage counters for netdev_set_policing.

Fixes: e7f6ba220e10 ("lib/tc: add ingress ratelimiting support for tc-offload")
Cc: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agodpif-netdev: fix meter at high packet rate.
William Tu [Fri, 19 Apr 2019 22:26:41 +0000 (15:26 -0700)]
dpif-netdev: fix meter at high packet rate.

When testing packet rate around 1Mpps with meter enabled, the frequency
of hitting meter action becomes much higher, around 30us each time.
As a result, the meter's calculation of 'uint32_t delta_t' becomes
always 0 and the meter action has no effect.  This is because the previous
commit 05f9e707e194 divides the delta by 1000 in order to convert to
msec granularity.  The patch fixes it by updating the time when a
millisecond boundary is crossed.
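
The truncation problem can be reproduced with plain integer arithmetic. This is a minimal sketch in Python, not the dpif-netdev code: the function `simulate` is hypothetical, and the "fixed" variant (subtracting millisecond-truncated timestamps) is one way to make the delta advance when a millisecond boundary is crossed, assumed here for illustration.

```python
def simulate(interval_us, total_us):
    """Accumulate elapsed milliseconds as seen by two delta computations.

    The meter is hit every 'interval_us' microseconds for 'total_us'
    microseconds.  Returns (naive_ms, fixed_ms).
    """
    naive_ms = fixed_ms = 0
    used_naive = used_fixed = 0
    for now in range(interval_us, total_us + 1, interval_us):
        # Naive: divide the microsecond delta by 1000.  With a 30 us
        # interval this is always 30 // 1000 == 0.
        naive_ms += (now - used_naive) // 1000
        used_naive = now
        # Fixed: compare millisecond-truncated timestamps, so the delta
        # becomes 1 whenever a millisecond boundary has been crossed.
        fixed_ms += now // 1000 - used_fixed // 1000
        used_fixed = now
    return naive_ms, fixed_ms
```

With a 30 us interval over 12 ms, the naive computation never sees any time pass, while the second variant accounts for all 12 ms.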

Fixes: 05f9e707e194 ("dpif-netdev: Use microsecond granularity.")
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoselinux: update for netlink socket types
Aaron Conole [Wed, 17 Apr 2019 20:07:25 +0000 (16:07 -0400)]
selinux: update for netlink socket types

These are used for interfacing with conntrack, as well as by some
DPDK PMDs

Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Ansis Atteka <aatteka@ovn.org>
5 years agodpif-netdev: Update comment about flow installation race.
Ilya Maximets [Wed, 17 Apr 2019 08:43:56 +0000 (11:43 +0300)]
dpif-netdev: Update comment about flow installation race.

The userspace datapath has used per-PMD flow tables/classifiers for a long
time. However, it was decided to keep this race window to not block
revalidators. Comment should be updated to reflect the current state.

Fixes: 1c1e46ed8457 ("dpif-netdev: Add per-pmd flow-table/classifier.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years agodpif-netdev: Fix double parsing of packets when EMC disabled.
Ilya Maximets [Mon, 11 Mar 2019 16:31:50 +0000 (19:31 +0300)]
dpif-netdev: Fix double parsing of packets when EMC disabled.

This partially reverts commit bde94613e6276d48a6e0be7a592ebcf9836b4aaf.

Commit bde94613e627 aimed to slightly ( < 1%) increase performance
in the case where EMC is disabled, but it avoids RSS hash calculation and
OVS has to calculate it while executing OVS_ACTION_ATTR_HASH in order
to handle balanced-tcp bonding. At the time of executing that action
there is no parsed flow, and OVS parses the packet for the second time
to calculate the hash. This happens for all packets received from the
virtual interfaces because they have no HW RSS.

Here is the example of 'perf' output for VM --> (bonded PHY) traffic:

  Samples: 401K of event 'cycles', Event count (approx.): 50964771478
  Overhead  Shared Object       Symbol
    27.50%  ovs-vswitchd        [.] dpcls_lookup.370382
    16.30%  ovs-vswitchd        [.] rte_vhost_dequeue_burst.9267
    14.95%  ovs-vswitchd        [.] miniflow_extract
     7.22%  ovs-vswitchd        [.] flow_extract
     7.10%  ovs-vswitchd        [.] dp_netdev_input__.371002.4826
     4.01%  ovs-vswitchd        [.] fast_path_processing.370987.4893

We can see that the packet is parsed twice: first by 'miniflow_extract'
right after receiving, and a second time by 'flow_extract' while
executing actions.

In this particular case calculating RSS on receive saves > 7% of the
total CPU processing time. It varies from ~7 to ~10 % depending on
scenario/traffic types.

It's better to calculate the hash each time because the performance
improvement from avoiding it is negligible compared with the performance
drop when sending packets to a bonded interface.

Another solution could be to pass the parsed flow explicitly through
the datapath, but this will require big code changes and will have
additional overhead for metadata updating on packet changes.

Also, this change should have small impact since SMC works well in most
cases and will be enabled/recommended by default in the future.

CC: Antonio Fischetti <antonio.fischetti@intel.com>
Fixes: bde94613e627 ("dpif-netdev: Avoid reading RSS hash when EMC is disabled.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years agoovn-nbctl: Fix 32-bit build with gcc.
Ilya Maximets [Wed, 17 Apr 2019 17:22:06 +0000 (20:22 +0300)]
ovn-nbctl: Fix 32-bit build with gcc.

ovn/utilities/ovn-nbctl.c: In function 'print_routing_policy':
ovn/utilities/ovn-nbctl.c:3620:23: error: format '%ld' expects argument
    of type 'long int', but argument 3 has type 'int64_t'
                       policy->match, policy->action, next_hop);
                       ^
ovn/utilities/ovn-nbctl.c:3624:23: error: format '%ld' expects argument
    of type 'long int', but argument 3 has type 'int64_t'
                       policy->match, policy->action);
                       ^
ovn/utilities/ovn-nbctl.c: In function 'cmd_ha_ch_grp_list':
ovn/utilities/ovn-nbctl.c:5056:27: error: format '%lu' expects argument
    of type 'long unsigned int', but argument 10 has type 'int64_t'
                           ha_ch->priority);
                           ^
cc1: all warnings being treated as errors
make[2]: *** [ovn/utilities/ovn-nbctl.o] Error 1

https://travis-ci.org/openvswitch/ovs/jobs/521015912
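
The commit message does not show the diff, so the following is only a
sketch of the usual portable pattern for this class of error, assuming
the <inttypes.h> format macros are used. On 32-bit targets int64_t is
typically 'long long', so '%ld' does not match it; PRId64 expands to the
correct conversion on every target. 'format_priority()' is a
hypothetical helper, not a function from ovn-nbctl.c.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Formats a 64-bit value portably.  Returns the number of characters
 * that snprintf() would have written, as snprintf() does. */
int
format_priority(char *buf, size_t size, int64_t priority)
{
    assert(buf != NULL);
    return snprintf(buf, size, "%"PRId64, priority);
}
```

The same approach applies to unsigned 64-bit values with PRIu64.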

CC: Numan Siddique <nusiddiq@redhat.com>
CC: Mary Manohar <mary.manohar@nutanix.com>
Fixes: 1be1e0e5e0d1 ("ovn: Add generic HA chassis group")
Fixes: a64bb573468f ("Policy-based routing (PBR) in OVN.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago OVN: Add NEWS for Policy-based routing
Mary Manohar [Wed, 17 Apr 2019 02:05:30 +0000 (02:05 +0000)]
OVN: Add NEWS for Policy-based routing

Signed-off-by: Mary Manohar <mary.manohar@nutanix.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago compat: iptunnel: NULL pointer deref for ip_md_tunnel_xmit
Alan Maguire [Wed, 27 Mar 2019 15:32:19 +0000 (08:32 -0700)]
compat: iptunnel: NULL pointer deref for ip_md_tunnel_xmit

Upstream commit:
    commit f4b3ec4e6aa1a2ca437905a519ae08e8cf6af754
    Author: Alan Maguire <alan.maguire@oracle.com>
    Date:   Wed Mar 6 10:25:42 2019 +0000

    iptunnel: NULL pointer deref for ip_md_tunnel_xmit

    Naresh Kamboju noted the following oops during execution of selftest
    tools/testing/selftests/bpf/test_tunnel.sh on x86_64:

    [  274.120445] BUG: unable to handle kernel NULL pointer dereference
    at 0000000000000000
    [  274.128285] #PF error: [INSTR]
    [  274.131351] PGD 8000000414a0e067 P4D 8000000414a0e067 PUD 3b6334067 PMD 0
    [  274.138241] Oops: 0010 [#1] SMP PTI
    [  274.141734] CPU: 1 PID: 11464 Comm: ping Not tainted
    5.0.0-rc4-next-20190129 #1
    [  274.149046] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
    2.0b 07/27/2017
    [  274.156526] RIP: 0010:          (null)
    [  274.160280] Code: Bad RIP value.
    [  274.163509] RSP: 0018:ffffbc9681f83540 EFLAGS: 00010286
    [  274.168726] RAX: 0000000000000000 RBX: ffffdc967fa80a18 RCX: 0000000000000000
    [  274.175851] RDX: ffff9db2ee08b540 RSI: 000000000000000e RDI: ffffdc967fa809a0
    [  274.182974] RBP: ffffbc9681f83580 R08: ffff9db2c4d62690 R09: 000000000000000c
    [  274.190098] R10: 0000000000000000 R11: ffff9db2ee08b540 R12: ffff9db31ce7c000
    [  274.197222] R13: 0000000000000001 R14: 000000000000000c R15: ffff9db3179cf400
    [  274.204346] FS:  00007ff4ae7c5740(0000) GS:ffff9db31fa80000(0000)
    knlGS:0000000000000000
    [  274.212424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  274.218162] CR2: ffffffffffffffd6 CR3: 00000004574da004 CR4: 00000000003606e0
    [  274.225292] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  274.232416] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [  274.239541] Call Trace:
    [  274.241988]  ? tnl_update_pmtu+0x296/0x3b0
    [  274.246085]  ip_md_tunnel_xmit+0x1bc/0x520
    [  274.250176]  gre_fb_xmit+0x330/0x390
    [  274.253754]  gre_tap_xmit+0x128/0x180
    [  274.257414]  dev_hard_start_xmit+0xb7/0x300
    [  274.261598]  sch_direct_xmit+0xf6/0x290
    [  274.265430]  __qdisc_run+0x15d/0x5e0
    [  274.269007]  __dev_queue_xmit+0x2c5/0xc00
    [  274.273011]  ? dev_queue_xmit+0x10/0x20
    [  274.276842]  ? eth_header+0x2b/0xc0
    [  274.280326]  dev_queue_xmit+0x10/0x20
    [  274.283984]  ? dev_queue_xmit+0x10/0x20
    [  274.287813]  arp_xmit+0x1a/0xf0
    [  274.290952]  arp_send_dst.part.19+0x46/0x60
    [  274.295138]  arp_solicit+0x177/0x6b0
    [  274.298708]  ? mod_timer+0x18e/0x440
    [  274.302281]  neigh_probe+0x57/0x70
    [  274.305684]  __neigh_event_send+0x197/0x2d0
    [  274.309862]  neigh_resolve_output+0x18c/0x210
    [  274.314212]  ip_finish_output2+0x257/0x690
    [  274.318304]  ip_finish_output+0x219/0x340
    [  274.322314]  ? ip_finish_output+0x219/0x340
    [  274.326493]  ip_output+0x76/0x240
    [  274.329805]  ? ip_fragment.constprop.53+0x80/0x80
    [  274.334510]  ip_local_out+0x3f/0x70
    [  274.337992]  ip_send_skb+0x19/0x40
    [  274.341391]  ip_push_pending_frames+0x33/0x40
    [  274.345740]  raw_sendmsg+0xc15/0x11d0
    [  274.349403]  ? __might_fault+0x85/0x90
    [  274.353151]  ? _copy_from_user+0x6b/0xa0
    [  274.357070]  ? rw_copy_check_uvector+0x54/0x130
    [  274.361604]  inet_sendmsg+0x42/0x1c0
    [  274.365179]  ? inet_sendmsg+0x42/0x1c0
    [  274.368937]  sock_sendmsg+0x3e/0x50
    [  274.372460]  ___sys_sendmsg+0x26f/0x2d0
    [  274.376293]  ? lock_acquire+0x95/0x190
    [  274.380043]  ? __handle_mm_fault+0x7ce/0xb70
    [  274.384307]  ? lock_acquire+0x95/0x190
    [  274.388053]  ? __audit_syscall_entry+0xdd/0x130
    [  274.392586]  ? ktime_get_coarse_real_ts64+0x64/0xc0
    [  274.397461]  ? __audit_syscall_entry+0xdd/0x130
    [  274.401989]  ? trace_hardirqs_on+0x4c/0x100
    [  274.406173]  __sys_sendmsg+0x63/0xa0
    [  274.409744]  ? __sys_sendmsg+0x63/0xa0
    [  274.413488]  __x64_sys_sendmsg+0x1f/0x30
    [  274.417405]  do_syscall_64+0x55/0x190
    [  274.421064]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [  274.426113] RIP: 0033:0x7ff4ae0e6e87
    [  274.429686] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00
    00 00 00 8b 05 ca d9 2b 00 48 63 d2 48 63 ff 85 c0 75 10 b8 2e 00 00
    00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 53 48 89 f3 48 83 ec 10 48 89 7c
    24 08
    [  274.448422] RSP: 002b:00007ffcd9b76db8 EFLAGS: 00000246 ORIG_RAX:
    000000000000002e
    [  274.455978] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007ff4ae0e6e87
    [  274.463104] RDX: 0000000000000000 RSI: 00000000006092e0 RDI: 0000000000000003
    [  274.470228] RBP: 0000000000000000 R08: 00007ffcd9bc40a0 R09: 00007ffcd9bc4080
    [  274.477349] R10: 000000000000060a R11: 0000000000000246 R12: 0000000000000003
    [  274.484475] R13: 0000000000000016 R14: 00007ffcd9b77fa0 R15: 00007ffcd9b78da4
    [  274.491602] Modules linked in: cls_bpf sch_ingress iptable_filter
    ip_tables algif_hash af_alg x86_pkg_temp_thermal fuse [last unloaded:
    test_bpf]
    [  274.504634] CR2: 0000000000000000
    [  274.507976] ---[ end trace 196d18386545eae1 ]---
    [  274.512588] RIP: 0010:          (null)
    [  274.516334] Code: Bad RIP value.
    [  274.519557] RSP: 0018:ffffbc9681f83540 EFLAGS: 00010286
    [  274.524775] RAX: 0000000000000000 RBX: ffffdc967fa80a18 RCX: 0000000000000000
    [  274.531921] RDX: ffff9db2ee08b540 RSI: 000000000000000e RDI: ffffdc967fa809a0
    [  274.539082] RBP: ffffbc9681f83580 R08: ffff9db2c4d62690 R09: 000000000000000c
    [  274.546205] R10: 0000000000000000 R11: ffff9db2ee08b540 R12: ffff9db31ce7c000
    [  274.553329] R13: 0000000000000001 R14: 000000000000000c R15: ffff9db3179cf400
    [  274.560456] FS:  00007ff4ae7c5740(0000) GS:ffff9db31fa80000(0000)
    knlGS:0000000000000000
    [  274.568541] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  274.574277] CR2: ffffffffffffffd6 CR3: 00000004574da004 CR4: 00000000003606e0
    [  274.581403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  274.588535] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [  274.595658] Kernel panic - not syncing: Fatal exception in interrupt
    [  274.602046] Kernel Offset: 0x14400000 from 0xffffffff81000000
    (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
    [  274.612827] ---[ end Kernel panic - not syncing: Fatal exception in
    interrupt ]---
    [  274.620387] ------------[ cut here ]------------

    I'm also seeing the same failure on x86_64, and it reproduces
    consistently.

    From poking around, it looks like the skb's dst entry is being used
    to calculate the mtu in:

    mtu = skb_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;

    ...but because that dst_entry has an "ops" value set to md_dst_ops,
    the various ops (including mtu) are not set:

    crash> struct sk_buff._skb_refdst ffff928f87447700 -x
          _skb_refdst = 0xffffcd6fbf5ea590
    crash> struct dst_entry.ops 0xffffcd6fbf5ea590
      ops = 0xffffffffa0193800
    crash> struct dst_ops.mtu 0xffffffffa0193800
      mtu = 0x0
    crash>

    I confirmed that the dst entry also has dst->input set to
    dst_md_discard, so it looks like it's an entry that's been
    initialized via __metadata_dst_init alright.

    I think the fix here is to use skb_valid_dst(skb) - it checks
    for DST_METADATA also, and with that fix in place, the
    problem - which was previously 100% reproducible - disappears.

    The below patch resolves the panic and all bpf tunnel tests pass
    without incident.
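
The failure mode and the guard described above can be modelled in user
space. This is a sketch under stated assumptions, not kernel code: all
types and names below ('dst_model', 'dst_is_valid()', 'pick_mtu()',
'route_ops', 'md_ops') are hypothetical stand-ins. A metadata dst has no
'mtu' operation, so the caller must check more than "is the pointer
non-NULL" before calling through the ops table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-ins; these are NOT the kernel definitions. */
struct dst_ops_model {
    unsigned int (*mtu)(void);      /* NULL for metadata dsts. */
};

struct dst_model {
    const struct dst_ops_model *ops;
    bool is_metadata;               /* Models the DST_METADATA flag. */
};

static unsigned int
route_mtu(void)
{
    return 1400;
}

const struct dst_ops_model route_ops = { route_mtu };
const struct dst_ops_model md_ops = { NULL };

/* Models the stricter check: the dst exists and is not metadata-only. */
static bool
dst_is_valid(const struct dst_model *dst)
{
    return dst != NULL && !dst->is_metadata;
}

/* Picks the MTU without ever calling through a NULL 'mtu' pointer. */
unsigned int
pick_mtu(const struct dst_model *dst, unsigned int dev_mtu)
{
    if (dst_is_valid(dst)) {
        assert(dst->ops->mtu != NULL);
        return dst->ops->mtu();
    }
    return dev_mtu;
}
```

With only a NULL check on 'dst', the metadata case would call the NULL
'mtu' pointer, which matches the "RIP: (null)" seen in the oops.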

Fixes: c8b34e680a09 ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixed up for backward compatibility with our own compat layer
ip_tunnel.c module.

Cc: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago datapath: fix missing checks for nla_nest_start
Kangjie Lu [Wed, 27 Mar 2019 15:32:18 +0000 (08:32 -0700)]
datapath: fix missing checks for nla_nest_start

Upstream commit:
    commit 0fff9bd47e1341b5c4db862cc39fc68ce45f165d
    Author: Kangjie Lu <kjlu@umn.edu>
    Date:   Fri Mar 15 01:11:22 2019 -0500

    net: openvswitch: fix missing checks for nla_nest_start

    nla_nest_start may fail and thus deserves a check.
    The fix returns -EMSGSIZE when it fails.
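
    The pattern can be sketched in user space as follows. This is an
    illustration only: 'struct msg', 'nest_start()' and 'put_nested()'
    are hypothetical stand-ins for the skb and the netlink helpers, and
    the 4-byte nest header is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size message buffer standing in for an skb. */
struct msg {
    unsigned char buf[16];
    size_t used;
};

/* Models a nest-start helper: reserves a 4-byte nest header, or returns
 * NULL when the buffer has no room left. */
static unsigned char *
nest_start(struct msg *m)
{
    unsigned char *hdr;

    if (sizeof m->buf - m->used < 4) {
        return NULL;
    }
    hdr = m->buf + m->used;
    memset(hdr, 0, 4);
    m->used += 4;
    return hdr;
}

/* Returns 0 on success or -EMSGSIZE when the nest cannot be opened. */
int
put_nested(struct msg *m)
{
    unsigned char *nest;

    assert(m != NULL);
    nest = nest_start(m);
    if (!nest) {
        return -EMSGSIZE;       /* The check that was missing. */
    }
    /* ... nested attributes would be added after 'nest' here ... */
    return 0;
}
```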

Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Kangjie Lu <kjlu@umn.edu>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago net: openvswitch: fix a NULL pointer dereference
Kangjie Lu [Wed, 27 Mar 2019 15:32:17 +0000 (08:32 -0700)]
net: openvswitch: fix a NULL pointer dereference

Upstream commit:
    commit 6f19893b644a9454d85e593b5e90914e7a72b7dd
    Author: Kangjie Lu <kjlu@umn.edu>
    Date:   Thu Mar 14 23:20:16 2019 -0500

    net: openvswitch: fix a NULL pointer dereference

    upcall is dereferenced even when genlmsg_put fails. The fix jumps
    to the 'out' label to avoid the NULL pointer dereference in this case.
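
    A user-space sketch of that control flow, under stated assumptions:
    'put_header()' stands in for the header-producing call, and
    'struct upcall_hdr', 'queue_upcall()' and the -EINVAL error code are
    placeholders chosen for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct upcall_hdr {
    int dp_ifindex;
};

/* Models a header-producing call: NULL when the header does not fit. */
static struct upcall_hdr *
put_header(struct upcall_hdr *storage, int have_room)
{
    return have_room ? storage : NULL;
}

int
queue_upcall(struct upcall_hdr *storage, int have_room, int ifindex)
{
    struct upcall_hdr *upcall;
    int err = 0;

    assert(storage != NULL);
    upcall = put_header(storage, have_room);
    if (!upcall) {
        err = -EINVAL;
        goto out;               /* Skip every use of 'upcall'. */
    }
    upcall->dp_ifindex = ifindex;

out:
    /* Shared cleanup would go here. */
    return err;
}
```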

Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Kangjie Lu <kjlu@umn.edu>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago datapath: convert to kvmalloc
Kent Overstreet [Wed, 27 Mar 2019 15:32:16 +0000 (08:32 -0700)]
datapath: convert to kvmalloc

Upstream commit:
    commit ee9c5e67557f9663b27946ba1d3813fb1924b1fe
    Author: Kent Overstreet <kent.overstreet@gmail.com>
    Date:   Mon Mar 11 23:31:02 2019 -0700

    openvswitch: convert to kvmalloc

    Patch series "generic radix trees; drop flex arrays".

    This patch (of 7):

    There was no real need for this code to be using flexarrays, it's just
    implementing a hash table - ideally it would be using rhashtables, but
    that conversion would be significantly more complicated.
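
    As a rough user-space illustration of the idea (not the datapath
    code), a hash table's buckets can live in one flat array that is
    indexed directly. Here calloc() and free() stand in for the kernel's
    kvmalloc-family allocation and kvfree(); 'struct table' and the
    'table_*()' helpers are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    struct node *next;
    unsigned int key;
};

/* Bucket array held in a single flat allocation. */
struct table {
    struct node **buckets;
    size_t n_buckets;           /* Must be a power of two. */
};

int
table_init(struct table *t, size_t n_buckets)
{
    assert(n_buckets > 0 && (n_buckets & (n_buckets - 1)) == 0);
    t->buckets = calloc(n_buckets, sizeof *t->buckets);
    t->n_buckets = n_buckets;
    return t->buckets ? 0 : -1;
}

void
table_insert(struct table *t, struct node *n)
{
    struct node **head = &t->buckets[n->key & (t->n_buckets - 1)];

    n->next = *head;
    *head = n;
}

struct node *
table_find(const struct table *t, unsigned int key)
{
    struct node *n = t->buckets[key & (t->n_buckets - 1)];

    while (n && n->key != key) {
        n = n->next;
    }
    return n;
}

void
table_destroy(struct table *t)
{
    free(t->buckets);
    t->buckets = NULL;
}
```

    Bucket lookup is then a plain array index, with no per-element
    accessor in between.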

Link: http://lkml.kernel.org/r/20181217131929.11727-2-kent.overstreet@gmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago rhel ovn: Remove ovn-common rpm
Numan Siddique [Tue, 16 Apr 2019 09:01:53 +0000 (14:31 +0530)]
rhel ovn: Remove ovn-common rpm

The ovn-fedora spec generates the rpms ovn, ovn-common, ovn-host, etc.,
of which ovn is an empty package. The ovn Fedora spec file here [1]
has moved all the ovn-common files to the 'ovn' package.
This patch does the same.

[1] - https://src.fedoraproject.org/rpms/ovn/blob/master/f/ovn.spec

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
CC: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago netlink linux: fix to append the netnsid netlink attr.
Flavio Leitner [Tue, 26 Mar 2019 17:15:00 +0000 (14:15 -0300)]
netlink linux: fix to append the netnsid netlink attr.

The attribute was being prepended to the netlink buffer, but
the function nl_sock_transact_multiple__() expects to find the
netlink header first so that it can update the length, seq and pid
fields.

This patch appends the attribute instead of prepending it.
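
The layout requirement can be sketched with a simplified message model.
This is an illustration under stated assumptions: 'struct hdr' and
'struct attr' are reduced stand-ins for the real netlink header and
attribute, and 'msg_append_attr()' is a hypothetical helper. The header
stays at offset 0 and the attribute goes after the existing payload, so
code that later patches the header fields still finds them at the start
of the buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified header: a real netlink header carries more fields. */
struct hdr {
    uint32_t len;       /* Total message length, header included. */
    uint32_t seq;
};

struct attr {
    uint16_t len;
    uint16_t type;
    uint32_t value;
};

/* Appends 'a' after the current payload and updates the header length.
 * Returns 0 on success, -1 if the message is malformed or 'buf' is full. */
int
msg_append_attr(unsigned char *buf, size_t cap, const struct attr *a)
{
    struct hdr h;

    assert(buf != NULL && a != NULL);
    if (cap < sizeof h) {
        return -1;
    }
    memcpy(&h, buf, sizeof h);
    if (h.len < sizeof h || h.len > cap || cap - h.len < sizeof *a) {
        return -1;
    }
    memcpy(buf + h.len, a, sizeof *a);
    h.len += sizeof *a;
    memcpy(buf, &h, sizeof h);
    return 0;
}
```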

Fixes: 756819ddd788 ("netdev-linux: use netlink to update netdev.")
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago netlink linux: account for the netnsid netlink attr.
Flavio Leitner [Tue, 26 Mar 2019 17:14:59 +0000 (14:14 -0300)]
netlink linux: account for the netnsid netlink attr.

The buffer needs to be reallocated and data copied when
the netnsid netlink attribute is included, so avoid that
by accounting for the attribute when the buffer is initially
allocated.
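
The effect of sizing the buffer up front can be shown with a small
growable buffer. This is a sketch only: 'struct gbuf', 'gbuf_init()' and
'gbuf_put()' are hypothetical and much simpler than the buffer type used
by the netlink code; the 'n_reallocs' counter exists purely to make the
difference visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct gbuf {
    unsigned char *data;
    size_t size;                /* Bytes in use. */
    size_t allocated;           /* Bytes available. */
    unsigned int n_reallocs;    /* For demonstration only. */
};

int
gbuf_init(struct gbuf *b, size_t initial)
{
    size_t n = initial ? initial : 1;

    b->data = malloc(n);
    b->size = 0;
    b->allocated = b->data ? n : 0;
    b->n_reallocs = 0;
    return b->data ? 0 : -1;
}

/* Appends 'n' bytes, growing (and possibly copying) the buffer only when
 * the initial allocation did not account for them. */
int
gbuf_put(struct gbuf *b, const void *p, size_t n)
{
    assert(b->data != NULL);
    if (b->allocated - b->size < n) {
        size_t want = b->size + n;
        unsigned char *bigger = realloc(b->data, want);

        if (!bigger) {
            return -1;
        }
        b->data = bigger;
        b->allocated = want;
        b->n_reallocs++;
    }
    memcpy(b->data + b->size, p, n);
    b->size += n;
    return 0;
}
```

A buffer initialized with room for both the request and the extra
attribute never reaches the realloc() branch, while one sized for the
request alone reallocates once when the attribute is added.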

Fixes: 756819ddd788 ("netdev-linux: use netlink to update netdev.")
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>