]> git.proxmox.com Git - mirror_ovs.git/log
mirror_ovs.git
4 years agoovsdb-server: fix memory leak while deleting zone
Damijan Skvarc [Tue, 12 Nov 2019 11:32:35 +0000 (12:32 +0100)]
ovsdb-server: fix memory leak while deleting zone

memory leak was detected by valgrind during execution
of "database commands -- positive checks" test.

leaked memory was allocated in ovsdb_execute_mutate() function
while parsing mutations from the apparent json entity:

==19563==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==19563==    by 0x4652D0: xmalloc (util.c:138)
==19563==    by 0x46539E: xmemdup0 (util.c:168)
==19563==    by 0x4653F7: xstrdup (util.c:177)
==19563==    by 0x450379: ovsdb_base_type_clone (ovsdb-types.c:208)
==19563==    by 0x450F8D: ovsdb_type_clone (ovsdb-types.c:550)
==19563==    by 0x428C3F: ovsdb_mutation_from_json (mutation.c:108)
==19563==    by 0x428F6B: ovsdb_mutation_set_from_json (mutation.c:187)
==19563==    by 0x42578D: ovsdb_execute_mutate (execution.c:573)
==19563==    by 0x4246B0: ovsdb_execute_compose (execution.c:171)
==19563==    by 0x41CDE5: ovsdb_trigger_try (trigger.c:204)
==19563==    by 0x41C8DF: ovsdb_trigger_init (trigger.c:61)
==19563==    by 0x40E93C: ovsdb_jsonrpc_trigger_create (jsonrpc-server.c:1135)
==19563==    by 0x40E20C: ovsdb_jsonrpc_session_got_request (jsonrpc-server.c:1002)
==19563==    by 0x40D1C2: ovsdb_jsonrpc_session_run (jsonrpc-server.c:561)
==19563==    by 0x40D31E: ovsdb_jsonrpc_session_run_all (jsonrpc-server.c:591)
==19563==    by 0x40CD6E: ovsdb_jsonrpc_server_run (jsonrpc-server.c:406)
==19563==    by 0x40627E: main_loop (ovsdb-server.c:209)
==19563==    by 0x406E66: main (ovsdb-server.c:460)

This memory is usually freed at the end of ovsdb_execute_mutate()
however in the aforementioned test case this does not happen. Namely
in case of delete mutator and in case of error while calling ovsdb_datum_from_json()
apparent mutation was marked as invalid, what prevents freeing problematic memory.

Memory leak can be reproduced quickly with the following command sequence:
ovs-vsctl --no-wait -vreconnect:emer add-zone-tp netdev zone=1 icmp_first=1 icmp_reply=2
ovs-vsctl --no-wait -vreconnect:emer del-zone-tp netdev zone=1

Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agocompat: Add missing inline keyword
Greg Rose [Tue, 5 Nov 2019 22:14:24 +0000 (14:14 -0800)]
compat: Add missing inline keyword

The missing inline keyword before the definition of the
rpl_nf_ct_tmpl_free() function causes spurious warnings about the
function not being used on some older kernels.  Add the keyword
to suppress the warning.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovs-actions: Clarify documentation for stack usage with group buckets.
Ben Pfaff [Wed, 20 Nov 2019 19:53:46 +0000 (11:53 -0800)]
ovs-actions: Clarify documentation for stack usage with group buckets.

This should be less confusing now.

Reported-by: Han Zhou <hzhou@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agonetdev-afxdp: Best-effort configuration of XDP mode.
Ilya Maximets [Wed, 6 Nov 2019 21:38:33 +0000 (21:38 +0000)]
netdev-afxdp: Best-effort configuration of XDP mode.

Until now there was only two options for XDP mode in OVS: SKB or DRV.
i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.

Devices like 'veth' interfaces in Linux supports native XDP, but
doesn't support zero-copy mode.  This case can not be covered by
existing API and we have to use slower generic XDP for such devices.
There are few more issues, e.g. TCP is not supported in generic XDP
mode for veth interfaces due to kernel limitations, however it is
supported in native mode.

This change introduces ability to use native XDP without zero-copy
along with best-effort configuration option that enabled by default.
In best-effort case OVS will sequentially try different modes starting
from the fastest one and will choose the first acceptable for current
interface.  This will guarantee the best possible performance.

If user will want to choose specific mode, it's still possible by
setting the 'options:xdp-mode'.

This change additionally changes the API by renaming the configuration
knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
themselves to be more user-friendly.

The full list of currently supported modes:
  * native-with-zerocopy - former DRV
  * native               - new one, DRV without zero-copy
  * generic              - former SKB
  * best-effort          - new one, chooses the best available from
                           3 above modes

Since 'best-effort' is a default mode, users will not need to
explicitely set 'xdp-mode' in most cases.

TCP related tests enabled back in system afxdp testsuite, because
'best-effort' will choose 'native' mode for veth interfaces
and this mode has no issues with TCP.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
4 years agodpdk: Deprecate pdump support.
Ilya Maximets [Mon, 11 Nov 2019 18:52:56 +0000 (19:52 +0100)]
dpdk: Deprecate pdump support.

The conventional way for packet dumping in OVS is to use ovs-tcpdump
that works via traffic mirroring.  DPDK pdump could probably be used
for some lower level debugging, but it is not commonly used for
various reasons.

There are lots of limitations for using this functionality in practice.
Most of them connected with running secondary pdump process and
memory layout issues like requirement to disable ASLR in kernel.
More details are available in DPDK guide:
https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations

Beside the functional limitations it's also hard to use this
functionality correctly.  User must be sure that OVS and pdump utility
are running on different CPU cores, which is hard because non-PMD
threads could float over available CPU cores.  This or any other
misconfiguration will likely lead to crash of the pdump utility
or/and OVS.

Another problem is that the user must actually have this special pdump
utility in a system and it might be not available in distributions.

This change disables pdump support by default introducing special
configuration option '--enable-dpdk-pdump'.  Deprecation warnings will
be shown to users on configuration and in runtime.

Claiming to completely remove this functionality from OVS in one
of the next releases.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agonetdev-afxdp: add afxdp specific maximum MTU check
Eelco Chaudron [Tue, 12 Nov 2019 09:46:09 +0000 (04:46 -0500)]
netdev-afxdp: add afxdp specific maximum MTU check

Drivers natively supporting AF_XDP will check that a configured MTU size
will not exceed the allowed size for AF_XDP. However, when the skb
compatibility mode is used there is no check and any value is accepted.
This, for example, is the case when using the TAP interface.

This fix adds a check to make sure only AF_XDP valid values are excepted.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agodpif-netdev: Log rxq assignment for isolated pmd.
Gowrishankar Muthukrishnan [Sat, 9 Nov 2019 03:11:27 +0000 (08:41 +0530)]
dpif-netdev: Log rxq assignment for isolated pmd.

There is no log about isolated rxq assignment in a pmd today, which
sometimes could be useful to trace rxq/pmd pinning, when debugging
with log. Ovs-appctl dpif-netdev/pmd-rxq-show reports about it
already, but logging is helpful to trace pinning in time.

Signed-off-by: Gowrishankar Muthukrishnan <gmuthukr@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agorhel: Fix ovs-kmod-manage.sh that may create invalid soft links
Yifeng Sun [Mon, 18 Nov 2019 20:06:26 +0000 (12:06 -0800)]
rhel: Fix ovs-kmod-manage.sh that may create invalid soft links

Current code iterates every kernel under '/lib/modules' for a matched
version. As a result, this script may create invalid soft links if the
matched kernel doesn't have openvswitch-kmod RPM installed.

This patch fixes it.

VMWare-BZ: #2257534

Fixes: c3570519 ("rhel: add 4.4 kernel in kmod build with mulitple versions, fedora")
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoovs-bugtool: Script to collect the port statistics.
Sriram Vatala [Tue, 29 Oct 2019 14:50:06 +0000 (20:20 +0530)]
ovs-bugtool: Script to collect the port statistics.

Sometimes, analysing the drop statistics of the ports
will be helpful in debugging. This patch adds script
to collect all supported port stats which also includes
the drop counters in userspace datapath. The output of
this script is included in the bugtool output.

Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-dpdk: Detailed packet drop statistics.
Sriram Vatala [Tue, 29 Oct 2019 14:50:05 +0000 (20:20 +0530)]
netdev-dpdk: Detailed packet drop statistics.

OVS may be unable to transmit packets for multiple reasons on
the userspace datapath and today there is a single counter to
track packets dropped due to any of those reasons. This patch
adds custom software stats for the different reasons packets
may be dropped during tx/rx on the userspace datapath in OVS.

- MTU drops : drops that occur due to a too large packet size
- Qos drops : drops that occur due to egress/ingress QOS
- Tx failures: drops as returned by the DPDK PMD send function

Note that the reason for tx failures is not specified in OVS.
In practice for vhost ports it is most common that tx failures
are because there are not enough available descriptors,
which is usually caused by misconfiguration of the guest queues
and/or because the guest is not consuming packets fast enough
from the queues.

These counters are displayed along with other stats in
"ovs-vsctl get interface <iface> statistics" command and are
available for dpdk and vhostuser/vhostuserclient ports.

Also the existing "tx_retries" counter for vhost ports has been
renamed to "ovs_tx_retries", so that all the custom statistics
that OVS accumulates itself will have the prefix "ovs_". This
will prevent any custom stats names overlapping with
driver/HW stats.

Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-dpdk: Reuse vhost function for dpdk ETH custom stats.
Ilya Maximets [Tue, 29 Oct 2019 14:50:04 +0000 (20:20 +0530)]
netdev-dpdk: Reuse vhost function for dpdk ETH custom stats.

This is yet another refactoring for upcoming detailed drop stats.
It allows to use single function for all the software calculated
statistics in netdev-dpdk for both vhost and ETH ports.

UINT64_MAX used as a marker for non-supported statistics in a
same way as it's done in bridge.c for common netdev stats.

Co-authored-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
4 years agotc: Set 'no_percpu' flag for compatible actions
Vlad Buslov [Mon, 4 Nov 2019 16:34:49 +0000 (18:34 +0200)]
tc: Set 'no_percpu' flag for compatible actions

Recent changes in Linux kernel TC action subsystem introduced new
TCA_ACT_FLAGS_NO_PERCPU_STATS flag. The purpose of the flag is to request
action implementation to skip allocating action stats with expensive percpu
allocator and use regular built-in action stats instead. Such approach
significantly improves rule insertion rate and reduce memory usage for
hardware-offloaded rules that don't need benefits provided by percpu
allocated stats (improved software TC fast-path performance). Set the flag
for all compatible actions.

Modify acinclude.m4 to use OVS-internal pkt_cls.h implementation when
TCA_ACT_FLAGS is not defined by kernel headers and to manually define
struct nla_bitfield32 in netlink.h (new file) when it is not defined by
kernel headers.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agocompat: Add compat fix for old kernels
Roi Dayan [Wed, 6 Nov 2019 07:34:46 +0000 (09:34 +0200)]
compat: Add compat fix for old kernels

In kernels older than 4.8, struct tcf_t didn't have the firstuse.
If openvswitch is compiled with the compat pkt_cls.h then there is
a struct size mismatch between openvswitch and the kernel which cause
parsing netlink actions to fail.
After this commit parsing the netlink actions pass even if compiled with
the compat pkt_cls.h.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agonetdev-dpdk: Track vhost tx contention.
David Marchand [Mon, 26 Aug 2019 14:33:17 +0000 (16:33 +0200)]
netdev-dpdk: Track vhost tx contention.

Add a coverage counter to help diagnose contention on the vhost txqs.
This is seen as dropped packets on the physical ports for rates that
are usually handled fine by OVS.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agotests: Wait up to OVS_CTL_TIMEOUT seconds.
Ilya Maximets [Wed, 6 Nov 2019 16:29:58 +0000 (17:29 +0100)]
tests: Wait up to OVS_CTL_TIMEOUT seconds.

While running tests under valgrind, it could take more than 10 seconds
for process to disappear after successful 'ovs-appctl exit' command.

Same applies to some other events that tests are waiting for with
OVS_WAIT macro.  This makes tests to fail frequently under valgrind.

Using OVS_CTL_TIMEOUT variable instead of constant 10 seconds seems
reasonable to avoid this issue because it controls timeouts of all
control utilities and needs to be adjusted while running under valgrind
anyway.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agovswitch.xml: Fix column for xdpmode.
Ilya Maximets [Tue, 5 Nov 2019 20:54:08 +0000 (21:54 +0100)]
vswitch.xml: Fix column for xdpmode.

'xdpmode' is part of 'options', not the 'other_config'.

CC: William Tu <u9012063@gmail.com>
Fixes: 0de1b425962d ("netdev-afxdp: add new netdev type for AF_XDP.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoip_gre: Remove even more unused code
Greg Rose [Mon, 4 Nov 2019 20:43:47 +0000 (12:43 -0800)]
ip_gre: Remove even more unused code

There is a confusing mix of ipgre and gretap functions with some
needed for gretap still having ipgre_ prefixes.  This time though
I think I got the rest of the unused ipgre code.

Passes Travis here and this time I made sure the patch passing
Travis is the same one I'm mailing.
https://travis-ci.org/gvrose8192/ovs-experimental/builds/607296133

Fixes: d5822f428814 ("gre: Remove dead ipgre code")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoAUTHORS: Add Tomasz Konieczny.
Ilya Maximets [Mon, 4 Nov 2019 19:27:01 +0000 (20:27 +0100)]
AUTHORS: Add Tomasz Konieczny.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-dpdk: Fix flow control not configuring.
Tomasz Konieczny [Thu, 12 Sep 2019 10:43:20 +0000 (12:43 +0200)]
netdev-dpdk: Fix flow control not configuring.

Currently OVS is unable to change flow control configuration in DPDK
because new settings are being overwritten by current settings with
rte_eth_dev_flow_ctrl_get(). The fix restores correct order of
operations and at the same time does not trigger error on devices
without flow control support when flow control not requested.

Fixes: 7e1de65e8dfb ("netdev-dpdk: Fix failure to configure flow control at netdev-init.")
Signed-off-by: Tomasz Konieczny <tomaszx.konieczny@intel.com>
Co-authored-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agofaq: Fix meter action releases.
Darrell Ball [Sat, 2 Nov 2019 19:05:34 +0000 (12:05 -0700)]
faq: Fix meter action releases.

At the same time disambiguate some feature descriptions.
'Meters' is changed to 'Meter action' to clarify that the entry
describes the Openflow meter action rather than port based meters.
'NAT' is changed to 'Conntrack NAT' to indicate that this entry
represents NAT done in 'conntrack', rather than basic Openflow
IP address and L4 port modifications.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agolib/tc: Fix flow dump for tunnel id equal zero
Dmytro Linkin [Wed, 30 Oct 2019 12:40:35 +0000 (14:40 +0200)]
lib/tc: Fix flow dump for tunnel id equal zero

Tunnel id 0 is not printed unless tunnel flag FLOW_TNL_F_KEY is set.
Fix that by always setting FLOW_TNL_F_KEY when tunnel id is valid.

Fixes: 0227bf092ee6 ("lib/tc: Support optional tunnel id")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agotravis: Fix skipping of python SSL tests.
Ilya Maximets [Fri, 1 Nov 2019 22:16:28 +0000 (23:16 +0100)]
travis: Fix skipping of python SSL tests.

After this change we'll have only one windows related skipped test
in default build.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agotravis: Workaround skipping of IPv6 tests.
Ilya Maximets [Fri, 1 Nov 2019 21:53:03 +0000 (22:53 +0100)]
travis: Workaround skipping of IPv6 tests.

IPv6 support disabled in TravisCI images but supported by kernel.
So, we could enable it in order to not skip unit tests.
We are not trying to communicate over network with IPv6, so this
should not make any harm.

Related issue: https://github.com/travis-ci/travis-ci/issues/8891

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agoofp-monitor: Fixed the usage of 'usable_protocols' variable in 'parse_flow_monitor_re...
Ashish Varma [Tue, 23 Jul 2019 20:02:10 +0000 (13:02 -0700)]
ofp-monitor: Fixed the usage of 'usable_protocols' variable in 'parse_flow_monitor_request' function.

'usable_protocols' is now getting set to OFPUTIL_P_OF10_ANY on return from
'parse_flow_monitor_request' function. The calling function now checks for the
value in this variable against the 'allowed_protocols' variable.
Also a check is added for a match field which is not supported in OpenFlow 1.0
and return an error.
Modified the man page of ovs-ofctl to reflect Flow Monitor support as
OpenFlow 1.0 Nicira extension only.

Signed-off-by: Ashish Varma <ashishvarma.ovs@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoRevert "ip_gre: Remove even more unused code"
Greg Rose [Fri, 1 Nov 2019 16:07:54 +0000 (09:07 -0700)]
Revert "ip_gre: Remove even more unused code"

This reverts commit 42a059e02bf343787951be2824c579e1c9a26e12.

Not all the necessary ipgre prefixed code was removed that
should have been.  Another patch will follow with the correct
removed code.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agotravis: Enable pdump for DPDK build.
Ilya Maximets [Wed, 30 Oct 2019 18:39:46 +0000 (19:39 +0100)]
travis: Enable pdump for DPDK build.

OVS has support for DPDK pdump that checked in configure script.
Enabling it to increase OVS build test coverage by the code guarded
by DPDK_PDUMP macro.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
4 years agoip_gre: Remove even more unused code
Greg Rose [Thu, 31 Oct 2019 22:46:04 +0000 (15:46 -0700)]
ip_gre: Remove even more unused code

There is a confusing mix of ipgre and gretap functions with some
needed for gretap still having ipgre_ prefixes.  This time though
I think I got the rest of the unused ipgre code.

Fixes: d5822f428814 ("gre: Remove dead ipgre code")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoip_gre: Removed unused ipgre netdev ops
Greg Rose [Thu, 31 Oct 2019 20:30:39 +0000 (13:30 -0700)]
ip_gre: Removed unused ipgre netdev ops

When cleaning up unused ipgre code the ipgre_netdev_ops structure
was missed. Get rid of it now.

Fixes: d5822f428814 ("gre: Remove dead ipgre code")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoAvoid indeterminate statistics in offload implementations.
Ben Pfaff [Fri, 25 Oct 2019 18:46:24 +0000 (11:46 -0700)]
Avoid indeterminate statistics in offload implementations.

A lot of the offload implementations didn't bother to initialize the
statistics they were supposed to return.  I don't know whether any of
the callers actually use them, but it looked wrong.

Found by inspection.

Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoAUTHORS: Add Gowrishankar Muthukrishnan.
Ben Pfaff [Wed, 30 Oct 2019 17:51:44 +0000 (10:51 -0700)]
AUTHORS: Add Gowrishankar Muthukrishnan.

Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agolacp: warn transmit failure of lacp pdu
Gowrishankar Muthukrishnan [Mon, 21 Oct 2019 14:04:36 +0000 (19:34 +0530)]
lacp: warn transmit failure of lacp pdu

It might be difficult to trace whether LACP PDU tx (as in
response) was successful when the pdu was not transmitted by
egress slave for various reasons (including resource contention
within NIC) and only way to trace its fate is by looking at
ofproto->stats.tx_[packets/bytes] and slave port stats.

Adding a warning when there is tx failure could help user
debug at the root of this problem.

Signed-off-by: Gowrishankar Muthukrishnan <gmuthukr@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovsdb-execute: Remove unused variable from ovsdb_execute_mutate().
Damijan Skvarc [Tue, 29 Oct 2019 13:04:58 +0000 (14:04 +0100)]
ovsdb-execute: Remove unused variable from ovsdb_execute_mutate().

Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agonetdev-afxdp: Add need_wakeup support.
William Tu [Wed, 23 Oct 2019 21:06:01 +0000 (14:06 -0700)]
netdev-afxdp: Add need_wakeup support.

The patch adds support for using need_wakeup flag in AF_XDP rings.
A new option, use-need-wakeup, is added.  When this option is used,
it means that OVS has to explicitly wake up the kernel RX, using poll()
syscall and wake up TX, using sendto() syscall. This feature improves
the performance by avoiding unnecessary sendto syscalls for TX.
For RX, instead of kernel always busy-spinning on fille queue, OVS wakes
up the kernel RX processing when fill queue is replenished.

The need_wakeup feature is merged into Linux kernel bpf-next tee with commit
77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings") and
OVS enables it by default, if libbpf supports it.  If users enable it but
runs in an older version of libbpf, then the need_wakeup feature has no effect,
and a warning message is logged.

For virtual interface, it's better set use-need-wakeup=false, since
the virtual device's AF_XDP xmit is synchronous: the sendto syscall
enters kernel and process the TX packet on tx queue directly.

On Intel Xeon E5-2620 v3 2.4GHz system, performance of physical port
to physical port improves from 6.1Mpps to 7.3Mpps.

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agorhel: openvswitch-fedora.spec.in: Fix output redirect to null device
Roi Dayan [Mon, 28 Oct 2019 08:37:44 +0000 (10:37 +0200)]
rhel: openvswitch-fedora.spec.in: Fix output redirect to null device

Add missing slash.

Fixes: 0447019df7c6 ("fedora-spec: added systemd post/postun/pre/preun sections")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agodpif-netdev: Fix time delta overflow in case of race for meter lock.
Ilya Maximets [Thu, 24 Oct 2019 13:15:07 +0000 (15:15 +0200)]
dpif-netdev: Fix time delta overflow in case of race for meter lock.

There is a race window between getting the time and getting the meter
lock.  This could lead to situation where the thread with larger
current time (this thread called time_{um}sec() later than others)
will acquire meter lock first and update meter->used to the large
value.  Next threads will try to calculate time delta by subtracting
the large meter->used from their lower time getting the negative value
which will be converted to a big unsigned delta.

Fix that by assuming that all these threads received packets in the
same time in this case, i.e. dropping negative delta to 0.

CC: Jarno Rajahalme <jarno@ovn.org>
Fixes: 4b27db644a8c ("dpif-netdev: Simple DROP meter implementation.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-September/363126.html
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
4 years agodpif-netdev: Do not mix recirculation depth into RSS hash itself.
Ilya Maximets [Thu, 24 Oct 2019 10:55:15 +0000 (12:55 +0200)]
dpif-netdev: Do not mix recirculation depth into RSS hash itself.

Mixing of RSS hash with recirculation depth is useful for flow lookup
because same packet after recirculation should match with different
datapath rule.  Setting of the mixed value back to the packet is
completely unnecessary because recirculation depth is different on
each recirculation, i.e. we will have different packet hash for
flow lookup anyway.

This should fix the issue that packets from the same flow could be
directed to different buckets based on a dp_hash or different ports of
a balanced bonding in case they were recirculated different number of
times (e.g. due to conntrack rules).
With this change, the original RSS hash will remain the same making
it possible to calculate equal dp_hash values for such packets.

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-September/363127.html
Fixes: 048963aa8507 ("dpif-netdev: Reset RSS hash when recirculating.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
4 years agocommand-line: New function ovs_cmdl_env_parse_all().
Aliasgar Ginwala [Fri, 25 Oct 2019 19:33:52 +0000 (12:33 -0700)]
command-line: New function ovs_cmdl_env_parse_all().

This function allows an environment variable to be included in
command-line parsing.  It will receive its first user in an
upcoming commit.

Signed-off-by: Aliasgar Ginwala <aginwala@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovsdb-server: fix memory leak while converting database
Damijan Skvarc [Fri, 25 Oct 2019 12:22:58 +0000 (14:22 +0200)]
ovsdb-server: fix memory leak while converting database

Memory leak happens while converting existing database into new
database according to the specified schema (ovsdb-client convert
new-schema). Memory leak was detected by valgrind while executing
functional test "schema conversion online - clustered"

==16202== 96 bytes in 6 blocks are definitely lost in loss record 326 of 399
==16202==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16202==    by 0x44A5D4: xmalloc (util.c:138)
==16202==    by 0x4377A6: alloc_default_atoms (ovsdb-data.c:315)
==16202==    by 0x437F18: ovsdb_datum_init_default (ovsdb-data.c:918)
==16202==    by 0x413D82: ovsdb_row_create (row.c:59)
==16202==    by 0x40AA53: ovsdb_convert_table (file.c:220)
==16202==    by 0x40AA53: ovsdb_convert (file.c:275)
==16202==    by 0x416BE1: ovsdb_trigger_try (trigger.c:255)
==16202==    by 0x40D29E: ovsdb_jsonrpc_trigger_create (jsonrpc-server.c:1119)
==16202==    by 0x40D29E: ovsdb_jsonrpc_session_got_request (jsonrpc-server.c:986)
==16202==    by 0x40D29E: ovsdb_jsonrpc_session_run (jsonrpc-server.c:556)
==16202==    by 0x40D29E: ovsdb_jsonrpc_session_run_all (jsonrpc-server.c:586)
==16202==    by 0x40D29E: ovsdb_jsonrpc_server_run (jsonrpc-server.c:401)
==16202==    by 0x40682E: main_loop (ovsdb-server.c:209)
==16202==    by 0x40682E: main (ovsdb-server.c:460)

The problem was in ovsdb_datum_convert() function, which overrides
pointers to datum memory allocated in ovsdb_row_create() function.
Fix was done by freeing this memory before ovsdb_datum_convert()
is called.

Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodocs: To build OVS on RHEL7 EPEL is needed
Timothy Redaelli [Fri, 25 Oct 2019 15:41:22 +0000 (17:41 +0200)]
docs: To build OVS on RHEL7 EPEL is needed

Since Python 3 is now mandatory, Extra Packages for Enterprise Linux
(EPEL) repository is needed in order to build OVS on RHEL7.

Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoflow: Fix crash on vlan packets with partial offloading.
Ilya Maximets [Wed, 23 Oct 2019 20:26:52 +0000 (22:26 +0200)]
flow: Fix crash on vlan packets with partial offloading.

parse_tcp_flags() does not care about vlan tags in a packet thus
not able to parse them.  As a result, if partial offloading is
enabled in userspace datapath vlan packets are not parsed, i.e.
has no initialized offsets.  This causes OVS crash on any attempt
to access/modify packet header fields.

For example, having the flow with following actions:
  in_port=1,ip,actions=mod_nw_src:192.168.0.7,output:IN_PORT

will lead to OVS crash on vlan packet handling:

 Process terminating with default action of signal 11 (SIGSEGV)
 Invalid read of size 4
    at 0x785657: get_16aligned_be32 (unaligned.h:249)
    by 0x785657: odp_set_ipv4 (odp-execute.c:82)
    by 0x785657: odp_execute_masked_set_action (odp-execute.c:527)
    by 0x785657: odp_execute_actions (odp-execute.c:894)
    by 0x74CDA9: dp_netdev_execute_actions (dpif-netdev.c:7355)
    by 0x74CDA9: packet_batch_per_flow_execute (dpif-netdev.c:6339)
    by 0x74CDA9: dp_netdev_input__ (dpif-netdev.c:6845)
    by 0x74DB6E: dp_netdev_input (dpif-netdev.c:6854)
    by 0x74DB6E: dp_netdev_process_rxq_port (dpif-netdev.c:4287)
    by 0x74E863: dpif_netdev_run (dpif-netdev.c:5264)
    by 0x703F57: type_run (ofproto-dpif.c:370)
    by 0x6EC8B8: ofproto_type_run (ofproto.c:1760)
    by 0x6DA52B: bridge_run__ (bridge.c:3188)
    by 0x6E083F: bridge_run (bridge.c:3252)
    by 0x1642E4: main (ovs-vswitchd.c:127)
  Address 0xc is not stack'd, malloc'd or (recently) free'd

Fix that by properly parsing vlan tags first.  Function 'parse_dl_type'
transformed for that purpose as it had no users anyway.

Added unit test for packet modification with partial offloading that
triggers above crash.

Fixes: aab96ec4d81e ("dpif-netdev: retrieve flow directly from the flow mark")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agotests: Fix indentation in userspace packet type aware test.
Ilya Maximets [Thu, 24 Oct 2019 12:28:50 +0000 (12:28 +0000)]
tests: Fix indentation in userspace packet type aware test.

CC: Ben Pfaff <blp@ovn.org>
Fixes: 7be29a47576d ("ofproto-dpif: Remove tabs from output.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agorhel: Support RHEL7.7 build and packaging
Yifeng Sun [Fri, 11 Oct 2019 21:49:14 +0000 (14:49 -0700)]
rhel: Support RHEL7.7 build and packaging

This patch provides essential fixes for OVS to support
RHEL7.7's new kernel.

make rpm-fedora-kmod \
RPMBUILD_OPT='-D "kversion 3.10.0-1062.1.2.el7.x86_64"'

Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovsdb-server: Allow replication from older schema version servers.
Numan Siddique [Mon, 21 Oct 2019 16:56:51 +0000 (22:26 +0530)]
ovsdb-server: Allow replication from older schema version servers.

Presently, replication is not allowed if there is a schema version mismatch between
the schema returned by the active ovsdb-server and the local db schema. This is
causing failures in OVN DB HA deployments during uprades.

In the case of OpenStack tripleo deployment with OVN, OVN DB ovsdb-servers are
deployed on a multi node controller cluster in active/standby mode. During
minor updates or major upgrades, the cluster is updated one at a time. If
a node A is running active OVN DB ovsdb-servers and when it is updated, another
node B becomes active. After the update when OVN DB ovsdb-servers in A are started,
these ovsdb-servers fail to replicate from the active if there is a schema
version mismatch.

This patch addresses this issue by allowing replication even if there is a
schema version mismatch only if all the active db schema tables and its colums are
present in the local db schema.

This should not result in any data loss.

Signed-off-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agolldp: Fix for OVS crashes when a LLDP-enabled port is deleted
Surya Rudra [Mon, 21 Oct 2019 07:12:02 +0000 (12:42 +0530)]
lldp: Fix for OVS crashes when a LLDP-enabled port is deleted

Issue:
When LLDP is enabled on a port, a structure to hold LLDP related state
is created and that structure has a reference to the port. The ofproto
monitor thread accesses the LLDP structure to periodically send packets
over the associated port. When the port is deleted, the LLDP structure
is not cleaned up and it continues to refer to the deleted port.

When the monitor thread attempts to access the deleted port OVS crashes.
Crash can happen with bridge delete and bond delete also.

Fix:
Remove all references to the LLDP structure and free it when
the port is deleted.

Signed-off-by: Surya Rudra <rudrasurya.r@altencalsoftlabs.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodocs: DPDK isn't a datapath, so don't use the term.
Ben Pfaff [Wed, 23 Oct 2019 17:19:39 +0000 (10:19 -0700)]
docs: DPDK isn't a datapath, so don't use the term.

The DPDK library allows OVS fast access to packet I/O in userspace.  It
is not a datapath.  This commit avoids using that term.

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agofaq: Give specific versions that introduced various features.
Ben Pfaff [Wed, 23 Oct 2019 17:19:38 +0000 (10:19 -0700)]
faq: Give specific versions that introduced various features.

Some users would find it useful to know the particular OVS version that
introduced a feature to the OVS tree kernel module or to the OVS
userspace (DPDK) datapath implementation.  This patch updates the FAQ
to include that information.

This information is primarily gleaned from the top-level NEWS file.
For most of these, I did not verify them by looking carefully through
the history, so some of them may be inaccurate, although a few people
made corrections in review.

Requested-by: Jianjun Shen <shenj@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agolacp: report desync in ovs threads enabling slave
Gowrishankar Muthukrishnan [Tue, 22 Oct 2019 05:29:14 +0000 (10:59 +0530)]
lacp: report desync in ovs threads enabling slave

It is helpful in reporting main thread that is yet to enable bond slave,
but link state was brought up by lacp thread and capture this desync
between ovs threads for debugging.

Fixes: a8448cb170 ("lacp: Avoid packet drop on LACP bond after link up")
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukr@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovs-tcpundump: allow multiple packet lengths
Aaron Conole [Tue, 22 Oct 2019 14:55:59 +0000 (10:55 -0400)]
ovs-tcpundump: allow multiple packet lengths

The tcpundump tool expects all packets to be a length which aligns to
exactly a 4-nibble boundary.  This means packets like DNS requests will be
stripped before being correctly processed.  Fix this by allowing at least
two nibbles (or one byte) alignment.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovs-tcpundump: exit when getting version
Aaron Conole [Tue, 22 Oct 2019 14:55:58 +0000 (10:55 -0400)]
ovs-tcpundump: exit when getting version

Running 'ovs-tcpundump -V' will cause ovs-tcpundump to start processing on
stdin.  Instead, print the version and exit.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agonetdev-dpdk: Fix Tx queue false sharing.
Ilya Maximets [Mon, 26 Aug 2019 14:54:04 +0000 (17:54 +0300)]
netdev-dpdk: Fix Tx queue false sharing.

'tx_q' array is allocated for each DPDK netdev.  'struct dpdk_tx_queue'
is 8 bytes long, so 8 tx queues are sharing the same cache line in
case of 64B cacheline size.  This causes 'false sharing' issue in
mutliqueue case because taking the spinlock implies write to memory
i.e. cache invalidation.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
4 years agotravis: Test build with afxdp.
Ilya Maximets [Mon, 21 Oct 2019 13:33:04 +0000 (15:33 +0200)]
travis: Test build with afxdp.

We can't easily update the kernel on TravisCI to run system tests
with AF_XDP, but we could run build tests with libbpf and headers
from newer kernels.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
4 years agorhel: Remove the cond 'build_python3'
Numan Siddique [Mon, 21 Oct 2019 09:42:42 +0000 (15:12 +0530)]
rhel: Remove the cond 'build_python3'

A previous patch removed python2 support from ovs. So we can remove
this condition and make python3 mandatory for builds. Without this
patch, make rpm-fedora on centos 7 fails unless  we pass
RPMBUILD_OPT="--with build_python3".

Signed-off-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodebian and rhel: Add libunwind dev package.
William Tu [Thu, 17 Oct 2019 19:55:36 +0000 (12:55 -0700)]
debian and rhel: Add libunwind dev package.

The patch add libunwind dev package to debian and rhel.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
4 years agoossfuzz: Simplify miniflow fuzzer harness.
Bhargava Shastry [Fri, 18 Oct 2019 15:17:34 +0000 (17:17 +0200)]
ossfuzz: Simplify miniflow fuzzer harness.

Google's oss-fuzz builder bots were complaining that miniflow_target is
too slow to fuzz in that some tests take longer than a second to
complete. This patch fixes this by replacing the random flow generation
within the harness to a more simpler scenario.

Signed-off-by: Bhargava Shastry <bshas3@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath: Allow attaching helper in later commit
Yi-Hung Wei [Tue, 15 Oct 2019 17:27:53 +0000 (10:27 -0700)]
datapath: Allow attaching helper in later commit

Upstream commit:
commit 248d45f1e1934f7849fbdc35ef1e57151cf063eb
Author: Yi-Hung Wei <yihung.wei@gmail.com>
Date:   Fri Oct 4 09:26:44 2019 -0700

    openvswitch: Allow attaching helper in later commit

    This patch allows to attach conntrack helper to a confirmed conntrack
    entry.  Currently, we can only attach alg helper to a conntrack entry
    when it is in the unconfirmed state.  This patch enables an use case
    that we can firstly commit a conntrack entry after it passed some
    initial conditions.  After that the processing pipeline will further
    check a couple of packets to determine if the connection belongs to
    a particular application, and attach alg helper to the connection
    in a later stage.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath: Fix log message in ovs conntrack
Yi-Hung Wei [Tue, 15 Oct 2019 17:27:52 +0000 (10:27 -0700)]
datapath: Fix log message in ovs conntrack

Upstream commit:
commit 12c6bc38f99bb168b7f16bdb5e855a51a23ee9ec
Author: Yi-Hung Wei <yihung.wei@gmail.com>
Date:   Wed Aug 21 17:16:10 2019 -0700

    openvswitch: Fix log message in ovs conntrack

Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct action")
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath: Replace removed NF_NAT_NEEDED with IS_ENABLED(CONFIG_NF_NAT)
Yi-Hung Wei [Tue, 15 Oct 2019 17:27:51 +0000 (10:27 -0700)]
datapath: Replace removed NF_NAT_NEEDED with IS_ENABLED(CONFIG_NF_NAT)

Backports the following upstream commit with some backward compatibility
change.

commit f319ca6557c10a711facc4dd60197470796d3ec1
Author: Geert Uytterhoeven <geert@linux-m68k.org>
Date:   Wed May 8 08:52:32 2019 +0200

    openvswitch: Replace removed NF_NAT_NEEDED with IS_ENABLED(CONFIG_NF_NAT)

    Commit 4806e975729f99c7 ("netfilter: replace NF_NAT_NEEDED with
    IS_ENABLED(CONFIG_NF_NAT)") removed CONFIG_NF_NAT_NEEDED, but a new user
    popped up afterwards.

Fixes: fec9c271b8f1bde1 ("openvswitch: load and reference the NAT helper.")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath: Check for null pointer return from nla_nest_start_noflag
Colin Ian King [Tue, 15 Oct 2019 17:27:50 +0000 (10:27 -0700)]
datapath: Check for null pointer return from nla_nest_start_noflag

upstream commit:

commit ca96534630e2edfd73121c487c957b17eca3b7d7
Author: Colin Ian King <colin.king@canonical.com>
Date:   Wed May 1 14:41:58 2019 +0100

    openvswitch: check for null pointer return from nla_nest_start_noflag

    The call to nla_nest_start_noflag can return null in the unlikely
    event that nla_put returns -EMSGSIZE.  Check for this condition to
    avoid a null pointer dereference on pointer nla_reply.

    Addresses-Coverity: ("Dereference null return value")
Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath: Load and reference the NAT helper.
Yi-Hung Wei [Tue, 15 Oct 2019 17:27:49 +0000 (10:27 -0700)]
datapath: Load and reference the NAT helper.

This commit backports the following upstream commit, and two functions
in nf_conntrack_helper.h.

Upstream commit:
commit fec9c271b8f1bde1086be5aa415cdb586e0dc800
Author: Flavio Leitner <fbl@redhat.com>
Date:   Wed Apr 17 11:46:17 2019 -0300

    openvswitch: load and reference the NAT helper.

    This improves the original commit 17c357efe5ec ("openvswitch: load
    NAT helper") where it unconditionally tries to load the module for
    every flow using NAT, so not efficient when loading multiple flows.
    It also doesn't hold any references to the NAT module while the
    flow is active.

    This change fixes those problems. It will try to load the module
    only if it's not present. It grabs a reference to the NAT module
    and holds it while the flow is active. Finally, an error message
    shows up if either actions above fails.

Fixes: 17c357efe5ec ("openvswitch: load NAT helper")
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath: genetlink: optionally validate strictly/dumps
Yi-Hung Wei [Tue, 15 Oct 2019 17:27:48 +0000 (10:27 -0700)]
datapath: genetlink: optionally validate strictly/dumps

This patch backports the following upstream commit within the
openvswitch kernel module with some checks so that it also works
in the older kernel.

Upstream commit:
commit ef6243acb4782df587a4d7d6c310fa5b5d82684b
Author: Johannes Berg <johannes.berg@intel.com>
Date:   Fri Apr 26 14:07:31 2019 +0200

    genetlink: optionally validate strictly/dumps

    Add options to strictly validate messages and dump messages,
    sometimes perhaps validating dump messages non-strictly may
    be required, so add an option for that as well.

    Since none of this can really be applied to existing commands,
    set the options everwhere using the following spatch:

        @@
        identifier ops;
        expression X;
        @@
        struct genl_ops ops[] = {
        ...,
         {
                .cmd = X,
        +       .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
                ...
         },
        ...
        };

    For new commands one should just not copy the .validate 'opt-out'
    flags and thus get strict validation.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath: Use nla_nest_start_noflag()
Yi-Hung Wei [Tue, 15 Oct 2019 17:27:47 +0000 (10:27 -0700)]
datapath: Use nla_nest_start_noflag()

This patch backports the openvswitch changes and update the compat layer
for the following upstream patch.

commit ae0be8de9a53cda3505865c11826d8ff0640237c
Author: Michal Kubecek <mkubecek@suse.cz>
Date:   Fri Apr 26 11:13:06 2019 +0200

    netlink: make nla_nest_start() add NLA_F_NESTED flag

    Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
    netlink based interfaces (including recently added ones) are still not
    setting it in kernel generated messages. Without the flag, message parsers
    not aware of attribute semantics (e.g. wireshark dissector or libmnl's
    mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
    the structure of their contents.

    Unfortunately we cannot just add the flag everywhere as there may be
    userspace applications which check nlattr::nla_type directly rather than
    through a helper masking out the flags. Therefore the patch renames
    nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
    as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
    are rewritten to use nla_nest_start().

    Except for changes in include/net/netlink.h, the patch was generated using
    this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath: Handle NF_NAT_NEEDED replacement
Yi-Hung Wei [Tue, 15 Oct 2019 17:27:46 +0000 (10:27 -0700)]
datapath: Handle NF_NAT_NEEDED replacement

Starting from the following upstream commit, NF_NAT_NEEDED is replaced
by IS_ENABLED(CONFIG_NF_NAT) in the upstream kernel. This patch makes
some changes so that our in tree ovs kernel module is compatible to
both old and new kernels.

Upstream commit:
commit 4806e975729f99c7908d1688a143f1e16d464e6c
Author: Florian Westphal <fw@strlen.de>
Date:   Wed Mar 27 09:22:26 2019 +0100

    netfilter: replace NF_NAT_NEEDED with IS_ENABLED(CONFIG_NF_NAT)

    NF_NAT_NEEDED is true whenever nat support for either ipv4 or ipv6 is
    enabled.  Now that the af-specific nat configuration switches have been
    removed, IS_ENABLED(CONFIG_NF_NAT) has the same effect.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath: add seqadj extension when NAT is used.
Flavio Leitner [Tue, 15 Oct 2019 17:27:45 +0000 (10:27 -0700)]
datapath: add seqadj extension when NAT is used.

upstream patch:

commit fa7e428c6b7ed3281610511a2b2ec716d9894be8
Author: Flavio Leitner <fbl@sysclose.org>
Date:   Mon Mar 25 15:58:31 2019 -0300

    openvswitch: add seqadj extension when NAT is used.

    When the conntrack is initialized, there is no helper attached
    yet so the nat info initialization (nf_nat_setup_info) skips
    adding the seqadj ext.

    A helper is attached later when the conntrack is not confirmed
    but is going to be committed. In this case, if NAT is needed then
    adds the seqadj ext as well.

Fixes: 16ec3d4fbb96 ("openvswitch: Fix cached ct with helper.")
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath: Detect upstream nf_nat change
Yi-Hung Wei [Tue, 15 Oct 2019 17:27:44 +0000 (10:27 -0700)]
datapath: Detect upstream nf_nat change

The following two upstream commits merge nf_nat_ipv4 and nf_nat_ipv6
into nf_nat core, and move some header files around.  To handle
these modifications, this patch detects the upstream changes, uses
the header files and config symbols properly.

Ideally, we should replace CONFIG_NF_NAT_IPV4 and CONFIG_NF_NAT_IPV6 with
CONFIG_NF_NAT and CONFIG_IPV6.  In order to keep backward compatibility,
we keep the checking of CONFIG_NF_NAT_IPV4/6 as is for the old kernel,
and replace them with marco for the new kernel.

upstream commits:
3bf195ae6037 ("netfilter: nat: merge nf_nat_ipv4,6 into nat core")
d2c5c103b133 ("netfilter: nat: remove nf_nat_l3proto.h and nf_nat_core.h")

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath: Replace nf_ct_invert_tuplepr() with nf_ct_invert_tuple()
Yi-Hung Wei [Tue, 15 Oct 2019 17:27:43 +0000 (10:27 -0700)]
datapath: Replace nf_ct_invert_tuplepr() with nf_ct_invert_tuple()

After upstream net-next commit 303e0c558959 ("netfilter: conntrack:
avoid unneeded nf_conntrack_l4proto lookups") nf_ct_invert_tuplepr()
is no longer available in the kernel.

Ideally, we should be in sync with upstream kernel by calling
nf_ct_invert_tuple() directly in conntrack.c.  However,
nf_ct_invert_tuple() has different function signature in older kernel,
and it would be hard to replace that in the compat layer. Thus, we
use rpl_nf_ct_invert_tuple() in conntrack.c and maintain compatibility
in the compat layer so that ovs kernel module runs smoothly in both
new and old kernel.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath: Fix linking without CONFIG_NF_CONNTRACK_LABELS
Arnd Bergmann [Tue, 15 Oct 2019 17:27:42 +0000 (10:27 -0700)]
datapath: Fix linking without CONFIG_NF_CONNTRACK_LABELS

upstream commit:
commit a277d516de5f498c91d91189717ef7e01102ad27
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Fri Nov 2 16:36:55 2018 +0100

    openvswitch: fix linking without CONFIG_NF_CONNTRACK_LABELS

    When CONFIG_CC_OPTIMIZE_FOR_DEBUGGING is enabled, the compiler
    fails to optimize out a dead code path, which leads to a link failure:

    net/openvswitch/conntrack.o: In function `ovs_ct_set_labels':
    conntrack.c:(.text+0x2e60): undefined reference to `nf_connlabels_replace'

    In this configuration, we can take a shortcut, and completely
    remove the contrack label code. This may also help the regular
    optimization.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath: compat: drop bridge nf reset from nf_reset
Greg Rose [Wed, 9 Oct 2019 21:22:14 +0000 (14:22 -0700)]
datapath: compat: drop bridge nf reset from nf_reset

Upstream commmit:
    commit 895b5c9f206eb7d25dc1360a8ccfc5958895eb89
    Author: Florian Westphal <fw@strlen.de>
    Date:   Sun Sep 29 20:54:03 2019 +0200

    netfilter: drop bridge nf reset from nf_reset

    commit 174e23810cd31
    ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
    recycle always drop skb extensions.  The additional skb_ext_del() that is
    performed via nf_reset on napi skb recycle is not needed anymore.

    Most nf_reset() calls in the stack are there so queued skb won't block
    'rmmod nf_conntrack' indefinitely.

    This removes the skb_ext_del from nf_reset, and renames it to a more
    fitting nf_reset_ct().

    In a few selected places, add a call to skb_ext_reset to make sure that
    no active extensions remain.

    I am submitting this for "net", because we're still early in the release
    cycle.  The patch applies to net-next too, but I think the rename causes
    needless divergence between those trees.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Added some compat layer fixups for nf_reset_ct.  This is just a portion
of the upstream commit that applies to openvswitch.

Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agodatapath: rename flow_stats to sw_flow_stats
Pablo Neira Ayuso [Wed, 9 Oct 2019 21:22:13 +0000 (14:22 -0700)]
datapath: rename flow_stats to sw_flow_stats

Upstream commit:
    commit aef833c58d321f09ae4ce4467723542842ba9faf
    Author: Pablo Neira Ayuso <pablo@netfilter.org>
    Date:   Fri Jul 19 18:20:13 2019 +0200

    net: openvswitch: rename flow_stats to sw_flow_stats

    There is a flow_stats structure defined in include/net/flow_offload.h
    and a follow up patch adds #include <net/flow_offload.h> to
    net/sch_generic.h.

    This breaks compilation since OVS codebase includes net/sock.h which
    pulls in linux/filter.h which includes net/sch_generic.h.

    In file included from ./include/net/sch_generic.h:18:0,
                     from ./include/linux/filter.h:25,
                     from ./include/net/sock.h:59,
                     from ./include/linux/tcp.h:19,
                     from net/openvswitch/datapath.c:24

    This definition takes precedence on OVS since it is placed in the
    networking core, so rename flow_stats in OVS to sw_flow_stats since
    this structure is contained in sw_flow.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agocompat: remove the incorrect mtu limit for erspan
Haishuang Yan [Wed, 9 Oct 2019 21:22:12 +0000 (14:22 -0700)]
compat: remove the incorrect mtu limit for erspan

Upstream commit:
    commit 0e141f757b2c78c983df893e9993313e2dc21e38
    Author: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
    Date:   Fri Sep 27 14:58:20 2019 +0800

    erspan: remove the incorrect mtu limit for erspan

    erspan driver calls ether_setup(), after commit 61e84623ace3
    ("net: centralize net_device min/max MTU checking"), the range
    of mtu is [min_mtu, max_mtu], which is [68, 1500] by default.

    It causes the dev mtu of the erspan device to not be greater
    than 1500, this limit value is not correct for ipgre tap device.

    Tested:
    Before patch:
    # ip link set erspan0 mtu 1600
    Error: mtu greater than device maximum.
    After patch:
    # ip link set erspan0 mtu 1600
    # ip -d link show erspan0
    21: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1600 qdisc noop state DOWN
    mode DEFAULT group default qlen 1000
        link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 0

Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agodatapath: change type of UPCALL_PID attribute to NLA_UNSPEC
Li RongQing [Wed, 9 Oct 2019 21:22:11 +0000 (14:22 -0700)]
datapath: change type of UPCALL_PID attribute to NLA_UNSPEC

Upstream commit:
    commit ea8564c865299815095bebeb4b25bef474218e4c
    Author: Li RongQing <lirongqing@baidu.com>
    Date:   Tue Sep 24 19:11:52 2019 +0800

    openvswitch: change type of UPCALL_PID attribute to NLA_UNSPEC

    userspace openvswitch patch "(dpif-linux: Implement the API
    functions to allow multiple handler threads read upcall)"
    changes its type from U32 to UNSPEC, but leave the kernel
    unchanged

    and after kernel 6e237d099fac "(netlink: Relax attr validation
    for fixed length types)", this bug is exposed by the below
    warning

     [   57.215841] netlink: 'ovs-vswitchd': attribute type 5 has an invalid length.

Fixes: 5cd667b0a456 ("openvswitch: Allow each vport to have an array of 'port_id's")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: beb1c69a3 ("datapath: Allow each vport to have an array of 'port_id's.")
Cc: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agodatapath: hide clang frame-overflow warnings
Arnd Bergmann [Wed, 9 Oct 2019 21:22:10 +0000 (14:22 -0700)]
datapath: hide clang frame-overflow warnings

Upstream commit:
    commit 260637903f47f20c5918bb5c1eea52b2a28ea863
    Author: Arnd Bergmann <arnd@arndb.de>
    Date:   Mon Jul 22 17:00:01 2019 +0200

    ovs: datapath: hide clang frame-overflow warnings

    Some functions in the datapath code are factored out so that each
    one has a stack frame smaller than 1024 bytes with gcc. However,
    when compiling with clang, the functions are inlined more aggressively
    and combined again so we get

    net/openvswitch/datapath.c:1124:12: error: stack frame size of 1528 bytes in function 'ovs_flow_cmd_set' [-Werror,-Wframe-larger-than=]

    Marking both get_flow_actions() and ovs_nla_init_match_and_action()
    as 'noinline_for_stack' gives us the same behavior that we see with
    gcc, and no warning. Note that this does not mean we actually use
    less stack, as the functions call each other, and we still get
    three copies of the large 'struct sw_flow_key' type on the stack.

    The comment tells us that this was previously considered safe,
    presumably since the netlink parsing functions are called with
    a known backchain that does not also use a lot of stack space.

Fixes: 9cc9a5cb176c ("datapath: Avoid using stack larger than 1024.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agotc: Limit the max action number to 16
Chris Mi [Wed, 16 Oct 2019 08:37:14 +0000 (11:37 +0300)]
tc: Limit the max action number to 16

Currently, ovs supports to offload max TCA_ACT_MAX_PRIO(32) actions.
But net sched api has a limit of 4K message size which is not enough
for 32 actions when echo flag is set.

After a lot of testing, we find that 16 actions is a reasonable number.
So in this commit, we introduced a new define to limit the max actions.

Fixes: 0c70132cd288("tc: Make the actions order consistent")
Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agocompat: Fix small naming issue
Greg Rose [Wed, 16 Oct 2019 20:21:11 +0000 (13:21 -0700)]
compat: Fix small naming issue

In commit 057772cf2477 the function is missing the rpl_ prefix
and the define that replaces the original function should come
after the function definition.

Fixes: 057772cf2477 ("compat: Backport nf_ct_tmpl_alloc().")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoconfigure: Properly handle case where libunwind.h is not available.
Yi-Hung Wei [Thu, 17 Oct 2019 04:35:55 +0000 (21:35 -0700)]
configure: Properly handle case where libunwind.h is not available.

It is possible that user install libunwind but not libunwind-devel,
and it will run into a compilation error.  So we need to check the
existence of the library and the header file.

Fixes: e2ed6fbeb18c ("fatal-signal: Catch SIGSEGV and print backtrace.")
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agotests: Get rid of timeout options for control utilities.
Ilya Maximets [Thu, 10 Oct 2019 14:23:57 +0000 (16:23 +0200)]
tests: Get rid of timeout options for control utilities.

'OVS_CTL_TIMEOUT' environment variable is exported in tests/atlocal.in
and controls timeouts for all OVS utilities in testsuite.

There should be no manual tweaks for each single command.

This helps with running tests under valgrind where commands could
take really long time as you only need to change 'OVS_CTL_TIMEOUT'
in a single place.

Few manual timeouts were left in places where they make sense.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agoAUTHORS: Add Alessandro Pilotti.
Alin Gabriel Serdean [Tue, 15 Oct 2019 10:15:02 +0000 (13:15 +0300)]
AUTHORS: Add Alessandro Pilotti.

Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agoovs container build.sh requires python3
Aliasgar Ginwala [Fri, 11 Oct 2019 21:56:46 +0000 (14:56 -0700)]
ovs container build.sh requires python3

Building ovn/ovs container breaks while configure:
checking for Python 3 (version 3.4 or later)... no
configure: error: Python 3.4 or later is required but not found in
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin,
please install it or set  to point to it

As per ovs commit 1ca0323e7c29dc7ef5a615c265df0460208f92de
Require Python 3 and remove support for Python 2.

Signed-off-by: Aliasgar Ginwala <aginwala@ebay.com>
Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovs-vlan-bug-workaround: Remove.
Ben Pfaff [Thu, 10 Oct 2019 19:07:27 +0000 (12:07 -0700)]
ovs-vlan-bug-workaround: Remove.

This workaround only applied to kernels earlier than 2.6.37, but OVS
only supports 3.10 and later.

As the original author of this code, I won't miss it.

Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodpif-netlink: Fix some variable naming.
Ben Pfaff [Mon, 14 Oct 2019 18:10:47 +0000 (11:10 -0700)]
dpif-netlink: Fix some variable naming.

Usually a plural name refers to an array, but 'socks' and 'socksp' were
only single objects, so this changes their names to 'sock' and 'sockp'.

Usually a 'p' suffix means that a variable is an output argument, but
that was only true in one place here, so this changes the names of the
other variables to plain 'sock'.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
4 years agolib: packets: export compose_ipv6 routine to OVN
Lorenzo Bianconi [Fri, 11 Oct 2019 10:23:12 +0000 (12:23 +0200)]
lib: packets: export compose_ipv6 routine to OVN

Remove static qualifier from compose_ipv6 definition and export it to
OVN. compose_ipv6 will be used in order to add IPv6 prefix delegation
support to OVN

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodpif-netlink: Free leaked nl_sock
Yifeng Sun [Fri, 11 Oct 2019 22:50:47 +0000 (15:50 -0700)]
dpif-netlink: Free leaked nl_sock

Valgrind reports:
20 bytes in 1 blocks are definitely lost in loss record 94 of 353
    by 0x532594: xmalloc (util.c:138)
    by 0x553EAD: nl_sock_create (netlink-socket.c:146)
    by 0x54331D: create_nl_sock (dpif-netlink.c:255)
    by 0x54331D: dpif_netlink_port_add__ (dpif-netlink.c:756)
    by 0x5435F6: dpif_netlink_port_add_compat (dpif-netlink.c:876)
    by 0x5435F6: dpif_netlink_port_add (dpif-netlink.c:922)
    by 0x47EC1D: dpif_port_add (dpif.c:584)
    by 0x42B35F: port_add (ofproto-dpif.c:3721)
    by 0x41E64A: ofproto_port_add (ofproto.c:2032)
    by 0x40B3FE: iface_do_create (bridge.c:1817)
    by 0x40B3FE: iface_create (bridge.c:1855)
    by 0x40B3FE: bridge_add_ports__ (bridge.c:943)
    by 0x40D14A: bridge_add_ports (bridge.c:959)
    by 0x40D14A: bridge_reconfigure (bridge.c:673)
    by 0x410D75: bridge_run (bridge.c:3050)
    by 0x407614: main (ovs-vswitchd.c:127)

This leak is because when vport_add_channel() returns 0, it is expected
to take the ownership of 'socksp'. This patch fixes this issue.

Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovsdb-server: Don't drop all connections on read/write status change.
Numan Siddique [Mon, 14 Oct 2019 15:20:02 +0000 (20:50 +0530)]
ovsdb-server: Don't drop all connections on read/write status change.

The commit [1] force drops all connections when the db read/write status changes.
Prior to the commit [1], when there was read/write status change, the existing
jsonrpc sessions with 'db_change_aware' set to true, were not updated with the
changed 'read_only' value. If the db status was changed to 'standby', the existing
clients could still write to the db.

In the case of pacemaker OVN HA, OVN OCF script 'start' action starts the
ovsdb-servers in read-only state and later, it sets to read-write in the
'promote' action. We have observed that if some ovn-controllers connect to
the SB ovsdb-server (in read-only state) just before the 'promote' action,
the connection is not reset all the times and these ovn-controllers remain connected
to the SB ovsdb-server in read-only state all the time. Even though
the commit [1] calls 'ovsdb_jsonrpc_server_reconnect()' with 'forced' flag
set to true when the db read/write status changes, somehow the FSM misses resetting
the connections of these ovn-controllers.

I think this needs to be addressed in the FSM. This patch doesn't address
this FSM issue. Instead it changes the behavior of 'ovsdb_jsonrpc_server_set_read_only()'
by setting the 'read_only' flag of all the jsonrpc sessions instead of forcefully
resetting the connection.

I think there is no need to reset the connection. In large scale production
deployements with OVN, this results in unnecessary waste of CPU cycles as ovn-controllers
will have to connect twice - once during 'start' action and again during 'promote'.

[1] - 2a9679e3b2c6("ovsdb-server: drop all connections on read/write status change")

Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agonetdev-afxdp: Update memory locking limits unconditionally.
Ilya Maximets [Wed, 9 Oct 2019 14:23:31 +0000 (16:23 +0200)]
netdev-afxdp: Update memory locking limits unconditionally.

Any type of AF_XDP socket in all modes implies creation of BPF map of
type BPF_MAP_TYPE_XSKMAP.  This leads to BPF_MAP_CREATE syscall and
subsequently 'xsk_map_alloc()' function that will charge required
memory from the memlock limit and fail with EPERM if we're trying
to allocate more.

On my system with 64K bytes of max locked memory by default, OVS
frequently starts to fail after addition of 3rd afxdp port in SKB
mode:

  netdev_afxdp|ERR|xsk_socket__create failed (Operation not permitted)
                   mode: SKB qid: 0

Fixes: 0de1b425962d ("netdev-afxdp: add new netdev type for AF_XDP.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agonetdev-afxdp: Fix umem creation failure due to uninitialized config.
Ilya Maximets [Wed, 9 Oct 2019 14:17:58 +0000 (16:17 +0200)]
netdev-afxdp: Fix umem creation failure due to uninitialized config.

Later version of 'struct xsk_umem_config' contains additional field
'flags'.  OVS doesn't use that field passing uninitialized stack
memory to the 'xsk_umem__create()' call that could fail with
'Invalid argument' if 'flags' are non-zero or, even worse, create
umem with unexpected properties.

We need to clear the whole structure explicitly to avoid this kind
of issues.

Fixes: 0de1b425962d ("netdev-afxdp: add new netdev type for AF_XDP.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agocirrus: Use latest stable FreeBSD images.
Ilya Maximets [Tue, 1 Oct 2019 11:28:45 +0000 (14:28 +0300)]
cirrus: Use latest stable FreeBSD images.

CirrusCI recently introduced [1] new feature to use image families
instead of bare image names for gCloud based instances.
This allows us to use most recent stable builds. All the stable builds
are in the same image family in gCloud and it will run instances using
the most recent one.
This also allows us to simply use 11.3 image instead of 11.2.  There
was no such ability previously, because base freebsd-11-3-release-amd64
image has issues[2] that doesn't allow CirrusCI to use it.  However,
later stable 11.3 images from freebsd-11-3-snap family works fine.

[1] https://github.com/cirruslabs/cirrus-ci-docs/issues/422
[2] https://github.com/cirruslabs/cirrus-ci-docs/issues/359

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
4 years agoflow: Fix using pointer to member of packed struct icmp6_hdr.
Ilya Maximets [Tue, 1 Oct 2019 15:18:23 +0000 (18:18 +0300)]
flow: Fix using pointer to member of packed struct icmp6_hdr.

OVS has no structure definition for ICMPv6 header with additional
data. More precisely, it has, but this structure named as
'icmp6_error_header' and only suitable to store error related
extended information.  'flow_compose_l4' stores additional
information in reserved bits by using system defined structure
'icmp6_hdr', which is marked as 'packed' and this leads to
build failure with gcc >= 9:

  lib/flow.c:3041:34: error:
    taking address of packed member of 'struct icmp6_hdr' may result
    in an unaligned pointer value [-Werror=address-of-packed-member]

        uint32_t *reserved = &icmp->icmp6_dataun.icmp6_un_data32[0];
                             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix that by renaming 'icmp6_error_header' to 'icmp6_data_header'
and allowing it to store not only errors, but any type of additional
information by analogue with 'struct icmp6_hdr'.
All the usages of 'struct icmp6_hdr' replaced with this new structure.
Removed redundant conversions between network and host representations.
Now fields are always in be.

This also, probably, makes flow_compose_l4 more robust by avoiding
possible unaligned accesses to 32 bit value.

Fixes: 9b2b84973db7 ("Support for match & set ICMPv6 reserved and options type fields")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agoappveyor: Update OpenSSL link and python3 to path
Alin Gabriel Serdean [Wed, 9 Oct 2019 14:10:08 +0000 (17:10 +0300)]
appveyor: Update OpenSSL link and python3 to path

This patch fixes the appveyor build by adding the python version 3 to path
as per:
https://www.appveyor.com/docs/windows-images-software/#python

We also create a hardlink for python3 in the same directory to ease up scripts
which checks for its existence.

This patch also bumps the OpenSSL version from 1.0.2n to 1.0.2t.

Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agotests: Allow valgrind check for afxdp testsuite.
Ilya Maximets [Wed, 9 Oct 2019 14:36:49 +0000 (16:36 +0200)]
tests: Allow valgrind check for afxdp testsuite.

New 'make' target 'check-afxdp-valgrind'.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoFix "the the" typo in two places.
Ben Pfaff [Wed, 9 Oct 2019 21:30:48 +0000 (14:30 -0700)]
Fix "the the" typo in two places.

Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoofproto: Fix for frequent invalidation of mega flows for push actions
Vishal Deep Ajmera [Wed, 9 Oct 2019 08:36:30 +0000 (14:06 +0530)]
ofproto: Fix for frequent invalidation of mega flows for push actions

When a packet is processed by the slow path and the matching OpenFlow
rule has actions like push_mpls/set_field and push_vlan/set_field, the
ofproto layer un-wildcards the MPLS and VLAN match fields in the megaflow
entry that it plans to install. However, when the megaflow entry is
actually installed, all protocol match fields that are not present in the
packet are wildcarded. Thus, the wildcard bits in the installed megaflow
entry could be different from the bits originally generated by the ofproto
layer.

When the revalidator thread validates a megaflow, it will first query the
ofproto layer to get the wildcard bits and then compare it against the
wildcard bits in the megaflow. If the bits are different the entry will be
removed.  A subsequent packet will again result in the same megaflow entry
being installed only for it to be removed by the revalidator thread. This
cycle will continue and will significantly degrade performance.

This patch fixes the issue by wildcarding flow fields which are not present
in the incoming packet.

Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoAvoid clobbered variable warning on ppc64le.
David Wilder [Tue, 8 Oct 2019 19:40:19 +0000 (12:40 -0700)]
Avoid clobbered variable warning on ppc64le.

Since commit e2ed6fbeb1, Ci on ppc64le with Ubuntu 16.04.6 LTS throws
this error:

lib/fatal-signal.c: In function 'send_backtrace_to_monitor':
lib/fatal-signal.c:168:9: error: variable 'dep' might be clobbered by
'longjmp' or 'vfork' [-Werror=clobbered]
     int dep;

Declaring dep as a volatile int.

Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovsdb-client: fix memory leak while executing database backup
Damijan Skvarc [Wed, 9 Oct 2019 06:34:24 +0000 (08:34 +0200)]
ovsdb-client: fix memory leak while executing database backup

valgrind detects this leak while running functional test "ovsdb-client backup and restore"

==25401== 1,068 (240 direct, 828 indirect) bytes in 6 blocks are definitely lost in loss record 22 of 22
==25401==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==25401==    by 0x449DA4: xmalloc (util.c:138)
==25401==    by 0x43012D: json_create (json.c:1451)
==25401==    by 0x43012D: json_array_create_empty (json.c:186)
==25401==    by 0x43012D: json_parser_push_array (json.c:1279)
==25401==    by 0x4303CF: json_parser_input (json.c:1407)
==25401==    by 0x4312F1: json_lex_input (json.c:991)
==25401==    by 0x43193B: json_parser_feed (json.c:1149)
==25401==    by 0x4329FA: jsonrpc_recv.part.7 (jsonrpc.c:332)
==25401==    by 0x432D3B: jsonrpc_recv (jsonrpc.c:297)
==25401==    by 0x432D3B: jsonrpc_recv_block (jsonrpc.c:402)
==25401==    by 0x4330EB: jsonrpc_transact_block (jsonrpc.c:436)
==25401==    by 0x409246: do_backup (ovsdb-client.c:2008)
==25401==    by 0x405F76: main (ovsdb-client.c:282)

the problem was in db_backup() function, where _uuid json node was detached from
its parent "row" json node, but never destroyed afterwards.

Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agonetdev-offload: replace netdev_hmap_mutex to netdev_hmap_rwlock
wenxu [Wed, 9 Oct 2019 07:01:00 +0000 (15:01 +0800)]
netdev-offload: replace netdev_hmap_mutex to netdev_hmap_rwlock

All the kmap lookup operations netdev_ports_flow_del, netdev_ports_get
netdev_ifindex_to_odp_port should protected by rdlock without
affect each other in the handlers and revalidators

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoacinclude: Fix false positive search for prandom_u32
Greg Rose [Tue, 8 Oct 2019 16:21:15 +0000 (09:21 -0700)]
acinclude: Fix false positive search for prandom_u32

Searching random.h for prandom_u32 will also match when prandom_u32_max
is present and cause a false positive HAVE_PRANDOM_U32.  Fix this up
by looking for the parenthesis following prandom_u32 so it won't
match on prandom_u32_max.

Passes Travis:
https://travis-ci.org/gvrose8192/ovs-experimental/builds/595171808

Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovsdb-tool: fix memory leak while converting cluster into standalone database
Damijan Skvarc [Mon, 7 Oct 2019 08:10:34 +0000 (10:10 +0200)]
ovsdb-tool: fix memory leak while converting cluster into standalone database

memory leak is reported by valgrind while executing functional test
"ovsdb-tool convert-to-standalone"

==13842== 2,850 (280 direct, 2,570 indirect) bytes in 7 blocks are definitely lost in loss record 20 of 20
==13842==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13842==    by 0x45EE2E: xmalloc (util.c:138)
==13842==    by 0x43E386: json_create (json.c:1451)
==13842==    by 0x43BDD2: json_object_create (json.c:254)
==13842==    by 0x43DEE3: json_parser_push_object (json.c:1273)
==13842==    by 0x43E167: json_parser_input (json.c:1371)
==13842==    by 0x43D6EA: json_lex_input (json.c:991)
==13842==    by 0x43DAC1: json_parser_feed (json.c:1149)
==13842==    by 0x40D108: parse_body (log.c:411)
==13842==    by 0x40D386: ovsdb_log_read (log.c:476)
==13842==    by 0x406A0B: do_convert_to_standalone (ovsdb-tool.c:1571)
==13842==    by 0x406A0B: do_cluster_standalone (ovsdb-tool.c:1606)
==13842==    by 0x438670: ovs_cmdl_run_command__ (command-line.c:223)
==13842==    by 0x438720: ovs_cmdl_run_command (command-line.c:254)
==13842==    by 0x405A4C: main (ovsdb-tool.c:79)

The problem was in do_convert_to_standalone() function which while reading log file
allocate json object which was not deallocated at the end.

Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovsdb-tool: fix memory leak while running "db-is-standalone" command
Damijan Skvarc [Mon, 7 Oct 2019 08:30:07 +0000 (10:30 +0200)]
ovsdb-tool: fix memory leak while running "db-is-standalone" command

problem is reported by valgrind while running functional tests:

==21043== 160 (88 direct, 72 indirect) bytes in 1 blocks are definitely lost in loss record 7 of 8
==21043==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21043==    by 0x45EE2E: xmalloc (util.c:138)
==21043==    by 0x40CB81: ovsdb_log_open (log.c:270)
==21043==    by 0x406B4F: do_db_has_magic.isra.9 (ovsdb-tool.c:563)
==21043==    by 0x438670: ovs_cmdl_run_command__ (command-line.c:223)
==21043==    by 0x438720: ovs_cmdl_run_command (command-line.c:254)
==21043==    by 0x405A4C: main (ovsdb-tool.c:79)

problem was in do_db_has_magic() which opens log file which is never closed.

Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoofproto: fix a typo for ttl in dpif_sflow_actions
Martin Zhang [Mon, 7 Oct 2019 04:34:55 +0000 (00:34 -0400)]
ofproto: fix a typo for ttl in dpif_sflow_actions

Signed-off-by: Martin Zhang <martinbj2008@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agotravis: Fix 32-bit libunwind system build.
William Tu [Fri, 4 Oct 2019 21:21:15 +0000 (14:21 -0700)]
travis: Fix 32-bit libunwind system build.

32-bit and 64-bit libunwind can not be installed at the same time.
For 32-bit build, this patch removes the 64-bit libunwind and install
32-bit version.

Fixes: e2ed6fbeb18c ("fatal-signal: Catch SIGSEGV and print backtrace.")
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agobacktrace: Fix 32-bit libunwind build.
William Tu [Fri, 4 Oct 2019 21:21:16 +0000 (14:21 -0700)]
backtrace: Fix 32-bit libunwind build.

The libunwind unw_word_t type is defined as uint32_t for 32-bit
system and uint64_t for 64-bit system.  The patch fixes the
compile error using PRIxPTR to print this value.

Fixes: e2ed6fbeb18c ("fatal-signal: Catch SIGSEGV and print backtrace.")
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agodoc: Added OVS Extensions document
Ashish Varma [Wed, 2 Oct 2019 17:23:05 +0000 (10:23 -0700)]
doc: Added OVS Extensions document

OVS supports OVS Extensions as various vendor messages or as vendor
types in stats or multipart messages. Added a document to describe the
extensions as currently supported by OVS.

Signed-off-by: Ashish Varma <ashishvarma.ovs@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>