git.proxmox.com Git - ovs.git/log
ovs.git
7 years ago dpif-netdev: Assign ports to pmds on non-local numa node.
Billy O'Mahony [Tue, 1 Aug 2017 21:38:43 +0000 (14:38 -0700)]
dpif-netdev: Assign ports to pmds on non-local numa node.

Previously, if there was no available (non-isolated) pmd on the numa node
for a port, the port was not polled at all. This could result in a
non-operational system until such time as nics were physically
repositioned. It is preferable to operate with a pmd on the 'wrong' numa
node, albeit with lower performance. Local pmds are still chosen when
available.

Signed-off-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Co-authored-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
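The fallback described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the dpif-netdev code: 'struct pmd_sketch' and 'pick_pmd()' are hypothetical names, and choosing the least-loaded pmd is an assumption made only to keep the example small.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a pmd thread; not the OVS struct. */
struct pmd_sketch {
    int core_id;
    int numa_id;
    bool isolated;   /* Isolated pmds are never picked automatically. */
    int n_rxq;       /* Queues already assigned to this pmd. */
};

/* Returns the least-loaded non-isolated pmd on 'port_numa'.  If that node
 * has none, falls back to the least-loaded non-isolated pmd on any other
 * node.  Returns NULL only if no non-isolated pmd exists at all. */
static struct pmd_sketch *
pick_pmd(struct pmd_sketch *pmds, size_t n, int port_numa)
{
    struct pmd_sketch *local = NULL;
    struct pmd_sketch *remote = NULL;

    for (size_t i = 0; i < n; i++) {
        struct pmd_sketch *p = &pmds[i];
        struct pmd_sketch **best;

        if (p->isolated) {
            continue;
        }
        best = p->numa_id == port_numa ? &local : &remote;
        if (!*best || p->n_rxq < (*best)->n_rxq) {
            *best = p;
        }
    }
    /* A local pmd always wins when one is available. */
    return local ? local : remote;
}
```

With this shape, a port whose numa node has only isolated pmds is still polled, just by a pmd on another node.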
7 years ago dpif-netdev: Don't uninit emc on reload.
Ilya Maximets [Tue, 1 Aug 2017 21:22:17 +0000 (14:22 -0700)]
dpif-netdev: Don't uninit emc on reload.

There are many reasons for reloading pmd threads:
* Reconfiguration of one of the ports.
* Adjustment of static_tx_qid.
* Addition of new tx/rx ports.

In many cases EMC is still useful after reload and uninit
will only lead to unnecessary upcalls/classifier lookups.

Such behaviour slows down the datapath. Uninit itself slows
down the reload path. All these factors lead to additional
unexpected latencies/drops on events not directly connected
to the current PMD thread.

Let's not uninitialize the emc cache on the reload path.
'emc_cache_slow_sweep()' and replacements should free all
the old/unwanted entries.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Tested-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
7 years ago dpif-netdev: Incremental addition/deletion of PMD threads.
Ilya Maximets [Tue, 1 Aug 2017 20:46:33 +0000 (13:46 -0700)]
dpif-netdev: Incremental addition/deletion of PMD threads.

Currently, changing 'pmd-cpu-mask' is a very heavy operation.
It requires destroying all the PMD threads and creating
them again. After that, all the threads sleep until the
redistribution of ports has finished.

This patch adds the ability to adjust the number/placement of
PMD threads without stopping the datapath. All unaffected
threads keep forwarding traffic without any additional latency.

An id-pool is created for static tx queue ids to keep them sequential
in a flexible way. The non-PMD thread will always have
static_tx_qid = 0, as before.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
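One way to picture how a pool keeps static tx queue ids sequential while threads come and go is the toy pool below. It is a sketch under assumptions: 'tiny_id_pool' and its functions are hypothetical, and the lowest-free-id policy is assumed for illustration rather than taken from the patch.

```c
#include <stdbool.h>

#define MAX_IDS 64   /* Arbitrary capacity for this sketch. */

struct tiny_id_pool {
    bool used[MAX_IDS];
};

/* Allocates the lowest free id, or returns -1 if the pool is exhausted. */
static int
tiny_id_alloc(struct tiny_id_pool *pool)
{
    for (int id = 0; id < MAX_IDS; id++) {
        if (!pool->used[id]) {
            pool->used[id] = true;
            return id;
        }
    }
    return -1;
}

/* Returns 'id' to the pool so that a later thread can reuse it. */
static void
tiny_id_free(struct tiny_id_pool *pool, int id)
{
    if (id >= 0 && id < MAX_IDS) {
        pool->used[id] = false;
    }
}
```

If the non-PMD thread allocates first it gets id 0, and an id released by a deleted thread is handed to the next new thread, so the set of ids in use stays dense without touching the threads that keep running.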
7 years ago ovn: Fix the failing "2335: ovn -- ACL logging" test case
Numan Siddique [Wed, 2 Aug 2017 14:20:57 +0000 (19:50 +0530)]
ovn: Fix the failing "2335: ovn -- ACL logging" test case

The test case is failing mainly because of a timing issue. The
ovn-controller.log shows that the last packet injected just before the
AT_CHECK has still not been processed by ovn-controller.

Fixes: d383eed59589 ("ovn: Add support for ACL logging.")
Suggested-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
7 years ago dpif-netlink: Fix log level for error message
Roi Dayan [Sun, 30 Jul 2017 04:58:17 +0000 (07:58 +0300)]
dpif-netlink: Fix log level for error message

Since it's an error, but one that will always occur in older kernels,
log the message at warning level instead of info.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-by: Eric Garver <e@erig.me>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years ago dpif-netlink-rtnl: Fix false errors on interfaces without tunnel config
Roi Dayan [Thu, 27 Jul 2017 11:40:02 +0000 (14:40 +0300)]
dpif-netlink-rtnl: Fix false errors on interfaces without tunnel config

When we skip adding a port using rtnetlink for a reason other than an
error, we need to return EOPNOTSUPP to avoid logging an error message.

Fixes: 2fd3d5eda508 ("dpif-netlink-rtnl: Support layer3 GRE")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Eric Garver <e@erig.me>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years ago dpif-netlink-rtnl: Fix VXLAN port create for regular VXLAN
Eric Garver [Tue, 1 Aug 2017 22:47:18 +0000 (18:47 -0400)]
dpif-netlink-rtnl: Fix VXLAN port create for regular VXLAN

When VXLAN-GPE was introduced we added IFLA_VXLAN_GPE to the policy
parsing, but did not mark it as optional. The kernel only returns this
netlink attribute if it's actually configured.

This also adds a missing entry for IFLA_VXLAN_GBP. Apparently we have no
system-traffic test coverage there.

Fixes: c33c412fb139 ("dpif-netlink-rtnl: Support VXLAN-GPE")
Fixes: 825e45e0109f ("dpif-netlink-rtnl: Support VXLAN creation")
Reported-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: Joe Stringer <joe@ovn.org>
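The effect of marking a policy entry as optional can be illustrated with a toy parser. Everything here is hypothetical (the attribute ids, 'struct attr_policy', 'policy_accepts()'), and which attributes are required is an assumption; real code uses the kernel's IFLA_VXLAN_* ids and the OVS netlink policy helpers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute ids for this sketch. */
enum { ATTR_ID, ATTR_PORT, ATTR_GBP, ATTR_GPE, N_ATTRS };

struct attr_policy {
    bool optional;   /* If false, a missing attribute fails the parse. */
};

static const struct attr_policy vxlan_policy_sketch[N_ATTRS] = {
    [ATTR_ID]   = { .optional = false },
    [ATTR_PORT] = { .optional = false },
    /* Only present in the kernel's reply when actually configured. */
    [ATTR_GBP]  = { .optional = true },
    [ATTR_GPE]  = { .optional = true },
};

/* Returns true if every attribute that the policy requires is present. */
static bool
policy_accepts(const struct attr_policy policy[], const bool present[],
               size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!present[i] && !policy[i].optional) {
            return false;
        }
    }
    return true;
}
```

Had ATTR_GPE been left as required, a reply for a regular VXLAN port (which carries no GPE attribute) would fail to parse, which matches the failure mode the commit describes.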
7 years ago ofpbuf: Fix parameter for const initializer.
Joe Stringer [Tue, 1 Aug 2017 00:16:11 +0000 (17:16 -0700)]
ofpbuf: Fix parameter for const initializer.

Clang 4.0 complains:

In file included from include/openvswitch/cxxtest.cc:11:0:
../include/openvswitch/ofpbuf.h: In function ‘ofpbuf ofpbuf_const_initializer(const void*, size_t)’:
../include/openvswitch/ofpbuf.h:107:5: warning: narrowing conversion of ‘size’ from ‘size_t {aka long unsigned int}’ to ‘uint32_t {aka unsigned int}’ inside { } [-Wnarrowing]
     };
     ^
../include/openvswitch/ofpbuf.h:107:5: warning: narrowing conversion of ‘size’ from ‘size_t {aka long unsigned int}’ to ‘uint32_t {aka unsigned int}’ inside { } [-Wnarrowing]

This is because the ofpbuf struct's "size" member is a uint32_t,
while ofpbuf_const_initializer() takes a size_t for the size. Fix this
function to take a uint32_t instead.

Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
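A reduced version of the situation, using a hypothetical 'struct buf_sketch' rather than the real ofpbuf, might look like this. Only the parameter type matters for the warning; the rest is scaffolding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a buffer struct with a 32-bit size member. */
struct buf_sketch {
    const void *data;
    uint32_t size;
};

static inline struct buf_sketch
buf_const_initializer(const void *data, uint32_t size)
{
    /* 'size' already has the member's type, so a C++11 compiler that
     * parses this header sees no narrowing conversion inside the braces.
     * With a size_t parameter, the same initializer would narrow. */
    struct buf_sketch b = { data, size };
    return b;
}
```

Callers that pass a size_t still compile in C, because the conversion now happens at the call site as an ordinary implicit conversion instead of inside a braced initializer.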
7 years ago rhel: Fix typo in README.RHEL.rst
Timothy Redaelli [Fri, 28 Jul 2017 19:02:02 +0000 (21:02 +0200)]
rhel: Fix typo in README.RHEL.rst

Replace systemctk with systemctl

Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
7 years ago odp-util: Refactor odp_key_to_dp_packet()
Yi-Hung Wei [Mon, 31 Jul 2017 20:35:39 +0000 (13:35 -0700)]
odp-util: Refactor odp_key_to_dp_packet()

Change the type from uint16_t to 'enum ovs_key_attr' so that the compiler
will warn about unhandled cases.

Suggested-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
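The compiler check this relies on can be shown with a small, hypothetical enum ('enum key_attr_sketch' is not the real 'enum ovs_key_attr'). The sketch assumes GCC or Clang with -Wswitch enabled, which -Wall includes.

```c
#include <stdbool.h>

/* Hypothetical subset of key attributes, for illustration only. */
enum key_attr_sketch {
    KEY_PRIORITY,
    KEY_SKB_MARK,
    KEY_CT_STATE,
    KEY_CT_ZONE
};

static bool
attr_is_ct(enum key_attr_sketch type)
{
    /* No 'default' label: because 'type' is an enum, adding a new
     * enumerator without a matching case triggers a -Wswitch warning.
     * With a uint16_t operand the compiler cannot make that check. */
    switch (type) {
    case KEY_PRIORITY:
    case KEY_SKB_MARK:
        return false;
    case KEY_CT_STATE:
    case KEY_CT_ZONE:
        return true;
    }
    return false;   /* Not reached for valid enumerators. */
}
```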
7 years ago odp-util: Remove unnecessary optimization in odp_key_to_dp_packet()
Yi-Hung Wei [Mon, 31 Jul 2017 20:35:38 +0000 (13:35 -0700)]
odp-util: Remove unnecessary optimization in odp_key_to_dp_packet()

The optimization logic in odp_key_to_dp_packet() used to be useful when the
number of wanted key attributes was small. However, as the number of expected
key attributes has grown, and because the optimization logic needs to check
all the netlink attributes if one of the wanted key attributes is missing, the
benefit of the optimization logic is minimal. Therefore, this patch removes
the optimization.

Suggested-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years ago odp-util: Fix generating ct_orig_tuple in odp_key_to_dp_packet()
Yi-Hung Wei [Mon, 31 Jul 2017 20:35:37 +0000 (13:35 -0700)]
odp-util: Fix generating ct_orig_tuple in odp_key_to_dp_packet()

Previously, odp_key_to_dp_packet() could fail to get ct_orig_tuple
from the ODP flow key. This patch fixes the issue.

VMWare-BZ: #1920903
Fixes: daf4d3c18da4 ("odp: Support conntrack orig tuple key.")
Suggested-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years ago odp-util: Fix generating various ct fields in odp_key_to_dp_packet()
Yi-Hung Wei [Mon, 31 Jul 2017 20:35:36 +0000 (13:35 -0700)]
odp-util: Fix generating various ct fields in odp_key_to_dp_packet()

Previously, odp_key_to_dp_packet() could fail to get ct_state, ct_zone,
ct_mark, and ct_labels from the ODP flow key. This patch fixes the issue.

VMWare-BZ: #1920903
Fixes: 07659514c3c1 ("Add support for connection tracking.")
Fixes: 8e53fe8cf7a1 ("Add connection tracking mark support.")
Fixes: 9daf23484fb1 ("Add connection tracking label support.")
Suggested-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years ago odp-util: Make checks for exact or wildcard masks more precise.
Ben Pfaff [Mon, 31 Jul 2017 19:36:48 +0000 (12:36 -0700)]
odp-util: Make checks for exact or wildcard masks more precise.

Checking whether an ODP mask is all-0s or all-1s is a little more
complicated than one might expect because the structures sometimes have
trailing padding.  The function odp_mask_is_exact() was fairly careful
about this, but odp_mask_attr_is_wildcard() didn't take padding into
consideration at all, which caused test failures on Travis and on some
machines because of uninitialized padding.

This commit fixes the problem by unifying the two different functions so
that both of them are careful about checking only significant bytes.  It
also adds support for the ct_orig_tuples for IPv4 and IPv6, which also
have trailing padding but weren't special cased before.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
7 years ago util: New function is_all_byte().
Ben Pfaff [Mon, 31 Jul 2017 17:07:50 +0000 (10:07 -0700)]
util: New function is_all_byte().

This makes it easy for callers to choose all-ones or all-zeros based on
a parameter instead of choice of function.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
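A helper of this kind, and the way a mask check can use it while skipping trailing padding, might look like the sketch below. The names 'all_bytes_equal()', 'struct padded_key' and 'mask_is()' are hypothetical, and the signature is an assumption rather than a copy of the function added to lib/util.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the 'n' bytes at 'p' all equal 'byte'. */
static bool
all_bytes_equal(const void *p, size_t n, uint8_t byte)
{
    const uint8_t *bytes = p;

    for (size_t i = 0; i < n; i++) {
        if (bytes[i] != byte) {
            return false;
        }
    }
    return true;
}

/* Hypothetical key whose last member is followed by padding on
 * typical ABIs. */
struct padded_key {
    uint32_t addr;
    uint8_t proto;
};

/* Number of bytes that carry meaning, excluding trailing padding. */
#define PADDED_KEY_SIGNIFICANT \
    (offsetof(struct padded_key, proto) + sizeof(uint8_t))

/* Checks for an exact (all-1s) or wildcard (all-0s) mask, selected by a
 * parameter, while looking only at the significant bytes. */
static bool
mask_is(const struct padded_key *mask, bool exact)
{
    return all_bytes_equal(mask, PADDED_KEY_SIGNIFICANT,
                           exact ? 0xff : 0x00);
}
```

Because the comparison stops before the padding, uninitialized padding bytes cannot change the answer, which is the class of failure the mask-check commit above describes.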
7 years ago odp-util: Drop special case for OVS_KEY_ATTR_TUNNEL for exact mask checks.
Ben Pfaff [Mon, 31 Jul 2017 16:40:57 +0000 (09:40 -0700)]
odp-util: Drop special case for OVS_KEY_ATTR_TUNNEL for exact mask checks.

This special case isn't actually necessary.  Commit 48954dab23ee
("odp-util: Remove last use of odp_tun_key_from_attr for formatting.")
retained it "as a safety measure" but that isn't really needed.

This makes an upcoming change more straightforward.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
7 years ago odp-util: Rewrite odp_mask_attr_is_exact().
Ben Pfaff [Fri, 28 Jul 2017 21:47:58 +0000 (14:47 -0700)]
odp-util: Rewrite odp_mask_attr_is_exact().

The way this function was written seemed really funny to me, so this commit
rewrites it.  There should be no behavioral change.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
7 years ago odp-util: More carefully validate attribute length in odp_flow_format().
Ben Pfaff [Fri, 28 Jul 2017 21:43:57 +0000 (14:43 -0700)]
odp-util: More carefully validate attribute length in odp_flow_format().

odp_flow_format() passes masks to odp_mask_attr_is_wildcard() without
first checking that they are the correct length.  This is OK for the
moment because odp_mask_attr_is_wildcard() doesn't care that the length
is correct.  An upcoming commit will change odp_mask_attr_is_wildcard()
to make it pickier, so this prepares for that change.

This adds a few comments to make it a little harder to get length
validation wrong in the future.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
7 years ago odp-util: Fix misleading parameter names.
Ben Pfaff [Fri, 28 Jul 2017 21:23:57 +0000 (14:23 -0700)]
odp-util: Fix misleading parameter names.

The 'max_len' parameters to these functions are actually the maximum type
values, not the maximum length of anything.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
7 years ago Add 'extern "C"' for all relevant public header files, plus a build check.
Ben Pfaff [Mon, 31 Jul 2017 01:03:24 +0000 (18:03 -0700)]
Add 'extern "C"' for all relevant public header files, plus a build check.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Russell Bryant <russell@ovn.org>
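For reference, the usual shape of such a guard in a public header is shown below. The header name and the function 'example_count_bytes()' are hypothetical; only the '#ifdef __cplusplus' wrapper is the point.

```c
#ifndef EXAMPLE_PUBLIC_H
#define EXAMPLE_PUBLIC_H 1

#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Declarations inside the guard get C linkage when included from C++,
 * so they resolve against the symbols of the C library build. */
size_t example_count_bytes(const char *s);

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_PUBLIC_H */

/* A definition, which would normally live in a separate .c file. */
size_t
example_count_bytes(const char *s)
{
    size_t n = 0;

    while (s[n]) {
        n++;
    }
    return n;
}
```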
7 years ago Automatically verify that OVS header files work OK in C++ also.
Ben Pfaff [Mon, 31 Jul 2017 20:31:43 +0000 (13:31 -0700)]
Automatically verify that OVS header files work OK in C++ also.

This should help address a recurring problem.

This change makes the OVS header files, when parsed by a C++ compiler,
require C++11 or later.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Russell Bryant <russell@ovn.org>
7 years ago ofp-util: Avoid C++ keyword 'public' in name of struct member.
Ben Pfaff [Mon, 31 Jul 2017 00:40:32 +0000 (17:40 -0700)]
ofp-util: Avoid C++ keyword 'public' in name of struct member.

This allows a C++ program to include ofp-util.h.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Russell Bryant <russell@ovn.org>
7 years ago util: Add C++ compatible definition of PADDED_MEMBERS.
Ben Pfaff [Mon, 31 Jul 2017 00:36:21 +0000 (17:36 -0700)]
util: Add C++ compatible definition of PADDED_MEMBERS.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Russell Bryant <russell@ovn.org>
7 years ago ofp-actions: Add casts to placate C++ compilers.
Ben Pfaff [Mon, 31 Jul 2017 00:35:25 +0000 (17:35 -0700)]
ofp-actions: Add casts to placate C++ compilers.

C++ does not allow for an implicit conversion from void * to pointer to
object or incomplete type.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Russell Bryant <russell@ovn.org>
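The kind of cast involved can be seen in a small example. 'struct action_sketch' and 'action_sketch_new()' are hypothetical and unrelated to the real ofp-actions helpers; the sketch only shows why code in a header shared with C++ needs the cast.

```c
#include <stdlib.h>

/* Hypothetical action record, for illustration only. */
struct action_sketch {
    int type;
    int len;
};

static struct action_sketch *
action_sketch_new(int type)
{
    /* malloc() returns void *.  C converts it implicitly, so the cast is
     * redundant there, but C++ rejects the assignment without it. */
    struct action_sketch *a = (struct action_sketch *) malloc(sizeof *a);

    if (a) {
        a->type = type;
        a->len = (int) sizeof *a;
    }
    return a;
}
```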
7 years ago ovn: Restrict encap modification to its creating chassis
Mark Michelson [Thu, 27 Jul 2017 18:34:23 +0000 (13:34 -0500)]
ovn: Restrict encap modification to its creating chassis

This patch extends RBAC restrictiveness of the encap table in
the ovn southbound database by only allowing modification by the
chassis that created the encap.

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Reported-by: Lance Richardson <lrichard@redhat.com>
Acked-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
7 years ago ovn: Update ovn-nbctl manpage with DHCP lsp commands.
Mark Michelson [Wed, 26 Jul 2017 21:28:13 +0000 (16:28 -0500)]
ovn: Update ovn-nbctl manpage with DHCP lsp commands.

This adds the appropriate manpage entries for ovn-nbctl for

lsp-set-dhcpv4-options
lsp-get-dhcpv4-options
lsp-set-dhcpv6-options
lsp-get-dhcpv6-options

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
7 years ago tests: Use ovn-nbctl lsp-set-dhcpvX-options
Mark Michelson [Wed, 26 Jul 2017 21:28:12 +0000 (16:28 -0500)]
tests: Use ovn-nbctl lsp-set-dhcpvX-options

Existing OVN tests manually added DHCP options to the
Logical_Switch_Port database. There is a shortcut CLI command for doing
the same thing, so we may as well use it and get the extra test coverage
as a result.

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
7 years ago ovn: Add lsp-set-dhcpv6-options ovn-nbctl operation.
Mark Michelson [Wed, 26 Jul 2017 21:28:11 +0000 (16:28 -0500)]
ovn: Add lsp-set-dhcpv6-options ovn-nbctl operation.

OVN offers a shortcut to set DHCPv4 options on a logical switch port,
but it did not offer the same for DHCPv6. This commit adds a similar
option for DHCPv6.

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
7 years ago ovn: Add support for ACL logging.
Justin Pettit [Sat, 17 Dec 2016 01:40:24 +0000 (17:40 -0800)]
ovn: Add support for ACL logging.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Han Zhou <zhouhan@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
7 years ago ofproto-dpif-rid: Don't include action_set_len as part of hash.
Justin Pettit [Fri, 28 Jul 2017 01:02:26 +0000 (18:02 -0700)]
ofproto-dpif-rid: Don't include action_set_len as part of hash.

It doesn't improve the hashing, since the number of bytes hashed is
included in hash_bytes64() hash calculation.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
7 years ago ofproto-dpif-rid: Always store tunnel metadata.
Justin Pettit [Fri, 7 Jul 2017 23:26:10 +0000 (16:26 -0700)]
ofproto-dpif-rid: Always store tunnel metadata.

Tunnel metadata was only stored if the tunnel destination was set.  It's
possible, for example, that a flow could set the tunnel id field before
recirculation and then set the destination field afterwards.  The
previous behavior is that the tunnel id would be lost during
recirculation under such a circumstance.  This changes the behavior to
always copy the tunnel metadata, regardless of whether the tunnel
destination is set.  It also adds a unit test.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
7 years ago ofproto-dpif-rid: Store tunnel metadata in frozen metadata directly.
Justin Pettit [Fri, 7 Jul 2017 23:04:57 +0000 (16:04 -0700)]
ofproto-dpif-rid: Store tunnel metadata in frozen metadata directly.

"recirc_id_node" contains a 'state_metadata_tunnel' member field.  The
"frozen_metadata" structure used by "recirc_id_node" had a 'tunnel'
member that always pointed to 'state_metadata_tunnel'.  This commit just
stores the tunnel information directly in "frozen_metadata" instead of
accessing it through a pointer.

This makes the code a bit simpler and easier to reason about.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
7 years ago travis: Fail the build if any of the Linux build preparations fail.
Ben Pfaff [Thu, 27 Jul 2017 20:41:06 +0000 (13:41 -0700)]
travis: Fail the build if any of the Linux build preparations fail.

We want the build to fail if we can't prepare properly for it, but
linux-prepare.sh ignored errors.  This fixes the problem.

This would have made it more obvious where the problem fixed by the
previous commit originated.

(osx-prepare.sh already does the right thing.)

Signed-off-by: Ben Pfaff <blp@ovn.org>
7 years ago travis: Explicitly disable LLVM for sparse build.
Ben Pfaff [Thu, 27 Jul 2017 23:48:54 +0000 (16:48 -0700)]
travis: Explicitly disable LLVM for sparse build.

Newer travis environments claim to have LLVM support (llvm-config exists
and works) but in reality don't, which prevents sparse from building and
later parts of the build from succeeding.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
7 years ago flow: Refactor flow_compose() API.
Andy Zhou [Tue, 25 Jul 2017 21:26:22 +0000 (14:26 -0700)]
flow: Refactor flow_compose() API.

Currently, flow_compose_size() is only supposed to be called after
flow_compose(). I find this API to be unintuitive.

Change the flow_compose() API to take the 'size' argument and
return 'true' if the packet can be created, 'false' otherwise.

This change also improves error detection and reporting when
'size' is unreasonably small.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
7 years ago netlink: Correct comment for nl_msg_put_unspec().
Justin Pettit [Wed, 26 Jul 2017 23:51:17 +0000 (16:51 -0700)]
netlink: Correct comment for nl_msg_put_unspec().

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
7 years ago ovn: l3ha, CLI for logical router port gateway chassis
Venkata Anil [Tue, 18 Jul 2017 06:05:45 +0000 (11:35 +0530)]
ovn: l3ha, CLI for logical router port gateway chassis

This change adds commands to set, get and delete the gateway chassis
for a logical router port.

Signed-off-by: Venkata Anil Kommaddi <vkommadi@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
7 years ago dpif: Refactor obj type from void pointer to dpif_class
Roi Dayan [Tue, 25 Jul 2017 05:28:41 +0000 (08:28 +0300)]
dpif: Refactor obj type from void pointer to dpif_class

It's basically what is being passed today and passing a specific
type adds a compiler type check.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
7 years ago tc: Add SCTP support
Vlad Buslov [Tue, 25 Jul 2017 11:39:51 +0000 (14:39 +0300)]
tc: Add SCTP support

Implement SCTP source and destination ports support for flower.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
7 years ago Documentation/conf.py: Fix line length.
Russell Bryant [Thu, 27 Jul 2017 00:30:34 +0000 (20:30 -0400)]
Documentation/conf.py: Fix line length.

A previous commit introduced a line that was greater than 79
characters long, causing a flake8 warning to be emitted.

Reported-by: Joe Stringer <joe@ovn.org>
Fixes: 5ca89127382d ("docs: Refer to correct package name for sphinx theme.")
Signed-off-by: Russell Bryant <russell@ovn.org>
7 years ago datapath: fix potential out of bound access in parse_ct
Greg Rose [Tue, 25 Jul 2017 15:40:58 +0000 (08:40 -0700)]
datapath: fix potential out of bound access in parse_ct

Upstream commit:
    commit 69ec932e364b1ba9c3a2085fe96b76c8a3f71e7c
    Author: Liping Zhang <zlpnobody@gmail.com>
    Date:   Sun Jul 23 17:52:23 2017 +0800

    openvswitch: fix potential out of bound access in parse_ct

    Before the 'type' is validated, we shouldn't use it to fetch the
    ovs_ct_attr_lens's minlen and maxlen, else, out of bound access
    may happen.

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pick up an upstream bug fix.

Fixes: a94ebc39996b ("datapath: Add conntrack action")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
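The ordering problem fixed here (use of 'type' as an index before it is validated) can be sketched as follows. The attribute ids, the table 'ct_attr_lens_sketch' and its length values are illustrative assumptions, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute ids for this sketch. */
enum {
    CT_ATTR_UNSPEC,
    CT_ATTR_COMMIT,
    CT_ATTR_ZONE,
    CT_ATTR_MARK,
    CT_ATTR_MAX = CT_ATTR_MARK
};

struct attr_len {
    size_t minlen;
    size_t maxlen;
};

/* Illustrative length limits; the real table differs. */
static const struct attr_len ct_attr_lens_sketch[CT_ATTR_MAX + 1] = {
    [CT_ATTR_COMMIT] = { 0, 0 },
    [CT_ATTR_ZONE]   = { 2, 2 },
    [CT_ATTR_MARK]   = { 8, 8 },
};

static bool
ct_attr_len_ok(unsigned int type, size_t len)
{
    const struct attr_len *l;

    /* Validate 'type' first; only then is it safe as an array index. */
    if (type > CT_ATTR_MAX) {
        return false;
    }
    l = &ct_attr_lens_sketch[type];
    return len >= l->minlen && len <= l->maxlen;
}
```

Fetching minlen and maxlen before the range check would read past the end of the table for an out-of-range 'type', which is the out of bound access the upstream commit describes.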
7 years ago system-userspace-macros: Fix ethtool with new kernels.
Joe Stringer [Wed, 26 Jul 2017 19:49:48 +0000 (12:49 -0700)]
system-userspace-macros: Fix ethtool with new kernels.

The latest net-next kernels have removed the UFO feature, which results
in older ethtool reporting the following error:

Cannot get device udp-fragmentation-offload settings: Operation not
supported

Currently, we rely on no errors being reported, and if there is an error
then a failure is reported. However, in this case we can safely ignore
the stderr output. We still check the return code so if something is
truly fatal, a failure will still be reported; otherwise, we will not
fail the test due to the above.

Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
7 years ago ovn-northd: Optimize acl of localnet-port.
wangqianyu [Wed, 26 Jul 2017 21:02:24 +0000 (17:02 -0400)]
ovn-northd: Optimize acl of localnet-port.

A localnet port is not an endpoint, and at present there are no security
requirements that apply to a localnet port. So, for performance reasons, we
can avoid using ct for localnet ports.

A more detailed discussion can be found at
https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335048.html

Signed-off-by: wangqianyu <wang.qianyu@zte.com.cn>
Acked-by: Han Zhou <zhouhan@gmail.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
7 years ago checkpatch: Print commit hashes and names.
Ilya Maximets [Fri, 14 Jul 2017 10:57:24 +0000 (13:57 +0300)]
checkpatch: Print commit hashes and names.

It's better to see real commits instead of 'HEAD~n'.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
7 years ago checkpatch: Allow checking more than one file.
Ilya Maximets [Fri, 14 Jul 2017 10:57:23 +0000 (13:57 +0300)]
checkpatch: Allow checking more than one file.

Currently, checking more than one patch or file requires
invoking the script separately for each file.
Fix that by iterating over all the passed filenames.

Note: If the '-f' option is passed, all the files are treated as usual files.
      Without '-f', all the files are treated as patch files.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
7 years ago checkpatch: Print results while checking HEAD and stdin.
Ilya Maximets [Fri, 14 Jul 2017 10:57:22 +0000 (13:57 +0300)]
checkpatch: Print results while checking HEAD and stdin.

Currently, the result status is printed only for patch files.
It would be nice to have results for the other checking types.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
7 years ago checkpatch: Don't allow Gerrit Change-Ids.
Ilya Maximets [Fri, 14 Jul 2017 10:57:21 +0000 (13:57 +0300)]
checkpatch: Don't allow Gerrit Change-Ids.

Local Gerrit Change-Ids are not welcome in the common repository.
Inspired by checkpatch.pl from the Linux kernel.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
7 years ago docs: Refer to correct package name for sphinx theme.
Russell Bryant [Fri, 21 Jul 2017 00:23:21 +0000 (20:23 -0400)]
docs: Refer to correct package name for sphinx theme.

Update the log message emitted when the OVS sphinx theme is not found
to reference the name of the package to be installed via pip:
ovs-sphinx-theme.

Signed-off-by: Russell Bryant <russell@ovn.org>
Acked-by: Lance Richardson <lrichard@redhat.com>
7 years ago ovn-controller: avoid null ptr dereference
Lance Richardson [Wed, 26 Jul 2017 19:47:34 +0000 (15:47 -0400)]
ovn-controller: avoid null ptr dereference

Avoid a null pointer dereference in fdb_calculate_active_tunnels()
when the integration bridge isn't present. This is easily encountered
by executing "make sandbox SANDBOXFLAGS=--ovn".

Fixes: 3475695ea61c ("ovn: l3ha, enable bfd between tunnel endpoints")
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
7 years ago bond: Adjust bond hash masks
Andy Zhou [Tue, 25 Jul 2017 18:28:37 +0000 (11:28 -0700)]
bond: Adjust bond hash masks

Commit 42781e77035d (bond: Unify hash functions in hash action and entry
lookup.) changed the BM_TCP's hash function, but did not update
hash mask fields accordingly. Found by inspection.

CC: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
7 years ago dpif-netdev.at: Add netdev-dummy/receive test.
Ilya Maximets [Tue, 25 Jul 2017 13:02:02 +0000 (16:02 +0300)]
dpif-netdev.at: Add netdev-dummy/receive test.

Regression test for 'netdev-dummy/receive' appctl command.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
7 years ago netdev-dummy: Fix setting length in receive command.
Ilya Maximets [Tue, 25 Jul 2017 13:02:01 +0000 (16:02 +0300)]
netdev-dummy: Fix setting length in receive command.

Currently, if the '--len' option is passed to the 'netdev-dummy/receive'
command, only the 'size' field of the dp_packet changes.

This is incorrect behaviour, because memory for that size is not
allocated and the packet headers are not fixed up to reflect the new size.
This leads to flow_extract() failure, because it checks
'ip->tot_len' and stops further parsing if it doesn't match
dp_packet_size(). As a result, packets created while processing the
'receive' command can't be parsed to the same flow.
Additionally, this may lead to invalid memory accesses if someone
tries to read or modify the packet's data.

Fix that by creating correct packets using the recently introduced
'flow_compose_size()'.

CC: Andy Zhou <azhou@ovn.org>
Fixes: d8ada2368cbe ("netdev-dummy: Add --len option for netdev-dummy/receive command")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
7 years ago flow: Add flow_compose_size().
Ilya Maximets [Tue, 25 Jul 2017 13:02:00 +0000 (16:02 +0300)]
flow: Add flow_compose_size().

This allows composing packets with different real lengths from
odp flows, i.e. memory will be allocated for the requested packet
size and all required headers like ip->tot_len will be filled in correctly.

Will be used in netdev-dummy to properly handle '--len' option.

Suggested-by: Andy Zhou <azhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
7 years ago stream-ssl: Fix memory leak in error scenario
Mark Michelson [Fri, 21 Jul 2017 20:46:00 +0000 (15:46 -0500)]
stream-ssl: Fix memory leak in error scenario

ssl_new_stream() takes ownership of the passed-in 'name' parameter.
In error scenarios, the name is leaked. I was able to trigger this
leak by attempting to connect to an ovsdb over SSL and specifying
non-existent certificate, private key, and CA cert files.

This patch fixes the problem by freeing 'name' in the error label.

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
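The ownership rule behind this fix can be sketched with a generic constructor. 'struct stream_sketch', 'stream_sketch_new()' and the 'setup_ok' flag are hypothetical stand-ins; the flag simply simulates a failure such as a missing certificate file.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stream object that keeps the name it was given. */
struct stream_sketch {
    char *name;
};

/* Takes ownership of 'name' on both success and failure.  On success the
 * new stream owns it; on failure it is freed here, so callers never free
 * it themselves. */
static int
stream_sketch_new(char *name, int setup_ok, struct stream_sketch **streamp)
{
    struct stream_sketch *s;
    int error;

    if (!setup_ok) {
        error = ENOENT;
        goto error;
    }
    s = malloc(sizeof *s);
    if (!s) {
        error = ENOMEM;
        goto error;
    }
    s->name = name;
    *streamp = s;
    return 0;

error:
    free(name);   /* Without this, 'name' leaks on every failure path. */
    *streamp = NULL;
    return error;
}
```

Routing every failure through a single label keeps the cleanup in one place, so a newly added early exit cannot forget to release 'name'.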
7 years ago AUTHORS.rst: Add Mark Michelson.
Russell Bryant [Tue, 25 Jul 2017 19:39:59 +0000 (15:39 -0400)]
AUTHORS.rst: Add Mark Michelson.

Signed-off-by: Russell Bryant <russell@ovn.org>
7 years ago bond: Remove bond_hash_src.
Ilya Maximets [Tue, 25 Jul 2017 10:46:39 +0000 (13:46 +0300)]
bond: Remove bond_hash_src.

Since the introduction of the 'hash_mac()' function in
commit 7e36ac42e33a ("lib/packet.h: add hash_mac()"), there is no
need for an additional wrapper for mac address hashing.

Let's use 'hash_mac()' directly and remove 'bond_hash_src()' to
simplify the code.

Suggested-by: Andy Zhou <azhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
7 years ago bond: Unify hash functions in hash action and entry lookup.
Ilya Maximets [Tue, 25 Jul 2017 10:46:38 +0000 (13:46 +0300)]
bond: Unify hash functions in hash action and entry lookup.

'lookup_bond_entry' currently uses 'flow_hash_symmetric_l4' while
OVS_ACTION_ATTR_HASH uses 'flow_hash_5tuple'. This may lead to
inconsistency in slave selection for new flows.  Strictly speaking,
unifying the hash functions is not required for correct operation,
but it's logically wrong to use different hash functions there.

Unfortunately we're not able to use the RSS hash here, because we have
no packet at this point, but we may reduce inconsistency by using
'flow_hash_5tuple' instead of 'flow_hash_symmetric_l4' because
the symmetric property is not needed.

'flow_hash_symmetric_l4' was used previously just because there
was no other hash function implemented at the time and L2
fields were additionally involved in the hash calculation. Now we
have the 5tuple hash and L2 is not used anymore, so we may replace the
old function.

'flow_hash_5tuple' is the preferable solution because it is 2 - 8 times
(depending on the flow) faster than the symmetric function.
So, this change will also speed up handling of new flows and
statistics accounting.

Additionally, the function 'bond_hash_tcp()' was removed to simplify
the code and possibly gain some additional speed.

Co-authored-by: Andy Zhou <azhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
7 years ago vswitch.xml: Fix L2 balancing mentioning for balance-tcp bond.
Ilya Maximets [Tue, 25 Jul 2017 10:46:37 +0000 (13:46 +0300)]
vswitch.xml: Fix L2 balancing mentioning for balance-tcp bond.

L2 fields have not been used in the userspace hash action since
commit 4f150744921f ("dpif-netdev: Use miniflow as a flow key.").
In the kernel datapath, RSS (which does not include L2 by default for
most NICs) has been used from the beginning. This means that
if recirculation is in use, L2 fields are not used for flow
balancing.

Fix the documentation accordingly.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
7 years agoovn-architecture: Remove outdated comment.
Russell Bryant [Mon, 24 Jul 2017 20:52:30 +0000 (16:52 -0400)]
ovn-architecture: Remove outdated comment.

This outdated comment said that support for hardware gateways that
support the vtep schema would come later.  This was actually
implemented a long time ago.

Signed-off-by: Russell Bryant <russell@ovn.org>
Acked-by: Miguel Angel Ajo <majopela@redhat.com>
7 years agotests: Add force/commit test to system-traffic.at
Joe Stringer [Tue, 18 Jul 2017 15:42:53 +0000 (08:42 -0700)]
tests: Add force/commit test to system-traffic.at

Add a new check that the conntrack force direction change and
commit work correctly.

This test was used to find and root cause VMware-BZ 1890854.

Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Darrell Ball <dlu998@gmail.com>
7 years agodatapath: Fix for force/commit action failures
Greg Rose [Tue, 18 Jul 2017 15:42:54 +0000 (08:42 -0700)]
datapath: Fix for force/commit action failures

Upstream commit:
    commit 8b97ac5bda17cfaa257bcab6180af0f43a2e87e0
    Author: Greg Rose <gvrose8192@gmail.com>
    Date:   Fri Jul 14 12:42:49 2017 -0700

    openvswitch: Fix for force/commit action failures

    When there is an established connection in direction A->B, it is
    possible to receive a packet on port B which then executes
    ct(commit,force) without first performing ct() - ie, a lookup.
    In this case, we would expect that this packet can delete the
    existing entry so that we can commit a connection with direction B->A.
    However, currently we only perform a check in skb_nfct_cached() for
    whether OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie
    that a lookup previously occurred. In the above scenario, a lookup
    has not occurred but we should still be able to statelessly look
    up the existing entry and potentially delete the entry if it is
    in the opposite direction.

    This patch extends the check to also hint that if the action has the
    force flag set, then we will lookup the existing entry so that the
    force check at the end of skb_nfct_cached has the ability to delete
    the connection.
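
The extended check described above can be read as a simple predicate: look
up the existing entry if a previous lookup evidently occurred, or if the
action carries the force flag. The sketch below is a simplified reading of
that description, not the kernel code. The flag values and the function name
are assumptions made for illustration, and the actual check in
skb_nfct_cached() may include further conditions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values (assumed); the real OVS_CS_F_* constants are
 * defined in the openvswitch uapi header. */
#define CS_F_INVALID 0x10u
#define CS_F_TRACKED 0x20u

/* Hypothetical helper: returns true if an existing conntrack entry should
 * be looked up for a packet that has no conntrack entry attached. */
bool should_lookup_existing(uint32_t ct_state, bool force)
{
    /* Evidence that ct() already ran on this packet. */
    bool lookup_done = (ct_state & CS_F_TRACKED)
                       && !(ct_state & CS_F_INVALID);

    /* With 'force', look up statelessly even without a prior ct(). */
    return lookup_done || force;
}
```

With this reading, ct(commit,force) on an untracked packet still reaches the
existing entry, so the force check can delete it when its direction is
opposite.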

Fixes: dd41d330b03 ("openvswitch: Add force commit.")
CC: Pravin Shelar <pshelar@nicira.com>
CC: dev@openvswitch.org
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Co-authored-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
7 years agotravis: Update test kernels
Greg Rose [Fri, 21 Jul 2017 23:46:14 +0000 (16:46 -0700)]
travis: Update test kernels

Update the Travis test kernels as per the latest information from
kernel.org. In particular add support for kernel 4.12 as the newest
released kernel.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agoacinclude.m4: Support Linux kernel 4.12
Greg Rose [Fri, 21 Jul 2017 23:46:13 +0000 (16:46 -0700)]
acinclude.m4: Support Linux kernel 4.12

Allow datapath kernel modules to be configured and built for kernels up
to 4.12.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agodatapath: fix mis-ordered comment lines for ovs_skb_cb
Greg Rose [Fri, 21 Jul 2017 23:46:12 +0000 (16:46 -0700)]
datapath: fix mis-ordered comment lines for ovs_skb_cb

Upstream commit:
    commit 52427fa0631269c62885dc48e0c32e2ad6e17f8c
    Author: Daniel Axtens <dja@axtens.net>
    Date:   Mon Jul 3 21:46:43 2017 +1000

    openvswitch: fix mis-ordered comment lines for ovs_skb_cb

    I was trying to wrap my head around meaning of mru, and realised
    that the second line of the comment defining it had somehow
    ended up after the line defining cutlen, leading to much confusion.

    Reorder the lines to make sense.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agodatapath: Avoid using stack larger than 1024.
Tonghao Zhang [Fri, 21 Jul 2017 23:46:11 +0000 (16:46 -0700)]
datapath: Avoid using stack larger than 1024.

Upstream commit:
    commit 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
    Author: Tonghao Zhang <xiangxia.m.yue@gmail.com>
    Date:   Thu Jun 29 17:27:44 2017 -0700

    datapath: Avoid using stack larger than 1024.

    When compiling OvS-master on 4.4.0-81 kernel,
    there is a warning:

        CC [M]  /root/ovs/datapath/linux/datapath.o
/root/ovs/datapath/linux/datapath.c: In function
'ovs_flow_cmd_set':
/root/ovs/datapath/linux/datapath.c:1221:1: warning:
the frame size of 1040 bytes is larger than 1024 bytes
[-Wframe-larger-than=]

    This patch factors out match-init and action-copy to avoid
    "Wframe-larger-than=1024" warning. Because mask is only
    used to get actions, we add a new function to save some
    stack space.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agocompat: net: store port/representator id in metadata_dst.
Joe Stringer [Fri, 21 Jul 2017 23:46:10 +0000 (16:46 -0700)]
compat: net: store port/representator id in metadata_dst.

Upstream commit:
    commit 3fcece12bc1b6dcdf0986f2cd9e8f63b1f9b6aa0
    Author: Jakub Kicinski <jakub.kicinski@netronome.com>
    Date: Fri Jun 23 22:11:58 2017 +0200

    net: store port/representator id in metadata_dst

    Switches and modern SR-IOV enabled NICs may multiplex traffic from Port
    representators and control messages over single set of hardware queues.
    Control messages and muxed traffic may need ordered delivery.

    Those requirements make it hard to comfortably use TC infrastructure today
    unless we have a way of attaching metadata to skbs at the upper device.
    Because single set of queues is used for many netdevs stopping TC/sched
    queues of all of them reliably is impossible and lower device has to
    retreat to returning NETDEV_TX_BUSY and usually has to take extra locks on
    the fastpath.

    This patch attempts to enable port/representative devs to attach metadata
    to skbs which carry port id.  This way representatives can be queueless and
    all queuing can be performed at the lower netdev in the usual way.

    Traffic arriving on the port/representative interfaces will have
    metadata attached and will subsequently be queued to the lower device for
    transmission.  The lower device should recognize the metadata and translate
    it to HW specific format which is most likely either a special header
    inserted before the network headers or descriptor/metadata fields.

    Metadata is associated with the lower device by storing the netdev pointer
    along with port id so that if TC decides to redirect or mirror the new
    netdev will not try to interpret it.

    This is mostly for SR-IOV devices since switches don't have lower netdevs
    today.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: 3fcece12bc1b ("net: store port/representator id in metadata_dst")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Greg Rose <gvrose8192@gmail.com>
7 years agodatapath: get rid of redundant vxlan_dev.flags
Greg Rose [Fri, 21 Jul 2017 23:46:09 +0000 (16:46 -0700)]
datapath: get rid of redundant vxlan_dev.flags

Upstream commit:
    commit dc5321d79697db1b610c25fa4fad1aec7533ea3e
    Author: Matthias Schiffer <mschiffer@universe-factory.net>
    Date:   Mon Jun 19 10:03:56 2017 +0200

    vxlan: get rid of redundant vxlan_dev.flags

    There is no good reason to keep the flags twice in vxlan_dev and
    vxlan_config.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Applied using HAVE_VXLAN_DEV_CFG compatibility flag defined in
acinclude.m4.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agocompat: Implement upstream net device free change.
Greg Rose [Fri, 21 Jul 2017 23:46:08 +0000 (16:46 -0700)]
compat: Implement upstream net device free change.

Upstream commit cf124db566e6 ("net: Fix inconsistent teardown and
release of private netdev state.") removed the destructor member
of the net_device structure and replaced it with a boolean flag
indicating that the net device resource needs freeing.  Use
compat flag HAVE_NEEDS_FREE_NETDEV to indicate whether the new
flag should be used.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agocompat: convert many more places to skb_put_zero().
Joe Stringer [Fri, 21 Jul 2017 23:46:07 +0000 (16:46 -0700)]
compat: convert many more places to skb_put_zero().

Upstream commit:
    commit de77b966ce8adcb4c58d50e2f087320d5479812a
    Author: Johannes Berg <johannes.berg@intel.com>
    Date: Fri Jun 16 14:29:19 2017 +0200

    networking: convert many more places to skb_put_zero()

    There were many places that my previous spatch didn't find,
    as pointed out by yuan linyu in various patches.

    The following spatch found many more and also removes the
    now unnecessary casts:

        @@
        identifier p, p2;
        expression len;
        expression skb;
        type t, t2;
        @@
        (
        -p = skb_put(skb, len);
        +p = skb_put_zero(skb, len);
        |
        -p = (t)skb_put(skb, len);
        +p = skb_put_zero(skb, len);
        )
        ... when != p
        (
        p2 = (t2)p;
        -memset(p2, 0, len);
        |
        -memset(p, 0, len);
        )

        @@
        type t, t2;
        identifier p, p2;
        expression skb;
        @@
        t *p;
        ...
        (
        -p = skb_put(skb, sizeof(t));
        +p = skb_put_zero(skb, sizeof(t));
        |
        -p = (t *)skb_put(skb, sizeof(t));
        +p = skb_put_zero(skb, sizeof(t));
        )
        ... when != p
        (
        p2 = (t2)p;
        -memset(p2, 0, sizeof(*p));
        |
        -memset(p, 0, sizeof(*p));
        )

        @@
        expression skb, len;
        @@
        -memset(skb_put(skb, len), 0, len);
        +skb_put_zero(skb, len);

    Apply it to the tree (with one manual fixup to keep the
    comment in vxlan.c, which spatch removed.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use e45a79da863c ("skbuff/mac80211: introduce and use skb_put_zero()")
as the basis for the backported function.
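
As a rough userspace illustration of what such a backported helper does, the
sketch below models skb_put() followed by memset() on a toy buffer. The
'toy_skb' type and both function names are hypothetical stand-ins; they are
not the kernel API and not the compat code added by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for struct sk_buff: a fixed buffer plus a tail offset. */
struct toy_skb {
    unsigned char data[64];
    size_t tail;
};

/* Extends the used area by 'len' bytes and returns a pointer to the new
 * region, leaving its contents untouched (models skb_put()). */
void *toy_skb_put(struct toy_skb *skb, size_t len)
{
    void *p;

    assert(len <= sizeof skb->data - skb->tail);
    p = skb->data + skb->tail;
    skb->tail += len;
    return p;
}

/* Same as toy_skb_put(), but the new region is zeroed before it is
 * returned (models skb_put_zero()). */
void *toy_skb_put_zero(struct toy_skb *skb, size_t len)
{
    void *p = toy_skb_put(skb, len);

    memset(p, 0, len);
    return p;
}
```

Callers that previously paired a put with a memset of the same length can
then use the single zeroing call, which is the pattern the semantic patch
above rewrites.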

Upstream: de77b966ce8a ("networking: convert many more places to skb_put_zero()")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Greg Rose <gvrose8192@gmail.com>
7 years agodatapath: Fix inconsistent teardown and release of private netdev state.
Greg Rose [Fri, 21 Jul 2017 23:46:06 +0000 (16:46 -0700)]
datapath: Fix inconsistent teardown and release of private netdev state.

Upstream commit:
    commit cf124db566e6b036b8bcbe8decbed740bdfac8c6
    Author: David S. Miller <davem@davemloft.net>
    Date:   Mon May 8 12:52:56 2017 -0400

    net: Fix inconsistent teardown and release of private netdev state.

    Network devices can allocate resources and private memory using
    netdev_ops->ndo_init().  However, the release of these resources
    can occur in one of two different places.

    Either netdev_ops->ndo_uninit() or netdev->destructor().

    The decision of which operation frees the resources depends upon
    whether it is necessary for all netdev refs to be released before it
    is safe to perform the freeing.

    netdev_ops->ndo_uninit() presumably can occur right after the
    NETDEV_UNREGISTER notifier completes and the unicast and multicast
    address lists are flushed.

    netdev->destructor(), on the other hand, does not run until the
    netdev references all go away.

    Further complicating the situation is that netdev->destructor()
    almost universally also does a free_netdev().

    This creates a problem for the logic in register_netdevice(),
    because all callers of register_netdevice() manage the freeing
    of the netdev and invoke free_netdev(dev) if register_netdevice()
    fails.

    If netdev_ops->ndo_init() succeeds, but something else fails inside
    of register_netdevice(), it does call netdev_ops->ndo_uninit().  But
    it is not able to invoke netdev->destructor().

    This is because netdev->destructor() will do a free_netdev() and
    then the caller of register_netdevice() will do the same.

    However, this means that the resources that would normally be released
    by netdev->destructor() will not be.

    Over the years drivers have added local hacks to deal with this, by
    invoking their destructor parts by hand when register_netdevice()
    fails.

    Many drivers do not try to deal with this, and instead we have leaks.

    Let's close this hole by formalizing the distinction between what
    private things need to be freed up by netdev->destructor() and whether
    the driver needs unregister_netdevice() to perform the free_netdev().

    netdev->priv_destructor() performs all actions to free up the private
    resources that used to be freed by netdev->destructor(), except for
    free_netdev().

    netdev->needs_free_netdev is a boolean that indicates whether
    free_netdev() should be done at the end of unregister_netdevice().

    Now, register_netdevice() can sanely release all resources after
    netdev_ops->ndo_init() succeeds, by invoking both netdev_ops->ndo_uninit()
    and netdev->priv_destructor().

    And at the end of unregister_netdevice(), we invoke
    netdev->priv_destructor() and optionally call free_netdev().
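
The split between releasing private state and freeing the device can be
sketched with a toy model. Everything below is hypothetical ('toy_netdev'
and the function names are made up for illustration). It mirrors only the
two fields named above, priv_destructor and needs_free_netdev, and is not
kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct net_device with the two fields discussed above. */
struct toy_netdev {
    void (*priv_destructor)(struct toy_netdev *dev);
    bool needs_free_netdev;
    int private_resources;   /* Count of driver-private allocations. */
    bool freed;              /* Stands in for free_netdev() being called. */
};

/* Example driver destructor: releases private state only, never the
 * device itself. */
void toy_priv_destructor(struct toy_netdev *dev)
{
    dev->private_resources = 0;
}

/* Error path inside registration: private state is released, but the
 * device is left for the caller to free, avoiding a double free. */
void toy_register_fail(struct toy_netdev *dev)
{
    if (dev->priv_destructor) {
        dev->priv_destructor(dev);
    }
}

/* End of unregistration: release private state, then free the device
 * only if the driver asked for it. */
void toy_unregister_finish(struct toy_netdev *dev)
{
    if (dev->priv_destructor) {
        dev->priv_destructor(dev);
    }
    if (dev->needs_free_netdev) {
        dev->freed = true;
    }
}
```

Because the destructor no longer frees the device, the registration error
path can call it safely, which is the leak the message above describes.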

Signed-off-by: David S. Miller <davem@davemloft.net>
Applied the portion of the commit applicable to openvswitch.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agodatapath: more accurate checksumming in queue_userspace_packet()
Joe Stringer [Fri, 21 Jul 2017 23:46:05 +0000 (16:46 -0700)]
datapath: more accurate checksumming in queue_userspace_packet()

Upstream commit:
    commit 7529390d08f07fbf9b0174c5a87600b5caa1a8e8
    Author: Davide Caratti <dcaratti@redhat.com>
    Date:   Thu May 18 15:44:42 2017 +0200

    openvswitch: more accurate checksumming in queue_userspace_packet()

    If skb carries an SCTP packet and ip_summed is CHECKSUM_PARTIAL, it needs
    CRC32c in place of Internet Checksum: use skb_csum_hwoffload_help to avoid
    corrupting such packets while queueing them towards userspace.
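
To show why the two checksums are not interchangeable, here is a minimal
userspace sketch of both algorithms: the 16-bit one's-complement Internet
checksum and a bitwise CRC32c. The function names are chosen for this
example only; this is not the kernel's skb_csum_hwoffload_help(), which
decides which algorithm a given skb needs.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's-complement Internet checksum (RFC 1071 style) over 'len'
 * bytes.  Intended for packet-sized inputs. */
uint16_t internet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t) data[i] << 8) | data[i + 1];
    }
    if (i < len) {
        sum += (uint32_t) data[i] << 8;   /* Pad an odd trailing byte. */
    }
    while (sum >> 16) {
        sum = (sum & 0xffffu) + (sum >> 16);   /* Fold the carries. */
    }
    return (uint16_t) ~sum;
}

/* Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78. */
uint32_t crc32c(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xffffffffu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR the polynomial in if the low bit was set. */
            crc = (crc >> 1) ^ (0x82f63b78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}
```

The results differ in width as well as value, so filling an SCTP checksum
field with an Internet checksum corrupts the packet.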

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agodatapath: introduce nf_conntrack_helper_put function
Greg Rose [Fri, 21 Jul 2017 23:46:04 +0000 (16:46 -0700)]
datapath: introduce nf_conntrack_helper_put function

Upstream commit:
    commit d91fc59cd77c719f33eda65c194ad8f95a055190
    Author: Liping Zhang <zlpnobody@gmail.com>
    Date:   Sun May 7 22:01:55 2017 +0800

    netfilter: introduce nf_conntrack_helper_put helper function

    Also convert the module_put invocation to nf_conntrack_helper_put. This
    prepares for the follow-up patch, which will add a refcnt for cthelper
    so that we can reject a delete request when the cthelper is in use.

Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Applied with additional use of HAVE_NF_CONNTRACK_HELPER_PUT compatibility
flag defined in acinclude.m4.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agotests: Check ip command whether support udp6zerocsum.
Tonghao Zhang [Fri, 21 Jul 2017 11:34:07 +0000 (04:34 -0700)]
tests: Check ip command whether support udp6zerocsum.

The installed version of ip-route may not support udp6zerocsum for
vxlan6 or geneve6. If we run the kernel check with such a version,
an error message is always printed. Check the ip command before
running the test units.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Eric Garver <e@erig.me>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agoSystem-tests: Improve reliability of an icmp test.
Darrell Ball [Thu, 20 Jul 2017 20:02:43 +0000 (13:02 -0700)]
System-tests: Improve reliability of an icmp test.

One SNAT test is based on a single ping being successful;
to make the result more predictable, static arp binding is now used.
To put less stress on the stack a single arp binding is used for
the reverse direction mapping.  This does not change the goal of the
test, but significantly increases the reliability; I ran the test
100 times without failure.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agoSystem-tests: Allow SNAT address variability retries.
Darrell Ball [Thu, 20 Jul 2017 20:02:42 +0000 (13:02 -0700)]
System-tests: Allow SNAT address variability retries.

Three of the SNAT tests allow for wget retries, which occasionally
happen.  However, these tests did not allow for SNAT address
variability for the retries, which is now tolerated.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agoAUTHORS.rst: Update name and e-mail.
Tonghao Zhang [Wed, 19 Jul 2017 03:44:16 +0000 (20:44 -0700)]
AUTHORS.rst: Update name and e-mail.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agodocs: Note currently used L3 gateway HA approach.
Russell Bryant [Sun, 16 Jul 2017 19:39:56 +0000 (15:39 -0400)]
docs: Note currently used L3 gateway HA approach.

The OVN gateway HA design document is very useful in its current form.
It describes a range of options OVN could take to provide gateway HA.
Leave all the useful discussion in place and add a note to indicate
how the current implementation lines up with the options described.

I plan to follow up with an additional patch to describe the current L3
gateway HA implementation in the ovn-architecture document.

Signed-off-by: Russell Bryant <russell@ovn.org>
Acked-by: Miguel Angel Ajo <majopela@redhat.com>
7 years agodatapath: Fix kernel panic for ovs reassemble.
wangzhike [Thu, 6 Jul 2017 20:57:34 +0000 (13:57 -0700)]
datapath: Fix kernel panic for ovs reassemble.

OVS and the kernel stack would add frag_queue to the same netns_frags list.
As a result, OVS and the kernel may access the frag_queue without the correct
lock. Also, struct ipq may differ on kernels older than 4.3,
which leads to invalid pointer access.

The fix creates a dedicated netns_frags for OVS.

Signed-off-by: wangzhike <wangzhike@jd.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agorhel/systemd: Set ovs-vswitchd timeout to 5 minutes
aaron conole [Thu, 13 Jul 2017 14:51:34 +0000 (10:51 -0400)]
rhel/systemd: Set ovs-vswitchd timeout to 5 minutes

During initialization, it's possible that the startup time takes longer
than the systemd default provided.  Set this to be 5 minutes.  If we
take longer than 5 minutes, maybe something is wrong.

As an example of long initialization, enable DPDK, and allocate large
numbers of hugepages before starting ovs-vswitchd.  The vswitchd can
take two or more minutes to start.  During that time, systemd will decide
that the startup time took too long, and kill the parent process, leading
eventually to an error like:
   ovs|00011|daemon_unix|EMER|pipe write failed (Broken pipe)

And a systemd log like:
   ovs-vswitchd.service start operation timed out. Terminating.

The 5 minutes setting has been observed to work on a system where 400G
of hugepages were allocated.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Markos Chandras <mchandras@suse.de>
Acked-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
7 years agorhel: Fix creation of symlink for ocf script
Timothy Redaelli [Wed, 19 Jul 2017 12:56:28 +0000 (14:56 +0200)]
rhel: Fix creation of symlink for ocf script

The policy is to use %files to track installed files.

If %files is not used the resulting file is not owned by any package.

Before this commit:
 # rpm -qf /usr/lib/ocf/resource.d/ovn/ovndb-servers
 file /usr/lib/ocf/resource.d/ovn/ovndb-servers is not owned by any package

After this commit:
 # rpm -qf /usr/lib/ocf/resource.d/ovn/ovndb-servers
 openvswitch-ovn-common-2.7.90-1.fc26.x86_64

Fixes: a4245b7869c8 ("ovn: Add ovn db servers ocf script in fedora packager")
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
7 years agoovn-architecture: Add notes on L3 gateway HA.
Russell Bryant [Sun, 16 Jul 2017 20:07:12 +0000 (16:07 -0400)]
ovn-architecture: Add notes on L3 gateway HA.

Add some comments to the ovn-architecture document that distributed
gateway ports can also be made highly available.  Provide a brief
overview of the approach and point to the gateway HA design document
for a more detailed discussion of the approach taken.

Signed-off-by: Russell Bryant <russell@ovn.org>
Acked-by: Miguel Angel Ajo <majopela@redhat.com>
7 years agoNEWS: Add OVN L3 Gateway HA.
Russell Bryant [Sun, 16 Jul 2017 18:59:04 +0000 (14:59 -0400)]
NEWS: Add OVN L3 Gateway HA.

Signed-off-by: Russell Bryant <russell@ovn.org>
Acked-by: Miguel Angel Ajo <majopela@redhat.com>
7 years agoodp-execute: Reuse rss hash in OVS_ACTION_ATTR_HASH.
Ilya Maximets [Thu, 13 Jul 2017 15:07:03 +0000 (18:07 +0300)]
odp-execute: Reuse rss hash in OVS_ACTION_ATTR_HASH.

If an RSS hash exists in a packet, it can be reused instead of
recalculating the 5-tuple hash in OVS_ACTION_ATTR_HASH. This
improves the performance of sending packets to an OVS bond
in the userspace datapath by up to 10-15%.
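
The idea can be sketched as follows: use the NIC-provided hash when the
packet carries one, and fall back to a software hash over the header fields
otherwise. This is a toy model under stated assumptions. The struct, its
fields and all function names are hypothetical, and the mixer is not the
hash OVS actually uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packet metadata; not the OVS 'struct dp_packet'. */
struct toy_packet {
    bool rss_valid;          /* True if the NIC supplied an RSS hash. */
    uint32_t rss_hash;
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t proto;
};

/* Toy integer mixer; illustration only. */
uint32_t toy_mix(uint32_t h, uint32_t v)
{
    h ^= v;
    h *= 0x9e3779b1u;
    return h ^ (h >> 15);
}

/* Software fallback: hash computed from the parsed header fields. */
uint32_t toy_hash_5tuple(const struct toy_packet *p, uint32_t basis)
{
    uint32_t h = toy_mix(basis, p->src_ip);

    h = toy_mix(h, p->dst_ip);
    h = toy_mix(h, ((uint32_t) p->src_port << 16) | p->dst_port);
    return toy_mix(h, p->proto);
}

/* Hash action: reuse the RSS hash when present (still mixing in the
 * basis), otherwise compute the 5-tuple hash in software. */
uint32_t toy_action_hash(const struct toy_packet *p, uint32_t basis)
{
    if (p->rss_valid) {
        return toy_mix(basis, p->rss_hash);
    }
    return toy_hash_5tuple(p, basis);
}
```

The fast path touches one integer instead of several header fields, which is
where the saving would come from in this model.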

Additionally, fix the unit test 'select group with dp_hash
selection method' so that it does not depend on the dp_hash value.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
7 years agodpif-netdev: Indicate support for various ct features.
Justin Pettit [Wed, 19 Jul 2017 05:55:35 +0000 (22:55 -0700)]
dpif-netdev: Indicate support for various ct features.

The userspace datapath uses a structure to indicate supported features
that affects debug output.  This commit updates that structure to
indicate that "ct_state_nat", "ct_orig_tuple", and "ct_orig_tuple6" are
supported.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Darrell Ball <dlu998@gmail.com>
7 years agotunneling: Avoid datapath-recirc by combining recirc actions at xlate.
Sugesh Chandran [Wed, 19 Jul 2017 13:46:03 +0000 (14:46 +0100)]
tunneling: Avoid datapath-recirc by combining recirc actions at xlate.

This patch set removes the recirculation of encapsulated tunnel packets
where possible. It does so by computing the post-tunnel actions at the time of
translation. The combined nested action set is programmed in the datapath
using the CLONE action.

The following test results show the performance improvement offered by
this optimization for tunnel encap.

            +-------------+
      dpdk0 |             |
         -->o    br-in    |
            |             o--> gre0
            +-------------+

                   --> LOCAL
            +-----------o-+
            |             | dpdk1
            |    br-p1    o-->
            |             |
            +-------------+

Test result on OVS master with DPDK 16.11.2 (Without optimization):

 # dpdk0

 RX packets         : 7037641.60  / sec
 RX packet errors   : 0  / sec
 RX packets dropped : 7730632.90  / sec
 RX rate            : 402.69 MB/sec

 # dpdk1

 TX packets         : 7037641.60  / sec
 TX packet errors   : 0  / sec
 TX packets dropped : 0  / sec
 TX rate            : 657.73 MB/sec
 TX processing cost per TX packets in nsec : 142.09

Test result on OVS master + DPDK 16.11.2 (With optimization):

 # dpdk0

 RX packets         : 9386809.60  / sec
 RX packet errors   : 0  / sec
 RX packets dropped : 5381496.40  / sec
 RX rate            : 537.11 MB/sec

 # dpdk1

 TX packets         : 9386809.60  / sec
 TX packet errors   : 0  / sec
 TX packets dropped : 0  / sec
 TX rate            : 877.29 MB/sec
 TX processing cost per TX packets in nsec : 106.53

The offered performance gain is approx 30%.
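
The "approx 30%" figure can be cross-checked against the numbers above.
Throughput rises from 7037641.60 to 9386809.60 packets/sec, which is about
33%, and the TX processing cost per packet drops from 142.09 to 106.53 nsec,
which is about 25%. A minimal sketch of that arithmetic follows;
'percent_change' is a hypothetical helper name.

```c
#include <assert.h>

/* Relative change from 'before' to 'after', in percent.  Positive values
 * mean an increase, negative values a decrease. */
double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

Calling percent_change(7037641.60, 9386809.60) gives roughly 33.4, and
percent_change(142.09, 106.53) gives roughly -25.0.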

Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
Co-authored-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agotunneling: Calculate and update packet l4_offset in tunnel push.
Sugesh Chandran [Wed, 19 Jul 2017 13:46:02 +0000 (14:46 +0100)]
tunneling: Calculate and update packet l4_offset in tunnel push.

The tunnel combine patches that follow avoid recirculating packets
after the tunnel push. It is therefore necessary to populate all relevant
packet metadata fields for the combined action set that follows.

Consider a chained tunnel test case shown below,

PKT-IN --> TUNNEL_PUSH --> MOD_PKT_HDR --> TUNNEL_POP

In this example, the last tunnel_pop operation uses the l4_offset in the packet
to validate the packet. So it must be calculated and updated in the packet
before executing the action. Since there is no recirculation from now on, this
calculation is done as part of tunnel_push.

Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
Co-authored-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agoxlate: Clear tunnel mask along with other fields while combine actions.
Sugesh Chandran [Wed, 19 Jul 2017 13:46:01 +0000 (14:46 +0100)]
xlate: Clear tunnel mask along with other fields while combine actions.

The tunnel mask in the translation state should be cleared along with other
context fields. This is necessary in 'apply_nested_clone_actions' because it is
used to combine post-tunnel output actions with tunnel push. This ensures
the correct OpenFlow state while executing the translation.

Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
Co-authored-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agosystem-layer3-tunnels: Add basic GRE ping test case
Eric Garver [Mon, 10 Jul 2017 19:40:00 +0000 (15:40 -0400)]
system-layer3-tunnels: Add basic GRE ping test case

Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agosystem-common-macros: Add macro to check for L3 GRE support
Eric Garver [Mon, 10 Jul 2017 19:39:59 +0000 (15:39 -0400)]
system-common-macros: Add macro to check for L3 GRE support

Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agosystem-layer3-tunnels: Add basic VXLAN-GPE ping test case
Eric Garver [Mon, 10 Jul 2017 19:39:58 +0000 (15:39 -0400)]
system-layer3-tunnels: Add basic VXLAN-GPE ping test case

This also adds a new autotest file specifically for layer3 tunnels.

Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agosystem-common-macros: Add macro to check for VXLAN-GPE support
Eric Garver [Mon, 10 Jul 2017 19:39:57 +0000 (15:39 -0400)]
system-common-macros: Add macro to check for VXLAN-GPE support

Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agosystem-common-macros: Add macro to check for ip-route encap support
Eric Garver [Mon, 10 Jul 2017 19:39:56 +0000 (15:39 -0400)]
system-common-macros: Add macro to check for ip-route encap support

This is used for native layer3 tunnels.

Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agosystem-common-macros: Allow passing config to ADD_OVS_TUNNEL
Eric Garver [Mon, 10 Jul 2017 19:39:55 +0000 (15:39 -0400)]
system-common-macros: Allow passing config to ADD_OVS_TUNNEL

Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agodpif-netlink-rtnl: Support layer3 GRE
Eric Garver [Mon, 10 Jul 2017 19:39:54 +0000 (15:39 -0400)]
dpif-netlink-rtnl: Support layer3 GRE

Add support for creating layer3 GRE.

Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agodpif-netlink-rtnl: Support VXLAN-GPE
Eric Garver [Mon, 10 Jul 2017 19:39:53 +0000 (15:39 -0400)]
dpif-netlink-rtnl: Support VXLAN-GPE

Add support for creating VXLAN tunnels with GPE. This enables layer3
VXLAN tunnels with kernel datapath.

Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agodpif-netlink: For non-Ethernet, use Ethertype from packet_type.
Joe Stringer [Tue, 18 Jul 2017 22:32:44 +0000 (15:32 -0700)]
dpif-netlink: For non-Ethernet, use Ethertype from packet_type.

For non-Ethernet flows, when fixing up the netlink message we need to make
sure to pass down a valid Ethertype. The kernel does not understand
packet_type so it's implicitly encoded by the absence of _ETHERNET and
exact match of _ETHERTYPE. Without this change match_validate in the
kernel complains when trying to match packets from L3 tunnels.
e.g.
  openvswitch: netlink: Unexpected mask (mask=110088, allowed=3d9804c)

The mask used to always be set in xlate_wc_init() and xlate_wc_finish(),
but that changed for non-Ethernet frames with the commit listed below.

Fixes: 3d4b2e6eb74e ("userspace: Add OXM field MFF_PACKET_TYPE")
Signed-off-by: Joe Stringer <joe@ovn.org>
Co-authored-by: Eric Garver <e@erig.me>
Acked-by: Eric Garver <e@erig.me>
7 years agodpif-netlink: Use netlink helpers for packet_type.
Joe Stringer [Tue, 18 Jul 2017 22:32:43 +0000 (15:32 -0700)]
dpif-netlink: Use netlink helpers for packet_type.

Rather than open-coding access to netlink attribute pointers in
put_exclude_packet_type(), make use of the netlink attribute helpers.
This simplifies the following bugfix.

Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Eric Garver <e@erig.me>
7 years agodatapath: enable VxLAN-gpe port creation in compat mode
Yang, Yi Y [Fri, 7 Jul 2017 03:02:13 +0000 (11:02 +0800)]
datapath: enable VxLAN-gpe port creation in compat mode

In compat mode, OVS cannot create an L3 VxLAN-gpe port on old
kernels if port creation via rtnetlink fails. This patch
enables old kernels to create an L3 VxLAN-gpe port.

Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
7 years agoodp-util: Document size of OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4.
Justin Pettit [Wed, 19 Jul 2017 04:49:39 +0000 (21:49 -0700)]
odp-util: Document size of OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4.

This attribute is exclusive of OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6 so it
doesn't take up additional space (IPv6 is larger), but it's still worth
documenting.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
7 years agoovn: l3ha truncate log file in tests before action starts
majopela@redhat.com [Tue, 18 Jul 2017 14:55:03 +0000 (14:55 +0000)]
ovn: l3ha truncate log file in tests before action starts

A specific ovn/l3ha test looks for port releases in the MASTER
gateway when another gateway on the Gateway_Chassis list is gone.

But we didn't clean the log before taking out the chassis, so
any previous unrelated occurrence could make the test fail
even though it was OK.

Signed-off-by: Miguel Angel Ajo <majopela@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>