git.proxmox.com Git - ovs.git/log
4 years ago Documentation: Fix kernel support matrix
Greg Rose [Tue, 19 May 2020 22:01:46 +0000 (15:01 -0700)]
Documentation: Fix kernel support matrix

The documentation matrix for OVS branches and which kernels they support
is out of date.  Update it to show that since 2.10 the lowest kernel
that we test and support is Linux 3.16.

RHEL and CentOS kernels based upon the original 3.10 kernel are still
supported.

Reported-by: Han Zhou <hzhou@ovn.org>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-May/370742.html
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago netdev-offload-tc: Re-fetch block ID after probing.
Aaron Conole [Fri, 15 May 2020 20:36:19 +0000 (16:36 -0400)]
netdev-offload-tc: Re-fetch block ID after probing.

It's possible that block_id could change after the probe for block
support.  Therefore, fetch the block_id again after the probe.

Fixes: edc2055a2bf7 ("netdev-offload-tc: Flush rules on ingress block when init tc flow api")
Cc: Dmytro Linkin <dmitrolin@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Co-authored-by: Marcelo Leitner <mleitner@redhat.com>
Signed-off-by: Marcelo Leitner <mleitner@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years ago netdev-linux: Update LAG in all cases.
Aaron Conole [Fri, 15 May 2020 20:36:18 +0000 (16:36 -0400)]
netdev-linux: Update LAG in all cases.

In some cases, when processing a netlink change event, it's possible for
an alternate part of OvS (like the IPv6 endpoint processing) to hold an
active netdev interface.  This creates a race-condition, where sometimes
the OvS change processing will take the normal path.  This doesn't work
because the netdev device object won't actually be enslaved to the
ovs-system (for instance, a linux bond) and ingress qdisc entries will
be missing.

To address this, we update the LAG information in ALL cases where
LAG information could come in.

Fixes: d22f8927c3c9 ("netdev-linux: monitor and offload LAG slaves to TC")
Cc: Marcelo Leitner <mleitner@redhat.com>
Cc: John Hurley <john.hurley@netronome.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years ago debian: Add python3-sphinx to ovs build dependencies
Ansis Atteka [Fri, 15 May 2020 19:08:13 +0000 (12:08 -0700)]
debian: Add python3-sphinx to ovs build dependencies

python3-sphinx has become a mandatory build dependency since patch
39b5e46 ("Documentation: Convert multiple manpages to ReST.").  Without
it installed, building the OVS debian packages fails with an error that
the generated man pages can't be found.

Fixes: 39b5e46312 ("Documentation: Convert multiple manpages to ReST.")
CC: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ansis Atteka <aatteka@ovn.org>
Reported-by: Artem Teleshev <artem.teleshev@gmail.com>
Acked-by: Greg Rose <gvrose8192@gmail.com>
4 years ago ofproto: Fix statistics of removed flow.
Ilya Maximets [Thu, 14 May 2020 18:20:56 +0000 (20:20 +0200)]
ofproto: Fix statistics of removed flow.

'fr' is a new variable on the stack.  '+=' here adds the real statistics
to uninitialized stack memory.

Fixes: 164413156cf9 ("Add offload packets statistics")
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years ago debian: Fix package dependencies
Roi Dayan [Thu, 14 May 2020 13:25:10 +0000 (16:25 +0300)]
debian: Fix package dependencies

The Python 2 package was python-twisted-conch, but for Python 3 it
appears to be just python3-twisted.
For the zope interface, the Python 3 package name is python3-zope.interface.

Fixes: 1ca0323e7c29 ("Require Python 3 and remove support for Python 2.")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-by: Ansis Atteka <aatteka@ovn.org>
4 years ago oss-fuzz: Fix miniflow_target.c.
William Tu [Tue, 12 May 2020 15:22:31 +0000 (08:22 -0700)]
oss-fuzz: Fix miniflow_target.c.

Clang reports:
tests/oss-fuzz/miniflow_target.c:209:26: error: suggest braces around \
initialization of subobject
      [-Werror,-Wmissing-braces]
          struct flow flow2 = {0};

Fix it by using memset.
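
The memset approach can be sketched as follows.  This is only an
illustration, not the code from miniflow_target.c: 'struct demo_flow' and
'demo_flow_zero()' are hypothetical stand-ins for 'struct flow', chosen so
that the first member is itself an aggregate, which is the shape that makes
'= {0}' trip -Wmissing-braces under Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for 'struct flow': the first member is itself a
 * struct, so '= {0}' would initialize a subobject without braces. */
struct demo_tunnel {
    unsigned int ip_dst;
    unsigned int ip_src;
};

struct demo_flow {
    struct demo_tunnel tunnel;
    unsigned int in_port;
};

/* Zeroes every byte of 'flow', padding included, without relying on a
 * brace initializer. */
static void
demo_flow_zero(struct demo_flow *flow)
{
    assert(flow != NULL);
    memset(flow, 0, sizeof *flow);
}
```

A caller would declare 'struct demo_flow flow2;' and then call
'demo_flow_zero(&flow2);' before using it.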

Cc: Bhargava Shastry <bshastry@sect.tu-berlin.de>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago metaflow: Fix maskable conntrack orig tuple fields
Yi-Hung Wei [Wed, 13 May 2020 20:11:17 +0000 (13:11 -0700)]
metaflow: Fix maskable conntrack orig tuple fields

From man ovs-fields(7), the conntrack origin tuple fields
ct_nw_src/dst, ct_ipv6_src/dst, and ct_tp_src/dst are supposed
to be bitwise maskable, but they are not.  This patch enables
those fields to be maskable, and adds a regression test.
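
As a reminder of what "bitwise maskable" means for a match field, here is
a minimal sketch.  It is not OVS code: 'demo_masked_match()' and
'demo_prefix_mask()' are hypothetical helpers, and values are plain
host-order 32-bit integers for simplicity.  A bitwise-maskable field
accepts any mask, of which a CIDR prefix is just one case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if 'packet_value' equals 'match_value' on every bit that is
 * set in 'mask'.  Bits cleared in 'mask' are wildcarded. */
static bool
demo_masked_match(uint32_t packet_value, uint32_t match_value, uint32_t mask)
{
    return ((packet_value ^ match_value) & mask) == 0;
}

/* Builds the mask for a CIDR prefix of 'plen' bits, e.g. 8 -> 0xff000000. */
static uint32_t
demo_prefix_mask(unsigned int plen)
{
    assert(plen <= 32);
    return plen ? UINT32_MAX << (32 - plen) : 0;
}
```

With these helpers, a match such as ct_nw_src=10.0.0.0/8 corresponds to
'demo_masked_match(addr, 0x0a000000, demo_prefix_mask(8))'.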

Fixes: daf4d3c18da4 ("odp: Support conntrack orig tuple key.")
Reported-by: Wenying Dong <wenyingd@vmware.com>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago tests: Add tests using tap device.
William Tu [Tue, 24 Mar 2020 22:10:51 +0000 (15:10 -0700)]
tests: Add tests using tap device.

Similar to using veth across namespaces, this patch creates
tap devices, assigns them to namespaces, and allows traffic to
go through different test cases.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago userspace: Enable TSO support for non-DPDK.
William Tu [Tue, 24 Mar 2020 22:10:50 +0000 (15:10 -0700)]
userspace: Enable TSO support for non-DPDK.

This patch enables TSO support for non-DPDK use cases, and
also adds the check-system-tso testsuite. Before TSO, we had to
disable checksum offload, allowing the kernel to calculate the
TCP/UDP packet checksum. With TSO, we can skip the checksum
validation by enabling checksum offload, and with large packet
sizes, we see better performance.

Consider container to container use cases:
  iperf3 -c (ns0) -> veth peer -> OVS -> veth peer -> iperf3 -s (ns1)
I got around 6Gbps, similar to TSO with DPDK enabled.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago debian: Fix broken build after some man pages became generated from RST
Ansis Atteka [Wed, 13 May 2020 17:44:11 +0000 (10:44 -0700)]
debian: Fix broken build after some man pages became generated from RST

As far as I know, the official way to build debian packages is by invoking
the following command:

> fakeroot debian/rules binary

However, that command started to fail with these errors:

dh_installman --language=C
dh_installman: Cannot find (any matches for) "utilities/ovs-appctl.8" (tried in .)
dh_installman: Cannot find (any matches for) "utilities/ovs-l3ping.8" (tried in .)
dh_installman: Cannot find (any matches for) "utilities/ovs-tcpdump.8" (tried in .)

because the generated manpages are not part of the source tree anymore.  This
patch updates debian *.manpages files to point to the generated files.

Fixes: 39b5e46312 ("Documentation: Convert multiple manpages to ReST.")
CC: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ansis Atteka <aatteka@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years ago RAFT: Add clarifying note for cluster/leave operation.
Mark Michelson [Fri, 8 May 2020 21:00:27 +0000 (17:00 -0400)]
RAFT: Add clarifying note for cluster/leave operation.

We had a user express confusion about the state of a cluster after using
cluster/leave. The user had a three server cluster and used
cluster/leave to remove two servers from the cluster. The user expected
that the single server left would not function since the quorum of two
servers for a three server cluster was not met.

In actuality, cluster/leave removes the server from the cluster and
alters the cluster size in the process. Thus the single remaining server
continued to function since quorum was reached.

This documentation change makes it a bit more explicit that
cluster/leave alters the size of the cluster and cites the three server
down to one server case as an example.
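
The arithmetic behind that example, as a small sketch.  It assumes the
usual Raft rule that quorum is a strict majority of the current cluster
size; 'demo_raft_quorum()' is a hypothetical helper, not a function from
ovsdb/raft.c.

```c
#include <assert.h>
#include <stddef.h>

/* Number of servers that must agree for a cluster of 'n_servers' members
 * to make progress: a strict majority of the current membership. */
static size_t
demo_raft_quorum(size_t n_servers)
{
    assert(n_servers > 0);
    return n_servers / 2 + 1;
}
```

A three-server cluster needs 2 servers.  After two cluster/leave
operations the cluster size is 1, so quorum is 1 and the remaining server
keeps working.  Had the two servers simply crashed, the size would still
be 3 and the survivor could not reach the quorum of 2.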

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1798158
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago raft: Disable RAFT jsonrpc inactivity probe.
Zhen Wang [Tue, 31 Mar 2020 00:21:04 +0000 (17:21 -0700)]
raft: Disable RAFT jsonrpc inactivity probe.

In a scale test with a 640-node k8s cluster, the raft DB nodes' jsonrpc
sessions got closed because the default 5-second inactivity probe timed
out, which disturbs the raft cluster. Since RAFT already has its own
heartbeat, just disable the probe between the servers to avoid the
unnecessary jsonrpc inactivity probe.

Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Zhen Wang <zhewang@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years ago ovsdb-idl: Fix NULL deref reported by Coverity.
William Tu [Sat, 2 May 2020 16:24:30 +0000 (09:24 -0700)]
ovsdb-idl: Fix NULL deref reported by Coverity.

When 'datum.values' or 'datum.keys' is NULL, some code paths calling
into ovsdb_idl_txn_write__ trigger a NULL deref.  An example is below:

ovsrec_open_vswitch_set_cur_cfg(const struct ovsrec_open_vswitch
{
    struct ovsdb_datum datum;
    union ovsdb_atom key;

    datum.n = 1;
    datum.keys = &key;

    key.integer = cur_cfg;
//  1. assign_zero: Assigning: datum.values = NULL.
    datum.values = NULL;
//  CID 1421356 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
//  2. var_deref_model: Passing &datum to ovsdb_idl_txn_write_clone,\
//     which dereferences null datum.values.
    ovsdb_idl_txn_write_clone(&row->header_, &ovsrec_open_vswitch_col
}

And with the following calls:
ovsdb_idl_txn_write_clone
  ovsdb_idl_txn_write__
    6. deref_parm_in_call: Function ovsdb_datum_destroy dereferences
       datum->values
ovsdb_datum_destroy

And another possible NULL deref is at ovsdb_datum_equals(). Fix the
two by adding additional checks.
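
The kind of check involved can be sketched like this.  It is a simplified
illustration only: 'struct demo_datum' is a hypothetical stand-in for
'struct ovsdb_datum' with integer arrays, and 'demo_datum_sum_values()' is
not an OVS function.  The point is that a NULL 'values' array is treated
as "no values" rather than dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for 'struct ovsdb_datum'. */
struct demo_datum {
    unsigned int n;
    long long *keys;     /* 'n' keys, or NULL. */
    long long *values;   /* 'n' values, or NULL for a set-like column. */
};

/* Sums the values of 'datum', treating a NULL 'values' array as empty
 * instead of dereferencing it. */
static long long
demo_datum_sum_values(const struct demo_datum *datum)
{
    long long sum = 0;

    assert(datum != NULL);
    if (datum->values) {
        for (unsigned int i = 0; i < datum->n; i++) {
            sum += datum->values[i];
        }
    }
    return sum;
}
```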

Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago ovsdb-idlc: Fix memory leak reported by Coverity.
William Tu [Sat, 2 May 2020 16:08:26 +0000 (09:08 -0700)]
ovsdb-idlc: Fix memory leak reported by Coverity.

An example pattern is shown below:
void
ovsrec_ct_zone_index_set_external_ids(const struct ovsrec_ct_zone...
{
//  1. alloc_fn: Storage is returned from allocation function xmalloc.
//  2. var_assign: Assigning: datum = storage returned from xmalloc(24UL).
    struct ovsdb_datum *datum = xmalloc(sizeof(struct ovsdb_datum));

//  3. Condition external_ids, taking false branch.
    if (external_ids) {
...
    } else {
//  4. noescape: Resource datum is not freed or pointed-to in ovsdb_datum_init_empty.
        ovsdb_datum_init_empty(datum);
    }
//  5. noescape: Resource datum is not freed or pointed-to in ovsdb_idl_index_write.
    ovsdb_idl_index_write(CONST_CAST(struct ovsdb_idl_row *, &row->header_),
                          &ovsrec_ct_zone_columns[OVSREC_CT_ZONE_COL_EXTERNAL_IDS],
                          datum,
                          &ovsrec_table_classes[OVSREC_TABLE_CT_ZONE]);

// CID 1420856 (#1 of 1): Resource leak (RESOURCE_LEAK)
// 6. leaked_storage: Variable datum going out of scope leaks the storage it
//    points to.

Fix it by freeing the datum.

Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago ovsdb-idlc: Fix memory leak reported by Coverity.
William Tu [Sat, 2 May 2020 16:01:48 +0000 (09:01 -0700)]
ovsdb-idlc: Fix memory leak reported by Coverity.

Coverity shows the following memory leak in this code pattern:

void
ovsrec_ipfix_index_set_obs_domain_id(...
{
    struct ovsdb_datum datum;
//     1. alloc_fn: Storage is returned from allocation function xmalloc.
//     2. var_assign: Assigning: key = storage returned from xmalloc(16UL).
    union ovsdb_atom *key = xmalloc(sizeof(union ovsdb_atom));

//     3. Condition n_obs_domain_id, taking false branch.
    if (n_obs_domain_id) {
        datum.n = 1;
        datum.keys = key;
        key->integer = *obs_domain_id;
    } else {
        datum.n = 0;
        datum.keys = NULL;
    }
    datum.values = NULL;
    ovsdb_idl_index_write(CONST_CAST(struct ovsdb_idl_row *, &row->head...
//     CID 1420891 (#1 of 1): Resource leak (RESOURCE_LEAK)

Fix it by moving the xmalloc to the true branch.
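
The shape of that fix, as a minimal sketch.  It does not reproduce the
generated ovsdb-idlc code: 'struct demo_opt_datum' and
'demo_datum_from_optional()' are hypothetical, and plain malloc() with
abort() on failure stands in for xmalloc().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified datum holding at most one integer key. */
struct demo_opt_datum {
    unsigned int n;
    long long *keys;
};

/* Fills 'datum' from an optional integer.  The key is allocated only in
 * the branch that stores it, so the empty case has nothing to leak.  The
 * caller owns 'datum->keys' and must free() it. */
static void
demo_datum_from_optional(struct demo_opt_datum *datum, const long long *value)
{
    assert(datum != NULL);
    if (value) {
        long long *key = malloc(sizeof *key);

        if (!key) {
            abort();    /* Allocation failure is fatal in this sketch. */
        }
        *key = *value;
        datum->n = 1;
        datum->keys = key;
    } else {
        datum->n = 0;
        datum->keys = NULL;
    }
}
```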

Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago sparse: Fix typo in DPDK endian conversion macros.
David Marchand [Tue, 28 Apr 2020 12:03:53 +0000 (14:03 +0200)]
sparse: Fix typo in DPDK endian conversion macros.

This header is duplicated from the DPDK generic header.
Fix typo identified in DPDK [1].

While at it, RTE_EXEC_ENV_BSDAPP has been replaced with
RTE_EXEC_ENV_FREEBSD in 19.05 [2].

1: https://git.dpdk.org/dpdk/commit/?id=a3e283ed904c
2: https://git.dpdk.org/dpdk/commit/?id=5fbc1d498f54

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years ago raft: Fix leak of the incomplete command.
Ilya Maximets [Mon, 4 May 2020 19:55:41 +0000 (21:55 +0200)]
raft: Fix leak of the incomplete command.

Function raft_command_initiate() returns a correctly referenced command
instance.  'n_ref' equals 1 for complete commands and 2 for incomplete
commands because one more reference is in the raft->commands list.
raft_handle_execute_command_request__() leaks the reference by not
returning the pointer anywhere and not unreferencing incomplete commands.

 792 bytes in 11 blocks are definitely lost in loss record 258 of 262
    at 0x483BB1A: calloc (vg_replace_malloc.c:762)
    by 0x44BA32: xcalloc (util.c:121)
    by 0x422E5F: raft_command_create_incomplete (raft.c:2038)
    by 0x422E5F: raft_command_initiate (raft.c:2061)
    by 0x428651: raft_handle_execute_command_request__ (raft.c:4161)
    by 0x428651: raft_handle_execute_command_request (raft.c:4177)
    by 0x428651: raft_handle_rpc (raft.c:4230)
    by 0x428651: raft_conn_run (raft.c:1445)
    by 0x428DEA: raft_run (raft.c:1803)
    by 0x407392: main_loop (ovsdb-server.c:226)
    by 0x407392: main (ovsdb-server.c:469)
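
The reference counting described above can be modeled with a small sketch.
It is hypothetical and much simpler than raft.c: 'struct demo_command',
'demo_command_create()' and 'demo_command_unref()' are illustrative names
only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted command. */
struct demo_command {
    unsigned int n_ref;
};

/* Creates a command holding 'initial_refs' references, e.g. 2 for an
 * incomplete command: one for the caller, one for a pending list. */
static struct demo_command *
demo_command_create(unsigned int initial_refs)
{
    struct demo_command *cmd = malloc(sizeof *cmd);

    if (!cmd) {
        abort();    /* Allocation failure is fatal in this sketch. */
    }
    cmd->n_ref = initial_refs;
    return cmd;
}

/* Drops one reference.  Returns true if it was the last one, in which
 * case 'cmd' has been freed and must not be used again. */
static bool
demo_command_unref(struct demo_command *cmd)
{
    assert(cmd != NULL && cmd->n_ref > 0);
    if (--cmd->n_ref == 0) {
        free(cmd);
        return true;
    }
    return false;
}
```

A caller that receives such a command and neither stores nor returns the
pointer has to drop its own reference; otherwise the count never reaches
zero when the list later releases its reference, which is the leak
valgrind reports.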

Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago netdev-afxdp: Fix missing init.
William Tu [Mon, 4 May 2020 16:26:24 +0000 (09:26 -0700)]
netdev-afxdp: Fix missing init.

When the interrupt mode for netdev-afxdp was introduced, the netdev
init function was accidentally removed.  Fix it by adding it back.

Fixes: 5bfc519fee499 ("netdev-afxdp: Add interrupt mode netdev class.")
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago docs: Document check_pkt_len action.
William Tu [Fri, 1 May 2020 15:42:25 +0000 (08:42 -0700)]
docs: Document check_pkt_len action.

Cc: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Numan Siddique <numans@ovn.org>
4 years ago userspace: Add conntrack timeout policy support.
William Tu [Wed, 29 Apr 2020 19:25:11 +0000 (12:25 -0700)]
userspace: Add conntrack timeout policy support.

Commit 1f1613183733 ("ct-dpif, dpif-netlink: Add conntrack timeout
policy support") adds conntrack timeout policy for kernel datapath.
This patch enables support for the userspace datapath.  I tested
using the 'make check-system-userspace' which checks the timeout
policies for ICMP and UDP cases.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
4 years ago compat: Fix ipv6_dst_lookup build error
Yi-Hung Wei [Wed, 29 Apr 2020 21:25:50 +0000 (14:25 -0700)]
compat: Fix ipv6_dst_lookup build error

The geneve/vxlan compat code base invokes ipv6_dst_lookup(), which was
recently replaced by ipv6_dst_lookup_flow() in the stable kernel tree.

This causes a travis build failure:
    * https://travis-ci.org/github/openvswitch/ovs/builds/681084038

This patch updates the backport logic to invoke the right function.

Related patch in
    git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git

b9f3e457098e ("net: ipv6_stub: use ip6_dst_lookup_flow instead of
               ip6_dst_lookup")

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago AUTHORS: Add Jiang Lidong.
William Tu [Thu, 30 Apr 2020 14:40:36 +0000 (07:40 -0700)]
AUTHORS: Add Jiang Lidong.

Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago netdev-linux: remove sum of vport stats and kernel netdev stats.
Jiang Lidong [Thu, 23 Apr 2020 05:35:14 +0000 (05:35 +0000)]
netdev-linux: remove sum of vport stats and kernel netdev stats.

When using a kernel veth as an OVS interface, a doubled drop counter
value is shown when the veth drops packets due to traffic overrun.

netdev_linux_get_stats reads both vport stats and kernel netdev
stats, in case retrieving the vport stats fails. If both succeed,
the error counters are added together to include errors from
different layers. But the implementation of ovs_vport_get_stats in
the kernel datapath already includes kernel netdev stats by calling
dev_get_stats. When a drop or other error counter is not zero,
its value is doubled by netdev_linux_get_stats.

This change stops adding kernel netdev stats into vport stats,
since vport stats include all the information in the kernel
netdev stats.

Signed-off-by: Jiang Lidong <jianglidong3@jd.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago ovs-bugtool: Add ethtool -l for combined channel.
William Tu [Wed, 29 Apr 2020 17:30:26 +0000 (10:30 -0700)]
ovs-bugtool: Add ethtool -l for combined channel.

Users of netdev-afxdp have to set up the combined channel
on the physical NIC. This helps debug related issues.
Example output:
  $ ethtool -l enp3s0f0
  Channel parameters for enp3s0f0:
  Pre-set maximums:
  RX:        0
  TX:        0
  Other:     1
  Combined:  63
  Current hardware settings:
  RX:        0
  TX:        0
  Other:     1
  Combined:  1

Some previous discussion:
https://mail.openvswitch.org/pipermail/ovs-dev/2020-January/366631.html

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
4 years ago bugtool: Add dump-tlv-map.
William Tu [Wed, 29 Apr 2020 16:14:54 +0000 (09:14 -0700)]
bugtool: Add dump-tlv-map.

This helps debug tlv map issues.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
4 years ago ofp-actions: Add delete field action
Yi-Hung Wei [Tue, 14 Apr 2020 20:33:28 +0000 (13:33 -0700)]
ofp-actions: Add delete field action

This patch adds a new OpenFlow action, delete field, to delete a
field in packets.  Currently, only the tun_metadata fields are
supported.

One use case for this action is to support multiple versions
of geneve tunnel metadata exchanged among networks of different
versions.  For example, we may introduce tun_metadata2 to
replace the old tun_metadata1, but still want to provide backward
compatibility with the older release.  In this case, in the new
OpenFlow pipeline, we would like to receive a packet with
tun_metadata1 and do some processing.  And if the packet
is going to a switch in the newer release, we would like to delete
the value in tun_metadata1 and set a value into tun_metadata2.

Currently, ovs does not provide an action to remove a value in
tun_metadata if the value is present.  This patch fills the gap
by adding the delete_field action.  For example, the OpenFlow
syntax to delete tun_metadata1 is:

    actions=delete_field:tun_metadata1

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
4 years ago AUTHORS: Add Anton Ivanov.
William Tu [Mon, 27 Apr 2020 15:49:13 +0000 (08:49 -0700)]
AUTHORS: Add Anton Ivanov.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Numan Siddique <numans@ovn.org>
4 years ago conntrack: Fix icmp conntrack state.
William Tu [Mon, 27 Apr 2020 15:42:29 +0000 (08:42 -0700)]
conntrack: Fix icmp conntrack state.

ICMP conntrack state should be ICMPS_REPLY after seeing both
sides of ICMP traffic.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
4 years ago docs: Fix GTP-U release version.
William Tu [Mon, 27 Apr 2020 15:45:19 +0000 (08:45 -0700)]
docs: Fix GTP-U release version.

GTP-U support should be at OVS-2.14.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
4 years ago netdev-afxdp: Add interrupt mode netdev class.
William Tu [Tue, 14 Apr 2020 13:22:55 +0000 (06:22 -0700)]
netdev-afxdp: Add interrupt mode netdev class.

The patch adds a new netdev class 'afxdp-nonpmd' to enable afxdp
interrupt mode. This is similar to 'type=afxdp', except that the
is_pmd field is set to false. As a result, the packet processing
is handled by the main thread, not a pmd thread. This avoids keeping
the CPU at 100% when there is no traffic.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years ago ovsdb: Remove duplicated function defintions
Yi-Hung Wei [Tue, 21 Apr 2020 22:09:05 +0000 (15:09 -0700)]
ovsdb: Remove duplicated function defintions

ovsdb_function_from_string() and ovsdb_function_to_string() are defined
both in ovsdb/condition.c and lib/ovsdb-condition.c with the same function
definition.  Remove the one in ovsdb/condition.c to avoid duplication.

This also resolves the following bazel building error.

./libopenvswitch.lo(ovsdb-condition.pic.o): In function `ovsdb_function_from_string':
/lib/ovsdb-condition.c:24: multiple definition of `ovsdb_function_from_string'
./libovsdb.a(condition.pic.o):/proc/self/cwd/external/openvswitch_repo/ovsdb/condition.c:34: first defined here
./libopenvswitch.lo(ovsdb-condition.pic.o): In function `ovsdb_function_from_string':
./lib/ovsdb-condition.c:24: multiple definition of `ovsdb_function_to_string'
./libovsdb.a(condition.pic.o):/proc/self/cwd/external/openvswitch_repo/ovsdb/condition.c:335

Reported-by: Harold Lim <haroldl@vmware.com>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago ovsdb: Switch ovsdb log fsync to data only.
Anton Ivanov [Tue, 21 Apr 2020 08:23:57 +0000 (09:23 +0100)]
ovsdb: Switch ovsdb log fsync to data only.

We do not check metadata (mtime, atime) anywhere, so we
do not need to update it every time we sync the log.
If the system supports it, the log update should be
data only.
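
A minimal sketch of a data-only sync with a fallback, assuming a POSIX
system.  It is not the ovsdb log code: 'demo_sync_log_data()' and
'demo_append_and_sync()' are hypothetical helpers, and the
_POSIX_SYNCHRONIZED_IO test is just one way to detect fdatasync() (a
build system would more likely use a configure check).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Flushes the data of 'fd' to stable storage.  Uses fdatasync() where the
 * system advertises it, which may skip metadata such as mtime and atime;
 * falls back to fsync() otherwise.  Returns 0 on success, -1 on error. */
static int
demo_sync_log_data(int fd)
{
    assert(fd >= 0);
#if defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO > 0
    return fdatasync(fd);
#else
    return fsync(fd);
#endif
}

/* Appends 'size' bytes from 'data' to the file at 'path' and syncs the
 * data.  Returns 0 on success, -1 on error. */
static int
demo_append_and_sync(const char *path, const void *data, size_t size)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    int error = -1;

    if (fd < 0) {
        return -1;
    }
    if (write(fd, data, size) == (ssize_t) size) {
        error = demo_sync_log_data(fd);
    }
    close(fd);
    return error;
}
```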

Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago vlog: Fast path in vlog.
Anton Ivanov [Tue, 21 Apr 2020 08:24:38 +0000 (09:24 +0100)]
vlog: Fast path in vlog.

Avoid grabbing any mutexes if the log levels specify that
no logging is to take place.
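
The idea can be sketched as below.  This is an illustration under
assumptions, not the vlog implementation: the names are hypothetical, a
single global level replaces vlog's per-module, per-destination levels,
and a counter stands in for actually formatting and writing a message.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

enum demo_level { DEMO_OFF, DEMO_ERR, DEMO_WARN, DEMO_INFO, DEMO_DBG };

static pthread_mutex_t demo_log_mutex = PTHREAD_MUTEX_INITIALIZER;
static atomic_int demo_max_level = DEMO_OFF;  /* Most verbose enabled level. */
static unsigned int demo_n_logged;            /* Protected by demo_log_mutex. */

/* Returns true if a message at 'level' was recorded.  The level check is
 * a lock-free atomic load, so disabled messages never touch the mutex. */
static bool
demo_log(enum demo_level level)
{
    assert(level > DEMO_OFF && level <= DEMO_DBG);
    if ((int) level > atomic_load_explicit(&demo_max_level,
                                           memory_order_relaxed)) {
        return false;   /* Fast path: nothing to log, no lock taken. */
    }

    pthread_mutex_lock(&demo_log_mutex);
    demo_n_logged++;    /* Stand-in for formatting and writing. */
    pthread_mutex_unlock(&demo_log_mutex);
    return true;
}
```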

Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago Utilities: Add the ovs_dump_ofpacts command to gdb
Eelco Chaudron [Fri, 17 Apr 2020 12:51:34 +0000 (14:51 +0200)]
Utilities: Add the ovs_dump_ofpacts command to gdb

This adds the ovs_dump_ofpacts command:

(gdb) help ovs_dump_ofpacts
Dump all actions in an ofpacts set
    Usage: ovs_dump_ofpacts <struct ofpact *> <ofpacts_len>

       <struct ofpact *> : Pointer to set of ofpact structures.
       <ofpacts_len>     : Total length of the set.

    Example dumping all actions when in the clone_xlate_actions() function:

    (gdb) ovs_dump_ofpacts actions actions_len
    (struct ofpact *) 0x561c7be487c8: {type = OFPACT_SET_FIELD, raw = 255 '', len = 24}
    (struct ofpact *) 0x561c7be487e0: {type = OFPACT_SET_FIELD, raw = 255 '', len = 24}
    (struct ofpact *) 0x561c7be487f8: {type = OFPACT_SET_FIELD, raw = 255 '', len = 24}
    (struct ofpact *) 0x561c7be48810: {type = OFPACT_SET_FIELD, raw = 255 '', len = 32}
    (struct ofpact *) 0x561c7be48830: {type = OFPACT_SET_FIELD, raw = 255 '', len = 24}
    (struct ofpact *) 0x561c7be48848: {type = OFPACT_RESUBMIT, raw = 38 '&', len = 16}

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago Utilities: make print() in gdb script work on all version of Python
Eelco Chaudron [Tue, 21 Apr 2020 11:33:55 +0000 (13:33 +0200)]
Utilities: make print() in gdb script work on all version of Python

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago fatal-signal: Remove snprintf.
William Tu [Tue, 14 Apr 2020 15:17:04 +0000 (08:17 -0700)]
fatal-signal: Remove snprintf.

Function snprintf is not async-signal-safe.  Replace it with
our own implementation.  Example ovs-vswitchd.log output:
  2020-03-25T01:08:19.673Z|00050|memory|INFO|handlers:2 ports:3
  SIGSEGV detected, backtrace:
  0x4872d9         <fatal_signal_handler+0x49>
  0x7f4e2ab974b0   <killpg+0x40>
  0x7f4e2ac5d74d   <__poll+0x2d>
  0x531098         <time_poll+0x108>
  0x51aefc         <poll_block+0x8c>
  0x445ca9         <udpif_revalidator+0x289>
  0x5056fd         <ovsthread_wrapper+0x7d>
  0x7f4e2b65f6ba   <start_thread+0xca>
  0x7f4e2ac6941d   <clone+0x6d>
  0x0              <+0x0>
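
For illustration, an address like the ones in the backtrace above can be
formatted without snprintf using only local arithmetic.  This is a
hypothetical sketch ('demo_format_hex()' is not the function added by this
patch); the resulting buffer could then be passed to write(2), which is
async-signal-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Formats 'value' as "0x<hex>" into 'buf', NUL-terminated, without
 * calling any library function.  Returns the number of characters
 * written, excluding the NUL, or 0 if 'size' is too small. */
static size_t
demo_format_hex(uintptr_t value, char *buf, size_t size)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof value];     /* Hex digits, least significant first. */
    size_t n = 0;
    size_t len = 0;

    /* Programming-error check only; a plain comparison when it holds. */
    assert(buf != NULL);

    do {
        tmp[n++] = digits[value & 0xf];
        value >>= 4;
    } while (value);

    if (size < n + 3) {             /* "0x" + digits + NUL. */
        return 0;
    }
    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0) {
        buf[len++] = tmp[--n];
    }
    buf[len] = '\0';
    return len;
}
```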

Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/674901331
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago conntrack: Add coverage count for l4csum error.
William Tu [Thu, 16 Apr 2020 19:54:53 +0000 (12:54 -0700)]
conntrack: Add coverage count for l4csum error.

Add a coverage counter when userspace conntrack receives a packet
with an invalid l4 checksum.  When using veth for testing, users
often forget to turn off tx offload on the other side of the
namespace, so the l4 checksum is not calculated in the packet header
and conntrack returns an invalid conntrack state.

Suggested-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
4 years ago acinclude: handle dependencies for DPDK's AF_XDP PMD
Ciara Loftus [Mon, 10 Feb 2020 13:48:54 +0000 (13:48 +0000)]
acinclude: handle dependencies for DPDK's AF_XDP PMD

If RTE_LIBRTE_AF_XDP is enabled in the DPDK build, OVS must link
the libbpf library, otherwise build failures will occur.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago acinclude: handle dependencies for DPDK's PCAP PMD
Ciara Loftus [Mon, 10 Feb 2020 13:48:53 +0000 (13:48 +0000)]
acinclude: handle dependencies for DPDK's PCAP PMD

If RTE_LIBRTE_PMD_PCAP is enabled in the DPDK build, OVS must link
the pcap library, otherwise build failures will occur.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago compat: Fix broken partial backport of extack op parameter
Greg Rose [Tue, 14 Apr 2020 18:42:10 +0000 (11:42 -0700)]
compat: Fix broken partial backport of extack op parameter

A series of commits added support for the extended ack
parameter to the newlink, changelink and validate ops in
the rtnl_link_ops structure:
a8b8a889e369d ("net: add netlink_ext_ack argument to rtnl_link_ops.validate")
7a3f4a185169b ("net: add netlink_ext_ack argument to rtnl_link_ops.newlink")
ad744b223c521 ("net: add netlink_ext_ack argument to rtnl_link_ops.changelink")

These commits were all added at the same time and present since the
Linux kernel 4.13 release. In our compatibility layer we have a
define HAVE_EXT_ACK_IN_RTNL_LINKOPS that indicates the presence of
the extended ack parameter for these three link operations.

At least one distro has only backported two of the three patches,
for newlink and changelink, while not backporting patch a8b8a889e369d
for the validate op.  Our compatibility layer code in acinclude.m4
is able to find the presence of the extack within the rtnl_link_ops
structure so it defines HAVE_EXT_ACK_IN_RTNL_LINKOPS but since the
validate link op does not have the extack parameter the compilation
fails on recent kernels for that particular distro. Other kernel
distributions based upon this distro will presumably also encounter
the compile errors.

Introduce a new function in acinclude.m4 that will find function
op definitions and then search for the required parameter.  Then
use this function to define HAVE_RTNLOP_VALIDATE_WITH_EXTACK so
that we can detect and enable correct compilation on kernels
which have not backported the entire set of patches.  This function
is generic to any function op - it need not be in a structure.

In places where HAVE_EXT_ACK_IN_RTNL_LINKOPS wraps validate functions
replace it with the new HAVE_RTNLOP_VALIDATE_WITH_EXTACK define.

Passes Travis here:
https://travis-ci.org/github/gvrose8192/ovs-experimental/builds/674599698

Passes a kernel check-kmod test on several systems, including
sles12 sp4 4.12.14-95.48-default kernel, without any regressions.

VMWare-BZ: #2544032
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago ofp-actions: Fix memory leak on error path.
William Tu [Mon, 13 Apr 2020 15:36:44 +0000 (08:36 -0700)]
ofp-actions: Fix memory leak on error path.

Need to free the memory before returning. Detected by gcc10.

Signed-off-by: William Tu <u9012063@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
4 years ago system-traffic: Check frozen state handling with TLV map change
Yifeng Sun [Thu, 9 Apr 2020 18:37:39 +0000 (11:37 -0700)]
system-traffic: Check frozen state handling with TLV map change

This patch enhances a system traffic test to prevent regression on
the tunnel metadata table (tun_table) handling with frozen state.
Without a proper fix this test can crash ovs-vswitchd due to a
use-after-free bug on tun_table.

This is the sequence of events that triggers the bug:

- Add an OpenFlow rule in OVS that matches Geneve tunnel metadata and
contains a controller action.
- When the first packet matches the aforementioned OpenFlow rule,
during the miss upcall, OVS stores a pointer to the tun_table (that
decodes the Geneve tunnel metadata) in a frozen state and pushes down
a datapath flow into the kernel datapath.
- Issue an add-tlv-map command to reprogram the tun_table on OVS.
OVS frees the old tun_table and creates a new tun_table.
- A subsequent packet hits the kernel datapath flow again. Since
there is a controller action associated with that flow, it triggers
a slow path controller upcall.
- In the slow path controller upcall, OVS derives the tun_table
from the frozen state, which points to the old tun_table that has
already been freed at this point.
- In order to access the tunnel metadata, OVS uses the invalid
pointer that points to the old tun_table and triggers the core dump.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Co-authored-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago tun_metadata: Fix coredump caused by use-after-free bug
Yifeng Sun [Thu, 9 Apr 2020 18:37:38 +0000 (11:37 -0700)]
tun_metadata: Fix coredump caused by use-after-free bug

Tun_metadata can be referenced by flow and frozen_state at the same
time. When ovs-vswitchd handles TLV table mod message, the involved
tun_metadata gets freed. The call trace to free tun_metadata is
shown as below:

ofproto_run
- handle_openflow
  - handle_single_part_openflow
    - handle_tlv_table_mod
      - tun_metadata_table_mod
        - tun_metadata_postpone_free

Unfortunately, this tun_metadata can be still used by some frozen_state,
and later on when frozen_state tries to access its tun_metadata table,
ovs-vswitchd crashes. The call trace to access tun_metadata from
frozen_state is shown as below:

udpif_upcall_handler
- recv_upcalls
  - process_upcall
    - frozen_metadata_to_flow

It is unsafe for frozen_state to reference tun_table because tun_table
is protected by RCU while the lifecycle of frozen_state can span several
RCU quiescent states. The current code violates OVS's RCU protection mechanism.

This patch fixes it by simply stopping frozen_state from referencing
tun_table. If frozen_state needs tun_table, the latest valid tun_table
can be found through ofproto_get_tun_tab() efficiently.
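
The hazard and the fix described above can be modelled in a few lines. This is a toy Python sketch, not OVS code: `TunTable`, `Ofproto` and `table_mod` are hypothetical stand-ins, and a `freed` flag stands in for memory that has been released after an RCU grace period.

```python
class TunTable:
    """Stand-in for a tunnel metadata table; `freed` models released memory."""

    def __init__(self, version):
        self.version = version
        self.freed = False


class Ofproto:
    """Stand-in for the object that owns the current table."""

    def __init__(self):
        self.tun_table = TunTable(version=1)

    def table_mod(self):
        # A TLV table mod installs a new table; the old one is freed later.
        old = self.tun_table
        self.tun_table = TunTable(version=old.version + 1)
        old.freed = True


ofproto = Ofproto()

# Before the fix: the frozen state caches the table itself.
frozen_cached_table = ofproto.tun_table
# After the fix: the frozen state keeps no table and asks the owner on demand.
frozen_owner = ofproto

ofproto.table_mod()

stale = frozen_cached_table      # refers to the table that was freed
fresh = frozen_owner.tun_table   # the latest valid table
```

In the model, `stale.freed` is true while `fresh.freed` is false, which mirrors why looking the table up at the time of use is safe and caching it across a table mod is not.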

A previous commit appears to have fixed a similar issue:
254878c18874f6 (ofproto-dpif-xlate: Fix segmentation fault caused by tun_table)

VMware-BZ: #2526222
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agobugtool: Fix for Python3.
Timothy Redaelli [Thu, 19 Mar 2020 19:05:39 +0000 (20:05 +0100)]
bugtool: Fix for Python3.

Currently the ovs-bugtool tool doesn't start on Python 3.
This commit fixes ovs-bugtool so that it works on Python 3.

Replaced StringIO.StringIO with io.BytesIO since the script is
processing binary data.
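
As an illustration of why a bytes buffer is the right replacement, the sketch below uses only the standard `io` module; the `payload` value is made up for the example and is not taken from ovs-bugtool.

```python
import io

# Hypothetical binary payload, e.g. the start of a compressed stream.
payload = b"\x1f\x8b\x00binary capture data"

# io.BytesIO accepts bytes, which is what a tool handling binary data needs.
buf = io.BytesIO()
buf.write(payload)
captured = buf.getvalue()

# io.StringIO is text-only on Python 3 and rejects bytes with a TypeError.
try:
    io.StringIO().write(payload)
    text_buffer_accepts_bytes = True
except TypeError:
    text_buffer_accepts_bytes = False
```

The Python 2 `StringIO` module does not exist on Python 3, so code that imports it fails before it can do any work.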

Reported-at: https://bugzilla.redhat.com/1809241
Reported-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Co-authored-by: William Tu <u9012063@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agodpif-netdev: includes microsecond delta in meter bucket calculation
Jiang Lidong [Tue, 7 Apr 2020 03:28:06 +0000 (03:28 +0000)]
dpif-netdev: includes microsecond delta in meter bucket calculation

When the dp-netdev meter rate is higher than 200Mbps, a bias of more
than 10% from the configured rate is observed with UDP traffic.

In the dp-netdev meter, the millisecond delta between now and last used
is taken into the bucket size calculation, while the sub-millisecond part
is truncated.

If the traffic rate is high, the time delta can be just a few milliseconds,
so its ratio to the truncated part is less than 10:1, and the loss of bucket
size caused by the truncation is clearly visible in the committed
traffic rate.

In this patch, the microsecond part of the delta is included in the
calculation of the meter bucket to make it more precise.
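
The effect of the truncation can be seen with a little arithmetic. The sketch below is not the dpif-netdev code: the function names are hypothetical, and it assumes the bucket is credited with rate times elapsed time, with the rate in kbps (1 kbps is 1 bit per millisecond).

```python
def tokens_ms_only(delta_us, rate_kbps):
    """Bits credited when the sub-millisecond part of the delta is dropped."""
    delta_ms = delta_us // 1000
    return delta_ms * rate_kbps


def tokens_with_us(delta_us, rate_kbps):
    """Bits credited when the microsecond part of the delta is kept."""
    return delta_us * rate_kbps // 1000


rate_kbps = 200_000   # 200 Mbps
delta_us = 2_500      # 2.5 ms since the meter was last used

truncated = tokens_ms_only(delta_us, rate_kbps)
precise = tokens_with_us(delta_us, rate_kbps)
loss = 1 - truncated / precise
```

With these assumed numbers the truncated calculation credits 400000 bits instead of 500000, a shortfall of about 20%, which is the kind of bias the commit message describes for short time deltas.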

Signed-off-by: Jiang Lidong <jianglidong3@jd.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoTravis: Enable clang compiler and unit test for arm CI
Lance Yang [Mon, 30 Mar 2020 12:54:03 +0000 (20:54 +0800)]
Travis: Enable clang compiler and unit test for arm CI

Enable testsuite and clang compiler for arm CI. In order not to increase
the CI jobs, selectively enable them in the existing jobs instead of
adding extra jobs.

Successful travis job build report:
https://travis-ci.org/github/yzyuestc/ovs/builds/667539360

Reviewed-by: Yanqin Wei <Yanqin.Wei@arm.com>
Reviewed-by: Malvika Gupta <Malvika.Gupta@arm.com>
Signed-off-by: Lance Yang <Lance.Yang@arm.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agotests/testsuite: Skip failing UT cases on aarch64
Malvika Gupta [Mon, 30 Mar 2020 12:54:02 +0000 (20:54 +0800)]
tests/testsuite: Skip failing UT cases on aarch64

The following test cases are failing inconsistently on aarch64 platforms and
have been skipped until further investigation can be made on how to fix them:

20: bfd.at:268           bfd - bfd decay
2104: ovsdb-idl.at:1815  Check Python IDL connects to leader - Python3 (leader only)
2105: ovsdb-idl.at:1816  Check Python IDL reconnects to leader - Python3 (leader only)

Suggested-by: Yanqin Wei <Yanqin.Wei@arm.com>
Suggested-by: Lance Yang <Lance.Yang@arm.com>
Signed-off-by: Malvika Gupta <malvika.gupta@arm.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agotests/atlocal.in: Add check for aarch64 Architecture
Malvika Gupta [Mon, 30 Mar 2020 12:54:01 +0000 (20:54 +0800)]
tests/atlocal.in: Add check for aarch64 Architecture

This patch adds a condition to check if the CPU architecture is aarch64. If the
condition evaluates to true, $IS_ARM64 variable is set to 'yes'. For all other
architectures, this variable is set to 'no'.

Reviewed-by: Yanqin Wei <Yanqin.wei@arm.com>
Signed-off-by: Malvika Gupta <malvika.gupta@arm.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoutil: Update OVS_TYPEOF macro for C++ enabled applications.
Archana Holla [Tue, 7 Apr 2020 18:09:33 +0000 (11:09 -0700)]
util: Update OVS_TYPEOF macro for C++ enabled applications.

The OVS_TYPEOF macro doesn't return the type of an object on non-__GNUC__ platforms.
Update it to use the "decltype" keyword when used from C++ code.

Signed-off-by: Archana Holla <harchana@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoovs-vswitchd: Fix icmp reply timeout description.
William Tu [Mon, 6 Apr 2020 23:59:01 +0000 (16:59 -0700)]
ovs-vswitchd: Fix icmp reply timeout description.

Currently the userspace datapath implements conntrack ICMP reply state
as when ICMP packets have been seen in both directions.  However, the
description is defined as timeout of the connection after an ICMP error
is replied in response to an ICMP packet.

Fixes: 61a5264d60d0c ("ovs-vswitchd: Add Datapath, CT_Zone, and CT_Zone_Policy tables.")
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Greg Rose <gvrose8192@gmail.com>
4 years agonetdev-linux.c: Fix coverity unreachable code warning
Usman Ansari [Wed, 1 Apr 2020 22:33:32 +0000 (15:33 -0700)]
netdev-linux.c: Fix coverity unreachable code warning

Coverity reports unreachable code in a "?" statement.
Fixed by removing the code segment and the unused variables and defines.

Signed-off-by: Usman Ansari <ua1422@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agocirrus: Force pkg update on FreeBSD.
Ilya Maximets [Fri, 27 Mar 2020 08:51:51 +0000 (09:51 +0100)]
cirrus: Force pkg update on FreeBSD.

It seems that FreeBSD ports/images are not well maintained, which frequently
causes package installation failures like this:

 [1/40] Fetching automake-1.16.1_2.txz: .......... done
 pkg: cached package automake-1.16.1_2: size mismatch, fetching from remote
 [2/40] Fetching automake-1.16.1_2.txz: .......... done
 pkg: cached package automake-1.16.1_2: size mismatch, cannot continue
 Consider running 'pkg update -f'

A forced update doesn't increase build time significantly, but helps
to solve at least this one kind of issue.

Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agoRevert "ovsdb-idl: Avoid sending redundant conditional monitoring updates"
Dumitru Ceara [Wed, 25 Mar 2020 20:15:23 +0000 (21:15 +0100)]
Revert "ovsdb-idl: Avoid sending redundant conditional monitoring updates"

This reverts commit 5351980b047f4dd40be7a59a1e4b910df21eca0a.

If the ovsdb-server reply to "monitor_cond_since" requests has
"found" == false then ovsdb_idl_db_parse_monitor_reply() calls
ovsdb_idl_db_clear() which iterates through all tables and
unconditionally sets table->cond_changed to false.

However, if the client had already set a new condition for some of the
tables, this new condition request will never be sent to ovsdb-server
until the condition is reset to a different value. This is due to the
check in ovsdb_idl_db_set_condition().
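
The sequence above can be reproduced with a toy model. The Python sketch below is only an illustration of the described behaviour, not the IDL implementation: the `Table` class, its method names and the condition strings are all hypothetical.

```python
class Table:
    """Toy model of per-table monitor condition state."""

    def __init__(self, condition):
        self.condition = condition
        self.cond_changed = False

    def set_condition(self, new_condition):
        # The optimization: an unchanged condition is treated as redundant.
        if new_condition == self.condition:
            return
        self.condition = new_condition
        self.cond_changed = True

    def clear(self):
        # Models the unconditional reset done while clearing the database.
        self.cond_changed = False

    def needs_send(self):
        return self.cond_changed


t = Table(condition="all rows")
t.set_condition("chassis == ch1")    # client requests a new condition
pending_before_clear = t.needs_send()

t.clear()                            # reply had "found" == false
t.set_condition("chassis == ch1")    # same value again: skipped as redundant
pending_after_clear = t.needs_send()
```

In the model the request is pending before the clear and lost after it, and setting the same condition again does not bring it back; only a different value would.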

One way to replicate the issue is described in the bugzilla reporting
the bug, when ovn-controller is configured to use "ovn-monitor-all":
https://bugzilla.redhat.com/show_bug.cgi?id=1808125#c6

Commit 5351980b047f tried to avoid sending redundant conditional
monitoring updates, but the chance that this scenario happens with the
latest code is quite low since commit 403a6a0cb003 ("ovsdb-idl: Fast
resync from server when connection reset.") changed the behavior of
ovsdb_idl_db_parse_monitor_reply() to avoid calling ovsdb_idl_db_clear()
in most cases.

Reported-by: Dan Williams <dcbw@redhat.com>
Reported-at: https://bugzilla.redhat.com/1808125
CC: Andy Zhou <azhou@ovn.org>
Fixes: 5351980b047f ("ovsdb-idl: Avoid sending redundant conditional monitoring updates")
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agouserspace: Add GTP-U support.
William Tu [Mon, 25 Nov 2019 19:19:23 +0000 (11:19 -0800)]
userspace: Add GTP-U support.

GTP, GPRS Tunneling Protocol, is a group of IP-based communications
protocols used to carry general packet radio service (GPRS) within
GSM, UMTS and LTE networks.  The GTP protocol has two parts: Signalling
(GTP-Control, GTP-C) and User data (GTP-User, GTP-U). GTP-C is used
for setting up the GTP-U protocol, which is an IP-in-UDP tunneling
protocol. GTP is usually used to connect the radio base station,
the Serving Gateway (S-GW), and the PDN Gateway (P-GW).

This patch implements the GTP-U protocol for the userspace datapath,
supporting only the required header fields and the G-PDU message type.
See spec in:
https://tools.ietf.org/html/draft-hmm-dmm-5g-uplane-analysis-00

Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/666518784
Signed-off-by: Feng Yang <yangfengee04@gmail.com>
Co-authored-by: Feng Yang <yangfengee04@gmail.com>
Signed-off-by: Yi Yang <yangyi01@inspur.com>
Co-authored-by: Yi Yang <yangyi01@inspur.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agodpif-netdev: Force port reconfiguration to change dynamic_txqs.
Ilya Maximets [Tue, 24 Mar 2020 23:50:45 +0000 (00:50 +0100)]
dpif-netdev: Force port reconfiguration to change dynamic_txqs.

If the number of polling threads goes from exactly the number of Tx queues
in a port to a higher value while set_tx_multiq() is not implemented or does
not request reconfiguration, the port will not be reconfigured and the datapath
will continue using static Tx queue ids, leading to a crash.

Ex.:
 Assume that port p0 supports up to 4 Tx queues and doesn't support the
 set_tx_multiq() method.  For example, netdev-afxdp could be the case,
 because it can have multiple Tx queues, but doesn't have a
 set_tx_multiq() implementation because the number of Tx queues always
 equals the number of Rx queues.

 1. Configuring pmd-cpu-mask to have 3 pmd threads.

 2. Adding port p0 to OVS.
    At this point wanted_txqs = 4 (3 for pmd threads + 1 for non-pmd).
    Port reconfigured to have 4 Tx queues successfully.
    dynamic_txqs = (4 < 4) = false;

 3. Configuring pmd-cpu-mask to have 10 pmd threads.
    At this point wanted_txqs = 11 (10 for pmd threads + 1 for non-pmd).
    Since set_tx_multiq() is not implemented, netdev doesn't request
    reconfiguration and 'dynamic_txqs' remains in 'false' state.

 4. Since 'dynamic_txqs == false', dpif-netdev uses static Tx queue
    ids that are in range [0, 10] while the device only supports 4,
    leading to unwanted behavior and crashes.

Fix that by marking for reconfiguration all the ports that will likely
change their 'dynamic_txqs' value.
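
The arithmetic in the example above can be replayed with a short sketch. It is an illustration only: the Python function names are hypothetical, and the two formulas are taken directly from the numbered steps (wanted_txqs is the number of pmd threads plus one, and dynamic_txqs is true when the port has fewer Tx queues than wanted).

```python
def wanted_txqs(n_pmd_threads):
    # One Tx queue per pmd thread plus one for the non-pmd thread.
    return n_pmd_threads + 1


def dynamic_txqs(n_txq, wanted):
    # Dynamic Tx queue ids are needed when the port has fewer queues than wanted.
    return n_txq < wanted


N_TXQ = 4  # port p0 supports up to 4 Tx queues

# Step 2: 3 pmd threads, port added and reconfigured.
flag_at_step_2 = dynamic_txqs(N_TXQ, wanted_txqs(3))

# Step 3 without the fix: no reconfiguration, so the old flag is kept.
stale_flag = flag_at_step_2

# Step 3 with the fix: the port is reconfigured and the flag is recomputed.
recomputed_flag = dynamic_txqs(N_TXQ, wanted_txqs(10))

# Highest static queue id that may be used if the stale flag is trusted.
max_static_id = wanted_txqs(10) - 1
```

The stale flag stays false while the recomputed one is true, and the highest static id (10) is well beyond the 4 queues the device has.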

It looks like the issue could be reproduced only with afxdp ports,
because all other non-dpdk ports ignore Tx queue ids and dpdk ports
request reconfiguration on set_tx_multiq().

Reported-by: William Tu <u9012063@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/368364.html
Fixes: e32971b8ddb4 ("dpif-netdev: Centralized threads and queues handling code.")
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agonetdev-offload-tc: Flush rules on ingress block when init tc flow api
Dmytro Linkin [Thu, 27 Feb 2020 15:22:32 +0000 (17:22 +0200)]
netdev-offload-tc: Flush rules on ingress block when init tc flow api

OVS can fail to attach an ingress block to an iface when initializing the
tc flow api if the block already exists with rules on it and is shared with
another iface. Fix this by flushing all existing rules on the ingress block
prior to deleting it.

Fixes: 093c9458fb02 ("tc: allow offloading of block ids")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Acked-by: Raed Salem <raeds@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agotravis: Enable OvS Travis CI for arm
Lance Yang [Tue, 24 Mar 2020 07:00:37 +0000 (15:00 +0800)]
travis: Enable OvS Travis CI for arm

Enable some of the travis jobs with the gcc compiler for the arm64 architecture.

1. Add arm jobs into the matrix in .travis.yml configuration file
2. To enable OVS-DPDK jobs, set the build target according to
different CPU architectures
3. Temporarily disable sparse checker because of static code checking
failure on arm64

Considering the balance of the CI coverage and running time, some kernel
and DPDK jobs are removed from Arm CI.

Successful travis build jobs report:
https://travis-ci.org/github/yzyuestc/ovs/builds/666129448

Reviewed-by: Yanqin Wei <Yanqin.Wei@arm.com>
Reviewed-by: Ruifeng Wang <Ruifeng.Wang@arm.com>
Reviewed-by: JingZhao Ni <JingZhao.Ni@arm.com>
Reviewed-by: Gavin Hu <Gavin.Hu@arm.com>
Signed-off-by: Lance Yang <Lance.Yang@arm.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agovlog: Fix OVS_REQUIRES macro.
William Tu [Tue, 24 Mar 2020 19:10:57 +0000 (12:10 -0700)]
vlog: Fix OVS_REQUIRES macro.

Pass lock objects, not their addresses, to the annotation macros.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agoDocumentation: Add extra repo info for RHEL 8
Greg Rose [Tue, 24 Mar 2020 15:42:03 +0000 (08:42 -0700)]
Documentation: Add extra repo info for RHEL 8

The extra development repo for RHEL 8 has changed.  Document it.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agocompat: Fix nf_ip_hook parameters for RHEL 8
Greg Rose [Tue, 24 Mar 2020 15:42:02 +0000 (08:42 -0700)]
compat: Fix nf_ip_hook parameters for RHEL 8

A RHEL release version check was only checking for RHEL releases
greater than 7.0 so that ended up including a compat fixup that
is not needed for 8.0.  Fix up the version check.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoconntrack: Reset ct_state when entering a new zone.
Dumitru Ceara [Thu, 19 Mar 2020 19:21:16 +0000 (20:21 +0100)]
conntrack: Reset ct_state when entering a new zone.

When a new conntrack zone is entered, the ct_state field is zeroed in
order to avoid using state information from different zones.

One such scenario is when a packet is double NATed. Assuming two zones
and 3 flows performing the following actions in order on the packet:
1. ct(zone=5,nat), recirc
2. ct(zone=1), recirc
3. ct(zone=1,nat)

If at step #1 the packet matches an existing NAT entry, it will get
translated and pkt->md.ct_state is set to CS_DST_NAT or CS_SRC_NAT.
At step #2 the new tuple might match an existing connection and
pkt->md.ct_zone is set to 1.
If at step #3 the packet matches an existing NAT entry in zone 1,
handle_nat() will be called to perform the translation but it will
return early because the packet's zone matches the conntrack zone and
the ct_state field still contains CS_DST_NAT or CS_SRC_NAT from the
translations in zone 5.

In order to reliably detect when a packet enters a new conntrack zone
we also need to make sure that the pkt->md.ct_zone is properly
initialized if pkt->md.ct_state is non-zero. This already happens for
most cases. The only exception is when the matched conntrack connection is
of type CT_CONN_TYPE_UN_NAT and the master connection is missing. To
cover this path we now call write_ct_md() in that case too. Remove
setting the CS_TRACKED flag in this case, as it will be done by the
new call to write_ct_md().
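
The double NAT scenario can be modelled as follows. This is a simplified Python sketch under stated assumptions, not the conntrack code: the packet is a plain dict, the bit values are illustrative, and `handle_nat` and `enter_zone` only imitate the early return and the reset that the commit message describes.

```python
CS_SRC_NAT = 0x40  # illustrative bit values
CS_DST_NAT = 0x80


def handle_nat(pkt, conn_zone):
    """Return True if translation is performed, False on the early return."""
    already_natted = pkt["ct_state"] & (CS_SRC_NAT | CS_DST_NAT)
    if already_natted and pkt["ct_zone"] == conn_zone:
        return False
    pkt["ct_state"] |= CS_DST_NAT
    return True


def enter_zone(pkt, zone, reset_on_new_zone):
    # The fix: state carried over from a different zone is dropped on entry.
    if reset_on_new_zone and pkt["ct_zone"] != zone:
        pkt["ct_state"] = 0
    pkt["ct_zone"] = zone


def double_nat(reset_on_new_zone):
    pkt = {"ct_state": 0, "ct_zone": 0}
    enter_zone(pkt, 5, reset_on_new_zone)
    handle_nat(pkt, 5)                      # step 1: ct(zone=5,nat)
    enter_zone(pkt, 1, reset_on_new_zone)   # step 2: ct(zone=1)
    enter_zone(pkt, 1, reset_on_new_zone)   # step 3: ct(zone=1,nat)
    return handle_nat(pkt, 1)


translated_without_fix = double_nat(reset_on_new_zone=False)
translated_with_fix = double_nat(reset_on_new_zone=True)
```

Without the reset, the NAT bits left over from zone 5 make the second translation return early; with the reset, the translation in zone 1 is performed.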

CC: Darrell Ball <dlu998@gmail.com>
Fixes: 286de2729955 ("dpdk: Userspace Datapath: Introduce NAT Support.")
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agofatal-signal: Fix clang error due to lock.
William Tu [Tue, 24 Mar 2020 14:17:02 +0000 (07:17 -0700)]
fatal-signal: Fix clang error due to lock.

Because the lock is not acquired, clang reports:
  lib/vlog.c:618:12: error: reading variable 'log_fd' requires holding mutex
  'log_file_mutex' [-Werror,-Wthread-safety-analysis]
  return log_fd;

The patch fixes it by creating a function in vlog.c to write
directly to log file unsafely.

Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/666165883
Fixes: ecd4a8fcdff2 ("fatal-signal: Log backtrace when no monitor daemon.")
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agolockfile: Fix OVS_REQUIRES macro.
William Tu [Mon, 23 Mar 2020 23:34:37 +0000 (16:34 -0700)]
lockfile: Fix OVS_REQUIRES macro.

Pass lock objects, not their addresses, to the annotation macros.

Fixes: f21fa45f3085 ("lockfile: Minor code cleanup.")
Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/666098338
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agofatal-signal: Log backtrace when no monitor daemon.
William Tu [Mon, 23 Mar 2020 14:44:48 +0000 (07:44 -0700)]
fatal-signal: Log backtrace when no monitor daemon.

Currently backtrace logging is only available when the monitor
daemon is running.  This patch enables backtrace logging when
no monitor daemon exists.  In signal handling context, it detects
whether the monitor daemon exists.  If not, it writes the backtrace
directly to the vlog fd.  Note that using the VLOG_* macros doesn't work
because they use buffered I/O, so this patch issues the write() syscall
directly on the file descriptor.

On some systems we stop using the monitor daemon and use systemd to
monitor ovs-vswitchd, and thus need this patch. Example of
ovs-vswitchd.log (note that there is no timestamp printed):
  2020-03-23T14:42:12.949Z|00049|memory|INFO|175332 kB peak resident
  2020-03-23T14:42:12.949Z|00050|memory|INFO|handlers:2 ports:3 reva
  SIGSEGV detected, backtrace:
  0x0000000000486969 <fatal_signal_handler+0x49>
  0x00007f7f5e57f4b0 <killpg+0x40>
  0x000000000047daa8 <pmd_thread_main+0x238>
  0x0000000000504edd <ovsthread_wrapper+0x7d>
  0x00007f7f5f0476ba <start_thread+0xca>
  0x00007f7f5e65141d <clone+0x6d>
  0x0000000000000000 <+0x0>

Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agotrivial: Fix typo in comments.
William Tu [Mon, 23 Mar 2020 14:56:47 +0000 (07:56 -0700)]
trivial: Fix typo in comments.

s/daemon_complete/daemonize_complete/

Signed-off-by: William Tu <u9012063@gmail.com>
4 years agotrivial: Fix indentation.
William Tu [Fri, 20 Mar 2020 20:54:50 +0000 (13:54 -0700)]
trivial: Fix indentation.

Add extra space to fix indentation.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agoofproto: Fix typo in manpage fragment.
Ben Pfaff [Thu, 19 Mar 2020 23:02:56 +0000 (16:02 -0700)]
ofproto: Fix typo in manpage fragment.

There was a missing ] and an extra space.

Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoAUTHORS: Add Usman Ansari.
William Tu [Fri, 20 Mar 2020 15:29:13 +0000 (08:29 -0700)]
AUTHORS: Add Usman Ansari.

Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoHandle refTable values with setkey()
Terry Wilson [Fri, 20 Mar 2020 15:22:38 +0000 (15:22 +0000)]
Handle refTable values with setkey()

For columns like QoS.queues where we have a map containing refTable
values, assigning with __setattr__, e.g. qos.queues={1: $queue_row},
works, but using qos.setkey('queues', 1, $queue_row) results
in an Exception. The opdat argument can essentially just be the
JSON representation of the map column instead of trying to build
it.

Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoofp-actions: Fix memory leak.
William Tu [Tue, 17 Mar 2020 23:31:55 +0000 (16:31 -0700)]
ofp-actions: Fix memory leak.

Coverity CID 279274 reports leaking previously allocated
'error' buffer when 'return xasprintf("input too big");'.

Cc: Usman Ansari <uansari@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
4 years agoofproto-dpif-xlate: Fix recirculation when in_port is OFPP_CONTROLLER.
Ben Pfaff [Fri, 20 Mar 2020 00:53:10 +0000 (17:53 -0700)]
ofproto-dpif-xlate: Fix recirculation when in_port is OFPP_CONTROLLER.

Recirculation usually requires finding the pre-recirculation input port.
Packets sent by the controller, with in_port of OFPP_CONTROLLER or
OFPP_NONE, do not have a real input port data structure, only a port
number.  The code in xlate_lookup_ofproto_() mishandled this case,
failing to return the ofproto data structure.  This commit fixes the
problem and adds a test to guard against regression.

Reported-by: Numan Siddique <numans@ovn.org>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/368642.html
Tested-by: Numan Siddique <numans@ovn.org>
Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoconntrack: Fix NULL pointer dereference.
William Tu [Tue, 17 Mar 2020 23:12:21 +0000 (16:12 -0700)]
conntrack: Fix NULL pointer dereference.

Coverity CID 279957 reports a NULL pointer dereference when
'conn' is NULL and ct_print_conn_info is called.

Cc: Usman Ansari <uansari@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
4 years agoDocumentation: Add note about iproute2 requirements for check-kmod
Greg Rose [Wed, 11 Mar 2020 17:49:17 +0000 (10:49 -0700)]
Documentation: Add note about iproute2 requirements for check-kmod

On many systems the check-kmod and check-kernel test suites have
many failures due to the lack of feature support in the older
iproute2 utility packages shipped with those systems.  Add a
note indicating that it might be necessary to update the iproute2
utility package in order to fix those errors.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agohmap: Fix Coverity false positive
Usman Ansari [Thu, 19 Mar 2020 21:47:17 +0000 (14:47 -0700)]
hmap: Fix Coverity false positive

Coverity reports a false positive below:
Incorrect expression, Assign_where_compare_meant: use of "="
where "==" may have been intended.
Fixed it by rewriting '(NODE = NULL)' as '((NODE = NULL), false)'.
"make check" passes for this change
Coverity reports over 500 errors resolved

Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Usman Ansari <ua1422@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodpif-netlink: avoid netlink modify flow put op failed after tc modify flow put op...
wenxu [Wed, 11 Mar 2020 05:39:34 +0000 (13:39 +0800)]
dpif-netlink: avoid netlink modify flow put op failed after tc modify flow put op failed.

The tc modify flow put always deletes the original flow first and
then adds the new flow. If the modify flow put operation fails,
the flow put operation changes from modify to create if it succeeds
in deleting the original flow in tc (which always fails with
ENOENT, because the flow was already deleted before adding the new flow in tc).
Finally, the modify flow put fails to add the flow in the kernel datapath.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agodpif-netdev: Enter quiescent state after each offloading operation.
Ilya Maximets [Fri, 21 Feb 2020 14:41:50 +0000 (15:41 +0100)]
dpif-netdev: Enter quiescent state after each offloading operation.

If the offloading queue is big and filled continuously, the offloading
thread may have no chance to quiesce, blocking rcu callbacks and
other threads waiting for synchronization.

Fix that by entering momentary quiescent state after each operation
since we're not holding any rcu-protected memory here.

Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread")
Reported-by: Eli Britstein <elibr@mellanox.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-February/049768.html
Acked-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agopvector: Use acquire-release semantics for size.
Yanqin Wei [Thu, 27 Feb 2020 16:12:21 +0000 (00:12 +0800)]
pvector: Use acquire-release semantics for size.

Read/write concurrency in the pvector library is implemented with a temp
vector and RCU protection. For performance reasons, insertion does not
follow this scheme.
In the insertion function, a thread fence ensures that the size increment is
done after the new entry is stored. But there is no barrier in the iteration
function (pvector_cursor_init). Entry point access may be reordered before
the load of the vector size, so an invalid entry point may be loaded during
vector iteration.
This patch fixes it with an acquire-release pair, which guarantees that the
new size is observed by the reader only after the new entry has been stored
by the writer. It is implemented with a one-way barrier instead of a two-way
memory fence.

Fixes: fe7cfa5c3f19 ("lib/pvector: Non-intrusive RCU priority vector.")
Reviewed-by: Gavin Hu <Gavin.Hu@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
Signed-off-by: Yanqin Wei <Yanqin.Wei@arm.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agotc: Fix nat port range when offloading ct action
Paul Blakey [Sun, 8 Mar 2020 12:50:23 +0000 (14:50 +0200)]
tc: Fix nat port range when offloading ct action

The port range struct is currently a union, so the last min/max port
assignment wins and the kernel doesn't receive the range.

Change it to struct type.
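
The difference between the two layouts can be demonstrated with the standard `ctypes` module. The class and field names below are hypothetical and do not mirror the actual tc structure; the sketch only shows the general union versus struct behaviour.

```python
import ctypes


class PortRangeUnion(ctypes.Union):
    # Both fields share the same storage.
    _fields_ = [("min", ctypes.c_uint16), ("max", ctypes.c_uint16)]


class PortRangeStruct(ctypes.Structure):
    # Each field has its own storage.
    _fields_ = [("min", ctypes.c_uint16), ("max", ctypes.c_uint16)]


u = PortRangeUnion()
u.min = 1000
u.max = 2000   # overwrites min, because the storage is shared

s = PortRangeStruct()
s.min = 1000
s.max = 2000   # min is preserved
```

After these assignments `u.min` reads back as 2000, while the struct keeps both 1000 and 2000, so only the struct can carry a real range.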

Fixes: 2bf6ffb76ac6 ("netdev-offload-tc: Add conntrack nat support")
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agotravis: Disable sindex build in sparse.
Ilya Maximets [Thu, 12 Mar 2020 09:57:44 +0000 (10:57 +0100)]
travis: Disable sindex build in sparse.

Sparse introduced a new utility 'sindex' for semantic search,
but unfortunately it fails to build in the Travis environment.
Disable it explicitly as we don't need it anyway.

Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agodatapath: Update kernel test list, news and FAQ
Greg Rose [Fri, 6 Mar 2020 22:37:21 +0000 (14:37 -0800)]
datapath: Update kernel test list, news and FAQ

We are adding support for Linux kernels up to 5.5 so update the
Travis test list, NEWS and FAQ.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath: conntrack: mark expected switch fall-through
Gustavo A. R. Silva [Fri, 6 Mar 2020 22:37:20 +0000 (14:37 -0800)]
datapath: conntrack: mark expected switch fall-through

Upstream commit:
    commit 279badc2a85be83e0187b8c566e3b476b76a87a2
    Author: Gustavo A. R. Silva <garsilva@embeddedor.com>
    Date:   Thu Oct 19 12:55:03 2017 -0500

    openvswitch: conntrack: mark expected switch fall-through

    In preparation to enabling -Wimplicit-fallthrough, mark switch cases
    where we are expecting to fall through.

    Notice that in this particular case I placed a "fall through" comment on
    its own line, which is what GCC is expecting to find.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agocompat: Use nla_parse deprecated functions
Johannes Berg [Fri, 6 Mar 2020 22:37:19 +0000 (14:37 -0800)]
compat: Use nla_parse deprecated functions

Upstream commit:
    commit 8cb081746c031fb164089322e2336a0bf5b3070c
    Author: Johannes Berg <johannes.berg@intel.com>
    Date:   Fri Apr 26 14:07:28 2019 +0200

    netlink: make validation more configurable for future strictness

    We currently have two levels of strict validation:

     1) liberal (default)
         - undefined (type >= max) & NLA_UNSPEC attributes accepted
         - attribute length >= expected accepted
         - garbage at end of message accepted
     2) strict (opt-in)
         - NLA_UNSPEC attributes accepted
         - attribute length >= expected accepted

    Split out parsing strictness into four different options:
     * TRAILING     - check that there's no trailing data after parsing
                      attributes (in message or nested)
     * MAXTYPE      - reject attrs > max known type
     * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
     * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then going
    forward prevent forgetting attribute entries. Similar can apply
    to the POLICY flag.

    We end up with the following renames:
     * nla_parse           -> nla_parse_deprecated
     * nla_parse_strict    -> nla_parse_deprecated_strict
     * nlmsg_parse         -> nlmsg_parse_deprecated
     * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
     * nla_parse_nested    -> nla_parse_nested_deprecated
     * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
        @@
        expression TB, MAX, HEAD, LEN, POL, EXT;
        @@
        -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
        +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

        @@
        expression NLH, HDRLEN, TB, MAX, POL, EXT;
        @@
        -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
        +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

        @@
        expression NLH, HDRLEN, TB, MAX, POL, EXT;
        @@
        -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
        +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

        @@
        expression TB, MAX, NLA, POL, EXT;
        @@
        -nla_parse_nested(TB, MAX, NLA, POL, EXT)
        +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

        @@
        expression START, MAX, POL, EXT;
        @@
        -nla_validate_nested(START, MAX, POL, EXT)
        +nla_validate_nested_deprecated(START, MAX, POL, EXT)

        @@
        expression NLH, HDRLEN, MAX, POL, EXT;
        @@
        -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
        +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Backported portions of this commit applicable to openvswitch and
added the necessary compatibility layer changes to support older
kernels.

Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath: Kbuild: Add kcompat.h header to front of NOSTDINC
Greg Rose [Fri, 6 Mar 2020 22:37:18 +0000 (14:37 -0800)]
datapath: Kbuild: Add kcompat.h header to front of NOSTDINC

Since this commit in the Linux upstream kernel:
'commit 9b9a3f20cbe0 ("kbuild: split final module linking out into Makefile.modfinal")'
the openvswitch kernel module fails to build against the upstream
Linux kernel. The cause of the build failure is that the include of the
KBUILD_EXTMOD variable was dropped in Makefile.modfinal when
it was split out from Makefile.modpost.  Our Kbuild was setting
the ccflags-y variable to include our kcompat.h header as the
first header file.  The Linux kernel maintainer has said that
it is incorrect to rely on the ccflags-y variable for the modfinal
phase of the build so that is why KBUILD_EXTMOD is not included.

We fix this by breaking a different Linux kernel make rule.  We
add '-include $(builddir)/kcompat.h' to the front of the NOSTDINC
variable setting in our Kbuild makefile.

As noted already in the comment for the NOSTDINC setting:
# These include directories have to go before -I$(KSRC)/include.
# NOSTDINC_FLAGS just happens to be a variable that goes in the
# right place, even though it's conceptually incorrect.

So we continue the misuse of the NOSTDINC variable to fix this
issue as well.

The assumption of the Linux kernel maintainers is that any
local, out-of-tree build include files can be added to the end
of the command line. In our case that is wrong of course, but
there is nothing we can do about it that I know of other than using
some utility like unifdef to strip out offending chunks of our
compatibility layer code before invocation of Makefile.modfinal.
That is a big change that would take a lot of work to implement.

We could ask the Linux kernel maintainers to provide some
way for out-of-tree kernel modules to include their own header
files first in a proper manner. I consider that to have a very
low probability of success, but it is something we could ask about.

For now we cheat and take the easy way out.

Reported-by: David Ahern <dsahern@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago datapath: Use sizeof_field macro
Pankaj Bharadiya [Fri, 6 Mar 2020 22:37:17 +0000 (14:37 -0800)]
datapath: Use sizeof_field macro

Upstream commit:
    commit c593642c8be046915ca3a4a300243a68077cd207
    Author: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
    Date:   Mon Dec 9 10:31:43 2019 -0800

    treewide: Use sizeof_field() macro

    Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
    at places where these are defined. Later patches will remove the unused
    definition of FIELD_SIZEOF().

    This patch is generated using following script:

    EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

    git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
    do

     if [[ "$file" =~ $EXCLUDE_FILES ]]; then
     continue
     fi
     sed -i  -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
    done

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net
Also added a compatibility layer macro for older kernels that still
use FIELD_SIZEOF.

Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
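
A compatibility macro of the kind mentioned in the commit above might look
like the following. This is a minimal userspace sketch, not the code from the
commit: struct demo_key and demo_addr_len() are hypothetical, and the fallback
assumes sizeof_field is provided as a macro so that #ifndef can detect it.

```c
#include <stddef.h>
#include <stdint.h>

/* Fallback for environments that do not define sizeof_field().
 * It yields the size of MEMBER within TYPE without needing an object.
 * sizeof does not evaluate its operand, so the null pointer is never
 * dereferenced. */
#ifndef sizeof_field
#define sizeof_field(TYPE, MEMBER) sizeof(((TYPE *)0)->MEMBER)
#endif

/* Hypothetical structure, for illustration only. */
struct demo_key {
    uint8_t  proto;
    uint16_t port;
    uint32_t addr[4];
};

/* Returns the number of bytes occupied by the 'addr' member. */
static size_t demo_addr_len(void)
{
    return sizeof_field(struct demo_key, addr);
}
```

Guarding the definition with #ifndef lets the same source build whether or
not the headers in use already provide the macro.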
4 years ago compat: Remove flex_array code
Greg Rose [Fri, 6 Mar 2020 22:37:16 +0000 (14:37 -0800)]
compat: Remove flex_array code

Flex array support was removed in kernel 5.1.  Convert to
kvmalloc_array instead.

Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago compat: Move genl_ops policy to genl_family
Johannes Berg [Fri, 6 Mar 2020 22:37:15 +0000 (14:37 -0800)]
compat: Move genl_ops policy to genl_family

Upstream commit:
    commit 3b0f31f2b8c9fb348e4530b88f6b64f9621f83d6
    Author: Johannes Berg <johannes.berg@intel.com>
    Date:   Thu Mar 21 22:51:02 2019 +0100

    genetlink: make policy common to family

    Since maxattr is common, the policy can't really differ sanely,
    so make it common as well.

    The only user that did in fact manage to make a non-common policy
    is taskstats, which has to be really careful about it (since it's
    still using a common maxattr!). This is no longer supported, but
    we can fake it using pre_doit.

    This reduces the size of e.g. nl80211.o (which has lots of commands):

       text    data     bss     dec     hex filename
     398745   14323    2240  415308   6564c net/wireless/nl80211.o (before)
     397913   14331    2240  414484   65314 net/wireless/nl80211.o (after)
    --------------------------------
       -832      +8       0    -824

    Which is obviously just 8 bytes for each command, and an added 8
    bytes for the new policy pointer. I'm not sure why the ops list is
    counted as .text though.

    Most of the code transformations were done using the following spatch:
        @ops@
        identifier OPS;
        expression POLICY;
        @@
        struct genl_ops OPS[] = {
        ...,
         {
        - .policy = POLICY,
         },
        ...
        };

        @@
        identifier ops.OPS;
        expression ops.POLICY;
        identifier fam;
        expression M;
        @@
        struct genl_family fam = {
                .ops = OPS,
                .maxattr = M,
        +       .policy = POLICY,
                ...
        };

    This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing
    the cb->data as ops, which we want to change in a later genl patch.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 3b0f31f2b8c9f ("genetlink: make policy common to family")
the policy field of the genl_ops structure has been moved into the
genl_family structure.  Add necessary compat layer infrastructure
to still support older kernels.

Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago compat: Fix up changes to inet frags in 5.1+
Greg Rose [Fri, 6 Mar 2020 22:37:14 +0000 (14:37 -0800)]
compat: Fix up changes to inet frags in 5.1+

Since Linux kernel release 5.1 the fragments field of the inet_frag_queue
structure is removed and now only the rb_fragments structure with an
rb_node pointer is used for both ipv4 and ipv6.  In addition, the
atomic_sub and atomic_add functions are replaced with their
equivalent long counterparts.

Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago acinclude: Enable Linux kernel 5.5
Greg Rose [Fri, 6 Mar 2020 22:37:13 +0000 (14:37 -0800)]
acinclude: Enable Linux kernel 5.5

Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago raft: Unset leader when starting election.
Han Zhou [Fri, 6 Mar 2020 07:48:46 +0000 (23:48 -0800)]
raft: Unset leader when starting election.

During election, there shouldn't be any leader. This change makes sure
that a server in candidate role always reports leader as "unknown".

Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago raft: Fix the problem of stuck in candidate role forever.
Han Zhou [Fri, 6 Mar 2020 07:48:45 +0000 (23:48 -0800)]
raft: Fix the problem of stuck in candidate role forever.

Sometimes a server can stay in candidate role forever, even if the server
already sees the new leader and handles append-requests normally. However,
because of the wrong role, it appears as disconnected from the cluster and
so the clients are disconnected.

This problem happens when 2 servers become candidates in the same
term, and one of them is elected as leader in that term. It can be
reproduced by the test cases added in this patch.

The root cause is that the current implementation only changes role to
follower when a bigger term is observed (in raft_receive_term__()).
According to the RAFT paper, if another candidate becomes leader with
the same term, the candidate should change to follower.

This patch fixes it by changing the role to follower when leader
is being updated in raft_update_leader().

Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
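
The role handling described in the commit above can be sketched as follows.
This is a simplified standalone model, not the OVS implementation: the
demo_* types and functions are hypothetical, and a leader id of -1 stands in
for "unknown".

```c
#include <stdbool.h>
#include <stdint.h>

enum demo_role { DEMO_FOLLOWER, DEMO_CANDIDATE, DEMO_LEADER };

/* Hypothetical, heavily reduced server state. */
struct demo_server {
    enum demo_role role;
    uint64_t term;
    int self_id;
    int leader_id;              /* -1 means "unknown". */
};

/* Starting an election: new term, candidate role, no known leader. */
static void demo_start_election(struct demo_server *s)
{
    s->term++;
    s->role = DEMO_CANDIDATE;
    s->leader_id = -1;
}

/* Records the leader learned from a message sent in 'msg_term'.
 * Returns false if the message is stale and was ignored. */
static bool demo_update_leader(struct demo_server *s, uint64_t msg_term,
                               int leader_id)
{
    if (msg_term < s->term) {
        return false;
    }
    s->term = msg_term;
    s->leader_id = leader_id;
    if (leader_id != s->self_id) {
        /* Another server leads this term, even if the term is the same
         * as ours, so we step down to follower. */
        s->role = DEMO_FOLLOWER;
    }
    return true;
}
```

The point of the sketch is the last branch: stepping down is tied to learning
about a leader, not only to observing a higher term.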
4 years ago raft: Fix next_index in install_snapshot reply handling.
Han Zhou [Sat, 29 Feb 2020 02:07:10 +0000 (18:07 -0800)]
raft: Fix next_index in install_snapshot reply handling.

When a leader handles install_snapshot reply, the next_index for
the follower should be log_start instead of log_end, because there
can be new entries added in leader's log after initiating the
install_snapshot procedure.  Also, it should send all the accumulated
entries to the follower in the following append-request message, instead
of sending 0 entries, to speed up convergence.

Without this fix, there is no functional problem, but it takes
unnecessary extra rounds of append-requests, each answered with
"inconsistency" by the follower, before the logs finally converge.

Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
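
As a rough illustration of the commit above, the sketch below resets a
follower's next_index to log_start once a snapshot is acknowledged. The
structures and function are hypothetical, and the sketch assumes the leader
keeps entries with indexes in [log_start, log_end).

```c
#include <stdint.h>

/* Hypothetical view of the leader's log: entries with indexes in
 * [log_start, log_end) are still available to send. */
struct demo_leader_log {
    uint64_t log_start;
    uint64_t log_end;
};

struct demo_follower {
    uint64_t next_index;        /* Next entry the leader will send. */
};

/* Called when the follower acknowledges a snapshot.  Resumes replication
 * at the first entry after the snapshot and returns how many entries the
 * next append request can carry. */
static uint64_t demo_snapshot_acked(const struct demo_leader_log *log,
                                    struct demo_follower *f)
{
    f->next_index = log->log_start;
    return log->log_end - log->log_start;
}
```

Using log_end here instead would skip any entries appended while the snapshot
was in flight, which is what costs the extra round trips.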
4 years ago raft: Send all missing logs in one single append_request.
Han Zhou [Sat, 29 Feb 2020 02:07:09 +0000 (18:07 -0800)]
raft: Send all missing logs in one single append_request.

When a follower needs to "catch up", the leader can send N entries in
a single append_request instead of only one entry per message.

The function raft_send_append_request() already supports this, so
this patch just calculates the correct "n" and uses it.

Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
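
The calculation of "n" mentioned in the commit above could look roughly like
this. It is a sketch under stated assumptions, not the code from the patch:
demo_log and demo_entries_to_send() are hypothetical, and the log is assumed
to hold entries with indexes in [log_start, log_end).

```c
#include <stdint.h>

/* Hypothetical view of the leader's log. */
struct demo_log {
    uint64_t log_start;         /* Index of the first entry still held. */
    uint64_t log_end;           /* One past the index of the last entry. */
};

/* Returns the number of entries to put into one append request for a
 * follower that expects 'next_index' next.  Returns 0 if the follower is
 * up to date or if the entries it needs were already compacted away, in
 * which case a snapshot would be needed instead. */
static uint64_t demo_entries_to_send(const struct demo_log *log,
                                     uint64_t next_index)
{
    if (next_index < log->log_start || next_index >= log->log_end) {
        return 0;
    }
    return log->log_end - next_index;
}
```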
4 years ago raft: Avoid sending unnecessary heartbeat when becoming leader.
Han Zhou [Sat, 29 Feb 2020 02:07:08 +0000 (18:07 -0800)]
raft: Avoid sending unnecessary heartbeat when becoming leader.

When a node becomes leader, it sends out a heartbeat to all followers
and then immediately sends out another append-request for a no-op
command execution to all followers. This causes 2 consecutive
append-requests to be sent to each follower, and the first
heartbeat append-request is unnecessary. This patch removes the
heartbeat.

Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago raft: Avoid busy loop during leader election.
Han Zhou [Sat, 29 Feb 2020 02:07:07 +0000 (18:07 -0800)]
raft: Avoid busy loop during leader election.

When a server doesn't see a leader yet, e.g. during leader re-election,
a transaction coming from a client causes a busy loop at 100% CPU.
With debug logging enabled, the log looks like this:

2020-02-28T04:04:35.631Z|00059|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
2020-02-28T04:04:35.631Z|00062|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
2020-02-28T04:04:35.631Z|00065|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
2020-02-28T04:04:35.631Z|00068|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
2020-02-28T04:04:35.631Z|00071|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
2020-02-28T04:04:35.631Z|00074|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
2020-02-28T04:04:35.631Z|00077|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
...

The problem is that in ovsdb_trigger_try(), all cluster errors are treated
as temporary errors and retried immediately. This patch fixes it by introducing
'run_triggers_now', which tells whether an immediate retry is needed. When the
cluster error has the detail 'not leader', we don't retry immediately, but
wait for the next poll event to trigger the retry. When the 'not leader'
status changes, there must be an event, i.e. a raft RPC that changes the
status, so the trigger is guaranteed to run, without a busy loop.

Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
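
The retry decision described in the commit above can be modeled as a small
predicate. This is only a sketch: the enum, the function name and the way the
error detail is passed as a string are assumptions made for illustration and
are not taken from ovsdb/trigger.c.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical outcome of trying to run a trigger. */
enum demo_result { DEMO_OK, DEMO_CLUSTER_ERROR };

/* Returns true if the trigger should be retried right away (a 0-ms
 * wakeup), false if it should wait for the next poll event. */
static bool demo_run_triggers_now(enum demo_result res, const char *detail)
{
    if (res != DEMO_CLUSTER_ERROR) {
        return false;           /* Nothing to retry. */
    }
    if (detail && !strcmp(detail, "not leader")) {
        /* A raft RPC will wake the poll loop once a leader is known,
         * so an immediate retry would only spin. */
        return false;
    }
    return true;                /* Other cluster errors: retry at once. */
}
```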
4 years ago raft: Fix raft_is_connected() when there is no leader yet.
Han Zhou [Sat, 29 Feb 2020 02:07:06 +0000 (18:07 -0800)]
raft: Fix raft_is_connected() when there is no leader yet.

If the current server has never known a leader, its status
should be "disconnected" from the cluster. Without this patch, when
a server in a cluster is restarted, it appears as connected before it
has successfully connected back to the cluster, which is wrong.

Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
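
A reduced model of the connectivity check described in the commit above is
shown below. The structure and its two flags are hypothetical and leave out
every other condition a real implementation would consider. The candidate
check reflects the related commit in this log that notes a candidate appears
as disconnected.

```c
#include <stdbool.h>

/* Hypothetical per-server state.  'ever_had_leader' starts out false
 * after a restart and is set once a leader becomes known. */
struct demo_member {
    bool ever_had_leader;
    bool is_candidate;
};

/* A server counts as connected only once it has learned about a leader
 * and is not in the middle of an election. */
static bool demo_is_connected(const struct demo_member *m)
{
    return m->ever_had_leader && !m->is_candidate;
}
```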
4 years ago ovsdb-server: Don't disconnect clients after raft install_snapshot.
Han Zhou [Sat, 29 Feb 2020 02:07:05 +0000 (18:07 -0800)]
ovsdb-server: Don't disconnect clients after raft install_snapshot.

When the "schema" field is found in read_db(), there can be two cases:
1. There is a schema change in the clustered DB and the "schema" is the new one.
2. An install_snapshot RPC happened, which caused log compaction on the
server, and the next log is just the snapshot, which always contains the
"schema" field, even though the schema hasn't changed.

The current implementation doesn't handle case 2), and always assumes the schema
has changed, hence disconnecting all clients of the server. This can cause
stability problems when a large number of clients are connected when this
happens in a large-scale environment.

Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
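
One plausible way to tell the two cases above apart is to compare the schema
in the record with the one already in use, and to disconnect clients only
when they differ. The sketch below illustrates that idea and is not
necessarily what the commit does. Representing schemas as serialized strings
is an assumption, and demo_schema_changed() is a hypothetical helper.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if a record carrying 'new_schema' really changes the
 * schema currently in use ('old_schema').  A null 'new_schema' means the
 * record has no "schema" field at all. */
static bool demo_schema_changed(const char *old_schema,
                                const char *new_schema)
{
    if (!new_schema) {
        return false;           /* No schema in this record. */
    }
    if (!old_schema) {
        return true;            /* First schema ever seen. */
    }
    /* Case 2 above: a snapshot repeats the schema unchanged. */
    return strcmp(old_schema, new_schema) != 0;
}
```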
4 years ago raft-rpc: Fix message format.
Han Zhou [Sat, 29 Feb 2020 02:07:04 +0000 (18:07 -0800)]
raft-rpc: Fix message format.

Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago ofproto: Add support to watch controller port liveness in fast-failover group
Vishal Deep Ajmera [Mon, 3 Feb 2020 10:32:46 +0000 (11:32 +0100)]
ofproto: Add support to watch controller port liveness in fast-failover group

Currently the fast-failover group does not support checking liveness of the
controller port (OFPP_CONTROLLER). However, this feature can be useful for
selecting an alternate pipeline when the controller connection itself is down,
e.g. by using a local DHCP server to reply to any DHCP request originating
from VMs.

This patch adds support for watching controller port liveness in
fast-failover groups. The controller port is considered live when at
least one of-connection is alive.

Example usage:

ovs-ofctl add-group br-int 'group_id=1234,type=ff,
          bucket=watch_port:CONTROLLER,actions:<A>,
          bucket=watch_port:1,actions:<B>'

Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
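
The bucket selection implied by the example in the commit above can be
sketched as follows. This is a standalone model, not ofproto code: all demo_*
names are hypothetical, and DEMO_WATCH_CONTROLLER is an arbitrary sentinel
standing in for OFPP_CONTROLLER.

```c
#include <stdbool.h>
#include <stddef.h>

/* Arbitrary sentinel used in this sketch for "watch the controller". */
#define DEMO_WATCH_CONTROLLER 0xffffff00u

struct demo_bucket {
    unsigned int watch_port;
};

/* Hypothetical switch state. */
struct demo_switch {
    const bool *port_up;            /* port_up[i]: is port i live? */
    size_t n_ports;
    size_t n_live_controller_conns; /* Live controller connections. */
};

/* A bucket watching the controller is live when at least one controller
 * connection is alive; otherwise liveness follows the watched port. */
static bool demo_bucket_live(const struct demo_switch *sw,
                             const struct demo_bucket *b)
{
    if (b->watch_port == DEMO_WATCH_CONTROLLER) {
        return sw->n_live_controller_conns > 0;
    }
    return b->watch_port < sw->n_ports && sw->port_up[b->watch_port];
}

/* Fast failover: returns the index of the first live bucket, or -1. */
static int demo_select_bucket(const struct demo_switch *sw,
                              const struct demo_bucket *buckets, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (demo_bucket_live(sw, &buckets[i])) {
            return (int) i;
        }
    }
    return -1;
}
```

With buckets ordered as in the example, actions <A> apply while a controller
connection is alive, and the group falls back to <B> when none is.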
4 years ago release-process: Fix indentation.
Ben Pfaff [Fri, 6 Mar 2020 21:25:12 +0000 (13:25 -0800)]
release-process: Fix indentation.

Signed-off-by: Ben Pfaff <blp@ovn.org>