git.proxmox.com Git - mirror_ovs.git/log
Timothy Redaelli [Thu, 6 Aug 2020 16:33:50 +0000 (18:33 +0200)]
meta-flow: fix a typo in "MPLS Bottom of Stack Field" paragraph.

In the ovs-fields.7 manual page, the "MPLS Bottom of Stack Field" paragraph
says:
 * When mpls_bos is 1, there is another MPLS label following this one,
   so the Ethertype passed to pop_mpls should be an MPLS Ethertype. [...]

 * When mpls_bos is 0, this MPLS label is the last one, so the Ethertype
   passed to pop_mpls should be a non-MPLS Ethertype such as IPv4. [...]

The values 0 and 1 are swapped: when BOS is 1, no more label stack
entries follow.
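To make the corrected semantics concrete, here is a minimal C sketch of the RFC 3032 label stack entry layout (20-bit label, 3-bit TC, 1-bit S/BOS, 8-bit TTL). It assumes the entry is already in host byte order. The helper names are hypothetical, not OVS functions, and pop_mpls_ethertype() only illustrates the rule the manpage should state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* RFC 3032 label stack entry, host byte order:
 *   Label (20 bits) | TC (3 bits) | S/BOS (1 bit) | TTL (8 bits) */
static uint32_t mpls_lse_label(uint32_t lse) { return lse >> 12; }
static uint8_t  mpls_lse_tc(uint32_t lse)    { return (lse >> 9) & 0x7; }
static bool     mpls_lse_bos(uint32_t lse)   { return (lse >> 8) & 0x1; }
static uint8_t  mpls_lse_ttl(uint32_t lse)   { return lse & 0xff; }

static uint32_t mpls_lse_make(uint32_t label, uint8_t tc, bool bos,
                              uint8_t ttl)
{
    assert(label < (1u << 20) && tc < 8);
    return (label << 12) | ((uint32_t) tc << 9) | ((uint32_t) bos << 8) | ttl;
}

/* Ethertype to hand to pop_mpls: BOS == 1 means this is the last label,
 * so the payload is not MPLS and the inner type (e.g. IPv4) is used.
 * BOS == 0 means another label follows, so stay with MPLS (0x8847). */
static uint16_t pop_mpls_ethertype(uint32_t lse, uint16_t inner_type)
{
    return mpls_lse_bos(lse) ? inner_type : 0x8847;
}
```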

Fixes: 96fee5e0a2a0 ("ovs-fields: New manpage to document Open vSwitch and OpenFlow fields.")
Reported-at: https://bugzilla.redhat.com/1842032
Reported-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Greg Rose [Fri, 21 Aug 2020 20:30:07 +0000 (13:30 -0700)]
python: Fixup python shebangs to python3.

Builds on RHEL 8.2 systems are failing due to this issue.

See [1] as to why this is necessary.

I used the following command to identify files that need this fix:
find . -type f -executable | /usr/lib/rpm/redhat/brp-mangle-shebangs

I also updated the copyright notices as needed.

1. https://fedoraproject.org/wiki/Changes/Make_ambiguous_python_shebangs_error

Fixes: 1ca0323e7c29 ("Require Python 3 and remove support for Python 2.")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Ilya Maximets [Tue, 18 Aug 2020 14:13:29 +0000 (16:13 +0200)]
test-conntrack: Fix conntrack benchmark by clearing conntrack metadata.

Packets in the benchmark must be treated as new packets, i.e. they
should not have conntrack metadata set.  The current code sets up
'pkt->md.conn' after the first run, so all subsequent calls hit the
'fast' processing path intended for recirculated packets.  This gives
the false impression that the current conntrack implementation is
lightning fast.

Before the change:
  $ ./ovstest test-conntrack benchmark 4 33554432 32 1
  conntrack:   1059 ms

After (correct):
  $ ./ovstest test-conntrack benchmark 4 33554432 32 1
  conntrack:  92785 ms
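Below is a toy C model of the pitfall. It uses hypothetical structures, not the real conntrack code. Once a lookup caches a connection pointer in the packet metadata, every later run over the same packets takes the fast path, unless the metadata is cleared before each run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins; not OVS structures. */
struct pkt_sketch {
    void *conn;               /* Cached connection, set after a lookup. */
};

struct ct_sketch {
    int dummy_conn;           /* Storage that cached pointers refer to. */
    unsigned long slow_path;  /* Full lookups performed. */
    unsigned long fast_path;  /* Lookups skipped thanks to 'conn'. */
};

static void ct_execute_sketch(struct ct_sketch *ct, struct pkt_sketch *pkts,
                              size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (pkts[i].conn) {
            ct->fast_path++;            /* Intended for recirculation. */
        } else {
            ct->slow_path++;            /* Hash, lookup, maybe create. */
            pkts[i].conn = &ct->dummy_conn;
        }
    }
}

/* Runs 'n_runs' batches over the same packets.  With 'clear_md' set,
 * metadata is reset first, so every packet is treated as new. */
static void benchmark_sketch(struct ct_sketch *ct, struct pkt_sketch *pkts,
                             size_t n, int n_runs, int clear_md)
{
    assert(ct && pkts);
    for (int run = 0; run < n_runs; run++) {
        if (clear_md) {
            for (size_t i = 0; i < n; i++) {
                pkts[i].conn = NULL;
            }
        }
        ct_execute_sketch(ct, pkts, n);
    }
}
```

Without clearing, only the first run does real work, which matches the misleading 1059 ms figure above.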

Fixes: 594570ea1cde ("conntrack: Optimize recirculations.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Ilya Maximets [Fri, 21 Aug 2020 12:04:05 +0000 (14:04 +0200)]
travis: Test build of debian packages.

We have had a lot of issues with Debian packaging lately.  This job
checks the build and installation of Debian packages to catch most of
these issues in the future.

Only a minimal set of tools is installed; most dependencies are
installed according to the package description.  This way we check
that all required dependencies are listed.

The openvswitch-ipsec package is not installed because, for some
reason, Python from pyenv doesn't see the ovs packages installed from
python3-openvswitch, so the ipsec service is not able to start.

Tests are skipped because they already run in many other scenarios;
there is no need to waste time.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Ilya Maximets [Mon, 17 Aug 2020 12:17:17 +0000 (14:17 +0200)]
Set release date for 2.14.0.

Acked-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Aaron Conole [Wed, 12 Aug 2020 20:07:55 +0000 (16:07 -0400)]
connmgr: Support changing openflow versions without restarting.

When commit a0baa7dfa4fe ("connmgr: Make treatment of active and passive
connections more uniform") was applied, it did not take into account
that a reconfiguration of the allowed_versions setting would require a
reload of the ofservice object (which could only be accomplished by
restarting OVS).

For now, during the reconfigure cycle, we delete the ofservice object and
then recreate it immediately.  A new test is added to ensure we do not
break this behavior again.

Fixes: a0baa7dfa4fe ("connmgr: Make treatment of active and passive connections more uniform")
Suggested-by: Ben Pfaff <blp@ovn.org>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1782834
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Numan Siddique <numans@ovn.org>
Tested-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
lzhecheng [Thu, 6 Aug 2020 04:23:39 +0000 (04:23 +0000)]
ovs-monitor-ipsec: Convert Python2 code to Python3.

Submitted-at: https://github.com/openvswitch/ovs/pull/331
Reported-at: https://github.com/openvswitch/ovs-issues/issues/192
Fixes: 1ca0323e7c29 ("Require Python 3 and remove support for Python 2.")
Signed-off-by: lzhecheng <lzhecheng@vmware.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Emma Finn [Fri, 14 Aug 2020 13:38:49 +0000 (14:38 +0100)]
netdev-offload-dpdk: Fix for broken ethernet matching HWOL for XL710NIC.

This patch introduces a temporary workaround to fix partial hardware
offload for XL710 devices, for which an incorrect Ethernet pattern is
currently being set.  This patch will be removed once the issue is
fixed within the i40e PMD.

Signed-off-by: Emma Finn <emma.finn@intel.com>
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Co-authored-by: Eli Britstein <elibr@nvidia.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Han Zhou [Tue, 11 Aug 2020 06:15:10 +0000 (23:15 -0700)]
Revert "ovsdb-idl: Fix NULL deref reported by Coverity."

This reverts commit 68bc6f88a3a36549fcd3b6248c25c5e2e6deb8f3.
The commit caused a regression in the OVN scale test: ovn-northd's CPU
usage more than doubled in the test scenario of creating and binding
12k ports.  Below is some perf data of ovn-northd when running the
command:
  ovn-nbctl --wait=sb sync

Before reverting this commit:
-   92.42%     0.62%  ovn-northd  ovn-northd          [.] main
   - 91.80% main
      + 68.93% ovn_db_run (inlined)
      + 22.45% ovsdb_idl_loop_commit_and_wait

After reverting this commit:
-   92.84%     0.60%  ovn-northd  ovn-northd          [.] main
   - 92.24% main
      + 92.03% ovn_db_run (inlined)

Reverting this commit eliminated the 22.45% of CPU time spent in
ovsdb_idl_loop_commit_and_wait().

The commit changed the logic of ovsdb_idl_txn_write__() by adding
the check "datum->keys && datum->values" before discarding unchanged
data in a transaction.  However, it is normal for OVSDB clients (such
as ovn-northd) to set columns to the same empty data they held before
the transaction.  The IDL would discard these changes and avoid sending
big transactions to the server (which would end up as no-ops on the
server side).  In the OVN scale test scenario mentioned above, each
iteration of ovn-northd sent a transaction to the server that included
all rows of the huge Port_Binding table.  This caused the significant
CPU increase of ovn-northd (and also of the OVN SB DB server) and
resulted in longer end-to-end latency of OVN configuration changes.
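The effect of the extra NULL check can be sketched in C with a simplified, hypothetical datum type. It holds only a key count and a key array, instead of the real ovsdb_datum with keys and values. With require_nonnull_keys set, an unchanged empty column is no longer discarded, so every such write gets sent to the server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical column value: 'n' integer keys; 'keys' may be NULL when
 * 'n' is 0, as with an empty column.  Not the real struct ovsdb_datum. */
struct datum_sketch {
    size_t n;
    const int *keys;
};

static bool datum_equals(const struct datum_sketch *a,
                         const struct datum_sketch *b)
{
    return a->n == b->n
           && (a->n == 0 || !memcmp(a->keys, b->keys, a->n * sizeof *a->keys));
}

/* Returns true if a write must be sent to the server.  The reverted logic
 * additionally required non-NULL keys before discarding an unchanged
 * write, so "empty over empty" was never discarded. */
static bool should_send_write(const struct datum_sketch *old,
                              const struct datum_sketch *upd,
                              bool require_nonnull_keys)
{
    assert(old && upd);
    bool discardable = datum_equals(old, upd)
                       && (!require_nonnull_keys || upd->keys);
    return !discardable;
}
```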

The original problem that commit 68bc6f88 was trying to fix does not
seem to be a real problem.  The NULL deref reported by Coverity may be
addressed in a future patch using a different approach, if necessary.

Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Ian Stokes [Wed, 12 Aug 2020 17:28:39 +0000 (18:28 +0100)]
AUTHORS: Add Sivaprasad Tummala.

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Ian Stokes [Thu, 6 Aug 2020 15:28:35 +0000 (16:28 +0100)]
dpdk: Deprecate vhost-user dequeue zero-copy.

Dequeue zero-copy is no longer supported for vhost-user client mode
in DPDK due to commit [1].

In addition, [2] proposes marking zero-copy mode as deprecated, with
removal in the next DPDK LTS release.

This commit deprecates support for vhost-user dequeue zero-copy in OVS
with its removal expected in the next OVS release.

[1] 715070ea10e6 ("vhost: prevent zero-copy with incompatible client
    mode")
[2] http://mails.dpdk.org/archives/dev/2020-August/177236.html

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Sivaprasad Tummala [Thu, 26 Mar 2020 12:09:20 +0000 (12:09 +0000)]
netdev-dpdk: linear buffer check with zero-copy

As of DPDK 19.11, in order to use dequeue-zero-copy in the DPDK vhost
library, the application has to disable the linear buffer option.
Hence, dequeue-zero-copy is not supported for vhost applications that
require linear buffers.

An alternative DPDK-based approach that disables linear buffers within
the vhost library itself was proposed in [1].  However, the consensus
was that the application should be responsible for disabling linear
buffers.

As such, this patch disables linear buffers when zero-copy is enabled.

[1]    https://patches.dpdk.org/patch/67200/
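A sketch of the resulting flag logic. DPDK expresses these options as vhost-user registration flags, for example RTE_VHOST_USER_DEQUEUE_ZERO_COPY and RTE_VHOST_USER_LINEARBUF_SUPPORT. The sketch uses local stand-in macros with made-up values so that it compiles without DPDK. The function is hypothetical, not the actual netdev-dpdk code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for DPDK vhost-user flags; values are made up. */
#define VHOST_FLAG_CLIENT     (UINT64_C(1) << 0)
#define VHOST_FLAG_DEQUEUE_ZC (UINT64_C(1) << 1)
#define VHOST_FLAG_LINEARBUF  (UINT64_C(1) << 2)

/* Builds registration flags for a vhost-user port.  Zero-copy and linear
 * buffers are mutually exclusive, so linear buffers are requested only
 * when zero-copy is disabled. */
static uint64_t vhost_flags_sketch(bool client_mode, bool zero_copy,
                                   bool want_linearbuf)
{
    uint64_t flags = 0;

    if (client_mode) {
        flags |= VHOST_FLAG_CLIENT;
    }
    if (zero_copy) {
        flags |= VHOST_FLAG_DEQUEUE_ZC;
    } else if (want_linearbuf) {
        flags |= VHOST_FLAG_LINEARBUF;
    }
    assert(!((flags & VHOST_FLAG_DEQUEUE_ZC)
             && (flags & VHOST_FLAG_LINEARBUF)));
    return flags;
}
```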

Fixes: 127b6a6eea02 ("dpdk: Update to use DPDK 19.11.")
Signed-off-by: Sivaprasad Tummala <Sivaprasad.Tummala@intel.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Ilya Maximets [Wed, 12 Aug 2020 08:57:07 +0000 (10:57 +0200)]
acinclude: Fix build with kernels with prandom* moved to prandom.h.

Recent commit c0842fbc1b18 ("random32: move the pseudo-random 32-bit
definitions to prandom.h") in the upstream kernel moved the definition
of prandom_* functions from random.h to prandom.h.  This change was
also backported to stable kernels.

Fix our configure script to look for these functions in the new
location, avoiding build failures like:

  datapath/linux/compat/include/linux/random.h:11:19:
    error: redefinition of 'prandom_u32_max'
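The failure comes from the usual compat-header pattern, where a fallback is defined only when configure did not detect the kernel's own definition. Below is a userspace C sketch of that pattern. It assumes a configure-provided macro named HAVE_PRANDOM_U32_MAX (a hypothetical name here) and uses a stub random source. The body uses the multiply-and-shift reduction that the kernel's prandom_u32_max() used at the time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's prandom_u32(); a fixed value
 * keeps the example deterministic. */
static uint32_t prandom_u32_stub(void) { return 0x80000000u; }

/* The fallback is compiled only if configure did not find the kernel's
 * definition.  If configure searches the wrong header (random.h instead
 * of prandom.h), the macro stays undefined and this definition collides
 * with the kernel's: "redefinition of 'prandom_u32_max'". */
#ifndef HAVE_PRANDOM_U32_MAX
static inline uint32_t prandom_u32_max(uint32_t ep_ro)
{
    /* Scale a 32-bit random value into [0, ep_ro). */
    return (uint32_t) (((uint64_t) prandom_u32_stub() * ep_ro) >> 32);
}
#endif
```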

Acked-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Ben Pfaff [Thu, 14 May 2020 17:36:12 +0000 (10:36 -0700)]
faq: Mention Linux kernel versions supported by 2.13.x.

This is based on acinclude.m4 in branch-2.13, which rejects anything
newer than 5.0.

Reported-by: Han Zhou <hzhou@ovn.org>
Acked-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Ian Stokes [Tue, 11 Aug 2020 17:21:44 +0000 (18:21 +0100)]
releases: Add OVS 2.14 to DPDK mapping.

Add an entry for OVS 2.14 to map to the validated DPDK release.

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Dumitru Ceara [Wed, 5 Aug 2020 19:40:51 +0000 (21:40 +0200)]
ovsdb-server: Replace in-memory DB contents at raft install_snapshot.

Every time a follower has to install a snapshot received from the
leader, it should also replace the data in memory.  Currently, this
only happens when the installed snapshot also changes the schema.

This can lead to inconsistent DB data on follower nodes, and the
snapshot may fail to be applied.

Fixes: bda1f6b60588 ("ovsdb-server: Don't disconnect clients after raft install_snapshot.")
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Roi Dayan [Tue, 4 Aug 2020 06:37:21 +0000 (09:37 +0300)]
tc: Use skip_hw flag when probing tc features

There is no need to pass tc rules to hardware when just probing for
tc features.  This avoids redundant errors that hardware drivers may
report.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-By: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Harry van Haaren [Wed, 29 Jul 2020 10:59:34 +0000 (11:59 +0100)]
configure: explicitly disable avx512 if bintuils check fails

This commit explicitly disables avx512f if the binutils assembler
check fails to correctly assemble its input.

Without this fix, users may see undefined behaviour when compiling
with -march=native on a CPU that supports avx512 with a buggy binutils
version (2.30 or 2.31 without a backported fix), if the compiler's
vectorizing optimizations convert scalar code to avx512 instructions.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Harry van Haaren [Wed, 29 Jul 2020 10:59:33 +0000 (11:59 +0100)]
dpif-netdev/avx512: add -fPIC flag to enable shared builds

In certain scenarios, with OVS built with --enable-shared and DPDK
also built as a shared library, Position Independent Code (PIC) is
required to link the avx512.a archive into the relocatable .so.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Harry van Haaren [Wed, 29 Jul 2020 10:59:32 +0000 (11:59 +0100)]
dpif-netdev/avx512: avoid compiling avx512 code if binutils check fails

This commit avoids compiling and linking avx512 code into the
vswitch_la library if the binutils check fails, so OVS does not carry
code that can never be executed due to the binutils issue.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Peng He [Tue, 4 Aug 2020 01:54:56 +0000 (09:54 +0800)]
odp-util: Clear padding in the nd_extension.

Similar to patch 67eb8110171f ("odp-util: Fix passing
uninitialized bytes in OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV*."), the tail
padding of nd_extension should be cleared when converting a flow into
the netlink format.

This fixes the following warning logs:

 |ofproto_dpif_upcall(pmd-...)|WARN|Conflicting ukey for flows:
   ufid:763c7d3b-4d0c-4bff-aafc-fdfb6089c2ba
   <...>,eth(...),eth_type(0x86dd),ipv6(...),icmpv6(type=135,code=0),\
   nd(target=fdbd:dc02:ff:1:1::1,sll=fa:16:3e:75:b3:a9,tll=00:00:00:00:00:00),\
   nd_ext(nd_reserved=0x0,nd_options_type=1)

   ufid:763c7d3b-4d0c-4bff-aafc-fdfb6089c2ba
   <...>,eth(...),eth_type(0x86dd),ipv6(...),icmpv6(type=135,code=0),\
   nd(target=fdbd:dc02:ff:1:1::1,sll=fa:16:3e:75:b3:a9,tll=00:00:00:00:00:00),\
   nd_ext(nd_reserved=0x0,nd_options_type=1)
 |ofproto_dpif_upcall(pmd-...)|WARN|upcall_cb failure: ukey installation fails
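The C sketch below shows why uncleared padding leads to conflicting ukeys. The struct only mirrors the shape of the ND extensions key: a 32-bit reserved field followed by an options-type byte, which leaves tail padding on typical ABIs. Its name and the helper are hypothetical. Zeroing the whole struct before filling the fields makes two keys built from the same flow byte-for-byte identical. The printed keys in the warning above have identical fields, which is consistent with differing padding bytes making them compare as different.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same shape as the ND extensions key: 4 + 1 bytes of fields, typically
 * padded to 8 bytes.  The name is hypothetical. */
struct nd_ext_key_sketch {
    uint32_t nd_reserved;
    uint8_t nd_options_type;
};

/* Fills the key for a netlink attribute.  Clearing the whole struct
 * first zeroes the tail padding, so keys for the same flow compare
 * (and hash) equal regardless of what the buffer held before. */
static void nd_ext_to_attr(struct nd_ext_key_sketch *key,
                           uint32_t reserved, uint8_t options_type)
{
    assert(key);
    memset(key, 0, sizeof *key);
    key->nd_reserved = reserved;
    key->nd_options_type = options_type;
}
```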

Fixes: 9b2b84973db7 ("Support for match & set ICMPv6 reserved and options type fields")
Signed-off-by: Peng He <hepeng.0320@bytedance.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Ilya Maximets [Mon, 27 Jul 2020 15:41:35 +0000 (17:41 +0200)]
odp-util: Fix clearing match mask if set action is partially unnecessary.

While committing set() actions, commit() could wildcard all the fields
that are the same in the match key and in the set action.  This leads
to a situation where the mask after commit could contain fewer bits
than before.  If the set action was only partially committed, all the
fields that were the same are cleared from the matching key, resulting
in an incorrect (too wide) flow.

Consider a flow that matches on both source and destination MAC
addresses, where the destination MAC is already the same and only the
source should be changed by the set() action.  The destination address
will be wildcarded in the match key and no longer matched on, i.e.
packets with any destination MAC will match, which is not correct.

Setting the OF rule:

 in_port=1,dl_src=50:54:00:00:00:09 actions=mod_dl_dst(50:54:00:00:00:0a),output(2)

Sending the following packets on port 1:

  1. eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800)
  2. eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0c),eth_type(0x0800)
  3. eth(src=50:54:00:00:00:0b,dst=50:54:00:00:00:0c),eth_type(0x0800)

Resulting datapath flows:
  eth(dst=50:54:00:00:00:0c),<...>, actions:set(eth(dst=50:54:00:00:00:0a)),2
  eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),<...>, actions:2

The first flow doesn't match on the source MAC address at all, so the
third packet matched it even though it must be dropped.

Fix that by updating the match mask with only the new bits set by
commit(), while keeping the original bits that commit() cleared (OR
operation).
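The difference between the two mask updates can be sketched with one bit per field, using the second packet from the example. The OpenFlow rule matches only dl_src, and the set action covers the whole Ethernet attribute but only changes dl_dst. The bit names and the 'touched' parameter are hypothetical simplifications of what commit() does, not the odp-util code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-bit-per-field masks for the Ethernet attribute. */
#define F_ETH_SRC (1u << 0)
#define F_ETH_DST (1u << 1)

/* Buggy update: for the attribute covered by the set action ('touched'),
 * take the mask commit() returned in place of the original one.  Fields
 * wildcarded because they were unchanged lose their bits, even if the
 * OpenFlow match required them. */
static uint32_t update_mask_buggy(uint32_t match_mask, uint32_t touched,
                                  uint32_t commit_mask)
{
    assert((commit_mask & ~touched) == 0);
    return (match_mask & ~touched) | commit_mask;
}

/* Fixed update: only add the bits commit() set (OR); never clear any. */
static uint32_t update_mask_fixed(uint32_t match_mask, uint32_t commit_mask)
{
    return match_mask | commit_mask;
}
```

With the buggy update, only F_ETH_DST survives, which corresponds to the too-wide first flow above. The OR keeps F_ETH_SRC as well.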

With the fix applied, the resulting correct flows are:
  eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),<...>, actions:2
  eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0c),<...>,
                                    actions:set(eth(dst=50:54:00:00:00:0a)),2
  eth(src=50:54:00:00:00:0b),<...>, actions:drop

Before commit dbf4a92800d0, the code could not reduce the mask; it
could only expand it to an exact match, so it was OK to update the
original matching mask with the new value in all cases.

Fixes: dbf4a92800d0 ("odp-util: Do not rewrite fields with the same values as matched")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1854376
Acked-by: Eli Britstein <elibr@mellanox.com>
Tested-by: Adrián Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Jinjun Gao [Wed, 29 Jul 2020 03:33:18 +0000 (11:33 +0800)]
datapath-windows: Update flow key in SET action

The flow key is not updated when processing the OVS_ACTION_ATTR_SET
action.  This affects follow-up actions; for example, the conntrack
module cannot find an already created conntrack entry if it is passed
the old flow key.

Reported-by: Rui Cao <rcao@vmware.com>
Signed-off-by: Jinjun Gao <jinjung@vmware.com>
Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Jinjun Gao [Thu, 23 Jul 2020 04:05:51 +0000 (12:05 +0800)]
datapath-windows: Reset ct_mark/ct_label to support ALG

The ct_mark/ct_label setting on related connections now behaves the
same as in the Linux datapath: if a CT entry has a parent/master entry,
it should inherit the ct_mark and ct_label of the parent/master entry
at initialization.
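A minimal C sketch of the inheritance rule. It uses a hypothetical, simplified entry structure rather than the Windows datapath's actual conntrack types. A related entry copies its parent's mark and 128-bit label when it is initialized; an unrelated entry starts with both cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical conntrack entry; ct_label is 128 bits wide. */
struct ct_entry_sketch {
    const struct ct_entry_sketch *parent;   /* Master entry, if related. */
    uint32_t mark;
    uint8_t label[16];
};

/* Initializes an entry.  A related connection (e.g. one created through
 * an ALG) starts with its parent's mark and label. */
static void ct_entry_init_sketch(struct ct_entry_sketch *e,
                                 const struct ct_entry_sketch *parent)
{
    assert(e);
    memset(e, 0, sizeof *e);
    e->parent = parent;
    if (parent) {
        e->mark = parent->mark;
        memcpy(e->label, parent->label, sizeof e->label);
    }
}
```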

Signed-off-by: Jinjun Gao <jinjung@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Yifeng Sun [Mon, 27 Jul 2020 19:27:23 +0000 (12:27 -0700)]
bfd: Support overlay BFD

Currently, OVS intercepts and processes all BFD packets, so VM-to-VM
BFD packets get lost and the recipient VM never sees them.

This patch fixes that by intercepting and processing only the BFD
packets destined to a configured BFD instance; other BFD packets are
made available to the OVS flow table for forwarding.

This patch preserves BFD's backward compatibility.

VMware-BZ: #2579326
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
William Tu [Thu, 23 Jul 2020 16:32:06 +0000 (09:32 -0700)]
tests: Refactor the iptables accept rule.

Certain Linux distributions, like CentOS, have default iptables
rules that reject input traffic from br-underlay.  Refactor by
creating a macro 'IPTABLES_ACCEPT([bridge])' that adds the
accept rule to the iptables INPUT chain.

Signed-off-by: William Tu <u9012063@gmail.com>
Tonghao Zhang [Sat, 23 May 2020 10:33:20 +0000 (18:33 +0800)]
Revert "dpif-netdev: includes microsecond delta in meter bucket calculation".

This reverts commit 5c41c31ebd64fda821fb733a5784a7a440a794f8.

Testing commit 5c41c31ebd64
("dpif-netdev: includes microsecond delta in meter bucket calculation")
with pktgen-dpdk shows that it doesn't work as expected.  It also breaks
the meter function (e.g. with the rate set to 200Mbps, the observed
rate was 400Mbps).  To reproduce it:

 $ ovs-vsctl add-br br-int -- set bridge br-int datapath_type=netdev
 $ ovs-ofctl -O OpenFlow13 add-meter br-int \
         "meter=100 kbps burst stats bands=type=drop rate=200000 burst_size=200000"
 $ ovs-ofctl -O OpenFlow13 add-flow br-int \
         "in_port=dpdk0 action=meter:100,output:dpdk1"
 $ pktgen -l 1,3,5,7,9,11,13,15,17,19 -n 8 --socket-mem 4096 \
         --file-prefix pg1 -w 0000:82:00.0 -w 0000:82:00.1 -- \
         -T -P -m "[3/5/7/9/11/13/15].[0-1]" -f meter-test.pkt

 meter-test.pkt:
 | set 0 count 0
 | set 0 size 1500
 | set 0 rate 100
 | set 0 burst 64
 | set 0 sport 1234
 | set 0 dport 5678
 | set 0 prime 1
 | set 0 type ipv4
 | set 0 proto udp
 | set 0 dst ip 1.1.1.2
 | set 0 src ip 1.1.1.1/24
 | set 0 dst mac ec:0d:9a:ab:54:0a
 | set 0 src mac ec:0d:9a:bf:df:bb
 | set 0 vlanid 0
 | start 0

Note that the issue that patch 5c41c31ebd64 was intended to fix was
already fixed by commit:
  42697ca7757b ("dpif-netdev: fix meter at high packet rate.")

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Toms Atteka [Wed, 22 Jul 2020 21:25:00 +0000 (14:25 -0700)]
debian: Fixed openvswitch-test package dependency.

Python 3 has no python3-twisted-web package; the required code is
inside python3-twisted.

Fixes: 1ca0323e7c29 ("Require Python 3 and remove support for Python 2.")
Signed-off-by: Toms Atteka <cpp.code.lv@gmail.com>
Acked-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Ilya Maximets [Thu, 23 Jul 2020 15:17:24 +0000 (17:17 +0200)]
dpif-netdev.at: Wait for miss upcall log.

Some tests check for the 'miss upcall' log in a log file immediately
after sending the packet, which causes test failures when running
them under valgrind or on an overloaded system.

Fix that by waiting for the actual string to appear in the log
file.  Some other tests use 'sleep 1' to work around this, but it's
better to wait for an event than to sleep for a fixed amount of time.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
Tonghao Zhang [Thu, 16 Jul 2020 11:14:44 +0000 (19:14 +0800)]
dpctl: Fix memory leak in dpctl_dump_flows()

Jump to the correct cleanup label to avoid a memory leak.

Fixes: a692410af0f7 ("dpctl: Expand the flow dump type filter")
Cc: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Flavio Leitner [Fri, 5 Jun 2020 19:24:53 +0000 (16:24 -0300)]
docs: Remove duplicate word from vhost-user doc.

Fixes: 49df3c0fe779 ("docs: DPDK isn't a datapath, so don't use the term.")
Acked-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: William Tu <u9012063@gmail.com>
Ilya Maximets [Tue, 21 Jul 2020 12:47:32 +0000 (14:47 +0200)]
ovs-router: Fix flushing of local routes.

Since commit 8e4e45887ec3, the priority of 'local' route entries no
longer matches 'plen'.  This should be taken into account when
flushing cached routes; otherwise they remain in OVS even after they
are removed from the system:

  # ifconfig eth0 11.0.0.1
  # ovs-appctl ovs/route/show
    --- A new route synchronized from kernel route table ---
    Cached: 11.0.0.1/32 dev eth0 SRC 11.0.0.1 local
  # ifconfig eth0 0
  # ovs-appctl ovs/route/show
    -- the new route entry is still in ovs route table ---
    Cached: 11.0.0.1/32 dev eth0 SRC 11.0.0.1 local
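The flushing problem can be modeled in C with a hypothetical route entry. The sketch assumes that a cached kernel route normally has priority == plen and that local routes get an extra offset (64 here, chosen only for illustration). With the 'fixed' flag set, local routes are also treated as cached and get flushed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route entry; not the real ovs-router structure. */
struct route_sketch {
    uint8_t plen;
    int priority;
    bool local;
    bool present;
};

/* Illustrative priority rule: local routes get an extra offset. */
static int route_priority(uint8_t plen, bool local)
{
    return local ? plen + 64 : plen;
}

/* Removes cached system routes.  Testing priority == plen alone misses
 * local routes; the fixed check also accepts entries marked local. */
static void route_flush_sketch(struct route_sketch *rt, size_t n, bool fixed)
{
    assert(rt || n == 0);
    for (size_t i = 0; i < n; i++) {
        bool cached = rt[i].priority == rt[i].plen || (fixed && rt[i].local);
        if (cached) {
            rt[i].present = false;
        }
    }
}
```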

CC: wenxu <wenxu@ucloud.cn>
Fixes: 8e4e45887ec3 ("ofproto-dpif-xlate: makes OVS native tunneling honor tunnel-specified source addresses")
Reported-by: Zheng Jingzhou <glovejmm@163.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-July/373093.html
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
Ilya Maximets [Fri, 17 Jul 2020 01:41:32 +0000 (03:41 +0200)]
Prepare for post-2.14.0 (2.14.90).

Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Ilya Maximets [Fri, 17 Jul 2020 01:38:31 +0000 (03:38 +0200)]
Prepare for 2.14.0.

Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Roi Dayan [Tue, 14 Jul 2020 07:24:41 +0000 (10:24 +0300)]
checkpatch: Add argument to skip gerrit change id check.

This argument can be used internally by groups that use Gerrit for code reviews.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Timothy Redaelli [Tue, 23 Jun 2020 16:48:38 +0000 (18:48 +0200)]
acinclude: Remove libmnl for MLX5 PMD.

libmnl is no longer used by the MLX5 PMD since DPDK 19.08.

Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Numan Siddique <numans@ovn.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
David Marchand [Mon, 13 Jul 2020 08:06:21 +0000 (10:06 +0200)]
dpdk: Add commands to configure log levels.

When enabling debug logs in DPDK, it can be hard to be sure of what is
actually enabled, so add commands to list and change those log levels.
However, these commands do not help when tracking issues in DPDK init
itself, so also dump the log levels right after init.

Example:
$ ovs-appctl dpdk/log-list
global log level is debug
id 0: lib.eal, level is info
id 1: lib.malloc, level is info
id 2: lib.ring, level is info
id 3: lib.mempool, level is info
id 4: lib.timer, level is info
id 5: pmd, level is info
[...]
id 37: pmd.net.bnxt.driver, level is notice
id 38: pmd.net.e1000.init, level is notice
id 39: pmd.net.e1000.driver, level is notice
id 40: pmd.net.enic, level is info
[...]

$ ovs-appctl dpdk/log-set debug pmd.*:notice
$ ovs-appctl dpdk/log-list
global log level is debug
id 0: lib.eal, level is debug
id 1: lib.malloc, level is debug
id 2: lib.ring, level is debug
id 3: lib.mempool, level is debug
id 4: lib.timer, level is debug
id 5: pmd, level is debug
[...]
id 37: pmd.net.bnxt.driver, level is notice
id 38: pmd.net.e1000.init, level is notice
id 39: pmd.net.e1000.driver, level is notice
id 40: pmd.net.enic, level is notice
[...]
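As a rough model of what the 'pmd.*:notice' argument does, the C sketch below applies a glob pattern to a hypothetical table of log types with POSIX fnmatch(). DPDK itself offers pattern-based setters such as rte_log_set_level_pattern(). The table, the level indices (syslog order, 0 = emergency to 7 = debug), and the argument parsing are assumptions for illustration, not the dpdk/log-set implementation.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical log-type table mimicking the dpdk/log-list output. */
struct log_type_sketch {
    const char *name;
    int level;
};

/* Applies one "pattern:level" argument, or a bare level that matches
 * every type.  Returns how many types matched, or -1 on a parse error. */
static int log_set_sketch(struct log_type_sketch *types, size_t n,
                          const char *arg)
{
    static const char *const names[] = { "emergency", "alert", "critical",
        "error", "warning", "notice", "info", "debug" };
    char pattern[128] = "*";
    const char *level_str = arg;
    const char *colon = strrchr(arg, ':');
    int level = -1, changed = 0;

    assert(types || n == 0);
    if (colon) {
        size_t len = (size_t) (colon - arg);
        if (len == 0 || len >= sizeof pattern) {
            return -1;
        }
        memcpy(pattern, arg, len);
        pattern[len] = '\0';
        level_str = colon + 1;
    }
    for (int i = 0; i < (int) (sizeof names / sizeof names[0]); i++) {
        if (!strcmp(level_str, names[i])) {
            level = i;
        }
    }
    if (level < 0) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        if (!fnmatch(pattern, types[i].name, 0)) {
            types[i].level = level;
            changed++;
        }
    }
    return changed;
}
```

As in the listing above, "pmd.*" does not match the bare "pmd" type, so that one stays at debug.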

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Roi Dayan [Wed, 15 Jul 2020 12:40:49 +0000 (15:40 +0300)]
rhel: openvswitch-fedora.spec.in: Fix installed but not packaged.

Since the cited commit, rpmbuild reports an error about the installed
but unpackaged /usr/lib64/libopenvswitchavx512.a.
Fix it by treating it the same way as the other .la files.

Fixes: 352b6c7116cd ("dpif-lookup: add avx512 gather implementation.")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Ilya Maximets [Thu, 16 Jul 2020 23:44:15 +0000 (01:44 +0200)]
AUTHORS: Add Jeff Squyres.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Jeff Squyres [Thu, 9 Jul 2020 23:57:47 +0000 (16:57 -0700)]
bond: Add 'primary' interface concept for active-backup mode.

In AB bonding, if the current active slave becomes disabled, a
replacement slave is arbitrarily picked from the remaining set of
enabled slaves.  This commit adds the concept of a "primary" slave: an
interface that will always be (or become) the current active slave if
it is enabled.

The rationale for this functionality is to allow the designation of a
preferred interface for a given bond.  For example:

1. Bond is created with interfaces p1 (primary) and p2, both enabled.
2. p1 becomes the current active slave (because it was designated as
   the primary).
3. Later, p1 fails/becomes disabled.
4. p2 is chosen to become the current active slave.
5. Later, p1 becomes re-enabled.
6. p1 is chosen to become the current active slave (because it was
   designated as the primary).

Note that p1 becomes the active slave once it becomes re-enabled, even
if nothing has happened to p2.

This "primary" concept exists in Linux kernel network interface
bonding, but did not previously exist in OVS bonding.

Only one primary slave interface is supported per bond, and only for
active/backup bonding.

The primary slave interface is designated via
"other_config:bond-primary" when creating a bond.

Also, while adding tests for the "primary" concept, make a few small
improvements to the non-primary AB bonding test.
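The selection rule from the numbered steps above can be sketched in C. The slave structure and function are hypothetical. "Pick the first enabled slave" stands in for the arbitrary choice OVS makes when there is neither a usable primary nor a usable current active slave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bond slave; not the real OVS structure. */
struct bond_slave_sketch {
    const char *name;
    bool enabled;
};

/* Chooses the active slave for an active-backup bond.  An enabled
 * primary always wins; otherwise the current active slave is kept while
 * it stays enabled; otherwise the first enabled slave is taken.  Returns
 * an index, or -1 if none is enabled.  'primary'/'active' may be -1. */
static int choose_active_sketch(const struct bond_slave_sketch *s, int n,
                                int primary, int active)
{
    assert(s || n == 0);
    if (primary >= 0 && primary < n && s[primary].enabled) {
        return primary;
    }
    if (active >= 0 && active < n && s[active].enabled) {
        return active;
    }
    for (int i = 0; i < n; i++) {
        if (s[i].enabled) {
            return i;
        }
    }
    return -1;
}
```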

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Ilya Maximets [Thu, 9 Jul 2020 10:09:29 +0000 (12:09 +0200)]
dpif-netdev: Avoid deadlock with offloading during PMD thread deletion.

The main thread tries to pause/stop all revalidators during datapath
reconfiguration via the datapath purge callback (dp_purge_cb) while
holding 'dp->port_mutex'.  A deadlock happens if any of the
revalidator threads is already waiting on 'dp->port_mutex' while
dumping offloaded flows:

           main thread                           revalidator
 ---------------------------------  ----------------------------------

 ovs_mutex_lock(&dp->port_mutex)

                                    dpif_netdev_flow_dump_next()
                                    -> dp_netdev_flow_to_dpif_flow
                                    -> get_dpif_flow_status
                                    -> dpif_netdev_get_flow_offload_status()
                                    -> ovs_mutex_lock(&dp->port_mutex)
                                       <waiting for mutex here>

 reconfigure_datapath()
 -> reconfigure_pmd_threads()
 -> dp_netdev_del_pmd()
 -> dp_purge_cb()
 -> udpif_pause_revalidators()
 -> ovs_barrier_block(&udpif->pause_barrier)
    <waiting for revalidators to reach barrier>

                          <DEADLOCK>

We're not allowed to call the offloading API from the userspace
datapath without holding the global port mutex, due to thread-safety
restrictions in the netdev-offload-dpdk module.  It's also not easy to
rework the datapath reconfiguration process to move the actual PMD
removal and datapath purge out of the port mutex.

So, for now, not sleeping on the mutex if it's not immediately
available seems like the easiest workaround.  This will have an impact
on the flow statistics update rate and on the ability to get the
latest statistics before removing the flow (the latest stats will be
lost if we were not able to take the mutex).  However, this will allow
us to operate normally, avoiding the deadlock.

The last bit is that, to avoid flapping of flow attributes and
statistics, we're not failing the operation but returning the last
statistics and attributes returned by the offload provider.  Since
those might be updated in different threads, stores and reads are
atomic.
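The try-lock-or-return-cached-values pattern described above can be
sketched with POSIX threads and C11 atomics.  This is a minimal
illustration under assumptions, not the dpif-netdev code:
struct hw_counters, struct cached_offload_stats and get_offload_stats()
are hypothetical names, and the "offload provider" is just a struct
that may only be read with the mutex held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical provider state; may only be read with port_mutex held. */
struct hw_counters {
    uint64_t n_packets;
    uint64_t n_bytes;
};

/* Last values obtained from the provider.  Atomic because they may be
 * stored and loaded from different threads. */
struct cached_offload_stats {
    _Atomic uint64_t n_packets;
    _Atomic uint64_t n_bytes;
};

/* Never blocks on 'port_mutex'.  Returns true if fresh values were read
 * from the provider, false if the mutex was busy and the cached values
 * were returned instead.  Either way '*n_packets' and '*n_bytes' are
 * set, so the caller sees neither an error nor zeroed statistics. */
static bool
get_offload_stats(pthread_mutex_t *port_mutex, const struct hw_counters *hw,
                  struct cached_offload_stats *cache,
                  uint64_t *n_packets, uint64_t *n_bytes)
{
    bool fresh = false;

    assert(port_mutex && hw && cache);
    if (pthread_mutex_trylock(port_mutex) == 0) {
        atomic_store(&cache->n_packets, hw->n_packets);
        atomic_store(&cache->n_bytes, hw->n_bytes);
        pthread_mutex_unlock(port_mutex);
        fresh = true;
    }
    *n_packets = atomic_load(&cache->n_packets);
    *n_bytes = atomic_load(&cache->n_bytes);
    return fresh;
}
```

As the text says, the cost is that values read while the mutex is busy
may be stale.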

Reported-by: Frank Wang (王培辉) <wangpeihui@inspur.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371753.html
Fixes: a309e4f52660 ("dpif-netdev: Update offloaded flows statistics.")
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Tested-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agoovs-bugtool: Fix Python3 bytes str issue.
William Tu [Mon, 13 Jul 2020 20:34:32 +0000 (13:34 -0700)]
ovs-bugtool: Fix Python3 bytes str issue.

This patch fixes two errors caused by type mismatches when converting
between str and bytes:
  File "/usr/local/sbin/ovs-bugtool", line 649, in main
    cmd_output(CAP_NETWORK_STATUS, [OVS_DPCTL, 'dump-flows', '-m', d])
  File "/usr/local/sbin/ovs-bugtool", line 278, in cmd_output
    label = ' '.join(a)
TypeError: sequence item 3: expected str instance, bytes found

And
  File "/usr/sbin/ovs-bugtool", line 721, in main
    collect_data()
  File "/usr/sbin/ovs-bugtool", line 366, in collect_data
    run_procs(process_lists.values())
  File "/usr/sbin/ovs-bugtool", line 1354, in run_procs
    p.inst.write("\n** timeout **\n")
  File "/usr/sbin/ovs-bugtool", line 1403, in write
    BytesIO.write(self, s)
TypeError: a bytes-like object is required, not 'str'

VMware-BZ: #2602135
Fixed: 9e6c00bca9af ("bugtool: Fix for Python3.")
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
3 years agodatapath: Remove duplicated includes
Yunjian Wang [Fri, 10 Jul 2020 00:58:12 +0000 (08:58 +0800)]
datapath: Remove duplicated includes

Remove duplicated includes.

Acked-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: William Tu <u9012063@gmail.com>
3 years agoofproto: Remove duplicated includes
Yunjian Wang [Fri, 10 Jul 2020 00:58:02 +0000 (08:58 +0800)]
ofproto: Remove duplicated includes

Remove duplicated includes.

Acked-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: William Tu <u9012063@gmail.com>
3 years agolib: Remove duplicated includes
Yunjian Wang [Fri, 10 Jul 2020 00:57:51 +0000 (08:57 +0800)]
lib: Remove duplicated includes

Remove duplicated includes.

Acked-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: William Tu <u9012063@gmail.com>
3 years agoofproto: report coverage on hitting datapath flow limit
Gowrishankar Muthukrishnan [Mon, 20 Apr 2020 13:43:42 +0000 (19:13 +0530)]
ofproto: report coverage on hitting datapath flow limit

Whenever the number of flows in the datapath exceeds the configured
or auto-configured flow limit, it is helpful to report this event
through a coverage counter, so that an operator or DevOps engineer
can notice it and proactively correct the switch configuration.

Today, these events are reported in the ovs-vswitchd log only when a
new flow cannot be inserted during upcall processing, in which case
OVS writes a warning.  Otherwise, OVS silently corrects the situation
by flushing old flows, without any notification at all.
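A minimal sketch of the idea, assuming a plain global counter in place
of OVS's COVERAGE_DEFINE()/COVERAGE_INC() machinery.  The counter name
and may_install_flow() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counter; stands in for an OVS coverage counter. */
static uint64_t flow_limit_hit_count;

/* Returns true if one more datapath flow may be installed.  Every
 * refusal is counted, so the event becomes visible to operators even
 * when nothing is logged. */
static bool
may_install_flow(unsigned int n_flows, unsigned int flow_limit)
{
    assert(flow_limit > 0);
    if (n_flows >= flow_limit) {
        flow_limit_hit_count++;
        return false;
    }
    return true;
}
```

In OVS itself, coverage counters can be inspected with
"ovs-appctl coverage/show".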

Signed-off-by: Gowrishankar Muthukrishnan <gmuthukr@redhat.com>
Signed-off-by: William Tu <u9012063@gmail.com>
3 years agodpdk: Use DPDK 19.11.2 release.
Ian Stokes [Thu, 2 Jul 2020 15:09:27 +0000 (16:09 +0100)]
dpdk: Use DPDK 19.11.2 release.

Modify the Travis Linux build script to use the DPDK 19.11.2 stable
release and update the docs to reference it.  Update the release FAQ
to reflect the latest validated DPDK versions for all branches.

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
3 years agodocs/dpdk/bridge: add datapath performance section.
Harry van Haaren [Mon, 13 Jul 2020 12:42:15 +0000 (13:42 +0100)]
docs/dpdk/bridge: add datapath performance section.

This commit adds a section to the dpdk/bridge.rst netdev documentation,
detailing the added DPCLS functionality. The newly added commands are
documented, and sample output is provided.

The DPCLS autovalidator can be run with all unit tests by default by
recompiling it to have the highest priority at startup time.  This
avoids making changes to every test, and enables debug and CI builds
to validate every lookup implementation with all unit tests.

Add NEWS updates for CPU ISA, dynamic subtables, and AVX512 lookup.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
3 years agodpif-lookup: add avx512 gather implementation.
Harry van Haaren [Mon, 13 Jul 2020 12:42:14 +0000 (13:42 +0100)]
dpif-lookup: add avx512 gather implementation.

This commit adds an AVX-512 dpcls lookup implementation.
It uses the AVX-512 SIMD ISA to perform multiple miniflow
operations in parallel.

To run this implementation, the "avx512f" and "bmi2" ISAs are
required. These ISA checks are performed at runtime while
probing the subtable implementation. If a CPU does not provide
both "avx512f" and "bmi2", then this code does not execute.
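The runtime check described above can be sketched with the GCC/Clang
__builtin_cpu_supports() builtin.  This is an illustration under
assumptions; OVS's own probing may differ, and the implementation name
"avx512_gather" and both functions are hypothetical.  The selection
logic is kept separate from the probe so it can be tested on any CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Pure selection logic: the AVX-512 lookup is used only if the CPU
 * provides both ISAs it needs; otherwise the generic code runs. */
static const char *
select_lookup_impl(bool has_avx512f, bool has_bmi2)
{
    return (has_avx512f && has_bmi2) ? "avx512_gather" : "generic";
}

/* Probes the running CPU.  On non-x86 targets, or compilers without
 * __builtin_cpu_supports(), it conservatively reports no support. */
static const char *
probe_lookup_impl(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    return select_lookup_impl(__builtin_cpu_supports("avx512f") != 0,
                              __builtin_cpu_supports("bmi2") != 0);
#else
    return select_lookup_impl(false, false);
#endif
}
```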

The avx512 code is built as a separate static library, with added
CFLAGS to enable the required ISA features. By building only this
static library with avx512 enabled, it is ensured that the main OVS
core library is *not* using avx512, and that OVS continues to run
as before on CPUs that do not support avx512.

The approach taken in this implementation is to use the
gather instruction to access the packet miniflow, allowing
any miniflow blocks to be loaded into an AVX-512 register.
This maximizes the usefulness of the register, and hence this
implementation handles any subtable with up to 8 miniflow bits.

Note that specialization of these avx512 lookup routines
still provides performance value, as the hashing of the
resulting data is performed in scalar code, and compile-time
loop unrolling occurs when specialized to miniflow bits.

This commit checks at configure time if the assembler in use has a
known bug in assembling AVX512 code.  If this bug is present, all
AVX512 code is disabled.  Checking the version string of binutils or
of the assembler is not a good method to detect the issue, as
backported fixes would not be reflected.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
3 years agodpdk: enable CPU feature detection.
Harry van Haaren [Mon, 13 Jul 2020 12:42:13 +0000 (13:42 +0100)]
dpdk: enable CPU feature detection.

This commit implements a method to retrieve the CPU ISA capabilities.
These ISA capabilities can be used in OVS to select, at runtime, the
function implementation that makes the best use of the ISA available
on the CPU.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
3 years agodpif-netdev: add subtable-lookup-prio-get command.
Harry van Haaren [Mon, 13 Jul 2020 12:42:12 +0000 (13:42 +0100)]
dpif-netdev: add subtable-lookup-prio-get command.

This commit adds a new command, "dpif-netdev/subtable-lookup-prio-get",
which prints the subtable lookup functions available in this OVS binary.
Example output from the command:

Available lookup functions (priority : name)
        0 : autovalidator
        1 : generic

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
3 years agodpif-netdev: add subtable lookup prio set command.
Harry van Haaren [Mon, 13 Jul 2020 12:42:11 +0000 (13:42 +0100)]
dpif-netdev: add subtable lookup prio set command.

This commit adds a command for the dpif-netdev to set a specific
lookup function to a particular priority level. The command enables
runtime switching of the dpcls subtable lookup implementation.

Selection is performed based on priority.  Higher priorities take
precedence, e.g. priority 5 is selected over priority 3.  If lookup
functions have the same priority, the first one in the list is
selected.

The two options available are 'autovalidator' and 'generic'.
The following command sets a new priority for the given function:
$ ovs-appctl dpif-netdev/subtable-lookup-prio-set generic 2

The autovalidator implementation can now be selected at runtime:
$ ovs-appctl dpif-netdev/subtable-lookup-prio-set autovalidator 5
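The selection rule described above (highest priority wins, ties go to
the first entry in the list) can be sketched as follows.  The registry
struct and function names are hypothetical, and the starting
priorities are taken from the example output of the
subtable-lookup-prio-get command.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry. */
struct lookup_impl {
    const char *name;
    int prio;
};

/* Starting priorities as in the prio-get example output. */
static struct lookup_impl lookup_impls[] = {
    { "autovalidator", 0 },
    { "generic", 1 },
};
#define N_IMPLS (sizeof lookup_impls / sizeof lookup_impls[0])

/* Like "subtable-lookup-prio-set NAME PRIO": returns 0 on success or -1
 * if no implementation has that name. */
static int
lookup_prio_set(const char *name, int prio)
{
    for (size_t i = 0; i < N_IMPLS; i++) {
        if (strcmp(lookup_impls[i].name, name) == 0) {
            lookup_impls[i].prio = prio;
            return 0;
        }
    }
    return -1;
}

/* Picks the highest priority.  The strict '>' means that, on a tie, the
 * entry that comes first in the list is kept. */
static const struct lookup_impl *
lookup_select(void)
{
    const struct lookup_impl *best = &lookup_impls[0];

    for (size_t i = 1; i < N_IMPLS; i++) {
        if (lookup_impls[i].prio > best->prio) {
            best = &lookup_impls[i];
        }
    }
    return best;
}
```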

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
3 years agodpif-netdev: implement subtable lookup validation.
Harry van Haaren [Mon, 13 Jul 2020 12:42:10 +0000 (13:42 +0100)]
dpif-netdev: implement subtable lookup validation.

This commit refactors the existing dpif subtable function pointer
infrastructure, and implements an autovalidator component.

The refactoring makes the existing dpcls subtable lookup function
handling more generic, and cleans up how more implementations can be
enabled in the future.

In order to ensure all implementations provide identical results,
the autovalidator is added.  The autovalidator itself implements
the subtable lookup function prototype, but internally iterates
over all other available implementations.  The end result is that
testing of each implementation becomes automatic when the
autovalidator implementation is selected.
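A minimal sketch of the autovalidator idea.  The lookup prototype, the
two toy implementations (which mask each key to its low byte in two
different ways) and the mismatch counter are hypothetical; only the
structure, an entry point with the same prototype that runs every
implementation and compares the results, follows the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_KEYS 64

/* Hypothetical lookup prototype: writes one result per key to 'out'
 * and returns the number of results. */
typedef int (*lookup_fn)(const uint32_t *keys, size_t n, uint32_t *out);

static int
lookup_generic(const uint32_t *keys, size_t n, uint32_t *out)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = keys[i] % 256;
    }
    return (int) n;
}

static int
lookup_optimized(const uint32_t *keys, size_t n, uint32_t *out)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = keys[i] & 0xff;     /* same result, different method */
    }
    return (int) n;
}

static const lookup_fn lookup_impls[] = { lookup_generic, lookup_optimized };
#define N_LOOKUP_IMPLS (sizeof lookup_impls / sizeof lookup_impls[0])

static unsigned long autovalidator_mismatches;

/* Has the same prototype as every other implementation.  Uses the first
 * implementation as the reference, runs all the others on the same
 * input and counts any disagreement. */
static int
lookup_autovalidator(const uint32_t *keys, size_t n, uint32_t *out)
{
    uint32_t tmp[MAX_KEYS];

    assert(n <= MAX_KEYS);
    int ret = lookup_impls[0](keys, n, out);
    for (size_t i = 1; i < N_LOOKUP_IMPLS; i++) {
        int r = lookup_impls[i](keys, n, tmp);
        if (r != ret || memcmp(tmp, out, n * sizeof *out) != 0) {
            autovalidator_mismatches++;
        }
    }
    return ret;
}
```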

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
3 years agodpif-netdev: Return error code when no mark available.
Tonghao Zhang [Tue, 9 Jun 2020 00:53:41 +0000 (08:53 +0800)]
dpif-netdev: Return error code when no mark available.

The maximum number of marks is (UINT32_MAX - 1), which should be
enough in practice.  But theoretically, if no mark is available,
subsequent different flows will share the mark INVALID_FLOW_MARK,
which may break the functionality.  If no mark is available, return
an error code.
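A minimal sketch of failing the allocation instead of handing out a
shared sentinel.  The bitmap pool below is a hypothetical stand-in for
the real ID pool, its range [base, base + n_ids) is a parameter, and
returning ENOSPC is an assumption about the error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define POOL_MAX 64

/* Toy pool of marks in [base, base + n_ids). */
struct mark_pool {
    uint32_t base;
    uint32_t n_ids;
    bool used[POOL_MAX];
};

/* Returns 0 and stores a free mark in '*mark', or ENOSPC when the pool
 * is exhausted.  '*mark' is left untouched on failure, so no two flows
 * can end up sharing a sentinel mark. */
static int
mark_alloc(struct mark_pool *pool, uint32_t *mark)
{
    assert(pool->n_ids <= POOL_MAX);
    for (uint32_t i = 0; i < pool->n_ids; i++) {
        if (!pool->used[i]) {
            pool->used[i] = true;
            *mark = pool->base + i;
            return 0;
        }
    }
    return ENOSPC;
}

static void
mark_free(struct mark_pool *pool, uint32_t mark)
{
    assert(mark >= pool->base && mark - pool->base < pool->n_ids);
    pool->used[mark - pool->base] = false;
}
```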

Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread")
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agodpif-netdev: Add check mark to avoid ovs-vswitchd crash.
Tonghao Zhang [Tue, 9 Jun 2020 00:53:40 +0000 (08:53 +0800)]
dpif-netdev: Add check mark to avoid ovs-vswitchd crash.

When changing a PMD interface attribute, ovs-vswitchd will reload the
PMDs and flush the offloaded flows.  reload_affected_pmds() may be
invoked twice or more.  In that case, the same flows may be queued to
the "dp_netdev_flow_offload" thread again.

For example:
$ ovs-vsctl -- set interface <Interface> options:dpdk-lsc-interrupt=true

ovs-vswitchd main       flow-offload thread
append F to queue       ...
...
append F to queue
...                     del F
...                     del F (crash [1])

[1]:
ovs_assert_failure          lib/cmap.c:922
cmap_replace                lib/cmap.c:921
cmap_remove                 lib/cmap.h:295
mark_to_flow_disassociate   lib/dpif-netdev.c:2269
dp_netdev_flow_offload_del  lib/dpif-netdev.c:2369
dp_netdev_flow_offload_main lib/dpif-netdev.c:2492
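The kind of check that makes a repeated delete harmless can be
sketched as follows.  The flow struct, the NO_MARK sentinel and the
array standing in for the mark-to-flow cmap are all hypothetical; the
point is only that the second delete sees that the flow has no mark
and returns early instead of removing an entry that is already gone.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NO_MARK UINT32_MAX   /* hypothetical "not associated" sentinel */
#define N_MARKS 16

struct toy_flow {
    uint32_t mark;           /* NO_MARK when not associated */
};

/* Stands in for the mark-to-flow cmap seen in the backtrace. */
static struct toy_flow *mark_to_flow[N_MARKS];

static void
mark_associate(struct toy_flow *flow, uint32_t mark)
{
    assert(mark < N_MARKS && mark_to_flow[mark] == NULL);
    mark_to_flow[mark] = flow;
    flow->mark = mark;
}

/* Returns 0 on success, or EINVAL if the flow has no mark, which is what
 * a second, duplicated delete request finds.  Without this check the
 * second call would try to remove an entry that no longer exists. */
static int
mark_disassociate(struct toy_flow *flow)
{
    if (flow->mark == NO_MARK) {
        return EINVAL;
    }
    assert(mark_to_flow[flow->mark] == flow);
    mark_to_flow[flow->mark] = NULL;
    flow->mark = NO_MARK;
    return 0;
}
```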

Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread")
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agonetdev-linux: Fix broken build on Ubuntu 14.04
Yi-Hung Wei [Tue, 7 Jul 2020 22:59:49 +0000 (15:59 -0700)]
netdev-linux: Fix broken build on Ubuntu 14.04

Patch 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") uses
__virtio16, which is defined in kernel 3.19.  Ubuntu 14.04 uses the 3.13
kernel, which lacks the virtio_types definition.  This patch fixes that.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Acked-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
3 years agonetdev-offload-dpdk: Support tnl/push using vxlan encap attribute.
Eli Britstein [Wed, 8 Jul 2020 06:38:31 +0000 (06:38 +0000)]
netdev-offload-dpdk: Support tnl/push using vxlan encap attribute.

For DPDK, there is the RAW_ENCAP action, which takes a raw buffer of the
encapsulation header.  For specific protocols, such as VXLAN, there is a
more specific action, VXLAN_ENCAP, which takes the parsed fields of the
outer header.  If the tunnel type is VXLAN, parse the header and use the
specific action, with fallback to RAW_ENCAP.
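A minimal sketch of the decision, assuming an outer header laid out as
Ethernet + IPv4 + UDP + VXLAN; IPv6 outer headers and VLAN tags are not
handled here and simply fall back to RAW_ENCAP.  The enum and function
names are hypothetical; the real code fills DPDK rte_flow structures
rather than returning a tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum encap_kind { ENCAP_RAW, ENCAP_VXLAN };

#define ETH_LEN   14
#define UDP_LEN   8
#define VXLAN_LEN 8

static uint16_t
read_be16(const uint8_t *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

/* Returns true if 'hdr' parses as Ethernet + IPv4 + UDP + VXLAN. */
static bool
parse_vxlan_ipv4(const uint8_t *hdr, size_t len)
{
    if (len < ETH_LEN + 20 || read_be16(hdr + 12) != 0x0800) {
        return false;
    }
    const uint8_t *ip = hdr + ETH_LEN;
    size_t ihl = (size_t) (ip[0] & 0x0f) * 4;
    if ((ip[0] >> 4) != 4 || ihl < 20 || ip[9] != 17 /* UDP */) {
        return false;
    }
    return len >= ETH_LEN + ihl + UDP_LEN + VXLAN_LEN;
}

/* Prefer the specific VXLAN action; fall back to the raw buffer. */
static enum encap_kind
choose_encap(bool tunnel_is_vxlan, const uint8_t *hdr, size_t len)
{
    if (tunnel_is_vxlan && parse_vxlan_ipv4(hdr, len)) {
        return ENCAP_VXLAN;
    }
    return ENCAP_RAW;
}
```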

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roni Bar Yanai <roniba@mellanox.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agonetdev-offload-dpdk: Support offload of clone tnl_push/output actions.
Eli Britstein [Wed, 8 Jul 2020 06:38:30 +0000 (06:38 +0000)]
netdev-offload-dpdk: Support offload of clone tnl_push/output actions.

Tunnel encapsulation is done by tnl_push and output actions nested in
a clone action.  Support offloading of such flows with the
RTE_FLOW_ACTION_TYPE_RAW_ENCAP action.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agonetdev-offload: Use dpif type instead of class.
Ilya Maximets [Wed, 8 Jul 2020 06:38:29 +0000 (06:38 +0000)]
netdev-offload: Use dpif type instead of class.

There is no real difference between 'class' and 'type' in the
context of common lookup operations inside the netdev-offload module,
because it only compares the pointers themselves, not the values they
point to.  However, 'type' has some meaning and can be used by offload
providers during the initialization phase to check whether this type
of flow API, paired with the netdev type, can be used with a
particular datapath type.  For example, this is needed to check
whether the Linux flow API can be used for the current tunneling
vport, because it can be used only if the tunneling vport belongs to
the system datapath, i.e. has a backing Linux interface.

This is needed to unblock tunneling offloads in userspace datapath
with DPDK flow API.

Acked-by: Eli Britstein <elibr@mellanox.com>
Acked-by: Roni Bar Yanai <roniba@mellanox.com>
Acked-by: Ophir Munk <ophirmu@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agonetdev: Allow storing dpif type into netdev structure.
Ilya Maximets [Wed, 8 Jul 2020 06:38:28 +0000 (06:38 +0000)]
netdev: Allow storing dpif type into netdev structure.

Storing the dpif type of the owning datapath interface will allow
us to easily distinguish, for example, userspace tunneling ports from
the system ones.  This is required for HW offloading, to avoid
offloading userspace flows to kernel interfaces that don't belong
to the userspace datapath but have the same dpif_port names.

Acked-by: Eli Britstein <elibr@mellanox.com>
Acked-by: Roni Bar Yanai <roniba@mellanox.com>
Acked-by: Ophir Munk <ophirmu@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agonetdev-offload-dpdk: Support offload of set IPv6 actions.
Eli Britstein [Wed, 8 Jul 2020 06:38:27 +0000 (06:38 +0000)]
netdev-offload-dpdk: Support offload of set IPv6 actions.

Add support for set IPv6 actions.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roni Bar Yanai <roniba@mellanox.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agonetdev-offload-dpdk: Add IPv6 pattern matching.
Eli Britstein [Wed, 8 Jul 2020 06:38:26 +0000 (06:38 +0000)]
netdev-offload-dpdk: Add IPv6 pattern matching.

Add support for IPv6 pattern matching for offloading flows.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roni Bar Yanai <roniba@mellanox.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agonetdev-offload-dpdk: Remove pre-validate of patterns function.
Eli Britstein [Wed, 8 Jul 2020 06:38:25 +0000 (06:38 +0000)]
netdev-offload-dpdk: Remove pre-validate of patterns function.

The function that adds patterns for the requested matches checks that
it consumed all the required matches, and fails if not.  Functionally,
there is no need for pre-validation.  Performance-wise, such validation
may decrease the time spent on failing flows, but at the expense of
increasing the time spent on good flows, as well as code complexity.
Remove the pre-validation function.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roni Bar Yanai <roniba@mellanox.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agonetdev-offload-dpdk: Support partial TCP/UDP port matching.
Eli Britstein [Wed, 8 Jul 2020 06:38:24 +0000 (06:38 +0000)]
netdev-offload-dpdk: Support partial TCP/UDP port matching.

The cited commit rejected partial matching of TCP/UDP ports,
preventing such offloads on HW that supports them.  Remove this
restriction.

Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roni Bar Yanai <roniba@mellanox.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agonetdev-offload-dpdk: Fix Ethernet matching for type only.
Eli Britstein [Wed, 8 Jul 2020 06:38:23 +0000 (06:38 +0000)]
netdev-offload-dpdk: Fix Ethernet matching for type only.

For an OVS rule of the form "eth type is 0x1234 / end", the rule is
offloaded in the form of "eth / end", which is incorrect.  Fix it.

Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roni Bar Yanai <roniba@mellanox.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agodpif-netdev: Don't use zero flow mark.
Eli Britstein [Wed, 8 Jul 2020 06:38:22 +0000 (06:38 +0000)]
dpif-netdev: Don't use zero flow mark.

A zero flow mark is used to instruct the HW to remove the mark.  A
packet marked with a zero mark is received in SW without any mark at
all, so zero cannot be used as a valid mark.  Change the pool range to
fix it.

Fixes: 241bad15d99a ("dpif-netdev: associate flow with a mark id")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roni Bar Yanai <roniba@mellanox.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agodpif-netdev: Add mega ufid in flow add/del log.
Eli Britstein [Wed, 8 Jul 2020 06:38:21 +0000 (06:38 +0000)]
dpif-netdev: Add mega ufid in flow add/del log.

As offload is done using the mega ufid of a flow, add it to the log
message for better debuggability.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roni Bar Yanai <roniba@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agonetdev-offload-dpdk: Log testpmd format for flow create/destroy.
Eli Britstein [Wed, 8 Jul 2020 06:38:20 +0000 (06:38 +0000)]
netdev-offload-dpdk: Log testpmd format for flow create/destroy.

To improve debuggability with DPDK, format the logs as testpmd
commands.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roni Bar Yanai <roniba@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agonetdev-offload-tc: Add drop action support.
William Tu [Tue, 7 Jul 2020 03:11:05 +0000 (20:11 -0700)]
netdev-offload-tc: Add drop action support.

Currently, the drop action is not offloaded when using the userspace
datapath with tc offload.  This patch programs a tc gact (generic
action) in chain ID 0 that drops the packet by setting its action to
TC_ACT_SHOT.

Example:
$ ovs-appctl dpctl/add-flow netdev@ovs-netdev \
  'recirc_id(0),in_port(2),eth(),eth_type(0x0806),\
  arp(op=2,tha=00:50:56:e1:4b:ab,tip=10.255.1.116)' drop

An empty action list also implies drop:
$ ovs-appctl dpctl/add-flow netdev@ovs-netdev \
  'recirc_id(0),in_port(2),eth(),eth_type(0x0806),\
  arp(op=2,tha=00:50:56:e1:4b:ab,tip=10.255.1.116)' ''

$ tc filter show dev ovs-p0 ingress
filter protocol arp pref 2 flower chain 0
filter protocol arp pref 2 flower chain 0 handle 0x1
  eth_type arp
  arp_tip 10.255.1.116
  arp_op reply
  arp_tha 00:50:56:e1:4b:ab
  skip_hw
  not_in_hw
action order 1: gact action drop
    ...

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
3 years agopython: Fix plural forms of OVSDB types.
Ben Pfaff [Sat, 21 Mar 2020 22:17:27 +0000 (15:17 -0700)]
python: Fix plural forms of OVSDB types.

Fixes two problems.  First, the plural of chassis is also chassis.
Second, for linguistic analysis we need to consider plain words, not
words that have (e.g.) \fB and \fR pasted into them for nroff output.

This makes the OVN manpage for ovn-sb(5) talk about "set of Chassis"
not "set of Chassiss".

Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
3 years agodpif-netdev: Do RCU synchronization at fixed interval in PMD main loop.
Nitin Katiyar [Thu, 22 Aug 2019 16:53:30 +0000 (22:23 +0530)]
dpif-netdev: Do RCU synchronization at fixed interval in PMD main loop.

Each PMD updates the global sequence number for RCU synchronization
purposes with other OVS threads.  This is done at every 1025th
iteration of the PMD main loop.

If the PMD thread is responsible for polling a large number of queues
that are carrying traffic, it spends a lot of time processing packets,
and this results in a significant delay in performing the housekeeping
activities.

If the OVS main thread is waiting to synchronize with the PMD threads,
and those threads delay performing housekeeping activities for more
than 3 seconds, LACP processing will be impacted, leading to LACP
flaps.  Similarly, other control protocols run by the OVS main thread
are impacted.

For example, a PMD thread polling 200 ports/queues with an average of
1600 processing cycles per packet and a batch size of 32 may take
10240000 (200 * 1600 * 32) cycles per iteration.  On a system with a
2.0 GHz CPU, that is more than 5 ms per iteration.  So, 1024
iterations would take more than 5 seconds to complete.

This gets worse when there are less loaded PMD threads.  They reduce
the chance of a heavily loaded PMD getting the mutex lock in
ovsrcu_try_quiesce(), and its next attempt to quiesce would only come
after another 1024 iterations.

With this patch, PMD RCU synchronization will be performed at a fixed
time interval instead of after a fixed number of iterations.  This
ensures that, even if the packet processing load is high, the RCU
synchronization will not be delayed for long.
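With the numbers above, one iteration costs 10240000 / 2.0e9 s =
5.12 ms, so 1024 iterations take about 5.24 s, well past the 3-second
LACP budget.  A deadline expressed in time bounds the delay regardless
of the per-iteration cost.  The sketch below shows such a deadline
check.  The interval value and all names are hypothetical, and moving
the deadline only after a successful quiesce (so that a failed
ovsrcu_try_quiesce() is retried on the next iteration) is a design
choice of this sketch, motivated by the contention problem described
above, not necessarily what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interval; the commit does not give a value. */
#define RCU_QUIESCE_INTERVAL_US 10000

struct pmd_rcu_state {
    uint64_t next_quiesce_us;   /* deadline for the next attempt */
};

/* Checked once per main-loop iteration: is a quiesce attempt due? */
static bool
rcu_quiesce_due(const struct pmd_rcu_state *s, uint64_t now_us)
{
    return now_us >= s->next_quiesce_us;
}

/* Records the outcome of an attempt.  Only success moves the deadline,
 * so a failed try (e.g. the mutex was busy) is retried on the next
 * iteration rather than after another fixed number of iterations. */
static void
rcu_quiesce_done(struct pmd_rcu_state *s, uint64_t now_us, bool succeeded)
{
    if (succeeded) {
        s->next_quiesce_us = now_us + RCU_QUIESCE_INTERVAL_US;
    }
}
```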

Co-authored-by: Anju Thomas <anju.thomas@ericsson.com>
Signed-off-by: Anju Thomas <anju.thomas@ericsson.com>
Signed-off-by: Nitin Katiyar <nitin.katiyar@ericsson.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agoovsdb-idl: Force IDL retry when missing updates encountered.
Dumitru Ceara [Thu, 2 Jul 2020 14:20:57 +0000 (16:20 +0200)]
ovsdb-idl: Force IDL retry when missing updates encountered.

Adds a generic recovery mechanism which triggers an IDL retry with fast
resync disabled in case the IDL detects that it has ended up in an
inconsistent state due to other bugs in the ovsdb-server/ovsdb-idl
implementation.

Additionally, this commit:
- bumps IDL semantic error logs to level ERR to make them more
  visible.
- triggers an IDL retry in cases where the IDL client used to try to
  recover (i.e., trying to add an existing row, trying to remove a
  nonexistent row).

Fixes: db2b5757328c ("lib: add monitor2 support in ovsdb-idl.")
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agoovsdb/TODO.rst: Remove OVN specific items.
Han Zhou [Mon, 22 Jun 2020 05:55:13 +0000 (22:55 -0700)]
ovsdb/TODO.rst: Remove OVN specific items.

These belong to the OVN project, if they are still not done.

Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agoovsdb/TODO.rst: Remove completed items.
Han Zhou [Mon, 22 Jun 2020 05:55:12 +0000 (22:55 -0700)]
ovsdb/TODO.rst: Remove completed items.

- A snapshot unit test has been added for the "change-election-timer"
related patches.

- The 100% CPU problem was addressed by:
    2cd62f75c1 ("ovsdb raft: Precheck prereq before proposing commit.")

Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agoAUTHORS: Add Adrian Moreno.
Ilya Maximets [Mon, 6 Jul 2020 23:19:26 +0000 (01:19 +0200)]
AUTHORS: Add Adrian Moreno.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agodpif-netdev-unixctl.man: Document bond-show command.
Adrian Moreno [Mon, 6 Jul 2020 09:26:55 +0000 (11:26 +0200)]
dpif-netdev-unixctl.man: Document bond-show command.

Document recently added ovs-appctl command.

Fixes: 9df65060cf4c ("userspace: Avoid dp_hash recirculation for balance-tcp bond mode.")
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agoofproto: Delete buckets when lb_output is false.
Adrian Moreno [Fri, 26 Jun 2020 11:51:16 +0000 (13:51 +0200)]
ofproto: Delete buckets when lb_output is false.

When lb-output-action is toggled back to "false", buckets are not
deleted.  Delete them, as they will no longer be used.

Add a unit test to verify that buckets are correctly deleted.

Cc: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agodpif-netdev: Delete the artificial flow limit.
Tonghao Zhang [Sun, 15 Mar 2020 21:56:03 +0000 (05:56 +0800)]
dpif-netdev: Delete the artificial flow limit.

The MAX_FLOWS constant has been there since the introduction of
dpif-netdev; however, a new flow-limit mechanism was later implemented
that controls the number of datapath flows dynamically at the ofproto
level.

So, we can just remove the limit and fully rely on ofproto to decide
what flow limit we need.  There are no limitations on the flow table
size in dpif-netdev besides the artificial one.
'other_config:flow-limit' seems suitable to control this.

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agoovsdb: Remove duplicated include.
Yunjian Wang [Fri, 15 May 2020 11:21:00 +0000 (19:21 +0800)]
ovsdb: Remove duplicated include.

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agoodp-execute: Fix length checking while executing check_pkt_len action.
Ilya Maximets [Thu, 11 Jun 2020 09:25:52 +0000 (11:25 +0200)]
odp-execute: Fix length checking while executing check_pkt_len action.

If a dp-packet contains L2 padding or has a cutlen applied, its size
will be larger than the actual size of the payload, and the action
will work incorrectly.

For example, padding could be added during miniflow_extract() if
detected.
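A minimal sketch of computing the length that check_pkt_len should
compare, assuming a simplified packet struct whose 'size' includes
trailing L2 padding and whose 'cutlen' is the number of bytes to be
cut from the tail.  The struct and function names are hypothetical and
do not reproduce the dp_packet API.  A 42-byte ARP packet padded to the
60-byte Ethernet minimum shows the difference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_packet {
    uint32_t size;         /* bytes in the buffer, including L2 padding */
    uint16_t l2_pad_size;  /* trailing padding, e.g. up to 60 bytes */
    uint32_t cutlen;       /* bytes that will be cut from the tail */
};

/* Length of the packet as it will actually be sent. */
static uint32_t
packet_real_len(const struct toy_packet *p)
{
    assert(p->l2_pad_size <= p->size);
    uint32_t len = p->size - p->l2_pad_size;
    return p->cutlen < len ? len - p->cutlen : 0;
}

/* The comparison done by check_pkt_len: is the packet longer than
 * 'pkt_len'? */
static bool
check_pkt_larger(const struct toy_packet *p, uint32_t pkt_len)
{
    return packet_real_len(p) > pkt_len;
}
```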

Fixes: 5b34f8fc3b38 ("Add a new OVS action check_pkt_larger")
Reported-by: Miroslav Kubiczek <miroslav.kubiczek@adaptivemobile.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/050157.html
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agometa-flow: Document that constituents of conjunctive flows may overlap.
Ben Pfaff [Wed, 27 May 2020 19:24:31 +0000 (12:24 -0700)]
meta-flow: Document that constituents of conjunctive flows may overlap.

Suggested-by: Antonin Bas <abas@vmware.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
3 years agoAUTHORS: Add Peng He.
William Tu [Sun, 5 Jul 2020 13:44:25 +0000 (06:44 -0700)]
AUTHORS: Add Peng He.

Signed-off-by: William Tu <u9012063@gmail.com>
3 years agoconntrack-tp: fix lock order in conn_update_expiration
Peng He [Sun, 5 Jul 2020 13:07:23 +0000 (21:07 +0800)]
conntrack-tp: fix lock order in conn_update_expiration

*conn_update_expiration* violates the lock order of conn->lock and
ct->lock.  According to the comments in conntrack, conn->lock should be
taken after ct->lock when ct->lock needs to be taken.
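One way to respect the ct->lock before conn->lock order in a function
entered with conn->lock already held is to drop conn->lock, take
ct->lock, then re-take conn->lock.  The sketch below shows that pattern
with POSIX mutexes.  The structs and the function are hypothetical
stand-ins, and this is not necessarily the patch's exact code.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical, minimal stand-ins for the conntrack structures. */
struct ct {
    pthread_mutex_t lock;
};

struct conn {
    pthread_mutex_t lock;
    long long expiration;
};

/* Entered and left with conn->lock held.  To follow the documented order
 * (ct->lock, then conn->lock), conn->lock is released before ct->lock is
 * taken and then re-acquired.  Any state read under the first hold of
 * conn->lock must be re-validated after this call. */
static void
conn_update_expiration_sketch(struct ct *ct, struct conn *conn,
                              long long expiration)
{
    pthread_mutex_unlock(&conn->lock);
    pthread_mutex_lock(&ct->lock);
    pthread_mutex_lock(&conn->lock);
    conn->expiration = expiration;
    pthread_mutex_unlock(&ct->lock);
}
```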

Fixes: 2078901a4c142 ("userspace: Add conntrack timeout policy support.")
Signed-off-by: Peng He <hepeng.0320@bytedance.com>
Signed-off-by: William Tu <u9012063@gmail.com>
3 years agoctags: Include new annotations to ctags ignore list.
Flavio Leitner [Wed, 10 Jun 2020 19:49:45 +0000 (16:49 -0300)]
ctags: Include new annotations to ctags ignore list.

The annotations OVS_NO_THREAD_SAFETY_ANALYSIS and OVS_LOCKABLE are
not part of the list, so ctags can't find functions using them.

The annotation list comes from a regex, and including more items
makes the regex more difficult to read and maintain.  Convert it to a
static list, because it isn't supposed to change much and there are
no standard names.

Also add a comment as a reminder to keep the list up-to-date.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: William Tu <u9012063@gmail.com>
3 years agolib/tc: only update the stats for non-empty counter
wenxu [Mon, 29 Jun 2020 09:31:18 +0000 (17:31 +0800)]
lib/tc: only update the stats for non-empty counter

When a first-fragment packet executes the act_ct action, the packet is
stolen by defragmentation, so the stats counter for "gact action goto
chain" always stays 0.  OVS updates the stats from each action in
order, so the flower stats always end up as zero.  As a result, the
rule is deleted after the max-idle time even though packets are still
executing the action.

ovs-appctl dpctl/dump-flows
recirc_id(0),in_port(1),eth_type(0x0800),ipv4(dst=11.0.0.7,frag=first), packets:0, bytes:0, used:5.390s, actions:ct(zone=1,nat),recirc(0x4)

filter protocol ip pref 2 flower chain 0 handle 0x2
  eth_type ipv4
  dst_ip 1.1.1.1
  ip_flags frag/firstfrag
  skip_hw
  not_in_hw
 action order 1: ct zone 1 nat pipe
  index 2 ref 1 bind 1 installed 11 sec used 1 sec
 Action statistics:
 Sent 15000 bytes 11 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 cookie e04106c2ac41769b278edaa9b5309960

 action order 2: gact action goto chain 1
  random type none pass val 0
  index 2 ref 1 bind 1 installed 11 sec used 11 sec
 Action statistics:
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 cookie e04106c2ac41769b278edaa9b5309960
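The fix can be sketched as follows, using the numbers from the output
above (11 packets and 15000 bytes on the ct action, zero on "goto
chain").  The struct and function are hypothetical; the sketch
assumes, as the text describes, that per-action counters are visited
in order and that a later counter replaces an earlier one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct act_stats {
    uint64_t n_packets;
    uint64_t n_bytes;
};

/* Derives a flow's stats from its per-action counters, visited in
 * order.  An empty counter is skipped, so a trailing action that never
 * saw the packets (e.g. "goto chain" after ct stole the fragments)
 * cannot reset the result to zero. */
static struct act_stats
flow_stats_from_actions(const struct act_stats *acts, size_t n)
{
    struct act_stats flow = { 0, 0 };

    assert(acts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (acts[i].n_packets) {
            flow = acts[i];
        }
    }
    return flow;
}
```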

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
3 years agodatapath-windows: Add CTA_HELP and CTA_TUPLE_MASTER
Jinjun Gao [Tue, 30 Jun 2020 11:47:57 +0000 (19:47 +0800)]
datapath-windows: Add CTA_HELP and CTA_TUPLE_MASTER

Add the helper and master, if they exist, to a conntrack entry:
1. For CTA_HELP, only FTP/TFTP are supported;
2. For CTA_TUPLE_MASTER, only FTP is supported.

Signed-off-by: Jinjun Gao <jinjung@vmware.com>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
3 years agojsonrpc: Don't assert for 0 remotes in jsonrpc_session_open_multiple().
Ben Pfaff [Fri, 26 Jun 2020 19:46:10 +0000 (12:46 -0700)]
jsonrpc: Don't assert for 0 remotes in jsonrpc_session_open_multiple().

It's pretty easy to get 0 remotes here from ovn-northd if you specify
--ovnnb-db='' or --ovnnb-db='   ' on the command line.  The internals
of jsonrpc_session aren't equipped to cope with that, so just add a
dummy remote instead.
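A minimal sketch of the idea, under stated assumptions: the remote specification is split on commas and whitespace (so --ovnnb-db='' and --ovnnb-db='   ' both yield zero remotes), and an empty list is replaced by one placeholder entry so that downstream session code can rely on at least one remote. The struct, the function names and the "invalid:" placeholder string are hypothetical, not taken from lib/jsonrpc.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_REMOTES 8
#define MAX_REMOTE_LEN 64

/* Hypothetical fixed-size list of remote names. */
struct remote_list_sketch {
    size_t n;
    char names[MAX_REMOTES][MAX_REMOTE_LEN];
};

static int
is_separator(char c)
{
    return c == ',' || isspace((unsigned char) c);
}

/* Splits 'spec' on commas and whitespace, skipping empty tokens.  Tokens
 * longer than MAX_REMOTE_LEN - 1 characters are truncated. */
static void
parse_remotes_sketch(const char *spec, struct remote_list_sketch *rl)
{
    const char *p = spec;

    rl->n = 0;
    while (*p && rl->n < MAX_REMOTES) {
        while (is_separator(*p)) {
            p++;
        }
        size_t len = 0;
        while (p[len] && !is_separator(p[len])) {
            len++;
        }
        if (len) {
            size_t copy = len < MAX_REMOTE_LEN ? len : MAX_REMOTE_LEN - 1;
            memcpy(rl->names[rl->n], p, copy);
            rl->names[rl->n][copy] = '\0';
            rl->n++;
        }
        p += len;   /* Skip the whole token, even if it was truncated. */
    }
}

/* The fix in spirit: never hand an empty list to the session code. */
static void
ensure_one_remote_sketch(struct remote_list_sketch *rl)
{
    if (rl->n == 0) {
        strcpy(rl->names[0], "invalid:");   /* Hypothetical dummy remote. */
        rl->n = 1;
    }
}
```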

Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
3 years agodatapath-windows, conntrack: Fix conntrack new state
Rui Cao [Tue, 23 Jun 2020 06:46:22 +0000 (06:46 +0000)]
datapath-windows, conntrack: Fix conntrack new state

On Windows, if we send a connection setup packet in one direction
twice, the connection moves to the established state. The same
issue occurred in the Linux userspace conntrack module and has
been fixed there.

This patch ports the following earlier fixes to the Windows datapath
to fix the issue:
a867c010ee9183885ee9d3eb76a0005c075c4d2e
ac23d20fc90da3b1c9b2117d1e22102e99fba006
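The principle behind these fixes, as connection tracking generally applies it, is that a connection counts as established only after a packet has been seen in the reply direction; repeating a setup packet in the original direction must leave it new. A minimal sketch of that rule follows. The names are hypothetical, and the state is deliberately reduced to a single flag instead of a full TCP state machine.

```c
#include <assert.h>
#include <stdbool.h>

enum ct_dir_sketch { CT_DIR_ORIGINAL, CT_DIR_REPLY };

/* Hypothetical, heavily reduced connection entry. */
struct conn_sketch {
    bool seen_reply;    /* Set once any packet is seen in the reply direction. */
};

/* Updates the entry for one packet and reports whether that packet should be
 * classified as established (+est) rather than new (+new). */
static bool
conn_packet_is_established(struct conn_sketch *conn, enum ct_dir_sketch dir)
{
    if (dir == CT_DIR_REPLY) {
        conn->seen_reply = true;
    }
    return conn->seen_reply;
}
```

Sending the setup packet twice in the original direction returns false both times; only the first reply (and everything after it) is reported as established.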

Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Rui Cao <rcao@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
3 years agobridge: Fix null dereference on ct_timeout_policy record
Yi-Hung Wei [Fri, 26 Jun 2020 18:21:06 +0000 (11:21 -0700)]
bridge: Fix null dereference on ct_timeout_policy record

According to vswitch.ovsschema, each CT_Zone record may have
zero or one associated CT_Timeout_Policy.  Thus, this patch
checks whether ovsrec_ct_timeout_policy exists before accessing
the record.
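A minimal sketch of the guarded access, assuming the zero-or-one reference is exposed as a pointer that is NULL when no policy is attached. The structs, the field name and the fall-back-to-default behaviour are hypothetical; the real bridge code may instead simply skip such zones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a CT_Timeout_Policy row. */
struct ct_timeout_policy_sketch {
    unsigned int tcp_established;   /* Timeout in seconds. */
};

/* Hypothetical stand-in for a CT_Zone row: references zero or one policy. */
struct ct_zone_sketch {
    unsigned int zone_id;
    const struct ct_timeout_policy_sketch *timeout_policy;  /* May be NULL. */
};

/* Returns the zone's TCP established timeout, or 'dflt' when the zone has
 * no timeout policy attached. */
static unsigned int
zone_tcp_established_timeout(const struct ct_zone_sketch *zone,
                             unsigned int dflt)
{
    if (!zone->timeout_policy) {
        return dflt;            /* Zero policies: do not dereference. */
    }
    return zone->timeout_policy->tcp_established;
}
```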

VMWare-BZ: 2585825
Fixes: 45339539f69d ("ovs-vsctl: Add conntrack zone commands.")
Fixes: 993cae678bca ("ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tables")
Reported-by: Yang Song <yangsong@vmware.com>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
3 years agorhel: Fix syntax error when matching version.
William Tu [Mon, 22 Jun 2020 15:54:13 +0000 (08:54 -0700)]
rhel: Fix syntax error when matching version.

Remove the extra 'fi' in the script.

VMware-BZ: #2582834
Fixes: fecb28051b35 ("rhel: Support RHEL 7.8 kernel module rpm build.")
Reported-by: Abhijeet Malawade <amalawade@vmware.com>
Acked-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
3 years agoAUTHORS: Add Sriharsha Basavapatna.
Ilya Maximets [Mon, 22 Jun 2020 12:07:33 +0000 (14:07 +0200)]
AUTHORS: Add Sriharsha Basavapatna.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agonetdev-offload-dpdk: Support offload of VLAN PUSH/POP actions.
Sriharsha Basavapatna [Fri, 29 May 2020 06:33:05 +0000 (02:33 -0400)]
netdev-offload-dpdk: Support offload of VLAN PUSH/POP actions.

Parse VLAN PUSH/POP OVS datapath actions and add respective RTE actions.
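In DPDK's rte_flow API, OF_PUSH_VLAN carries only the TPID (EtherType), while the VLAN ID and priority are set by separate OF_SET_VLAN_VID and OF_SET_VLAN_PCP actions; OF_POP_VLAN has no argument. A datapath push_vlan, which carries a TPID and a full TCI, therefore naturally expands into several RTE actions. The sketch below shows that decomposition with hypothetical stand-in types instead of the real DPDK structures, using host byte order for simplicity; it is not the code from netdev-offload-dpdk.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the rte_flow action types involved. */
enum rte_action_sketch {
    SK_OF_PUSH_VLAN,        /* Argument: TPID (EtherType), e.g. 0x8100. */
    SK_OF_SET_VLAN_VID,     /* Argument: 12-bit VLAN ID. */
    SK_OF_SET_VLAN_PCP,     /* Argument: 3-bit priority. */
    SK_OF_POP_VLAN,         /* No argument. */
};

struct rte_action_entry_sketch {
    enum rte_action_sketch type;
    uint16_t arg;
};

/* Expands push_vlan(tpid, tci) into three actions and returns the count.
 * TCI layout: PCP in bits 15-13, CFI/DEI in bit 12, VID in bits 11-0. */
static size_t
translate_push_vlan_sketch(uint16_t tpid, uint16_t tci,
                           struct rte_action_entry_sketch out[3])
{
    out[0].type = SK_OF_PUSH_VLAN;
    out[0].arg = tpid;
    out[1].type = SK_OF_SET_VLAN_VID;
    out[1].arg = (uint16_t) (tci & 0x0fff);
    out[2].type = SK_OF_SET_VLAN_PCP;
    out[2].arg = (uint16_t) ((tci >> 13) & 0x7);
    return 3;
}

/* pop_vlan maps one-to-one. */
static size_t
translate_pop_vlan_sketch(struct rte_action_entry_sketch out[1])
{
    out[0].type = SK_OF_POP_VLAN;
    out[0].arg = 0;
    return 1;
}
```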

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Acked-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agoofproto-dpif.at: Add unit test for lb_output action.
Matteo Croce [Mon, 25 May 2020 18:05:15 +0000 (20:05 +0200)]
ofproto-dpif.at: Add unit test for lb_output action.

Extend the balance-tcp test so that it also tests the lb_output action.
The test checks that the option is shown in bond/show,
and that the lb_output action is programmed in the datapath.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
3 years agouserspace: Avoid dp_hash recirculation for balance-tcp bond mode.
Vishal Deep Ajmera [Fri, 22 May 2020 08:50:05 +0000 (10:50 +0200)]
userspace: Avoid dp_hash recirculation for balance-tcp bond mode.

Problem:

In OVS, flows with output over a bond interface of type “balance-tcp”
get translated by the ofproto layer into "HASH" and "RECIRC" datapath
actions. After recirculation, the packet is forwarded to the bond
member port based on 8 bits of the datapath hash value computed through
dp_hash. This causes performance degradation in the following ways:

1. The recirculation of the packet implies another lookup of the
packet’s flow key in the exact match cache (EMC) and potentially in the
megaflow classifier (DPCLS). This is the biggest cost factor.

2. The recirculated packets have a new “RSS” hash and compete with the
original packets for the scarce EMC slots. This implies more EMC misses
and potentially EMC thrashing, causing costly DPCLS lookups.

3. The 256 extra megaflow entries per bond for dp_hash bond selection
put additional load on the revalidation threads.

Owing to this performance degradation, deployments stick to “balance-slb”
bond mode even though it does not do active-active load balancing for
VXLAN- and GRE-tunnelled traffic, because all tunnel packets have the
same source MAC address.

Proposed optimization:

This proposal introduces a new load-balancing output action instead of
recirculation.

Maintain one table per bond (it could just be an array of uint16's) and
program it from the ofproto layer for each possible hash value (256
entries), the same way internal flows are created today. Use this table
to load-balance flows as part of output action processing.
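The per-bond table can be sketched as 256 uint16_t entries, one per 8-bit hash value, each naming the member port that bucket is sent to. The round-robin assignment in bond_table_program_sketch() is a hypothetical stand-in for whatever the ofproto layer actually computes (it rebalances buckets according to load); only the lookup by the low 8 bits of the hash mirrors the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BOND_BUCKETS 256    /* One entry per possible 8-bit hash value. */

/* Hypothetical per-bond table: bucket index -> member port number. */
struct bond_table_sketch {
    uint16_t member_of_bucket[BOND_BUCKETS];
};

/* Programs every bucket, spreading them round-robin over the members
 * (illustrative only; not the ofproto rebalancing logic). */
static void
bond_table_program_sketch(struct bond_table_sketch *t,
                          const uint16_t *members, size_t n_members)
{
    assert(n_members > 0);
    for (size_t i = 0; i < BOND_BUCKETS; i++) {
        t->member_of_bucket[i] = members[i % n_members];
    }
}

/* Output-time lookup: the low 8 bits of the hash select the bucket. */
static uint16_t
bond_table_select_sketch(const struct bond_table_sketch *t, uint32_t hash)
{
    return t->member_of_bucket[hash & (BOND_BUCKETS - 1)];
}
```

Because the table is a plain array indexed by the hash, selecting a member costs one memory access instead of a recirculation and a second classifier lookup.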

Currently xlate_normal() -> output_normal() ->
bond_update_post_recirc_rules() -> bond_may_recirc() and
compose_output_action__() generate 'dp_hash(hash_l4(0))' and
'recirc(<RecircID>)' actions. In this case the RecircID identifies the
bond. For the recirculated packets the ofproto layer installs megaflow
entries that match on RecircID and masked dp_hash and send them to the
corresponding output port.

Instead, we now generate the action
    'lb_output(<bond id>)'

This combines hash computation (only if needed, else re-use RSS hash)
and inline load-balancing over the bond. This action is used *only* for
balance-tcp bonds in userspace datapath (the OVS kernel datapath
remains unchanged).
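The "compute the hash only if needed, else re-use the RSS hash" step can be sketched as below. The packet struct is hypothetical, and FNV-1a over the 5-tuple is used purely as an illustrative fallback; it is not the hash function OVS computes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packet metadata: a NIC-provided RSS hash may be present. */
struct pkt_sketch {
    bool rss_valid;
    uint32_t rss_hash;
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t proto;
};

/* Mixes the four bytes of 'v' into an FNV-1a state. */
static uint32_t
fnv1a_u32(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xff;
        h *= 16777619u;
    }
    return h;
}

/* Illustrative L4 hash over the 5-tuple (not the OVS hash function). */
static uint32_t
l4_hash_sketch(const struct pkt_sketch *p)
{
    uint32_t h = 2166136261u;

    h = fnv1a_u32(h, p->src_ip);
    h = fnv1a_u32(h, p->dst_ip);
    h = fnv1a_u32(h, ((uint32_t) p->src_port << 16) | p->dst_port);
    return fnv1a_u32(h, p->proto);
}

/* Returns the hash used for bucket selection: the RSS hash when available,
 * otherwise a freshly computed one, cached in the packet for later use. */
static uint32_t
lb_output_hash_sketch(struct pkt_sketch *p)
{
    if (!p->rss_valid) {
        p->rss_hash = l4_hash_sketch(p);
        p->rss_valid = true;
    }
    return p->rss_hash;
}
```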

Example:
Current scheme:

With 8 UDP flows (with random UDP src port):

  flow-dump from pmd on cpu core: 2
  recirc_id(0),in_port(7),<...> actions:hash(hash_l4(0)),recirc(0x1)

  recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),<...> actions:2
  recirc_id(0x1),dp_hash(0xb236c260/0xff),<...> actions:1
  recirc_id(0x1),dp_hash(0x7d89eb18/0xff),<...> actions:1
  recirc_id(0x1),dp_hash(0xa78d75df/0xff),<...> actions:2
  recirc_id(0x1),dp_hash(0xb58d846f/0xff),<...> actions:2
  recirc_id(0x1),dp_hash(0x24534406/0xff),<...> actions:1
  recirc_id(0x1),dp_hash(0x3cf32550/0xff),<...> actions:1

New scheme:
A single flow entry suffices (for any number of new flows):

  in_port(7),<...> actions:lb_output(1)

A new command has been added to dump the datapath bond cache, as shown below.

 # ovs-appctl dpif-netdev/bond-show [dp]

   Bond cache:
     bond-id 1 :
       bucket 0 - slave 2
       bucket 1 - slave 1
       bucket 2 - slave 2
       bucket 3 - slave 1

Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-offload-tc: Revert tunnel src/dst port masks handling
Roi Dayan [Tue, 16 Jun 2020 13:03:57 +0000 (16:03 +0300)]
netdev-offload-tc: Revert tunnel src/dst port masks handling

The cited commit intended to add tc support for masking tunnel src/dst
IPs and ports. It's not possible to mask tunnel ports with OpenFlow
rules, and the default mask for tunnel ports is set to 0 in
tnl_wc_init(), unlike the default mask for tunnel IPs, which is a full
mask. So instead of never passing tunnel ports to tc, revert the
changes to tunnel ports and always pass the tunnel port.
In software, classification is done by the kernel, but for hardware we
must match on the tunnel dst port.
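The difference between the two approaches can be illustrated with a small sketch. It assumes, as stated above, that tnl_wc_init() leaves the OpenFlow mask for tunnel ports at 0: forwarding that mask to tc means the port is effectively never matched, whereas the reverted behaviour passes the port as an exact (full-mask) match so that hardware can key on it. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tc flower tunnel destination port match. */
struct tc_tnl_port_match_sketch {
    uint16_t tp_dst;
    uint16_t tp_dst_mask;   /* 0 = wildcarded, 0xffff = exact match. */
};

/* Masked approach (the cited commit): forward the OpenFlow mask as is. */
static struct tc_tnl_port_match_sketch
tnl_port_match_masked(uint16_t tp_dst, uint16_t of_mask)
{
    struct tc_tnl_port_match_sketch m = {
        (uint16_t) (tp_dst & of_mask), of_mask
    };
    return m;
}

/* Reverted approach: always pass the tunnel port with a full mask. */
static struct tc_tnl_port_match_sketch
tnl_port_match_exact(uint16_t tp_dst)
{
    struct tc_tnl_port_match_sketch m = { tp_dst, 0xffff };
    return m;
}
```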

Fixes: 5f568d049130 ("netdev-offload-tc: Allow to match the IP and port mask of tunnel")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agorhel: Support RHEL 7.8 kernel module rpm build.
William Tu [Wed, 17 Jun 2020 15:38:40 +0000 (08:38 -0700)]
rhel: Support RHEL 7.8 kernel module rpm build.

Add support for the RHEL 7.8 GA release with kernel 3.10.0-1127.

VMware-BZ: #2582834
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agodocs: Add note for AF_XDP installation
Yi-Hung Wei [Tue, 9 Jun 2020 17:42:12 +0000 (10:42 -0700)]
docs: Add note for AF_XDP installation

Add notes about some configuration issues when enabling AF_XDP
support.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoofproto-dpif-trace: Improve NAT tracing.
Dumitru Ceara [Fri, 10 Jan 2020 09:34:43 +0000 (10:34 +0100)]
ofproto-dpif-trace: Improve NAT tracing.

When ofproto/trace detects a recirc action, it resumes execution at the
specified next table. However, if the ct action performs SNAT/DNAT,
e.g., ct(commit,nat(src=1.1.1.1:4000),table=42), the src/dst IPs and
ports in the oftrace_recirc_node->flow field are not updated. This leads
to misleading outputs from ofproto/trace as real packets would actually
first get NATed and might match different flows when recirculated.

Assume the first IP/port from the NAT src/dst action will be used by
conntrack for the translation and update the oftrace_recirc_node->flow
accordingly. This is not entirely correct as conntrack might choose a
different IP/port but the result is more realistic than before.

This fix covers new connections. However, for reply traffic that executes
actions of the form ct(nat, table=42) we still don't update the flow as
we don't have any information about conntrack state when tracing.
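The approximation can be sketched as follows, assuming a NAT action carries an address range and an optional port range: for SNAT the trace rewrites the source IP/port to the first address/port of the range, for DNAT the destination, and a bare ct(nat) with no range (as used for reply traffic) leaves the flow unchanged. The structs and the function are hypothetical and are not the actual ofproto_trace_recirc_node() code; values are in host byte order for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the traced flow (host byte order). */
struct trace_flow_sketch {
    uint32_t nw_src, nw_dst;
    uint16_t tp_src, tp_dst;
};

/* Hypothetical parsed nat(...) argument of a ct action. */
struct nat_spec_sketch {
    bool has_range;         /* false for a bare ct(nat). */
    bool is_snat;           /* nat(src=...) vs nat(dst=...). */
    uint32_t ip_min, ip_max;
    bool has_ports;
    uint16_t port_min, port_max;
};

/* Applies the "first IP/port of the range is used" approximation. */
static void
trace_apply_nat_sketch(struct trace_flow_sketch *flow,
                       const struct nat_spec_sketch *nat)
{
    if (!nat->has_range) {
        return;     /* No conntrack state is available while tracing. */
    }
    if (nat->is_snat) {
        flow->nw_src = nat->ip_min;
        if (nat->has_ports) {
            flow->tp_src = nat->port_min;
        }
    } else {
        flow->nw_dst = nat->ip_min;
        if (nat->has_ports) {
            flow->tp_dst = nat->port_min;
        }
    }
}
```

For ct(commit,nat(src=1.1.1.1:4000),table=42), this sets nw_src to 1.1.1.1 and tp_src to 4000 in the flow used for the recirculated trace.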

Also move the oftrace_recirc_node processing out of ofproto_trace()
and to its own function, ofproto_trace_recirc_node() for better
readability.

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoodp-util.c: Fix dp_hash execution with slowpath actions.
Han Zhou [Fri, 15 May 2020 07:17:47 +0000 (00:17 -0700)]
odp-util.c: Fix dp_hash execution with slowpath actions.

When dp_hash is executed with slowpath actions, it results in an endless
recirc loop in the kernel datapath, which finally drops the packet, with
kernel logs:

openvswitch: ovs-system: deferred action limit reached, drop recirc action

The root cause is that the dp_hash value calculated by the slowpath is not
passed to the datapath when executing the recirc action. Thus, when the
miss upcall for the recirculated packet comes to userspace again, it
generates the dp_hash and recirc actions again with the same recirc_id,
which in turn generates a megaflow whose recirc action uses the same
recirc_id as its match condition, causing a loop in the datapath.

For example, this can be reproduced with the following OVN setup:

                         LS1            LS2
                          |              |
                          |------R1------|
        VIF--LS0---R0-----|              |------R3
                          |------R2------|

Assume there is a route from the VIF to R3: R0 -> R1 -> R3, and there are two
routes (ECMP) from R3 to the VIF:
R3 -> R1 -> R0
R3 -> R2 -> R0

Now if we ping from the VIF to R3, the OVS flow execution on the HV of the VIF
will hit R3's datapath, which has flows that respond to the ICMP packet
by setting ICMP fields (requiring slowpath actions), and in later flow
tables it will hit the "group" action that selects between the ECMP routes.

By default OVN uses "dp_hash" method for the "group" action.

For the first miss upcall packet, the dp_hash value is empty, so the group
action will be translated to "dp_hash" and "recirc".

During action execution, because of the previous actions that set ICMP fields,
the whole execution requires slowpath, so it tries to execute all actions in
userspace in odp_execute_actions(), including the dp_hash action, except the
recirc action, which can only be executed in the datapath. So the dp_hash value
is calculated in userspace, and then the packet is injected into the datapath
for recirc action execution.

However, the dp_hash calculated in userspace is not passed to the datapath.

Because of this, the packet recirculated in the datapath has no dp_hash value,
and the miss upcall for the recirculated packet hits the same flow tables and
triggers the same "dp_hash" and "recirc" actions again, with exactly the same
recirc_id!

This time, the new upcall doesn't require any slowpath execution, so both
the dp_hash and recirc actions are executed in datapath, after creating a
datapath megaflow like:

recirc_id(XYZ),..., actions:hash(l4(0)),recirc(XYZ)

whose match recirc_id equals the recirc_id in its actions, thus creating a loop.

This patch fixes the problem by passing the calculated dp_hash value to
datapath in odp_key_from_dp_packet().
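The loop and its fix can be modeled in a few lines. In this sketch (all names hypothetical, not the real odp-util.c code), a dp_hash of 0 means "not computed", the key handed back to the datapath either carries the userspace-computed dp_hash or not, and translation emits hash and recirc again whenever the incoming key lacks a dp_hash. Without passing the hash, the recirculated packet looks exactly like the original one and triggers the same recirc again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packet metadata held in userspace during slowpath execution. */
struct pkt_md_sketch {
    uint32_t recirc_id;
    uint32_t dp_hash;       /* 0 means "not computed" in this sketch. */
};

/* Hypothetical flow key handed back to the datapath with the packet. */
struct odp_key_sketch {
    uint32_t recirc_id;
    bool has_dp_hash;
    uint32_t dp_hash;
};

/* Builds the key; 'pass_dp_hash' selects the behaviour after the fix. */
static void
key_from_packet_sketch(const struct pkt_md_sketch *md,
                       struct odp_key_sketch *key, bool pass_dp_hash)
{
    key->recirc_id = md->recirc_id;
    key->has_dp_hash = pass_dp_hash && md->dp_hash != 0;
    key->dp_hash = key->has_dp_hash ? md->dp_hash : 0;
}

/* Models translation of the "group" action: without a dp_hash in the key,
 * it must emit hash + recirc again, with the same recirc_id. */
static bool
needs_hash_and_recirc(const struct odp_key_sketch *key)
{
    return !key->has_dp_hash;
}
```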

Fixes: 572f732ab078 ("dpif-netdev: user space datapath recirculation")
Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovsdb idl: Try committing the pending txn in ovsdb_idl_loop_run.
Numan Siddique [Fri, 5 Jun 2020 08:30:29 +0000 (14:00 +0530)]
ovsdb idl: Try committing the pending txn in ovsdb_idl_loop_run.

The function ovsdb_idl_loop_run(), after calling ovsdb_idl_run(),
returns a transaction object (of type 'struct ovsdb_idl_txn').
The returned transaction object can be NULL if there is a pending
transaction (loop->committing_txn) in the idl loop object.

Normally, clients of the IDL library first call ovsdb_idl_loop_run(),
then do their own processing, creating any IDL transactions during
this processing, and finally call ovsdb_idl_loop_commit_and_wait().

If ovsdb_idl_loop_run() returns a NULL transaction object, much of
the processing done by the client is wasted, as in the case of
ovn-controller.

The client (in this case ovn-controller) can skip the processing
and instead call ovsdb_idl_loop_commit_and_wait() if the transaction
object is NULL. But ovn-controller uses IDL tracking, and it may
lose the tracked changes in that run.

This patch tries to improve this scenario by checking whether the
pending transaction can be committed in ovsdb_idl_loop_run()
itself. If the pending transaction is cleared (because ovsdb-server
has responded to the transaction message sent in the previous run),
ovsdb_idl_loop_run() can return a valid transaction object.
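A reduced model of the change, under stated assumptions: a transaction id of 0 plays the role of the NULL transaction object, and the "reply received" flag stands in for ovsdb-server's response to the previous run's transaction. The struct and functions are hypothetical and are not the ovsdb-idl.c API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced model of an IDL loop. */
struct idl_loop_sketch {
    bool committing;        /* A transaction awaits the server's reply. */
    bool reply_received;    /* The reply for it has already arrived. */
    int next_txn_id;        /* Last transaction id handed out. */
};

/* Returns a new transaction id, or 0 (like a NULL txn) while an earlier
 * transaction is still pending.  With 'try_commit' set, a pending
 * transaction whose reply has already arrived is retired here first,
 * which is the improvement described above. */
static int
idl_loop_run_sketch(struct idl_loop_sketch *loop, bool try_commit)
{
    if (try_commit && loop->committing && loop->reply_received) {
        loop->committing = false;
        loop->reply_received = false;
    }
    return loop->committing ? 0 : ++loop->next_txn_id;
}

/* Stand-in for committing the transaction at the end of a run. */
static void
idl_loop_commit_sketch(struct idl_loop_sketch *loop, int txn_id)
{
    if (txn_id) {
        loop->committing = true;
        loop->reply_received = false;
    }
}
```

When the reply has already arrived, the old behaviour (try_commit false) still hands the client a NULL-like 0 for that run, while the patched behaviour returns a usable transaction immediately.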

CC: Han Zhou <hzhou@ovn.org>
Signed-off-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>