git.proxmox.com Git - ovs.git/log

poc: Introduce Proof of Concepts (Package building)

This patch sets up foundations for Proof of Concepts that
simply materialize documentation into Ansible instructions
executed in virtualized Vagrant environment.

This Proof of Concept makes it easy to build:
1. *.deb packages on Ubuntu 16.04; AND
2. *.rpm packages on CentOS 7.4.
It also sets up DEB and RPM repositories over HTTP that can
be used to pull these openvswitch packages with apt-get
or yum from another host.

This particular Proof of Concept is intended to address
the following use cases:
1. for new OVS users to see how debian and rpm packages are
   built;
2. for developers to easily check for packaging build
   regressions;
3. for developers to easily share their sandbox builds
   into QE setups (as opposed to manually copying binaries);
4. for developers to add other Proof of Concepts
   that may require full end-to-end integration
   with other third-party projects (e.g. DPI, libvirt, IPsec)
   and need Open vSwitch packages.

Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ansis Atteka <aatteka@ovn.org>

datapath: use ktime_get_ts64() instead of ktime_get_ts()

Upstream commit:
    commit 311af51dcb5629f04976a8e451673f77e3301041
    Author: Arnd Bergmann <arnd@arndb.de>
    Date:   Mon Nov 27 12:41:38 2017 +0100

    openvswitch: use ktime_get_ts64() instead of ktime_get_ts()

    timespec is deprecated because of the y2038 overflow, so let's convert
    this one to ktime_get_ts64(). The code is already safe even on 32-bit
    architectures, since it uses monotonic times. On 64-bit architectures,
    nothing changes, while on 32-bit architectures this avoids one
    type conversion.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Added a compatibility check for whether ktime_get_ts64() exists.
If it does not, then just continue using ktime_get_ts(). I added a new
compatibility header file "timekeeping.h".

Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>

datapath: fix the incorrect flow action alloc size

Upstream commit:
    commit 67c8d22a73128ff910e2287567132530abcf5b71
    Author: zhangliping <zhangliping02@baidu.com>
    Date:   Sat Nov 25 22:02:12 2017 +0800

    openvswitch: fix the incorrect flow action alloc size

    If we want to add a datapath flow that has more than 500 vxlan output
    actions, we will get the following error reports:
      openvswitch: netlink: Flow action size 32832 bytes exceeds max
      openvswitch: netlink: Flow action size 32832 bytes exceeds max
      openvswitch: netlink: Actions may not be safe on all matching packets
      ... ...

    It seems that we can simply enlarge the MAX_ACTIONS_BUFSIZE to fix it, but
    this is not the root cause. For example, for a vxlan output action, we need
    about 60 bytes for the nlattr, but after it is converted to the flow
    action, it only occupies 24 bytes. This means that we can still support
    more than 1000 vxlan output actions for a single datapath flow under
    the current 32k max limitation.

    So even if the nla_len(attr) is larger than MAX_ACTIONS_BUFSIZE, we
    shouldn't report EINVAL but should let it proceed, as the check can be
    done by reserve_sfa_size.
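
The arithmetic in the message above can be checked with a short sketch. The per-action sizes (about 60 and 24 bytes) and the 32k limit are taken from the message; the function names are invented for illustration, and this is a Python model, not the kernel code.

```python
# Rough arithmetic from the commit message; not kernel code.
NLATTR_BYTES_PER_VXLAN_OUTPUT = 60   # approximate netlink size, per the message
SFA_BYTES_PER_VXLAN_OUTPUT = 24      # size after conversion to a flow action
MAX_ACTIONS_BUFSIZE = 32 * 1024      # the "32k" limit


def netlink_size(n_outputs):
    """Approximate size of the incoming netlink attribute (hypothetical helper)."""
    return n_outputs * NLATTR_BYTES_PER_VXLAN_OUTPUT


def converted_size(n_outputs):
    """Approximate size once converted to flow actions (hypothetical helper)."""
    return n_outputs * SFA_BYTES_PER_VXLAN_OUTPUT


def max_outputs_after_conversion():
    """How many converted vxlan output actions fit under the limit."""
    return MAX_ACTIONS_BUFSIZE // SFA_BYTES_PER_VXLAN_OUTPUT


if __name__ == "__main__":
    n = 1000
    print(netlink_size(n) > MAX_ACTIONS_BUFSIZE)     # netlink form exceeds 32k
    print(converted_size(n) <= MAX_ACTIONS_BUFSIZE)  # converted form still fits
    print(max_outputs_after_conversion())
```

Under these approximate sizes, rejecting on the netlink length alone would refuse flows whose converted actions still fit in the buffer.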

Signed-off-by: zhangliping <zhangliping02@baidu.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: zhangliping <zhangliping02@baidu.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>

datapath: fix data type in queue_gso_packets

Upstream commit:
    commit 2734166e89639c973c6e125ac8bcfc2d9db72b70
    Author: Gustavo A. R. Silva <garsilva@embeddedor.com>
    Date:   Sat Nov 25 13:14:40 2017 -0600

    net: openvswitch: datapath: fix data type in queue_gso_packets

    gso_type is being used in binary AND operations together with SKB_GSO_UDP.
    The issue is that variable gso_type is of type unsigned short and
    SKB_GSO_UDP expands to more than 16 bits:

    SKB_GSO_UDP = 1 << 16

    This makes any binary AND operation between gso_type and SKB_GSO_UDP
    always zero, hence making some code unreachable and likely causing
    undesired behavior.

    Fix this by changing the data type of variable gso_type to unsigned int.
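
The truncation can be modelled in a few lines. This is a Python sketch that imitates 16-bit and 32-bit unsigned storage with masks; the helper names are hypothetical and only the SKB_GSO_UDP value comes from the message.

```python
SKB_GSO_UDP = 1 << 16  # value quoted in the commit message


def as_unsigned_short(value):
    """Model storing a value in a 16-bit unsigned variable."""
    return value & 0xFFFF


def as_unsigned_int(value):
    """Model storing a value in a 32-bit unsigned variable."""
    return value & 0xFFFFFFFF


def udp_gso_flag_set(stored_gso_type):
    """The kind of test the datapath performs on gso_type."""
    return (stored_gso_type & SKB_GSO_UDP) != 0


if __name__ == "__main__":
    # Bit 16 does not survive 16-bit storage, so the test can never succeed.
    print(udp_gso_flag_set(as_unsigned_short(SKB_GSO_UDP)))
    print(udp_gso_flag_set(as_unsigned_int(SKB_GSO_UDP)))
```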

    Addresses-Coverity-ID: 1462223
Fixes: 0c19f846d582 ("net: accept UFO datagrams from tuntap and packet")
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

While backporting this I found another couple of instances of the
same issue so I fixed them up as well.

Cc: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>

datapath: Fix an error handling path in 'ovs_nla_init_match_and_action()'

Upstream commit:
commit 5829e62ac17a40ab08c1b905565604a4b5fa7af6
Author: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Date:   Mon Sep 11 21:56:20 2017 +0200

    openvswitch: Fix an error handling path in 'ovs_nla_init_match_and_action()'

    All other error handling paths in this function go through the 'error'
    label. This one should do the same.

Fixes: 9cc9a5cb176c ("datapath: Avoid using stack larger than 1024.")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Fixes: 850c2a4d1a ("datapath: Avoid using stack larger than 1024.")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>

compat: Fix compiler headers

Since Linux kernel upstream commit d15155824c50
("linux/compiler.h: Split into compiler.h and compiler_types.h") this
error check for the gcc compiler header is no longer valid. Remove it
so that openvswitch builds for Linux kernels 4.14.8 and later.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>

datapath: Fix SKB_GSO_UDP usage

Using SKB_GSO_UDP breaks the compilation on Linux 4.14. Check for
the HAVE_SKB_GSO_UDP compiler #define.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>

datapath: conntrack: make protocol tracker pointers const

Upstream commit:
    commit b3480fe059ac9121b5714205b4ddae14b59ef4be
    Author: Florian Westphal <fw@strlen.de>
    Date:   Sat Aug 12 00:57:08 2017 +0200

    netfilter: conntrack: make protocol tracker pointers const

    Doesn't change generated code, but will make it easier to eventually
    make the actual trackers themselves const.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>

compat:inet_frag.h: Check for frag_percpu_counter_batch

Fix up the compat layer to check for frag_percpu_counter_batch and
if not present then use atomic_sub and atomic_add as per the
backport in the 3.16.50 LTS kernel.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>

compat: Do not include headers when not compiling

If the entire file is not going to be compiled because OVS is using
upstream tunnel support then also don't bother pulling in the headers.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>

datapath: Fix netdev_master_upper_dev_link for 4.14

An extended netlink ack has been added for 4.14 - add compat layer
changes so that it compiles for all kernels up to and including
4.14.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>

tests: Don't include a newline in ovs_fatal() calls.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>

ovn: Allow DNS lookups over IPv6

There was a bug in DNS request handling where the incoming packet was
assumed to be IPv4.

The result was that for the outgoing packet, we would attempt to write
the IPv4 checksum and total length into what was actually an IPv6
header. This resulted in the source IPv6 address getting corrupted.
Later, the source and destination IPv6 addresses would get swapped,
resulting in the DNS response being sent to a nonsense destination.

With this change, we check the ethertype of the packet to determine what
l3 information to write, and where to write it. A test is also included
that verifies that this works as expected.
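
The idea of choosing the L3 fields by ethertype can be sketched as follows. This is a Python model, not the ovn-controller C code: the function names are hypothetical, the IPv4 header is assumed to carry no options, and the field offsets are the standard ones for IPv4 (total length at byte 2, header checksum at byte 10) and IPv6 (payload length at byte 4).

```python
import struct

ETH_TYPE_IP = 0x0800
ETH_TYPE_IPV6 = 0x86DD
IPV4_HEADER_LEN = 20   # assumption: no IPv4 options
IPV6_HEADER_LEN = 40


def ipv4_checksum(header):
    """Ones'-complement checksum over the 16-bit words of the header."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def fix_l3_lengths(ethertype, l3_packet):
    """Rewrite the length (and, for IPv4, the checksum) after the payload changed."""
    pkt = bytearray(l3_packet)
    if ethertype == ETH_TYPE_IP:
        struct.pack_into("!H", pkt, 2, len(pkt))      # IPv4 total length
        struct.pack_into("!H", pkt, 10, 0)            # clear old checksum
        csum = ipv4_checksum(bytes(pkt[:IPV4_HEADER_LEN]))
        struct.pack_into("!H", pkt, 10, csum)
    elif ethertype == ETH_TYPE_IPV6:
        # IPv6 has no header checksum; only the payload length changes.
        struct.pack_into("!H", pkt, 4, len(pkt) - IPV6_HEADER_LEN)
    else:
        raise ValueError("unsupported ethertype 0x%04x" % ethertype)
    return bytes(pkt)
```

In an IPv6 header, bytes 8 to 23 hold the source address, so writing an IPv4 checksum at byte 10 lands inside it. That matches the corruption described above.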

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1539608
Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

datapath: enable NSH support

Upstream commit:
  commit b2d0f5d5dc53532e6f07bc546a476a55ebdfe0f3
  Author: Yi Yang <yi.y.yang@intel.com>
  Date:   Tue Nov 7 21:07:02 2017 +0800

    openvswitch: enable NSH support

    The OVS master and 2.8 branches have merged the NSH userspace
    patch series. This patch enables NSH support in the kernel
    datapath so that OVS can support NSH in compat mode by
    porting it.

Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Eric Garver <e@erig.me>
Acked-by: Pravin Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>

datapath: nsh: add GSO support

Upstream commit:
  commit c411ed854584a71b0e86ac3019b60e4789d88086
  Author: Jiri Benc <jbenc@redhat.com>
  Date:   Mon Aug 28 21:43:24 2017 +0200

    nsh: add GSO support

    Add a new nsh/ directory. It currently holds only GSO functions but more
    will come: in particular, code shared by openvswitch and tc to manipulate
    NSH headers.

    For now, assume there's no hardware support for NSH segmentation. We can
    always introduce netdev->nsh_features later.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>

datapath: net: add NSH header structures and helpers

Upstream commit:
  commit 1f0b7744c50573df464ca33d8e5275be509f852b
  Author: Yi Yang <yi.y.yang@intel.com>
  Date:   Mon Aug 28 21:43:23 2017 +0200

    net: add NSH header structures and helpers

    NSH (Network Service Header)[1] is a new protocol for service
    function chaining. It can be handled as an L3 protocol like
    IPv4 and IPv6. Eth + NSH + Inner packet and VxLAN-gpe + NSH +
    Inner packet are two typical use cases.

    This patch adds NSH header structures and helpers for NSH GSO
    support and Open vSwitch NSH support.

    [1] https://datatracker.ietf.org/doc/draft-ietf-sfc-nsh/

    [Jiri: added nsh_hdr() helper and renamed the header struct to "struct
    nshhdr" to match the usual pattern. Removed packet type defines, these are
    now shared with VXLAN-GPE.]

Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>

datapath: vxlan: factor out VXLAN-GPE next protocol

Upstream commit:
  commit fa20e0e32cb3dfc1760b6254b64977f2fb5bd851
  Author: Jiri Benc <jbenc@redhat.com>
  Date:   Mon Aug 28 21:43:22 2017 +0200

    vxlan: factor out VXLAN-GPE next protocol

    The values are shared between VXLAN-GPE and NSH. Originally probably by
    coincidence but I notified both working groups about this last year and they
    seem to keep the values in sync since then.

    Hopefully they'll get a single IANA registry for the values, too. (I asked
    them for that.)

    Factor out the code to be shared by the NSH implementation.

    NSH and MPLS values are added in this patch, too. For MPLS, the drafts
    incorrectly assign only a single value, while we have two MPLS ethertypes.
    I raised the problem with both groups. For now, I assume the value is for
    unicast.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>

datapath: ether: add NSH ethertype

Upstream commit:
  commit 155e6f649757c902901e599c268f8b575ddac1f8
  Author: Jiri Benc <jbenc@redhat.com>
  Date:   Mon Aug 28 21:43:21 2017 +0200

    ether: add NSH ethertype

    The NSH draft says:

       An IEEE EtherType, 0x894F, has been allocated for NSH.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>

expr: Add additional invariant check in test.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Numan Siddique <nusiddiq@redhat.com>

expr: Make expr_sort() always yield an expr that satisfies invariants.

Expressions of type EXPR_T_AND are supposed to follow an invariant that
they have at least 2 clauses, but expr_sort() did not always follow that;
for example, applying it to (x[0] == 1 && x[1] == 1) yielded the 1-child
EXPR_T_AND expression x[0..1] == 3. This commit fixes the problem.

I don't know of any externally visible negative consequences for this
problem, but it made the code harder to reason about.
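
One way to keep such an invariant is to build AND nodes through a single constructor that collapses degenerate cases. The sketch below is a Python model with a made-up tuple representation, not the struct expr used by OVS, and it is not claimed to be how expr_sort() was actually fixed. Treating an empty conjunction as true is also an assumption of the sketch.

```python
def make_and(children):
    """Build an AND node so the result always has at least 2 clauses,
    or is not an AND node at all (hypothetical helper)."""
    if not children:
        return ("bool", True)          # assumption: empty conjunction is true
    if len(children) == 1:
        return children[0]             # hand back the lone clause itself
    return ("and", list(children))


def check_invariant(expr):
    """Return True if every AND node in the tree has 2 or more clauses."""
    if expr[0] == "and":
        return len(expr[1]) >= 2 and all(check_invariant(c) for c in expr[1])
    return True


if __name__ == "__main__":
    merged = ("cmp", "x[0..1] == 3")   # the merged comparison from the message
    print(check_invariant(("and", [merged])))   # the 1-child AND breaks the rule
    print(make_and([merged]) == merged)         # the constructor avoids it
```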

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Numan Siddique <nusiddiq@redhat.com>

expr: Fix some bad naming.

expr_is_cmp() was badly named because it didn't just check for whether its
argument was an EXPR_T_CMP node.

struct expr_sort's 'relop' member was badly named because it wasn't a
relational operator, it was a symbol.

This commit improves both names.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Numan Siddique <nusiddiq@redhat.com>

ovs-vsctl: Add commands "add-bond-iface" and "del-bond-iface".

It was not too hard to build these commands using the database commands,
but a few people have asked for them over the years, so here they are.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>

NEWS: Consolidate ovs-vswitchd sections and fix indentation.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>

ovs-vsctl: Use default socket name in tests.

By using the default socket name "db.sock", instead of "socket", we can
avoid passing --db=unix:socket to all the ovs-vsctl invocations, which is
kind of nice.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>

ovs-vsctl: Remove superfluous OVS_VSCTL_CLEANUP from tests.

Since on_exit was introduced a long, long time ago, it has no longer been
necessary to have individual calls to OVS_VSCTL_CLEANUP sprinkled
everywhere in the test code. This change makes the tests easier to read.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>

ovsdb-client: Add --timeout option.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Acked-by: Justin Pettit <jpettit@ovn.org>

json: Make it safe to pass null pointers to json_equal().

Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Acked-by: Justin Pettit <jpettit@ovn.org>

jsonrpc: Add comment for jsonrpc_msg_to_json().

From a glance at the prototype it wasn't obvious that it destroyed its
argument.

Signed-off-by: Ben Pfaff <blp@ovn.org>

odp-util: Always report ODP_FIT_TOO_LITTLE for IGMP.

OVS datapaths don't understand or parse IGMP fields, but OVS userspace
does, so this commit updates odp_flow_key_to_flow() to report that properly
to the caller.

Reported-by: Huanle Han <hanxueluo@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343665.html
Signed-off-by: Ben Pfaff <blp@ovn.org>

ofproto-dpif-upcall: Slow path flows that datapath can't fully match.

In the OVS architecture, when a datapath doesn't have a match for a packet,
it sends the packet and the flow that it extracted from it to userspace.
Userspace then examines the packet and the flow and compares them.
Commonly, the flow is the same as what userspace expects, given the packet,
but there are two other possibilities:

    - The flow lacks one or more fields that userspace expects to be there,
      that is, the datapath doesn't understand or parse them but userspace
      does.  This is, for example, what would happen if current OVS
      userspace, which understands and extracts TCP flags, were to be
      paired with an older OVS kernel module, which does not.  Internally
      OVS uses the name ODP_FIT_TOO_LITTLE for this situation.

    - The flow includes fields that userspace does not know about, that is,
      the datapath understands and parses them but userspace does not.
      This is, for example, what would happen if an old OVS userspace that
      does not understand or extract TCP flags, were to be paired with a
      recent OVS kernel module that does.  Internally, OVS uses the name
      ODP_FIT_TOO_MUCH for this situation.

The latter is not a big deal and OVS doesn't have to do much to cope with
it.

The former is more of a problem.  When the datapath can't match on all the
fields that OVS supports, it means that OVS can't safely install a flow at
all, other than one that directs packets to the slow path.  Otherwise, if
OVS did install a flow, it could match a packet that does not match the
flow that OVS intended to match and could cause the wrong behavior.

Somehow, this nuance was lost a long time ago.  From about 2013 until today,
it seems that OVS has ignored ODP_FIT_TOO_LITTLE.  Instead, it happily
installs a flow regardless of whether the datapath can actually fully match
it.  I imagine that this is rarely a problem because most of the time
the datapath and userspace are well matched, but it is still an important
problem to fix.  This commit fixes it, by forcing flows into the slow path
when the datapath cannot match specifically enough.
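
The decision described above can be modelled as a small sketch. This is Python rather than the OVS C code, it compares plain sets of field names instead of real flow keys, and the PERFECT member and both function names are chosen here for illustration. Only ODP_FIT_TOO_LITTLE and ODP_FIT_TOO_MUCH are named in the message.

```python
from enum import Enum


class Fit(Enum):
    PERFECT = "perfect"        # assumed name for the common, matching case
    TOO_MUCH = "too_much"      # datapath parsed fields userspace does not know
    TOO_LITTLE = "too_little"  # datapath missed fields userspace expects


def classify_fit(datapath_fields, userspace_fields):
    """Compare what the datapath extracted with what userspace expects."""
    dp, us = set(datapath_fields), set(userspace_fields)
    if us - dp:
        return Fit.TOO_LITTLE
    if dp - us:
        return Fit.TOO_MUCH
    return Fit.PERFECT


def must_slow_path(fit):
    """Only TOO_LITTLE forces the flow into the slow path."""
    return fit is Fit.TOO_LITTLE


if __name__ == "__main__":
    old_kernel = {"eth", "ip"}
    new_userspace = {"eth", "ip", "tcp_flags"}
    print(must_slow_path(classify_fit(old_kernel, new_userspace)))
```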

CC: Ethan Jackson <ejj@eecs.berkeley.edu>
Fixes: e79a6c833e0d ("ofproto: Handle flow installation and eviction in upcall.")
Reported-by: Huanle Han <hanxueluo@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343665.html
Signed-off-by: Ben Pfaff <blp@ovn.org>

Remove last mentions of 'facet' from comments.

How did these survive so long?! OVS hasn't had facets since 2013.

Signed-off-by: Ben Pfaff <blp@ovn.org>

datapath-windows: Specify platform arch during compilation

Newer compilers expect the platform architecture to be passed.

Signed-off-by: Shashank Ram <rams@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>

datapath-windows: Allow compiling all targets using SDK 10.0

Previously, Win8/8.1 targets would use SDK8.1. However, it's
recommended to use the newer SDK as newer VS versions typically
drop support for older SDKs later on. This patch adds support
to compile all targets (Win8/8.1/10) using the 10.0 SDK.

Note that this patch does not drop support for older SDKs.

Signed-off-by: Shashank Ram <rams@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>

netdev-linux: Report netdev change events when mac changed.

When the mac addr of a port on a bridge has been changed, for example,

$ ip link set dev eth0 address 00:11:22:33:44:55

we should reconfigure the datapath id and mac addr of the local port.
But currently openvswitch does not do that as expected.

A simple example of how to reproduce it:

$ ovs-vsctl add-br br0
$ ifconfig br0 # for example, mac is c6:c6:d7:46:b4:4b
$ ip link set dev br0 address 00:11:22:33:44:55
$ ifconfig br0 # mac of br0 will be 00:11:22:33:44:55

then repeat:
$ ip link set dev br0 address 00:11:22:33:44:55
$ ifconfig br0 # mac of br0 will be c6:c6:d7:46:b4:4b

This patch reports the mac changed event when ports changed, then
openvswitch will reconfigure the datapath id and mac addr of local
port.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

Makefile.am: Use correct path separator for Windows

Signed-off-by: Shashank Ram <rams@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

util: Use lookup table to optimize hexit_value().

Daniel Alvarez Sanchez reported a significant overall speedup in ovn-northd
due to a similar patch.
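
The lookup-table technique can be sketched as below. This is a Python model, not the C function from lib/util: the table is built once and each call is a single index operation. Returning -1 for non-hex input is an assumption of the sketch.

```python
def _build_table():
    """Precompute the value of every possible byte as a hex digit."""
    table = [-1] * 256
    for i, ch in enumerate("0123456789"):
        table[ord(ch)] = i
    for i, ch in enumerate("abcdef"):
        table[ord(ch)] = 10 + i
        table[ord(ch.upper())] = 10 + i
    return table


_HEXIT_TABLE = _build_table()


def hexit_value(byte):
    """Return 0..15 for a hex digit given as an int in 0..255, else -1."""
    return _HEXIT_TABLE[byte]


if __name__ == "__main__":
    print([hexit_value(ord(c)) for c in "09afAFg"])
```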

Reported-by: Daniel Alvarez Sanchez <dalvarez@redhat.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-February/046120.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Daniel Alvarez <dalvarez@redhat.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

datapath-windows: Add trace level logs in conntrack for invalid ct state.

Signed-off-by: Anand Kumar <kumaranand@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>

ovn-controller: Process logical flow matches before actions.

Otherwise, when the match field has "is_chassis_resident", and the match is
not actually resident, and the action has meter or group, the group/meter
ID is assigned even though it will never be used.
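
The ordering issue can be illustrated with a small sketch. This is a Python model with hypothetical names (IdAllocator, process_flow, and the dictionary keys), not the ovn-controller code: the match is evaluated first, and the allocator is only touched for flows that can actually apply.

```python
class IdAllocator:
    """Hands out small integer IDs by name (stand-in for group/meter IDs)."""

    def __init__(self):
        self.ids = {}

    def get(self, name):
        return self.ids.setdefault(name, len(self.ids) + 1)


def process_flow(flow, is_resident, allocator):
    """Evaluate the match first; allocate an ID only if the flow is kept."""
    if flow["requires_resident"] and not is_resident:
        return None                      # flow skipped, nothing allocated
    meter_id = allocator.get(flow["meter"]) if flow.get("meter") else None
    return {"match": flow["match"], "meter_id": meter_id}


if __name__ == "__main__":
    alloc = IdAllocator()
    flow = {"match": "ip4", "requires_resident": True, "meter": "m1"}
    print(process_flow(flow, False, alloc), alloc.ids)
    print(process_flow(flow, True, alloc), alloc.ids)
```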

Signed-off-by: Guoshuai Li <ligs@dtdream.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

ovn-controller: Document southbound database use and graceful termination.

A lot of people seem to think that "kill" gracefully terminates
ovn-controller, but it doesn't, so this documentation at least provides
something to point to.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Han Zhou <zhouhan@gmail.com>

datapath-windows: Optimize conntrack lock implementation.

Currently, there is one global lock for conntrack module, which protects
conntrack entries and conntrack table. All the NAT operations are
performed holding this lock.

This becomes inefficient as the number of conntrack entries grows.
With the new implementation, we will have two PNDIS_RW_LOCK_EX locks in
conntrack.

1. ovsCtBucketLock - one rw lock per bucket of the conntrack table,
which is shared by all the ct entries that belong to the same bucket.
2. lock - a rw lock in OVS_CT_ENTRY structure that protects the members
of conntrack entry.

Also, the OVS_CT_ENTRY structure will have a lock reference (bucketLockRef)
to the corresponding ovsCtBucketLock of the conntrack table.
We need this reference to retrieve ovsCtBucketLock from ct entry
for delete operation.
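
The locking layout described above can be sketched roughly as follows. This is a Python model, not the Windows datapath code: threading.Lock stands in for PNDIS_RW_LOCK_EX (so there is no reader/writer distinction), and the class and attribute names are hypothetical apart from the bucket-lock reference idea taken from the message.

```python
import threading


class CtEntry:
    def __init__(self, key, bucket_lock):
        self.key = key
        self.state = "new"
        self.lock = threading.Lock()         # protects this entry's members
        self.bucket_lock_ref = bucket_lock   # lets delete find the bucket lock


class CtTable:
    def __init__(self, n_buckets=8):
        self.buckets = [dict() for _ in range(n_buckets)]
        self.bucket_locks = [threading.Lock() for _ in range(n_buckets)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def add(self, key):
        i = self._index(key)
        with self.bucket_locks[i]:           # only this bucket is locked
            entry = CtEntry(key, self.bucket_locks[i])
            self.buckets[i][key] = entry
        return entry

    def delete(self, entry):
        with entry.bucket_lock_ref:          # no table-wide lock needed
            self.buckets[self._index(entry.key)].pop(entry.key, None)
```

Operations on entries in different buckets no longer contend for one global lock, which is the inefficiency the message describes.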

Signed-off-by: Anand Kumar <kumaranand@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>

datapath-windows: Add a global level RW lock for NAT

Currently NAT module relies on the existing conntrack lock.
This patch provides a basic lock implementation for NAT module
in conntrack.

Signed-off-by: Anand Kumar <kumaranand@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>

datapath-windows: Refactor conntrack code.

Some of the functions and code are refactored
so that the new conntrack lock can be implemented.

Signed-off-by: Anand Kumar <kumaranand@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>

AUTHORS: Add JunhanYan <juyan@redhat.com>.

Signed-off-by: Ben Pfaff <blp@ovn.org>

ofproto-dpif: Delete system tunnel interface when remove ovs bridge

When a user adds the first tunnel of a given type (e.g. the first VXLAN
tunnel) to an OVS bridge, OVS adds a vport of the same type to the
kernel datapath that backs the bridge. There is the corresponding
expectation that, when the last tunnel of that type is removed from the
OVS bridges, OVS would remove the vport that represents it from the
backing kernel datapath, but OVS was not doing that. This commit fixes
the problem.

There is not any major concern about the lingering tunnel interface, but
it's cleaner to delete it.

Fixes: 921c370a9df5 ("dpif-netlink: Probe for out-of-tree tunnels, decides used interface")
Signed-off-by: JunhanYan <juyan@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

AUTHORS: Add Frode Nordahl <frode.nordahl@gmail.com>.

Signed-off-by: Ben Pfaff <blp@ovn.org>

debian: Do not modify pre-existing defaults file

Currently, on installation or upgrade the openvswitch-switch deb package
will in some circumstances modify a pre-existing
/etc/default/openvswitch-switch configuration file.

This does not play well with modeling and configuration management tools
and may lead to unnecessary restarts of the openvswitch-switch service
after the initial restart done as part of the package upgrade. As
restarting the openvswitch-switch affects the datapath this is
something we should try to avoid.

I also believe the current behaviour to be in conflict with best practices
set out in the config files section of the
[Debian Policy](https://www.debian.org/doc/debian-policy/#s-config-files).

This commit addresses this by removing the part of the postinst script
that attempts to append missing documentation parts of the template
and leaves the installed defaults file alone when it exists.

Fixes: 0aaa379d99f4 ("Debian packaging: Add several new settings to /etc/default/openflow-switch.")
Signed-off-by: Frode Nordahl <frode.nordahl@gmail.com>
Reported-at: https://github.com/openvswitch/ovs-issues/issues/137
Signed-off-by: Ben Pfaff <blp@ovn.org>

ovn-nbctl: Add QoS commands.

This patch adds commands to add, delete, and list QoS rules on a
logical switch.

Signed-off-by: Guoshuai Li <ligs@dtdream.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

tests: Make OVS_WAIT_UNTIL and OVS_WAIT_WHILE failures easier to debug.

Until now, when OVS_WAIT_UNTIL or OVS_WAIT_WHILE ran, little information
was available: usually nothing at all in the log, unless the wait failed,
in which case there was a line number. This commit adds a note saying
what is being waited for in any case, and a message saying that the wait
failed if it does.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>

ovs-router: fix router entry cast

The offsetof(struct ovs_router_entry, cr) should always be 0,
thus the else statement should never be reached.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

doc: Added OVS Conntrack tutorial

OVS supports connection tracker related match fields and actions.
Added a tutorial to demonstrate the basic use cases for some of these
match fields and actions.

Signed-off-by: Ashish Varma <ashishvarma.ovs@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

Add unixctl option for ovn-northd

Signed-off-by: Venkata Anil <vkommadi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

learn: improve test case

The current learn test cases use only ovs-ofctl add/del flows.
This patch adds a new test case for learn with the delete_learned and
limit options enabled.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

Merge branch 'dpdk_merge' of https://github.com/istokes/ovs into HEAD

xlate: fix packets loopback caused by duplicate read of xcfgp.

Some functions, such as xlate_normal_mcast_send_mrouters, test xbundle
pointers for equality to avoid sending a packet back to the in bundle.
However, xbundle pointers for the same port obtained from different xcfgp
are unequal. This may lead to packet loopback.

This commit stores xcfgp on ctx at first and always uses the same xcfgp
during one packet process period.

Signed-off-by: Huanle Han <hanxueluo@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

ofctrl: Remove unused declaration.

Signed-off-by: Han Zhou <zhouhan@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

ovn-nbctl: update manpage for lsp-set-type.

Signed-off-by: Han Zhou <zhouhan@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

ovs-vswitchd: Avoid or suppress memory leak warning for glibc aio.

The asynchronous IO library in glibc starts threads that show up as memory
leaks in valgrind. This commit attempts to avoid the warnings by flushing
all the asynchronous I/O to the log file before exiting. This only does
part of the job for glibc since it keeps the threads around for some
undefined idle time before killing them, so in addition this commit adds a
valgrind suppression to stop displaying these warnings in any case.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>

ovs-vswitchd: Fire RCU callbacks before exit to reduce memory leak warnings.

ovs-vswitchd makes extensive use of RCU to defer freeing memory past the
latest time that it could be in use by a thread.  Until now, ovs-vswitchd
has not waited for RCU callbacks to fire before exiting.  This meant that
in many cases, when ovs-vswitchd exits, many blocks of memory are stuck in
RCU callback queues, which valgrind often reports as "possible" memory
leaks.

This commit adds a new function ovsrcu_exit() that waits and fires as many
RCU callbacks as it reasonably can.  It can only do so for the thread that
calls it and the thread that calls the callbacks, but generally speaking
ovs-vswitchd shuts down other threads before it exits anyway, so this is
pretty good.

In my testing this eliminates most valgrind warnings for tests that run
ovs-vswitchd.  This ought to make it easier to distinguish new leaks that
are real from existing non-leaks.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>

util: Document and rely on ovs_assert() always evaluating its argument.

The ovs_assert() macro always evaluates its argument, even when NDEBUG is
defined so that failure is ignored. This behavior wasn't documented, and
thus a lot of code didn't rely on it. This commit documents the behavior
and simplifies bits of code that heretofore didn't rely on it.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>

Support accepting and displaying table names in OVS tools.

OpenFlow has little-known support for naming tables.  Open vSwitch has
supported table names for ages, but it has never used or displayed them
outside of commands dedicated to table manipulation.  This commit adds
support for table names in ovs-ofctl.  When a table has a name, it displays
that name in flows and actions, so that, for example, the following:
    table=1, arp, actions=resubmit(,2)
might become:
    table=ingress_acl, arp, actions=resubmit(,mac_learning)
given appropriately named tables.

For backward compatibility, only interactive ovs-ofctl commands by default
display table names; to display them in scripts, use the new --names
option.
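
The substitution shown in the example above can be sketched in a few lines. This is a Python model of the output only, not the ovs-ofctl implementation: the function name and the dictionary-based name map are hypothetical, and it only handles the two syntaxes that appear in the example.

```python
import re


def format_with_names(flow, names):
    """Replace table numbers in 'table=N' and 'resubmit(,N)' with names."""
    def name_of(number):
        # Fall back to the number itself when the table has no name.
        return names.get(int(number), number)

    flow = re.sub(r"table=(\d+)",
                  lambda m: "table=" + name_of(m.group(1)), flow)
    flow = re.sub(r"resubmit\(,(\d+)\)",
                  lambda m: "resubmit(,%s)" % name_of(m.group(1)), flow)
    return flow


if __name__ == "__main__":
    names = {1: "ingress_acl", 2: "mac_learning"}
    print(format_with_names("table=1, arp, actions=resubmit(,2)", names))
```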

This feature was inspired by a talk that Kei Nohguchi presented at Open
vSwitch 2017 Fall Conference.

CC: Kei Nohguchi <kei@nohguchi.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Mark Michelson <mmichels@redhat.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>

ofp-util: New data structure for mapping between table names and numbers.

This shares the infrastructure for mapping port names and numbers. It will
be used in an upcoming commit.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Acked-by: Mark Michelson <mmichels@redhat.com>

ofp-actions: Make formatting and parsing functions take a struct argument.

An upcoming commit will add another parameter for parsing and formatting
actions. It is much easier to add these parameters if they are
encapsulated in a struct, so this commit first makes that change.
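
The general pattern can be sketched as follows. This is an illustration
under assumptions, not the actual ofp-actions interface: struct
format_params and format_output_action() are made-up names.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical parameter bundle.  Adding a member later (for example a
 * table name map) does not change any function signature. */
struct format_params {
    const char *const *port_names;  /* Indexed by port number; may be NULL. */
    size_t n_port_names;
    char *out;                      /* Output buffer. */
    size_t out_size;
};

/* Formats an "output" action to 'port' into fp->out, preferring a name. */
static void
format_output_action(unsigned int port, const struct format_params *fp)
{
    if (fp->port_names && port < fp->n_port_names && fp->port_names[port]) {
        snprintf(fp->out, fp->out_size, "output:%s", fp->port_names[port]);
    } else {
        snprintf(fp->out, fp->out_size, "output:%u", port);
    }
}
```

Every formatting function takes the same single struct pointer, so a new
field only touches the call sites that need to set it.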

Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Acked-by: Mark Michelson <mmichels@redhat.com>

classifier: Refactor interface for classifier_remove().

Until now, classifier_remove() returned either null or the classifier rule
passed to it, which is an unusual interface. This commit changes it to
return true if it succeeds or false on failure.

In addition, most of classifier_remove()'s callers know ahead of time that
it must succeed, even though most of them didn't bother with an assertion,
so this commit adds a classifier_remove_assert() function as a helper.
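
The shape of the new interface can be sketched with a toy container. The
names rule_set, rule_set_remove() and rule_set_remove_assert() are
hypothetical and only mirror the idea, not the classifier itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RULES 8

/* Hypothetical toy container standing in for the classifier. */
struct rule_set {
    const void *rules[MAX_RULES];   /* NULL marks an empty slot. */
};

/* Removes 'rule' (which must be nonnull) from 'set'.  Returns true if it
 * was present, false otherwise. */
static bool
rule_set_remove(struct rule_set *set, const void *rule)
{
    for (size_t i = 0; i < MAX_RULES; i++) {
        if (set->rules[i] == rule) {
            set->rules[i] = NULL;
            return true;
        }
    }
    return false;
}

/* For callers that know 'rule' is in 'set'.  The removal happens outside
 * assert() so that it still runs when NDEBUG is defined. */
static void
rule_set_remove_assert(struct rule_set *set, const void *rule)
{
    bool removed = rule_set_remove(set, rule);
    assert(removed);
    (void) removed;
}
```

Callers that can tolerate a missing rule test the boolean result; callers
that cannot use the asserting helper and get a check for free.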

Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>

netdev-dpdk: Add support for vHost dequeue zero copy (experimental)

Zero copy is disabled by default. To enable it, set the 'dq-zero-copy'
option to 'true' when configuring the Interface:

ovs-vsctl set Interface dpdkvhostuserclient0 \
    options:vhost-server-path=/tmp/dpdkvhostuserclient0 \
    options:dq-zero-copy=true

When packets from a vHost device with zero copy enabled are destined for
a single 'dpdk' port, the number of tx descriptors on that 'dpdk' port
must be set to a smaller value. 128 is recommended. This can be achieved
like so:

ovs-vsctl set Interface dpdkport options:n_txq_desc=128

Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send
to should not exceed 128. Due to this requirement, the feature is
considered 'experimental'.

Testing of the patch showed a ~8% improvement when switching 512B
packets between vHost devices on different VMs on the same host when
zero copy was enabled on the transmitting device.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>

classifier: Fix typo in comment.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>

Merge branch 'dpdk_merge' of https://github.com/istokes/ovs into HEAD

ovs-ofctl: Fix typo in comment.

Signed-off-by: Ben Pfaff <blp@ovn.org>

ovs-ofctl: Add "compose-packet" command for testing flow_compose().

I don't feel obligated to add a bunch of automatic tests for
flow_compose(), but this is handy for manual testing or for simple packet
generation.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>

flow: Add some L7 payload data to most L4 protocols that accept it.

This makes traffic generated by flow_compose() look slightly more
realistic. It requires lots of updates to tests, but at least the tests
themselves should be slightly more realistic too.

At the same time, add --l7 and --l7-len options to ofproto/trace to allow
users to specify the amount or contents of payloads that they want.

Suggested-by: Brad Cowie <brad@cowie.nz>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>

flow: Simplify flow_compose_l4().

Each of the cases in flow_compose_l4() separately tracked the number of
bytes of L4 data added to the packet. This commit makes the function do
that in a single place without per-protocol bookkeeping.
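
One way to do the bookkeeping in a single place is to compare the buffer
size before and after the per-protocol code runs. The sketch below assumes
a made-up struct pkt and fixed minimal header sizes; it is not the actual
flow_compose_l4() code:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal packet buffer. */
struct pkt {
    uint8_t data[128];
    size_t size;
};

/* Appends 'n' zero bytes to 'p'.  The caller ensures they fit. */
static void
pkt_put_zeros(struct pkt *p, size_t n)
{
    memset(&p->data[p->size], 0, n);
    p->size += n;
}

enum l4_proto { L4_TCP, L4_UDP, L4_ICMP };

/* Appends a zeroed minimal L4 header for 'proto' and returns the number of
 * bytes added.  The length is computed once, from the buffer size before
 * and after, instead of being tallied separately in each case. */
static size_t
compose_l4(struct pkt *p, enum l4_proto proto)
{
    size_t orig_size = p->size;

    switch (proto) {
    case L4_TCP:
        pkt_put_zeros(p, 20);
        break;
    case L4_UDP:
        pkt_put_zeros(p, 8);
        break;
    case L4_ICMP:
        pkt_put_zeros(p, 8);
        break;
    }
    return p->size - orig_size;
}
```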

Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>

ofproto-dpif-trace: Generalize syntax for ofproto/trace.

ofproto/trace takes a bunch of options that have weird placement and
syntax. This commit changes the syntax so that the options can be placed
anywhere and consistently use a double-dash option prefix. For
compatibility, the previous syntax is also supported.

An upcoming commit will add new options and this change allows that
upcoming commit to be less confusing.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>

ovs-vsctl, vtep-ctl: Free 'args' string on exit.

This avoids a memory leak warning from valgrind.

ovn-sbctl and ovn-nbctl already followed this pattern.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>

ofproto: Avoid use-after-free on error path in ofproto_flow_mod_learn().

In the case where the learned flow limit has been reached (below_limit ==
false), ofproto_flow_mod_uninit() would unref ofm->temp_rule (which is
also in the 'rule' local variable) before dereferencing rule->flow_cookie
for the log message. This fixes the problem.

(The greatest likely consequence of this bug was logging the wrong cookie
value.)
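
The general shape of this kind of fix, sketched with a hypothetical
reference-counted struct (struct rule here is a stand-in, not the ofproto
rule), is to copy the needed field before dropping the reference:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference-counted rule. */
struct rule {
    int ref_cnt;
    uint64_t flow_cookie;
};

/* Drops one reference and frees 'rule' when none remain. */
static void
rule_unref(struct rule *rule)
{
    if (--rule->ref_cnt == 0) {
        free(rule);
    }
}

/* Releases the caller's reference and returns the cookie for logging.
 * The cookie is copied out first, because rule_unref() may free 'rule'. */
static uint64_t
release_rule_and_get_cookie(struct rule *rule)
{
    uint64_t cookie = rule->flow_cookie;

    rule_unref(rule);
    return cookie;
}
```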

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>

checkpatch.py: Fix Python style.

Fixes the following warnings:

../utilities/checkpatch.py:219:1: E302 expected 2 blank lines, found 1
../utilities/checkpatch.py:224:1: E302 expected 2 blank lines, found 1
../utilities/checkpatch.py:228:1: E302 expected 2 blank lines, found 1

CC: Justin Pettit <jpettit@ovn.org>
Fixes: 4e99b70dfae0 ("checkpatch.py: Add check for "xxx" in comments.")
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>

netdev-dpdk: Fix xstats leak on port destruction.

CC: Michal Weglicki <michalx.weglicki@intel.com>
Fixes: 971f4b394c6e ("netdev: Custom statistics.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>

netdev-dpdk: Fix memory leak in netdev_dpdk_configure_xstats().

CC: Michal Weglicki <michalx.weglicki@intel.com>
Fixes: 971f4b394c6e ("netdev: Custom statistics.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>

netdev-dpdk: Fix memory leak in netdev_dpdk_get_custom_stats().

CC: Michal Weglicki <michalx.weglicki@intel.com>
Fixes: 971f4b394c6e ("netdev: Custom statistics.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>

vswitchd: show DPDK version

Show the DPDK version if Open vSwitch is compiled with DPDK support.
The version can be retrieved with `ovs-vswitchd --version` or from OVS logs.
A small change in ovs-ctl avoids breakage from the changed output.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>

netdev-dpdk: fix port addition for ports sharing same PCI id

Some NICs have only one PCI address associated with multiple ports. This
patch extends the dpdk-devargs option's format to cater for such devices.

To achieve that, this patch uses a new syntax that will be adopted and
implemented in a future DPDK release (likely v18.05):
    http://dpdk.org/ml/archives/dev/2017-December/084234.html

Since parsing the complete syntax is DPDK's job, and this patch is more
likely to serve as an intermediate workaround, it takes a simpler and
shorter form of that syntax (note that providing only one category is
allowed):
    class=eth,mac=00:11:22:33:44:55

Backward compatibility is kept: users can still add a port by PCI id
(if that is enough for them), so this patch does not break existing
configurations.

This patch is largely based on the one from Ciara:
    https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339496.html

Cc: Loftus Ciara <ciara.loftus@intel.com>
Cc: Thomas Monjalon <thomas@monjalon.net>
Cc: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>

netdev-dpdk: Fix requested MTU size validation.

This commit replaces MTU_TO_FRAME_LEN(mtu) with MTU_TO_MAX_FRAME_LEN(mtu)
in netdev_dpdk_set_mtu(), in order to determine if the total length of
the L2 frame with an MTU of ’mtu’ exceeds NETDEV_DPDK_MAX_PKT_LEN.

When setting an MTU we first check if the requested total frame length
(which includes associated L2 overhead) will exceed the maximum
frame length supported in netdev_dpdk_set_mtu(). The frame length is
calculated by MTU_TO_FRAME_LEN as MTU + ETHER_HEADER + ETHER_CRC. The MTU
for the device will be set at a later stage in dpdk_eth_dev_init() using
rte_eth_dev_set_mtu(mtu).

However, when using rte_eth_dev_set_mtu(mtu), the calculation used to check
that the frame does not exceed the max frame length for that device varies
between DPDK device drivers. For example, the ixgbe driver calculates the
frame length for a given MTU as

mtu + ETHER_HDR_LEN + ETHER_CRC_LEN

The i40e driver calculates it as

mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + I40E_VLAN_TAG_SIZE * 2

The em driver calculates it as

mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + VLAN_TAG_SIZE

Currently it is possible to set an MTU for a netdev_dpdk device that exceeds
the upper limit MTU for that device's DPDK driver. This leads to a segfault,
because the frame length comparison does not take into account the VLAN tag
overhead expected by the drivers. The netdev_dpdk_set_mtu() call will
incorrectly succeed, but the subsequent dpdk_eth_dev_init() will fail before
the queues have been created for the DPDK device. This, coupled with
assumptions regarding reconfiguration requirements for the netdev, will lead
to a segfault when the rxq is polled for this device.

A simple way to avoid this is by using MTU_TO_MAX_FRAME_LEN(mtu) when
validating a requested MTU in netdev_dpdk_set_mtu().
MTU_TO_MAX_FRAME_LEN(mtu) is equivalent to the following:

mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + (2 * VLAN_HEADER_LEN)

By using MTU_TO_MAX_FRAME_LEN at the netdev_dpdk_set_mtu() stage, OvS
now takes into account the maximum L2 overhead that a DPDK driver could
allow for in its frame size calculation. This allows OVS to flag an error
rather than the DPDK driver if the frame length exceeds the max DPDK frame
length. OVS can fail gracefully at this point and use the default MTU of
1500 to continue to configure the port.

Note: this fix is a workaround; a better approach would be for DPDK devices
to report the maximum MTU value that can be requested on a per-device
basis. This capability, however, is not currently available. A downside of
this patch is that the MTU upper limit will be reduced by 8 bytes for
DPDK devices that do not need to account for VLAN tags in the driver's frame
length calculation, e.g. the upper MTU limit for ixgbe devices is reduced,
from the OVS point of view, from 9710 to 9702.
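
The arithmetic can be checked with a small sketch. The macro names follow
the text above, but the constant values are assumptions: 14-byte Ethernet
header, 4-byte CRC, 4-byte VLAN header, and a maximum packet length of
9728, which is inferred from the 9710 figure quoted above (9710 + 14 + 4).
The helper functions mtu_ok_old() and mtu_ok_new() are hypothetical.

```c
#include <stdbool.h>

/* Values assumed for this sketch; see the lead-in. */
#define ETHER_HDR_LEN 14
#define ETHER_CRC_LEN 4
#define VLAN_HEADER_LEN 4
#define NETDEV_DPDK_MAX_PKT_LEN 9728

#define MTU_TO_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
#define MTU_TO_MAX_FRAME_LEN(mtu) \
    ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN + (2 * VLAN_HEADER_LEN))

/* Upper-bound check before this change: ignores VLAN tag overhead. */
static bool
mtu_ok_old(int mtu)
{
    return MTU_TO_FRAME_LEN(mtu) <= NETDEV_DPDK_MAX_PKT_LEN;
}

/* Upper-bound check after this change: allows for two VLAN tags. */
static bool
mtu_ok_new(int mtu)
{
    return MTU_TO_MAX_FRAME_LEN(mtu) <= NETDEV_DPDK_MAX_PKT_LEN;
}
```

Under these assumptions an MTU of 9710 passes the old check but fails the
new one, and 9702 is the largest value the new check accepts, matching the
8-byte reduction described in the note.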

CC: Mark Kavanagh <mark.b.kavanagh@intel.com>
Fixes: 0072e931 ("netdev-dpdk: add support for jumbo frames")
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Co-authored-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>

ofproto: Fix double-unref of temporary rule when learning.

When ofproto_flow_mod_init() accepts a rule, it takes ownership of it and
either unrefs it on error or transfers ownership to the struct it
initializes on success, but ofproto_flow_mod_init_for_learn() was unref-ing
it a second time if it reported an error.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>

ovs-atomic: Fix typo in comment.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>

checkpatch.py: Add check for "xxx" in comments.

"xxx" is often used to indicate items that the developer wanted to look
at again before committing. Flag those as a warning.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>

Fix incorrect handling of return value.

The value cookie_offset should be 'size_t' type.

Signed-off-by: Lili Huang <huanglili.huang@huawei.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

openvswitch/types.h: Drop the member name in initializer macro

MSVC++ compiler does not allow initializing a struct while
explicitly initializing a member in the struct.

Not allowed:
    static const struct eth_addr a = {{ .ea= { 0xff, 0xff, 0xff, 0xff,
                                        0xff, 0xff }}};

Allowed:
    static const struct eth_addr b  = {{{ 0xff, 0xff, 0xff, 0xff, 0xff,
                                          0xff }}};
*An extra curly brace is required for GCC in case the struct contains
a union.
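
For context, a self-contained sketch of the positional form. The struct
layout (a struct wrapping an anonymous union, which needs C11) and the
ETH_ADDR_INIT macro name are assumptions made for illustration; the macro
actually changed by this commit is not shown here.

```c
#include <stdint.h>

/* Assumed layout for illustration: a struct wrapping an anonymous union. */
struct eth_addr {
    union {
        uint8_t ea[6];
        uint16_t be16[3];
    };
};

/* Positional initializer with no member name: the outer braces are for the
 * struct, the middle ones for the union, and the inner ones for the first
 * union member (the byte array). */
#define ETH_ADDR_INIT(A, B, C, D, E, F) \
    { { { (A), (B), (C), (D), (E), (F) } } }

static const struct eth_addr eth_addr_broadcast_example
    = ETH_ADDR_INIT(0xff, 0xff, 0xff, 0xff, 0xff, 0xff);
```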

Signed-off-by: Shashank Ram <rams@vmware.com>
Tested-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

gre: strip gre-tso offload flags

If GRO is enabled, ipgre can receive a GRE-TSO packet. After the GRE
tunnel header is popped, the encapsulation and GSO_ENCAP flags should be
stripped. Otherwise, when the packet is encapsulated again, it will be
dropped in ovs_iptunnel_handle_offloads.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>

ofproto-dpif-upcall: Fix typo in comment.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>

ovn: OVN Support QoS meter

This feature is used to limit the bandwidth of flows, such as floating IP.

ovn-northd changes:
1. add bandwidth column in NB's QOS table.
2. add QOS_METER stages in Logical switch ingress/egress.
3. add set_meter() action in SB's LFlow table.

ovn-controller changes:
add meter_table, which the meter action uses to manage the OpenFlow meter
table.

Currently, this feature is only supported in DPDK.

Signed-off-by: Guoshuai Li <ligs@dtdream.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

ovn-controller: Add extend_table instead of group_table to expand meter.

The group table and the meter table are similar in structure and function,
so this commit refactors the group table code into a generic extend_table
that can also be used for the meter table.
The following operations are provided as a library: table
init/destroy/clear/lookup/remove, assigning IDs to contents, and moving the
contents of 'desired' to 'existing'.

Signed-off-by: Guoshuai Li <ligs@dtdream.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

Revert "compat:inet_frag.h: Check for frag_percpu_counter_batch"

This reverts commit 822afef74f5e65af0cdc3916249ce85a70ae7b83.

Requested-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343674.html
Requested-by: Gregory Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

tc flower: reorder tunnel encap/decap actions

The tc_flower conversion struct does not consider the order of actions.
If an OvS rule matches on a tunnel (decap) and outputs to a new tunnel,
the netlink conversion to TC will add the set tunnel key action before the
unset, leading to an incorrect TC rule. This patch reorders the netlink
generation to ensure a decap is done before an encap if both exist.
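
The ordering rule can be sketched as follows. The struct flower_sketch,
enum tc_action and emit_actions() names are hypothetical, and real TC
rules are built from Netlink attributes rather than an enum array:

```c
#include <stdbool.h>
#include <stddef.h>

enum tc_action { TC_TUNNEL_UNSET, TC_TUNNEL_SET, TC_OUTPUT };

/* Hypothetical flat description of a rule.  Like the conversion struct
 * described above, it records which actions exist but not their order. */
struct flower_sketch {
    bool tunnel_decap;
    bool tunnel_encap;
    bool output;
};

/* Writes the actions of 'f' into 'out' (room for at least 3 entries) and
 * returns how many were written.  Decap is always emitted before encap. */
static size_t
emit_actions(const struct flower_sketch *f, enum tc_action out[])
{
    size_t n = 0;

    if (f->tunnel_decap) {
        out[n++] = TC_TUNNEL_UNSET;
    }
    if (f->tunnel_encap) {
        out[n++] = TC_TUNNEL_SET;
    }
    if (f->output) {
        out[n++] = TC_OUTPUT;
    }
    return n;
}
```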

Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>

docs: Fix formatting in fedora.rst

Fix rst formatting in fedora.rst so that the commands display correctly
on the web.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

AUTHORS: Add Robert Mulik.

Signed-off-by: Ben Pfaff <blp@ovn.org>

LACP: Check active partner sys id

A reboot of one switch in an MC-LAG bond makes all bond links
go down, causing a total connectivity loss for 3 seconds.

Packet capture shows that spurious LACP PDUs are sent to OVS with
a different MAC address (partner system id) during the final
stages of the MC-LAG switch reboot. The current implementation
doesn't care about the partner sys_id (MAC address).

The code change is based on the following:
- If an interface (lead interface) on a bond has an "attached"
  LACP connection, then any other slave on that bond is allowed
  to become active only when its partner's sys_id is the same as
  the partner's sys_id of the lead interface.
- So, when a slave interface of a bond becomes "current" (it gets
  valid LACP information), it first checks if there is already an
  active interface on the bond.
- If there is a lead, the slave checks for the partner sys_ids,
  and becomes active only when they are the same, otherwise it
  remains in "current" state, but "detached".
- If there is no lead, it follows the old way, and accepts any
  partner sys_id.
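
The decision described in the list above can be sketched as a single
predicate. The struct and function names are hypothetical and the real
LACP state machine tracks considerably more state:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SYS_ID_LEN 6

/* Hypothetical per-slave state; field names are illustrative. */
struct slave_sketch {
    bool attached;
    uint8_t partner_sys_id[SYS_ID_LEN];
};

/* Decides whether a slave that has just become "current" may attach.
 * 'lead' is an already attached slave on the same bond, or NULL if there
 * is none. */
static bool
slave_may_attach(const struct slave_sketch *lead,
                 const struct slave_sketch *slave)
{
    if (!lead) {
        /* No lead: old behavior, accept any partner sys_id. */
        return true;
    }
    return !memcmp(lead->partner_sys_id, slave->partner_sys_id, SYS_ID_LEN);
}
```

A slave for which this returns false would stay "current" but "detached",
as described in the third item of the list.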

Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

compat:inet_frag.h: Check for frag_percpu_counter_batch

Fix up the compat layer to check for frag_percpu_counter_batch and
if not present then use atomic_sub and atomic_add as per the
backport in the 3.16.50 LTS kernel. Fixes compile errors on
3.16 series kernels from 3.16.50 on.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>

tests: Fix non-canonical MAC addresses in ovn.at.

Signed-off-by: Ben Pfaff <blp@ovn.org>

xlate: fix xport lookup for recirc

xlate_lookup() and xlate_lookup_ofproto_() provide in_port and ofproto
based on the xport, which is determined using the flow extracted from the
packet. The lookup can also happen due to recirculation. If packet_type
has been modified during xlate before recirculation is triggered, the
lookup fails or delivers the wrong xport.
This can be worked around by propagating the xport to ctx->xin after the
very first lookup and storing it in the frozen state of the recirculation.
Then, when a lookup is performed due to recirculation, the xport can be
retrieved from the frozen state.

The packet-type-aware unit tests are updated with a new one to verify
this behavior.

Signed-off-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
CC: Jan Scheurich <jan.scheurich@ericsson.com>
Fixes: beb75a40fdc2 ("userspace: Switching of L3 packets in L2 pipeline")
Signed-off-by: Ben Pfaff <blp@ovn.org>

ofproto-dpif-xlate: add uuid to xports

This should make it possible to look up an xport by UUID and will be used
by a later commit.

Signed-off-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

ofproto-dpif-sflow: Recursively examine actions inside clone.

Until now, dpif_sflow_read_actions() has ignored actions inside clone.
This means that sflow missed tnl_push actions inside clone, which OVS
now uses to avoid tx recirculation. This commit fixes the problem
by making dpif_sflow_read_actions() recursively process actions inside
clone.
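
The recursion can be sketched on a simplified action tree. The struct
action and count_tnl_push() names are hypothetical; real datapath actions
are nested Netlink attributes rather than C arrays:

```c
#include <stddef.h>

enum action_type { ACT_OUTPUT, ACT_TNL_PUSH, ACT_CLONE };

/* Hypothetical action tree node. */
struct action {
    enum action_type type;
    const struct action *nested;    /* Children, for ACT_CLONE only. */
    size_t n_nested;
};

/* Counts tnl_push actions in 'actions', descending into clone actions
 * instead of skipping them. */
static size_t
count_tnl_push(const struct action *actions, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (actions[i].type == ACT_TNL_PUSH) {
            count++;
        } else if (actions[i].type == ACT_CLONE) {
            count += count_tnl_push(actions[i].nested, actions[i].n_nested);
        }
    }
    return count;
}
```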

In addition, some sFlow data needs to be stored and restored in
ofproto-dpif-xlate when native_tunnel_output() is invoked. Otherwise the
output action of the underlay bridge is counted too when sFlow is set
on the overlay bridge.

Both bugs are related to sFlow and were introduced by the commit in
the "Fixes:" tag below.

Signed-off-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
CC: Sugesh Chandran <sugesh.chandran@intel.com>
Fixes: 7c12dfc527a5 ("tunneling: Avoid datapath-recirc by combining recirc actions at xlate.")
Signed-off-by: Ben Pfaff <blp@ovn.org>

bridge: Fix custom stats' counters leak.

The caller takes ownership of the allocated array of counters and must
free it.
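
The ownership rule can be sketched like this. The names
custom_stats_sketch, get_custom_stats() and custom_stats_destroy() are
hypothetical and do not reproduce the netdev custom statistics API:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container for a variable number of counters. */
struct custom_stats_sketch {
    size_t size;
    uint64_t *counters;
};

/* Fills 'stats' with a newly allocated array of 'n' zeroed counters.
 * Returns 0 on success, -1 on allocation failure.  On success the caller
 * owns stats->counters and must release it. */
static int
get_custom_stats(struct custom_stats_sketch *stats, size_t n)
{
    stats->counters = calloc(n, sizeof *stats->counters);
    if (!stats->counters) {
        stats->size = 0;
        return -1;
    }
    stats->size = n;
    return 0;
}

/* Releases what get_custom_stats() allocated. */
static void
custom_stats_destroy(struct custom_stats_sketch *stats)
{
    free(stats->counters);
    stats->counters = NULL;
    stats->size = 0;
}
```

The leak described above corresponds to a caller that uses the counters
and returns without the matching destroy call.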

CC: Michal Weglicki <michalx.weglicki@intel.com>
Fixes: 971f4b394c6e ("netdev: Custom statistics.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>