Kevin Traynor [Wed, 23 May 2018 13:41:30 +0000 (14:41 +0100)]
netdev-dpdk: Remove use of rte_mempool_ops_get_count.

rte_mempool_ops_get_count is not exported by DPDK so it means it
cannot be used by OVS when using DPDK as a shared library.

Remove rte_mempool_ops_get_count but still use rte_mempool_full
and document its behavior.

Fixes: 91fccdad72a2 ("netdev-dpdk: Free mempool only when no in-use mbufs.")
Reported-by: Timothy Redaelli <tredaelli@redhat.com>
Reported-by: Markos Chandras <mchandras@suse.de>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Numan Siddique [Thu, 24 May 2018 15:45:53 +0000 (17:45 +0200)]
Extend tests for conjunctive match support in OVN

Check the application of conjunctive matching to logical flow match
expressions. In particular cover the case where conjunctive matching is
applied to ACL match expressions that refer to Address Sets.

Mark Michelson, who tested a similar patch [1], found a significant
improvement in ACL processing and a reduction of OF flows from the order
of 1 million to a few thousand. [2]
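
A back-of-the-envelope sketch of why the flow count can drop so sharply. It assumes each constant in each clause costs exactly one flow and ignores prerequisites and other match fields. Without conjunctive matching, an AND of ORs is normalized into a cross product. With OpenFlow conjunctive matching, each constant gets one flow carrying a conjunction(id, k/n) action, plus one flow that matches conj_id. The helper names are hypothetical, and the 2..64 bound on n follows the ovs-actions documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flows needed when an AND of ORs (for example, two Address Set matches)
 * is normalized into a cross product: one flow per combination. */
static uint64_t
cross_product_flows(const size_t *clause_sizes, size_t n_clauses)
{
    uint64_t flows = 1;

    for (size_t i = 0; i < n_clauses; i++) {
        flows *= clause_sizes[i];
    }
    return flows;
}

/* Flows needed with conjunctive matching: one flow per constant, each with
 * a conjunction(id, k/n) action, plus one flow matching on conj_id. */
static uint64_t
conjunctive_flows(const size_t *clause_sizes, size_t n_clauses)
{
    uint64_t flows = 1;   /* The flow that matches conj_id. */

    assert(n_clauses >= 2 && n_clauses <= 64);   /* Allowed range for n. */
    for (size_t i = 0; i < n_clauses; i++) {
        flows += clause_sizes[i];
    }
    return flows;
}
```

For two hypothetical Address Sets of 1,000 addresses each, the cross product needs 1,000,000 flows while conjunctive matching needs 2,001.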

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
[1] - https://mail.openvswitch.org/pipermail/ovs-dev/2018-February/344523.html
[2] - https://mail.openvswitch.org/pipermail/ovs-dev/2018-February/344311.html

Signed-off-by: Ben Pfaff <blp@ovn.org>
Jakub Sitnicki [Thu, 24 May 2018 15:45:52 +0000 (17:45 +0200)]
Factor prerequisites out of AND/OR trees with unique symbol

Appending prerequisites to sub-expressions of OR that are all over one
symbol prevents the expression-to-matches converter from applying
conjunctive matching. This happens during the annotation phase.

input:      s1 == { c1, c2 } && s2 == { c3, c4 }
expanded:   (s1 == c1 || s1 == c2) && (s2 == c3 || s2 == c4)
annotated:  ((p1 && s1 == c1) || (p1 && s1 == c2)) &&
            ((p2 && s2 == c3) || (p2 && s2 == c4))
normalized: (p1 && p2 && s1 == c1 && s2 == c3) ||
            (p1 && p2 && s1 == c1 && s2 == c4) ||
            (p1 && p2 && s1 == c2 && s2 == c3) ||
            (p1 && p2 && s1 == c2 && s2 == c4)

Where s1,s2 - symbols, c1..c4 - constants, p1,p2 - prerequisites.

Since sub-expressions of OR trees that are over one symbol all have the
same prerequisites, we can factor them out, leaving the OR tree intact
and enabling the converter to apply conjunctive matching to
AND(OR(clause)) trees.

Going back to our example this change gives us:

input:      s1 == { c1, c2 } && s2 == { c3, c4 }
expanded:   (s1 == c1 || s1 == c2) && (s2 == c3 || s2 == c4)
annotated:  (s1 == c1 || s1 == c2) && p1 && (s2 == c3 || s2 == c4) && p2
normalized: p1 && p2 && (s1 == c1 || s1 == c2) && (s2 == c3 || s2 == c4)
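
The annotation step for a single OR can be sketched as follows. This is a toy string renderer, not the OVN expr API: `struct term`, `annotate_or()` and the other helpers are hypothetical. When every disjunct is over the same symbol, the shared prerequisite is ANDed once outside the OR. Otherwise each disjunct keeps its own prerequisite, as in the "annotated" line of the first example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical "symbol == constant" term; not the OVN expr API. */
struct term {
    const char *symbol;
    const char *constant;
    const char *prereq;     /* Prerequisite implied by 'symbol'. */
};

/* Appends formatted text at buf[*len], asserting nothing is truncated. */
static void
append(char *buf, size_t size, size_t *len, const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    int n = vsnprintf(buf + *len, size - *len, fmt, args);
    va_end(args);
    assert(n >= 0 && *len + (size_t) n < size);
    *len += (size_t) n;
}

/* True if every term of the OR is over the same symbol. */
static bool
or_is_single_symbol(const struct term *terms, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (strcmp(terms[i].symbol, terms[0].symbol)) {
            return false;
        }
    }
    return n > 0;
}

/* Renders an annotated OR into 'buf'. */
static void
annotate_or(const struct term *terms, size_t n, char *buf, size_t size)
{
    bool factor = or_is_single_symbol(terms, n);
    size_t len = 0;

    append(buf, size, &len, "(");
    for (size_t i = 0; i < n; i++) {
        const struct term *t = &terms[i];

        if (i) {
            append(buf, size, &len, " || ");
        }
        if (factor) {
            append(buf, size, &len, "%s == %s", t->symbol, t->constant);
        } else {
            append(buf, size, &len, "(%s && %s == %s)",
                   t->prereq, t->symbol, t->constant);
        }
    }
    append(buf, size, &len, ")");
    if (factor) {
        /* Shared prerequisite appears once, outside the OR tree. */
        append(buf, size, &len, " && %s", terms[0].prereq);
    }
}
```

With terms {s1 == c1, s1 == c2}, both with prerequisite p1, this renders `(s1 == c1 || s1 == c2) && p1`.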

We also factor the prerequisites out of pure AND or mixed AND/OR
trees to keep a common code path, but in this case the only thing we
gain is a shorter expression, since the prerequisites for each symbol
appear only once.

Documentation comments have been contributed by Ben Pfaff.

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Darrell Ball [Thu, 24 May 2018 02:13:56 +0000 (19:13 -0700)]
netdev-native-tnl: Fix alignment for erspan index.

Flagged by clang.

CC: William Tu <u9012063@gmail.com>
Fixes: 068794b43f0e ("erspan: Add flow-based erspan options")
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Ben Pfaff [Wed, 23 May 2018 20:51:59 +0000 (13:51 -0700)]
ofp-flow: Fix uninitialized data decoding OF1.5 flow stats.

Reported-by: Paul Greenberg
Reported-at: https://github.com/openvswitch/ovs-issues/issues/149
Fixes: c7b02b800615 ("Add support for OpenFlow 1.5 statistics (OXS).")
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Darrell Ball <dlu998@gmail.com>
Ben Pfaff [Wed, 23 May 2018 23:58:31 +0000 (16:58 -0700)]
rconn: Introduce new invariant to fix assertion failure in corner case.

Until now, rconn_get_version() has only reported the OpenFlow version in
use when the rconn is actually connected.  This makes sense, but it has a
harsh consequence.  Consider code like this:

    if (rconn_is_connected(rconn) && rconn_get_version(rconn) >= 0) {
        for (int i = 0; i < 2; i++) {
            struct ofpbuf *b = ofputil_encode_echo_request(
                rconn_get_version(rconn));
            rconn_send(rconn, b, NULL);
        }
    }

Maybe not the smartest code in the world, and probably no one would write
this exact code in any case, but it doesn't look too risky or crazy.

But it is.  The second trip through the loop can assert-fail inside
ofputil_encode_echo_request() because rconn_get_version(rconn) returns -1
instead of a valid OpenFlow version.  That happens if the first call to
rconn_send() encounters an error while sending the message and therefore
destroys the underlying vconn and disconnects so that rconn_get_version()
doesn't have a vconn to query for its version.

In a case like this where all the code to send the messages is close by, we
could just check rconn_get_version() in each loop iteration.  We could even
go through the tree and convince ourselves that individual bits of code are
safe, or be conservative and check rconn_get_version() >= 0 in the iffy
cases.  But this seems to me like an ongoing source of risk and a way to
get things wrong in corner cases.

This commit takes a different approach.  It introduces a new invariant: if
an rconn has ever been connected, then it returns a valid OpenFlow version
from rconn_get_version().  In addition, if an rconn is currently connected,
then the OpenFlow version it returns is the correct one (that may be
obvious, but there were corner cases before where it returned -1 even
though rconn_is_connected() returned true).

With this commit, the code above would work OK.  If the first call to
rconn_send() encounters an error sending the message, then
rconn_get_version() in the second iteration will return the same value as
in the first iteration.  The message passed to rconn_send() will end up
being discarded, but that's much better than either an assertion failure or
having to carefully analyze a lot of our code to deal with one unusual
corner case.
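
The invariant can be modelled with a toy structure that remembers the version from the most recent successful connection instead of asking the (possibly destroyed) vconn. This is a minimal sketch; `struct rconn_sketch` and its functions are hypothetical and omit everything else the real rconn tracks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for struct rconn. */
struct rconn_sketch {
    bool connected;
    int version;        /* -1 until the first successful connection. */
};

static void
rconn_sketch_init(struct rconn_sketch *rc)
{
    rc->connected = false;
    rc->version = -1;
}

/* Called when version negotiation succeeds; a later reconnection may
 * negotiate, and record, a different version. */
static void
rconn_sketch_connected(struct rconn_sketch *rc, int version)
{
    assert(version >= 0);
    rc->connected = true;
    rc->version = version;
}

/* A send error tears down the connection but keeps the last version. */
static void
rconn_sketch_disconnect(struct rconn_sketch *rc)
{
    rc->connected = false;
}

/* Valid once the rconn has ever been connected, even if it is not now. */
static int
rconn_sketch_get_version(const struct rconn_sketch *rc)
{
    return rc->version;
}
```

In the loop from the example, the second call to the version getter would return the same value as the first even after a failed send.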

Reported-by: Han Zhou <zhouhan@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Han Zhou <hzhou8@ebay.com>
Yi-Hung Wei [Thu, 17 May 2018 19:39:51 +0000 (12:39 -0700)]
datapath: compat: Fix ndo_size in RHEL 7.5 backport

If 'ndo_size' is not set in 'struct net_device_ops', the RHEL kernel will
not make use of the functions in 'struct net_device_ops_extended'.

Fixes: 39ca338374ab ("datapath: compat: Fix build on RHEL 7.5")
Reported-by: Jiri Benc <jbenc@redhat.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/347070.html
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Jakub Sitnicki [Fri, 18 May 2018 16:55:35 +0000 (18:55 +0200)]
ovn-controller: Count calls to lflow_run()

lflow_run() is the main logical flow processing routine, and the one we
spend most of the CPU time in when testing at scale.

With the switch to an incremental processing approach in the controller,
we will be trying to avoid calling lflow_run() as much as possible.

A counter lets us confirm that we are doing logical flow processing
only when it's expected, without resorting to profiling under stress.

It can also serve as a hint as to why the ovn-controller process is
consuming CPU time.

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Han Zhou <hzhou8@ebay.com>
Timothy Redaelli [Wed, 23 May 2018 13:46:32 +0000 (15:46 +0200)]
rhel: Use openvswitch user/group for the log directory

Commit 94cd8383e297 ("rhel: fix log directory permissions") restored the
old 755 permissions on /var/log/openvswitch, which can result in the
exposure of sensitive information.

Since commit f624bf23b62a ("rhel: user/group openvswitch does not exist")
moved user/group creation into the %pre phase, it's now possible to change
the /var/log/openvswitch owner to openvswitch:openvswitch and remove
the r/x bits for others again without getting a "permission denied"
error when the logs are rotated.

CC: Aaron Conole <aconole@redhat.com>
Fixes: 94cd8383e297 ("rhel: fix log directory permissions")
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Markos Chandras <mchandras@suse.de>
Han Zhou [Sat, 19 May 2018 21:21:33 +0000 (14:21 -0700)]
ovn.at: fix occasional failure - ACL reject rule test

The test fails occasionally because it may start sending packets
before the new ACL-related flows are installed on the HVs, even though
it ensures the lflows exist in the SB DB. This patch ensures the HVs
are in sync by using ovn-nbctl --wait=hv sync, and removes the check
for lflow readiness in the SB.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Darrell Ball [Thu, 17 May 2018 02:24:46 +0000 (19:24 -0700)]
odp-execute: Rename 'may_steal' to 'should_steal'.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Darrell Ball [Thu, 17 May 2018 06:08:48 +0000 (23:08 -0700)]
odp-execute: Correct and clarify 'steal' parameter.

Correct and clarify 'steal'/'may_steal' comments in
odp_execute_actions().

Reported-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Darrell Ball [Fri, 18 May 2018 16:52:23 +0000 (09:52 -0700)]
tests: Make test result more predictable.

The test 'ofproto-dpif - in place modification (vlan)' fails often
due to miss handling. Hence, make it more predictable by specifying
that misses should just be dropped.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
William Tu [Thu, 17 May 2018 20:36:47 +0000 (13:36 -0700)]
erspan: fix invalid erspan version.

ERSPAN only supports versions 1 and 2.  When packets are sent to an erspan
device that does not have a proper version number set, drop them.  In
practice, we observed multicast packets sent to the per-net erspan device,
erspan0, which does not have an erspan version configured.

Without this patch, ovs-vswitchd logs the warning below because it
receives a malformed erspan packet:

odp_util|WARN|odp_tun_key_from_attr__ invalid erspan version
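
The drop decision can be sketched in userspace terms. This assumes the 4-byte ERSPAN base header, with the 4-bit Ver field in the top bits of its first byte (1 for Type II, 2 for Type III). `erspan_parse_version()` is a hypothetical helper, not the OVS or kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ERSPAN_BASE_HDR_LEN 4

/* Returns the ERSPAN version (1 or 2), or -1 if the packet should be
 * dropped because the header is short or the version is unsupported. */
static int
erspan_parse_version(const uint8_t *hdr, size_t len)
{
    if (len < ERSPAN_BASE_HDR_LEN) {
        return -1;
    }
    int ver = hdr[0] >> 4;      /* Ver occupies the top 4 bits. */
    return (ver == 1 || ver == 2) ? ver : -1;
}
```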

Reported-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Greg Rose [Thu, 17 May 2018 17:43:53 +0000 (10:43 -0700)]
gre: Resolve gre receive issues

On newer Linux kernels, or on older kernels such as Red Hat's that backport
from newer upstream Linux kernel releases, the built-in gre kernel module
interferes with the OVS gre code in the receive path.  Fix this by
placing the gre kernel code within the openvswitch driver so that it
does not depend on the built-in gre kernel module.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Greg Rose [Wed, 16 May 2018 20:13:20 +0000 (13:13 -0700)]
rhel: Enable ERSPAN features for RHEL 7.x

Enable ERSPAN on RHEL 7.x

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
William Tu [Sun, 13 May 2018 14:09:41 +0000 (07:09 -0700)]
erspan: set bso when truncated bit is set.

Before this patch, the erspan BSO (Bad/Short/Oversized) field was not
handled.  BSO has 4 possible values:
  00 --> Good frame with no error, or unknown integrity
  11 --> Payload is a Bad Frame with CRC or Alignment Error
  01 --> Payload is a Short Frame
  10 --> Payload is an Oversized Frame

This patch sets BSO to 01 (Short Frame) when truncate is true.
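
To make the 2-bit encoding concrete, the following sketch packs the second 16-bit word of a Type III base header in host byte order. It assumes the layout COS(3) | BSO(2) | T(1) | Session ID(10). The enum and `erspan_word2()` are hypothetical names, not the kernel's bitfield struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum erspan_bso {
    BSO_GOOD      = 0x0,   /* 00: good frame or unknown integrity. */
    BSO_SHORT     = 0x1,   /* 01: short frame. */
    BSO_OVERSIZED = 0x2,   /* 10: oversized frame. */
    BSO_BAD       = 0x3,   /* 11: CRC or alignment error. */
};

/* Packs COS, BSO, T and the session ID.  A truncated frame sets T and,
 * as in this patch, reports BSO as a short frame. */
static uint16_t
erspan_word2(unsigned int cos, bool truncated, unsigned int session_id)
{
    unsigned int bso = truncated ? BSO_SHORT : BSO_GOOD;
    unsigned int t = truncated ? 1u : 0u;

    assert(cos < 8 && session_id < 1024);
    return (uint16_t) ((cos << 13) | (bso << 11) | (t << 10) | session_id);
}
```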

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
William Tu [Sun, 13 May 2018 14:03:49 +0000 (07:03 -0700)]
erspan: auto detect truncated ipv6 packets.

Upstream commit:
    commit d5db21a3e6977dcb42cee3d16cd69901fa66510a
    Author: William Tu <u9012063@gmail.com>
    Date:   Fri May 11 05:49:47 2018 -0700

    erspan: auto detect truncated ipv6 packets.

    Currently the truncated bit is set in only two cases: 1) the mirrored
    packet is larger than the mtu, and 2) the ipv4 packet's tot_len is
    larger than the actual skb->len.  This patch adds another case for
    detecting whether an ipv6 packet is truncated, by comparing the ipv6
    header's payload_len with skb->len.
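
The IPv6 check can be sketched as comparing the fixed 40-byte header plus payload_len against the bytes actually present. `ipv6_looks_truncated()` is hypothetical. Here 'len' counts bytes from the start of the IPv6 header, whereas the kernel works with skb->len and header offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_FIXED_HDR_LEN 40

/* 'ip6' points at the IPv6 header; 'len' is how many bytes are present
 * from there to the end of the packet. */
static bool
ipv6_looks_truncated(const uint8_t *ip6, size_t len)
{
    if (len < IPV6_FIXED_HDR_LEN) {
        return true;    /* Not even the fixed header survived. */
    }
    /* payload_len (bytes 4-5, network order) excludes the fixed header. */
    size_t payload_len = ((size_t) ip6[4] << 8) | ip6[5];
    return IPV6_FIXED_HDR_LEN + payload_len > len;
}
```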

Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
William Tu [Fri, 11 May 2018 03:03:31 +0000 (20:03 -0700)]
ip6erspan: make sure enough headroom at xmit.

Upstream commit:
    commit e41c7c68ea771683cae5a7f81c268f38d7912ecb
    Author: William Tu <u9012063@gmail.com>
    Date:   Fri Mar 9 07:34:42 2018 -0800

    ip6erspan: make sure enough headroom at xmit.

    The patch adds skb_cow_head() to ensure enough headroom
    at ip6erspan_tunnel_xmit before pushing the erspan header
    to the skb.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
William Tu [Fri, 11 May 2018 02:51:09 +0000 (19:51 -0700)]
ip6erspan: improve error handling for erspan version number.

Upstream commit:
    commit d6aa71197ffcb68850bfebfc3fc160abe41df53b
    Author: William Tu <u9012063@gmail.com>
    Date:   Fri Mar 9 07:34:41 2018 -0800

    ip6erspan: improve error handling for erspan version number.

    When users fill in an incorrect erspan version number through
    the struct erspan_metadata uapi, the current code skips pushing
    the erspan header but continues pushing the gre header, which
    is incorrect.  The patch fixes it by returning an error.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
William Tu [Fri, 11 May 2018 02:45:44 +0000 (19:45 -0700)]
ip6gre: add erspan v2 to tunnel lookup

Upstream commit:
    commit 3b04caab81649a9e8d5375b919b6653d791951df
    Author: William Tu <u9012063@gmail.com>
    Date:   Fri Mar 9 07:34:40 2018 -0800

    ip6gre: add erspan v2 to tunnel lookup

    The patch adds the erspan v2 proto in ip6gre_tunnel_lookup
    so the erspan v2 tunnel can be found correctly.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Greg Rose [Fri, 18 May 2018 00:46:41 +0000 (17:46 -0700)]
erspan: Add flow-based erspan options

This patch adds support for flow-based erspan options.
The erspan_ver, erspan_idx, erspan_dir, and erspan_hwid options can be
set to "flow" so that their values are set by the OpenFlow rule
instead of being statically configured at port creation time.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Greg Rose [Fri, 4 May 2018 23:48:43 +0000 (16:48 -0700)]
lib/dpif-netlink: Fix miscompare of gre ports

netdev_to_ovs_vport_type() checks for netdev types matching
"gre" with strstr().  This makes it match ip6gre as well and return
OVS_VPORT_TYPE_GRE, which is clearly wrong.

Move the usage of strstr() *after* all the exact matches with strcmp()
to avoid the problem permanently; when I added the ip6gre type, it
caused a bug that was very difficult to detect.
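
The ordering problem is easy to reproduce in isolation. In this sketch the enum values are hypothetical stand-ins for the OVS_VPORT_TYPE_* constants, and only a few type names are handled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the OVS_VPORT_TYPE_* values. */
enum vport_kind { KIND_UNSPEC, KIND_GRE, KIND_IP6GRE, KIND_ERSPAN };

/* Buggy order: the substring test runs first and swallows "ip6gre". */
static enum vport_kind
kind_substring_first(const char *type)
{
    if (strstr(type, "gre")) {
        return KIND_GRE;
    }
    if (!strcmp(type, "ip6gre")) {
        return KIND_IP6GRE;
    }
    if (!strcmp(type, "erspan")) {
        return KIND_ERSPAN;
    }
    return KIND_UNSPEC;
}

/* Fixed order: exact strcmp() matches first, strstr() only as a fallback. */
static enum vport_kind
kind_exact_first(const char *type)
{
    if (!strcmp(type, "ip6gre")) {
        return KIND_IP6GRE;
    }
    if (!strcmp(type, "erspan")) {
        return KIND_ERSPAN;
    }
    if (strstr(type, "gre")) {
        return KIND_GRE;
    }
    return KIND_UNSPEC;
}
```

Any future type whose name contains "gre" is safe as long as its exact match is added above the strstr() fallback.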

Cc: Ben Pfaff <blp@ovn.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Greg Rose [Fri, 4 May 2018 17:14:44 +0000 (10:14 -0700)]
ip6gre: Add ip6gre vport type

Add handlers for OVS_VPORT_TYPE_IP6GRE

Cc: Ben Pfaff <blp@ovn.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
William Tu [Wed, 2 May 2018 22:06:04 +0000 (15:06 -0700)]
erspan: auto detect truncated packets.

Upstream commit:
    commit 1baf5ebf8954d9bff8fa4e7dd6c416a0cebdb9e2
    Author: William Tu <u9012063@gmail.com>
    Date:   Fri Apr 27 14:16:32 2018 -0700

    erspan: auto detect truncated packets.

    Currently the truncated bit is set only when the mirrored packet
    is larger than the mtu.  In certain cases, the packet might already
    have been truncated before being sent to the erspan tunnel.  In this
    case, the patch detects whether the IP header's total length is larger
    than the actual skb->len.  If so, this indicates that the mirrored
    packet is truncated, and the erspan truncate bit is set.

    I tested the patch using the bpf_skb_change_tail helper function to
    shrink the packet size and send it to the erspan tunnel.
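
The IPv4 check compares the header's tot_len, which covers the header plus payload, with the bytes actually present. `ipv4_looks_truncated()` is a hypothetical helper. Here 'len' counts bytes from the start of the IPv4 header rather than the kernel's skb->len with offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_MIN_HDR_LEN 20

/* 'ip' points at the IPv4 header; 'len' is how many bytes are present
 * from there to the end of the packet. */
static bool
ipv4_looks_truncated(const uint8_t *ip, size_t len)
{
    if (len < IPV4_MIN_HDR_LEN) {
        return true;    /* Not even a minimal header survived. */
    }
    /* tot_len is bytes 2-3 of the header, in network byte order. */
    size_t tot_len = ((size_t) ip[2] << 8) | ip[3];
    return tot_len > len;
}
```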

Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
William Tu [Wed, 2 May 2018 21:45:39 +0000 (14:45 -0700)]
openvswitch: fix vport packet length check.

Upstream commit:
    commit 46e371f0e78a82186a83cbcb4f4b8850417c7dd5
    Author: William Tu <u9012063@gmail.com>
    Date:   Wed Mar 7 15:38:48 2018 -0800

    openvswitch: fix vport packet length check.

    When sending a packet to a tunnel device, the dev's hard_header_len
    could be larger than the skb->len in function packet_length().
    In the case of ip6gretap/erspan, hard_header_len = LL_MAX_HEADER + t_hlen,
    which is around 180, and an ARP packet sent to this tunnel has
    skb->len = 42.  This causes the 'unsigned int length' to wrap around to
    a very large value because the difference is negative, causing the later
    ovs_vport_send to drop it due to over-mtu size.  The patch fixes it by
    setting it to 0.
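
The wraparound and the clamp can be shown with plain unsigned arithmetic. This is a minimal sketch that ignores the VLAN adjustment. The two helpers are hypothetical simplifications of packet_length(), not its actual code.

```c
#include <assert.h>
#include <limits.h>

/* Buggy: wraps to a huge value when skb_len < hard_header_len. */
static unsigned int
payload_len_buggy(unsigned int skb_len, unsigned int hard_header_len)
{
    return skb_len - hard_header_len;
}

/* Fixed: clamps to 0 instead of wrapping around. */
static unsigned int
payload_len_fixed(unsigned int skb_len, unsigned int hard_header_len)
{
    return skb_len > hard_header_len ? skb_len - hard_header_len : 0;
}
```

With the numbers from the message (skb->len = 42, hard_header_len around 180), the buggy version yields UINT_MAX - 137, far above any MTU, while the fixed version yields 0.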

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
William Tu [Wed, 21 Mar 2018 21:02:25 +0000 (14:02 -0700)]
erspan: add kernel datapath support

Passes check, check-kernel (4.16-rc4), and check-system-userspace.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
William Tu [Tue, 15 May 2018 20:10:48 +0000 (16:10 -0400)]
userspace: add erspan tunnel support.

ERSPAN is a tunneling protocol based on GRE.  This patch
adds erspan tunnel support to ovs-vswitchd with the userspace datapath.
Configuring an erspan tunnel is similar to configuring a gre tunnel, but
with additional erspan parameters.  Matching a flow on erspan
metadata is also supported; see ovs-fields for more details.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
William Tu [Tue, 15 May 2018 20:10:49 +0000 (16:10 -0400)]
userspace: add gre sequence number support.

This patch adds support for the gre sequence number.
It is disabled by default.  When enabled with 'options:seq=true',
outgoing gre packets carry a sequence number that is
incremented by one for each packet.
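
A minimal sketch of the sender side, assuming a GRE header with no checksum or key. In that case the 32-bit sequence number immediately follows the 4-byte base header, and the S bit is 0x1000 in the flags word, per RFC 2890. The per-port state and helper names are hypothetical, as is starting the counter at 0. Because the counter lives in per-port state, the header-push path needs access to the port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GRE_FLAG_SEQ  0x1000   /* S bit in the GRE flags/version word. */
#define GRE_PROTO_TEB 0x6558   /* Transparent Ethernet bridging. */

/* Hypothetical per-port state. */
struct gre_port_sketch {
    bool seq_enabled;          /* options:seq=true */
    uint32_t next_seq;
};

static void
put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t) (v >> 8);
    p[1] = (uint8_t) (v & 0xff);
}

static void
put_be32(uint8_t *p, uint32_t v)
{
    put_be16(p, (uint16_t) (v >> 16));
    put_be16(p + 2, (uint16_t) (v & 0xffff));
}

/* Writes a GRE header into 'hdr' (at least 8 bytes); returns its length. */
static size_t
gre_push_header_sketch(struct gre_port_sketch *port, uint8_t *hdr)
{
    put_be16(hdr, port->seq_enabled ? GRE_FLAG_SEQ : 0);
    put_be16(hdr + 2, GRE_PROTO_TEB);
    if (!port->seq_enabled) {
        return 4;
    }
    put_be32(hdr + 4, port->next_seq++);   /* One more per packet. */
    return 8;
}
```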

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
William Tu [Fri, 9 Mar 2018 21:02:23 +0000 (13:02 -0800)]
netdev-native-tnl: refactor the tunnel push header.

This patch adds an additional 'struct netdev *' parameter to the
native tunnel's push_header() interface.  It will be used by the
upcoming GRE sequence number support.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
William Tu [Mon, 12 Mar 2018 18:28:08 +0000 (11:28 -0700)]
datapath: add erspan version I and II support

Upstream commit:
    commit fc1372f89ffe1f58b589643b75f679e452350703
    Author: William Tu <u9012063@gmail.com>
    Date:   Thu Jan 25 13:20:11 2018 -0800

    openvswitch: add erspan version I and II support

    The patch adds support for openvswitch to configure erspan
    v1 and v2.  The OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS attr is added
    to the uapi as a binary blob to support all of ERSPAN v1 and v2's
    fields.  Note that the previous commit "openvswitch: Add erspan tunnel
    support." was reverted because it was not designed properly.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
William Tu [Fri, 9 Mar 2018 21:52:32 +0000 (13:52 -0800)]
compat: erspan: use bitfield instead of mask and offset

Upstream commit:
    commit c69de58ba84f480879de64571d9dae5102d10ed6
    Author: William Tu <u9012063@gmail.com>
    Date:   Thu Jan 25 13:20:09 2018 -0800

    net: erspan: use bitfield instead of mask and offset

    Originally the erspan fields were grouped into a __be16 field
    and accessed using masks and offsets.  This is more costly because it
    requires calling ntohs/htons.  The patch changes it to use bitfields.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Folds in the ip_gre portions of this commit.  Other portions of this
commit are included in a previous patch where it is called out.

Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
William Tu [Fri, 9 Mar 2018 19:03:19 +0000 (11:03 -0800)]
datapath: erspan: introduce erspan v2 for ip_gre

Upstream commit:

    commit f551c91de262ba36b20c3ac19538afb4f4507441
    Author: William Tu <u9012063@gmail.com>
    Date:   Wed Dec 13 16:38:56 2017 -0800

    net: erspan: introduce erspan v2 for ip_gre

    The patch adds support for erspan version 2.  Not all features are
    supported in this patch.  The SGT (security group tag), GRA (timestamp
    granularity), FT (frame type) are set to fixed value.  Only hardware
    ID and direction are configurable.  Optional subheader is also not
    supported.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Includes some compatibility layer adjustments; portions of this
commit were introduced earlier while pulling in ipv6 erspan.

Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Greg Rose [Wed, 9 May 2018 22:57:10 +0000 (15:57 -0700)]
datapath: Use correct tunnel receive for ip6gre

During backports of ip6 gre I used ovs_ip_tunnel_rcv() for the
ip6gre_rcv() function, but that is wrong because it processes ipv4
tunnels.  Use the correct backported ip6 tunnel receive function,
ip6_tnl_rcv() in ip6_tunnel.c.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Greg Rose [Wed, 9 May 2018 22:04:45 +0000 (15:04 -0700)]
datapath: Add dellink op to ip6gre and ip6erspan tap ops

Fix an oversight in ip6gre_tap_ops and ip6erspan_tap_ops, in
which the .dellink field was not initialized, leading to bugs
when trying to remove and re-add those types of ports.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Greg Rose [Mon, 5 Mar 2018 18:11:57 +0000 (10:11 -0800)]
compat: Add ipv6 GRE and IPV6 Tunneling

This patch backports upstream ipv6 GRE and tunneling into the OVS
OOT (Out of Tree) datapath drivers.  The primary reason for this
is to support the ERSPAN feature.

Because there is no previous history of ipv6 GRE and tunneling it is
not possible to exactly reproduce the history of all the files in
the patch.  The two newly added files - ip6_gre.c and ip6_tunnel.c -
are cut from whole cloth out of the upstream Linux 4.15 kernel and
then modified as necessary with compatibility layer fixups.
These two files already included parts of several other upstream
commits that also touched other upstream files.  As such, this
patch may incorporate parts or all of the following commits:

d350a82 net: erspan: create erspan metadata uapi header
c69de58 net: erspan: use bitfield instead of mask and offset
b423d13 net: erspan: fix use-after-free
214bb1c net: erspan: remove md NULL check
afb4c97 ip6_gre: fix potential memory leak in ip6erspan_rcv
50670b6 ip_gre: fix potential memory leak in erspan_rcv
a734321 ip6_gre: fix error path when ip6erspan_rcv failed
dd8d5b8 ip_gre: fix error path when erspan_rcv failed
293a199 ip6_gre: fix a pontential issue in ip6erspan_rcv
d91e8db5 net: erspan: reload pointer after pskb_may_pull
ae3e133 net: erspan: fix wrong return value
c05fad5 ip_gre: fix wrong return value of erspan_rcv
94d7d8f ip6_gre: add erspan v2 support
f551c91 net: erspan: introduce erspan v2 for ip_gre
1d7e2ed net: erspan: refactor existing erspan code
ef7baf5 ip6_gre: add ip6 erspan collect_md mode
5a963eb ip6_gre: Add ERSPAN native tunnel support
ceaa001 openvswitch: Add erspan tunnel support.
f192970 ip_gre: check packet length and mtu correctly in erspan tx
c84bed4 ip_gre: erspan device should keep dst
c122fda ip_gre: set tunnel hlen properly in erspan_tunnel_init
5513d08 ip_gre: check packet length and mtu correctly in erspan_xmit
935a974 ip_gre: get key from session_id correctly in erspan_rcv
1a66a83 gre: add collect_md mode to ERSPAN tunnel
84e54fe gre: introduce native tunnel support for ERSPAN

In cases where the listed commits also touched other source code
files then the patches are also listed separately within this
patch series.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Greg Rose [Mon, 5 Mar 2018 18:09:10 +0000 (10:09 -0800)]
compat: Fixups for some compile warnings and errors

A lot of code has been pulled in.  Fix it up to make sure it compiles
correctly.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Greg Rose [Tue, 27 Feb 2018 18:14:17 +0000 (10:14 -0800)]
compat: Add #define for gre_handle_offloads

Fixes compile errors on some 4.x kernels.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Greg Rose [Tue, 27 Feb 2018 16:36:28 +0000 (08:36 -0800)]
compat: Move function to header

tnl_flags_to_gre_flags is also needed in both ip_gre.c and gre.c on
some kernels.  Move it from ip_gre.c to the common header.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Xin Long [Fri, 2 Mar 2018 21:53:29 +0000 (13:53 -0800)]
ip_gre: remove the incorrect mtu limit for ipgre tap

Upstream commit:
    commit cfddd4c33c254954927942599d299b3865743146
    Author: Xin Long <lucien.xin@gmail.com>
    Date:   Mon Dec 18 14:24:35 2017 +0800

    ip_gre: remove the incorrect mtu limit for ipgre tap

    The ipgre tap driver calls ether_setup().  After commit 61e84623ace3
    ("net: centralize net_device min/max MTU checking"), the mtu range
    is [min_mtu, max_mtu], which is [68, 1500] by default.

    This prevents the dev mtu of the ipgre tap device from being greater
    than 1500, which is not the correct limit for an ipgre tap device.

    Besides, its .change_mtu already does the right check.  So this
    patch simply sets max_mtu to 0 and leaves the check to its
    .change_mtu.
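
The effect of setting max_mtu to 0 can be sketched as a range check in which a zero upper bound means "no upper bound at this layer", leaving the limit to the driver's .change_mtu callback. `validate_mtu_sketch()` is a hypothetical simplification of the core check, not the kernel function.

```c
#include <assert.h>
#include <errno.h>

/* A max_mtu of 0 disables the upper-bound check here. */
static int
validate_mtu_sketch(int new_mtu, int min_mtu, int max_mtu)
{
    if (new_mtu < min_mtu) {
        return -EINVAL;
    }
    if (max_mtu > 0 && new_mtu > max_mtu) {
        return -EINVAL;
    }
    return 0;
}
```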

Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
William Tu [Fri, 2 Mar 2018 19:06:52 +0000 (11:06 -0800)]
ip_gre: erspan: reload pointer after pskb_may_pull

Upstream commit:
    commit d91e8db5b629a3c8c81db4dc317a66c7b5591821
    Author: William Tu <u9012063@gmail.com>
    Date:   Fri Dec 15 14:27:44 2017 -0800

    net: erspan: reload pointer after pskb_may_pull

    pskb_may_pull() can change skb->data, so we need to re-load pkt_md
    and ershdr at the right place.
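
The hazard can be shown with a contrived userspace analogue. `pull_sketch()` is a hypothetical stand-in for pskb_may_pull() that always relocates the data, to make the problem obvious. The real function only reallocates when it has to pull data into the linear area. The point is that any header pointer computed before the call must be re-derived from the packet after it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet whose data may be moved by pull_sketch(). */
struct pkt_sketch {
    uint8_t *data;
    size_t len;
};

/* Ensures 'n' bytes are readable; may (here: always does) move the data. */
static bool
pull_sketch(struct pkt_sketch *p, size_t n)
{
    if (n > p->len) {
        return false;
    }
    uint8_t *copy = malloc(p->len);
    if (!copy) {
        return false;
    }
    memcpy(copy, p->data, p->len);
    free(p->data);
    p->data = copy;
    return true;
}

/* Reads the byte at 'off', reloading the header pointer after the pull. */
static int
read_field_after_pull(struct pkt_sketch *p, size_t off)
{
    if (!pull_sketch(p, off + 1)) {
        return -1;
    }
    const uint8_t *hdr = p->data;   /* Reload: an older pointer is stale. */
    return hdr[off];
}
```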

Fixes: 94d7d8f29287 ("ip6_gre: add erspan v2 support")
Fixes: f551c91de262 ("net: erspan: introduce erspan v2 for ip_gre")
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only the ip_gre portion of the upstream commit.  The ipv6 portion
is pulled in with a later patch in the series.

Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Haishuang Yan [Fri, 2 Mar 2018 18:42:21 +0000 (10:42 -0800)]
ip_gre: fix wrong return value of erspan_rcv

Upstream commit:
    commit c05fad5713b81b049ec6ac4eb2d304030b1efdce
    Author: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
    Date:   Fri Dec 15 10:46:16 2017 +0800

    ip_gre: fix wrong return value of erspan_rcv

    If pskb_may_pull() fails, return PACKET_REJECT instead of -ENOMEM.

Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN")
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
6 years ago compat/erspan: refactor existing erspan code
William Tu [Fri, 2 Mar 2018 18:12:21 +0000 (10:12 -0800)]
compat/erspan: refactor existing erspan code

Upstream commit:
    commit 1d7e2ed22f8d9171fa8b629754022f22115b3f03
    Author: William Tu <u9012063@gmail.com>
    Date:   Wed Dec 13 16:38:55 2017 -0800

    net: erspan: refactor existing erspan code

    The patch refactors the existing erspan implementation in order
    to support erspan version 2, which has additional metadata.  So,
    instead of having one 'struct erspanhdr' holding erspan version 1,
    it breaks it into 'struct erspan_base_hdr' and 'struct erspan_metadata'.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only part of the upstream commit is included.  While doing backports it
is pretty much impossible to fully reconstitute all upstream commits, but
we're doing our best.  Other parts of this commit are introduced in the
upcoming monster patch for ip6 gre support.

Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
6 years ago ip_gre: Refactor the erspan tunnel code.
William Tu [Fri, 23 Feb 2018 16:45:13 +0000 (08:45 -0800)]
ip_gre: Refactor the erspan tunnel code.

Upstream commit:
    commit a3222dc95ca751cdc5f6ac3c9b092b160b73ed9f
    Author: William Tu <u9012063@gmail.com>
    Date:   Thu Nov 30 11:51:27 2017 -0800

    ip_gre: Refector the erpsan tunnel code.

    Move two erspan functions to a header file, erspan.h, so the ipv6
    erspan implementation can use them.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
6 years ago ip_gre: erspan device should keep dst
Xin Long [Fri, 2 Mar 2018 17:26:50 +0000 (09:26 -0800)]
ip_gre: erspan device should keep dst

Upstream commit:
    commit c84bed440e4e11a973e8c0254d0dfaccfca41fb0
    Author: Xin Long <lucien.xin@gmail.com>
    Date:   Sun Oct 1 22:00:56 2017 +0800

    ip_gre: erspan device should keep dst

    The patch 'ip_gre: ipgre_tap device should keep dst' fixed the
    issue where the ipgre_tap device mtu couldn't be updated in the tx path.

    The same fix is needed for erspan as well.

Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
6 years ago ip_gre: set tunnel hlen properly in erspan_tunnel_init
Xin Long [Fri, 2 Mar 2018 17:21:43 +0000 (09:21 -0800)]
ip_gre: set tunnel hlen properly in erspan_tunnel_init

Upstream commit:
    commit c122fda271717f4fc618e0a31e833941fd5f1efd
    Author: Xin Long <lucien.xin@gmail.com>
    Date:   Sun Oct 1 22:00:55 2017 +0800

    ip_gre: set tunnel hlen properly in erspan_tunnel_init

    According to __gre_tunnel_init, tunnel->hlen should be set to the
    length of the headers between the inner packet and the outer iphdr.

    It is used in particular to calculate a proper mtu when updating
    the mtu in tnl_update_pmtu.  Without setting it, a larger mtu value
    than expected is set, which hurts performance a lot.

    This patch fixes it by setting tunnel->hlen to:
       tunnel->tun_hlen + tunnel->encap_hlen + sizeof(struct erspanhdr)

Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
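The arithmetic involved can be sketched as below. Assumptions: the outer IPv4 header carries no options (20 bytes), the GRE header used by ERSPAN carries a sequence number (4-byte base + 4-byte sequence = 8 bytes), the ERSPAN type II header is 8 bytes, and inner_payload_mtu() is a simplified stand-in, not the exact tnl_update_pmtu() formula. The helper names are hypothetical.

```c
/* Header sizes assumed for illustration. */
#define OUTER_IPV4_HLEN 20
#define GRE_HLEN_SEQ     8 /* 4-byte base header + 4-byte sequence number */
#define ERSPAN_II_HLEN   8
#define INNER_ETH_HLEN  14 /* the mirrored frame keeps its Ethernet header */

/* tunnel->hlen as the commit describes it. */
static int erspan_tunnel_hlen(int tun_hlen, int encap_hlen)
{
    return tun_hlen + encap_hlen + ERSPAN_II_HLEN;
}

/* Simplified estimate of the inner payload mtu for a given path mtu. */
static int inner_payload_mtu(int path_mtu, int hlen)
{
    return path_mtu - OUTER_IPV4_HLEN - hlen - INNER_ETH_HLEN;
}
```

Under these assumptions hlen is 16 and a 1500-byte path leaves 1450 bytes for the inner payload; leaving hlen at 0 would yield 1466, an mtu 16 bytes larger than what actually fits, which is the kind of overestimate the commit describes.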
6 years ago ip_gre: get key from session_id correctly in erspan_rcv
Xin Long [Fri, 2 Mar 2018 17:15:38 +0000 (09:15 -0800)]
ip_gre: get key from session_id correctly in erspan_rcv

Upstream commit:
    commit 935a9749a36828af0e8be224a5cd4bc758112c34
    Author: Xin Long <lucien.xin@gmail.com>
    Date:   Sun Oct 1 22:00:53 2017 +0800

    ip_gre: get key from session_id correctly in erspan_rcv

    erspan only uses the first 10 bits of session_id as the key to look
    up the tunnel.  But erspan_rcv omitted 'session_id & ID_MASK'
    when getting the key from session_id.

    If any other flag is also set in session_id in a packet, the lookup
    fails to find the tunnel due to the incorrect key in erspan_rcv.

    This patch adds 'session_id & ID_MASK' there and also removes
    the unnecessary variable session_id.

Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
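The masking step can be sketched as below. Assumption: the 16-bit field carries COS (3 bits), En (2 bits) and T (1 bit) above the 10-bit session ID, as in the ERSPAN type II layout; ID_MASK is the 10-bit mask the commit refers to, while the two function names are hypothetical.

```c
#include <stdint.h>

/* Low 10 bits of the 16-bit field hold the ERSPAN session ID; the upper
 * 6 bits carry COS (3 bits), En (2 bits) and T (1 bit). */
#define ID_MASK 0x03ff

/* Before the fix: the whole field was used as the lookup key. */
static uint32_t erspan_key_unmasked(uint16_t session_field)
{
    return session_field;
}

/* After the fix: only the session ID bits form the key. */
static uint32_t erspan_key(uint16_t session_field)
{
    return session_field & ID_MASK;
}
```

For a packet with session ID 100 and, say, COS 5 and the T bit set, only erspan_key() returns 100 and therefore matches a tunnel configured with key 100.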
6 years ago ip_gre: check packet length and mtu correctly in erspan tx
William Tu [Fri, 23 Feb 2018 00:50:36 +0000 (16:50 -0800)]
ip_gre: check packet length and mtu correctly in erspan tx

Upstream commit:
    commit f192970de860d3ab90aa9e2a22853201a57bde78
    Author: William Tu <u9012063@gmail.com>
    Date:   Thu Oct 5 12:07:12 2017 -0700

    ip_gre: check packet length and mtu correctly in erspan tx

    Similarly to an earlier patch for erspan_xmit(), for an ARPHRD_ETHER
    device skb->len is the length of the whole Ethernet packet, so
    dev->hard_header_len should be subtracted from skb->len.

Fixes: 1a66a836da63 ("gre: add collect_md mode to ERSPAN tunnel")
Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Xin Long <lucien.xin@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
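The corrected comparison can be sketched as below. Assumption: hard_header_len is 14 bytes (an Ethernet header) and both helpers are hypothetical; the point is that for an ARPHRD_ETHER device skb->len includes the link-layer header while dev->mtu does not.

```c
#include <stdbool.h>

#define ASSUMED_ETH_HLEN 14 /* hard_header_len of an Ethernet-type device */

/* Before: the whole frame, link-layer header included, vs. the L3 mtu. */
static bool frame_exceeds_mtu_old(unsigned int skb_len, unsigned int mtu)
{
    return skb_len > mtu;
}

/* After: discount the link-layer header first.  Written as an addition
 * rather than skb_len - hard_header_len to avoid unsigned underflow. */
static bool frame_exceeds_mtu(unsigned int skb_len,
                              unsigned int hard_header_len, unsigned int mtu)
{
    return skb_len > mtu + hard_header_len;
}
```

A full-size 1514-byte frame on a 1500-byte mtu device is wrongly flagged by the old form but accepted by the new one.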
6 years ago compat/gre: add collect_md mode
William Tu [Thu, 22 Feb 2018 23:37:35 +0000 (15:37 -0800)]
compat/gre: add collect_md mode

    commit 1a66a836da630cd70f3639208da549b549ce576b
    Author: William Tu <u9012063@gmail.com>
    Date:   Fri Aug 25 09:21:28 2017 -0700

    gre: add collect_md mode to ERSPAN tunnel

    Similar to gre, vxlan, geneve, ipip tunnels, allow ERSPAN tunnels to
    operate in 'collect metadata' mode.  bpf_skb_[gs]et_tunnel_key() helpers
    can make use of it right away.  OVS can use it as well in the future.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With some adjustments for the compatibility layer.

Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
6 years ago gre: refactor the gre_fb_xmit
William Tu [Thu, 22 Feb 2018 22:24:28 +0000 (14:24 -0800)]
gre: refactor the gre_fb_xmit

Upstream commit:
    commit 862a03c35ed76c50a562f7406ad23315f7862642
    Author: William Tu <u9012063@gmail.com>
    Date:   Fri Aug 25 09:21:27 2017 -0700

    gre: refactor the gre_fb_xmit

    The patch refactors the gre_fb_xmit function by creating a
    prepare_fb_xmit function for the later ERSPAN collect_md mode patch.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only the prepare_fb_xmit() function is pulled in.  Compatibility
issues prevent the refactor of gre_fb_xmit() but we need the
prepare_fb_xmit() function for the subsequent patch.

Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
6 years ago gre: fix goto statement typo
William Tu [Thu, 22 Feb 2018 20:52:51 +0000 (12:52 -0800)]
gre: fix goto statement typo

Upstream commit:
    commit e3d0328c76dde0b957f62f8c407b79f1d8fe3ef8
    Author: William Tu <u9012063@gmail.com>
    Date:   Tue Aug 22 17:04:05 2017 -0700

    gre: fix goto statement typo

    Fix typo: pnet_tap_faied.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
6 years ago gre: introduce native tunnel support for ERSPAN
William Tu [Thu, 22 Feb 2018 20:28:02 +0000 (12:28 -0800)]
gre: introduce native tunnel support for ERSPAN

Upstream commit:
    commit 84e54fe0a5eaed696dee4019c396f8396f5a908b
    Author: William Tu <u9012063@gmail.com>
    Date:   Tue Aug 22 09:40:28 2017 -0700

    gre: introduce native tunnel support for ERSPAN

    The patch adds ERSPAN type II tunnel support.  The implementation
    is based on the draft at [1].  One of the purposes is for a Linux
    box to be able to receive ERSPAN monitoring traffic sent from
    a Cisco switch, by creating an ERSPAN tunnel device.
    In addition, the patch also adds ERSPAN TX, so Linux virtual
    switch can redirect monitored traffic to the ERSPAN tunnel device.
    The traffic will be encapsulated into ERSPAN and sent out.

    The implementation reuses the tunnel key as the ERSPAN session ID,
    and the field 'erspan' as the ERSPAN Index field:
    ./ip link add dev ers11 type erspan seq key 100 erspan 123 \
     local 172.16.1.200 remote 172.16.1.100

    To use the above device as an ERSPAN receiver, configure a
    Nexus 5000 switch as below:

    monitor session 100 type erspan-source
      erspan-id 123
      vrf default
      destination ip 172.16.1.200
      source interface Ethernet1/11 both
      source interface Ethernet1/12 both
      no shut
    monitor erspan origin ip-address 172.16.1.100 global

    [1] https://tools.ietf.org/html/draft-foschiano-erspan-01
    [2] iproute2 patch: http://marc.info/?l=linux-netdev&m=150306086924951&w=2
    [3] test script: http://marc.info/?l=linux-netdev&m=150231021807304&w=2

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Meenakshi Vohra <mvohra@vmware.com>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit also backports heavily from the upstream gre, ip_gre and
ip_tunnel modules to support the necessary erspan ip gre
infrastructure, as well as implementing a variety of compatibility
layer changes for the same support.

Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
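The on-the-wire header that carries the session ID and Index can be sketched as below. Assumption: the field layout follows the ERSPAN type II header of the draft cited as [1] (Ver = 1, 12-bit VLAN, 3-bit COS, 2-bit En, 1-bit T, 10-bit session ID, then 12 reserved bits and a 20-bit Index); struct erspan2_hdr_sketch and erspan2_build() are hypothetical and are not the kernel's struct erspanhdr.

```c
#include <stdint.h>

/* ERSPAN type II header, 8 bytes, in network byte order. */
struct erspan2_hdr_sketch {
    uint8_t bytes[8];
};

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t) (v >> 24);
    p[1] = (uint8_t) (v >> 16);
    p[2] = (uint8_t) (v >> 8);
    p[3] = (uint8_t) v;
}

/* Word 0: Ver(4) VLAN(12) COS(3) En(2) T(1) Session ID(10).
 * Word 1: Reserved(12) Index(20). */
static struct erspan2_hdr_sketch
erspan2_build(uint16_t vlan, uint8_t cos, uint8_t en, int truncated,
              uint16_t session_id, uint32_t index)
{
    struct erspan2_hdr_sketch h;
    uint32_t w0 = (1u << 28)                        /* Ver = 1: type II */
                  | ((uint32_t) (vlan & 0xfff) << 16)
                  | ((uint32_t) (cos & 0x7) << 13)
                  | ((uint32_t) (en & 0x3) << 11)
                  | ((uint32_t) (truncated ? 1 : 0) << 10)
                  | (uint32_t) (session_id & 0x3ff);
    uint32_t w1 = index & 0xfffff;                  /* reserved bits stay 0 */

    put_be32(h.bytes, w0);
    put_be32(h.bytes + 4, w1);
    return h;
}
```

With the values from the ip link example above (key 100 as the session ID, erspan 123 as the Index), the sketch produces the bytes 10 00 00 64 00 00 00 7b.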
6 years ago compat: Remove unsupported kernel compat code
Greg Rose [Sat, 24 Feb 2018 22:04:31 +0000 (14:04 -0800)]
compat: Remove unsupported kernel compat code

Kernels older than 3.10 have not been supported for a couple of
releases, so remove the dead code.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
6 years ago ovsdb: Use new ovsdb_log_write_and_free().
Justin Pettit [Thu, 17 May 2018 17:58:47 +0000 (10:58 -0700)]
ovsdb: Use new ovsdb_log_write_and_free().

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
6 years ago ofproto-dpif-xlate: Improve tracing through groups.
Ben Pfaff [Thu, 10 May 2018 23:21:50 +0000 (16:21 -0700)]
ofproto-dpif-xlate: Improve tracing through groups.

This makes it clear which buckets from a group are executed and why.

The update to nsh.at provides an example.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
6 years ago ofp-print: Handle statistics more systematically.
Ben Pfaff [Thu, 10 May 2018 20:21:29 +0000 (13:21 -0700)]
ofp-print: Handle statistics more systematically.

ofp_to_string__() is supposed to call ofp_print_stats() for all kinds of
statistics, but it was only doing so haphazardly.  This commit makes it
systematic and in the process adds it to at least one case where it was
missing (and fixes up a test case).

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
6 years ago ofp-group: Move formatting code for groups into ofp-group.
Ben Pfaff [Thu, 10 May 2018 20:07:45 +0000 (13:07 -0700)]
ofp-group: Move formatting code for groups into ofp-group.

This does a better job of putting related code together.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
6 years ago Add OpenFlow extensions for group support in OpenFlow 1.0.
Ben Pfaff [Thu, 10 May 2018 00:03:56 +0000 (17:03 -0700)]
Add OpenFlow extensions for group support in OpenFlow 1.0.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
6 years ago ofp-group: Require watch_port or watch_group when parsing ff groups.
Ben Pfaff [Thu, 10 May 2018 22:55:07 +0000 (15:55 -0700)]
ofp-group: Require watch_port or watch_group when parsing ff groups.

Fast failover buckets must have a watch_port or a watch_group (or both),
and ovs-vswitchd enforces this, but the bucket parsing code didn't check
it.  This meant that when it was omitted, the error messages were harder
to understand.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
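The parse-time check can be sketched as below. Assumption: struct bucket_sketch, WATCH_NONE, the group-type enum and the error text are all hypothetical (real OpenFlow uses special "any" values for an unset watch port or group); the idea is to reject the bucket while parsing so that the user gets a specific message instead of a vaguer error later.

```c
#include <stddef.h>
#include <stdint.h>

#define WATCH_NONE UINT32_MAX /* hypothetical "not set" value */

enum group_type_sketch { GT_ALL, GT_SELECT, GT_INDIRECT, GT_FF };

struct bucket_sketch {
    uint32_t watch_port;
    uint32_t watch_group;
};

/* Returns NULL if the bucket is acceptable for the group type, otherwise a
 * specific error message that can be reported right at parse time. */
static const char *check_bucket(enum group_type_sketch type,
                                const struct bucket_sketch *b)
{
    if (type == GT_FF && b->watch_port == WATCH_NONE
        && b->watch_group == WATCH_NONE) {
        return "fast failover bucket needs watch_port or watch_group";
    }
    return NULL;
}
```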
6 years ago ofproto-dpif-xlate: Simplify translation for groups.
Ben Pfaff [Thu, 10 May 2018 22:23:43 +0000 (15:23 -0700)]
ofproto-dpif-xlate: Simplify translation for groups.

Translation of groups had a lot of redundant code.  This commit eliminates
most of it.  It should also make it harder to accidentally reintroduce
the reference leak fixed in a previous commit.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
6 years ago ofproto-dpif-xlate: Fix reference leak in xlate_dp_hash_select_group().
Ben Pfaff [Thu, 10 May 2018 20:50:51 +0000 (13:50 -0700)]
ofproto-dpif-xlate: Fix reference leak in xlate_dp_hash_select_group().

xlate_group_action() takes a reference to the ofgroup and passes it
down to xlate_group_action__(), xlate_select_group(), and finally to
xlate_dp_hash_select_group(), which is supposed to consume it but fails
to do so.  This commit fixes the problem.

Found by inspection.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
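The ownership convention involved can be sketched as below. Assumption: struct group_sketch and the *_sketch functions are hypothetical simplifications of the call chain named above, not the OVS functions; the rule they illustrate is that when a callee is documented to consume a reference, it must release it on every return path, otherwise the count never drops back.

```c
/* Hypothetical refcounted object standing in for a group. */
struct group_sketch {
    int refcnt;
};

static void group_ref(struct group_sketch *g)
{
    g->refcnt++;
}

static void group_unref(struct group_sketch *g)
{
    g->refcnt--;
}

/* Documented to consume the caller's reference, so it must release it on
 * every return path.  Omitting this call is the kind of leak described. */
static void dp_hash_select_sketch(struct group_sketch *g)
{
    /* ... choose and translate a bucket ... */
    group_unref(g);
}

/* Passes ownership of the reference straight through. */
static void select_group_sketch(struct group_sketch *g)
{
    dp_hash_select_sketch(g);
}

/* Takes a reference and hands it down the chain to be consumed. */
static void group_action_sketch(struct group_sketch *g)
{
    group_ref(g);
    select_group_sketch(g);
}
```

After any number of calls to group_action_sketch() the count returns to its starting value; removing the group_unref() call would make it grow by one per call.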
6 years ago ovs-ofctl: Clean up Group description in man page.
Justin Pettit [Wed, 16 May 2018 23:00:57 +0000 (16:00 -0700)]
ovs-ofctl: Clean up Group description in man page.

This fixes a few minor issues in the Group description of the ovs-ofctl
man page.  It also puts the description of the dump commands in the same
section as the other Group-related commands.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
6 years ago utilities: Add gdb debug commands to dump lists and pmd info
Eelco Chaudron [Wed, 16 May 2018 15:37:11 +0000 (17:37 +0200)]
utilities: Add gdb debug commands to dump lists and pmd info

Adds back-end support for walking ovs cmaps, and the following
commands to the gdb script:

- Dump all poll_thread info added to a specific struct dp_netdev*.
  Usage: ovs_dump_dp_netdev_poll_threads <struct dp_netdev *>

- Dump all nodes of an ovs_list:
    Usage: ovs_dump_ovs_list <struct ovs_list *> {[<structure>] [<member>] {dump}]}

    For example, dump all the non-quiescent OvS RCU threads:

      (gdb) ovs_dump_ovs_list &ovsrcu_threads
      (struct ovs_list *) 0x7f2a14000900
      (struct ovs_list *) 0x7f2acc000900
      (struct ovs_list *) 0x7f2a680668d0

    This is not very useful, so please use this with the container_of mode:

      (gdb) ovs_dump_ovs_list &ovsrcu_threads 'struct ovsrcu_perthread' list_node
      (struct ovsrcu_perthread *) 0x7f2a14000900
      (struct ovsrcu_perthread *) 0x7f2acc000900
      (struct ovsrcu_perthread *) 0x7f2a680668d0

    Now you can manually use the print command to show the content, or use the
    dump option to dump the structure for all nodes:

      (gdb) ovs_dump_ovs_list &ovsrcu_threads 'struct ovsrcu_perthread' list_node dump
      (struct ovsrcu_perthread *) 0x7f2a14000900 =
        {list_node = {prev = 0xf48e80 <ovsrcu_threads>, next = 0x7f2acc000900}, mutex...

      (struct ovsrcu_perthread *) 0x7f2acc000900 =
        {list_node = {prev = 0x7f2a14000900, next = 0x7f2a680668d0}, mutex ...

      (struct ovsrcu_perthread *) 0x7f2a680668d0 =
        {list_node = {prev = 0x7f2acc000900, next = 0xf48e80 <ovsrcu_threads>}, ...

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
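What the container_of mode computes can be sketched in C. Assumption: the list type and CONTAINER_OF_SKETCH are simplified stand-ins modeled on struct ovs_list and the CONTAINER_OF macro used throughout OVS; the gdb command presumably applies the same offset arithmetic. In the sample output above the list-node and structure addresses coincide, which is what happens when list_node is the first member; for any other member the offset is subtracted.

```c
#include <stddef.h>

/* Simplified doubly linked list node, modeled on struct ovs_list. */
struct list_sketch {
    struct list_sketch *prev;
    struct list_sketch *next;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define CONTAINER_OF_SKETCH(PTR, STRUCT, MEMBER) \
    ((STRUCT *) (void *) ((char *) (PTR) - offsetof(STRUCT, MEMBER)))

/* list_node first: node and structure share an address. */
struct first_member_sketch {
    struct list_sketch list_node;
    int id;
};

/* list_node not first: the member offset has to be subtracted. */
struct later_member_sketch {
    int id;
    struct list_sketch list_node;
};
```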
6 years ago checkpatch: Be more specific about line length, misspelling warnings.
Ben Pfaff [Wed, 9 May 2018 17:52:49 +0000 (10:52 -0700)]
checkpatch: Be more specific about line length, misspelling warnings.

Until now checkpatch warnings have not said how long a too-long line is
or what word might be misspelled.  This commit makes the messages more
explicit.

To do this, the 'print' functions needed to know the line that was in error.
One way to do that was to also pass the line in question to the 'print'
function.  I decided instead to allow the 'print' function to be
missing and to issue these warnings from the 'check' function.  I
don't know whether this design raises any red flags with anyone.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
6 years ago Add support for OpenFlow 1.5 statistics (OXS).
SatyaValli [Thu, 10 May 2018 16:26:54 +0000 (21:56 +0530)]
Add support for OpenFlow 1.5 statistics (OXS).

This patch provides an implementation in which existing flow entry
statistics are redefined as standard OXS (OpenFlow Extensible Statistics)
fields for displaying arbitrary flow stats.

To support this implementation, the following messages are newly added:

OFPRAW_OFPT15_FLOW_REMOVED,
OFPRAW_OFPST15_AGGREGATE_REQUEST,
OFPRAW_OFPST15_FLOW_REPLY,
OFPRAW_OFPST15_AGGREGATE_REPLY.

This commit adds support for the new feature in flow statistics
multipart messages and aggregate multipart messages, and OXS support for
flow removed messages and individual flow description messages.

Signed-off-by: Satya Valli <satyavalli.rama@tcs.com>
Co-authored-by: Lavanya Harivelam <harivelam.lavanya@tcs.com>
Signed-off-by: Lavanya Harivelam <harivelam.lavanya@tcs.com>
Co-authored-by: Surya Muttamsetty <muttamsetty.surya@tcs.com>
Signed-off-by: Surya Muttamsetty <muttamsetty.surya@tcs.com>
Co-authored-by: Manasa Cherukupally <manasa.cherukupally@tcs.com>
Signed-off-by: Manasa Cherukupally <manasa.cherukupally@tcs.com>
Co-authored-by: Pavani Panthagada <p.pavani1@tcs.com>
Signed-off-by: Pavani Panthagada <p.pavani1@tcs.com>
[blp@ovn.org simplified and rewrote much of the code]
Co-authored-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago conntrack-tcp: Handle tcp session reuse.
Darrell Ball [Tue, 15 May 2018 01:38:25 +0000 (18:38 -0700)]
conntrack-tcp: Handle tcp session reuse.

Fix tcp sequence tracking for cases when picking up an existing connection.
This can happen, for example, after a VM migration, and sequence tracking
should be more permissive in these cases.  We don't differentiate picking
up an existing connection from picking up a new connection; the added
complexity is not worth the benefit of slightly stricter checking when
picking up a new connection.

Fixes: a489b16854b5 ("conntrack: New userspace connection tracker")
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
6 years ago dpif-netdev-perf: Fix linker unresolved symbols on Windows
Alin Gabriel Serdean [Tue, 15 May 2018 00:48:59 +0000 (03:48 +0300)]
dpif-netdev-perf: Fix linker unresolved symbols on Windows

MSVC complains:
"libopenvswitch.lib(dpif-netdev.obj) : error LNK2019: unresolved external
symbol pmd_perf_start_iteration referenced in function pmd_thread_main
libopenvswitch.lib(dpif-netdev.obj) : error LNK2019: unresolved external
symbol pmd_perf_end_iteration referenced in function pmd_thread_main"

Remove inline keyword from the declaration of:
`pmd_perf_start_iteration` and `pmd_perf_end_iteration`

More on the subject:
https://docs.microsoft.com/en-us/cpp/error-messages/tool-errors/function-inlining-problems

Fixes: broken build on Windows
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
6 years ago datapath: compat: Fix build on RHEL 7.5
Yi-Hung Wei [Fri, 11 May 2018 17:32:12 +0000 (10:32 -0700)]
datapath: compat: Fix build on RHEL 7.5

1) The OVS datapath compat modules break on RHEL 7.5, because it moves the
ndo_change_mtu function pointer from 'struct net_device_ops' to
'struct net_device_ops_extended'.

2) RHEL 7.5 introduces the MTU range checking as mentioned in
6c0bf091 ("datapath: use core MTU range checking in core net infra").
However, the max_mtu field is defined in 'struct net_device_extended'
rather than in 'struct net_device' as in the upstream kernel.

This patch defines a new symbol HAVE_RHEL7_MAX_MTU that detects
the two conditions above, and fixes the backport issue.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
6 years ago tunnel: make tun_key_to_attr aware of tunnel type.
William Tu [Mon, 14 May 2018 18:46:47 +0000 (11:46 -0700)]
tunnel: make tun_key_to_attr aware of tunnel type.

When there is a flow rule which forwards a packet from a geneve
port to another tunnel port, e.g. gre, the tun_metadata carried
from the geneve port might affect the outgoing port.  For example,
the datapath action for output from a geneve port to a gre port (1) shows:
  set(tunnel(tun_id=0x7b,dst=2.2.2.2,ttl=64,
    geneve({class=0xffff,type=0,len=4,0x123}),flags(df|key))),1
where the geneve(...) should not be present.

When using the kernel's tunnel port, this triggers the error
"Multiple metadata blocks provided" when there is a rule forwarding
the geneve packet to a vxlan/erspan tunnel port.  A userspace test case
using geneve and gre also demonstrates the issue.

The patch makes tun_key_to_attr aware of the tunnel type, so only
the relevant output tunnel's options are set.

Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
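The decision the patch adds can be sketched as below. Assumption: the enum, struct tnl_key_sketch and option_blocks_for() are hypothetical and only model which option blocks survive; the real function serializes netlink attributes rather than counting them.

```c
#include <stdbool.h>

enum tnl_type_sketch { TNL_GENEVE, TNL_VXLAN, TNL_GRE, TNL_ERSPAN };

/* Hypothetical summary of metadata present in a flow's tunnel key. */
struct tnl_key_sketch {
    bool geneve_opts; /* e.g. TLVs received on a geneve input port */
    bool vxlan_opts;
    bool erspan_opts;
};

/* Count the option blocks that would be serialized for the output port,
 * keeping only options that belong to the output tunnel's own type. */
static int option_blocks_for(const struct tnl_key_sketch *k,
                             enum tnl_type_sketch out)
{
    int n = 0;

    if (k->geneve_opts && out == TNL_GENEVE) {
        n++;
    }
    if (k->vxlan_opts && out == TNL_VXLAN) {
        n++;
    }
    if (k->erspan_opts && out == TNL_ERSPAN) {
        n++;
    }
    return n;
}
```

Geneve metadata carried in from the input port is dropped when the output is gre, and at most one block is ever emitted, which avoids the "Multiple metadata blocks provided" case.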
6 years ago sparse: Support newer GCC/glibc versions.
Ben Pfaff [Mon, 14 May 2018 17:06:20 +0000 (10:06 -0700)]
sparse: Support newer GCC/glibc versions.

This fixes some "sparse" errors I encountered after upgrading.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
6 years ago ovn: Set proper Neighbour Adv flag when replying for NS request for router IP
Numan Siddique [Fri, 11 May 2018 10:38:00 +0000 (16:08 +0530)]
ovn: Set proper Neighbour Adv flag when replying for NS request for router IP

Presently, when a VM's IPv6 stack sends a Neighbor Solicitation request for
its router IP (mostly when the ND cache entry for the router is in STALE
state), ovn-controller responds with a Neighbor Adv packet (using the action
nd_na).  But it doesn't set 'ND_RSO_ROUTER' in the RSO flags (please see
RFC4861 page 23).  Because of this, the VM deletes the default route.  The
default route gets added again when the next RA is received (but is deleted
again if the VM sends another NS request).  This results in disruption of
IPv6 traffic.

This patch addresses this issue by adding a new action 'nd_na_router' which
is the same as 'nd_na' except that it sets 'ND_RSO_ROUTER' in the RSO flags.
ovn-northd uses this action.  A new action is added instead of modifying the
existing 'nd_na' action because:
  - We cannot set the RSO flags in "nd_na { ..actions .. }"
  - It would be ugly to have something like nd_na { router_flags, ...actions .. }

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1567735
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
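The flag difference between the two actions can be sketched as below. Assumptions: the constants are defined locally with the R, S and O bit positions of the Neighbor Advertisement flags word from RFC 4861 section 4.4 (host byte order), the plain reply is assumed to set Solicited and Override, and na_flags() and put_na_flags() are hypothetical helpers, not the OVN code.

```c
#include <stdint.h>

/* R, S and O bits of the 32-bit word after the ICMPv6 checksum in a
 * Neighbor Advertisement (RFC 4861, section 4.4), host byte order. */
#define ND_RSO_ROUTER    0x80000000u
#define ND_RSO_SOLICITED 0x40000000u
#define ND_RSO_OVERRIDE  0x20000000u

/* Flags for a solicited reply, optionally sent on behalf of a router. */
static uint32_t na_flags(int from_router)
{
    uint32_t flags = ND_RSO_SOLICITED | ND_RSO_OVERRIDE;

    if (from_router) {
        flags |= ND_RSO_ROUTER; /* what nd_na_router adds */
    }
    return flags;
}

/* Serialize the flags word in network byte order, as sent on the wire. */
static void put_na_flags(uint8_t out[4], uint32_t flags)
{
    out[0] = (uint8_t) (flags >> 24);
    out[1] = (uint8_t) (flags >> 16);
    out[2] = (uint8_t) (flags >> 8);
    out[3] = (uint8_t) flags;
}
```

The R bit matters because RFC 4861 section 7.2.5 requires a host whose neighbor's IsRouter flag changes from TRUE to FALSE to remove that router from its Default Router List, which matches the lost default route described above.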
6 years ago tests: Wait for NDPs to be sent in tunnel-push-pop-ipv6.
Ilya Maximets [Mon, 14 May 2018 13:35:43 +0000 (16:35 +0300)]
tests: Wait for NDPs to be sent in tunnel-push-pop-ipv6.

Otherwise the tests can fail under heavy load (or with valgrind).

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago netdev-tc-offloads: Fix incorrect mask in probe_multi_mask_per_prio().
Ben Pfaff [Fri, 11 May 2018 16:10:01 +0000 (09:10 -0700)]
netdev-tc-offloads: Fix incorrect mask in probe_multi_mask_per_prio().

Presumably this was meant to be all-one-bits but it wasn't.  It also didn't
have the right endianness for an ovs_be16, so "sparse" complained.

CC: Paul Blakey <paulb@mellanox.com>
CC: Simon Horman <simon.horman@netronome.com>
Fixes: d00eeded6a9b ("netdev-tc-offloads: Probe for allowing multiple masks on single priority")
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
6 years ago rhel: Specify that force-corefiles is enabled by default
Timothy Redaelli [Fri, 11 May 2018 17:13:10 +0000 (19:13 +0200)]
rhel: Specify that force-corefiles is enabled by default

Currently in /etc/sysconfig/openvswitch it's not clear that
force-corefiles is enabled by default.
This patch adds a comment explaining that force-corefiles is, by
default, set to yes.

Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago Merge branch 'dpdk_merge' of https://github.com/istokes/ovs into HEAD
Ben Pfaff [Fri, 11 May 2018 16:07:09 +0000 (09:07 -0700)]
Merge branch 'dpdk_merge' of https://github.com/istokes/ovs into HEAD

6 years ago netdev-tc-offloads: Probe for allowing multiple masks on single priority
Paul Blakey [Sun, 6 May 2018 12:26:35 +0000 (15:26 +0300)]
netdev-tc-offloads: Probe for allowing multiple masks on single priority

OVS datapath flows don't overlap, so having their tc flower counterparts
be prioritized makes no sense; we did so because of a tc flower
restriction.

Kernel tc flower added support for multiple masks on a single flower
instance (there's an instance per priority) to remove this restriction.
Probe for this once, on the first added port, and if available, use a
single priority per ethertype when inserting tc flower flows.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
6 years ago netdev-dpdk: Fixed netdev_dpdk structure alignment
Eelco Chaudron [Wed, 25 Apr 2018 15:48:23 +0000 (17:48 +0200)]
netdev-dpdk: Fixed netdev_dpdk structure alignment

Currently, the code tells us we have 4 pad bytes left in cacheline0
while actually we are 8 bytes short:

struct netdev_dpdk {
        union {
                OVS_CACHE_LINE_MARKER cacheline0;        /*           1 */
                struct {
                        dpdk_port_t port_id;             /*     0     2 */
                        _Bool      attached;             /*     2     1 */
                        struct eth_addr hwaddr;          /*     4     6 */
                        int        mtu;                  /*    12     4 */
                        int        socket_id;            /*    16     4 */
                        int        buf_size;             /*    20     4 */
                        int        max_packet_len;       /*    24     4 */
                        enum dpdk_dev_type type;         /*    28     4 */
                        enum netdev_flags flags;         /*    32     4 */
                        char *     devargs;              /*    40     8 */
                        struct dpdk_tx_queue * tx_q;     /*    48     8 */
                        struct rte_eth_link link;        /*    56     8 */
                        int        link_reset_cnt;       /*    64     4 */
                };                                       /*          72 */
                uint8_t            pad9[128];            /*         128 */
        };                                               /*     0   128 */
        /* --- cacheline 2 boundary (128 bytes) --- */

Relocated one member, link_reset_cnt, and now it fits in one cache line:

struct netdev_dpdk {
        union {
                OVS_CACHE_LINE_MARKER cacheline0;        /*           1 */
                struct {
                        dpdk_port_t port_id;             /*     0     2 */
                        _Bool      attached;             /*     2     1 */
                        struct eth_addr hwaddr;          /*     4     6 */
                        int        mtu;                  /*    12     4 */
                        int        socket_id;            /*    16     4 */
                        int        buf_size;             /*    20     4 */
                        int        max_packet_len;       /*    24     4 */
                        enum dpdk_dev_type type;         /*    28     4 */
                        enum netdev_flags flags;         /*    32     4 */
                        int        link_reset_cnt;       /*    36     4 */
                        char *     devargs;              /*    40     8 */
                        struct dpdk_tx_queue * tx_q;     /*    48     8 */
                        struct rte_eth_link link;        /*    56     8 */
                };                                       /*          64 */
                uint8_t            pad9[64];             /*          64 */
        };                                               /*     0    64 */
        /* --- cacheline 1 boundary (64 bytes) --- */

Fixes: 5e925ccc2a6f ("netdev-dpdk: DPDK v17.11 upgrade")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Tiago Lam <tiago.lam@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
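The effect of moving link_reset_cnt can be reproduced with plain C types. Assumptions: an LP64 target such as x86-64, a 6-byte, 2-byte-aligned stand-in for struct eth_addr, and a 64-bit integer in place of the 8-byte struct rte_eth_link; the _sketch types are hypothetical, not the real netdev_dpdk definition. Under those assumptions the offsets match the pahole output above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the member types in the pahole output. */
struct eth_addr_sketch { uint16_t be16[3]; }; /* 6 bytes, 2-byte aligned */
typedef uint64_t eth_link_sketch;             /* 8-byte rte_eth_link stand-in */

struct before_sketch {
    uint16_t port_id;
    bool attached;
    struct eth_addr_sketch hwaddr;
    int mtu, socket_id, buf_size, max_packet_len;
    int type;           /* enum dpdk_dev_type */
    int flags;          /* enum netdev_flags; a 4-byte hole follows */
    char *devargs;
    void *tx_q;
    eth_link_sketch link;
    int link_reset_cnt; /* spills past byte 64 */
};

struct after_sketch {
    uint16_t port_id;
    bool attached;
    struct eth_addr_sketch hwaddr;
    int mtu, socket_id, buf_size, max_packet_len;
    int type;
    int flags;
    int link_reset_cnt; /* fills the former hole before devargs */
    char *devargs;
    void *tx_q;
    eth_link_sketch link;
};
```

On such a target, sizeof() reports 72 bytes for the first layout and 64 for the second, with link_reset_cnt moving from offset 64 into the former hole at offset 36.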
6 years ago dpdk-testsuite: Filter 1G HugePages WARN log.
Tiago Lam [Thu, 3 May 2018 16:35:44 +0000 (17:35 +0100)]
dpdk-testsuite: Filter 1G HugePages WARN log.

Currently, DPDK prints a WARN log if one doesn't have 1GB HugePages
available. Since OVS_SWITCHD_STOP considers any WARN a failure, the
newly added DPDK testsuite tests fail if one doesn't have 1GB Hugepages
configured, even though it is still possible to run OvS-DPDK over 2MB
HugePages.

This changes the tests to filter out the following message, so that it
is ignored and systems with 2MB HugePages can run the tests
successfully:
    EAL: No free hugepages reported in hugepages-1048576kB

Signed-off-by: Tiago Lam <tiago.lam@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
6 years ago gitignore: Ignore system-dpdk-testsuite
Tiago Lam [Thu, 3 May 2018 16:35:43 +0000 (17:35 +0100)]
gitignore: Ignore system-dpdk-testsuite

Commit a7e4849 ("tests: Add system-dpdk-testsuite") introduced a new
testsuite for OvS-DPDK. This generates a system-dpdk-testsuite script at
build time which, as with other testsuites, should not be part
of the repo.

Add the generated script to tests/.gitignore to reflect the above.

Signed-off-by: Tiago Lam <tiago.lam@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
6 years ago Configurable Link State Change (LSC) detection mode
Róbert Mulik [Mon, 23 Apr 2018 11:42:41 +0000 (11:42 +0000)]
Configurable Link State Change (LSC) detection mode

It is possible to set the LSC detection mode to polling or interrupt mode
for DPDK interfaces. The default is polling mode. To set interrupt mode,
the option dpdk-lsc-interrupt has to be set to true.

For a detailed description and usage, see the DPDK install documentation.

Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
6 years ago dpif-netdev: Detection and logging of suspicious PMD iterations
Jan Scheurich [Thu, 19 Apr 2018 17:40:46 +0000 (19:40 +0200)]
dpif-netdev: Detection and logging of suspicious PMD iterations

This patch enhances dpif-netdev-perf to detect iterations with
suspicious statistics according to the following criteria:

- iteration lasts longer than US_THR microseconds (default 250).
  This can be used to capture events where a PMD is blocked or
  interrupted for such a period of time that there is a risk for
  dropped packets on any of its Rx queues.

- max vhost qlen exceeds a threshold Q_THR (default 128). This can
  be used to infer virtio queue overruns and dropped packets inside
  a VM, which are not visible in OVS otherwise.

Such suspicious iterations can be logged together with their iteration
statistics to be able to correlate them to packet drop or other events
outside OVS.
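
The two criteria above reduce to a simple predicate. This C sketch
assumes per-iteration samples that already hold the duration in
microseconds and the maximum vhost queue fill level; struct iter_sample
and iteration_is_suspicious() are hypothetical names, and the actual
implementation may measure duration differently (for example in cycles).

```c
#include <stdbool.h>
#include <stdint.h>

/* Default thresholds as described above. */
#define US_THR_DEFAULT 250  /* Iteration duration, in microseconds. */
#define Q_THR_DEFAULT  128  /* Max vhost queue fill level. */

/* Hypothetical per-iteration sample; field names are illustrative. */
struct iter_sample {
    uint64_t duration_us;    /* Wall-clock duration of the iteration. */
    uint32_t max_vhost_qlen; /* Highest vhost rx queue fill level seen. */
};

/* Returns true if the iteration meets either suspicion criterion. */
bool
iteration_is_suspicious(const struct iter_sample *s,
                        uint64_t us_thr, uint32_t q_thr)
{
    return s->duration_us > us_thr || s->max_vhost_qlen > q_thr;
}
```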

A new command is introduced to enable/disable logging at run-time and
to adjust the above thresholds for suspicious iterations:

ovs-appctl dpif-netdev/pmd-perf-log-set on | off
    [-b before] [-a after] [-e|-ne] [-us usec] [-q qlen]

Turn logging on or off at run-time (on|off).

-b before:  The number of iterations before the suspicious iteration to
            be logged (default 5).
-a after:   The number of iterations after the suspicious iteration to
            be logged (default 5).
-e:         Extend logging interval if another suspicious iteration is
            detected before logging occurs.
-ne:        Do not extend logging interval (default).
-q qlen:    Suspicious vhost queue fill level threshold (default 128).
            Increase this to 512 if QEMU supports a virtio queue length
            of 1024.
-us usec:   Change the duration threshold for a suspicious iteration
            (default 250 us).

Note: Logging of suspicious iterations itself consumes a considerable
number of PMD processing cycles, which may be visible in the iteration
history. In the worst case, this can lead OVS to detect another
suspicious iteration caused by the logging itself.

If more than 100 iterations around a suspicious iteration have been
logged once, OVS falls back to the safe default values (-b 5/-a 5/-ne)
to prevent logging itself from causing continuous further logging.
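
A sketch of the logging window described above, assuming iterations are
numbered consecutively: a suspicious iteration opens a window of -b
iterations before and -a after it, -e lets a later suspicious iteration
push out the end of an open window, and a window longer than 100
iterations restores the defaults. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOG_IT_BEFORE_DEFAULT 5
#define LOG_IT_AFTER_DEFAULT  5
#define LOG_IT_MAX_SAFE     100  /* Above this, revert to the defaults. */

/* Hypothetical logging state; names are illustrative, not OVS's. */
struct susp_log {
    uint32_t before;  /* -b: iterations logged before the event. */
    uint32_t after;   /* -a: iterations logged after the event. */
    bool extend;      /* true for -e, false for -ne. */
    bool pending;     /* A logging window is currently open. */
    uint64_t start;   /* First iteration in the window. */
    uint64_t end;     /* Last iteration in the window. */
};

/* Called for suspicious iteration 'it': opens a window or, with -e,
 * extends the window that is already open. */
void
susp_log_event(struct susp_log *l, uint64_t it)
{
    if (!l->pending) {
        l->pending = true;
        l->start = it >= l->before ? it - l->before : 0;
        l->end = it + l->after;
    } else if (l->extend) {
        l->end = it + l->after;
    }
}

/* Called after iteration 'it' completes.  Once the window is complete,
 * "logs" it and returns the number of iterations logged (0 otherwise).
 * If that number exceeds LOG_IT_MAX_SAFE, the safe defaults return. */
uint64_t
susp_log_flush(struct susp_log *l, uint64_t it)
{
    if (!l->pending || it < l->end) {
        return 0;
    }
    uint64_t n = l->end - l->start + 1;
    l->pending = false;
    if (n > LOG_IT_MAX_SAFE) {
        l->before = LOG_IT_BEFORE_DEFAULT;
        l->after = LOG_IT_AFTER_DEFAULT;
        l->extend = false;
    }
    return n;
}
```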

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
6 years agodpif-netdev: Detailed performance stats for PMDs
Jan Scheurich [Thu, 19 Apr 2018 17:40:45 +0000 (19:40 +0200)]
dpif-netdev: Detailed performance stats for PMDs

This patch instruments the dpif-netdev datapath to record detailed
statistics of what is happening in every iteration of a PMD thread.

The collection of detailed statistics can be controlled by a new
Open_vSwitch configuration parameter "other_config:pmd-perf-metrics".
By default it is disabled. When enabled, the run-time overhead is on
the order of 1%.

The covered metrics per iteration are:
  - cycles
  - packets
  - (rx) batches
  - packets/batch
  - max. vhostuser qlen
  - upcalls
  - cycles spent in upcalls

This raw recorded data is used in three ways:

1. In histograms for each of the following metrics:
   - cycles/iteration (log.)
   - packets/iteration (log.)
   - cycles/packet
   - packets/batch
   - max. vhostuser qlen (log.)
   - upcalls
   - cycles/upcall (log.)
   The histogram bins are divided linearly or logarithmically (log.).

2. A cyclic history of the above statistics for 999 iterations

3. A cyclic history of the cumulative/average values per millisecond
   wall clock for the last 1000 milliseconds:
   - number of iterations
   - avg. cycles/iteration
   - packets (Kpps)
   - avg. packets/batch
   - avg. max vhost qlen
   - upcalls
   - avg. cycles/upcall
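
A C sketch of two of the structures listed above: a logarithmic
histogram and a fixed-size cyclic history. It assumes power-of-two bin
boundaries and a 1000-entry ring purely for simplicity; the bin layout,
sizes and names used in dpif-netdev are not reproduced here.

```c
#include <stdint.h>

#define HIST_BINS   32    /* Hypothetical number of bins. */
#define HISTORY_LEN 1000  /* Ring size: keeps the most recent samples. */

/* Logarithmic histogram: bin i counts values in [2^i, 2^(i+1)).
 * Bin 0 also counts 0; values too large go into the last bin. */
struct log_histogram {
    uint64_t bin[HIST_BINS];
};

static unsigned
log2_floor(uint64_t v)
{
    unsigned i = 0;
    while (v >>= 1) {
        i++;
    }
    return i;
}

void
log_histogram_add(struct log_histogram *h, uint64_t value)
{
    unsigned i = value ? log2_floor(value) : 0;
    if (i >= HIST_BINS) {
        i = HIST_BINS - 1;
    }
    h->bin[i]++;
}

/* Cyclic history: once full, each new sample overwrites the oldest. */
struct history {
    uint64_t sample[HISTORY_LEN];
    uint64_t count;  /* Total number of samples ever added. */
};

void
history_add(struct history *hs, uint64_t value)
{
    hs->sample[hs->count % HISTORY_LEN] = value;
    hs->count++;
}

/* Returns the k-th most recent sample (k = 0 is the newest).  The
 * caller must ensure k < count and k < HISTORY_LEN. */
uint64_t
history_recent(const struct history *hs, uint64_t k)
{
    return hs->sample[(hs->count - 1 - k) % HISTORY_LEN];
}
```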

The gathered performance metrics can be printed at any time with the
new CLI command

ovs-appctl dpif-netdev/pmd-perf-show [-nh] [-it iter_len] [-ms ms_len]
    [-pmd core] [dp]

The options are

-nh:            Suppress the histograms
-it iter_len:   Display the last iter_len iteration stats
-ms ms_len:     Display the last ms_len millisecond stats
-pmd core:      Display only the specified PMD

The performance statistics are reset with the existing
dpif-netdev/pmd-stats-clear command.

The output always contains the following global PMD statistics,
similar to the pmd-stats-show command:

Time: 15:24:55.270
Measurement duration: 1.008 s

pmd thread numa_id 0 core_id 1:

  Cycles:            2419034712  (2.40 GHz)
  Iterations:            572817  (1.76 us/it)
  - idle:                486808  (15.9 % cycles)
  - busy:                 86009  (84.1 % cycles)
  Rx packets:           2399607  (2381 Kpps, 848 cycles/pkt)
  Datapath passes:      3599415  (1.50 passes/pkt)
  - EMC hits:            336472  ( 9.3 %)
  - Megaflow hits:      3262943  (90.7 %, 1.00 subtbl lookups/hit)
  - Upcalls:                  0  ( 0.0 %, 0.0 us/upcall)
  - Lost upcalls:             0  ( 0.0 %)
  Tx packets:           2399607  (2381 Kpps)
  Tx batches:            171400  (14.00 pkts/batch)
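
The derived figures in parentheses can be recomputed from the raw
counters. This C sketch is a worked check, assuming that rates use the
1.008 s measurement duration and that cycles/pkt is busy cycles divided
by Rx packets, with busy cycles reconstructed from the 84.1 % share
because the raw busy count is not shown. These formulas are inferred
from the sample output, not taken from the OVS source.

```c
#include <stdint.h>

/* Raw counters, as printed in the sample output above. */
struct pmd_raw {
    double duration_s;     /* Measurement duration. */
    uint64_t cycles;       /* Total cycles. */
    uint64_t iterations;
    double busy_share;     /* Fraction of cycles in busy iterations. */
    uint64_t rx_packets;
    uint64_t dp_passes;
    uint64_t emc_hits;
    uint64_t megaflow_hits;
    uint64_t tx_packets;
    uint64_t tx_batches;
};

struct pmd_derived {
    double ghz, us_per_it, kpps, cycles_per_pkt, passes_per_pkt;
    double emc_pct, megaflow_pct, pkts_per_batch;
};

/* Computes the derived values using the assumed formulas. */
struct pmd_derived
pmd_derive(const struct pmd_raw *r)
{
    struct pmd_derived d;
    d.ghz = r->cycles / r->duration_s / 1e9;
    d.us_per_it = r->duration_s * 1e6 / r->iterations;
    d.kpps = r->rx_packets / r->duration_s / 1e3;
    d.cycles_per_pkt = r->cycles * r->busy_share / r->rx_packets;
    d.passes_per_pkt = (double) r->dp_passes / r->rx_packets;
    d.emc_pct = 100.0 * r->emc_hits / r->dp_passes;
    d.megaflow_pct = 100.0 * r->megaflow_hits / r->dp_passes;
    d.pkts_per_batch = (double) r->tx_packets / r->tx_batches;
    return d;
}
```

With the sample counters, these formulas give values that round to the
printed 2.40 GHz, 1.76 us/it, 2381 Kpps, 848 cycles/pkt, 1.50
passes/pkt, 9.3 %, 90.7 % and 14.00 pkts/batch.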

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
6 years agonetdev: Add optional qfill output parameter to rxq_recv()
Jan Scheurich [Thu, 19 Apr 2018 17:40:44 +0000 (19:40 +0200)]
netdev: Add optional qfill output parameter to rxq_recv()

If the caller provides a non-NULL qfill pointer and the netdev
implementation supports reading the rx queue fill level, rxq_recv()
returns, through qfill, the number of packets remaining in the rx queue
after the packet burst has been received. If the implementation does
not support this, -ENOTSUP is returned in its place. Reading the
remaining queue fill level should not substantially slow down the
recv() operation.
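
A minimal C sketch of the optional output-parameter pattern, using a
hypothetical in-memory queue rather than the netdev API: a NULL qfill
means the caller is not interested, and an implementation that cannot
read the fill level stores -ENOTSUP in it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical receive queue model; not the OVS netdev API. */
struct fake_rxq {
    int queued;          /* Packets waiting in the queue. */
    int supports_qfill;  /* Can this "implementation" report fill level? */
};

/* Receives up to 'burst' packets and returns how many were received.
 * If 'qfill' is non-NULL, stores the number of packets still queued
 * after the burst, or -ENOTSUP if that cannot be determined. */
int
fake_rxq_recv(struct fake_rxq *rxq, int burst, int *qfill)
{
    int n = rxq->queued < burst ? rxq->queued : burst;
    rxq->queued -= n;
    if (qfill) {
        *qfill = rxq->supports_qfill ? rxq->queued : -ENOTSUP;
    }
    return n;
}
```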

A first implementation is provided for ethernet and vhostuser DPDK ports
in netdev-dpdk.c.

This output parameter will be used in the upcoming commit for PMD
performance metrics to supervise the rx queue fill level for DPDK
vhostuser ports.

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
6 years agonetdev-dpdk: don't enable scatter for jumbo RX support for nfp
Pablo Cascón [Fri, 27 Apr 2018 16:40:49 +0000 (17:40 +0100)]
netdev-dpdk: don't enable scatter for jumbo RX support for nfp

Currently, receiving jumbo packets fails on NICs that do not support
scatter. Scatter is not strictly needed for jumbo RX support. This
change fixes the issue by skipping scatter only for the PMD/NIC (nfp)
known not to need it for jumbo RX support.

Note: this change is temporary and is not needed for later OVS/DPDK
releases.
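
A C sketch of the decision, assuming scatter is requested only when
jumbo frames are configured and that the nfp PMD can be recognized by a
"net_nfp" driver-name prefix; both the function want_rx_scatter() and
that prefix are assumptions for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Decides whether to request Rx scatter for a port. */
bool
want_rx_scatter(const char *driver_name, bool jumbo_mtu)
{
    if (!jumbo_mtu) {
        return false;  /* Standard MTU: scatter is not needed. */
    }
    /* Skip scatter only for the PMD known not to need it for jumbo Rx. */
    return strncmp(driver_name, "net_nfp", strlen("net_nfp")) != 0;
}
```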

Reported-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
6 years agofaq: Document DPDK version maintenance.
Kevin Traynor [Wed, 25 Apr 2018 11:20:53 +0000 (12:20 +0100)]
faq: Document DPDK version maintenance.

The faq already shows the DPDK versions that were
used with each OVS version. Give information about
DPDK stable and LTS releases, so the user can tell
whether those versions are maintained.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
6 years agodpdk: Use DPDK 17.11.2 release.
Kevin Traynor [Mon, 23 Apr 2018 15:11:46 +0000 (16:11 +0100)]
dpdk: Use DPDK 17.11.2 release.

Modify travis linux build script to use the latest
DPDK stable release 17.11.2. Update docs for latest
DPDK stable releases.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
6 years agoovn-nbctl: Support ACL commands on port groups.
Han Zhou [Thu, 10 May 2018 06:32:04 +0000 (23:32 -0700)]
ovn-nbctl: Support ACL commands on port groups.

Add support for using ovn-nbctl to add/delete/list ACLs on port
groups.

A new option --type is also supported for these commands to
explicitly specify, when needed, whether the operation is on a
port-group or a logical switch. E.g.

ovn-nbctl --type=port-group acl-add port_group1 to-lport 1000 \
    'outport == @port_group1 && ip4.src == $port_group1_ip4' \
    allow-related

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agoAdd mirror/ovs-tcpdump support for slow protocols' Rx path.
Manohar Krishnappa Chidambaraswamy [Thu, 10 May 2018 06:56:22 +0000 (06:56 +0000)]
Add mirror/ovs-tcpdump support for slow protocols' Rx path.

Problem:
========
Received LACP/CFM/BFD/STP/LLDP slow protocol packets are not captured
by ovs-tcpdump.

Fix:
====
Add mirror support for slow protocols.

Signed-off-by: Manohar K C <manohar.krishnappa.chidambaraswamy@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agoofproto: Fix crash processing malformed Bundle Add message.
Anju Thomas [Mon, 7 May 2018 17:28:06 +0000 (22:58 +0530)]
ofproto: Fix crash processing malformed Bundle Add message.

When an OpenFlow Bundle Add message is received, a bundle entry is
created and the OpenFlow message embedded in the bundle add message is
processed.  If any error is encountered while processing the embedded
message, the bundle entry is freed. The bundle entry free function
assumes that the entry has been populated with a properly formatted
OpenFlow message and performs some message-specific cleanup actions.
This assumption does not hold true in the error case and OVS crashes
when performing the cleanup.

The fix is to simply free the bundle entry in case of errors, without
attempting to perform any embedded message cleanup.
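
A C sketch of the fixed free path, with hypothetical names:
bundle_entry_free() runs the message-specific cleanup only when the
embedded message was processed without error, and otherwise frees just
the entry. The cleanups_run counter exists only to make the behavior
observable.

```c
#include <stdlib.h>

/* Hypothetical bundle entry holding message-specific state. */
struct bundle_entry {
    void *msg_state;  /* Only meaningful if the message was valid. */
};

static int cleanups_run;  /* Counts message-specific cleanups (demo). */

/* Stands in for cleanup that assumes a properly formatted message. */
static void
msg_cleanup(struct bundle_entry *e)
{
    cleanups_run++;
    free(e->msg_state);
}

/* Frees 'e'.  If processing failed ('error' nonzero), the embedded
 * message may be malformed, so message-specific cleanup is skipped. */
void
bundle_entry_free(struct bundle_entry *e, int error)
{
    if (!error) {
        msg_cleanup(e);
    }
    free(e);
}
```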

Signed-off-by: Anju Thomas <anju.thomas@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agotests: Fix typo in test name.
Ben Pfaff [Fri, 13 Apr 2018 17:38:53 +0000 (10:38 -0700)]
tests: Fix typo in test name.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
6 years agofaq: Start an OVN FAQ by giving a rationale for how it uses tunnels.
Ben Pfaff [Mon, 16 Apr 2018 19:16:24 +0000 (12:16 -0700)]
faq: Start an OVN FAQ by giving a rationale for how it uses tunnels.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
6 years agocheckpatch: Fix filename matching.
Ben Pfaff [Wed, 9 May 2018 18:26:06 +0000 (11:26 -0700)]
checkpatch: Fix filename matching.

The .match() method only matches at the beginning of a string but the
blacklists here need to match anywhere in a string.
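
In C terms, this is the difference between an anchored and an
unanchored POSIX regular expression. This sketch, with a hypothetical
regex_found() helper, behaves like Python's re.search(); prefixing the
pattern with "^" reproduces the start-anchored re.match() behavior that
caused the bug. The file names used are illustrative.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if extended regex 'pattern' matches anywhere in 'str'.
 * A pattern starting with '^' only matches at the start of 'str'. */
bool
regex_found(const char *pattern, const char *str)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB)) {
        return false;  /* Invalid pattern. */
    }
    bool found = regexec(&re, str, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}
```

For example, "\.mk$" is found in "lib/automake.mk", while the anchored
"^\.mk$" (the re.match() equivalent) is not.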

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Numan Siddique <nusiddiq@redhat.com>
6 years agorhel: openvswitch-fedora.spec.in: Specify PYTHON and PYTHON3
Timothy Redaelli [Thu, 10 May 2018 15:21:41 +0000 (17:21 +0200)]
rhel: openvswitch-fedora.spec.in: Specify PYTHON and PYTHON3

Currently the python2 and python3 binaries are searched for by following
the PATH, but on Fedora the python2 package does not provide
/bin/python2. So if the PATH contains /bin before /usr/bin (for example,
when using the ansible poc), the resulting RPM file will require
/bin/python2 instead of /usr/bin/python2, which breaks some tools (for
example, createrepo).

This patch specifies the full path of the python2 interpreter and, if
the python3-openvswitch package is built, the full path of the python3
interpreter.

Reported-by: Ansis Atteka <ansisatteka@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346796.html
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Ansis Atteka <aatteka@ovn.org>
6 years agoofproto: Allow bundle idle timeout to be configured.
Flavio Leitner [Thu, 19 Apr 2018 17:09:38 +0000 (14:09 -0300)]
ofproto: Allow bundle idle timeout to be configured.

In some cases a 10-second bundle idle timeout might be too long, and
in other cases it might be too short.

The OpenFlow spec mandates that it should wait at least one
second, so enforce that as the minimum acceptable value.
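
A C sketch of the resulting rule, assuming the timeout is configured in
whole seconds and that an unset value keeps the previous 10-second
behavior. effective_bundle_idle_timeout() and both constants are
hypothetical names.

```c
#include <stdbool.h>

#define BUNDLE_IDLE_TIMEOUT_DEFAULT_S 10  /* Previous fixed value. */
#define BUNDLE_IDLE_TIMEOUT_MIN_S      1  /* OpenFlow minimum wait. */

/* Returns the effective bundle idle timeout in seconds: the default if
 * nothing is configured, otherwise at least the spec minimum. */
int
effective_bundle_idle_timeout(bool is_set, int configured_s)
{
    if (!is_set) {
        return BUNDLE_IDLE_TIMEOUT_DEFAULT_S;
    }
    return configured_s < BUNDLE_IDLE_TIMEOUT_MIN_S
           ? BUNDLE_IDLE_TIMEOUT_MIN_S : configured_s;
}
```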

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agoAvoid crash in OvS while transmitting fragmented packets over tunnel.
Rohith Basavaraja [Fri, 20 Apr 2018 08:47:58 +0000 (08:47 +0000)]
Avoid crash in OvS while transmitting fragmented packets over tunnel.

Currently, when fragmented packets are transmitted into a tunnel,
base_flow->nw_frag, which was non-zero at reception, is not reset to
zero when the base_flow and flow are rewritten as part of the emulated
tnl_push action in the ofproto-dpif-xlate module.

Because of this, transmitting fragmented packets over a tunnel hits a
crash caused by the following assert:

lib/odp-util.c:5654: assertion flow->nw_proto == base_flow->nw_proto &&
flow->nw_frag == base_flow->nw_frag failed in commit_set_ipv4_action()

This change modifies propagate_tunnel_data_to_flow__ to reset *nw_frag*
to zero. Since tunnelled packets are currently not fragmented, resetting
*nw_frag* to zero in propagate_tunnel_data_to_flow__ is correct.
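
A reduced C sketch of the invariant, assuming a two-field stand-in for
struct flow: when both flows are rewritten for the tunnel push, clearing
nw_frag in both keeps the quoted assertion true. push_tunnel_header()
and check_commit_ipv4() are hypothetical names; neither reproduces
propagate_tunnel_data_to_flow__.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the two fields compared by the assertion quoted above. */
struct mini_flow {
    uint8_t nw_proto;
    uint8_t nw_frag;
};

/* The outer header replaces the inner L3 fields in both flows; since
 * tunnelled packets are not fragmented, nw_frag is cleared in both. */
void
push_tunnel_header(struct mini_flow *flow, struct mini_flow *base_flow,
                   uint8_t outer_proto)
{
    flow->nw_proto = base_flow->nw_proto = outer_proto;
    flow->nw_frag = base_flow->nw_frag = 0;
}

/* The consistency check that failed for fragmented inner packets. */
void
check_commit_ipv4(const struct mini_flow *flow,
                  const struct mini_flow *base_flow)
{
    assert(flow->nw_proto == base_flow->nw_proto
           && flow->nw_frag == base_flow->nw_frag);
}
```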

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
From: Rohith Basavaraja <rohith.basavaraja@ericsson.com>
CC: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agoExpose missing --peer-ca-cert and SSL options in usage and manpages.
Dan Williams [Mon, 23 Apr 2018 18:04:28 +0000 (13:04 -0500)]
Expose missing --peer-ca-cert and SSL options in usage and manpages.

Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agoovn-nbctl: Show gw chassis in decreasing prio order.
Lorenzo Bianconi [Thu, 26 Apr 2018 14:35:46 +0000 (16:35 +0200)]
ovn-nbctl: Show gw chassis in decreasing prio order.

Report gateway chassis in decreasing priority order when running the
ovn-nbctl show sub-command. Add a get_ordered_gw_chassis_prio_list
routine to sort gw chassis according to the configured priority.
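
A C sketch of the ordering step using qsort() with a descending
comparator; struct gw_chassis and sort_gw_chassis_by_prio() are
hypothetical and do not reproduce get_ordered_gw_chassis_prio_list().

```c
#include <stdlib.h>

/* Hypothetical gateway chassis record. */
struct gw_chassis {
    const char *name;
    int priority;
};

/* qsort comparator: higher priority sorts first. */
static int
compare_prio_desc(const void *a_, const void *b_)
{
    const struct gw_chassis *a = a_, *b = b_;
    return (b->priority > a->priority) - (b->priority < a->priority);
}

/* Sorts 'gws' in place by decreasing priority. */
void
sort_gw_chassis_by_prio(struct gw_chassis *gws, size_t n)
{
    qsort(gws, n, sizeof *gws, compare_prio_desc);
}
```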

Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agoDoc: Fix commands not being shown in code blocks
axel@tripier.fr [Fri, 27 Apr 2018 14:59:50 +0000 (16:59 +0200)]
Doc: Fix commands not being shown in code blocks

Some commands in the Advanced Features tutorial are not shown in code
blocks; they are shown as variable-width text because of a missing ":"
to designate them as code blocks.

Signed-off-by: Axel Tripier <axel@tripier.fr>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agoDoc: Fix binary representation in Faucet tutorial
axel@tripier.fr [Fri, 27 Apr 2018 15:11:24 +0000 (17:11 +0200)]
Doc: Fix binary representation in Faucet tutorial

The binary representation of 80 and 8080 are switched in the
Faucet tutorial.
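
For reference, the correct 16-bit binary representations are
80 = 0000 0000 0101 0000 and 8080 = 0001 1111 1001 0000. This small C
helper, a hypothetical to_binary16(), produces them; 16 bits is used
because that is the width of a TCP port field.

```c
#include <stdint.h>

/* Writes the 16-bit binary representation of 'v' into 'buf' (at least
 * 17 bytes), most significant bit first, followed by a NUL. */
void
to_binary16(uint16_t v, char buf[17])
{
    for (int i = 0; i < 16; i++) {
        buf[i] = (v & (1u << (15 - i))) ? '1' : '0';
    }
    buf[16] = '\0';
}
```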

Signed-off-by: Axel Tripier <axel@tripier.fr>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agovswitchd: Enhance manager_options documentation.
Darrell Ball [Tue, 8 May 2018 22:43:07 +0000 (15:43 -0700)]
vswitchd: Enhance manager_options documentation.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>