ovs.git
6 years ago odp-util: Print eth() for Ethernet flows if packet_type is absent.
Ben Pfaff [Wed, 14 Mar 2018 21:57:23 +0000 (14:57 -0700)]
odp-util: Print eth() for Ethernet flows if packet_type is absent.

OVS datapaths have two different ways to indicate what kind of packet a
flow matches.  One way, used by the userspace datapath, is
OVS_KEY_ATTR_PACKET_TYPE.  Another way, used by the kernel datapath, is
OVS_KEY_ATTR_ETHERTYPE when used in the absence of OVS_KEY_ATTR_ETHERNET;
when the latter is present, the packet is always an Ethernet packet.  The
code to print datapath flows wasn't paying attention to this distinction
and always omitted eth() from the output when OVS_KEY_ATTR_ETHERNET was
fully wildcarded, which meant that upon later re-parsing the
OVS_KEY_ATTR_ETHERNET key was omitted, which made it look like a
non-Ethernet match was being described.

This commit makes odp_util_format() add eth() to the output when
OVS_KEY_ATTR_ETHERNET is present and OVS_KEY_ATTR_PACKET_TYPE is absent,
avoiding the problem.
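
The rule described above can be sketched as a small predicate. This is a Python illustration only: the function name `should_print_eth` and the set-of-attribute-names representation are hypothetical and do not mirror the C code in odp-util.

```python
def should_print_eth(key_attrs):
    """Decide whether eth() must appear in the formatted flow.

    key_attrs is a set of attribute names present in the datapath
    flow key (a hypothetical representation for this sketch).
    """
    has_ethernet = "OVS_KEY_ATTR_ETHERNET" in key_attrs
    has_packet_type = "OVS_KEY_ATTR_PACKET_TYPE" in key_attrs
    # Without a packet_type attribute, the presence of the Ethernet key
    # is the only signal that this is an Ethernet match, so it must be
    # printed even when fully wildcarded.
    return has_ethernet and not has_packet_type


kernel_style = {"OVS_KEY_ATTR_ETHERNET", "OVS_KEY_ATTR_ETHERTYPE"}
userspace_style = {"OVS_KEY_ATTR_PACKET_TYPE", "OVS_KEY_ATTR_ETHERNET"}
print(should_print_eth(kernel_style))     # True
print(should_print_eth(userspace_style))  # False
```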

Reported-by: Amar Padmanabhan <amarpadmanabhan@fb.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-December/045817.html
Reported-by: Su Wang <suwang@vmware.com>
VMWare-BZ: #2070488
Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
6 years ago python: KeyError shouldn't be raised from __getattr__
Timothy Redaelli [Mon, 12 Mar 2018 10:52:21 +0000 (11:52 +0100)]
python: KeyError shouldn't be raised from __getattr__

On Python 3, hasattr() intercepts only AttributeError.
On Python 2, by contrast, hasattr() intercepts all exceptions.

This means __getattr__ shouldn't raise KeyError when the attribute
doesn't exist; it should raise AttributeError instead.
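
A minimal Python sketch of the behaviour described above. The `Row` class and its `_data` dict are hypothetical stand-ins for illustration, not the actual Python IDL code.

```python
class Row:
    """Hypothetical row object that keeps column values in a dict."""

    def __init__(self, **columns):
        self._data = dict(columns)

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup fails.
        # Read _data through __dict__ so a missing _data cannot recurse.
        data = self.__dict__.get("_data", {})
        try:
            return data[name]
        except KeyError:
            # Translate to AttributeError so that hasattr() and
            # getattr(obj, name, default) behave correctly on Python 3.
            raise AttributeError(name) from None


row = Row(name="br0")
print(hasattr(row, "name"))     # True
print(hasattr(row, "missing"))  # False
```

If `__getattr__` let the KeyError escape, the second `hasattr()` call would raise on Python 3 instead of returning False.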

Fixes: 2d54d8011e14 ("Python-IDL: getattr after mutate fix")
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
6 years ago python: Fix decoding error when the received data is larger than 4096.
Guoshuai Li [Thu, 1 Mar 2018 06:27:37 +0000 (14:27 +0800)]
python: Fix decoding error when the received data is larger than 4096.

jsonrpc can receive only 4096 bytes of data at a time.  When the data
contains characters that occupy multiple bytes, such as Chinese
characters, a read may end in the middle of a character, and decoding
then fails.  We need to receive the complete character before decoding.
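
One way to handle a multi-byte character split across reads is an incremental decoder from the standard `codecs` module. The sketch below only illustrates the problem and a possible remedy; it is not necessarily how the patch itself fixes jsonrpc, and `decode_chunks` is a hypothetical helper name.

```python
import codecs


def decode_chunks(chunks, encoding="utf-8"):
    """Decode a sequence of byte chunks that may split characters."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    for chunk in chunks:
        # The decoder buffers an incomplete trailing byte sequence and
        # emits the character once the remaining bytes arrive.
        parts.append(decoder.decode(chunk))
    # final=True raises if the stream ends in the middle of a character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


data = "数据库".encode("utf-8")   # three characters, three bytes each
chunks = [data[:4], data[4:]]    # the split lands inside a character
print(decode_chunks(chunks))
```

Calling `chunks[0].decode("utf-8")` directly would raise UnicodeDecodeError, because the first chunk ends with one byte of an unfinished character.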

Signed-off-by: Guoshuai Li <ligs@dtdream.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago tests/ofproto-dpif: New test for action_set after traversing patch port
Eric Garver [Thu, 1 Mar 2018 22:59:42 +0000 (17:59 -0500)]
tests/ofproto-dpif: New test for action_set after traversing patch port

Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago ofproto-dpif-xlate: translate action_set in clone action
Eric Garver [Thu, 1 Mar 2018 22:59:41 +0000 (17:59 -0500)]
ofproto-dpif-xlate: translate action_set in clone action

A clone action saves the action_set prior to performing the clone, then
restores it afterwards. However, when xlating the actions it neglects to
consider the action_set, so any write_action() inside a clone() is
ignored. Unfortunately patch ports are internally implemented via
clone(). So a frame traversing to a second bridge via patch port will
never be affected by write_action() in the second bridge's flow table.

Let's make clone() aware of the action_set.

Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago ovsdb-server: Don't be picky about particular error in test.
Ben Pfaff [Wed, 7 Mar 2018 21:16:41 +0000 (13:16 -0800)]
ovsdb-server: Don't be picky about particular error in test.

On Windows this test reports "Unknown error" instead of "Protocol error",
so disregard the particular error message.

Reported-by: Alin Gabriel Serdean <aserdean@ovn.org>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-March/344951.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
6 years ago tests: Fix hang when "SSL db: implementation" test failed.
Ben Pfaff [Wed, 7 Mar 2018 21:14:58 +0000 (13:14 -0800)]
tests: Fix hang when "SSL db: implementation" test failed.

The tests were killing $(cat pid) on failure but needed to kill $(cat
ovsdb-server.pid).

Reported-by: Alin Gabriel Serdean <aserdean@ovn.org>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-March/344951.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
6 years ago ovn: Calculate UDP checksum for DNS over IPv6
Mark Michelson [Wed, 7 Mar 2018 15:31:00 +0000 (09:31 -0600)]
ovn: Calculate UDP checksum for DNS over IPv6

Unlike IPv4, IPv6 mandates the calculation of the UDP checksum. For DNS
resolution in OVN, we were setting the checksum to 0, which results in
errors.

This patch fixes the problem by calculating the checksum for DNS over
IPv6. It also alters the applicable test by skipping the checksum when
comparing the expected and actual packets.
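
As an illustration of the checksum that IPv6 requires, the sketch below computes a UDP checksum over the IPv6 pseudo-header (source address, destination address, upper-layer length, next header 17) followed by the UDP segment. It is a Python stand-in, not the OVN C code; `udp6_checksum` and `ones_complement_sum` are hypothetical helper names.

```python
import ipaddress
import struct


def ones_complement_sum(data):
    """16-bit one's-complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp6_checksum(src, dst, udp_segment):
    """Checksum for a UDP segment whose checksum field is zero."""
    pseudo = (ipaddress.IPv6Address(src).packed
              + ipaddress.IPv6Address(dst).packed
              # 32-bit length, three zero bytes, next header = 17 (UDP)
              + struct.pack("!I3xB", len(udp_segment), 17))
    csum = ~ones_complement_sum(pseudo + udp_segment) & 0xFFFF
    # A computed value of zero is transmitted as 0xFFFF, because zero
    # on the wire would mean "no checksum", which IPv6 does not allow.
    return csum or 0xFFFF


payload = b"dns-query"
segment = struct.pack("!HHHH", 5353, 53, 8 + len(payload), 0) + payload
print(hex(udp6_checksum("fd00::1", "fd00::2", segment)))
```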

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago ovsdb: Fix time in log traces when compacting database
Daniel Alvarez [Wed, 7 Mar 2018 18:02:30 +0000 (19:02 +0100)]
ovsdb: Fix time in log traces when compacting database

Current code is mixing wall and monotonic clocks and the traces are not
useful since the timestamps are not accurate. This patch fixes it by
using the same time reference for the log as used in the code.

Without this patch, the traces look like this:
compacting database online (1519124364.908 seconds old, 951 transactions)
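
The bogus age in the trace above is what results from subtracting a monotonic timestamp from a wall-clock one. A small Python sketch of the distinction follows; it uses the standard `time` module as a stand-in for the clocks used in the C code, and `age_seconds` is a hypothetical helper.

```python
import time


def age_seconds(start_monotonic, now_monotonic=None):
    """Elapsed time, with both timestamps taken from the same clock."""
    if now_monotonic is None:
        now_monotonic = time.monotonic()
    return now_monotonic - start_monotonic


start = time.monotonic()

# Correct: both readings come from the monotonic clock.
good_age = age_seconds(start)

# Incorrect: a wall-clock reading minus a monotonic reading. The two
# clocks have unrelated origins, so the result is not an age at all.
bad_age = time.time() - start

print(f"same clock: {good_age:.3f} s, mixed clocks: {bad_age:.3f} s")
```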

Signed-off-by: Daniel Alvarez <dalvarez@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago rhel: Avoid losing bridge configuration after adding DPDK ports
Vishal Deep Ajmera [Thu, 22 Feb 2018 19:18:49 +0000 (00:48 +0530)]
rhel: Avoid losing bridge configuration after adding DPDK ports

Whenever a DPDK port is added to or deleted from an OVS bridge, the bridge
interface is reconfigured with the lowest MAC address among the connected DPDK
ports. When changing the MAC address, OVS performs a sequence of events
UP -> DOWN -> UP on the bridge interface. In deployments of OVS on RHEL
distributions this results in losing the Linux networking configuration
attached to the bridge interface (e.g. static routes).

This patch changes the interface configuration scripts used in a RHEL deployment
to trigger post-up operations on the bridge device after a change of MAC address.

Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Russell Bryant <russell@ovn.org>
6 years ago travis: Update Linux kernel test list
Greg Rose [Wed, 14 Feb 2018 23:18:10 +0000 (15:18 -0800)]
travis: Update Linux kernel test list

Add newly supported 4.15 release and also update the kernel test list
to the LTS list at www.kernel.org.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years ago travis: Update kernel test list from kernel.org
Greg Rose [Wed, 7 Feb 2018 15:50:01 +0000 (07:50 -0800)]
travis: Update kernel test list from kernel.org

Also add package libelf-dev - since 4.14 it's required for making
the source.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years ago Documentation: Update NEWS and faq
Greg Rose [Mon, 19 Feb 2018 18:38:57 +0000 (10:38 -0800)]
Documentation: Update NEWS and faq

Per the Linux 4.15 kernel support.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years ago acinclude: Enable building for Linux kernel 4.15
Greg Rose [Wed, 14 Feb 2018 23:18:09 +0000 (15:18 -0800)]
acinclude: Enable building for Linux kernel 4.15

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years ago ofproto-dpif-upcall: fix for segmentation fault
Ashish Varma [Mon, 5 Mar 2018 23:04:01 +0000 (15:04 -0800)]
ofproto-dpif-upcall: fix for segmentation fault

Added a check for a NULL pointer returned from the xlate_lookup_ofproto
function. Accessing the "ofproto" variable when it was NULL caused a
segmentation fault.

VMware-BZ: #2061914
CC: Justin Pettit <jpettit@ovn.org>
Fixes: d39ec23de384 ("ofproto-dpif: Don't slow-path controller actions.")
Signed-off-by: Ashish Varma <ashishvarma.ovs@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago compat: Fix RHEL 7 build warnings
Greg Rose [Mon, 26 Feb 2018 22:10:16 +0000 (14:10 -0800)]
compat: Fix RHEL 7 build warnings

A prior commit to fix up netdev_master_upper_dev_link for recent
kernels caused a compile warning on RHEL 7 builds.

Fixes: 86a94a96163c ("datapath: Fix netdev_master_upper_dev_link for 4.14")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years ago datapath: compat: Fix RHEL 7 compile
Greg Rose [Wed, 28 Feb 2018 03:52:57 +0000 (19:52 -0800)]
datapath: compat: Fix RHEL 7 compile

frag_percpu_counter_batch is a variable, not a define, so checking if
it is defined is an error and causes warning messages during compile
on RHEL 7 (or other 3.10 based) builds.  Use a compat #define from
acinclude.m4 instead.

Fixes: 2070685328a6a ("compat:inet_frag.h: Check for frag_percpu_counter_batch")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years ago datapath-windows: fix hash creation on ct mark
Alin Gabriel Serdean [Wed, 21 Feb 2018 14:57:29 +0000 (16:57 +0200)]
datapath-windows: fix hash creation on ct mark

Use key->ct.mark instead of key->ct.zone when generating the hash
over the mark.

Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Anand Kumar <kumaranand@vmware.com>
6 years ago tests: Make packet-type-aware.at hash independent
Balazs Nemeth [Mon, 26 Feb 2018 09:10:35 +0000 (09:10 +0000)]
tests: Make packet-type-aware.at hash independent

When compiling with -msse4.2, a test case in packet-type-aware.at
fails because the CRC32-based hash function differs from mhash.
Fix this by parsing the port statistics one by one.

Signed-off-by: Balazs Nemeth <balazs.nemeth@ericsson.com>
CC: Jan Scheurich <jan.scheurich@ericsson.com>
CC: Zoltan Balogh <zoltan.balogh@ericsson.com>
Fixes: 00135b869d7c ("xlate: fix xport lookup for recirc")
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago Prepare for 2.9.1.
Justin Pettit [Mon, 19 Feb 2018 22:35:00 +0000 (14:35 -0800)]
Prepare for 2.9.1.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
6 years ago Set release dates for 2.9.0.
Justin Pettit [Mon, 19 Feb 2018 19:04:49 +0000 (11:04 -0800)]
Set release dates for 2.9.0.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
6 years ago ofp-meter: Fix use-after-free for decoding meter mods.
Ben Pfaff [Wed, 14 Feb 2018 22:36:47 +0000 (14:36 -0800)]
ofp-meter: Fix use-after-free for decoding meter mods.

ofputil_pull_bands() may change bands->data.

Found by libfuzzer-ngram.

Reported-by: Bhargava Shastry <bshastry@sect.tu-berlin.de>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
6 years ago datapath: Remove padding from packet before L3+ conntrack processing
Ed Swierk [Wed, 14 Feb 2018 23:18:08 +0000 (15:18 -0800)]
datapath: Remove padding from packet before L3+ conntrack processing

Upstream commit:
    commit 9382fe71c0058465e942a633869629929102843d
    Author: Ed Swierk <eswierk@skyportsystems.com>
    Date:   Wed Jan 31 18:48:02 2018 -0800

    openvswitch: Remove padding from packet before L3+ conntrack processing

    IPv4 and IPv6 packets may arrive with lower-layer padding that is not
    included in the L3 length. For example, a short IPv4 packet may have
    up to 6 bytes of padding following the IP payload when received on an
    Ethernet device with a minimum packet length of 64 bytes.

    Higher-layer processing functions in netfilter (e.g. nf_ip_checksum(),
    and help() in nf_conntrack_ftp) assume skb->len reflects the length of
    the L3 header and payload, rather than referring back to
    ip_hdr->tot_len or ipv6_hdr->payload_len, and get confused by
    lower-layer padding.

    In the normal IPv4 receive path, ip_rcv() trims the packet to
    ip_hdr->tot_len before invoking netfilter hooks. In the IPv6 receive
    path, ip6_rcv() does the same using ipv6_hdr->payload_len. Similarly
    in the br_netfilter receive path, br_validate_ipv4() and
    br_validate_ipv6() trim the packet to the L3 length before invoking
    netfilter hooks.

    Currently in the OVS conntrack receive path, ovs_ct_execute() pulls
    the skb to the L3 header but does not trim it to the L3 length before
    calling nf_conntrack_in(NF_INET_PRE_ROUTING). When
    nf_conntrack_proto_tcp encounters a packet with lower-layer padding,
    nf_ip_checksum() fails causing a "nf_ct_tcp: bad TCP checksum" log
    message. While extra zero bytes don't affect the checksum, the length
    in the IP pseudoheader does. That length is based on skb->len, and
    without trimming, it doesn't match the length the sender used when
    computing the checksum.

    In ovs_ct_execute(), trim the skb to the L3 length before higher-layer
    processing.
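
The trimming step described above can be illustrated for IPv4 with a short Python sketch. It is not the kernel code: `trim_ipv4_padding` is a hypothetical helper that cuts a buffer starting at the IPv4 header down to the header's total-length field, in the same spirit as trimming the skb to the L3 length.

```python
import struct


def trim_ipv4_padding(l3_bytes):
    """Return the packet cut to the IPv4 header's total-length field."""
    if len(l3_bytes) < 20:
        raise ValueError("truncated IPv4 header")
    # Total length is the 16-bit big-endian field at byte offset 2.
    total_len = struct.unpack_from("!H", l3_bytes, 2)[0]
    if total_len < 20 or total_len > len(l3_bytes):
        raise ValueError("total length inconsistent with buffer")
    return l3_bytes[:total_len]


# A 40-byte IPv4 packet (20-byte header + 20-byte payload) followed by
# 6 bytes of Ethernet padding, matching the example in the message.
header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 6, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
padded = header + bytes(20) + bytes(6)
print(len(padded), len(trim_ipv4_padding(padded)))  # 46 40
```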

Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ed Swierk <eswierk@skyportsystems.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years ago datapath: Fix pop_vlan action for double tagged frames
Eric Garver [Wed, 14 Feb 2018 23:18:04 +0000 (15:18 -0800)]
datapath: Fix pop_vlan action for double tagged frames

Upstream commit:
    commit c48e74736fccf25fb32bb015426359e1c2016e3b
    Author: Eric Garver <e@erig.me>
    Date:   Wed Dec 20 15:09:22 2017 -0500

    openvswitch: Fix pop_vlan action for double tagged frames

    skb_vlan_pop() expects skb->protocol to be a valid TPID for double
    tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop()
    shift the true ethertype into position for us.

Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets")
Signed-off-by: Eric Garver <e@erig.me>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Eric Garver <e@erig.me>
Fixes: a27c454ee0 ("datapath: add processing of L3 packets")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years ago datapath: do not propagate headroom updates to internal port
paolo abeni [Wed, 14 Feb 2018 23:18:03 +0000 (15:18 -0800)]
datapath: do not propagate headroom updates to internal port

Upstream commit:
    commit 183dea5818315c0a172d21ecbcd2554894bf01e3
    Author: Paolo Abeni <pabeni@redhat.com>
    Date:   Thu Nov 30 15:35:33 2017 +0100

    openvswitch: do not propagate headroom updates to internal port

    After commit 3a927bc7cf9d ("ovs: propagate per dp max headroom to
    all vports") the need_headroom for the internal vport is updated
    accordingly to the max needed headroom in its datapath.

    That avoids the pskb_expand_head() costs when sending/forwarding
    packets towards tunnel devices, at least for some scenarios.

    We still require such copy when using the ovs-preferred configuration
    for vxlan tunnels:

        br_int
      /       \
    tap      vxlan
               (remote_ip:X)

    br_phy
         \
        NIC

    where the route towards the IP 'X' is via 'br_phy'.

    When forwarding traffic from the tap towards the vxlan device, we
    will call pskb_expand_head() in vxlan_build_skb() because
    br-phy->needed_headroom is equal to tun->needed_headroom.

    With this change we avoid updating the internal vport needed_headroom,
    so that in the above scenario no head copy is needed, giving 5%
    performance improvement in UDP throughput test.

    As a trade-off, packets sent from the internal port towards a tunnel
    device will now experience the head copy overhead. The rationale is
    that the latter use-case is less relevant performance-wise.

Signed-off-by: paolo abeni <pabeni@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: paolo abeni <pabeni@redhat.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years ago ovn-controller: Fix crash when sending GARP when openflow disconnection.
Guoshuai Li [Thu, 15 Feb 2018 10:52:29 +0000 (18:52 +0800)]
ovn-controller: Fix crash when sending GARP when openflow disconnection.

This is the call stack:
Program received signal SIGABRT, Aborted.
1  0x00007ffff6a4f8e8 in __GI_abort () at abort.c:90
2  0x00000000004765d6 in ofputil_protocol_to_ofp_version (protocol=<optimized out>) at lib/ofp-util.c:769
3  0x000000000047c19e in ofputil_encode_packet_out (po=po@entry=0x7fffffffa0e0, protocol=<optimized out>) at lib/ofp-util.c:7060
4  0x0000000000410870 in send_garp (garp=0x83cfe0, current_time=current_time@entry=1200375400) at ovn/controller/pinctrl.c:1738
5  0x000000000041430f in send_garp_run (active_tunnels=<optimized out>, local_datapaths=0x7fffffffc0a0, chassis_index=<optimized out>, chassis=0x8194d0, br_int=<optimized out>, ctx=0x7fffffffc080) at ovn/controller/pinctrl.c:2069

Signed-off-by: Guoshuai Li <ligs@dtdream.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago ofproto-dpif-ipfix: Fix an issue in flow key part
Benli Ye [Thu, 15 Feb 2018 01:52:07 +0000 (17:52 -0800)]
ofproto-dpif-ipfix: Fix an issue in flow key part

The length of struct ipfix_data_record_flow_key_iface was not
included when sizing the flow key part, which may cause problems
when the flow key part is not long enough. Use MAX_IF_LEN and MAX_IF_DESCR
to pre-allocate memory for ipfix_data_record_flow_key_iface.

Signed-off-by: Daniel Benli Ye <daniely@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago ovsdb-tool: Indicate "db" and "schema" are optional in man page.
Justin Pettit [Sat, 10 Feb 2018 00:03:40 +0000 (16:03 -0800)]
ovsdb-tool: Indicate "db" and "schema" are optional in man page.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
6 years ago netdev-dpdk: Reintroduce shared mempools.
Ian Stokes [Mon, 29 Jan 2018 15:44:50 +0000 (15:44 +0000)]
netdev-dpdk: Reintroduce shared mempools.

This commit manually reverts the current per port mempool model to the
previous shared mempool model for DPDK ports.

OVS previously used a shared mempool model for ports with the same MTU
configuration. This was replaced by a per port mempool model to address
issues flagged by users such as:

https://mail.openvswitch.org/pipermail/ovs-discuss/2016-September/042560.html

However the per port model has a number of issues including:

1. Requires an increase in memory resource requirements to support the same
number of ports as the shared port model.
2. Incorrect algorithm for mbuf provisioning for each mempool.

These are considered blocking factors for current deployments of OVS when
upgrading to OVS 2.9, as a user may have to redimension memory for the same
deployment configuration. This may not be possible for users.

For clarity, the commits whose changes are removed include the
following:

netdev-dpdk: Create separate memory pool for each port: d555d9b
netdev-dpdk: fix management of pre-existing mempools: b6b26021d
Fix mempool names to reflect socket id: f06546a
netdev-dpdk: skip init for existing mempools: 837c176
netdev-dpdk: manage failure in mempool name creation: 65056fd
netdev-dpdk: Reword mp_size as n_mbufs: ad9b5b9
netdev-dpdk: Rename dpdk_mp_put as dpdk_mp_free: a08a115
netdev-dpdk: Fix mp_name leak on snprintf failure: ec6edc8
netdev-dpdk: Fix dpdk_mp leak in case of EEXIST: 173ef76
netdev-dpdk: Factor out struct dpdk_mp: 24e78f9
netdev-dpdk: Remove unused MAX_NB_MBUF: bc57ed9
netdev-dpdk: Fix mempool creation with large MTU: af5b0da

Due to the number of commits and period of time they were introduced
over, a simple revert was not possible. All code from the commits above
is removed and the shared mempool code reintroduced as it was before its
replacement.

Code introduced by commit

netdev-dpdk: Add debug appctl to get mempool information: be48173

has been modified to work with the shared mempool model.

Cc: Antonio Fischetti <antonio.fischetti@gmail.com>
Cc: Ilya Maximets <i.maximets@samsung.com>
Cc: Kevin Traynor <ktraynor@redhat.com>
Cc: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Tested-by: Kevin Traynor <ktraynor@redhat.com>
6 years ago docs: Update supported DPDK versions.
Ian Stokes [Mon, 29 Jan 2018 17:17:56 +0000 (17:17 +0000)]
docs: Update supported DPDK versions.

Update the OVS to DPDK release table to use the latest stable
DPDK 16.11.4 for OVS 2.7.

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
6 years ago datapath: use ktime_get_ts64() instead of ktime_get_ts()
Arnd Bergmann [Wed, 7 Feb 2018 15:30:10 +0000 (07:30 -0800)]
datapath: use ktime_get_ts64() instead of ktime_get_ts()

Upstream commit:
    commit 311af51dcb5629f04976a8e451673f77e3301041
    Author: Arnd Bergmann <arnd@arndb.de>
    Date:   Mon Nov 27 12:41:38 2017 +0100

    openvswitch: use ktime_get_ts64() instead of ktime_get_ts()

    timespec is deprecated because of the y2038 overflow, so let's convert
    this one to ktime_get_ts64(). The code is already safe even on 32-bit
    architectures, since it uses monotonic times. On 64-bit architectures,
    nothing changes, while on 32-bit architectures this avoids one
    type conversion.
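
For background on the y2038 overflow mentioned above, the following Python sketch shows where a signed 32-bit seconds counter runs out. It assumes a 32-bit signed time_t counting seconds since the Unix epoch; it says nothing about the monotonic times this particular code uses.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Reinterpret an integer as a signed 32-bit quantity."""
    return (value + 2**31) % 2**32 - 2**31


last_ok = EPOCH + timedelta(seconds=INT32_MAX)
print(last_ok.isoformat())        # 2038-01-19T03:14:07+00:00

# One second later, the counter wraps to the most negative value.
print(wrap_int32(INT32_MAX + 1))  # -2147483648
```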

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Added a compatibility check for whether ktime_get_ts64() exists.
If it does not, then just continue using ktime_get_ts(). I added a new
compatibility header file "timekeeping.h".

Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years ago datapath: fix the incorrect flow action alloc size
zhangliping [Wed, 7 Feb 2018 15:30:09 +0000 (07:30 -0800)]
datapath: fix the incorrect flow action alloc size

Upstream commit:
    commit 67c8d22a73128ff910e2287567132530abcf5b71
    Author: zhangliping <zhangliping02@baidu.com>
    Date:   Sat Nov 25 22:02:12 2017 +0800

    openvswitch: fix the incorrect flow action alloc size

    If we want to add a datapath flow, which has more than 500 vxlan outputs'
    action, we will get the following error reports:
      openvswitch: netlink: Flow action size 32832 bytes exceeds max
      openvswitch: netlink: Flow action size 32832 bytes exceeds max
      openvswitch: netlink: Actions may not be safe on all matching packets
      ... ...

    It seems that we can simply enlarge the MAX_ACTIONS_BUFSIZE to fix it, but
    this is not the root cause. For example, for a vxlan output action, we need
    about 60 bytes for the nlattr, but after it is converted to the flow
    action, it only occupies 24 bytes. This means that we can still support
    more than 1000 vxlan output actions for a single datapath flow under
    the current 32k max limitation.

    So even if the nla_len(attr) is larger than MAX_ACTIONS_BUFSIZE, we
    shouldn't report EINVAL and should let it move on, as the judgement
    can be done by the reserve_sfa_size.
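
The sizing argument above can be checked with a few lines of arithmetic. The figures are the approximate ones quoted in the message (about 60 bytes per netlink attribute, 24 bytes per converted flow action), and the limit is assumed here to be 32 * 1024 bytes; the variable names are illustrative only.

```python
MAX_ACTIONS_BUFSIZE = 32 * 1024   # assumed value of the "32k" limit
NLATTR_BYTES_PER_OUTPUT = 60      # approximate, per the message
FLOW_ACTION_BYTES_PER_OUTPUT = 24

# Rejecting on the netlink attribute length caps the flow early.
by_nlattr = MAX_ACTIONS_BUFSIZE // NLATTR_BYTES_PER_OUTPUT

# Checking the converted size allows far more actions in the same limit.
by_flow_action = MAX_ACTIONS_BUFSIZE // FLOW_ACTION_BYTES_PER_OUTPUT

print(by_nlattr)       # 546
print(by_flow_action)  # 1365
```

This is consistent with the message: a flow with somewhat more than 500 vxlan outputs trips the attribute-length check, while more than 1000 would still fit after conversion.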

Signed-off-by: zhangliping <zhangliping02@baidu.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: zhangliping <zhangliping02@baidu.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years ago datapath: fix data type in queue_gso_packets
Gustavo A. R. Silva [Wed, 7 Feb 2018 15:30:08 +0000 (07:30 -0800)]
datapath: fix data type in queue_gso_packets

Upstream commit:
    commit 2734166e89639c973c6e125ac8bcfc2d9db72b70
    Author: Gustavo A. R. Silva <garsilva@embeddedor.com>
    Date:   Sat Nov 25 13:14:40 2017 -0600

    net: openvswitch: datapath: fix data type in queue_gso_packets

    gso_type is being used in binary AND operations together with SKB_GSO_UDP.
    The issue is that variable gso_type is of type unsigned short and
    SKB_GSO_UDP expands to more than 16 bits:

    SKB_GSO_UDP = 1 << 16

    this makes any binary AND operation between gso_type and SKB_GSO_UDP to
    be always zero, hence making some code unreachable and likely causing
    undesired behavior.

    Fix this by changing the data type of variable gso_type to unsigned int.
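
The truncation described above can be reproduced with integer masks. This Python sketch assumes a 16-bit unsigned short and a 32-bit unsigned int, and uses the SKB_GSO_UDP value quoted in the message; the helper names are hypothetical.

```python
SKB_GSO_UDP = 1 << 16  # value quoted in the commit message


def as_unsigned_short(value):
    """Keep only the low 16 bits, as a 16-bit variable would."""
    return value & 0xFFFF


def as_unsigned_int(value):
    """Keep only the low 32 bits, as a 32-bit variable would."""
    return value & 0xFFFFFFFF


gso_type = SKB_GSO_UDP | 0x1   # UDP flag plus some lower flag bit

# Stored in an unsigned short, bit 16 is lost and the test never fires.
print(as_unsigned_short(gso_type) & SKB_GSO_UDP)  # 0

# Stored in an unsigned int, the flag survives.
print(as_unsigned_int(gso_type) & SKB_GSO_UDP)    # 65536
```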

    Addresses-Coverity-ID: 1462223
Fixes: 0c19f846d582 ("net: accept UFO datagrams from tuntap and packet")
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While backporting this I found another couple of instances of the
same issue so I fixed them up as well.

Cc: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years ago datapath: Fix an error handling path in 'ovs_nla_init_match_and_action()
Christophe JAILLET [Wed, 7 Feb 2018 15:30:07 +0000 (07:30 -0800)]
datapath: Fix an error handling path in 'ovs_nla_init_match_and_action()

Upstream commit:
commit 5829e62ac17a40ab08c1b905565604a4b5fa7af6
Author: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Date:   Mon Sep 11 21:56:20 2017 +0200

    openvswitch: Fix an error handling path in 'ovs_nla_init_match_and_action()'

    All other error handling paths in this function go through the 'error'
    label. This one should do the same.

Fixes: 9cc9a5cb176c ("datapath: Avoid using stack larger than 1024.")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Fixes: 850c2a4d1a ("datapath: Avoid using stack larger than 1024.")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years ago compat: Fix compiler headers
Greg Rose [Wed, 7 Feb 2018 15:30:06 +0000 (07:30 -0800)]
compat: Fix compiler headers

Since Linux kernel upstream commit d15155824c50
("linux/compiler.h: Split into compiler.h and compiler_types.h") this
error check for the gcc compiler header is no longer valid.  Remove it
so that openvswitch builds for Linux kernels 4.14.8 and later.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years ago datapath: Fix SKB_GSO_UDP usage
Greg Rose [Wed, 7 Feb 2018 15:30:05 +0000 (07:30 -0800)]
datapath: Fix SKB_GSO_UDP usage

Using SKB_GSO_UDP breaks the compilation on Linux 4.14. Check for
the HAVE_SKB_GSO_UDP compiler #define.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years ago datapath: conntrack: make protocol tracker pointers const
Florian Westphal [Wed, 7 Feb 2018 15:30:04 +0000 (07:30 -0800)]
datapath: conntrack: make protocol tracker pointers const

Upstream commit:
    commit b3480fe059ac9121b5714205b4ddae14b59ef4be
    Author: Florian Westphal <fw@strlen.de>
    Date:   Sat Aug 12 00:57:08 2017 +0200

    netfilter: conntrack: make protocol tracker pointers const

    Doesn't change generated code, but will make it easier to eventually
    make the actual trackers themselves const.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years ago compat:inet_frag.h: Check for frag_percpu_counter_batch
Greg Rose [Wed, 7 Feb 2018 15:30:03 +0000 (07:30 -0800)]
compat:inet_frag.h: Check for frag_percpu_counter_batch

Fix up the compat layer to check for frag_percpu_counter_batch and
if not present then use atomic_sub and atomic_add as per the
backport in the 3.16.50 LTS kernel.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years ago compat: Do not include headers when not compiling
Greg Rose [Wed, 7 Feb 2018 15:30:02 +0000 (07:30 -0800)]
compat: Do not include headers when not compiling

If the entire file is not going to be compiled because OVS is using
upstream tunnel support then also don't bother pulling in the headers.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years ago datapath: Fix netdev_master_upper_dev_link for 4.14
Greg Rose [Wed, 7 Feb 2018 15:30:01 +0000 (07:30 -0800)]
datapath: Fix netdev_master_upper_dev_link for 4.14

An extended netlink ack has been added for 4.14 - add compat layer
changes so that it compiles for all kernels up to and including
4.14.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years ago ovn: Allow DNS lookups over IPv6
Mark Michelson [Fri, 9 Feb 2018 15:11:00 +0000 (09:11 -0600)]
ovn: Allow DNS lookups over IPv6

There was a bug in DNS request handling where the incoming packet was
assumed to be IPv4.

The result was that for the outgoing packet, we would attempt to write
the IPv4 checksum and total length into what was actually an IPv6
header. This resulted in the source IPv6 address getting corrupted.
Later, the source and destination IPv6 addresses would get swapped,
resulting in the DNS response being sent to a nonsense destination.

With this change, we check the ethertype of the packet to determine what
l3 information to write, and where to write it. A test is also included
that verifies that this works as expected.
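
The ethertype check described above might look roughly like the following Python sketch. It assumes an untagged Ethernet frame and is not the OVN C code; `l3_length_field` is a hypothetical helper that reports which length field a DNS responder would need to rewrite.

```python
import struct

ETH_HEADER_LEN = 14
ETH_TYPE_IP = 0x0800
ETH_TYPE_IPV6 = 0x86DD


def l3_length_field(frame):
    """Return (byte offset, name) of the L3 length field in the frame."""
    # The ethertype sits after the two 6-byte MAC addresses.
    ethertype = struct.unpack_from("!H", frame, 12)[0]
    if ethertype == ETH_TYPE_IP:
        # IPv4: total length at offset 2 of the IP header. IPv4 also
        # has a header checksum to update, which IPv6 lacks.
        return ETH_HEADER_LEN + 2, "IPv4 total length"
    if ethertype == ETH_TYPE_IPV6:
        # IPv6: payload length at offset 4 of the IPv6 header.
        return ETH_HEADER_LEN + 4, "IPv6 payload length"
    raise ValueError(f"unsupported ethertype {ethertype:#06x}")


ipv6_frame = bytes(12) + struct.pack("!H", ETH_TYPE_IPV6) + bytes(40)
print(l3_length_field(ipv6_frame))  # (18, 'IPv6 payload length')
```

Writing the IPv4 fields at the IPv4 offsets into an IPv6 header would land inside the fixed header and source address, which matches the corruption the message reports.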

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1539608
Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago datapath: enable NSH support
Yi Yang [Wed, 31 Jan 2018 13:53:06 +0000 (21:53 +0800)]
datapath: enable NSH support

Upstream commit:
  commit b2d0f5d5dc53532e6f07bc546a476a55ebdfe0f3
  Author: Yi Yang <yi.y.yang@intel.com>
  Date:   Tue Nov 7 21:07:02 2017 +0800

    openvswitch: enable NSH support

    The OVS master and 2.8 branches have merged the NSH userspace
    patch series. This patch enables NSH support in the kernel
    datapath so that, once ported, OVS can support NSH in
    compat mode.

Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Eric Garver <e@erig.me>
Acked-by: Pravin Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
6 years ago datapath: nsh: add GSO support
Yi Yang [Wed, 31 Jan 2018 13:53:05 +0000 (21:53 +0800)]
datapath: nsh: add GSO support

Upstream commit:
  commit c411ed854584a71b0e86ac3019b60e4789d88086
  Author: Jiri Benc <jbenc@redhat.com>
  Date:   Mon Aug 28 21:43:24 2017 +0200

    nsh: add GSO support

    Add a new nsh/ directory. It currently holds only GSO functions but more
    will come: in particular, code shared by openvswitch and tc to manipulate
    NSH headers.

    For now, assume there's no hardware support for NSH segmentation. We can
    always introduce netdev->nsh_features later.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
6 years ago datapath: net: add NSH header structures and helpers
Yi Yang [Wed, 31 Jan 2018 13:53:04 +0000 (21:53 +0800)]
datapath: net: add NSH header structures and helpers

Upstream commit:
  commit 1f0b7744c50573df464ca33d8e5275be509f852b
  Author: Yi Yang <yi.y.yang@intel.com>
  Date:   Mon Aug 28 21:43:23 2017 +0200

    net: add NSH header structures and helpers

    NSH (Network Service Header)[1] is a new protocol for service
    function chaining. It can be handled as an L3 protocol like
    IPv4 and IPv6. Two typical use cases are Eth + NSH + Inner packet
    and VXLAN-GPE + NSH + Inner packet.

    This patch adds NSH header structures and helpers for NSH GSO
    support and Open vSwitch NSH support.

    [1] https://datatracker.ietf.org/doc/draft-ietf-sfc-nsh/

    [Jiri: added nsh_hdr() helper and renamed the header struct to "struct
    nshhdr" to match the usual pattern. Removed packet type defines, these are
    now shared with VXLAN-GPE.]

Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
6 years agodatapath: vxlan: factor out VXLAN-GPE next protocol
Yi Yang [Wed, 31 Jan 2018 13:53:03 +0000 (21:53 +0800)]
datapath: vxlan: factor out VXLAN-GPE next protocol

Upstream commit:
  commit fa20e0e32cb3dfc1760b6254b64977f2fb5bd851
  Author: Jiri Benc <jbenc@redhat.com>
  Date:   Mon Aug 28 21:43:22 2017 +0200

    vxlan: factor out VXLAN-GPE next protocol

    The values are shared between VXLAN-GPE and NSH. Originally probably by
    coincidence but I notified both working groups about this last year and they
    seem to keep the values in sync since then.

    Hopefully they'll get a single IANA registry for the values, too. (I asked
    them for that.)

    Factor out the code to be shared by the NSH implementation.

    NSH and MPLS values are added in this patch, too. For MPLS, the drafts
    incorrectly assign only a single value, while we have two MPLS ethertypes.
    I raised the problem with both groups. For now, I assume the value is for
    unicast.
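
As a rough illustration of the shared value space described above, the
mapping between tunnel next-protocol values and ethertypes can be sketched
as below. The numeric next-protocol values and the TUN_P_* names are
recalled from the upstream kernel header and should be treated as
assumptions, not as a quotation of the patch; only the NSH ethertype
(0x894F) is stated in this log.

```python
# Sketch only: next-protocol values shared by VXLAN-GPE and NSH.
# The numbers and names below are assumptions recalled from the upstream
# kernel header, not copied from the patch in this log.
TUN_P_IPV4 = 0x01
TUN_P_IPV6 = 0x02
TUN_P_ETHERNET = 0x03
TUN_P_NSH = 0x04
TUN_P_MPLS_UC = 0x05   # single MPLS value, assumed to mean unicast

ETH_P_IP = 0x0800
ETH_P_IPV6 = 0x86DD
ETH_P_TEB = 0x6558      # transparent Ethernet bridging
ETH_P_NSH = 0x894F
ETH_P_MPLS_UC = 0x8847

_TUN_TO_ETH = {
    TUN_P_IPV4: ETH_P_IP,
    TUN_P_IPV6: ETH_P_IPV6,
    TUN_P_ETHERNET: ETH_P_TEB,
    TUN_P_NSH: ETH_P_NSH,
    TUN_P_MPLS_UC: ETH_P_MPLS_UC,
}
_ETH_TO_TUN = {eth: tun for tun, eth in _TUN_TO_ETH.items()}


def tun_p_to_eth_p(proto):
    """Return the ethertype for a next-protocol value, or None if unknown."""
    return _TUN_TO_ETH.get(proto)


def tun_p_from_eth_p(ethertype):
    """Return the next-protocol value for an ethertype, or None if unknown."""
    return _ETH_TO_TUN.get(ethertype)
```

With this table, the MPLS multicast ethertype (0x8848) has no
next-protocol value, which is the gap the message above points out.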

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
6 years agodatapath: ether: add NSH ethertype
Yi Yang [Wed, 31 Jan 2018 13:53:02 +0000 (21:53 +0800)]
datapath: ether: add NSH ethertype

Upstream commit:
  commit 155e6f649757c902901e599c268f8b575ddac1f8
  Author: Jiri Benc <jbenc@redhat.com>
  Date:   Mon Aug 28 21:43:21 2017 +0200

    ether: add NSH ethertype

    The NSH draft says:

       An IEEE EtherType, 0x894F, has been allocated for NSH.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
6 years agonetdev-linux: Report netdev change events when mac changed.
Tonghao Zhang [Sun, 4 Feb 2018 14:45:38 +0000 (06:45 -0800)]
netdev-linux: Report netdev change events when mac changed.

When the MAC address of a port on a bridge is changed, for example,

$ ip link set dev eth0 address 00:11:22:33:44:55

we should reconfigure the datapath ID and the MAC address of the local
port. Currently, however, Open vSwitch does not do that as expected.

A simple example of how to reproduce it:

$ ovs-vsctl add-br br0
$ ifconfig br0  # for example, mac is c6:c6:d7:46:b4:4b
$ ip link set dev br0 address 00:11:22:33:44:55
$ ifconfig br0  # mac of br0 will be 00:11:22:33:44:55

then repeat:
$ ip link set dev br0 address 00:11:22:33:44:55
$ ifconfig br0  # mac of br0 will be c6:c6:d7:46:b4:4b

This patch reports a MAC change event when a port's MAC address changes,
so that Open vSwitch reconfigures the datapath ID and the MAC address of
the local port.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agoMakefile.am: Use correct path separator for Windows
Shashank Ram [Fri, 2 Feb 2018 23:49:05 +0000 (15:49 -0800)]
Makefile.am: Use correct path separator for Windows

Signed-off-by: Shashank Ram <rams@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agoMerge branch 'dpdk_merge_2_9' of https://github.com/istokes/ovs into HEAD
Ben Pfaff [Thu, 1 Feb 2018 20:54:34 +0000 (12:54 -0800)]
Merge branch 'dpdk_merge_2_9' of https://github.com/istokes/ovs into HEAD

6 years agoxlate: fix packets loopback caused by duplicate read of xcfgp.
Huanle Han [Wed, 24 Jan 2018 19:40:16 +0000 (11:40 -0800)]
xlate: fix packets loopback caused by duplicate read of xcfgp.

Some functions, such as xlate_normal_mcast_send_mrouters, compare xbundle
pointers for equality to avoid sending a packet back to its input bundle.
However, xbundle pointers obtained from different reads of xcfgp are
unequal even when they refer to the same port. This may lead to packet
loopback.

This commit stores xcfgp in ctx once at the start and uses that same
xcfgp for the whole processing of one packet.
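
The failure mode can be sketched in a few lines. This is not the OVS code;
XBundle, XCfg and should_forward are hypothetical names used only to show
why an identity comparison breaks when the two objects come from different
configuration snapshots.

```python
class XBundle:
    """Hypothetical stand-in for a translated bundle object."""

    def __init__(self, name):
        self.name = name


class XCfg:
    """Hypothetical configuration snapshot; every snapshot owns new objects."""

    def __init__(self, port_names):
        self.bundles = {name: XBundle(name) for name in port_names}


def should_forward(in_bundle, out_bundle):
    """Loop check by identity, as a pointer comparison would do it."""
    return out_bundle is not in_bundle


old_cfg = XCfg(["uplink"])
new_cfg = XCfg(["uplink"])   # config was rebuilt while the packet was in flight

# Input bundle read from one snapshot, output bundle from another:
# the identity check wrongly allows sending the packet back out.
looped = should_forward(old_cfg.bundles["uplink"], new_cfg.bundles["uplink"])

# Reading both from one pinned snapshot restores the intended behavior.
ctx_cfg = new_cfg
pinned = should_forward(ctx_cfg.bundles["uplink"], ctx_cfg.bundles["uplink"])
```

Here looped is True (the bug) and pinned is False (the packet is not sent
back to its input bundle).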

Signed-off-by: Huanle Han <hanxueluo@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agoovn-nbctl: update manpage for lsp-set-type.
Han Zhou [Mon, 29 Jan 2018 22:04:48 +0000 (14:04 -0800)]
ovn-nbctl: update manpage for lsp-set-type.

Signed-off-by: Han Zhou <zhouhan@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agonetdev-dpdk: Add support for vHost dequeue zero copy (experimental)
Ciara Loftus [Wed, 31 Jan 2018 10:44:54 +0000 (10:44 +0000)]
netdev-dpdk: Add support for vHost dequeue zero copy (experimental)

Zero copy is disabled by default. To enable it, set the 'dq-zero-copy'
option to 'true' when configuring the Interface:

ovs-vsctl set Interface dpdkvhostuserclient0 \
    options:vhost-server-path=/tmp/dpdkvhostuserclient0 \
    options:dq-zero-copy=true

When packets from a vHost device with zero copy enabled are destined for
a single 'dpdk' port, the number of tx descriptors on that 'dpdk' port
must be set to a smaller value. 128 is recommended. This can be achieved
like so:

ovs-vsctl set Interface dpdkport options:n_txq_desc=128

Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send
to should not exceed 128. Due to this requirement, the feature is
considered 'experimental'.

Testing of the patch showed a ~8% improvement when switching 512B
packets between vHost devices on different VMs on the same host when
zero copy was enabled on the transmitting device.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
6 years agonetdev-dpdk: Fix xstats leak on port destruction.
Ilya Maximets [Tue, 23 Jan 2018 07:51:31 +0000 (10:51 +0300)]
netdev-dpdk: Fix xstats leak on port destruction.

CC: Michal Weglicki <michalx.weglicki@intel.com>
Fixes: 971f4b394c6e ("netdev: Custom statistics.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
6 years agonetdev-dpdk: Fix memory leak in netdev_dpdk_configure_xstats().
Ilya Maximets [Tue, 23 Jan 2018 07:51:14 +0000 (10:51 +0300)]
netdev-dpdk: Fix memory leak in netdev_dpdk_configure_xstats().

CC: Michal Weglicki <michalx.weglicki@intel.com>
Fixes: 971f4b394c6e ("netdev: Custom statistics.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
6 years agonetdev-dpdk: Fix memory leak in netdev_dpdk_get_custom_stats().
Ilya Maximets [Mon, 22 Jan 2018 16:24:26 +0000 (19:24 +0300)]
netdev-dpdk: Fix memory leak in netdev_dpdk_get_custom_stats().

CC: Michal Weglicki <michalx.weglicki@intel.com>
Fixes: 971f4b394c6e ("netdev: Custom statistics.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
6 years agovswitchd: show DPDK version
Matteo Croce [Mon, 15 Jan 2018 18:21:12 +0000 (19:21 +0100)]
vswitchd: show DPDK version

Show DPDK version if Open vSwitch is compiled with DPDK support.
Version can be retrieved with `ovs-vswitchd --version` or from OVS logs.
Small change in ovs-ctl to avoid breakage on output change.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
6 years agonetdev-dpdk: fix port addition for ports sharing same PCI id
Yuanhan Liu [Wed, 10 Jan 2018 03:05:29 +0000 (11:05 +0800)]
netdev-dpdk: fix port addition for ports sharing same PCI id

Some NICs have only one PCI address associated with multiple ports. This
patch extends the dpdk-devargs option's format to cater for such devices.

To achieve that, this patch uses a new syntax that will be adopted and
implemented in a future DPDK release (likely, v18.05):
    http://dpdk.org/ml/archives/dev/2017-December/084234.html

And since it's the DPDK duty to parse the (complete and full) syntax
and this patch is more likely to serve as an intermediate workaround,
here I take a simpler and shorter syntax from it (note it's allowed to
have only one category being provided):
    class=eth,mac=00:11:22:33:44:55
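
A minimal sketch of how such a value could be told apart from a plain PCI
id is shown below. The real parsing is done in C inside OVS and DPDK;
parse_devargs is a hypothetical helper, and treating any value without
'=' as a PCI id is an assumption made for illustration.

```python
def parse_devargs(devargs):
    """Split a dpdk-devargs value into a dict.

    Hypothetical helper: a value containing 'key=value' pairs is treated
    as the new syntax; anything else is returned as a plain PCI id.
    """
    if "=" not in devargs:
        return {"pci": devargs}
    fields = {}
    for item in devargs.split(","):
        key, _, value = item.partition("=")
        fields[key.strip()] = value.strip()
    return fields


new_style = parse_devargs("class=eth,mac=00:11:22:33:44:55")
old_style = parse_devargs("0000:01:00.0")
```

Old-style values keep working because they take the plain PCI id branch.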

Also, old compatibility is kept. Users can still go on with using the
PCI id to add a port (if that's enough for them). Meaning, this patch
will not break anything.

This patch is basically based on the one from Ciara:
    https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339496.html

Cc: Loftus Ciara <ciara.loftus@intel.com>
Cc: Thomas Monjalon <thomas@monjalon.net>
Cc: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
6 years agonetdev-dpdk: Fix requested MTU size validation.
Ian Stokes [Tue, 9 Jan 2018 16:09:28 +0000 (16:09 +0000)]
netdev-dpdk: Fix requested MTU size validation.

This commit replaces MTU_TO_FRAME_LEN(mtu) with MTU_TO_MAX_FRAME_LEN(mtu)
in netdev_dpdk_set_mtu(), in order to determine if the total length of
the L2 frame with an MTU of 'mtu' exceeds NETDEV_DPDK_MAX_PKT_LEN.

When setting an MTU we first check if the requested total frame length
(which includes associated L2 overhead) will exceed the maximum
frame length supported in netdev_dpdk_set_mtu(). The frame length is
calculated by MTU_TO_FRAME_LEN  as MTU + ETHER_HEADER + ETHER_CRC. The MTU
for the device will be set at a later stage in dpdk_eth_dev_init() using
rte_eth_dev_set_mtu(mtu).

However when using rte_eth_dev_set_mtu(mtu) the calculation used to check
that the frame does not exceed the max frame length for that device varies
between DPDK device drivers. For example ixgbe driver calculates the
frame length for a given MTU as

mtu + ETHER_HDR_LEN + ETHER_CRC_LEN

i40e driver calculates it as

mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + I40E_VLAN_TAG_SIZE * 2

em driver calculates it as

mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + VLAN_TAG_SIZE

Currently it is possible to set an MTU for a netdev_dpdk device that exceeds
the upper limit MTU for that device's DPDK driver. This leads to a segfault,
because the frame length comparison, as is, does not take into account
the addition of the vlan tag overhead expected in the drivers. The
netdev_dpdk_set_mtu() call will incorrectly succeed but the subsequent
dpdk_eth_dev_init() will fail before the queues have been created for the
DPDK device. This coupled with assumptions regarding reconfiguration
requirements for the netdev will lead to a segfault when the rxq is polled
for this device.

A simple way to avoid this is by using MTU_TO_MAX_FRAME_LEN(mtu) when
validating a requested MTU in netdev_dpdk_set_mtu().
MTU_TO_MAX_FRAME_LEN(mtu) is equivalent to the following:

mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + (2 * VLAN_HEADER_LEN)

By using MTU_TO_MAX_FRAME_LEN at the netdev_dpdk_set_mtu() stage, OvS
now takes into account the maximum L2 overhead that a DPDK driver could
allow for in its frame size calculation. This allows OVS to flag an error
rather than the DPDK driver if the frame length exceeds the max DPDK frame
length. OVS can fail gracefully at this point and use the default MTU of
1500 to continue to configure the port.

Note: this fix is a workaround. A better approach would be for DPDK devices
to report the maximum MTU value that can be requested on a per-device
basis. This capability, however, is not currently available. A downside of
this patch is that the MTU upper limit will be reduced by 8 bytes for
DPDK devices that do not need to account for vlan tags in the frame length
driver calculations, e.g. the upper MTU limit of ixgbe devices is reduced,
from the OVS point of view, from 9710 to 9702.
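
The arithmetic can be checked with a short sketch. The header sizes are the
standard Ethernet values (14-byte header, 4-byte CRC, 4-byte VLAN tag). The
value 9728 for NETDEV_DPDK_MAX_PKT_LEN is an assumption inferred from the
9710 and 9702 limits quoted above, and the Python function names merely
mirror the macros.

```python
ETHER_HDR_LEN = 14
ETHER_CRC_LEN = 4
VLAN_HEADER_LEN = 4
# Assumption: inferred from the 9710 -> 9702 limits quoted in the message.
NETDEV_DPDK_MAX_PKT_LEN = 9728


def mtu_to_frame_len(mtu):
    """Old check: L2 frame length without any VLAN tags."""
    return mtu + ETHER_HDR_LEN + ETHER_CRC_LEN


def mtu_to_max_frame_len(mtu):
    """New check: allow for the two VLAN tags some drivers add."""
    return mtu_to_frame_len(mtu) + 2 * VLAN_HEADER_LEN


def mtu_is_valid(mtu):
    """Validate a requested MTU the way the fixed check is described."""
    return mtu_to_max_frame_len(mtu) <= NETDEV_DPDK_MAX_PKT_LEN
```

Under the old check an MTU of 9710 fits exactly (9710 + 18 = 9728); under
the new one the largest accepted MTU is 9702 (9702 + 18 + 8 = 9728).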

CC: Mark Kavanagh <mark.b.kavanagh@intel.com>
Fixes: 0072e931 ("netdev-dpdk: add support for jumbo frames")
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Co-authored-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
6 years agoofproto: Fix double-unref of temporary rule when learning.
Ben Pfaff [Fri, 26 Jan 2018 19:43:27 +0000 (11:43 -0800)]
ofproto: Fix double-unref of temporary rule when learning.

When ofproto_flow_mod_init() accepts a rule, it takes ownership of it and
either unrefs it on error or transfers ownership to the struct it
initializes on success, but ofproto_flow_mod_init_for_learn() was unref-ing
it a second time if it reported an error.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
6 years agoopenvswitch/types.h: Drop the member name in initializer macro
Shashank Ram [Thu, 25 Jan 2018 18:12:08 +0000 (10:12 -0800)]
openvswitch/types.h: Drop the member name in initializer macro

MSVC++ compiler does not allow initializing a struct while
explicitly initializing a member in the struct.

Not allowed:
    static const struct eth_addr a = {{ .ea= { 0xff, 0xff, 0xff, 0xff,
                                        0xff, 0xff }}};

Allowed:
    static const struct eth_addr b  = {{{ 0xff, 0xff, 0xff, 0xff, 0xff,
                                          0xff }}};
*An extra curly brace is required for GCC in case the struct contains
a union.

Signed-off-by: Shashank Ram <rams@vmware.com>
Tested-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agogre: strip gre-tso offload flags
wenxu [Fri, 29 Dec 2017 04:45:03 +0000 (12:45 +0800)]
gre: strip gre-tso offload flags

If GRO is enabled, ipgre may receive a GRE-TSO packet. After the GRE
tunnel header is popped, the encapsulation and GSO_ENCAP flags should be
stripped. Otherwise the packet is treated as encapsulated again and will
be dropped in ovs_iptunnel_handle_offloads.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
6 years agoovn: OVN Support QoS meter
Guoshuai Li [Wed, 24 Jan 2018 12:39:09 +0000 (20:39 +0800)]
ovn: OVN Support QoS meter

This feature is used to limit the bandwidth of flows, such as floating IP.

ovn-northd changes:
1. add bandwidth column in NB's QOS table.
2. add QOS_METER stages in Logical switch ingress/egress.
3. add set_meter() action in SB's LFlow table.

ovn-controller changes:
add meter_table, which the meter action uses to manage the OpenFlow meter
table.

Currently, this feature is only supported with DPDK.

Signed-off-by: Guoshuai Li <ligs@dtdream.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agoovn-controller: Add extend_table instead of group_table to expand meter.
Guoshuai Li [Wed, 24 Jan 2018 12:39:08 +0000 (20:39 +0800)]
ovn-controller: Add extend_table instead of group_table to expand meter.

The group table and the meter table are similar in structure and function,
so this commit refactors the group table code into a generic extend_table
that can also back the meter table. The following operations are provided
as library functions: table init/destroy/clear/lookup/remove, assigning
IDs to contents, and moving the contents of "desired" to "existing".

Signed-off-by: Guoshuai Li <ligs@dtdream.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agoRevert "compat:inet_frag.h: Check for frag_percpu_counter_batch"
Ben Pfaff [Wed, 24 Jan 2018 20:19:03 +0000 (12:19 -0800)]
Revert "compat:inet_frag.h: Check for frag_percpu_counter_batch"

This reverts commit 822afef74f5e65af0cdc3916249ce85a70ae7b83.

Requested-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343674.html
Requested-by: Gregory Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agonetdev-linux: do not send packets to down tap ifaces.
Flavio Leitner [Thu, 18 Jan 2018 00:09:58 +0000 (22:09 -0200)]
netdev-linux: do not send packets to down tap ifaces.

Today OVS pushes packets to the TAP interface ignoring its
current state. That works because the kernel will return -EIO
when it's not UP and OVS will just ignore that as it is not
an OVS issue.

However, it causes a huge impact when broadcasts happen when
using userspace datapath accelerated with DPDK (e.g.: action
NORMAL).  This patch improves the situation by checking the
TAP interface's state before issuing any syscall.

However, there might be use cases that move interfaces to other
network namespaces, and in that case OVS can't retrieve
the iface state (it sets it to DOWN). That would stop the traffic,
breaking the use case. This patch relies on netlink notifications
to find out if the device is local or not. When it's local, the
device state is checked otherwise it will behave as before.
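
The decision described above reduces to a small predicate. This is a sketch
only; should_send_to_tap and its parameters are hypothetical names, not the
netdev-linux API.

```python
def should_send_to_tap(is_local_netns, is_up):
    """Decide whether to issue the send syscall for a TAP device.

    Hypothetical sketch: when the device lives in another network
    namespace its state cannot be read reliably, so behave as before
    and always try to send.
    """
    if not is_local_netns:
        return True
    return is_up
```

Only the local-and-down case skips the syscall, which is the case that
was costly for broadcasts.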

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agotc flower: reorder tunnel encap/decap actions
John Hurley [Tue, 23 Jan 2018 14:08:42 +0000 (14:08 +0000)]
tc flower: reorder tunnel encap/decap actions

The tc_flower conversion struct does not consider the order of actions.
If an OvS rule matches on a tunnel (decap) and outputs to a new tunnel,
the netlink conversion to TC will add the set tunnel key action before the
unset, leading to an incorrect TC rule. This patch reorders the netlink
generation to ensure a decap is done before an encap if both exist.
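
The reordering can be sketched on a plain list of action names. The strings
"tunnel_key_set" and "tunnel_key_unset" are illustrative labels for the
encap and decap actions, and order_tunnel_actions is a hypothetical helper,
not the function changed by this patch.

```python
ENCAP = "tunnel_key_set"     # illustrative label for the encap action
DECAP = "tunnel_key_unset"   # illustrative label for the decap action


def order_tunnel_actions(actions):
    """Return a copy of actions with the decap placed before the encap.

    All other actions keep their relative order.
    """
    ordered = list(actions)
    if DECAP in ordered and ENCAP in ordered:
        decap_idx = ordered.index(DECAP)
        encap_idx = ordered.index(ENCAP)
        if decap_idx > encap_idx:
            # Removing the later element does not shift encap_idx.
            ordered.insert(encap_idx, ordered.pop(decap_idx))
    return ordered
```

A rule that only encaps or only decaps is returned unchanged.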

Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
6 years agodocs: Fix formatting in fedora.rst
Yi-Hung Wei [Tue, 23 Jan 2018 22:21:31 +0000 (14:21 -0800)]
docs: Fix formatting in fedora.rst

Fix rst formatting in fedora.rst so that the commands display correctly
on the web.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agoLACP: Check active partner sys id
Róbert Mulik [Wed, 6 Dec 2017 10:36:33 +0000 (10:36 +0000)]
LACP: Check active partner sys id

A reboot of one switch in an MC-LAG bond makes all bond links
go down, causing a total connectivity loss for 3 seconds.

Packet capture shows that spurious LACP PDUs are sent to OVS with
a different MAC address (partner system id) during the final
stages of the MC-LAG switch reboot. The current implementation
doesn't care about the partner sys_id (MAC address).

The code change is based on the following:
- If an interface (lead interface) on a bond has an "attached"
  LACP connection, then any other slave on that bond is allowed
  to become active only when its partner's sys_id is the same as
  the partner's sys_id of the lead interface.
- So, when a slave interface of a bond becomes "current" (it gets
  valid LACP information), it first checks if there is already an
  active interface on the bond.
- If there is a lead, the slave checks for the partner sys_ids,
  and becomes active only when they are the same, otherwise it
  remains in "current" state, but "detached".
- If there is no lead, it follows the old way, and accepts any
  partner sys_id.
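
The rule in the list above can be sketched as a single check. This is an
illustration under stated assumptions: may_attach is a hypothetical name,
and a missing lead interface is modelled as None.

```python
def may_attach(slave_partner_sys_id, lead_partner_sys_id):
    """Return True if a "current" slave may become active.

    lead_partner_sys_id is None when no interface on the bond is
    attached yet, in which case any partner sys_id is accepted.
    """
    if lead_partner_sys_id is None:
        return True
    return slave_partner_sys_id == lead_partner_sys_id
```

A slave that sees a spurious partner MAC while a lead exists stays
"current" but "detached" instead of disturbing the bond.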

Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agocompat:inet_frag.h: Check for frag_percpu_counter_batch
Greg Rose [Fri, 5 Jan 2018 19:30:24 +0000 (11:30 -0800)]
compat:inet_frag.h: Check for frag_percpu_counter_batch

Fix up the compat layer to check for frag_percpu_counter_batch and
if not present then use atomic_sub and atomic_add as per the
backport in the 3.16.50 LTS kernel.  Fixes compile errors on
3.16 series kernels from 3.16.50 on.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
6 years agotests: Fix non-canonical MAC addresses in ovn.at.
Leonid Ryzhyk [Wed, 20 Dec 2017 22:36:44 +0000 (14:36 -0800)]
tests: Fix non-canonical MAC addresses in ovn.at.

Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agoxlate: fix xport lookup for recirc
Zoltan Balogh [Fri, 12 Jan 2018 13:34:11 +0000 (14:34 +0100)]
xlate: fix xport lookup for recirc

xlate_lookup() and xlate_lookup_ofproto_() provide in_port and ofproto
based on the xport determined from the flow, which is extracted from the
packet. The lookup can also happen due to recirculation. If packet_type
has been modified during xlate before recirculation is triggered, the
lookup fails or delivers the wrong xport.
This can be worked around by propagating the xport to ctx->xin after the
very first lookup and storing it in the frozen state of the recirculation.
Then, when a lookup is performed due to recirculation, the xport can be
retrieved from the frozen state.

The packet-type-aware unit tests are updated with a new one to verify
this behavior.

Signed-off-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
CC: Jan Scheurich <jan.scheurich@ericsson.com>
Fixes: beb75a40fdc2 ("userspace: Switching of L3 packets in L2 pipeline")
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agoofproto-dpif-xlate: add uuid to xports
Zoltan Balogh [Fri, 12 Jan 2018 13:34:10 +0000 (14:34 +0100)]
ofproto-dpif-xlate: add uuid to xports

This makes it possible to look up an xport by UUID, which will be used by
a later commit.

Signed-off-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agoofproto-dpif-sflow: Recursively examine actions inside clone.
Zoltan Balogh [Tue, 9 Jan 2018 18:54:31 +0000 (19:54 +0100)]
ofproto-dpif-sflow: Recursively examine actions inside clone.

Until now, dpif_sflow_read_actions() has ignored actions inside clone.
This means that sflow missed tnl_push actions inside clone, which OVS
now uses to avoid tx recirculation.  This commit fixes the problem
by making dpif_sflow_read_actions() recursively process actions inside
clone.
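
The recursive walk can be sketched over a nested list of (name, argument)
pairs. The data layout and the function name find_tnl_push are hypothetical
and chosen only to show the recursion; the real code parses netlink
attributes in C.

```python
def find_tnl_push(actions):
    """Yield the argument of every tnl_push, descending into clone()."""
    for name, arg in actions:
        if name == "clone":
            # A clone carries its own nested action list.
            yield from find_tnl_push(arg)
        elif name == "tnl_push":
            yield arg


actions = [
    ("output", 1),
    ("clone", [
        ("tnl_push", "vxlan0"),
        ("clone", [("tnl_push", "gre0")]),
    ]),
]
found = list(find_tnl_push(actions))
```

A walk that skipped the clone branch would return an empty list for this
input, which corresponds to the bug described above.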

In addition, some sflow data needs to be stored and restored in
ofproto-dpif-xlate when native_tunnel_output() is invoked. Otherwise the
output action of underlay bridge is getting counted too when sFlow is set
on the overlay bridge.

Both bugs are connected to sflows and were introduced by the commit in
the "Fixes:" tag below.

Signed-off-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
CC: Sugesh Chandran <sugesh.chandran@intel.com>
Fixes: 7c12dfc527a5 ("tunneling: Avoid datapath-recirc by combining recirc actions at xlate.")
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agobridge: Fix custom stats' counters leak.
Ilya Maximets [Tue, 23 Jan 2018 05:55:06 +0000 (08:55 +0300)]
bridge: Fix custom stats' counters leak.

The caller takes ownership over allocated array of counters.
And it must free them.

CC: Michal Weglicki <michalx.weglicki@intel.com>
Fixes: 971f4b394c6e ("netdev: Custom statistics.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agoovn-controller: add new external_id 'ovn-cms-options' to Chassis table
Daniel Alvarez [Tue, 23 Jan 2018 15:13:16 +0000 (15:13 +0000)]
ovn-controller: add new external_id 'ovn-cms-options' to Chassis table

This patch makes ovn-controller set the external_ids key
'ovn-cms-options' in its own Chassis table entry, copying its
contents from the same external_ids key in the local Open vSwitch
database.

The idea behind this patch is to allow setting general options
from the CMS Plugin to a particular chassis.

A good example of a use case is when we want to schedule a router
on a chassis from OpenStack. In this case, we may want to exclude
some nodes because they are more likely to be restarted for
maintenance operations or they simply won't have external connectivity.
In that case, the CMS/deployment tool can set the external_ids
as:

ovs-vsctl set open . external_ids:ovn-cms-options="enable-chassis-as-gw"

Then ovn-controller will write the options to the Chassis table in
southbound database. This value can be later read by the CMS in order
to decide which Chassis are eligible to schedule a router on.
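
On the CMS side, the check could look roughly like the sketch below. It
assumes that several options are separated by commas and that Chassis rows
are available as (name, external_ids) pairs; both are assumptions made for
illustration, not something this patch defines.

```python
def parse_cms_options(value):
    """Split an ovn-cms-options string; comma separation is an assumption."""
    return [opt.strip() for opt in value.split(",") if opt.strip()]


def gateway_chassis(chassis_rows, option="enable-chassis-as-gw"):
    """Return the names of chassis whose ovn-cms-options contain option."""
    return [
        name
        for name, external_ids in chassis_rows
        if option in parse_cms_options(external_ids.get("ovn-cms-options", ""))
    ]


rows = [
    ("hv1", {"ovn-cms-options": "enable-chassis-as-gw"}),
    ("hv2", {}),
    ("hv3", {"ovn-cms-options": "other-option, enable-chassis-as-gw"}),
]
eligible = gateway_chassis(rows)
```

Chassis that never set the key are simply treated as not eligible.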

Similarly, this new key allows additional options to be specified for
consumption by the CMS.

Signed-off-by: Daniel Alvarez <dalvarez@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agobfd: Send BFD packets with DSCP CS6
Venkatesan Pradeep [Mon, 25 Dec 2017 16:59:22 +0000 (16:59 +0000)]
bfd: Send BFD packets with DSCP CS6

Send BFD packets with TOS value equivalent to DSCP CS6 so that the network
can apply the right QoS for those packets. This can help avoid BFD flaps due
to network congestion.
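
For reference, the TOS byte that corresponds to DSCP CS6 can be derived as
follows. DSCP occupies the upper six bits of the TOS byte, so CS6 (decimal
48) becomes 0xC0; the function name dscp_to_tos is only illustrative.

```python
DSCP_CS6 = 48  # class selector 6, binary 110000


def dscp_to_tos(dscp):
    """Place a 6-bit DSCP value in the upper bits of the TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2


tos_cs6 = dscp_to_tos(DSCP_CS6)
```

The two low-order bits stay zero; they are the ECN field.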

For a reference on this being the right choice, here is a short
declaration:

http://www.ciscopress.com/articles/article.asp?p=357102&seqNum=4

A long dissertation:

https://www.cisco.com/c/en/us/td/docs/solutions/Enterprise/WAN_and_MAN/QoS_SRND/QoS-SRND-Book/QoSIntro.html

But in a nutshell:

Network engineers create various queue/drop policies based upon precedence.
Routing protocols are considered high priority/high precedence.  During link
saturation events, packets will get dropped. By creating an egress policy
where packets marked by CS6 are allowed front-of-the-queue status, one can be
sure that hello's from the various protocols arrive when they need to, without
delay and without loss.  On the other hand, if the hellos are dropped as part
of normal traffic operations, then traffic routing will flap, leading to
further congestion and drops.

CS6 is a 'well known' marker to network engineers. In many vendor's gear, it
is automatically assigned to routing protocol packets.

Since OVS does not perform queuing, and leaves that to the kernel edge
operations, the queue policies can be used to ensure timely egress of the BFD
packets during high utilization events.

See also:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339784.html
https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339785.html

Thanks to Raymond Burkholder <ray@oneunified.net> for much of the above
information.

Signed-off-by: Venkatesan Pradeep <venkatesan.pradeep@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agodatapath: add ct_clear action
Eric Garver [Mon, 22 Jan 2018 19:10:05 +0000 (14:10 -0500)]
datapath: add ct_clear action

Upstream commit:
    commit b8226962b1c49c784aeddb9d2fafbf53dfdc2190
    Author: Eric Garver <e@erig.me>
    Date:   Tue Oct 10 16:54:44 2017 -0400

    openvswitch: add ct_clear action

    This adds a ct_clear action for clearing conntrack state. ct_clear is
    currently implemented in OVS userspace, but is not backed by an action
    in the kernel datapath. This is useful for flows that may modify a
    packet tuple after a ct lookup has already occurred.

Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Notes:
   - hunk from include/uapi/linux/openvswitch.h is missing because it
     was added with userspace support in 1fe178d251c8 ("dpif: Add support
     for OVS_ACTION_ATTR_CT_CLEAR")
   - if IP_CT_UNTRACKED is not available use 0 as other nf_ct_set()
     calls do. Since we're setting ct to NULL this is okay.

Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years agoacinclude: check for IP_CT_UNTRACKED
Eric Garver [Mon, 22 Jan 2018 19:10:04 +0000 (14:10 -0500)]
acinclude: check for IP_CT_UNTRACKED

IP_CT_UNTRACKED is fairly new, but used by the kernel datapath ct_clear
action.

Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
6 years agoovsdb-client: Fix memory leaks
Yifeng Sun [Fri, 12 Jan 2018 17:45:31 +0000 (09:45 -0800)]
ovsdb-client: Fix memory leaks

These two leaks were reported by valgrind (testing ovsdb-client
backup and restore):

890 (56 direct, 834 indirect) bytes in 1 blocks are definitely lost in loss record 71 of 73
   by 0x42DE22: xcalloc (util.c:103)
   by 0x40DD8C: ovsdb_schema_create (ovsdb.c:34)
   by 0x40E0B5: ovsdb_schema_from_json (ovsdb.c:196)
   by 0x406DA5: fetch_schema (ovsdb-client.c:415)
   by 0x408478: do_restore (ovsdb-client.c:1595)
   by 0x405BCD: main (ovsdb-client.c:170)

2,688 (88 direct, 2,600 indirect) bytes in 1 blocks are definitely lost in loss record 73 of 73
   by 0x42DE84: xmalloc (util.c:120)
   by 0x40E61F: ovsdb_create (ovsdb.c:329)
   by 0x40BA22: ovsdb_file_open__ (file.c:201)
   by 0x40845A: do_restore (ovsdb-client.c:1592)
   by 0x405BCD: main (ovsdb-client.c:170)

Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agoovn-northd: Fix memory leak
Yifeng Sun [Fri, 12 Jan 2018 17:45:30 +0000 (09:45 -0800)]
ovn-northd: Fix memory leak

This leak was reported by valgrind (testing ovn -- IPv6 Neighbor
Solicitation for unknown MAC):

3,027 bytes in 49 blocks are definitely lost in loss record 210 of 218
    by 0x484C84: xrealloc (util.c:131)
    by 0x43CE41: ds_reserve (dynamic-string.c:63)
    by 0x43D29D: ds_put_format_valist (dynamic-string.c:161)
    by 0x43D3A3: ds_put_format (dynamic-string.c:142)
    by 0x412EEF: ovn_port_update_sbrec (ovn-northd.c:1948)
    by 0x4148B4: build_ports (ovn-northd.c:2109)
    by 0x4148B4: ovnnb_db_run.isra.37 (ovn-northd.c:6202)
    by 0x406FE0: main (ovn-northd.c:6854)

Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agopinctrl: Fix memory leak
Yifeng Sun [Fri, 12 Jan 2018 17:45:29 +0000 (09:45 -0800)]
pinctrl: Fix memory leak

This bug is reported by valgrind (testing ovn -- 3 HVs, 1 LS, 3 lports/HV):

51,680 (27,968 direct, 23,712 indirect) bytes in 76 blocks are definitely lost in loss record 72 of 72
   at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x4A8992: xcalloc (util.c:103)
   by 0x493052: ovsdb_idl_index_init_row (ovsdb-idl.c:2343)
   by 0x413F69: send_ipv6_ras (pinctrl.c:1321)
   by 0x413F69: pinctrl_run (pinctrl.c:1093)
   by 0x407348: main (ovn-controller.c:703)

Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agosystem-traffic: Add conntrack floating IP test
Eric Garver [Fri, 19 Jan 2018 19:21:53 +0000 (14:21 -0500)]
system-traffic: Add conntrack floating IP test

This test case uses floating IP (FIP) addresses for each endpoint. If
the destination is a FIP, the packet will undergo a transformation of
the form (dst=FIP, src=non-FIP) --> (dst=non-FIP, src=FIP) before
egress. Otherwise the packet is untouched.
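
The address rewrite described above can be sketched as a pure function. The
addresses and the FIP table are made up for illustration (documentation and
private ranges), and translate is a hypothetical name; the test itself
expresses this with OpenFlow rules.

```python
# Hypothetical FIP table: floating IP -> fixed (non-FIP) address.
FIP_TO_FIXED = {
    "198.51.100.1": "10.0.0.1",
    "198.51.100.2": "10.0.0.2",
}
FIXED_TO_FIP = {fixed: fip for fip, fixed in FIP_TO_FIXED.items()}


def translate(src, dst):
    """Apply (dst=FIP, src=non-FIP) --> (dst=non-FIP, src=FIP).

    Packets whose destination is not a FIP are returned untouched.
    """
    if dst not in FIP_TO_FIXED:
        return src, dst
    return FIXED_TO_FIP.get(src, src), FIP_TO_FIXED[dst]
```

Because both the source and the destination change after the first
conntrack lookup, the flow needs ct_clear before it is tracked again.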

This exercises the ct_clear action in the datapath.

Signed-off-by: Eric Garver <e@erig.me>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
6 years agosystem-common-macros: Check for ct_clear action in datapath
Eric Garver [Fri, 19 Jan 2018 19:21:52 +0000 (14:21 -0500)]
system-common-macros: Check for ct_clear action in datapath

New macro OVS_CHECK_CT_CLEAR() to check if ct_clear action is supported
by the datapath.

Signed-off-by: Eric Garver <e@erig.me>
Tested-by: William Tu <u9012063@gmail.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
6 years agodpif: Add support for OVS_ACTION_ATTR_CT_CLEAR
Eric Garver [Fri, 19 Jan 2018 19:21:51 +0000 (14:21 -0500)]
dpif: Add support for OVS_ACTION_ATTR_CT_CLEAR

This supports using the ct_clear action in the kernel datapath. To
preserve compatibility with current ct_clear behavior on old kernels, we
only pass this action down to the datapath if a probe reveals the
datapath actually supports it.

Signed-off-by: Eric Garver <e@erig.me>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
6 years agoofproto: Fix wrong datapath flow with same in_port and output port.
Lilijun (Jerry) [Fri, 19 Jan 2018 08:12:30 +0000 (08:12 +0000)]
ofproto: Fix wrong datapath flow with same in_port and output port.

In my test, ovs-appctl dpctl/dump-flows showed a new datapath flow whose
output port is the same as its in_port.  The MAC address then moves from
one port to another and back again in the physical switch, which disrupts
the VM's traffic.

Key steps of my test:

    1) Three VMs (VM-A, VM-B and VM-C) are deployed on different hosts
    (Host-A, Host-B and Host-C) connected to the same physical switch.
    Each uses an OVS bridge with an Intel 82599 NIC as the uplink port.

    2) VM-A sends many unicast packets to VM-B, and VM-B also sends
    unicast packets to VM-A.

    3) VM-C pings VM-A continuously while ports are added to and deleted
    from the OVS bridge on Host-C.

    4) In some abnormal scenarios, the physical switch clears all the MAC
    entries on each port.  The uplink port of the Host-C OVS bridge then
    receives packets in both directions (VM-A to VM-B, and VM-B to VM-A).

The expected result is that packets in both directions are dropped at the
uplink port, because according to the MAC table learned by the OVS NORMAL
rules, their destination port is the uplink port, which is also their
source port.  In reality, some packets were sent back out the uplink port
to the physical switch.  VM-A's MAC then moved from the physical switch
port of Host-A to that of Host-C, and as a result VM-C's pings to VM-A
failed.  When this problem occurred, "ovs-appctl dpctl/dump-flows" showed
the abnormal datapath flow "in_port(2) actions:2".

Currently, xlate_normal() compares xbundle pointers to check whether the
packet's destination port is the same as its input port.  This
implementation can give the wrong answer when
xlate_txn_start()/xlate_txn_commit() run in type_run() at the same time,
because the xcfg/xbridge/xbundle objects are reallocated and copied just
before we look up the dst mac_port and mac_xbundle.  mac_xbundle and
in_xbundle then both refer to the uplink port but are not the same object
pointer.

This patch fixes the bug by adding ofbundle check conditions to the
comparison.
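
The failure mode can be illustrated with a small Python analogue (OVS
itself is C, and the class and field names below are hypothetical). After
the configuration is copied, two objects describe the same bundle but are no
longer the same object, so an identity test fails while a comparison on a
stable key still succeeds.

```python
import copy
from dataclasses import dataclass


@dataclass
class XBundle:
    name: str
    ofbundle: int  # stable identifier shared by every copy of this bundle


def same_bundle_by_identity(a, b):
    # Equivalent of comparing C pointers: true only for the very same object.
    return a is b


def same_bundle_by_key(a, b):
    # Also accepts distinct objects that refer to the same underlying bundle.
    return a is b or a.ofbundle == b.ofbundle


in_xbundle = XBundle("uplink", ofbundle=1)
# Simulate the reallocate-and-copy step of a configuration transaction.
mac_xbundle = copy.copy(in_xbundle)
```

With the identity test, the packet is not recognized as heading back to its
input port and is sent out the uplink again; the key-based test treats the
two objects as the same bundle, so the packet is dropped as intended.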

Signed-off-by: Lilijun <jerry.lilijun@huawei.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agodpif: geneve: supply dpif function to get ifindex
John Hurley [Tue, 16 Jan 2018 10:46:36 +0000 (10:46 +0000)]
dpif: geneve: supply dpif function to get ifindex

Geneve tunnels are not given a netdev_class function to determine their
ifindex. This means when ofproto-dpif attempts to add a geneve netdev
it fails in 'netdev_ports_insert' in netdev.c. Failure to add this means
that further operations like offloading a rule that egresses to a geneve
port will be rejected as the egress port cannot be found. This patch
applies the same ifindex function to geneve as is used in vxlan.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
6 years agoMerge branch 'dpdk_merge' of https://github.com/istokes/ovs into HEAD
Ben Pfaff [Fri, 19 Jan 2018 20:46:44 +0000 (12:46 -0800)]
Merge branch 'dpdk_merge' of https://github.com/istokes/ovs into HEAD

6 years agodpif-netlink-rtnl: Work around MTU bug in kernel GRE driver.
Ben Pfaff [Wed, 17 Jan 2018 18:02:25 +0000 (10:02 -0800)]
dpif-netlink-rtnl: Work around MTU bug in kernel GRE driver.

The kernel GRE driver ignores IFLA_MTU in RTM_NEWLINK requests and
overrides the MTU to 1472 bytes.  This commit works around the problem by
following up a request to create a GRE device with a second request to set
the MTU.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1488484
Reported-by: Eric Garver <e@erig.me>
Reported-by: James Page <james.page@ubuntu.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Eric Garver <e@erig.me>
Tested-by: James Page <james.page@ubuntu.com>
6 years agodpif-netlink-rtnl: Use 65000 instead of 65535 as tunnel MTU.
Ben Pfaff [Wed, 17 Jan 2018 18:02:24 +0000 (10:02 -0800)]
dpif-netlink-rtnl: Use 65000 instead of 65535 as tunnel MTU.

Most of the existing tunnels accept 65535 for MTU and internally reduce it
to the maximum value actually supported.  However, in RTM_SETLINK calls,
at least GRE tunnels reject MTU larger than actually supported.  This
commit changes the MTU used in RTM_NEWLINK calls to use a value that should
be acceptable to all tunnels and yet does not noticeably reduce
performance.

(This code doesn't actually use RTM_SETLINK to change MTU yet, but that's
coming up.)

Suggested-by: Eric Garver <e@erig.me>
Suggested-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343304.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Eric Garver <e@erig.me>
Tested-by: James Page <james.page@ubuntu.com>
6 years agoDocumentation: Document optional RHEL7 repositories
Greg Rose [Thu, 18 Jan 2018 22:16:52 +0000 (14:16 -0800)]
Documentation: Document optional RHEL7 repositories

On minimal install RHEL 7 servers (and perhaps other types of installs)
you need to enable a couple of optional repositories for the yum-builddep
utility to work correctly.  This patch documents those two optional
repositories.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agoodp-util: Fix compiler warning.
Aaron Conole [Tue, 16 Jan 2018 14:05:47 +0000 (09:05 -0500)]
odp-util: Fix compiler warning.

The result of a ternary operation will be promoted at least to int type.
As such, the compiler may generate a warning such as: format specifies type
'unsigned char' but the argument has type 'int'

Found with Apple LLVM version 8.1.0 (clang-802.0.42).

Squelch this by preferring the %d format specifier to print 1/0 values.

Fixes: 74c4530dca93 ("ofproto-dpif: Don't slow-path controller actions with pause.")
Cc: Justin Pettit <jpettit@ovn.org>
Signed-off-by: Aaron Conole <aconole@bytheb.org>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
6 years agorhel: Ensure proper OVS kernel modules load - rhel6
Greg Rose [Wed, 17 Jan 2018 16:01:38 +0000 (08:01 -0800)]
rhel: Ensure proper OVS kernel modules load - rhel6

Patch c49889cf3e "rhel: Ensure proper OVS kernel modules load after upgrade"
did not address the RHEL 6 kmod rpm spec file.  This patch addresses
that error.

Fixes: c49889cf3e ("rhel: Ensure proper OVS kernel modules...")
CC: Ansis Atteka <ansisatteka@gmail.com>
CC: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Ansis Atteka <aatteka@ovn.org>
6 years agoPrepare for 2.9.0.
Justin Pettit [Wed, 17 Jan 2018 17:50:39 +0000 (09:50 -0800)]
Prepare for 2.9.0.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
6 years agoDocumentation: Update Faucet tutorial.
Brad Cowie [Mon, 15 Jan 2018 00:38:48 +0000 (13:38 +1300)]
Documentation: Update Faucet tutorial.

Drop the use of minimum_ip_size_check in the Faucet tutorial.  It is no
longer needed now that we have fixed a bug that caused packet length checks
to be calculated incorrectly.

Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agonetdev-dpdk: add vhost-user get_status.
Flavio Leitner [Tue, 16 Jan 2018 04:22:16 +0000 (02:22 -0200)]
netdev-dpdk: add vhost-user get_status.

Expose relevant vhost-user information in status.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
6 years agoNEWS: Add entry for new appctl dpif-netdev/pmd-rxq-rebalance.
Kevin Traynor [Tue, 16 Jan 2018 15:04:42 +0000 (15:04 +0000)]
NEWS: Add entry for new appctl dpif-netdev/pmd-rxq-rebalance.

This feature was added earlier, but we thought it better to
advertise it in NEWS once stats were available to help
the user decide whether to use it.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
6 years agodpif-netdev: Add percentage of pmd/core used by each rxq.
Kevin Traynor [Tue, 16 Jan 2018 15:04:41 +0000 (15:04 +0000)]
dpif-netdev: Add percentage of pmd/core used by each rxq.

The percentage is based on the length of history that is stored
about an rxq (currently 1 min).

$ ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 4:
        isolated : false
        port: dpdkphy1         queue-id:  0    pmd usage: 70 %
        port: dpdkvhost0       queue-id:  0    pmd usage:  0 %
pmd thread numa_id 0 core_id 6:
        isolated : false
        port: dpdkphy0         queue-id:  0    pmd usage: 64 %
        port: dpdkvhost1       queue-id:  0    pmd usage:  0 %

These are the values that would be used for rxq-to-pmd
assignment after a reconfiguration event (e.g. adding pmds or
adding rxqs) or when running the command:

ovs-appctl dpif-netdev/pmd-rxq-rebalance
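
A minimal Python sketch of how such a percentage could be derived from
stored history. The number of intervals and the cycle counts are assumptions
chosen for illustration; only the one-minute history length comes from the
description above.

```python
def pmd_usage_percent(rxq_cycles, pmd_cycles):
    """Percentage of a pmd's cycles spent on one rxq over the stored history.

    rxq_cycles: cycles spent processing the rxq, one entry per interval.
    pmd_cycles: total cycles of the pmd thread, one entry per interval.
    """
    total = sum(pmd_cycles)
    if total == 0:
        return 0  # no history yet
    return sum(rxq_cycles) * 100 // total


# Six hypothetical intervals covering the history window.
rxq_history = [70, 72, 68, 70, 71, 69]
pmd_history = [100] * 6
```

Summing over the whole window rather than using only the latest interval
smooths out short bursts, which matters if the value feeds an assignment
decision.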

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Co-authored-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
6 years agodpif-netdev: Reset the rxq current cycle counter on reload.
Kevin Traynor [Thu, 11 Jan 2018 14:25:33 +0000 (14:25 +0000)]
dpif-netdev: Reset the rxq current cycle counter on reload.

An rxq may have processing cycles counted in the current
counter when a reload happens. That could temporarily create
a small skew on the stats for an rxq. Reset the counter after
reload.

Fixes: 4809891b2e01 ("dpif-netdev: Count the rxq processing cycles for an rxq.")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
6 years agoNEWS: Mark output packet batching support.
Ilya Maximets [Mon, 15 Jan 2018 10:20:55 +0000 (13:20 +0300)]
NEWS: Mark output packet batching support.

A new feature should be mentioned in NEWS, especially because it has
user-visible configuration options.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
6 years agodocs: Describe output packet batching in DPDK guide.
Ilya Maximets [Mon, 15 Jan 2018 10:20:54 +0000 (13:20 +0300)]
docs: Describe output packet batching in DPDK guide.

Added information about output packet batching and a way to
configure 'tx-flush-interval'.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Co-authored-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>