git.proxmox.com Git - mirror_ovs.git/log
5 years ago MAINTAINERS: Add Ilya Maximets.
Ben Pfaff [Fri, 26 Apr 2019 17:52:50 +0000 (10:52 -0700)]
MAINTAINERS: Add Ilya Maximets.

Ilya was elected by the Open vSwitch committers on Thursday.  Welcome to
the team, Ilya!

CC: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn-northd: Fix the HA_Chassis sync issue in OVN SB DB
Numan Siddique [Thu, 25 Apr 2019 19:01:39 +0000 (00:31 +0530)]
ovn-northd: Fix the HA_Chassis sync issue in OVN SB DB

ovn-northd deletes and recreates HA_Chassis rows (which belong
to a HA_Chassis_Group) whenever the HA_Chassis_Group/Gateway_Chassis
rows in Northbound DB are out of sync. If a Chassis table row in
Southbound DB is deleted and this row is referenced by an HA_Chassis
row (in Southbound DB), then the present code syncs the HA_Chassis
rows continuously, which causes the ovn-controllers to wake up
and results in 100% CPU usage.

This was a simple case that commit
1be1e0e5e0d1 ("ovn: Add generic HA chassis group") failed to address.

This patch fixes this issue.

Fixes: 1be1e0e5e0d1 ("ovn: Add generic HA chassis group")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2019-April/048580.html
Reported-by: Daniel Alvarez Sanchez (dalvarez@redhat.com)
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovsdb-server.7: Describe message ordering between "update" and "transact".
Ben Pfaff [Thu, 25 Apr 2019 19:42:46 +0000 (12:42 -0700)]
ovsdb-server.7: Describe message ordering between "update" and "transact".

This comes up sometimes and it's best to document it.

Acked-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago Documentation: Update documentation for OpenFlow support.
Ben Pfaff [Wed, 24 Apr 2019 16:37:21 +0000 (09:37 -0700)]
Documentation: Update documentation for OpenFlow support.

The commits that implemented these features forgot to update the
documentation.

Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago datapath-windows: Do not send out nbls when cloned nbls are being accessed
Anand Kumar [Thu, 11 Apr 2019 16:14:21 +0000 (09:14 -0700)]
datapath-windows: Do not send out nbls when cloned nbls are being accessed

As per MSDN documentation, "As soon as a filter driver calls the
NdisFSendNetBufferLists function, it relinquishes ownership of
the NET_BUFFER_LIST structures and all associated resources.
A filter driver should never try to examine the NET_BUFFER_LIST
structures or any associated data after calling NdisFSendNetBufferLists".

https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/ndis/nf-ndis-ndisfsendnetbufferlists

When freeing the memory of a cloned nbl, the parent's nbl and context
are accessed, which is incorrect and can cause a BSOD.
With this patch, the original nbl is sent out only when the cloned nbl is done
with packet processing and its memory is freed.

Signed-off-by: Anand Kumar <kumaranand@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
5 years ago sparse: Configure target operating system and fix fallout.
Ben Pfaff [Tue, 23 Apr 2019 23:42:32 +0000 (16:42 -0700)]
sparse: Configure target operating system and fix fallout.

cgcc, the "sparse" wrapper that OVS uses, can be told the host architecture
or the host OS or both.  Until now, OVS has told it the host architecture
because it is fairly common that it doesn't guess it automatically.  Until
now, OVS has not told it the host OS, assuming that it would get it right.
However, it doesn't--if you tell it the host OS or the host architecture,
it doesn't really have a default for the other.  This means that on Linux
(presumably the only OS where sparse works properly for OVS), it was not
defining __linux__, which caused some weird behavior.

This commit adds a flag to the cgcc invocation to make it define __linux__
on Linux, and it fixes some errors that this would otherwise cause.

Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago travis: Fix checks skipping by sparse.
Ilya Maximets [Wed, 24 Apr 2019 13:00:22 +0000 (16:00 +0300)]
travis: Fix checks skipping by sparse.

A recent commit in "sparse" broke checking the OVS sources, because
'make' uses the '-MD' flag to generate dependencies as a side effect
within compilation commands, but "sparse" skips all the build commands
that contain '-MD' and friends.
Let's revert the bad commit as a workaround before installing "sparse"
in TravisCI.

Additionally fixed a false-positive:
./lib/bitmap.h:64:29: error: shift too big (64) for type unsigned long

CC: Yi-Hung Wei <yihung.wei@gmail.com>
Fixes: 879e8238dfdf ("travis: Update sparse git repo")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ofproto: Return error codes for rule insertions.
Aravind Prasad S [Tue, 23 Apr 2019 19:00:59 +0000 (00:30 +0530)]
ofproto: Return error codes for rule insertions.

Currently, the rule_insert() API does not have a return value. There are some
possible scenarios where rule insertions can fail at run-time even though the
static checks during rule_construct() had passed previously.  Some possible
scenarios for failure of rule insertions:

**) Rule insertions can fail dynamically in Hybrid mode (both OpenFlow and
Normal switch functioning coexist) where the CAM space could suddenly get
filled up by Normal switch functioning, leaving OpenFlow without available
space.

**) Some deployments could have separate independent layers for HW rule
insertions and an application layer to interact with OVS. The HW layer could
face a dynamic issue during rule handling which the application could not have
predicted/captured in the rule-construction phase.

Rule-insert errors for bundles are handled too.

Testing: Tested failures of rule insertions and also with bundles.

Signed-off-by: Aravind Prasad S <aravind.sridharan at dell.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago Double postponing to free subtables.
Zhantao Fu [Tue, 23 Apr 2019 11:04:25 +0000 (19:04 +0800)]
Double postponing to free subtables.

Subtable destruction should be double postponed because readers could still
obtain old values while iterating over the pvector implementation before its
new version is published.

Signed-off-by: Zhantao Fu <fuzhantao@huawei.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago OVN: Clarify docs about the default transport zone
Lucas Alvares Gomes [Tue, 23 Apr 2019 12:25:48 +0000 (13:25 +0100)]
OVN: Clarify docs about the default transport zone

This patch extends the documentation about the new transport zones
feature to clarify that if no transport zones are set, the chassis will
belong to a default group.

Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago debian: Notes for systemd-networkd integration with OVS.
Gurucharan Shetty [Fri, 19 Apr 2019 07:18:27 +0000 (00:18 -0700)]
debian: Notes for systemd-networkd integration with OVS.

Signed-off-by: Gurucharan Shetty <guru@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
5 years ago OVN: Add support for Transport Zones
Lucas Alvares Gomes [Thu, 18 Apr 2019 13:39:09 +0000 (14:39 +0100)]
OVN: Add support for Transport Zones

This patch adds support for Transport Zones. Transport Zones (a.k.a.
TZs) are a way to enable users of OVN to separate Chassis into different
logical groups that will only form tunnels between members of the same
group. Each Chassis can belong to one or more Transport Zones. If
not set, the Chassis will be considered part of a default group.

Configuring Transport Zones is done by creating a key called
"ovn-transport-zones" in the external_ids column of the Open_vSwitch
table from the local OVS instance. The value is a string with the name
of the Transport Zone that this instance is part of. Multiple TZs can
be specified with a comma-separated list. For example:

$ sudo ovs-vsctl set open . external-ids:ovn-transport-zones=tz1

or

$ sudo ovs-vsctl set open . external-ids:ovn-transport-zones=tz1,tz2,tz3

This configuration is also exposed in the Chassis table of the OVN
Southbound Database in a new column called "transport_zones".
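
A minimal sketch of the grouping rule described above, written in Python for
illustration only; it is not the ovn-controller implementation. The function
names are hypothetical, and the handling of an unset value is an assumption:
two chassis without any transport zone are treated as members of the same
default group, while a chassis without zones does not tunnel to one that has
zones set.

```python
def parse_transport_zones(value):
    """Split an ovn-transport-zones value such as "tz1,tz2" into a set."""
    return {name.strip() for name in value.split(",") if name.strip()}


def should_tunnel(zones_a, zones_b):
    """Return True if two chassis should form a tunnel with each other.

    Assumption: an empty set means "default group", which only matches
    other chassis that also have no transport zones configured.
    """
    if not zones_a and not zones_b:
        return True
    # Otherwise the chassis must share at least one transport zone.
    return bool(zones_a & zones_b)


# Edge-computing layout: each edge site shares a zone with the central
# node, but the edge sites share no zone with each other.
central = parse_transport_zones("tz1,tz2,tz3")
edge1 = parse_transport_zones("tz1")
edge2 = parse_transport_zones("tz2")

print(should_tunnel(central, edge1))
print(should_tunnel(edge1, edge2))
```

With this layout, the central node tunnels to both edge sites while the two
edge sites do not tunnel to each other.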

Uses for Transport Zones include, but are not limited to:

* Edge computing: As a way to prevent edge sites from trying to create
  tunnels with every node on every other edge site while still allowing
  these sites to create tunnels with the central node.

* Extra security layer: Where users want to create "trust zones"
  and prevent computes in a more secure zone from communicating with a less
  secure zone.

This patch is also backward compatible so the upgrade guide for OVN [0]
is still valid and the ovn-controller service can be upgraded before the
OVSDBs.

[0] http://docs.openvswitch.org/en/latest/intro/install/ovn-upgrades/

Reported-by: Daniel Alvarez Sanchez <dalvarez@redhat.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2019-February/048255.html
Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago travis: Update sparse git repo
Yi-Hung Wei [Fri, 19 Apr 2019 18:12:15 +0000 (11:12 -0700)]
travis: Update sparse git repo

The old git tree git://git.kernel.org/pub/scm/devel/sparse/chrisl/sparse.git
has not been updated since 2016, and that triggers the following build error
on Ubuntu 18.04 host with 2.27-3 libc6-dev.  So update the sparse git repo
to the new one.

$ .travis/linux-prepare.sh
$ export PATH=$PATH:$HOME/bin
$ .travis/linux-build.sh

/usr/include/stdlib.h:140:17: error: Expected ; at end of declaration
/usr/include/stdlib.h:140:17: error: got strtof32
/usr/include/stdlib.h:146:17: error: Expected ; at end of declaration
/usr/include/stdlib.h:146:17: error: got strtof64
/usr/include/stdlib.h:158:18: error: Expected ; at end of declaration
/usr/include/stdlib.h:158:18: error: got strtof32x
/usr/include/stdlib.h:233:33: error: Expected ) in function declarator
/usr/include/stdlib.h:233:33: error: got __f
/usr/include/stdlib.h:239:33: error: Expected ) in function declarator
/usr/include/stdlib.h:239:33: error: got __f
/usr/include/stdlib.h:251:35: error: Expected ) in function declarator
/usr/include/stdlib.h:251:35: error: got __f
/usr/include/stdlib.h:316:17: error: Expected ; at end of declaration
/usr/include/stdlib.h:316:17: error: got strtof32_l
/usr/include/stdlib.h:323:17: error: Expected ; at end of declaration
/usr/include/stdlib.h:323:17: error: got strtof64_l
/usr/include/stdlib.h:337:18: error: Expected ; at end of declaration
/usr/include/stdlib.h:337:18: error: got strtof32x_l
Makefile:5288: recipe for target 'lib/aes128.lo' failed
make[2]: *** [lib/aes128.lo] Error 1
...

Tested on Travis: https://travis-ci.org/YiHungWei/ovs/builds/521979625

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovsdb raft: Avoid unnecessary reconnecting during leader election.
Han Zhou [Fri, 19 Apr 2019 19:17:47 +0000 (12:17 -0700)]
ovsdb raft: Avoid unnecessary reconnecting during leader election.

If a server claims itself as "disconnected", all clients connected
to that server will try to reconnect to a new server in the cluster.

However, currently a server claims itself as disconnected even
when it is itself a candidate trying to become the new leader (most
likely it will be), and all its clients will reconnect to another
node.

During a leader fail-over (e.g. due to a leader failure), it is
expected that all clients of the old leader will have to reconnect
to other nodes in the cluster, but it is unnecessary for all the
clients of a healthy node to reconnect, which could cause more
disturbance in a large scale environment.

This patch fixes the problem by slightly changing the condition under which
a server regards itself as disconnected: if its role is candidate,
it is regarded as disconnected only if the election didn't succeed
at the first attempt. Related failure test cases are also unskipped
and all pass with this patch.
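
The changed condition can be sketched as follows. This is a Python model for
illustration only, not the ovsdb raft code: the Role enum, the function name,
the election_attempts counter and the other_reason flag are all hypothetical
stand-ins.

```python
from enum import Enum


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


def regards_itself_disconnected(role, election_attempts, other_reason=False):
    """Model of the condition described in the commit message.

    election_attempts counts the elections this server has started in its
    current run as candidate (hypothetical bookkeeping).  other_reason
    stands in for every disconnection cause this commit does not touch.
    """
    if other_reason:
        return True
    if role is Role.CANDIDATE:
        # The first attempt is likely to succeed, so clients stay put;
        # only a retry counts as being disconnected.
        return election_attempts > 1
    return False


print(regards_itself_disconnected(Role.CANDIDATE, 1))  # first attempt
print(regards_itself_disconnected(Role.CANDIDATE, 2))  # retrying
```

The point of the design is that clients of a healthy node that is about to win
the election are not told to reconnect elsewhere.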

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovsdb-cluster-testsuite.at: Restores "clustered transactions" tests back.
Han Zhou [Fri, 19 Apr 2019 19:17:46 +0000 (12:17 -0700)]
ovsdb-cluster-testsuite.at: Restores "clustered transactions" tests back.

In commit 2bcb3b70 ("ovsdb raft: Move ovsdb cluster tests to separate
testsuite.") the "clustered transactions" tests were left unexecuted
because they depend on "EXECUTION_EXAMPLES", which is defined in
ovsdb-execution.at.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn: Generate ICMPv4 packet in router pipeline for larger packets
Numan Siddique [Mon, 22 Apr 2019 19:23:58 +0000 (00:53 +0530)]
ovn: Generate ICMPv4 packet in router pipeline for larger packets

This patch adds 2 stages in router pipeline after ARP_RESOLVE
and adds the logical flows to check the packet length and
generate ICMPv4 packet.

   * S_ROUTER_IN_CHK_PKT_LEN - Which checks the packet length using
                               check_pkt_larger OVN action

   * S_ROUTER_IN_LARGER_PKTS - Which generates icmp packet with
                               type 3 (Destination Unreachable),
                               code 4 (Frag Needed and DF was Set)
                               icmp4.frag_mtu = gw_mtu

In order to add these logical flows, CMS should set the
option 'gateway_mtu' for the distributed logical router port.

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn: Support OVS action 'check_pkt_larger' in OVN
Numan Siddique [Mon, 22 Apr 2019 19:23:55 +0000 (00:53 +0530)]
ovn: Support OVS action 'check_pkt_larger' in OVN

Previous commit added a new OVS action 'check_pkt_larger'. This
patch supports that action in OVN. The syntax to use this would be

reg0[0] = check_pkt_larger(LEN)

Upcoming commit will make use of this action in ovn-northd and
will generate an ICMPv4 packet if the packet length is greater than
the specified length.

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn: Add a new OVN action 'icmp4_error'
Numan Siddique [Mon, 22 Apr 2019 19:23:51 +0000 (00:53 +0530)]
ovn: Add a new OVN action 'icmp4_error'

This action is similar to the existing 'icmp4' OVN action except
that this action is expected to be used to generate an ICMPv4 packet
in response to an error in the original IP packet. When this action
injects the ICMPv4 packet, it also copies the original IP datagram
following the icmp4 header as per RFC 1122: 3.2.2.

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn: Add a new OVN field icmp4.frag_mtu
Numan Siddique [Mon, 22 Apr 2019 19:23:47 +0000 (00:53 +0530)]
ovn: Add a new OVN field icmp4.frag_mtu

In order to support OVN-specific fields (which are not yet
supported in Open vSwitch to set or modify values), generic
OVN field support is added in this patch. These OVN fields
get translated to controller actions.

This patch adds only one field for now - icmp4.frag_mtu.
It should be fairly straightforward to add similar fields in the
near future.

Example usage.
action=(icmp4 {"eth.dst <-> eth.src; "
        "icmp4.type = 3; /* Destination Unreachable */ "
        "icmp4.code = 4; /* Fragmentation Needed */ "
        "icmp4.frag_mtu = 1442; "
         ...
         "next; };")

action=(icmp4.frag_mtu = 1500; ..)

The pinctrl module of ovn-controller will set the specified value
in the low-order 16 bits of the ICMP4 header field that is
labelled "unused" in the ICMP specification, as defined in RFC 1191.
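
To make the header layout concrete, here is a minimal Python sketch of the
8-byte ICMPv4 header for a "Fragmentation Needed" message as laid out in
RFC 1191. The function name is hypothetical and the checksum is left at zero;
a real sender computes it over the whole ICMP message. This is an
illustration, not the pinctrl code.

```python
import struct


def icmp4_frag_needed_header(frag_mtu):
    """Build type 3 / code 4 header bytes with the next-hop MTU filled in.

    Layout (network byte order):
      type (1 byte), code (1 byte), checksum (2 bytes),
      high-order 16 bits of the "unused" field, kept at zero (2 bytes),
      low-order 16 bits of the "unused" field, the next-hop MTU (2 bytes).
    """
    if not 0 <= frag_mtu <= 0xFFFF:
        raise ValueError("frag_mtu must fit in 16 bits")
    checksum = 0  # placeholder, see the note above
    return struct.pack("!BBHHH", 3, 4, checksum, 0, frag_mtu)


header = icmp4_frag_needed_header(1442)
print(header.hex())
```

The last two bytes of the result carry the value given to icmp4.frag_mtu.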

Upcoming patch will use it to send an icmp4 packet if the
source IPv4 packet destined to go via external gateway needs to
be fragmented.

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago datapath: Add a new action check_pkt_len
Numan Siddique [Mon, 22 Apr 2019 19:23:43 +0000 (00:53 +0530)]
datapath: Add a new action check_pkt_len

Upstream commit:
    commit 4d5ec89fc8d14dcdab7214a0c13a1c7321dc6ea9
    Author: Numan Siddique <nusiddiq@redhat.com>
    Date:   Tue Mar 26 06:13:46 2019 +0530

    net: openvswitch: Add a new action check_pkt_len

    This patch adds a new action - 'check_pkt_len' which checks the
    packet length and executes a set of actions if the packet
    length is greater than the specified length or executes
    another set of actions if the packet length is lesser or equal to.

    This action takes below nlattrs
      * OVS_CHECK_PKT_LEN_ATTR_PKT_LEN - 'pkt_len' to check for

      * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER - Nested actions
        to apply if the packet length is greater than the specified 'pkt_len'

      * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL - Nested
        actions to apply if the packet length is lesser or equal to the
        specified 'pkt_len'.

    The main use case for adding this action is to solve the packet
    drops because of MTU mismatch in OVN virtual networking solution.
    When a VM (which belongs to a logical switch of OVN) sends a packet
    destined to go via the gateway router and if the nic which provides
    external connectivity, has a lesser MTU, OVS drops the packet
    if the packet length is greater than this MTU.

    With the help of this action, OVN will check the packet length
    and if it is greater than the MTU size, it will generate an
    ICMP packet (type 3, code 4) and includes the next hop mtu in it
    so that the sender can fragment the packets.

    Reported-at:
    https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
CC: Gregory Rose <gvrose8192@gmail.com>
CC: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use of 'nla_parse_strict()' (in validate_and_copy_check_len()) is available
only in recent kernels. So changed it to 'nla_parse_nested()'.

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago Add a new OVS action check_pkt_larger
Numan Siddique [Mon, 22 Apr 2019 19:23:38 +0000 (00:53 +0530)]
Add a new OVS action check_pkt_larger

This patch adds a new action 'check_pkt_larger' which checks if the
packet is larger than the given size and stores the result in the
destination register.

Usage: check_pkt_larger(len)->REGISTER
Eg. match=...,actions=check_pkt_larger(1442)->NXM_NX_REG0[0],next;

This patch makes use of the new datapath action - 'check_pkt_len'
which was recently added in the commit [1].
At the start of ovs-vswitchd, datapath is probed for this action.
If the datapath action is present, then 'check_pkt_larger'
makes use of this datapath action.

Datapath action 'check_pkt_len' takes these nlattrs
      * OVS_CHECK_PKT_LEN_ATTR_PKT_LEN - 'pkt_len' to check for
      * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER (optional) - Nested actions
        to apply if the packet length is greater than the specified 'pkt_len'
      * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL (optional) - Nested
        actions to apply if the packet length is lesser or equal to the
        specified 'pkt_len'.

Let's say we have these flows added to an OVS bridge br-int

table=0, priority=100,in_port=1,ip,actions=check_pkt_larger(100)->NXM_NX_REG0[0],resubmit(,1)
table=1, priority=200,in_port=1,ip,reg0=0x1/0x1 actions=output:3
table=1, priority=100,in_port=1,ip,actions=output:4

Then the action 'check_pkt_larger' will be translated as
  - check_pkt_len(size=100,gt(3),le(4))

The datapath will check the packet length and, if it is greater than 100,
it will output to port 3; otherwise it will output to port 4.
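
The three example flows can be modelled in a few lines of Python. This is only
a sketch of the decision they encode, with hypothetical function names; it
does not reflect how ovs-vswitchd or the datapath implement the action.

```python
def check_pkt_larger(packet_len, size):
    """Return the bit stored in the destination register.

    The bit is 1 if the packet is larger than size, else 0.
    """
    return 1 if packet_len > size else 0


def br_int_output_port(packet_len):
    """Walk the example flow tables for an IP packet arriving on port 1."""
    # table=0: store the comparison result in bit 0 of reg0, then resubmit.
    reg0 = check_pkt_larger(packet_len, 100)
    # table=1, priority=200: matches reg0=0x1/0x1.
    if reg0 & 0x1:
        return 3
    # table=1, priority=100: fallback for everything else.
    return 4


for length in (64, 100, 101, 1500):
    print(length, br_int_output_port(length))
```

Note that a packet of exactly 100 bytes takes the "less or equal" branch.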

If the datapath doesn't support the 'check_pkt_len' action, the OVS action
'check_pkt_larger' sets SLOW_ACTION so that the datapath flow is not added.

This OVS action is intended to be used by OVN to check the packet length
and generate an ICMP packet with type 3, code 4 and next hop mtu
in the logical router pipeline if the MTU of the physical interface
is smaller than the packet length. More information can be found here [2].

[1] - https://kernel.googlesource.com/pub/scm/linux/kernel/git/davem/net-next/+/4d5ec89fc8d14dcdab7214a0c13a1c7321dc6ea9
[2] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html

Reported-at:
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
CC: Ben Pfaff <blp@ovn.org>
CC: Gregory Rose <gvrose8192@gmail.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago netdev-linux: Add coverage counters for netdev_set_policing when ingress tc-offload
Tonghao Zhang [Sat, 20 Apr 2019 00:25:08 +0000 (17:25 -0700)]
netdev-linux: Add coverage counters for netdev_set_policing when ingress tc-offload

When tc-offload is enabled, we should add coverage counters for netdev_set_policing.

Fixes: e7f6ba220e10 ("lib/tc: add ingress ratelimiting support for tc-offload")
Cc: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago dpif-netdev: fix meter at high packet rate.
William Tu [Fri, 19 Apr 2019 22:26:41 +0000 (15:26 -0700)]
dpif-netdev: fix meter at high packet rate.

When testing a packet rate of around 1Mpps with a meter enabled, the meter
action is hit much more frequently, around every 30us.
As a result, the meter's calculation of 'uint32_t delta_t' is
always 0 and the meter action has no effect.  This is because the previous
commit 05f9e707e194 divides the delta by 1000 in order to convert to
msec granularity.  The patch fixes it by updating the time only when crossing
a millisecond boundary.
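
The truncation effect can be reproduced with a small Python model. It is a
sketch under stated assumptions rather than the dpif-netdev code: both
function names are hypothetical, timestamps are in microseconds, and the
"fixed" variant simply converts to milliseconds before taking the difference
so that the stored time only advances across millisecond boundaries.

```python
def elapsed_ms_truncating(timestamps_us):
    """Sum per-call deltas computed as (now - used) // 1000.

    The stored time advances on every call, so each sub-millisecond
    remainder is thrown away.
    """
    used = timestamps_us[0]
    total = 0
    for now in timestamps_us[1:]:
        total += (now - used) // 1000  # 0 whenever the gap is under 1 ms
        used = now
    return total


def elapsed_ms_boundary(timestamps_us):
    """Sum per-call deltas after converting each timestamp to msec first."""
    used = timestamps_us[0] // 1000
    total = 0
    for now_us in timestamps_us[1:]:
        now = now_us // 1000
        total += now - used  # non-zero only when a msec boundary is crossed
        used = now
    return total


# Meter hit every 30 us for 3 ms.
calls = list(range(0, 3001, 30))
print(elapsed_ms_truncating(calls), elapsed_ms_boundary(calls))
```

In the first variant the meter never sees time pass, which is why it has no
effect at high packet rates.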

Fixes: 05f9e707e194 ("dpif-netdev: Use microsecond granularity.")
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago selinux: update for netlink socket types
Aaron Conole [Wed, 17 Apr 2019 20:07:25 +0000 (16:07 -0400)]
selinux: update for netlink socket types

These are used for interfacing with conntrack, as well as by some
DPDK PMDs.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Ansis Atteka <aatteka@ovn.org>
5 years ago dpif-netdev: Update comment about flow installation race.
Ilya Maximets [Wed, 17 Apr 2019 08:43:56 +0000 (11:43 +0300)]
dpif-netdev: Update comment about flow installation race.

The userspace datapath has used per-PMD flow tables/classifiers for a long
time. However, it was decided to keep this race window to not block
revalidators. The comment should be updated to reflect the current state.

Fixes: 1c1e46ed8457 ("dpif-netdev: Add per-pmd flow-table/classifier.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years ago dpif-netdev: Fix double parsing of packets when EMC disabled.
Ilya Maximets [Mon, 11 Mar 2019 16:31:50 +0000 (19:31 +0300)]
dpif-netdev: Fix double parsing of packets when EMC disabled.

This partially reverts commit bde94613e6276d48a6e0be7a592ebcf9836b4aaf.

Commit bde94613e627 aimed to slightly ( < 1%) increase performance
in the case where EMC is disabled, but it avoids RSS hash calculation, so
OVS has to calculate it while executing OVS_ACTION_ATTR_HASH in order
to handle balanced-tcp bonding. At the time of executing that action
there is no parsed flow, and OVS parses the packet a second time
to calculate the hash. This happens for all packets received from
virtual interfaces because they have no HW RSS.

Here is an example of 'perf' output for VM --> (bonded PHY) traffic:

  Samples: 401K of event 'cycles', Event count (approx.): 50964771478
  Overhead  Shared Object       Symbol
    27.50%  ovs-vswitchd        [.] dpcls_lookup.370382
    16.30%  ovs-vswitchd        [.] rte_vhost_dequeue_burst.9267
    14.95%  ovs-vswitchd        [.] miniflow_extract
     7.22%  ovs-vswitchd        [.] flow_extract
     7.10%  ovs-vswitchd        [.] dp_netdev_input__.371002.4826
     4.01%  ovs-vswitchd        [.] fast_path_processing.370987.4893

We can see that the packet is parsed twice: the first time by
'miniflow_extract' right after receiving and the second time by
'flow_extract' while executing actions.

In this particular case calculating RSS on receive saves > 7% of the
total CPU processing time. It varies from ~7 to ~10 % depending on
scenario/traffic types.

It's better to calculate the hash each time because the performance
gain from avoiding it is negligible compared with the performance
drop when sending packets to a bonded interface.

Another solution could be to pass the parsed flow explicitly through
the datapath, but this will require big code changes and will have
additional overhead for metadata updating on packet changes.

Also, this change should have small impact since SMC works well in most
cases and will be enabled/recommended by default in the future.

CC: Antonio Fischetti <antonio.fischetti@intel.com>
Fixes: bde94613e627 ("dpif-netdev: Avoid reading RSS hash when EMC is disabled.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years ago ovn-nbctl: Fix 32-bit build with gcc.
Ilya Maximets [Wed, 17 Apr 2019 17:22:06 +0000 (20:22 +0300)]
ovn-nbctl: Fix 32-bit build with gcc.

ovn/utilities/ovn-nbctl.c: In function 'print_routing_policy':
ovn/utilities/ovn-nbctl.c:3620:23: error: format '%ld' expects argument
    of type 'long int', but argument 3 has type 'int64_t'
                       policy->match, policy->action, next_hop);
                       ^
ovn/utilities/ovn-nbctl.c:3624:23: error: format '%ld' expects argument
    of type 'long int', but argument 3 has type 'int64_t'
                       policy->match, policy->action);
                       ^
ovn/utilities/ovn-nbctl.c: In function 'cmd_ha_ch_grp_list':
ovn/utilities/ovn-nbctl.c:5056:27: error: format '%lu' expects argument
    of type 'long unsigned int', but argument 10 has type 'int64_t'
                           ha_ch->priority);
                           ^
cc1: all warnings being treated as errors
make[2]: *** [ovn/utilities/ovn-nbctl.o] Error 1

https://travis-ci.org/openvswitch/ovs/jobs/521015912

CC: Numan Siddique <nusiddiq@redhat.com>
CC: Mary Manohar <mary.manohar@nutanix.com>
Fixes: 1be1e0e5e0d1 ("ovn: Add generic HA chassis group")
Fixes: a64bb573468f ("Policy-based routing (PBR) in OVN.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago OVN: Add NEWS for Policy-based routing
Mary Manohar [Wed, 17 Apr 2019 02:05:30 +0000 (02:05 +0000)]
OVN: Add NEWS for Policy-based routing

Signed-off-by: Mary Manohar <mary.manohar at nutanix.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago compat: iptunnel: NULL pointer deref for ip_md_tunnel_xmit
Alan Maguire [Wed, 27 Mar 2019 15:32:19 +0000 (08:32 -0700)]
compat: iptunnel: NULL pointer deref for ip_md_tunnel_xmit

Upstream commit:
    commit f4b3ec4e6aa1a2ca437905a519ae08e8cf6af754
    Author: Alan Maguire <alan.maguire@oracle.com>
    Date:   Wed Mar 6 10:25:42 2019 +0000

    iptunnel: NULL pointer deref for ip_md_tunnel_xmit

    Naresh Kamboju noted the following oops during execution of selftest
    tools/testing/selftests/bpf/test_tunnel.sh on x86_64:

    [  274.120445] BUG: unable to handle kernel NULL pointer dereference
    at 0000000000000000
    [  274.128285] #PF error: [INSTR]
    [  274.131351] PGD 8000000414a0e067 P4D 8000000414a0e067 PUD 3b6334067 PMD 0
    [  274.138241] Oops: 0010 [#1] SMP PTI
    [  274.141734] CPU: 1 PID: 11464 Comm: ping Not tainted
    5.0.0-rc4-next-20190129 #1
    [  274.149046] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
    2.0b 07/27/2017
    [  274.156526] RIP: 0010:          (null)
    [  274.160280] Code: Bad RIP value.
    [  274.163509] RSP: 0018:ffffbc9681f83540 EFLAGS: 00010286
    [  274.168726] RAX: 0000000000000000 RBX: ffffdc967fa80a18 RCX: 0000000000000000
    [  274.175851] RDX: ffff9db2ee08b540 RSI: 000000000000000e RDI: ffffdc967fa809a0
    [  274.182974] RBP: ffffbc9681f83580 R08: ffff9db2c4d62690 R09: 000000000000000c
    [  274.190098] R10: 0000000000000000 R11: ffff9db2ee08b540 R12: ffff9db31ce7c000
    [  274.197222] R13: 0000000000000001 R14: 000000000000000c R15: ffff9db3179cf400
    [  274.204346] FS:  00007ff4ae7c5740(0000) GS:ffff9db31fa80000(0000)
    knlGS:0000000000000000
    [  274.212424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  274.218162] CR2: ffffffffffffffd6 CR3: 00000004574da004 CR4: 00000000003606e0
    [  274.225292] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  274.232416] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [  274.239541] Call Trace:
    [  274.241988]  ? tnl_update_pmtu+0x296/0x3b0
    [  274.246085]  ip_md_tunnel_xmit+0x1bc/0x520
    [  274.250176]  gre_fb_xmit+0x330/0x390
    [  274.253754]  gre_tap_xmit+0x128/0x180
    [  274.257414]  dev_hard_start_xmit+0xb7/0x300
    [  274.261598]  sch_direct_xmit+0xf6/0x290
    [  274.265430]  __qdisc_run+0x15d/0x5e0
    [  274.269007]  __dev_queue_xmit+0x2c5/0xc00
    [  274.273011]  ? dev_queue_xmit+0x10/0x20
    [  274.276842]  ? eth_header+0x2b/0xc0
    [  274.280326]  dev_queue_xmit+0x10/0x20
    [  274.283984]  ? dev_queue_xmit+0x10/0x20
    [  274.287813]  arp_xmit+0x1a/0xf0
    [  274.290952]  arp_send_dst.part.19+0x46/0x60
    [  274.295138]  arp_solicit+0x177/0x6b0
    [  274.298708]  ? mod_timer+0x18e/0x440
    [  274.302281]  neigh_probe+0x57/0x70
    [  274.305684]  __neigh_event_send+0x197/0x2d0
    [  274.309862]  neigh_resolve_output+0x18c/0x210
    [  274.314212]  ip_finish_output2+0x257/0x690
    [  274.318304]  ip_finish_output+0x219/0x340
    [  274.322314]  ? ip_finish_output+0x219/0x340
    [  274.326493]  ip_output+0x76/0x240
    [  274.329805]  ? ip_fragment.constprop.53+0x80/0x80
    [  274.334510]  ip_local_out+0x3f/0x70
    [  274.337992]  ip_send_skb+0x19/0x40
    [  274.341391]  ip_push_pending_frames+0x33/0x40
    [  274.345740]  raw_sendmsg+0xc15/0x11d0
    [  274.349403]  ? __might_fault+0x85/0x90
    [  274.353151]  ? _copy_from_user+0x6b/0xa0
    [  274.357070]  ? rw_copy_check_uvector+0x54/0x130
    [  274.361604]  inet_sendmsg+0x42/0x1c0
    [  274.365179]  ? inet_sendmsg+0x42/0x1c0
    [  274.368937]  sock_sendmsg+0x3e/0x50
    [  274.372460]  ___sys_sendmsg+0x26f/0x2d0
    [  274.376293]  ? lock_acquire+0x95/0x190
    [  274.380043]  ? __handle_mm_fault+0x7ce/0xb70
    [  274.384307]  ? lock_acquire+0x95/0x190
    [  274.388053]  ? __audit_syscall_entry+0xdd/0x130
    [  274.392586]  ? ktime_get_coarse_real_ts64+0x64/0xc0
    [  274.397461]  ? __audit_syscall_entry+0xdd/0x130
    [  274.401989]  ? trace_hardirqs_on+0x4c/0x100
    [  274.406173]  __sys_sendmsg+0x63/0xa0
    [  274.409744]  ? __sys_sendmsg+0x63/0xa0
    [  274.413488]  __x64_sys_sendmsg+0x1f/0x30
    [  274.417405]  do_syscall_64+0x55/0x190
    [  274.421064]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [  274.426113] RIP: 0033:0x7ff4ae0e6e87
    [  274.429686] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00
    00 00 00 8b 05 ca d9 2b 00 48 63 d2 48 63 ff 85 c0 75 10 b8 2e 00 00
    00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 53 48 89 f3 48 83 ec 10 48 89 7c
    24 08
    [  274.448422] RSP: 002b:00007ffcd9b76db8 EFLAGS: 00000246 ORIG_RAX:
    000000000000002e
    [  274.455978] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007ff4ae0e6e87
    [  274.463104] RDX: 0000000000000000 RSI: 00000000006092e0 RDI: 0000000000000003
    [  274.470228] RBP: 0000000000000000 R08: 00007ffcd9bc40a0 R09: 00007ffcd9bc4080
    [  274.477349] R10: 000000000000060a R11: 0000000000000246 R12: 0000000000000003
    [  274.484475] R13: 0000000000000016 R14: 00007ffcd9b77fa0 R15: 00007ffcd9b78da4
    [  274.491602] Modules linked in: cls_bpf sch_ingress iptable_filter
    ip_tables algif_hash af_alg x86_pkg_temp_thermal fuse [last unloaded:
    test_bpf]
    [  274.504634] CR2: 0000000000000000
    [  274.507976] ---[ end trace 196d18386545eae1 ]---
    [  274.512588] RIP: 0010:          (null)
    [  274.516334] Code: Bad RIP value.
    [  274.519557] RSP: 0018:ffffbc9681f83540 EFLAGS: 00010286
    [  274.524775] RAX: 0000000000000000 RBX: ffffdc967fa80a18 RCX: 0000000000000000
    [  274.531921] RDX: ffff9db2ee08b540 RSI: 000000000000000e RDI: ffffdc967fa809a0
    [  274.539082] RBP: ffffbc9681f83580 R08: ffff9db2c4d62690 R09: 000000000000000c
    [  274.546205] R10: 0000000000000000 R11: ffff9db2ee08b540 R12: ffff9db31ce7c000
    [  274.553329] R13: 0000000000000001 R14: 000000000000000c R15: ffff9db3179cf400
    [  274.560456] FS:  00007ff4ae7c5740(0000) GS:ffff9db31fa80000(0000)
    knlGS:0000000000000000
    [  274.568541] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  274.574277] CR2: ffffffffffffffd6 CR3: 00000004574da004 CR4: 00000000003606e0
    [  274.581403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  274.588535] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [  274.595658] Kernel panic - not syncing: Fatal exception in interrupt
    [  274.602046] Kernel Offset: 0x14400000 from 0xffffffff81000000
    (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
    [  274.612827] ---[ end Kernel panic - not syncing: Fatal exception in
    interrupt ]---
    [  274.620387] ------------[ cut here ]------------

    I'm also seeing the same failure on x86_64, and it reproduces
    consistently.

    From poking around it looks like the skb's dst entry is being used
    to calculate the mtu in:

    mtu = skb_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;

    ...but because that dst_entry has an "ops" value set to md_dst_ops,
    the various ops (including mtu) are not set:

    crash> struct sk_buff._skb_refdst ffff928f87447700 -x
          _skb_refdst = 0xffffcd6fbf5ea590
    crash> struct dst_entry.ops 0xffffcd6fbf5ea590
      ops = 0xffffffffa0193800
    crash> struct dst_ops.mtu 0xffffffffa0193800
      mtu = 0x0
    crash>

    I confirmed that the dst entry also has dst->input set to
    dst_md_discard, so it looks like it's an entry that's been
    initialized via __metadata_dst_init alright.

    I think the fix here is to use skb_valid_dst(skb), which also
    checks for DST_METADATA. With that fix in place, the problem,
    which was previously 100% reproducible, disappears.

    The below patch resolves the panic and all bpf tunnel tests pass
    without incident.
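
The check described above can be sketched in user space as follows. This is a minimal model, not the kernel code: the structures are cut down to the fields the message mentions, the DST_METADATA value is a placeholder, and routed_ops, md_ops, route_mtu() and pick_mtu() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-ins for the kernel structures named in the
 * commit message.  Field layout and the flag value are assumptions. */
#define DST_METADATA 0x0080

struct dst_entry;

struct dst_ops {
    unsigned int (*mtu)(const struct dst_entry *);
};

struct dst_entry {
    const struct dst_ops *ops;
    unsigned short flags;
};

struct net_device { unsigned int mtu; };
struct sk_buff { const struct dst_entry *dst; };

/* Hypothetical MTU callback of an ordinary routed dst. */
static unsigned int route_mtu(const struct dst_entry *dst)
{
    (void) dst;
    return 1400;
}

/* An ordinary dst has its ops filled in; a metadata dst leaves mtu NULL,
 * which is the situation shown in the crash session above. */
static const struct dst_ops routed_ops = { route_mtu };
static const struct dst_ops md_ops = { NULL };

/* Mirrors the idea of skb_valid_dst(): present and not a metadata dst. */
static int dst_is_valid(const struct sk_buff *skb)
{
    return skb->dst && !(skb->dst->flags & DST_METADATA);
}

/* Only consult the dst's mtu op when the dst is valid; otherwise fall
 * back to the device MTU instead of calling through a NULL pointer. */
static unsigned int pick_mtu(const struct sk_buff *skb,
                             const struct net_device *dev)
{
    assert(dev != NULL);
    return dst_is_valid(skb) ? skb->dst->ops->mtu(skb->dst) : dev->mtu;
}
```

With only a NULL test on the dst, the metadata case would call ops->mtu, which is NULL, matching the "RIP: (null)" in the trace.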

Fixes: c8b34e680a09 ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixed up for backward compatibility to our own compat layer ip_tunnel.c
module.

Cc: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agodatapath: fix missing checks for nla_nest_start
Kangjie Lu [Wed, 27 Mar 2019 15:32:18 +0000 (08:32 -0700)]
datapath: fix missing checks for nla_nest_start

Upstream commit:
    commit 0fff9bd47e1341b5c4db862cc39fc68ce45f165d
    Author: Kangjie Lu <kjlu@umn.edu>
    Date:   Fri Mar 15 01:11:22 2019 -0500

    net: openvswitch: fix missing checks for nla_nest_start

    nla_nest_start may fail and thus deserves a check.
    The fix returns -EMSGSIZE when it fails.
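
The pattern can be sketched with a toy message buffer. Everything below is a user-space stand-in: struct msg, toy_nest_start() and put_nested() are hypothetical names that only mimic the shape of the netlink attribute API, and the 64-byte capacity is arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy message buffer standing in for a netlink skb. */
struct msg {
    unsigned char data[64];
    size_t len;
};

/* Reserve room for a nested-attribute header.  Returns NULL when the
 * buffer has no room left, which is the failure the commit guards
 * against. */
static void *toy_nest_start(struct msg *m, size_t hdr_len)
{
    void *start;

    assert(m != NULL);
    if (hdr_len > sizeof m->data - m->len) {
        return NULL;
    }
    start = m->data + m->len;
    m->len += hdr_len;
    return start;
}

/* Caller pattern after the fix: check the result and return -EMSGSIZE
 * instead of carrying on with a NULL nest pointer. */
static int put_nested(struct msg *m)
{
    void *nest = toy_nest_start(m, 4);

    if (!nest) {
        return -EMSGSIZE;
    }
    /* ... nested attributes would be added here, then the nest closed ... */
    return 0;
}
```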

Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Kangjie Lu <kjlu@umn.edu>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agonet: openvswitch: fix a NULL pointer dereference
Kangjie Lu [Wed, 27 Mar 2019 15:32:17 +0000 (08:32 -0700)]
net: openvswitch: fix a NULL pointer dereference

Upstream commit:
    commit 6f19893b644a9454d85e593b5e90914e7a72b7dd
    Author: Kangjie Lu <kjlu@umn.edu>
    Date:   Thu Mar 14 23:20:16 2019 -0500

    net: openvswitch: fix a NULL pointer dereference

    upcall is dereferenced even when genlmsg_put fails. The fix
    jumps to the out label to avoid the NULL pointer dereference in this case.

Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Kangjie Lu <kjlu@umn.edu>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agodatapath: convert to kvmalloc
Kent Overstreet [Wed, 27 Mar 2019 15:32:16 +0000 (08:32 -0700)]
datapath: convert to kvmalloc

Upstream commit:
    commit ee9c5e67557f9663b27946ba1d3813fb1924b1fe
    Author: Kent Overstreet <kent.overstreet@gmail.com>
    Date:   Mon Mar 11 23:31:02 2019 -0700

    openvswitch: convert to kvmalloc

    Patch series "generic radix trees; drop flex arrays".

    This patch (of 7):

    There was no real need for this code to be using flexarrays, it's just
    implementing a hash table - ideally it would be using rhashtables, but
    that conversion would be significantly more complicated.

Link: http://lkml.kernel.org/r/20181217131929.11727-2-kent.overstreet@gmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agorhel ovn: Remove ovn-common rpm
Numan Siddique [Tue, 16 Apr 2019 09:01:53 +0000 (14:31 +0530)]
rhel ovn: Remove ovn-common rpm

ovn-fedora spec generates the rpms - ovn, ovn-common, ovn-host etc
in which ovn is an empty package. The ovn fedora spec file here [1]
has moved all the ovn-common files to the 'ovn' package.
This patch does the same.

[1] - https://src.fedoraproject.org/rpms/ovn/blob/master/f/ovn.spec

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
CC: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agonetlink linux: fix to append the netnsid netlink attr.
Flavio Leitner [Tue, 26 Mar 2019 17:15:00 +0000 (14:15 -0300)]
netlink linux: fix to append the netnsid netlink attr.

The attribute was being prepended to the netlink buffer, but
the function nl_sock_transact_multiple__() expects the netlink
header to come first so that it can update the length, seq and pid fields.

This patch appends the attribute instead of prepending it.
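
Why the ordering matters can be sketched with a toy buffer. The types below are simplified stand-ins (the real code uses struct nlmsghdr and struct ofpbuf), and toy_msg_init(), toy_msg_append() and toy_msg_stamp() are hypothetical helpers, not OVS functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Cut-down header: only the fields the commit message mentions. */
struct toy_hdr {
    uint32_t len;   /* total message length, header included */
    uint32_t seq;
    uint32_t pid;
};

struct toy_buf {
    unsigned char data[128];
    size_t size;
};

/* Start a message: the header always occupies the first bytes. */
static void toy_msg_init(struct toy_buf *b)
{
    struct toy_hdr hdr = { sizeof hdr, 0, 0 };

    memcpy(b->data, &hdr, sizeof hdr);
    b->size = sizeof hdr;
}

/* Append payload at the tail and refresh the length stored in the
 * header.  Returns 0 on success, -1 if the payload does not fit. */
static int toy_msg_append(struct toy_buf *b, const void *p, size_t n)
{
    struct toy_hdr hdr;

    assert(b->size >= sizeof hdr);
    if (n > sizeof b->data - b->size) {
        return -1;
    }
    memcpy(b->data + b->size, p, n);
    b->size += n;
    memcpy(&hdr, b->data, sizeof hdr);
    hdr.len = (uint32_t) b->size;
    memcpy(b->data, &hdr, sizeof hdr);
    return 0;
}

/* What a transact routine relies on: the header sits at offset 0, so
 * seq and pid can be patched in place. */
static void toy_msg_stamp(struct toy_buf *b, uint32_t seq, uint32_t pid)
{
    struct toy_hdr hdr;

    assert(b->size >= sizeof hdr);
    memcpy(&hdr, b->data, sizeof hdr);
    hdr.seq = seq;
    hdr.pid = pid;
    memcpy(b->data, &hdr, sizeof hdr);
}
```

Had the attribute been written in front of the header, toy_msg_stamp() would overwrite attribute bytes rather than the header fields.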

Fixes: 756819ddd788 ("netdev-linux: use netlink to update netdev.")
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agonetlink linux: account for the netnsid netlink attr.
Flavio Leitner [Tue, 26 Mar 2019 17:14:59 +0000 (14:14 -0300)]
netlink linux: account for the netnsid netlink attr.

The buffer needs to be reallocated and its data copied when
the netnsid netlink attribute is included, so avoid that
by accounting for the attribute when the buffer is initially
allocated.

Fixes: 756819ddd788 ("netdev-linux: use netlink to update netdev.")
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoAUTHORS: Add Liu Chang.
Ben Pfaff [Tue, 16 Apr 2019 22:39:44 +0000 (15:39 -0700)]
AUTHORS: Add Liu Chang.

Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovs-tcpdump: Improve error message when tcpdump is not available.
Liu Chang [Tue, 16 Apr 2019 22:38:35 +0000 (15:38 -0700)]
ovs-tcpdump: Improve error message when tcpdump is not available.

Signed-off-by: Liu Chang <txfh2007@aliyun.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agostream-ssl: Add support for TLS SNI (Server Name Indication).
Ben Pfaff [Thu, 21 Mar 2019 00:38:53 +0000 (17:38 -0700)]
stream-ssl: Add support for TLS SNI (Server Name Indication).

This TLS extension, introduced in RFC 3546, allows the server to know what
host the client believes it is contacting, the TLS equivalent of the Host:
header in HTTP.

Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Requested-by: Shivaram Mysore <smysore@servicefractal.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agodatapath: meter: Use struct_size() in kzalloc()
Gustavo A. R. Silva [Wed, 27 Mar 2019 17:14:00 +0000 (10:14 -0700)]
datapath: meter: Use struct_size() in kzalloc()

Upstream commit:
    commit c5c3899de09e307e3a0999ab8d620ab0ede05aa1
    Author: Gustavo A. R. Silva <gustavo@embeddedor.com>
    Date:   Tue Jan 15 15:19:17 2019 -0600

    openvswitch: meter: Use struct_size() in kzalloc()

    One of the more common cases of allocation size calculations is finding the
    size of a structure that has a zero-sized array at the end, along with
    memory for some number of elements for that array. For example:

    struct foo {
        int stuff;
        struct boo entry[];
    };

    instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

    Instead of leaving these open-coded and prone to type mistakes, we can now
    use the new struct_size() helper:

    instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

    This code was detected with the help of Coccinelle.
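
For readers outside the kernel, the idea behind struct_size() can be sketched in user space. This is an analogue under assumptions, not the kernel macro: flex_size() and foo_alloc() are hypothetical names, struct boo is given two arbitrary members, and calloc() stands in for kzalloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct boo { int a; int b; };

struct foo {
    int stuff;
    struct boo entry[];   /* flexible array member */
};

/* header + count * elem, saturating to SIZE_MAX on overflow so that the
 * allocation fails instead of silently under-allocating. */
static size_t flex_size(size_t header, size_t elem, size_t count)
{
    assert(elem != 0);
    if (count > (SIZE_MAX - header) / elem) {
        return SIZE_MAX;
    }
    return header + count * elem;
}

/* Zeroed allocation of a struct foo with room for 'count' entries.
 * Returns NULL on overflow or allocation failure. */
static struct foo *foo_alloc(size_t count)
{
    size_t bytes = flex_size(sizeof(struct foo), sizeof(struct boo), count);

    return bytes == SIZE_MAX ? NULL : calloc(1, bytes);
}
```

The open-coded form has no such guard: a large enough count wraps the multiplication and yields a buffer smaller than the caller expects.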

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use of struct_size() needed some compat layer adjustments to make use
of this new macro.  This patch pulls in some of the needed support
from the linux mm.h and overflow.h header files.  This new header
file support is also necessary for the following patch that converts
to use of kvmalloc().

Cc: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agofaq: Add information about git-pw.
Kevin Traynor [Thu, 28 Mar 2019 16:01:25 +0000 (16:01 +0000)]
faq: Add information about git-pw.

git-pw is similar to pwclient but it can apply series directly.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoOVN: add the possibility to configure a static IPv4/IPv6 address and dynamic MAC
Lorenzo Bianconi [Fri, 29 Mar 2019 15:58:57 +0000 (16:58 +0100)]
OVN: add the possibility to configure a static IPv4/IPv6 address and dynamic MAC

Add the possibility to configure a static IPv4 and/or IPv6 address
and get MAC address dynamically allocated. This can be done using the
following commands:

$ ovn-nbctl ls-add sw0
$ ovn-nbctl set Logical_Switch sw0 other_config:subnet=192.168.0.0/24
$ ovn-nbctl set Logical_Switch sw0 other_config:ipv6_prefix=2001::0
$ ovn-nbctl lsp-add sw0 lsp0 -- lsp-set-addresses lsp0 "dynamic 192.168.0.1 2001::1"

Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoconntrack: Fix minimum connections to clean.
Darrell Ball [Fri, 29 Mar 2019 16:50:47 +0000 (09:50 -0700)]
conntrack: Fix minimum connections to clean.

If the maximum connection count is configured low and a bucket holds
fewer than 10 connections, the computed maximum number of connections
to clean for that bucket can be zero, so these connections are not
cleaned unless and until the connection count in the bucket increases.

Fix this by checking for a low maximum connection count configuration
outside of the buckets loop, which also simplifies the loop.
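
The truncation problem can be illustrated as follows. The formulas are assumptions chosen to show the integer-division effect (a 10% share per bucket and a floor of 1); they are not copied from the patch, and N_BUCKETS, budget_unclamped(), clean_floor() and budget_clamped() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define N_BUCKETS 256   /* hypothetical bucket count */

static size_t max_size(size_t a, size_t b)
{
    return a > b ? a : b;
}

/* Budget with no floor: both terms truncate to zero when the bucket
 * holds fewer than 10 connections and the global limit is low. */
static size_t budget_unclamped(size_t in_bucket, size_t limit)
{
    return max_size(in_bucket / 10, limit / (N_BUCKETS * 10));
}

/* Computed once, outside the per-bucket loop: the smallest budget any
 * bucket receives.  Never zero. */
static size_t clean_floor(size_t limit)
{
    size_t per_bucket = limit / (N_BUCKETS * 10);

    return per_bucket ? per_bucket : 1;
}

/* Per-bucket budget inside the loop, using the precomputed floor. */
static size_t budget_clamped(size_t in_bucket, size_t floor)
{
    assert(floor > 0);
    return max_size(in_bucket / 10, floor);
}
```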

Fixes: e6ef6cc6349b ("conntrack: Periodically delete expired connections.")
CC: Daniele Di Proietto <diproiettod@ovn.org>
Reported-by: Liujiaxin <liujiaxin.2019@bytedance.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-March/357703.html
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agorhel: if rpms were built without libcapng then let processes run as root
Ansis Atteka [Tue, 16 Apr 2019 01:23:38 +0000 (18:23 -0700)]
rhel: if rpms were built without libcapng then let processes run as root

Otherwise, Open vSwitch will fail to start with the following
error "libcap-ng is not configured at compile time" when it
attempts to downgrade to Open vSwitch user.

Also, if packages were built in a way where processes are
supposed to be running only as root, then there is no point
in creating "openvswitch" user in the first place.

Signed-off-by: Ansis Atteka <aatteka@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
5 years agochassis.c: Return chassis record whenever available in chassis_run().
Han Zhou [Tue, 16 Apr 2019 18:42:04 +0000 (11:42 -0700)]
chassis.c: Return chassis record whenever available in chassis_run().

The ovn-controller main loop relies on the return value of chassis_run().
When ovnsb_idl_txn is NULL (i.e. there is a pending transaction for SB),
chassis_run() returns NULL, which unnecessarily blocks functions from
being executed in the main loop. This patch updates chassis_run() so that
it returns the chassis record whenever it is available.

This change allows xxx_run() functions to be executed whenever
br_int and chassis are not NULL. For functions that need to update
SB DB, there are already additional checks making sure ovnsb_idl_txn
is not NULL.

Acked-by: Numan Siddique <nusiddiq@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn-controller: Fix busy loop when sb disconnected.
Han Zhou [Tue, 16 Apr 2019 18:42:03 +0000 (11:42 -0700)]
ovn-controller: Fix busy loop when sb disconnected.

In the main loop, if the SB DB is disconnected while there is a pending
transaction, ovn-controller can enter a busy loop that uses 100% CPU
until the SB DB is connected again.

The root cause is that when a transaction is pending, ovsdb_idl_loop_run()
will return NULL for ovnsb_idl_txn, and chassis_run() returns NULL when
ovnsb_idl_txn is NULL, so the condition if (br_int && chassis) is not
satisfied and so ofctrl_run() is not executed in the main loop. If there
is any message pending from br-int.mgmt, such as OFPTYPE_BARRIER_REPLY or
OFPTYPE_ECHO_REQUEST, the main loop will be woken up again and again
because those messages are not processed because ofctrl_run() is not
invoked.

This patch fixes the problem by moving ofctrl_run() up and running it
whenever br_int is not NULL, regardless of chassis, because this
function doesn't depend on it.

It also moves sbrec_chassis_set_nb_cfg() out of the "if (ovs_idl_txn)"
block, just to avoid indenting the whole block further and exceeding
the 79-character line length.

Note: the changes in this patch are best viewed with "-w" because
most of them are indentation changes.

Acked-by: Numan Siddique <nusiddiq@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoPolicy-based routing (PBR) in OVN.
Mary Manohar [Wed, 3 Apr 2019 23:27:56 +0000 (23:27 +0000)]
Policy-based routing (PBR) in OVN.

PBR provides a mechanism to configure permit/deny and reroute policies on the
router. Permit/deny policies are similar to OVN ACLs, but exist on the
logical-router. Reroute policies are needed for service-insertion and
service-chaining. Currently, policies are stateless.

To achieve this, a new table is introduced in the ingress pipeline of the
Logical-router. The new table is between the 'IP Routing' and the 'ARP/ND
resolution' table. This way, PBR can override routing decisions and provide a
different next-hop.

This Patch:
a. Changes in OVN NB Schema to introduce a new table in the Logical
router.
b. Add commands to ovn-nbctl to add/delete/list routing policies.
c. Changes in ovn-northd to process routing-policy configurations.

A new table 'Logical_Router_Policy' has been added in the northbound schema.
The table has the following columns:
      * priority: Rules with numerically higher priority take precedence over
        those with lower.
      * match: Uses the same expression language as the 'match' column of
        'Logical_Flow' table in the OVN Southbound database.
      * action: allow/drop/reroute
      * nexthop: Nexthop IP address.

Each row in this table represents one routing policy for a logical router. The
'action' column for the highest priority matching row in this table determines a
packet's treatment. If no row matches, packets are allowed by default.
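
The selection rule in the paragraph above can be sketched as a lookup over policy rows. This is an illustration only: the packet model, the match callbacks (standing in for OVN match expressions) and the names policy_lookup() and policy_apply() are hypothetical, and the first row wins when two matching rows share a priority, a tie-break the text does not specify.

```c
#include <assert.h>
#include <stddef.h>

enum policy_action { POLICY_ALLOW, POLICY_DROP, POLICY_REROUTE };

struct packet { unsigned int src_ip; };   /* tiny hypothetical packet model */

struct policy {
    int priority;
    int (*match)(const struct packet *);  /* stands in for the 'match' column */
    enum policy_action action;
    const char *nexthop;                  /* only meaningful for POLICY_REROUTE */
};

/* Example match callbacks. */
static int match_any(const struct packet *pkt)
{
    (void) pkt;
    return 1;
}

static int match_src_10(const struct packet *pkt)
{
    return (pkt->src_ip >> 24) == 10;     /* source in 10.0.0.0/8 */
}

/* Returns the matching row with the numerically highest priority, or
 * NULL when no row matches. */
static const struct policy *policy_lookup(const struct policy *tbl, size_t n,
                                          const struct packet *pkt)
{
    const struct policy *best = NULL;
    size_t i;

    assert(tbl != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (tbl[i].match(pkt) && (!best || tbl[i].priority > best->priority)) {
            best = &tbl[i];
        }
    }
    return best;
}

/* Packets are allowed by default when no row matches. */
static enum policy_action policy_apply(const struct policy *tbl, size_t n,
                                       const struct packet *pkt)
{
    const struct policy *best = policy_lookup(tbl, n, pkt);

    return best ? best->action : POLICY_ALLOW;
}
```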

The new ovn-nbctl commands are as follows:
     1. Add a new ovn-nbctl command to add a routing policy.
        lr-policy-add ROUTER PRIORITY MATCH ACTION [NEXTHOP]

        Nexthop is an optional parameter. It needs to be provided only when
        'action' is 'reroute'. A policy is uniquely identified by priority and
        match. Multiple policies can have the same priority.

     2. Add a new ovn-nbctl command to delete a routing policy.
        lr-policy-del ROUTER [PRIORITY [MATCH]]

        Takes priority and match as optional parameters. If priority and match
        are specified, the policy with the given priority and match is deleted.
        If priority is specified and match is not specified, all rules with
        that priority are deleted. If priority is not specified, all the rules
        are deleted.

     3. Add a new ovn-nbctl command to list routing-policies in the logical
        router.
        lr-policy-list ROUTER

ovn-northd reads the routing policies from the northbound database and
populates them as logical flows in the southbound database. A new table
called 'POLICY' is introduced in the Logical router's ingress pipeline. Each
routing-policy configured in the northbound database translates into a single
logical flow in the new table.

The columns from the Logical_Router_Policy table are used as follows:
The priority column is used as priority in the logical-flow. The match column
is used as the 'match' string in the logical-flow. The action column is used to
determine the action of the logical-flow.

When the 'action' is reroute, if the nexthop ip-address is a connected
router port or the IP address of a logical port, the logical-flow is constructed
to route the packet to the nexthop ip-address.

Signed-off-by: Mary Manohar <mary.manohar@nutanix.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoOVN: fix DVR Floating IP support
Lorenzo Bianconi [Sat, 6 Apr 2019 15:42:52 +0000 (17:42 +0200)]
OVN: fix DVR Floating IP support

When DVR is enabled, FIP traffic needs to be forwarded directly over the
external connection to the underlay network instead of being distributed
through geneve tunnels.
Fix this by adding new logical flows to take care of distributed DNAT/SNAT.

Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn: Support a new Logical_Switch_Port.type - 'external'
Numan Siddique [Thu, 28 Mar 2019 06:10:17 +0000 (11:40 +0530)]
ovn: Support a new Logical_Switch_Port.type - 'external'

In the case of OpenStack + OVN, when the VMs are booted on
hypervisors supporting SR-IOV nics, there are no OVS ports
for these VMs. When these VMs send DHCPv4, DHCPv6 or IPv6
Router Solicitation requests, the local ovn-controller
cannot reply to these packets. OpenStack Neutron dhcp agent
service needs to be run to serve these requests.

With the new logical port type - 'external', OVN itself can
handle these requests avoiding the need to deploy any
external services like neutron dhcp agent.

To make use of this feature, CMS has to
 - create a logical port for such VMs
 - set the type to 'external'
 - create an HA chassis group and associate the logical port
   to it or associate an already existing HA chassis group.
 - create a localnet port for the logical switch
 - configure the ovn-bridge-mappings option in the OVS db.

The HA chassis with the highest priority becomes the master of
the HA chassis group, and the ovn-controller running on that
chassis claims the Port_Binding for that logical port
and adds the necessary DHCPv4/v6 OF flows. Since the packet
enters the logical switch pipeline via the localnet port,
the inport register (reg14) is set
to the tunnel key of the localnet port in the match conditions.

In case the chassis goes down for some reason, next higher
priority HA chassis becomes the master and claims the port.
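
The master election described above can be sketched as a selection over the group members. This is a simplified model: the 'alive' flag is an assumption standing in for whatever liveness signal is used, and struct ha_chassis and ha_master() are hypothetical names, not OVN code.

```c
#include <assert.h>
#include <stddef.h>

struct ha_chassis {
    const char *name;
    int priority;
    int alive;      /* assumed liveness flag */
};

/* Returns the live chassis with the highest priority, or NULL when no
 * chassis in the group is alive. */
static const struct ha_chassis *ha_master(const struct ha_chassis *grp,
                                          size_t n)
{
    const struct ha_chassis *best = NULL;
    size_t i;

    assert(grp != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (grp[i].alive && (!best || grp[i].priority > best->priority)) {
            best = &grp[i];
        }
    }
    return best;
}
```

When the current master goes down, re-running the selection yields the chassis with the next highest priority, which then claims the port.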

When the VM with the external port sends an ARP request for
the router IPs, only the chassis that has claimed the port
replies to the ARP request. The remaining chassis drop these
packets on receipt, in the ingress switch datapath stage
S_SWITCH_IN_EXTERNAL_PORT, which is just before
S_SWITCH_IN_L2_LKUP.

This would guarantee that only the chassis which has claimed
the external ports will run the router datapath pipeline.

Acked-by: Mark Michelson <mmichels@redhat.com>
Acked-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn-northd: Delete the references to gateway_chassis in SB DB
Numan Siddique [Thu, 28 Mar 2019 06:10:11 +0000 (11:40 +0530)]
ovn-northd: Delete the references to gateway_chassis in SB DB

The previous patch in the series added support in ovn-controller
for using the ha_chassis_group table in SB DB, instead of the
gateway_chassis table, to support HA chassis and to establish BFD
tunnels. There is therefore no need for ovn-northd to create any
gateway_chassis rows in SB DB. This patch stops creating them and
deletes the code that is no longer required.

This patch also now supports 'ha_chassis_group' to be associated
with a distributed logical router port and ignores 'gateway_chassis'
and 'redirect-chassis' if set along with 'ha_chassis_group'.

Acked-by: Mark Michelson <mmichels@redhat.com>
Acked-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn-controller: Make use of ha_chassis_group table to bind the chassisredirect ports
Numan Siddique [Thu, 28 Mar 2019 06:10:03 +0000 (11:40 +0530)]
ovn-controller: Make use of ha_chassis_group table to bind the chassisredirect ports

This patch uses the newly added ha_chassis_group table in Southbound DB

 - to bind the chassisredirect ports.

 - to establish BFD sessions with the required chassis. The previous patch
   in this series sets the list of chassis that reference an HA chassis group
   in the 'ref_chassis' column of the 'ha_chassis_group' table (in ovn-northd).
   This patch uses that information to establish BFD sessions with only the
   required chassis. There is no need to traverse the local_datapath list
   to determine whether a local chassis has to establish a BFD session with
   another chassis. For example, if chassis HV1, HV2 and HV3 are part of a
   chassis group G1, and G1 is referenced by compute chassis C1 and C2, then
   chassis C1 will establish BFD sessions with HV1, HV2 and HV3 since C1
   references the group G1. The HA chassis HV1, HV2 and HV3 also establish
   BFD sessions amongst themselves and with C1 and C2.

This patch also deletes the old code (which used gateway_chassis table)
to bind the chassisredirect port.

The rationale behind the refactor is to make the HA chassis binding support
generic, so that logical ports of type 'external' (which will be
added in an upcoming patch) can also make use of it, and to simplify
the gateway chassis support code in OVN. Functionally, this new
approach is the same as the older one.

Acked-by: Mark Michelson <mmichels@redhat.com>
Acked-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn: Add generic HA chassis group
Numan Siddique [Thu, 28 Mar 2019 06:09:54 +0000 (11:39 +0530)]
ovn: Add generic HA chassis group

This patch adds the tables 'HA_Chassis_Group' and 'HA_Chassis' to
both the OVN Northbound and Southbound DBs to support generic HA chassis
groups in OVN. The CMS can create a group of HA chassis with a priority
assigned to each chassis in the group. An HA chassis group can be associated
with a distributed logical router port. An upcoming patch will make
use of it while supporting 'external' logical ports.

HA chassis group is similar to the existing gateway chassis support in
OVN which is used by the distributed gateway router ports.
This patch tries to abstract this so that, the HA chassis support
can be leveraged by not just distributed gateway router ports.

If a logical router port has a set of gateway chassis associated to
it, ovn-northd will create HA chassis group in Southbound
DB and add these gateway chassis to this group. ovn-northd would still create
gateway chassis in Southbound DB as ovn-controller still doesn't support
using the HA chassis group.

Next patch in the series will add the support in ovn-controller to
make use of HA chassis group instead of gateway chassis. The patch following
that will delete creation of gateway chassis in Southbound DB.

The HA_Chassis_Group table in Southbound DB has a column, 'ref_chassis'.
This column stores the list of chassis that reference the
HA chassis group. This information will be used by ovn-controller in an
upcoming patch to establish BFD sessions with the required chassis.

Suppose there is an HA chassis group 'hagrp1' in the Southbound
DB with the HA chassis list ha1, ha2 and ha3, and this HA chassis
group is used by a distributed logical router port. Then ovn-northd
will update 'ref_chassis' with the list of chassis that have claimed
the logical switch ports connected to the logical router
that has this distributed logical router port.

Acked-by: Han Zhou <hzhou8@ebay.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn-northd: Reuse the hmaps - datapaths and ports in ovnsb_db_run()
Numan Siddique [Thu, 28 Mar 2019 06:09:48 +0000 (11:39 +0530)]
ovn-northd: Reuse the hmaps - datapaths and ports in ovnsb_db_run()

We can reuse the datapaths and ports built during ovnnb_db_run()
in ovnsb_db_run(). This way we avoid creating the logical ports hash nodes
during the ovnsb_db_run().

An upcoming patch will make further use of these hashmaps during ovnsb_db_run().

This patch refactors the code accordingly.

Acked-by: Mark Michelson <mmichels@redhat.com>
Acked-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agocompiler: Fix compilation when using VStudio 2015/2017
Alin Gabriel Serdean [Wed, 3 Apr 2019 12:01:55 +0000 (15:01 +0300)]
compiler: Fix compilation when using VStudio 2015/2017

This is somewhat a regression of:
https://github.com/openvswitch/ovs/commit/27f141d44d95b4cabfd7eac47ace8d1201668b2c

The main issue is that `offsetof` from <stddef.h>, as provided by the C
compiler in MSVC 2015/2017, is buggy:
https://bit.ly/2UvWwti

Until it is fixed, we provide our own definition of `offsetof`.

Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Anand Kumar <kumaranand@vmware.com>
5 years agodatapath: Revert "datapath: Fix template leak in error cases."
Flavio Leitner [Wed, 3 Apr 2019 16:49:13 +0000 (09:49 -0700)]
datapath: Revert "datapath: Fix template leak in error cases."

Upstream commit:
    commit 7f6d6558ae44bc193eb28df3617c364d3bb6df39
    Author: Flavio Leitner <fbl@redhat.com>
    Date:   Fri Sep 28 14:55:34 2018 -0300

    Revert "openvswitch: Fix template leak in error cases."
    This reverts commit 90c7afc.

    When the commit was merged, the code used nf_ct_put() to free
    the entry, but later on commit 7664423 ("openvswitch: Free
    tmpl with tmpl_free.") replaced that with nf_ct_tmpl_free, which
    is more appropriate. Now the original problem is removed.

    Then 44d6e2f ("net: Replace NF_CT_ASSERT() with WARN_ON().")
    replaced a debug assert with a WARN_ON(), which is now triggered.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch backports this upstream patch to OVS.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agofaq: Explain why select groups don't sort out packets evenly.
Ben Pfaff [Fri, 8 Mar 2019 01:47:39 +0000 (17:47 -0800)]
faq: Explain why select groups don't sort out packets evenly.

This keeps coming up.

Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb raft: Fix duplicated transaction execution when leader failover.
Han Zhou [Fri, 12 Apr 2019 23:26:28 +0000 (16:26 -0700)]
ovsdb raft: Fix duplicated transaction execution when leader failover.

When a transaction is submitted from a client connected to a follower,
the leader may crash after receiving the execute_command_request from the
follower and sending out the append request to the majority of followers,
but before sending execute_command_reply to the follower. In that case the
transaction is eventually committed by the new leader. However,
with the current implementation the transaction would be committed twice.

For the root cause, there are two cases:

Case 1, the connected follower becomes the new leader. In this case,
the pending command of the follower will be cancelled during its role
changing to leader, so the trigger for the transaction will be retried.

Case 2, another follower becomes the new leader. In this case, since
there is no execute_command_reply from the original leader (which has
crashed), the command will eventually time out, causing the trigger for
the transaction to be retried.

In both cases, the transaction will be retried by the server node's
trigger retrying logic. This patch fixes the problem with the changes below:

1) A pending command can be completed not only by
execute_command_reply, but also when the eid is committed, if the
execute_command_reply never came.

2) Instead of cancelling all pending commands during a role change, let
the commands continue waiting to be completed when the eid is
committed. The timer is increased to twice the election base time,
so that a command has the chance to be completed when the leader crashes.

This patch fixes the two raft failure test cases previously disabled.
See the test case for details of how to reproduce the problem.
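
The two changes above can be sketched as a toy model. This is only an
illustration in Python, not the C code in ovsdb/raft.c; the class and
method names (PendingCommand, Follower, on_entry_committed, ...) are
hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PendingCommand:
    eid: str                      # entry ID the command is waiting on
    status: Optional[str] = None  # None while the command is still pending


@dataclass
class Follower:
    # Hypothetical stand-in for a server's table of pending commands.
    pending: Dict[str, PendingCommand] = field(default_factory=dict)

    def submit(self, eid: str) -> PendingCommand:
        cmd = PendingCommand(eid)
        self.pending[eid] = cmd
        return cmd

    def on_execute_command_reply(self, eid: str, status: str) -> None:
        cmd = self.pending.pop(eid, None)
        if cmd is not None:       # ignore replies for completed commands
            cmd.status = status

    def on_entry_committed(self, eid: str) -> None:
        # Change 1: a committed eid completes the command even if the
        # execute_command_reply never arrived.
        self.on_execute_command_reply(eid, "success")

    def on_role_change(self) -> None:
        # Change 2: pending commands are kept instead of being cancelled.
        pass


follower = Follower()
cmd = follower.submit("eid-1")
follower.on_role_change()             # leader crashed, new election
follower.on_entry_committed("eid-1")  # new leader commits the entry
print(cmd.status)
```

Because the command completes exactly once, the trigger has no reason
to retry the transaction in this model.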

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb raft: cmd->eid should always be non-null.
Han Zhou [Fri, 12 Apr 2019 23:26:27 +0000 (16:26 -0700)]
ovsdb raft: cmd->eid should always be non-null.

raft_command's eid should always be non-null in all 3 cases. Fix the
comment, and also replace the if condition with an assert.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb raft: Test cases for cluster failures when there are pending transactions.
Han Zhou [Fri, 12 Apr 2019 23:26:26 +0000 (16:26 -0700)]
ovsdb raft: Test cases for cluster failures when there are pending transactions.

Implement test cases for the failure scenarios when there are pending
transactions from clients. This patch implements test cases for different
combinations of conditions with the help of previously added test
commands and options for cluster mode. The conditions include:

- Connected node from which client transaction is executed: leader, follower
- Crashed node: leader, follower that is connected, or the other follower
- Crash point:
    - For leader:
        - before/after receiving execute_command_request
        - before/after sending append_request
        - before/after sending execute_command_reply
    - For follower:
        - before/after sending execute_command_request
        - after receiving append_request

There are 16 test cases in total, and 9 of them are purposely skipped
to avoid CI failures, because of the bugs that the test cases found. They
will be enabled in coming patches when the corresponding bugs are fixed.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn-nbctl: Support --no-shuffle-remotes.
Han Zhou [Fri, 12 Apr 2019 23:26:25 +0000 (16:26 -0700)]
ovn-nbctl: Support --no-shuffle-remotes.

Support the --no-shuffle-remotes option for ovn-nbctl. It is mainly for
testing purposes: it lets us specify the order in which the client fails over
when the connected node is down, which makes the test cases more predictable.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb-idl: Support optionally not shuffling multiple remotes.
Han Zhou [Fri, 12 Apr 2019 23:26:24 +0000 (16:26 -0700)]
ovsdb-idl: Support optionally not shuffling multiple remotes.

This patch allows remotes to be left unshuffled if desired (mostly for
testing purposes, when the order of remotes during retries needs to
be predictable). By default it still shuffles, as it does today.
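
The behaviour can be sketched in a few lines of Python. This is not the
ovsdb-idl C API; order_remotes() and its parameters are hypothetical
names used only to illustrate the default-on shuffle with an opt-out.

```python
import random


def order_remotes(remotes, shuffle=True, rng=None):
    """Return the order in which remotes would be tried.

    shuffle=True models the default behaviour; shuffle=False keeps the
    caller's order so that failover is predictable in tests.
    """
    ordered = list(remotes)            # never mutate the caller's list
    if shuffle:
        (rng or random).shuffle(ordered)
    return ordered


remotes = ["tcp:10.0.0.1:6641", "tcp:10.0.0.2:6641", "tcp:10.0.0.3:6641"]
print(order_remotes(remotes, shuffle=False))
```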

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb raft: Support commands that are required for testing failure scenarios.
Han Zhou [Fri, 12 Apr 2019 23:26:23 +0000 (16:26 -0700)]
ovsdb raft: Support commands that are required for testing failure scenarios.

Added unix commands cluster/... for ovsdb raft, which will be used in a future
patch to test more fine-grained failure scenarios. The commands either cause
a node to crash at a certain point, or manipulate the election timer so that
we can control the election process and elect the new leader we want for the
test cases.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb raft: Sync commit index to followers without delay.
Han Zhou [Fri, 12 Apr 2019 23:26:22 +0000 (16:26 -0700)]
ovsdb raft: Sync commit index to followers without delay.

When an update is requested from a follower, the leader sends AppendRequest
to all followers and waits until AppendReply is received from a majority; it
then updates the commit index, and the new entry is regarded as committed
in the raft log. However, followers (including the one that initiated the
request) are not notified of this commit until the next heartbeat (ping
timeout) if there are no other pending requests. This results in long latency
for updates made through followers, especially when a batch of updates
is requested through the same follower.

$ time for i in `seq 1 100`; do ovn-nbctl ls-add ls$i; done

real    0m34.154s
user    0m0.083s
sys     0m0.250s

This patch solves the problem by sending a heartbeat as soon as the commit
index is updated in the leader. It also avoids unnecessary heartbeats by
resetting the ping timer whenever AppendRequest is broadcast. With this patch
the performance improves by more than 50 times in the same test:

$ time for i in `seq 1 100`; do ovn-nbctl ls-add ls$i; done

real    0m0.564s
user    0m0.080s
sys     0m0.199s

Torture test cases are also updated because otherwise the tests will
all be skipped because of the improved performance.
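
As a quick check of the "more than 50 times" figure, the two "real"
timings quoted above can be compared directly. The helper name
parse_real() is hypothetical; the numbers are the ones from this
commit message.

```python
def parse_real(value):
    """Convert a `time` field such as '0m34.154s' to seconds."""
    minutes, seconds = value.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


before = parse_real("0m34.154s")   # 100 ls-add commands, before the patch
after = parse_real("0m0.564s")     # same loop, after the patch

speedup = before / after
print(f"speedup: {speedup:.1f}x")
print(f"per command: {before * 10:.1f} ms -> {after * 10:.2f} ms")
```

The ratio comes out at roughly 60x, which works out to about 340 ms
per command before the patch and under 6 ms after it.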

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb raft: Move ovsdb cluster tests to separate testsuite.
Han Zhou [Fri, 12 Apr 2019 23:26:21 +0000 (16:26 -0700)]
ovsdb raft: Move ovsdb cluster tests to separate testsuite.

Tests in ovsdb-cluster.at are relatively slow, especially the torture
tests, and future changes that make them more effective will also make
them more CPU-intensive. So we move the tests to a separate
testsuite, so that we can execute them separately, probably with
lower parallelism to avoid exhausting system resources.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb.at: Move ovsdb macros from ovsdb.at to ovsdb-macros.at.
Han Zhou [Fri, 12 Apr 2019 23:26:20 +0000 (16:26 -0700)]
ovsdb.at: Move ovsdb macros from ovsdb.at to ovsdb-macros.at.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agodebian: Remove Ben Pfaff from Uploaders field.
Ben Pfaff [Fri, 12 Apr 2019 17:00:02 +0000 (10:00 -0700)]
debian: Remove Ben Pfaff from Uploaders field.

I don't want to claim to be in charge of upstream Debian packaging anymore.
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agonetdev-vport: Use the dst_port in tunnel netdev name
Chris Mi [Sat, 13 Apr 2019 08:09:37 +0000 (16:09 +0800)]
netdev-vport: Use the dst_port in tunnel netdev name

If tunnel device dst_port is not the default one, "ovs-dpctl dump-flows"
will fail. The error message for vxlan is:

netdev_linux|INFO|ioctl(SIOCGIFINDEX) on vxlan_sys_4789 device failed: No such device

That's because when calling netdev_vport_construct() for netdev
vxlan_sys_xxxx, the default dst_port is used. Actually, the dst_port
value is in the netdev name. Use it to avoid the error.
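
The idea can be sketched as follows. This is a Python illustration, not
the netdev-vport.c code; dst_port_from_name() is a hypothetical helper,
and the "vxlan_sys_" prefix and default port 4789 are taken from the
error message above.

```python
DEFAULT_VXLAN_PORT = 4789  # default seen in the error message above


def dst_port_from_name(name, prefix="vxlan_sys_", default=DEFAULT_VXLAN_PORT):
    """Recover the tunnel dst_port from a name such as 'vxlan_sys_4790'.

    Falls back to the default when the name carries no usable port.
    """
    if not name.startswith(prefix):
        return default
    suffix = name[len(prefix):]
    if suffix.isascii() and suffix.isdigit() and 0 < int(suffix) <= 65535:
        return int(suffix)
    return default


print(dst_port_from_name("vxlan_sys_4790"))
```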

Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agocheckpatch: Fix handling of line endings.
Ilya Maximets [Mon, 15 Apr 2019 13:36:54 +0000 (16:36 +0300)]
checkpatch: Fix handling of line endings.

Unlike manual splitting, 'splitlines' correctly handles different
line endings. Without this change, the script fails to check files with
'\r\n' endings, treating the whole patch as a header.
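
The difference is easy to reproduce. The sample patch text below is
made up, and the "first empty line ends the header" rule is an
assumption for illustration rather than a description of checkpatch's
actual parser.

```python
patch = "From: a@example.org\r\nSubject: demo\r\n\r\nbody line\r\n"

manual = patch.split("\n")   # leaves a trailing '\r' on every line
proper = patch.splitlines()  # understands '\n', '\r\n' and '\r'

# Assumed rule: the header ends at the first empty line.
print(manual.index(""))  # only the trailing empty string matches
print(proper.index(""))  # the real header/body separator
```

With manual splitting the separator line is "\r" rather than "", so
the first empty string is the one after the final newline and the
whole patch looks like a header.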

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agopvector: Document the entry destruction policy.
Ilya Maximets [Mon, 15 Apr 2019 10:21:00 +0000 (13:21 +0300)]
pvector: Document the entry destruction policy.

This describes how to safely destroy pvector entries after removal.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoDocs: fix conntrack flow ct_state input
LIU Yulong [Tue, 9 Apr 2019 07:48:08 +0000 (15:48 +0800)]
Docs: fix conntrack flow ct_state input

In the following environment:
  ovs-vsctl (Open vSwitch) 2.11.0
  DB Schema 7.16.1

we hit the following errors during the tutorial's
conntrack test:
  "ovs-ofctl: field +est missing value"
  "ovs-ofctl: field +trk missing value"
ovs-vsctl 2.9.0 has the same issue.

This patch updates the tutorial to use the right
conntrack input.

Signed-off-by: LIU Yulong <i@liuyulong.me>
5 years agoodp-util: Add FLOW_WC_SEQ assertions.
Ben Pfaff [Fri, 29 Mar 2019 19:19:10 +0000 (12:19 -0700)]
odp-util: Add FLOW_WC_SEQ assertions.

The assertions make it easier to find all the places that need to be
updated when adding protocol support.

Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoflow: Add FLOW_WC_SEQ assertions and improve comments.
Ben Pfaff [Thu, 28 Mar 2019 16:49:01 +0000 (09:49 -0700)]
flow: Add FLOW_WC_SEQ assertions and improve comments.

The assertions make it easier to find all the places that need to be
updated when adding protocol support.

Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agometa-flow: Add comment.
Ben Pfaff [Thu, 28 Mar 2019 23:01:41 +0000 (16:01 -0700)]
meta-flow: Add comment.

Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoextract-ofp-fields: Improve error message.
Ben Pfaff [Thu, 28 Mar 2019 23:00:10 +0000 (16:00 -0700)]
extract-ofp-fields: Improve error message.

Without this change, it's not obvious what needs to be edited.

Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agodatapath: fix flow actions reallocation
Andrea Righi [Wed, 10 Apr 2019 22:50:22 +0000 (15:50 -0700)]
datapath: fix flow actions reallocation

Upstream commit:
    commit f28cd2af22a0c134e4aa1c64a70f70d815d473fb
    Author: Andrea Righi <andrea.righi@canonical.com>
    Date:   Thu Mar 28 07:36:00 2019 +0100

    openvswitch: fix flow actions reallocation

    The flow action buffer can be resized if it's not big enough to contain
    all the requested flow actions. However, this resize doesn't take into
    account the new requested size, the buffer is only increased by a factor
    of 2x. This might be not enough to contain the new data, causing a
    buffer overflow, for example:

    [   42.044472] =============================================================================
    [   42.045608] BUG kmalloc-96 (Not tainted): Redzone overwritten
    [   42.046415] -----------------------------------------------------------------------------

    [   42.047715] Disabling lock debugging due to kernel taint
    [   42.047716] INFO: 0x8bf2c4a5-0x720c0928. First byte 0x0 instead of 0xcc
    [   42.048677] INFO: Slab 0xbc6d2040 objects=29 used=18 fp=0xdc07dec4 flags=0x2808101
    [   42.049743] INFO: Object 0xd53a3464 @offset=2528 fp=0xccdcdebb

    [   42.050747] Redzone 76f1b237: cc cc cc cc cc cc cc cc                          ........
    [   42.051839] Object d53a3464: 6b 6b 6b 6b 6b 6b 6b 6b 0c 00 00 00 6c 00 00 00  kkkkkkkk....l...
    [   42.053015] Object f49a30cc: 6c 00 0c 00 00 00 00 00 00 00 00 03 78 a3 15 f6  l...........x...
    [   42.054203] Object acfe4220: 20 00 02 00 ff ff ff ff 00 00 00 00 00 00 00 00   ...............
    [   42.055370] Object 21024e91: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    [   42.056541] Object 070e04c3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    [   42.057797] Object 948a777a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    [   42.059061] Redzone 8bf2c4a5: 00 00 00 00                                      ....
    [   42.060189] Padding a681b46e: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ

    Fix by making sure the new buffer is properly resized to contain all the
    requested data.
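
The sizing problem can be shown with plain arithmetic. This Python
sketch is not the kernel code; the function names and the sample sizes
are hypothetical (96 is borrowed from the kmalloc-96 cache in the trace
above).

```python
def doubled_size(current_size):
    """Old behaviour: grow by 2x regardless of what was requested."""
    return current_size * 2


def fixed_size(current_size, used, requested):
    """Grow by 2x, but never to less than what the new data needs."""
    return max(current_size * 2, used + requested)


current, used, requested = 96, 90, 120
print(doubled_size(current), "bytes available,", used + requested, "needed")
print(fixed_size(current, used, requested), "bytes with the fix")
```

Doubling yields 192 bytes while 210 are needed, which is the overflow
case; taking the maximum of the two covers it and still doubles for
small requests.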

BugLink: https://bugs.launchpad.net/bugs/1813244
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Andrea Righi <andrea.righi@canonical.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoacinclude: Use AC_SEARCH_LIBS for linking with dl.
Ilya Maximets [Thu, 11 Apr 2019 07:29:43 +0000 (10:29 +0300)]
acinclude: Use AC_SEARCH_LIBS for linking with dl.

DPDK uses dlopen to load plugins and we need to search for the
library containing this function. But we should not do this
in a loop because 'AC_SEARCH_LIBS' could do this for us.
Also, 'AC_SEARCH_LIBS' prints user-visible messages that are
useful for debugging.
This also adds a new 'checking' message and normalizes the code to
be more readable.

With this change we'll have the following additional messages:

  checking for library containing dlopen... -ldl
  checking whether linking with dpdk works... yes

Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoacinclude: Transparent checking for DPDK dependencies.
Ilya Maximets [Thu, 11 Apr 2019 07:29:42 +0000 (10:29 +0300)]
acinclude: Transparent checking for DPDK dependencies.

'AC_CHECK_DECL' does almost the same thing as 'AC_COMPILE_IFELSE', but
looks cleaner. Additionally, it prints the check results in a
user-visible way, making it easy to understand which configs were checked
and why we need one dependency or another.

For example, with this patch, the configure log may look like this:

  checking whether dpdk datapath is enabled... yes
  checking for rte_config.h... yes
  checking whether RTE_LIBRTE_VHOST_NUMA is declared... no
  checking whether RTE_EAL_NUMA_AWARE_HUGEPAGES is declared... yes
  checking for library containing get_mempolicy... -lnuma
  checking whether RTE_LIBRTE_VHOST_NUMA is declared... (cached) no
  checking whether RTE_LIBRTE_PMD_PCAP is declared... yes
  checking for library containing pcap_dump... -lpcap
  checking whether RTE_LIBRTE_PDUMP is declared... yes
  checking whether RTE_LIBRTE_MLX5_PMD is declared... no
  checking whether RTE_LIBRTE_MLX4_PMD is declared... yes
  checking whether RTE_LIBRTE_MLX4_DLOPEN_DEPS is declared... yes

Instead of just:

  checking whether dpdk datapath is enabled... yes
  checking for rte_config.h... yes
  checking for library containing get_mempolicy... -lnuma
  checking for library containing pcap_dump... -lpcap

Anyway, the code looks cleaner and is easier to understand. Also, with
this change we define VHOST_NUMA only if RTE_LIBRTE_VHOST_NUMA is
defined. This costs nothing as all the checks with 'AC_CHECK_DECL'
are cached.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agonetdev-rte-offloads: Fix printing masks with wrong byte order.
Ilya Maximets [Tue, 26 Mar 2019 12:43:19 +0000 (15:43 +0300)]
netdev-rte-offloads: Fix printing masks with wrong byte order.

'spec's and 'mask's should be printed in the same byte order.

Fixes: daf90186e291 ("netdev-dpdk: add debug for rte flow patterns")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Roni Bar Yanai <roniba@mellanox.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years agonetdev-dpdk: Allocate vhost_id dynamically.
Ilya Maximets [Tue, 5 Mar 2019 16:28:27 +0000 (19:28 +0300)]
netdev-dpdk: Allocate vhost_id dynamically.

'vhost_id' is an array of 'PATH_MAX' bytes in the middle of
'netdev_dpdk' structure. That is 4K bytes.

'vhost_id' is never used on a hot path and there is no need to keep
it inside the structure memory. Dynamic allocation allows us to
shrink 'struct netdev_dpdk' significantly, saving 4KB per ETH
port (ETH ports don't use 'vhost_id') and almost the same amount per
vhost port (real 'vhost_id's are, in the common case, much shorter).
We could save the pointer space by making a union with 'devargs',
which is mutually exclusive with 'vhost_id'.
As we're just removing a single 'PADDED_MEMBER', the overall
cacheline layout is not affected.

Stats for 'struct netdev_dpdk':

    Before: /* size: 4992, cachelines: 78 */
    After : /* size:  896, cachelines: 14 */
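
The numbers above are consistent with each other, as a short check
shows. The 64-byte cache line is an assumption implied by the reported
sizes and cacheline counts; PATH_MAX is 4096 as stated above ("4K
bytes").

```python
CACHE_LINE = 64   # assumed cache line size in bytes
PATH_MAX = 4096   # size of the old in-struct 'vhost_id' array

size_before, size_after = 4992, 896

print(size_before // CACHE_LINE, "cachelines before")
print(size_after // CACHE_LINE, "cachelines after")
print(size_before - size_after, "bytes saved per port")
```

The saving equals PATH_MAX exactly, matching the removal of one
padded array member.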

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years agoovs-tc: offload datapath rules matching on internal ports
John Hurley [Tue, 9 Apr 2019 14:36:14 +0000 (15:36 +0100)]
ovs-tc: offload datapath rules matching on internal ports

Rules applied to OvS internal ports are not represented in TC datapaths.
However, it is possible to support rules matching on internal ports in TC.
The start_xmit ndo of OvS internal ports directs packets back into the OvS
kernel datapath where they are rematched with the ingress port now being
that of the internal port. Due to this, rules matching on an internal port
can be added as TC filters to an egress qdisc for these ports.

Allow rules applied to internal ports to be offloaded to TC as egress
filters. Rules redirecting to an internal port are also offloaded. These
are supported by the redirect ingress functionality applied in an earlier
patch.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
5 years agoovs-tc: allow offloading TC rules to egress qdiscs
John Hurley [Tue, 9 Apr 2019 14:36:13 +0000 (15:36 +0100)]
ovs-tc: allow offloading TC rules to egress qdiscs

Offloading rules to a TC datapath only allows the creating of ingress hook
qdiscs and the application of filters to these. However, there may be
certain situations where an egress qdisc is more applicable (e.g. when
offloading to TC rules applied to OvS internal ports).

Extend the TC API in OvS to allow the creation of egress qdiscs and to add
or interact with flower filters applied to these.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
5 years agoovs-tc: allow offloading of ingress mirred TC actions to datapath
John Hurley [Tue, 9 Apr 2019 14:36:12 +0000 (15:36 +0100)]
ovs-tc: allow offloading of ingress mirred TC actions to datapath

The TC datapath only permits the offload of mirred actions if they are
egress. To offload TC actions that output to OvS internal ports, ingress
mirred actions are required. At the TC layer, an ingress mirred action
passes the packet back into the network stack as if it came in the action
port rather than attempting to egress the port.

Update OvS-TC offloads to support ingress mirred actions. To ensure
packets that match these rules are properly passed into the network stack,
add a TC skbedit action along with ingress mirred that sets the pkt_type
to PACKET_HOST. This mirrors the functionality of the OvS internal port
kernel module.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
5 years agocompat: add compatibility headers for tc skbedit action
John Hurley [Tue, 9 Apr 2019 14:36:11 +0000 (15:36 +0100)]
compat: add compatibility headers for tc skbedit action

OvS includes compat code for several TC actions including vlan, mirred and
tunnel key. Add support for using skbedit actions when compiling
user-space code against older kernel headers.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
5 years agodatapath-windows: Fix vlan key getting stored in host byte order.
Anand Kumar via dev [Fri, 5 Apr 2019 18:22:04 +0000 (11:22 -0700)]
datapath-windows: Fix vlan key getting stored in host byte order.

Update flowkey to set vlan information in network byte order.

Signed-off-by: Anand Kumar <kumaranand@vmware.com>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
5 years agowindows, tests: Allow tests to run on MSYS2
Alin Gabriel Serdean [Wed, 3 Apr 2019 12:03:34 +0000 (15:03 +0300)]
windows, tests: Allow tests to run on MSYS2

Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
5 years agonetdev-tc-offloads: Fix probe tc block support
Raed Salem [Mon, 8 Apr 2019 12:42:11 +0000 (15:42 +0300)]
netdev-tc-offloads: Fix probe tc block support

The current implementation tries to create a qdisc of type ingress with
block id 1 to check for kernel ingress block support. This check is
insufficient, as old kernels without ingress block support will
successfully create an ingress qdisc, ignoring the ingress block.

Fix by trying to add a test rule on the ingress block.

Fixes: 093c9458fb02 ("tc: allow offloading of block ids")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
5 years agocompiler: Disable BUILD_MESSAGE() when processing with sparse.
Ben Pfaff [Wed, 27 Mar 2019 23:10:58 +0000 (16:10 -0700)]
compiler: Disable BUILD_MESSAGE() when processing with sparse.

sparse doesn't support _Pragma(message(x)), even though GCC does, so
HAVE_PRAGMA_MESSAGE is deceptive in that case and causes pointless errors.

Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agodatapath-windows: Add guards around IpHelper adapter binding calls
Sairam Venugopal via dev [Wed, 13 Mar 2019 22:37:29 +0000 (15:37 -0700)]
datapath-windows: Add guards around IpHelper adapter binding calls

Protect internal adapter up/down calls with a dispatch lock. It was
observed that the InternalAdapter bind calls could happen out of order
thereby causing encap packets to not be sent properly.

Add assert around the IpHelper bind calls to ensure Up/Down gets called
only for the appropriate vports.

Signed-off-by: Sairam Venugopal <vsairam@vmware.com>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
5 years agocheckpatch: Normalize exit code for Windows
Alin Gabriel Serdean [Mon, 18 Mar 2019 22:43:00 +0000 (00:43 +0200)]
checkpatch: Normalize exit code for Windows

Using python `sys.exit(-1)` on Windows produces mixed results.
Let's take the following results from different shells:
CMD
>python -c "import sys; sys.exit(-1)" & echo %errorlevel%
1
MSYS
$ python -c "import sys; sys.exit(-1)" && echo $?
0
WSL
$ python -c "import sys; sys.exit(-1)"; echo $?
255

This causes the following tests to fail:
checkpatch

 10: checkpatch - sign-offs                          FAILED (checkpatch.at:32)
 11: checkpatch - parenthesized constructs           FAILED (checkpatch.at:32)
 12: checkpatch - parenthesized constructs - for     FAILED (checkpatch.at:32)
 13: checkpatch - comments                           FAILED (checkpatch.at:32)

because of:
 ./checkpatch.at:32: exit code was 0, expected 255

This patch introduces a positive constant for the default exit code (1)
similar to other OVS utilities.
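
The 255 seen under WSL follows from how POSIX reports exit statuses:
only the low 8 bits reach the parent. The sketch below models just
that truncation; it does not explain the CMD and MSYS results, and
EXIT_FAILURE is a hypothetical constant name, not necessarily the one
used in checkpatch.py.

```python
EXIT_FAILURE = 1  # hypothetical name for the positive default exit code


def posix_exit_status(code):
    """Status a POSIX parent observes: the low 8 bits of the exit code."""
    return code & 0xFF


print(posix_exit_status(-1))            # what 'echo $?' shows for sys.exit(-1)
print(posix_exit_status(EXIT_FAILURE))  # unchanged by the truncation
```

A small positive code survives the truncation unchanged, so a test
can expect the same value wherever the status is reported this way.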

Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
5 years agodatapath-windows: Address memory allocation issues for OVS_BUFFER_CONTEXT
Anand Kumar via dev [Wed, 20 Mar 2019 23:54:30 +0000 (16:54 -0700)]
datapath-windows: Address memory allocation issues for OVS_BUFFER_CONTEXT

With the current implementation, when the nbl pool is allocated, the context
size is specified as 64 bytes, while the OVS_BUFFER_CONTEXT size is only 32
bytes. Since the context size is never changed, the additional memory is not
required.

This patch makes it simpler to allocate memory for OVS_BUFFER_CONTEXT so
that it is always aligned to MEMORY_ALLOCATION_ALIGNMENT.
This is achieved by updating the "value" field in the context
structure, so that the number of elements in the array is always a multiple of
MEMORY_ALLOCATION_ALIGNMENT.
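
Rounding a size up to a multiple of the alignment can be sketched as
below. The value 16 for MEMORY_ALLOCATION_ALIGNMENT is an assumption
(the usual value on 64-bit Windows), and round_up() is a hypothetical
helper, not a function from the datapath.

```python
MEMORY_ALLOCATION_ALIGNMENT = 16  # assumed value for 64-bit Windows


def round_up(size, alignment=MEMORY_ALLOCATION_ALIGNMENT):
    """Smallest multiple of 'alignment' that is >= 'size'."""
    return (size + alignment - 1) // alignment * alignment


print(round_up(32))  # the 32-byte context is already aligned
print(round_up(33))  # one extra byte costs a whole alignment unit
```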

Also change the DEFAULT_CONTEXT_SIZE to accommodate OVS_BUFFER_CONTEXT size.

Signed-off-by: Anand Kumar <kumaranand@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
5 years agorhel: Include all header files in the Fedora's devel package
Ansis Atteka [Tue, 26 Mar 2019 18:12:01 +0000 (11:12 -0700)]
rhel: Include all header files in the Fedora's devel package

While the header files added by this patch into Fedora's devel
rpm package can be considered private, the other devel packages
for RHEL/CentOS and Debian/Ubuntu distros include them.

So this patch simply makes the Fedora devel package consistent with
the other devel packages.

Signed-off-by: Ansis Atteka <aatteka@ovn.org>
5 years agolib: added check to prevent int overflow
Toms Atteka [Wed, 20 Mar 2019 20:40:19 +0000 (13:40 -0700)]
lib: added check to prevent int overflow

If a large enough input is given, ofpact_finish will fail.
Implemented the ofpbuf_oversized function to check for an oversized
buffer. Checks were added to the parse functions, which now return
error messages.
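
A sketch of this kind of check, assuming the limit comes from a 16-bit
length field. That assumption, and the names oversized() and
append_checked(), are illustrative only and are not taken from the
ofpbuf code.

```python
UINT16_MAX = 0xFFFF  # assumed limit: length stored in a 16-bit field


def oversized(buf):
    """True if the buffer no longer fits in a 16-bit length."""
    return len(buf) > UINT16_MAX


def append_checked(buf, chunk):
    """Append 'chunk'; return an error string instead of overflowing."""
    buf.extend(chunk)
    if oversized(buf):
        return "input too big"
    return None


buf = bytearray()
print(append_checked(buf, bytes(1000)))   # fits
print(append_checked(buf, bytes(69000)))  # 70000 bytes in total
print(len(buf) & UINT16_MAX)              # length a 16-bit field would hold
```

Without the check, a 70000-byte buffer would be recorded as 4464 bytes
in a 16-bit field, which is the silent wraparound the check prevents.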

Basic manual testing performed.

Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reported-by: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=12972
Signed-off-by: Toms Atteka <cpp.code.lv@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agobridge: Propagate patch port pairing errors to db.
Ilya Maximets [Fri, 22 Mar 2019 12:58:39 +0000 (15:58 +0300)]
bridge: Propagate patch port pairing errors to db.

Virtual ports like 'patch' ports that are almost fully implemented in the
'ofproto' layer can have statuses internal to 'ofproto' that
cannot be retrieved from 'netdev' or other layers. For example,
in the current implementation there is no way to get the patch port
pairing status (i.e. whether it has a usable peer).

The new 'ofproto-provider' API function 'vport_get_status' is introduced to
cover this gap. It allows the 'bridge' layer to retrieve the current status
of ofproto virtual ports and propagate it to the DB.
For now we're only interested in pairing errors of 'patch' ports,
which are propagated to the 'error' column of the 'Interface' table.

Ex.:

  $ ovs-vsctl show
    ...
    Bridge "br1"
      ...
      Port "patch1"
        Interface "patch1"
          type: patch
          options: {peer="patch0"}
          error: "No usable peer 'patch0' exists in 'system' datapath."

    Bridge "br0"
      datapath_type: netdev
      ...
      Port "patch0"
        Interface "patch0"
          type: patch
          options: {peer="patch1"}
          error: "No usable peer 'patch1' exists in 'netdev' datapath."

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb-idl.c: Remove meaningless MAX().
Han Zhou [Thu, 21 Mar 2019 05:48:22 +0000 (22:48 -0700)]
ovsdb-idl.c: Remove meaningless MAX().

In the else condition, it is already ensured that index >= idl->min_index.
So the MAX() is confusing and misleading here.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agohmap: Improve debug log message when reporting unusually large buckets.
Ben Pfaff [Tue, 26 Mar 2019 16:58:20 +0000 (09:58 -0700)]
hmap: Improve debug log message when reporting unusually large buckets.

I was seeing a lot of these messages, including a lot of them suppressed
by rate-limiting, and I wondered whether any really big messages were
being suppressed.  By reporting the largest bucket, instead of just every
large bucket, it becomes more likely that the truly too-large buckets get
reported.

(The problem I saw was a false alarm.)

Acked-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agofaq: Add Q&A for applying patches from email.
Ben Pfaff [Tue, 26 Mar 2019 16:34:58 +0000 (09:34 -0700)]
faq: Add Q&A for applying patches from email.

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoodp-util: Do not rewrite fields with the same values as matched
Eli Britstein [Thu, 21 Mar 2019 07:44:16 +0000 (07:44 +0000)]
odp-util: Do not rewrite fields with the same values as matched

To improve performance and avoid wasting resources for HW offloaded
flows, do not rewrite fields that are matched with the same value.
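
The filtering can be sketched as below. This Python model assumes
every matched field is an exact match (no partial masks), which the
real odp-util code has to handle bit-wise; needed_rewrites() and the
field names are hypothetical.

```python
def needed_rewrites(matched, set_fields):
    """Drop set actions whose value equals the exactly-matched value."""
    return {name: value
            for name, value in set_fields.items()
            if matched.get(name) != value}


matched = {"eth_dst": "aa:bb:cc:dd:ee:ff", "ip_ttl": 64}
set_fields = {"eth_dst": "aa:bb:cc:dd:ee:ff", "ip_ttl": 63}
print(needed_rewrites(matched, set_fields))
```

Only ip_ttl remains, since rewriting eth_dst to the value the flow
already matched on would be a no-op.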

Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoMakefiles: Generate datapath ovs key fields macros
Eli Britstein [Thu, 21 Mar 2019 07:44:15 +0000 (07:44 +0000)]
Makefiles: Generate datapath ovs key fields macros

Generate datapath ovs key fields offset and size array macros as a
pre-step for bit-wise comparing fields, with no functional change.

Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoOVN: Make periodic RAs consistent with RA responder.
Mark Michelson [Mon, 25 Mar 2019 21:29:56 +0000 (17:29 -0400)]
OVN: Make periodic RAs consistent with RA responder.

This commit makes periodic RAs from OVN consistent with the RAs sent in
response to RSs. Specifically, this ensures that prefix flags are set
correctly for each address mode.

This commit also gets rid of some redundant definitions for RA prefix
option flags from packets.h in favor of the ones in ovn-l7.h.

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoOVN: Always send prefix option in RAs
Mark Michelson [Mon, 25 Mar 2019 21:29:55 +0000 (17:29 -0400)]
OVN: Always send prefix option in RAs

OVN's behavior when sending router advertisements has been to include IP
prefix information only if the address mode is set to "slaac" or
"dhcp_stateless". In these modes, sending the prefix to the client is
necessary so that it may automatically provision its IP address. We do
not send the prefix option when the address mode is set to
"dhcp_stateful" since there is no need for the client to automatically
provision an IP address.

This logic is flawed, however. When using dhcp_stateful, we provide a
managed IPv6 address for a client. However, because we do not provide
prefix information in our RAs, the client does not know the prefix
length for the address it has been allocated. With dhclient, we have
seen it assume either /64 or /128, depending on which version is being
used. This may not accurately reflect the prefix length being used by
the DHCP server though.

The fix here is to always send prefix information in our RAs, regardless
of address mode. The key difference lies in how we set the A
(autonomous addressing) flag. For slaac and dhcp_stateless address
modes, we will set this flag, indicating the client should provision its
own address based on the prefix we have sent. For dhcp_stateful, we will
not set this flag. This way, it is clear the prefix is informational,
and the client should not try to provision its own IPv6 address.
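
The flag selection described above can be sketched as follows. The bit
values are those of the Prefix Information option in RFC 4861; the
constant and function names are hypothetical, the address mode strings
are spelled as in this message, and always setting the on-link (L)
flag is an assumption of this sketch.

```python
PREFIX_ON_LINK = 0x80     # L flag (RFC 4861, Prefix Information option)
PREFIX_AUTONOMOUS = 0x40  # A flag (RFC 4861, Prefix Information option)


def prefix_flags(address_mode):
    """Flags for the prefix option that is now sent in every RA."""
    flags = PREFIX_ON_LINK  # assumption: the prefix is always on-link
    if address_mode in ("slaac", "dhcp_stateless"):
        flags |= PREFIX_AUTONOMOUS  # client provisions its own address
    elif address_mode != "dhcp_stateful":
        raise ValueError(f"unknown address mode: {address_mode}")
    return flags


for mode in ("slaac", "dhcp_stateless", "dhcp_stateful"):
    print(mode, hex(prefix_flags(mode)))
```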

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoOVN: Use offset instead of pointer into ofpbuf
Mark Michelson [Mon, 25 Mar 2019 21:29:54 +0000 (17:29 -0400)]
OVN: Use offset instead of pointer into ofpbuf

In general, maintaining a pointer into an ofpbuf is risky. As the ofpbuf
grows, it can reallocate its data. If this happens, then pointers into
the data will become invalid.

A safer practice is to track an offset into the ofpbuf's data where a
structure you are interested in is kept. This way, if the ofpbuf data is
reallocated, you can find your structure again by using the offset.
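
A toy model of the hazard, written in Python rather than C. The
GrowableBuffer class is hypothetical and only imitates an ofpbuf that
moves its storage when it grows; in C the stale pointer would be
dangling, whereas here it merely refers to the abandoned storage.

```python
class GrowableBuffer:
    """Hypothetical buffer that reallocates its storage as it grows."""

    def __init__(self, capacity):
        self.data = bytearray(capacity)
        self.size = 0

    def put(self, chunk):
        """Append 'chunk' and return the offset where it was stored."""
        start = self.size
        end = start + len(chunk)
        if end > len(self.data):
            bigger = bytearray(max(2 * len(self.data), end))
            bigger[:start] = self.data[:start]
            self.data = bigger           # the old storage is abandoned
        self.data[start:end] = chunk
        self.size = end
        return start


buf = GrowableBuffer(4)
offset = buf.put(b"\x00\x00")  # remember where the structure lives
stale = buf.data               # "pointer" into the current storage
buf.put(b"abcdef")             # forces a reallocation

buf.data[offset] = 0x7F        # update through the offset
print(buf.data[offset], stale[offset])
```

The update made through the offset lands in the live storage, while
the stale reference still shows the old contents.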

In practice, this patch is not fixing any issues with OVN. Even though
the ra pointer is pointing to ofpbuf data that can be reallocated, it
will never actually happen. ovn-northd and all test cases always encode
the address mode first, meaning we will only ever read from the ra
pointer before the ofpbuf has a chance to expand.

However, this base work is essential for an upcoming patch in this series.

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>