git.proxmox.com Git - mirror_ovs.git/log
6 years ago datapath-windows: Add annotations for OvsReleaseEventQueueLock
Alin Serdean [Fri, 14 Jul 2017 04:40:55 +0000 (04:40 +0000)]
datapath-windows: Add annotations for OvsReleaseEventQueueLock

Add function annotations for `OvsReleaseEventQueueLock`.
We make it aware that it requires a certain dispatch level, that it
restores the dispatch level, that it requires a lock held and releases
a lock.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Shashank Ram <rams@vmware.com>
6 years ago datapath-windows: Add function annotations for OvsAcquireEventQueueLock
Alin Serdean [Fri, 14 Jul 2017 04:40:55 +0000 (04:40 +0000)]
datapath-windows: Add function annotations for OvsAcquireEventQueueLock

The function should be aware that it raises the dispatch level, saves the
dispatch level and acquires a lock.

This patch adds annotation for that.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Shashank Ram <rams@vmware.com>
6 years ago datapath-windows: Add function annotations for OvsCancelIrpDatapath
Alin Serdean [Fri, 14 Jul 2017 04:40:55 +0000 (04:40 +0000)]
datapath-windows: Add function annotations for OvsCancelIrpDatapath

The function should be aware that it is a cancel routine.

This patch adds annotation for that.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Shashank Ram <rams@vmware.com>
6 years ago datapath-windows: Add function annotations for OvsTunnelFilterCancelIrp
Alin Serdean [Fri, 14 Jul 2017 04:40:54 +0000 (04:40 +0000)]
datapath-windows: Add function annotations for OvsTunnelFilterCancelIrp

The function should be aware that it is a cancel routine.

This patch adds annotation for that.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Shashank Ram <rams@vmware.com>
6 years ago datapath-windows: Add function annotations for OvsCancelIrp
Alin Serdean [Fri, 14 Jul 2017 04:40:54 +0000 (04:40 +0000)]
datapath-windows: Add function annotations for OvsCancelIrp

The function should be aware that it is a cancel routine.

This patch adds annotation for that.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Shashank Ram <rams@vmware.com>
6 years ago datapath-windows: Add function annotations for OvsReleaseDatapath
Alin Serdean [Fri, 14 Jul 2017 04:40:54 +0000 (04:40 +0000)]
datapath-windows: Add function annotations for OvsReleaseDatapath

The function should be aware that it requires a certain dispatch level,
restores the dispatch level, requires a lock to be held and releases a lock.

This patch adds annotation for that.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Shashank Ram <rams@vmware.com>
6 years ago datapath-windows: Add function annotations for OvsAcquireDatapathWrite
Alin Serdean [Fri, 14 Jul 2017 04:40:54 +0000 (04:40 +0000)]
datapath-windows: Add function annotations for OvsAcquireDatapathWrite

The function should be aware that it raises the dispatch level, saves the
dispatch level and acquires a lock.

This patch adds annotation for that.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Shashank Ram <rams@vmware.com>
6 years ago datapath-windows: Add function annotations for OvsAcquireDatapathRead
Alin Serdean [Fri, 14 Jul 2017 04:40:54 +0000 (04:40 +0000)]
datapath-windows: Add function annotations for OvsAcquireDatapathRead

The function should be aware that it raises the dispatch level, saves the
dispatch level and acquires a lock.

This patch adds annotation for that.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Shashank Ram <rams@vmware.com>
6 years ago datapath-windows: Remove function declarations from Tunnel.c
Alin Serdean [Fri, 14 Jul 2017 04:40:54 +0000 (04:40 +0000)]
datapath-windows: Remove function declarations from Tunnel.c

`OvsAcquireDatapathRead`, `OvsAcquireDatapathWrite`, `OvsReleaseDatapath`
are defined and implemented in Switch.h which is already included.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Shashank Ram <rams@vmware.com>
6 years ago datapath-windows: Add annotations for OvsReleaseCtrlLock
Alin Serdean [Fri, 14 Jul 2017 04:40:54 +0000 (04:40 +0000)]
datapath-windows: Add annotations for OvsReleaseCtrlLock

Add function annotations for `OvsReleaseCtrlLock`.
We make it aware that it requires a certain dispatch level, that it
restores the dispatch level, that it requires a lock held and releases
a lock.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Shashank Ram <rams@vmware.com>
6 years ago datapath-windows: Add annotations for OvsAcquireCtrlLock
Alin Serdean [Fri, 14 Jul 2017 04:40:53 +0000 (04:40 +0000)]
datapath-windows: Add annotations for OvsAcquireCtrlLock

Add annotations to the function `OvsAcquireCtrlLock`.
We make it aware that it raises the dispatch level, that it saves the
dispatch level and that it acquires a lock.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Shashank Ram <rams@vmware.com>
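As a rough illustration of what the acquire/release annotation commits above describe, the following sketch pairs an acquire and a release function with WDK SAL driver annotations. The `demo_` names, the lock structure and the modelled IRQL variable are hypothetical, and the exact set of annotations used by the real patches is an assumption. The macros are stubbed out as no-ops when the WDK headers are not available, so the sketch builds with any C compiler.

```c
/* Sketch only: the real macros come from the WDK headers.  When they are
 * absent (for example when building outside the WDK) they become no-ops. */
#ifndef _IRQL_raises_
#define _IRQL_raises_(irql)
#define _IRQL_requires_(irql)
#define _IRQL_saves_global_(kind, param)
#define _IRQL_restores_global_(kind, param)
#define _Acquires_lock_(lock)
#define _Releases_lock_(lock)
#define _Requires_lock_held_(lock)
#define DISPATCH_LEVEL 2
#endif

/* Hypothetical stand-in for a spin lock together with its saved IRQL. */
struct demo_lock {
    int held;
    int saved_irql;
};

struct demo_lock demo_ctrl_lock;
int demo_current_irql;          /* Models the current IRQL; starts at 0. */

/* Raises the IRQL, remembers the old one and takes the lock. */
_IRQL_raises_(DISPATCH_LEVEL)
_IRQL_saves_global_(OldIrql, demo_ctrl_lock)
_Acquires_lock_(demo_ctrl_lock)
void
demo_acquire_ctrl_lock(void)
{
    demo_ctrl_lock.saved_irql = demo_current_irql;
    demo_current_irql = DISPATCH_LEVEL;
    demo_ctrl_lock.held = 1;
}

/* Must run at DISPATCH_LEVEL with the lock held; drops the lock and
 * restores the IRQL that the acquire function saved. */
_IRQL_requires_(DISPATCH_LEVEL)
_IRQL_restores_global_(OldIrql, demo_ctrl_lock)
_Requires_lock_held_(demo_ctrl_lock)
_Releases_lock_(demo_ctrl_lock)
void
demo_release_ctrl_lock(void)
{
    demo_ctrl_lock.held = 0;
    demo_current_irql = demo_ctrl_lock.saved_irql;
}
```

The annotations do not change the generated code; they only give the static analyzer the contract that the commit messages spell out in words.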
6 years ago datapath-windows: Add an assert in recirculation
Alin Serdean [Fri, 14 Jul 2017 04:40:53 +0000 (04:40 +0000)]
datapath-windows: Add an assert in recirculation

`ovsFwdCtx.switchContext` can't be null since it is passed from actions.
Add an assert to keep the static analyzer happy.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Shashank Ram <rams@vmware.com>
6 years ago datapath-windows: Fix possible NULL dereference in BufferMgmt
Alin Serdean [Fri, 14 Jul 2017 04:40:53 +0000 (04:40 +0000)]
datapath-windows: Fix possible NULL dereference in BufferMgmt

The mdl can be NULL.

Found using WDK 10 static code analysis.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Shashank Ram <rams@vmware.com>
6 years ago datapath-windows: Suppress PAGED_CODE warnings
Alin Serdean [Fri, 14 Jul 2017 04:40:53 +0000 (04:40 +0000)]
datapath-windows: Suppress PAGED_CODE warnings

Suppress static code analysis around PAGED_CODE(). The macro is useful only in
checked builds.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Shashank Ram <rams@vmware.com>
6 years ago datapath-windows: Add asserts to Stt
Alin Serdean [Fri, 14 Jul 2017 04:40:53 +0000 (04:40 +0000)]
datapath-windows: Add asserts to Stt

Unfortunately the WDK 10 static code analysis can't see this one clearly.

Add an ASSERT to silence the warning.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Shashank Ram <rams@vmware.com>
6 years ago datapath-windows: Fix code alignment in Stt
Alin Serdean [Fri, 14 Jul 2017 04:40:52 +0000 (04:40 +0000)]
datapath-windows: Fix code alignment in Stt

Found by inspection.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Shashank Ram <rams@vmware.com>
6 years ago datapath-windows: interfaceName overflow in IpHelper
Alin Serdean [Fri, 14 Jul 2017 04:40:52 +0000 (04:40 +0000)]
datapath-windows: interfaceName overflow in IpHelper

Bump the size of interfaceName so an overflow cannot occur when using
`ConvertInterfaceLuidToAlias`.

Found using WDK 10 static code analysis.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Shashank Ram <rams@vmware.com>
6 years ago datapath-windows: Remove annotations in Switch.c
Alin Serdean [Fri, 14 Jul 2017 04:40:52 +0000 (04:40 +0000)]
datapath-windows: Remove annotations in Switch.c

There are no annotations defined for `OvsExtDetach` and `OvsExtRestart`.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Shashank Ram <rams@vmware.com>
6 years ago datapath-windows: Use non-executable memory when allocating memory
Alin Serdean [Fri, 14 Jul 2017 04:40:52 +0000 (04:40 +0000)]
datapath-windows: Use non-executable memory when allocating memory

Use non-executable memory when using ExAllocatePoolWithTagPriority.

Found using WDK 10 static code analysis.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Shashank Ram <rams@vmware.com>
6 years ago tests: Extend PTAP unit tests with decap action
Zoltan Balogh [Wed, 2 Aug 2017 08:04:13 +0000 (16:04 +0800)]
tests: Extend PTAP unit tests with decap action

  - Checking decap() prerequisites.
  - Encap/decap VLAN tagged Ethernet frames.
  - Send L3 packet over patch port.
  - Output L2/L3 packet to ports with different packet_type properties.

Signed-off-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Suggested-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago OF support and translation of generic encap and decap
Jan Scheurich [Wed, 2 Aug 2017 08:04:12 +0000 (16:04 +0800)]
OF support and translation of generic encap and decap

This commit adds support for the OpenFlow actions generic encap
and decap (as specified in ONF EXT-382) to the OVS control plane.

CLI syntax for encap action with properties:
  encap(<header>)
  encap(<header>(<prop>=<value>,<tlv>(<class>,<type>,<value>),...))

For example:
  encap(ethernet)
  encap(nsh(md_type=1))
  encap(nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210)))

CLI syntax for decap action:
  decap()
  decap(packet_type(ns=<pt_ns>,type=<pt_type>))

For example:
  decap()
  decap(packet_type(ns=0,type=0xfffe))
  decap(packet_type(ns=1,type=0x894f))

The first header supported for encap and decap is "ethernet" to convert
packets between packet_type (1,Ethertype) and (0,0).

This commit also implements a skeleton for the translation of generic
encap and decap actions in ofproto-dpif and adds support to encap and
decap an Ethernet header.

In general translation of encap commits pending actions and then rewrites
struct flow in accordance with the new packet type and header. In the
case of encap(ethernet) it suffices to change the packet type from
(1, Ethertype) to (0,0) and set the dl_type accordingly. A new
pending_encap flag in xlate ctx is set to mark that a corresponding
datapath encap action must be triggered at the next commit. In the
case of encap(ethernet) ofproto generates a push_eth action.

The general case for translation of decap() is to emit a datapath action
to decap the current outermost header and then recirculate the packet
to reparse the inner headers. In the special case of an Ethernet packet,
decap() just changes the packet type from (0,0) to (1, dl_type) without
a need to recirculate. The emission of the pop_eth action for the
datapath is postponed to the next commit.

Hence encap(ethernet) and decap() on an Ethernet packet are OF actions
that only incur a cost in the dataplane when a modified packet is
actually committed, e.g. because it is sent out. They can freely be
used for normalizing the packet type in the OF pipeline without
degrading performance.

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago ovn-northd: Add native active-standby HA.
Russell Bryant [Tue, 1 Aug 2017 16:15:04 +0000 (12:15 -0400)]
ovn-northd: Add native active-standby HA.

Add native support for active-standby HA in ovn-northd by having each
instance attempt to acquire an OVSDB lock.  Only the instance of
ovn-northd that currently holds the lock will make active changes to
the OVN databases.

Signed-off-by: Russell Bryant <russell@ovn.org>
Acked-by: Han Zhou <zhouhan@gmail.com>
Tested-by: Numan Siddique <nusiddiq@redhat.com>
Acked-by: Numan Siddique <nusiddiq@redhat.com>
6 years ago dpif-netdev: Reorder elements in dp_netdev_port structure.
Bhanuprakash Bodireddy [Wed, 2 Aug 2017 03:13:38 +0000 (20:13 -0700)]
dpif-netdev: Reorder elements in dp_netdev_port structure.

By reordering the elements in dp_netdev_port structure, pad bytes can be
reduced, thereby saving a cache line. A marginal performance improvement
is also observed with this change.

Before: structure size: 136, holes: 7, sum padbytes:7, cachelines:3
After : structure size: 128, holes: 6, sum padbytes:0, cachelines:2

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
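The effect of member ordering on padding, as described in the commit above, can be sketched with two hypothetical structures. These are not the real dp_netdev_port layouts, and the exact sizes depend on the target ABI; the sketch only shows that grouping members by decreasing alignment removes interior holes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with small members scattered between large ones.
 * Each uint8_t is followed by padding up to the next member's alignment. */
struct demo_port_before {
    uint8_t  flag_a;
    void    *ptr;
    uint8_t  flag_b;
    uint32_t count;
};

/* Same members, ordered by decreasing alignment.  Only trailing padding
 * remains. */
struct demo_port_after {
    void    *ptr;
    uint32_t count;
    uint8_t  flag_a;
    uint8_t  flag_b;
};

/* Bytes saved by the reordering on the current target. */
static size_t
demo_bytes_saved(void)
{
    return sizeof(struct demo_port_before) - sizeof(struct demo_port_after);
}
```

On a typical 64-bit target the first layout occupies 24 bytes and the second 16. A tool such as pahole reports the kind of holes, pad bytes and cache-line counts quoted in the commit message.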
6 years ago dpctl: Add new 'ct-bkts' command.
Antonio Fischetti [Wed, 2 Aug 2017 03:12:03 +0000 (20:12 -0700)]
dpctl: Add new 'ct-bkts' command.

The command:
 ovs-appctl dpctl/ct-bkts
shows the number of connections per bucket.

With a threshold:
 ovs-appctl dpctl/ct-bkts gt=N
it shows, for each bucket, the number of connections only when that
number is greater than N.

Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Co-authored-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago conntrack : Use Rx checksum offload feature on DPDK ports for conntrack.
Sugesh Chandran [Wed, 2 Aug 2017 01:51:14 +0000 (18:51 -0700)]
conntrack : Use Rx checksum offload feature on DPDK ports for conntrack.

Avoid checksum validation in the conntrack module if the checksum has
already been verified by the DPDK physical NIC port.

Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Co-authored-by: Darrell Ball <dball@vmware.com>
Signed-off-by: Darrell Ball <dball@vmware.com>
Acked-by: Antonio Fishetti <antonio.fischetti@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago dp-packet : Update DPDK rx checksum validation functions.
Sugesh Chandran [Wed, 2 Aug 2017 00:36:33 +0000 (17:36 -0700)]
dp-packet : Update DPDK rx checksum validation functions.

DPDK ports use masks while reporting rx checksum flags. OVS should use these
masks along with the reported checksum flags when validating a good checksum.

Added two new functions to validate a bad checksum reported by a DPDK NIC port.
These two functions will be used in the following patch for enabling rx checksum
offload in the conntrack module.

Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Co-authored-by: Darrell Ball <dball@vmware.com>
Signed-off-by: Darrell Ball <dball@vmware.com>
Acked-by: Antonio Fishetti <antonio.fischetti@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago packets: Do not initialize ct_orig_tuple.
Daniele Di Proietto [Wed, 2 Aug 2017 00:26:28 +0000 (17:26 -0700)]
packets: Do not initialize ct_orig_tuple.

Commit "odp: Support conntrack orig tuple key." introduced new fields
in struct 'pkt_metadata'.  pkt_metadata_init() is called for every
packet in the userspace datapath.  When testing a simple single
flow case with DPDK, we observe a lower throughput after the above
commit (it was 14.88 Mpps before, it is 13 Mpps after).

This patch skips initializing ct_orig_tuple in pkt_metadata_init().
It should be enough to initialize ct_state, because nobody should look
at ct_orig_tuple unless ct_state is != 0.

It's discussed at:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-May/332419.html

Fixes: daf4d3c18da4("odp: Support conntrack orig tuple key.")
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Co-authored-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
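The idea in the commit above, leaving a large field uninitialized and guarding every read with a small validity field, can be sketched as follows. The structure and function names are hypothetical and much smaller than the real pkt_metadata; only the pattern is taken from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified original-direction tuple. */
struct demo_orig_tuple {
    uint32_t src, dst;
    uint16_t sport, dport;
    uint8_t proto;
};

/* Hypothetical, simplified per-packet metadata. */
struct demo_pkt_metadata {
    uint32_t recirc_id;
    uint8_t ct_state;                     /* 0: no conntrack data. */
    struct demo_orig_tuple ct_orig_tuple; /* Valid only if ct_state != 0. */
};

/* Per-packet init on the fast path: touches only the small fields and
 * deliberately leaves ct_orig_tuple alone. */
static void
demo_md_init(struct demo_pkt_metadata *md)
{
    md->recirc_id = 0;
    md->ct_state = 0;
}

/* Readers go through ct_state first, so the uninitialized tuple is never
 * observed. */
static const struct demo_orig_tuple *
demo_md_orig_tuple(const struct demo_pkt_metadata *md)
{
    return md->ct_state ? &md->ct_orig_tuple : NULL;
}
```

The saving comes from writing fewer bytes per packet; the cost is that every reader has to respect the ct_state check.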
6 years ago dpdk: Fix device cleanup.
Darrell Ball [Wed, 2 Aug 2017 00:04:29 +0000 (17:04 -0700)]
dpdk: Fix device cleanup.

Commit 5dcde09c80a8 was introduced to make detaching more
automatic without using an additional command beyond
ovs-vsctl del-port <br> <port>.

Sometimes, since commit 5dcde09c80a8, dpdk devices are
not detached when del-port is issued; command example:

sudo ovs-vsctl del-port br0 dpdk1

This can happen when vswitchd is (re)started with an existing
database and devices are already bound to dpdk.

A minimal recipe to reproduce the issue is:

1/ Starting with

darrell@prmh-nsx-perf-server125:~$ sudo ovs-vsctl show
1c50d8ee-b17f-4fac-a595-03b0da8c8275
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
                options: {dpdk-devargs="0000:04:00.1"}
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:04:00.0"}

darrell@prmh-nsx-perf-server125:~$ /usr/src/dpdk-16.11/tools/dpdk-devbind.py --status

Network devices using DPDK-compatible driver

============================================
0000:04:00.0 'Ethernet Controller 10-Gigabit X540-AT2' drv=uio_pci_generic unused=ixgbe,vfio-pci
0000:04:00.1 'Ethernet Controller 10-Gigabit X540-AT2' drv=uio_pci_generic unused=ixgbe,vfio-pci

2/ restart vswitchd

3/ run
 sudo ovs-vsctl del-port br0 dpdk1

and find the interface is NOT detached; there is
no info log ‘Device '0000:04:00.1' detached’.

A more verbose discussion is here:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-June/333462.html
along with another possible solution.

Since we are nearing the end of a release, a safe approach is needed,
at this time.
One approach is to revert 5dcde09c80a8.  This patch does not do that
but reinstates the command ovs-appctl netdev-dpdk/detach to handle
cases when del-port will not work.

To detach the device, run the reinstated command
ovs-appctl netdev-dpdk/detach 0000:04:00.1
Observe console output
‘Device '0000:04:00.1' has been detached’

Fixes: 5dcde09c80a8 ("netdev-dpdk: Fix device leak on port deletion.")
CC: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Fischetti, Antonio <antonio.fischetti@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago Update relevant artifacts to add support for DPDK 17.05.1.
Michal Weglicki [Tue, 1 Aug 2017 23:14:10 +0000 (16:14 -0700)]
Update relevant artifacts to add support for DPDK 17.05.1.

Upgrading to DPDK 17.05.1 stable release adds new
significant features relevant to OVS, including,
but not limited to:
- tun/tap PMD,
- VFIO hotplug support,
- Generic flow API.

Following changes are applied:
- netdev-dpdk: Changes required by DPDK API modifications.
- doc: Because of DPDK API changes, backward compatibility
  with previous DPDK releases will be broken, thus all
  relevant documentation entries are updated.
- .travis: DPDK version change from 16.11.1 to 17.05.1.
- rhel/openvswitch-fedora.spec.in: DPDK version change
  from 16.11 to 17.05.1

Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago netdev-dpdk: use rte_eth_dev_set_mtu.
Mark Kavanagh [Tue, 1 Aug 2017 22:03:08 +0000 (15:03 -0700)]
netdev-dpdk: use rte_eth_dev_set_mtu.

DPDK provides an API to set the MTU of compatible physical devices -
rte_eth_dev_set_mtu(). Prior to DPDK v16.07 however, this API was not
implemented in some DPDK PMDs (i40e, specifically). To allow the use
of jumbo frames with affected NICs in OvS-DPDK, MTU configuration was
achieved by setting the jumbo frame flag, and corresponding maximum
permitted Rx frame size, in an rte_eth_conf structure for the NIC
port, and subsequently invoking rte_eth_dev_configure() with that
configuration.

However, that method does not set the MTU field of the underlying DPDK
structure (rte_eth_dev) for the corresponding physical device;
consequently, rte_eth_dev_get_mtu() reports the incorrect MTU for an
OvS-DPDK phy device with non-standard MTU.

Resolve this issue by invoking rte_eth_dev_set_mtu() when setting up
or modifying the MTU of a DPDK phy port.

Fixes: 0072e93 ("netdev-dpdk: add support for jumbo frames")
Reported-by: Aaron Conole <aconole@redhat.com>
Reported-by: Vipin Varghese <vipin.varghese@intel.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Acked-by: Sugesh Chandran <sugesh.chandran@intel.com>
Tested-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago dpif-netdev: Assign ports to pmds on non-local numa node.
Billy O'Mahony [Tue, 1 Aug 2017 21:38:43 +0000 (14:38 -0700)]
dpif-netdev: Assign ports to pmds on non-local numa node.

Previously, if there was no available (non-isolated) pmd on the numa node
for a port, the port was not polled at all. This can result in a
non-operational system until such time as the NICs are physically
repositioned. It is preferable to operate with a pmd on the 'wrong' numa
node, albeit with lower performance. Local pmds are still chosen when
available.

Signed-off-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Co-authored-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
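The selection policy described in the commit above (prefer a local pmd, fall back to a remote one rather than leaving the port unpolled) can be sketched like this. The `demo_pmd` structure and the first-fit choice are assumptions made for illustration; the real dpif-netdev code balances queues across pmds in a more involved way.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a pmd thread. */
struct demo_pmd {
    int core_id;
    int numa_id;
    bool isolated;
};

/* Returns a non-isolated pmd for a port on 'port_numa'.  A pmd on the same
 * numa node wins; otherwise the first non-isolated pmd on any node is used.
 * Returns NULL only if every pmd is isolated. */
static const struct demo_pmd *
demo_pick_pmd(const struct demo_pmd *pmds, size_t n, int port_numa)
{
    const struct demo_pmd *fallback = NULL;

    for (size_t i = 0; i < n; i++) {
        if (pmds[i].isolated) {
            continue;
        }
        if (pmds[i].numa_id == port_numa) {
            return &pmds[i];
        }
        if (!fallback) {
            fallback = &pmds[i];
        }
    }
    return fallback;
}
```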
6 years ago dpif-netdev: Don't uninit emc on reload.
Ilya Maximets [Tue, 1 Aug 2017 21:22:17 +0000 (14:22 -0700)]
dpif-netdev: Don't uninit emc on reload.

There are many reasons for reloading of pmd threads:
* reconfiguration of one of the ports.
* Adjusting of static_tx_qid.
* Adding new tx/rx ports.

In many cases EMC is still useful after reload and uninit
will only lead to unnecessary upcalls/classifier lookups.

Such behaviour slows down the datapath. Uninit itself slows
down the reload path. All these factors lead to additional
unexpected latencies/drops on events not directly connected
to the current PMD thread.

Let's not uninitialize the emc cache on the reload path.
'emc_cache_slow_sweep()' and replacements should free all
the old/unwanted entries.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Tested-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago dpif-netdev: Incremental addition/deletion of PMD threads.
Ilya Maximets [Tue, 1 Aug 2017 20:46:33 +0000 (13:46 -0700)]
dpif-netdev: Incremental addition/deletion of PMD threads.

Currently, changing 'pmd-cpu-mask' is a very heavy operation.
It requires destroying all the PMD threads and creating
them again. After that, all the threads sleep until the
redistribution of ports has finished.

This patch adds the ability to keep the datapath running while
adjusting the number/placement of PMD threads. All unaffected
threads will forward traffic without any additional latencies.

An id-pool is created for static tx queue ids to keep them sequential
in a flexible way. The non-PMD thread will always have
static_tx_qid = 0, as it had before.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years ago ovn: Fix the failing "2335: ovn -- ACL logging" test case
Numan Siddique [Wed, 2 Aug 2017 14:20:57 +0000 (19:50 +0530)]
ovn: Fix the failing "2335: ovn -- ACL logging" test case

The test case is failing mainly because of a timing issue. Looking into
ovn-controller.log, it is evident that the last packet injected just before the
AT_CHECK has still not been processed by ovn-controller.

Fixes: d383eed59589 ("ovn: Add support for ACL logging.")
Suggested-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
6 years ago dpif-netlink: Fix log level for error message
Roi Dayan [Sun, 30 Jul 2017 04:58:17 +0000 (07:58 +0300)]
dpif-netlink: Fix log level for error message

Since it's an error, but one that will always occur in older kernels,
log the message with level warning instead of info.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-by: Eric Garver <e@erig.me>
Signed-off-by: Joe Stringer <joe@ovn.org>
6 years ago dpif-netlink-rtnl: Fix false errors on interfaces without tunnel config
Roi Dayan [Thu, 27 Jul 2017 11:40:02 +0000 (14:40 +0300)]
dpif-netlink-rtnl: Fix false errors on interfaces without tunnel config

When we skip adding a port using rtnetlink, and not because of an error, we
need to return EOPNOTSUPP to avoid logging an error message.

Fixes: 2fd3d5eda508 ("dpif-netlink-rtnl: Support layer3 GRE")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Eric Garver <e@erig.me>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
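The return-code convention in the commit above can be sketched as a pair of hypothetical helpers: the creation path reports EOPNOTSUPP when it merely declines to handle a port, and the caller logs only real failures. The function names and the boolean inputs are assumptions for illustration and do not mirror the actual dpif-netlink-rtnl interfaces.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical port creation.  A port without tunnel configuration is
 * skipped rather than failed, which is reported as EOPNOTSUPP. */
static int
demo_rtnl_port_create(bool has_tunnel_config, bool kernel_accepts)
{
    if (!has_tunnel_config) {
        return EOPNOTSUPP;      /* Not handled here; not an error. */
    }
    return kernel_accepts ? 0 : EINVAL;
}

/* Hypothetical caller-side policy: EOPNOTSUPP means "try another method"
 * and must not produce an error message. */
static bool
demo_should_log_error(int error)
{
    return error != 0 && error != EOPNOTSUPP;
}
```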
6 years ago dpif-netlink-rtnl: Fix VXLAN port create for regular VXLAN
Eric Garver [Tue, 1 Aug 2017 22:47:18 +0000 (18:47 -0400)]
dpif-netlink-rtnl: Fix VXLAN port create for regular VXLAN

When VXLAN-GPE was introduced we added IFLA_VXLAN_GPE to the policy
parsing, but did not mark it as optional. The kernel only returns this
netlink attribute if it's actually configured.

This also adds a missing entry for IFLA_VXLAN_GBP. Apparently we have no
system-traffic test coverage there.

Fixes: c33c412fb139 ("dpif-netlink-rtnl: Support VXLAN-GPE")
Fixes: 825e45e0109f ("dpif-netlink-rtnl: Support VXLAN creation")
Reported-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: Joe Stringer <joe@ovn.org>
6 years ago ofpbuf: Fix parameter for const initializer.
Joe Stringer [Tue, 1 Aug 2017 00:16:11 +0000 (17:16 -0700)]
ofpbuf: Fix parameter for const initializer.

Clang 4.0 complains:

In file included from include/openvswitch/cxxtest.cc:11:0:
../include/openvswitch/ofpbuf.h: In function ‘ofpbuf ofpbuf_const_initializer(const void*, size_t)’:
../include/openvswitch/ofpbuf.h:107:5: warning: narrowing conversion of ‘size’ from ‘size_t {aka long unsigned int}’ to ‘uint32_t {aka unsigned int}’ inside { } [-Wnarrowing]
     };
     ^
../include/openvswitch/ofpbuf.h:107:5: warning: narrowing conversion of ‘size’ from ‘size_t {aka long unsigned int}’ to ‘uint32_t {aka unsigned int}’ inside { } [-Wnarrowing]

This is because the ofpbuf struct's "size" parameter is a uint32_t,
while ofpbuf_const_initializer() takes a size_t for the size. Fix this
function to take a uint32_t instead.

Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
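The shape of the fix in the commit above can be sketched with a hypothetical buffer type. When the parameter has the same type as the member it initializes, the brace initializer involves no narrowing conversion, which is what a C++ compiler checks when it parses the header. The `demo_buf` names are illustrative and are not the real ofpbuf definitions.

```c
#include <stdint.h>

/* Hypothetical buffer descriptor with a 32-bit size member. */
struct demo_buf {
    const void *data;
    uint32_t size;
};

/* Taking 'size' as uint32_t, the member's own type, keeps the brace
 * initializer free of narrowing.  With a size_t parameter, a C++ compiler
 * would warn here because size_t is wider than uint32_t on 64-bit
 * targets. */
static inline struct demo_buf
demo_buf_const_initializer(const void *data, uint32_t size)
{
    struct demo_buf b = { data, size };
    return b;
}
```

Any conversion from size_t now happens at the call site, where the caller can decide whether the value fits in 32 bits.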
6 years ago rhel: Fix typo in README.RHEL.rst
Timothy Redaelli [Fri, 28 Jul 2017 19:02:02 +0000 (21:02 +0200)]
rhel: Fix typo in README.RHEL.rst

Replace systemctk with systemctl

Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
6 years ago odp-util: Refactor odp_key_to_dp_packet()
Yi-Hung Wei [Mon, 31 Jul 2017 20:35:39 +0000 (13:35 -0700)]
odp-util: Refactor odp_key_to_dp_packet()

Change type from uint16_t to 'enum ovs_key_attr' so that the compiler
will warn about unhandled cases.

Suggested-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
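Why the type change in the commit above helps can be sketched with a hypothetical enum. When the switch operand has an enum type and there is no default label, GCC and Clang with -Wswitch (part of -Wall) report any enumerator that lacks a case; with a uint16_t operand they cannot. The enum and function below are illustrative only.

```c
/* Hypothetical subset of key attribute types. */
enum demo_key_attr {
    DEMO_ATTR_PRIORITY,
    DEMO_ATTR_IN_PORT,
    DEMO_ATTR_CT_STATE
};

/* Switching on the enum type, without a default label, lets the compiler
 * flag any enumerator added later that is not handled here. */
static const char *
demo_attr_name(enum demo_key_attr type)
{
    switch (type) {
    case DEMO_ATTR_PRIORITY:
        return "priority";
    case DEMO_ATTR_IN_PORT:
        return "in_port";
    case DEMO_ATTR_CT_STATE:
        return "ct_state";
    }
    return "unknown";   /* Reached only for out-of-range values. */
}
```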
6 years ago odp-util: Remove unnecessary optimization in odp_key_to_dp_packet()
Yi-Hung Wei [Mon, 31 Jul 2017 20:35:38 +0000 (13:35 -0700)]
odp-util: Remove unnecessary optimization in odp_key_to_dp_packet()

The optimization logic in odp_key_to_dp_packet() used to be useful when the
number of wanted key attributes was small. However, as the number of expected
key attributes increases, and because the optimization logic needs to check all
the netlink attributes if one of the wanted key attributes is missing, the
benefit of the optimization logic is minimal. Therefore, this patch removes
the optimization.

Suggested-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
6 years ago odp-util: Fix generating ct_orig_tuple in odp_key_to_dp_packet()
Yi-Hung Wei [Mon, 31 Jul 2017 20:35:37 +0000 (13:35 -0700)]
odp-util: Fix generating ct_orig_tuple in odp_key_to_dp_packet()

Previously, odp_key_to_dp_packet() may fail to get ct_orig_tuple
from ODP flow key. This patch fixes the issue.

VMWare-BZ: #1920903
Fixes: daf4d3c18da4 ("odp: Support conntrack orig tuple key.")
Suggested-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
6 years ago odp-util: Fix generating various ct fields in odp_key_to_dp_packet()
Yi-Hung Wei [Mon, 31 Jul 2017 20:35:36 +0000 (13:35 -0700)]
odp-util: Fix generating various ct fields in odp_key_to_dp_packet()

Previously, odp_key_to_dp_packet() may fail to get ct_state, ct_zone,
ct_mark, and ct_labels from ODP flow key. This patch fixes the issue.

VMWare-BZ: #1920903
Fixes: 07659514c3c1 ("Add support for connection tracking.")
Fixes: 8e53fe8cf7a1 ("Add connection tracking mark support.")
Fixes: 9daf23484fb1 ("Add connection tracking label support.")
Suggested-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
6 years ago odp-util: Make checks for exact or wildcard masks more precise.
Ben Pfaff [Mon, 31 Jul 2017 19:36:48 +0000 (12:36 -0700)]
odp-util: Make checks for exact or wildcard masks more precise.

Checking whether an ODP mask is all-0s or all-1s is a little more
complicated than one might expect because the structures sometimes have
trailing padding.  The function odp_mask_is_exact() was fairly careful
about this, but odp_mask_attr_is_wildcard() didn't take padding into
consideration at all, which caused test failures on Travis and on some
machines because of uninitialized padding.

This commit fixes the problem by unifying the two different functions so
that both of them are careful about checking only significant bytes.  It
also adds support for the ct_orig_tuples for IPv4 and IPv6, which also
have trailing padding but weren't special cased before.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
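The padding problem described in the commit above can be sketched with a hypothetical tuple structure. Its last member is a single byte, so most ABIs add trailing padding, and a check over sizeof the structure would read bytes that nobody initialized. The sketch limits the check to the significant bytes; the structure, the macro and the function are assumptions for illustration, not the odp-util code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mask structure.  On most ABIs it is padded to 16 bytes,
 * leaving 3 trailing bytes after 'proto'. */
struct demo_ct_tuple {
    uint32_t src;
    uint32_t dst;
    uint16_t sport;
    uint16_t dport;
    uint8_t  proto;
};

/* Number of bytes that carry data: everything up to and including the
 * last member, excluding trailing padding. */
#define DEMO_TUPLE_SIGNIFICANT \
    (offsetof(struct demo_ct_tuple, proto) + sizeof(uint8_t))

/* True if every significant byte of the mask is zero.  Padding bytes are
 * never inspected, so their contents cannot change the result. */
static bool
demo_mask_is_wildcard(const struct demo_ct_tuple *mask)
{
    const uint8_t *p = (const uint8_t *) mask;

    for (size_t i = 0; i < DEMO_TUPLE_SIGNIFICANT; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}
```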
6 years ago util: New function is_all_byte().
Ben Pfaff [Mon, 31 Jul 2017 17:07:50 +0000 (10:07 -0700)]
util: New function is_all_byte().

This makes it easy for callers to choose all-ones or all-zeros based on
a parameter instead of choice of function.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
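A minimal sketch of a helper in the spirit of the commit above is shown below, together with a caller that selects all-ones or all-zeros through a parameter. The `demo_` names are hypothetical and the body is a plain byte loop, which may differ from the implementation in the OVS tree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if each of the 'n' bytes starting at 'p_' equals 'byte'.  An empty
 * range trivially satisfies the check. */
static bool
demo_is_all_byte(const void *p_, size_t n, uint8_t byte)
{
    const uint8_t *p = p_;

    for (size_t i = 0; i < n; i++) {
        if (p[i] != byte) {
            return false;
        }
    }
    return true;
}

/* One code path for both checks: 'exact' selects all-ones, otherwise
 * all-zeros (a full wildcard). */
static bool
demo_mask_matches(const void *mask, size_t n, bool exact)
{
    return demo_is_all_byte(mask, n, exact ? 0xff : 0x00);
}
```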
6 years ago odp-util: Drop special case for OVS_KEY_ATTR_TUNNEL for exact mask checks.
Ben Pfaff [Mon, 31 Jul 2017 16:40:57 +0000 (09:40 -0700)]
odp-util: Drop special case for OVS_KEY_ATTR_TUNNEL for exact mask checks.

This special case isn't actually necessary.  Commit 48954dab23ee
("odp-util: Remove last use of odp_tun_key_from_attr for formatting.")
retained it "as a safety measure" but that isn't really needed.

This makes an upcoming change more straightforward.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
6 years ago odp-util: Rewrite odp_mask_attr_is_exact().
Ben Pfaff [Fri, 28 Jul 2017 21:47:58 +0000 (14:47 -0700)]
odp-util: Rewrite odp_mask_attr_is_exact().

The way this function was written seemed really funny to me, so this commit
rewrites it.  There should be no behavioral change.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
6 years agoodp-util: More carefully validate attribute length in odp_flow_format().
Ben Pfaff [Fri, 28 Jul 2017 21:43:57 +0000 (14:43 -0700)]
odp-util: More carefully validate attribute length in odp_flow_format().

odp_flow_format() passes masks to odp_mask_attr_is_wildcard() without
first checking that they are the correct length.  This is OK for the
moment because odp_mask_attr_is_wildcard() doesn't care that the length
is correct.  An upcoming commit will change odp_mask_attr_is_wildcard()
to make it pickier, so this prepares for that change.

This adds a few comments to make it a little harder to get length
validation wrong in the future.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
6 years agoodp-util: Fix misleading parameter names.
Ben Pfaff [Fri, 28 Jul 2017 21:23:57 +0000 (14:23 -0700)]
odp-util: Fix misleading parameter names.

The 'max_len' parameters to these functions are actually the maximum type
values, not the maximum length of anything.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
6 years agoAdd 'extern "C"' for all relevant public header files, plus a build check.
Ben Pfaff [Mon, 31 Jul 2017 01:03:24 +0000 (18:03 -0700)]
Add 'extern "C"' for all relevant public header files, plus a build check.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Russell Bryant <russell@ovn.org>
6 years agoAutomatically verify that OVS header files work OK in C++ also.
Ben Pfaff [Mon, 31 Jul 2017 20:31:43 +0000 (13:31 -0700)]
Automatically verify that OVS header files work OK in C++ also.

This should help address a recurring problem.

This change makes the OVS header files, when parsed by a C++ compiler,
require C++11 or later.
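
For reference, the usual pattern that lets one header serve both C and C++
callers is an 'extern "C"' guard.  The sketch below uses a hypothetical
function, example_add(), and folds the "header" and its definition into one
file for brevity.

```c
/* Header part: a C++ compiler sees the linkage specification, while a C
 * compiler sees only the plain declaration. */
#ifdef __cplusplus
extern "C" {
#endif

int example_add(int a, int b);

#ifdef __cplusplus
}
#endif

/* Definition part, normally in a separate .c file. */
int
example_add(int a, int b)
{
    return a + b;
}
```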

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Russell Bryant <russell@ovn.org>
6 years agoofp-util: Avoid C++ keyword 'public' in name of struct member.
Ben Pfaff [Mon, 31 Jul 2017 00:40:32 +0000 (17:40 -0700)]
ofp-util: Avoid C++ keyword 'public' in name of struct member.

This allows a C++ program to include ofp-util.h.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Russell Bryant <russell@ovn.org>
6 years agoutil: Add C++ compatible definition of PADDED_MEMBERS.
Ben Pfaff [Mon, 31 Jul 2017 00:36:21 +0000 (17:36 -0700)]
util: Add C++ compatible definition of PADDED_MEMBERS.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Russell Bryant <russell@ovn.org>
6 years agoofp-actions: Add casts to placate C++ compilers.
Ben Pfaff [Mon, 31 Jul 2017 00:35:25 +0000 (17:35 -0700)]
ofp-actions: Add casts to placate C++ compilers.

C++ does not allow for an implicit conversion from void * to pointer to
object or incomplete type.
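
A small sketch of the kind of cast involved, using a hypothetical structure
rather than anything from ofp-actions.h.  The cast is redundant in C but
lets the same source compile as C++.

```c
#include <stdlib.h>

/* Hypothetical action header. */
struct example_action {
    int type;
    int len;
};

/* Allocates a zeroed action of the given type, or returns NULL if the
 * allocation fails.  calloc() returns void *, so C++ requires the
 * explicit cast. */
static struct example_action *
example_action_alloc(int type)
{
    struct example_action *a
        = (struct example_action *) calloc(1, sizeof *a);

    if (a) {
        a->type = type;
    }
    return a;
}
```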

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Russell Bryant <russell@ovn.org>
6 years agoovn: Restrict encap modification to its creating chassis
Mark Michelson [Thu, 27 Jul 2017 18:34:23 +0000 (13:34 -0500)]
ovn: Restrict encap modification to its creating chassis

This patch extends RBAC restrictiveness of the encap table in
the ovn southbound database by only allowing modification by the
chassis that created the encap.

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Reported-by: Lance Richardson <lrichard@redhat.com>
Acked-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
6 years agoovn: Update ovn-nbctl manpage with DHCP lsp commands.
Mark Michelson [Wed, 26 Jul 2017 21:28:13 +0000 (16:28 -0500)]
ovn: Update ovn-nbctl manpage with DHCP lsp commands.

This adds the appropriate manpage entries for ovn-nbctl for

lsp-set-dhcpv4-options
lsp-get-dhcpv4-options
lsp-set-dhcpv6-options
lsp-get-dhcpv6-options

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
6 years agotests: Use ovn-nbctl lsp-set-dhcpvX-options
Mark Michelson [Wed, 26 Jul 2017 21:28:12 +0000 (16:28 -0500)]
tests: Use ovn-nbctl lsp-set-dhcpvX-options

Existing OVN tests manually added DHCP options to the
Logical_Switch_Port database. There is a shortcut CLI command for doing
the same thing, so we may as well use it and get the extra test coverage
as a result.

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
6 years agoovn: Add lsp-set-dhcpv6-options ovn-nbctl operation.
Mark Michelson [Wed, 26 Jul 2017 21:28:11 +0000 (16:28 -0500)]
ovn: Add lsp-set-dhcpv6-options ovn-nbctl operation.

OVN offers a shortcut to set DHCPv4 options on a logical switch port,
but it did not offer the same for DHCPv6. This commit adds a similar
option for DHCPv6.

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
6 years agoovn: Add support for ACL logging.
Justin Pettit [Sat, 17 Dec 2016 01:40:24 +0000 (17:40 -0800)]
ovn: Add support for ACL logging.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Han Zhou <zhouhan@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
6 years agoofproto-dpif-rid: Don't include action_set_len as part of hash.
Justin Pettit [Fri, 28 Jul 2017 01:02:26 +0000 (18:02 -0700)]
ofproto-dpif-rid: Don't include action_set_len as part of hash.

It doesn't improve the hashing, since the number of bytes hashed is
included in hash_bytes64() hash calculation.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
6 years agoofproto-dpif-rid: Always store tunnel metadata.
Justin Pettit [Fri, 7 Jul 2017 23:26:10 +0000 (16:26 -0700)]
ofproto-dpif-rid: Always store tunnel metadata.

Tunnel metadata was only stored if the tunnel destination was set.  It's
possible, for example, that a flow could set the tunnel id field before
recirculation and then set the destination field afterwards.  The
previous behavior is that the tunnel id would be lost during
recirculation under such a circumstance.  This changes the behavior to
always copy the tunnel metadata, regardless of whether the tunnel
destination is set.  It also adds a unit test.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
6 years agoofproto-dpif-rid: Store tunnel metadata in frozen metadata directly.
Justin Pettit [Fri, 7 Jul 2017 23:04:57 +0000 (16:04 -0700)]
ofproto-dpif-rid: Store tunnel metadata in frozen metadata directly.

"recirc_id_node" contains a 'state_metadata_tunnel' member field.  The
"frozen_metadata" structure used by "recirc_id_node" had a 'tunnel'
member that always pointed to 'state_metadata_tunnel'.  This commit just
stores the tunnel information directly in "frozen_metadata" instead of
accessing it through a pointer.
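
The shape of the change can be sketched with simplified, hypothetical
structures (the 'example_*' types below are stand-ins, not the real OVS
definitions):

```c
#include <stdint.h>

/* Hypothetical stand-in for the tunnel fields. */
struct example_tunnel {
    uint64_t tun_id;
    uint32_t ip_dst;
};

/* Before: the metadata points at storage owned by the enclosing node. */
struct example_metadata_old {
    const struct example_tunnel *tunnel;
};

struct example_node_old {
    struct example_metadata_old metadata;
    struct example_tunnel state_metadata_tunnel;  /* Pointed to above. */
};

/* After: the tunnel is embedded in the metadata itself. */
struct example_metadata_new {
    struct example_tunnel tunnel;
};

struct example_node_new {
    struct example_metadata_new metadata;
};

/* Old layout: copy the data, then fix up the self-referential pointer. */
static void
example_node_old_init(struct example_node_old *node,
                      const struct example_tunnel *tnl)
{
    node->state_metadata_tunnel = *tnl;
    node->metadata.tunnel = &node->state_metadata_tunnel;
}

/* New layout: a plain copy is all that is needed. */
static void
example_node_new_init(struct example_node_new *node,
                      const struct example_tunnel *tnl)
{
    node->metadata.tunnel = *tnl;
}
```

With the embedded form there is no pointer to keep in sync when a node is
copied or moved.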

This makes the code a bit simpler and easier to reason about.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
6 years agotravis: Fail the build if any of the Linux build preparations fail.
Ben Pfaff [Thu, 27 Jul 2017 20:41:06 +0000 (13:41 -0700)]
travis: Fail the build if any of the Linux build preparations fail.

We want the build to fail if we can't prepare properly for it, but
linux-prepare.sh ignored errors.  This fixes the problem.

This would have made it more obvious where the problem fixed by the
previous commit originated.

(osx-prepare.sh already does the right thing.)

Signed-off-by: Ben Pfaff <blp@ovn.org>
6 years agotravis: Explicitly disable LLVM for sparse build.
Ben Pfaff [Thu, 27 Jul 2017 23:48:54 +0000 (16:48 -0700)]
travis: Explicitly disable LLVM for sparse build.

Newer travis environments claim to have LLVM support (llvm-config exists
and works) but in reality don't, which prevents sparse from building and
later parts of the build from succeeding.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
6 years agoflow: Refactor flow_compose() API.
Andy Zhou [Tue, 25 Jul 2017 21:26:22 +0000 (14:26 -0700)]
flow: Refactor flow_compose() API.

Currently, flow_compose_size() is only supposed to be called after
flow_compose(). I find this API to be unintuitive.

Change the flow_compose() API to take a 'size' argument and to
return 'true' if the packet can be created, 'false' otherwise.

This change also improves error detection and reporting when
'size' is unreasonably small.
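
The API style described above, a single call that takes the requested size
and reports whether it could be honored, might look roughly like this.
example_compose() and its parameters are hypothetical and much simpler than
flow_compose(); it only writes a zeroed Ethernet-sized header with an
EtherType.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_ETH_HEADER_LEN 14

/* Fills 'buf' with a 'size'-byte frame: a zeroed Ethernet header carrying
 * 'eth_type', followed by zero padding.  Returns false, leaving 'buf'
 * untouched, if 'size' cannot hold the header or exceeds 'capacity'. */
static bool
example_compose(uint8_t *buf, size_t capacity, size_t size,
                uint16_t eth_type)
{
    if (size < EXAMPLE_ETH_HEADER_LEN || size > capacity) {
        return false;
    }

    memset(buf, 0, size);
    buf[12] = (uint8_t) (eth_type >> 8);    /* EtherType, high byte. */
    buf[13] = (uint8_t) (eth_type & 0xff);  /* EtherType, low byte. */
    return true;
}
```

The caller learns about an unreasonable size from the return value instead
of having to call a second function in the right order.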

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
6 years agonetlink: Correct comment for nl_msg_put_unspec().
Justin Pettit [Wed, 26 Jul 2017 23:51:17 +0000 (16:51 -0700)]
netlink: Correct comment for nl_msg_put_unspec().

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
6 years agoovn: l3ha, CLI for logical router port gateway chassis
Venkata Anil [Tue, 18 Jul 2017 06:05:45 +0000 (11:35 +0530)]
ovn: l3ha, CLI for logical router port gateway chassis

This change adds commands to set, get and delete gateway chassis
for a logical router port.

Signed-off-by: Venkata Anil Kommaddi <vkommadi@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
6 years agodpif: Refactor obj type from void pointer to dpif_class
Roi Dayan [Tue, 25 Jul 2017 05:28:41 +0000 (08:28 +0300)]
dpif: Refactor obj type from void pointer to dpif_class

It's basically what is being passed today and passing a specific
type adds a compiler type check.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
6 years agotc: Add SCTP support
Vlad Buslov [Tue, 25 Jul 2017 11:39:51 +0000 (14:39 +0300)]
tc: Add SCTP support

Implement SCTP source and destination ports support for flower.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
6 years agoDocumentation/conf.py: Fix line length.
Russell Bryant [Thu, 27 Jul 2017 00:30:34 +0000 (20:30 -0400)]
Documentation/conf.py: Fix line length.

A previous commit introduced a line that was greater than 79
characters long, causing a flake8 warning to be emitted.

Reported-by: Joe Stringer <joe@ovn.org>
Fixes: 5ca89127382d ("docs: Refer to correct package name for sphinx theme.")
Signed-off-by: Russell Bryant <russell@ovn.org>
6 years agodatapath: fix potential out of bound access in parse_ct
Greg Rose [Tue, 25 Jul 2017 15:40:58 +0000 (08:40 -0700)]
datapath: fix potential out of bound access in parse_ct

Upstream commit:
    commit 69ec932e364b1ba9c3a2085fe96b76c8a3f71e7c
    Author: Liping Zhang <zlpnobody@gmail.com>
    Date:   Sun Jul 23 17:52:23 2017 +0800

    openvswitch: fix potential out of bound access in parse_ct

    Before the 'type' is validated, we shouldn't use it to fetch the
    ovs_ct_attr_lens's minlen and maxlen, else, out of bound access
    may happen.

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pick up an upstream bug fix.
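
The general pattern, validating an attribute type before using it as a
table index, can be sketched as follows.  The table and all 'example_*'
names are hypothetical and are not taken from the conntrack code.

```c
#include <stdbool.h>
#include <stddef.h>

enum { EXAMPLE_ATTR_MAX = 3 };

struct example_attr_len {
    size_t minlen;
    size_t maxlen;
};

/* Hypothetical per-type length limits, indexed by attribute type. */
static const struct example_attr_len
example_attr_lens[EXAMPLE_ATTR_MAX + 1] = {
    [0] = { 0, 0 },
    [1] = { 2, 2 },
    [2] = { 4, 4 },
    [3] = { 1, 16 },
};

/* Returns true if 'type' is known and 'len' is within its limits. */
static bool
example_attr_len_ok(int type, size_t len)
{
    /* Range check first: indexing the table with an unvalidated 'type'
     * would read out of bounds. */
    if (type < 0 || type > EXAMPLE_ATTR_MAX) {
        return false;
    }

    return len >= example_attr_lens[type].minlen
           && len <= example_attr_lens[type].maxlen;
}
```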

Fixes: a94ebc39996b ("datapath: Add conntrack action")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
6 years agosystem-userspace-macros: Fix ethtool with new kernels.
Joe Stringer [Wed, 26 Jul 2017 19:49:48 +0000 (12:49 -0700)]
system-userspace-macros: Fix ethtool with new kernels.

The latest net-next kernels have removed the UFO feature, which results
in older ethtool reporting the following error:

Cannot get device udp-fragmentation-offload settings: Operation not
supported

Currently, we rely on no errors being reported, and if there is an error
then a failure is reported. However, in this case we can safely ignore
the stderr output. We still check the return code so if something is
truly fatal, a failure will still be reported; otherwise, we will not
fail the test due to the above.

Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
6 years agoovn-northd: Optimize acl of localnet-port.
wangqianyu [Wed, 26 Jul 2017 21:02:24 +0000 (17:02 -0400)]
ovn-northd: Optimize acl of localnet-port.

A localnet port is not an endpoint, and at present there are no security
requirements that apply to it.  So, for performance reasons, we can
avoid using ct for localnet ports.

A more detailed discussion can be found at
https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335048.html

Signed-off-by: wangqianyu <wang.qianyu@zte.com.cn>
Acked-by: Han Zhou <zhouhan@gmail.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
6 years agocheckpatch: Print commit hashes and names.
Ilya Maximets [Fri, 14 Jul 2017 10:57:24 +0000 (13:57 +0300)]
checkpatch: Print commit hashes and names.

It's better to see real commits instead of 'HEAD~n'.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
6 years agocheckpatch: Allow checking more than one file.
Ilya Maximets [Fri, 14 Jul 2017 10:57:23 +0000 (13:57 +0300)]
checkpatch: Allow checking more than one file.

Currently, checking more than one patch or file requires invoking
the script separately for each file.
Fix that by iterating over all the passed filenames.

Note: If the '-f' option is passed, all the files are treated as regular
      files.  Without '-f', all the files are treated as patch files.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
6 years agocheckpatch: Print results while checking HEAD and stdin.
Ilya Maximets [Fri, 14 Jul 2017 10:57:22 +0000 (13:57 +0300)]
checkpatch: Print results while checking HEAD and stdin.

Currently, the result status is printed only for patch files.
It would be nice to have results for the other kinds of checks too.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
6 years agocheckpatch: Don't allow Gerrit Change-Ids.
Ilya Maximets [Fri, 14 Jul 2017 10:57:21 +0000 (13:57 +0300)]
checkpatch: Don't allow Gerrit Change-Ids.

Local Gerrit Change-Ids are not welcome in the common repository.
Inspired by checkpatch.pl from Linux Kernel.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
6 years agodocs: Refer to correct package name for sphinx theme.
Russell Bryant [Fri, 21 Jul 2017 00:23:21 +0000 (20:23 -0400)]
docs: Refer to correct package name for sphinx theme.

Update the log message emitted when the OVS sphinx theme is not found
to reference the name of the package to be installed via pip:
ovs-sphinx-theme.

Signed-off-by: Russell Bryant <russell@ovn.org>
Acked-by: Lance Richardson <lrichard@redhat.com>
6 years agoovn-controller: avoid null ptr dereference
Lance Richardson [Wed, 26 Jul 2017 19:47:34 +0000 (15:47 -0400)]
ovn-controller: avoid null ptr dereference

Avoid null pointer dereference in fdb_calculate_active_tunnels()
when integration bridge isn't present. This is easily encountered
by executing "make sandbox SANDBOXFLAGS=--ovn".

Fixes: 3475695ea61c ("ovn: l3ha, enable bfd between tunnel endpoints")
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
6 years agobond: Adjust bond hash masks
Andy Zhou [Tue, 25 Jul 2017 18:28:37 +0000 (11:28 -0700)]
bond: Adjust bond hash masks

Commit 42781e77035d ("bond: Unify hash functions in hash action and entry
lookup.") changed BM_TCP's hash function but did not update the
hash mask fields accordingly. Found by inspection.

CC: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
6 years agodpif-netdev.at: Add netdev-dummy/receive test.
Ilya Maximets [Tue, 25 Jul 2017 13:02:02 +0000 (16:02 +0300)]
dpif-netdev.at: Add netdev-dummy/receive test.

Regression test for 'netdev-dummy/receive' appctl command.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
6 years agonetdev-dummy: Fix setting length in receive command.
Ilya Maximets [Tue, 25 Jul 2017 13:02:01 +0000 (16:02 +0300)]
netdev-dummy: Fix setting length in receive command.

Currently, if the '--len' option is passed to the 'netdev-dummy/receive'
command, only the 'size' field of the dp_packet changes.

This is incorrect behaviour, because memory for that size is not
allocated and the packet headers are not fixed up to reflect the new size.
This leads to flow_extract() failure, because it checks
'ip->tot_len' and stops further parsing if it doesn't match
dp_packet_size(). As a result, packets created while processing the
'receive' command can't be parsed to the same flow.
Additionally, this may lead to invalid memory accesses if someone
tries to read or modify the packet data.

Fix that by creating correct packets using the recently introduced
'flow_compose_size()'.

CC: Andy Zhou <azhou@ovn.org>
Fixes: d8ada2368cbe ("netdev-dummy: Add --len option for netdev-dummy/receive command")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
6 years agoflow: Add flow_compose_size().
Ilya Maximets [Tue, 25 Jul 2017 13:02:00 +0000 (16:02 +0300)]
flow: Add flow_compose_size().

This allows packets with different real lengths to be composed from
odp flows, i.e. memory will be allocated for the requested packet
size and all required headers, like ip->tot_len, will be filled correctly.

Will be used in netdev-dummy to properly handle '--len' option.

Suggested-by: Andy Zhou <azhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
6 years agostream-ssl: Fix memory leak in error scenario
Mark Michelson [Fri, 21 Jul 2017 20:46:00 +0000 (15:46 -0500)]
stream-ssl: Fix memory leak in error scenario

ssl_new_stream() takes ownership of the passed-in 'name' parameter.
In error scenarios, the name is leaked. I was able to trigger this
leak by attempting to connect to an ovsdb over SSL and specifying
non-existent certificate, private key, and CA cert files.

This patch fixes the problem by freeing 'name' in the error label.
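
The ownership rule involved can be sketched as follows.  This is a
simplified, hypothetical constructor, not ssl_new_stream() itself, and the
'fail' parameter merely stands in for a real failure such as a certificate
that cannot be loaded.

```c
#include <stdlib.h>

/* Hypothetical stream object that owns its name. */
struct example_stream {
    char *name;
};

/* Takes ownership of 'name' whether or not it succeeds.  On success,
 * stores the new stream in '*streamp' and returns 0.  On failure, frees
 * 'name', sets '*streamp' to NULL, and returns -1. */
static int
example_stream_new(char *name, int fail, struct example_stream **streamp)
{
    struct example_stream *stream;

    *streamp = NULL;
    if (fail) {
        goto error;
    }

    stream = malloc(sizeof *stream);
    if (!stream) {
        goto error;
    }
    stream->name = name;        /* Ownership moves into the stream. */
    *streamp = stream;
    return 0;

error:
    free(name);                 /* Without this, 'name' would leak. */
    return -1;
}
```

Because the function owns 'name' on every path, callers must not free it
themselves after a failed call.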

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Russell Bryant <russell@ovn.org>
6 years agoAUTHORS.rst: Add Mark Michelson.
Russell Bryant [Tue, 25 Jul 2017 19:39:59 +0000 (15:39 -0400)]
AUTHORS.rst: Add Mark Michelson.

Signed-off-by: Russell Bryant <russell@ovn.org>
6 years agobond: Remove bond_hash_src.
Ilya Maximets [Tue, 25 Jul 2017 10:46:39 +0000 (13:46 +0300)]
bond: Remove bond_hash_src.

Since the introduction of the 'hash_mac()' function in
commit 7e36ac42e33a ("lib/packet.h: add hash_mac()"), there is no
need for an additional wrapper for MAC address hashing.

Let's use 'hash_mac()' directly and remove 'bond_hash_src()' to
simplify the code.

Suggested-by: Andy Zhou <azhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
6 years agobond: Unify hash functions in hash action and entry lookup.
Ilya Maximets [Tue, 25 Jul 2017 10:46:38 +0000 (13:46 +0300)]
bond: Unify hash functions in hash action and entry lookup.

'lookup_bond_entry' currently uses 'flow_hash_symmetric_l4' while
OVS_ACTION_ATTR_HASH uses 'flow_hash_5tuple'. This may lead to
inconsistency in slave selection for new flows.  Strictly speaking,
the hash functions do not need to be unified, because that is not
required for correct operation, but it is logically wrong to use
different hash functions there.

Unfortunately we're not able to use the RSS hash here, because we have
no packet at this point, but we may reduce inconsistency by using
'flow_hash_5tuple' instead of 'flow_hash_symmetric_l4' because
the symmetric property is not needed.

'flow_hash_symmetric_l4' was used previously only because no other
hash function was implemented at the time and L2 fields were
additionally involved in the hash calculation. Now we have the
5tuple hash and L2 is not used anymore, so we may replace the
old function.

'flow_hash_5tuple' is the preferable solution because it is 2 to 8 times
(depending on the flow) faster than the symmetric function.
So, this change will also speed up handling of new flows and
statistics accounting.

Additionally, the function 'bond_hash_tcp()' was removed to simplify
the code and possibly gain some additional speed.

Co-authored-by: Andy Zhou <azhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
6 years agovswitch.xml: Fix L2 balancing mentioning for balance-tcp bond.
Ilya Maximets [Tue, 25 Jul 2017 10:46:37 +0000 (13:46 +0300)]
vswitch.xml: Fix L2 balancing mentioning for balance-tcp bond.

L2 fields are not used in userspace hash action since
commit 4f150744921f ("dpif-netdev: Use miniflow as a flow key.").
In the kernel datapath, RSS (which does not include L2 by default for
most NICs) was used from the beginning. This means that
if recirculation is in use, L2 fields are not used for flow
balancing.

Fix the documentation accordingly.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Andy Zhou <azhou@ovn.org>
6 years agoovn-architecture: Remove outdated comment.
Russell Bryant [Mon, 24 Jul 2017 20:52:30 +0000 (16:52 -0400)]
ovn-architecture: Remove outdated comment.

This outdated comment said that support for hardware gateways that
support the vtep schema would come later.  This was actually
implemented a long time ago.

Signed-off-by: Russell Bryant <russell@ovn.org>
Acked-by: Miguel Angel Ajo <majopela@redhat.com>
6 years agotests: Add force/commit test to system-traffic.at
Joe Stringer [Tue, 18 Jul 2017 15:42:53 +0000 (08:42 -0700)]
tests: Add force/commit test to system-traffic.at

Add a new test that checks whether the conntrack force direction
change and commit work correctly.

This test was used to find and root cause VMware-BZ 1890854.

Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Darrell Ball <dlu998@gmail.com>
6 years agodatapath: Fix for force/commit action failures
Greg Rose [Tue, 18 Jul 2017 15:42:54 +0000 (08:42 -0700)]
datapath: Fix for force/commit action failures

Upstream commit:
    commit 8b97ac5bda17cfaa257bcab6180af0f43a2e87e0
    Author: Greg Rose <gvrose8192@gmail.com>
    Date:   Fri Jul 14 12:42:49 2017 -0700

    openvswitch: Fix for force/commit action failures

    When there is an established connection in direction A->B, it is
    possible to receive a packet on port B which then executes
    ct(commit,force) without first performing ct() - ie, a lookup.
    In this case, we would expect that this packet can delete the
    existing entry so that we can commit a connection with direction B->A.
    However, currently we only perform a check in skb_nfct_cached() for
    whether OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie
    that a lookup previously occurred. In the above scenario, a lookup
    has not occurred but we should still be able to statelessly look
    up the existing entry and potentially delete the entry if it is
    in the opposite direction.

    This patch extends the check to also hint that if the action has the
    force flag set, then we will lookup the existing entry so that the
    force check at the end of skb_nfct_cached has the ability to delete
    the connection.

Fixes: dd41d330b03 ("openvswitch: Add force commit.")
CC: Pravin Shelar <pshelar@nicira.com>
CC: dev@openvswitch.org
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Co-authored-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
6 years agotravis: Update test kernels
Greg Rose [Fri, 21 Jul 2017 23:46:14 +0000 (16:46 -0700)]
travis: Update test kernels

Update the Travis test kernels as per the latest information from
kernel.org. In particular add support for kernel 4.12 as the newest
released kernel.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
6 years agoacinclude.m4: Support Linux kernel 4.12
Greg Rose [Fri, 21 Jul 2017 23:46:13 +0000 (16:46 -0700)]
acinclude.m4: Support Linux kernel 4.12

Allow datapath kernel modules to be configured and built for kernels up
to 4.12.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
6 years agodatapath: fix mis-ordered comment lines for ovs_skb_cb
Greg Rose [Fri, 21 Jul 2017 23:46:12 +0000 (16:46 -0700)]
datapath: fix mis-ordered comment lines for ovs_skb_cb

Upstream commit:
    commit 52427fa0631269c62885dc48e0c32e2ad6e17f8c
    Author: Daniel Axtens <dja@axtens.net>
    Date:   Mon Jul 3 21:46:43 2017 +1000

    openvswitch: fix mis-ordered comment lines for ovs_skb_cb

    I was trying to wrap my head around meaning of mru, and realised
    that the second line of the comment defining it had somehow
    ended up after the line defining cutlen, leading to much confusion.

    Reorder the lines to make sense.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
6 years agodatapath: Avoid using stack larger than 1024.
Tonghao Zhang [Fri, 21 Jul 2017 23:46:11 +0000 (16:46 -0700)]
datapath: Avoid using stack larger than 1024.

Upstream commit:
    commit 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
    Author: Tonghao Zhang <xiangxia.m.yue@gmail.com>
    Date:   Thu Jun 29 17:27:44 2017 -0700

    datapath: Avoid using stack larger than 1024.

    When compiling OvS-master on 4.4.0-81 kernel,
    there is a warning:

        CC [M]  /root/ovs/datapath/linux/datapath.o
/root/ovs/datapath/linux/datapath.c: In function
'ovs_flow_cmd_set':
/root/ovs/datapath/linux/datapath.c:1221:1: warning:
the frame size of 1040 bytes is larger than 1024 bytes
[-Wframe-larger-than=]

    This patch factors out match-init and action-copy to avoid
    "Wframe-larger-than=1024" warning. Because mask is only
    used to get actions, we add a new function to save some
    stack space.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
6 years agocompat: net: store port/representator id in metadata_dst.
Joe Stringer [Fri, 21 Jul 2017 23:46:10 +0000 (16:46 -0700)]
compat: net: store port/representator id in metadata_dst.

Upstream commit:
    commit 3fcece12bc1b6dcdf0986f2cd9e8f63b1f9b6aa0
    Author: Jakub Kicinski <jakub.kicinski@netronome.com>
    Date: Fri Jun 23 22:11:58 2017 +0200

    net: store port/representator id in metadata_dst

    Switches and modern SR-IOV enabled NICs may multiplex traffic from Port
    representators and control messages over single set of hardware queues.
    Control messages and muxed traffic may need ordered delivery.

    Those requirements make it hard to comfortably use TC infrastructure today
    unless we have a way of attaching metadata to skbs at the upper device.
    Because single set of queues is used for many netdevs stopping TC/sched
    queues of all of them reliably is impossible and lower device has to
    retreat to returning NETDEV_TX_BUSY and usually has to take extra locks on
    the fastpath.

    This patch attempts to enable port/representative devs to attach metadata
    to skbs which carry port id.  This way representatives can be queueless and
    all queuing can be performed at the lower netdev in the usual way.

    Traffic arriving on the port/representative interfaces will have
    metadata attached and will subsequently be queued to the lower device for
    transmission.  The lower device should recognize the metadata and translate
    it to HW specific format which is most likely either a special header
    inserted before the network headers or descriptor/metadata fields.

    Metadata is associated with the lower device by storing the netdev pointer
    along with port id so that if TC decides to redirect or mirror the new
    netdev will not try to interpret it.

    This is mostly for SR-IOV devices since switches don't have lower netdevs
    today.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: 3fcece12bc1b ("net: store port/representator id in metadata_dst")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Greg Rose <gvrose8192@gmail.com>
6 years agodatapath: get rid of redundant vxlan_dev.flags
Greg Rose [Fri, 21 Jul 2017 23:46:09 +0000 (16:46 -0700)]
datapath: get rid of redundant vxlan_dev.flags

Upstream commit:
    commit dc5321d79697db1b610c25fa4fad1aec7533ea3e
    Author: Matthias Schiffer <mschiffer@universe-factory.net>
    Date:   Mon Jun 19 10:03:56 2017 +0200

    vxlan: get rid of redundant vxlan_dev.flags

    There is no good reason to keep the flags twice in vxlan_dev and
    vxlan_config.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Applied using HAVE_VXLAN_DEV_CFG compatibility flag defined in
acinclude.m4.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
6 years agocompat: Implement upstream net device free change.
Greg Rose [Fri, 21 Jul 2017 23:46:08 +0000 (16:46 -0700)]
compat: Implement upstream net device free change.

Upstream commit cf124db566e6 ("net: Fix inconsistent teardown and
release of private netdev state.") removed the destructor member
of the net_device structure and replaced it with a boolean flag
indicating that the net device resource needs freeing.  Use
compat flag HAVE_NEEDS_FREE_NETDEV to indicate whether the new
flag should be used.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
6 years agocompat: convert many more places to skb_put_zero().
Joe Stringer [Fri, 21 Jul 2017 23:46:07 +0000 (16:46 -0700)]
compat: convert many more places to skb_put_zero().

Upstream commit:
    commit de77b966ce8adcb4c58d50e2f087320d5479812a
    Author: Johannes Berg <johannes.berg@intel.com>
    Date: Fri Jun 16 14:29:19 2017 +0200

    networking: convert many more places to skb_put_zero()

    There were many places that my previous spatch didn't find,
    as pointed out by yuan linyu in various patches.

    The following spatch found many more and also removes the
    now unnecessary casts:

        @@
        identifier p, p2;
        expression len;
        expression skb;
        type t, t2;
        @@
        (
        -p = skb_put(skb, len);
        +p = skb_put_zero(skb, len);
        |
        -p = (t)skb_put(skb, len);
        +p = skb_put_zero(skb, len);
        )
        ... when != p
        (
        p2 = (t2)p;
        -memset(p2, 0, len);
        |
        -memset(p, 0, len);
        )

        @@
        type t, t2;
        identifier p, p2;
        expression skb;
        @@
        t *p;
        ...
        (
        -p = skb_put(skb, sizeof(t));
        +p = skb_put_zero(skb, sizeof(t));
        |
        -p = (t *)skb_put(skb, sizeof(t));
        +p = skb_put_zero(skb, sizeof(t));
        )
        ... when != p
        (
        p2 = (t2)p;
        -memset(p2, 0, sizeof(*p));
        |
        -memset(p, 0, sizeof(*p));
        )

        @@
        expression skb, len;
        @@
        -memset(skb_put(skb, len), 0, len);
        +skb_put_zero(skb, len);

    Apply it to the tree (with one manual fixup to keep the
    comment in vxlan.c, which spatch removed.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Use e45a79da863c ("skbuff/mac80211: introduce and use skb_put_zero()")
as the basis for the backported function.

Upstream: de77b966ce8a ("networking: convert many more places to skb_put_zero()")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Greg Rose <gvrose8192@gmail.com>
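
The conversion described above can be sketched in userspace as follows.
This is an illustration under stated assumptions, not the backported code:
struct skb_model, model_skb_put() and model_skb_put_zero() are hypothetical
stand-ins for struct sk_buff, skb_put() and skb_put_zero(), and the buffer
is a fixed array rather than a real socket buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct sk_buff: a fixed buffer and a used length. */
struct skb_model {
	unsigned char data[64];
	unsigned int len;	/* number of bytes currently in use */
};

/*
 * Models skb_put(): extend the used area by len bytes and return a pointer
 * to the start of the newly added region.  The region is NOT initialized.
 */
static void *model_skb_put(struct skb_model *skb, unsigned int len)
{
	void *tail = skb->data + skb->len;

	/* This model simply asserts that the request fits in the buffer. */
	assert(len <= sizeof(skb->data) - skb->len);
	skb->len += len;
	return tail;
}

/*
 * Models skb_put_zero(): same as model_skb_put(), but the new region is
 * zeroed before the pointer is returned.
 */
static void *model_skb_put_zero(struct skb_model *skb, unsigned int len)
{
	void *tmp = model_skb_put(skb, len);

	memset(tmp, 0, len);
	return tmp;
}
```

With such a helper, a call site written as
memset(model_skb_put(skb, len), 0, len) collapses to
model_skb_put_zero(skb, len), which is the shape of the third spatch rule.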
6 years ago datapath: Fix inconsistent teardown and release of private netdev state.
Greg Rose [Fri, 21 Jul 2017 23:46:06 +0000 (16:46 -0700)]
datapath: Fix inconsistent teardown and release of private netdev state.

Upstream commit:
    commit cf124db566e6b036b8bcbe8decbed740bdfac8c6
    Author: David S. Miller <davem@davemloft.net>
    Date:   Mon May 8 12:52:56 2017 -0400

    net: Fix inconsistent teardown and release of private netdev state.

    Network devices can allocate resources and private memory using
    netdev_ops->ndo_init().  However, the release of these resources
    can occur in one of two different places.

    Either netdev_ops->ndo_uninit() or netdev->destructor().

    The decision of which operation frees the resources depends upon
    whether it is necessary for all netdev refs to be released before it
    is safe to perform the freeing.

    netdev_ops->ndo_uninit() presumably can occur right after the
    NETDEV_UNREGISTER notifier completes and the unicast and multicast
    address lists are flushed.

    netdev->destructor(), on the other hand, does not run until the
    netdev references all go away.

    Further complicating the situation is that netdev->destructor()
    almost universally also does a free_netdev().

    This creates a problem for the logic in register_netdevice(),
    because all callers of register_netdevice() manage the freeing
    of the netdev and invoke free_netdev(dev) if register_netdevice()
    fails.

    If netdev_ops->ndo_init() succeeds, but something else fails inside
    of register_netdevice(), it does call netdev_ops->ndo_uninit().  But
    it is not able to invoke netdev->destructor().

    This is because netdev->destructor() will do a free_netdev() and
    then the caller of register_netdevice() will do the same.

    However, this means that the resources that would normally be released
    by netdev->destructor() will not be.

    Over the years drivers have added local hacks to deal with this, by
    invoking their destructor parts by hand when register_netdevice()
    fails.

    Many drivers do not try to deal with this, and instead we have leaks.

    Let's close this hole by formalizing the distinction between what
    private things need to be freed up by netdev->destructor() and whether
    the driver needs unregister_netdevice() to perform the free_netdev().

    netdev->priv_destructor() performs all actions to free up the private
    resources that used to be freed by netdev->destructor(), except for
    free_netdev().

    netdev->needs_free_netdev is a boolean that indicates whether
    free_netdev() should be done at the end of unregister_netdevice().

    Now, register_netdevice() can sanely release all resources after
    netdev_ops->ndo_init() succeeds, by invoking both
    netdev_ops->ndo_uninit() and netdev->priv_destructor().

    And at the end of unregister_netdevice(), we invoke
    netdev->priv_destructor() and optionally call free_netdev().

Signed-off-by: David S. Miller <davem@davemloft.net>

Applied the portion of the commit applicable to openvswitch.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
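
The split between priv_destructor and needs_free_netdev described in the
upstream message can be modelled in userspace roughly as below.  This is a
simplified sketch, not kernel code: struct netdev_model and every model_*
function are hypothetical, the fail_after_init field exists only to simulate
a late registration failure, and reference counting and notifiers are
omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct net_device. */
struct netdev_model {
	bool needs_free_netdev;	/* free the device at the end of unregister? */
	void (*priv_destructor)(struct netdev_model *dev);
	void *priv;		/* private state allocated by init */
	bool fail_after_init;	/* test knob: simulate a late register failure */
};

/* Models ndo_init(): allocate private state. */
static int model_init(struct netdev_model *dev)
{
	dev->priv = malloc(16);
	return dev->priv ? 0 : -1;
}

/* Models ndo_uninit(): nothing to undo in this sketch. */
static void model_uninit(struct netdev_model *dev)
{
	(void)dev;
}

/* Frees private state only; it never frees the device itself. */
static void model_priv_destructor(struct netdev_model *dev)
{
	free(dev->priv);
	dev->priv = NULL;
}

/* On failure after init, release private state but leave dev to the caller. */
static int model_register(struct netdev_model *dev)
{
	if (model_init(dev) != 0)
		return -1;
	if (dev->fail_after_init) {
		model_uninit(dev);
		if (dev->priv_destructor)
			dev->priv_destructor(dev);
		return -1;	/* the caller still owns dev and frees it once */
	}
	return 0;
}

/* Returns true if the device itself was freed here. */
static bool model_unregister(struct netdev_model *dev)
{
	model_uninit(dev);
	if (dev->priv_destructor)
		dev->priv_destructor(dev);
	if (dev->needs_free_netdev) {
		free(dev);
		return true;
	}
	return false;
}
```

Because the private destructor never frees the device, the failure path in
model_register() can call it without risking a double free when the caller
then frees the device, which is the leak-versus-double-free dilemma the
upstream message describes.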