]> git.proxmox.com Git - mirror_ovs.git/log
mirror_ovs.git
4 years agodp-packet: prefetch the next packet when cloning a batch.
Flavio Leitner [Fri, 7 Feb 2020 12:01:13 +0000 (09:01 -0300)]
dp-packet: prefetch the next packet when cloning a batch.

There is a cache miss when accessing mbuf->data_off while cloning
a batch and using prefetch improved the throughput by ~2.3%.

Before: 13709416.30 pps
 After: 14031475.80 pps

Fixes: d48771848560 ("dp-packet: preserve headroom when cloning a pkt batch")
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath-windows: Append tunnel info to upcall for correct template
Amber Hu via dev [Tue, 4 Feb 2020 05:03:03 +0000 (05:03 +0000)]
datapath-windows: Append tunnel info to upcall for correct template

Formerly, there is no tunnel information appended in the upcall’s
packet data, which is expected by IPFIX in userspace to calculate
the template for exporting the sampled flow record of on egress
tunnel port.
To fix this, during performing OvsOutputUserspaceAction(), we
would check whether it is initiated by the sampling on egress
tunnel which would be indicated by the attribute as
OVS_USERSPACE_ATTR_EGRESS_TUN_PORT in the nested attribute
list. If so, we would append the tunKey in OvsForwardingContext
indexed by OVS_PACKET_ATTR_EGRESS_TUN_KEY to the upcall.
Besides, at this point, the source transport port and  source ip
address are not available in the structure, so we have to fill it in the
way how the packet would be capsulated during performing
OvsEncapGeneve(), which is following the
OvsOutputUserspaceAction() unfortunately.
I have tested the IPFIX functionality with the change, we could see the
template is correct and the expected tunnel information could be
packed in the IPFIX packet finally. The traffic for test is generated by
PING utility.

>From d727d051c9a44a4a93e5ee5f3da3ca9b125aad29 Mon Sep 17 00:00:00 2001
From: Amber Hu <qhu@vmware.com>
Date: Thu, 30 Jan 2020 18:01:32 -0800
Subject: [PATCH v3] datapath-windows: Append tunnel info to upcall for correct
 template

Signed-off-by: Amber Hu <qhu@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
4 years agoAUTHORS: Add Martin Varghese.
Ben Pfaff [Fri, 7 Feb 2020 18:43:33 +0000 (10:43 -0800)]
AUTHORS: Add Martin Varghese.

Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agonetdev-dpdk: Don't enable offloading on HW device if not requested.
Ilya Maximets [Thu, 6 Feb 2020 13:24:23 +0000 (14:24 +0100)]
netdev-dpdk: Don't enable offloading on HW device if not requested.

DPDK drivers has different implementations of transmit functions.
Enabled offloading may cause driver to choose slower variant
significantly affecting performance if userspace TSO wasn't requested.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Reported-by: David Marchand <david.marchand@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-linux: Prepend the std packet in the TSO packet
Flavio Leitner [Mon, 3 Feb 2020 21:45:50 +0000 (18:45 -0300)]
netdev-linux: Prepend the std packet in the TSO packet

Usually TSO packets are close to 50k, 60k bytes long, so to
to copy less bytes when receiving a packet from the kernel
change the approach. Instead of extending the MTU sized
packet received and append with remaining TSO data from
the TSO buffer, allocate a TSO packet with enough headroom
to prepend the std packet data.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agonetdev-linux-private: fix max length to be 16 bits
Flavio Leitner [Mon, 3 Feb 2020 14:22:22 +0000 (11:22 -0300)]
netdev-linux-private: fix max length to be 16 bits

The dp_packet length is limited to 16 bits, so document that
and fix the length value accordingly.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agonetdev-dpdk: Fix port init when lacking Tx offloads for TSO.
David Marchand [Tue, 4 Feb 2020 21:28:26 +0000 (22:28 +0100)]
netdev-dpdk: Fix port init when lacking Tx offloads for TSO.

The check on TSO capability did not ensure ip checksum, tcp checksum and
TSO tx offloads were available which resulted in a port init failure
(example below with a ena device):

*2020-02-04T17:42:52.976Z|00084|dpdk|ERR|Ethdev port_id=0 requested Tx
offloads 0x2a doesn't match Tx offloads capabilities 0xe in
rte_eth_dev_configure()*

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Reported-by: Ravi Kerur <rkerur@gmail.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agovswitchd: Add serial number configuration.
Kirill A. Kornilov [Mon, 13 Jan 2020 09:29:10 +0000 (12:29 +0300)]
vswitchd: Add serial number configuration.

Signed-off-by: Kirill A. Kornilov <kornilov@zelax.ru>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoflow: Fix parsing l3_ofs with partial offloading
Eli Britstein [Tue, 14 Jan 2020 13:21:15 +0000 (13:21 +0000)]
flow: Fix parsing l3_ofs with partial offloading

l3_ofs should be set all Ethernet packets, not just IPv4/IPv6 ones.
For example for ARP over VLAN tagged packets, it may cause wrong
processing like in changing the VLAN ID action. Fix it.

Fixes: aab96ec4d81e ("dpif-netdev: retrieve flow directly from the flow mark")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agocompat: Remove HAVE_BOOL_TYPE
Greg Rose [Thu, 30 Jan 2020 23:55:27 +0000 (15:55 -0800)]
compat: Remove HAVE_BOOL_TYPE

OVS only supports Linux kernels since 3.10 and all kernels since
then have the bool type.  This check is unnecessary so remove it.

Passes Travis:
https://travis-ci.org/gvrose8192/ovs-experimental/builds/644103253

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodocs: Add header install command for afxdp.
William Tu [Thu, 23 Jan 2020 17:03:11 +0000 (09:03 -0800)]
docs: Add header install command for afxdp.

The 'XDP_RING_NEED_WAKEUP' and related flags are defined if_xdp.h, so if
users are building their own kernel, users have to update the kernel's
header files, by doing:

  $ make headers_install INSTALL_HDR_PATH=/usr

Otherwise the following error shows:
/usr/local/include/bpf/xsk.h: In function 'xsk_ring_prod__needs_wakeup':
/usr/local/include/bpf/xsk.h:82:21: error: 'XDP_RING_NEED_WAKEUP' undeclared \
  (first use in this function)
  return *r->flags & XDP_RING_NEED_WAKEUP;

Reported-by: Tomek Osinski <osinstom@gmail.com>
Reported-at: https://osinstom.github.io/en/tutorial/ovs-afxdp-installation/
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoconntrack: Fix conntrack new state
Yi-Hung Wei [Fri, 20 Dec 2019 17:51:08 +0000 (09:51 -0800)]
conntrack: Fix conntrack new state

In connection tracking system, a connection is established if we
see packets from both directions.  However, in userspace datapath's
conntrack, if we send a connection setup packet in one direction
twice, it will make the connection to be in established state.

This patch fixes the aforementioned issue, and adds a system traffic
test for UDP and TCP traffic to avoid regression.

Fixes: a489b16854b59 ("conntrack: New userspace connection tracker.")
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agodpif: Fix dp_extra_info leak by reworking the allocation scheme.
Ilya Maximets [Thu, 23 Jan 2020 18:10:05 +0000 (19:10 +0100)]
dpif: Fix dp_extra_info leak by reworking the allocation scheme.

dpctl module leaks the 'dp_extra_info' in case the dumped flow doesn't
fit the dump filter while executing dpctl/dump-flows and also while
executing dpctl/get-flow.

This is already a 3rd attempt to fix all the leaks and incorrect usage
of this string that definitely indicates poor initial design of the
feature.

Flow dump/get documentation clearly states that the caller does not own
the data provided in dpif_flow.  Datapath still owns all the data and
promises to not free/modify it until the next quiescent period, however
we're requesting the caller to free 'dp_extra_info' and this obviously
breaks the rules.

This patch fixes the issue by by storing 'dp_extra_info' within
'struct dp_netdev_flow' making datapath to own it.  'dp_netdev_flow'
is RCU-protected, so it will be valid until the next quiescent period.

Fixes: 0e8f5c6a38d0 ("dpif-netdev: Modified ovs-appctl dpctl/dump-flows command")
Tested-by: Emma Finn <emma.finn@intel.com>
Acked-by: Emma Finn <emma.finn@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agoofproto: Do not delete datapath flows on exit by default.
Ben Pfaff [Thu, 9 Jan 2020 20:49:44 +0000 (12:49 -0800)]
ofproto: Do not delete datapath flows on exit by default.

Commit e96a5c24e853 ("upcall: Remove datapath flows when setting
n-threads.") caused OVS to delete datapath flows when it exits through
any graceful means.  This is not necessarily desirable, especially when
OVS is being stopped as part of an upgrade.  This commit changes OVS so
that it only removes datapath flows when requested, via "ovs-appctl
exit --cleanup".

Acked-by: Numan Siddique <numans@ovn.org>
Tested-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoofproto-dpif-upcall: Get rid of udpif_synchronize().
Ben Pfaff [Thu, 9 Jan 2020 20:49:43 +0000 (12:49 -0800)]
ofproto-dpif-upcall: Get rid of udpif_synchronize().

RCU provides the semantics we want from udpif_synchronize() and it
should be much more lightweight than killing and restarting all the
upcall threads.  It looks like udpif_synchronize() was written before
the OVS tree had RCU support, which is probably why we didn't use it
here from the beginning.  So we can just change udpif_synchronize()
to a single ovsrcu_synchronize() call.

However, udpif_synchronize() only has a single caller, which calls
ovsrcu_synchronize() anyway just beforehand, via xlate_txn_commit().
So we can get rid of udpif_synchronize() entirely, which this patch
does.

As a side effect, this eliminates one reason why terminating OVS cleanly
clears the datapath flow table.  An upcoming patch will eliminate
other reasons.

Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agolib/stream-windows.c: Grant Access Privilege of Named Pipe to Creator
Ning Wu [Wed, 22 Jan 2020 07:46:58 +0000 (23:46 -0800)]
lib/stream-windows.c: Grant Access Privilege of Named Pipe to Creator

Current implementation of ovs on windows only allows LocalSystem and
Administrators to access the named pipe created with API of ovs.
Thus any service that needs to invoke the API to create named pipe
has to run as System account to interactive with ovs. It causes the
system more vulnerable if one of those services was break into.
The patch adds the creator owner account to allowed ACLs.

Signed-off-by: Ning Wu <nwu@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Anand Kumar <kumaranand@vmware.com>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
4 years agodpif: Fix memory leak while dumping dpif flows.
Damijan Skvarc [Wed, 22 Jan 2020 14:06:43 +0000 (15:06 +0100)]
dpif: Fix memory leak while dumping dpif flows.

Leak was detected by running test: "ofproto-dpif - balance-tcp bonding"

Fixes: 0e8f5c6a38d0 ("dpif-netdev: Modified ovs-appctl dpctl/dump-flows command")
Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agotc: handle packet mark of zero
John Hurley [Thu, 19 Dec 2019 14:58:43 +0000 (14:58 +0000)]
tc: handle packet mark of zero

Openstack may set an skb mark of 0 in tunnel rules. This is considered to
be an unused/unset value. However, it prevents the rule from being
offloaded.

Check if the key value of the skb mark is 0 when it is in use (mask is
set to all ones). If it is then ignore the field and continue with TC offload.

Only the exact-match case is covered by this patch as it addresses the
Openstack use-case and seems most robust against feature evolution: f.e. in
future there may exist hardware offload scenarios where an operation, such
as a BPF offload, sets the SKB mark before proceeding tho the in-HW OVS.
datapath.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Co-Authored-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Aaron Conole <aconole@redhat.com>
4 years agoPrepare for post-2.13.0 (2.13.90).
Ben Pfaff [Tue, 21 Jan 2020 20:45:00 +0000 (12:45 -0800)]
Prepare for post-2.13.0 (2.13.90).

Acked-by: Gurucharan Shetty <guru@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoPrepare for 2.13.0.
Ben Pfaff [Tue, 21 Jan 2020 20:24:52 +0000 (12:24 -0800)]
Prepare for 2.13.0.

Acked-by: Gurucharan Shetty <guru@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoDocumentation: add notes for TSO & i40e
Ciara Loftus [Mon, 20 Jan 2020 13:09:36 +0000 (13:09 +0000)]
Documentation: add notes for TSO & i40e

When using TSO in OVS-DPDK with an i40e device, the following
patch is required for DPDK, which fixes an issue on the TSO path:
https://patches.dpdk.org/patch/64136/
Document this as a limitation until a DPDK release with the fix
included is supported by OVS.

Also, document best known methods for performance tuning when
testing TSO with the tool iperf.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agodpif: Fix leak and usage of uninitialized dp_extra_info.
Ilya Maximets [Fri, 17 Jan 2020 22:00:05 +0000 (23:00 +0100)]
dpif: Fix leak and usage of uninitialized dp_extra_info.

'dpif_probe_feature'/'revalidate' doesn't free the 'dp_extra_info'
string.  Also, all the implementations of dpif_flow_get() should
initialize the value to avoid printing/freeing of random memory.

 30 bytes in 1 blocks are definitely lost in loss record 323 of 889
    at 0x483AD19: realloc (vg_replace_malloc.c:836)
    by 0xDDAD89: xrealloc (util.c:149)
    by 0xCE1609: ds_reserve (dynamic-string.c:63)
    by 0xCE1A90: ds_put_format_valist (dynamic-string.c:161)
    by 0xCE19B9: ds_put_format (dynamic-string.c:142)
    by 0xCCCEA9: dp_netdev_flow_to_dpif_flow (dpif-netdev.c:3170)
    by 0xCCD2DD: dpif_netdev_flow_get (dpif-netdev.c:3278)
    by 0xCCEA0A: dpif_netdev_operate (dpif-netdev.c:3868)
    by 0xCDF81B: dpif_operate (dpif.c:1361)
    by 0xCDEE93: dpif_flow_get (dpif.c:1002)
    by 0xCDECF9: dpif_probe_feature (dpif.c:962)
    by 0xC635D2: check_recirc (ofproto-dpif.c:896)
    by 0xC65C02: check_support (ofproto-dpif.c:1567)
    by 0xC63274: open_dpif_backer (ofproto-dpif.c:818)
    by 0xC65E3E: construct (ofproto-dpif.c:1605)
    by 0xC4D436: ofproto_create (ofproto.c:549)
    by 0xC3931A: bridge_reconfigure (bridge.c:877)
    by 0xC3FEAC: bridge_run (bridge.c:3324)
    by 0xC4551D: main (ovs-vswitchd.c:127)

CC: Emma Finn <emma.finn@intel.com>
Fixes: 0e8f5c6a38d0 ("dpif-netdev: Modified ovs-appctl dpctl/dump-flows command")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
4 years agonetdev-afxdp: NUMA-aware memory allocation for XSK related memory.
Yi-Hung Wei [Sat, 4 Jan 2020 01:13:26 +0000 (17:13 -0800)]
netdev-afxdp: NUMA-aware memory allocation for XSK related memory.

Currently, the AF_XDP socket (XSK) related memory are allocated by main
thread in the main thread's NUMA domain.  With the patch that detects
netdev-linux's NUMA node id, the PMD thread of AF_XDP port will be run on
the AF_XDP netdev's NUMA domain.  If the net device's NUMA domain
is different from the main thread's NUMA domain, we will have two
cross-NUMA memory accesses (netdev <-> memory, memory <-> CPU).

This patch addresses the aforementioned issue by allocating
the memory in the net device's NUMA domain.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-linux: Detect numa node id.
William Tu [Sat, 4 Jan 2020 01:13:25 +0000 (17:13 -0800)]
netdev-linux: Detect numa node id.

The patch detects the numa node id from the name of the netdev,
by reading the '/sys/class/net/<devname>/device/numa_node'.
If not available, ex: virtual device, or any error happens,
return numa id 0.  Currently only the afxdp netdev type uses it,
other linux netdev types are disabled due to no use case.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agouserspace: Add TCP Segmentation Offload support
Flavio Leitner [Fri, 17 Jan 2020 21:47:55 +0000 (18:47 -0300)]
userspace: Add TCP Segmentation Offload support

Abbreviated as TSO, TCP Segmentation Offload is a feature which enables
the network stack to delegate the TCP segmentation to the NIC reducing
the per packet CPU overhead.

A guest using vhostuser interface with TSO enabled can send TCP packets
much bigger than the MTU, which saves CPU cycles normally used to break
the packets down to MTU size and to calculate checksums.

It also saves CPU cycles used to parse multiple packets/headers during
the packet processing inside virtual switch.

If the destination of the packet is another guest in the same host, then
the same big packet can be sent through a vhostuser interface skipping
the segmentation completely. However, if the destination is not local,
the NIC hardware is instructed to do the TCP segmentation and checksum
calculation.

It is recommended to check if NIC hardware supports TSO before enabling
the feature, which is off by default. For additional information please
check the tso.rst document.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Ciara Loftus <ciara.loftus.intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agoTypo fix: vswtch -> vswitch.
Ben Pfaff [Fri, 17 Jan 2020 18:29:40 +0000 (10:29 -0800)]
Typo fix: vswtch -> vswitch.

Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agovhost: Disable multi-segmented buffers
Flavio Leitner [Thu, 16 Jan 2020 17:00:34 +0000 (14:00 -0300)]
vhost: Disable multi-segmented buffers

There is no support for multi-segmented buffers, so flag
that to vhost library.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Ciara Loftus <ciara.loftus.intel.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agodp-packet: preserve headroom when cloning a pkt batch
Flavio Leitner [Thu, 16 Jan 2020 17:00:33 +0000 (14:00 -0300)]
dp-packet: preserve headroom when cloning a pkt batch

The headroom is useful if the packet needs to insert additional
header, so preserve the original headroom when cloning the batch.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Ciara Loftus <ciara.loftus.intel.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agoAUTHORS: Add Emma Finn.
Ian Stokes [Fri, 17 Jan 2020 15:30:03 +0000 (15:30 +0000)]
AUTHORS: Add Emma Finn.

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agodpif-netdev: Modified ovs-appctl dpctl/dump-flows command
Emma Finn [Wed, 6 Nov 2019 11:35:29 +0000 (11:35 +0000)]
dpif-netdev: Modified ovs-appctl dpctl/dump-flows command

Modified ovs-appctl dpctl/dump-flows command to output
the miniflow bits for a given flow when -m option is passed.

$ ovs-appctl dpctl/dump-flows -m

Signed-off-by: Emma Finn <emma.finn@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agoNEWS: Fix location of output actions to be under DPDK.
Eli Britstein [Fri, 17 Jan 2020 08:26:58 +0000 (08:26 +0000)]
NEWS: Fix location of output actions to be under DPDK.

In the cited commit, the output actions NEWS was mistakenly under OVSDB
instead of DPDK. Fix it.

Fixes: 3c7330ebf036 ("netdev-offload-dpdk: Support offload of output action.")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agoDocumentation: Fix building with Python 3.9
Timothy Redaelli [Thu, 16 Jan 2020 13:21:47 +0000 (14:21 +0100)]
Documentation: Fix building with Python 3.9

open(), io.open(), codecs.open() and fileinput.FileInput no longer accept 'U'
("universal newline") in the file mode.
This flag was deprecated since Python 3.3.
In Python 3, the "universal newline" is used by default when a file is open
in text mode.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Reported-at: https://bugzilla.redhat.com/1791681
Reported-by: Miro Hrončok <mhroncok@redhat.com>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovsdb-server: Allow OVSDB clients to specify the UUID for inserted rows.
Ben Pfaff [Thu, 9 Jan 2020 20:48:30 +0000 (12:48 -0800)]
ovsdb-server: Allow OVSDB clients to specify the UUID for inserted rows.

Acked-by: Han Zhou <hzhou@ovn.org>
Requested-by: Leonid Ryzhyk <lryzhyk@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agonetdev-offload-dpdk: Support offload of set TCP/UDP ports actions.
Eli Britstein [Thu, 9 Jan 2020 07:46:55 +0000 (07:46 +0000)]
netdev-offload-dpdk: Support offload of set TCP/UDP ports actions.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-offload-dpdk: Support offload of set IPv4 actions.
Eli Britstein [Thu, 9 Jan 2020 07:46:54 +0000 (07:46 +0000)]
netdev-offload-dpdk: Support offload of set IPv4 actions.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-offload-dpdk: Support offload of set MAC actions.
Eli Britstein [Thu, 9 Jan 2020 07:46:53 +0000 (07:46 +0000)]
netdev-offload-dpdk: Support offload of set MAC actions.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-offload-dpdk: Support offload of drop action.
Eli Britstein [Thu, 9 Jan 2020 07:46:52 +0000 (07:46 +0000)]
netdev-offload-dpdk: Support offload of drop action.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-offload-dpdk: Support offload of output action.
Eli Britstein [Thu, 9 Jan 2020 07:46:51 +0000 (07:46 +0000)]
netdev-offload-dpdk: Support offload of output action.

Support offload of output action, also configuring count action for
allowing query statistics of HW offloaded flows.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-offload: Introduce a function to validate same flow api handle.
Eli Britstein [Thu, 9 Jan 2020 07:46:50 +0000 (07:46 +0000)]
netdev-offload: Introduce a function to validate same flow api handle.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-dpdk: Getter function for dpdk port id API.
Eli Britstein [Thu, 9 Jan 2020 07:46:49 +0000 (07:46 +0000)]
netdev-dpdk: Getter function for dpdk port id API.

Add a getter function for using the dpdk port id outside the scope of
netdev-dpdk.c to be used for HW offload.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agodpif-netdev: Populate dpif class field in offload struct.
Eli Britstein [Thu, 9 Jan 2020 07:46:48 +0000 (07:46 +0000)]
dpif-netdev: Populate dpif class field in offload struct.

Populate dpif class field in offload struct to be used in offloading
flow put.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agodpif-netdev: Update offloaded flows statistics.
Ophir Munk [Thu, 9 Jan 2020 07:46:47 +0000 (07:46 +0000)]
dpif-netdev: Update offloaded flows statistics.

In case a flow is HW offloaded, packets do not reach the SW, thus not
counted for statistics. Use netdev flow get API in order to update the
statistics of flows by the HW statistics.

Co-authored-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agodpctl: Support dump-flows filters "dpdk" and "partially-offloaded".
Eli Britstein [Thu, 9 Jan 2020 07:46:46 +0000 (07:46 +0000)]
dpctl: Support dump-flows filters "dpdk" and "partially-offloaded".

Flows that are offloaded via DPDK can be partially offloaded (matches
only) or fully offloaded (matches and actions). Set partially offloaded
display to (offloaded=partial, dp:ovs), and fully offloaded to
(offloaded=yes, dp:dpdk). Also support filter types "dpdk" and
"partially-offloaded".

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-offload-dpdk: Implement flow get method.
Eli Britstein [Thu, 9 Jan 2020 07:46:45 +0000 (07:46 +0000)]
netdev-offload-dpdk: Implement flow get method.

Implement the flow get method for DPDK, to get the statistics of the
provided ufid, towards reading statistics of fully offloaded flows.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-offload-dpdk: Framework for actions offload.
Eli Britstein [Thu, 9 Jan 2020 07:46:44 +0000 (07:46 +0000)]
netdev-offload-dpdk: Framework for actions offload.

Currently HW offload is accelerating only the rule matching sequence.
Introduce a framework for offloading rule actions as a pre-step for
processing the rule actions in HW. In case of a failure, fallback to the
legacy partial offload scheme.

Note: a flow will be fully offloaded only if it can process all its
actions in HW.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-offload-dpdk: Return UFID-rte_flow entry in find method.
Eli Britstein [Thu, 9 Jan 2020 07:46:43 +0000 (07:46 +0000)]
netdev-offload-dpdk: Return UFID-rte_flow entry in find method.

Change the find method to return the whole entry of UFID-rte_flow
association instead of only the rte_flow field in it, as a pre-step
towards adding and using more fields into that map entry.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-dpdk: Introduce rte flow query function.
Eli Britstein [Thu, 9 Jan 2020 07:46:42 +0000 (07:46 +0000)]
netdev-dpdk: Introduce rte flow query function.

Introduce a rte flow query function as a pre-step towards reading HW
statistics of fully offloaded flows.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-offload-dpdk: Improve HW offload flow debuggability.
Eli Britstein [Thu, 9 Jan 2020 07:46:41 +0000 (07:46 +0000)]
netdev-offload-dpdk: Improve HW offload flow debuggability.

Add debug prints when creating and destroying rte flows, with all the
flow details (attributes, patterns, actions).

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-offload-dpdk: Dynamically allocate pattern items.
Eli Britstein [Thu, 9 Jan 2020 07:46:40 +0000 (07:46 +0000)]
netdev-offload-dpdk: Dynamically allocate pattern items.

Instead of statically allocated pattern items on the stack, dynamically
allocate only the required items while parsing the matches, to simplify
the parsing and make it self-contained, without need of external types,
making it easier to support more matches in the future.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-offload-dpdk: Refactor action items freeing scheme.
Eli Britstein [Thu, 9 Jan 2020 07:46:39 +0000 (07:46 +0000)]
netdev-offload-dpdk: Refactor action items freeing scheme.

Action item data structures are pointed by rte_flow_action items.
Refactor the code to free the data structures when freeing the
rte_flow_action items, allowing simpler future actions simpler to add to
the code.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-offload-dpdk: Refactor flow patterns.
Eli Britstein [Thu, 9 Jan 2020 07:46:38 +0000 (07:46 +0000)]
netdev-offload-dpdk: Refactor flow patterns.

Refactor the flow patterns code to a helper function for better
readability and towards supporting more matches.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agodpif-netdev: Make datapath port mutex recursive.
Ilya Maximets [Wed, 15 Jan 2020 17:11:51 +0000 (18:11 +0100)]
dpif-netdev: Make datapath port mutex recursive.

Upcoming HW offloading will request flow statistics from the dpdk
offloading module.  This operation requires holding datapath port
mutex.  However, there is a possible scenario in which flow deletion
happens during datapath reconfiguration process and the mutex already
acquired:

  0  raise () from /lib64/libc.so.6
  1  abort () from /lib64/libc.so.6
  2  ovs_abort_valist ()
  3  ovs_abort ()
  4  ovs_mutex_lock_at ()
  5  dpif_netdev_get_flow_offload_status ()
  6  get_dpif_flow_status ()
  7  flow_del_on_pmd ()
  8  dpif_netdev_flow_del ()
  9  dpif_netdev_operate ()
  10 dpif_operate ()
  11 push_dp_ops ()
  12 push_ukey_ops ()
  13 dp_purge_cb ()
  14 dp_netdev_del_pmd ()
  15 reconfigure_pmd_threads ()
  16 reconfigure_datapath ()
  17 do_del_port ()
  18 dpif_netdev_port_del ()
  19 dpif_port_del ()
  20 port_del ()
  21 ofproto_port_del ()
  22 bridge_delete_or_reconfigure_ports ()
  23 bridge_reconfigure ()
  24 bridge_run ()
  25 main ()

This happens while removing the last port of a particular PMD thread.
Reconfiguration process decides that we need to remove current PMD
thread and calls datapath purge callback in order to clean up resources
assigned to it.  This turns into flow removal and flow_del() tries to
request statistics.

Turning the dp->port_mutex into recursive version as a quick fix for
this issue.  Better solutions might be to avoid statistics request
somehow, or fully disassociate offloaded flows from the datapath flows.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ian Stokes <ian.stokes@intel.com>
4 years agodocs: Correct links in mailing-lists.rst
Lars Kellogg-Stedman [Wed, 15 Jan 2020 18:42:06 +0000 (13:42 -0500)]
docs: Correct links in mailing-lists.rst

Adds `mailto:` to email address links in mailing-lists.rst. The
existing syntax resulted in broken links of the form
`http://docs.openvswitch.org/en/latest/internals/mailing-lists/<address>`,
which would result in a 404 error.

Signed-off-by: Lars Kellogg-Stedman <lars@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agonetdev-dpdk: Add new DPDK RFC 4115 egress policer
Eelco Chaudron [Tue, 14 Jan 2020 16:12:47 +0000 (11:12 -0500)]
netdev-dpdk: Add new DPDK RFC 4115 egress policer

This patch adds a new policer to the DPDK datapath based on RFC 4115's
Two-Rate, Three-Color marker. It's a two-level hierarchical policer
which first does a color-blind marking of the traffic at the queue
level, followed by a color-aware marking at the port level. At the end
traffic marked as Green or Yellow is forwarded, Red is dropped. For
details on how traffic is marked, see RFC 4115.

This egress policer can be used to limit traffic at different rated
based on the queues the traffic is in. In addition, it can also be used
to prioritize certain traffic over others at a port level.

For example, the following configuration will limit the traffic rate at a
port level to a maximum of 2000 packets a second (64 bytes IPv4 packets).
100pps as CIR (Committed Information Rate) and 1000pps as EIR (Excess
Information Rate). High priority traffic is routed to queue 10, which marks
all traffic as CIR, i.e. Green. All low priority traffic, queue 20, is
marked as EIR, i.e. Yellow.

ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
  --id=@myqos create qos type=trtcm-policer \
  other-config:cir=52000 other-config:cbs=2048 \
  other-config:eir=52000 other-config:ebs=2048  \
  queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
  --id=@dpdk1Q10 create queue \
    other-config:cir=41600000 other-config:cbs=2048 \
    other-config:eir=0 other-config:ebs=0 -- \
  --id=@dpdk1Q20 create queue \
    other-config:cir=0 other-config:cbs=0 \
    other-config:eir=41600000 other-config:ebs=2048 \

This configuration accomplishes that the high priority traffic has a
guaranteed bandwidth egressing the ports at CIR (1000pps), but it can also
use the EIR, so a total of 2000pps at max. These additional 1000pps is
shared with the low priority traffic. The low priority traffic can use at
maximum 1000pps.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agonetdev-dpdk: Add support for multi-queue QoS to the DPDK datapath
Eelco Chaudron [Tue, 14 Jan 2020 16:12:40 +0000 (11:12 -0500)]
netdev-dpdk: Add support for multi-queue QoS to the DPDK datapath

This patch adds support for multi-queue QoS to the DPDK datapath. Most of
the code is based on an earlier patch from a patchset sent out by
zhaozhanxu. The patch was titled "[ovs-dev, v2, 1/4] netdev-dpdk.c: Support
the multi-queue QoS configuration for dpdk datapath"

Signed-off-by: zhaozhanxu <zhaozhanxu@163.com>
Co-authored-by: zhaozhanxu <zhaozhanxu@163.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agonetdev-offload-tc: Fix crash if offloading is not configured on outdev.
Ilya Maximets [Tue, 14 Jan 2020 20:18:14 +0000 (21:18 +0100)]
netdev-offload-tc: Fix crash if offloading is not configured on outdev.

If output device is not yet added to netdev-offload, netdev_ports_get()
will not find it leading to NULL pointer dereference inside
netdev_get_ifindex().

Fixes: 8f283af89298 ("netdev-tc-offloads: Implement netdev flow put using tc interface")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agosocket-util: Emulate recvmmsg() and sendmmsg() on Linux only.
Ben Pfaff [Mon, 13 Jan 2020 18:03:04 +0000 (10:03 -0800)]
socket-util: Emulate recvmmsg() and sendmmsg() on Linux only.

These functions failed to build on OS X because MSG_WAITFORONE is not
defined there.  There are pitfalls for trying to define our own MSG_*
constants, since it's hard to pick a constant that is not used by the
system already.  Because OVS only uses recvmmsg() and sendmmsg() on
Linux, it seems easiest to just emulate them there.

Reported-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoodp-util: Fix passing uninitialized bytes in OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV*.
Ilya Maximets [Thu, 9 Jan 2020 15:59:15 +0000 (16:59 +0100)]
odp-util: Fix passing uninitialized bytes in OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV*.

Both ovs_key_ct_tuple_ipv* structures contains padding at the end
that mast be cleared before passing attributes to kernel:

 Syscall param sendmsg(msg.msg_iov[0]) points to uninitialised byte(s)
    at 0x566A607: sendmsg (sendmsg.c:28)
    by 0xFC95CE: nl_sock_transact_multiple__ (netlink-socket.c:858)
    by 0xFC8580: nl_sock_transact_multiple (netlink-socket.c:1079)
    by 0xFC83FF: nl_transact_multiple (netlink-socket.c:1839)
    by 0xFA8648: dpif_netlink_operate__ (dpif-netlink.c:1926)
    by 0xFA789F: dpif_netlink_operate_chunks (dpif-netlink.c:2219)
    by 0xFA25CB: dpif_netlink_operate (dpif-netlink.c:2278)
    by 0xE5BB4C: dpif_operate (dpif.c:1377)
    by 0xE5B7F6: dpif_flow_put (dpif.c:1048)
    by 0xE5B49A: dpif_probe_feature (dpif.c:965)
    by 0xDD6BF5: check_ct_orig_tuple (ofproto-dpif.c:1557)
    by 0xDD41EC: check_support (ofproto-dpif.c:1590)
    by 0xDD3BF3: open_dpif_backer (ofproto-dpif.c:818)
    by 0xDC8467: construct (ofproto-dpif.c:1605)
    by 0xDAD6BB: ofproto_create (ofproto.c:549)
    by 0xD96A19: bridge_reconfigure (bridge.c:877)
    by 0xD9625D: bridge_run (bridge.c:3324)
    by 0xDA5829: main (ovs-vswitchd.c:127)
  Address 0x1ffefe36a5 is on thread 1's stack
  in frame #4, created by dpif_netlink_operate__ (dpif-netlink.c:1839)
  Uninitialised value was created by a stack allocation
    at 0xEB87D0: odp_flow_key_from_flow__ (odp-util.c:5996)

Fixes: daf4d3c18da4 ("odp: Support conntrack orig tuple key.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agoovs-tcpdump: Fallback to read /proc/net/dev on Linux
Timothy Redaelli [Mon, 13 Jan 2020 16:47:03 +0000 (17:47 +0100)]
ovs-tcpdump: Fallback to read /proc/net/dev on Linux

Currently, ovs-tcpdump uses python3-netifaces in order to get the list of
the network interfaces, but python3-netifaces may not be installed nor
available on some distributions (for example on RHEL7).

This commit adds, only for Linux, an alternative way (that is only used
when netifaces is not available) to read the list of the network interfaces
by reading "/proc/net/dev".

Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agonetdev-dpdk: Avoid undefined behavior processing devargs
Aaron Conole [Thu, 9 Jan 2020 18:27:06 +0000 (13:27 -0500)]
netdev-dpdk: Avoid undefined behavior processing devargs

In "Use of library functions" in the C standard, the following statement
is written to apply to all library functions:

  If an argument to a function has an invalid value (such as ... a
  null pointer ... the behavior is undefined.

Later, under the "String handling" section, "Comparison functions" no
exception is listed for strcmp, which means NULL is invalid.  It may
be possible for the smap_get to return NULL.

Given the above, we must check that new_devargs is not null.  The check
against NULL for new_devargs later in the function is still valid.

Fixes: 55e075e65ef9 ("netdev-dpdk: Arbitrary 'dpdk' port naming")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agoUse batch process recv for tap and raw socket in netdev datapath
Yi Yang [Wed, 18 Dec 2019 02:35:27 +0000 (21:35 -0500)]
Use batch process recv for tap and raw socket in netdev datapath

Current netdev_linux_rxq_recv_tap and netdev_linux_rxq_recv_sock
just receive single packet, that is very inefficient, per my test
case which adds two tap ports or veth ports into OVS bridge
(datapath_type=netdev) and use iperf3 to do performance test
between two ports (they are set into different network name space).

The result is as below:

  tap:  295 Mbits/sec
  veth: 207 Mbits/sec

After I change netdev_linux_rxq_recv_tap and
netdev_linux_rxq_recv_sock to use batch process, the performance
is boosted by about 7 times, here is the result:

  tap:  1.96 Gbits/sec
  veth: 1.47 Gbits/sec

Undoubtedly this is a huge improvement although it can't match
OVS kernel datapath yet.

FYI: here is thr result for OVS kernel datapath:

  tap:  37.2 Gbits/sec
  veth: 36.3 Gbits/sec

Note: performance result is highly related with your test machine,
you shouldn't expect the same results on your test machine.

Signed-off-by: Yi Yang <yangyi01@inspur.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agosocket-util: Introduce emulation and wrapper for recvmmsg().
Ben Pfaff [Tue, 17 Dec 2019 20:38:37 +0000 (12:38 -0800)]
socket-util: Introduce emulation and wrapper for recvmmsg().

Not every system will have recvmmsg(), so introduce compatibility code
that will allow it to be used blindly from the rest of the tree.

This assumes that recvmmsg() and sendmmsg() are either both present or
both absent in system libraries and headers.

CC: Yi Yang <yangyi01@inspur.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoacinclude: Use RTE_IBVERBS_LINK_DLOPEN
Timothy Redaelli [Wed, 8 Jan 2020 12:42:14 +0000 (13:42 +0100)]
acinclude: Use RTE_IBVERBS_LINK_DLOPEN

On DPDK 19.11 RTE_IBVERBS_LINK_DLOPEN is used by Mellanox PMDs (mlx4 and
mlx5) instead of RTE_LIBRTE_MLX{4,5}_DLOPEN_DEPS.

Without this commit is not possible to statically link OVS with DPDK when MLX4
or MLX5 PMDs are enabled.

Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agoAUTHORS: update email for Lance Richardson
Lance Richardson [Wed, 8 Jan 2020 19:51:09 +0000 (14:51 -0500)]
AUTHORS: update email for Lance Richardson

Update email address for Lance Richardson.

Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodpif-netdev: Get rid of broken dpif pointer in dp_netdev structure.
Ilya Maximets [Sun, 8 Dec 2019 17:51:09 +0000 (18:51 +0100)]
dpif-netdev: Get rid of broken dpif pointer in dp_netdev structure.

This pointer was introduced in July 2014 by commit
6b31e07347ad ("dpif-netdev: Polling threads directly call ofproto upcall functions.")
and it was broken right from this point because dpif_netdev_open()
updates it on each call with the pointer to a newly allocated
'dpif' structure that becomes invalid on the next dpif_netdev_close().
Since dpif_open/close() always happens asynchronously from different
threads and pointer is not protected by rcu or mutex (it's not even
atomic) it's not possible to safely use it.  Thankfully the actual
usage was in repository for less than 3 weeks and was removed by
commit 623540e4617e ("dpif-netdev: Streamline miss handling.").  Until
recently this pointer was used in order to pass it to dpif_flow_hash().
Another luck is that dpif_flow_hash() didn't use the 'dpif' argument.

However, we tried to use it while netdev offloading by commit
30115809da2e ("dpif-netdev: Use netdev-offload API for port lookup while offloading.")
and that unveiled the issue.

Now that all the code that used this pointer was cleaned up we can
just remove it from the structure to avoid possible misuse in the
future.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agodpif: Turn dpif_flow_hash function into generic odp_flow_key_hash.
Ilya Maximets [Sun, 8 Dec 2019 17:09:53 +0000 (18:09 +0100)]
dpif: Turn dpif_flow_hash function into generic odp_flow_key_hash.

Current implementation of dpif_flow_hash() doesn't depend on datapath
interface and only complicates the callers by forcing them to figure
out what is their current 'dpif'.  If we'll need different hashing
for different 'dpif's we'll implement an API for dpif-providers
and each dpif implementation will be able to use their local function
directly without calling it via dpif API.

This change will allow us to not store 'dpif' pointer in the userspace
datapath implementation which is broken and will be removed in next
commits.

This patch moves dpif_flow_hash() to odp-util module and replaces
unused odp_flow_key_hash() by it, along with removing of unused 'dpif'
argument.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agoovsdb replication: Provide option to configure probe interval.
Numan Siddique [Tue, 7 Jan 2020 04:54:48 +0000 (10:24 +0530)]
ovsdb replication: Provide option to configure probe interval.

When ovsdb-server is in backup mode and connects to the active
ovsdb-server for replication, and if takes more than 5 seconds to
get the dump of the whole database, it will drop the connection
soon after as the default probe interval is 5 seconds. This
results in a snowball effect of reconnections to the active
ovsdb-server.

This patch handles or mitigates this issue by setting the
default probe interval value to 60 seconds and provide the option to
configure this value from the unixctl command.

Other option could be increase the value of 'RECONNECT_DEFAULT_PROBE_INTERVAL'
to a higher value.

Acked-by: Mark Michelson <mmichels@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agotests: introduced tests for adding/deleting logical routers in VTEP database
Damijan Skvarc [Mon, 23 Dec 2019 09:38:57 +0000 (10:38 +0100)]
tests: introduced tests for adding/deleting logical routers in VTEP database

New tests were introduced based on lcov report, which reveals apparent code
is not covered by ovs test suites.

Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agouserspace: Improved packet drop statistics.
Anju Thomas [Wed, 18 Dec 2019 04:48:12 +0000 (05:48 +0100)]
userspace: Improved packet drop statistics.

Currently OVS maintains explicit packet drop/error counters only on port
level.  Packets that are dropped as part of normal OpenFlow processing
are counted in flow stats of “drop” flows or as table misses in table
stats. These can only be interpreted by controllers that know the
semantics of the configured OpenFlow pipeline.  Without that knowledge,
it is impossible for an OVS user to obtain e.g. the total number of
packets dropped due to OpenFlow rules.

Furthermore, there are numerous other reasons for which packets can be
dropped by OVS slow path that are not related to the OpenFlow pipeline.
The generated datapath flow entries include a drop action to avoid
further expensive upcalls to the slow path, but subsequent packets
dropped by the datapath are not accounted anywhere.

Finally, the datapath itself drops packets in certain error situations.
Also, these drops are today not accounted for.This makes it difficult
for OVS users to monitor packet drop in an OVS instance and to alert a
management system in case of a unexpected increase of such drops.
Also OVS trouble-shooters face difficulties in analysing packet drops.

With this patch we implement following changes to address the issues
mentioned above.

1. Identify and account all the silent packet drop scenarios
2. Display these drops in ovs-appctl coverage/show

Co-authored-by: Rohith Basavaraja <rohith.basavaraja@gmail.com>
Co-authored-by: Keshav Gupta <keshugupta1@gmail.com>
Signed-off-by: Anju Thomas <anju.thomas@ericsson.com>
Signed-off-by: Rohith Basavaraja <rohith.basavaraja@gmail.com>
Signed-off-by: Keshav Gupta <keshugupta1@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agoofproto-dpif-upcall: Fix using uninitialized upcall hash.
Ilya Maximets [Sat, 4 Jan 2020 00:07:36 +0000 (01:07 +0100)]
ofproto-dpif-upcall: Fix using uninitialized upcall hash.

upcalls are allocated on stack and 'hash' field must be initialized
regardless of attribute existence because it will be used later.

 Conditional jump or move depends on uninitialised value(s)
    at 0xFA74A7: dpif_netlink_encode_execute (dpif-netlink.c:1828)
    by 0xFA6DE8: dpif_netlink_operate__ (dpif-netlink.c:1906)
    by 0xFA612F: dpif_netlink_operate_chunks (dpif-netlink.c:2219)
    by 0xFA0E36: dpif_netlink_operate (dpif-netlink.c:2275)
    by 0xE5AFAC: dpif_operate (dpif.c:1376)
    by 0xDF3922: handle_upcalls (ofproto-dpif-upcall.c:1615)
    by 0xDF269B: recv_upcalls (ofproto-dpif-upcall.c:857)
    by 0xDF1C49: udpif_upcall_handler (ofproto-dpif-upcall.c:759)
    by 0xF3A3FE: ovsthread_wrapper (ovs-thread.c:383)
    by 0x565F6DA: start_thread (pthread_create.c:463)
    by 0x615988E: clone (clone.S:95)
  Uninitialised value was created by a stack allocation
    at 0xDF2258: recv_upcalls (ofproto-dpif-upcall.c:773)

Fixes: 0442bfb11d6c ("ofproto-dpif-upcall: Echo HASH attribute back to datapath.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
4 years agoofproto-dpif: Fix using uninitialized execute hash.
Ilya Maximets [Fri, 3 Jan 2020 23:32:04 +0000 (00:32 +0100)]
ofproto-dpif: Fix using uninitialized execute hash.

Most of callers doesn't initialize dpif_execute.hash leaving random
value from the stack.  And this random value used later while encoding
netlink message and might produce unwanted kernel behavior.

Fix that by fully initializing dpif_execute structure.  Using
designated initializers to avoid such issues in the future.

Fixes: 0442bfb11d6c ("ofproto-dpif-upcall: Echo HASH attribute back to datapath.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agotests: Allow running offloads testsuite under valgrind.
Ilya Maximets [Mon, 6 Jan 2020 10:54:07 +0000 (11:54 +0100)]
tests: Allow running offloads testsuite under valgrind.

This helps a lot with finding memory leaks and uninitialized
data usage.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agodpif-netlink: Fix dumping uninitialized netdev flow stats.
Ilya Maximets [Mon, 6 Jan 2020 10:23:42 +0000 (11:23 +0100)]
dpif-netlink: Fix dumping uninitialized netdev flow stats.

dpif logging functions expects to be called after the operation.
log_flow_del_message() dumps flow stats on success which are not
initialized before the actual call to netdev_flow_del():

 Conditional jump or move depends on uninitialised value(s)
    at 0x6090875: _itoa_word (_itoa.c:179)
    by 0x6093F0D: vfprintf (vfprintf.c:1642)
    by 0x60C090F: vsnprintf (vsnprintf.c:114)
    by 0xE5E7EC: ds_put_format_valist (dynamic-string.c:155)
    by 0xE5E755: ds_put_format (dynamic-string.c:142)
    by 0xE5A5E6: dpif_flow_stats_format (dpif.c:903)
    by 0xE5B708: log_flow_message (dpif.c:1763)
    by 0xE5BCA4: log_flow_del_message (dpif.c:1809)
    by 0xFA6076: try_send_to_netdev (dpif-netlink.c:2190)
    by 0xFA0D3C: dpif_netlink_operate (dpif-netlink.c:2248)
    by 0xE5AFAC: dpif_operate (dpif.c:1376)
    by 0xDF176E: push_dp_ops (ofproto-dpif-upcall.c:2367)
    by 0xDF04C8: push_ukey_ops (ofproto-dpif-upcall.c:2447)
    by 0xDF008F: revalidator_sweep__ (ofproto-dpif-upcall.c:2805)
    by 0xDF5DC6: revalidator_sweep (ofproto-dpif-upcall.c:2816)
    by 0xDF1E83: udpif_revalidator (ofproto-dpif-upcall.c:949)
    by 0xF3A3FE: ovsthread_wrapper (ovs-thread.c:383)
    by 0x565F6DA: start_thread (pthread_create.c:463)
    by 0x615988E: clone (clone.S:95)
  Uninitialised value was created by a stack allocation
    at 0xDEFC24: revalidator_sweep__ (ofproto-dpif-upcall.c:2733)

Fixes: 3cd99886191e ("dpif-netlink: Use dpif logging functions")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agotc: Fix using uninitialized id chain.
Ilya Maximets [Fri, 3 Jan 2020 18:48:18 +0000 (19:48 +0100)]
tc: Fix using uninitialized id chain.

tc_make_tcf_id() doesn't initialize the 'chain' field leaving it with a
random value from the stack.  This makes request_from_tcf_id() create
request with random TCA_CHAIN included.  These requests are obviously
doesn't work as needed leading to broken flow dump and various other
issues.  Fix that by using designated initializer instead.

Fixes: acdd544c4c9a ("tc: Introduce tcf_id to specify a tc filter")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agonetdev-offload-tc: Fix using uninitialized recirc_act.
Ilya Maximets [Fri, 3 Jan 2020 19:06:32 +0000 (20:06 +0100)]
netdev-offload-tc: Fix using uninitialized recirc_act.

Fixes: b2ae40690ed7 ("netdev-offload-tc: Add recirculation support via tc chains")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agocompat: Include confirm_neigh parameter if needed
Greg Rose [Mon, 6 Jan 2020 21:36:34 +0000 (13:36 -0800)]
compat: Include confirm_neigh parameter if needed

A change backported to the Linux 4.14.162 LTS kernel requires
a boolean parameter.  Check for the presence of the parameter
and adjust the caller in that case.

Passes check-kmod test with no regressions.

Passes Travis build here:
https://travis-ci.org/gvrose8192/ovs-experimental/builds/633461320

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agoAUTHORS: Add Krishna Kolakaluri.
Ben Pfaff [Mon, 6 Jan 2020 22:28:26 +0000 (14:28 -0800)]
AUTHORS: Add Krishna Kolakaluri.

Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agobridge: Split the column updates of rstp statistics and status.
Krishna Kolakaluri [Tue, 31 Dec 2019 00:19:39 +0000 (16:19 -0800)]
bridge: Split the column updates of rstp statistics and status.

Split the update of rstp_statistics column and rstp_status column in
Port table into two different functions.  This helps in controlling the
number of times the rstp_statistics column is updated with the key
"stats-update_interval" in Open_vSwitch table.

Signed-off-by: Krishna Kolakaluri <kkolakaluri@plume.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agonetdev-afxdp: Fix transmission freeze in native mode without zerocopy.
Ilya Maximets [Sun, 5 Jan 2020 00:51:19 +0000 (01:51 +0100)]
netdev-afxdp: Fix transmission freeze in native mode without zerocopy.

Kernel uses 'xsk_generic_xmit()' for all modes where zerocopy is
not enabled:

   net/xdp/xsk.c
   433  static int __xsk_sendmsg(struct sock *sk)
   434  {
            ...
   442      return xs->zc ? xsk_zc_xmit(xs) : xsk_generic_xmit(sk);
   443  }

'xsk_generic_xmit ()' sends packets synchronously and no more than 16
packets at a time.  This means that we have to kick Tx with sendmsg()
for every 16 packets in simple native mode too, otherwise the packets
may never be sent.

Reported-by: William Tu <u9012063@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-November/365076.html
Fixes: e8f5634484e8 ("netdev-afxdp: Best-effort configuration of XDP mode.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agocirrus: Use python 3.7 packages on FreeBSD.
Ilya Maximets [Thu, 2 Jan 2020 10:57:53 +0000 (11:57 +0100)]
cirrus: Use python 3.7 packages on FreeBSD.

Python 3.6 versions of these packages are no longer available in
FreeBSD ports.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agonetdev-offload-tc: Add conntrack nat support
Paul Blakey [Sun, 22 Dec 2019 10:16:43 +0000 (12:16 +0200)]
netdev-offload-tc: Add conntrack nat support

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agonetdev-offload-tc: Add conntrack label and mark support
Paul Blakey [Sun, 22 Dec 2019 10:16:42 +0000 (12:16 +0200)]
netdev-offload-tc: Add conntrack label and mark support

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agonetdev-offload-tc: Add conntrack support
Paul Blakey [Sun, 22 Dec 2019 10:16:41 +0000 (12:16 +0200)]
netdev-offload-tc: Add conntrack support

Zone and ct_state first.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agonetdev-offload-tc: Add recirculation support via tc chains
Paul Blakey [Sun, 22 Dec 2019 10:16:40 +0000 (12:16 +0200)]
netdev-offload-tc: Add recirculation support via tc chains

Each recirculation id will create a tc chain, and we translate
the recirculation action to a tc goto chain action.

We check for kernel support for this by probing OvS Datapath for the
tc recirc id sharing feature. If supported, we can offload rules
that match on recirc_id, and recirculation action safely.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agotc: Move tunnel_key unset action before output ports
Paul Blakey [Sun, 22 Dec 2019 10:16:39 +0000 (12:16 +0200)]
tc: Move tunnel_key unset action before output ports

Since OvS datapath gets packets already decapsulated from tunnel devices,
it doesn't explicitly decapsulate them. So in a recirculation setup,
the tunnel matching continues in the recirculation as the tunnel
metadata still exists on the SKB.

Tunnel key unset action unsets this metadata. Some drivers might rely
on this explicit tunnel key unset to know when to decapsulate the packet
instead of the device type. So instead of removing it completly,
we move it near the output actions.

This way, we also keep SKB metadata through recirculation, and for
non-recirculation rules, the resulting tc rules should remain the same.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agodpif: Add support to set user features
Paul Blakey [Sun, 22 Dec 2019 10:16:38 +0000 (12:16 +0200)]
dpif: Add support to set user features

This enables user features on the kernel datapath via the DP_CMD_SET
command, and also retrieves them to check for actual support and
not just an older kernel ignoring the requested features.

This will be used in next patch to enable recirc_id sharing with tc.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agonetdev-offload-tc: Implement netdev tc flush via tc filter del
Paul Blakey [Sun, 22 Dec 2019 10:16:37 +0000 (12:16 +0200)]
netdev-offload-tc: Implement netdev tc flush via tc filter del

To be consistent with our tc-ufid mapping after flush, and to support tc
chains flushing in the next commit, implement flush operation via
deleting all the filters we actually added and delete their mappings.

This will also not delete the configured qos policing via matchall filters,
while old code did.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agotc: Introduce tcf_id to specify a tc filter
Paul Blakey [Sun, 22 Dec 2019 10:16:36 +0000 (12:16 +0200)]
tc: Introduce tcf_id to specify a tc filter

Move all that is needed to identify a tc filter to a
new structure, tcf_id. This removes a lot of duplication
in accessing/creating tc filters.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agocompat: Add tc ct action and flower matches defines for older kernels
Paul Blakey [Sun, 22 Dec 2019 10:16:35 +0000 (12:16 +0200)]
compat: Add tc ct action and flower matches defines for older kernels

Update kernel UAPI to support conntrack matches, and the
tc actions ct and goto chain.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agomatch: Add match_set_ct_zone_masked helper
Paul Blakey [Sun, 22 Dec 2019 10:16:34 +0000 (12:16 +0200)]
match: Add match_set_ct_zone_masked helper

Sets zone in match.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agoovsdb raft: Fix the problem when cluster restarted after DB compaction.
Han Zhou [Wed, 4 Dec 2019 01:57:20 +0000 (17:57 -0800)]
ovsdb raft: Fix the problem when cluster restarted after DB compaction.

Cluster doesn't work after all nodes restarted after DB compaction,
unless there is any transaction after DB compaction before the restart.

Error log is like:
raft|ERR|internal error: deferred vote_request message completed but not ready
to send because message index 9 is past last synced index 0: s2 vote_request:
term=6 last_log_index=9 last_log_term=4

The root cause is that the log_synced member is not initialized when
reading the raft header. This patch fixes it and remove the XXX
from the test case.

Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovsdb-cluster.at: Wait until leader is elected before DB compact.
Han Zhou [Wed, 4 Dec 2019 01:57:19 +0000 (17:57 -0800)]
ovsdb-cluster.at: Wait until leader is elected before DB compact.

In test case "election timer change", before testing DB compact,
we had to insert some data. Otherwise, inserting data after DB
compact will cause busy loop as mentioned in the XXX comment.

The root cause of the busy loop is still not clear, but the test
itself didn't wait until the leader election finish before initiating
DB compact. This patch adds the wait to make sure the test continue
after leader is elected so that the following tests are based on
a clean state. While this wait is added, the busy loop problem is
gone even without inserting the data, so the additional data insertion
is also removed by this patch.

A separate patch will address the busy loop problem in the scenario:
1. Restart cluster
2. DB compact before the cluster is ready
3. Insert data

Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovs container build: Make kernel module configurable
Aliasgar Ginwala [Fri, 20 Dec 2019 00:50:47 +0000 (16:50 -0800)]
ovs container build: Make kernel module configurable

--with-linux can be made configurable while building containers
for leveraging kernel modules installed on host.
KERNEL_VERSION=host should be used in env variable for the same.

Signed-off-by: Aliasgar Ginwala <aginwala@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoRevert "docs: To build OVS on RHEL7 EPEL is needed"
Timothy Redaelli [Fri, 20 Dec 2019 17:35:09 +0000 (18:35 +0100)]
Revert "docs: To build OVS on RHEL7 EPEL is needed"

This reverts commit 9e334d91b3ea95e2b96f7b3edcb2ba9c3353288a.

This commit is not needed since OVS doesn't use six anymore.

Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoRemove dependency on python3-six
Timothy Redaelli [Fri, 20 Dec 2019 17:35:08 +0000 (18:35 +0100)]
Remove dependency on python3-six

Since Python 2 support was removed in 1ca0323e7c29 ("Require Python 3 and
remove support for Python 2."), python3-six is not needed anymore.

Moreover python3-six is not available on RHEL/CentOS7 without using EPEL
and so this patch is needed in order to release OVS 2.13 on RHEL7.

Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agotravis: Use pip3 to install the python packages on linux
Timothy Redaelli [Wed, 18 Dec 2019 15:53:10 +0000 (16:53 +0100)]
travis: Use pip3 to install the python packages on linux

Currently pip is used to install the python packages on linux by travis,
but pip3 should be used since pip is a symlink of pip2.

Fixes: 1ca0323e7c29 ("Require Python 3 and remove support for Python 2.")
Cc: blp@ovn.org
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodpctl: Fix referencing DPDK in a flow dump.
Ilya Maximets [Wed, 18 Dec 2019 14:38:29 +0000 (15:38 +0100)]
dpctl: Fix referencing DPDK in a flow dump.

Few reasons to replace 'non-dpdk interfaces' with 'the main thread':

* Flows are dumped from threads (from per thread flow tables) not from
  the interfaces.

* 'non-dpdk' here sounds like all other flows (dumped from PMDs) has
  some relation with DPDK which is not true at least because we have
  afxdp and dummy ports that could be polled by PMD threads.

* 'main thread' is the same term as we're using in the output of
  ovs-appctl dpif-netdev/pmd-stats-show.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agoovs-thread: Avoid huge alignment on a base spinlock structure.
Ilya Maximets [Mon, 16 Dec 2019 12:54:38 +0000 (13:54 +0100)]
ovs-thread: Avoid huge alignment on a base spinlock structure.

Marking the structure as 64 bytes aligned forces compiler to produce
big holes in the containing structures in order to fulfill this
requirement.  Also, any structure that contains this one as a member
automatically inherits this huge alignment making resulted memory
layout not efficient.  For example, 'struct umem_pool' currently
uses 3 full cache lines (192 bytes) with only 32 bytes of actual data:

  struct umem_pool {
    int                        index;                /*  0   4 */
    unsigned int               size;                 /*  4   4 */

    /* XXX 56 bytes hole, try to pack */

    /* --- cacheline 1 boundary (64 bytes) --- */
    struct ovs_spin lock __attribute__((__aligned__(64))); /* 64  64 */

    /* XXX last struct has 48 bytes of padding */

    /* --- cacheline 2 boundary (128 bytes) --- */
    void * *                   array;                /* 128  8 */

    /* size: 192, cachelines: 3, members: 4 */
    /* sum members: 80, holes: 1, sum holes: 56 */
    /* padding: 56 */
    /* paddings: 1, sum paddings: 48 */
    /* forced alignments: 1, forced holes: 1, sum forced holes: 56 */
  } __attribute__((__aligned__(64)));

Actual alignment of a spin lock is required only for Tx queue locks
inside netdev-afxdp to avoid false sharing, in all other cases
alignment only produces inefficient memory usage.

Also, CACHE_LINE_SIZE macro should be used instead of 64 as different
platforms may have different cache line sizes.

Using PADDED_MEMBERS to avoid alignment inheritance.

Fixes: ae36d63d7e3c ("ovs-thread: Make struct spin lock cache aligned.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
4 years agonetdev-dpdk: Add coverage counter to count vhost IRQs.
Eelco Chaudron [Thu, 12 Dec 2019 03:00:01 +0000 (22:00 -0500)]
netdev-dpdk: Add coverage counter to count vhost IRQs.

When the dpdk vhost library executes an eventfd_write() call,
i.e. waking up the guest, a new callback will be called.

This patch adds the callback to count the number of
interrupts sent to the VM to track the number of times
interrupts where generated.

This might be of interest to find out system-calls were
called in the DPDK fast path.

The coverage counter is called "vhost_notification" and
can be read with:

  $ ovs-appctl coverage/read-counter vhost_notification
  13238319

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-dpdk: Fix sw stats perf drop.
Kevin Traynor [Tue, 17 Dec 2019 15:07:37 +0000 (15:07 +0000)]
netdev-dpdk: Fix sw stats perf drop.

Accessing the sw stats in the vhost datapath of a PVP test
can incur a performance drop of ~2%.

Most of the time these stats will just be getting zero added
to them. By checking if there is a non-zero update first, we
can avoid accessing them when they won't be updated and avoid
the performance drop.

Fixes: 2f862c712e52 ("netdev-dpdk: Detailed packet drop statistics.")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>