Greg Rose [Tue, 14 Apr 2020 18:42:10 +0000 (11:42 -0700)]
compat: Fix broken partial backport of extack op parameter
A series of commits added support for the extended ack
parameter to the newlink, changelink and validate ops in
the rtnl_link_ops structure: a8b8a889e369d ("net: add netlink_ext_ack argument to rtnl_link_ops.validate") 7a3f4a185169b ("net: add netlink_ext_ack argument to rtnl_link_ops.newlink") ad744b223c521 ("net: add netlink_ext_ack argument to rtnl_link_ops.changelink")
These commits were all added at the same time and present since the
Linux kernel 4.13 release. In our compatiblity layer we have a
define HAVE_EXT_ACK_IN_RTNL_LINKOPS that indicates the presence of
the extended ack parameter for these three link operations.
At least one distro has only backported two of the three patches,
for newlink and changelink, while not backporting patch a8b8a889e369d
for the validate op. Our compatibility layer code in acinclude.m4
is able to find the presence of the extack within the rtnl_link_ops
structure so it defines HAVE_EXT_ACK_IN_RTNL_LINKOPS but since the
validate link op does not have the extack parameter the compilation
fails on recent kernels for that particular distro. Other kernel
distributions based upon this distro will presumably also encounter
the compile errors.
Introduce a new function in acinclude.m4 that will find function
op definitions and then search for the required parameter. Then
use this function to define HAVE_RTNLOP_VALIDATE_WITH_EXTACK so
that we can detect and enable correct compilation on kernels
which have not backported the entire set of patches. This function
is generic to any function op - it need not be in a structure.
In places where HAVE_EXT_ACK_IN_RTNL_LINKOPS wraps validate functions
replace it with the new HAVE_RTNLOP_VALIDATE_WITH_EXTACK define.
Passes a kernel check-kmod test on several systems, including
sles12 sp4 4.12.14-95.48-default kernel, without any regressions.
VMWare-BZ: #2544032 Signed-off-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
Yifeng Sun [Thu, 9 Apr 2020 18:37:39 +0000 (11:37 -0700)]
system-traffic: Check frozen state handling with TLV map change
This patch enhances a system traffic test to prevent regression on
the tunnel metadata table (tun_table) handling with frozen state.
Without a proper fix this test can crash ovs-vswitchd due to a
use-after-free bug on tun_table.
These are the timed sequence of how this bug is triggered:
- Adds an OpenFlow rule in OVS that matches Geneve tunnel metadata that
contains a controller action.
- When the first packet matches the aforementioned OpenFlow rule,
during the miss upcall, OVS stores a pointer to the tun_table (that
decodes the Geneve tunnel metadata) in a frozen state and pushes down
a datapath flow into kernel datapath.
- Issues a add-tlv-map command to reprogram the tun_table on OVS.
OVS frees the old tun_table and create a new tun_table.
- A subsequent packet hits the kernel datapath flow again. Since
there is a controller action associated with that flow, it triggers
slow path controller upcall.
- In the slow path controller upcall, OVS derives the tun_table
from the frozen state, which points to the old tun_table that is
already being freed at this time point.
- In order to access the tunnel metadata, OVS uses the invalid
pointer that points to the old tun_table and triggers the core dump.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Co-authored-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
Yifeng Sun [Thu, 9 Apr 2020 18:37:38 +0000 (11:37 -0700)]
tun_metadata: Fix coredump caused by use-after-free bug
Tun_metadata can be referened by flow and frozen_state at the same
time. When ovs-vswitchd handles TLV table mod message, the involved
tun_metadata gets freed. The call trace to free tun_metadata is
shown as below:
Unfortunately, this tun_metadata can be still used by some frozen_state,
and later on when frozen_state tries to access its tun_metadata table,
ovs-vswitchd crashes. The call trace to access tun_metadata from
frozen_state is shown as below:
It is unsafe for frozen_state to reference tun_table because tun_table
is protected by RCU while the lifecycle of frozen_state can span several
RCU quiesce states. Current code violates OVS's RCU protection mechanism.
This patch fixes it by simply stopping frozen_state from referencing
tun_table. If frozen_state needs tun_table, the latest valid tun_table
can be found through ofproto_get_tun_tab() efficiently.
A previous commit seems fixing the samiliar issue: 254878c18874f6 (ofproto-dpif-xlate: Fix segmentation fault caused by tun_table)
VMware-BZ: #2526222 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
Timothy Redaelli [Thu, 19 Mar 2020 19:05:39 +0000 (20:05 +0100)]
bugtool: Fix for Python3.
Currently ovs-bugtool tool doesn't start on Python 3.
This commit fixes ovs-bugtool to make it works on Python 3.
Replaced StringIO.StringIO with io.BytesIO since the script is
processing binary data.
Reported-at: https://bugzilla.redhat.com/1809241 Reported-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Co-authored-by: William Tu <u9012063@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
dpif-netdev: includes microsecond delta in meter bucket calculation
When dp-netdev meter rate is higher than 200Mbps, observe
more than 10% bias from configured rate value with UDP traffic.
In dp-netdev meter, millisecond delta between now and last used
is taken into bucket size calcualtion, while sub-millisecond part
is truncated.
If traffic rate is pretty high, time delta can be few milliseconds,
its ratio to truncated part is less than 10:1, the loss of bucket
size caused by truncated can be observed obviously by commited
traffic rate.
In this patch, microsend delta part is included in calculation
of meter bucket to make it more precise.
Signed-off-by: Jiang Lidong <jianglidong3@jd.com> Signed-off-by: William Tu <u9012063@gmail.com>
Lance Yang [Mon, 30 Mar 2020 12:54:03 +0000 (20:54 +0800)]
Travis: Enable clang compiler and unit test for arm CI
Enable testsuite and clang compiler for arm CI. In order not to increase
the CI jobs, selectively enable them in the existing jobs instead of
adding extra jobs.
Malvika Gupta [Mon, 30 Mar 2020 12:54:02 +0000 (20:54 +0800)]
tests/testsuite: Skip failing UT cases on aarch64
The following test cases are failing inconsistently on aarch64 platforms and
have been skipped until further investigation can be made on how to fix them:
Malvika Gupta [Mon, 30 Mar 2020 12:54:01 +0000 (20:54 +0800)]
tests/atlocal.in: Add check for aarch64 Architecture
This patch adds a condition to check if the CPU architecture is aarch64. If the
condition evaluates to true, $IS_ARM64 variable is set to 'yes'. For all other
architectures, this variable is set to 'no'.
Reviewed-by: Yanqin Wei <Yanqin.wei@arm.com> Signed-off-by: Malvika Gupta <malvika.gupta@arm.com> Signed-off-by: William Tu <u9012063@gmail.com>
William Tu [Mon, 6 Apr 2020 23:59:01 +0000 (16:59 -0700)]
ovs-vswitchd: Fix icmp reply timeout description.
Currently the userspace datapath implements conntrack ICMP reply state
as when ICMP packets have been seen on both directions. However, the
description is defined as timeout of the connection after an ICMP error
is replied in response to an ICMP packet.
Fixes: 61a5264d60d0c ("ovs-vswitchd: Add Datapath, CT_Zone, and CT_Zone_Policy tables.") Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Greg Rose <gvrose8192@gmail.com>
If the ovsdb-server reply to "monitor_cond_since" requests has
"found" == false then ovsdb_idl_db_parse_monitor_reply() calls
ovsdb_idl_db_clear() which iterates through all tables and
unconditionally sets table->cond_changed to false.
However, if the client had already set a new condition for some of the
tables, this new condition request will never be sent to ovsdb-server
until the condition is reset to a different value. This is due to the
check in ovsdb_idl_db_set_condition().
One way to replicate the issue is described in the bugzilla reporting
the bug, when ovn-controller is configured to use "ovn-monitor-all":
https://bugzilla.redhat.com/show_bug.cgi?id=1808125#c6
Commit 5351980b047f tried to optimize sending redundant conditional
monitoring updates but the chances that this scenario happens with the
latest code is quite low since commit 403a6a0cb003 ("ovsdb-idl: Fast
resync from server when connection reset.") changed the behavior of
ovsdb_idl_db_parse_monitor_reply() to avoid calling ovsdb_idl_db_clear()
in most cases.
Reported-by: Dan Williams <dcbw@redhat.com>
Reported-at: https://bugzilla.redhat.com/1808125 CC: Andy Zhou <azhou@ovn.org> Fixes: 5351980b047f ("ovsdb-idl: Avoid sending redundant conditional monitoring updates") Acked-by: Han Zhou <hzhou@ovn.org> Acked-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
William Tu [Mon, 25 Nov 2019 19:19:23 +0000 (11:19 -0800)]
userspace: Add GTP-U support.
GTP, GPRS Tunneling Protocol, is a group of IP-based communications
protocols used to carry general packet radio service (GPRS) within
GSM, UMTS and LTE networks. GTP protocol has two parts: Signalling
(GTP-Control, GTP-C) and User data (GTP-User, GTP-U). GTP-C is used
for setting up GTP-U protocol, which is an IP-in-UDP tunneling
protocol. Usually GTP is used in connecting between base station for
radio, Serving Gateway (S-GW), and PDN Gateway (P-GW).
This patch implements GTP-U protocol for userspace datapath,
supporting only required header fields and G-PDU message type.
See spec in:
https://tools.ietf.org/html/draft-hmm-dmm-5g-uplane-analysis-00
Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/666518784 Signed-off-by: Feng Yang <yangfengee04@gmail.com> Co-authored-by: Feng Yang <yangfengee04@gmail.com> Signed-off-by: Yi Yang <yangyi01@inspur.com> Co-authored-by: Yi Yang <yangyi01@inspur.com> Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Ben Pfaff <blp@ovn.org>
Ilya Maximets [Tue, 24 Mar 2020 23:50:45 +0000 (00:50 +0100)]
dpif-netdev: Force port reconfiguration to change dynamic_txqs.
In case number of polling threads goes from exact number of Tx queues
in port to higher value while set_tx_multiq() not implemented or not
requesting reconfiguration, port will not be reconfigured and datapath
will continue using static Tx queue ids leading to crash.
Ex.:
Assuming that port p0 supports up to 4 Tx queues and doesn't support
set_tx_multiq() method. For example, netdev-afxdp could be the case,
because it could have multiple Tx queues, but doesn't have
set_tx_multiq() implementation because number of Tx queues always
equals to number of Rx queues.
1. Configuring pmd-cpu-mask to have 3 pmd threads.
2. Adding port p0 to OVS.
At this point wanted_txqs = 4 (3 for pmd threads + 1 for non-pmd).
Port reconfigured to have 4 Tx queues successfully.
dynamic_txqs = (4 < 4) = false;
3. Configuring pmd-cpu-mask to have 10 pmd threads.
At this point wanted_txqs = 11 (10 for pmd threads + 1 for non-pmd).
Since set_tx_multiq() is not implemented, netdev doesn't request
reconfiguration and 'dynamic_txqs' remains in 'false' state.
4. Since 'dynamic_txqs == false', dpif-netdev uses static Tx queue
ids that are in range [0, 10] while device only supports 4 leading
to unwanted behavior and crashes.
Fix that by marking for reconfiguration all the ports that will likely
change their 'dynamic_txqs' value.
It looks like the issue could be reproduced only with afxdp ports,
because all other non-dpdk ports ignores Tx queue ids and dpdk ports
requests for reconfiguration on set_tx_multiq().
Reported-by: William Tu <u9012063@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/368364.html Fixes: e32971b8ddb4 ("dpif-netdev: Centralized threads and queues handling code.") Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: William Tu <u9012063@gmail.com>
Dmytro Linkin [Thu, 27 Feb 2020 15:22:32 +0000 (17:22 +0200)]
netdev-offload-tc: Flush rules on ingress block when init tc flow api
OVS can fail to attach ingress block on iface when init tc flow api,
if block already exist with rules on it and is shared with other iface.
Fix by flush all existing rules on the ingress block prior to deleting
it.
Fixes: 093c9458fb02 ("tc: allow offloading of block ids") Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Acked-by: Raed Salem <raeds@mellanox.com> Acked-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Lance Yang [Tue, 24 Mar 2020 07:00:37 +0000 (15:00 +0800)]
travis: Enable OvS Travis CI for arm
Enable part of travis jobs with gcc compiler for arm64 architecture
1. Add arm jobs into the matrix in .travis.yml configuration file
2. To enable OVS-DPDK jobs, set the build target according to
different CPU architectures
3. Temporarily disable sparse checker because of static code checking
failure on arm64
Considering the balance of the CI coverage and running time, some kernel
and DPDK jobs are removed from Arm CI.
Successful travis build jobs report:
https://travis-ci.org/github/yzyuestc/ovs/builds/666129448
Reviewed-by: Yanqin Wei <Yanqin.Wei@arm.com> Reviewed-by: Ruifeng Wang <Ruifeng.Wang@arm.com> Reviewed-by: JingZhao Ni <JingZhao.Ni@arm.com> Reviewed-by: Gavin Hu <Gavin.Hu@arm.com> Signed-off-by: Lance Yang <Lance.Yang@arm.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Greg Rose [Tue, 24 Mar 2020 15:42:02 +0000 (08:42 -0700)]
compat: Fix nf_ip_hook parameters for RHEL 8
A RHEL release version check was only checking for RHEL releases
greater than 7.0 so that ended up including a compat fixup that
is not needed for 8.0. Fix up the version check.
Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
Dumitru Ceara [Thu, 19 Mar 2020 19:21:16 +0000 (20:21 +0100)]
conntrack: Reset ct_state when entering a new zone.
When a new conntrack zone is entered, the ct_state field is zeroed in
order to avoid using state information from different zones.
One such scenario is when a packet is double NATed. Assuming two zones
and 3 flows performing the following actions in order on the packet:
1. ct(zone=5,nat), recirc
2. ct(zone=1), recirc
3. ct(zone=1,nat)
If at step #1 the packet matches an existing NAT entry, it will get
translated and pkt->md.ct_state is set to CS_DST_NAT or CS_SRC_NAT.
At step #2 the new tuple might match an existing connection and
pkt->md.ct_zone is set to 1.
If at step #3 the packet matches an existing NAT entry in zone 1,
handle_nat() will be called to perform the translation but it will
return early because the packet's zone matches the conntrack zone and
the ct_state field still contains CS_DST_NAT or CS_SRC_NAT from the
translations in zone 5.
In order to reliably detect when a packet enters a new conntrack zone
we also need to make sure that the pkt->md.ct_zone is properly
initialized if pkt->md.ct_state is non-zero. This already happens for
most cases. The only exception is when matched conntrack connection is
of type CT_CONN_TYPE_UN_NAT and the master connection is missing. To
cover this path we now call write_ct_md() in that case too. Remove
setting the CS_TRACKED flag as in this case as it will be done by the
new call to write_ct_md().
William Tu [Mon, 23 Mar 2020 23:34:37 +0000 (16:34 -0700)]
lockfile: Fix OVS_REQUIRES macro.
Pass lock objects, not their addresses, to the annotation macros.
Fixes: f21fa45f3085 ("lockfile: Minor code cleanup.")
Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/666098338 Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Ben Pfaff <blp@ovn.org>
William Tu [Mon, 23 Mar 2020 14:44:48 +0000 (07:44 -0700)]
fatal-signal: Log backtrace when no monitor daemon.
Currently the backtrace logging is only available when monitor
daemon is running. This patch enables backtrace logging when
no monitor daemon exists. At signal handling context, it detects
whether monitor daemon exists. If not, write directly the backtrace
to the vlog fd. Note that using VLOG_* macro doesn't work due to
it's buffer I/O, so this patch directly issue write() syscall to
the file descriptor.
For some system we stop using monitor daemon and use systemd to
monitor ovs-vswitchd, thus need this patch. Example of
ovs-vswitchd.log (note that there is no timestamp printed):
2020-03-23T14:42:12.949Z|00049|memory|INFO|175332 kB peak resident
2020-03-23T14:42:12.949Z|00050|memory|INFO|handlers:2 ports:3 reva
SIGSEGV detected, backtrace:
0x0000000000486969 <fatal_signal_handler+0x49>
0x00007f7f5e57f4b0 <killpg+0x40>
0x000000000047daa8 <pmd_thread_main+0x238>
0x0000000000504edd <ovsthread_wrapper+0x7d>
0x00007f7f5f0476ba <start_thread+0xca>
0x00007f7f5e65141d <clone+0x6d>
0x0000000000000000 <+0x0>
Acked-by: Ben Pfaff <blp@ovn.org> Signed-off-by: William Tu <u9012063@gmail.com>
Terry Wilson [Fri, 20 Mar 2020 15:22:38 +0000 (15:22 +0000)]
Handle refTable values with setkey()
For columns like QoS.queues where we have a map containing refTable
values, assigning w/ __setattr__ e.g. qos.queues={1: $queue_row}
works, but using using qos.setkey('queues', 1, $queue_row) results
in an Exception. The opdat argument can essentially just be the
JSON representation of the map column instead of trying to build
it.
Signed-off-by: Terry Wilson <twilson@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Fri, 20 Mar 2020 00:53:10 +0000 (17:53 -0700)]
ofproto-dpif-xlate: Fix recirculation when in_port is OFPP_CONTROLLER.
Recirculation usually requires finding the pre-recirculation input port.
Packets sent by the controller, with in_port of OFPP_CONTROLLER or
OFPP_NONE, do not have a real input port data structure, only a port
number. The code in xlate_lookup_ofproto_() mishandled this case,
failing to return the ofproto data structure. This commit fixes the
problem and adds a test to guard against regression.
Greg Rose [Wed, 11 Mar 2020 17:49:17 +0000 (10:49 -0700)]
Documentation: Add note about iproute2 requirements for check-kmod
On many systems the check-kmod and check-kernel test suites have
many failures due to the lack of feature support in the older
iproute2 utility packages shipped with those systems. Add a
note indicating that it might be necessary to update the iproute2
utility package in order to fix those errors.
Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
Usman Ansari [Thu, 19 Mar 2020 21:47:17 +0000 (14:47 -0700)]
hmap: Fix Coverity false positive
Coverity reports a false positive below:
Incorrect expression, Assign_where_compare_meant: use of "="
where "==" may have been intended.
Fixed it by rewriting '(NODE = NULL)' as '((NODE = NULL), false)'.
"make check" passes for this change
Coverity reports over 500 errors resolved
Suggested-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Usman Ansari <ua1422@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
wenxu [Wed, 11 Mar 2020 05:39:34 +0000 (13:39 +0800)]
dpif-netlink: avoid netlink modify flow put op failed after tc modify flow put op failed.
The tc modify flow put always delete the original flow first and
then add the new flow. If the modfiy flow put operation failed,
the flow put operation will change from modify to create if success
to delete the original flow in tc (which will be always failed with
ENOENT, the flow is already be deleted before add the new flow in tc).
Finally, the modify flow put will failed to add in kernel datapath.
Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Ilya Maximets [Fri, 21 Feb 2020 14:41:50 +0000 (15:41 +0100)]
dpif-netdev: Enter quiescent state after each offloading operation.
If the offloading queue is big and filled continuously, offloading
thread may have no chance to quiesce blocking rcu callbacks and
other threads waiting for synchronization.
Fix that by entering momentary quiescent state after each operation
since we're not holding any rcu-protected memory here.
Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread") Reported-by: Eli Britstein <elibr@mellanox.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-February/049768.html Acked-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Yanqin Wei [Thu, 27 Feb 2020 16:12:21 +0000 (00:12 +0800)]
pvector: Use acquire-release semantics for size.
Read/write concurrency of pvector library is implemented by a temp vector
and RCU protection. Considering performance reason, insertion does not
follow this scheme.
In insertion function, a thread fence ensures size increment is done
after new entry is stored. But there is no barrier in the iteration
fuction(pvector_cursor_init). Entry point access may be reordered before
loading vector size, so the invalid entry point may be loaded when vector
iteration.
This patch fixes it by acquire-release pair. It can guarantee new size is
observed by reader after new entry stored by writer. And this is
implemented by one-way barrier instead of two-way memory fence.
Ilya Maximets [Thu, 12 Mar 2020 09:57:44 +0000 (10:57 +0100)]
travis: Disable sindex build in sparse.
Sparse introduced a new utility 'sindex' for semantic search,
but unfortunately it fails to build in Travis environment.
Disabling it explicitly as we don't need it anyway.
openvswitch: conntrack: mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Notice that in this particular case I placed a "fall through" comment on
its own line, which is what GCC is expecting to find.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
netlink: make validation more configurable for future strictness
We currently have two levels of strict validation:
1) liberal (default)
- undefined (type >= max) & NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
- garbage at end of message accepted
2) strict (opt-in)
- NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
Split out parsing strictness into four different options:
* TRAILING - check that there's no trailing data after parsing
attributes (in message or nested)
* MAXTYPE - reject attrs > max known type
* UNSPEC - reject attributes with NLA_UNSPEC policy entries
* STRICT_ATTRS - strictly validate attribute size
The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().
Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.
We end up with the following renames:
* nla_parse -> nla_parse_deprecated
* nla_parse_strict -> nla_parse_deprecated_strict
* nlmsg_parse -> nlmsg_parse_deprecated
* nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
* nla_parse_nested -> nla_parse_nested_deprecated
* nla_validate_nested -> nla_validate_nested_deprecated
For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.
Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.
Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.
In effect then, this adds fully strict validation for any new command.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Backport portions of this commit applicable to openvswitch and
added necessary compatibility layer changes to support older
kernels.
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Greg Rose [Fri, 6 Mar 2020 22:37:18 +0000 (14:37 -0800)]
datapath: Kbuild: Add kcompat.h header to front of NOSTDINC
Since this commit in the Linux upstream kernel:
'commit 9b9a3f20cbe0 ("kbuild: split final module linking out into Makefile.modfinal")'
The openvswitch kernel module fails to build against the upstream
Linux kernel. The cause of the build failure is that the include of the
KBUILD_EXTMOD variable was dropped in Makefile.modfinal when
it was split out from Makefile.modpost. Our Kbuild was setting
the ccflags-y variable to include our kcompat.h header as the
first header file. The Linux kernel maintainer has said that
it is incorrect to rely on the ccflags-y variable for the modfinal
phase of the build so that is why KBUILD_EXTMOD is not included.
We fix this by breaking a different Linux kernel make rule. We
add '-include $(builddir)/kcompat.h' to the front of the NOSTDINC
variable setting in our Kbuild makefile.
As noted already in the comment for the NOSTDINC setting:
\# These include directories have to go before -I$(KSRC)/include.
\# NOSTDINC_FLAGS just happens to be a variable that goes in the
\# right place, even though it's conceptually incorrect.
So we continue the misuse of the NOSTDINC variable to fix this
issue as well.
The assumption of the Linux kernel maintainers is that any
local, out-of-tree build include files can be added to the end
of the command line. In our case that is wrong of course, but
there is nothing we can do about it that I know of other than using
some utility like unifdef to strip out offending chunks of our
compatibility layer code before invocation of Makefile.modfinal.
That is a big change that would take a lot of work to implement.
We could ask the Linux kernel maintainers to provide some
way for out-of-tree kernel modules to include their own header
files first in a proper manner. I consider that to be a very
low probability of success but something we could ask about.
For now we cheat and take the easy way out.
Reported-by: David Ahern <dsahern@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().
git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do
if [[ "$file" =~ $EXCLUDE_FILES ]]; then
continue
fi
sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done
Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David Miller <davem@davemloft.net> # for net
Also added a compatibility layer macro for older kernels that still
use FIELD_SIZEOF
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Since maxattr is common, the policy can't really differ sanely,
so make it common as well.
The only user that did in fact manage to make a non-common policy
is taskstats, which has to be really careful about it (since it's
still using a common maxattr!). This is no longer supported, but
we can fake it using pre_doit.
This reduces the size of e.g. nl80211.o (which has lots of commands):
text data bss dec hex filename
398745 14323 2240 415308 6564c net/wireless/nl80211.o (before)
397913 14331 2240 414484 65314 net/wireless/nl80211.o (after)
--------------------------------
-832 +8 0 -824
Which is obviously just 8 bytes for each command, and an added 8
bytes for the new policy pointer. I'm not sure why the ops list is
counted as .text though.
Most of the code transformations were done using the following spatch:
@ops@
identifier OPS;
expression POLICY;
@@
struct genl_ops OPS[] = {
...,
{
- .policy = POLICY,
},
...
};
This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing
the cb->data as ops, which we want to change in a later genl patch.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 3b0f31f2b8c9f ("genetlink: make policy common to family")
the policy field of the genl_ops structure has been moved into the
genl_family structure. Add necessary compat layer infrastructure
to still support older kernels.
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Greg Rose [Fri, 6 Mar 2020 22:37:14 +0000 (14:37 -0800)]
compat: Fix up changes to inet frags in 5.1+
Since Linux kernel release 5.1 the fragments field of the inet_frag_queue
structure is removed and now only the rb_fragments structure with an
rb_node pointer is used for both ipv4 and ipv6. In addition, the
atomic_sub and atomic_add functions are replaced with their
equivalent long counterparts.
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Han Zhou [Fri, 6 Mar 2020 07:48:45 +0000 (23:48 -0800)]
raft: Fix the problem of stuck in candidate role forever.
Sometimes a server can stay in candidate role forever, even if the server
already see the new leader and handles append-requests normally. However,
because of the wrong role, it appears as disconnected from cluster and
so the clients are disconnected.
This problem happens when 2 servers become candidates in the same
term, and one of them is elected as leader in that term. It can be
reproduced by the test cases added in this patch.
The root cause is that the current implementation only changes role to
follower when a bigger term is observed (in raft_receive_term__()).
According to the RAFT paper, if another candidate becomes leader with
the same term, the candidate should change to follower.
This patch fixes it by changing the role to follower when leader
is being updated in raft_update_leader().
Signed-off-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
Han Zhou [Sat, 29 Feb 2020 02:07:10 +0000 (18:07 -0800)]
raft: Fix next_index in install_snapshot reply handling.
When a leader handles install_snapshot reply, the next_index for
the follower should be log_start instead of log_end, because there
can be new entries added in leader's log after initiating the
install_snapshot procedure. Also, it should send all the accumulated
entries to follower in the following append-request message, instead
of sending 0 entries, to speed up the converge.
Without this fix, there is no functional problem, but it takes
uncessary extra rounds of append-requests responsed with "inconsistency"
by follower, although finally will be converged.
Signed-off-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
Han Zhou [Sat, 29 Feb 2020 02:07:08 +0000 (18:07 -0800)]
raft: Avoid sending unnecessary heartbeat when becoming leader.
When a node becomes leader, it sends out heartbeat to all followers
and then sends out another append-request for a no-op command
execution to all followers again immediately. This causes 2
continously append-requests sent out to each followers, and the first
heartbeat append-request is unnecessary. This patch removes the
heartbeat.
Signed-off-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
Han Zhou [Sat, 29 Feb 2020 02:07:07 +0000 (18:07 -0800)]
raft: Avoid busy loop during leader election.
When a server doesn't see a leader yet, e.g. during leader re-election,
if a transaction comes from a client, it will cause 100% CPU busy loop.
With debug log enabled it is like:
2020-02-28T04:04:35.631Z|00059|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
2020-02-28T04:04:35.631Z|00062|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
2020-02-28T04:04:35.631Z|00065|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
2020-02-28T04:04:35.631Z|00068|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
2020-02-28T04:04:35.631Z|00071|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
2020-02-28T04:04:35.631Z|00074|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
2020-02-28T04:04:35.631Z|00077|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
...
The problem is that in ovsdb_trigger_try(), all cluster errors are treated
as temporary error and retry immediately. This patch fixes it by introducing
'run_triggers_now', which tells if a retry is needed immediately. When the
cluster error is with detail 'not leader', we don't immediately retry, but
will wait for the next poll event to trigger the retry. When 'not leader'
status changes, there must be a event, i.e. raft RPC that changes the
status, so the trigger is guaranteed to be triggered, without busy loop.
Signed-off-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
Han Zhou [Sat, 29 Feb 2020 02:07:06 +0000 (18:07 -0800)]
raft: Fix raft_is_connected() when there is no leader yet.
If there is never a leader known by the current server, it's status
should be "disconnected" to the cluster. Without this patch, when
a server in cluster is restarted, before it successfully connecting
back to the cluster it will appear as connected, which is wrong.
Signed-off-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
Han Zhou [Sat, 29 Feb 2020 02:07:05 +0000 (18:07 -0800)]
ovsdb-server: Don't disconnect clients after raft install_snapshot.
When "schema" field is found in read_db(), there can be two cases:
1. There is a schema change in clustered DB and the "schema" is the new one.
2. There is a install_snapshot RPC happened, which caused log compaction on the
server and the next log is just the snapshot, which always constains "schema"
field, even though the schema hasn't been changed.
The current implementation doesn't handle case 2), and always assume the schema
is changed hence disconnect all clients of the server. It can cause stability
problem when there are big number of clients connected when this happens in
a large scale environment.
Signed-off-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
ofproto: Add support to watch controller port liveness in fast-failover group
Currently fast-failover group does not support checking liveness of controller
port (OFPP_CONTROLLER). However this feature can be useful for selecting
alternate pipeline when controller connection itself is down for e.g.
by using local DHCP server to reply for any DHCP request originating from VMs.
This patch adds the support for watching controller port liveness in fast-
failover group. Controller port is considered live when atleast one
of-connection is alive.
Aaron Conole [Mon, 2 Mar 2020 16:05:06 +0000 (11:05 -0500)]
ovs-dpctl-top: python3 compatibility
During the transition to python3 support, some syntax errors weren't
adequately cleaned. This addresses the various errors, plus one
minor issue with string type conversion.
Flavio Leitner [Sat, 29 Feb 2020 23:29:35 +0000 (20:29 -0300)]
netdev-linux: Enable TSO in the TAP device.
Use ioctl TUNSETOFFLOAD if kernel supports to enable TSO
offloading in the tap device.
Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") Reported-by: "Yi Yang (杨�D)-云服务集团" <yangyi01@inspur.com> Tested-by: William Tu <u9012063@gmail.com> Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: William Tu <u9012063@gmail.com>
Ben Pfaff [Fri, 28 Feb 2020 17:15:32 +0000 (09:15 -0800)]
ofproto-dpif: Only delete tunnel backer ports along with the dpif.
The admin can choose whether or not to delete flows from datapaths when
they stop ovs-vswitchd. The goal of not deleting flows it to allow
existing traffic to continue being forwarded until ovs-vswitchd is
restarted. Until now, regardless of this choice, ovs-vswitchd has
always deleted tunnel ports from the datapath. When flows are not
deleted, this nevertheless prevents tunnel traffic from being forwarded.
With this patch, ovs-vswitchd no longer deletes tunnel ports in the
case where it does not delete flows, allowing tunnel traffic to continue
being forwarded.
Reported-by: Antonin Bas <abas@vmware.com> Tested-by: Antonin Bas <abas@vmware.com> Tested-by: txfh2007 <txfh2007@aliyun.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yanqin Wei [Wed, 26 Feb 2020 04:46:36 +0000 (12:46 +0800)]
dpif-netdev.at: Fix partial offloading test cases failure.
Some partial offloading test cases are failing inconsistently. The root
cause is that dummy netdev is assigned with "linux_tc" offloading API.
dpif-netdev - partial hw offload - dummy
dpif-netdev - partial hw offload - dummy-pmd
dpif-netdev - partial hw offload with packet modifications - dummy
dpif-netdev - partial hw offload with packet modifications - dummy-pmd
This patch fixes this issue by changing 'options:ifindex=1' to some big
value. It is a workaround to make "linux_tc" init flow api failure. All
above cases can pass consistently after applying this patch.
Flavio Leitner [Fri, 14 Feb 2020 13:03:36 +0000 (10:03 -0300)]
userspace TSO: SCTP checksum offload optional.
Ideally SCTP checksum offload needs be advertised by the
NIC when userspace TSO is enabled. However, very few drivers
do that and it's not a widely used protocol. So, this patch
enables SCTP checksum offload if available, otherwise userspace
TSO can still be enabled but SCTP packets will be dropped on
NICs without support.
Flavio Leitner [Fri, 14 Feb 2020 13:03:35 +0000 (10:03 -0300)]
userspace TSO: Include UDP checksum offload.
Virtio doesn't expose flags to control which protocols checksum
offload needs to be enabled or disabled. This patch checks if the
NIC supports UDP checksum offload and active it when TSO is enabled.
Yi-Hung Wei [Fri, 21 Feb 2020 22:50:38 +0000 (14:50 -0800)]
docs: Update conntrack established state description
Patch a867c010ee91 ("conntrack: Fix conntrack new state") fixes
the userspace conntrack behavior. This patch updates the
corresponding conntrack state description.
Fixes: a867c010ee91 ("conntrack: Fix conntrack new state") Reported-by: Roni Bar Yanai <roniba@mellanox.com> Acked-by: Roni Bar Yanai <roniba@mellanox.com> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
Yi-Hung Wei [Fri, 7 Feb 2020 22:55:06 +0000 (14:55 -0800)]
conntrack: Fix TCP conntrack state
If a TCP connection is in SYN_SENT state, receiving another SYN packet
would just renew the timeout of that conntrack entry rather than create
a new one. Thus, tcp_conn_update() should return CT_UPDATE_VALID_NEW.
This also fixes regressions of a couple of OVN system tests.
Fixes: a867c010ee91 ("conntrack: Fix conntrack new state") Reported-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Tested-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: William Tu <u9012063@gmail.com>
datapath-windows: Append tunnel info to upcall for correct template
Formerly, there is no tunnel information appended in the upcall’s
packet data, which is expected by IPFIX in userspace to calculate
the template for exporting the sampled flow record of on egress
tunnel port.
To fix this, during performing OvsOutputUserspaceAction(), we
would check whether it is initiated by the sampling on egress
tunnel which would be indicated by the attribute as
OVS_USERSPACE_ATTR_EGRESS_TUN_PORT in the nested attribute
list. If so, we would append the tunKey in OvsForwardingContext
indexed by OVS_PACKET_ATTR_EGRESS_TUN_KEY to the upcall.
Besides, at this point, the source transport port and source ip
address are not available in the structure, so we have to fill it in the
way how the packet would be capsulated during performing
OvsEncapGeneve(), which is following the
OvsOutputUserspaceAction() unfortunately.
I have tested the IPFIX functionality with the change, we could see the
template is correct and the expected tunnel information could be
packed in the IPFIX packet finally. The traffic for test is generated by
PING utility.
>From d727d051c9a44a4a93e5ee5f3da3ca9b125aad29 Mon Sep 17 00:00:00 2001
From: Amber Hu <qhu@vmware.com>
Date: Thu, 30 Jan 2020 18:01:32 -0800
Subject: [PATCH v3] datapath-windows: Append tunnel info to upcall for correct
template
Signed-off-by: Amber Hu <qhu@vmware.com> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org> Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
Ilya Maximets [Thu, 6 Feb 2020 13:24:23 +0000 (14:24 +0100)]
netdev-dpdk: Don't enable offloading on HW device if not requested.
DPDK drivers has different implementations of transmit functions.
Enabled offloading may cause driver to choose slower variant
significantly affecting performance if userspace TSO wasn't requested.
Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") Reported-by: David Marchand <david.marchand@redhat.com> Acked-by: David Marchand <david.marchand@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Flavio Leitner [Mon, 3 Feb 2020 21:45:50 +0000 (18:45 -0300)]
netdev-linux: Prepend the std packet in the TSO packet
Usually TSO packets are close to 50k, 60k bytes long, so to
to copy less bytes when receiving a packet from the kernel
change the approach. Instead of extending the MTU sized
packet received and append with remaining TSO data from
the TSO buffer, allocate a TSO packet with enough headroom
to prepend the std packet data.
Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") Suggested-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
David Marchand [Tue, 4 Feb 2020 21:28:26 +0000 (22:28 +0100)]
netdev-dpdk: Fix port init when lacking Tx offloads for TSO.
The check on TSO capability did not ensure ip checksum, tcp checksum and
TSO tx offloads were available which resulted in a port init failure
(example below with a ena device):
*2020-02-04T17:42:52.976Z|00084|dpdk|ERR|Ethdev port_id=0 requested Tx
offloads 0x2a doesn't match Tx offloads capabilities 0xe in
rte_eth_dev_configure()*
Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") Reported-by: Ravi Kerur <rkerur@gmail.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Eli Britstein [Tue, 14 Jan 2020 13:21:15 +0000 (13:21 +0000)]
flow: Fix parsing l3_ofs with partial offloading
l3_ofs should be set all Ethernet packets, not just IPv4/IPv6 ones.
For example for ARP over VLAN tagged packets, it may cause wrong
processing like in changing the VLAN ID action. Fix it.
Fixes: aab96ec4d81e ("dpif-netdev: retrieve flow directly from the flow mark") Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
William Tu [Thu, 23 Jan 2020 17:03:11 +0000 (09:03 -0800)]
docs: Add header install command for afxdp.
The 'XDP_RING_NEED_WAKEUP' and related flags are defined if_xdp.h, so if
users are building their own kernel, users have to update the kernel's
header files, by doing:
$ make headers_install INSTALL_HDR_PATH=/usr
Otherwise the following error shows:
/usr/local/include/bpf/xsk.h: In function 'xsk_ring_prod__needs_wakeup':
/usr/local/include/bpf/xsk.h:82:21: error: 'XDP_RING_NEED_WAKEUP' undeclared \
(first use in this function)
return *r->flags & XDP_RING_NEED_WAKEUP;
Reported-by: Tomek Osinski <osinstom@gmail.com>
Reported-at: https://osinstom.github.io/en/tutorial/ovs-afxdp-installation/ Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Ben Pfaff <blp@ovn.org> Signed-off-by: William Tu <u9012063@gmail.com>
Yi-Hung Wei [Fri, 20 Dec 2019 17:51:08 +0000 (09:51 -0800)]
conntrack: Fix conntrack new state
In connection tracking system, a connection is established if we
see packets from both directions. However, in userspace datapath's
conntrack, if we send a connection setup packet in one direction
twice, it will make the connection to be in established state.
This patch fixes the aforementioned issue, and adds a system traffic
test for UDP and TCP traffic to avoid regression.
Fixes: a489b16854b59 ("conntrack: New userspace connection tracker.") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
Ilya Maximets [Thu, 23 Jan 2020 18:10:05 +0000 (19:10 +0100)]
dpif: Fix dp_extra_info leak by reworking the allocation scheme.
dpctl module leaks the 'dp_extra_info' in case the dumped flow doesn't
fit the dump filter while executing dpctl/dump-flows and also while
executing dpctl/get-flow.
This is already a 3rd attempt to fix all the leaks and incorrect usage
of this string that definitely indicates poor initial design of the
feature.
Flow dump/get documentation clearly states that the caller does not own
the data provided in dpif_flow. Datapath still owns all the data and
promises to not free/modify it until the next quiescent period, however
we're requesting the caller to free 'dp_extra_info' and this obviously
breaks the rules.
This patch fixes the issue by by storing 'dp_extra_info' within
'struct dp_netdev_flow' making datapath to own it. 'dp_netdev_flow'
is RCU-protected, so it will be valid until the next quiescent period.
Fixes: 0e8f5c6a38d0 ("dpif-netdev: Modified ovs-appctl dpctl/dump-flows command") Tested-by: Emma Finn <emma.finn@intel.com> Acked-by: Emma Finn <emma.finn@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Ben Pfaff [Thu, 9 Jan 2020 20:49:44 +0000 (12:49 -0800)]
ofproto: Do not delete datapath flows on exit by default.
Commit e96a5c24e853 ("upcall: Remove datapath flows when setting
n-threads.") caused OVS to delete datapath flows when it exits through
any graceful means. This is not necessarily desirable, especially when
OVS is being stopped as part of an upgrade. This commit changes OVS so
that it only removes datapath flows when requested, via "ovs-appctl
exit --cleanup".
Ben Pfaff [Thu, 9 Jan 2020 20:49:43 +0000 (12:49 -0800)]
ofproto-dpif-upcall: Get rid of udpif_synchronize().
RCU provides the semantics we want from udpif_synchronize() and it
should be much more lightweight than killing and restarting all the
upcall threads. It looks like udpif_synchronize() was written before
the OVS tree had RCU support, which is probably why we didn't use it
here from the beginning. So we can just change udpif_synchronize()
to a single ovsrcu_synchronize() call.
However, udpif_synchronize() only has a single caller, which calls
ovsrcu_synchronize() anyway just beforehand, via xlate_txn_commit().
So we can get rid of udpif_synchronize() entirely, which this patch
does.
As a side effect, this eliminates one reason why terminating OVS cleanly
clears the datapath flow table. An upcoming patch will eliminate
other reasons.
Acked-by: Numan Siddique <numans@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ning Wu [Wed, 22 Jan 2020 07:46:58 +0000 (23:46 -0800)]
lib/stream-windows.c: Grant Access Privilege of Named Pipe to Creator
Current implementation of ovs on windows only allows LocalSystem and
Administrators to access the named pipe created with API of ovs.
Thus any service that needs to invoke the API to create named pipe
has to run as System account to interactive with ovs. It causes the
system more vulnerable if one of those services was break into.
The patch adds the creator owner account to allowed ACLs.
Signed-off-by: Ning Wu <nwu@vmware.com> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org> Acked-by: Anand Kumar <kumaranand@vmware.com> Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
John Hurley [Thu, 19 Dec 2019 14:58:43 +0000 (14:58 +0000)]
tc: handle packet mark of zero
Openstack may set an skb mark of 0 in tunnel rules. This is considered to
be an unused/unset value. However, it prevents the rule from being
offloaded.
Check if the key value of the skb mark is 0 when it is in use (mask is
set to all ones). If it is then ignore the field and continue with TC offload.
Only the exact-match case is covered by this patch as it addresses the
Openstack use-case and seems most robust against feature evolution: f.e. in
future there may exist hardware offload scenarios where an operation, such
as a BPF offload, sets the SKB mark before proceeding tho the in-HW OVS.
datapath.
Signed-off-by: John Hurley <john.hurley@netronome.com> Co-Authored-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Aaron Conole <aconole@redhat.com>
Ciara Loftus [Mon, 20 Jan 2020 13:09:36 +0000 (13:09 +0000)]
Documentation: add notes for TSO & i40e
When using TSO in OVS-DPDK with an i40e device, the following
patch is required for DPDK, which fixes an issue on the TSO path:
https://patches.dpdk.org/patch/64136/
Document this as a limitation until a DPDK release with the fix
included is supported by OVS.
Also, document best known methods for performance tuning when
testing TSO with the tool iperf.