The commit 69c51582ff786 ("dpif-netlink: don't allocate per
thread netlink sockets"), in Open vSwitch ovs-vswitchd, has
changed the number of allocated sockets to just one per port
by moving the socket array from a per handler structure to
a per datapath one. In the kernel datapath, a vport will have
only one socket in most cases; if so, select it directly in the
fast path.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
net: openvswitch: fix possible memleak on destroy flow-table
When we destroy the flow tables, which may contain flow_masks,
release the flow mask structs as well.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Added additional compat layer fixup for WRITE_ONCE()
In most cases *index < ma->max and the flow-mask is not NULL, so
add likely()/unlikely() annotations for performance.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Simplify the code and remove the unnecessary BUILD_BUG_ON.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
A full lookup on the flow table traverses the whole mask array.
If the mask array is too large and the number of invalid flow-masks
increases, performance will drop.
One bad case, for example: M means the flow-mask is valid and NULL
means the flow-mask was deleted.
In that case, without this patch, openvswitch will traverse the whole
mask array, because there will be one flow-mask in the tail. This
patch changes the way flow-masks are inserted and deleted, so that the
mask array is kept as below: there is no NULL hole. In the
fast path, we can "break" out of the "for" loop (not "continue") in
flow_lookup when we get a NULL flow-mask.
"break"
v
+-------------------------------------------+
| M | M | NULL |... | NULL | NULL|
+-------------------------------------------+
This patch doesn't optimize the slow or control path, which still uses
ma->max to traverse. Slow path:
* tbl_mask_array_realloc
* ovs_flow_tbl_lookup_exact
* flow_mask_find
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
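The hole-free mask array described above can be sketched as follows. This is a minimal Python model, not the kernel code: the class MaskArray and its methods are hypothetical names chosen for illustration, and None stands in for a NULL flow-mask. Deletion moves the last valid entry into the freed slot, so valid entries stay contiguous and the fast-path lookup can stop at the first NULL.

```python
class MaskArray:
    """Toy model of a flow-mask array that is kept free of NULL holes."""

    def __init__(self, max_entries):
        self.masks = [None] * max_entries  # None plays the role of NULL
        self.count = 0                     # number of valid masks

    def insert(self, mask):
        # New masks always go right after the last valid entry.
        if self.count == len(self.masks):
            raise OverflowError("mask array is full")
        self.masks[self.count] = mask
        self.count += 1

    def delete(self, mask):
        # Fill the hole with the last valid entry so no NULL gap remains.
        i = self.masks.index(mask, 0, self.count)
        last = self.count - 1
        self.masks[i] = self.masks[last]
        self.masks[last] = None
        self.count -= 1

    def lookup(self, predicate):
        # Fast path: stop ("break") at the first NULL instead of skipping it.
        visited = 0
        for mask in self.masks:
            if mask is None:
                break
            visited += 1
            if predicate(mask):
                return mask, visited
        return None, visited
```

With eight slots and two valid masks, a failed lookup in this model visits two entries rather than eight.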
net: openvswitch: don't unlock mutex when changing the user_features fails
Unlocking a mutex that is not locked is not allowed.
Another kernel thread may be in the critical section while
we unlock it because setting user_features failed.
Fixes: 95a7233c4 ("net: openvswitch: Set OvS recirc_id from tc chain index") Cc: Paul Blakey <paulb@mellanox.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
In rtnl_net_notifyid(), we certainly can't pass a null GFP flag to
rtnl_notify(). A GFP_KERNEL flag would be fine in most circumstances,
but there are a few paths calling rtnl_net_notifyid() from atomic
context or from RCU critical sections. The latter also precludes the use
of gfp_any() as it wouldn't detect the RCU case. Also, the nlmsg_new()
call is wrong too, as it uses GFP_KERNEL unconditionally.
Therefore, we need to pass the GFP flags as parameter and propagate it
through function calls until the proper flags can be determined.
In most cases, GFP_KERNEL is fine. The exceptions are:
* openvswitch: ovs_vport_cmd_get() and ovs_vport_cmd_dump()
indirectly call rtnl_net_notifyid() from RCU critical section,
* rtnetlink: rtmsg_ifinfo_build_skb() already receives GFP flags as
parameter.
Also, in ovs_vport_cmd_build_info(), let's change the GFP flags used
by nlmsg_new(). The function is allowed to sleep, so better make the
flags consistent with the ones used in the following
ovs_vport_cmd_fill_info() call.
Found by code inspection.
Fixes: 9a9634545c70 ("netns: notify netns id events") Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Backport the datapath.c portion of this fix.
$ tc filter add dev dev1 ingress \
prio 1 chain 0 proto ip \
flower tcp ct_state -trk \
action ct pipe \
action goto chain 2
Received packets will first travel through tc, and if they aren't stolen
by it, like in the above rule, they will continue to OvS datapath.
Since we already did some actions (action ct in this case) which might
modify the packets, and updated action stats, we would like to continue
the processing with the correct recirc_id in OvS (here recirc_id(2))
where we left off.
To support this, introduce a new skb extension for tc, which
will be used for translating tc chain to ovs recirc_id to
handle these miss cases. Last tc chain index will be set
by tc goto chain action and read by OvS datapath.
Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Backport the local datapath changes from this patch and add compat
layer fixup for the DECLARE_STATIC_KEY_FALSE macro.
openvswitch: Print error when ovs_execute_actions() fails
Currently in function ovs_dp_process_packet(), return values of
ovs_execute_actions() are silently discarded. This patch prints out
a debug message when an error happens so as to provide helpful hints
for debugging. Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
net: openvswitch: do not update max_headroom if new headroom is equal to old headroom
When a vport is deleted, the maximum headroom size would be changed.
If the vport which has the largest headroom is deleted,
the new max_headroom would be set.
But, if the new headroom size is equal to the old headroom size,
the update routine is unnecessary.
Signed-off-by: Taehee Yoo <ap420073@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
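The headroom rule above can be sketched in a few lines. This is an illustrative Python model only: the function name on_vport_delete and the dictionary of per-vport headrooms are hypothetical, not the kernel's data structures.

```python
def on_vport_delete(headrooms, name, max_headroom):
    """Remove a vport and return (max_headroom, updated).

    'headrooms' maps a vport name to its headroom; 'updated' tells whether
    the datapath-wide maximum actually had to be rewritten.
    """
    removed = headrooms.pop(name)
    if removed < max_headroom:
        # The deleted vport did not define the maximum, nothing changes.
        return max_headroom, False
    new_max = max(headrooms.values(), default=0)
    if new_max == max_headroom:
        # Another vport has the same headroom, so skip the update routine.
        return max_headroom, False
    return new_max, True
```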
Ilya Maximets [Mon, 5 Oct 2020 10:09:55 +0000 (12:09 +0200)]
dpctl: Fix broken flow deletion via ovs-dpctl due to missing ufid.
Current code generates UFID for flows installed by ovs-dpctl. This
leads to inability to remove such flows by the same command. Ex:
ovs-dpctl add-dp test
ovs-dpctl add-if test vport0
ovs-dpctl add-flow test "in_port(0),eth(),eth_type(0x800),ipv4(src=100.1.0.1)" 0
ovs-dpctl del-flow test "in_port(0),eth(),eth_type(0x800),ipv4(src=100.1.0.1)"
dpif|WARN|system@test: failed to flow_del (No such file or directory)
ufid:e4457189-3990-4a01-bdcf-1e5f8b208711 in_port(0),
eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),eth_type(0x0800),
ipv4(src=100.1.0.1,dst=0.0.0.0,proto=0,tos=0,ttl=0,frag=no)
ovs-dpctl: deleting flow (No such file or directory)
Perhaps you need to specify a UFID?
During the del-flow operation a UFID is generated too, however the resulting
value is different from the one generated during add-flow. This happens
because the odp_flow_key_hash() function uses a random base value for flow
hashes which is different on every invocation. That is not an issue
while running 'ovs-appctl dpctl/{add,del}-flow' because execution
of these requests happens in the context of the OVS main process, i.e.
there will be the same random seed.
Commit e61984e781e6 was intended to allow offloading for flows
added by dpctl/add-flow unixctl command, so it's better to generate
UFIDs conditionally inside dpctl command handler only for appctl
invocations. Offloading is not possible from ovs-dpctl utility anyway.
There are still a couple of corner cases: It will not be possible to
remove a flow by 'ovs-appctl dpctl/del-flow' without specifying a UFID if
the main OVS process was restarted since flow addition and it will not
be possible to remove a flow by ovs-dpctl without specifying a UFID if
it was added by 'ovs-appctl dpctl/add-flow'. But these scenarios
seem minor since these commands are intended for testing only.
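The seed mismatch behind the failed del-flow can be illustrated with a small model. This sketch is an assumption-laden stand-in: SHA-256 replaces whatever hash odp_flow_key_hash() really uses, and the names make_ufid and Process are hypothetical. It only shows that the same flow key yields the same identifier within one process and a different one under another basis.

```python
import hashlib


def make_ufid(flow_key, basis):
    """Derive a 128-bit identifier from a flow key and a per-process basis."""
    data = basis.to_bytes(4, "big") + flow_key.encode()
    return hashlib.sha256(data).digest()[:16].hex()


class Process:
    """Stand-in for one running program with its own random hash basis."""

    def __init__(self, basis):
        self.basis = basis

    def ufid(self, flow_key):
        return make_ufid(flow_key, self.basis)
```

Two separate ovs-dpctl invocations correspond to two Process objects with different bases, while the add and del requests sent through ovs-appctl are served by a single one.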
Ilya Maximets [Wed, 7 Oct 2020 13:23:49 +0000 (15:23 +0200)]
travis: Disable check for array of flexible structures in sparse.
Sparse introduced new checks for flexible arrays and there is a
false-positive in the netdev-linux implementation right now that cannot
be easily fixed. A patch was sent to sparse to fix it, but we need to
disable the check for now to unblock our CI.
lib/netdev-linux.c:1238:19: error: array of flexible structures
The issue is with the following code:
    union {
        struct cmsghdr cmsg;
        char buffer[CMSG_SPACE(sizeof(struct tpacket_auxdata))];
    } cmsg_buffers[NETDEV_MAX_BURST];
'struct cmsghdr' contains a flexible array. But this union is a way
to ensure correct alignment of 'buffer', suggested by CMSG manpage.
Ilya Maximets [Thu, 8 Oct 2020 13:53:43 +0000 (15:53 +0200)]
datapath: Fix exposing OVS_TUNNEL_KEY_ATTR_GTPU_OPTS to kernel module.
The kernel module doesn't know about GTPU and it should return a correct
out-of-range error in case this tunnel attribute is passed there for
any reason. The current out-of-tree module will pass the range check
and will try to access the ovs_tunnel_key_lens[] array by index
OVS_TUNNEL_KEY_ATTR_GTPU_OPTS. Even though it might not produce
issues in the current code, this is not a good thing to do since the
ovs_tunnel_key_lens[] array is not explicitly initialized for
OVS_TUNNEL_KEY_ATTR_GTPU_OPTS and we will likely have a misleading
error about incorrect attribute length in the end.
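The difference between the two error paths can be shown with a toy validator. Everything here is hypothetical: the attribute numbering, the KEY_LENS table and the check_attr function are made up for illustration and do not mirror the real ovs_tunnel_key_lens[] contents.

```python
KEY_LENS = [8, 4, 4, 1]   # expected payload length for toy attributes 0..3
GTPU_OPTS = 4             # attribute known only to userspace in this model


def check_attr(attr, length, attr_max):
    """Validate one attribute against the range limit and the length table."""
    if attr > attr_max:
        return "out of range"
    # A slot without an explicit initializer reads as 0 in this model.
    expected = KEY_LENS[attr] if attr < len(KEY_LENS) else 0
    if length != expected:
        return "bad length"
    return "ok"
```

When the range limit wrongly includes the userspace-only attribute, the caller sees a length complaint; with the limit lowered, the caller gets the accurate out-of-range answer.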
Ted Elhourani [Tue, 6 Oct 2020 20:25:56 +0000 (20:25 +0000)]
dns-resolve: Allow unbound's config file to be set through an env var.
When an unbound context is created, check whether OVS_UNBOUND_CONF has been
set. If a valid config file is supplied then use it to configure the
context. The procedure returns if the config file is invalid. If no config
file is found then the default unbound config is used.
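The decision described above has three outcomes, which a short sketch makes explicit. This is a Python model under stated assumptions: pick_unbound_conf is a hypothetical helper, and "valid" is approximated by a plain file-existence check, whereas the real code relies on the unbound library to accept or reject the file.

```python
import os

DEFAULT_CONF = None  # stands for "use the resolver's built-in defaults"


def pick_unbound_conf(environ=os.environ, is_valid=os.path.isfile):
    """Return (ok, path); ok is False when the supplied file is invalid."""
    path = environ.get("OVS_UNBOUND_CONF")
    if not path:
        # Variable not set: fall back to the default configuration.
        return True, DEFAULT_CONF
    if not is_valid(path):
        # A file was named but cannot be used: give up instead of guessing.
        return False, None
    return True, path
```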
ofproto-dpif-upcall: Log the emergency flow flush.
When the number of flows in the datapath reaches twice the
maximum, revalidators will delete all flows as an emergency
action to recover. In that case, log a message with values
and increase a coverage counter.
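A minimal sketch of the emergency check, assuming a strict "more than twice the limit" comparison (the exact comparison used by the revalidators is not stated above). The logger name, the counter name and the message text are hypothetical.

```python
import logging

log = logging.getLogger("upcall_sketch")   # hypothetical logger name
coverage = {"emergency_flush": 0}          # hypothetical coverage counter


def check_flow_limit(n_flows, flow_limit):
    """Return True and record the event when an emergency flush is needed."""
    if n_flows <= 2 * flow_limit:
        return False
    coverage["emergency_flush"] += 1
    log.warning("datapath flows (%d) exceed twice the flow limit (%d), "
                "deleting all flows", n_flows, flow_limit)
    return True
```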
ovsdb-idl.at: Queue for termination all OVSDB IDL pids.
When running OVSDB cluster tests on Windows not all the ovsdb processes
are terminated. Queue up the pids of the started processes for
termination when the test stops.
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
system-userspace-packet-type-aware.at: Wait for ip address updates.
ovs-router module checks for the source ip address of the interface
while adding a new route. netdev module doesn't request ip addresses
from the system every time, but instead it caches currently assigned
ip addresses and updates the cache on netlink notifications if needed.
So, there is a slight delay between setting ip address on interface
in a system and a moment OVS updates list of ip addresses of this
interface. If route addition happens within this time frame, it
fails with the following error:
# ovs-appctl ovs/route/add 10.0.0.0/24 br-p1
Error while inserting route.
ovs-appctl: ovs-vswitchd: server returned an error
This makes system tests fail frequently.
Let's wait until the local route is successfully added. This will mean
that OVS finished processing the netlink event and will use an up-to-date
list of ip addresses on the desired interface.
Fixes: 526cf4e1d6a8 ("tests: Added unit tests in packet-type-aware.at") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org>
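The waiting pattern used by the fix can be sketched generically. This is an illustrative Python helper, not the test-suite macro itself; wait_until and its parameters are hypothetical names.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll 'predicate' until it returns True or 'timeout' seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the scenario above, the predicate would retry the route addition and report whether it succeeded, so the test proceeds only once OVS has caught up with the address change.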
Tomasz Konieczny [Mon, 17 Feb 2020 11:37:36 +0000 (12:37 +0100)]
docs: Add flow control on i40e issue
There is an issue with flow control configuration on i40e devices
and it has a workaround. We add this to the documentation as a known issue
until a permanent solution is developed.
Signed-off-by: Tomasz Konieczny <tomaszx.konieczny@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
windows, tests: Strip EOL characters when passing them to tasklist
When running OVSDB cluster tests on Windows not all the ovsdb
processes are terminated.
Strip carriage return and newline of the arguments passed to the kill
command because they will cause problems when passing them to tasklist
and taskkill.
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org> Acked-by: Ilya Maximets <i.maximets@ovn.org>
windows: Document how to generate the Windows installer
This patch adds information on how to generate the Windows installer
which can be used to easily deploy the userspace binaries, kernel module
and create services on new environments.
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org> Acked-by: Ilya Maximets <i.maximets@ovn.org>
Currently, we terminate a daemon by trying
"ovs-appctl exit", "SIGTERM" and finally "SIGKILL".
But the logic fails if during "ovs-appctl exit", the
daemon crashes (segfaults). The monitor will automatically
restart the daemon with a new pid. The current logic of
checking the non-existence of old pid succeeds and we proceed
with the assumption that the daemon is dead.
This is a problem during OVS upgrades as we will continue
to run the older version of OVS.
With this commit, we take care of this situation. If there
is a segfault, the pidfile is not deleted. So, we wait a
little to give time for the monitor to restart the daemon
(which is usually instantaneous) and then re-read the pidfile.
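The re-read step can be sketched as below. This is a Python model of the idea only: read_pid and restarted_pid are hypothetical helper names, and the grace period value is an assumption.

```python
import time


def read_pid(pidfile):
    """Return the pid stored in 'pidfile', or None if it cannot be read."""
    try:
        with open(pidfile) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def restarted_pid(pidfile, old_pid, grace=0.5):
    """Return the pid of a daemon restarted by the monitor, if any.

    After the old pid has disappeared, wait briefly and re-read the pidfile.
    A different pid means the monitor brought the daemon back and it still
    has to be stopped; None means no new daemon took over.
    """
    time.sleep(grace)
    new_pid = read_pid(pidfile)
    if new_pid is None or new_pid == old_pid:
        return None
    return new_pid
```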
Strongswan seems to have .opt files in the source tree with the dotted
option syntax. It seems that up until version 5.6, the syntax was also
accepted by Strongswan.
However, the .opt files are converted to .conf files during Strongswan
build, and the dotted syntax is no longer accepted by Strongswan (tested
on 5.8.2).
The effect was that the ovs ipsec monitor fails to start Strongswan,
since that complains with:
/etc/strongswan.d/ovs.conf:4: syntax error, unexpected ., expecting : or '{' or '=' [.]
This commit fixes the configuration file provided to Strongswan to .conf
syntax.
selinux: Add missing permissions for ovs-kmod-ctl.
On RHEL 8, a SELinux policy is missing when ovs-kmod-ctl uses modprobe
to load kernel modules. This patch adds the missing permissions based
on /var/log/audit/audit.log
Example log of the AVC violations:
type=AVC msg=audit(1599075387.136:65): avc: denied { read } for
pid=1472 comm="modprobe" name="modules.alias.bin" dev="dm-0" ino=586629
scontext=system_u:system_r:openvswitch_load_module_t:s0
tcontext=system_u:object_r:modules_dep_t:s0 tclass=file permissive=0
type=AVC msg=audit(1599085253.148:45): avc: denied { open } for pid=1355
comm="modprobe" path="/usr/lib/modules/4.18.0-193.el8.x86_64/modules.dep.bin"
dev="dm-0" ino=624258 scontext=system_u:system_r:openvswitch_load_module_t:s0
tcontext=unconfined_u:object_r:modules_dep_t:s0 tclass=file permissive=0
Dumitru Ceara [Mon, 3 Aug 2020 15:05:28 +0000 (17:05 +0200)]
ovsdb: Add unixctl command to show storage status.
If a database enters an error state, e.g., in case of RAFT when reading
the DB file contents if applying the RAFT records triggers constraint
violations, there's no way to determine this unless a client generates a
write transaction. Such write transactions would fail with "ovsdb-error:
inconsistent data".
This commit adds a new command to show the status of the storage that's
backing a database.
Example, on an inconsistent database:
$ ovs-appctl -t /tmp/test.ctl ovsdb-server/get-db-storage-status DB
status: ovsdb error: inconsistent data
Example, on a consistent database:
$ ovs-appctl -t /tmp/test.ctl ovsdb-server/get-db-storage-status DB
status: ok
Signed-off-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
ovsdb-tool: Add a db consistency check to the ovsdb-tool check-cluster command.
There are some occurrences where the database ends up in an inconsistent
state. This happened in ovn-k8s and is described in [0].
Here we are adding a supported way to check that a given db is consistent,
which is less error prone than checking the logs.
Tested against both a valid db and a corrupted db attached to the
above bug [1]. Also, tested with a fresh db that did not do a snapshot.
OVS_USER_ID was being picked up from a previously existing
openvswitch.useropts rendering ineffective any configuration change
through sysconfig.
There is no ordering between Exec* and Environment* stanzas of systemd,
full Environment* is always loaded before each Exec*. We make
sure that openvswitch.useropts is removed in a first Exec so that a
fresh OVS_USER_ID can be picked up from config in successive Exec*.
Fixes: 94e1e8b ("rhel: run ovn with the same user as ovs") Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com> Acked-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
William Tu [Tue, 17 Mar 2020 21:39:40 +0000 (14:39 -0700)]
classifier: Fix use of uninitialized value.
Coverity reports use of uninitialized value of cursor.
This happens in cls_cursor_start(), when rule is false,
cursor.subtable is uninitialized. CID 279324.
Signed-off-by: William Tu <u9012063@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Flavio Leitner [Tue, 10 Mar 2020 14:39:36 +0000 (11:39 -0300)]
userspace-tso: Document the minimum kernel version.
The kernel needs to be at least 4.19-rc7 to include the commit 9d2f67e43b73 ("net/packet: fix packet drop as of virtio gso")
otherwise the TSO packets are dropped when using raw sockets.
Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") Reported-by: Yi Yang <yangyi01@inspur.com> Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Ben Pfaff [Mon, 22 Jul 2019 17:35:24 +0000 (10:35 -0700)]
Documentation: Correct claims about Debian packaging.
The documentation reported the union of all possible Debian- and
Debian-derived packaging. This isn't realistic: there are differences
between OVS upstream, Debian downstream, and Ubuntu downstream. This
commit distinguishes them.
Reported-by: Ravi Kerur <rkerur@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
ovsdb-idl.at: Wait all servers to join the cluster.
The test 'Check Python IDL reconnects to leader - Python3
(leader only)' fails sometimes when the first ovsdb-server
gets killed before the others have joined the cluster.
Fix the function ovsdb_cluster_start_idltest to wait for them
to join the cluster.
Timothy Redaelli [Fri, 19 Jun 2020 13:53:52 +0000 (15:53 +0200)]
ovs-dpctl-top: Skip "eth()" element.
With commit efde188622ae ("odp-util: Print eth() for Ethernet flows if
packet_type is absent.") "eth()" is printed for Ethernet flows if packet_type
is absent, but this broke "ovs-dpctl-top" since it expects that every
element has a value.
This commit skips the parsing of the empty "eth()" element.
Fixes: efde188622ae ("odp-util: Print eth() for Ethernet flows if packet_type is absent.") Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
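The skip can be illustrated with a simplified parser. This sketch is not the ovs-dpctl-top code: parse_flow and ELEMENT_RE are hypothetical names, and the regular expression ignores nested parentheses, which real datapath flows can contain.

```python
import re

# Matches "name(value)" elements; does not handle nested parentheses.
ELEMENT_RE = re.compile(r"(\w+)\(([^()]*)\)")


def parse_flow(line):
    """Split a datapath flow string into a {name: value} dictionary."""
    fields = {}
    for name, value in ELEMENT_RE.findall(line):
        if name == "eth" and value == "":
            # An empty "eth()" carries no value, so skip it.
            continue
        fields[name] = value
    return fields
```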
meta-flow: fix a typo in "MPLS Bottom of Stack Field" paragraph.
In the ovs-fields.7 manual page, the "MPLS Bottom of Stack Field" paragraph
says:
* When mpls_bos is 1, there is another MPLS label following this one,
so the Ethertype passed to pop_mpls should be an MPLS Ethertype. [...]
* When mpls_bos is 0, this MPLS label is the last one, so the Ethertype
passed to pop_mpls should be a non-MPLS Ethertype such as IPv4. [...]
The values 0 and 1 have been swapped: when BOS is 1,
then no more label stack entries follow.
Fixes: 96fee5e0a2a0 ("ovs-fields: New manpage to document Open vSwitch and OpenFlow fields.")
Reported-at: https://bugzilla.redhat.com/1842032 Reported-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Acked-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
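The corrected meaning of the bit can be checked against the standard 32-bit MPLS label stack entry layout (20-bit label, 3-bit traffic class, 1-bit bottom of stack, 8-bit TTL). The helper names below are hypothetical and only illustrate the bit positions.

```python
def parse_mpls_lse(lse):
    """Decode a 32-bit MPLS label stack entry into its four fields."""
    return {
        "label": (lse >> 12) & 0xFFFFF,  # 20-bit label
        "tc": (lse >> 9) & 0x7,          # 3-bit traffic class
        "bos": (lse >> 8) & 0x1,         # 1 = last entry in the stack
        "ttl": lse & 0xFF,               # 8-bit time to live
    }


def is_last_label(lse):
    """True when no further label stack entry follows this one."""
    return parse_mpls_lse(lse)["bos"] == 1
```

When is_last_label() is true, the Ethertype passed to pop_mpls should be a non-MPLS one such as IPv4.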
Ilya Maximets [Tue, 18 Aug 2020 14:13:29 +0000 (16:13 +0200)]
test-conntrack: Fix conntrack benchmark by clearing conntrack metadata.
Packets in the benchmark must be treated as new packets, i.e. they
should not have conntrack metadata set. Current code will set up
'pkt->md.conn' after the first run and all subsequent calls will hit
the 'fast' processing that is intended for recirculated packets making
a false impression that current conntrack implementation is lightning
fast.
Before the change:
$ ./ovstest test-conntrack benchmark 4 33554432 32 1
conntrack: 1059 ms
After (correct):
$ ./ovstest test-conntrack benchmark 4 33554432 32 1
conntrack: 92785 ms
Ilya Maximets [Fri, 21 Aug 2020 12:04:05 +0000 (14:04 +0200)]
travis: Test build of debian packages.
We had a lot of issues with debian packaging lately. This job will
check build and installation of debian packages to avoid most of such
issues in the future.
Installing only minimal set of tools, most of dependencies will be
installed according to package description, this way we will check if
we have all required dependencies listed.
Not trying to install openvswitch-ipsec package as there is an issue
that python from the pyenv for some reason doesn't see ovs packages
installed from python3-openvswitch, i.e. ipsec service is not able to
start.
Tests are skipped because they are tested in many other scenarios.
No need to waste time.
Aaron Conole [Wed, 12 Aug 2020 20:07:55 +0000 (16:07 -0400)]
connmgr: Support changing openflow versions without restarting.
When commit a0baa7dfa4fe ("connmgr: Make treatment of active and passive
connections more uniform") was applied, it did not take into account
that a reconfiguration of the allowed_versions setting would require a
reload of the ofservice object (only accomplished via a restart of OvS).
For now, during the reconfigure cycle, we delete the ofservice object and
then recreate it immediately. A new test is added to ensure we do not
break this behavior again.
Fixes: a0baa7dfa4fe ("connmgr: Make treatment of active and passive connections more uniform") Suggested-by: Ben Pfaff <blp@ovn.org>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1782834 Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Numan Siddique <numans@ovn.org> Tested-by: Numan Siddique <numans@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Emma Finn [Fri, 14 Aug 2020 13:38:49 +0000 (14:38 +0100)]
netdev-offload-dpdk: Fix for broken ethernet matching HWOL for XL710NIC.
This patch introduces a temporary workaround to fix
partial hardware offload for XL710 devices. Currently an incorrect
ethernet pattern is being set. This patch will be removed once
this issue is fixed within the i40e PMD.
Signed-off-by: Emma Finn <emma.finn@intel.com> Signed-off-by: Eli Britstein <elibr@nvidia.com> Co-authored-by: Eli Britstein <elibr@nvidia.com> Tested-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Han Zhou [Tue, 11 Aug 2020 06:15:10 +0000 (23:15 -0700)]
Revert "ovsdb-idl: Fix NULL deref reported by Coverity."
This reverts commit 68bc6f88a3a36549fcd3b6248c25c5e2e6deb8f3.
The commit causes a regression in the OVN scale test. ovn-northd's CPU usage
more than doubled for the test scenario: create and bind 12k ports.
Below are some perf data of ovn-northd when running command:
ovn-nbctl --wait=sb sync
Before reverting this commit:
- 92.42% 0.62% ovn-northd ovn-northd [.] main
- 91.80% main
+ 68.93% ovn_db_run (inlined)
+ 22.45% ovsdb_idl_loop_commit_and_wait
After reverting this commit:
- 92.84% 0.60% ovn-northd ovn-northd [.] main
- 92.24% main
+ 92.03% ovn_db_run (inlined)
Reverting this commit avoided 22.45% of the CPU caused by
ovsdb_idl_loop_commit_and_wait().
The commit changed the logic of ovsdb_idl_txn_write__() by adding
the check "datum->keys && datum->values" before discarding unchanged
data in a transaction. However, it is normal for OVSDB clients
(such as ovn-northd) to try to set columns with the same empty data
as before the transaction. IDL would discard these changes
and avoid sending big transactions to the server (which would end up as
a no-op on the server side). In the ovn scale test scenario mentioned above,
each iteration of ovn-northd would send a transaction to the server that
includes all rows of the huge Port_Binding table, which caused the
significant CPU increase of ovn-northd (and also the OVN SB DB server),
resulting in longer end-to-end latency of OVN configuration changes.
For the original problem the commit 68bc6f88 was trying to fix, it
doesn't seem to be a real problem. The NULL deref reported by
Coverity may be addressed in a future patch using a different approach,
if necessary.
Signed-off-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
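The effect of the reverted check can be shown with a loose model of a transaction write. This is a hypothetical Python sketch: txn_write, the row dictionary and the require_non_empty flag are illustrative and only approximate the "datum->keys && datum->values" condition.

```python
def txn_write(row, pending, column, new_value, require_non_empty=False):
    """Queue a column write unless it would leave the row unchanged.

    Returns True when the write was queued in 'pending'.
    """
    unchanged = row.get(column) == new_value
    if require_non_empty:
        # Models the reverted check: empty data never counts as unchanged.
        unchanged = unchanged and bool(new_value)
    if unchanged:
        return False          # discarded, nothing is sent to the server
    pending[column] = new_value
    return True
```

With the extra condition, rewriting an empty column with the same empty value is queued on every iteration, which is the kind of no-op traffic described above.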
As of DPDK 19.11, in order to use dequeue-zero-copy in the DPDK Vhost library,
the application has to disable the linear buffer option. Hence
dequeue-zero-copy is not supported for vhost applications that require
linear buffers.
An alternative DPDK based approach to disable the linear buffers within
the vhost library itself was proposed in [1], however the consensus was
that application should be responsible for disabling linear buffers.
As such this patch disables linear buffers when zero-copy is enabled.
[1] https://patches.dpdk.org/patch/67200/
Fixes: 127b6a6eea02 ("dpdk: Update to use DPDK 19.11.") Signed-off-by: Sivaprasad Tummala <Sivaprasad.Tummala@intel.com> Acked-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Ilya Maximets [Wed, 12 Aug 2020 08:57:07 +0000 (10:57 +0200)]
acinclude: Fix build with kernels with prandom* moved to prandom.h.
Recent commit c0842fbc1b18 ("random32: move the pseudo-random 32-bit
definitions to prandom.h") in upstream kernel moved the definition
of prandom_* functions from random.h to prandom.h. This change was
also backported to stable kernels.
Fixing our configure script to look for these functions in a new
location and avoid build failures:
datapath/linux/compat/include/linux/random.h:11:19:
error: redefinition of 'prandom_u32_max'
Acked-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Dumitru Ceara [Wed, 5 Aug 2020 19:40:51 +0000 (21:40 +0200)]
ovsdb-server: Replace in-memory DB contents at raft install_snapshot.
Every time a follower has to install a snapshot received from the
leader, it should also replace the data in memory. Right now this only
happens when snapshots are installed that also change the schema.
This can lead to inconsistent DB data on follower nodes and the snapshot
may fail to get applied.
Fixes: bda1f6b60588 ("ovsdb-server: Don't disconnect clients after raft install_snapshot.") Acked-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Harry van Haaren [Wed, 29 Jul 2020 10:59:34 +0000 (11:59 +0100)]
configure: explicitly disable avx512 if binutils check fails
This commit explicitly disables avx512f if the binutils assembler
check fails to correctly assemble its input.
Without this fix, there is a possibility that users can see undefined
behaviour when compiling with -march=native on a CPU which supports
avx512 and with a buggy binutils version (v2.30 and 2.31), without a
backported fix, if the compiler's vectorizing optimizations convert
scalar code to avx512 instructions.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Harry van Haaren [Wed, 29 Jul 2020 10:59:33 +0000 (11:59 +0100)]
dpif-netdev/avx512: add -fPIC flag to enable shared builds
In certain scenarios with OVS built with --enable-shared and
DPDK enabled as shared build too, Position Independent Code
is required to link the avx512.a file into the relocatable .so
that it must be linked into.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Harry van Haaren [Wed, 29 Jul 2020 10:59:32 +0000 (11:59 +0100)]
dpif-netdev/avx512: avoid compiling avx512 code if binutils check fails
This commit avoids compiling and linking of avx512 code into the
vswitch_la library if the binutils check fails. This avoids compiling
code into OVS that will not be executed due to binutils issue.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Peng He [Tue, 4 Aug 2020 01:54:56 +0000 (09:54 +0800)]
odp-util: Clear padding in the nd_extension.
Similar to the patch 67eb8110171f ("odp-util: Fix passing
uninitialized bytes in OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV*."),
when converting a flow into the netlink format, the tail
padding of nd_extension should be cleared.
This fixes the following warning logs:
|ofproto_dpif_upcall(pmd-...)|WARN|Conflicting ukey for flows:
ufid:763c7d3b-4d0c-4bff-aafc-fdfb6089c2ba
<...>,eth(...),eth_type(0x86dd),ipv6(...),icmpv6(type=135,code=0),\
nd(target=fdbd:dc02:ff:1:1::1,sll=fa:16:3e:75:b3:a9,tll=00:00:00:00:00:00),\
nd_ext(nd_reserved=0x0,nd_options_type=1)
Fixes: 9b2b84973db7 ("Support for match & set ICMPv6 reserved and options type fields") Signed-off-by: Peng He <hepeng.0320@bytedance.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
odp-util: Fix clearing match mask if set action is partially unnecessary.
While committing set() actions, commit() could wildcard all the fields
that are the same in the match key and in the set action. This leads to
a situation where the mask after commit could actually contain fewer bits
than it did before. And if the set action was partially committed, all
the fields that were the same will be cleared out from the matching key
resulting in an incorrect (too wide) flow.
For example, for the flow that matches on both src and dst mac
addresses, if the dst mac is the same and only src should be changed
by the set() action, destination address will be wildcarded in the
match key and will never be matched, i.e. flows with any destination
mac will match, which is not correct.
The code before commit dbf4a92800d0 was not able to reduce the mask,
it was only possible to expand it to exact match, so it was OK to
update original matching mask with the new value in all cases.
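The src/dst mac example above can be played through with a toy model. This is only a sketch: the field names, the dict-based masks and both 'update_match_mask_*' helpers are hypothetical, and OR-ing the masks is an assumed way of never reducing the original mask, not necessarily what the actual patch does.

```python
ETH_EXACT = 0xFFFFFFFFFFFF  # exact-match mask for a 48-bit mac address


def commit_mask(match_key, set_key):
    """Mask of the fields that the set() action really has to rewrite."""
    return {field: (ETH_EXACT if set_key[field] != match_key[field] else 0)
            for field in set_key}


def update_match_mask_buggy(match_mask, committed):
    # Overwrites the match mask, so unchanged fields become wildcards.
    updated = dict(match_mask)
    updated.update(committed)
    return updated


def update_match_mask_fixed(match_mask, committed):
    # Only adds bits; bits the flow already matches on are never cleared.
    return {field: match_mask[field] | committed.get(field, 0)
            for field in match_mask}


match_key = {"src": 0x000000000001, "dst": 0x000000000002}
match_mask = {"src": ETH_EXACT, "dst": ETH_EXACT}
set_key = {"src": 0x000000000003, "dst": 0x000000000002}  # dst unchanged

committed = commit_mask(match_key, set_key)
buggy = update_match_mask_buggy(match_mask, committed)
fixed = update_match_mask_fixed(match_mask, committed)

print(hex(buggy["dst"]))  # dst wildcarded: any destination mac matches
print(hex(fixed["dst"]))  # dst is still matched exactly
```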
Fixes: dbf4a92800d0 ("odp-util: Do not rewrite fields with the same values as matched")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1854376 Acked-by: Eli Britstein <elibr@mellanox.com> Tested-by: Adrián Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The flow key is not updated when processing an OVS_ACTION_ATTR_SET
action. This affects follow-up actions; for example, the conntrack
module cannot find the created conntrack entry if the old flow key is
passed to it.
Reported-by: Rui Cao <rcao@vmware.com> Signed-off-by: Jinjun Gao <jinjung@vmware.com> Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
datapath-windows: Reset ct_mark/ct_label to support ALG
The ct_mark/ct_label settings on a related connection should keep the
same behavior as the Linux datapath. If a CT entry has a parent/master
entry, its ct_mark and ct_label should be inherited from the
corresponding fields of the parent/master entry at initialization.
Signed-off-by: Jinjun Gao <jinjung@vmware.com> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org> Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Yifeng Sun [Mon, 27 Jul 2020 19:27:23 +0000 (12:27 -0700)]
bfd: Support overlay BFD
Current OVS intercepts and processes all BFD packets, thus VM-2-VM
BFD packets get lost and the recipient VM never sees them.
This patch fixes it by only intercepting and processing BFD packets
destined to a configured BFD instance, and other BFD packets are made
available to the OVS flow table for forwarding.
This patch keeps BFD's backward compatibility.
VMware-BZ: #2579326 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
William Tu [Thu, 23 Jul 2020 16:32:06 +0000 (09:32 -0700)]
tests: Refactor the iptables accept rule.
Certain Linux distributions, like CentOS, have default iptables
rules to reject input traffic from br-underlay. Refactor by
creating a macro 'IPTABLES_ACCEPT([bridge])' for adding the
accept rule to the iptables input chain.
Testing the commit 5c41c31ebd64
("dpif-netdev: includes microsecond delta in meter bucket calculation")
with pktgen-dpdk shows that it doesn't work as expected and that it
broke the meter function (e.g. with the rate set to 200Mbps, the
observed rate was 400Mbps). To reproduce it:
meter-test.pkt:
| set 0 count 0
| set 0 size 1500
| set 0 rate 100
| set 0 burst 64
| set 0 sport 1234
| set 0 dport 5678
| set 0 prime 1
| set 0 type ipv4
| set 0 proto udp
| set 0 dst ip 1.1.1.2
| set 0 src ip 1.1.1.1/24
| set 0 dst mac ec:0d:9a:ab:54:0a
| set 0 src mac ec:0d:9a:bf:df:bb
| set 0 vlanid 0
| start 0
Note that the issue that patch 5c41c31ebd64 was intended to fix was
already fixed by commit: 42697ca7757b ("dpif-netdev: fix meter at high packet rate.")
Some tests check for the 'miss upcall' log in a log file immediately
after sending the packet. This causes test failures while running
them under valgrind or on an overloaded system.
Fix that by waiting for the actual string to appear in the log
file. Some other tests use 'sleep 1' to fix that, but it's better
to wait for an event than to sleep for a specific amount of time.
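The difference between sleeping and waiting for the event can be sketched as below. This is an illustration of the idea only; the helper names 'wait_until' and 'log_contains', the timeout values and the demo file are assumptions, and the OVS testsuite itself is not written in Python.

```python
import os
import tempfile
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll 'predicate' until it is true or 'timeout' seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def log_contains(path, needle):
    """True if the log file exists and already contains 'needle'."""
    try:
        with open(path, encoding="utf-8", errors="replace") as log:
            return needle in log.read()
    except FileNotFoundError:
        return False


# Demo with a throwaway file standing in for the daemon log.
with tempfile.TemporaryDirectory() as tmpdir:
    demo_log = os.path.join(tmpdir, "demo.log")
    with open(demo_log, "w", encoding="utf-8") as log:
        log.write("... miss upcall: ...\n")
    found = wait_until(lambda: log_contains(demo_log, "miss upcall"),
                       timeout=1.0)
    print(found)
```

A fast system returns as soon as the string shows up, while a slow one (valgrind, overloaded machine) gets the whole timeout instead of a fixed one-second guess.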
Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: William Tu <u9012063@gmail.com>
Fixes: a692410af0f7 ("dpctl: Expand the flow dump type filter") Cc: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
Flavio Leitner [Fri, 5 Jun 2020 19:24:53 +0000 (16:24 -0300)]
docs: Remove duplicate word from vhost-user doc.
Fixes: 49df3c0fe779 ("docs: DPDK isn't a datapath, so don't use the term.") Acked-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: William Tu <u9012063@gmail.com>
Since commit 8e4e45887ec3, priority of 'local' route entries no
longer matches with 'plen'. This should be taken into account
while flushing cached routes, otherwise they will remain in OVS
even after removing them from the system:
# ifconfig eth0 11.0.0.1
# ovs-appctl ovs/route/show
--- A new route synchronized from kernel route table ---
Cached: 11.0.0.1/32 dev eth0 SRC 11.0.0.1 local
# ifconfig eth0 0
# ovs-appctl ovs/route/show
-- the new route entry is still in ovs route table ---
Cached: 11.0.0.1/32 dev eth0 SRC 11.0.0.1 local
CC: wenxu <wenxu@ucloud.cn> Fixes: 8e4e45887ec3 ("ofproto-dpif-xlate: makes OVS native tunneling honor tunnel-specified source addresses") Reported-by: Zheng Jingzhou <glovejmm@163.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-July/373093.html Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: William Tu <u9012063@gmail.com>
David Marchand [Mon, 13 Jul 2020 08:06:21 +0000 (10:06 +0200)]
dpdk: Add commands to configure log levels.
When enabling debug logs in dpdk, it can be a challenge to be sure of
what is actually enabled. Add commands to list and change those log
levels.
However, these commands do not help when tracking issues in dpdk init
itself: dump log levels right after init.
Example:
$ ovs-appctl dpdk/log-list
global log level is debug
id 0: lib.eal, level is info
id 1: lib.malloc, level is info
id 2: lib.ring, level is info
id 3: lib.mempool, level is info
id 4: lib.timer, level is info
id 5: pmd, level is info
[...]
id 37: pmd.net.bnxt.driver, level is notice
id 38: pmd.net.e1000.init, level is notice
id 39: pmd.net.e1000.driver, level is notice
id 40: pmd.net.enic, level is info
[...]
$ ovs-appctl dpdk/log-set debug pmd.*:notice
$ ovs-appctl dpdk/log-list
global log level is debug
id 0: lib.eal, level is debug
id 1: lib.malloc, level is debug
id 2: lib.ring, level is debug
id 3: lib.mempool, level is debug
id 4: lib.timer, level is debug
id 5: pmd, level is debug
[...]
id 37: pmd.net.bnxt.driver, level is notice
id 38: pmd.net.e1000.init, level is notice
id 39: pmd.net.e1000.driver, level is notice
id 40: pmd.net.enic, level is notice
[...]
Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Roi Dayan [Wed, 15 Jul 2020 12:40:49 +0000 (15:40 +0300)]
rhel: openvswitch-fedora.spec.in: Fix installed but not packaged.
With the cited commit, we get an error from rpmbuild about installed
but not packaged /usr/lib64/libopenvswitchavx512.a.
Fix it by treating it as the other la files.
Fixes: 352b6c7116cd ("dpif-lookup: add avx512 gather implementation.") Signed-off-by: Roi Dayan <roid@mellanox.com> Acked-by: Ian Stokes <ian.stokes@intel.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Jeff Squyres [Thu, 9 Jul 2020 23:57:47 +0000 (16:57 -0700)]
bond: Add 'primary' interface concept for active-backup mode.
In AB bonding, if the current active slave becomes disabled, a
replacement slave is arbitrarily picked from the remaining set of
enabled slaves. This commit adds the concept of a "primary" slave: an
interface that will always be (or become) the current active slave if
it is enabled.
The rationale for this functionality is to allow the designation of a
preferred interface for a given bond. For example:
1. Bond is created with interfaces p1 (primary) and p2, both enabled.
2. p1 becomes the current active slave (because it was designated as
the primary).
3. Later, p1 fails/becomes disabled.
4. p2 is chosen to become the current active slave.
5. Later, p1 becomes re-enabled.
6. p1 is chosen to become the current active slave (because it was
designated as the primary)
Note that p1 becomes the active slave once it becomes re-enabled, even
if nothing has happened to p2.
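The six steps above can be replayed with a small selection function. This is a sketch under assumptions: 'choose_active' and its arguments are hypothetical names, and picking the first enabled interface as the replacement is a stand-in for the "arbitrary" pick described above.

```python
def choose_active(interfaces, enabled, primary=None, current=None):
    """Return the interface that should be active, or None if none can be."""
    if primary is not None and primary in enabled:
        return primary      # the primary always wins while it is enabled
    if current is not None and current in enabled:
        return current      # otherwise keep the current one if still usable
    for iface in interfaces:
        if iface in enabled:
            return iface    # fallback pick (order is an assumption)
    return None


interfaces = ["p1", "p2"]

# Steps 1-2: both enabled, p1 designated as primary.
active = choose_active(interfaces, {"p1", "p2"}, primary="p1")
print(active)
# Steps 3-4: p1 fails, p2 takes over.
active = choose_active(interfaces, {"p2"}, primary="p1", current=active)
print(active)
# Steps 5-6: p1 is re-enabled and becomes active again.
active = choose_active(interfaces, {"p1", "p2"}, primary="p1", current=active)
print(active)
```

Without a primary, the last call would keep p2 active, which matches the behavior before this change.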
This "primary" concept exists in Linux kernel network interface
bonding, but did not previously exist in OVS bonding.
Only one primary slave interface is supported per bond, and is only
supported for active/backup bonding.
The primary slave interface is designated via
"other_config:bond-primary" when creating a bond.
Also, while adding tests for the "primary" concept, make a few small
improvements to the non-primary AB bonding test.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
dpif-netdev: Avoid deadlock with offloading during PMD thread deletion.
The main thread will try to pause/stop all revalidators during datapath
reconfiguration via the datapath purge callback (dp_purge_cb) while
holding 'dp->port_mutex'. A deadlock happens if any of the
revalidator threads is already waiting on 'dp->port_mutex' while
dumping offloaded flows:
main thread revalidator
--------------------------------- ----------------------------------
reconfigure_datapath()
-> reconfigure_pmd_threads()
-> dp_netdev_del_pmd()
-> dp_purge_cb()
-> udpif_pause_revalidators()
-> ovs_barrier_block(&udpif->pause_barrier)
<waiting for revalidators to reach barrier>
<DEADLOCK>
We're not allowed to call offloading API without holding global
port mutex from the userspace datapath due to thread safety
restrictions on netdev-offload-dpdk module. And it's also not easy
to rework datapath reconfiguration process in order to move actual
PMD removal and datapath purge out of the port mutex.
So, for now, not sleeping on a mutex if it's not immediately available
seems like the easiest workaround. This will have an impact on the flow
statistics update rate and on the ability to get the latest statistics
before removing the flow (the latest stats will be lost in case we were
not able to take the mutex). However, this will allow us to operate
normally, avoiding the deadlock.
The last bit is that to avoid flapping of flow attributes and
statistics we're not failing the operation, but returning last
statistics and attributes returned by offload provider. Since those
might be updated in different threads, stores and reads are atomic.
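The workaround described above (try the mutex, fall back to the last known values) can be sketched as follows. This is not the dpif-netdev code: the class 'OffloadStatsCache', the 'query' callback and the small guard lock standing in for atomic stores and reads are all assumptions made for the illustration.

```python
import threading


class OffloadStatsCache:
    """Return fresh stats when the port mutex is free, cached ones otherwise."""

    def __init__(self, query):
        self._query = query                  # hypothetical provider callback
        self._last = None                    # last stats returned by provider
        self._last_guard = threading.Lock()  # stand-in for atomic access

    def get(self, port_mutex):
        # Never sleep on the mutex: a failed try-lock means the main
        # thread may be holding it while waiting for us at a barrier.
        if port_mutex.acquire(blocking=False):
            try:
                fresh = self._query()
            finally:
                port_mutex.release()
            with self._last_guard:
                self._last = fresh
            return fresh
        with self._last_guard:
            return self._last                # possibly stale, but no deadlock


port_mutex = threading.Lock()
samples = iter([10, 20])
cache = OffloadStatsCache(lambda: next(samples))

first = cache.get(port_mutex)       # mutex free: provider is queried
port_mutex.acquire()                # simulate the main thread holding it
second = cache.get(port_mutex)      # mutex busy: cached value is returned
port_mutex.release()
third = cache.get(port_mutex)       # mutex free again: fresh value
print(first, second, third)
```

The cost is visible in the second call: the caller gets the previous value instead of blocking, which is the statistics staleness the commit message accepts.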
Reported-by: Frank Wang (王培辉) <wangpeihui@inspur.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371753.html Fixes: a309e4f52660 ("dpif-netdev: Update offloaded flows statistics.") Acked-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Ian Stokes <ian.stokes@intel.com> Tested-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
William Tu [Mon, 13 Jul 2020 20:34:32 +0000 (13:34 -0700)]
ovs-bugtool: Fix Python3 bytes str issue.
The patch fixes two errors due to type mismatches when converting
between str and bytes:
File "/usr/local/sbin/ovs-bugtool", line 649, in main
cmd_output(CAP_NETWORK_STATUS, [OVS_DPCTL, 'dump-flows', '-m', d])
File "/usr/local/sbin/ovs-bugtool", line 278, in cmd_output
label = ' '.join(a)
TypeError: sequence item 3: expected str instance, bytes found
And
File "/usr/sbin/ovs-bugtool", line 721, in main
collect_data()
File "/usr/sbin/ovs-bugtool", line 366, in collect_data
run_procs(process_lists.values())
File "/usr/sbin/ovs-bugtool", line 1354, in run_procs
p.inst.write("\n** timeout **\n")
File "/usr/sbin/ovs-bugtool", line 1403, in write
BytesIO.write(self, s)
TypeError: a bytes-like object is required, not 'str'
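Both tracebacks come from mixing str and bytes, and they can be reproduced and avoided in isolation. The sketch below is not the ovs-bugtool patch; the helpers 'to_str' and 'make_label', the class 'BytesOutput' and the sample datapath name are hypothetical and only show one way to normalize the types at the boundary.

```python
from io import BytesIO


def to_str(item):
    """Decode bytes to str; leave str untouched."""
    if isinstance(item, bytes):
        return item.decode("utf-8", errors="replace")
    return item


def make_label(args):
    # ' '.join() requires every item to be str, so normalize first.
    return " ".join(to_str(a) for a in args)


class BytesOutput(BytesIO):
    """Byte buffer that also accepts str by encoding it on write."""

    def write(self, s):
        if isinstance(s, str):
            s = s.encode("utf-8")
        return super().write(s)


# First error: a bytes item inside an argument list of str.
label = make_label(["ovs-dpctl", "dump-flows", "-m", b"system@ovs-system"])
print(label)

# Second error: writing a str into a bytes buffer.
out = BytesOutput()
out.write("\n** timeout **\n")
print(out.getvalue())
```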
VMware-BZ: #2602135
Fixes: 9e6c00bca9af ("bugtool: Fix for Python3.") Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>