Yifeng Sun [Tue, 26 Jun 2018 21:06:21 +0000 (14:06 -0700)]
DNS: Add basic support for asynchronous DNS resolving
This patch is a simple implementation for the proposal discussed in
https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337038.html and
https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/340013.html.
It enables ovs-vswitchd and other utilities to use DNS names when specifying
OpenFlow and OVSDB remotes.
Below are some of the features and limitations of this patch:
- Resolving is asynchronous in daemon context, avoiding blocking the main loop;
- Resolving is synchronous in general utility context;
- Both IPv4 and IPv6 are supported;
- The resolving API is thread-safe;
- Depends on the unbound library;
- When multiple IP addresses are returned, only the first one is used;
- /etc/nsswitch.conf isn't respected as the unbound library doesn't look at it;
- For async-resolving, the caller needs to retry later; there is no callback.
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
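The "retry later, no callback" behaviour listed above can be sketched as follows. This is a minimal illustration in Python, not the patch's C code and not the unbound API: the class name PollingResolver, its resolve() method, and the injectable lookup parameter are all hypothetical, and socket.getaddrinfo stands in for the real resolver backend.

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor


class PollingResolver:
    """Hypothetical resolver: lookups run in a worker thread and callers poll."""

    def __init__(self, lookup=socket.getaddrinfo):
        self._lookup = lookup          # injectable so the sketch can run offline
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._lock = threading.Lock()  # lets several threads call resolve() safely
        self._pending = {}             # hostname -> Future for an in-flight lookup
        self._cache = {}               # hostname -> first resolved address

    def resolve(self, hostname):
        """Return the first address for hostname, or None if not known yet."""
        with self._lock:
            if hostname in self._cache:
                return self._cache[hostname]
            future = self._pending.get(hostname)
            if future is None:
                # First request: start the lookup; the caller must retry later.
                self._pending[hostname] = self._pool.submit(
                    self._lookup, hostname, None)
                return None
            if not future.done():
                return None            # still in flight, nothing blocks
            del self._pending[hostname]
            try:
                answers = future.result()
            except OSError:
                return None            # failed; a later call starts a new lookup
            if not answers:
                return None
            # Each answer is (family, type, proto, canonname, sockaddr);
            # only the first one is used, as in the limitation listed above.
            self._cache[hostname] = answers[0][4][0]
            return self._cache[hostname]
```

A caller in a main loop would simply call resolve() on each iteration and treat None as "not ready yet".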
Yifeng Sun [Fri, 6 Jul 2018 15:28:27 +0000 (08:28 -0700)]
dpif-netlink-rtnl: Retry smaller MTU when default MAX_MTU is too large.
When MAX_MTU is larger than the maximum MTU supported by the hardware,
dpif_netlink_rtnl_create will fail. This leads to the
test failure '11: datapath - ping over gre tunnel'
in 'make check-kmod'.
This patch fixes the issue by retrying with a smaller MTU
when MAX_MTU is too large.
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
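The retry idea can be sketched as below. This is an illustration only: the candidate MTU values, the use of EINVAL as the "MTU too large" error, and the create callable are assumptions, not details taken from the patch.

```python
import errno


def create_with_mtu_fallback(create, candidates=(65000, 9000, 1500)):
    """Call create(mtu) with each candidate until one succeeds; return that MTU.

    The candidate list and the EINVAL check are illustrative assumptions.
    """
    if not candidates:
        raise ValueError("at least one candidate MTU is required")
    last_error = None
    for mtu in candidates:
        try:
            create(mtu)
            return mtu
        except OSError as e:
            if e.errno != errno.EINVAL:
                raise                  # unrelated failure: do not mask it
            last_error = e             # MTU rejected: try the next, smaller one
    raise last_error
```

Only the "MTU rejected" error triggers a retry; any other failure is propagated unchanged so that real problems are not hidden.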
OVN: add IPv6 address unreachable support to OVN router ports
Add priority-70 flows to generate ICMPv6 address unreachable messages
in reply to IPv6 packets directed to the router's IP address on IP
protocols other than UDP, TCP, and ICMP.
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
OVN: add IPv6 UDP port unreachable support to OVN logical router
Add priority-80 flow to generate ICMPv6 port unreachable messages in
reply to IPv6 UDP datagrams directed to the router's IP address since the
logical router doesn't accept any UDP traffic.
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
ofproto-dpif-xlate: Fix packet_in reason for Table-miss rule
Currently in OvS, if we hit a "Table-miss" rule (associated with a Controller
action), we send a PACKET_IN message to the controller with the reason
OFPR_NO_MATCH.
A "Table-miss" rule is a catch-all rule whose priority is 0.
But if we hit the same "Table-miss" rule after executing a group entry, we
send the reason as OFPR_ACTION (for OF1.3 and below) or OFPR_GROUP
(for OF1.4 and above).
This is because once we execute a group entry we set ctx->in_group, and later,
when we hit the "Table-miss" rule, since ctx->in_group is set, we send the
reason as OFPR_ACTION (for OF1.3 and below) or OFPR_GROUP (for OF1.4 and above).
For example, for the following pipeline, we will send the reason as OFPR_ACTION
even if we hit the "Table-miss" rule.
Ian Stokes [Wed, 27 Jun 2018 13:58:31 +0000 (14:58 +0100)]
dpdk: Support both shared and per port mempools.
This commit re-introduces the concept of shared mempools as the default
memory model for DPDK devices. Per port mempools are still available but
must be enabled explicitly by a user.
OVS previously used a shared mempool model for ports with the same MTU
and socket configuration. This was replaced by a per port mempool model
to address issues flagged by users such as:
However, the per port model potentially requires an increase in memory
resource requirements to support the same number of ports and configuration
as the shared port model.
This is considered a blocking factor for current deployments of OVS
when upgrading to future OVS releases as a user may have to redimension
memory for the same deployment configuration. This may not be possible for
users.
This commit resolves the issue by re-introducing shared mempools as
the default memory behaviour in OVS DPDK but also refactors the memory
configuration code to allow for per port mempools.
This patch adds a new global config option, per-port-memory, that
controls the enablement of per port mempools for DPDK devices.
ovs-vsctl set Open_vSwitch . other_config:per-port-memory=true
This value defaults to false; to enable per port memory support,
this field should be set to true when setting other global parameters
on init (such as "dpdk-socket-mem", for example). Changing the value at
runtime is not supported, and requires restarting the vswitch
daemon.
The mempool sweep functionality is also replaced with the
sweep functionality from OVS 2.9 found in commits
c77f692 (netdev-dpdk: Free mempool only when no in-use mbufs.)
a7fb0a4 (netdev-dpdk: Add mempool reuse/free debug.)
A new document to discuss the specifics of the memory models and example
memory requirement calculations is also added.
Signed-off-by: Ian Stokes <ian.stokes@intel.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Tiago Lam <tiago.lam@intel.com> Tested-by: Tiago Lam <tiago.lam@intel.com>
Yuanhan Liu [Mon, 25 Jun 2018 13:21:08 +0000 (16:21 +0300)]
dpif-netdev: do hw flow offload in a thread
Currently, the major trigger for hw flow offload is upcall handling,
which is actually in the datapath. Moreover, hw offload installation
and modification are not lightweight. If many flows are added or
modified frequently, this could stall the datapath, which could
result in packet loss.
To mitigate that, all those flow operations are recorded and appended
to a list. A thread is then introduced to process this list (to do the
real flow offloading put/del operations). This keeps the datapath
as lightweight as possible.
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org> Co-authored-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
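The list-plus-thread arrangement described above can be sketched as a simple producer/consumer. This is a Python illustration, not the dpif-netdev code: OffloadThread, enqueue(), and the apply_op callback are hypothetical names.

```python
import queue
import threading


class OffloadThread:
    """Hypothetical worker that applies queued flow put/del operations in order."""

    def __init__(self, apply_op):
        self._apply_op = apply_op      # stands in for the real hw offload call
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def enqueue(self, op, flow):
        # Called from the datapath: only appends, never talks to the hardware.
        self._queue.put((op, flow))

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:           # sentinel: stop the worker
                break
            op, flow = item
            self._apply_op(op, flow)   # the slow part runs off the datapath

    def stop(self):
        self._queue.put(None)
        self._thread.join()
```

The datapath thread pays only the cost of a queue append; the potentially slow hardware operation happens in the worker, in the same order in which the operations were recorded.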
Finn Christensen [Mon, 25 Jun 2018 13:21:06 +0000 (16:21 +0300)]
netdev-dpdk: implement flow offload with rte flow
The basic yet major part of this patch is to translate the "match"
to rte flow patterns. We then create an rte flow with MARK + RSS
actions. Afterwards, all packets that match the flow will have the mark id in
the mbuf.
RSS is needed because, for most NICs, a MARK-only action is not
allowed. It has to be used together with some other action, such as
QUEUE or RSS. However, the QUEUE action can specify only one queue, which
may break RSS. The RSS action is currently the best option available,
so the RSS action is chosen.
For any unsupported flows, such as MPLS, -1 is returned, meaning that the
flow offload has failed and is skipped.
Co-authored-by: Yuanhan Liu <yliu@fridaylinux.org> Signed-off-by: Finn Christensen <fc@napatech.com> Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org> Co-authored-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Yuanhan Liu [Mon, 25 Jun 2018 13:21:05 +0000 (16:21 +0300)]
dpif-netdev: retrieve flow directly from the flow mark
This lets us skip some very costly CPU operations, including but
not limited to miniflow_extract, emc lookup, and dpcls lookup. Thus,
performance could be greatly improved.
A PHY-PHY forwarding test with 1000 mega flows (udp,tp_src=1000-1999) and
1 million streams (tp_src=1000-1999, tp_dst=2000-2999) shows more than a
260% performance boost.
Note that although the heavy miniflow_extract is skipped, we still have
to do per-packet checking, because we have to check the tcp_flags.
Co-authored-by: Finn Christensen <fc@napatech.com> Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org> Signed-off-by: Finn Christensen <fc@napatech.com> Co-authored-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
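The fast path described above amounts to "try the mark first, fall back to the full lookup". A minimal sketch in Python follows. The packet dictionary, the mark_to_flow table, and the slow_lookup callable are hypothetical stand-ins, and the per-packet tcp_flags handling is left out.

```python
def lookup_flow(packet, mark_to_flow, slow_lookup):
    """Return (flow, path), where path is "mark" or "full".

    packet is a dict that may carry a "mark" set by the NIC; slow_lookup
    stands in for miniflow_extract plus the emc/dpcls lookups.
    """
    mark = packet.get("mark")
    if mark is not None and mark in mark_to_flow:
        return mark_to_flow[mark], "mark"   # costly lookups skipped
    return slow_lookup(packet), "full"      # unmarked or unknown mark
```

A packet without a mark, or with a mark that is not in the table, simply takes the usual lookup path, so the fast path never changes which flow is returned.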
Yuanhan Liu [Mon, 25 Jun 2018 13:21:03 +0000 (16:21 +0300)]
dpif-netdev: associate flow with a mark id
Most modern NICs have the ability to bind a flow with a mark, so that
every packet that matches such a flow will have that mark present in its
descriptor.
The basic idea is that, when we receive packets later, we can
get the flow directly from the mark. That avoids some very costly
CPU operations, including (but not limited to) miniflow_extract, emc
lookup, and dpcls lookup. Thus, performance could be greatly improved.
The major work of this patch is therefore to associate a flow with a mark
id (a uint32_t number). The association in the netdev datapath is done
by CMAP, while in hardware it's done by the rte_flow MARK action.
One tricky thing in OVS-DPDK is that the flow tables are per-PMD. When
there is only one phys port but with 2 queues, there could be 2
PMDs. In other words, even for a single mega flow (e.g. udp,tp_src=1000),
there could be 2 different dp_netdev flows, one for each PMD. That could
result in the same mega flow being offloaded twice in the hardware;
worse, we may get 2 different marks and only the last one will work.
To avoid that, a megaflow_to_mark CMAP is created. An entry is
added for the first PMD that wants to offload a flow. Later PMDs
will see that the megaflow is already offloaded, so the flow will not
be offloaded to HW twice.
Meanwhile, the mark to flow mapping becomes a 1:N mapping. That is
what the mark_to_flow CMAP is for. When the first PMD wants to offload
a flow, it allocates a new mark and performs the flow offload by reusing
the ->flow_put method. When it succeeds, a "mark to flow" entry is
added. Later PMDs get the corresponding mark from the
megaflow_to_mark CMAP. Then, another "mark to flow" entry is added.
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org> Co-authored-by: Finn Christensen <fc@napatech.com> Signed-off-by: Finn Christensen <fc@napatech.com> Co-authored-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
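The interplay of the two maps can be sketched as follows. This is a Python illustration with plain dictionaries in place of CMAPs. The MarkTable class, its offload() method, and the hw_offload callback (standing in for ->flow_put) are hypothetical, and mark allocation is simplified to a counter.

```python
import itertools


class MarkTable:
    """Hypothetical bookkeeping for megaflow -> mark and mark -> flows (1:N)."""

    def __init__(self, hw_offload):
        self._hw_offload = hw_offload           # stands in for ->flow_put
        self._next_mark = itertools.count(1)    # simplistic mark allocator
        self.megaflow_to_mark = {}
        self.mark_to_flow = {}                  # mark -> list of per-PMD flows

    def offload(self, megaflow, pmd_flow):
        """Associate pmd_flow with a mark; return the mark or None on failure."""
        mark = self.megaflow_to_mark.get(megaflow)
        if mark is None:
            # First PMD for this megaflow: allocate a mark and program the HW.
            mark = next(self._next_mark)
            if not self._hw_offload(megaflow, mark):
                return None                     # offload failed; nothing recorded
            self.megaflow_to_mark[megaflow] = mark
        # Later PMDs reuse the mark, so the hardware is programmed only once.
        self.mark_to_flow.setdefault(mark, []).append(pmd_flow)
        return mark
```

With two PMDs offloading the same megaflow, the hardware callback runs once and both per-PMD flows end up under the same mark.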
OVN: add ICMPv6 time exceeded support to OVN logical router
Using the icmp6 action, send an ICMPv6 time exceeded frame whenever
an OVN logical router receives an IPv6 packet whose TTL has
expired (ip.ttl == {0, 1}).
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Thu, 21 Jun 2018 22:53:53 +0000 (15:53 -0700)]
ofproto-dpif: Let the dpif report when a port is a duplicate.
The port_add() function checks whether the port about to be added to the
dpif is already present and adds it only if it is not. This duplicates a
check also present (and necessary) in each dpif and races with it as well.
When a dpif has a large number of ports, the check can be expensive (it is
not efficiently implemented). It would be nice to make the check cheaper,
but it also seems reasonable to do as done in this patch and just let the
dpif report the duplication.
Reported-by: Haifeng Lin <haifeng.lin@huawei.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
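The change in approach, from "check, then add" to "add and let the dpif report the duplicate", can be sketched as below. This is a Python illustration: FakeDpif, both helper functions, and the choice of EEXIST as the error code are assumptions, not the ofproto-dpif code.

```python
import errno


class FakeDpif:
    """Stand-in datapath interface that performs its own duplicate check."""

    def __init__(self):
        self._ports = {}
        self.lookups = 0               # counts the (potentially expensive) checks

    def port_exists(self, name):
        self.lookups += 1
        return name in self._ports

    def port_add(self, name):
        if self.port_exists(name):
            return errno.EEXIST        # assumed error code for a duplicate port
        self._ports[name] = len(self._ports) + 1
        return 0


def port_add_with_precheck(dpif, name):
    # Old style: the caller checks first, then the dpif checks again.
    if dpif.port_exists(name):
        return errno.EEXIST
    return dpif.port_add(name)


def port_add_direct(dpif, name):
    # New style: one check, done by the dpif, which also avoids the race
    # between the caller's check and the dpif's own check.
    return dpif.port_add(name)
```

Adding a new port costs two lookups with the pre-check and one without it, and the result reported to the caller is the same.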
A bisect shows that commit d22f892 ("netdev-linux: monitor and offload
LAG slaves to TC") introduced netdev_linux_update_lag(), which is now
triggering a crash in the "datapath - ping over bond" test in
system-userspace-testsuite:
(gdb) bt
#0 0x00000000009762e7 in netdev_linux_update_lag (change=0x7ffdff013750) at lib/netdev-linux.c:728
728 if (is_netdev_linux_class(master_netdev->netdev_class)) {
This fixes the crash by simply returning in case netdev_from_name()
returns NULL, as this should indicate the master is not attached to the
bridge.
Additionally, netdev_linux_update_lag() isn't "clearing" the netdev
reference it gets from netdev_from_name(), meaning its ref_cnt is
incremented but never decremented. Thus, also call netdev_close() before
returning.
CC: John Hurley <john.hurley@netronome.com> Fixes: d22f8927 ("netdev-linux: monitor and offload LAG slaves to TC") Signed-off-by: Tiago Lam <tiago.lam@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Traditionally, for boolean variables we use boolean values.
Let's keep to that tradition.
Hopefully, using false with a bool works with gcc 6.3.1;
I use both recent versions of gcc (7.3) and older
versions (4.x), but did not see the issue found in commit 165c1f0649af.
Cc: Ian Stokes <ian.stokes@intel.com> Fixes: 165c1f0649af ("db-ctl-base: Fix compilation warnings.") Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Darrell Ball [Fri, 29 Jun 2018 06:39:47 +0000 (23:39 -0700)]
conntrack: Fix fragmentation checks.
The ipv4 fragmentation check is broken and allows fragments through.
There were fragile and poorly maintainable checks in extract_l3_ipv*
designed to save a few cycles. The checks make assumptions about what
sanity checks may have been done and could be skipped based on inferring
from the value of another parameter that should be unrelated (l4
pointer needing assignment). Since the benefit is minimal, remove
the special checks and always do sanity checks.
Four tests are added to better maintain fragmentation support.
This needs backporting to 2.9.
Fixes: c8b1ad49da68 ("conntrack: Reorder sanity checks in extract_l3_ipvx().") Fixes: a489b16854b5 ("conntrack: New userspace connection tracker.") Signed-off-by: Darrell Ball <dlu998@gmail.com>
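For reference, an unconditional IPv4 fragment test looks roughly like the sketch below. It is written in Python against the standard IPv4 header layout and is not the conntrack code: the function name is hypothetical, while the MF flag (0x2000) and the 13-bit fragment offset mask (0x1FFF) come from the IPv4 header definition.

```python
import struct

IP_MF = 0x2000        # "more fragments" flag
IP_OFFMASK = 0x1FFF   # 13-bit fragment offset, in units of 8 bytes


def ipv4_is_fragment(header):
    """Return True if the IPv4 header belongs to a fragment (first or later)."""
    if len(header) < 20:
        raise ValueError("truncated IPv4 header")
    # The flags/fragment-offset field is the 16-bit word at byte offset 6.
    (frag_off,) = struct.unpack_from("!H", header, 6)
    return (frag_off & (IP_MF | IP_OFFMASK)) != 0
```

The check always runs and depends only on the header bytes, rather than on inferences drawn from other parameters.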
Han Zhou [Mon, 25 Jun 2018 17:03:02 +0000 (10:03 -0700)]
ovn.at: Add stateful test for ACL on port groups.
A bug was reported on the feature of applying ACLs on port groups [1].
This bug was not detected by the original test case, because it didn't
test the return traffic and so didn't ensure that the stateful feature was
working. The fix [2] causes the original test case to fail, because
once conntrack is enabled, the test packets are dropped: the
checksums in those packets are invalid, so conntrack marks them with the
"invalid" state. To avoid the test case failure, the fix [2] changed
it to test stateless ACLs only, which leaves the scenario untested,
although it is fixed. This patch adds the stateful ACL back to the
test and replaces dummy/receive with inject-pkt to send the test
packets, so that checksums can be properly filled in. It also
adds tests for the return traffic, which ensures that the stateful feature is
working.
Signed-off-by: Han Zhou <hzhou8@ebay.com> Acked-by: Jakub Sitnicki <jkbs@redhat.com> Acked-by: Daniel Alvarez <dalvarez@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Daniel Alvarez [Wed, 20 Jun 2018 02:18:59 +0000 (04:18 +0200)]
ovn-northd: Apply pre ACLs when using Port Groups
When using Port Groups, the pre ACLs were not applied so the
conntrack action was not performed. This patch takes Port Groups
into account when processing the pre ACLs.
As a follow up, we could enhance this patch by creating an index
from lswitch to port groups.
Signed-off-by: Daniel Alvarez <dalvarez@redhat.com> Acked-by: Lucas Alvares Gomes <lucasagomes@gmail.com> Acked-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
aginwala [Sat, 9 Jun 2018 01:33:13 +0000 (18:33 -0700)]
ovndb-servers: Set connection table when using load balancer to manage ovndb clusters via pacemaker.
This will allow setting the inactivity probe on the master node.
For pacemaker to manage ovndb resources via LB, we skipped creating the connection
table, and hence the inactivity probe was set to 5000 by default.
In order to override it we need this table. However, we need to skip slaves
listening on the local sb and nb connections tables so that the LB feature is
intact and only the master is listening on 0.0.0.0
e.g --remote=db:OVN_Southbound,SB_Global,connections and
--remote=db:OVN_Northbound,NB_Global,connections
will be skipped for slave SB and NB dbs respectively by unsetting
--db-sb-use-remote-in-db and --db-nb-use-remote-in-db in ovn-ctl.
Signed-off-by: aginwala <aginwala@ebay.com> Acked-by: Numan Siddique <nusiddiq@redhat.com> Acked-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
aginwala [Fri, 8 Jun 2018 19:32:22 +0000 (12:32 -0700)]
ovn-ctl: Support NB and SB DBs to start without using remote connections.
e.g --remote=db:OVN_Southbound,SB_Global,connections and
--remote=db:OVN_Northbound,NB_Global,connections
can be skipped for cases where slaves do not need to listen on nb and sb db
connection tables while using pacemaker with load balancer for ovndb clusters.
Signed-off-by: aginwala <aginwala@ebay.com> Acked-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ian Stokes [Wed, 4 Jul 2018 14:28:33 +0000 (15:28 +0100)]
db-ctl-base: Fix compilation warnings.
This commit fixes uninitialized variable warnings in functions
cmd_create() and cmd_get() when compiling with gcc 6.3.1 and -Werror
by initializing variables 'symbol' and 'new' to NULL.
Cc: Alex Wang <alexw@nicira.com> Fixes: 07ff77ccb82a ("db-ctl-base: Make common database command code into library.") Signed-off-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ilya Maximets [Wed, 20 Jun 2018 07:44:51 +0000 (10:44 +0300)]
rconn: Suppress 'connected' log for unreliable connections.
A recent assertion failure fix changed the rconn workflow for unreliable
connections (such as connections from ovs-ofctl) from
|rconn|DBG|br-int<->unix#151: entering ACTIVE
|rconn|DBG|br-int<->unix#151: connection closed by peer
|rconn|DBG|br-int<->unix#151: entering DISCONNECTED
to
|rconn|DBG|br-int<->unix#200: entering CONNECTING
|rconn|INFO|br-int<->unix#200: connected
|rconn|DBG|br-int<->unix#200: entering ACTIVE
|rconn|DBG|br-int<->unix#200: connection closed by peer
|rconn|DBG|br-int<->unix#200: entering DISCONNECTED
Many monitoring/configuring tools (e.g. ovs-neutron-agent) use
ovs-ofctl frequently to check the statuses of installed flows.
This produces a lot of "connected" logs, which are generally useless.
Fix that by changing the log level to DBG for unreliable connections.
Suggested-by: Ben Pfaff <blp@ovn.org> Fixes: c9a9b9b00bf5 ("rconn: Introduce new invariant to fix assertion failure in corner case.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
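The level selection can be sketched as follows, using Python's logging module in place of OVS vlog. The function name and the reliable flag are hypothetical.

```python
import logging


def log_connected(logger, name, reliable):
    """Log "connected" at INFO for reliable connections and DEBUG otherwise."""
    level = logging.INFO if reliable else logging.DEBUG
    logger.log(level, "%s: connected", name)
    return level
```

Short-lived connections such as those from ovs-ofctl then only show up when debug logging is enabled.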
Ben Pfaff [Tue, 3 Jul 2018 18:32:18 +0000 (11:32 -0700)]
ofproto-macros: Ignore "Dropped # log messages" in check_logs.
check_logs ignores some log messages, but it wasn't smart enough to ignore
the messages that said that the ignored messages had been rate-limited.
This fixes the problem.
It's OK to ignore all rate-limiting messages because they only appear if at
least one message was not rate-limited, which check_logs will catch anyway.
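The filtering logic can be sketched in Python as below. The real check_logs is a test-suite macro, so this is only an illustration: the function name, the ignored argument, and the sample log lines used to exercise it are invented. Only the "Dropped # log messages" wording comes from the commit title.

```python
import re

# Matches the rate-limiting notice named in the commit title.
DROPPED_RE = re.compile(r"Dropped \d+ log messages")


def unexpected_log_lines(lines, ignored=()):
    """Return the lines that are neither ignored nor rate-limiting notices."""
    ignored_res = [re.compile(pattern) for pattern in ignored]
    result = []
    for line in lines:
        if DROPPED_RE.search(line):
            continue                   # always safe to skip, see above
        if any(r.search(line) for r in ignored_res):
            continue
        result.append(line)
    return result
```

Skipping every rate-limiting notice is safe for the reason given above: at least one instance of the underlying message is always logged and is still checked.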
Jakub Sitnicki [Mon, 2 Jul 2018 10:50:09 +0000 (12:50 +0200)]
db-ctl-base: Don't die in ctl_set_column() on error.
Return the error message to the caller instead of reporting it and dying
so that the caller can handle the error without terminating the process
if needed.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
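The pattern applied throughout this series, returning an error message instead of terminating, can be sketched as below. This is a Python illustration of the idea only. The C functions have different signatures, and the table names and message text here are invented.

```python
def get_table(tables, name):
    """Return (table, error); error is None on success, a message otherwise."""
    table = tables.get(name)
    if table is None:
        return None, 'unknown table "%s"' % name
    return table, None


def run_command(tables, name):
    """Example caller: it decides what to do with the error and need not exit."""
    table, error = get_table(tables, name)
    if error is not None:
        return "error: " + error       # e.g. report it and keep running
    return "ok: " + table
```

A short-lived command-line utility can still print the message and exit, while a long-running caller can report the error and continue.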
Jakub Sitnicki [Mon, 2 Jul 2018 10:50:08 +0000 (12:50 +0200)]
db-ctl-base: Don't die in pre_list_columns() on error.
Return the error message to the caller instead of reporting it and dying
so that the caller can handle the error without terminating the process
if needed.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Jakub Sitnicki [Mon, 2 Jul 2018 10:50:07 +0000 (12:50 +0200)]
db-ctl-base: Don't die in pre_parse_column_key_value() on error.
Return the error message to the caller instead of reporting it and dying
so that the caller can handle the error without terminating the process
if needed.
Also, we no longer return the column as it was not used by any of
existing callers.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Jakub Sitnicki [Mon, 2 Jul 2018 10:50:06 +0000 (12:50 +0200)]
db-ctl-base: Don't die in pre_get_table() on error.
Return the error message to the caller instead of reporting it and dying
so that the caller can handle the error without terminating the process
if needed.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Jakub Sitnicki [Mon, 2 Jul 2018 10:50:05 +0000 (12:50 +0200)]
db-ctl-base: Don't die in pre_get_column() on error.
Return the error message to the caller instead of reporting it and dying
so that the caller can handle the error without terminating the process
if needed.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Jakub Sitnicki [Mon, 2 Jul 2018 10:50:04 +0000 (12:50 +0200)]
db-ctl-base: Don't die in ctl_get_row() on error.
Return the error message to the caller instead of reporting it and dying
so that the caller can handle the error without terminating the process
if needed.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Jakub Sitnicki [Mon, 2 Jul 2018 10:50:03 +0000 (12:50 +0200)]
db-ctl-base: Don't die in get_row_by_id() on multiple matches.
Signal that multiple rows match the record identifier via a new output
parameter instead of reporting the problem and dying, so that the caller
can handle the error without terminating the process if needed.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Jakub Sitnicki [Mon, 2 Jul 2018 10:50:02 +0000 (12:50 +0200)]
db-ctl-base: Don't die in create_symbol() on error.
Return the error message to the caller instead of reporting it and dying
so that the caller can handle the error without terminating the process
if needed.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Jakub Sitnicki [Mon, 2 Jul 2018 10:50:01 +0000 (12:50 +0200)]
db-ctl-base: Don't die in set_column() on error.
Return the error message to the caller instead of reporting it and dying
so that the caller can handle the error without terminating the process
if needed.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Jakub Sitnicki [Mon, 2 Jul 2018 10:50:00 +0000 (12:50 +0200)]
db-ctl-base: Don't die in check_mutable() on error.
Return the error message to the caller instead of reporting it and dying
so that the caller can handle the error without terminating the process
if needed.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Jakub Sitnicki [Mon, 2 Jul 2018 10:49:59 +0000 (12:49 +0200)]
db-ctl-base: Don't die in is_condition_satisfied() on error.
Return the error message to the caller instead of reporting it and dying
so that the caller can handle the error without terminating the process
if needed.
Also, rename the function as it is no longer a typical predicate, so
that the users don't assume that the result is passed in return value.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Jakub Sitnicki [Mon, 2 Jul 2018 10:49:58 +0000 (12:49 +0200)]
db-ctl-base: Don't die in get_table() on error.
Return the error message to the caller instead of reporting it and dying
so that the caller can handle the error without terminating the process
if needed.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Jakub Sitnicki [Mon, 2 Jul 2018 10:49:57 +0000 (12:49 +0200)]
db-ctl-base: Don't die in parse_column_names() on error.
Return the error message to the caller instead of reporting it and dying
so that the caller can handle the error without terminating the process
if needed.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Aaron Conole [Thu, 28 Jun 2018 00:40:04 +0000 (20:40 -0400)]
checkpatch: fix patch separator line regex
The separator line always starts with three dashes on a line, optionally
followed by either white-space, OR a single space and a filename. The
regex would previously match on any three dashes in a row. This means
that a patch (such as [1]) would trigger the parser state machine to
advance beyond the signed-off checks.
Now, bound the check only to use what git-mailinfo would use as a
separator.
--- <filename>
---<sp>
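The two accepted forms can be expressed as a regular expression like the one below. This is a sketch derived from the description above, not necessarily the exact expression used in checkpatch; the names SEPARATOR_RE and is_separator are hypothetical.

```python
import re

# Three dashes at the start of the line, followed either by optional
# white-space only, or by a single space and a filename without spaces.
SEPARATOR_RE = re.compile(r"^---(\s*| \S+)$")


def is_separator(line):
    """Return True if line looks like a patch separator line."""
    return SEPARATOR_RE.match(line) is not None
```

Lines that merely contain three dashes somewhere, or that start with a longer run of dashes, no longer advance the parser state.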
Roi Dayan [Mon, 2 Jul 2018 09:07:58 +0000 (12:07 +0300)]
netdev-tc-offloads: Fix probing multi mask per prio
When adding TC rules, we save the prio so that we can reuse the same prio
for the same mask, since a different mask has to use a different prio.
The multi mask per prio probe broke this by using a prio that
get_prio_for_tc_flower() didn't know about.
Multi mask per prio support also changes the hash calculation.
It's best for the probe to add and delete the ingress qdisc so that there is
a clean start after it.
Signed-off-by: Roi Dayan <roid@mellanox.com> Acked-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Greg Rose [Fri, 29 Jun 2018 18:18:13 +0000 (11:18 -0700)]
utilities: On RHEL 7 systems clean up after upgrade
When upgrading from older versions of OVS that used the built-in geneve
kernel module on RHEL 7 systems to newer versions that use the 'compat'
vport_geneve and vport_vxlan drivers we need to clean up some cruft
that might have been left over after the upgrade.
Remove any genev_sys_6081 and vxlan_sys_4789 interfaces and then, if
the RHEL 7 geneve or vxlan built-in drivers are loaded, remove them
before loading the new drivers.
Removing the geneve and vxlan built-in drivers will prevent occurrences
of the "unassociated datapath" errors that can sometimes occur in some
environments.
Greg Rose [Fri, 29 Jun 2018 03:31:26 +0000 (20:31 -0700)]
datapath: Add missing code in ip_tunnel_lookup()
The compat rpl_ip_tunnel_lookup() function was missing some code added
in Linux kernel release 4.3 but not backported in the initial commit.
This also allows us to remove an old hack in erspan_rcv() that was
zeroing out the key parameter so that the tunnel lookups wouldn't fail.
Fixes: 8e53509c ("gre: introduce native tunnel support for ERSPAN") Reported-by: William Tu <u9012063@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Justin Pettit <jpettit@ovn.org>
Greg Rose [Fri, 29 Jun 2018 03:31:25 +0000 (20:31 -0700)]
compat: Fix gre header bug
Commit 436d36db introduced a bug into the gre header build for gre and
ip gre type tunnels. __vlan_hwaccel_push_inside does not check whether
the vlan tag is even present. So check first and avoid padding space
for a vlan tag that isn't present.
Fixes: 436d36db ("compat: Fixups for newer kernels") Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Justin Pettit <jpettit@ovn.org>
OVN: do not mark ND packets for conntrack in PRE_LB stage
Do not send Neighbor Discovery packets to the conntrack module if
load balancing rules have been added to the NB db, since otherwise
Neighbor Advertisement frames will be discarded by OVN.
To reproduce the issue, it is enough to add 2 logical ports
to a single logical switch, assign an IPv6 address to each VIF, and
define a load balancing rule on the logical switch. After a while the
ping6 from VIF1 to VIF2 will stop, since the VM will not receive any NA
packets.
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Gurucharan Shetty <guru@ovn.org>
Darrell Ball [Thu, 28 Jun 2018 05:15:43 +0000 (22:15 -0700)]
ovn: Fix gateway load balancing.
Non-distributed and distributed gateway load balancing is broken.
Recent changes for port unreachable handling broke the associated
unsnat functionality. The fix is to check for gateway
contexts and accept packets directed to gateway router IPs.
Fixes: 86558ac2e476 ("OVN: add UDP port unreachable support to OVN logical router.") Fixes: 159932c9e4ea ("OVN: add TCP port unreachable support to OVN logical router.") Fixes: 0e858e05f76b ("OVN: add protocol unreachable support to OVN router ports.") CC: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Gurucharan Shetty <guru@ovn.org>
John Hurley [Thu, 28 Jun 2018 16:03:07 +0000 (17:03 +0100)]
netdev-linux: monitor and offload LAG slaves to TC
A LAG slave cannot be added directly to an OvS bridge, nor can an OvS
bridge port be added to a LAG dev. However, LAG masters can be added to
OvS.
Use TC blocks to indirectly offload slaves when their master is attached
as a linux-netdev to an OvS bridge. In the kernel TC datapath, blocks link
together netdevs in a similar way to LAG devices. For example, if a filter
is added to a block then it is added to all block devices, or if stats are
incremented on 1 device then the stats on the entire block are incremented.
This mimics LAG devices in that if a rule is applied to the LAG master
then it should be applied to all slaves etc.
Monitor LAG slaves via the netlink socket in netdev-linux and, if their
master is attached to the OvS bridge and has a block id, add the slave's
qdisc to the same block. Similarly, if a slave is freed from a master,
remove the qdisc from the master's block.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
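The block semantics described above (a filter added to a block applies to every device bound to it) can be sketched as follows. This is a Python model for illustration, not the TC or netdev-linux code: the TcBlocks class and its methods are hypothetical, and the block id 7 in the usage note simply plays the role of a master's ifindex.

```python
class TcBlocks:
    """Hypothetical model of TC blocks shared by several devices."""

    def __init__(self):
        self._members = {}   # block_id -> set of device names
        self._filters = {}   # block_id -> list of filters

    def bind(self, block_id, dev):
        """Bind dev's qdisc to a block; block id 0 means "no block"."""
        if block_id == 0:
            return False
        self._members.setdefault(block_id, set()).add(dev)
        return True

    def unbind(self, block_id, dev):
        self._members.get(block_id, set()).discard(dev)

    def add_filter(self, block_id, flt):
        # Added once to the block, visible on every bound device.
        self._filters.setdefault(block_id, []).append(flt)

    def filters_for(self, dev):
        """Return the filters that apply to dev through its blocks."""
        result = []
        for block_id, devs in self._members.items():
            if dev in devs:
                result.extend(self._filters.get(block_id, []))
        return result
```

For example, binding two slaves to block 7 and adding one filter to that block makes the filter apply to both slaves; unbinding a slave that has been released from the master removes it from the block's filters.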
John Hurley [Thu, 28 Jun 2018 16:03:06 +0000 (17:03 +0100)]
netdev-linux: assign LAG devs to tc blocks
Assign block ids to LAG masters that are added to OvS as linux-netdevs and
offloaded via offload API calls. Only LAG masters are assigned to blocks.
To ensure uniqueness, the block ids are determined by the netdev ifindex.
Implement a get_block_id op for linux netdevs to achieve this.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
John Hurley [Thu, 28 Jun 2018 16:03:05 +0000 (17:03 +0100)]
netdev-linux: indicate if netdev is a LAG master
If a linux netdev is added to OvS that is a LAG master (for example, a
bond or team netdev) then record this in bool form in the dev struct. Use
the link info extracted from rtnetlink calls to determine this.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
John Hurley [Thu, 28 Jun 2018 16:03:04 +0000 (17:03 +0100)]
rtnetlink: extend parser to include kind of master and slave
Extend the rtnetlink_parse function to look for linkinfo attributes and,
in turn, store pointers to the master and slave kinds (if any) in the
rtnetlink_change struct.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
John Hurley [Thu, 28 Jun 2018 16:03:03 +0000 (17:03 +0100)]
netdev-provider: add class op to get block_id
Add a new class op for netdevs to get the block_id if one exists. The
block_id is used in offload ops to group multiple qdiscs together.
Stub calls are made to the new class op (implementation to follow in
further patches). The default block_id of 0 (no block) will be used in
these cases.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
John Hurley [Thu, 28 Jun 2018 16:03:02 +0000 (17:03 +0100)]
tc: allow offloading of block ids
Blocks, in tc classifiers, allow the grouping of multiple qdiscs with an
associated block id. Whenever a filter is added to/removed from this
block, the filter is added to/removed from all associated qdiscs.
Extend TC offload functions to take a block id as a parameter. If the id
is zero then the qdisc is not considered part of a block.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Ben Pfaff [Wed, 27 Jun 2018 14:07:49 +0000 (07:07 -0700)]
ofp-meter: Fix ofp_print_meter_flags() output.
It had a missing space.
CC: Yifeng Sun <pkusunyifeng@gmail.com> Fixes: 61677bf976e9 ("ofp-meter: Fix ds_put_format that treats enum type as short integer") Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Tue, 26 Jun 2018 21:23:49 +0000 (14:23 -0700)]
ofp-meter: Fix ds_put_format that treats enum type as short integer
The Travis job fails with the error below; this patch fixes it.
lib/ofp-meter.c:340:48: error: format specifies type 'unsigned short'
but the argument has underlying type 'unsigned int' [-Werror,-Wformat]
ds_put_format(s, "flags:0x%"PRIx16" ", flags);
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
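The warning above arises because PRIx16 expects an 'unsigned short' while the enum argument is promoted to an int-sized type. One possible way to avoid this class of mismatch, sketched here with plain snprintf() rather than ds_put_format(), is to cast to a fixed-width type that matches the format macro. The enum and function names below are hypothetical, and this is not necessarily how the patch itself resolved the error.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag enum, standing in for the meter flags type. */
enum meter_flags_sketch {
    MF_KBPS  = 1 << 0,
    MF_PKTPS = 1 << 1,
    MF_BURST = 1 << 2
};

/* Writes "flags:0x<hex> " into 'buf' and returns snprintf()'s result.
 * The cast makes the argument type agree with PRIx32, so -Wformat has
 * nothing to complain about regardless of the enum's underlying type. */
int
format_flags(char *buf, size_t size, enum meter_flags_sketch flags)
{
    return snprintf(buf, size, "flags:0x%"PRIx32" ", (uint32_t) flags);
}
```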
While investigating a kernel panic, we found that it was triggered by a
large skb with an unusual geometry. Inside the STT codepath, an effort
is made to linearize
such packets to avoid trouble during both fragment reassembly and
segmentation in the linux networking core.
As currently implemented, kernels with CONFIG_SLUB defined skip this
step because the code does not expect an skb with a frag_list to be
present. This patch removes that assumption and allows these skbs to
be linearized as intended. We confirmed this corrects the panic we
encountered.
Aaron Conole [Wed, 20 Jun 2018 18:40:58 +0000 (14:40 -0400)]
checkpatch: Only consider certain signoffs
Formatted patches can contain a hierarchy of sign-offs. This is true when
merging patches from different projects (e.g. backports to the datapath
directory from the linux net project).
This means that a submitted backport will contain multiple signed-off
tags, and not all should be considered.
This commit updates checkpatch to only consider those signoff lines which
start at the beginning of a line. So the following:
    Signed-off-by: Foo Bar <foo@bar.com>
should not trigger.
Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
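The rule described above (count a sign-off only when the tag starts at the beginning of a line) can be sketched as follows. checkpatch itself is a Python script; this C fragment only illustrates the rule, and the function name is_toplevel_signoff() is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true only if 'line' begins, at column 0, with the sign-off tag.
 * Indented or quoted sign-offs (for example, ones carried inside a
 * backported commit message) do not match. */
bool
is_toplevel_signoff(const char *line)
{
    static const char tag[] = "Signed-off-by:";

    return strncmp(line, tag, sizeof tag - 1) == 0;
}
```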
Anand Kumar [Fri, 22 Jun 2018 17:09:27 +0000 (10:09 -0700)]
datapath-windows: Compute ct hash based on 5-tuple and zone
Conntrack 5-tuple consists of src address, dst address, src port,
dst port and protocol which will be unique to a ct session.
Use this information along with zone to compute hash.
Also re-factor conntrack code related to parsing netlink attributes.
Testing:
Verified loading/unloading the driver with Driver Verifier enabled.
Ran TCP/UDP and ICMP traffic.
Signed-off-by: Anand Kumar <kumaranand@vmware.com> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org> Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
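To illustrate hashing over the 5-tuple plus zone, here is a minimal sketch. It assumes IPv4 addresses only and uses FNV-1a as a stand-in hash; the struct layout, the names, and the choice of hash function are all assumptions made for illustration and do not reflect the driver's actual implementation.

```c
#include <stdint.h>

/* Hypothetical key: the conntrack 5-tuple plus the zone (IPv4 only). */
struct ct_key_sketch {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t proto;
    uint16_t zone;
};

/* Folds the low 'nbytes' bytes of 'value' into the FNV-1a state 'h'. */
static uint32_t
fnv1a_mix(uint32_t h, uint32_t value, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        h ^= (value >> (8 * i)) & 0xff;
        h *= 16777619u;                 /* 32-bit FNV prime. */
    }
    return h;
}

/* Hashes every field of the key, so two sessions with the same 5-tuple
 * in different zones are hashed from different inputs. */
uint32_t
ct_key_hash(const struct ct_key_sketch *k)
{
    uint32_t h = 2166136261u;           /* 32-bit FNV offset basis. */

    h = fnv1a_mix(h, k->src_ip, 4);
    h = fnv1a_mix(h, k->dst_ip, 4);
    h = fnv1a_mix(h, k->src_port, 2);
    h = fnv1a_mix(h, k->dst_port, 2);
    h = fnv1a_mix(h, k->proto, 1);
    h = fnv1a_mix(h, k->zone, 2);
    return h;
}
```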
Anand Kumar [Fri, 22 Jun 2018 17:09:26 +0000 (10:09 -0700)]
datapath-windows: Implement locking in conntrack NAT.
This patch primarily replaces the existing NDIS RW lock based implementation
for NAT in conntrack with a spinlock based implementation inside the NAT
module, along with some conntrack optimization.
- The 'ovsNatTable' and 'ovsUnNatTable' tables are shared
between cleanup threads and packet processing thread.
In order to protect these two tables use a spinlock.
Also introduce counters to track number of nat entries.
- Introduce a new function OvsGetTcpHeader() to retrieve TCP header
and payload length, to optimize for TCP traffic.
- Optimize conntrack look up.
- Remove 'bucketlockRef' member from conntrack entry structure.
Testing:
Verified loading/unloading the driver with Driver Verifier enabled.
Ran TCP/UDP and ICMP traffic.
Signed-off-by: Anand Kumar <kumaranand@vmware.com> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org> Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
Anand Kumar [Fri, 22 Jun 2018 17:09:25 +0000 (10:09 -0700)]
datapath-windows: Use spinlock instead of RW lock for ct entry
This patch mainly changes the NDIS RW lock for the conntrack entry to a
spinlock, along with some minor refactoring in conntrack. A spinlock
is used instead of an RW lock because RW locks cause performance hits
when acquired/released multiple times.
- Use NdisInterlockedXX wrapper APIs instead of InterlockedXX.
- Update 'ctTotalRelatedEntries' using interlocked functions.
- Move conntrack lock out of NAT module.
Testing:
Verified loading/unloading the driver with Driver Verifier enabled.
Ran TCP/UDP and ICMP traffic.
Signed-off-by: Anand Kumar <kumaranand@vmware.com> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org> Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
Eelco Chaudron [Wed, 20 Jun 2018 09:04:03 +0000 (11:04 +0200)]
utilities: Add the ovs_show_fdb command to gdb
This adds the ovs_show_fdb command:
Usage: ovs_show_fdb {<bridge_name> {dbg} {hash}}
<bridge_name> : Optional bridge name, if not supplied FDB summary
information is displayed for all bridges.
dbg : Will show structure address information
hash : Will display the forwarding table using the hash
table, rather than the LRU list.
Signed-off-by: Andy Zhou <azhou@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Justin Pettit <jpettit@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com>
Justin Pettit [Tue, 19 Jun 2018 21:10:17 +0000 (14:10 -0700)]
datapath: Fix compiler warning for HAVE_RHEL7_MAX_MTU.
Fixes: 1e40b541bc ("datapath: Fix max MTU size on RHEL 7.5 kernel") Signed-off-by: Justin Pettit <jpettit@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com>
ovn: Fix DHCP classless static route for non-classful masks.
When determining how many bytes of the IP address need to be included
in the classless static route option, take the number of network bits
in the mask and divide it by 8. If the division leaves a remainder, it
must not be ignored: add 1 byte to the length of the option.
Signed-off-by: Rostyslav Fridman <rostyslav_fridman@epam.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
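The byte count described above is a ceiling division of the prefix length by 8. A minimal sketch, with a hypothetical function name that is not taken from the patch:

```c
/* Number of destination address bytes to include in a classless static
 * route entry for a mask with 'prefix_len' network bits (0..32).
 * Equivalent to prefix_len / 8, plus 1 if there is a remainder. */
unsigned int
route_dest_octets(unsigned int prefix_len)
{
    return (prefix_len + 7) / 8;
}
```

For example, a /24 mask needs 3 bytes, while a /25 mask (not on a byte boundary) needs 4.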
Lorenzo Bianconi [Mon, 18 Jun 2018 11:56:00 +0000 (13:56 +0200)]
OVN: add protocol unreachable support to OVN router ports
Add priority-70 flows to generate ICMP protocol unreachable messages
in reply to packets directed to the router's IP address on IP protocols
other than UDP, TCP, and ICMP.
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Lorenzo Bianconi [Mon, 18 Jun 2018 11:55:59 +0000 (13:55 +0200)]
OVN: add TCP port unreachable support to OVN logical router
Add priority-80 flows to generate TCP reset messages in reply to
TCP datagrams directed to the router's IP address since the
logical router doesn't accept any TCP traffic.
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Lorenzo Bianconi [Mon, 18 Jun 2018 11:55:58 +0000 (13:55 +0200)]
OVN: add UDP port unreachable support to OVN logical router
Add priority-80 flows to generate ICMP port unreachable messages in
reply to UDP datagrams directed to the router's IP address since the
logical router doesn't accept any UDP traffic.
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Han Zhou [Wed, 30 May 2018 17:08:26 +0000 (10:08 -0700)]
ovsdb-idl: Remove unnecessary code in track clear.
ovsdb_idl_db_track_clear() needs to free the deleted row.
However, it is unnecessary to call ovsdb_idl_row_clear_old(), because
this has already been called in ovsdb_idl_row_destroy(). It is also
confusing because it is called only if:
if (ovsdb_idl_row_is_orphan(row))
This contradicts the check in ovsdb_idl_row_clear_old():
if (!ovsdb_idl_row_is_orphan(row))
(Currently the tracked row doesn't maintain any data, so there is no
leak.)
Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Kyle Simpson [Wed, 6 Jun 2018 14:17:59 +0000 (15:17 +0100)]
ofp-actions: Build action_set in one scan of action_list.
The previous implementation scans the action set of each WRITE_ACTIONS
command 13--17 times when moving the actions over. This change builds
up the list as a single scan, which should be more efficient.
Signed-off-by: Kyle Simpson <kyleandrew.simpson@gmail.com> Co-authored-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
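The single-scan idea can be sketched as follows: walk the action list once, remembering the last action seen of each type, then emit the remembered actions in a fixed order. This is a simplified illustration under stated assumptions, not the code from the patch. The action types, struct, and function name are hypothetical, and the sketch does not model the full OpenFlow action set rules (for instance, per-field set actions or the interaction between group and output).

```c
#include <stddef.h>

/* Hypothetical action types, declared in the order the set is emitted. */
enum act_type {
    ACT_POP_VLAN,
    ACT_PUSH_VLAN,
    ACT_SET_QUEUE,
    ACT_OUTPUT,
    ACT_N_TYPES
};

struct action {
    enum act_type type;
    int arg;
};

/* Builds an action set from the 'n' actions in 'list' using one pass over
 * the list.  Stores at most ACT_N_TYPES actions in 'set' and returns how
 * many were stored. */
size_t
build_action_set(const struct action *list, size_t n,
                 struct action set[ACT_N_TYPES])
{
    const struct action *slot[ACT_N_TYPES] = { NULL };
    size_t count = 0;

    /* Single scan: a later action of a given type replaces an earlier one. */
    for (size_t i = 0; i < n; i++) {
        slot[list[i].type] = &list[i];
    }

    /* Emit the surviving actions in the fixed type order.  This loop runs
     * over the types, not over the list. */
    for (int t = 0; t < ACT_N_TYPES; t++) {
        if (slot[t]) {
            set[count++] = *slot[t];
        }
    }
    return count;
}
```

The alternative of searching the whole list once per action type costs one scan per type, which is the kind of repeated scanning the commit message describes.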