The command `make check-tabs` fails on Windows due to line ending conversions
caused by the following setting: `git config --global core.autocrlf true`
(the whitelist `build-aux/initial-tab-whitelist` becomes a blacklist).
This patch adds a .gitattributes file to build-aux to force LF endings
on Windows.
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org> Co-authored-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-by: Ben Pfaff <blp@ovn.org>
Ian Stokes [Tue, 10 Jul 2018 18:46:55 +0000 (19:46 +0100)]
ovn-nbctl: Fix compilation warnings.
This commit fixes 'maybe-uninitialized' warnings for pointers in various
functions in ovn-nbctl when compiling with gcc 6.3.1 and -Werror.
Pointers to structs nbrec_logical_switch, nbrec_logical_switch_port,
nbrec_logical_router and nbrec_logical_router_port are now initialized
to NULL where required.
Cc: Justin Pettit <jpettit@ovn.org> Cc: Venkata Anil <vkommadi@redhat.com> Fixes: 31114af758c7 ("ovn-nbctl: Update logical router port commands.") Fixes: 80f408f4cffb ("ovn: Use Logical_Switch_Port in NB.") Fixes: 36f232bca2db ("ovn: l3ha, CLI for logical router port gateway
chassis") Signed-off-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Justin Pettit <jpettit@ovn.org>
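As a minimal sketch of the pattern behind the ovn-nbctl warning fix above: when a pointer is only assigned through an out-parameter on the success path, initializing it to NULL keeps every path well defined. The names `struct row`, `lookup_row()` and `find_or_null()` are hypothetical stand-ins, not ovn-nbctl functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an IDL row type such as nbrec_logical_switch. */
struct row {
    const char *name;
};

/* Hypothetical lookup helper: on success stores the match in '*rowp' and
 * returns NULL; on failure returns an error string and leaves '*rowp'
 * untouched, which is why the caller's initialization matters. */
static const char *
lookup_row(const struct row *rows, size_t n, const char *name,
           const struct row **rowp)
{
    assert(rowp != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(rows[i].name, name)) {
            *rowp = &rows[i];
            return NULL;
        }
    }
    return "no such row";
}

/* Returns the matching row, or NULL if there is none. */
static const struct row *
find_or_null(const struct row *rows, size_t n, const char *name)
{
    const struct row *row = NULL;   /* Defined on every path. */
    const char *error = lookup_row(rows, n, name, &row);

    (void) error;                   /* A real caller would report this. */
    return row;
}
```

Without the `= NULL`, the failure path would return an indeterminate pointer, which is the situation 'maybe-uninitialized' is meant to flag.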
Martin Xu [Thu, 12 Jul 2018 23:25:24 +0000 (16:25 -0700)]
rhel: support kmod-openvswitch build against multiple kernels, rhel6
This patch only affects the rhel6 spec file.
RHEL 7.4 introduced backward-incompatible changes in the kernel. As
a result, RPM packages prebuilt against kernels newer than 693.17.1
cannot be used on systems with older kernels, and vice versa.
This patch allows multiple kernel version numbers delimited by
whitespace to be passed as variable "kversion". kmod-openvswitch RPM
packages the kernel module .ko files from all specified kernel
versions.
This patch also includes a script to update the weak-update symlinks
if the system kernel version is upgraded or downgraded after
kmod-openvswitch is installed.
Signed-off-by: Martin Xu <martinxu9.ovs@gmail.com> Co-authored-by: Greg Rose <gvrose8192@gmail.com> CC: Ben Pfaff <blp@ovn.org> CC: Flavio Leitner <fbl@redhat.com> CC: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Martin Xu [Thu, 12 Jul 2018 23:25:23 +0000 (16:25 -0700)]
rhel: remove openvswitch-kmod package from build, rhel6
This patch only affects the rhel6 spec file.
Previously the kernel_module_package macro was used to generate a spec file
template to build the kmod-openvswitch RPM. The main package only contained
the openvswitch.conf for depmod. The macro is now removed. Everything is
built in the main package instead. This effectively removes the redundant
openvswitch-kmod package from the build.
Signed-off-by: Martin Xu <martinxu9.ovs@gmail.com> CC: Greg Rose <gvrose8192@gmail.com> CC: Ben Pfaff <blp@ovn.org> CC: Flavio Leitner <fbl@redhat.com> CC: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Martin Xu [Thu, 12 Jul 2018 23:25:22 +0000 (16:25 -0700)]
rhel: rename openvswitch kmod rhel6 spec file
This patch only affects the rhel6 spec file.
The rhel6 kmod spec file is renamed from openvswitch-kmod-rhel6.spec
to kmod-openvswitch-rhel6.spec. This is to prepare for the next
patches to support building multiple kernel versions in the main
package. The rename makes the spec file consistent with the resulting
kmod-openvswitch-<version>.rpm, which is the real package with
kernel module files.
Signed-off-by: Martin Xu <martinxu9.ovs@gmail.com> Reviewed-by: Flavio Leitner <fbl@redhat.com> CC: Greg Rose <gvrose8192@gmail.com> CC: Ben Pfaff <blp@ovn.org> CC: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
William Tu [Wed, 11 Jul 2018 16:45:08 +0000 (09:45 -0700)]
datapath: work around the single GRE receive limitation.
Commit 9f57c67c379d ("gre: Remove support for sharing GRE protocol hook")
allows only a single GRE packet receiver. When the upstream kernel's gre
module is loaded, gre.ko becomes the only GRE packet receiver,
preventing the OVS kernel module from registering another GRE receiver.
We can either try to unload gre.ko by removing its dependencies,
or, as in this patch, register OVS for only the GRE transmit
portion when another GRE receiver is detected to exist already.
Signed-off-by: William Tu <u9012063@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Cc: Greg Rose <gvrose8192@gmail.com> Cc: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
conntrack: Fix conn_update_state_alg use after free.
When conn_update_state() returns true, conn has been freed, so skip calling
handle_ftp_ctl() with this conn and instead follow the code path for new
connections.
Fixes: bd5e81a0e596 ("Userspace Datapath: Add ALG infra and FTP.") Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
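The control flow described in the conntrack fix above can be sketched as follows. This is a reduced model under stated assumptions: `struct conn`, `update_state()`, `handle_alg()` and `process_packet()` are hypothetical stand-ins for the real conntrack code, which is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, much-reduced connection record. */
struct conn {
    bool expired;
    int alg_calls;      /* How often the ALG handler ran on this record. */
};

/* Stand-in for conn_update_state(): returns true when it had to free 'conn'
 * and the caller must treat the packet as belonging to a new connection. */
static bool
update_state(struct conn *conn)
{
    assert(conn != NULL);
    if (conn->expired) {
        free(conn);
        return true;
    }
    return false;
}

/* Stand-in for an ALG handler such as handle_ftp_ctl(). */
static void
handle_alg(struct conn *conn)
{
    conn->alg_calls++;
}

/* Processes one packet.  Returns the connection that is valid afterwards:
 * 'conn' itself, or a freshly allocated record if 'conn' was freed. */
static struct conn *
process_packet(struct conn *conn)
{
    if (update_state(conn)) {
        /* 'conn' is dangling here, so it must not reach handle_alg().
         * Follow the new-connection path instead. */
        conn = calloc(1, sizeof *conn);
        assert(conn != NULL);
        return conn;
    }
    handle_alg(conn);
    return conn;
}
```

The point of the sketch is only the ordering: the freed pointer is replaced before anything else can dereference it.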
Ben Pfaff [Tue, 10 Jul 2018 16:27:18 +0000 (09:27 -0700)]
sparse: Make IN6_IS_ADDR_MC_LINKLOCAL and IN6_ARE_ADDR_EQUAL pickier.
On GNU systems these macros work with arbitrary pointers, but the relevant
standards only require IN6_IS_ADDR_MC_LINKLOCAL to work with in6_addr (and
don't specify IN6_ARE_ADDR_EQUAL at all). Make the "sparse"
implementations correspondingly pickier so that we catch any introduced
problems more quickly.
Ken Sanislo [Wed, 20 Jun 2018 21:44:08 +0000 (14:44 -0700)]
ifupdown.sh: Correctly bring up bond slaves.
It seems that line 70 needs to operate on the $slave variable created
in the for loop at line 68. Bonded interfaces fail to bring up their links
with the current version; this change makes them work correctly.
Signed-off-by: Ken Sanislo <ken@intherack.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Tiago Lam [Thu, 21 Jun 2018 17:39:16 +0000 (18:39 +0100)]
bridge: Clean leaking netdevs when route is added.
When adding a route to a bridge, by executing "$appctl ovs/route/add
$IP/$MASK $BR", a reference to the existing netdev is taken and stored
in an instantiated ip_dev struct which is then stored in an addr_list
list in tnl-ports.c. When OvS is signaled to exit, as a result of a
"$appctl $OVS_PID exit --cleanup", for example, the bridge takes care of
destroying its allocated port and iface structs. While destroying and
freeing an iface, the netdev associated with it is also destroyed.
However, for this to happen its ref_cnt must be 0. Otherwise the
destructor of the netdev (specific to each datapath) won't be called. On
the userspace datapath this means a system interface, such as "br0",
wouldn't get deleted upon exit of OvS (when a route happens to be
associated).
This was first observed in the "ptap - triangle bridge setup with L2 and
L3 GRE tunnels" test, which runs as part of the system userspace
testsuite and uses the netdev datapath (as opposed to several tests
which use the dummy datapath, where this issue isn't seen). The test
would pass on one run and fail on the next because the
needed system interfaces (br-p1, br-p2 and br-p3) were already present
(from the previous successful run which didn't clean up properly),
leading to a failure.
To fix the leak and clean up the interfaces upon exit, in its final
stage before destroying a netdev, in iface_destroy__(), the bridge calls
tnl_port_map_delete_ipdev() which takes care of freeing the instantiated
ip_dev structs that refer to a specific netdev.
An extra test is also introduced which verifies that the resources used
by OvS netdev datapath have been correctly cleaned up between
OVS_TRAFFIC_VSWITCHD_STOP and AT_CLEANUP.
Signed-off-by: Tiago Lam <tiago.lam@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
xlate: use const struct in6_addr in linklocal check
Commit 83c2757bd16e ("xlate: Move tnl_neigh_snoop() to
terminate_native_tunnel()") introduced a call to
IN6_IS_ADDR_MC_LINKLOCAL() when checking neighbor discovery.
The call assumes that the argument may be a const uint8_t *.
According to The Open Group Base Specifications Issue 7, 2018:
macro is of type int and takes a single argument of
type const struct in6_addr *
The GNU implementation allows a bit of flexibility, by internally
casting the argument. However, other implementations (such as OS X)
more rigidly implement the standard and fail with errors like:
error: member reference base type 'const uint8_t'
(aka 'const unsigned char') is not a structure or union
Fixes: 83c2757bd16e ("xlate: Move tnl_neigh_snoop() to terminate_native_tunnel()") Cc: Zoltan Balogh <zoltan.balogh.eth@gmail.com> Cc: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
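One portable way to satisfy the argument type the standard names, when only raw address bytes are at hand, is sketched below. It is not necessarily how the xlate commit above resolves the issue; `raw_ipv6_is_mc_linklocal()` is a hypothetical helper introduced for illustration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Returns true if the 16 raw bytes at 'bytes' form a link-local multicast
 * IPv6 address (ff02::/16).  The bytes are copied into a struct in6_addr so
 * that the macro receives a 'const struct in6_addr *' compatible argument
 * on every platform, not only on GNU systems. */
static bool
raw_ipv6_is_mc_linklocal(const uint8_t bytes[16])
{
    struct in6_addr addr;

    assert(bytes != NULL);
    memcpy(&addr, bytes, sizeof addr);
    return IN6_IS_ADDR_MC_LINKLOCAL(&addr) != 0;
}
```

Copying avoids both the type mismatch and any alignment assumptions about the source buffer, at the cost of 16 bytes of stack.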
Ben Pfaff [Mon, 9 Jul 2018 20:04:03 +0000 (13:04 -0700)]
flow: Fix buffer overread for crafted IPv6 packets.
The ipv6_sanity_check() function implemented the check for IPv6 payload
length incorrectly: ip6_plen is the payload length, but this function checked
whether it was longer than the total length of IPv6 header plus payload.
This meant that a packet with a crafted ip6_plen could result in a buffer
overread of up to the length of an IPv6 header (40 bytes).
The kernel datapath flow extraction code does not obviously have a similar
problem.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9287 Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Darrell Ball <dlu998@gmail.com>
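The length comparison described in the flow commit above can be illustrated with a small sketch. It works on raw bytes rather than on the OVS header structs, and `ipv6_len_ok()` and `IPV6_HDR_LEN` are hypothetical names; only the arithmetic is the point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN 40   /* Fixed IPv6 header size in bytes. */

/* Returns true if the buffer of 'size' bytes at 'pkt' is large enough to
 * hold the IPv6 header plus the payload length that the header claims. */
static bool
ipv6_len_ok(const uint8_t *pkt, size_t size)
{
    assert(pkt != NULL);
    if (size < IPV6_HDR_LEN) {
        return false;
    }
    /* Payload length: bytes 4 and 5 of the header, network byte order. */
    size_t plen = ((size_t) pkt[4] << 8) | pkt[5];

    /* Compare against the bytes that remain after the header.  Comparing
     * 'plen' against 'size' alone would accept up to 40 bytes too many. */
    return plen <= size - IPV6_HDR_LEN;
}
```

For a 60-byte buffer, a claimed payload of 20 bytes is accepted and 21 is rejected, whereas a check against the total size would have accepted anything up to 60.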
Jakub Sitnicki [Sat, 7 Jul 2018 11:09:36 +0000 (13:09 +0200)]
ovn-nbctl: Report the actual error from the command handler.
Fix a typo that went undetected by tests because we don't have any test
cases for error paths when using database commands with ovn-nbctl.
Fixes: 675b152e999f ("db-ctl-base: Extend ctl_context with an error message.") Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
OVN: add unit test for TCPv6 port unreachable support
Add a unit test for the TCP reset segment sent by an OVN logical router when
it receives an IPv6 TCP segment directed to the router's IP address, since
the logical router doesn't accept any TCP traffic.
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Thu, 5 Jul 2018 22:28:51 +0000 (15:28 -0700)]
ofp-group: Don't assert-fail decoding bad OF1.5 group mod type or command.
When decoding a group mod, the current code validates the group type and
command after the whole group mod has been decoded. The OF1.5 decoder,
however, tries to use the type and command earlier, when it might still be
invalid. This caused an assertion failure (via OVS_NOT_REACHED). This
commit fixes the problem.
ovs-vswitchd does not enable support for OpenFlow 1.5 by default.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9249 Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
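The ordering problem in the ofp-group commit above, using a decoded value before it has been validated, can be sketched in isolation. The enum values and function names below are hypothetical and do not mirror the OpenFlow wire encoding or the OVS decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical group types; not the OpenFlow numbering. */
enum group_type { GT_ALL, GT_SELECT, GT_INDIRECT, GT_FF };
enum decode_error { DECODE_OK = 0, DECODE_BAD_TYPE };

/* Helper that is only defined for valid types.  Reaching the end is a
 * programming error, in the spirit of OVS_NOT_REACHED(). */
static const char *
group_type_name(enum group_type type)
{
    switch (type) {
    case GT_ALL: return "all";
    case GT_SELECT: return "select";
    case GT_INDIRECT: return "indirect";
    case GT_FF: return "ff";
    }
    assert(!"unreachable");
    return NULL;
}

/* Decodes the untrusted wire value 'wire'.  Validation happens before any
 * helper that assumes a valid type, so bad input yields an error code
 * instead of an assertion failure. */
static enum decode_error
decode_group_type(uint8_t wire, const char **namep)
{
    assert(namep != NULL);
    if (wire > GT_FF) {
        *namep = NULL;
        return DECODE_BAD_TYPE;
    }
    *namep = group_type_name((enum group_type) wire);
    return DECODE_OK;
}
```

Input from the network is rejected with an error, while the assertion stays reserved for genuine internal bugs.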
Mark Michelson [Tue, 26 Jun 2018 18:42:32 +0000 (14:42 -0400)]
ovn: Add router load balancer undnat rule for IPv6
When configuring a router port to have a redirect-chassis and using an
IPv6 load balancer rule that specifies a TCP/UDP port, load balancing
would not work as expected. This is because a rule to un-dnat the return
traffic from the load balancer destination was not installed; the
rule was only being installed for IPv4 load balancers.
This change adds the same rule for IPv6 load balancers as well.
Signed-off-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
CC: Yifeng Sun <pkusunyifeng@gmail.com> Fixes: 771680d96f ("DNS: Add basic support for asynchronous DNS resolving") Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
ovs-sandbox: Fix ovs-appctl for ovn-northd and ovn-controller.
Commits 1e8eeb66db2e7 ("ovs-sandbox: Support starting multiple
ovn-northds.") and 047458de40391 ("ovs-sandbox: Add option to support
multiple ovn-controllers.") allowed starting multiple instances of
ovn-northd and ovn-controller, respectively. They did this by assigning a
sequence number to the pidfile name. Unfortunately, this breaks the
method ovs-appctl uses to determine to which process it should connect.
This commit changes the behavior so that a sequence number is not added
to the first instance, so ovs-appctl will connect to that by default.
This commit also uses the same convention for naming the log file.
Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
ovs-sandbox: Use different log file names for ovn-controllers.
Commit 047458de40391 ("ovs-sandbox: Add option to support multiple
ovn-controllers.") allowed creating multiple instances of
ovn-controller. However, all instances would use the same log file
name. This commit uses the sequence number to name the log file.
Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Eelco Chaudron [Mon, 25 Jun 2018 10:58:05 +0000 (12:58 +0200)]
ofproto: Add CLI commands to show and clear mac_learning statistics
Add two new commands, fdb/stats-show and fdb/stats-clear, to
ovs-appctl to show and clear the new mac_learning statistics.
$ ovs-appctl fdb/stats-show ovs_pvp_br0
Statistics for bridge "ovs_pvp_br0":
Current/maximum MAC entries in the table: 4/2048
Total number of learned MAC entries : 4
Total number of expired MAC entries : 1
Total number of evicted MAC entries : 0
Total number of port moved MAC entries : 32
Eelco Chaudron [Mon, 25 Jun 2018 10:57:40 +0000 (12:57 +0200)]
mac-learning: Add per mac learning instance counters
This patch adds counters per mac_learning instance.
The following counters are added:
total_learned: Total number of learned MAC entries
total_expired: Total number of expired MAC entries
total_evicted: Total number of evicted MAC entries, i.e. entries moved
out due to the table being full.
total_moved : Total number of port moved MAC entries, i.e. entries
where the MAC address moved to a different port.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
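To make the meaning of the four counters above concrete, here is a toy model of a learning table that maintains them. The two-entry table, the "evict slot 0" policy and all names except the counter names are assumptions of this sketch, not the OVS mac_learning implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TABLE_MAX 2   /* Tiny limit so that eviction is easy to trigger. */

struct entry {
    uint64_t mac;
    int port;
    bool used;
};

struct mac_table {
    struct entry entries[TABLE_MAX];
    uint64_t total_learned, total_expired, total_evicted, total_moved;
};

static void
mac_table_init(struct mac_table *t)
{
    memset(t, 0, sizeof *t);
}

static struct entry *
lookup(struct mac_table *t, uint64_t mac)
{
    for (int i = 0; i < TABLE_MAX; i++) {
        if (t->entries[i].used && t->entries[i].mac == mac) {
            return &t->entries[i];
        }
    }
    return NULL;
}

/* Learns 'mac' on 'port', updating the counters as described above. */
static void
learn(struct mac_table *t, uint64_t mac, int port)
{
    assert(t != NULL);
    struct entry *e = lookup(t, mac);
    if (e) {
        if (e->port != port) {
            e->port = port;
            t->total_moved++;       /* Known MAC seen on a different port. */
        }
        return;
    }
    for (int i = 0; i < TABLE_MAX && !e; i++) {
        if (!t->entries[i].used) {
            e = &t->entries[i];
        }
    }
    if (!e) {
        e = &t->entries[0];         /* Table full: evict slot 0 (assumption). */
        t->total_evicted++;
    }
    e->mac = mac;
    e->port = port;
    e->used = true;
    t->total_learned++;
}

/* Removes 'mac' because its entry aged out. */
static void
expire(struct mac_table *t, uint64_t mac)
{
    struct entry *e = lookup(t, mac);
    if (e) {
        e->used = false;
        t->total_expired++;
    }
}
```

Note that in this model an eviction is always followed by a learn, so total_learned counts the replacing entry as well.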
This patch adds two additional mac-learning coverage counters:
- mac_learning_evicted, entries deleted due to mac table being full
- mac_learning_moved, entries where the port has changed.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Justin Pettit [Fri, 29 Jun 2018 19:02:28 +0000 (12:02 -0700)]
ovs-ofctl: Prefer "del-meters" and "dump-meters".
Previously to delete or dump the meter table, separate commands had to
be used depending on whether one wanted to operate on a single or all
meters. This change makes it so that the "meter" argument is always
optional regardless of the command. This is a bit more consistent with
other OVS commands and makes it easier when experimenting to not have to
distinguish between the two cases.
This also fixes an error in the ovs-ofctl man page that showed the plural
version of the command supported an optional "meter" argument.
"del-meter" and "dump-meter" can still be used, but their use is no
longer documented.
Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Tue, 26 Jun 2018 21:06:21 +0000 (14:06 -0700)]
DNS: Add basic support for asynchronous DNS resolving
This patch is a simple implementation for the proposal discussed in
https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337038.html and
https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/340013.html.
It enables ovs-vswitchd and other utilities to use DNS names when specifying
OpenFlow and OVSDB remotes.
Below are some of the features and limitations of this patch:
- Resolving is asynchronous in daemon context, avoiding blocking main loop;
- Resolving is synchronous in general utility context;
- Both IPv4 and IPv6 are supported;
- The resolving API is thread-safe;
- Depends on the unbound library;
- When multiple IP addresses are returned, only the first one is used;
- /etc/nsswitch.conf isn't respected as unbound library doesn't look at it;
- For async-resolving, caller needs to retry later; there is no callback.
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Fri, 6 Jul 2018 15:28:27 +0000 (08:28 -0700)]
dpif-netlink-rtnl: Retry smaller MTU when default MAX_MTU is too large.
When MAX_MTU is larger than the maximum MTU supported by the hardware,
dpif_netlink_rtnl_create will fail. This leads to
the failure of test '11: datapath - ping over gre tunnel'
in 'make check-kmod'.
This patch fixes this issue by retrying with a smaller MTU
when MAX_MTU is too large.
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
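The retry idea in the dpif-netlink-rtnl commit above can be sketched generically. The callback type `create_fn`, the helper names and the choice to retry exactly once on any error are assumptions made for illustration; the commit message does not say which fallback value or error condition the real code uses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical creation callback: returns 0 or a positive errno value. */
typedef int create_fn(int mtu, void *aux);

/* Tries 'create' with 'max_mtu' first and, only if that fails, once more
 * with 'fallback_mtu'.  On success returns 0 and stores the MTU that worked
 * in '*mtu_used'; otherwise returns the error of the last attempt. */
static int
create_with_mtu_fallback(create_fn *create, void *aux,
                         int max_mtu, int fallback_mtu, int *mtu_used)
{
    assert(create != NULL && mtu_used != NULL);

    int error = create(max_mtu, aux);
    if (!error) {
        *mtu_used = max_mtu;
        return 0;
    }
    error = create(fallback_mtu, aux);
    if (!error) {
        *mtu_used = fallback_mtu;
    }
    return error;
}

/* Toy device for the usage example: rejects any MTU above the hardware
 * limit that 'aux' points to. */
static int
fake_create(int mtu, void *aux)
{
    const int *hw_max = aux;
    return mtu > *hw_max ? EINVAL : 0;
}
```

With a hardware limit of 9000, a first attempt at 65535 fails and the call succeeds with the fallback of 1500; if even the fallback is too large, the error is passed on to the caller.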
OVN: add IPv6 address unreachable support to OVN router ports
Add priority-70 flows to generate ICMPv6 address unreachable messages
in reply to IPv6 packets directed to the router's IP address on IP
protocols other than UDP, TCP, and ICMP.
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
OVN: add IPv6 UDP port unreachable support to OVN logical router
Add a priority-80 flow to generate ICMPv6 port unreachable messages in
reply to IPv6 UDP datagrams directed to the router's IP address, since the
logical router doesn't accept any UDP traffic.
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
ofproto-dpif-xlate: Fix packet_in reason for Table-miss rule
Currently in OvS, if we hit a "Table-miss" rule (associated with a Controller
action), we send a PACKET_IN message to the controller with reason
OFPR_NO_MATCH.
A "Table-miss" rule is a catch-all rule whose priority is 0.
But if we hit the same "Table-miss" rule after executing a group entry, we
send the reason as OFPR_ACTION (for OF1.3 and below) or OFPR_GROUP
(for OF1.4 and above).
This is because once we execute a group entry we set ctx->in_group, and later,
when we hit the "Table-miss" rule, since ctx->in_group is set, we send the
reason as OFPR_ACTION (for OF1.3) or OFPR_GROUP (for OF1.4 and above).
For example, for the following pipeline, we will send the reason as OFPR_ACTION
even if we hit the "Table-miss" rule.
Ian Stokes [Wed, 27 Jun 2018 13:58:31 +0000 (14:58 +0100)]
dpdk: Support both shared and per port mempools.
This commit re-introduces the concept of shared mempools as the default
memory model for DPDK devices. Per port mempools are still available but
must be enabled explicitly by a user.
OVS previously used a shared mempool model for ports with the same MTU
and socket configuration. This was replaced by a per port mempool model
to address issues flagged by users such as:
However, the per port model potentially requires an increase in memory
resource requirements to support the same number of ports and configuration
as the shared mempool model.
This is considered a blocking factor for current deployments of OVS
when upgrading to future OVS releases as a user may have to redimension
memory for the same deployment configuration. This may not be possible for
users.
This commit resolves the issue by re-introducing shared mempools as
the default memory behaviour in OVS DPDK but also refactors the memory
configuration code to allow for per port mempools.
This patch adds a new global config option, per-port-memory, that
controls the enablement of per port mempools for DPDK devices.
ovs-vsctl set Open_vSwitch . other_config:per-port-memory=true
This value defaults to false; to enable per port memory support,
this field should be set to true when setting other global parameters
on init (such as "dpdk-socket-mem", for example). Changing the value at
runtime is not supported, and requires restarting the vswitch
daemon.
The mempool sweep functionality is also replaced with the
sweep functionality from OVS 2.9 found in commits
c77f692 (netdev-dpdk: Free mempool only when no in-use mbufs.)
a7fb0a4 (netdev-dpdk: Add mempool reuse/free debug.)
A new document to discuss the specifics of the memory models and example
memory requirement calculations is also added.
Signed-off-by: Ian Stokes <ian.stokes@intel.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Tiago Lam <tiago.lam@intel.com> Tested-by: Tiago Lam <tiago.lam@intel.com>
Yuanhan Liu [Mon, 25 Jun 2018 13:21:08 +0000 (16:21 +0300)]
dpif-netdev: do hw flow offload in a thread
Currently, the major trigger for hw flow offload is at upcall handling,
which is actually in the datapath. Moreover, the hw offload installation
and modification is not that lightweight. This means that if many
flows are added or modified frequently, it could stall the datapath,
which could result in packet loss.
To diminish that, all those flow operations are recorded and appended
to a list. A thread is then introduced to process this list (to do the
real flow offloading put/del operations). This keeps the datapath
as lightweight as possible.
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org> Co-authored-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
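The "record now, process later" split described in the offload-thread commit above can be sketched with a plain FIFO list. To keep the example short and self-contained, the thread itself and the locking that a real producer/consumer pair would need are deliberately left out; `offload_queue_drain()` is what a dedicated thread would call in a loop. All names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

enum offload_op { OFFLOAD_PUT, OFFLOAD_DEL };

struct offload_item {
    enum offload_op op;
    unsigned int flow_id;           /* Hypothetical flow handle. */
    struct offload_item *next;
};

struct offload_queue {
    struct offload_item *head, *tail;
};

/* Datapath side: records the operation in O(1) and returns immediately.
 * Returns 0 on success, -1 if memory is exhausted. */
static int
offload_queue_append(struct offload_queue *q, enum offload_op op,
                     unsigned int flow_id)
{
    assert(q != NULL);
    struct offload_item *item = malloc(sizeof *item);
    if (!item) {
        return -1;
    }
    item->op = op;
    item->flow_id = flow_id;
    item->next = NULL;
    if (q->tail) {
        q->tail->next = item;
    } else {
        q->head = item;
    }
    q->tail = item;
    return 0;
}

/* Offload side: detaches the whole list and applies the items in FIFO
 * order.  'apply' stands for the slow hardware put/del call.  Returns the
 * number of items processed. */
static size_t
offload_queue_drain(struct offload_queue *q,
                    void (*apply)(const struct offload_item *, void *),
                    void *aux)
{
    struct offload_item *item = q->head;
    size_t n = 0;

    q->head = q->tail = NULL;
    while (item) {
        struct offload_item *next = item->next;
        apply(item, aux);
        free(item);
        item = next;
        n++;
    }
    return n;
}

/* Example 'apply' callback: tracks the net number of installed flows. */
static void
count_installed(const struct offload_item *item, void *aux)
{
    int *installed = aux;
    *installed += item->op == OFFLOAD_PUT ? 1 : -1;
}
```

Preserving FIFO order matters here because a DEL for a flow must not overtake the PUT that installed it.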
Finn Christensen [Mon, 25 Jun 2018 13:21:06 +0000 (16:21 +0300)]
netdev-dpdk: implement flow offload with rte flow
The basic yet major part of this patch is to translate the "match"
to rte flow patterns. Then we create an rte flow with MARK + RSS
actions. Afterwards, all packets that match the flow will have the mark id in
the mbuf.
RSS is needed because, for most NICs, a MARK-only action is not
allowed. It has to be used together with some other action, such as
QUEUE, RSS, etc. However, the QUEUE action can specify one queue only, which
may break RSS. The RSS action is likely the best we can do for
now. Thus, the RSS action is chosen.
For any unsupported flows, such as MPLS, -1 is returned, meaning the
flow offload has failed and is skipped.
Co-authored-by: Yuanhan Liu <yliu@fridaylinux.org> Signed-off-by: Finn Christensen <fc@napatech.com> Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org> Co-authored-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Yuanhan Liu [Mon, 25 Jun 2018 13:21:05 +0000 (16:21 +0300)]
dpif-netdev: retrieve flow directly from the flow mark
This lets us skip some very costly CPU operations, including but
not limited to miniflow_extract, emc lookup, dpcls lookup, etc. Thus,
performance could be greatly improved.
PHY-PHY forwarding with 1000 mega flows (udp,tp_src=1000-1999) and
1 million streams (tp_src=1000-1999, tp_dst=2000-2999) shows more than a
260% performance boost.
Note that though the heavy miniflow_extract is skipped, we still have
to do per packet checking, because we have to check the tcp_flags.
Co-authored-by: Finn Christensen <fc@napatech.com> Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org> Signed-off-by: Finn Christensen <fc@napatech.com> Co-authored-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Yuanhan Liu [Mon, 25 Jun 2018 13:21:03 +0000 (16:21 +0300)]
dpif-netdev: associate flow with a mark id
Most modern NICs have the ability to bind a flow with a mark, so that
every packet that matches such a flow will have that mark present in its
descriptor.
The basic idea is that, when we receive packets later, we can
directly get the flow from the mark. That avoids some very costly
CPU operations, including (but not limited to) miniflow_extract, emc
lookup, dpcls lookup, etc. Thus, performance could be greatly improved.
Thus, the major work of this patch is to associate a flow with a mark
id (a uint32_t number). The association in the netdev datapath is done
by CMAP, while in hardware it's done by the rte_flow MARK action.
One tricky thing in OVS-DPDK is that the flow tables are per-PMD. For the
case where there is only one phys port but with 2 queues, there could be 2
PMDs. In other words, even for a single mega flow (e.g. udp,tp_src=1000),
there could be 2 different dp_netdev flows, one for each PMD. That could
result in the same mega flow being offloaded twice in the hardware;
worse, we may get 2 different marks and only the last one will work.
To avoid that, a megaflow_to_mark CMAP is created. An entry will be
added for the first PMD that wants to offload a flow. Later PMDs
will see that such a megaflow is already offloaded, so the flow will not
be offloaded to HW twice.
Meanwhile, the mark to flow mapping becomes a 1:N mapping. That is
what the mark_to_flow CMAP is for. When the first PMD wants to offload
a flow, it allocates a new mark and performs the flow offload by reusing
the ->flow_put method. When it succeeds, a "mark to flow" entry will be
added. Later PMDs will get the corresponding mark from the above
megaflow_to_mark CMAP. Then, another "mark to flow" entry will be added.
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org> Co-authored-by: Finn Christensen <fc@napatech.com> Signed-off-by: Finn Christensen <fc@napatech.com> Co-authored-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
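The two mappings described in the mark-id commit above can be modelled with plain arrays. This is a sketch under stated assumptions: fixed-size arrays replace the CMAPs, megaflows and per-PMD flows are reduced to integer ids, and `hw_installs` merely counts where the real code would invoke ->flow_put. None of the names below except megaflow_to_mark and mark_to_flow come from the source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_MEGAFLOWS 8
#define MAX_FLOWS 16
#define INVALID_MARK UINT32_MAX

struct mf_entry { uint64_t megaflow; uint32_t mark; };
struct flow_entry { uint32_t mark; int pmd_flow; };

struct offload_state {
    struct mf_entry megaflow_to_mark[MAX_MEGAFLOWS];   /* 1:1 */
    int n_megaflows;
    struct flow_entry mark_to_flow[MAX_FLOWS];         /* 1:N */
    int n_flows;
    uint32_t next_mark;
    int hw_installs;    /* Times the hardware would have been programmed. */
};

static void
offload_state_init(struct offload_state *s)
{
    memset(s, 0, sizeof *s);
}

/* Associates per-PMD flow 'pmd_flow' of 'megaflow' with a mark.  Only the
 * first PMD to ask triggers a hardware install; later PMDs reuse the mark.
 * Returns the mark, or INVALID_MARK if a table is full. */
static uint32_t
offload_flow(struct offload_state *s, uint64_t megaflow, int pmd_flow)
{
    uint32_t mark = INVALID_MARK;

    assert(s != NULL);
    if (s->n_flows == MAX_FLOWS) {
        return INVALID_MARK;
    }
    for (int i = 0; i < s->n_megaflows; i++) {
        if (s->megaflow_to_mark[i].megaflow == megaflow) {
            mark = s->megaflow_to_mark[i].mark;
        }
    }
    if (mark == INVALID_MARK) {
        if (s->n_megaflows == MAX_MEGAFLOWS) {
            return INVALID_MARK;
        }
        mark = s->next_mark++;
        s->hw_installs++;
        s->megaflow_to_mark[s->n_megaflows].megaflow = megaflow;
        s->megaflow_to_mark[s->n_megaflows].mark = mark;
        s->n_megaflows++;
    }
    s->mark_to_flow[s->n_flows].mark = mark;
    s->mark_to_flow[s->n_flows].pmd_flow = pmd_flow;
    s->n_flows++;
    return mark;
}

/* Returns how many per-PMD flows share 'mark'. */
static int
flows_for_mark(const struct offload_state *s, uint32_t mark)
{
    int n = 0;
    for (int i = 0; i < s->n_flows; i++) {
        n += s->mark_to_flow[i].mark == mark;
    }
    return n;
}
```

Two PMDs offloading the same megaflow end up with one mark, one hardware install and two mark_to_flow entries.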
OVN: add ICMPv6 time exceeded support to OVN logical router
Using the icmp6 action, send an ICMPv6 time exceeded frame whenever
an OVN logical router receives an IPv6 packet whose TTL has
expired (ip.ttl == {0, 1}).
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Thu, 21 Jun 2018 22:53:53 +0000 (15:53 -0700)]
ofproto-dpif: Let the dpif report when a port is a duplicate.
The port_add() function checks whether the port about to be added to the
dpif is already present and adds it only if it is not. This duplicates a
check also present (and necessary) in each dpif and races with it as well.
When a dpif has a large number of ports, the check can be expensive (it is
not efficiently implemented). It would be nice to make the check cheaper,
but it also seems reasonable to do as done in this patch and just let the
dpif report the duplication.
Reported-by: Haifeng Lin <haifeng.lin@huawei.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
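The design choice in the ofproto-dpif commit above, dropping the caller's own "does it exist?" query and relying on the lower layer's error instead, can be sketched as follows. The toy `struct dpif`, both functions and the decision to map EEXIST to success are assumptions of this sketch and may differ from what port_add() actually does with the error.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAX_PORTS 4

/* Toy datapath that just remembers port names. */
struct dpif {
    const char *ports[MAX_PORTS];
    int n_ports;
};

/* Lower layer: performs the (necessary) duplicate check itself and reports
 * a duplicate as EEXIST. */
static int
dpif_port_add(struct dpif *d, const char *name)
{
    assert(d != NULL && name != NULL);
    for (int i = 0; i < d->n_ports; i++) {
        if (!strcmp(d->ports[i], name)) {
            return EEXIST;
        }
    }
    if (d->n_ports == MAX_PORTS) {
        return ENOSPC;
    }
    d->ports[d->n_ports++] = name;
    return 0;
}

/* Upper layer: no separate existence query, so there is one scan instead of
 * two and no window between "check" and "add".  Treating EEXIST as success
 * is this sketch's choice. */
static int
port_add(struct dpif *d, const char *name)
{
    int error = dpif_port_add(d, name);
    return error == EEXIST ? 0 : error;
}
```

Because the only check happens inside the operation that also performs the insert, the check-then-act race mentioned in the commit message cannot occur in this model.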
A bisect shows that commit d22f892 ("netdev-linux: monitor and offload
LAG slaves to TC") introduced netdev_linux_update_lag(), which is now
triggering a crash in the "datapath - ping over bond" test in
system-userspace-testsuite:
(gdb) bt
#0 0x00000000009762e7 in netdev_linux_update_lag (change=0x7ffdff013750) at lib/netdev-linux.c:728
728 if (is_netdev_linux_class(master_netdev->netdev_class)) {
This fixes the crash by simply returning in case netdev_from_name()
returns NULL, as this should indicate the master is not attached to the
bridge.
Additionally, netdev_linux_update_lag() isn't "clearing" the netdev
reference it gets from netdev_from_name(), meaning its ref_cnt is
incremented but never decremented. Thus, also call netdev_close() before
returning.
CC: John Hurley <john.hurley@netronome.com> Fixes: d22f8927 ("netdev-linux: monitor and offload LAG slaves to TC") Signed-off-by: Tiago Lam <tiago.lam@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
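Both halves of the fix described above, returning early when the lookup yields NULL and releasing the reference otherwise, fit in a short sketch. `struct netdev` here is a hypothetical two-field model, and `dev_from_name()`, `dev_close()` and `update_lag()` only imitate the calling convention of netdev_from_name() and netdev_close() as the commit message describes it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-reduced device record. */
struct netdev {
    const char *name;
    int ref_cnt;
};

/* Lookup in the style of netdev_from_name(): returns NULL if 'name' is
 * unknown; otherwise takes a reference that the caller must release. */
static struct netdev *
dev_from_name(struct netdev *devs, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(devs[i].name, name)) {
            devs[i].ref_cnt++;
            return &devs[i];
        }
    }
    return NULL;
}

/* Releases one reference; a NULL argument is ignored. */
static void
dev_close(struct netdev *dev)
{
    if (dev) {
        assert(dev->ref_cnt > 0);
        dev->ref_cnt--;
    }
}

/* Returns 1 if the master was found and handled, 0 if it was skipped. */
static int
update_lag(struct netdev *devs, size_t n, const char *master_name)
{
    struct netdev *master = dev_from_name(devs, n, master_name);
    if (!master) {
        return 0;           /* Master not attached: nothing to do. */
    }
    /* ... 'master' would be inspected here ... */
    dev_close(master);      /* Balance the reference taken by the lookup. */
    return 1;
}
```

After any number of calls the reference count is back at its starting value, which is what allows the device to be destroyed later.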