William Tu [Fri, 13 May 2016 18:58:43 +0000 (11:58 -0700)]
ovn-controller: Fix errors reported by Valgrind.
Fix two errors reported by test 2026: ovn -- 3 HVs, 1 LS, 3 lports/HV.
1. Conditional jump or move depends on uninitialised value(s)
physical_run (physical.c:366)
main (ovn-controller.c:382)
2. Use of uninitialised value of size 8
bitmap_set1 (bitmap.h:97)
update_ct_zones (binding.c:115)
binding_run (binding.c:228)
main (ovn-controller.c:362)
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Sat, 23 Apr 2016 00:45:03 +0000 (17:45 -0700)]
ofproto-dpif-xlate: Always generate wildcards.
Until now, the flow translation code has tried to avoid constructing a
set of wildcards during translation in the cases where it can, because
wildcards are large and somewhat expensive. However, this has problems
that we hadn't previously realized. Specifically, the generated actions
can depend on the constructed wildcards, to decide which bits of a field
need to be set in a masked set_field action. This means that in practice
translation needs to always construct the wildcards.
(It might be possible to avoid masked set_field when we're not constructing
wildcards, but this would mean that we'd generate different actions
depending on whether wildcards were being constructed, which seems rather
confusing at best. Also, the cases in which we don't need wildcards anyway
are fairly obscure, meaning that the benefits of avoiding them in those
cases are minimal and that it's going to be hard to get test coverage. The
latter is probably why we didn't notice this until now.)
Reported-by: William Tu <u9012063@gmail.com>
Reported-at: http://openvswitch.org/pipermail/dev/2016-April/069219.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Tested-by: William Tu <u9012063@gmail.com>
Joe Stringer [Tue, 10 May 2016 22:50:42 +0000 (15:50 -0700)]
netdev-dpdk: Fix locking during get_stats.
Clang complains:
lib/netdev-dpdk.c:1860:1: error: mutex 'dev->mutex' is not locked on every path
through here [-Werror,-Wthread-safety-analysis]
}
^
lib/netdev-dpdk.c:1815:5: note: mutex acquired here
ovs_mutex_lock(&dev->mutex);
^
./include/openvswitch/thread.h:60:9: note: expanded from macro 'ovs_mutex_lock'
ovs_mutex_lock_at(mutex, OVS_SOURCE_LOCATOR)
^
Fixes: d6e3feb57c44 ("Add support for extended netdev statistics based on RFC 2819.")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Joe Stringer [Tue, 10 May 2016 22:42:01 +0000 (15:42 -0700)]
ofproto-dpif-upcall: Pass key to dpif_flow_get().
Windows datapath folks have reported instances where OVS userspace will
pass down a flow_get request to the datapath using a UFID even though the
datapath has no support for UFIDs. Since commit e672ff9b4d22
("ofproto-dpif: Restore metadata and registers on recirculation."), if a
flow dump provides a flow that userspace isn't aware of, and the flow
dump doesn't provide actions for that flow, then userspace will attempt
a flow_get using just the UFID. This is because the ofproto-dpif layer
doesn't pass the key down to the dpif layer even if it's available.
Prior to the above commit, the codepath was only hit if the key was not
available, which would have implied UFID support. This assumption is now
broken: An empty set of actions could also trigger flow_get, and
datapaths without UFID support are free to pass up empty actions lists.
Pass down the flow key if available, and don't pass down the UFID if
unavailable to be more consistent with the usage of other dpif APIs
within this file.
Fixes: e672ff9b4d22 ("ofproto-dpif: Restore metadata and registers on recirculation.")
Reported-by: Sairam Venugopal <vsairam@vmware.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
Darrell Ball [Sat, 7 May 2016 16:21:21 +0000 (09:21 -0700)]
vtep: Add source node replication support.
This patch updates the vtep schema, vtep-ctl commands and vtep simulator
to support source node replication in addition to service node
replication per logical switch. The default replication mode is service
node as that was the only mode previously supported. Source node
replication mode is optionally configurable and clearing the replication
mode implicitly sets the replication mode back to a default of service
node.
László Sürü [Wed, 11 May 2016 08:46:33 +0000 (08:46 +0000)]
ofproto-dpif-xlate: fix for group liveness propagation
According to OpenFlow v1.3.5 specification a group is considered live,
if it has at least one live bucket in it. (6.5 Group Table
Modification Messages: "A group is considered live if a least one of
its buckets is live.")
However, the OVS implementation incorrectly reports a group as live when
no live bucket is found in the group_is_alive() function of
ofproto-dpif-xlate.c.
Instead, it should return true only if a live bucket is found (that is,
the result is != NULL).
Signed-off-by: László Sűrű <laszlo.suru@ericsson.com>
Co-authored-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
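The liveness rule above can be sketched as follows. This is only an
illustration: struct bucket, first_live_bucket() and group_is_live() are
hypothetical names, not the types or functions in ofproto-dpif-xlate.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified bucket; the real OVS types differ. */
struct bucket {
    bool live;
};

/* Returns the first live bucket, or NULL if no bucket is live. */
const struct bucket *
first_live_bucket(const struct bucket *buckets, size_t n)
{
    assert(buckets != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (buckets[i].live) {
            return &buckets[i];
        }
    }
    return NULL;
}

/* A group is live only if a live bucket was found (result != NULL). */
bool
group_is_live(const struct bucket *buckets, size_t n)
{
    return first_live_bucket(buckets, n) != NULL;
}
```

A group with no buckets, or with only dead buckets, is reported as not
live under this rule.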
Justin Pettit [Wed, 4 May 2016 01:20:51 +0000 (18:20 -0700)]
util: Pass 128-bit arguments directly instead of using pointers.
Commit f2d105b5 (ofproto-dpif-xlate: xlate ct_{mark, label} correctly.)
introduced the ovs_u128_and() function. It directly takes ovs_u128
values as arguments instead of pointers to them. As this is a more
direct way to deal with 128-bit values, modify the other utility
functions to do the same.
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
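A minimal sketch of the by-value style described above. The type u128_t,
its field names, and the helpers u128_and(), u128_equals() and
u128_example() are assumptions made for illustration; they are not the
OVS definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit value; field names are assumptions. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} u128_t;

/* Takes and returns 128-bit values directly; no pointers involved. */
u128_t
u128_and(u128_t a, u128_t b)
{
    u128_t dst = { a.hi & b.hi, a.lo & b.lo };
    return dst;
}

/* Compares two 128-bit values passed by value. */
bool
u128_equals(u128_t a, u128_t b)
{
    return a.hi == b.hi && a.lo == b.lo;
}

/* Usage example: calls compose without temporaries or address-of. */
void
u128_example(void)
{
    u128_t value = { 0xff00, 0x0ff0 };
    u128_t mask = { 0x0f00, 0x00f0 };
    u128_t expect = { 0x0f00, 0x00f0 };

    assert(u128_equals(u128_and(value, mask), expect));
}
```

Passing by value lets the result of one helper feed straight into
another, which is the convenience the commit is after.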
Joe Stringer [Thu, 5 May 2016 01:01:06 +0000 (18:01 -0700)]
system-traffic: Wait for availability of ftpd.
Some FTP tests had intermittent failures because the FTP daemons
might not load before the testsuite script iterated to running the
client. Add checks after launching FTP daemons to make these tests more
resilient.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
Joe Stringer [Thu, 5 May 2016 01:01:05 +0000 (18:01 -0700)]
system-traffic: Wait for IPv6 connectivity.
Several of the tests have race conditions where the next step in the
test may run before the kernel actually provides IPv6 connectivity.
This causes intermittent testsuite failures. Some existing tests
would even sleep in an attempt to mitigate this issue.
Improve the resilience of these tests by waiting until IPv6 or FTP
connectivity is ready. This speeds the testsuite up by a couple of
percent.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
Joe Stringer [Thu, 5 May 2016 01:01:03 +0000 (18:01 -0700)]
system-traffic: Drop auto ct helpers in namespaces.
Automatic helper assignment in conntrack can trigger an upstream bug
where namespace deletion followed by immediate unload of conntrack
helper modules may cause kernel crashes. Disable automatic helper
assignment within created namespaces to avoid this issue.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
The neighbor entry expiry is only checked in the dpif-poll
event handler, but in the absence of any event we could keep
using an ARP entry forever. This patch changes it to check
expiration on each lookup.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
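The idea of checking expiration on each lookup can be sketched as below.
struct neigh_entry and neigh_lookup() are hypothetical and much simpler
than the real neighbor cache; the current time is passed in explicitly so
that the behavior is easy to follow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified neighbor entry; not the real OVS structure. */
struct neigh_entry {
    uint32_t ip;            /* Key: IPv4 address. */
    long long expires_ms;   /* Absolute expiry time, in milliseconds. */
};

/* Returns the entry for 'ip', or NULL if it is absent or already expired.
 * The expiry test runs on every lookup, so a stale entry is never handed
 * out even if no periodic handler has run. */
const struct neigh_entry *
neigh_lookup(const struct neigh_entry *table, size_t n, uint32_t ip,
             long long now_ms)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (table[i].ip == ip) {
            return now_ms < table[i].expires_ms ? &table[i] : NULL;
        }
    }
    return NULL;
}
```

With this shape, an entry that expires at time 1000 is returned at time
999 and rejected from time 1000 onward.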
Ben Pfaff [Sat, 23 Apr 2016 00:03:22 +0000 (17:03 -0700)]
netdev: Fix potential deadlock.
Until now, netdev_class_mutex and route_table_mutex could be taken in
either order:
* netdev_run() takes netdev_class_mutex, then netdev_vport_run() calls
route_table_run(), which takes route_table_mutex.
* route_table_init() takes route_table_mutex and then eventually calls
netdev_open(), which takes netdev_class_mutex.
This commit fixes the problem by converting the netdev_classes hmap,
protected by netdev_class_mutex, into a cmap protected on the read
side by RCU. Only a very small amount of code actually writes to the
cmap in question, so it's a lot easier to understand the locking rules
at that point. In particular, there's no need to take netdev_class_mutex
from either netdev_run() or netdev_open(), so neither of the code paths
above determines a lock ordering any longer.
Reported-by: William Tu <u9012063@gmail.com>
Reported-at: http://openvswitch.org/pipermail/discuss/2016-February/020216.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Tested-by: William Tu <u9012063@gmail.com>
Ben Pfaff [Fri, 22 Apr 2016 23:51:03 +0000 (16:51 -0700)]
cmap: New macro CMAP_INITIALIZER, for initializing an empty cmap.
Sometimes code is much simpler if we can statically initialize data
structures. Until now, this has not been possible for cmap-based data
structures, so this commit introduces a CMAP_INITIALIZER macro.
This works by adding a singleton empty cmap_impl that simply forces the
first insertion into any cmap that points to it to allocate a real
cmap_impl. There could be some risk that rogue code modifies the
singleton, so for safety it is also marked 'const' to allow the linker to
put it into a read-only page.
This adds a new OVS_ALIGNED_VAR macro with GCC and MSVC implementations.
The latter is based on Microsoft webpages, so developers who know Windows
might want to scrutinize it.
As examples of the kind of simplification this can make possible, this
commit removes an initialization function from ofproto-dpif-rid.c and a
call to cmap_init() from tnl-neigh-cache.c. An upcoming commit will add
another user.
CC: Jarno Rajahalme <jarno@ovn.org>
CC: Gurucharan Shetty <guru@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
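The singleton technique described above can be illustrated with a toy
container. Everything here (struct toy_impl, struct toy_map,
TOY_MAP_INITIALIZER, toy_map_insert()) is hypothetical and is not the
cmap implementation; it only shows how a shared 'const' empty
implementation forces the first insertion to allocate a real one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy container illustrating the idea; not the real cmap. */
struct toy_impl {
    size_t n;           /* Number of values in use. */
    size_t capacity;    /* Zero for the shared empty singleton. */
    int values[];       /* Storage, allocated together with the header. */
};

/* Shared empty implementation.  Because it is 'const', the linker may put
 * it in a read-only page, so rogue writes fault instead of corrupting it. */
static const struct toy_impl empty_toy_impl = { 0, 0 };

struct toy_map {
    const struct toy_impl *impl;
};

/* Static initializer: no init function call is needed before first use. */
#define TOY_MAP_INITIALIZER { &empty_toy_impl }

void
toy_map_insert(struct toy_map *map, int value)
{
    assert(map != NULL && map->impl != NULL);
    const struct toy_impl *old = map->impl;

    if (old->n == old->capacity) {
        /* Full, or still the empty singleton (0 == 0): allocate a real,
         * larger implementation and copy the existing values into it. */
        size_t capacity = old->capacity ? old->capacity * 2 : 4;
        struct toy_impl *new_impl
            = malloc(sizeof *new_impl + capacity * sizeof new_impl->values[0]);
        if (!new_impl) {
            abort();
        }
        new_impl->n = old->n;
        new_impl->capacity = capacity;
        for (size_t i = 0; i < old->n; i++) {
            new_impl->values[i] = old->values[i];
        }
        if (old != &empty_toy_impl) {
            free((struct toy_impl *) old);
        }
        map->impl = new_impl;
    }

    /* map->impl is heap-allocated at this point, so dropping 'const' is
     * safe; the singleton itself is never written. */
    struct toy_impl *impl = (struct toy_impl *) map->impl;
    impl->values[impl->n++] = value;
}
```

A map declared as "static struct toy_map m = TOY_MAP_INITIALIZER;" is
usable immediately, which is the simplification the commit describes.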
Ben Pfaff [Thu, 21 Apr 2016 17:50:17 +0000 (10:50 -0700)]
ofproto-dpif: Do not count resubmit to later tables against limit.
Open vSwitch must ensure that flow translation takes a finite amount of
time. Until now it has implemented this by limiting the depth of
recursion. The initial limit, in version 1.0.1, was no recursion at all,
and then over the years it has increased to 8 levels, then 16, then 32,
and 64 for the last few years. Now reports are coming in that 64 levels
are inadequate for some OVN setups. The natural inclination would be to
double the limit again to 128 levels.
This commit attempts another approach. Instead of increasing the limit,
it reduces the class of resubmits that count against the limit. Since the
goal for the depth limit is to prevent an infinite amount of work, it's
not necessary to count resubmits that can't lead to infinite work. In
particular, a resubmit from a table numbered x to a table y > x cannot do
this, because any OpenFlow switch has a finite number of tables. Because
in fact a resubmit (or goto_table) from one table to a later table is the
most common form of an OpenFlow pipeline, I suspect that this will greatly
alleviate the pressure to increase the depth limit.
Reported-by: Guru Shetty <guru@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Ben Pfaff [Thu, 21 Apr 2016 17:50:16 +0000 (10:50 -0700)]
ofproto-dpif: Rename "recurse" to "indentation".
The "recurse" member of struct xlate_in and struct xlate_ctx is used for
two purposes: to determine the amount of indentation in "ofproto/trace"
output and to limit the depth of recursion. An upcoming commit will
separate these tasks, and so in preparation this commit renames "recurse"
to "indentation".
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Ben Pfaff [Sun, 8 May 2016 16:21:29 +0000 (09:21 -0700)]
ovn-nbctl: Add sanity checking for lswitch-add.
I don't think anyone really wants the painful behavior of creating multiple
logical switches with the same name to be the default. This commit retains
the possibility of doing that in case someone really wants it, but refuses
by default for sanity.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
Ben Pfaff [Sun, 8 May 2016 16:21:41 +0000 (09:21 -0700)]
ovn-nbctl: Make error handling consistent with ovs-vsctl.
ovs-vsctl distinguishes between internal database inconsistencies, which
it logs, and errors in commands specified by the user, which cause fatal
exits. ovn-nbctl wasn't as careful about this and tended to just log
everything. This commit brings it up to the same standard as ovs-vsctl.
This commit also adds --if-exists and --may-exist options in the same kinds
of places as ovs-vsctl, to allow for scripting in cases where it's OK if
an operation has already occurred.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
Ciara Loftus [Fri, 6 May 2016 10:20:34 +0000 (11:20 +0100)]
netdev-dpdk: Print default vhost-sock-dir value & update documentation
When no vhost-sock-dir value is provided, print the default location.
Update the documentation to reflect the fact that vhost-sock-dir values
are now subdirectory locations rather than full paths.
mweglicx [Thu, 5 May 2016 08:46:01 +0000 (09:46 +0100)]
Add support for extended netdev statistics based on RFC 2819.
Implementation of new statistics extension for DPDK ports:
- Add new counters definition to netdev struct and OpenFlow,
based on RFC2819.
- Initialize netdev statistics as "filtered out"
before passing it to particular netdev implementation
(because of that change, statistics which are not
collected are reported as filtered out, and some
unit tests were modified in this respect).
- New statistics are retrieved using experimenter code and
are printed as a result to ofctl dump-ports.
- New counters are available for OpenFlow 1.4+.
- Add new vendor id: INTEL_VENDOR_ID.
- New statistics are printed to output via ofctl only if those
are present in reply message.
- Add new file header: include/openflow/intel-ext.h which
contains new statistics definition.
- Extended statistics are implemented only for dpdk-physical
and dpdk-vhost port types.
- Dpdk-physical implementation uses xstats to collect statistics.
- Dpdk-vhost implements only part of statistics (RX packet-size-based
counters).
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
[blp@ovn.org made software devices more consistent]
Signed-off-by: Ben Pfaff <blp@ovn.org>
RYAN D. MOATS [Fri, 22 Apr 2016 21:35:37 +0000 (16:35 -0500)]
Add change tracking documentation
Change tracking is a bit different from what someone with
"classic" database experience might expect, so let's add
the knowledge gained from the experience of making change
tracking work for incremental processing.
Signed-off-by: RYAN D. MOATS <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
After removing a flow from the dpcls classifier there might still be
readers who have access to the flow, until the next grace period.
Setting flow->cr.mask to NULL can cause concurrent readers to crash,
so this commit avoids doing it.
The crash can be reproduced, for example, by invoking an operation
that causes datapath flows to be deleted (such as `ovs-appctl
upcall/enable-megaflows`) while traffic is running.
I think the assignment was intended just as a safety measure to catch
race conditions, and it should be safe to remove.
Here's a stack trace of a possible crash:
Program terminated with signal SIGSEGV, Segmentation fault.
rule=0x7f3ae8006190) at ../lib/dpif-netdev.c:4156
4156 if (OVS_UNLIKELY((value & *maskp++) != *keyp++)) {
(gdb) bt
rule=0x7f3ae8006190) at ../lib/dpif-netdev.c:4156
rules=0x7f3afa3f2e40, cnt=<optimized out>) at ../lib/dpif-netdev.c:4225
(pmd=pmd@entry=0x7f3afa3fc010, packets=packets@entry=0x7f3afa3fa420,
cnt=cnt@entry=32, keys=keys@entry=0x7f3afa3f6428,
batches=batches@entry=0x7f3afa3f4118,
n_batches=n_batches@entry=0x7f3afa3fa3b0)
at ../lib/dpif-netdev.c:3483
(pmd=pmd@entry=0x7f3afa3fc010, packets=packets@entry=0x7f3afa3fa420,
cnt=<optimized out>, md_is_valid=md_is_valid@entry=false,
port_no=<optimized out>) at ../lib/dpif-netdev.c:3625
cnt=<optimized out>, packets=0x7f3afa3fa420, pmd=0x7f3afa3fc010) at
../lib/dpif-netdev.c:3642
rxq=<optimized out>, port=<optimized out>, port=<optimized out>) at
../lib/dpif-netdev.c:2574
../lib/dpif-netdev.c:2693
../lib/ovs-thread.c:340
pthread_create.c:312
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Steve Ruan [Tue, 3 May 2016 12:06:50 +0000 (07:06 -0500)]
ovn-northd: Add support for static_routes.
Logical patch ports are used to connect logical routers
together. Static routes are used to select between different logical router
ports when exiting a logical router.
Joe Stringer [Tue, 3 May 2016 22:44:15 +0000 (15:44 -0700)]
check-kmod: Remove all OVS modules in this target.
The make check-kmod target would previously attempt to only remove the
openvswitch module, which would fail if any vport modules were loaded.
Remove those modules too, to allow the target to proceed.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
Jarno Rajahalme [Wed, 4 May 2016 20:00:06 +0000 (13:00 -0700)]
classifier: Remove rare optimization case.
This optimization applied when a staged lookup index would narrow down
to a single rule, which happens sometimes in simple test cases, but
presumably less often in more populated flow tables. The result of
this optimization allowed a bit more general megaflows, but the bit
patterns produced were sometimes cryptic. Finally, a later fix to a
more important performance problem does not allow for this
optimization any more, so remove it now.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Joe Stringer [Mon, 2 May 2016 18:19:17 +0000 (11:19 -0700)]
datapath: Fix template leak in error cases.
Upstream commit:
openvswitch: Fix template leak in error cases.
Commit 2f3ab9f9fc23 ("openvswitch: Fix helper reference leak") fixed a
reference leak on helper objects, but inadvertently introduced a leak on
the ct template.
Previously, ct_info.ct->general.use was initialized to 0 by
nf_ct_tmpl_alloc() and only incremented when ovs_ct_copy_action()
returned successful. If an error occurred while adding the helper or
adding the action to the actions buffer, the __ovs_ct_free_action()
cleanup would use nf_ct_put() to free the entry; however, this relies on
atomic_dec_and_test(ct_info.ct->general.use). This reference must be
incremented first, or nf_ct_put() will never free it.
Fix the issue by acquiring a reference to the template immediately after
allocation.
Fixes: cae3a2627520 ("openvswitch: Allow attaching helpers to ct action")
Fixes: 2f3ab9f9fc23 ("openvswitch: Fix helper reference leak")
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: 90c7afc96cbb ("openvswitch: Fix template leak in error cases.")
Fixes: 11251c170d92 ("datapath: Allow attaching helpers to ct action")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
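The reference-count problem described in this commit can be shown with a
toy object. struct tmpl, tmpl_alloc(), tmpl_get(), tmpl_put() and n_freed
are hypothetical, and a plain int stands in for the kernel's atomic
counter; the sketch only demonstrates why a put on a count of 0 never
frees.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object with a use count; not the kernel's nf_conn. */
struct tmpl {
    int use;
};

static int n_freed;     /* Counts frees so that the effect is observable. */

/* Allocates an object whose use count starts at 0. */
struct tmpl *
tmpl_alloc(void)
{
    struct tmpl *t = calloc(1, sizeof *t);
    if (!t) {
        abort();
    }
    return t;
}

/* Takes a reference. */
void
tmpl_get(struct tmpl *t)
{
    assert(t != NULL);
    t->use++;
}

/* Drops a reference.  Frees 't' only when the decrement brings the count
 * to exactly zero, mirroring dec-and-test semantics. */
void
tmpl_put(struct tmpl *t)
{
    assert(t != NULL);
    if (--t->use == 0) {
        free(t);
        n_freed++;
    }
}
```

Calling tmpl_put() right after tmpl_alloc() moves the count from 0 to -1
and leaks the object, whereas taking a reference immediately after
allocation lets the same tmpl_put() free it.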
Joe Stringer [Mon, 2 May 2016 18:19:16 +0000 (11:19 -0700)]
datapath: Orphan skbs before IPv6 defrag
Upstream commit:
openvswitch: Orphan skbs before IPv6 defrag
This is the IPv6 counterpart to commit 8282f27449bf ("inet: frag: Always
orphan skbs inside ip_defrag()").
Prior to commit 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free
clone operations"), ipv6 fragments sent to nf_ct_frag6_gather() would be
cloned (implicitly orphaning) prior to queueing for reassembly. As such,
when the IPv6 message is eventually reassembled, the skb->sk for all
fragments would be NULL. After that commit was introduced, rather than
cloning, the original skbs were queued directly without orphaning. The
end result is that all frags except for the first and last may have a
socket attached.
This commit explicitly orphans such skbs during nf_ct_frag6_gather() to
prevent BUG_ON(skb->sk) during a later call to ip6_fragment().
Valdis reports NULL deref in nf_ct_frag6_gather.
Problem is bogus use of skb_queue_walk() -- we miss first skb in the list
since we start with head->next instead of head.
In case the element we're looking for was head->next we won't find
a result and then trip over NULL iter.
(defrag uses plain NULL-terminated list rather than one terminated by
head-of-list-pointer, which is what skb_queue_walk expects).
The previous patch changed nf_ct_frag6_gather() to morph reassembled skb
with the previous one.
This means that the return value is always NULL or the skb argument.
So change it to an err value.
Instead of invoking NF_HOOK recursively with threshold to skip already-called hooks
we can now just return NF_ACCEPT to move on to the next hook except for
-EINPROGRESS (which means skb has been queued for reassembly), in which case we
return NF_STOLEN.
commit 6aafeef03b9d9ecf
("netfilter: push reasm skb through instead of original frag skbs")
changed ipv6 defrag to not use the original skbs anymore.
So rather than keeping the original skbs around just to discard them
afterwards just use the original skbs directly for the fraglist of
the newly assembled skb and remove the extra clone/free operations.
The skb that completes the fragment queue is morphed into the
reassembled one instead, just like ipv4 defrag.
openvswitch doesn't need any additional skb_morph magic anymore to deal
with this situation so just remove that.
A followup patch can then also remove the NF_HOOK (re)invocation in
the ipv6 netfilter defrag hook.
Cc: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Upstream: 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free clone operations")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Joe Stringer [Mon, 2 May 2016 18:19:12 +0000 (11:19 -0700)]
compat: ipv6: Pass struct net into nf_ct_frag6_gather.
Upstream commit:
ipv6: Pass struct net into nf_ct_frag6_gather
The function nf_ct_frag6_gather is called on both the input and the
output paths of the networking stack. In particular ipv6_defrag which
calls nf_ct_frag6_gather is called from both the PRE_ROUTING chain
on input and the LOCAL_OUT chain on output.
The addition of a net parameter makes it explicit which network
namespace the packets are being reassembled in, and removes the need
for nf_ct_frag6_gather to guess.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: b72775977c39 ("ipv6: Pass struct net into nf_ct_frag6_gather")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Joe Stringer [Mon, 2 May 2016 18:19:11 +0000 (11:19 -0700)]
compat: ipv4: Pass struct net into ip_defrag.
Upstream commit:
ipv4: Pass struct net into ip_defrag and ip_check_defrag
The function ip_defrag is called on both the input and the output
paths of the networking stack. In particular conntrack when it is
tracking outbound packets from the local machine calls ip_defrag.
So add a struct net parameter and stop making ip_defrag guess which
network namespace it needs to defragment packets in.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: 19bcf9f203c8 ("ipv4: Pass struct net into ip_defrag and ip_check_defrag")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Joe Stringer [Mon, 2 May 2016 18:19:10 +0000 (11:19 -0700)]
compat: Add a struct net parameter to l4_pkt_to_tuple.
Upstream commit:
netfilter: nf_conntrack: Add a struct net parameter to l4_pkt_to_tuple
As gre does not have the srckey in the packet gre_pkt_to_tuple
needs to perform a lookup in its per network namespace tables.
Pass in the proper network namespace to all pkt_to_tuple
implementations to ensure gre (and any similar protocols) can get this
right.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Upstream: a31f1adc0948 ("netfilter: nf_conntrack: Add a struct net
parameter to l4_pkt_to_tuple")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Joe Stringer [Thu, 28 Apr 2016 21:39:09 +0000 (14:39 -0700)]
FAQ: Update feature table.
Linux kernel support for features in the out-of-tree module no longer
depends on particular versions, as we only support kernels 3.10-4.3;
connection tracking status has changed recently; and NAT is a brand new
feature with support only in the latest unreleased Linux kernel version.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
A previous patch introduced the ability to pass arbitrary EAL command
line options via the dpdk_extras database entry. This commit enhances
that by warning the user when such a configuration is detected and
preferring the value in the database.
Suggested-by: Sean K Mooney <sean.k.mooney@intel.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Tested-by: Sean K Mooney <sean.k.mooney@intel.com>
Tested-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
A previous change moved some commonly used arguments from commandline to
the database, and with it the ability to pass arbitrary arguments to
EAL. This change allows arbitrary eal arguments to be provided
via a new db entry 'other_config:dpdk-extra' which will tokenize the
string and add it to the argument list. The only argument which will not
be supported with this change is '--no-huge', which appears to break the
system in other ways.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Tested-by: Sean K Mooney <sean.k.mooney@intel.com>
Tested-by: RobertX Wojciechowicz <robertx.wojciechowicz@intel.com>
Tested-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
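Tokenizing an extra-arguments string and appending it to an argument list
could be sketched as follows. append_extra_args() is a hypothetical
helper, not the netdev-dpdk code, and it assumes plain whitespace
separation with no quoting or escaping.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Splits 'extra' on spaces and tabs and appends a heap-allocated copy of
 * each token to 'argv', starting at index 'argc'.  Stops once 'max_args'
 * entries are in use.  Returns the new argument count; the caller frees
 * the appended strings. */
int
append_extra_args(const char *extra, char **argv, int argc, int max_args)
{
    assert(extra != NULL && argv != NULL);
    assert(argc >= 0 && argc <= max_args);

    const char *p = extra;
    while (argc < max_args) {
        p += strspn(p, " \t");              /* Skip leading whitespace. */
        size_t len = strcspn(p, " \t");     /* Length of the next token. */
        if (len == 0) {
            break;                          /* End of string. */
        }
        char *arg = malloc(len + 1);
        if (!arg) {
            abort();
        }
        memcpy(arg, p, len);
        arg[len] = '\0';
        argv[argc++] = arg;
        p += len;
    }
    return argc;
}
```

The returned count and the filled array can then be passed along as an
argc/argv pair; repeated separators produce no empty arguments.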
The user has control over the DPDK internal lcore coremask, but this
parameter can be autofilled with a bit more intelligence. If the user
does not fill this parameter in, we use the lowest set bit in the
current task CPU affinity. Otherwise, we will reassign the current
thread to the specified lcore mask, in addition to the dpdk lcore
threads.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Tested-by: Sean K Mooney <sean.k.mooney@intel.com>
Tested-by: RobertX Wojciechowicz <robertx.wojciechowicz@intel.com>
Tested-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
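Picking the lowest set bit of an affinity mask can be sketched like this.
The helpers default_lcore_mask() and lowest_cpu() are hypothetical, and a
uint64_t with one bit per CPU stands in for the task's real CPU affinity
set (an assumption that limits the sketch to 64 CPUs).

```c
#include <assert.h>
#include <stdint.h>

/* Returns a mask in which only the lowest set bit of 'affinity' is kept,
 * or 0 if 'affinity' is 0. */
uint64_t
default_lcore_mask(uint64_t affinity)
{
    /* 'x & -x' isolates the lowest set bit; '~x + 1' is '-x' written
     * without negating an unsigned value. */
    return affinity & (~affinity + 1);
}

/* Returns the index of the lowest set bit in 'mask', which must be
 * nonzero. */
int
lowest_cpu(uint64_t mask)
{
    int cpu = 0;

    assert(mask != 0);
    while (!(mask & 1)) {
        mask >>= 1;
        cpu++;
    }
    return cpu;
}
```

For an affinity of CPUs 2 and 3 (mask 0xc), the default mask is 0x4 and
the chosen CPU is 2.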
Since the vhost-user sockets directory now comes from the database, it is
possible for any user with database access to program an arbitrary filesystem
location for the sockets directory. This could result in unprivileged users
creating or deleting arbitrary filesystem files by using specially crafted
names. To prevent this, 'vhost-sock-dir' is now relative to ovs_rundir()
and must not contain "..".
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
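A minimal sketch of the restriction described above: the configured value
is treated as a subdirectory of a run directory and rejected if it
contains "..". build_sock_dir() is a hypothetical helper, and the run
directory is passed in as a parameter rather than obtained from
ovs_rundir().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Builds "<rundir>/<subdir>" in 'out', which has room for 'size' bytes.
 * Returns false, leaving 'out' empty, if 'subdir' contains ".." or the
 * result would not fit. */
bool
build_sock_dir(const char *rundir, const char *subdir, char *out, size_t size)
{
    assert(rundir != NULL && subdir != NULL && out != NULL && size > 0);

    out[0] = '\0';
    if (strstr(subdir, "..")) {
        return false;           /* Could escape the run directory. */
    }

    int n = snprintf(out, size, "%s/%s", rundir, subdir);
    if (n < 0 || (size_t) n >= size) {
        out[0] = '\0';
        return false;           /* Error or truncated path. */
    }
    return true;
}
```

Because the result is always anchored below the run directory and ".."
is refused, a crafted value cannot point the sockets directory at an
arbitrary filesystem location in this sketch.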
netdev-dpdk: Convert initialization from cmdline to db
Existing DPDK integration is provided by use of command line options which
must be split out and passed to librte in a special manner. However, this
forces any configuration to be passed by way of a special DPDK flag, and
interferes with ovs+dpdk packaging solutions.
This commit delays dpdk initialization until after the OVS database
connection is established, at which point ovs initializes librte. It
pulls all of the config data from the OVS database, and assembles a
new argv/argc pair to be passed along.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
netdev-dpdk: Restore thread affinity after DPDK init
When the DPDK init function is called, it changes the executing thread's
CPU affinity to a single core specified in -c. This will result in the
userspace bridge configuration thread being rebound, even if that is not
the intent.
This change fixes that behavior by rebinding to the original thread
affinity after calling dpdk_init().
Co-authored-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Tested-by: RobertX Wojciechowicz <robertx.wojciechowicz@intel.com>
Tested-by: Sean K Mooney <sean.k.mooney@intel.com>
Acked-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Joe Stringer [Thu, 28 Apr 2016 21:13:38 +0000 (14:13 -0700)]
ofp-actions: Fix use-after-free in decode_NOTE.
When decoding the 'note' action, variable-length data could be pushed to
a buffer immediately prior to calling ofpact_finish_NOTE(). The
ofpbuf_put() could cause reallocation, in which case the finish call
could access freed memory. Fix the issue by updating the local pointer
before passing it to ofpact_finish_NOTE().
If the memory was reused, it may trigger an assert in ofpact_finish():
assertion ofpact == ofpacts->header failed in ofpact_finish()
With the included test, make check-valgrind reports:
Invalid read of size 1
at 0x500A9F: ofpact_finish_NOTE (ofp-actions.h:988)
by 0x4FE5C1: decode_NXAST_RAW_NOTE (ofp-actions.c:4557)
by 0x4FBC05: ofpact_decode (ofp-actions.inc2:3831)
by 0x4F7E87: ofpacts_decode (ofp-actions.c:5780)
by 0x4F709F: ofpacts_pull_openflow_actions__ (ofp-actions.c:5817)
by 0x4F7856: ofpacts_pull_openflow_instructions (ofp-actions.c:6397)
by 0x52CFF5: ofputil_decode_flow_mod (ofp-util.c:1727)
by 0x5227A9: ofp_print_flow_mod (ofp-print.c:789)
by 0x520823: ofp_to_string__ (ofp-print.c:3235)
by 0x5204F6: ofp_to_string (ofp-print.c:3468)
by 0x5925C8: do_recv (vconn.c:644)
by 0x592372: vconn_recv (vconn.c:598)
by 0x565CEA: rconn_recv (rconn.c:703)
by 0x46CB62: ofconn_run (connmgr.c:1367)
by 0x46C7AD: connmgr_run (connmgr.c:320)
by 0x4224A9: ofproto_run (ofproto.c:1763)
by 0x407C0D: bridge_run__ (bridge.c:2888)
by 0x40767A: bridge_run (bridge.c:2943)
by 0x4161B7: main (ovs-vswitchd.c:120)
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Ansis Atteka <ansisatteka@gmail.com>
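The class of bug fixed here, a pointer into a buffer going stale when the
buffer grows, can be sketched with a toy buffer. struct buf, buf_put()
and buf_at() are hypothetical and far simpler than ofpbuf; the point is
only that a pointer must be re-derived after any call that may
reallocate.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; much simpler than the real ofpbuf. */
struct buf {
    char *data;
    size_t size;        /* Bytes in use. */
    size_t allocated;   /* Bytes allocated. */
};

/* Appends 'n' bytes from 'p'.  May move b->data via realloc(), which
 * invalidates every pointer previously derived from it.  Returns the
 * offset at which the bytes were stored. */
size_t
buf_put(struct buf *b, const void *p, size_t n)
{
    assert(b != NULL && (p != NULL || n == 0));
    size_t ofs = b->size;

    if (n == 0) {
        return ofs;
    }
    if (b->size + n > b->allocated) {
        size_t allocated = (b->size + n) * 2;
        char *data = realloc(b->data, allocated);
        if (!data) {
            abort();
        }
        b->data = data;
        b->allocated = allocated;
    }
    memcpy(b->data + ofs, p, n);
    b->size += n;
    return ofs;
}

/* Returns a pointer to offset 'ofs'.  The pointer is only valid until the
 * next buf_put() call. */
void *
buf_at(const struct buf *b, size_t ofs)
{
    assert(b != NULL && ofs <= b->size);
    return b->data + ofs;
}
```

Remembering an offset and calling buf_at() again after the append is one
way to update a local pointer before using it, in the spirit of the fix
described above.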
In the STT implementation I saw performance improvements from
linearizing the skb in the SLUB case, so the following patch skips the
zero-copy operation in that case.
The first change is to the reassembly code, where an in-order packet is
merged into the head; if there is no room to merge it, the combined
packet is linearized.
The second case is reassembly of out-of-order packets. In this case
the list of packets is linearized before sending it up to the datapath.
Performance number for large packet TCP test using netperf.
The "VLAN splinters" feature works around buggy device drivers in
old Linux versions. But support for those old kernels has been dropped,
so now all supported kernel VLAN drivers should work fine with the
OVS kernel datapath.
The following patch removes this deprecated feature.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
datapath-windows: Fix recirculation when it is not the last attribute
When the recirc action is in the middle, the current code creates a clone of
the NBL. However, it overwrites the pointer to point to the cloned NBL
without completing it. This causes a memory leak that crashes the kernel.
The userspace conntrack had a bug in tcp_wscale_get(), where the length
of an option would be read from the third octet of the option TLV
instead of the second. This could cause an incorrect wscale value to
be returned, and it would at least impact performance.
Also use 'int' instead of 'unsigned' for 'len', since the value can be
negative.
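As a rough illustration of the TLV layout described above, the following Python sketch walks a TCP options block and reads each option's length from the second octet. It is not the OVS C code: the function name `tcp_wscale` and the constant names are assumptions made for this example, and capping the shift count at 14 follows RFC 7323.

```python
TCPOPT_EOL = 0      # end of option list (single octet, no length)
TCPOPT_NOP = 1      # padding (single octet, no length)
TCPOPT_WINDOW = 3   # window scale option: kind, length (3), shift count
TCP_MAX_WSCALE = 14 # largest shift count allowed by RFC 7323


def tcp_wscale(options: bytes):
    """Return the window scale shift count, or None if it is absent."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == TCPOPT_EOL:
            break
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated option: no length octet
        # The length lives in the SECOND octet of the TLV (offset i + 1).
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed length, stop parsing
        if kind == TCPOPT_WINDOW and length == 3:
            return min(options[i + 2], TCP_MAX_WSCALE)
        i += length
    return None
```

For example, an options block holding an MSS option, a NOP, and a window scale option with shift count 7 would be passed as `bytes([2, 4, 0x05, 0xb4, 1, 3, 3, 7])`. Reading the length from offset `i + 2` instead would step through the block with the wrong stride, which is the kind of error the commit describes.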
Unfortunately this configuration has some problems with offloads: a
packet generated by the TCP stack may be sent to p0 without being
checksummed or segmented. The AF_PACKET socket, by default, ignores the
offloads and just transmits the data of the packets to userspace, but:
1. The packet may need GSO, so the data will be too big to be received
by the userspace datapath
2. The packet might have incomplete checksums, so it will likely be
discarded by the receiver.
Problem 1 causes TCP connections to see a congestion window smaller than
the MTU, which hurts performance but doesn't prevent communication.
Problem 2 was hidden in the testsuite by a Linux kernel bug, fixed by
commit ce8c839b74e3("veth: don’t modify ip_summed; doing so treats
packets with bad checksums as good"). In the kernels that include the
fix, the userspace datapath is able to process pings, but not tcp or udp
data.
Unfortunately I couldn't find a way to ask the AF_PACKET socket to perform
offloads in the kernel. A possible fix would be to use the PACKET_VNET_HDR
sockopt and perform the offloads in userspace.
Until a proper fix is worked out for netdev-linux, this commit disables
offloads on the non-OVS side of the veth pair, as a workaround.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Flavio Leitner <fbl@sysclose.org>
Alin Serdean [Thu, 10 Mar 2016 13:33:42 +0000 (13:33 +0000)]
datapath-windows: Pause switch state on PnP event
A PnP (plug and play) event will be triggered before trying to disable
the extension. We could use this PnP event to prepare for detaching
the datapath.
This patch sets the switch into a paused state so no more net buffers
are queued.
Also clean up some comments.
Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com> Acked-by: Sairam Venugopal <vsairam@vmware.com> Acked-by: Nithin Raju <nithin@vmware.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
ovn-controller-vtep: Support BUM traffic for the VTEP Schema.
This patch implements BUM support in the VTEP schema. This relates to
BUM traffic flowing from a gateway towards HVs. This code would be
relevant to HW gateways and the ovs-vtep simulator. In order to do this,
the mcast macs remote table in the VTEP schema is populated based on the
OVN SB port binding. For each logical switch, the SB port bindings are
queried to find all the physical locators to send BUM traffic to and the
VTEP DB is updated.
Some test packets were enabled in the HW gateway test case to exercise
the new code.
Simon Horman [Fri, 22 Apr 2016 12:22:56 +0000 (22:22 +1000)]
packets: use flow protocol when recalculating ipv6 checksums
When using masked actions, the ipv6_proto field of an action that
sets IPv6 fields may be zero rather than the prevailing protocol,
which results in checksum recalculation being skipped.
This patch resolves the problem by relying on the protocol
in the packet rather than that in the set field action.
A similar fix for the kernel datapath has been accepted into David Miller's
'net' tree as b4f70527f052 ("openvswitch: use flow protocol when
recalculating ipv6 checksums").
Cc: Jarno Rajahalme <jrajahalme@nicira.com> Fixes: 6d670e7f0d45 ("lib/odp: Masked set action execution and printing.") Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Ben Pfaff <blp@ovn.org>
When translating multiple ct actions in a row which include modification
of ct_mark or ct_labels, these fields could be incorrectly translated
into datapath actions, resulting in modification of these fields for
entries when the OpenFlow rules didn't actually specify the change.
For instance, the following OpenFlow actions:
ct(zone=1,commit,exec(set_field(1->ct_mark))),ct(zone=2,table=1),...
Would translate into the datapath actions:
ct(zone=1,commit,mark=1),ct(zone=2,mark=1),recirc(...),...
This commit fixes the issue by zeroing the wildcards for these fields
prior to performing nested actions translation (and restoring
afterwards). As such, these fields do not hold both the match and the
field modification values at the same time. As a result, the ct_mark and
ct_labels don't leak from one ct action to the next.
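The save, zero, and restore pattern described above can be pictured with a toy model. The Python sketch below is not the OVS translation code: `translate_ct_actions`, the dictionary layout of the actions, and the single `wc_ct_mark` variable standing in for the wildcard mask are all assumptions made for illustration. It only covers ct_mark; ct_labels would be handled the same way.

```python
def translate_ct_actions(ct_actions, fix=True):
    """Toy translation of a list of OpenFlow ct actions.

    Each input action is a dict with a "zone" key and an optional
    "set_mark" key (a nested set_field on ct_mark).
    """
    wc_ct_mark = 0   # toy stand-in for the ct_mark wildcard mask
    ct_mark = 0      # toy stand-in for the ct_mark flow field
    datapath_actions = []
    for action in ct_actions:
        saved_wc = wc_ct_mark
        if fix:
            # Zero the mask so it only reflects this ct action's changes.
            wc_ct_mark = 0
        if "set_mark" in action:
            # Nested action translation: modify the field, mark it in the mask.
            ct_mark = action["set_mark"]
            wc_ct_mark = 0xFFFFFFFF
        translated = {"zone": action["zone"]}
        if wc_ct_mark:
            # A non-zero mask is read as "this ct action sets the mark".
            translated["mark"] = ct_mark
        datapath_actions.append(translated)
        if fix:
            # Restore the mask that was in effect before this ct action.
            wc_ct_mark = saved_wc
    return datapath_actions
```

With the two ct actions from the example in the commit message, `fix=False` attaches `mark=1` to both zones in this model, while `fix=True` attaches it only to zone 1.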
Fixes: 8e53fe8cf7a1 ("Add connection tracking mark support.") Fixes: 9daf23484fb1 ("Add connection tracking label support.") Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Joe Stringer [Thu, 21 Apr 2016 21:10:11 +0000 (14:10 -0700)]
system-traffic: Fix IPv6 frag vxlan check.
This was missed before somehow, which would cause the test to fail
(rather than being skipped) if iproute2 didn't support setting the
vxlan dstport on the kernel tunnel device.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Simon Horman [Fri, 22 Apr 2016 10:42:43 +0000 (10:42 +0000)]
debian: Fix treatment of upstream version that contains hyphens.
The Debian Policy Manual
(https://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Version)
says that the upstream_version may contain only alphanumerics and the
characters . + - : ~ (full stop, plus, hyphen, colon, tilde) and should
start with a digit.
Currently, the upstream_version is defined in the debian/rules file:
DEB_UPSTREAM_VERSION=$(shell dpkg-parsechangelog | sed -rne 's,^Version: ([0-9]:)*([^-]+).*,\2,p')
The version number is taken from the dpkg-parsechangelog output, and sed
then extracts the leading part of it that contains no hyphen. However, the
Debian Policy Manual says that a hyphen is allowed in the upstream_version.
This is not a problem with the current vanilla OVS Debian version. But if a
suffix string that includes a hyphen is added to the upstream_version, then
installation of the datapath-dkms package will fail.
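To see the truncation concretely, the sed expression can be mirrored with Python's `re` module. This is only a sketch: the sample version strings and the helper `upstream_version_policy` are hypothetical, and splitting at the last hyphen follows the Debian Policy description of debian_revision rather than whatever change the commit actually makes to debian/rules.

```python
import re

# Same regular expression as the sed command in debian/rules.
CURRENT_PATTERN = re.compile(r"^Version: ([0-9]:)*([^-]+).*")


def upstream_version_current(line):
    """Mimic the sed substitution: keep everything before the FIRST hyphen."""
    match = CURRENT_PATTERN.match(line)
    return match.group(2) if match else None


def upstream_version_policy(line):
    """Hypothetical alternative: drop the epoch, split at the LAST hyphen."""
    version = line[len("Version: "):]
    version = re.sub(r"^[0-9]+:", "", version)
    upstream, sep, _revision = version.rpartition("-")
    return upstream if sep else version


if __name__ == "__main__":
    for sample in ("Version: 2.5.90-1", "Version: 2.5.90-custom-1"):
        print(sample, upstream_version_current(sample),
              upstream_version_policy(sample))
```

Both functions agree on a version with a single hyphen. For the hypothetical `2.5.90-custom-1`, the current expression yields `2.5.90`, while the upstream_version under Debian Policy would be `2.5.90-custom`.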
Reported-by: Zoltán Balogh <zoltan.balogh@ericsson.com> Tested-by: Zoltán Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Calculate the cksum after removing the cksum line with a stricter
regex than the one used previously.
This fixes a problem when calculating the cksum of a schema that
has fields containing the substring "cksum" (e.g. a checksum column):
the previous cksum calculation incorrectly removed those lines
before running cksum.
Also, the tool calculate-schema-cksum is introduced. This tool
calculates the cksum of a schema file. Other programs can use it
instead of calculating the cksum in a way that might differ from
the one expected by cksum-schema-check and other tools.
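The difference between a loose and a strict match can be sketched in Python. Both patterns and the sample schema below are assumptions made for illustration; the exact expressions used by the build scripts are not shown in this message. The sketch covers only the line filtering step, not the cksum computation itself.

```python
import re

# Hypothetical loose pattern: any line containing the substring "cksum".
LOOSE = re.compile(r"cksum")
# Hypothetical strict pattern: only a line of the form  "cksum": "N N",
STRICT = re.compile(r'^\s*"cksum":\s*"[0-9]+ [0-9]+",\s*$')

# Made-up schema fragment with a real cksum line and a "checksum" column.
SAMPLE_SCHEMA = (
    '{"name": "Example",\n'
    ' "version": "1.0.0",\n'
    ' "cksum": "12345 678",\n'
    ' "tables": {"T": {"columns": {\n'
    '   "checksum": {"type": "string"}}}}}\n'
)


def strip_cksum_lines(text, pattern):
    """Return text without the lines that match pattern."""
    lines = text.splitlines(keepends=True)
    return "".join(line for line in lines if not pattern.search(line))
```

Because "checksum" contains "cksum", the loose pattern also drops the column definition, so the checksum would be computed over the wrong input. The strict pattern removes only the cksum line.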
Signed-off-by: Esteban Rodriguez Betancourt <estebarb@hpe.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Add braces around the if statement to prevent Visual Studio from giving
the "error C2275: illegal use of this type as an expression". This happens
when a variable is declared after a block. The error occurs only with
certain compiler versions.
Miguel Angel Ajo [Thu, 14 Apr 2016 09:51:44 +0000 (11:51 +0200)]
netdev-linux: Fix ingress policing burst rate configuration via tc
The tc_police structure was filled with a value calculated in bits,
while bytes were expected. This led to a burst value 8 times higher
than intended.
Documentation and defaults have been corrected accordingly to minimize
the impact on users who stick to the defaults.
The suggested burst value is now 80% of the policing rate to make sure
TCP works correctly.
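The arithmetic behind the factor of 8 can be written out as a small Python sketch. The function names are hypothetical, and the factor of 1000 bits per kilobit is an assumption; the actual netdev-linux code may use a different multiplier and also converts the result to tc ticks, which is omitted here.

```python
BITS_PER_BYTE = 8
BITS_PER_KBIT = 1000  # assumption; the real code may use another factor


def burst_in_bytes(kbits_burst):
    """Burst size in bytes, which is what the policer expects."""
    return kbits_burst * BITS_PER_KBIT // BITS_PER_BYTE


def burst_in_bits(kbits_burst):
    """The erroneous calculation: a bit count passed where bytes are expected."""
    return kbits_burst * BITS_PER_KBIT


def suggested_burst_kbits(kbits_rate):
    """Suggested burst: 80% of the policing rate."""
    return kbits_rate * 80 // 100
```

For a configured burst of 8000 kb, the byte value is 1,000,000 while the bit value is 8,000,000, so a policer given the latter allows a burst 8 times larger than configured. With a policing rate of 10000 kbps, the suggested burst works out to 8000 kb.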
Signed-off-by: Miguel Angel Ajo <majopela@redhat.com> Tested-by: Miguel Angel Ajo <majopela@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
tunneling: Fix for concomitant IPv4 and IPv6 tunnels
When using an IPv6 tunnel on the same bridge as an IPv4 tunnel, the flow
received from the IPv6 tunnel would have an IPv4 address added to it, causing
problems when trying to put or execute the action on the Linux datapath.
Clearing the IPv6 address when we have a valid IPv4 address fixes this problem.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
"Use form feeds (control+L) to divide long source files into logical
pieces. A form feed should appear as the only character on a line."
checkpatch.py currently complains about form feed. For example, on
commit 2c06d9a927c5("ovstest: Add test-netlink-conntrack command."),
checkpatch.py returns:
W(140): Line has non-spaces leading whitespace
W(140): Line has trailing whitespace
+
W(177): Line has non-spaces leading whitespace
W(177): Line has trailing whitespace
+
W(199): Line has non-spaces leading whitespace
W(199): Line has trailing whitespace
+
This commit suppresses the two warnings for lines with form feeds as the
only character.
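The suppression can be sketched as follows. This is not the checkpatch.py source: the function name `whitespace_warnings` and the two regular expressions are assumptions made for illustration, and only the warning texts are taken from the output quoted above.

```python
import re

# Leading whitespace that contains something other than a plain space.
LEADING_NON_SPACE = re.compile(r" *[^\S ]")
# Any whitespace character at the end of the line.
TRAILING_WHITESPACE = re.compile(r"\s$")


def whitespace_warnings(line):
    """Return the whitespace warnings for one source line."""
    line = line.rstrip("\n")
    if line == "\f":
        # A form feed as the only character on the line is allowed.
        return []
    warnings = []
    if LEADING_NON_SPACE.match(line):
        warnings.append("Line has non-spaces leading whitespace")
    if TRAILING_WHITESPACE.search(line):
        warnings.append("Line has trailing whitespace")
    return warnings
```

Without the early return, a line holding only a form feed would match both patterns, since a form feed counts as whitespace. That corresponds to the pair of warnings reported for each form feed line in the example.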
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org>
classifier: Fix race condition leading to NULL dereference.
The addition of table versioning exposed the struct cls_rule member
'cls_match' to RCU readers and made it possible for 'cls_match' to become
NULL while being accessed by an RCU reader, but we failed to check for
this condition. This may have resulted in a NULL pointer dereference
and an ovs-vswitchd crash.
Fix this by making the 'cls_match' member an RCU pointer and checking
the value whenever it is potentially read by an RCU reader. In these
instances we use ovsrcu_get(), whereas functions accessible only by
the exclusive writers use ovsrcu_get_protected() and do not need to
check the result.
ovn: Fix the port security test failure by adding a sleep of 2 sec.
Added a sleep of 2 seconds before generating a test packet in ovn.at
so that ovn-northd reads the northbound db changes and updates the
southbound db.
Fixes: 7d9d86a ("ovn-northd: Handle IPv4 addresses with prefixes in lport port security") Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Update relevant artifacts to add support for DPDK 16.04.
The following changes are applied:
- INSTALL.DPDK.md: CONFIG_RTE_BUILD_COMBINE_LIBS step has been
removed because it is no longer present in DPDK configuration
(combined library is created by default),
- INSTALL.DPDK.md: VHost Cuse configuration is updated,
- netdev-dpdk.c: Link speed definition is changed in DPDK and
netdev_dpdk_get_features is updated accordingly,
- netdev-dpdk.c: TSO and checksum offload have been disabled for
vhostuser device,
- .travis/linux-build.sh: DPDK version is updated and legacy
flags have been removed in configuration.
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com> Signed-off-by: Panu Matilainen <pmatilai@redhat.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
acinclude: Autodetect DPDK location when configuring OVS
When using the DPDK datapath, the OVS configure script requires the DPDK
build directory to be passed via --with-dpdk. This can be avoided if the
DPDK library and headers are in the standard compiler search paths.
This patch fixes the problem by searching for DPDK libraries in standard
locations and configuring the OVS sources for the DPDK datapath.
If the install location is manually specified in "--with-dpdk",
autodiscovery is skipped.
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>