Jarno Rajahalme [Mon, 11 Aug 2014 15:38:58 +0000 (08:38 -0700)]
datapath/actions: Mark recalculate_csum as likely in set_ipv6_addr().
The ‘recalculate_csum’ is almost always ‘true’. It is false only if
the ipv6 nexthdr is an extension header, and a routing header is
found. For the majority of ipv6 packets this would not be the case,
so this can be marked as 'likely'.
Pravin B Shelar [Fri, 8 Aug 2014 04:26:35 +0000 (21:26 -0700)]
datapath: Avoid NULL mask check while building mask
OVS does mask validation even if it does not need to convert
netlink mask attributes to mask structure. ovs_nla_get_match()
caller can pass NULL mask structure pointer if the caller does
not need mask. Therefore NULL check is required in SW_FLOW_KEY*
macros. Following patch does not convert mask netlink attributes
if mask pointer is NULL, so we do not need these checks in
SW_FLOW_KEY* macro.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Andy Zhou <azhou@nicira.com>
Pravin B Shelar [Thu, 7 Aug 2014 22:46:19 +0000 (15:46 -0700)]
datapath: Optimize recirc action.
OVS need to flow key for flow lookup in recic action. OVS
does key extract in recic action. Most of cases we could
use OVS_CB packet key directly and can avoid packet flow key
extract. SET action we can update flow-key along with packet
to keep it consistent. But there are some action like MPLS
pop which forces OVS to do flow-extract. In such cases we
can mark flow key as invalid so that subsequent recirc
action can do full flow extract.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>
Pravin B Shelar [Fri, 8 Aug 2014 03:34:20 +0000 (20:34 -0700)]
datapath: Avoid using stack larger than 1024.
commit (datapath: Refactor action alloc and copy api) effectively
reverted 1d2a1b5f5252e4c6ce8bbf8d91ca27aba52496e6 (datapath: Factor out
allocation and verification of actions.). This results in following
warning:
CC [M] /home/jesse/openvswitch/datapath/linux/datapath.o
/home/jesse/openvswitch/datapath/linux/datapath.c: In function
‘ovs_flow_cmd_set’:
/home/jesse/openvswitch/datapath/linux/datapath.c:1094:1: warning: the
frame size of 1256 bytes is larger than 1024 bytes
[-Wframe-larger-than=]
}
Pravin B Shelar [Thu, 7 Aug 2014 19:51:14 +0000 (12:51 -0700)]
datapath: Refactor action alloc and copy api.
There are two separate API to allocate and copy actions list. Anytime
OVS needs to copy action list, it needs to call both functions.
Following patch moves action allocation to copy function to avoid
code duplication.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Justin Pettit [Wed, 6 Aug 2014 21:15:52 +0000 (14:15 -0700)]
datapath: Update comments about 'OVS_KEY_ATTR_8021Q'.
Commit fea393b1 (datapath: Describe policy for extending flow key,
implement needed changes.) changed the key 'OVS_KEY_ATTR_8021Q' to
'OVS_KEY_ATTR_VLAN' and the size of the attribute structure. A couple
of comments were missed, so this commit updates them.
Signed-off-by: Justin Pettit <jpettit@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
Pravin B Shelar [Tue, 5 Aug 2014 20:49:57 +0000 (13:49 -0700)]
datapath: Use tun_info only for egress tunnel path.
Currently tun_info is used for passing tunnel information
on ingress and egress path, this cause confusion. Following
patch removes its use on ingress path make it egress only parameter.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>
Pravin B Shelar [Tue, 5 Aug 2014 20:43:45 +0000 (13:43 -0700)]
datapath: Avoid using wrong metadata for recic action.
Recirc action needs to extract flow key from packet, it uses tun_info
from OVS_CB for setting tunnel meta data in flow key. But tun_info
can be overwritten by tunnel send action. This would result in wrong
flow key for the recirculation.
Following patch copies flow-key meta data from OVS_CB packet key
itself thus avoids this bug.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>
OVS flow extract is called on packet receive or packet
execute code path. Following patch defines separate API
for extracting flow-key in packet execute code path.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>
There's no reason for the local variable 'old_count' to be atomic. In fact, if
it is atomic it triggers a GCC4.9 warning (Wunused-value)
The global variables 'a' and 'paux' could be static, according to sparse.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
This patch causes classifier_lookup_miniflow_batch() to return a
boolean indicating whether any rules could not be successfully looked
up. Used in future patches.
Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Ethan Jackson [Mon, 28 Jul 2014 01:56:45 +0000 (18:56 -0700)]
dpif-netdev: Avoid useless flow copy in dp_netdev_flow_add().
This patch gives dp_netdev_flow_add() a match with which it can
initialize the classifier rule. This prevents it from needing to copy
a flow and flow_wildcards into the match first.
Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jarno Rajahalme [Tue, 5 Aug 2014 20:51:19 +0000 (13:51 -0700)]
lib/ovs-atomic: Native support for 32-bit 586 with GCC.
XenServer runs OVS in dom0, which is a 32-bit VM. As the build
environment lacks support for atomics, locked pthread atomics were
used with considerable performance hit.
This patch adds native support for ovs-atomic with 32-bit Pentium and
higher CPUs, when compiled with an older GCC. We use inline asm with
the cmpxchg8b instruction, which was a new instruction to Intel
Pentium processors. We do not expect anyone to run OVS on 486 or older
processor.
cmap benchmark before the patch on 32-bit XenServer build (uses
ovs-atomic-pthread):
$ tests/ovstest test-cmap benchmark 2000000 8 0.1
Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert: 8835 ms
cmap iterate: 379 ms
cmap search: 6242 ms
cmap destroy: 1145 ms
After:
$ tests/ovstest test-cmap benchmark 2000000 8 0.1
Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert: 711 ms
cmap iterate: 68 ms
cmap search: 353 ms
cmap destroy: 209 ms
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jarno Rajahalme [Tue, 5 Aug 2014 20:51:19 +0000 (13:51 -0700)]
lib/ovs-atomic: Native support for x86_64 with GCC.
Some supported XenServer build environments lack compiler support for
atomic operations. This patch provides native support for x86_64 on
GCC, which covers possible future 64-bit builds on XenServer.
Since this implementation is faster than the existing support prior to
GCC 4.7, especially for cmap inserts, we use this with GCC < 4.7 on
x86_64.
Example numbers with "tests/test-cmap benchmark 2000000 8 0.1" on
quad-core hyperthreaded laptop, built with GCC 4.6 -O2:
Using ovs-atomic-pthreads on x86_64:
Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert: 4725 ms
cmap iterate: 329 ms
cmap search: 5945 ms
cmap destroy: 911 ms
Using ovs-atomic-gcc4+ on x86_64:
Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert: 845 ms
cmap iterate: 58 ms
cmap search: 308 ms
cmap destroy: 295 ms
With the native support provided by this patch:
Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert: 530 ms
cmap iterate: 59 ms
cmap search: 305 ms
cmap destroy: 232 ms
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jarno Rajahalme [Tue, 5 Aug 2014 20:51:19 +0000 (13:51 -0700)]
lib/ovs-atomic-gcc4+: Use 'volatile' to enforce memory access.
Use 'volatile' to enforce a new memory access on each lockless atomic
store and read. Without this a loop consisting of an atomic_read with
memory_order_relaxed would be simply optimized away. Also, using
volatile is cheaper than adding a full compiler barrier (also) in that
case.
This use of a volatile cast mirrors the Linux kernel ACCESS_ONCE macro.
Without this change the more rigorous atomic test cases introduced in
a following patch will hang due to the atomic accesses being optimized
away.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jarno Rajahalme [Tue, 5 Aug 2014 20:51:19 +0000 (13:51 -0700)]
ovs-atomic: Fix GCC4+ atomic_flag.
The default memory order for atomic_flag is documented to be
memory_order_seq_cst (as in C11), but the GCC4+ implementation only
used the GCC builtins, which provide acquire and release semantics
only. Additional barriers are needed for in other cases.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jarno Rajahalme [Tue, 5 Aug 2014 20:51:19 +0000 (13:51 -0700)]
lib/ovs-atomic: Require memory_order be constant.
Compiler implementations may provide sub-optimal support for a
memory_order passed in as a run-time value (ref.
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html).
Document that OVS atomics require the memory order to be passed in as
a compile-time constant.
It should be noted, however, that when inlining is disabled (i.e.,
compiling without optimization) even compile-time constants may be
passed as run-time values to (non-inlined) functions.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
The definition of memory_order_relaxed included a compiler barrier,
while it is not necessary, and indeed the following text on
atomic_thread_fence and atomic_signal_fence contradicted that.
memory_order_consume and memory_order_acq_rel are also more thoroughly
described.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jarno Rajahalme [Tue, 5 Aug 2014 20:51:19 +0000 (13:51 -0700)]
Fix strict aliasing violations with GCC 4.1 and 4.4.
The typical use of struct sockaddr_storage is flagged as a strict
aliasing violation by GCC 4.4.7. Using an explicit union lets the
compiler know that accessing the same location via different types is
not an error.
GCC 4.1.2 had a similar complaint about a cast of ukey's key_buf to
nlattr. After this patch there are no further warnings with the
XenServer build, so we could start treating warnings as errors in the
builds.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Tue, 29 Jul 2014 16:54:17 +0000 (09:54 -0700)]
tests: Turn off appctl poll_loop logging for long output.
One of the VMware internal autobuilder builds failed due to extraneous
logging in these tests of the form:
2014-07-28T21:11:07Z|00001|poll_loop|INFO|wakeup due to [POLLIN] on fd 3
(...) at lib/stream-fd-unix.c:124 (93% CPU usage)
I think this must be because these tests have tons of output and so on a
loaded machine it can take some CPU to pull it down. We don't want to fail
for that reason, so disable this logging.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ansis Atteka <aatteka@nicira.com>
Ben Pfaff [Mon, 4 Aug 2014 19:36:04 +0000 (12:36 -0700)]
ovs-appctl: Add logging options.
Normally I would also add documentation for the logging options to the
ovs-appctl manpage, but I am concerned that in this case it would actually
make the manpage confusing, because one of the main purposes of ovs-appctl
is to modify the log levels of *other* programs, and these options only
modify the log level of ovs-appctl itself, which is rarely useful.
The following commit will start using these logging options in a test.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ansis Atteka <aatteka@nicira.com>
dpctl: add ovs-appctl dpctl/* commands to talk to dpif-netdev
This commit introduces multiple appctl commands (dpctl/*)
They are needed to interact with userspace datapaths (dpif-netdev), because the
ovs-dpctl command runs in a separate process and cannot see the userspace
datapaths inside vswitchd.
This change moves most of the code of utilities/ovs-dpctl.c in lib/dpctl.c.
Both the ovs-dpctl command and the ovs-appctl dpctl/* commands make calls to
lib/dpctl.c functions, to interact with datapaths.
The code from utilities/ovs-dpctl.c has been moved to lib/dpctl.c and has been
changed for different reasons:
- An exit() call in the old code made perfectly sense. Now (since the code
can be run inside vswitchd) it would terminate the daemon. Same reasoning
can be applied to ovs_fatal_*() calls.
- The lib/dpctl.c code _should_ not leak memory.
- All the print* have been replaced with a function pointer provided by the
caller, since this code can be run in the ovs-dpctl process (in which
case we need to print to stdout) or in response to a unixctl request (and
in this case we need to send everything through a socket, using JSON
encapsulation).
The syntax is
ovs-appctl dpctl/(COMMAND) [OPTIONS] [PARAMETERS]
while the ovs-dpctl syntax (which _should_ remain the same after this change)
is
ovs-dpctl [OPTIONS] (COMMAND) [PARAMETERS]
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
[blp@nicira.com made stylistic and documentation changes] Signed-off-by: Ben Pfaff <blp@nicira.com>
Andy Zhou [Sun, 3 Aug 2014 23:16:40 +0000 (16:16 -0700)]
ofproto-dpif: fix an ovs crash when dpif_recv_set returns error
When dpif_recv_set returns an error, close_dpif_backer gets called,
which in term calls recirc_id_pool_destroy, which can lead to a crash
since recirc_id_pool_create() was not called before this patch.
Reported-by: Mukesh Hira <mhira@vmware.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Sat, 2 Aug 2014 00:22:20 +0000 (17:22 -0700)]
ofproto-dpif: Use DPIF_FP_CREATE but not DPIF_FP_MODIFY.
A dpif reports EEXIST if a flow put operation that should create a new flow
instead attempts to modify an existing flow, or ENOENT if a flow put would
create a flow that overlaps some existing flow. The latter does not always
indicate a bug in OVS userspace, because it can also mean that two
userspace OVS packet handler threads received packets that generated
the same megaflow for different microflows. Until now, userspace has
logged this, which confuses users by making them suspect a bug. We could
simply not log ENOENT in userspace, but that would suppress logging for
genuine bugs too. Instead, this commit drops DPIF_FP_MODIFY from flow
put operations in ofproto-dpif, which transforms this particular
problem into EEXIST, which userspace already logs at "debug" level (see
flow_message_log_level()), effectively suppressing the logging in normal
circumstances.
It appears that in practice ofproto-dpif doesn't actually ever need to
modify flows in the datapath, only create and delete them, so this
shouldn't cause problems.
Ben Pfaff [Fri, 13 Jun 2014 22:28:29 +0000 (15:28 -0700)]
odp-netlink.h: Use 32-bit aligned 64-bit types.
Open vSwitch userspace uses special types to indicate that a particular
object may not be naturally aligned. Netlink is one source of such
problems: in Netlink, 64-bit integers are often aligned only on 32-bit
boundaries. This commit changes the odp-netlink.h that is transformed from
<linux/openvswitch.h> to use these types to make it harder to accidentally
access a misaligned 64-bit member.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Mon, 4 Aug 2014 18:11:40 +0000 (11:11 -0700)]
Do not seemingly #include Linux-specific headers on other platforms.
Until now, the OVS source tree has had a whole maze of header files that
make "#include <linux/openvswitch.h>" work OK regardless of platform, but
this confuses everyone new to the tree, at first glance, and is difficult
to understand at second glance too.
This commit renames include/linux/openvswitch.h to
datapath/linux/compat/include/linux/openvswitch.h without other change,
then modifies the userspace build to generate a header that makes sense in
portable Open vSwitch userspace from that header.
It then removes all the remaining include/linux/* files since they are now
unused.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
datapath: do not use vport type to determine presence of Geneve attributes
This patch fixes following kernel crash that could happen, if geneve
vport was not added yet, but revalidator thread attempted to dump flows.
To reproduce:
1. switch tunnel type between geneve and gre in a loop; and
2. run ping.
Chunhe Li [Wed, 30 Jul 2014 01:49:01 +0000 (09:49 +0800)]
datapath: Drop packets when interdev is not up
If the internal device is not up, it should drop received
packets. Sometimes it receive the broadcast or multicast
packets, and the ip protocol stack will casue more cpu
usage wasted.
Signed-off-by: Chunhe Li <lichunhe@huawei.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
Ben Pfaff [Tue, 29 Jul 2014 15:59:40 +0000 (08:59 -0700)]
netlink-socket: Add conceptual documentation.
Based on a conversation with the VMware Hyper-V team earlier today.
This commit also changes a couple of functions that were only used with
netlink-socket.c into static functions. I couldn't think of a reason for
code outside that file to use them.
Ben Pfaff [Mon, 28 Jul 2014 17:17:56 +0000 (10:17 -0700)]
connmgr: Demote service controllers too in ofconn_set_role().
Service controllers can set their roles, so it's necessary to demote them
to slaves if another controller becomes master. Unfortunately the
'controllers' hmap only contains primary controllers, so this was omitted.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Justin Pettit <jpettit@nicira.com>
Saurabh Shah [Mon, 28 Jul 2014 00:26:58 +0000 (17:26 -0700)]
datapath-windows: Kernel module for HyperV.
The kernel switch extension has support for bridged back forwarding & tunneling
over VXLAN. There is no Netlink integration as it is still being worked out.
Co-Authored-By: Ankur Sharma <ankursharma@vmware.com> Signed-off-by: Ankur Sharma <ankursharma@vmware.com> Co-Authored-By: Eitan Eliahu <eliahue@vmware.com> Signed-off-by: Eitan Eliahu <eliahue@vmware.com> Co-Authored-By: Guolin Yang <gyang@vmware.com> Signed-off-by: Guolin Yang <gyang@vmware.com> Co-Authored-By: Linda Sun <lsun@vmware.com> Signed-off-by: Linda Sun <lsun@vmware.com> Co-Authored-By: Nithin Raju <nithin@vmware.com> Signed-off-by: Nithin Raju <nithin@vmware.com> Signed-off-by: Saurabh Shah <ssaurabh@vmware.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Sat, 26 Jul 2014 19:19:03 +0000 (12:19 -0700)]
ofproto-dpif-upcall: Fix sparse warnings.
Fixes these warnings from "sparse":
../ofproto/ofproto-dpif-upcall.c:761:1: warning: symbol 'free_upcall' was
not declared. Should it be static?
../ofproto/ofproto-dpif-upcall.c:849:1: warning: symbol 'convert_upcall'
was not declared. Should it be static?
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Justin Pettit <jpettit@nicira.com>
Ben Pfaff [Thu, 15 May 2014 15:19:11 +0000 (08:19 -0700)]
meta-flow: Simplify handling of a variable number of registers.
At the time that Open vSwitch implemented registers, there was a high cost
to adding additional fields, so I wrote the code so that the number of
registers could be reduced at compile time. Now, fields are cheaper
(though not free) and in the meantime I have never heard of anyone reducing
the number of registers. Since I intend to add more code that would
require awkward "#if"s like this, I think that this is a good time to
simplify it by requiring FLOW_N_REGS to be fixed. This commit does that.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Typically, kernel datapath threads send upcalls to userspace where
handler threads process the upcalls. For TAP and DPDK devices, the
datapath threads operate in userspace, so there is no need for
separate handler threads.
This patch allows userspace datapath threads to directly call the
ofproto upcall functions, eliminating the need for handler threads
for datapaths of type 'netdev'.
Signed-off-by: Ryan Wilson <wryan@nicira.com> Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
There is race in datapath when last mask in mask array deleted can
result in NULL pointer dereference in datapath.
datapath lookup does not check mask pointer if its index is less than
mask-array count. That is safe because delete operation moves last valid
pointer to the deleted element. But this does not work if we are
deleting last element in array. Following patch adds NULL check for the
mask pointer.
This patch also avoids accessing ma->count without any locks.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>
Ben Pfaff [Fri, 25 Jul 2014 16:19:17 +0000 (09:19 -0700)]
cfm: Reduce "long delay" message from WARN to INFO, to match BFD behavior.
These messages can cause the testsuite to fail on a busy build machine
since the testsuite treats WARN or ERR log messages as failures. BFD
uses an INFO message instead of WARN, so this just changes CFM to match.
Alternatively, the testsuite could ignore "long delay" messages (it ignores
some other categories of messages). In that case I'd expect that we'd
want to change BFD to match CFM since I don't know of a reason why they
should log differently.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Justin Pettit <jpettit@nicira.com>
Andy Zhou [Thu, 24 Jul 2014 09:17:48 +0000 (02:17 -0700)]
datapath: reorder action netlink attribute definition for upstreaming
Keeping the order of netlink attribute definition in the order of
upstreaming is the best way to keep all released user space program
forward compatible with upstreamed kernel modules.
Adjust action netlink attribute order to match with the current
upstreaming plan.
Recirc and hash actions are introduced in branch 2.3, which will be
fixed by the patch. The MPLS actions have been released since
branch-2.1 but there is no kernel implementation of them prior to
branch 2.3. Thus the ordering change should not affect them.
Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
datapath/flow_netlink: Avoid wildcarding tunnel key with disabled megaflows
If the userspace wants to match on a flow with some tunnel attributesset to 0,
it simply omits them in the netlink attributes stream.
Since our wildcarding logic (when megaflows are disabled) is based on the
attributes in the netlink stream, we set our mask incorrectly.
This commit adds a check to detect if the userspace wants to match on a tunnel,
in which case we simply unwildcard the whole tun_key
Reported-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Signed-off-by: Andy Zhou <azhou@nicira.com>
Alex Wang [Wed, 23 Jul 2014 06:14:50 +0000 (23:14 -0700)]
datapath: flow_netlink: Fix a bug.
Commit 62974663fe (datapath/flow_netlink: Create right mask with
disabled megaflows) introduced the bug which caused
ovs_nla_get_match() returns immediately after parsing the flow
mask for OVS_KEY_ATTR_ENCAP. Consequently, when vlan encapsulated
packets are present, the corresponding datapath flows will have
incorrect mask like below. And the incorrect flows could affect
other non-vlan packets.
Ben Pfaff [Wed, 23 Jul 2014 04:06:44 +0000 (21:06 -0700)]
dpif-netdev: Initialize upcall->packet when queuing to userspace.
Commit db73f7166a6 (netdev-dpdk: Fix race condition with DPDK mempools in
non pmd threads) switched to a new way of setting up 'upcall->packet', but
only initialized two of the fields in the packet. This could cause
core dumps and other strange behavior. In particular it caused failures in
several unit tests on XenServer.
This commit fixes the problem by initializing the entire ofpbuf.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Justin Pettit <jpettit@nicira.com>
netlink-notifier: Exit loop if nl_sock_recv() returns an error
An error from nl_sock_recv() could mean that there are some issues with the
netlink socket (EBADF, ENOTSOCK, ...). Calling nl_sock_recv() in this case is
harmful: nln_run() will never return and since we are calling it from the main
thread, vswitchd becomes unresponsive.
Also, with this commit we avoid calling the notifier callback in case of error
(except for ENOBUFS, which means that there could be too many notifications)
Suggested-by: Alex Wang <alexw@nicira.com> Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Ben Pfaff <blp@nicira.com> Acked-by: Alex Wang <alexw@nicira.com>
Ben Pfaff [Tue, 22 Jul 2014 23:18:35 +0000 (16:18 -0700)]
configure: Don't check for malloc hooks that we no longer use.
Commit 825da1c6d1c7 (leak-checker: Remove because it cannot be made
thread-safe.) removed the only uses of these hooks but neglected to remove
the test for them.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Justin Pettit <jpettit@nicira.com>
Ben Pfaff [Tue, 22 Jul 2014 23:09:26 +0000 (16:09 -0700)]
ovs-ctl: Add comment to explain why we only save ofports for pre-1.10.
We've had a couple of questions about this lately, including one suggestion
that we should always save the OpenFlow port numbers. This explains why
the behavior is as it is.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Justin Pettit <jpettit@nicira.com>
Ben Pfaff [Tue, 22 Jul 2014 18:50:37 +0000 (11:50 -0700)]
debian: Avoid -Wformat-zero-length warnings.
Debian puts an extra "-Wformat" in the CFLAGS following OVS's own
"-Wformat -Wno-format-zero-length", which therefore overrides
-Wno-format-zero-length, so this commit adds an extra
-Wno-format-zero-length to avoid those false positives.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Justin Pettit <jpettit@nicira.com>
Ben Pfaff [Tue, 22 Jul 2014 04:00:04 +0000 (21:00 -0700)]
cmap, classifier: Avoid unsafe aliasing in iterators.
CMAP_FOR_EACH and CLS_FOR_EACH and their variants tried to use void ** as
a "pointer to any kind of pointer". That is a violation of the aliasing
rules in ISO C which technically yields undefined behavior. With GCC 4.1,
it causes both warnings and actual misbehavior. One option would to add
-fno-strict-aliasing to the compiler flags, but that would only help with
GCC; who knows whether this can be worked around with other compilers.
Instead, this commit rewrites the iterators to avoid disallowed pointer
aliasing.
VMware-BZ: #1287651 Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
The check for the need of default values was in the wrong place,
causing no prefix tracking to be used when database had no
configuration for a flow table. Missing configuration means that
defaults should be used.
To limit clutter on the log, we now log the prefix tracking
configuration when it is explicitly set in the database.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
lib/ovs-rcu: evaluate argument of ovsrcu_get only once.
As ovsrcu_get() looks like a function call, it is reasonable for the
callers to expect that the arguments are evaluated only once.
CONST_CAST expands its 'POINTER' argument multiple times, and the
exact effect of this turned out to be compiler dependent. Fix this by
expanding the macro argument before CONST_CAST, and removing
unnecessary CONST_CASTs.
netdev-dpdk: Fix race condition with DPDK mempools in non pmd threads
DPDK mempools rely on rte_lcore_id() to implement a thread-local cache.
Our non pmd threads had rte_lcore_id() == 0. This allowed concurrent access to
the "thread-local" cache, causing crashes.
This commit resolves the issue with the following changes:
- Every non pmd thread has the same lcore_id (0, for management reasons), which
is not shared with any pmd thread (lcore_id for pmd threads now start from 1)
- DPDK mbufs must be allocated/freed in pmd threads. When there is the need to
use mempools in non pmd threads, like in dpdk_do_tx_copy(), a mutex must be
held.
- The previous change does not allow us anymore to pass DPDK mbufs to handler
threads: therefore this commit partially revert 143859ec63d45e. Now packets
are copied for upcall processing. We can remove the extra memcpy by
processing upcalls in the pmd thread itself.
With the introduction of the extra locking, the packet throughput will be lower
in the following cases:
- When using internal (tap) devices with DPDK devices on the same datapath.
Anyway, to support internal devices efficiently, we needed DPDK KNI devices,
which will be proper pmd devices and will not need this locking.
- When packets are processed in the slow path by non pmd threads. This overhead
can be avoided by handling the upcalls directly in pmd threads (a change that
has already been proposed by Ryan Wilson)
Also, the following two fixes have been introduced:
- In dpdk_free_buf() use rte_pktmbuf_free_seg() instead of rte_mempool_put().
This allows OVS to run properly with CONFIG_RTE_LIBRTE_MBUF_DEBUG DPDK option
- Do not bulk free mbufs in a transmission queue. They may belong to different
mempools
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
Alex Wang [Fri, 18 Jul 2014 21:27:36 +0000 (14:27 -0700)]
netlink-socket: Do not make flow_dump block on netlink socket.
Commit 93295354 (netlink-socket: Simplify multithreaded dumping
to match Linux reality.) makes the call to recvmsg() block if no
messages are available. This can cause revalidator threads hanging
for long time or even deadlock when main thread tries to stop the
revalidator threads.
This commit fixes the issue by enabling the MSG_DONTWAIT flag in
the call to recvmsg().
Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Without this change, with shared libraries, VLOG
constructor for ovsdb-server would get called twice corrupting
the 'vlog_modules' list causing an infinite loop.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Reported-by: Gur Stavi <gstavi@mrv.com> Acked-by: Ben Pfaff <blp@nicira.com>
lib/classifier: Return all matching prefix lengths from trie lookup.
Previously we only returned the last matching prefix length
encountered during a trie lookup, and skipped subtables that had
prefixes longer than that. This patch changes the trie lookup
functions to return all matching prefix lengths seen, so that all
non-matching prefix lengths can be skipped.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
lib/classifier: Unify struct classifier and cls_classifier.
Now that it is clear that struct cls_classifier itself does not
need RCU indirection and pvector is defined in its own header, it
is possible get rid of the indirection from struct classifier to
struct cls_classifier.