Andy Zhou [Thu, 13 Mar 2014 22:28:54 +0000 (15:28 -0700)]
backtrace: Add log_backtrace()
log_backtrace() and log_backtrace_msg() logs the back trace into
the log file. It may be most useful when debugging unit tests.
"backtrace.h" documents the usage. It is not being called directly
in the code, but rather as a handy tool available when needed.
Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Andy Zhou [Fri, 14 Mar 2014 04:48:55 +0000 (21:48 -0700)]
udpif: Bug fix updif_flush
Before this commit, all datapath flows are cleared with dpif_flush(),
but the revalidator thread still holds ukeys, which are caches of the
datapath flows in the revalidaor. Flushing ukeys causes flow_del
messages to be sent to the datapath again on flows that have been
deleted by the dpif_flush() already.
Double deletion by itself is not problem, per se, may an efficiency
issue. However, for ever flow_del message sent to the datapath, a log
message, at the warning level, will be generated in case datapath
failed to execute the command. In addition to cause spurious log
messages, Double deletion causes unit tests to report erroneous
failures as all warning messages are considered test failures.
The fix is to simply shut down the revalidator threads to flush all
ukeys, then flush the datapth before restarting the revalidator threads.
dpif_flush() was implemented as flush flows of all datapaths while
most of its invocation should only flush its local datapath.
Only megaflow on/off commands should flush all dapapaths. This bug is
also fixed.
Found during development.
Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
kmindg [Sun, 9 Mar 2014 09:48:52 +0000 (17:48 +0800)]
stp: Fix bpdu tx problem in listening state
The restriction only allows to send bpdu in forwarding state in
compose_output_action__. But a port could send bpdu in listening
and learning state according to comments in lib/stp.h(State of
an STP port).
Until this commit, OVS did not send out BPDUs in listening and learning
states. But those two states are temporary, the stp port will be in
forwarding state and send out BPDUs eventually (In the default
configuration listening and learning states last 15+15 second). Therefore,
this bug increased convergence time but did not entirely break STP.
Signed-off-by: kmindg <kmindg@gmail.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
windows/netinet: Copy ip6.h and icmp6.h from netbsd.
There are a few structure definitions that is used from
these headers. So copy them from the netbsd repo.
The following changes have been made on top of it:
* The keyword "__packed" has been removed
from the headers as the corresponding Linux headers don't
do packing.
* #if BYTE_ORDER == 'X' macros have been replaced by CONSTANT_HTONx().
* code inside #ifdef _KERNEL has been deleted.
* code inside #ifdef ICMP6_STRINGS has been deleted.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Tue, 11 Mar 2014 19:46:29 +0000 (12:46 -0700)]
ovs-atomic: Use raw types, not structs, when locks are required.
Until now, the GCC 4+ and pthreads implementations of atomics have used
struct wrappers for their atomic types. This had the advantage of allowing
a mutex to be wrapped in, in some cases, and of better type-checking by
preventing stray uses of atomic variables other than through one of the
atomic_*() functions or macros. However, the mutex meant that an
atomic_destroy() function-like macro needed to be used. The struct wrapper
also made it impossible to define new atomic types that were compatible
with each other without using a typedef. For example, one could not simply
define a macro like
#define ATOMIC(TYPE) struct { TYPE value; }
and then have two declarations like:
ATOMIC(void *) x;
ATOMIC(void *) y;
and do anything with these objects that require type-compatibility, even
"&x == &y", because the two structs are not compatible. One can do it
through a typedef:
typedef ATOMIC(void *) atomic_voidp;
atomic_voidp x, y;
but that is inconvenient, especially because of the need to invent a name
for the type.
This commit aims to ease the problem by getting rid of the wrapper structs
in the cases where the atomic library used them. It gets rid of the
mutexes, in the cases where they are still needed, by using a global
array of mutexes instead.
This commit also defines the ATOMIC macro described above and documents
its use in ovs-atomic.h.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>
idl headers won't be built, if we build individual executables
e..g., "make ovsbd/ovsdb-server.exe". According to
http://www.gnu.org/software/automake/manual/html_node/Built-Sources-Example.html
we may have to add the headers as dependecies for every executables.
Currently the lack of a ovs-appctl port to Windows prevents us from
running just a "make". We plan to get ovs-appctl port done soon. Till
then, call out that the idl headers need to be built separately.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
ofp-actions: Relax build assertion condition for ofpact_nest struct.
struct ofpact has enums that are packed in case of __GNUC__.
This packing does not occur for visual studio. For 'struct ofpact_nest',
we are currently expecting that "struct ofpact actions[]" has an offset of
8 bytes. This condition won't be true in compilers where enums are
not packed.
It is good enough if struct ofpact actions[] starts at an offset which is
a multiple of OFPACT_ALIGNTO.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Windows does not have the getppid(), getuid(), getgid() functions.
We do get a random seed from CryptGenRandom(). That seed along with
process id and current time hopefully is good enough.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
kmindg [Sun, 9 Mar 2014 09:48:04 +0000 (17:48 +0800)]
ofproto: Update rule's priority in eviction group.
We do call heap_rebuild in ofproto_run, but we do not update rule's
priority with latest hard_timeout and idle_timeout before heap_rebuild.
This patch ensures that rule's priority has been updated before
heap_rebuild, and adds two test cases to check eviction with modified
hard_timeout and idle_timwout.
Signed-off-by: kmindg <kmindg@gmail.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
We use the number of cpu cores to determine the number
of threads that we spawn. We are not yet sure what is
the ideal number of OVS userspace threads that can run
on Hyper-V. Till we figure that out, use the same logic
of counting CPU cores in Windows too.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Joe Stringer [Fri, 7 Mar 2014 01:20:27 +0000 (17:20 -0800)]
tests: Test learned flow idle timeouts.
The previous tests would check that a single learned flow had its stats
correctly attributed to the right interfaces and flows. These new tests
take it a step further by causing two different learned flows to be
created, and checking the stats are correct. This is done for rules that
are learned with idle and hard timeouts.
Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Lorand Jakab [Tue, 11 Mar 2014 17:02:28 +0000 (19:02 +0200)]
README-lisp: improve LISP documentation
People familiar with LISP are used to the concept of a mapping cache in
a LISP Tunnel Router. Explain how that concept maps to OVS flow rules.
Additionally, mention that eth0 need not be added in all example
scenarios.
Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Joe Stringer [Thu, 6 Mar 2014 00:56:05 +0000 (16:56 -0800)]
upcall: Configure datapath max-idle through ovs-vsctl.
This patch adds a new configuration option, "max-idle" to the
Open_vSwitch "other-config" column. This sets how long datapath flows
are cached in the datapath before revalidators expire them.
Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Justin Pettit <jpettit@nicira.com>
We use getrusage mainly to get user CPU time and system CPU time.
Windows has a GetProcessTimes and GetThreadTimes that does the
same job. So use them.
We also use getrusage to get page faults. Use GetProcessMemoryInfo()
for that.
We also get number of context switches, block i/o times and use it for
debug information when we wake up from poll_block late. I haven't found
functions for that in Windows. We only use it for debug information, so
it should be okay not implementing it.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Co-authored-by: Linda Sun <lsun@vmware.com> Signed-off-by: Linda Sun <lsun@vmware.com> Acked-by: Ben Pfaff <blp@nicira.com>
Use GetSystemTimePreciseAsFileTime() for gettimeofday().
GetSystemTimePreciseAsFileTime() provides the result that is more
high resolution than just the microsecond that gittimeofday() in
Linux provides. So we need to remove some additional precision.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
QueryPerformanceCounter() retrieves the current value of the performance
counter, which is a high resolution (<1us) time stamp that can be used for
time-interval measurements. So, use it for MONOTONIC clock.
The GetSystemTimePreciseAsFileTime() function retrieves the current system date
and time with the highest possible level of precision (<1us). Use it for
real time clock. This function returns a counter representing the number of
100-nanosecond intervals since January 1, 1601. To make it compatible with
Linux CLOCK_REALTIME, we need to calculate the 100-nanoseconds counter value
till 01/01/1970.
An upcoming commit implements gettimeofday() using the same clock, so,
carve out a function.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Thu, 23 Jan 2014 23:35:22 +0000 (15:35 -0800)]
Makefile: Compile Linux-specific files based on __linux__ macro.
We want to conditionally compile several files based on whether we're
building for a Linux host, so we need some Automake conditional for that.
Previously this was based on whether Netlink is available and we're not
on ESX (since ESX has Netlink but isn't Linux), but it's more
straightforward to just test for Linux directly.
CC: Luigi Rizzo <rizzo@iet.unipi.it> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Thu, 23 Jan 2014 23:33:25 +0000 (15:33 -0800)]
Use __linux__ instead of LINUX_DATAPATH in C code.
The LINUX_DATAPATH C preprocessor symbol was originally meant to be used as
a signal for whether the Linux datapath module could be used, but it was
used as a proxy for a lot of other stuff that is really just Linux
specific. This commit switches all of these users to just test for
__linux__, which is more straightforward and should have the same result.
CC: Luigi Rizzo <rizzo@iet.unipi.it> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Sun, 2 Mar 2014 01:15:00 +0000 (17:15 -0800)]
tunnel: Do not set padding bits in tunnel mask.
On most architectures other than 32-bit x86, struct flow_tnl ends with 4
padding bytes. Until now, tnl_xlate_init() set those bytes to nonzero
values in the wildcard mask. When the wildcard mask passed through Netlink
attributes and back to userspace, the padding bytes of course became zero
again, which caused a wildcard mask mismatch and premature deletion of the
flow in revalidation. This commit fixes the problem.
Bug #1192516. Reported-by: Krishna Miriyala <miriyalak@vmware.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Ben Pfaff [Sun, 2 Mar 2014 01:11:02 +0000 (17:11 -0800)]
odp-util: Include tun_id when nonzero even if "key" flag not set.
When a flow_tnl is being translated to Netlink attributes, the tun_id field
was included only if the FLOW_TNL_F_KEY flag was set. This meant that for
a mask, where one would not necessarily expect that flag to be set even if
there were a key, the tun_id could be omitted even if it were nonzero.
This led to kernel flows that did not match on a field that was required
to be matched (possibly causing incorrect treatment of packets) and
premature deletion of kernel flows due to mask mismatch. This commit
fixes the problem.
Bug #1192516. Reported-by: Krishna Miriyala <miriyalak@vmware.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Ben Pfaff [Sat, 1 Mar 2014 00:20:17 +0000 (16:20 -0800)]
vconn: Fix vconn_get_status() return value when connection in progress.
When a connection takes a few rounds of the state machine to complete,
'error' gets filled with EAGAIN until that completes. This didn't match
the vconn_get_status() documentation, which says that it only returns a
positive errno value if there was an error. One could fix the problem
by updating the documentation (and the callers) or by updating the
implementation. I decided that the latter was the way to go because
the distinction between the TCP connection being in progress or complete
isn't visible to the client; what is visible to the client is the OpenFlow
negotiation being complete.
This problem is difficult to find in the unit tests because TCP connections
to localhost complete immediately.
Bug introduced by commit accaecc419cc57d (rconn: Discover errors in
rconn_run() even if rconn_recv() is never called.)
Reported-by: Anuprem Chalvadi <achalvadi@vmware.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Andy Zhou [Thu, 27 Feb 2014 02:08:04 +0000 (18:08 -0800)]
lib: simplify flow_extract() API
Change the flow_extract() API to accept struct pkt_metadata,
instead of individual metadata fields. It will make the API more
logical and easier to maintain when we need to expand metadata
down the road.
Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>¬
Ben Pfaff [Fri, 28 Feb 2014 21:12:04 +0000 (13:12 -0800)]
datapath: Correctly report flow used times for first 5 minutes after boot.
The kernel starts out its "jiffies" timer as 5 minutes below zero, as
shown in include/linux/jiffies.h:
/*
* Have the 32 bit jiffies value wrap 5 minutes after boot
* so jiffies wrap bugs show up earlier.
*/
#define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ))
The loop in ovs_flow_stats_get() starts out with 'used' set to 0, then
takes any "later" time. This means that for the first five minutes after
boot, flows will always be reported as never used, since 0 is greater than
any time already seen.
Bug #1192516. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
Joe Stringer [Thu, 27 Feb 2014 22:13:10 +0000 (14:13 -0800)]
dpif: New function flow_dump_next_may_destroy_keys().
This new function allows callers to determine whether previously
returned keys will be modified or reallocated on the next call to
dpif_flow_dump_next(). This will be used in a future commit to allow
batched flow deletion by revalidator threads.
Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Joe Stringer [Thu, 27 Feb 2014 22:13:09 +0000 (14:13 -0800)]
dpif: Don't synchronize flow_dump_next() status.
Recent changes to the flow_dump_next() interface have made it the
responsibility of the dpif implementation to track error status over a
flow dump operation.
This patch removes status tracking from 'struct dpif_flow_dump', allowing
multiple threads to call dpif_flow_dump_next() and track their status
independently. Even if one thread finishes processing flows for a given
iterator and state, it will not prevent other callers from processing
the remaining flows in their buffers.
After this patch, the error code that dpif_flow_dump_next() returns is
only significant for the current state and buffer. As before, the status
of the entire flow dump operation can be obtained by calling
dpif_flow_dump_done().
Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Joe Stringer [Thu, 27 Feb 2014 22:13:08 +0000 (14:13 -0800)]
dpif: Make dpif_flow_dump_next() thread-safe.
This patch makes it the caller's responsibility to initialize a
per-thread 'state' object and pass it down to the dpif_flow_dump_next()
implementation. The implementation can expect to be called from multiple
threads with the same 'iter' and different 'state' objects.
When flow_dump_next() returns non-zero, the implementation must ensure
that subsequent calls with the same arguments also return non-zero.
Subsequent calls with the same 'iter' and different 'state' may return
zero, but should make progress towards returning non-zero.
Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Joe Stringer [Thu, 27 Feb 2014 22:13:07 +0000 (14:13 -0800)]
dpif: Separate local and shared flow dump state.
This patch separates the structures for thread-local flow dump state
("state") from the shared flow dump state ("iter") in dpif-linux and
dpif-netdev. Future patches will make use of this to allow multiple
threads to dump flows from the same flow dump operation.
Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Joe Stringer [Thu, 27 Feb 2014 22:13:06 +0000 (14:13 -0800)]
netlink: Make nl_dump_next() thread-safe.
This patch modifies 'struct nl_dump' and nl_dump_next() to allow
multiple threads to share the same nl_dump. These changes are targeted
around synchronizing dump status between multiple callers, and
allowing callers to fully process their existing buffers before
determining whether to stop fetching flows.
The 'status' field of 'struct nl_dump' becomes atomic, so that multiple
threads may check and/or update it to communicate when there is an error
or the netlink dump is finished. The low bit holds whether the final
message was seen, while the higher bits hold an errno value.
nl_dump_next() will now read all messages from the given buffer before
checking the shared error status and attempting to fetch more. Multiple
threads may call this with the same nl_dump, but must provide
independent buffers. As previously, the final dump status can be
determined by calling nl_dump_done() from a single thread.
Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Joe Stringer [Thu, 27 Feb 2014 22:13:05 +0000 (14:13 -0800)]
netlink: Remove buffer from 'struct nl_dump'.
This patch makes all of the users of 'struct nl_dump' allocate their own
buffers to pass down to nl_dump_next(). This paves the way for allowing
multithreaded flow dumping.
Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Thu, 27 Feb 2014 19:06:30 +0000 (11:06 -0800)]
rconn: Discover errors in rconn_run() even if rconn_recv() is never called.
rconn_recv() calls vconn_recv(), which will report a connection error and
cause rconn_recv() to disconnect the rconn. Most rconn users regularly
call rconn_recv(), so that disconnection happens promptly. However,
the lswitch code only calls rconn_recv() after the connection negotiates an
OpenFlow version, so a connection that failed before negotiation would
never be detected as failed. This commit fixes the problem by making
rconn_run() also detect and handle connection errors.
The lswitch code is only used by the test-controller program, so this is
not an important bug fix.
Reported-by: Vasu Dasari <vdasari@gmail.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Thu, 27 Feb 2014 19:18:31 +0000 (11:18 -0800)]
tests: Re-fix a race.
Patch bf06c4fe (tests/ofproto-dpif.at: Workaround a race.), fixed a
race condition which patch 0a8763f (ofproto-dpif-upcall: Hardcode
max_idle to 1500ms.) unfixed.
Signed-off-by: Ethan Jackson <ethan@nicira.com> Reported-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Signed-off-by: Ben Pfaff <blp@nicira.com>
Flavio Leitner [Thu, 27 Feb 2014 12:16:34 +0000 (09:16 -0300)]
ovs-lib: allow non-root users to check service status
It tries to log the status operation, so although non-root
users can see the current status, the lack of permission
to write to the log results in an error message.
This changes to log only if the user has the permission to
write to the log file.
Joe Stringer [Wed, 26 Feb 2014 21:22:35 +0000 (13:22 -0800)]
dpif-linux: Lookup netdev to get netdev type string.
When creating tap ports in dpif-linux, the "tap" type is treated the
same as "system", and the type is discarded. When dumping datapath
port types, this would cause "tap" type to be reported as a "system"
type.
Each time we see a port of the wrong type in bridge_reconfigure(), we
remove it and add a port with the correct configuration. This would
always occur for tap ports, causing deletion and re-creation of all tap
ports each time the bridge was reconfigured.
This patch makes dpif-linux use netdev to look up port types if the
datapath reports that they are of type OVS_VPORT_TYPE_NETDEV.
Reported-by: James Schmidt <jschmidt@vmware.com> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Alex Wang <alexw@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Ctrl+C signals are a special case for Windows and can
be handled by registering a handle through
SetConsoleCtrlHandler() routine. This is only useful
when we run it directly on console and not as services in
the background.
Once we get a Ctrl+C signal, we call the cleanup functions
and then exit.
One thing to know here is that MinGW terminal handles
Ctrl+C signal differently (and looks a little buggy. I see
it exiting the handler midway with some sort of timeout).
So this implementation is only useful when run on Windows
terminal. Since we only use MinGW for compilation and
eventually to run unit tests, it should be okay. (The unit
tests would ideally use windows services and not expect
Ctrl+C)
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Windows does have pipes (the interface is a little different).
We mostly use pipes in Linux to synchronize between parent and
children and also to handle fatal signals and then wake from poll_loop().
For Windows, we are using events for the same purpose. So don't
implement pipes for Windows.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Windows does not have a SIGHUP or SIGALRM. It does have
a SIGINT and SIGTERM. The documentation at msdn says that
SIGINT is not supported for win32 applications because
WIN32 operating systems generate a new thread to specifically
handle Ctrl+C.
This commit handles SIGTERM for Windows. The documentation also
states that nothing generates SIGTERM in Windows, but one can
use raise(SIGTERM) to manage it. The idea for handling SIGTERM
for Windows is to just have a place holder if there is need to
raise() a signal for some other purpose.
We use SIGALRM in timeval.c if we wake up from a sleep after
'deadline'. For Windows, print an error message and then
use SIGTERM.
There is an atexit() function for Windows, so we can call cleanup
functions during exit.
An upcoming commit separately handles Ctrl+C so that we can call
clean up functions for that use case.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
I have not seen a use case where the "lib" prefix is needed.
In my tests, I see that adding the additional "lib" prefix
causes libraries to not be recognized.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>