Pravin [Thu, 20 Mar 2014 17:57:41 +0000 (10:57 -0700)]
dpif-netdev: Add poll-mode-device thread.
This patch adds a PMD netdev type for network devices with poll-mode
drivers. Since there is no way to get a signal on packet receive
from these devices, we need to poll them in a busy loop. To minimize
system call overhead, this patch uses the dpif thread exclusively
for PMD devices; the remaining devices, which need system calls to
do I/O, are moved to dpif-netdev-run().
PMD devices like DPDK work in userspace, so there is no system call
overhead for them.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
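The busy-loop polling described in the commit above can be sketched as follows. This is only an illustration, not the dpif-netdev code: struct pmd_dev, rx_burst(), pmd_poll_loop() and the fake device are all hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { BURST_MAX = 32 };

/* Hypothetical poll-mode device.  rx_burst() must not block: it returns the
 * number of packets stored in 'pkts', or 0 when nothing is waiting. */
struct pmd_dev {
    int (*rx_burst)(struct pmd_dev *dev, int *pkts, int max);
    int pending;                /* Used only by the fake device below. */
};

/* Busy-polls every device in turn until '*stop' is set.  The loop itself
 * makes no system calls; it returns the number of packets it received. */
static unsigned long
pmd_poll_loop(struct pmd_dev *devs, size_t n_devs, volatile bool *stop)
{
    unsigned long total = 0;
    int pkts[BURST_MAX];

    while (!*stop) {
        for (size_t i = 0; i < n_devs; i++) {
            int n = devs[i].rx_burst(&devs[i], pkts, BURST_MAX);

            total += n;         /* A real thread would process 'pkts' here. */
        }
    }
    return total;
}

/* Fake device for demonstration: hands out 'pending' packets in bursts and
 * raises the stop flag once it has been drained. */
static volatile bool demo_stop;

static int
fake_rx_burst(struct pmd_dev *dev, int *pkts, int max)
{
    int n = dev->pending < max ? dev->pending : max;

    for (int i = 0; i < n; i++) {
        pkts[i] = i;
    }
    dev->pending -= n;
    if (!dev->pending) {
        demo_stop = true;
    }
    return n;
}
```

A dedicated thread running such a loop never sleeps, which is why devices that need system calls for I/O are better served elsewhere.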
Pravin [Thu, 20 Mar 2014 17:54:37 +0000 (10:54 -0700)]
netdev: Extend rx_recv to pass multiple packets.
DPDK can receive multiple packets at once, but the current netdev API
does not allow that. This patch allows dpif-netdev to receive a batch
of packets in a single rx_recv() call for any netdev port. This will
be used by dpdk-netdev.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Jarno Rajahalme [Tue, 25 Mar 2014 00:34:48 +0000 (17:34 -0700)]
datapath: Avoid assigning a NULL pointer to flow actions.
Flow SET can accept an empty set of actions, with the intended
semantics of leaving existing actions unmodified. This seems to have
been broken after OVS 1.7, as we have assigned the flow's actions
pointer to NULL in this case, but we never check for the NULL pointer
later on. This patch restores the intended behavior and documents it
in the include/linux/openvswitch.h.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
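The intended Flow SET semantics from the commit above can be sketched like this. The types and the flow_set() helper are hypothetical stand-ins, and the sketch assumes that "no actions supplied" is represented by a NULL pointer; the datapath's real structures differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the datapath's flow and action list. */
struct actions {
    size_t len;
};

struct flow {
    const struct actions *acts;
};

/* Applies a SET request.  'new_acts' is NULL when the request carried no
 * actions; in that case the existing actions are left untouched rather
 * than being overwritten with NULL. */
static void
flow_set(struct flow *flow, const struct actions *new_acts)
{
    if (new_acts) {
        flow->acts = new_acts;
    }
}
```

With this guard, later code that dereferences the flow's actions never sees a NULL pointer as a result of an empty SET.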
packets: packet metadata from flow function instead of macro.
Commit 03fbdf8d9c80a (lib/flow: Retain ODPP_NONE on flow_extract())
replaced the packet metadata initialization function with a macro.
Visual Studio does not like the nested structure initialization that
is done in that macro.
Jarno Rajahalme [Tue, 18 Mar 2014 23:32:45 +0000 (16:32 -0700)]
datapath: Compact sw_flow_key.
Minimize padding in sw_flow_key and move 'tp' to the main struct.
These changes simplify code when accessing the transport port numbers
and the TCP flags, and make the sw_flow_key 8 bytes smaller on 64-bit
systems (128->120 bytes). These changes also make the keys for IPv4
packets fit in one cache line.
There is a valid concern for safety of packing the struct
ovs_key_ipv4_tunnel, as it would be possible to take the address of
the tun_id member as a __be64 * which could result in unaligned access
in some systems. However:
- sw_flow_key itself is 64-bit aligned, so the tun_id within is always
64-bit aligned.
- We never make arrays of ovs_key_ipv4_tunnel (which would force every
second tun_key to be misaligned).
- We never take the address of the tun_id into a __be64 *.
- Wherever we use struct ovs_key_ipv4_tunnel outside the sw_flow_key,
it is on the stack (in tunnel input functions), where the compiler has
full control of the alignment.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
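The effect of member ordering on padding, which the commit above relies on, can be seen in a small sketch. The two structs are hypothetical and much smaller than sw_flow_key; they only show how ordering members from largest to smallest removes padding while keeping the 64-bit member aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key with members in an order that forces padding after
 * 'ip_proto', 'tp_src' and 'tp_dst' on common ABIs. */
struct key_padded {
    uint8_t ip_proto;
    uint64_t tun_id;
    uint16_t tp_src;
    uint32_t ipv4_src;
    uint16_t tp_dst;
};

/* The same members ordered from largest to smallest. */
struct key_compact {
    uint64_t tun_id;
    uint32_t ipv4_src;
    uint16_t tp_src;
    uint16_t tp_dst;
    uint8_t ip_proto;
};

/* The 64-bit member stays 64-bit aligned inside the containing struct. */
_Static_assert(offsetof(struct key_compact, tun_id) % 8 == 0,
               "tun_id must stay 64-bit aligned");

/* Number of bytes saved by the reordering on the current ABI. */
static size_t
key_bytes_saved(void)
{
    return sizeof(struct key_padded) - sizeof(struct key_compact);
}
```

On a typical x86-64 ABI the padded layout takes 32 bytes and the compact one 24; the exact numbers depend on the ABI.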
Jarno Rajahalme [Tue, 18 Mar 2014 23:32:45 +0000 (16:32 -0700)]
datapath: Fix output of SCTP mask.
The 'output' argument of the ovs_nla_put_flow() is the one from which
the bits are written to the netlink attributes. For SCTP we
accidentally used the bits from the 'swkey' instead. This caused the
mask attributes to include the bits from the actual flow key instead
of the mask.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Alexandru Copot [Mon, 3 Mar 2014 13:22:32 +0000 (15:22 +0200)]
ofproto: Allow the use of the OpenFlow 1.4 protocol
This defines the version number for OpenFlow 1.4 so that the switch
can actually use it. The ovsdb schema is also modified.
Signed-off-by: Alexandru Copot <alex.mihai.c@gmail.com>
Cc: Daniel Baluta <dbaluta@ixiacom.com>
[blp@nicira.com adjusted code in cases where 1.3 and 1.4 are the same]
Signed-off-by: Ben Pfaff <blp@nicira.com>
Simon Horman [Thu, 13 Mar 2014 06:52:55 +0000 (15:52 +0900)]
ofproto-dpif: Differentiate between different miss types in packet in
Replace the generated_by_table_miss field of struct ofproto_packet_in
with a miss_type field.
The generated_by_table_miss field allowed packet-in messages generated
by table-miss rules to be differentiated. This differentiation
is still provided for by miss_type being set to OFPROTO_PACKET_IN_MISS_FLOW.
This patch allows further differentiation by setting miss_type
to OFPROTO_PACKET_IN_MISS_WITHOUT_FLOW if the packet-in message
is generated by a table-miss which is not handled by a table-miss rule.
This is in preparation for OpenFlow 1.3 version-specific
handling of the default action for such misses.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Tue, 31 Dec 2013 18:35:27 +0000 (10:35 -0800)]
odp-util: Fix VLAN parsing behavior in parse_8021q_onward().
Anytime there is a VLAN the flow needs to properly reflect that. Keeping
the TPID in dl_type never makes sense and will probably cause problems.
The existing code did the right thing in the common case but not in corner
cases where it returned ODP_FIT_TOO_MUCH or ODP_FIT_TOO_LITTLE (the cases
where it returned an error don't matter since nothing looks at the flow
in that case).
Simon Horman [Thu, 20 Mar 2014 20:42:22 +0000 (13:42 -0700)]
ofproto: Honour Table Mod settings for table-miss handling
This reworks lookup of rules for both table 0 and table action translation.
The result is that Table Mod settings, which can alter the miss-behaviour
of tables, including table 0, on a per-table basis may be honoured.
Previous patches proposed by myself which build on earlier merged patches
by Andy Zhou implement the ofproto side of Table Mod. So with this patch
the feature should be complete.
Neither this patch, nor any other patches it builds on, alter the default
behaviour of Open vSwitch. And in particular the OpenFlow 1.1 behaviour is
the default regardless of which OpenFlow version is negotiated between the
switch and the controller.
An implementation detail, which lends itself to future work, is the
handling of OFPTC_TABLE_MISS_CONTINUE. If a table has this behaviour set by
Table Mod and a miss occurs then a loop is created, skipping to the next
table. It is quite easy to create a situation where this loop covers ~255
tables which is very expensive as the lookup for each table involves taking
locks, amongst other things.
Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
[blp@nicira.com updated comments and refactored]
Signed-off-by: Ben Pfaff <blp@nicira.com>
Alex Wang [Wed, 19 Mar 2014 23:19:28 +0000 (16:19 -0700)]
cfm: Define old_cfm_fault as 'enum cfm_fault_reason'.
The CFM fault variable type was changed to 'enum cfm_fault_reason' a
long time ago. However, inside cfm_run(), old_cfm_fault is still
defined as a boolean. This commit fixes the issue.
Found by inspection.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
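One way such a type mismatch can matter is sketched below. The enum values and helper names are hypothetical, and the commit does not say which misbehaviour, if any, was observed; the sketch only shows that a bitmask saved in a bool collapses to 0 or 1, so a later comparison with the bitmask can be wrong in both directions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fault bitmask, standing in for 'enum cfm_fault_reason'. */
enum fault_reason {
    FAULT_NONE = 0,
    FAULT_RECV = 1 << 0,
    FAULT_RDI = 1 << 1,
};

/* Saves the old value in a bool, as the code did before the fix. */
static bool
fault_changed_bool(enum fault_reason old_reason, enum fault_reason new_reason)
{
    bool old_fault = old_reason;        /* Collapses to 0 or 1. */

    return (int) old_fault != (int) new_reason;
}

/* Saves the old value with its real type. */
static bool
fault_changed_enum(enum fault_reason old_reason, enum fault_reason new_reason)
{
    enum fault_reason old_fault = old_reason;

    return old_fault != new_reason;
}
```

With the bool, an unchanged FAULT_RDI looks like a change (1 != 2), and a drop from FAULT_RECV | FAULT_RDI to FAULT_RECV looks like no change (1 == 1).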
Alex Wang [Wed, 26 Feb 2014 18:07:38 +0000 (10:07 -0800)]
dpif-netdev: Implement the API functions to allow multiple handler
threads read upcall.
This commit implements the API functions to allow multiple handler
threads read upcall.
Also, this commit removes the handling priority of DPIF_UC_MISS
over DPIF_UC_ACTION, so both kinds of upcalls are put in the same
queue. The decision is based on the fact that a lot has changed
since the days when flow setup rate was most treasured, and starving
all actions in the presence of any flow misses doesn't seem like
a sound balancing solution.
Thus the current implementation will be tested, and the
investigation of a better balancing solution will continue if
there is an issue.
Also note that the introduction and use of flow_hash_5tuple() can
put missed ICMP packets from the same source but with different
type/code into different handler queues. This may cause reordering
of these packets. For now, we do not count this as a problem.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
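The ICMP reordering note above can be illustrated with a sketch. It assumes that the ICMP type and code occupy the transport port fields of the 5-tuple; the struct, the mixing function and handler_queue() are hypothetical stand-ins, not the real flow_hash_5tuple().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 5-tuple.  For ICMP, 'tp_src' and 'tp_dst' are assumed to
 * carry the ICMP type and code. */
struct five_tuple {
    uint32_t ip_src;
    uint32_t ip_dst;
    uint8_t ip_proto;
    uint16_t tp_src;
    uint16_t tp_dst;
};

/* Simple invertible mixing step, used only for this illustration. */
static uint32_t
mix(uint32_t hash, uint32_t value)
{
    hash ^= value;
    hash *= 0x9e3779b1u;
    return hash ^ (hash >> 15);
}

static uint32_t
hash_5tuple(const struct five_tuple *t)
{
    uint32_t hash = 0;

    hash = mix(hash, t->ip_src);
    hash = mix(hash, t->ip_dst);
    hash = mix(hash, t->ip_proto);
    hash = mix(hash, t->tp_src);
    hash = mix(hash, t->tp_dst);
    return hash;
}

/* Picks one of 'n_handlers' queues for a missed packet. */
static unsigned int
handler_queue(const struct five_tuple *t, unsigned int n_handlers)
{
    return hash_5tuple(t) % n_handlers;
}
```

Two ICMP packets between the same hosts that differ only in type hash differently here, so they may land in different queues and be handled out of order.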
Jarno Rajahalme [Wed, 19 Mar 2014 23:13:32 +0000 (16:13 -0700)]
dpif-netdev: Use packet key to parse TCP flags.
The flow that created the netdev_flow might have wildcarded TCP flags,
or it may not be a TCP flow at all. Fix this by using the freshly
extracted flow key to parse TCP flags.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Alex Wang [Sat, 15 Mar 2014 01:30:39 +0000 (18:30 -0700)]
cfm: Notify connectivity_seq on remote maintenance points change.
Commit f23d157c ("ofproto-dpif: Don't poll ports when nothing changes")
did not ensure the update of the row of remote maintenance points in ovsdb
when it changes. This commit makes the update happen by notifying the
global connectivity_seq.
Alex Wang [Wed, 19 Mar 2014 17:42:08 +0000 (10:42 -0700)]
ovs-rcu: Call ovsrcu_init() in ovsrcu_quiesce().
This commit fixes a bug introduced by 0f2ea848(ovs-rcu: New library.).
It is possible that ovsrcu_quiesce() is called before ovsrcu_init().
So, it is necessary to call ovsrcu_init() in ovsrcu_quiesce().
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jarno Rajahalme [Wed, 19 Mar 2014 15:51:52 +0000 (08:51 -0700)]
lib/hmap: Remove the memory fence from hmap_is_empty().
The fence made classifier_lookup() slower. Access to a size_t 'n' is
safe without synchronizing, and if racing with writers matters,
additional synchronization primitives are used anyway.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Alex Wang [Fri, 14 Mar 2014 18:47:30 +0000 (11:47 -0700)]
datapath: compat: Downstream the reciprocal_div.{c,h}.
The reciprocal division code used in datapath is flawed. The bug
has been fixed in the linux kernel repo in commit 809fa972fd(
reciprocal_divide: update/correction of the algorithm).
This commit downstreams the reciprocal_div.{c,h} from the linux
kernel repo.
Signed-off-by: Alex Wang <alexw@nicira.com>
Reviewed-by: Thomas Graf <tgraf@redhat.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
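For readers unfamiliar with reciprocal division, the following sketch shows a Granlund-Montgomery style scheme that replaces a 32-bit division with a multiply and two shifts. To my understanding the corrected kernel code follows the same scheme, but this is not a copy of it; the names recip_value, recip_prepare() and recip_divide() are chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Precomputed "magic" multiplier and shifts for one divisor. */
struct recip_value {
    uint32_t m;
    uint8_t sh1;
    uint8_t sh2;
};

/* Returns the 1-based index of the most significant set bit, or 0. */
static int
fls32(uint32_t x)
{
    int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/* Precomputes the reciprocal of 'd', which must be nonzero. */
static struct recip_value
recip_prepare(uint32_t d)
{
    struct recip_value r;
    int l = fls32(d - 1);                       /* ceil(log2(d)) */
    uint64_t m = (UINT64_C(1) << 32) * ((UINT64_C(1) << l) - d);

    m = m / d + 1;
    r.m = (uint32_t) m;
    r.sh1 = l < 1 ? l : 1;                      /* min(l, 1) */
    r.sh2 = l > 1 ? l - 1 : 0;                  /* max(l - 1, 0) */
    return r;
}

/* Computes a / d using only a multiply, an add and shifts. */
static uint32_t
recip_divide(uint32_t a, struct recip_value r)
{
    uint32_t t = (uint32_t) (((uint64_t) a * r.m) >> 32);

    return (t + ((a - t) >> r.sh1)) >> r.sh2;
}
```

The precomputation is done once per divisor; each later division then costs a 64-bit multiply instead of a hardware divide.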
Andy Zhou [Thu, 13 Mar 2014 22:28:54 +0000 (15:28 -0700)]
backtrace: Add log_backtrace()
log_backtrace() and log_backtrace_msg() log the backtrace into
the log file. They may be most useful when debugging unit tests.
"backtrace.h" documents the usage. They are not called directly
in the code, but are available as a handy tool when needed.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Andy Zhou [Fri, 14 Mar 2014 04:48:55 +0000 (21:48 -0700)]
udpif: Bug fix udpif_flush
Before this commit, all datapath flows were cleared with dpif_flush(),
but the revalidator threads still held ukeys, which are caches of the
datapath flows in the revalidator. Flushing ukeys caused flow_del
messages to be sent to the datapath again for flows that had already
been deleted by dpif_flush().
Double deletion by itself is not a problem per se, though it may be an
efficiency issue. However, for every flow_del message sent to the
datapath, a log message at the warning level is generated if the
datapath fails to execute the command. In addition to causing spurious
log messages, double deletion causes unit tests to report erroneous
failures, as all warning messages are considered test failures.
The fix is simply to shut down the revalidator threads to flush all
ukeys, then flush the datapath before restarting the revalidator threads.
dpif_flush() was implemented to flush the flows of all datapaths, while
most of its invocations should flush only the local datapath.
Only the megaflow on/off commands should flush all datapaths. This bug
is also fixed.
Found during development.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
kmindg [Sun, 9 Mar 2014 09:48:52 +0000 (17:48 +0800)]
stp: Fix bpdu tx problem in listening state
The restriction in compose_output_action__ only allows BPDUs to be
sent in the forwarding state. But a port may send BPDUs in the
listening and learning states according to the comments in lib/stp.h
(State of an STP port).
Until this commit, OVS did not send out BPDUs in the listening and
learning states. But those two states are temporary; the STP port will
eventually reach the forwarding state and send out BPDUs (in the
default configuration the listening and learning states last 15+15
seconds). Therefore, this bug increased convergence time but did not
entirely break STP.
Signed-off-by: kmindg <kmindg@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
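The distinction the commit above draws between forwarding data and transmitting BPDUs can be sketched as two predicates. The enum and function names are hypothetical, and the treatment of the disabled and blocking states is an assumption based on classic 802.1D behaviour rather than something the commit states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical STP port states. */
enum port_state {
    PORT_DISABLED,
    PORT_BLOCKING,
    PORT_LISTENING,
    PORT_LEARNING,
    PORT_FORWARDING,
};

/* Data frames are forwarded only in the forwarding state. */
static bool
may_forward_data(enum port_state state)
{
    return state == PORT_FORWARDING;
}

/* BPDUs may also be transmitted while listening and learning. */
static bool
may_send_bpdu(enum port_state state)
{
    return state == PORT_LISTENING
           || state == PORT_LEARNING
           || state == PORT_FORWARDING;
}

/* Time spent in listening plus learning before forwarding starts. */
static int
seconds_until_forwarding(int forward_delay)
{
    return 2 * forward_delay;
}
```

Using the data-forwarding test for BPDUs delays their transmission by the listening and learning periods, which matches the 15+15 seconds mentioned above for the default forward delay.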
windows/netinet: Copy ip6.h and icmp6.h from netbsd.
A few structure definitions from these headers are used,
so copy the headers from the netbsd repo.
The following changes have been made on top of it:
* The keyword "__packed" has been removed
from the headers as the corresponding Linux headers don't
do packing.
* #if BYTE_ORDER == 'X' macros have been replaced by CONSTANT_HTONx().
* code inside #ifdef _KERNEL has been deleted.
* code inside #ifdef ICMP6_STRINGS has been deleted.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Tue, 11 Mar 2014 19:46:29 +0000 (12:46 -0700)]
ovs-atomic: Use raw types, not structs, when locks are required.
Until now, the GCC 4+ and pthreads implementations of atomics have used
struct wrappers for their atomic types. This had the advantage of allowing
a mutex to be wrapped in, in some cases, and of better type-checking by
preventing stray uses of atomic variables other than through one of the
atomic_*() functions or macros. However, the mutex meant that an
atomic_destroy() function-like macro needed to be used. The struct wrapper
also made it impossible to define new atomic types that were compatible
with each other without using a typedef. For example, one could not simply
define a macro like
#define ATOMIC(TYPE) struct { TYPE value; }
and then have two declarations like:
ATOMIC(void *) x;
ATOMIC(void *) y;
and do anything with these objects that require type-compatibility, even
"&x == &y", because the two structs are not compatible. One can do it
through a typedef:
typedef ATOMIC(void *) atomic_voidp;
atomic_voidp x, y;
but that is inconvenient, especially because of the need to invent a name
for the type.
This commit aims to ease the problem by getting rid of the wrapper structs
in the cases where the atomic library used them. It gets rid of the
mutexes, in the cases where they are still needed, by using a global
array of mutexes instead.
This commit also defines the ATOMIC macro described above and documents
its use in ovs-atomic.h.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
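The idea of raw atomic types guarded by a global array of mutexes can be sketched as follows. This is an assumption-laden illustration, not the ovs-atomic implementation: the lock count, the address hash and the locked_*() helpers are invented for the example, and ATOMIC(TYPE) is simply defined as the raw type.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* With raw types, two ATOMIC(int) objects have the same, compatible type. */
#define ATOMIC(TYPE) TYPE

enum { N_ATOMIC_LOCKS = 4 };

/* Global locks shared by all atomic objects, instead of one mutex embedded
 * in each object. */
static pthread_mutex_t atomic_locks[N_ATOMIC_LOCKS] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

/* Maps an object's address to one of the global locks. */
static pthread_mutex_t *
lock_for(const void *addr)
{
    return &atomic_locks[((uintptr_t) addr >> 4) % N_ATOMIC_LOCKS];
}

/* Adds 'delta' to '*obj' under the object's lock; returns the old value. */
static int
locked_add_int(ATOMIC(int) *obj, int delta)
{
    pthread_mutex_t *mutex = lock_for(obj);
    int old;

    pthread_mutex_lock(mutex);
    old = *obj;
    *obj += delta;
    pthread_mutex_unlock(mutex);
    return old;
}

/* Reads '*obj' under the object's lock. */
static int
locked_read_int(ATOMIC(int) *obj)
{
    pthread_mutex_t *mutex = lock_for(obj);
    int value;

    pthread_mutex_lock(mutex);
    value = *obj;
    pthread_mutex_unlock(mutex);
    return value;
}
```

Because no mutex lives inside the object, nothing needs to be destroyed when the object goes away, and a pointer to one ATOMIC(int) can be reassigned to another without a typedef.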
idl headers won't be built if we build individual executables,
e.g., "make ovsdb/ovsdb-server.exe". According to
http://www.gnu.org/software/automake/manual/html_node/Built-Sources-Example.html
we may have to add the headers as dependencies for every executable.
Currently the lack of an ovs-appctl port to Windows prevents us from
running just a "make". We plan to get the ovs-appctl port done soon. Until
then, call out that the idl headers need to be built separately.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
ofp-actions: Relax build assertion condition for ofpact_nest struct.
struct ofpact has enums that are packed in the case of __GNUC__.
This packing does not occur for Visual Studio. For 'struct ofpact_nest',
we are currently expecting that "struct ofpact actions[]" has an offset of
8 bytes. This condition won't be true in compilers where enums are
not packed.
It is good enough if struct ofpact actions[] starts at an offset which is
a multiple of OFPACT_ALIGNTO.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
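The relaxed assertion can be sketched with stand-in types. Everything here is hypothetical (act_hdr, act_nest, ACT_ALIGNTO and the padding macros are not the ofp-actions definitions); the point is only that checking for a multiple of the alignment holds whether the compiler stores the enum in one byte or four, while checking for an offset of exactly 8 would not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ACT_ALIGNTO 8
#define ROUND_UP(X, Y) (((X) + (Y) - 1) / (Y) * (Y))

enum act_type {
    ACT_OUTPUT,
    ACT_NEST,
};

/* Hypothetical action header.  Its size depends on how large the compiler
 * makes 'enum act_type'. */
struct act_hdr {
    enum act_type type;
    uint16_t len;
};

/* Offset of the nested actions: the next multiple of ACT_ALIGNTO that
 * leaves room for at least one byte of padding. */
#define NEST_OFS ROUND_UP(sizeof(struct act_hdr) + 1, ACT_ALIGNTO)
#define NEST_PAD (NEST_OFS - sizeof(struct act_hdr))

struct act_nest {
    struct act_hdr hdr;
    uint8_t pad[NEST_PAD];
    struct act_hdr actions[];
};

/* Relaxed check: any multiple of the alignment is acceptable. */
_Static_assert(offsetof(struct act_nest, actions) % ACT_ALIGNTO == 0,
               "nested actions must be aligned");
```

With a four-byte enum the nested actions start at offset 16 in this sketch, and with a one-byte packed enum they start at offset 8; both satisfy the relaxed check.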
Windows does not have the getppid(), getuid(), getgid() functions.
We do get a random seed from CryptGenRandom(). That seed along with
process id and current time hopefully is good enough.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
kmindg [Sun, 9 Mar 2014 09:48:04 +0000 (17:48 +0800)]
ofproto: Update rule's priority in eviction group.
We do call heap_rebuild in ofproto_run, but we do not update the rule's
priority with the latest hard_timeout and idle_timeout before heap_rebuild.
This patch ensures that the rule's priority has been updated before
heap_rebuild, and adds two test cases to check eviction with modified
hard_timeout and idle_timeout.
Signed-off-by: kmindg <kmindg@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>