Ben Pfaff [Wed, 11 Sep 2013 04:31:59 +0000 (21:31 -0700)]
ofproto: Mark immutable members of struct rule 'const'.
One difficult part of make flow_mods thread-safe is sorting out which
members of each structure can be modified under which locks and,
especially, documenting this in a way that makes it hard for programmers to
get it wrong later. The compiler provides some tools for us, however, and
'const' is one of the nicer ones since it is standardized rather than part
of a compiler extension.
This commit makes use of 'const' to mark the immutable members of struct
rule, which is definitely the most confusing structure regarding thread
safety simply because it has so many members that use different forms of
synchronization. It also adds a bunch of CONST_CASTs to allow these
members to be initialized and destroyed where necessary.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Ben Pfaff [Wed, 11 Sep 2013 04:18:09 +0000 (21:18 -0700)]
ofproto: Remove ->timeout_mutex from struct rule (just use ->mutex).
I think that ->timeout_mutex and ->mutex were separate because the latter
(which was actually a rwlock named ->rwlock until recently) was held for
long periods of time, which meant that having a separate ->timeout_mutex
reduced lock contention. But ->mutex is now only held briefly, so it seems
reasonable to just use it.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Ben Pfaff [Thu, 12 Sep 2013 07:28:49 +0000 (00:28 -0700)]
ofproto: Replace rwlock in struct rule by a mutex.
A rwlock is suitable when one expects long hold times so there is a need
for some parallelism for readers. But when the lock is held briefly, it
makes more sense to use a mutex for two reasons. First, a rwlock is more
complex than a mutex so one would expect it to be more expensive to
acquire. Second, a rwlock is less fair than a mutex: as long as there are
any readers this blocks out writers.
At least, that's the behavior I intuitively expect, and a few looks around
the web suggest that I'm not the only one.
Previously, struct rule's rwlock was held for long periods, so using a
rwlock made sense. Now it is held only briefly, so this commit replaces it
by a mutex.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Ben Pfaff [Thu, 12 Sep 2013 07:31:33 +0000 (00:31 -0700)]
ofproto: Drop 'expirable_mutex' in favor of new global 'ofproto_mutex'.
This is the first step toward making a global lock that protects everything
needed for updating the flow table. This commit starts out by merging one
lock into the new one, and later commits will continue that process.
The mutex is initially a recursive one, because I wasn't sure that there
were no nested acquisitions at this stage, but a later commit will change
it to a regular error-checking mutex once it's settled down a bit.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Ben Pfaff [Thu, 12 Sep 2013 07:25:28 +0000 (00:25 -0700)]
ofproto: Move rule_execute() out of ofopgroup_complete().
One goal we're moving toward is to be able to execute "learn" actions
in each thread that handles packets that arrive on an interface just before
we forward those packets. The planned strategy there is to have a global
lock that protects everything required to modify the flow table. Generally
this works out well, but rule_execute() is a trouble spot. That's because
it's called during a flow table modification (when that global lock is
held) which means that a "learn" action triggered by the executing the
packet would try to recursively modify the flow table and reacquire the
global lock.
I can see two ways out of this issue. One would be to make the global lock
a recursive one, or otherwise deal with handling recursive flow_mods. The
other is to just queue up flow_mods due to rule_execute(), which itself is
a corner case (it only happens when a flow_mod sent via OpenFlow includes
a buffer_id). (I guess there could be other more radical solutions, like
just dropping packets that contain "learn" actions if they occur in this
situation.)
This commit implements the second solution because it seems less likely to
be wrong in a way that crashes the switch.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Ben Pfaff [Fri, 13 Sep 2013 00:42:23 +0000 (17:42 -0700)]
guarded-list: New data structure for thread-safe queue.
We already had queues that were suitable for replacement by this data
structure, and I intend to add another one later on.
flow_miss_batch_ofproto_destroyed() did not work well with the guarded-list
structure (it required either adding a lot more functions or breaking the
abstraction) so I changed the caller to just use udpif_revalidate().
Checking reval_seq at the end of handle_miss_upcalls() also didn't work
well with the abstraction, so I decided that since this was a corner case
anyway it would be acceptable to just drop those in flow_miss_batch_next().
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Ben Pfaff [Thu, 12 Sep 2013 06:23:00 +0000 (23:23 -0700)]
ofproto: Add a ref_count to "struct rule" to protect it from being freed.
Taking a read-lock on the 'rwlock' member of struct rule prevents members
of the rule from changing. This is a short-term use of the 'rwlock': one
takes the lock, reads some members, and releases the lock.
Taking a read-lock on the 'rwlock' also prevents the rule from being freed.
This is often a relatively long-term need. For example, until now flow
translation has held the rwlock in xlate_table_action() across the entire
recursive translation, which can call into a great deal of different code
across multiple files.
This commit switches to using a reference count for this second purpose
of struct rule's rwlock. This means that all the code that previously
held a read-lock on the rwlock across deep stacks of functions can now
simply keep a reference.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Ben Pfaff [Mon, 9 Sep 2013 18:19:41 +0000 (11:19 -0700)]
ofproto: Remove soon-to-be-invalid optimizations.
Until now, ofproto_add_flow() and ofproto_delete_flow() assumed that the
flow table could not change between its flow table check and its later
modification to the flow table. This assumption will soon be untrue, so
remove it.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Ben Pfaff [Sun, 1 Sep 2013 21:46:04 +0000 (14:46 -0700)]
ofproto-dpif: Remove vestigial "clogged" feature.
Before this commit, ofproto-dpif did not ordinarily ever defer
completion of a flow_mod, but it was possible to use "ovs-appctl
ofproto/clog" to force deferring completion, which allowed some basic
debugging of deferred completion. I only ever used that feature once,
while doing initial debugging of the deferred completion feature. I
intend in future commits to make changes that will make deferred
completion harder to implement in ofproto-dpif, so this commit removes
that feature in advance.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Ben Pfaff [Wed, 11 Sep 2013 23:41:32 +0000 (16:41 -0700)]
ofproto: Eliminate 'ofproto_node' member from struct rule.
The ofproto_node member is convenient for collecting lists of rules, but
it is also challenging for concurrency because only a single thread at a
time can put a given rule on a list. This commit eliminates the
ofproto_node member and introduces a new 'struct rule_collection' that
can be use in a thread-safe manner.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
ofproto: Use proper error code when meter_id is out of range.
Use OUT_OF_METERS when given meter_id is greater than what is supported
by the datapath. Retain the INVALID_METER error code for the meter_ids
outside of the range supported by the specification.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Alex Wang [Thu, 12 Sep 2013 18:54:16 +0000 (11:54 -0700)]
bridge: Always call smap_destroy() after smap_init()
This commit fixes a place in bridge.c where smap_destroy() is not
always called after smap_init(). Though there is no memory leak
now, it is necessary to fix it and prevent memory leak in the
future when smap_init() may be modified to allocate dynamic memory.
Reported-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: Alex Wang <alexw@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Daniel Baluta [Wed, 11 Sep 2013 21:56:53 +0000 (00:56 +0300)]
ofproto: update flow_stats flags on flow_stats_request
This is a first step in implementing 'on demand flow counters'.
We save relevant flow_mod flags (OFPUTIL_FF_SEND_FLOW_REM,
OFPUTIL_FF_NO_PKT_COUNTS, OFPUTIL_FF_NO_BYT_COUNTS) into newly created rule
when a new flow is added, and echo them back in the flow stats request.
Notice that we remove send_flow_removed flag from struct rule/struct ofoperation
and we replace it with an enum tracking all ofputil_flow_mod_flags.
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Andy Zhou [Tue, 10 Sep 2013 20:33:08 +0000 (13:33 -0700)]
openflow-1.1+: move OFPTC11_TABLE_MISS_MASK out of enum
OFPTC11_TABLE_MISS_MASK is not a valid configuration, but to indicate
there are only 2 bits being used for table miss configuration. Move
it out of the enum definition.
Reported-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ethan Jackson [Fri, 6 Sep 2013 19:51:02 +0000 (12:51 -0700)]
ofproto-dpif: Don't hold mac rwlock while calling send_packet().
Holding the mac learning rwlock while calling send_packet() which
calls xlate_actions() is dangerous because somewhere deep in the chain
the same lock might be acquired causing a deadlock. Though we haven't
run into this problem in testing, it seems prudent to refactor it.
Reported-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
OVS has its own workq implementation for coupe of reasons. first
was to avoid system freeze due to ovs-flow rehash softlockup.
We have moved out rehash from workq, So this problem does not exist.
second was related bugs in kernel workq implementation in pre-2.6.32
kernel. But we have dropped support for older kernel.
So there is no reason to keep ovs-workq around. Following patch
removes it.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Daniel Borkmann [Mon, 9 Sep 2013 20:27:27 +0000 (13:27 -0700)]
datapath: flow: fix potential illegal memory access in __parse_flow_nlattrs
In function __parse_flow_nlattrs(), we check for condition
(type > OVS_KEY_ATTR_MAX) and if true, print an error, but we do
not return from this function as in other checks. It seems this
has been forgotten, as otherwise, we could access beyond the
memory of ovs_key_lens, which is of ovs_key_lens[OVS_KEY_ATTR_MAX + 1].
Hence, a maliciously prepared nla_type from user space could access
beyond this upper limit.
Introduced by 03f0d916a ("openvswitch: Mega flow implementation").
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
Guolin Yang [Mon, 9 Sep 2013 16:38:01 +0000 (09:38 -0700)]
util: Don't set thread name to empty
In monitor_daemon(), it set subprogram_name to "" which causes system crash
in some platform when trying to set the thread name to "". This change set
thread name to program name in this case.
Signed-off-by: Guolin Yang <gyang@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
datapath: Move flow table rehashing to flow install.
Rehashing in ovs-workqueue can cause ovs-mutex lock contentions
in case of heavy flow setups where both needs ovs-mutex. So by
moving rehashing to flow-setup we can eliminate contention.
This also simplify ovs locking and reduces dependence on
workqueue.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
datapath lisp: Use ovs offload compat functionality.
OVS already has compat functions to handle GSO packets.
Following patch get rid of GSO packet handling in lisp
and use ovs iptunnel_xmit() function for same.
CC: Lori Jakab <lojakab@cisco.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
CC: Lori Jakab <lojakab@cisco.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Lorand Jakab <lojakab@cisco.com> Acked-by: Jesse Gross <jesse@nicira.com>
Andy Zhou [Sat, 7 Sep 2013 10:02:32 +0000 (03:02 -0700)]
openflow-1.1+: OFPT_TABLE_MOD (part 1)
Added infrastructure to support Openflow OFPT_TABLE_MOD message. This patch
does not include the flexible table miss handling code that is necessary to
support the semantics specified in OFPT_TABLE_MOD messages.
Current flow miss behavior continues to conform to Openflow 1.0. Future
commits to add more flexible table miss support are needed to fully support
OPFT_TABLE_MOD for Openflow-1.1+.
Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
sw_flow_key alignment was declared as " __aligned(__alignof__(long))".
However, this breaks on the m68k architecture where long is 32 bit in
size but 16 bit aligned by default. This aligns to the size of a long to
ensure that we can always do comparsions in full long-sized chunks. It
also adds an additional build check to catch any reduction in alignment.
Simon Horman [Thu, 5 Sep 2013 05:19:13 +0000 (14:19 +0900)]
ovs-ofctl: Handle any number of buckets in group statistics
struct ofputil_group_stats has an arbitrary limit
of 16 buckets for which it can record statistics.
However the code does not appear to enforce this
limit and it seems to me that the code could overflow.
This patch aims to remove the arbitrary limit by
changing the 'bucket_stats' field of struct ofputil_group_stats
from a fixed length array to a pointer whose storage is allocated and freed
as necessary.
Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Thu, 5 Sep 2013 16:36:43 +0000 (09:36 -0700)]
configure: Remove call to OVS_CHECK_CACHE_TIME that does not exist.
Commit 31ef9f5178 (timeval: Remove CACHE_TIME scheme.) removed the
OVS_CHECK_CACHE_TIME macro but not the call to it, which leads to an error
message when one runs configure.
Alex Wang [Wed, 4 Sep 2013 22:49:19 +0000 (15:49 -0700)]
timeval: Remove CACHE_TIME scheme.
This commit removes the CACHE_TIME scheme from timeval module. This
is for eliminating the lock contention over the read/write lock of
the cached time. To get the time, the thread now will directly do
the system call 'clock_gettime()'.
As a side effect, timer can only be warpped after timer is stopped
by 'appctl time/stop' command.
Signed-off-by: Alex Wang <alexw@nicira.com> Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Simon Horman [Mon, 2 Sep 2013 02:23:16 +0000 (11:23 +0900)]
ovs-ofctl: Correct formatting of instructions in manpage.
Adjust formatting in ovs-ofctl manpage so that apply_actions, clear_actions
write_metadata and goto_table appear at the same level of indentation as
actions rather being indented as if they are arguments to the learn action.
Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ethan Jackson [Wed, 4 Sep 2013 00:23:44 +0000 (17:23 -0700)]
ofproto: Rename struct rule's evict lock and use it more widely.
There are a few fields in struct rule which are accessible by
functions in ofproto-dpif and therefore need to accessed in a thread
safe manner. This patch achieves this by generalizing the rules evict
rwlock and requiring a writelock to be held to edit them.
Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Ethan Jackson [Tue, 3 Sep 2013 22:37:14 +0000 (15:37 -0700)]
ofproto-dpif: Don't manage eviction groups from threads.
Managing eviction groups from threads was going to be difficult to do
in a performant thread-safe manner, so this patch punts the problem to
the main thread.
Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Ethan Jackson [Tue, 27 Aug 2013 20:17:11 +0000 (13:17 -0700)]
ofproto-dpif: Hide struct rule_dpif internally.
By hiding struct rule_dpif inside ofproto-dpif, it becomes very clear
which attributes are accessed by multiple threads and need to be
protected by locks.
Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Fri, 30 Aug 2013 03:55:10 +0000 (20:55 -0700)]
ofproto-dpif: Do not dpif_port_del() patch ports in port_del().
Patch ports don't have datapath ports so it doesn't make sense to try to
call dpif_port_del() on them. If we do try, it will fail, which makes the
caller think that the port wasn't really deleted, which in turn keeps
ofproto from reporting the port deletion via OpenFlow. This fixes the
problem.
Without this patch, the following commands, executed under "make sandbox",
will report the patch port addition in "ovs-ofctl monitor" output, but not
the patch port deletion. With this patch, both the addition and deletion
will be reported.
Alex Wang [Wed, 4 Sep 2013 22:21:15 +0000 (15:21 -0700)]
ofproto-dpif-xlate: Fix confusion between "no stp port" and "stp port 0".
Commit 9d189a50e (ofproto-dpif-xlate: Pull STP xlation into
ofproto-dpif-xlate.) introduced the bug that considers 'stp_port_no'
of 0 as stp disabled on the port. However 'stp_port_no' is
actually the index of the stp struct's port array and ranges
between [0, STP_MAX_PORTS). So the bug allows the blocked
port keep transmitting packets and generates loop.
This commit fixes this bug.
Signed-off-by: Alex Wang <alexw@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Wed, 4 Sep 2013 20:37:56 +0000 (13:37 -0700)]
ofproto: Convert units correctly in ofport_open().
netdev_features_to_bps() returns a speed in bps, but struct
ofputil_phy_port's curr_speed and max_speed are in kbps, so a conversion
is necessary. This commit fixes the problem.
Reported-by: Benjamin Lunsky <benjamin.lunsky@netronome.com> Tested-by: Benjamin Lunsky <benjamin.lunsky@netronome.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Wed, 4 Sep 2013 19:33:06 +0000 (12:33 -0700)]
flow: Fix hypothetical memory leak in miniflow_move().
Ordinarily a miniflow will use its inline_values if its values can fit, but
there is nothing to prevent a small number of values from being stored
in malloc()'d memory. If this happened, then miniflow_move() would leak
memory. This commit fixes the problem.
This is a hypothetical problem. I haven't seen it in practice.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Ben Pfaff [Wed, 4 Sep 2013 19:38:12 +0000 (12:38 -0700)]
ofproto-dpif: Fix use-after-free deleting a bridge with active traffic.
When a bridge that has active traffic is deleted, the bridge's ofproto can
be referenced by in-flight "flow_miss"es. This commit fixes the problem
by destroying "flow_miss"es that reference the ofproto that is being
deleted.
I found the problem by adding and removing a bridge that had active traffic
(via hping3) while running under valgrind.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Ethan Jackson [Wed, 4 Sep 2013 16:27:31 +0000 (09:27 -0700)]
ofproto-dpif: Destroy facets on ofproto removal.
Facets maintain a pointer to their owning ofproto-dpif, and therefore
when an ofproto-dpif is destroyed all of their child facets should be
destroyed along with it. If they aren't, it's possible a subfacet
looked up in the dpif-backer could access it's parent facet and
therefore a defunct parent ofproto causing a segmentation fault.
Bug #19421.
Bug #19423. Diagnosed-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Ethan Jackson <ethan@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Ethan Jackson [Wed, 4 Sep 2013 00:34:00 +0000 (17:34 -0700)]
ofproto-dpif-xlate: Fix mac learning deadlock.
xlate_normal() held the mac_learning lock while calling
output_normal(). When running with patch ports, this could cause
xlate_actions() to be called again, possibly attempting to take a
write lock on the same learning table causing a deadlock. This patch
solves the problem by holding the lock for a very brief period of
time.
Bug #19423. Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Justin Pettit <jpettit@nicira.com>
Guolin Yang [Fri, 30 Aug 2013 16:57:13 +0000 (09:57 -0700)]
Fix a bug in conversion between flow/mask and flow key
In odp_flow_key_from_flow__(), when converting ICMPv6 flow/mask
to flow/mask key, we should always use flow to check for whether
ND informaition is present or not. In mask case, both type and code
field should be masked, otherwise ND fields can be masked.
Similarly in reverse conversion (parse_l2_5_onward()), we should
have same check.
Signed-off-by: Guolin Yang <gyang@nicira.com>
[blp@nicira.com changed && to || in parse_l2_5_onward() Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>
Ed Maste [Fri, 30 Aug 2013 14:56:08 +0000 (10:56 -0400)]
util: Include pthread_np.h on FreeBSD
On FreeBSD pthread_set_name_np's prototype is provided by pthread_np.h.
As I believe it is the only platform to provide the "set_name" (with an
underscore) variant I hope it's fine to use the existing autoconf macro
HAVE_PTHREAD_SET_NAME_NP.
Signed-off-by: Ed Maste <emaste@freebsd.org> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Thu, 29 Aug 2013 23:49:51 +0000 (16:49 -0700)]
ofproto: Avoid removing rules from meter lists twice.
Since the meter code was introduced, the oftable_remove_rule() function
(recently renamed oftable_remove_rule__()) has had two blocks of code to
remove a rule from its meter's list of rules that reference that meter.
This commit reduces that to one.
Also modifies oftable_remove_rule__() to maintain the invariant that
'rule' is in a meter's list if and only if
!list_is_empty(&rule->meter_list_node). I don't believe that that fixes
a real bug, but it seems like a good idea.
(This seemed like a serious bug at first, but in fact list_remove() is
idempotent: calling it two times in a row has the same effect as calling
it once.)
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>