Ben Pfaff [Fri, 20 Jul 2012 21:46:15 +0000 (14:46 -0700)]
classifier: Optimize iteration with a catch-all target rule.
When cls_cursor_init() is given a NULL target, it can skip an expensive
step comparing the rule against the target for every table and every rule
in the classifier. collect_rule_loose() and other callers could take
advantage of this optimization, except that they actually pass in a rule
that matches everything instead of a NULL rule (e.g. for "ovs-ofctl
dump-flows <bridge>" without specifying a matching rule).
Ben Pfaff [Mon, 20 Aug 2012 18:29:43 +0000 (11:29 -0700)]
classifier: Prepare for "struct cls_rule" needing to be destroyed.
Until now, "struct cls_rule" didn't own any data outside its own memory
block. An upcoming commit will make "struct cls_rule" sometimes own blocks
of memory, so it needs "destroy" and to a lesser extent "clone" functions.
This commit adds these in advance, even though they are mostly no-ops, to
make it possible to separately review the memory management.
int
popcount4(unsigned int x)
{
x -= (x >> 1) & 0x55555555;
x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
x = (x + (x >> 4)) & 0x0f0f0f0f;
x += x >> 8;
x += x >> 16;
return x & 0x3f;
}
int
popcount5(unsigned int x)
{
int n;
n = 0;
while (x) {
if (x & 0xf) {
n += ((0xe9949440 >> (x & 0xf)) & 3) + 1;
}
x >>= 4;
}
return n;
}
int
popcount6(unsigned int x)
{
int n;
n = 0;
while (x) {
n += (0xe994 >> (x & 7)) & 3;
x >>= 3;
}
return n;
}
int
popcount7(unsigned int x)
{
static const int table[16] = {
0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};
Ben Pfaff [Tue, 7 Aug 2012 20:43:18 +0000 (13:43 -0700)]
flow: Simplify many functions for working with flows and wildcards.
Now that "struct flow" and "struct flow_wildcards" have the same simple
and uniform structure, it's easy to handle common operations by just
iterating over the bits inside them.
Ben Pfaff [Mon, 18 Jun 2012 22:46:13 +0000 (15:46 -0700)]
flow: Remove flow_wildcards_is_exact().
It's only used in a not-very-useful assertion in some test code. In
general, exact-match flows make very little sense anymore, and they're
basically on their way out.
Ben Pfaff [Mon, 18 Jun 2012 16:55:22 +0000 (09:55 -0700)]
flow: Fully separate FWW_* from OFPFW10_*.
It might have been a useful optimization at one point to have FWW_*
correspond in OFPFW10_* where possible, but it doesn't seem worthwhile for
only 3 corresponding values. It also makes the code somewhat more
confusing.
Commit f36ff26ac08bcba28b5befba7b1dcff9b963769d (datapath: Remove
vport list node.) removed an unused list node in struct vport but
did not remove the comment describing it. This does that.
This is analogous to the change made in userspace with 2508ac16defd417b94fb69689b6b1da4fbc76282 (odp-util: Update
ODPUTIL_FLOW_KEY_BYTES for current kernel flow format.). The extra
space for vlan encapsulation was not included in the allocation for
maximum length flows.
Found by code inspection and to my knowledge has never been hit, likely
because skb allocations are padded out to a cacheline, making userspace
more susceptible to this problem than the kernel. In theory, however,
the right combination of flow and packet size could result in a kernel
panic.
Ethan Jackson [Wed, 29 Aug 2012 23:00:31 +0000 (16:00 -0700)]
vswitchd: Respect other_config:stp-enable port setting.
Commit a699f614 (lib: Utilize smaps in the idl.) broke the
other_config:stp-enable port setting in two ways. First, it
changed the default if the setting was missing to disabled.
Second, if the setting was present, it did the opposite of what the
user configured.
Bug #13122. Reported-by: Paul Ingram <paul@nicira.com> Signed-off-by: Ethan Jackson <ethan@nicira.com>
Isaku Yamahata [Tue, 28 Aug 2012 17:19:03 +0000 (02:19 +0900)]
ofproto-dpif: Make OFPP_TABLE send packet-in on miss.
The OpenFlow specification for OFPP_TABLE implies that a miss
should generate a packet-in, but Open vSwitch has never done
that. This corrects the behavior.
This also prepares for the goto-table instruction, which will
need to generate a table-miss in some circumstances.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
[blp@nicira.com rebased and updated commit message] Signed-off-by: Ben Pfaff <blp@nicira.com>
Simon Horman [Tue, 21 Aug 2012 04:55:37 +0000 (13:55 +0900)]
ovs-ofctl: Use vconn as a parameter of dump_stats_transaction()
In order to form a stats message for the prevailing OpenFlow version
of a vconn the vconn needs to be open at the time the request is
encoded. Thus there is no longer a case where it makes sense to
use dump_stats_transaction() without a vconn already being open and
available to pass as a parameter.
Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ethan Jackson [Tue, 21 Aug 2012 22:44:58 +0000 (15:44 -0700)]
bridge: Deprecate the null interface type.
It's not entirely clear what problem the null interface type is
trying to solve, nor how it could be of use to any controller.
This patch deprecates it, and schedules its removal for February
2013. If there are concerns, please email dev@openvswitch.org.
Ethan Jackson [Sat, 18 Aug 2012 02:14:02 +0000 (19:14 -0700)]
nicira-ext: Deprecate the autopath action.
The autopath action is an obsolete attempt to replicate
functionality contained in the bundle action. It is ugly and of
questionable usefulness. This patch deprecates it and schedules
its removal for February 2013. If there are concerns, please email
dev@openvswitch.org.
Ethan Jackson [Sat, 18 Aug 2012 01:47:43 +0000 (18:47 -0700)]
bond: Deprecate stable bonds.
The stable bond mode is an obsolete attempt to replicate the
functionality contained in the bundle action. They are ugly and of
questionable usefulness. This patch deprecates them and schedules
their removal for February 2013. If there are concerns, please
email dev@openvswitch.org.
Jesse Gross [Wed, 22 Aug 2012 00:48:54 +0000 (17:48 -0700)]
datapath: Fix namespace refcount leak on failed init.
If a datapath fails to initialze fully (likely due to out-of-memory)
then it's possible that we can take a reference to a network
namespace but never release it. This fixes the problem by releasing
any resources in the event of an error.
Found by code inspection, it's likely to be extremely rare in practice.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
Ben Pfaff [Sat, 18 Aug 2012 06:27:40 +0000 (23:27 -0700)]
ofproto-dpif: Make sure one-packet flows have zero duration (again).
commit 6a0a5bbbc (ofproto-dpif: Make sure one-packet flows have zero
duration.) was supposed to fix failures in the "ofproto-dpif - NetFlow
flow expiration" test, but it didn't fix the whole problem. That commit
eliminated one reason why a one-packet flow might be shown as having an
nonzero duration, but missed another.
The other reason was that the call to dpif_flow_stats_extract() could
obtain a time later than the time that a new facet was created. (This
wasn't obvious because dpif_flow_stats_extract() obtained the time
internally instead of taking it from the caller.) This commit fixes that
problem, by using the facet creation there too for the first packet in
a facet.
This problem has suddenly started showing up in a lot of builds. I think
it's probably because of the recent change that makes x86-64 skip the timer
optimizations, so that the return value of time_msec() changes every 1 ms,
not just every 100 ms.
I've tested this by running the test in question in a loop for several
minutes, without any failures.
Ben Pfaff [Tue, 21 Aug 2012 20:51:01 +0000 (13:51 -0700)]
bond: Tag flows according to their hash bucket, not just their slave.
The bonding code is supposed to tag flows two ways:
- According to the chosen bond slave, to make it easy to invalidate all
of the flows assigned to a given slave.
- According to the hash value for a flow, to make it easy to invalidate
all of the flows that hash into the same bucket.
However, the code wasn't actually applying the hash-based tags. This
meant that rebalancing didn't take effect immediately, and so after
rebalancing we could get log messages like this:
inconsistency in subfacet (actions were: 5) (correct actions: 4)
specifying some flow that was moved by the rebalance.
This commit fixes the problem by applying the hash-based tags.
Bug #12847. Reported-by: Pratap Reddy <preddy@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Ben Pfaff [Mon, 20 Aug 2012 17:29:39 +0000 (10:29 -0700)]
tests: New m4 macro ON_EXIT to add a cleanup action.
Several of the tests start daemons and then need to make sure that the
daemons get killed when the test completes, even if it completes in the
middle due to an early failure. Until now, they have been using manual
shell "trap" calls to do this. This works well enough for simple cases,
but sometimes multiple macros start daemons in a single test, and then
each "trap" has to be carefully written to kill off the daemons for the
previously invoked macros.
This commit introduces a new macro ON_EXIT whose use is composable: each
call appends a new action to the ones already specified.
Ben Pfaff [Wed, 15 Aug 2012 17:16:49 +0000 (10:16 -0700)]
Do not include zeroed metadata fields in NXM/OXM packet-in messages.
NXM and OpenFlow 1.2+ allow including the values of arbitrary flow metadata
in "packet-in" messages. Open vSwitch has until now always included all
the values of the metadata fields that it implements in NXT_PACKET_IN
messages.
However, this has at least two disadvantages:
- Most of the metadata fields tend to be zero most of the time, which
wastes space in the message.
- It means that controllers must be very liberal about accepting
fields that they know nothing about in packet-in messages, since any
switch upgrade could cause new fields to appear even if the
controller does nothing to give them nonzero values. (Controllers
have to be prepared to tolerate unknown fields in any case, but this
property makes unknown fields more likely to appear than otherwise.)
This commit changes Open vSwitch so that metadata fields whose values are
zero are not reported in packet-ins, fixing both problems. (This is
explicitly allowed by OpenFlow 1.2+.)
This commit mainly fixes a sort of internal conceptual dissonance centering
around struct flow_metadata. This structure is supposed to report the
metadata for a given flow. If you look at a flow, it has particular
metadata values; it doesn't have masks, and the idea of a mask for a
particular flow doesn't really make sense. However, struct flow_metadata
did have masks. This led to internal confusion; one can see this in, for
example, the following code removed by this commit in ofproto-dpif.c to
handle misses in the OpenFlow flow table:
/* Registers aren't meaningful on a miss. */
memset(pin.fmd.reg_masks, 0, sizeof pin.fmd.reg_masks);
What this code was really trying to say is that on a flow miss, the
registers are zero, so they shouldn't be included in the packet-in message.
It did manage to omit the registers, by marking them as "wild", but it is
conceptually more correct to simply omit them because they are zero (and
that's one effect of this commit).
Bug #12968. Reported-by: Igor Ganichev <iganichev@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Tue, 14 Aug 2012 20:18:03 +0000 (13:18 -0700)]
tests: Invoke daemons with --no-chdir so core files appear in test dir.
The OVS daemons "cd" to / as a normal part of their startup, since this is
traditional for daemons under Unix. But this also means that, if the
daemons happen to terminate with a core in the unit tests, then the core
file won't be written because / has too-restrictive permissions. (Unless
you run the unit tests as root, or you've got cores configured to go to a
non-standard location.)
This commit fixes the problem by invoking most daemons with --no-chdir so
that the core files go to a test-specific directory. I didn't change
invocations of the Python daemons, since Python doesn't normally terminate
with a core.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Kyle Mestery <kmestery@cisco.com>
Ben Pfaff [Tue, 14 Aug 2012 20:23:22 +0000 (13:23 -0700)]
vswitch.xml: active-backup and balance-slb bonds must be one-sided.
It doesn't work to hook up an active-backup bond on one switch to an
active-backup bond on another switch, because they might pick different
active interfaces and therefore not pass any traffic.
The same is true of balance-slb because multicast and broadcast
traffic is dropped on ingress to any interface other than the active
interface.
Ben Pfaff [Thu, 16 Aug 2012 18:33:21 +0000 (11:33 -0700)]
ofproto-dpif: Avoid dereferencing possibly null or wild pointer.
If ofpacts_len is 0 then ofpacts->type is a bad reference.
(An early draft of ofpacts used an OFPACT_END sentinel so that there was
always data there in this function, but in review the sentinel got deleted
and I did not notice that this function needed an update.)
Found by valgrind.
Bug #12847. Signed-off-by: Ben Pfaff <blp@nicira.com>
Currently, if a controller having a nonzero id registers to get a
OFPR_INVALID_TTL async message, it will not receive it. This is because
compose_dec_ttl() only sent the invalid ttl packets to the default controller
id. NXAST_DEC_TTL_CNT_IDS is a new action that accepts a list of controller
ids, each separated by `,', to which the OFPR_INVALID_TTL packets must be sent.
The earlier requirement of the controller having to explicitly register to
receive these asynchronous messages is retained.
The syntax of this action is:
dec_ttl(id1,id2)
where id1, id2 are valid controller ids.
Ben Pfaff [Mon, 13 Aug 2012 16:30:26 +0000 (09:30 -0700)]
ofproto-dpif: Avoid searching all subfacets when creating first in a facet.
When we create the first subfacet within a facet, we know that there
cannot be an existing subfacet with the same key, so we can skip the search
through the ofproto's table of subfacets.
This is a small optimization, but it should not affect the flow setup rate
in most benchmarks, because in the stressful situations that benchmarks
create, OVS does not set up flows.
Ethan Jackson [Tue, 14 Aug 2012 21:08:09 +0000 (14:08 -0700)]
bridge: Write certain statistics to the database instantly.
Traditionally the bridge has written interface and port statistics
to the database in a rate limited fashion. This makes a lot of
sense for statistics which are either constantly changing, or are
expensive to collect. However, some statistics were rate limited
which have neither of these properties. Furthermore some of these
statistics (most notably carrier) could be very useful to a
controller if updated promptly.
Mehak Mahajan [Wed, 15 Aug 2012 18:19:35 +0000 (11:19 -0700)]
Correct number of bytes to allocated for slaves in bundle action.
The size of each slave is a uint16_t. This means that each slave needs 2 bytes
at the end of nx_action_bundle. Earlier, the size of each slave was not being
factored in when allocating space. This commit corrects that by allocating 2
bytes for each slave when calculating the total number of bytes to be allocated
at the end of nx_action_bundle.
Jesse Gross [Tue, 7 Aug 2012 22:26:33 +0000 (15:26 -0700)]
tests: Add delay in NetFlow unit tests before killing processes.
At the end of the NetFlow unit tests we warp time to force any
remaining flows to expire and then immediately kill OVS and the
collector. However, this creates a race where sometimes these
processes are killed before the last records are sent or collected.
It's possible to force OVS to go through the run loop one last time
before exiting but it's harder to enforce that the collector receives
the packet. This simply avoids the problem by adding a 1 second delay
before killing the processes, which should be more than enough time.
Ethan Jackson [Fri, 10 Aug 2012 23:36:18 +0000 (16:36 -0700)]
cfm: Report opup as undefined if not in extended mode.
The cfm_get_opup() function's result doesn't make sense when CFM is
not configured in extended mode. This patch makes it report -1 in
this case. Future patches will rely on this behavior.
Simon Horman [Thu, 9 Aug 2012 08:49:32 +0000 (17:49 +0900)]
ofp-msgs: Allow 1.0-1.2 range
This is intended for use with OFPRAW_OFPST_TABLE_REQUEST
in order for it to be symmetric with OpenFlow 1.0, 1.1 and 1.2
versions of OFPRAW_OFPST1TABLE_REPLY.
OpenFlow 1.3 introduces yet another format for OFPRAW_OFPST1TABLE_REPLY.
Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Ben Pfaff <blp@nicira.com>
Simon Horman [Thu, 9 Aug 2012 08:49:23 +0000 (17:49 +0900)]
ofp-util: Use 32bit port for Open Flow 1.1 & 1.2 Port Mod Request
When encoding Open Flow 1.1 & 1.2 Port Mod Request messages
the port number should be converted to 32bits using
ofputil_port_to_ofp11() rather than htonl(). This ensures
that port numbers in the reserved range are translated correctly.
Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Ben Pfaff <blp@nicira.com>
Leo Alterman [Tue, 7 Aug 2012 23:36:27 +0000 (16:36 -0700)]
timeval: On Linux x86-64 systems refresh time whenever it is requested.
64-bit Linux appears to avoid syscalls for clock_gettime(), so we can get
higher resolution timing and avoid having a timer firing off SIGALRM
without introducing extra overhead.
Signed-off-by: Leo Alterman <lalterman@nicira.com>
Leo Alterman [Thu, 9 Aug 2012 00:40:43 +0000 (17:40 -0700)]
lockfile: Remove lockfile_lock timeout argument
lockfile_lock() accepts a timeout argument but, aside from unit tests
pertaining to timeout, its value is always 0. Since this feature relies on
a periodic SIGALRM signal, which is not a given if we're not caching time,
the cleanest solution is just to remove it.
Signed-off-by: Leo Alterman <lalterman@nicira.com>
If we've already reported an error at this point, then we currently report
a no-match error also, but that doesn't add any useful information; it's
just noise in the log.
Ben Pfaff [Thu, 9 Aug 2012 21:18:46 +0000 (14:18 -0700)]
stream-ssl: Seed OpenSSL if it fails to seed itself.
We occasionally see OpenSSL fail to seed its random number generator in
heavily loaded hypervisors. I suspect the following scenario:
1. OpenSSL calls read() to get 32 bytes from /dev/urandom.
2. The kernel generates 10 bytes of randomness and copies it out.
3. A signal arrives (perhaps SIGALRM).
4. The kernel interrupts the system call to service the signal.
5. Userspace gets 10 bytes of entropy.
6. OpenSSL doesn't read again to get the final 22 bytes. Therefore
OpenSSL doesn't have enough entropy to consider itself initialized.
It never tries again, so we're stuck forever.
The only part I'm not entirely sure about is #6, because the OpenSSL code
is so hard to read.
Thanks to Alex Yip for suggesting that this might be a startup problem.
Bug #10164. Reported-by: Ram Jothikumar <ram@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Simon Horman [Thu, 9 Aug 2012 08:49:54 +0000 (17:49 +0900)]
Allow decoding of Open Flow 1.1 & 1.2 Flow Removed Messages
Signed-off-by: Simon Horman <horms@verge.net.au>
[blp@nicira.com added support for hard_timeout, plus a test] Signed-off-by: Ben Pfaff <blp@nicira.com>