Ben Pfaff [Fri, 9 Dec 2011 23:54:43 +0000 (15:54 -0800)]
dpif-linux: Avoid valgrind warning in epoll_ctl() call.
Valgrind points out correctly that there are uninitialized bytes in the
'event' structure. That's OK, but it doesn't hurt to suppress the warning
by zeroing all of the bytes.
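As a minimal sketch of the zeroing approach described above (watch_fd_for_input() is a hypothetical helper, not a function from dpif-linux, and the use of data.u32 is an assumption for illustration):

```c
#include <string.h>
#include <sys/epoll.h>

/* Registers 'fd' for input readiness on 'epoll_fd', tagging it with 'index'.
 * The whole structure is zeroed first so that no uninitialized bytes
 * (padding, or the unused part of the 'data' union) are passed to the
 * kernel.  Returns 0 on success, -1 with errno set on failure, exactly as
 * epoll_ctl() does.
 *
 * Hypothetical helper for illustration only. */
static int
watch_fd_for_input(int epoll_fd, int fd, unsigned int index)
{
    struct epoll_event event;

    memset(&event, 0, sizeof event);
    event.events = EPOLLIN;
    event.data.u32 = index;
    return epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fd, &event);
}
```

Without the memset(), only 4 of the 8 bytes in the 'data' union would be written here, which is the kind of partially initialized argument that valgrind reports.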
Ben Pfaff [Mon, 12 Dec 2011 22:44:23 +0000 (14:44 -0800)]
bridge: Enable support for access and native VLAN ports on bonds.
Since Open vSwitch's inception we've disabled the use of bonds as access
ports, for no particularly good reason. This also unintentionally
prevented bonds from being used as native VLAN ports.
This commit removes the code that prevented using bonds these ways.
Reported-and-tested-by: "Michael A. Collins" <mike.a.collins@ark-net.org> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Mon, 12 Dec 2011 17:37:34 +0000 (09:37 -0800)]
debian: Correct licensing information for user/kernel shared header files.
The intent is to license all shared user/kernel header files under both
GPLv2 and Apache v2. The license statement here said GPLv3 instead of
GPLv2, so this commit fixes that problem.
Also, include/openvswitch used to be where all the shared user/kernel
header files were located, but this is no longer true, and now there is a
userspace-only header file also in include/openvswitch, so this commit now
lists all of the user/kernel header files explicitly.
Neil McKee [Sat, 10 Dec 2011 00:56:32 +0000 (16:56 -0800)]
sFlow: add Sun Industry Standards Source License 1.1 as licensing option
The sFlow License was not on the list for the Fedora Project, but the
Sun Industry Standards Source License 1.1 was (and it has the right
properties). So this patch includes it as a licensing option in the
relevant places (COPYING and the lib/sflow* sources). Let me know
if this looks OK or not.
Signed-off-by: Neil McKee <neil.mckee@inmon.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Chris Wright <chrisw@sous-sol.org>
Ben Pfaff [Fri, 9 Dec 2011 23:57:55 +0000 (15:57 -0800)]
bridge: Avoid use-after-free with VLAN splinters and multiple bridges.
The VLAN splinters feature uses a "pool" to track and free allocated
blocks. There's only one pool, but the implementation was freeing all of
the blocks in it for every bridge during reconfiguration, not just once for
each reconfiguration, so it caused a use-after-free when there was more than
one bridge and a bridge other than the last one in the list of bridges had
a VLAN splinter port.
Bug #8671. Reported-by: Michael Mao <mmao@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Fri, 9 Dec 2011 21:09:23 +0000 (13:09 -0800)]
lacp: Avoid valgrind warning in lacp_configure() if custom timing not used.
The caller currently doesn't fill in s->custom_time unless it actually
wants a custom LACP time, but lacp_configure() still does a calculation
with it, provoking a warning from valgrind. This eliminates the warning.
The calculated value was not actually used in this case, so this commit
does not fix a real bug.
The xenserver specfile still places them in /etc/xensource/bugtool since
that's a distro policy. Of course, the rpmlint warnings are as well,
however, this seems like a more logical place for the bugtool plugins.
Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Ben Pfaff <blp@nicira.com>
Chris Wright [Fri, 9 Dec 2011 07:36:03 +0000 (23:36 -0800)]
man: fix pic issue at the source
The commit 0993b66 (man: pic failed to run during manpage-check) worked
around the manpage-check warning generated by groff. Using "-T ascii"
rather than "-T utf8" was enough to silence the warning because the man page
has this condition in it:
.if !'\*[.T]'ascii'
However, rpmlint generates the same warning as manpage-check did (it
uses -Tutf8), and manpages are generated using -Tutf8 (leading to a
fairly unreadable drawing). So let's change the logic a bit and allow
pdf generation w/ nice drawing and kill it for tty's.
Cc: Ethan Jackson <ethan@nicira.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Tue, 6 Dec 2011 23:55:22 +0000 (15:55 -0800)]
socket-util: Correctly return negative values for errors.
The comment on this function says that negative values indicate errors, and
the callers assume that too, but in fact it was returning positive errno
values, which are indistinguishable from valid fd numbers.
It really seems to me that this should have been found pretty quickly in
the field, since stream-tcp and stream-ssl both use inet_open_passive to
implement their passive listeners. I'm surprised that no one has reported
it.
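The convention at issue can be sketched as follows. This is an illustration only: open_passive_sketch() is a hypothetical function, not inet_open_passive(), and it binds to the wildcard address for simplicity.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Opens a TCP socket listening on 'port' (in host byte order) on the
 * wildcard address.  Returns a nonnegative file descriptor on success,
 * otherwise a negative errno value.
 *
 * Hypothetical function for illustration only. */
static int
open_passive_sketch(unsigned short port)
{
    struct sockaddr_in sin;
    int fd, error;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -errno;      /* Negated: a positive errno looks like an fd. */
    }

    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *) &sin, sizeof sin) < 0
        || listen(fd, 10) < 0) {
        error = errno;      /* Save errno before close() can change it. */
        close(fd);
        return -error;
    }
    return fd;
}
```

A caller that tests "fd < 0" can then tell an error such as -EADDRINUSE apart from a valid descriptor, which is not possible if the function returns EADDRINUSE itself.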
Rob Hoes [Mon, 5 Dec 2011 14:43:12 +0000 (14:43 +0000)]
xenserver: Reduce number of xapi DB calls in plugin
When there are lots of PIFs in a XenServer/XCP pool, for example when
there are many VLANs configured on the pool, operations such as
PIF.get_all and loops over all PIFs which include database operations,
are very inefficient when executed on a pool slave, and should be
avoided as much as possible. This patch reduces the number of database
calls in the update function of the openvswitch-cfg-update xapi plugin.
Signed-off-by: Rob Hoes <rob.hoes@citrix.com> Acked-by: Dominic Curran <Dominic.curran@citrix.com>
Justin Pettit [Mon, 5 Dec 2011 00:33:54 +0000 (16:33 -0800)]
netdev-linux: Don't restrict policing to IPv4 and don't call "tc".
Mike Bursell pointed out that our policer only works on IPv4
traffic--and specifically not IPv6. By using the "basic" filter, we can
enforce policing on all traffic for a particular interface.
Jamal Hadi Salim pointed out that calling "tc" directly with system() is
pretty ugly. This commit switches our remaining "tc" calls to directly
sending the appropriate netlink messages.
Suggested-by: Mike Bursell <mike.bursell@citrix.com> Suggested-by: Jamal Hadi Salim <hadi@cyberus.ca>
Today I played with openvswitch on my workstation with kernel 2.6.40 and found that it breaks when built. The
issue was introduced by commit ceb176fdb72bb7ce90debc66e1eeb1d25823d30a.
Jesse Gross [Fri, 2 Dec 2011 00:09:05 +0000 (16:09 -0800)]
datapath: Remove custom version of ipv6_skip_exthdr().
We currently have a version of ipv6_skip_exthdr() which is
identical to the main one with the addition of fragment reporting.
We can propose our version for upstream and then use it directly
without duplication.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Wed, 30 Nov 2011 18:59:12 +0000 (10:59 -0800)]
netdev-linux: Ref and unref the netdev_linux_cache_notifier for taps too.
netdev-linux uses netdev_linux_cache_notifier to flush its cache when the
kernel notifies userspace that a particular network device's configuration
or status has changed. This is as applicable to tap devices as to system
and internal devices, so we should create and destroy the notifier for
tap devices also.
I doubt that in practice it's possible to run ovs-vswitchd without having
a non-tap device open, at least with the kernel datapath, because the
local port for a bridge is not a tap device, so there should be no need to
backport this to older versions.
Ben Pfaff [Wed, 30 Nov 2011 21:07:38 +0000 (13:07 -0800)]
ovs-ofctl: Improve usage message.
TARGET and SWITCH are different because TARGET can refer to a switch or a
controller whereas SWITCH must be a switch, but TARGET wasn't defined
before.
Also, TARGET seems a little more user-friendly than the VCONN that was used
here before.
Ben Pfaff [Wed, 30 Nov 2011 20:09:35 +0000 (12:09 -0800)]
bridge: Configure datapath ID earlier.
The design intent is for LACP ports to use the datapath ID as the default
system ID when none is specifically configured. However, the datapath ID
is not available that early. This commit makes it available earlier.
This commit does not fix another bug that prevents the LACP system ID from
being set properly (nothing sets it at all, in fact, so it always uses 0).
Ben Pfaff [Wed, 16 Nov 2011 22:38:52 +0000 (14:38 -0800)]
ovsdb: Correctly implement conditions that include multiple clauses.
Multiple-clause conditions in OVSDB operations with "where" clauses are
supposed to be conjunctions, that is, the condition is true only if every
clause is true. In fact, the implementation only checked a single clause
(not necessarily the first one) and ignored the rest. This fixes the
problem and adds test coverage for multiple-clause conditions.
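The intended conjunction semantics can be sketched like this. The types below are hypothetical simplifications (integer columns only) and do not reflect the actual ovsdb data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* One "column FUNCTION value" clause, reduced to integer comparison.
 * Hypothetical types for illustration only. */
enum clause_function { CLAUSE_EQ, CLAUSE_NE, CLAUSE_LT, CLAUSE_GT };

struct clause {
    size_t column;                  /* Index into the row's values. */
    enum clause_function function;
    int arg;
};

/* Returns true if 'row' satisfies the single clause 'c'. */
static bool
clause_evaluate(const int row[], const struct clause *c)
{
    int value = row[c->column];

    switch (c->function) {
    case CLAUSE_EQ: return value == c->arg;
    case CLAUSE_NE: return value != c->arg;
    case CLAUSE_LT: return value < c->arg;
    case CLAUSE_GT: return value > c->arg;
    }
    return false;
}

/* Returns true only if every one of the 'n' clauses holds for 'row'.  In
 * this sketch an empty condition is vacuously true. */
static bool
condition_evaluate(const int row[], const struct clause clauses[], size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (!clause_evaluate(row, &clauses[i])) {
            return false;
        }
    }
    return true;
}
```

The bug described above corresponds to evaluating only one element of 'clauses' instead of looping over all of them.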
Ben Pfaff [Wed, 23 Nov 2011 20:15:42 +0000 (12:15 -0800)]
daemon: Better log when fork child dies early from signals.
On one machine, "/etc/init.d/openvswitch-switch start" failed to start
with:
ovs-vswitchd: fork child failed to signal startup (Success)
Starting ovs-vswitchd ... failed!
"strace" revealed that the fork child was actually segfaulting, but the
message output didn't indicate that in any way. This commit fixes the
log message (but not the segfault itself).
Reported-by: Michael Hu <mhu@nicira.com>
Bug #8457.
Ben Pfaff [Mon, 14 Nov 2011 18:10:58 +0000 (10:10 -0800)]
netlink-socket: Let the kernel choose Netlink pids for us.
The Netlink code in the Linux kernel has been willing to choose unique
Netlink pids for userspace sockets since at least 2.4.36 and probably
earlier. There's no value in choosing them ourselves.
This simplifies the code and eliminates the possibility of exhausting our
supply of Netlink PIDs.
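For illustration, a Linux-only sketch of letting the kernel pick the Netlink pid: bind with nl_pid set to 0, then read back the assigned value with getsockname(). open_netlink_sketch() is a hypothetical function, not the netlink-socket code, and NETLINK_ROUTE is just an example protocol.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

/* Opens a NETLINK_ROUTE socket and lets the kernel assign its Netlink pid.
 * On success, stores the assigned pid in '*pidp' and returns a nonnegative
 * file descriptor; on failure returns a negative errno value.
 *
 * Hypothetical function for illustration only. */
static int
open_netlink_sketch(unsigned int *pidp)
{
    struct sockaddr_nl local;
    socklen_t len = sizeof local;
    int fd, error;

    fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0) {
        return -errno;
    }

    memset(&local, 0, sizeof local);
    local.nl_family = AF_NETLINK;
    local.nl_pid = 0;               /* 0 asks the kernel to choose. */
    if (bind(fd, (struct sockaddr *) &local, sizeof local) < 0
        || getsockname(fd, (struct sockaddr *) &local, &len) < 0) {
        error = errno;
        close(fd);
        return -error;
    }
    *pidp = local.nl_pid;
    return fd;
}
```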
Ben Pfaff [Mon, 28 Nov 2011 18:35:15 +0000 (10:35 -0800)]
ofproto: Add "fast path".
The key to getting good performance on the netperf CRR test seems to be to
handle the first packet of each new flow as quickly as possible. Until
now, we've only had one opportunity to do that on each trip through the
main poll loop. One way to improve would be to make that poll loop
circulate more quickly. My experiments show, however, that even just
commenting out the slower parts of the poll loop yields minimal improvement.
This commit takes another approach. Instead of making the poll loop
overall faster, it invokes the performance-critical parts of it more than
once during each poll loop.
My measurements show that this commit improves netperf CRR performance by
24% versus the previous commit, for an overall improvement of 87% versus
the baseline just before the commit that removed the poll_fd_woke(). With
this commit, ovs-benchmark performance has also improved by 13% overall
since that baseline.
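The idea can be modeled roughly as follows. Every name here (struct switch_sketch, run_fast(), run_slow_step(), run_once(), FAST_BATCH) is a hypothetical stand-in chosen for this sketch, and the counters only simulate packet arrival; this is not the ofproto code.

```c
/* Hypothetical model of a switch's main loop state. */
struct switch_sketch {
    int queued_packets;     /* New-flow packets waiting to be handled. */
    int handled_packets;    /* Packets handled so far. */
    int slow_steps_run;     /* Number of slow steps executed. */
};

enum { FAST_BATCH = 50 };   /* Assumed cap on work per fast-path call. */

/* Performance-critical part: handles up to FAST_BATCH queued packets. */
static void
run_fast(struct switch_sketch *sw)
{
    int n = sw->queued_packets < FAST_BATCH ? sw->queued_packets : FAST_BATCH;

    sw->queued_packets -= n;
    sw->handled_packets += n;
}

/* Stand-in for one of the slower periodic jobs.  'arrivals' models packets
 * that arrive while the job is running. */
static void
run_slow_step(struct switch_sketch *sw, int arrivals)
{
    sw->queued_packets += arrivals;
    sw->slow_steps_run++;
}

/* One trip through the main loop: the fast path runs once up front and
 * again after every slow step, instead of once per trip. */
static void
run_once(struct switch_sketch *sw, int n_slow_steps, int arrivals_per_step)
{
    int i;

    run_fast(sw);
    for (i = 0; i < n_slow_steps; i++) {
        run_slow_step(sw, arrivals_per_step);
        run_fast(sw);
    }
}
```

In this model, packets that arrive during a slow step wait only for that step to finish rather than for the whole trip through the loop.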
Ben Pfaff [Fri, 11 Nov 2011 00:42:51 +0000 (16:42 -0800)]
ofproto-dpif: Process multiple batches of upcalls in a single poll loop.
This yields a 27% improvement in netperf CRR results in my tests
versus the previous commit, which is a 52% improvement versus
the baseline from just before the poll_fd_woke() optimization was
removed.
Ben Pfaff [Tue, 22 Nov 2011 17:25:32 +0000 (09:25 -0800)]
dpif-linux: Use "epoll" instead of poll().
epoll appears to be much more efficient than poll() at least for
static file descriptor sets. I can't otherwise explain why this
patch increases netperf CRR performance by 20% above the previous
commit, which is also about a 19% overall improvement versus
the baseline from before the poll_fd_woke() optimization was
removed.
Ben Pfaff [Mon, 28 Nov 2011 17:29:18 +0000 (09:29 -0800)]
dpif-linux: Use poll() internally in dpif_linux_recv().
Using poll() internally in dpif_linux_recv(), instead of relying
on the results of the main loop poll() call, brings netperf CRR
performance back within 1% of par versus the code base before the
poll_fd_woke() optimizations were introduced. It also increases
the ovs-benchmark results by about 5% versus that baseline.
My theory is that this is because the main loop takes long enough
that a significant number of packets can arrive during the main
loop itself, so this reduces the time before OVS gets to those
packets.
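A minimal sketch of checking readiness inside the receive function itself, with a zero poll() timeout so that it never blocks. recv_if_ready() is a hypothetical function that reads from an ordinary file descriptor; it is not dpif_linux_recv().

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Tries to read up to 'size' bytes from 'fd' into 'buf', checking readiness
 * here instead of relying on the result of an earlier main-loop poll().
 * Returns the number of bytes read (0 at end of file), -EAGAIN if nothing
 * is ready right now, or another negative errno value on error.
 *
 * Hypothetical function for illustration only. */
static ssize_t
recv_if_ready(int fd, void *buf, size_t size)
{
    struct pollfd pfd;
    ssize_t n;
    int retval;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        retval = poll(&pfd, 1, 0);  /* Timeout 0: check and return at once. */
    } while (retval < 0 && errno == EINTR);
    if (retval < 0) {
        return -errno;
    } else if (retval == 0) {
        return -EAGAIN;
    }

    n = read(fd, buf, size);
    return n < 0 ? -errno : n;
}
```

Because the readiness check happens at call time, data that arrived after the main loop's own poll() returned is still picked up on this trip through the loop.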
Ben Pfaff [Tue, 22 Nov 2011 19:05:53 +0000 (11:05 -0800)]
Revert "poll-loop: Enable checking whether a FD caused a wakeup."
This reverts commit 1e276d1a10539a8cd97d2ad63c073a9a43f0f1ef.
The poll_fd_woke() and nl_sock_woke() functions added in that commit are
no longer used, so there is no reason to keep them in the tree.
Ben Pfaff [Thu, 10 Nov 2011 23:39:39 +0000 (15:39 -0800)]
dpif-linux: Remove poll_fd_woke() optimization from dpif_linux_recv().
This optimization on its own provided about 37% benefit against a
load of a single netperf CRR test, but at the same time it penalized
ovs-benchmark by about 11%. We can get back the CRR performance
loss, and more, other ways, so the first step is to revert this
patch, temporarily accepting the performance loss.
Justin Pettit [Wed, 23 Nov 2011 08:04:58 +0000 (00:04 -0800)]
mirroring: Don't require the "normal" action to perform mirroring.
Previously, mirrors only worked when using the "normal" action. This
commit performs mirroring even when the "normal" action is not used. It also adds
some unit tests.
Justin Pettit [Sun, 20 Nov 2011 23:12:36 +0000 (15:12 -0800)]
ovs-vswitchd: Track packet and byte statistics sent on mirrors.
This commit adds support for tracking the number of packets and bytes
sent through a mirror. The numbers are kept in the new "statistics"
column on the mirror table in the "tx_packets" and "tx_bytes" keys.
Ben Pfaff [Wed, 23 Nov 2011 21:22:30 +0000 (13:22 -0800)]
ofproto-dpif: Separately track the initial VLAN TCI of arriving packets.
In an upcoming commit, VLAN splinters can cause the VLAN TCI of a packet
received on an interface to differ from the logical VLAN TCI. That is,
a packet that is received on a Linux VLAN network device has no VLAN (so
its initial VLAN TCI is 0) but we logically treat it as if it has the VLAN
associated with the VLAN device.
This is only desirable for use with VLAN splinters and should be reverted
when this feature is no longer needed. I'm breaking it out here only to
make the series easier to review.
Ben Pfaff [Wed, 16 Nov 2011 01:06:41 +0000 (17:06 -0800)]
ofproto-dpif: Move ODP actions from facets to subfacets.
This is a prerequisite for the upcoming VLAN splinter patch, because
splinters and non-splintered subfacets might need slightly different
actions due to the VLAN tag being initially different (present vs. absent).
This is only desirable for use with VLAN splinters and should be reverted
when this feature is no longer needed. I'm breaking it out here only to
make the series easier to review.
Ben Pfaff [Tue, 8 Nov 2011 21:53:38 +0000 (13:53 -0800)]
vswitchd: Remove special case for VLAN devices.
We introduced this special case before the XenServer integration was
complete. At that point, we were using VLAN devices on XenServer, with a
separate bridge for each VLAN, so we needed this special case. But no
version of OVS for any supported XenServer version uses VLAN devices this
way, so we can delete the special case.
Ben Pfaff [Wed, 23 Nov 2011 00:46:05 +0000 (16:46 -0800)]
ofproto-dpif: Factor NetFlow active timeouts out of flow expiration.
NetFlow active timeouts were only mixed in with flow expiration for
convenience: both processes need to iterate all the facets. But
an upcoming commit will change flow expiration to work in terms of
a new "subfacet" entity, so they will no longer fit together well.
This change could be seen as an optimization, since NetFlow active
timeouts don't ordinarily have to run as often as flow expiration,
especially when the flow expiration rate is stepped up due to a
large volume of flows.
Ben Pfaff [Wed, 23 Nov 2011 21:17:38 +0000 (13:17 -0800)]
bridge: Avoid reading other_config columns with ovsdb_idl_get().
ovsdb_idl_get() doesn't work with synthetic records. Upcoming commits
will start synthesizing more ports and interfaces, so we should avoid
using ovsdb_idl_get().
In the long term it's probably a good idea to come up with a better way
to do synthetic database records, one that causes less trouble.
Ethan Jackson [Tue, 22 Nov 2011 03:25:19 +0000 (19:25 -0800)]
ofproto-dpif: Simplify commit logic.
Before executing an output action, ofproto-dpif must commit the
changes it's made to the flow so they are reflected in the
packet. This code has been unnecessarily complex. This patch
attempts to simplify the code in the following ways.
- Commit in fewer places.
In an attempt to provide some optimization, the ofproto-dpif code
separated the commit and output composition steps so things like
flood actions could avoid redundant commits. This is a case of
premature optimization that makes the code significantly more
difficult to reason about. With this patch, commits happen only
when really necessary.
- Only perform full commits.
In an attempt to provide some optimization, the ofproto-dpif code
would allow callers to only commit the part of the flow that they
had modified by directly calling the relevant subroutine. This
practice made the code difficult to reason about and is thus
discontinued.
- Perform all output logic in one function.
All of the logic surrounding the datapath output action has been
placed in the compose_output_action__() function. Most callers
will use the compose_output_action() function which simply passes
reasonable defaults through to compose_output_action__().
Ethan Jackson [Tue, 22 Nov 2011 03:18:14 +0000 (19:18 -0800)]
ofproto-dpif: Properly update tos and ttl fields.
ofproto-dpif failed to update the base flow's tos and ttl fields
when preparing for an output action. This could cause redundant
updates of those fields in the datapath. A future patch adds a
test which could have caught the issue for the tos bits.
Ben Pfaff [Sat, 22 Oct 2011 20:11:48 +0000 (13:11 -0700)]
meta-flow: Split ICMP into ICMPv4 and ICMPv6.
NXM breaks ICMP into v4 and v6. An upcoming commit will drop all of the
NXM specific data in favor of mf_field, and so at that point we need to
have a separate mf_field for each NXM field. So, this commit splits
ICMP into v4 and v6 for meta-flow also.
Ethan Jackson [Mon, 21 Nov 2011 21:36:17 +0000 (13:36 -0800)]
dpif-netdev: Allow enqueue actions.
The dpif-netdev implementation disallowed enqueue actions because
it did not support conversion from OVS 'queue_id' to dpif
'priority'. For testing purposes, this patch allows queues which
translate into NOOPs.
Ethan Jackson [Fri, 18 Nov 2011 21:47:25 +0000 (13:47 -0800)]
ofproto-dpif: Test basic output and flooding.
This patch adds basic tests to ofproto-dpif which verify that
output and flood actions respect the relevant OFPPC flags, and do
not loop back to the in_port.
Ethan Jackson [Mon, 21 Nov 2011 18:59:41 +0000 (10:59 -0800)]
ovs-ofctl: Support OFPPC_NO_FWD.
Currently, there is no way to disable forwarding on an OpenFlow
port from the command line. This patch adds support for the
OFPPC_NO_FWD flag to the ovs-ofctl utility.
Ethan Jackson [Sat, 19 Nov 2011 03:00:34 +0000 (19:00 -0800)]
ofp-util: Support OFPP_LOCAL in enqueue actions.
According to the specification the enqueue action should refer to
"a valid physical port", or OFPP_IN_PORT. It would be strange to
attach a queueing discipline to the local port, but I see no reason
to restrict it.
Jesse Gross [Sat, 19 Nov 2011 22:26:02 +0000 (14:26 -0800)]
datapath: Always notify in initial namespace for port deletions.
We currently notify for port deletions in the namespace of the device
that was deleted. In general this should be the initial namespace because
that's the only place where we look but it's possible that the device
was moved after being attached. However, it's not semantically correct
because we really care about the namespace of the userspace process, not
that of the device. This switches to genlmsg_multicast() which always
uses the initial namespace and seems more appropriate anyways.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>