Ethan Jackson [Wed, 13 Jul 2011 23:20:24 +0000 (16:20 -0700)]
nicira-ext: Generalize nx_mp_fields into nx_hash_fields.
Future patches will use nx_hash_fields for non-multipath related
actions. This patch renames nx_mp_fields and creates a new
flow_hash_fields() function.
Ben Pfaff [Fri, 15 Jul 2011 18:25:14 +0000 (11:25 -0700)]
ovsdb: Log when remote connections are deconfigured.
Recently I helped debug a scenario where ovsdb-server connected to a remote
manager, then ovs-vsctl deleted the remote manager and soon after re-added
it. The log was difficult to interpret because it showed two successful
connection attempts to the same remote without showing a reason why the
connection was dropped in the first place. Adding this log message would
make it clear that the configuration changed to remove that remote
connection in the meantime.
Ben Pfaff [Wed, 6 Jul 2011 17:26:57 +0000 (10:26 -0700)]
ovs-bugtool: Add an OVSDB snapshot to ovs-bugtool output.
The ovs-bugtool output already includes a copy of the configuration
database file, but this file omits many instantaneous details. For
example, it does not include any information about controller connection
status or interface statistics. This commit adds a snapshot of the
database contents that does include these details.
Ben Pfaff [Wed, 6 Jul 2011 17:43:03 +0000 (10:43 -0700)]
ovs-bugtool: Add plugins previously used only under XenServer.
All of the xen-bugtool plugins that OVS has previously installed only under
XenServer are equally useful with Debian and other distributions, so
this commit installs and uses them everywhere.
Ben Pfaff [Wed, 13 Jul 2011 17:58:59 +0000 (10:58 -0700)]
configure: Improve error message when pkg-config is missing.
Until now, when pkg-config is missing, Autoconf emitted this error:
error: possibly undefined macro: PKG_CHECK_MODULES
This commit changes the message to:
error: Please install pkg-config.
This should be easier for users to interpret.
Ben Pfaff [Tue, 12 Jul 2011 16:38:12 +0000 (09:38 -0700)]
ovs-bugtool: Turn off "group" and "other" permissions for generated files.
ovs-bugtool's output is potentially sensitive, so it seems best not to
allow anyone but the owner to read it. This commit disables "group" and
"other" bits in the Unix ACL.
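As a rough illustration of the permission change described above, the following sketch clears the "group" and "other" bits on a file. The helper name restrict_to_owner and the throwaway demo file are assumptions made for this example; they are not taken from ovs-bugtool itself.

```python
import os
import stat
import tempfile


def restrict_to_owner(path):
    """Clear every "group" and "other" permission bit on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IRWXG | stat.S_IRWXO))


# Demonstration on a throwaway file that starts out world-readable.
with tempfile.NamedTemporaryFile(delete=False) as f:
    demo_path = f.name
os.chmod(demo_path, 0o644)
restrict_to_owner(demo_path)
demo_mode = stat.S_IMODE(os.stat(demo_path).st_mode)
os.remove(demo_path)
```

Setting a umask of 077 before creating the files would be another way to get the same result; which approach the commit uses is not stated here.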
Ben Pfaff [Tue, 12 Jul 2011 16:37:08 +0000 (09:37 -0700)]
ovs-bugtool: Make available outside of Debian packages.
ovs-bugtool is no longer Debian-specific, so install it everywhere. (On
XenServer, specifically, we do not install it, because there xen-bugtool
already exists.)
Ben Pfaff [Thu, 30 Jun 2011 19:42:32 +0000 (12:42 -0700)]
ovs-bugtool: Restore RHEL support.
ovs-bugtool was originally xen-bugtool from Citrix XenServer. We modified
it for OVS by tailoring it to work better on Debian, updating file
locations and removing features that were specific to XenServer or that
work with packages not installed by default on Debian.
This commit reverts many of those changes. This commit:
- Adds back code that works with RHEL features not installed by default
on Debian (but not XenServer-specific features, since xen-bugtool works
nicely on XenServer).
- Switches from hard-coded paths for utilities to searching the
superuser's typical $PATH (because RHEL and Debian disagree on the
location for some utilities).
- In a few cases merges lists, e.g. now it looks for logs under the names
used in both Debian and RHEL.
- Fixes a few spurious differences between ovs-bugtool and xen-bugtool,
e.g. in white space.
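The switch from hard-coded utility paths to a $PATH-style search might look roughly like the sketch below. The directory list SUPERUSER_PATH and the function find_utility are hypothetical names chosen for illustration, not the ones used by ovs-bugtool.

```python
import os
import tempfile

# Hypothetical search path: directories a superuser's $PATH typically covers.
SUPERUSER_PATH = [
    "/usr/local/sbin", "/usr/local/bin",
    "/usr/sbin", "/usr/bin", "/sbin", "/bin",
]


def find_utility(name, search_path=SUPERUSER_PATH):
    """Return the first executable called name on search_path, or None."""
    for directory in search_path:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Usage: an executable placed in a scratch directory is found by name.
scratch = tempfile.mkdtemp()
tool = os.path.join(scratch, "demo-tool")
with open(tool, "w") as f:
    f.write("#!/bin/sh\n")
os.chmod(tool, 0o755)
found = find_utility("demo-tool", [scratch])
missing = find_utility("demo-tool", [])
```

Searching a list of directories lets the same code locate a utility whether a distribution installs it under /sbin or /usr/sbin.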
Simon Horman [Tue, 12 Jul 2011 06:52:44 +0000 (15:52 +0900)]
datapath: An expanded table should be larger than its predecessor
This resolves what appears to be a think-o in tbl_expand()
* Old Logic: Always create tables with TBL_MIN_BUCKETS buckets
* New Logic: Create tables twice as big as their predecessor
When sending 10,000 flows through ovs-vswitchd:
* Old Logic: CPU bound in tbl_lookup(), significant packet loss
* New Logic: ~10% of one core used, negligible packet loss
Tested with an Intel E5520 @ 2.27GHz,
flows from an ethernet device to a dummy interface with
no address configured.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
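The difference between the old and new logic can be sketched with a toy chained hash table. This is not the datapath code: the Table class, the expand() helper, and the TBL_MIN_BUCKETS value below are all assumptions made for illustration.

```python
TBL_MIN_BUCKETS = 1024  # placeholder value, not taken from the datapath source


class Table:
    """A toy chained hash table."""

    def __init__(self, n_buckets=TBL_MIN_BUCKETS):
        self.n_buckets = n_buckets
        self.buckets = [[] for _ in range(n_buckets)]
        self.count = 0

    def insert(self, key, value):
        self.buckets[hash(key) % self.n_buckets].append((key, value))
        self.count += 1

    def lookup(self, key):
        for k, v in self.buckets[hash(key) % self.n_buckets]:
            if k == key:
                return v
        return None


def expand(table):
    """Return a new table with twice as many buckets as its predecessor."""
    new_table = Table(table.n_buckets * 2)
    for bucket in table.buckets:
        for key, value in bucket:
            new_table.insert(key, value)
    return new_table


# Usage: growing a small table keeps every entry and doubles the buckets.
small = Table(4)
for i in range(10):
    small.insert("flow%d" % i, i)
bigger = expand(small)
```

If expand() always allocated TBL_MIN_BUCKETS buckets instead, chains would keep growing with the number of flows and lookups would degrade toward a linear scan, which is consistent with the CPU-bound behavior reported above.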
Ben Pfaff [Mon, 11 Jul 2011 20:57:58 +0000 (13:57 -0700)]
configure: Pass correct -target option to "cgcc" in the common case.
The "cgcc" script included with sparse guesses the target architecture
based on the host architecture instead of based on the GCC architecture.
This means that it often guesses wrong on biarch systems, e.g. my Linux
kernel is x86_64 but userspace is i686 and thus GCC targets i686 by
default.
This fixes the problem by passing an explicit "-target=i86" to cgcc if
GCC targets x86 or "-target=x86_64" if GCC targets x86_64.
Bug #6312.
Reported-by: Ethan Jackson <ethan@nicira.com>
ovs-brcompatd no longer accepts any non-option arguments. Also,
-vANY:console:EMER is unnecessary, because --detach now implies disabling
logging to the console.
Ben Pfaff [Fri, 8 Jul 2011 16:11:55 +0000 (09:11 -0700)]
vconn-stream: Always call the stream's run function from vconn_stream_run().
The stream's run function ensures that data buffered in the stream itself
gets pushed to the network. Only the SSL stream class has such a run
function, which means that SSL stream data failed to be pushed to the
remote peer in a timely manner in some cases.
Many thanks to Alex Yip for narrowing this down.
Reported-by: Alex Yip <alex@nicira.com>
Tested-by: Alex Yip <alex@nicira.com>
Bug #6221.
Ben Pfaff [Fri, 1 Jul 2011 17:11:30 +0000 (10:11 -0700)]
python: Make invalid UTF-8 sequence messages consistent across Python versions.
Given the invalid input <C0 22>, some versions of Python report <C0> as the
invalid sequence and other versions report <C0 22> as the invalid sequence.
Similarly, given input <ED 80 7F>, some report <ED 80> and others report
<ED 80 7F> as the invalid sequence. This caused spurious test failures for
the test "no invalid UTF-8 sequences in strings - Python", so this commit
makes the messages consistent by dropping the extra trailing byte from the
message.
I first noticed the longer sequences <C0 22> and <ED 80 7F> on Ubuntu
10.04 with python version 2.6.5-0ubuntu1, but undoubtedly it exists
elsewhere also.
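One way to get a version-independent report is to locate the malformed sequence by hand and stop before the offending trailing byte. The sketch below is an illustration only, not the code from the commit; first_invalid_sequence and format_sequence are hypothetical helpers, and the check is deliberately structural (it does not reject overlong encodings or surrogates that have well-formed continuation bytes).

```python
def first_invalid_sequence(data):
    """Return the first structurally malformed UTF-8 sequence, or None.

    The reported sequence is the lead byte plus any valid continuation
    bytes that follow it; the byte that breaks the sequence is left out.
    """
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            n_cont = 0
        elif 0xC0 <= lead < 0xE0:
            n_cont = 1
        elif 0xE0 <= lead < 0xF0:
            n_cont = 2
        elif 0xF0 <= lead < 0xF8:
            n_cont = 3
        else:
            return data[i:i + 1]    # stray continuation byte or bad lead
        end = i + 1
        while end < i + 1 + n_cont:
            if end >= len(data) or not 0x80 <= data[end] <= 0xBF:
                return data[i:end]  # stop before the offending byte
            end += 1
        i = end
    return None


def format_sequence(seq):
    """Render bytes in the <XX YY> notation used in the message above."""
    return "<%s>" % " ".join("%02X" % b for b in seq)


report_c0 = format_sequence(first_invalid_sequence(b"\xc0\x22"))
report_ed = format_sequence(first_invalid_sequence(b"\xed\x80\x7f"))
```

With this approach the inputs from the commit message are reported as the shorter sequences regardless of what the interpreter's own decoder would say.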
Andrew Evans [Fri, 1 Jul 2011 01:08:59 +0000 (18:08 -0700)]
connmgr: Free controller info in the same module where it's allocated.
Make ofproto_free_ofproto_controller_info() just a passthrough to
connmgr_free_controller_info() so the allocation and freeing of memory in the
controller info structure is done in the same place.
Andrew Evans [Thu, 30 Jun 2011 22:15:46 +0000 (15:15 -0700)]
bridge: Update controller connection status correctly.
Updates to status-related columns in the Controller table can be lost if there
are multiple bridges with different sets of controllers. This commit fixes this
behavior by first accumulating status for all controllers on all bridges, then
making one pass over all rows in the Controller tables, updating the status of
each.
Bug #6185.
Reported-by: Michael Hu <mhu@nicira.com>
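The accumulate-then-update approach can be sketched as follows. The data layout (dicts keyed by controller target, with is_connected and role fields) is an assumption made for this example and does not mirror the actual bridge or OVSDB structures.

```python
def collect_status(bridges):
    """First step: accumulate status for all controllers on all bridges."""
    status = {}
    for bridge in bridges:
        for target, info in bridge["controllers"].items():
            status[target] = info
    return status


def update_controller_rows(rows, bridges):
    """Second step: one pass over every Controller row."""
    status = collect_status(bridges)
    for row in rows:
        info = status.get(row["target"])
        if info is None:
            row["is_connected"] = False
            row["role"] = None
        else:
            row["is_connected"] = info["is_connected"]
            row["role"] = info["role"]


# Usage: two bridges with different controller sets.
bridges = [
    {"name": "br0", "controllers": {
        "tcp:10.0.0.1": {"is_connected": True, "role": "master"}}},
    {"name": "br1", "controllers": {
        "tcp:10.0.0.2": {"is_connected": False, "role": "other"}}},
]
rows = [{"target": "tcp:10.0.0.1"}, {"target": "tcp:10.0.0.2"},
        {"target": "tcp:10.0.0.3"}]
update_controller_rows(rows, bridges)
```

Because every row is written exactly once, after all bridges have been examined, a later bridge can no longer wipe out status recorded for a controller that belongs to an earlier one.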
Jesse Gross [Thu, 30 Jun 2011 19:49:11 +0000 (12:49 -0700)]
tunneling: Force selection of an IP ID with GRE.
By default we set the DF bit on tunneled packets because we want to
get path MTU discovery from the underlying network. In turn this
causes Linux to leave the IP ID as 0 because it believes that
fragmentation can never occur. However, with GRE fragmentation is
still possible because we may get a large packet to be encapsulated
and let the local IP stack do fragmentation. As long as packets are
kept in order, fragments are not misassociated and everything works fine.
However, if there is reordering in the underlying network then packets
can become corrupted. This forces selection of an IP ID for GRE packets
to avoid misassociation.
Bug #6128
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Thu, 30 Jun 2011 17:05:52 +0000 (10:05 -0700)]
ofp-util: Centralize decoding of OpenFlow actions.
This significantly simplifies code in ofp-print and ofproto-dpif and is
likely to simplify any new ofproto implementations whose support for
actions differs from ofproto-dpif.
Ben Pfaff [Thu, 30 Jun 2011 17:04:09 +0000 (10:04 -0700)]
ofp-util: Simplify iteration through OpenFlow actions.
The existing actions_first() and actions_next() iterator functions are not
much like the other iteration constructs found throughout the Open vSwitch
tree. Also, they only work with actions that have already been validated,
so there are cases where they cannot be used.
This commit adds new macros for iterating through OpenFlow actions, one
for actions that have been validated and one for actions that have not, and
adapts the existing users. The following commit will further refine action
parsing and add more users.
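The two kinds of iteration described above can be illustrated on a raw buffer of OpenFlow 1.0 actions, where each action starts with a 16-bit type and a 16-bit length that is a multiple of 8. The generator names below are hypothetical; the real change adds C macros, which this sketch does not attempt to reproduce.

```python
import struct

OFP_ACTION_ALIGN = 8  # OpenFlow 1.0 actions are padded to 8-byte multiples


def iter_actions_unsafe(buf):
    """Iterate over actions that have already been validated."""
    offset = 0
    while offset < len(buf):
        type_, length = struct.unpack_from("!HH", buf, offset)
        yield type_, buf[offset:offset + length]
        offset += length


def iter_actions_checked(buf):
    """Iterate over untrusted actions, validating each length first."""
    offset = 0
    while offset < len(buf):
        remaining = len(buf) - offset
        if remaining < OFP_ACTION_ALIGN:
            raise ValueError("truncated action header at offset %d" % offset)
        type_, length = struct.unpack_from("!HH", buf, offset)
        if (length < OFP_ACTION_ALIGN or length % OFP_ACTION_ALIGN
                or length > remaining):
            raise ValueError("bad action length %d at offset %d"
                             % (length, offset))
        yield type_, buf[offset:offset + length]
        offset += length


# Usage: an output action (type 0) followed by a strip-VLAN action (type 3).
actions = struct.pack("!HHHH", 0, 8, 1, 0) + struct.pack("!HH4x", 3, 8)
types = [type_ for type_, _ in iter_actions_checked(actions)]
```

The unchecked variant would loop forever on a zero-length action, which is why a separate, validating form is needed for input that has not been checked yet.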
Ben Pfaff [Fri, 24 Jun 2011 20:58:08 +0000 (13:58 -0700)]
ofp-util: Rename OFPUTIL_INVALID to OFPUTIL_MSG_INVALID.
An upcoming commit will introduce new OFPUTIL_* constants for actions. It
seems best to be able to visually distinguish the constants. Most of the
existing constants start with a good prefix, but OFPUTIL_INVALID does not,
so rename it.
Simon Horman [Thu, 30 Jun 2011 11:34:15 +0000 (20:34 +0900)]
ofproto: Simplify bucket finding in facet_max_idle()
The existing dual-loop setup is unnecessary
as the outer loop only skips to the first non-zero value
and then exits once the inner loop completes.
Zero values in the inner loop have no effect on its logic.
Signed-off-by: Simon Horman <horms@verge.net.au>
[pushed declaration of subtotal out to function scope]
Signed-off-by: Ben Pfaff <blp@nicira.com>
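The equivalence claimed above can be checked on a small model. Both functions below are reconstructions for illustration only, written from the description in this message rather than copied from facet_max_idle(); they assume a non-empty histogram and a positive threshold.

```python
def find_bucket_dual_loop(buckets, threshold):
    """Model of the old shape: skip leading zeros, then accumulate."""
    bucket = 0
    while bucket < len(buckets):
        if buckets[bucket]:
            subtotal = 0
            while True:
                subtotal += buckets[bucket]
                bucket += 1
                if bucket >= len(buckets) or subtotal >= threshold:
                    break
            break
        bucket += 1
    return bucket


def find_bucket_single_loop(buckets, threshold):
    """Model of the new shape: accumulate from the start."""
    subtotal = 0
    bucket = 0
    while True:
        subtotal += buckets[bucket]
        bucket += 1
        if bucket >= len(buckets) or subtotal >= threshold:
            break
    return bucket


samples = [
    ([0, 0, 3, 4, 0, 9], 5),
    ([0, 0, 0], 1),
    ([5], 1),
    ([1, 0, 0, 2, 0, 7], 3),
]
results = [(find_bucket_dual_loop(b, t), find_bucket_single_loop(b, t))
           for b, t in samples]
```

Leading zero buckets add nothing to the subtotal, so as long as the threshold is positive the single loop walks past them and stops at the same index as the dual-loop version.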
Andrew Evans [Tue, 28 Jun 2011 20:17:54 +0000 (13:17 -0700)]
bridge: Populate interface status/statistics as soon as a port is added.
Currently there's a lag of up to five seconds before the status and statistics
columns in the Interface table are populated when a port is first added to a
bridge. This may confuse systems that expect those columns to be populated
right away.
Ethan Jackson [Mon, 27 Jun 2011 20:18:19 +0000 (13:18 -0700)]
bond: Drop packets on backup slaves.
Currently, OVS accepts incoming traffic on all slaves participating
in a bond. In Linux active-backup bonding, all traffic which comes
in on backup slaves is dropped. This patch causes OVS to do the
same.
Jesse Gross [Thu, 23 Jun 2011 19:54:48 +0000 (12:54 -0700)]
datapath: Add missing header.
The internal dev vport really needs hardirq.h but doesn't depend
directly on it and has relied on it being included from other sources.
Recent kernels broke this, so explicitly add the header.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Wed, 22 Jun 2011 17:37:18 +0000 (10:37 -0700)]
ovs-ofctl: Accept only valid flow_mod and flow_stats_request fields.
OpenFlow commands have several idiosyncratic fields that are used in some
cases and ignored in others. Until now, ovs-ofctl has been lax about
allowing some of them in places where they are ignored. This commit
tightens the checks to exactly what is allowed.
Ethan Jackson [Wed, 22 Jun 2011 20:42:52 +0000 (13:42 -0700)]
xenserver: Update all external_ids in tap interfaces.
Commit 400430 "xenserver: Give tap devices iface-ids." copies the
iface-id from vifs to their related tap device. It turns out this
is not sufficient, so this commit copies all relevant external_ids
over.
Requested-by: Pankaj Thakkar <thakkar@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Bug #5954.
Ben Pfaff [Wed, 22 Jun 2011 22:09:35 +0000 (15:09 -0700)]
configure: Fix --with-linux when environment contains KSRC.
When "configure"'s environment contains KSRC, "configure" would use it in
preference to the value specified on --with-linux. This caused a problem
for module-assistant builds in particular.
Ben Pfaff [Wed, 22 Jun 2011 16:26:31 +0000 (09:26 -0700)]
configure: Remove "26" from Linux variable names.
OVS used to support Linux 2.4 and Linux 2.6, but now it only supports
Linux 2.6. Linux 3.0 is coming up, and it's just an evolution of 2.6, so
OVS should stop referring to it as "2.6".
This takes a first step by removing "26" from internal variable names.
There should be no user-visible changes.
Ben Pfaff [Tue, 21 Jun 2011 23:40:44 +0000 (16:40 -0700)]
Avoid inserting duplicate iptables rules when restarting vswitch.
On startup, some OVS initscripts insert an iptables rule to allow GRE
traffic (because GRE support is an important OVS feature). I noticed that,
each time I restarted OVS, this added another GRE-related rule to the
iptables chain. This is wasteful, because each additional rule increases
the time it takes to process a packet in the IP stack.
This commit avoids the problem by inserting an iptables rule only when there
isn't already an appropriate rule. It also avoids inserting an iptables
rule if the iptables policy is ACCEPT, meaning that packets are accepted
by default; in such a case, if the GRE packet would be dropped, it is
because the system administrator made that decision explicitly.
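The decision described above reduces to two checks, sketched here as a pure function. The rule string GRE_RULE and the way the policy and existing rules are represented are assumptions for this example; the initscripts themselves are shell and are not reproduced.

```python
# Hypothetical textual form of the rule the initscript would insert.
GRE_RULE = "-p gre -j ACCEPT"


def should_insert_gre_rule(policy, existing_rules, rule=GRE_RULE):
    """Decide whether a GRE accept rule needs to be inserted.

    policy is the chain's default policy (e.g. "ACCEPT" or "DROP") and
    existing_rules is a list of rule specifications already in the chain.
    """
    if policy == "ACCEPT":
        # Packets are accepted by default; leave the chain alone.
        return False
    # Insert only when an equivalent rule is not already present.
    return rule not in existing_rules
```

Called once per restart, this keeps the chain from growing: after the first insertion the rule is found and nothing further is added.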
Ben Pfaff [Tue, 21 Jun 2011 22:12:49 +0000 (15:12 -0700)]
ovs-ctl: Use "action" to print success or failure directly.
This displays errors in whatever fashion the distro prefers, which seems
like a good idea. We have to use a shell function so that the redirection
to a temporary file doesn't write the messages for the admin to the file
instead of the console.
Ben Pfaff [Mon, 20 Jun 2011 23:17:44 +0000 (16:17 -0700)]
ovsdb-idl: Plug hole in state machine.
The state machine didn't have a proper state for "not yet committed or
aborted", which meant that destroying an ovsdb_idl_txn without committing
or aborting it caused a segfault. This fixes the problem by adding a new
state TXN_UNCOMMITTED to the state machine.
This is related to commit 79554078d "ovsdb-idl: Fix bad logic in
ovsdb_idl_txn_commit() state transitions", which fixed a related bug.
Ben Pfaff [Mon, 20 Jun 2011 22:06:33 +0000 (15:06 -0700)]
vlog: Add a little more detail to ratelimit messages
When a message is suppressed by vlog ratelimiting, and then that message
occurs again much later, sometimes we get messages like this:
Dropped 4 log messages in last 8695 seconds due to excessive rate
It seems pretty clear in this case that in fact we just didn't get that
kind of message for most of that 8695 seconds. This commit improves the
message by adding a little more detail, e.g.:
Dropped 4 log messages in last 8695 seconds (most recently, 6697 seconds
ago) due to excessive rate.
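A minimal model of the extra bookkeeping is sketched below: besides the drop count and the time of the first drop, it also remembers the time of the most recent drop. The class and method names are hypothetical and the real vlog rate limiter is not shown.

```python
class DropTracker:
    """Track suppressed messages so that the summary can say when."""

    def __init__(self):
        self.n_dropped = 0
        self.first_dropped = None
        self.last_dropped = None

    def drop(self, now):
        """Record that a message was suppressed at time now (seconds)."""
        if self.n_dropped == 0:
            self.first_dropped = now
        self.last_dropped = now
        self.n_dropped += 1

    def summary(self, now):
        """Return the summary line and reset, or None if nothing dropped."""
        if self.n_dropped == 0:
            return None
        message = ("Dropped %d log messages in last %d seconds "
                   "(most recently, %d seconds ago) due to excessive rate"
                   % (self.n_dropped, now - self.first_dropped,
                      now - self.last_dropped))
        self.n_dropped = 0
        self.first_dropped = self.last_dropped = None
        return message


# Usage: four drops early on, then a long quiet period.
tracker = DropTracker()
for t in (0, 500, 1000, 1998):
    tracker.drop(t)
message = tracker.summary(8695)
```

The "most recently" figure makes it obvious when the drops happened long ago rather than being spread over the whole interval.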
Ethan Jackson [Tue, 21 Jun 2011 18:24:52 +0000 (11:24 -0700)]
schema: Update schema version due to xenserver changes.
Commit 32abfca0 "xenserver: New iface-status external id." and
Commit 40043044 "xenserver: Give tap devices iface-ids.", changed
the way a controller interprets the external_ids column of the
Interface table. This patch increments the schema version number
to reflect that change.
Ben Pfaff [Thu, 16 Jun 2011 21:02:10 +0000 (14:02 -0700)]
Reduce log level for ENODEV errors getting Ethernet address.
Bug #5844 reports several log messages of the form:
netdev_linux|ERR|ioctl(SIOCGIFHWADDR) on vif426.1 device failed: No
such device
during migrations. These are normal and unavoidable, because the vifs
disappear from the kernel before they are removed from the OVS
database. Reduce the log level to avoid making people worry.
Ethan Jackson [Thu, 16 Jun 2011 23:37:18 +0000 (16:37 -0700)]
xenserver: New iface-status external id.
The iface-status external id indicates to a controller which device
it should manage when there are multiple choices for a given vif.
Currently, it always chooses a tap device if available, but one
could imagine more sophisticated strategies in the future.
Ethan Jackson [Thu, 16 Jun 2011 22:02:50 +0000 (15:02 -0700)]
xenserver: Give tap devices iface-ids.
In some cases XenServer will give a virtual machine a tap device in
addition to its usual vif. These tap devices need iface-ids so
that controllers can figure out which vif they are related to.
Andrew Evans [Fri, 17 Jun 2011 19:24:54 +0000 (12:24 -0700)]
ovs-ofctl: Print the offending flow on parse error when reading from a file.
When an error is encountered while parsing flows from a file, ovs-ofctl doesn't
print the erroneous flow, so it's not always obvious which flow is causing
the error. Print the flow before the error message to make it clear.
David Tsai [Fri, 17 Jun 2011 06:13:24 +0000 (23:13 -0700)]
xenserver: allow dom0 traffic in secure pool host when controller unavailable.
A pool configured for secure fail-mode can block dom0 traffic on hosts joining
the pool or if the host reboots while the controller is unavailable. This
commit sets default flows on a host under these conditions to allow management
traffic. Once the connection with the controller is re-established, these
default flows are replaced by the controller.
tests/interface-reconfigure.at updated by Ben Pfaff.
NIC-376.
Signed-off-by: David Tsai <dtsai@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Wed, 15 Jun 2011 18:50:24 +0000 (11:50 -0700)]
stream-ssl: Clear CAs for certificate verification before adding new ones.
If the CA certificate changed and OVS added the new CA certificate, the
change was ineffective. Clearing the certificate store before adding the
new CA certificate fixes the problem.
I don't know exactly why this fixes the problem, but in my testing it does.
Bug #2921.
Reported-by: Dan Wendlandt <dan@nicira.com>
Reported-by: Pierre Ettori <pettori@nicira.com>
Justin Pettit [Fri, 17 Jun 2011 01:04:41 +0000 (18:04 -0700)]
netdev-vport: Don't use ipsec options for either arg in config_equal_ipsec().
Commit aebf423 (netdev: Add methods to do netdev-specific argument
comparisons.) added a new config_equal_ipsec() function to ignore
IPsec key options when comparing an existing netdev's options with a new
netdev. We only ignored the options for the new netdev configuration,
which works when pulling the existing configuration from the kernel.
Unfortunately, if this is just a re-init of a netdev that we just
created, ignoring the IPsec key options on only the new netdev will
cause the check to fail, since the full options are actually available in
both netdevs. This commit just ignores all IPsec key options from both
netdevs.
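The fix amounts to filtering the key options out of both configurations before comparing them. The sketch below uses plain dicts; the option names in IPSEC_KEY_OPTIONS and the function names are assumptions for illustration, not the identifiers used in netdev-vport.

```python
# Hypothetical set of IPsec key option names to ignore in comparisons.
IPSEC_KEY_OPTIONS = frozenset(["psk", "certificate", "peer_cert"])


def without_ipsec_keys(options):
    """Return a copy of options with the IPsec key options removed."""
    return {k: v for k, v in options.items() if k not in IPSEC_KEY_OPTIONS}


def config_equal_ignoring_ipsec(existing, new):
    """Compare two option sets, ignoring IPsec keys on BOTH sides."""
    return without_ipsec_keys(existing) == without_ipsec_keys(new)


def config_equal_one_sided(existing, new):
    """Model of the earlier behavior: only the new side is filtered."""
    return existing == without_ipsec_keys(new)


# Usage: a re-init where both configurations carry the full options.
existing = {"remote_ip": "192.0.2.1", "psk": "secret"}
new = {"remote_ip": "192.0.2.1", "psk": "secret"}
```

With the one-sided filter, two identical configurations compare as different whenever the existing one still carries its key options, which is the re-init case described above.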
Jesse Gross [Thu, 16 Jun 2011 22:32:26 +0000 (15:32 -0700)]
datapath: Use consume_skb() on non-errors.
It's possible to trace kfree_skb() call sites to find out where
packets are getting dropped. Situations where kfree_skb() does
not actually indicate an error add noise, so use
consume_skb() instead to avoid tracing non-errors.
Suggested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Fri, 27 May 2011 22:53:49 +0000 (15:53 -0700)]
datapath: Further mirror checksum offloading state on old kernels.
Older kernels (those before 2.6.22) rely on implicit assumptions
to determine checksum offloading status. These assumptions tend
to break down when doing switching because it sits in the middle
of the transmit and receive path. Newer kernels deal with this
problem by adding more explicit information about how to checksum.
This replicates that behavior by mirroring the state from newer
kernels in private OVS storage on the kernels that lack it. On
ingress and egress we then map that state onto the appropriate
location for the given kernel and can consistently manipulate it
within OVS. Some of this was already done for the checksum type
but this makes it more robust and expands it to the checksum start
and offset as well.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Wed, 8 Jun 2011 00:11:02 +0000 (17:11 -0700)]
datapath: Drop set_skb_csum_bits().
Various older kernels have had different bugs with copying checksum
state when a complete copy of a packet is made. However, it is not
actually necessary to make these copies and all occurrences have
now been removed. Therefore, we can also remove the workarounds to
deal with these bugs.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Wed, 8 Jun 2011 00:09:35 +0000 (17:09 -0700)]
tunneling: Avoid extra copying if expanding headroom.
Currently, if we need additional headroom before encapsulating a
packet, a clone is made before expanding headroom; if we are
just trying to make the headroom writable, we copy both
the struct sk_buff and the paged data. Both of these are unnecessary
and we end up freeing the original copy. We can remove these copies
and simplify the code by just expanding the linear data area.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Tue, 7 Jun 2011 02:17:25 +0000 (19:17 -0700)]
datapath: Simplify make_writable().
The current implementation of make_writable() is both overly complex
and unnecessarily aggressive about copying data. We can improve
performance by only making a copy of the data if someone else holds
a reference to the portion of the data that we want to modify. This
means that if a clone is held by the TCP stack for retransmission then
we do not need to make a copy if we are changing the IP header because
it will get regenerated on retransmit anyways. Even when it is necessary
to copy we avoid duplicating struct sk_buff.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Mon, 6 Jun 2011 23:11:47 +0000 (16:11 -0700)]
datapath: Use strip_vlan() for modify_vlan_tci().
The semantics for setting a vlan tag are to modify the existing tag
if one exists. This can be expressed as removing the existing tag
first and then adding a new one. This simplifies the code by not
requiring two copies of the logic that manipulates non-accelerated
vlans and should not make a performance difference because the vlan
tag is contained in a single cache line.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
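Expressing "set the tag" as "strip, then push" can be shown on a raw Ethernet frame held in a bytes object. This is only a userspace illustration of the idea for non-accelerated (in-packet) tags; the function bodies are not the datapath's, and push_vlan is a hypothetical helper name.

```python
import struct

ETH_P_8021Q = 0x8100  # 802.1Q tag protocol identifier
ETH_ADDRS_LEN = 12    # destination MAC + source MAC
VLAN_TAG_LEN = 4      # TPID + TCI


def strip_vlan(frame):
    """Remove the 802.1Q tag if the frame carries one."""
    if (len(frame) >= ETH_ADDRS_LEN + VLAN_TAG_LEN and
            struct.unpack_from("!H", frame, ETH_ADDRS_LEN)[0] == ETH_P_8021Q):
        return frame[:ETH_ADDRS_LEN] + frame[ETH_ADDRS_LEN + VLAN_TAG_LEN:]
    return frame


def push_vlan(frame, tci):
    """Insert an 802.1Q tag with the given TCI after the MAC addresses."""
    tag = struct.pack("!HH", ETH_P_8021Q, tci)
    return frame[:ETH_ADDRS_LEN] + tag + frame[ETH_ADDRS_LEN:]


def modify_vlan_tci(frame, tci):
    """Set the tag: remove any existing tag, then add the new one."""
    return push_vlan(strip_vlan(frame), tci)


# Usage: the result is the same whether or not a tag was already present.
untagged = bytes(range(12)) + b"\x08\x00" + b"payload"
tagged_10 = modify_vlan_tci(untagged, 10)
retagged_20 = modify_vlan_tci(tagged_10, 20)
```

Only one piece of code needs to know how to remove a tag and one how to add it, so the modify path carries no tag-manipulation logic of its own.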