Ben Pfaff [Thu, 12 May 2011 16:58:01 +0000 (09:58 -0700)]
Implement basic multiple table support.
This implements basic multiple table support in ofproto and supporting
libraries and utilities. The design is the same as the one that has been
on the Open vSwitch "wdp" branch for a long time. There is no support for
multiple tables in the software switch implementation (ofproto-dpif), only
a set of hooks for other switch implementations to use.
To allow controllers to add flows in a particular table, Open vSwitch adds
an OpenFlow 1.0 extension called NXT_FLOW_MOD_TABLE_ID.
Ben Pfaff [Tue, 26 Apr 2011 21:25:00 +0000 (14:25 -0700)]
ofproto: Drop ofproto_rule_lookup().
There's no reason not to implement this trivial function in ofproto-dpif,
especially since it makes less sense once multiple table support is
implemented (which table should be searched?).
Ben Pfaff [Wed, 11 May 2011 21:06:48 +0000 (14:06 -0700)]
ofproto: Make rule construction and destruction more symmetric.
Before, ->rule_construct() both created the rule and inserted into the
flow table, but ->rule_destruct() only destroyed the rule. This makes
->rule_destruct() also remove the rule from the flow table.
Ben Pfaff [Tue, 26 Apr 2011 20:09:24 +0000 (13:09 -0700)]
classifier: Remove OF1.0 special case from classifier_find_rule_exactly().
This special case should never have actually triggered in practice, because
OpenFlow 1.0 cannot set up an exact-match rule as defined by
flow_wildcards_is_exact(). (OpenFlow 1.0 will always, for example,
wildcard all NXM registers.)
OVS implements this OF1.0 special case differently, by changing flow
priority to 65535 in cls_rule_from_match() if the flow is an exact match as
defined by OpenFlow 1.0.
Ben Pfaff [Tue, 26 Apr 2011 19:31:12 +0000 (12:31 -0700)]
ofproto: Eliminate reference to dpif_upcall from ofproto.
The dpif_upcall structure is specific to the ofproto-dpif implementation.
The generic ofproto and connmgr interface have no business using it, so
this commit switches to using ofputil_packet_in instead.
Ben Pfaff [Wed, 11 May 2011 19:13:10 +0000 (12:13 -0700)]
ofproto: Break apart into generic and hardware-specific parts.
In addition to the changes to ofproto, this commit changes all of the
instances of "struct flow" in the tree so that the "in_port" member is an
OpenFlow port number. Previously, this member was an OpenFlow port number
in some cases and an ODP port number in other cases.
Ben Pfaff [Mon, 9 May 2011 16:33:02 +0000 (09:33 -0700)]
ofproto: Complete abstraction by adding enumeration and deletion functions.
This eliminates the final reference from bridge.c directly into the dpif
layer, which will make it easier to change the implementation of ofproto
to support other lower layers.
Ben Pfaff [Mon, 9 May 2011 16:24:39 +0000 (09:24 -0700)]
ofproto: Improve abstraction by using OpenFlow port numbers in interface.
Until now, ofproto has used a mix of datapath and OpenFlow port numbers in
its client interface. This commit changes it to use OpenFlow port numbers
exclusively, to raise the level of abstraction.
Most of this commit boils down to simple search-and-replace with a few
calls to ofp_port_to_odp_port() sprinkled in. The addition of ofproto_port
is one exception. An ofproto_port is almost the same as a dpif_port; the
difference is just that its port number is an OpenFlow port number instead
of a datapath port number.
Ben Pfaff [Fri, 6 May 2011 22:04:29 +0000 (15:04 -0700)]
dpif: Improve abstraction by making 'run' and 'wait' functions per-dpif.
Until now, the dp_run() and dp_wait() functions had to be called at the top
level of the program because they applied to every open dpif. By replacing
them by functions that take a specific dpif as an argument, we can call
them only from ofproto, which is currently the correct layer to deal with
dpifs.
Ben Pfaff [Wed, 11 May 2011 19:26:06 +0000 (12:26 -0700)]
bridge: Move packet processing functionality into ofproto.
Until now, packet processing in ovs-vswitchd has been split between two
components: ofproto, for basic OpenFlow functionality, and bridge, for
OFPP_NORMAL processing. This architecture will not work as Open vSwitch
starts to support a wider variety of underlying hardware, because it
imposes a model in which the bridge needs to be able to look at every
exact-match flow within an OpenFlow flow, which most hardware doesn't
support.
Therefore, this commit moves all of the packet processing code in
bridge into ofproto, as preparation for generalizing further.
Jesse Gross [Tue, 10 May 2011 18:48:36 +0000 (11:48 -0700)]
datapath: Pull data into linear area only on demand.
We currently always pull 64 bytes of data (if it exists) into the
skb linear data area when parsing flows. The theory behind this
is that the data should always be there and it's enough to parse
common flows. However, this causes a number of problems in
different situations. The first is that it is not enough to handle
IPv6 so we must pull additional data anyways. However, the main
problem is that GRO typically allocates a new skb and puts just the
headers in there. For a typical TCP/IPv4 packet there are 54 bytes
of headers, which means that we must possibly reallocate and copy
on every packet. In addition, GRO creates frag_lists with this
specific geometry in order to allow later segmentation if the packet
is forwarded to a device that does not support frag_lists. When
we pull additional data it changes the geometry and causes later
problems for the device. This patch instead incrementally pulls
data, which avoids these problems.
Signed-off-by: Jesse Gross <jesse@nicira.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
Justin Pettit [Tue, 10 May 2011 06:30:07 +0000 (23:30 -0700)]
xenserver: Fix bugs related to using xe-switch-network-backend in spec file.
Commit daf2ebb (xenserver: Use xe-switch-network-stack in RPM spec
file.) changed the spec file to use xe-switch-network-backend instead of
directly modifying "/etc/xensource/network.conf". It incorrectly
assumed that the command was in the search path. It also didn't take
into account that the command will remove the "openvswitch" service with
chkconfig. This commit fixes those errors.
Ben Pfaff [Tue, 10 May 2011 16:17:37 +0000 (09:17 -0700)]
stream-ssl: Improve messages when configuring SSL if it is unsupported.
Previously, if --private-key or another option that requires SSL support
was used, but OVS was built without OpenSSL support, then OVS would fail
with an error message that the specified option was not supported. This
confused users because it made them think that the option had been removed:
http://openvswitch.org/pipermail/discuss/2011-April/005034.html
This commit improves the error message: OVS will now report that it was
built without SSL support. This should make the problem clear to users.
Ethan Jackson [Sat, 7 May 2011 00:02:02 +0000 (17:02 -0700)]
bridge: Don't configure QoS without Queues.
It doesn't make sense to create a QoS object without any queues.
Before this patch, OVS would configure the QoS object and as a
result drop all traffic going through the affected interface. With
this patch, OVS will simply clear QoS configuration on the
interface.
Ethan Jackson [Mon, 2 May 2011 20:15:59 +0000 (13:15 -0700)]
ofproto: Resubmit statistics improperly account during failover.
In some cases, when a facet's actions change because it resubmits
into a different rule, it will account all packets it has accrued
in the datapath to the new rule. Due to the algorithm we are
using, it is acceptable for a facet to miscount at most 1 second
worth of packets in this manner. This patch implements the proper
behavior.
Generally speaking, when a facet is facet_put__() into the
datapath, the kernel returns the old flow's statistics so they may
be accounted for in user space. These statistics are generally
pushed down to the relevant facet's resubmit children. Before this
patch, facet_put__() did not compensate for the fact that many of
the statistics in the datapath may have been already pushed.
Thus the entire packet count stored in the datapath would be pushed
to its children instead of simply the packets which have accrued
since the last accounting. This patch fixes the behavior by
subtracting already accounted for packets from the statistics
returned by the datapath.
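The subtraction described above can be sketched roughly as follows. This is
only an illustration, not the actual facet_put__() code: the struct and
function names are hypothetical, and it assumes the datapath packet count
never goes backward between calls.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the counters kept per facet. */
struct facet_stats {
    uint64_t dp_packets;        /* Latest count reported by the datapath. */
    uint64_t accounted_packets; /* Packets already pushed to children. */
};

/* Returns only the packets accrued since the last accounting, and records
 * them as accounted, so that nothing is pushed to the children twice. */
static uint64_t
facet_unaccounted_packets(struct facet_stats *f, uint64_t dp_packets)
{
    uint64_t delta;

    /* Assumption of this sketch: datapath counts are monotonic. */
    assert(dp_packets >= f->accounted_packets);

    delta = dp_packets - f->accounted_packets;
    f->dp_packets = dp_packets;
    f->accounted_packets = dp_packets;
    return delta;
}
```

With this shape, a datapath count of 100 followed by a count of 130 pushes 100
and then 30 packets, rather than 100 and then 130.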
Ethan Jackson [Thu, 5 May 2011 23:52:56 +0000 (16:52 -0700)]
lacp: New "lacp-heartbeat" mode.
This commit creates a new heartbeat mode for LACP. This mode
treats LACP as a protocol simply for monitoring link status. It
strips out most of the sanity checks built into the protocol.
Addition of this mode makes "lacp-force-aggregatable" and
"lacp-strict" options obsolete so they are removed.
Ethan Jackson [Thu, 5 May 2011 23:01:11 +0000 (16:01 -0700)]
bond: Create new "bond-stable-id".
Stable bonding mode needs an ID to guarantee consistent slave
selection decisions across ovs-vswitchd instances. Before this
patch, we used the lacp-port-id for this purpose. However, LACP
places restrictions on how lacp-port-ids can be allocated which may
be inconvenient. This patch creates a special purpose
bond-stable-id other_config setting which allows users to tweak
this value directly.
xenserver: Use xe-switch-network-stack in RPM spec file.
The proper way to switch the networking back-end is to use the
"xe-switch-network-stack" command rather than directly modifying
"/etc/xensource/network.conf". Use that method in the spec file.
Ben Pfaff [Wed, 20 Apr 2011 22:13:46 +0000 (15:13 -0700)]
ofproto: Initialize ports immediately upon ofproto creation.
I don't see why we should delay initializing the ports to the first call
of ofproto_run1(). We originally did initialize the ports in
ofproto_create(), but back in January 2010 Jesse moved the call into
ofproto_run1() in commit 149f577a "netdev: Fully handle netdev lifecycle
through refcounting." The commit message doesn't explain why this
particular change was made, so I can only assume that it was important at
the time. Now, however, everything seems to work fine with initialization
done in the most logical place.
Ben Pfaff [Thu, 7 Apr 2011 21:43:14 +0000 (14:43 -0700)]
dpif: Better log unusual errors in dpif_port_query_by_name().
Logging these unusual errors at a low level means that we can remove a
bit of higher-level code from ofproto.
The ofproto change also changes behavior for these error cases, from doing
nothing to removing the port, but I think that's OK. I've never noticed
this log message.
Ben Pfaff [Fri, 8 Apr 2011 19:35:38 +0000 (12:35 -0700)]
ofproto: Add 'name' field to struct ofproto and use hmap instead of shash.
It's slightly inconvenient to call into dpif_name() just to get the name
of an ofproto. Furthermore, we're already keeping a copy of the ofproto's
name around, in the 'name' field of its shash_node. It seems easier all
around if we just keep the name right in the struct ofproto and use an
hmap instead of a shash.
Ben Pfaff [Tue, 5 Apr 2011 23:34:09 +0000 (16:34 -0700)]
ofproto: Rename ofproto_iface_*() functions to ofproto_port_*().
This makes ofproto use the term "port" consistently for a single
purpose (which is unfortunately different from the term "interface"
used in the OVS database, but at least it is now internally
consistent).
Ben Pfaff [Fri, 8 Apr 2011 20:50:21 +0000 (13:50 -0700)]
bridge: Reorder configuration.
This loses the bridge_run_one() before iface_configure_cfm(), which means
that CFM configuration can now take two reconfigurations in a row. That's
a regression that we had earlier, which had been fixed previously by commit
392730c42bb "bridge: Run once before configuring CFM". It will, however,
be fixed again in a later commit.
Ben Pfaff [Wed, 4 May 2011 17:18:23 +0000 (10:18 -0700)]
bridge: Get rid of bridge_get_all_ifaces(), bridge_fetch_dp_ifaces().
The bridge_get_all_ifaces() function is rather odd. It creates an shash
index over the "struct iface"s within a bridge, but there's already an
index over them (the 'iface_by_name' hmap in struct bridge) that the
iface_lookup() function searches. The only value it adds is to put the
names of bond fake ifaces into the index, but that's hardly worth it. We
can just search the existing hash table as needed, instead.
The bridge_fetch_dp_ifaces() function is also odd. It fetches the entire
mapping from port number to name from the dpif again, although this has
already been done twice. We can just merge this in with the second
iteration.
Ben Pfaff [Mon, 4 Apr 2011 21:11:16 +0000 (14:11 -0700)]
bridge: Change all_bridges from list to hmap (indexed by name).
This is more convenient for looking up a bridge by name. That makes
reconfiguration a little bit simpler, because there is no longer a need to
build a temporary index of existing bridges. I don't see any downsides.
Ben Pfaff [Tue, 29 Mar 2011 21:42:20 +0000 (14:42 -0700)]
Convert remaining network-byte-order "uint<N>_t"s into "ovs_be<N>"s.
I looked at almost every uint<N>_t in the tree to determine whether it was
really in network byte order, and converted the ones that were.
The only remaining ones, modulo my mistakes, are in openflow.h. I'm not
sure whether we should convert those, because there might be some value
in remaining close to upstream for this header.
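As a rough illustration of why a distinct type for network-byte-order fields
helps, here is a minimal sketch. The be16_sketch typedef and the hdr_sketch
helpers are hypothetical stand-ins, not the tree's ovs_be<N> definitions; the
point is only that conversions happen at the boundary where the field is read
or written.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit network-byte-order type. */
typedef uint16_t be16_sketch;

/* Hypothetical header with one field stored in network byte order. */
struct hdr_sketch {
    be16_sketch tp_port;
};

/* Stores a host-order port number in network byte order. */
static void
hdr_set_port(struct hdr_sketch *h, uint16_t host_port)
{
    h->tp_port = htons(host_port);
}

/* Reads the field back in host byte order. */
static uint16_t
hdr_get_port(const struct hdr_sketch *h)
{
    return ntohs(h->tp_port);
}
```

The type name documents the byte order at every declaration, so a reader can
tell at a glance that a bare comparison against a host-order constant would
be suspect.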
Ben Pfaff [Tue, 29 Mar 2011 18:32:25 +0000 (11:32 -0700)]
bridge: Inline iterate_and_prune_ifaces() and remove it.
The main reason that iterate_and_prune_ifaces() existed was because it was
somewhat inconvenient to iterate across all of the interfaces, especially
if anything needed to be deleted. Now that we've switched from arrays to
lists and hmaps, it's a bit easier, and certainly it's easier to read code
when there aren't any callbacks involved, so inline what this was doing.
This was the only remaining caller of iterate_and_prune_ifaces() so this
removes that function as well as the callback.
Ben Pfaff [Tue, 3 May 2011 18:03:08 +0000 (11:03 -0700)]
xenserver: Don't remove network.dbcache on uninstall.
network.dbcache was introduced by Open vSwitch for its own purposes, but
it has now migrated into the base install of XenServer, which uses it
whether Open vSwitch is installed or not, so we should no longer remove it
on package uninstall.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Reported-by: Bob Ball <bob.ball@citrix.com>
Ben Pfaff [Tue, 3 May 2011 17:51:06 +0000 (10:51 -0700)]
xenserver: Use .../extra not .../kernel/extra for kernel modules.
On XenServer, depmod.conf causes modules in /lib/modules/$(uname -r)/extra
to take priority over standard modules. Unfortunately, we were installing
our modules in /lib/modules/$(uname -r)/kernel/extra, which isn't special.
This commit fixes the problem.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Reported-by: Bob Ball <bob.ball@citrix.com>
Ethan Jackson [Mon, 2 May 2011 23:33:01 +0000 (16:33 -0700)]
vswitchd: Update schema version number.
Quite a few changes to LACP and bonding have gone in recently which
allowed additional other_config parameters on ports and interfaces.
These changes should have updated the vswitch.ovsschema version
number.
Ben Pfaff [Thu, 28 Apr 2011 18:13:53 +0000 (11:13 -0700)]
netdev-linux: New functions for converting netdev stats formats.
An upcoming commit will introduce another function that needs to convert
between rtnl_link_stats64 and netdev_stats, so it seemed best to just add
functions to do the conversion.
Andrew Evans [Sat, 30 Apr 2011 00:05:58 +0000 (17:05 -0700)]
tunneling: Add df_default and df_inherit tunnel options.
Split existing pmtud tunnel option's functionality into three. Existing pmtud
option still exists, but now governs only whether datapath sends ICMP frag
needed messages. New df_inherit option controls whether DF bit is copied from
packet inner header to outer tunnel header. New df_default option controls
whether DF bit is set if inner packet isn't IP or if df_inherit is disabled.
Suggested-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andrew Evans <aevans@nicira.com>
Feature #5456.
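One way to read the df_inherit and df_default description above is as a small
decision function. This is a sketch of that reading only; the function name
and parameters are hypothetical and this is not the datapath code.

```c
#include <stdbool.h>

/* Decides whether the outer tunnel header gets the DF bit.
 *
 * df_inherit:  copy DF from the inner header when possible.
 * df_default:  value used when the inner packet is not IP or when
 *              df_inherit is disabled.
 * inner_is_ip: whether the inner packet has an IP header at all.
 * inner_df:    the inner header's DF bit (ignored if !inner_is_ip). */
static bool
tunnel_outer_df(bool df_inherit, bool df_default,
                bool inner_is_ip, bool inner_df)
{
    if (df_inherit && inner_is_ip) {
        return inner_df;
    }
    return df_default;
}
```

The pmtud option is deliberately absent here, since per the description it
now governs only whether ICMP "fragmentation needed" messages are sent.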
Ethan Jackson [Fri, 29 Apr 2011 20:12:19 +0000 (13:12 -0700)]
dpif-linux: Recycle leaked ports.
When ports are deleted from the datapath they need to be added to
an LRU list maintained in dpif-linux so they may be reallocated.
When using vswitchd to delete the ports this happens automatically.
However, if a port is deleted directly from the datapath it is
never reclaimed by dpif-linux. If this happens often, eventually
no ports will be available for allocation and dpif-linux will fall
back to using the old, kernel implemented, allocation strategy.
This commit fixes the problem by automatically reclaiming ports
missing from the datapath whenever the list of ports in the
datapath is dumped.
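The reclaiming step might look something like the sketch below. It leaves out
the LRU ordering entirely and uses a hypothetical fixed-size pool; the names
are made up for illustration and do not come from dpif-linux.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PORTS 8             /* Hypothetical, tiny for illustration. */

/* Hypothetical view of which port numbers userspace believes are in use. */
struct port_pool {
    bool in_use[MAX_PORTS];
};

/* 'present[i]' is true if port i appeared in the latest datapath dump.
 * Any port that userspace thinks is in use but that the dump did not
 * report is marked free again.  Returns the number of ports reclaimed. */
static size_t
port_pool_reclaim_missing(struct port_pool *pool,
                          const bool present[MAX_PORTS])
{
    size_t n_reclaimed = 0;

    for (size_t i = 0; i < MAX_PORTS; i++) {
        if (pool->in_use[i] && !present[i]) {
            pool->in_use[i] = false;
            n_reclaimed++;
        }
    }
    return n_reclaimed;
}
```

Because the reconciliation runs on every dump, a port deleted behind
userspace's back is leaked only until the next dump.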
Ben Pfaff [Fri, 29 Apr 2011 17:49:06 +0000 (10:49 -0700)]
datapath: Drop parameters from execute_actions().
It's (almost) always easier to understand a function with fewer parameters,
so this removes the now-redundant sw_flow_key and actions parameters from
execute_actions(), since they can be found through OVS_CB(skb)->flow now.
This also necessarily moves loop detection into execute_actions().
Otherwise, the flow's actions could have changed between the time that the
loop was detected and the time that it was suppressed, which would mean
that the wrong (version of the) flow would get suppressed.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Thu, 28 Apr 2011 23:54:07 +0000 (16:54 -0700)]
datapath: Make every packet passing through the datapath have an sw_flow.
This way, it's always possible to get a packet's key or hash simply by
looking at its 'flow', without considering whether the packet came from
userspace or from a vport.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Thu, 28 Apr 2011 23:34:56 +0000 (16:34 -0700)]
datapath: Avoid freeing wild pointer in corner case.
In odp_flow_cmd_new_or_set(), if flow_actions_alloc() fails in the "new
flow" case, then flow_put() will kfree() the new flow's 'sf_acts' pointer,
but nothing has initialized that pointer. Initialize the pointer to NULL
to avoid the problem.
Found by inspection.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
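The general pattern behind this fix, shown as a userspace sketch with
malloc()/free() standing in for the kernel allocators. The flow_sketch type
and both functions are hypothetical; only the 'sf_acts' member name is
borrowed from the description above.

```c
#include <stdlib.h>

/* Hypothetical, cut-down flow with an owned actions pointer. */
struct flow_sketch {
    void *sf_acts;
};

/* Allocates a flow and initializes every owned pointer immediately, so
 * that any later error path can release the flow unconditionally. */
static struct flow_sketch *
flow_sketch_alloc(void)
{
    struct flow_sketch *flow = malloc(sizeof *flow);

    if (flow) {
        flow->sf_acts = NULL;
    }
    return flow;
}

/* Safe to call even if the actions were never allocated: free(NULL) is a
 * no-op.  Without the NULL initialization above, this would free a wild
 * pointer whenever allocating the actions failed. */
static void
flow_sketch_free(struct flow_sketch *flow)
{
    if (flow) {
        free(flow->sf_acts);
        free(flow);
    }
}
```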
Ben Pfaff [Thu, 28 Apr 2011 23:13:39 +0000 (16:13 -0700)]
datapath: No need to zero cb anymore in odp_packet_cmd_execute().
Before commit 3f19d399f "datapath: Fix mysterious GRE-over-IPSEC problems,"
'packet' in odp_packet_cmd_execute() was an skb cloned from one created by
Netlink, so its cb member wasn't necessarily zeroed. But that commit
changed 'packet' to be freshly allocated with __dev_alloc_skb(), which
means that cb is zeroed, so we don't have to do it again.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Some (broken) firewalls do not properly pass UDP fragments, which will
prevent IKE from completing. This commit enables the racoon option to
allow application-level fragmenting and allow security associations to
be created.
Ethan Jackson [Tue, 26 Apr 2011 22:39:58 +0000 (15:39 -0700)]
lacp: Allow configurable aggregation keys.
Users need the ability to manually set aggregation keys on a
per-slave basis in order to use some of the more advanced LACP
features. Most notably, LACP controlled active-backup bonding
requires fine grained aggregation key configuration.
Ben Pfaff [Tue, 26 Apr 2011 16:42:18 +0000 (09:42 -0700)]
Remove support for obsolete "tun_id_from_cookie" extension.
The "tun_id_from_cookie" OpenFlow extension predated NXM and supports only
a fraction of its features. Nothing (at Nicira, anyway) uses it any
longer. Support for it had been broken since January and it took until a
few days ago for anyone to complain, so it cannot be too important. This
commit removes it.
Ben Pfaff [Wed, 6 Apr 2011 22:31:22 +0000 (15:31 -0700)]
mac-learning: Change mac_learning_set_flood_vlans() to not take ownership.
These new semantics are less efficient in the case where the flood_vlans
actually changed, but that should be very rare.
There are no advantages to this change on its own, but upcoming commits
will add multiple layers between the code supplying the flood_vlans and
actually calling mac_learning_set_flood_vlans(). Consistency in this
multilayered interface seems valuable, and the rest of it does not transfer
ownership from the caller to the callee.
Ben Pfaff [Thu, 21 Apr 2011 23:34:51 +0000 (16:34 -0700)]
bond: Be more careful about adding and removing netdevs in the monitor.
The code was careless about updating the netdev_monitor. Newly added
slaves weren't added to the monitor until the next bond_reconfigure() call,
and netdevs were never removed from the monitor.
Ben Pfaff [Thu, 21 Apr 2011 23:25:41 +0000 (16:25 -0700)]
ofproto: Adjust netdev_monitor when switching netdevs.
This fixes a segfault in the "ofproto - mod-port" test. The segfault
should not occur--there must be a bug in the netdev_monitor or possibly
the netdev_dummy implementation--but the netdev_monitor_remove() and
netdev_monitor_add() calls are definitely wanted here in any case to ensure
that the new netdev, not the old one, is what gets monitored.
Ben Pfaff [Wed, 13 Apr 2011 18:10:44 +0000 (11:10 -0700)]
bridge: Tolerate missing Port and Interface records for local port.
Until now, ovs-vswitchd has been unable to configure IP addresses and
routes for bridges whose Bridge records lack a Port and an Interface
record for the bridge's local port (i.e. OFPP_LOCAL, the port with the
same name as the bridge itself). When such a bridge was reconfigured,
ovs-vswitchd would output a log message that worried people.
This commit fixes the internal limitation that led to the message being
printed.
Ben Pfaff [Wed, 20 Apr 2011 20:48:11 +0000 (13:48 -0700)]
ofproto: Rework and fix bugs in port change detection.
The OpenFlow port change detection code in update_port() is supposed to
send out an OFPT_PORT_STATUS message whenever an OpenFlow port is added or
removed or changes in some way. This commit fixes a number of bugs that
have persisted until now.
First, if a port with a given name is removed from the datapath and a new
port with the same name but a different port number is added to the
datapath, then update_port() would report this as a port "modify" change.
Reporting this as a "modify" seems likely to confuse controllers, which
have no reason to realize that the old port was deleted and may not
understand why a port that has not been reported as added would be
modified. (This scenario is more likely than before, because the Linux
datapath implementation no longer quickly reuses port numbers. This
problem has actually been reported in testing.) This commit fixes the
problem by changing update_port() to report a "delete" of the old port
followed by an "add" of the new port.
Second, suppose that a datapath initially has "eth1" on port 1 and "eth2"
on port 2. Then, "eth1" gets removed and "eth2" is reassigned to port 1.
If update_port() is first passed "eth2", then the old implementation would
have sent out an OpenFlow "modify" notification instead of "delete"
followed by "add", which is the same as the previous scenario. But as a
further wrinkle, it would have failed to remove "eth1", which meant that we
ended up with two "ofports" with port number 1! This commit fixes this
problem too.
Reported-by: David Tsai <dtsai@nicira.com>
Bug #5466.
NIC-372.
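The reporting rule described above can be sketched as a small function. This
is an illustration of the rule only, not update_port() itself; the enum, the
function, and the convention that a negative port number means "no such
port" are all assumptions of the sketch.

```c
#include <stdbool.h>

enum port_event {
    PORT_ADD,
    PORT_DELETE,
    PORT_MODIFY
};

/* Decides what to report for one port name.
 *
 * old_no:  port number previously known for this name, or < 0 if none.
 * new_no:  port number now reported by the datapath, or < 0 if none.
 * changed: whether other port properties differ (used only when the
 *          port number is unchanged).
 *
 * Fills 'events' in the order they should be sent and returns how many
 * were filled in (0, 1, or 2). */
static int
port_change_events(int old_no, int new_no, bool changed,
                   enum port_event events[2])
{
    int n = 0;

    if (old_no >= 0 && new_no >= 0 && old_no == new_no) {
        if (changed) {
            events[n++] = PORT_MODIFY;
        }
    } else {
        /* A different port number means a different port: report the old
         * one as deleted before reporting the new one as added. */
        if (old_no >= 0) {
            events[n++] = PORT_DELETE;
        }
        if (new_no >= 0) {
            events[n++] = PORT_ADD;
        }
    }
    return n;
}
```

In the "eth2 moves from port 2 to port 1" scenario, the same rule also has to
be applied to whatever previously occupied port 1, which is the part the old
code missed.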
Ben Pfaff [Wed, 20 Apr 2011 20:03:45 +0000 (13:03 -0700)]
ofproto: Consistently use netdev's name instead of ofp_phy_port name.
There are at least two ways to get an ofport's name: from the netdev using
netdev_get_name() or from the ofp_phy_port's 'name' member. Some code used
one, some used the other. This switches all relevant code to use only
netdev_get_name(), because the 'name' member in ofp_phy_port is
fixed-length and thus a long name could be truncated.
This isn't a problem under Linux since the maximum length of a network
device's name under Linux is the same as the field width in ofp_phy_port.
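To make the truncation concern concrete, here is a small sketch of copying a
name into a fixed-width field. The 16-byte width is what I believe the
OpenFlow 1.0 ofp_phy_port 'name' field uses; the macro and function names
are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed width of the fixed-length name field, including the
 * terminating null byte. */
#define PORT_NAME_FIELD_LEN 16

/* Copies 'name' into a fixed-width, null-padded field.  Returns true if
 * the name did not fit and had to be truncated. */
static bool
copy_port_name(char dst[PORT_NAME_FIELD_LEN], const char *name)
{
    size_t len = strlen(name);
    bool truncated = len >= PORT_NAME_FIELD_LEN;
    size_t n = truncated ? PORT_NAME_FIELD_LEN - 1 : len;

    memcpy(dst, name, n);
    memset(dst + n, 0, PORT_NAME_FIELD_LEN - n);
    return truncated;
}
```

A name such as "eth0" round-trips unchanged, while a longer name from a
platform without the Linux limit would come back shortened, which is why
looking names up through the netdev is the safer choice.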
Ethan Jackson [Wed, 20 Apr 2011 22:53:58 +0000 (15:53 -0700)]
bond: BM_STABLE consistent hashing.
This patch converts stable bonds from modulo n based hashing to
Highest Random Weight based hashing. This hashing strategy only
redistributes 1/n_slaves of the traffic when a slave is enabled or
disabled. It also turns out to have a vastly simpler
implementation.
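For readers unfamiliar with Highest Random Weight (rendezvous) hashing, a
minimal sketch follows. It is not the bond code: mix64() is a splitmix64-style
finalizer chosen here only for illustration, and hrw_choose() and its
parameters are hypothetical. Each enabled slave gets a weight derived from
the flow hash and the slave's stable ID, and the slave with the highest
weight wins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative 64-bit mixing function (splitmix64-style finalizer). */
static uint64_t
mix64(uint64_t x)
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Returns the index of the enabled slave with the highest weight for
 * 'flow_hash', or -1 if no slave is enabled.  Ties go to the lowest
 * index. */
static int
hrw_choose(uint64_t flow_hash, const uint32_t stable_ids[],
           const bool enabled[], size_t n_slaves)
{
    int best = -1;
    uint64_t best_weight = 0;

    for (size_t i = 0; i < n_slaves; i++) {
        uint64_t weight;

        if (!enabled[i]) {
            continue;
        }
        weight = mix64(flow_hash ^ mix64(stable_ids[i]));
        if (best < 0 || weight > best_weight) {
            best = (int) i;
            best_weight = weight;
        }
    }
    return best;
}
```

Disabling a slave only moves the flows for which that slave had the highest
weight; every other flow keeps its previous winner, which is where the
1/n_slaves redistribution property comes from. With modulo-n hashing,
changing n would instead reshuffle most flows.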