Ben Pfaff [Fri, 10 Dec 2010 22:39:25 +0000 (14:39 -0800)]
datapath: Include <linux/skbuff.h> directly into linux/ip.h compat.
While doing test builds on numerous kernel versions I found that one build
failed because skb_network_header() wasn't visible from flow.h. I guess
that we accidentally depend on <linux/skbuff.h> being included indirectly,
but this didn't always happen.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Fri, 10 Dec 2010 22:38:25 +0000 (14:38 -0800)]
datapath: Include <linux/netlink.h> directly into flow.h.
While doing test builds on numerous kernel versions I found that one build
failed because "struct nlattr" wasn't visible from flow.h. I guess that
we accidentally depend on <linux/netlink.h> being included indirectly, but
this didn't always happen.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Jesse Gross [Fri, 10 Dec 2010 00:40:15 +0000 (16:40 -0800)]
datapath: Fix memory leak when a loop is detected.
If we detect a packet that is looping we kill the flow but then
don't do anything with the packet that caused the problem in the
first place, so this frees the packet. This isn't a very serious
leak because we try to shut off the flow that led to the loop
as early as possible. Once this happens, packets will no longer
hit the loop detector and will be freed just as any other packet
that should be dropped.
It also fixes an issue where the offset to the stats counter is
uninitialized after a loop is detected.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Thu, 9 Dec 2010 07:55:20 +0000 (23:55 -0800)]
datapath: Drop check for impossible condition after skb_gso_segment().
It's possible for skb_gso_segment to return NULL but only if the
hardware supports the correct form of segmentation offload but just
wants software to verify the offload parameters. However, since we're
not hardware and don't support any kind of segmentation offload natively,
we can never get in this situation. Therefore drop the check and
comment.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Thu, 9 Dec 2010 07:29:10 +0000 (23:29 -0800)]
datapath: Drop synchronize_rcu() in internal dev destroy.
unregister_netdevice() contains a call to synchronize_rcu(), so there
is no need to directly call it ourselves immediately beforehand.
We were relying on the call during unregistration anyway to stop
packets from being transmitted on the device, so our version was
both misleading and had a performance penalty.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Thu, 9 Dec 2010 03:21:40 +0000 (19:21 -0800)]
datapath: Don't use RCU for internal dev vport.
The vports are now attached and ready to go when they are allocated,
so we don't have to worry about future changes. As a result, we can
directly store the pointer in the internal dev's netdevice private
space before it is registered. The registration process will handle
the necessary write memory barriers and anyone who has a reference
to the netdev will have done the read side barriers, so we don't need
to use RCU at all.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Justin Pettit [Sat, 11 Dec 2010 04:50:58 +0000 (20:50 -0800)]
ofproto: Fix problem that caused facets not to be installed into datapath.
Commit cdee00f (datapath: Replace "struct odp_action" by Netlink
attributes.) stopped initializing some elements in facet structures
in certain cases. This caused flows to not be installed into the datapath.
This commit sets that again based on the action context.
Ben Pfaff [Fri, 10 Dec 2010 18:42:42 +0000 (10:42 -0800)]
Expand tunnel IDs from 32 to 64 bits.
We have a need to identify tunnels with keys longer than 32 bits. This
commit adds basic datapath and OpenFlow support for such keys. It doesn't
actually add any tunnel protocols that support 64-bit keys, so this is not
very useful yet.
The 'arg' member of struct odp_msg had to be expanded to 64-bits also,
because it sometimes contains a tunnel ID. This member also contains the
argument passed to ODPAT_CONTROLLER, so I expanded that action's argument
to 64 bits also so that it can use the full width of the expanded 'arg'.
Userspace doesn't take advantage of the new space though (it was only
using 16 bits anyhow).
This commit has been tested only to the extent that it doesn't disrupt
basic Open vSwitch operation. I have not tested it with tunnel traffic.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Feature #3976.
Ben Pfaff [Fri, 10 Dec 2010 18:40:58 +0000 (10:40 -0800)]
datapath: Replace "struct odp_action" by Netlink attributes.
In the medium term, we plan to migrate the datapath to use Netlink as its
communication channel. In the short term, we need to be able to have
actions with 64-bit arguments but "struct odp_action" only has room for
48 bits. So this patch shifts to variable-length arguments using Netlink
attributes, which starts in on the Netlink transition and makes 64-bit
arguments possible at the same time.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
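To illustrate why variable-length attributes lift the 48-bit limit, here is a minimal userspace sketch of a Netlink-style attribute stream carrying a 64-bit argument. The header layout (16-bit length, 16-bit type, 4-byte alignment) matches struct nlattr; the names attr_hdr, attr_put() and attr_find_u64() are hypothetical and are not the functions this commit adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as Linux's struct nlattr: a 4-byte header, then the payload. */
struct attr_hdr {
    uint16_t len;   /* Header plus payload, excluding trailing padding. */
    uint16_t type;  /* Attribute type (here it would name the action). */
};

/* Attributes start on 4-byte boundaries. */
#define ATTR_ALIGN(n) (((size_t) (n) + 3) & ~(size_t) 3)

/* Appends one attribute at offset 'used' in 'buf'; returns the new offset. */
static size_t
attr_put(uint8_t *buf, size_t size, size_t used, uint16_t type,
         const void *payload, size_t payload_len)
{
    struct attr_hdr hdr;
    size_t total = ATTR_ALIGN(sizeof hdr + payload_len);

    assert(used + total <= size);
    hdr.len = (uint16_t) (sizeof hdr + payload_len);
    hdr.type = type;
    memset(buf + used, 0, total);           /* Zero the padding as well. */
    memcpy(buf + used, &hdr, sizeof hdr);
    memcpy(buf + used + sizeof hdr, payload, payload_len);
    return used + total;
}

/* Returns the 64-bit payload of the first attribute of 'type', or 0 if
 * no such attribute with an 8-byte payload is present. */
static uint64_t
attr_find_u64(const uint8_t *buf, size_t used, uint16_t type)
{
    size_t ofs = 0;

    while (ofs + sizeof(struct attr_hdr) <= used) {
        struct attr_hdr hdr;

        memcpy(&hdr, buf + ofs, sizeof hdr);
        if (hdr.len < sizeof hdr || ofs + hdr.len > used) {
            break;                          /* Malformed attribute: stop. */
        }
        if (hdr.type == type && hdr.len == sizeof hdr + sizeof(uint64_t)) {
            uint64_t value;

            memcpy(&value, buf + ofs + sizeof hdr, sizeof value);
            return value;
        }
        ofs += ATTR_ALIGN(hdr.len);
    }
    return 0;
}
```

Because each attribute states its own length, a 2-byte argument and an 8-byte argument can sit side by side in one stream, which a fixed-size action structure cannot do.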
Ben Pfaff [Tue, 7 Dec 2010 17:37:59 +0000 (09:37 -0800)]
netlink: New function nl_attr_type().
Linux since v2.6.24 has a couple of bits at the top of
nla_type that one is apparently supposed to ignore. This commit
starts doing that in Open vSwitch userspace.
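As a rough sketch of the masking involved: the two flag bits below have the values that <linux/netlink.h> gives NLA_F_NESTED and NLA_F_NET_BYTEORDER, repeated under SKETCH_ names so the example builds on any platform. The function sketch_attr_type() is a hypothetical stand-in; the actual nl_attr_type() signature is not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as NLA_F_NESTED and NLA_F_NET_BYTEORDER in <linux/netlink.h>. */
#define SKETCH_NLA_F_NESTED        (1u << 15)
#define SKETCH_NLA_F_NET_BYTEORDER (1u << 14)
#define SKETCH_NLA_TYPE_MASK \
    (~(SKETCH_NLA_F_NESTED | SKETCH_NLA_F_NET_BYTEORDER))

/* Only the low 14 bits of a 16-bit nla_type carry the type itself. */
static_assert((uint16_t) SKETCH_NLA_TYPE_MASK == 0x3fff,
              "type mask should keep the low 14 bits");

/* Returns the attribute type with the two flag bits stripped. */
static uint16_t
sketch_attr_type(uint16_t nla_type)
{
    return (uint16_t) (nla_type & SKETCH_NLA_TYPE_MASK);
}
```

Comparing the raw nla_type against an expected type would fail whenever the kernel sets one of the flag bits, which is why the comparison should go through a helper like this.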
Ben Pfaff [Fri, 10 Dec 2010 17:51:03 +0000 (09:51 -0800)]
netlink: Split into generic and Linux-specific parts.
The parts of the netlink module that are related to sockets are
Linux-specific, since only Linux has AF_NETLINK sockets. The rest can be
built anywhere. This commit breaks them into two modules, and builds the
generic one on all platforms.
Ben Pfaff [Tue, 7 Dec 2010 17:33:27 +0000 (09:33 -0800)]
netlink: Make netlink-protocol.h compatible with <linux/netlink.h>.
Until now, netlink-protocol.h and <linux/netlink.h> could not both be
included by a single source file, because they contained conflicting
definitions. This commit fixes the problem, by having netlink-protocol.h
delegate to <linux/netlink.h> where it is available.
Here's an example of the problem: odp-util.c includes
datapath-protocol.h and will also need netlink-protocol.h so that it can
look through actions defined as struct nlattr. datapath-protocol.h
includes <linux/if_link.h> for the definition of rtnl_link_stats64, and
<linux/if_link.h> includes <linux/netlink.h>.
Jesse Gross [Fri, 10 Dec 2010 01:52:39 +0000 (17:52 -0800)]
tunneling: Fix updated port pools commit.
If re-adding a tunnel to the table fails during move_port(), we
should decrement the port pool counter that it is in. However,
when I attempted to do this, I accidentally put it in add_port().
Jesse Gross [Wed, 8 Dec 2010 21:38:22 +0000 (13:38 -0800)]
datapath: Drop unused file ops.
There have been two ops to support async access to the datapath
character device for a long time but they have never been implemented.
Drop the commented out code.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Wed, 8 Dec 2010 21:19:05 +0000 (13:19 -0800)]
datapath: Hold mutex for DP while userspace is blocking on read.
Currently we get a pointer to the DP in openvswitch_read() and
openvswitch_poll() and use it without any synchronization. This means
that the DP could disappear from underneath us while we are using it.
Currently, this isn't a problem because userspace is single threaded but
it's better for the locking to be correct.
With this change we hold the mutex while doing a blocking wait, which
means that no changes can be made, including adding/removing flows. It's
possible to make this finer grained but for the time being that isn't done,
since current userspace doesn't care.
Found with lockdep.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Wed, 8 Dec 2010 20:02:42 +0000 (12:02 -0800)]
datapath: dp_sysfs_add_dp() needs RTNL lock.
We currently drop RTNL before adding a new datapath to sysfs but
then access the dp data structures. This moves the call to
dp_sysfs_add_dp() before we drop the locks to prevent a potential
race. All other calls to sysfs functions already hold RTNL.
Found with lockdep.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Mon, 6 Dec 2010 19:27:07 +0000 (11:27 -0800)]
datapath: RCU dereference correct pointer in table.
Our hash table implementation consists of two levels of buckets
and then arrays of pointers. The bucket arrays are fixed by the
size of the table, which is therefore protected by the RCU
dereference of the table pointer. The arrays change when items
are inserted or deleted. However, in tbl_insert/remove we need
to look at the old values and we do an rcu_dereference() on the
second level array instead of the bucket itself. Other places
that access the table for lookup do the pointer dereference in
the correct order.
Found by sparse.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Mon, 6 Dec 2010 19:26:16 +0000 (11:26 -0800)]
datapath: Don't rcu_dereference() objects in table.
Each time that we modify the flow/port table, we reallocate the
array of pointers to objects in a particular bucket. We then use
RCU to update the link to that bucket. This means that we don't
need to use RCU to access the individual object pointers, since
they are constant for a given instance of the bucket data structure.
This doesn't cause a problem per se (though it does restrict the
optimizations that the compiler can perform and adds a memory barrier
on Alpha). However, it is confusing and inconsistent since the
pointers are not protected by RCU and we don't use rcu_assign_pointer().
Found by sparse.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sun, 5 Dec 2010 20:03:49 +0000 (12:03 -0800)]
tunneling: Add missing rcu_dereference() to cache cleaner.
The cleaner for the header caching accesses the tunnel port table
without holding any locks. However, it doesn't have a read memory
barrier, so there is no guarantee that the contents of the table
have made it to the right CPU.
Found by sparse.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sat, 4 Dec 2010 23:17:56 +0000 (15:17 -0800)]
brcompat: Simplify generation of bridge ID.
Currently we use a fairly complicated method of generating the
bridge ID, since the actual struct is only available in a header
file private to the Linux bridge. The current method appears to
be correct but is difficult to reason about. This replaces it
with a simple memcpy, which is more analogous to what the Linux
bridge does.
Flagged by sparse.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sat, 4 Dec 2010 21:49:50 +0000 (13:49 -0800)]
capwap: Bind address should be big endian.
CAPWAP creates a UDP socket that accepts packets from any address using
INADDR_ANY. IP addresses should be in network byte order but that
constant is in host byte order, so use htonl. However, this is not a
real bug since the value of INADDR_ANY is 0.
Found with sparse.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
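The same convention applies to userspace sockets, so a small userspace sketch can show it. The helper fill_any_addr() and the port passed to it are hypothetical; the kernel code touched by this commit is not reproduced here.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Fills 'sin' with a wildcard IPv4 bind address for UDP port 'port',
 * where 'port' is given in host byte order. */
static void
fill_any_addr(struct sockaddr_in *sin, uint16_t port)
{
    assert(sin != NULL);
    memset(sin, 0, sizeof *sin);
    sin->sin_family = AF_INET;
    /* s_addr is in network byte order, so convert even though the
     * value of INADDR_ANY happens to be 0 in either byte order. */
    sin->sin_addr.s_addr = htonl(INADDR_ANY);
    sin->sin_port = htons(port);
}
```

Writing the conversion explicitly keeps byte-order checkers such as sparse quiet and stays correct if the constant is ever replaced by a nonzero address.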
Jesse Gross [Tue, 7 Dec 2010 02:03:37 +0000 (18:03 -0800)]
datapath: Try to avoid custom checksum update function.
Our update_csum() function was exactly the same as
inet_proto_csum_replace4() with the one exception that it uses our
checksum status fields on older kernels that need it. Unfortunately,
we can't completely move the code to the compat directory because it
relies on fields in OVS CB but we can at least exile it to checksum.h.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Tue, 7 Dec 2010 01:51:33 +0000 (17:51 -0800)]
datapath: Correctly update IP checksum with actions.
The update_csum() function that we currently use to update
checksums on actions is really intended for L4 checksums. In
particular, if the packet has a partial checksum and the field
is not in the pseudo header, it doesn't do anything at all.
This doesn't make sense for the IP header because Linux doesn't
use hardware offload for it, so we always need to recompute the
checksum. Instead, we can use the kernel function csum_replace4(),
which will always do the right thing.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
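For readers unfamiliar with incremental checksum updates, the following userspace sketch shows the idea behind replacing a 32-bit field without re-summing the whole header, using RFC 1624 equation 3. It is an illustration only, not the kernel's csum_replace4() implementation; the names fold(), checksum_full() and checksum_replace4() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Folds a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t
fold(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) sum;
}

/* Full Internet checksum over 'n' 16-bit words.  The checksum field
 * itself must be zero in 'words' when this is called. */
static uint16_t
checksum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;
    size_t i;

    assert(words != NULL || n == 0);
    for (i = 0; i < n; i++) {
        sum += words[i];
    }
    return (uint16_t) ~fold(sum);
}

/* RFC 1624 eqn. 3, HC' = ~(~HC + ~m + m'), applied to a 32-bit field
 * by treating it as two 16-bit halves. */
static uint16_t
checksum_replace4(uint16_t csum, uint32_t from, uint32_t to)
{
    uint32_t sum = (uint16_t) ~csum;

    sum += (uint16_t) ~(from >> 16);
    sum += (uint16_t) ~(from & 0xffff);
    sum += to >> 16;
    sum += to & 0xffff;
    return (uint16_t) ~fold(sum);
}
```

The update depends only on the old checksum and the old and new field values, so it always produces a valid header checksum regardless of any offload state tracked for L4 checksums.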
Jesse Gross [Sun, 5 Dec 2010 20:16:40 +0000 (12:16 -0800)]
tunneling: Update port pools on config change.
We keep track of the number of tunnels using the different types of
matching in order to avoid doing the lookup when there are no ports
of that type. However, when updating the configuration we weren't
changing the port pool counts, which could lead to incorrectly not
finding a tunnel on receive.
Jesse Gross [Sat, 4 Dec 2010 17:43:35 +0000 (09:43 -0800)]
datapath: Convert patch vport to use call_rcu() on destruction.
Since patch ports are virtual devices, we can potentially have many
of them in a datapath. Currently we have a call to synchronize_rcu()
each time we destroy one, which can be expensive if we are deleting a
datapath with many ports. This converts it to use call_rcu() instead,
which allows us to wait for only a single RCU grace period independent
of the number of ports.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sat, 4 Dec 2010 03:17:20 +0000 (19:17 -0800)]
tunneling: Access correct IP header when processing ECN.
We attempt to copy the ECN bits from the outside of the tunnel to
the inside on receive if we are encapsulating IP traffic. However,
we were previously looking at the inner IP header as the source of
the ECN bits, when it should have been the outer header. This
corrects that and cleans up the function a little bit.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
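A minimal sketch of the direction of the copy, operating on bare TOS bytes. It assumes the usual decapsulation rule (propagate Congestion Experienced from the outer header only when the inner packet is ECN-capable); whether the tunneling code applies exactly this rule is an assumption, and ecn_decapsulate() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define ECN_MASK    0x03u   /* Low two bits of the IPv4 TOS byte. */
#define ECN_NOT_ECT 0x00u   /* Sender is not ECN-capable. */
#define ECN_CE      0x03u   /* Congestion Experienced. */

static_assert((ECN_CE & ECN_MASK) == ECN_CE, "CE must fit in the ECN field");

/* Returns the inner TOS byte to use after decapsulation.  The OUTER
 * header is the source of the congestion mark; the inner header is
 * only consulted to see whether it may carry one. */
static uint8_t
ecn_decapsulate(uint8_t outer_tos, uint8_t inner_tos)
{
    int outer_ce = (outer_tos & ECN_MASK) == ECN_CE;
    int inner_capable = (inner_tos & ECN_MASK) != ECN_NOT_ECT;

    if (outer_ce && inner_capable) {
        inner_tos |= ECN_CE;
    }
    return inner_tos;
}
```

Real code that rewrites the inner TOS byte must also update the inner IP header checksum, which this sketch leaves out.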
Jesse Gross [Sat, 4 Dec 2010 02:06:23 +0000 (18:06 -0800)]
tunneling: Remove call to eth_type_trans() on receive.
On receive we call eth_type_trans() to set skb->protocol. However,
that function also sets skb->pkt_type, which requires several
comparisons to MAC addresses. Nothing in OVS cares about pkt_type,
so this is wasteful. If we actually do egress to the IP stack
through an internal device then we'll call eth_type_trans() to get
everything set up correctly. It's possible for device drivers to
see an incorrect pkt_type or not correctly parse legacy IPX (which
eth_type_trans() also handles) but it's highly unlikely that they
will care.
Ben Pfaff [Thu, 9 Dec 2010 22:52:44 +0000 (14:52 -0800)]
ofproto: Change xlate_actions() to take a structure.
An upcoming commit has a need to give xlate_actions() another parameter,
but it already has too many. This commit improves the situation by making
xlate_actions()'s caller fill in a structure instead.
The action_xlate_ctx structure is kind of big and unwieldy because it
includes a struct odp_actions, which is about 16 kB. But work underway will
change that to a "struct ofpbuf", which is much more reasonable.
Ben Pfaff [Tue, 7 Dec 2010 22:30:07 +0000 (14:30 -0800)]
ovs-ofctl: Fix del-flows command parsing bugs.
"ovs-ofctl del-flows br0" segfaulted because do_flow_mod__() assumed that
it always had a "flow" argument, which is not true for the del-flows
command.
Beyond that, parse_ofp_flow_mod_str() rejected "ovs-ofctl del-flows
br0" because no actions were supplied, even though supplying actions
doesn't make sense for deleting flows.
This commit fixes both problems and adds a simple test that would have
caught both problems.
Ben Pfaff [Wed, 8 Dec 2010 22:26:37 +0000 (14:26 -0800)]
ovsdb-idl: Check prerequisites for ovsdb_idl_txn_verify() also.
The IDL can only verify prerequisites for columns that it is monitoring,
but it didn't check for that. This assertion (which is the same as one in
ovsdb_idl_txn_write()) should alert us to such problems.
This would have found the problem fixed by the previous commit.
Ben Pfaff [Wed, 8 Dec 2010 22:24:59 +0000 (14:24 -0800)]
ovs-vsctl: Fix controller command prerequisites.
The controller commands use the "target" column of the Controller table,
but they don't supply it as a prerequisite, which makes those commands
hang. This commit fixes the problem.
Justin Pettit [Wed, 8 Dec 2010 05:57:09 +0000 (21:57 -0800)]
ofp-print: Print Nicira error extension messages.
Currently, Nicira error messages are non-overlapping with the OpenFlow
error definitions. This commit takes advantage of that by not taking
into account the vendor id. Printing error messages is likely to be
overhauled before long, and a more general approach can be taken then.
Ben Pfaff [Wed, 8 Dec 2010 20:05:20 +0000 (12:05 -0800)]
ofp-print: Print each flow at the start of a line.
Before this commit, the first flow in "ovs-ofctl dump-flows" output was
printed on the same line as the OpenFlow message type name and the xid.
With this commit, that flow is put on a line of its own, like all of the
other flows in the output.
Requested-by: Paul Ingram <paul@nicira.com> CC: Paul Ingram <paul@nicira.com>
Ben Pfaff [Tue, 7 Dec 2010 20:17:03 +0000 (12:17 -0800)]
odp-util: Bump up maximum number of ODP actions.
The kernel supports more than a single page of actions now, so userspace
should be able to take advantage of this.
Upcoming commits will completely replace this data structure but this
commit makes the bug fix clear and is suitable for cherry-picking to
long-term support branches.
Ben Pfaff [Tue, 7 Dec 2010 00:11:55 +0000 (16:11 -0800)]
ofp-util: Fully initialize flow_wildcards in ofputil_cls_rule_from_match().
The new 'zero' member was not being properly initialized. One approach
would be to add an assignment, but it seems more future-proof to let
flow_wildcards_init_catchall() do the right thing.
The old formatting was only good enough for debugging, but now we need to
be able to format cls_rules as part of ofp-print.c. This new code is
modeled after ofp_match_to_string().
Ben Pfaff [Tue, 7 Dec 2010 20:45:24 +0000 (12:45 -0800)]
ofp-util: New abstractions for flow_mod, flow_stats_request.
These will be useful for adding Nicira Extended Match support to ovs-ofctl.
This commit makes ofproto use the new flow_mod abstraction, but not the
new flow and aggregate stats abstraction. The latter takes a bit more
infrastructure that I haven't finished yet.
Ben Pfaff [Thu, 2 Dec 2010 22:15:33 +0000 (14:15 -0800)]
nicira-ext: Clarify and fix macros to check for NXM metadata registers.
The NXM_IS_NX_REG macro didn't check the "hasmask" bit, which meant that it
looked like it was supposed to match both exact and wildcarded NXM headers,
e.g. both NXM_NX_REG0 and NXM_NX_REG0_W. But exact and wildcarded NXM
headers differ not just in the "hasmask" bit but in the "length" value
also (the wildcarded version's length is twice the exact version's length),
so this was not what it actually did.
The only current users of NXM_IS_NX_REG actually only want to match exact
versions, so this commit makes it only match those. It also adds a new
NXM_IS_NX_REG_W macro that matches only wildcarded versions. This new
macro has no users yet, but its existence should help to make it clear that
NXM_IS_NX_REG only matches exact NXM headers.
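The distinction can be sketched as follows. The sketch assumes a 32-bit NXM header laid out as 16 bits of vendor, 7 bits of field, 1 "hasmask" bit and 8 bits of length, with registers numbered from field 0 and up to 16 of them; the SK_ macros and sk_ functions are hypothetical stand-ins, not the definitions in nicira-ext.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: vendor(16) | field(7) | hasmask(1) | length(8). */
#define SK_HEADER(VENDOR, FIELD, HASMASK, LENGTH)                  \
    (((uint32_t) (VENDOR) << 16) | ((uint32_t) (FIELD) << 9) |     \
     ((uint32_t) (HASMASK) << 8) | (uint32_t) (LENGTH))

/* Exact form: hasmask clear, 4-byte value. */
#define SK_REG(IDX)   SK_HEADER(0x0001, IDX, 0, 4)
/* Wildcarded form: hasmask set, and length doubled for value plus mask. */
#define SK_REG_W(IDX) SK_HEADER(0x0001, IDX, 1, 4 * 2)

/* Bits 9..12: the part of the field number that selects the register. */
#define SK_REG_IDX_MASK 0x00001e00u

static_assert(SK_REG(0) == 0x00010004u, "unexpected exact header value");
static_assert(SK_REG_W(0) == 0x00010108u, "unexpected wildcarded header value");

/* True only for exact register headers: everything except the register
 * index, including hasmask and length, must equal SK_REG(0). */
static int
sk_is_reg(uint32_t header)
{
    return !((header ^ SK_REG(0)) & ~SK_REG_IDX_MASK);
}

/* True only for wildcarded register headers. */
static int
sk_is_reg_w(uint32_t header)
{
    return !((header ^ SK_REG_W(0)) & ~SK_REG_IDX_MASK);
}
```

Because the length byte differs between the two forms, a test that merely ignored the hasmask bit would still reject every well-formed wildcarded header, which is the mismatch the commit message describes.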
Ethan Jackson [Sat, 4 Dec 2010 00:49:02 +0000 (16:49 -0800)]
vswitchd: Remove bond/migrate MAC argument.
Before this patch one could specify a mac address as part of the
bond/migrate command. This will no longer make sense as bond
hashing becomes more complicated.
Ben Pfaff [Mon, 6 Dec 2010 18:20:20 +0000 (10:20 -0800)]
Refactor and centralize basic OpenFlow message decoding and validation.
Open vSwitch contains a few different chunks of code that need to decode
an OpenFlow message to determine its type and then validate that it is
long enough. Until now, the code for doing this has been more or less
scattered across the tree. Whenever a new piece of code needed to do this,
it generally needed to reimplement at least part of it.
This commit centralizes all of that work into a single function,
ofputil_decode_msg_type(), and helper functions, and converts all of the
code that was decoding messages by hand to use the new function.
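A simplified sketch of what centralized decoding and validation can look like, assuming the 8-byte OpenFlow 1.0 header and a small table of per-type minimum lengths. The function sk_decode_msg_type(), its error codes and its table are hypothetical; the real ofputil_decode_msg_type() interface is not shown in this log.

```c
#include <arpa/inet.h>   /* ntohs() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* OpenFlow 1.0 message header; 'length' is in network byte order. */
struct sk_ofp_header {
    uint8_t version;
    uint8_t type;
    uint16_t length;
    uint32_t xid;
};
static_assert(sizeof(struct sk_ofp_header) == 8, "header must be 8 bytes");

enum sk_error { SK_OK = 0, SK_BAD_TYPE, SK_BAD_LENGTH };

struct sk_msg_class {
    uint8_t type;       /* OFPT_* value. */
    size_t min_len;     /* Minimum valid message length in bytes. */
};

/* A few OpenFlow 1.0 types, for illustration only. */
static const struct sk_msg_class sk_classes[] = {
    { 0, 8 },           /* OFPT_HELLO */
    { 2, 8 },           /* OFPT_ECHO_REQUEST */
    { 14, 72 },         /* OFPT_FLOW_MOD: fixed part before the actions */
};

/* Checks that 'data' holds a complete message of a known type that is
 * long enough for that type, and stores the type in '*typep'. */
static enum sk_error
sk_decode_msg_type(const void *data, size_t size, uint8_t *typep)
{
    struct sk_ofp_header oh;
    size_t claimed, i;

    if (size < sizeof oh) {
        return SK_BAD_LENGTH;
    }
    memcpy(&oh, data, sizeof oh);
    claimed = ntohs(oh.length);
    if (claimed < sizeof oh || claimed > size) {
        return SK_BAD_LENGTH;
    }
    for (i = 0; i < sizeof sk_classes / sizeof sk_classes[0]; i++) {
        if (sk_classes[i].type == oh.type) {
            if (claimed < sk_classes[i].min_len) {
                return SK_BAD_LENGTH;
            }
            *typep = oh.type;
            return SK_OK;
        }
    }
    return SK_BAD_TYPE;
}
```

With one table-driven entry point, supporting a new message type means adding a table row rather than writing another hand-rolled length check.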
Ben Pfaff [Tue, 23 Nov 2010 21:20:17 +0000 (13:20 -0800)]
pinsched: Use hmap instead of port_array.
This is the last remaining use of port_array in the tree. It wasn't really
taking advantage of any of the special features of port_arrays, so it's
better to save some time and memory by using an hmap instead.
(In addition, OpenFlow 1.1, which we may eventually want to support, has
changed from 16-bit to 32-bit port numbers, which would require the
port_array code to be rewritten anyhow.)
Ben Pfaff [Mon, 6 Dec 2010 18:03:31 +0000 (10:03 -0800)]
queue: Get rid of ovs_queue data structure.
ovs_queue doesn't seem very useful; it's just a singly-linked list. It's
more generally useful to use a general-purpose "struct list" for lists of
packets, so this commit adds such a member to "struct ofpbuf" and shifts
the existing users to use it.
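The general-purpose approach can be sketched with an intrusive doubly-linked list whose node is embedded in the packet structure. All names below (sk_list, sk_packet, list_node and the helpers) are hypothetical and only illustrate the pattern; they are not the "struct list" or "struct ofpbuf" definitions from the tree.

```c
#include <assert.h>
#include <stddef.h>

/* Circular doubly-linked list node; a bare node also serves as the head. */
struct sk_list {
    struct sk_list *prev, *next;
};

/* A packet buffer that can sit on one list at a time. */
struct sk_packet {
    int id;
    struct sk_list list_node;
};

/* Recovers the enclosing structure from a pointer to its member. */
#define SK_CONTAINER_OF(PTR, TYPE, MEMBER) \
    ((TYPE *) ((char *) (PTR) - offsetof(TYPE, MEMBER)))

static void
sk_list_init(struct sk_list *head)
{
    head->prev = head->next = head;
}

static int
sk_list_is_empty(const struct sk_list *head)
{
    return head->next == head;
}

/* Appends 'node' at the tail of the list headed by 'head'. */
static void
sk_list_push_back(struct sk_list *head, struct sk_list *node)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Removes and returns the first node; the list must not be empty. */
static struct sk_list *
sk_list_pop_front(struct sk_list *head)
{
    struct sk_list *node = head->next;

    assert(!sk_list_is_empty(head));
    node->prev->next = node->next;
    node->next->prev = node->prev;
    return node;
}
```

Pushing at the tail and popping at the head gives FIFO queue behavior without a dedicated queue type, and the same node member works for any other list of packets.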