Justin Pettit [Thu, 2 Dec 2010 01:23:33 +0000 (17:23 -0800)]
vswitch: Use "ipsec_gre" vport instead of "gre" with "other_config"
Previously, a GRE-over-IPsec tunnel was created as an interface with a
"type" of "gre" and the "other_config" column with "ipsec_cert" or
"ipsec_psk" set. This could lead to a potential security problem if a user
intended to create a GRE-over-IPsec tunnel, but misconfigured the
"ipsec_*" config and created an unencrypted GRE tunnel.
This commit defines an "ipsec_gre" tunnel type, which should prevent
users from inadvertently establishing insecure tunnels.
Justin Pettit [Tue, 30 Nov 2010 02:55:54 +0000 (18:55 -0800)]
debian: Don't require ipsec_local_ip to configure IPsec
Commit e97a103 (Open vSwitch: ovs-monitor-ipsec: Add ability to traverse
NATs) removed the requirement that the "ipsec_local_ip" key must be set
to use IPsec, but other code and documentation was not updated to
reflect this. This commit does that.
Justin Pettit [Sat, 18 Dec 2010 09:07:06 +0000 (01:07 -0800)]
ovs-dpctl: Print extended information about vports.
When "ovs-dpctl show" is run, return additional information about the
port. For example, tunnel ports will print the remote_ip, local_ip, and
in_key when defined.
Justin Pettit [Sat, 18 Dec 2010 09:04:37 +0000 (01:04 -0800)]
datapath: Return vport configuration when queried.
Additional configuration is passed down to the kernel in the "config"
array of an odp_port when a vport is created. This information is not
returned when a vport is queried, though. This information is useful
for debugging, since it may be used to distinguish ports based on
additional data, such as the peer in tunnels. In a forthcoming patch, it
will be essential to distinguish between plain GRE and GRE over IPsec.
Jesse Gross [Tue, 28 Dec 2010 05:19:35 +0000 (21:19 -0800)]
tunneling: Don't shadow vport when generating cache.
When generating the tunnel header cache we have two vports that we
are working with: the sender and destination. Unfortunately, both of
these use the name 'vport'. This renames the destination to avoid
shadowing the sender. This doesn't actually fix a bug because the
compiler correctly uses the right one, even when shadowed.
Found with sparse.
Reported-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
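As an illustration of the kind of shadowing that sparse flags, here is a minimal sketch. The `struct port` type and both function names are hypothetical stand-ins, not the tunneling code itself. Both versions compute the same result, which matches the note above that the rename does not fix a bug.

```c
#include <assert.h>

/* Hypothetical stand-in for a vport; not the datapath's real structure. */
struct port {
    int id;
};

/* Shadowed version: the inner 'port' hides the parameter inside the block.
 * Each use still resolves to the nearest declaration, so the result is
 * correct, but the code is easy to misread. */
static int sum_ids_shadowed(const struct port *port, const struct port *peer)
{
    int total;

    assert(port && peer);
    total = port->id;                       /* the parameter */
    {
        const struct port *port = peer;     /* shadows the parameter */
        total += port->id;                  /* the inner variable */
    }
    return total;
}

/* Renamed version: same behavior, nothing shadowed. */
static int sum_ids_renamed(const struct port *port, const struct port *peer)
{
    const struct port *dst_port = peer;

    assert(port && peer);
    return port->id + dst_port->id;
}
```

Given two ports with ids 1 and 2, both functions return 3.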
Ben Pfaff [Tue, 28 Dec 2010 00:20:11 +0000 (16:20 -0800)]
datapath: Clean up use of TBL_* constants.
A lot of the TBL_* constants were being used in conceptually wrong ways,
even though the code was correct because the actual values were correct.
(This is because TBL_L1_BITS, TBL_L2_BITS, and TBL_L1_SHIFT are all 10
and TBL_L1_SIZE and TBL_L2_SIZE are both 1024.)
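A rough sketch of why mixing up these constants went unnoticed, assuming a two-level table in which the upper bits of a bucket number select a second-level page and the lower bits select a slot within it. The EX_-prefixed names are hypothetical; only the values 10 and 1024 come from the message above.

```c
#include <assert.h>

/* Hypothetical stand-ins using the values quoted in the commit message. */
#define EX_L2_BITS  10
#define EX_L2_SIZE  (1u << EX_L2_BITS)      /* 1024 slots per L2 page */
#define EX_L1_BITS  10
#define EX_L1_SIZE  (1u << EX_L1_BITS)      /* 1024 L2 pages */
#define EX_L1_SHIFT EX_L2_BITS              /* L1 index sits above the L2 bits */

/* Index of the second-level page that holds 'bucket'. */
static unsigned int ex_l1_index(unsigned int bucket)
{
    assert(bucket < EX_L1_SIZE * EX_L2_SIZE);
    return (bucket >> EX_L1_SHIFT) & (EX_L1_SIZE - 1);
}

/* Index of 'bucket' within its second-level page. */
static unsigned int ex_l2_index(unsigned int bucket)
{
    return bucket & (EX_L2_SIZE - 1);
}
```

Because every constant here is either 10 or 1024, swapping, say, EX_L1_BITS for EX_L1_SHIFT would compile to identical code, which is how a conceptual mistake can stay hidden.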
Ben Pfaff [Tue, 28 Dec 2010 00:06:08 +0000 (16:06 -0800)]
datapath: Clarify meaning of n_buckets argument to tbl_create().
The n_buckets argument to tbl_create() can be zero, but the comment didn't
mention that. However, there's no reason that the caller can't just pass
in a correct size, so this commit changes them to do that.
Also, TBL_L1_SIZE was conceptually wrong as the minimum size: the minimum
size is one L2 page, i.e. TBL_L2_SIZE. But TBL_MIN_BUCKETS seems like a
better all-around way to indicate the minimum size, so this commit also
introduces that macro and uses it.
Jesse Gross pointed out inconsistencies in this area.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Mon, 27 Dec 2010 23:28:58 +0000 (15:28 -0800)]
datapath: Do not shadow 'err' variable name in tnl_send().
The sparse checker reported that 'err' was used for two different variables
in tnl_send(). The two variables have different types, so this patch
renames the inner one.
Jesse confirmed that the original code was correct as written. This patch
does not change its behavior.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Mon, 27 Dec 2010 23:23:54 +0000 (15:23 -0800)]
datapath: Suppress sparse complaints about address spaces.
The sparse checker was complaining about incorrect address spaces (e.g.
__user versus non-__user pointers). I looked at each of them, checked
that the code looked correct to me, and added the appropriate __force
annotations to casts.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Mon, 27 Dec 2010 23:21:29 +0000 (15:21 -0800)]
datapath: Fix type of actions_len_left in modify_vlan_tci().
The sparse checker reported that the type of the pointer passed to
nla_next(), as &actions_len_left, was incorrect: whereas the parameter
has type "int *", &actions_len_left is an "unsigned int *". This fixes
the problem. It is not a bug fix since the code is equally correct (or
incorrect) either way, but it gets the types right anyhow.
I don't know why GCC was not reporting this as an error.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Mon, 27 Dec 2010 23:18:37 +0000 (15:18 -0800)]
datapath: Remove shadowed 'err' variable.
sparse reported that 'err' was declared in two different places in this
function. This patch removes the inner one. I verified that this didn't
affect correctness either way, so this is not a bug fix.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Mon, 27 Dec 2010 18:18:14 +0000 (10:18 -0800)]
vswitchd: Add OVS version to database, give system info its own columns.
Until now, nothing in the database has reported the Open vSwitch version
number. This commit adds that.
In addition, this commit moves the system type and version from
external-ids to individual columns, because we decided that these were
important enough not to relegate them to a grab-bag field.
Ben Pfaff [Thu, 23 Dec 2010 18:41:17 +0000 (10:41 -0800)]
ofp-util: Improve log messages for bad Nicira extension actions.
check_action_exact_len() will always report that a Nicira extension action
has type 65535 (OFPAT_VENDOR), which isn't very helpful for debugging.
This introduces a new function that reports the subtype.
Also, log the subtype of unknown Nicira vendor actions.
Ben Pfaff [Thu, 23 Dec 2010 18:36:02 +0000 (10:36 -0800)]
ofp-util: Improve log message for bad OpenFlow action length.
First, this is an important message since it indicates a bug in the
controller, so log it at warning level instead of debug level--we want to
know about it.
Second, properly byteswap the action type.
Third, use the correct PRIu16 format specifier for a uint16_t.
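The second and third points can be sketched as follows. This is an illustrative helper, not the ofp-util code: the function name and the message text are assumptions, while ntohs() and PRIu16 are the standard facilities for byteswapping and printing a uint16_t.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Formats a 16-bit action type that arrived in network byte order.
 * Returns the number of characters snprintf() would have written. */
static int format_action_type(char *buf, size_t size, uint16_t type_be)
{
    uint16_t type;

    assert(buf && size > 0);
    type = ntohs(type_be);      /* convert to host order before printing */
    return snprintf(buf, size, "action type %" PRIu16, type);
}
```

Printing `type_be` directly would show a byteswapped value on little-endian hosts, which is the kind of misleading log output the commit corrects.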
Ben Pfaff [Thu, 23 Dec 2010 17:35:15 +0000 (09:35 -0800)]
datapath: Don't recursively sample packets or reset their "tun_id"s.
execute_actions() is called recursively when ODPAT_SET_DL_TCI adds a VLAN
header to a GSO packet, but we don't want to re-sample the sub-packet or
re-reset its tun_id, so break those two actions into a wrapper function.
This commit mostly moves code around without modifying it.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Thu, 23 Dec 2010 17:36:19 +0000 (09:36 -0800)]
datapath: Correct argument size for ODP_FLOW_GET.
ODP_FLOW_GET takes an odp_flowvec, not an odp_flow.
(This would merely introduce a gratuitous ABI incompatibility for the sake
of pedantic correctness, except that we're breaking the ABI regularly
anyhow.)
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Jesse Gross [Thu, 16 Dec 2010 22:27:47 +0000 (14:27 -0800)]
odp-util: Correct length check in format_odp_action().
When printing the action list we first check that the size of the
action matches the expected length for that type. However, when
doing the lookup we were passing in the length of the action, not
the type, leading to bogus values.
The functions to get and set the checksum pointers consistently across
different kernel versions had different interpretations of what the
csum_offset pointer was relative to, which is confusing, to say the least.
This makes the meaning be the same as skb->csum_offset in modern kernels
and updates the caller. For a given function the results were consistent
across kernel versions and the callers knew what the meaning should be, so
this doesn't actually fix any bugs.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Wed, 15 Dec 2010 23:38:06 +0000 (15:38 -0800)]
tunneling: Refresh IP header pointer after update_header().
We were assuming that the call to update_header() to finalize tunnel
headers wouldn't cause the skb linear data area to be reallocated.
So far this hasn't been a problem but it's not, generally speaking,
a good assumption to make. Therefore, refetch the pointer to the IP
header instead of carrying it across the call.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
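A userspace analogy for the pointer refresh, assuming a buffer that may be reallocated by a helper. The `ex_buf` type and both functions are hypothetical; realloc() plays the role of an skb operation that can move the linear data area.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer standing in for an skb's linear data. */
struct ex_buf {
    unsigned char *data;
    size_t len;
};

/* Grows the buffer by 'extra' zeroed bytes. May move buf->data. */
static int ex_grow(struct ex_buf *buf, size_t extra)
{
    unsigned char *p = realloc(buf->data, buf->len + extra);

    if (!p)
        return -1;
    memset(p + buf->len, 0, extra);
    buf->data = p;
    buf->len += extra;
    return 0;
}

/* Writes the first header byte after growing the buffer. The header
 * pointer is fetched after the call rather than carried across it. */
static int ex_set_first_byte_after_grow(struct ex_buf *buf, size_t extra,
                                        unsigned char value)
{
    unsigned char *hdr;

    assert(buf && buf->data && buf->len > 0);
    if (ex_grow(buf, extra))
        return -1;
    hdr = buf->data;            /* re-fetch: the old address may be stale */
    hdr[0] = value;
    return 0;
}
```

Had `hdr` been read before ex_grow(), the write could land in freed memory whenever realloc() relocates the block.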
Jesse Gross [Mon, 13 Dec 2010 23:21:28 +0000 (15:21 -0800)]
datapath: Correctly return error if percpu allocation fails.
If the allocation of percpu stats fails when creating a new
datapath, we currently don't return the correct error code. Since
we don't explicitly set it when the allocation fails it will keep
the value from the previous call. This means we will return success
when the creation actually failed.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
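The failure mode described above can be sketched in userspace. Everything here is hypothetical (the `ex_dp` type, the function, the `fail_stats_alloc` switch used to simulate a failed allocation); the point is only that the error variable must be set explicitly on the allocation path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical datapath with separately allocated statistics. */
struct ex_dp {
    long *stats;
};

/* Creates the stats area. 'fail_stats_alloc' simulates allocation failure. */
static int ex_dp_create(struct ex_dp *dp, int fail_stats_alloc)
{
    int err = 0;    /* stands in for the result of an earlier, successful step */

    assert(dp);
    dp->stats = fail_stats_alloc ? NULL : calloc(16, sizeof *dp->stats);
    if (!dp->stats) {
        err = -ENOMEM;  /* without this assignment, 0 (success) is returned */
        goto error;
    }
    return 0;

error:
    return err;
}
```

With the assignment in place the simulated failure returns -ENOMEM; removing it would make the same path report success.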
Ben Pfaff [Mon, 13 Dec 2010 22:32:55 +0000 (14:32 -0800)]
Makefile: Check for undistributed files on every make, not just "make dist".
It's really easy to add files to the Git repository but forget to add them
to the distributions created by "make dist". I do this regularly, for
example. For some time, we've had a check that runs on "make dist" to
make sure that the distribution is complete, but I still screw up because
I don't run "make dist" all that often.
This commit improves the situation, by doing the check on every "make",
instead of just on "make dist".
Ben Pfaff [Mon, 13 Dec 2010 21:07:48 +0000 (13:07 -0800)]
netdev-linux: Fix pairing of rtnetlink register and unregister calls.
netdev_linux_create() called rtnetlink_notifier_register() for both system
and internal devices, but netdev_linux_destroy() only did the reverse
accounting for system devices. This fixes the pairing.
This isn't really much of a bug, since it would only cause the notifier to
be active unnecessarily (not to be removed even though it was not needed). At
most it was a missed opportunity for optimization, but I don't think that
optimization would ever happen anyway.
Found with valgrind --leak-check=full --show-reachable=yes.
Ben Pfaff [Mon, 13 Dec 2010 22:28:53 +0000 (14:28 -0800)]
vswitchd: Fix dependency on DP_MAX_PORTS for allocating "struct dst"s.
Until now, compose_actions() has allocated enough "struct dst"s on the
stack for a worst-case flow, one that floods packets with the maximum
number of ports and mirrors. When the code was written this was correct.
However, now the number of ports is no longer known at compile time. The
maximum number, 65535, would require (65536 * (32 + 1) * 4) == 8 MB of
stack space, which is a lot. So this commit fixes the problem a different
way, by allocating the "struct dst"s dynamically when necessary.
This is a bug fix, but not a very serious one, because it could only
become a buffer overflow with a large number of mirrors.
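The stack estimate above can be checked with a short calculation. Reading the three factors as the port count, the mirror count plus one, and the size of one "struct dst" in bytes is an assumption; the message gives only the numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Worst-case bytes needed if every port appears once per mirror plus once
 * for the normal output. The meaning given to each factor is an assumption. */
static size_t worst_case_dst_bytes(size_t max_ports, size_t max_mirrors,
                                   size_t dst_size)
{
    assert(dst_size > 0);
    return max_ports * (max_mirrors + 1) * dst_size;
}
```

With the figures from the message, `worst_case_dst_bytes(65536, 32, 4)` is 8,650,752 bytes, or 8.25 MiB, which the message rounds to 8 MB.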
Ben Pfaff [Mon, 13 Dec 2010 19:12:37 +0000 (11:12 -0800)]
bridge: Eliminate bond_rebalance_port() dependency on DP_MAX_PORTS.
There's no reason to allocate the bals[] array on the stack here, since
this is not on any fast-path.
As an alternative, we could limit the number of interfaces on a single
bond to some reasonable maximum, such as 8 or 32, but this commit's change
is simpler.
Jesse Gross [Wed, 8 Dec 2010 19:32:05 +0000 (11:32 -0800)]
datapath: Check locks on access to flow table.
When accessing the flow table without holding rcu_read_lock
we need to hold the lock on the datapath. This enables lockdep
to validate that that is the case.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sun, 5 Dec 2010 20:36:36 +0000 (12:36 -0800)]
tunneling: Add checks for header cache lock.
When updating the tunnel header cache, we need to hold a lock to
protect against concurrent access. This adds annotations to
make sparse happy when we access the data without rcu_read_lock
and enables lockdep to verify that we have the correct lock.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Mon, 6 Dec 2010 23:15:47 +0000 (15:15 -0800)]
datapath: Convert rcu_dereference() to correct variant.
Using rcu_dereference() makes lockdep complain if rcu_read_lock
is not held. This is OK if the update side lock is held. This
adds checks that the RTNL lock is held in places where that is also
a correct form of protection. Elsewhere, it enforces that RTNL
must be held.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
If RTNL lock is used to protect updates to RCU data structures
then it isn't necessary to use rcu_dereference() to access them if
RTNL is held. This adds rtnl_dereference() to access these pointers
which has several benefits: documents the locking expectations;
checks that RTNL actually is held when run with lockdep; makes
sparse not complain about directly accessing RCU pointers.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sat, 4 Dec 2010 20:04:39 +0000 (12:04 -0800)]
datapath: Correct byte order annotations.
We have generally been using the byte order specific data types
(e.g. __be32 instead of u32) in most places. This corrects a
declaration and adds a few needed casts.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sat, 4 Dec 2010 19:50:53 +0000 (11:50 -0800)]
datapath: Add usage of __rcu annotation.
Sparse can warn about incorrect usage of RCU via direct access to
pointers when used in conjunction with __rcu and CONFIG_SPARSE_RCU.
This adds the necessary annotations.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Mon, 6 Dec 2010 23:39:19 +0000 (15:39 -0800)]
datapath: Compatibility code for RCU check functions.
The rcu_dereference_rtnl() and rtnl_dereference() functions will
be introduced in 2.6.37. They provide nice documentation of
locking expectations as well as checking on recent kernels.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sun, 12 Dec 2010 18:01:19 +0000 (10:01 -0800)]
datapath-protocol: Include netlink.h.
On older kernels that don't have if_link.h, we use our own, limited
version. This version doesn't include the netlink header, causing
problems where we were relying on it to define the types in
datapath-protocol.h. Therefore, directly include it, since it is
better to be explicit about it anyways.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sun, 12 Dec 2010 17:54:46 +0000 (09:54 -0800)]
pinsched: Avoid uninitialized variable warning.
Some compilers warn about the variable 'n_longest' in drop_packet()
being used uninitialized. This isn't actually possible but explicitly
set it to zero to avoid spurious warnings.
Jesse Gross [Sun, 12 Dec 2010 06:53:34 +0000 (22:53 -0800)]
nx-match: Use correct printf format specifiers.
A few of the printf format specifiers didn't match the type that
they were printing. On 32-bit platforms there is some overlap
but on 64-bit they cause a mismatch.
Jesse Gross [Sun, 12 Dec 2010 06:51:31 +0000 (22:51 -0800)]
vswitchd: Consistently use size_t for action lengths.
Currently the type of the datapath action length is a mixture of
size_t and unsigned int. However, size_t is really defined as an
unsigned long, which causes the build to fail on 64-bit platforms.
This consistently uses size_t.
Jesse Gross [Sun, 12 Dec 2010 01:31:36 +0000 (17:31 -0800)]
flow: Make size of flow struct a multiple of 8.
The compiler wants to pad structures to a multiple of the native
datatype for the architecture, so a multiple of 4 on 32-bit platforms
and a multiple of 8 on 64-bit. Currently the size of struct flow is
a multiple of 4, so the total size with padding varies depending on
the architecture, causing build asserts to fail. This explicitly pads
it out to a multiple of 8 for consistency.
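A small sketch of the explicit-padding approach, assuming a structure with one 64-bit member followed by 32-bit members. The `ex_flow` layout and field names are hypothetical and much smaller than the real struct flow. Without the `reserved` bytes, common 32-bit x86 ABIs would give this structure a size of 20 and 64-bit ABIs a size of 24.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified flow structure. */
struct ex_flow {
    uint64_t tun_id;
    uint32_t nw_src;
    uint32_t nw_dst;
    uint32_t in_port;
    uint8_t reserved[4];    /* explicit padding to a multiple of 8 bytes */
};

/* Compile-time check in the spirit of a build assertion: the array size is
 * negative, and compilation fails, if the size is not a multiple of 8. */
typedef char ex_flow_size_check[(sizeof(struct ex_flow) % 8 == 0) ? 1 : -1];
```

With the padding spelled out, the size no longer depends on how the target ABI aligns 64-bit members, so a size assertion can hold on both kinds of platform.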
Ben Pfaff [Mon, 13 Dec 2010 18:19:46 +0000 (10:19 -0800)]
datapath: Introduce more compat support for <net/netlink.h>.
With this commit, I have successfully built the datapath, without warnings,
on 2.6.{18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,36} on i386,
2.6.31 on x86-64, and the kernels included with XenServer 5.5.0 and (some
prerelease kernel for) XenServer 5.6.0.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Fri, 10 Dec 2010 22:39:25 +0000 (14:39 -0800)]
datapath: Include <linux/skbuff.h> directly into linux/ip.h compat.
While doing test builds on numerous kernel versions I found that one build
failed because skb_network_header() wasn't visible from flow.h. I guess
that we accidentally depend on <linux/skbuff.h> being included indirectly,
but this didn't always happen.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Fri, 10 Dec 2010 22:38:25 +0000 (14:38 -0800)]
datapath: Include <linux/netlink.h> directly into flow.h.
While doing test builds on numerous kernel versions I found that one build
failed because "struct nlattr" wasn't visible from flow.h. I guess that
we accidentally depend on <linux/netlink.h> being included indirectly, but
this didn't always happen.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Jesse Gross [Fri, 10 Dec 2010 00:40:15 +0000 (16:40 -0800)]
datapath: Fix memory leak when a loop is detected.
If we detect a packet that is looping we kill the flow but then
don't do anything with the packet that caused the problem in the
first place, so this frees the packet. This isn't a very serious
leak because we try to shut off the flow that led to the loop
as early as possible. Once this happens, packets will no longer
hit the loop detector and will be freed just as any other packet
that should be dropped.
It also fixes an issue where the offset to the stats counter is
uninitialized after a loop is detected.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Thu, 9 Dec 2010 07:55:20 +0000 (23:55 -0800)]
datapath: Drop check for impossible condition after skb_gso_segment().
It's possible for skb_gso_segment to return NULL but only if the
hardware supports the correct form of segmentation offload but just
wants software to verify the offload parameters. However, since we're
not hardware and don't support any kind of segmentation offload natively,
we can never get in this situation. Therefore drop the check and
comment.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Thu, 9 Dec 2010 07:29:10 +0000 (23:29 -0800)]
datapath: Drop synchronize_rcu() in internal dev destroy.
unregister_netdevice() contains a call to synchronize_rcu(), so there
is no need to directly call it ourselves immediately beforehand.
We were relying on the call during unregistration anyways to stop
packets from being transmitted on the device, so our version was
both misleading and had a performance penalty.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Thu, 9 Dec 2010 03:21:40 +0000 (19:21 -0800)]
datapath: Don't use RCU for internal dev vport.
The vports are now attached and ready to go when they are allocated,
so we don't have to worry about future changes. As a result, we can
directly store the pointer in the internal dev's netdevice private
space before it is registered. The registration process will handle
the necessary write memory barriers and anyone who has a reference
to the netdev will have done the read side barriers, so we don't need
to use RCU at all.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Justin Pettit [Sat, 11 Dec 2010 04:50:58 +0000 (20:50 -0800)]
ofproto: Fix problem that caused facets not to be installed into datapath.
Commit cdee00f (datapath: Replace "struct odp_action" by Netlink
attributes.) stopped initializing some elements in facet structures
in certain cases. This caused flows to not be installed into the datapath.
This commit sets that again based on the action context.
Ben Pfaff [Fri, 10 Dec 2010 18:42:42 +0000 (10:42 -0800)]
Expand tunnel IDs from 32 to 64 bits.
We have a need to identify tunnels with keys longer than 32 bits. This
commit adds basic datapath and OpenFlow support for such keys. It doesn't
actually add any tunnel protocols that support 64-bit keys, so this is not
very useful yet.
The 'arg' member of struct odp_msg had to be expanded to 64-bits also,
because it sometimes contains a tunnel ID. This member also contains the
argument passed to ODPAT_CONTROLLER, so I expanded that action's argument
to 64 bits also so that it can use the full width of the expanded 'arg'.
Userspace doesn't take advantage of the new space though (it was only
using 16 bits anyhow).
This commit has been tested only to the extent that it doesn't disrupt
basic Open vSwitch operation. I have not tested it with tunnel traffic.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Feature #3976.
Ben Pfaff [Fri, 10 Dec 2010 18:40:58 +0000 (10:40 -0800)]
datapath: Replace "struct odp_action" by Netlink attributes.
In the medium term, we plan to migrate the datapath to use Netlink as its
communication channel. In the short term, we need to be able to have
actions with 64-bit arguments but "struct odp_action" only has room for
48 bits. So this patch shifts to variable-length arguments using Netlink
attributes, which starts in on the Netlink transition and makes 64-bit
arguments possible at the same time.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
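To show how variable-length attributes accommodate both 32-bit and 64-bit arguments, here is a self-contained sketch of the Netlink type-length-value layout: a 4-byte header (16-bit length, 16-bit type) followed by the payload, padded to a 4-byte boundary. The EX_/ex_ names are hypothetical and this is not the datapath's encoder. The attribute type values used with it are arbitrary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct nlattr: total length (header + payload), then type. */
struct ex_nlattr {
    uint16_t nla_len;
    uint16_t nla_type;
};

/* Rounds 'len' up to the 4-byte Netlink attribute alignment. */
static size_t ex_align(size_t len)
{
    return (len + 3) & ~(size_t) 3;
}

/* Appends one attribute at buf + *ofs. Returns 0, or -1 if it does not fit. */
static int ex_put(unsigned char *buf, size_t size, size_t *ofs,
                  uint16_t type, const void *payload, size_t payload_len)
{
    struct ex_nlattr hdr;
    size_t total = sizeof hdr + payload_len;
    size_t padded = ex_align(total);

    assert(*ofs <= size);
    if (total > UINT16_MAX || padded > size - *ofs)
        return -1;
    hdr.nla_len = (uint16_t) total;
    hdr.nla_type = type;
    memset(buf + *ofs, 0, padded);              /* zero the padding bytes */
    memcpy(buf + *ofs, &hdr, sizeof hdr);
    memcpy(buf + *ofs + sizeof hdr, payload, payload_len);
    *ofs += padded;
    return 0;
}

/* Counts the attributes in buf[0..len), or returns -1 if one is malformed. */
static int ex_count(const unsigned char *buf, size_t len)
{
    size_t ofs = 0;
    int n = 0;

    while (ofs + sizeof(struct ex_nlattr) <= len) {
        struct ex_nlattr hdr;

        memcpy(&hdr, buf + ofs, sizeof hdr);
        if (hdr.nla_len < sizeof hdr || hdr.nla_len > len - ofs)
            return -1;
        n++;
        ofs += ex_align(hdr.nla_len);
    }
    return n;
}
```

A 32-bit argument occupies 8 bytes in this encoding and a 64-bit argument 12, so an action list can mix argument widths that a fixed-size structure could not hold.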
Ben Pfaff [Tue, 7 Dec 2010 17:37:59 +0000 (09:37 -0800)]
netlink: New function nl_attr_type().
Linux since v2.6.24 has a couple of bits at the top of
nla_type that one is apparently supposed to ignore. This commit
starts doing that in Open vSwitch userspace.
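A minimal sketch of the masking, assuming the two flag bits are the ones <linux/netlink.h> calls NLA_F_NESTED (bit 15) and NLA_F_NET_BYTEORDER (bit 14). The constants are redefined with an EX_ prefix so the example builds on any platform, and `ex_attr_type()` is a hypothetical name, not the signature of nl_attr_type().

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the flag bits at the top of nla_type. */
#define EX_NLA_F_NESTED        (1u << 15)
#define EX_NLA_F_NET_BYTEORDER (1u << 14)
#define EX_NLA_TYPE_MASK       (~(EX_NLA_F_NESTED | EX_NLA_F_NET_BYTEORDER))

/* Returns the attribute type with the two flag bits cleared. */
static uint16_t ex_attr_type(uint16_t nla_type)
{
    return (uint16_t) (nla_type & EX_NLA_TYPE_MASK);
}
```

Comparing the raw nla_type against an expected type would fail whenever the kernel sets either flag, so the type is masked before comparison.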
Ben Pfaff [Fri, 10 Dec 2010 17:51:03 +0000 (09:51 -0800)]
netlink: Split into generic and Linux-specific parts.
The parts of the netlink module that are related to sockets are
Linux-specific, since only Linux has AF_NETLINK sockets. The rest can be
built anywhere. This commit breaks them into two modules, and builds the
generic one on all platforms.
Ben Pfaff [Tue, 7 Dec 2010 17:33:27 +0000 (09:33 -0800)]
netlink: Make netlink-protocol.h compatible with <linux/netlink.h>.
Until now, netlink-protocol.h and <linux/netlink.h> could not both be
included by a single source file, because they contained conflicting
definitions. This commit fixes the problem, by having netlink-protocol.h
delegate to <linux/netlink.h> where it is available.
Here's an example of the problem: odp-util.c includes
datapath-protocol.h and will also need netlink-protocol.h so that it can
look through actions defined as struct nlattr. datapath-protocol.h
includes <linux/if_link.h> for the definition of rtnl_link_stats64, and
<linux/if_link.h> includes <linux/netlink.h>.
Jesse Gross [Fri, 10 Dec 2010 01:52:39 +0000 (17:52 -0800)]
tunneling: Fix updated port pools commit.
If readding a tunnel to the table fails during move_port(), we
should decrement the port pool counter that it is in. However,
when I attempted to do this, I accidentally put it in add_port().
Jesse Gross [Wed, 8 Dec 2010 21:38:22 +0000 (13:38 -0800)]
datapath: Drop unused file ops.
There have been two ops to support async access to the datapath
character device for a long time but they have never been implemented.
Drop the commented out code.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>