so after it, we should recompute the checksum to include these 4 bytes.
skb->data still points to the mac header, therefore the VLAN header is at
(2 * ETH_ALEN = 12) bytes after it, not (ETH_HLEN = 14) bytes.
Therefore the VLAN_HLEN = 4 bytes after 2 * ETH_ALEN are the part
we want to subtract from the checksum.
Cc: David S. Miller <davem@davemloft.net> Cc: Jesse Gross <jesse@nicira.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
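The offset arithmetic above can be illustrated with a small Python model. This is only a sketch of a 16-bit ones-complement sum, not the kernel's checksum helpers; the function names are hypothetical and chosen for illustration.

```python
ETH_ALEN = 6    # bytes in one MAC address
ETH_HLEN = 14   # untagged Ethernet header: two MACs plus the ethertype
VLAN_HLEN = 4   # 802.1Q tag: TPID plus TCI


def ones_complement_sum(data: bytes) -> int:
    """Fold 16-bit big-endian words into a 16-bit ones-complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def ones_complement_sub(total: int, part: int) -> int:
    """Remove 'part' from 'total' by adding its ones-complement."""
    total += (~part) & 0xFFFF
    return (total & 0xFFFF) + (total >> 16)


def vlan_tag_span(frame: bytes) -> bytes:
    """Return the 4 tag bytes, which start right after the two MAC addresses."""
    start = 2 * ETH_ALEN  # 12, not ETH_HLEN (14)
    return frame[start:start + VLAN_HLEN]
```

Subtracting the sum of `vlan_tag_span(frame)` from the sum of the whole frame gives the same value as summing the frame with the tag removed, which is the property the commit message relies on.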
Pravin B Shelar [Sat, 23 Feb 2013 01:16:11 +0000 (17:16 -0800)]
datapath: Increase maximum allocation size of action list.
The switch to flow based tunneling increased the size of each output
action in the flow action list. In extreme cases, this can result
in the action list exceeding the maximum buffer size.
This doubles the maximum buffer size to compensate for the increase
in action size. In the common case, most allocations will be
less than a page and those use kmalloc. Therefore, for the majority
of situations, this will have no impact.
Justin Pettit [Fri, 22 Feb 2013 22:07:47 +0000 (14:07 -0800)]
ofproto-dpif: Look at the flow's ofproto when handling flow misses.
When handling flow misses, an attempt is made to group identical packets
together. Before the single datapath, each OpenFlow port number was
unique, so the flow_equal() function was sufficient to check whether
packets are identical. With the single datapath, the OpenFlow port
numbers are shared across bridges, so packets that arrive at the same
time and are identical other than their ingress port were being serviced
by the same ofproto instance. This commit changes the duplicate flow
finding function to take the ofproto into account.
Bug #14934
Signed-off-by: Justin Pettit <jpettit@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
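The idea of taking the ofproto into account when grouping can be sketched as follows. This is a simplified Python model, not the ofproto-dpif code; the tuple layout and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def group_flow_misses(misses):
    """Group missed packets by (ofproto, flow) instead of by flow alone.

    'misses' is an iterable of (ofproto_name, flow, packet) tuples, where
    'flow' is any hashable description of the packet headers.
    """
    groups = defaultdict(list)
    for ofproto, flow, packet in misses:
        # Keying on the bridge as well keeps identical flows that arrive
        # on different bridges in separate groups.
        groups[(ofproto, flow)].append(packet)
    return dict(groups)
```

With a key of `flow` alone, two bridges that share OpenFlow port numbers would end up in the same group, which is the behavior the commit describes as the bug.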
Justin Pettit [Fri, 22 Feb 2013 02:46:20 +0000 (18:46 -0800)]
match: Only print tp_src and tp_dst for TCP and UDP.
When printing a match, we would print "tp_src" and "tp_dst" if the
packet wasn't ICMPv4 or ICMPv6. Unfortunately, this doesn't cover ARP.
This changes the check to only print those keys if the network protocol
is TCP or UDP.
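A minimal sketch of the changed check, in Python rather than the C used by the match code. The function name and output format are hypothetical; only the "TCP or UDP" condition comes from the description above.

```python
IPPROTO_TCP = 6
IPPROTO_UDP = 17


def format_ports(nw_proto, tp_src, tp_dst):
    """Return the port keys only when the network protocol is TCP or UDP."""
    if nw_proto not in (IPPROTO_TCP, IPPROTO_UDP):
        # ARP, ICMP and everything else print no transport ports.
        return ""
    return "tp_src=%d,tp_dst=%d" % (tp_src, tp_dst)
```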
Ansis Atteka [Thu, 14 Feb 2013 00:48:46 +0000 (16:48 -0800)]
tunnel: set skb mark for IPsec tunnel packets
The new ovs-monitor-ipsec implementation will use skb marks in
IPsec policies. This patch will configure datapath to use these
skb marks for IPsec tunnel packets.
Pravin B Shelar [Tue, 19 Feb 2013 20:45:57 +0000 (12:45 -0800)]
datapath: Remove CAPWAP tunneling support.
The CAPWAP implementation is just the encapsulation format and
therefore really not the full protocol. There were some
uses of it (primarily hardware support and UDP transport), but
these are most likely better provided by VXLAN.
This patch removes CAPWAP tunneling support.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Rich Lane [Fri, 8 Feb 2013 23:29:57 +0000 (15:29 -0800)]
datapath: Fix parsing invalid LLC/SNAP ethertypes
Before this patch, if an LLC/SNAP packet with OUI 00:00:00 had an ethertype
less than 1536 the flow key given to userspace in the upcall would contain the
invalid ethertype (for example, 3). If userspace attempted to insert a kernel
flow for this key it would be rejected by ovs_flow_from_nlattrs.
This patch allows OVS to pass the OFTest pktact.DirectBadLlcPackets.
Signed-off-by: Rich Lane <rlane@bigswitch.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
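The validity rule described above can be sketched as a small Python check. The commit text does not say what value the datapath reports in place of the invalid ethertype, so this sketch returns None in that case; the function name is hypothetical.

```python
ETH_TYPE_MIN = 1536  # 0x0600: smaller values are lengths, not ethertypes


def snap_ethertype(oui: bytes, ethertype: int):
    """Return the SNAP ethertype if it is usable in a flow key, else None."""
    if oui == b"\x00\x00\x00" and ethertype >= ETH_TYPE_MIN:
        return ethertype
    # Assumption: the caller substitutes whatever "not a real ethertype"
    # marker the flow key format uses.
    return None
```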
Jesse Gross [Tue, 19 Feb 2013 19:01:33 +0000 (11:01 -0800)]
datapath: Use nla_len() in queue_userspace_packet().
Commit e995e3df57ea4e27678bc0bea5eb30872994155b (Allow
OVS_USERSPACE_ATTR_USERDATA to be variable length.) introduced an
open coded version of nla_len() in queue_userspace_packet(). This
replaces it with the equivalent function call.
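For readers unfamiliar with the helper, nla_len() returns the payload length of a Netlink attribute, that is, the nla_len header field minus the 4-byte attribute header. A Python model of that computation, with a hypothetical function name and host byte order assumed:

```python
import struct

NLA_HDRLEN = 4  # struct nlattr: u16 nla_len followed by u16 nla_type


def nla_payload_len(attr: bytes) -> int:
    """Return the payload length of a serialized Netlink attribute."""
    total_len, _nla_type = struct.unpack_from("=HH", attr, 0)
    return total_len - NLA_HDRLEN
```

This is also what makes a variable-length OVS_USERSPACE_ATTR_USERDATA workable: the receiver reads the length from the attribute header instead of assuming 64 bits.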
Ethan Jackson [Sat, 16 Feb 2013 20:07:18 +0000 (12:07 -0800)]
ofproto-dpif: Receive special packets on patch ports.
Commit 0a740f48293 (ofproto-dpif: Implement patch ports in
userspace.) allowed special packets (i.e. LACP, CFM, etc) to be
sent on patch ports, but not received. This patch implements the
logic required to receive special packets on patch ports.
Bug #15154. Signed-off-by: Ethan Jackson <ethan@nicira.com>
Ben Pfaff [Sat, 16 Feb 2013 00:48:32 +0000 (16:48 -0800)]
Allow OVS_USERSPACE_ATTR_USERDATA to be variable length.
Until now, the optional OVS_USERSPACE_ATTR_USERDATA attribute had to be
exactly 64 bits long, if it was present. However, 64 bits is not enough
space to associate as much information with a flow as would be convenient
for some userspace features now under development. This commit generalizes
the attribute, allowing it to be any length.
This generalization is backward-compatible: if userspace only uses 64-bit
attributes, then it will not see any change in behavior.
Currently we do not include ovs-bugtool in xenserver rpms.
This is because xen-bugtool provides the information required
to debug openvswitch. But xen-bugtool also provides a lot more
data that is not required for openvswitch debugging. This makes
the debug bundle quite huge.
Also, xen-bugtool takes a lot of time to collect the required
information. For example, in my xenserver6.0.2 with 100 OVS
interfaces, 'xen-bugtool -y -s' takes 180 seconds to finish
creating a debug bundle with a size of 124M.
On the other hand, if we run an ovs-bugtool command of the form
'ovs-bugtool -y -s --log-days=10 --outfile bundle.tar.gz', it
takes 5 seconds to finish with a debug bundle size of 28M.
In my tests, I see that creating a tar.gz takes a lot less
time than creating a tar.bz2. The compressed size of the
debug bundle is not much different when either
of the above is used. So, use tar.gz as the default debug
bundle type.
Test results in my setup:
For an uncompressed debug bundle size of 250 MB (95% of it is log files),
bz2 takes 50 seconds whereas gz takes 8 seconds. xz took 90 seconds.
gz, bz2 and xz compressed the debug bundle into 144M, 139M and 131M
respectively.
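A comparison like the one above can be reproduced on any sample data with the Python standard library. This is a sketch, not how ovs-bugtool builds its bundle; the function name is hypothetical, and timings and sizes will depend on the data and the machine.

```python
import bz2
import gzip
import lzma
import time


def compare_codecs(data: bytes):
    """Return {name: (seconds, compressed_size)} for gz, bz2 and xz."""
    codecs = (("gz", gzip.compress), ("bz2", bz2.compress), ("xz", lzma.compress))
    results = {}
    for name, compress in codecs:
        start = time.perf_counter()
        size = len(compress(data))
        results[name] = (time.perf_counter() - start, size)
    return results
```

Calling it on the contents of a representative log file gives a rough idea of the trade-off between time and size for each format.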
ovs-bugtool: Don't run a few ethtool commands on virtual devices.
There can be a few hundred virtual interfaces in a hypervisor.
Some of the ethtool commands that we currently run on these devices
probably do not provide any extra information. So remove them
for tap and vif interfaces.
Also bump up the size limitation for CAP_NETWORK_STATUS. The
current value is quite low and a 50 MB limit pre-compression
does not add much to the overall size.
ovs-bugtool: Ability to collect the number of rotated logs.
A big reason for a large debug bundle size is the size of log
files. By default we collect 20 rotated logs for each logfile.
Most of the time we collect the debug bundle as soon as we
hit a bug. In such cases, we know that we need only one day's
worth of logs.
This patch adds an option, '--log-days' to ovs-bugtool wherein
we can specify how many days' worth of rotated logs we need
as part of the debug bundle.
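One way to picture the effect of '--log-days' is a filter on file modification times. The commit text does not describe how ovs-bugtool selects the files, so the following is only an assumed model with a hypothetical function name.

```python
import time

SECONDS_PER_DAY = 86400


def select_recent_logs(mtimes, log_days, now=None):
    """Return the log paths modified within the last 'log_days' days.

    'mtimes' maps a path to its modification time in seconds since the epoch.
    """
    now = time.time() if now is None else now
    cutoff = now - log_days * SECONDS_PER_DAY
    return sorted(path for path, mtime in mtimes.items() if mtime >= cutoff)
```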
ovs-bugtool: Provide a separate capability to openvswitch logs.
Currently we have a 50 MB size limitation for all logs. This seems
quite low because a single uncompressed log can be 50 MB, which
will result in ovs-bugtool picking a single log.
While debugging issues related to openvswitch, it is important that
we have at least all logs related to openvswitch. This patch provides
a new capability for openvswitch logs with no size limitation. This
should not be a problem since compression reduces the size of the logs
quite a bit.
Also increase the size limitation for the regular system logs to 200 MB.
A future commit adds an option '--log-days' to control the number of logs
that we collect.
rhel, xenserver: Make logrotate daily and compress old logs.
The default values can be different and usually come from /etc/logrotate.conf.
For xenserver6.0.2, the values in /etc/logrotate.conf are daily and compress.
So this patch does not make any difference. But it does future-proof against
any changes in xenserver in the future.
For rhel6.1, the values are weekly and uncompressed.
Kyle Mestery [Thu, 14 Feb 2013 14:37:28 +0000 (09:37 -0500)]
vxlan: Change dpif_backer->tnl_backers to a "struct simap"
Move dpif_backer->tnl_backers from a "struct sset" to a
"struct simap". Store odp_port in the new map. This will make it easier to
access the odp_port for future patches.
Signed-off-by: Kyle Mestery <kmestery@cisco.com> Acked-by: Ethan Jackson <ethan@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Wed, 13 Feb 2013 23:50:54 +0000 (15:50 -0800)]
ofproto-dpif: Move 'orig_flow' from action_xlate_ctx to local variable.
A comment said that this was necessary to silence a false-positive warning
from GCC 4.4. However, it no longer triggers a warning for me, so enough
must have changed in the meantime to make GCC happy.
Ben Pfaff [Tue, 12 Feb 2013 23:56:10 +0000 (15:56 -0800)]
ofproto-dpif: Reduce number of get_ofp_port() calls during flow xlate.
Until now the flow translation code has done one get_ofp_port() call
initially to check for special processing, then one for each level of
action processing. Only one call is actually necessary, though, because
the in_port of a flow doesn't change in ordinary circumstances, and so this
commit eliminates the unnecessary calls.
The one case where the in_port can change is when a packet passes through
a patch port. The implementation here was buggy anyway: when the patch
port's peer had forwarding disabled by STP, then the code would drop all
ODP actions, even those that were executed before the packet crossed the
patch port. This commit fixes that case.
With a complicated flow table involving multiple levels of resubmit, this
increases flow setup performance by 2-3%.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Ben Pfaff [Tue, 12 Feb 2013 23:49:12 +0000 (15:49 -0800)]
ofp-msgs: ensure that l2 is set in ofpmp_reserve()
Ensure that the buffer returned by ofpmp_reserve() has buf->l2 set
as this may be required by nxm_reg_load_to_nxast() when generating
the reply to a stats request.
This problem was observed when dumping a large number of flows
with set_field actions using ovs-ofctl dump-flows.
Signed-off-by: Ben Pfaff <blp@nicira.com> Co-authored-by: Simon Horman <horms@verge.net.au> Signed-off-by: Simon Horman <horms@verge.net.au>
In fact, the "target" column cannot be made unique within the
Controller table, because different bridges are allowed to have
the same target. OVSDB does not have a way to express this
constraint, so it must be omitted entirely.
Reported-by: Saul St. John <sstjohn@cs.wisc.edu> CC: Natasha Gude <natasha@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Tue, 12 Feb 2013 08:00:42 +0000 (00:00 -0800)]
Make OpenFlow 1.2+ role replies return the generation ID.
OpenFlow extensibility working group issue EXT-272 clarifies the use of
the generation_id in role reply messages as used for the current generation
ID or all-1-bits if there is no current generation ID. This commit
implements EXT-272 in Open vSwitch.
Unfortunately the full text of EXT-272 is not available freely online.
(The "open" part of the Open Networking Foundation is the network, not
the foundation.)
EXT-272. CC: Jarno Rajahalme <jarno.rajahalme@nsn.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
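The rule that the commit attributes to EXT-272 fits in a few lines. This Python sketch uses hypothetical names; only the "current generation ID or all-1-bits" behavior is taken from the description above.

```python
NO_GENERATION_ID = 0xFFFFFFFFFFFFFFFF  # all-1-bits in the 64-bit field


def role_reply_generation_id(have_generation_id, generation_id):
    """Return the value to place in the role reply's generation_id field."""
    return generation_id if have_generation_id else NO_GENERATION_ID
```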
Ben Pfaff [Tue, 12 Feb 2013 07:55:31 +0000 (23:55 -0800)]
ofp-util: Simplify struct ofputil_role_request.
It makes more sense to use enum ofp12_controller_role here than
to use enum nx_role, because the former is a superset of the latter and
we can then get rid of a bool member too.
Ben Pfaff [Mon, 11 Feb 2013 21:46:42 +0000 (13:46 -0800)]
vswitchd: Require "target" column to be unique in OVS database.
Commit cc7ecee48 (vswitchd: Add unique indexes for some columns.) says,
in part:
With this commit, the database server itself rejects attempts to add
Port or Interface records with duplicate names or Controller or
Manager records with duplicate targets.
but in fact didn't change the Controller table as described. This commit
fixes that.
This commit updates the schema version number's major version, because this
is a potentially non-backward compatible change, if some user depended on
the ability to add Controller records with duplicate targets. However, if
anyone thinks this is a bad idea, then I'm open to discussion.
Reported-by: Natasha Gude <natasha@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Ben Pfaff [Fri, 17 Aug 2012 23:08:17 +0000 (16:08 -0700)]
tests: Set explicit bond mode in LACP test.
This avoids a log warning:
bridge|WARN|port bond: Using the default bond_mode active-backup.
Note that in previous versions, the default bond_mode was balance-slb.
This warning is harmless, but I'm trying to add checks for "warn" and
higher severity log messages to the tests, so it makes sense to get rid of
this one.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Pavithra Ramesh [Fri, 8 Feb 2013 20:37:18 +0000 (12:37 -0800)]
stream-unix: Use rundir as root for relative paths.
Until now, "unix:" and "punix:" paths that are not absolute have
been considered relative to the current working directory. It
is more useful to consider them relative to the rundir, so this
commit makes that change to the C and Python implementations of
the stream code.
This commit also relaxes the whitelist check in the bridge code
so that any name that does not contain a "/" is considered OK.
Signed-off-by: Pavithra Ramesh <paramesh@vmware.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
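The path resolution rule can be sketched as below. This is not the stream code itself; the function name is hypothetical and the rundir value in the usage line is only an example.

```python
import posixpath


def resolve_unix_path(path, rundir):
    """Resolve a "unix:" or "punix:" path, treating relative paths as
    relative to the rundir instead of the current working directory."""
    if posixpath.isabs(path):
        return path
    return posixpath.join(rundir, path)
```

For example, `resolve_unix_path("db.sock", "/var/run/openvswitch")` yields a path under that directory, while an absolute path is returned unchanged.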
Jarno Rajahalme [Thu, 7 Feb 2013 22:06:23 +0000 (00:06 +0200)]
classifier: Maintain tables in descending priority order.
Signed-off-by: Jarno Rajahalme <jarno.rajahalme@nsn.com>
[blp@nicira.com: this along with Jarno's previous patch to the
classifier give me a combined 15% boost in "ovs-benchmark rate"
with a complicated flow table involving multiple resubmits] Signed-off-by: Ben Pfaff <blp@nicira.com>
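The commit has no body, so the following is only an assumption about why the ordering helps: if tables are visited in descending order of their highest rule priority, a lookup can stop as soon as the best match so far is at least as good as anything the remaining tables could hold. The data layout and function name are hypothetical.

```python
def lookup(tables, packet):
    """Return (priority, name) of the best matching rule, or None.

    'tables' is a list of (max_priority, rules) sorted by descending
    max_priority; 'rules' is a list of (priority, predicate, name).
    """
    best = None
    for max_priority, rules in tables:
        if best is not None and best[0] >= max_priority:
            break  # no later table can contain a higher-priority rule
        for priority, matches, name in rules:
            if matches(packet) and (best is None or priority > best[0]):
                best = (priority, name)
    return best
```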
ovs-pki: Increase the validity period for all certificates.
This patch increases the certificate validity to 100 years
for certificate authorities, the certificates that they certify
and for self signed certificates.
Ethan Jackson [Fri, 8 Feb 2013 02:39:24 +0000 (18:39 -0800)]
tunnel: Treat in_key=0 the same as a missing in_key.
The documented behavior of ovs is that a missing key is the
same as a zero key. However, the tunneling code actually treated
them differently. This could cause problems with tunneling modes
such as vxlan which always have a key. Specifically, a tunnel with
no key configured would have to send traffic with a key of
zero. However, the same tunnel would drop incoming traffic with a
zero key because it was expecting there to be none at all.
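The intended equivalence can be modeled by normalizing a missing key to zero before comparing. This is a sketch with hypothetical names, not the tunnel code.

```python
def normalize_key(key):
    """Treat a missing key (None) the same as a key of zero."""
    return 0 if key is None else key


def tunnel_accepts(configured_in_key, packet_key):
    """Return True if an incoming packet's key matches the tunnel's in_key."""
    return normalize_key(configured_in_key) == normalize_key(packet_key)
```

With this rule, a tunnel with no key configured accepts a VXLAN packet carrying a key of zero, which is the case the commit describes as previously dropped.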
Ethan Jackson [Thu, 7 Feb 2013 00:45:38 +0000 (16:45 -0800)]
tunnel: Log tunneling changes at INFO level.
These log messages occur infrequently, and are quite useful when
debugging problems after the fact. So they should be logged at
info level which makes them more readily available.
Cong Wang [Wed, 6 Feb 2013 22:40:36 +0000 (14:40 -0800)]
datapath: adjust skb_gso_segment() for calling in rx path
skb_gso_segment() is almost always called in tx path,
except for openvswitch. It calls this function when
it receives the packet and tries to queue it to user-space.
In this special case, the ->ip_summed check inside
skb_gso_segment() is no longer true, as ->ip_summed value
has different meanings on rx path.
This patch adjusts skb_gso_segment() so that we can at least
avoid such warnings on checksum.
Cc: Jesse Gross <jesse@nicira.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[jesse: backport to kernels before 3.9 and add to tunnel.c] Signed-off-by: Jesse Gross <jesse@nicira.com>
Ethan Jackson [Tue, 5 Feb 2013 02:45:54 +0000 (18:45 -0800)]
nicira-ext: Remove the autopath action.
The autopath action was attempting to achieve functionality similar
to the bundle action, but was significantly clunkier, more
difficult to understand, more difficult to use, and less reliable.
This patch removes it.
Ethan Jackson [Tue, 5 Feb 2013 02:28:57 +0000 (18:28 -0800)]
bond: Remove stable bond mode.
Stable bond mode, along with autopath, were trying to implement
functionality close to what we get from the bundle action.
Unfortunately, they are quite clunky, and generally less useful
than bundle, so they're being removed.
Simon Horman [Fri, 25 Jan 2013 07:22:07 +0000 (16:22 +0900)]
User-Space MPLS actions and matches
This patch implements user-space datapath and non-datapath code
to match and use the datapath API set out in Leo Alterman's patch
"user-space datapath: Add basic MPLS support to kernel".
The resulting MPLS implementation supports:
* Pushing a single MPLS label
* Popping a single MPLS label
* Modifying an MPLS label using set-field or load actions
that act on the label value, tc and bos bit.
* There is no support for manipulating the TTL;
this is considered future work.
The single-level push pop limitation is implemented by processing
push, pop and set-field/load actions in order and discarding information
that would require multiple levels of push/pop to be supported.
e.g.
push,push -> the first push is discarded
pop,pop -> the first pop is discarded
This patch is based heavily on work by Ravi K.
Cc: Ravi K <rkerur@gmail.com> Reviewed-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Fri, 1 Feb 2013 22:52:49 +0000 (14:52 -0800)]
python/ovs/db/types: Fix English grammar for enums with one member.
Before this change, enums that have one member were formatted as, e.g.:
"one of xyzzy, , or "
This changes them to be formatted as:
"must be xyzzy"
which makes much more sense.
(An enum with one member may make some sense if you are trying to leave
the possibility for future expansion.)
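The special case can be sketched as follows. This is not the code in python/ovs/db/types.py; the function name is hypothetical, and the wording for two-member enums is an assumption, since the commit only quotes the one-member and list forms.

```python
def describe_enum(values):
    """Return an English description of the allowed enum values."""
    values = sorted(values)
    if len(values) == 1:
        return "must be %s" % values[0]
    if len(values) == 2:
        # Assumed wording for the two-member case.
        return "either %s or %s" % (values[0], values[1])
    return "one of %s, or %s" % (", ".join(values[:-1]), values[-1])
```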
Jesse Gross [Fri, 1 Feb 2013 23:34:10 +0000 (15:34 -0800)]
tunneling: Don't send ICMP messages if no tunnel port is found.
Some tunnel code in OVS (for example, CAPWAP) uses the skb->cb to
store information while processing packets. However, if we don't
find an appropriate tunnel port on receive, then we send an ICMP
port unreachable message, which calls back into the IP stack. The
stack assumes that skb->cb will still contain valid information
from the IP layer, including any IP options. As a result,
icmp_echo_options() can read the garbage values from OVS and
overwrite data on the stack, panicking the machine.
This simply stops sending ICMP messages when ports are not found.
Many people find them confusing and flow based tunneling will
never send them (since it always finds a port) so it solves both
problems at once.
Ben Pfaff [Thu, 24 Jan 2013 21:46:23 +0000 (13:46 -0800)]
unixctl: Use ovs_retval_to_string() where EOF is a possible value.
jsonrpc_transact_block() might return EOF so passing its return value to
strerror() isn't general enough.
It might be better to change jsonrpc_transact{_block}() to never return
EOF, since a closed connection seems like it is always an error in that
context.
Found by Coverity.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
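The distinction matters because EOF is a negative value and not an errno. A Python model of a conversion that handles both follows; the exact strings are assumptions and the function name is hypothetical, with EOF taken to be -1 as in common C libraries.

```python
import os

EOF = -1  # assumption: the usual value of EOF in C libraries


def retval_to_string(retval):
    """Describe a return value that is 0, a positive errno, or EOF."""
    if retval == 0:
        return ""
    if retval > 0:
        return os.strerror(retval)
    if retval == EOF:
        return "End of file"
    return "***unknown return value: %d***" % retval
```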
Ben Pfaff [Thu, 24 Jan 2013 22:17:21 +0000 (14:17 -0800)]
vlog: New function vlog_set_levels_from_string_assert().
Two of the users of vlog_set_levels_from_string() in the tests could have
silently failed, if their arguments were invalid. This avoids that problem
(and a memory leak).
Found by Coverity.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Justin Pettit [Fri, 1 Feb 2013 08:11:32 +0000 (00:11 -0800)]
ofp-parse: Ignore "idle_age" and "hard_age" when parsing a flow string.
It should be possible to feed the output of "ovs-ofctl dump-flows" to
"ovs-ofctl add-flows". However, some of the metadata needs to be
ignored. "idle_age" and "hard_age" were recently added to the output of
"ovs-ofctl dump-flows", but they were not ignored like the other
metadata. This commit ignores them.
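The effect can be pictured as dropping metadata fields before the flow is parsed. The real parser in ofp-parse works differently; this Python sketch, its function name, and the exact set of ignored fields are assumptions for illustration.

```python
# Illustrative set of dump-flows fields that describe flow state, not the flow.
IGNORED_FIELDS = {"duration", "n_packets", "n_bytes", "idle_age", "hard_age"}


def strip_metadata(flow):
    """Remove metadata fields from a comma-separated flow string."""
    kept = []
    for field in flow.split(","):
        field = field.strip()
        name = field.split("=", 1)[0]
        if name not in IGNORED_FIELDS:
            kept.append(field)
    return ",".join(kept)
```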
Ben Pfaff [Fri, 17 Aug 2012 22:40:03 +0000 (15:40 -0700)]
netlink-socket: Don't bother logging SO_RCVBUFFORCE failure as non-root.
Some Open vSwitch utilities can do useful work when they are not run as
root. Without this commit, these utilities will log a warning on failure
to use the SO_RCVBUFFORCE socket option if they open any Netlink sockets.
This will always happen, it does not report anything unexpected or
fixable as non-root, and sometimes it makes users wonder if something is
wrong, so there is no benefit to logging it. This commit drops it in that
case.
Ethan Jackson [Fri, 25 Jan 2013 21:30:40 +0000 (13:30 -0800)]
netdev-vport: Build on all platforms.
This patch removes the final bit of Linux-specific code which
prevents building netdev-vport everywhere. With this, other
platforms automatically get access to patch ports, and (if their
datapath supports it), flow based tunneling.