Ethan Jackson [Fri, 27 May 2011 17:48:58 +0000 (10:48 -0700)]
ofproto: Optimize datapath actions.
The translation from ofproto to datapath actions has historically
been fairly naive. Since redundant ODP actions result in more code
execution in the fast path, this patch changes xlate_actions() to
emit fewer ODP actions for a given flow.
This change is not simply a theoretical optimization. It actually
reduces the number of ODP actions for real flow tables which
Nicira's controllers use in practice. Furthermore, removing
unnecessary modification actions after the last output action has
the added benefit of saving a whole skb_clone() in the fast path.
Ben Pfaff [Wed, 1 Jun 2011 20:39:51 +0000 (13:39 -0700)]
datapath: Get packet metadata from userspace in odp_packet_cmd_execute().
Until now, the tun_id and in_port have been lost when a packet is sent from
the kernel to userspace and then back to the kernel. I didn't think that
this was a problem, but recent behavior made me look closer and see that
it makes a difference if sFlow is turned on or if an
ODP_ATTR_ACTION_CONTROLLER action is present. We could possibly kluge
around those, but for future-proofing it seems better to pass the packet
metadata from userspace to the kernel. That is what this commit does.
This commit introduces a user-kernel protocol break. We could avoid that,
if it is desirable, by making ODP_PACKET_ATTR_KEY optional for
ODP_PACKET_CMD_EXECUTE commands.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Wed, 1 Jun 2011 17:53:53 +0000 (10:53 -0700)]
learning-switch: Don't limit message queued by --with-flows.
queue_tx() intentionally limits the number of outstanding OpenFlow messages
queued to the switch. This was unintentionally being applied to the
messages queued to the switch at startup by ovs-ofctl's --with-flows
command. This patch should fix the problem, by calling rconn_send()
directly instead of through queue_tx().
Ahmed reported that with this patch there was still a problem when 30,000
flows were specified in the file.
Ben Pfaff [Tue, 31 May 2011 19:50:57 +0000 (12:50 -0700)]
ovsdb: Check ovsdb_mutation_set_execute() return value in transactions.
Errors from this function were being ignored, which meant that transactions
could use "mutate" to bypass number-of-elements constraints on sets and
maps. This fixes the problem and adds a test to prevent the problem from
recurring.
Jesse Gross [Fri, 27 May 2011 22:57:28 +0000 (15:57 -0700)]
datapath: Don't call genlmsg_reply() under rcu_read_lock().
genlmsg_reply() indirectly makes a call to kmalloc but takes no
GFP flags, instead using GFP_ATOMIC if in a softirq and GFP_KERNEL
otherwise. However, here we hold rcu_read_lock(), which requires
GFP_ATOMIC but is not a softirq. Since we've already built the
reply message, it is safe to release rcu_read_lock(), so do that
before calling genlmsg_reply().
Ben Pfaff [Tue, 10 May 2011 15:59:21 +0000 (08:59 -0700)]
ovs-ofctl: Report attempts to add (remove, etc.) non-normalized flows.
Commit 0b3f27253 (ovs-ofctl: Warn about flows not in normal form) made
ovs-ofctl warn about non-normalized flows, that is, flows some of whose
specified fields will be ignored by the switch. This was convenient for
users, who are understandably confused by flow normalization. However,
later commit 8050b31d6 (ofp-parse: Refactor flow parsing) accidentally
deleted the warning. This commit restores it and adds a test to ensure
that it doesn't get deleted again later.
Ben Pfaff [Mon, 9 May 2011 23:25:48 +0000 (16:25 -0700)]
ofp-util: Export (again) a generalized function for normalizing rules.
Feature #5029 requests that "ovs-ofctl add-flow" report an attempt to add
a flow that is not properly normalized, that is, a flow to which the switch
will add extra wildcards, ignoring some fields specified by the user. This
requires that ofp-util make flow normalization directly available (again).
Until now, flow normalization has only been applied to OpenFlow 1.0 flows,
but the concept applies equally to NXM, so this commit generalizes the
implementation to NXM also.
Ben Pfaff [Mon, 9 May 2011 23:21:00 +0000 (16:21 -0700)]
nx-match: Check prerequisites for ICMPv6 before outputting subfields.
nicira-ext.h documents that NXM_NX_ND_TARGET requires ND_NEIGHBOR_SOLICIT
or ND_NEIGHBOR_ADVERT, that NXM_NX_ND_SLL requires ND_NEIGHBOR_SOLICIT,
and that NXM_NX_ND_TLL requires ND_NEIGHBOR_ADVERT, but nx_put_match()
would add them to the match regardless of whether these prerequisites were
satisfied. On the other side, nx_pull_match() did check the prerequisites,
so this was a case where OVS could output flows that it would refused to
parse. This fixes the problem.
Ethan Jackson [Thu, 26 May 2011 01:23:31 +0000 (18:23 -0700)]
CFM: Revert default MA and MD name.
Commit 84c5d450 "cfm: No longer allow configuration of ma_name and
md_name." changed the default MA and MD name for no reason. This
could add needless complexity to situations where OVS instances
built before and after this commit need to speak CFM to each other.
This commit reverts to the old values.
Ben Pfaff [Thu, 26 May 2011 16:06:10 +0000 (09:06 -0700)]
ovsdb-doc: Omit E-R diagram from ASCII version of manpage.
The E-R diagram is illegible when rendered in ASCII, so exclude it from
that version only. The E-R diagram is still included in other versions of
the manpage (e.g. for PostScript output).
Ben Pfaff [Tue, 24 May 2011 18:04:22 +0000 (11:04 -0700)]
test-csum: Avoid "cast increases required alignment of target type" warning.
Casting a character pointer to uint16_t * or uint32_t * provokes this
warning on sparc. There is no actual problem, because all of the accesses
to data occurs through calls to the get_unaligned_*() functions, so this
commit suppresses the warning by using "void *" as an intermediate type.
Ben Pfaff [Tue, 24 May 2011 18:02:31 +0000 (11:02 -0700)]
bridge: Avoid "cast increases required alignment of target type" warning.
This pointer arithmetic caused a warning on sparc. One alternative would
be to add a cast through "void *", but it seems at least as reasonable to
just use a few macros to do direct assignments.
Ben Pfaff [Thu, 26 May 2011 16:03:03 +0000 (09:03 -0700)]
in-band: Avoid "cast increases required alignment of target type" warning.
Casting a struct ofp_action_output object's address to type union
ofp_action * provokes a warning on sparc. It's easier to just declare
the object as having type union ofp_action, even though I don't think that
there is a real problem here.
Ben Pfaff [Tue, 24 May 2011 17:55:26 +0000 (10:55 -0700)]
netlink: Avoid "cast increases required alignment of target type" on RISC.
This warning doesn't indicate a genuine problem, since 'nla' must be
aligned properly and NLA_ALIGN ensures that the offset is aligned properly
too. Casting via void * suppresses the warning.
Ben Pfaff [Tue, 24 May 2011 19:23:42 +0000 (12:23 -0700)]
poll-loop: Automatically log reason for wakeup when CPU usage spikes.
For a long time, the poll-loop module has had the ability to log the reason
for wakeups, which is valuable for debugging excessive use of CPU time.
But I have to ask users to turn up the log level for the module, which
wastes their time and mine. This commit improves the situation by
automatically logging the reason for a wakeup whenever a process's
estimated CPU usage rises above 50%. (ovs-vswitchd often uses less than
1% CPU; more than 5% CPU is uncommon.)
Ben Pfaff [Wed, 18 May 2011 21:28:24 +0000 (14:28 -0700)]
ofproto-dpif: Fix statistics busted by bad merge.
This reimplements the statistics fixes originally implemented by the
following commits, which became broken by bad merges during the previous
"next" branch development cycle:
Ben Pfaff [Fri, 13 May 2011 21:20:10 +0000 (14:20 -0700)]
tests: Fix the two Python XFAIL tests.
OVS has two Python tests that have always failed, for reasons not
understood, since they were added to the tree. This commit fixes them.
One problem was that Python was assuming that stdout was encoded in ASCII.
Apparently the only way to "fix" this at runtime is to set PYTHONIOENCODING
to utf_8 in the environment, so this change does that.
Second, it appears that Python really doesn't like to print invalid UTF-8,
so this avoids doing that in python/ovs/json.py, instead just printing
the hexadecimal values of the invalid bytes. For consistency, it makes
the same change to the C version.
Third, the C version of test-ovsdb doesn't check UTF-8 for consistency, it
just sends it blindly to the OVSDB server, but Python does check it and so
it bails out earlier. This commit changes the Python version of the
"no invalid UTF-8 sequences in strings" to allow for the slight difference
in output that occurs for that reason.
Finally, test-ovsdb.py needs to convert error messages to Unicode
explicitly before printing them in the "parse-atoms" function. I don't
really understand why, but now it works.
Ben Pfaff [Thu, 5 May 2011 18:25:51 +0000 (11:25 -0700)]
stream-ssl: Log unexpected errors from 'stat'.
A user report possibly implicates problems reading the mtime of the CA
cert file. Until now, nothing has logged these errors. This commit adds
such logging.
Ben Pfaff [Thu, 5 May 2011 17:59:50 +0000 (10:59 -0700)]
stream-ssl: Force CA cert file to be read when it appears during bootstrap.
A user report shows the message "reading CA cert
/etc/openvswitch/vswitchd.cacert created by another process" appearing
hundreds of times over a long period of time in the log. The only way I
can see that this would happen is if update_ssl_config() returned false,
indicating that the CA cert does not need to be re-read because it has not
changed. This commit should prevent that from happening.
We don't want to simply skip calling update_ssl_config() in this case,
because then the next call to stream_ssl_set_ca_cert_file() would usually
re-read the CA certificate, which is a waste of time.
Ben Pfaff [Mon, 2 May 2011 23:27:58 +0000 (16:27 -0700)]
ovsdbmonitor: Update to work with recent ovs-dpctl output.
The format of "ovs-dpctl dump-flows" has completely changed since
ovsdbmonitor was written. It seems like no one has tried out ovsdbmonitor
since the rewrite, because it didn't work at all. This commit fixes the
parser.
Ben Pfaff [Mon, 2 May 2011 23:17:11 +0000 (16:17 -0700)]
ovsdbmonitor: Use ovs.json module instead of JsonReader and JsonWriter.
I can't figure out where JsonReader and JsonWriter come from. I know that
they must exist, because I (and others) have used ovsdbmonitor before, but
I can't find them now.
Switch to using ovs.json, which is part of Open vSwitch so we know that
it exists. At the same time, we have to start translating the Unicode
strings that ovs.json outputs into standard Python strings; otherwise
the "twisted conch" ssh implementation craps out because it tries to
concatenate this Unicode string with a standard string that contains
non-ASCII characters.
Ben Pfaff [Tue, 3 May 2011 00:15:21 +0000 (17:15 -0700)]
docs: Suppress "warning: macro `DD' not defined" warnings from man.
deamon.man allows the file that is including it to include extra text in
the description of --detach by defining a macro named DD. Only some of the
manpages that included it did this (only those manpages that needed extra
text there). But it's better to be quiet in "man --warnings", so this
defines DD to an empty value in the other manpages that include daemon.man.
There is no reason to believe that the source and destination ports
will be symmetric in a bidirectional UDP stream. This patch no
longer uses them for symmetric hashing.
Ben Pfaff [Thu, 12 May 2011 20:46:16 +0000 (13:46 -0700)]
Avoid sparse error or warning for sizeof(_Bool).
The sparse checker does not like taking sizeof(_Bool). Older versions of
sparse output a hard error ("error: cannot size expression"). Newer
versions output a warning ("warning: expression using sizeof bool"). This
commit avoids the problem by not using sizeof(_Bool) anywhere.
The only place where OVS uses sizeof(_Bool) anyway is in code generated by
the OVSDB IDL. It generates it for populating "optional bool" columns in
the database, that is, columns that are allowed to contain 0 or 1 instances
of a bool. For these columns, it generates code that looks roughly like
this:
row->column = xmalloc(sizeof *row->column);
*row->column = value;
This commit changes these columns from type "bool *" to type "const bool *"
and changes the generated code to:
static const bool true_value = true;
static const bool false_value = false;
row->column = value ? &true_value : &false_value;
which avoids the problem and saves a malloc() call at the same time.
The idltest code had a column with a slightly different type ("0, 1, or
2 bools") that didn't fit the revised pattern and is a fairly stupid type
anyhow, so I just changed it to "0 or 1 bools".
Ethan Jackson [Tue, 17 May 2011 23:07:58 +0000 (16:07 -0700)]
cfm: Implement 802.1ag RDI flag.
According to the 802.1ag specification, a maintenance point should
set the Remote Defect Indicator (RDI) bit on CCMs when it has
failed to receive a CCM from any of it's configured remote
maintenance points within the required fault interval. This allows
unidirectional faults to be flagged by both ends of a CFM monitored
tunnel.
Ethan Jackson [Sat, 14 May 2011 01:11:43 +0000 (18:11 -0700)]
cfm: Remove packet definition from CFM header file.
This patch makes a stylistic improvement by removing CFM protocol
information from cfm.h. In the process it changes
cfm_compose_ccm() to populate an ofpbuf instead of a struct ccm.
Ethan Jackson [Thu, 12 May 2011 22:28:43 +0000 (15:28 -0700)]
cfm: Remove Maintenance_Point and Monitor tables.
In an effort to make CFM easier to understand and configure, this
patch removes the Maintenance_Point and Monitor tables from the
database. As a consequence, users will only be able to configure
one remote maintenance point. Furthermore, before this patch each
remote maintenance point maintained its own separate fault flag in
the database. This flag is no longer reported, users will need to
infer the fault status from the global CFM fault flag.
Ethan Jackson [Thu, 12 May 2011 23:08:52 +0000 (16:08 -0700)]
cfm: No longer allow configuration of ma_name and md_name.
These settings added complexity to the database and CFM module
interface with negligible benefit. This patch removes them in such
a way that they can easily be re-added in the (unlikely) event that
we need them in the future.
Ethan Jackson [Fri, 13 May 2011 23:53:24 +0000 (16:53 -0700)]
cfm: Migrate cfm/show unixctl command to CFM module.
This patch moves the cfm/show unixctl show command from the bridge
to the CFM module. This is more in line with how LACP does it, and
will make future patches easier to implement.
Ethan Jackson [Mon, 16 May 2011 21:40:03 +0000 (14:40 -0700)]
netdev: Take responsibility for polling MII registers.
This patch moves miimon logic from the bond module to netdev-linux.
This greatly simplifies the bonding code while adding minimal
complexity to netdev-linux. The bonding code is so high level, it
really has no business worrying about how precisely slave status is
determined.
Ethan Jackson [Thu, 19 May 2011 22:52:54 +0000 (15:52 -0700)]
xenserver: Pull slave MTU from bond record.
The MTU of the fake bond interface and the slaves participating in
a bond should all agree. The correct long term solution to this
problem is to remove the fake bond interface altogether. Until
that's possible, we simply set the MTU of the slaves.
Jesse Gross [Thu, 19 May 2011 20:17:39 +0000 (13:17 -0700)]
datapath: Check that netdev vport is fully initialized.
Starting in 2.6.37 we have our own flag for identifying net_devices
as being attached to OVS. However, it's possible to receive packets
before this flag has been applied, resulting in a NULL vport when
processing the packet. This checks to make sure that the vport is
valid instead of crashing.
Bug #5675
Reported-by: Brad Hall <brad@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Wed, 18 May 2011 23:40:21 +0000 (16:40 -0700)]
ofproto-dpif: Fix bond accounting.
Calls to bond_account() and bond_choose_output_slave() had different ideas
for the vlan of a flow that did not have a tagged VLAN. The call to
bond_choose_output_slave() passed OFP_VLAN_NONE in this case, the call to
bond_account() passed 0. This meant that packets not on a VLAN weren't
accounted properly, which typically caused bond/show to show "0 kB load"
on active hashes. Obviously that broke rebalancing too.
I've verified that this fixes accounting. I haven't directly verified that
it fixes rebalancing, so it's possible that there is another issue too.
Ethan Jackson [Wed, 18 May 2011 18:52:08 +0000 (11:52 -0700)]
ofproto: Datapath statistics accounted twice.
Due to an error introduced in Commit 6f1435f "ofproto: Resubmit
statistics improperly account during failover." Flow statistics
could be double accounted when removed from the datapath.
Andrew Evans [Wed, 18 May 2011 18:30:07 +0000 (11:30 -0700)]
datapath: Hash and compare only the part of sw_flow_key actually used.
Currently the whole flow key struct is hashed on every packet received from the
network or userspace. The whole struct is also compared byte-for-byte when
doing flow table lookups. This consumes a fair percentage of CPU time, and most
of the time part of the structure is unused (e.g. the IPv6 fields when handling
IPv4 traffic; the IPv4 fields when handling Ethernet frames).
This commit reorders the fields in the flow key struct to put the least
commonly used elements at the end and changes the hash and comparison functions
to look only at the part that contains data.
Signed-off-by: Andrew Evans <aevans@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Ethan Jackson [Tue, 17 May 2011 20:47:23 +0000 (13:47 -0700)]
bond: Bonds never sleep if carrier changes.
The bonding code neglected to call netdev_monitor_poll() on its
monitor during bond_run(). Thus carrier changes would be
permanently queued in the monitor, preventing it from ever allowing
poll_loop to sleep.
Sajjad Lateef [Tue, 17 May 2011 00:29:04 +0000 (17:29 -0700)]
xenserver: modify module spec file
Based on feedback from Citrix about building for multiple kernels,
the spec file has been modified to take three arguments on the
command line: kernel_name, kernel_version and kernel_flavor.
The kernel_flavor is either xen or kdump. The kernel_name is the
Name value embedded in the kernel rpm and the kernel_version
is Version-Release values embedded in the kernel rpm. The
xen_version is calculated.
The INSTALL document has been updated to reflect these changes.
Sajjad Lateef [Tue, 17 May 2011 00:24:20 +0000 (17:24 -0700)]
xenserver: modify module spec file
Based on feedback from Citrix about building for multiple kernels,
the spec file has been modified to take three arguments on the
command line: kernel_name, kernel_version and kernel_flavor.
The kernel_flavor is either xen or kdump. The kernel_name is the
Name value embedded in the kernel rpm and the kernel_version
is Version-Release values embedded in the kernel rpm. The
xen_version is calculated.
The INSTALL document has been updated to reflect these changes.
Ben Pfaff [Fri, 6 May 2011 19:59:51 +0000 (12:59 -0700)]
Make the source tree sparse clean.
With this commit, the tree compiles clean with sparse commit 87f4a7fda3d
"Teach 'already_tokenized()' to use the stream name hash table" with patch
"evaluate: Allow sizeof(_Bool) to succeed" available at
http://permalink.gmane.org/gmane.comp.parsers.sparse/2461 applied, as long
as the "include/sparse" directory is included for use by sparse (only),
e.g.:
make CC="CHECK='sparse -I../include/sparse' cgcc"
Ben Pfaff [Fri, 6 May 2011 19:38:20 +0000 (12:38 -0700)]
Fix up usage of flow_wildcards_t.
The flow_wildcards_t type is defined as a distinct type from sparse's
perspective (with __attribute__((bitwise))) so that we don't accidentally
mix it with only-partially-compatible OFPFW_* flags. But we were weren't
using it quite right in a few plces. This fixes it up.
Ben Pfaff [Fri, 6 May 2011 19:34:46 +0000 (12:34 -0700)]
csum: Annotate byte order to match actual usage.
The IP checksum algorithm yields identical results regardless of whether
arithmetic little-endian or big-endian, but in practice OVS only passes in
big-endian data, so it seems reasonable to annotate these functions that
way.
Ben Pfaff [Fri, 6 May 2011 18:38:19 +0000 (11:38 -0700)]
Suppress sparse warnings for global variables initialized in headers.
sparse warns if a non-static variable with external linkage has an
initializer at first declaration, because it suspects that it should be
static instead. Generally it's correct, but not in these cases, so add
a redundant declaration to suppress the warning.
The suppress warnings look like:
../ofproto/connmgr.c:40:1: warning: symbol 'VLM_connmgr' was not declared. Should it be static?
../ofproto/collectors.c:31:1: warning: symbol 'vlog_module_ptr_collectors' was not declared. Should it be static?
../ofproto/connmgr.c:43:1: warning: symbol 'counter_ofconn_stuck' was not declared. Should it be static?
Ben Pfaff [Fri, 6 May 2011 18:34:59 +0000 (11:34 -0700)]
compiler: Suppress sparse complaints about function attributes.
GCC allows __attribute__s to be included in function prototypes and
then omitted later on the function definition, but sparse complains about
this. Furthermore, sparse doesn't like the placement of the __attribute__s
that we tend to use in OVS.
I don't see any value in "fixing" these to suit sparse so it seems better
to just omit them.
Ben Pfaff [Wed, 4 May 2011 21:03:43 +0000 (14:03 -0700)]
openflow: Change types from uint<N>_t to ovs_be<N>.
I've been reluctant in the past to make wholesale changes to openflow.h
because it would be a divergence from upstream that would make comparisons
and merges more difficult. But, in practice, no one does such comparisons
and no merges happen (because OpenFlow 1.0 is not changing). I'd still be
inclined to resist, except that in this series I'm adding actual checking
for byte order conventions (as opposed to just documentation).
Ben Pfaff [Tue, 29 Mar 2011 21:42:20 +0000 (14:42 -0700)]
Convert remaining network-byte-order "uint<N>_t"s into "ovs_be<N>"s.
I looked at almost every uint<N>_t in the tree to determine whether it was
really in network byte order, and converted the ones that were.
The only remaining ones, modulo my mistakes, are in openflow.h. I'm not
sure whether we should convert those, because there might be some value
in remaining close to upstream for this header.
Ben Pfaff [Wed, 4 May 2011 22:44:51 +0000 (15:44 -0700)]
ofproto: Fix ofproto_send_packet() treatment of vlan_tci parameter.
ofproto_send_packet() seems very confused about whether vlan_tci is in
host or network byte order. But the callers always pass 0 anyway, so we
might as well remove it.