Ben Pfaff [Mon, 30 Apr 2012 20:56:49 +0000 (13:56 -0700)]
connmgr: Log when controllers are added and removed.
Otherwise occasionally during debugging it can hard to figure out why a
controller connection seemed to drop for a while (when in fact it happened
because the configuration changed).
Suggested-by: Natasha Gude <natasha@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Mon, 30 Apr 2012 17:47:51 +0000 (10:47 -0700)]
ovs-bugtool: Add "ovs-vsctl show" output to bugtool.
The information output by "ovs-vsctl show" is a subset of that available
elsewhere in bugtool output, but it is human-readable instead of needing
to be processed through ovsdb-server or ovsdb-tool, so it is much more
convenient for basic diagnosis.
xenserver: Update Open vSwitch post upgrade instructions.
Currently, when we upgrade the userspace rpm for XenServer,
we print a message asking users to reboot the hypervisor.
This is not needed. The reboot of hypervisor is needed when
we upgrade the rpm containing the kernel module. This
reboot can sometimes be avoided by running a
"service openvswitch force-reload-kmod".
Ethan Jackson [Thu, 26 Apr 2012 04:12:18 +0000 (21:12 -0700)]
rconn: Simplify rconn_send() semantics.
Before this patch, rconn_send() would delete 'b' on success, and
not on error. This is confusing and error-prone. This patch
causes rconn_send() to always delete 'b'.
Ben Pfaff [Fri, 20 Apr 2012 21:09:30 +0000 (14:09 -0700)]
coverage: Make ovs-appctl command more useful and less alarming.
I've had a few complaints that ovs-vswitchd logs its coverage counters
at WARN level, but this is mainly wrong: ovs-vswitchd only logs coverage
counters at WARN level when the "coverage/log" command is used through
ovs-appctl. This was even documented.
The reason to log at such a high level was to make it fairly certain that
these messages specifically requested by the admin would not be filtered
out before making it to the log. But it's even better if the admin just
gets the coverage counters as a reply to the ovs-appctl command. So that
is what this commit does.
This commit also improves the documentation of the ovs-appctl command.
I'd always assumed that the exponentially weighted moving average code in
timeval was enough rate-limiting, but I actually encountered a pathological
case some time ago that forced this coverage information to print once a
second or so, which seems too often.
Ben Pfaff [Fri, 20 Apr 2012 20:43:54 +0000 (13:43 -0700)]
timeval: Rate-limit logging rusage information.
I'd always assumed that the exponentially weighted moving average code
here was sufficient rate-limiting, but I actually encountered a
pathological case some time ago that forced this rusage information to
print once a second or so, which seems too often.
Ben Pfaff [Fri, 20 Apr 2012 21:52:16 +0000 (14:52 -0700)]
cfm: Log fault status changes more informatively.
Until now, fault status changes just log the new status. This means that
the administrator has to find two consecutive status change messages to
see what actually changed.
This commit changes the log message format to prefix new faults with '+'
and faults that disappeared with '-'. Existing faults that are still
present are not prefixed.
This also simplifies the code a little by making ds_put_cfm_fault()
put spaces before fault names instead of after.
Ben Pfaff [Thu, 26 Apr 2012 16:48:28 +0000 (09:48 -0700)]
json: Correct position tracking in JSON parser implementations.
When json_lex_input() returns false, the parser does not consume the byte
passed in. That byte will get processed again in the next iteration of
the json_parser_feed() loop. Therefore, until now, this code has
double-counted bytes that cause a false return from json_lex_input().
This fixes the problem. Every input byte is now counted only once.
Ben Pfaff [Tue, 24 Apr 2012 17:57:41 +0000 (10:57 -0700)]
jsonrpc: Keep jsonrpc_recv() from taking over the CPU.
jsonrpc_recv() could take an unbounded amount of CPU time as long as data
kept arriving, preventing other work from taking place. This limits the
amount of work to processing at most 25 kB of received data and then
yielding to the caller.
Simon Horman [Wed, 25 Apr 2012 01:18:30 +0000 (10:18 +0900)]
Add OXM data to mf_fields
Add oxm_name and oxm_header elements to struct mf_field
and populate those entries for fields that are present
in both NXM and the OXM basic class.
This implementation was suggested by Ben Pfaff.
This does not address any possible differences in the NXM and
OXM basic class fields, for instance different maskability.
That may be addressed later as needed.
Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Tue, 24 Apr 2012 23:53:01 +0000 (16:53 -0700)]
vswitchd: Clean up iface_create().
iface_create() did its work in an order that meant it had to do a lot more
cleanup on error paths than is otherwise needed. This commit reorders the
work to avoid this extra cleanup.
bridge_ofproto_port_del() is no longer used after the refactoring so this
commit deletes it.
Ben Pfaff [Tue, 24 Apr 2012 23:47:27 +0000 (16:47 -0700)]
vswitchd: Make reconfiguration update port configuration again.
Commit bae7208e91a0 (bridge: Refactor bridge_reconfigure().) introduced
a regression in bridge reconfiguration. Previously, reconfiguration would
update the configuration of each bridge port, so that if the controller
(or the admin) changed a port's options, then that change would propagate
to the datapath. Following that commit, that no longer happened.
This commit restores that feature.
Bug #10972. Reported-by: Michael Hu <mhu@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ethan Jackson [Sat, 21 Apr 2012 02:11:35 +0000 (19:11 -0700)]
bridge: Refactor bridge_reconfigure().
The existing bridge_reconfigure() implementation is suboptimal.
When adding lots of new ports, on every pass through the run loop
it allocates a bunch of "struct iface"s and "struct port"s, only to
destroy them when out of time. Additionally, when there are errors
adding or deleting ports, it can fail to converge. Instead it will
attempt and fail to add the same set of ports forever.
This patch rewrites bridge_reconfigure() using a new strategy.
Whenever the database changes, some initial bookkeeping is done,
and a list of future work is compiled. The bridge begins whittling
down this list, and stops processing database changes until
finished.
Bug #10902. Signed-off-by: Ethan Jackson <ethan@nicira.com>
Ben Pfaff [Mon, 23 Apr 2012 17:07:29 +0000 (10:07 -0700)]
tests: Generalize 'sed' calls in MAC learning test to more than one digit.
With "check-valgrind" the test can take more than 10 seconds to run, so
replacing only a single trailing digit with ? ends up with 1? which causes
the test to fail.
Ben Pfaff [Mon, 23 Apr 2012 16:16:18 +0000 (09:16 -0700)]
ofproto: Fix use-after-free error when ports disappear.
update_port() can delete the port for which it is called, if the underlying
network device has been destroyed, so HMAP_FOR_EACH is unsafe in
ofproto_run().
Less obviously, update_port() can delete unrelated ports. For example,
suppose that initially device A is port 1 and device B is port 2. If
update_port("A") runs just after this, then it will ofport_remove() both
ports, then ofport_install() A as the new port 2.
So this commit first assembles a list of ports to update, then updates them
in a separate loop.
Without this commit, running "ovs-dpctl del-dp" while ovs-vswitchd is
running consistently causes a crash for me within a few seconds.
Ben Pfaff [Tue, 17 Apr 2012 23:56:21 +0000 (16:56 -0700)]
ovs-vsctl: Speed up port management operations with many ports.
This makes a sequence of 10,000 "add-port" operations on a single ovs-vsctl
command line about 4X faster. It makes a sequence of 10,000 "del-port"
operations on a single command line over 2X faster.
It works by not repopulating the cache of relationships between bridges,
ports, and interfaces after most operations, instead updating them
incrementally in-place.
Ben Pfaff [Wed, 18 Apr 2012 00:07:00 +0000 (17:07 -0700)]
ovs-vsctl: Merge struct vsctl_info into struct vsctl_context.
To speed up management operations with many ports, we need to preserve the
cache of bridges, ports, and interfaces from one operation to the next.
One necessary step is to push the "struct vsctl_info" that did the caching
up from the individual functions that need it into a more global structure.
This commit does that, merging it into struct vsctl_context.
This commit also modifies do_vsctl(), the top-level control code in
ovs-vsctl, to keep this part of the vsctl_context unchanged from running
one command to the next.
Ben Pfaff [Tue, 17 Apr 2012 20:50:53 +0000 (13:50 -0700)]
ovs-vsctl: Verify VLAN bridge controllers in cmd_get_controller().
A VLAN bridge uses its parent's controllers, so checking the controller
should verify the parent's set of controllers.
The change to verify_controllers() isn't necessary; it just deletes
the check for a null 'bridge' because verify_controllers() can no
longer be called with a null 'bridge'.
This fixes a bug, but it is unlikely to ever have caused a real problem for
users.
Ben Pfaff [Mon, 16 Apr 2012 22:54:37 +0000 (15:54 -0700)]
ofproto-dpif: Avoid extra flow copy in xlate_actions() for unneeded warnings.
The copy of the extra flow copy here was showing up in profiles. We only
need this copy if we end up doing a "trace" to warn the user. Most runs
won't ever do that, so don't start making copies until we actually hit
such a case.
This has a small behavioral change in that we'll only get a warning on the
*second* time we hit the resubmit recursion limit, not on the first. I
doubt that's really a problem.
Ben Pfaff [Thu, 19 Apr 2012 00:11:10 +0000 (17:11 -0700)]
ofproto-dpif: Implement "flow setup governor" to speed up many short flows.
The cost of creating and initializing facets and subfacets and installing,
tracking, and uninstalling kernel flows is significant. When most flows
have only one or a few packets, this overhead is higher than the cost of
handling each packet individually. This commit introduces heuristics that
cheaply count (approximately) the number of packets seen in a flow and
skips most of this expensive bookkeeping until the packet count exceeds a
threshold (currently 5 packets).
Ben Pfaff [Fri, 13 Apr 2012 21:09:10 +0000 (14:09 -0700)]
ofproto-dpif: Make it easier to credit statistics for resubmits.
Until now, crediting statistics to OpenFlow rules due to "resubmit" actions
has required setting up a "resubmit hook" with a callback function and
auxiliary data. This commit makes it easier to do, by adding a member to
struct action_xlate_ctx that specifies statistics to credit to each
resubmitted rule.
This commit includes one small behavioral change as an optimization.
Previously, rule_execute() translated the rule twice: once to get the ODP
actions, then a second time after executing the ODP actions to credit
statistics to the rules. After this commit, rule_execute() translates the
rule only once, crediting statistics as a side effect. The difference only
becomes visible when executing the actions fails: previously the statistics
would not be incremented, after this commit they will be. It is very
unusual for executing actions to fail (generally this indicates a bug) so
I'm not concerned about it.
Ben Pfaff [Mon, 9 Apr 2012 22:49:22 +0000 (15:49 -0700)]
classifier: Optimize search of "catchall" table.
Most flow tables have some kind of "catchall" rule that matches every
packet. For this table, the cost of copying, zeroing, and hashing the
input flow is significant. This patch avoids these costs.
Ben Pfaff [Sat, 7 Apr 2012 00:11:18 +0000 (17:11 -0700)]
ofproto-dpif: Avoid malloc() of "struct flow_miss".
In addition to avoid malloc() for struct flow_miss, this commit avoids
copying "struct flow" around, which is a significant benefit because
struct flow is currently 144 bytes.
Ben Pfaff [Fri, 6 Apr 2012 22:30:21 +0000 (15:30 -0700)]
ofproto-dpif: Avoid malloc() in common case for allocating subfacets.
Usually a facet has exactly one subfacet that has the same lifetime as
the facet. Allocating both the facet and its subfacet in a single memory
block improves performance.
Ben Pfaff [Mon, 16 Apr 2012 23:01:01 +0000 (16:01 -0700)]
netlink: Postpone choosing sequence numbers until send time.
Choosing sequence numbers at time of creating a packet means that
nl_sock_transact_multiple() has to search for the sequence number
of a reply, because the sequence numbers of the requests aren't
necessarily sequential. This commit makes it possible to avoid
the search, by deferring choice of sequence numbers until the
time that we send the packets. It doesn't actually modify
nl_sock_transact_multiple(), which will happen in a later commit.
Previously, I was concerned about a theoretical race condition
described in a comment in the old versino of this code:
This implementation uses sequence numbers that are unique
process-wide, to avoid a hypothetical race: send request, close
socket, open new socket that reuses the old socket's PID value,
send request on new socket, receive reply from kernel to old
socket but with same PID and sequence number. (This race could be
avoided other ways, e.g. by preventing PIDs from being quickly
reused).
However, I no longer believe that this can be a real problem,
because Netlink operates synchronously. The reply to a request
will always arrive before the socket can be closed and a new
socket opened with the old socket's PID.
Ben Pfaff [Thu, 5 Apr 2012 20:43:24 +0000 (13:43 -0700)]
ofproto-dpif: Don't do an extra flow translation when removing facets.
When we remove a facet, it seems superfluous to translate the flow
once more just to update the MAC learning table, since we know that
it was done within a second or so anyway (e.g. when the flow stats
were last updated).
Ansis Atteka [Fri, 30 Mar 2012 02:03:08 +0000 (19:03 -0700)]
ovs-test: Enhancements to the ovs-test tool
-Implemented support for ovs-test client, so that it could automatically
spawn an ovs-test server process from itself. This reduces the number of
commands the user have to type to get tests running.
-Automated creation of OVS bridges and ports (for VLAN and GRE tests), so that
user would not need to invoke ovs-vsctl manually to switch from direct, 802.1Q
and GRE tests.
-Fixed some pylint reported warnings.
-Fixed ethtool invocation so that we always try to query the physical interface
to get the driver name and version.
-and some others enhancements.
The new usage:
Node1:ovs-test -s 15531
Node2:ovs-test -c 127.0.0.1,1.1.1.1 192.168.122.151,1.1.1.2 -d -l 125 -t gre
Ethan Jackson [Mon, 16 Apr 2012 22:01:09 +0000 (15:01 -0700)]
lacp: Remove heartbeat mode.
The LACP heartbeat mode was used to monitor interfaces for
connectivity. It turns out that LACP is inferior to CFM for this
purpose. For the sake of simplicity this patch removes the
feature.
Ethan Jackson [Mon, 16 Apr 2012 21:55:58 +0000 (14:55 -0700)]
lacp: Remove custom transmission intervals.
Open vSwitch allowed users to set a custom LACP PDU transmission
interval. This turned out to be an ill conceived idea which was
more confusing than useful. This patch reverts Open vSwitch to the
behavior supported in the LACP specification: two transmission
intervals, fast and slow.
Ethan Jackson [Mon, 16 Apr 2012 19:09:49 +0000 (12:09 -0700)]
vswitch: Use consistent representation of DSCP bits.
There are two sensible ways to represent the 6 DSCP bits of an IP
packet. One could represent them as an integer in the range 0 to
63. Or one could represent them as they would appear in the tos
field (0 to 63) << 2. Before this patch, OVS had used the former
method for the DSCP bits in the Queue Table, and the latter for the
DSCP in the Controller and Manager tables. Since the ability to
set DSCP bits in the Controller and Manager tables is so new that
it hasn't been released yet, this patch changes it to use the
existing style employed in the Queue table. Hopefully this should
make the code and configuration less confusing.
Ethan Jackson [Mon, 16 Apr 2012 20:56:58 +0000 (13:56 -0700)]
socket-util: Remove DSCP_INVALID.
The DSCP_INVALID flag allowed callers to prevent socket-util from
modify the DSCP bits of newly created sockets. However, the two
really important callers (implementations of the controller and
manager tables) never used it. Furthermore, the other callers
would be fine always setting the DSCP bits to zero. This patch
removes the DSCP_INVALID option in an effort to simplify the code.
Ethan Jackson [Thu, 12 Apr 2012 02:52:46 +0000 (19:52 -0700)]
lib: Pull dscp bits out of reconnect.
The DSCP bits of a connection have nothing to do with the
reconnection state machine. This pulls them up into jsonrpc which
needs them to properly establish connections.
Ethan Jackson [Mon, 16 Apr 2012 19:46:46 +0000 (12:46 -0700)]
socket-util: Close socket on failed dscp modification.
If socket-util failed to modify the dscp bits of an active
connection, it would fail to close the file descriptor potentially
causing a leak. Found by inspection.
Ben Pfaff [Sat, 14 Apr 2012 04:24:17 +0000 (21:24 -0700)]
learn: Make it possible to parse "load" actions wider than 64 bits.
The implementation of the "learn" action now properly implements
specifications such as 0x20010db885a308d313198a2e03707348->NXM_NX_IPV6_DST
but the parser used in ovs-ofctl and elsewhere could not generate such
specifications. This commit adds that support.
Ben Pfaff [Wed, 11 Apr 2012 21:45:34 +0000 (14:45 -0700)]
meta-flow: New functions for reading and writing generalized subfields.
The existing functions for reading and writing the values of subfields only
handle subfields up to 64 bits wide. These new functions handle subfields
of any width.
Ethan Jackson [Sat, 7 Apr 2012 01:38:48 +0000 (18:38 -0700)]
bridge: Rate limit port creations and deletions.
In some datapaths, adding or deleting OpenFlow ports can take quite
a bit of time. If there are lots of OpenFlow ports which needed to
be added in a run loop, this can cause Open vSwitch to lock up and
stop setting up flows while trying to catch up. This patch lessons
the severity of the problem by only doing a few OpenFlow port adds
or deletions per run loop allowing other work to be done in
between.
Bug #10672. Signed-off-by: Ethan Jackson <ethan@nicira.com>
Ben Pfaff [Tue, 27 Mar 2012 17:16:52 +0000 (10:16 -0700)]
ovsdb-idl: Simplify transaction retry.
Originally the IDL transaction state machine had a return value
TXN_TRY_AGAIN to signal the client to wait for a change in the database and
then retry its transaction. However, this logic was incomplete, because
it was possible for the database to change before the reply to the
transaction RPC was received, in which case the client would wait for a
further change. Commit 4fdfe5ccf84c (ovsdb-idl: Prevent occasional hang
when multiple database clients race.) fixed the problem by breaking
TXN_TRY_AGAIN into two status codes, TXN_AGAIN_WAIT that meant to wait for
a further change and TXN_AGAIN_NOW that meant that a change had already
occurred so try again immediately.
This is correct enough, but it is more complicated than necessary. It is
simpler and just as correct to use a single "try again" status that
requires the client to wait for a change relative to the database contents
*before* the transaction was committed. This commit makes that change.
It also changes ovsdb_idl_run()'s return type from bool to void because
its return type is hardly useful anymore.
Python binaries to write comment in db show-log while updating database
This is an improvement in python script ovs-xapi-sync to call add_comment()
method while updating database. This will help developer to debug binaries
and database from ovsdb-tool show-log command output and understand which
python binary is updating db.
Feature #10543 Signed-off-by: Arun Sharma <arun.sharma@calsoftinc.com> Signed-off-by: Ben Pfaff <blp@nicira.com>