Ben Pfaff [Mon, 16 Jul 2012 17:23:58 +0000 (10:23 -0700)]
ovs-ofctl: Fix use-after-free error.
Commit 4ce9c31573 (ovs-ofctl: Factor code out of read_flows_from_switch().)
introduced a use-after-free error, fixed by this change.
Also adds a unit test for "ovs-ofctl diff-flows" that would have found the
problem. (The bug report cited "diff-flows" but this bug was present in
dump-flows as well because they share common code.)
Bug #12461. Reported-by: James Schmidt <jschmidt@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Thu, 12 Jul 2012 23:32:56 +0000 (16:32 -0700)]
ovs-ofctl: Avoid use-after-free upon "ofctl/unblock" when connection dies.
The implementation of "ofctl/block" used a nested poll loop, with an inner
call to unixctl_server_run(). This poll loop always ran inside an outer
call to unixctl_server_run(), since that's the context within which unixctl
command implementations run. That means that, if a unixctl connection got
closed within the inner poll loop, and the outer poll loop happened to be
processing the same unixctl connection, then the outer poll loop would
dereference data in the freed connection.
The simplest solution is to avoid a nested poll loop, so that's what this
commit does.
This didn't cause a failure in the unit tests on i386 (which is why I
didn't catch it before pushing) but it did, reliably, on x86-64, and it
showed up in valgrind everywhere.
Recently released CentOS 6.3 (and probably also RHEL 6.3, I assume)
backported skb_frag_page() and others to their 2.6.32-based kernel,
which caused build failure of Open vSwitch kernel modules.
Ben Pfaff [Thu, 12 Jul 2012 21:18:05 +0000 (14:18 -0700)]
ofproto: New feature to notify controllers of flow table changes.
OpenFlow switching monitoring and controller coordination can be made more
efficient if the switch can notify a controller of flow table changes as
they occur, rather than periodically polling for changes. This commit
implements such a feature.
Feature #6633. CC: Natasha Gude <natasha@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Tue, 3 Jul 2012 21:00:38 +0000 (14:00 -0700)]
ofproto: Add extra comments and checking for expiring a pending rule.
A given rule may only have one pending operation at a time, so when an
operation is pending we must not allow a flow expiration to be started on
that rule.
This doesn't fix a user-visible bug in ofproto-dpif because ofproto-dpif
always completes operations immediately, that is, no operations will be
pending when expiration runs. (Technically there is a bug if the user
runs "ovs-appctl ofproto/clog", but that feature is for debugging only and
there is no reason for a user to ever run it.)
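The rule described above ("one pending operation at a time") can be sketched as a simple guard. This is a minimal illustration, not the ofproto code: the struct and function names (pend_rule, pend_rule_may_expire) are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for ofproto's rule and operation
 * types.  Only the "pending" pointer matters for this sketch. */
struct pend_op;
struct pend_rule {
    struct pend_op *pending;    /* Non-NULL while an operation is in progress. */
};

/* Returns true if a flow expiration may be started on 'rule', that is, if
 * no other operation is currently pending on it. */
static bool
pend_rule_may_expire(const struct pend_rule *rule)
{
    return rule->pending == NULL;
}
```

A caller that walks the flow table for expiration would skip any rule for which this check returns false and revisit it on a later pass.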
Ben Pfaff [Thu, 12 Jul 2012 17:17:10 +0000 (10:17 -0700)]
ofproto: Represent flow cookie changes as operations too.
An upcoming commit will add support for monitoring changes to the flow
table. This feature wants to be able to report changes to flow cookies,
as well as to other properties of a flow. Until now, however, a flow_mod
that modifies only the flow's cookie is treated as a special case that does
not go through the ofoperation mechanism. That makes it harder to report
flow cookie-only changes (it would require an additional special case in
the reporting mechanism) so this commit changes cookie-only changes to
go through ofoperations.
The bulk of this change is to change the meaning of ofoperation's 'ofpacts'
member so that a NULL value indicates that the flow's actions are not
changing. Otherwise a flow-cookie only change would still require copying
and then freeing all the actions, which seems like a waste.
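A minimal sketch of the "NULL means unchanged" convention follows. The types and the function acts_apply() are hypothetical simplifications; the real ofoperation carries more state than shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified rule: a cookie plus an opaque action buffer. */
struct acts_rule {
    uint64_t cookie;
    void *actions;
    size_t actions_len;
};

/* Hypothetical operation.  A NULL 'ofpacts' indicates that the flow's
 * actions are not changing, so nothing is copied or freed. */
struct acts_op {
    uint64_t new_cookie;
    const void *ofpacts;
    size_t ofpacts_len;
};

/* Applies 'op' to 'rule'.  Returns 0 on success, -1 if out of memory. */
static int
acts_apply(struct acts_rule *rule, const struct acts_op *op)
{
    if (op->ofpacts) {
        /* Actions are changing: replace the rule's copy. */
        void *copy = malloc(op->ofpacts_len ? op->ofpacts_len : 1);
        if (!copy) {
            return -1;
        }
        memcpy(copy, op->ofpacts, op->ofpacts_len);
        free(rule->actions);
        rule->actions = copy;
        rule->actions_len = op->ofpacts_len;
    }
    /* The cookie is updated whether or not the actions changed. */
    rule->cookie = op->new_cookie;
    return 0;
}
```

With this convention a cookie-only change touches a single 64-bit field and performs no allocation.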
Ben Pfaff [Fri, 6 Jul 2012 17:36:00 +0000 (10:36 -0700)]
ofproto: Revert change in flow cookie when an ofoperation fails.
The flow_cookie member of struct ofoperation has always been there, but it
seems that it's never been used. This fixes the code so that if a modify
operation fails the rule's original flow cookie is restored.
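The save-and-restore behavior can be sketched as below. Only the flow_cookie member name comes from the commit message; the struct layouts and the functions ck_op_begin() and ck_op_complete() are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical, simplified rule and operation types. */
struct ck_rule {
    uint64_t flow_cookie;
};

struct ck_op {
    struct ck_rule *rule;
    uint64_t flow_cookie;       /* Rule's original cookie, kept for rollback. */
};

/* Starts a modify operation: remembers the old cookie, installs the new one. */
static void
ck_op_begin(struct ck_op *op, struct ck_rule *rule, uint64_t new_cookie)
{
    op->rule = rule;
    op->flow_cookie = rule->flow_cookie;
    rule->flow_cookie = new_cookie;
}

/* Completes the operation.  A nonzero 'error' restores the original cookie. */
static void
ck_op_complete(struct ck_op *op, int error)
{
    if (error) {
        op->rule->flow_cookie = op->flow_cookie;
    }
}
```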
Ben Pfaff [Sat, 30 Jun 2012 05:33:56 +0000 (22:33 -0700)]
ofproto: Finalize all ofoperations in a given ofgroup at the same time.
An upcoming commit will add support for flow table monitoring by
controllers. One feature of this upcoming support is that a controller's
own changes to the flow table can be abbreviated to a summary, since the
controller presumably knows what it has already sent to the switch.
However, the summary only makes sense if a set of flow table changes
completely succeeds or completely fails. If it partially fails, the
switch must not attempt to summarize it, because the controller needs
to know the details. Given that, we have to wait for all of the
operations in an ofgroup to either succeed or fail before the switch
can send its flow table update report to the controllers. This
commit makes that change.
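The all-or-nothing condition can be sketched with a few counters. This is an illustrative model only; the struct grp_state and both functions are hypothetical and do not mirror ofproto's ofgroup.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for one group of flow table operations. */
struct grp_state {
    int n_ops;          /* Operations in the group. */
    int n_done;         /* Operations that have completed so far. */
    int n_failed;       /* Completed operations that failed. */
};

/* Records the completion of one operation.  Returns true once every
 * operation in the group has completed, which is the earliest point at
 * which the group may be finalized and reported. */
static bool
grp_op_done(struct grp_state *g, bool failed)
{
    g->n_done++;
    if (failed) {
        g->n_failed++;
    }
    return g->n_done == g->n_ops;
}

/* Returns true if the finished group may be abbreviated to a summary:
 * every operation succeeded or every operation failed. */
static bool
grp_can_summarize(const struct grp_state *g)
{
    return g->n_done == g->n_ops
           && (g->n_failed == 0 || g->n_failed == g->n_ops);
}
```

A partially failed group completes like any other, but grp_can_summarize() returns false for it, so the full details would be sent instead.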
Ben Pfaff [Thu, 12 Jul 2012 20:32:47 +0000 (13:32 -0700)]
ovs-ofctl: Add --sort and --rsort options for "dump-flows" command.
Feature #8754. Signed-off-by: Arun Sharma <arun.sharma@calsoftinc.com>
[blp@nicira.com rewrote most of the code] Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Tue, 3 Jul 2012 17:25:35 +0000 (10:25 -0700)]
ovs-ofctl: Use the prepared connection to dump flows in do_dump_flows__().
The logic in do_dump_flows__() went to some trouble to open an OpenFlow
connection and set the correct protocol, but then it allowed
dump_stats_transaction() to create and use a completely different OpenFlow
connection that hadn't been prepared that way. This commit fixes the
problem.
I don't think that there is a real bug here because currently the set of
protocols doesn't influence flow stats replies. But that could change in
the future.
Syscall param socketcall.sendmsg(msg.msg_iov[i]) points to uninitialised
byte(s)
at 0x42D3021: sendmsg (in /lib/libc-2.5.so)
by 0x80E4D23: nl_sock_transact (netlink-socket.c:670)
by 0x80D9086: dpif_linux_execute__ (dpif-linux.c:872)
by 0x807D6AE: dpif_execute__ (dpif.c:957)
by 0x807D6FE: dpif_execute (dpif.c:987)
by 0x805DED9: send_packet (ofproto-dpif.c:4727)
by 0x805F8E1: port_run_fast (ofproto-dpif.c:2441)
by 0x8065CF6: run_fast (ofproto-dpif.c:926)
by 0x805674F: ofproto_run_fast (ofproto.c:1148)
by 0x804C957: bridge_run_fast (bridge.c:1980)
by 0x8053F49: main (ovs-vswitchd.c:123)
Address 0xbea0895c is on thread 1's stack
Bug #11797. Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Tue, 10 Jul 2012 21:11:59 +0000 (14:11 -0700)]
datapath: Check gso_type for correct sk_buff in queue_gso_packets().
At the point where it was used, skb_shinfo(skb)->gso_type referred to a
post-GSO sk_buff. Thus, it would always be 0. We want to know the pre-GSO
gso_type, so we need to obtain it before segmenting.
Before this change, the kernel would pass inconsistent data to userspace:
packets for UDP fragments with nonzero offset would be passed along with
flow keys that indicate a zero offset (that is, the flow key for "later"
fragments claimed to be "first" fragments). This inconsistency tended
to confuse Open vSwitch userspace, causing it to log messages about
"failed to flow_del" the flows with "later" fragments.
Bug #12394. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Ethan Jackson [Mon, 9 Jul 2012 22:59:44 +0000 (15:59 -0700)]
cfm: Remove sequence fault reason.
Commit 2b540ecb (Added handling of previously ignored cfm faults.)
made the CFM code trigger a fault when a packet is received with an
out of order sequence number. This means that if even one CFM
probe is dropped, a fault will be triggered because the next
received probe's sequence would be two greater than the last. This
is in conflict with the 802.1ag requirement that 3.5 dropped probes
trigger a fault.
ipsec gre: Don't cache bad ovs-monitor-ipsec pid values.
Commit 2a586a5 (ipsec gre: Do not reread ovs monitor ipsec pidfile in
netdev vport so much) attempts to cache the pid of ovs-monitor-ipsec so
that it's not re-checked so often. Unfortunately, it also cached error
returns, so errors never recover. This commit continues to check for
the process's existence after an error.
Issue #12399
Reported-by: Paul Ingram <paul@nicira.com> Signed-off-by: Justin Pettit <jpettit@nicira.com>
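The caching fix can be sketched as "cache successes, never cache errors". The names below (pid_cache, pid_cache_get, demo_read_pid) are hypothetical, and the reader is a stub rather than real pidfile code.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical cache for the monitored process's pid. */
struct pid_cache {
    bool valid;
    pid_t pid;
};

/* Returns the cached pid, calling 'read_pid' only when nothing is cached.
 * A negative (error) result is returned to the caller but not cached, so
 * the next call tries again. */
static pid_t
pid_cache_get(struct pid_cache *cache, pid_t (*read_pid)(void))
{
    if (!cache->valid) {
        pid_t pid = read_pid();
        if (pid < 0) {
            return pid;
        }
        cache->pid = pid;
        cache->valid = true;
    }
    return cache->pid;
}

/* Stub reader for demonstration: fails on the first call, then succeeds. */
static int demo_reads;

static pid_t
demo_read_pid(void)
{
    return ++demo_reads == 1 ? -1 : 4242;
}
```

With the stub, the first lookup fails, the second rereads and succeeds, and later lookups are served from the cache without another read.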
Ethan Jackson [Thu, 28 Jun 2012 21:55:55 +0000 (14:55 -0700)]
ovs-lib: Support old versions of strace.
The ovs-lib strace wrapper requires the -D (run tracer process as a
detached grandchild, not as parent) option which does not exist in
older versions. This patch falls back to attaching to the running
process when the -D option does not exist.
Ben Pfaff [Tue, 26 Jun 2012 21:43:08 +0000 (14:43 -0700)]
ovs-vswitchd: Log datapath ID in a more user-friendly way.
The layering between ofproto and ovs-vswitchd caused the datapath ID to be
logged in a needlessly confusing way. First, ofproto would log its
default datapath ID:
Ethan Jackson [Mon, 25 Jun 2012 22:46:44 +0000 (15:46 -0700)]
bond: Sending learning packets on active-backup.
Suppose we have an active bond with two ports, eth1 and eth2,
attached to a standard L2 learning switch which does not know it's
participating in a bond (i.e. isn't running LACP). Suppose eth1 is
active and therefore the L2 learning switch is forwarding traffic
to eth1 as instructed by its learning table. Now suppose, for some
reason, OVS fails over from eth1 to eth2. For each destination
MAC, the L2 learning switch will continue sending traffic to eth1,
which will be dropped, until either traffic from that MAC appears
on eth2, or the learning table entries expire.
To alleviate this issue, this patch sends learning packets on newly
active interfaces in active-backup bonds in order to educate the
upstream network of the change.
Requested-by: Frido Roose <fr.roose@gmail.com> Signed-off-by: Ethan Jackson <ethan@nicira.com>
Ethan Jackson [Mon, 25 Jun 2012 22:48:10 +0000 (15:48 -0700)]
bond: Don't send learning packets on STABLE bonds.
Stable bonds require upstream switch support to avoid confusing
learning tables. Therefore, sending learning packets on these
bonds doesn't make a lot of sense.
Ben Pfaff [Thu, 5 Jul 2012 15:41:03 +0000 (08:41 -0700)]
ovs-brcompatd: Fix sending replies to kernel requests.
Commit 7d7447 (netlink: Postpone choosing sequence numbers until send
time.) broke ovs-brcompatd because it prevented userspace replies to
kernel requests from using the correct sequence numbers. This commit fixes
it.
Atzm Watanabe found the root cause and provided an alternative patch to
avoid the problem.
Ben Pfaff [Wed, 4 Jul 2012 05:17:14 +0000 (22:17 -0700)]
Introduce ofpacts, an abstraction of OpenFlow actions.
OpenFlow actions have always been somewhat awkward to handle.
Moreover, over time we've started creating actions that require more
complicated parsing. When we maintain those actions internally in
their wire format, we end up parsing them multiple times, whenever
we have to look at the set of actions.
When we add support for OpenFlow 1.1 or later protocols, the situation
will get worse, because these newer protocols support many of the same
actions but with different representations. It becomes unrealistic to
handle each protocol in its wire format.
This commit adopts a new strategy, by converting OpenFlow actions into
an internal form from the wire format when they are read, and converting
them back to the wire format when flows are dumped. I believe that this
will be more maintainable over time.
Thanks to Simon Horman and Pravin Shelar for reviews.
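The decode-once strategy can be sketched for a single action. The example assumes the OpenFlow 1.0 "output" action wire layout (16-bit type 0, 16-bit length 8, 16-bit port, 16-bit max_len, all big-endian). The internal struct and the function names are hypothetical and much simpler than the real ofpacts code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical internal form of an output action, in host byte order. */
struct sk_output {
    uint16_t port;
    uint16_t max_len;
};

static uint16_t
sk_get_be16(const uint8_t *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

static void
sk_put_be16(uint8_t *p, uint16_t value)
{
    p[0] = (uint8_t) (value >> 8);
    p[1] = (uint8_t) (value & 0xff);
}

/* Converts an 8-byte wire-format output action into internal form.
 * Returns 0 on success, -1 if the type or length field is unexpected. */
static int
sk_decode_output(const uint8_t wire[8], struct sk_output *out)
{
    if (sk_get_be16(wire) != 0 || sk_get_be16(wire + 2) != 8) {
        return -1;
    }
    out->port = sk_get_be16(wire + 4);
    out->max_len = sk_get_be16(wire + 6);
    return 0;
}

/* Converts the internal form back to wire format, e.g. for a flow dump. */
static void
sk_encode_output(const struct sk_output *in, uint8_t wire[8])
{
    sk_put_be16(wire, 0);       /* Action type: output. */
    sk_put_be16(wire + 2, 8);   /* Action length in bytes. */
    sk_put_be16(wire + 4, in->port);
    sk_put_be16(wire + 6, in->max_len);
}
```

Code that inspects the action between those two conversions reads the struct fields directly and never reparses the wire bytes.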
ipsec gre: Do not reread ovs monitor ipsec pidfile in netdev vport so much
Stop rereading the ovs-monitor-ipsec pidfile in netdev-vport so often. It's
probably only necessary to check once if ovs-monitor-ipsec is running,
and then cache the result. If the result is negative, then it may be
worthwhile to try again the next time someone tries to configure an ipsec
tunnel.
Signed-off-by: Arun Sharma <arun.sharma@calsoftinc.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ansis Atteka [Thu, 28 Jun 2012 22:52:40 +0000 (15:52 -0700)]
ovs-l3ping: A new test utility that allows to detect L3 tunneling issues
ovs-l3ping is similar to ovs-test, but the main difference
is that it does not require an administrator to open firewall
holes for the XML/RPC control connection. This is achieved
by encapsulating the control connection over the L3 tunnel
itself.
This tool is not intended as a replacement for ovs-test,
because ovs-test covers a much broader set of test cases.
Sample usage:
Node1: ovs-l3ping -s 192.168.122.236,10.1.1.1 -t gre
Node2: ovs-l3ping -c 192.168.122.220,10.1.1.2,10.1.1.1 -t gre
Ben Pfaff [Fri, 29 Jun 2012 16:22:59 +0000 (09:22 -0700)]
ovs-vswitchd: Call mlockall() from the daemon, not the parent or monitor.
mlockall(2) says:
Memory locks are not inherited by a child created via fork(2) and are
automatically removed (unlocked) during an execve(2) or when the
process terminates.
which means that --mlockall was ineffective in combination with --detach
or --monitor or both. Both are used in the most common production
configuration of Open vSwitch, so this means that --mlockall has never been
effective in production.
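The ordering issue can be sketched as follows: the lock is taken in the forked child that stands in for the daemon, not in the parent that forks it. The function name is hypothetical, and the child exits immediately where a real daemon would enter its main loop. mlockall() can fail without sufficient privileges or RLIMIT_MEMLOCK, so both outcomes are reported.

```c
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that stands in for the daemon process.  The child, not the
 * parent, calls mlockall(), because memory locks are not inherited across
 * fork().  Returns 0 if the child locked its memory, 1 if mlockall() failed
 * in the child, or -1 if fork() or waitpid() failed. */
static int
run_child_with_mlockall(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: lock current and future pages in this process. */
        int error = mlockall(MCL_CURRENT | MCL_FUTURE);
        /* A real daemon would continue into its main loop here. */
        _exit(error ? 1 : 0);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Calling mlockall() before the fork() instead would lock only the parent's memory, which is the ineffective arrangement described above.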
Ed Maste [Fri, 29 Jun 2012 21:11:24 +0000 (21:11 +0000)]
Route-table implementation for (Free)BSD
This is a trivial implementation of the route-table functionality for
FreeBSD, as needed by ofproto/ofproto-dpif-sflow.c. It has not yet
been extensively tested.
Signed-off-by: Ed Maste <emaste@freebsd.org> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Mon, 11 Jun 2012 18:23:06 +0000 (11:23 -0700)]
ofproto: Report nonexistent ports and queues as errors in queue stats.
Until now, Open vSwitch has ignored missing ports and queues in most cases
in queue stats requests, simply returning an empty set of statistics.
It seems that it is better to report an error, so this commit does this.
Reported-by: Prabina Pattnaik <Prabina.Pattnaik@nechclst.in> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ethan Jackson [Wed, 27 Jun 2012 20:41:17 +0000 (13:41 -0700)]
ovs-ctl: Add additional options to strace wrapper.
It's useful to know how long each system call took, and at what
time each system call happened. In addition this patch causes
strace to print strings more fully allowing log messages to be seen
in the output.
Ben Pfaff [Wed, 27 Jun 2012 16:56:20 +0000 (09:56 -0700)]
tests: Fix MockXenAPI to make the ovs-xapi-sync test case pass again.
Commit 1dc6839d2d (xenserver: Improve efficiency of code by using
get_all_records_where()) updated the ovs-xapi-sync script and caused a unit
test failure. This fixes it.
Rob Hoes [Wed, 27 Jun 2012 15:14:21 +0000 (16:14 +0100)]
xenserver: Improve efficiency of code by using get_all_records_where()
Replace the get_record() calls for network references, which caused as many
slave-to-master calls as there are Network records plus one.
The get_all_records_where() call gets exactly what is needed with a single
call.
Signed-off-by: Rob Hoes <rob.hoes@citrix.com> Acked-by: Dominic Curran <dominic.curran@citrix.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Isaku Yamahata [Wed, 27 Jun 2012 14:23:25 +0000 (07:23 -0700)]
lib/meta-flow: introduce a macro, CASE_MFF_REGS, to catch "case MFF_REG<N>:"
Introduce a macro instead for
With this macro, the code is a bit reduced.
test: compile-tested and unit tests passed.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
[blp@nicira.com moved the macro declaration, moved trailing colon from
macro definition to invocation, adjusted style slightly] Signed-off-by: Ben Pfaff <blp@nicira.com>
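The idea of a macro that expands to a run of case labels, with the trailing colon supplied at the invocation, can be sketched as below. The enum, its member names, the register count, and the macro name CASE_SK_REGS are hypothetical; they are not the meta-flow definitions.

```c
/* Hypothetical field IDs standing in for the MFF_* enumeration. */
enum sk_field {
    SK_IN_PORT,
    SK_REG0,
    SK_REG1,
    SK_REG2,
    SK_REG3,
    SK_ETH_TYPE
};

/* Expands to the case labels for every register field.  The final colon is
 * deliberately left off so that the invocation reads "CASE_SK_REGS:". */
#define CASE_SK_REGS \
    case SK_REG0: case SK_REG1: case SK_REG2: case SK_REG3

/* Returns 1 if 'field' is one of the register fields, 0 otherwise. */
static int
sk_field_is_register(enum sk_field field)
{
    switch (field) {
    CASE_SK_REGS:
        return 1;

    case SK_IN_PORT:
    case SK_ETH_TYPE:
        return 0;
    }
    return 0;
}
```

Keeping the colon at the invocation makes the macro use look like an ordinary case label.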
Ben Pfaff [Tue, 26 Jun 2012 17:52:34 +0000 (10:52 -0700)]
meta-flow: Accept NXM and OXM field names, support NXM and OXM for output.
This commit makes actions that accept NXM header values also accept OXM
header values and accept OXM field names where previously only NXM field
names were accepted.
This makes it possible to add new OXM fields that don't have NXM header
values, e.g. the OXM "metadata" field.
Inspired by Joe Stringer's patch:
http://openvswitch.org/pipermail/dev/2012-June/018344.html
Reported-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Ben Pfaff <blp@nicira.com>
Mehak Mahajan [Tue, 26 Jun 2012 19:30:26 +0000 (12:30 -0700)]
Setting miss_send_len on receiving NXT_SET_ASYNC_CONFIG message.
For the service controllers to receive any asynchronous messages, the
miss_send_len must be set to a non-zero value (refer to DESIGN). On
receiving the NXT_SET_ASYNC_CONFIG message, the miss_send_len is set
to the default value unless it is set to a non-zero value earlier by
the OFPT_SET_CONFIG message.
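The rule above amounts to "apply the default only if nothing nonzero was configured earlier". In this sketch the connection struct and handler name are hypothetical, and the default of 128 bytes is an assumption taken from OpenFlow's OFP_DEFAULT_MISS_SEND_LEN rather than from this commit.

```c
/* Assumed default, mirroring OpenFlow's OFP_DEFAULT_MISS_SEND_LEN. */
#define SK_DEFAULT_MISS_SEND_LEN 128

/* Hypothetical per-connection state. */
struct sk_conn {
    int miss_send_len;          /* 0 means "not configured yet". */
};

/* Called when an NXT_SET_ASYNC_CONFIG message arrives.  Keeps a nonzero
 * value set earlier (for example by OFPT_SET_CONFIG) and otherwise installs
 * the default so that asynchronous messages can be delivered. */
static void
sk_handle_set_async_config(struct sk_conn *conn)
{
    if (conn->miss_send_len == 0) {
        conn->miss_send_len = SK_DEFAULT_MISS_SEND_LEN;
    }
}
```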
Ben Pfaff [Mon, 25 Jun 2012 16:48:44 +0000 (09:48 -0700)]
ofproto-dpif-governor: Improve performance when most flows get set up.
The "flow setup governor" was introduced to avoid the cost of setting up
short flows when there are many of them. It works very well for short
flows, in fact. However, when the bulk of flows are short, but still long
enough to be set up by the governor, we end up with the worst of both
worlds: OVS processes the first 5 packets of every flow "by hand" and then
it still has to set up a flow.
This commit refines the flow setup governor so that, when most of the flows
that go through it actually get set up, it in turn starts setting up most
flows at the first packet. When it does this, it continues to sample a
small fraction of the flows in the governor's usual manner, so that if the
behavior changes it can react to it.
This increases netperf TCP_CRR transactions per second by about 25% in my
test setup, without affecting "ovs-benchmark rate" performance.
(I found that to get relatively stable performance for TCP_CRR, regardless
of whether Open vSwitch or any kind of bridging was involved, I had to pin
the netperf processes on each side of the link to a single core. I found
that my NIC's interrupts were already pinned. Thanks to Luca Giraudo
<lgiraudo@nicira.com> for these hints.)
Bug #12080. Reported-by: Gurucharan Shetty <gshetty@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Wed, 20 Jun 2012 17:55:41 +0000 (10:55 -0700)]
dpif-linux: Zero 'stats' outputs of dpif_operate() ops on failure.
When DPIF_OP_FLOW_PUT or DPIF_OP_FLOW_DEL operations failed, they left
their 'stats' outputs uninitialized. For DPIF_OP_FLOW_DEL, this meant that
the caller would read indeterminate data:
Conditional jump or move depends on uninitialised value(s)
at 0x805C1EB: subfacet_reset_dp_stats (ofproto-dpif.c:4410)
by 0x80637D2: expire_batch (ofproto-dpif.c:3471)
by 0x8066114: run (ofproto-dpif.c:3513)
by 0x8059DF4: ofproto_run (ofproto.c:1035)
by 0x8052E17: bridge_run (bridge.c:2005)
by 0x8053F74: main (ovs-vswitchd.c:108)
It's unusual for a delete operation to fail. The most common reason is an
administrator running "ovs-dpctl del-flows".
The only user of DPIF_OP_FLOW_PUT did not request stats, so this doesn't
fix an actual bug for that case.
Bug #11797. Reported-by: James Schmidt <jschmidt@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
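The fix pattern, zeroing an output parameter on the failure path so that callers never read indeterminate data, can be sketched as below. The function simulates a delete operation; its name, the stats struct, and the sample counter values are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flow statistics returned by a delete operation. */
struct del_stats {
    uint64_t n_packets;
    uint64_t n_bytes;
};

/* Simulated flow delete.  'simulated_error' stands in for the result of the
 * real datapath call.  On failure, '*stats' (if requested) is zeroed
 * instead of being left uninitialized. */
static int
sk_flow_del(int simulated_error, struct del_stats *stats)
{
    if (simulated_error) {
        if (stats) {
            memset(stats, 0, sizeof *stats);
        }
        return simulated_error;
    }
    if (stats) {
        /* Placeholder values for a successful delete. */
        stats->n_packets = 10;
        stats->n_bytes = 640;
    }
    return 0;
}
```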
ovs-bugtool: Avoid running ethtool on non-physical devices.
A system can have hundreds of OVS internal devices. In such a
situation, ovs-bugtool can take a very long time to complete,
because multiple ethtool commands are run on each interface in
/sys/class/net. Once ovs-bugtool completes, most of the ethtool
command outputs are incomplete with "timeouts", because we only
give 30 seconds for CAP_NETWORK_STATUS.
With this patch, we only run ethtool on those interfaces
that have an associated "device" entry. All physical interfaces have
this entry in /sys/class/net/${interface_name}/. Virtual interfaces
can have this entry too, if they have an underlying virtual device.
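The "device" check can be sketched as a simple existence test. ovs-bugtool itself is not written in C, so this is only an illustration of the test in another language; the function name and the base-directory parameter are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if '<sysfs_net_dir>/<ifname>/device' exists, 0 otherwise.
 * 'sysfs_net_dir' would normally be "/sys/class/net". */
static int
sk_has_device_entry(const char *sysfs_net_dir, const char *ifname)
{
    char path[512];
    int n = snprintf(path, sizeof path, "%s/%s/device",
                     sysfs_net_dir, ifname);
    if (n < 0 || (size_t) n >= sizeof path) {
        return 0;               /* Path was truncated or formatting failed. */
    }
    return access(path, F_OK) == 0;
}
```

A caller would run the ethtool commands only for interfaces where this returns 1, which skips OVS internal devices that have no such entry.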
Ethan Jackson [Fri, 22 Jun 2012 00:57:30 +0000 (17:57 -0700)]
ofproto-dpif: Place high priority on sending CCMs.
It's very important to get CCMs out as quickly as possible to avoid
causing a fault when there is really no problem. This patch sends
CCMs as part of port_run_fast() in an attempt to move in this
direction.
Mehak Mahajan [Thu, 21 Jun 2012 19:22:42 +0000 (12:22 -0700)]
Reapplying the dscp changes: No need to restart DB/OVS on changing dscp value.
This patch reapplies the changes that were reverted with the commit 59efa47
(Revert DSCP update changes.). It also addresses the problem introduced by
the original commits, cd8fca2 (jsonrpc: Correctly setting the dscp value
before reconnect.) and b2e18d (No need to restart DB / OVS on changing
dscp value.), that caused numerous unit test failures on some systems (as
diagnosed by valgrind).
With this change there is no need to restart the DB or OVS when configuring a
different dscp value for the manager or controller connection, respectively. On
detecting a change in the dscp value on the socket, the previous socket is
closed, a new socket is created, and a connection is established with the new
configured dscp value.
Ben Pfaff [Thu, 21 Jun 2012 17:42:20 +0000 (10:42 -0700)]
odp-util: Include <config.h> first.
Otherwise _GNU_SOURCE doesn't get defined early enough and on some systems
LLONG_MIN is missing when odp-util.c tries to use it indirectly through
token-bucket.h.
Reported-by: Michael Hu <mhu@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>