Ben Pfaff [Tue, 26 Jun 2012 21:43:08 +0000 (14:43 -0700)]
ovs-vswitchd: Log datapath ID in a more user-friendly way.
The layering between ofproto and ovs-vswitchd caused the datapath ID to be
logged in a needlessly confusing way. First, ofproto would log its
default datapath ID:
Ethan Jackson [Mon, 25 Jun 2012 22:46:44 +0000 (15:46 -0700)]
bond: Sending learning packets on active-backup.
Suppose we have an active bond with two ports, eth1 and eth2,
attached to a standard L2 learning switch which does not know it's
participating in a bond (i.e. isn't running LACP). Suppose eth1 is
active and therefore the L2 learning switch is forwarding traffic
to eth1 as instructed by its learning table. Now suppose, for some
reason, OVS fails over from eth1 to eth2. For each destination
MAC, the L2 learning switch will continue sending traffic to eth1,
which will be dropped, until either traffic from that MAC appears
on eth2, or the learning table entries expire.
To alleviate this issue, this patch sends learning packets on newly
active interfaces in active-backup bonds in order to educate the
upstream network of the change.
Requested-by: Frido Roose <fr.roose@gmail.com> Signed-off-by: Ethan Jackson <ethan@nicira.com>
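A minimal sketch of what one learning packet might look like. It assumes the packet is a broadcast Ethernet frame whose source address is a MAC learned behind the bond, sent once per such MAC on the newly active interface, so that upstream switches re-learn that MAC there. The EtherType value and the name compose_learning_frame() are hypothetical; the commit message does not specify them.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN 6
#define LEARNING_FRAME_LEN 60   /* Minimum Ethernet frame size, excluding FCS. */

/* Hypothetical EtherType; the value used by the actual patch is not stated
 * in the commit message. */
#define LEARNING_ETH_TYPE 0x88b5

/* Fills 'frame' (at least LEARNING_FRAME_LEN bytes) with a broadcast frame
 * whose source address is 'mac'.  Sending one such frame per learned MAC on
 * the newly active interface lets an upstream L2 learning switch move that
 * MAC to the new port without waiting for real traffic or for its learning
 * table entry to expire. */
static size_t
compose_learning_frame(uint8_t *frame, const uint8_t mac[ETH_ADDR_LEN])
{
    memset(frame, 0, LEARNING_FRAME_LEN);              /* Zero padding. */
    memset(frame, 0xff, ETH_ADDR_LEN);                 /* Broadcast dst. */
    memcpy(frame + ETH_ADDR_LEN, mac, ETH_ADDR_LEN);   /* Learned src. */
    frame[12] = LEARNING_ETH_TYPE >> 8;                /* EtherType, big-endian. */
    frame[13] = LEARNING_ETH_TYPE & 0xff;
    return LEARNING_FRAME_LEN;
}
```

Broadcasting the frame is what makes every switch on the path, not just the directly attached one, update its table.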
Ethan Jackson [Mon, 25 Jun 2012 22:48:10 +0000 (15:48 -0700)]
bond: Don't send learning packets on STABLE bonds.
Stable bonds require upstream switch support to avoid confusing
learning tables. Therefore, sending learning packets on these
bonds doesn't make a lot of sense.
Ben Pfaff [Thu, 5 Jul 2012 15:41:03 +0000 (08:41 -0700)]
ovs-brcompatd: Fix sending replies to kernel requests.
Commit 7d7447 (netlink: Postpone choosing sequence numbers until send
time.) broke ovs-brcompatd because it prevented userspace replies to
kernel requests from using the correct sequence numbers. This commit fixes
it.
Atzm Watanabe found the root cause and provided an alternative patch to
avoid the problem.
Reported-by: André Ruß <andre.russ@hybris.com> Reported-by: Atzm Watanabe <atzm@stratosphere.co.jp> Tested-by: Atzm Watanabe <atzm@stratosphere.co.jp> Signed-off-by: Ben Pfaff <blp@nicira.com>
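The invariant that broke is that a userspace reply to a kernel-initiated request must carry the request's sequence number rather than one chosen at send time. A minimal sketch of that rule, using a hypothetical header struct that mirrors the field layout of Linux's struct nlmsghdr (redefined so it builds anywhere); the helper names are illustrative, not OVS's netlink API.

```c
#include <stdint.h>

/* Same fields, in the same order, as struct nlmsghdr in <linux/netlink.h>. */
struct nl_hdr {
    uint32_t len;
    uint16_t type;
    uint16_t flags;
    uint32_t seq;
    uint32_t pid;
};

/* Hypothetical counter for messages that userspace originates. */
static uint32_t next_seq = 1;

/* Header for a message userspace originates: a fresh sequence number is
 * fine here, because userspace will match the kernel's reply against it. */
static void
init_request_hdr(struct nl_hdr *h, uint16_t type)
{
    h->len = sizeof *h;
    h->type = type;
    h->flags = 0;
    h->seq = next_seq++;
    h->pid = 0;
}

/* Header for a reply to 'request': the sequence number is copied from the
 * request and must not be replaced later, or the kernel cannot pair the
 * reply with its request. */
static void
init_reply_hdr(struct nl_hdr *reply, const struct nl_hdr *request,
               uint16_t type)
{
    reply->len = sizeof *reply;
    reply->type = type;
    reply->flags = 0;
    reply->seq = request->seq;
    reply->pid = 0;             /* Sender port ID left to the transport here. */
}
```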
Ben Pfaff [Wed, 4 Jul 2012 05:17:14 +0000 (22:17 -0700)]
Introduce ofpacts, an abstraction of OpenFlow actions.
OpenFlow actions have always been somewhat awkward to handle.
Moreover, over time we've started creating actions that require more
complicated parsing. When we maintain those actions internally in
their wire format, we end up parsing them multiple times, whenever
we have to look at the set of actions.
When we add support for OpenFlow 1.1 or later protocols, the situation
will get worse, because these newer protocols support many of the same
actions but with different representations. It becomes unrealistic to
handle each protocol in its wire format.
This commit adopts a new strategy, by converting OpenFlow actions into
an internal form from the wire format when they are read, and converting
them back to the wire format when flows are dumped. I believe that this
will be more maintainable over time.
Thanks to Simon Horman and Pravin Shelar for reviews.
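The "decode once into an internal form" strategy can be sketched as follows. The wire layout is simplified and only loosely modeled on OpenFlow 1.0's 8-byte output and set-VLAN-VID actions; the type and function names (struct act, decode_actions()) are hypothetical and are not the real ofpacts API.

```c
#include <stddef.h>
#include <stdint.h>

/* Internal, protocol-independent representation of an action. */
enum act_type { ACT_OUTPUT, ACT_SET_VLAN_VID };

struct act {
    enum act_type type;
    union {
        uint32_t port;      /* ACT_OUTPUT: wide enough for any protocol. */
        uint16_t vlan_vid;  /* ACT_SET_VLAN_VID. */
    } u;
};

static uint16_t
get_be16(const uint8_t *p)
{
    return (uint16_t) (p[0] << 8 | p[1]);
}

/* Simplified wire format: 16-bit type, 16-bit length (always 8 here),
 * 16-bit argument, 2 more bytes (ignored).  Parses 'n' bytes of 'wire'
 * into at most 'max' entries of 'acts'.  Returns the number parsed, or -1
 * for a malformed or unknown action.  After this single pass, the rest of
 * the code inspects only 'acts', never the wire bytes. */
static int
decode_actions(const uint8_t *wire, size_t n, struct act *acts, size_t max)
{
    size_t count = 0;

    while (n > 0) {
        if (n < 8 || get_be16(wire + 2) != 8 || count >= max) {
            return -1;
        }
        switch (get_be16(wire)) {
        case 0:
            acts[count].type = ACT_OUTPUT;
            acts[count].u.port = get_be16(wire + 4);
            break;
        case 1:
            acts[count].type = ACT_SET_VLAN_VID;
            acts[count].u.vlan_vid = get_be16(wire + 4) & 0x0fff;
            break;
        default:
            return -1;
        }
        count++;
        wire += 8;
        n -= 8;
    }
    return (int) count;
}
```

Supporting a newer protocol would then mean adding another decoder (and encoder, for dumping flows) that produces the same struct act, rather than teaching every consumer a new wire format.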
ipsec gre: Do not reread ovs monitor ipsec pidfile in netdev vport so much
Instead of rereading the ovs-monitor-ipsec pidfile in netdev-vport so often,
it's probably only necessary to check once whether ovs-monitor-ipsec is
running and then cache the result. If the result is negative, then it may be
worthwhile to try again the next time someone tries to configure an IPsec
tunnel.
Signed-off-by: Arun Sharma <arun.sharma@calsoftinc.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
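The caching policy described above (remember a positive answer, retry after a negative one) can be sketched like this. Both function names are hypothetical, and reading a PID from the pidfile and probing it with kill(pid, 0) is an assumption about how "is it running" might be checked, not the patch's code.

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

/* Reads a PID from 'pidfile' and checks whether that process exists. */
static bool
pidfile_process_alive(const char *pidfile)
{
    FILE *f = fopen(pidfile, "r");
    long pid = 0;
    bool alive = false;

    if (f) {
        if (fscanf(f, "%ld", &pid) == 1 && pid > 0) {
            /* Signal 0 only tests for existence; EPERM still means the
             * process exists but belongs to another user. */
            alive = kill((pid_t) pid, 0) == 0 || errno == EPERM;
        }
        fclose(f);
    }
    return alive;
}

/* Positive results are cached for the life of the process. */
static bool monitor_running_cached = false;

/* Returns true if ovs-monitor-ipsec is believed to be running.  Only a
 * negative result triggers another read of the pidfile, e.g. the next time
 * someone configures an IPsec tunnel. */
static bool
ipsec_monitor_running(const char *pidfile)
{
    if (!monitor_running_cached) {
        monitor_running_cached = pidfile_process_alive(pidfile);
    }
    return monitor_running_cached;
}
```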
Ansis Atteka [Thu, 28 Jun 2012 22:52:40 +0000 (15:52 -0700)]
ovs-l3ping: A new test utility for detecting L3 tunneling issues
ovs-l3ping is similar to ovs-test, but the main difference
is that it does not require the administrator to open firewall
holes for the XML/RPC control connection. This is achieved
by encapsulating the Control Connection over the L3 tunnel
itself.
This tool is not intended as a replacement for ovs-test,
because ovs-test covers a much broader set of test cases.
Sample usage:
Node1: ovs-l3ping -s 192.168.122.236,10.1.1.1 -t gre
Node2: ovs-l3ping -c 192.168.122.220,10.1.1.2,10.1.1.1 -t gre
Ben Pfaff [Fri, 29 Jun 2012 16:22:59 +0000 (09:22 -0700)]
ovs-vswitchd: Call mlockall() from the daemon, not the parent or monitor.
mlockall(2) says:
Memory locks are not inherited by a child created via fork(2) and are
automatically removed (unlocked) during an execve(2) or when the
process terminates.
which means that --mlockall was ineffective in combination with --detach
or --monitor or both. Both are used in the most common production
configuration of Open vSwitch, so this means that --mlockall has never been
effective in production.
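Because memory locks are not inherited across fork(2), mlockall() has to be called in the process that keeps running, after any --detach or --monitor forking. A minimal sketch of that ordering; daemonize_stub() and daemon_startup() are hypothetical stand-ins, not the real ovs-vswitchd startup code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical stand-in for --detach/--monitor handling.  Whatever forking
 * it does, only the final daemon process returns from it. */
static void
daemonize_stub(void)
{
}

/* Fork first, lock memory second.  Calling mlockall() before daemonizing
 * would lock only the parent's (or monitor's) pages, and the daemon that
 * actually forwards traffic would not inherit the locks. */
static int
daemon_startup(bool want_mlockall)
{
    daemonize_stub();
    if (want_mlockall && mlockall(MCL_CURRENT | MCL_FUTURE)) {
        fprintf(stderr, "mlockall failed: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

mlockall() can fail without sufficient privileges or RLIMIT_MEMLOCK, so the sketch reports the error rather than assuming success.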
Ed Maste [Fri, 29 Jun 2012 21:11:24 +0000 (21:11 +0000)]
Route-table implementation for (Free)BSD
This is a trivial implementation of the route-table functionality for
FreeBSD, as needed by ofproto/ofproto-dpif-sflow.c. It has not yet
been extensively tested.
Signed-off-by: Ed Maste <emaste@freebsd.org> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Mon, 11 Jun 2012 18:23:06 +0000 (11:23 -0700)]
ofproto: Report nonexistent ports and queues as errors in queue stats.
Until now, Open vSwitch has ignored missing ports and queues in most cases
in queue stats requests, simply returning an empty set of statistics.
It seems that it is better to report an error, so this commit does this.
Reported-by: Prabina Pattnaik <Prabina.Pattnaik@nechclst.in> Signed-off-by: Ben Pfaff <blp@nicira.com>
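The new behavior amounts to validating the port and queue before gathering statistics. A minimal sketch over a hypothetical switch model; the result codes stand in for OpenFlow's OFPQOFC_BAD_PORT and OFPQOFC_BAD_QUEUE errors, and the sketch ignores the "all ports"/"all queues" wildcards that a real request may use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes.  A real switch would send an OpenFlow
 * "queue operation failed" error with code OFPQOFC_BAD_PORT or
 * OFPQOFC_BAD_QUEUE. */
enum qs_result { QS_OK, QS_BAD_PORT, QS_BAD_QUEUE };

/* Simplified switch model: port p exists if p < n_ports, and has
 * n_queues[p] queues numbered from 0. */
struct sw_model {
    const unsigned int *n_queues;
    size_t n_ports;
};

/* Validates a queue stats request for one port and one queue.  Instead of
 * silently answering with an empty set of statistics, report which part of
 * the request does not exist. */
static enum qs_result
check_queue_stats_request(const struct sw_model *sw, uint32_t port,
                          uint32_t queue)
{
    if (port >= sw->n_ports) {
        return QS_BAD_PORT;
    }
    if (queue >= sw->n_queues[port]) {
        return QS_BAD_QUEUE;
    }
    return QS_OK;
}
```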
Ethan Jackson [Wed, 27 Jun 2012 20:41:17 +0000 (13:41 -0700)]
ovs-ctl: Add additional options to strace wrapper.
It's useful to know how long each system call took, and at what
time each system call happened. In addition, this patch causes
strace to print strings more fully, allowing log messages to be seen
in the output.
Ben Pfaff [Wed, 27 Jun 2012 16:56:20 +0000 (09:56 -0700)]
tests: Fix MockXenAPI to make the ovs-xapi-sync test case pass again.
Commit 1dc6839d2d (xenserver: Improve efficiency of code by using
get_all_records_where()) updated the ovs-xapi-sync script and caused a unit
test failure. This fixes it.
Rob Hoes [Wed, 27 Jun 2012 15:14:21 +0000 (16:14 +0100)]
xenserver: Improve efficiency of code by using get_all_records_where()
Replace the get_record() calls for network references, which caused as many
slave-to-master calls as there are Network records, plus one.
The get_all_records_where() call gets exactly what is needed with a single
call.
Signed-off-by: Rob Hoes <rob.hoes@citrix.com> Acked-by: Dominic Curran <dominic.curran@citrix.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Isaku Yamahata [Wed, 27 Jun 2012 14:23:25 +0000 (07:23 -0700)]
lib/meta-flow: introduce a macro, CASE_MFF_REGS, to catch "case MFF_REG<N>:"
Introduce a macro to use in place of the explicit run of
"case MFF_REG<N>:" labels. With this macro, the code is a bit shorter.
test: compile-tested and unit tests passed.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
[blp@nicira.com moved the macro declaration, moved trailing colon from
macro definition to invocation, adjusted style slightly] Signed-off-by: Ben Pfaff <blp@nicira.com>
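A sketch of the macro technique, including the detail that the trailing colon belongs to the invocation. The enum is a simplified stand-in with only four registers; the real meta-flow field list and register count differ.

```c
/* Simplified stand-in for the meta-flow field enumeration. */
enum mf_field_id {
    MFF_IN_PORT,
    MFF_REG0, MFF_REG1, MFF_REG2, MFF_REG3,
    MFF_ETH_SRC,
};

/* Expands to one case label per register.  The final colon is left to the
 * invocation ("CASE_MFF_REGS:") so that it reads like an ordinary case
 * label at the point of use. */
#define CASE_MFF_REGS \
    case MFF_REG0: case MFF_REG1: case MFF_REG2: case MFF_REG3

static int
mf_is_register(enum mf_field_id id)
{
    switch (id) {
    CASE_MFF_REGS:
        return 1;
    default:
        return 0;
    }
}
```

Adding a register then means editing only the macro, not every switch statement that handles registers.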
Ben Pfaff [Tue, 26 Jun 2012 17:52:34 +0000 (10:52 -0700)]
meta-flow: Accept NXM and OXM field names, support NXM and OXM for output.
This commit makes actions that accept NXM header values also accept OXM
header values and accept OXM field names where previously only NXM field
names were accepted.
This makes it possible to add new OXM fields that don't have NXM header
values, e.g. the OXM "metadata" field.
Inspired by Joe Stringer's patch:
http://openvswitch.org/pipermail/dev/2012-June/018344.html
Reported-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Ben Pfaff <blp@nicira.com>
Mehak Mahajan [Tue, 26 Jun 2012 19:30:26 +0000 (12:30 -0700)]
Setting miss_send_len on receiving NXT_SET_ASYNC_CONFIG message.
For the service controllers to receive any asynchronous messages, the
miss_send_len must be set to a non-zero value (refer to DESIGN). On
receiving the NXT_SET_ASYNC_CONFIG message, the miss_send_len is set
to the default value unless it is set to a non-zero value earlier by
the OFPT_SET_CONFIG message.
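The rule above can be sketched as follows. OFP_DEFAULT_MISS_SEND_LEN (128) is the default from the OpenFlow specification; struct ofconn_state and handle_set_async_config() are hypothetical names for illustration.

```c
#include <stdint.h>

/* Default from the OpenFlow specification. */
#define OFP_DEFAULT_MISS_SEND_LEN 128

/* Hypothetical per-connection state. */
struct ofconn_state {
    uint16_t miss_send_len;   /* 0: asynchronous packet-ins carry no data. */
};

/* On NXT_SET_ASYNC_CONFIG, make sure miss_send_len is nonzero, but keep a
 * nonzero value that an earlier OFPT_SET_CONFIG already chose. */
static void
handle_set_async_config(struct ofconn_state *c)
{
    if (c->miss_send_len == 0) {
        c->miss_send_len = OFP_DEFAULT_MISS_SEND_LEN;
    }
    /* ...the asynchronous message masks would be applied here... */
}
```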
Ben Pfaff [Mon, 25 Jun 2012 16:48:44 +0000 (09:48 -0700)]
ofproto-dpif-governor: Improve performance when most flows get set up.
The "flow setup governor" was introduced to avoid the cost of setting up
short flows when there are many of them. It works very well for short
flows, in fact. However, when the bulk of flows are short, but still long
enough to be set up by the governor, we end up with the worst of both
worlds: OVS processes the first 5 packets of every flow "by hand" and then
it still has to set up a flow.
This commit refines the flow setup governor so that, when most of the flows
that go through it actually get set up, it in turn starts setting up most
flows at the first packet. When it does this, it continues to sample a
small fraction of the flows in the governor's usual manner, so that if the
behavior changes it can react to it.
This increases netperf TCP_CRR transactions per second by about 25% in my
test setup, without affecting "ovs-benchmark rate" performance.
(I found that to get relatively stable performance for TCP_CRR, regardless
of whether Open vSwitch or any kind of bridging was involved, I had to pin
the netperf processes on each side of the link to a single core. I found
that my NIC's interrupts were already pinned. Thanks to Luca Giraudo
<lgiraudo@nicira.com> for these hints.)
Bug #12080. Reported-by: Gurucharan Shetty <gshetty@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
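A minimal sketch of the refinement, under stated assumptions: the governor tracks what fraction of the flows it handled in its usual way were eventually set up, and once that fraction is high it sets up flows at the first packet while still routing a small sample through the usual path. The struct, the 1-in-16 sampling rate, and the 75% threshold are all hypothetical, not the values in ofproto-dpif-governor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical governor statistics. */
struct gov {
    unsigned int n_sampled;   /* Flows handled in the usual way. */
    unsigned int n_setup;     /* Of those, how many were eventually set up. */
    uint32_t counter;         /* Drives deterministic 1-in-N sampling. */
};

#define GOV_SAMPLE_EVERY 16   /* Keep watching 1 flow in 16. */
#define GOV_SETUP_PERCENT 75  /* "Most flows get set up" threshold. */

/* Returns true if a new flow should be set up at its first packet.  When
 * most observed flows get set up anyway, skip the "by hand" phase for all
 * but a small sample, which keeps feeding gov_record() so that a change in
 * traffic behavior is noticed. */
static bool
gov_setup_immediately(struct gov *g)
{
    bool mostly_setup = g->n_sampled > 0
        && g->n_setup * 100 >= g->n_sampled * GOV_SETUP_PERCENT;

    if (!mostly_setup) {
        return false;                  /* Usual governor behavior. */
    }
    return ++g->counter % GOV_SAMPLE_EVERY != 0;
}

/* Records the outcome of a flow that went through the usual path. */
static void
gov_record(struct gov *g, bool was_setup)
{
    g->n_sampled++;
    g->n_setup += was_setup;
}
```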
Ben Pfaff [Wed, 20 Jun 2012 17:55:41 +0000 (10:55 -0700)]
dpif-linux: Zero 'stats' outputs of dpif_operate() ops on failure.
When DPIF_OP_FLOW_PUT or DPIF_OP_FLOW_DEL operations failed, they left
their 'stats' outputs uninitialized. For DPIF_OP_FLOW_DEL, this meant that
the caller would read indeterminate data:
Conditional jump or move depends on uninitialised value(s)
at 0x805C1EB: subfacet_reset_dp_stats (ofproto-dpif.c:4410)
by 0x80637D2: expire_batch (ofproto-dpif.c:3471)
by 0x8066114: run (ofproto-dpif.c:3513)
by 0x8059DF4: ofproto_run (ofproto.c:1035)
by 0x8052E17: bridge_run (bridge.c:2005)
by 0x8053F74: main (ovs-vswitchd.c:108)
It's unusual for a delete operation to fail. The most common reason is an
administrator running "ovs-dpctl del-flows".
The only user of DPIF_OP_FLOW_PUT did not request stats, so this doesn't
fix an actual bug for that case.
Bug #11797. Reported-by: James Schmidt <jschmidt@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
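The fix follows a common C pattern: an output parameter is always defined on return, even when the operation fails. A minimal sketch; struct flow_stats and flow_del_complete() are hypothetical simplifications, not the dpif-linux code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified per-flow statistics returned by a delete. */
struct flow_stats {
    uint64_t n_packets;
    uint64_t n_bytes;
    long long int used;
};

/* Completes a flow deletion whose result was 'error' (0 or a positive errno
 * value).  On success, '*stats' has already been filled from the datapath's
 * reply; on failure it is zeroed so that callers such as a stats-reset
 * routine never read indeterminate values.  'stats' may be NULL if the
 * caller did not request statistics. */
static int
flow_del_complete(int error, struct flow_stats *stats)
{
    if (error && stats) {
        memset(stats, 0, sizeof *stats);
    }
    return error;
}
```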
ovs-bugtool: Avoid running ethtool on non-physical devices.
There may be hundreds of OVS internal devices. In such a situation,
ovs-bugtool can take a very long time to complete, because multiple
ethtool commands are run on each interface in /sys/class/net. By the
time ovs-bugtool completes, most of the ethtool command outputs are
incomplete because of "timeouts", since we only allow 30 seconds
for CAP_NETWORK_STATUS.
With this patch, we only run ethtool on interfaces that have an
associated "device" entry. All physical interfaces have this entry
in /sys/class/net/${interface_name}/. Virtual interfaces can have
this entry too, if they have an underlying virtual device.
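ovs-bugtool itself is a script, so the following C sketch only illustrates the check described above: test whether /sys/class/net/<name>/device exists before spending time running ethtool on an interface. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Returns true if /sys/class/net/<name>/device exists, i.e. the interface
 * is backed by a physical or virtual device and is worth running ethtool
 * against.  OVS internal devices lack this entry and are skipped. */
static bool
netdev_has_device(const char *name)
{
    char path[256];
    int n = snprintf(path, sizeof path, "/sys/class/net/%s/device", name);

    if (n < 0 || (size_t) n >= sizeof path) {
        return false;           /* Too long to be a real interface name. */
    }
    return access(path, F_OK) == 0;
}
```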
Ethan Jackson [Fri, 22 Jun 2012 00:57:30 +0000 (17:57 -0700)]
ofproto-dpif: Place high priority on sending CCMs.
It's very important to get CCMs out as quickly as possible to avoid
causing a fault when there is really no problem. This patch sends
CCMs as part of port_run_fast() in an attempt to move in this
direction.
Mehak Mahajan [Thu, 21 Jun 2012 19:22:42 +0000 (12:22 -0700)]
Reapplying the dscp changes: No need to restart DB/OVS on changing dscp value.
This patch reapplies the changes that were reverted with the commit 59efa47
(Revert DSCP update changes.). It also addresses the problem introduced by
the original commits, cd8fca2 (jsonrpc: Correctly setting the dscp value
before reconnect.) and b2e18d (No need to restart DB / OVS on changing
dscp value.), that caused numerous unit test failures on some systems (as
diagnosed by valgrind).
With this change there is no need to restart the DB or OVS on configuring a
different value for the manager or controller connection respectively. On
detecting a change in the dscp value on the socket, the previous socket is
closed and a new socket is created and connection is established with the new
configured dscp value.
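A minimal sketch of "close and recreate the socket when the DSCP changes", assuming the DSCP is applied through the IP_TOS socket option (DSCP occupies the upper six bits of the TOS byte, hence the shift by 2). struct conn, open_with_dscp(), and conn_set_dscp() are hypothetical; a real client would also reconnect on the new socket.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical connection wrapper. */
struct conn {
    int fd;     /* -1 if not open. */
    int dscp;   /* DSCP value the socket was created with. */
};

/* Opens a TCP socket whose outgoing packets are marked with 'dscp'. */
static int
open_with_dscp(int dscp)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int tos = dscp << 2;

    if (fd >= 0 && setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof tos)) {
        close(fd);
        fd = -1;
    }
    return fd;
}

/* If the configured DSCP differs from the one in use, close the old socket
 * and create a new one, so no daemon restart is needed.  Returns 0 on
 * success, -1 if the new socket could not be created. */
static int
conn_set_dscp(struct conn *c, int dscp)
{
    if (c->fd >= 0 && c->dscp == dscp) {
        return 0;                       /* Nothing changed. */
    }
    if (c->fd >= 0) {
        close(c->fd);
    }
    c->fd = open_with_dscp(dscp);
    c->dscp = dscp;
    return c->fd >= 0 ? 0 : -1;
}
```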
Ben Pfaff [Thu, 21 Jun 2012 17:42:20 +0000 (10:42 -0700)]
odp-util: Include <config.h> first.
Otherwise _GNU_SOURCE doesn't get defined early enough and on some systems
LLONG_MIN is missing when odp-util.c tries to use it indirectly through
token-bucket.h.
Reported-by: Michael Hu <mhu@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Thu, 31 May 2012 00:05:34 +0000 (17:05 -0700)]
sat-math: Introduce macro version of SAT_MUL.
The macro version can be used in a constant expression, such as an
initializer for a variable with static lifetime. (Otherwise, it's better
to use the function.)
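A sketch of a saturating-multiply macro along these lines, for unsigned int operands; it is an illustration of the idea, not necessarily character-for-character what sat-math.h contains. Every subexpression is an integer constant expression when X and Y are, which is what allows the static initializer below.

```c
#include <limits.h>

/* Returns X * Y, or UINT_MAX if the product would overflow.  X and Y may
 * be evaluated more than once, one reason to prefer a function outside of
 * constant expressions. */
#define SAT_MUL(X, Y)                                                   \
    ((Y) == 0 ? 0                                                       \
     : (X) <= UINT_MAX / (Y) ? (unsigned int) (X) * (unsigned int) (Y)  \
     : UINT_MAX)

/* A static initializer, where calling a function would not be allowed.
 * (UINT_MAX / 2) * 3 overflows, so this saturates to UINT_MAX. */
static const unsigned int big_limit = SAT_MUL(UINT_MAX / 2, 3);
```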
Ben Pfaff [Wed, 30 May 2012 21:33:08 +0000 (14:33 -0700)]
pinsched: Completely fill the token bucket at initialization.
This code, which dates to August 2008, initially sets the packet-in
scheduler token buckets to 10% full, without any rationale. I suspect
that this is just a typo for 100% full, which I think would be more
conventional, so this commit switches to that.
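The effect of the change can be seen in a minimal token bucket: starting full means a burst of up to 'burst' packet-ins can be sent immediately after initialization. The struct and function names are hypothetical, not the pinsched code.

```c
/* Hypothetical token bucket: tokens are refilled at 'rate' per second
 * (refill logic omitted) up to a maximum of 'burst'. */
struct token_bucket {
    unsigned int rate;
    unsigned int burst;
    unsigned int tokens;
};

/* Starts the bucket completely full.  The old code started it at about
 * 10% of 'burst'. */
static void
token_bucket_init(struct token_bucket *tb, unsigned int rate,
                  unsigned int burst)
{
    tb->rate = rate;
    tb->burst = burst;
    tb->tokens = burst;
}

/* Takes one token if available; returns 1 if the packet may be sent. */
static int
token_bucket_withdraw(struct token_bucket *tb)
{
    if (tb->tokens > 0) {
        tb->tokens--;
        return 1;
    }
    return 0;
}
```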
Isaku Yamahata [Thu, 21 Jun 2012 02:25:48 +0000 (11:25 +0900)]
build: automake complains IntegrationGuide is missing
Changeset 502c471406b32e5afcdea62fa8307f9856d05437 added IntegrationGuide,
but did not add it to EXTRA_DIST, so automake complains.
This patch adds the file to EXTRA_DIST.
> make[3]: Leaving directory `/openvswitch/build/datapath'
> The distribution is missing the following files:
> IntegrationGuide
> make[2]: *** [dist-hook-git] Error 1
> make[2]: *** Waiting for unfinished jobs....
> make[2]: Leaving directory `/openvswitch/build'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/openvswitch/build'
> make: *** [all] Error 2
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ethan Jackson [Tue, 19 Jun 2012 20:24:43 +0000 (13:24 -0700)]
cfm: Warn when sending CCMs is delayed.
We've recently seen problems where OVS can get delayed sending CCM
probes by several seconds. This can cause tunnels to flap, and
generally wreak havoc. It's easy to detect when this is happening,
so, at a minimum, a warning should help those debugging such
problems.
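Detecting the delay only requires comparing the time since the last CCM with the configured transmission interval. A minimal sketch; the function name and the 1.5-interval threshold are hypothetical choices, not taken from the cfm code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Called just before sending a CCM.  If more than 1.5 transmission
 * intervals have passed since the previous CCM, something delayed us, so
 * log a warning.  Returns true if a warning was issued. */
static bool
ccm_check_delay(long long int now_ms, long long int last_sent_ms,
                long long int interval_ms)
{
    long long int elapsed = now_ms - last_sent_ms;

    if (elapsed * 2 > interval_ms * 3) {
        fprintf(stderr, "CCM sent %lld ms late (interval %lld ms)\n",
                elapsed - interval_ms, interval_ms);
        return true;
    }
    return false;
}
```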
Ben Pfaff [Wed, 20 Jun 2012 22:13:38 +0000 (15:13 -0700)]
docs: Add references to the database schema documentation.
I field lots of questions about "where's the documentation?" Perhaps this
will help.
The changes to ovs-vsctl(8) add a couple of references to
ovs-vswitchd.conf.db(5) but they also rephrase a couple of paragraphs in
a style that seems to me easier to understand.
Justin Pettit [Tue, 19 Jun 2012 23:44:54 +0000 (16:44 -0700)]
FAQ: Add additional entries.
Does some cleanup and adds entries that cover:
- OVS isn't Linux-specific.
- Point out PORTING guide.
- Explanation of LTS releases.
- Supported versions of OpenFlow.
- Missing features from userspace datapath and upstream kernel
module.
Ben Pfaff [Wed, 20 Jun 2012 20:18:25 +0000 (13:18 -0700)]
ofproto-dpif-governor: Wake up only when there is genuinely work to do.
Until now, governor_wait() has awakened the poll loop whenever the
generation timer expires, to allow it to shrink the governor to the next
smaller size in governor_run(). However, if the governor is already the
smallest possible size, then governor_run() will not have anything to do
and will not restart the timer, which means that governor_wait() will again
immediately wake up the poll loop, and we end up using 100% CPU.
This is kind of hard to trigger because normally the client will destroy
a governor in such a case. However, if there are too many subfacets, the
client will keep even a minimum-size governor, triggering the bug.
Bug #12106. Reported-by: Alex Yip <alex@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
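The fix boils down to requesting a wakeup only when the timer can lead to real work. A minimal sketch over a hypothetical model of the governor's sizing; the struct and function are illustrative, not the actual governor_wait() code.

```c
/* Hypothetical model of the governor's sizing. */
struct governor_model {
    unsigned int size;       /* Current table size. */
    unsigned int min_size;   /* Smallest allowed size. */
    long long int timer_ms;  /* When the next shrink is due. */
};

/* Returns the time at which the poll loop should wake up for this
 * governor, or -1 if it need not wake up.  At minimum size, a shrink is
 * impossible, so waking on the expired timer would find nothing to do,
 * leave the timer expired, and wake again immediately: 100% CPU. */
static long long int
governor_wake_time(const struct governor_model *g)
{
    return g->size > g->min_size ? g->timer_ms : -1;
}
```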
Conditional jump or move depends on uninitialised value(s)
at 0x805F63F: jsonrpc_session_set_dscp (jsonrpc.c:1061)
by 0x804F45D: ovsdb_jsonrpc_server_set_remotes (jsonrpc-server.c:417)
by 0x804B775: reconfigure_from_db (ovsdb-server.c:656)
by 0x804C231: main (ovsdb-server.c:159)
Pravin B Shelar [Wed, 20 Jun 2012 00:22:54 +0000 (17:22 -0700)]
datapath: Make 'struct work_struct' consistent with kernel definition.
Since kernel 3.4, struct net_device contains a delayed_work in
net_device->pm_qos_req, and delayed_work needs the work_struct definition.
OVS has its own workqueue implementation, which redefines work_struct,
so we need to make that definition consistent with the work_struct defined
in the kernel's workqueue.h to get a correct net_device definition.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Mehak Mahajan [Thu, 7 Jun 2012 23:57:56 +0000 (16:57 -0700)]
No need to restart DB / OVS on changing dscp value.
With this change there is no need to restart the DB or OVS on configuring a
different value for the manager or controller connection respectively. On
detecting a change in the dscp value on the socket, the previous socket is
closed and a new socket is created and connection is established with the new
configured dscp value.
Ben Pfaff [Mon, 18 Jun 2012 16:33:23 +0000 (09:33 -0700)]
debian: Make DKMS automatically build for running kernel.
By default DKMS doesn't build on demand for each kernel booted or updated.
Adding AUTOINSTALL=yes gives it this behavior. Based on a small sample of
Debian packages and how-to guides for Ubuntu, AUTOINSTALL=yes is what most
packages use and what users expect.
Ethan Jackson [Tue, 22 May 2012 08:53:07 +0000 (01:53 -0700)]
lib: Utilize smaps in the idl.
String to string maps are used all over the Open vSwitch database.
Before this patch, they were implemented in the idl as parallel
string arrays. This strategy has proven a bit cumbersome. With
this patch, string to string maps are implemented using the smap
library.
Ethan Jackson [Tue, 22 May 2012 10:47:36 +0000 (03:47 -0700)]
lib: New data structure - smap.
A smap is a string to string hash map. It has a cleaner interface
than shash, which was traditionally used for the same purpose.
This patch implements the data structure, and changes netdev and
its providers to use it.
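A minimal string-to-string map in the spirit described above. Some names echo functions in OVS's lib/smap.h (smap_init, smap_get, smap_replace, smap_destroy), but this is an independent, simplified sketch with a fixed bucket count and a djb2 hash, not the real implementation.

```c
#include <stdlib.h>
#include <string.h>

#define SMAP_BUCKETS 64

struct smap_entry {
    char *key;
    char *value;
    struct smap_entry *next;
};

struct smap {
    struct smap_entry *buckets[SMAP_BUCKETS];
};

static unsigned int
smap_hash(const char *s)
{
    unsigned int h = 5381;                      /* djb2 string hash. */
    while (*s) {
        h = h * 33 + (unsigned char) *s++;
    }
    return h % SMAP_BUCKETS;
}

static char *
copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (!p) {
        abort();                                /* Out of memory. */
    }
    return memcpy(p, s, n);
}

static void
smap_init(struct smap *m)
{
    for (int i = 0; i < SMAP_BUCKETS; i++) {
        m->buckets[i] = NULL;
    }
}

static struct smap_entry *
smap_find(const struct smap *m, const char *key)
{
    for (struct smap_entry *e = m->buckets[smap_hash(key)]; e; e = e->next) {
        if (!strcmp(e->key, key)) {
            return e;
        }
    }
    return NULL;
}

/* Returns the value for 'key', or NULL if 'key' is absent. */
static const char *
smap_get(const struct smap *m, const char *key)
{
    const struct smap_entry *e = smap_find(m, key);
    return e ? e->value : NULL;
}

/* Sets 'key' to a copy of 'value', replacing any existing value.  Keys and
 * values are owned by the map, so callers never manage parallel arrays. */
static void
smap_replace(struct smap *m, const char *key, const char *value)
{
    struct smap_entry *e = smap_find(m, key);

    if (e) {
        free(e->value);
    } else {
        unsigned int b = smap_hash(key);
        e = malloc(sizeof *e);
        if (!e) {
            abort();
        }
        e->key = copy_string(key);
        e->next = m->buckets[b];
        m->buckets[b] = e;
    }
    e->value = copy_string(value);
}

/* Frees every entry and leaves 'm' empty. */
static void
smap_destroy(struct smap *m)
{
    for (int i = 0; i < SMAP_BUCKETS; i++) {
        struct smap_entry *e = m->buckets[i];
        while (e) {
            struct smap_entry *next = e->next;
            free(e->key);
            free(e->value);
            free(e);
            e = next;
        }
        m->buckets[i] = NULL;
    }
}
```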
Ethan Jackson [Tue, 22 May 2012 23:16:08 +0000 (16:16 -0700)]
bridge: Simplify VLAN splinter memory management.
Before this patch, the VLAN splinter memory management operated on
blocks of memory instead of ovsrec_ports. This strategy becomes
problematic in future patches, where more than simply calling
'free()' is needed to destroy splinter ports. This patch
solves the problem by keeping track of entire ovsrec_ports instead
of just the memory allocated to create them.
Ben Pfaff [Wed, 13 Jun 2012 20:26:27 +0000 (13:26 -0700)]
tests: Add $(check_DATA) to check-valgrind dependencies.
Otherwise if you run "check-valgrind" in a tree where you've never run
"check", you get some test failures because some data files don't get
generated before the tests run.
Ben Pfaff [Tue, 22 May 2012 04:51:03 +0000 (21:51 -0700)]
openflow-1.0: Rename ofp_match to ofp10_match, OFPFW_* to OFPFW10_*.
This better fits our general policy of adding a version number suffix
to structures and constants whose values differ from one OpenFlow
version to the next.
Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Tue, 12 Jun 2012 16:40:11 +0000 (09:40 -0700)]
Add a FAQ.
I wrote most of this myself. The answer to "I can't seem to use Open
vSwitch in a wireless network" is based on a response by Jesse Gross:
http://openvswitch.org/pipermail/discuss/2011-January/004707.html
Simon Horman [Mon, 11 Jun 2012 16:56:12 +0000 (09:56 -0700)]
nx-match: Add parsing and serialisation of OXM matches.
This code, which leverages the existing NXM implementation,
adds parsing and serialisation of OXM matches. Test cases
have also been provided.
This patch only implements parsing and serialisation of OXM fields that
are already handled by NXM.
It should be noted that in OXM ports are 32-bit whereas in NXM they
are 16-bit. This has been handled as a special case, since all other
field widths are the same in both OXM and NXM.
This patch does not address differences in wildcarding between OXM and NXM.
It is planned to implement whichever wildcarding policy, OXM's or
NXM's, is more liberal.
This patch also does not address any (subtle?) differences between
OXM and NXM treatment of specific fields. It is envisaged that this
can be handled by subsequent patches.
Signed-off-by: Simon Horman <horms@verge.net.au>
[blp@nicira.com adjusted style, added a comment, changed in_port special
case, enabled NXM extensions to OXM] Signed-off-by: Ben Pfaff <blp@nicira.com>
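One plausible way to handle the in_port width difference is to keep ordinary port numbers unchanged and move the reserved range, which starts at 0xff00 in OpenFlow 1.0's 16-bit space and at 0xffffff00 in OpenFlow 1.1's 32-bit space, to the top of the wider space. The sketch below assumes that mapping; the macro and function names are hypothetical and not taken from nx-match.c.

```c
#include <stdint.h>

/* First reserved 16-bit port (OpenFlow 1.0 OFPP_MAX), and the offset that
 * moves 0xff00..0xffff to 0xffffff00..0xffffffff. */
#define OFP10_PORT_MAX 0xff00u
#define OFP11_PORT_OFFSET 0xffff0000u

/* Widens a 16-bit (NXM) port number to 32 bits (OXM). */
static uint32_t
port_16_to_32(uint16_t port)
{
    return port < OFP10_PORT_MAX ? port : port + OFP11_PORT_OFFSET;
}

/* Narrows a 32-bit (OXM) port number to 16 bits (NXM).  Returns 0 on
 * success, or -1 if the port has no 16-bit equivalent. */
static int
port_32_to_16(uint32_t port, uint16_t *out)
{
    if (port < OFP10_PORT_MAX) {
        *out = (uint16_t) port;
    } else if (port >= OFP10_PORT_MAX + OFP11_PORT_OFFSET) {
        *out = (uint16_t) (port - OFP11_PORT_OFFSET);
    } else {
        return -1;
    }
    return 0;
}
```

Ports between 0xff00 and 0xfffffeff are valid in OXM but cannot be expressed in NXM, which is why narrowing can fail.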