Isaku Yamahata [Wed, 27 Jun 2012 14:23:25 +0000 (07:23 -0700)]
lib/meta-flow: introduce a macro, CASE_MFF_REGS, to catch "case MFF_REG<N>:"
Introduce a macro instead for
With this macro, the code is a bit reduced.
test: compile-tested and unit tests passed.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
[blp@nicira.com moved the macro declaration, moved trailing colon from
macro definition to invocation, adjusted style slightly] Signed-off-by: Ben Pfaff <blp@nicira.com>
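As a rough illustration of the idea (not the actual lib/meta-flow code), a case-label macro might look like the sketch below. The enum values, the register count of four, and the helper is_register_field() are assumptions made for this example. The trailing colon is supplied at the invocation, as the bracketed note above describes.

```c
#include <stdbool.h>

/* Hypothetical field IDs; the real enum in lib/meta-flow.h is much larger. */
enum mf_field_id {
    MFF_IN_PORT,
    MFF_REG0,
    MFF_REG1,
    MFF_REG2,
    MFF_REG3,
    MFF_ETH_SRC
};

/* Expands to the case labels for every register field.  The final colon is
 * left out on purpose so that the invocation reads "CASE_MFF_REGS:". */
#define CASE_MFF_REGS \
    case MFF_REG0: case MFF_REG1: case MFF_REG2: case MFF_REG3

/* Hypothetical helper: returns true if 'id' names a register field. */
static bool
is_register_field(enum mf_field_id id)
{
    switch (id) {
    CASE_MFF_REGS:
        return true;

    case MFF_IN_PORT:
    case MFF_ETH_SRC:
    default:
        return false;
    }
}
```

Keeping the colon at the invocation makes the macro look like an ordinary case label to readers and to indentation tools.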
Ben Pfaff [Tue, 26 Jun 2012 17:52:34 +0000 (10:52 -0700)]
meta-flow: Accept NXM and OXM field names, support NXM and OXM for output.
This commit makes actions that accept NXM header values also accept OXM
header values and accept OXM field names where previously only NXM field
names were accepted.
This makes it possible to add new OXM fields that don't have NXM header
values, e.g. the OXM "metadata" field.
Inspired by Joe Stringer's patch:
http://openvswitch.org/pipermail/dev/2012-June/018344.html
Reported-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Ben Pfaff <blp@nicira.com>
Mehak Mahajan [Tue, 26 Jun 2012 19:30:26 +0000 (12:30 -0700)]
Setting miss_send_len on receiving NXT_SET_ASYNC_CONFIG message.
For the service controllers to receive any asynchronous messages, the
miss_send_len must be set to a non-zero value (refer to DESIGN). On
receiving the NXT_SET_ASYNC_CONFIG message, the miss_send_len is set
to the default value unless it is set to a non-zero value earlier by
the OFPT_SET_CONFIG message.
Ben Pfaff [Mon, 25 Jun 2012 16:48:44 +0000 (09:48 -0700)]
ofproto-dpif-governor: Improve performance when most flows get set up.
The "flow setup governor" was introduced to avoid the cost of setting up
short flows when there are many of them. It works very well for short
flows, in fact. However, when the bulk of flows are short, but still long
enough to be set up by the governor, we end up with the worst of both
worlds: OVS processes the first 5 packets of every flow "by hand" and then
it still has to set up a flow.
This commit refines the flow setup governor so that, when most of the flows
that go through it actually get set up, it in turn starts setting up most
flows at the first packet. When it does this, it continues to sample a
small fraction of the flows in the governor's usual manner, so that if the
behavior changes it can react to it.
This increases netperf TCP_CRR transactions per second by about 25% in my
test setup, without affecting "ovs-benchmark rate" performance.
(I found that to get relatively stable performance for TCP_CRR, regardless
of whether Open vSwitch or any kind of bridging was involved, I had to pin
the netperf processes on each side of the link to a single core. I found
that my NIC's interrupts were already pinned. Thanks to Luca Giraudo
<lgiraudo@nicira.com> for these hints.)
Bug #12080. Reported-by: Gurucharan Shetty <gshetty@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Wed, 20 Jun 2012 17:55:41 +0000 (10:55 -0700)]
dpif-linux: Zero 'stats' outputs of dpif_operate() ops on failure.
When DPIF_OP_FLOW_PUT or DPIF_OP_FLOW_DEL operations failed, they left
their 'stats' outputs uninitialized. For DPIF_OP_FLOW_DEL, this meant that
the caller would read indeterminate data:
Conditional jump or move depends on uninitialised value(s)
at 0x805C1EB: subfacet_reset_dp_stats (ofproto-dpif.c:4410)
by 0x80637D2: expire_batch (ofproto-dpif.c:3471)
by 0x8066114: run (ofproto-dpif.c:3513)
by 0x8059DF4: ofproto_run (ofproto.c:1035)
by 0x8052E17: bridge_run (bridge.c:2005)
by 0x8053F74: main (ovs-vswitchd.c:108)
It's unusual for a delete operation to fail. The most common reason is an
administrator running "ovs-dpctl del-flows".
The only user of DPIF_OP_FLOW_PUT did not request stats, so this doesn't
fix an actual bug for that case.
Bug #11797. Reported-by: James Schmidt <jschmidt@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
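The pattern behind this fix can be sketched as follows. This is a simplified illustration, not the dpif-linux code: struct flow_stats and finish_flow_op() are hypothetical names, and the real operation structures carry more state.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the flow statistics structure. */
struct flow_stats {
    uint64_t n_packets;
    uint64_t n_bytes;
};

/* Hypothetical completion step.  'error' is the result of the operation,
 * 'reply' holds the statistics reported on success, and 'stats' is the
 * caller's optional output buffer. */
static void
finish_flow_op(int error, const struct flow_stats *reply,
               struct flow_stats *stats)
{
    if (!stats) {
        return;                 /* Caller did not request statistics. */
    }
    if (!error) {
        *stats = *reply;
    } else {
        /* Never leave the output indeterminate on failure. */
        memset(stats, 0, sizeof *stats);
    }
}
```

With the output zeroed on failure, a caller that reads the statistics without first checking the error sees zero counts rather than indeterminate data.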
ovs-bugtool: Avoid running ethtool on non-physical devices.
There can be hundreds of OVS internal devices. In such a situation,
running ovs-bugtool can take a very long time to complete, because
multiple ethtool commands are run on each interface in
/sys/class/net. Once ovs-bugtool completes, most of the ethtool
command outputs are incomplete with "timeouts", as we only give
30 seconds for CAP_NETWORK_STATUS.
With this patch, we only run ethtool on those interfaces
that have an associated "device". All physical interfaces have
this entry in /sys/class/net/${interface_name}/. Virtual interfaces
can have this entry too, if they have an underlying virtual device.
Ethan Jackson [Fri, 22 Jun 2012 00:57:30 +0000 (17:57 -0700)]
ofproto-dpif: Place high priority on sending CCMs.
It's very important to get CCMs out as quickly as possible to avoid
causing a fault when there is really no problem. This patch sends
CCMs as part of port_run_fast() in an attempt to move in this
direction.
Mehak Mahajan [Thu, 21 Jun 2012 19:22:42 +0000 (12:22 -0700)]
Reapplying the dscp changes: No need to restart DB/OVS on changing dscp value.
This patch reapplies the changes that were reverted with the commit 59efa47
(Revert DSCP update changes.). It also addresses the problem introduced by
the original commits, cd8fca2 (jsonrpc: Correctly setting the dscp value
before reconnect.) and b2e18d (No need to restart DB / OVS on changing
dscp value.), that caused numerous unit test failures on some systems (as
diagnosed by valgrind).
With this change there is no need to restart the DB or OVS on configuring a
different value for the manager or controller connection respectively. On
detecting a change in the dscp value on the socket, the previous socket is
closed and a new socket is created and connection is established with the new
configured dscp value.
Ben Pfaff [Thu, 21 Jun 2012 17:42:20 +0000 (10:42 -0700)]
odp-util: Include <config.h> first.
Otherwise _GNU_SOURCE doesn't get defined early enough and on some systems
LLONG_MIN is missing when odp-util.c tries to use it indirectly through
token-bucket.h.
Reported-by: Michael Hu <mhu@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Thu, 31 May 2012 00:05:34 +0000 (17:05 -0700)]
sat-math: Introduce macro version of SAT_MUL.
The macro version can be used in a constant expression, such as an
initializer for a variable with static lifetime. (Otherwise, it's better
to use the function.)
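One possible shape for such a macro is sketched below, assuming unsigned int operands and a 32-bit unsigned int; the exact definition in sat-math.h may differ. The variable names burst_limit and small_product are invented for the example.

```c
#include <limits.h>

/* Saturating unsigned multiplication as a macro, usable where a constant
 * expression is required.  It evaluates its arguments more than once, so
 * the arguments must not have side effects. */
#define SAT_MUL(X, Y)                                                   \
    ((Y) == 0 ? 0                                                       \
     : (X) <= UINT_MAX / (Y) ? (unsigned int) (X) * (unsigned int) (Y)  \
     : UINT_MAX)

/* An initializer for an object with static lifetime must be a constant
 * expression, so a function call would not compile here. */
static const unsigned int burst_limit = SAT_MUL(1000000u, 1000000u);
static const unsigned int small_product = SAT_MUL(6u, 7u);
```

Here burst_limit saturates at UINT_MAX because the true product does not fit, while small_product is simply 42.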
Ben Pfaff [Wed, 30 May 2012 21:33:08 +0000 (14:33 -0700)]
pinsched: Completely fill the token bucket at initialization.
This code, which dates to August 2008, initially sets the packet-in
scheduler token buckets to 10% full, without any rationale. I suspect
that this is just a typo for 100% full, which I think would be more
conventional, so this commit switches to that.
Isaku Yamahata [Thu, 21 Jun 2012 02:25:48 +0000 (11:25 +0900)]
build: automake complains IntegrationGuide is missing
Changeset 502c471406b32e5afcdea62fa8307f9856d05437 added IntegrationGuide,
but did not add it to EXTRA_DIST, so automake complains.
This patch adds the file to EXTRA_DIST.
> make[3]: Leaving directory `/openvswitch/build/datapath'
> The distribution is missing the following files:
> IntegrationGuide
> make[2]: *** [dist-hook-git] Error 1
> make[2]: *** Waiting for unfinished jobs....
> make[2]: Leaving directory `/openvswitch/build'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/openvswitch/build'
> make: *** [all] Error 2
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ethan Jackson [Tue, 19 Jun 2012 20:24:43 +0000 (13:24 -0700)]
cfm: Warn when delayed sending CCMs.
We've recently seen problems where OVS can get delayed sending CCM
probes by several seconds. This can cause tunnels to flap, and
generally wreak havoc. It's easy to detect when this is happening,
so, at a minimum, a warning should be helpful to those debugging
problems.
Ben Pfaff [Wed, 20 Jun 2012 22:13:38 +0000 (15:13 -0700)]
docs: Add references to the database schema documentation.
I field lots of questions about "where's the documentation?" Perhaps this
will help.
The changes to ovs-vsctl(8) add a couple of references to
ovs-vswitchd.conf.db(5) but they also rephrase a couple of paragraphs in
what seems to me an easier to understand style.
Justin Pettit [Tue, 19 Jun 2012 23:44:54 +0000 (16:44 -0700)]
FAQ: Add additional entries.
Does some cleanup and adds entries that cover:
- OVS isn't Linux-specific.
- Point out PORTING guide.
- Explanation of LTS releases.
- Supported versions of OpenFlow.
- Missing features from userspace datapath and upstream kernel
module.
Ben Pfaff [Wed, 20 Jun 2012 20:18:25 +0000 (13:18 -0700)]
ofproto-dpif-governor: Wake up only when there is genuinely work to do.
Until now, governor_wait() has awakened the poll loop whenever the
generation timer expires, to allow it to shrink the governor to the next
smaller size in governor_run(). However, if the governor is already the
smallest possible size, then governor_run() will not have anything to do
and will not restart the timer, which means that governor_wait() will again
immediately wake up the poll loop, and we end up using 100% CPU.
This is kind of hard to trigger because normally the client will destroy
a governor in such a case. However, if there are too many subfacets, the
client will keep even a minimum-size governor, triggering the bug.
Bug #12106. Reported-by: Alex Yip <alex@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Conditional jump or move depends on uninitialised value(s)
at 0x805F63F: jsonrpc_session_set_dscp (jsonrpc.c:1061)
by 0x804F45D: ovsdb_jsonrpc_server_set_remotes (jsonrpc-server.c:417)
by 0x804B775: reconfigure_from_db (ovsdb-server.c:656)
by 0x804C231: main (ovsdb-server.c:159)
Pravin B Shelar [Wed, 20 Jun 2012 00:22:54 +0000 (17:22 -0700)]
datapath: Make 'struct work_struct' consistent with kernel definition.
Since kernel 3.4, the net_device structure has a delayed_work in
net_device->pm_qos_req. delayed_work needs the work_struct definition.
OVS has its own workq implementation, which redefines work_struct,
so we need to make it consistent with the work_struct defined
in the kernel's workqueue.h to get a correct net_device definition.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Mehak Mahajan [Thu, 7 Jun 2012 23:57:56 +0000 (16:57 -0700)]
No need to restart DB / OVS on changing dscp value.
With this change there is no need to restart the DB or OVS on configuring a
different value for the manager or controller connection respectively. On
detecting a change in the dscp value on the socket, the previous socket is
closed and a new socket is created and connection is established with the new
configured dscp value.
Ben Pfaff [Mon, 18 Jun 2012 16:33:23 +0000 (09:33 -0700)]
debian: Make DKMS automatically build for running kernel.
By default DKMS doesn't build on demand for each kernel booted or updated.
Adding AUTOINSTALL=yes gives it this behavior. Based on a small sample of
Debian packages and how-to guides for Ubuntu, AUTOINSTALL=yes is what most
packages use and what users expect.
Ethan Jackson [Tue, 22 May 2012 08:53:07 +0000 (01:53 -0700)]
lib: Utilize smaps in the idl.
String to string maps are used all over the Open vSwitch database.
Before this patch, they were implemented in the idl as parallel
string arrays. This strategy has proven a bit cumbersome. With
this patch, string to string maps are implemented using the smap
library.
Ethan Jackson [Tue, 22 May 2012 10:47:36 +0000 (03:47 -0700)]
lib: New data structure - smap.
An smap is a string to string hash map. It has a cleaner interface
than shashes, which were traditionally used for the same purpose.
This patch implements the data structure, and changes netdev and
its providers to use it.
Ethan Jackson [Tue, 22 May 2012 23:16:08 +0000 (16:16 -0700)]
bridge: Simplify VLAN splinter memory management.
Before this patch, the VLAN splinter memory management operated on
blocks of memory instead of ovsrec_ports. This strategy is
problematic in future patches when more than simply calling
'free()' needs to be done to destroy splinter ports. This patch
solves the problem by keeping track of entire ovsrec_ports instead
of just the memory allocated to create them.
Ben Pfaff [Wed, 13 Jun 2012 20:26:27 +0000 (13:26 -0700)]
tests: Add $(check_DATA) to check-valgrind dependencies.
Otherwise if you run "check-valgrind" in a tree where you've never run
"check", you get some test failures because some data files don't get
generated before the tests run.
Ben Pfaff [Tue, 22 May 2012 04:51:03 +0000 (21:51 -0700)]
openflow-1.0: Rename ofp_match to ofp10_match, OFPFW_* to OFPFW10_*.
This better fits our general policy of adding a version number suffix
to structures and constants whose values differ from one OpenFlow
version to the next.
Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Tue, 12 Jun 2012 16:40:11 +0000 (09:40 -0700)]
Add a FAQ.
I wrote most of this myself. The answer to "I can't seem to use Open
vSwitch in a wireless network" is based on a response by Jesse Gross:
http://openvswitch.org/pipermail/discuss/2011-January/004707.html
Simon Horman [Mon, 11 Jun 2012 16:56:12 +0000 (09:56 -0700)]
nx-match: Add parsing and serialisation of OXM matches.
This code, which leverages the existing NXM implementation,
adds parsing and serialisation of OXM matches. Test cases
have also been provided.
This patch only implements parsing and serialisation of OXM fields that
are already handled by NXM.
It should be noted that in OXM ports are 32bit whereas in NXM they
are 16 bit. This has been handled as a special case as all other field
widths are the same in both OXM and NXM.
This patch does not address differences in wildcarding between OXM and NXM.
It is planned that liberal wildcarding policy dictated by either OXM or
NXM will be implemented.
This patch also does not address any (subtle?) differences between
OXM and NXM treatment of specific fields. It is envisaged that this
can be handled by subsequent patches.
Signed-off-by: Simon Horman <horms@verge.net.au>
[blp@nicira.com adjusted style, added a comment, changed in_port special
case, enabled NXM extensions to OXM] Signed-off-by: Ben Pfaff <blp@nicira.com>
Ethan Jackson [Thu, 7 Jun 2012 22:27:22 +0000 (15:27 -0700)]
packets: Use RARPs for learning packets.
Traditionally Open vSwitch had used 802.2 SNAP packets to update
upstream switch learning tables when necessary. This approach had
advantages in that debugging information could be embedded in the
packet helping hapless admins figure out what's going on. However,
since both qemu and VMware use RARP for this purpose, it seems
appropriate to fall in line with the de facto standard.
Requested-by: Ben Basler <bbasler@nicira.com> Signed-off-by: Ethan Jackson <ethan@nicira.com>
Ethan Jackson [Thu, 7 Jun 2012 20:05:41 +0000 (13:05 -0700)]
ofproto: Fix use after free in ofoperation_complete().
In one edge case, ofoperation_complete() destroys its rule without
informing its ofoperation that the rule is gone. Later in the same
function, ofoperation_destroy() attempts to modify the rule, which
has already been destroyed.
Bug #11797. Signed-off-by: Ethan Jackson <ethan@nicira.com>
Ethan Jackson [Fri, 1 Jun 2012 21:33:41 +0000 (14:33 -0700)]
packets: Generalize reserved RSPAN protocols.
Open vSwitch refuses to mirror certain destination addresses in
addition to those classified by eth_addr_is_reserved(). Looking
through the uses of eth_addr_is_reserved(), one finds that no
callers should be using the additional addresses which mirroring
drops. This patch folds the additional addresses dropped in the
mirroring code into the more general eth_addr_is_reserved()
function.
This patch also changes the implementation in a way that is
slightly less efficient, but much easier to read and extend in the
future.
Bug #11755. Signed-off-by: Ethan Jackson <ethan@nicira.com>
Ethan Jackson [Thu, 7 Jun 2012 00:37:46 +0000 (17:37 -0700)]
packets: Fix eth_addr_equal_except().
It turns out that eth_addr_equal_except() computed the exact
opposite of what it purported to. It returned true if the two
arguments were *not* equal. This is extremely confusing, so this
patch changes it.
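For illustration, a corrected comparison might look like the sketch below. It assumes that the function is meant to return true when the two addresses agree on every bit set in the mask; the commit message does not spell out the mask semantics, so treat that as an assumption rather than the library's documented contract.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_ADDR_LEN 6

/* Returns true if 'a' and 'b' agree on every bit that is set in 'mask'.
 * Bits that are cleared in 'mask' are ignored (assumed semantics). */
static bool
eth_addr_equal_except(const uint8_t a[ETH_ADDR_LEN],
                      const uint8_t b[ETH_ADDR_LEN],
                      const uint8_t mask[ETH_ADDR_LEN])
{
    for (int i = 0; i < ETH_ADDR_LEN; i++) {
        if ((a[i] ^ b[i]) & mask[i]) {
            return false;       /* A masked bit differs. */
        }
    }
    return true;
}
```

An inverted version of this function would return true exactly when some masked bit differs, which is the kind of surprise the commit describes.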
Bruce Davie [Wed, 6 Jun 2012 01:49:51 +0000 (18:49 -0700)]
ovsdb-client: Fix bugs in man page
In commit 53ffefe9 (ovsdb-client: Make "server" and "database"
arguments optional.), two errors were introduced. "list-columns"
appeared twice in the list of commands, the first instance should be
"list-tables". The "monitor" command now lists optional "column"
arguments.
Signed-off-by: Bruce Davie <bsd@nicira.com> Signed-off-by: Bruce Davie <bdavie@nicira.com> Signed-off-by: Justin Pettit <jpettit@nicira.com>
Ben Pfaff [Fri, 1 Jun 2012 21:40:31 +0000 (17:40 -0400)]
dpif-linux: Log details when a packet is lost.
Until now, when a packet was dropped in the kernel-to-user buffers, we
logged the occurrence but nothing that would allow a person reading the
log after the fact to learn why it was dropped. This commit adds details
that identify the major sources of packets in the buffer, which should
help.
Ben Pfaff [Wed, 23 May 2012 23:55:09 +0000 (16:55 -0700)]
dpif-linux: Slightly refactor internal data structures.
An initial attempt also replaced the 'uint32_t ready_mask' in struct
dpif_linux by a 'bool ready' in each struct dpif_channel, but I wasn't
happy with the result (the ready_mask bitmap works out really well) and so
I dropped that part.
Ben Pfaff [Wed, 23 May 2012 21:56:20 +0000 (14:56 -0700)]
dpif-linux: Avoid pessimal behavior when kernel-to-user buffers overflow.
When a kernel-to-user Netlink buffer overflows, the kernel reports
ENOBUFS without passing along an actual message. When it does this,
we should immediately try again, because we know that there is a
message waiting, instead of reporting the error to the caller.
This improves the OVS response rate to "hping3 --flood" traffic by
a few percentage points in my testing.
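The retry idea can be sketched in isolation as below. This is not the dpif-linux code: the callback type recv_fn, the function recv_retry_enobufs(), the max_tries bound, and the overflow_then_ok() test double are all hypothetical names introduced for the example.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical receive callback: returns 0 and fills 'buf' on success,
 * otherwise a positive errno value. */
typedef int recv_fn(void *aux, void *buf, size_t size);

/* Calls 'recv_cb' until it returns something other than ENOBUFS.  ENOBUFS
 * reports an overflow without delivering a message, so retrying at once is
 * worthwhile instead of passing the error up.  'max_tries' is an assumed
 * safety bound, not something the commit message describes. */
static int
recv_retry_enobufs(recv_fn *recv_cb, void *aux, void *buf, size_t size,
                   int max_tries)
{
    int error = ENOBUFS;

    for (int i = 0; i < max_tries && error == ENOBUFS; i++) {
        error = recv_cb(aux, buf, size);
    }
    return error;
}

/* Test double: fails with ENOBUFS '*aux' times, then succeeds. */
static int
overflow_then_ok(void *aux, void *buf, size_t size)
{
    int *overflows_left = aux;

    (void) buf;
    (void) size;
    if (*overflows_left > 0) {
        --*overflows_left;
        return ENOBUFS;
    }
    return 0;
}
```

A caller that sees 0 from recv_retry_enobufs() can process the message as usual; any other error, including ENOBUFS once the bound is exhausted, is reported unchanged.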
Joe Stringer [Mon, 28 May 2012 12:38:21 +0000 (00:38 +1200)]
flow: Adds support for arbitrary ethernet masking
Arbitrary ethernet mask support is one step on the way to support for OpenFlow
1.1+. This patch set seeks to add this capability without breaking current
protocol support.
Signed-off-by: Joe Stringer <joe@wand.net.nz>
[blp@nicira.com made some updates, see
http://openvswitch.org/pipermail/dev/2012-May/017585.html] Signed-off-by: Ben Pfaff <blp@nicira.com>
Justin Pettit [Sat, 24 Mar 2012 08:02:26 +0000 (01:02 -0700)]
ofp-util: Clean up cookie handling.
Commit e72e793 (Add ability to restrict flow mods and flow stats
requests to cookies.) modified cookie handling. Some of its behavior
was unintuitive and there was at least one bug (described below).
Commit f66b87d (DESIGN: Document uses for flow cookies.) attempted to
document a clean design for cookie handling. This commit updates the
DESIGN document and brings the implementation in line with it.
In commit e72e793, the code that handled processing OpenFlow flow
modification requests set the cookie mask to exact-match. This seems
reasonable for adding flows, but is not correct for matching, since
OpenFlow 1.0 doesn't support matching based on the cookie. This commit
changes the cookie mask to fully wildcarded, which is the correct
behavior for modifications and deletions. It doesn't cause any problems
for flow additions, since the mask is ignored for that operation.
Joe Stringer [Tue, 29 May 2012 18:07:16 +0000 (11:07 -0700)]
packets: Adds ethernet-matching helper functions
With OpenFlow 1.1 requiring arbitrary ethernet match support, it simplifies
other code if we have some extra helper functions. This patch adds
eth_mask_is_exact(mask), eth_addr_bitand(src, mask, dst),
eth_addr_equal_except(a, b, mask) and eth_format_masked(eth, mask, output).
Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Ben Pfaff <blp@nicira.com>
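Two of these helpers are simple enough to sketch from their names and argument lists. The bodies below are assumptions about what such helpers would do, not copies of lib/packets.h; the argument order follows the (mask) and (src, mask, dst) signatures given above.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_ADDR_LEN 6

/* Returns true if 'mask' selects every bit, i.e. is ff:ff:ff:ff:ff:ff. */
static bool
eth_mask_is_exact(const uint8_t mask[ETH_ADDR_LEN])
{
    for (int i = 0; i < ETH_ADDR_LEN; i++) {
        if (mask[i] != 0xff) {
            return false;
        }
    }
    return true;
}

/* Stores the bitwise AND of 'src' and 'mask' into 'dst'. */
static void
eth_addr_bitand(const uint8_t src[ETH_ADDR_LEN],
                const uint8_t mask[ETH_ADDR_LEN],
                uint8_t dst[ETH_ADDR_LEN])
{
    for (int i = 0; i < ETH_ADDR_LEN; i++) {
        dst[i] = src[i] & mask[i];
    }
}
```

For example, applying eth_addr_bitand() with the mask ff:ff:ff:00:00:00 keeps only the OUI half of an address, and eth_mask_is_exact() reports false for that mask.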
Eric Dumazet [Fri, 25 May 2012 18:07:35 +0000 (11:07 -0700)]
datapath: cleanup unsigned to unsigned int
Use of "unsigned int" is preferred to bare "unsigned" in net tree.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Tue, 15 May 2012 19:50:57 +0000 (12:50 -0700)]
odp-util: Update ODPUTIL_FLOW_KEY_BYTES for current kernel flow format.
Before we submitted the kernel module upstream, we updated the flow format
by adding two fields to the description of packets with VLAN headers, but
we forgot to update ODPUTIL_FLOW_KEY_BYTES to reflect these changes. The
result was that a maximum-length flow did not fit in the given space.
This fixes a crash processing IPv6 neighbor discovery packets with VLAN
headers received in a tunnel configured with key=flow or in_key=flow.
This updates some comments to better describe the implications of
ODPUTIL_FLOW_KEY_BYTES (suggested by Justin).
This also updates test-odp.c so that it would have caught this problem, and
updates odp.at to demonstrate that a full 156 bytes are necessary. (To see
that, revert the change to ODPUTIL_FLOW_KEY_BYTES and run the test.)
Reported-by: Dan Wendlandt <dan@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>