Pravin B Shelar [Tue, 13 Aug 2013 07:19:53 +0000 (00:19 -0700)]
datapath: Move generic tunnel functions to lisp module.
Generic tunnel rcv and send function are only used by lisp tunneling
module, so It make sense to move them to lisp module.
CC: Lori Jakab <lojakab@cisco.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Lorand Jakab <lojakab@cisco.com> Acked-by: Jesse Gross <jesse@nicira.com>
Pravin B Shelar [Thu, 15 Aug 2013 00:31:57 +0000 (17:31 -0700)]
datapath: tunnel: Fix gre64 tunnel when key not specified.
User is allowed to create tunnel without any keys. In this
case userspace set tunnel action does not set tun-key flag
which was confusing gre64 vport header calculations. Following
patch fixes it by always assuming TUNNEL_KEY parameter as we
do it with tun-seq.
Pravin B Shelar [Wed, 14 Aug 2013 18:46:15 +0000 (11:46 -0700)]
datapath: tunnel: Do not use inner ip-header-id for tunnel ip-header-id.
Using inner-id for tunnel id is not safe in some rare cases.
E.g. packets coming from multiple sources entering same tunnel
can have same id. Therefore on tunnel packet receive we
could have packets from two different stream but with same
source and dst IP with same ip-id which could confuse ip packet
reassembly.
Alex Wang [Tue, 13 Aug 2013 23:51:08 +0000 (16:51 -0700)]
bfd: Increase configuration efficiency.
Currently, when there are multiple bfd configuration changes,
the bfd_poll() will only update one change at a time with the
other side. This commit moves the call to bfd_poll() at the
end of configuration processing function, so that bfd_poll()
will update all configuration changes together.
Signed-off-by: Alex Wang <alexw@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Alex Wang [Wed, 14 Aug 2013 20:12:33 +0000 (13:12 -0700)]
bfd: Fix check_tnl_key error.
This commit fixes the bug introduced by commit 26131299fa5 (bfd: Make the
BFD module thread safe.). The bug caused the opposite of the intended
behaviour.
Unit test is added for the 'check_tnl_key' feature.
Signed-off-by: Alex Wang <alexw@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Wed, 14 Aug 2013 00:44:14 +0000 (17:44 -0700)]
ofproto-dpif: Destroy bundle after moving its last port out.
When the ofp_port argument to bundle_add_port() refers to an ofport_dpif
that already belongs to some other bundle, bundle_add_port() removed
the port from the other bundle, correctly, with bundle_del_port().
If the other bundle now contained no ports, however, this violated the
invariant that a bundle always contains at least one port.
Normally, this would get fixed up when the other bundle was processed
later during reconfiguration. I haven't quite zeroed in on the exact
case where this is not true, but segfaults have happened here in
production, in particular when port adds and deletes happen simultaneously
and the new port reuses the OpenFlow port number of one of the deleted
ports. It seems that the duplicate port number allows some port to rip
away the new port from its bundle without destroying that bundle. I
suspect, therefore, that there is still a more subtle bug here, but I
hope that this will fix the segfault.
Bug #18967. Signed-off-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Tue, 6 Aug 2013 19:57:16 +0000 (12:57 -0700)]
flow: Enable matching on new field 'pkt_mark'.
The Linux kernel datapath enables matching and setting the skb mark
but this functionality is currently used only internally by
ovs-vswitchd. This exposes it through NXM to enable external
controllers to interact with other kernel subsystems. Although this
is simply exporting the skb mark, the intention is that this is a
platform independent mechanism to access some system metadata and
therefore may have different implementations on various systems.
Bug #17855
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>
Jesse Gross [Tue, 6 Aug 2013 19:57:15 +0000 (12:57 -0700)]
tunnel: Clear IPSEC_MARK on input rather than output.
Currently we remove the IPSEC_MARK flag from all packets that are
egressing on non-tunnel ports. However, this behavior is confusing
if we allow OpenFlow controllers to match and set the pkt_mark field
because the tunnel behavior applies even on non-tunnel ports.
This instead clears the mark on tunnel input which should have the
same effect for tunnel ports. However, on non-tunnel traffic (or
even for traffic entering on a tunnel port but leaving on a non-
tunnel port) it allows the mark to pass through without change.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>
Jesse Gross [Tue, 6 Aug 2013 19:57:14 +0000 (12:57 -0700)]
tunnel: Consolidate action code for tunnel port receive.
There are a couple of operations that are related to receiving a
packet on a tunnel port but that affect the actions and therefore
need to be performed on the output path. This adds a new hook to
do this and consolidates the existing code there.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>
Jesse Gross [Tue, 6 Aug 2013 19:57:13 +0000 (12:57 -0700)]
flow: Rename skb_mark to pkt_mark.
The skb_mark field is currently only available with the Linux datapath
and is only used internally. However, it is desirable to expose this
through OpenFlow and when it is exposed ideally it would not be system-
specific. In preparation for this, skb_mark is rename to pkt_mark in
internal data structures for consistency.
This does not rename the Linux interfaces because doing so would break
the API. It would not necessarily be desirable to do anyways since in
Linux-specific code it is clearer to use the actual name rather than a
generic one. This can lead to confusion in some places, however, because
we do not always strictly separate generic and platform dependent code
(one example is actions). This seems inevitable though at this point if
the lower and upper layers have different names (as they must given the
above requirements).
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>
Ben Pfaff [Tue, 13 Aug 2013 19:54:35 +0000 (12:54 -0700)]
debian: Fix build with old versions of dpkg-buildflags.
dpkg-buildflags has not always supported --export=configure, but commit 6c2d4c8780 (debian: Apply hardening options to build.) used it
unconditionally, causing the build to fail on old Debian distributions.
This fixes the problem.
Ben Pfaff [Tue, 13 Aug 2013 19:37:50 +0000 (12:37 -0700)]
tests: Fix threading race in "ofproto-dpif megaflow - learning" test.
Threaded ofproto-dpif uses a queue to pass packets from the forwarding
threads to the main thread for (mega)flow setup and for learning. When
learning occurs, causing revalidations, this races against flow setup, so
that sometimes a datapath (mega)flow does get set up for a packet that
causes learning and sometimes it doesn't. This caused this test to
sometimes fail because one megaflow or the other that was expected to be
set up was not.
This commit fixes the problem by sending a second packet in each flow.
These additional packets don't cause any additional changes to the flow
table but they do cause flows to be set up, fixing the problem.
Ben Pfaff [Tue, 13 Aug 2013 18:22:26 +0000 (11:22 -0700)]
tests: Fix problems in "learning action - self-modifying flow" test.
This test had two problems. First, it had a bizarre dependency on stats
that were not up-to-date: the "ovs-ofctl dump-flows" assumed that only
the first one of ten packets sent through the switch had been accounted
to OpenFlow flow statistics. Adding a 1-second time warp fixed this
problem by ensuring that all ten packets were accounted. (That's why this
patch updates the expected output of "ovs-ofctl dump-flows".)
Second, multithreading has made packet processing less predictable in
general. This commit adds 10-ms time warps after sending each packet,
which seems to make the test reliable for me.
Ben Pfaff [Mon, 12 Aug 2013 22:49:25 +0000 (15:49 -0700)]
sparse: Remove support for thread-safety annotations.
The Clang support for thread-safety annotations is much more effective
than "sparse" support. I found that I was unable to make the annotations
warning-free under sparse.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Jesse Gross [Mon, 5 Aug 2013 23:00:03 +0000 (16:00 -0700)]
ofproto: Include classifier wildcards in trace output.
When tracing a flow, it shows the "relevant fields" that were used
to determine the results. However, this currently only includes fields
that are used for computing the actions but not the flow lookup. This
can be confusing so this patch includes the wildcards from the classifer
lookup as well.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
YAMAMOTO Takashi [Mon, 12 Aug 2013 23:00:05 +0000 (08:00 +0900)]
ofproto: make oftable_remove_rule__ release evict lock
according to the OVS_RELEASES annotation, oftable_remove_rule__ is
expected to release rule->evict lock. make it actually do so.
this fixes pthread_rwlock_destroy failures observed on NetBSD,
where destroying a held lock, which is specwise undefined behaviour,
actually fails. i guess it doesn't fail on linux but it's better
not to rely on an undefined behavior.
Signed-off-by: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp> Signed-off-by: Ben Pfaff <blp@nicira.com>
Pravin B Shelar [Mon, 12 Aug 2013 17:12:27 +0000 (10:12 -0700)]
datapath: Do not set dev->master for XEN.
XEN dom0 networking assumes dev->master is bond device
and it tries to access bond private structure from dev->master
ptr on receive path. This causes panic.
Following patch removes compat code that is setting master
device.
Alex Wang [Mon, 12 Aug 2013 21:14:52 +0000 (14:14 -0700)]
band: Fix error in bond_choose_output_slave() function.
This commit fixes the error introduced by commit 4a1b8f30e59 (bond:
Stop using tags.). The error is caused by mistakenly returning 'slave'
where 'slave->aux' should be returned.
Signed-off-by: Alex Wang <alexw@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Mon, 12 Aug 2013 19:49:23 +0000 (12:49 -0700)]
netdev-dummy: Include all dummy classes in iterations.
Commit 86f1d0326bd0 (netdev-dummy: Use netdev_get_devices() instead of a
local shash.) caused netdev-dummy functions that iterate over all dummy
devices to iterate only over the ones that have class 'dummy_class'. This
seemed to obviously include all the ones that we want, but in fact
when ovs-vswitch is invoked with --enable-dummy=override, there are more
dummy classes than just dummy_class, which this new form of iteration
skipped over, with various negative consequences that showed up in some
testing.
This commit switches netdev-dummy back to internally tracking its own
dummy devices. It fixes the tests for me.
Alex Wang [Mon, 12 Aug 2013 19:01:46 +0000 (12:01 -0700)]
bond.c: Fix a typo.
This commit fixes a typo in "lib/bond.c" which causes the high CPU
utilization after adding bond. The bug was introduced in commit 4a1b8f30e59 (bond: Stop using tags.).
Signed-off-by: Alex Wang <alexw@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Pravin B Shelar [Thu, 1 Aug 2013 22:36:06 +0000 (15:36 -0700)]
datapath: Use parallel_ops genl.
OVS locking was recently changed to have private OVS lock which
simplified overall locking. Therefore there is no need to have
another global genl lock to protect OVS data structures. Following
patch uses of parallel_ops genl family for OVS. This also allows
more granual OVS locking using ovs_mutex for protecting OVS data
structures, which gives more concurrencey. E.g multiple genl
operations OVS_PACKET_CMD_EXECUTE can run in parallel, etc.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Ethan Jackson [Tue, 25 Jun 2013 21:45:43 +0000 (14:45 -0700)]
ofproto-dpif: Implement multi-threaded miss handling.
This patch factors flow miss handling into its own module,
ofproto-dpif-upcall which can utilize multiple threads to process
misses. For some important benchmarks, this change improves Open
vSwitch flow setup performance by roughly 50x (that's 50 times not
50%) in my testing.
Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Ethan Jackson [Wed, 17 Jul 2013 23:14:02 +0000 (16:14 -0700)]
ofproto-dpif: Lock rules to prevent eviction.
This patch uses a read-write lock to prevent rules from being evicted
while they're used by child threads. It also changes the prototypes
of the various rule lookup functions so that the thread safety
analysis can be used to ensure that the locking is handled properly.
Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Sat, 10 Aug 2013 04:29:03 +0000 (21:29 -0700)]
netdev-linux: Use dedicated netlink notification socket.
The rtnetlink_link asynchronous netlink notifications seem somewhat
troublesome in a threaded environment. It seems more straightforward
to have netdev-linux fend for itself.
Ben Pfaff [Wed, 31 Jul 2013 21:09:30 +0000 (14:09 -0700)]
netdev-vport: Make netdev_vport_patch_peer() return a malloc()'d string.
When threading comes into the picture there arises the possibility of a
race between netdev_vport_patch_peer()'s caller using the returned string
and another caller changing the peer. It is safer to return a copy.
This is the same lifecycle used in the ofproto provider interface.
Compared to the previous netdev provider interface, it has the
advantage that the netdev top layer can control when any given
netdev becomes visible to the outside world.
Ben Pfaff [Sat, 10 Aug 2013 04:14:23 +0000 (21:14 -0700)]
netdev-linux, netdev-bsd: Make access to AF_INET socket thread-safe.
The only uses of 'af_inet_sock', in both drivers, were ioctls, so it seemed
like a good abstraction to write a function that just does such an ioctl,
and to factor out shared code into socket-util.
Signed-off-by: Ben Pfaff <blp@nicira.com> CC: Ed Maste <emaste@freebsd.org>
Ben Pfaff [Thu, 25 Jul 2013 23:11:52 +0000 (16:11 -0700)]
netdev-dummy: Use netdev_get_devices() instead of a local shash.
When an upcoming commit introduces thread safety into the netdev API, this
allows netdev-dummy to avoid adding more internal locking by taking
advantage of netdev_get_devices() refcounting.
Jesse Gross [Fri, 9 Aug 2013 22:27:27 +0000 (15:27 -0700)]
datapath: Fix typo in flow validation logic.
A bit shift operation is using the value '11' instead of '1' as the
starting value. This only makes validation weaker than it should be
so unless userspace is trying to install an invalid flow there will
be no effect.
ovs-bugtool: Collect database through CAP_NETWORK_STATUS.
Currently the openvswitch database is being collected with
CAP_NETWORK_CONFIG which has a max size of 50 KB. This is
quite low as the database can easily be larger than 50 KB.
Move database collection to CAP_NETWORK_STATUS which does
not have a max size. If database size exceeds 10 MB, create
a compacted version of it and then collect it.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Wed, 7 Aug 2013 03:35:29 +0000 (20:35 -0700)]
ofproto-dpif-xlate: Avoid MAC learning write lock on fast path.
Most of the trips through update_learning_table() do not actually change
the MAC learning table, but because some do the code there took the
MAC learning table's write lock. This commit changes the common case to
take only the read lock, falling back to the write lock if a change was
actually necessary.
Ethan reported that this gave a 3.3x performance improvement in one test
case due to reduced lock contention.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Jesse Gross [Thu, 8 Aug 2013 02:47:59 +0000 (19:47 -0700)]
datapath: Add 'ovs_' prefix to extern symbols.
The external symbols in the OVS kernel module are prefixed with
'ovs_' with the exception of ipv4_tun_to/from_nlattr(). This adds
the prefix and makes the out of tree version consistent with
upstream.
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>
Ben Pfaff [Tue, 6 Aug 2013 21:57:19 +0000 (14:57 -0700)]
ovs-thread: New function ovsthread_id_self().
I foresee a need for possibly large numbers of instances of "struct
seq" (which is introduced in an upcoming patch). Each struct seq
needs some per-thread data. POSIX has pthread_key_t for this, but
the number of keys can be fairly limited, to as few as 128. It is
reasonable to work around this by using a hash table indexed on the
current thread. That only works if one can get a thread identifier
that is hashable (pthread_t is not). This patch introduces a
hashable thread identifier.
Ben Pfaff [Tue, 6 Aug 2013 21:40:25 +0000 (14:40 -0700)]
ovs-thread: Add support for globally visible per-thread data.
DEFINE_PER_THREAD_DATA always declared its data item as "static", meaning
that it was only directly visible within a single translation unit.
This commit adds additional forms of per-thread data that allow the data
to be accessible from multiple translation units.
Ben Pfaff [Fri, 26 Jul 2013 00:05:46 +0000 (17:05 -0700)]
netdev: Make netdev_from_name() take a reference to its returned netdev.
This API change is necessary for thread safety, to be added in an upcoming
commit. Otherwise, the client would not be able to safely use the returned
netdev because it could already have been destroyed.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>
Ben Pfaff [Thu, 25 Jul 2013 23:27:39 +0000 (16:27 -0700)]
netdev: Make netdev_get_devices() take a reference to each netdev.
This API change is necessary for thread safety, to be added in an upcoming
commit. Otherwise, the client would not be able to actually use any of
the returned netdevs because they could already have been destroyed.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>
Ben Pfaff [Fri, 2 Aug 2013 19:19:49 +0000 (12:19 -0700)]
netdev-bsd: Don't assume 'struct netdev' has offset 0.
The data items returned by netdev_get_devices() are "struct netdev *"s.
The code fixed up by this commit used them as "struct netdev_bsd *",
which happens to work because struct netdev happens to be at offset 0 in
each struct but it's better to do a proper cast in case someday
struct netdev gets moved to a nonzero offset.
Ben Pfaff [Wed, 31 Jul 2013 22:22:12 +0000 (15:22 -0700)]
netdev-bsd: Correctly handle IPv4 netmasks.
netdev_bsd_get_in4() did not set anything in its 'netmask' output argument
if the IPv4 address was cached, leaving it indeterminate. It would also
mark the cache as valid even if there was an error retrieving the netmask.
This fixes both problems.
Found by inspection.
Signed-off-by: Ben Pfaff <blp@nicira.com> CC: Ed Maste <emaste@freebsd.org>
Simon Horman [Wed, 7 Aug 2013 00:28:00 +0000 (09:28 +0900)]
Update OPENFLOW-1.1+ to differentiate optional and required features
The purpose of this patch is primarily to provide details on which
unimplemented features are optional and which are required as this
may be of interest to those working on OpenFlow 1.1+ coverage.
This patch also:
* Clarifies the text of some entries which seemed difficult to understand
for the authors of this patch.
* Adds entries for features that were missing from the existing list.
N.B: It is entirely possible that there are still missing entries.
* Expands some entries into sub-entries where some portions of
a feature are required and others are optional
Co-authored-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Ben Pfaff <blp@nicira.com>
Andy Zhou [Sat, 3 Aug 2013 19:23:15 +0000 (12:23 -0700)]
odp-util: Always export the priority and skb_mark netlink attributes.
The current Netlink protocol allows a default value of zero if either mark
or priority is not specified (this is part of the ABI). Until now, when
userspace serializes either the value or mask, it looked at the value and
omitted the netlink attribute if it is zero. This is a bug because an
exact match on zero turns into a wildcard of the field.
These two fields (plus input port and EtherType) are special because they
can be omitted whereas most other values are required to be fully
specified. These protocol variations tend to cause bugs (as above) when we
evolve the protocol because an exception that makes sense in one context
might not be logical in another. Since the default value for mark and
priority are merely shorthands, we can push the protocol in a more
consistent direction by ignoring the shortcut and always serializing the
values. This is what this commits does.
Signed-off-by: Andy Zhou <azhou@nicira.com>
[blp@nicira.com added Jesse's text to the commit message] Signed-off-by: Ben Pfaff <blp@nicira.com>
Paul Ingram [Sat, 3 Aug 2013 07:12:36 +0000 (07:12 +0000)]
cfm: update remote opstate only when a CCM is received.
The remote opstate for a CFM interface is presumed to be up unless a CCM is
received which signals opstate down. This means than an interface configured
for CFM demand mode may incorrectly appear to be opstate up if it has not
received a CCM within the last fault interval.
We should remember the last remote opstate for a CFM interface and only
change it when a CCM arrives signaling a change.
Bug #18806 Signed-off-by: Paul Ingram <pingram@nicira.com> Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
The current situation is that whenever any packet enters the
userspace, bfd_should_process_flow() looks at the UDP destination
port to figure out whether that is a BFD packet. This means that
UDP destination port cannot be wildcarded for all the other flows
too.
To optimize BFD for megaflows, we introduce a new
'bfd:bfd_dst_mac' field in the database. Whenever this field is set
by a controller, it is assumed that all the BFD packets to/from
this interface will have the destination mac address set as the one
specified in the bfd:bfd_dst_mac field. If this field is set, we
first look at the destination mac address of a packet and if it
does not match the mac address set in bfd:bfd_dst_mac, we do not
process that packet as bfd. If the field does match, we go ahead
and look at the UDP destination port too.
Also, change the default BFD destination mac address to
"00:23:20:00:00:01".
Feature #18850. Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Ethan Jackson [Fri, 2 Aug 2013 19:43:03 +0000 (12:43 -0700)]
ofproto-dpif-xlate: Take responsibility for ofproto_receive().
ofproto_receive() is a slightly odd function which doesn't fit
perfectly in either ofproto-dpif or ofproto-dpif-xlate. However, it's
much easier to reason about its thread safety in ofproto-dpif-xlate,
so this patch moves it there and renames it xlate_receive().
Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Andy Zhou [Sat, 3 Aug 2013 19:23:14 +0000 (12:23 -0700)]
odp-util: add verbose mode for displaying dp flow.
When verbose mode tuned on, all dp flow fields described by the netlink
attributes are displayed, including fully wildcarded attributes.
Otherwise, the fully wildcarded attributes are omitted for brevity.
Added -m option to "ovs-dpctl dump-flows" to enable verbose mode. It is
off by default.
Signed-off-by: Andy Zhou <azhou@nicira.com>
[blp@nicira.com added documentation] Signed-off-by: Ben Pfaff <blp@nicira.com>
Ethan Jackson [Fri, 2 Aug 2013 03:52:01 +0000 (20:52 -0700)]
ofproto-dpif-xlate: Don't trace on deep resubmit.
While this code is useful for debugging, removing it allows us to hide
ofproto_trace() in ofproto-dpif. ofproto_trace() is a complex function
which could be difficult to make "obviously" thread safe.
Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>